Improving GroupBy.map with Dask and Xarray

Running large-scale GroupBy-Map patterns on Xarray objects backed by Dask arrays is an essential part of many typical geospatial workloads. Detrending is a very common operation that needs this pattern.

In this post, we will explore how and why this caused so many pitfalls for Xarray …

Dask DataFrame is Fast Now

Intro

Dask DataFrame scales out pandas DataFrames to operate at the 100GB-100TB scale.

Historically, Dask was pretty slow compared to other tools in this space (like Spark). Due to a number of improvements focused on performance, it's now pretty fast (about 20x faster than before). The new implementation moved Dask …

What's new in pandas 2.2

The most interesting things about the new release

pandas 2.2 was released on January 22nd 2024. Let’s take a look at the things this release introduces and how it will help us improve our pandas workloads. It includes a bunch of improvements that will improve the user …

Deep dive into pandas Copy-on-Write mode - part III

Explaining the migration path for Copy-on-Write

Introduction

The introduction of Copy-on-Write (CoW) is a breaking change that will have some impact on existing pandas code. We will investigate how we can adapt our code to avoid errors once CoW is enabled by default. This is currently planned for the pandas …

What's new in pandas 2.1

The most interesting things about the new release

pandas 2.1 was released on August 30th 2023. Let’s take a look at the things this release introduces and how it will help us improve our pandas workloads. It includes a bunch of improvements and also a set of new …