Archive

Utilizing PyArrow to improve pandas and Dask workflows

Get the most out of PyArrow support in pandas and Dask right now

Introduction

This post investigates where we can use PyArrow to improve our pandas and Dask workflows right now. With pandas 2.0, general support for PyArrow dtypes was added to pandas and Dask. This solves a bunch …


Welcoming pandas 2.0

How the API is changing and how to leverage new functionalities

Introduction

After three years of development, the second pandas 2.0 release candidate was released on the 16th of March. There are many new features in pandas 2.0, including improved extension array support, PyArrow support for DataFrames and …


A guide to efficient data selection in pandas

Improve performance when selecting data from a pandas object

Introduction

There are different ways of selecting a subset of data from a pandas object. Depending on the specific operation, the result is either a view that points to the original data or a copy of the original data. This ties …
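A small sketch of the view-versus-copy distinction, assuming a DataFrame whose columns all share one dtype. The helper `shares_data` is hypothetical and only checks whether two objects are backed by the same memory via `np.shares_memory`; exact behavior can differ across pandas versions and with Copy-on-Write enabled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Positional row slicing returns an object that shares memory with df.
sliced = df.iloc[1:3]

# Selecting rows with a boolean mask materializes a new copy.
masked = df[df["a"] > 2]


def shares_data(left: pd.DataFrame, right: pd.DataFrame, column: str) -> bool:
    """Return True if `column` of both objects is backed by the same memory."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())
```

Here the row slice shares memory with `df`, while the masked selection does not. With Copy-on-Write enabled, the slice still shares memory initially but behaves like a copy as soon as either object is modified.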


A solution for inconsistencies in indexing operations in pandas

Get rid of annoying SettingWithCopyWarning messages

Introduction

Indexing operations in pandas are very flexible, so there are many cases that behave quite differently and can produce unexpected results. Additionally, it is hard to predict when a SettingWithCopyWarning is raised and what exactly it means. I’ll show a couple of …
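A minimal sketch of two common ways to sidestep chained assignment, assuming a small illustrative DataFrame: update the original through a single `.loc` call, or make the copy explicit when a modified subset is wanted. This is a general pandas pattern, not necessarily the approach the post goes on to describe.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Chained assignment such as df[df["a"] > 2]["b"] = 0 writes into a temporary
# copy, so df is not updated and pandas emits a warning
# (SettingWithCopyWarning when Copy-on-Write is not enabled).
# A single .loc call with row mask and column label updates df directly.
df.loc[df["a"] > 2, "b"] = 0

# When a modified subset is wanted without touching df, make the copy explicit;
# writing to it then neither warns nor affects df.
subset = df[df["a"] <= 2].copy()
subset["b"] = -1
```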