High Level Query Optimization in Dask
Introduction
Dask DataFrame doesn't currently optimize your code for you (like Spark or a SQL database would). This means that users waste a lot of computation. Let's look at a common example which looks ok at first glance, but is actually pretty inefficient.
import dask.dataframe as dd
df = dd …