# Performance
Getting Dask to perform well can be tricky. If you run into performance issues, whether with runtime or memory usage, break up your problem into stages and make sure each stage is working correctly before moving to the next.
```{note}
These general recommendations are for performing reasonably straightforward operations on NetCDF files. Different types of analyses may show different behaviour; if in doubt, profile!
```
## Cluster Size
Generally your Dask cluster size should be kept reasonably small, say around 8 CPUs, if doing basic operations on NetCDF data. Due to Amdahl's law, more CPUs may not give you that much more performance. Profile your code by running it on a subset of the full dataset with different CPU counts.
Prefer using processes (via `dask.distributed.Client`) to threads. Accessing a single NetCDF file within a single process is a serial operation: the NetCDF library locks the file, preventing more than one thread from reading it at a time. Different processes, however, can read from the same file without issue (but not write to it).
*Graph of scaling with different cluster / thread sizes*
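As a rough sketch, a process-based cluster can be started like the following - the worker count and memory limit are illustrative, adjust them for your machine:

```python
# A minimal sketch of a process-based local cluster; the numbers are
# illustrative, not a recommendation for every machine
from dask.distributed import Client

client = Client(
    n_workers=8,           # separate processes, so NetCDF reads don't serialise on the file lock
    threads_per_worker=1,  # keep each process single-threaded
    memory_limit="4GB",    # roughly the memory available per CPU
)
```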
(distributed dashboard)=
## Distributed Dashboard
The dashboard that comes up when you start a cluster in Jupyter can give a lot of useful information:

- Is the memory use fairly stable, or does it keep increasing? Break up your problem to identify the operation that's using all the memory, perhaps by using intermediate save & loads, or look at your initial chunking.
- Are there a lot of red boxes in the timeline? That means Dask is spending a lot of time shuffling data around rather than doing useful work.
- Does it take ages for the dashboard to start showing anything running? Your task graph may have grown too large.
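If you've lost track of the dashboard link, it's available from the client object - a sketch (the worker numbers and port are just examples):

```python
from dask.distributed import Client

client = Client(n_workers=4, threads_per_worker=1)

# The client knows where its dashboard lives; in Jupyter the client's
# HTML repr also shows this as a clickable link
print(client.dashboard_link)  # e.g. http://127.0.0.1:8787/status
```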
## Choice of chunking
The specific chunking you're using can have a big effect on performance. Look at the chunking when you first open a dataset - this is what matters most for memory use. Note however that too small a chunk size can also create problems, by making the task graph too large.
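The chunking is visible as soon as you open the file - a sketch, where the file name, variable name and chunk sizes are placeholders:

```python
import xarray

# Open lazily with Dask chunks and inspect the layout
ds = xarray.open_dataset('data.nc', chunks={'time': 12})
print(ds['temp'].chunks)  # chunk sizes along each dimension
```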
### Chunk size
Aim for a chunk size that is well below the amount of memory available per CPU. On Gadi there is 4 GB of memory per CPU on the general purpose compute nodes, so aim for a chunk size of less than 200 MB. This lets the computer load multiple chunks at once and do useful work with them - if a single chunk is close to the memory limit then you can't load more than one of them to, say, add them together.
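A quick way to estimate the in-memory size of a chunk is to multiply the chunk shape by the bytes per value - a sketch, with placeholder file and variable names:

```python
import numpy
import xarray

ds = xarray.open_dataset('data.nc', chunks={'time': 12})
var = ds['temp']

# Size of the first chunk along each dimension, times the bytes per value
chunk_shape = [sizes[0] for sizes in var.chunks]
chunk_bytes = numpy.prod(chunk_shape) * var.dtype.itemsize
print(f"~{chunk_bytes / 2**20:.0f} MB per chunk")  # aim for well under the per-CPU memory
```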
### Chunk shape
If you're going to be filtering out data - say selecting a single level or timestep - then aim for a chunking that makes that easy to do: use a small chunk size in that dimension.

Generally, files are laid out so that nearby grid points are close to each other in the file, while consecutive time points are further apart. Loading nearby points from a file is faster, so aim to have a larger horizontal chunk size and a smaller time chunk size.
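As a sketch of that idea, a chunking like the following keeps single-timestep selections cheap (the dimension and variable names are placeholders and depend on your file):

```python
import xarray

# Small chunks along time, whole-array chunks in the horizontal
ds = xarray.open_dataset('data.nc', chunks={'time': 1, 'lat': -1, 'lon': -1})

# Selecting one timestep now only touches the chunks for that timestep
snapshot = ds['temp'].isel(time=0)
```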
### NetCDF file chunking
NetCDF files can contain their own internal chunks, which are used for compression and faster data access. In this case loading data within the same chunk is faster, so aim for your Dask chunk size to be a multiple of the NetCDF chunk size.
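The on-disk chunking is recorded by xarray in a variable's encoding (it also appears as `_ChunkSizes` in the output of `ncdump -hs file.nc`) - a sketch with placeholder names:

```python
import xarray

ds = xarray.open_dataset('data.nc')
# e.g. (1, 10, 100, 100); None or missing usually means no internal chunking
print(ds['temp'].encoding.get('chunksizes'))
```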
## Task Graph
As you build up operations on your data, or if you have a great number of small chunks, Dask's task graph can become difficult for it to manage. Normally this manifests as Dask taking a long time to start running once you execute a Jupyter cell.
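One rough way to see how big the graph has become is to count its tasks - a sketch, assuming `data` is a Dask-backed xarray object:

```python
# Every task Dask needs to run to produce `data`; if this is in the
# millions, expect long pauses before any work starts
print(len(data.__dask_graph__()))
```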
### Intermediate save & loads
To cut down on the graph size it can be helpful to reset everything by saving your current progress to a file and then re-opening it. This can also be helpful if you're going to loop over one of the dimensions (say to make an animated plot over time) - otherwise Dask can end up re-calculating everything on every loop iteration.
There's nothing tricky here - just mind the file sizes and clean up the temporary file when you're done:
```python
import xarray

# Write the current state of the calculation out to a temporary file...
data.to_netcdf('tmp.nc')

# ...then re-open it, with a fresh (much smaller) task graph
data = xarray.open_dataset('tmp.nc', chunks={...})
```
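After the re-open, looping over a dimension only reads slices from the temporary file rather than repeating the whole calculation - a sketch, assuming a `time` dimension:

```python
# Each iteration now just reads a slice from 'tmp.nc', rather than
# re-running the full set of operations that produced `data`
for t in range(data.sizes['time']):
    frame = data.isel(time=t).load()
    # ... plot or otherwise use `frame` ...
```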