{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Measuring Performance\n", "\n", "Here are some examples of how to find the optimal worker count and chunk size for different Dask operations. See the scripts in [the GitHub repository](https://github.com/coecms-training/parallel/tree/main/case-studies/read_speed) for samples of how these were measured.\n", "\n", ":::{note}\n", "The first time a file is read can be quite slow if it's not in cache. When running benchmarks, make sure to load the file fully once (e.g. with `data.mean().load()`) before doing any timing.\n", ":::" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "import numpy\n", "import dask\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ERA5 single level one year mean\n", "\n", "Let's look at the performance of reading a year of ERA5 single-level data. This data comes from compressed netCDF files, so the time spent decompressing will affect our measurements.\n", "\n", "I've run the following for various chunk sizes and Dask cluster sizes:\n", "```python\n", "import time\n", "import xarray\n", "\n", "# Example values only: the benchmark loops over several chunk shapes; the name t2m is assumed\n", "chunks = {\"time\": 93, \"latitude\": 91, \"longitude\": 180}\n", "variable = \"t2m\"\n", "\n", "path = \"/g/data/rt52/era5/single-levels/reanalysis/2t/2001/2t_era5_oper_sfc_*.nc\"\n", "with xarray.open_mfdataset(\n", "    path, combine=\"nested\", concat_dim=\"time\", chunks=chunks\n", ") as ds:\n", "    var = ds[variable]\n", "\n", "    start = time.perf_counter()\n", "    mean = var.mean().load()\n", "    duration = time.perf_counter() - start\n", "```\n", "and saved the results to a netCDF file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
|   | duration | data_size | chunk_size | time | latitude | longitude | workers | threads |
|---|---|---|---|---|---|---|---|---|
| 0 | 38.058817 | 36379929600 | 194987520 | NaN | 182 | 360 | 4 | 1 |
| 1 | 38.475415 | 36379929600 | 97493760 | NaN | 182 | 180 | 4 | 1 |
| 2 | 38.962101 | 36379929600 | 48746880 | NaN | 91 | 180 | 4 | 1 |
| 3 | 39.011292 | 36379929600 | 97493760 | NaN | 91 | 360 | 4 | 1 |
| 4 | 50.406360 | 36379929600 | 6093360 | 93.0 | 91 | 180 | 4 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 17.296662 | 36379929600 | 6093360 | 93.0 | 91 | 180 | 16 | 1 |
| 113 | 18.460571 | 36379929600 | 97493760 | NaN | 182 | 180 | 16 | 1 |
| 114 | 18.579301 | 36379929600 | 97493760 | NaN | 91 | 360 | 16 | 1 |
| 115 | 20.263248 | 36379929600 | 194987520 | NaN | 182 | 360 | 16 | 1 |
| 116 | 20.702859 | 36379929600 | 389975040 | NaN | 182 | 720 | 16 | 1 |

117 rows × 8 columns
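
To judge how close a run comes to ideal scaling, it can help to convert durations into speedup and parallel efficiency relative to the single-worker run. The block below is a minimal sketch. It assumes the benchmark results have been loaded into a pandas DataFrame with `workers` and `duration` columns, as in the table above. The durations are copied from the per-worker results for 48746880-byte chunks (91 × 180 spatial chunks) and are not recomputed here.

```python
import pandas

# Durations in seconds, copied from the per-worker results for 48746880-byte chunks.
# In practice you would filter the full results DataFrame instead of typing these in.
results = pandas.DataFrame({
    'workers': [1, 2, 4, 8, 16, 32, 48],
    'duration': [192.943019, 86.172158, 39.269419, 21.874852,
                 13.159739, 7.959964, 6.344361],
})

# Speedup relative to the single-worker run
baseline = results.loc[results['workers'] == 1, 'duration'].iloc[0]
results['speedup'] = baseline / results['duration']

# Parallel efficiency: 1.0 means perfect linear scaling
results['efficiency'] = results['speedup'] / results['workers']

print(results.round(2))
```

With these numbers the efficiency stays above 1 up to 8 workers and drops to about 0.63 at 48 workers. Adding workers keeps reducing the runtime, but with diminishing returns.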

|   | workers | chunk_size | threads | duration | data_size | time | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| workers |   |   |   |   |   |   |   |   |
| 1 | 1 | 48746880 | 1 | 192.943019 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 2 | 2 | 48746880 | 1 | 86.172158 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 4 | 4 | 48746880 | 1 | 39.269419 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 8 | 8 | 48746880 | 1 | 21.874852 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 16 | 16 | 48746880 | 1 | 13.159739 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 32 | 32 | 48746880 | 1 | 7.959964 | 3.637993e+10 | NaN | 91.0 | 180.0 |
| 48 | 48 | 48746880 | 1 | 6.344361 | 3.637993e+10 | NaN | 91.0 | 180.0 |