{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading Ensemble Members\n", "\n", "The C20C dataset contains files split by both year and ensemble member. Let's load them with Xarray and create an ensemble mean." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/g/data/ua8/C20C/v3/member_monthly/TMPS/1850/TMPS.1850.mnmean_mem001.nc\n", "/g/data/ua8/C20C/v3/member_monthly/TMPS/1850/TMPS.1850.mnmean_mem002.nc\n", "/g/data/ua8/C20C/v3/member_monthly/TMPS/1850/TMPS.1850.mnmean_mem003.nc\n", "/g/data/ua8/C20C/v3/member_monthly/TMPS/1850/TMPS.1850.mnmean_mem004.nc\n", "ls: write error: Broken pipe\n" ] } ], "source": [ "ls /g/data/ua8/C20C/v3/member_monthly/TMPS/*/*_mem*.nc | head -4" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
Client\n",
"Cluster\n",
"Workers: 2\n",
"Cores: 2\n",
"Memory: 8.59 GB\n
" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray\n", "from tqdm.auto import tqdm\n", "import climtas.nci\n", "\n", "climtas.nci.GadiClient()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use a loop to load each member individually, storing each member's dataset in the list `dss`. `tqdm()` here just adds a progress bar, so we can see where the load has gotten to.\n", "\n", "One of the files in this dataset has latitude values slightly different from the others; inspecting the file shows they are simply stored at a higher precision. I've used `join='override'` to take the _lat_ and _lon_ coordinate values from the first file." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3c15d0d3a2854528b99c3fa8e2ed2c2f", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/80 [00:00\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'TMP' (member: 80, time: 1992, lat: 256, lon: 512)>\n",
       "dask.array<concatenate, shape=(80, 1992, 256, 512), dtype=float32, chunksize=(1, 12, 256, 512), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 1850-01-16T10:30:00 ... 2015-12-16T10:30:00\n",
       "  * lon      (lon) float32 0.0 0.703 1.406 2.109 ... 357.1 357.8 358.5 359.2\n",
       "  * lat      (lat) float32 89.46 88.77 88.07 87.37 ... -88.07 -88.77 -89.46\n",
       "Dimensions without coordinates: member\n",
       "Attributes:\n",
       "    standard_name:       air_temperature\n",
       "    long_name:           Temperature\n",
       "    units:               K\n",
       "    param:               0.0.0\n",
       "    realization:         1\n",
       "    ensemble_members:    10\n",
       "    forecast_init_type:  3\n",
       "    original_name:       t
" ], "text/plain": [ "<xarray.DataArray 'TMP' (member: 80, time: 1992, lat: 256, lon: 512)>\n", "dask.array<concatenate, shape=(80, 1992, 256, 512), dtype=float32, chunksize=(1, 12, 256, 512), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * time (time) datetime64[ns] 1850-01-16T10:30:00 ... 2015-12-16T10:30:00\n", " * lon (lon) float32 0.0 0.703 1.406 2.109 ... 357.1 357.8 358.5 359.2\n", " * lat (lat) float32 89.46 88.77 88.07 87.37 ... -88.07 -88.77 -89.46\n", "Dimensions without coordinates: member\n", "Attributes:\n", " standard_name: air_temperature\n", " long_name: Temperature\n", " units: K\n", " param: 0.0.0\n", " realization: 1\n", " ensemble_members: 10\n", " forecast_init_type: 3\n", " original_name: t" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.TMP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The large number of tasks means that Dask can take a while to set everything up before it starts to process the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another method, which can cut down on the number of files open at once and so reduce the number of tasks, is to work on the files a group at a time - say, creating the mean for each decade in a separate file. A function lets us repeat the same operation for each group."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def mean_decade(year):\n", " dss = []\n", " for mem in range(1,81):\n", "\n", " path = f'/g/data/ua8/C20C/v3/member_monthly/TMPS/{year//10}?/TMPS.*.mnmean_mem{mem:03d}.nc'\n", " ds = xarray.open_mfdataset(\n", " path,\n", " combine='nested',\n", " concat_dim='time',\n", " join='override',\n", " coords='minimal',\n", " parallel=True,\n", " )\n", " dss.append(ds)\n", " \n", " ds = xarray.concat(dss, dim='member')\n", " \n", " ds.TMP.mean('member').to_netcdf(f'/scratch/w35/saw562/C20C_TMP_memmean_{year}.nc')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b0a03fc8b4904583963d81e1cfa475eb", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/17 [00:00
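The loading pattern used above (open each member, `xarray.concat` along a new `member` dimension, then `.mean('member')`) can be sketched with small in-memory datasets. The two synthetic datasets below are hypothetical stand-ins for the `TMPS.*.mnmean_mem*.nc` files, which aren't available here; their sizes and values are invented for illustration.

```python
import numpy as np
import xarray

# Stand-ins for the per-member files (values invented for illustration)
members = []
for mem in range(2):
    ds = xarray.Dataset(
        {"TMP": (("time", "lat"), np.full((3, 4), 270.0 + mem))},
        coords={"time": np.arange(3), "lat": np.linspace(-60.0, 60.0, 4)},
    )
    members.append(ds)

# Stack along a new 'member' dimension; join='override' takes the
# lat/lon values from the first dataset if they differ slightly
ens = xarray.concat(members, dim="member", join="override")

# Reduce over the new dimension to form the ensemble mean
ens_mean = ens.TMP.mean("member")

print(ens.TMP.shape)          # (2, 3, 4)
print(float(ens_mean[0, 0]))  # 270.5
```

The same reduction works unchanged on the dask-backed arrays produced by `open_mfdataset`; only the data source differs.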