{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Monthly Climatology of Many Variables" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "<Client>\n", "Cluster: Workers: 4, Cores: 4, Memory: 17.18 GB" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray\n", "import pandas\n", "import climtas.nci\n", "\n", "climtas.nci.GadiClient()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our source dataset is a collection of model outputs. There are a bit over 500 files in total, and each file holds close to 200 variables. We'd like to compute a monthly climatology of this dataset for every variable present." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "path = '/scratch/y99/dd7103/PORT_Apr22/Base_for_PORT_conApr21/Base_for_PORT_conApr21.cam.h1.*.nc'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "529\n" ] } ], "source": [ "! ls {path} | wc -l" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Data variables: (12/186)\n", "    hyam        (lev) float64 ...\n", "    hybm        (lev) float64 ...\n", "    hyai        (ilev) float64 ...\n", "    hybi        (ilev) float64 ...\n", "    P0          float64 ...\n", "    date        (time) int32 ...\n", "    ...          ...\n", "    rad_temp    (time, lev, lat, lon) float64 ...\n", "    rad_ts      (time, lat, lon) float64 ...\n", "    rad_watice  (time, lev, lat, lon) float64 ...\n", "    rad_watliq  (time, lev, lat, lon) float64 ...\n", "    rad_watvap  (time, lev, lat, lon) float64 ...\n", "    rad_zint    (time, ilev, lat, lon) float64 ..." ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = xarray.open_dataset('/scratch/y99/dd7103/PORT_Apr22/Base_for_PORT_conApr21/Base_for_PORT_conApr21.cam.h1.0020-01-01-00000.nc')\n", "ds.data_vars" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since there are a large number of files we'll use all the tricks when opening them. 
`parallel` lets multiple files be read at once; `join='override'` and `compat='override'` take the coordinates and variables that lack a time dimension from the first file rather than comparing them across every file; and `coords='minimal'` and `data_vars='minimal'` make sure variables are only concatenated if they have a time dimension." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ds = xarray.open_mfdataset(\n", "    path,\n", "    combine='nested',\n", "    concat_dim='time',\n", "    parallel=True,\n", "    data_vars='minimal',\n", "    join='override',\n", "    coords='minimal',\n", "    compat='override')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing all the variables at once is certainly possible, but it would result in a giant Dask task graph. To help Dask along as much as possible we can evaluate the variables one at a time, saving each to its own file, then join them back together afterwards as a post-processing step if needed." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "co2vmr\n" ] } ], "source": [ "# Loop over all the variables\n", "for name, var in ds.data_vars.items():\n", "    # If the variable has a time axis and is a floating point value\n", "    if 'time' in var.dims and var.dtype in ['float32', 'float64']:\n", "        print(name)\n", "\n", "        # Do the mean and save to file\n", "        clim = var.groupby('time.month').mean('time')\n", "\n", "        # Copy any attributes\n", "        clim.attrs = var.attrs\n", "\n", "        clim.to_netcdf(f'clim_{name}.nc')\n", "\n", "        # Stop after one variable for testing\n", "        break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking the output shows the expected climatology." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<xarray.Dataset>\n", "Dimensions:  (month: 12)\n", "Coordinates:\n", "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12\n", "Data variables:\n", "    co2vmr   (month) float64 0.0002848 0.0002852 ... 0.0002849 0.000285" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xarray.open_dataset('clim_co2vmr.nc')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch Job\n", "\n", "When doing the full analysis we should use a batch job rather than Jupyter. To do this, reuse the calculations from the notebook, wrapping them in an `if __name__ == '__main__'` check so that Dask starts up properly.\n", "\n", "```python\n", "import xarray\n", "import climtas.nci\n", "\n", "if __name__ == '__main__':\n", "    # Start a Dask client using the resources from qsub\n", "    climtas.nci.GadiClient()\n", "\n", "    # Open the files\n", "    path = '/scratch/y99/dd7103/PORT_Apr22/Base_for_PORT_conApr21/Base_for_PORT_conApr21.cam.h1.00*.nc'\n", "    ds = xarray.open_mfdataset(path, combine='nested', concat_dim='time', parallel=True,\n", "                               data_vars='minimal', join='override', coords='minimal', compat='override')\n", "\n", "    # Loop over all the variables\n", "    for name, var in ds.data_vars.items():\n", "        # If the variable has a time axis and is a floating point value\n", "        if 'time' in var.dims and var.dtype in ['float32', 'float64']:\n", "            # Do the mean and save to file\n", "            clim = var.groupby('time.month').mean('time')\n", "\n", "            # Copy any attributes\n", "            clim.attrs = var.attrs\n", "\n", "            clim.to_netcdf(f'clim_{name}.nc')\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:analysis3] *", "language": "python", "name": "conda-env-analysis3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3",
"version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }