{ "cells": [ { "cell_type": "markdown", "id": "optional-abraham", "metadata": {}, "source": [ "# Introduction\n", "\n", "The Dask library provides parallel versions of many operations available in NumPy and pandas. It does this by breaking up an array into chunks that can be processed independently.\n", "\n", "The full documentation for the Dask library is available at https://docs.dask.org/en/latest/" ] }, { "cell_type": "code", "execution_count": 1, "id": "applicable-bunny", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 800 B 200 B
Shape (10, 10) (5, 5)
Count 4 Tasks 4 Chunks
Type float64 numpy.ndarray
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " 10\n", " 10\n", "\n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask.array\n", "\n", "dask.array.ones((10,10), chunks=(5,5))" ] }, { "cell_type": "markdown", "id": "aging-lightweight", "metadata": {}, "source": [ "Here I've made a 10x10 array that's made up of four 5x5 chunks.\n", "\n", "Dask will run operations on different chunks in parallel. It also evaluates arrays lazily: until you explicitly load the data (e.g. by using `.compute()`, or by plotting or saving the data), it only builds up a graph of operations. That graph is run later, when the actual values are needed, and only for the chunks that are required." ] }, { "cell_type": "code", "execution_count": 2, "id": "hybrid-pakistan", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = dask.array.ones((10,10), chunks=(5,5))\n", "\n", "(data + 5).visualize()" ] }, { "cell_type": "code", "execution_count": 3, "id": "altered-liberal", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(data + 5)[4,6].compute()" ] }, { "cell_type": "markdown", "id": "driving-wonder", "metadata": {}, "source": [ "Since I've only asked for a single element of the array, only the one chunk containing it needs to be calculated. That matters little for this tiny example, but it becomes important when you've got a huge dataset and only need a subset of it."
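,
"\n",
"\n",
"As a minimal sketch of that laziness (the names `lazy`, `element` and `block_index` are illustrative only, not part of Dask), the following rebuilds the same expression, works out which block holds element `[4, 6]`, and only triggers any arithmetic at the final `.compute()`:\n",
"\n",
"```python\n",
"import dask.array\n",
"\n",
"data = dask.array.ones((10, 10), chunks=(5, 5))\n",
"lazy = data + 5           # builds a task graph; no values are computed yet\n",
"element = lazy[4, 6]      # still lazy: a zero-dimensional Dask array\n",
"\n",
"# Chunk sizes along each axis, and the block that holds element [4, 6]\n",
"chunk_rows, chunk_cols = lazy.chunks[0][0], lazy.chunks[1][0]\n",
"block_index = (4 // chunk_rows, 6 // chunk_cols)\n",
"\n",
"value = float(element.compute())   # the arithmetic runs only here\n",
"print(lazy.numblocks, block_index, value)\n",
"```\n",
"\n",
"This should print `(2, 2) (0, 1) 6.0`: the array is split into 2x2 blocks, and the requested element sits in block `(0, 1)`."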
] }, { "cell_type": "markdown", "id": "quick-clerk", "metadata": {}, "source": [ "## Creating Dask arrays manually\n", "\n", "Something to keep in mind with Dask arrays is that you can't directly write to an array element:" ] }, { "cell_type": "code", "execution_count": 4, "id": "advisory-privacy", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ERROR Item assignment with not supported\n" ] } ], "source": [ "try:\n", "    data[4, 6] = 3\n", "except Exception as e:\n", "    print('ERROR', e)" ] }, { "cell_type": "markdown", "id": "foreign-advocate", "metadata": {}, "source": [ "This keeps us from trying to do calculations with loops, which Dask can't parallelise and which are really inefficient in Python. Instead you should use whole-array operations.\n", "\n", "Say we want to calculate the area of grid cells, knowing the latitude and longitude of grid point centres.\n", "\n", "For each grid cell, with longitude $\\phi$ and latitude $\\theta$ in radians and $R$ the radius of the Earth, we need to calculate\n", "\n", "$$A = R^2\\int_{\\phi_0}^{\\phi_1}\\int_{\\theta_0}^{\\theta_1}\\cos\\theta \\, d\\theta \\, d\\phi = R^2(\\phi_1 - \\phi_0)(\\sin \\theta_1 - \\sin \\theta_0)$$\n", "\n", "Rather than doing this in a loop over each grid point, we can use numpy array operations, computing the width and height of each cell and then performing an outer vector product to create a 2d array:" ] }, { "cell_type": "code", "execution_count": 5, "id": "fewer-hello", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 1.20 MB 80.00 kB
Shape (300, 500) (100, 100)
Count 90 Tasks 15 Chunks
Type float64 numpy.ndarray
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " 500\n", " 300\n", "\n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy\n", "from numpy import deg2rad\n", "\n", "# Grid centres\n", "lon = dask.array.linspace(0, 360, num=500, endpoint=False, chunks=100)\n", "lat = dask.array.linspace(-90, 90, num=300, endpoint=True, chunks=100)\n", "\n", "# Grid spacing\n", "dlon = (lon[1] - lon[0]).compute()\n", "dlat = (lat[1] - lat[0]).compute()\n", "\n", "# Grid edges in radians\n", "lon0 = deg2rad(lon - dlon/2)\n", "lon1 = deg2rad(lon + dlon/2)\n", "lat0 = deg2rad((lat - dlat/2).clip(-90, 90))\n", "lat1 = deg2rad((lat + dlat/2).clip(-90, 90))\n", "\n", "# Compute cell dimensions\n", "cell_width = lon1 - lon0\n", "cell_height = numpy.sin(lat1) - numpy.sin(lat0)\n", "\n", "# Area\n", "R_earth = 6_371_000\n", "A = R_earth**2 * numpy.outer(cell_height, cell_width)\n", "A" ] }, { "cell_type": "markdown", "id": "affecting-arbitration", "metadata": {}, "source": [ "You can see that even though only numpy operations were used we still ended up with a Dask array at the end. 
An array much larger than would fit in memory can be created this way, with the values only needing to be computed if they're actually used.\n", "\n", "Plotting Dask arrays works just like numpy arrays:" ] }, { "cell_type": "code", "execution_count": 6, "id": "olympic-chase", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAEDCAYAAACWDNcwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAYQUlEQVR4nO3df6yc1X3n8fdnrq8NBaJADMRgd82mpCtCG2gtulpWK6dJEzZFpdmKiKhNqRat+wdpEjVRsVNp06ay1lm1tJGSVr0tKEYNEG8ShIWyIYYtYpEAx06BYBwaN3hTxxau+bGYVTD3zv3sH89jMoH7Y67v3DPP3OfzkkYzc+bMM+eg5HuPv+fHI9tEREQZnWE3ICKiTRJ0IyIKStCNiCgoQTcioqAE3YiIghJ0IyIKStCNiGVJ0q2Sjkp6so+6/0rS/ZKekPSApLVL1a4E3YhYrr4IXNVn3T8FbrP988BngP+2VI1K0I2IZcn2g8DzvWWS3ibpG5L2Svrfkv5N/dElwP31678HrlmqdiXoRkSbTAC/Z/sXgU8Cf1mXPw78Rv36A8BZkt6yFA1YsRQXjYhoGklnAv8O+B+SThavqp8/CXxe0u8ADwI/BKaWoh0JuhHRFh3gRduXvf4D24eB/wSvBeffsP1/l6oRERHLnu2XgGckXQugyjvr16slnYyHW4Bbl6od8wZdSadJ2i3pcUn7JP1xXX6OpF2Svlc/n93znS2SDkh6WtL7lqrxERGzkXQH8DDws5IOSboB+E3gBkmPA/v48YTZRuBpSf8InA9sXbJ2zXe0o6rkxxm2X5Y0DjwEfIxqKP687W2SNgNn275J0iXAHcAVwAXAfcDbbXeXqhMREaNi3pGuKy/Xb8frh6n+Qmyvy7cDv16/vga40/YJ288AB6gCcERE6/U1kSZpDNgL/AzwBduPSjrf9hEA20cknVdXvxB4pOfrh+qy119zE7AJYIyxX/wp3nTqvYiI1jjOC8dsn7uYa7zvXWf4uef7+8f33idO3Gu7300W8+or6NapgcskvRm4S9Klc1TXDGVvyGHYnqBaM8ebdI5/Se/upykR0XL3+Sv/Z7HXeO75Lrvv/em+6o6t+d7qxf5erwUtGbP9oqQHqLbWPStpTT3KXQMcrasdAtb1fG0tcHiu60qis+q0hTQlItrqlcVfwsA004u/0CmYN+hKOheYrAPu6cB7gM8CO4HrgW318931V3YCt0u6mWoi7WJg9zw/AmNZvRYRZRgzOaS5/X5GumuA7XVetwPssH2PpIeBHfUyjB8A1wLY3idpB/AU1Y6OG+ddudARndNPX0Q3IqI1/t9gLtPYka7tJ4DLZyh/DpgxEWt7Kwta55aRbkSUY0x3SHdCb8Y2YAlWNKMpEdEO02+c3y+iGZFOgpUrh92KiGgJA912B11gxdiwWxERLZKRboJuRBRiYLLtOV2vbEZTImL5M056wSuyeiEiCjF0hxNzmxF0LTGdkW5EFFLtSBscSQeB40AXmLK9Yba6zYh0Aq+Y6ciGiIilILozHhOzKO+yfWy+Ss0IuoA7CboRUUY
1kTacmNOIoOuOmF6V1QsRUUa1TrfvoLta0p6e9xP1KYmvv+Q3JRn46xk+f00jgi5kpBsRZU33P9I9NleOtnal7cP1ueK7JH3X9oMzVWxG0E1ONyIKWuBId/7rVXcTxvZRSXdR3S2nuUHXEt3xBN2IKMOI7oBuhi7pDKBj+3j9+r3AZ2ar34igi8BjCboRUc4C0gvzOZ/qjjpQxdTbbX9jtsqNCbrdnHcTEYUY8aoHM3lv+/vAO/ut34igazKRFhHlVJsjhrMLthFBt0ovDLsREdEmS7A5oi+NCbqZSIuIUmzRdYtHuiYj3Ygoa7rtI90h/dGJiBaqJtKGE/4aE3Snx4fdiIhoi0ykkfRCRJTVbfWBN4LpRrQkItpgkDvSFqoZoS453YgobLrNqxcg6YWIKKc68KbFQdeZSIuIgoyYHNJIrxFBN+mFiCjJprmbIyStA24D3kp1L7cJ25+T9EfAfwH+pa76Kdtfr7+zBbiB6iZtH7V973y/M6QlcxHRSmr05ogp4BO2vy3pLGCvpF31Z39u+097K0u6BLgOeAdwAXCfpLfb7s76CxnpRkRBpsEjXdtHgCP16+OS9gMXzvGVa4A7bZ8AnpF0gOoU9Yfn/J2xId2EPiJaaSQm0iStBy4HHgWuBD4i6beBPVSj4ReoAvIjPV87xNxBOut0I6Ioo0EeYr4gfYc6SWcCXwU+bvslSX8F/AnVSP1PgD8D/jPMmCh5wzBW0iZgE8DY2WdDRroRUUh1C/YGn70gaZwq4H7J9tcAbD/b8/nfAPfUbw8B63q+vhY4/Ppr1rcongBY9dPrPKQ/OhHRSmruebqqbvxzC7Df9s095WvqfC/AB4An69c7gdsl3Uw1kXYxsHvuHwGPZ6QbEWWYZu9IuxL4MPAdSY/VZZ8CPiTpMqr2HwR+F8D2Pkk7gKeoVj7cOOfKBagu0UnQjYhyGjvStf0QM+dpvz7Hd7YCW/tuhcArEnQjogxbjR7pLj2BMpEWEYVUE2lt3gYMoATdiCil5fdIQ6azYnrYrYiIlqgm0hqa0y1F2QYcEQWNxI60pSJBZywj3YgoYyR2pC0pwViCbkQU1OobUwrT6SToRkQZNkxOtznoCsZXzLN/IiJiQKr0QouDLpix7EiLiIIauyOtBAFjSnohIspo/ZIxyaxMeiEiihlsekHSGNW54j+0ffVcdRsSdGEsE2kRUdCA75H2MWA/8Kb5KjYj6GJWjU0NuxkR0RLV6oXBnL0gaS3wq1SHfP3+fPWbEXQFKzLSjYhCBrw54i+APwDO6qdyM4IuZkUm0iKioAWkF1ZL2tPzfqK+8w2SrgaO2t4raWM/F2tE0O0ApyW9EBGFLHD1wjHbG2b57Erg1yS9HzgNeJOkv7P9W7NdrBFBF5lOjnaMiIIGsXrB9hZgC0A90v3kXAEXGhJ0BazoZMlYRJRhi6k270jryJw+NjnsZkREiwx6c4TtB4AH5qvXiKAL0CHphYgoo/U70jpMc/rYq8NuRkS0SKuDrgTjWacbEYXkEHOq0W5ERCkD3gbct0YE3Q5mVSfrdCOiDBumWn2IOWZcWTIWEeW0Or0gmfGMdCOikNbndIU5TQm6EVGO2x10oZMDbyKioMZOpElaB9wGvBWYpjph53OSzgG+DKwHDgIftP1C/Z0twA1AF/io7Xvn/A3MacqOtIgow252TncK+ITtb0s6C9graRfwO8D9trdJ2gxsBm6SdAlwHfAO4ALgPklvtz3rTFnukRYRZYluU1cv2D4CHKlfH5e0H7gQuAbYWFfbTrXn+Ka6/E7bJ4BnJB0ArgAenu03pKxeiIiyRiKnK2k9cDnwKHB+HZCxfUTSeXW1C4FHer52qC57/bU2AZsAVl+wMumFiChmJM5ekHQm8FXg47ZfkmZt8EwfvOE0m/rk9QmAt/3cGc5EWkQU4yqvOwx9BV1J41QB90u2v1YXPytpTT3KXQMcrcsPAet6vr4WODzn9TFj2QYcEQU1efWCgFuA/bZ
v7vloJ3A9sK1+vrun/HZJN1NNpF0M7J7zNzCndZJeiIgy3OSJNKp7AH0Y+I6kx+qyT1EF2x2SbgB+AFwLYHufpB3AU1QrH26ca+UC1KsXMtKNiIIam16w/RAz52kB3j3Ld7ZS3QO+L1m9EBGljcTqhaUiYJwE3Ygow2550AVnc0REFNX4JWNLScB4gm5EFNTYnG4JWTIWESUZMd3g1QtLLiPdiChtWPcfb0zQHcst2COilEykJehGRGFtz+mOK0E3Ispp/Ug3IqIUA9PTLQ66AoYzjxgRrWSgzSNdSayc/ajIiIiBa/U6XchINyIKa3PQre6RlpFuRJSiTKR1hnSgcES0VJtHuhERRRk8oNULkk4DHgRWUcXUr9j+9Gz1GxF0q6Mdk9WNiJIG9q/rE8Av2365vrXZQ5L+p+1HZqrciKAbEVHcgNILtg28XL8drx+zXr0RQVcoE2kRUVb/QXe1pD097yfqu5m/RtIYsBf4GeALth+d7WKNCLoAnaQXIqKUhW2OOGZ7w5yXq+4DeZmkNwN3SbrU9pMz1U2ki4hWqm7ZM/9jYdf0i8ADwFWz1WnESLfaBpz0QkQUNLjVC+cCk7ZflHQ68B7gs7PVb0TQjYgobYAHG64Bttd53Q6ww/Y9s1VuSNAVY0qmIyIKMYNcvfAEcHm/9RsSdCMiSlK7TxmLiCiuzduAq4m0pBcioqAh3Qu3EUE3IqKoIR5iPu/wUtKtko5KerKn7I8k/VDSY/Xj/T2fbZF0QNLTkt63VA2PiFgMub/HoPXzb/ovMvNC3z+3fVn9+DqApEuA64B31N/5y3oZRUREs7jPx4DNG3RtPwg83+f1rgHutH3C9jPAAeCKRbQvImJZWczs1UckPVGnH86uyy4E/rmnzqG67A0kbZK0R9Kef3muu4hmREQsXJPTCzP5K+BtwGXAEeDP6vKZMtMzNtv2hO0Ntjec+5ZkICKiIFNtA+7nMWCnFHRtP2u7a3sa+Bt+nEI4BKzrqboWOLy4JkZELIGm5nRnImlNz9sPACdXNuwErpO0StJFwMXA7sU1MSJi8IaVXph3na6kO4CNVAf5HgI+DWyUdBnV34GDwO8C2N4naQfwFDAF3FifMxkR0SxN3ZFm+0MzFN8yR/2twNbFNCoiYsk1NeiWYGB6WHvyIqJ1lip10I9GBN2IiOKWYGVCPxJ0I6KVWj7SNV0nvRARBbU76EZEFNT2nG41kTak/wIR0U5tDroREaWp7YeYZ8lYRLRBI4KuMV0nvRARBSW9EBFRSNsn0iATaRFRWJuDroFugm5ElNTmoAsZ6UZEOaLlqxcMmUiLiHKS0yULxiKirDYHXePkdCOirFYHXcNkYm5EFNT69EJERFFtDrpGTHo4BwpHRAu55asXALok6EZEQQMa6UpaB9wGvJVqTcCE7c/NVr8RQbfaHJGgGxHlDDCnOwV8wva3JZ0F7JW0y/ZTM1VuTNCddGfYzYiINhlQ0LV9BDhSvz4uaT9wIdDkoCu6JOhGRCFmIUF3taQ9Pe8nbE/MVFHSeuBy4NHZLtaIoAswnYm0iChELCi9cMz2hnmvKZ0JfBX4uO2XZqvXiKBrxKseG3YzIqJFBrlOV9I4VcD9ku2vzVW3IUEXJknQjYiCBrd6QcAtwH7bN89XvxlB12IyI92IKGlwI90rgQ8D35H0WF32Kdtfn6nyvEFX0q3A1cBR25fWZecAXwbWAweBD9p+of5sC3AD0AU+avve+X6jWjKWibSIKGSAp4zZfgj6X/Paz0j3i8DnqRb/nrQZuN/2Nkmb6/c3SboEuA54B3ABcJ+kt9vuzv0TopslYxFRUlO3Adt+sF4G0esaYGP9ejvwAHBTXX6n7RPAM5IOAFcAD8/1G9OIVzy+oIZHRCzGqG0DPr9eEIztI5LOq8svBB7pqXeoLnsDSZuATQCrL1jJdEa6EVHQcjllbKa8xoxdqxcXTwBc9HNnOkvGIqKYhW2OGKhTDbrPSlpTj3LXAEfr8kP
Aup56a4HD813MFieSXoiIkkYs6O4Erge21c9395TfLulmqom0i4Hd812sukda0gsRUcYCd6QNVD9Lxu6gmjRbLekQ8GmqYLtD0g3AD4BrAWzvk7SD6qCHKeDG+VcuVDvSMpEWESVpejhRt5/VCx+a5aN3z1J/K7B1IY0wZCItIsoZwZzuQFV3jshEWkSU09j0Qgm2eGU66YWIKKjVQTcj3YgorN0jXXKebkQU1u6gm3W6EVFQ2+8GbMPkdFYvREQZjV6nW8I0HX7UXTnsZkREm7ih63RLmc4t2COioFaPdKv0QlYvREQhbd8cMY14pZuJtIgop9UTaVhZMhYRRbU66E4Dr3Qb0ZSIaAPT7ok0I6Zy4E1EFNT6ibSprNONiJJaHXQRJ5JeiIhCWr85woZuRroRUYrd3EPMy1CCbkSU1eaR7rTh1W42R0REOa1OL1Qj3azTjYhCqvNkh/LTjQi6NkxOZaQbEQW1eaRrxHRyuhFRULvTC86dIyKirFavXrChO5n0QkQU0vZTxgA8pMMnIqJ9qs0Rg4m6km4FrgaO2r50vvoNCbrCWb0QESUNbqD3ReDzwG39VG5G0DW4m4m0iChnUCNd2w9KWt9v/QYF3Yx0I6KQheV0V0va0/N+wvbEqf70ooKupIPAcaALTNneIOkc4MvAeuAg8EHbL8x5IYOmEnQjopQFnb1wzPaGQf3yIEa677J9rOf9ZuB+29skba7f3zT3JQTJ6UZEScvoEPNrgI316+3AA8wXdA0kvRARpXh0b9dj4JuSDPx1nec43/YRANtHJJ3Xz4U63UW2JCJiIQa3ZOwOqoHmakmHgE/bvmW2+osNulfaPlwH1l2SvruAhm4CNgGMnX12RroRUdaAsgu2P7SQ+osKurYP189HJd0FXAE8K2lNPcpdAxyd5bsTwATAqnXrPKyhfkS0k6aHE3ROOehKOgPo2D5ev34v8BlgJ3A9sK1+vnveaxk6kxnpRkQhZpCbIxZkMSPd84G7JJ28zu22vyHpW8AOSTcAPwCunfdKQ0xqR0T7CA9sc8RCnXLQtf194J0zlD8HvHuh19PUqbYkIuIUjFrQHaiMdCOitDYHXQHKkrGIKGVEc7qDY+hMDrsREdEmI7d6YaCSXoiIotzy9IKhk4m0iCjFtDvoQnK6EVFY23O6SS9EREkjt053oDKRFhGltTnoZslYRBRlQ7f1qxeGdD/kiGinNo90MYy9OuxGRESrtDnoiox0I6IgA0OKOY0IuhnpRkRZBrc9p9vNSDciCjHtnkiTF3Q75IiIxWtzThfD2IkE3YgoqNVBl0ykRURJOfAGTSXoRkQhBtp9tKPpTGZLWkQU1OaRLhnpRkRRLd8GLJvOqzlQNyIKMbj163SncrZjRBTU7h1phm5yuhFRULtzukYnkl6IiELstq9eAKYy0o2Igto+0uXVnHgTEaUYDyml2ZygO5X0QkQU0vqjHYe4Zi4iWmq5LRmTdBXwOWAM+Fvb22atPG2mf/SjpWpKRMRPMODlNNKVNAZ8AfgV4BDwLUk7bT814xeGeJO4iGghL79DzK8ADtj+PoCkO4FrgBmDrm08lXuwR0Q5y20i7ULgn3veHwJ+qbeCpE3ApvrtiV1TX35yidoyTKuBY8NuxIAtxz5B+jVKfnaxFzjOC/fe56+s7rP6QP/7LVXQ1QxlP5FAsT0BTABI2mN7wxK1ZWiWY7+WY58g/RolkvYs9hq2rxpEW05FZ4muewhY1/N+LXB4iX4rImJkLFXQ/RZwsaSLJK0ErgN2LtFvRUSMjCVJL9iekvQR4F6qJWO32t43x1cmlqIdDbAc+7Uc+wTp1ygZ6T7JQ9p/HBHRRkuVXoiIiBkk6EZEFDT0oCvpKklPSzogafOw29MvSbdKOirpyZ6ycyTtkvS9+vnsns+21H18WtL7htPq+UlaJ+nvJe2XtE/Sx+ryke2bpNMk7Zb0eN2nP67LR7ZPJ0kak/QPku6p3y+HPh2U9B1Jj51cHrY
c+vUa20N7UE2y/RPwr4GVwOPAJcNs0wLa/h+AXwCe7Cn778Dm+vVm4LP160vqvq0CLqr7PDbsPszSrzXAL9SvzwL+sW7/yPaNat34mfXrceBR4N+Ocp96+vb7wO3APcvof4MHgdWvKxv5fp18DHuk+9p2YduvAie3Czee7QeB519XfA2wvX69Hfj1nvI7bZ+w/QxwgKrvjWP7iO1v16+PA/updhiObN9cebl+O14/zAj3CUDSWuBXgb/tKR7pPs1h2fRr2EF3pu3CFw6pLYNwvu0jUAUv4Ly6fCT7KWk9cDnVyHCk+1b/M/wx4Ciwy/bI9wn4C+APgN6TW0a9T1D9QfympL31cQGwPPoFDP883Xm3Cy8TI9dPSWcCXwU+bvslaaYuVFVnKGtc32x3gcskvRm4S9Klc1RvfJ8kXQ0ctb1X0sZ+vjJDWaP61ONK24clnQfskvTdOeqOUr+A4Y90l9t24WclrQGon4/W5SPVT0njVAH3S7a/Vhcvi77ZfhF4ALiK0e7TlcCvSTpIlZb7ZUl/x2j3CQDbh+vno8BdVOmCke/XScMOusttu/BO4Pr69fXA3T3l10laJeki4GJg9xDaNy9VQ9pbgP22b+75aGT7JunceoSLpNOB9wDfZYT7ZHuL7bW211P9/+Z/2f4tRrhPAJLOkHTWydfAe4EnGfF+/YRhz+QB76eaIf8n4A+H3Z4FtPsO4AgwSfXX9gbgLcD9wPfq53N66v9h3cengf847PbP0a9/T/XPsyeAx+rH+0e5b8DPA/9Q9+lJ4L/W5SPbp9f1byM/Xr0w0n2iWsn0eP3YdzImjHq/eh/ZBhwRUdCw0wsREa2SoBsRUVCCbkREQQm6EREFJehGRBSUoBsRUVCCbkREQf8f7gnKGepsFm8AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "plt.pcolormesh(A)\n", "plt.colorbar()" ] }, { "cell_type": "markdown", "id": "activated-state", "metadata": {}, "source": [ "And we can check that the total area is about $5.1 \\times 10^{14}\\ \\mathrm{m}^2$, the surface area of the Earth:" ] }, { "cell_type": "code", "execution_count": 7, "id": "cleared-auditor", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'5.100645e+14'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'%e'%(A.sum().compute())" ] }, { "cell_type": "markdown", "id": "sustained-snake", "metadata": {}, "source": [ "## Creating Dask arrays from NetCDF files\n", "\n", "The most common way of creating a Dask array is to read it from a NetCDF file with Xarray. Both `open_dataset()` and `open_mfdataset()` accept a `chunks` parameter, which sets how large the chunks should be along each dimension of the file.\n", "\n", "If you use `open_mfdataset()`, by default each input file will be its own chunk." ] }, { "cell_type": "code", "execution_count": 8, "id": "outdoor-collectible", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'tas' (time: 1980, lat: 144, lon: 192)>\n",
       "dask.array<open_dataset-0e410aadf9156f5cf0b6cd8c2849d202tas, shape=(1980, 144, 192), dtype=float32, chunksize=(1, 144, 192), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00\n",
       "  * lat      (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38\n",
       "  * lon      (lon) float64 0.9375 2.812 4.688 6.562 ... 353.4 355.3 357.2 359.1\n",
       "    height   float64 ...\n",
       "Attributes:\n",
       "    standard_name:  air_temperature\n",
       "    long_name:      Near-Surface Air Temperature\n",
       "    comment:        near-surface (usually, 2 meter) air temperature\n",
       "    units:          K\n",
       "    cell_methods:   area: time: mean\n",
       "    cell_measures:  area: areacella\n",
       "    history:        2019-11-08T06:41:45Z altered by CMOR: Treated scalar dime...\n",
       "    _ChunkSizes:    [  1 144 192]
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00\n", " * lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38\n", " * lon (lon) float64 0.9375 2.812 4.688 6.562 ... 353.4 355.3 357.2 359.1\n", " height float64 ...\n", "Attributes:\n", " standard_name: air_temperature\n", " long_name: Near-Surface Air Temperature\n", " comment: near-surface (usually, 2 meter) air temperature\n", " units: K\n", " cell_methods: area: time: mean\n", " cell_measures: area: areacella\n", " history: 2019-11-08T06:41:45Z altered by CMOR: Treated scalar dime...\n", " _ChunkSizes: [ 1 144 192]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray\n", "\n", "path = 'https://dapds00.nci.org.au/thredds/dodsC/fs38/publications/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/Amon/tas/gn/v20191108/tas_Amon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc'\n", "ds = xarray.open_dataset(path, chunks={'time': 1})\n", "ds.tas" ] }, { "cell_type": "markdown", "id": "promotional-transsexual", "metadata": {}, "source": [ "There are a few ways to turn a Dask xarray.DataArray back into a numpy array. `.load()` will compute the dask data and retain metadata, `.values` will compute the dask data and return a numpy array, and `.data` will return the dask array itself." ] }, { "cell_type": "code", "execution_count": 9, "id": "settled-yellow", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'tas' ()>\n",
       "array(295.87695312)\n",
       "Coordinates:\n",
       "    time     datetime64[ns] 1850-01-16T12:00:00\n",
       "    lat      float64 -26.88\n",
       "    lon      float64 186.6\n",
       "    height   float64 2.0\n",
       "Attributes:\n",
       "    standard_name:  air_temperature\n",
       "    long_name:      Near-Surface Air Temperature\n",
       "    comment:        near-surface (usually, 2 meter) air temperature\n",
       "    units:          K\n",
       "    cell_methods:   area: time: mean\n",
       "    cell_measures:  area: areacella\n",
       "    history:        2019-11-08T06:41:45Z altered by CMOR: Treated scalar dime...\n",
       "    _ChunkSizes:    [  1 144 192]
" ], "text/plain": [ "\n", "array(295.87695312)\n", "Coordinates:\n", " time datetime64[ns] 1850-01-16T12:00:00\n", " lat float64 -26.88\n", " lon float64 186.6\n", " height float64 2.0\n", "Attributes:\n", " standard_name: air_temperature\n", " long_name: Near-Surface Air Temperature\n", " comment: near-surface (usually, 2 meter) air temperature\n", " units: K\n", " cell_methods: area: time: mean\n", " cell_measures: area: areacella\n", " history: 2019-11-08T06:41:45Z altered by CMOR: Treated scalar dime...\n", " _ChunkSizes: [ 1 144 192]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.tas[0,50,99].load()" ] }, { "cell_type": "code", "execution_count": 10, "id": "accessible-reputation", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(295.87695312)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.tas[0,50,99].values" ] }, { "cell_type": "code", "execution_count": 11, "id": "blond-malta", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Array Chunk
Bytes 218.97 MB 110.59 kB
Shape (1980, 144, 192) (1, 144, 192)
Count 1981 Tasks 1980 Chunks
Type float32 numpy.ndarray
\n", "
\n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " 192\n", " 144\n", " 1980\n", "\n", "
" ], "text/plain": [ "dask.array" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.tas.data" ] }, { "cell_type": "markdown", "id": "naval-impossible", "metadata": {}, "source": [ "## Distributed Dask\n", "\n", "Without any special setup, Dask will run operations in threaded mode. You can configure it to run in distributed mode instead with" ] }, { "cell_type": "code", "execution_count": 9, "id": "revolutionary-antenna", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "

Client

\n", "\n", "
\n", "

Cluster

\n", "
    \n", "
  • Workers: 4
  • \n", "
  • Cores: 12
  • \n", "
  • Memory: 17.13 GB
  • \n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "name": "stderr", "output_type": "stream", "text": [ "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n", "distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available\n" ] } ], "source": [ "import dask.distributed\n", "import tempfile\n", "\n", "try:\n", "    client\n", "except NameError:\n", "    dask_worker_dir = tempfile.TemporaryDirectory()\n", "\n", "    client = dask.distributed.Client(\n", "        local_directory=dask_worker_dir.name,\n", "    )\n", "client" ] }, { "cell_type": "markdown", "id": "mental-algebra", "metadata": {}, "source": [ "By default this will ask for all the resources available on your computer.\n", "\n", "```{hint}\n", "The `try: ... except NameError:` structure makes sure only one Dask client is created if you execute the notebook cell more than once. It isn't needed if you're writing a Python script rather than using a notebook.\n", "```\n", "\n", "```{warning}\n", "It's important to set the `local_directory` parameter. Otherwise Dask will store temporary files in the current working directory, which can be a problem if filesystem quotas are enabled.\n", "```\n", "\n", "Other useful options are:\n", " * `n_workers`: Number of distributed processes\n", " * `threads_per_worker`: Number of shared memory threads within each process\n", " * `memory_limit`: Memory available to each process (e.g. `'4gb'`)\n", "\n", "```{warning}\n", "If you're using a shared system, be polite: set reasonable limits rather than taking over the whole system with your Dask cluster. 
If running on NCI's Gadi supercomputer, `climtas.nci.GadiClient()` will inspect the PBS resources requested by `qsub` and set up the cluster using those limits.\n", "```\n", "\n", "You can follow the dashboard link displayed by Jupyter to get an interactive view of what the Dask cluster is doing.\n", "\n", "To stop the Dask cluster, run:" ] }, { "cell_type": "code", "execution_count": 7, "id": "pleased-representative", "metadata": {}, "outputs": [], "source": [ "client.close()" ] }, { "cell_type": "markdown", "id": "assumed-yahoo", "metadata": {}, "source": [ "This isn't normally needed, as Dask will clean itself up automatically at the end of your script, but it can be helpful if you're experimenting with different cluster sizes." ] }, { "cell_type": "code", "execution_count": null, "id": "north-december", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:analysis3] *", "language": "python", "name": "conda-env-analysis3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 5 }