Converting a function from 1D to ND - slice method

We have a function that works along the time axis, but only for 1D data. We’d like to convert it to work on a full dataset.

import numpy as np

def s_kendall(data):
    data = data[np.isfinite(data)] # Drop any non-finite values

    s_stor = []
    n = len(data)

    # Outer sum
    for i in np.arange(0, n - 1):
        # Inner sum
        for k in np.arange(i + 1, n):
            s_stor.append(np.sign(data[k] - data[i]))

    return np.sum(s_stor)

Unlike the example Converting a function from 1D to ND - apply_along_axis, which computed a correlation along the time axis, here the function is just stepping along the time axis. That means we can modify it so that, rather than working on one point at a time, it acts on an entire slice of the grid at once.

The function is computing

\[ \sum_{i=0}^{n-2}\sum_{k=i+1}^{n-1} \mathrm{sgn} ( D_k - D_i ) \]
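
For instance, with a made-up three-point series D = (1, 3, 2), there are three pairs to compare:

\[ \mathrm{sgn}(3 - 1) + \mathrm{sgn}(2 - 1) + \mathrm{sgn}(2 - 3) = 1 + 1 - 1 = 1 \]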

We can rewrite the function to act on an ND array by changing the indexing a little:

def s_kendall_nd(data):
    # data = data[np.isfinite(data)] # Use nansum() instead

    s_stor = []
    n = data.shape[0] # Assume time is the first dimension

    # Outer sum
    for i in np.arange(0, n - 1):
        # Inner sum
        for k in np.arange(i + 1, n):
            # Use an ellipsis when we don't know the number of remaining dimensions
            s_stor.append(np.sign(data[k, ...] - data[i, ...]))

    return np.nansum(s_stor, axis=0) # The sum, excluding any non-finite values
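
The Ellipsis index (...) stands in for however many remaining dimensions the array happens to have, so the same code works whether the data is 2D, 3D or 4D. A quick sketch on a made-up 3D array shows the equivalence:

x = np.arange(24).reshape((4, 3, 2)) # A made-up 3D array

# x[0, ...] fills in the remaining dimensions automatically,
# so for 3D data it's the same as x[0, :, :]
print(np.array_equal(x[0, ...], x[0, :, :])) # True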

Now that we have two versions of the function, let's test them to make sure they give the same values.

sample = np.random.random((10,10,10)) - 0.5 # Random values between -0.5 and 0.5

First the 1D function:

a = s_kendall(sample[:,0,0])
a
-13.0

Then the new slice function:

b = s_kendall_nd(sample)
b[0,0]
-13.0
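
Checking one point is a good start; as a quick sanity check we can also compare the two versions at every horizontal point of the sample:

# Compare the slice version against the 1D version at every point
for j in range(sample.shape[1]):
    for k in range(sample.shape[2]):
        assert s_kendall(sample[:, j, k]) == b[j, k]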

Advantages compared to apply_along_axis

Compared to using the apply_along_axis function, rewriting to a slice function has the benefit of taking advantage of numpy's strong point: calculations on whole arrays.

The apply_along_axis method is basically a loop over all of the horizontal points, and loops in Python are slow. Operations on whole arrays, on the other hand, run in Numpy's optimised compiled code.
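
As a rough illustration of the difference (the exact numbers will vary from machine to machine), we can time the two approaches with the standard timeit module:

import timeit

# apply_along_axis calls the 1D function once per horizontal point
t_apply = timeit.timeit(
    lambda: np.apply_along_axis(s_kendall, 0, sample), number=10)

# The slice version loops over time only, working on whole 2D slices
t_slice = timeit.timeit(lambda: s_kendall_nd(sample), number=10)

print(f"apply_along_axis: {t_apply:.3f}s  slice: {t_slice:.3f}s")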

The slice approach also matches how data is usually read from files. In a file, all the data for a single timestep is generally stored together, so it's quick to read a whole timestep at once, whereas gathering the full time series for a single point means searching through the entire file.
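
As a sketch of that access pattern, assuming a hypothetical NetCDF file data.nc with a variable tas stored with time as the first dimension (the file and variable names here are made up for illustration):

import netCDF4

with netCDF4.Dataset('data.nc') as f: # Hypothetical file
    var = f.variables['tas'] # Hypothetical variable name

    # All the values for one timestep are stored together, so this is
    # one quick, contiguous read
    first_step = var[0, ...]

    # A full time series at a single point instead pulls one value out
    # of every timestep scattered through the file
    one_point = var[:, 0, 0]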