Benchmarking steps

A minimal workflow employing this package involves three steps:

  • Setup functions: a list or dictionary of the functions to be benchmarked. Both single-argument and multiple-argument functions are supported.
  • Setup datasets: a list or dictionary of the datasets to benchmark against.
  • Benchmark to get timings in a dataframe-like object. Each row holds one dataset and each column represents one function. A dataframe was the design choice because it supports plotting directly, and additionally the benchmarking setup information can be stored as the name values of its index and columns.

We will walk through these steps with the help of a sample setup in Minimal workflow.

Setting up functions and datasets is covered in detail later in this document.

Note

Prior to Python 3.7, dictionary keys are not guaranteed to be maintained in insertion order (CPython 3.6 preserves it only as an implementation detail). So, when working with older versions and an input dataset defined as a dictionary, collections.OrderedDict could be used to keep the order.
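For instance, on those older versions the datasets could be built as follows (a minimal sketch; on modern Python a plain dict already preserves insertion order):

```python
from collections import OrderedDict

import numpy as np

# On old Python versions a plain dict would not reliably preserve this
# ordering; OrderedDict guarantees keys stay in insertion order.
in_ = OrderedDict((n, np.random.rand(n)) for n in [10, 100, 1000])

print(list(in_.keys()))  # keys come back in the order they were inserted
```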

To get more out of it, we could optionally do the following:

  • Plot the timings.
  • Get speedups or scaled-timings of all functions with respect to one among them.
  • Rank the functions based on various performance-metrics.

A detailed study with examples in the next section should clear things up.

We will take a hands-on approach and explore the features available with this package. We will start off with the minimal steps to benchmark a setup and then explore other utilities that cover the most common features.

The rest of the documentation will use the module’s methods. So, let’s import it once:

>>> import benchit

Minimal workflow

We will study a case of a single argument with default parameters. Let’s take a sample case where we benchmark five common NumPy reduction functions - sum, prod, max, mean, median - on arrays varying in their sizes. To keep it simple and to cover a wide size range, let’s consider square 2D arrays. Thus, the benchmarking steps would look something like this:

>>> import numpy as np
>>> funcs = [np.sum,np.prod,np.max,np.mean,np.median]
>>> inputs = [np.random.rand(i,i) for i in 4**np.arange(7)]
>>> t = benchit.timings(funcs, inputs)
>>> t
Functions       sum      prod      amax      mean    median
Len
1          0.000005  0.000004  0.000005  0.000007  0.000046
4          0.000005  0.000004  0.000005  0.000007  0.000047
16         0.000005  0.000005  0.000005  0.000007  0.000049
64         0.000007  0.000014  0.000007  0.000009  0.000094
256        0.000035  0.000131  0.000030  0.000038  0.000845
1024       0.000511  0.002050  0.000512  0.000522  0.011525
4096       0.008208  0.032582  0.008257  0.008274  0.261838

It’s a dataframe-like object, called BenchmarkObj. We can plot it, which automatically adds the system configuration into the title area to convey all the available benchmarking information:

>>> t.plot(logy=True, logx=True, save='timings.png')

The resultant plot would look something like this:

[plot: timings]

These four lines of code would be enough for most benchmarking workflows.

Extract dataframe & construct back

The underlying benchmarking data is stored as a pandas dataframe, which can be extracted with:

>>> df = t.to_dataframe()

As we shall see in the next sections, this would be useful for extending the benchmarking capabilities.
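For example, with a plain pandas dataframe shaped like the timings table above (hand-constructed here, since the values are only illustrative), all the usual pandas operations apply, such as finding the fastest function per dataset size:

```python
import pandas as pd

# A hand-built stand-in for t.to_dataframe(): rows are dataset sizes,
# columns are functions, values are timings in seconds (illustrative).
df = pd.DataFrame(
    {'sum':    [0.000005, 0.000035, 0.008208],
     'prod':   [0.000004, 0.000131, 0.032582],
     'median': [0.000046, 0.000845, 0.261838]},
    index=pd.Index([1, 256, 4096], name='Len'))
df.columns.name = 'Functions'

# Column label of the smallest timing in each row, i.e. the fastest
# function for each dataset size
fastest = df.idxmin(axis=1)
print(fastest)
```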

There’s a benchmarking object constructor function, benchit.bench, that accepts a dataframe along with a dtype. So, we can construct it back, like so:

>>> t = benchit.bench(df, ...)

Setup functions

This would be a list or dictionary of functions to be benchmarked.

A general syntax for the list version would look something like this:

>>> funcs = [func1, func2, ...]

We already saw a sample of it in Minimal workflow.

A general syntax for the dictionary version would look something like this:

>>> funcs = {'func1_name':func1, 'func2_name':func2, ...}

Mixing in lambdas

Lambda functions could also be mixed into our functions for benchmarking, using a dictionary. So, the general syntax would be:

>>> funcs = {'func1_name':func1, 'lambda1_name':lambda1, 'func2_name':func2, ...}

This is useful for directly incorporating one-liner solutions without needing to define them beforehand.

Let’s take a sample setup where we tile a 1D array twice, with various solutions mixed in as lambdas and regular functions:

import numpy as np

def numpy_concat(a):
    return np.concatenate([a, a])

# We need a dictionary to give each lambda a unique name, through its keys
funcs = {'r_':lambda a:np.r_[a, a],
         'stack+reshape':lambda a:np.stack([a, a]).reshape(-1),
         'hstack':lambda a:np.hstack([a, a]),
         'concat':numpy_concat,
         'tile':lambda a:np.tile(a,2)}
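Before timing these, it is worth sanity-checking that all candidates produce the same result (a quick sketch; the timings call itself would then follow the same pattern as in Minimal workflow):

```python
import numpy as np

def numpy_concat(a):
    return np.concatenate([a, a])

funcs = {'r_': lambda a: np.r_[a, a],
         'stack+reshape': lambda a: np.stack([a, a]).reshape(-1),
         'hstack': lambda a: np.hstack([a, a]),
         'concat': numpy_concat,
         'tile': lambda a: np.tile(a, 2)}

a = np.random.rand(16)
expected = np.tile(a, 2)
# All solutions should agree before we bother benchmarking them
results = {name: np.array_equal(f(a), expected) for name, f in funcs.items()}
print(results)
```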

Setup datasets

This would be a list or dictionary of datasets to be benchmarked.

A general syntax for the list version would look something like this:

>>> in_ = [dataset1, dataset2, ...]

For such list-type inputs, each dataset is assigned an index based on the datasets themselves and on the additional argument indexby to benchit.timings.

A general syntax for the dictionary version would look something like this:

>>> in_ = {'argument_value1':dataset1, 'argument_value2':dataset2, ...}

For such dictionary-type inputs, the index values would be the dictionary keys.
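For instance, a dictionary of 1D arrays keyed by their lengths could be built like this (a sketch; the keys would then show up as the index of the timings output):

```python
import numpy as np

# Keys (array lengths here) become the index values in the timings output
in_ = {n: np.random.rand(n) for n in [10, 100, 1000]}

print({k: v.shape for k, v in in_.items()})
```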

For both lists and dicts, these index values are used for plotting, etc. With single-argument cases, this is pretty straightforward.

Now, we might have functions that accept more than one argument; let’s call those multivar cases and focus on them. Please keep in mind that for those multivar cases, we need to feed multivar=True into benchit.timings.

Pseudo code would look something like this:

>>> in_ = {m:generate_inputs(m,k1,k2) for m in m_list} # k1, k2 are constants
>>> t = benchit.timings(fncs, in_, multivar=True, input_name='arg0')
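To make that concrete, here is a hedged sketch for two-argument functions, assuming (as the pseudo code above suggests) that each dataset bundles the arguments together, here as a tuple, with the matrix-vector candidates being purely illustrative:

```python
import numpy as np

# Two-argument candidates: matrix-vector product variants (illustrative)
funcs = {'dot': lambda A, b: A.dot(b),
         'matmul': lambda A, b: A @ b,
         'einsum': lambda A, b: np.einsum('ij,j->i', A, b)}

k = 8  # a constant second dimension, playing the role of k1/k2 above
# Each dataset is a tuple of the two arguments; keys index the output
in_ = {m: (np.random.rand(m, k), np.random.rand(k)) for m in [10, 100, 1000]}

# Sanity check: each function accepts an unpacked dataset
A, b = in_[10]
print({name: f(A, b).shape for name, f in funcs.items()})
# The actual benchmarking call would then be:
# t = benchit.timings(funcs, in_, multivar=True, input_name='arg0')
```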

Groupings

Groupings are applicable for both single and multiple variable cases.

There are essentially two rules to form groupings:

  • Use a dictionary as the input.
  • Use a nested loop structure to form the input datasets, with tuples of input parameters as the dictionary keys. These keys could be derived from the input arguments or otherwise. Essentially, we would have two or more sources forming the input argument(s), and those sources are to be listed as the keys.

Thus, considering two sources, a general structure would be:

>>> in_ = {(source1_value1, source2_value1): dataset1,
           (source1_value2, source2_value2): dataset2, ...}

As stated earlier, in the most common scenario of multiple arguments, we would have the input arguments put in as the key elements. Thus, with functions that accept two arguments, it would be:

>>> in_ = {(argument1_value1, argument2_value1): dataset1,
           (argument1_value2, argument2_value2): dataset2, ...}

Example:

Let’s take a complete example to understand groupings:

>>> in_ = {(argument1_value1, argument2_value1): dataset1,
           (argument1_value1, argument2_value2): dataset2,
           (argument1_value1, argument2_value3): dataset3,
           (argument1_value2, argument2_value1): dataset4,
           (argument1_value2, argument2_value2): dataset5,
           (argument1_value2, argument2_value3): dataset6, ...}

Thus,

  • Considering argument1 values as reference, we would have 2 groups - (dataset1, 2, 3) and (dataset4, 5, 6).
  • Considering argument2 values as reference, we would have 3 groups - (dataset1, 4), (dataset2, 5) and (dataset3, 6).

Optionally, to finalize the groupings with proper names, we can assign a name to each argument with the input_name argument to benchit.timings. So, input_name would be a list or tuple of strings specifying the name of each argument. These would be picked up for labelling purposes when plotting.

Thus, a complete pseudo code to form groupings with a two-level nested loop would look something like this:

>>> in_ = {(m,n):generate_inputs(m,n) for m in m_list for n in n_list}
>>> t = benchit.timings(fncs, in_, input_name=['arg0', 'arg1'])
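A concrete version of that pseudo code, with an illustrative two-argument generate_inputs and the benchit.timings call left commented out so the construction itself can be inspected:

```python
import numpy as np

m_list, n_list = [10, 100], [3, 4, 5]

def generate_inputs(m, n):
    # One dataset per (m, n) pair: a tuple of the two function arguments
    return (np.random.rand(m, n), np.random.rand(n))

# Two-level nested loop; the (m, n) tuple keys drive the groupings
in_ = {(m, n): generate_inputs(m, n) for m in m_list for n in n_list}

print(len(in_))  # 2 values of m x 3 values of n = 6 datasets
# t = benchit.timings(fncs, in_, multivar=True, input_name=['arg0', 'arg1'])
```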

Plots on groupings would result in subplots. More on this, with examples, is shown later in this document. Note that we can have an n-level nested loop structure and the subplots would take care of the plotting.