Welcome to pyexperiment’s documentation!

For a brief introduction and installation instructions, check the README on GitHub.

The main idea behind pyexperiment is to make starting a short experiment as simple as possible without sacrificing the comfort of some basic facilities like a reasonable CLI, logging, persistent state, configuration management, plotting and timing.

Motivating Example

Let’s assume we need to write a quick and clean script that reads a couple of files with time series data and computes the average value. We also want to generate a plot of the data.

CLI

To keep things organized, we split our script into three functions: read should read one or more raw data files, average should compute the average of the data in those files, and plot should plot the data over time. Moreover, we want to add a test for the average function to make sure it works correctly. In pyexperiment, we can build a CLI around these functions very easily. Let’s write the basic structure of our script to a file, say ‘analyzer.py’:

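#!/usr/bin/env python
import unittest

from pyexperiment import experiment

def read(*filenames):
    """Reads one or more raw data files"""
    pass

def average():
    """Computes the average of the data that was read"""
    pass

def plot():
    """Plots the data over time"""
    pass

class AverageTest(unittest.TestCase):
    """Tests the average function
    """
    pass

if __name__ == '__main__':
    # Expose read, average and plot as subcommands and register the tests
    experiment.main(commands=[read, average, plot],
                    tests=[AverageTest])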

Without any further code, the call to pyexperiment.experiment.main() sets up a command line interface for our application that lets us execute the three functions read, average, and plot by calling analyzer read ./datafile1 ./datafile2, analyzer average, and analyzer plot, respectively. Calling analyzer test will run our (yet unimplemented) unit tests.
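How experiment.main() turns plain functions into subcommands is internal to pyexperiment. The sketch below only illustrates the general idea with the standard argparse module; build_cli and run are hypothetical names, not pyexperiment’s API. Each function becomes a subcommand named after it, and a *args parameter is exposed as zero or more positional arguments.

```python
import argparse
import inspect


def build_cli(commands):
    """Build a parser with one subcommand per function, named after the function."""
    parser = argparse.ArgumentParser(prog="analyzer")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for func in commands:
        sub = subparsers.add_parser(func.__name__, help=func.__doc__)
        params = inspect.signature(func).parameters.values()
        # A *args parameter becomes zero or more positional arguments
        if any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params):
            sub.add_argument("args", nargs="*")
        sub.set_defaults(func=func)
    return parser


def run(commands, argv):
    """Parse argv and call the chosen function with its positional arguments."""
    namespace = build_cli(commands).parse_args(argv)
    return namespace.func(*getattr(namespace, "args", []))


# Toy commands standing in for the real read/average functions
def read(*filenames):
    return list(filenames)


def average():
    return "average called"


print(run([read, average], ["read", "./datafile1", "./datafile2"]))
# prints: ['./datafile1', './datafile2']
```

In this sketch, a command without *args (like average) simply gets no positional arguments.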

State

Next, let’s write the read function and save the loaded data to a persistent state file. To this end, we can use pyexperiment’s pyexperiment.State, which we get by adding from pyexperiment import state and from pyexperiment.experiment import save_state to the top of ‘analyzer.py’. Then, assuming each data file contains one numeric value per line, we can achieve this by defining read as:

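from pyexperiment import state
from pyexperiment.experiment import save_state

def read(*filenames):
    """Reads data files and stores their content
    """
    # Initialize the state with an empty list for the data
    state['data'] = []
    for filename in filenames:
        with open(filename) as f:
            # One floating point value per line
            state['data'] += [float(line)
                              for line in f.readlines()]
    # Persist the state so that later commands can load it
    save_state()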

Note that internally, the implementation of pyexperiment.State uses a pyexperiment.utils.Singleton.Singleton wrapped by pyexperiment.utils.Singleton.delegate_singleton(), so that wherever you access the state you are accessing the same underlying data structure (in a thread safe way).
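To illustrate what “the same underlying data structure, accessed in a thread safe way” means, here is a minimal stand-alone sketch using only the standard library. SharedState is a hypothetical class, not pyexperiment’s Singleton or delegate_singleton(): every instantiation returns the same object, and each read or write goes through a lock.

```python
import threading


class SharedState:
    """Process-wide dictionary: every SharedState() call returns the same object."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: only one thread may create the instance
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._data = {}
                    instance._lock = threading.RLock()
                    cls._instance = instance
        return cls._instance

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value


state_a = SharedState()
state_b = SharedState()
state_a["data"] = [1.0, 2.0]
print(state_a is state_b, state_b["data"])  # prints: True [1.0, 2.0]
```

Note that in this sketch an update such as state['data'] += [...] is still a separate read followed by a write, so it is not atomic even though each individual access is locked.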

Logging

In order to better understand our results, it would be nice to have a logger that prints some debug output, e.g., the names of the files we load and how many data points they contain. A few calls to pyexperiment’s pyexperiment.log will do the job: simply add from pyexperiment import log and insert logging calls at the desired level:

def read(*filenames):
    """Reads data files and stores their content
    """
    # Initialize the state with an empty list for the data
    state['data'] = []
    for filename in filenames:
        log.info("Reading file %s", filename)
        with open(filename) as f:
            data = [float(line)
                    for line in f.readlines()]
            if len(data) == 0:
                log.warning("Datafile %s does not contain any data",
                            filename)
            log.debug("Read %i datapoints", len(data))
            state['data'] += data
    save_state()
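The level filtering that the Configuration section below describes for pyexperiment can be reproduced with Python’s standard logging module. This sketch does not use pyexperiment.log; make_logger is a hypothetical helper. Passing the arguments separately ("%s", filename) instead of pre-formatting the string also means that a suppressed message is never formatted.

```python
import io
import logging


def make_logger(level_name, stream):
    """Return a logger writing to `stream` that drops messages below `level_name`."""
    logger = logging.getLogger("analyzer." + level_name)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    logger.propagate = False  # do not forward records to the root logger
    logger.setLevel(getattr(logging, level_name))
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger("WARNING", buffer)
log.info("Reading file %s", "data1.csv")  # dropped: INFO < WARNING
log.debug("Read %i datapoints", 0)  # dropped: DEBUG < WARNING
log.warning("Datafile %s does not contain any data", "data1.csv")
print(buffer.getvalue(), end="")
# prints: WARNING Datafile data1.csv does not contain any data
```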

At this point, let’s factor out a function that reads a single file to make our code more readable:

def read_file(filename):
    """Read a file and return the data
    """
    log.info("Reading file %s", filename)
    with open(filename) as f:
        data = [float(line)
                for line in f.readlines()]
    if len(data) == 0:
        log.warning("Datafile %s does not contain any data",
                    filename)
    log.debug("Read %i datapoints", len(data))
    return data


def read(*filenames):
    """Reads data files and stores their content
    """
    # Initialize the state with an empty list for the data
    state['data'] = []
    for filename in filenames:
        state['data'] += read_file(filename)
    save_state()

Configuration

You will notice that, by default, pyexperiment does not log to a file and only prints messages at or above the ‘WARNING’ level. To see more (or fewer) messages, you can change the logging level by running the analyzer with an additional argument, e.g., --verbosity DEBUG. In general, any configuration option can be set from the command line with -o [level[.level2.[...]]].key value.
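As an illustration of the dotted -o syntax, the following sketch collects repeated -o KEY VALUE pairs with argparse and stores each value in a nested dictionary, one level per dot. parse_overrides and apply_override are hypothetical helpers, not pyexperiment’s parser. Values are kept as strings, leaving type conversion to a later validation step.

```python
import argparse


def apply_override(config, dotted_key, value):
    """Store `value` in nested dict `config` under a dotted path like 'read.max_length'."""
    *sections, key = dotted_key.split(".")
    node = config
    for section in sections:
        # Create intermediate sections on demand
        node = node.setdefault(section, {})
    node[key] = value
    return config


def parse_overrides(argv):
    """Collect every `-o KEY VALUE` pair in argv into a nested dict of strings."""
    parser = argparse.ArgumentParser(prog="analyzer", add_help=False)
    parser.add_argument("-o", dest="overrides", nargs=2, action="append",
                        default=[], metavar=("KEY", "VALUE"))
    # Ignore everything else (subcommands, file names, other options)
    known, _unknown = parser.parse_known_args(argv)
    config = {}
    for dotted_key, value in known.overrides:
        apply_override(config, dotted_key, value)
    return config


print(parse_overrides(["read", "data1.csv", "-o", "read.max_length", "50"]))
# prints: {'read': {'max_length': '50'}}
```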

The verbosity configuration value is predefined by pyexperiment, but we can use the same configuration mechanism for our own parameters. This is achieved by defining a specification for the configuration and passing it as the config_spec argument to the pyexperiment.experiment.main() call. For example, we may want to add an option to ignore data files longer than a certain length:

CONFIG_SPEC = ("[read]\n"
               "max_length = integer(min=1, default=100)\n")

if __name__ == '__main__':
    experiment.main(commands=[read, average, plot],
                    tests=[AverageTest],
                    config_spec=CONFIG_SPEC)

We can then access the parameters by adding from pyexperiment import conf at the top of ‘analyzer.py’ and indexing pyexperiment.conf like a dictionary, with the levels of the configuration separated by dots:

def read(*filenames):
    """Reads data files and stores their content
    """
    # Initialize the state with an empty list for the data
    state['data'] = []

    # Get the max length from the configuration
    max_length = conf['read.max_length']

    for filename in filenames:
        data = read_file(filename)
        if len(data) <= max_length:  # skip files longer than max_length
            state['data'] += data
    save_state()
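A dotted key such as 'read.max_length' walks down one configuration section per dot. The sketch below shows one way such a lookup could work on plain nested dictionaries, falling back to defaults (like default=100 in CONFIG_SPEC) when the key is not set. get_dotted is a hypothetical helper; pyexperiment.conf performs its own lookup and validation.

```python
def get_dotted(config, dotted_key, defaults=None):
    """Return the value at 'section.key' in `config`, else in `defaults`.

    Raises KeyError if neither contains the key.
    """
    for source in (config, defaults or {}):
        node = source
        try:
            for part in dotted_key.split("."):
                node = node[part]
        except (KeyError, TypeError):
            continue  # not found in this source, try the next one
        return node
    raise KeyError(dotted_key)


DEFAULTS = {"read": {"max_length": 100}}  # mirrors default=100 in CONFIG_SPEC
user_config = {}  # e.g. nothing set in config.ini or via -o
print(get_dotted(user_config, "read.max_length", DEFAULTS))  # prints: 100
```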

By default, pyexperiment will try to load a file called ‘config.ini’ (if necessary, one can of course override this default filename). To generate an initial configuration file with the default options, simply run analyzer save_config ./config.ini. Any options set in the resulting file will be used in future runs.

Timing

If we are loading big data files, we may also want to know how long it takes to load each individual file, since there may be some room for optimization. To measure the time it takes to load a file and print statistics about these measurements, we can use the timing functions of pyexperiment’s logger (pyexperiment.Logger):

def read(*filenames):
    """Reads data files and stores their content
    """
    # Initialize the state with an empty list for the data
    state['data'] = []

    # Get the max length from the configuration
    max_length = conf['read.max_length']

    for filename in filenames:
        with log.timed("read_file"):
            data = read_file(filename)
        if len(data) <= max_length:  # skip files longer than max_length
            state['data'] += data
    save_state()
    log.print_timings()

Loading State

To average over our data, we will need the state from when we called our script with the read command. By default, pyexperiment does not load the state saved in previous runs, but we can load it manually with the pyexperiment.State.load() function.

def average():
    """Returns the average of the data stored in state
    """
    state.load(conf['pyexperiment.state_filename'])
    data = state['data']
    return sum(data)/len(data)
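The AverageTest class in our skeleton is still empty. One way to make the averaging testable without a saved state file is to move the arithmetic into a pure helper and test that. compute_average is a hypothetical name; average() could then return compute_average(state['data']). Unlike the plain sum(data)/len(data) above, this helper raises a ValueError with a clear message for an empty data set instead of a ZeroDivisionError.

```python
import unittest


def compute_average(data):
    """Return the arithmetic mean of `data` (hypothetical pure helper)."""
    if not data:
        raise ValueError("cannot average an empty data set")
    return sum(data) / len(data)


class AverageTest(unittest.TestCase):
    """Tests the averaging logic without touching the saved state."""

    def test_simple_mean(self):
        self.assertAlmostEqual(compute_average([1.0, 2.0, 3.0]), 2.0)

    def test_single_value(self):
        self.assertEqual(compute_average([4.5]), 4.5)

    def test_empty_data_raises(self):
        with self.assertRaises(ValueError):
            compute_average([])
```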

We can now call analyzer.py read file1 file2 followed by analyzer.py average to get the average of the data points in our files. If you add timing calls, you will notice that pyexperiment.State.load() returns almost immediately: by default, pyexperiment loads entries in the pyexperiment.State only when they are first accessed.
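Lazy loading of this kind can be sketched as a mapping that runs a loader for an entry the first time that entry is accessed and caches the result. LazyState is a hypothetical class illustrating the idea, not pyexperiment’s implementation of pyexperiment.State.

```python
class LazyState:
    """Mapping whose entries are produced by loader functions on first access."""

    def __init__(self, loaders):
        # loaders: dict mapping each key to a zero-argument function that loads it
        self._loaders = dict(loaders)
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            # First access: run the (possibly expensive) loader once and cache it
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def loaded_keys(self):
        """Return the keys that have actually been loaded so far."""
        return sorted(self._cache)


def load_data():
    print("loading 'data' from disk...")  # stand-in for reading a large entry
    return [1.0, 2.0, 3.0]


lazy_state = LazyState({"data": load_data})
print(lazy_state.loaded_keys())  # prints: []
data = lazy_state["data"]        # triggers the loader exactly once
data = lazy_state["data"]        # served from the cache
print(lazy_state.loaded_keys())  # prints: ['data']
```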

Plotting

Finally, let’s import the setup_figure function with from pyexperiment.utils.plot import setup_figure, as well as pyplot (with from matplotlib import pyplot as plt), and write the plot function:

def plot():
    """Plots the data saved in the state
    """
    state.load(conf['pyexperiment.state_filename'])
    data = state['data']

    fig = setup_figure('Time Series Data')
    plt.plot(data)
    # Open a window showing the figure (blocks until it is closed)
    plt.show()

With this code in place, we can now call analyzer.py plot, which will open a window with the plotted data. To make the window fullscreen, press the ‘f’ key on your keyboard; to close the window, press ‘q’.
