5. Reproducibly saving data

Once you’ve preprocessed and manually edited your data, you’ll likely want to save your work. peakdet provides two ways to save your outputs, depending on your data storage needs.

5.1. Duplicating data

If you don’t mind storing multiple copies of your data, you can simply save the Physio object directly using peakdet.save_physio():

>>> from peakdet import save_physio
>>> path = save_physio('out.phys', data)

To reload the processed data later, use peakdet.load_physio():

>>> from peakdet import load_physio
>>> data = load_physio('out.phys', allow_pickle=True)

Warning

peakdet.save_physio() uses numpy.savez() to save the data objects, so the generated files can be quite a bit larger than the original data! Moreover, you must pass allow_pickle=True when reloading them; this option is turned off by default for safety reasons, as you should never load pickle files that were not generated by a trusted source.
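
To see why the allow_pickle flag matters, here is a minimal sketch that uses NumPy directly rather than peakdet. The array names, the metadata dictionary, and the file name are hypothetical and are not meant to mirror the layout peakdet actually writes; the only assumption is that the file holds at least one object array, which NumPy stores via pickle.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for a signal and its metadata (not peakdet's real layout)
signal = np.arange(5, dtype=float)
meta = np.array({'fs': 250.0}, dtype=object)  # object arrays are pickled on save

fname = os.path.join(tempfile.mkdtemp(), 'example.npz')
np.savez(fname, signal=signal, meta=meta)

# With the default allow_pickle=False, numeric arrays load but object arrays do not
with np.load(fname) as archive:
    reloaded_signal = archive['signal']
    try:
        archive['meta']
        pickle_blocked = False
    except ValueError:
        pickle_blocked = True

# Opting in with allow_pickle=True lets the object array be read back
with np.load(fname, allow_pickle=True) as archive:
    reloaded_meta = archive['meta'].item()
```

Here the numeric array loads either way, while the object array is only readable once pickling is explicitly allowed. Only opt in for files you created yourself or otherwise trust.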

5.2. Saving history

If you loaded all your data using the IO functions contained in peakdet then your Physio objects should have a complete history. If that’s the case, you can avoid saving a duplicate copy of your entire data structure and just save the history! To do this we can use peakdet.save_history():

>>> from peakdet import save_history
>>> print(data)
Physio(size=240000, fs=250.0)
>>> path = save_history('out.json', data)

The history is saved as a JSON file. If you’re unfamiliar with the format, JSON files are plain-text files that can store lists and dictionaries, which is exactly what the history is!
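
As a rough illustration of what "lists and dictionaries in plain text" means, the sketch below round-trips a made-up history through Python's built-in json module. The step names and arguments are purely hypothetical and do not reflect the exact structure or function names that peakdet records.

```python
import json

# A hypothetical history: an ordered list of steps, each with a name and arguments.
# These names and parameters are illustrative only, not peakdet's actual format.
history = [
    {'step': 'load_step', 'kwargs': {'fname': 'data/sub-001/PPG.csv', 'fs': 250.0}},
    {'step': 'filter_step', 'kwargs': {'cutoff': 1.0, 'method': 'lowpass'}},
]

# Serialize to human-readable text, then parse it back into Python objects
text = json.dumps(history, indent=4)
restored = json.loads(text)
```

Because the serialized form is ordinary text, a history file can be opened in any editor and tracked with version control.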

We can then load in the history (and recreate the Physio object it described) with peakdet.load_history():

>>> from peakdet import load_history
>>> reloaded_data = load_history('out.json')
>>> print(reloaded_data)
Physio(size=240000, fs=250.0)

The reloaded_data object reflects all the processing steps (including manual edits!) that were performed on the original physiological data.

5.2.1. Relative paths in history

While the saved history file (in the above example, out.json) can be stored anywhere (next to the raw data file typically makes sense!), extra care must be taken when loading it back in. Because the history file contains a path to the raw data file, you must call load_history() from the same working directory from which the raw data were originally loaded.

Let’s say that we have a directory tree that looks like the following:

./experiment
├── code/
│   └── preprocess.py
└── data/
    └── sub-001/
        └── PPG.csv

We navigate to this directory (cd experiment) and run python code/preprocess.py, which generates a history file:

./experiment
├── code/
│   └── preprocess.py
└── data/
    └── sub-001/
        ├── PPG.csv
        └── PPG_history.json

Now, say we zip the entire experiment directory to send to a collaborator who wants to run some analyses on our processed data. If they want to regenerate the Physio objects we created from the saved history files, they must call load_history() with the experiment directory as their working directory; calling it from anywhere else in the directory tree will result in a FileNotFoundError.
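
The underlying behavior is not specific to peakdet: a relative path is always resolved against the current working directory. The sketch below recreates the directory tree above in a temporary location and tries to open the same relative path from two different working directories. It uses only the standard library, and the can_open() helper is a hypothetical convenience written for this example.

```python
import os
import tempfile
from pathlib import Path

# Recreate the example tree in a temporary location
root = Path(tempfile.mkdtemp())
raw = root / 'experiment' / 'data' / 'sub-001' / 'PPG.csv'
raw.parent.mkdir(parents=True)
raw.write_text('0.0\n')

# The relative path as it would have been recorded when run from ./experiment
recorded = os.path.join('data', 'sub-001', 'PPG.csv')


def can_open(path):
    """Return True if `path` can be opened from the current working directory."""
    try:
        with open(path):
            return True
    except FileNotFoundError:
        return False


start = os.getcwd()
try:
    os.chdir(root / 'experiment')
    found_from_experiment = can_open(recorded)  # resolves to the real file
    os.chdir(root / 'experiment' / 'data')
    found_from_data = can_open(recorded)  # resolves to data/data/sub-001/PPG.csv
finally:
    os.chdir(start)  # always restore the original working directory
```

The same recorded path succeeds from experiment and fails from experiment/data, which is why the working directory matters when a history file is replayed.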

Note

In order to be able to reproducibly regenerate data using history files, you need to ensure that you load your data using relative paths from the get-go!
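
If your scripts build absolute paths, one possible approach is to convert them to paths relative to the project root before loading any data. The helper below is a hypothetical sketch using only the standard library (it is not part of peakdet), and it assumes the script is run with the project root as the working directory.

```python
import os


def relative_to_project(path, project_root):
    """Return `path` expressed relative to `project_root`."""
    return os.path.relpath(os.path.abspath(path), os.path.abspath(project_root))


# Hypothetical absolute locations, for illustration only
project_root = os.path.join(os.sep, 'home', 'user', 'experiment')
absolute = os.path.join(project_root, 'data', 'sub-001', 'PPG.csv')

relative = relative_to_project(absolute, project_root)
```

The resulting relative path is the one you would then pass to the loading function, so that it stays valid when the whole project directory is moved or shared.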