peakdet: A toolbox for physiological peak detection analyses

This package is designed for use in the reproducible processing and analysis of physiological data, like those collected from respiratory belts, pulse photoplethysmography, or electrocardiogram (ECG/EKG) monitors.

https://travis-ci.org/rmarkello/peakdet.svg?branch=master https://codecov.io/gh/rmarkello/peakdet/branch/master/graph/badge.svg https://readthedocs.org/projects/peakdet/badge/?version=latest https://img.shields.io/badge/license-Apache%202-blue.svg

Overview

Physiological data are messy and prone to artifact (e.g., movement in respiration and pulse, ectopic beats in ECG). Despite leaps and bounds in recent algorithms for processing these data there still exists a need for manual inspection to ensure such artifacts have been appropriately removed. Because of this manual intervention step, understanding exactly what happened to go from “raw” data to “analysis-ready” data can often be difficult or impossible.

This toolbox, peakdet, aims to provide a set of tools that will work with a variety of input data to reproducibly generate manually-corrected, analysis- ready physiological data. If you’d like more information about the package, including how to install it and some example instructions on its use, check out our documentation!

Development and getting involved

This package has been largely developed in the spare time of a single graduate student (@rmarkello) with help from some incredible contributors. While it would be amazing if anyone else finds it helpful, given the limited time constraints of graduate school, the current package is not currently accepting requests for new features.

However, if you’re interested in getting involved in the project, we’re thrilled to welcome new contributors! You should start by reading our contributing guidelines and code of conduct. Once you’re done with that, take a look at our issues to see if there’s anything you might like to work on. Alternatively, if you’ve found a bug, are experiencing a problem, or have a question, create a new issue with some information about it!

License Information

This codebase is licensed under the Apache License, Version 2.0. The full license can be found in the LICENSE file in the peakdet distribution. You may also obtain a copy of the license at: http://www.apache.org/licenses/LICENSE-2.0.

Contents

Installation and setup

Basic installation

This package requires Python >= 3.6. Assuming you have the correct version of Python installed, you can install peakdet by opening a terminal and running the following:

git clone https://github.com/physiopy/peakdet.git
cd peakdet
python setup.py install

User guide

The Physio data object

The primary funtionality of peakdet relies on its operations being performed on physiological data loaded into a peakdet.Physio object. So, before we get into using peakdet, its best to understand a little bit about this helper class!

All you need to create a Physio object is a data array:

>>> import numpy as np
>>> from peakdet import Physio
>>> data = np.random.rand(5000)
>>> phys = Physio(data)
>>> print(phys)
Physio(size=5000, fs=nan)

However, it is strongly recommended that you provide the sampling rate of the data as well. Many functions in peakdet require this information:

>>> sampling_rate = 100.0
>>> phys = Physio(data, fs=sampling_rate)
>>> print(phys)
Physio(size=5000, fs=100.0)

A Physio object is designed to be lightweight, and mostly exists to store raw physiological data and its corresponding sampling rate in the same place. In most instances it can be treated like a one-dimensional numpy.ndarray; the underlying data can be accessed via slicing and the object can be passed directly to most numpy functions:

>>> phys[:5]
array([0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581])
>>> np.mean(phys)
0.4991523987519155

Beyond being a simple container, however, Physio objects have a few attributes that are of interest when working with real physiological data. Importantly, they have a history that records all operations performed on the data:

>>> from peakdet import operations
>>> phys = operations.filter_physio(phys, cutoffs=0.1, method='lowpass')
>>> phys.history
[('filter_physio', {'cutoffs': 0.1, 'method': 'lowpass'})]

Moreover, if you perform peak finding on a Physio object it will store the indices of the detected peaks and troughs alongside the object:

>>> phys = operations.peakfind_physio(phys)
>>> phys.peaks
array([ 477, 2120, 3253])
>>> phys.troughs
array([1413, 2611])

Next, we’ll move on to how you can load your data into a Physio object in a more reproducible manner. Feel free to refer to the API for more information.

Reproducibly loading data

In order to ensure that a data analysis pipeline with peakdet is fully reproducible from start to finish, it is recommended that you load data with the built-in peakdet IO functions.

The peakdet.load_physio() function is the most simple of these functions, and accepts data stored as single-column text file. For example, if we have a file ECG.csv that we might normally load with numpy:

>>> import numpy as np
>>> np.loadtxt('ECG.csv')
array([ 1.66656,  1.53076,  1.38153, ..., -0.05188, -0.05249, -0.05554])

we can instead load it into a Physio object in one step with:

>>> from peakdet import load_physio
>>> ecg = load_physio('ECG.csv', fs=1000.)
>>> print(ecg)
Physio(size=44611, fs=1000.0)
>>> ecg.data
array([ 1.66656,  1.53076,  1.38153, ..., -0.05188, -0.05249, -0.05554])

This way, the loading of the data is retained in the object’s history:

>>> ecg.history
[('load_physio', {'allow_pickle': False, 'data': 'ECG.csv', 'dtype': None, 'fs': 1000.0, 'history': None})]

There are also a number of functions for loading data from “standard” formats. For example, if your data were collected using the rtpeaks module, it might look like this:

>>> import pandas as pd
>>> pd.read_csv('rtpeaks.csv').head()
   time  channel1  channel2  channel9
0   1.0  4.984436  0.020752 -0.333862
1   2.0  4.984131  0.021057 -0.328979
2   3.0  4.984131  0.021057 -0.325623
3   4.0  4.984436  0.021057 -0.323792
4   5.0  4.983826  0.025024 -0.319519

Instead, you can load it with peakdet.load_rtpeaks() so that it is recorded in the object’s history:

>>> from peakdet import load_rtpeaks
>>> ecg = load_rtpeaks('rtpeaks.csv', fs=1000., channel=9)
>>> print(ecg)
Physio(size=40, fs=1000.0)
>>> ecg[:5]
array([-0.3338623 , -0.32897949, -0.32562256, -0.3237915 , -0.31951904])
>>> ecg.history
[('load_rtpeaks', {'channel': 9, 'fname': 'rtpeaks.csv', 'fs': 1000.0})]

Processing physiological data

There are a few common processing steps often performed when working with physiological data:

We have already seen that peakdet.operations has functions to perform a few of these steps, but it is worth going into all of them in a bit more detail:

Visual inspection

One of the first steps to do with raw data is visually inspect it. No amount of processing can fix bad data, and so it’s good to check that your data quality is appropriate before continuing. Plotting data can be achieved with plot_physio():

>>> from peakdet import load_physio, operations
>>> data = load_physio('ECG.csv', fs=1000.0)
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10)  # doctest: +SKIP
_images/processing-1.png

For now this will simply plot the raw waveform, but we’ll see later how this function has some added benefits.

Interpolation

Raw data can often be collected at a sampling rate above what is biologically meaningful. For example, human respiration is relatively slow, so acquiring it at, for example, 250 Hz is often more than sufficient; when it is acquired at a higher rate it can be quite noisy. In this case, we might want to interpolate (or decimate) the data to a lower sampling rate, which can be done with interpolate_physio():

>>> data = load_physio('RESP.csv', fs=1000.)
>>> print(data)
Physio(size=24000, fs=1000.0)
>>> data = operations.interpolate_physio(data, target_fs=250.)
>>> print(data)
Physio(size=6000, fs=250.0)

Note that the size of the data decreased by a factor of four (24000 to 6000), the same as the decrease in sampling rate.

Data can also be upsampled via interpolation, though care must be taken in interpreting the results of such a procedure:

>>> data = load_physio('PPG.csv', fs=25.0)
>>> print(data)
Physio(size=24000, fs=25.0)
>>> data = operations.interpolate_physio(data, target_fs=250.0)
>>> print(data)
Physio(size=240000, fs=250.0)
Temporal filtering

Once our data is at an appropriate sampling rate, we may want to apply a temporal filter with filter_physio(). This function supports lowpass, highpass, bandpass, and bandstop filters with user-specified frequency cutoffs. First, let’s take a look at our interpolated PPG data:

>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10)  # doctest: +SKIP
_images/processing-4.png

If we’re going to do peak detection, it would be great to get rid of the venous pulsations in the waveform to avoid potentially picking them up. If we apply a lowpass filter with a 1.0 Hz cutoff we can do just that:

>>> data = operations.filter_physio(data, cutoffs=1.0, method='lowpass')
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10)  # doctest: +SKIP
_images/processing-5.png

Filter settings are highly dependent on the data, so visually confirming that the filter is performing as expected is important!

Peak detection

Many physiological processing pipelines requiring performing peak detection on the data (e.g., to calculate heart rate, respiratory rate, pulse rate). That process can be accomplished with peakfind_physio():

>>> data = operations.peakfind_physio(data, thresh=0.1, dist=100)
>>> data.peaks[:10]
array([ 164,  529,  901, 1278, 1628, 1983, 2381, 2774, 3153, 3486])
>>> data.troughs[:10]
array([ 356,  732, 1111, 1465, 1817, 2205, 2603, 2989, 3330, 3677])

The peaks and troughs attributes mark the indices of the detected peaks and troughs in the data; these can be converted to time series by dividing by the sampling frequency:

>>> data.peaks[:10] / data.fs
array([ 0.656,  2.116,  3.604,  5.112,  6.512,  7.932,  9.524, 11.096,
       12.612, 13.944])
>>> data.troughs[:10] / data.fs
array([ 1.424,  2.928,  4.444,  5.86 ,  7.268,  8.82 , 10.412, 11.956,
       13.32 , 14.708])

Once these attributes are instantiated, subsequent calls to plot_physio() will denote peaks with red dots and troughs with green dots to aid visual inspection:

>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10)  # doctest: +SKIP
_images/processing-7.png

Manually editing data

Physiological data are messy and prone to artifact (e.g., movement in respiration and pulse, ectopic beats in ECG). Despite leaps and bounds in recent algorithms for processing these data there still exists a need for manual inspection to ensure such artifacts have been appropriately removed.

To do this reproducibly, however, we want to ensure there is a record of these manual changes. The edit_physio() function was designed for this exact purpose. Invoking it will open up an interactive viewer that allows you to remove peaks from noisy portions of the time series, or delete peaks/troughs that were erroneously detected.

First, let’s process some pulse photoplethysmography data collected at 25 Hz:

>>> from peakdet import load_physio, operations
>>> data = load_physio('PPG.csv', fs=25.0)
>>> data = operations.interpolate_physio(data, target_fs=250.0)
>>> data = operations.filter_physio(data, cutoffs=1.0, method='lowpass')
>>> data = operations.peakfind_physio(data, thresh=0.1, dist=100)
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10)  # doctest: +SKIP
_images/editing-1.png

The data looks good, but it would be nice to go through it all quickly and ensure that there are no noisy sections or peaks/troughs that need to be removed. We can do that easily with edit_physio():

>>> data = operations.edit_physio(data)

This function will open up an interactive viewer, which supports scrolling through the time series (with the scroll wheel), rejection of noisy segments of data (left click + drag, red highlight), and deleting peaks / troughs that were erroneously detected and shouldn’t be considered at all (right click + drag, blue highlight):

_images/physio_edit.gif

If you accidentally reject or delete peaks, you can undo the selection with ctrl+z (or command+z if you’re on a Mac). The interactive viewer will track your history of edits as long as it is open, so you can undo multiple selections, if desired. Once done, you can close the interactive viewer by pressing ctrl+q (command+q).

All edits performed in the editor are stored in the history of the data object, as with other operations, ensuring a record of your manual changes are retained.

Reproducibly saving data

Once you’ve gone through preprocessing and manually editing your data, you’ll likely want to save your work. peakdet provides two ways to save your outputs, depending on your data storage needs.

Duplicating data

If you don’t mind storing multiple copies of your data, you can simply save the Physio object directly using peakdet.save_physio():

>>> from peakdet import save_physio
>>> path = save_physio('out.phys', data)

If later on you want to reload the processed data you can do so:

>>> from peakdet import load_physio
>>> data = load_physio('out.phys', allow_pickle=True)

Warning

peakdet.save_physio() uses numpy.savez() to save the data objects, meaning the generated files can actually be quite a bit larger than the original data, themselves! Moreover, you NEED to pass the allow_pickle=True parameter; this is turned off by default for safety reasons, as you should never load pickle files not generated by a trusted source.

Saving history

If you loaded all your data using the IO functions contained in peakdet then your Physio objects should have a complete history. If that’s the case, you can avoid saving a duplicate copy of your entire data structure and just save the history! To do this we can use peakdet.save_history():

>>> from peakdet import save_history
>>> print(data)
Physio(size=240000, fs=250.0)
>>> path = save_history('out.json', data)

The history is saved as a JSON file. If you’re unfamiliar, JSON files are plain text files that can store lists and dictionaries–which is exactly what the history is!

We can then load in the history (and recreate the Physio object it described) with peakdet.load_history():

>>> from peakdet import load_history
>>> reloaded_data = load_history('out.json')
>>> print(reloaded_data)
Physio(size=240000, fs=250.0)

The data object contains all the processing steps (including manual edits!) that were performed on the original physiological data.

Relative paths in history

While the saved history file (in the above example, out.json) can be stored anywhere (next to the raw data file typically makes sense!), extra care must be taken when loading it back in. Because the history file contains a path to the raw data file you must ensure that it is loaded with load_history() from the same directory in which the raw data were originally loaded.

Let’s say that we have a directory tree that looks like the following:

./experiment
├── code/
│   └── preprocess.py
└── data/
    └── sub-001/
        └── PPG.csv

We navigate to this directory (cd experiment) and run python code/preprocess.py, which generates a history file:

./experiment
├── code/
│   └── preprocess.py
└── data/
    └── sub-001/
        ├── PPG.csv
        └── PPG_history.json

Now, say we zip the entire experiment directory to send to a collaborator who wants to run some analyses on our processed data. If they want to regenerate the Physio objects we created from the saved history files, they must call load_history() from within the experiment directory—calling it from anywhere else in the directory tree will result in a FileNotFoundError.

Note

In order to be able to reproducibly regenerate data using history files, you need to ensure that you load your data using relative paths from the get-go!

Deriving physiological metrics

Once you’ve processed your physiological data, chances are you want to use it to calculate some derived metrics. peakdet currently only support metrics related to heart rate variability, accessible through the peakdet.HRV class.

Assuming you have a Physio object that contains some sort of heart data and you’ve performed peak detection on it, you can provide it directly to the HRV class:

>>> from peakdet import HRV
>>> metrics = HRV(data)
>>> print(f'{metrics.rmssd:.2f} ms')
26.61 ms

The HRV class contains many common heart rate variability metrics including the root mean square of successive differences, as shown above. It also calculates the R-R interval time series from the provided data, which can be accessed via the rrint attribute. The corresponding HRV.rrtime attribute details the times at which these intervals occurred.

Take a look at the API for more information.

API

Physiological data

class peakdet.Physio(data, fs=None, history=None, metadata=None)[source]

Class to hold physiological data and relevant information

Parameters:
  • data (array_like) – Input data array
  • fs (float, optional) – Sampling rate of data (Hz). Default: None
  • history (list of tuples, optional) – Functions performed on data. Default: None
  • metadata (dict, optional) – Metadata associated with data. Default: None
data

Physiological waveform

Type:numpy.ndarray
fs

Sampling rate of data in Hz

Type:float
history

History of functions that have been performed on data, with relevant parameters provided to functions.

Type:list of tuples
peaks

Indices of peaks in data

Type:numpy.ndarray
troughs

Indices of troughs in data

Type:numpy.ndarray

Loading data

peakdet.load_physio(data, *, fs=None, dtype=None, history=None, allow_pickle=False)[source]

Returns Physio object with provided data

Parameters:
  • data (str or array_like or Physio_like) – Input physiological data. If array_like, should be one-dimensional
  • fs (float, optional) – Sampling rate of data. Default: None
  • dtype (data_type, optional) – Data type to convert data to, if conversion needed. Default: None
  • history (list of tuples, optional) – Functions that have been performed on data. Default: None
  • allow_pickle (bool, optional) – Whether to allow loading if data contains pickled objects. Default: False
Returns:

data – Loaded physiological data

Return type:

peakdet.Physio

Raises:

TypeError – If provided data is unable to be loaded

peakdet.load_history(file, verbose=False)[source]

Loads history from file and replays it, creating new Physio instance

Parameters:
  • file (str) – Path to input JSON file
  • verbose (bool, optional) – Whether to print messages as history is being replayed. Default: False
Returns:

file – Full filepath to saved output

Return type:

str

peakdet.load_rtpeaks(fname, channel, fs)[source]

Loads data file as obtained from the rtpeaks Python module

Data file fname should have a single, comma-delimited header of format:

time,channel#,channel#,…,channel#

Raw data should be stored in columnar format, also comma-delimited, beneath this header. All data should be stored as integers. For more information, see the rtpeaks homepage: https://github.com/rmarkello/rtpeaks.

Parameters:
  • fname (str) – Path to data file to be loaded
  • channel (int) – Integer corresponding to the channel number in fname from which data should be loaded
  • fs (float) – Sampling rate at which fname was acquired
Returns:

data – Loaded physiological data

Return type:

peakdet.Physio

Processing data

Functions for processing and interpreting physiological data

peakdet.operations.interpolate_physio(data, target_fs, *, kind='cubic')[source]

Interpolates data to desired sampling rate target_fs

Parameters:
  • data (Physio_like) – Input physiological data to be interpolated
  • target_fs (float) – Desired sampling rate for data
  • kind (str or int, optional) – Type of interpolation to perform. Must be one of available kinds in scipy.interpolate.interp1d(). Default: ‘cubic’
Returns:

interp – Interpolated input data

Return type:

peakdet.Physio

peakdet.operations.filter_physio(data, cutoffs, method, *, order=3)[source]

Applies an order-order digital method Butterworth filter to data

Parameters:
  • data (Physio_like) – Input physiological data to be filtered
  • cutoffs (int or list) – If method is ‘lowpass’ or ‘highpass’, an integer specifying the lower or upper bound of the filter (in Hz). If method is ‘bandpass’ or ‘bandstop’, a list specifying the lower and upper bound of the filter (in Hz).
  • method ({'lowpass', 'highpass', 'bandpass', 'bandstop'}) – The type of filter to apply to data
  • order (int, optional) – Order of filter to be applied. Default: 3
Returns:

filtered – Filtered input data

Return type:

peakdet.Physio

peakdet.operations.peakfind_physio(data, *, thresh=0.2, dist=None)[source]

Performs peak and trough detection on data

Parameters:
  • data (Physio_like) – Input data in which to find peaks
  • thresh (float [0,1], optional) – Relative height threshold a data point must surpass to be classified as a peak. Default: 0.2
  • dist (int, optional) – Distance in indices that peaks must be separated by in data. If None, this is estimated. Default: None
Returns:

peaks – Input data with detected peaks and troughs

Return type:

peakdet.Physio

peakdet.operations.plot_physio(data, *, ax=None)[source]

Plots data and associated peaks / troughs

Parameters:
  • data (Physio_like) – Physiological data to plot
  • ax (matplotlib.axes.Axes, optional) – Axis on which to plot data. If None, a new axis is created. Default: None
Returns:

ax – Axis with plotted data

Return type:

matplotlib.axes.Axes

peakdet.operations.edit_physio(data)[source]

Opens interactive plot with data to permit manual editing of time series

Parameters:data (Physio_like) – Physiological data to be edited
Returns:edited – Input data with manual edits
Return type:peakdet.Physio

Saving data

peakdet.save_physio(fname, data)[source]

Saves data to fname

Parameters:
  • fname (str) – Path to output file; .phys will be appended if necessary
  • data (Physio_like) – Data to be saved to file
Returns:

fname – Full filepath to saved output

Return type:

str

peakdet.save_history(file, data)[source]

Saves history of physiological data to file

Saved file can be replayed with peakdet.load_history

Parameters:
  • file (str) – Path to output file; .json will be appended if necessary
  • data (Physio_like) – Data with history to be saved to file
Returns:

file – Full filepath to saved output

Return type:

str

Derived metrics

class peakdet.HRV(data)[source]

Class for calculating various HRV statistics

Parameters:data (Physio_like) – Physiological data object with detected peaks and troughs
rrint

R-R intervals derived from data (sometimes referred to as N-N intervals in derived metrics)

Type:numpy.ndarray
rrtime

Time stamps of rrint

Type:numpy.ndarray
avgnn

Average heart rate (N-N interval)

Type:float
sdnn

Standard deviation of heart rate (N-N intervals)

Type:float
rmssd

Root mean square of successive differences

Type:float
sdsd

Standard deviation of successive differences

Type:float
nn50

Number of N-N intervals greater than 50ms

Type:float
pnn50

Percent of N-N intervals greater than 50ms

Type:float
nn20

Number of N-N intervals greater than 20ms

Type:float
pnn20

Percent of N-N intervals greater than 20ms

Type:float
hf

High-frequency power of R-R intervals, summed across 0.15-0.40 Hz

Type:float
hf_log

Log of hf

Type:float
lf

Low-frequency power of R-R intervals, summed across 0.04-0.15 Hz

Type:float
lf_log

Log of lf

Type:float
vlf

Very low frequency power of R-R intervals, summed across 0-0.04 Hz

Type:float
vlf_log

Log of vlf

Type:float
lftohf

Ratio of lf over hf

Type:float
hf_peak

Peak frequency in hf band (0.15-0.40 Hz)

Type:float
lf_peak

Peak frequency in lf band (0.04-0.15 Hz)

Type:float

Notes

Uses scipy.signal.welch for calculation of frequency-based statistics