peakdet: A toolbox for physiological peak detection analyses¶
This package is designed for use in the reproducible processing and analysis of physiological data, like those collected from respiratory belts, pulse photoplethysmography, or electrocardiogram (ECG/EKG) monitors.
Overview¶
Physiological data are messy and prone to artifact (e.g., movement in respiration and pulse, ectopic beats in ECG). Despite leaps and bounds in recent algorithms for processing these data there still exists a need for manual inspection to ensure such artifacts have been appropriately removed. Because of this manual intervention step, understanding exactly what happened to go from “raw” data to “analysis-ready” data can often be difficult or impossible.
This toolbox, peakdet
, aims to provide a set of tools that will work with a
variety of input data to reproducibly generate manually-corrected, analysis-
ready physiological data. If you’d like more information about the package,
including how to install it and some example instructions on its use, check out
our documentation!
Development and getting involved¶
This package has been largely developed in the spare time of a single graduate student (@rmarkello) with help from some incredible contributors. While it would be amazing if anyone else finds it helpful, given the limited time constraints of graduate school, the current package is not currently accepting requests for new features.
However, if you’re interested in getting involved in the project, we’re thrilled to welcome new contributors! You should start by reading our contributing guidelines and code of conduct. Once you’re done with that, take a look at our issues to see if there’s anything you might like to work on. Alternatively, if you’ve found a bug, are experiencing a problem, or have a question, create a new issue with some information about it!
License Information¶
This codebase is licensed under the Apache License, Version 2.0. The full
license can be found in the LICENSE file in the peakdet
distribution. You may also
obtain a copy of the license at: http://www.apache.org/licenses/LICENSE-2.0.
Contents¶
Installation and setup¶
Basic installation¶
This package requires Python >= 3.6. Assuming you have the correct version of
Python installed, you can install peakdet
by opening a terminal and running
the following:
git clone https://github.com/physiopy/peakdet.git
cd peakdet
python setup.py install
User guide¶
The Physio
data object¶
The primary funtionality of peakdet
relies on its operations being
performed on physiological data loaded into a peakdet.Physio
object. So, before we get into using peakdet
, its best to understand
a little bit about this helper class!
All you need to create a Physio
object is a data array:
>>> import numpy as np
>>> from peakdet import Physio
>>> data = np.random.rand(5000)
>>> phys = Physio(data)
>>> print(phys)
Physio(size=5000, fs=nan)
However, it is strongly recommended that you provide the sampling rate of the
data as well. Many functions in peakdet
require this information:
>>> sampling_rate = 100.0
>>> phys = Physio(data, fs=sampling_rate)
>>> print(phys)
Physio(size=5000, fs=100.0)
A Physio
object is designed to be lightweight, and mostly exists
to store raw physiological data and its corresponding sampling rate in the same
place. In most instances it can be treated like a one-dimensional
numpy.ndarray
; the underlying data can be accessed via slicing and
the object can be passed directly to most numpy
functions:
>>> phys[:5]
array([0.19151945, 0.62210877, 0.43772774, 0.78535858, 0.77997581])
>>> np.mean(phys)
0.4991523987519155
Beyond being a simple container, however, Physio
objects have a
few attributes that are of interest when working with real physiological data.
Importantly, they have a history
that records all
operations performed on the data:
>>> from peakdet import operations
>>> phys = operations.filter_physio(phys, cutoffs=0.1, method='lowpass')
>>> phys.history
[('filter_physio', {'cutoffs': 0.1, 'method': 'lowpass'})]
Moreover, if you perform peak finding on a Physio
object it will
store the indices of the detected peaks
and
troughs
alongside the object:
>>> phys = operations.peakfind_physio(phys)
>>> phys.peaks
array([ 477, 2120, 3253])
>>> phys.troughs
array([1413, 2611])
Next, we’ll move on to how you can load your data into a Physio
object in a more reproducible manner. Feel free to refer to the API
for more information.
Reproducibly loading data¶
In order to ensure that a data analysis pipeline with peakdet
is
fully reproducible from start to finish, it is recommended that you load data
with the built-in peakdet
IO functions.
The peakdet.load_physio()
function is the most simple of these
functions, and accepts data stored as single-column text file. For example, if
we have a file ECG.csv that we might normally load with numpy
:
>>> import numpy as np
>>> np.loadtxt('ECG.csv')
array([ 1.66656, 1.53076, 1.38153, ..., -0.05188, -0.05249, -0.05554])
we can instead load it into a Physio
object in one step with:
>>> from peakdet import load_physio
>>> ecg = load_physio('ECG.csv', fs=1000.)
>>> print(ecg)
Physio(size=44611, fs=1000.0)
>>> ecg.data
array([ 1.66656, 1.53076, 1.38153, ..., -0.05188, -0.05249, -0.05554])
This way, the loading of the data is retained in the object’s history:
>>> ecg.history
[('load_physio', {'allow_pickle': False, 'data': 'ECG.csv', 'dtype': None, 'fs': 1000.0, 'history': None})]
There are also a number of functions for loading data from “standard” formats. For example, if your data were collected using the rtpeaks module, it might look like this:
>>> import pandas as pd
>>> pd.read_csv('rtpeaks.csv').head()
time channel1 channel2 channel9
0 1.0 4.984436 0.020752 -0.333862
1 2.0 4.984131 0.021057 -0.328979
2 3.0 4.984131 0.021057 -0.325623
3 4.0 4.984436 0.021057 -0.323792
4 5.0 4.983826 0.025024 -0.319519
Instead, you can load it with peakdet.load_rtpeaks()
so that it is
recorded in the object’s history:
>>> from peakdet import load_rtpeaks
>>> ecg = load_rtpeaks('rtpeaks.csv', fs=1000., channel=9)
>>> print(ecg)
Physio(size=40, fs=1000.0)
>>> ecg[:5]
array([-0.3338623 , -0.32897949, -0.32562256, -0.3237915 , -0.31951904])
>>> ecg.history
[('load_rtpeaks', {'channel': 9, 'fname': 'rtpeaks.csv', 'fs': 1000.0})]
Processing physiological data¶
There are a few common processing steps often performed when working with physiological data:
We have already seen that peakdet.operations
has functions to perform
a few of these steps, but it is worth going into all of them in a bit more
detail:
Visual inspection¶
One of the first steps to do with raw data is visually inspect it. No amount of
processing can fix bad data, and so it’s good to check that your data quality
is appropriate before continuing. Plotting data can be achieved with
plot_physio()
:
>>> from peakdet import load_physio, operations
>>> data = load_physio('ECG.csv', fs=1000.0)
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10) # doctest: +SKIP

For now this will simply plot the raw waveform, but we’ll see later how this function has some added benefits.
Interpolation¶
Raw data can often be collected at a sampling rate above what is biologically
meaningful. For example, human respiration is relatively slow, so acquiring it
at, for example, 250 Hz is often more than sufficient; when it is acquired at a
higher rate it can be quite noisy. In this case, we might want to interpolate
(or decimate) the data to a lower sampling rate, which can be done with
interpolate_physio()
:
>>> data = load_physio('RESP.csv', fs=1000.)
>>> print(data)
Physio(size=24000, fs=1000.0)
>>> data = operations.interpolate_physio(data, target_fs=250.)
>>> print(data)
Physio(size=6000, fs=250.0)
Note that the size of the data decreased by a factor of four (24000 to 6000), the same as the decrease in sampling rate.
Data can also be upsampled via interpolation, though care must be taken in interpreting the results of such a procedure:
>>> data = load_physio('PPG.csv', fs=25.0)
>>> print(data)
Physio(size=24000, fs=25.0)
>>> data = operations.interpolate_physio(data, target_fs=250.0)
>>> print(data)
Physio(size=240000, fs=250.0)
Temporal filtering¶
Once our data is at an appropriate sampling rate, we may want to apply a
temporal filter with filter_physio()
. This function
supports lowpass, highpass, bandpass, and bandstop filters with user-specified
frequency cutoffs. First, let’s take a look at our interpolated PPG data:
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10) # doctest: +SKIP

If we’re going to do peak detection, it would be great to get rid of the venous pulsations in the waveform to avoid potentially picking them up. If we apply a lowpass filter with a 1.0 Hz cutoff we can do just that:
>>> data = operations.filter_physio(data, cutoffs=1.0, method='lowpass')
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10) # doctest: +SKIP

Filter settings are highly dependent on the data, so visually confirming that the filter is performing as expected is important!
Peak detection¶
Many physiological processing pipelines requiring performing peak detection on
the data (e.g., to calculate heart rate, respiratory rate, pulse rate). That
process can be accomplished with peakfind_physio()
:
>>> data = operations.peakfind_physio(data, thresh=0.1, dist=100)
>>> data.peaks[:10]
array([ 164, 529, 901, 1278, 1628, 1983, 2381, 2774, 3153, 3486])
>>> data.troughs[:10]
array([ 356, 732, 1111, 1465, 1817, 2205, 2603, 2989, 3330, 3677])
The peaks
and troughs
attributes mark
the indices of the detected peaks and troughs in the data; these can be
converted to time series by dividing by the sampling frequency:
>>> data.peaks[:10] / data.fs
array([ 0.656, 2.116, 3.604, 5.112, 6.512, 7.932, 9.524, 11.096,
12.612, 13.944])
>>> data.troughs[:10] / data.fs
array([ 1.424, 2.928, 4.444, 5.86 , 7.268, 8.82 , 10.412, 11.956,
13.32 , 14.708])
Once these attributes are instantiated, subsequent calls to
plot_physio()
will denote peaks with red dots and troughs
with green dots to aid visual inspection:
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10) # doctest: +SKIP

Manually editing data¶
Physiological data are messy and prone to artifact (e.g., movement in respiration and pulse, ectopic beats in ECG). Despite leaps and bounds in recent algorithms for processing these data there still exists a need for manual inspection to ensure such artifacts have been appropriately removed.
To do this reproducibly, however, we want to ensure there is a record of these
manual changes. The edit_physio()
function was designed
for this exact purpose. Invoking it will open up an interactive viewer that
allows you to remove peaks from noisy portions of the time series, or delete
peaks/troughs that were erroneously detected.
First, let’s process some pulse photoplethysmography data collected at 25 Hz:
>>> from peakdet import load_physio, operations
>>> data = load_physio('PPG.csv', fs=25.0)
>>> data = operations.interpolate_physio(data, target_fs=250.0)
>>> data = operations.filter_physio(data, cutoffs=1.0, method='lowpass')
>>> data = operations.peakfind_physio(data, thresh=0.1, dist=100)
>>> ax = operations.plot_physio(data)
>>> ax.set_xlim(0, 10) # doctest: +SKIP

The data looks good, but it would be nice to go through it all quickly and
ensure that there are no noisy sections or peaks/troughs that need to be
removed. We can do that easily with edit_physio()
:
>>> data = operations.edit_physio(data)
This function will open up an interactive viewer, which supports scrolling through the time series (with the scroll wheel), rejection of noisy segments of data (left click + drag, red highlight), and deleting peaks / troughs that were erroneously detected and shouldn’t be considered at all (right click + drag, blue highlight):

If you accidentally reject or delete peaks, you can undo the selection with
ctrl+z
(or command+z
if you’re on a Mac). The interactive viewer will
track your history of edits as long as it is open, so you can undo multiple
selections, if desired. Once done, you can close the interactive viewer by
pressing ctrl+q
(command+q
).
All edits performed in the editor are stored in the history
of the data object, as with other operations, ensuring a record of your manual
changes are retained.
Reproducibly saving data¶
Once you’ve gone through preprocessing and manually editing your data, you’ll
likely want to save your work. peakdet
provides two ways to save your
outputs, depending on your data storage needs.
Duplicating data¶
If you don’t mind storing multiple copies of your data, you can simply save the
Physio
object directly using peakdet.save_physio()
:
>>> from peakdet import save_physio
>>> path = save_physio('out.phys', data)
If later on you want to reload the processed data you can do so:
>>> from peakdet import load_physio
>>> data = load_physio('out.phys', allow_pickle=True)
Warning
peakdet.save_physio()
uses numpy.savez()
to save the data
objects, meaning the generated files can actually be quite a bit larger
than the original data, themselves! Moreover, you NEED to pass the
allow_pickle=True
parameter; this is turned off by default for safety
reasons, as you should never load pickle files not generated by a trusted
source.
Saving history¶
If you loaded all your data using the IO functions contained in
peakdet
then your Physio
objects should have a
complete history
. If that’s the case, you can avoid saving
a duplicate copy of your entire data structure and just save the history! To do
this we can use peakdet.save_history()
:
>>> from peakdet import save_history
>>> print(data)
Physio(size=240000, fs=250.0)
>>> path = save_history('out.json', data)
The history is saved as a JSON file. If you’re unfamiliar, JSON files are plain text files that can store lists and dictionaries–which is exactly what the history is!
We can then load in the history (and recreate the Physio
object
it described) with peakdet.load_history()
:
>>> from peakdet import load_history
>>> reloaded_data = load_history('out.json')
>>> print(reloaded_data)
Physio(size=240000, fs=250.0)
The data
object contains all the processing steps (including manual edits!)
that were performed on the original physiological data.
Relative paths in history¶
While the saved history file (in the above example, out.json
) can be stored
anywhere (next to the raw data file typically makes sense!), extra care must be
taken when loading it back in. Because the history file contains a path to the
raw data file you must ensure that it is loaded with load_history()
from the same directory in which the raw data were originally loaded.
Let’s say that we have a directory tree that looks like the following:
./experiment
├── code/
│ └── preprocess.py
└── data/
└── sub-001/
└── PPG.csv
We navigate to this directory (cd experiment
) and run python
code/preprocess.py
, which generates a history file:
./experiment
├── code/
│ └── preprocess.py
└── data/
└── sub-001/
├── PPG.csv
└── PPG_history.json
Now, say we zip the entire experiment
directory to send to a collaborator
who wants to run some analyses on our processed data. If they want to
regenerate the Physio
objects we created from the saved history
files, they must call load_history()
from within the experiment
directory—calling it from anywhere else in the directory tree will result in
a FileNotFoundError
.
Note
In order to be able to reproducibly regenerate data using history files, you need to ensure that you load your data using relative paths from the get-go!
Deriving physiological metrics¶
Once you’ve processed your physiological data, chances are you want to use it
to calculate some derived metrics. peakdet
currently only support
metrics related to heart rate variability, accessible through the peakdet.HRV
class.
Assuming you have a Physio
object that contains some sort of
heart data and you’ve performed peak detection on it, you can provide it
directly to the HRV
class:
>>> from peakdet import HRV
>>> metrics = HRV(data)
>>> print(f'{metrics.rmssd:.2f} ms')
26.61 ms
The HRV
class contains many common heart rate variability metrics
including the root mean square of successive differences, as shown above. It
also calculates the R-R interval time series from the provided data, which can
be accessed via the rrint
attribute. The corresponding
HRV.rrtime
attribute details the times at which these intervals
occurred.
Take a look at the API for more information.
API¶
Physiological data¶
-
class
peakdet.
Physio
(data, fs=None, history=None, metadata=None)[source]¶ Class to hold physiological data and relevant information
Parameters: - data (array_like) – Input data array
- fs (float, optional) – Sampling rate of data (Hz). Default: None
- history (list of tuples, optional) – Functions performed on data. Default: None
- metadata (dict, optional) – Metadata associated with data. Default: None
-
data
¶ Physiological waveform
Type: numpy.ndarray
-
fs
¶ Sampling rate of data in Hz
Type: float
-
history
¶ History of functions that have been performed on data, with relevant parameters provided to functions.
Type: list of tuples
-
peaks
¶ Indices of peaks in data
Type: numpy.ndarray
-
troughs
¶ Indices of troughs in data
Type: numpy.ndarray
Loading data¶
-
peakdet.
load_physio
(data, *, fs=None, dtype=None, history=None, allow_pickle=False)[source]¶ Returns Physio object with provided data
Parameters: - data (str or array_like or Physio_like) – Input physiological data. If array_like, should be one-dimensional
- fs (float, optional) – Sampling rate of data. Default: None
- dtype (data_type, optional) – Data type to convert data to, if conversion needed. Default: None
- history (list of tuples, optional) – Functions that have been performed on data. Default: None
- allow_pickle (bool, optional) – Whether to allow loading if data contains pickled objects. Default: False
Returns: data – Loaded physiological data
Return type: Raises: TypeError
– If provided data is unable to be loaded
-
peakdet.
load_history
(file, verbose=False)[source]¶ Loads history from file and replays it, creating new Physio instance
Parameters: - file (str) – Path to input JSON file
- verbose (bool, optional) – Whether to print messages as history is being replayed. Default: False
Returns: file – Full filepath to saved output
Return type: str
-
peakdet.
load_rtpeaks
(fname, channel, fs)[source]¶ Loads data file as obtained from the
rtpeaks
Python moduleData file fname should have a single, comma-delimited header of format:
time,channel#,channel#,…,channel#Raw data should be stored in columnar format, also comma-delimited, beneath this header. All data should be stored as integers. For more information, see the
rtpeaks
homepage: https://github.com/rmarkello/rtpeaks.Parameters: - fname (str) – Path to data file to be loaded
- channel (int) – Integer corresponding to the channel number in fname from which data should be loaded
- fs (float) – Sampling rate at which fname was acquired
Returns: data – Loaded physiological data
Return type:
Processing data¶
Functions for processing and interpreting physiological data
-
peakdet.operations.
interpolate_physio
(data, target_fs, *, kind='cubic')[source]¶ Interpolates data to desired sampling rate target_fs
Parameters: - data (Physio_like) – Input physiological data to be interpolated
- target_fs (float) – Desired sampling rate for data
- kind (str or int, optional) – Type of interpolation to perform. Must be one of available kinds in
scipy.interpolate.interp1d()
. Default: ‘cubic’
Returns: interp – Interpolated input data
Return type:
-
peakdet.operations.
filter_physio
(data, cutoffs, method, *, order=3)[source]¶ Applies an order-order digital method Butterworth filter to data
Parameters: - data (Physio_like) – Input physiological data to be filtered
- cutoffs (int or list) – If method is ‘lowpass’ or ‘highpass’, an integer specifying the lower or upper bound of the filter (in Hz). If method is ‘bandpass’ or ‘bandstop’, a list specifying the lower and upper bound of the filter (in Hz).
- method ({'lowpass', 'highpass', 'bandpass', 'bandstop'}) – The type of filter to apply to data
- order (int, optional) – Order of filter to be applied. Default: 3
Returns: filtered – Filtered input data
Return type:
-
peakdet.operations.
peakfind_physio
(data, *, thresh=0.2, dist=None)[source]¶ Performs peak and trough detection on data
Parameters: - data (Physio_like) – Input data in which to find peaks
- thresh (float [0,1], optional) – Relative height threshold a data point must surpass to be classified as a peak. Default: 0.2
- dist (int, optional) – Distance in indices that peaks must be separated by in data. If None, this is estimated. Default: None
Returns: peaks – Input data with detected peaks and troughs
Return type:
-
peakdet.operations.
plot_physio
(data, *, ax=None)[source]¶ Plots data and associated peaks / troughs
Parameters: - data (Physio_like) – Physiological data to plot
- ax (
matplotlib.axes.Axes
, optional) – Axis on which to plot data. If None, a new axis is created. Default: None
Returns: ax – Axis with plotted data
Return type:
-
peakdet.operations.
edit_physio
(data)[source]¶ Opens interactive plot with data to permit manual editing of time series
Parameters: data (Physio_like) – Physiological data to be edited Returns: edited – Input data with manual edits Return type: peakdet.Physio
Saving data¶
-
peakdet.
save_physio
(fname, data)[source]¶ Saves data to fname
Parameters: - fname (str) – Path to output file; .phys will be appended if necessary
- data (Physio_like) – Data to be saved to file
Returns: fname – Full filepath to saved output
Return type: str
-
peakdet.
save_history
(file, data)[source]¶ Saves history of physiological data to file
Saved file can be replayed with peakdet.load_history
Parameters: - file (str) – Path to output file; .json will be appended if necessary
- data (Physio_like) – Data with history to be saved to file
Returns: file – Full filepath to saved output
Return type: str
Derived metrics¶
-
class
peakdet.
HRV
(data)[source]¶ Class for calculating various HRV statistics
Parameters: data (Physio_like) – Physiological data object with detected peaks and troughs -
rrint
¶ R-R intervals derived from data (sometimes referred to as N-N intervals in derived metrics)
Type: numpy.ndarray
-
rrtime
¶ Time stamps of rrint
Type: numpy.ndarray
-
avgnn
¶ Average heart rate (N-N interval)
Type: float
-
sdnn
¶ Standard deviation of heart rate (N-N intervals)
Type: float
-
rmssd
¶ Root mean square of successive differences
Type: float
-
sdsd
¶ Standard deviation of successive differences
Type: float
-
nn50
¶ Number of N-N intervals greater than 50ms
Type: float
-
pnn50
¶ Percent of N-N intervals greater than 50ms
Type: float
-
nn20
¶ Number of N-N intervals greater than 20ms
Type: float
-
pnn20
¶ Percent of N-N intervals greater than 20ms
Type: float
-
hf
¶ High-frequency power of R-R intervals, summed across 0.15-0.40 Hz
Type: float
-
hf_log
¶ Log of hf
Type: float
-
lf
¶ Low-frequency power of R-R intervals, summed across 0.04-0.15 Hz
Type: float
-
lf_log
¶ Log of lf
Type: float
-
vlf
¶ Very low frequency power of R-R intervals, summed across 0-0.04 Hz
Type: float
-
vlf_log
¶ Log of vlf
Type: float
-
lftohf
¶ Ratio of lf over hf
Type: float
-
hf_peak
¶ Peak frequency in hf band (0.15-0.40 Hz)
Type: float
-
lf_peak
¶ Peak frequency in lf band (0.04-0.15 Hz)
Type: float
Notes
Uses scipy.signal.welch for calculation of frequency-based statistics
-