Description
Is your feature request related to a problem? Please describe.
I'm working with periodic time series data (e.g., EGG data) sampled at a high frequency, and I often need to lower the sampling frequency or reduce the number of time points. It would be great to have built-in subsampling methods for periodic time series in tslearn.
This would make it much easier to preprocess data like EGG signals directly within the library.
Describe the solution you'd like
I would like to see built-in subsampling methods for time series in `tslearn`.
Requested methods include:
- Simple sampling frequency change by applying a reduction such as 'mean', 'max', 'overlapping_mean', or 'nth' (uniform selection of every N-th point). For example:
import numpy as np


def subsample_data(time_series, reduction_factor=2, method='nth'):
    """
    Subsample a 1D time series by an integer reduction factor.

    Parameters:
        time_series (numpy array): The original 1D data.
        reduction_factor (int): Number of original samples per output sample
            (e.g., 250 to go from 250 Hz to 1 Hz).
        method (str): The method to use for subsampling. Options are 'mean',
            'max', 'overlapping_mean', or 'nth'.

    Returns:
        numpy array: The subsampled data.
    """
    time_series = np.asarray(time_series)
    # Drop trailing samples that do not fill a complete block, so reshape works
    usable = (len(time_series) // reduction_factor) * reduction_factor
    if method == 'mean':
        # Take the mean of every 'reduction_factor' samples
        subsampled_time_series = np.mean(time_series[:usable].reshape(-1, reduction_factor), axis=1)
    elif method == 'max':
        # Take the max of every 'reduction_factor' samples
        subsampled_time_series = np.max(time_series[:usable].reshape(-1, reduction_factor), axis=1)
    elif method == 'overlapping_mean':
        # Take the mean of windows 1.5x 'reduction_factor' long, moved in steps
        # of 'reduction_factor', so that neighbouring windows overlap
        window = reduction_factor + reduction_factor // 2
        subsampled_time_series = np.array([np.mean(time_series[i: i + window]) for i in range(0, len(time_series), reduction_factor)])
    elif method == 'nth':
        # Take every 'reduction_factor'-th sample
        subsampled_time_series = time_series[::reduction_factor]
    else:
        raise ValueError(f"Unknown method: {method!r}")
    return subsampled_time_series
- Amplitude-preserving subsampling methods specific to periodic time series and inspired by signal processing, such as applying a 'Nyquist' (anti-aliasing low-pass) filter before subsampling. This option would make it easier to preprocess and analyze periodic time series data directly with `tslearn`. For example:
import numpy as np
from scipy import signal


def nyquist_subsample(time_series, reduction_factor, sampling_rate):
    """
    Downsample a time series using the Nyquist principle with anti-aliasing.

    Parameters:
        time_series (np.ndarray): The input time series (1D or 2D, with time
            assumed to run along axis 0).
        reduction_factor (int): The factor by which to reduce the number of samples.
        sampling_rate (float): The original sampling rate (Hz).

    Returns:
        numpy array: The subsampled data.
    """
    time_series = np.asarray(time_series, dtype=float)
    if reduction_factor < 2:
        # A factor of 1 gives a normalized cutoff of 1.0, which butter rejects
        raise ValueError("reduction_factor must be an integer >= 2")
    # Calculate the new sampling rate after reduction
    new_sampling_rate = sampling_rate / reduction_factor
    # Design a low-pass Butterworth filter to prevent aliasing;
    # the cutoff is the Nyquist frequency of the new sampling rate
    nyquist = 0.5 * sampling_rate
    cutoff = 0.5 * new_sampling_rate
    normalized_cutoff = cutoff / nyquist
    b, a = signal.butter(N=5, Wn=normalized_cutoff, btype='low')
    # Apply the zero-phase filter along the time axis
    filtered = signal.filtfilt(b, a, time_series, axis=0)
    # Select every Nth sample along the time axis
    subsampled_time_series = filtered[::reduction_factor]
    return subsampled_time_series
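
To illustrate why the second requested method filters before dropping samples, here is a minimal sketch comparing plain every-N-th-point selection with filtered downsampling. It uses `scipy.signal.decimate`, which applies an anti-aliasing filter and then downsamples. The test signal, the sampling rates, and the helper `amplitude_at` are assumptions made up for this illustration and are not part of tslearn.

```python
import numpy as np
from scipy import signal

fs = 250.0   # assumed original sampling rate (Hz)
q = 5        # reduction factor: new rate 50 Hz, new Nyquist frequency 25 Hz
t = np.arange(2500) / fs   # 10 seconds of samples

# A 3 Hz component to keep, plus a 40 Hz component above the new Nyquist frequency.
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# No filtering: the 40 Hz component folds back to |40 - 50| = 10 Hz.
naive = x[::q]
# Low-pass filter first (zero-phase), then keep every q-th sample.
filtered = signal.decimate(x, q, zero_phase=True)


def amplitude_at(y, rate, freq):
    """Hypothetical helper: single-sided FFT amplitude of `y` at the bin nearest `freq` Hz."""
    spectrum = np.abs(np.fft.rfft(y)) * 2 / len(y)
    freqs = np.fft.rfftfreq(len(y), d=1 / rate)
    return spectrum[np.argmin(np.abs(freqs - freq))]


new_fs = fs / q
print("10 Hz alias, every 5th point:", amplitude_at(naive, new_fs, 10.0))
print("10 Hz alias, filtered first: ", amplitude_at(filtered, new_fs, 10.0))
print("3 Hz amplitude, filtered:    ", amplitude_at(filtered, new_fs, 3.0))
```

With plain selection, the 40 Hz component shows up as a spurious 10 Hz oscillation of roughly its original amplitude, while the filtered version should suppress it and leave the 3 Hz component close to its original amplitude.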
This functionality would be especially useful for biomedical signals (e.g., EGG or EKG data) and other periodic time series where both frequency and amplitude information are important.
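
Since tslearn datasets are typically arrays of shape `(n_ts, sz, d)`, a built-in version would probably need to reduce along the time axis for all series and dimensions at once. The following is only a rough sketch of how the block-wise 'mean' and 'max' variants could look in that layout; the function name `block_reduce_dataset` and its parameters are hypothetical and not an existing tslearn API.

```python
import numpy as np


def block_reduce_dataset(X, reduction_factor=2, reducer=np.mean):
    """Hypothetical helper: reduce the time axis of a (n_ts, sz, d) dataset block-wise."""
    X = np.asarray(X, dtype=float)
    n_ts, sz, d = X.shape
    # Drop trailing time points that do not fill a complete block.
    usable = (sz // reduction_factor) * reduction_factor
    if usable == 0:
        raise ValueError("reduction_factor is larger than the series length")
    # Split the time axis into (number of blocks, block length) and reduce each block.
    blocks = X[:, :usable, :].reshape(n_ts, usable // reduction_factor, reduction_factor, d)
    return reducer(blocks, axis=2)


# Two univariate series of length 7, reduced by a factor of 3.
X = np.arange(14, dtype=float).reshape(2, 7, 1)
X_mean = block_reduce_dataset(X, reduction_factor=3)
X_max = block_reduce_dataset(X, reduction_factor=3, reducer=np.max)
print(X_mean.shape)      # (2, 2, 1)
print(X_mean[0, :, 0])   # [1. 4.]
print(X_max[1, :, 0])    # [ 9. 12.]
```

Passing the reducer as a callable is one possible design; a string option such as `method='mean'`, as in the examples above, would work equally well.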