Feature Request: Add Built-in Subsampling Methods #537

Open
@AnnaBobasheva

Description

Is your feature request related to a problem? Please describe.
I'm working with periodic time series data (e.g., EGG data) sampled at high frequency, where it is often necessary to change the sampling frequency or reduce the number of time points. Built-in subsampling methods for periodic time series in tslearn would make it much easier to preprocess data like EGG signals directly within the library.

Describe the solution you'd like
I would like to see built-in subsampling methods for time series in tslearn.
Requested methods include:

  • Simple sampling frequency change by aggregating windows of N points with a function such as 'mean', 'max', or 'overlapping mean', or by uniform selection of every N-th point ('nth'). For example:
import numpy as np


def subsample_data(time_series, reduction_factor=2, method='nth'):
    """
    Subsamples the data by an integer reduction factor.

    Parameters:
        time_series (numpy array): The original 1D data.
        reduction_factor (int): The factor by which to reduce the number of samples
            (e.g., 250 to go from 250 Hz to 1 Hz).
        method (str): The method to use for subsampling. Options are 'mean', 'max', 'overlapping_mean', or 'nth'.

    Returns:
        numpy array: The subsampled data.
    """
    time_series = np.asarray(time_series)
    # Drop trailing samples that do not fill a complete block, so that reshape cannot fail
    usable_length = (len(time_series) // reduction_factor) * reduction_factor

    if method == 'mean':
        # Take the mean of every 'reduction_factor' samples
        subsampled_time_series = np.mean(time_series[:usable_length].reshape(-1, reduction_factor), axis=1)
    elif method == 'max':
        # Take the max of every 'reduction_factor' samples
        subsampled_time_series = np.max(time_series[:usable_length].reshape(-1, reduction_factor), axis=1)
    elif method == 'overlapping_mean':
        # Take the mean of windows of size 1.5 * 'reduction_factor' that start every
        # 'reduction_factor' samples, so consecutive windows overlap
        window = reduction_factor + reduction_factor // 2
        subsampled_time_series = np.array([np.mean(time_series[i: i + window]) for i in range(0, len(time_series), reduction_factor)])
    elif method == 'nth':
        # Take every 'reduction_factor'-th sample
        subsampled_time_series = time_series[::reduction_factor]
    else:
        raise ValueError(f"Unknown method: {method!r}")

    return subsampled_time_series
  • Periodic time series specific, amplitude-preserving subsampling methods inspired by signal processing, such as applying a Nyquist (anti-aliasing) low-pass filter before subsampling. This option would make it easier to preprocess and analyze periodic time series data directly with tslearn. For example:
import numpy as np
from scipy import signal


def nyquist_subsample(time_series, reduction_factor, sampling_rate):
    """
    Downsample a time series using the Nyquist principle with anti-aliasing.

    Parameters
    ----------
    time_series (np.ndarray): The input time series (1D or 2D, with time along the last axis).
    reduction_factor (int): The factor by which to reduce the number of samples (must be >= 2).
    sampling_rate (float): The original sampling rate (Hz).

    Returns:
        numpy array: The subsampled data.
    """
    if reduction_factor < 2:
        # A factor of 1 would give a normalized cutoff of 1.0, which butter() rejects
        raise ValueError("reduction_factor must be an integer >= 2")
    time_series = np.asarray(time_series, dtype=float)

    # Calculate the new sampling rate after reduction
    new_sampling_rate = sampling_rate / reduction_factor

    # Design a low-pass Butterworth filter to prevent aliasing:
    # the cutoff is the Nyquist frequency of the reduced sampling rate
    nyquist = 0.5 * sampling_rate
    cutoff = 0.5 * new_sampling_rate
    normalized_cutoff = cutoff / nyquist
    b, a = signal.butter(N=5, Wn=normalized_cutoff, btype='low')

    # Apply the zero-phase filter along the time (last) axis
    filtered = signal.filtfilt(b, a, time_series, axis=-1)

    # Select every Nth sample along the time (last) axis
    subsampled_time_series = filtered[..., ::reduction_factor]

    return subsampled_time_series
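
For comparison, SciPy already provides `scipy.signal.decimate`, which combines an anti-aliasing low-pass filter with downsampling, so a built-in method could possibly wrap it instead of designing the filter by hand. The following is only a minimal sketch on a synthetic signal; the 250 Hz rate, the 2 Hz tone, and the factor of 5 are illustrative assumptions, not values taken from real EGG recordings:

```python
import numpy as np
from scipy import signal

sampling_rate = 250.0   # Hz (illustrative assumption)
reduction_factor = 5    # illustrative assumption

# 1000 samples (4 s) of a 2 Hz tone, far below the reduced Nyquist frequency of 25 Hz
t = np.arange(1000) / sampling_rate
x = np.sin(2 * np.pi * 2.0 * t)

# decimate low-pass filters the signal, then keeps every q-th sample;
# zero_phase=True filters forward and backward, so no phase shift is introduced.
y = signal.decimate(x, reduction_factor, ftype='iir', zero_phase=True)

# Reference: the same tone picked directly at the reduced rate.
reference = x[::reduction_factor]
max_error = np.max(np.abs(y - reference))
print(y.shape, max_error)
```

Because the tone lies well inside the filter's passband, the decimated signal should stay close to the reference, which is the amplitude-preserving behaviour requested above.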

This functionality would be especially useful for biomedical signals (e.g., EGG or EKG data) and other periodic time series where both frequency and amplitude information are important.
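
Since tslearn stores datasets as arrays of shape (n_ts, sz, d), a built-in version of the simple block reductions would probably need to operate along the time axis of such an array. A rough sketch of what that could look like follows; `block_subsample` is a hypothetical name used only for illustration and is not an existing tslearn function:

```python
import numpy as np


def block_subsample(dataset, reduction_factor=2, method='mean'):
    """Hypothetical helper: reduce the time axis of a (n_ts, sz, d) array."""
    dataset = np.asarray(dataset, dtype=float)
    n_ts, sz, d = dataset.shape

    if method == 'nth':
        # Keep every reduction_factor-th time point
        return dataset[:, ::reduction_factor, :]

    reducers = {'mean': np.mean, 'max': np.max}
    if method not in reducers:
        raise ValueError(f"Unknown method: {method!r}")

    # Drop trailing time points that do not fill a complete block
    usable = (sz // reduction_factor) * reduction_factor
    blocks = dataset[:, :usable, :].reshape(
        n_ts, usable // reduction_factor, reduction_factor, d
    )
    # Aggregate within each block of reduction_factor time points
    return reducers[method](blocks, axis=2)


# Two univariate series of length 7, reduced by a factor of 3
X = np.arange(2 * 7 * 1, dtype=float).reshape(2, 7, 1)
X_mean = block_subsample(X, 3, 'mean')
X_nth = block_subsample(X, 3, 'nth')
print(X_mean.shape, X_nth.shape)
```

Note that in this sketch 'mean' and 'max' discard an incomplete trailing block, while 'nth' still keeps its first sample, so the output lengths can differ by one; a real implementation would need to pick one convention.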
