GapRollForward

class tscv.GapRollForward(*, min_train_size=0, max_train_size=inf, min_test_size=1, max_test_size=1, gap_size=0, roll_size=None)[source]

A more flexible, and thus more powerful, version of walk-forward validation

New in version 0.1.

Provides train/test indices to split time series data samples that are observed at fixed time intervals into train/test sets. In each split, the test indices are higher than those of the previous split.

Parameters
min_train_size : int, default=0

Minimum size for the training set. Can be 0.

max_train_size : int, default=np.inf

Maximum size for the training set, also known as the window size.

min_test_size : int, default=1

Minimum size for the test set. Rolling stops when there are not enough remaining samples to form a test set of this size.

max_test_size : int, default=1

Maximum size for the test set. Keep it small so that a single split does not use up all of the remaining samples.

gap_size : int, default=0

The number of samples in the gap between the training set and the test set.

roll_size : int, default=`max_test_size`

The number of samples by which each split moves forward. The default value ensures that each data sample is used for testing at most once. A smaller value allows overlapping test sets (see the sketch after the examples below). It has a similar flavor to rolling back, but in the opposite direction.

Examples

>>> import numpy as np
>>> from tscv import GapRollForward
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> cv = GapRollForward()
>>> print(cv)
GapRollForward(gap_size=0, max_test_size=1, max_train_size=inf,
        min_test_size=1, min_train_size=0, roll_size=1)
>>> for train_index, test_index in cv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [] TEST: [0]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
>>> X = np.random.randn(10, 2)
>>> y = np.random.randn(10)
>>> cv = GapRollForward(min_train_size=1, max_train_size=3,
...                     min_test_size=1, max_test_size=3,
...                     gap_size=2, roll_size=2)
>>> for train_index, test_index in cv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [3 4 5]
TRAIN: [0 1 2] TEST: [5 6 7]
TRAIN: [2 3 4] TEST: [7 8 9]
TRAIN: [4 5 6] TEST: [9]
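
As a rough sketch (not part of the original doctest output), setting roll_size smaller than max_test_size makes consecutive test windows overlap, because each split advances by fewer samples than it tests on:

import numpy as np
from tscv import GapRollForward

X = np.random.randn(8, 2)

# Roll forward by 1 sample while testing on up to 2 samples,
# so consecutive test windows share a sample.
cv = GapRollForward(min_train_size=1, max_test_size=2, roll_size=1)
for train_index, test_index in cv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)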
get_n_splits(X=None, y=None, groups=None)[source]

Returns the number of splitting iterations in the cross-validator.

Parameters
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : object

Always ignored, exists for compatibility.

groups : object

Always ignored, exists for compatibility.

Returns
n_splits : int

Returns the number of splitting iterations in the cross-validator.
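
As a minimal sketch (reusing a six-sample array like the first example above), the number of splits can be queried before iterating:

import numpy as np
from tscv import GapRollForward

X = np.arange(12).reshape(6, 2)
cv = GapRollForward()       # defaults: one test sample per split
n_splits = cv.get_n_splits(X)
print(n_splits)             # consistent with the six splits shown in the example above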

split(X, y=None, groups=None)[source]

Generate indices to split data into training and test set.

Parameters
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Always ignored, exists for compatibility.

groups : array-like, with shape (n_samples,)

Always ignored, exists for compatibility.

Yields
train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.
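
Because the splitter exposes split and get_n_splits, it can also be passed to scikit-learn's model-selection utilities. A minimal sketch, assuming scikit-learn is installed and accepts this splitter as its cv argument (the estimator and parameter values are illustrative only):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from tscv import GapRollForward

X = np.random.randn(30, 2)
y = np.random.randn(30)

# Leave a 2-sample gap between train and test to reduce leakage
# from serially correlated observations.
cv = GapRollForward(min_train_size=5, gap_size=2, max_test_size=3)
scores = cross_val_score(Ridge(), X, y, cv=cv)
print(scores)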