GapRollForward

class tscv.GapRollForward(*, min_train_size=0, max_train_size=inf, min_test_size=1, max_test_size=1, gap_size=0, roll_size=None)[source]

A more flexible, and thus more powerful, version of walk-forward validation

New in version 0.1.

Provides train/test indices to split time series data samples that are observed at fixed time intervals into train/test sets. In each split, the test indices are higher than those of the previous split.

Parameters
min_train_size : int, default=0

Minimum size for the training set. Can be 0.

max_train_size : int, default=np.inf

Maximum size for the training set, also known as the window size.

min_test_size : int, default=1

Minimum size for the test set. Rolling stops when there are not enough remaining samples to form a test set of this size.

max_test_size : int, default=1

Maximum size for the test set. Keep it small so that a single split does not use up all of the remaining samples.

gap_size : int, default=0

The number of samples in the gap between the training set and the test set.

roll_size : int, default=`max_test_size`

The number of samples by which each split moves forward. The default value ensures that each data sample is used for testing at most once. A smaller value allows overlapping test sets (see the sketch after the examples below). It has a similar flavor to rolling back, but in the opposite direction.

Examples

>>> import numpy as np
>>> from tscv import GapRollForward
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> cv = GapRollForward()
>>> print(cv)
GapRollForward(gap_size=0, max_test_size=1, max_train_size=inf,
        min_test_size=1, min_train_size=0, roll_size=1)
>>> for train_index, test_index in cv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [] TEST: [0]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
>>> X = np.random.randn(10, 2)
>>> y = np.random.randn(10)
>>> cv = GapRollForward(min_train_size=1, max_train_size=3,
...                     min_test_size=1, max_test_size=3,
...                     gap_size=2, roll_size=2)
>>> for train_index, test_index in cv.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [3 4 5]
TRAIN: [0 1 2] TEST: [5 6 7]
TRAIN: [2 3 4] TEST: [7 8 9]
TRAIN: [4 5 6] TEST: [9]
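
As a rough sketch (not part of the original doctest output), setting roll_size smaller than max_test_size makes consecutive test windows overlap, because each split advances by fewer samples than it tests on:

import numpy as np
from tscv import GapRollForward

X = np.random.randn(8, 2)

# Roll forward by 1 sample while testing on up to 2 samples,
# so consecutive test windows share a sample.
cv = GapRollForward(min_train_size=1, max_test_size=2, roll_size=1)
for train_index, test_index in cv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)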
get_n_splits(X=None, y=None, groups=None)[source]

Returns the number of splitting iterations in the cross-validator.

Parameters
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : object

Always ignored, exists for compatibility.

groups : object

Always ignored, exists for compatibility.

Returns
n_splits : int

Returns the number of splitting iterations in the cross-validator.
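
As a minimal sketch (reusing a six-sample array like the first example above), the number of splits can be queried before iterating:

import numpy as np
from tscv import GapRollForward

X = np.arange(12).reshape(6, 2)
cv = GapRollForward()       # defaults: one test sample per split
n_splits = cv.get_n_splits(X)
print(n_splits)             # consistent with the six splits shown in the example above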

split(X, y=None, groups=None)[source]

Generate indices to split data into training and test set.

Parameters
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Always ignored, exists for compatibility.

groups : array-like, with shape (n_samples,)

Always ignored, exists for compatibility.

Yields
train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.
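
Because the splitter exposes split and get_n_splits, it can also be passed to scikit-learn's model-selection utilities. A minimal sketch, assuming scikit-learn is installed and accepts this splitter as its cv argument (the estimator and parameter values are illustrative only):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from tscv import GapRollForward

X = np.random.randn(30, 2)
y = np.random.randn(30)

# Leave a 2-sample gap between train and test to reduce leakage
# from serially correlated observations.
cv = GapRollForward(min_train_size=5, gap_size=2, max_test_size=3)
scores = cross_val_score(Ridge(), X, y, cv=cv)
print(scores)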