GapRollForward
- class tscv.GapRollForward(*, min_train_size=0, max_train_size=inf, min_test_size=1, max_test_size=1, gap_size=0, roll_size=None)[source]
A more flexible, and hence more powerful, version of walk-forward cross-validation.
New in version 0.1.
Provides train/test indices to split time series data samples, observed at fixed time intervals, into train and test sets. In each split, the test indices are later than those of the preceding split.
- Parameters
- min_train_sizeint, default=0
Minimum size for the training set. Can be 0.
- max_train_sizeint, default=np.inf
Maximum size for the training set, aka the window.
- min_test_sizeint, default=1
Minimum size for the test set. Rolling stops when fewer than this many samples remain for testing.
- max_test_sizeint, default=1
Maximum size for the test set. Keep it small so that a single split does not consume the entire sample.
- gap_sizeint, default=0
The gap between the training set and the test set.
- roll_sizeint, default=`max_test_size`
The number of samples by which each split moves forward. The default value ensures that each data sample is used for testing at most once; a smaller value produces overlapping test sets. It is similar in flavor to rolling backward, but in the opposite direction.
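The interplay of these parameters can be illustrated with a simplified, stdlib-only re-implementation. This is a hypothetical sketch of the rolling scheme described above, not the library's actual code (which performs extra validation and returns NumPy arrays):

```python
import math

def roll_forward_splits(n, *, min_train=0, max_train=math.inf,
                        min_test=1, max_test=1, gap=0, roll=None):
    """Yield (train, test) index lists for n samples, rolling forward.

    A simplified stdlib-only sketch of gap-aware walk-forward splitting;
    the real GapRollForward performs extra validation and yields arrays.
    """
    if roll is None:
        roll = max_test                   # default: test sets never overlap
    split = min_train                     # boundary between train and test
    while True:
        test_start = split + gap          # the gap separates train from test
        test = list(range(test_start, min(test_start + max_test, n)))
        if len(test) < min_test:          # too few samples left: stop rolling
            break
        # Training window: at most max_train samples ending at the boundary.
        train = list(range(int(max(0, split - max_train)), split))
        yield train, test
        split += roll                     # move the boundary forward

# Mirrors the second doctest below: 10 samples, window of 3, gap of 2.
for train, test in roll_forward_splits(10, min_train=1, max_train=3,
                                       min_test=1, max_test=3,
                                       gap=2, roll=2):
    print("TRAIN:", train, "TEST:", test)
```

With the defaults (`min_train=0`, `max_test=1`, `gap=0`, `roll=1`) the sketch reproduces the expanding-window behavior of the first doctest below.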
Examples
>>> import numpy as np
>>> from tscv import GapRollForward
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> cv = GapRollForward()
>>> print(cv)
GapRollForward(gap_size=0, max_test_size=1, max_train_size=inf, min_test_size=1, min_train_size=0, roll_size=1)
>>> for train_index, test_index in cv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [] TEST: [0]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
>>> X = np.random.randn(10, 2)
>>> y = np.random.randn(10)
>>> cv = GapRollForward(min_train_size=1, max_train_size=3,
...                     min_test_size=1, max_test_size=3,
...                     gap_size=2, roll_size=2)
>>> for train_index, test_index in cv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [3 4 5]
TRAIN: [0 1 2] TEST: [5 6 7]
TRAIN: [2 3 4] TEST: [7 8 9]
TRAIN: [4 5 6] TEST: [9]
- get_n_splits(X=None, y=None, groups=None)[source]
Returns the number of splitting iterations in the cross-validator.
- Parameters
- Xarray-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
- yobject
Always ignored, exists for compatibility.
- groupsobject
Always ignored, exists for compatibility.
- Returns
- n_splitsint
Returns the number of splitting iterations in the cross-validator.
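The returned count can also be reasoned about directly. The closed form below is inferred from the rolling scheme described above, not taken from the library, so treat it as an estimate for well-behaved parameter combinations:

```python
def approx_n_splits(n, *, min_train=0, gap=0, min_test=1, roll=1):
    # Inferred estimate (not the library's code): the first split needs
    # min_train + gap + min_test samples; each later split needs `roll` more.
    return max(0, (n - min_train - gap - min_test) // roll + 1)

# The two doctest examples above: 6 and 4 splits respectively.
print(approx_n_splits(6, roll=1))                                   # 6
print(approx_n_splits(10, min_train=1, gap=2, min_test=1, roll=2))  # 4
```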
- split(X, y=None, groups=None)[source]
Generate indices to split data into training and test set.
- Parameters
- Xarray-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
- yarray-like, shape (n_samples,)
Always ignored, exists for compatibility.
- groupsarray-like, shape (n_samples,)
Always ignored, exists for compatibility.
- Yields
- trainndarray
The training set indices for that split.
- testndarray
The testing set indices for that split.