gap_train_test_split

tscv.gap_train_test_split(*arrays, **options)

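Split arrays or matrices into train and test subsets, dropping a gap of samples between them. The order of the samples is preserved: the training subset comes first, then the gap, then the test subset.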

Parameters

*arrays : sequence of indexables with same length / shape[0]
    Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
gap_size : float or int, default=0
    If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to drop between the training and the test set. If int, represents the absolute number of dropped samples.
test_size : float, int, or None, default=None
    If float, should be between 0.0 and 1.0 and equal to test / (train + test). If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size and the gap size. If train_size is also None, it will be set to 0.25.
train_size : float, int, or None, default=None
    If float, should be between 0.0 and 1.0 and equal to train / (train + test). If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size and the gap size.
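
The way the three sizes interact can be easier to follow as arithmetic. The following is a minimal sketch, not the library's implementation: `to_count` and `resolve_sizes` are hypothetical helper names, and rounding fractions to the nearest integer is an assumption chosen because it reproduces the results shown in the Examples section.

```python
def to_count(size, n):
    """Interpret `size` as a fraction of `n` (float) or an absolute count (int)."""
    if isinstance(size, float):
        # Assumption: fractional sizes are rounded to the nearest integer.
        return int(round(size * n))
    return int(size)


def resolve_sizes(n_samples, test_size=None, train_size=None, gap_size=0):
    """Return (n_train, n_gap, n_test) following the parameter descriptions."""
    # A float gap is a proportion of the whole dataset.
    n_gap = to_count(gap_size, n_samples)
    # Float train/test sizes are proportions of what remains: train + test.
    n_remain = n_samples - n_gap
    if test_size is None and train_size is None:
        test_size = 0.25
    if train_size is None:
        n_test = to_count(test_size, n_remain)
        n_train = n_remain - n_test
    elif test_size is None:
        n_train = to_count(train_size, n_remain)
        n_test = n_remain - n_train
    else:
        n_train = to_count(train_size, n_remain)
        n_test = to_count(test_size, n_remain)
    return n_train, n_gap, n_test


print(resolve_sizes(5, test_size=0.33, gap_size=1))
print(resolve_sizes(10, gap_size=0.1))
```

Under these assumptions, 5 samples with `test_size=0.33` and `gap_size=1` resolve to 3 training samples, 1 dropped sample, and 1 test sample, while 10 samples with only `gap_size=0.1` resolve to 7, 1, and 2.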

Returns

splitting : list, length=2 * len(arrays)
    List containing the train-test split of the inputs.
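
To illustrate the layout of the returned list, here is a small stand-in written with plain list slicing. It is only a sketch: `sketch_gap_split` is a hypothetical name, it takes the training and gap counts directly instead of resolving them from `test_size`, `train_size`, and `gap_size`, and it assumes every input supports slicing.

```python
from itertools import chain


def sketch_gap_split(*arrays, n_train, n_gap):
    """Return [a_train, a_test, b_train, b_test, ...] for the given inputs."""
    # The test subset starts right after the dropped gap.
    test_start = n_train + n_gap
    # Each input contributes two consecutive entries: its train part, then its test part.
    return list(chain.from_iterable(
        (a[:n_train], a[test_start:]) for a in arrays
    ))


X = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
y = [0, 1, 2, 3, 4]
X_train, X_test, y_train, y_test = sketch_gap_split(X, y, n_train=3, n_gap=1)
print(X_train, X_test, y_train, y_test)
```

Two inputs yield four outputs, in the same train-then-test order per input that the Examples section unpacks into `X_train, X_test, y_train, y_test`.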

Examples

>>> import numpy as np
>>> from tscv import gap_train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = gap_train_test_split(
...     X, y, test_size=0.33, gap_size=1)
...
>>> X_train
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> y_train
[0, 1, 2]
>>> X_test
array([[8, 9]])
>>> y_test
[4]
>>> gap_train_test_split(list(range(10)), gap_size=0.1)
[[0, 1, 2, 3, 4, 5, 6], [8, 9]]