gap_train_test_split

tscv.gap_train_test_split(*arrays, **options)

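Split arrays or matrices into train and test subsets separated by a gap. As the examples show, the samples keep their original order: the training subset comes first, then the gap, then the test subset.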

Parameters
*arrays : sequence of indexables with same length / shape[0]

Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

gap_size : float or int, default=0

If float, should be between 0.0 and 1.0 and represents the proportion of the dataset dropped between the training set and the test set. If int, represents the absolute number of dropped samples.

test_size : float, int, or None, default=None

If float, should be between 0.0 and 1.0 and equal to test / (train + test). If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size and the gap size. If train_size is also None, test_size is set to 0.25.

train_size : float, int, or None, default=None

If float, should be between 0.0 and 1.0 and equal to train / (train + test). If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size and the gap size.

Returns
splitting : list, length=2 * len(arrays)

List containing the train-test split of each input, with the train subset of an array followed by its test subset.

Examples

>>> import numpy as np
>>> from tscv import gap_train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = gap_train_test_split(
...     X, y, test_size=0.33, gap_size=1)
...
>>> X_train
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> y_train
[0, 1, 2]
>>> X_test
array([[8, 9]])
>>> y_test
[4]
>>> gap_train_test_split(list(range(10)), gap_size=0.1)
[[0, 1, 2, 3, 4, 5, 6], [8, 9]]
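
The last example above can be reproduced by hand with plain slicing. The following is a minimal sketch, not tscv's implementation: the helper name `sketch_gap_split` and its parameters `n_train` and `n_gap` are hypothetical, and it assumes the sizes have already been resolved to integer sample counts.

```python
def sketch_gap_split(data, n_train, n_gap):
    """Illustrative helper (hypothetical, not part of tscv).

    Keeps the first n_train items for training, drops the next n_gap
    items, and uses everything after the gap for testing.
    """
    train = data[:n_train]
    test = data[n_train + n_gap:]
    return train, test


# 10 samples, 7 for training, 1 dropped as the gap, 2 left for testing
train, test = sketch_gap_split(list(range(10)), n_train=7, n_gap=1)
print(train)  # [0, 1, 2, 3, 4, 5, 6]
print(test)   # [8, 9]
```

Sample 7 appears in neither subset: it is the gap that separates the training data from the test data.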