gap_train_test_split
- tscv.gap_train_test_split(*arrays, **options)
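Split arrays or matrices into sequential train and test subsets, dropping a gap of samples between them. The data are not shuffled: the training set is taken from the start of the inputs and the test set from the end, as the examples below show.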
- Parameters
- *arrays : sequence of indexables with the same length / shape[0]
Allowed inputs are lists, NumPy arrays, SciPy sparse matrices, or pandas DataFrames.
- gap_size : float or int, default=0
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset dropped between the training and test sets. If int, represents the absolute number of samples dropped between them.
- test_size : float, int, or None, default=None
If float, should be between 0.0 and 1.0 and equal test / (train + test), so the gap samples are not counted in the denominator. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size and the gap size. If train_size is also None, it is set to 0.25.
- train_size : float, int, or None, default=None
If float, should be between 0.0 and 1.0 and equal train / (train + test). If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size and the gap size.
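The size rules above can be sketched as a small helper that resolves the three arguments into sample counts. `resolve_sizes` is a hypothetical name, not part of tscv, and two details are assumptions chosen to agree with the examples on this page: fractional sizes are rounded to the nearest integer, and `gap_size` is a fraction of the whole dataset while `test_size` and `train_size` are fractions of the samples left after the gap. The sketch also refuses to accept both `test_size` and `train_size` at once, a simplification the documented API does not necessarily share.

```python
def resolve_sizes(n_samples, test_size=None, train_size=None, gap_size=0):
    """Hypothetical helper: turn the size arguments into (n_train, n_gap, n_test).

    Assumptions (not taken from tscv's source): floats are rounded to the
    nearest integer; gap_size is relative to the whole dataset, while
    test_size / train_size are relative to the samples left after the gap.
    """
    def to_count(size, total):
        if isinstance(size, float):
            if not 0.0 <= size <= 1.0:
                raise ValueError("float sizes must lie in [0.0, 1.0]")
            return int(round(size * total))
        return int(size)  # an int is already an absolute sample count

    if test_size is not None and train_size is not None:
        # Simplification of this sketch only.
        raise ValueError("this sketch accepts at most one of test_size / train_size")

    n_gap = to_count(gap_size, n_samples)
    n_remain = n_samples - n_gap  # samples shared by train + test
    if train_size is None:
        # Default test fraction of 0.25 when neither size is given.
        n_test = to_count(0.25 if test_size is None else test_size, n_remain)
        n_train = n_remain - n_test
    else:
        n_train = to_count(train_size, n_remain)
        n_test = n_remain - n_train
    if n_train <= 0 or n_test <= 0:
        raise ValueError("train and test sets must both be non-empty")
    return n_train, n_gap, n_test
```

For instance, `resolve_sizes(5, test_size=0.33, gap_size=1)` returns `(3, 1, 1)` and `resolve_sizes(10, gap_size=0.1)` returns `(7, 1, 2)`, matching the train, gap, and test sizes in the examples.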
- Returns
- splitting : list, length=2 * len(arrays)
List containing the train-test split of the inputs.
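The order of the returned list (train then test for each input, in the order the inputs were passed) can be illustrated with a plain slicing sketch. `sequential_gap_split` is a hypothetical helper, not tscv's implementation; it takes precomputed counts `n_train` and `n_gap` instead of the size arguments, and it assumes each input supports `shape[0]` or `len()` plus slicing, as NumPy arrays and lists do.

```python
import numpy as np


def sequential_gap_split(*arrays, n_train, n_gap):
    """Hypothetical helper (not tscv's code): keep the first n_train samples
    for training, drop the next n_gap samples, use the rest for testing, and
    return [a_train, a_test, b_train, b_test, ...]."""
    if not arrays:
        raise ValueError("at least one array is required")

    def n_rows(a):
        # NumPy arrays expose shape[0]; plain lists only support len().
        return a.shape[0] if hasattr(a, "shape") else len(a)

    n_samples = n_rows(arrays[0])
    if any(n_rows(a) != n_samples for a in arrays):
        raise ValueError("all inputs must have the same number of samples")
    test_start = n_train + n_gap  # first index after the dropped gap
    splitting = []
    for a in arrays:
        splitting.extend([a[:n_train], a[test_start:]])
    return splitting


X, y = np.arange(10).reshape((5, 2)), list(range(5))
X_train, X_test, y_train, y_test = sequential_gap_split(X, y, n_train=3, n_gap=1)
print(X_train.tolist(), X_test.tolist(), y_train, y_test)
# prints [[0, 1], [2, 3], [4, 5]] [[8, 9]] [0, 1, 2] [4]
```

With two inputs the list has four entries, so it unpacks directly into `X_train, X_test, y_train, y_test`.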
Examples
>>> import numpy as np
>>> from tscv import gap_train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = gap_train_test_split(
...     X, y, test_size=0.33, gap_size=1)
...
>>> X_train
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> y_train
[0, 1, 2]
>>> X_test
array([[8, 9]])
>>> y_test
[4]
>>> gap_train_test_split(list(range(10)), gap_size=0.1)
[[0, 1, 2, 3, 4, 5, 6], [8, 9]]