maxframe.learn.model_selection.train_test_split

maxframe.learn.model_selection.train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets.

Parameters:

*arrays (sequence of indexables with same length / shape[0]) -- Allowed inputs are lists, numpy arrays, scipy sparse matrices, or pandas DataFrames.
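test_size (float, int or None, optional (default=None)) -- If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it defaults to 0.25.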
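train_size (float, int or None, optional (default=None)) -- If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.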
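random_state (int, RandomState instance or None, optional (default=None)) -- If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.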
shuffle (boolean, optional (default=True)) -- Whether to shuffle the data before splitting. If shuffle=False, then stratify must be None. Currently, only shuffle=False is supported.
stratify (array-like or None (default=None)) -- If not None, the data is split in a stratified fashion, using this as the class labels.
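
The interaction between test_size and train_size can be sketched in plain Python. This is a minimal illustration, not MaxFrame's implementation: the helper name resolve_split_sizes is hypothetical, and the rounding rule (round the test count up, the train count down) is an assumption borrowed from scikit-learn's behavior. It is consistent with the 3/2 split that the example on this page shows for 5 samples.

```python
import math


def resolve_split_sizes(n_samples, test_size=None, train_size=None):
    """Hypothetical helper: return (n_train, n_test) for n_samples rows.

    Assumes scikit-learn-style rounding: float test sizes round up,
    float train sizes round down. MaxFrame may differ in edge cases.
    """
    # Both unset: fall back to the documented default test fraction.
    if test_size is None and train_size is None:
        test_size = 0.25

    n_test = n_train = None

    if isinstance(test_size, float):
        n_test = math.ceil(test_size * n_samples)    # proportion -> count
    elif test_size is not None:
        n_test = int(test_size)                      # absolute count

    if isinstance(train_size, float):
        n_train = math.floor(train_size * n_samples)
    elif train_size is not None:
        n_train = int(train_size)

    # Whichever side is unset becomes the complement of the other.
    if n_train is None:
        n_train = n_samples - n_test
    elif n_test is None:
        n_test = n_samples - n_train

    if n_train + n_test > n_samples:
        raise ValueError("train_size + test_size exceeds the number of samples")
    return n_train, n_test


print(resolve_split_sizes(5, test_size=0.33))   # (3, 2)
print(resolve_split_sizes(5))                   # (3, 2), default test_size=0.25
print(resolve_split_sizes(10, test_size=3))     # (7, 3)
```

Note that when both sizes are given explicitly, their sum may be smaller than the number of samples, in which case some rows end up in neither subset.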

Returns:

splitting -- List containing the train-test split of the inputs.

Return type:

list, length=2 * len(arrays)
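
The layout of the returned list can be illustrated with a small pure-Python stand-in. The function ordered_split below is hypothetical and only mimics an unshuffled split on plain lists; it assumes the result is ordered as a train/test pair for each input, in input order, which is what the four-way unpacking in the example on this page suggests.

```python
def ordered_split(*arrays, n_train):
    """Hypothetical stand-in for an unshuffled split of plain sequences.

    Returns [a_train, a_test, b_train, b_test, ...], that is,
    2 * len(arrays) items, one train/test pair per input.
    """
    # All inputs must describe the same number of samples.
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("expected at least one input, all of the same length")

    result = []
    for a in arrays:
        result.append(a[:n_train])   # first n_train rows -> train part
        result.append(a[n_train:])   # remaining rows     -> test part
    return result


X = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
y = [0, 1, 2, 3, 4]

parts = ordered_split(X, y, n_train=3)
X_train, X_test, y_train, y_test = parts

print(len(parts))        # 4
print(y_train, y_test)   # [0, 1, 2] [3, 4]
```

With two inputs the list has four items, so it unpacks directly into X_train, X_test, y_train, y_test.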

Examples

>>> import maxframe.tensor as mt
>>> from maxframe.learn.model_selection import train_test_split
>>> X, y = mt.arange(10).reshape((5, 2)), range(5)
>>> X.execute()
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train.execute()
array([[8, 9],
       [0, 1],
       [4, 5]])
>>> y_train.execute()
array([4, 0, 2])
>>> X_test.execute()
array([[2, 3],
       [6, 7]])
>>> y_test.execute()
array([1, 3])

>>> train_test_split(y, shuffle=False)
[array([0, 1, 2]), array([3, 4])]