maxframe.learn.model_selection.KFold

class maxframe.learn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

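K-fold cross-validator.

Provides train/test indices to split data into training and test sets. By default, the dataset is split into k consecutive folds (without shuffling).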

In each round, one fold serves as the validation set while the remaining k - 1 folds form the training set.
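
The rotation described above can be sketched in plain Python. This is only an illustration of the idea, not MaxFrame's implementation: the helper name `contiguous_kfold` is hypothetical, and for simplicity it assumes that the number of samples is divisible by the number of folds.

```python
def contiguous_kfold(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled k-fold splitting.

    Hypothetical helper for illustration; assumes n_samples is an
    exact multiple of n_splits so that all folds have the same size.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_samples % n_splits != 0:
        raise ValueError("this sketch requires n_samples % n_splits == 0")
    fold_size = n_samples // n_splits
    indices = list(range(n_samples))
    for k in range(n_splits):
        start, stop = k * fold_size, (k + 1) * fold_size
        test = indices[start:stop]                 # fold k is held out
        train = indices[:start] + indices[stop:]   # the other k - 1 folds
        yield train, test


for train, test in contiguous_kfold(4, 2):
    print("TRAIN:", train, "TEST:", test)
# TRAIN: [2, 3] TEST: [0, 1]
# TRAIN: [0, 1] TEST: [2, 3]
```

Every index appears in exactly one test fold, so each sample is used for validation once across the full set of rounds.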

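Parameters:

n_splits (int, default=5) -- Number of folds. Must be at least 2. Changed in version 0.22: the default value of n_splits changed from 3 to 5.
shuffle (bool, default=False) -- Whether to shuffle the data before splitting it into batches. Note that the samples within each split are not shuffled.
random_state (int, RandomState instance or None, default=None) -- When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See the Glossary.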

Examples

>>> import maxframe.tensor as mt
>>> from maxframe.learn.model_selection import KFold
>>> X = mt.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = mt.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]

Notes

The first n_samples % n_splits folds have size n_samples // n_splits + 1, and the remaining folds have size n_samples // n_splits, where n_samples is the number of samples.
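
The fold-size rule above can be checked with a few lines of arithmetic. The function `fold_sizes` below is a hypothetical helper written for this illustration; it is not part of the MaxFrame API.

```python
def fold_sizes(n_samples, n_splits):
    """Return the size of each fold according to the rule stated above."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample each.
    return [base + 1] * extra + [base] * (n_splits - extra)


print(fold_sizes(10, 3))  # [4, 3, 3]
print(fold_sizes(4, 2))   # [2, 2]
```

The sizes always sum to n_samples, and no two folds differ in size by more than one sample.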

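Randomized cross-validation splitters may return different results on each call to split. You can make the results identical across calls by setting random_state to an integer.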
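
To illustrate why a fixed integer seed gives repeatable splits, the sketch below shuffles an index list with Python's standard `random` module. This is a stand-in chosen for the example: the helper `shuffled_order` is hypothetical, and MaxFrame's actual random number generation may work differently.

```python
import random


def shuffled_order(n_samples, random_state=None):
    """Return a permutation of range(n_samples).

    Hypothetical helper: with an integer random_state the permutation is
    the same on every call; with None it is seeded from system entropy.
    """
    rng = random.Random(random_state)
    order = list(range(n_samples))
    rng.shuffle(order)
    return order


first = shuffled_order(6, random_state=0)
second = shuffled_order(6, random_state=0)
print(first == second)                   # True
print(sorted(first) == list(range(6)))   # True
```

Because the folds are cut from the shuffled order, an identical permutation leads to identical folds, which is the effect that setting random_state to an integer is meant to achieve.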

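See also

StratifiedKFold: Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold: K-fold iterator variant with non-overlapping groups.
RepeatedKFold: Repeats K-fold n times.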

__init__(n_splits=5, *, shuffle=False, random_state=None)

Methods

`__init__`([n_splits, shuffle, random_state])
`get_n_splits`([X, y, groups])	Returns the number of splitting iterations in the cross-validator.
`split`(X[, y, groups])	Generates indices to split data into training and test sets.