How does sample_weight compare to class_weight in scikit-learn?

Mangs

76 views · 2022-08-20 19:08:40

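I want to use sklearn.ensemble.GradientBoostingClassifier on an imbalanced classification problem. I intend to optimize for the [area under the receiver operating characteristic curve (ROC AUC)](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html). To that end, I want to reweight my classes so that the minority class matters more to the classifier.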

This is usually done by setting class_weight = "balanced" (for example in RandomForestClassifier), but GradientBoostingClassifier has no such parameter.

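The documentation says:

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

If y_train is my target dataframe, with elements in {0,1}, then the documentation implies that the following should be equivalent to class_weight = "balanced":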

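import numpy as np
from sklearn import ensemble

# Per-class weights from the documented formula; y_train holds 0/1 labels
sample_weight = y_train.shape[0] / (2 * np.bincount(y_train))
clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train, sample_weight=sample_weight[y_train.values])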

Is this correct, or am I missing something?

Answers

I suggest using the class_weight.compute_sample_weight utility in scikit-learn. For example:

from sklearn.utils.class_weight import compute_sample_weight
y = [1,1,1,1,0,0,1]
compute_sample_weight(class_weight='balanced', y=y)

Output:

array([ 0.7 ,  0.7 ,  0.7 ,  0.7 ,  1.75,  1.75,  0.7 ])
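As a quick check, the same numbers fall out of the formula quoted in the question, n_samples / (n_classes * np.bincount(y)), once each sample looks up the weight of its own class. The sketch below assumes non-negative integer labels with every value from 0 to max(y) present (np.bincount needs this); the helper name manual_balanced_weights is made up for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight


def manual_balanced_weights(y):
    """Hypothetical helper: per-sample 'balanced' weights from the documented formula.

    Assumes y holds non-negative integer labels and every label 0..max(y) occurs.
    """
    y = np.asarray(y)
    n_classes = len(np.unique(y))
    # One weight per class: n_samples / (n_classes * class_count)
    per_class = y.shape[0] / (n_classes * np.bincount(y))
    # Index by label so each sample receives its class weight
    return per_class[y]


y = [1, 1, 1, 1, 0, 0, 1]
manual = manual_balanced_weights(y)
library = compute_sample_weight(class_weight="balanced", y=y)
print(np.allclose(manual, library))  # True
```

Here class 1 has 5 of 7 samples, giving 7 / (2 * 5) = 0.7, and class 0 has 2 of 7, giving 7 / (2 * 2) = 1.75.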

You can pass the resulting array as the sample_weight keyword argument.
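Put together, a minimal end-to-end sketch might look like the following. The synthetic dataset, the split, and the variable names are assumptions for illustration only and are not taken from the question; the default GradientBoostingClassifier settings stand in for the asker's params.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic imbalanced data (roughly 90% / 10%), purely for illustration.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# One weight per training row, inversely proportional to its class frequency.
train_weights = compute_sample_weight(class_weight="balanced", y=y_train)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train, sample_weight=train_weights)

# ROC AUC is computed from the predicted probability of the positive class.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"test ROC AUC: {auc:.3f}")
```

The weights are only needed at fit time. Whether reweighting actually improves ROC AUC depends on the data, so it is worth comparing against an unweighted fit.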
