KAGGLE ENSEMBLING GUIDE

http://mlwave.com/kaggle-ensembling-guide/

creating ensembles from submission files

  • no need to retrain a model
  • quick
  • reuses already existing model predictions
  • ideal when teaming up

voting ensembles

classification, measured with metrics.accuracy_score

error correcting
  • when each model's error rate is low, a majority vote corrects the occasional mistake (majority rules)
correlation
  • uncorrelated predictions carry more information
  • it works better to ensemble low-correlated model predictions
weighting
  • give a better model more weight
  • avoids a purely democratic average

the improvement is usually modest, on the order of 1%
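
A minimal sketch of a weighted majority vote over class-label submission files; the filenames, column names, and weights below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical submission files, each with an "id" and a predicted "label" column.
files = ["model_a.csv", "model_b.csv", "model_c.csv"]
weights = [3, 1, 1]  # give the (presumably) better first model more weight than a pure democracy

subs = [pd.read_csv(f) for f in files]
labels = np.stack([s["label"].values for s in subs], axis=1)  # shape (n_rows, n_models)

def weighted_vote(row):
    # each model casts "weight" votes; the label with the most votes wins
    counts = {}
    for label, weight in zip(row, weights):
        counts[label] = counts.get(label, 0) + weight
    return max(counts, key=counts.get)

ensemble = pd.DataFrame({
    "id": subs[0]["id"],
    "label": [weighted_vote(row) for row in labels],
})
ensemble.to_csv("vote_ensemble.csv", index=False)
```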

averaging

classification and regression, with metrics such as AUC, squared error, or logarithmic loss
also loosely called "bagging submissions"
- reduces overfitting
- averaging can reduce the impact of noise
- even a single, poorly cross-validated and overfitted submission may bring some gain by adding diversity (thus less correlation)
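
A sketch of plain averaging of the prediction columns from several submission files (again with hypothetical filenames and columns):

```python
import pandas as pd

# Hypothetical submission files with "id" and "pred" columns.
files = ["model_a.csv", "model_b.csv", "model_c.csv"]
subs = [pd.read_csv(f) for f in files]

avg = subs[0][["id"]].copy()
avg["pred"] = sum(s["pred"] for s in subs) / len(subs)  # arithmetic mean of the predictions
avg.to_csv("average_ensemble.csv", index=False)
```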

rank averaging
  • not all predictors are perfectly calibrated
  • plain averaging pulls the results toward each other and suppresses predictions that deviate strongly
  • convert the predictions to ranks first, then average, to restore the spread between them
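
One way to do rank averaging (a sketch with hypothetical filenames): replace each model's raw predictions by their ranks, average the ranks, and normalize back to [0, 1].

```python
import pandas as pd

# Hypothetical submission files with "id" and "pred" columns.
files = ["model_a.csv", "model_b.csv", "model_c.csv"]
subs = [pd.read_csv(f) for f in files]

# rank() replaces each model's scores with 1..n, so badly calibrated scales no longer dominate
mean_rank = sum(s["pred"].rank() for s in subs) / len(subs)

out = subs[0][["id"]].copy()
out["pred"] = (mean_rank - mean_rank.min()) / (mean_rank.max() - mean_rank.min())  # back to [0, 1]
out.to_csv("rank_average_ensemble.csv", index=False)
```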

stacked generalization

  • a pool of base classifiers
  • another classifier combines their predictions to reduce the generalization error

2-fold stacking:
1. Split the train set into 2 parts: train_a and train_b
2. Fit a first-stage model on train_a and create predictions for train_b
3. Fit the same model on train_b and create predictions for train_a
4. Finally, fit the model on the entire train set and create predictions for the test set
5. Now train a second-stage stacker model on the probabilities from the first-stage model(s)

A stacker model gets more information on the problem space by using the first-stage predictions as features than it would if trained in isolation.

the level 0 generalizers should “span the space”.

The more each generalizer has to say (which isn’t duplicated in what the other generalizer’s have to say), the better the resultant stacked generalization.

creating out-of-fold predictions for the train set
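
A minimal sketch of the recipe above with scikit-learn; the data and models are placeholders, cross_val_predict with cv=2 reproduces steps 1-3, and in practice you would column-stack the out-of-fold predictions of several base models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Placeholder data standing in for a competition's train and test sets.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=500, n_features=20, random_state=1)

base = RandomForestClassifier(n_estimators=200, random_state=0)

# Steps 1-3: out-of-fold probabilities for the train set (cv=2 matches the 2-fold recipe).
train_meta = cross_val_predict(base, X_train, y_train, cv=2, method="predict_proba")[:, 1]

# Step 4: refit on the entire train set and predict the test set.
base.fit(X_train, y_train)
test_meta = base.predict_proba(X_test)[:, 1]

# Step 5: train the second-stage stacker on the first-stage probabilities.
stacker = LogisticRegression()
stacker.fit(train_meta.reshape(-1, 1), y_train)
final_pred = stacker.predict_proba(test_meta.reshape(-1, 1))[:, 1]
```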

blending

very close to stacked generalization, but a bit simpler and with less risk of an information leak.

create a small holdout set of say 10% of the train set. The stacker model then trains on this holdout set only.

  • simpler
  • The generalizers and stackers use different data.
    • this wards against an information leak
  • blender decides if it wants to keep that model or not.

cons:
- uses less data overall
- the final model may overfit the holdout set
- the CV score is not as solid as with stacking (which is calculated over more folds)
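
A sketch of blending with a ~10% holdout, on placeholder data and models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for a competition's train and test sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=500, n_features=20, random_state=1)

# Carve out a small (~10%) holdout that only the blender will ever see.
X_base, X_hold, y_base, y_hold = train_test_split(X, y, test_size=0.1, random_state=0)

bases = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

# Generalizers train on the 90% part, then predict the holdout and the test set.
hold_meta = np.column_stack([m.fit(X_base, y_base).predict_proba(X_hold)[:, 1] for m in bases])
test_meta = np.column_stack([m.predict_proba(X_test)[:, 1] for m in bases])

# The blender trains on the holdout predictions only.
blender = LogisticRegression()
blender.fit(hold_meta, y_hold)
final_pred = blender.predict_proba(test_meta)[:, 1]
```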

If you cannot choose, you can always do both: create stacked ensembles with stacked generalization and out-of-fold predictions, then use a holdout set to further combine these models at a third stage.

Stacking classifiers with regressors

  • turn the y-label into evenly spaced classes
    • the regression problem turns into a multiclass classification problem
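
One way to turn a continuous target into evenly spaced classes (a sketch; the target and bin count below are arbitrary):

```python
import numpy as np

# Placeholder continuous target.
y = np.random.RandomState(0).uniform(0.0, 100.0, size=1000)

n_bins = 20  # arbitrary choice
edges = np.linspace(y.min(), y.max(), n_bins + 1)
y_class = np.digitize(y, edges[1:-1])  # evenly spaced classes 0 .. n_bins - 1

# A classifier trained on y_class yields class probabilities that can be used as
# stacker features; the bin midpoints map a predicted class back to a number.
midpoints = (edges[:-1] + edges[1:]) / 2
```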

stack with unsupervised learning techniques
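
A sketch of feeding unsupervised structure into the stack: here K-Means distances to cluster centers are appended as extra features (an embedding such as t-SNE is another common choice); the data is a placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Placeholder data.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=500, n_features=20, random_state=1)

# Fit the clustering on train only, then append distance-to-centroid features everywhere.
km = KMeans(n_clusters=8, n_init=10, random_state=0)
train_dist = km.fit_transform(X_train)  # shape (n_samples, n_clusters)
test_dist = km.transform(X_test)

X_train_aug = np.hstack([X_train, train_dist])
X_test_aug = np.hstack([X_test, test_dist])
# Any first-stage model can now train on the augmented feature matrix.
```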

online stacking

think of every action as a hyper-parameter for the stacker model
- scaling the data
- number of base models
- feature selection
- imputation
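
For example, such choices can be searched like any other hyper-parameter; a sketch with a scikit-learn pipeline and a made-up grid:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest()),
    ("model", LogisticRegression(max_iter=1000)),
])

# "Scale or not", "how many features", "how much regularization" are all just grid entries.
grid = {
    "scale": [StandardScaler(), "passthrough"],
    "select__k": [5, 10, 20],
    "model__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```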

Sometimes it is useful to allow XGBoost to see what a KNN-classifier sees.
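
A sketch of that idea: out-of-fold KNN probabilities become an extra feature for the booster (assumes the xgboost package is available; any gradient booster would work the same way):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier  # assumed available

# Placeholder data.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=500, n_features=20, random_state=1)

knn = KNeighborsClassifier(n_neighbors=25)

# Out-of-fold KNN probabilities for train, full-fit probabilities for test.
knn_train = cross_val_predict(knn, X_train, y_train, cv=5, method="predict_proba")[:, 1]
knn_test = knn.fit(X_train, y_train).predict_proba(X_test)[:, 1]

# The booster now "sees what the KNN sees" through the extra feature column.
booster = XGBClassifier(n_estimators=300, max_depth=4)
booster.fit(np.column_stack([X_train, knn_train]), y_train)
pred = booster.predict_proba(np.column_stack([X_test, knn_test]))[:, 1]
```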
