Question

I am trying out multi-class classification with xgboost, and I've built a model using this code:

import xgboost as xgb

clf = xgb.XGBClassifier(max_depth=7, n_estimators=1000)

clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)  # class probabilities for the training set
test1 = clf.predict_proba(test_data)    # class probabilities for the test set

This gave me some good results. I've got a log-loss below 0.7 for my case. But after looking through a few pages, I've found that we have to use another objective in XGBClassifier for multi-class problems. Here's what those pages recommend:

clf = xgb.XGBClassifier(max_depth=5, objective='multi:softprob', n_estimators=1000,
                        num_class=9)

clf.fit(byte_train, y_train)  
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This code also works, but it takes much longer to complete than my first code.

Why does my first code also work for the multi-class case? I have checked that its default objective is binary:logistic, which is meant for binary classification, yet it worked really well for multi-class. Which one should I use if both are correct?

Answers

By default, XGBClassifier uses objective='binary:logistic'. When you use this objective, it employs one of these strategies: one-vs-rest (also known as one-vs-all) or one-vs-one. It may not be the right choice for your problem at hand.
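
That default behaviour is easy to check. Below is a minimal sketch on synthetic data (the features, labels, and sizes are all assumptions made up for illustration) showing that the sklearn wrapper accepts multi-class labels even when no objective is passed, which is why the first code in the question works:

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)                # hypothetical features
y = np.random.randint(0, 3, 100)          # three classes: 0, 1, 2

clf = xgb.XGBClassifier(n_estimators=10)  # objective left at its default
clf.fit(X, y)

# predict_proba comes back with one column per class, so the wrapper has
# dealt with the multi-class case internally despite the binary default.
print(clf.predict_proba(X).shape)         # (100, 3)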

When you use objective='multi:softprob', the output is a matrix with one row per data point and one column per class, i.e. number of data points * number of classes values in total. As a result, there is an increase in the time complexity of your code.
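
To make that output shape concrete, here is a rough sketch on synthetic data (the nine classes mirror the question; every other name and number is an assumption): the probability matrix from predict_proba feeds straight into the multi-class log-loss the question measures.

import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss

X = np.random.rand(200, 5)
y = np.random.randint(0, 9, 200)   # nine classes, as in the question

clf = xgb.XGBClassifier(objective='multi:softprob', n_estimators=10)
clf.fit(X, y)

proba = clf.predict_proba(X)       # one row per data point, one column per class
print(proba.shape)                 # (200, 9)
print(log_loss(y, proba))          # the metric the question reports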

Try setting objective='multi:softmax' in your code. It is more apt for the multi-class classification task.
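
For a concrete view of the difference, here is a hedged sketch using XGBoost's native API on synthetic 3-class data (all names are assumptions): multi:softmax returns only the winning class per row, while multi:softprob returns the full probability matrix described above.

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(0, 3, 100)
dtrain = xgb.DMatrix(X, label=y)

# multi:softmax: predict() returns the predicted class index per row.
params = {'objective': 'multi:softmax', 'num_class': 3}
bst = xgb.train(params, dtrain, num_boost_round=10)
print(bst.predict(dtrain)[:5])     # e.g. [2. 0. 1. 1. 0.]

# multi:softprob: predict() returns one probability per class per row.
params['objective'] = 'multi:softprob'
bst = xgb.train(params, dtrain, num_boost_round=10)
print(bst.predict(dtrain).shape)   # (100, 3)

Note that if you need probabilities (e.g. for log-loss), softprob is the objective that produces them; softmax only yields the final label.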
