
I am trying to plot feature importances for a random forest model and map each feature importance back to the original variable name. I've managed to create a plot that shows the importances and uses the original variable names as labels, but right now the variable names are ordered as they appear in the dataset rather than by importance. How do I order them by feature importance? Thanks!


My code is:

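import numpy as np
import matplotlib.pyplot as plt

importances = brf.feature_importances_
# spread of each feature's importance across the individual trees (used as error bars)
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
             axis=0)
# column positions sorted from most to least important
indices = np.argsort(importances)[::-1]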

# Print the feature ranking
print("Feature ranking:")

for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the forest
plt.figure(figsize=(8,8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
   color="r", yerr=std[indices], align="center")
feature_names = x_dummies.columns
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()

Answers

A fairly generic solution is to put the feature names and importances into a DataFrame and sort it before plotting:

import pandas as pd
# Jupyter/IPython only: render plots inline
%matplotlib inline
# (code that builds and fits the model goes here)
# `data` is the feature DataFrame (X) and `model` is the fitted scikit-learn estimator

feats = {} # a dict to hold feature_name: feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance #add the name/value pair 

importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
# sort by importance (most important first) before plotting
importances.sort_values(by='Gini-importance', ascending=False).plot(kind='bar', rot=45)
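If you'd rather keep the original matplotlib code, the labels are out of order because the bars are drawn from `importances[indices]` while the tick labels come from the unsorted `feature_names`. A minimal fix is to reorder the labels with the same `indices` array, i.e. pass `feature_names[indices]` to `plt.xticks`. The sketch below is self-contained; the synthetic dataset (`make_classification`), the hypothetical `feat_i` column names and the plain `RandomForestClassifier` standing in for `brf` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 6 features with made-up column names
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
x_dummies = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(6)])
brf = RandomForestClassifier(n_estimators=100, random_state=0).fit(x_dummies, y)

importances = brf.feature_importances_
std = np.std([tree.feature_importances_ for tree in brf.estimators_], axis=0)
indices = np.argsort(importances)[::-1]  # most important first
feature_names = x_dummies.columns

# Print the ranking with feature names instead of column positions
for rank, i in enumerate(indices, start=1):
    print("%d. %s (%f)" % (rank, feature_names[i], importances[i]))

n = x_dummies.shape[1]
plt.figure(figsize=(8, 8))
plt.title("Feature importances")
plt.bar(range(n), importances[indices], color="r", yerr=std[indices],
        align="center")
# Key fix: reorder the tick labels with the same `indices` used for the bars
plt.xticks(range(n), feature_names[indices], rotation=90)
plt.xlim([-1, n])
plt.tight_layout()
plt.show()
```

Because `feature_names` is a pandas `Index`, indexing it with the integer array `indices` returns the names in the same order as the bars.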