Answer a question

EDIT: this question arose back in 2013 with pandas ~0.13 and was made obsolete by direct support for boxplot, added somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer; pandas has also greatly improved its support for categoricals since this was asked).


I can get a boxplot of a salary column in a pandas DataFrame...

train.boxplot(column='Salary', by='Category', sym='')

...however I can't figure out how to define the index-order used on column 'Category' - I want to supply my own custom order, according to another criterion:

category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()

How can I apply my custom column order to the boxplot columns? (other than an ugly kludge such as prefixing the column names to force the ordering)

'Category' is a string (really, should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can be easily factorized with pd.Categorical.from_array()
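For readers on a recent pandas, here is a minimal sketch of the categorical route hinted at above. It assumes a pandas version where Series.order() has been replaced by sort_values() and where the grouped boxplot follows the category order of an ordered categorical. The small `train` frame, its three category names and the salary figures are made up for illustration and are not the original data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Made-up stand-in for the question's `train` DataFrame (three categories only).
rng = np.random.default_rng(0)
names = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]
base = {"Admin Jobs": 40000, "Travel Jobs": 20000, "Accounting & Finance Jobs": 30000}
category = np.repeat(names, 30)
salary = np.array([base[c] for c in category]) + rng.normal(0, 1000, size=category.size)
train = pd.DataFrame({"Category": category, "Salary": salary})

# Custom order: categories sorted by mean salary (sort_values replaces the old order()).
category_order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# Encode that order in the column itself as an ordered categorical.
train["Category"] = pd.Categorical(train["Category"], categories=category_order, ordered=True)

# The grouped boxplot groups by 'Category', so the boxes should follow the category order.
train.boxplot(column="Salary", by="Category", sym="")
```

The point of this design is that the ordering lives in the data rather than in the plotting call, so no column-name prefixes are needed.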

On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:

  • pandas.core.frame.py.boxplot() is a passthrough to
  • pandas.tools.plotting.py:boxplot() which instantiates ...
  • matplotlib.pyplot.py:boxplot() which instantiates ...
  • matplotlib.axes.py:boxplot()

I suppose I could either hack up a custom version of pandas boxplot(), or reach into the internals of the object. I could also file an enhancement request.

Answers

Hard to say how to do this without a working example. My first guess would be to just add an integer column with the orders that you want.

A simple, brute-force way would be to add each boxplot one at a time.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

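# Example data: four columns of random values
df = pd.DataFrame(np.random.rand(37, 4), columns=list('ABCD'))
# The order in which the boxes should appear, left to right
columns_my_order = ['C', 'A', 'D', 'B']
fig, ax = plt.subplots()
# Draw one box per column at an explicit x position
for position, column in enumerate(columns_my_order):
    ax.boxplot(df[column], positions=[position])

# Label each position with its column name
ax.set_xticks(range(len(columns_my_order)))
ax.set_xticklabels(columns_my_order)
ax.set_xlim(xmin=-0.5)
plt.show()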
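The same idea can be applied to the grouped case from the question. The following is only a sketch: it splits a salary column by category, orders the groups by mean salary, and hands them to matplotlib in a single call. The `train` frame and its values are invented stand-ins, not the original data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-in for the question's `train` DataFrame.
rng = np.random.default_rng(1)
names = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]
base = {"Admin Jobs": 40000, "Travel Jobs": 20000, "Accounting & Finance Jobs": 30000}
category = np.repeat(names, 30)
salary = np.array([base[c] for c in category]) + rng.normal(0, 1000, size=category.size)
train = pd.DataFrame({"Category": category, "Salary": salary})

# Custom order: category names sorted by mean salary, lowest first.
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# One array of salaries per category, in the custom order.
groups = [train.loc[train["Category"] == c, "Salary"].to_numpy() for c in order]

fig, ax = plt.subplots()
# sym="" hides outlier markers, as in the question's call.
ax.boxplot(groups, positions=list(range(len(order))), sym="")
ax.set_xticks(range(len(order)))
ax.set_xticklabels(order, rotation=90)
fig.tight_layout()
```

This bypasses the pandas wrapper entirely, so the box order is whatever order the list of arrays is in.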
