
Is it possible to change the order of columns in a dataframe in place?

If yes, would that be faster than making a copy? I am working with a large dataframe with 100 million+ rows.

I see how to change the order with a copy: How to change the order of DataFrame columns?

Answers

There is no easy way to do this without making a copy. In theory it is possible if you ONLY have a single dtype (or are only reordering columns WITHIN one dtype, so that no label changes dtype). But it is fairly complicated, and hence is not implemented.
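
For comparison, here is a minimal sketch of the usual copying approach (selecting the columns in the new order), using `np.shares_memory` to confirm that the result does not share memory with the original. The 5x3 float frame mirrors the example in this answer; `reorder_copy` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def reorder_copy(df: pd.DataFrame, order: list) -> pd.DataFrame:
    # Selecting with a list of labels builds a new frame whose data is
    # copied in the requested column order.
    return df[order]


df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
df2 = reorder_copy(df, ["C", "A", "B"])

# The reordered frame owns its own buffer: no memory is shared.
print(np.shares_memory(df.to_numpy(), df2.to_numpy()))  # False
```

On a frame with 100 million+ rows this allocates a second buffer of the same size, which is the cost the question is trying to avoid.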

That said, if you are careful you can do this. You should ONLY do this with a single-dtyped frame (you are forewarned).
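
Since everything below depends on the frame having a single dtype, a quick guard can be checked first. This is a sketch; `is_single_dtype` is a hypothetical helper, and a single dtype is a necessary (not a sufficient) condition for getting a view back.

```python
import numpy as np
import pandas as pd


def is_single_dtype(df: pd.DataFrame) -> bool:
    # True when every column has the same dtype, which is the precondition
    # this answer asks for before attempting the view-based trick.
    return df.dtypes.nunique() == 1


floats = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
mixed = floats.assign(D=np.arange(5))  # adds an integer column

print(is_single_dtype(floats), is_single_dtype(mixed))  # True False
```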

In [21]: import numpy as np; from pandas import DataFrame

In [22]: df = DataFrame(np.random.randn(5,3),columns=list('ABC'))

In [23]: df
Out[23]: 
          A         B         C
0 -0.696593 -0.459067  1.935033
1  1.783658  0.612771  1.553773
2 -0.572515  0.634174  0.113974
3 -0.908203  1.454289  0.509968
4  0.776575  1.629816  1.630023

If df has more than one dtype, df.values WILL NOT BE A VIEW (you can, of course, first select the single-dtyped sub-frame, which is itself a view). Note also that it is NOT ALWAYS POSSIBLE to get a view back; it depends on what you are doing, so YMMV.

For example, df.values.take([2,0,1],axis=1) gives you the same result BUT IS A COPY.
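
One way to check the single- vs multi-dtype claim is to call `.to_numpy()` (the modern spelling of `.values`) twice and ask NumPy whether the two results overlap: for a single float block both are views of the same internal buffer, whereas a mixed-dtype frame is interleaved into a fresh array on each call. This is a sketch that relies on current pandas behaviour rather than a documented guarantee; `values_is_view` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def values_is_view(df: pd.DataFrame) -> bool:
    # If .to_numpy() hands back a view of internal storage, two calls
    # return arrays over the same memory; if pandas has to build a new
    # (interleaved, upcast) array, each call allocates separately.
    return np.shares_memory(df.to_numpy(), df.to_numpy())


floats = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
mixed = floats.assign(D=np.arange(5))

print(values_is_view(floats))  # True: one float64 block
print(values_is_view(mixed))   # False: float + int upcast into a new array
```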
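
The underlying rule is NumPy's: `take` and advanced (list) indexing always return a copy, and only basic slicing returns a view. A NumPy-only sketch of the three cases:

```python
import numpy as np

a = np.random.randn(5, 3)

taken = a.take([2, 0, 1], axis=1)  # explicit take: new array
fancy = a[:, [2, 0, 1]]            # advanced (list) indexing: also a new array
sliced = a[:, ::-1]                # basic slicing: a strided view, no copy

print(np.shares_memory(a, taken))   # False
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, sliced))  # True
```

A permutation such as [2, 0, 1] cannot be expressed as a single stride over the column axis, so NumPy has no way to return it as a view.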

In [24]: df2 = DataFrame(df.values[:,[2,0,1]],columns=list('ABC'))

In [25]: df2
Out[25]: 
          A         B         C
0  1.935033 -0.696593 -0.459067
1  1.553773  1.783658  0.612771
2  0.113974 -0.572515  0.634174
3  0.509968 -0.908203  1.454289
4  1.630023  0.776575  1.629816

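At first glance df2 looks like a view on the original values, because df2.values.base is not None. It is not: NumPy advanced (list) indexing such as [:,[2,0,1]] always returns a copy, and the base printed below stores the columns in the new C, A, B order (its first row is the old column C), so it is the freshly allocated result rather than df's original block.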

In [26]: df2.values.base
Out[26]: 
array([[ 1.93503267,  1.55377291,  0.1139739 ,  0.5099681 ,  1.63002264],
       [-0.69659276,  1.78365777, -0.5725148 , -0.90820288,  0.7765751 ],
       [-0.45906706,  0.61277136,  0.63417392,  1.45428912,  1.62981613]])
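
A non-None `.base` only says an array is a view of *some* buffer, not of the original frame's buffer, so a more direct test is `np.shares_memory` against the source. A sketch reproducing the construction above, with `.to_numpy()` standing in for `.values`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
df2 = pd.DataFrame(df.to_numpy()[:, [2, 0, 1]], columns=list("ABC"))

# Compare against the original storage instead of inspecting .base.
print(np.shares_memory(df.to_numpy(), df2.to_numpy()))  # False: list indexing copied

# .base can be non-None for an array that shares nothing with df:
unrelated = np.arange(6.0).reshape(2, 3)
print(unrelated.base is not None)                  # True
print(np.shares_memory(df.to_numpy(), unrelated))  # False
```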

Note that if you then assign to df2 (adding another float column, for instance), you will trigger a copy, so you have to be extremely careful with this.

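That said, creating a frame from a genuine view of another frame's values (for example, one obtained by basic slicing) takes almost no extra memory, since it just holds a pointer into the existing buffer, so it is very fast.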
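
A sketch of a genuinely zero-copy construction, assuming the data starts out as a single float64 NumPy array. `copy=False` is passed explicitly because, with copy-on-write enabled, recent pandas versions may copy ndarray input by default. A reversed column order is used because it is one of the few reorderings expressible as a strided view; an arbitrary permutation is not.

```python
import numpy as np
import pandas as pd

base = np.random.randn(5, 3)

# copy=False asks pandas to wrap the existing buffer rather than copy it.
df = pd.DataFrame(base, columns=list("ABC"), copy=False)

# base[:, ::-1] is a basic-slice view, so this frame also wraps the same
# memory, presenting the columns in C, B, A order.
df_rev = pd.DataFrame(base[:, ::-1], columns=list("CBA"), copy=False)

print(np.shares_memory(base, df.to_numpy()))      # True
print(np.shares_memory(base, df_rev.to_numpy()))  # True
```

Both frames now alias `base`, so writing through any one of them (where pandas allows it) can change the others; this is the same caution the answer gives about df2.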
