Answer a question

I have a dataframe with rows indexed by chemical element type and columns representing different samples. The values are floats representing the degree of presence of the row element in each sample.

I want to compute the mean of each row and subtract it from each value in that specific row to normalize the data, and make a new dataframe of that dataset.

I tried using mean(1), which gives me a Series object with the mean for each chemical element, which is good, but then I tried using subtract, which didn't work.

Answers

You could use DataFrame's sub method with axis=0, which aligns the Series of row means with the DataFrame's row index, as opposed to the default behaviour of aligning it with the column labels:

df.sub(df.mean(axis=1), axis=0)
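To see why the axis argument matters, here is a minimal sketch of what likely went wrong with a plain subtraction. The element names ("Fe", "Cu") and sample names ("s1" to "s3") are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: rows are elements, columns are samples.
df = pd.DataFrame(
    {"s1": [1.5, 2.5], "s2": [0.25, 2.75], "s3": [1.25, 0.75]},
    index=["Fe", "Cu"],
)
row_means = df.mean(axis=1)  # Series indexed by "Fe", "Cu"

# Default alignment: the Series index ("Fe", "Cu") is matched against the
# column labels ("s1", "s2", "s3"). Nothing matches, so every cell is NaN.
misaligned = df - row_means

# axis=0: the Series index is matched against the row index instead,
# so each row has its own mean subtracted.
centered = df.sub(row_means, axis=0)
```

With the default alignment the result contains only NaN values, which is a common symptom of this mistake; passing axis=0 gives the intended row-centered values.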

Here's an example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})
>>> df
     a     b     c
0  1.5  0.25  1.25
1  2.5  2.75  0.75

The mean of each row is straightforward to calculate:

>>> df.mean(axis=1)
0    1.0
1    2.0
dtype: float64

To de-mean the rows of the DataFrame, just subtract the mean values of rows from df like this:

>>> df.sub(df.mean(axis=1), axis=0)
     a     b     c
0  0.5 -0.75  0.25
1  0.5  0.75 -1.25
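
Since the question asks for a new dataframe, the same call can be wrapped in a small helper and sanity-checked. This is only a sketch: demean_rows is a hypothetical name, not a pandas function, and the data is the example frame from this answer.

```python
import numpy as np
import pandas as pd


def demean_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with each row's mean subtracted from that row."""
    return frame.sub(frame.mean(axis=1), axis=0)


df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})
centered = demean_rows(df)

# After centering, every row should average to (approximately) zero.
assert np.allclose(centered.mean(axis=1), 0.0)

# sub returns a new object, so the original values are untouched.
assert df.loc[0, 'a'] == 1.5
```

Checking that the row means of the result are close to zero is a quick way to confirm the subtraction was aligned along the intended axis.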