I have a dataframe with rows indexed by chemical element type and columns representing different samples. The values are floats representing the degree of presence of the row element in each sample.
I want to compute the mean of each row and subtract it from each value in that specific row to normalize the data, and store the result in a new dataframe.
I tried using mean(1), which gives me a Series object with the mean for each chemical element, which is good, but when I then tried using subtract, it didn't work.
You could use DataFrame's sub method and pass axis=0, so that the Series of row means is aligned with the DataFrame's index (its row labels) rather than with its column labels, which is the default behaviour:
df.sub(df.mean(axis=1), axis=0)
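To see why the axis argument matters, here is a minimal sketch comparing the default alignment with axis=0. The element names ("Fe", "Cu") and sample names ("s1" to "s3") are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical data: rows are elements, columns are samples.
df = pd.DataFrame(
    {"s1": [1.5, 2.5], "s2": [0.25, 2.75], "s3": [1.25, 0.75]},
    index=["Fe", "Cu"],
)
row_means = df.mean(axis=1)  # Series indexed by "Fe" and "Cu"

# Default alignment matches the Series index against the column labels.
# "Fe" and "Cu" are not column names, so every cell comes out as NaN.
misaligned = df - row_means

# axis=0 matches the Series index against the row index instead.
demeaned = df.sub(row_means, axis=0)

print(misaligned)
print(demeaned)
```

This is probably what went wrong with the plain subtract attempt: without axis=0, pandas looks for the row labels among the column names and finds no match.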
Here's an example:
>>> df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})
>>> df
     a     b     c
0  1.5  0.25  1.25
1  2.5  2.75  0.75
The mean of each row is straightforward to calculate:
>>> df.mean(axis=1)
0    1.0
1    2.0
dtype: float64
To de-mean the rows of the DataFrame, just subtract the mean values of rows from df like this:
>>> df.sub(df.mean(axis=1), axis=0)
     a     b     c
0  0.5 -0.75  0.25
1  0.5  0.75 -1.25
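As a quick sanity check, the sketch below rebuilds the example DataFrame and confirms two things: every row of the result averages to roughly zero, and sub returns a new DataFrame rather than modifying df. The variable name demeaned is just an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})

# sub returns a new DataFrame, so df itself is left unchanged.
demeaned = df.sub(df.mean(axis=1), axis=0)

# After de-meaning, each row should average to (approximately) zero.
row_means_after = demeaned.mean(axis=1)
print(np.allclose(row_means_after, 0.0))

# The original values are still there.
print(df.loc[0, 'a'])
```

Comparing with np.allclose rather than == 0 avoids surprises from floating-point rounding on real data.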