I have a pandas dataframe with a column of real values that I want to zscore normalize:
>> a
array([ nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307,
0.6599, 0.1065, 0.0508])
>> df = pandas.DataFrame({"a": a})
The problem is that a single nan value makes the whole result nan:
>> from scipy.stats import zscore
>> zscore(df["a"])
array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
What's the correct way to apply zscore (or an equivalent function not from scipy) to a column of a pandas dataframe and have it ignore the nan values? I'd like the result to have the same dimension as the original column, with np.nan for values that can't be normalized.
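For reference, one pandas-only way to get this behaviour is sketched below. It relies on Series.mean() and Series.std() skipping nan by default; the helper name nan_zscore and the output column a_z are made up for illustration, and ddof=0 is chosen only to match the default of scipy.stats.zscore (pandas itself defaults to ddof=1).

```python
import numpy as np
import pandas as pd

a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307,
              0.6599, 0.1065, 0.0508])
df = pd.DataFrame({"a": a})


def nan_zscore(s, ddof=0):
    """Z-score a Series, ignoring nan when computing the mean and std.

    Hypothetical helper, not part of pandas or scipy.
    """
    # Series.mean() and Series.std() use skipna=True by default,
    # so the nan entry does not affect either statistic.
    # The subtraction keeps nan where the input was nan.
    return (s - s.mean()) / s.std(ddof=ddof)


# Same length and index as the original column; nan stays nan.
df["a_z"] = nan_zscore(df["a"])
```

The result has the same index as df["a"], so it can be assigned straight back as a new column.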
Edit: maybe the best solution is to use scipy.stats.nanmean and scipy.stats.nanstd? I don't see why the degrees of freedom need to be changed for std for this purpose:
zscore = lambda x: (x - scipy.stats.nanmean(x)) / scipy.stats.nanstd(x)
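The same idea can be sketched with numpy's nan-aware functions, which may be the safer choice since, as far as I know, scipy.stats.nanmean and scipy.stats.nanstd were deprecated and later removed from SciPy. The function name nan_zscore_np is made up for illustration. The ddof argument only rescales the non-nan scores by a constant factor, so it matters mainly if the result has to match scipy.stats.zscore (ddof=0 by default) exactly.

```python
import numpy as np


def nan_zscore_np(x, ddof=0):
    """Z-score an array, ignoring nan in the mean and std.

    Hypothetical helper; ddof=0 mirrors the scipy.stats.zscore default.
    """
    x = np.asarray(x, dtype=float)
    # np.nanmean / np.nanstd skip nan entries; nan inputs stay nan in the output.
    return (x - np.nanmean(x)) / np.nanstd(x, ddof=ddof)


a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307,
              0.6599, 0.1065, 0.0508])

z0 = nan_zscore_np(a)           # population std (ddof=0)
z1 = nan_zscore_np(a, ddof=1)   # sample std (ddof=1)

# With n non-nan values, the two differ only by the factor sqrt((n - 1) / n).
n = np.count_nonzero(~np.isnan(a))
scale = np.sqrt((n - 1) / n)
```

Here z1 equals z0 * scale on every non-nan entry, so the ranking and relative spacing of the scores are the same either way.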