Comparing two columns with pandas
Question
Starting from this:
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0
I want to use something like an if statement in pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']
Basically, I want to check each row with the if statement and create a new column.
The docs say to use .all, but there is no example...
Answers
You can use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A where cond is True, and to B where cond is False.
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
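The parentheses and `&` in the condition above stand in for the `and` from the question. Here is a minimal sketch of why, using two small made-up Series (`s` and `t` are illustrative names, not from the question). Python's `and` asks for a single truth value of a whole Series, which pandas refuses to guess, while `&` combines the comparisons row by row.

```python
import pandas as pd

# Illustrative data, not taken from the question.
s = pd.Series([1, 5, 3])
t = pd.Series([2, 4, 3])

message = None
try:
    # `and` calls bool() on the left-hand Series, which pandas rejects.
    if s >= t and s <= 4:
        pass
except ValueError as exc:
    message = str(exc)

# `&` works elementwise; the parentheses are needed because `&`
# binds more tightly than the comparison operators.
mask = (s >= t) & (s <= 4)

print(message)
print(mask.tolist())
```

The error message mentions `.any()` and `.all()`, which is probably what the question's reference to `.all` comes from. Those reduce a whole Series to a single boolean, so they are not what you want for a row-by-row test.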
If you have more than one condition, you can use np.select instead. For example, if you want df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']),
    df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN
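np.select uses the first condition that is True for each row. So if we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is False, the conditions and choices can be simplified to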
conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not hold if df['one'] or df['two'] contain NaNs.)
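To illustrate the NaN caveat, here is a small sketch on made-up float data (the values and the names `full` and `simplified` are assumptions for illustration only). Any comparison with NaN is False, so in the second row neither `one < two` nor `one >= two` holds, and the two versions of the conditions disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical float data; the second row has a NaN in column 'two'.
df = pd.DataFrame({'one': [10.0, 1.0],
                   'two': [1.2, np.nan],
                   'three': [20.0, 2.0]})

# Original conditions and choices.
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']],
    default=np.nan)

# Simplified conditions and choices.
simplified = np.select(
    [df['one'] < df['two'],
     df['one'] <= df['three']],
    [df['two'], df['one']],
    default=np.nan)

print(full)        # second entry is NaN
print(simplified)  # second entry is 1.0
```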
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you may be better off converting those strings to floats:
df2 = df.astype(float)
However, this changes the result, because strings are compared character by character, whereas floats are compared numerically:
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
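As a rough sketch of how the conversion affects the example from this answer, the same condition can be evaluated on the original strings and on the float copy (`mask_str`, `mask_num` and `que` are names introduced here for illustration):

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)

# Same condition, once on the strings and once on the floats.
mask_str = (df['one'] >= df['two']) & (df['one'] <= df['three'])
mask_num = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])

# With numeric comparison no row satisfies two <= one <= three.
que = np.where(mask_num, df2['one'], np.nan)

print(mask_str.tolist())
print(mask_num.tolist())
print(que)
```

On the strings the first row passes only because '10' sorts before '4.2'. After converting to floats no row satisfies the condition, so every entry of `que` is NaN.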