Comparing two columns using pandas

Mangs · posted 2022-08-20 11:24:22

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

I want to use something like an if statement in pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, I want to check each row with the if statement and create a new column from the result.

The docs say to use .all, but there is no example...

Answers

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and to B where cond is False.
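As a quick illustration of that rule, here is a minimal sketch on plain arrays. The values of cond, A and B are made up for the example and are not from the question.

```python
import numpy as np

# Hypothetical inputs, chosen only to show the element-wise selection
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take A[i] where cond[i] is True, otherwise B[i]
C = np.where(cond, A, B)
print(C)  # [ 1 20  3]
```

Applied to the DataFrame from the question: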

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
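For context on why the plain if in the question does not work: a comparison between two columns produces a whole boolean Series, and Python's if and and need a single True or False. The sketch below uses a made-up Series s to show the error and the element-wise alternative; .all() and .any() reduce a Series to one value, which is not what a row-by-row check needs.

```python
import pandas as pd

# Hypothetical boolean Series standing in for a column comparison
s = pd.Series([True, False, True])

try:
    if s:  # a Series has no single truth value, so this raises
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# & combines two boolean Series element by element (unlike `and`)
elementwise = s & pd.Series([True, True, False])
print(elementwise.tolist())  # [True, False, False]

# .all() / .any() collapse the whole Series into one boolean
print(s.all(), s.any())  # False True
```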

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not hold if df['one'] or df['two'] contain NaNs, since every comparison against NaN is False.)
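To make the NaN caveat concrete, here is a small sketch comparing the full and the simplified conditions. The frame df_nan and its values are invented for illustration. np.select picks the first condition that is True for each row, so the two versions diverge on a row where df['two'] is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with NaNs in 'one' and 'two'
df_nan = pd.DataFrame({'one': [1.0, np.nan, 5.0],
                       'two': [2.0, 3.0, np.nan],
                       'three': [9.0, 9.0, 9.0]})

# Full conditions, as in the first np.select example
full = np.select(
    [(df_nan['one'] >= df_nan['two']) & (df_nan['one'] <= df_nan['three']),
     df_nan['one'] < df_nan['two']],
    [df_nan['one'], df_nan['two']],
    default=np.nan)

# Simplified conditions, relying on "not (one < two)" implying "one >= two"
simplified = np.select(
    [df_nan['one'] < df_nan['two'],
     df_nan['one'] <= df_nan['three']],
    [df_nan['two'], df_nan['one']],
    default=np.nan)

print(full)        # last row is nan: 5.0 >= NaN is False
print(simplified)  # last row is 5.0: only one <= three is checked
```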

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings are compared character by character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False
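As a sketch of what that means for this data, rerunning the np.where expression on the float version df2 gives a different 'que' column: numerically, no row has one between two and three, so every entry comes out as NaN.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Convert the numeric-looking strings to floats before comparing
df2 = df.astype(float)

df2['que'] = np.where((df2['one'] >= df2['two']) & (df2['one'] <= df2['three']),
                      df2['one'], np.nan)

# Row 0 no longer qualifies: 10 >= 1.2 holds, but 10 <= 4.2 does not
print(df2)
```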
