Play pandas access_ lower

Mangs

7人浏览 · 2022-08-12 16:47:50

Mangs · 2022-08-12 16:47:50 发布

Official account: Special House
Author: Peter
Editor: Peter

Hello, I'm Peter~

This will be the last article on DataFrame data filtering, focusing on the use of three pairs of functions:

iloc and loc, the most important and frequently used pair of functions
at and iat
any and all

Important learning materials: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html , various examples on the official website of pandas take you to learn.

Extended reading

Pandas has a variety of data retrieval methods. There are many functions and skills that we need to master and accumulate. The previous two articles are:

pandas data retrieval operation of various coquettish - up
Various padans access operations - medium

Analog data

Two data are simulated in this paper:

The index of the first copy is of character type
The second index uses the default numeric type

import pandas as pd
import numpy as np

# First analog data df0

df0 = pd.DataFrame(
    [[one hundred and one, one hundred and two, one hundred and forty], [114, 95, 67], [eighty-seven, 128, 117]],
    index=['language', 'mathematics', 'English'],
    columns=['Xiao Ming', 'Xiao Hong',"Xiao Sun"])

df0

#  Second analog number df

df = pd.DataFrame({
    "name":['Xiao Ming','Xiao Wang','Zhang Fei','Guan Yu','Xiao Xiao Sun','Wang Jianguo','Liu Bei'],
    "sex":['male','female','female','male','female','male','female'],
    "age":[20,23,18,21,25,21,24],
    "score":[np.nan,600,550,np.nan,610,580,634],  # Two pieces of data are missing
    "address":[
        "Nanshan District, Shenzhen City, Guangdong Province",
        np.nan,  # Missing data
        "Yuhua District, Changsha City, Hunan Province",
        "Dongcheng District ",
        "Baiyun District, Guangzhou City, Guangdong Province",
        "Jiangxia District, Wuhan City, Hubei Province",
        "Longhua District, Shenzhen City, Guangdong Province"
        ]
})

df

iloc and loc

iloc filters by numeric value, while loc filters by attribute or row index name

iloc

Directly specify the value and take out the single line record

# 1. Use value

df1 = df.iloc[1]  # Row record of single value extraction
df1

# result
name          Xiao Wang
sex            female
age           23
score      600.0
address      NaN
Name: 1, dtype: object

Use colons to indicate all

df1 = df.iloc[1,:]  # : colon indicates all
df1

# result
name          Xiao Wang
sex            female
age           23
score      600.0
address      NaN
Name: 1, dtype: object

You can also use slices to retrieve data:

df1 = df.iloc[:3]  # Take out the first 3 lines of records
df1

Take out discontinuous multi line records:

df2 = df.iloc[[1,2,4]]  # Take out multiple lines of records
df2

	name	sex	age	score	address
1	Xiao Wang	female	23	600.0	NaN
2	Zhang Fei	female	18	550.0	Yuhua District, Changsha City, Hunan Province
4	Xiao Xiao Sun	female	25	610.0	Baiyun District, Guangzhou City, Guangdong Province

# 2. Extract some column attributes of row records

df3 = df.iloc[2,0:2]
df3

# result
name    Zhang Fei
sex      female
Name: 2, dtype: object

# Use slices in the column direction in steps of 2

df4 = df.iloc[2,0:5:2]  
df4

# result
name              Zhang Fei
age               18
address    Yuhua District, Changsha City, Hunan Province
Name: 2, dtype: object

# The row index is 2 and the column index numbers are 1 and 3

df5 = df.iloc[2,[1,3]]  
df5

# result
sex          female
score    550.0
Name: 2, dtype: object

# 3. Take out the specific value

df6 = df.iloc[2,4]
df6

# result
'Yuhua District, Changsha City, Hunan Province'

Using slices in both row and column directions, you can also specify the step size:

# 4. Use slices in both row and column directions

df7 = df.iloc[0:4,0:6:2]
df7

Compare with the original data:

!!! A very useful method: NP r_， Help us extract discontinuous column attributes

# 5. Take out the discontinuous row and column data and use NP r_

df8 = df.iloc[:, np.r_[0,2:4]]
df8

	name	age	score
0	Xiao Ming	20	NaN
1	Xiao Wang	23	600.0
2	Zhang Fei	18	550.0
3	Guan Yu	21	NaN
4	Xiao Xiao Sun	25	610.0
5	Wang Jianguo	21	580.0
6	Liu Bei	24	634.0

df9 = df.iloc[np.r_[0,2:4],:]
df9

	name	sex	age	score	address
0	Xiao Ming	male	20	NaN	Nanshan District, Shenzhen City, Guangdong Province
2	Zhang Fei	female	18	550.0	Yuhua District, Changsha City, Hunan Province
3	Guan Yu	male	21	NaN	Dongcheng District

loc

Use the row index name or column attribute to retrieve data directly

# 1. Take out a single column

df10 = df.loc[:,"name"]  
df10

0     Xiao Ming
1     Xiao Wang
2     Zhang Fei
3     Guan Yu
4    Xiao Xiao Sun
5    Wang Jianguo
6     Liu Bei
Name: name, dtype: object

# 2. Fetch multiple columns

df11 = df.loc[:,["name","age"]]  
df11

#  3. Using numeric values, take out the first row with index 0

df12 = df.loc[0]
df12

name              Xiao Ming
sex                male
age               20
score            NaN
address    Nanshan District, Shenzhen City, Guangdong Province
Name: 0, dtype: object

# 4. Take out the row records with indexes of 0, 1 and 3. At this time, all column fields are reserved

df13 = df.loc[[0,1,3]]
df13

# Use colon:, to indicate all columns. The effect is the same as above

df14 = df.loc[[0,1,3],:]  
df14

# 5. Take out some rows and some columns

df15 = df.loc[[0,1,3],["name","sex","score"]]
df15

# 6,!!! Use index slice: both start and end positions are included

df16 = df.loc[0:3]
df16

df.loc[:]  # Represents all data

# 7. When filtering columns, there must be row elements

# name and score columns of all rows
df17 = df.loc[:,["name","score"]]
df17

	name	score
0	Xiao Ming	NaN
1	Xiao Wang	600.0
2	Zhang Fei	550.0
3	Guan Yu	NaN
4	Xiao Xiao Sun	610.0
5	Wang Jianguo	580.0
6	Liu Bei	634.0

# age of all rows and all subsequent columns

df18 = df.loc[:,"age":]
df18

	age	score	address
0	20	NaN	Nanshan District, Shenzhen City, Guangdong Province
1	23	600.0	NaN
2	18	550.0	Yuhua District, Changsha City, Hunan Province
3	21	NaN	Dongcheng District
4	25	610.0	Baiyun District, Guangzhou City, Guangdong Province
5	21	580.0	Jiangxia District, Wuhan City, Hubei Province
6	24	634.0	Longhua District, Shenzhen City, Guangdong Province

# 8. Partial row, age and all subsequent columns

# Remember: the start and end positions are included, which is different from python slicing

df19 = df.loc[1:3,"age":]
df19

	age	score	address
1	23	600.0	NaN
2	18	550.0	Yuhua District, Changsha City, Hunan Province
3	21	NaN	Dongcheng District

# 9. Fetching for non numeric row index

df20 = df0.loc["language"]
df20

Xiao Ming    one hundred and one
 Xiao Hong    102
 Xiao Sun    140
Name: language, dtype: int64

# 10. Note that the two square brackets take out DataFrame data, and the single bracket is Series data

df0.loc[["language"]]

	Xiao Ming	Xiao Hong	Xiao Sun
language	101	102	140

df0.loc[["language","English"]]

	Xiao Ming	Xiao Hong	Xiao Sun
language	101	102	140
English	87	128	117

# 11. Extract some row and column data

df21 = df0.loc[["language","English"],"Xiao Ming"]  
df21

language    101
 English     87
Name: Xiao Ming, dtype: int64

df0.loc[["language","English"],["Xiao Ming","Xiao Sun"]]

	Xiao Ming	Xiao Sun
language	101	140
English	87	117

# 12. Directly use the row index name to retrieve data

df0.loc[["language","English"]]

	Xiao Ming	Xiao Hong	Xiao Sun
language	101	102	140
English	87	128	117

Comparison between the two

df.loc[[1,2]]

	name	sex	age	score	address
1	Xiao Wang	female	23	600.0	NaN
2	Zhang Fei	female	18	550.0	Yuhua District, Changsha City, Hunan Province

df.iloc[[1,2]]

	name	sex	age	score	address
1	Xiao Wang	female	23	600.0	NaN
2	Zhang Fei	female	18	550.0	Yuhua District, Changsha City, Hunan Province

# Specify the column property name we need

df.loc[[1,2],["name","score"]]

	name	score
1	Xiao Wang	600.0
2	Zhang Fei	550.0

# Take out rows 1 and 2, columns 0 and 3

df.iloc[[1,2],np.r_[0,3]]

	name	score
1	Xiao Wang	600.0
2	Zhang Fei	550.0

at and iat

at

The at function is similar to loc, but what the at function takes out is only a value

df22 = df.at[4,"sex"]
df22

'female'

df.at[2,"name"]

'Zhang Fei'

df0

	Xiao Ming	Xiao Hong	Xiao Sun
language	101	102	140
mathematics	114	95	67
English	87	128	117

# Specify both index and column names

df23 = df0.at['language','Xiao Sun']
df23

140

# at and loc

df.loc[1].at['age']

23

df

	name	sex	age	score	address
0	Xiao Ming	male	20	NaN	Nanshan District, Shenzhen City, Guangdong Province
1	Xiao Wang	female	23	600.0	NaN
2	Zhang Fei	female	18	550.0	Yuhua District, Changsha City, Hunan Province
3	Guan Yu	male	21	NaN	Dongcheng District
4	Xiao Xiao Sun	female	25	610.0	Baiyun District, Guangzhou City, Guangdong Province
5	Wang Jianguo	male	21	580.0	Jiangxia District, Wuhan City, Hubei Province
6	Liu Bei	female	24	634.0	Longhua District, Shenzhen City, Guangdong Province

# The fourth element with the column name name
df.name.at[4]   

'Xiao Xiao Sun'

iat

Like iloc, only numeric indexing is supported

df24 = df.iat[2,4]
df24

'Yuhua District, Changsha City, Hunan Province'

df.loc[2].iat[4]

'Yuhua District, Changsha City, Hunan Province'

df.iloc[2].iat[4]

'Yuhua District, Changsha City, Hunan Province'

any and all

any: True if at least one is True
All: all results are required to be True

When the incoming axis=1, the query will be performed according to the row; axis=0 means query by column

Comparison of data in Series

# Two False pass any and the result is False

pd.Series([False, False]).any()   # False

pd.Series([True, False]).any()  # True

pd.Series([True, False]).all()  # False

# any: skip null value

pd.Series([np.nan]).any()  # False

pd.Series([np.nan]).any(skipna=False)  # True

# all: skip null value

pd.Series([np.nan]).all()  # True

pd.Series([np.nan]).all(skipna=False)  #True

Comparison in DataFrame

df0

	Xiao Ming	Xiao Hong	Xiao Sun
language	101	102	140
mathematics	114	95	67
English	87	128	117

# 1. Fetch the data to be queried
df0.loc[:,["Xiao Ming","Xiao Hong"]]

	Xiao Ming	Xiao Hong
language	101	102
mathematics	114	95
English	87	128

# 2. Compare
df0.loc[:,["Xiao Ming","Xiao Hong"]] >= 100

	Xiao Ming	Xiao Hong
language	True	True
mathematics	True	False
English	False	True

any

# 3. Filter with any function

df0[(df0.loc[:,["Xiao Ming","Xiao Hong"]] >= 100).any(1)]

all

Only language can meet the needs of three people at the same time, and all of them are more than 100

# 4. Use the all function to filter: only the language meets 3 people and is greater than 100 at the same time

df0[(df0.loc[:,["Xiao Ming","Xiao Hong"]] >= 100).all(1)]

summary

This paper introduces the use of three pairs of functions of pandas through simulated data. Among them, loc and iloc functions are very common and practical functions, which they often use. So far, the data filtering part of pandas has been fully introduced.

Of course, the methods introduced are only part of pandas's rich retrieval skills. There are many functions and methods that readers need to learn and accumulate by themselves. I hope the methods introduced will be helpful to you.

Starting with the next article, we will introduce various operation skills in Pandas.

Python

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐

求助！为什么用InsCode部署会出现无限重定向？

Python

如何重塑熊猫。系列

问题:如何重塑熊猫。系列在我看来,它就像 pandas.Series 中的一个错误。 a = pd.Series([1,2,3,4]) b = a.reshape(2,2) b b 有类型 Series 但无法显示,最后一条语句给出异常,非常冗长,最后一行是“TypeError: %d format: a number is required, not numpy.ndarray”。 b.sha

Python

在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制]

问题:在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制] 我刚刚在这里](https://keras.io/initializers/)中阅读了有关[中的 Keras 权重初始化器的信息。在文档中,只介绍了不同的初始化程序。如: model.add(Dense(64, kernel_initializer='random_normal')) 当我没有指定kernel_initia