【Linux】(16) Text-processing commands: pipes + awk + xargs + tr + sort + uniq + cut

南昀晞 · Published 2022-04-04 20:08:24

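I. The pipe |

Passes the output of the preceding command to the following command.

1.1 What pipes are for

By default, a pipe passes only the standard output (stdout) of the preceding command to the next command as its input; error output (stderr) does not go through the pipe.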

[root@localhost 0325]# ip add | grep "inet"
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 192.168.255.132/24 brd 192.168.255.255 scope global noprefixroute dynamic ens33
    inet6 fe80::5e3f:2e7b:4978:aa4e/64 scope link noprefixroute
[root@localhost 0325]# ip fdlihua | grep "fd"
Object "fdlihua" is unknown, try "ip help".

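【Passing error output to the next command as well】

grep is a text-filtering command: it matches on a string and prints every line that contains it. In the transcript above the error message still shows on screen, but it comes straight from stderr and never passes through grep. Adding 2>&1 redirects stderr into stdout before the pipe, so grep receives it.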

[root@localhost 0325]# ip fdlihua 2>&1 | grep "fd"
Object "fdlihua" is unknown, try "ip help".
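Because both transcripts print the same message, the difference is easy to miss. Here is a minimal sketch that makes it visible, assuming a POSIX-style shell. emit_both is a made-up helper (not part of the original examples) that writes one line to stdout and one to stderr, and grep -c counts how many lines actually arrive through the pipe.

```shell
# Hypothetical helper: one line on stdout, one line on stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Only stdout enters the pipe; stderr bypasses it (silenced here with 2>/dev/null).
only_stdout=$(emit_both 2>/dev/null | grep -c "to")

# 2>&1 merges stderr into stdout before the pipe, so grep sees both lines.
both_streams=$(emit_both 2>&1 | grep -c "to")

echo "without 2>&1: $only_stdout line(s) reached grep"
echo "with 2>&1:    $both_streams line(s) reached grep"
```

The first count is 1 and the second is 2, which is the whole effect of 2>&1 in front of a pipe.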

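1.2 【Extension】How processes communicate with each other

1.2.1 Pipes

By default, to keep processes safe from one another, one process cannot freely access another process.

A pipe acts as an intermediary between two processes. Its data is held in memory, and by default it receives only the standard output of the preceding command.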


[root@localhost 0325]# find / -type "p"     ==》search from the root directory for files of type pipe (named pipes, FIFOs)
/run/dmeventd-client
/run/dmeventd-server
/run/systemd/inhibit/1.ref
/run/systemd/sessions/3704.ref
/run/systemd/sessions/3703.ref
/run/systemd/sessions/2029.ref
/run/systemd/sessions/4.ref
/run/systemd/initctl/fifo
[root@localhost 0325]# ll /run/systemd/initctl/fifo
prw-------. 1 root root 0 3月  23 12:47 /run/systemd/initctl/fifo
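Named pipes like the ones listed above can also be created by hand. The following is a rough sketch, assuming mkfifo and mktemp are available on the system; the file name demo.fifo and the message are invented for illustration.

```shell
# Create a named pipe inside a throwaway directory.
workdir=$(mktemp -d)
fifo="$workdir/demo.fifo"
mkfifo "$fifo"

# -p is true when the path exists and is a named pipe (the "p" in "prw-------").
if [ -p "$fifo" ]; then is_fifo=yes; else is_fifo=no; fi

# A writer blocks until a reader opens the FIFO, so the writer runs in the background.
echo "hello through fifo" > "$fifo" &
received=$(cat "$fifo")
wait

echo "is_fifo=$is_fifo received=$received"
rm -r "$workdir"
```

Two unrelated processes can exchange data this way as long as both know the path of the FIFO.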

1.2.2 Socket files

A socket file can exist as an entry on disk

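1.3 【Extension】Extracting fields with awk

【The three text-processing tools of shell programming】

grep: filters lines
awk: extracts fields within a line; one field corresponds to one column
sed: substitutes text

awk is the extraction command

'{ }' is fixed syntax: it encloses the action awk runs on each line, here print to output content

$0 is the whole line

$1 is the 1st field

$2 is the 2nd field

-F sets the field separator. By default awk separates fields on whitespace (spaces and tabs)

, between the items of print means they are separated by a single space in the output

NF is an awk variable that holds how many fields the current line has (number of fields)

$NF is the last column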

[root@localhost 0324]# cat student_info.txt
name    age     sex     grade
cali    36      m       80
lihua   18      m       90
hanmei  25      f       85
liyu    22      f       93
[root@localhost 0324]# cat student_info.txt | awk '{print $4}'
grade
80
90
85
93
[root@localhost 0324]# cat student_info.txt | awk '{print $1,$4}'
name grade
cali 80
lihua 90
hanmei 85
liyu 93
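The transcript above does not use -F, NF or $NF, so here is a small sketch of them. The sample record is a typical /etc/passwd-style line chosen for illustration, not taken from the machine in the transcripts.

```shell
line="root:x:0:0:root:/root:/bin/bash"   # illustrative passwd-style record

# -F: splits on ":"; NF is the number of fields, $NF is the last field.
summary=$(echo "$line" | awk -F: '{print NF, $1, $NF}')
echo "$summary"    # 7 root /bin/bash

# $0 is the whole input line, unchanged.
whole=$(echo "$line" | awk -F: '{print $0}')
echo "$whole"
```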

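1.4 ; and the other command connectors

; lets you write several commands on one line; all of them run, whether or not the earlier ones succeeded

cmd1 ; cmd2 ; cmd3 runs cmd1 first, then cmd2, then cmd3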

[root@localhost 0325]# cd dfhw ; echo hello ; cat sadfhs
-bash: cd: dfhw: No such file or directory
hello
cat: sadfhs: No such file or directory

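cmd1 && cmd2 runs cmd2 only if cmd1 succeeds; if cmd1 fails, cmd2 is skipped

cmd1 || cmd2 runs cmd2 only if cmd1 fails; if cmd1 succeeds, cmd2 is skipped

cmd1 && cmd2 || cmd3 runs cmd2 if cmd1 succeeds and runs cmd3 if cmd1 fails; note that cmd3 also runs when cmd1 succeeds but cmd2 fails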

[root@localhost 0325]# echo 1 && echo 2 || echo 3
1
2
[root@localhost 0325]# cat 1 && echo 2 || echo 3
cat: 1: No such file or directory
3

Do not confuse these three connectors with the condition of an if statement
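One way to see why the chain is not the same as if/else is to run it with the built-in true and false commands, which always succeed and always fail. This is only a sketch; the echoed strings are placeholders for real commands.

```shell
# cmd1 succeeds and cmd2 succeeds: only cmd2 runs.
both_ok=$(true && echo "cmd2 ran" || echo "cmd3 ran")

# cmd1 fails: cmd2 is skipped and cmd3 runs.
first_fails=$(false && echo "cmd2 ran" || echo "cmd3 ran")

# cmd1 succeeds but cmd2 fails: cmd3 runs anyway, unlike the else branch of an if.
second_fails=$(true && false || echo "cmd3 ran")

echo "both ok:      $both_ok"
echo "cmd1 fails:   $first_fails"
echo "cmd2 fails:   $second_fails"
```

When cmd3 must run only if cmd1 fails, a real if ... then ... else ... fi is the safer choice.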

1.5 Two short exercises

1.5.1

【Answer】EGHI

[root@localhost 0325]# cat aa bb | cat
aaaaaa
bbbbbbbbbbbb
[root@localhost 0325]# cat < aa; cat < bb
aaaaaa
bbbbbbbbbbbb
[root@localhost 0325]# cat aa bb > /dev/stdout
aaaaaa
bbbbbbbbbbbb
[root@localhost 0325]# cat aa bb > /dev/stderr
aaaaaa
bbbbbbbbbbbb

1.5.2

# Question 1
id feng &>/dev/null && echo 123456|passwd feng --stdin
# Question 2
==How to check whether a directory exists
1. cd
2. ls
3. stat
4. find
5. [ -d /backup ]   ==>[ -d directory-path ]
==Answer
[ -d /backup ] || mkdir /backup    or    mkdir -p /backup
# Question 3
du -sh / 2>/dev/null

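1.5.3 【Background】How to check whether a directory exists

cd
ls
stat
find
[ -d /backup ] and test -d /backup are equivalent: the commands have the same effect, only the syntax differs

【Note】There must be a space on the inside of each square bracket

-d tests whether the path is a directory

-f tests whether the path is a regular file

-e tests whether the file or directory exists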
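The three tests can be tried safely in a temporary directory. This is a minimal sketch, assuming mktemp is available; check is a made-up helper and notes.txt is a made-up file name.

```shell
workdir=$(mktemp -d)          # temporary directory for the demo
touch "$workdir/notes.txt"    # an ordinary (regular) file

# Hypothetical helper: prints yes or no for a test flag and a path.
check() {
    if [ "$1" "$2" ]; then echo yes; else echo no; fi
}

dir_is_dir=$(check -d "$workdir")               # a directory passes -d
file_is_dir=$(check -d "$workdir/notes.txt")    # a file does not
file_is_file=$(check -f "$workdir/notes.txt")   # a file passes -f
missing_exists=$(check -e "$workdir/missing")   # nothing there, so -e fails

echo "dir -d: $dir_is_dir, file -d: $file_is_dir, file -f: $file_is_file, missing -e: $missing_exists"
rm -r "$workdir"
```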

1.5.4 Checking whether a directory exists in Python

Check: os.path.exists()   Create: os.mkdir()

>>> import os
>>> os.path.exists("/backup")
False
>>> os.mkdir("/backup")
>>> os.path.exists("/backup")
True

【Script】

[root@localhost 0325]# vim dir.py

[root@localhost 0325]# cat dir.py 
#!/usr/bin/python3

import os

if os.path.exists("/backup"):
	print("/backup is exists")
else:
	os.mkdir("/backup")
	print("/backup create ok")
[root@localhost 0325]# python3 dir.py 
/backup is exists
[root@localhost 0325]# rm -rf /backup
[root@localhost 0325]# python3 dir.py 
/backup create ok
[root@localhost 0325]# ls /
backup

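1.6 xargs

Passes the output of the preceding command to the following command as arguments

Purpose: split an argument list into smaller chunks and hand them to another command

It reads data from stdin and appends it to the command line as arguments

This lets commands that do not read from stdin take part in a pipeline.

【Note】xargs relies on a pipe; it only turns the output of the preceding command into arguments for the following command, which gives more precise control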

[root@localhost 0325]# which mkdir
/usr/bin/mkdir
[root@localhost 0325]# which mkdir | ls -l  ==》the pipe delivers the output on stdin, but ls -l does not read stdin, so the input has no effect
total 8
-rw-r--r--. 1 root root 144 3月  25 13:29 dir.py
-rw-r--r--. 1 root root 210 3月  25 09:58 position.py

[root@localhost 0325]# ls -l
total 8
-rw-r--r--. 1 root root 144 3月  25 13:29 dir.py
-rw-r--r--. 1 root root 210 3月  25 09:58 position.py

How can we get the following result?

[root@localhost 0325]# ls -l /usr/bin/mkdir
-rwxr-xr-x. 1 root root 79768 8月  20 2019 /usr/bin/mkdir

【Using xargs】

[root@localhost 0325]# which mkdir | xargs ls -l
-rwxr-xr-x. 1 root root 79768 8月  20 2019 /usr/bin/mkdir
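The "smaller chunks" behaviour mentioned earlier can be sketched with the -n option, which limits how many arguments go into each invocation. echo stands in for a real command here, and the letters are placeholder input.

```shell
# All input lines become arguments of a single echo call.
joined=$(printf '%s\n' a b c d e | xargs echo)
echo "$joined"     # a b c d e

# -n 2 passes at most two arguments per call, so echo runs three times.
chunks=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$chunks"
```

With -n 2 the output has three lines: "a b", "c d" and "e".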

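II. Text-processing commands

2.1 The tr command

tr - translate or delete characters

tr is a tool for translating and deleting characters

tr set1 set2 means: replace each character of set1 with the character at the same position in set2

# Replace every 1 with a, every 2 with b, and every 3 with c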
[root@localhost 0325]# echo 123456112233445566 | tr 123 abc
abc456aabbcc445566

[root@localhost 0325]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@localhost 0325]# echo $PATH | tr ":"  "\n"   ==》replace every : with a newline
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/root/bin
[root@localhost 0325]# echo $PATH | tr ":"  " "  ==》replace every : with a space
/usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /root/bin

2.1.1 A caveat about tr

tr works only on stdin; it cannot operate on a file directly

Use a pipe or < to feed input to tr

[root@localhost 0325]# cat name
chunhuofanmingyou
chunhuofanmingyou
chunhuofanmingyou
chunhuofanmingyou
chunhuofanmingyou
chunhuofanmingyou
[root@localhost 0325]# tr "c" "C" name
tr: extra operand 'name'
Try 'tr --help' for more information.
[root@localhost 0325]# tr "c" "C" < name
Chunhuofanmingyou
Chunhuofanmingyou
Chunhuofanmingyou
Chunhuofanmingyou
Chunhuofanmingyou
Chunhuofanmingyou

[root@localhost 0325]# cat name |tr "f" "F"
chunhuoFanmingyou
chunhuoFanmingyou
chunhuoFanmingyou
chunhuoFanmingyou
chunhuoFanmingyou
chunhuoFanmingyou

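2.1.2 Replacing some of the content of a file

In a tr set, 0-9. is equivalent to 0123456789. The square brackets used in the commands below are not required: tr treats [ and ] as ordinary characters of the set.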

[root@localhost 0325]# cat name 
1.1chunhuofanmingyou
1.2chunhuofanmingyou
1.3chunhuofanmingyou
1.4chunhuofanmingyou
1.5chunhuofanmingyou
1.6chunhuofanmingyou
[root@localhost 0325]# cat name | tr '[0-9]' ' '  ==》replace the digits in the file with spaces
 . chunhuofanmingyou
 . chunhuofanmingyou
 . chunhuofanmingyou
 . chunhuofanmingyou
 . chunhuofanmingyou
 . chunhuofanmingyou
[root@localhost 0325]# cat name | tr '[0-9.]' ' '  ==》replace the digits and . in the file with spaces
   chunhuofanmingyou
   chunhuofanmingyou
   chunhuofanmingyou
   chunhuofanmingyou
   chunhuofanmingyou
   chunhuofanmingyou

2.1.3 Options (-d -s)

-d deletes the listed characters

-s squeezes repeats: every run of the same listed character is collapsed into a single character

[root@localhost 0325]# echo 1345464321343543 | tr -d 3
14546421454

[root@localhost 0325]# echo 12111234533333333322 | tr -s 123
121234532
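As a small sketch of the difference between replacing and deleting, the following reuses one line of the name file from 2.1.2 and shows -d, -s and a range-to-range translation. The second and third input strings are made up for the demo.

```shell
sample="1.1chunhuofanmingyou"

# -d removes the digits and the dot entirely (no spaces are left behind).
deleted=$(echo "$sample" | tr -d '0-9.')
echo "$deleted"     # chunhuofanmingyou

# -s collapses each run of spaces into one space.
squeezed=$(echo "a   b    c" | tr -s ' ')
echo "$squeezed"    # a b c

# Ranges work on both sides: lower case to upper case.
upper=$(echo "path" | tr 'a-z' 'A-Z')
echo "$upper"       # PATH
```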

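2.2 sort

A sorting command; the default field separator is whitespace

By default it compares lines character by character, starting from the first character of each line ==》ascending order (smallest first); in the C locale this is the order of the ASCII codes

-n sorts numerically (number) ==》a run of digits is read as one number

【Example】22 and 203

【Explanation】Without -n, sort first compares the first characters (2 and 2) ==》equal, then the second characters (2 and 0) ==》the first one is larger ==》22 sorts after 203

With -n, 22 and 203 are compared as whole numbers ==》203>22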

[root@localhost 0325]# cat test.txt | sort
203
22
[root@localhost 0325]# cat test.txt | sort -n
22
203

-r sorts in descending order (reverse)

# Converting between binary, characters and ASCII codes in Python
>>> bin(99)
'0b1100011'   ==》a value that starts with 0b is a binary number
>>> chr(97)
'a'
>>> ord('a')
97

[root@localhost 0325]# cat student_info.txt 
name    age    sex    Chinese    Math    English
cali    36     M      80         75      76
fmy     3      M      60         87      79
fzt     19     F      83         92      75
nyx     26     F      92         85      86

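2.2.1 Default behaviour

Lines are sorted in ascending order (smallest first) by the ASCII value of their first character. If the first characters are equal, the second characters are compared in the same way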

[root@localhost 0325]# cat student_info.txt |sort
cali    36     M      80         75      76
fmy     3      M      60         87      79
fzt     19     F      83         92      75
name    age    sex    Chinese    Math    English
nyx     26     F      92         85      86

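2.2.2 Choosing the sort key with -k

Selects which column to sort by; the default order is ascending. -k4 starts the key at field 4 and extends it to the end of the line; write -k4,4 to compare field 4 only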

[root@localhost 0325]# cat student_info.txt |sort -k4
fmy     3      M      60         87      79
cali    36     M      80         75      76
fzt     19     F      83         92      75
nyx     26     F      92         85      86
name    age    sex    Chinese    Math    English
[root@localhost 0325]# cat student_info.txt |sort -k4 -n 
name    age    sex    Chinese    Math    English
fmy     3      M      60         87      79
cali    36     M      80         75      76
fzt     19     F      83         92      75
nyx     26     F      92         85      86
[root@localhost 0325]# cat student_info.txt |sort -k4 -nr
nyx     26     F      92         85      86
fzt     19     F      83         92      75
cali    36     M      80         75      76
fmy     3      M      60         87      79
name    age    sex    Chinese    Math    English
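In the outputs above the header line takes part in the sort and ends up in different places. A possible way around that, sketched here with a shortened copy of the table, is to drop the header with tail -n +2 and restrict the key to the fourth field with -k4,4. scores is a made-up helper that prints the sample rows.

```shell
# Hypothetical helper that prints a shortened student_info.txt.
scores() {
    printf '%s\n' \
        "name age sex Chinese" \
        "cali 36 M 80" \
        "fmy 3 M 60" \
        "fzt 19 F 83" \
        "nyx 26 F 92"
}

# tail -n +2 starts at line 2 (skips the header);
# -k4,4nr sorts on field 4 only, numerically, in descending order.
ranked=$(scores | tail -n +2 | sort -k4,4nr)
echo "$ranked"

best=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1}')
echo "highest Chinese score: $best"
```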

2.2.3 Setting the field separator with -t

【Example】

[root@localhost 0325]# cat /etc/passwd | sort -n -k 3 -t : -r
lihua321:x:1012:1012::/home/lihua321:/bin/bash
shijunhao:x:1011:1011::/home/shijunhao:/bin/bash
liangluyao:x:1010:1010::/home/liangluyao:/bin/bash
xiaohong:x:1009:1009::/home/xiaohong:/bin/bash

2.2.4 A combined example

[root@localhost 0325]# ps aux | sort -k4 -rn | head -5
root        715  0.0  1.5 359024 29168 ?        Ssl  3月24   0:03 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
root       1076  0.0  0.9 574280 17456 ?        Ssl  3月24   0:08 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
polkitd     682  0.0  0.6 613004 13012 ?        Ssl  3月24   0:05 /usr/lib/polkit-1/polkitd --no-debug
root        754  0.0  0.5 700304  9536 ?        Ssl  3月24   0:08 /usr/sbin/NetworkManager --no-daemon
root      72201  0.0  0.3 158904  5608 ?        Ss   13:08   0:00 sshd: root@pts/1
[root@localhost 0325]# ps aux | sort -k3 -rn | head

[root@localhost 0325]# ps aux | sort -k4 -rn | head -5|awk '{print $2,$4,$11}'
715 1.5 /usr/bin/python2
1076 0.9 /usr/bin/python2
682 0.6 /usr/lib/polkit-1/polkitd
754 0.5 /usr/sbin/NetworkManager
72201 0.3 sshd:

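2.3 uniq

Removes duplicates (consecutive repeated lines are shown only once) ==》always sort the lines first so that identical lines end up next to each other

-c counts how many times each line occurs (count)

-u shows only the lines that are not repeated (unique)

-d shows only the lines that are repeated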

[root@localhost 0325]# cat student_info.txt 
name    age    sex    Chinese    Math    English
cali    36     M      80         75      76
fmy     3      M      60         87      79
fzt     19     F      83         92      75
nyx     26     F      92         85      86
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'
sex
M
M
F
F
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'|sort
F
F
M
M
sex
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'|sort|uniq
F
M
sex
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'|sort|uniq -c
      2 F
      2 M
      1 sex
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'|sort|uniq -u
sex
[root@localhost 0325]# cat student_info.txt |awk '{print $3}'|sort|uniq -d
F
M
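To see why the sort step matters, here is a short sketch with made-up input in which the duplicates are not adjacent. votes is a hypothetical helper, and xargs echo is used only to join the result onto one line.

```shell
votes() { printf '%s\n' M F M F; }   # made-up, unsorted input

# No two identical lines are adjacent, so uniq removes nothing.
without_sort=$(votes | uniq | xargs echo)
echo "$without_sort"    # M F M F

# After sorting, duplicates are adjacent and collapse.
with_sort=$(votes | sort | uniq | xargs echo)
echo "$with_sort"       # F M

# uniq -c puts the count first; awk swaps it into value=count form.
counts=$(votes | sort | uniq -c | awk '{print $2 "=" $1}' | xargs echo)
echo "$counts"          # F=2 M=2
```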

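2.3.1 【Example】Set up an nginx web server and find the 3 IP addresses with the most requests

【Steps】

Step 1: install the nginx web server software ==》yum install nginx -y

Step 2: start the nginx service ==》service nginx restart

Step 3: stop the firewall ==》service firewalld stop

Check the firewall rules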

[root@localhost 0325]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

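Step 4: look up the IP address ==》ip add

Step 5: open a browser, enter the IP address, and visit your web server

[root@localhost nginx]# curl http://192.168.255.132 ==》access it from within Linux

Step 6: change the site's home page

[root@localhost 0325]# cd /usr/share/nginx/html   ==》enter the directory where nginx keeps its web pages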
[root@localhost html]# ls
404.html  en-US  img           index.html      poweredby.png
50x.html  icons  IMG_7438.JPG  nginx-logo.png

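index.html is the first page a visitor sees ==》the home page

echo "hello, world" > index.html overwrites the content of index.html

Step 7: refresh the page opened in the browser earlier; it now shows the content we redirected into the file

Step 8: enter the directory that holds the nginx logs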

[root@localhost html]# cd /var/log/nginx
[root@localhost nginx]# ls
access.log  access.log-20211222  error.log  error.log-20211222

access.log is the access log: it records who requested which page at what time, and whether the request succeeded or failed

[root@localhost nginx]# cat access.log | tail -1
192.168.255.1 - - [25/Mar/2022:17:07:54 +0800] "GET /IMG_7438.JPG HTTP/1.1" 304 0 "http://192.168.255.132/" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko" "-"

192.168.255.1 is the client machine that accessed our web service

[25/Mar/2022:17:07:54 +0800] is the time of the request

【Answer】Find the 3 IP addresses with the most requests

[root@localhost nginx]# cat access.log | awk '{print $1}'|sort|uniq -c|sort -rn|head -3|awk '{print $2}'
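The same pipeline can be tried without a running nginx. The sketch below feeds it ten invented log lines; make_log is a hypothetical helper and the 10.0.0.x addresses are made up, only the position of the IP in the first field matches the real access.log.

```shell
# Hypothetical helper: prints one fake access-log line per IP argument.
make_log() {
    printf '%s - - [25/Mar/2022:17:07:54 +0800] "GET / HTTP/1.1" 200 0\n' \
        10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.2 \
        10.0.0.1 10.0.0.4 10.0.0.1 10.0.0.2 10.0.0.3
}

# 1. awk keeps the IP column      2. sort groups identical IPs
# 3. uniq -c counts each group    4. sort -rn puts the largest count first
# 5. head -n 3 keeps the top 3    6. awk drops the count column
top3=$(make_log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 3 | awk '{print $2}' | xargs echo)
echo "$top3"    # 10.0.0.1 10.0.0.2 10.0.0.3
```

In this sample 10.0.0.1 appears 4 times, 10.0.0.2 3 times, 10.0.0.3 twice and 10.0.0.4 once, so the fourth address is cut off by head.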

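2.4 cut

A command for cutting out parts of text

The default delimiter is tab

Extracts columns from a text file or a text stream

【Format】cut -option range file

2.4.1 Common options

-c extracts characters (single characters) at the given positions, e.g. echo 123456|cut -c 2 ==》 2

-f extracts fields (columns) in the given range (fields)

-d sets the delimiter

[root@localhost 0325]# cat student_info.txt |cut -f 1  ==》the default delimiter is tab; these lines contain no tab, so cut prints each line whole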
name    age    sex    Chinese    Math    English
cali    36     M      80         75      76
fmy     3      M      60         87      79
fzt     19     F      83         92      75
nyx     26     F      92         85      86
[root@localhost 0325]# cat student_info.txt |cut -d " " -f 1
name
cali
fmy
fzt
nyx

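【Note】Sometimes cut does not return the expected field: it treats the empty string between two consecutive delimiters as a field of its own. Combine it with tr to remove the extra delimiters so that none are consecutive

[root@localhost 0325]# echo 1#2#3#4##5|cut -d "#" -f 5   ==》prints an empty line: cut counts the empty string between # and # as a field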

[root@localhost 0325]# echo 1#2#3#4##5|cut -d "#" -f 6
5
[root@localhost 0325]# w
 17:42:03 up 1 day,  2:43,  3 users,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1                      三12    2days  0.05s  0.05s -bash
root     tty2                      四09    7:57m  0.04s  0.04s -bash
root     pts/1    192.168.255.1    13:08    3.00s  0.36s  0.00s w
[root@localhost 0325]# w|cut -d " " -f 1,2  ==》the same problem: the repeated spaces prevent a clean cut
 17:42:06
USER 
root 
root 
root 
[root@localhost 0325]# w|tr -s " " |cut -d " " -f 1,2  ==》combined with tr, the extra spaces are squeezed out first
 17:42:24
USER TTY
root tty1
root tty2
root pts/1
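The effect of repeated delimiters can be reproduced on a single line. This is a minimal sketch with a made-up line in the style of the w output; awk is included for comparison because it splits on runs of whitespace by default.

```shell
line="root     tty1    192.168.255.1"   # made-up line with runs of spaces

# Field 2 is the empty string between the first two spaces.
raw=$(echo "$line" | cut -d ' ' -f 2)

# Squeezing the spaces first makes field 2 the second word.
squeezed=$(echo "$line" | tr -s ' ' | cut -d ' ' -f 2)

# awk treats any run of whitespace as one separator.
with_awk=$(echo "$line" | awk '{print $2}')

echo "cut alone: '$raw'"
echo "tr + cut:  '$squeezed'"
echo "awk:       '$with_awk'"
```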

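2.4.2 Ranges

Similar to slicing in Python, except that positions start at 1 and both ends of a range are included

n the nth item

n- from the nth item to the end of the line

-m from the start of the line to the mth item

n,m the nth item and the mth item: individual items

n-m the nth through the mth item: a contiguous range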

[root@localhost 0325]# echo 123456|cut -c 2
2
[root@localhost 0325]# echo 123456|cut -c 2-4
234
[root@localhost 0325]# echo 123456|cut -c 1,3,5
135
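The same range forms also work with -f. A short sketch on a made-up colon-separated record:

```shell
record="a:b:c:d:e"   # made-up record with five fields

middle=$(echo "$record" | cut -d: -f 2-4)     # fields 2 through 4
from_third=$(echo "$record" | cut -d: -f 3-)  # field 3 to the end of the line
up_to_two=$(echo "$record" | cut -d: -f -2)   # start of the line to field 2
picked=$(echo "$record" | cut -d: -f 1,5)     # fields 1 and 5 only

echo "$middle $from_third $up_to_two $picked"   # b:c:d c:d:e a:b a:e
```

When several fields are selected, cut joins them with the same delimiter it split on.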

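III. A practice exercise

grep is a filtering command; it prints the lines that contain the matching string

Option: -o prints only the matched text itself; anything that does not match is not shown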

# Question 1
==Answer 1
[root@localhost ~]# ll -R | sort -n -k5
==Answer 2
[root@localhost ~]# ll -R | sort -n -k5 | awk '{print $5,$9}'
# Question 2
[root@localhost ~]# cat /etc/passwd | cut -d ":" -f 7 | sort |uniq -c
[root@localhost ~]# cat /etc/passwd | awk -F: '{print $7}'|sort |uniq -c
# Question 3
==Answer 1
[root@localhost ~]# df -Th | awk '{print $1,$2,$6}'
==Answer 2
[root@localhost ~]# df -Th |tr -s " " |cut -d " " -f 1,2,6
# Question 4
==Answer 1
[root@localhost ~]# cat /etc/passwd|tr ":" "\n" | grep sbin|awk -F '/' '{print $2}'|uniq -c
==Answer 2
[root@localhost ~]# cat /etc/passwd|tr ":" "\n" | grep sbin|wc -l
==Answer 3
[root@localhost ~]# cat /etc/passwd|grep -o sbin|wc -l
==Answer 4
[root@localhost ~]# cat /etc/passwd|grep -o sbin|uniq -c
# Question 5
[root@localhost ~]# ps aux | sort -nr -k 4|head -5
# Question 6
[root@localhost ~]# ps aux | sort -nr -k 3|head -5
# Question 7
[root@localhost ~]# ip add | grep "192"
# Question 8
[root@localhost nginx]# cat access.log|awk '{print $1}' | sort |uniq -c|sort -nr|head -3
# Question 9
[root@localhost nginx]# cat access.log|awk '{print $9}' | sort |uniq -c|sort -nr|head -2