Batch downloading with wget on Linux


AISeekOnline · published 2018-02-02 10:09:10

This article is reposted from https://www.cnblogs.com/chenjinxi/p/7479386.html

Requirement: given the URLs of 50 PDF files, how can they all be downloaded in one batch?

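Option 1: use wget's built-in -i option, which reads the download URLs from a file. The advantage is that a single wget process downloads all the PDFs, so no process is repeatedly started and stopped.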

[root@Jenkins tmp]# pwd
/root/tmp
[root@Jenkins tmp]# wc -l 50pdf.log 
50 50pdf.log
[root@Jenkins tmp]# head -3 50pdf.log 
14788669468643331.pdf
1479035133045678.pdf
14799731544302441.pdf
[root@Jenkins tmp]# awk '{print "http://xxxxx/"$1}' 50pdf.log > download.log
[root@Jenkins tmp]# head -3 download.log 
http://xxxxx/14788669468643331.pdf
http://xxxxx/1479035133045678.pdf
http://xxxxx/14799731544302441.pdf
[root@Jenkins tmp]# wget -i download.log 
--2017-09-05 16:12:52--  http://xxxxx/14788669468643331.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2601963 (2.5M) [application/pdf]
Saving to: “14788669468643331.pdf”

100%[========================================================================================================================================================================>] 2,601,963    244K/s   in 10s     

2017-09-05 16:13:02 (245 KB/s) - “14788669468643331.pdf” saved [2601963/2601963]
....................................... (output omitted)
--2017-09-05 16:14:04--  http://xxxxx/1481341338750833.pdf
Reusing existing connection to nfs.htbaobao.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 152155 (149K) [application/pdf]
Saving to: “1481341338750833.pdf”

100%[========================================================================================================================================================================>] 152,155      209K/s   in 0.7s    

2017-09-05 16:14:05 (209 KB/s) - “1481341338750833.pdf” saved [152155/152155]

FINISHED --2017-09-05 16:14:05--
Downloaded: 50 files, 16M in 1m 13s (226 KB/s)

[root@Jenkins tmp]# ls
14788669468643331.pdf 1481187682278708.pdf 1481262534034760.pdf 1481266593232456.pdf 1481340827926207.pdf 1481340948842260.pdf 1481341049634040.pdf 1481341172815801.pdf 1481341307823881.pdf
1479035133045678.pdf 1481193562811982.pdf 1481262611307371.pdf 1481267034803389.pdf 1481340853666343.pdf 1481340973957872.pdf 1481341112979143.pdf 1481341185245978.pdf 1481341338750833.pdf
14799731544302441.pdf 1481247789582233.pdf 1481262623674903.pdf 1481270022285676.pdf 1481340897933322.pdf 1481341008561312.pdf 1481341130545646.pdf 1481341216517700.pdf 50pdf.log
14799944743125144.pdf 1481262178457017.pdf 1481262846773279.pdf 1481286012498927.pdf 1481340922434822.pdf 1481341008584230.pdf 1481341134346522.pdf 1481341229730723.pdf download.log
1481034002739896.pdf 1481262229905206.pdf 1481265452669335.pdf 1481340787767089.pdf 1481340927135663.pdf 1481341022043499.pdf 1481341148759269.pdf 1481341244148718.pdf
1481095290513785.pdf 1481262241457479.pdf 1481265807661321.pdf 1481340826599027.pdf 1481340943094250.pdf 1481341045655154.pdf 1481341159027852.pdf 1481341261314587.pdf
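
The steps in the session above (prefix each file name with the base URL, then pass the list to one wget process) can be wrapped in two small functions. This is only a sketch: the function names build_url_list and download_list, the example.com base URL, and the ./pdf directory are placeholders that do not appear in the original session. The awk step additionally skips blank lines and strips a trailing carriage return, on the assumption that the name list might have been edited on Windows.

```shell
#!/usr/bin/env bash
# Sketch: build a URL list from a file of names, then give it to one wget process.
# Function names, base URL and target directory are hypothetical placeholders.

build_url_list() {
    # $1 = base URL, $2 = file with one name per line, $3 = output URL list
    local base_url=${1%/} names_file=$2 list_file=$3
    # Strip a trailing CR, skip blank lines, and prefix each name with the base URL
    awk -v base="$base_url" '{ sub(/\r$/, "") } NF { print base "/" $1 }' \
        "$names_file" > "$list_file"
}

download_list() {
    # $1 = URL list, $2 = target directory
    # -i reads URLs from the file, -P sets the download directory,
    # -nc skips files that already exist locally
    wget -nc -P "$2" -i "$1"
}

# Usage (adjust the placeholders first):
# build_url_list "http://example.com/pdf" 50pdf.log download.log
# download_list download.log ./pdf
```

Because download_list still calls wget once with -i, all files are fetched by the same process, as in the session above.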

While the download is running, open another window and check whether it is the same wget process throughout:

[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
[root@Jenkins ~]#

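Option 2: put the URLs in a file, then write a script whose for loop hands one URL at a time to wget. The drawback is that every PDF starts a new wget process, which exits once that download finishes, and this repeats until the last URL, which costs more system resources.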

[root@Jenkins tmp]# ls
50pdf.log  download.log  wget_pdf.sh
[root@Jenkins tmp]# cat wget_pdf.sh
#!/usr/bin/env bash
# Start one wget process per URL listed in download.log
for url in $(cat /root/tmp/download.log); do
    wget "$url"
done
[root@Jenkins tmp]# sh wget_pdf.sh 
--2017-09-05 16:24:06--  http://xxxxx/14788669468643331.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2601963 (2.5M) [application/pdf]
Saving to: “14788669468643331.pdf”

100%[========================================================================================================================================================================>] 2,601,963    230K/s   in 11s     

2017-09-05 16:24:17 (224 KB/s) - “14788669468643331.pdf” saved [2601963/2601963]
...................................................... (output omitted)
--2017-09-05 16:25:21--  http://xxxxx/1481341338750833.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 152155 (149K) [application/pdf]
Saving to: “1481341338750833.pdf”

100%[========================================================================================================================================================================>] 152,155      184K/s   in 0.8s    

2017-09-05 16:25:22 (184 KB/s) - “1481341338750833.pdf” saved [152155/152155]

[root@Jenkins tmp]# ls
14788669468643331.pdf  1481187682278708.pdf  1481262534034760.pdf  1481266593232456.pdf  1481340827926207.pdf  1481340948842260.pdf  1481341049634040.pdf  1481341172815801.pdf  1481341307823881.pdf
1479035133045678.pdf   1481193562811982.pdf  1481262611307371.pdf  1481267034803389.pdf  1481340853666343.pdf  1481340973957872.pdf  1481341112979143.pdf  1481341185245978.pdf  1481341338750833.pdf
14799731544302441.pdf  1481247789582233.pdf  1481262623674903.pdf  1481270022285676.pdf  1481340897933322.pdf  1481341008561312.pdf  1481341130545646.pdf  1481341216517700.pdf  50pdf.log
14799944743125144.pdf  1481262178457017.pdf  1481262846773279.pdf  1481286012498927.pdf  1481340922434822.pdf  1481341008584230.pdf  1481341134346522.pdf  1481341229730723.pdf  download.log
1481034002739896.pdf   1481262229905206.pdf  1481265452669335.pdf  1481340787767089.pdf  1481340927135663.pdf  1481341022043499.pdf  1481341148759269.pdf  1481341244148718.pdf  wget_pdf.sh
1481095290513785.pdf   1481262241457479.pdf  1481265807661321.pdf  1481340826599027.pdf  1481340943094250.pdf  1481341045655154.pdf  1481341159027852.pdf  1481341261314587.pdf

While the download is running, open another window and check whether it is the same wget process:

[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11780 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14788669468643331.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11784 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/1479035133045678.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11784 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/1479035133045678.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11791 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799731544302441.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11791 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799731544302441.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11798 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799944743125144.pdf
[root@Jenkins ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11846 11778  0 16:25 pts/1    00:00:00 wget http://xxxxx/1481341307823881.pdf
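
One thing a per-URL loop does allow is handling each download separately. The variant below is a sketch of that idea, not the original script: it reads the list line by line and records every URL whose download failed so that it can be retried later. The names fetch_each, failed.log and the FETCH variable are hypothetical; FETCH only exists so that a different command can be substituted for wget.

```shell
#!/usr/bin/env bash
# Sketch: per-URL loop that records failed downloads.
# fetch_each, failed.log and FETCH are hypothetical names.

fetch_each() {
    # $1 = URL list, $2 = file that collects URLs whose download failed
    local list_file=$1 failed_file=$2 url
    : > "$failed_file"                      # start with an empty failure list
    while IFS= read -r url; do
        [ -n "$url" ] || continue           # ignore blank lines
        # One process per URL, as in Option 2; wget is the default command.
        # stdin is detached so the command cannot consume the URL list.
        if ! "${FETCH:-wget}" "$url" < /dev/null; then
            printf '%s\n' "$url" >> "$failed_file"
        fi
    done < "$list_file"
}

# Usage:
# fetch_each /root/tmp/download.log failed.log
```

This keeps the cost described for Option 2 (one wget process per file); it only shows what the loop form can add in exchange.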

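Summary:

1. Option 1 uses a single wget process for all downloads and ends with a report of how many files were downloaded, their total size, and so on.

2. Option 2 starts a new wget process for every file, so the process start-up cost is paid each time. The output above also shows a new DNS lookup and connection for each file, whereas Option 1 reuses the existing connection.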
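
With either option, the result can be checked against the URL list. The sketch below prints every URL whose file is missing or empty in the download directory. It assumes wget saved each file under the last path component of its URL, which holds for the plain .pdf URLs in this article; missing_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list URLs from the download list whose file is missing or empty.
# missing_files is a hypothetical helper name.

missing_files() {
    # $1 = URL list, $2 = download directory
    local list_file=$1 dir=$2 url name
    while IFS= read -r url; do
        [ -n "$url" ] || continue           # ignore blank lines
        name=${url##*/}                     # last path component of the URL
        # -s is true only if the file exists and is not empty
        [ -s "$dir/$name" ] || printf '%s\n' "$url"
    done < "$list_file"
}

# Usage:
# missing_files /root/tmp/download.log /root/tmp
```

No output means every file in the list is present and non-empty.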
