Python web scraping: installing Splash and a simple example
Installing Splash
1. Install Docker (see: installing Docker on Mac)
2. Install Splash
docker pull scrapinghub/splash # pull the image
docker run -p 8050:8050 scrapinghub/splash # run the container
Test that it is reachable: http://localhost:8050/
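The reachability test above can also be scripted. The following is a minimal sketch, assuming Splash is published on port 8050 as in the `docker run` command; the helper name `splash_is_up` is hypothetical and not part of Splash itself.

```python
import urllib.request


def splash_is_up(base_url="http://localhost:8050/", timeout=3):
    """Return True if an HTTP server answers with status 200 at base_url.

    Hypothetical helper: it only checks that something responds on the
    Splash port, not that rendering actually works.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    print("Splash reachable:", splash_is_up())
```

If this prints False, it is worth checking that the container is still running and that port 8050 is mapped.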
Code example
import time

import requests
from scrapy import Selector


def timer(func):
    """Print how long the wrapped function takes, then return its result."""
    def inner(*args):
        start = time.time()
        response = func(*args)
        print("time: %s" % (time.time() - start))
        return response
    return inner


@timer
def use_request(url):
    # Plain HTTP request: JavaScript on the page is not executed.
    return requests.get(url)


@timer
def use_splash(url):
    # Ask Splash's render.html endpoint for the page after JavaScript has run.
    splash_url = "http://localhost:8050/render.html"
    args = {
        "url": url,
        "timeout": 5,
        "images": 0,  # do not download images
    }
    return requests.get(splash_url, params=args)


if __name__ == '__main__':
    url = "http://quotes.toscrape.com/js/"

    r1 = use_request(url)
    # Build the selector from the response body text.
    sel1 = Selector(text=r1.text)
    text = sel1.css(".quote .text::text").extract_first()
    print(text)

    r2 = use_splash(url)
    sel2 = Selector(text=r2.text)
    text = sel2.css(".quote .text::text").extract_first()
    print(text)
"""
time: 0.632809877396
None
time: 0.685022830963
“The world as we have created it is a process of our thinking.
It cannot be changed without changing our thinking.”
"""
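The test shows that this page has to be rendered by Splash before the data is available: the plain request returned None, while the request through Splash returned the quote text. It was also fast, taking about 0.69 s versus 0.63 s for the plain request in this run.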
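Description of the args parameters:
url: address of the page to render
timeout: timeout, in seconds
proxy: proxy
wait: time to wait for rendering, in seconds
images: whether to download images; default 1 (download)
js_source: JavaScript code to execute before the page is rendered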
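The parameters listed above can be combined into a single query string. The following is a rough sketch that only builds the request URL and makes no network call; the helper names `build_render_params` and `build_render_url` and the default values chosen here are assumptions for illustration, not part of Splash.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above; assumes Splash runs locally on port 8050.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_render_params(url, timeout=5, wait=0.5, images=0,
                        proxy=None, js_source=None):
    """Collect render.html arguments, leaving out optional ones that are unset."""
    params = {"url": url, "timeout": timeout, "wait": wait, "images": images}
    if proxy is not None:
        params["proxy"] = proxy
    if js_source is not None:
        params["js_source"] = js_source
    return params


def build_render_url(params, endpoint=SPLASH_RENDER_URL):
    """Return the full GET URL with percent-encoded query parameters."""
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    params = build_render_params(
        "http://quotes.toscrape.com/js/",
        wait=1,
        js_source="document.title = 'rendered';",  # illustrative script only
    )
    print(build_render_url(params))
```

The same dictionary can be passed directly as `params=` to `requests.get`, as in the example earlier. Keeping `wait` well below `timeout` leaves Splash enough time to finish rendering.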