Scraping Web Pages with Splash in Python
First, start Splash:
sudo docker run -p 8050:8050 scrapinghub/splash
Python code (.py file):
import requests
from urllib.parse import quote
from requests import ConnectionError

# Lua script run by Splash: open Baidu, type a query, click search, take a screenshot
lua = '''
function main(splash)
    splash:go("https://www.baidu.com")
    local input = splash:select("#kw")
    input:send_text("Python")
    local submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
'''

# URL-encode the Lua script and append it to the execute endpoint URL
url = "http://localhost:8050/execute?lua_source=" + quote(lua)
try:
    # Send the request to Splash
    response = requests.get(url)
    print(response.status_code)
    # Write the returned bytes to a file
    with open('baidu.jpg', 'wb') as f:
        f.write(response.content)
except ConnectionError as e:
    print(e)
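Here, lua is a script written in the Lua language, and execute in the URL is one of the endpoints of the Splash HTTP API. The script is passed to it through the lua_source query parameter.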
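The URL-building step can also be sketched on its own. The helper names build_execute_url and decode_lua_source below are hypothetical and not part of Splash or requests; the sketch assumes Splash is listening on localhost:8050 as in the docker command above, and only uses the standard library to show that the encoded script survives a round trip.

```python
from urllib.parse import quote, urlencode, parse_qs, urlsplit


def build_execute_url(lua_source, base="http://localhost:8050"):
    """Build a GET URL for the execute endpoint (hypothetical helper).

    urlencode with quote_via=quote percent-encodes spaces as %20 and
    newlines as %0A, so the multi-line Lua script fits in a query string.
    """
    return base + "/execute?" + urlencode({"lua_source": lua_source}, quote_via=quote)


def decode_lua_source(url):
    """Recover the Lua script from a URL built above (for checking only)."""
    return parse_qs(urlsplit(url).query)["lua_source"][0]


script = 'function main(splash)\n    return "ok"\nend\n'
url = build_execute_url(script)
# The decoded parameter should be identical to the original script
print(decode_lua_source(url) == script)
```

Building the query with urlencode instead of string concatenation makes it easier to add further parameters later without hand-encoding each one.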
Result: the screenshot returned by Splash is saved as baidu.jpg.
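The script above writes whatever Splash returns into baidu.jpg, so a failed request could leave an error message in the file instead of an image. A minimal sketch of a guard, assuming the response body is available as bytes: it relies on the fact that JPEG data begins with the bytes FF D8 FF. The names looks_like_jpeg and save_if_jpeg are hypothetical helpers, not part of any library.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG data starts with these three bytes


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def save_if_jpeg(data, path):
    """Write data to path only when it looks like a JPEG (hypothetical helper).

    Returns True if the file was written, False otherwise.
    """
    if not looks_like_jpeg(data):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True
```

In the script above this would be called as save_if_jpeg(response.content, 'baidu.jpg') in place of the unconditional write.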