Notes on crawler tools

lpliner · published 2019-05-29 16:41:58

Summary of crawler tools

python

scrapy

General-purpose crawling framework. https://scrapy-chs.readthedocs.io/zh_CN/0.24/intro/tutorial.html

portia

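A visual crawler built on scrapy. Its goal is to let people who cannot program write crawlers easily; the finished project can be exported as a scrapy project.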
https://support.scrapinghub.com/support/solutions/articles/22000200442-using-portia-the-complete-beginner-s-guide
https://portia.readthedocs.io/en/2.0-docs/
https://app.scrapinghub.com/

scrapyd

Used to deploy scrapy crawlers.
https://scrapyd.readthedocs.io/en/stable/
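Once a project has been deployed, scrapyd exposes a JSON HTTP API, and a crawl can be started by POSTing to its `schedule.json` endpoint. The sketch below only uses the standard library. The host, port (6800 is scrapyd's default), project name and spider name are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider):
    """Build the POST request that asks scrapyd to run a spider."""
    data = urlencode({"project": project, "spider": spider}).encode("utf-8")
    return Request(f"{base_url.rstrip('/')}/schedule.json", data=data, method="POST")


def schedule_spider(base_url, project, spider):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(base_url, project, spider)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; schedule_spider() does.
req = build_schedule_request("http://localhost:6800", "myproject", "title_spider")
```

Calling `schedule_spider("http://localhost:6800", "myproject", "title_spider")` against a running scrapyd instance should return a JSON object describing the scheduled job.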

scrapydweb

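A web system for managing multiple scrapyd instances.
Because a scrapyd instance only deploys to a single machine, scrapydweb provides distributed deployment across multiple scrapyd instances.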
https://github.com/my8100/scrapydweb

pyspider

A powerful crawling system.

requests

Third-party HTTP library, installed with pip (it is not part of the Python standard library).

aiohttp

Asynchronous HTTP library built on asyncio.
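The asyncio pattern that aiohttp-based crawlers typically rely on is to run many fetches concurrently while capping how many are in flight. The sketch below shows only that pattern: `fetch` is a stand-in stub that sleeps instead of making a network call, so the example runs offline. In real code the stub would be replaced by an actual HTTP request, for instance through an aiohttp client session. All names and URLs here are hypothetical.

```python
import asyncio


async def fetch(url):
    """Stub for a real HTTP request; sleeps briefly and returns fake content."""
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def fetch_all(urls, limit=2):
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
pages = asyncio.run(fetch_all(urls))
```

The semaphore keeps a crawler from opening too many connections to one site at once, which matters more than raw speed for most crawling jobs.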

ruia

Asynchronous crawling framework.

python-goose

newspaper

News article crawling and extraction.

crawley

botflow

grep

arsenic

requestium

pyppeteer

vibora

asks

cola

darksand/sasila

There are too many tools, and I have not used many of them. They are recorded here because they may be useful later.

nodejs

headless-chrome-crawler

golang

colly
