How to deploy a Scrapy spider on Heroku cloud

Mangs · Published 2022-09-02 11:33:11

Question

I developed a few spiders in Scrapy and I want to test them on Heroku cloud. Does anybody know how to deploy a Scrapy spider on Heroku?

Answers

Yes, it's fairly simple to deploy and run your Scrapy spider on Heroku.

Here are the steps, using a real Scrapy project as an example:

Clone the project (note that it must have a requirements.txt file for Heroku to recognize it as a Python project):

git clone https://github.com/scrapinghub/testspiders.git

Add cffi to the requirements.txt file (e.g. cffi==1.1.0) and commit the change so that it is included in the push.

Create the Heroku application (this will add a new heroku git remote):

heroku create

Deploy the project (this will take a while the first time, when the slug is built):

git push heroku main

Run your spider:

heroku run scrapy crawl followall
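
If you repeat these steps often, they can be wrapped in a small helper script. The following is only a sketch: the function names deploy_commands and deploy are made up for illustration, it assumes the git and heroku command-line tools are installed and that you run it from inside the cloned project, and it does not cover the requirements.txt edit. By default it only prints the commands (dry run) instead of executing them.

```python
import shlex
import subprocess


def deploy_commands(spider_name, branch="main"):
    """Return the deployment steps from the answer as argv lists, in order."""
    return [
        ["heroku", "create"],                              # adds the heroku git remote
        ["git", "push", "heroku", branch],                 # builds the slug and deploys
        ["heroku", "run", "scrapy", "crawl", spider_name], # runs the spider once
    ]


def deploy(spider_name, branch="main", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in deploy_commands(spider_name, branch):
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run for the example spider used above; pass dry_run=False to execute.
    deploy("followall")
```

Keeping the command list separate from the code that runs it makes it easy to inspect or adjust the steps (for example a different branch name) before anything is pushed.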

Some notes:

Heroku's disk is ephemeral. If you want to store the scraped data in a persistent place, you can use an S3 feed export (by appending -o s3://mybucket/items.jl) or use an add-on (like MongoHQ or Redis To Go) and write a pipeline to store your items there.
It would be cool to run a Scrapyd server on Heroku, but it's not currently possible because the sqlite3 module (which Scrapyd requires) doesn't work on Heroku.
If you want a more sophisticated solution for deploying your Scrapy spiders, consider setting up your own Scrapyd server or using a hosted service like Scrapy Cloud.
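
As an alternative to passing -o on every run, the S3 feed export from the first note can be configured in the project's settings.py. This is a minimal sketch, assuming a Scrapy version recent enough to support the FEEDS setting and an S3 client library (boto3 or botocore, depending on the Scrapy version) listed in requirements.txt. The S3_BUCKET environment variable is a hypothetical name introduced here, and mybucket is the placeholder bucket from the note above.

```python
# settings.py (excerpt)
import os

# Credentials are read from the environment rather than hard-coded.
# On Heroku they can be set as config vars with `heroku config:set`.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

# Hypothetical variable name; falls back to the placeholder bucket.
S3_BUCKET = os.environ.get("S3_BUCKET", "mybucket")

# Write scraped items as JSON Lines to the bucket on every crawl.
FEEDS = {
    f"s3://{S3_BUCKET}/items.jl": {"format": "jsonlines"},
}
```

With this in place, heroku run scrapy crawl followall would export to S3 without extra command-line arguments, provided the credentials are set on the app.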
