Beautiful Soup Web Scraper on Binance Announcement Page Lags behind by 5 minutes

Mangs

8人浏览 · 2022-08-25 07:39:03

Mangs · 2022-08-25 07:39:03 发布

Answer a question

I have built a web scraper using bs4, where the purpose is to get notifications when a new announcement is posted. At the moment I am testing this with the word 'list' instead of all announcement keywords. For some reason when I compare the time it determines a new announcement has been posted versus the actual time it was posted in the website, the time is off by 5 minutes give or take.

This is actual time it was announced

from bs4 import BeautifulSoup
from requests import get
import time
import sys

x = True
while x == True:
    time.sleep(30)
    # Data for the scraping
    url = "https://www.binance.com/en/support/announcement"
    response = get(url)
    html_page = response.content
    soup = BeautifulSoup(html_page, 'html.parser')
    news_list = soup.find_all(class_ = 'css-qinc3w')

    # Create a bag of key words for getting matches
    key_words = ['list', 'token sale', 'open trading', 'opens trading', 'perpetual', 'defi', 'uniswap', 'airdrop', 'adds', 'updates', 'enabled', 'trade', 'support']

    # Empty list
    updated_list = []

    for news in news_list:
        article_text = news.text

        if ("list" in article_text.lower()):
            updated_list.append([article_text])

        if len(updated_list) > 4:
            print(time.asctime( time.localtime(time.time()) ))
            print(article_text)
            sys.exit()

The Response when the length of the list increased by 1 to 5 resulted in printing the following time, and new announcement: Fri May 28 04:17:39 2021, Binance Will List Livepeer (LPT)

I am unsure why this is. At first I thought I was being throttled, but looking again at robot.txt, I didn't see any reason why I should be. Moreover I included a sleep time of 30 seconds which should be more than enough to web scrape without any issues. Any help or an alternative solution would be much appreciated.

My question is:

Why is it 5 minutes behind? Why does it not notify me once the website posts it? The program takes 5 minutes longer to recognize there is a new post in comparison to the time it is posted on the website.

Answers

from xrzz import http ## give it try using my simple scratch module
import json

url = "https://www.binance.com/bapi/composite/v1/public/cms/article/list/query?type=1&pageNo=1&pageSize=30"

req = http("GET", url=url, tls=True).body().decode()

key_words = ['list', 'token sale', 'open trading', 'opens trading', 'perpetual', 'defi', 'uniswap', 'airdrop', 'adds', 'updates', 'enabled', 'trade', 'support']

for i in json.loads(req)['data']['catalogs']:
    for o in i['articles']:
        if key_words[0] in o['title']:
            print(o['title'])

Ouput:

Result

Python

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐

求助！为什么用InsCode部署会出现无限重定向？

Python

如何重塑熊猫。系列

问题:如何重塑熊猫。系列在我看来,它就像 pandas.Series 中的一个错误。 a = pd.Series([1,2,3,4]) b = a.reshape(2,2) b b 有类型 Series 但无法显示,最后一条语句给出异常,非常冗长,最后一行是“TypeError: %d format: a number is required, not numpy.ndarray”。 b.sha

Python

在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制]

问题:在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制] 我刚刚在这里](https://keras.io/initializers/)中阅读了有关[中的 Keras 权重初始化器的信息。在文档中,只介绍了不同的初始化程序。如: model.add(Dense(64, kernel_initializer='random_normal')) 当我没有指定kernel_initia