How to retrieve scraped data using asyncio

Mangs · Published 2022-08-24 22:24:39

Answer a question

I am a noob trying to scrape a list of URLs and search each page for a word, using asynchronous programming in Python. My code is as follows:

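import asyncio

import aiohttp
from bs4 import BeautifulSoup as bsoup  # bsoup is assumed to be BeautifulSoup

async def fetch(session, url):
    # Download the page body as text
    async with session.get(url) as response:
        return await response.text()

def parse(wd, html, url):
    # Collect every sentence containing wd, paired with its source url
    add_soup = bsoup(html, 'html.parser')
    res = []
    for para in add_soup.find_all("p"):
        para_txt = para.text
        for sent_txt in para_txt.split("."):
            if wd in sent_txt:
                res.append([sent_txt, url])
    return res

async def scrape_urls(wd, urls):
    # Share one session; gather runs all fetches concurrently
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch_and_parse(wd, session, url) for url in urls)
        )

async def fetch_and_parse(wd, session, url):
    html = await fetch(session, url)  # fetch takes (session, url) only
    loop = asyncio.get_running_loop()
    # parse is blocking, so run it in the default thread pool with all three arguments
    paras = await loop.run_in_executor(None, parse, wd, html, url)
    return paras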

I adapted the code above from this link, but I am unclear how to proceed to retrieve the resulting list.

I am trying to get the results with co = scrape_urls("agriculture", urls). As expected, I get a coroutine object. How do I get the actual results out of the coroutine object?

Answers

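Not entirely sure what issue you're facing. Calling scrape_urls(...) only creates a coroutine object; nothing runs until an event loop executes it. Hand that object to an event loop and you get back whatever gather returned: a list with one entry per URL, in the same order as urls.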

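loop = asyncio.get_event_loop()
# Calling scrape_urls only builds the coroutine; nothing has run yet
group = scrape_urls("agriculture", urls)
# Run the coroutine to completion and collect what gather returned
results = loop.run_until_complete(group)
loop.close()
print(results)  # one list of [sentence, url] pairs per URL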
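
On recent Python versions, asyncio.run is the usual shortcut for the get_event_loop / run_until_complete / close sequence. The sketch below is only an illustration under stated assumptions: fake_fetch_and_parse is a hypothetical stand-in for the aiohttp-based fetch_and_parse in the question (so the example runs without network access), run_scrape is a made-up helper name, and the example.com URLs are placeholders. It also flattens the per-URL lists that gather returns into one list, which may or may not be the shape you want.

```python
import asyncio

# Hypothetical stand-in for the question's fetch_and_parse: it pretends to
# download `url` and returns a list of [sentence, url] pairs.
async def fake_fetch_and_parse(wd, url):
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return [[f"a sentence about {wd}", url]]

async def scrape_urls(wd, urls):
    # gather returns one result per URL, in the same order as `urls`
    return await asyncio.gather(*(fake_fetch_and_parse(wd, u) for u in urls))

def run_scrape(wd, urls):
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards
    per_url = asyncio.run(scrape_urls(wd, urls))
    # Flatten the list of per-URL lists into one list of [sentence, url] pairs
    return [hit for hits in per_url for hit in hits]

results = run_scrape(
    "agriculture",
    ["https://example.com/a", "https://example.com/b"],  # placeholder URLs
)
print(results)
```

Note that asyncio.run must be called from ordinary synchronous code; inside an already running loop (for example in a Jupyter notebook) you would await scrape_urls(...) directly instead.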
