Loop through URLs using BeautifulSoup, and either RegEx or Lambda, to do matching?

Mangs · Published 2022-08-24 21:02:25

Answer a question

I am trying to loop through a few URLs and scrape the text of one specific element, identified by its class. I believe it is this one:

<div class="Fw(b) Fl(end)--m Fz(s) C($primaryColor" data-reactid="192">Overvalued</div>

Here is the URL:

https://finance.yahoo.com/quote/goog

Here is the data that I want for GOOG.

Near Fair Value

I believe this will require some kind of lambda function or regex. I tried to do it without either, but I couldn't get it working. Here is the code that I am testing.

import requests
from bs4 import BeautifulSoup
import re

mylink = "https://finance.yahoo.com/quote/"
mylist = ['SBUX', 'MSFT', 'GOOG']
mystocks = []

html = requests.get(mylink).text
soup = BeautifulSoup(html, "lxml")

#details = soup.findAll("div", {"class" : lambda L: L and L.startswith('Fw(b) Fl(end)--m')})

details = soup.findAll('div', {'class' : re.compile('Fw(b)*')})
for item in mylist:
    for r in details:
        mystocks.append(item + ' - ' + details)

print(mystocks)

After the code runs, I would like to see something like this.

GOOG - Near Fair Value
SBUX - Near Fair Value
MSFT - Overvalued

The problem is that a pattern like 'Fw(b)*' pulls back too much data, while a longer pattern like 'Fw(b) Fl(end)--m Fz(s)' returns nothing. How can I get the results shown above?

Answers

There is no need for regex; a CSS selector is enough. The key is to send the correct HTTP header, User-Agent, with each request.

For example:

import requests
from bs4 import BeautifulSoup

urls = [('GOOG', 'https://finance.yahoo.com/quote/goog'),
        ('SBUX', 'https://finance.yahoo.com/quote/sbux'),
        ('MSFT', 'https://finance.yahoo.com/quote/msft')]

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

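for q, url in urls:
    # Send the browser-like User-Agent header with every request
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    # The valuation label is the <div> immediately after the one containing "XX.XX".
    # Note: newer soupsieve releases deprecate :contains() in favor of :-soup-contains().
    value = soup.select_one('div:contains("XX.XX") + div').text

    # Left-align the ticker in a 10-character column
    print('{:<10}{}'.format(q, value))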

Prints:

GOOG      Near Fair Value
SBUX      Near Fair Value
MSFT      Overvalued
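
As a side note on why the two regex attempts in the question behaved as they did: parentheses are grouping metacharacters in a regex, so 'Fw(b)*' means "Fw followed by zero or more b", which matches any class containing "Fw". The longer pattern, read as a regex, looks for the literal text "Fwb Flend--m Fzs", which never occurs. The sketch below illustrates this with the standard re module only. It assumes the pattern is applied with search(), and the `other` class string and the `matches` helper are made up for illustration.

```python
import re

# Class string copied as-is from the element shown in the question.
target = 'Fw(b) Fl(end)--m Fz(s) C($primaryColor'
# Hypothetical class string of some unrelated element (an assumption).
other = 'Fw(500) Mb(4px)'

# Unescaped: "(b)*" is a group repeated zero or more times.
loose = re.compile('Fw(b)*')
# Unescaped: the parentheses vanish, so this looks for "Fwb Flend--m Fzs".
literal_attempt = re.compile('Fw(b) Fl(end)--m Fz(s)')
# Escaped: re.escape() turns every metacharacter into a literal.
escaped = re.compile(re.escape('Fw(b) Fl(end)--m Fz(s)'))


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


print(matches(loose, target), matches(loose, other))      # both match: too broad
print(matches(literal_attempt, target))                   # no match at all
print(matches(escaped, target), matches(escaped, other))  # only the target matches
```

Escaping the pattern explains the symptoms, but the CSS selector above is still the simpler route, since these generated class names can change.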
