
I am trying to loop through a few URLs and scrape the text of one specific element. I believe this is the element:

<div class="Fw(b) Fl(end)--m Fz(s) C($primaryColor)" data-reactid="192">Overvalued</div>

Here is the URL:

https://finance.yahoo.com/quote/goog

Here is the data that I want for GOOG.

Near Fair Value

I believe this will require some kind of lambda function or regex. I tried to do it without either of those, but I couldn't get it working. Here is the code that I am testing.

import re

import requests
from bs4 import BeautifulSoup

mylink = "https://finance.yahoo.com/quote/"
mylist = ['SBUX', 'MSFT', 'GOOG']
mystocks = []

for item in mylist:
    # Fetch each ticker's own quote page, not the bare /quote/ URL
    html = requests.get(mylink + item).text
    soup = BeautifulSoup(html, "lxml")

    #details = soup.find_all("div", {"class" : lambda L: L and L.startswith('Fw(b) Fl(end)--m')})

    # Returns far too many divs: this pattern matches any class containing "Fw"
    details = soup.find_all('div', {'class': re.compile('Fw(b)*')})
    for r in details:
        # Append the element's text, not the whole result list
        mystocks.append(item + ' - ' + r.text)

print(mystocks)

Here is a screen shot: [image not available]

After the code runs, I would like to see something like this.

GOOG - Near Fair Value
SBUX - Near Fair Value
MSFT - Overvalued

The problem is that if I use a pattern like 'Fw(b)*', I get far too much data back. If I make it more specific, like 'Fw(b) Fl(end)--m Fz(s)', I get nothing back. How can I get the results shown above?
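Part of the problem is regex syntax: in a regular expression, ( and ) are grouping characters, not literal parentheses. BeautifulSoup applies a compiled pattern with its search() method, so the two patterns above do not mean what they look like. A quick check with Python's re module, using the class string quoted above plus a hypothetical second class string ("Fw(b) Fz(s)", assumed for illustration), shows why one pattern over-matches, the other matches nothing, and how re.escape() makes the parentheses literal:

```python
import re

# Full class attribute of the target div, as quoted in the question.
target = "Fw(b) Fl(end)--m Fz(s) C($primaryColor)"
# Hypothetical class attribute of some other bold div on the same page.
other = "Fw(b) Fz(s)"

# "(b)" is a group, so this is just "Fw" followed by zero or more "b": matches both.
loose = re.compile("Fw(b)*")
# Really means the literal text "Fwb Flend--m Fzs", which never occurs: matches neither.
unescaped = re.compile("Fw(b) Fl(end)--m Fz(s)")
# re.escape() turns the parentheses into literals: matches only the target.
escaped = re.compile(re.escape("Fw(b) Fl(end)--m Fz(s)"))

for name, pattern in [("loose", loose), ("unescaped", unescaped), ("escaped", escaped)]:
    print(name, bool(pattern.search(target)), bool(pattern.search(other)))
# loose True True
# unescaped False False
# escaped True False
```

Escaping fixes the pattern itself, but a long chain of auto-generated class names is still a brittle thing to match on, which is why the answer below uses a different selector.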

Answers

There is no need for a regex; a CSS selector is enough. The key is to send a proper User-Agent HTTP header with each request.

For example:

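import requests
from bs4 import BeautifulSoup

urls = [('GOOG', 'https://finance.yahoo.com/quote/goog'),
        ('SBUX', 'https://finance.yahoo.com/quote/sbux'),
        ('MSFT', 'https://finance.yahoo.com/quote/msft')]

# Send a browser-like User-Agent; without it the server may not return the full page
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

for q, url in urls:
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    # Select the <div> immediately after the <div> whose text contains "XX.XX".
    # ":-soup-contains" is the soupsieve 2.1+ name for the deprecated ":contains".
    value = soup.select_one('div:-soup-contains("XX.XX") + div')

    print('{:<10}{}'.format(q, value.text if value else 'not found'))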

Prints:

GOOG      Near Fair Value
SBUX      Near Fair Value
MSFT      Overvalued
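To see what the selector does without hitting the network, here is a sketch against a small hypothetical HTML snippet that mimics the layout the selector assumes: a div holding "XX.XX" followed by a sibling div holding the label. The markup is invented for illustration, not copied from Yahoo, and `:-soup-contains` requires soupsieve 2.1 or later:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: a value div followed by its label div.
html = """
<div>
  <div>XX.XX</div>
  <div>Near Fair Value</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The outer <div> also "contains" XX.XX, but it has no following sibling <div>,
# so the only element matched by "... + div" is the label div.
label = soup.select_one('div:-soup-contains("XX.XX") + div')
print(label.get_text(strip=True))  # Near Fair Value
```

Anchoring on nearby text rather than on generated class names tends to survive minor styling changes, but it still breaks if the page stops rendering that text.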