Loop through URLs using BeautifulSoup, and either RegEx or Lambda, to do matching?
I am trying to loop through a few URLs and scrape out one specific class. I believe it's called:
<div class="Fw(b) Fl(end)--m Fz(s) C($primaryColor)" data-reactid="192">Overvalued</div>
Here is the URL:
https://finance.yahoo.com/quote/goog
Here is the data that I want for GOOG.
Near Fair Value
I believe this will require some kind of Lambda function or RegEx. I tried to do this without using these methodologies, but I couldn't get it working. Here is the code that I am testing.
import requests
from bs4 import BeautifulSoup
import re
mylink = "https://finance.yahoo.com/quote/"
mylist = ['SBUX', 'MSFT', 'GOOG']
mystocks = []
html = requests.get(mylink).text
soup = BeautifulSoup(html, "lxml")
#details = soup.findAll("div", {"class" : lambda L: L and L.startswith('Fw(b) Fl(end)--m')})
details = soup.findAll('div', {'class' : re.compile('Fw(b)*')})
for item in mylist:
    for r in details:
        mystocks.append(item + ' - ' + details)
print(mystocks)
After the code runs, I would like to see something like this.
GOOG - Near Fair Value
SBUX - Near Fair Value
MSFT - Overvalued
The problem is that a pattern like 'Fw(b)*' pulls back too much data, while a longer one like 'Fw(b) Fl(end)--m Fz(s)' returns nothing. How can I get the results I showed above?
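The two symptoms described above are consistent with the parentheses being read as regex groups rather than literal characters. Here is a minimal sketch using only the re module on the class string from the question (closing parenthesis restored); the "Fw(500) Mb(8px)" string is a made-up unrelated class for comparison, and how BeautifulSoup feeds a multi-valued class attribute to the pattern is not covered here.

```python
import re

# Class attribute copied from the question.
class_attr = "Fw(b) Fl(end)--m Fz(s) C($primaryColor)"
# Hypothetical unrelated class string, used only for comparison.
other_attr = "Fw(500) Mb(8px)"

# Unescaped: "(b)" is a group, so this means "Fw" followed by zero or more "b".
# Any string that merely contains "Fw" matches, hence too much data.
loose = re.compile('Fw(b)*')
loose_hits_target = bool(loose.search(class_attr))
loose_hits_other = bool(loose.search(other_attr))

# Unescaped longer pattern: the groups drop the literal parentheses, so it
# looks for the text "Fwb Flend--m Fzs", which never occurs, hence nothing.
broken = re.compile('Fw(b) Fl(end)--m Fz(s)')
broken_hits_target = bool(broken.search(class_attr))

# Escaped: re.escape makes every metacharacter literal.
strict = re.compile(re.escape('Fw(b) Fl(end)--m Fz(s)'))
strict_hits_target = bool(strict.search(class_attr))
strict_hits_other = bool(strict.search(other_attr))

print(loose_hits_target, loose_hits_other)    # True True
print(broken_hits_target)                     # False
print(strict_hits_target, strict_hits_other)  # True False
```
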
Answers
No need for regex; a CSS selector is enough. The key is to send the correct HTTP header, User-Agent.
For example:
import requests
from bs4 import BeautifulSoup
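# (ticker, quote page URL) pairs to scrape
urls = [('GOOG', 'https://finance.yahoo.com/quote/goog'),
        ('SBUX', 'https://finance.yahoo.com/quote/sbux'),
        ('MSFT', 'https://finance.yahoo.com/quote/msft')]
# Send a browser-like User-Agent with every request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
for q, url in urls:
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    # Select the div immediately following a div whose text contains "XX.XX".
    # Newer soupsieve releases spell this pseudo-class :-soup-contains();
    # :contains() is the older, deprecated name.
    value = soup.select_one('div:contains("XX.XX") + div').text
    print('{:<10}{}'.format(q, value))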
Prints:
GOOG      Near Fair Value
SBUX      Near Fair Value
MSFT      Overvalued
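The (ticker, URL) pairs in the answer are written out by hand, whereas the question's code requested the bare base URL without ever appending a ticker. As a rough sketch, the pairs could instead be generated from the question's ticker list; build_quote_urls is a hypothetical helper name, and lowercasing the ticker simply mirrors the URLs used in the answer.

```python
BASE_URL = "https://finance.yahoo.com/quote/"


def build_quote_urls(tickers, base_url=BASE_URL):
    """Return (ticker, url) pairs, one per ticker symbol.

    Hypothetical helper: it only concatenates strings and makes no request.
    """
    return [(ticker, base_url + ticker.lower()) for ticker in tickers]


# Ticker list taken from the question.
urls = build_quote_urls(['SBUX', 'MSFT', 'GOOG'])
for ticker, url in urls:
    print(ticker, url)
```

The resulting list has the same shape as the urls list in the answer, so it could be passed to the same loop unchanged.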