Answer a question

I need to scrape this website:

https://sec.report/Ticker/AAPL

I need to get the CIK number, 0000320193.

When I call soup.prettify(), the output just says the page requires JavaScript. I also don't want to open a web browser, because this needs to be automated.

I need to use the Python Beautiful Soup and requests libraries.

Answers

To get the correct response from the server, set a browser-like User-Agent HTTP header:

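import requests
from bs4 import BeautifulSoup

url = 'https://sec.report/Ticker/AAPL'
# Browser-like User-Agent, so the server returns the real page rather than the JavaScript notice
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status

soup = BeautifulSoup(response.content, 'html.parser')
print(soup.h2.text) # or print(soup.h2.text.split()[-1]) for "0000320193"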

Prints:

SEC CIK 0000320193
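
The split()[-1] variant relies on the CIK being the last word of the heading. If the heading text ever carries extra words, a pattern match is a little more forgiving. The following is only a sketch: extract_cik and CIK_PATTERN are hypothetical names, not part of the original answer, and it assumes the CIK always appears as a zero-padded 10-digit number, as it does in the output above.

```python
import re

# Hypothetical helper, not from the original answer.
# Assumption: the CIK is always a zero-padded run of exactly 10 digits.
CIK_PATTERN = re.compile(r"\b(\d{10})\b")


def extract_cik(heading_text):
    """Return the first 10-digit run in heading_text, or None if there is none."""
    match = CIK_PATTERN.search(heading_text)
    return match.group(1) if match else None


print(extract_cik("SEC CIK 0000320193"))  # 0000320193
print(extract_cik("no identifier here"))  # None
```

With the soup object from the answer, this would be called as extract_cik(soup.h2.text). Returning None rather than raising lets the caller decide what to do when the page layout changes.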