Answer a question

I am trying to scrape a website. I have tried using two methods but both do not provide me with the full website source code that I am looking for. I am trying to scrape the news titles from the website URL provided below.

URL: "https://www.todayonline.com/"

These are the two methods I have tried but failed.

Method 1: Beautiful Soup

tdy_url = "https://www.todayonline.com/"
page = requests.get(tdy_url).text
soup = BeautifulSoup(page)
soup  # Returns me a HTML with javascript text
soup.find_all('h3')

### Returns me empty list []

Method 2: Selenium + BeautifulSoup

tdy_url = "https://www.todayonline.com/"

options = Options()
options.headless = True

driver = webdriver.Chrome("chromedriver",options=options)

driver.get(tdy_url)
time.sleep(10)
html = driver.page_source

soup = BeautifulSoup(html)
soup.find_all('h3')

### Returns me only less than 1/4 of the 'h3' tags found in the original page source 

Please help. I have tried scraping other news websites and it is so much easier. Thank you.

Answers

You can access data via API (check out the Network tab): enter image description here


For example,

import requests
url = "https://www.todayonline.com/api/v3/news_feed/7"
data = requests.get(url).json()
Logo

学AI,认准AI Studio!GPU算力,限时免费领,邀请好友解锁更多惊喜福利 >>>

更多推荐