BeautifulSoup: iterating through pages, but the scraped data is duplicated on every loop
Answer a question
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import DataFrame
import numpy as np
import time

headers = {"Accept-Language": "en-US,en;q=0.5"}
symbol = []
name = []
asset_class = []
pages = np.arange(1, 5)
list=[]
for page in pages:
    print("https://etfdb.com/screener/#page=" + str(page)) # to watch progress
    page = requests.get("https://etfdb.com/screener/#page=" + str(page),
                        headers=headers)
    time.sleep(3) #in seconds
    soup = BeautifulSoup(page.text, 'html.parser') # Parse the HTML as a string
    table = soup.find('table', class_ = "table table-bordered table-hover table-striped mm-mobile-table")
    for etf in table.find_all('tbody'):
        rows = etf.find_all('tr')
        for row in rows:
            Cell = row.get_text().rstrip()
            Cell = Cell.replace("\n\n\n", "\n").replace("\n\n", "\n")
            list.append(Cell.splitlines())
df = DataFrame(list, columns=['Empty','Symbol', 'ETF Name',
                              "Previous Closing Price","Total Assets ($MM)",
                              "Avg. Daily Share Volume (3mo)","YTD Price Change","Asset Class"])
del df['Empty']
print(df)
df.to_csv('etfs.csv')
I added a sleep timer between pages, but that didn't help. Rows come back for every page, but they are the same rows each time. I checked that the page numbers work correctly in the browser.
Answers
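A quick check that may explain the repeated rows: everything after `#` in a URL is a fragment, which stays on the client and is not part of what an HTTP client sends to the server. The sketch below uses only the standard library; `request_target` is a hypothetical helper name introduced for illustration, not something from the question.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query, if any) that is sent to the server.

    The fragment (the part after '#') is deliberately left out, because
    HTTP clients do not transmit it.
    """
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


# The same URLs the question builds for pages 1 to 4
urls = ["https://etfdb.com/screener/#page=" + str(page) for page in range(1, 5)]

# Collect the distinct request targets across all four URLs
targets = {request_target(u) for u in urls}
print(targets)  # → {'/screener/'}
```

All four URLs collapse to the same request target, so each loop iteration asks the server for the same document.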
The data is loaded from an external source via JavaScript, so it is not in the HTML that requests.get returns. You can use this example to load the data:
import json
import requests

params = {"page": 1, "only": ["meta", "data"]}
url = "https://etfdb.com/api/screener/"

page = 1
while True:
    print("Getting page {}...".format(page))
    params["page"] = page
    data = requests.post(url, json=params).json()

    # uncomment to see all data:
    # print(json.dumps(data, indent=4))

    if not data["data"]:
        break

    # print some data:
    for d in data["data"]:
        print("{:<60} {}".format(d["mobile_title"], d["price"]))

    page += 1
Prints:
Getting page 1...
SPY - SPDR S&P 500 ETF $408.52
IVV - iShares Core S&P 500 ETF $410.01
VTI - Vanguard Total Stock Market ETF $212.85
VOO - Vanguard S&P 500 ETF $375.55
QQQ - Invesco QQQ $335.08
VEA - Vanguard FTSE Developed Markets ETF $50.44
IEFA - iShares Core MSCI EAFE ETF $74.02
AGG - iShares Core U.S. Aggregate Bond ETF $114.31
IEMG - iShares Core MSCI Emerging Markets ETF $65.30
VWO - Vanguard FTSE Emerging Markets ETF $52.65
VTV - Vanguard Value ETF $132.85
VUG - Vanguard Growth ETF $270.14
BND - Vanguard Total Bond Market ETF $85.06
IJR - iShares Core S&P Small-Cap ETF $109.39
IWM - iShares Russell 2000 ETF $222.56
IWF - iShares Russell 1000 Growth ETF $254.60
IJH - iShares Core S&P Mid-Cap ETF $265.11
GLD - SPDR Gold Trust $164.51
VIG - Vanguard Dividend Appreciation ETF $149.85
EFA - iShares MSCI EAFE ETF $77.75
IWD - iShares Russell 1000 Value ETF $154.08
VO - Vanguard Mid-Cap Index ETF $226.92
VB - Vanguard Small Cap ETF $217.44
VXUS - Vanguard Total International Stock ETF $64.13
VGT - Vanguard Information Technology ETF $378.80
Getting page 2...
VCIT - Vanguard Intermediate-Term Corporate Bond ETF $93.55
BNDX - Vanguard Total International Bond ETF $57.28
XLK - Technology Select Sector SPDR Fund $140.42
LQD - iShares iBoxx $ Investment Grade Corporate Bond ETF $131.06
ONEQ - Fidelity NASDAQ Composite Index Track $53.72
XLF - Financial Select Sector SPDR Fund $34.85
...and so on.
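Since the question ends by writing a CSV, here is a minimal sketch of how the same paging loop could collect rows instead of printing them. It relies only on the two keys used above (`mobile_title` and `price`) and assumes the title always has the "SYMBOL - Name" shape seen in the output. `collect_rows`, `write_csv`, `fake_fetch` and the `max_pages` cap are hypothetical names introduced here, and `fake_fetch` is an offline stand-in for the real API call.

```python
import csv
import io


def collect_rows(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page has no data.

    fetch_page is any callable that takes a page number and returns a
    dict with a "data" list, like the JSON in the answer above.
    """
    rows = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        if not data["data"]:
            break
        for d in data["data"]:
            # Assumption: titles look like "SPY - SPDR S&P 500 ETF"
            symbol, _, name = d["mobile_title"].partition(" - ")
            rows.append((symbol, name, d["price"]))
    return rows


def write_csv(rows, out):
    """Write a header line followed by the collected rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["Symbol", "ETF Name", "Price"])
    writer.writerows(rows)


def fake_fetch(page):
    """Offline stand-in for the API, reusing two lines from the output above."""
    pages = {
        1: [{"mobile_title": "SPY - SPDR S&P 500 ETF", "price": "$408.52"}],
        2: [{"mobile_title": "VCIT - Vanguard Intermediate-Term Corporate Bond ETF",
             "price": "$93.55"}],
    }
    return {"data": pages.get(page, [])}


buffer = io.StringIO()
write_csv(collect_rows(fake_fetch), buffer)
print(buffer.getvalue())
```

To run it against the site, pass a function that wraps the `requests.post(url, json=params).json()` call from the answer in place of `fake_fetch`, and open `etfs.csv` with `newline=""` instead of using the in-memory buffer.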