Scraping all URLs from a search result page with BeautifulSoup
I'm trying to get 100 URLs from the following search result page:
https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900
Here's the test code I have:
import requests
from bs4 import BeautifulSoup

urls = []

def get_urls(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    s = soup.find('a', class_="header w-brk")
    urls.append(s)
    print(urls)

get_urls("https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900")
Unfortunately the list returns [None]. I've also tried using href=True in the soup.find or soup.find_all method, but unfortunately that doesn't work either. I can see another problem with this:
The URLs the page provides in its source are relative, for example a href="/iad/kaufen-und-verkaufen/d/fahrrad-429985104/", which is just the path part of the full willhaben.at URL. So even once I do get all of these URLs appended to my list, I won't be able to scrape them as they are; I'll have to somehow add the root URL to each one before my scraper can load it.
What is the most effective way I can solve this?
Thanks!
Answers
Check out the code below. Your find call returns None because that class lives on a wrapping div, not on the a tag itself (which is also why adding href=True didn't help): find the divs first, then grab the anchor inside each one and prepend the site root to its relative href.
import requests
from bs4 import BeautifulSoup

urls = []
url = "https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900"

def get_urls(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # Each listing link is inside a div with class "w-brk"
    divs = soup.find_all("div", {"class": "w-brk"})
    for div in divs:
        link = div.find("a")
        # The href is relative, so prepend the site root
        urls.append("https://www.willhaben.at" + link['href'])
    print(urls)

get_urls(url)
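
As a variation, urllib.parse.urljoin from the standard library can resolve the relative hrefs against the page URL instead of hard-coding the site root, and a None check guards against any wrapper div that happens to contain no anchor. This is just a sketch under the same assumption that each listing is wrapped in a div with class "w-brk":

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_urls(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    links = []
    for div in soup.find_all("div", {"class": "w-brk"}):
        a = div.find("a")
        # Skip wrappers without a usable anchor, and resolve relative
        # paths like /iad/... against the page URL
        if a is not None and a.get('href'):
            links.append(urljoin(url, a['href']))
    return links

urls = get_urls("https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/fahrraeder-radsport/fahrraeder-4552?rows=100&areaId=900")
print(len(urls))

Returning the list instead of appending to a global also makes the function easier to reuse across several search pages.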