BeautifulSoup - All href links don't appear to be extracting
Answer a question
I am trying to extract all href links that are within class ['address']. Each time I run the code, I only get the first 5 and that's it, even though I know there should be 9.
Web-Page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch
I have read through the threads listed below and altered my code countless times, including switching through all of the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. Any idea what's causing it to stop after the 5th iteration? I am still fairly new to Python, so I apologize if this is a rookie mistake that I'm overlooking. Any help would be appreciated, even the sarcastic answers :)
- Beautiful Soup findAll doesn't find them all
- Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds
- BeautifulSoup fails to parse long view state
- Beautifulsoup lost nodes
- Missing parts on Beautiful Soup results
- Python 64 bit not storing as long of string as 32 bit python
I used very similar code on the two pages below and did not have any issues scraping the hrefs: https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator and https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK
My code below:
import requests
from bs4 import BeautifulSoup

local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
My results (first 5):
- /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
- /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
- /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
- /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
- /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
But there should be 9:
- /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
- /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
- /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
- /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
- /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
- /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
- /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
- /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
- /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
Answers
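Try using Selenium instead of requests to get the source code of the page. requests only returns the HTML that the server sends initially and does not run JavaScript, so any results the page adds afterwards in the browser (most likely what is happening with the last four stores here) never reach BeautifulSoup. Here is how you do it: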
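from bs4 import BeautifulSoup
from selenium import webdriver

# Load the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
# Grab the HTML as the browser currently sees it, then close the window
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')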
The rest of the code is the same. Here is the full code:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
Output:
/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
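As a side note, the loop above finds the right divs by comparing the string form of the class list with "['address']", which would skip a div that carries a second class next to address. A CSS selector is one possible way to express the same filter more directly. The sketch below is only an illustration: the helper name extract_address_links and the sample markup are made up for this example, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup


def extract_address_links(html):
    """Return the href of every <a> nested inside a <div class="address">."""
    # html.parser ships with Python, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    # The selector matches any div whose class list contains "address",
    # even when the div carries other classes as well, and only keeps
    # <a> tags that actually have an href attribute.
    return [a["href"] for a in soup.select("div.address a[href]")]


# Made-up sample markup for illustration only (an assumption, not the real page).
SAMPLE_HTML = """
<div class="address"><a href="/locator/store-a/id=1">Store A</a></div>
<div class="address highlighted"><a href="/locator/store-b/id=2">Store B</a></div>
<div class="hours"><a href="/hours/id=1">Hours</a></div>
"""

if __name__ == "__main__":
    for href in extract_address_links(SAMPLE_HTML):
        print(href)
```

With the Selenium version, the same helper could be called as extract_address_links(driver.page_source) before the driver is closed.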