I am trying to extract the href of every link inside the div elements whose class is ['address']. Each time I run the code I get only the first 5 links, even though I know there should be 9.

Web-Page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch

I have read through the threads below and altered my code countless times, including trying every parser (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. Any idea what is causing it to stop after the 5th iteration? I am still fairly new to Python, so I apologize if this is a rookie mistake that I'm overlooking. Any help would be appreciated, even the sarcastic answers :)

  • Beautiful Soup findAll doesn't find them all

  • Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds

  • BeautifulSoup fails to parse long view state

  • Beautifulsoup lost nodes

  • Missing parts on Beautiful Soup results

  • Python 64 bit not storing as long of string as 32 bit python

I used very similar code on the following pages and did not experience any issues scraping the hrefs:

  • https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator

  • https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK

My code below:

import requests
from bs4 import BeautifulSoup


local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

My results (first 5):

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680

But there should be 9:

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
  6. /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
  7. /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
  8. /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
  9. /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681

Answers

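Try using Selenium instead of requests to get the source code of the page. The missing links are most likely added by JavaScript after the initial page load; requests only downloads the raw HTML and never runs scripts, whereas Selenium drives a real browser and returns the rendered source. Here is how you do it: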

from bs4 import BeautifulSoup
from selenium import webdriver

# Open the page in a real browser so that its scripts run
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

# Grab the rendered HTML, then close the browser window
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
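
If the page is still loading when page_source is read, some links may be missing even with Selenium. One possible safeguard is an explicit wait before reading the source. The sketch below is only an illustration: the function name fetch_rendered_html, the selector "div.address a" (guessed from the class used in the question), and the 10 second timeout are all assumptions, not part of the original answer. It waits only until at least one matching link is present, so it does not guarantee that all 9 have arrived.

```python
def fetch_rendered_html(url, css_selector="div.address a", timeout=10):
    """Return the page source once an element matching css_selector exists.

    Hypothetical helper: the selector and timeout defaults are assumptions.
    """
    # Imported here so the function can be defined without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element is in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

The returned string can be passed to BeautifulSoup in place of driver.page_source.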

The parsing loop is the same as in the question. Here is the full code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

Output:

/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
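
As a side note, the string comparison on the class attribute could probably be replaced with a CSS selector, and the relative hrefs turned into absolute URLs with urljoin. The sketch below runs on a small hand-written HTML sample rather than the live page, so SAMPLE_HTML, BASE_URL and the address_links helper are illustrative assumptions. Note that "div.address" matches any div whose class list includes address, which is slightly broader than the exact comparison used above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed site root, used to resolve the relative hrefs.
BASE_URL = "https://www.walgreens.com"

# Hand-written stand-in for the rendered page (not the real markup).
SAMPLE_HTML = """
<div class="address"><a href="/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092">Store</a></div>
<div class="address"><a href="/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681">Store</a></div>
<div class="phone"><a href="/ignored">Not an address link</a></div>
"""


def address_links(html, base_url=BASE_URL):
    """Return absolute URLs for every link inside a div with class 'address'."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector keeps only <a> tags that have an href and sit inside div.address.
    return [urljoin(base_url, a["href"]) for a in soup.select("div.address a[href]")]


links = address_links(SAMPLE_HTML)
for link in links:
    print(link)
```

With the real page, the rendered source from Selenium would be passed to address_links instead of SAMPLE_HTML.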