Scraping data from individual results to build a list of providers
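Question: Scraping data from individual results to build a list of providers
I'm trying to get the addresses of all therapists for a given ZIP code. I want to enter a ZIP code and get a list of results, then go into each individual result and scrape the provider's address.
I'm new to Python. I've been trying with requests and BeautifulSoup. Would Selenium perhaps be a better fit?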
import requests
from bs4 import BeautifulSoup
url = 'https://www.psychologytoday.com/us/therapists/60148'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
page = requests.get(url, headers=headers)
print(page.content.decode())
soup = BeautifulSoup(page.text, 'html.parser')
myclass = soup.find_all(class_='result-row normal-result row')
print(myclass)
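I'm stuck now and not sure how to proceed. P.S. I'm taking a Python course as we speak, so please be kind.
Answer
Try this; it collects the addresses of all therapists listed for the given ZIP code.
Note, however, that it only returns the addresses from the first page of results. If you want the addresses from every page, you should use Selenium, which should solve that problem.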
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

url = 'https://www.psychologytoday.com/us/therapists/60148'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

# Fetch the first page of results for the ZIP code.
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
result = soup.find(class_='results-column')

addressArray = []
for tag in result:
    # Skip text nodes; only element children can be result rows.
    if not isinstance(tag, Tag):
        continue
    _class = tag.get("class")
    if _class is None or "row" not in _class:
        continue
    # The "view" button holds the link to the provider's own page.
    link = tag.find(class_='result-actions').find('a', href=True)
    _href = link['href']
    # One extra request per provider to reach the full address.
    address_link = requests.get(_href, headers=headers)
    soup1 = BeautifulSoup(address_link.text, 'html.parser')
    address = soup1.find(class_='address').find(class_="location-address-phone")
    # Drop blank lines and join the remaining lines with commas.
    text = ''
    for data in address.text.strip().split('\n'):
        if not data.strip():
            continue
        if not text:
            text = data.strip()
        else:
            text = text + "," + data.strip()
    if text:
        addressArray.append(text)

print(addressArray)
Output:
['Lia Reynolds, LCSW,Lombard, Illinois 60148,(630) 343-5819', 'Clarity Counseling and Wellness, LLC,477 Butterfield Road,#202,Lombard, Illinois 60148,(630) 656-9713', '450 East 22nd St.,Suite 172,Lombard, Illinois 60148,(773) 599-3959', '10 E 22nd Street,Suite 217,Lombard, Illinois 60148,(630) 517-9505', 'Ron Ahlberg & Associates,477 E Butterfield Rd,Suite 310,Lombard, Illinois 60148,(630) 451-8653', 'Health Transitions Counseling,477 Butterfield Road,Suite 310,Lombard, Illinois 60148,(630) 785-6642', 'Way Beyond Counseling and Coaching,477 E Butterfield Road,Floor 3 - Wellness Center - Office 7,Lombard, Illinois 60148,Call Mr. Larry Westenberg,(630) 556-8484', 'Chicago Area Behavioral Health Services,150 W St Charles Road,Lombard, Illinois 60148,Call Augustus Edeh. Chicago Area Behavioral Health Services,(630) 599-8032', 'Adult Children Center, Ltd,2 East 22nd Street,Suite 302,Lombard, Illinois 60148,(630) 387-9750', 'Midwest Center for Hope & Healing, Ltd.,1165 S Westmore-meyers Rd,Lombard, Illinois 60148,(630) 765-5355', 'Madrigal Consulting and Counseling, LLP,450 E. 22nd Street,Suite 150,Lombard, Illinois 60148,Call Cesar Madrigal,(630) 413-9942', '477 E Butterfield Rd,Suite 202,Lombard, Illinois 60148,(630) 560-6920', 'Lombard,Lombard, Illinois 60148,(630) 796-7904', 'Dupage Clinical Counseling Services,450 E 22nd St,150,Lombard, Illinois 60148,(630) 313-4990', '2200 S Main St,Suite 316,Lombard, Illinois 60148,(630) 426-7819', 'Institute for Motivational Development,10 E 22nd Street, Suite 217,Lombard, Illinois 60148,(309) 723-8170', 'Michele DeCanio Counseling Services,2200 S. Main Street,Suite 305,Lombard, Illinois 60148,(630) 560-6926', 'A New Day Counseling Center,450 E 22nd St,Suite 150,Lombard, Illinois 60148,(630) 748-8261', '477 E Butterfield Rd,Suite 310,Lombard, Illinois 60148,(630) 426-6878', 'Bricolage Wellness,477 Butterfield Road,Suite 202,Lombard, Illinois 60148,(630) 426-7823']
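Each entry in the output above is the text of one address block with blank lines dropped and the remaining lines joined by commas. As a minimal sketch, that step could be pulled into a small helper. The name `join_address_lines` and the sample text are hypothetical and only illustrate the shape of the data; they are not taken from the site.

```python
def join_address_lines(raw_text):
    """Drop blank lines, strip each remaining line, and join with commas."""
    parts = [line.strip() for line in raw_text.split('\n')]
    return ','.join(part for part in parts if part)


# Hypothetical text, shaped like what address.text might contain.
sample = '\n  477 Butterfield Road\n\n  Suite 202\n  Lombard, Illinois 60148\n'
print(join_address_lines(sample))
```

Inside the loop, this would replace the inner `for` with `text = join_address_lines(address.text)`.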
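Here, 'result-actions' is the class of the action ("view") button that opens a new page, so one more request is needed to get the full address.
"location-address-phone" is the class on that new address page from which the address is scraped.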
Documentation links:
https://selenium-python.readthedocs.io/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
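Returning to the two class lookups in the answer's code: `find()` returns `None` when nothing matches, so a chained `.find(...).find(...)` raises `AttributeError` if the page layout differs. The sketch below guards both lookups and runs on inline HTML so that it needs no network access. The markup, the example.com URL, and the helper names `profile_links` and `address_text` are assumptions for illustration; only the class names come from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the answer.
RESULT_HTML = '''
<div class="results-column">
  <div class="result-row normal-result row">
    <div class="result-actions"><a href="https://example.com/profile/1">View</a></div>
  </div>
  <div class="result-row normal-result row">
    <div class="result-actions"></div>
  </div>
</div>
'''

DETAIL_HTML = '''
<div class="address">
  <div class="location-address-phone">
    <span>10 E 22nd Street</span><br>
    <span>Suite 217</span>
  </div>
</div>
'''


def profile_links(html):
    """Return the href of the first link inside each 'result-actions' element."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for actions in soup.find_all(class_='result-actions'):
        anchor = actions.find('a', href=True)
        if anchor is not None:  # skip rows without a usable link
            links.append(anchor['href'])
    return links


def address_text(html):
    """Return the comma-joined address text, or None if the block is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.find(class_='address')
    if container is None:
        return None
    block = container.find(class_='location-address-phone')
    if block is None:
        return None
    # strip=True drops whitespace-only strings and trims the rest.
    return block.get_text(',', strip=True)


print(profile_links(RESULT_HTML))
print(address_text(DETAIL_HTML))
```

In the real loop, each URL from `profile_links` would be fetched with `requests.get` and the response text passed to `address_text`, skipping any provider for which it returns `None`.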