BeautifulSoup: How do I extract the text child element with no tag?
I have the following HTML to parse, but I am having trouble extracting only the name.
<div class="profile-heading--desktop">
<h1>
<span class="profile-heading__rank">
#1
</span>
Jeff Bezos
</h1>
<div class="profile-subheading">
CEO and Founder, Amazon
</div>
</div>
I am having trouble extracting the text for the name, because the rank is extracted along with it. I want to exclude the rank from the name on line 2 of the following output:
#1
#1 Jeff Bezos
CEO and Founder, Amazon
The code is as follows:
import requests
from bs4 import BeautifulSoup
URL = "https://www.forbes.com/profile/jeff-bezos/?list=forbes-400"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# Rank
rank = soup.find("span", class_="profile-heading__rank")
print(rank.text)
# Name
name = soup.find("div", class_="profile-heading--desktop").find("span").parent
print(name.text)
# Role
role = soup.find("div", class_="profile-subheading")
print(role.text)
Answers
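The name is a bare text node that sits right after the rank <span> inside the <h1>, so calling .text on the <h1> returns both the rank and the name. You can select just that text node with the .find_next_sibling() method and text=True (since Beautiful Soup 4.4.0 the same argument can also be written string=True, which is the preferred spelling in recent releases):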
import requests
from bs4 import BeautifulSoup
URL = "https://www.forbes.com/profile/jeff-bezos/?list=forbes-400"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# Rank
rank = soup.find("span", class_="profile-heading__rank")
print(rank.text)
# Name
name = rank.find_next_sibling(text=True)  # <-- change: the text node right after the rank <span>
print(name)  # <-- name is already a string, so .text is not necessary
# Role
role = soup.find("div", class_="profile-subheading")
print(role.text)
Prints:
#1
Jeff Bezos
CEO and Founder, Amazon
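To try this without hitting the live page, here is a minimal offline sketch that parses the HTML snippet from the question directly. The HTML constant and the variable names are only for illustration, and it assumes Beautiful Soup 4.4.0 or later so that string=True is accepted. Because the snippet has line breaks around each value, the text is stripped before printing. The last lines show a possible alternative that collects only the direct text children of the <h1>.

```python
from bs4 import BeautifulSoup

# HTML copied from the question (illustrative constant, not fetched from the site).
HTML = """
<div class="profile-heading--desktop">
<h1>
<span class="profile-heading__rank">
#1
</span>
Jeff Bezos
</h1>
<div class="profile-subheading">
CEO and Founder, Amazon
</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Rank: the text inside the <span>.
rank_tag = soup.find("span", class_="profile-heading__rank")
rank = rank_tag.get_text(strip=True)

# Name: the bare text node that follows the <span> inside the <h1>.
# string=True matches text nodes only (older releases spell it text=True).
name = rank_tag.find_next_sibling(string=True).strip()

# Role: the text inside the subheading <div>.
role = soup.find("div", class_="profile-subheading").get_text(strip=True)

print(rank)
print(name)
print(role)

# Alternative: take only the text nodes that are direct children of the <h1>,
# skipping anything nested inside child tags such as the rank <span>.
h1 = rank_tag.parent
direct_text = " ".join(
    s.strip() for s in h1.find_all(string=True, recursive=False) if s.strip()
)
print(direct_text)
```

The sibling approach depends on the name coming immediately after the rank <span>. The direct-children variant does not depend on that order, but it would also pick up any other loose text placed directly in the <h1>.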