BeautifulSoup: get only the string directly inside a tag
Answer a question
I have HTML parsed with BeautifulSoup in this format:
<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>
How do I extract just the string "text here is important", leaving out the h3 and p elements but keeping the text of the bold tag in the output?
Thanks a ton
Answers
You can use tag.decompose() to remove the unwanted tags and then extract the remaining text.
from bs4 import BeautifulSoup
spam = """<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>"""
soup = BeautifulSoup(spam, 'html.parser')
div = soup.find('div')
for tag in ('h3', 'p'):
    div.find(tag).decompose()
# Collapse runs of whitespace left around the <b> tag into single spaces.
print(' '.join(div.text.split()))
output
"text here is important"
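Note that decompose() removes the h3 and p elements from the parsed tree for good. If the rest of the document is still needed afterwards, a non-destructive alternative is to walk the div's direct children and keep only the bare strings plus the tags you want. The following is a minimal sketch, not the answer's method: the helper name direct_text and its keep parameter are hypothetical, chosen here for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

html = """<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>"""


def direct_text(tag, keep=("b",)):
    """Join the tag's direct strings and the text of child tags named in `keep`."""
    parts = []
    for child in tag.children:
        if isinstance(child, NavigableString):
            # A string sitting directly inside the tag.
            parts.append(str(child))
        elif isinstance(child, Tag) and child.name in keep:
            # A child tag we want to keep, such as <b>.
            parts.append(child.get_text())
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join("".join(parts).split())


soup = BeautifulSoup(html, "html.parser")
result = direct_text(soup.find("div"))
print(result)
```

The soup is left unchanged, so soup.find('h3') still returns the heading after the call.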