Answer a question

I have a BeautifulSoup object built from HTML in this format:

<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>

How do I extract just the string "text here is important", leaving out the h3 and p elements but keeping the text of the bold tag in the output?

Thanks a ton

Answers

You can call decompose() on each unwanted tag to remove it, together with its contents, from the tree, and then extract the remaining text:

from bs4 import BeautifulSoup
spam = """<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>"""

soup = BeautifulSoup(spam, 'html.parser')
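div = soup.find('div', class_='text')
# Remove every h3 and p inside the div, along with their contents.
# find_all() also copes with repeated or missing tags, where
# div.find(tag).decompose() would raise AttributeError on a missing tag.
for tag in div.find_all(['h3', 'p']):
    tag.decompose()
# The <b> tag is untouched, so its text stays in the output.
print(div.text.strip())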

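Output (the quotes are literal characters in the HTML, and the double spaces come from the whitespace around the b tag):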

"text here  is  important"
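
If you would rather not modify the tree, or you want single spaces in the result, the following is a minimal sketch of an alternative. The helper name text_without and its unwanted parameter are my own invention, not part of BeautifulSoup. It walks the direct children of the div, skips the unwanted tags, and collapses whitespace with split() and join(). It assumes the unwanted tags are direct children of the div and that the div contains no HTML comments.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<div class='text'>
<h3> text </h3>
<p> some more text </p>
"text here <b> is </b> important"
</div>"""


def text_without(container, unwanted=("h3", "p")):
    """Return the container's text, skipping unwanted direct child tags.

    Hypothetical helper: unlike decompose(), it leaves the tree unchanged.
    """
    parts = []
    for child in container.children:
        if isinstance(child, Tag):
            if child.name in unwanted:
                continue  # skip the h3 and p elements entirely
            parts.append(child.get_text())  # e.g. the text inside <b>
        else:
            parts.append(str(child))  # bare text directly inside the div
    # split() with no arguments breaks on any whitespace run, so joining
    # with a single space removes newlines and doubled spaces.
    return " ".join("".join(parts).split())


soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="text")
result = text_without(div)
print(result)  # "text here is important" (quote characters included)
```

The quote characters are part of the text in the HTML, so they remain in the result; result.strip('"') drops them if they are not wanted.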