Answer a question

sentences.find_all(['p','h2'],attrs={['class':None,'class':Not None]}).

This syntax is invalid, but is there an alternative? I want p tags matching one attribute condition and h2 tags matching another, and I need them together in document order rather than as two separate result sets, i.e. I don't want to do:

  1. sentences.find_all('p', attrs={'class': None})   # p tags without a class
  2. sentences.find_all('h2', attrs={'class': True})  # h2 tags with any class

Answers

You can use a CSS selector list, separating the individual selectors with a comma (see the CSS reference):

from bs4 import BeautifulSoup

html_doc = """
<p class="cls1">Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for tag in soup.select("p.cls1, h2.cls3"):
    print(tag)

Prints:

<p class="cls1">Select this</p>
<h2 class="cls3">Select this</h2>
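
Since the question asks for the tags "sequentially", here is a minimal sketch suggesting that a comma-separated selector returns matches in document order rather than grouped by selector. The interleaved HTML below is made up for illustration and is not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical document with headings and paragraphs interleaved.
html_doc = """
<h2 class="cls3">First heading</h2>
<p class="cls1">First paragraph</p>
<h2 class="cls3">Second heading</h2>
<p class="cls1">Second paragraph</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "p.cls1" is listed first in the selector, but the h2 that appears
# first in the document still comes first in the result.
texts = [tag.get_text() for tag in soup.select("p.cls1, h2.cls3")]

for text in texts:
    print(text)
```

The order of the selectors inside the string should not affect the order of the results.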

EDIT: To select multiple tags where one of them must have no attributes:

from bs4 import BeautifulSoup

html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for tag in soup.select("p, h2.cls3"):
    if tag.name == "p" and len(tag.attrs) != 0:
        continue
    print(tag)

Prints:

<p>Select this</p>
<h2 class="cls3">Select this</h2>
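
As a possible alternative that maps more directly onto the question (p without a class, h2 with any class), the condition can be written as a filter function passed to find_all, or as a selector using :not([class]). This is only a sketch: the function name wanted and the sample HTML are made up for illustration, and unlike the loop above it checks only the class attribute, not every attribute.

```python
from bs4 import BeautifulSoup

html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2>Don't select this</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def wanted(tag):
    """Hypothetical filter: p without a class, or h2 with a class."""
    if tag.name == "p":
        return not tag.has_attr("class")
    if tag.name == "h2":
        return tag.has_attr("class")
    return False


# find_all accepts a function and walks the tree once, in document order.
by_function = [str(tag) for tag in soup.find_all(wanted)]

# The same condition expressed as a single CSS selector list.
by_selector = [str(tag) for tag in soup.select("p:not([class]), h2[class]")]

for item in by_function:
    print(item)
```

Both approaches make a single pass over the tree, so the p and h2 matches come back interleaved in the order they appear in the document.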