Answer a question

I have some HTML like this:

from bs4 import BeautifulSoup
html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''
soup = BeautifulSoup(html, "html.parser")

Now I want to split this into two lists:

head = ["A", "B", "C"]
contents = ["Explanation 1", "Explanation 2", "Explanation 3"]

I can extract the heads with

head = [i.get_text() for i in soup.select("span.head")]

but I couldn't figure out how to extract the contents.

Answers

Unfortunately, my hanzi is not what it should be, but this is what I get:

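from bs4 import Tag

targets = soup.select('span.head')
heads = []
entries = []
for target in targets:
    entry = []
    heads.append(target.text)
    # The text node that directly follows the head span
    # (this assumes every head is followed by one)
    text_node = target.next_sibling
    entry.append(text_node)
    # An optional styled span may follow that text node; the isinstance
    # check guards against None and plain strings, which lack has_attr()
    following = text_node.next_sibling
    if isinstance(following, Tag) and following.has_attr('style'):
        entry.append(following.text)
    entries.append(''.join(entry).strip().replace('\n\t', ''))
print(heads)
print(entries)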

Output:

['東', '菄', '鶇']
['春方也〾說文曰動...爲人', '東風菜義見上注俗加艹', '鶇鵍鳥名美形出廣雅亦作?']

Is that correct?
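
If the content after a head can span more than a text node plus one styled span, a more general variant is to walk `next_siblings` and stop at the next `span.head`. The following is only a sketch under that assumption, run against the sample HTML from the question rather than the real page; the function name `split_heads_and_contents` is made up for illustration.

```python
from bs4 import BeautifulSoup, Tag


def split_heads_and_contents(html):
    """Return (heads, contents) for markup shaped like the question's sample."""
    soup = BeautifulSoup(html, "html.parser")
    heads, contents = [], []
    for head in soup.select("span.head"):
        heads.append(head.get_text())
        parts = []
        # Collect every following sibling until the next span.head
        for sibling in head.next_siblings:
            if isinstance(sibling, Tag) and "head" in sibling.get("class", []):
                break
            # Tags contribute their text; plain strings are used as-is
            parts.append(sibling.get_text() if isinstance(sibling, Tag) else str(sibling))
        contents.append("".join(parts).strip())
    return heads, contents


html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''
heads, contents = split_heads_and_contents(html)
print(heads)     # ['A', 'B', 'C']
print(contents)  # ['Explanation 1', 'Explanation 2', 'Explanation 3']
```

This relies on the heads and their explanations being siblings at the same level, as they are in the sample. If the real page nests them differently, the stopping condition would need adjusting.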
