How can I get text without specific tags in BeautifulSoup?
Answer a question
I have some HTML like this:
from bs4 import BeautifulSoup
html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''
soup = BeautifulSoup(html, "html.parser")
Now I want to split this into two lists:
head = ["A", "B", "C"]
contents = ["Explanation 1", "Explanation 2", "Explanation 3"]
I can extract the heads with
head = [i.get_text() for i in soup.select("span.head")]
but I can't figure out how to extract the contents.
Answers
Unfortunately, my hanzi is not what it should be, but this is what I get:
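from bs4 import Tag

targets = soup.select('span.head')
heads = []
entries = []
for target in targets:
    entry = []
    heads.append(target.text)
    # The text node immediately after the head span.
    entry.append(target.next_sibling)
    # A styled span may follow the text node; guard against it being absent.
    following = target.next_sibling.next_sibling
    if isinstance(following, Tag) and following.has_attr('style'):
        entry.append(following.text)
    entries.append(''.join(entry).strip().replace('\n\t', ''))
print(heads)
print(entries)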
Output:
['東', '菄', '鶇']
['春方也〾說文曰動...爲人', '東風菜義見上注俗加艹', '鶇鵍鳥名美形出廣雅亦作?']
Is that correct?
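If the content after a head can span more than two nodes, a more general variant is to walk every sibling after each head and stop at the next head. The following is only a sketch under that assumption, run against the sample HTML from the question. The function name split_heads_and_contents is made up for illustration, and "html.parser" is chosen so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup, Tag

html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''


def split_heads_and_contents(markup):
    """Return (heads, contents); hypothetical helper, not part of BeautifulSoup."""
    soup = BeautifulSoup(markup, "html.parser")
    heads, contents = [], []
    for head in soup.select("span.head"):
        heads.append(head.get_text())
        parts = []
        # Collect every node after this head until the next span.head.
        for sibling in head.next_siblings:
            if isinstance(sibling, Tag):
                if "head" in sibling.get("class", []):
                    break
                parts.append(sibling.get_text())
            else:
                # Plain text node (NavigableString).
                parts.append(str(sibling))
        contents.append("".join(parts).strip())
    return heads, contents


heads, contents = split_heads_and_contents(html)
print(heads)
print(contents)
```

For the question's sample this gives ['A', 'B', 'C'] and ['Explanation 1', 'Explanation 2', 'Explanation 3']. It assumes the heads and their contents are siblings under the same parent; nested markup would need a different traversal.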