How can I get text without specific tags in BeautifulSoup?

Mangs · Posted 2022-08-24 16:57:54

Answer a question

I have some HTML like this:

from bs4 import BeautifulSoup
html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

Now I want to separate this into:

head = ["A", "B", "C"]
contents = ["Explanation 1", "Explanation 2", "Explanation 3"]

I can extract the heads with:

head = [i.get_text() for i in soup.select("span.head")]

but I couldn't figure out how to extract the contents.

Answers

Unfortunately, my hanzi is not what it should be, but this is what I get:

targets = soup.select('span.head')
heads = []
entries = []
for target in targets:
    heads.append(target.text)
    # The text node that directly follows the head span
    entry = [target.next_sibling]
    # If the node after that is a styled span, its text belongs to the entry too
    following = target.next_sibling.next_sibling
    if following is not None and following.has_attr('style'):
        entry.append(following.text)
    entries.append(''.join(entry).strip().replace('\n\t', ''))
print(heads)
print(entries)

Output:

['東', '菄', '鶇']
['春方也〾說文曰動...爲人', '東風菜義見上注俗加艹', '鶇鵍鳥名美形出廣雅亦作？']

Is that correct?
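
For the simplified HTML in the question, a more general variant might walk every sibling after each head until the next head appears, instead of looking only one or two nodes ahead. This is a minimal sketch, not the answerer's code: the helper name `is_head` is made up for illustration, and it assumes all heads and their contents sit at the same nesting level.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '''<span class="head">A</span>Explanation <span style="color: red;">1</span><span class="head">B</span>Explanation 2<span class="head">C</span>Explanation <span style="color: red;">3</span>'''
soup = BeautifulSoup(html, "html.parser")


def is_head(node):
    # Hypothetical helper: true only for tags carrying class "head"
    return isinstance(node, Tag) and "head" in (node.get("class") or [])


heads = []
contents = []
for head in soup.select("span.head"):
    heads.append(head.get_text())
    parts = []
    # Collect everything between this head and the next one
    for sibling in head.next_siblings:
        if is_head(sibling):
            break
        # Tags contribute their inner text; text nodes are used as-is
        parts.append(sibling.get_text() if isinstance(sibling, Tag) else str(sibling))
    contents.append("".join(parts).strip())

print(heads)
print(contents)
```

Because the loop stops at the next head rather than at a fixed offset, it should also cope with entries that contain several nested spans or none at all.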
