Trying to extract data under a specific div and sub div


Mangs · posted 2022-08-28 19:52:55

Question: Trying to extract data under a specific div and sub div

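I'm trying to get it to print a book title and its chapters, but only for one book and its title at a time.

So basically "The First Book of Jacob", chapters 1-7,

rather than iterating through all of the books.

Here is the page layout (the URL is in the Python code):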

<dl>
  <dt>Title</dt>
  <dd>
    <dl>
      <dt>Sub Title</dt>
    </dl>
  </dd>
  <dt>Title 2</dt>
  <dd>
    <dl>
      <dt>Sub Title 2</dt>
    </dl>
  </dd>
</dl>
<!-- this continues for Title 3, Sub Title 3, and so on -->

Here is the Python code:

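import requests
import bs4


scripture_url = 'http://scriptures.nephi.org/docbook/bom/'
response = requests.get(scripture_url)
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# 'dl dd dt' matches every <dt> nested inside any <dd>, i.e. the chapters of all books.
links = soup.select('dl dd dt')
for item in links:
    # Drop everything up to the first space and keep the rest of the title.
    title = item.get_text().split(' ', 1)[1]
    print(title)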

Here is the output:

Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Chapter 21
Chapter 22
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Chapter 21
Chapter 22
Chapter 23
Chapter 24
Chapter 25
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Chapter 32
Chapter 33
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 1
Chapter 1

Answer

You could try something like this. First, find the book you want, for example the one titled "The Book of Jacob":

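book_title = 'The Book of Jacob'
# Find the <a> whose text is exactly the book title (string= is the current name of the older text= argument).
book = soup.find('a', string=book_title)
print(book.text)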
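If the exact lookup comes back as None, stray whitespace around the link text is a common cause. Below is a minimal sketch of a more tolerant lookup using a regular expression. The sample markup is made up for illustration and is not taken from the real page.

```python
import re
import bs4

# Made-up markup: the link text is padded with whitespace, as often happens on real pages.
SAMPLE_HTML = '<dl><dt><a href="#jacob">\n  The Book of Jacob\n</a></dt></dl>'
soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')

book_title = 'The Book of Jacob'

# A plain string is compared against the tag's text exactly, whitespace included.
exact = soup.find('a', string=book_title)

# A compiled pattern lets optional whitespace surround the title.
pattern = re.compile(r'^\s*' + re.escape(book_title) + r'\s*$')
loose = soup.find('a', string=pattern)

print(exact)
print(loose.get_text(strip=True))
```

Whether this is needed depends on how the page is actually marked up, so it is worth checking book for None before going on.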

Then take the <dd> that immediately follows the book title, and find all of the corresponding chapters inside that <dd> element:

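# Walk up to the enclosing <dt>, then take the <dd> that follows it.
chapter_list = book.find_parent('dt').find_next_sibling('dd')
# The chapters are the <dt> children of the <dl> nested inside that <dd>.
links = chapter_list.select('dl > dt')
for item in links:
    title = item.get_text().split(' ', 1)[1]
    print(title)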

Output:

The Book of Jacob
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
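If you later want every book paired with its own chapters rather than a single book, the same sibling relationship can be walked once for the whole list. This is only a sketch under the same assumptions as before: made-up sample markup, a hypothetical helper named chapters_by_book, and chapter text of the form "1. Chapter 1".

```python
import bs4

# Made-up markup that mirrors the layout in the question (not the real page).
SAMPLE_HTML = """
<dl>
  <dt><a href="#jacob">The Book of Jacob</a></dt>
  <dd><dl>
    <dt>1. Chapter 1</dt>
    <dt>2. Chapter 2</dt>
  </dl></dd>
  <dt><a href="#enos">The Book of Enos</a></dt>
  <dd><dl>
    <dt>1. Chapter 1</dt>
  </dl></dd>
</dl>
"""


def chapters_by_book(soup):
    """Map each top-level book title to the list of its chapter titles."""
    toc = soup.find('dl')  # the outermost list
    result = {}
    # recursive=False keeps only the book-level <dt> elements, not the chapters.
    for book_dt in toc.find_all('dt', recursive=False):
        # Look at the very next <dt>/<dd> sibling so a book without chapters
        # does not pick up the following book's <dd>.
        nxt = book_dt.find_next_sibling(['dt', 'dd'])
        chapters = []
        if nxt is not None and nxt.name == 'dd':
            chapters = [c.get_text(strip=True).split(' ', 1)[1]
                        for c in nxt.select('dl > dt')]
        result[book_dt.get_text(strip=True)] = chapters
    return result


soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
for book, chapters in chapters_by_book(soup).items():
    print(book, chapters)
```

Printing a single book is then just a dictionary lookup on its title.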
