BeautifulSoup find_* usage and examples



Kangvcar Blogs · Published 2017-09-25 15:35:17

Usage of the BeautifulSoup find_* functions

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2017-09-24 17:27:33
# @Author  : kangvcar (kangvcar@126.com)
# @Link    : http://www.github.com/kangvcar/
# @Version : $Id$

from bs4 import BeautifulSoup
import re

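html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""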
soup = BeautifulSoup(html, "lxml")

#################################
### find_all() returns a list ###
#################################

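## Find all <b> tags in the document; returns a list
# print(soup.find_all('b'))

## Find all <a> tags in the document; returns a list
# for i in soup.find_all('a'):
#     print(i)

## Find all tags whose name starts with "b" (here <body> and <b>); returns a list
# print(soup.find_all(re.compile("^b")))
# for tag in soup.find_all(re.compile("^b")):
#     print(tag)

## Find all <a> tags and all <b> tags; returns a list
# print(soup.find_all(['a', 'b']))

## True matches every tag but no text strings; returns a list
# for tag in soup.find_all(True):
#     print(tag.name)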

## Pass a function: it should return True if the current element matches, and False otherwise
# def has_class_but_no_id(tag):
#     return tag.has_attr('class') and not tag.has_attr('id')
# print(soup.find_all(has_class_but_no_id))
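
The name filters listed so far (string, regular expression, list, True, function) can be compared side by side. What follows is a minimal sketch rather than the author's script: the markup is an assumed, shortened version of the three-sisters snippet, and it uses the built-in "html.parser" instead of lxml so that it runs without extra dependencies.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")


def has_class_but_no_id(tag):
    """Match tags that define a class attribute but no id attribute."""
    return tag.has_attr("class") and not tag.has_attr("id")


# Collect only the tag names so the results are easy to compare.
by_string = [t.name for t in soup.find_all("b")]
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]
by_list = [t.name for t in soup.find_all(["a", "b"])]
by_true = [t.name for t in soup.find_all(True)]
by_function = [t.name for t in soup.find_all(has_class_but_no_id)]

print(by_string)    # ['b']
print(by_regex)     # ['body', 'b']
print(by_list)      # ['b', 'a', 'a', 'a']
print(by_function)  # ['p', 'p']
```

Results come back in document order, which is why <body> precedes <b> in the regular-expression case.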

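## Match tags whose id is "link2"
# print(soup.find_all(id='link2'))

## Beautiful Soup checks each tag's href attribute against the pattern
# print(soup.find_all(href=re.compile("elsie")))

## Several keyword arguments can be combined to filter on multiple attributes at once
# print(soup.find_all(href=re.compile("elsie"), id='link1'))

## class is a reserved word in Python, so use class_ (with a trailing underscore) instead
# print(soup.find_all("a", class_="sister"))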
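
The attribute filters can be tried out in the same way. This is a sketch on assumed sample markup (not the original html string), parsed with "html.parser"; it also shows that combined keyword filters must all match, so a contradictory pair returns an empty list.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Filter on a single attribute value.
by_id = [a["id"] for a in soup.find_all(id="link2")]

# Filter on an attribute with a regular expression.
by_href = [a["id"] for a in soup.find_all(href=re.compile("elsie"))]

# Keyword filters are combined with AND.
both = [a["id"] for a in soup.find_all(href=re.compile("elsie"), id="link1")]
contradictory = soup.find_all(href=re.compile("elsie"), id="link2")

# class_ stands in for the reserved word class.
by_class = [a["id"] for a in soup.find_all("a", class_="sister")]

print(by_id)               # ['link2']
print(by_href)             # ['link1']
print(both)                # ['link1']
print(len(contradictory))  # 0
print(by_class)            # ['link1', 'link2', 'link3']
```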

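## The text argument searches the string content of the document
## (newer Beautiful Soup releases call this argument string)
## Like name, text accepts a string, a regular expression, a list, or True
# print(soup.find_all(text="Elsie"))
# print(soup.find_all(text=["Tillie", "Elsie", "Lacie"]))
# print(soup.find_all(text=re.compile("Dormouse")))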
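
String searches return text nodes rather than tags, which is easy to miss. The sketch below uses assumed sample markup and the string= spelling of the argument, which assumes Beautiful Soup 4.4.0 or later; with older releases, text= as used in the script is the equivalent.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on a text node.
exact = soup.find_all(string="Elsie")

# A list matches any of its items; results follow document order.
any_of = soup.find_all(string=["Tillie", "Elsie", "Lacie"])

# A regular expression is searched within each text node.
pattern = soup.find_all(string=re.compile("Dormouse"))

# Combined with a tag name, the result is the tag whose text matches.
tags = [a["id"] for a in soup.find_all("a", string="Elsie")]

print(exact)         # ['Elsie']
print(any_of)        # ['Elsie', 'Lacie', 'Tillie']
print(len(pattern))  # 2 (the <title> text and the <b> text)
print(tags)          # ['link1']
```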

## Once the number of results reaches limit, the search stops and the results are returned
# print(soup.find_all("a", limit=2))

## Calling find_all() on a tag searches all of its descendants;
## to search only the tag's direct children, pass recursive=False
# print(soup.html.find_all("title"))
# print(soup.html.find_all("title", recursive=False))
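
The effect of limit and recursive is easiest to see by counting results. A minimal sketch on assumed sample markup, parsed with "html.parser":

```python
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# limit stops the search after the first two matches.
first_two = [a["id"] for a in soup.find_all("a", limit=2)]

# <title> is a descendant of <html>, but its direct parent is <head>.
title_anywhere = soup.html.find_all("title")
title_direct_child = soup.html.find_all("title", recursive=False)
title_under_head = soup.head.find_all("title", recursive=False)

print(first_two)                # ['link1', 'link2']
print(len(title_anywhere))      # 1
print(len(title_direct_child))  # 0
print(len(title_under_head))    # 1
```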

## find_all() always returns a list, even when only one element matches
## find() returns the first match directly (not a list), or None if nothing matches
# print(soup.find('a'))
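
The practical difference between find() and find_all() shows up when nothing matches. A short sketch on assumed sample markup; the <table> lookup is a hypothetical example of a tag that is absent.

```python
from bs4 import BeautifulSoup

# Assumed sample markup.
html = """<html><body><p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p></body></html>"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                 # a single Tag
as_list = soup.find_all("a", limit=1)  # a list holding that Tag

missing = soup.find("table")           # None when nothing matches
missing_list = soup.find_all("table")  # empty list when nothing matches

# Guard against None before touching attributes of a find() result.
missing_id = missing["id"] if missing is not None else None

print(first["id"])        # link1
print(len(as_list))       # 1
print(missing)            # None
print(len(missing_list))  # 0
```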

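## find_parents() and find_parent() search the ancestors of the current node;
## find_parents() returns a list, find_parent() returns only the nearest match
# print(soup.head.title.find_parents())
# print(soup.head.title.find_parent())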
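
Ancestor searches accept the same filters as find_all(). A sketch on assumed sample markup, starting from one of the links rather than from the title:

```python
from bs4 import BeautifulSoup

# Assumed sample markup.
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a", id="link2")

# Nearest ancestor that is a <p> tag.
paragraph = link.find_parent("p")

# All ancestors matching the filter, nearest first.
ancestors = [t.name for t in link.find_parents(["p", "body"])]

# No matching ancestor gives None.
no_table = link.find_parent("table")

# Without arguments, find_parent() returns the direct parent.
title_parent = soup.head.title.find_parent()

print(paragraph["class"])  # ['story']
print(ancestors)           # ['p', 'body']
print(no_table)            # None
print(title_parent.name)   # head
```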


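## find_next_siblings() returns all matching siblings that come after the current node, as a list
## find_next_sibling() returns only the first matching sibling after the current node (not a list)
# print(soup.body.p.find_next_siblings())
# print(soup.body.p.find_next_sibling())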

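## find_previous_siblings() returns all matching siblings that come before the current node, as a list
## find_previous_sibling() returns only the first matching sibling before the current node (not a list)
# print(soup.body.find_previous_siblings())
# print(soup.body.find_previous_sibling())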
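
Sibling searches only look at nodes that share the same parent, and the "previous" variants return the nearest sibling first. A sketch on assumed sample markup; a tag name is passed to each call so that only tags of that name are matched.

```python
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", id="link1")
last_link = soup.find("a", id="link3")

# Siblings after the first link, in document order.
after_first = [a["id"] for a in first_link.find_next_siblings("a")]

# Siblings before the last link, nearest first.
before_last = [a["id"] for a in last_link.find_previous_siblings("a")]

# Single-result variants return a Tag or None.
nearest_before = last_link.find_previous_sibling("a")
nothing_after = last_link.find_next_sibling("a")

# The two <p> tags are siblings under <body>.
next_paragraph = soup.find("p", class_="title").find_next_sibling("p")

print(after_first)              # ['link2', 'link3']
print(before_last)              # ['link2', 'link1']
print(nearest_before["id"])     # link2
print(nothing_after)            # None
print(next_paragraph["class"])  # ['story']
```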

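## find_all_next() returns all matching nodes that come after the current node in the document, as a list
## find_next() returns only the first matching node after the current node (not a list)
# print(soup.head.find_all_next())
# print(soup.head.find_next())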

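## find_all_previous() returns all matching nodes that come before the current node in the document, as a list
## find_previous() returns only the first matching node before the current node (not a list)
# print(soup.head.title.find_all_previous())
# print(soup.head.title.find_previous())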
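
Unlike the sibling methods, find_all_next() and find_all_previous() follow document order and ignore nesting. The sketch below, on assumed sample markup, starts from the first link to show the difference.

```python
from bs4 import BeautifulSoup

# Assumed sample markup (shortened three-sisters snippet).
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
<p class="story">...</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", id="link1")

# Every <a> that appears later in the document.
later_links = [a["id"] for a in first_link.find_all_next("a")]

# The first <p> that starts after the link.
next_paragraph = first_link.find_next("p")

# Every <p> that starts before the link, nearest first. The enclosing
# <p class="story"> is included because its opening tag precedes the link.
earlier_paragraphs = [p["class"][0] for p in first_link.find_all_previous("p")]

# The nearest <title> before the link, even though it lives in <head>.
earlier_title = first_link.find_previous("title")

print(later_links)                # ['link2', 'link3']
print(next_paragraph.get_text())  # ...
print(earlier_paragraphs)         # ['story', 'title']
print(earlier_title.string)       # The Dormouse's story
```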
