Python regular expressions: the findall function in the re module

Lavi_qq_2910138025 · published 2022-08-01 14:40:55

1. Introduction to re.findall

The findall() function is defined in the re module as follows:

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""

    return _compile(pattern, flags).findall(string)

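It returns every non-overlapping substring of string that matches pattern, as a list. If pattern contains a capturing group, the list holds what that group captured instead of the whole match. If pattern contains more than one group, the result is a list of tuples.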
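Here is a small sketch of the three result shapes described above, plus the "empty matches are included" rule from the docstring. The input strings and variable names are made up for illustration. The empty-match output assumes Python 3.7 or later, where an empty match may directly follow a non-empty one.

```python
import re

text = "key1=val1; key2=val2"

# No groups: each list item is the whole match
whole = re.findall(r"\w+=\w+", text)

# One group: each list item is the text captured by that group
keys = re.findall(r"(\w+)=\w+", text)

# Two groups: each list item is a tuple with one entry per group
pairs = re.findall(r"(\w+)=(\w+)", text)

# Empty matches are kept in the result (x* can match zero characters)
empties = re.findall(r"x*", "axb")

print(whole)    # ['key1=val1', 'key2=val2']
print(keys)     # ['key1', 'key2']
print(pairs)    # [('key1', 'val1'), ('key2', 'val2')]
print(empties)  # ['', 'x', '', '']
```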

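findall() can be called in two ways, and most functions in the re module support both forms. You can call it at module level, as re.findall(pattern, string). You can also call it as a method of a compiled pattern object, as re.compile(pattern).findall(string). For details, see the blog post "Python regular expressions and the re module".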

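import re

# Form 1: compile the pattern first, then call findall() on the pattern object
kk = re.compile(r'\d+')
print(kk.findall('one1two2three3four4'))
# ['1', '2', '3', '4']  (the matches are strings, not ints)

# Form 2: the module-level re.findall(); note that its first argument may also be a compiled pattern
print(re.findall(kk, "one123"))
# ['123']  (\d+ is greedy, so the three adjacent digits form a single match)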
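The sketch below shows two details of these call forms, reusing the sample string from above. First, the method form on a compiled pattern accepts optional pos and endpos arguments that limit the search region. Second, flags belong with the module-level form and a string pattern. Passing a non-zero flags value together with an already compiled pattern raises ValueError, because the flags were fixed at compile time.

```python
import re

digits = re.compile(r"\d+")
text = "one1two2three3four4"

# pos: start searching at index 7 (the "2")
from_pos = digits.findall(text, 7)

# pos/endpos: only look at text[0:8], i.e. "one1two2"
window = digits.findall(text, 0, 8)

# Flags go with the module-level form and a string pattern
words = re.findall(r"[a-z]+", "One TWO three", re.IGNORECASE)

# A compiled pattern already carries its flags; adding more is an error
try:
    re.findall(digits, text, re.IGNORECASE)
    flags_error = False
except ValueError:
    flags_error = True

print(from_pos)     # ['2', '3', '4']
print(window)       # ['1', '2']
print(words)        # ['One', 'TWO', 'three']
print(flags_error)  # True
```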

Example:

import re

s = 'aabbabaabbaa'

# "." matches any single character except \n (newline)
print(re.findall(r'a.b', s))  # ['aab', 'aab']

# "*" matches the preceding character zero or more times
print(re.findall(r'a*b', s))  # ['aab', 'b', 'ab', 'aab', 'b']

Output:

['aab', 'aab']
['aab', 'b', 'ab', 'aab', 'b']
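For comparison, this sketch runs the other two common quantifiers on the same string. "+" requires at least one a before the b, and "?" allows at most one. Matching still proceeds left to right without overlaps.

```python
import re

s = 'aabbabaabbaa'

# "+" needs the preceding character at least once, so a lone "b" no longer matches
plus = re.findall(r'a+b', s)

# "?" allows the preceding character zero or one time, so "aab" is found as "ab"
question = re.findall(r'a?b', s)

print(plus)      # ['aab', 'ab', 'aab']
print(question)  # ['ab', 'b', 'ab', 'ab', 'b']
```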

2. Capturing groups in findall

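This behavior comes from how findall reports its results, not from regular expressions as such. The whole pattern still has to match. When the pattern contains parentheses, however, only the text captured inside them is returned. Without parentheses, findall behaves as if the entire pattern were wrapped in one outer pair. In regex syntax, "()" denotes a capturing group, and each pair of parentheses is one group.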
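Because parentheses both group and capture, a group added only for repetition changes what findall returns. In this minimal sketch with a made-up input, the non-capturing form (?:...) groups without capturing, so findall goes back to returning whole matches. It also shows that a capturing group inside a repetition keeps only its last repetition.

```python
import re

text = "ab ab abab"

# Capturing group: findall returns the group, and a repeated group keeps only its last repetition
captured = re.findall(r"(ab)+", text)

# Non-capturing group: there are no groups to report, so the whole match is returned
whole = re.findall(r"(?:ab)+", text)

print(captured)  # ['ab', 'ab', 'ab']
print(whole)     # ['ab', 'ab', 'abab']
```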

import re

s = 'aabpythonbaregexa,aabpythonbacoola'

# Searching with groups
print(re.findall(r'a(.+?)a', s))
print(re.findall(r'a(.*?)a', s))
print(re.findall(r'b(.*?)b.*?a(.*?)a', s))

Nested parentheses:

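import re

string = "abcdefg  acbdgef  abcdgfe  cadbgfe"

# Nested groups: each match yields a tuple (outer group, inner group)
regex = re.compile(r"((\w+)\s+\w+)")
print(regex.findall(string))
# Output: [('abcdefg  acbdgef', 'abcdefg'), ('abcdgfe  cadbgfe', 'abcdgfe')]

# One group: only the text captured by that group is returned
regex1 = re.compile(r"(\w+)\s+\w+")
print(regex1.findall(string))
# Output: ['abcdefg', 'abcdgfe']

# No groups: the whole match is returned
regex2 = re.compile(r"\w+\s+\w+")
print(regex2.findall(string))
# Output: ['abcdefg  acbdgef', 'abcdgfe  cadbgfe']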
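Sometimes you need the whole match and the group contents together, without adding an outer group. In that case re.finditer() is an alternative, because it yields Match objects instead of strings. Here is a sketch on the same string:

```python
import re

string = "abcdefg  acbdgef  abcdgfe  cadbgfe"

results = []
for m in re.finditer(r"(\w+)\s+\w+", string):
    # group(0) is the whole match, group(1) is the first capturing group
    results.append((m.group(0), m.group(1)))

print(results)
# [('abcdefg  acbdgef', 'abcdefg'), ('abcdgfe  cadbgfe', 'abcdgfe')]
```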

3. The (.*?) pattern in re.findall

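import re

s = 'aabpythonbaregexa,aabpythonbacoola'

# Searching with groups
# (.+?): one or more characters between two a's, as few as possible
print(re.findall(r'a(.+?)a', s))
# (.*?): zero or more characters between two a's, as few as possible
print(re.findall(r'a(.*?)a', s))
# Several capturing groups: each result is a tuple
print(re.findall(r'b(.*?)b.*?a(.*?)a', s))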

Output:

['abpythonb', ',', 'bpythonb']
['', 'regex', '', 'cool']
[('python', 'regex'), ('python', 'cool')]
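The "?" after "*" or "+" makes these quantifiers non-greedy, so they match as few characters as possible. This sketch contrasts that with the greedy .* on the same string. The greedy version runs to the last a in the string, so only one match is found.

```python
import re

s = 'aabpythonbaregexa,aabpythonbacoola'

# Greedy: .* grabs as much as possible, then backtracks to the LAST "a"
greedy = re.findall(r'a(.*)a', s)

# Non-greedy: .*? stops at the FIRST "a" that lets the match succeed
lazy = re.findall(r'a(.*?)a', s)

print(greedy)  # ['abpythonbaregexa,aabpythonbacool']
print(lazy)    # ['', 'regex', '', 'cool']
```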

4. What the re.S flag does in re.findall

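import re
s = '''aabbab
         aabbaa
         bb'''  # two extra b's at the end

# Without re.S, "." does not match \n, so the "aa" at the end of the second line cannot reach the "b" on the third line
print(re.findall(r'a.*?b', s))  # ['aab', 'ab', 'aab']

# With re.S, "." also matches \n, so the match continues across the line break
print(re.findall(r'a.*?b', s, re.S))  # ['aab', 'ab', 'aab', 'aa\n         b']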

Output:

['aab', 'ab', 'aab']
['aab', 'ab', 'aab', 'aa\n         b']
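re.S is an alias of re.DOTALL. This sketch shows equivalent ways to enable it, including the inline (?s) flag. It also combines the flag with another one using "|". The short input string is made up for illustration.

```python
import re

s = "a\nb"

no_flag = re.findall(r"a.b", s)                          # "." stops at \n
dotall = re.findall(r"a.b", s, re.DOTALL)                # same as re.S
inline = re.findall(r"(?s)a.b", s)                       # inline form of the flag
combined = re.findall(r"A.B", s, re.S | re.IGNORECASE)  # flags combine with |

print(no_flag)   # []
print(dotall)    # ['a\nb']
print(inline)    # ['a\nb']
print(combined)  # ['a\nb']
```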
