When matching an expression across multiple lines, I have always used re.DOTALL
and it worked fine. Now I have stumbled across the re.MULTILINE
flag, and it looks like it does the same thing.
From the re module source (it doesn't make things clearer, but the values are different):
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string
So is there a difference in usage, and what are the subtle cases where the two could return something different?
They are quite different. Yes, both affect how newlines are treated, but they switch behaviour for different concepts.
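- re.MULTILINE affects where the ^ and $ anchors match.
  Without the switch, ^ and $ match only at the start and end, respectively, of the whole text (strictly, $ also matches just before a single trailing newline). With the switch, ^ also matches just after each newline and $ also matches just before each newline: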
>>> import re
>>> re.search('foo$', 'foo\nbar') is None # no match
True
>>> re.search('foo$', 'foo\nbar', flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 3), match='foo'>
- re.DOTALL affects what the . pattern can match.
  Without the switch, . matches any character except a newline. With the switch, newlines are matched as well:
>>> re.search('foo.', 'foo\nbar') is None # no match
True
>>> re.search('foo.', 'foo\nbar', flags=re.DOTALL)
<_sre.SRE_Match object; span=(0, 4), match='foo\n'>
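To see the two flags side by side, here is a small sketch that runs the same kind of pattern with and without each flag, and then with both combined. The sample text and the variable names are made up for illustration; only re.findall, re.MULTILINE and re.DOTALL come from the standard library.

```python
import re

# Hypothetical three-line sample text, chosen only for illustration.
text = "foo\nbar\nbaz"

# Anchors: without MULTILINE, ^ and $ only anchor at the ends of the whole text.
anchors_default = re.findall(r'^\w+$', text)
anchors_multiline = re.findall(r'^\w+$', text, flags=re.MULTILINE)
print(anchors_default)    # []
print(anchors_multiline)  # ['foo', 'bar', 'baz']

# Dot: without DOTALL, . cannot cross a newline.
dot_default = re.findall(r'foo.*baz', text)
dot_dotall = re.findall(r'foo.*baz', text, flags=re.DOTALL)
print(dot_default)  # []
print(dot_dotall)   # ['foo\nbar\nbaz']

# The flags are independent bits, so they can be combined with |.
only_multiline = re.findall(r'^bar.*$', text, flags=re.MULTILINE)
both_flags = re.findall(r'^bar.*$', text, flags=re.MULTILINE | re.DOTALL)
print(only_multiline)  # ['bar']
print(both_flags)      # ['bar\nbaz']
```

The last pair is where the subtle difference shows up: with MULTILINE alone the greedy .* stops at the end of the line, while adding DOTALL lets it run across newlines to the end of the text.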