Answer a question

When matching an expression across multiple lines, I have always used re.DOTALL and it worked OK. Now I have stumbled across the re.MULTILINE flag, and it looks like it does the same thing.

From the re module (doesn't make it clearer, but the values are different):

M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline

SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string

So is there a difference in usage, and what are the subtle cases where the two could return something different?

Answers

They are quite different. Yes, both affect how newlines are treated, but they switch behaviour for different concepts.

  • re.MULTILINE affects where ^ and $ anchors match.

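    Without the switch, ^ matches only at the start of the whole text, and $ matches only at its end (or just before a single trailing newline at the end). With the switch, ^ also matches immediately after each newline and $ also matches immediately before each newline: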

    >>> import re
    >>> re.search('foo$', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo$', 'foo\nbar', flags=re.MULTILINE)
    <_sre.SRE_Match object; span=(0, 3), match='foo'>
    
  • re.DOTALL affects what the . pattern can match.

    Without the switch, . matches any character except a newline. With the switch, newlines are matched as well:

    >>> re.search('foo.', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo.', 'foo\nbar', flags=re.DOTALL)
    <_sre.SRE_Match object; span=(0, 4), match='foo\n'>
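
To make the contrast concrete, here is a small sketch that runs the same pattern, ^.*$, under each flag combination. The sample text and variable names are made up for illustration; only re.findall, re.MULTILINE and re.DOTALL come from the standard library.

```python
import re

# Hypothetical sample text: three lines, no trailing newline.
text = "first line\nsecond line\nthird line"

# No flags: ^ and $ anchor only at the ends of the text, and . cannot
# cross a newline, so the pattern cannot span the whole text.
no_flags = re.findall(r"^.*$", text)

# MULTILINE only: ^ and $ anchor at every line boundary, while . still
# stops at newlines, so each line is matched separately.
multiline_only = re.findall(r"^.*$", text, flags=re.MULTILINE)

# DOTALL only: . crosses newlines, while ^ and $ still anchor only at
# the ends of the text, so there is one match covering everything.
dotall_only = re.findall(r"^.*$", text, flags=re.DOTALL)

# Both flags: the greedy .* starts at the first line and runs to the end
# of the text, so again there is a single match covering everything.
both = re.findall(r"^.*$", text, flags=re.MULTILINE | re.DOTALL)

print(no_flags)        # []
print(multiline_only)  # ['first line', 'second line', 'third line']
print(dotall_only)     # ['first line\nsecond line\nthird line']
print(both)            # ['first line\nsecond line\nthird line']
```

The flags are independent bits, which is why they can be combined with |. Which one you need depends on whether the pattern relies on the anchors, on the dot, or on both.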
    