re2 module
Regular expressions using Google’s RE2 engine.
Compared to Python’s re, the RE2 engine compiles regular expressions to
deterministic finite automata, which guarantees linear-time behavior.
Intended as a drop-in replacement for re. Unicode is supported by encoding
to UTF-8, and bytes strings are treated as UTF-8 when the UNICODE flag is given.
For best performance, work with UTF-8 encoded bytes strings.
Regular expressions that are not compatible with RE2 are processed with
fallback to re. Examples of features not supported by RE2:
- lookahead assertions (?!...)
- backreferences (e.g., \1 in the search pattern)
- \W and \S inside character classes
On the other hand, Unicode character classes are supported (e.g., \p{Greek}).
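As a rough illustration of the fallback described above, the sketch below runs two patterns that RE2 itself cannot handle. The sample strings are made up, and the import uses the standard re module as a stand-in when re2 is not installed (an assumption so the snippet runs anywhere). It also assumes the fallback notification level is left at its quiet setting.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

# Backreference (\1): not supported by RE2, so the pattern is handled by re.
doubled = re2.search(r"(\w)\1", "hello")
print(doubled.group(0))  # ll

# Negative lookahead (?!...): also processed through the fallback.
not_bar = re2.findall(r"foo(?!bar)\w*", "foobar foobaz")
print(not_bar)  # ['foobaz']
```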
Syntax reference: https://github.com/google/re2/wiki/Syntax
What follows is a reference for the regular expression syntax supported by this module (i.e., without requiring fallback to re).
Regular expressions can contain both special and ordinary characters. Most ordinary characters, like “A”, “a”, or “0”, are the simplest regular expressions; they simply match themselves.
The special characters are:
"." Matches any character except a newline.
"^" Matches the start of the string.
"$" Matches the end of the string or just before the newline at
the end of the string.
"*" Matches 0 or more (greedy) repetitions of the preceding RE.
Greedy means that it will match as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
"?" Matches 0 or 1 (greedy) of the preceding RE.
*?,+?,?? Non-greedy versions of the previous three special characters.
{m,n} Matches from m to n repetitions of the preceding RE.
{m,n}? Non-greedy version of the above.
"\\" Either escapes special characters or signals a special sequence.
[] Indicates a set of characters.
A "^" as the first character indicates a complementing set.
"|" A|B, creates an RE that will match either A or B.
(...) Matches the RE inside the parentheses.
The contents can be retrieved or matched later in the string.
(?:...) Non-grouping version of regular parentheses.
(?imsux) Set the I, M, S, U, or X flag for the RE (see below).
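To make the difference between greedy and non-greedy repetition concrete, here is a minimal sketch using a few of the special characters listed above. The input strings are illustrative only, and the import falls back to the standard re module if re2 is not installed (an assumption, relying on the drop-in compatibility).

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

text = "<a><b>"

# Greedy "*" takes as many characters as possible.
greedy = re2.search(r"<.*>", text).group(0)   # '<a><b>'

# Non-greedy "*?" stops at the first possible end.
lazy = re2.search(r"<.*?>", text).group(0)    # '<a>'

# {m,n} matches between m and n repetitions, preferring more.
bounded = re2.findall(r"a{2,3}", "a aa aaa aaaa")  # ['aa', 'aaa', 'aaa']

# "|" matches either alternative; (?:...) groups without capturing.
pets = re2.findall(r"(?:cat|dog)s?", "cats, dog, bird")  # ['cats', 'dog']

print(greedy, lazy, bounded, pets)
```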
The special sequences consist of “\” and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character:
\A Matches only at the start of the string.
\Z Matches only at the end of the string.
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit.
\D Matches any non-digit character.
\s Matches any whitespace character.
\S Matches any non-whitespace character.
\w Matches any alphanumeric character.
\W Matches the complement of \w.
\\ Matches a literal backslash.
\pN Unicode character class (one-letter name)
\p{Greek} Unicode character class
\PN negated Unicode character class (one-letter name)
\P{Greek} negated Unicode character class
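The following sketch exercises a handful of the special sequences above on made-up ASCII input. It deliberately leaves out the \p{...} classes, because the standard re module used as a stand-in when re2 is missing (an assumption of this example) does not understand them.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

line = "order 66 shipped 2024-05-01"

# \d matches decimal digits.
digits = re2.findall(r"\d+", line)            # ['66', '2024', '05', '01']

# \b anchors at word boundaries, \w matches word characters.
words = re2.findall(r"\b\w+\b", "red, green; blue")  # ['red', 'green', 'blue']

# \A anchors at the start of the string, \s matches whitespace.
head = re2.match(r"\Aorder\s\d+", line).group(0)     # 'order 66'

# \D is the complement of \d, so removing it keeps only the digits.
only_digits = re2.sub(r"\D", "", "tel: 555-0100")    # '5550100'

print(digits, words, head, only_digits)
```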
This module exports the following functions:
count Count all occurrences of a pattern in a string.
match Match a regular expression pattern to the beginning of a string.
fullmatch Match a regular expression pattern to all of a string.
search Search a string for a pattern and return Match object.
contains Same as search, but only return bool.
sub Substitute occurrences of a pattern found in a string.
subn Same as sub, but also return the number of substitutions made.
split Split a string by the occurrences of a pattern.
findall Find all occurrences of a pattern in a string.
finditer Return an iterator yielding a match object for each match.
compile Compile a pattern into a RegexObject.
purge Clear the regular expression cache.
escape Backslash all non-alphanumerics in a string.
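The three matching entry points in the list above differ only in where the pattern must apply. A small sketch with an invented input string shows the distinction; the import falls back to the standard re module when re2 is not available, which is an assumption made so the snippet runs anywhere.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

s = "abc123"

# match: the pattern must apply at the beginning of the string.
at_start = re2.match(r"[a-z]+", s)

# fullmatch: the pattern must cover the whole string, so this fails.
whole = re2.fullmatch(r"[a-z]+", s)

# search: the pattern may apply anywhere in the string.
anywhere = re2.search(r"\d+", s)

print(at_start.group(0), whole, anywhere.group(0))  # abc None 123
```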
Some of the functions in this module take flags as optional parameters:
A ASCII Make \w, \W, \b, \B, \d, \D match the corresponding ASCII
character categories (rather than the whole Unicode
categories, which is the default).
I IGNORECASE Perform case-insensitive matching.
M MULTILINE "^" matches the beginning of lines (after a newline)
as well as the string.
"$" matches the end of lines (before a newline) as well
as the end of the string.
S DOTALL "." matches any character at all, including the newline.
X VERBOSE Ignore whitespace and comments for nicer looking RE's.
U UNICODE Enable Unicode character classes and make \w, \W, \b, \B
Unicode-aware (default for unicode patterns).
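As a hedged example of combining the flags above, the sketch below passes them through the optional flags parameter. It assumes the flags are exposed as module attributes under the long names shown in the table (as they are in re), and it uses the standard re module as a stand-in when re2 is not installed.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

text = "Alpha\nbeta\nALPHA"

# IGNORECASE and MULTILINE combined with "|": "^" and "$" work per line.
lines = re2.findall(r"^alpha$", text, re2.IGNORECASE | re2.MULTILINE)
print(lines)  # ['Alpha', 'ALPHA']

# Without DOTALL, "." does not match a newline; with it, it does.
plain = re2.search(r"a.b", "a\nb")
dotall = re2.search(r"a.b", "a\nb", re2.DOTALL)
print(plain is None, dotall is not None)  # True True
```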
This module also defines an exception ‘RegexError’ (also available under the alias ‘error’).
- exception re2.BackreferencesException
Bases: Exception
Search pattern contains backreferences.
- exception re2.CharClassProblemException
Bases: Exception
Search pattern contains unsupported character class.
- class re2.Match
Bases: object
- end(group=0)
- endpos
- expand(template)
Expand a template with groups.
- group(*args)
- groupdict()
- groups(default=None)
- lastgroup
- lastindex
- pos
- re
- regs
- span(group=0)
- start(group=0)
- string
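The Match members listed above are mostly undocumented here, so the following is only a sketch that assumes they behave like their counterparts in re. The date string and group names are invented for the example, and the import uses the standard re module as a stand-in if re2 is missing.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

m = re2.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05-01")

whole = m.group(0)        # the entire match: '2024-05'
year = m.group("year")    # a group by name: '2024'
parts = m.groups()        # all groups: ('2024', '05')
named = m.groupdict()     # {'year': '2024', 'month': '05'}
where = m.span()          # (start, end) of the whole match: (9, 16)
month_start = m.start(2)  # start offset of group 2: 14

print(whole, year, parts, named, where, month_start)
```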
- class re2.Pattern
Bases: object
- contains(string, pos=0, endpos=-1)
“contains(string[, pos[, endpos]]) –> bool.”
Scan through string looking for a match, and return True or False.
- count(string, pos=0, endpos=-1)
Return number of non-overlapping matches of pattern in string.
- findall(string, pos=0, endpos=-1)
Return all non-overlapping matches of pattern in string as a list of strings.
- finditer(string, pos=0, endpos=-1)
Yield all non-overlapping matches of pattern in string as Match objects.
- flags
- fullmatch(string, pos=0, endpos=-1)
“fullmatch(string[, pos[, endpos]]) –> Match object or None.”
Matches the entire string.
- groupindex
- groups
- match(string, pos=0, endpos=-1)
Matches zero or more characters at the beginning of the string.
- pattern
- scanner(arg)
- search(string, pos=0, endpos=-1)
Scan through string looking for a match, and return a corresponding Match instance. Return None if no position in the string matches.
- split(string, maxsplit=0)
split(string[, maxsplit = 0]) –> list
Split a string by the occurrences of the pattern.
- sub(repl, string, count=0)
sub(repl, string[, count = 0]) –> newstring
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
- subn(repl, string, count=0)
subn(repl, string[, count = 0]) –> (newstring, number of subs)
Return the tuple (new_string, number_of_subs_made) found by replacing the leftmost non-overlapping occurrences of pattern with the replacement repl.
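A compiled Pattern can be reused across calls. The sketch below shows a few of the methods above with made-up input; it assumes pos and count behave as in re, and falls back to the standard re module if re2 is not installed.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

word = re2.compile(r"[a-z]+")

# pos skips the first four characters before scanning.
tail_words = word.findall("one two three", pos=4)   # ['two', 'three']

# count limits how many leftmost matches are replaced.
first_two = word.sub("X", "one two three", 2)       # 'X X three'

# subn also reports how many substitutions were made.
replaced, n = word.subn("-", "one two")             # ('- -', 2)

# search returns a Match whose span locates the hit.
span = word.search("123 abc").span()                # (4, 7)

print(tail_words, first_two, replaced, n, span)
```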
- re2.SREPattern
alias of Pattern
- re2.compile(pattern, flags=0, max_mem=8388608)
- re2.count(pattern, string, flags=0)
Return number of non-overlapping matches in the string.
Empty matches are included in the count.
- exception re2.error(msg, pattern=None, pos=None)
Bases: Exception
Exception raised for invalid regular expressions.
Attributes:
msg: The unformatted error message
pattern: The regular expression pattern
pos: The index in the pattern where compilation failed (may be None)
lineno: The line corresponding to pos (may be None)
colno: The column corresponding to pos (may be None)
- re2.escape(pattern)
Escape all non-alphanumeric characters in pattern.
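A typical use of escape is to match user-supplied text literally. Since the exact escaped form can differ between implementations, this sketch checks only the round trip rather than the escaped string itself. The sample text is invented, and the standard re module stands in if re2 is not installed (an assumption).

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

literal = "1+1=2 (approx.)"

# After escaping, "+", "(", ")" and "." lose their special meaning.
pattern = re2.escape(literal)
round_trip = re2.fullmatch(pattern, literal) is not None

# Unescaped, "." matches any character; escaped, only a literal dot.
loose = re2.search(r"a.c", "abc") is not None
strict = re2.search(re2.escape("a.c"), "abc") is not None

print(round_trip, loose, strict)  # True True False
```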
- re2.findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
Each match is represented as a string or a tuple (when there are two or more groups). Empty matches are included in the result.
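How findall shapes its result depends on the number of groups in the pattern. The sketch below uses an invented key=value string; the behavior for a single group (returning just that group) is assumed to follow re, and the import falls back to re when re2 is missing.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

data = "a=1, b=22"

# No groups: each match is returned as a string.
whole = re2.findall(r"\w+=\d+", data)      # ['a=1', 'b=22']

# One group: the group's text is returned for each match.
keys = re2.findall(r"(\w+)=\d+", data)     # ['a', 'b']

# Two or more groups: each match becomes a tuple of groups.
pairs = re2.findall(r"(\w+)=(\d+)", data)  # [('a', '1'), ('b', '22')]

print(whole, keys, pairs)
```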
- re2.finditer(pattern, string, flags=0)
Yield all non-overlapping matches in the string.
For each match, the iterator returns a Match object. Empty matches are included in the result.
- re2.fullmatch(pattern, string, flags=0)
Try to apply the pattern to the entire string, returning a Match object, or None if no match was found.
- re2.match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning a Match object, or None if no match was found.
- re2.purge()
Clear the regular expression caches.
- re2.search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning a Match object, or None if no match was found.
- re2.set_fallback_notification(level)
Set the fallback notification to a level; one of: FALLBACK_QUIETLY, FALLBACK_WARNING, FALLBACK_EXCEPTION.
- re2.split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.
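A short sketch of split with and without maxsplit, using an invented comma-separated string. The pattern always consumes a comma, so empty matches do not come into play. The import uses the standard re module as a stand-in if re2 is not installed (an assumption).

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

row = "a , b,c ,d"

# Split on commas, swallowing any whitespace around them.
fields = re2.split(r"\s*,\s*", row)                 # ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first separator; the rest stays intact.
head_rest = re2.split(r"\s*,\s*", row, maxsplit=1)  # ['a', 'b,c ,d']

print(fields, head_rest)
```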
- re2.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it’s passed the Match object and must return a replacement string to be used.
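Both forms of repl can be sketched briefly. The helper name double and the sample strings are hypothetical, and the group references in the string form (\1, \2) are assumed to work as in re. The standard re module stands in when re2 is not available.

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2


def double(match):
    """Hypothetical callback: return the matched number, doubled."""
    return str(int(match.group(0)) * 2)


# Callable repl: called once per match with the Match object.
doubled = re2.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# String repl: backslash escapes such as \1 and \2 refer to groups.
swapped = re2.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")
print(swapped)  # host at user
```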
- re2.subn(pattern, repl, string, count=0, flags=0)
Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in the source string by the replacement repl. number is the number of substitutions that were made. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it’s passed the Match object and must return a replacement string to be used.
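A minimal sketch of subn, showing the returned 2-tuple and the effect of count. The whitespace-collapsing use case is just an illustration, and the import falls back to the standard re module if re2 is not installed (an assumption).

```python
# Assumption: use the standard library as a stand-in if re2 is unavailable.
try:
    import re2
except ImportError:
    import re as re2

messy = "a   b \t c"

# Collapse every run of whitespace into a single space.
clean, n = re2.subn(r"\s+", " ", messy)
print(repr(clean), n)  # 'a b c' 2

# With count=1 only the leftmost run is replaced.
partial, n_partial = re2.subn(r"\s+", " ", messy, count=1)
print(repr(partial), n_partial)  # 'a b \t c' 1
```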