Regular Expressions Quick Reference
Matching
Characters and classes
. Any single character
[] Character class (see below)
\t Horizontal tab
\n Newline
\w Word character: alphanumeric + connector punctuation, e.g., underscore
\W Non-word character
\s Whitespace character
\S Non-whitespace character
\d Decimal digit: [0-9]
\D Non-digit: [^0-9]
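As a rough illustration of the escapes above, here is a minimal sketch using Python's re module. The choice of Python is an assumption, since this reference does not name an engine. The re.ASCII flag is passed so that \w, \d and \s keep the ASCII meanings listed here.

```python
import re

text = "id_42\tok\n"

# re.ASCII restricts \w, \d and \s to ASCII, so \d is exactly [0-9].
words = re.findall(r"\w+", text, flags=re.ASCII)         # ['id_42', 'ok']
digits = re.findall(r"\d", text, flags=re.ASCII)         # ['4', '2']
non_digits = re.findall(r"\D+", "a1b22c", flags=re.ASCII)  # ['a', 'b', 'c']
whitespace = re.findall(r"\s", text, flags=re.ASCII)     # ['\t', '\n']
```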
Quantifiers
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
{min,max} At least min and at most max occurrences
{n} Exactly n occurrences
{min,} At least min occurrences
min, max, and n must each be between 0 and 255.
You can append ? to any quantifier (except ?) to make it non-greedy.
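The difference between greedy and non-greedy quantifiers is easiest to see side by side. The sketch below uses Python's re module as a stand-in engine (an assumption; the 0 to 255 limit above is specific to the engine this reference describes and is not checked here).

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as it can, so the match spans both tags.
greedy = re.findall(r"<b>.*</b>", html)   # ['<b>one</b> and <b>two</b>']

# Non-greedy: .*? stops at the first place the rest of the pattern fits.
lazy = re.findall(r"<b>.*?</b>", html)    # ['<b>one</b>', '<b>two</b>']

# Counted forms: {min,max}, {n} and {min,}.
too_long = re.fullmatch(r"\d{2,4}", "12345")  # None: five digits exceed max
exact = re.fullmatch(r"\d{3}", "123")         # matches
at_least = re.fullmatch(r"a{2,}", "aaaa")     # matches
```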
Assertions
^ The beginning of a line
$ The end of a line
\b Word boundary
\B Not a word boundary
(?<=y)x The pattern x must be preceded by the pattern y.
(?<!y)x The pattern x must not be preceded by the pattern y.
x(?=y) The pattern x must be followed by the pattern y.
x(?!y) The pattern x must not be followed by the pattern y.
None of the above assertions consume text: whatever y matches
(or fails to match) is not included as part of the match,
i.e., $0 in the replacement pattern will not include y.
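A short sketch of the four lookaround forms, using Python's re module as an assumed stand-in engine. In each case the text tested by the assertion stays out of the reported match.

```python
import re

# (?<=y)x : digits preceded by a dollar sign; the "$" is not in the match.
amount = re.search(r"(?<=\$)\d+", "cost: $42")

# (?<!y)x : a whole number not preceded by a dollar sign.
plain = re.search(r"(?<!\$)\b\d+", "cost: $42 or 17")

# x(?=y) : a word followed by ".txt"; the extension is not in the match.
stem = re.search(r"\w+(?=\.txt)", "notes.txt")

# x(?!y) : "foo" not followed by "bar".
not_bar = re.search(r"foo(?!bar)", "foobar foobaz")

# Replacing only the match leaves the asserted text untouched.
masked = re.sub(r"(?<=\$)\d+", "NN", "cost: $42")   # 'cost: $NN'
```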
Other
x|y Alternation: either x or y
(...) Grouping and subexpression capturing
(?P<name>...) Defines a named subexpression
(?:...) Turns off capturing of the subexpression
Backreferences
A backslash followed by a number is a backreference if the number
is 1 to 9, or if it does not start with the digit zero (which would
make it explicitly octal) and there have been at least that many
capturing subpatterns.
(?P=name) Backreference to a named subexpression
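The grouping and backreference forms can be sketched as follows, using Python's re module as an assumed stand-in engine (it accepts the same (?P<name>...) and (?P=name) syntax). The sample strings are made up for illustration.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")

# Named group plus named backreference: the closing quote must equal
# the opening one.
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")

# (?:...) groups without capturing, so only (c) is numbered.
grouped = re.search(r"(?:ab)+(c)", "ababc")

# Alternation: either side may match.
pet = re.fullmatch(r"cat|dog", "dog")
```

Here doubled.group(0) is "was was", quoted.group(0) is "'hi'", and grouped.groups() contains only "c".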
Options
(?i) Turn on case-insensitive matching for the
duration of the subexpression.
(?-i) Turn off case-insensitive matching for the
duration of the subexpression.
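Engines differ in how they scope these options. As a hedged sketch, Python's re module (an assumed stand-in, not necessarily the engine described here) only allows a bare (?i) at the very start of a pattern, and expresses subexpression scope with the (?i:...) and (?-i:...) forms instead.

```python
import re

# Case-insensitive only inside the scoped group; ", World" stays exact.
greeting = re.compile(r"(?i:hello), World")
loud = greeting.fullmatch("HELLO, World")        # matches
wrong_case = greeting.fullmatch("HELLO, world")  # None

# Whole pattern case-insensitive, except the part inside (?-i:...).
tag = re.compile(r"(?i)id-(?-i:XYZ)")
mixed = tag.fullmatch("ID-XYZ")                  # matches
lowered = tag.fullmatch("id-xyz")                # None
```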
Oddities
{ If not part of a valid quantifier expression,
matches itself.
Character Classes
- Between two characters, indicates a range.
As the first character after any '^', the last
character, or the second endpoint of a range,
matches itself. ([.-.] can be used as the first
endpoint of a range.)
^ As the first character, complements the set.
Otherwise, matches itself.
[ Matches itself, since classes don't nest.
] As the first character after any ^, matches itself.
Otherwise, ends the character class.
[:alnum:] [A-Za-z0-9]
[:alpha:] [A-Za-z]
[:blank:] Space or tab
[:cntrl:] Any control character
[:digit:] [0-9]
[:graph:] Any printable character except space
[:lower:] [a-z]
[:print:] Any printable character including space
[:punct:] Any printable character except space, letters, and digits
[:space:] Space, tab, newline, carriage return, form feed,
vertical tab
[:upper:] [A-Z]
[:word:] Any word character, i.e., same as \w
[:xdigit:] Any hexadecimal digit: [0-9A-Fa-f]
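The rules for -, ^ and ] inside a class can be checked with a few small patterns. This sketch uses Python's re module as an assumed stand-in engine. Python does not accept the [:name:] forms, so the named classes are spelled out as explicit ASCII ranges; the constant XDIGIT and the variable names are hypothetical, chosen for illustration.

```python
import re
import string

in_range = re.findall(r"[a-c]", "abcd-")       # ['a', 'b', 'c']
leading_hyphen = re.findall(r"[-a]", "a-b")    # ['a', '-']
trailing_hyphen = re.findall(r"[a-]", "a-b")   # ['a', '-']
negated = re.findall(r"[^-]", "a-b")           # ['a', 'b']
literal_caret = re.findall(r"[a^]", "a^b")     # ['a', '^']
leading_bracket = re.findall(r"[]x]", "x]y")   # ['x', ']']
negated_bracket = re.findall(r"[^]x]", "x]y")  # ['y']

# Explicit ASCII equivalent of [:xdigit:].
XDIGIT = "0-9A-Fa-f"
hex_runs = re.findall(f"[{XDIGIT}]+", "BEEF, 1f, xyz")  # ['BEEF', '1f']

# [:punct:] in ASCII: printable, not space, not a letter or digit.
punct = "".join(c for c in map(chr, range(0x21, 0x7F)) if not c.isalnum())
same_as_stdlib = punct == string.punctuation
```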
Replacement
$0 The overall match
$N The Nth subexpression counting '('s from the left
$-N The Nth subexpression counting '('s from the right
${x} The subexpression named "x"
\$ Literal dollar character
\\ Literal backslash
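Replacement syntax varies between engines. For comparison only, here is a sketch of the closest equivalents in Python's re module, which is an assumed stand-in: it writes \g<0> for $0, \N or \g<N> for $N, and \g<x> for ${x}. It has no counterpart to $-N, and a dollar sign in a Python replacement is already literal.

```python
import re

pattern = re.compile(r"(?P<first>\w+) (?P<last>\w+)")
name = "Ada Lovelace"

whole = pattern.sub(r"[\g<0>]", name)                # like [$0]
by_number = pattern.sub(r"\2, \1", name)             # like $2, $1
by_name = pattern.sub(r"\g<last>, \g<first>", name)  # like ${last}, ${first}
with_dollar = pattern.sub(r"$\1", name)              # "$" needs no escape here
with_backslash = pattern.sub(r"\1\\\2", name)        # \\ gives one backslash
```

With the sample name, whole is "[Ada Lovelace]" and both by_number and by_name are "Lovelace, Ada".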