Regular Expressions Quick Reference

Complete explanation

Matching

Characters and classes

  .             Any single character
  []            Character class (see below)

  \t            Horizontal tab
  \n            Newline

  \w            Word character: alphanumeric + connector punctuation, e.g., underscore
  \W            Non-word character

  \s            Whitespace character
  \S            Non-whitespace character

  \d            Decimal digit: [0-9]
  \D            Non-digit:     [^0-9]

Quantifiers

  ?             Zero or one occurrence
  *             Zero or more occurrences
  +             One or more occurrences
  {min,max}     At least min and at most max occurrences
  {n}           Exactly n occurrences
  {min,}        At least min occurrences

  min, max, n must be between 0 and 255.

  You can append ? to any quantifier (except ?) to make it non-greedy.

Assertions

  ^             The beginning of a line
  $             The end of a line

  \b            Word boundary
  \B            Not a word boundary

  (?<=y)x       The pattern x must be preceded by the pattern y.
  (?<!y)x       The pattern x must not be preceded by the pattern y.
  x(?=y)        The pattern x must be followed by the pattern y.
  x(?!y)        The pattern x must not be followed by the pattern y.
                For all the above assertions, the result of matching
                or not matching y is not included as part of the match,
                i.e., $0 in the replacement pattern will not include y.

Other

  x|y           Alternation: either x or y
  (...)         Grouping and subexpression capturing
                (?P<name>...) defines a named subexpression
                (?:...) turns off capturing of the subexpression

Backreferences

  A backslash followed by a number is a backreference if the number
  is 1 to 9 or is not explicitly octal, i.e., does not start with
  the digit zero, and there have been at least that many capturing
  subpatterns.

  (?P=name) is a backreference to a named subexpression.

Options

  (?i)          Turn on case insensitive matching for the
                duration of the subexpression.
  (?-i)         Turn off case insensitive matching for the
                duration of the subexpression.

Oddities

  {             If not part of a valid quantifier expression,
                matches itself.

Character Classes

  -             Between two characters, indicates a range.
                As the first character after any '^', the last
                character, or the second endpoint of a range,
                matches itself. ([.-.] can be used as the first
                endpoint of a range.)

  ^             As the first character, complements the set.
                Otherwise, matches itself.

  [             Matches itself, since classes don't nest.

  ]             As the first character after any ^, matches itself.
                Otherwise, ends the character class.

  [:alnum:]     [A-Za-z0-9]
  [:alpha:]     [A-Za-z]
  [:blank:]     Space or tab
  [:cntrl:]     Any control character
  [:digit:]     [0-9]
  [:graph:]     Any printable character except space
  [:lower:]     [a-z]
  [:print:]     Any printable character including space
  [:punct:]     Any printable character except [^ A-Za-z0-9]
  [:space:]     Space, tab, newline, carriage return, form feed,
                vertical tab
  [:upper:]     [A-Z]
  [:word:]		Any word character, i.e., same as \w
  [:xdigit:]    Any hexadecimal digit: [0-9A-Fa-f]

Replacement

  $0            The overall match
  $N            The Nth subexpression counting '('s from the left
  $-N           The Nth subexpression counting '('s from the right
  ${x}          The subexpression named "x"
  \$            Literal dollar character
  \\            Literal backslash