Perl Regular Expressions
(modified from this page)
The following is paraphrased from the
Perl reference guide,
a PostScript quick reference to Perl syntax.
Each character matches itself, unless it is one of the special characters:
+?.*^$()[]{}|\
- diabetes only matches "diabetes".
. matches an arbitrary character, but not a newline
- Gonzale. matches "Gonzales", "Gonzalez", "Gonzalem",
etc.
- .i.e matches "mite", "bite", "life", etc.
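The dot examples can be checked with a minimal sketch. It is written in Python rather than Perl, on the assumption that Python's `re` module follows Perl syntax for this construct. `re.fullmatch` is used so that a pattern "matches" only when it covers the whole string, which is how the examples above read. The list of names is made up for illustration.

```python
import re

# Hypothetical test strings; the last one ends in a newline.
names = ["Gonzales", "Gonzalez", "Gonzale", "Gonzale\n"]

# "." needs exactly one character and, by default, refuses a newline.
dot_hits = [n for n in names if re.fullmatch(r"Gonzale.", n)]
print(dot_hits)  # ['Gonzales', 'Gonzalez']
```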
(...) groups a series of pattern elements into a single element
Parentheses group units together. For an example, see the next entry.
+ matches the preceding pattern element one or more times
- a+rdvark matches "ardvark", "aardvark", "aaardvark",
etc.
- (do)+ matches "do", "dodo", "dododo", etc.
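A short sketch of + applied to a single character and to a parenthesized group. It uses Python's `re` module as a stand-in for Perl (an assumption; the two agree on these constructs), and the word lists are illustrative only.

```python
import re

# "+" applies to the single element before it: here, the letter "a".
plus_single = [w for w in ["rdvark", "ardvark", "aaardvark"]
               if re.fullmatch(r"a+rdvark", w)]
print(plus_single)  # ['ardvark', 'aaardvark']

# With parentheses, "+" repeats the whole group "do".
plus_group = [w for w in ["do", "dodo", "dod"]
              if re.fullmatch(r"(do)+", w)]
print(plus_group)  # ['do', 'dodo']
```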
? matches the preceding pattern element zero or one times
- outcomes? matches "outcome" and "outcomes"
- .?ill matches "ill" and "bill", but not the whole of "still" (only the "till" inside it)
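Whether "still" counts as a match depends on whether the pattern must cover the whole string or may match anywhere inside it. The sketch below shows both readings, using Python's `re` module in place of Perl (an assumption; the ? quantifier behaves the same in both).

```python
import re

words = ["ill", "bill", "still"]

# Whole-string matching: at most one character may precede "ill".
whole = [w for w in words if re.fullmatch(r".?ill", w)]
print(whole)  # ['ill', 'bill']

# Searching inside the string finds the "till" part of "still".
inside = re.search(r".?ill", "still").group()
print(inside)  # till

# The trailing "s" is optional.
optional_s = [w for w in ["outcome", "outcomes"] if re.fullmatch(r"outcomes?", w)]
print(optional_s)  # ['outcome', 'outcomes']
```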
* matches the preceding pattern element zero or more times
- abs0*9 matches "abs9", "abs09", "abs009", etc.
- .*ill matches "ill", "bill", "still", "schmill", etc.
- mex.*am matches "mex-am" and, ignoring case, "Mexican-American",
"Mexican American", etc.
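The star examples, including mex.*am, can be tried with the sketch below. It uses Python's `re` module instead of Perl (an assumption; * behaves the same in both). The capitalized strings match only when case is ignored, so the sketch passes `re.IGNORECASE`; that flag is a guess at how the original search was configured.

```python
import re

# Zero or more "0" characters between "abs" and "9".
star_hits = [s for s in ["abs9", "abs09", "abs009", "abs19"]
             if re.fullmatch(r"abs0*9", s)]
print(star_hits)  # ['abs9', 'abs09', 'abs009']

# ".*" is greedy: it runs to the last place where "am" can still follow.
m_ci = re.search(r"mex.*am", "Mexican-American", re.IGNORECASE)
print(m_ci.group())  # Mexican-Am

# Without the flag, lowercase "mex" is not found in the capitalized text.
m_cs = re.search(r"mex.*am", "Mexican-American")
print(m_cs)  # None
```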
{n} means exactly n times
- (e.?){3} matches "eve equals", but not "bee seepage"
{n,} means at least n times
{n,m} means at least n and at most m times
- bo{1,2}p matches "bop", "boop", but not "booop"
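The three counted forms can be compared side by side. This is a sketch in Python's `re` module rather than Perl (an assumption; the brace quantifiers work the same way), and the {2,} pattern is an extra illustration that does not appear in the list above.

```python
import re

candidates = ["bop", "boop", "booop"]

# {1,2}: one or two "o" characters.
between = [s for s in candidates if re.fullmatch(r"bo{1,2}p", s)]
print(between)  # ['bop', 'boop']

# {2,}: two or more "o" characters (illustrative pattern).
at_least = [s for s in candidates if re.fullmatch(r"bo{2,}p", s)]
print(at_least)  # ['boop', 'booop']

# {3}: exactly three consecutive "e plus optional character" units.
three_yes = re.search(r"(e.?){3}", "eve equals") is not None
three_no = re.search(r"(e.?){3}", "bee seepage") is not None
print(three_yes, three_no)  # True False
```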
[...] denotes a class of characters to match
- Type [12] matches "Type 1" and "Type 2" (a comma inside the brackets would be matched literally)
- abs00[0-9] matches "abs000", "abs001", up to "abs009"
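One detail worth checking is that every character between the brackets, a comma included, is a member of the class. The sketch below demonstrates this with Python's `re` module standing in for Perl (an assumption; character classes behave the same here). The test strings are made up.

```python
import re

# A comma inside the brackets is a literal member of the class.
comma_matches = re.fullmatch(r"Type [1,2]", "Type ,") is not None
print(comma_matches)  # True

# Listing only the digits avoids that.
digits_only = [s for s in ["Type 1", "Type 2", "Type ,"]
               if re.fullmatch(r"Type [12]", s)]
print(digits_only)  # ['Type 1', 'Type 2']

# A range such as 0-9 covers every character between its endpoints.
in_range = [s for s in ["abs000", "abs009", "abs010"]
            if re.fullmatch(r"abs00[0-9]", s)]
print(in_range)  # ['abs000', 'abs009']
```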
(...|...|...) matches one of the alternatives
- (The|A) Christmas Story matches both "A Christmas Story" and
"The Christmas Story", but not just "Christmas Story"
The above reserved punctuation can be searched for literally ("escaped" from
its regular meaning) by using a preceding \
- 1+1 matches "11", "111", "1111", etc., but not "1+1"
- 1\+1 matches only "1+1"
- \\ matches "\"
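Left unescaped, + is a quantifier, so 1+1 means "one or more 1s followed by a 1". The sketch below contrasts the two forms using Python's `re` module in place of Perl (an assumption; escaping works the same way). `re.escape` is a Python helper, shown only as a convenience.

```python
import re

candidates = ["1+1", "11", "111"]

# Unescaped: "+" repeats the first "1"; a literal plus sign never matches.
unescaped = [s for s in candidates if re.fullmatch(r"1+1", s)]
print(unescaped)  # ['11', '111']

# Escaped: "\+" is a literal plus sign.
escaped = [s for s in candidates if re.fullmatch(r"1\+1", s)]
print(escaped)  # ['1+1']

# Two backslashes in the pattern match one backslash in the text.
backslash = re.fullmatch(r"\\", "\\") is not None
print(backslash)  # True

# re.escape builds an escaped pattern from literal text.
auto = re.fullmatch(re.escape("1+1"), "1+1") is not None
print(auto)  # True
```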
\w matches an alphanumeric character, including "_".
\W matches a non-alphanumeric character (e.g., punctuation or whitespace)
\b matches word boundaries
\B matches non-boundaries
\s matches whitespace
\S matches non-whitespace
\d matches a digit
\D matches a non-digit
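The shorthand classes are easiest to see on a sample string. The sketch below uses Python's `re` module as a stand-in for Perl (an assumption; these escapes have the same meaning for plain ASCII text), and the sample text is invented.

```python
import re

text = "Type 2 diabetes,\tage 45"  # contains a tab character

words = re.findall(r"\w+", text)
print(words)  # ['Type', '2', 'diabetes', 'age', '45']

numbers = re.findall(r"\d+", text)
print(numbers)  # ['2', '45']

# \s covers both the spaces and the tab.
fields = re.split(r"\s+", text)
print(fields)  # ['Type', '2', 'diabetes,', 'age', '45']

# \W picks up whitespace as well as punctuation.
non_word = re.findall(r"\W", "a b!")
print(non_word)  # [' ', '!']

# \b keeps "age" from matching inside "page" or "aged".
whole_word = re.findall(r"\bage\b", "age, page, aged")
print(whole_word)  # ['age']
```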
\n, \r, \f, \t mean "newline",
"carriage-return", "form-feed", and "tab"
\w, \s, \d may be used within character classes;
\b denotes backspace in this context.
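Both points can be illustrated with a small sketch, again using Python's `re` module in place of Perl (an assumption; Python documents the same backspace meaning for \b inside brackets). The input strings are made up.

```python
import re

# Shorthand classes combine inside brackets: word characters or whitespace.
runs = re.findall(r"[\w\s]+", "ab 12; cd")
print(runs)  # ['ab 12', ' cd']

# Inside brackets, \b is the backspace character (0x08), not a word boundary.
# The text "a\bb" is a normal string literal, so its \b is a real backspace.
backspaces = re.findall(r"[\b]", "a\bb")
print(backspaces)  # ['\x08']
```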