Regular Expressions
Wilma can use regular expressions to match content words, file names
and folder paths. In each case the regular expression is applied to
each word, name or path individually. In other words, you cannot use
something like:
Fred.*Tom
to match the multiple words in the phrase "Fred and Tom". Wilma's
proximity operators can be used for that purpose, but they cannot be
combined with regular expressions.
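The per-word behaviour can be illustrated with a short sketch. It uses Python's re module as a stand-in for Wilma's PCRE-derived engine, and the word splitting is an assumption made for illustration, not a description of how Wilma actually breaks content into words.

```python
import re

# Python's re module stands in for Wilma's engine (an assumption).
pattern = re.compile(r"Fred.*Tom")

phrase = "Fred and Tom"

# Applied to the whole phrase, the pattern would match...
whole_phrase_match = pattern.search(phrase) is not None

# ...but applied to each word on its own, it finds nothing.
per_word_matches = [w for w in phrase.split() if pattern.search(w)]

# A single word that contains both names does match.
single_word_matches = [w for w in ["FredAndTom", "Fred", "Tom"] if pattern.search(w)]

print(whole_phrase_match)   # True
print(per_word_matches)     # []
print(single_word_matches)  # ['FredAndTom']
```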
Regular expressions can be used with the logical operators when searching
for content words, but the operators must be separated from the regular
expressions by spaces. Spaces are not allowed in Wilma regular expressions.
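One way to see why spaces cannot appear inside a regular expression is to imagine the query being split on spaces before anything else happens. The sketch below does exactly that in Python. The operator names AND, OR and NOT and the split_query helper are placeholders invented for this illustration; they are not Wilma's actual operator syntax or parsing code.

```python
import re

# Placeholder operator names for illustration only; not Wilma's syntax.
OPERATORS = {"AND", "OR", "NOT"}

def split_query(query):
    """Split a query on whitespace into ('op', name) and ('regex', compiled) tokens."""
    tokens = []
    for part in query.split():
        if part in OPERATORS:
            tokens.append(("op", part))
        else:
            tokens.append(("regex", re.compile(part)))
    return tokens

tokens = split_query(r"^Fred\d+$ AND T[oi]m")
kinds = [kind for kind, _ in tokens]
patterns = [value.pattern for kind, value in tokens if kind == "regex"]

print(kinds)  # ['regex', 'op', 'regex']
```

Under this assumption, a space typed inside a pattern would simply cut it into two separate terms.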
A full description of regular expressions is beyond the scope of this
documentation, but there is a vast amount of information on this
subject on the net. The regular expression engine used in Wilma is
derived from the PCRE package developed by Philip Hazel of the
University of Cambridge. Below is a brief description of the syntax
of regular expressions, largely taken from the REALbasic
documentation.
Pattern | Description |
. | Matches any character except newline. |
[a-z0-9] | Matches any single character in the set. |
[^a-z0-9] | Matches any single character not in the set. |
\d | Matches a digit. Same as [0-9]. |
\D | Matches a non-digit. Same as [^0-9]. |
\w | Matches an alphanumeric (word) character. Same as [a-zA-Z0-9_]. |
\W | Matches a non-word character. Same as [^a-zA-Z0-9_]. |
\s | Matches a whitespace character (space, tab, newline, etc.). |
\S | Matches a non-whitespace character. |
\n | Matches a newline (line feed). |
\r | Matches a return. |
\t | Matches a tab. |
\f | Matches a formfeed. |
\b | Matches a backspace (inside [] only). |
\0 | Matches a null character. |
\000 | Also matches a null character because of the following: |
\nnn | Matches an ASCII character of that octal value. |
\xnn | Matches an ASCII character of that hexadecimal value. |
\cX | Matches an ASCII control character. |
\metachar | Matches the meta-character (e.g., \, ., |). |
(abc) | Used to create subexpressions. Remembers the match for later back references. Referenced by replacement patterns that use \1, \2, etc. |
\1, \2, ... | Matches whatever the first (second, and so on) pair of parentheses matched. |
x? | Matches 0 or 1 x's, where x is any of the above. |
x* | Matches 0 or more x's. |
x+ | Matches 1 or more x's. |
x{m,n} | Matches at least m x's, but no more than n. |
abc | Matches all of a, b, and c in order. |
a|b|c | Matches one of a, b, or c. |
\b | Matches a word boundary (outside [] only). |
\B | Matches a non-word boundary. |
^ | Anchors match to the beginning of a line or string. In Wilma's case, this would be the beginning of the word, name, or folder path. |
$ | Anchors match to the end of a line or string. In Wilma's case, this would be the end of the word, name, or folder path. |
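A few of the patterns above can be tried out with the following sketch. It uses Python's re module, whose syntax agrees with PCRE for these particular constructs; the matches helper and the sample file names are made up for the example, and Wilma's own engine may differ in details not shown here.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# Without anchors, a pattern may match anywhere inside the name.
print(matches(r"\d{2,4}", "report2023.txt"))      # True

# ^ and $ tie the match to the start and end of the word, name, or path.
print(matches(r"^report", "report2023.txt"))      # True
print(matches(r"^2023", "report2023.txt"))        # False
print(matches(r"\.txt$", "report2023.txt"))       # True

# Alternation inside a subexpression.
print(matches(r"\.(jpg|png|gif)$", "photo.png"))  # True

# \1 matches whatever the first pair of parentheses matched.
print(matches(r"(\w)\1", "letter"))               # True (the "tt")
print(matches(r"(\w)\1", "later"))                # False

# A negated set between anchors: a name containing no digits at all.
print(matches(r"^[^0-9]+$", "notes"))             # True
print(matches(r"^[^0-9]+$", "notes2"))            # False
```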