Regex Uses

Regular expressions, or regex for short, may quite possibly be the greatest thing since sliced bread (right up there next to instant coffee). They are a simple way to define a pattern of characters. Unfortunately, to the uninitiated (and even to some of the initiated), regex patterns look like gibberish. We will try to sort out some of the details here. This is only a basic introduction; for more comprehensive information, see the Internet Resources section below.

Text-Zap uses the new regex package (java.util.regex) that is built into Java 1.4. Some of the examples given here are specific to the Java regex package.

Simple Examples

Any string of characters can be a regular expression, but some characters are reserved and have special meanings. Here is a listing of some of these constructs:

Basic Regex Constructs

Construct    Meaning

x            The character x
\\           The backslash character
\uhhhh       The character with hexadecimal value 0xhhhh
\t           The tab character
\n           The newline (line feed) character. See the note under Text-Zap Specific Info.

Character classes
[abc]        'a', 'b', or 'c'
[^abc]       Any character except 'a', 'b', or 'c'
[a-zA-Z]     a through z or A through Z

Quantifiers
ab*          'a' followed by zero or more occurrences of 'b'
aa+          'a' followed by one or more further occurrences of 'a'
.*           Zero or more of any character

Predefined character classes
.            Any character (may or may not match line terminators)
\d           A digit
\D           A non-digit
\s           A whitespace character
\S           A non-whitespace character
\w           A word character (a letter, a digit, or _)
\W           A non-word character (anything but a letter, a digit, or _)

Boundary matchers
^            The beginning of a line
$            The end of a line
\b           A word boundary
\B           A non-word boundary
\A           The beginning of the input
\G           The end of the previous match
\Z           The end of the input but for the final terminator, if any
\z           The end of the input
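
To make a few of these constructs concrete, here is a minimal sketch that tries them with java.util.regex. The class and method names (BasicConstructsDemo, matchesAll) are invented for this illustration and are not part of Text-Zap. Note that in Java source code every backslash in a pattern has to be doubled, and that a quantifier such as * applies only to the element immediately before it, so parentheses are needed to repeat a longer sequence.

```java
import java.util.regex.Pattern;

class BasicConstructsDemo {
    // Returns true only if the ENTIRE input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The * binds to 'b' alone: one 'a', then any number of 'b'.
        System.out.println(matchesAll("ab*", "abbb"));          // true
        System.out.println(matchesAll("ab*", "abab"));          // false
        // Parentheses make the * repeat the whole sequence 'ab'.
        System.out.println(matchesAll("(ab)*", "abab"));        // true
        // Character class with ranges, repeated one or more times.
        System.out.println(matchesAll("[a-zA-Z]+", "TextZap")); // true
        // Predefined classes; the backslash is doubled in Java source.
        System.out.println(matchesAll("\\d+", "2004"));         // true
        System.out.println(matchesAll("\\w+", "line_1"));       // true
    }
}
```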

Capturing Groups

A regex can be divided by parentheses. These allow you to apply the * or + operators to whole sections of a pattern. More importantly, they break a pattern up into capturing groups. These groups can be referred back to within the pattern and within the replacement. For example, if you wish to search for a pattern and replace part of it with something else but leave the first part intact, you can capture the first part in a group and put it back into the replacement.

In a replacement, capturing groups are referred to by a '$' followed by the group's number. Groups are numbered by counting their opening parentheses from left to right. The entire pattern is referred to as '$0'. Within the pattern itself, the Java regex package refers back to a group with a backslash and the number instead (for example \1).

An example:
(\n)(\t)--((.)*)
This pattern breaks down into the following groups:

$1    (\n)                A new line. See the note under Text-Zap Specific Info.
$2    (\t)                A tab
$3    ((.)*)              Any and all characters
$4    (.)                 Any single character (after a match, the last character consumed by the *)
$0    (\n)(\t)--((.)*)    Search for a line that begins with a tab followed by two dashes '--' and any characters that follow.
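
As a rough illustration of putting groups back into a replacement, the sketch below runs the example pattern through java.util.regex directly. The class name, the method dashesToSlashes, and the choice of "// " as replacement text are all made up for this example; how Text-Zap itself receives the pattern and replacement is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupsDemo {
    // The example pattern; backslashes are doubled for Java source.
    static final Pattern DASH_LINE = Pattern.compile("(\\n)(\\t)--((.)*)");

    // Replaces the leading "--" with "// " while putting the newline ($1),
    // the tab ($2), and the rest of the line ($3) back unchanged.
    static String dashesToSlashes(String input) {
        return DASH_LINE.matcher(input).replaceAll("$1$2// $3");
    }

    public static void main(String[] args) {
        String input = "first\n\t--hello\nlast";
        Matcher m = DASH_LINE.matcher(input);
        if (m.find()) {
            System.out.println(m.groupCount()); // 4
            System.out.println(m.group(3));     // hello
            System.out.println(m.group(4));     // o (last character matched)
        }
        // The second line becomes a tab followed by "// hello".
        System.out.println(dashesToSlashes(input));
    }
}
```

Because '.' does not match line terminators by default, group 3 stops at the end of the line and the following line is left untouched.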

Complex Examples

Here are a few more complex examples using some of the concepts outlined above.

More Complex Regex Constructs

Regex                 Meaning

(.*)\n(.*)\n(.*)\n    Three lines containing anything. See the note under Text-Zap Specific Info.
([^-])( \{)           Any character other than '-' that appears before ' {'
\b[A-Z]+\b            Any word written entirely in capital letters
[\u0401-\u0491]       Any character in the Cyrillic range U+0401 through U+0491
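
The following sketch tries these four patterns against small sample strings using java.util.regex. The helper findAll and the sample inputs are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComplexExamplesDemo {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "Use" is skipped because only its first letter is a capital.
        System.out.println(findAll("\\b[A-Z]+\\b", "Use ANT or XML here")); // [ANT, XML]
        // The ')' is the non-dash character sitting before " {".
        System.out.println(findAll("([^-])( \\{)", "if (x) {"));            // [) {]
        // Two Cyrillic letters (U+0414, U+0430) after two Latin ones.
        System.out.println(findAll("[\\u0401-\\u0491]", "Da \u0414\u0430").size()); // 2
        // Four lines of input contain one non-overlapping run of three.
        System.out.println(findAll("(.*)\\n(.*)\\n(.*)\\n", "a\nb\nc\nd\n").size()); // 1
    }
}
```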

Text-Zap Specific Info

There are a few details about how Text-Zap gets and uses regexes.

Firstly, since Text-Zap is run from an Ant build XML file, the < and > characters cannot appear literally in a regex. The &lt; and &gt; entities must be substituted for them.

Also, since Text-Zap is multiplatform, it uses the system default for line terminators. Searching for '\n' is therefore not guaranteed to work on every platform. The recommended usage is ${line.separator}, which Ant translates into the system default.
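
To see why a hard-coded '\n' can fail, here is a small sketch in plain Java. It assumes the line separator ends up embedded in the pattern, which is roughly what Ant's ${line.separator} expansion would produce; the class name, the method spansLines, and the words "end" and "begin" are invented for this illustration. Pattern.quote is used for safety and requires Java 5 or later.

```java
import java.util.regex.Pattern;

class LineSeparatorDemo {
    // Looks for "end" at the end of one line and "begin" at the start of
    // the next, using the given separator between them.
    static boolean spansLines(String separator, String text) {
        Pattern p = Pattern.compile("end" + Pattern.quote(separator) + "begin");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String windowsText = "end\r\nbegin";
        // A bare line feed misses text that uses carriage return + line feed.
        System.out.println(spansLines("\n", windowsText));   // false
        System.out.println(spansLines("\r\n", windowsText)); // true

        // The separator of the platform this program is running on.
        String sep = System.getProperty("line.separator");
        System.out.println(spansLines(sep, "end" + sep + "begin")); // true
    }
}
```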

Internet Resources

Here are some Internet based resources for more information about regular expressions:
