| Subexp | typedict |
| |
A Subexp is used during regular expression matching to keep track of
matched portions of a String.
A Subexp is intended as a read-only structure.
In many cases, the use of the built-ins gsubsti or substi obviates the
need for accessing a Subexp.
The fields in a Subexp are:
| ranges |
An Array of Dictionary values.
The first array element describes the entire range of the target
that the regular expression matched.
Subsequent array entries describe the sub-ranges of any subexpressions
(see the Regexp documentation for more about subexpressions) in the
order that the subexpressions appear in the pattern.
Each dictionary has only two elements: sp and ep, which indicate the
starting and ending offsets, respectively, of the match within the
target string.
The starting offset is the zero-based index of the first matching
character, so an offset of zero means the first character of the target.
The ending offset is the index of the character immediately after the
last matching character.
In either case, a negative value indicates that there was no
corresponding match.
| | target |
A String giving the target text against which the regular expression
matching that generated this Subexp was performed.
The ranges apply to this target string.
|
Several permanent fields have not been documented and should not be
used in Yoix applications.
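
The offset convention described for ranges (whole match first, zero-based
sp, ep one past the last matched character, negative values for a
subexpression that did not match) can be illustrated outside Yoix.
The following is a minimal Python sketch, not Yoix code: it assumes
Python's re module as a stand-in for the Yoix matcher, and the helper
name ranges_of is hypothetical, introduced only to build a list shaped
like the ranges field.

```python
import re

def ranges_of(match):
    """Build a ranges-like list: one {"sp", "ep"} dict per group, whole match first.

    Hypothetical helper modelled on the field description above; it is not
    part of Yoix or of Python's re module.
    """
    return [
        {"sp": match.start(i), "ep": match.end(i)}
        for i in range(match.re.groups + 1)
    ]

pattern = re.compile(r"(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)")
target = "supports the important"
ranges = ranges_of(pattern.search(target))

# Element 0 spans the whole match. Because ep points one past the last
# matched character, slicing target[sp:ep] recovers the matched text.
whole = target[ranges[0]["sp"]:ranges[0]["ep"]]

# A subexpression that takes no part in the match gets negative offsets:
# here only the second alternative, (b), matches.
unmatched = ranges_of(re.search(r"(a)|(b)", "b"))
```

In this sketch the entries after the first follow the order in which the
parenthesized subexpressions appear in the pattern, which is the ordering
the ranges description specifies.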
| |
| Example: |
In many cases, a Subexp object is not needed explicitly.
It is provided for instances where particular information about a
pattern match is needed.
The three scripts shown here all accomplish exactly the same thing,
namely replacing the first lower-case instance of the word the in each
line of the Yoix home page HTML source with asterisks.
The first script uses regexec to try the pattern match and populate the
Subexp object when a successful match occurs.
The Subexp object is then used explicitly in two ways to make the
substitution: its ranges locate the parts of the target that did not
match, and it is passed as an argument to regsub, which performs the
substitution for the part that did match.
import yoix.*.*;

Regexp re;
Subexp se;
String line;
String result;
Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

// the word "the" bounded by non-alphanumerics or the ends of the line
re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    if (regexec(re, line, se)) {
        // text before the match + substituted match + text after the match
        result = substring(se.target, 0, se.ranges[0].sp) +
            regsub(#\1***\2#, se) +
            substring(se.target, se.ranges[0].ep);
        printf("Before: %s\nAfter: %s\n", line, result);
    }
}
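
The splice performed by the script above (unmatched prefix, substituted
match, unmatched suffix) can be modelled in a few lines.
This is a hedged Python sketch rather than Yoix: Python's re module stands
in for regexec and regsub, and the function name replace_first is
hypothetical.

```python
import re

pattern = re.compile(r"(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)")

def replace_first(line):
    """Replace the first lower-case 'the' in line with '***'.

    Hypothetical helper; returns line unchanged when nothing matches.
    """
    match = pattern.search(line)
    if match is None:
        return line
    sp, ep = match.span(0)  # offsets of the whole match, ep exclusive
    # Prefix before the match, the expanded replacement, suffix after it.
    return line[:sp] + match.expand(r"\1***\2") + line[ep:]

line = "The Yoix interpreter supports the important data types and"
result = replace_first(line)
# result == "The Yoix interpreter supports *** important data types and"
```

The replacement template keeps both subexpressions, so the characters
that delimit the word survive and only the three letters of the word
itself become asterisks.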
The next script uses regexec as above, but now hands off the Subexp
object to substi to adjust the text line as needed.
import yoix.*.*;

Regexp re;
Subexp se;
String line;
String result;
Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    if (regexec(re, line, se)) {
        // substi rebuilds the whole line from the populated Subexp
        result = substi(#\1***\2#, se);
        printf("Before: %s\nAfter: %s\n", line, result);
    }
}
The third script achieves the same result in the easiest way, without
explicit use of the Subexp object, by letting substi test the match and
perform the substitution as needed.
Note that substi returns the same target string as was passed to it when
no match occurs.
import yoix.*.*;

Regexp re;
String line;
String result;
Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    result = substi(#\1***\2#, re, line);
    // an unchanged result means the pattern did not match this line
    if (result != line)
        printf("Before: %s\nAfter: %s\n", line, result);
}
A sample of the output from any of these scripts might look like:
Before: The Yoix interpreter supports the important data types and
After: The Yoix interpreter supports *** important data types and
One last note: these examples use the sharp character (#) as the
quoting character in place of the normal double quote character (").
Sharp quoting tells the parser not to recognize octal-character escape
sequences (e.g., \012) and to treat a backslash in front of an
unrecognized escape simply as a backslash, instead of as an indication
that the next character should be taken literally.
So, as a convenience, #\1***\2# in the above examples is equivalent to
"\\1***\\2".
| | |
| See Also: |
gsubsti, gvsubsti, regexec, Regexp, regexp, regsub, substi, vsubsti
|
|
Yoix is a registered trademark of AT&T Intellectual Property.
|