AT&T Labs, Inc. - Research

The Yoix® Scripting Language

Subexp typedict
 
A Subexp records which portions of a String were matched during regular expression matching. It is intended to be a read-only structure. In many cases the built-ins gsubsti or substi make direct access to a Subexp unnecessary. The fields in a Subexp are:
ranges An Array of Dictionary values. The first element describes the entire range of the target that the regular expression matched. Subsequent elements describe the sub-ranges matched by any subexpressions (see the Regexp documentation for more about subexpressions), in the order in which the subexpressions appear in the pattern. Each dictionary has exactly two fields, sp and ep, which give the starting and ending offsets of the match within the target string. The starting offset is the index of the first matched character, where offset zero is the first character of the target. The ending offset is the index of the character just after the last matched character. In either field, a negative value indicates that there was no corresponding match.
target A String containing the text that was matched against the regular expression to produce this Subexp. The offsets in ranges refer to this string.
Several permanent fields have not been documented and should not be used in Yoix applications.
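Java programmers may find it helpful to model the layout of ranges with java.util.regex. That package is not the Yoix API, and its regular expression dialect is not guaranteed to match Yoix's. It does follow the same offset conventions: Matcher.start(int) is the index of the first matched character, Matcher.end(int) is one past the last matched character, and both return -1 for a group that did not take part in the match. The sketch below assumes that correspondence. The Range record and the rangesOf method are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubexpRanges {
    // Hypothetical stand-in for one Dictionary in Subexp.ranges.
    record Range(int sp, int ep) {}

    // Builds a ranges-like array: index 0 covers the whole match, followed by
    // one entry per parenthesized group in pattern order. Returns null when
    // the pattern matches nowhere in target.
    static Range[] rangesOf(Pattern pattern, String target) {
        Matcher m = pattern.matcher(target);
        if (!m.find()) {
            return null;
        }
        Range[] ranges = new Range[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            // start(i) and end(i) are -1 when group i did not participate.
            ranges[i] = new Range(m.start(i), m.end(i));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Only the second alternative can match "b", so group 1 stays unmatched.
        Range[] r = rangesOf(Pattern.compile("(a)|(b)"), "xxb");
        for (int i = 0; i < r.length; i++) {
            System.out.println(i + ": sp=" + r[i].sp() + " ep=" + r[i].ep());
        }
    }
}
```

Running main prints `0: sp=2 ep=3`, `1: sp=-1 ep=-1`, and `2: sp=2 ep=3`. The negative pair for group 1 plays the same role as the negative offsets described for ranges above.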
 
 Example:   In many cases a Subexp object is not needed explicitly. It is provided for situations that call for specific information about a pattern match. The three scripts shown here do exactly the same thing: in each line of the HTML source of the Yoix home page, they replace the first lower-case occurrence of the word the with asterisks. The first script uses regexec to attempt the match and, when the match succeeds, to fill in the Subexp object. The script then uses the Subexp explicitly to build the result. It extracts the unmatched parts of the target with substring, and it passes the Subexp to regsub, which produces the replacement text for the part that matched.
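import yoix.*.*;

Regexp re;
Subexp se;
String line;
String result;

// Read the HTML source of the Yoix home page.
Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

// Match "the" when it is preceded and followed by a non-alphanumeric
// character or by the start or end of the line.
re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    if (regexec(re, line, se)) {
        // Text before the match + substituted match + text after the match.
        result = substring(se.target,0,se.ranges[0].sp) +
            regsub(#\1***\2#, se) +
            substring(se.target,se.ranges[0].ep);
        printf("Before: %s\nAfter:  %s\n", line, result);
    }
}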
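Here is the same explicit reconstruction in Java, as an analogy rather than the Yoix API. It builds the unmatched prefix, then the expanded replacement, then the unmatched suffix. The helpers expand and substituteFirst are hypothetical. expand is a stand-in for regsub that handles only single-digit references such as \1. It substitutes an empty string for a group that did not participate, which is a choice made in this sketch and not documented Yoix behavior. Note that the Java literal "\\1***\\2" needs the same doubled backslashes as the double-quoted Yoix form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplicitSubstitution {
    // Hypothetical regsub-like helper: copies template, replacing each \N
    // (backslash plus one ASCII digit) with the text matched by group N.
    // Unknown or non-participating groups contribute an empty string.
    static String expand(String template, Matcher m) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            char next = i + 1 < template.length() ? template.charAt(i + 1) : 0;
            if (c == '\\' && next >= '0' && next <= '9') {
                int group = next - '0';
                if (group <= m.groupCount() && m.start(group) >= 0) {
                    out.append(m.group(group));
                }
                i++; // skip the digit that followed the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Mirrors the first script: prefix before the match + expanded match +
    // suffix after the match. Returns null when there is no match.
    static String substituteFirst(Pattern p, String template, String target) {
        Matcher m = p.matcher(target);
        if (!m.find()) {
            return null;
        }
        return target.substring(0, m.start())
            + expand(template, m)
            + target.substring(m.end());
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)");
        String line = "The Yoix interpreter supports the important data types and";
        String result = substituteFirst(p, "\\1***\\2", line);
        if (result != null) {
            System.out.printf("Before: %s%nAfter:  %s%n", line, result);
        }
    }
}
```

The capitalized The at the start of the sample line is not matched, because the pattern is case-sensitive. Only the lower-case the after supports is replaced.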
The second script also uses regexec, but it then passes the Subexp object to substi, which builds the complete modified line from the recorded match.
import yoix.*.*;

Regexp re;
Subexp se;
String line;
String result;

Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    if (regexec(re, line, se)) {
        // substi uses the match recorded in se to build the whole new line.
        result = substi(#\1***\2#, se);
        printf("Before: %s\nAfter:  %s\n", line, result);
    }
}
The third script is the simplest. It never uses a Subexp explicitly. Instead, it lets substi both test for the match and perform the substitution. When there is no match, substi returns the target string unchanged, so comparing result with line shows whether a substitution was made.
import yoix.*.*;

Regexp re;
String line;
String result;

Stream page = open("http://www.research.att.com/sw/tools/yoix/", "r");

re.pattern = "(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)";

while (line = page.nextline) {
    // substi returns line unchanged when the pattern does not match.
    result = substi(#\1***\2#, re, line);
    if (result != line)
        printf("Before: %s\nAfter:  %s\n", line, result);
}
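Java's Matcher.replaceFirst provides a close analogue of the third script. It returns the input text unchanged when the pattern does not match. This is a sketch, not the Yoix API, and there are two differences worth noting. First, Java replacement strings refer to groups as $1 and $2, not \1 and \2, and they treat a backslash as an escape character. Second, Java's != compares object references, so the sketch uses equals() to compare string contents. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ImplicitSubstitution {
    // Same pattern as the Yoix scripts; it is also valid Java regex syntax.
    static final Pattern THE = Pattern.compile("(^|[^a-zA-Z0-9])the([^a-zA-Z0-9]|$)");

    // Replaces the first match, keeping the surrounding characters captured
    // by groups 1 and 2, or returns the line's text unchanged if nothing matches.
    static String starFirstThe(String line) {
        return THE.matcher(line).replaceFirst("$1***$2");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "The Yoix interpreter supports the important data types and",
            "No match on this line");
        for (String line : lines) {
            String result = starFirstThe(line);
            // Compare contents with equals(); != would compare references.
            if (!result.equals(line)) {
                System.out.printf("Before: %s%nAfter:  %s%n", line, result);
            }
        }
    }
}
```

As in the Yoix version, only the first occurrence on each line is replaced, and lines without a match produce no output.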
A sample of the output from any of these scripts might look like:
Before: The Yoix interpreter supports the important data types and
After:  The Yoix interpreter supports *** important data types and
One last note: these examples delimit the replacement string with sharp characters (#) instead of the usual double quote character ("). In a #-quoted string, the parser does not recognize octal escape sequences (e.g., \012). A backslash that does not begin a recognized escape sequence is kept as a literal backslash, instead of causing the next character to be taken literally. So, in the examples above,
#\1***\2#
is equivalent to:
"\\1***\\2"
The #-quoted form is simply a convenience that avoids doubling each backslash.
 
 See Also:   gsubsti, gvsubsti, regexec, Regexp, regexp, regsub, substi, vsubsti

 

Yoix is a registered trademark of AT&T Intellectual Property.