LY_BEGIN(lex, klex, lexer, MyLangLex, Lexer,l) A _IT(lexer specification) is a file with the _TT(.l) suffix which specifies regular expressions to be converted to tokens. The syntax differs from that of a UNIX _TT(.l) file in that each regular expression is associated with an _IT(expression method), which must be given a name:

_TRP(%expr { bare_STRING RegExp1 quoted_STRING RegExp2 }) define the expression methods named _TT(bare_STRING) and _TT(quoted_STRING), which will be called whenever the respective expressions _TT(RegExp1) and _TT(RegExp2) are matched. See _LN(#defmeth, default method construction) below. _TRE _TR(METHOD1 RegExp1) same as above, i.e. the _TT(%expr{}) is optional. _TRE _TRP(%macro { MACRO1 RegExp1 MACRO2 RegExp2 }) define _TT({MACRO1}) to stand for _TT(RegExp1), and define _TT({MACRO2}) to stand for _TT(RegExp2). _TRE _TR(%macro MACRO RegExp) alternate syntax, same meaning as above. _TRE

_A(defmeth)_H(default method construction) The longest suffix of the method name matching a token name causes that token type to be returned by default. For example, if there is a token named _TT(STRING), then any methods named _TT(bare_STRING) or _TT(quoted_STRING) will by default be assigned a procedure which returns a new token of type _TT(STRING).

The default method named _TT(char) returns a constant token whose value equals the character code of the first matched character. If a method name is not _TT(char) and does not have a suffix matching a token name, the default method returns _TT(NIL) (instructing the lexer to skip the token) and a warning is printed. The warning is not printed, however, if the method is named _TT(skip); in that case skipping is assumed to be the desired default behavior.

In addition it is customary to define a token named _TT(ERROR), which does not ordinarily match any grammar rules. Thus a lexer specification will typically end with the following 3 lines:

char        {%char}
skip        [ \t]*
ERROR       [^]
The behavior of any default method can be changed by overriding the method, for example using _EXT. _H(regular expressions) _BF(klex) supports the following subset of _LN(#bib,flex) regular expressions, in order of decreasing precedence:

#define _RS _TT(r)'s #define _R _TT(r) _TR(x) match the character _Q(x) _TRE _TR(.) any character (byte) except newline _TRE _TR([xyz]) a "character class"; in this case, the pattern matches either an _Q(x), a _Q(y), or a _Q(z) _TRE _TR([abj-oZ]) a "character class" with a range in it; matches an _Q(a), a _Q(b), any letter from _Q(j) through _Q(o), or a _Q(Z) _TRE _TR([^A-Z]) a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter. _TRE _TR([^A-Z\n]) any character EXCEPT an uppercase letter or a newline _TRE _TR(r*) zero or more _RS, where _R is any regular expression _TRE _TR(r+) one or more _RS _TRE _TR(r?) zero or one _RS (that is, an optional _R) _TRE _TR2(r{2,5}) anywhere from two to five _RS _TRE _TR2(r{2,}) two or more _RS _TRE _TR(r{4}) exactly 4 _RS _TRE _TR({NAME}) the expansion of macro _TT(NAME) _TRE _TR({%char}) the _TT(%char) macro expands to the class of characters which were declared _TT(%char) in the _TI. _TRE _TR("[xyz]\"foo") the literal string: [xyz]"foo_TRE _TR(\X) if _TT(X) is an _Q(n) or _Q(t), then the ANSI-C interpretation of _TT(\x). Otherwise, a literal _Q(X) (used to escape operators such as _Q(*)) _TRE _TR(\123) the character with octal value 123 _TRE _TR((r)) match an r; parentheses are used to override precedence _TRE _TR(rs) the regular expression r followed by the regular expression s; called "concatenation" _TRE _TR(r|s) either an r or an s _TRE
_A(intf)_H(lexer interface) A _IT(lexer interface) is a Modula-3 interface defining a type _TT(T) that can be passed as an argument to a parser initialization method (or extended using _EXT), and is itself generated from a _LN(#spec, lexer specification).

The type _TT(T) representing a lexer is declared as an opaque subtype of the _LN(ktok.html#RdLexer,_TT(RdLexer)) instantiated in the token interface. Hence the following uses are possible:

_TT(myLexer := NEW(MyLangLex.T).setRd(rd);)
Initialize the new lexer _TT(myLexer) using the reader _TT(rd: Rd.T).
 
_TT(start := myParser.parse(NEW(MylangLex.T).fromText(text));)
Parse _TT(text: TEXT), using a new lexer and _TT(myParser). The interface which was used to initialize _TT(myParser) must be _LN(ktok.html#compat,compatible) with _TT(MyLangLex.i3).

There is no method named _TT(init), to allow customized initialization parameters in extended lexers. _A(bib)_H(see also) M. E. Lesk and E. Schmidt, _IT(LEX - Lexical Analyzer Generator)

Vern Paxson et. al., _IT(flex - fast lexical analyzer generator)

A. Aho, R. Sethi and J. Ullman, Compilers: Principles, Techniques and Tools PL_END $Id: klex.html,v 1.2 2001-09-19 15:31:35 wagner Exp $ HTML_END