PL_BEGIN(ktok: defining the token)

_A(compat) A lexer and parser are _IT(compatible) if they import the same token interface. A _IT(token interface) and its implementation are generated _LN(m3build.html,automatically) by m3build by running the command
_C(_TT(tok MyLang.t [ -o MyLangTok.i3 ]))
where _TT(MyLang.t) is a _IT(token specification), and _TT(MyLangTok.i3) is the generated token interface.

_A(spec)_H(token specification)
A _IT(token specification) is a file with the _TT(.t) suffix that specifies which tokens will be passed from a lexer to a parser. Each line of the file must have one of the following forms:
By convention, tokens are written in _TT(ALL CAPS) so as not to be confused with other _TT(ParseType)s (see below).

_A(intf)_H(token interface)
A token interface is a Modula-3 interface that can be imported by generated lexers and parsers (or extended using _EXT), and is itself generated from a _LN(#spec, token specification).
The token interface defines _TT(BRANDED OBJECT) types with the following subtype relationship: _C(_TT(Token <: ParseType)) In addition, each token declared nonconstant in the token specification becomes a subtype of _TT(Token).
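The subtype relationship above can be illustrated with a small standalone sketch. It does not reproduce the output of the tool: the three types are declared locally as stand-ins for what a generated token interface provides, and the brand strings, the token name _TT(ID) and its _TT(name) field are hypothetical, chosen only for illustration. The sketch shows how code that receives a _TT(ParseType) can use the standard _TT(TYPECASE) statement to tell a nonconstant token from other tokens and from non-token values.

```modula3
MODULE Main;

IMPORT IO;

TYPE
  (* Local stand-ins for the types of a generated token interface.
     Brand strings are hypothetical. *)
  ParseType = BRANDED "Demo.ParseType" OBJECT END;
  Token     = ParseType BRANDED "Demo.Token" OBJECT END;

  (* Hypothetical nonconstant token; the "name" field is an assumption
     made for this example, not something the tool is known to emit. *)
  ID = Token BRANDED "Demo.ID" OBJECT
    name: TEXT;
  END;

(* Classify a value by its runtime type. The most specific type is
   listed first because TYPECASE takes the first matching arm. *)
PROCEDURE Describe(p: ParseType): TEXT =
  BEGIN
    TYPECASE p OF
    | NULL   => RETURN "NIL";
    | ID(id) => RETURN "nonconstant token ID: " & id.name;
    | Token  => RETURN "some other token";
    ELSE
      RETURN "a ParseType that is not a token";
    END;
  END Describe;

BEGIN
  IO.Put(Describe(NEW(ID, name := "x")) & "\n");
  IO.Put(Describe(NEW(Token)) & "\n");
  IO.Put(Describe(NEW(ParseType)) & "\n");
END Main.
```

Because every token is also a _TT(ParseType), a procedure declared on _TT(ParseType) accepts tokens and parser results alike, and narrows only where it needs token-specific fields.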
If a generated parser imports the token interface, then all arguments and return types of parser reduction methods are subtypes of _TT(ParseType). If a generated lexer imports the token interface, then each expression method returns a _TT(Token).
Any lexer (generated or handwritten) must be a subtype of the _TT(Lexer) type defined in the token interface. _TT(Lexer) is defined generically as follows:
Lexer = OBJECT METHODS
  get(): Token RAISES {Rd.EndOfFile};
  (* get next token, or raise Rd.EndOfFile if token cannot be formed
     from remaining input *)

  unget();
  (* will be called at most once after get(), and only when lookahead is
     required after last token when parsing without exhausting input *)

  error(message: TEXT);
  (* might print file name, line number, and message, and exit *)
END;

_A(RdLexer) A lexer generated by _LN(klex.html,klex) will more specifically be an _TT(RdLexer), which provides the following additional methods:
RdLexer = Lexer OBJECT METHODS
  setRd(rd: Rd.T): RdLexer;
  (* Prepare to read tokens starting at cur(rd).
     After every token, rd is repositioned after that token. *)

  getRd(): Rd.T;
  (* get reader *)

  fromText(t: TEXT): RdLexer;
  (* Calls setRd with a textReader. *)

  rewind();
  (* equivalent to Rd.Seek(rd, 0) followed by setRd *)

  getText(): TEXT;
  (* get TEXT of last token *)

  purge(): INTEGER;
  (* Allow any internally allocated ParseTypes to be garbage collected,
     even if the lexer itself remains in scope. Return number of ParseType
     objects allocated but not discarded (not the number of purged objects).
     Can be called at any time by the thread calling get. *)
END;

PL_END
$Id: ktok.html,v 1.2 2001-09-19 15:31:35 wagner Exp $
HTML_END