PL_BEGIN(ktok: defining the token)

_A(compat) A lexer and a parser are _IT(compatible) if they import the same token interface. A _IT(token interface) and its implementation are generated _LN(m3build.html,automatically) by m3build by running the command _C(_TT(tok MyLang.t [ -o MyLangTok.i3 ].)) where _TT(MyLang.t) is a _IT(token specification), and _TT(MyLangTok.i3) is the generated token interface.

_A(spec)_H(token specification) A _IT(token specification) is a file with the _TT(.t) suffix that specifies the tokens passed from a lexer to a parser. Each line of the file must have one of the following forms:

_TR(TOKEN1 TOKEN2 ...) The listed tokens are extensible types that a lexer can return. The list is optionally preceded by _TT(%token). TR_
_A(const)_TR(%const TOKEN1 TOKEN2 ...) The listed tokens can be returned by a lexer, but they cannot be extended to carry a value. TR_
_TR(%char [chars]) Each character in the lex-style set enclosed in _TT([]) can be returned by a lexer and behaves like a _TT(%const) token. TR_
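
As an illustration, a small specification for a hypothetical expression language might look like the following sketch. The token names and the character set are invented for this example and are not taken from any shipped grammar.

  ID NUMBER
  %token STRING
  %const IF THEN ELSE
  %char [+*/=;]

Here _TT(ID), _TT(NUMBER) and _TT(STRING) can be extended to carry a value, while _TT(IF), _TT(THEN), _TT(ELSE) and the single-character tokens cannot.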

By convention, tokens are written in _TT(ALL CAPS) so as not to be confused with other _TT(ParseType)s (see below).

_A(intf)_H(token interface) A token interface is a Modula-3 interface that can be imported by generated lexers and parsers (or extended using _EXT), and is itself generated from a _LN(#spec, token specification).

The token interface defines _TT(BRANDED OBJECT) types with the following subtype relationship: _C(_TT(Token <: ParseType)) In addition, each token declared nonconstant in the token specification becomes a subtype of _TT(Token).

If a generated parser imports the token interface, then all arguments and return types of parser reduction methods are subtypes of _TT(ParseType). If a generated lexer imports the token interface, then each expression method returns a _TT(Token).

Any lexer (generated or handwritten) must be a subtype of the _TT(Lexer) type, which the token interface defines generically as follows:

  Lexer = OBJECT METHODS
    get(): Token RAISES {Rd.EndOfFile};
    (* get next token, or raise Rd.EndOfFile if token cannot be formed
       from remaining input *)

    unget();
    (* will be called at most once after get(), and only when lookahead is
       required after last token when parsing without exhausting input *)

    error(message: TEXT);
    (* might print file name, line number, and message, and exit *)
  END;

_A(RdLexer) A lexer generated by _LN(klex.html,klex) is more specifically an _TT(RdLexer), which provides the following additional methods:

  RdLexer = Lexer OBJECT METHODS
    setRd(rd: Rd.T): RdLexer;
    (* Prepare to read tokens starting at cur(rd).
       After every token, rd is repositioned after that token. *)

    getRd(): Rd.T;
    (* get reader  *)

    fromText(t: TEXT): RdLexer;
    (* Calls setRd with a textReader. *)

    rewind();
    (* equivalent to Rd.Seek(rd, 0) followed by setRd *)

    getText(): TEXT;
    (* get TEXT of last token *)

    purge(): INTEGER;
    (* Allow any internally allocated ParseTypes to be garbage collected,
       even if the lexer itself remains in scope. Return number of ParseType
       objects allocated but not discarded (not the number of purged objects).
       Can be called at any time by the thread calling get. *)
  END;
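
A typical way to drive an _TT(RdLexer) is sketched below. The sketch assumes a token interface named _TT(MyLangTok) and a klex-generated lexer type _TT(MyLangLex.T); both names are hypothetical and depend on how the lexer was generated. Only the methods listed above are used.

  MODULE Main;
  (* Sketch only: MyLangTok and MyLangLex.T are assumed names for the
     generated token interface and the generated lexer type. *)
  IMPORT IO, Rd, MyLangTok, MyLangLex;

  VAR
    lex: MyLangTok.RdLexer := NEW(MyLangLex.T).fromText("x = y + 1;");
    tok: MyLangTok.Token;
  BEGIN
    TRY
      LOOP
        tok := lex.get();               (* next token from the input *)
        IO.Put(lex.getText() & "\n");   (* text of the token just read *)
      END;
    EXCEPT
    | Rd.EndOfFile => (* no further token can be formed *)
    END;
    EVAL lex.purge();  (* let internally allocated ParseTypes be collected *)
  END Main.

The loop ends when _TT(get) raises _TT(Rd.EndOfFile). Calling _TT(purge) afterwards is optional and is done here from the same thread that called _TT(get).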
PL_END $Id: ktok.html,v 1.2 2001-09-19 15:31:35 wagner Exp $ HTML_END