INTERFACE SGMLCScanner;

This interface defines additional fields and methods for the Scanner object defined in the generated SGMLC interface. Since SGML does not have a context-free syntax, the scanner uses a current state to guide token scanning. Furthermore, a state stack is maintained to properly handle nested constructs.

Another annoying feature of SGML is parameter entities, which must be replaced before parsing can be attempted. Thus, the scanner recognizes parameter entities and obtains their replacement text from a specified entity resolver. These replacement texts, and external files such as DTDs, are treated as nested files and scanned until the input file stack is exhausted.
IMPORT SGMLC, Rd, RefSeq;

REVEAL
  SGMLC.Scanner <: PublicScanner;

TYPE
  PublicScanner = SGMLC.PublicS OBJECT
      input: Input;
      inputStack: RefSeq.T;
    METHODS
      initSimple(e: SGMLC.ErrHandler): SGMLC.Scanner;
      setEntityResolver(r: EntityResolver);
      pushState(s: State);
      popState();
      pushFile(name: TEXT; rd: Rd.T);
      pushNextFile(name: TEXT; rd: Rd.T);
      inMarkupDecl(b: BOOLEAN);
    END;

Input stores the state for the current file or replacement text, while a stack of input states is maintained in case of nested includes or parameter entities.
The call s.initSimple(e) initializes s with e as error handler.
The call s.setEntityResolver(r) sets r as the entity resolver from which the replacement text of parameter entities is obtained.
The call sc.pushState(s) pushes s as the new state for the non-context-free scanning required by SGML.
The call s.popState() returns to the previous scanning state.
The call s.pushFile(name,rd) inserts rd as a new file named name to read from. Once rd is exhausted, scanning resumes where it left off.
The call s.pushNextFile(name,rd) adds rd as a file named name to read from immediately after the current input file is exhausted, before resuming with the next input file on the stack.
The call s.inMarkupDecl(b) tells the scanner, according to b, whether markup declarations are about to be processed.
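The calls described above might be combined as in the following sketch. It is a hypothetical client fragment, not code taken from the library: the procedure names OpenScanner and WithState are invented for illustration, and the sketch assumes that clients may allocate SGMLC.Scanner with NEW and that input is supplied through pushFile after initSimple.

```modula3
IMPORT Rd, SGMLC, SGMLCScanner;

(* Hypothetical helper: create a scanner that reads from rd and asks r
   for the replacement text of parameter entities.  Assumes initSimple
   returns the initialized scanner, as its signature suggests. *)
PROCEDURE OpenScanner(e: SGMLC.ErrHandler;
                      r: SGMLCScanner.EntityResolver;
                      name: TEXT;
                      rd: Rd.T): SGMLC.Scanner =
  VAR s := NEW(SGMLC.Scanner).initSimple(e);
  BEGIN
    s.setEntityResolver(r);
    s.pushFile(name, rd);
    RETURN s;
  END OpenScanner;

(* Hypothetical helper: run body with st as the scanning state and
   restore the previous state afterwards, even if body raises. *)
PROCEDURE WithState(s: SGMLC.Scanner;
                    st: SGMLCScanner.State;
                    body: PROCEDURE () RAISES ANY) RAISES ANY =
  BEGIN
    s.pushState(st);
    TRY
      body();
    FINALLY
      s.popState();
    END;
  END WithState;
```

Pairing every pushState with a popState in a FINALLY clause is one way to keep the state stack balanced; whether the generated parser needs this on error paths is not stated in the interface.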
  EntityResolver = OBJECT
    METHODS
      resolve(name: TEXT): Rd.T;
    END;

The entity resolver object decouples the scanner from the higher-level construct (usually the parser) that finds and stores parameter entities and their corresponding replacement text.
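As an illustration of this decoupling, a resolver could be backed by a simple name-to-text table. The sketch below is an assumption-laden example rather than the parser's actual resolver: the type TableResolver and the procedure NewTableResolver are hypothetical, the table is the standard TextTextTbl from libm3, and returning NIL for an unknown entity is a guess, since the interface does not say how a missing entity should be reported.

```modula3
IMPORT Rd, TextRd, TextTextTbl, SGMLCScanner;

TYPE
  (* Hypothetical resolver: maps entity names to replacement text. *)
  TableResolver = SGMLCScanner.EntityResolver OBJECT
      entities: TextTextTbl.T;
    OVERRIDES
      resolve := Resolve;
    END;

PROCEDURE Resolve(self: TableResolver; name: TEXT): Rd.T =
  VAR value: TEXT;
  BEGIN
    IF self.entities.get(name, value) THEN
      (* Wrap the replacement text in a reader so that the scanner can
         treat it like any other nested input. *)
      RETURN TextRd.New(value);
    ELSE
      (* Assumption: NIL stands for "no such entity". *)
      RETURN NIL;
    END;
  END Resolve;

PROCEDURE NewTableResolver(entities: TextTextTbl.T): TableResolver =
  BEGIN
    RETURN NEW(TableResolver, entities := entities);
  END NewTableResolver;
```

A resolver built this way would be handed to the scanner with s.setEntityResolver(NewTableResolver(tbl)), where tbl is filled as entity declarations are parsed.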
  State = { AttValue, EntityValue, PCData, ContentCSect, StartCSect,
            ElementTag, DocType, Element, AttList, Entity, Notation,
            Catalog };

Scanning differs slightly in each of these possible states.
  Input = REF RECORD
      offset: CARDINAL := 0;
      currentLine: CARDINAL := 1;
      currentCol: CARDINAL := 0;
      rd: Rd.T := NIL;
      name: TEXT := "";
    END;

The input structure stores the information needed to determine the exact position in a file, typically for error reporting.
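For instance, an error handler might turn an Input record into a position prefix for its messages. The following is a minimal sketch; the procedure name Position and the name:line:column layout are illustrative choices, not something the interface prescribes.

```modula3
IMPORT Fmt, SGMLCScanner;

(* Hypothetical helper: format the current position of in as
   "name:line:column" for use in error messages. *)
PROCEDURE Position(in: SGMLCScanner.Input): TEXT =
  BEGIN
    RETURN in.name & ":" & Fmt.Int(in.currentLine)
             & ":" & Fmt.Int(in.currentCol);
  END Position;
```

With the scanner s from this interface, Position(s.input) would describe the file or replacement text currently being read.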
END SGMLCScanner.