A Parser object processes the specified SGML files and calls methods on a user-defined Application object for each significant parsing event. The user-defined Application object overrides these methods to react appropriately to the events (e.g. print back a modified SGML file, construct an abstract syntax tree...).
INTERFACE SGML;

IMPORT Rd;

TYPE
  ParserOptions = RECORD
    showOpenEntities, showOpenElements, outputCommentDecls,
    outputMarkedSections, outputGeneralEntities,
    mapCatalogDocument: BOOLEAN := FALSE;
    defaultDoctype: TEXT;
    addCatalog, includeParam, enableWarning, addSearchDir,
    activateLink, architecture: REF ARRAY OF TEXT := NIL;
  END;

These options define the behavior of the parser.
The showOpenEntities and showOpenElements options add information about the open entities and elements to the parsing error messages. The outputCommentDecls, outputMarkedSections and outputGeneralEntities options determine whether the parser produces events when these SGML constructs are encountered. The defaultDoctype option specifies the document type definition to use when no DOCTYPE declaration is found. The addCatalog option adds the specified file names as SGML DTD catalogs. The includeParam option defines the specified names as parameter entities set to INCLUDE (ENTITY % param INCLUDE); this way, sections of SGML files which are IGNORE by default may be changed to INCLUDE just by setting this option. The enableWarning option enables the named warnings:

  mixed: mixed content model which does not allow #PCDATA
  sgmldecl: dubious constructs in SGML declarations
  should: ISO 8879 recommendations that are not followed
  default: defaulted references
  duplicate: duplicate entity declarations
  undefined: undefined elements used in the DTD
  unclosed: unclosed start and end tags
  empty: empty start and end tags
  net: net-enabling start and end tags
  min-tag: minimized start and end tags (equivalent to unclosed, empty and net)
  unused-map: defined but unused short reference maps
  unused-param: defined but unused parameter entities
  notation-sysid: notation for which no system identifier could be generated
  all: equivalent to all the above
  no-idref: do not warn about unresolved references
  no-significant: do not warn about non-significant characters in literals
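
As an illustration of how these options might be filled in, the sketch below builds a ParserOptions value with a record constructor. The module name, the helper Texts, the procedure MakeOptions, the catalog file name "my-catalog" and the parameter entity name "draft" are placeholders chosen for the example; they are not part of this interface.

```modula3
MODULE OptionsExample EXPORTS Main;

IMPORT SGML;

(* Hypothetical helper: copy an array of texts into a freshly allocated
   REF ARRAY OF TEXT, the type used by the list-valued options. *)
PROCEDURE Texts(READONLY a: ARRAY OF TEXT): REF ARRAY OF TEXT =
  VAR r := NEW(REF ARRAY OF TEXT, NUMBER(a));
  BEGIN
    r^ := a;
    RETURN r;
  END Texts;

(* Hypothetical procedure: return a set of options for the example. *)
PROCEDURE MakeOptions(): SGML.ParserOptions =
  VAR
    (* defaultDoctype has no default in the record declaration, so it is
       bound explicitly; the other fields take their declared defaults. *)
    o := SGML.ParserOptions{defaultDoctype := NIL};
  BEGIN
    (* Describe the open elements in error messages. *)
    o.showOpenElements := TRUE;

    (* Ask for events on comment declarations. *)
    o.outputCommentDecls := TRUE;

    (* "my-catalog" is a placeholder catalog file name. *)
    o.addCatalog := Texts(ARRAY OF TEXT{"my-catalog"});

    (* Turn sections guarded by the (placeholder) parameter entity
       "draft" into INCLUDE sections. *)
    o.includeParam := Texts(ARRAY OF TEXT{"draft"});

    (* Enable two of the named warnings listed above. *)
    o.enableWarning := Texts(ARRAY OF TEXT{"duplicate", "undefined"});

    RETURN o;
  END MakeOptions;

BEGIN
  EVAL MakeOptions();
END OptionsExample.
```

The record constructor is used rather than an uninitialized variable so that the field defaults declared in ParserOptions are applied explicitly.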
  Parser <: ParserPublic;

  ParserPublic = OBJECT
  METHODS
    init(options: ParserOptions; programName: TEXT;
         files: REF ARRAY OF TEXT;
         rds: REF ARRAY OF Rd.T := NIL): Parser;
    run(a: Application): CARDINAL;
    halt();
    inhibitMessages(inhibit: BOOLEAN);
    subdocumentParser(systemId: TEXT): Parser;
    newParser(files: REF ARRAY OF TEXT;
              rds: REF ARRAY OF Rd.T := NIL): Parser;
  END;

The call p.init(o, n, f, r) initializes a parser with options o, program name n (used in error messages), and f the array of names of files to be parsed. When r is not specified, the files named in f are opened. Otherwise, f is used for file names but the actual input is taken from the readers in r.

The call p.run(a) starts parsing the files and calls back the Application object a for each parsing event. It returns the number of errors encountered once the parsing is through.

The call p.halt() stops the parsing, causing the run method to return. It is usually called from one of the Application object methods.

The call p.inhibitMessages(b) disables error and warning messages when b is TRUE.

The call p.subdocumentParser(s) creates a new parser ready to process s, which identifies a subdocument in the context of the file currently parsed by p.

The call p.newParser(f, r) returns a parser using the same options as p but ready to process a new set of files defined by f and r. Since the options are the same (catalog name, search paths...), caching of parsed document type definitions may occur for a significant speedup.
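
The calls above can be combined roughly as follows. This is only a sketch: the module name Validate is made up, the file names are taken from the command line, and it assumes that an Application whose methods are not overridden may be passed to run (the interface does not state what the default methods do).

```modula3
MODULE Validate EXPORTS Main;

IMPORT SGML, IO, Fmt, Params;

VAR
  options := SGML.ParserOptions{defaultDoctype := NIL};
  files: REF ARRAY OF TEXT;
  parser: SGML.Parser;
  nbErrors: CARDINAL;

BEGIN
  IF Params.Count < 2 THEN
    IO.Put("usage: " & Params.Get(0) & " file...\n");
  ELSE
    (* Collect the file names given on the command line. *)
    files := NEW(REF ARRAY OF TEXT, Params.Count - 1);
    FOR i := 1 TO Params.Count - 1 DO
      files[i - 1] := Params.Get(i);
    END;

    (* No readers are supplied (rds defaults to NIL), so the parser
       opens the named files itself. *)
    parser := NEW(SGML.Parser).init(options, Params.Get(0), files);

    (* Assumption: a plain Application with no overridden method is
       acceptable here; run returns the number of errors. *)
    nbErrors := parser.run(NEW(SGML.Application).init());

    IO.Put(Fmt.Int(nbErrors) & " error(s)\n");
  END;
END Validate.
```

To process a second batch of files with the same options, parser.newParser(otherFiles) could be used in place of a second init, which is where the caching mentioned above may help.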
  Application <: ApplicationPublic;

  ApplicationPublic = OBJECT
  METHODS
    init(): Application;
    appInfo(READONLY e: AppinfoEvent);
    startDtd(READONLY e: StartDtdEvent);
    endDtd(READONLY e: EndDtdEvent);
    endProlog(READONLY e: EndPrologEvent);
    startElement(READONLY e: StartElementEvent);
    endElement(READONLY e: EndElementEvent);
    data(READONLY e: DataEvent);
    sdata(READONLY e: SdataEvent);
    pi(READONLY e: PiEvent);
    externalDataEntityRef(READONLY e: ExternalDataEntityRefEvent);
    subdocEntityRef(READONLY e: SubdocEntityRefEvent);
    nonSgmlChar(READONLY e: NonSgmlCharEvent);
    commentDecl(READONLY e: CommentDeclEvent);
    markedSectionStart(READONLY e: MarkedSectionStartEvent);
    markedSectionEnd(READONLY e: MarkedSectionEndEvent);
    ignoredChars(READONLY e: IgnoredCharsEvent);
    generalEntity(READONLY e: GeneralEntityEvent);
    error(READONLY e: ErrorEvent);
    openEntityChange();
    getDetailedLocation(pos: Position): DetailedLocation;
  END;

An instance of the Application type, or of one of its descendant types, is passed to a Parser and receives the parsing information through method callbacks. Each of these methods receives a corresponding parsing event structure.

The call a.init() initializes a before it is used for parsing.

The call a.getDetailedLocation(pos) returns detailed information about the location of pos within the currently parsed entity. It may only be called from within one of the other methods.
The other methods are called by the Parser and are overridden in Application type descendants to perform the desired work. Each one is called as follows:

  appInfo: when the APPINFO section of the SGML declaration is encountered
  startDtd: upon encountering the Document Type Definition (DTD)
  endDtd: at the end of the DTD
  endProlog: at the end of the prolog (local markup declarations)
  startElement: when a start element tag is found
  endElement: for a real or implied end element tag
  data: for character data (CDATA) within elements or marked sections
  sdata: for special character data (SDATA like bitmap images)
  pi: for a processing instruction
  externalDataEntityRef: for a reference to an external data entity
  subdocEntityRef: for a reference to a subdoc entity
  nonSgmlChar: for non SGML conforming characters
  commentDecl: for a sequence of comments
  markedSectionStart: at the beginning of a marked section
  markedSectionEnd: at the end of a marked section
  ignoredChars: for character data within an IGNORE marked section
  generalEntity: for a general entity definition (this occurs within the prolog, except for undefined entities which, when referenced, are set to the default entity content)
  error: upon encountering a parsing error
  openEntityChange: each time the currently opened entity changes
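
A descendant of Application overriding a few of these methods might look like the sketch below. The module name ElementLister and the type Lister are invented for the example; the code also assumes that e.message is non NIL and that methods left unoverridden can safely be ignored, neither of which is stated by the interface.

```modula3
MODULE ElementLister EXPORTS Main;

IMPORT SGML, IO, Fmt, Params;

TYPE
  (* Hypothetical Application descendant: lists the start tags and
     reports each error with its line and column. *)
  Lister = SGML.Application OBJECT
    nbElements: CARDINAL := 0;
  OVERRIDES
    startElement := StartElement;
    error := Error;
  END;

PROCEDURE StartElement(self: Lister; READONLY e: SGML.StartElementEvent) =
  VAR nbAttributes: CARDINAL := 0;
  BEGIN
    INC(self.nbElements);
    (* Guard against a NIL array when no attribute is present. *)
    IF e.attributes # NIL THEN nbAttributes := NUMBER(e.attributes^) END;
    IO.Put(e.gi & " (" & Fmt.Int(nbAttributes) & " attributes)\n");
  END StartElement;

PROCEDURE Error(self: Lister; READONLY e: SGML.ErrorEvent) =
  VAR
    (* Allowed here because we are inside a parser callback. *)
    loc := self.getDetailedLocation(e.pos);
  BEGIN
    IO.Put("line " & Fmt.Int(loc.lineNumber) & ", column "
           & Fmt.Int(loc.columnNumber) & ": " & e.message & "\n");
  END Error;

VAR
  options := SGML.ParserOptions{defaultDoctype := NIL};
  files := NEW(REF ARRAY OF TEXT, 1);
  lister := NEW(Lister);

BEGIN
  IF Params.Count = 2 THEN
    files[0] := Params.Get(1);
    EVAL lister.init();
    EVAL NEW(SGML.Parser).init(options, Params.Get(0), files).run(lister);
    IO.Put(Fmt.Int(lister.nbElements) & " elements\n");
  END;
END ElementLister.
```

State that must survive between callbacks, such as the element count here, is kept in fields of the descendant object.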
PROCEDURE CharRefToCode(t: TEXT; VAR c: CharCode): BOOLEAN;

While the input files only contain 8-bit ISO-8859 character codes, larger 16-bit UNICODE codes may be specified by (decimal or hexadecimal) character references. For this reason, all such 16-bit codes are kept escaped as character references. Moreover, the special ampersand character (&) is also kept as an entity reference throughout the processing. This allows all the processing to use ordinary TEXT elements, which are limited to 8-bit characters.

The call CharRefToCode(t, c) returns TRUE when a valid character reference is received in t and stores the corresponding code in c. A valid character reference is either &amp;, or &#decimalNumber;, or &#xHexaNumber;, with the number within the interval 0..65535. This procedure is typically used by applications to process 16-bit characters escaped as character references in Sdata events.
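
A small, hedged usage sketch of CharRefToCode follows. The module name CharRefExample and the procedure Show are made up for the illustration, and the sample references are arbitrary; no output is shown because the results depend on the implementation behind this interface.

```modula3
MODULE CharRefExample EXPORTS Main;

IMPORT SGML, IO, Fmt;

(* Hypothetical helper: print the code of a character reference, or a
   message when the text is not accepted as a character reference. *)
PROCEDURE Show(ref: TEXT) =
  VAR code: SGML.CharCode;
  BEGIN
    IF SGML.CharRefToCode(ref, code) THEN
      IO.Put(ref & " has code " & Fmt.Int(code) & "\n");
    ELSE
      IO.Put(ref & " is not a valid character reference\n");
    END;
  END Show;

BEGIN
  Show("&#233;");      (* decimal form *)
  Show("&#x20AC;");    (* hexadecimal form *)
  Show("&amp;");       (* the escaped ampersand *)
  Show("&#70000;");    (* number outside the interval 0..65535 *)
END CharRefExample.
```

In an application, the same test would typically sit in the sdata method, applied to the text field of the SdataEvent.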
TYPE
  CharCode = [0..65535];

  Position = CARDINAL;

  ExternalId = RECORD
    systemId: TEXT;
    publicId: TEXT;
    generatedSystemId: TEXT;
  END;
  (* Depending on the type of external identifier, each of these fields
     may or may not be available (non NIL). At least one should be
     non NIL. *)

  Notation = RECORD
    name: TEXT;
    externalId: ExternalId;
  END;
  (* A named notation with the corresponding external identifier. *)

  EntityDataType = { Sgml, CData, SData, NData, Subdoc, Pi };

  EntityDeclType = { General, Parameter, Doctype, Linktype };

  Entity = RECORD
    name: TEXT;
    dataType: EntityDataType;
    declType: EntityDeclType;
    internalText: TEXT;
    (* Following valid if internalText is NIL *)
    externalId: ExternalId;
    attributes: REF ARRAY OF Attribute;
    notation: Notation;
  END;
  (* For an internal entity, the replacement text is found in
     "internalText". For external entities, an external identifier,
     attributes and a notation are provided. *)

  AttributeType = { Invalid, Implied, CData, Tokenized };

  AttributeDefaulted = { Specified, Definition, Current };

  CdataChunk = RECORD
    nonSgmlChar: CHAR;
    data: TEXT;
    entityName: TEXT;
  END;
  (* For an SDATA entity reference, entityName is the entity name and
     data the replacement text. For normal data, entityName is NIL and
     data contains the character data. For non SGML conforming
     characters, data and entityName are NIL and nonSgmlChar contains
     the character. *)

  Attribute = RECORD
    name: TEXT;
    type: AttributeType;
    defaulted: AttributeDefaulted;
    cdataChunks: REF ARRAY OF CdataChunk;
    tokens: TEXT;
    isId: BOOLEAN;
    isGroup: BOOLEAN;
    entities: REF ARRAY OF Entity;
    notation: Notation;
  END;
  (* If the attribute type is CData, the value is found in
     "cdataChunks"; otherwise, if the type is Tokenized, the value is
     found in "tokens". For an attribute of type NOTATION, notation is
     defined; for ENTITY or ENTITIES, entities is defined. The field
     isId is TRUE for an attribute of type ID. *)

  (* The event structures all contain a position which may be used to
     obtain detailed position information. *)

  PiEvent = RECORD
    pos: Position;
    data: TEXT;
    entityName: TEXT;
  END;
  (* The content of the processing instruction is in data. If it was an
     entity reference, the entityName is provided (non NIL). *)

  ElementContentType = { Empty, CData, RCData, Mixed, Element };

  StartElementEvent = RECORD
    pos: Position;
    gi: TEXT;
    contentType: ElementContentType;
    included: BOOLEAN;
    attributes: REF ARRAY OF Attribute;
  END;
  (* The element type (tag name) is in gi. *)

  EndElementEvent = RECORD
    pos: Position;
    gi: TEXT;
  END;
  (* The element type is in gi. *)

  DataEvent = RECORD
    pos: Position;
    data: TEXT;
  END;

  SdataEvent = RECORD
    pos: Position;
    text: TEXT;
    entityName: TEXT;
  END;
  (* Reference to an internal sdata entity. The replacement text is in
     text and the referenced entity in entityName. *)

  ExternalDataEntityRefEvent = RECORD
    pos: Position;
    entity: Entity;
  END;

  SubdocEntityRefEvent = RECORD
    pos: Position;
    entity: Entity;
  END;

  NonSgmlCharEvent = RECORD
    pos: Position;
    c: CHAR;
  END;

  ErrorType = { Info, Warning, Quantity, IDRef, Capacity, OtherError };

  ErrorEvent = RECORD
    pos: Position;
    type: ErrorType;
    message: TEXT;
  END;

  AppinfoEvent = RECORD
    pos: Position;
    string: TEXT;
  END;

  StartDtdEvent = RECORD
    pos: Position;
    name: TEXT;
    (* If it does not have an external ID all names within will be NIL *)
    externalId: ExternalId;
  END;

  EndDtdEvent = RECORD
    pos: Position;
    name: TEXT;
  END;

  EndPrologEvent = RECORD
    pos: Position;
  END;

  GeneralEntityEvent = RECORD
    entity: Entity;
  END;

  CommentDeclEvent = RECORD
    pos: Position;
    comments: REF ARRAY OF TEXT;
    seps: REF ARRAY OF TEXT;
  END;

  MarkedSectionStatus = { Include, RCData, CData, Ignore };

  MarkedSectionParamType = { Temp, Include, RCData, CData, Ignore,
                             EntityRef };

  MarkedSectionParam = RECORD
    type: MarkedSectionParamType;
    entityName: TEXT;
  END;

  MarkedSectionStartEvent = RECORD
    pos: Position;
    status: MarkedSectionStatus;
    params: REF ARRAY OF MarkedSectionParam;
  END;

  MarkedSectionEndEvent = RECORD
    pos: Position;
    status: MarkedSectionStatus;
  END;

  IgnoredCharsEvent = RECORD
    pos: Position;
    data: TEXT;
  END;

  DetailedLocation = RECORD
    lineNumber: CARDINAL;
    columnNumber: CARDINAL;
    byteOffset: CARDINAL;
    entityOffset: CARDINAL;
    entityName: TEXT;
    filename: TEXT;
  END;

Debugging

PROCEDURE DumpDefinitions(this: Parser);

END SGML.