TOK_AND | TOK_COM | TOK_CRO | TOK_DTGC |
TOK_DTGO | TOK_DSC | TOK_DSO | TOK_EE |
TOK_ERO | TOK_ETAGO | TOK_GRPC | TOK_GRPO |
TOK_LIT | TOK_LITA | TOK_MDC | TOK_MDO |
TOK_MINUS | TOK_MSC | TOK_NET | TOK_NONSGML |
TOK_OPT | TOK_OR | TOK_PERO | TOK_PIC |
TOK_PIO | TOK_PLUS | TOK_RE | TOK_REFC |
TOK_REP | TOK_RNI | TOK_RS | TOK_SEQ |
TOK_SHORTREF | TOK_SPACE | TOK_STAGO | TOK_TAGC |
TOK_VI | TOK_DIGIT | TOK_DATACHAR | TOK_DELMCHAR |
TOK_FUNCHAR | TOK_LETTER | TOK_MSICHAR | TOK_MSOCHAR |
TOK_MSSCHAR | TOK_NMCHAR | TOK_NMSTRT | TOK_SEPCHAR |
TOK_SPECIAL |
MDO_ATTLIST | MDO_DOCTYPE | MDO_ELEMENT |
MDO_ENTITY | MDO_LINK | MDO_LINKTYPE |
MDO_NOTATION | MDO_SHORTREF | MDO_SGML |
MDO_USELINK | MDO_USEMAP |
The other delimiters that may follow the MDO delimiter, namely COM ("--"), MDC (">") and DSO ("["), are also joined with the MDO to form a single token. This defines the tokens MDO_COM, MDO_MDC and TOK_MDO_DSO.
The MSC ("]]") and MDC (">") delimiters are likewise joined to form the token TOK_MSC_MDC.
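The joining of MDO with a following delimiter can be sketched as below. This is only an illustration, not the parser's actual code: the function name join_mdo, its signature and the numeric token values are hypothetical, and the delimiter strings assume the reference concrete syntax.

```c
#include <string.h>

/* Hypothetical token codes: the names follow the text above,
 * the numeric values are arbitrary. */
enum token { TOK_NOD, TOK_MDO, MDO_COM, MDO_MDC, TOK_MDO_DSO };

/* Classify the input that directly follows an MDO ("<!").
 * `rest` points at the first character after the MDO.
 * `*used` receives the number of extra characters consumed
 * by the joined token (0 when nothing is joined). */
static enum token join_mdo(const char *rest, size_t *used)
{
    if (strncmp(rest, "--", 2) == 0) {   /* COM: "<!--" */
        *used = 2;
        return MDO_COM;
    }
    if (rest[0] == '>') {                /* MDC: "<!>" */
        *used = 1;
        return MDO_MDC;
    }
    if (rest[0] == '[') {                /* DSO: "<![" */
        *used = 1;
        return TOK_MDO_DSO;
    }
    *used = 0;                           /* plain MDO, nothing joined */
    return TOK_MDO;
}
```

Called with the text after "<!", the sketch returns MDO_COM for "-- a comment -->" and falls back to TOK_MDO when no delimiter follows. Keyword tokens such as MDO_ELEMENT would need an additional name lookup, which is left out here.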
To work around an error in the standard, the token TOK_PERODEF is defined. It corresponds to the PERO ("%") delimiter when it occurs in a parameter entity declaration. See the description of the function pero() in the lexical analyser.
Inside the SGML declaration all keywords are recognized by the lexical analyser and returned as tokens. This defines the tokens:
SGML_APPINFO | SGML_BASESET | SGML_CAPACITY | SGML_CHARSET |
SGML_CONCUR | SGML_CONTROLS | SGML_DATATAG | SGML_DELIM |
SGML_DESCSET | SGML_DOCUMENT | SGML_ENTITY | SGML_EXPLICIT |
SGML_FEATURES | SGML_FORMAL | SGML_FUNCTION | SGML_GENERAL |
SGML_IMPLICIT | SGML_INSTANCE | SGML_LCNMCHAR | SGML_LCNMSTRT |
SGML_LINK | SGML_MINIMIZE | SGML_NAMECASE | SGML_NAMES |
SGML_NAMING | SGML_NO | SGML_NONE | SGML_OMITTAG |
SGML_OTHER | SGML_PUBLIC | SGML_QUANTITY | SGML_RANK |
SGML_RE | SGML_RS | SGML_SCOPE | SGML_SHORTREF |
SGML_SHORTTAG | SGML_SHUNCHAR | SGML_SGMLREF | SGML_SIMPLE |
SGML_SPACE | SGML_SUBDOC | SGML_SWITCHES | SGML_SYNTAX |
SGML_UCNMCHAR | SGML_UCNMSTRT | SGML_UNUSED | SGML_YES |
There is a special token TOK_NOD, which denotes the undefined token. It is returned by various functions when no appropriate token can be found.
The token TOK_CONREF is used when an element occurs with a CONREF attribute that has been given a value. In that case the content of the element is empty (see Standard Annex B, page 86). The parser puts TOK_CONREF into the input stream so that the content of the element is not recognized.
Starttags are special to the lexical analyser. Whenever the lexical analyser recognizes a STAGO ("<") delimiter, it scans the input until the end of the starttag is found. The end can be marked by a TAGC (">"), NET ("/"), STAGO or ETAGO ("</"). The lexical analyser returns the complete starttag as one token. The lexical analyser reads the endtag in the same way. The names of the tokens are constructed from the names of the generic identifiers of the elements. If, for example, the DTD is:
<!doctype DOC [
<!element DOC - - (A, B)>
<!element A - - (#PCDATA)>
<!element B - - (#PCDATA)>
]>
then the tokens for the starttags of DOC, A and B are respectively ST_DOC, ST_A, ST_B. The tokens for the endtags are respectively END_DOC, END_A, and END_B. The attributes that belong to a starttag are read and stored. See ``att_par.c'' for a description of the attribute storage.
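The naming scheme for tag tokens can be illustrated with a small helper. This is a sketch under stated assumptions: the function tag_token_name and its signature are hypothetical, and upper-casing the generic identifier is an assumption (it corresponds to NAMECASE GENERAL YES), not something the text above specifies.

```c
#include <ctype.h>
#include <stdio.h>

/* Build the token name for a tag from a generic identifier:
 * "ST_<GI>" for a starttag, "END_<GI>" for an endtag.
 * The name is upper-cased (assumption, see the lead-in).
 * Returns 0 on success, -1 if `buf` is too small. */
static int tag_token_name(const char *gi, int is_end,
                          char *buf, size_t size)
{
    const char *prefix = is_end ? "END_" : "ST_";
    int n = snprintf(buf, size, "%s%s", prefix, gi);

    if (n < 0 || (size_t)n >= size)
        return -1;                      /* output was truncated */

    for (char *p = buf; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return 0;
}
```

For the example DTD, calling the helper with "DOC" and is_end set to 0 fills the buffer with ST_DOC, and calling it with "B" and is_end set to 1 yields END_B.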