The Backend


The backend of the Amsterdam SGML Parser is simple, but powerful enough to create typeset documents from SGML documents. The user can specify a mapping from each starttag with its attributes to a replacement text, and a mapping from each endtag to a replacement text. For example the mapping:
<title>    ".TL"
<head>     ".NH [level]"
denotes that the starttag of the element `title' is to be replaced by the string `.TL', which is the Troff ms macro for a title. The starttag of `head' is to be replaced by the string `.NH' followed by the value of the attribute `level'. Of course `level' must be a valid attribute of `head'; otherwise an error message is given. The replacement text stands between double quotes `"', and an attribute value is referred to by placing the attribute name between square brackets `[' and `]'. The parser can be called with a user-specified replacement file, which contains the mapping for the tags in the DTD. If a replacement file is specified, the tags in the output are replaced according to the mappings in the replacement file. Otherwise the `complete' document will be output. For example, if the replacement file looks like:
<memo>         ".MS"
<sender>       "From: "
</forename>    " "
<receivers>    "To: "
<contents>     ".PP"
</memo>        ".ME"
The SGML document in figure 1 will be converted to the Troff document in figure 2. Tags that are not mentioned in the replacement file are mapped to the empty string and they do not appear in the output.
<memo><sender>
<forename>Jos<surname>Warmer
<receivers>
<forename>Sylvia<surname>van Egmond
<contents>The meeting of tomorrow will be postponed.
</memo>
Figure 1: SGML input document
 .MS
 From: Jos Warmer
 To: Sylvia van Egmond
 .PP
 The meeting of tomorrow will be postponed.
 .ME
Figure 2: SGML output document, with replacement
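
The lookup of a replacement text and the substitution of attribute references can be sketched as follows. This is only an illustration in Python, not the parser's own code: the names START_REPL, END_REPL, expand_starttag and expand_endtag are hypothetical, and the sketch assumes that an invalid attribute reference is reported at the moment the starttag is replaced (the text above does not say when the error message is given).

```python
import re

# Hypothetical in-memory form of a replacement file: one table for
# starttags and one for endtags. Tags that are absent map to "".
START_REPL = {"title": ".TL", "head": ".NH [level]"}
END_REPL = {}

# An attribute reference is an attribute name between '[' and ']'.
ATT_REF = re.compile(r"\[([^\]]+)\]")


def expand_starttag(name, attributes, valid_attributes):
    """Return the replacement text for a starttag, with [att] references filled in."""
    template = START_REPL.get(name, "")

    def lookup(match):
        att = match.group(1)
        if att not in valid_attributes:
            # Assumption: the error is raised when the tag is replaced.
            raise ValueError(f"'{att}' is not an attribute of '{name}'")
        return attributes.get(att, "")

    return ATT_REF.sub(lookup, template)


def expand_endtag(name):
    """Return the replacement text for an endtag (no attribute references)."""
    return END_REPL.get(name, "")
```

With these tables, expand_starttag("head", {"level": "2"}, {"level"}) yields the string `.NH 2', and a tag that is not mentioned yields the empty string.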
It is possible to specify that the replacement text must appear on a separate line. This is needed by Troff, since each Troff command must start with a `.' at the start of a line. Provisions are made to make it possible to put any (including non-printable) character in the replacement text. This is done by an escape mechanism similar to that of the C programming language.
Our experience is that it is easy to convert an SGML document to Troff, TeX, or some other similar-looking code in order to produce a typeset document on paper.

The replacement file


When a document parser is called, a replacement file may be specified. The replacement file contains the mapping from starttags (with their attributes) and endtags to replacement text. The syntax of the file is given in figure 3. We use the formalism of LLgen to describe the syntax. Each identifier in uppercase is a token. Text between `<' and `>' contains an informal description.
%token COMMENT, PLUS, STRING_OPEN, STRING_CLOSE, ATT_OPEN, ATT_CLOSE,
       CHARACTER, EOLN, STAGO, ETAGO, TAGC;

%start file, file;
file        : [repl | comment]* ;
comment     : COMMENT chars EOLN ;
repl        : start_repl | end_repl ;
start_repl  : starttag s* [PLUS s*]? rep_text [PLUS s*]? ;
end_repl    : endtag   s* [PLUS s*]? rep_text [PLUS s*]? ;
starttag    : STAGO name TAGC ;
endtag      : ETAGO name TAGC ;
rep_text    : [string s*]* ;
string      : STRING_OPEN [chars | attref]* STRING_CLOSE ;
chars       : CHARACTER* ;
attref      : ATT_OPEN name ATT_CLOSE ;
name        : < SGML name> ;
s           : < layout characters: space, tab, newline, return> ;
Figure 3: syntax of replacement file

Figure 4: Definition of the tokens
token          corresponding string       recognised in
COMMENT        %                          repl
PLUS           +                          repl
STRING_OPEN    "                          rep_text
STRING_CLOSE   "                          string
ATT_OPEN       [                          string
ATT_CLOSE      ]                          attref
CHARACTER      <any character>            always
EOLN           <the newline character>    comment
STAGO          <                          repl
ETAGO          </                         repl
TAGC           >                          starttag, endtag

A comment is ignored. A start_repl (end_repl) defines the mapping for the named starttag (endtag).
If the first PLUS in a repl is present, then the replacement text must start at the beginning of a line. If the second PLUS in a repl is present, then the replacement text must be directly followed by a newline in the output. When both PLUS tokens are present, the effect is that the replacement text is on a separate line, apart from its surrounding text, with no empty lines inserted.
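
One way to obtain this behaviour is to let the output routine remember whether it is at the start of a line. The sketch below is an interpretation in Python, not the parser's actual implementation; the class LineWriter and its methods are hypothetical names. It assumes that a newline is added only when one is not already there, which is how it avoids inserting empty lines.

```python
class LineWriter:
    """Collects output and remembers whether we are at the start of a line."""

    def __init__(self):
        self.parts = []
        self.at_line_start = True  # nothing written yet

    def write(self, text):
        if text:
            self.parts.append(text)
            self.at_line_start = text.endswith("\n")

    def emit_replacement(self, text, plus_before=False, plus_after=False):
        # First PLUS: begin a new line unless we are already at one.
        if plus_before and not self.at_line_start:
            self.write("\n")
        self.write(text)
        # Second PLUS: make sure a newline directly follows the text.
        if plus_after and not self.at_line_start:
            self.write("\n")

    def getvalue(self):
        return "".join(self.parts)
```

For example, writing `Intro text', then the replacement `.PP' with both PLUS flags set, then `Body' puts `.PP' on a line of its own between the two pieces of text.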
rep_text is the replacement text itself, which consists of any number of strings. All specified strings are concatenated to form the replacement text. Putting replacement text in several strings is only useful to get a neat layout in the replacement file. So

<table>  ".[keep]\n" ".TS"
is identical to
<table>  ".[keep]\n"
         ".TS"

The tokens are recognised only within the rule specified in the third column of the definition of the tokens in figure 4. There is one exception for the ATT_OPEN token: ATT_OPEN is never recognised inside the replacement text of an end_repl, because there are no attributes associated with an endtag.
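
To make the grammar of figure 3 concrete, here is a minimal hand-written scanner for replacement files, sketched in Python. It is not the LLgen-generated parser of the distribution; the function name parse_replacement_file and the shape of the result are assumptions made for this illustration. The sketch takes the backslash as the escape character, leaves escape sequences and attribute references undecoded, and does not check that names are valid SGML names.

```python
def parse_replacement_file(source):
    """Parse replacement-file text into {(kind, name): (plus_before, text, plus_after)}.

    kind is "start" or "end"; text is the concatenation of all strings,
    with escape sequences and attribute references left as written.
    """
    table = {}
    pos, end = 0, len(source)

    def skip_layout(p):
        while p < end and source[p] in " \t\r\n":
            p += 1
        return p

    def optional_plus(p):
        if p < end and source[p] == "+":
            return True, skip_layout(p + 1)
        return False, p

    while True:
        pos = skip_layout(pos)
        if pos >= end:
            return table
        if source[pos] == "%":                        # COMMENT chars EOLN
            newline = source.find("\n", pos)
            pos = end if newline < 0 else newline + 1
            continue
        if source[pos] != "<":
            raise SyntaxError(f"expected '<' or '%' at offset {pos}")
        kind = "end" if source.startswith("</", pos) else "start"
        close = source.find(">", pos)                 # TAGC
        if close < 0:
            raise SyntaxError("unterminated tag")
        name = source[pos + (2 if kind == "end" else 1):close].strip()
        pos = skip_layout(close + 1)
        plus_before, pos = optional_plus(pos)
        strings = []
        while pos < end and source[pos] == '"':       # STRING_OPEN
            pos += 1
            start = pos
            while pos < end and source[pos] != '"':   # up to STRING_CLOSE
                # Skip an escaped character so that \" does not end the string.
                pos += 2 if source[pos] == "\\" else 1
            if pos >= end:
                raise SyntaxError("unterminated string")
            strings.append(source[start:pos])
            pos = skip_layout(pos + 1)
        plus_after, pos = optional_plus(pos)
        table[(kind, name)] = (plus_before, "".join(strings), plus_after)
```

Because all strings of one rep_text are joined before they are stored, the two layouts of the table example above produce the same entry.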
Within a string, characters can be escaped to ensure that they are recognised as CHARACTER's. For instance, this can be used to put a `"' in a string. Escape sequences can also be used to denote unprintable characters. The escape mechanism is similar to that of the C programming language. The recognised escape-sequences are shown in figure 5.
Figure 5: Recognised escape-sequences
sequence     name
\n           newline
\t           tab
\r           return
\s           space
\f           formfeed
\\           \
\[           [
\"           "
\<number>    character with octal value <number>, where <number> is an octal number

The escape character is defined as `\'. An escape character followed by a character that is not mentioned in figure 5 denotes that character itself. For example, if the replacement file contains:
<report>   "line 1\n\"line 2\"\12line 3"
then <report> is replaced by:
line 1
"line 2"
line 3
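
The decoding of escape sequences can be sketched in a few lines of Python. This is an illustration only: the function name decode_escapes is hypothetical, the escape character is taken to be the backslash, and the limit of three octal digits is an assumption borrowed from C, since the description above does not state how many digits a number may have.

```python
# Escapes that name a character; all other characters denote themselves.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "s": " ", "f": "\f"}


def decode_escapes(text):
    """Decode the escape sequences of figure 5 in the body of a string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            # Ordinary character, or a lone backslash at the very end.
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in "01234567":
            # Assumption: at most three octal digits, as in C.
            j = i + 1
            while j < len(text) and j < i + 4 and text[j] in "01234567":
                j += 1
            out.append(chr(int(text[i + 1:j], 8)))
            i = j
        else:
            # Named escape, or a character that denotes itself (\, [, ", ...).
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
    return "".join(out)
```

Applied to the string of the report example, this function returns the three lines shown above. A complete implementation would decode escapes while it scans for attribute references, so that an escaped `[' is not taken as the start of a reference.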

See the file `article.rep' in the distributed sources for a more complete example of a replacement file.