Processing an SGML-marked-up document presumes that a document parser exists that understands the markup and how the marked-up elements relate to each other (i.e., the logical structure of the document).
In this lab, we will use a pre-created DTD to (1) create a document parser, and (2) run the generated parser on a pre-written document to validate that the document is marked up correctly. The parser will also fill in any tags that were left out of the document.
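To make "fill in any tags that were left out" concrete: in SGML, a DTD can declare that certain tags may be omitted, and the parser infers them. The toy Python sketch below imitates only the end-tag part of that idea. It is not how ASP works internally. Its three elements (article, title, para) and their containment rules are hypothetical and are not taken from article.dtd. An open element is closed implicitly when a start tag arrives that it may not contain, when an ancestor's end tag arrives, or when the document ends.

```python
import re

# Hypothetical content rules for illustration only (NOT from article.dtd):
# each element maps to the set of elements it may directly contain.
CONTAINS = {
    "article": {"title", "para"},
    "title": set(),
    "para": set(),
}

# A token is a start tag, an end tag, or a run of text.
TOKEN = re.compile(r"<(/?)([a-z]+)>|([^<]+)")


def fill_in_end_tags(markup: str) -> str:
    """Return markup with omitted end tags written out explicitly."""
    out = []
    stack = []  # currently open elements, innermost last
    for m in TOKEN.finditer(markup):
        slash, name, text = m.groups()
        if text is not None:
            out.append(text)
        elif slash:
            if name not in stack:
                raise ValueError(f"end tag </{name}> has no open element")
            # Close any open descendants whose end tags were omitted.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(f"</{name}>")
        else:
            # Close open elements that are not allowed to contain this one.
            while stack and name not in CONTAINS.get(stack[-1], set()):
                out.append(f"</{stack.pop()}>")
            stack.append(name)
            out.append(f"<{name}>")
    while stack:  # the end of the document closes everything still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

For example, fill_in_end_tags("<article><title>Hi<para>One<para>Two</article>") returns "<article><title>Hi</title><para>One</para><para>Two</para></article>". An end tag that matches no open element raises ValueError, which is a crude stand-in for a validation error.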
NOTE: These directions were written to be executed at WPI. Mail for advice on how to perform these steps outside of the WPI CS department.
Step 1: Let's look at the DTD. Can you see what the tags are and what the document's logical structure is? Ask if you can't!
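If you want help listing the tags, this rough Python sketch pulls element declarations out of DTD text. It assumes each declaration has the simple form <!ELEMENT name - O (content)>, where the two flags say whether the start tag and the end tag are required ("-") or may be omitted ("O"). Name groups, parameter entities, comments inside declarations, and declarations without the two flags are not handled. The function name is hypothetical.

```python
import re

# Simple SGML element declarations:
#   <!ELEMENT name  start-flag  end-flag  content-model>
# Each flag is "-" (tag required) or "O" (tag may be omitted).
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+([A-Za-z][\w.-]*)\s+([-O])\s+([-O])\s+(.*?)\s*>",
    re.IGNORECASE | re.DOTALL,
)


def list_elements(dtd_text: str) -> list[tuple[str, bool, bool, str]]:
    """Return (name, start_tag_omissible, end_tag_omissible, content_model)."""
    rows = []
    for name, start, end, model in ELEMENT_DECL.findall(dtd_text):
        rows.append((
            name,
            start.upper() == "O",
            end.upper() == "O",
            " ".join(model.split()),  # collapse line breaks in the model
        ))
    return rows
```

You could pass it the text of /usr/local/lib/asp/dtd/article.dtd and print one row per element; the content models show the document's logical structure, and the flags show which tags the parser may have to fill in.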
Step 2: Let's use the ASP parser generator to create a document parser that will recognize documents of this type:
(a) First, make sure you're on the host owl or sequoia.
You can find out which machine you are on by typing hostname.
(b) Next, copy the document type definition (DTD) and the pre-written document to be parsed to your current directory:
cp /usr/local/lib/asp/dtd/article.dtd .
cp /usr/local/lib/asp/dtd/article.doc .
(c) Now invoke the parser generator, aspgen. If you're running low on disk quota (you need about 300 KB for the parser executable), you may invoke aspgen with the -min flag. This instructs aspgen to leave the generated parser in temporary space (/tmp) instead of your current directory.
aspgen article.dtd
- or -
aspgen -min article.dtd
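The three parts of Step 2 can also be scripted. The Python sketch below is only an illustration: the function names are hypothetical, the host check assumes the part of the hostname before the first dot is owl or sequoia, and the only aspgen option it uses is the -min flag described above.

```python
import shutil
import socket
import subprocess
from pathlib import Path

LAB_HOSTS = {"owl", "sequoia"}               # hosts named in step (a)
SOURCE_DIR = Path("/usr/local/lib/asp/dtd")  # directory given in step (b)
LAB_FILES = ("article.dtd", "article.doc")


def on_lab_host(hostname: str) -> bool:
    """Step (a): accept a short or fully qualified name for owl or sequoia."""
    return hostname.split(".", 1)[0].lower() in LAB_HOSTS


def copy_lab_files(src: Path = SOURCE_DIR, dest: Path = Path(".")) -> list[Path]:
    """Step (b): same effect as the two cp commands; returns the copied paths."""
    return [Path(shutil.copy(src / name, dest)) for name in LAB_FILES]


def aspgen_command(dtd: str = "article.dtd", minimal: bool = False) -> list[str]:
    """Step (c): build the aspgen command line, adding -min when quota is low."""
    return ["aspgen", "-min", dtd] if minimal else ["aspgen", dtd]


def run_step2(minimal: bool = False) -> str:
    """Run steps (a)-(c) in order and return aspgen's messages for reading."""
    if not on_lab_host(socket.gethostname()):
        raise RuntimeError("log in to owl or sequoia first")
    copy_lab_files()
    result = subprocess.run(aspgen_command(minimal=minimal),
                            capture_output=True, text=True, check=True)
    return result.stdout + result.stderr
```

Calling run_step2(minimal=True) corresponds to typing aspgen -min article.dtd by hand.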
Step 3: aspgen has now created the document parser. Run the parser on the marked-up document, article.doc. (If you invoked aspgen with the -min flag, you will first have to cd /tmp; look at the output from aspgen.)
Type:
article.asp article.doc > article.out
This sends the results of parsing to the file article.out.
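A Python equivalent of the command above, as a sketch: the ./article.asp default assumes the parser sits in the current directory (for example /tmp after aspgen -min), and the function name is hypothetical. Like the shell's >, it redirects only standard output.

```python
import subprocess


def run_parser(parser: str = "./article.asp", doc: str = "article.doc",
               out: str = "article.out") -> int:
    """Same effect as typing: article.asp article.doc > article.out

    The parser's standard output is written to `out`; standard error is
    not redirected, just as with the shell's > operator.
    Returns the parser's exit status.
    """
    with open(out, "w") as fh:
        completed = subprocess.run([parser, doc], stdout=fh)
    return completed.returncode
```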
Compare article.doc with article.out.
Can you see what the difference is? If not, ask!
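One way to find the differences is the Unix diff command (diff article.doc article.out). The Python sketch below does the same with the standard difflib module. The function name is hypothetical; in its output, lines starting with - come from article.doc and lines starting with + come from article.out.

```python
import difflib


def compare(original: str, parsed: str) -> list[str]:
    """Return unified-diff lines showing what the parser changed."""
    return list(difflib.unified_diff(
        original.splitlines(), parsed.splitlines(),
        fromfile="article.doc", tofile="article.out", lineterm=""))
```

To use it, read both files (e.g. with pathlib's Path.read_text) and print "\n".join(compare(doc_text, out_text)). An empty result means the two files are identical line for line.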
Amsterdam Parser (ASP) Documentation
Send questions and comments to: Karen Lemone