XML

Introduction

SGML is a metalanguage developed in the mid-1980s, before the web. Not only was SGML developed for another purpose (creating markup languages), it is also huge, difficult to implement, and slow to run. It revolutionized publishing, but it needed revolutionizing itself for the web.

XML was designed by looking at SGML's strengths and weaknesses. Its strengths include the ability to describe document types (using DTDs) and document instances (using marked-up documents). Its weaknesses include an unwieldy syntax with complex or rarely used features, a consequence of SGML striving to be complete.

XML is a subset of SGML, introduced by the W3C in 1998. This means we can use SGML processors on XML documents, although dedicated XML validators and processors are beginning to appear.

One difference between SGML and XML is that XML elements must have both start and end tags (remember, SGML allows them to be omitted if they can be inferred); the only shorthand is the empty-element tag, e.g., <br/>. Because of this, XML documents can exist without a DTD: the DTD itself, or a subset of it, can often be inferred. However, DTDs for XML are often written so that a document parser can be generated to process many documents of that type (just as in SGML).

Different vocabulary (from SGML) is surfacing around XML. For example, DTD data is often called a schema (plural: schemata). A valid XML document is one which is well-formed and also conforms to its DTD. To be well-formed, an XML document must conform to the syntax rules for XML (see next section), but need not have a defining DTD.
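The well-formedness requirement can be tried out with a small Python sketch. It assumes the standard-library xml.etree.ElementTree parser, which checks well-formedness only and does not validate against a DTD; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise.

    Hypothetical helper: it checks well-formedness only, not validity.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every start tag has a matching end tag: well-formed, no DTD needed.
print(is_well_formed("<note><to>Ann</to></note>"))

# The end tag for <to> is omitted, as SGML might allow: not well-formed XML.
print(is_well_formed("<note><to>Ann</note>"))
```

Checking validity (conformance to a DTD) would need a validating parser, which this sketch does not cover.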

XHTML


XHTML is actually HTML, but with XML's philosophy: rules. Some of these rules are:

XHTML documents can be validated with XML tools, and browsers can show them with predictable results. However, an XHTML processor either displays the document or reports an error, so some of the flexibility of HTML is lost.
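The "displayed or error" behavior can be illustrated by feeding the same markup to a lenient HTML parser and a strict XML parser. This is a rough sketch, assuming Python's standard html.parser and xml.etree.ElementTree modules as stand-ins for an HTML browser and an XHTML processor; the names TagCollector, lenient_tags, strict_tags, SLOPPY, and TIDY are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Collects start-tag names; tolerates missing end tags like HTML does."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """HTML-style processing: never rejects the document."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_tags(markup):
    """XML-style processing: raises ET.ParseError on any violation."""
    root = ET.fromstring(markup)
    return [element.tag for element in root.iter()]


SLOPPY = "<html><body><p>one<p>two</body></html>"        # unclosed <p> tags
TIDY = "<html><body><p>one</p><p>two</p></body></html>"  # every tag closed

print(lenient_tags(SLOPPY))   # the HTML parser accepts the sloppy markup
print(strict_tags(TIDY))      # the XML parser accepts the tidy markup
try:
    strict_tags(SLOPPY)
except ET.ParseError as error:
    print("XML parser refused the sloppy markup:", error)
```

The lenient path always produces something to show, while the strict path gives an all-or-nothing answer, which is the trade-off described above.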

XHTML is both XML and HTML! It can be treated as XML without an accompanying DTD, in which case it should have the following declaration: <?xml version="1.0" standalone="yes"?> There are intermediate cases as well, e.g., in a DTD-less file, character entities can still be declared.
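As a small illustration of the last point, the sketch below parses a standalone document that declares one entity in its internal subset and refers to no external DTD. It assumes Python's standard xml.etree.ElementTree parser expands internally declared entities; the document content, the entity name copy, and the helper note_text are made up for the example.

```python
import xml.etree.ElementTree as ET

# A standalone document: no external DTD, but the internal subset
# declares one entity (the name "copy" is chosen for illustration).
DOC = '''<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ENTITY copy "&#169;">
]>
<note>&copy; example</note>'''


def note_text(document):
    """Parse the document and return the root element's text."""
    return ET.fromstring(document).text


print(note_text(DOC))
```

The parser replaces &copy; with the declared character while reading the file, so no separate DTD file is needed.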

XHTML can be generated with XML in the same sense that HTML can be generated by SGML (i.e., there is a defining DTD).

Tutorials


You can learn the syntax of both XML schemas and XML documents from the following tutorials:

XML 1.0, the defining document. Note that its design goals seem to respond to SGML's complexity. Note also that the constructs are described using BNF (from CS503); EBNF, actually.

XML Schema Part 0: Primer uses a small catalog example to explain the basics of XML schemas and documents.

More details can be found in:

XML Schema Part 1: Structures and XML Schema Part 2: Datatypes

And XML can be expressed in XML.

A great FAQ site.

And, finally, The XML Cover Pages: everything about XML.