CS 561 Spring 2007.
Mini-Project 2: XML Modeling and Querying.
Assigned: Thursday, 15th February 2007
Due: Thursday, 8th March 2007.
Maximum: 100pts.
Collaboration: This project can be done individually or in teams of two.
If you work in a team, every member must make sure to learn all aspects
of the project, from design decisions and implementation to project
documentation and the final report.
Objective:
To gain hands-on experience with several XML tools, in particular to:
1. understand XML Schema and XML data, their interrelationships, and validation.
2. query XML data and XML Schema using an XML query language (XQuery).
3. load XML data into a relational database and query the mapped XML data using SQL.
4. investigate the differences between working with XML data natively and via a relational server.
Project Description
Part I: XML + XML Schema + XQUERY
- Given the following DTD:
<!DOCTYPE bibliograph [
<!ELEMENT book (title, author+, year, publisher, place?)>
<!ELEMENT article (title, author+, journal, year, volumne, pages?)>
<!ELEMENT publisher (name, location+)>
<!ELEMENT location (cityname, street)>
<!ELEMENT author (lastname, firstname)>
<!ELEMENT title (#PCDATA)>
... continue with #PCDATA declarations for all the other
element types occurring above, such as year, journal, etc.
Please feel free to further expand and enhance this schema,
as it's a rather minimalistic one just to get you going.
Now define an XML Schema for the DTD above.
This can be done either by hand or by
automatically generating an XML Schema from the given DTD
using one of the tools indicated below.
- For this XML schema, create some XML data files with appropriate
content.
We will also provide sample data on this webpage.
We refer to it as the MYDATA data file from now on.
The project you turn in must
show the output of your system when run on our given input files.
Validate the XML data against your XML Schema to ensure that the XML
document is valid. Then modify your XML data file into MYDATA2, which has
some extra attributes or subelements not defined by your XML Schema.
Attempt to revalidate your new XML document.
Show a screen dump, or report which tools you used and what
the precise outputs were, to illustrate that you have successfully
completed this task.
- Design a number of XQuery queries,
apply them to the sample XML document, and record both the query and the
resulting XML document. In particular, you must write the following
queries. You can use any native XML engine of your choosing.
Sample engines include Galax or Kweelt.
- Find the cityname and street to which to mail in
order to reach the publishers of all books published in a given year
<year>.
- Find all authors who have authored a book
in each of two consecutive years.
- Display books and articles sorted by year.
- Display articles with more than three authors.
- For each book, provide the last names
of the first author and of the second author, if one exists, and the book title.
- Write an XQuery that queries the XML Schema for all element
definitions
that have a required subelement or, put differently, that have a nested
subelement with minOccurs being 1 or higher. Apply this XQuery to the
above XML Schema, and return the names of the elements that satisfy this query.
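The validation step in Part I (validate MYDATA, then revalidate MYDATA2 with its extra attributes or subelements) can also be done programmatically. The following is a minimal sketch using the standard javax.xml.validation API of the JDK. The class name ValidateSketch, the heavily reduced schema, and the two tiny documents are assumptions made for illustration only; they are not the schema or the sample data of this project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateSketch {
    // Hypothetical, reduced schema: a book with a title and one or more authors.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='author' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if xml is valid against xsd, otherwise the validator's error message. */
    static String validate(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("the schema itself is not usable", e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for MYDATA: conforms to the reduced schema.
        String mydata = "<book><title>T</title><author>A</author></book>";
        // Stand-in for MYDATA2: an extra attribute and an extra subelement.
        String mydata2 = "<book isbn='1'><title>T</title><author>A</author><price>9</price></book>";
        System.out.println("MYDATA:  " + validate(XSD, mydata));
        System.out.println("MYDATA2: " + validate(XSD, mydata2));
    }
}
```

For the real task you would read your own schema and data files instead of the inline strings. The exact wording of the error message depends on the parser shipped with your JDK, so report the output you actually observe.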
Part II: XML + XML-SQL Utility of RDBMS
For the second part of the project, you should gain some experience with
a relational DBMS and its XML capabilities.
As a primary example, you may want to look at
Oracle's XML features, such as the
XML-SQL Utility XSU (8i version), or at
Berkeley DB XML. However, you can equally explore any
of the other major relational DBMS vendors and their tools
for XML management using RDB-related technologies.
- Take the XML document you worked with for
Part I above.
Use the built-in functionality of, for example, the XSU tool (see the links below for
documentation and downloads) to store your XML document in Oracle.
Any default mapping, such as mapping all XML
into one single nested object-relational table, or any
other such simple solution will be sufficient.
Make sure to describe clearly what you decided to do.
- Now take all the queries from Part I above
and determine which of them you can specify
over this now-loaded XML data, and how.
If there are any that you cannot ask, explain why not.
For those that you can process,
execute them over your loaded
XML table and return the results to the user.
Part III: XML and Oracle (Pure Relational Database Technology)
For this third part of the project, you need
to go back to the "good old" relational days
and consider how the
equivalent of "XML queries" may have been
serviced within that paradigm.
- Take the XML Schema and XML document you worked with for
Part I above. Assume that you do not have the XSU facility available
to you, yet you still want to load this XML data into a relational
server because your boss wants to be able to integrate the data with other
relational business data and manage it all in one central system.
For this, design a flat relational table schema that stores
the same content as the XML Schema. You will probably want
to write a script to flatten your XML document into this
flat relational structure (the so-called default XML view).
Alternatively, you can do this translation into the proper format
by hand. Then create those
flat relational tables in (regular) Oracle and load the
corresponding (flattened) relational data. Provide a design
discussion of the choices you made, and why, to capture
the different XML model characteristics, such as sets, optionality,
and nested structures.
(NOTE: In this application we are interested
in keeping track of the ORDER of the authors. Also,
you can assume that only Books and Articles will appear as top-level
elements in your XML documents. In addition, describe
what, if anything, would have to change in your relational
schema design if Authors were also allowed to appear
as top-level elements in a document. You do not, however, need to
work with this changed schema below.)
- Next, take the queries from Part I again and
determine which of them you can specify in pure SQL
against your now truly relationalized 'XML' data.
If there are any that you cannot ask, state this
and explain why not.
For those that can be processed over your relational schema,
use JDBC to execute them
and to return the results to the user (in some format).
- Lastly, write some Java code for
Part III.2 that converts the relational
output of your SQL queries into XML by wrapping the
resulting relational tuples with (appropriate?)
TAGS before you print them for the user to some file.
(Hint: you are not required to construct
"complex XML structures" on the fly here, but rather
simply to wrap the "flat" tuples as they come back.
Describe here how close to, or how different from,
the query results produced by the approaches in Part I or Part II
your "xml-ish" query results are.)
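The two mechanical steps of Part III, flattening nested XML into ordered tuples and later wrapping flat tuples in tags, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the class FlatViewSketch, the column names book_id, position, and lastname, and the result and row tag names are all hypothetical. The rows are kept in memory here so that the sketch runs without a database; in your project they would be loaded into Oracle and come back from a JDBC ResultSet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FlatViewSketch {
    /**
     * Flattens every book into rows (book_id, position, lastname).
     * The explicit position column preserves the ORDER of the authors,
     * which a plain relational set of rows would otherwise lose.
     */
    static List<String[]> flattenAuthors(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String[]> rows = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int b = 0; b < books.getLength(); b++) {
                // getElementsByTagName returns matches in document order.
                NodeList authors = ((Element) books.item(b)).getElementsByTagName("author");
                for (int a = 0; a < authors.getLength(); a++) {
                    Element author = (Element) authors.item(a);
                    String last = author.getElementsByTagName("lastname").item(0).getTextContent();
                    rows.add(new String[] { String.valueOf(b + 1), String.valueOf(a + 1), last });
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the XML document", e);
        }
    }

    /** Escapes the characters that are not allowed in XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each flat tuple in a row tag and each column value in a tag named after its column. */
    static String wrap(String rowTag, String[] columns, List<String[]> rows) {
        StringBuilder out = new StringBuilder("<result>");
        for (String[] row : rows) {
            out.append('<').append(rowTag).append('>');
            for (int i = 0; i < columns.length; i++) {
                out.append('<').append(columns[i]).append('>')
                   .append(escape(row[i]))
                   .append("</").append(columns[i]).append('>');
            }
            out.append("</").append(rowTag).append('>');
        }
        return out.append("</result>").toString();
    }

    public static void main(String[] args) {
        String xml = "<bibliograph><book><title>T</title>"
                + "<author><lastname>Smith</lastname><firstname>A</firstname></author>"
                + "<author><lastname>Jones</lastname><firstname>B</firstname></author>"
                + "</book></bibliograph>";
        List<String[]> rows = flattenAuthors(xml);
        System.out.println(wrap("row", new String[] { "book_id", "position", "lastname" }, rows));
    }
}
```

Note that the wrapped output is one flat element per tuple. It does not regroup the authors under their book, which is one of the differences from the Part I results that your discussion should address.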
Part IV: Comparison
- Compare the alternative approaches for working with XML, based
on your experience with them in the parts above.
In particular, compare and contrast
native XML support,
managing XML documents using pure relational technology,
and using XML-extended relational technology.
You may want to include in your discussion issues
such as the expressive power of querying,
how lightweight the tool is, overhead,
ease of program development, the types of applications suited to
each technology, and whatever else comes to mind.
What to Turn in
- PARTS I to III.
As for Project 1, provide a complete script documenting
the queries you designed and the output for each query (in hardcopy).
You should hand in, in hardcopy,
all material related to each of the above
tasks so as to illustrate the complete
steps you took per task and which tool you
used to accomplish the task.
The tasks should be turned in in the order listed above.
You must clearly label the problem # for
each of your answers, so that we know which answer belongs to which problem.
For all queries, you need to turn in both the
query specification and the generated
output.
Also generate XML data of your own to further test
your queries, and provide this in the electronic version of your answers (only).
Furthermore, indicate explicitly for each part what is
working and what is not.
(70pts)
- You should provide a link where your source code for
each of PARTS II and III can be downloaded (a .zip
file is preferred). If you do not have web space anywhere, you can
submit the code on a CD.
Also include any scripts, SQL or otherwise,
used to create or populate the database.
Please display the actual statements being generated and executed.
Include a README.txt that explains how to build and run your program and
describes what does and does not work. As always, your programs must
be documented with appropriate comments, in keeping with good software
engineering practice. (20pts)
- Your discussion for Part IV is to be turned
in in hardcopy only.
(10pts)
Resources you may want to consult
Below are references to various XML tools that we recommend you download
and work with in order to complete your project:
Instructions on downloading the Kweelt XML query engine for XQuery processing:
Instructions.
Or, see Kweelt sourceforge.
Developer's guide to XML and Oracle, including XSU and Oracle's XML
parser. Chapters 1, 2, and 4
seem to be the most useful ones.
Or, go to the XSU tutorial directly at:
XSU download.
Or, visit Oracle and "search" for the tool there.
XSU is part of the larger XDK for Java.
Oracle site
Java and XML documentation from Sun's site. This provides resources
on the standard Java libraries in the JDK (not XSU) that interact with XML.
The Java Web Services Tutorial may be helpful to you.
Java and XML
Other Places to Learn about XML and XML Tools:
WPI RAINBOW project site contains some useful
XML + RDB related materials.
Rainbow
project site (though not updated).
Good introductory web site about XML:
http://www.w3schools.com/xml/
Another popular xml tool with free trial evaluation:
http://www.xmlspy.com/
Published: from DTD to XML schema:
http://www.w3.org/2000/04/schema_hack/
Transforming XML to HTML using XSL:
http://www.w3schools.com/xml/xml_browsers.asp
Various XML tools from IBM
http://www.alphaworks.ibm.com/
IBM XML4J parser (version 3.2.1):
http://www.alphaworks.ibm.com/tech/xml4j
XQuery tools: Demo systems to try out XQueries on-line.
Software AG's QuiP
http://www.softwareag.com/developer/quip/
Microsoft
http://msdn.microsoft.com/XML/XMLDownloads/
Lucent
http://db.bell-labs.com/galax/
X-Hive
http://www.x-hive.com/xquery
SourceForge
http://sourceforge.net/projects/xquench/
Fatdog
http://www.fatdog.com/
Kawa-Query - compiles to Java byte code
http://www.gnu.org/software/kawa/xquery
Kweelt (Quilt)
Kweelt sourceforge.