CS 561 Spring 2007.

Mini-Project 2: XML Modeling and Querying.

Assigned: Thursday, 15th February 2007

Due: Thursday, 8th March 2007.

Maximum: 100pts.

Collaboration: This project can be done in teams of two, or also on an individual basis. However, if done in a team, every student in the team should make sure to learn all aspects of the project, ranging from design decisions, to implementation, project documentation, and final report.

Objective:

To get some hands-on experience with some of the XML tools, in particular:
1. understand XML schema and XML data, and their interrelationships and validation.
2. query XML data and XML schema using an XML query language (XQuery)
3. load XML data into some relational database and query the mapped XML data using SQL.
4. investigate differences of working with XML data natively versus via a relational server.

Project Description

Part I: XML + XML Schema + XQUERY

Given the following DTD:

!DOCTYPE bibliograph [ 

< !ELEMENT book (title, author+, year, publisher, place?)>

< !ELEMENT article (title, author+, journal, year, volumne, pages?)>

< !ELEMENT publisher (name, location+)>

< !ELEMENT location (cityname, street)>

< !ELEMENT author (lastname,firstname)>

< !ELEMENT title ( #PCDATA)>

... continue with PCDATA declarations for all the other
types occuring above like year, publisher, etc.

Please feel free to further expand and enhance this schema, as it's a rather minimalistic one just to get you going.

Now define an XML Schema for the DTD above. This can be done either by hand or by automatically generating an XML schema using one of the tools indicated below from the given DTD.

For this XML schema, create some XML data files with appropriate content. We will also provide sample data on this webpage. Let's refer to it as MYDATA data file from now onwards. The project you turn in must show the output of your systems having been run on our given input files. Validate the XML data against your XML schema to assure that the XML document is valid. Modify your XML data file into MYDATA2 that now has some extra attributes or subelements not defined by your XML schema. Attempt to revalidate your new xml document. Show some screendump, or report what tools you used and what the precise outputs where, to illustrate you have successfully completed this task.
Design a number of XQUERIES, apply them on the sample XML document, and record both the query and the resulting XML document. In particular, you must write the following queries. You can use any native XML engine of your choosing. Sample engines include Galax or Kweelt.
1. Find the cityname and street to which to mail in order to reach the publishers of all books published in the year < year >.
2. Find all authors who have authored a book for two years in a row.
3. Display books and articles sorted by year.
4. Display articles with more than three authors.
5. For each book, provide the last names of the first and second author, if exists, and the book title.
Write an XQuery that queries the XML schema for all element definitions that have a required subelement, or put different, that have a nested subelement with MinOccurs being 1 or higher. Apply this Xquery to the above XML Schema, and return the name of elements that meet this query.

Part II: XML + XML-SQL Utility of RDBMS

For the second part of the project, you should gain some experience with a relational DBMS and its XML capabilities. As primary example, you may want to look at Oracle's XML features, such as XML-SQL utility XSU (8.i version) or Berkeley-DBXML. However, you can equally explore any of the other major relational DBMS vendors and their tools for XML management using RDB related technologies.

Take the XML document you worked with for Part I above. Use the built-in functionality of for example XSU tool (see links below for documentation and downloads) to store your XML document in Oracle. Any default mapping, such as mapping all XML into one single nested object-relational table, or any other such simple solution will be sufficient. Make sure to describe clearly what you decided to do.
Now take all the queries from Part I above, and determine which of them you can specify and how over this now loaded XML data. If there are any that you cannot ask, explain why not. Those that you can process, execute them over your loaded XML table, and return the results to the user.

Part III: XML and Oracle (Pure Relational Database Technology)

For this third part of the project, you need to go back to the "good old" relational days and how the equivalent of "XML queries" may have been serviced within that paradigm.

Take the XML schema and XML document you worked with for Part I above. Assume that you do not have the XSU-facility available to you, yet you still wanted to load this XML data into a relational server because your boss wants to be able to integrate the data with other relational business data and manage it all in one central system.
For this, design a flat relational table schema that stores the same content as the XML schema. You probably do want to write a script to flatten your XML document into this flat relational structure (the so called default XML view). Or, you can also do this translation into the proper format by hand. Then, create that flat relational tables in (regular) Oracle, and load the correspondingly (flattened) relational data. Provide a design discussion of what choices you have made and why to capture the different XML model characteristics, such as sets, optionality, and nested structures. (NOTE: In this application we are interested in keeping track of the ORDER of the authors. Also, you can assume that only Books and Articles will appear as top level elements in your XML documents. Then, also describe what, if anything, would have to change about your relational schema design if you were to make Authors also be allowed to appear as top elements in a document. You do however not need to work with this changed schema below. )
Lastly, take the queries from part I again and determine which of them you can specify in pure SQL against your now truly relationalized 'XML' data. If there are any that you cannot ask, state this and explain why not. Those that can be processed over your relational schema, use JDBC to execute them, and to return the results to the user (in some format).
Lastly, write some java code for Part III.2 that converts this relational output of your SQL queries into XML by wrapping the result relational tuples with (appropriate?) TAGS before you print them for the user in some file. (Hint: you are not required to construct on the fly "complex XML structures" here, but rather simply wrap the "flat" tuples as they come back. Describe here how close or rather how different your "xml-ish" query results are compared to the query results produced with approaches in either part I or part II above.)

Part IV: Comparison

Compare the alternate approaches for working with XML based on your experience in doing so in the projects above. In particular, compare and contrast native XML support, with managing XML documents using relational technology, versus using XML-extended relational technology. You may want to include in your discussion issues, such as expressive power of querying, lightweightness of the tool, overhead, ease of program development, types of applications suitable for technology, and whatever else comes to your mind.

What to Turn in

PARTS I to III. Like for project 1, provide a complete script documenting the queries you designed and the output for each query (in hardcopy). You should hand in hardcopy all material related to each of the above tasks so to illustrate the complete steps you took per task and what tool you used to accomplish the task. The tasks should be turned in in the order as listed above. You must clearly label the problem # for each of your answers, so that we know what is meant to be what. For all queries, we need to turn in both the query specification, as well as the generated output. Do also generate XML data on your own to further test out your queries, and provide this in your e-version of answer (only). Furthermore, indicate for each part explicitly what is working and what is not. (70pts)
You should provide a link where your source code for each of the PARTS II and III can be downloaded (.zip file is prefered). If you do not have webspace anywhere, you can submit the code on a CD. Also include any scripts, SQL or otherwise used to populate or create the database. Please display the actual statements being generated and executed. Include a README.txt that will explain how to build and run your program and a description of what does and does not work. As always, your programs must be documented with appropriate comments to practice good software engineering. (20pts)
Your discussion in part IV of course is to be turned in in hardcopy only. (10pts)

Resources you may want to consult

Below are references to various XML tools, we recommend that you download and work with in order to complete your project:

Instructions on down-loading the KWEELT XML query engine:
Instructions on download of Kweelt engine for XQUERY processing. Or, see
Kweelt sourceforge).

Developer's guide to XML and Oracle including XSU and Oracle's XML parser. Chapters 1,2, and 4 seem to be the most useful ones.
or, go to XSU tutorial directly at:
XSU download.
Or, visit Oracle and "search" for the tool there. XSU is part of this larger XDK for java.
Oracle site

Java and XML documentation from Sun's site. This provides resources on the standard Java libraries in JDK (not XSU) that interact with XML. The Java Web Services Tutorial may be helpful to you.
Java and XML

Other Places to Learn about XML and XML Tools:

WPI RAINBOW project site contains some useful XML + RDB related materials.
Rainbow project site (though not updated).

Good introductory web site about XML:
http://www.w3schools.com/xml/

Another popular xml tool with free trial evaluation:
http://www.xmlspy.com/

Published: from DTD to XML schema:
http://www.w3.org/2000/04/sche ma_hack/

Transforming XML to HTMLd using XSL:
http://www.w3schools.co m/xml/xml_browsers.asp

Various XML tools from IBM
http://www.alphaworks.ibm.com/

IBM XML4J parser (version 3.2.1):
http://www.alphaworks.ibm.co m/tech/xml4j

XQuery tools: Demo systems to try out XQueries on-line.

Software AG's QuiP
http://www.softwareag.com/d eveloper/quip/

Microsoft
http://msdn.microsoft.com/XML/XMLDownloads/

Lucent
http://db.bell-labs.com/galax/

X-Hive
http://www.x-hive.com/xquery

SourceForge
http://sourceforge.net/proje cts/xquench/

Fatdog
http://www.fatdog.com/

Kawa-Query - compiles to Java byte code
http://www.gnu.org/software/kawa/xq uery

Kweelt (Quilt)
Kweelt sourceforge.