AIRG Topics - Spring 2001

Our group meets on Thursdays at 11:00 a.m., FL 246.

Dates and topics for this semester are as follows:

Jan 18
AIRG/DKBRG Organizational Meeting (Coordinators: DCB & EAR)

Jan 25
"Automated Program Synthesis"
Dr. Elaine Kant
Schlumberger Lab for Computer Science

Feb 1
Lee Becker ( paper available online )
Discussion of:
    Vasant Dhar, Dashin Chou, Foster Provost.
    "Discovering Interesting Patterns for Investment Decision Making with GLOWER - A Genetic Learner Overlaid With Entropy Reduction",
    to appear in the journal Data Mining and Knowledge Discovery.

Feb 8
Xin Zhang [DKBRG]
"Clock: Synchronizing Internal Relational Storage with External XML Documents"
Xin Zhang, Gail Mitchell, Wang-chien Lee, Elke Rundensteiner,
Verizon Communications and Worcester Polytechnic Institute.
    In many business settings, a relational database system (RDBMS) will serve as the storage manager for data from XML documents. In such a system, once the XML data is disassembled and loaded into the storage system, XML queries posed against the (virtual) XML documents are processed by translating them into SQL queries against the relational storage. However, for applications that frequently update their XML documents, we cannot afford to reload a complete, possibly large, document for each update; instead, we must be able to incrementally propagate document updates to the stored XML data. In this paper, we address the issue of correctly reflecting updates of external XML documents into the loaded XML data in a relational database system. We describe Clock, a framework for synchronizing the relational storage with updated XML documents by exploiting a metadata-driven technology. First, we propose a set of (DTD preserving) update primitives for XML documents. Second, based on the mapping between XML and the relational model, we describe the propagation of those update primitives. Validation of the updates ensures they will not violate the constraints specified by the DTD. We have implemented a working prototype of the Clock system using IBM's XML4J parser, JDBC 2 and Oracle 8i. We report on preliminary experiments conducted using this prototype to analyze our algorithms in a document update setting.

Feb 15
Advising Appointment Day: No meeting

Feb 22
Dave Brown ( paper available online )
Discussion of:
    Greg A. Keim, Noam Shazeer, Michael L. Littman, Sushant Agarwal, Catherine M. Cheves, Joseph Fitzgerald, Jason Grosland, Fan Jiang, Shannon Pollard, and Karl Weinmeister. Proverb: The probabilistic cruciverbalist, Proc. AAAI'99

Mar 1
Andreas Koeller (WPI CS)
"Meta data discovery"
PhD progress presentation
    The integration of data from different sources requires knowledge about the information that each source provides. While it is usually trivial to query the schema of each source, the extent of a source's data is often unknown. We are looking for methods to determine the information content of information sources through querying. Techniques include ontology-based reasoning over the schema names of a source, the use of database statistics (relation sizes, value distribution) and probabilistic methods to infer source properties from samples of their data. Sampling can be used to discover overlaps of information across sources, which can be helpful in data integration projects. We give an overview of techniques and focus in particular on the theory and practice of sampling.

Mar 8
Term break: No meeting

Mar 15
Mark Claypool
"A Research Mega-Byte on Information Filtering"

Mar 22 --

Mar 29
Janet Burge
"Non-functional Requirements: Fact or Fiction?"

Apr 5
Mike Sao Pedro [AIRG/DKBRG]
"Using Association Rules for Recommendation"

Apr 12
Hong Su [DKBRG]

Apr 19
Project Presentation Day: No meeting

Apr 26
Chris Shoemaker [AIRG/DKBRG]
"Set-Based Association Rules"
(MS Thesis presentation)

