Our group meets on Thursdays at 11:00 a.m., FL 246.
Dates and topics for this semester are as follows:
- Jan 18
- AIRG/DKBRG Organizational Meeting (Coordinators: DCB & EAR)
- Jan 25
- Video:
"Automated Program Synthesis"
Dr. Elaine Kant
Schlumberger Lab for Computer Science
- Feb 1
- Lee Becker (paper available online)
Discussion of:
Vasant Dhar, Dashin Chou, Foster Provost.
"Discovering Interesting Patterns for Investment Decision Making with
GLOWER - A Genetic Learner Overlaid With Entropy Reduction",
to appear in the journal Data Mining and Knowledge Discovery.
- Feb 8
- Xin Zhang [DKBRG]
"Clock: Synchronizing Internal Relational Storage with External XML Documents"
Xin Zhang, Gail Mitchell, Wang-chien Lee, Elke Rundensteiner,
Verizon Communications and Worcester Polytechnic Institute.
In many business settings, a relational database system (RDBMS)
will serve as the storage manager for data from XML documents. In
such a system, once the XML data is disassembled and loaded into the
storage system, XML queries posed against the (virtual) XML
documents are processed by translating them into SQL queries
against the relational storage. However, for applications which
frequently update their XML documents, we cannot afford to reload
a complete, possibly large, document for each update; instead, we
must be able to incrementally propagate document updates to the
stored XML data. In this paper, we address the issue of correctly
reflecting updates of external XML documents into the loaded XML
data in a relational database system. We describe Clock, a
framework for synchronizing the relational storage with updated
XML documents by exploiting a metadata-driven technology. First,
we propose a set of (DTD preserving) update primitives for XML
documents. Second, based on the mapping between XML and the
relational model, we describe the propagation of those update
primitives. Validation of the updates ensures they will not
violate the constraints specified by the DTD. We have implemented
a working prototype of the Clock system using IBM's XML4J
parser, JDBC 2 and Oracle 8i. We report on preliminary experiments
conducted using this prototype to analyze our algorithms in a
document update setting.
- Feb 15
- Advising Appointment Day: No meeting
- Feb 22
- Dave Brown (paper available online)
Discussion of:
Greg A. Keim, Noam Shazeer, Michael L. Littman, Sushant Agarwal,
Catherine M. Cheves, Joseph Fitzgerald, Jason Grosland, Fan
Jiang, Shannon Pollard, and Karl Weinmeister.
"Proverb: The Probabilistic Cruciverbalist", Proc. AAAI'99.
- Mar 1
- Andreas Koeller (WPI CS)
"Meta data discovery"
PhD progress presentation
The integration of data from different sources requires knowledge
about the information that each source provides. While it is usually
trivial to query the schema of each source, the extent of a source's
data is often unknown. We are looking for methods to determine the
information content of information sources through querying.
Techniques include ontology-based reasoning over the schema names of a
source, the use of database statistics (relation sizes, value
distribution) and probabilistic methods to infer source properties
from samples of their data. Sampling can be used to discover overlaps
of information across sources, which can be helpful in data
integration projects. We give an overview of techniques and focus
in particular on the theory and practice of sampling.
- Mar 8
- Term break: No meeting
- Mar 15
- Mark Claypool
"A Research Mega-Byte on Information Filtering"
- Mar 22
- Postponed
- Mar 29
- Janet Burge
"Non-functional Requirements: Fact or Fiction?"
- Apr 5
- Mike Sao Pedro [AIRG/DKBRG]
"Using Association Rules for Recommendation"
- Apr 12
- Hong Su [DKBRG]
TBA
- Apr 19
- Project Presentation Day: No meeting
- Apr 26
- Chris Shoemaker [AIRG/DKBRG]
"Set-Based Association Rules"
(MS Thesis presentation)