In this project, we are addressing a subproblem of the well-known software legacy problem, namely, supporting on-line modification of databases without disturbing existing applications. Our basic methodology for solving this problem is to integrate schema evolution and view support functionalities into one system. Our transparent schema evolution system, TSE, computes a new view schema that reflects the semantics of the desired schema change, and replaces the old view with the new one. The algorithms in TSE realize as view definitions all schema evolution operations typically supported by OODB systems. This represents an important contribution since it demonstrates the feasibility of our approach.
Numerous open research issues remain that we plan to explore in this five-year project, including: developing a framework for transparently realizing 'arbitrarily complex' schema changes with the desired change semantics; developing and comparing different OODB implementation architectures that provide the flexibility necessary for view and schema evolution support; building and experimenting with practical tools that exploit these results; and extending the concepts and tools developed above to federated and multidatabase environments. The tools developed by this project will be of utility to rapidly progressing application domains, such as the Human Genome project, manufacturing systems, and digital library systems, which struggle with the problems of continuously evolving software systems.
The University of Michigan Digital Library (UMDL) project is a large interdisciplinary effort funded by NSF, ARPA, and NASA with the goals of developing a distributed agent-based infrastructure for information retrieval across the Internet, and of employing this technology for educational training in classrooms. In the context of this project, we are focusing on the development of database systems to manage unstructured data types such as SGML documents and video data.
In order to handle the efficient storage and retrieval of unstructured documents, we are exploring the extension of object-oriented database systems to maintain SGML (the standard document markup language) documents and to provide both structured and information-retrieval queries on the hypermedia database system. We have chosen the object-relational system Illustra as the target implementation platform, since it already provides rudimentary pattern matching capabilities. We have designed a suite of abstract data types required to handle textual documents, including optional repeating groups, union types, sequences, and variable-length lists. Exploiting the rule mechanism of Illustra for constraint specification, we have successfully implemented a hypertext database system supporting these customized textual data types.
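The kinds of type constructors listed above can be illustrated with a minimal sketch. The following Python fragment is not the Illustra implementation; the element names and DTD structure are invented for illustration, and each constructor (optional component, union type, sequence, variable-length list) maps onto a standard typing construct.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical element types for an article-style DTD.
@dataclass
class Paragraph:
    text: str

@dataclass
class Figure:
    caption: str

# A union type: a section body element is a paragraph OR a figure,
# mirroring an SGML content model like (para | figure)*.
Block = Union[Paragraph, Figure]

@dataclass
class Section:
    title: str
    blocks: List[Block] = field(default_factory=list)   # repeating group
    subtitle: Optional[str] = None                      # optional component

@dataclass
class Article:
    title: str
    sections: List[Section] = field(default_factory=list)  # sequence of sections

doc = Article("Views in OODBs",
              [Section("Intro", [Paragraph("..."), Figure("Schema graph")])])
```

In an object-relational system such as Illustra, each of these constructors would instead be realized as a custom abstract data type with its own storage and query functions.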
We are investigating schema integration techniques for merging different document type definitions (DTDs) for books, articles, etc. This will result in the integrated maintenance and retrieval of all types of textual documents in one unified system. The long-term goal of this research effort is to achieve a successful marriage of efficient (but structured) database technology with flexible (but often slow) information retrieval techniques.
We are also investigating the development of the conspectus database for the UMDL system, the unified directory service maintaining information about all information sources in the system. Such meta-data includes the number, type, and content of collections, query and other capabilities, as well as the IP-address locations of the information sources. Given that such a database will be the bottleneck to the functioning of the overall system, we are exploring efficient storage structures, backup support, distributed database techniques, and information retrieval algorithms. We are also exploring caching policies for physically distributing as well as continuously updating these information source descriptions; these are required to handle evolution in the system as new sources appear, old ones go out of scope, and existing ones revise their content.
In this project, we are developing state-of-the-art map database technology to address the data management needs of ITS applications. Current GIS systems are designed for static maps, typically assuming no or only very infrequent changes. In ITS, we must address the problem of frequent and often real-time changes to the map model in terms of traffic volume, incidents, etc. Current relational DBMS products are designed for SQL-style applications, and the path queries (route computations) required for ITS are extremely inefficient on them. Our goal is to address these problems by developing object-oriented map data technology customized to ITS applications.
Our efforts have focused on the efficient path computation necessary for route guidance, one of the key requirements for ITS. While other ITS database systems typically use search algorithms (e.g., heuristic A*) to provide compute-on-demand path finding, we have developed a semi-materialized path view approach that precomputes optimal paths. Advantages of our approach are that (1) route computation is very efficient and no longer dependent on the number of route requests per time period, and (2) the storage overhead is minimal compared to a full enumeration of all possible paths. Our algorithms incrementally update the semi-materialized path view structure in response to weight changes on the traffic links of the underlying (possibly cyclic) network. We have extended the semi-materialized path view approach to hierarchical route guidance. This hierarchical extension yields a significant reduction in both view storage costs and update times, without any noticeable penalty in path retrieval speed. Our experimental results measuring storage requirements, route retrieval and path view update times, and route accuracy demonstrate the potential of our approach.
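The contrast between search-on-demand and a precomputed path view can be sketched in a few lines. The fragment below is illustrative only: it builds a fully materialized next-hop view with Floyd-Warshall and rebuilds it on any change, whereas the semi-materialized approach described above stores less and updates incrementally. Node numbering and edge weights are invented.

```python
INF = float("inf")

def build_path_view(n, edges):
    """Precompute a next-hop path view for a directed network.
    edges: dict mapping (u, v) -> link weight."""
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), w in edges.items():
        dist[u][v] = w
        nxt[u][v] = v
    # Floyd-Warshall: relax every pair through every intermediate node.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def route(nxt, src, dst):
    """Retrieve a path by chaining next hops -- no search at query time."""
    if nxt[src][dst] is None:
        return None
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path

# Toy traffic network: 4 intersections, weighted directed links.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0, (2, 3): 1.0}
dist, nxt = build_path_view(4, edges)
```

The point of the view is visible in `route`: answering a route request is a simple chain of lookups, so its cost does not grow with the number of concurrent requests, at the price of keeping the view consistent when link weights change.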
In continuation of this project, we are exploring the implementation of our map and route management approach on commercial DBMS platforms, conducting experimental studies of persistent implementations versus main-memory alternatives, and utilizing object-oriented spatio-temporal models in order to incorporate expected traffic behavior into path planning.
Due to their limited data modeling power and the view update ambiguity problem, relational views have been found to be of limited use for engineering applications. Views in object-oriented databases, on the other hand, will play an important role in defining customized interfaces for engineering applications, since they address both shortcomings. Our goal is to characterize the issues associated with supporting views in the context of the more powerful (but hence also more complex) object-oriented data model, to solve a subset of these research issues, and to build tools to experiment with different view specification and maintenance approaches.
Objectives of this research include the development of a theoretical foundation for views and of efficient algorithms for view implementation and query processing, as well as their application to tool interfacing and heterogeneous system integration. The foundation of this research is our object-oriented view methodology, called MultiView. MultiView is unique in that it focuses on the specification of a complete and consistent view schema graph rather than of individual view classes. One key contribution of MultiView is the classification of all virtual classes into the global schema graph, allowing for property inheritance among both base and virtual classes.
We have completed the implementation of a working MultiView prototype on top of the commercial OODB system GemStone. This required the redesign of a suitable object model for view support, for which we employed the basic principles of an object-slicing architecture. This prototype provides us with a practical testbed for experimenting with different update and query processing algorithms. Our initial results for view materialization algorithms for object-preserving view classes, a largely unexplored area in the context of OODB systems, demonstrate that consistent updates can be achieved in time independent of the number of objects. Our research continues to tackle the following issues: strategies for caching versus view query substitution, efficient reclassification of objects, dynamic method resolution, and automatic view update techniques.
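The idea behind object-preserving view maintenance can be conveyed with a small sketch. This is not MultiView's actual API; the class names and predicate are invented. The essential points are that view members are the base objects themselves (same identity, no copies), and that an update reclassifies only the single modified object, so maintenance cost does not depend on the total number of objects in the database.

```python
class VirtualClass:
    """An object-preserving virtual class: its extent is a cached set of
    base objects selected by a membership predicate."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.members = set()   # stores the base objects themselves

    def classify(self, obj):
        """Reclassify one object after it is created or updated --
        constant work per update, independent of the extent size."""
        if self.predicate(obj):
            self.members.add(obj)
        else:
            self.members.discard(obj)

class Part:
    def __init__(self, name, cost):
        self.name, self.cost = name, cost

# Hypothetical virtual class over a base class of parts.
expensive_parts = VirtualClass(lambda p: p.cost > 100)

p = Part("gear", 150)
expensive_parts.classify(p)   # p enters the view
p.cost = 80
expensive_parts.classify(p)   # the same object is reclassified out
```

In a full system the `classify` call would be triggered automatically by the update machinery rather than invoked by hand, and a single update may reclassify the object against many virtual classes in the schema graph.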
We are also exploring the extension of MultiView to handle more complex transformations required by ECAD applications, such as flattening hierarchical design graphs and sequence-to-graph mappings. These extended requirements necessitate a reexamination of solutions for the supported view definition language, the materialization strategies, the update policies, as well as general models for distributed system update contracts.
We are investigating the development of tools for annotating and analyzing video data in three dimensions by using spatial and temporal context. We have developed a general yet simple graphical query language for specifying relative temporal queries between sets of annotations. This work is a direct extension of the dynamic queries approach for Visual Information Seeking Environments first proposed by Shneiderman's research group. We are now extending the graphical query language to handle relative spatial and motion queries. We are building a prototype interactive visualization environment in which users can not only query video data using temporal-spatial characteristics, but also review continuous visual output for trend analysis of query results. This allows users to explore temporal and/or spatial relationships between different types of events within the video data.
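Relative temporal queries of this kind can be grounded in interval predicates in the style of Allen's interval relations. The sketch below is illustrative (the project's graphical query language is not shown), and the annotation names and frame numbers are invented.

```python
# Annotations are (start, end) frame intervals; three classic relative
# temporal predicates between two intervals a and b.
def before(a, b):
    return a[1] < b[0]           # a ends before b starts

def overlaps(a, b):
    return a[0] < b[0] < a[1] < b[1]   # a starts first, b starts inside a

def during(a, b):
    return b[0] < a[0] and a[1] < b[1]  # a lies strictly inside b

# Hypothetical annotations over a video stream.
pass_event = (10, 25)
shot_event = (20, 40)
game_clip  = (0, 100)

# A relative temporal query: all pass events that overlap a shot event.
matches = [(x, y) for x in [pass_event] for y in [shot_event] if overlaps(x, y)]
```

A graphical front end would let users draw the relative positioning of two interval sets instead of naming the predicate, and then evaluate the corresponding filter over all annotation pairs.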
This project is concerned with developing and customizing database technology for manufacturing environments. Feedback and feed-forward control have been identified as important features of semiconductor manufacturing facilities of the future. In feed-forward control, deviations in the processing of a wafer in one step are compensated for by adjusting the processing in some or all of the steps yet to be carried out on that wafer. In feedback control, a cell which has already processed a wafer receives advice from another cell to adjust its model for better processing in future runs.
However, the issue of providing enabling mechanisms for inter-cell feed-forward and feedback control has not been adequately addressed. In this project, we aim to fill this gap by developing software tools to provide an enabling environment for inter-cell control. The software enabler being developed utilizes an active object-oriented database that allows rule definitions to facilitate this type of control. The controller is being designed to allow easy incorporation into existing manufacturing environments.
Since the semiconductor manufacturing process is not very well understood, and in many cases measurements cannot be translated into precise actions, the active database of the controller needs to allow the definition of fuzzy rules. To add this capability to active databases, our research is focused on developing models for fuzzy data and fuzzy rules, developing techniques for efficient query processing on fuzzy data, and developing models for the definition and detection of composite events for fuzzy rules.
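A fuzzy rule over process measurements can be sketched as follows. This is a generic fuzzy-logic illustration, not the controller's actual rule model: the variable names, membership thresholds, and advised action are all invented. A trapezoidal membership function grades each measurement, and a rule fires with the minimum membership of its conditions.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical fuzzy sets over two process measurements.
def etch_rate_high(rate):
    return trapezoid(rate, 50.0, 60.0, 80.0, 90.0)

def pressure_low(p):
    return trapezoid(p, -1.0, 0.0, 5.0, 10.0)

def rule_reduce_power(rate, p, threshold=0.5):
    """IF etch_rate IS high AND pressure IS low THEN advise lowering RF power.
    AND is taken as the minimum of the condition memberships; the rule
    fires when the combined strength reaches the threshold."""
    strength = min(etch_rate_high(rate), pressure_low(p))
    return strength, strength >= threshold
```

In an active database setting, such a rule would be attached to update events on the measurement data, with the firing strength guiding how strongly the feed-forward adjustment is applied.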
This work is being carried out in collaboration with the Center for Display Technology and Manufacturing at the University of Michigan, and will complement the research previously carried out on the design and development of a Generic Cell Controller (GCC) at the University of Michigan. The GCC controls semiconductor manufacturing equipment to carry out requested processes and performs discrete control on these processes. At present, no information sharing takes place between multiple GCCs controlling various pieces of equipment, or between a GCC and controllers at a higher level. The inter-cell controller being developed will use the processing information from the various GCCs to generate inter-cell feed-forward and feedback advice. After development, the inter-cell controller is planned to be tested on a two-chamber Plasma Therm Cluster Tool.
In summary, this research effort will produce a framework and mechanism for inter-cell control in manufacturing facilities that is generic, portable, flexible, and adaptable, and that can be easily assimilated by existing facilities. This research will advance the state of the art in computer science by contributing models for handling fuzzy data and rules in active object-oriented databases, and methods for efficient query processing on fuzzy data.
Manufacturing automation applications, such as machine tool controllers, have become more sophisticated in recent years by capitalizing on the technological progress made in the field of computing. However, problems still remain in terms of life-cycle cost and lack of openness in commercially available products. There is a general consensus that these applications should have a modular architecture and well-defined interfaces that allow third parties to develop and use these modules independently. Modules can be either hardware or software. The modules may be selected based on price and/or performance, while meeting the constraints of the application. Modular manufacturing applications require a built-in database management system (DBMS) to support concurrent data access and provide well-defined interfaces between different software entities (tasks, modules, etc.). They typically are subject to a range of timing constraints, which require the DBMS to provide timing guarantees, sometimes under complex conditions.
The objective of our research is to develop a high performance real-time object-oriented DBMS that is suitable for manufacturing applications. We define a real-time object model that explicitly captures important characteristics of RTDB applications, especially in the manufacturing application domain, namely, timing constraints and performance polymorphism. It uses specialization dimensions to model timing specifications and letter class hierarchies to capture performance polymorphism. The timing constraint of a task refers to the deadline by which the task must be completed. Performance polymorphism refers to the concept of maintaining and selecting among multiple implementations of a method (body) that carry out the same task and differ only in their performance measures, such as execution time, memory space, system configuration, result precision, and so on. Although regular object-oriented programming techniques (e.g., composite object classes) may be used to implement the proposed concepts, they neither explicitly capture these concepts nor provide a mechanism to enforce them.
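The concept of performance polymorphism can be illustrated with a small dispatch sketch. This is not the proposed real-time object model itself; the class, method names, and timing figures are invented. Several implementations of the same operation are registered with estimated execution times and precision measures, and the dispatcher selects the most precise variant whose estimate fits the task's deadline.

```python
class PerfPolyMethod:
    """A method with multiple implementations differing only in their
    performance measures (estimated execution time, result precision)."""
    def __init__(self):
        self.variants = []   # list of (estimated_ms, precision, callable)

    def register(self, estimated_ms, precision, fn):
        self.variants.append((estimated_ms, precision, fn))

    def call(self, budget_ms, *args):
        # Keep only variants whose time estimate fits the deadline budget.
        feasible = [v for v in self.variants if v[0] <= budget_ms]
        if not feasible:
            raise RuntimeError("no variant fits the deadline")
        # Among those, run the most precise one.
        _, _, fn = max(feasible, key=lambda v: v[1])
        return fn(*args)

# Hypothetical sensor-smoothing operation with two implementations.
smooth = PerfPolyMethod()
smooth.register(1.0, 1, lambda xs: xs[-1])                # crude: latest reading
smooth.register(10.0, 3, lambda xs: sum(xs) / len(xs))    # finer: running average
```

In the real-time object model described above, this selection would be driven by the specialization dimensions and letter class hierarchy rather than a runtime list, and the time estimates would be enforced rather than merely consulted.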
A real-time system attempts to satisfy the timing requirements of as many tasks as possible, given that all hard real-time tasks are accommodated. If a system can only make hard deadline guarantees, some application requirements may not be met and the computational resources will be under-utilized. In such cases, a real-time system with both hard and probabilistic guarantees is needed in order to satisfy the requirements of as many tasks as possible. We are investigating approaches to make probabilistic, as well as hard, deadline guarantees.
It has been shown that database schemata often experience considerable changes during the development and initial use phases of database systems for advanced applications, mostly due to changing requirements. An automated schema evolution system can significantly reduce the amount of work and potential errors related to schema changes. Although schema evolution for non-real-time databases was the subject of some previous research, its impact on real-time database systems remains unexplored. We re-evaluate previous (non-real-time) schema evolution support in the context of RTDBs, which results in several modifications to the semantics of schema changes and to the required schema change resolution rules and schema invariants. Furthermore, we expand the schema change framework with new constructs, including new schema change operators, resolution rules, and invariants, for handling additional features of the real-time object model.
In collaboration with the Medical Human Genome Center at the University of Michigan, we have completed the design and implementation of a scientific database for the map assembly tasks performed by the geneticists at the Center. Our system manages complex genomic data and supports the automation of the associated map assembly tasks. For this purpose, we have developed an overlap refinement hierarchy characterizing the types of overlap and ordering relationships between DNA fragments. Based on this model, we designed an associated set of inference operators to automate some of the contig assembly steps, such as inferring overlap information using transitivity rules. In order to realize this map inferencing approach and to take advantage of the modeling power offered by object-oriented database technology, we have developed an active object-oriented database (OODB) system, called Crystal, on top of the GemStone OODB. Crystal seamlessly integrates rule inferencing with object modeling and other typical database capabilities. We have completed the implementation of the physical map assembly (PMA) tool on top of Crystal.
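One such transitivity-based inference operator can be sketched in a few lines. This is a simplified illustration, not Crystal's rule engine: the overlap refinement hierarchy distinguishes many relationship types, whereas the sketch propagates only a single "contains" relation, with fragment names invented.

```python
def infer_contains(facts):
    """facts: set of (a, b) pairs meaning fragment a contains fragment b.
    Returns the transitive closure: if a contains b and b contains c,
    infer that a contains c."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (b2, c) in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return closure

# Hypothetical lab observations over three DNA fragments.
facts = {("F1", "F2"), ("F2", "F3")}
inferred = infer_contains(facts)   # adds ("F1", "F3")
```

In the active database, such a rule fires incrementally when new overlap facts are entered, rather than recomputing the closure from scratch as this naive fixpoint loop does.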
We are planning to continue this effort by investigating optimization strategies for our working active OODB system, in particular incremental condition evaluation and the indexing of rules and data, and by building tools to support truth maintenance directly within the object-oriented system.