
My Research Areas
Adaptive Distributed Continuous Query Processing.
Currently, I am working on distributed and parallel continuous query processing. In particular, I address the scalability concerns of complex continuous query processing. I am studying the impact of initial query plan distribution, dynamic adaptation policies, and query plan adaptations on parallel continuous query processing performance. The proposed techniques are designed and evaluated in the D-Cape system, a large-scale continuous query engine developed at the WPI DSRG Lab.

Scalable ETL and Data Warehousing.
Project Objectives:
Data warehousing technology involves first gathering data from heterogeneous and autonomous data sources and then providing an integrated, typically aggregated, view of that data to end users. Such technology is critical not only for effective decision support scenarios, but also for more modern emerging applications such as e-business and the semantic web. Two trends further stress existing technology: 1) the data that needs to be integrated from different data sources keeps growing, and 2) the distributed data sources are increasingly dynamic, undergoing frequent updates. These trends pose challenges for developing scalable technologies for both loading and updating data warehouse data.

In this project, we propose to study the following two aspects related to the above challenges. 1) Develop scalable and reliable ETL techniques to extract large-scale data from distributed data sources, transform it and then load it into a remote database, often a data warehouse. 2) Develop high-performing maintenance algorithms to incorporate the changes of dynamic sources into the remote view to assure view freshness.

These two objectives are closely interrelated, since both aim to incorporate data from the data sources (either delta changes or the whole data) into a typically remote database. For both scenarios, we intend to study, adopt, and adapt core database technologies as much as possible.

Concrete tasks for this project include, for example:
  • Model the cost of ETL operators and the resources of the processing sites available for distributed ETL processing.
  • Partition an ETL transformation plan to allocate particular operators to different processing sites to exploit scalable distributed query processing features.
  • Develop a cost model to guide the dynamic re-allocation of ETL operators for adaptive ETL support.
  • Efficiently resume an interrupted ETL process after the failure of one of the processing sites.
  • Unify and apply the techniques developed in ETL to the view maintenance task, and vice versa.
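
The first two tasks above (modeling operator cost and site resources, then allocating operators to sites) can be illustrated with a minimal sketch. It is not the project's actual cost model or allocation algorithm: the `Site` class, the `allocate` function, the operator names, and all cost and capacity figures are hypothetical, and the greedy rule deliberately ignores the cost of shipping data between sites.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A processing site with a hypothetical capacity in abstract cost units."""
    name: str
    capacity: float
    load: float = 0.0
    operators: list = field(default_factory=list)

    def utilization_with(self, cost: float) -> float:
        # Utilization of this site if an operator of the given cost were added.
        return (self.load + cost) / self.capacity


def allocate(operator_costs: dict, sites: list) -> dict:
    """Greedy allocation: place the most expensive operators first, each on
    the site whose utilization after placement would be lowest."""
    ordered = sorted(operator_costs.items(), key=lambda kv: kv[1], reverse=True)
    for op, cost in ordered:
        target = min(sites, key=lambda s: s.utilization_with(cost))
        target.load += cost
        target.operators.append(op)
    return {s.name: s.operators for s in sites}


# Hypothetical per-operator costs and site capacities (assumed values).
costs = {"extract": 4.0, "clean": 2.0, "join": 6.0, "aggregate": 3.0, "load": 1.0}
sites = [Site("site_a", capacity=10.0), Site("site_b", capacity=5.0)]
plan = allocate(costs, sites)
print(plan)
```

Re-running `allocate` with updated cost estimates would be one simple way to approximate dynamic re-allocation; a realistic cost model would also have to account for operator migration and network transfer costs.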

View Maintenance in Dynamic Environments
I have been involved in the materialized view maintenance area since Jan. 2001. I worked on developing optimization strategies to improve incremental view maintenance performance. A batch view maintenance strategy capable of handling both source schema changes and data updates was proposed. A grouping view maintenance strategy was proposed to further optimize view maintenance performance when maintaining large batches of source updates. A two-layered view maintenance optimization framework has also been implemented on the TxnWrap testbed.
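
As a rough illustration of why grouping a batch of source updates can help, the sketch below maintains a materialized SUM-per-key view incrementally. It is only a generic example under stated assumptions, not the strategy implemented on TxnWrap: the update format `(op, key, amount)`, the function names, and the sample data are hypothetical, and schema changes are not handled at all.

```python
from collections import defaultdict


def net_deltas(batch):
    """Collapse a batch of (op, key, amount) source updates into one net
    change per key, so each view row is touched at most once."""
    net = defaultdict(float)
    for op, key, amount in batch:
        net[key] += amount if op == "insert" else -amount
    return net


def maintain(view, batch):
    """Apply the net changes to a materialized SUM-per-key view in place,
    without recomputing the view from the sources."""
    for key, change in net_deltas(batch).items():
        view[key] = view.get(key, 0.0) + change
    return view


# Hypothetical view contents and update batch (assumed values).
view = {"east": 100.0, "west": 40.0}
batch = [
    ("insert", "east", 10.0),
    ("delete", "east", 10.0),   # cancels the previous insert
    ("insert", "west", 5.0),
    ("insert", "north", 7.0),   # creates a new group in the view
]
maintain(view, batch)
print(view)
```

A fuller version would also track a row count per group so that a group can be dropped from the view once all of its source rows are deleted.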
     
Artificial Neural Networks
I was involved in research on artificial neural networks from Sep. 1996 to Mar. 1999, while I was at Beijing Institute of Technology, Beijing, China. As a member of the research team there, I took part in developing a neural network simulation environment and also worked on object identification using neural network algorithms.