Distance Metric Learning on Attributed Sequences
Mar 2017 - Oct 2017
Distance metric learning, which aims to learn a distance metric from user feedback, has attracted much attention in recent years. In this project, we study the problem of deep metric learning on attributed sequences. We propose a deep learning framework, called MLAS (Metric Learning on Attributed Sequences), to learn a distance metric that effectively measures dissimilarities between attributed sequences. Empirical results on real-world datasets demonstrate that MLAS significantly improves metric learning performance on attributed sequences compared to state-of-the-art methods.
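As a rough illustration of learning a distance from feedback, the sketch below computes a standard contrastive loss on a pair of embedding vectors labeled similar or dissimilar. It is not the MLAS objective or architecture: the pairwise form of the feedback, the Euclidean distance, the `margin` parameter, and the function names are all assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Contrastive loss for one labeled pair of embeddings (illustrative only).

    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only when they are closer than `margin`.
    """
    d = euclidean(u, v)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Minimizing this loss over many labeled pairs pulls similar items together and pushes dissimilar items at least `margin` apart in the embedding space.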
Attributed Sequence Embedding
May 2016 - Feb 2017
Attributed sequences generalize the data models found in many real-world applications, from clickstreams to genes. In this project, we propose a deep multimodal learning framework, called NAS, to produce feature representations of attributed sequences. NAS effectively identifies the dependencies between attributes and sequences. The embeddings are task-independent and can be used in various mining tasks on attributed sequences.
Preference-aware Recurring Query Optimization
PRO is a preference-aware recurring query processing system that produces a recurring execution configuration meeting the application guidelines expressed via preference models. We tackle this maximal preference execution configuration problem using a PRO execution relation graph (ERG) model that effectively incorporates the dependencies between executions. This model lets us transform the problem into the well-known minimum weight length-k path problem and design a dynamic-programming-based pseudo-polynomial solution, called PRO-OPT. We also introduce adaptive re-optimization techniques to handle fluctuating stream workloads.
Zhongfang Zhuang, Chuan Lei, Elke A. Rundensteiner, and Mohamed Eltabakh. "PRO: Preference-Aware Recurring Query Optimization." CIKM 2016.
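The minimum weight length-k path formulation used by PRO can be illustrated with a small dynamic program. The sketch below is not PRO-OPT: it assumes the graph is a directed acyclic graph given as a hypothetical list of `(u, v, weight)` edges, so every walk with k edges is also a path, and it ignores how the ERG and its weights are actually derived from preference models.

```python
import math


def min_weight_k_edge_path(num_nodes, edges, k):
    """Return (weight, path) for a minimum-weight path with exactly k edges.

    `edges` is a list of (u, v, weight) tuples over nodes 0..num_nodes-1.
    Assumes an acyclic graph (illustrative assumption). Returns
    (math.inf, []) when no path with k edges exists.
    """
    if num_nodes == 0:
        return math.inf, []

    # best[j][v]: minimum weight of a path with exactly j edges ending at v.
    best = [[math.inf] * num_nodes for _ in range(k + 1)]
    prev = [[None] * num_nodes for _ in range(k + 1)]
    for v in range(num_nodes):
        best[0][v] = 0.0  # a zero-edge path may start at any node

    for j in range(1, k + 1):
        for u, v, w in edges:
            cand = best[j - 1][u] + w
            if cand < best[j][v]:
                best[j][v] = cand
                prev[j][v] = u

    end = min(range(num_nodes), key=lambda v: best[k][v])
    if best[k][end] == math.inf:
        return math.inf, []

    # Walk the predecessor table backwards to recover the path.
    path = [end]
    for j in range(k, 0, -1):
        path.append(prev[j][path[-1]])
    path.reverse()
    return best[k][end], path


# Hypothetical example graph.
example_edges = [(0, 1, 4), (0, 2, 1), (1, 3, 3), (2, 3, 5), (2, 1, 1), (3, 4, 2)]
weight, path = min_weight_k_edge_path(5, example_edges, 3)
```

The table has (k + 1) rows and each row scans every edge once, so this simplified version runs in O(k · |E|) time.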
Recurring Query Optimization
Helix is the first scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overhead or unnecessary data scans. It introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them that maximizes SLA satisfaction.
Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, and Mohamed Eltabakh. "Shared execution of recurring workloads in MapReduce." VLDB 2015.
This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. We demonstrate Redoop's capabilities on a compute cluster with real-life workloads, including clickstream and sensor data analysis.
Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, and Mohamed Eltabakh. "Redoop infrastructure for recurring big data queries." VLDB 2014.