Zhongfang Zhuang

Ph.D. Candidate @ Data Science Research Group

Computer Science Department, Worcester Polytechnic Institute

Links:   LinkedIn   Email   PGP


About Me

I am a Ph.D. candidate in the Computer Science Department at WPI. I work in the Data Science Research Group (DSRG) with Professors Elke Rundensteiner and Xiangnan Kong. My research interests include data mining in real-world problem settings, deep learning models, and large-scale data management infrastructures.


Deep Learning On Attributed Sequences

Committee Members

Prof. Elke Rundensteiner, Advisor, WPI
Prof. Xiangnan Kong, Co-advisor, WPI
Prof. Mohamed Eltabakh, WPI
Prof. Philip Yu, University of Illinois at Chicago

* Dissertation Proposal Defense: Nov 30, 2017

Ongoing Project

Fraud Detection in One Shot

Dec 2017 - Now

Fraud detection and prevention have been prevalent topics in various industries. However, the lack of a large number of fraudulent instances poses significant challenges for data mining models. In this project, we aim to solve the problem of "How to detect fraud with only one fraudulent instance using attributed sequences".

Past Projects

Distance Metric Learning on Attributed Sequences

Mar 2017 - Oct 2017

Distance metric learning has attracted much attention in recent years, where the goal is to learn a distance metric based on user feedback. In this project, we study the problem of deep metric learning on attributed sequences. We propose a deep learning framework, called MLAS (Metric Learning on Attributed Sequences), to learn a distance metric that effectively measures dissimilarities between attributed sequences. Empirical results on real-world datasets demonstrate that the proposed MLAS framework significantly improves the performance of metric learning compared to state-of-the-art methods on attributed sequences.

Attributed Sequence Embedding

May 2016 - Feb 2017

The attributed sequence is a generalization of the data models used in many real-world applications, from clickstreams to genes. In this project, we propose a deep multimodal learning framework, called NAS, to produce feature representations of attributed sequences. NAS effectively identifies the dependencies between attributes and sequences. The embeddings are task-independent and can be used for various mining tasks on attributed sequences.

Preference-aware Recurring Query Optimization

PRO is a preference-aware recurring query processing system that produces a recurring execution configuration meeting the application guidelines expressed via preference models. We propose an approach to tackle this maximal preference execution configuration problem using a PRO execution relation graph (ERG) model that effectively incorporates the dependencies between executions. This enables us to transform the problem into the well-known minimum weight length-k path problem, and to design a dynamic-programming-based pseudo-polynomial solution, called PRO-OPT. We also introduce adaptive re-optimization techniques to handle fluctuating stream workloads.

Zhongfang Zhuang, Chuan Lei, Elke A. Rundensteiner, and Mohamed Eltabakh. "PRO: Preference-Aware Recurring Query Optimization." CIKM 2016.

Recurring Query Optimization

Helix is the first scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overhead or unnecessary data scans. It introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction.

Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, and Mohamed Eltabakh. "Shared execution of recurring workloads in MapReduce." VLDB 2015.

Redoop Infrastructure

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. We demonstrate Redoop’s capabilities on a compute cluster with real-life workloads, including clickstream and sensor data analysis.

Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, and Mohamed Eltabakh. "Redoop infrastructure for recurring big data queries." VLDB 2014.

Service Activity

External Reviewer for

EDBT 2014, 2017; VLDB 2015; ICDE 2016; SIGMOD 2015, 2017, 2018


Education

Master of Engineering, Beijing University of Posts and Telecommunications, 2013

Bachelor of Engineering, Xi'an University of Posts and Telecommunications, 2011