
Abhishek Mukherji

Worcester Polytechnic Institute, MA

Publications
Abhishek Mukherji, Elke A. Rundensteiner, Matthew O. Ward: Achieving High Freshness and Optimal Throughput in CPU-Limited Execution of Multi-Join Continuous Queries, BNCOD 2011: 48-65.
abstract..
Due to high data volumes and unpredictable arrival rates, continuous query systems processing expensive queries in real-time may fail to keep up with the input data streams, resulting in buffer overflow and uncontrolled loss of data. In this work, we investigate the applicability of join direction adaptation (JDA) to tackle resource-limited processing of multi-join stream queries. The state-of-the-art JDA solutions focus only on individual operators. While they allocate the scarce CPU resources to the most productive half-way join within the operator, we instead leverage the operator interdependencies to optimize the overall query throughput. We identify result staleness as an impending issue in resource-limited processing, which gets further aggravated by application of the throughput optimizing techniques. We propose the path-productivity model for throughput optimization. We also extend the model to enforce the freshness requirements that our system allows the users to specify. We show that the twin problems of throughput optimization and freshness fulfillment can be mapped to the knapsack and the set-cover problems, respectively. Based on this insight, we propose our JAQPOT approach, the first integrated solution to achieve near optimal query throughput while also guaranteeing fulfillment of the desired result freshness. JAQPOT runs in time quadratic in the number of streams, irrespective of the query plan shape. Our experimental study, using both synthetic and real data sets, demonstrates that JAQPOT achieves 2 to 6 times higher throughput than the state-of-the-art strategies while, unlike the other methods, also fulfilling freshness predicates.
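The two classical problems named in the abstract can be sketched in a few lines. The following is a minimal illustration of textbook 0/1 knapsack (dynamic programming) and greedy set cover, not the JAQPOT algorithm itself. The path names, CPU costs, productivity values, and the idea of "streams covered by a path" are hypothetical placeholders chosen only for the example.

```python
def knapsack(items, budget):
    """Textbook 0/1 knapsack by dynamic programming.

    items:  list of (name, cost, value) with non-negative integer costs
            (hypothetical: a join path, its CPU cost, its productivity).
    budget: integer capacity (hypothetical: available CPU units).
    Returns (best_value, chosen_names).
    """
    # best[c] holds the best (value, chosen names) with total cost <= c.
    best = [(0, [])] * (budget + 1)
    for name, cost, value in items:
        # Walk the budget downward so each item is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = best[c - cost][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [name])
    return best[budget]


def greedy_set_cover(universe, subsets):
    """Textbook greedy set cover (an approximation, not an exact solver).

    universe: iterable of elements to cover (hypothetical: input streams).
    subsets:  dict mapping a name to the set of elements it covers.
    Returns the list of chosen subset names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        name = max(subsets, key=lambda n: len(subsets[n] & uncovered))
        if not subsets[name] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(name)
        uncovered -= subsets[name]
    return chosen


if __name__ == "__main__":
    paths = [("A->B", 3, 40), ("B->C", 4, 50), ("C->A", 2, 30)]
    print(knapsack(paths, budget=5))
    cover = {"p1": {1, 2, 3}, "p2": {2, 4}, "p3": {4, 5}}
    print(greedy_set_cover({1, 2, 3, 4, 5}, cover))
```

In this sketch the knapsack routine picks the most valuable set of paths that fits the budget, while the greedy cover picks paths until every element is touched at least once. How the paper combines the two is not shown here.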
Abhishek Mukherji, Elke A. Rundensteiner, David C. Brown, Venkatesh Raghavan: SNIF TOOL: sniffing for patterns in continuous streams, CIKM 2008: 369-378.
PDF | PPT | abstract..
Continuous time-series sequence matching, specifically, matching a numeric live stream against a set of predefined pattern sequences, is critical for domains ranging from fire spread tracking to network traffic monitoring. While several algorithms exist for similarity matching of static time-series data, matching continuous data poses new, largely unsolved challenges including online real-time processing requirements and system resource limitations for handling infinite streams. In this work, we propose a novel live stream matching framework, called n-Snippet Indices Framework (in short, SNIF), to tackle these challenges. SNIF employs snippets as the basic unit for matching streaming time-series. The insight is to perform the matching at two levels of granularity: bag matching of subsets of snippets of the live stream against prefixes of the patterns, and order checking for maintaining successive candidate snippet bag matches. We design a two-level index structure, called SNIF index, which supports these two modes of matching. We propose a family of online two-level prefix matching algorithms that trade off between result accuracy and response time. The effectiveness of SNIF to detect patterns has been thoroughly tested through experiments using real datasets from the domains of fire monitoring and sensor motes. In this paper, we also present a study of SNIF's performance, accuracy and tolerance to noise compared against those of the state-of-the-art Continuous Query with Prediction (CQP) approach.
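The two levels of matching described in the abstract can be illustrated with a small n-gram style index. This is only a rough sketch under strong assumptions, not the SNIF index or its algorithms: it assumes the numeric stream has already been discretized into small integer symbols, uses exact snippet equality instead of similarity matching, and all function names and the `min_hits` threshold are hypothetical.

```python
from collections import defaultdict


def snippets(seq, n):
    """All length-n contiguous snippets of seq, in order of appearance."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_index(patterns, n):
    """Inverted index: snippet -> names of the patterns that contain it."""
    index = defaultdict(set)
    for name, seq in patterns.items():
        for s in snippets(seq, n):
            index[s].add(name)
    return index


def bag_candidates(window, index, n, min_hits):
    """Level 1 (bag matching): patterns sharing at least min_hits distinct
    snippets with the window, ignoring snippet order."""
    hits = defaultdict(int)
    for s in set(snippets(window, n)):
        for name in index.get(s, ()):
            hits[name] += 1
    return {name for name, count in hits.items() if count >= min_hits}


def in_order(window, pattern, n):
    """Level 2 (order checking): the shared snippets must occur in the
    window in the same relative order as in the pattern (by the position
    of each snippet's first occurrence in the pattern)."""
    first_pos = {}
    for i, s in enumerate(snippets(pattern, n)):
        first_pos.setdefault(s, i)
    seen = [first_pos[s] for s in snippets(window, n) if s in first_pos]
    return all(a < b for a, b in zip(seen, seen[1:]))


if __name__ == "__main__":
    patterns = {"rise": [1, 2, 3, 4, 5], "fall": [5, 4, 3, 2, 1]}
    index = build_index(patterns, n=2)
    for window in ([2, 3, 4], [3, 4, 2, 3]):
        for name in bag_candidates(window, index, n=2, min_hits=2):
            print(window, name, in_order(window, patterns[name], n=2))
```

The second window shares two snippets with the "rise" pattern, so it survives the bag-matching level, but the snippets appear in the wrong order and the order check rejects it.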
Venkatesh Raghavan, Elke A. Rundensteiner, John Woycheese, Abhishek Mukherji: FireStream: Sensor Stream Processing for Monitoring Fire Spread, ICDE 2007: 1507-1508.
PDF | PPT | abstract..
This demonstration presents FireStream, a sensor stream processing system which provides services for run-time detection, monitoring and visualization of fire spread in intelligent buildings that can be of great benefit to first responders. Our system can effectively handle large heterogeneous sensor streams using shared window execution and dynamic participant handling to yield a high-ary MJoin solution.
Abhishek Mukherji, Elke A. Rundensteiner, Matthew O. Ward: Cost-based Optimization of Localized Association Rule Mining Queries, (in submission).
PPT | abstract..
Association Rule Mining is a valuable decision support technique that can be used in the analysis of customer preferences, buying patterns and product correlations. Current data mining systems are, however, handicapped by the long processing times that existing mining algorithms require. In this paper we focus on enabling users to interactively mine not only rules that are valid in the global context but also rules that may be hidden from a global perspective yet are significant in a local context. Such user requests can traditionally be processed by running a rule mining algorithm from scratch over the user-chosen subset. Alternatively, we propose a two-level frequent itemset cache that can be used to compose the answers to these online mining queries. In our target scenario, where the user-specified minSupport and minConfidence thresholds as well as the user-chosen subset are arbitrary, neither of the two approaches is a clear winner over the other. Thus, we propose the COLARM framework, which utilizes cost-based query optimization to select the optimal plan based on the user selections.
Xika Lin, Abhishek Mukherji, Elke A. Rundensteiner, Matthew O. Ward: Summarizing Parameter Space for Visual Exploration of Association Rules, (in preparation).
Technical Reports
Abhishek Mukherji, Elke A. Rundensteiner: Achieving High Freshness and Optimal Throughput in Resource-Limited Execution of Multi-Join Continuous Queries
PDF
Research Projects
XMDVTool–CS@WPI, Research Assistant
May 2009 - present
XMDVTool is a public-domain software package for the interactive visual exploration of multivariate data sets. I work on the Managing Discoveries in Visual Analytics and the Interactive Stream Views: Visual Analysis of Streaming Data projects, both funded by NSF. My current work focuses on integrating scientific hypothesis testing with exploratory data mining for improved knowledge discovery. I have also employed query optimization techniques for data mining operations such as association rule mining. The overall research goal is to develop a recommender system that assists analysts and scientists in exploring not only the data space but also the nugget space, which consists of interesting information such as clusters, association rules, and annotations about the base data. The vision is to utilize the nugget repositories not only to improve hypothesis testing but also to help analysts in sense-making, that is, in learning the reasons behind significant hypotheses.
CAPE–CS@WPI, Graduate Researcher
August 2005 - present
Constraint-exploiting Adaptive Processing Engine (CAPE) is a continuous query engine capable of processing user queries over potentially infinite live streaming data. Currently I am working on CPU-limited processing of multi-join queries. My proposed JAQPOT approach guarantees optimal query throughput by efficient utilization of the given limited resources. I map the problem space to a combination of the set cover and the knapsack problems to find this efficient utilization. Previously, I worked in a group of four on developing FireStream, a fire monitoring and prediction system that enhances the capabilities of the CAPE stream processor to execute monitoring and tracking queries for firefighting. My focus was on the subproblem of predicting fire behavior by developing a pattern matching technique for streaming sensor data. I adapted the IR n-gram inverted index approach, originally used in text mining, to build a framework for detecting patterns in real-valued sensor data. The distributed version of CAPE is called D-CAPE.
FOCAL–CS@WPI, Research Assistant
Spring 2006
Worked as a research assistant with the Software Engineering Research Group (SERG@WPI) on implementing the concept of evolving legacy systems by locating system features using regression test cases. In a group of three, we developed a C code refactoring tool as an Eclipse plug-in extending CDT. Our approach aimed at bridging the complexity gap between the problem and solution domains.
Milestones in CS@WPI
MS Thesis (January 2008): SNIF TOOL: sniffing for patterns in continuous streams.
PPT | (also available as a 12-page research paper published in CIKM 2008).
PhD Qualifier (May 2009): Resource-Limited Execution of Multi-Join Continuous Queries: The JAQPOT Approach.
PPT | (work in submission).

Database Systems Conferences

Upcoming Conferences

Name | Paper Deadline | Notification Date | Conference Date | Venue
VLDB 2012 | 1st of each month until Mar 1, 2012 | Two months after submission | Aug 27, 2012 | Istanbul, Turkey
SIGMOD 2012 | Nov 1, 2011 | Feb 14, 2012 | May 20, 2012 | Scottsdale, Arizona, USA
EDBT 2012 | Sep 15, 2011 | Dec 8, 2011 | Mar 26, 2012 | Berlin, Germany
ICDE 2012 | Jul 19, 2011 | Sep 27, 2011 | Apr 1, 2012 | Washington DC, USA
SOCC 2011 | Apr 30, 2011 | Jul 11, 2011 | Oct 27, 2011 | Cascais, Portugal
VLDB 2011 | 1st of each month until Mar 1, 2011 | Two months after submission | Aug 29, 2011 | Seattle, WA, USA
Useful read for stream data processing and CQ newbies