PI, National Science Foundation (NSF) Award CRI-1305258, 8/1/13 - 7/31/15, “Compute Infrastructure for Large-Scale Data Analytics”, $189,952.
Recent Research Projects
* InsightNotes: Large-Scale Annotation Management.
* Redoop: Recurring Queries in MapReduce Infrastructure.
* HandsOn DB: Managing Human-Involved Dependencies in RDBMS.
* CoHadoop & E3: Hadoop-Based Query Optimizations.
expertise that I am keen to bring to WPI and integrate into both the
research and teaching activities fall under the broad areas of big data
management and analytics. I enjoy system-oriented research in which my
team builds prototype engines and techniques—possibly within larger
systems such as relational database, Hadoop, and NoSQL engines—and
deploys these prototypes to serve diverse real-world applications.
My research interests span the areas of big data infrastructures, scalable data analytics, query optimization, scalable data curation and metadata management, scientific data management, and scalable graph (network) analytics.
Specific areas of recent focus:
Next-Generation of Data Curation Engines:
Data curation and metadata management play significant roles in most modern scientific applications. The creation and maintenance of annotated and curated databases require a great deal of effort (and cost) from many scientists and domain experts. Yet the gain from the maintained annotations is still very limited, and the value of the knowledge hidden in the annotations remains largely unexplored. This is especially true because the growing volume, profound complexity, increasing heterogeneity, and hidden semantics of the emerging annotation repositories create unprecedented challenges for annotation management techniques.
In addition, the rapid evolution of the big data infrastructure, e.g., MapReduce, Spark, and NoSQL databases, creates numerous challenges and research opportunities for curation, annotation, and metadata management including distributed metadata processing, metadata migration across different infrastructures, provenance tracking through complex workflows over possibly several platforms, proactive curation techniques, and many others.
In my research, we rethink the whole design methodology and processing cycle of metadata management. We address various challenges including new metadata models, extensible semantic extraction and summarization, advanced query processing and manipulation, proactive and automated data curation, distributed metadata processing, and metadata quality and verification.
Related Papers: EDBT '16, VLDBJ '16, SIGMOD '15, EDBT '15, SIGMOD '14, EDBT '09
Big Data Management & Analytics:
Big data is one of the most critical and highly promising areas of both research and teaching. Almost all modern and emerging applications are collecting and/or generating massive volumes of data. Therefore, innovation in these applications is limited mainly by their ability to analyze and discover knowledge from the data in a timely and scalable fashion. Big data calls for new infrastructures, innovative optimizations, and efficient indexing techniques.
My team is exploring various optimizations and advances to big data infrastructures, specifically for Hadoop.
Related Papers: ICDE '17, PVLDB '16, SSDBM '16, VLDBJ '16, PVLDB '15, PVLDB '14, EDBT '14, EDBT '13, PVLDB '11
Selected Publications: DBLP, Google Scholar
Lei Cao, Yizhou Yan, Caitlin Kuhlman, Qingyang Wang, Elke A. Rundensteiner, Mohamed Y. Eltabakh, "Multi-Tactic Distance-Based Outlier Detection", International Conference on Data Engineering (ICDE) 2017, to appear [pdf]
Hai Liu, Dongqing Xiao, Pankaj Didwania, Mohamed Y. Eltabakh, “Exploiting Soft and Hard Correlations in Big Data Query Optimization”, Proceedings of the VLDB Endowment (PVLDB), 9(10), 2016, to appear. [pdf]
Xuebin He, Stephen Donohue, Mohamed Y. Eltabakh, "Discovering Correlations in Annotated Databases", The International Conference on Extending Database Technology (EDBT), 2016, to appear. [pdf]
Dongqing Xiao, Mohamed Y. Eltabakh, Xiangnan Kong, “Bermuda: An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs”, The International Conference on Scientific and Statistical Database Management (SSDBM), 2016, to appear. [pdf]
Yue Lu, Yuguan Li, Mohamed Y. Eltabakh, "Decorating the cloud: enabling annotation management in MapReduce", The Journal of Very Large Databases (VLDB Journal), 25(3), pp. 399-424, 2016. [pdf]
Dongqing Xiao, Armir Bashllari, Tyler Menard, Mohamed Eltabakh, "Even Metadata is Getting Big: Annotation Summarization using InsightNotes", Demo in SIGMOD 2015, Melbourne, Australia [pdf]
Karim Ibrahim, Xiao Du, Mohamed Y. Eltabakh, "Proactive Annotation Management in Relational Databases", SIGMOD 2015, Melbourne, Australia [pdf]
Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, Mohamed Eltabakh, "Shared Execution of Recurring Workloads in MapReduce", PVLDB 2015, Hawaii [pdf]
Karim Ibrahim, Dongqing Xiao, Mohamed Y. Eltabakh, "Elevating Annotation Summaries To First-Class Citizens In InsightNotes", EDBT 2015, Belgium. [pdf]
Chuan Lei, Zhongfang Zhuang, Elke Rundensteiner, Mohamed Y. Eltabakh, "Redoop Infrastructure for Recurring Big Data Queries", Demo in VLDB 2014, China [pdf]
Dongqing Xiao, Mohamed Y. Eltabakh, "InsightNotes: Summary-Based Annotation Management in Relational Databases", SIGMOD 2014, Utah, USA. [pdf]
Yang Zheng, Annies Ductan, Devin Thomas, Mohamed Y. Eltabakh, "Complex Pattern Processing in Spatio-Temporal Databases", DATA 2014, Vienna, Austria [pdf]
Anh Pham, Mohamed Y. Eltabakh, "FunctionGuard: A Query Engine For Expensive Scientific Functions In Relational Databases", DATA 2014, Vienna, Austria [pdf]
Chuan Lei, Elke Rundensteiner, Mohamed Y. Eltabakh, “Redoop: Supporting Recurring Queries in Hadoop”, EDBT 2014, Athens, Greece. [pdf]
Karim Ibrahim, Nathaniel Selvo, Mohamed El-Rifai, Mohamed Y. Eltabakh, “FusionDB: Conflict Management System for Small-Science Databases”, Demo in CIKM, 2013, pp. 2469-2472, CA, USA. [pdf]
Dongqing Xiao, Mohamed Y. Eltabakh, “STEPQ: Spatio-Temporal Engine for Complex Pattern Queries”, International Symposium on Spatial and Temporal Databases (SSTD) 2013, pp. 386-390, Munich, Germany. [pdf]
Mohamed Eltabakh, Fatma Ozcan, Yannis Sismanis, Peter Haas, Hamid Pirahesh, and Jan Vondrak. "Eagle-Eyed Elephant: Split-Oriented Indexing in Hadoop", EDBT 2013, Genoa, Italy. [pdf]
Mohamed Eltabakh, Walid Aref, Ahmed Elmagarmid, Mourad Ouzzani, “HandsOn DB: Managing Data Dependencies involving Human Actions”, TKDE, 2013. [pdf]
Mohamed Y. Eltabakh, Jalaja Padma, Yasin N. Silva, Walid G. Aref, Elisa Bertino, “Query Processing with K-Anonymity”, International Journal of Data Engineering (IJDE), (3, 2), pp. 48-65, 2012. [pdf]
Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, John McPherson, "CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop", PVLDB 4(9): 575-585, 2011. [pdf]
Kevin Beyer, Vuk Ercegovac, Rainer Gemulla, Andrey Balmin, Mohamed Y. Eltabakh, Carl-Christian Kanne, Fatma Ozcan, Eugene Shekita, "Jaql: A Scripting Language for Large Scale Semi-Structured Data Analysis", PVLDB 4(9), 2011. [pdf]
Mohamed Y. Eltabakh, Walid G. Aref, Ahmed K. Elmagarmid, Yasin Silva, Mourad Ouzzani, “Supporting Real-world Activities in Database Management Systems”, Short paper in the International Conference on Data Engineering (ICDE) 2010, Los Angeles, CA, pp 808-811. [pdf]
Mohamed Y. Eltabakh, Walid G. Aref, Ahmed K. Elmagarmid: "A database server for next-generation scientific data management". ICDE PhD Workshops 2010: 313-316. [pdf]
Mohamed Y. Eltabakh, Walid G. Aref, Ahmed K. Elmagarmid, Mourad Ouzzani, Yasin Silva, “Supporting Annotations on Relations”, In Proceedings of the International Conference on Extending Database Technology (EDBT) 2009, Saint-Petersburg, Russia, pp. 379-390. [pdf]
Mohamed Y. Eltabakh, Wing-Kai Hon, Rahul Shah, Walid G. Aref, Jeffrey S. Vitter, “The SBC-Tree: An Index for Run-Length Compressed Sequences”, In Proceedings of the International Conference on Extending Database Technology (EDBT) 2008, Nantes, France, pp. 523-534. [pdf]
Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Yasin Laura-Silva, Muhammad Arshad, David Salt, Ivan Baxter, “Managing Biological Data using bdbms”, Demo in the International Conference on Data Engineering (ICDE) 2008, Cancun, Mexico, pp. 1600-1603. [pdf]
Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref, “Duplicate Elimination in Space-partitioning Tree Indexes”, In Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM) 2007, Banff, Canada, pp. 18-27. [pdf]
Rafae Bhatti, Arjumand Samuel, Mohamed Y. Eltabakh, Haseeb Amjad, Arif Ghafoor, “Engineering a Policy-Based System for Federated Healthcare Databases”, IEEE Transactions on Knowledge and Data Engineering Journal (TKDE) 2007 19(9), pp. 1288-1304. [pdf]
Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref, “bdbms: A Database Management System for Biological Data”, In Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2007, Asilomar, CA, pp.196-206. [pdf]