“Big Data is (at least) Four Different Problems”
MICHAEL STONEBRAKER, PhD
Computer Science & Artificial
Thursday, April 11, 2013
1:00 p.m. – 2:00 p.m.
Higgins Laboratory, 218
"Big Data" means different things to different people. To me, it means four totally different problems:
BIG VOLUMES OF DATA, but "small" analytics. The traditional data warehouse vendors support SQL analytics on very large volumes of data. In this talk, I make a few comments on where I see this market going.
BIG ANALYTICS on big volumes of data. By big analytics, I mean data clustering, regressions, machine learning, and other much more complex analytics on very large amounts of data. I will explain the various approaches to integrating complex analytics into DBMSs, and discuss which ones seem more promising. In addition, I will explore why features need to be added to Hadoop to make it a player in this market.
BIG VELOCITY. By this I mean being able to absorb and process a firehose of incoming data for applications like electronic trading. In this market, the traditional SQL vendors are a non-starter. I will discuss alternatives including complex event processing (CEP), NoSQL and NewSQL systems.
BIG DIVERSITY. Many enterprises are faced with integrating a larger and larger number of data sources with diverse data (spreadsheets, web sources, XML, traditional DBMSs). The traditional ETL products do not appear up to the challenges of this new world, and I talk about an alternate way to go.
Dr. Stonebraker has been a pioneer of data base research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley where Stonebraker was a Professor of Computer Science for twenty-five years. More recently at M.I.T. he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, and the H-Store transaction processing engine. Currently, he is working on science-oriented DBMSs, OLTP DBMSs, and scalable data curation. He is the founder of five venture-capital backed startups, which commercialized his prototypes. Presently he serves as Chief Technology Officer of VoltDB and Paradigm4, Inc.
Professor Stonebraker is the author of scores of research papers on data base technology, operating systems and the architecture of system software services. He was awarded the ACM System Software Award in 1992, for his work on INGRES. Additionally, he was awarded the first annual Innovation award by the ACM SIGMOD special interest group in 1994, and was elected to the National Academy of Engineering in 1997.
He was awarded the IEEE John von Neumann Medal in 2005, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the new Intel Science and Technology Center focused on big data.
Host: Prof. Elke Rundensteiner