beckman abadi-5min-pres

@daniel_abadi Yale University * Big Data and the Database Community

Upload: daniel-abadi

Post on 25-May-2015



DESCRIPTION

Slides for the Beckman Database Research Self-Assessment Meeting. Provides thoughts about Big Data and the database systems research community.

TRANSCRIPT

Page 1: Beckman abadi-5min-pres

@daniel_abadi

Yale University

*Big Data and the Database Community

Page 2: Beckman abadi-5min-pres

*Big Data

*The Big Data phenomenon is the best thing that could have happened to the database community

*Despite other definitions related to ‘3 Vs’ --- Big Data means BIG Data

*Which means we need scalable database systems

*Still two main components of Big Data

*Performing data analysis at scale

*Performing requests on data at scale

Page 3: Beckman abadi-5min-pres

*Performing Data Analysis at Scale

*Database community has won the battle

*Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale

* Just about every MapReduce vendor has abandoned this goal

*Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest

*Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution
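To make the term "push-based query execution" concrete, here is a minimal sketch in Python. It is not taken from Hadapt, Impala, Tez, or any other system named above; the operator classes (`Sink`, `Filter`, `Project`), the `scan` driver, and the sample table are all hypothetical names chosen for illustration. The point it shows is only the control flow: the scan drives the pipeline and each operator pushes rows to its consumer, rather than the consumer pulling rows from its producer.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class Sink:
    """Terminal operator (hypothetical): collects every row that reaches it."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def push(self, row: Row) -> None:
        self.rows.append(row)


class Filter:
    """Forwards a row downstream only if the predicate accepts it."""

    def __init__(self, predicate: Callable[[Row], bool], downstream: Any) -> None:
        self.predicate = predicate
        self.downstream = downstream

    def push(self, row: Row) -> None:
        if self.predicate(row):
            self.downstream.push(row)


class Project:
    """Keeps only the named columns and forwards the narrowed row."""

    def __init__(self, columns: List[str], downstream: Any) -> None:
        self.columns = columns
        self.downstream = downstream

    def push(self, row: Row) -> None:
        self.downstream.push({c: row[c] for c in self.columns})


def scan(table: Iterable[Row], downstream: Any) -> None:
    """The data source drives execution by pushing each row into the plan."""
    for row in table:
        downstream.push(row)


def run_example() -> List[Row]:
    # Made-up sample data, roughly: SELECT id FROM t WHERE amount > 100
    table = [
        {"id": 1, "amount": 50},
        {"id": 2, "amount": 150},
        {"id": 3, "amount": 300},
    ]
    sink = Sink()
    plan = Filter(lambda r: r["amount"] > 100, Project(["id"], sink))
    scan(table, plan)
    return sink.rows
```

Calling `run_example()` returns the `id` column for the rows whose `amount` exceeds 100. A real engine would push batches or compiled pipelines rather than single Python dictionaries; the sketch only shows the direction of control flow.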

Page 4: Beckman abadi-5min-pres

*Performing Requests on Data at Scale

*The database community is losing the battle

*NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)

*No race to add DB technology --- why?

* Don’t blame CAP --- CAP is only relevant when there’s a network partition

* We never figured out how to do ACID and active replication at scale

* Many new proposals make simplifying assumptions in order to handle scale

* It’s been 30 years --- why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?
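For readers unfamiliar with what a "distributed transaction" involves, the following is a toy sketch of textbook two-phase commit in Python. It is not a proposal from the slides, and it is exactly the kind of simplified model the bullets above warn about: the `Participant` class, the `can_commit` switch, and the `two_phase_commit` function are hypothetical, everything runs in one process, and there are no crashes, timeouts, logs, or replicas. Those omitted parts are where the difficulty at scale lies.

```python
from typing import Dict, List, Tuple


class Participant:
    """One partition holding part of the data (hypothetical, in-memory)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a "no" vote
        self.data: Dict[str, int] = {}
        self.staged: Dict[str, int] = {}

    def prepare(self, writes: Dict[str, int]) -> bool:
        """Phase 1: stage the writes and vote yes, or refuse and vote no."""
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged writes visible."""
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        """Phase 2 (failure): discard anything that was staged."""
        self.staged = {}


def two_phase_commit(plan: List[Tuple[Participant, Dict[str, int]]]) -> bool:
    """Apply all writes atomically: every participant commits, or none does."""
    # Phase 1: collect a vote from every participant.
    votes = [participant.prepare(writes) for participant, writes in plan]

    # Phase 2: commit only on a unanimous yes; otherwise abort everywhere.
    if all(votes):
        for participant, _ in plan:
            participant.commit()
        return True
    for participant, _ in plan:
        participant.abort()
    return False
```

With two willing participants the call returns `True` and both sets of writes become visible; if any participant votes no, the call returns `False` and no participant's data changes.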