beckman abadi-5min-pres
DESCRIPTION
Slides forthe Beckman Database Research Self-Assessment Meeting. Provides thoughts about Big Data and the database systems research community.TRANSCRIPT
@daniel_abadi
Yale University
*Big Data and the Database Community
*Big Data
*The Big Data phenomenon is the best thing that could have happened to the database community
*Despite other definitions related to ‘3 Vs’ --- Big Data means BIG Data
*Which means we need scalable database systems
*Still two main components of Big Data
*Performing data analysis at scale
*Performing requests on data at scale
*Performing Data Analysis at Scale
*Database community has won the battle
*Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
*Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
*Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution
*Performing Requests on Data
at Scale
*The database community is losing the battle
*NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
*No race to add DB technology --- why?
* Don’t blame CAP --- CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale
* It’s been 30 years ---- why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?