![Page 1: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/1.jpg)
Navigating the Database Universe
Dr. Michael Stonebraker and Scott Jarr
![Page 2: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/2.jpg)
About Our Presenters
Mike Stonebraker
Co-founder & CTO, VoltDB
A pioneer of database research and technology for more than a quarter of a century, and the main architect of the Ingres relational DBMS and the object-relational DBMS PostgreSQL
Scott Jarr
Co-founder & Chief Strategy Officer, VoltDB
More than 20 years of experience building, launching and growing technology companies from inception to market leadership in the search, mobile, security, storage and virtualization markets
![Page 3: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/3.jpg)
Agenda
• The (proper) design of DBMSs, presented by Dr. Michael Stonebraker
• The database universe
• Where the future value comes from
![Page 4: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/4.jpg)
We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast
![Page 5: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/5.jpg)
THE (PROPER) DESIGN OF THE DBMS
Dr. Michael Stonebraker
![Page 6: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/6.jpg)
Lessons from 40 Years of Database Design
1. Get the user interaction right
– Bet on a small number of easy-to-understand constructs
– Plus standards
2. Get the implementation right
– Bet on a small number of easy-to-understand constructs
3. One size does not fit all
– At least not if you want fast, big or complex
“Those who don’t learn from history are destined to repeat it.”
-Winston Churchill
![Page 7: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/7.jpg)
#1: Get the User Interaction Right
Historical Lesson: RDBMS vs. CODASYL vs. OODB
Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)
Loser: CODASYL
• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)
• Messy access language (sea of “cursors”; some, but not all, move on every command; navigation programming)
Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation, through this sea)
• No standards
![Page 8: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/8.jpg)
Interaction Take Away − Simple is Good
• ACID was easy for people to understand
• SQL provided a standard, high-level language and made people productive (transportable skills)
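To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (dept, emp), the data, and the function names are invented for illustration. The second half only imitates owner/member navigation with Python dictionaries; it is not real CODASYL DML.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO emp  VALUES (10, 'Ada', 1), (11, 'Bob', 2), (12, 'Cy', 1);
""")

def employees_declarative(dept_name):
    """Relational style: state what is wanted; the DBMS picks the access path."""
    rows = conn.execute(
        "SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id "
        "WHERE d.name = ? ORDER BY e.name",
        (dept_name,),
    )
    return [name for (name,) in rows]

# Navigational imitation: each owner record holds a "set" of member records,
# and the program walks them one step at a time.
owners = {
    "Engineering": {"members": ["Ada", "Cy"]},
    "Sales": {"members": ["Bob"]},
}

def employees_navigational(dept_name):
    """Navigational style: find the owner, then step through its members."""
    result = []
    owner = owners.get(dept_name)          # "find owner"
    if owner is None:
        return result
    position = 0                           # the program manages its own cursor
    while position < len(owner["members"]):
        result.append(owner["members"][position])   # "get next in set"
        position += 1
    return result
```

Both functions return the same names, but only the navigational one hard-codes how the data is reached, which is the kind of coupling the slide counts against CODASYL and OODBs.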
![Page 9: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/9.jpg)
#2: Get the Implementation Right
Historical Winners
• Leverage a few simple ideas: Early relational implementations
– System R storage system dropped links
– Views (protection, schema modification, performance)
– Cost-based optimizer
• Leverage a few simple ideas: Postgres
– User-defined data types and functions (adopted by most everybody)
– Rules/triggers
– No-overwrite storage
• Leverage a few simple ideas: Vertica
– Store data by column
– Compressed up the ging gong
– Parallel load without compromising ACID
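The "store data by column" and compression ideas can be sketched in a few lines of Python. This is an illustration only: the rows are invented, and run-length encoding is one simple scheme chosen here as an assumption. The slides do not say which encodings Vertica actually uses.

```python
from itertools import groupby

# Hypothetical rows: (region, status, amount)
rows = [
    ("east", "open", 10),
    ("east", "open", 20),
    ("east", "closed", 30),
    ("west", "closed", 40),
    ("west", "closed", 50),
]

def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

region, status, amount = to_columns(rows)
encoded_region = rle_encode(region)   # long runs of equal values collapse
total_amount = sum(amount)            # an aggregate reads only one column
```

Sorted, low-cardinality columns collapse into a handful of runs, and an aggregate such as the sum above never has to read the other columns.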
![Page 10: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/10.jpg)
#3: One Size Does NOT Fit All
• OSFA (one size fits all) is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built will exceed by 10x to 100x
• History has not been completely written yet…but let’s look at VoltDB as an example
“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.”
-My Top 10 Assertions About Data Warehouses, 2010
![Page 11: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/11.jpg)
Example: VoltDB
• Get the interface right
– SQL
– ACID
• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling
• Specialization
– OLTP focus allowed for above implementation choices
![Page 12: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/12.jpg)
Proving the Theory
• Challenge: OLTP performance
– TPC-C CPU cycles
– On the Shore DBMS prototype
– Elephants should be similar
CPU cycle breakdown (chart):
– Recovery: 24%
– Latching: 24%
– Buffer Pool: 24%
– Locking: 24%
– Useful Work: 4%
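The breakdown above implies an upper bound on what removing overhead can buy. The following sketch does the arithmetic; the function name and the assumption that a removed component costs exactly nothing while everything else stays unchanged are simplifications made here, not claims from the slides.

```python
# Shares of CPU cycles, taken from the chart on this slide.
breakdown = {
    "recovery": 0.24,
    "latching": 0.24,
    "buffer_pool": 0.24,
    "locking": 0.24,
    "useful_work": 0.04,
}

def speedup_if_removed(breakdown, removed):
    """Speedup if the named components cost nothing and the rest is unchanged."""
    remaining = sum(share for name, share in breakdown.items()
                    if name not in removed)
    return 1.0 / remaining

overheads = {"recovery", "latching", "buffer_pool", "locking"}
best_case = speedup_if_removed(breakdown, overheads)            # 1 / 0.04
only_buffer_pool = speedup_if_removed(breakdown, {"buffer_pool"})  # 1 / 0.76
```

Removing a single overhead yields only about a 1.3x gain, while removing all four approaches 25x, which is why the following slides attack all of them together.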
![Page 13: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/13.jpg)
Implementation Construct #1: Main Memory
• Main memory format for data
– Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
– Return to disk-buffer pool architecture (slow)
– Anti-caching, which works as follows:
    • Main memory format for data
    • When memory fills up, bundle together elderly tuples and write them out
    • Run a transaction in “sleuth mode”; find the required records and move them to main memory (and pin them)
    • Run the transaction normally
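The anti-caching steps above can be sketched as a toy Python class. Everything here is an assumption made for illustration: the class and method names are hypothetical, the "disk" is just a dictionary, and eviction picks the least recently used unpinned tuple. It is not VoltDB's implementation.

```python
from collections import OrderedDict

class AntiCache:
    """Toy store: hot tuples in memory, cold tuples evicted to a 'disk' dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()   # key -> value, least recently used first
        self.disk = {}                # stand-in for blocks written to disk
        self.pinned = set()

    def put(self, key, value):
        self.disk.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_full()

    def _evict_if_full(self):
        # Write out the oldest unpinned tuples until memory fits again.
        while len(self.memory) > self.capacity:
            victim = next((k for k in self.memory if k not in self.pinned), None)
            if victim is None:
                break                 # everything is pinned; tolerate overflow
            self.disk[victim] = self.memory.pop(victim)

    def sleuth(self, keys):
        """Pre-pass: bring every needed tuple into memory and pin it."""
        for key in keys:
            self.pinned.add(key)
            if key in self.disk:
                self.memory[key] = self.disk.pop(key)
            if key in self.memory:
                self.memory.move_to_end(key)
        self._evict_if_full()

    def run(self, keys, update):
        """Sleuth first, then run the transaction entirely in memory."""
        self.sleuth(keys)
        try:
            for key in keys:
                self.memory[key] = update(self.memory[key])
        finally:
            self.pinned.difference_update(keys)
            self._evict_if_full()

store = AntiCache(capacity=2)
for k in ("a", "b", "c"):
    store.put(k, 0)                     # "a" is the oldest and gets evicted
store.run(["a"], lambda v: v + 1)       # "a" is fetched back, "b" is evicted
```

The point of the pre-pass is that the transaction proper never stalls on disk: by the time it runs, every tuple it touches is resident and pinned.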
![Page 14: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/14.jpg)
Implementation Construct #2: Stored Procedures
• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
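A small model shows why the round-trip count matters. The ToyServer class, its methods, and the transfer procedure are all hypothetical names invented for this sketch; the server runs in-process and simply counts how many times the client would have crossed the network.

```python
class ToyServer:
    """In-process stand-in for a DBMS that counts client round trips."""

    def __init__(self):
        self.balances = {"alice": 100, "bob": 50}
        self.round_trips = 0
        self.procedures = {}

    def execute(self, op, account, amount=0):
        """One command per call: every call costs one round trip."""
        self.round_trips += 1
        if op == "read":
            return self.balances[account]
        if op == "add":
            self.balances[account] += amount
            return self.balances[account]
        raise ValueError(f"unknown op: {op}")

    def register(self, name, func):
        self.procedures[name] = func

    def call(self, name, *args):
        """Whole transaction per call: one round trip however long it is."""
        self.round_trips += 1
        return self.procedures[name](self.balances, *args)

def transfer(balances, src, dst, amount):
    """Transaction logic that runs on the server side."""
    if balances[src] < amount:
        return False
    balances[src] -= amount
    balances[dst] += amount
    return True

# Command-at-a-time client: three round trips for one transfer.
chatty = ToyServer()
if chatty.execute("read", "alice") >= 30:
    chatty.execute("add", "alice", -30)
    chatty.execute("add", "bob", 30)

# Stored-procedure client: one round trip for the same transfer.
batched = ToyServer()
batched.register("transfer", transfer)
ok = batched.call("transfer", "alice", "bob", 30)
```

Both clients leave the data in the same state; the difference is that the round-trip cost of the stored-procedure version stays at one per transaction no matter how many statements the procedure contains.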
![Page 15: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/15.jpg)
Implementation Construct #3: Deterministic and Non-deterministic Scheduling
• Non-deterministic (can’t tell order until commit time)
– MVCC
– Dynamic locking
• Deterministic
– Time stamp order
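The value of a deterministic order is that any node given the same set of transactions reaches the same state, whatever order they arrived in. The sketch below illustrates that property under simplifying assumptions: transactions are plain Python functions, timestamps are assigned up front, and execution is serial. The names are hypothetical and this is not VoltDB's scheduler.

```python
def apply_in_timestamp_order(initial, transactions):
    """Execute transactions serially in (timestamp, txn_id) order."""
    state = dict(initial)
    for _, _, func in sorted(transactions, key=lambda t: (t[0], t[1])):
        func(state)
    return state

def deposit(amount):
    def txn(state):
        state["balance"] += amount
    return txn

def double():
    def txn(state):
        state["balance"] *= 2
    return txn

# Each transaction is (timestamp, txn_id, function).
t1 = (1, "t1", deposit(10))
t2 = (2, "t2", double())
t3 = (3, "t3", deposit(5))

initial = {"balance": 100}
replica_a = apply_in_timestamp_order(initial, [t1, t2, t3])
replica_b = apply_in_timestamp_order(initial, [t3, t1, t2])  # other arrival order
```

Because the order is fixed before execution starts, no locks or version checks are needed to decide it at commit time, in contrast to the MVCC and dynamic-locking approaches listed above.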
![Page 16: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/16.jpg)
Result of Design Principles: VoltDB Example
• Good interface decisions – made developers more productive
– SQL & ACID
• Leveraging a few simple implementation ideas – made VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling
![Page 17: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/17.jpg)
Proving the Theory
• Answer: OLTP performance
– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server
“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
-The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
![Page 18: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/18.jpg)
THE DATABASE UNIVERSE
Scott Jarr
![Page 19: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/19.jpg)
Technology Meets the Market
We believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast
Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases
What we need is…
![Page 20: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/20.jpg)
Data Value Chain
Stages along the Age of Data axis, with typical response times and examples:
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (second(s)): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match
![Page 21: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/21.jpg)
Data Value Chain
Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics (same time scales and examples as on the previous slide)
[Figure: two curves, labeled "Value of Individual Data Item" and "Aggregate Data Value". Vertical axis: Data Value. Horizontal axis: Age of Data.]
![Page 22: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/22.jpg)
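The Database Universe
[Figure: database landscape. Horizontal axis: Data Value, with labels "Value of Individual Data Item" and "Aggregate Data Value", spanning Interactive, Real-time Analytics, Record Lookup, Historical Analytics and Exploratory Analytics, and grouped into Transactional and Analytic. Vertical axis: Application Complexity, with end labels Simple/Slow/Small and Fast/Complex/Large. Plotted: Traditional RDBMS.]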
![Page 23: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/23.jpg)
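The Database Universe
[Figure: database landscape. Horizontal axis: Data Value, with labels "Value of Individual Data Item" and "Aggregate Data Value", spanning Interactive, Real-time Analytics, Record Lookup, Historical Analytics and Exploratory Analytics, and grouped into Transactional and Analytic. Vertical axis: Application Complexity, with end labels Simple/Slow/Small and Fast/Complex/Large. Plotted: Traditional RDBMS, NewSQL, NoSQL, Data Warehouse, and Hadoop, etc. An additional label reads "Velocity".]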
![Page 24: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/24.jpg)
Closed-loop Big Data
Interactive & Real-time Analytics
Historical Reports & Analytics
Exploratory Analytics
logins, sensors, impressions, orders, authorizations, clicks, trades
![Page 25: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/25.jpg)
Closed-loop Big Data
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge
Knowledge
Interactive & Real-time Analytics
Historical Reports & Analytics
Exploratory Analytics
logins, sensors, impressions, orders, authorizations, clicks, trades
![Page 26: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/26.jpg)
The Velocity Use Case
What’s it look like?
– High throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics present immediate visibility
What’s the big deal?
– Batch converts to real time = efficiency
– Decisions made at time of event = better decisions
– Ability to micro segment/target/personalize/etc. = conversion, satisfaction
– More data is coming at you; use it to improve your business
![Page 27: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/27.jpg)
Next Up
QUESTIONS AND ANSWERS
![Page 28: "Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, VoltDB](https://reader034.vdocuments.us/reader034/viewer/2022052522/54b6cf724a79596f468b4616/html5/thumbnails/28.jpg)
THANK YOU
www.voltdb.com