Hadoop and the Relational Database: The Best of Both Worlds
DESCRIPTION
The Briefing Room with Dr. Robin Bloor and Splice Machine. Live Webcast on August 5, 2014. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=71551d669454741c8bd56f2349bdf140

As the pressure of Big Data collides with the reality of daily operations, many organizations are trying to solve the challenge of meeting new requirements without disrupting the flow of business. One solution focuses on the data layer itself, by combining the well known functionality of relational database technology with the scale-out capabilities of Hadoop. Register for this episode of The Briefing Room to hear from veteran analyst Dr. Robin Bloor as he outlines the critical components of a business-ready data layer. He'll be briefed by John Leach and Rich Reimer of Splice Machine, who will explain how their solution delivers the best of both data worlds: the trusted capabilities of relational with the infinite scalability of Hadoop. They will also discuss how Hadoop has transformed from a batch-oriented workhorse into a scale-out layer capable of supporting real-time applications and operational analytics using traditional SQL. Visit InsideAnalysis.com for more information.

TRANSCRIPT
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Hadoop and the Relational Database: The Best of Both Worlds
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA ECOSYSTEM
September: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
Executive Summary
Scale out is the new Agile
Business needs constant flexibility
No time for down time
Grow as quickly as you can sell
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
[email protected] @robinbloor
Splice Machine
• Splice Machine is a SQL-on-Hadoop database
• The product is ACID-compliant and can power both OLAP and OLTP workloads
• Splice Machine is built on Java-based Apache Derby and HBase/Hadoop
Guests: John Leach & Rich Reimer
John Leach, Co-Founder and Chief Technology Officer With over 15 years of software experience under his belt, John’s expertise in analytics and BI drives his role as Chief Technology Officer. Prior to Splice Machine, John founded Incite Retail in June 2008 and led the company’s strategy and development efforts. Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners. His focus at Blue Martini was helping clients incorporate decision support knowledge into their current business processes utilizing advanced algorithms and machine learning.
Rich Reimer, VP of Marketing and Product Management Rich has over 15 years of sales, marketing and management experience in high-tech companies. Before joining Splice Machine, Rich worked at Zynga as the Treasure Isle studio head, where he used petabytes of data from millions of daily users to optimize the business in real-time. Prior to Zynga, he was the COO and co-founder of a social media platform named Grouply. Before founding Grouply, Rich held executive positions at Siebel Systems, Blue Martini Software and Oracle Corporation as well as sales and marketing positions at General Electric and Bell Atlantic.
Affordable Scale-Out
August 5, 2014
Data Doubling Every 2 Years… Driven by web, social, mobile, and Internet of Things
Source: 2013 IBM Briefing Book
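Doubling every two years is a compound growth rate. A short worked restatement, assuming smooth exponential growth from an initial volume $V_0$; the last line converts a growth rate $g$ back into a doubling time $T_2$, which puts the 30-50% data growth expected in the Harte-Hanks case study in the same terms if that figure is read as an annual rate:

```latex
\begin{aligned}
V(t) &= V_0 \cdot 2^{t/2} && (t \text{ in years}) \\
g &= 2^{1/2} - 1 \approx 0.414 && (\approx 41\%\ \text{per year}) \\
V(10) &= V_0 \cdot 2^{5} = 32\,V_0 \\
T_2 &= \frac{\ln 2}{\ln(1+g)}: \quad g = 0.3 \Rightarrow T_2 \approx 2.6, \quad g = 0.5 \Rightarrow T_2 \approx 1.7
\end{aligned}
```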
Traditional RDBMSs Overwhelmed… Scale-up becoming cost-prohibitive
• "Oracle is too darn expensive!"
• "My DB is hitting the wall"
• "Users keep getting those spinning beach balls"
• "We have to throw data away"
• "Our reports take forever"
Case Study: Harte-Hanks
Overview
• Digital marketing services provider
• Real-time campaign management
• Complex OLTP and OLAP environment
Challenges
• Oracle RAC too expensive to scale
• Queries too slow – even up to ½ hour
• Getting worse – expect 30-50% data growth
• Looked for 9 months for a cost-effective solution
Initial Results
• ¼ cost with commodity scale-out
• 3-7x faster through parallelized queries
• 10-20x price/perf with no application, BI or ETL rewrites
[Solution diagram: cross-channel campaigns, real-time personalization, real-time actions]
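The headline results are related if price/performance is read as performance per unit of cost. A back-of-the-envelope check, assuming the ¼ cost and the 3-7x speedup apply to the same workload:

```latex
\text{price/performance gain} = \text{speedup} \times \frac{C_{\text{old}}}{C_{\text{new}}},
\qquad \frac{C_{\text{old}}}{C_{\text{new}}} = 4
\;\Rightarrow\; 3 \times 4 = 12, \quad 7 \times 4 = 28
```

That 12-28x range overlaps the quoted 10-20x, so the figures are broadly consistent under this simple model.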
Scale-Out: The Future of Databases
Dramatic improvement in price/performance
[Diagram: Scale Up (increase server size) vs. Scale Out (more small servers), with relative cost shown as dollar signs]
Who Are We?
THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database
• Affordable, Scale-Out
• ACID Transactions
• No Application Rewrites
10x Better Price/Perf
Customer Performance Benchmarks: Typically 10x price/performance improvement
[Chart: speed and price/performance gains across customer benchmarks; values shown include 3-7x, 5x, 7x, 10x, 10-15x, 10-20x, 20x and 30x]
Use Cases
• Digital marketing, campaign management, unified customer profile, real-time personalization
• Data lake, operational reporting and analytics, operational data stores
• Fraud detection, personalized medicine, Internet of Things
• Network monitoring, cyber-threat security, wearables and sensors
Seasoned Team
• Successful Serial Entrepreneurs
• Enterprise Software Experience
• Database & Big Data Experience
• Big Data Research & Community Leadership
Hadoop User Group
What People Are Saying…
Awards
• Recognized as a key innovator in databases
Quotes
• "Scaling out on Splice Machine presented some major benefits over Oracle ...automatic balancing between clusters...avoiding the costly licensing issues."
• "An alternative to today's RDBMSes, Splice Machine effectively combines traditional relational database technology with the scale-out capabilities of Hadoop."
• "The unique claim of … Splice Machine is that it can run transactional applications as well as support analytics on top of Hadoop."
Proven Building Blocks: Hadoop and Derby
Apache Derby
• ANSI SQL-99 RDBMS
• Java-based
• ODBC/JDBC compliant
Apache HBase/HDFS
• Auto-sharding
• Real-time updates
• Fault tolerance
• Scalability to 100s of PBs
• Data replication
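Auto-sharding in HBase means a table's sorted row-key space is divided into contiguous ranges (regions) that split as they grow and can then be spread across servers. The toy model below shows only the range-partitioning and split logic. `ShardedTable`, `Region` and the `MAX_ROWS` threshold are hypothetical names; real HBase splits on store size rather than row count and assigns regions to RegionServers.

```python
import bisect


class Region:
    """A contiguous key range starting at start_key (up to the next region's start)."""

    def __init__(self, start_key):
        self.start_key = start_key
        self.rows = {}  # row key -> value


class ShardedTable:
    """Toy range-partitioned table that splits a region once it grows too large.

    MAX_ROWS is a hypothetical threshold for illustration, not an HBase setting.
    """

    MAX_ROWS = 4

    def __init__(self):
        self.regions = [Region("")]  # "" sorts before every string key
        self.starts = [""]           # parallel list of start keys, kept sorted

    def _locate(self, key):
        # Index of the rightmost region whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key, value):
        idx = self._locate(key)
        region = self.regions[idx]
        region.rows[key] = value
        if len(region.rows) > self.MAX_ROWS:
            self._split(idx)

    def get(self, key):
        return self.regions[self._locate(key)].rows.get(key)

    def _split(self, idx):
        # Split at the median key: the lower half stays, the upper half
        # moves to a new region that starts at the median key.
        region = self.regions[idx]
        keys = sorted(region.rows)
        mid_key = keys[len(keys) // 2]
        upper = Region(mid_key)
        for k in keys[len(keys) // 2:]:
            upper.rows[k] = region.rows.pop(k)
        self.regions.insert(idx + 1, upper)
        self.starts.insert(idx + 1, mid_key)
```

Inserting the keys `"a"` through `"j"` in order leaves four regions starting at `""`, `"c"`, `"e"` and `"g"`. Because every region covers a contiguous key range, a lookup needs only a binary search over start keys to find the one region that can hold a row.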
HBase: Proven Scale-Out
• Auto-sharding
• Scales with commodity hardware
• Cost-effective from GBs to PBs
• High availability through failover and replication
• LSM-trees
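An LSM-tree turns random writes into sequential ones: writes land in an in-memory buffer that is periodically flushed as an immutable sorted run, and runs are later merged by compaction. In HBase, the MemStore and HFiles play roughly these roles. A minimal in-process sketch, assuming small string keys; `TinyLSM` and `MEMTABLE_LIMIT` are illustrative names, and there is no write-ahead log or on-disk storage here.

```python
import bisect


class TinyLSM:
    """Minimal log-structured merge (LSM) tree sketch, held entirely in memory."""

    MEMTABLE_LIMIT = 3     # arbitrary illustrative flush threshold
    _TOMBSTONE = object()  # marker recording that a key was deleted

    def __init__(self):
        self.memtable = {}
        self.runs = []  # list of (sorted_keys, values); newest run last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.MEMTABLE_LIMIT:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: a tombstone shadows older versions.
        self.put(key, self._TOMBSTONE)

    def flush(self):
        # Freeze the memtable into an immutable, sorted run.
        if not self.memtable:
            return
        keys = sorted(self.memtable)
        self.runs.append((keys, [self.memtable[k] for k in keys]))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is self._TOMBSTONE else value
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                value = values[i]
                return None if value is self._TOMBSTONE else value
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest live version of each key.

        A real implementation streams a k-way merge of the sorted runs; a dict
        is used here for brevity.
        """
        self.flush()
        merged = {}
        for keys, values in self.runs:  # oldest first, so newer runs overwrite
            merged.update(zip(keys, values))
        live = sorted(k for k, v in merged.items() if v is not self._TOMBSTONE)
        self.runs = [(live, [merged[k] for k in live])] if live else []
```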
Distributed, Parallelized Query Execution
• Parallelized computation across cluster
• Moves computation to the data
• Utilizes HBase coprocessors
• No MapReduce
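"Moves computation to the data" can be illustrated with a scatter-gather aggregate: each region computes a small partial result where its rows live, and only those partials travel to a coordinator. A minimal sketch of AVG, assuming regions are plain lists of row dicts and threads stand in for server-side HBase coprocessors; `partial_avg` and `distributed_avg` are hypothetical names, not Splice Machine APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_avg(region_rows, column):
    """Runs next to the data: returns (sum, count) for one region only."""
    total = 0
    count = 0
    for row in region_rows:
        value = row.get(column)
        if value is not None:  # SQL AVG ignores NULLs
            total += value
            count += 1
    return total, count


def distributed_avg(regions, column):
    """Scatter a partial aggregate to every region, then gather and merge.

    Only two numbers per region are returned to the coordinator, not the rows.
    Merging (sum, count) pairs is exact; averaging per-region averages is not.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: partial_avg(rows, column), regions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

With regions holding `[10, 20]`, `[30]` and `[NULL, 40]`, the merged result is 25.0, whereas averaging the three per-region averages would give about 28.3. This is why the partial result carries a sum and a count rather than a local average.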
ANSI SQL-99 Coverage
• Data types – e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
• DDL – e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE
• Predicates – e.g., IN, BETWEEN, LIKE, EXISTS
• DML – e.g., INSERT, DELETE, UPDATE, SELECT
• Query specification – e.g., SELECT DISTINCT, GROUP BY, HAVING
• SET functions – e.g., UNION, ABS, MOD, ALL, CHECK
• Aggregation functions – e.g., AVG, MAX, COUNT
• String functions – e.g., SUBSTRING, concatenation, UPPER, LOWER, POSITION, TRIM, LENGTH
• Conditional functions – e.g., CASE, searched CASE
• Privileges – e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
• Cursors – e.g., updatable, read-only, positioned DELETE/UPDATE
• Joins – e.g., INNER JOIN, LEFT OUTER JOIN
• Transactions – e.g., COMMIT, ROLLBACK, READ COMMITTED, REPEATABLE READ, READ UNCOMMITTED, Snapshot Isolation
• Sub-queries
• Triggers
• User-defined functions (UDFs)
• Views – including grouped views
Lockless, ACID Transactions: State-of-the-Art Snapshot Isolation
• Adds multi-row, multi-table transactions to HBase with rollback
• Fast, lockless, high concurrency
• ZooKeeper coordination
• Extends research from Google Percolator, Yahoo Labs, U of Waterloo
[Diagram: Transactions A, B and C on a timeline, marked with Ts and Tc]
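Under snapshot isolation, as in the Percolator design cited on the slide, a transaction reads the newest versions committed before its start timestamp (presumably the Ts in the diagram) and, when it commits (presumably Tc), aborts if another transaction committed a write to the same key after that start: first committer wins. A single-process sketch, not Splice Machine's implementation; `MVCCStore`, `Transaction` and `WriteConflict` are hypothetical names, a counter stands in for a timestamp service, and there is no distribution, persistence or concurrency control around commit.

```python
import itertools


class WriteConflict(Exception):
    """Raised when another transaction committed the same key after our snapshot."""


class MVCCStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # stand-in for a timestamp oracle

    def next_ts(self):
        return next(self._clock)

    def begin(self):
        return Transaction(self, self.next_ts())

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit, invisible to other transactions

    def get(self, key):
        if key in self.writes:  # read your own writes
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def rollback(self):
        # Nothing was published yet, so aborting just discards buffered writes.
        self.writes.clear()

    def commit(self):
        # Write-write conflict check: no read locks are taken at any point.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise WriteConflict(key)
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

If transactions A and B both start, then both write the same key, A commits first and B's commit raises `WriteConflict`; until A commits, B's snapshot keeps returning the older value.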
BI and SQL tool support via ODBC: no application rewrites needed
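The "no application rewrites" claim rests on applications talking to the database through standard SQL and a standard driver interface (ODBC/JDBC in this deck). A minimal sketch of that idea using Python's DB-API, with the standard-library `sqlite3` module as a stand-in backend. The `accounts` table and the `connect` and `transfer` functions are hypothetical, and the deck gives no Splice Machine connection details, so `connect()` marks the only place a backend swap should touch. The `?` placeholder style is what `sqlite3` uses; other drivers may expect a different one.

```python
import sqlite3


def connect():
    # Stand-in backend. Ideally only this function changes when the
    # application is pointed at a different database through its driver.
    return sqlite3.connect(":memory:")


def transfer(conn, src, dst, amount):
    """Move funds atomically: both UPDATEs commit together or neither does."""
    cur = conn.cursor()
    try:
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        cur.execute("SELECT balance FROM accounts WHERE id = ?", (src,))
        (balance,) = cur.fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()    # COMMIT: both changes become visible together
    except Exception:
        conn.rollback()  # ROLLBACK: undo the partial debit
        raise
```

A failed transfer leaves both balances unchanged, which is the ACID behavior the application relies on regardless of which backend sits behind the driver.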
Thank You!
Perceptions & Questions
Analyst: Robin Bloor
Hadoop as a Data Refinery?
Robin Bloor, PhD
Data Flow – A Set of Principles
• The data layer is one logical collection of data, both external and internal
• Data flows from ingest through a refining process to a point of application
• It is best if data doesn't flow much
• "Vanilla Hadoop" is a viable catching & refining vehicle
• Beyond that, a database is required to manage workloads
Big Data Architecture
Data Refining
The Data Engines
[Diagram: engines for streaming data, OLTP, large query, large analytical query, and SQL, JSON and SPARQL queries]
• How does Splice Machine organize its data?
• Is this an OLTP database or a BI database? Or can it be both at the same time?
• What do you see as the sweet spot for this database:
  • In respect of Big Data?
  • In respect of business applications?
• Is Splice Machine also suited for analytical applications?
• Do you also find yourselves competing with NoSQL products?
• In respect of scale, what is your largest implementation by data volume and by transaction rate?
THANK YOU for your ATTENTION!
Opening slide image courtesy of Wikimedia Commons