Hadoop and the Relational Database: The Best of Both Worlds

Upload: inside-analysis

Post on 02-Jul-2015

Category: Technology


DESCRIPTION

The Briefing Room with Dr. Robin Bloor and Splice Machine
Live Webcast on August 5, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=71551d669454741c8bd56f2349bdf140

As the pressure of Big Data collides with the reality of daily operations, many organizations are trying to meet new requirements without disrupting the flow of business. One solution focuses on the data layer itself, combining the well-known functionality of relational database technology with the scale-out capabilities of Hadoop.

Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor outline the critical components of a business-ready data layer. He’ll be briefed by John Leach and Rich Reimer of Splice Machine, who will explain how their solution delivers the best of both data worlds: the trusted capabilities of relational with the infinite scalability of Hadoop. They will also discuss how Hadoop has transformed from a batch-oriented workhorse into a scale-out layer capable of supporting real-time applications and operational analytics using traditional SQL.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Hadoop and the Relational Database: The Best of Both Worlds

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Hadoop and the Relational Database: The Best of Both Worlds

The Briefing Room

Hadoop and the Relational Database: The Best of Both Worlds

Page 3: Hadoop and the Relational Database: The Best of Both Worlds

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4: Hadoop and the Relational Database: The Best of Both Worlds

Mission

• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Page 5: Hadoop and the Relational Database: The Best of Both Worlds

Topics

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA ECOSYSTEM

September: INTEGRATION & DATA FLOW

October: ANALYTIC PLATFORMS

Page 6: Hadoop and the Relational Database: The Best of Both Worlds

Executive Summary

Scale out is the new Agile

Business needs constant flexibility

No time for down time

Grow as quickly as you can sell

Page 7: Hadoop and the Relational Database: The Best of Both Worlds

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Page 8: Hadoop and the Relational Database: The Best of Both Worlds

Splice Machine

• Splice Machine is a SQL-on-Hadoop database
• The product is ACID-compliant and can power both OLAP and OLTP workloads
• Splice Machine is built on Java-based Apache Derby and HBase/Hadoop

Page 9: Hadoop and the Relational Database: The Best of Both Worlds

Guests: John Leach & Rich Reimer

John Leach, Co-Founder and Chief Technology Officer
With over 15 years of software experience under his belt, John’s expertise in analytics and BI drives his role as Chief Technology Officer. Prior to Splice Machine, John founded Incite Retail in June 2008 and led the company’s strategy and development efforts. Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners. His focus at Blue Martini was helping clients incorporate decision support knowledge into their current business processes utilizing advanced algorithms and machine learning.

Rich Reimer, VP of Marketing and Product Management
Rich has over 15 years of sales, marketing and management experience in high-tech companies. Before joining Splice Machine, Rich worked at Zynga as the Treasure Isle studio head, where he used petabytes of data from millions of daily users to optimize the business in real-time. Prior to Zynga, he was the COO and co-founder of a social media platform named Grouply. Before founding Grouply, Rich held executive positions at Siebel Systems, Blue Martini Software and Oracle Corporation as well as sales and marketing positions at General Electric and Bell Atlantic.

Page 10: Hadoop and the Relational Database: The Best of Both Worlds

Affordable Scale-Out

August 5, 2014

Page 11: Hadoop and the Relational Database: The Best of Both Worlds

Data Doubling Every 2 Years…
Driven by web, social, mobile, and Internet of Things

Source: 2013 IBM Briefing Book
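As a rough worked example of the doubling claim: if data volume doubles every 2 years, then after t years it has grown by a factor of 2^(t/2). The sketch below assumes smooth exponential growth; the function name and the starting volume are illustrative and do not come from the slide.

```python
def projected_volume(initial_volume, years, doubling_period_years=2.0):
    """Project data volume assuming it doubles once per doubling period.

    All names and defaults here are hypothetical; only the 2-year doubling
    period is taken from the slide.
    """
    return initial_volume * 2 ** (years / doubling_period_years)


# Growth factor after a decade, starting from 1 unit of data.
decade_factor = projected_volume(1.0, 10)
```

Under that assumption, a decade corresponds to five doublings, so `decade_factor` is 32.0.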

Page 12: Hadoop and the Relational Database: The Best of Both Worlds

Traditional RDBMSs Overwhelmed…
Scale-up becoming cost-prohibitive

• Oracle is too darn expensive!
• My DB is hitting the wall
• Users keep getting those spinning beach balls
• We have to throw data away
• Our reports take forever

Page 13: Hadoop and the Relational Database: The Best of Both Worlds

Case Study: Harte-Hanks

Overview
• Digital marketing services provider
• Real-time campaign management
• Complex OLTP and OLAP environment

Challenges
• Oracle RAC too expensive to scale
• Queries too slow – even up to ½ hour
• Getting worse – expect 30-50% data growth
• Looked for 9 months for a cost-effective solution

Solution Diagram

Initial Results
• ¼ cost with commodity scale out
• 3-7x faster through parallelized queries
• 10-20x price/perf with no application, BI or ETL rewrites

Cross-Channel Campaigns

Real-Time Personalization

Real-Time Actions

Page 14: Hadoop and the Relational Database: The Best of Both Worlds

Scale-Out: The Future of Databases
Dramatic improvement in price/performance

Scale Up (Increase server size) vs. Scale Out (More small servers)

Page 15: Hadoop and the Relational Database: The Best of Both Worlds

Who are We?

THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database

• Affordable, Scale-Out
• ACID Transactions
• No Application Rewrites

10x Better Price/Perf

Page 16: Hadoop and the Relational Database: The Best of Both Worlds

Customer Performance Benchmarks
Typically 10x price/performance improvement

[Chart: SPEED vs. PRICE/PERFORMANCE improvements; values shown: 30x, 3-7x, 10-20x, 10x, 20x, 10-15x, 7x, 5x]

Page 17: Hadoop and the Relational Database: The Best of Both Worlds

Use Cases

• Digital Marketing
• Campaign management
• Unified Customer Profile
• Real-time personalization

• Data Lake
• Operational reporting and analytics
• Operational Data Stores

• Fraud Detection
• Personalized Medicine
• Internet of Things

• Network monitoring
• Cyber-threat security
• Wearables and sensors

Page 18: Hadoop and the Relational Database: The Best of Both Worlds

Seasoned Team

Successful Serial Entrepreneurs

Enterprise Software Experience

Database & Big Data Experience

Big Data Research & Community Leadership

Hadoop User Group

Page 19: Hadoop and the Relational Database: The Best of Both Worlds

What People are Saying…
Recognized as a key innovator in databases

Quotes

Scaling out on Splice Machine presented some major benefits over Oracle ...automatic balancing between clusters...avoiding the costly licensing issues.

An alternative to today’s RDBMSes, Splice Machine effectively combines traditional relational database technology with the scale-out capabilities of Hadoop.

The unique claim of … Splice Machine is that it can run transactional applications as well as support analytics on top of Hadoop.

Awards

Page 20: Hadoop and the Relational Database: The Best of Both Worlds

Proven Building Blocks: Hadoop and Derby

APACHE DERBY
• ANSI SQL-99 RDBMS
• Java-based
• ODBC/JDBC Compliant

APACHE HBASE/HDFS
• Auto-sharding
• Real-time updates
• Fault-tolerance
• Scalability to 100s of PBs
• Data replication

   

Page 21: Hadoop and the Relational Database: The Best of Both Worlds

HBase: Proven Scale-Out

• Auto-sharding
• Scales with commodity hardware
• Cost-effective from GBs to PBs
• High availability thru failover and replication
• LSM-trees
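The LSM-tree bullet can be illustrated with a toy sketch. This is not HBase code: the class name `TinyLSM`, its methods, and the flush threshold are all hypothetical. It only shows the general idea of a log-structured merge tree, where writes land in an in-memory table, are flushed as sorted immutable runs, and are later merged by compaction.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus sorted, immutable runs.

    Hypothetical illustration only; real systems add write-ahead logs,
    on-disk files, bloom filters, and tombstones for deletes.
    """

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # most recent writes
        self.runs = []       # flushed runs, oldest first; each a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted run and start a fresh memtable."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, letting newer values overwrite older ones."""
        merged = {}
        for run in self.runs:          # oldest first, so later runs win
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Because writes only ever append new runs, updates never rewrite data in place; reads pay for that by consulting several runs until compaction merges them.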

Page 22: Hadoop and the Relational Database: The Best of Both Worlds

Distributed, Parallelized Query Execution

• Parallelized computation across cluster
• Moves computation to the data
• Utilizes HBase co-processors
• No MapReduce
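A minimal sketch of what "moving computation to the data" can look like, under loud assumptions: each list below stands in for the rows held by one region, threads stand in for per-node execution, and the function names are invented for illustration. It does not use HBase co-processors or any Splice Machine API. Each partition computes a partial aggregate locally, and only the small partial results are combined centrally.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(region_rows, predicate):
    """Scan one partition where it lives and return a (sum, count) pair."""
    total, count = 0.0, 0
    for row in region_rows:
        if predicate(row):
            total += row["amount"]
            count += 1
    return total, count


def parallel_avg(regions, predicate):
    """Compute AVG(amount) by merging per-partition partial aggregates."""
    if not regions:
        return None
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(
            pool.map(lambda rows: partial_aggregate(rows, predicate), regions)
        )
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical data: two partitions of an orders table.
regions = [
    [{"region": "west", "amount": 10.0}, {"region": "east", "amount": 30.0}],
    [{"region": "west", "amount": 20.0}],
]
west_avg = parallel_avg(regions, lambda r: r["region"] == "west")
```

The average is carried as a (sum, count) pair because averages of averages are not generally correct; sums and counts can be merged safely across partitions.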

Page 23: Hadoop and the Relational Database: The Best of Both Worlds

ANSI SQL-99 Coverage

• Data types – e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
• DDL – e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE
• Predicates – e.g., IN, BETWEEN, LIKE, EXISTS
• DML – e.g., INSERT, DELETE, UPDATE, SELECT
• Query specification – e.g., SELECT DISTINCT, GROUP BY, HAVING
• SET functions – e.g., UNION, ABS, MOD, ALL, CHECK
• Aggregation functions – e.g., AVG, MAX, COUNT
• String functions – e.g., SUBSTRING, concatenation, UPPER, LOWER, POSITION, TRIM, LENGTH
• Conditional functions – e.g., CASE, searched CASE
• Privileges – e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
• Cursors – e.g., updatable, read-only, positioned DELETE/UPDATE
• Joins – e.g., INNER JOIN, LEFT OUTER JOIN
• Transactions – e.g., COMMIT, ROLLBACK, READ COMMITTED, REPEATABLE READ, READ UNCOMMITTED, Snapshot Isolation
• Sub-queries
• Triggers
• User-defined functions (UDFs)
• Views – including grouped views
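To make a few of the listed constructs concrete, here is a small query that combines CREATE TABLE, INSERT, LEFT OUTER JOIN, GROUP BY, HAVING, BETWEEN, COUNT, and CASE. It runs against Python's built-in `sqlite3` module purely as a stand-in engine; the tables and data are hypothetical, and nothing here exercises Splice Machine itself, which per the slides would be reached over ODBC/JDBC.

```python
import sqlite3

# In-memory stand-in database; table names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(40));
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 150.0), (3, 2, 20.0);
""")

# Order count and spending tier per customer, including customers with no orders.
rows = conn.execute("""
SELECT c.name,
       COUNT(o.id) AS order_count,
       CASE WHEN SUM(o.amount) >= 100 THEN 'high' ELSE 'low' END AS tier
FROM customers c
LEFT OUTER JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
HAVING COUNT(o.id) BETWEEN 0 AND 5
ORDER BY c.name
""").fetchall()
conn.close()
```

The outer join keeps the customer with no orders in the result; that row has an order count of 0 and falls through to the ELSE branch because its SUM is NULL.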

Page 24: Hadoop and the Relational Database: The Best of Both Worlds

Lockless, ACID transactions
State-of-the-Art Snapshot Isolation

• Adds multi-row, multi-table transactions to HBase with rollback
• Fast, lockless, high concurrency
• ZooKeeper coordination
• Extends research from Google Percolator, Yahoo Labs, U of Waterloo

[Figure: timeline of Transactions A, B, and C with markers Ts and Tc]
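The general idea of snapshot isolation can be sketched in a few lines. This is a single-process teaching model, not Splice Machine's implementation: the class names are hypothetical, a simple counter stands in for a timestamp service, and reading Ts and Tc as start and commit timestamps is an assumption based on common usage. Each transaction reads the versions committed at or before its start timestamp, buffers its own writes, and at commit time aborts if another transaction has committed a write to the same key since it started (first committer wins).

```python
import itertools


class ConflictError(Exception):
    """Raised when a write-write conflict is detected at commit time."""


class Store:
    """Multi-version store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self._clock = itertools.count(1)   # stand-in for a timestamp oracle
        self.versions = {}

    def next_ts(self):
        return next(self._clock)

    def begin(self):
        return Transaction(self, self.next_ts())


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}                   # buffered until commit

    def read(self, key):
        if key in self.writes:             # read your own writes
            return self.writes[key]
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.start_ts:  # only versions in the snapshot
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any written key was committed by someone else after we started.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise ConflictError(key)
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

In this model readers never block writers, since each reader simply ignores versions newer than its snapshot, and rollback amounts to discarding the buffered writes. A distributed implementation would additionally need the conflict check and version install to be atomic across nodes, which this sketch does not attempt.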

Page 25: Hadoop and the Relational Database: The Best of Both Worlds

BI and SQL tool support via ODBC
No application rewrites needed

 

Page 26: Hadoop and the Relational Database: The Best of Both Worlds

Who are We?

THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database

• Affordable, Scale-Out
• ACID Transactions
• No Application Rewrites

10x Better Price/Perf

Page 27: Hadoop and the Relational Database: The Best of Both Worlds

Thank You!

Page 28: Hadoop and the Relational Database: The Best of Both Worlds

Perceptions & Questions

Analyst: Robin Bloor

Page 29: Hadoop and the Relational Database: The Best of Both Worlds

Hadoop as a Data Refinery?

Robin Bloor, PhD

Page 30: Hadoop and the Relational Database: The Best of Both Worlds

Data Flow – A Set of Principles

• The data layer is one logical collection of data, both external and internal
• The data flows, from ingest through a refining process to a point of application
• It is best if data doesn’t flow much
• “Vanilla Hadoop” is a viable catching & refining vehicle
• Beyond that, a database is required to manage workloads

Page 31: Hadoop and the Relational Database: The Best of Both Worlds

Big Data Architecture

Page 32: Hadoop and the Relational Database: The Best of Both Worlds

Data Refining

Page 33: Hadoop and the Relational Database: The Best of Both Worlds

The Data Engines

STREAMING DATA

OLTP

LARGE QUERY

LARGE ANALYTICAL QUERY

SQL, JSON, SPARQL QUERIES

Page 34: Hadoop and the Relational Database: The Best of Both Worlds

• How does Splice Machine organize its data?
• Is this an OLTP database or a BI database? Or can it be both at the same time?
• What do you see as the sweet spot for this database:
  - In respect of Big Data?
  - In respect of business applications?

Page 35: Hadoop and the Relational Database: The Best of Both Worlds

• Is Splice Machine also suited for analytical applications?
• Do you also find yourselves competing with NoSQL products?
• In respect of scale, what is your largest implementation by data volume and by transaction rate?

Page 36: Hadoop and the Relational Database: The Best of Both Worlds

Page 37: Hadoop and the Relational Database: The Best of Both Worlds

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA ECOSYSTEM

September: INTEGRATION & DATA FLOW

October: ANALYTIC PLATFORMS

Page 38: Hadoop and the Relational Database: The Best of Both Worlds

THANK YOU for your ATTENTION!

Opening slide image courtesy of Wikimedia Commons