SQL in Hadoop: Big Data Innovation Without the Risk


Uploaded by inside-analysis on 07-Aug-2015


Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

SQL In Hadoop: Big Data Innovation Without the Risk

Twitter Tag: #briefr The Briefing Room

Welcome

Host: Eric Kavanagh

@eric_kavanagh


Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!


Topics

July: SQL INNOVATION

August: REAL-TIME DATA

September: HADOOP 2.0


Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor


Actian

Actian offers a variety of analytics, data management and integration solutions:
• The Actian Analytics Platform includes Vortex, a SQL-in-Hadoop solution for big data analytics
• Actian Vortex leverages a vector-based columnar analytics engine that is YARN-compliant


Guest: Todd Untrecht

Todd Untrecht joined Actian in 2013, where he is currently Vice President of Global Product Management and Strategy. Todd brings more than 20 years of experience in both large-company and startup environments. He specializes in product management, engineering management, and business transformation, with particular expertise in leading and aligning global engineering and product organizations, driving new products into new markets, and accelerating cross-company innovation.

Confidential © 2014 Actian Corporation

SQL in Hadoop: Big Data Innovation Without Risk

Todd Untrecht, Vice President, Product Management and Strategy
Emma McGrattan, Sr. Vice President, Engineering
Actian Corporation

Bloor Group Briefing Room, July 2015


Who Is Actian?

• $100M+ revenues and profitable
• 10,000+ customers
• Global presence: 8 worldwide offices, 24x7 multinational support model
• Actian has invested hundreds of millions in next-generation technology architected to meet future demands

“Fast becoming a big data powerhouse to challenge the market.” (Forrester)

“Actian is now very powerfully positioned in the big data and analytics markets.” (Bloor)


Modernizing BI & Analytic Workloads

(Diagram: workloads plotted from small data to big data and from operational to analytics; labels include Performance Ceiling, Big Data SQL Analytics Market, and Actian Sweet Spot.)

Catalysts driving Big Data SQL Analytics:
• Analyze more (and different) data
• Reduce costs
• Modern, massively distributed compute infrastructure with commodity hardware
• Well-recognized business value, but existing legacy systems are failing


Your BI and Analytic Systems Are Under Pressure

(Diagram: ISVs and custom apps running industrial SQL on legacy HW & SW platforms.)


Increasing pressures on legacy infrastructure are causing analytic workloads to break


$$  



How to Innovate and Modernize Without Risk…

(Diagram: ISVs and custom apps running industrial SQL on legacy HW & SW platforms.)


Modern, massively distributed compute infrastructure with commodity HW


Keep Existing Apps and People


Grow Data with No Change in Performance and No Extra Budget


Leverage Advances in Open Source and Avoid Vendor Lock-In


Actian Vortex: Modern, Super-Scaling Columnar SQL Analytic Engine

Enterprise Grade, Fast, Open Industrial SQL

Actian Vortex™: Highest Performance Analytics at Scale in Hadoop
• Data: Enterprise, Social, Internet of Things, SaaS
• Capabilities: Elastic Data Preparation, SQL Analytics, Predictive Analytics
• Users: The Wiz (Data Scientist), IT Sophisticate (CIO), Maestro (Business Analyst), Speed Demon (Impatient Business User)
• Value: Delight Customers, Improve Competitive Edge, Reduce Risk and Cost, Innovate


Vortex components: Actian DataFlow for Elastic Data Preparation and Predictive Analytics; Actian Vector in Hadoop for SQL Analytics.


Vortex: Elastic Data Preparation

The fastest way to get big data into Actian
• New edge-to-engine, high-speed parallel ingestion, no matter where the source data resides
• Highly parallel and elastic data ingestion
• New Vector in Hadoop Writer
• Secure, compressed, binary-to-binary ingestion (no intermediate HDFS files needed)

(Diagram: local data sources, cloud data and applications, and streaming data flow over the LAN and high-volume data pipes into a remote Vortex Hadoop cluster, where DataFlow elastic data ingestion loads Vector in Hadoop on HDFS.)
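The slide describes ingestion that is highly parallel and writes binary data straight into the engine. The Python sketch below illustrates only the general pattern, assuming a simple three-column schema: split incoming rows into chunks, convert each chunk to column arrays in parallel workers, then append the columns in order. The names `ingest_parallel`, `to_columns`, and `COLUMNS` are hypothetical and do not represent the DataFlow or Vector in Hadoop Writer APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Sequence, Tuple

Row = Tuple[int, str, float]
COLUMNS = ("id", "region", "amount")  # hypothetical target schema


def to_columns(chunk: Sequence[Row]) -> Dict[str, list]:
    """Convert one chunk of rows into per-column lists (row layout -> columnar layout)."""
    return {name: [row[i] for row in chunk] for i, name in enumerate(COLUMNS)}


def ingest_parallel(rows: Sequence[Row], chunk_size: int = 2, workers: int = 4) -> Dict[str, list]:
    """Split the input into chunks, convert them concurrently, then append in order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    table: Dict[str, list] = {name: [] for name in COLUMNS}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the appended columns keep row order.
        for cols in pool.map(to_columns, chunks):
            for name in COLUMNS:
                table[name].extend(cols[name])
    return table


source = [(1, "east", 10.0), (2, "west", 5.5), (3, "east", 7.25), (4, "north", 1.0), (5, "west", 2.0)]
table = ingest_parallel(source)
print(table["region"])  # ['east', 'west', 'east', 'north', 'west']
```

Threads keep the sketch short; a real loader would spread CPU-bound conversion across processes or cluster nodes rather than Python threads.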


Vortex: Open Architecture

• Query native Hadoop file formats (e.g. Parquet) without ingestion, via external table support
• Highly parallel and elastic data ingestion

(Diagram: local, remote, cloud, and streaming data flow over the LAN and high-volume data pipes into the Vortex Hadoop cluster through DataFlow elastic data ingestion into Vector in Hadoop; Parquet files in HDFS are read directly as external tables.)
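External table support means the engine reads files where they already sit instead of loading them first. Parquet needs a third-party reader, so this stdlib-only Python sketch uses in-memory CSV text as a stand-in for the files. The `ExternalTable` class and its `scan` and `select` methods are hypothetical illustrations of schema-on-read, not the Vortex API.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List


class ExternalTable:
    """Hypothetical external table: a schema plus files that are parsed only at query time."""

    def __init__(self, schema: Dict[str, type], files: List[str]):
        self.schema = schema
        self.files = files  # stand-in: each entry is the text content of one CSV file

    def scan(self) -> Iterator[Dict[str, object]]:
        # Nothing is copied or loaded up front; each file is read in place on every scan.
        for content in self.files:
            for raw in csv.DictReader(io.StringIO(content)):
                yield {col: typ(raw[col]) for col, typ in self.schema.items()}

    def select(self, predicate: Callable[[Dict[str, object]], bool]) -> List[Dict[str, object]]:
        """Apply a filter while scanning, like a WHERE clause over the external files."""
        return [row for row in self.scan() if predicate(row)]


part1 = "id,amount\n1,10.5\n2,3.0\n"
part2 = "id,amount\n3,42.0\n"
sales = ExternalTable({"id": int, "amount": float}, [part1, part2])
big = sales.select(lambda r: r["amount"] > 5)
print([r["id"] for r in big])  # [1, 3]
```

The trade-off the slide implies: no load step and no copy of the data, at the cost of parsing the source format on every query.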


Vortex: Open Architecture
• Open up the Vector in Hadoop file format for lightning-fast external consumption
• Open APIs / Java reference implementation


The Basics: Actian Vector
• Pioneered high-speed columnar, vector processing architecture
• Over 10 years in development
• 5 years in production
• Supports standard SQL interfaces
• Mature SQL processing front end
• Supports advanced analytic capabilities, e.g. CUBE, ROLLUP, GROUPING SETS, window functions
• Unique trickle update capabilities
• Designed to leverage modern hardware
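ROLLUP, CUBE, and GROUPING SETS compute several GROUP BY levels in one statement. In standard SQL, `GROUP BY ROLLUP (region, product)` is equivalent to `GROUPING SETS ((region, product), (region), ())`: per-pair totals, per-region subtotals, and a grand total. The Python sketch below reproduces those semantics on a small in-memory list to show what such a query returns; it says nothing about how Vector evaluates it internally, and the rows and column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

rows = [  # hypothetical sales rows: (region, product, amount)
    ("east", "a", 10),
    ("east", "b", 5),
    ("west", "a", 7),
]


def rollup(data: List[Tuple[str, str, int]]) -> Dict[Tuple[Optional[str], Optional[str]], int]:
    """Sum amount for each grouping set of ROLLUP(region, product).

    None plays the role of SQL's NULL in rolled-up (subtotal) columns.
    """
    totals: Dict[Tuple[Optional[str], Optional[str]], int] = defaultdict(int)
    for region, product, amount in data:
        totals[(region, product)] += amount  # grouping set (region, product)
        totals[(region, None)] += amount     # grouping set (region)
        totals[(None, None)] += amount       # grouping set () = grand total
    return dict(totals)


result = rollup(rows)
print(result[("east", None)])  # 15: subtotal for east
print(result[(None, None)])    # 22: grand total
```

CUBE(region, product) would additionally produce the `(product)` grouping set, i.e. a subtotal per product across all regions.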

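(Diagram: SQL processing. A client application sends a SQL query to the SQL parser, which produces a parsed tree; the optimizer turns it into a query plan; the cross compiler emits X100 algebra. Inside X100, the rewriter produces an annotated query tree, the builder produces an operator tree, and the execution engine requests data from the buffer manager, which performs I/O against compressed storage. The result is returned to the client application.)

Foundation for Enterprise Grade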
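The vector processing idea behind X100 is that operators exchange small batches (vectors) of column values rather than one row at a time, so per-call overhead is amortized across the batch and tight loops run over contiguous arrays. The Python sketch below shows only the shape of such a pipeline, a scan feeding a selection and a sum, with a hypothetical vector size of 4. Python will not reproduce the speedup, and none of the names are Vector internals.

```python
from typing import Iterator, List

VECTOR_SIZE = 4  # hypothetical; real engines choose a size that keeps vectors cache-resident


def scan(column: List[int]) -> Iterator[List[int]]:
    """Emit the column as consecutive vectors (batches) of values."""
    for start in range(0, len(column), VECTOR_SIZE):
        yield column[start:start + VECTOR_SIZE]


def select_greater(vectors: Iterator[List[int]], threshold: int) -> Iterator[List[int]]:
    """Filter a whole vector per call instead of one value per call."""
    for vec in vectors:
        kept = [v for v in vec if v > threshold]
        if kept:
            yield kept


def sum_aggregate(vectors: Iterator[List[int]]) -> int:
    """Aggregate operator: consumes vectors and keeps a running total."""
    return sum(sum(vec) for vec in vectors)


prices = [3, 9, 1, 12, 7, 4, 15, 2, 8]
# Pipeline equivalent of: SELECT SUM(price) FROM t WHERE price > 5
print(sum_aggregate(select_greater(scan(prices), 5)))  # 51
```

The selection here copies qualifying values for brevity; vectorized engines commonly pass a selection vector of qualifying positions instead, so downstream operators avoid copying.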


Actian Vector in Hadoop: Distributed X100 “Secret Sauce”

(Diagram: the master node, alongside the HDFS namenode, runs the same SQL processing front end as Vector: the SQL parser, optimizer, and cross compiler produce X100 algebra for a distributed rewriter, which passes an annotated query tree to the builder and execution engine, with a buffer manager performing I/O against HDFS. The master sends annotated trees to X100 on worker nodes [1..n] (the HDFS datanodes). Each worker runs its own rewriter, builder, execution engine, and buffer manager on a partial operator tree over its local HDFS data and returns a partial result set. Inter-node communication uses MPI.)

Actian Vector extended and optimized to run inside a Hadoop cluster
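In the diagram, the master node sends annotated query trees to X100 on each worker, and each worker returns a partial result set over MPI. The Python sketch below imitates that split for a grouped count and sum, assuming data is already partitioned across three hypothetical workers: each worker aggregates only its local rows, and the master merges the partial results. Plain lists stand in for HDFS blocks and a thread pool stands in for the worker nodes and MPI; this illustrates two-phase aggregation in general, not Actian's actual protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Partial = Dict[str, Tuple[int, float]]  # group -> (row count, sum of amount)

# Hypothetical data already split across three worker nodes: (region, amount)
partitions: List[List[Tuple[str, float]]] = [
    [("east", 10.0), ("west", 4.0)],
    [("east", 6.0)],
    [("west", 1.0), ("north", 3.0), ("east", 2.0)],
]


def worker_partial_agg(local_rows: List[Tuple[str, float]]) -> Partial:
    """Phase 1, on each worker: aggregate only the locally stored rows."""
    out: Dict[str, Tuple[int, float]] = {}
    for region, amount in local_rows:
        count, total = out.get(region, (0, 0.0))
        out[region] = (count + 1, total + amount)
    return out


def master_merge(partials: List[Partial]) -> Partial:
    """Phase 2, on the master: combine partial result sets into the final answer."""
    merged: Dict[str, Tuple[int, float]] = {}
    for partial in partials:
        for region, (count, total) in partial.items():
            c, t = merged.get(region, (0, 0.0))
            merged[region] = (c + count, t + total)
    return merged


with ThreadPoolExecutor() as pool:  # stands in for the worker nodes and MPI transport
    partials = list(pool.map(worker_partial_agg, partitions))

result = master_merge(partials)
print(result["east"])  # (3, 18.0)
```

Only the small partial results cross the network. An AVG would be computed at the master as merged sum divided by merged count, because averaging the workers' averages gives the wrong answer when partitions differ in size.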


• Enterprise ready, industrial-strength SQL in Hadoop
• HDFS for storage scalability and redundancy
• ACID compliant with SQL update capability
• YARN certified for cluster and resource management
• Vector performance on every node
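Compressed columnar storage is expensive to rewrite row by row, which is why SQL updates and trickle updates need special handling. The slides do not explain Vector's mechanism; Vectorwise research publications describe a more elaborate structure called positional delta trees. The general idea sketched below in Python is to keep small inserts, updates, and deletes in an in-memory delta and merge them with the stable column at scan time, folding the delta in periodically. The `DeltaColumn` class is hypothetical and far simpler than any real implementation (for example, it only updates or deletes rows that exist in stable storage).

```python
from typing import Dict, List, Set


class DeltaColumn:
    """Hypothetical column with read-only stable storage plus an in-memory delta."""

    def __init__(self, stable: List[int]):
        self.stable = tuple(stable)        # stands in for compressed on-disk data
        self.updates: Dict[int, int] = {}  # stable row position -> new value
        self.deletes: Set[int] = set()     # deleted stable row positions
        self.inserts: List[int] = []       # appended rows

    def update(self, pos: int, value: int) -> None:
        self.updates[pos] = value  # recorded in the delta; stable storage is not rewritten

    def delete(self, pos: int) -> None:
        self.deletes.add(pos)

    def insert(self, value: int) -> None:
        self.inserts.append(value)

    def scan(self) -> List[int]:
        """Merge stable values with the delta at read time."""
        merged = [
            self.updates.get(pos, value)
            for pos, value in enumerate(self.stable)
            if pos not in self.deletes
        ]
        return merged + self.inserts

    def checkpoint(self) -> None:
        """Fold the delta into a new stable copy, e.g. when the delta grows too large."""
        self.stable = tuple(self.scan())
        self.updates, self.deletes, self.inserts = {}, set(), []


col = DeltaColumn([10, 20, 30])
col.update(1, 25)
col.delete(0)
col.insert(40)
print(col.scan())  # [25, 30, 40]
col.checkpoint()
print(col.stable)  # (25, 30, 40)
```

Readers always see the merged view, while the expensive rewrite of compressed data happens only at checkpoint time.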


Results: Highest Performing SQL in Hadoop

(Chart: number of times faster than Impala, on a 0 to 35 scale, for queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, and Q98 of the “Impala Subset” of TPC-DS at Scale Factor 3000 (3TB), Actian + HDP 2.1 vs. Cloudera Impala. Legend: Impala, Actian.)

16x faster on average; up to 30x faster.

Both were executed on the same hardware and software environment: a 5-node cluster with 64GB of RAM per node and 24x1TB hard disks.

Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
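The slide reports a 16x average and a 30x maximum but does not say how the average was computed, and the per-query values were lost with the chart. For readers checking similar claims, the Python sketch below shows common ways to summarize per-query speedups: the arithmetic mean of the ratios, the geometric mean, and the maximum. The timings are hypothetical and are not the benchmark's data.

```python
import math
from typing import Dict

# Hypothetical per-query runtimes in seconds (NOT the benchmark's results).
baseline = {"q1": 40.0, "q2": 12.0, "q3": 90.0}
candidate = {"q1": 4.0, "q2": 3.0, "q3": 3.0}


def speedups(base: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Per-query speedup = baseline time / candidate time."""
    return {q: base[q] / new[q] for q in base}


ratios = speedups(baseline, candidate)  # q1: 10x, q2: 4x, q3: 30x
arithmetic = sum(ratios.values()) / len(ratios)
geometric = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))
print(f"max {max(ratios.values()):.0f}x, mean {arithmetic:.1f}x, geomean {geometric:.1f}x")
# max 30x, mean 14.7x, geomean 10.6x
```

The arithmetic mean is pulled up by a few large ratios, which is why the geometric mean is often preferred for averaging speedups. A third option is the ratio of total runtimes, here 142 s / 10 s = 14.2x. None of these substitutes for the per-query times.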

Note the use of partition keys.
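The note about partition keys matters because TPC-DS fact tables can be partitioned on a date key, which lets a query with a date filter skip partitions that cannot match. The Python sketch below shows partition pruning in its simplest form; the monthly layout, the `sales_by_month` data, and the `prune_and_scan` function are hypothetical, and the slide does not state how either system partitioned its tables.

```python
from typing import Dict, List, Tuple

# Hypothetical fact table partitioned by month key: month -> list of (order_id, amount)
sales_by_month: Dict[int, List[Tuple[int, float]]] = {
    201401: [(1, 5.0), (2, 7.5)],
    201402: [(3, 2.0)],
    201403: [(4, 9.0), (5, 1.0)],
}


def prune_and_scan(parts: Dict[int, List[Tuple[int, float]]], lo: int, hi: int) -> Tuple[float, int]:
    """Sum amounts for partition keys in [lo, hi]; return (total, partitions scanned).

    Partitions whose key falls outside the filter are never read at all.
    """
    scanned = 0
    total = 0.0
    for key, rows in parts.items():
        if not lo <= key <= hi:
            continue  # pruned: decided from the partition key alone, no rows touched
        scanned += 1
        total += sum(amount for _, amount in rows)
    return total, scanned


print(prune_and_scan(sales_by_month, 201402, 201403))  # (12.0, 2)
```

Because pruning can remove most of the I/O for date-filtered queries, benchmark comparisons are only fair when both systems use the same partitioning.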


Vortex: SQL Analytics

Vortex Summary

Open
What we provide: collaborative architecture; open access to Actian formats; support for non-Actian formats
Customer benefit: You’re NOT locked in, and you can benefit from all the advances and innovation in open source.

Fast
What we provide: fastest data prep and ingestion; fastest SQL analytic engines; unbridled processing power on data nodes in a Hadoop cluster
Customer benefit: You get the results you need when you need them as your data volumes grow.

Enterprise Grade
What we provide: full SQL support; extreme scalability; full security; high availability and disaster recovery
Customer benefit: You get all the advantages of proven technology in an immature space.


…and it’s very easy to get started
• Pick the analytic workload causing you the most pain
• Seamlessly run it in our modern SQL analytics platform
• Benefit from our open architecture
• Enjoy flexible deployment options (on-prem or in the cloud)
• Get up and running in 30 minutes
• Easily migrate workloads

Innovate and modernize now without risk…

Actian Vortex: Modern, Super-Scaling Columnar SQL Analytic Engine

Industrial SQL

Thank You!

Questions?

Clearly Differentiated

(Chart: Big Data Analytics Market, positioning offerings by performance (slow to fast), enterprise readiness (immature to industrial strength), and level of openness. Groups: Open Source Up-Starts, Legacy Operational, and Modern, Super-Scaling Columnar Analytic Engines; labels: Good Enough, Production Ready. Actian is positioned as Open, Fast, Enterprise Grade SQL.)

Actian Vector: Unmatched Innovation


Perceptions & Questions

Analyst: Robin Bloor

Analytical Workloads

Robin Bloor, PhD

Johnny-Come-Lately

Data science/analytics is not an application; it is a workflow environment involving many applications.

Analytics: The 80%

(Diagram: Hadoop enterprise data store, data analytics engine, and data encryption, with data flowing in; caption: Pre-Data Science Analytics.)

The Data Science Cycle

(Diagram: stages of the cycle: Data Access, Data Prep, Analyze, Model, Deploy, Execute, Audit.)

The Data Science Latencies

1  Data access

2  Data preparation

3  Model development

4  Execution

5  Implementation

6  Model audit & update

This is where the rubber meets the road: Speed = Value

The Net Net

There is now a technology race to reduce the data science latencies.

• Given that analytics is a complex application, what is the process of implementing Actian’s technology?
• How many of your customers (roughly) are building predictive analytics apps?
• Does your technology have application for real-time streaming?
• Is Vortex also appropriate for BI applications?
• What is the largest amount of data currently under management with any of your customers?
• Which companies/technologies do you compete with directly?


Upcoming Topics

www.insideanalysis.com

July: SQL INNOVATION

August: REAL-TIME DATA

September: HADOOP 2.0


THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons