SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian

Confidential © 2014 Actian Corporation. Actian SQL in Hadoop. Emma K McGrattan, Actian Corp. October 7th 2014

Upload: huguk

Post on 25-May-2015


DESCRIPTION

Can you use the full SQL language in Hadoop without limitations? Can you run SQL on the freshest Hadoop data without moving data out into another database every time you want to run a query? The answer is a resounding yes. Unlike many solutions that promise SQL access in Hadoop but don’t deliver, this session will showcase a solution where users can benefit from enterprise-ready, scalable SQL access to data in Hadoop. Furthermore, we’ll talk about adding trickle-update support on HDFS, a file system designed for data to be written once and read ever after. The addition of trickle-update support on Hadoop will mean that traditional OLTP workloads can be run natively on Hadoop without having to pull the data into a traditional RDBMS.

TRANSCRIPT

Page 1: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian SQL in Hadoop

Emma K McGrattan, Actian Corp. October 7th 2014

Page 2: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian is Delivering Transformational Value

$140M Revenues + Profitable

10,000+ Customers

Global Presence: 8 worldwide offices, 7x24 multinational support model

“Fast becoming a big data powerhouse to challenge the market.” Forrester

“Actian is now very powerfully positioned in the big data and analytics markets.” Bloor

Page 3: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian Analytics Platform – Vector

Built for Speed, Fast Time to Value

[Figure: time / cycles to process versus data processed, for DISK, RAM, and CHIP. Data sizes shown: 10GB, 2-3GB, 40-400MB. Cycles shown: 2-20, 150-250, millions.]

• Vector Processing: Single Instruction, Multiple Data

• 2nd Gen Column Store: limit I/O; efficient real-time updates

• Smarter Compression: maximize throughput; vectorized decompression

• Exploiting Chip Cache: process data on chip, not in RAM

• Multi-core Parallelism: maximize system resource utilization…

• Storage Indexes: quickly identify candidate data blocks; minimize I/O
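The "Storage Indexes" item on this slide (quickly identify candidate data blocks, minimize I/O) can be illustrated with a small sketch. The slides do not describe how Actian implements this, so the code below assumes the common min/max-per-block approach; the `Block`, `build_blocks`, and `scan_range` names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One column block plus the min/max summary kept in the index (assumed layout)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(values: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The min/max summary proves this block holds no candidate rows: skip it.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Hypothetical data: a sorted column of 1000 values in blocks of 100.
blocks = build_blocks(list(range(1000)), block_size=100)
matches, blocks_read = scan_range(blocks, low=250, high=260)
print(f"read {blocks_read} of {len(blocks)} blocks, found {len(matches)} rows")
```

The saving depends on how well the data is clustered on the filtered column; on randomly ordered data most blocks would overlap the range and few could be skipped.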

Page 4: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


TPC-H 1TB – Faster, Less Hardware

Fastest TPC-H QphH@1TB Benchmark (non-clustered)

QphH: Actian Vector 445,529; Actian Vector 436,788; SQL Server 219,888; Oracle 209,534; Oracle 201,487; SQL Server 173,962; Sybase IQ 164,747; Oracle 140,181; SQL Server 134,117

Result dates: June ‘12; May ‘11; Aug ‘11; June ‘11; Sept ‘11; Apr ‘11; Dec ‘10; Apr ‘10; Dec ‘11

Hardware Cost (excluding discounts): $57,146; $1,229,968; $460,869; $2,402,706; $753,392; $278,527; $85,621; $1,249,967; $258,880

Source: www.tpc.org

Page 5: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


SQL in Hadoop Architecture

Master node (namenode):

• Client application submits the SQL query

• SQL processing: SQL parser → (parsed tree) → Optimizer → (query plan) → Cross compiler → (X100 algebra)

• X100: Distributed rewriter → (annotated query tree) → Builder → (operator tree) → Execution engine

• Buffer manager: data request / data, with I/O against HDFS

• HDFS namenode

Worker nodes [1..n] (data nodes):

• X100: Rewriter → (annotated query tree) → Builder → (partial operator tree) → Execution engine

• Buffer manager: data request / data, with I/O against HDFS

• HDFS datanode

MPI inter-node communication:

• Master to workers: annotated tree

• Workers to master: partial result set

• Master: result
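The architecture slide shows a master node sending an annotated tree to the worker nodes and receiving partial result sets back over MPI. A minimal scatter-gather sketch of that flow follows. It is an assumption-laden illustration, not Actian's implementation: threads stand in for worker nodes, a Python predicate stands in for the plan fragment, and `worker_execute` and `master_execute` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # (store, amount): a made-up schema for this sketch


def worker_execute(partition: List[Row], predicate: Callable[[int], bool]) -> Counter:
    """Worker side: run the plan fragment on local rows, return a partial result set."""
    partial: Counter = Counter()
    for store, amount in partition:
        if predicate(amount):
            partial[store] += amount
    return partial


def master_execute(partitions: List[List[Row]],
                   predicate: Callable[[int], bool]) -> Dict[str, int]:
    """Master side: ship the fragment to every worker, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: worker_execute(p, predicate), partitions))
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds the per-store sums together
    return dict(result)


# Hypothetical data: three workers, each holding one local partition.
partitions = [
    [("store_a", 10), ("store_b", 5)],
    [("store_a", 7), ("store_c", 3)],
    [("store_b", 1), ("store_c", 20)],
]
# Roughly: SELECT store, SUM(amount) ... WHERE amount >= 3 GROUP BY store
totals = master_execute(partitions, lambda amount: amount >= 3)
print(totals)
```

The merge step works here because SUM can be computed from partial sums; aggregates such as a median cannot be combined this way and would need a different partial representation.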

Page 6: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian Analytics Platform – Hadoop Integration

Transforms Hadoop into a High Performance Analytics Platform

[Diagram: a Hadoop cluster (YARN, HDFS) with a NameNode and several DataNodes running X100 on HDFS. Actian Vector exposes Standard SQL Interfaces. A Visual Data & Analytics Workflow covers Read, Load, Blend & Enrich, and Data Science & Analytics.]

HDFS Data Block

• Original file format

• Standard block replication

High Performance, Industrialized SQL Database

High Performance, Data Science & Analytics

• Column-based blocks

• Compressed

• Partitioned

Replicated Data

• >=3 Replicated Copies of Data Blocks

• Leveraged to co-locate data with various join keys
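The slide notes that data is partitioned and that replicas are leveraged to co-locate data with various join keys. The sketch below illustrates only the general idea of co-location: if two tables are hash-partitioned on the join key, matching rows land on the same node and each node can join locally. How Actian actually places replicas is not described in the slides; the CRC32-based `node_for` function and all other names here are assumptions for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def node_for(key: object, num_nodes: int) -> int:
    """Map a join key to a node with a deterministic hash (assumed placement rule)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition(rows: List[Tuple], key_index: int, num_nodes: int) -> Dict[int, List[Tuple]]:
    """Assign each row to the node chosen by its join key."""
    nodes: Dict[int, List[Tuple]] = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index], num_nodes)].append(row)
    return nodes


def local_join(left: List[Tuple], right: List[Tuple],
               left_key: int, right_key: int) -> List[Tuple]:
    """Hash join over rows that already live on the same node."""
    index: Dict[object, List[Tuple]] = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def colocated_join(left: List[Tuple], right: List[Tuple],
                   left_key: int, right_key: int, num_nodes: int) -> List[Tuple]:
    """Join node by node; no row has to move between nodes."""
    left_parts = partition(left, left_key, num_nodes)
    right_parts = partition(right, right_key, num_nodes)
    joined: List[Tuple] = []
    for node in range(num_nodes):
        joined.extend(local_join(left_parts.get(node, []),
                                 right_parts.get(node, []),
                                 left_key, right_key))
    return joined


# Hypothetical tables: (customer_id, order_id) and (customer_id, name).
orders = [(1, "o-100"), (2, "o-101"), (1, "o-102"), (3, "o-103")]
customers = [(1, "Ann"), (2, "Bob"), (4, "Cy")]
joined = colocated_join(orders, customers, 0, 0, num_nodes=3)
print(sorted(joined))
```

A single partitioning only co-locates one join key. Keeping differently partitioned copies is one plausible reading of "various join keys", but that reading is an inference from the bullet, not something the slide states.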

Page 7: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


History of the TPC-DS Comparison

Page 8: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


TPC-DS Benchmark Components

[Diagram: Operational Systems (Store, Web, Catalog, Inventory, Promotions) produce a Set of Files; an ETL Refresh Process loads them into the TPC-DS DSS Database, which serves Ad-hoc and Reporting Queries, User Queries, and Reports.]

Page 9: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian Hadoop SQL Performance

“Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB): Speedup vs Impala

[Bar chart: speedup factor (axis 0 to 35) for Impala and Actian on queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98. 16x avg. speedup.]

Background to the “Impala Subset” of the TPC-DS benchmark can be found here:

http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/

Both were executed on the same hardware and software environment:

5 Node Cluster with 64GB of RAM per node and 24x1TB Hard Disks.
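For readers who want to reproduce a "speedup factor" chart of this kind, the arithmetic can be sketched as below. The query names and timings are invented placeholders, not measurements from this benchmark, and the slide does not say which kind of average the "16x avg. speedup" refers to, so both the arithmetic and the geometric mean are shown.

```python
from statistics import geometric_mean, mean
from typing import Dict


def speedups(baseline_seconds: Dict[str, float],
             candidate_seconds: Dict[str, float]) -> Dict[str, float]:
    """Per-query speedup factor: baseline runtime divided by candidate runtime."""
    return {q: baseline_seconds[q] / candidate_seconds[q] for q in baseline_seconds}


# Hypothetical runtimes in seconds; NOT taken from the slide.
baseline = {"query_a": 40.0, "query_b": 90.0, "query_c": 16.0}
candidate = {"query_a": 4.0, "query_b": 3.0, "query_c": 8.0}

factors = speedups(baseline, candidate)
arithmetic = mean(list(factors.values()))
geometric = geometric_mean(list(factors.values()))

print(factors)
print(f"arithmetic mean: {arithmetic:.1f}x, geometric mean: {geometric:.1f}x")
```

The two averages can differ a lot when a few queries dominate, which is why it helps to state which one a headline figure uses.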

Page 10: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Live Demo

Page 11: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


AAP – Express Editions

Page 12: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Actian Analytics Platform - Express Editions

Extreme Performance Edition

• Single node installations

• The leading single server analytics database with a powerful yet simple analytics workflow builder

• Free!

• Community supported

• Get it today: http://bigdata.actian.com/express

Hadoop SQL Edition

• Hadoop cluster installations

• The fastest analytics database running natively in Hadoop with a powerful yet simple analytics workflow builder

• Free!

• Community supported

• Coming very soon. Pre-register at: http://bigdata.actian.com/sql-in-hadoop

Page 13: SQL in Hadoop – the real deal. Emma McGrattan, SVP Engineering @ Actian


Thank You!