SQL + Hadoop: The High Performance Advantage

SQL + Hadoop: The High Performance Advantage. Turn Hadoop into a High Performance Analytics Platform. Emma McGrattan, Actian; Jim Hare, Actian. 8 July 2014. Confidential © 2014 Actian Corporation.

Upload: actian-corporation

Post on 26-Jan-2015


DESCRIPTION

Turn Hadoop into a High Performance Analytics Platform

TRANSCRIPT

Page 1: SQL + Hadoop: The High Performance Advantage

SQL + Hadoop: The High Performance Advantage

Turn Hadoop into a High Performance Analytics Platform

Emma McGrattan, Actian

Jim Hare, Actian

8 July 2014

Page 2: SQL + Hadoop: The High Performance Advantage

Agenda

1. Introduction

2. Hadoop Challenges

3. Actian Analytics Platform – Hadoop SQL Edition

4. Industrialized, High Performance SQL in Hadoop

5. Questions

All lines are muted

To ask a question, use Chat or Q&A panel

Recording will be made available

We'll be running a few polling questions

Page 3: SQL + Hadoop: The High Performance Advantage

$140M Revenues + Profitable

10,000+ Customers

Global Presence: 8 worldwide offices, 24x7 multinational support model

“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor

Actian is Delivering Transformational Value

“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE

Page 4: SQL + Hadoop: The High Performance Advantage

Big Data Offers Significant Opportunities

Personalized Experience

New Products/Services

Reduce Risk

Predictive Analytics

Many Data Sources

Low Cost Storage

…But only for those who embrace it

Improve Decision-Making

Page 5: SQL + Hadoop: The High Performance Advantage

Enter Hadoop as the Big Data Enabler for Low Cost Storage

DW Offload

Landing Zone

Data Reservoir

?

Page 6: SQL + Hadoop: The High Performance Advantage

But It Isn’t Easy with Hadoop

Batch performance

Time to Value

Expensive Skills

Siloed Data Access

Data preparation

Page 7: SQL + Hadoop: The High Performance Advantage

Hadoop Complexity Forcing Organizations to Move Data in order to Analyze it

DW Offload

Landing Zone

Hadoop Data Reservoir

Data Management

Analytics Processing

Visualization & Data Science Workbench

Result: duplicate storage & infrastructure costs, more IT resources, network bandwidth usage, and complexity

Data Transfer

Page 8: SQL + Hadoop: The High Performance Advantage

CIOs Challenged by Big Data Costs

One in three CIOs pays between 21 and 30 cents per gigabyte a month. Translation: it costs a company $3.12 million per year to store 500,000 gigabytes at an average cost of 26 cents per gigabyte per month.

Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html

-- CIO Insight
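The cost figure quoted above can be checked with a few lines of arithmetic. The sketch below uses only the rate and volume stated on the slide; the function name is illustrative. At a flat 26 cents per gigabyte per month, 500,000 gigabytes comes to $1.56 million per year, and the $3.12 million figure corresponds to 1,000,000 gigabytes at the same rate.

```python
# Annual storage cost at a flat per-gigabyte monthly rate.
# Rate and volume come from the slide; the helper name is illustrative.

def annual_storage_cost(gigabytes: int, cents_per_gb_month: int) -> float:
    """Return the yearly cost in dollars of storing `gigabytes` at the given monthly rate."""
    monthly_cents = gigabytes * cents_per_gb_month
    return monthly_cents * 12 / 100


cost_500k = annual_storage_cost(500_000, 26)
cost_1m = annual_storage_cost(1_000_000, 26)

print(f"500,000 GB:   ${cost_500k:,.0f} per year")
print(f"1,000,000 GB: ${cost_1m:,.0f} per year")
```

Working in whole cents keeps the intermediate values exact, so the comparison between the two volumes is not affected by floating-point rounding.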

Page 9: SQL + Hadoop: The High Performance Advantage

CIOs Challenged by Types of Big Data

73% of CIOs say up to 50% of their data will be unstructured within two years.

Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html

-- CIO Insight

Page 10: SQL + Hadoop: The High Performance Advantage

Instead, what if you could move the analytic processing to the Hadoop data?

Data Science Workbench

Analytic Processing

Data Management

… And transform Hadoop from a data lake into a high performance, fully functional analytics platform

SQL User Access

Page 11: SQL + Hadoop: The High Performance Advantage

Introducing the Actian Analytics Platform – Hadoop SQL Edition

What is it?

Patented X100 vector processing engine plus visual data and analytics work flow, all running natively in Hadoop via YARN

Turns Hadoop into a High-Performance, Fully-Functional Analytics Database

How is this unique?

Highest performing, most industrialized SQL access to Hadoop data

Only end-to-end analytic processing natively in Hadoop

Most consumable, accessible, manageable Hadoop analytics

What does this mean to you?

Removes all barriers for business access to big data analytics

Enables SQL users with no constraints on Hadoop data

Accelerates time to value

Page 12: SQL + Hadoop: The High Performance Advantage

The Industry’s Abuzz – about Actian!

“Deploying on Hadoop enables the Actian Analytics Platform to scale to massively parallel scale without having to modify the underlying engine. For Actian, Hadoop is a means to an end; it provides an opening for Actian to introduce a fast SQL engine that operates at scale.”

Tony Baer, Principal Analyst, Software, Ovum

“Actian’s platform now makes Hadoop data repositories accessible to the entire enterprise by empowering millions of business-savvy SQL users and business analysts to conduct advanced analytics directly on data in the Hadoop Distributed File System (HDFS). Companies investing in Hadoop now can broaden the scope of data discovery, increase the accuracy of decisions, and speed time to value.”

Daniel Gutierrez, Inside Big Data

“The latest version of the Actian Analytics Platform provides end-to-end analytic processing natively in Hadoop. This will make the Hadoop Big Data framework more accessible by offering high-performance ELT (extract, load and transform) and SQL analytics on Hadoop with no need for MapReduce skills. This is a big deal because data scientists with Hadoop skills are in short supply, while SQL skills are relatively abundant.”

Page 13: SQL + Hadoop: The High Performance Advantage

Libraries of Analytics

Hadoop

Connections to Access Any Data

Actian Analytics Platform – Hadoop SQL Edition

Visual Data and Analytic Workbench

High Performance Data Flow Engine

Industrialized SQL Analytics Database Natively in Hadoop

Removes all barriers for business access to big data analytics

Business Processes

Users

Machines

Applications

Expansive Connectivity Data Blending & Enrichment Discovery Data Science Analytics Operational BI

Enterprise Data

Machine Data

Social Data

Data Warehouse

SaaS Data

Amazon Redshift

Page 14: SQL + Hadoop: The High Performance Advantage

Actian Analytics Platform – Hadoop SQL Edition

Lightning fast and industrial strength SQL in Hadoop – Up to 30X faster than Impala

Full end-to-end analytic processing platform - all native in Hadoop

Packaged with “real world” solution blueprints

Page 15: SQL + Hadoop: The High Performance Advantage

Visual Data Science & Analytics Workbench

• Drag/drop interface with 100s of data prep and analytic functions
• Connect, blend, & enrich data and perform discovery & data science
• Build and test predictive models
• Running on top of a high performance data flow engine
• All natively within Hadoop via YARN

MapReduce

Coding

Page 16: SQL + Hadoop: The High Performance Advantage

Ubiquitous Skills
■ 1 Million+ SQL Users
■ $ Lower cost
■ Easy to find, in most companies
■ Embedded in the business

Specialty Skills
■ 150K MapReduce Programmers
■ $$$ Expensive
■ 170K Shortage, hard to find
■ Separate from the business

Unleash millions of business-savvy SQL users with no constraints on Hadoop data

Actian Analytics Platform™

Connect + Analyze + Act

Page 17: SQL + Hadoop: The High Performance Advantage

Actian Analytics Platform = 25 Minutes

Visual workflow steps: Log Reader, Filter Rows, Group, Load Vector, k-Means

Coding MapReduce = 4 Weeks

The same steps, each hand-written as MapReduce code: Log Reader, Filter Rows, Group, Load Vector, k-Means, Avro Writer

Accelerate time to value and turn Hadoop data into transformational value
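To make the workflow steps named on this slide more concrete (Log Reader, Filter Rows, Group, k-Means), here is a minimal pure-Python sketch of the same chain of operations. It is not Actian DataFlow or MapReduce code: the log format, the sample records, and every function name are hypothetical, the load step is omitted, and the clustering is a simple one-dimensional k-means with a deterministic start.

```python
from collections import defaultdict

# Hypothetical log records in the form "user,status,bytes".
LOG_LINES = [
    "alice,200,120",
    "alice,200,130",
    "bob,200,110",
    "bob,500,0",
    "carol,200,9000",
    "carol,200,9500",
    "dave,200,8800",
    "dave,200,9100",
]


def read_log(lines):
    """Log Reader: parse each line into (user, status, bytes)."""
    for line in lines:
        user, status, size = line.split(",")
        yield user, int(status), int(size)


def filter_rows(rows):
    """Filter Rows: keep successful requests only."""
    return (row for row in rows if row[1] == 200)


def group_by_user(rows):
    """Group: request count and total bytes per user."""
    totals = defaultdict(lambda: [0, 0])
    for user, _status, size in rows:
        totals[user][0] += 1
        totals[user][1] += size
    return {user: (count, total) for user, (count, total) in totals.items()}


def k_means_1d(values, k, iterations=10):
    """k-Means on scalar values, starting from evenly spaced sorted values."""
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centers = [float(v) for v in ordered[::step][:k]]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


grouped = group_by_user(filter_rows(read_log(LOG_LINES)))
users = sorted(grouped)
totals = [grouped[user][1] for user in users]
centers, labels = k_means_1d(totals, k=2)
clusters = dict(zip(users, labels))

print(centers)
print(clusters)
```

With the sample records, the light users (alice, bob) and the heavy users (carol, dave) end up in separate clusters. Each stage is a small function that consumes the previous stage's output, which mirrors how the boxes in a visual workflow are chained together.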

Page 18: SQL + Hadoop: The High Performance Advantage

Vendor Approaches to “SQL on Hadoop”

“marketing jobs”: SQL Outside Hadoop
• Connector approach
• MPP DB needs 2 clusters
• Expensive, hard to manage

“wrapped legacy”: Mature but non-Integrated
• Legacy engine (e.g. Postgres) + top layer
• Store data outside HDFS (local files)
• Separate Failover Management (tools)

“from scratch”: Integrated but Immature
• No trickle updates
• Immature/poor optimizers + engines
• I18N, security, workload mgmt, access control?

Page 19: SQL + Hadoop: The High Performance Advantage

“SQL on Hadoop” Vendor Landscape

[Quadrant chart: Maturity (SQL support, ACID, reliability, security, connectivity, performance), up to High, plotted against Hadoop Integration, from Low to Native. Labeled regions: “marketing jobs”, “wrapped legacy”, “from scratch”, and Mature & Integrated.]

Page 20: SQL + Hadoop: The High Performance Advantage

Actian Vector Hadoop Edition

Actian Analytics Platform Hadoop SQL Edition

Actian Analytics Platform

NameNode

DataNode DataNode

DataNode DataNode

DataNode DataNode

DataNode DataNode

Prepare

Standard SQL Interfaces

Orchestrate

Connect: Connect to any data via Actian DataConnect

Manage dataflow across the entire analytic process

6 POINTS OF INNOVATION:

Vector Processing

On Chip Cache

Fast Real-time Updates

Smart Compression

Storage Indexes

Multi-Core Parallelism

Running natively in Hadoop via YARN

Prepare, enrich, and analyze any data with Actian DataFlow

NEXT GENERATION DATABASE TECHNOLOGY:

Columnar

Compressed

Storage Indexes

Page 21: SQL + Hadoop: The High Performance Advantage

Actian Vector – Unmatched Innovation

[Chart: Time / Cycles to Process versus Data Processed, for DISK, RAM, and CHIP. Data labels: 10GB, 2-3GB, 40-400MB. Cycle labels: 2-20, 150-250, Millions.]

Vector Processing: Single Instruction Multiple Data

2nd Gen Column Store: Limit I/O; efficient real time updates

Smarter Compression: Maximize throughput; vectorized decompression

Exploiting Chip Cache: Process data on chip, not in RAM

Multi-core Parallelism: Maximize system resource utilization

Storage Indexes: Quickly identify candidate data blocks; minimize IO
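The storage index idea on this slide (quickly identify candidate data blocks, minimize IO) can be illustrated with a generic min/max block-skipping sketch. This is an assumption about the general technique, not Actian Vector's actual implementation; the `Block` class, the block size, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A block of column values plus the min/max summary kept for it."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks that had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The summary alone proves this block cannot contain a match: skip it.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


column = list(range(1, 101))  # sorted sample data, 1..100
blocks = build_blocks(column, block_size=10)
matches, blocks_read = scan_range(blocks, 42, 47)

print(f"read {blocks_read} of {len(blocks)} blocks, found {matches}")
```

On sorted or clustered data the summaries exclude most blocks, so only a small fraction of the column is read. On randomly ordered data the min/max ranges overlap and far fewer blocks can be skipped.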

Page 22: SQL + Hadoop: The High Performance Advantage

TPC-H 1TB – Faster, Less Hardware

Fastest TPC-H QphH@1TB Benchmark (non-clustered). Source: www.tpc.org

QphH (chart axis from 0 to 400,000):

Actian Vector 445,529
Actian Vector 436,788
SQL Server 219,888
Oracle 209,534
Oracle 201,487
SQL Server 173,962
Sybase IQ 164,747
Oracle 140,181
SQL Server 134,117

Result dates shown on the chart: June ‘12, May ‘11, Aug ‘11, June ‘11, Sept ‘11, Apr ‘11, Dec ‘10, Apr ‘10, Dec ‘11

Hardware cost (excluding discounts) shown on the chart: $57,146, $1,229,968, $460,869, $2,402,706, $753,392, $278,527, $85,621, $1,249,967, $258,880

Page 23: SQL + Hadoop: The High Performance Advantage

Actian Analytics Platform – Hadoop SQL Edition

Transform Hadoop into a High Performance Analytics Platform

[Architecture diagram: a Hadoop cluster with YARN, a NameNode, and DataNodes on HDFS, each DataNode running an X100 engine. Other labels: Standard SQL Interfaces; Visual Data & Analytics Workflow (Read, Load, Blend & Enrich, Data Science & Analytics); Actian Vector.]

High Performance, Industrialized SQL Database

High Performance, Parallelized Data Flow Engine

HDFS
• Original file format
• Standard block replication

Vector
• Column-based blocks
• Compressed
• Partitioned

Replicated Vector
• >=3 replicated copies of Vector blocks
• Leveraged to co-locate data with various join keys

Page 24: SQL + Hadoop: The High Performance Advantage

History of the TPC-DS Comparison

Page 25: SQL + Hadoop: The High Performance Advantage

TPC-DS Benchmark Components

Operational Systems

Refresh Process Ad-hoc Reporting Queries

User Queries

DSS Database

TPC-DS

Reports

Store

Web

Catalog

Inventory

Promotions

Set of Files

ETL

Page 26: SQL + Hadoop: The High Performance Advantage

Actian Hadoop SQL Performance

“Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB): Speedup vs Impala

[Bar chart: speedup factor (y-axis, 0 to 18) for Impala and Actian on queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98.]

16x avg. speedup

Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/

Both executed on the same hardware and software environment: 5-node cluster with 64GB of RAM per node and 12x2TB hard disks.

Page 27: SQL + Hadoop: The High Performance Advantage

Comprehensive – covers full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI

Accessible – standard ANSI SQL to support standard BI tools; plus key advanced analytics including cube, grouping sets and windowing functions

Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache

Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption

Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection

Manageable – resources managed automatically in Hadoop via YARN

Consumable – now usable by millions of users with every SQL tool and application on the planet

Scalable – unlimited expansion to handle extreme #s of users, nodes, data

Most Industrialized SQL in Hadoop
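As an illustration of the windowing functions mentioned under "Accessible", the sketch below runs a standard SQL window query. It uses SQLite through Python's built-in `sqlite3` module purely as a stand-in engine (window functions need SQLite 3.25 or later); it does not connect to Actian Vector or Hadoop, and the `sales` table and its rows are made up for the example.

```python
import sqlite3

# In-memory stand-in database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100),
        ("east", 2, 150),
        ("east", 3, 50),
        ("west", 1, 200),
        ("west", 2, 100),
    ],
)

# Running total per region: SUM() evaluated over a window instead of a GROUP BY,
# so every input row is kept and annotated with the cumulative amount.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Because the query is plain SQL, the same statement could in principle be issued from any BI tool or driver that speaks standard SQL, which is the point the slide makes about accessibility.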

Page 28: SQL + Hadoop: The High Performance Advantage

Actian Director for Management

Page 29: SQL + Hadoop: The High Performance Advantage

Actian Analytics Platform – Hadoop SQL Edition

Industrialized, High-Performance SQL in Hadoop

Only end-to-end analytic processing natively in Hadoop

Highest performing, most industrialized SQL in Hadoop

Removes all barriers for business access to big data analytics

Unleashes millions of business-savvy SQL users on Hadoop data

Outperforms Cloudera’s Impala by up to 30x

Actian transforms Hadoop from a data lake into a high-performance analytics platform.

Page 30: SQL + Hadoop: The High Performance Advantage

Transform Hadoop – Transform your Business

Page 31: SQL + Hadoop: The High Performance Advantage

Get started today! www.actian.com/hadoop

Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop

Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results
