getting big data done on a gpu-based...
TRANSCRIPT
SQream Technologies Ltd -‐ Confiden7al 1
Ge#ng Big Data Done On a GPU-‐Based Database
Ori Netzer VP Product 26-‐Mar-‐14
Analy7cs Performance -‐ 3 TB, 18 Billion records
SQream Technologies Ltd -‐ Confiden7al
SQream Database 400x More Cost Efficient!
Big Data 3V’s -‐ Volume
SQream Technologies Ltd -‐ Confiden7al 4
Big Data 3V’s – Vareity
SQream Technologies Ltd -‐ Confiden7al 5
Big Data 3V’s -‐ velocity
SQream Technologies Ltd -‐ Confiden7al 6
Structured Data
SQream Technologies Ltd -‐ Confiden7al 7
Un-‐Structured Data
SQream Technologies Ltd -‐ Confiden7al 8
Big Data Challenges
9 SQream Technologies Ltd -‐ Confiden7al
Big Data
Volume
Velocity
Variety
Growth rate, history …
Throughput – Transac7on/Seconds
Data Types, logs, video
Performance Batch, In-‐rest, In-‐Mo7on
Work Data Modeling, ETL
Time Work, storage, ETL, computa7on
• Hardware – Disk Space – Processing Power – Network
• Skill Set – Large clusters – New querying language – New methods
Big Data – Challenges
SQream Technologies Ltd -‐ Confiden7al 10
GPU advantages in a nutshell
Massive parallel compu7ng power
Aggressive data crunching
Extreme scalability
Low power consump7on
SQream Technologies Ltd -‐ Confiden7al
GPU Challenges in a nutshell
PCI-‐E
Limited GRAM
New Algorithms SIMD
SQream Technologies Ltd -‐ Confiden7al
SQream Top Level Architecture
SQream Technologies Ltd -‐ Confiden7al
The Product - under the hood
14
SQream Data Storage
1 2 3 4
2 a 1 c
3 d 1 C
4 d 1 c
2 4
a c
d d
d j
a 1 c
d 1 C
d 1 c
a
d
1 c
100T Data in rows + Indices + Views
50T Columnar Data
9T Crunched Data + Smart Meta Data
SQream Technologies Ltd -‐ Confiden7al
Storage Foot Print
Before 100T AIer 9T
SQream Technologies Ltd -‐ Confiden7al
DBMS -‐ Query Run Time – No Op7miza7ons
SQream Technologies Ltd -‐ Confiden7al
Select T, sum(A) from Example group by T having (A) > 1000
A B C D E F
1010 N 1 a 10 25 66
900 N 2 b 54 55 55
5000 N 3 c 754 54 58
500 N 4 f 748 554 85
800 N 5 a 47 55 88
600 N 6 r 58 555 478
Read ALL data Process ALL data
Query output
Long execu7on 7me 2 x
CPU
= 16 cores
SQream -‐ Query Run Time Select T, sum(A) from Example group by T having (A) > 1000
Read Pin Point data Process Relevant data Query output
Shortest execu7on 7me 2 x
GPU
= ~5600 cores
A B C D E F
1010 N 1 a 10 25 66
900 N 2 b 54 55 55
5000 N 3 c 754 54 58
500 N 4 f 748 554 85
800 N 5 a 47 55 88
600 N 6 r 58 555 478
GPUs CPUs
SQream Technologies Ltd -‐ Confiden7al
SQream as Data Warehouse
SQream Technologies Ltd -‐ Confiden7al
CEP Cluster
SQream Technologies Ltd -‐ Confiden7al
SQREAM
InfiniBand
Direct GPU
SQREAM
Direct GPU
SQREAM
Direct GPU
Web App
PBX
Data stream: web logs
Data stream: CDR
·∙ Apply new query on Data in Motion·∙ Apply Real time Ad Hoc query on all data in rest
Real time stream processing Ad hoc query processing
High Availability Cluster
SQream Technologies Ltd -‐ Confiden7al
Analytics as a Services (AaaS)
SQream Technologies Ltd -‐ Confiden7al
SQream Technologies Ltd -‐ Confiden7al 23
USE CASES & BENCHMARKS
SQream Technologies Ltd -‐ Confiden7al 23
17
106
390
0
50
100
150
200
250
300
350
400
450
Time in Secon
ds
Sqream Infobright MSSQL
Lab Results – 100GB TPC H
24
SQream vs. Redshii vs. Hadoop*
25
29
155
1491
0
200
400
600
800
1000
1200
1400
1600 Time in Sec
Amount of Data
SQream vs. Redshii vs. Hadoop
Sqream
Redshii
Hadoop
hkp://www.hapyrus.com/blog/posts/behind-‐amazon-‐redshii-‐is-‐10x-‐faster-‐and-‐cheaper-‐than-‐hadoop-‐hive-‐slides
Query Run 7me – based on Hapyrus benchmark
SQream vs. Redshii vs. Hadoop*
26
hkp://www.hapyrus.com/blog/posts/behind-‐amazon-‐redshii-‐is-‐10x-‐faster-‐and-‐cheaper-‐than-‐hadoop-‐hive-‐slides
Data Load Time – based on Hapyrus benchmark
0
2
4
6
8
10
12
14
16
18 Time in Hou
rs
Amount of Data
SQream vs. Redshii vs. Hadoop
Sqream
Redshii
Cyber Analysis – Pakern Matching
Use Case • load and analyze massive network traffic
The Test • Load 6000 IP CDR per second
• Run queries under 10 seconds
SQream Technologies Ltd -‐ Confiden7al
Cyber Use Case – Exis7ng Status
Hardware • HP Superdome 32 Cores • EMC Storage SoIware • Oracle 11g
Total Hardwa
re cost: ~$15
0,000
SQream Technologies Ltd -‐ Confiden7al
Cyber Use Case – SQream Solu7on
Hardware • Dell T7600 SoIware • SQream DB
Dell T7600
Total Hardwa
re cost: ~$7,
000
SQream Technologies Ltd -‐ Confiden7al
0.8 1 0.6 0.4
0.1
10
11
9
8
2
0
2
4
6
8
10
12
1 2 3 4 5
Time in Secon
ds
Run7me query results for IP Traffic Analysis by query number
Sqream Oracle
Results for 8.5B Records ~17TB
SQream Technologies Ltd -‐ Confiden7al
SQream POC on Databases with Orange Silicon Valley
Soumik Sinharoy Sr. Product Manager at Orange
SQream’s DB Saves ~$6,000,000 on Big Data insights
$6,300,000 Price list
$250,000 Price list
X40 cost perfor
mance
Data Is Growing World Wide IP Traffic Growth
Orange Group
Our Data Is Growing Orange France IP Traffic Growth
• Monthly CDR Volume – Today : 60 Billion – Q4 2014 : 150 Billion
What We Plan To Do With Big Data
37
Massive database consolida_on : 400 X savings in CapEX
Airborne Datacenters -‐ SuperCompu_ng
Real_me processing of sensor data
Problem Statement -‐ Massive Data Store -‐ Mul7 Dimensional Queries -‐ 4-‐9 Billion CDR -‐ Reduce execu7on 7me -‐ Reduce TCO
Current Challenges
IP CDR
IPTV VoD
Fixed
Mul_ Media
Revenue Assurance – POC vs. Leading RDBMS
• Scan & Join 35 Billion CDR and 30M Subscribers
• Challenges: – Require massive investment
in Hardware DW system – Require have manual process
for op7mizing the soiware – Doesn’t scale
9
60
0
10
20
30
40
50
60
70
Time in m
inutes
Loading Time 300M CDRs
39
• Q2 – Map load on the network by hour, total call, minutes • Q3 – Map by hour and call type, break down for the load on the network by call
type. Sms, data, voice • Q4 – Map load on the network in specific 7me • Q5 -‐ Map load on the network in specific 7me and specific service: voice, sms, data
Query Run Time 300 Million CDRs
SQream vs. leading RDBMS run7me 300 million
leading RDBMS
Mul_ Media Repor_ng – IP Traffic Analysis on anonymized IP CDR from Orange France
• Data coming from MMT are monthly received on MMDM server
• Data are processed for a dynamic repor7ng applica7on
• Challenges: – Require massive investment
in Hardware DW system – Require have manual process
for op7mizing the soiware 9
60
0
10
20
30
40
50
60
70
Time in m
inutes
Loading Time 300M CDRs
41
~4,300,000,000 Call records ~30,000,000 Subscribers
4 Months 1 Operator -‐ Orange
14 Servers 168 CPU cores 1,344GB RAM 67,200GB SSD 403,200GB HDD
875 Seconds
~4,300,000,000 Call records ~30,000,000 Subscribers
4 Months 1 Operator -‐ Orange
1 Server 8 CPU cores 128GB RAM 3,200GB SSD 10,000GB HDD $250,000 Price list
1x K40 GPU
636 Seconds • 18 month of data was also tested Execu7on Time = 2484 Seconds
Less is More… SQream vs. Leading EDW
0
100
200
300
400
500
600
700
800
Q1 Q2 Q1 Q2
1 Month 4 Months
Time(Sec)
• Q1 – Aggrega7on query for IP u7liza7on by users • Q2 – Group By query for 1 dimension
SQream – Yes, We Scale Linearly!
0
500
1000
1500
2000
2500
3000
1 Month 4 Months 9 Months 13 Months 18 Months
Second
s
Months
Grouping for all costumer segments -‐ linear scaling 1 month to 18 months
SQream Technologies Ltd -‐ Confiden7al 46
QUESTIONS ???
SQream Technologies Ltd -‐ Confiden7al
• Data Crunching: • Faster compression 7me x20 • Faster decompression 7me x50-‐70 • Higher compression ra7o x5-‐15 • Compute: • Faster MPP in a node x20 • Higher scalability x1 node x3000 cores • Lower hardware cost $7,000,000 > $15K
Why SQream? Technology
SQream Technologies Ltd -‐ Confiden7al
TCO • Lower power consump7on 1 node vs. 10 nodes • Lower storage and license cost x10 – x50 • Lower footprint 2U vs. 20U Deployment advantages • No indexes, no cubicles, no projec7ons. • Built-‐in full scan ad-‐hoc queries support. • No code changes. Simple SQL.
Why SQream? Hassle free Solu7on
SQream Technologies Ltd -‐ Confiden7al
SQream the fastest analy_cs ever
Thank you