getting big data done on a gpu-based...


Post on 15-Aug-2020


Page 1: Getting Big Data Done On a GPU-Based Databaseon-demand.gputechconf.com/gtc/2014/presentations/S4644... · 2014-04-10 · Title: Getting Big Data Done On a GPU-Based Database Author:

SQream Technologies Ltd - Confidential

Getting Big Data Done On a GPU-Based Database

Ori Netzer, VP Product, 26-Mar-14

Analytics Performance - 3 TB, 18 Billion records

SQream Database: 400x More Cost Efficient!

Big Data 3V's - Volume

Big Data 3V's - Variety

Big Data 3V's - Velocity

Structured Data

Unstructured Data

Big Data Challenges

• Volume: growth rate, history …
• Velocity: throughput (transactions/second)
• Variety: data types, logs, video

• Performance: batch, in-rest, in-motion
• Work: data modeling, ETL
• Time: work, storage, ETL, computation

Big Data – Challenges

• Hardware
  – Disk Space
  – Processing Power
  – Network
• Skill Set
  – Large clusters
  – New querying language
  – New methods

GPU advantages in a nutshell

• Massive parallel computing power
• Aggressive data crunching
• Extreme scalability
• Low power consumption

GPU Challenges in a nutshell

• PCI-E
• Limited GRAM
• New algorithms (SIMD)

SQream Top Level Architecture

The Product - under the hood

SQream Data Storage

[Figure: sample table cells shown in row, columnar, and crunched layouts]

• 100T: Data in rows + Indices + Views
• 50T: Columnar Data
• 9T: Crunched Data + Smart Meta Data
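The three footprints on this slide (rows, columns, crunched) can be illustrated with a small sketch. It pivots a few rows into columns and applies run-length encoding, a generic compression technique chosen here only for illustration; the slides do not say which encodings SQream uses, and the sample rows and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical sample rows (id, code, flag); not data taken from the slides.
rows = [
    (1, "a", 1),
    (2, "d", 1),
    (3, "d", 1),
    (4, "d", 1),
]

def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

columns = to_columns(rows)
encoded = [run_length_encode(col) for col in columns]
print(encoded[1])  # [('a', 1), ('d', 3)]

# Footprints quoted on the slide, in terabytes.
row_tb, columnar_tb, crunched_tb = 100, 50, 9
print(round(row_tb / columnar_tb, 1))  # 2.0
print(round(row_tb / crunched_tb, 1))  # 11.1
```

Storing values of one column together tends to create long runs of similar values, which is why column layouts usually compress better than row layouts.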

Storage Footprint

Before: 100T
After: 9T

DBMS - Query Run Time – No Optimizations

Select T, sum(A) from Example group by T having (A) > 1000

A     B    C  D    E    F
1010  N 1  a  10   25   66
900   N 2  b  54   55   55
5000  N 3  c  754  54   58
500   N 4  f  748  554  85
800   N 5  a  47   55   88
600   N 6  r  58   555  478

Read ALL data -> Process ALL data -> Query output

Long execution time

2 x CPU = 16 cores

SQream - Query Run Time

Select T, sum(A) from Example group by T having (A) > 1000

A     B    C  D    E    F
1010  N 1  a  10   25   66
900   N 2  b  54   55   55
5000  N 3  c  754  54   58
500   N 4  f  748  554  85
800   N 5  a  47   55   88
600   N 6  r  58   555  478

Read Pin Point data -> Process Relevant data -> Query output

Shortest execution time

2 x GPU = ~5600 cores
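The slides do not explain how "Read Pin Point data" works. One common way to avoid reading irrelevant data is to keep min/max metadata per chunk of a column and skip chunks that cannot match the filter. The sketch below shows that generic technique on column A of the sample table; the chunk size and all function names are assumptions, and this is not presented as SQream's implementation.

```python
# Column A from the slide's sample table.
column_a = [1010, 900, 5000, 500, 800, 600]
CHUNK_ROWS = 2  # hypothetical chunk size, for illustration only

def build_chunks(values, chunk_rows):
    """Split a column into chunks and record min/max metadata for each."""
    chunks = []
    for start in range(0, len(values), chunk_rows):
        data = values[start:start + chunk_rows]
        chunks.append({"min": min(data), "max": max(data), "data": data})
    return chunks

def scan_greater_than(chunks, threshold):
    """Return matching values and the number of chunks actually read."""
    matches, chunks_read = [], 0
    for chunk in chunks:
        if chunk["max"] <= threshold:
            continue  # metadata proves nothing here can match, so skip the read
        chunks_read += 1
        matches.extend(v for v in chunk["data"] if v > threshold)
    return matches, chunks_read

chunks = build_chunks(column_a, CHUNK_ROWS)
matches, chunks_read = scan_greater_than(chunks, 1000)
print(matches, chunks_read, len(chunks))  # [1010, 5000] 2 3
```

With this layout the filter `A > 1000` reads two of the three chunks; the last chunk is skipped using its metadata alone.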

SQream as Data Warehouse

CEP Cluster

[Diagram labels: Web App (data stream: web logs), PBX (data stream: CDR), three SQREAM nodes each with Direct GPU, InfiniBand]

• Apply new query on Data in Motion
• Apply Real time Ad Hoc query on all data in rest

Real time stream processing, Ad hoc query processing

High Availability Cluster

Analytics as a Service (AaaS)

USE CASES & BENCHMARKS

Lab Results – 100GB TPC-H

[Bar chart: time in seconds]
• SQream: 17
• Infobright: 106
• MSSQL: 390

SQream vs. Redshift vs. Hadoop*

Query Run Time – based on Hapyrus benchmark

[Bar chart: time in seconds by amount of data]
• SQream: 29
• Redshift: 155
• Hadoop: 1491

* http://www.hapyrus.com/blog/posts/behind-amazon-redshift-is-10x-faster-and-cheaper-than-hadoop-hive-slides

SQream vs. Redshift vs. Hadoop*

Data Load Time – based on Hapyrus benchmark

[Bar chart: time in hours by amount of data; series: SQream, Redshift]

* http://www.hapyrus.com/blog/posts/behind-amazon-redshift-is-10x-faster-and-cheaper-than-hadoop-hive-slides

Cyber Analysis – Pattern Matching

Use Case
• Load and analyze massive network traffic

The Test
• Load 6000 IP CDR per second
• Run queries under 10 seconds
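To put the load target in perspective, the arithmetic below converts 6000 CDRs per second into a daily volume. It also estimates how long such a feed would take to accumulate the 8.5 billion records quoted with the results, under the assumption that both figures describe the same feed, which the slides do not state.

```python
CDR_PER_SECOND = 6000            # load target from the slide
SECONDS_PER_DAY = 24 * 60 * 60

cdr_per_day = CDR_PER_SECOND * SECONDS_PER_DAY
print(f"{cdr_per_day:,} CDRs per day")  # 518,400,000 CDRs per day

# Record count quoted with the benchmark results (assumed to be the same feed).
TOTAL_RECORDS = 8_500_000_000
days_to_accumulate = TOTAL_RECORDS / cdr_per_day
print(round(days_to_accumulate, 1))     # 16.4
```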

Cyber Use Case – Existing Status

Hardware
• HP Superdome 32 Cores
• EMC Storage

Software
• Oracle 11g

Total Hardware cost: ~$150,000

Cyber Use Case – SQream Solution

Hardware
• Dell T7600

Software
• SQream DB

Total Hardware cost: ~$7,000

Runtime query results for IP Traffic Analysis by query number
Results for 8.5B Records ~17TB

[Bar chart: time in seconds]
Query  SQream  Oracle
1      0.8     10
2      1       11
3      0.6     9
4      0.4     8
5      0.1     2
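The chart values for this slide can be turned into per-query speedups. The sketch below assumes the first series of five values belongs to SQream and the second to Oracle, in query order 1 to 5; that pairing is inferred from the legend order and is not stated explicitly.

```python
# Runtimes in seconds as read from the chart; the series mapping is assumed.
sqream_seconds = [0.8, 1.0, 0.6, 0.4, 0.1]
oracle_seconds = [10, 11, 9, 8, 2]

speedups = [oracle / sqream for sqream, oracle in zip(sqream_seconds, oracle_seconds)]
for query_number, speedup in enumerate(speedups, start=1):
    print(f"Query {query_number}: {speedup:.1f}x")

# Ratio of total runtime across all five queries.
total_speedup = sum(oracle_seconds) / sum(sqream_seconds)
print(f"All five queries: {total_speedup:.1f}x")
```

Under that assumption the per-query speedups range from roughly 11x to 20x.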

SQream POC on Databases with Orange Silicon Valley

Soumik Sinharoy, Sr. Product Manager at Orange

SQream's DB Saves ~$6,000,000 on Big Data insights

$6,300,000 Price list

$250,000 Price list

X40 cost performance

Data Is Growing
World Wide IP Traffic Growth

Orange Group

Our Data Is Growing
Orange France IP Traffic Growth

• Monthly CDR Volume
  – Today: 60 Billion
  – Q4 2014: 150 Billion

What We Plan To Do With Big Data

• Massive database consolidation: 400x savings in CapEx
• Airborne Datacenters - SuperComputing
• Realtime processing of sensor data

Current Challenges

Problem Statement
- Massive Data Store
- Multi Dimensional Queries
- 4-9 Billion CDR
- Reduce execution time
- Reduce TCO

[Diagram labels: IP CDR, IPTV VoD, Fixed, Multi Media]

Revenue Assurance – POC vs. Leading RDBMS

• Scan & Join 35 Billion CDR and 30M Subscribers
• Challenges:
  – Requires massive investment in a hardware DW system
  – Requires a manual process for optimizing the software
  – Doesn't scale

[Bar chart: Loading Time 300M CDRs, time in minutes; bars: 9 and 60]

Query Run Time 300 Million CDRs

• Q2 – Map load on the network by hour, total calls, minutes
• Q3 – Map by hour and call type: breakdown of the load on the network by call type (SMS, data, voice)
• Q4 – Map load on the network at a specific time
• Q5 – Map load on the network at a specific time and for a specific service: voice, SMS, data

[Chart: SQream vs. leading RDBMS runtime, 300 million CDRs]
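The slides describe queries Q2 to Q5 only in words. A minimal sketch of what Q2-style and Q3-style queries might look like is shown below, run against an in-memory SQLite database for convenience. The `cdr` table, its column names, and the sample rows are all hypothetical; the real schema and SQL used in the POC are not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical CDR schema; the slides do not show the real one.
conn.execute(
    "CREATE TABLE cdr (call_start TEXT, call_type TEXT, duration_seconds INTEGER)"
)
conn.executemany(
    "INSERT INTO cdr VALUES (?, ?, ?)",
    [
        ("2014-03-01 09:05:00", "voice", 120),
        ("2014-03-01 09:40:00", "sms", 0),
        ("2014-03-01 10:15:00", "voice", 300),
        ("2014-03-01 10:20:00", "data", 60),
    ],
)

# Q2-style: load by hour, with total calls and total minutes.
q2 = conn.execute(
    """
    SELECT strftime('%H', call_start) AS hour,
           COUNT(*) AS total_calls,
           SUM(duration_seconds) / 60.0 AS total_minutes
    FROM cdr
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

# Q3-style: the same load broken down by call type.
q3 = conn.execute(
    """
    SELECT strftime('%H', call_start) AS hour, call_type, COUNT(*) AS total_calls
    FROM cdr
    GROUP BY hour, call_type
    ORDER BY hour, call_type
    """
).fetchall()

print(q2)  # [('09', 2, 2.0), ('10', 2, 6.0)]
print(q3)
```

Q4 and Q5 would add a `WHERE` clause restricting the time window and, for Q5, the call type.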

Multi Media Reporting – IP Traffic Analysis on anonymized IP CDR from Orange France

• Data coming from MMT are received monthly on the MMDM server
• Data are processed for a dynamic reporting application
• Challenges:
  – Requires massive investment in a hardware DW system
  – Requires a manual process for optimizing the software

[Bar chart: Loading Time 300M CDRs, time in minutes; bars: 9 and 60]

~4,300,000,000 Call records
~30,000,000 Subscribers
4 Months
1 Operator - Orange

14 Servers
168 CPU cores
1,344GB RAM
67,200GB SSD
403,200GB HDD

875 Seconds

~4,300,000,000 Call records
~30,000,000 Subscribers
4 Months
1 Operator - Orange

1 Server
8 CPU cores
128GB RAM
3,200GB SSD
10,000GB HDD
1x K40 GPU
$250,000 Price list

636 Seconds
• 18 months of data was also tested: Execution Time = 2484 Seconds
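The slides quote an X40 cost-performance figure without showing how it was derived. The rough calculation below combines the two runtimes with the two list prices. It assumes the $6,300,000 price belongs to the 14-server system and that cost performance means speedup multiplied by price ratio; neither assumption is confirmed by the slides.

```python
# Figures quoted on the slides; the price-to-system pairing is assumed.
edw_seconds, edw_price = 875, 6_300_000        # 14-server system
sqream_seconds, sqream_price = 636, 250_000    # single server with 1x K40 GPU

speedup = edw_seconds / sqream_seconds
price_ratio = edw_price / sqream_price
cost_performance = speedup * price_ratio

print(round(speedup, 2))           # 1.38
print(round(price_ratio, 1))       # 25.2
print(round(cost_performance, 1))  # 34.7
```

Under these assumptions the result is about 35x, which is in the same range as the quoted X40.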

Less is More… SQream vs. Leading EDW

[Bar chart: Time (Sec) for Q1 and Q2 over 1 Month and 4 Months of data]

• Q1 – Aggregation query for IP utilization by users
• Q2 – Group By query for 1 dimension

SQream – Yes, We Scale Linearly!

[Line chart: seconds by months of data: 1 Month, 4 Months, 9 Months, 13 Months, 18 Months]

Grouping for all customer segments - linear scaling 1 month to 18 months
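The chart's individual values did not survive in this transcript, but the single-server slide quotes two execution times (636 seconds for 4 months, 2484 seconds for 18 months). Assuming those two figures come from the same query as this chart, which the slides do not confirm, the scaling claim can be checked with simple arithmetic.

```python
# (months of data, execution seconds) as quoted on the single-server slide.
measurements = [(4, 636), (18, 2484)]

seconds_per_month = [seconds / months for months, seconds in measurements]
print([round(v, 1) for v in seconds_per_month])  # [159.0, 138.0]

# Time for 18 months if runtime grew exactly in proportion from the 4-month point.
projected_18 = measurements[0][1] * 18 / 4
print(projected_18, measurements[1][1])          # 2862.0 2484
```

The measured 18-month time is somewhat below the proportional projection, so the cost per month of data does not grow as the data set gets larger.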

QUESTIONS ???

Why SQream? Technology

• Data Crunching:
  – Faster compression time x20
  – Faster decompression time x50-70
  – Higher compression ratio x5-15
• Compute:
  – Faster MPP in a node x20
  – Higher scalability x1 node x3000 cores
  – Lower hardware cost $7,000,000 > $15K

Why SQream? Hassle free Solution

TCO
• Lower power consumption 1 node vs. 10 nodes
• Lower storage and license cost x10 – x50
• Lower footprint 2U vs. 20U

Deployment advantages
• No indexes, no cubes, no projections.
• Built-in full scan ad-hoc queries support.
• No code changes. Simple SQL.

SQream the fastest analytics ever

Thank  you