samza memory capacity_2015_ieee_big_data_data_quality_workshop

A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework
Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra, LinkedIn Corp

Upload: tao-feng

Post on 13-Jan-2017


Page 1

A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework

Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra
LinkedIn Corp

Page 2

Agenda

• Introduction
• Memory capacity model
• Evaluation
• Summary

Page 3

INTRODUCTION

Page 4

What Is Samza

[Diagram labels: Input Stream; Container; Task 1, Task 2, Task 3; Output Stream; Changelog Stream; Local state store; Checkpoint]

Page 5

Samza-based Data Filtering Systems

• Two main scenarios
  • Data Filtering By Rules
  • Data Filtering By Joining Streams
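The two scenario names suggest the following shapes. This is a plain-Python sketch under assumptions, not Samza API code: the message layout (dicts with a `member_id` field), the example rule, and both function names are hypothetical, and an ordinary `dict` stands in for a task's local state store.

```python
def filter_by_rules(messages, rules):
    """Keep only messages that satisfy every rule (each rule is a predicate)."""
    return [m for m in messages if all(rule(m) for rule in rules)]


def filter_by_join(messages, reference_stream, key):
    """Keep only messages whose key also appeared on a second stream."""
    store = {}  # stand-in for the local state store (assumption)
    for ref in reference_stream:
        store[ref[key]] = ref
    return [m for m in messages if m[key] in store]


# Hypothetical sample data, for illustration only.
events = [
    {"member_id": 1, "country": "US"},
    {"member_id": 2, "country": "DE"},
]
us_only = filter_by_rules(events, [lambda m: m["country"] == "US"])
joined = filter_by_join(events, [{"member_id": 2}], "member_id")
print(us_only, joined)
```

In the join case the buffered entries occupy heap, which is presumably the kind of state the capacity model is meant to size.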

Page 6

MEMORY CAPACITY MODEL

Page 7

Motivation

• We need an accurate predictive resource model for better capacity planning
• We could run more containers within a single node
• Higher density without SLA violation
• Lower business cost

Page 8

Memory Capacity Model

• L = T * P * E * (B + Bk + Bm)
  • L: live data set size
  • T: number of input topics
  • P: number of partitions per topic
  • E: number of unique entries per partition
  • B: bytes per treemap entry
  • Bk: bytes of key serialization
  • Bm: bytes of value message serialization
• Required heap size: 1H = 2 * L
• Details of the proof can be found in our paper
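The model can be written as a small calculation. This is a minimal sketch rather than code from the paper: the function and parameter names are illustrative assumptions, all sizes are taken to be in bytes, and the sample inputs are made up.

```python
def live_data_set_bytes(topics, partitions_per_topic, entries_per_partition,
                        entry_bytes, key_bytes, message_bytes):
    """L = T * P * E * (B + Bk + Bm), in bytes."""
    per_entry = entry_bytes + key_bytes + message_bytes
    return topics * partitions_per_topic * entries_per_partition * per_entry


def required_heap_bytes(live_bytes):
    """1H = 2 * L: the heap size the model calls for."""
    return 2 * live_bytes


# Hypothetical inputs chosen only for illustration (not from the slides).
live = live_data_set_bytes(2, 4, 1_000,
                           entry_bytes=40, key_bytes=16, message_bytes=100)
heap = required_heap_bytes(live)
print(live, heap)
```

Because L is a plain product, doubling any one of T, P, or E doubles the required heap.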

Page 9

EVALUATION

Page 10

Test Setup

[Diagram labels: Kafka Clusters (broker, broker, broker); Test System; Container 0, 1, …, N]

• Test system config
  • 24 cores
  • 1 Gbps NIC
  • 45 GB memory
• JVM options:
  • UseG1GC
  • G1HeapRegionSize = 4M

Page 11

Evaluation Methodology

• First, we derive the heap size from the model and call it 1H
  • e.g. with T: 1, P: 8, E: 5 million, B: 40 bytes, Bk: 24 bytes, Bm: 24 bytes, 1H = 2 * L = 2 * T * P * E * (B + Bk + Bm) ≈ 7 GB
• Second, we compare Samza job throughput and system performance metrics (GC time, CPU time) at 1H against the 2H and 3H cases
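The example numbers can be turned into concrete heap settings for the 1H, 2H, and 3H runs. This is a sketch under assumptions: the slides name only the G1 collector and a 4M region size, so setting `-Xms` equal to `-Xmx` and rounding the heap up to whole MiB are choices made here, and `jvm_options` is a hypothetical helper.

```python
T, P, E = 1, 8, 5_000_000   # topics, partitions per topic, entries per partition
B, BK, BM = 40, 24, 24      # bytes: treemap entry, serialized key, serialized message

live_bytes = T * P * E * (B + BK + BM)   # L
one_h_bytes = 2 * live_bytes             # 1H = 2 * L

MIB = 1024 * 1024


def jvm_options(heap_bytes):
    """Build JVM options for one run (assumes min heap == max heap)."""
    heap_mib = -(-heap_bytes // MIB)  # integer ceiling division
    return [
        f"-Xms{heap_mib}m",
        f"-Xmx{heap_mib}m",
        "-XX:+UseG1GC",
        "-XX:G1HeapRegionSize=4m",
    ]


for multiple in (1, 2, 3):
    heap = multiple * one_h_bytes
    print(f"{multiple}H = {heap / 1e9:.2f} GB:", " ".join(jvm_options(heap)))
```

The exact product is 7,040,000,000 bytes, which the slide rounds to 7G.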

Page 12

Performance Results

Page 13

Performance Results (cont.)

Page 14

Performance Results (cont.)

GC type          Metric            1H      2H     3H
Young GC of G1   Count             88      29     32
Young GC of G1   Total time (ms)   9850    5063   6144
Mixed GC of G1   Count             24      0      0
Mixed GC of G1   Total time (ms)   70166   0      0
Total            Count             112     29     31
Total            Total time (ms)   80117   5063   6144

• No full GC involved in the 1H case
• As expected, CPU time and GC time are higher for the 1H case
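To put the GC numbers in proportion, the reported totals can be compared directly. This is a small illustrative calculation that takes the slide's figures as-is; the variable and function names are assumptions.

```python
# Reported totals from the table: (GC count, total GC time in ms) per heap size.
gc_total = {"1H": (112, 80117), "2H": (29, 5063), "3H": (31, 6144)}
mixed_gc_ms_1h = 70166  # total mixed-GC time reported for the 1H case


def time_ratio(a, b):
    """Ratio of total GC time between two heap settings."""
    return gc_total[a][1] / gc_total[b][1]


ratio_1h_2h = time_ratio("1H", "2H")
ratio_1h_3h = time_ratio("1H", "3H")
mixed_share_1h = mixed_gc_ms_1h / gc_total["1H"][1]

print(f"1H vs 2H total GC time: {ratio_1h_2h:.1f}x")
print(f"1H vs 3H total GC time: {ratio_1h_3h:.1f}x")
print(f"Mixed GC share of 1H GC time: {mixed_share_1h:.0%}")
```

On these figures the 1H run spends roughly 16 times as long in GC as the 2H run, and mixed collections account for most of that time.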

Page 15

Summary

• The model accurately predicts Samza memory usage and lets Samza jobs meet their SLA without significant violations
• With accurate memory estimation, it allows 2X denser Samza container deployments on the same node
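One way to read the 2X density figure, as an illustrative calculation only: the slides do not state the baseline, so this assumes containers were previously given 2H of heap. It uses the 45 GB test node and the roughly 7 GB 1H example from the evaluation, and it ignores non-heap memory.

```python
NODE_MEMORY_GB = 45   # test system memory from the evaluation setup
ONE_H_GB = 7          # 1H heap from the worked example (rounded)


def containers_per_node(heap_gb, node_gb=NODE_MEMORY_GB):
    """How many container heaps of this size fit on one node (heap only)."""
    return node_gb // heap_gb


at_1h = containers_per_node(ONE_H_GB)       # heap sized by the model
at_2h = containers_per_node(2 * ONE_H_GB)   # assumed baseline: twice the model's heap
density_gain = at_1h / at_2h
print(at_1h, at_2h, density_gain)
```

Under these assumptions the node holds six containers at 1H against three at 2H.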

Page 16

Q & A