
Dark Silicon Accelerators for Database Indexing

Onur Kocberber, Kevin Lim, Babak Falsafi, Partha Ranganathan, Stavros Harizopoulos

© 2012 EPFL PARSA

Dark Silicon and Big Data Challenges

• Data explosion – data is growing faster than technology
• End of "free energy" – higher density → higher energy
• Challenge: CPUs are ill-matched to server workloads – most of the time is spent waiting for data rather than computing

 


Need to specialize for data-centric workloads

How Do Data-Centric Workloads Access Data?

• Databases create and use an index – data structures for fast data lookup, most often a balanced tree or a hash table, and frequently accessed
• Indexing is pointer-intensive – it underutilizes general-purpose CPUs, with IPCs as low as 0.25 on an OoO core

[Figure: the two index structures, a hash table and a tree]

Contribution: Database Indexing Widget

• Index lookups on general-purpose CPUs: pointer-intensive → low IPC; time-intensive → poor energy efficiency
• Database Indexing Widget – dedicated hardware for database index lookups; full-service offload: the core sleeps while the widget runs; up to 65% less energy per query

Outline

Introduction
Indexing in Databases
Indexing Widget
Results

Modern Databases and Indexing

Two types of contemporary in-memory databases:
• Column-store analytical processing with DSS
• Scale-out transaction processing with OLTP

• Two fundamental indexing operations:
– Hash table probe
– Tree traversal

[Figure: column-store layout with Customer, Date, and Product columns]

How Much Time is Spent Indexing?

Measured on a Xeon 5670 CPU with hardware counters.

[Figure: fraction of execution time (0–100%) spent in hash table and tree index code for OLTP (Order Status, Payment) and DSS (Query 2, Query 17)]

Indexing can account for up to 73% of execution time.

Example: Hash Join

SQL: SELECT A_name FROM A,B WHERE A_age = B_age

[Figure: ① Build a hash table on Table A (2M rows) keyed on age; ② Probe it with each row of Table B (60M rows); ③ emit the Result]

Hash table probes dominate execution.
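The build/probe flow above can be sketched as a toy chained-hash join. Table sizes, the row layout, and the hash function here are illustrative assumptions, not the talk's implementation:

```c
#include <stddef.h>

/* Toy equi-join on age, mirroring the slide's steps: (1) build a hash
   table over the smaller Table A, (2) probe it with each Table B row,
   (3) count matches. Layout and hash are illustrative. */
#define NBUCKETS 8                       /* power of two */

typedef struct Row { int age; struct Row *next; } Row;

static size_t hash_age(int age) { return (size_t)age & (NBUCKETS - 1); }

/* (1) Build: link every A row into its bucket's chain. */
static void build(Row *buckets[NBUCKETS], Row a[], int n) {
    for (int i = 0; i < n; i++) {
        size_t b = hash_age(a[i].age);
        a[i].next = buckets[b];
        buckets[b] = &a[i];
    }
}

/* (2) Probe: hash each B key, then chase the chain; every ->next is a
   dependent load, which is why probes dominate execution time. */
static int probe_count(Row *const buckets[NBUCKETS],
                       const int b_ages[], int n) {
    int matches = 0;
    for (int i = 0; i < n; i++)
        for (const Row *r = buckets[hash_age(b_ages[i])]; r; r = r->next)
            if (r->age == b_ages[i]) matches++;   /* (3) emit result */
    return matches;
}
```

Building on the smaller table (A) keeps the hash table compact, but the probe loop still executes once per B row, which is where the bulk of the time goes.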

Indexing with Hash Table Probes

[Figure: the key is hashed to select a bucket, then the bucket's chain is walked, comparing keys at each node]

Each hash probe operation:
→ 100–200 dynamic instructions: hash, then chase pointers
→ ~50% memory references

Indexing with Tree Traversals

SQL: SELECT A_Product,A_Customer FROM A WHERE A_age = 25

[Figure: a balanced tree index on A_age (root 10; children 8, 15; leaves 12, 25) is traversed for key 25; the matching node's tuple pointer selects the Customer/Age/Product/Date row for the Result]

Each index traversal:
→ 10K–15K dynamic instructions: lots of pointer chasing
→ 50–60% memory references
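The traversal above can be sketched as a minimal binary-search-tree lookup. The node layout and field names are illustrative assumptions; the key values in the test mirror the slide's example tree:

```c
#include <stddef.h>

/* Hypothetical node layout for the slide's in-memory tree index;
   field names are illustrative. */
typedef struct Node {
    int key;
    void *tuple_ptr;              /* points at the indexed row */
    struct Node *left, *right;
} Node;

/* Walk from the root toward the key. The next node's address is only
   known once the current node arrives from memory, so the loads form
   a serial dependence chain: the pointer chasing the talk measures. */
void *tree_lookup(const Node *root, int key) {
    for (const Node *n = root; n != NULL;
         n = (key < n->key) ? n->left : n->right)
        if (n->key == key) return n->tuple_ptr;
    return NULL;
}
```

The dependence chain is what keeps IPC low: an OoO core cannot issue the next load until the previous one completes, regardless of how wide the pipeline is.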

Outline

Introduction
Indexing in Databases
Indexing Widget
Results

Indexing Widget Overview

• Dedicated offload engine for index lookups – activated on demand by the core; full-service index lookup; the core sleeps while the widget runs
• Widget features – efficient: specialized control and functional units; low-latency: caches frequently accessed index data; tightly integrated: uses the core's L1-D and TLB

Widget Details

Operation proceeds in three steps: ① Configure ② Run ③ Return

[Figure: configuration registers (Index Addr., Key, Search Type, Result Table Addr., Data Type) written from the core; an FSM controller drives hash/tree computational logic and an SRAM buffer]

Widget Details – ① Configure

The core writes the widget's configuration registers, then hands off the lookup:

if (hasWidget) {
    widget.index  = &A;
    widget.key    = &B;
    widget.type   = EQUAL;
    widget.result = &R;
    widget.data   = int;
    ...
    widget.run();
} else {
    Hashprobe();
}

Widget Details – ② Run

[Figure: the FSM controller and the hash/tree computational logic perform the lookup, accessing index data through the core's L1-D (to/from L1) and the widget's SRAM buffer]

Widget Details – ③ Return

[Figure: matches are stored to the result table as (&Result Table, Key) entries, and control returns to the core]

Methodology

• First-order analytical model
  – Execution traces: Pin
  – Execution profiling: VTune, OProfile
• Benchmark applications
  – OLTP: TPC-C on VoltDB
  – DSS: TPC-H on MonetDB
• Model parameters
  – L1 / L2 / off-chip latency: 2 / 12 / 200 cycles
  – Widget buffer: 2-way set-associative cache
• Energy estimations
  – McPAT
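A first-order latency estimate in the spirit of this methodology can be sketched as follows. Only the 2/12/200-cycle latencies come from the slide; the formula and the hit-rate inputs are illustrative assumptions:

```c
/* Back-of-the-envelope latency for one index lookup. Because the
   loads in a lookup are dependent (pointer chasing), their average
   memory latencies roughly serialize. Latencies (cycles) are the
   slide's model parameters; hit rates are caller-supplied inputs. */
#define L1_LAT       2.0
#define L2_LAT      12.0
#define OFFCHIP_LAT 200.0

double lookup_cycles(double l1_hit, double l2_hit, int dependent_loads) {
    double offchip = 1.0 - l1_hit - l2_hit;   /* remaining fraction */
    double avg = l1_hit * L1_LAT + l2_hit * L2_LAT + offchip * OFFCHIP_LAT;
    return dependent_loads * avg;             /* chained loads serialize */
}
```

For example, with 50% L1 hits, 30% L2 hits, and 10 dependent loads, the average load costs 44.6 cycles and the lookup about 446 cycles, which illustrates why off-chip misses dominate lookup time.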

Energy Efficiency with Indexing Widget

[Figure: reduction in energy (%) vs. application coverage (%) for Query 2, Query 17, Payment, and Order Status, shown as reduction over a conventional OoO core and over an ARM-like OoO core]

Up to 65% reduction in energy.

Performance with Indexing Widget

[Figure: overall speedup (0–4×) vs. widget buffer size (0, 0.5KB, 1KB, 2KB, 4KB, 8KB) for Query 17, Order Status, Payment, and Query 2]

The widget does not hurt performance.

Conclusions

• Data explosion and dark silicon trends call for specialization – a rethinking of architectures to achieve efficiency
• Databases spend significant time in indexing – mostly pointer chasing, for which general-purpose CPUs are poorly suited
• Augment the CPU with an indexing widget – a dedicated offload engine: the core sleeps while the widget runs – improves efficiency: 65% less energy, 3× faster query execution

More challenges: data types, data sharing, generalization…

Thanks!  
