TRANSCRIPT
Dark Silicon Accelerators for Database Indexing
Onur Kocberber, Kevin Lim, Babak Falsafi, Partha Ranganathan, Stavros Harizopoulos
© 2012 EPFL PARSA
Dark Silicon and Big Data Challenges
• Data explosion – data growing faster than technology
• End of "free energy" – higher density → higher energy
• Challenge: CPUs ill-matched to server workloads – most of the time is spent waiting for data rather than computing
Need to specialize for data-centric workloads
How Do Data-Centric Workloads Access Data?
• Databases create and use an index – data structures for fast data lookup – most often a balanced tree or hash table – frequently accessed
• Indexing is pointer-intensive – underutilizes general-purpose CPUs – IPCs as low as 0.25 on an OoO core
[Figure: hash table and balanced tree index structures]
Contribution: Database Indexing Widget
• Index lookups on general-purpose CPUs: – pointer-intensive → low IPC – time-intensive → poor energy efficiency
• Database Indexing Widget – dedicated hardware for database index lookups – full-service offload: core sleeps while the widget runs – up to 65% less energy per query
Modern Databases and Indexing
Two types of contemporary in-memory database:
• Column-store analytical processing with DSS
• Scale-out transaction processing with OLTP
[Figure: example table with Customer, Date, Product columns]
• Two fundamental indexing operations – hash table probe – tree traversal
How Much Time is Spent Indexing?
Measurement on a Xeon 5670 CPU with HW counters
[Chart: execution-time breakdown (0-100%) for Order Status and Payment (OLTP) and Query 2 and Query 17 (DSS), with each bar split into hash table and/or tree index time]
Indexing can account for up to 73% of execution
Example: Hash Join
SQL: SELECT A_name FROM A,B WHERE A_age = B_age
[Figure: hash join – ❶ build a hash table on Table A (2M rows), keyed on age; ❷ probe it with each of Table B's 60M rows; ❸ emit matching rows as the result]
Hash table probes dominate execution
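The build-then-probe flow above can be sketched in C. This is a minimal illustration, not the system's actual implementation; the names (JNode, buckets, hash_age) and the chained-bucket layout are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024  /* toy size; a real table is sized to the build input */

typedef struct JNode {
    uint32_t age;          /* join key (A_age) */
    const char *name;      /* payload (A_name) */
    struct JNode *next;    /* bucket chain */
} JNode;

static JNode *buckets[NBUCKETS];

static uint32_t hash_age(uint32_t age) { return age % NBUCKETS; }

/* ❶ Build: insert each row of the small table A into the hash table. */
static void build(JNode *row) {
    uint32_t h = hash_age(row->age);
    row->next = buckets[h];
    buckets[h] = row;
}

/* ❷ Probe: for each row of the large table B, look up matching A rows.
   With 60M probes against a 2M-row build side, this loop dominates. */
static const char *probe(uint32_t b_age) {
    for (JNode *n = buckets[hash_age(b_age)]; n != NULL; n = n->next)
        if (n->age == b_age)
            return n->name;   /* ❸ Result: emit A_name */
    return NULL;
}
```

Each iteration of the probe loop is a dependent pointer load, which is what makes the probe phase memory-bound.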
Indexing with Hash Table Probes
[Figure: the key is run through a hash function to select a bucket, then the bucket's chain is walked, comparing keys]
Each hash probe operation: → 100-200 dynamic instructions: hash, then chase pointers → 50% memory references
Indexing with Tree Traversals
SQL: SELECT A_Product,A_Customer FROM A WHERE A_age = 25
[Figure: a balanced tree index on A_age (nodes 10, 8, 15, 12, 25) is traversed with key 25; the matching node's tuple pointer leads to the Customer / Age / Product / Date row in the result]
Each index traversal: → 10K-15K dynamic instructions: lots of pointer chasing → 50-60% memory references
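The traversal of the small tree on the slide can be sketched as follows. The binary TNode layout is a simplification of my own (real index nodes pack many keys per node, as in a B+-tree):

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative index node; names are assumptions, not the paper's. */
typedef struct TNode {
    uint32_t key;
    void *tuple;                 /* pointer into the base table */
    struct TNode *left, *right;
} TNode;

/* Traversal: each step is a key compare followed by a dependent pointer
   load, so the walk serializes on memory latency. */
static void *tree_lookup(const TNode *n, uint32_t key) {
    while (n != NULL) {
        if (key == n->key)
            return n->tuple;
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```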
Indexing Widget Overview
• Dedicated offload engine for index lookups – activated on demand by the core – full-service index lookup – core sleeps while the widget runs
• Widget features – efficient: specialized control and functional units – low-latency: caches frequently accessed index data – tightly integrated: uses the core's L1-D and TLB
Widget Details
❶ Configure ❷ Run ❸ Return
[Diagram: configuration registers (index addr., key, search type, result table addr., data type), written from the core, drive a controller (FSM) with hash and tree computational logic and a small SRAM buffer]
❶ Configure – the core fills the widget's configuration registers, falling back to software when no widget is present:

if (hasWidget) {
    widget.index  = &A;
    widget.key    = &B;
    widget.type   = EQUAL;
    widget.result = &R;
    widget.data   = int;
    …
    widget.run();
} else {
    Hashprobe();
}
❷ Run – the controller (FSM) and its hash/tree computational logic perform the lookup autonomously, moving index data to/from the L1 while the core sleeps.
❸ Return – the widget stores each result (&Result Table, Key) and returns control to the core.
Methodology
• First-order analytical model – execution traces: Pin – execution profiling: VTune, OProfile
• Benchmark applications – OLTP: TPC-C on VoltDB – DSS: TPC-H on MonetDB
• Model parameters – L1 / L2 / off-chip latency: 2 / 12 / 200 cycles – widget buffer: 2-way set-associative cache
• Energy estimations – McPAT
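As a rough illustration of how such a first-order model behaves, an Amdahl-style formula (my simplification, not the authors' actual model) relates the fraction of execution offloaded to the overall energy saved:

```c
/* Illustrative first-order energy model (an assumption, not the paper's
   exact model). 'coverage' is the fraction of execution spent in index
   lookups; 'widget_rel_energy' is the widget's energy for that work
   relative to the core's (the core sleeps meanwhile). */
static double energy_reduction(double coverage, double widget_rel_energy) {
    return coverage * (1.0 - widget_rel_energy);
}
```

Under this model, ~73% coverage and a 65% overall reduction together imply the widget spends roughly a tenth of the core's energy on the offloaded work (0.73 × 0.89 ≈ 0.65), which is consistent with the reduction-vs-coverage chart that follows.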
Energy Efficiency with Indexing Widget
[Chart: reduction in energy (%, 0-100) vs. application coverage (%, 10-100) for Qry 2, Qry 17, Payment, and Order Status; reduction shown over both a conventional OoO and an ARM-like OoO]
Up to 65% reduction in energy
Performance with Indexing Widget
[Chart: overall speedup (0-4x) vs. widget buffer size (0 to 8KB) for Qry 17, Order Status, Payment, and Qry 2]
Widget does not hurt performance
Conclusions
• Data explosion and dark silicon trends call for specialization – rethinking of architectures to achieve efficiency
• Databases spend significant time in indexing – mostly pointer chasing: general-purpose CPUs are poorly suited
• Augment the CPU with an indexing widget – dedicated offload engine: core sleeps while the widget runs – improves efficiency: 65% less energy, 3x faster query execution
More challenges: data types, data sharing, generalization…