optimistic intra-transaction parallelism on chip-multiprocessors chris colohan 1, anastassia...
Post on 19-Dec-2015
213 Views
Preview:
TRANSCRIPT
Optimistic Intra-Transaction Parallelism onChip-Multiprocessors
Chris Colohan1, Anastassia Ailamaki1,J. Gregory Steffan2 and Todd C. Mowry1,3
1Carnegie Mellon University2University of Toronto
3Intel Research Pittsburgh
Copyright 2005 Chris Colohan 2
Chip Multiprocessors are Here!
2 cores now, soon will have 4, 8, 16, or 32 Multiple threads per core How do we best use them?
IBM Power 5
AMD Opteron
Intel Yonah
Copyright 2005 Chris Colohan 3
Multi-Core Enhances Throughput
Database ServerUsers
Transactions DBMS Database
Cores can run concurrent transactions and improve
throughput
Cores can run concurrent transactions and improve
throughput
Copyright 2005 Chris Colohan 4
Multi-Core Enhances Throughput
Database ServerUsers
Transactions DBMS Database
Can multiple cores improvetransaction latency?
Can multiple cores improvetransaction latency?
Copyright 2005 Chris Colohan 5
Parallelizing transactions
SELECT cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;foreach(item) { GET quantity FROM stock; quantity--; UPDATE stock WITH quantity; INSERT item INTO order_line;}
DBMS
Intra-query parallelism Used for long-running queries (decision support) Does not work for short queries
Short queries dominate in commercial workloads
Copyright 2005 Chris Colohan 6
Parallelizing transactions
SELECT cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;foreach(item) { GET quantity FROM stock; quantity--; UPDATE stock WITH quantity; INSERT item INTO order_line;}
DBMS
Intra-transaction parallelism Each thread spans multiple queries
Hard to add to existing systems! Need to change interface, add latches and locks,
worry about correctness of parallel execution…
Copyright 2005 Chris Colohan 7
Parallelizing transactions
SELECT cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;foreach(item) { GET quantity FROM stock; quantity--; UPDATE stock WITH quantity; INSERT item INTO order_line;}
DBMS
Intra-transaction parallelism Breaks transaction into threads
Hard to add to existing systems! Need to change interface, add latches and locks,
worry about correctness of parallel execution…
Thread Level Speculation (TLS)makes parallelization easier.
Thread Level Speculation (TLS)makes parallelization easier.
Copyright 2005 Chris Colohan 8
Thread Level Speculation (TLS)
*p=
*q=
=*p
=*q
Sequential
Tim
e
Parallel
*p=
*q=
=*p
=*q
=*p
=*q
Epoch 1 Epoch 2
Copyright 2005 Chris Colohan 9
Thread Level Speculation (TLS)
*p=
*q=
=*p
=*q
Sequential
Tim
e
*p=
*q=
=*p
R2
Violation!
=*p
=*q
Parallel
Use epochs
Detect violations
Restart to recover
Buffer state
Worst case: Sequential
Best case: Fully parallel
Data dependences limit performance.Data dependences limit performance.
Epoch 1 Epoch 2
Copyright 2005 Chris Colohan 10
Transactions
DBMS
Hardware
A Coordinated Effort
TPC-CTPC-C
BerkeleyDB
BerkeleyDB
Simulated machine
Simulated machine
Copyright 2005 Chris Colohan 11
TransactionProgrammer
DBMS Programmer
Hardware Developer
A Coordinated Effort
Choose epoch boundaries
Choose epoch boundaries
Remove performance bottlenecks
Remove performance bottlenecks
Add TLS support to architecture
Add TLS support to architecture
Copyright 2005 Chris Colohan 12
So what’s new?
Intra-transaction parallelism Without changing the transactions With minor changes to the DBMS Without having to worry about locking Without introducing concurrency bugs With good performance
Halve transaction latency on four cores
Copyright 2005 Chris Colohan 13
Related Work
Optimistic Concurrency Control (Kung82)
Sagas (Molina&Salem87)
Transaction chopping (Shasha95)
Copyright 2005 Chris Colohan 14
Outline
Introduction Related work Dividing transactions into epochs Removing bottlenecks in the DBMS Results Conclusions
Copyright 2005 Chris Colohan 15
Case Study: New Order (TPC-C)
Only dependence is the quantity field Very unlikely to occur (1/100,000)
GET cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;foreach(item) { GET quantity FROM stock WHERE i_id=item; UPDATE stock WITH quantity-1 WHERE i_id=item; INSERT item INTO order_line;}
78% of transactionexecution time
Copyright 2005 Chris Colohan 16
Case Study: New Order (TPC-C)GET cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;foreach(item) { GET quantity FROM stock WHERE i_id=item; UPDATE stock WITH quantity-1 WHERE i_id=item; INSERT item INTO order_line;}
GET cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;
TLS_foreach(item) { GET quantity FROM stock WHERE i_id=item; UPDATE stock WITH quantity-1 WHERE i_id=item; INSERT item INTO order_line;}
GET cust_info FROM customer;UPDATE district WITH order_id; INSERT order_id INTO new_order;
TLS_foreach(item) { GET quantity FROM stock WHERE i_id=item; UPDATE stock WITH quantity-1 WHERE i_id=item; INSERT item INTO order_line;}
Copyright 2005 Chris Colohan 17
Outline
Introduction Related work Dividing transactions into epochs Removing bottlenecks in the
DBMS Results Conclusions
Copyright 2005 Chris Colohan 19
Dependences in DBMSTim
e
Dependences serialize execution!
Performance tuning: Profile execution Remove bottleneck
dependence Repeat
Copyright 2005 Chris Colohan 20
Buffer Pool Management
CPU
Buffer Pool
get_page(5)
ref: 1
put_page(5)
ref: 0
Copyright 2005 Chris Colohan 21
get_page(5)put_page(5)
Buffer Pool Management
CPU
Buffer Pool
get_page(5)
ref: 0
put_page(5)
Tim
e
get_page(5)
put_page(5)
TLS ensures first epoch gets page
first.Who cares?
TLS ensures first epoch gets page
first.Who cares?
get_page(5)put_page(5)
Copyright 2005 Chris Colohan 22
Buffer Pool Management
CPU
Buffer Pool
get_page(5)
ref: 0
put_page(5)
Tim
e
get_page(5)
put_page(5)
= Escape Speculation
• Escape speculation• Invoke operation• Store undo function• Resume speculation
• Escape speculation• Invoke operation• Store undo function• Resume speculation
get_page(5)put_page(5)
put_page(5)get_page(5)
Copyright 2005 Chris Colohan 23
Buffer Pool Management
CPU
Buffer Pool
get_page(5)
ref: 0
put_page(5)
Tim
e
get_page(5)
put_page(5)
get_page(5)put_page(5)
Not undoable!
Not undoable!
get_page(5)put_page(5)
= Escape Speculation
Copyright 2005 Chris Colohan 24
Buffer Pool Management
CPU
Buffer Pool
get_page(5)
ref: 0
put_page(5)
Tim
e
get_page(5)
put_page(5)
get_page(5)
put_page(5)
Delay put_page until end of epoch Avoid dependence
= Escape Speculation
Copyright 2005 Chris Colohan 25
Removing Bottleneck Dependences
We introduce three techniques: Delay operations until non-speculative
Mutex and lock acquire and release Buffer pool, memory, and cursor release Log sequence number assignment
Escape speculation Buffer pool, memory, and cursor allocation
Traditional parallelization Memory allocation, cursor pool, error
checks, false sharing
Copyright 2005 Chris Colohan 26
Outline
Introduction Related work Dividing transactions into epochs Removing bottlenecks in the DBMS Results Conclusions
Copyright 2005 Chris Colohan 27
Experimental Setup
Detailed simulation Superscalar, out-of-
order, 128 entry reorder buffer
Memory hierarchy modeled in detail
TPC-C transactions on BerkeleyDB In-core database Single user Single warehouse Measure interval of 100
transactions Measuring latency not
throughput
CPU
32KB4-wayL1 $
Rest of memory system
Rest of memory system
CPU
32KB4-wayL1 $
CPU
32KB4-wayL1 $
CPU
32KB4-wayL1 $
2MB 4-way L2 $
Copyright 2005 Chris Colohan 28
Optimizing the DBMS: New Order
0
0.25
0.5
0.75
1
1.25
Seque
ntial
No Opt
imiza
tions
Latch
es
Lock
s
Mall
oc/F
ree
Buffe
r Poo
l
Curso
r Que
ue
Error C
heck
s
False
Sharin
g
B-Tre
e
Logg
ing
Tim
e (n
orm
aliz
ed)
Idle CPU
ViolatedCache Miss
Busy
Cache misses
increase
Cache misses
increase
Other CPUs not helping
Other CPUs not helping
Can’t optimize
much more
Can’t optimize
much more
26% improvemen
t
26% improvemen
t
Copyright 2005 Chris Colohan 29
Optimizing the DBMS: New Order
0
0.25
0.5
0.75
1
1.25
Seque
ntial
No Opt
imiza
tions
Latch
es
Lock
s
Mall
oc/F
ree
Buffe
r Poo
l
Curso
r Que
ue
Error C
heck
s
False
Sharin
g
B-Tre
e
Logg
ing
Tim
e (n
orm
aliz
ed)
Idle CPU
ViolatedCache Miss
Busy
This process took me 30 days and <1200 lines of code.
This process took me 30 days and <1200 lines of code.
Copyright 2005 Chris Colohan 30
Other TPC-C Transactions
0
0.25
0.5
0.75
1
New Order Delivery Stock Level Payment Order Status
Tim
e (
no
rma
lize
d)
Idle CPU
FailedCache Miss
Busy
3/5 Transactions speed up by 46-66%
Copyright 2005 Chris Colohan 31
Conclusions
A new form of parallelism for databases Tool for attacking transaction latency
Intra-transaction parallelism Without major changes to DBMS
TLS can be applied to more than transactions
Halve transaction latency by using 4 CPUs
Copyright 2005 Chris Colohan 34
TPC-C Transactions on 2 CPUs
0
0.25
0.5
0.75
1
New Order Delivery Stock Level Payment Order Status
Tim
e (n
orm
aliz
ed)
Idle CPUFailedCache MissBusy
3/5 Transactions speed up by 30-44%
Copyright 2005 Chris Colohan 36
Latches
Mutual exclusion between transactions Cause violations between epochs
Read-test-write cycle RAW Not needed between epochs
TLS already provides mutual exclusion!
Copyright 2005 Chris Colohan 37
Latches: Aggressive Acquire
Acquirelatch_cnt++…work…latch_cnt--
Homefree
latch_cnt++…work…(enqueue release)
Commit worklatch_cnt--
Homefree
latch_cnt++…work…(enqueue release)
Commit worklatch_cnt--Release
Larg
e c
riti
cal secti
on
Copyright 2005 Chris Colohan 38
Latches: Lazy Acquire
Acquire…work…Release
Homefree
(enqueue acquire) …work…(enqueue release)
AcquireCommit workRelease
Homefree
(enqueue acquire)…work…(enqueue release)
AcquireCommit workRelease
Sm
all c
riti
cal secti
on
s
Copyright 2005 Chris Colohan 40
TLS in Database Systems
Non-DatabaseTLS
Tim
e
TLS in DatabaseSystems
Large epochs:• More dependences
• Must tolerate
• More state• Bigger buffers
Large epochs:• More dependences
• Must tolerate
• More state• Bigger buffers
Concurrenttransactions
Copyright 2005 Chris Colohan 41
Feedback Loop I know this is
parallel!
for() { do_work();}
par_for() { do_work();}
Must…Make…Faster
think
feed back feed back feed back feed back feed back feed back feed back feed back feed back feed back feed
Copyright 2005 Chris Colohan 42
Violations == Feedback
*p=
*q=
=*p
=*q
Sequential
Tim
e
*p=
*q=
=*p
R2
Violation!
=*p
=*q
Parallel
0x0FD80xFD200x0FC00xFC18
Must…Make…Faster
Must…Make…Faster
Copyright 2005 Chris Colohan 43
Eliminating Violations
*p=
*q=
=*p
R2
Violation!
=*p
=*q
Parallel
*q==*q
=*q
Violation!
Eliminate *p Dep.
Tim
e
0x0FD80xFD200x0FC00xFC18
Optimization maymake slower?
Copyright 2005 Chris Colohan 44
Tolerating Violations: Sub-epochs
Tim
e
*q=Violation!
Sub-epochs
=*q
=*q
*q==*q
=*q
Violation!
Eliminate *p Dep.
top related