retso hotdep-2011
Talk at the 7th Workshop on Hot Topics in Systems Dependability (HotDep 2011)
Lock-free transactional support for large-scale storage systems
Flavio Junqueira, Benjamin Reed, Maysam Yabandeh
Yahoo! Research
June 2011
Big data
• Large data sets
✓ Unstructured, semi-structured data
✓ Critical for business logic
• Examples of such data
✓ Web logs, server logs, social media, etc
Big data
+43% clicks vs. editor selected
+160% clicks vs. one-size-fits-all
Eric Baldeschwieler @IBM Big Data, May 2011
Big data: Hadoop
Eric Baldeschwieler @IBM Big Data, May 2011
Background
• Database generations in batches
• Online concurrent updates
✓ Require transactional support
[Figure: batch generations (InputDB → hours of MapReduce → OutputDB → hours of MapReduce → OutputDB) vs. online updates (Input and Input txn applied to OutputDB, e.g., HBase, HDFS)]
Examples
• Mutable tables
• Various indexes: Web, news, shopping, coupons
• User and content models
• Characteristics
✓ Concurrency
✓ Losing updates is undesirable
✓ There are concurrent reads and they must be consistent
Semantics
• Read only previously committed values
[Figure: timeline of a transaction (Txn) showing the writes w(x,v) and w(x,v’) and the read r(x) = v]
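The read rule above can be sketched with a versioned cell. This is a minimal illustration, not the system's actual data layout: the class `VersionedCell`, its method names, and the integer timestamps are all assumptions made for the example. A reader with start timestamp `start_ts` sees the latest value whose commit timestamp is strictly smaller.

```python
from bisect import bisect_left


class VersionedCell:
    """Hypothetical store for one row: every committed version is kept."""

    def __init__(self):
        self._commit_ts = []  # commit timestamps, kept sorted
        self._values = []     # values, parallel to _commit_ts

    def commit_write(self, commit_ts, value):
        # Assumption: commits arrive in increasing timestamp order.
        self._commit_ts.append(commit_ts)
        self._values.append(value)

    def read(self, start_ts):
        # Return the latest version committed strictly before start_ts,
        # or None when no such version exists.
        i = bisect_left(self._commit_ts, start_ts)
        return self._values[i - 1] if i > 0 else None


x = VersionedCell()
x.commit_write(10, "v")
x.commit_write(30, "v'")

print(x.read(20))  # v   (the write of v' committed after the reader started)
print(x.read(40))  # v'
```

A reader that started at timestamp 20 keeps seeing `v` even after `v'` commits at 30, which is what makes concurrent reads consistent.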
Semantics
• No concurrent writes to the same row
[Figure: timeline of two concurrent transactions (Txn) issuing w(x,v’) and w(x,v) on the same row; at least one must abort]
Snapshot Isolation
• Known in the database realm
• Conflicting transactions
✓ Write to the same element (e.g., row)
✓ Time ranges between start and commit overlap
• Efficient implementation by versioning
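The two conflict conditions can be written as a single predicate. The sketch below is illustrative only: the `Txn` record and the `conflicts` function are hypothetical names, and timestamps are assumed to be integers drawn from one global counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    start_ts: int
    commit_ts: int
    write_set: frozenset  # rows written by the transaction


def conflicts(a, b):
    """True when a and b conflict under snapshot isolation."""
    # Condition 1: the [start, commit] time ranges overlap.
    overlap_in_time = a.start_ts < b.commit_ts and b.start_ts < a.commit_ts
    # Condition 2: both transactions write at least one common row.
    overlap_in_space = bool(a.write_set & b.write_set)
    return overlap_in_time and overlap_in_space


t1 = Txn(start_ts=1, commit_ts=5, write_set=frozenset({"x"}))
t2 = Txn(start_ts=3, commit_ts=7, write_set=frozenset({"x"}))
t3 = Txn(start_ts=6, commit_ts=8, write_set=frozenset({"x"}))
t4 = Txn(start_ts=2, commit_ts=4, write_set=frozenset({"y"}))

print(conflicts(t1, t2))  # True: same row, overlapping lifetimes
print(conflicts(t1, t3))  # False: t3 started after t1 committed
print(conflicts(t1, t4))  # False: disjoint write sets
```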
Locks?
• Previous approaches: lock data to modify [Percolator, OSDI’10]
✓ Convoy effect
✓ Delays of several seconds
✓ Higher overhead on data servers
• Our approach
✓ Lock-free, centralized transaction manager
✓ Single point of failure, potential bottleneck?
Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle (TO)
✓ Keeps state about committed rows
[Figure: message flow among Client, TSO (containing the TO), DB1, and DB2]
① Ts: the client obtains its start timestamp
② r(r1) returns v1, Ts(txnw), ⊥, with Ts(txnw) < Ts(txnr); w(r2, v2, Ts(txnr)) returns ACK
③ Tc(txnw) < Ts(txnr)?
④ Commit r2
⑤ Cleanup(r2, txnr)
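The TSO's role in this flow (handing out timestamps, deciding commits, and answering inquiries about whether a writer committed before a reader started) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ReTSO implementation: the class and method names are hypothetical, the state is an unbounded dictionary, and there is no logging or failure handling.

```python
import itertools


class TransactionStatusOracle:
    """Hypothetical single-process oracle; all names are illustrative."""

    def __init__(self):
        self._clock = itertools.count(1)  # the embedded timestamp oracle
        self._last_commit = {}            # row -> commit ts of its last writer
        self._commit_ts = {}              # start ts -> commit ts, per txn

    def begin(self):
        # Step 1: hand out a start timestamp.
        return next(self._clock)

    def commit(self, start_ts, written_rows):
        # Step 4: abort if any written row was committed by another
        # transaction after this one started (write-write conflict).
        for row in written_rows:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = next(self._clock)
        for row in written_rows:
            self._last_commit[row] = commit_ts
        self._commit_ts[start_ts] = commit_ts
        return commit_ts

    def committed_before(self, writer_start_ts, reader_start_ts):
        # Step 3: is Tc(txn_w) < Ts(txn_r)? Uncommitted writers answer False.
        tc = self._commit_ts.get(writer_start_ts)
        return tc is not None and tc < reader_start_ts


tso = TransactionStatusOracle()
ts_a = tso.begin()                 # 1
ts_b = tso.begin()                 # 2
tc_a = tso.commit(ts_a, ["r2"])    # 3: no conflict, commits
tc_b = tso.commit(ts_b, ["r2"])    # None: r2 was committed at 3 > 2
ts_c = tso.begin()                 # 4
print(tc_a, tc_b, tso.committed_before(ts_a, ts_c))  # 3 None True
```

No locks are placed on the data servers in this sketch: the conflict decision is taken centrally at commit time, from the oracle's own record of committed rows.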
ReTSO: Design choices
• TSO
✓ Keeps state of modified rows
• In-memory state
✓ Highest commit timestamp of all garbage-collected rows
• Auto-GC Hash map
✓ Lazy garbage-collection
✓ Upon a hit
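One way to read these design choices is a fixed-size map that evicts an entry only when a new row lands on an occupied slot, and that remembers the highest commit timestamp it has evicted. The sketch below follows that reading; it is an assumption about the mechanism, not the actual ReTSO data structure (which the talk says is written in C++), and `LazyGcCommitMap`, `t_max`, and the one-entry-per-slot layout are hypothetical.

```python
class LazyGcCommitMap:
    """Bounded row -> commit-timestamp map with lazy eviction (illustrative)."""

    def __init__(self, num_buckets):
        self._buckets = [None] * num_buckets  # each slot holds (row, commit_ts)
        self.t_max = 0  # highest commit ts among garbage-collected rows

    def _slot(self, row):
        return hash(row) % len(self._buckets)

    def put(self, row, commit_ts):
        i = self._slot(row)
        old = self._buckets[i]
        # Garbage-collect lazily: only when a different row hits this slot.
        if old is not None and old[0] != row:
            self.t_max = max(self.t_max, old[1])
        self._buckets[i] = (row, commit_ts)

    def last_commit(self, row):
        entry = self._buckets[self._slot(row)]
        if entry is not None and entry[0] == row:
            return entry[1]
        # Unknown row: it may have been evicted, so answer conservatively.
        return self.t_max


m = LazyGcCommitMap(num_buckets=2)
m.put(0, 5)   # integer row ids keep the slot choice deterministic
m.put(2, 9)   # same slot as row 0, so row 0 is evicted and t_max becomes 5
print(m.last_commit(2), m.last_commit(0), m.last_commit(1))  # 9 5 5
```

The conservative answer for unknown rows can cause an unnecessary abort for a transaction that started before `t_max`, but it never lets a real conflict through, and it keeps memory bounded.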
ReTSO: Increasing dependability
• Remote write-ahead log
✓ Writes to the WAL are synchronous but do not block other txns
[Figure: ReTSO receives inquiries and sends updates to the WAL (e.g., NFS, BookKeeper [http://zookeeper.apache.org/bookkeeper]); a backup ReTSO can be warm or cold]
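The phrase "synchronous but do not block other txns" can be illustrated with a small asynchronous sketch: each commit replies only after its own log entry is acknowledged, while other commits proceed in the meantime. Everything here is hypothetical; `RemoteLog` merely simulates a remote log with a sleep and is not the BookKeeper API, and `LoggedOracle` omits conflict detection entirely.

```python
import asyncio


class RemoteLog:
    """Stand-in for a remote write-ahead log (simulated, not a real client)."""

    def __init__(self, delay_s=0.01):
        self.entries = []
        self._delay_s = delay_s

    async def append(self, entry):
        await asyncio.sleep(self._delay_s)  # simulated network round trip
        self.entries.append(entry)


class LoggedOracle:
    def __init__(self, log):
        self._log = log
        self._next_ts = 1
        self.in_flight = 0
        self.max_in_flight = 0

    async def commit(self, txn_id):
        commit_ts = self._next_ts
        self._next_ts += 1
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        # Synchronous for this txn: the reply waits for the log entry.
        # Other commits keep running while this one awaits.
        await self._log.append((txn_id, commit_ts))
        self.in_flight -= 1
        return commit_ts


async def run_demo(n=5):
    oracle = LoggedOracle(RemoteLog())
    results = await asyncio.gather(*(oracle.commit(i) for i in range(n)))
    return oracle, results


oracle, results = asyncio.run(run_demo(5))
print(sorted(results), oracle.max_in_flight)  # [1, 2, 3, 4, 5] 5
```

All five commits are in flight at once, so the log round trip adds latency to each transaction without serializing them.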
Preliminary results
• Coded in Java
✓ Except for hash map (C++ with JNI interface)
• Uses BookKeeper for WAL
• 10 identical servers
✓ 2.13 GHz dual-core Intel Xeon
✓ 4 GB of RAM
✓ 1 Gigabit interfaces
Preliminary results
• Average throughput observed
✓ 3 clients, 1,000 concurrent transactions
✓ 81k TPS
• Average latency
✓ 1 client, 1 txn
✓ 0.87 ms (with WAL)
✓ 0.17 ms (without WAL)
Preliminary results
• Increasing the load of the system
✓ 1 to 16 clients
✓ Max is 72k TPS
[Figure: latency in ms (0 to 18) vs. throughput in TPS (20,000 to 120,000), with curves for ReTSO and WAL-disabled]
What’s baking?
• Integration
✓ HBase
✓ Query engine
• Real workloads
Summary
• Transaction management for large-scale data repositories
• Lock-based vs. Lock-free
✓ ReTSO is lock-free and dependable
✓ Reduced load on storage nodes
✓ Low latency despite faults
• Performance sufficient for realistic applications