dangers of replication
CS 600.419 Storage Systems
Dangers of Replication
Materials taken from “J. Gray, P. Helland, P. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. SIGMOD, 1996.”
http://research.microsoft.com/~gray/replicas.ps
What’s the danger?
• Replication of transactional data results in unstable system performance
• For consistent replication
– Waits and deadlocks
• For update-anywhere-anytime replication
– Reconciliations
• Both grow polynomially (with meaningful exponents) in the number of clients
– Based on simple lower bounds derived from mean-value analysis
What’s the point?
• This theme is predicated on the knowledge that globally consistent replication does not scale
Replication Policies
• Eager replication:
– Copies are updated as part of the original transaction.
• Lazy replication:
– One replica is updated; the other copies are updated asynchronously.
• Update policy:
– Group: any node can update its replica.
– Master: only the master updates its replica; the remaining replicas are read-only.
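The difference between eager and lazy propagation can be made concrete with a toy sketch. This is only an illustration under simplifying assumptions: replicas are in-memory dictionaries, there is no locking or failure handling, and the names `Replica`, `eager_write`, `lazy_write`, and `propagate` are hypothetical, not taken from the paper. The group/master update policy is not modeled here.

```python
from collections import deque


class Replica:
    """A toy replica: just a name and a key/value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def eager_write(replicas, key, value):
    """Eager: every copy is updated as part of the original transaction."""
    for replica in replicas:
        replica.data[key] = value


def lazy_write(origin, others, key, value, pending):
    """Lazy: update one replica now and queue the update for the rest."""
    origin.data[key] = value
    for replica in others:
        pending.append((replica, key, value))


def propagate(pending):
    """Asynchronous step: apply queued updates to the remaining replicas."""
    while pending:
        replica, key, value = pending.popleft()
        replica.data[key] = value


a, b, c = Replica("a"), Replica("b"), Replica("c")

eager_write([a, b, c], "x", 1)
eager_state = [r.data["x"] for r in (a, b, c)]

pending = deque()
lazy_write(a, [b, c], "x", 2, pending)
before = [r.data["x"] for r in (a, b, c)]  # b and c still hold the old value
propagate(pending)
after = [r.data["x"] for r in (a, b, c)]
```

Between `lazy_write` and `propagate`, readers of `b` and `c` see stale data; that window is where lazy schemes need reconciliation.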
Representing Writes
[Figure not reproduced in this transcript.]
Mastered and Group Replication
[Figure not reproduced in this transcript.]
The Scale-up Pitfall
• Replication works well on small prototype systems
– But, at deployment, replication is unstable
• At larger scales
– Message propagation delay increases
– Transaction rates are higher
• For eager replication
– More transactions, with each transaction taking longer
• For lazy replication
– Delays in reconciliation lead to system delusion
Analysis of Eager Group Replication
• Scaling laws (deadlock rate)
– Third power of the number of nodes
– Fifth power of the number of operations per transaction
• Problems with eager replication
– Cannot be used by disconnected nodes
– Probability of deadlocks (failed transactions) increases with system size
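The two scaling laws on this slide can be read as a single proportionality. The symbols are introduced here only for illustration: N is the number of nodes and A is the number of operations per transaction. The sketch assumes the quantity being scaled is the deadlock rate, as the second bullet group suggests, and it omits every constant factor.

```latex
\text{deadlock rate} \;\propto\; N^{3} A^{5}
\qquad\Longrightarrow\qquad
\frac{\text{rate}(2N, A)}{\text{rate}(N, A)} = 2^{3} = 8,
\qquad
\frac{\text{rate}(N, 2A)}{\text{rate}(N, A)} = 2^{5} = 32
```

So doubling the number of nodes alone multiplies the rate by 8, and doubling the transaction size alone multiplies it by 32.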
Analysis of Lazy Group Replication
• Scaling laws (reconciliation rate)
– Third power of the number of nodes
– Third power of the number of operations per transaction
• Better than eager, but still not good
Analysis of Lazy Master Replication
• Scaling laws (deadlock rate)
– Second power of the number of nodes
– Fifth power of the number of operations per transaction
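The exponents quoted for the three analyzed schemes (eager group, lazy group, lazy master) can be compared side by side. The sketch below encodes only those exponents and computes relative growth, not absolute rates; the function name `growth_factor` and the table layout are hypothetical. Note the assumption that the scaled quantity is the deadlock rate for eager group and lazy master, and the reconciliation rate for lazy group, so the rows are not directly comparable in absolute terms.

```python
# Exponents as quoted on the slides: (number of nodes, operations per transaction).
EXPONENTS = {
    "eager group": (3, 5),
    "lazy group": (3, 3),
    "lazy master": (2, 5),
}


def growth_factor(scheme, node_scale, op_scale=1.0):
    """Relative growth of the failure rate when the number of nodes is
    multiplied by node_scale and the transaction size by op_scale."""
    node_exp, op_exp = EXPONENTS[scheme]
    return node_scale ** node_exp * op_scale ** op_exp


for scheme in EXPONENTS:
    print(f"{scheme:12s} 10x nodes -> {growth_factor(scheme, 10):g}x")
```

Under these exponents, a tenfold increase in nodes costs a factor of 1000 for either group scheme but a factor of 100 for lazy master, which is the quantitative basis for preferring mastered updates.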
Status of Replication
• Negative scaling results
– Don’t account for message delays (so it’s worse)
– Can’t be escaped by choosing lazy over eager, or vice versa
• No reason for group replication
– Master is the same (eager) or better (lazy)
• So, what do we do?
– Avoid scale; keep systems small
Two-Tier Replication
• Two node types:
– Base nodes: always connected, store a replica, master most objects
– Mobile nodes: often disconnected, store a replica, issue tentative transactions
• Two version types:
– Master version:
• Exists at the object owner; other nodes may have older versions
– Tentative version:
• Local version is updated by tentative transactions
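A minimal sketch of how the two node types and two version types might interact. It assumes, following the cited paper's description of two-tier replication, that a mobile node's tentative transactions are re-run against master data when it reconnects and are rejected if they fail an acceptance test. The classes `BaseNode` and `MobileNode`, the `withdraw` transaction, and the `non_negative` test are all hypothetical names chosen for illustration.

```python
class BaseNode:
    """Always connected; holds the master version of each object."""

    def __init__(self, data):
        self.master = dict(data)

    def run(self, txn, accept):
        """Run a transaction against master data and commit it only if
        the resulting state passes the acceptance test."""
        candidate = dict(self.master)
        txn(candidate)
        if accept(candidate):
            self.master = candidate
            return True
        return False


class MobileNode:
    """Often disconnected; applies tentative transactions to a local copy."""

    def __init__(self, base):
        self.tentative = dict(base.master)
        self.log = []  # tentative transactions awaiting reconnection

    def run_tentative(self, txn, accept):
        txn(self.tentative)
        self.log.append((txn, accept))

    def reconnect(self, base):
        """Replay the log at the base node, then discard tentative versions."""
        results = [base.run(txn, accept) for txn, accept in self.log]
        self.log.clear()
        self.tentative = dict(base.master)
        return results


def withdraw(amount):
    def txn(db):
        db["balance"] -= amount
    return txn


def non_negative(db):
    return db["balance"] >= 0


base = BaseNode({"balance": 100})
mobile = MobileNode(base)

mobile.run_tentative(withdraw(30), non_negative)  # tentative balance: 70
mobile.run_tentative(withdraw(40), non_negative)  # tentative balance: 30
base.run(withdraw(50), non_negative)              # master changes while mobile is offline

results = mobile.reconnect(base)
```

The first tentative withdrawal still fits the master balance and commits; the second no longer does and is rejected, so the mobile node's tentative view (30) is replaced by the master version (20).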
Pictures to Entertain
[Figures not reproduced in this transcript.]
System Principles
• Hierarchies to reduce scale
– Nodes (master and mobile/disconnected)
– Transactions (tentative and eager/consistent)
• Techniques
– Convergence (Bayou-like eventual consistency)
– Idempotence: encode writes in non-conflicting ways
• Does it fix any of Bayou’s semantic problems?
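One way to read "encode writes in non-conflicting ways" is to ship deltas rather than absolute values, so that updates commute and replicas converge regardless of arrival order. This is an interpretation, not something the slide spells out, and the update encoding below (`"set"` versus `"add"` tuples) is hypothetical.

```python
from itertools import permutations


def apply_all(initial, updates):
    """Apply a sequence of updates to a single value and return the result."""
    value = initial
    for kind, arg in updates:
        if kind == "set":    # overwrite: the result depends on arrival order
            value = arg
        elif kind == "add":  # delta: commutes with other deltas
            value += arg
    return value


overwrites = [("set", 90), ("set", 70)]
deltas = [("add", -10), ("add", -30)]

# Final values reachable under every possible delivery order.
overwrite_results = {apply_all(100, order) for order in permutations(overwrites)}
delta_results = {apply_all(100, order) for order in permutations(deltas)}
```

The overwrite encoding yields two possible final states depending on order, which would require reconciliation; the delta encoding yields exactly one.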
Conclusions
• Eager: waits and deadlocks
• Lazy: converts waits and deadlocks into reconciliations
• Neither scales.
• Two-tier replication:
– Supports mobile nodes
– Combines eager master replication with local updates