consistency and rep contd
8/2/2019 Consistency and Rep Contd
http://slidepdf.com/reader/full/consistency-and-rep-contd 1/28
Distribution Protocols
• Replica Placement
• Update Propagation
• Epidemic Protocols
Replica Placement
• Replica server placement
– Web: geographically skewed request patterns
– Where to place a proxy?
• Permanent replicas
– E.g., mirroring: all replicas mirror the same content
• Temporary replicas
– Server-initiated: push caches
– Client-initiated
Replica Placement
The logical organization of different kinds of copies of a data
store into three concentric rings.
Update Propagation
• Propagate only a notification of an update
• Transfer data from one copy to another
• Propagate the update operation to other copies (active
replication)
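The three options differ in what a replica receives. A minimal sketch, assuming a hypothetical `Replica` class with integer-valued items (all names here are illustrative, not from the slides):

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica; `stale` holds keys whose copies were invalidated."""
    data: dict = field(default_factory=dict)
    stale: set = field(default_factory=set)

    def on_invalidate(self, key):
        # Option 1: only a notification arrives. The copy is marked stale
        # and has to be fetched again before it is read.
        self.stale.add(key)

    def on_data(self, key, value):
        # Option 2: the new value itself is transferred.
        self.data[key] = value
        self.stale.discard(key)

    def on_operation(self, key, op):
        # Option 3 (active replication): the operation is shipped and
        # re-executed on the local copy.
        self.data[key] = op(self.data.get(key, 0))
```

Notifications are cheap when updates are frequent and reads are rare; shipping operations saves bandwidth when the operation is small compared with the data it changes.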
Pull versus Push Protocols
A comparison between push-based and pull-based approaches to updates
Issue                   | Push-based                               | Pull-based
State of server         | List of client replicas and caches       | None
Messages sent           | Update (and possibly fetch update later) | Poll and update
Response time at client | Immediate (or fetch-update time)         | Fetch-update time
Hybrid approach: Leases
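A lease combines both columns: while a client holds an unexpired lease the server pushes updates to it, and after expiry the client has to pull (or renew). A minimal sketch under these assumptions; `LeaseServer`, `Client`, and the logical clock `now` are hypothetical names:

```python
class Client:
    def __init__(self):
        self.cache = None


class LeaseServer:
    """Hypothetical server that pushes updates only to clients with a valid lease."""

    def __init__(self, lease_length):
        self.lease_length = lease_length
        self.value = None
        self.leases = {}                 # client -> expiry time (logical clock)

    def grant(self, client, now):
        # Granting or renewing a lease acts as a pull: the client
        # receives the current value together with the lease.
        self.leases[client] = now + self.lease_length
        client.cache = self.value

    def write(self, value, now):
        self.value = value
        # Drop expired leases so the server keeps as little state as possible.
        self.leases = {c: t for c, t in self.leases.items() if t > now}
        for client in self.leases:
            client.cache = value         # push to current lease holders only
```

The lease length is the tuning knob: a long lease behaves like push (more server state), a zero-length lease degenerates to pure pull.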
Epidemic Protocols
• A class of algorithms that provide update propagation in eventually
consistent data stores
• Do not solve any update conflicts; only propagate updates to all
replicas in as few messages as possible
• Based on theory of epidemics (spreading infectious diseases)
– Upon an update, try to “infect” other replicas as quickly as
possible
– Pair-wise exchange of updates (like pair-wise spreading of a
disease)
– Terminology:
• Infective store: store with an update it is willing to spread
• Susceptible store: store that is not yet updated
Spreading an Epidemic
• Anti-entropy
– Server P picks a server Q at random and exchanges updates
– Three possibilities: only push, only pull, both push and pull
– Claim: A pure push-based approach does not help spread
updates quickly (Why?)
• Pull, or an initial push followed by pull, works better
• Rumor mongering (aka gossiping)
– Upon receiving an update, P tries to push to Q
– If Q already received the update, P stops spreading with probability 1/k
– Analogous to “hot” gossip items => stop spreading if “cold”
– Does not guarantee that all replicas receive updates
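Both schemes are easy to simulate. The sketch below is a simplified model, not a protocol implementation: anti-entropy runs in synchronous rounds, rumor mongering handles one spreading replica at a time, and the function names and parameters are assumptions made for illustration.

```python
import random


def anti_entropy(n, mode, rng, max_rounds=10_000):
    """Rounds until all n replicas hold the update, starting from one infective replica.

    mode is "push", "pull", or "push-pull". In every round each replica P
    picks one other replica Q uniformly at random.
    """
    infected = [False] * n
    infected[0] = True
    for round_no in range(1, max_rounds + 1):
        snapshot = infected[:]                  # state at the start of the round
        for p in range(n):
            q = rng.choice([i for i in range(n) if i != p])
            if mode in ("push", "push-pull") and snapshot[p]:
                infected[q] = True              # P sends its update to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                infected[p] = True              # P fetches the update from Q
        if all(infected):
            return round_no
    return max_rounds


def rumor_mongering(n, k, rng):
    """Fraction of replicas reached when spreading stops with probability 1/k."""
    updated = {0}
    active = [0]                                # replicas still spreading the rumor
    while active:
        p = active.pop(rng.randrange(len(active)))
        while True:
            q = rng.choice([i for i in range(n) if i != p])
            if q in updated:
                if rng.random() < 1 / k:        # rumor is "cold": P loses interest
                    break
            else:
                updated.add(q)
                active.append(q)
    return len(updated) / n
```

Comparing `anti_entropy(n, "push", rng)` with `anti_entropy(n, "pull", rng)` over several seeds hints at the answer to "Why?": once most replicas are infective, a push rarely hits one of the few remaining susceptible replicas, whereas a susceptible replica that pulls almost always finds the update. `rumor_mongering` can return a fraction below 1, matching the last bullet.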
Removing Data
• Deletion of data items is hard in epidemic protocols
• Example: server deletes data item x
– No state information is preserved
• Can’t distinguish between a deleted copy and no copy!
• Solution: death certificates
– Treat deletes as updates and spread a death certificate
• Mark copy as deleted but don’t delete
• Need an eventual clean up
– Clean up dormant death certificates
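A death certificate is just an entry that outlives the value it replaces. A minimal sketch, assuming timestamped entries and a newest-entry-wins exchange; the `Replica` class, its methods, and the `max_age` clean-up rule are hypothetical:

```python
class Replica:
    """Hypothetical replica: key -> (timestamp, value, deleted_flag)."""

    def __init__(self):
        self.items = {}

    def put(self, key, value, ts):
        self.items[key] = (ts, value, False)

    def delete(self, key, ts):
        # Keep a death certificate instead of dropping the entry.
        self.items[key] = (ts, None, True)

    def get(self, key):
        entry = self.items.get(key)
        return None if entry is None or entry[2] else entry[1]

    def exchange(self, other):
        # Pair-wise exchange: for every key the newer entry wins on both sides,
        # so a certificate overwrites older copies like any other update.
        for key in set(self.items) | set(other.items):
            mine, theirs = self.items.get(key), other.items.get(key)
            newest = max((e for e in (mine, theirs) if e is not None),
                         key=lambda e: e[0])
            self.items[key] = other.items[key] = newest

    def clean_up(self, now, max_age):
        # Eventually discard certificates older than max_age.
        self.items = {k: e for k, e in self.items.items()
                      if not (e[2] and now - e[0] > max_age)}
```

Without the certificate, an exchange with a replica that still holds the old copy would silently bring the deleted item back. The same can happen if `clean_up` runs before every replica has seen the certificate.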
• Many algorithms possible to spread updates
• Used in Bayou system from Xerox PARC
• Bayou: weakly connected replicas
– Useful in mobile computing (mobile laptops)
– Useful in wide area distributed databases (weak
connectivity)
Implementation Issues: Consistency Protocols
Two techniques to implement consistency models
– Primary-based protocols
• Assume a primary replica for each data item
• Primary responsible for coordinating all writes
– Remote write
– Local writes
– Replicated write protocols
• No primary is assumed for a data item
• Writes can take place at any replica
– Active replication
– Quorum-based
– Cache coherence
Remote-Write Protocols (1)
Primary-based remote-write protocol with a fixed server to
which all read and write operations are forwarded.
Remote-Write Protocols (2)
The principle of a primary-backup protocol.
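The figure itself is not part of the transcript. As a rough sketch of the blocking variant, assuming a single primary for all items and backups that always acknowledge (the class names are hypothetical): the primary applies the write, forwards it to every backup, and answers the client only after all acknowledgements have arrived.

```python
class Backup:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value
        return True                       # acknowledgement to the primary


class Primary:
    """Hypothetical blocking primary responsible for every data item."""

    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        acks = [b.apply(key, value) for b in self.backups]   # forward the update
        if not all(acks):
            raise RuntimeError("a backup did not acknowledge the update")
        return "ok"                       # the client is answered only after all acks
```

Reads can then be served from any local copy, at the cost of a write latency that includes the slowest backup.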
• Can primary backup protocols provide an
implementation of sequential consistency?
• If nonblocking protocols are used, is providing
sequential consistency easy?
Local-Write Protocols (1)
Primary-based local-write protocol in which a single copy is migrated
between processes.
Local-Write Protocols (1)
• Is consistency guaranteed?
• Problems with the approach?
Local-Write Protocols (2)
Primary-backup protocol in which the primary migrates to the
process wanting to perform an update.
• How can the approach be applied to mobile computers
that are able to operate in a disconnected mode?
Active Replication
• Each replica has an associated process that carries out
update operations
• Operations need to be carried out in the same order
everywhere
• How?
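One possible answer, sketched here under assumptions, is a central sequencer that stamps every operation with a unique sequence number (totally ordered multicast with Lamport timestamps is another). `Sequencer` and `ReplicaProcess` are hypothetical names:

```python
import itertools


class Sequencer:
    """Hypothetical central coordinator that numbers operations."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, op):
        return next(self._counter), op


class ReplicaProcess:
    def __init__(self):
        self.value = 0
        self.next_seq = 0
        self.pending = {}                  # seq -> op, held back until its turn

    def deliver(self, seq, op):
        self.pending[seq] = op
        # Apply strictly in sequence order, whatever the arrival order was.
        while self.next_seq in self.pending:
            self.value = self.pending.pop(self.next_seq)(self.value)
            self.next_seq += 1
```

Two replicas that receive the same stamped operations in different orders still end in the same state, because each holds back an operation until all lower-numbered ones have been applied. The sequencer is a single point of failure and a potential bottleneck.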
Active Replication (1)
The problem of replicated invocations.
Active Replication (2)
a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.
Quorum-Based Protocols
• Requires clients to request and acquire the permission of
multiple servers before reading or writing a replicated
file.
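• Gifford’s scheme: with N replicas, a read quorum of NR servers and a
write quorum of NW servers must satisfy
1. NR + NW > N (prevents read-write conflicts)
2. NW > N/2 (prevents write-write conflicts)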
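The two constraints are simple to check mechanically. A small sketch; the function name and the returned keys are hypothetical:

```python
def check_quorum(n, n_r, n_w):
    """Report which of Gifford's two constraints hold for n replicas."""
    return {
        "no_read_write_conflict": n_r + n_w > n,    # constraint 1
        "no_write_write_conflict": 2 * n_w > n,     # constraint 2: NW > N/2
    }
```

For example, with N = 12 (an assumed value), NR = 3 and NW = 10 satisfy both constraints, NR = 7 and NW = 6 violate the second, and NR = 1 with NW = 12 is the read-one, write-all extreme.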
Quorum-Based Protocols
Three examples of the voting algorithm:
a) A correct choice of read and write set
b) A choice that may lead to write-write conflicts
c) A correct choice, known as ROWA (read one, write all)
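The figure with the three read and write sets is not in the transcript. The sketch below shows why a correct choice works, assuming each replica stores a version number next to the value; `QuorumFile` and the quorum sizes used in the usage note are illustrative assumptions.

```python
from itertools import combinations


class QuorumFile:
    """Hypothetical replicated file: each replica stores (version, value)."""

    def __init__(self, n, n_r, n_w):
        assert n_r + n_w > n and 2 * n_w > n, "Gifford's constraints violated"
        self.n_r, self.n_w = n_r, n_w
        self.replicas = [(0, None)] * n

    def write(self, value, quorum):
        assert len(quorum) >= self.n_w
        # Any two write quorums overlap, so the highest version is visible here.
        version = max(self.replicas[i][0] for i in quorum) + 1
        for i in quorum:
            self.replicas[i] = (version, value)

    def read(self, quorum):
        assert len(quorum) >= self.n_r
        # Every read quorum overlaps the last write quorum, so the entry
        # with the highest version is the most recent write.
        return max((self.replicas[i] for i in quorum), key=lambda r: r[0])[1]
```

With `QuorumFile(12, 3, 10)`, two writes to different sets of 10 replicas leave two replicas stale, yet every one of the `combinations(range(12), 3)` read quorums still returns the latest value. `QuorumFile(12, 1, 12)` is ROWA: reads touch a single replica, writes touch all of them.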
• Does a 2-resilient algorithm for 6 processes exist? Write
it down or sketch a proof for its non-existence.
This is a proof by contradiction. It is known that consensus for 3
processes, one of them Byzantine, cannot be solved. Assume now that
consensus for 6 processes, 2 of them Byzantine, could be solved by an
algorithm a. We create a new algorithm b that solves consensus for 3
processes using a. The algorithm b works as follows: each of the 3
processes (hosts) simulates 2 processes (guests). The input of the
guests is equal to the input of their hosts. Correct hosts simulate
correct guests; the Byzantine host simulates Byzantine guests, so at
most 2 of the 6 guests are Byzantine. The 6 guests solve consensus
using a. The hosts copy the decision of their guests (all correct guests
make the same decision). This is a contradiction: since b works if a
works, a cannot exist (as it is known that b cannot exist). Hence there
exists no 2-resilient algorithm a solving consensus for 6 processes.
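The counting behind the argument can be checked with a one-line sketch. It assumes the classical bound for Byzantine agreement with unsigned messages, n > 3f, of which the case n = 3, f = 1 is the impossibility used above; the function name is hypothetical.

```python
def byzantine_consensus_possible(n, f):
    """Classical bound (unsigned messages): n processes tolerate f Byzantine faults iff n > 3f."""
    return n > 3 * f
```

Both the 3-process case with one fault and the 6-process case with two faults sit exactly on the boundary n = 3f, which is why the simulation of 2 guests per host carries the impossibility over. A seventh process would be enough to tolerate two faults.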
• Consider a business database/intelligence system for a web
retailer such as Amazon, where the data updates are sales orders
generated by web servers, which include the contents of the order and
the customer ID of the person placing the order. The resulting
database is used both for:
– Tracking inventory and order status by web applications
– Mining suggestions to customers about what to purchase
Describe the demands of and tradeoffs between consistency and
performance in this system
We use the notation W(x):v to denote a write operation to the
variable x with the value v and R(x):v to denote a read operation
to the variable x that returns the value v.
Initially all variables are set to zero. Is the memory underlying the
following two processes sequentially consistent?
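The two processes from the slide are not part of the transcript. For small histories the question can be answered by brute force: search for one interleaving that respects each process's program order and in which every read returns the most recent write. A sketch with a hypothetical encoding of operations as `("W", var, value)` and `("R", var, value)` tuples:

```python
def sequentially_consistent(processes):
    """Brute-force search for a legal interleaving that respects program order.

    processes: list of per-process operation lists; each operation is
    ("W", var, value) or ("R", var, value). All variables start at 0.
    """
    def search(positions, memory):
        if all(pos == len(ops) for pos, ops in zip(positions, processes)):
            return True                            # every operation was placed
        for i, ops in enumerate(processes):
            if positions[i] == len(ops):
                continue
            kind, var, value = ops[positions[i]]
            if kind == "R" and memory.get(var, 0) != value:
                continue                           # this read cannot come next
            new_memory = dict(memory)
            if kind == "W":
                new_memory[var] = value
            new_positions = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
            if search(new_positions, new_memory):
                return True
        return False

    return search((0,) * len(processes), {})
```

As a made-up example, P1 = W(x):1, R(y):0 and P2 = W(y):1, R(x):0 is not sequentially consistent: R(y):0 must precede W(y):1 and R(x):0 must precede W(x):1, which together with program order forms a cycle. Changing the last read to R(x):1 makes the history legal. The search is exponential in the number of operations, so it only suits exercise-sized histories.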
Conclusion
• Replication and caching improve performance in
distributed systems
• Consistency of replicated data is crucial
• Many consistency semantics (models) possible
– Need to pick appropriate model depending on the application
– Example: web caching: weak consistency is OK since humans
are tolerant to stale information (can reload browser)
– Implementation overheads and complexity grow if stronger
guarantees are desired
Example
• In a web indexing system used by companies such as
Google, the data updates are web pages crawled by
concurrent web spiders. The backend then processes the
resulting archived graph of web pages to generate
indexes that are used to satisfy search requests.
• Analyze and describe the demands of and tradeoffs
between consistency and performance in this system.
• Which consistency model should be applied in this
application?