consistency and rep contd

28
Distribution Protocols Replica Placement Update Propagation Epidemic Protocols

Upload: madhvi-sharma

Post on 05-Apr-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 1/28

Distribution Protocols

• Replica Placement

• Update Propagation

• Epidemic Protocols

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 2/28

Replica Placement

• Replica server placement

 –  Web: geophically skewed request patterns

 –  Where to place a proxy?

• Permanent replicas –  Eg Mirroring: all replicas mirror the same content

• Temporary replicas

 –  Server-initiated- push caches

 –  Client-initiated

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 3/28

Replica Placement

The logical organization of different kinds of copies of a data

store into three concentric rings.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 4/28

Update Propagation

• Propagate only a notification of an update

• Transfer data from one copy to another

• Propagate the update operation to other copies (active

replication)

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 5/28

 

Pull versus Push Protocols

A comparison between push-based and pull-based approaches to

updates

Hybrid approach: Leases

Issue Push-based Pull-based

State of server List of client replicas and caches None

Messages sentUpdate (and possibly fetch updatelater)

Poll and update

Response timeat client

Immediate (or fetch-update time)Fetch-updatetime

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 6/28

Epidemic Protocols

• A class of algorithms to provide update propagation in eventual-

consistent data store

• Do not solve any update conflicts; only propagate updates to all

replicas in as few messages as possible

• Based on theory of epidemics (spreading infectious diseases)

 –  Upon an update, try to “infect” other replicas as quickly as

possible

 –  Pair-wise exchange of updates (like pair-wise spreading of a

disease) 

 –  Terminology:

• Infective store: store with an update it is willing to spread

• Susceptible store: store that is not yet updated

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 7/28

Spreading an Epidemic

• Anti-entropy –  Server P picks a server Q at random and exchanges updates

 –  Three possibilities: only push, only pull, both push and pull

 –  Claim: A pure push-based approach does not help spread

updates quickly (Why?)• Pull or initial push with pull work better

• Rumor mongering (aka gossiping)

 –  Upon receiving an update, P tries to push to Q

 –  If Q already received the update, stop spreading with prob 1/k  

 –  Analogous to “hot” gossip items => stop spreading if “cold” 

 –  Does not guarantee that all replicas receive updates

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 8/28

Removing Data

• Deletion of data items is hard in epidemic protocols

• Example: server deletes data item x 

 –  No state information is preserved

• Can’t distinguish between a deleted copy and no copy! 

• Solution: death certificates

 –  Treat deletes as updates and spread a death certificate

• Mark copy as deleted but don’t delete • Need an eventual clean up

 –  Clean up dormant death certificates

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 9/28

• Many algorithms possible to spread updates

• Used in Bayou system from Xerox PARC

• Bayou: weakly connected replicas – Useful in mobile computing (mobile laptops)

 – Useful in wide area distributed databases (weak 

connectivity)

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 10/28

Implementation Issues: Consistency Protocols

Two techniques to implement consistency models

 – Primary-based protocols• Assume a primary replica for each data item

• Primary responsible for coordinating all writes

 – Remote write

 – Local writes

 – Replicated write protocols

• No primary is assumed for a data item

• Writes can take place at any replica – Active replication

 – Quorum Based

 – Cache coherence

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 11/28

Remote-Write Protocols (1)Primary-based remote-write protocol with a fixed server to

which all read and write operations are forwarded.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 12/28

Remote-Write Protocols (2)

The principle of primary-backup

protocol.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 13/28

• Can primary backup protocols provide an

implementation of sequential consistency?

• If non blocking protocols are used, is providing

sequential consistency easy?

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 14/28

Local-Write Protocols (1)Primary-based local-write protocol in which a single copy is migratedbetween processes.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 15/28

Local-Write Protocols (1)

• Is consistency guaranteed?

• Problems with the approach?

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 16/28

Local-Write Protocols (2)

Primary-backup protocol in which the primary migrates to the

process wanting to perform an update.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 17/28

• How can the approach be applied to mobile computers

that are able to operate in a disconnected mode?

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 18/28

Active Replication

• Each replica has an associated process that carries out

update operations

• Operations need to be carried out in the same order

everywhere• How?

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 19/28

Active Replication (1)

The problem of replicated invocations.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 20/28

Active Replication (2)

a) Forwarding an invocation request from a replicated object.

b) Returning a reply to a replicated object.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 21/28

Quorum-Based Protocols

• Requires clients to request and acquire the permission of 

multiple servers before reading or writing a replicated

file.

• Gifford’s scheme 1. NR + NW > N

2. NW > N/2

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 22/28

Quorum-Based Protocols

Three examples of the voting algorithm:

a) A correct choice of read and write set

b) A choice that may lead to write-write conflicts

c) A correct choice, known as ROWA (read one, write all)

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 23/28

• Does a 2-resilient algorithm for 6 processes exist? Write

it down or sketch a proof for its non-existence.

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 24/28

This is a proof by contradiction: It is known that consensus for 3processes, one of them Byzantine, cannot be solved.. Assume nowthat consensus for 6 processes could be solved by an algorithm a.We create a new algorithm b which solves consensus for 3processes using a. The algorithm b works as follows: each of the 3

processes (hosts) simulates 2 processes (guests). The input of theguests is equal to the input of their hosts. Correct hosts simulatecorrect guests, the Byzantine host simulates Byzantine guests. The6 guests solve consensus using a. The hosts copy the decision of their guests (all guests make the same decision). This is a

contradiction! Since b works if a works, therefore a must bewrong (as it is known, that b cannot exist.) Hence there exists noalgorithm a solving consensus for 6 processes

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 25/28

• Consider a business database/intelligence system for a webretailer such as amazon, where the data updates are sales ordersgenerated by web servers which include the contents of the order,the customer ID of the person placing the order. The resultingdatabase is used both for:

 –  Tracking inventory and order status by web applications

 –  Mining suggestions to customers about what to purchase

Describe the demands of and tradeoffs between consistency and

performance in this system 

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 26/28

We use the notation W(x):v to denote a write operation to the

variable x with the value v and R(x):v to denote a read operation

to the variable x that returns the value v

Initially all variables are set to zero Is the memory underlying the

following two processes sequentially consistent?

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 27/28

Conclusion

• Replication and caching improve performance in

distributed systems

• Consistency of replicated data is crucial

• Many consistency semantics (models) possible –  Need to pick appropriate model depending on the application

 –  Example: web caching: weak consistency is OK since humans

are tolerant to stale information (can reload browser)

 –  Implementation overheads and complexity grows if stronger

guarantees are desired

8/2/2019 Consistency and Rep Contd

http://slidepdf.com/reader/full/consistency-and-rep-contd 28/28

Example

• In a web indexing system used by companies such as

Google, the data updates are web pages crawled by

concurrent web spiders. The backend then processes the

resulting archived graph of web pages to generateindexes that are used to satisfy search requests.

• Analyze and describe the demands of and tradeoffs

between consistency and performance in this system.

• Which consistency model should be applied in this

application?