ubi529 distributed algorithms

36
UBI529 Distributed Algorithms 2. Time Synchronization in Distributed Systems

Upload: aizza

Post on 13-Jan-2016

31 views

Category:

Documents


2 download

DESCRIPTION

UBI529 Distributed Algorithms. 2. Time Synchronization in Distributed Systems. Assumptions. Process can perform three kinds of atomic actions/events Local computation Send a single message to a single process Receive a single message from a single process Communication model - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: UBI529  Distributed Algorithms

UBI529 Distributed Algorithms

2. Time Synchronization in Distributed Systems

Page 2: UBI529  Distributed Algorithms

2

Assumptions

Process can perform three kinds of atomic actions/events Local computation Send a single message to a single process Receive a single message from a single process

Communication model Point-to-point Error-free, infinite buffer Potentially out of order

Page 3: UBI529  Distributed Algorithms

3

Motivation

Many protocols need to impose an ordering among events If two players open the same treasure chest “almost” at the

same time, who should get the treasure?

Physical clocks: Seems to completely solve the problem But what about theory of relativity? Even without theory of relativity – efficiency problems

How accurate is sufficient? Without out-of-band communication: Minimum message

propagation delay With out-of-band communication: distance/speed of light In other words, some time it has to be “quite” accurate

Page 4: UBI529  Distributed Algorithms

4

Physical Clock Synchronization

The relation between clock time and UTC when clocks tick at different rates.

Page 5: UBI529  Distributed Algorithms

5

Classification

Types of Synchronization

External Synchronization : Keep as close

to UTC as possible

Internal Synchronization : Mutual

synchronization is important

Phase Synchronization

Types of clocks

Unbounded 0, 1, 2, 3, . . .

Bounded 0,1, 2, . . . M-1, 0, 1, .

. .

Unbounded clocks are not realistic, but are easier to deal with in algorithms. Real clocks are always bounded.

Page 6: UBI529  Distributed Algorithms

6

Terminology

R RNewtonian time

c l o c k t i m e

clock 1

clock 2

Š

drift rate=

Drift rate : Max rate two clocks can differ, < 10^-6 for Xtal clocks

Clock skew : Max. Allowed drift

Resynchronization interval R : Depends on clock skew

Challenges

(Drift is unavoidable)Accounting for propagation delayAccounting for processing delayFaulty clocks

Page 7: UBI529  Distributed Algorithms

7

Internal synchronization

A simple averaging algorithm Assume n clocks, at most t are

faulty

Step 1. Read every clock in the system.

Step 2. Discard outliers and substitute them by

the value of the local clock.

Step 3. Update the clock using the average of

these values.

Synchronization is maintained if n > 3tWhy?

i j

k

c

c+

c-

c-2

Some clocks may be faulty. Exhibits 2-faced behavior

Page 8: UBI529  Distributed Algorithms

8

Internal synchronization

A simple averaging algorithmThe maximum difference between

the averages computed by two

non-faulty nodes is (3td / n)

To keep the clocks synchronized,

3td /n < d

So, 3t < n

i j

k

c

c+

c-

c-2

Page 9: UBI529  Distributed Algorithms

9

Network Time Protocol (NTP)

Tiered architecture Broadcast mode

- least accurate

Procedure call

- medium accuracy

Peer-to-peer mode

- upper level servers use this for max

accuracy

Timeserver

The tree can reconfigure itself if some node fails.

Level 1Level 1

Level 1Level 0

Level 2Level 2

Level 2

Page 10: UBI529  Distributed Algorithms

10

P2P mode of NTP

Let Q’s time be ahead of P’s time by . Then

T2 = T1 + TPQ + T4 = T3 + TQP -

y = TPQ + TQP = T2 +T4 -T1 -T3 (RTT)

= (T2 -T4 -T1 +T3) / 2 - (TPQ - TQP) / 2

So, x- y/2 ≤ ≤ x+ y/2

T2

T1 T4

T3Q

P

Ping several times, and obtain the smallest value of y. Use it to calculate

x Between y/2 and -y/2

Page 11: UBI529  Distributed Algorithms

11

Cristian’s Algorithm

1. Client gets data from a time server.2. It computes the round trip time (RTT), and compensates for this delay while adjusting its own clock

Page 12: UBI529  Distributed Algorithms

12

Berkeley Time Protocol (FromTanenbaum 2007)

a) The time daemon asks all the other machines for their clockvalues.

b) The machines answer

The time daemon tells everyone how to adjust their clock.

Page 13: UBI529  Distributed Algorithms

13

Software “Clocks”

Software “clocks” can incur much lower overhead than maintaining (sufficiently accurate) physical clocks

Allows a protocol to infer ordering among events

Goal of software “clocks”: Capture event ordering that are visible to users

But what orderings are visible to users without physical clocks?

Page 14: UBI529  Distributed Algorithms

14

Visible Ordering to Users

A B (process order)

B C (send-receive order)

A C (transitivity)

user1 (process1)

user2 (process2)

user3 (process3)

A B

C

D

A ? D

B ? D

C ? D

Page 15: UBI529  Distributed Algorithms

15

“Happened-Before” Relation

“Happened-before” relation captures the ordering that is visible to users when there is no physical clock

A partial order among events Process-order, send-receive order, transitivity

First introduced by Lamport – Considered to be the first fundamental result in distributed computing

Goal of software “clock” is to capture the above “happened-before” relation. Formally, a logical clock C is a map from the set of events E to N (the set of natural numbers) with the following constraint:

Page 16: UBI529  Distributed Algorithms

16

1: Logical Clocks

Each event has a single integer as its logical clock value Each process has a local counter C Increment C at each “local computation” and “send” event When sending a message, logical clock value V is attached to

the message. At each “receive” event, C = max(C, V) + 1

user1 (process1)

user2 (process2)

user3 (process3)

1 2 3 4

1

3 = max(1,2)+1

4 5

1 2 5

3

4 = max(3,3)+1

Page 17: UBI529  Distributed Algorithms

17

Logical Clock Properties

Theorem:

Event s happens before t the logical clock value of s is smaller than the logical clock value of t.

The reverse may not be true

user1 (process1)

user2 (process2)

user3 (process3)

1 2 3 4

1 4 5

1 2 5

3

Page 18: UBI529  Distributed Algorithms

18

Lamport’s Logical Clock Algorithm

Page 19: UBI529  Distributed Algorithms

19

Lamport's logical clock algorithm in Java

public class LamportClock { int c; public LamportClock() { c = 1; } public int getValue() { return c; } public void tick() { // on internal actions c = c + 1; } public void sendAction() { // include c in message c = c + 1; } public void receiveAction(int src, int sentValue) { c = Util.max(c, sentValue) + 1; }}

Page 20: UBI529  Distributed Algorithms

20

2: Vector Clocks

Logical clock: Event s happens before event t the logical clock value of s is

smaller than the logical clock value of t.Vector clock:

Event s happens before event t the vector clock value of s is “smaller” than the vector clock value of t.

Each event has a vector of n integers as its vector clock value v1 = v2 if all n fields same v1 v2 if every field in v1 is less than or equal to the

corresponding field in v2 v1 < v2 if v1 v2 and v1 ≠ v2

Relation “<“ here is not a total order

Page 21: UBI529  Distributed Algorithms

21

Vector Clock Protocol

Each process i has a local vector C

Increment C[i] at each “local computation” and “send” event

When sending a message, vector clock value V is attached to the

message. At each “receive” event, C = pairwise-max(C, V); C[i]++;

user1 (process1)

user2 (process2)

user3 (process3)

(1,0,0) (2,0,0) (3,0,0)

(0,0,2)

(0,2,2)(0,1,0)

(0,0,1)

(2,3,2)

(2,3,3)

C = (0,1,0), V = (0,0,2)

pairwise-max(C, V) = (0,1,2)

Page 22: UBI529  Distributed Algorithms

22

Vector Clock Properties

Event s happens before t vector clock value of s < vector clock value of t: There

must be a chain from s to t

Event s happens before t vector clock value of s < vector clock value of t If s and t on same process, done If s is on p and t is on q, let VS be s’s vector clock and VT be t’s VS < VT VS[p] ≤ VT[p] Must be a sequence of message from p to q after s

and before t

q

s

t

p

Page 23: UBI529  Distributed Algorithms

23

Example Application of Vector Clock

Each user has an order placed by a customer Want all users to know all orders

Each order has a vector clock value

user1 (process1)

user2 (process2)

user3 (process3)

D1, (1,0,0) (2,0,0) (3,0,0)

(0,0,2)

(0,2,2)D2, (0,1,0)

D3, (0,0,1)

(2,3,2)

(2,3,3)

D1

D3 D1, D2, D3

I have seen all orders whose vector clock is smaller than (2,3,2):

I can avoid linear search for existence testing

Page 24: UBI529  Distributed Algorithms

24

Vector Clock Algorithm

Page 25: UBI529  Distributed Algorithms

25

A sample execution of the vector clock algorithm

Page 26: UBI529  Distributed Algorithms

26

Vector clock algorithm in Java

public class VectorClock { public int[] v; int myId; int N; public VectorClock(int numProc, int id) { myId = id; N = numProc; v = new int[numProc]; for (int i = 0; i < N; i++) v[i] = 0; v[myId] = 1; } public void tick() { v[myId]++; } public void sendAction() { //include the vector in the message v[myId]++; } public void receiveAction(int[] sentValue) { for (int i = 0; i < N; i++) v[i] = Util.max(v[i], sentValue[i]); v[myId]++;} }

Page 27: UBI529  Distributed Algorithms

27

3: Matrix Clocks

Motivation My vector clock describe what I “see” In some applications, I also want to know what other people see

Matrix clock: Each event has n vector clocks, one for each process The i th vector on process i is called process i ’s principle vector Principle vector is the same as vector clock before Non-principle vectors are just piggybacked on messages to update “knowledge”

Page 28: UBI529  Distributed Algorithms

28

Matrix Clock Protocol

For principle vector C on process i Increment C[i] at each “local computation” and “send” event When sending a message, all n vectors are attached to the message At each “receive” event, let V be the principle vector of the sender. C = pairwise-

max(C, V); C[i]++;

For non-principle vector C on process i, suppose it corresponds to process j At each “receive” event, let V be the vector corresponding to process j as in the

received message. C = pairwise-max(C, V)

Page 29: UBI529  Distributed Algorithms

29

Matrix Clock Example

user1 (process1)

user2 (process2)

user3 (process3)

(1,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,1,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,1)

(2,0,0)(0,0,0)(0,0,0)

(3,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,2)

(0,0,0)(0,2,2)(0,0,2)

(2,0,0)(2,3,2)(0,0,2)

(2,0,0)(2,3,2)(2,3,3)

Page 30: UBI529  Distributed Algorithms

30

Matrix Clock Example

(1,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,1,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,1)

(2,0,0)(0,0,0)(0,0,0)

(3,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,2)

(0,0,0)(0,2,2)(0,0,2)

(2,0,0)(2,3,2)(0,0,2)

(2,0,0)(2,3,2)(2,3,3)

Theorem:

On any given process, any non-principle vector is smaller than or equal to the principle vector.

Page 31: UBI529  Distributed Algorithms

31

Application of Matrix Clock

user1 (process1)

gossip D1 = (1,0,0)

user2 (process2)

gossip D2 = (0,1,0)

user3 (process3)

gossip D3 = (0,0,1)

(1,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,1,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,1)

(2,0,0)(0,0,0)(0,0,0)

(3,0,0)(0,0,0)(0,0,0)

(0,0,0)(0,0,0)(0,0,2)

(0,0,0)(0,2,2)(0,0,2)

(2,0,0)(2,3,2)(0,0,2)

(2,0,0)(2,3,2)(2,3,3)

D1

D3D1, D2, D3

user3 now knows that all 3 users have seen D1

Page 32: UBI529  Distributed Algorithms

32

Matrix Clock Algorithm

Page 33: UBI529  Distributed Algorithms

33

The matrix clock algorithm in Java

public class MatrixClock { int[][] M; int myId; int N; public MatrixClock(int numProc, int id) { myId = id; N = numProc; M = new int[N][N]; for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) M[i][j] = 0; M[myId][myId] = 1; } public void tick() { M[myId][myId]++; } public void sendAction() { //include the matrix in the message M[myId][myId]++; } public void receiveAction(int[][] W, int srcId) { // component-wise maximum of matrices for (int i = 0; i < N; i++) if (i != myId) { for (int j = 0; j < N; j++) M[i][j] = Util.max(M[i][j], W[i][j]); }} }

Page 34: UBI529  Distributed Algorithms

34

Discussions

Vector clock tells me what I know one-dimensional data structure

Matrix clock tells me what I know about what other people know Two-dimensional data structure

?? tells me what I know about what other people know about what other people know

??-dimensional data structure

Page 35: UBI529  Distributed Algorithms

35

Summary

Goal : Define “time” in a distributed system

Logical clock “happened before” smaller clock value

Vector clock “happened before” smaller clock value

Matrix clock Gives a process knowledge about what other processes know

Page 36: UBI529  Distributed Algorithms

36

Acknowledgements

This part is heavily dependent on the course : CS4231 Parallel and Distributed Algorithms, NUS by Dr. Haifeng Yu and Dr. Vijay Gargh Elements of Distributed Computing Book and also Dr. Sukumar Ghosh Iowa University Distributed Systems course 22C:166