UBI529 Distributed Algorithms

2. Time Synchronization in Distributed Systems


Assumptions

Each process can perform three kinds of atomic actions (events):
- Local computation
- Send a single message to a single process
- Receive a single message from a single process

Communication model:
- Point-to-point
- Error-free, with infinite buffers
- Messages potentially delivered out of order

Motivation

Many protocols need to impose an ordering among events. If two players open the same treasure chest “almost” at the same time, who should get the treasure?

Physical clocks seem to solve the problem completely.
- But what about the theory of relativity?
- Even without the theory of relativity, there are efficiency problems.

How accurate is sufficient?
- Without out-of-band communication: the minimum message propagation delay
- With out-of-band communication: distance / speed of light
- In other words, sometimes the clocks have to be “quite” accurate.

Physical Clock Synchronization

The relation between clock time and UTC when clocks tick at different rates.


Classification

Types of synchronization
- External synchronization: keep as close to UTC as possible
- Internal synchronization: mutual synchronization is important
- Phase synchronization

Types of clocks
- Unbounded: 0, 1, 2, 3, . . .
- Bounded: 0, 1, 2, . . ., M-1, 0, 1, . . .

Unbounded clocks are not realistic, but are easier to deal with in algorithms. Real clocks are always bounded.

Terminology

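[Figure: clock time plotted against Newtonian time for clock 1 and clock 2, with the resynchronization interval R and the drift rate marked.]

- Drift rate: maximum rate at which two clocks can drift apart; less than 10^-6 for crystal (Xtal) clocks
- Clock skew: maximum allowed drift between two clocks
- Resynchronization interval R: depends on the clock skew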
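The dependence of R on the clock skew can be made explicit with a short calculation. This is a sketch that is not taken from the slides: the symbols $\rho$ (drift rate) and $d$ (maximum allowed skew) are names assumed here. With the drift rate read as a bound on how fast two clocks move apart, two clocks that agree right after a resynchronization differ by at most $\rho R$ one interval later, so

```latex
\rho R \le d \quad\Longrightarrow\quad R \le \frac{d}{\rho}
```

For example, $\rho = 10^{-6}$ and $d = 1$ ms give $R \le 1000$ s. If $\rho$ instead bounds each clock's drift from real time, two clocks can diverge at up to $2\rho$ and the bound becomes $R \le d / (2\rho)$.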

Challenges

- Drift is unavoidable
- Accounting for propagation delay
- Accounting for processing delay
- Faulty clocks

Internal synchronization

A simple averaging algorithm

Assume n clocks, of which at most t are faulty.

Step 1. Read every clock in the system.
Step 2. Discard outliers and substitute them with the value of the local clock.
Step 3. Update the clock using the average of these values.

Synchronization is maintained if n > 3t. Why?

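[Figure: clocks i, j, and k, with readings c, c+d, c-d, and c-2d.]

Some clocks may be faulty and exhibit two-faced behavior, reporting different values to different readers.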

Internal synchronization

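A simple averaging algorithm (continued)

Let d be the maximum allowed skew between non-faulty clocks. A faulty clock can make the values used by two non-faulty nodes differ by as much as 3d, so with at most t faulty clocks the maximum difference between the averages computed by two non-faulty nodes is 3td / n.

To keep the clocks synchronized, 3td / n < d.

So, 3t < n.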
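The three steps of the averaging algorithm can be sketched as follows. This is a minimal illustration, not the slides' own code: the class and method names are hypothetical, and an "outlier" is assumed to be any reading that differs from the local clock by more than the allowed skew d.

```java
/** Sketch of one resynchronization round of the simple averaging algorithm. */
final class AveragingClockSync {

    /**
     * @param localValue current value of this node's own clock
     * @param readings   values read from all n clocks in the system (Step 1)
     * @param maxSkew    d; a reading farther than this from localValue counts as an outlier (assumption)
     * @return the new value for the local clock (Step 3)
     */
    static double resynchronize(double localValue, double[] readings, double maxSkew) {
        double sum = 0.0;
        for (double reading : readings) {
            // Step 2: replace outliers by the value of the local clock.
            boolean outlier = Math.abs(reading - localValue) > maxSkew;
            sum += outlier ? localValue : reading;
        }
        return sum / readings.length;
    }
}
```

With a local value of 100, readings 100, 101, 99, and 500, and d = 2, the reading 500 is replaced by 100 and the new clock value is 100.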


Network Time Protocol (NTP)

Tiered architecture of time servers, with three modes of operation:
- Broadcast mode: least accurate
- Procedure call: medium accuracy
- Peer-to-peer mode: upper-level servers use this for maximum accuracy

The tree can reconfigure itself if some node fails.

[Figure: tree of time servers, with Level 0 at the root, Level 1 servers below it, and Level 2 servers below those.]

P2P mode of NTP

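Let Q’s time be ahead of P’s time by $\delta$. P sends a message at $T_1$ that Q receives at $T_2$; Q replies at $T_3$ and P receives the reply at $T_4$. With $T_{PQ}$ and $T_{QP}$ the message propagation delays from P to Q and from Q to P:

$T_2 = T_1 + T_{PQ} + \delta$
$T_4 = T_3 + T_{QP} - \delta$

$y = T_{PQ} + T_{QP} = T_2 + T_4 - T_1 - T_3$ (the round-trip time, RTT)

$\delta = (T_2 - T_4 - T_1 + T_3)/2 - (T_{PQ} - T_{QP})/2$

Writing $x = (T_2 - T_4 - T_1 + T_3)/2$ and noting that $|T_{PQ} - T_{QP}| \le y$, we get $x - y/2 \le \delta \le x + y/2$.

[Figure: timelines of P and Q showing the request sent at T1 and received at T2, and the reply sent at T3 and received at T4.]

Ping several times and keep the exchange with the smallest value of y. Use its x as the estimate of $\delta$; the error of this estimate lies between $-y/2$ and $y/2$.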
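The offset calculation above translates directly into code. The following is a sketch under stated assumptions: the class and method names are hypothetical, and each sample is simply the four timestamps T1..T4 of one exchange, stored as doubles.

```java
/** Sketch: offset estimation from timestamp exchanges between P and Q. */
final class NtpOffsetEstimator {

    /** Round-trip delay y = (T2 - T1) + (T4 - T3). */
    static double roundTrip(double t1, double t2, double t3, double t4) {
        return (t2 - t1) + (t4 - t3);
    }

    /** Offset estimate x = ((T2 - T1) - (T4 - T3)) / 2. */
    static double offsetEstimate(double t1, double t2, double t3, double t4) {
        return ((t2 - t1) - (t4 - t3)) / 2.0;
    }

    /**
     * Each row of samples is {T1, T2, T3, T4}.
     * Returns {x, y} for the exchange with the smallest round-trip delay y.
     */
    static double[] bestEstimate(double[][] samples) {
        double bestX = 0.0;
        double bestY = Double.POSITIVE_INFINITY;
        for (double[] s : samples) {
            double y = roundTrip(s[0], s[1], s[2], s[3]);
            if (y < bestY) {
                bestY = y;
                bestX = offsetEstimate(s[0], s[1], s[2], s[3]);
            }
        }
        return new double[] { bestX, bestY };
    }
}
```

For instance, if Q is really 5 units ahead of P, the exchange {10, 17, 18, 17} (delays 2 and 4) gives x = 4 and y = 6, while the exchange {20, 26, 27, 23} (delays 1 and 1) gives x = 5 and y = 2, so the second one is kept.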


Cristian’s Algorithm

1. The client gets the time from a time server.
2. It computes the round-trip time (RTT) and compensates for this delay while adjusting its own clock.
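A minimal sketch of the client-side compensation follows. The names are hypothetical, and the sketch assumes that the request and the reply take about equally long and that the server's processing time is negligible, so the reply is taken to be RTT/2 old when it arrives.

```java
/** Sketch of the client side of Cristian's algorithm. */
final class CristianClient {

    /**
     * @param requestSent   local clock value when the request was sent
     * @param replyReceived local clock value when the reply arrived
     * @param serverTime    time reported by the server in the reply
     * @return estimate of the server's time at the moment the reply arrived
     */
    static long estimateServerNow(long requestSent, long replyReceived, long serverTime) {
        long rtt = replyReceived - requestSent;
        // Assumption: the reply spent about half of the round trip in transit.
        return serverTime + rtt / 2;
    }
}
```

A client that sends its request at local time 1000, receives the reply at 1020, and is told 5000 by the server would set its clock to 5010.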


Berkeley Time Protocol (from Tanenbaum 2007)

a) The time daemon asks all the other machines for their clock values.

b) The machines answer.

c) The time daemon tells everyone how to adjust their clock.
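Step c) can be sketched as below. The slides do not say how the adjustments are computed; this sketch assumes the daemon uses the plain average of all reported clock values (its own included), as in the usual textbook description. The class and method names are hypothetical.

```java
/** Sketch of the time daemon's computation in the Berkeley protocol. */
final class BerkeleyDaemon {

    /**
     * @param clocks clock values of all machines in the same unit, the daemon's own included
     * @return the adjustment each machine should apply (new value = old value + adjustment)
     */
    static long[] adjustments(long[] clocks) {
        long sum = 0;
        for (long c : clocks) {
            sum += c;
        }
        long average = sum / clocks.length; // integer division; assumes a non-empty array
        long[] result = new long[clocks.length];
        for (int i = 0; i < clocks.length; i++) {
            result[i] = average - clocks[i];
        }
        return result;
    }
}
```

For clock values 180, 205, and 170 (in minutes), the average is 185 and the adjustments are +5, -20, and +15.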


Software “Clocks”

Software “clocks” can incur much lower overhead than maintaining (sufficiently accurate) physical clocks

Allows a protocol to infer ordering among events

Goal of software “clocks”: capture the event orderings that are visible to users

But what orderings are visible to users without physical clocks?


Visible Ordering to Users

A → B (process order)
B → C (send-receive order)
A → C (transitivity)

[Figure: timelines for user1 (process1), user2 (process2), and user3 (process3) with events A, B, C, and D.]

A ? D
B ? D
C ? D

“Happened-Before” Relation

“Happened-before” relation captures the ordering that is visible to users when there is no physical clock

A partial order among events: process order, send-receive order, and transitivity

First introduced by Lamport; considered to be the first fundamental result in distributed computing

The goal of a software “clock” is to capture the above “happened-before” relation. Formally, a logical clock C is a map from the set of events E to N (the set of natural numbers) with the following constraint: for all events e and f in E, if e happened before f, then C(e) < C(f).

1: Logical Clocks

- Each event has a single integer as its logical clock value.
- Each process has a local counter C.
- Increment C at each “local computation” and “send” event.
- When sending a message, the logical clock value V is attached to the message.
- At each “receive” event, C = max(C, V) + 1.

[Figure: sample execution on user1 (process1), user2 (process2), and user3 (process3), with each event labeled by its logical clock value. Two of the receive events are annotated 3 = max(1,2)+1 and 4 = max(3,3)+1.]

Logical Clock Properties

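Theorem:

Event s happens before t ⇒ the logical clock value of s is smaller than the logical clock value of t.

The reverse may not be true: two events with no happened-before relation between them can still have different logical clock values.

[Figure: sample execution on user1 (process1), user2 (process2), and user3 (process3), with each event labeled by its logical clock value.]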
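That the reverse of the theorem fails can be seen in a tiny simulation. This is a sketch with hypothetical names; the counter starts at 0 here (an assumption) so that the first event of a process gets the value 1.

```java
/** Sketch: a smaller logical clock value does not imply "happened before". */
final class LogicalClockDemo {

    /** Minimal counter following the logical clock rules. */
    static final class Clock {
        int c = 0; // assumption: start at 0 so the first event has value 1

        int localEvent() { return ++c; }

        int send() { return ++c; } // the returned value travels with the message

        int receive(int v) {
            c = Math.max(c, v) + 1;
            return c;
        }
    }

    /** Returns {a, b}: clock values of two events on processes that never exchange messages. */
    static int[] concurrentEvents() {
        Clock p1 = new Clock();
        Clock p2 = new Clock();
        int a = p1.localEvent(); // first event on process 1
        p2.localEvent();         // first event on process 2
        int b = p2.localEvent(); // second event on process 2
        return new int[] { a, b };
    }
}
```

Here a = 1 and b = 2, so a has the smaller clock value, yet the two processes never communicated, so a did not happen before b.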


Lamport’s Logical Clock Algorithm


Lamport's logical clock algorithm in Java

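public class LamportClock {
    int c; // local counter

    public LamportClock() {
        c = 1;
    }

    public int getValue() {
        return c;
    }

    // Called on internal (local computation) events.
    public void tick() {
        c = c + 1;
    }

    // Called on send events; the caller includes c in the message.
    public void sendAction() {
        c = c + 1;
    }

    // Called on receive events; sentValue is the clock value carried by the message.
    public void receiveAction(int src, int sentValue) {
        c = Math.max(c, sentValue) + 1;
    }
}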

2: Vector Clocks

Logical clock:
Event s happens before event t ⇒ the logical clock value of s is smaller than the logical clock value of t.

Vector clock:
Event s happens before event t ⇔ the vector clock value of s is “smaller” than the vector clock value of t.

Each event has a vector of n integers as its vector clock value:
- v1 = v2 if all n fields are the same
- v1 ≤ v2 if every field in v1 is less than or equal to the corresponding field in v2
- v1 < v2 if v1 ≤ v2 and v1 ≠ v2

The relation “<” here is not a total order.
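The comparison rules above can be written as a few small predicates. This is a sketch with hypothetical names; it assumes both vectors have the same length n.

```java
/** Sketch of the vector clock ordering relations. */
final class VectorOrder {

    /** v1 <= v2: every field of v1 is <= the corresponding field of v2. */
    static boolean lessOrEqual(int[] v1, int[] v2) {
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] > v2[i]) {
                return false;
            }
        }
        return true;
    }

    /** v1 < v2: v1 <= v2 and v1 != v2 (v1 != v2 is checked as "not v2 <= v1"). */
    static boolean lessThan(int[] v1, int[] v2) {
        return lessOrEqual(v1, v2) && !lessOrEqual(v2, v1);
    }

    /** Neither vector is <= the other, so "<" does not order this pair. */
    static boolean concurrent(int[] v1, int[] v2) {
        return !lessOrEqual(v1, v2) && !lessOrEqual(v2, v1);
    }
}
```

For example, (2,0,0) < (2,3,2) holds, while (3,0,0) and (0,0,2) are not ordered in either direction, which is why “<” is only a partial order.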


Vector Clock Protocol

- Each process i has a local vector C.
- Increment C[i] at each “local computation” and “send” event.
- When sending a message, the vector clock value V is attached to the message.
- At each “receive” event, C = pairwise-max(C, V); C[i]++;

[Figure: sample execution on user1 (process1), user2 (process2), and user3 (process3), with each event labeled by its vector clock value: (1,0,0), (2,0,0), (3,0,0) on process1; (0,1,0), (0,2,2), (2,3,2) on process2; (0,0,1), (0,0,2), (2,3,3) on process3. For the receive event labeled (0,2,2): C = (0,1,0), V = (0,0,2), pairwise-max(C, V) = (0,1,2).]

Vector Clock Properties

Event s happens before t ⇒ vector clock value of s < vector clock value of t:
- There must be a chain from s to t.

Event s happens before t ⇐ vector clock value of s < vector clock value of t:
- If s and t are on the same process, done.
- If s is on p and t is on q, let VS be s’s vector clock and VT be t’s.
- VS < VT ⇒ VS[p] ≤ VT[p] ⇒ there must be a sequence of messages from p to q after s and before t.

[Figure: event s on process p and event t on process q.]

Example Application of Vector Clock

Each user has an order placed by a customer. We want all users to know all orders.

Each order has a vector clock value.

[Figure: the sample execution with vector clock values, where the first events on user1 (process1), user2 (process2), and user3 (process3) are the orders D1 = (1,0,0), D2 = (0,1,0), and D3 = (0,0,1). Messages are labeled with the orders they carry: D1; D3; and D1, D2, D3.]

“I have seen all orders whose vector clock is smaller than (2,3,2)”: I can avoid a linear search for existence testing.
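The existence test mentioned above amounts to one vector comparison instead of a search through the stored orders. A minimal sketch, with hypothetical names and assuming vectors of equal length:

```java
/** Sketch: testing whether an order is already known, using vector clocks only. */
final class OrderKnowledge {

    /**
     * @param myClock    current vector clock of this process
     * @param orderClock vector clock value attached to the order
     * @return true if the order's clock is component-wise <= myClock, i.e. the order has been seen
     */
    static boolean alreadySeen(int[] myClock, int[] orderClock) {
        for (int i = 0; i < orderClock.length; i++) {
            if (orderClock[i] > myClock[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the clock (2,3,2), the orders D1 = (1,0,0) and D3 = (0,0,1) are reported as seen, while an event stamped (3,0,0) is not.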


Vector Clock Algorithm


A sample execution of the vector clock algorithm


Vector clock algorithm in Java

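public class VectorClock {
    public int[] v; // v[i] = number of events on process i known to this process
    int myId;
    int N;

    public VectorClock(int numProc, int id) {
        myId = id;
        N = numProc;
        v = new int[numProc];
        for (int i = 0; i < N; i++)
            v[i] = 0;
        v[myId] = 1;
    }

    // Called on internal (local computation) events.
    public void tick() {
        v[myId]++;
    }

    // Called on send events; the caller includes the vector in the message.
    public void sendAction() {
        v[myId]++;
    }

    // Called on receive events: pairwise maximum, then count the receive itself.
    public void receiveAction(int[] sentValue) {
        for (int i = 0; i < N; i++)
            v[i] = Math.max(v[i], sentValue[i]);
        v[myId]++;
    }
}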

3: Matrix Clocks

Motivation
- My vector clock describes what I “see”.
- In some applications, I also want to know what other people see.

Matrix clock
- Each event has n vector clocks, one for each process.
- The i-th vector on process i is called process i’s principal vector.
- The principal vector is the same as the vector clock described before.
- Non-principal vectors are just piggybacked on messages to update “knowledge”.

Matrix Clock Protocol

For the principal vector C on process i:
- Increment C[i] at each “local computation” and “send” event.
- When sending a message, all n vectors are attached to the message.
- At each “receive” event, let V be the principal vector of the sender. C = pairwise-max(C, V); C[i]++;

For a non-principal vector C on process i, suppose it corresponds to process j:
- At each “receive” event, let V be the vector corresponding to process j in the received message. C = pairwise-max(C, V)

Matrix Clock Example

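[Figure: sample execution on user1 (process1), user2 (process2), and user3 (process3), with each event labeled by its matrix clock, written as three vectors:
process1: (1,0,0)(0,0,0)(0,0,0); (2,0,0)(0,0,0)(0,0,0); (3,0,0)(0,0,0)(0,0,0)
process2: (0,0,0)(0,1,0)(0,0,0); (0,0,0)(0,2,2)(0,0,2); (2,0,0)(2,3,2)(0,0,2)
process3: (0,0,0)(0,0,0)(0,0,1); (0,0,0)(0,0,0)(0,0,2); (2,0,0)(2,3,2)(2,3,3)]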

Matrix Clock Example


Theorem:

On any given process, any non-principal vector is smaller than or equal to the principal vector.

Application of Matrix Clock

[Figure: the sample execution with matrix clocks, where user1 (process1) gossips D1 = (1,0,0), user2 (process2) gossips D2 = (0,1,0), and user3 (process3) gossips D3 = (0,0,1). Messages are labeled with the gossip they carry: D1; D3; and D1, D2, D3. The last event on process3 has matrix clock (2,0,0)(2,3,2)(2,3,3).]

user3 now knows that all 3 users have seen D1: in its matrix clock, every one of the three vectors is at least (1,0,0).
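The check that user3 performs can be sketched as a scan over the rows of its matrix clock. The names below are hypothetical; the matrix is assumed to hold one vector per process, each of the same length as the event's vector clock.

```java
/** Sketch: deciding from a matrix clock whether every process has seen an event. */
final class MatrixKnowledge {

    /**
     * @param matrix     matrix clock of the local process; matrix[i] is what it knows of process i's vector
     * @param eventClock vector clock value of the event (for example, a gossip item)
     * @return true if every row of the matrix is component-wise >= eventClock
     */
    static boolean seenByAll(int[][] matrix, int[] eventClock) {
        for (int[] row : matrix) {
            for (int j = 0; j < eventClock.length; j++) {
                if (row[j] < eventClock[j]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With the matrix clock (2,0,0)(2,3,2)(2,3,3), the check succeeds for D1 = (1,0,0) but fails for D2 = (0,1,0) and D3 = (0,0,1), because user3 has no evidence yet that user1 has seen them.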


Matrix Clock Algorithm


The matrix clock algorithm in Java

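public class MatrixClock {
    int[][] M; // M[i] = what this process knows about process i's vector clock
    int myId;
    int N;

    public MatrixClock(int numProc, int id) {
        myId = id;
        N = numProc;
        M = new int[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                M[i][j] = 0;
        M[myId][myId] = 1;
    }

    // Called on internal (local computation) events.
    public void tick() {
        M[myId][myId]++;
    }

    // Called on send events; the caller includes the matrix in the message.
    public void sendAction() {
        M[myId][myId]++;
    }

    public void receiveAction(int[][] W, int srcId) {
        // Non-principal vectors: component-wise maximum with the received matrix.
        for (int i = 0; i < N; i++)
            if (i != myId) {
                for (int j = 0; j < N; j++)
                    M[i][j] = Math.max(M[i][j], W[i][j]);
            }
        // Principal vector: merge the sender's principal vector, then count the receive.
        for (int j = 0; j < N; j++)
            M[myId][j] = Math.max(M[myId][j], W[srcId][j]);
        M[myId][myId]++;
    }
}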

Discussions

- Vector clock: tells me what I know. One-dimensional data structure.
- Matrix clock: tells me what I know about what other people know. Two-dimensional data structure.
- ??: tells me what I know about what other people know about what other people know. ??-dimensional data structure.

Summary

Goal : Define “time” in a distributed system

Logical clock: “happened before” ⇒ smaller clock value

Vector clock: “happened before” ⇔ smaller clock value

Matrix clock: gives a process knowledge about what other processes know

Acknowledgements

This part is heavily dependent on the course CS4231 Parallel and Distributed Algorithms (NUS) by Dr. Haifeng Yu, on Dr. Vijay Garg's book Elements of Distributed Computing, and on Dr. Sukumar Ghosh's University of Iowa Distributed Systems course 22C:166.
