w ha t is a distributed sys tem?

25
CS 378 Intro to Distributed Computing Lorenzo Alvisi Harish Rajamani What is a distributed system? “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” Leslie Lamport

Upload: others

Post on 03-Feb-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: W ha t is a distributed sys tem?

CS 378Intro to

Distributed ComputingLorenzo Alvisi

Harish Rajamani

What is a distributed system?

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”

Leslie Lamport

Page 2: W ha t is a distributed sys tem?

Local OS Local OS Local OS

Network

Machine CMachine BMachine A

Middleware

What is a distributed system?

A distributed system is software through which a collection of independent computers appears to its users as a single, coherent system

Distributed Applications

Goals (and auto-goals) of a Distributed System

Connecting Resources and Users

Transparency

Openness

Scalability

Page 3: W ha t is a distributed sys tem?

Transparency

Access Hides differences in data representation and invocation mechanisms

Location Hides where an object resides

Migration Hides from an object that object’s location

Relocation Hides from a client the change of location of an object to which the client is bound

Replication Hides that an object may be replicated, with replicas at different locations

Concurrency Hides coordination of activities between objects

Failure Hides the failure and recovery of object

Persistence Hides whether a resource is in memory or on disk

Transparency Description

OpennessConform to well-defined interfaces

Easily interact with other open systems

Achieve independence in heterogeneity wrtHardwarePlatformLanguages

Support different app/user-specific policies

Page 4: W ha t is a distributed sys tem?

OpennessConform to well-defined interfaces

Easily interact with other open systems

Achieve independence in heterogeneity wrtHardwarePlatformLanguages

Support different app/user-specific policies

ideally, provide only mechanisms!

Scalability

Size scalability number of users and processes

Geographical scalabilitymaximum distance between nodes

Administrative scalabilitynumber of administrative domains

Page 5: W ha t is a distributed sys tem?

ScalingDistribute

partition data and computation across multiple machineJava applets, DNS, WWW

Replicatemake copies of data available at different machines

mirrored web sites, replicated fs, replicated db

Cacheallow client processes to access local copies

Web caches, file cachingConsi

stency

A first course in Distributed Computing...

Two basic approaches

cover many interesting systems, and distill from them fundamental principles

focus on a deep understanding of the fundamental principles, and see them instantiated in a few systems

Page 6: W ha t is a distributed sys tem?

A few intriguing questions

How do we talk about a distributed execution?

Can we draw global conclusions from local information?

Can we coordinate operations without relying on synchrony?

For the problems we know how to solve, how do we characterize the “goodness” of our solution?

Are there problems that simply cannot be solved?

What are useful notions of consistency, and how do we maintain them?

What if part of the system is down? Can we still do useful work? What if instead part of the system becomes “possessed” and starts behaving arbitrarily–all bets are off?

Page 7: W ha t is a distributed sys tem?

Global Predicate Detection

and Event Ordering

Our Problem

To compute predicatesover the state of

a distributed application

Page 8: W ha t is a distributed sys tem?

Model

Message passing

No failures

Two possible timing assumptions:1. Synchronous System2. Asynchronous System

No upper bound on message delivery timeNo bound on relative process speedsNo centralized clock

Clock Synchronization

External Clock Synchronization:keeps processor clock within some maximum

deviation from an external time source.

• can exchange of info about timing events of different systems

• can take actions at real-time deadlines

• synchronization within 0.1 ms

Internal Clock Synchronization:keeps processor clocks within some maximum deviation from each other.

• can measure duration of distributed activities that start on one process and terminate on another

• can totally order events that occur on a distributed system

Page 9: W ha t is a distributed sys tem?

Clock Synchronization:Take 1

Assume an upper bound and a lower bound on message delivery time

Guarantee that processes stay synchronized within .max−min

max

min

Clock Synchronization:Take 1

Assume an upper bound and a lower bound on message delivery time

Guarantee that processes stay synchronized within .

Time (ms) 93.174.484.22

% of messagesProblem:

5000 message run(IBM Almaden)

max−min

max

min

Page 10: W ha t is a distributed sys tem?

Clock Synchronization:Take 2

No upper bound on message delivery time...

...but lower bound on message delivery time

Use timeout to detect process failures

slaves send messages to master

Master averages slaves value; computes fault-tolerant average

Precision:

maxp

min

4 maxp − min

Probabilistic Clock Synchronization (Cristian)

Master-Slave architecture

Master is connected to external time source

Slaves read master’s clock and ! adjust their own

How accurately can a slave read the master’s clock?

Page 11: W ha t is a distributed sys tem?

The Idea

Clock accuracy depends on message roundtrip time

if roundtrip is small, master and slave cannot have drifted by much!

Since no upper bound on message delivery, no certainty of accurate enough reading...

… but very accurate reading can be achieved by repeated attempts

Asynchronous systems

Weakest possible assumptions

cfr. “finite progress axiom”

Weak assumptions less vulnerabilities

Asynchronous " slow

“Interesting” model w.r.t. failures (ah ah ah!)

Page 12: W ha t is a distributed sys tem?

Client-Server

Processes exchange messages using Remote Procedure Call (RPC)

A client requests a service by sending the server a message. The client blocks while waiting

for a response

sc

Client-Server

Processes exchange messages using Remote Procedure Call (RPC)

The server computes the response (possibly asking other servers) and returns it to the

client

A client requests a service by sending the server a message. The client blocks while waiting

for a response

s

#!?%!

c

Page 13: W ha t is a distributed sys tem?

Deadlock!

p2

p1

p3

Goal

Design a protocol by which a processor can determine whether a global predicate (say, deadlock)

holds

Page 14: W ha t is a distributed sys tem?

Draw arrow from to if has received a request but has not responded yet

Wait-For Graphs

pi pj pj

Draw arrow from to if has received a request but has not responded yet

Cycle in WFG deadlock

Deadlock cycle in WFG

Wait-For Graphs

⇒ ♦

⇒ ·

pi pj pj

Page 15: W ha t is a distributed sys tem?

The protocol

sends a message to

On receipt of ’s message, replies with its state and wait-for info

p1 . . . p3p0

p0 pi

An execution

p1p1

p2 p2p3 p3

Page 16: W ha t is a distributed sys tem?

An execution

p1p1

p2 p2p3 p3

An execution

Ghost Deadlock!

p2 p2

p1p1

p3 p3

Page 17: W ha t is a distributed sys tem?

Houston,we have a problem...

Asynchronous system

no centralized clock, etc. etc.

Synchrony useful to

coordinate actions

order events

Mmmmhhh...

Events and HistoriesProcesses execute sequences of events

Events can be of 3 types: local, send, and receive

is the -th event of process

The local history of process is the sequence of events executed by process

: prefix that contains first k events

: initial, empty sequence

The history H is the set

hp

hkp

h0

p

eip

hp0∪ hp1

∪ . . . hpn−1

NOTE: In H, local histories are interpreted as sets, rather than sequences, of events

p

p

p

i

Page 18: W ha t is a distributed sys tem?

Ordering events

Observation 1: Events in a local history are totally ordered

timepi

Ordering events

Observation 1: Events in a local history are totally ordered

Observation 2: For every message , precedes

timepi

timepi

time

m receive(m)send(m)

m

pj

Page 19: W ha t is a distributed sys tem?

Happened-before(Lamport[1978])

A binary relation defined over events

1. if and , then

2. if and , then

3. if and then

eki , el

i ∈ hi k < l eki → e

li

ei = send(m) ej = receive(m)ei → ej

e → e′

e′→ e

′′e → e

′′

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

Page 20: W ha t is a distributed sys tem?

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

Page 21: W ha t is a distributed sys tem?

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

H and impose a partial order→

Page 22: W ha t is a distributed sys tem?

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

H and impose a partial order→

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

H and impose a partial order→

Page 23: W ha t is a distributed sys tem?

Space-Time diagrams

A graphic representation of a distributed execution

time

p1

p2p3

p1

p2

p3

H and impose a partial order→

Runs andConsistent Runs

A run is a total ordering of the events in H that is consistent with the local histories of the processors

Ex: is a run

A run is consistent if the total order imposed in the run is an extension of the partial order induced by

A single distributed computation may correspond to several consistent runs!

h1, h2, . . . , hn

Page 24: W ha t is a distributed sys tem?

Cuts

A cut C is a subset of the global history of H

p1

p2

p3

C = hc1

1∪ h

c2

2∪ . . . h

cn

n

A cut C is a subset of the global history of H

The frontier of C is the set of events

Cuts

p1

p2

p3

C = hc1

1∪ h

c2

2∪ . . . h

cn

n

ec1

1, e

c2

2, . . . e

cn

n

Page 25: W ha t is a distributed sys tem?

Global states and cuts

The global state of a distributed computation is an -tuple of local states

To each cut corresponds a global state

Σ = (σ1, . . .σn)

(σc1

1, . . .σ

cn

n)

(c1 . . . cn)

n