tddd07 real-time systems lecture 5: distributed systems ...simna73/teaching/reap/ht19/... ·...

Undergraduate course on Real-time SystemsLinköping University

TDDD07 Real-time Systems

Lecture 5: Distributed Systems &

Real-time Communication

Simin Nadjm-Tehrani

Real-time Systems Laboratory

Department of Computer and Information ScienceLinköping University

50 pagesAutumn 2019


2 of 50 Autumn 2019

Overview: Next three lectures

From one CPU to networked CPUs:• First, from one CPU to multiple CPUs

– Allocating VMs on multiple CPUs: Cloud • Next, fully distributed systems

– fundamental issues with timing and order of events

• Next, hard real-time communication– Guaranteed message delivery within a

deadline, bandwidth as a resource• Finally: QoS guarantees instead of

timing guarantees, focus on soft RT


3 of 50 Autumn 2019

Questions

• Can we temporally order all events in a distributed system?–Only if we can timestamp them with a

value from a global (universal) clock

• Can we draw any conclusions if we do not have a global clock? –What about a set of local clocks?–What if no clocks at all?


4 of 50 Autumn 2019

Clock synchronisation

Two types of algorithms:• External synchronisation

–Tries to keep the values of a set of clocks agree with an accurate clock, within a skew of δ

• Internal synchronisation–Tries to keep a set of clock values

close to each other with a maximum skew of δ


5 of 50 Autumn 2019

Lamport/Melliar-Smith Algorithm

• Internal synchronisation of n clocks

• Each clock reads the value of all other clocks at regular intervals– If the value of some clock differs from value

of own clock by more than δ, that clock value is replaced by own clock value

– The average of all clocks is computed at each node

– Own clock value is updated to the average value


6 of 50 Autumn 2019

Does it work?

• After each synchronisation interval the clocks get closer to each other

• If the skews are within δ, and the clocks are initially synchronised, then they are kept within δ from each other

• But what if clocks are faulty? What is considered a fault?


7 of 50 Autumn 2019

Faulty clocks

• If a clock skew exceeds δ then its value is eliminated – does not “harm” other clocks

• What if the skew is exactly δ? – check it as an exercise!

• What is the worst case?


8 of 50 Autumn 2019

Will be considered as correct by i and j…

c

c-2

c-

c+

i j

k

A two-face faulty clock k


9 of 50 Autumn 2019

Bound on the faulty clocks

• To guarantee that the set will keep all non-faulty clocks within δ we need an assumption on the number of faulty clocks

• For f faulty clocks the algorithm works if the number of clocks n >3f


10 of 50 Autumn 2019

Synchronisation example

“I also included a temperature compensated real time clock on Saboten to maintain an accurate alarm for periodic wake from sleep.”http://hackaday.com/2015/10/05/sensor-net-makes-life-easier-for-rice-farmers/



• Physical time vs. Logical time

• Example clock synchronisation algorithm

• Logical clocks

• Vector clocks

Time in Distributed Systems

Next we look at events…



Event ordering

• In the absence of clock synchronisation we may use order that is intrinsic in an application

Client A

Client B

Server

ReqA RepA

ReqB



Logical time

• Based on event counts at each node• May reflect causality • Sending a message always precedes

receiving it • Messages sent in a sequence by one

node are (potentially) causally related to each other– I do not pay for an item if I do not first

check the item’s availability



Happened before~~~~

• Assume each process has a monotonically increasing local clock

• Rule 1: if the time for event x is before the time for event y then x y

• Rule 2: if x denotes sending a message and y denotes receiving the same message then x y

• Rule 3: is transitive

A partial order…



Lamport’s Logical clocks

Seminal paper from 1978…

• Logical clock: An event counter that respects the “happened before” ordering

• Partial order: Hence, any events that are not in the “happened before” relation are treated as concurrent



Example (1)

P

Q

R

a g

b c he

f d

What do we know here?



Implementing logical clocks

LC “time-stamps” each event

• Rule 1: Each time a local event takes place, increment LC by 1

• Rule 2: Each time a message m is sent the LC value at the sender is appended to the message (m_LC)

• Rule 3: Each time a message m is received set LC to max(LC, m_LC)+1



Exercise

• Calculate LC for all events in example (1)!



What does LC tell us?

• x y → LC(x) < LC(y)

• Note that:

LC(x) < LC(y) does not imply x y



Example (1)

P

Q

R

a g

b c he

f d

What did we capture by LC?



Is concurrency transitive?

• e is concurrent with g• g is concurrent with f

• but e is not concurrent with f!

• Comparing the LC values does not tell us if two events are concurrentin the sense of • Vector clocks do more...



Vector clocks (VC)

• Every node maintains a vector of counted events (one entry for each other node)

• VC for event e, VC(e) = [1,…,n], shows the perceived count of events at nodes 1,…,n

• VC(e)[k] denotes the entry for node k



Implementation of VC

• Rule 1: For each local event increment own entry

• Rule 2: When sending message m, append to m the VC(send(m)) as a timestamp T

• Rule 3: When event x is “receiving a message” at node i, – increment own entry: VC(x)[i]:= VC(x)[i]+1– For every entry j in the VC: Set the entry to

max (T[j], VC(x)[j])



Example (1) revisited

P

Q

R

VC(a) = [1, 0, 0] VC(g) = [2, 0, 0]

VC(b) = [1, 2, 0]VC(c) = [1, 3, 0]

VC(h) = [2, 4, 0]VC(e) = [0, 1, 0]

VC(f) = [0,1,1] VC(d) = [1, 3, 2]

With Vector clocks VC:



Precedence in VC

• Relation < on vector clocks defined by:VC(x) < VC(y) iff– For all i: VC(x)[i] ≤ VC(y)[i]– For some i: VC(x)[i] < VC(y)[i]

• It follows that event x event y if VC(x) < VC(y)



Concurrency and VC

Hence:• VC(x) < VC(y) iff x y

• If neither VC(x) < VC(y) nor VC(y) < VC(x) then x and y are concurrent



Exercise: Example (2)

[0,0,0]

[0,0,0]

[0,0,0]

h

p



Pros and cons

• Vector clocks are a simple means of capturing “happened before” exactly

VC(x) < VC(y) iff x y

• For large systems we have resource issues (bandwidth wasted), and maintainability issues

Recall: LC(x) < LC(y) x y →



• Vector clocks help to synchronise at event level– Consistent snapshots

• But reasoning about response times and fault management needs quantitative bounds

Distributed snapshot

PQ

R



Extra reading

• If you already knew VC or want to go deeper, here’s more to read!

– http://basho.com/posts/technical/why-vector-clocks-are-easy/

– http://basho.com/posts/technical/why-vector-clocks-are-hard/



We will come back

• ... to fault management in distributed systems and the impact of time in next lectures!



Overview: Next three lectures

From one CPU to networked CPUs:• First, from one CPU to multiple CPUs

– Allocating VMs on multiple CPUs: Cloud • Next, fully distributed systems

– fundamental issues with timing and order of events

• Next, hard real-time communication– Guaranteed message delivery within a

deadline, bandwidth as a resource• Finally: QoS guarantees instead of

timing guarantees, focus on soft RT



So far…

• We distinguished between reasoning about message end points (clock values) and event sequences

• Next: message latency



• In the scheduling lectures we looked at single processor hard real-time scheduling

• RT communication is about scheduling the communication medium

From last lectures

...

...



Fundamental reason

Two interaction models in distributed systems

• Synchronous model – Assumes that the rate of computation at

different nodes can be related, and there is a bound on maximum message exchange latency

• Asynchronous model– Has no assumptions on rate of processing in

different nodes, or bounds on message latency

Can use timers and timeouts

Only coordination possible at event level



Real-time message scheduling

• Needed for providing the bound on maximum message delay

• Essential for reasoning about system properties under the synchronous model of distributed systems–e.g. proof that a service will be

provided despite a single node crash will need bounds on message delay



RT communication in applications

• Vehicle electronics– Power train and chassis– Infotainment/telematics– Body electronics

• A modern car configuration has over 40 ECUs, distributed over several buses

• Avionics-specific standards– ARINC 429 (70’s), AFDX (in Airbus 380)



Message constraints

• Message delivery time bound dictated by application–So called end-to-end deadlines

• Example: shortly after each driver braking, brake light must know it in order to turn on!



New resource

Single Node DistributedResource CPU Bandwidth

Scheduled element

Task/process Message

Demand on resource

WCET & interarrival

Message size & frequency

Performancemetric

Deadlines met &Utilisation

Message delay &Throughput



Two approaches

• We will look at two well-known methods for bus scheduling–Event triggered (CAN)–Time triggered (TTP)

• Used extensively in automotive and aerospace applications respectively



The CAN bus

Amount of wires…

• Controller area network that was developed for use in all cars built in Europe

• Compulsory for the on-board diagnostics in USA car models from 2008

• Why?– Imagine: 2500 signals, 32 ECUs on

one bus



Predecessor to CAN (1976)

Ethernet:• Current versions give high bandwidth

but time-wise nondeterministic• CSMA/CD

–Sense before sending on the medium(Carrier Sense: CS)

–All nodes broadcast to all (Multiple Access: MA)

– If collision, back off and resend (Collision Detection: CD)



Collisions

• Ethernet has high throughput but temporally nondeterministic

Node 1 sends

Node 3 waits for sending

Node 2 waits for sending

Node 2 & 3 start to send

Collision



Backoff

• The period for waiting after a collision• Each node waits up to two “slot times”

after a collision (random wait)• If a new collision, the max. backoff

interval is doubled• After 10 attempts the node stops

doubling• After 16 attempts declares an error



Collisions & non-determinism

• Model the network throughput and compute probabilistic guarantees that collisions will not be too often– Theoretical study: With 100Mbps, sending

1000 messages of 128 bytes per second, there is a 99% probability that there will not be a delay longer than 1 ms due to collisions over ~1140 years

[www.rti.com Ethernet study] • If you cannot measure effects of

collisions, make collision resolution deterministic!

How often?



CAN protocol

• Developed by Bosch and Intel (1986)• ISO Standard 1993• Highest bandwidth 1Mbps, ~40m• CSMA/CR: broadcast to all nodes• CR: Collision resolution by bit-wise

arbitration plus fixed priorities (deterministic)

• Bus value is bitwise AND of the sent messages



Message priority

• The ID of the frame is located at the beginning– initial bits that are inserted into the

bus are the ID-bits– ID also determines the priority of a

frame–priority of the frame increases as the

ID decreases



Bitwise arbitration

Node 1 sends: 010...Node 2 sends: 100...Node 3 sends: 011...

• This is how ID for a message (frame) works as its priority

... sends rest of packet

... detects collision first

... detects collision next



Note

• Two roles for message ID:–Arbitration via priority–Every node upon receiving a

message, uses the ID to work out whether that message is any use to it or not



Response time analysis

• Scheduling analysis: Is every message delivered before its deadline?

Next lecture…

tddd07 real-time systems lecture 5: distributed systems ...simna73/teaching/reap/ht19/... ·...

Documents