tddd07 real-time systems lecture 5: distributed systems ...simna73/teaching/reap/ht19/... ·...
TRANSCRIPT
Undergraduate course on Real-time SystemsLinköping University
TDDD07 Real-time Systems
Lecture 5: Distributed Systems &
Real-time Communication
Simin Nadjm-Tehrani
Real-time Systems Laboratory
Department of Computer and Information ScienceLinköping University
50 pagesAutumn 2019
Undergraduate course on Real-time SystemsLinköping University
2 of 50 Autumn 2019
Overview: Next three lectures
From one CPU to networked CPUs:• First, from one CPU to multiple CPUs
– Allocating VMs on multiple CPUs: Cloud • Next, fully distributed systems
– fundamental issues with timing and order of events
• Next, hard real-time communication– Guaranteed message delivery within a
deadline, bandwidth as a resource• Finally: QoS guarantees instead of
timing guarantees, focus on soft RT
Undergraduate course on Real-time SystemsLinköping University
3 of 50 Autumn 2019
Questions
• Can we temporally order all events in a distributed system?–Only if we can timestamp them with a
value from a global (universal) clock
• Can we draw any conclusions if we do not have a global clock? –What about a set of local clocks?–What if no clocks at all?
Undergraduate course on Real-time SystemsLinköping University
4 of 50 Autumn 2019
Clock synchronisation
Two types of algorithms:• External synchronisation
–Tries to keep the values of a set of clocks agree with an accurate clock, within a skew of δ
• Internal synchronisation–Tries to keep a set of clock values
close to each other with a maximum skew of δ
Undergraduate course on Real-time SystemsLinköping University
5 of 50 Autumn 2019
Lamport/Melliar-Smith Algorithm
• Internal synchronisation of n clocks
• Each clock reads the value of all other clocks at regular intervals– If the value of some clock differs from value
of own clock by more than δ, that clock value is replaced by own clock value
– The average of all clocks is computed at each node
– Own clock value is updated to the average value
Undergraduate course on Real-time SystemsLinköping University
6 of 50 Autumn 2019
Does it work?
• After each synchronisation interval the clocks get closer to each other
• If the skews are within δ, and the clocks are initially synchronised, then they are kept within δ from each other
• But what if clocks are faulty? What is considered a fault?
Undergraduate course on Real-time SystemsLinköping University
7 of 50 Autumn 2019
Faulty clocks
• If a clock skew exceeds δ then its value is eliminated – does not “harm” other clocks
• What if the skew is exactly δ? – check it as an exercise!
• What is the worst case?
Undergraduate course on Real-time SystemsLinköping University
8 of 50 Autumn 2019
Will be considered as correct by i and j…
c
c-2
c-
c+
i j
k
A two-face faulty clock k
Undergraduate course on Real-time SystemsLinköping University
9 of 50 Autumn 2019
Bound on the faulty clocks
• To guarantee that the set will keep all non-faulty clocks within δ we need an assumption on the number of faulty clocks
• For f faulty clocks the algorithm works if the number of clocks n >3f
Undergraduate course on Real-time SystemsLinköping University
10 of 50 Autumn 2019
Synchronisation example
“I also included a temperature compensated real time clock on Saboten to maintain an accurate alarm for periodic wake from sleep.”http://hackaday.com/2015/10/05/sensor-net-makes-life-easier-for-rice-farmers/
Undergraduate course on Real-time SystemsLinköping University
11 of 50 Autumn 2019
• Physical time vs. Logical time
• Example clock synchronisation algorithm
• Logical clocks
• Vector clocks
Time in Distributed Systems
Next we look at events…
Undergraduate course on Real-time SystemsLinköping University
12 of 50 Autumn 2019
Event ordering
• In the absence of clock synchronisation we may use order that is intrinsic in an application
Client A
Client B
Server
ReqA RepA
ReqB
Undergraduate course on Real-time SystemsLinköping University
13 of 50 Autumn 2019
Logical time
• Based on event counts at each node• May reflect causality • Sending a message always precedes
receiving it • Messages sent in a sequence by one
node are (potentially) causally related to each other– I do not pay for an item if I do not first
check the item’s availability
Undergraduate course on Real-time SystemsLinköping University
14 of 50 Autumn 2019
Happened before~~~~
• Assume each process has a monotonically increasing local clock
• Rule 1: if the time for event x is before the time for event y then x y
• Rule 2: if x denotes sending a message and y denotes receiving the same message then x y
• Rule 3: is transitive
A partial order…
Undergraduate course on Real-time SystemsLinköping University
15 of 50 Autumn 2019
Lamport’s Logical clocks
Seminal paper from 1978…
• Logical clock: An event counter that respects the “happened before” ordering
• Partial order: Hence, any events that are not in the “happened before” relation are treated as concurrent
Undergraduate course on Real-time SystemsLinköping University
16 of 50 Autumn 2019
Example (1)
P
Q
R
a g
b c he
f d
What do we know here?
Undergraduate course on Real-time SystemsLinköping University
17 of 50 Autumn 2019
Implementing logical clocks
LC “time-stamps” each event
• Rule 1: Each time a local event takes place, increment LC by 1
• Rule 2: Each time a message m is sent the LC value at the sender is appended to the message (m_LC)
• Rule 3: Each time a message m is received set LC to max(LC, m_LC)+1
Undergraduate course on Real-time SystemsLinköping University
18 of 50 Autumn 2019
Exercise
• Calculate LC for all events in example (1)!
Undergraduate course on Real-time SystemsLinköping University
19 of 50 Autumn 2019
What does LC tell us?
• x y → LC(x) < LC(y)
• Note that:
LC(x) < LC(y) does not imply x y
Undergraduate course on Real-time SystemsLinköping University
20 of 50 Autumn 2019
Example (1)
P
Q
R
a g
b c he
f d
What did we capture by LC?
Undergraduate course on Real-time SystemsLinköping University
21 of 50 Autumn 2019
Is concurrency transitive?
• e is concurrent with g• g is concurrent with f
• but e is not concurrent with f!
• Comparing the LC values does not tell us if two events are concurrentin the sense of • Vector clocks do more...
Undergraduate course on Real-time SystemsLinköping University
22 of 50 Autumn 2019
Vector clocks (VC)
• Every node maintains a vector of counted events (one entry for each other node)
• VC for event e, VC(e) = [1,…,n], shows the perceived count of events at nodes 1,…,n
• VC(e)[k] denotes the entry for node k
Undergraduate course on Real-time SystemsLinköping University
23 of 50 Autumn 2019
Implementation of VC
• Rule 1: For each local event increment own entry
• Rule 2: When sending message m, append to m the VC(send(m)) as a timestamp T
• Rule 3: When event x is “receiving a message” at node i, – increment own entry: VC(x)[i]:= VC(x)[i]+1– For every entry j in the VC: Set the entry to
max (T[j], VC(x)[j])
Undergraduate course on Real-time SystemsLinköping University
24 of 50 Autumn 2019
Example (1) revisited
P
Q
R
VC(a) = [1, 0, 0] VC(g) = [2, 0, 0]
VC(b) = [1, 2, 0]VC(c) = [1, 3, 0]
VC(h) = [2, 4, 0]VC(e) = [0, 1, 0]
VC(f) = [0,1,1] VC(d) = [1, 3, 2]
With Vector clocks VC:
Undergraduate course on Real-time SystemsLinköping University
25 of 50 Autumn 2019
Precedence in VC
• Relation < on vector clocks defined by:VC(x) < VC(y) iff– For all i: VC(x)[i] ≤ VC(y)[i]– For some i: VC(x)[i] < VC(y)[i]
• It follows that event x event y if VC(x) < VC(y)
Undergraduate course on Real-time SystemsLinköping University
26 of 50 Autumn 2019
Concurrency and VC
Hence:• VC(x) < VC(y) iff x y
• If neither VC(x) < VC(y) nor VC(y) < VC(x) then x and y are concurrent
Undergraduate course on Real-time SystemsLinköping University
27 of 50 Autumn 2019
Exercise: Example (2)
[0,0,0]
[0,0,0]
[0,0,0]
h
p
Undergraduate course on Real-time SystemsLinköping University
28 of 50 Autumn 2019
Pros and cons
• Vector clocks are a simple means of capturing “happened before” exactly
VC(x) < VC(y) iff x y
• For large systems we have resource issues (bandwidth wasted), and maintainability issues
Recall: LC(x) < LC(y) x y →
Undergraduate course on Real-time SystemsLinköping University
29 of 50 Autumn 2019
• Vector clocks help to synchronise at event level– Consistent snapshots
• But reasoning about response times and fault management needs quantitative bounds
Distributed snapshot
PQ
R
Undergraduate course on Real-time SystemsLinköping University
30 of 50 Autumn 2019
Extra reading
• If you already knew VC or want to go deeper, here’s more to read!
– http://basho.com/posts/technical/why-vector-clocks-are-easy/
– http://basho.com/posts/technical/why-vector-clocks-are-hard/
Undergraduate course on Real-time SystemsLinköping University
31 of 50 Autumn 2019
We will come back
• ... to fault management in distributed systems and the impact of time in next lectures!
Undergraduate course on Real-time SystemsLinköping University
32 of 50 Autumn 2019
Overview: Next three lectures
From one CPU to networked CPUs:• First, from one CPU to multiple CPUs
– Allocating VMs on multiple CPUs: Cloud • Next, fully distributed systems
– fundamental issues with timing and order of events
• Next, hard real-time communication– Guaranteed message delivery within a
deadline, bandwidth as a resource• Finally: QoS guarantees instead of
timing guarantees, focus on soft RT
Undergraduate course on Real-time SystemsLinköping University
33 of 50 Autumn 2019
So far…
• We distinguished between reasoning about message end points (clock values) and event sequences
• Next: message latency
Undergraduate course on Real-time SystemsLinköping University
34 of 50 Autumn 2019
• In the scheduling lectures we looked at single processor hard real-time scheduling
• RT communication is about scheduling the communication medium
From last lectures
...
...
Undergraduate course on Real-time SystemsLinköping University
35 of 50 Autumn 2019
Fundamental reason
Two interaction models in distributed systems
• Synchronous model – Assumes that the rate of computation at
different nodes can be related, and there is a bound on maximum message exchange latency
• Asynchronous model– Has no assumptions on rate of processing in
different nodes, or bounds on message latency
Can use timers and timeouts
Only coordination possible at event level
Undergraduate course on Real-time SystemsLinköping University
36 of 50 Autumn 2019
Real-time message scheduling
• Needed for providing the bound on maximum message delay
• Essential for reasoning about system properties under the synchronous model of distributed systems–e.g. proof that a service will be
provided despite a single node crash will need bounds on message delay
Undergraduate course on Real-time SystemsLinköping University
37 of 50 Autumn 2019
RT communication in applications
• Vehicle electronics– Power train and chassis– Infotainment/telematics– Body electronics
• A modern car configuration has over 40 ECUs, distributed over several buses
• Avionics-specific standards– ARINC 429 (70’s), AFDX (in Airbus 380)
Undergraduate course on Real-time SystemsLinköping University
38 of 50 Autumn 2019
Message constraints
• Message delivery time bound dictated by application–So called end-to-end deadlines
• Example: shortly after each driver braking, brake light must know it in order to turn on!
Undergraduate course on Real-time SystemsLinköping University
39 of 50 Autumn 2019
New resource
Single Node DistributedResource CPU Bandwidth
Scheduled element
Task/process Message
Demand on resource
WCET & interarrival
Message size & frequency
Performancemetric
Deadlines met &Utilisation
Message delay &Throughput
Undergraduate course on Real-time SystemsLinköping University
40 of 50 Autumn 2019
Two approaches
• We will look at two well-known methods for bus scheduling–Event triggered (CAN)–Time triggered (TTP)
• Used extensively in automotive and aerospace applications respectively
Undergraduate course on Real-time SystemsLinköping University
41 of 50 Autumn 2019
The CAN bus
Amount of wires…
• Controller area network that was developed for use in all cars built in Europe
• Compulsory for the on-board diagnostics in USA car models from 2008
• Why?– Imagine: 2500 signals, 32 ECUs on
one bus
Undergraduate course on Real-time SystemsLinköping University
42 of 50 Autumn 2019
Predecessor to CAN (1976)
Ethernet:• Current versions give high bandwidth
but time-wise nondeterministic• CSMA/CD
–Sense before sending on the medium(Carrier Sense: CS)
–All nodes broadcast to all (Multiple Access: MA)
– If collision, back off and resend (Collision Detection: CD)
Undergraduate course on Real-time SystemsLinköping University
43 of 50 Autumn 2019
Collisions
• Ethernet has high throughput but temporally nondeterministic
Node 1 sends
Node 3 waits for sending
Node 2 waits for sending
Node 2 & 3 start to send
Collision
Undergraduate course on Real-time SystemsLinköping University
44 of 50 Autumn 2019
Backoff
• The period for waiting after a collision• Each node waits up to two “slot times”
after a collision (random wait)• If a new collision, the max. backoff
interval is doubled• After 10 attempts the node stops
doubling• After 16 attempts declares an error
Undergraduate course on Real-time SystemsLinköping University
45 of 50 Autumn 2019
Collisions & non-determinism
• Model the network throughput and compute probabilistic guarantees that collisions will not be too often– Theoretical study: With 100Mbps, sending
1000 messages of 128 bytes per second, there is a 99% probability that there will not be a delay longer than 1 ms due to collisions over ~1140 years
[www.rti.com Ethernet study] • If you cannot measure effects of
collisions, make collision resolution deterministic!
How often?
Undergraduate course on Real-time SystemsLinköping University
46 of 50 Autumn 2019
CAN protocol
• Developed by Bosch and Intel (1986)• ISO Standard 1993• Highest bandwidth 1Mbps, ~40m• CSMA/CR: broadcast to all nodes• CR: Collision resolution by bit-wise
arbitration plus fixed priorities (deterministic)
• Bus value is bitwise AND of the sent messages
Undergraduate course on Real-time SystemsLinköping University
47 of 50 Autumn 2019
Message priority
• The ID of the frame is located at the beginning– initial bits that are inserted into the
bus are the ID-bits– ID also determines the priority of a
frame–priority of the frame increases as the
ID decreases
Undergraduate course on Real-time SystemsLinköping University
48 of 50 Autumn 2019
Bitwise arbitration
Node 1 sends: 010...Node 2 sends: 100...Node 3 sends: 011...
• This is how ID for a message (frame) works as its priority
... sends rest of packet
... detects collision first
... detects collision next
Undergraduate course on Real-time SystemsLinköping University
49 of 50 Autumn 2019
Note
• Two roles for message ID:–Arbitration via priority–Every node upon receiving a
message, uses the ID to work out whether that message is any use to it or not
Undergraduate course on Real-time SystemsLinköping University
50 of 50 Autumn 2019
Response time analysis
• Scheduling analysis: Is every message delivered before its deadline?
Next lecture…