networked systems and services, fall 2018 chapter 2 · cyclic redundancy check (crc) •burst...

Networked Systems and Services, Fall 2018

Chapter 2

Jussi Kangasharju, Markku Kojo, Lea Kutvonen

Outline

• Physical layer reliability
• Low level reliability
  • Parities and checksums
  • Cyclic Redundancy Check (CRC)
  • Error detecting and correcting codes
  • Automatic repeat request (ARQ)
• Network layer reliability
  • Forward Error Correction
  • Network coding

Physical layer reliability

• Sender, receiver and channel
• Data sent electrically or optically
• Sent as bits or combinations of bits
• Various techniques:
  • Modulation
  • Pulses
  • …
• The rest of this chapter does not really cover the physical layer
  • Does not interact (much) with the higher layers

Link layer basics

• Unit of interest: A sequence of bits, e.g.,
  • Character
  • Longer block of bits
• Bits might be received correctly or incorrectly
  • Physical layer provides no guarantees
• Obvious goal: Get the right bits at the receiver
• How to:
  • Detect?
  • Correct?
  • Recover?

What do we want to achieve?

• Detect: Receiver is able to determine the presence of error
  • Either an error in general or in specific bits
• Correct: Receiver is able to correct error(s) on its own
  • No need to contact sender
• Recover: Receiver must contact sender and ask for retransmission
  • Needed anyway unless correction is feasible
• Which one is most important?
  • Absolute minimum: Detection and recovery
  • Sometimes recovering is too slow → Need correction

Detection basics

• How can receiver detect an error?
  • Which transmitted sequence is correct?
  • Not possible unless we send additional information
• All detection (and correction) mechanisms require sending additional, redundant data

[Figure: sender S and receiver R, shown with two bit sequences that differ in one bit: 1 0 0 0 100 and 1 0 1 0 100]

Simple solution

• Send every bit 3 times
  • Triple-modular redundancy
  • Commonly used in many areas for safety-critical systems
  • (Example: https://en.wikipedia.org/wiki/Triple_modular_redundancy)
• Pros:
  • Can detect and correct 1 bit errors
• Cons:
  • Lot of overhead, send 3 times the needed bits
  • Not very good with burst errors
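A minimal Python sketch of sending every bit three times, assuming bits are held as lists of 0/1 integers. The names `encode_triple` and `decode_triple` are illustrative and not from the course material. It shows majority voting repairing one flipped bit, and a two-bit burst inside one group defeating it.

```python
def encode_triple(bits):
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode_triple(coded):
    """Majority vote over each group of three received bits."""
    if len(coded) % 3 != 0:
        raise ValueError("coded length must be a multiple of 3")
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


data = [1, 0, 1]
sent = encode_triple(data)

single = sent.copy()
single[4] ^= 1              # one flipped bit in the middle group

burst = sent.copy()
burst[3] ^= 1               # two adjacent flips land in the same group
burst[4] ^= 1

print(decode_triple(single))  # single error is voted away
print(decode_triple(burst))   # burst error flips the decoded middle bit
```

The second case is why the slide lists burst errors as a weakness: the vote is only reliable while at most one copy per group is damaged.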

Interleaving to the rescue!

• Interleaving means not sending bits in order
  • (Example: https://en.wikipedia.org/wiki/Forward_error_correction#Interleaving)
• Pros:
  • More robust against burst errors
• Cons:
  • Still high overhead
  • Slower to receive data; errors cannot be detected until everything is received
• Interleaving used in media delivery and video encodings
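As a rough illustration, the sketch below combines a simple block interleaver with triple repetition. The grid layout (write row by row, read column by column) and all function names are assumptions chosen for brevity, not a scheme prescribed by the slides. A burst of three adjacent channel errors is spread over three different repetition groups, so majority voting still recovers the data.

```python
def interleave(bits, rows, cols):
    """Write bits row by row into a rows x cols grid, read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("bits must fill the grid exactly")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave: put each received bit back in its grid cell."""
    if len(bits) != rows * cols:
        raise ValueError("bits must fill the grid exactly")
    out = [0] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


def majority(coded):
    """Majority vote over groups of three bits (triple repetition)."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


data = [1, 0, 1]
coded = [b for b in data for _ in range(3)]     # one row per data bit
on_wire = interleave(coded, rows=3, cols=3)

damaged = on_wire.copy()
for i in (3, 4, 5):                             # burst of three adjacent errors
    damaged[i] ^= 1

recovered = majority(deinterleave(damaged, rows=3, cols=3))
print(recovered)
```

The receiver has to wait for the whole grid before it can undo the interleaving, which is the latency cost mentioned in the cons.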

Checksums

• Checksum:
  • Mathematical function of the data
  • Added to the transmission (data + checksum) by sender
  • Receiver can calculate same function
  • If the values match, the data is accepted; if not, an error has occurred
• Checksums used in many, many places
  • Bank account numbers, credit card numbers, ID numbers, …
  • Goal often: Protection against typos and stupid forgers
  • Algorithms public and known

Parity bit

• Parity is the simplest form of checksum

• Adds one bit of redundancy to data

• Two variants: Even and odd parity

• Calculation:
  • Calculate number of ‘1’ bits in data
  • Set parity bit to ‘0’ or ‘1’ so that total number of ‘1’ bits is even or odd (depending on which parity is used)
  • Receiver calculates number of ‘1’ bits in data and checks result

• (Example: https://en.wikipedia.org/wiki/Parity_bit)

Parity properties

• Can detect all single bit errors
  • Actually, can detect all cases with odd number of bit errors
• Cannot detect an even number of bit errors
  • Errors cancel each other out in calculation
• Typically used on byte level
• Cannot correct errors, must retransmit in case error detected
• Use cases:
  • RAID disks, memory buses, serial data transmission
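The parity calculation and its detection properties can be sketched in a few lines of Python. The helper names `parity_bit` and `check_parity` are hypothetical, and the data word is an arbitrary 7-bit example.

```python
def parity_bit(bits, even=True):
    """Return the bit that makes the total number of 1s even (or odd)."""
    bit = sum(bits) % 2          # 1 if the data alone has an odd number of 1s
    return bit if even else bit ^ 1


def check_parity(word, even=True):
    """Receiver side: count the 1s in data plus parity bit."""
    return sum(word) % 2 == (0 if even else 1)


data = [1, 0, 1, 1, 0, 0, 1]                 # four 1 bits
word = data + [parity_bit(data)]             # even parity

one_error = word.copy()
one_error[2] ^= 1                            # single bit error

two_errors = one_error.copy()
two_errors[5] ^= 1                           # a second error cancels the first

print(check_parity(word))        # intact word passes
print(check_parity(one_error))   # odd number of errors is detected
print(check_parity(two_errors))  # even number of errors slips through
```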

Cyclic Redundancy Check (CRC)

• Burst errors very common in data transmission

• Cyclic codes protect against them and are fast and easy to calculate

• Parity is a special case of CRC (1 bit CRC)

• Generator polynomial of degree n
  • Message (with n zero bits appended) is divided by generator polynomial, modulo 2
  • Essentially bitwise XOR (fast to calculate)
• The n-bit remainder becomes the checksum
• (Example: https://en.wikipedia.org/wiki/Cyclic_redundancy_check)
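The division can be sketched as bit-by-bit XOR in Python. This is a teaching sketch under simplifying assumptions (bit strings, no initial value, no bit reflection or final XOR as real CRC-32 variants use), and the function names are illustrative. The 3-bit generator 1011 and the message are small example values.

```python
def _divide(bits, gen):
    """Modulo-2 long division in place; returns the last len(gen)-1 bits."""
    n = len(gen) - 1
    for i in range(len(bits) - n):
        if bits[i]:                       # leading 1: XOR the generator in
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return bits[-n:]


def crc_remainder(message, generator):
    """Checksum for a '0'/'1' string; generator must start with '1'."""
    gen = [int(b) for b in generator]
    bits = [int(b) for b in message] + [0] * (len(gen) - 1)
    return "".join(str(b) for b in _divide(bits, gen))


def crc_ok(codeword, generator):
    """Receiver side: message plus checksum must divide with zero remainder."""
    gen = [int(b) for b in generator]
    return not any(_divide([int(b) for b in codeword], gen))


message = "11010011101100"
generator = "1011"                        # x^3 + x + 1
codeword = message + crc_remainder(message, generator)

corrupted = list(codeword)
for i in (5, 6, 7):                       # burst of three adjacent bit errors
    corrupted[i] = "1" if corrupted[i] == "0" else "0"
corrupted = "".join(corrupted)

print(crc_remainder(message, generator))
print(crc_ok(codeword, generator), crc_ok(corrupted, generator))
```

With the generator 11 (that is, x + 1) the remainder is a single even-parity bit, which is the sense in which parity is a 1 bit CRC.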

CRC use cases

• Data transmission and storage
  • Ethernet uses CRC-32
  • Various CRCs used in mobile networks
  • gzip and bzip2 use CRC-32 (same as Ethernet)
  • iSCSI uses CRC-32 (different from Ethernet)
  • Also used in train communication and aviation
• Only detection, no correction of errors

Error correction

• Detection is mandatory, but needs recovery to proceed
• How about also correcting detected errors?
• Pros:
  • No need to contact sender again, saves one RTT
• Cons:
  • Needs more redundancy → Reduced goodput
• Error correction on low level here
  • Forward Error Correction (FEC) across packets comes later

Hamming codes

• How to indicate which bit was in error?
• Repetition codes (e.g., triple redundancy) not very efficient
• Basic idea:
  • We send n bits in total (data bits plus check bits)
  • We need k error correction bits, such that 2^k ≥ n + 1 (one value for each possible error position, plus one for "no error")
  • Then we can determine which bit was in error (single bit error)
• So-called (n, n-k) code
  • Code rate is (n-k)/n (= goodput)
• Set of all possible sent data
  • Remember: Low level communication, maybe 7 bits?

Hamming distance

• Metric to determine difference between two strings
  • Must be of equal length
  • For us: Received data and correct data
• Distance is the number of substitutions needed to go from received data to correct data
  • How many errors have happened?
• Set of possible original data has some (minimum) Hamming distance
  • Hamming distance 2: Can detect single bit errors
  • Hamming distance 3: Can correct single bit errors
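A short Python sketch of the metric, assuming codewords are given as equal-length strings. The helper `minimum_distance` is a hypothetical name for the smallest pairwise distance within a set of valid codewords, which is the quantity that decides what can be detected or corrected.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must be of equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Smallest distance between any two distinct codewords of a code."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


even_parity_code = ["000", "011", "101", "110"]   # 2 data bits + parity bit
repetition_code = ["000", "111"]                  # 1 data bit sent 3 times

print(hamming_distance("1011101", "1001001"))
print(minimum_distance(even_parity_code))   # distance 2: detect 1 bit errors
print(minimum_distance(repetition_code))    # distance 3: correct 1 bit errors
```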

Hamming (7, 4)

• We send data items of 4 bits
• Need 3 bits for error correction
  • Why?
• Construction of code:
  1. Number bits in binary, starting from 1
  2. Power of 2 bit positions are parity bits
  3. All others are data bits
  4. Each data bit covered by at least 2 parity bits as follows:
    • Parity bit 1 covers all bits with LSB = 1
    • Parity bit 2 covers all bits with second-LSB = 1
    • Parity bit 3 covers all bits with third-LSB = 1, and so on

Hamming (7, 4) construction example

Bit number    1   2   3   4   5   6   7
Data/Parity   p1  p2  d1  p4  d2  d3  d4
P1 covers     X       X       X       X
P2 covers         X   X           X   X
P4 covers                 X   X   X   X

If all parity bits are correct, then no error.
Sum positions of erroneous parity bits to find out location of actual data bit in error.
If only one parity bit indicates error, then error is in parity bit.
Hamming (8, 4) can also detect two bit errors (includes one overall parity bit).
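A minimal Python sketch of Hamming (7, 4) with even parity, using the bit layout p1 p2 d1 p4 d2 d3 d4. Function names are illustrative. The decoder recomputes the three parity checks and adds up the positions of the failing ones, which gives the 1-based position of the flipped bit (0 means no error).

```python
def hamming74_encode(data):
    """data = [d1, d2, d3, d4]; returns the 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(received):
    """Correct at most one bit error; returns (data bits, error position)."""
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s4 = r[3] ^ r[4] ^ r[5] ^ r[6]
    position = s1 * 1 + s2 * 2 + s4 * 4   # sum of failing parity positions
    if position:
        r[position - 1] ^= 1              # flip the erroneous bit back
    return [r[2], r[4], r[5], r[6]], position


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

damaged = codeword.copy()
damaged[4] ^= 1                           # flip bit number 5 (d2)

print(codeword)
print(hamming74_decode(damaged))
```

Two simultaneous bit errors would be "corrected" to a wrong codeword by this decoder, which is the gap the extra overall parity bit of Hamming (8, 4) closes.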

Recovering from errors

• Error correcting codes on bit level useful, but have high overhead
  • Hamming (7,4) only sends 4/7 = 57% useful information
• Typically communication channels not that unreliable
  • What could be possible exceptions?
• Always sending redundant information wastes resources
  • Focus on detection and subsequent recovery
  • Detection usually needs less redundancy
• Recovery = Receiver asks sender to retransmit
  • Key issues: How, when, and how much?

Recovery basics

• What are possible errors?
• Corrupted data
  • Can be identified via checksums, etc.
  • The stuff we have just seen
  • Receiver asks sender to retransmit corrupted data
• Lost packet
  • How can a receiver know to expect a packet?
  • How can a receiver know to ask for something it doesn’t know exists?

ARQ: Automatic Repeat Request

• Every packet must have a sequence number
  • Sequence number must be unique across packets in-flight
  • Sequence numbers can be re-used if no risk of confusion
• Receiver acknowledges reception of packet number X
  • Sender knows packet was successfully received
  • Sender sends next packet

[Figure: message sequence between sender S and receiver R: P:123, ACK:123, P:124]

ARQ: Problems

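• Lost packet
  • Receiver cannot acknowledge
• How to solve this?
  • Timeout
  • Sender waits t seconds for ACK
  • No ACK → Retransmit same packet
• Everything resumes as usual

[Figure: message sequence between sender S and receiver R: P:123 (lost), P:123 (retransmitted after timeout), ACK:123]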

ARQ: More problems

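• What if ACK is lost?
• What is the difference for the sender?
  • No difference, timeout, retransmit
• How about for receiver?
  • Receives same packet twice
  • Must keep track of received packets
  • Prune duplicates

[Figure: message sequence between sender S and receiver R: P:123 is delivered, ACK:123 is lost, and S retransmits P:123, which R receives a second time]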

ARQ: More issues

• What if no ACK comes despite multiple retransmissions?
  • Maximum number of retransmissions
    • If no success, then connection is assumed to be lost
  • Another (longer) timeout
    • If this is triggered, connection is assumed to be lost
• How much bookkeeping for receiver for duplicate packets?
  • Sliding window, i.e., define maximum number of outstanding packets
  • Limits need for bookkeeping at receiver
  • Also defines maximum number needed for sequence numbers

Types of ARQ

• ARQ typically exists both on link and transport layers
  • Here, general properties of ARQ
  • Later a practical case with TCP
• Stop-and-Wait
  • Send one packet, wait for ACK
  • Only then send new packet
  • Simple to implement
  • Very inefficient

• How to make it more efficient?
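Stop-and-Wait with timeouts, retransmissions and duplicate pruning can be sketched as a small deterministic simulation. Everything here is an assumption made for illustration: the function name, the use of the packet index as sequence number, and the way losses are scripted by listing which transmissions (counted from 0 over the whole run) the channel drops.

```python
def stop_and_wait(packets, lost_data=(), lost_acks=(), max_retries=5):
    """Return (payloads delivered to the application, total transmissions)."""
    delivered = []
    last_seq = None          # receiver bookkeeping: last sequence number seen
    tx = 0                   # counts every data transmission
    for seq, payload in enumerate(packets):
        for _attempt in range(max_retries + 1):
            this_tx = tx
            tx += 1
            if this_tx in lost_data:
                continue     # packet lost: no ACK, sender times out, resends
            if seq != last_seq:
                delivered.append(payload)   # new packet
                last_seq = seq
            # a duplicate is pruned above, but is still acknowledged
            if this_tx in lost_acks:
                continue     # ACK lost: sender times out, resends
            break            # ACK received: move on to the next packet
        else:
            raise ConnectionError("retransmission limit reached")
    return delivered, tx


# Transmission 1 (packet "b") is lost; the ACK for transmission 3 ("c") is lost.
result = stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3})
print(result)
```

The sender cannot tell a lost packet from a lost ACK, so it behaves identically in both cases; only the receiver sees the difference, as a duplicate.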

Go-back-N ARQ

• Window for unacked packets
  • Sender can send this many at once
• Receiver acks last received packet
  • Sender sends next window of packets
  • ACK can be for next expected packet
• For lost packets
  • Receiver acks last consecutive packet
  • Sender resumes from that point
  • Retransmits all packets after missing one, even if they were correctly received

[Figure: message sequence between sender S and receiver R: P:123, P:124, P:125, ACK:125, P:126, P:127, P:128, ACK:126]

Go-back-N issues

• More efficient than Stop-and-Wait
  • Can send one window worth of packets per RTT
  • Stop-and-Wait has N = 1
• Issue: Everything after lost packet is sent again
  • Worse: ACK is lost
  • Worst case: Whole window is sent twice
  • Especially bad if window is big

• How to solve this problem?
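The retransmission cost of Go-back-N can be made concrete with a small simulation. This is a simplified sketch, not a full protocol: the sender transmits one window at a time, ACKs are assumed never to be lost, and losses are scripted by listing the dropped transmissions (counted from 0). The function name and parameters are hypothetical.

```python
def go_back_n(num_packets, window, lost=()):
    """Return the sequence numbers in the order the sender transmits them."""
    sent_log = []
    base = 0                 # first packet not yet acknowledged
    tx = 0                   # counts every transmission
    while base < num_packets:
        expected = base      # receiver only accepts this sequence number next
        for seq in range(base, min(base + window, num_packets)):
            sent_log.append(seq)
            dropped = tx in lost
            tx += 1
            if not dropped and seq == expected:
                expected += 1        # in order: accepted
            # otherwise the receiver discards it (lost or out of order)
        base = expected      # cumulative ACK: resume from first missing packet
    return sent_log


# Six packets, window of three, the second transmission (packet 1) is lost.
log = go_back_n(6, window=3, lost={1})
print(log, len(log))
```

Packet 2 arrived intact the first time but is sent again anyway, because the receiver keeps only one number (the next expected packet).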

Selective Repeat ARQ

• Sender sends one window of packets
• No errors: Receiver acks them all
  • Either individually or cumulative
  • Must make sure this works
• Error: Receiver tells sender which packets were missing/wrong
  • Either acks successes or nacks failures
• Sender only retransmits failed packets

[Figure: message sequence between sender S and receiver R: P:123, P:124, P:125, ACK:125, P:126, P:127, P:128, NACK:127, P:127]
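A comparable sketch for Selective Repeat, under the same simplifying assumptions: one window per round, reliable feedback from the receiver, and losses scripted by transmission index (counted from 0). The function name is illustrative. The receiver buffers out-of-order packets, so only the missing ones are sent again.

```python
def selective_repeat(num_packets, window, lost=()):
    """Return the sequence numbers in the order the sender transmits them."""
    sent_log = []
    received = set()         # receiver buffer, also reflects per-packet ACKs
    base = 0                 # first packet not yet acknowledged
    tx = 0                   # counts every transmission
    while base < num_packets:
        for seq in range(base, min(base + window, num_packets)):
            if seq in received:
                continue     # already acknowledged: no retransmission
            sent_log.append(seq)
            if tx not in lost:
                received.add(seq)
            tx += 1
        while base in received:
            base += 1        # slide the window past everything delivered
    return sent_log


# Six packets, window of three, the second transmission (packet 1) is lost.
log = selective_repeat(6, window=3, lost={1})
print(log, len(log))
```

The price for the saved transmission is the `received` set: the receiver must track every packet in the window instead of a single number.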

Comparison

• Go-back-N
  • Pros:
    • Easy to implement, not much bookkeeping needed (one number at receiver)
    • Works if errors are rare enough
  • Cons:
    • More transmission overhead for errors
    • Sender needs to keep track of all packets in window
• Selective Repeat
  • Pros:
    • Only retransmits data that didn’t make it the first time
    • Most efficient use of network resources
  • Cons:
    • Sender needs to keep track of all packets in window
    • Receiver needs to keep track of all packets in window

ARQ summary

• Basic recovery mechanism
  • Used both on link and transport layers
• Two main variants: Go-back-N and Selective Repeat
  • TCP was originally Go-back-N
  • Nowadays extensions for Selective Repeat (SACK)
  • Later there will be a discussion on TCP
• Another variant: Hybrid ARQ
  • Combines ARQ with Forward Error Correction
  • We will see this later in the chapter

Network level reliability

Network level solutions

• Forward Error Correction (FEC)
  • Basics
  • Reed-Solomon codes
  • Fountain codes
• Network coding
• Hybrid ARQ

FEC basics

• Add redundancy to sent data to allow receiver to recover from errors
  • Error can be corrupted data or lost packet
  • Is there a difference between these two?
• Two basic ways of error correction:
  • Add redundancy to allow corrupted data to be recovered
  • Add additional data to allow recovery of completely lost data
• Forward Error Correction typically means the second option
• When to use FEC?
  • Common use case: Retransmission is impossible or too expensive

Types of FEC

• Block codes
  • Fixed size blocks or packets
  • Lost or corrupted data
  • Hamming codes, Reed Solomon
• Convolutional codes
  • Bit streams of arbitrary length
• Erasure codes
  • Specifically against lost data
  • Fountain codes

Block codes

• Block codes divide data into fixed size blocks (e.g., packets)
  • Assume k bits in size
• Then add redundancy to produce n bits of output
  • Rate of code: R = k/n
  • Large R: Not much redundancy, opposite for small R
  • Tradeoff between n and resulting overhead
  • Lots of different block codes in existence

Simple example

• Our programming assignment has a simple block code
  • Two inputs: Packets A and B
  • Redundancy: C = A XOR B
  • Rate: 2/3
• Three packets A, B, and C form a group
• For receiver:
  • Receive any 2 packets out of the group of A, B, C
  • Reconstruct A and B (directly or XORing)

• No redundancy across different groups of 3 packets
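A minimal sketch of this XOR block code in Python. It assumes A and B are byte strings of equal length, and the function names and the dictionary keyed by "A", "B", "C" are illustrative choices, not the interface of the programming assignment.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(a, b):
    """Form a group of three packets: A, B and the redundancy C = A XOR B."""
    return {"A": a, "B": b, "C": xor_bytes(a, b)}


def decode_group(received):
    """Reconstruct (A, B) from any two packets of the group."""
    if "A" in received and "B" in received:
        return received["A"], received["B"]
    if "A" in received and "C" in received:
        return received["A"], xor_bytes(received["A"], received["C"])
    if "B" in received and "C" in received:
        return xor_bytes(received["B"], received["C"]), received["B"]
    raise ValueError("need at least two packets of the group")


group = encode_group(b"hello", b"world")
without_a = {k: v for k, v in group.items() if k != "A"}   # packet A lost
print(decode_group(without_a))
```

Losing two packets of the same group cannot be repaired, since there is no redundancy across groups.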

Reed Solomon codes

• Defined by Reed and Solomon in 1960

• Where are they used?
  • CD, DVD, QR codes, DVB, DSL, space communications, …
  • Pretty much everywhere :-)
• Operates on multi-bit symbols (read: group of bits)
  • Burst errors affect multiple bits
  • But hopefully only one symbol

• Good error correction properties

Tornado codes

• Tornado codes are like Reed Solomon codes
  • Less efficient on space
  • More efficient on speed
• Tornado codes based on layered approach
  • All layers (but one) use Low-density Parity Check (LDPC) code
    • Efficient, but can fail
  • Last layer uses Reed Solomon
    • Slower but optimal
• Many other similar codes exist

Erasure codes

• Goal: Recover lost data
• Reed Solomon codes are one example of this category
  • Polynomial interpolation
• Basic idea:
  • Message of k packets (also called symbols)
  • Encoded into n packets
  • Receiver can reconstruct from any k’ packets received (out of n)
• Rate: k/n
• Reception efficiency: k’/k
  • Ideally k’ = k; any received k symbols sufficient

Fountain codes

• Rateless erasure codes
  • Unlimited number of encoded packets
• n source packets
• Need any n (or close to n) encoded packets to decode original data
• Example n = 3

[Figure: sender S emits a stream of encoded packets A, B, C, D, E, F, G, H, I towards receivers R1, R2 and R3]

Raptor codes

• Raptor = Rapid Tornado
• First fountain codes with linear encoding and decoding
• Original message k symbols
• Receive any k encoded symbols → High probability of decoding
  • For k received symbols, less than 1% chance of error
  • For k+2 received symbols, less than 1 in million chance of error
• Symbol can be of any size (byte, packet, …)

Use of fountain codes

• Useful for broadcast content

• Same content being broadcast to multiple recipients

• Especially when receivers can join at any time

• Also known as data carousel

• RFC 5053 has been widely adopted

• Used by 3GPP

• DVB for handheld devices

• DVB-IPTV, TV over IP networks

• Updated RaptorQ in RFC 6330

Usefulness of fountain codes

• Low overhead
  • Close to ideal
• Receiver can act independently
  • No need to contact sender for recovery
  • Not enough packets → Receive some more
• Works best for broadcast or multicast
  • No need to know identities of receivers
• Works for unicast as well
  • Efficiency depends on channel and many other factors

Network coding

• Not so much for reliability as for improved performance
  • Improved scalability and throughput
• Basic idea:
  • Combine multiple packets together
• Theoretical property:
  • Can achieve maximum throughput for single-source multicast
  • No proof for multi-source cases, though

Butterfly network

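[Figure: butterfly network. Packets A and B travel along separate outer paths, the shared middle link carries A ⊕ B, and A ⊕ B is forwarded to both receivers]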

Network coding in practice

• Routing of packets only: Central link can send either A or B, not both
• Network coding makes a combination of A and B
  • Send combination over bottleneck link
  • Each receiver gets A or B directly and can decode the other from the combination
• Sender has N packets to send
  • Create linear combinations of packets with random coefficients
  • Coefficients chosen from a Galois field
• If received packets are linearly independent, decoding successful
  • If not, unlikely to be able to decode anything
  • Solution: Continue to send more

Network coding example

• Three original packets A, B, and C
• Select coefficients to create 3 encoded packets D, E, F
  • D = xA + yB + zC
  • E = kA + lB + mC
  • F = nA + oB + pC
• If this set of linear equations has a unique solution, then code works
• We can create further packets G, H, I, … with different coefficients
• Receiver needs enough packets to solve A, B, and C
  • At least 3, could be more depending on the coefficients
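The example can be sketched in Python as encoding by linear combination and decoding by Gaussian elimination. To keep the arithmetic short, this sketch assumes the prime field GF(257) with symbols stored as integers 0..256. Practical systems typically use fields of size 2^m instead, and all names and coefficient values below are illustrative.

```python
P = 257   # prime modulus; an assumption made for brevity


def encode(packets, coeffs):
    """One encoded packet: symbol-wise sum of coeffs[i] * packets[i] mod P."""
    return [sum(c * p[j] for c, p in zip(coeffs, packets)) % P
            for j in range(len(packets[0]))]


def decode(coeff_rows, encoded):
    """Solve for the original packets; None if the rows are not independent."""
    n = len(coeff_rows[0])
    rows = [list(c) + list(e) for c, e in zip(coeff_rows, encoded)]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] % P),
                     None)
        if pivot is None:
            return None                       # no unique solution yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)   # inverse via Fermat's theorem
        rows[col] = [(v * inv) % P for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [rows[i][n:] for i in range(n)]


A, B, C = [1, 2, 3], [4, 5, 6], [7, 8, 9]
good = [(1, 1, 1), (1, 2, 3), (1, 4, 9)]      # linearly independent
bad = [(1, 1, 1), (2, 2, 2), (1, 2, 3)]       # second row = 2 * first row

print(decode(good, [encode([A, B, C], c) for c in good]))
print(decode(bad, [encode([A, B, C], c) for c in bad]))
```

When the first three packets turn out to be dependent, receiving one further packet with independent coefficients is enough for `decode` to succeed.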

Network coding article

• C. Gkantsidis, P. Rodriguez, Network Coding for Large Scale Content Distribution, IEEE Infocom 2005
• Next article essay to be completed
• Article discusses how to use network coding in large scale content delivery systems
  • Similar to BitTorrent
  • Was planned to be used for software updates

• See Moodle and announcements later about deadline and link

Combining different mechanisms

• How about combining ARQ and FEC?
• Let’s try a layered approach
  • Put one as layer on top of the other
  • Both working independently
• What could go wrong?

FEC on top of ARQ

• First layer is FEC
  • Add FEC to packets from application
  • Pass them to ARQ which tries to get them through
• What is the problem?
  • ARQ tries to get all packets through
  • But whole idea of FEC is to allow for loss
  • Not much benefit…

ARQ on top of FEC

• How about the other way around?
  • ARQ gets packets from application
  • Lower layer uses FEC to ensure delivery
• What is the problem?
  • FEC may add delay → Possible timeouts
  • Longer timeouts → Slower recovery → Lower throughput
  • Unnecessary retransmissions (with FEC)
• Neither solution is without problems

Interactions between mechanisms

• Different reliability mechanisms may interact
  • Like previous example
• Not always a good idea to enable everything
• Must be aware of (subtle) effects in chosen design
• Usually no clear optimal solution
• Can make new combinations of solutions
• These kinds of interactions can happen between any solutions

Hybrid ARQ

• A different (smarter?) way of combining ARQ and FEC
• Use both at the same time
  • Encode packets with FEC for error correction
  • Use ARQ and its error detection as a fallback
• Pros:
  • Works well over poor quality channels
• Cons:
  • Adds significant overhead for good quality channels
• How to tweak?

Hybrid ARQ

• Adjust amount of FEC based on observed channel quality
  • At first, only use ARQ
  • If everything goes smoothly, remain with ARQ
  • If there are errors, start including FEC (below ARQ)
  • Adjust amount of FEC based on need
• Soft combining:
  • Receiver keeps incorrectly received packets
  • Attempts to combine with future packets
• Used for example in HSDPA

Summary

• Link level and network level mechanisms
• Error detection: Parities, checksums, CRC
• Error correction: Hamming codes
• Recovery: ARQ
• Forward Error Correction
• Network coding
• Hybrid ARQ
