ECE544: Communication Networks-II Spring 2009
Hang LiuLecture 8
Includes teaching materials from D. Raychaudhuri, S. Gopal, L. Peterson,
Today’s Lecture
• Introduction to transport protocols• UDP• TCP• RTP
Protocol Stack
R1
ETH FDDI
IPIP
ETH
TCP/UDP
R2
FDDI PPP
IPR3
PPP ETH
IP
Host1
IP
ETH
TCP/UDP
Host8
Appl. Appl.
• Transport protocol– Enable communication between 2 or more
processes (which may be on different hosts in different networks)
– The Transport Layer is the lowest Layer in the network stack that is an end-to-end protocol
Transport Protocols• Different applications have different requirements for
transport protocols.– Guarantee message delivery
• network may drop messages– Deliver messages in the same order they are sent
• Messages may be reordered in networks and incurs a long delay
– Delivers at most one copy of each message• Messages may duplicate in networks
– Support arbitrarily large message• Networks may limit message size
– Support synchronization between sender and receiver– Allows the receiver to apply flow control to the sender– Support multiple application processes on each host– ……
• Design just a few transport protocols to meet most of the current and future application requirements– Each satisfies the requirements for a class of appls– Many applications=>few transport protocols
Most Popular Transport Protocols• User Datagram Protocol (UDP):
– Support multiple applications processes on each host– Option to check messages for correctness with CRC check
• Transmission Control Protocol (TCP):– Ensures reliable delivery of packets between source and
destination processes– Ensures in-order delivery of packets to destination process– Other options
• Real Time Protocol (RTP):– Serves real-time multimedia applications– Header contains sequence number, timestamp, marker bit
etc– Runs over UDP
• TCP, UDP and RTP satisfy needs of the most common applications– Applications requiring other functionality usually use UDP for
transport protocol, and implement additional features as part of the application
6
User Datagram Protocol (UDP)
• De-multiplex the applications/processes on a host– Port: an identification for a communication process– Each process-to-process communication is identified by 4-Tuple
Connection Identifier<SrcPort, SrcIPAddr, DestPort, DestIPAddr >– Well-known port: Unix talk: port 517
SrcPort DesPort
Length Checksum
Data
0 16 31Appl process
UDP
IP
Appl process
Appl process
TCP
Appl process
Appl process
Port
Checksum: <UDP header, data, pseudoheader (protocol ID, src IP addr, dest IP addr in IP header + UDP length)>
Length: UDP header + data bytes
UserSpace
Kernel
Transmission Control Protocol (TCP)
• First proposed by Vinton Cerf and Robert Kahn, 1974– TCP/IP enabled computers of all sizes,
from different vendors, different OSs, to communicate with each other.
– Used by 80% of all traffic on the Internet
• Reliable, in-order delivery, connection-oriented, bye-stream service
A Simple File Transfer Application
• Server: – passive open and wait for connection
• Client:– Active open and initialize connection
establishment• After connection establishment
– Reliable data transport• Terminate connection
TCP Operation
• Sender application process only needs to provide a byte stream to the kernel
• Kernels on sending and receiving hosts operate TCP processes• Receiver application process only needs to read received
bytes from the assigned TCP buffers
4-Tuple Connection Identifier:•SrcPort, •SrcIPAddr, •DestPort, •DestIPAddr
TCP Operation Sequence
• TCP protocol completely implemented at the end hosts
• Sequence numbers maintained in bytes (remember, TCP serves a byte stream)
• Start of operation:– Connection Establishment by a Three-Way
Handshake algorithm– Consensus on Initial Sequence Number (ISN)
• Data Transfer:– Sends the data in packets, reliably and as fast as the
network/receiver permits• Finish:
– Both sides independently close their half of the connection
TCP Header Format
• Flags:– SYN– FIN– RESET– PUSH– URG– ACK
Connection Establishment
•Three-Way Handshake Algorithm– SYN and ACK flags in the header used
–Initial Sequence numbers x and y selected at random•Required to avoid
same number for previous incarnation on the same connection
Active participant(client)
Passive participant(server)
SYN, Seq#=x
SYN+ACK, Seq#=y
Ack#=x+1
ACK, Ack#=y+1Data+ACK
Data+ACK
Connection Establishment
Data transport
Termination
Connection tear-down
• Any side can terminate the connection• Each side closes its half of the connection independently
FIN
FIN-ACK
FIN-ACK
FIN
ACK
DATA Data write
Data ACK
TCP State-Transition
• Max segment lifetime (MSL): recommendation 120 sec
CLOSED
LISTEN
ESTABLISHED
SYN_RCVD SYN_SENT
FIN_WAIT_1
FIN_WAIT_2
CLOSE_WAIT
LAST_ACK
CLOSEDTIME_WAIT
CLOSING
Passive open Close
SYN/SYN+ACK
Close
Active open/SYN
SYN/SYN+ACK
ACK SYN+ACK/ACK
Send/SYN
Close/FIN
Close/FIN FIN/ACK
Event/Action
ACKACK
FIN/ACK
ACK+FIN/ACK Timeout after 2 x MSL
FIN/ACK
ACKClose/FIN
TCP Functions• Goal of TCP: Deliver data reliably and in order as fast as
possible (Throughput = bytes delivered/ time taken)
• Flow Control:– avoid that the sender sends data too fast so that the TCP
receiver cannot reliably receive and process it. • Error Control and Congestion Control:
– When packets are lost, it implies that the one or more queues in intermediate routers have overflowed.
– Retransmit lost packets– Scale back flow rate to reduce congestion– Congestion control and error control are intertwined using
the congestion window (cwnd)• TCP increases the sending rate to use the network (the
route) to full capacity or receiver capability– But scale back if congestion occurs or if receiver is flooded.
Flow Control• TCP uses a sliding window flow control protocol
– the receiver specifies the amount of data (in bytes) willing to buffer in the AdvertisedWindow field of each segment
– The sender can send only up to that amount of data before it must wait for an ack and window update from the receiver.
Sending Appl Receiving Appl
LastByteAcked LastByteSent
LastByteWritten
NextByteExpected LastByteRcvd
LastByteRead
LastByteRead < NextByteExp <= LastByteRcvd+1LastByteAcked <= LastByteSent <= LastByteWritten
AdvertisedWindow = MaxRcvBuffer- ((NextByteExp-1)-LastByteRead)
LastByteSent – LastByteAcked <= AdvertisedWindowEffWin = AdvertisedWin- (LastByteSent-LastByteAcked)
LastByteRcvd-LastByteRead<=MaxRcvBuffer
LastByteWritten – LastByteAcked <= MaxSendBuffer
TCP TCP
Sequence Number
• Protect against SequenceNum wraparound– Sliding window
• Seq # space >= 2 x WinSize• For TCP: 232 >> 2 x 216
– Seq # should not wraparound within a MSL (120 sec) period of time
– For OC-48 (2.5 Gbps), time until wraparound: 14 sec
• TCP extension to the sequence # space for protecting against seq # wrapping around– Add 32-bit timestamp as optional header
Keep the pipe full
• AdvertisedWindow: 216=>64 KB– Big enough to allow the sender to keep the
pipe full (assume that the receiver has enough buffer to handle the data)
– If RTT = 100 ms, • Delay x Bandwidth = 122 KB for 10 Mbps Ethernet• Delay x Bandwidth = 1.2 MB for 100 Mbps
Ethernet (AdvertisedWindow is not large enough)
• TCP Extension:– Scaling factor option for AdvertisedWindow,
• e.g. use 16-byte units of data
Triggering Transmission
• When to transmit a segment: – small segments subject to large overhead
• Reach max segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment– MSS = local MTU – IP & TCP header
• The sending process explicitly ask the TCP to transmit, “push”
TCP Deadlock
• TCP Deadlock– receiver advertises a window size of 0, the
sender stops sending data – the window size update from the receiver is
lost
• To solve it:– the sender starts the persist timer when
AdvertisedWindow = 0– When the persist timer expires, the sender
sends a small packet
TCP Silly Window Syndrome• TCP Silly Window Syndrome
– Sender has MSS bytes of data to send, but window is closed
– ACK arrives with a small window– Sender sends a small segment (high overhead)– Receiver advertise a small window– Sender sends a small receive segment– Repeat the above
• To solve: Nagle’s Algorithm– When the application have data to send
• If both available data and the window >= MSS– Send a full segment
• Else– If there is unACKed data in flight
» Buffer the new data until an ACK arrives– Else
» Send all the new data now
TCP Error Control
• Cumulative retransmission: ack the expected seq # in Ack field– Extension: selective ack (SACK), ack
additional blocks of received data in TCP optional header
• Adaptive retransmission– Adapt the retran timer to RTT
TCP Timeout• Original Algorithm:
– EstimatedRTT = x EstimatedRTT + (1-) x SampleRTT
– Timeout = 2 x EstimatedRTT– Issue: does not distinguish whether the ACK is for
original transmission or retransmission
• Karn/Partridge Algorithm– Whenever TCP retransmits a segment, it stops taking
samples of the RTT• Only measure SampleRTT for segments that have only
have been send once– Each time TCP retransmits, set the next timeout to
be twice the last timeout• Relieve congestion
10
TCP Timeout (Cont)• TCP Timeout
– If timeout too soon, unnecessarily retransmit and add load to network
– If timeout too late, increase latency
• Jacobson/Karels Algorithm: better RTT estimation by considering the variance
Difference = SampleRTT - EstimatedRTTEstimatedRTT = EstimatedRTT + ( x Difference)Deviation = Deviation + (|Difference|- Deviation)Timeout = x EstimatedRTT + x Deviation(default: set = 1 and = 4, ) 10
Congestion
• TCP assumes packet loss as congestion
Source1
Source2
Source3
Dest2
Dest1
TCP Congestion Control
• TCP sends packets into network without reservation– Try to use network resource (bandwidth,
buffer) as much as it can
• As congestion occurs, scales back• Strategy:
– Conservatively increases packet sending rate if no congestion
– Quickly reduce sending rate as congestion detected (timeout)
Additive increase/multiplicative decrease
(AIMD) • Maintain a CongestionWindow
– MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
– EffectiveWin = MaxWindow – (LastByteSent – LastByteAcked)
• Decrease congestion window aggressively and increase it conservatively– A simple algorithm: Additive increase/multiplicative
decrease (AIMD) – Each time a timeout occurs, congestion window size
(cwnd)• cwnd = cwnd/2
– cwnd = Max(Cwnd, MSS)– Each time an ACK received
• Increment = MSS x (MSS/Cwnd)– Cwnd += Increment
• (CongestionWindow increase by one packet or MSS after all packets sent out during last RTT have been acked)
Additive increase/multiplicative decrease (Cont)
• TCP sawtooth pattern
Time
Con
gest
ionW
indo
w
Siz
e
• Issues with additive increase: – takes too long to ramp up a
connection from the beginning
– The entire advertised window may be reopened when a lost packet retransmitted and a single cumulative ACK is received by the sender
TCP “Slow Start”• When timeout
– Slow Start Threshold (SSThresh) = cwnd/2– Cwnd = MSS
• when receive an ack – If cwnd <= SSThresh (Slow start phase) incr = MSS (exponential growth, double cwnd
every RTT)– Else (congestion avoidance mode) incr = MSS x MSS/cwnd (linear growth, add 1 MSS
per RTT)Cwnd = min(cwnd+incr, TCP_MAXWIN)
TCP Slow Start (Cont)
• Slow startTime
Con
gest
ionW
indo
w
Siz
e
Slow-start and congestion avoidance
• A closer look
Fast Retransmit•TCP timeout issue:
– may be a long time periods of time during which the connection went dead while waiting for a timer expire
•Solution: – Add fast retransmit (not
replace regular timeout): •Everytime receiver
receives a packet (out-of-order), send a duplicate ACK
•Sender retransmit the missing packet after it receives some number of duplicate ACKs (e.g. 3 duplicate ACKs)
ACK 1
ACK 2
ACK 2
ACK 2
ACK 2
ACK 6
PKT 1PKT 2
PKT 4
PKT 5PKT 6
PKT 3Retran
PKT 3
Fast Recovery
• during congestion avoidance mode, when packets (detected through 3 duplicate ACKs) are not received, the congestion window size is reduced to the slow-start threshold, rather than the small initial value 1 MSS.
– Cwnd = SSThresh = cwnd/2• (escape slow start phase when fast
retransmit detects a lost packet and additive increase begins)
TCP Sender Operation• TCP operation is paced by its ACKs
Wait for ACK
(operate in slow-start or congestion avoidance)•Measure RTT if applicable•Set cwnd_start = purge_acked_pkts()•Set cwnd = cwnd + increment_value()•Send new packets•Retart timer if applicable
•If 3rd duplicate ACK•Fast_retransmit()•Fast_recovery()
•Set ssthresh = cwnd/2, cwnd = 1•Retransmit cwnd_start packet
ACKs indicate lossless operation
ACKs indicate possible losses
Duplicate ACK
No
AC
Ks
Tim
eout
New ACK
TCP Reno• Most popular TCP flavor; implemented in most
operating systems• Includes Fast Retransmit and Fast Recovery
Real-time Traffic• Quality of Service (QoS) factors: Reliability, Delay and Jitter
– Late arrival = loss– Because of possibly unbounded retransmissions in TCP, large
delay and jitter may ensue.• Real-time applications prefer UDP instead.• Jitter can be smoothed out by a playback buffer, but initial
buffering introduces a longer delay and reduce the interactivity.– Depends on application delay tolerance
•Real-time transport protocol (RTP) operates over UDP, – RTP modules run in user-
space. RTP libraries included in the application.
– RTP provides no error recovery and congestion control; Applications handle all aspects themselves.
Application
RTP
UDP
IP
Subnet
RTP
• RTP standard: RFC 3550
V=2 P X CC MPayload Type
Sequence number
TimestampSync. source (SSRC) identifierContributing source (CSRC) identifier (optional)
..........
Extension header
RTP payload
Padding length
RTCP
• Real-time transport control protocol (RTCP)– provides out-of-band statistics and control
information for an RTP flow– partners RTP in the delivery and packaging
of multimedia data, but does not transport any media streams itself.
– RTP uses even port number, whereas RTCP uses the next higher odd port number.
RTCP Functions• RTCP Functions
– Gather statistics on quality aspects of the media distribution
• can be used by the source for adaptive media encoding (codec) and detection of transmission faults
– Convey the canonical end-point identifiers (CNAME) to all session participants
• SSRC may change during a session, but CNAME represents the unique identity of a sender
– Correlate and synchronize different media streams• RTCP messages
– Source reports– Receiver reports– Source descriptions– Application-specific control packets
Homework
• 5.13• 5.16• 5.28• 5.34• 5.39
Due 4/17