CSCI 547 Transport Layer 3-1
Chapter 3: Transport Layer
Computer Networking: A Top Down Approach, 4th edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2007.
A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
- If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
- If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2007 J.F. Kurose and K.W. Ross, All Rights Reserved
CSCI 547 Transport Layer 3-2
OSI Protocol Suite
Read http://en.wikipedia.org/wiki/OSI_model
CSCI 547 Transport Layer 3-3
TCP/IP Protocol Suite
[Figure: protocol layers marked as defined in the TCP/IP protocol suite or undefined]
CSCI 547 Transport Layer 3-4
Chapter 3: Transport Layer
Our goals:
- understand principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- learn about transport layer protocols (TCP & UDP):
  - UDP: connectionless transport
  - TCP: connection-oriented transport
  - TCP congestion control
CSCI 547 Transport Layer 3-5
Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
3.6 Principles of congestion control
3.7 TCP congestion control
CSCI 547 Transport Layer 3-6
Transport services and protocols
- provide end-to-end logical communication (virtual communication) between app processes running on different hosts
- transport protocols run in end systems
  - send side: breaks app messages into segments, passes to network layer
  - rcv side: reassembles segments into messages, passes to app layer
- more than one transport protocol available to apps
  - Internet: TCP and UDP
[Figure: two end hosts with full stacks (application, transport, network, data link, physical) connected through intermediate nodes that have only network, data link, and physical layers; logical end-end transport between the two hosts]
TCP (or UDP) is an end-to-end protocol, in contrast to the lower-layer protocols, which are hop-to-hop protocols.
CSCI 547 Transport Layer 3-7
Transport vs. network layer
- network layer: logical communication between hosts
- transport layer: logical communication between processes
  - uses and enhances the services of the network layer
Remember the concept of “service provider” & “service user”?
Household analogy: 12 kids sending letters to 12 kids
- processes = kids
- app messages = letters in envelopes
- hosts = houses
- transport protocol = Ann and Bill
- network-layer protocol = postal service
CSCI 547 Transport Layer 3-8
Internet transport-layer protocols
- reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - error recovery
  - connection setup
- unreliable, unordered delivery: UDP
  - no-frills extension of “best-effort” IP
- services not available:
  - delay guarantees
  - bandwidth guarantees
  - Why not available?
[Figure: two end hosts with full protocol stacks connected through network-layer-only intermediate nodes; logical end-end transport]
CSCI 547 Transport Layer 3-9
A layer’s Services
In a layered model of network protocols, a layer provides a predefined set of services, e.g. connection management, message delivery, etc.
A higher layer uses the services of a lower layer
Therefore, the concept of “service user”, “service provider”
CSCI 547 Transport Layer 3-10
Layer service—service user, service provider
[Figure: layered structure of protocols. The layer N entity provides services to layer N + 1, uses services from layer N - 1, and communicates with its peer layer N entity using the Nth-layer protocol.]
CSCI 547 Transport Layer 3-11
Connection services of any layer except the physical layer
3 possible services provided to the upper layer:
1. Unacknowledged Connectionless Service
- Also called Datagram (DG) service or simply Connectionless
- Frames can get lost
- No flow control
- No error control
- Like regular mail
2. Acknowledged Connectionless Service
- No connection is established prior to data transmission
- A message will be answered by a message (= acknowledged)
- Like a return receipt
- Usually not used in networking
3. Connection-oriented Service
- Also called Virtual Circuit (VC) service
- Connection management
- Error recovery
- Flow control
- Ensures correct delivery of frames
CSCI 547 Transport Layer 3-12
CSCI 547 Transport Layer 3-13
Connection-oriented (VC) vs Connectionless (DG)
ISSUES                          Connection-oriented                          Connectionless
----------------------------------------------------------------------------------------------------------
Initial setup and termination   Required                                     No
Routing decisions               Routing only done on initial VC setup        Each packet routed independently
Connection state                Routers keep state info. for each connection Routers do not hold state info.
Need for full address           Needed during initial setup;                 Full address always needed
                                afterwards only VC# needed
Packet sequencing               Guaranteed                                   Not guaranteed
Error recovery                  Handles error conditions                     Left to a higher layer
CSCI 547 Transport Layer 3-14
Connection-oriented (VC) vs Connectionless (DG), cont’d
ISSUES                          Connection-oriented                          Connectionless
----------------------------------------------------------------------------------------------------------
Congestion control              Easy                                         Difficult
QOS                             Easy                                         Difficult
Flow control                    Handles                                      Not done
Overhead                        High                                         Low
Examples                        TCP                                          UDP, IP, IPX, ISO-IP
CSCI 547 Transport Layer 3-16
Multiplexing/demultiplexing
- Multiplexing at send host: gets data from multiple sockets, envelops the data with a header (later used for demultiplexing), and passes it to the network layer
- Demultiplexing at rcv host: delivers received segments to the correct socket
[Figure: host 1, host 2, and host 3, each with application, transport, network, link, and physical layers; processes P1 to P5 attach to the transport layer through sockets]
CSCI 547 Transport Layer 3-17
How demultiplexing works
- host receives IP datagrams
  - each datagram has source IP address, destination IP address
  - each datagram carries 1 transport-layer segment
  - each segment has source, destination port number
- host uses IP addresses & port numbers to direct segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide. An IP header (20 bytes or more) is followed by the UDP or TCP header, which starts with source port # and destination port #, and then the data.]
CSCI 547 Transport Layer 3-18
Connectionless demultiplexing
- Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(12534);
  DatagramSocket mySocket2 = new DatagramSocket(12535);
- UDP socket identified by two-tuple: (dest IP address, dest port number)
- When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to the socket with that port number
- IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
CSCI 547 Transport Layer 3-19
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
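[Figure: clients with IP A and IP B and a server with IP C whose socket is bound to port 6428. One client sends SP: 9157, DP: 6428 and is answered with SP: 6428, DP: 9157; the other sends SP: 5775, DP: 6428 and is answered with SP: 6428, DP: 5775.]
SP (source port #) provides the “return address”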
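The “return address” idea can be tried on one machine. The following is a minimal sketch, not the slide author's code: the class name `UdpReturnAddressSketch` and the method `echoOnce` are hypothetical, and it binds to OS-chosen ports (port 0) on the loopback interface instead of the slide's port 6428.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class UdpReturnAddressSketch {
    /** Sends msg to a local UDP "server" socket and returns the echoed reply. */
    static String echoOnce(String msg) {
        // Port 0 asks the OS for any free port (an assumption of this sketch).
        try (DatagramSocket server = new DatagramSocket(0);
             DatagramSocket client = new DatagramSocket(0)) {
            server.setSoTimeout(2000); // fail instead of blocking forever
            client.setSoTimeout(2000);

            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), server.getLocalPort()));

            // The received datagram carries the sender's IP address and source port.
            DatagramPacket request = new DatagramPacket(new byte[1024], 1024);
            server.receive(request);

            // Reply to that "return address".
            server.send(new DatagramPacket(request.getData(), request.getLength(),
                    request.getAddress(), request.getPort()));

            DatagramPacket reply = new DatagramPacket(new byte[1024], 1024);
            client.receive(reply);
            return new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

The server never needs to know the client's port in advance; it reads it from the incoming datagram, which is the role SP plays in the figure.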
CSCI 547 Transport Layer 3-20
Connection-oriented demux
- TCP socket connection identified by 4-tuple (unique):
  - source IP address
  - source port number
  - dest IP address
  - dest port number
- recv host uses all four values to direct segment to appropriate socket
- Server host may support many simultaneous TCP socket connections:
  - each socket connection identified by its own 4-tuple
- Web servers have different sockets for each connecting client
  - non-persistent HTTP will have different socket connections for each request
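The difference between the UDP lookup key and the TCP 4-tuple can be sketched with two lookup tables. This is only an illustration under assumptions: `DemuxSketch`, its method names, and the socket labels are hypothetical, and a real stack keeps this state inside the operating system.

```java
import java.util.HashMap;
import java.util.Map;

final class DemuxSketch {
    // UDP: looked up by destination port only (the destination IP is this host).
    private final Map<Integer, String> udpSockets = new HashMap<>();
    // TCP: looked up by the full 4-tuple, joined into one string key.
    private final Map<String, String> tcpSockets = new HashMap<>();

    static String fourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    void bindUdp(int port, String socketName) {
        udpSockets.put(port, socketName);
    }

    void openTcp(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(fourTuple(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    /** Returns the UDP socket for a segment, or null. Source fields are ignored. */
    String demuxUdp(String srcIp, int srcPort, int dstPort) {
        return udpSockets.get(dstPort);
    }

    /** Returns the TCP socket for a segment, or null. All four values matter. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(fourTuple(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        DemuxSketch d = new DemuxSketch();
        d.bindUdp(6428, "udp-6428");
        d.openTcp("A", 9157, "C", 80, "tcp-1");
        d.openTcp("B", 9157, "C", 80, "tcp-2");
        System.out.println(d.demuxUdp("A", 9157, 6428));
        System.out.println(d.demuxUdp("B", 5775, 6428));
        System.out.println(d.demuxTcp("A", 9157, "C", 80));
        System.out.println(d.demuxTcp("B", 9157, "C", 80));
    }
}
```

Two UDP senders with different source addresses land on the same socket, while two TCP clients that both use source port 9157 still reach different sockets because their source IP addresses differ.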
CSCI 547 Transport Layer 3-21
Connection-oriented demux (cont)
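[Figure: clients with IP A and IP B and a web server with IP C. Three segments arrive at server port 80: (S-IP: A, SP: 9157, D-IP: C, DP: 80), (S-IP: B, SP: 9157, D-IP: C, DP: 80), and (S-IP: B, SP: 5775, D-IP: C, DP: 80). Each is delivered through TCP to a different socket and server process.]
4-tuples are unique.
Pn denotes processes, not ports.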
CSCI 547 Transport Layer 3-22
Connection-oriented demux: Concurrent Web Server
[Figure: the same three segments, (S-IP: A, SP: 9157, D-IP: C, DP: 80), (S-IP: B, SP: 9157, D-IP: C, DP: 80), and (S-IP: B, SP: 5775, D-IP: C, DP: 80), arriving at a concurrent web server with IP C, where child processes handle the separate connections]
CSCI 547 Transport Layer 3-24
UDP: User Datagram Protocol [RFC 768]
- “no frills,” “bare bones” Internet transport protocol
- “best effort” service, UDP segments may be:
  - lost
  - delivered out of order to app
- connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
Why is there a UDP?
- no connection establishment, so no delay
- simple: no connection state at sender or receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired, but the receiver may not keep pace with the sender!
CSCI 547 Transport Layer 3-25
UDP: more
- often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
- other apps that use UDP
  - DNS
  - SNMP
- achieve reliable transfer over UDP? We need to add reliability at the application layer
  - application-level error recovery!
[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, then application data (message). Length is in bytes of the UDP segment, including header.]
CSCI 547 Transport Layer 3-26
Apps use TCP or UDP ?
CSCI 547 Transport Layer 3-27
UDP checksum
Goal: detect “errors” in transmitted segment
Sender:
- treat segment contents as sequence of 16-bit integers
- checksum: addition (1’s complement sum) of segment contents
- sender puts checksum value into UDP checksum field
Receiver:
- compute checksum of received segment
- check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later….
CSCI 547 Transport Layer 3-28
Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
Example: add two 16-bit integers

                           1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                           1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound               1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum                        1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum (1’s complement)  0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
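The wraparound rule can be checked with a few lines of code. This is a minimal sketch with a hypothetical class name, `InternetChecksumSketch`; it takes the slide's two operands to be the 16-bit words 1110011001100110 (0xE666) and 1101010101010101 (0xD555) and works on 16-bit words only, leaving out the UDP pseudo-header.

```java
final class InternetChecksumSketch {
    /** 1's complement sum of 16-bit words: a carry out of bit 15 is added back in. */
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if (sum > 0xFFFF) {
                sum = (sum & 0xFFFF) + 1; // wraparound
            }
        }
        return sum;
    }

    /** The checksum is the 1's complement (bit inversion) of the sum. */
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver check: data words plus the checksum must add up to all ones. */
    static boolean verify(int[] words, int checksum) {
        int[] all = new int[words.length + 1];
        System.arraycopy(words, 0, all, 0, words.length);
        all[words.length] = checksum;
        return onesComplementSum(all) == 0xFFFF;
    }

    public static void main(String[] args) {
        int[] words = {0xE666, 0xD555};
        System.out.println(Integer.toBinaryString(onesComplementSum(words)));
        System.out.println(Integer.toHexString(checksum(words)));
        System.out.println(verify(words, checksum(words)));
    }
}
```

Flipping any single bit in a data word changes the sum, so `verify` fails; two errors that cancel each other out can still slip through, which is the “maybe errors nonetheless” caveat.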
CSCI 547 Transport Layer 3-30
Principles of Reliable data transfer
- important in app., transport, link layers
- top-10 list of important networking topics!
- characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
CSCI 547 Transport Layer 3-33
Reliable data transfer: getting started
[Figure: send side and receive side of the reliable data transfer protocol, on top of an unreliable channel]
- rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver’s upper layer
- udt_send(): called by rdt to transfer packet over unreliable channel to receiver
- rdt_rcv(): called when packet arrives on rcv-side of channel
- deliver_data(): called by rdt to deliver data to upper layer
CSCI 547 Transport Layer 3-34
Reliable data transfer: getting started
We’ll:
- incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow in both directions!
- use finite state machines (FSM) to specify sender, receiver
FSM notation: each transition from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition.
state: when in this “state”, the next state is uniquely determined by the next event
CSCI 547 Transport Layer 3-35
Rdt1.0: reliable transfer over a reliable channel
- underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
- separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
Sender FSM (single state: Wait for call from above):
- rdt_send(data): packet = make_pkt(data); udt_send(packet)
Receiver FSM (single state: Wait for call from below):
- rdt_rcv(packet): extract(packet,data); deliver_data(data)
Legend: each state has a state name; each state transition is labeled with its input event and its output action.
CSCI 547 Transport Layer 3-36
Rdt2.0: channel with bit errors
- underlying channel may get hit by noise
  - using checksum to detect bit errors
- the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) rcvr -> sender
CSCI 547 Transport Layer 3-37
rdt2.0: FSM specification
Sender FSM:
- state Wait for call from above
  - rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
- state Wait for ACK or NAK
  - rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && isACK(rcvpkt): do nothing; go to Wait for call from above
Receiver FSM:
- state Wait for call from below
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
(&& = and)
CSCI 547 Transport Layer 3-38
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs with the error-free path marked as steps 1 to 4]
CSCI 547 Transport Layer 3-39
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs with the error path marked as steps 1 to 8]
rdt2.0 has a fatal flaw! What is it?
CSCI 547 Transport Layer 3-40
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
- sender doesn’t know what happened at receiver!
- can’t just retransmit: possible duplicate
Handling duplicates:
- sender retransmits current pkt if ACK/NAK garbled
- sender adds sequence number to each pkt
- receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: sender sends one packet, then waits for receiver response
CSCI 547 Transport Layer 3-41
rdt2.1: sender, handles garbled ACK/NAKs
- state Wait for call 0 from above
  - rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 0
- state Wait for ACK or NAK 0
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): do nothing; go to Wait for call 1 from above
- state Wait for call 1 from above
  - rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 1
- state Wait for ACK or NAK 1
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): do nothing; go to Wait for call 0 from above
(&& = “and”, || = “or”)
CSCI 547 Transport Layer 3-42
rdt2.1: receiver, handles garbled ACK/NAKs
- state Wait for 0 from below
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to Wait for 1 from below
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay (duplicate pkt)
- state Wait for 1 from below
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to Wait for 0 from below
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay (duplicate pkt)
CSCI 547 Transport Layer 3-43
rdt2.1: discussion
Sender:
- seq # added to pkt
- two seq. #’s (0,1) will suffice. Why?
- must check if received ACK/NAK is corrupted
- twice as many states
  - state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
- must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
- note: receiver cannot know if its last ACK/NAK was received correctly at sender
CSCI 547 Transport Layer 3-44
rdt2.2: a NAK-free protocol
- same functionality as rdt2.1, but using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt
CSCI 547 Transport Layer 3-45
rdt2.2: sender, receiver fragments—NAK-free
Sender FSM fragment:
- state Wait for call 0 from above
  - rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK 0
- state Wait for ACK 0
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): udt_send(sndpkt); stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): do nothing; leave Wait for ACK 0
Receiver FSM fragment:
- entering Wait for 0 from below
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
- state Wait for 0 from below
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) ): udt_send(sndpkt); stay
CSCI 547 Transport Layer 3-46
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs are completely lost and do not arrive at the destination)
- this can happen when an entire packet is hit by noise or when the beginning of a packet is not recognized by the receiver (synchronization bits lost)
- checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits “reasonable” amount of time for ACK
- retransmits if no ACK received within the time limit (timeout)
- if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #’s already handles this
  - receiver must specify seq # of pkt being ACKed
- requires countdown timer
CSCI 547 Transport Layer 3-47
Ethernet frame format
If these bits are hit by noise, a receiver will not recognize a frame—so, the frame is completely lost !
CSCI 547 Transport Layer 3-48
rdt3.0 sender
- state Wait for call 0 from above
  - rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK0
  - rdt_rcv(rcvpkt): do nothing; stay
- state Wait for ACK0
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): do nothing; stay
  - timeout: udt_send(sndpkt); start_timer; stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer; go to Wait for call 1 from above
- state Wait for call 1 from above
  - rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK1
  - rdt_rcv(rcvpkt): do nothing; stay
- state Wait for ACK1
  - rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ): do nothing; stay
  - timeout: udt_send(sndpkt); start_timer; stay
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer; go to Wait for call 0 from above
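The rdt3.0 sender can be sketched as a small state machine. This is a simplified illustration, not the slide's specification: the class `Rdt3SenderSketch` is hypothetical, the four FSM states are folded into two states plus an alternating sequence bit, the channel is a list that records what udt_send would transmit, and the timer is only a flag driven by explicit `onTimeout` calls.

```java
import java.util.ArrayList;
import java.util.List;

final class Rdt3SenderSketch {
    enum State { WAIT_CALL, WAIT_ACK }

    private State state = State.WAIT_CALL;
    private int seq = 0;                 // alternating bit: 0 or 1
    private String lastPayload;
    private boolean timerRunning = false;
    final List<String> channel = new ArrayList<>(); // stands in for udt_send

    /** rdt_send(data): accepted only while waiting for a call from above. */
    boolean rdtSend(String data) {
        if (state != State.WAIT_CALL) {
            return false;                // stop and wait: one packet at a time
        }
        lastPayload = data;
        transmit();
        state = State.WAIT_ACK;
        return true;
    }

    /** timeout: retransmit the current packet and restart the timer. */
    void onTimeout() {
        if (state == State.WAIT_ACK) {
            transmit();
        }
    }

    /** rdt_rcv(rcvpkt) for an ACK carrying ackSeq. */
    void onAck(int ackSeq, boolean corrupt) {
        if (state != State.WAIT_ACK) {
            return;                      // stray ACK while idle: do nothing
        }
        if (corrupt || ackSeq != seq) {
            return;                      // do nothing; the timeout will recover
        }
        timerRunning = false;            // stop_timer
        seq = 1 - seq;
        state = State.WAIT_CALL;
    }

    private void transmit() {
        channel.add(seq + ":" + lastPayload);
        timerRunning = true;             // start_timer
    }

    State state() { return state; }
    boolean timerRunning() { return timerRunning; }

    public static void main(String[] args) {
        Rdt3SenderSketch s = new Rdt3SenderSketch();
        s.rdtSend("a");
        s.onTimeout();                   // packet or ACK lost: resend
        s.onAck(0, false);
        s.rdtSend("b");
        System.out.println(s.channel);
    }
}
```

A wrong or corrupt ACK triggers no retransmission here; only the timeout does, matching the “do nothing” transitions in the FSM.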
CSCI 547 Transport Layer 3-49
rdt3.0 in action
CSCI 547 Transport Layer 3-50
rdt3.0 in action
CSCI 547 Transport Layer 3-51
Performance of rdt3.0
- rdt3.0 works, but performance stinks
- example: 1 Gbps link, 15 ms end-end prop. delay, 1KB packet:

$T_{transmit} = \frac{L}{R} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \text{microsec}$

where L = packet length in bits and R = transmission rate in bps.
- U_sender: utilization, the fraction of time the sender is busy sending (time taken for transmission of data divided by total time)

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027 = 0.027\%$

(times in milliseconds: L/R = 0.008 ms = 8 microseconds, RTT = 30 ms)
- 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
- network protocol limits use of physical resources!
CSCI 547 Transport Layer 3-52
rdt3.0: stop-and-wait protocol
[Figure: sender/receiver timeline]
- first packet bit transmitted, t = 0
- last packet bit transmitted, t = L / R
- first packet bit arrives
- last packet bit arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R
Utilization:

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$ (times in milliseconds)

Assumption: the size of the ACK is very small, so ignore the time taken to transmit the ACK
CSCI 547 Transport Layer 3-53
To increase Utilization ?
$U = \frac{\text{time taken for transmission of data}}{\text{total time}} = \frac{L/R}{RTT + L/R} = \frac{L/R}{D/S + L/R} = \frac{1}{1 + \frac{D/S}{L/R}}$

where $RTT = D/S$, D = total distance, and S = speed of signal.
- We want utilization as high as possible. Maximum utilization possible is 1 (100%).
- To get closer to the max, we need to make $\frac{D/S}{L/R}$ as small as possible, but D, S, R are fixed. Therefore, what should we do?
CSCI 547 Transport Layer 3-54
To increase Utilization?
- We should increase “L”, the length of the packet!
- What is the disadvantage of having a long packet?
  - Higher probability of error!
  - Also, increased buffer size
CSCI 547 Transport Layer 3-55
Dilemma
- To achieve high utilization -> longer packet
- Longer packet -> higher probability of error
- Two conflicting aspects!
- Solution is “windowing (pipelining)”
- Main idea of windowing: rather than sending a long packet, send many small packets
CSCI 547 Transport Layer 3-56
3 ARQ (Automatic Repeat Request) protocols
To cope with the problems of lost packets and error packets (either data packets or ACK packets), 3 popular protocols exist:
1) Stop-and-Wait ARQ (rdt 3.0)
2) Go-Back-N ARQ
3) Selective Repeat ARQ
2) & 3) are sometimes called “pipelined protocols” or “windowing”
1) can be classified as windowing with a window size of “1”
CSCI 547 Transport Layer 3-57
Pipelined protocols
Pipelining: sender made capable of multiple, “in-flight”, yet-to-be-acknowledged pkts; it sends several numbered packets: P1, P2, P3, …
- range of sequence numbers must be bigger than [0, 1] (as in stop-and-wait)
- buffering required at sender and/or receiver
Two generic forms of pipelined protocols: Go-Back-N, selective repeat
[Figure: data packets and ACK packets in flight between sender and receiver]
CSCI 547 Transport Layer 3-58
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
- first packet bit transmitted, t = 0
- last bit transmitted, t = L / R
- first packet bit arrives
- last packet bit arrives, send ACK
- last bit of 2nd packet arrives, send ACK
- last bit of 3rd packet arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R
Example: 1 Gbps link, 15 ms end-end prop. delay, 1KB packet

$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$ (times in milliseconds)

Increase utilization by a factor of 3!
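The utilization figures for stop-and-wait and for three packets in flight can be reproduced numerically. A small sketch, assuming a hypothetical helper `UtilizationSketch.utilization` whose `packetsInFlight` parameter is 1 for stop-and-wait; it ignores ACK transmission time, as the slides do.

```java
final class UtilizationSketch {
    /**
     * Sender utilization: (packetsInFlight * L/R) / (RTT + L/R).
     * packetBits = L, rateBps = R, rttSeconds = RTT.
     */
    static double utilization(double packetBits, double rateBps,
                              double rttSeconds, int packetsInFlight) {
        double transmitSeconds = packetBits / rateBps; // L / R
        return packetsInFlight * transmitSeconds / (rttSeconds + transmitSeconds);
    }

    public static void main(String[] args) {
        // 1 KB packet (8000 bits), 1 Gbps link, 30 ms RTT (2 x 15 ms propagation).
        System.out.println(utilization(8000, 1e9, 0.030, 1)); // stop-and-wait
        System.out.println(utilization(8000, 1e9, 0.030, 3)); // 3 packets pipelined
    }
}
```

With these inputs the first value is about 0.00027 and the second about 0.0008, so even a three-packet window leaves the link idle more than 99.9% of the time.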
CSCI 547 Transport Layer 3-59
Go-Back-N
Sender:
- k-bit seq # in pkt header
- “window” of up to N consecutive unack’ed pkts allowed
- ACK(n): ACKs all pkts up to, including seq # n (“cumulative ACK”)
  - may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in window
In actual protocols, ACK(n) is used as “the next expected packet #”. In other words, ACK(n) says “I received all packets up to n-1 and I am expecting packet n as the next packet”.
CSCI 547 Transport Layer 3-60
Frame format of HDLC
[Figure: HDLC frame format and frame exchange. N(S) is the sequence number; N(R) is the ACK, i.e., the next expected packet number. Frames shown: N(S)=0, N(R)=0; N(S)=0, N(R)=1; N(S)=1, N(R)=1.]
CSCI 547 Transport Layer 3-61
TCP Packet format
Byte numbers
CSCI 547 Transport Layer 3-62
Go-Back-N: sender extended FSM
Single state: Wait
- initially:
    base=1
    nextseqnum=1
- rdt_send(data):
    if (nextseqnum < base+N) {
      sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
      udt_send(sndpkt[nextseqnum])
      if (base == nextseqnum)
        start_timer
      nextseqnum++
    }
    else
      refuse_data(data)
- timeout:
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])
- rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
      stop_timer
    else
      start_timer
- rdt_rcv(rcvpkt) && corrupt(rcvpkt): do nothing
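The Go-Back-N sender logic can be sketched in Java as follows. This is an illustration under assumptions rather than the FSM itself: `GbnSenderSketch` is a hypothetical class, payloads are omitted so only sequence numbers are tracked, the timer is a flag driven by explicit `onTimeout` calls, and ACKs outside the current window are ignored (a guard the FSM does not spell out).

```java
import java.util.ArrayList;
import java.util.List;

final class GbnSenderSketch {
    private final int windowSize;        // N
    private int base = 1;                // oldest unACKed seq #
    private int nextSeqNum = 1;          // next seq # to be used
    private boolean timerRunning = false;
    final List<Integer> sent = new ArrayList<>(); // seq #s passed to udt_send, in order

    GbnSenderSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** rdt_send: returns false (refuse_data) when the window is full. */
    boolean rdtSend() {
        if (nextSeqNum >= base + windowSize) {
            return false;
        }
        sent.add(nextSeqNum);
        if (base == nextSeqNum) {
            timerRunning = true;         // start_timer for the oldest in-flight pkt
        }
        nextSeqNum++;
        return true;
    }

    /** Cumulative ACK(n): everything up to and including n is acknowledged. */
    void onAck(int ackNum) {
        if (ackNum < base || ackNum >= nextSeqNum) {
            return;                      // duplicate or out-of-range ACK (sketch-only guard)
        }
        base = ackNum + 1;
        timerRunning = (base != nextSeqNum); // stop if nothing is in flight, else restart
    }

    /** timeout: go back N, resending every packet from base to nextSeqNum - 1. */
    void onTimeout() {
        for (int s = base; s < nextSeqNum; s++) {
            sent.add(s);
        }
        timerRunning = (base != nextSeqNum);
    }

    int base() { return base; }
    boolean timerRunning() { return timerRunning; }

    public static void main(String[] args) {
        GbnSenderSketch g = new GbnSenderSketch(3);
        g.rdtSend(); g.rdtSend(); g.rdtSend();
        g.onAck(1);
        g.rdtSend();
        g.onTimeout();
        System.out.println(g.sent);
    }
}
```

One lost packet costs a resend of the whole outstanding window, which is the price Go-Back-N pays for keeping the receiver simple.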
CSCI 547 Transport Layer 3-63
GBN: receiver extended FSM
- ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- When out-of-order pkt received:
  - discard (don’t buffer) -> no receiver buffering!
  - re-ACK pkt with highest in-order seq #
Receiver FSM (single state: Wait):
- initially: expectedseqnum=1; sndpkt = make_pkt(expectedseqnum,ACK,chksum)
- rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(expectedseqnum,ACK,chksum); udt_send(sndpkt); expectedseqnum++
- default: udt_send(sndpkt)
CSCI 547 Transport Layer 3-64
Go-Back-N in action
What are buffer sizes needed?
For sender = ? For receiver = ?
CSCI 547 Transport Layer 3-65
Selective Repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #’s (= window size)
  - limits # of sent but unACKed pkts
CSCI 547 Transport Layer 3-66
Selective repeat: sender & receiver windows
CSCI 547 Transport Layer 3-67
Selective repeat
Sender:
- data from above: if next available seq # in window, send pkt & start timer for the packet
- timeout(n): resend pkt n, restart timer
- ACK(n) came in & in the range [sendbase, sendbase+N-1]:
  - mark pkt n as “done”
  - if n is smallest unACKed pkt, advance window base to next unACKed seq #
Receiver:
- pkt n came in & in the range [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer it
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n came in & in the range [rcvbase-N, rcvbase-1]:
  - send ACK(n)
- otherwise: ignore
CSCI 547 Transport Layer 3-68
Selective repeat in action (window size = 4)
[Figure: sender/receiver timeline; the window advances as packets are acknowledged]
CSCI 547 Transport Layer 3-69
Selective repeat: dilemma
Example:
- seq #’s: 0, 1, 2, 3 (using 2 bits for seq #)
- window size = 3
- receiver sees no difference in two scenarios!
- incorrectly passes duplicate data as new in (a)
Q: What is the relationship between seq # size and window size?
A: Window size <= N/2, where N = size of the sequence number space (e.g., for 0, 1, 2, 3, N = 4, so the window size can be at most 2)
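The dilemma can be checked mechanically. The sketch below, with a hypothetical class `SrWindowCheck`, considers the worst case in which the receiver got a full window starting at seq # 0 and advanced, while every ACK was lost so the sender retransmits the old window. If any sequence number of the receiver's new window also belongs to the old window, a retransmission would be mistaken for new data.

```java
final class SrWindowCheck {
    /**
     * True if, with sequence numbers 0..seqSpace-1 and the given window size,
     * the receiver's advanced window shares a sequence number with the old one.
     * Assumes 1 <= window <= seqSpace.
     */
    static boolean ambiguous(int seqSpace, int window) {
        for (int k = 0; k < window; k++) {
            int newSeq = (window + k) % seqSpace; // seq # in the receiver's new window
            if (newSeq < window) {
                return true;                      // also in the old window [0, window-1]
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(ambiguous(4, 3)); // the slide's example
        System.out.println(ambiguous(4, 2));
    }
}
```

For a 2-bit sequence space the check flags window size 3 and accepts window size 2, in line with keeping the window to at most half the sequence number space.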
CSCI 547 Transport Layer 3-70
TCP implementations: Go-Back-N or Selective-Repeat?
- Hybrid of Go-Back-N & Selective-Repeat
- Originally, TCP was Go-Back-N, but later the SACK option was added (see RFC 2018)
TCP initialization packet captured using Ethereal:
Transmission Control Protocol, Src Port: 1459 (1459), Dst Port: ftp (21), Seq: 276805644, Len: 0
  Source port: 1459 (1459)
  Destination port: ftp (21)
  Sequence number: 276805644
  Header length: 28 bytes
  Flags: 0x0002 (SYN)
  Window size: 65535
  Checksum: 0x0637 [correct]
  Options: (8 bytes)
    Maximum segment size: 1260 bytes
    NOP
    NOP
    SACK permitted
CSCI 547 Transport Layer 3-72
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- point-to-point: one sender, one receiver (unicast)
- reliable, in-order byte stream: no “message boundaries”
- pipelined: TCP congestion and flow control set window size
- send & receive buffers: needed for windowing
- full duplex data:
  - bi-directional data flow in a connection
  - MSS: maximum segment size, set by MTU of link layer
- connection-oriented: initialization, maintenance, termination
- flow controlled: receiver controls the flow so that sender will not overwhelm receiver
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
CSCI 547 Transport Layer 3-73
TCP segment structure
[Figure: TCP segment structure, 32 bits wide]
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flag bits U A P R S F, receive window
- checksum, urg data ptr
- options (variable length)
- application data (variable length)
Field notes:
- sequence number, acknowledgement number: byte numbering of data (not packets or segments!). TCP is called a “byte-transfer protocol”
- URG: urgent data (generally not used)
- ACK: whether the ACK # is valid or not
- PSH: push data now (generally not used)
- RST, SYN, FIN: used for connection management (setup, teardown, or reset connection)
- receive window: # of bytes rcvr is willing to accept
- checksum: for header & data (as in UDP)
- urg data ptr: when the U bit is set, the byte position where the urgent message starts
CSCI 547 Transport Layer 3-74
TCP seq. #’s and ACKs
Seq. #’s:
- byte stream “number” of first byte in segment’s data
ACKs (piggybacked ACKs):
- seq # of next byte expected from other side
- cumulative ACK
Q: how does the receiver handle out-of-order segments?
- A: TCP spec doesn’t say; it is up to the implementer (most implementations buffer them, as in Selective-Repeat mode)
Simple telnet scenario (Host A and Host B, time running downward):
1. User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. A -> B: Seq=43, ACK=80
Actual sequence numbers look something like 1522036564.
For each connection, a pseudorandom number is generated for the initial sequence number. Here we use relative sequence numbers.
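The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the first byte in the segment plus the number of data bytes it carried. A tiny sketch, with a hypothetical helper `SeqAckSketch.ackFor`; the mask reflects that TCP sequence numbers are 32-bit values that wrap around.

```java
final class SeqAckSketch {
    /** ACK number sent after receiving dataLen bytes whose first byte has number seq. */
    static long ackFor(long seq, int dataLen) {
        return (seq + dataLen) & 0xFFFFFFFFL; // 32-bit sequence space wraps around
    }

    public static void main(String[] args) {
        System.out.println(ackFor(42, 1)); // Host B acknowledging 'C' from Host A
        System.out.println(ackFor(79, 1)); // Host A acknowledging the echoed 'C'
    }
}
```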
CSCI 547 Transport Layer 3-75
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
- longer than RTT
  - but RTT changes dynamically
- If too short: premature timeout
  - unnecessary retransmissions
- If too long: slow reaction to segment loss (less efficient)
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions: the RTT for a retransmitted packet is not considered in the calculation of SampleRTT. Why?
- SampleRTT will vary, want estimated RTT “smoother”
  - average several recent measurements, not just current SampleRTT
CSCI 547 Transport Layer 3-76
Timeout value
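[Figure: two timelines between hosts A and B, each showing the RTT and the timeout]
- Timeout too short results in premature timeout: pkt0 is retransmitted before ack1 arrives, and the duplicate is ignored by the receiver. Unnecessary transmission, wasted bandwidth.
- Timeout too long: ack1 is lost (X), and the sender waits a long time before resending pkt0. Inefficient usage of bandwidth.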
CSCI 547 Transport Layer 3-77
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
- In statistics, this is called an “exponential weighted moving average”
- The influence of a past sample decreases exponentially fast
- 0 <= α < 1; typical value: α = 0.125
  - gives 87.5% of the weight to the current EstimatedRTT
  - gives 12.5% of the weight to the current SampleRTT (the current sampled value)
Example: EstimatedRTT = 250ms, SampleRTT = 70ms, α = 0.125
EstimatedRTT = (1 - 0.125)*250 + 0.125*70 = 218.75 + 8.75 = 227.5ms
CSCI 547 Transport Layer 3-78
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106).]
CSCI 547 Transport Layer 3-79
TCP Round Trip Time and Timeout
Setting the timeout
- EstimatedRTT plus “safety margin” (a measure of variability)
  - large variation in EstimatedRTT -> larger safety margin
- first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
(typically, β = 0.25)
- then set timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
For more detail: ftp://ftp.rfc-editor.org/in-notes/rfc2988.txt
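The EstimatedRTT, DevRTT, and TimeoutInterval formulas can be combined into a small estimator. This is a minimal sketch: `RttEstimatorSketch` is a hypothetical class, the weights are 0.125 for EstimatedRTT and 0.25 for DevRTT, and the update order (deviation first, using the previous EstimatedRTT) follows RFC 2988 and is an assumption relative to the slides, which do not state an order.

```java
final class RttEstimatorSketch {
    static final double ALPHA = 0.125; // weight of the new sample in EstimatedRTT
    static final double BETA = 0.25;   // weight of the new deviation in DevRTT

    private double estimatedRtt;
    private double devRtt;

    RttEstimatorSketch(double initialEstimateMs, double initialDevMs) {
        this.estimatedRtt = initialEstimateMs;
        this.devRtt = initialDevMs;
    }

    /** Folds one SampleRTT (in ms) into both moving averages. */
    void addSample(double sampleRttMs) {
        // Deviation is measured against the estimate from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRttMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRttMs;
    }

    double estimatedRtt() { return estimatedRtt; }
    double devRtt() { return devRtt; }

    /** TimeoutInterval = EstimatedRTT + 4*DevRTT. */
    double timeoutInterval() { return estimatedRtt + 4 * devRtt; }

    public static void main(String[] args) {
        RttEstimatorSketch e = new RttEstimatorSketch(250, 0);
        e.addSample(70);
        System.out.println(e.estimatedRtt());
        System.out.println(e.timeoutInterval());
    }
}
```

Starting from EstimatedRTT = 250 ms and a sample of 70 ms, the estimate moves only to 227.5 ms, while the large deviation widens the safety margin.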
CSCI 547 Transport Layer 3-80
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
Pipelined segments
Cumulative ACKs: on Windows systems, an ACK is sent for every other received segment to reduce the number of packets on the network
TCP implementations use a single retransmission timer rather than one timer per packet as assumed in previous slides
Retransmissions are triggered by:
 o timeout events
 o duplicate ACKs
Let's initially consider a simplified TCP sender:
 o ignore duplicate ACKs
 o ignore flow control, congestion control
CSCI 547 Transport Layer 3-82
From RFC 2988
The following is the RECOMMENDED algorithm for managing the retransmission timer:
(5.1) Every time a packet containing data is sent (including a retransmission), if the timer is not running, start it running so that it will expire after RTO seconds (for the current value of RTO).
(5.2) When all outstanding data has been acknowledged, turn off the retransmission timer.
(5.3) When an ACK is received that acknowledges new data, restart the retransmission timer so that it will expire after RTO seconds (for the current value of RTO).
When the retransmission timer expires, do the following:
(5.4) Retransmit the earliest segment that has not been acknowledged by the TCP receiver.
(5.5) The host MUST set RTO <- RTO * 2 ("back off the timer"). The maximum value discussed in (2.5) above may be used to provide an upper bound to this doubling operation.
(5.6) Start the retransmission timer, such that it expires after RTO seconds (for the value of RTO after the doubling operation outlined in 5.5).
In summary: when an ACK is received, calculate RTO using the earlier formula (TimeoutInterval = EstimatedRTT + 4*DevRTT) and restart the timer
When a timeout occurs, set RTO <- 2*RTO and restart the timer
Here, there is only one timer, and it is managed by TCP
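The timer rules (5.1) to (5.6) can be sketched as a small class driven by an explicit clock value. The class and method names are hypothetical, time is passed in as a plain number instead of using a real timer, and the 60-second cap is only an illustrative upper bound for the doubling step.

```python
class RetransmissionTimer:
    """Single retransmission timer with exponential backoff (a sketch)."""

    def __init__(self, rto, max_rto=60.0):
        self.rto = rto
        self.max_rto = max_rto      # assumed upper bound for the doubling
        self.expires_at = None      # None means the timer is not running

    def on_send(self, now):
        # (5.1) start the timer only if it is not already running
        if self.expires_at is None:
            self.expires_at = now + self.rto

    def on_ack(self, now, all_data_acked):
        # (5.2) everything acknowledged: turn the timer off
        # (5.3) new data acknowledged: restart with the current RTO
        self.expires_at = None if all_data_acked else now + self.rto

    def on_timeout(self, now):
        # (5.4) the caller retransmits the earliest unacknowledged segment
        # (5.5) back off the timer, (5.6) restart it with the doubled RTO
        self.rto = min(self.rto * 2, self.max_rto)
        self.expires_at = now + self.rto


timer = RetransmissionTimer(rto=1.0)
timer.on_send(now=0.0)
timer.on_timeout(now=1.0)
print(timer.rto, timer.expires_at)
```

Keeping one timer for the oldest unacknowledged segment avoids the bookkeeping cost of a timer per packet.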
CSCI 547 Transport Layer 3-83
TCP sender events:
data rcvd from app:
 o create segment with seq #
 o seq # is byte-stream number of first data byte in segment
 o start timer if not already running (think of timer as for oldest unacked segment)
 o expiration interval: TimeOutInterval
timeout:
 o retransmit segment that caused timeout
 o restart timer
ACK rcvd:
 o if it acknowledges previously unacked segments: update what is known to be acked, and start timer if there are outstanding segments
CSCI 547 Transport Layer 3-84
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y          /* advances window */
      if (there are currently not-yet-acknowledged segments)
        start timer         /* restarts timer */
    }
} /* end of loop forever */

Comment:
 o SendBase-1: last cumulatively ACKed byte
Example:
 o SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed
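The three sender events can be sketched as a small Python class. This is an illustration only: the class name, the `unacked` dictionary and the `sent` log are hypothetical helpers, the timer is reduced to a boolean flag, and segments are "passed to IP" by appending them to a list.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender (no dup ACKs, no flow control)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # -> data not yet acknowledged
        self.timer_running = False
        self.sent = []               # (seq #, data) handed to IP, for inspection

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # smallest not-yet-acknowledged seq #
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # cumulative ACK advances the window
            for seq in [s for s in self.unacked
                        if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(92)
sender.on_app_data(b"a" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"b" * 20)    # Seq=100, 20 bytes
sender.on_ack(120)               # cumulative ACK covers both segments
print(sender.send_base, sender.timer_running)
```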
CSCI 547 Transport Layer 3-85
TCP: retransmission scenarios
[Figure: two timing diagrams between Host A and Host B.]
 o Lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); A times out and resends Seq=92, 8 bytes data; B answers with ACK=100 again. SendBase = 100.
 o Premature timeout: A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B answers with ACK=100 and ACK=120, but the timer for Seq=92 expires first, so A resends Seq=92, 8 bytes data and B answers with ACK=120. SendBase goes from 100 to 120 and stays at 120.
CSCI 547 Transport Layer 3-86
TCP retransmission scenarios (more)
[Figure: two timing diagrams between Host A and Host B.]
 o Cumulative ACK scenario: A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted. SendBase = 120.
 o On Windows systems: A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B sends a single ACK=120. SendBase = 120.
CSCI 547 Transport Layer 3-87
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms (200 ms for Windows systems) for next segment; if no next segment, send ACK
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
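The four receiver rules can be written as one decision function. This is a sketch: the function name, its arguments and the returned strings are made up for illustration, and the final branch (a segment below the expected sequence number) is not covered by the table, so its behavior is an assumption.

```python
def ack_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver's ACK behavior for one arriving segment."""
    if seg_seq > expected_seq:
        # Out-of-order arrival: a gap is detected.
        return "immediate duplicate ACK for byte %d" % expected_seq
    if seg_seq == expected_seq and gap_exists:
        # Segment starts at the lower end of an existing gap.
        return "immediate ACK"
    if seg_seq == expected_seq and ack_pending:
        # A second in-order segment arrives while an ACK is being delayed.
        return "immediate cumulative ACK"
    if seg_seq == expected_seq:
        return "delayed ACK (wait up to 500 ms)"
    # Assumption: an old duplicate segment is simply re-ACKed.
    return "immediate ACK"


print(ack_action(seg_seq=120, expected_seq=100, ack_pending=False, gap_exists=False))
```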
CSCI 547 Transport Layer 3-88
Fast Retransmit
Time-out period is often relatively long:
 o long delay before resending lost packet
Detect lost segments via duplicate ACKs
 o Sender often sends many segments back-to-back
 o If a segment is lost, there will likely be many duplicate ACKs
If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost and performs a fast retransmit: it resends the segment before the timer expires
Look at http://www.speedguide.net/read_articles.php?id=157 for adjusting TCP/IP parameters on Windows systems
CSCI 547 Transport Layer 3-89
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {    /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y    /* fast retransmit */
    }
  }
For information about TCP implementation on Windows systems, visit
http://support.microsoft.com/kb/224829/EN-US/
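A minimal sketch of the duplicate-ACK counting in the fast retransmit algorithm. The class name and the `retransmitted` list are hypothetical; a real sender would resend the segment itself, while this sketch only records which sequence number would be resent.

```python
from collections import Counter


class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a resend on the third one."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()    # ACK value y -> duplicates seen
        self.retransmitted = []      # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks.clear()
        else:
            self.dup_acks[y] += 1    # duplicate ACK for already ACKed data
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=92)
sender.on_ack(100)                   # new data: SendBase moves to 100
for _ in range(3):
    sender.on_ack(100)               # three duplicates of ACK=100
print(sender.retransmitted)
```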
CSCI 547 Transport Layer 3-90
TCP Flow Control
receive side of TCP connection has a receive buffer
 o app process may be slow at reading from buffer
speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast (usually controlled by receiver)
CSCI 547 Transport Layer 3-92
TCP Flow control: how it works
[Figure: receive buffer (RcvBuffer) holding data waiting to be read by the application, with the remaining spare room labeled RcvWindow.]
(Suppose the TCP receiver discards out-of-order segments in the picture; in practice, buffered out-of-order segments should also be subtracted from the spare room)
spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
 o guarantees receive buffer doesn't overflow
Receiver throttles sender by advertising a window size no larger than the amount it can buffer
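The RcvWindow bookkeeping can be sketched as two small functions, one for each side. The function names and the sample byte counts are illustrative assumptions; the variable names follow the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender side: keep unACKed data within the advertised window."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=2000)
print(window)
print(sender_may_send(last_byte_sent=3000, last_byte_acked=2500,
                      advertised_window=window, nbytes=1000))
```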
CSCI 547 Transport Layer 3-93
TCP Flow control: how it works (credit scheme)
[Figure: TCP segment structure, 32 bits wide: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, Receive window; checksum, Urg data ptr; Options (variable length); application data (variable length).]
TCP rcvr advertises spare room: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
For example: 65535 (64 kbytes)
Sender is limited to having no more than RcvWindow bytes of unACKed data at any time.
TCP flow control is sometimes called a "credit" scheme, since the receiver gives "credit" to the sender (to "spend")
CSCI 547 Transport Layer 3-94
Credit=Window size
CSCI 547 Transport Layer 3-95
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
initialize TCP variables:
 o seq. #s
 o buffers, flow control info (e.g. RcvWindow)
client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
 o specifies initial seq #
 o no data sent here
Step 2: server host receives SYN, replies with SYNACK segment
 o server allocates buffers
 o specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
CSCI 547 Transport Layer 3-97
3-way handshake for TCP connection initialization
[Figure: timing diagram between Host A and Host B showing the three segments below.]
1. A -> B: SYN, Seq=92, Window 4096, <mss 1024>
 o Connection request (SYN=1)
 o Gives credit (window 4096) of 4096 bytes, also sets mss (max segment size) to 1024
2. B -> A: SYN, Seq=235, ACK93, window 65536, <mss 500>
 o Connection request (SYN=1) for reverse direction
 o Acknowledges connection request by ACK93
 o Gives credit (window 65536) of 65536 bytes, also sets mss (max segment size) to 500
3. A -> B: ACK236, Window 4096
 o Acknowledges connection request for the reverse direction by ACK236
 o Tells "you still have credit of 4096 bytes"
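The sequence and acknowledgement numbers in the handshake follow a simple rule: each SYN consumes one sequence number, so it is acknowledged with the sender's initial sequence number plus one. A minimal sketch, where the function name and the dictionary layout are hypothetical and the initial sequence numbers 92 and 235 are the ones used in this example:

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    # The server acknowledges the client's SYN with client_isn + 1.
    synack = {"flags": "SYN,ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # The client acknowledges the server's SYN with server_isn + 1.
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=92, server_isn=235):
    print(segment)
```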
CSCI 547 Transport Layer 3-98
Why 3-way handshake?
[Figure: 3-way handshake timing diagram between Host A and Host B: SYN, Seq=92, Window 4096, <mss 1024>; SYN, Seq=235, ACK 93, window 65536, <mss 1024>; ACK=236, Window 4096.]
First: we need to establish full-duplex connections (both ways)
Second: to avoid duplicated connections, as shown in the next slide
CSCI 547 Transport Layer 3-99
Why 3-way handshake? Recovery from old duplicated SYN
[Figure: timing diagram between Host A and Host B, with a timeout at A. Segments shown: SYN, Seq=92, Window 4096, <mss 1024>; SYN, Seq=93, Window 4096, <mss 1024>; SYN, Seq=235, ACK 93, window 65536, <mss 1024>; SYN, Seq=236, ACK 94, window 65536, <mss 1024>; RST, Seq=92.]
Annotations:
 o Host B keeps the connection open and waits for data
 o Host A reopens (new) connection but Host B thinks it is a new connection and accepts it
 o RST, Seq=92: Host B aborts the connection
CSCI 547 Transport Layer 3-100
TCP connection termination
Goal: both sides agree to close the connection
Two-army problem: "Two blue armies are separated by a valley held by the white army. The two blue armies must attack simultaneously to defeat the white army. Their only means of communication is sending birds, which can be lost."
Can you design a protocol that ensures the attack by the blue armies?
[Figure: two blue armies on either side of the white army.]
CSCI 547 Transport Layer 3-101
Two army protocol
[Figure: the two blue armies exchange messages across the white army: "Let's attack", "Yes", "Sure?", "Sure!"]
What is the problem?
CSCI 547 Transport Layer 3-102
TCP connection termination: 4-way handshake
[Figure: timing diagram between client and server; each side sends a FIN (#1) that is answered by an ACK (#2).]
Client states:
 o ESTABLISHED: receive close signal from app, send FIN
 o FIN-WAIT-1: wait for ACK and FIN from server; receive ACK
 o FIN-WAIT-2: wait for server FIN; receive FIN, send ACK
 o TIME-WAIT: wait for twice the Maximum Segment Lifetime (2 MSL)
 o CLOSED
Server states:
 o ESTABLISHED: normal operation; receive FIN, send ACK, tell app to close
 o CLOSE-WAIT: (wait for app) app is ready to close, send FIN
 o LAST-ACK: wait for ACK to FIN; receive ACK
 o CLOSED
Even when the 2nd ACK is lost, the connection will be closed
CSCI 547 Transport Layer 3-103
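TCP state transition diagram
[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED (data transfer state), FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Transitions are labeled "event / action".]
Connection setup:
 o CLOSED -> LISTEN (passive open): Appl: passive open / send: <nothing>
 o CLOSED -> SYN_SENT (active open): Appl: active open / send: SYN
 o LISTEN -> SYN_RCVD: rcv: SYN / send: SYN, ACK
 o LISTEN -> SYN_SENT: Appl: send data / send: SYN
 o SYN_RCVD -> LISTEN: rcv: RST / send: <nothing>
 o SYN_SENT -> SYN_RCVD (simultaneous open): rcv: SYN / send: SYN, ACK
 o SYN_SENT -> ESTABLISHED: rcv: SYN, ACK / send: ACK
 o SYN_RCVD -> ESTABLISHED: rcv: ACK / send: <nothing>
 o SYN_SENT -> CLOSED: Appl: close or timeout / reset
 o timeout / send: RST (arrow back to CLOSED)
Active close (client):
 o ESTABLISHED -> FIN_WAIT_1: Appl: close / send: FIN
 o SYN_RCVD -> FIN_WAIT_1: Appl: close / send: FIN
 o FIN_WAIT_1 -> FIN_WAIT_2: rcv: ACK / send: <nothing>
 o FIN_WAIT_1 -> CLOSING: rcv: FIN / send: ACK
 o FIN_WAIT_1 -> TIME_WAIT: rcv: FIN, ACK / send: ACK
 o FIN_WAIT_2 -> TIME_WAIT: rcv: FIN / send: ACK
 o CLOSING -> TIME_WAIT: rcv: ACK / send: <nothing>
 o TIME_WAIT -> CLOSED: 2MSL timeout
Passive close (server):
 o ESTABLISHED -> CLOSE_WAIT: rcv: FIN / send: ACK
 o CLOSE_WAIT -> LAST_ACK: Appl: close / send: FIN
 o LAST_ACK -> CLOSED: rcv: ACK / send: <nothing>
MSL (Maximum Segment Lifetime): 2 minutes recommended, 30 seconds typically used
Legend: normal transitions for client; normal transitions for server; "Appl" marks a state transition taken when the application issues an operation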
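The normal client and server paths through the state diagram can be sketched as a lookup table. This is a simplified illustration: it covers only the normal transitions (no simultaneous open or close, no resets), and the event strings and the `run` helper are hypothetical names.

```python
# (state, event) -> (next state, segment sent or None)
TRANSITIONS = {
    # normal client path (active open, active close)
    ("CLOSED", "appl: active open"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "rcv: SYN,ACK"): ("ESTABLISHED", "ACK"),
    ("ESTABLISHED", "appl: close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "rcv: ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "rcv: FIN"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
    # normal server path (passive open, passive close)
    ("CLOSED", "appl: passive open"): ("LISTEN", None),
    ("LISTEN", "rcv: SYN"): ("SYN_RCVD", "SYN,ACK"),
    ("SYN_RCVD", "rcv: ACK"): ("ESTABLISHED", None),
    ("ESTABLISHED", "rcv: FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "appl: close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "rcv: ACK"): ("CLOSED", None),
}


def run(events, state="CLOSED"):
    """Apply events in order; return the final state and segments sent."""
    sent = []
    for event in events:
        state, segment = TRANSITIONS[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


client_events = ["appl: active open", "rcv: SYN,ACK", "appl: close",
                 "rcv: ACK", "rcv: FIN", "2MSL timeout"]
print(run(client_events))
```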
CSCI 547 Transport Layer 3-104
Congestion Control: Congestion control vs Flow control
Congestion, informally: "too many sources sending too much data too fast for network to handle"
In flow control, the sender adjusts its transmission rate so as not to overwhelm the receiver
 o One end is sending data too fast for a receiving end to handle
In congestion control, the sender(s) adjust their transmission rate so as not to overwhelm routers in the network
 o Many sources independently work to avoid sending too much data too fast for the network to handle
Symptoms of congestion:
 o Lost packets (buffer overflow at routers)
 o Long delays (queuing in router buffers)
One of the top problems of networking!
CSCI 547 Transport Layer 3-106
Causes & Effects of Congestion: scenario 1: Two equal-rate senders share a single link
Two sources send (each at rate λ_in) as fast as possible to two receivers across a shared link with capacity R
 o Data is delivered to the application at the receiver at rate λ_out
Packets queue at the router
 o Assume the router has infinite storage capacity (thus no packets are lost and there are no retransmissions)
CSCI 547 Transport Layer 3-107
Causes & Effects of Congestion: scenario 1: Two equal-rate senders share a single link
The maximum achievable per-connection throughput is constrained to 1/2 the capacity of the shared link
Delays grow without bound as the router becomes congested
 o The queue grows without bound; packets are delivered with long delay but are not lost
[Figure: packets queued at the router.]
CSCI 547 Transport Layer 3-108
Causes & Effects of Congestion: scenario 2: Finite capacity router queue
Senders assume packets can now be lost
 o Sender retransmits upon detection of loss
Define offered load (load coming in) as the original transmissions plus retransmissions (for each sender):
λ'_in = λ_in + λ_retransmit
CSCI 547 Transport Layer 3-109
Causes & Effects of Congestion: scenario 2: Finite capacity router queue, throughput analysis
[Figure: three plots of throughput λ_out against offered load, each with the offered load running up to R/2.]
 o Ideal throughput (λ_in = λ_out), up to R/2: assume the host is able to somehow (magically) determine whether or not a buffer is free in the router and thus sends a packet only when a buffer is free. Therefore, there is no retransmission.
 o Perfect retransmissions, up to R/3: the sender retransmits only when it is sure that a packet is lost, by setting the timeout to a large enough value.
 o Premature retransmissions (plus loss), up to R/4: the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not lost. In this case, both the original packet and the retransmission may reach the receiver. In effect, each segment is transmitted twice.
"Effects" of congestion:
 o Sender must retransmit to compensate for dropped packets
 o Unneeded retransmissions: link carries multiple copies of a packet
CSCI 547 Transport Layer 3-110
Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops
Assuming:
 o Each source's traffic transits two routers
 o Routers have finite, same # of buffers
 o All links have the same capacity
 o Senders timeout and retransmit lost packets
Q: what happens as λ_in (original) and λ'_in (original plus retransmitted) increase?
CSCI 547 Transport Layer 3-111
Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops
Throughput increases linearly while the network remains underloaded
At the saturation point, loss starts to occur
Once loss occurs, the offered load increases, for both original & retransmitted traffic
Loss rates increase ... until all packets are retransmitted ones: a spiraling effect
CSCI 547 Transport Layer 3-112
Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops, throughput analysis
[Figure: throughput λ_out plotted against λ'_in, with R/2 marked on both axes.]
Congestion collapse: all links are fully utilized but no data is delivered; all traffic will be retransmissions of retransmissions!
Why R/2? λ_in from host + λ_in in transit
CSCI 547 Transport Layer 3-113
Causes & Effects of Congestion: Summary
Uncontrolled, congestion can lead to dropped packets
 o This means that bandwidth used delivering packets to the point of congestion was wasted
In the limit, it can lead to network collapse
 o The network is fully busy but no work gets done
 o All packets in the network are retransmissions
CSCI 547 Transport Layer 3-114
Approaches towards congestion control: End-to-end vs. Hop-by-hop
Two broad approaches towards congestion control:
End-end congestion control:
 o End-systems receive no feedback from network
 o congestion inferred by end systems from observed loss and/or delay
 o approach taken by TCP
Network-assisted congestion control (hop-by-hop):
 o routers provide feedback to end systems
 o Network determines an explicit rate at which a sender should transmit
 o Network signals congestion by setting a single bit in a packet (SNA, DECbit, TCP/IP ECN, ATM)
CSCI 547 Transport Layer 3-115
Explicit Congestion Notification (ECN)
From Wikipedia, the free encyclopedia
Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. It is an optional feature, and is only used when both endpoints signal that they want to use it.
Traditionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a bit in the IP header instead of dropping a packet in order to signal the beginning of congestion. The receiver of the packet echoes the congestion indication to the sender, which must react as though a packet drop were detected.
ECN uses two bits in the Differentiated Services field in the IP header, in the IPv4 TOS(Type Of Service) Byte or the IPv6 Traffic Class Octet. These two bits can be used to encode one of the values ECN-unaware transport, ECN-aware transport or congestion experienced.
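The two ECN bits are the two least-significant bits of the IPv4 TOS byte (or IPv6 Traffic Class octet). A small sketch of reading and marking them, using the codepoint names from RFC 3168; the helper function names are hypothetical.

```python
# ECN codepoints (RFC 3168), stored in the two low-order bits.
ECN_CODEPOINTS = {
    0b00: "Not-ECT (ECN-unaware transport)",
    0b01: "ECT(1) (ECN-aware transport)",
    0b10: "ECT(0) (ECN-aware transport)",
    0b11: "CE (congestion experienced)",
}


def ecn_field(tos_byte):
    """Extract the 2-bit ECN field from a TOS / Traffic Class byte."""
    return tos_byte & 0b11


def mark_congestion(tos_byte):
    """What an ECN-aware router might do instead of dropping the packet."""
    if ecn_field(tos_byte) == 0b00:
        return tos_byte          # transport is not ECN-capable: cannot mark
    return tos_byte | 0b11       # set CE, leave the upper six bits untouched


tos = 0b10111010                 # illustrative value with ECT(0) set
print(ECN_CODEPOINTS[ecn_field(tos)])
print(ECN_CODEPOINTS[ecn_field(mark_congestion(tos))])
```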
CSCI 547 Transport Layer 3-116
Example of Congestion Control
Digression: Asynchronous Transfer Mode (ATM) networks
ATM is a standard for B-ISDN (Broadband Integrated Services Digital Network) networks
 o Operates at speeds from 155 Mbps to multi-Gbps
 o Employs packet-switching of fixed-length packets ("cells") using virtual circuits; cells are 53 bytes (5-byte header plus 48-byte payload)
B-ISDN provides integrated end-to-end transport of data, real-time digital voice, and video
 o Designed to meet the "quality-of-service" requirements of voice/video applications
ATM is scalable so that it can be used for LANs, MANs, or WANs
ATM is used in the Internet today by ISPs as a WAN (Wide Area Network) technology but has not been very successful in LAN deployment
CSCI 547 Transport Layer 3-117
A sample of Internet backbone using ATM
From: http://www.nthelp.com/maps.htm
CSCI 547 Transport Layer 3-118
ATM Protocol Architecture
CSCI 547 Transport Layer 3-119
Hop-by-Hop Congestion Control of Asynchronous Transfer Mode (ATM) networks
“Integrated services” implies multiple service models
o As opposed to the Internet’s single “best-effort” model
ATM service models (levels of QoS):
 o Constant bit-rate (CBR): guaranteed throughput, end-to-end delay, delay-variation, and loss rate bounds
 o Variable bit-rate (VBR): just like CBR except the sender is assumed to generate irregular traffic
 o Available bit-rate (ABR): minimum guaranteed transmission rate, congestion notification
 o Unspecified bit-rate (UBR): guaranteed in-order delivery
CSCI 547 Transport Layer 3-120
Hop-by-Hop Congestion Control
Example: ATM ABR congestion control
ABR is an "elastic service"
 o If the sender's path is "underloaded" then the sender can use the available bandwidth
 o If the sender's path is congested then the sender is throttled back to its minimum guaranteed rate
An ABR sender periodically generates Resource Management (RM) packets
Bits in RM packets are set by switches depending on the level of congestion ("network-assisted")
 o NI bit: No increase in rate (mild congestion)
 o CI bit: Congestion indication
RM packets are returned to the sender by the receiver with bits intact
CSCI 547 Transport Layer 3-121
Hop-by-Hop Congestion Control
ATM ABR congestion control
RM packets contain a two-byte explicit rate (ER) field
 o Congested switches decrement the ER value
 o RM packets arriving at the receiver contain the minimum supportable transmission rate for the path
Data packets contain an "EFCI" (Explicit Forward Congestion Indication) bit which can be set by a congested switch
 o If the data packet preceding an RM packet has EFCI set, then the receiver sets the CI bit in the RM packet before returning it
But these kinds of congestion control are possible only on a connection-oriented network service
CSCI 547 Transport Layer 3-122
TCP Congestion Control
TCP must use end-to-end congestion control since IP provides no explicit feedback about network congestion
Approach taken by TCP: each sender limits its sending rate as a function of perceived network congestion
3 Questions:
 o When & how to limit the rate?
 o How to perceive congestion?
 o What algorithm for changing the rate?
CSCI 547 Transport Layer 3-124
TCP Congestion Control: details
sender limits transmission to keep the following condition:
LastByteSent - LastByteAcked <= MIN(CongWin, RcvWindow)
Roughly, CongWin = w x MSS bytes, where w is a variable
CongWin is dynamic, a function of perceived network congestion
Max rate = CongWin / RTT bytes/sec
[Figure: data flow between sender and receiver.]
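The window condition and the rate bound can be checked with a couple of helper functions. The function names and the sample byte counts are illustrative; the last line uses MSS = 500 bytes and RTT = 200 msec to show the rate of a one-segment window.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without breaking the condition."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def max_rate_bytes_per_sec(cong_win, rtt_sec):
    """Max rate = CongWin / RTT."""
    return cong_win / rtt_sec


print(usable_window(last_byte_sent=3000, last_byte_acked=1000,
                    cong_win=4000, rcv_window=65535))
# One 500-byte segment per 200 ms RTT, converted to bits per second.
print(max_rate_bytes_per_sec(cong_win=500, rtt_sec=0.2) * 8)
```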
CSCI 547 Transport Layer 3-125
TCP Congestion Control: details
How does sender perceive congestion?
In TCP, when a loss event is detected, TCP perceives "congestion"
 o loss event = timeout or 3 duplicate ACKs -> perceived network congestion
TCP sender reduces rate (CongWin) after a loss event
 o In flow control, the receiver does the control
 o In congestion control, the sender does the control
Reduces by how much? Depends on the version: TCP Tahoe, TCP Reno
CSCI 547 Transport Layer 3-126
TCP Congestion Control: details
TCP congestion control algorithms:
AIMD (Additive-Increase, Multiplicative-Decrease)
Slow start
Reaction to loss events
CSCI 547 Transport Layer 3-127
TCP congestion control: AIMD (Additive Increase, Multiplicative Decrease)
Approach: increase transmission rate (congestion window size) gradually (linearly), probing for usable bandwidth, until loss occurs
 o additive increase: increase CongWin by 1 MSS every RTT until loss is detected or RcvWindow is reached
 o multiplicative decrease: cut CongWin in half after loss. Why so much? Because the effect of congestion is exponential (see slide 113)
[Figure: sawtooth behavior of CongWin over time: probing for bandwidth.]
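The sawtooth can be reproduced with a toy simulation in units of MSS per RTT. The function name and the idea of passing in the rounds at which loss occurs are assumptions made for illustration; real loss depends on the network, not on a fixed schedule.

```python
def aimd(cong_win, rounds, loss_rounds, mss=1):
    """Return CongWin at the start of each RTT round."""
    history = []
    for rtt_round in range(rounds):
        history.append(cong_win)
        if rtt_round in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase
    return history


# Loss is assumed to happen in round 3 only.
print(aimd(cong_win=8, rounds=6, loss_rounds={3}))
```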
CSCI 547 Transport Layer 3-128
TCP congestion control:
The congestion window grows in two phases:
 o Slow start: start with a small window but ramp up the transmission rate until loss occurs
 o Congestion avoidance: after the congestion window grows over a threshold, increase the congestion window cautiously to avoid congestion
CSCI 547 Transport Layer 3-129
TCP Slow Start: TCP starts slow (1 MSS)
When connection begins, CongWin is typically initialized to 1 MSS (hence "slow start")
 o Example: MSS = 500 bytes & RTT = 200 msec
 o initial rate = 20 kbps (500 bytes x 8 bits / 0.2 sec)
available bandwidth may be much larger than MSS/RTT
 o desirable to quickly ramp up to a respectable rate, so we need a faster-than-linear increase
Max rate = CongWin / RTT bytes/sec
Slow start actually means "slow start with quick (exponential) increase"
CSCI 547 Transport Layer 3-130
TCP Slow Start (more)
When connection begins, start with 1 MSS and increase rate exponentially until first loss event:
 o double CongWin every RTT
 o done by incrementing CongWin by 1 MSS for every ACK received
Continue until a loss event (timeout or 3 dup ACKs) or until RcvWindow is reached, then go to AIMD (Additive Increase, Multiplicative Decrease)
Summary: initial rate is slow ("slow start") but ramps up exponentially fast
[Figure: timing diagram between Host A and Host B: one segment in the first RTT, two segments in the second, four segments in the third.]
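Why does adding 1 MSS per ACK double the window every RTT? A window of n segments produces n ACKs in one RTT, and each ACK adds one more segment. A minimal sketch in units of MSS, assuming no loss and one ACK per segment (the function name is hypothetical):

```python
def slow_start_window_per_rtt(rtts, mss=1):
    """CongWin at the start of each RTT when every segment is ACKed."""
    cong_win = mss
    sizes = []
    for _ in range(rtts):
        sizes.append(cong_win)
        acks_this_rtt = cong_win // mss     # one ACK per segment sent
        for _ in range(acks_this_rtt):
            cong_win += mss                 # +1 MSS for every ACK received
    return sizes


print(slow_start_window_per_rtt(4))
```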
CSCI 547 Transport Layer 3-131
TCP congestion control: Actual implementation
For a loss detected by receiving 3 duplicate ACKs, CongWin is cut in half
Reaching RcvWindow is not a loss event: it only caps the unACKed data at MIN(CongWin, RcvWindow)
For a timeout, TCP goes back to "slow start" mode (CongWin = 1 MSS), then grows exponentially until reaching 1/2 of the previous CongWin, then grows linearly
Note: TCP implementations handle loss differently
 o TCP "Tahoe": timeout & 3 dup ACKs are treated the same (older version of TCP)
 o TCP "Reno": timeout & 3 duplicate ACKs are treated differently (most current implementations)
For more details, see Table 3.3
CSCI 547 Transport Layer 3-132
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before a loss.
Implementation:
 o Variable Threshold
 o At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
[Figure: CongWin over time for Tahoe and Reno; Threshold is set to 8 initially. The diagram shows the case where 3 duplicate ACKs are received.]
For Tahoe, CongWin is set to 1 MSS; Tahoe does not distinguish between timeout and 3 duplicate ACKs
For Reno, CongWin is set to half in case of 3 duplicate ACKs but to 1 MSS in case of timeout
CSCI 547 Transport Layer 3-133
Refinement: inferring loss (actual implementation)
After 3 dup ACKs:
 o CongWin is cut in half (only for Reno) and the packet is sent again ("fast retransmit")
 o window then grows linearly
But after a timeout event:
 o CongWin is set to 1 MSS (both Tahoe & Reno)
 o window then grows exponentially until it reaches a threshold (half point), then grows linearly
Philosophy:
 o 3 dup ACKs indicate the network is capable of delivering some segments
 o a timeout indicates a "more alarming" congestion scenario
CSCI 547 Transport Layer 3-134
Summary: TCP Congestion Control
Note: TCP implementations handle loss differently
 o TCP "Tahoe": timeout and 3 duplicate ACKs are treated the same (older version of TCP)
 o TCP "Reno": timeout and 3 duplicate ACKs are treated differently (current)
When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold (Reno only)
When a timeout occurs, Threshold is set to CongWin/2 and then CongWin is set to 1 MSS (both Tahoe and Reno)
CSCI 547 Transport Layer 3-135
TCP Reno
[Figure: CongWin over time for TCP Reno, with a timeout marked.]
Used by most implementations currently
CSCI 547 Transport Layer 3-136
TCP sender congestion control (Table 3.3)
Columns: State | Event | TCP Sender Action | Commentary
1. Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
2. Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
3. SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
4. SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
5. SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
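The rows of Table 3.3 map directly onto a small state machine. The sketch below is a teaching model in units of MSS, not a real TCP stack: the class name is hypothetical, the initial Threshold of 8 is an illustrative choice, and resetting the duplicate ACK counter on a new ACK or timeout is an assumption the table does not spell out.

```python
class RenoSender:
    """Congestion-control state of a Reno-style sender, per Table 3.3."""

    def __init__(self, mss=1.0, threshold=8.0):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"                # "SS" or "CA"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0                # assumption: counter resets here
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:           # loss detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender()
for _ in range(8):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```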
CSCI 547 Transport Layer 3-137
TCP throughput
What's the average throughput of TCP as a function of congestion window size and RTT?
 o Ignore slow start
Let W be the window size when loss occurs.
When the window is W, throughput is W/RTT
Just after loss, the window drops to W/2 and throughput to W/(2 RTT)
Average throughput: 0.75 W/RTT, since (W/RTT + W/(2 RTT)) / 2 = 0.75 W/RTT
See slide 125
CSCI 547 Transport Layer 3-138
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
CSCI 547 Transport Layer 3-139
Why is TCP fair?
Two competing sessions with same MSS & RTT:
 o Additive increase gives slope of 1 as throughput increases
 o Multiplicative decrease decreases throughput proportionally
[Figure: Connection 2's throughput plotted against Connection 1's throughput, both axes running to R, with the "equal bandwidth share" line and the "full bandwidth utilization" line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2) and moves toward the throughput goal.]
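The convergence argument can be checked numerically with a toy model. It assumes two flows with the same RTT that both detect loss in the same round whenever their combined rate exceeds the link capacity; the function name, the starting rates and the capacity are illustrative values.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # both halve: the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1      # both add 1: the gap is unchanged
    return x1, x2


# Start far from the fair share: flow 2 has 41 units, flow 1 only 1.
x1, x2 = two_flow_aimd(1.0, 41.0, capacity=50.0, steps=1000)
print(x1, x2, abs(x1 - x2))
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.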
CSCI 547 Transport Layer 3-140
Fairness (more)
Fairness and UDP
 o Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control, and they also avoid TCP's overhead
 o Instead use UDP: pump audio/video at a constant rate, tolerate packet loss
 o Research area: how to prevent UDP traffic (no flow or congestion control) from bringing down the Internet, i.e. making it "TCP friendly"
Fairness and parallel TCP connections
 o nothing prevents an app from opening parallel connections between 2 hosts
 o Web browsers do this
 o Example: link of rate R supporting 9 connections
   - new app asks for 1 TCP, gets rate R/10
   - new app asks for 11 TCPs, gets about R/2 (11 of 20 connections)! Is this fair?
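The arithmetic of the parallel-connection example, assuming every TCP connection on the link gets an equal share of R. The function name is hypothetical, and the result is the fraction of R that the new application receives.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by the app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))     # one new connection among ten
print(app_share(9, 11))    # eleven new connections among twenty
```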
CSCI 547 Transport Layer 3-141
Chapter 3: Summary
principles behind transport layer services:
 o multiplexing, demultiplexing
 o reliable data transfer
 o flow control
 o congestion control
instantiation and implementation in the Internet
 o UDP
 o TCP
Next:
 o leaving the network "edge" (application, transport layers)
 o into the network "core"
CSCI 547 Transport Layer 3-142
Lab: hand in next class
Using Wireshark, capture packets for a TCP session and
 o Identify TCP's three-way handshake for connection initialization; also show the window size negotiation
 o Identify the change in window size during the connection; in other words, the receiver controls the flow by shrinking/expanding the window size
 o Identify the RTT adjustments on the captured packets
 o Identify the four-way handshake for TCP connection termination
 o Cumulative ACKs: on Windows systems, an ACK is sent for every other received segment to reduce the number of packets on the network. Prove this
 o Try to identify a case where selective repeat is visible. This may not be easy; after trying, state whatever you found out
 o Look for 3 duplicate ACKs for a segment: look for "[TCP previous segment lost]" in the info field. What is your TCP's reaction to this? What is this feature called?
Show the packets and mark them clearly to illustrate the concepts