7 tcp-congestion
DESCRIPTION
More details on the TCP protocol, including some security issues with TCP and an introduction to congestion control
TRANSCRIPT
Week 7: UDP and TCP
SCTP and Internet Congestion control
Agenda
• TCP
• Connection establishment
• Reliable data transfer
• Connection release
• SCTP
• Congestion control
TCP segment
Header (20 bytes, laid out in 32-bit words), followed by the payload:
• Source port, Destination port
• Sequence number
• Acknowledgement number
• THL (segment header length), Reserved, Flags
• Window
• Checksum: computed over the entire segment and part of the IP header
• Urgent pointer
• Optional header extension
Flags: used to indicate the function of a segment
• SYN: used during connection establishment
• FIN: used during connection release
• RST: used in case of problems
• ACK: if true, means that the Acknowledgement number inside the segment is valid
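The header layout above can be decoded with a few lines of Python. This is only a sketch: the function name parse_tcp_header and the example values are made up for illustration, and only the four flags named on the slide are extracted.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (hypothetical helper)."""
    (src, dst, seq, ack, thl_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (thl_flags >> 12) * 4   # THL is a 4-bit count of 32-bit words
    flags = thl_flags & 0x3F             # low bits carry the flags
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "FIN": bool(flags & 0x01), "SYN": bool(flags & 0x02),
        "RST": bool(flags & 0x04), "ACK": bool(flags & 0x10),
        "window": window, "checksum": checksum, "urgent": urgent,
        "payload": segment[header_len:],
    }

# A hand-built SYN+ACK segment; every value here is invented for the example.
example = struct.pack("!HHIIHHHH", 80, 54321, 1000, 2001,
                      (5 << 12) | 0x12, 65535, 0, 0)
fields = parse_tcp_header(example)
```

The checksum is not verified here, since that would also need the part of the IP header that the slide mentions.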
Three-way handshake
A: CONNECT.req, chooses initial sequence number (x)
A->B: SYN(seq=x)
B: CONNECT.ind, CONNECT.resp, chooses initial sequence number (y)
B->A: SYN+ACK(ack=x+1,seq=y)
A: CONNECT.conf, connection established
A->B: ACK(seq=x+1, ack=y+1)
B: connection established
The sequence numbers of all segments A->B will start at x+1
The sequence numbers of all segments B->A will start at y+1
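The numbering of the three segments can be written down as a small function. This is a sketch for illustration only: three_way_handshake is a hypothetical name and the segments are plain dictionaries, not real packets.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide and wrap around

def three_way_handshake(x: int, y: int) -> list:
    """Return the SYN, SYN+ACK and ACK segments for initial numbers x and y."""
    syn = {"flags": "SYN", "seq": x}
    syn_ack = {"flags": "SYN+ACK", "seq": y, "ack": (x + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

segments = three_way_handshake(100, 300)
```

The SYN consumes one sequence number in each direction, which is why the first data byte from A is numbered x+1 and the first data byte from B is numbered y+1.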
TCP FSM
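[State diagram. States: Init, SYN Sent, SYN RCVD, Established. Transition labels, where ?X means that segment X is received and !X means that segment X is sent: !SYN, ?SYN / !SYN+ACK, ?SYN+ACK / !ACK, ?ACK]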
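One possible reading of this state machine is a transition table. The sketch below assumes the usual TCP establishment transitions; the event names ("send SYN", "recv ACK", ...) and the helper run are invented for this example.

```python
# (current state, event) -> next state; comments give the diagram labels.
TRANSITIONS = {
    ("Init", "send SYN"): "SYN Sent",             # active open: !SYN
    ("Init", "recv SYN"): "SYN RCVD",             # passive open: ?SYN / !SYN+ACK
    ("SYN Sent", "recv SYN+ACK"): "Established",  # ?SYN+ACK / !ACK
    ("SYN Sent", "recv SYN"): "SYN RCVD",         # simultaneous open: ?SYN / !SYN+ACK
    ("SYN RCVD", "recv ACK"): "Established",      # ?ACK
}

def run(events, state="Init"):
    """Follow the events from the given state and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

client_state = run(["send SYN", "recv SYN+ACK"])
server_state = run(["recv SYN", "recv ACK"])
```

An event that is not allowed in the current state raises KeyError here; a real implementation would typically answer with a RST segment instead.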
Simultaneous open
A: CONNECT.req
B: CONNECT.req
A->B: SYN(seq=x)
B->A: SYN(seq=y)
A->B: SYN+ACK(seq=x, ack=y+1)
B->A: SYN+ACK(seq=y, ack=x+1)
A: CONNECT.conf, connection established
B: CONNECT.conf, connection established
Negotiating options
A: CONNECT.req, chooses initial sequence number (x), option proposed
A->B: SYN(seq=x), Option
B: CONNECT.ind, CONNECT.resp, chooses initial sequence number (y), option accepted
B->A: SYN+ACK(ack=x+1,seq=y), Option
A: CONNECT.conf, connection established, option accepted
A->B: ACK(seq=x+1, ack=y+1)
B: connection established
The sequence numbers of all segments A->B will start at x+1
The sequence numbers of all segments B->A will start at y+1
TCP options
• MSS
• Selective acknowledgements
• Timestamps
• Window Scale
• Multipath TCP
• ...
Reliable data transfer
A->B: (seq=123,"abcd"), lost
A->B: (seq=127,"ef"); B places "ef" in its buffer
B->A: (ack=123)
A: retransmission timer expires, retransmission of all unacked segments
A->B: (seq=123,"abcd"); B delivers "abcdef"
B->A: (ack=129)
A->B: (seq=127,"ef"), unnecessary retransmission
B->A: (ack=129)
Retransmission timer
• How to compute it ?
• round-trip-time may change frequently during the lifetime of a TCP connection
Retransmission timer
• Algorithm
• timer = mean(rtt) + 4*std_dev(rtt)
• est_mean(rtt) = (1-α)*est_mean(rtt) + α*rtt_measured
• est_std_dev = (1-β)*est_std_dev + β*|rtt_measured - est_mean(rtt)|
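The two smoothed estimates can be sketched as a small class. The slide does not give values for the two weights; the defaults used below (1/8 for the mean, 1/4 for the deviation) and the initial deviation of half the first sample are the ones recommended in RFC 6298 and are assumptions here. RtoEstimator is a hypothetical name.

```python
class RtoEstimator:
    """Exponentially weighted estimates of the rtt mean and deviation."""

    def __init__(self, first_rtt: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha            # weight of a new sample in the mean (assumed)
        self.beta = beta              # weight of a new sample in the deviation (assumed)
        self.mean = first_rtt
        self.dev = first_rtt / 2      # initial deviation, as in RFC 6298

    def timer(self) -> float:
        return self.mean + 4 * self.dev

    def update(self, rtt_measured: float) -> float:
        # The deviation is updated first, against the previous mean estimate.
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(rtt_measured - self.mean)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * rtt_measured
        return self.timer()

estimator = RtoEstimator(first_rtt=0.100)      # seconds
initial_timer = estimator.timer()
timer_after_sample = estimator.update(0.100)   # a second, identical sample
```

With stable measurements the deviation term shrinks and the timer converges towards the measured rtt; a sudden jump in rtt inflates the deviation and therefore the timer.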
RTT measurements
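A->B: (seq=120,"xyz")
B->A: (ack=123); measured rtt
A->B: (seq=123,"abcd"); timer started
A: timer expires
A->B: (seq=123,"abcd"), retransmission
B->A: (ack=127); measured rtt: which of the two transmissions does this ack answer ?
• Solution (Karn/Partridge)
• Do not measure rtt of retransmitted segments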
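The Karn/Partridge rule can be sketched as a sampler that remembers whether a segment was sent more than once. RttSampler is a hypothetical helper; for simplicity it is keyed on the sequence number of the segment being acknowledged rather than on the ack number itself.

```python
class RttSampler:
    """Collects rtt samples, skipping segments that were retransmitted."""

    def __init__(self):
        self.sent = {}   # seq -> (time of last transmission, retransmitted?)

    def on_send(self, seq: int, now: float) -> None:
        retransmitted = seq in self.sent
        self.sent[seq] = (now, retransmitted)

    def on_ack(self, seq: int, now: float):
        """Return an rtt sample, or None if the measurement is ambiguous."""
        send_time, retransmitted = self.sent.pop(seq)
        if retransmitted:
            return None          # cannot tell which transmission was acked
        return now - send_time

sampler = RttSampler()
sampler.on_send(120, now=0.00)
first_sample = sampler.on_ack(120, now=0.05)    # usable sample
sampler.on_send(123, now=0.10)
sampler.on_send(123, now=0.40)                  # retransmission after a timeout
second_sample = sampler.on_ack(123, now=0.45)   # ambiguous, discarded
```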
With Timestamp option
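A->B: (seq=120,TS=1, TS echo=7, "xyz")
B->A: (ack=123, TS=12, TS echo=1); measured rtt
A->B: (seq=123,TS=3, TS echo=12, "abcd"); timer started
A->B: (seq=123,TS=5, TS echo=12, "abcd"), retransmission after timer expiration
B->A: (ack=127, TS=17, TS echo=3); measured rtt, the echoed timestamp identifies the first transmission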
Fast retransmit
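B->A: (ack=123)
A->B: (seq=123,"abcd"), lost
A->B: (seq=127,"ef"); out of sequence, in buffer
B->A: (ack=123)
A->B: (seq=129,"gh"); out of sequence, in buffer
B->A: (ack=123)
A->B: (seq=131,"ij"); out of sequence, in buffer
B->A: (ack=123)
A: third duplicate ack received, fast retransmit
A->B: (seq=123,"abcd"); B delivers "abcdefghij"
B->A: (ack=133)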
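The sender side of fast retransmit boils down to counting duplicate acks. The sketch below uses the threshold of three duplicates mentioned later in the congestion control slides; fast_retransmit_points is a hypothetical name.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ack numbers for which a fast retransmit would be triggered."""
    triggered = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:   # fire once, on the third duplicate
                triggered.append(ack)
        else:
            last_ack = ack
            duplicates = 0
    return triggered

# The acks seen by the sender in the exchange above.
retransmit = fast_retransmit_points([123, 123, 123, 123, 133])
```

The first ack=123 is an ordinary ack; only the three that repeat it count as duplicates, so the segment starting at 123 is retransmitted without waiting for the timer.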
Selective Acks
B->A: (ack=123)
A->B: (seq=123,"abcd"), lost
A->B: (seq=127,"ef")
B->A: (ack=123,sack:127-128)
A->B: (seq=129,"gh")
B->A: (ack=123, sack:127-130)
A->B: (seq=131,"ij")
B->A: (ack=123, sack:127-132)
A: only 123-126 must be retransmitted
A->B: (seq=123,"abcd"); B delivers "abcdefghij"
B->A: (ack=133)
• Receiver reports SACK blocks
• Negotiated during establishment
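How the sender deduces what is still missing can be sketched as follows. The function uses the inclusive byte ranges of the slide's notation (sack:127-132), which is a simplification of the edge values carried by the real option; missing_ranges is a hypothetical helper.

```python
def missing_ranges(ack, sack_blocks, highest_sent):
    """Inclusive byte ranges not yet reported as received by the receiver."""
    missing = []
    next_needed = ack                      # first byte the receiver still expects
    for first, last in sorted(sack_blocks):
        if first > next_needed:
            missing.append((next_needed, first - 1))
        next_needed = max(next_needed, last + 1)
    if next_needed <= highest_sent:
        missing.append((next_needed, highest_sent))
    return missing

# Last ack of the exchange above: (ack=123, sack:127-132), bytes sent up to 132.
to_retransmit = missing_ranges(123, [(127, 132)], highest_sent=132)
```

Ranges beyond the last SACK block may simply still be in flight, so a sender would normally not retransmit them right away.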
Delayed acks
• Sending an ack per segment is costly
• Tradeoff
• In-sequence data segment
• no ack waiting: delay the ack by up to 50 msec
• one ack already waiting: send an ack immediately
• Out-of-sequence data segment
• send an ack immediately
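The receiver's decision can be summarised in a few lines. This is a sketch of the rule on the slide only; ack_decision and its return strings are invented for this example.

```python
MAX_ACK_DELAY = 0.050   # seconds, the 50 msec bound given on the slide

def ack_decision(in_sequence: bool, ack_waiting: bool) -> str:
    """Decide what to do with the ack for a data segment that just arrived."""
    if not in_sequence:
        return "send now"    # out-of-sequence data: the sender must learn quickly
    if ack_waiting:
        return "send now"    # one ack then covers two in-sequence segments
    return "delay"           # wait up to MAX_ACK_DELAY for more data

decisions = [ack_decision(True, False), ack_decision(True, True),
             ack_decision(False, False)]
```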
When to send data ?
• When should a segment be sent ?
• After each write system call
• When there is a full segment of data
Nagle algorithm
• A new data segment can be sent if either
• it is a full segment (MSS bytes), or
• there are no unacknowledged bytes
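The Nagle test can be written as a single condition. This is a minimal sketch; nagle_can_send and its parameter names are hypothetical.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """True if a new data segment may be sent now."""
    full_segment = pending_bytes >= mss
    nothing_in_flight = unacked_bytes == 0
    return full_segment or nothing_in_flight

small_write_idle = nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=0)
small_write_busy = nagle_can_send(pending_bytes=10, mss=1460, unacked_bytes=500)
```

In the second case the small write stays in the buffer until either the outstanding data is acknowledged or enough bytes accumulate to fill a segment.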
Observed IP packets
http://www.caida.org/research/traffic-analysis/pkt_size_distribution/graphs.xml
Flow control
A: Last_ack=122, swin=100, rwin=4. To transmit: abcdefghijklm
A->B: (seq=122,"abcd")
A: Last_ack=122, swin=96, rwin=0
B->A: (ack=126,rwin=0)
A: Last_ack=126, swin=100, rwin=0
B->A: (ack=126,rwin=2)
A: Last_ack=126, swin=100, rwin=2
A->B: (seq=126,"ef")
A: Last_ack=126, swin=98, rwin=0
B->A: (ack=128,rwin=20)
A: Last_ack=128, swin=100, rwin=20
A->B: (seq=128,"ghijklm")
A: Last_ack=128, swin=93, rwin=13
B->A: (ack=135,rwin=20)
A: Last_ack=135, swin=100, rwin=20
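The bookkeeping in this exchange can be replayed with a small sender model. The class below is a sketch written for this trace: Sender, swin_left and the other names are hypothetical, and losses and retransmissions are ignored.

```python
class Sender:
    """Tracks the sending window (swin) and the receiver's window (rwin)."""

    def __init__(self, data: str, seq: int, swin: int, rwin: int):
        self.data = data            # bytes still to transmit
        self.next_seq = seq
        self.last_ack = seq
        self.swin_max = swin
        self.rwin = rwin            # space the receiver is known to have left

    def swin_left(self) -> int:
        return self.swin_max - (self.next_seq - self.last_ack)

    def send(self):
        """Send as much as both windows allow; return (seq, data) or None."""
        n = min(len(self.data), self.swin_left(), self.rwin)
        if n == 0:
            return None
        chunk, self.data = self.data[:n], self.data[n:]
        segment = (self.next_seq, chunk)
        self.next_seq += n
        self.rwin -= n
        return segment

    def on_ack(self, ack: int, rwin: int) -> None:
        self.last_ack = ack
        self.rwin = rwin - (self.next_seq - ack)   # minus bytes already in flight

a = Sender("abcdefghijklm", seq=122, swin=100, rwin=4)
first = a.send()            # limited to 4 bytes by rwin
a.on_ack(126, 0)
blocked = a.send()          # rwin=0: nothing can be sent
a.on_ack(126, 2)
second = a.send()
a.on_ack(128, 20)
third = a.send()
```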
TCP flow control
• Performance is a function of the window size
• Throughput ~= window/rtt
• TCP window: 16-bit field
• RFC1323 Window scale extension
Window       rtt = 1 msec    rtt = 10 msec   rtt = 100 msec
8 Kbytes     65.6 Mbps       6.5 Mbps        0.66 Mbps
64 Kbytes    524.3 Mbps      52.4 Mbps       5.2 Mbps
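The throughput bound is simple arithmetic, sketched below. It assumes 1 Kbyte = 1024 bytes, so the results are close to the table but may differ in the last digit. The window scale helper reflects the rule that the 16-bit field is shifted left by the negotiated count (at most 14); both function names are hypothetical.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one window of data per round-trip-time."""
    return window_bytes * 8 / rtt_seconds / 1e6

def scaled_window(window_field: int, shift: int) -> int:
    """Effective window when the window scale option was negotiated."""
    assert 0 <= window_field < 2**16 and 0 <= shift <= 14
    return window_field << shift

table = {
    (kbytes, rtt_ms): max_throughput_mbps(kbytes * 1024, rtt_ms / 1000)
    for kbytes in (8, 64)
    for rtt_ms in (1, 10, 100)
}
largest_window = scaled_window(65535, 14)   # in bytes
```

Without window scaling, a connection with a 100 msec rtt cannot exceed roughly 5 Mbps, whatever the capacity of the links.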
Connection release
A: DISCONNECT.req (A-B), last sent data: x-1
A->B: FIN(seq=x)
B: DISCONNECT.ind (A-B)
B->A: ACK(ack=x+1)
A: DISCONNECT.conf (A-B), outgoing connection closed
B: incoming connection closed
B: DISCONNECT.req (B-A), last sent data: y-1
B->A: FIN(seq=y)
A: DISCONNECT.ind (B-A)
A->B: ACK(ack=y+1)
A: incoming connection closed, enters TIME WAIT
B: DISCONNECT.conf (B-A), outgoing connection closed, state can be removed
TIME WAIT: maintain state for this connection during twice the MSL, to be able to retransmit the ACK if a segment is received from the other entity
Abrupt release
A: DISCONNECT.req (abrupt), last sent data: x
A->B: RST(seq=x)
A: connection closed, state can be removed
B: DISCONNECT.ind (abrupt), connection closed, state can be removed
• Data segments can be lost during such an abrupt release
• No entity needs to wait in TIME_WAIT state after such a release
• anyway, any segment received when there is no state causes the transmission of a RST segment
TCP connection release
[State diagram. States: Established, SYN RCVD, FIN Wait1, FIN Wait2, Closing, TIME Wait, CLOSE Wait, LAST-ACK, Closed. Transition labels: !FIN, ?ACK, ?FIN/!ACK, Timeout[2MSL]]
TCP limitations
• Service
• Only supports bytestream service
• Extensibility
• Limited space for options
• Security
• Various issues like Denial of Service attacks
TCP establishment
CONNECT.req
SYN(Src=C,seq=x)
CONNECT.ind
SYN+ACK(Dest=C,ack=x+1,seq=y)
ACK(Src=A,seq=x)
DoS attack
• Attacker sends 1000s of SYNs with different source addresses; the server creates state for each of them
SYN(Src=A,seq=x)
CONNECT.ind
SYN+ACK(Dest=A,ack=x+1,seq=y)
SYN(Src=B,seq=x)
CONNECT.ind
SYN+ACK(Dest=B,ack=x+1,seq=z)
TCP Security
• 20th century security
• Server trusts Alice but not Bob
• Server accepts all TCP connections from Alice's IP address without asking for a password
• Server always asks for a password from Bob's IP address
TCP Security
• Can Bob create a fake TCP connection by spoofing Alice's IP when she is away ?
CONNECT.req
SYN(seq=x)
CONNECT.ind
CONNECT.resp
SYN+ACK(ack=x+1,seq=y)
CONNECT.conf
ACK(seq=x+1, ack=y+1)
TCP Security
• Bob's view of the transfer
SYN(Src=A,seq=x)
SYN+ACK(Dst=A,ack=x+1,seq=y), sent to Alice's address, so Bob does not see y and has to guess it
ACK(seq=x+1, ack=y+1)
Data(Src=A,seq=x+1)
SYN Cookies
Client: CONNECT.req
Client->Server: SYN(seq=x)
Server: no state created, y=Hash(IPClient,PortClient,Secret)
Server->Client: SYN+ACK(ack=x+1,seq=y)
Client: CONNECT.conf
Client->Server: ACK(seq=x+1, ack=y+1)
Server: verify that ack=1+Hash(IPClient,PortClient,Secret), state is created, CONNECT.ind
• Stateless passive opener
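The cookie computation on this slide can be sketched with a keyed hash. The sketch follows the slide's simplified formula only; deployed SYN cookies also encode other information such as a time counter. The secret value, the use of HMAC-SHA256 as the hash, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac

SECRET = b"example-secret"   # hypothetical; a real server would use a random key

def cookie(client_ip: str, client_port: int, secret: bytes = SECRET) -> int:
    """y = Hash(IPClient, PortClient, Secret), truncated to 32 bits."""
    msg = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(ack: int, client_ip: str, client_port: int,
                 secret: bytes = SECRET) -> bool:
    """Check the final ACK without having stored anything for the SYN."""
    expected = (cookie(client_ip, client_port, secret) + 1) % 2**32
    return ack == expected

y = cookie("192.0.2.1", 4321)            # sent as seq in the SYN+ACK
good_ack = (y + 1) % 2**32               # what an honest client returns
accepted = ack_is_valid(good_ack, "192.0.2.1", 4321)
rejected = ack_is_valid((y + 2) % 2**32, "192.0.2.1", 4321)
```

Because the server can recompute y from the addresses in the ACK, a flood of SYNs with spoofed sources no longer consumes memory on the server.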
SCTP
• Segment format
SCTP connection establishment
TCP Congestion Control
• Congestion detection
• Packet loss
• Explicit Congestion Notification
• Congestion control
• Additive Increase Multiplicative Decrease
Additive Increase
• No congestion ?
• All acks move window
• Additive increase
• Increment cwnd by one MSS every rtt
[Plot: cwnd versus time]
Faster increase
• How to speed up the growth of the congestion window at connection startup ?
• Slow-start
• Double cwnd every rtt
[Plot: cwnd versus time. Slow-start: exponential increase of cwnd, up to the max window]
Multiplicative decrease
• How to detect congestion ?
• Three duplicate acks
• mild congestion for TCP
• cwnd/2 and restart additive increase
• Expiration of retransmission timer
• severe congestion
• Reset cwnd at 1 MSS
• Perform slow-start until half previous cwnd and then continue with congestion avoidance
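The rules of the last three slides can be combined into one update applied once per round-trip-time. This is a coarse sketch: cwnd is counted in whole MSS, real stacks update it per ack, and the initial threshold of 8 MSS, the event names and the function names are assumptions made for the example.

```python
def on_rtt(cwnd: int, threshold: int, event: str = "ack"):
    """Return (cwnd, threshold) after one rtt; both are counted in MSS."""
    if event == "timeout":               # severe congestion
        return 1, max(cwnd // 2, 1)      # restart from 1 MSS, slow-start to cwnd/2
    if event == "3dupacks":              # mild congestion
        half = max(cwnd // 2, 1)
        return half, half                # halve and resume additive increase
    if cwnd < threshold:                 # slow-start: double every rtt
        return min(cwnd * 2, threshold), threshold
    return cwnd + 1, threshold           # congestion avoidance: +1 MSS per rtt

def trace(events, cwnd=1, threshold=8):
    """Replay a list of per-rtt events and return the successive cwnd values."""
    history = [cwnd]
    for event in events:
        cwnd, threshold = on_rtt(cwnd, threshold, event)
        history.append(cwnd)
    return history

history = trace(["ack"] * 5 + ["3dupacks", "ack", "ack", "timeout"] + ["ack"] * 3)
```

In this replay the window doubles up to the threshold, grows by one MSS per rtt, is halved on the three duplicate acks, and falls back to 1 MSS on the timeout before slow-starting up to half of its previous value.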
[Plot: cwnd versus time, mild congestion. Slow-start (exponential increase of cwnd) up to the threshold, then congestion avoidance (linear increase of cwnd); each fast retransmit halves cwnd and sets a new threshold]
[Plot: cwnd versus time, severe congestion. After each timer expiration, slow-start (exponential increase of cwnd) up to the new threshold, then congestion avoidance (linear increase of cwnd)]