1 tcp. 2 referencias cerf, v., and r. kahn, "a protocol for packet network...
TRANSCRIPT
1
TCP
2
Referencias
• Cerf, V., and R. Kahn, "A Protocol for Packet Network Intercommunication", IEEE Transactions on Communications, Vol. COM-22, No. 5, pp 637-648, May 1974.
• J. Postel, RFC-793 "Transmission Control Protocol" September 1981.
( Various errors and inconsistencies were detected)
• Clarifications and bug fixes in RFC 1122 October 1989
• Extensions in RFC 1323
• Peterson and Davie , pp 378-405
3
Agenda
o Revisión Nivel de Transporteo El Protocolo TCP
Características TCP Connection setup Segmentos TCP Números de Secuencia TCP TCP Sliding Window Control de Flujo Timeouts y Retransmisiones (RTX)
4
Recordar los modelos de capas….
1 Parte de la clase
5
Nivel Transporte
Enri Lean
milagros.dc.uba.ar stone.ac.upc.es
Nivel de Red
Nivel de Enlace
Nivel de Aplicación
Nivel de Transporte
O.S. O.S.HeaderData HeaderData
HD
HD
HD
HD HD
HD
6
Modelo OSI
Session
Network
Link
PhysicalPhysicalPhysical
Application
Presentation
Transport
Network
Link Link
Network
Transport
Session
Presentation
Application
Network
Link
Physical
Peer-layer communication
layer-to-layer communication
Router Router
1
2
3
4
5
6
7
1
2
3
4
5
6
7
7
TCP : Características TCP es orientado a conexion
Manejo de la conexión : 3-way handshake usado para setup y 2-2 o 4 way handshake para la liberación
TCP provee un servicio de flujo de bytes (stream-of-bytes ) TCP es confiable ( estableciendo una suerte de “conexión lógica entre los
sockets”) Acknowledgements ACKs Checksums Números de secuencia para detectar datos perdidos o desordenados Datos perdidos o corruptos se RTX después de un timeout. Datos desordenados se podrían reordenar. Control de Flujo evita inundar al receptor .
TCP implementa mecanismos de control de congestión ( se le dedica una clase la semana próxima ).
8
TCP : orientado a conexión
3-way handshake
(Activo)Cliente
(Pasivo)Server
Syn
Syn + Ack
Ack
4-way handshake
(Activo)Cliente
(Pasivo)Server
Fin
(Data +) Ack
Fin
Ack
Caso ideal , en el RFC 793 Se plantean varios escenarios
9
TCP soporta un “stream de bytes”
By te 0
By te 1
By te 2
By te 3
By te 0
By te 1
By te 2
By te 3
Host A
Host B
By te 8 0
By te 8 0
10
…dicho servicio se emula usando segmentos TCP
By te 0
By te 1
By te 2
By te 3
By te 0
By te 1
By te 2
By te 3
Host A
Host B
By te 8 0
TCP Data
TCP Data
By te 8 0
Un segmento se envía cuando:1. Segmento full (MSS bytes),2. No esta “full” , pero sucede
time out, 3. “Pushed” por la aplicación.
11
Formato del Segmento TCP
IP HdrIP Data
TCP HdrTCP Data
Src port Dst port
Sequence #
Ack Sequence #
HLEN4
RSVD6
UR
GA
CK
PS
HR
ST
SYN
FIN
FlagsWindow Size
Checksum Urg Pointer
(TCP Options)
0 15 31
TCP Data
TCP Header y Data + dirección
IP
Los números de port Src/dst y la direcciones identifican unívocamente un “socket”
Veremos concepto de
pseudo header
12
Números de SecuenciaHost A
Host B
TCP Data
TCP Data
TCP HDR
TCP HDR
ISN (initial sequence number)
Seq_Num = 1er byte
ACK Seq_Num = Próximo byte
esperado
13
Initial Sequence Numbers
Connection Setup3-way handshake
(Active)Client
(Passive)Server
Syn +ISNA
Syn + Ack +ISNB
Ack
14
TCP Sliding Window How much data can a TCP sender have
outstanding in the network? How much data should TCP retransmit when an
error occurs? Just selectively repeat the missing data?
How does the TCP sender avoid over-running the receiver’s buffers?
15
TCP Sliding Window
Window Size
OutstandingUn-ack’d data
Data OK to send
Data not OK to send yet
Data ACK’d
Retransmission policy is “Go Back N”. Current window size is “advertised” by receiver (usually 4k – 8k Bytes when connection set-up).
16
TCP Sliding Window
Host A
Host BACK
Window Size
Round-trip time
(1) RTT > Window size
ACK
Window Size
Round-trip time
(2) RTT = Window sizeACK
Window Size???
17
Jacobson ( 1988)
18
TCP: Retransmission and Timeouts
Host A
Host B
ACK
Round-trip time (RTT)
ACK
Retransmission TimeOut (RTO)
Estimated RTT
Data1 Data2
Guard
Band
TCP uses an adaptive retransmission timeout value:
CongestionChanges in Routing
RTT changes frequently
19
TCP: Retransmission and Timeouts
Picking the RTO is important: Pick a values that’s too big and it will wait too long to
retransmit a packet, Pick a value too small, and it will unnecessarily retransmit
packets.
The original algorithm for picking RTO:1. EstimatedRTT = EstimatedRTT + (1 - ) SampleRTT2. RTO = 2 * EstimatedRTT
Characteristics of the original algorithm: Variance is assumed to be fixed. But in practice, variance increases as congestion
increases.
20
TCP Timer ManagementOf the several timers TCP maintains the most important is the retransmission timer RTO, (also called timeout) . After each segment is sent, TCP starts a retransmission timer, if ACK arrives before timer expires, cancel timer. If timer expires first, consider segment lost.
How long should RTO be ?Typically some small multiple of RTT.
So how to measure RTT ?Measure time between segment sent and ACK receiver.
Unfortunately, in the Internet RTT are not constant, they a vary a lot.
21
TCP Timer Management
(a) Probability density of ACK arrival times in the data link layer.
(b) Probability density of ACK arrival times for TCP.
22
Retransmisión Adaptiva (Algoritmo Original)
• Mide SampleRTT para cada par segmento/ ACK• Calcula el promedio ponderado de RTT
– EstimatedRTT = x EstimatedRTT + x SampleRTT– donde + = 1 <= 0.9 <= 0.2
• Fijar timeout basado en EstimatedRTT– TimeOut = 2 x EstimatedRTT
23
Algoritmo de Karn/Partridge
• No considerar RTT cuando se retransmite• Duplicar timeout luego de cada retransmisión
Sender Receiver
Original transmission
ACK
Sam
pleR
TT
Retransmission
Sender Receiver
Original transmission
ACK
Sam
pleR
TT
Retransmission
24
Algoritmo de Jacobson/ Karels• Nueva forma de calcular el promedio de RTT• Diff = sampleRTT - EstRTT• EstRTT = EstRTT + ( x Diff)• Dev = Dev + ( |Diff| - Dev)
– donde es un factor entre 0 y 1 (Por ejemplo 1/8)
• Considerar varianza cuando fijamos el timeout• TimeOut = x EstRTT + x Dev
– donde = 1 y = 4
• Notas– Los algoritmos son tan buenos/malos como la granularidad del reloj
(500ms en Unix)– Un preciso mecanismo de timeout es importante para controlar la
congestión (más adelante)– Además de controlar congestión, la idea es no retransmitir cuando no es
necesario.
25
RTO ( Ret. Timeout) exceptionsAssume a segment times out and is then retransmitted. An ACK for the segment arrives.
So for purposes for calculating M how do we decide if the ack is for the first send or the retransmission ?
We cannot. It might be for the first, but very delayed, or might be for the second. So we cannot use ACKs of retransmitted segments for calculating M (or updating RTT).
Rule: Don't use acks of retransmitted segments to update RTT. Instead, if segment times out, simply double RTO.
This is called the Karn's algorithm.
26
Ejemplo de estimación de RTT:RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
tiempo (segundos)
RTT
(mili
segu
ndos
)
MuestraRTT EstimadoRTT
27
TCP
2 parte
28
TCP
Repaso ( basado en el Peterson)
29
Protocolos End-to-End
• Se apoyan en la capa Red, la cual es de mejor esfuerzo (best-effort)– descarta mensajes– re-ordena mensajes– puede entregar múltiples copias de un mensaje dado– limita los mensajes a algún tamaño finito– entrega mensajes después de un tiempo arbitrariamente largo
• Servicios comunes ofrecidos/deseados end-to-end– garantía de entrega de mensajes– entrega de mensajes en el mismo orden que son enviados– entrega de a lo más una copia de cada mensaje– soporte para mensajes arbitrariamente largos mensajes– soporte de sincronización– permitir al receptor controlar el flujo de datos del transmisor– soportar múltiples procesos de nivel aplicación en cada máquina
30
Demultiplexor Simple (UDP: User Datagram Protocol)
• Servicio de entrega no confiable y no ordenado de datagramas
• Agrega multiplexión• No hay control de flujo• Hay puertos definidos en cada extremo
– servidor posee un puerto bien conocido– ver /etc/services en Unix (o Linux)
• Formato de encabezado
• Chequeo se suma opcional– psuedo header(IP) + UDP header + data
SrcPort DstPort
Checksum Length
Data
0 16 31
31
Version HLen TOS Length
Ident Flags Offset
TTL Protocol Checksum
SourceAddr
DestinationAddr
Options (variable) Pad(variable)
0 4 8 16 19 31
Contexto para encabezado UDP
SrcPort DstPort
Checksum Length
Data
IP
UDP
Pseudo encabezado
Largo de encabezado + datos
32
TCP Generalidades• Orientado a conexión• flujo de byte
– app escriben bytes– TCP envía segmentos– app lee bytes
Application process
Writebytes
TCPSend buffer
Segment Segment Segment
Transmit segments
Application process
Readbytes
TCPReceive buffer
…
… …
• Full duplex• Control de flujo: evita que el Tx
rebalse al receptor• Control de congestión: evita que el Tx
sobrecargue la red
33
Enlace de Datos Versus Transporte• Potencialmente conecta muchas máquinas diferentes
– requiere de establecimiento y término de conexión explícitos
• Potencialmente diferentes RTT– requiere mecanismos adaptivos para timeout
• Potencialmente largos retardos en la red– requiere estar preparado par el arribo de paquetes muy antiguos
• Potencialmente diferente capacidad en destino– requiere acomodar diferentes capacidades de nodos
• Potencialmente diferente capacidad de red– requiere estar preparado para congestión en la red
34
Version HLen TOS Length
Ident Flags Offset
TTL Protocol Checksum
SourceAddr
DestinationAddr
Options (variable) Pad(variable)
0 4 8 16 19 31
Contexto Formato de Segmento
Options (variable)
Data
Checksum
SrcPort DstPort
HdrLen 0 Flags
UrgPtr
AdvertisedWindow
SequenceNum
Acknowledgment
IP
TCP
Pseudo encabezado
URG|ACK|PSH|RST|SYN|FIN
35
Formato de Segmento
Options (variable)
Data
Checksum
SrcPort DstPort
HdrLen 0 Flags
UrgPtr
AdvertisedWindow
SequenceNum
Acknowledgment
0 4 10 16 31
36
Formato de Segmento (cont)• Cada conexión es identificada por la 4-tupla:
– (SrcPort, SrcIPAddr, DsrPort, DstIPAddr)
• Ventana deslizante + control de flujo– acknowledgment, SequenceNum, AdvertisedWinow
• Flags– SYN, FIN, RESET, PUSH, URG, ACK
• Checksum– pseudo header(IP) + TCP header + data
Sender
Data (SequenceNum)
Acknowledgment +AdvertisedWindow
Receiver
37
Establecimiento y Término de Conexión
Active participant(client)
Passive participant(server)
SYN, SequenceNum = x
SYN + ACK, SequenceNum = y,
ACK, Acknowledgment = y + 1
Acknowledgment = x + 1
38
Diagrama de Estado de TransmisiónCLOSED
LISTEN
SYN_RCVD SYN_SENT
ESTABLISHED
CLOSE_WAIT
LAST_ACKCLOSING
TIME_WAIT
FIN_WAIT_2
FIN_WAIT_1
Passive open Close
Send/SYNSYN/SYN + ACK
SYN + ACK/ACK
SYN/SYN + ACK
ACK
Close/FIN
FIN/ACKClose/FIN
FIN/ACKACK + FIN/ACK Timeout after two segment lifetimes
FIN/ACK
ACK
ACK
ACK
Close/FIN
Close
CLOSED
Active open/SYN
39
Revisión de Ventana Deslizante
• Lado TransmisorLastByteAcked < = LastByteSent
LastByteSent < = LastByteWritten
Se tiene en buffer los bytes entre LastByteAcked y LastByteWritten
Sending application
LastByteWritten
TCP
LastByteSentLastByteAcked
Receiving application
LastByteRead
TCP
LastByteRcvdNextByteExpected
• Lado ReceptorLastByteRead < NextByteExpected
NextByteExpected < = LastByteRcvd +1
Se tiene en buffer los bytes entre NextByteRead y LastByteRcvd
LastByteRead+1
40
Control de Flujo• Tamaño del buffer de envío: MaxSendBuffer• Tamaño del buffer de recepción: MaxRcvBuffer• Lado receptor
– LastByteRcvd - LastByteRead < = MaxRcvBuffer– AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)
• Lado Transmisor– LastByteSent - LastByteAcked < = AdvertisedWindow– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)– LastByteWritten - LastByteAcked < = MaxSendBuffer– Bloquear Tx si (LastByteWritten - LastByteAcked) + y > MaxSenderBuffer, y
bytes que se desean escribir. • Siempre enviar ACK en respuesta a la llegada de segmentos de datos• Tx persiste enviando 1 byte cuando AdvertisedWindow = 0
Sending application
LastByteWrittenTCP
LastByteSentLastByteAcked
Receiving application
LastByteReadTCP
LastByteRcvdNextByteExpected
41
• ¿Qué tan agresivamente el Tx explota la apertura de ventana?
• Soluciones en lado Receptor– Retardar los acuses de recibo
Síndrome de Ventana estúpida (Silly)
Sender Receiver
42
Silly Window Syndrome
43
Algoritmo de Nagle• ¿Qué tanto tiempo el Tx retarda la transmisión de datos?
– Demasiado largo: afecta aplicaciones interactivas
– Demasiado corto: Utilización de la red es pobre
– Estrategias: Basadas en timers v/s auto relojes
• Cuando la aplicación genera datos adicionales:– Si se llena un segmento (y la ventana está abierta): enviar
– Sino
• Si hay datos sin ack en Tx: dejar en buffer hasta llegada de ack
• sino: enviar datos
44
Nagle's algorithmPurpose is to allow the sender TCP to make efficient use of the network, while still being responsive to the sender applications.
Idea:If application data comes in byte by byte, send first byte only. Then buffer all application data till until ACK for first byte comes in. If network is slow and application is fast, the second segment will contain a lot of data. Send second segment and buffer all data till ACK for second segment comes in.
This way the algorithm is clocking the sends to speed of the network and simultaneously preventing sending several one byte segments back to back.
An exception to this rule is to always send (not wait for ACK) if enough data for half the receiver window or MSS.
45
Protección contra reapariciones de igual número de secuencia
• SequenceNum de 32 bits
Bandwidth Tiempo hasta tener problemaT1 (1.5 Mbps) 6.4 hoursEthernet (10 Mbps) 57 minutesT3 (45 Mbps) 13 minutesFDDI (100 Mbps) 6 minutesSTS-3 (155 Mbps) 4 minutesSTS-12 (622 Mbps) 55 secondsSTS-24 (1.2 Gbps) 28 seconds
46
Mantención de la tubería llena
• AdvertisedWindow de 16 bits
Bandwidth Delay x Bandwidth ProductT1 (1.5 Mbps) 18KBEthernet (10 Mbps) 122KBT3 (45 Mbps) 549KBFDDI (100 Mbps) 1.2MBSTS-3 (155 Mbps) 1.8MBSTS-12 (622 Mbps) 7.4MBSTS-24 (1.2 Gbps) 14.8MB
Asumiendo RTT de 100 ms
64 KB
47
Extensiones de TCP
• Son implementadas como opciones del encabezado
• Almacenar marcas de tiempo en segmentos de salida
• Extender espacio de secuencia con marca de tiempo de 32-bit (PAWS)
• Desplazar (escalar) ventana avisada. La idea es medir la ventana en unidades de 2, 4, 8 bytes.
48
TCP OptionsSome TCP options are:
Maximum segment size (MSS): Specified what is the payload the sender is able to receive. (Default MSS = 536 bytes, i.e., Segment size = MSS + 20). SMSS/RMSS is Sender/Receiver MSS.
Window scale: The window size field allows for upto 2^16 bytes of data. But this might be inefficient for high bw x delay situations. This options TCP indicate a scaling factor.
Negative acknowledgement: Lets receiver user NAKs to get realize selective repeat rather than the normal go-back-N TCP behaviour.
49
Recordemos MSS
Application Message
TCP dataTCP hdr
MSSTCP Segment
IP dataIP hdrIP Packet
Ethernet dataEthernet
Ethernet Frame
20 bytes
20 bytes
14 bytes 4 bytesMTU 1500 bytes
50
Checksum
• Se calcula entre el TCP Header , Data y el pseudo header
51
Window Management
52
Initial Sequence Number• Select initial sequence numbers (ISN) to protect against
segments from prior connections (that may circulate in the network and arrive at a much later time)
• Select ISN to avoid overlap with sequence numbers of prior connections
• Use local clock to select ISN sequence number
• Time for clock to go through a full cycle should be greater than the maximum lifetime of a segment (MSL); Typically MSL=120 seconds
53
TCP Connection Establishment
(a) TCP connection establishment in the normal case.(b) Call collision.
6-31
Initial sequence numbers are not 0. TCP uses a clock tick counter (at 4 usecs rate) to setup the initial sequence numbers. This scheme prevents delayed duplicates.
54
Connection Establishment (cont)
Active participant(client)
Passive participant(server)
SYN, SequenceNum = x
ACK, Acknowledgment =y+1
Acknowledgment =x+1
SYN+ACK,
SequenceNum=y,
55
TCP Connection Release • Graceful release:
– Each side of the connection released independently.
• Either side send TCP segment with FIN=1.• When FIN acknowledged, that direction is shut down for data.• Connection released when both sides shut down.
– 4 segments: 1 FIN and 1 ACK for each direction; 1st. ACK+2nd. FIN combined.
– Two-army problem , Timers , 2 MSL
56
TCP Connection Management Modeling
TCP connection management finite state machine. The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
57
TCP Connection Management ModelingThe states used in the TCP connection management finite
state machine.
58
Maximum Segment Size
• Maximum Segment Size– largest block of data that TCP sends to other end
• Each end can announce its MSS during connection establishment
• Default is 576 bytes including 20 bytes for IP header and 20 bytes for TCP header
• Ethernet implies MSS of 1460 bytes• IEEE 802.3 implies 1452
59
Tear-down Packet Exchange
Sender ReceiverFIN
FIN-ACK
FIN
FIN-ACK
Data write
Data ack
60
Connection Tear-down
61
Detecting Half-open Connections
62
TIME-WAIT Assassination