simple tcp

Upload: arockiaruby-ruby

Post on 04-Jun-2018


  • 8/13/2019 Simple Tcp

    # Create a simulator object
    set ns [new Simulator]

    # Create two nodes and a duplex link between them
    set bs [$ns node]
    set br [$ns node]
    $ns duplex-link $bs $br 100Mb 10ms DropTail

    # Set up the sender side
    set tcp [new Agent/TCP/Linux]
    $tcp set timestamps_ true
    $tcp set window_ 100000
    #$tcp set windowOption_ 8
    $ns attach-agent $bs $tcp

    # Set up the receiver side
    set sink [new Agent/TCPSink/Sack1]
    $sink set ts_echo_rfc1323_ true
    $ns attach-agent $br $sink

    # Logical connection
    $ns connect $tcp $sink

    # Set up an FTP application over the TCP connection
    set ftp [new Application/FTP]
    $ftp attach-agent $tcp
    $ftp set type_ FTP

    # Select the TCP-Linux "highspeed" congestion control module at t = 0
    $ns at 0 "$tcp select_ca highspeed"

    # Start FTP at t = 0, stop it at t = 10 s, end the simulation at t = 11 s
    $ns at 0 "$ftp start"
    $ns at 10 "$ftp stop"
    $ns at 11 "exit 0"

    set MonitorInterval 0.1

    # Create (or truncate) queue.trace
    set qmonfile [open "queue.trace" "w"]
    close $qmonfile

    # Queue monitor on the bs -> br link, sampled every MonitorInterval seconds
    set qmon [$ns monitor-queue $bs $br "" $MonitorInterval]

    # monitor_tcp and monitor_queue are expected to be defined in sampling.tcl
    source "sampling.tcl"

    # Record TCP and queue state, then reschedule itself every $interval seconds
    proc monitor {interval} {
        global ns tcp qmon
        set nowtime [$ns now]
        monitor_tcp $ns $tcp p_simple.trace
        monitor_queue $ns $qmon queue_simple.trace
        $ns after $interval "monitor $interval"
    }

    $ns at 0 "monitor $MonitorInterval"

    $ns run

==========================================================================================

TCP Congestion Control

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

This document defines TCP's four intertwined congestion control
algorithms: slow start, congestion avoidance, fast retransmit, and
fast recovery. In addition, the document specifies how TCP should
begin transmission after a relatively long idle period, as well as
discussing various acknowledgment generation methods.

1. Introduction

This document specifies four TCP [Pos81] congestion control
algorithms: slow start, congestion avoidance, fast retransmit and
fast recovery. These algorithms were devised in [Jac88] and [Jac90].
Their use with TCP is standardized in [Bra89].

This document is an update of [Ste97]. In addition to specifying the
congestion control algorithms, this document specifies what TCP
connections should do after a relatively long idle period, as well as
specifying and clarifying some of the issues pertaining to TCP ACK
generation.

Note that [Ste94] provides examples of these algorithms in action and
[WS95] provides an explanation of the source code for the BSD
implementation of these algorithms.

Allman, et. al. Standards Track [Page 1]

RFC 2581 TCP Congestion Control April 1999

This document is organized as follows. Section 2 provides various
definitions which will be used throughout the document. Section 3
provides a specification of the congestion control algorithms.
Section 4 outlines concerns related to the congestion control
algorithms and, finally, Section 5 outlines security considerations.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [Bra97].

2. Definitions

This section provides the definition of several terms that will be
used throughout the remainder of this document.

SEGMENT: A segment is ANY TCP/IP data or acknowledgment packet (or
both).

SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
largest segment that the sender can transmit. This value can be
based on the maximum transmission unit of the network, the path
MTU discovery [MD90] algorithm, RMSS (see next item), or other
factors. The size does not include the TCP/IP headers and
options.

RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the
largest segment the receiver is willing to accept. This is the
value specified in the MSS option sent by the receiver during
connection startup or, if the MSS option is not used, 536 bytes
[Bra89]. The size does not include the TCP/IP headers and
options.

FULL-SIZED SEGMENT: A segment that contains the maximum number of
data bytes permitted (i.e., a segment containing SMSS bytes of
data).

RECEIVER WINDOW (rwnd): The most recently advertised receiver window.

CONGESTION WINDOW (cwnd): A TCP state variable that limits the
amount of data a TCP can send. At any given time, a TCP MUST NOT
send data with a sequence number higher than the sum of the
highest acknowledged sequence number and the minimum of cwnd and
rwnd.

INITIAL WINDOW (IW): The initial window is the size of the sender's
congestion window after the three-way handshake is completed.

LOSS WINDOW (LW): The loss window is the size of the congestion
window after a TCP sender detects loss using its retransmission
timer.

RESTART WINDOW (RW): The restart window is the size of the
congestion window after a TCP restarts transmission after an idle
period (if the slow start algorithm is used; see section 4.1 for
more discussion).

FLIGHT SIZE: The amount of data that has been sent but not yet
acknowledged.
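
As a rough illustration of the CONGESTION WINDOW rule, the Python sketch below computes how many more bytes a sender may transmit right now. It assumes that "highest acknowledged sequence number" means the cumulative ACK value (called `snd_una` here, after the usual SND.UNA variable) and that every quantity is a byte count. The function and parameter names are illustrative, not taken from any particular TCP stack.

```python
def usable_window(snd_una: int, snd_nxt: int, cwnd: int, rwnd: int) -> int:
    """Bytes of new data that may be sent now under the cwnd definition.

    snd_una: oldest unacknowledged sequence number (the cumulative ACK).
    snd_nxt: sequence number of the next new byte to send.
    Sequence-number wraparound is ignored for clarity.
    """
    # Right edge of what may be outstanding: the smaller of cwnd and rwnd.
    limit = snd_una + min(cwnd, rwnd)
    # snd_nxt - snd_una bytes (the FlightSize) are already in the network.
    return max(0, limit - snd_nxt)


# 2920 bytes in flight, cwnd = 4380, large rwnd: one more 1460-byte segment fits.
print(usable_window(snd_una=1000, snd_nxt=3920, cwnd=4380, rwnd=65535))
```

When rwnd is the smaller value (for example 2000 bytes in the same situation), it sets the limit instead and the result drops to 0.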

3. Congestion Control Algorithms

This section defines the four congestion control algorithms: slow
start, congestion avoidance, fast retransmit and fast recovery,
developed in [Jac88] and [Jac90]. In some situations it may be
beneficial for a TCP sender to be more conservative than the
algorithms allow; however, a TCP MUST NOT be more aggressive than the
following algorithms allow (that is, MUST NOT send data when the
value of cwnd computed by the following algorithms would not allow
the data to be sent).

3.1 Slow Start and Congestion Avoidance

The slow start and congestion avoidance algorithms MUST be used by a
TCP sender to control the amount of outstanding data being injected
into the network. To implement these algorithms, two variables are
added to the TCP per-connection state. The congestion window (cwnd)
is a sender-side limit on the amount of data the sender can transmit
into the network before receiving an acknowledgment (ACK), while the
receiver's advertised window (rwnd) is a receiver-side limit on the
amount of outstanding data. The minimum of cwnd and rwnd governs
data transmission.

Another state variable, the slow start threshold (ssthresh), is used
to determine whether the slow start or congestion avoidance algorithm
is used to control data transmission, as discussed below.

Beginning transmission into a network with unknown conditions
requires TCP to slowly probe the network to determine the available
capacity, in order to avoid congesting the network with an
inappropriately large burst of data. The slow start algorithm is
used for this purpose at the beginning of a transfer, or after
repairing loss detected by the retransmission timer.

IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS
bytes and MUST NOT be more than 2 segments.

We note that a non-standard, experimental TCP extension allows that a
TCP MAY use a larger initial window (IW), as defined in equation 1
[AFP98]:

   IW = min (4*SMSS, max (2*SMSS, 4380 bytes))               (1)

With this extension, a TCP sender MAY use a 3 or 4 segment initial
window, provided the combined size of the segments does not exceed
4380 bytes. We do NOT allow this change as part of the standard
defined by this document. However, we include discussion of (1) in
the remainder of this document as a guideline for those experimenting
with the change, rather than conforming to the present standards for
TCP congestion control.
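
The two IW rules above can be compared with a small Python sketch. The function name `initial_window` and the `experimental` switch are illustrative assumptions. The standard branch returns the largest value the standard permits, 2*SMSS bytes, and the experimental branch evaluates equation (1).

```python
def initial_window(smss: int, experimental: bool = False) -> int:
    """Initial cwnd in bytes for a given SMSS.

    Standard rule: IW <= 2*SMSS bytes and at most 2 segments.
    Experimental [AFP98] rule, equation (1): min(4*SMSS, max(2*SMSS, 4380)).
    """
    if experimental:
        return min(4 * smss, max(2 * smss, 4380))
    return 2 * smss


for smss in (536, 1460, 4000):
    print(smss, initial_window(smss), initial_window(smss, experimental=True))
```

With SMSS = 1460 the experimental rule gives 4380 bytes (3 segments). With SMSS = 536 it gives 2144 bytes (4 segments). With SMSS = 4000 it falls back to 2*SMSS = 8000 bytes (2 segments). This matches the "3 or 4 segment" wording above.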

The initial value of ssthresh MAY be arbitrarily high (for example,
some implementations use the size of the advertised window), but it
may be reduced in response to congestion. The slow start algorithm
is used when cwnd < ssthresh, while the congestion avoidance
algorithm is used when cwnd > ssthresh. When cwnd and ssthresh are
equal the sender may use either slow start or congestion avoidance.

During slow start, a TCP increments cwnd by at most SMSS bytes for
each ACK received that acknowledges new data. Slow start ends when
cwnd exceeds ssthresh (or, optionally, when it reaches it, as noted
above) or when congestion is observed.

During congestion avoidance, cwnd is incremented by 1 full-sized
segment per round-trip time (RTT). Congestion avoidance continues
until congestion is detected. One formula commonly used to update
cwnd during congestion avoidance is given in equation 2:

   cwnd += SMSS*SMSS/cwnd                                    (2)

This adjustment is executed on every incoming non-duplicate ACK.
Equation (2) provides an acceptable approximation to the underlying
principle of increasing cwnd by 1 full-sized segment per RTT. (Note
that for a connection in which the receiver acknowledges every data
segment, (2) proves slightly more aggressive than 1 segment per RTT,
and for a receiver acknowledging every other packet, (2) is less
aggressive.)

Implementation Note: Since integer arithmetic is usually used in TCP
implementations, the formula given in equation 2 can fail to increase
cwnd when the congestion window is very large (larger than
SMSS*SMSS). If the above formula yields 0, the result SHOULD be
rounded up to 1 byte.
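
The per-ACK update described above can be sketched in Python. The sketch assumes that cwnd, ssthresh and SMSS are integer byte counts and that the sender uses the full SMSS increase allowed during slow start. `on_new_ack` is a hypothetical helper, not an API from any particular stack. Where cwnd equals ssthresh, this sketch chooses congestion avoidance, which the text permits.

```python
def on_new_ack(cwnd: int, ssthresh: int, smss: int) -> int:
    """Return the updated cwnd (bytes) after an ACK that acknowledges new data."""
    if cwnd < ssthresh:
        # Slow start: grow by (at most) SMSS bytes per ACK.
        return cwnd + smss
    # Congestion avoidance, equation (2), using integer arithmetic.
    increment = (smss * smss) // cwnd
    # Per the implementation note, a zero increment is rounded up to 1 byte.
    return cwnd + max(1, increment)


print(on_new_ack(cwnd=2920, ssthresh=65535, smss=1460))   # slow start step
print(on_new_ack(cwnd=14600, ssthresh=14600, smss=1460))  # SMSS/10 per ACK
```

At cwnd = 10*SMSS each ACK adds SMSS/10 bytes, so about ten ACKs (one window) are needed to grow cwnd by one segment. Once cwnd exceeds SMSS*SMSS bytes, the integer quotient becomes 0 and the round-up keeps cwnd growing.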

Implementation Note: Older implementations have an additional
additive constant on the right-hand side of equation (2). This is
incorrect and can actually lead to diminished performance [PAD+98].

Another acceptable way to increase cwnd during congestion avoidance
is to count the number of bytes that have been acknowledged by ACKs
for new data. (A drawback of this implementation is that it requires
maintaining an additional state variable.) When the number of bytes
acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS
bytes. Note that during congestion avoidance, cwnd MUST NOT be
increased by more than the larger of either 1 full-sized segment per
RTT, or the value computed using equation 2.

Implementation Note: Some implementations maintain cwnd in units of
bytes, while others maintain it in units of full-sized segments. The
latter will find equation (2) difficult to use, and may prefer to use
the counting approach discussed in the previous paragraph.
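
The byte-counting alternative can be sketched in Python as follows. `ByteCountingCA` and its `bytes_acked` counter are hypothetical names for the "additional state variable" the text mentions. The sketch assumes cwnd is kept in bytes and allows at most one SMSS increase per ACK, so growth stays at roughly one full-sized segment per RTT.

```python
class ByteCountingCA:
    """Congestion avoidance by counting bytes acknowledged (illustrative only)."""

    def __init__(self, cwnd: int, smss: int) -> None:
        self.cwnd = cwnd
        self.smss = smss
        self.bytes_acked = 0  # the extra per-connection state variable

    def on_new_ack(self, newly_acked: int) -> None:
        """Account for an ACK that covers `newly_acked` bytes of new data."""
        self.bytes_acked += newly_acked
        if self.bytes_acked >= self.cwnd:
            # A full window has been acknowledged: grow by one SMSS.
            self.bytes_acked -= self.cwnd
            self.cwnd += self.smss


ca = ByteCountingCA(cwnd=4 * 1460, smss=1460)
for _ in range(4):           # one window's worth of 1460-byte ACKs
    ca.on_new_ack(1460)
print(ca.cwnd)
```

Compared with equation (2), this approach needs no division, which suits implementations that keep cwnd in segments.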

When a TCP sender detects segment loss using the retransmission
timer, the value of ssthresh MUST be set to no more than the value
given in equation 3:

   ssthresh = max (FlightSize / 2, 2*SMSS)                   (3)

As discussed above, FlightSize is the amount of outstanding data in
the network.

Implementation Note: An easy mistake to make is to simply use cwnd,
rather than FlightSize; in some implementations cwnd may
incidentally increase well beyond rwnd.

Furthermore, upon a timeout cwnd MUST be set to no more than the loss
window, LW, which equals 1 full-sized segment (regardless of the
value of IW). Therefore, after retransmitting the dropped segment
the TCP sender uses the slow start algorithm to increase the window
from 1 full-sized segment to the new value of ssthresh, at which
point congestion avoidance again takes over.
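
The timeout response can be summarized in a short Python sketch. It assumes byte counts and picks the largest values the text allows: ssthresh from equation (3) and cwnd equal to LW. `on_retransmission_timeout` is a hypothetical name. Following the implementation note, the input is FlightSize, not cwnd.

```python
def on_retransmission_timeout(flight_size: int, smss: int) -> tuple[int, int]:
    """Return (ssthresh, cwnd) in bytes after the retransmission timer fires."""
    ssthresh = max(flight_size // 2, 2 * smss)  # equation (3)
    cwnd = smss                                 # loss window LW = 1 full-sized segment
    return ssthresh, cwnd


print(on_retransmission_timeout(flight_size=10 * 1460, smss=1460))
```

After this, slow start grows cwnd from one segment back up to the new ssthresh. The `max` with 2*SMSS keeps ssthresh from falling below two segments even when little data was in flight.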

3.2 Fast Retransmit/Fast Recovery

A TCP receiver SHOULD send an immediate duplicate ACK when an
out-of-order segment arrives. The purpose of this ACK is to inform
the sender that a segment was received out-of-order and which
sequence number is expected. From the sender's perspective,
duplicate ACKs can be caused by a number of network problems. First,
they can be caused by dropped segments. In this case, all segments
after the dropped segment will trigger duplicate ACKs. Second,
duplicate ACKs can be caused by the re-ordering of data segments by
the network (not a rare event along some network paths [Pax97]).
Finally, duplicate ACKs can be caused by replication of ACK or data
segments by the network. In addition, a TCP receiver SHOULD send an
immediate ACK when the incoming segment fills in all or part of a
gap in the sequence space. This will generate more timely
information for a sender recovering from a loss through a
retransmission timeout, a fast retransmit, or an experimental loss
recovery algorithm, such as NewReno [FH98].

The TCP sender SHOULD use the "fast retransmit" algorithm to detect
and repair loss, based on incoming duplicate ACKs. The fast
retransmit algorithm uses the arrival of 3 duplicate ACKs (4
identical ACKs without the arrival of any other intervening packets)
as an indication that a segment has been lost. After receiving 3
duplicate ACKs, TCP performs a retransmission of what appears to be
the missing segment, without waiting for the retransmission timer to
expire.

After the fast retransmit algorithm sends what appears to be the
missing segment, the "fast recovery" algorithm governs the
transmission of new data until a non-duplicate ACK arrives. The
reason for not performing slow start is that the receipt of the
duplicate ACKs not only indicates that a segment has been lost, but
also that segments are most likely leaving the network (although a
massive segment duplication by the network can invalidate this
conclusion). In other words, since the receiver can only generate a
duplicate ACK when a segment has arrived, that segment has left the
network and is in the receiver's buffer, so we know it is no longer
consuming network resources. Furthermore, since the ACK "clock"
[Jac88] is preserved, the TCP sender can continue to transmit new
segments (although transmission must continue using a reduced cwnd).

The fast retransmit and fast recovery algorithms are usually
implemented together as follows.

1. When the third duplicate ACK is received, set ssthresh to no more
   than the value given in equation 3.

2. Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
   This artificially "inflates" the congestion window by the number
   of segments (three) that have left the network and which the
   receiver has buffered.

3. For each additional duplicate ACK received, increment cwnd by
   SMSS. This artificially inflates the congestion window in order
   to reflect the additional segment that has left the network.

4. Transmit a segment, if allowed by the new value of cwnd and the
   receiver's advertised window.

5. When the next ACK arrives that acknowledges new data, set cwnd to
   ssthresh (the value set in step 1). This is termed "deflating"
   the window.

   This ACK should be the acknowledgment elicited by the
   retransmission from step 2, one RTT after the retransmission
   (though it may arrive sooner in the presence of significant
   out-of-order delivery of data segments at the receiver).
   Additionally, this ACK should acknowledge all the intermediate
   segments sent between the lost segment and the receipt of the
   third duplicate ACK, if none of these were lost.

Note: This algorithm is known to generally not recover very
efficiently from multiple losses in a single flight of packets
[FF96]. One proposed set of modifications to address this problem
can be found in [FH98].
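
Steps 1 to 5 can be traced with the Python sketch below. It models only the cwnd and ssthresh bookkeeping for a single loss. `FastRecoverySender` and its `retransmit` callback are hypothetical names, and the callback stands in for actually resending the segment. The sketch assumes the caller supplies FlightSize, and it leaves step 4 (deciding when new segments may be sent) and normal window growth to the caller.

```python
from typing import Callable


class FastRecoverySender:
    """Illustrative sender state for fast retransmit / fast recovery (steps 1-5)."""

    def __init__(self, cwnd: int, ssthresh: int, smss: int, snd_una: int,
                 retransmit: Callable[[int], None]) -> None:
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.smss = smss
        self.snd_una = snd_una        # highest cumulative ACK seen so far
        self.retransmit = retransmit  # resends the segment starting at a seq number
        self.dupacks = 0
        self.in_recovery = False

    def on_ack(self, ack: int, flight_size: int) -> None:
        if ack == self.snd_una:
            # Duplicate ACK.
            self.dupacks += 1
            if self.dupacks == 3 and not self.in_recovery:
                # Step 1: ssthresh per equation (3).
                self.ssthresh = max(flight_size // 2, 2 * self.smss)
                # Step 2: retransmit, then inflate cwnd by the 3 buffered segments.
                self.retransmit(self.snd_una)
                self.cwnd = self.ssthresh + 3 * self.smss
                self.in_recovery = True
            elif self.in_recovery:
                # Step 3: one more segment has left the network.
                self.cwnd += self.smss
        elif ack > self.snd_una:
            # ACK for new data.
            self.snd_una = ack
            self.dupacks = 0
            if self.in_recovery:
                # Step 5: deflate the window.
                self.cwnd = self.ssthresh
                self.in_recovery = False


resent = []
s = FastRecoverySender(cwnd=8 * 1460, ssthresh=65535, smss=1460,
                       snd_una=1000, retransmit=resent.append)
for _ in range(4):                               # 3 duplicates + 1 more
    s.on_ack(1000, flight_size=8 * 1460)
s.on_ack(1000 + 8 * 1460, flight_size=0)        # ACK for the retransmission
print(resent, s.cwnd, s.ssthresh)
```

In this trace, cwnd goes from 11680 to 10220 (ssthresh 5840 + 3*SMSS) on the third duplicate ACK and to 11680 on the fourth, then deflates to 5840 when new data is acknowledged.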

4. Additional Considerations

4.1 Re-starting Idle Connections

A known problem with the TCP congestion control algorithms described
above is that they allow a potentially inappropriate burst of traffic
to be transmitted after TCP has been idle for a relatively long
period of time. After an idle period, TCP cannot use the ACK clock
to strobe new segments into the network, as all the ACKs have drained
from the network. Therefore, as specified above, TCP can potentially
send a cwnd-size line-rate burst into the network after an idle
period.

[Jac88] recommends that a TCP use slow start to restart transmission
after a relatively long idle period. Slow start serves to restart
the ACK clock, just as it does at the beginning of a transfer. This
mechanism has been widely deployed in the following manner. When TCP
has not received a segment for more than one retransmission timeout,
cwnd is reduced to the value of the restart window (RW) before
transmission begins.

For the purposes of this standard, we define RW = IW.

We note that the non-standard experimental extension to TCP defined
in [AFP98] defines RW = min(IW, cwnd), with the definition of IW
adjusted per equation (1) above.

Using the last time a segment was received to determine whether or
not to decrease cwnd fails to deflate cwnd in the common case of
persistent HTTP connections [HTH98]. In this case, a WWW server
receives a request before transmitting data to the WWW browser. The
reception of the request makes the test for an idle connection fail,
and allows the TCP to begin transmission with a possibly
inappropriately large cwnd.

Therefore, a TCP SHOULD set cwnd to no more than RW before beginning
transmission if the TCP has not sent data in an interval exceeding
the retransmission timeout.
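
The recommended idle test can be sketched in Python. The sketch assumes times in seconds, a known current RTO, and RW = IW as defined above. `cwnd_before_sending` is a hypothetical helper. The test is deliberately keyed on the last time data was *sent*, not received, for the persistent-HTTP reason given above.

```python
def cwnd_before_sending(cwnd: int, now: float, last_send_time: float,
                        rto: float, restart_window: int) -> int:
    """Cap cwnd at RW if no data has been sent for longer than the RTO."""
    if now - last_send_time > rto:
        # Idle too long: the ACK clock is gone, so restart from RW.
        return min(cwnd, restart_window)
    return cwnd


rw = 2 * 1460  # RW = IW = 2*SMSS under this standard
print(cwnd_before_sending(29200, now=10.0, last_send_time=8.0, rto=1.0,
                          restart_window=rw))
```

Here the sender has been idle for 2 s against a 1 s RTO, so cwnd drops from 29200 to 2920 bytes before the next burst. The `min` also matches the experimental [AFP98] form RW = min(IW, cwnd).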

4.2 Generating Acknowledgments

The delayed ACK algorithm specified in [Bra89] SHOULD be used by a
TCP receiver. When used, a TCP receiver MUST NOT excessively delay
acknowledgments. Specifically, an ACK SHOULD be generated for at
least every second full-sized segment, and MUST be generated within
500 ms of the arrival of the first unacknowledged packet.

The requirement that an ACK "SHOULD" be generated for at least every
second full-sized segment is listed in [Bra89] in one place as a
SHOULD and another as a MUST. Here we unambiguously state it is a
SHOULD. We also emphasize that this is a SHOULD, meaning that an
implementor should indeed only deviate from this requirement after
careful consideration of the implications. See the discussion of
"Stretch ACK violation" in [PAD+98] and the references therein for a
discussion of the possible performance problems with generating ACKs
less frequently than every second full-sized segment.

In some cases, the sender and receiver may not agree on what
constitutes a full-sized segment. An implementation is deemed to
comply with this requirement if it sends at least one acknowledgment
every time it receives 2*RMSS bytes of new data from the sender,
where RMSS is the Maximum Segment Size specified by the receiver to
the sender (or the default value of 536 bytes, per [Bra89], if the
receiver does not specify an MSS option during connection
establishment). The sender may be forced to use a segment size less
than RMSS due to the maximum transmission unit (MTU), the path MTU
discovery algorithm or other factors. For instance, consider the
case when the receiver announces an RMSS of X bytes but the sender
ends up using a segment size of Y bytes (Y < X) due to path MTU
discovery (or the sender's MTU size). The receiver will generate
stretch ACKs if it waits for 2*X bytes to arrive before an ACK is
sent. Clearly this will take more than 2 segments of size Y bytes.
Therefore, while a specific algorithm is not defined, it is desirable
for receivers to attempt to prevent this situation, for example by
acknowledging at least every second segment, regardless of size.

Finally, we repeat that an ACK MUST NOT be delayed for more than 500
ms waiting on a second full-sized segment to arrive.

Out-of-order data segments SHOULD be acknowledged immediately, in
order to accelerate loss recovery. To trigger the fast retransmit
algorithm, the receiver SHOULD send an immediate duplicate ACK when
it receives a data segment above a gap in the sequence space. To
provide feedback to senders recovering from losses, the receiver
SHOULD send an immediate ACK when it receives a data segment that
fills in all or part of a gap in the sequence space.

A TCP receiver MUST NOT generate more than one ACK for every incoming
segment, other than to update the offered window as the receiving
application consumes new data [page 42, Pos81][Cla82].
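
The receiver-side rules in this section can be combined in one Python sketch. `DelayedAckReceiver` is a hypothetical class. The sketch assumes segments never partially overlap and leaves timer scheduling to the caller, who is expected to call `timer_due` periodically. Following the suggestion above, it acknowledges every second segment regardless of size. It sends an immediate ACK for a segment above a gap and for one that fills (part of) a gap. Re-acknowledging old duplicate data immediately is a choice made by this sketch, not something the text requires.

```python
from typing import Dict, Optional


class DelayedAckReceiver:
    """Illustrative ACK-generation policy; returns True when an ACK is due."""

    MAX_DELAY = 0.5  # seconds: an ACK MUST go out within 500 ms

    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt                   # next in-order byte expected
        self.out_of_order: Dict[int, int] = {}   # seq -> length, above a gap
        self.pending = 0                         # in-order segments not yet ACKed
        self.first_pending_at: Optional[float] = None

    def on_segment(self, seq: int, length: int, now: float) -> bool:
        if seq > self.rcv_nxt:
            # Above a gap: buffer it and send an immediate duplicate ACK.
            self.out_of_order[seq] = length
            self._reset()
            return True
        if seq < self.rcv_nxt:
            # Old duplicate data: re-ACK at once (this sketch's choice).
            self._reset()
            return True
        had_gap = bool(self.out_of_order)
        self.rcv_nxt += length
        # Pull in buffered segments that are now contiguous.
        while self.rcv_nxt in self.out_of_order:
            self.rcv_nxt += self.out_of_order.pop(self.rcv_nxt)
        if had_gap:
            # Filled all or part of a gap: ACK immediately.
            self._reset()
            return True
        self.pending += 1
        if self.first_pending_at is None:
            self.first_pending_at = now
        if self.pending >= 2:
            # At least every second segment, regardless of size.
            self._reset()
            return True
        return False

    def timer_due(self, now: float) -> bool:
        """True if a delayed ACK has waited MAX_DELAY and must be sent."""
        if (self.first_pending_at is not None
                and now - self.first_pending_at >= self.MAX_DELAY):
            self._reset()
            return True
        return False

    def _reset(self) -> None:
        self.pending = 0
        self.first_pending_at = None
```

A lone in-order segment is held back until a second one arrives or `timer_due` reports that the delay limit has passed. Any segment that creates or fills a gap is acknowledged at once, so the sender receives its duplicate ACKs promptly.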

4.3 Loss Recovery Mechanisms

A number of loss recovery algorithms that augment fast retransmit and
fast recovery have been suggested by TCP researchers. While some of
these algorithms are based on the TCP selective acknowledgment (SACK)
option [MMFR96], such as [FF96,MM96a,MM96b], others do not require
SACKs [Hoe96,FF96,FH98]. The non-SACK algorithms use "partial
acknowledgments" (ACKs which cover new data, but not all the data
outstanding when loss was detected) to trigger retransmissions.
While this document does not standardize any of the specific
algorithms that may improve fast retransmit/fast recovery, these
enhanced algorithms are implicitly allowed, as long as they follow
the general principles of the basic four algorithms outlined above.

Therefore, when the first loss in a window of data is detected,
ssthresh MUST be set to no more than the value given by equation (3).
Second, until all lost segments in the window of data in question are
repaired, the number of segments transmitted in each RTT MUST be no
more than half the number of outstanding segments when the loss was
detected. Finally, after all loss in the given window of segments
has been successfully retransmitted, cwnd MUST be set to no more than
ssthresh and congestion avoidance MUST be used to further increase
cwnd. Loss in two successive windows of data, or the loss of a
retransmission, should be taken as two indications of congestion and,
therefore, cwnd (and ssthresh) MUST be lowered twice in this case.

The algorithms outlined in [Hoe96,FF96,MM96a,MM96b] follow the
principles of the basic four congestion control algorithms outlined
in this document.

5. Security Considerations

This document requires a TCP to diminish its sending rate in the
presence of retransmission timeouts and the arrival of duplicate
acknowledgments. An attacker can therefore impair the performance of
a TCP connection by either causing data packets or their
acknowledgments to be lost, or by forging excessive duplicate
acknowledgments. Causing two congestion control events back-to-back
will often cut ssthresh to its minimum value of 2*SMSS, causing the
connection to immediately enter the slower-performing congestion
avoidance phase.

The Internet to a considerable degree relies on the correct
implementation of these algorithms in order to preserve network
stability and avoid congestion collapse. An attacker could cause TCP
endpoints to respond more aggressively in the face of congestion by
forging excessive duplicate acknowledgments or excessive
acknowledgments for new data. Conceivably, such an attack could
drive a portion of the network into congestion collapse.

6. Changes Relative to RFC 2001

This document has been extensively rewritten editorially and it is
not feasible to itemize the list of changes between the two
documents. The intention of this document is not to change any of the
recommendations given in RFC 2001, but to further clarify cases that
were not discussed in detail in RFC 2001. Specifically, this document
suggests what TCP connections should do after a relatively long idle
period, as well as specifying and clarifying some of the issues
pertaining to TCP ACK generation. Finally, the allowable upper bound
for the initial congestion window has also been raised from one to
two segments.

Acknowledgments

The four algorithms that are described were developed by Van
Jacobson.

Some of the text from this document is taken from "TCP/IP
Illustrated, Volume 1: The Protocols" by W. Richard Stevens
(Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The
Implementation" by Gary R. Wright and W. Richard Stevens
(Addison-Wesley, 1995). This material is used with the permission of
Addison-Wesley.

==========================================================================================

TCP SUMMARY

TCP provides a connection-oriented, reliable, byte-stream service. The term connection-oriented means
the two applications using TCP must establish a TCP connection with each other before they can
exchange data. It is a full-duplex protocol, meaning that each TCP connection supports a pair of byte
streams, one flowing in each direction. TCP includes a flow-control mechanism for each of these byte
streams that allows the receiver to limit how much data the sender can transmit. TCP also implements a
congestion-control mechanism.

Figure: Two processes communicating via TCP sockets.

Each side of a TCP connection has a socket, which can be identified by the pair
< IP_address, port_number >. Two processes communicating over TCP form a logical connection that is
uniquely identifiable by the two sockets involved, that is, by the combination
< local_IP_address, local_port, remote_IP_address, remote_port >.

TCP provides the following facilities:

Stream Data Transfer

From the application's viewpoint, TCP transfers a contiguous stream of bytes. TCP does this by grouping
the bytes into TCP segments, which are passed to IP for transmission to the destination. TCP itself
decides how to segment the data, and it may forward the data at its own convenience.

Reliability

TCP assigns a sequence number to each byte transmitted and expects a positive acknowledgment (ACK)
from the receiving TCP. If the ACK is not received within a timeout interval, the data is retransmitted.
The receiving TCP uses the sequence numbers to rearrange the segments when they arrive out of order,
and to eliminate duplicate segments.
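
To show how per-byte sequence numbers support reordering and duplicate removal, here is a small Python sketch of a receiver's reassembly step. `Reassembler` is a hypothetical class. It assumes that segments either do not overlap or are exact retransmissions, and it ignores sequence-number wraparound. The returned value is the cumulative ACK, i.e. the sequence number of the last in-order byte plus 1.

```python
class Reassembler:
    """Rebuild an in-order byte stream from segments (illustrative only)."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq          # next byte expected in order
        self.pending: dict[int, bytes] = {}  # buffered out-of-order segments
        self.stream = bytearray()            # data delivered in order

    def receive(self, seq: int, data: bytes) -> int:
        """Accept one segment and return the cumulative ACK number."""
        if seq < self.next_seq:
            # Drop bytes that were already delivered (duplicates).
            data = data[self.next_seq - seq:]
            seq = self.next_seq
        if data:
            self.pending[seq] = data
        # Deliver every buffered segment that is now contiguous.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.stream += chunk
            self.next_seq += len(chunk)
        return self.next_seq


r = Reassembler(initial_seq=100)
print(r.receive(105, b"world"))   # arrives early: buffered, ACK stays at 100
print(r.receive(100, b"hello"))   # fills the gap: ACK jumps to 110
print(r.receive(100, b"hello"))   # duplicate: discarded, ACK unchanged
print(bytes(r.stream))
```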

Flow Control

The receiving TCP, when sending an ACK back to the sender, also indicates to the sender the number of
bytes it can receive beyond the last received TCP segment without causing overrun and overflow in its
internal buffers. This is carried in the ACK as a window size which, added to the acknowledgment number,
marks the upper edge of the sequence numbers it can receive without problems.

Multiplexing

To allow many processes within a single host to use TCP communication facilities simultaneously, TCP
provides a set of addresses, or ports, within each host. Concatenated with the network and host
addresses from the internet communication layer, this forms a socket. A pair of sockets uniquely
identifies each connection.
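
The "pair of sockets" rule can be illustrated with a Python sketch of connection lookup. `Socket` and `ConnectionTable` are hypothetical names. The sketch assumes IPv4 addresses as strings and stores an arbitrary label per connection. It shows why two clients, or two ports on the same client, can talk to the same server port at once: the full four-part key differs.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Socket(NamedTuple):
    ip: str
    port: int


class ConnectionTable:
    """Demultiplex incoming segments by (local socket, remote socket)."""

    def __init__(self) -> None:
        self._conns: Dict[Tuple[Socket, Socket], str] = {}

    def add(self, local: Socket, remote: Socket, name: str) -> None:
        self._conns[(local, remote)] = name

    def lookup(self, src_ip: str, src_port: int,
               dst_ip: str, dst_port: int) -> Optional[str]:
        # For an incoming segment, the destination is our local socket.
        return self._conns.get((Socket(dst_ip, dst_port), Socket(src_ip, src_port)))


t = ConnectionTable()
t.add(Socket("10.0.0.1", 80), Socket("10.0.0.2", 5000), "conn-A")
t.add(Socket("10.0.0.1", 80), Socket("10.0.0.2", 5001), "conn-B")
print(t.lookup("10.0.0.2", 5001, "10.0.0.1", 80))
```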

Logical Connections

The reliability and flow control mechanisms described above require that TCP initializes and maintains
certain status information for each data stream. The combination of this status, including sockets,
sequence numbers and window sizes, is called a logical connection. Each connection is uniquely
identified by the pair of sockets used by the sending and receiving processes.

Full Duplex

TCP provides for concurrent data streams in both directions.

TCP HEADER

TCP data is encapsulated in an IP datagram. The figure shows the format of the TCP header. Its normal
size is 20 bytes unless options are present. Each of the fields is discussed below.

The SrcPort and DstPort fields identify the source and destination ports, respectively. Together with
the source and destination IP addresses, these two fields uniquely identify each TCP connection.

The sequence number is the position, in the byte stream from the sending TCP to the receiving TCP, of
the first byte of data carried in this segment.

The Acknowledgement number field contains the next sequence number that the sender of the
acknowledgment expects to receive. It is therefore the sequence number of the last successfully
received byte of data plus 1. This field is valid only if the ACK flag is on. Once a connection is
established, the ACK flag is always on.

The Acknowledgement, SequenceNum, and AdvertisedWindow fields are all involved in TCP's sliding
window algorithm. The Acknowledgement and AdvertisedWindow fields carry information about the
flow of data going in the other direction. In TCP's sliding window algorithm, the receiver advertises a
window size to the sender using the AdvertisedWindow field. The sender is then limited to having no
more than AdvertisedWindow bytes of unacknowledged data at any given time. The receiver sets a
suitable value for AdvertisedWindow based on the amount of memory allocated to the connection for
the purpose of buffering data.

    The header length gives the length of the header in 32-bit words. This is required because the length of

    the options field is variable.

    The 6-bit Flags field is used to relay control information between TCP peers. The possible flags include

    SYN, FIN, RESET, PUSH, URG, and ACK.

    The SYN and Fin flags are used when establishing and terminating a TCP connection, respectively.

    The ACK flag is set any time the Acknowledgement field is valid, implying that the receiver should pay attention to it.

    The URG flag signifies that this segment contains urgent data. When this flag is set, the UrgPtr field indicates where the non-urgent data contained in this segment begins.

    The PUSH flag signifies that the sender invoked the push operation, which indicates to the receiving side of TCP that it should notify the receiving process of this fact.

    Finally, the RESET flag signifies that the side sending the segment has become confused (for example, because it received a segment it did not expect) and so wants to abort the connection.
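    The flags can be decoded with simple bit masks. This Python sketch assumes the conventional positions of the six flags in the low bits of the header's flags byte (URG = 0x20 down to FIN = 0x01); the helper names are hypothetical.

```python
# Bit masks of the six flags in the TCP header's flags byte
TCP_FLAG_BITS = {
    "URG": 0x20,
    "ACK": 0x10,
    "PUSH": 0x08,
    "RESET": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}

def decode_flags(flags):
    """Return the names of the flags that are set, most significant bit first."""
    return [name for name, bit in TCP_FLAG_BITS.items() if flags & bit]

def encode_flags(*names):
    """Build a flags byte from flag names, e.g. encode_flags("SYN", "ACK")."""
    value = 0
    for name in names:
        value |= TCP_FLAG_BITS[name]
    return value

print(decode_flags(encode_flags("SYN", "ACK")))  # ['ACK', 'SYN']
```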


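    The Checksum covers the TCP segment (the TCP header and the TCP data) together with a pseudo-header made up of the source and destination IP addresses, the protocol number, and the TCP segment length. This is a mandatory field that must be calculated by the sender and then verified by the receiver.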
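    The checksum is the 16-bit one's complement of the one's complement sum of everything it covers. A minimal Python sketch for IPv4, assuming protocol number 6 for TCP in the pseudo-header and using hypothetical helper names, shows the sender computing the value and the receiver verifying it:

```python
import socket
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum, folding carries back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"                      # pad an odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and data."""
    pseudo_header = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                     + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol 6 (TCP), TCP length
    return ~ones_complement_sum16(pseudo_header + segment) & 0xFFFF

# Sender: compute with the Checksum field (bytes 16-17) set to zero, then store the result.
segment = struct.pack("!HHIIBBHHH", 12345, 80, 1, 0, 5 << 4, 0x02, 65535, 0, 0) + b"hi"
checksum = tcp_checksum("10.0.0.1", "10.0.0.2", segment)
segment = segment[:16] + struct.pack("!H", checksum) + segment[18:]

# Receiver: recomputing over the segment as received yields 0 if nothing was corrupted.
print(tcp_checksum("10.0.0.1", "10.0.0.2", segment))  # 0
```

    Because the stored checksum is the complement of the sum of everything else, the receiver's recomputation over the whole segment comes out to zero when the data arrived intact.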

    The most common option is the maximum segment size option, called the MSS. Each end of the connection normally specifies this option on the first segment exchanged. It specifies the largest segment that the sender of the option is willing to receive.
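    Options after the fixed header use a kind/length/value encoding. The following Python sketch scans an options field for the MSS option; it assumes the standard kind numbers (0 = end of option list, 1 = no-operation, 2 = MSS with a length of 4), and find_mss is a hypothetical name.

```python
import struct

def find_mss(options: bytes):
    """Return the MSS value in a TCP options field, or None if it is absent (sketch)."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                      # end of option list
            break
        if kind == 1:                      # no-operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]            # length covers the kind and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError("malformed option length")
        if kind == 2 and length == 4:      # maximum segment size option
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length                        # skip any other option
    return None

print(find_mss(b"\x02\x04\x05\xb4"))        # 1460
```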

    The data portion of the TCP segment is optional.

    TCP STATE TRANSITION DIAGRAM

    The two transitions leading to the ESTABLISHED state correspond to the opening of a connection, and the two transitions leading from the ESTABLISHED state are for the termination of a connection. The ESTABLISHED state is where data transfer can occur between the two ends in both directions.

    If a connection is in the LISTEN state and a SYN segment arrives, the connection makes a transition to the SYN_RCVD state and replies with a SYN+ACK segment. The client does an active open, which causes its end of the connection to send a SYN segment to the server and to move to the SYN_SENT state. The arrival of the SYN+ACK segment causes the client to move to the ESTABLISHED state and to send an ACK back to the server. When this ACK arrives, the server finally moves to the ESTABLISHED state. In other words, we have just traced the THREE-WAY HANDSHAKE.
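    The opening transitions can be written as a small table-driven state machine. In this Python sketch the event strings and the step function are illustrative; the passive open that puts the server into LISTEN is assumed, since the text starts from that state.

```python
# (state, event) -> (next state, segment sent in response, or None)
OPEN_TRANSITIONS = {
    ("CLOSED", "passive open"): ("LISTEN", None),
    ("CLOSED", "active open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "recv SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "recv ACK"): ("ESTABLISHED", None),
}

def step(state, event):
    """Apply one event; raise if the sketch has no such transition."""
    if (state, event) not in OPEN_TRANSITIONS:
        raise ValueError(f"no transition from {state} on {event!r}")
    return OPEN_TRANSITIONS[(state, event)]

# Trace the three-way handshake between a client and a server.
server, _ = step("CLOSED", "passive open")          # server waits in LISTEN
client, segment = step("CLOSED", "active open")     # client sends SYN
server, segment = step(server, "recv " + segment)   # server replies with SYN+ACK
client, segment = step(client, "recv " + segment)   # client sends ACK
server, _ = step(server, "recv " + segment)         # server receives the ACK
print(client, server)  # ESTABLISHED ESTABLISHED
```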

    In the process of terminating a connection, the important thing to keep in mind is that the application process on each side of the connection must independently close its half of the connection. Thus, on any one side there are three combinations of transitions that get a connection from the ESTABLISHED state to the CLOSED state:

    This side closes first:

    ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED.

    The other side closes first:

    ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED.

    Both sides close at the same time:

    ESTABLISHED -> FIN_WAIT_1 -> CLOSING -> TIME_WAIT -> CLOSED.
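    The three teardown paths listed above can be checked with the same table-driven approach. In this Python sketch the event names are illustrative, and "2MSL timeout" stands for the TIME_WAIT delay explained next.

```python
# (state, event) -> (next state, segment sent in response, or None)
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
    ("FIN_WAIT_1", "recv FIN"): ("CLOSING", "ACK"),
    ("CLOSING", "recv ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),  # wait two maximum datagram lifetimes
}

def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state, _segment = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path

print(" -> ".join(walk(["close", "recv ACK", "recv FIN", "2MSL timeout"])))
```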

    The main thing to recognize about connection teardown is that a connection in the TIME_WAIT state cannot move to the CLOSED state until it has waited for two times the maximum amount of time an IP datagram might live in the Internet (the maximum segment lifetime, MSL). The reason is that, while the local side of the connection has sent an ACK in response to the other side's FIN segment, it does not know that the ACK was successfully delivered. As a consequence, the other side might retransmit its FIN segment, and this second FIN segment might be delayed in the network. If the connection were allowed to move directly to the CLOSED state, then another pair of application processes might come along and open the same connection, and the delayed FIN segment from the earlier incarnation of the connection would immediately initiate the termination of the later incarnation of that connection.

    SLIDING WINDOW

    The sliding window serves several purposes:

    (1) it guarantees the reliable delivery of data,

    (2) it ensures that the data is delivered in order, and

    (3) it enforces flow control between the sender and the receiver.

    Reliable and ordered delivery

    The sending and receiving sides of TCP interact in the following manner to implement reliable and ordered delivery:

    Each byte has a sequence number.

    ACKs are cumulative.
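    Cumulative ACKs mean the receiver always acknowledges the next byte it expects, buffering anything that arrives beyond a gap. This Python sketch assumes segments never partially overlap data already buffered; the Receiver class and its fields are hypothetical names.

```python
class Receiver:
    """Cumulative-ACK receiver sketch; assumes segments never partially overlap buffered ones."""

    def __init__(self, initial_seq=0):
        self.next_expected = initial_seq   # value carried in the Acknowledgement field
        self.out_of_order = {}             # seq -> payload, held until the gap before it fills
        self.delivered = bytearray()       # bytes handed to the application, in order

    def receive(self, seq, payload):
        """Accept one segment and return the cumulative ACK to send back."""
        end = seq + len(payload)
        if seq > self.next_expected:
            self.out_of_order[seq] = payload                       # gap before it: buffer
        elif end > self.next_expected:
            self.delivered += payload[self.next_expected - seq:]   # skip bytes already held
            self.next_expected = end
            while self.next_expected in self.out_of_order:         # fill from the buffer
                data = self.out_of_order.pop(self.next_expected)
                self.delivered += data
                self.next_expected += len(data)
        # An old duplicate (end <= next_expected) changes nothing; the ACK is repeated.
        return self.next_expected

r = Receiver()
acks = [r.receive(0, b"aaaa"), r.receive(8, b"cccc"), r.receive(4, b"bbbb")]
print(acks, bytes(r.delivered))
```

    The second call returns the same ACK value as the first; that repeated value is a duplicate ACK, which the fast retransmit discussion later relies on.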

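    Sending side: the sender keeps track of LastByteAcked and LastByteSent, where LastByteAcked <= LastByteSent. The bytes between these two points have been sent but not yet acknowledged, so the amount of unacknowledged data is LastByteSent - LastByteAcked. Flow control requires that this amount never exceed the receiver's AdvertisedWindow:

    LastByteSent - LastByteAcked <= AdvertisedWindow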
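    The flow-control constraint leaves the sender room for AdvertisedWindow - (LastByteSent - LastByteAcked) more bytes. A minimal Python sketch of that arithmetic follows; effective_window is a hypothetical name, and clamping at zero covers a receiver whose advertised window has dropped below the data already in flight.

```python
def effective_window(advertised_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still transmit without overrunning the receiver (sketch)."""
    if last_byte_acked > last_byte_sent:
        raise ValueError("LastByteAcked cannot exceed LastByteSent")
    in_flight = last_byte_sent - last_byte_acked        # sent but not yet acknowledged
    return max(0, advertised_window - in_flight)

print(effective_window(4096, last_byte_sent=3000, last_byte_acked=1000))  # 2096
```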

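    Original algorithm: measure SampleRTT for each segment/ACK pair and compute a weighted average of the RTT:

    EstimatedRTT = a * EstimatedRTT + b * SampleRTT, where a + b = 1

    a between 0.8 and 0.9

    b between 0.1 and 0.2

    Set timeout based on EstimatedRTT

    TimeOut = 2 * EstimatedRTT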
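    The original estimator is an exponentially weighted moving average. This Python sketch assumes a = 0.875 (so b = 0.125) and millisecond units; the function names are hypothetical.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, a=0.875):
    """EstimatedRTT = a * EstimatedRTT + b * SampleRTT, with b = 1 - a."""
    b = 1.0 - a
    return a * estimated_rtt + b * sample_rtt

def original_timeout(estimated_rtt):
    return 2 * estimated_rtt               # TimeOut = 2 * EstimatedRTT

est = 100.0                                # assumed starting estimate, in milliseconds
for sample in (120.0, 80.0, 300.0):
    est = update_estimated_rtt(est, sample)
    print(f"SampleRTT={sample:.0f} EstimatedRTT={est:.2f} TimeOut={original_timeout(est):.2f}")
```

    The single 300 ms sample moves the estimate only by b times the difference, which is the smoothing the weights are meant to provide.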

    Karn/Partridge Algorithm

    Do not sample RTT when retransmitting (an ACK for a retransmitted segment cannot be matched to a particular transmission)

    Double timeout after each retransmission
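    The Karn/Partridge rules slot into the same estimator: ambiguous samples are discarded and each retransmission doubles the timeout. A minimal Python sketch, with a hypothetical class name and an assumed a = 0.875:

```python
class KarnPartridgeTimer:
    """Original EstimatedRTT average plus the Karn/Partridge rules (sketch)."""

    def __init__(self, initial_rtt, a=0.875):
        self.a = a                          # assumed weight in the 0.8-0.9 range
        self.estimated_rtt = initial_rtt
        self.timeout = 2 * initial_rtt      # TimeOut = 2 * EstimatedRTT

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return                          # ambiguous sample: do not use it
        self.estimated_rtt = self.a * self.estimated_rtt + (1 - self.a) * sample_rtt
        self.timeout = 2 * self.estimated_rtt

    def on_retransmission(self):
        self.timeout *= 2                   # double the timeout (exponential backoff)

t = KarnPartridgeTimer(100.0)
t.on_retransmission()                       # timer expired once: 200 -> 400
t.on_ack(50.0, was_retransmitted=True)      # ignored
print(t.estimated_rtt, t.timeout)           # 100.0 400.0
```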

    Jacobson/Karels Algorithm

    New calculation for average RTT

    Difference = SampleRTT - EstimatedRTT

    EstimatedRTT = EstimatedRTT + (d * Difference)

    Deviation = Deviation + d * (|Difference| - Deviation), where d is a fraction between 0 and 1

    Consider variance when setting timeout value

    TimeOut = u * EstimatedRTT + q * Deviation, where u = 1 and q = 4
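    A Python sketch of the Jacobson/Karels calculation follows. It assumes d = 0.125 and, since the notes do not say how Deviation starts, initializes it to half the first sample, as RFC 6298 does for its RTTVAR variable; the class name is hypothetical.

```python
class JacobsonKarels:
    """TimeOut from a smoothed RTT and its mean deviation (sketch)."""

    def __init__(self, first_sample, d=0.125, u=1.0, q=4.0):
        self.d, self.u, self.q = d, u, q
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2   # assumed starting value

    def update(self, sample_rtt):
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.d * difference
        self.deviation += self.d * (abs(difference) - self.deviation)
        return self.timeout()

    def timeout(self):
        return self.u * self.estimated_rtt + self.q * self.deviation

jk = JacobsonKarels(100.0)
print(jk.timeout())        # 300.0
print(jk.update(200.0))    # 337.5
```

    A sample far from the estimate raises Deviation, and with q = 4 that widens TimeOut faster than the estimate itself moves.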

    Congestion Control

    Slow Start

    Slow start operates by observing that the rate at which new packets should be injected into the network is the rate at which acknowledgments are returned by the other end.

    Slow start adds another window to the sender's TCP: the congestion window, called "cwnd". When a new connection is established with a host on another network, the congestion window is initialized to one segment (i.e., the segment size announced by the other end, or the default, typically 536 or 512 bytes). Each time an ACK is received, the congestion window is increased by one segment. The sender can transmit up to the minimum of the congestion window and the advertised window. The congestion window is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver. The former is based on the sender's assessment of perceived network congestion; the latter is related to the amount of available buffer space at the receiver for this connection.

    The sender starts by transmitting one segment and waiting for its ACK. When that ACK is received, the congestion window is incremented from one to two, and two segments can be sent. When each of those two segments is acknowledged, the congestion window is increased to four. This provides exponential growth, although it is not exactly exponential because the receiver may delay its ACKs, typically sending one ACK for every two segments that it receives.
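    The growth pattern can be tabulated per round trip. This Python sketch counts cwnd in segments, ignores ssthresh and the advertised window, and assumes that with delayed ACKs an odd leftover segment is still acknowledged by the delayed-ACK timer; the function name is hypothetical.

```python
import math

def slow_start_rounds(rounds, segments_per_ack=1):
    """cwnd (in segments) at the start of each RTT during slow start (sketch)."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        # One ACK per segment, or one per two segments with delayed ACKs;
        # an odd leftover segment is assumed to get its own (delayed) ACK.
        acks = math.ceil(cwnd / segments_per_ack)
        cwnd += acks                       # +1 segment for every ACK received
    return history

print(slow_start_rounds(5))                      # [1, 2, 4, 8, 16]
print(slow_start_rounds(5, segments_per_ack=2))  # [1, 2, 3, 5, 8]
```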

    At some point the capacity of the path between the two hosts is reached, and an intermediate router starts discarding packets. This tells the sender that its congestion window has grown too large.

    Early implementations performed slow start only if the other end was on a different network. Current implementations always perform slow start.

    Congestion Avoidance

    Congestion can occur when data arrives on a big pipe (a fast LAN) and gets sent out a smaller pipe (a slower WAN). Congestion can also occur when multiple input streams arrive at a router whose output capacity is less than the sum of the inputs. Congestion avoidance is a way to deal with lost packets.

    The algorithm assumes that packet loss caused by damage is very small (much less than 1%); therefore, the loss of a packet signals congestion somewhere in the network between the source and destination. There are two indications of packet loss: a timeout occurring and the receipt of duplicate ACKs.

    Congestion avoidance and slow start are independent algorithms with different objectives. But when congestion occurs, TCP must slow down its transmission rate of packets into the network and then invoke slow start to get things going again. In practice, they are implemented together.

    Congestion avoidance and slow start require that two variables be maintained for each connection: a congestion window, cwnd, and a slow start threshold size, ssthresh. The combined algorithm operates as follows:

    1. Initialization for a given connection sets cwnd to one segment and ssthresh to 65535 bytes.

    2. The TCP output routine never sends more than the minimum of cwnd and the receiver's advertised window.

    3. When congestion occurs (indicated by a timeout or the reception of duplicate ACKs), one-half of the current window size (the minimum of cwnd and the receiver's advertised window, but at least two segments) is saved in ssthresh. Additionally, if the congestion is indicated by a timeout, cwnd is set to one segment (i.e., slow start).

    4. When new data is acknowledged by the other end, increase cwnd, but the way it increases depends on whether TCP is performing slow start or congestion avoidance.

    If cwnd is less than or equal to ssthresh, TCP is in slow start; otherwise TCP is performing congestion avoidance. Slow start continues until TCP is halfway to where it was when congestion occurred (since it recorded half of the window size that caused the problem in step 3), and then congestion avoidance takes over.

    Slow start has cwnd begin at one segment and be incremented by one segment every time an ACK is received. As mentioned earlier, this opens the window exponentially: send one segment, then two, then four, and so on. Congestion avoidance dictates that cwnd be incremented by segsize*segsize/cwnd each time an ACK is received, where segsize is the segment size and cwnd is maintained in bytes. Since roughly cwnd/segsize ACKs arrive per round trip, each adding segsize*segsize/cwnd, the total increase per round trip is about one segsize. This is a linear growth of cwnd, compared to slow start's exponential growth. The increase in cwnd should be at most one segment each round-trip time (regardless of how many ACKs are received in that RTT), whereas slow start increments cwnd by the number of ACKs received in a round-trip time.
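    The four steps above can be sketched as a small Python class. It keeps cwnd in bytes and follows the steps literally (duplicate ACKs only lower ssthresh here; the fast recovery adjustments described below are omitted); the class and method names are hypothetical.

```python
class CongestionControl:
    """Combined slow start and congestion avoidance, cwnd kept in bytes (sketch)."""

    def __init__(self, segsize, advertised_window=65535):
        self.segsize = segsize
        self.advertised_window = advertised_window
        self.cwnd = segsize            # step 1: one segment
        self.ssthresh = 65535          # step 1: 65535 bytes

    def send_window(self):
        # step 2: never send more than min(cwnd, advertised window)
        return min(self.cwnd, self.advertised_window)

    def on_congestion(self, timeout):
        # step 3: save half the current window, but at least two segments
        self.ssthresh = max(self.send_window() // 2, 2 * self.segsize)
        if timeout:
            self.cwnd = self.segsize   # a timeout also restarts slow start

    def on_new_ack(self):
        # step 4: slow start while cwnd <= ssthresh, congestion avoidance above it
        if self.cwnd <= self.ssthresh:
            self.cwnd += self.segsize
        else:
            # integer division truncates, so the per-RTT gain ends up just under one segment
            self.cwnd += self.segsize * self.segsize // self.cwnd

cc = CongestionControl(segsize=1000)
for _ in range(3):
    cc.on_new_ack()                # slow start: 1000 -> 4000
cc.on_congestion(timeout=True)     # ssthresh = 2000, cwnd back to 1000
print(cc.cwnd, cc.ssthresh)        # 1000 2000
```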

    Fast Retransmit

    TCP may generate an immediate acknowledgment (a duplicate ACK) when an out-of-order segment is received. This duplicate ACK should not be delayed. Its purpose is to let the other end know that a segment was received out of order, and to tell it what sequence number is expected.

    Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then retransmits what appears to be the missing segment, without waiting for a retransmission timer to expire.
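    The duplicate-ACK rule can be sketched as a counter. In this Python sketch the threshold of three follows the text, ACKs older than the last one seen are simply ignored, and the class name is hypothetical.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals a fast retransmit on the third one (sketch)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the sequence number to retransmit now, or None."""
        if self.last_ack is None or ack > self.last_ack:
            self.last_ack = ack          # new data acknowledged: reset the count
            self.dup_count = 0
        elif ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack               # the segment starting at this byte looks lost
        return None                      # stale ACKs (ack < last_ack) are ignored

d = DupAckDetector()
print([d.on_ack(a) for a in (1000, 2000, 2000, 2000, 2000, 2000)])
# [None, None, None, None, 2000, None]
```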

    Fast Recovery

    After fast retransmit sends what appears to be the missing segment, congestion avoidance, but not slow start, is performed. This is the fast recovery algorithm. It is an improvement that allows high throughput under moderate congestion, especially for large windows.

    The reason for not performing slow start in this case is that the receipt of the duplicate ACKs tells TCP more than just that a packet has been lost. Since the receiver can only generate a duplicate ACK when another segment is received, that segment has left the network and is in the receiver's buffer. That is, there is still data flowing between the two ends, and TCP does not want to reduce the flow abruptly by going into slow start.

    The fast retransmit and fast recovery algorithms are usually implemented together as follows.

    1. When the third duplicate ACK in a row is received, set ssthresh to one-half the current congestion window, cwnd, but no less than two segments. Retransmit the missing segment. Set cwnd to ssthresh plus 3 times the segment size. This inflates the congestion window by the number of segments that have left the network and which the other end has cached.

    2. Each time another duplicate ACK arrives, increment cwnd by the segment size. This inflates the congestion window for the additional segment that has left the network. Transmit a packet, if allowed by the new value of cwnd.

    3. When the next ACK arrives that acknowledges new data, set cwnd to ssthresh (the value set in step 1). This ACK should be the acknowledgment of the retransmission from step 1, one round-trip time after the retransmission. Additionally, this ACK should acknowledge all the intermediate segments sent between the lost packet and the receipt of the first duplicate ACK. This step is congestion avoidance, since TCP is down to one-half the rate it was at when the packet was lost.
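    The three steps can be sketched as window bookkeeping in Python. The class and method names are hypothetical, cwnd and ssthresh are in bytes, and the actual retransmission and transmission of segments is left to the caller.

```python
class FastRecovery:
    """Fast retransmit / fast recovery window updates, cwnd in bytes (sketch)."""

    def __init__(self, segsize, cwnd, ssthresh):
        self.segsize = segsize
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.in_recovery = False

    def on_third_dup_ack(self):
        # Step 1: half of cwnd (at least two segments) into ssthresh, then inflate
        # by the three segments that triggered the duplicate ACKs.
        # The caller is assumed to retransmit the missing segment here.
        self.ssthresh = max(self.cwnd // 2, 2 * self.segsize)
        self.cwnd = self.ssthresh + 3 * self.segsize
        self.in_recovery = True

    def on_additional_dup_ack(self):
        # Step 2: one more segment has left the network
        if self.in_recovery:
            self.cwnd += self.segsize

    def on_new_ack(self):
        # Step 3: deflate the window and continue in congestion avoidance
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False

fr = FastRecovery(segsize=1000, cwnd=16000, ssthresh=65535)
fr.on_third_dup_ack()          # ssthresh = 8000, cwnd = 11000
fr.on_additional_dup_ack()     # cwnd = 12000
fr.on_additional_dup_ack()     # cwnd = 13000
fr.on_new_ack()                # cwnd = 8000
print(fr.cwnd, fr.ssthresh)    # 8000 8000
```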

    =====================================================================================