advanced internet technologiesdressler/teaching/internet-technologi... · · 2004-08-10a traffic...

Advanced Internet Technologies, SS 2004 4.1

Advanced Internet Technologies
Chapter 4

Congestion Control and Traffic Management

Dr.-Ing. Falko Dressler

Chair for Computer Networks & Internet
Wilhelm-Schickard-Institute for Computer Science

University of Tübingen

http://net.informatik.uni-tuebingen.de/[email protected]


Chapter 4
Congestion Control and Traffic Management

– Congestion Control
– Link-Level Flow and Error Control
– TCP Traffic Control
– Traffic and Congestion Control in ATM Networks

Queues in Systems and Networks


Input and Output Queues at Switch or Router
Interaction of Queues in a Data Network

Effects of Congestion


Effects of Congestion II


Congestion Control Mechanisms


Backpressure
– Based on logical connections (flows)
– Flow restrictions are applied in case of congestion
– These are propagated backward to the source (hop-by-hop)
– Operable in connection-oriented networks, e.g. X.25
– Not available in frame relay, ATM, or the IP-based internet

Choke Packet
– Control packet generated at a congested node and transmitted back to the source to restrict traffic flow
– Example: ICMP source quench packet
  – Generated by any router or host that must discard IP datagrams because of a full buffer
  – The sender should cut back the rate at which it is sending traffic
  – Repeated for every discarded packet

Congestion Control Mechanisms II


Implicit Congestion Signaling
– If the source is able to detect increased delays and packet discards, then it has implicit evidence of network congestion.
– Example: TCP congestion techniques
  – Based on acknowledgments
  – Using delay measures and packet losses

Explicit Congestion Signaling
– Typically used in connection-oriented networks
– Can operate in two directions:
  – Backward: notifying the source about congestion
  – Forward: notifying the user that congestion control mechanisms should be initiated for traffic in the same direction as the received packet
– Categories of explicit congestion signaling approaches
  – Binary: a bit is set in a data packet
  – Credit based: explicit credits are required for sending packets
  – Rate based: explicit rate limits are applied to a connection

Congestion Control Mechanisms III


Traffic Management


Fairness
– During congestion, all flows suffer from increased delays and packet losses
– Fairness means that all flows suffer equally from congestion

Quality of Service
– In some cases flows should be treated differently
– Voice or video applications are more delay sensitive than others
– Different priorities might be applied to different flows
– During congestion, flows should be treated based on their QoS requirements and/or priorities

Traffic Management II


Reservations
– Reservation schemes are one way to avoid congestion and to provide assured service to applications
– A traffic contract is negotiated during connection setup
– The network agrees to give a defined QoS so long as the traffic flow is within contract parameters
– If the current resources are inadequate to meet new reservations, the new reservation is denied

Link-Level Flow and Error Control


– Fundamental mechanisms that determine the performance of communication links, networks, and internetworks
– Intimately associated with mechanisms for recovering from lost packets
– Typically implemented at the link level, sometimes also at the network, transport, or application level
– Performance modeling of flow and error control techniques is extraordinarily difficult
– Nevertheless, it is very important and is therefore discussed in this section

The need for Flow and Error Control


Flow control limits the amount or rate of data that is sent, for several reasons:
– As each PDU arrives, the destination must process it. The source may attempt to send PDUs faster than the destination can process them.
– The destination protocol entity may buffer the incoming data for delivery to the higher-level protocol. If that protocol is slow, the buffer may fill up and the destination may need to limit or temporarily halt the flow from the source.
– The destination may buffer the incoming data for retransmission on another I/O port (e.g. in the case of a router) and may need to limit the incoming flow to match the outgoing flow.

Flow Control at Multiple Protocol Layers


Flow Control Scope


Link Control Mechanisms


Reasons for breaking a data block into a sequence of frames
– The buffer size of the receiver may be limited
– The longer the transmission, the more likely that there will be an error
– On a shared medium, such as a LAN, fairness between the senders is desired

Link control mechanisms
– Stop and Wait
– Sliding-Window technique
  – Go-Back-N
  – Selective-Reject

Stop and Wait


Simplest form of flow control

Working principle
– A source entity transmits a frame
– After reception, the destination entity indicates its willingness to accept another frame by sending an acknowledgement
– The source must wait until it receives the acknowledgement before sending the next frame

Possible errors
– A received frame could be damaged; that is, one or more bits have been altered
  – Solution: error control, e.g. cyclic redundancy check (CRC), and retransmission of the damaged frame
– A received acknowledgement could be damaged
  – Solution: label frames alternately with 0 and 1, and acknowledge them with ACK0 and ACK1
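The working principle and the 0/1 labeling can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not part of the slides: a damaged frame or acknowledgement is treated as lost, a timeout always triggers retransmission, and the names `stop_and_wait`, `lost_frames`, and `lost_acks` are hypothetical.

```python
def stop_and_wait(payloads, lost_frames=(), lost_acks=()):
    """Deliver payloads over a lossy link with alternating-bit stop-and-wait.

    lost_frames / lost_acks hold 0-based transmission numbers (counted over
    all transmissions, including retransmissions) whose data frame or
    acknowledgement is lost. Returns (delivered payloads, transmissions).
    """
    delivered = []
    expected = 0          # sequence bit the receiver expects next
    transmissions = 0
    for index, payload in enumerate(payloads):
        seq = index % 2   # frames are labeled alternately 0 and 1
        while True:
            attempt = transmissions
            transmissions += 1
            if attempt in lost_frames:
                continue  # no ACK arrives: sender times out and retransmits
            if seq == expected:
                delivered.append(payload)  # new frame: accept it
                expected ^= 1
            # otherwise it is a duplicate: discard it, but still acknowledge
            if attempt in lost_acks:
                continue  # ACK lost: sender times out and retransmits
            break         # ACK received: move on to the next frame
    return delivered, transmissions
```

The sequence bit is what lets the receiver recognize the retransmission after a lost acknowledgement as a duplicate instead of delivering the same payload twice.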

Stop and Wait II


Sliding-Window Techniques


Main problems of Stop-and-Wait
– Only one frame at a time can be in transit
– An explicit acknowledgement is required for each single frame
– Thus, the effective time needed per frame is very high, because every frame costs a full round trip

Solution
– Efficiency can be greatly improved by allowing multiple frames to be in transit at the same time
– An acknowledgement is required only for a group of frames
– Thus, the effective time per frame decreases with an increasing window size

Examples of ARQs (Automatic Repeat Requests)
– Go-back-N
– Selective-Reject

Sliding-Window Techniques II


Sliding-Window Techniques - Example


Sliding-Window Techniques IV


Go-back-N ARQ
– A sender may transmit a series of frames sequentially numbered modulo some maximum value
– The number of unacknowledged frames outstanding is determined by the window size
– While no error occurs, the receiver acknowledges incoming frames
  – RR (receive ready)
  – Piggybacked acknowledgements
– If an error is detected, a negative acknowledgement is sent: REJ (reject). The destination will discard this and all future incoming frames until the frame in error is correctly received. The source must therefore retransmit that frame and all frames sent after it.

Selective-Reject ARQ
– Only those frames are retransmitted that receive a negative acknowledgement (SREJ), or those that time out
– More efficient than Go-back-N
– Receiver must maintain a buffer large enough to save post-SREJ frames

Sliding-Window Techniques V


ARQ Performance – Stop-and-Wait


Error-free

Transmission time T

\[ T = T_{frame} + T_{prop} + T_{proc} + T_{ack} + T_{prop} + T_{proc} \]

Assuming a negligible processing time and very small acknowledgement frames

\[ T = T_{frame} + 2T_{prop} \]

Normalized throughput S

\[ S = \frac{1/(T_{frame} + 2T_{prop})}{1/T_{frame}} = \frac{T_{frame}}{T_{frame} + 2T_{prop}} \]

ARQ Performance – Stop-and-Wait II


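With errors

Time per transmission attempt, assuming a negligible processing time, very small acknowledgement frames, and a timeout equal to the round-trip propagation time

\[ T_{frame} + \mathrm{Timeout} = T_{frame} + 2T_{prop} \]

Transmission time T, where N_x is the expected number of transmissions per frame

\[ T = N_x (T_{frame} + 2T_{prop}) \]

Normalized throughput S

\[ S = \frac{T_{frame}}{N_x (T_{frame} + 2T_{prop})} \]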

Parameter a


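Parameter a
– d: distance of the link
– V: velocity of propagation
– L: length of the link control frame
– R: data rate on the link

\[ a = \frac{T_{prop}}{T_{frame}} \]

\[ a = \frac{d/V}{L/R} = \frac{Rd}{VL} \]

Stop-and-Wait without errors

\[ S = \frac{1}{1 + 2a} \]

Stop-and-Wait with errors (P is the probability that a frame is in error)

\[ S = \frac{1 - P}{1 + 2a} \]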
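The two Stop-and-Wait formulas, S = 1/(1 + 2a) and S = (1 - P)/(1 + 2a), can be evaluated with a few lines of Python. This is a minimal sketch; the function names and the example link values in the usage note are assumptions for illustration, and P is taken to be the frame error probability.

```python
def parameter_a(distance_m, velocity_mps, frame_bits, rate_bps):
    """a = T_prop / T_frame = (d / V) / (L / R)."""
    t_prop = distance_m / velocity_mps   # propagation time of the link
    t_frame = frame_bits / rate_bps      # time to transmit one frame
    return t_prop / t_frame


def stop_and_wait_throughput(a, p=0.0):
    """Normalized throughput S = (1 - P) / (1 + 2a); p = 0 is the error-free case."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be a probability below 1")
    return (1.0 - p) / (1.0 + 2.0 * a)
```

For example, a hypothetical 1 km link with V = 2e8 m/s, 1000-bit frames, and R = 1 Mbps gives a = 0.005, so Stop-and-Wait is efficient there; it degrades quickly once a approaches or exceeds 1.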

Stop-and-Wait Timing


Stop-and-Wait Performance


ARQ Performance – Sliding-Window


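Selective-Reject

\[ S = \begin{cases} 1 - P & W \ge 2a + 1 \\ \dfrac{W(1 - P)}{2a + 1} & W < 2a + 1 \end{cases} \]

Go-back-N

\[ S = \begin{cases} \dfrac{1 - P}{1 + 2aP} & W \ge 2a + 1 \\ \dfrac{W(1 - P)}{(2a + 1)(1 - P + WP)} & W < 2a + 1 \end{cases} \]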
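The piecewise throughput expressions for the two sliding-window ARQ variants translate directly into code. This is a minimal sketch with hypothetical function names; W is the window size in frames, a = T_prop / T_frame, and P is assumed to be the frame error probability.

```python
def selective_reject_throughput(w, a, p):
    """S = 1 - P if W >= 2a + 1, else W(1 - P) / (2a + 1)."""
    if w >= 2 * a + 1:
        return 1 - p
    return w * (1 - p) / (2 * a + 1)


def go_back_n_throughput(w, a, p):
    """S = (1 - P)/(1 + 2aP) if W >= 2a + 1, else W(1 - P)/((2a + 1)(1 - P + WP))."""
    if w >= 2 * a + 1:
        return (1 - p) / (1 + 2 * a * p)
    return w * (1 - p) / ((2 * a + 1) * (1 - p + w * p))
```

With W = 1 both expressions reduce to the Stop-and-Wait result (1 - P)/(1 + 2a), which is a quick sanity check on the formulas.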

ARQ Performance


TCP Traffic Control


Overview of TCP Flow Control
– End-to-end flow control mechanism
– Connection-oriented transport protocol
– High impact on the congestion level in the network

Sliding-window mechanism
– Decoupled acknowledgments and permissions to send additional data

TCP Flow Control Mechanism


Credit allocation scheme

Parameter (TCP header)
– Sequence number (SN)
– Acknowledgment number (AN)
– Window (W)

Working principle
– Data segment including SN
– Acknowledgment including (AN=i, W=j)
  – All octets through SN=i-1 are acknowledged
  – Permission is granted to send an additional window of W=j octets of data, that is, octets i through i+j-1
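The credit rule can be made concrete with a short calculation. This is a minimal sketch: `remaining_credit` and `next_seq` are hypothetical names, and sequence-number wraparound is ignored for simplicity.

```python
def remaining_credit(an, window, next_seq):
    """Octets the sender may still transmit after receiving (AN=an, W=window).

    next_seq is the sequence number of the next octet the sender would send.
    """
    upper_edge = an + window  # first octet number NOT covered by the credit
    return max(0, upper_edge - max(next_seq, an))
```

For example, after (AN=1001, W=1000) a sender that has already sent octets up to 1600 may send 400 more (octets 1601 through 2000) before it must wait for new credit.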

TCP Flow Control Mechanism II


Sending and Receiving Flow Control Perspectives


Effects of Window Size on TCP Performance


Notation
– W = TCP window size (octets)
– R = data rate (bps) at TCP source available to a given TCP connection
– D = propagation delay (seconds) between TCP source and destination

Normalized throughput

\[ S = \begin{cases} 1 & W \ge RD/4 \\ \dfrac{4W}{RD} & W < RD/4 \end{cases} \]
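The threshold RD/4 is the number of octets that fit into one round trip: 2D seconds at R bps is 2RD bits, or RD/4 octets. A minimal sketch of the resulting throughput formula follows; the function name and the example values in the usage note are assumptions.

```python
def tcp_window_throughput(w_octets, r_bps, d_seconds):
    """Normalized throughput S = min(1, 4W / (RD))."""
    if r_bps <= 0 or d_seconds <= 0:
        raise ValueError("rate and delay must be positive")
    pipe_octets = r_bps * d_seconds / 4  # octets in flight during one round trip
    return min(1.0, w_octets / pipe_octets)
```

For instance, with a 65535-octet window on a hypothetical 1 Gbps path with D = 10 ms, RD/4 is 2.5 million octets and the connection can use only about 2.6% of the available rate.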

Effects of Window Size on TCP Performance II


TCP Error Control


Retransmission strategy
– No explicit negative acknowledgment such as REJ or SREJ
– TCP relies exclusively on positive acknowledgments and retransmissions when an acknowledgment does not arrive within a given timeout period

Retransmission timer
– Key element of TCP
– Two strategies to set the timer
  – Fixed
    – If too small, there will be many unnecessary retransmissions
    – If too large, the protocol will be sluggish in responding to a lost segment
  – Variable
    – Should be set to a value somewhat longer than the round-trip delay
    – Adapting to the current network behavior

Adaptive Retransmission Timer


Average round-trip time

\[ ARTT(K+1) = \frac{1}{K+1} \sum_{i=1}^{K+1} RTT(i) \]

\[ ARTT(K+1) = \frac{K}{K+1} ARTT(K) + \frac{1}{K+1} RTT(K+1) \]

Weighted average round-trip time

\[ SRTT(K+1) = \alpha \cdot SRTT(K) + (1-\alpha) \cdot RTT(K+1) \]

\[ SRTT(K+1) = (1-\alpha) RTT(K+1) + \alpha(1-\alpha) RTT(K) + \dots + \alpha^{K}(1-\alpha) RTT(1) \]

Retransmission timer (retransmission timeout) (RFC793)

\[ RTO(K+1) = \beta \cdot SRTT(K+1) \]
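The smoothing recursion and the RFC793-style timeout can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the defaults alpha = 0.875 and beta = 2.0 are assumptions (RFC 793 suggests alpha between 0.8 and 0.9 and beta between 1.3 and 2.0), and SRTT(0) = 0 is assumed so that the result matches the expanded sum.

```python
def smoothed_rtt(samples, alpha=0.875, srtt0=0.0):
    """Apply SRTT(K+1) = alpha * SRTT(K) + (1 - alpha) * RTT(K+1) over all samples."""
    srtt = srtt0
    for rtt in samples:
        srtt = alpha * srtt + (1.0 - alpha) * rtt
    return srtt


def rto_rfc793(srtt, beta=2.0):
    """RTO(K+1) = beta * SRTT(K+1)."""
    return beta * srtt
```

With alpha = 0.5 and samples 1, 2, 3 the recursion yields 2.125, the same value as the expanded form 0.5·3 + 0.25·2 + 0.125·1, which shows how older samples are weighted down exponentially.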

Exponential Smoothing Coefficients


Exponential Averaging – Increasing Function


Exponential Averaging – Decreasing Function


TCP Implementation Policy Options


Send policy
– TCP may construct a segment for each batch of user data, or it may wait until a certain amount of data accumulates before sending a segment

Deliver policy
– TCP may deliver data as each in-order segment is received, or it may buffer data from a number of segments before delivery

Accept policy
– In-order: accept only segments that arrive in-order
– In-window: accept all segments that are within the receive window

TCP Implementation Policy Options II


Retransmit policy
– First-only: one retransmission timer per queue; if the timer expires, retransmit the first segment in the queue
– Batch: one retransmission timer per queue; if the timer expires, retransmit all segments in the queue
– Individual: one timer for each segment in the queue

Acknowledge policy
– Immediate: immediately transmit an empty (no data) segment containing the appropriate acknowledgment number
– Cumulative: wait for data on which to piggyback the acknowledgment; if no data becomes available before a timer expires, transmit an empty segment containing the acknowledgment number

TCP Congestion Control


Congestion control in a TCP/IP-based internet is a complex and difficult undertaking due to the following factors:

– IP is a connectionless, stateless protocol that includes no provision for detecting, much less controlling congestion
– TCP provides only end-to-end flow control and can only deduce the presence of congestion within the intervening internet by indirect means
– TCP entities cannot cooperate to maintain a certain total level of flow and, indeed, are more likely to compete selfishly for available resources

The only tool in TCP that relates to network congestion is the sliding-window flow and error control mechanism

TCP Flow and Congestion Control


Pacing effect in TCP
– The rate at which a TCP entity can send data is determined by the rate of incoming ACKs to previous segments with new credit

TCP Flow and Congestion Control II


Pacing effect in TCP
– The rate of ACKs is determined by the bottleneck in the round-trip path between the source and the destination, and that bottleneck may be either the destination or the internet

TCP Flow and Congestion Control III


Pacing effect in TCP
– The source has no way of knowing whether the pacing rate at which it receives ACKs reflects the status of the internet (congestion control) or the status of the destination (flow control)

TCP Congestion Control Mechanisms


Retransmission timer management
– RTT variance estimation
– Exponential RTO backoff
– Karn’s algorithm

RTT Variance Estimation (Jacobson‘s Algorithm)


Retransmission timer (retransmission timeout) (RFC793)

\[ RTO(K+1) = \beta \cdot SRTT(K+1) \]

A typical value is β = 2. In a stable environment, with low variance of RTT, this leads to an unnecessarily high value of RTO.

Estimation of the variability in RTT values
AERR(K) is the sample mean deviation, ADEV is the mean deviation

\[ AERR(K+1) = RTT(K+1) - ARTT(K) \]

\[ ADEV(K+1) = \frac{1}{K+1} \sum_{i=1}^{K+1} |AERR(i)| = \frac{K}{K+1} ADEV(K) + \frac{1}{K+1} |AERR(K+1)| \]

RTT Variance Estimation (Jacobson‘s Algorithm) II


Algorithm proposed by Jacobson

\[ SRTT(K+1) = (1-g) \cdot SRTT(K) + g \cdot RTT(K+1) \]

\[ SERR(K+1) = RTT(K+1) - SRTT(K) \]

\[ SDEV(K+1) = (1-h) \cdot SDEV(K) + h \cdot |SERR(K+1)| \]

\[ RTO(K+1) = SRTT(K+1) + f \cdot SDEV(K+1) \]

Proposed values for the constants:
– g = 1/8 = 0.125
– h = 1/4 = 0.25
– f = 2
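Jacobson's four update equations can be collected in a small estimator. This is a minimal sketch using the constants from this slide (g = 1/8, h = 1/4, f = 2); the class name and the initialization (SRTT set to the first sample, SDEV set to 0) are assumptions, since the slides do not specify initial values.

```python
class JacobsonEstimator:
    """Tracks SRTT and SDEV and derives RTO = SRTT + f * SDEV."""

    def __init__(self, first_rtt, g=0.125, h=0.25, f=2.0):
        self.g, self.h, self.f = g, h, f
        self.srtt = first_rtt  # assumed initialization: first sample
        self.sdev = 0.0        # assumed initialization: no deviation yet

    @property
    def rto(self):
        return self.srtt + self.f * self.sdev

    def update(self, rtt):
        """Feed one new RTT sample and return the new RTO."""
        serr = rtt - self.srtt  # SERR(K+1) uses the OLD SRTT(K)
        self.srtt = (1 - self.g) * self.srtt + self.g * rtt
        self.sdev = (1 - self.h) * self.sdev + self.h * abs(serr)
        return self.rto
```

Starting from SRTT = 1.0, a sample of 2.0 gives SERR = 1.0, SRTT = 1.125, SDEV = 0.25, and RTO = 1.625; the timer reacts to the jump through the deviation term rather than through a fixed multiplier.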

Exponential RTO Backoff


– When a TCP sender times out on a segment, it must retransmit the segment. RFC 793 assumes that the same RTO value will be used for this retransmitted segment
– If a number of parallel TCP connections suffer from lost packets, the retransmissions will be started at roughly the same time (constant RTO)
– Backoff process: the RTO is increased at each retransmission

\[ RTO = q \cdot RTO \]

– With the common value q = 2, this technique is also referred to as binary exponential backoff
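The backoff rule RTO = q · RTO produces a geometric sequence of timeouts. This is a minimal sketch; the function name and the upper bound `max_rto` are assumptions (implementations usually cap the timer, but the slides do not mention a cap).

```python
def backoff_schedule(rto, retries, q=2.0, max_rto=64.0):
    """Return the RTO values used for successive retransmissions of one segment."""
    schedule = []
    for _ in range(retries):
        rto = min(rto * q, max_rto)  # multiply by q, but never exceed the assumed cap
        schedule.append(rto)
    return schedule
```

Starting from an RTO of 1 second with q = 2, four successive retransmissions wait 2, 4, 8, and 16 seconds.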

Karn‘s Algorithm


– If no segments are retransmitted, the sampling process for Jacobson’s algorithm is straightforward
– However, if an acknowledgment is received after a retransmission, there are two possibilities
  – This is the ACK to the first transmission; in this case the RTT is simply longer than expected
  – This is the ACK to the second transmission
– The TCP source cannot distinguish between these two cases
– Consider the second case
  – Measuring the RTT from the first transmission to the ACK yields a value that is much too long. Jacobson’s algorithm will then produce an unnecessarily high value of SRTT and therefore of RTO
  – An even worse approach would be to measure the RTT from the second transmission to the receipt of the ACK. If this is in fact the ACK to the first transmission, then the measured RTT will be much too small, producing too low a value of SRTT and RTO.

Karn‘s Algorithm II


Karn’s algorithm solves this problem with the following rules:
– Do not use the measured RTT for a retransmitted segment to update SRTT and SDEV
– Calculate the backoff RTO when a retransmission occurs
– Use the backoff RTO value for succeeding segments until an acknowledgement arrives for a segment that has not been retransmitted
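The three rules can be layered on top of Jacobson's estimator. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the caller tells `on_ack` whether the acknowledged segment had been retransmitted, and the constants g, h, f, and q take the values used earlier in this chapter.

```python
class KarnTimer:
    """Retransmission timer that follows Karn's rules on top of SRTT/SDEV."""

    def __init__(self, srtt, sdev=0.0, g=0.125, h=0.25, f=2.0, q=2.0):
        self.srtt, self.sdev = srtt, sdev
        self.g, self.h, self.f, self.q = g, h, f, q
        self.rto = srtt + f * sdev

    def on_timeout(self):
        """Rule 2: back off the RTO whenever a retransmission occurs."""
        self.rto *= self.q

    def on_ack(self, rtt, retransmitted):
        if retransmitted:
            # Rule 1: the sample is ambiguous, so SRTT and SDEV stay untouched.
            # Rule 3: the backed-off RTO stays in force.
            return
        serr = rtt - self.srtt
        self.srtt = (1 - self.g) * self.srtt + self.g * rtt
        self.sdev = (1 - self.h) * self.sdev + self.h * abs(serr)
        self.rto = self.srtt + self.f * self.sdev  # back to the computed RTO
```

Only an acknowledgement for a segment that was sent exactly once resets the timer to the value derived from SRTT and SDEV.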

TCP Congestion Control Mechanisms


Window management
– Slow start
– Dynamic window sizing on congestion
– Fast retransmit
– Fast recovery
– Limited transmit

Slow Start


– The larger the send window used in TCP, the more segments a TCP source can send before it must wait for an acknowledgment
– In the ordinary course of events, the self-clocking nature of TCP paces TCP appropriately
– However, when a connection is first initialized, it has no such pacing to guide it
– Simple strategy: start with some relatively large window, hoping to approximate the window size that would ultimately be provided by the connection
– This is risky because the sender might flood the internet with many segments before it realizes from timeouts that the flow was excessive

Slow Start II


Jacobson recommends a procedure known as slow start

\[ awnd = \mathrm{MIN}[\,credit,\ cwnd\,] \]

– awnd = allowed window, currently allowed window size
– cwnd = congestion window, used during startup and to reduce flow during periods of congestion
– credit = amount of unused credit granted in the most recent acknowledgment

– When a new connection is opened, TCP initializes cwnd = 1
– Each time an acknowledgment arrives, cwnd is increased by 1, up to some maximum value


Slow Start III

The term slow start is a bit of a misnomer, because cwnd actually grows exponentially. When the first ACK arrives, TCP sets cwnd to 2 and can send two segments. When these two segments are ACKed, TCP increases cwnd by 1 for each incoming ACK. Therefore, at this point TCP can send four segments.
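The exponential growth follows from the per-ACK increment: a window of cwnd segments produces cwnd ACKs per round trip, each adding 1. A minimal sketch, assuming every segment is acknowledged individually and no losses occur (the function name and the cap `max_cwnd` are hypothetical):

```python
def slow_start_rounds(rounds, max_cwnd=64):
    """Return cwnd (in segments) at the start of each round trip during slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd                        # one ACK per segment sent in this round
        cwnd = min(cwnd + acks, max_cwnd)  # +1 per ACK doubles the window
        history.append(cwnd)
    return history
```

Four round trips take cwnd through 1, 2, 4, 8, 16.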

Dynamic Window Sizing on Congestion


– The slow start algorithm has been found to work effectively for initializing a connection
– Can the same procedure be used in congestion?
– Suppose a running connection, cwnd has reached its maximum, and a single segment is lost (timeout). This is a signal that congestion is occurring.
– Because it is not clear how serious the congestion is, it would be a prudent procedure to reset cwnd = 1 and begin the slow start process all over
– On the other hand, this procedure might be too aggressive because the exponential growth might even worsen the congestion

Dynamic Window Sizing on Congestion II


Jacobson proposed the use of slow start to begin with, followed by a linear growth in cwnd

When a timeout occurs
– Set a slow start threshold equal to half the current congestion window; that is, set ssthresh = cwnd / 2
– Set cwnd = 1 and perform the slow start process until cwnd = ssthresh
– For cwnd ≥ ssthresh, increase cwnd by one for each round-trip time
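The three steps after a timeout can be traced round trip by round trip. This is a minimal sketch with a hypothetical function name; it assumes no further losses, counts windows in whole segments, and uses integer division for ssthresh.

```python
def window_after_timeout(cwnd_at_timeout, rounds):
    """Return (ssthresh, cwnd per round trip) following a timeout."""
    ssthresh = cwnd_at_timeout // 2  # half the window in use when the timeout hit
    cwnd = 1
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential up to ssthresh
        else:
            cwnd += 1                       # congestion avoidance: +1 per round trip
        trace.append(cwnd)
    return ssthresh, trace
```

A timeout at cwnd = 16 gives ssthresh = 8; the window then grows 1, 2, 4, 8 exponentially and 9, 10, 11 linearly.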


Slow Start and Congestion Avoidance

a) Slow start

b) Dynamic window sizing on congestion (ssthresh=8)

Slow Start and Congestion Avoidance II


Fast Retransmit


The retransmission timer (RTO) will generally be noticeably longer than the actual round-trip time (RTT). Both RFC793 and Jacobson’s algorithm set the value of RTO somewhat greater than the estimated RTT because:
– RTO is a prediction of the next RTT; if delays in the network fluctuate, the estimated RTT may be smaller than the actual RTT
– Similarly, if delays at the destination fluctuate, the estimated RTT becomes unreliable
– The destination may not ACK each segment but cumulatively ACK multiple segments, while at the same time sending ACKs when it has any data to send. This behavior contributes to the fluctuation in RTT.

The consequence is that if a segment is lost, TCP may be slow to retransmit

Fast Retransmit II


TCP rule
– When a TCP entity receives a segment out of order, it must immediately issue an ACK for the last in-order segment that was received
– TCP will continue to repeat this ACK with each incoming segment until the missing segment arrives

When a source TCP receives a duplicate ACK, it means that
– the segment following the ACKed segment was delayed, or
– that segment was lost.

Fast retransmit (Jacobson)
– To be reasonably sure that the second case is true, wait until three duplicate ACKs to the same segment are received (that is, a total of four ACKs to the same segment)
– Retransmit the lost segment
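Counting duplicate ACKs at the sender is enough to trigger fast retransmit. This is a minimal sketch with hypothetical class and method names; it only decides when to retransmit and leaves the actual sending to the caller.

```python
class DupAckDetector:
    """Signals fast retransmit on the third duplicate ACK for the same number."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dups = 0

    def on_ack(self, ack_no):
        """Return True exactly when the segment starting at ack_no should be retransmitted."""
        if ack_no == self.last_ack:
            self.dups += 1
            return self.dups == self.threshold  # fire once, on the third duplicate
        self.last_ack = ack_no  # ACK for new data: reset the counter
        self.dups = 0
        return False
```

For the ACK sequence 100, 200, 200, 200, 200 the detector fires on the last one, which is the fourth ACK carrying 200 and therefore the third duplicate.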


Fast Retransmit III

Fast Recovery


– When a TCP entity retransmits a segment using fast retransmit, it knows that a segment was lost, even if it has not yet timed out on that segment. Accordingly, the TCP entity should take congestion avoidance measures.
– Jacobson pointed out that the slow start/congestion avoidance procedure is unnecessarily conservative
– Because multiple ACKs have returned, this indicates that data segments are getting through fairly regularly to the other side

Fast recovery technique
– Retransmit the lost segment
– Cut cwnd in half
– Proceed with the linear increase of cwnd

This technique avoids the initial exponential slow start process

Fast Recovery II


When the third duplicate ACK arrives
– Set ssthresh = cwnd / 2
– Retransmit the missing segment
– Set cwnd = ssthresh + 3
– The reason for adding 3 to ssthresh is that this accounts for the number of segments that have left the network and that the other end has cached

Each time an additional duplicate ACK arrives, increase cwnd by 1 and transmit a segment if possible

When the next ACK arrives that acknowledges new data, set cwnd = ssthresh
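The window arithmetic of fast recovery can be followed with a small state holder. This is a minimal sketch with hypothetical names; it tracks only cwnd and ssthresh in segments and leaves the actual (re)transmission to the caller.

```python
class FastRecovery:
    """Tracks cwnd and ssthresh through one fast-recovery episode."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.in_recovery = False

    def on_third_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh + 3  # three segments are known to have left the network
        self.in_recovery = True        # caller retransmits the missing segment here

    def on_additional_dup_ack(self):
        if self.in_recovery:
            self.cwnd += 1             # another segment has left the network

    def on_new_ack(self):
        if self.in_recovery:
            self.cwnd = self.ssthresh  # deflate the window, continue linearly
            self.in_recovery = False
```

Starting at cwnd = 16, the third duplicate ACK gives ssthresh = 8 and cwnd = 11, two more duplicates inflate cwnd to 13, and the first ACK for new data sets cwnd back to 8.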

Fast Recovery III


[Figure labels: last acknowledged sequence number; sequence number plus send window]

Limited Transmit


To improve the behavior of TCP, especially for short-lived connections, in cases of small receive windows, and for congestion control over connections with a small RD product, RFC3042 defines a mechanism for transmitting new segments when the following three conditions are met:
– Two consecutive duplicate ACKs are received. That is, a total of three ACKs to a transmitted segment are received.
– The destination TCP entity’s advertised window allows the transmission of the segment. That is, sufficient credit remains available to the source TCP entity to be able to send a new segment.
– The amount of outstanding data, after sending the new segment, would remain less than or equal to cwnd + 2. That is, the sender may only send two segments beyond the congestion window.

Implementation of TCP Congestion Control Measures


Traffic and Congestion Control in ATM Networks


– The types of traffic patterns imposed on ATM networks, as well as the transmission characteristics of those networks, differ markedly from those of other switching networks.
– Most packet switching networks carry non-real-time data traffic.
– Typically, the traffic on individual virtual circuits is bursty in nature, and the receiving system expects to receive incoming traffic on each connection in a bursty fashion.
– As a result,
  – The network does not need to replicate the exact timing pattern of incoming traffic at the exit node.
  – Therefore, simple statistical multiplexing can be used to accommodate multiple logical connections over the physical interface. The average data rate required by each connection is less than the burst rate, and the user-network interface need only be designed for a capacity somewhat greater than the sum of the average data rates for all connections.

Traffic and Congestion Control in ATM Networks II


Typical congestion control schemes are inadequate for ATM networks for the following reasons:

– The majority of traffic is not amenable to flow control (e.g. voice or video traffic)
– Feedback is slow due to drastically reduced cell transmission time compared to propagation delay across the network
– ATM networks typically support a wide range of applications requiring capacity ranging from a few kbps to several hundred Mbps
– Applications on ATM networks may generate very different traffic patterns (e.g. CBR vs. VBR)
– Different applications on ATM networks require different network services (e.g. delay-sensitive service for voice and video)
– The very high speeds in switching and transmission make ATM networks more volatile in terms of congestion control

Latency / Speed Effects


– Consider the transfer of ATM cells over a network at a data rate of 150 Mbps; at that rate it takes (53 × 8 bits)/(150 × 10^6 bps) ≈ 2.8 × 10^-6 seconds to insert a single cell onto the network
– Ignoring the switching delay and assuming propagation at the speed of light, the round-trip time between the opposite coasts of the United States is about 48 × 10^-3 seconds
– With these conditions in place, N additional cells are transmitted before congestion (detected by loss of data) can be deduced, where

\[ N = \frac{48 \times 10^{-3}\ \text{seconds}}{2.8 \times 10^{-6}\ \text{seconds/cell}} = 1.7 \times 10^{4}\ \text{cells} = 7.2 \times 10^{6}\ \text{bits} \]
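The arithmetic behind N can be checked in a few lines. This is a minimal sketch with a hypothetical function name; it uses the unrounded cell time, so the cell count comes out slightly below the rounded 1.7 × 10^4 figure.

```python
CELL_BITS = 53 * 8  # one ATM cell is 53 octets


def cells_in_flight(rate_bps, rtt_seconds):
    """Return (cell insertion time, cells sent during one RTT, bits sent during one RTT)."""
    cell_time = CELL_BITS / rate_bps
    cells = rtt_seconds / cell_time
    return cell_time, cells, cells * CELL_BITS
```

At 150 Mbps and a 48 ms round trip this gives a cell time of about 2.83 microseconds, roughly 17,000 cells, and 7.2 × 10^6 bits already committed to the network before any loss can be noticed.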

Cell Delay Variation


Time reassembly for CBR

Cell Delay Variation II


Network contribution
– Queuing effects, processing time for the header, routing
– In ATM networks very low because
  – fixed size cells, fixed header formats, no flow and error control
  – Negligible processing time for an individual node

UNI contribution
– Processing required at the three layers of the ATM model
– Connection multiplexing
– OAM (operation and maintenance) insertion

Resource Management Using Virtual Paths


– Aggregate peak demand: The network may set the capacity (data rate) on the VPC equal to the total of the peak data rates on all VCCs.
– Statistical multiplexing: If the network sets the capacity of the VPC to be greater than or equal to the total of the average data rates of all the VCCs but less than the aggregate peak demand, then statistical multiplexing is used.

Traffic Policing


Generic Cell Rate Algorithm (GCRA)
– Leaky bucket algorithm
  – I … increment
  – L … limit
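One common way to express GCRA(I, L) is the virtual scheduling formulation, which is equivalent to the continuous-state leaky bucket. A minimal sketch follows; the slides give only the parameters I and L, so the class name, the method name, and the variable `tat` (theoretical arrival time) are assumptions introduced for illustration.

```python
class GCRA:
    """Virtual-scheduling form of GCRA(I, L) for policing cell arrivals."""

    def __init__(self, increment, limit):
        self.increment = increment  # I: expected spacing between conforming cells
        self.limit = limit          # L: tolerance for cells that arrive early
        self.tat = None             # theoretical arrival time of the next cell

    def conforms(self, arrival_time):
        """Return True if the cell arriving at arrival_time is conforming."""
        if self.tat is None or arrival_time >= self.tat:
            self.tat = arrival_time + self.increment  # on time or late
            return True
        if self.tat - arrival_time > self.limit:
            return False                              # too early: state is unchanged
        self.tat += self.increment                    # early, but within tolerance
        return True
```

With I = 10 and L = 5, cells arriving at times 0, 6, 12, 15, and 40 are judged conforming, conforming, non-conforming, conforming, and conforming; the cell at time 12 arrives 8 time units before its theoretical arrival time of 20, which exceeds the limit.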

Traffic Shaping


Token bucket (special form of the leaky bucket algorithm)
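A token bucket can be sketched as a counter that refills at a fixed rate up to a maximum burst size. This is a minimal sketch, not taken from the slides: the class name, the parameters `rate` and `capacity`, and the choice to start with a full bucket are assumptions.

```python
class TokenBucket:
    """Tokens accumulate at `rate` per second up to `capacity`; each cell consumes tokens."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # assumed: the bucket starts full
        self.last = 0.0

    def allow(self, now, size=1):
        """Return True if a cell of `size` tokens may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False            # a shaper would hold the cell until tokens arrive
```

Unlike a policer, a shaper built on this counter delays cells instead of marking or dropping them, which smooths bursts to the configured rate while still permitting bursts of up to `capacity` cells.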
