Advanced Internet Technologies, SS 2004 4.1
Advanced Internet Technologies
Chapter 4
Congestion Control and Traffic Management
Dr.-Ing. Falko Dressler
Chair for Computer Networks & Internet
Wilhelm-Schickard-Institute for Computer Science
University of Tübingen
http://net.informatik.uni-tuebingen.de/
Chapter 4
Congestion Control and Traffic Management
  Congestion Control
  Link-Level Flow and Error Control
  TCP Traffic Control
  Traffic and Congestion Control in ATM Networks
Queues in Systems and Networks
[Figure: Input and Output Queues at Switch or Router]
[Figure: Interaction of Queues in a Data Network]
Congestion Control Mechanisms
Backpressure
  Based on logical connections (flows)
  Flow restrictions are applied in case of congestion
  These are propagated backward to the source (hop-by-hop)
  Operable in connection-oriented networks, e.g. X.25
  Not available in frame relay, ATM, or the IP-based internet
Choke Packet
  Control packet generated at a congested node and transmitted back to the source to restrict traffic flow
  Example: ICMP source quench packet
    Generated by any router or host that must discard IP datagrams because of a full buffer
    The sender should cut back the rate at which it is sending traffic
    Repeated for every discarded packet
Congestion Control Mechanisms II
Implicit Congestion Signaling
  If the source is able to detect increased delays and packet discards, then it has implicit evidence of network congestion.
  Example: TCP congestion techniques
    Based on acknowledgments
    Using delay measures and packet losses
Explicit Congestion Signaling
  Typically used in connection-oriented networks
  Can operate in two directions:
    Backward: notifying the source about congestion
    Forward: notifying the user that congestion control mechanisms should be initiated for traffic in the same direction as the received packet
  Categories of explicit congestion signaling approaches
    Binary: a bit is set in a data packet
    Credit based: explicit credits are required for sending packets
    Rate based: explicit rate limits are applied to a connection
Traffic Management
Fairness
  During congestion, all flows suffer from increased delays and packet losses
  Fairness means that all flows suffer from congestion equally
Quality of Service
  In some cases flows should be treated differently
  Voice or video applications are more delay sensitive than others
  Different priorities might be applied to different flows
  During congestion, flows should be treated based on their QoS requirements and/or priorities
Traffic Management II
Reservations
  Reservation schemes are one way to avoid congestion and to provide assured service to applications
  A traffic contract is negotiated during connection setup
  The network agrees to give a defined QoS as long as the traffic flow is within contract parameters
  If the current resources are inadequate to meet new reservations, the new reservation is denied
Link-Level Flow and Error Control
Fundamental mechanisms that determine the performance of communication links, networks, and internetworks
Closely tied to the mechanisms for recovering from lost packets
Typically implemented at link level, sometimes also at network level, transport level, or application level
Performance modeling of flow and error control techniques is extraordinarily difficult
Nevertheless, it is very important and is therefore discussed in this section
The need for Flow and Error Control
Flow control limits the amount or rate of data that is sent, for several reasons:
  As each PDU arrives, the destination must process it. The source may attempt to send PDUs faster than the destination can process them.
  The destination protocol entity may buffer the incoming data for delivery to the higher-level protocol. If that is slow, the buffer may fill up and the destination may need to limit or temporarily halt the flow from the source.
  The destination may buffer the incoming data for retransmission on another I/O port (e.g. in the case of a router) and may need to limit the incoming flow to match the outgoing flow.
Link Control Mechanisms
Reasons for breaking a data block into a sequence of frames
  Buffer size of the receiver may be limited
  The longer the transmission, the more likely that there will be an error
  On a shared medium, such as a LAN, fairness between the senders is desired
Link control mechanisms
  Stop and Wait
  Sliding-Window technique
    Go-Back-N
    Selective-Reject
Stop and Wait
Simplest form of flow control
Working principle
  A source entity transmits a frame
  After reception, the destination entity indicates its willingness to accept another frame by sending an acknowledgement
  The source must wait until it receives the acknowledgement before sending the next frame
Possible errors
  A received frame could be damaged; that is, one or more bits have been altered
    Solution: error control, e.g. cyclic redundancy check (CRC), retransmission of the damaged frame
  A received acknowledgement could be damaged
    Solution: labeling of all frames with 0 and 1, acknowledging with ACK0 and ACK1
Sliding-Window Techniques
Main problems of Stop-and-Wait
  Only one frame at a time can be in transit
  An explicit acknowledgement is required for each single frame
  Thus, the transmission time for each frame is very high
Solution
  Efficiency can be greatly improved by allowing multiple frames to be in transit at the same time
  An acknowledgement is required only for a group of frames
  Thus, the transmission time decreases with an increasing window size
Examples of ARQs (Automatic Repeat Requests)
  Go-back-N
  Selective-Reject
Sliding-Window Techniques IV
Go-back-N ARQ
  A sender may transmit a series of frames sequentially numbered modulo some maximum value
  The number of unacknowledged frames outstanding is determined by the window size
  While no error occurs, the receiver acknowledges incoming frames
    RR (receive ready)
    Piggybacked acknowledgements
  If an error is detected, a negative acknowledgement is sent: REJ (reject). The destination will discard that frame and all subsequent incoming frames until the frame in error is correctly received
Selective-Reject ARQ
  Only those frames are retransmitted that receive a negative acknowledgement (SREJ), or those that time out
  More efficient than Go-back-N
  Receiver must maintain a buffer large enough to save post-SREJ frames
ARQ Performance – Stop-and-Wait
Error-free
  Transmission time T
$$T = T_{\text{frame}} + T_{\text{prop}} + T_{\text{proc}} + T_{\text{ack}} + T_{\text{prop}} + T_{\text{proc}}$$
  Assuming a negligible processing time and very small acknowledgement frames
$$T = T_{\text{frame}} + 2T_{\text{prop}}$$
  Normalized throughput S
$$S = \frac{1/(T_{\text{frame}} + 2T_{\text{prop}})}{1/T_{\text{frame}}} = \frac{T_{\text{frame}}}{T_{\text{frame}} + 2T_{\text{prop}}}$$
ARQ Performance – Stop-and-Wait II
With errors
  Transmission time T
$$T = \text{Timeout} + T_{\text{frame}} + 2T_{\text{prop}}$$
  Assuming a negligible processing time and very small acknowledgement frames (N_x is the expected number of transmissions of a frame)
$$T = N_x (T_{\text{frame}} + 2T_{\text{prop}})$$
  Normalized throughput S
$$S = \frac{T_{\text{frame}}}{N_x (T_{\text{frame}} + 2T_{\text{prop}})}$$
Parameter a
Parameter a
  d distance of the link
  V velocity of propagation
  L length of the link control frame
  R data rate on the link
$$a = \frac{T_{\text{prop}}}{T_{\text{frame}}} = \frac{d/V}{L/R} = \frac{Rd}{VL}$$
Stop-and-Wait without errors
$$S = \frac{1}{1 + 2a}$$
Stop-and-Wait with errors
$$S = \frac{1 - P}{1 + 2a}$$
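As a rough numerical illustration of the parameter a and the two stop-and-wait throughput formulas on this slide, the following Python sketch may help. The function names and the sample link (200 km, 2·10^8 m/s, 1000-bit frames, 1 Mbps) are assumptions chosen for illustration, and P is taken to be the probability that a frame is in error.

```python
def param_a(d, v, frame_bits, rate_bps):
    """a = T_prop / T_frame = (d / V) / (L / R)."""
    t_prop = d / v                    # propagation time in seconds
    t_frame = frame_bits / rate_bps   # time to clock one frame onto the link
    return t_prop / t_frame


def stop_and_wait_throughput(a, p=0.0):
    """S = (1 - P) / (1 + 2a); p = 0 gives the error-free case."""
    return (1.0 - p) / (1.0 + 2.0 * a)


# Hypothetical link, not taken from the lecture.
a = param_a(d=200e3, v=2e8, frame_bits=1000, rate_bps=1e6)
print("a =", a)
print("S without errors:", stop_and_wait_throughput(a))
print("S with P = 0.1:  ", stop_and_wait_throughput(a, 0.1))
```

With a close to 1, the link is idle for roughly two thirds of the time even without errors, which is the motivation for the sliding-window techniques.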
ARQ Performance – Sliding-Window
Selective-Reject
$$S = \begin{cases} 1 - P & W \ge 2a + 1 \\ \dfrac{W(1 - P)}{2a + 1} & W < 2a + 1 \end{cases}$$
Go-back-N
$$S = \begin{cases} \dfrac{1 - P}{1 + 2aP} & W \ge 2a + 1 \\ \dfrac{W(1 - P)}{(2a + 1)(1 - P + WP)} & W < 2a + 1 \end{cases}$$
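The two sliding-window throughput expressions can be compared numerically with a short Python sketch. The function names and the sample values (a = 10, P = 0.01, and the three window sizes) are assumptions for illustration only.

```python
def selective_reject_throughput(w, a, p):
    """Normalized throughput of Selective-Reject ARQ."""
    if w >= 2 * a + 1:
        return 1.0 - p
    return w * (1.0 - p) / (2 * a + 1)


def go_back_n_throughput(w, a, p):
    """Normalized throughput of Go-back-N ARQ."""
    if w >= 2 * a + 1:
        return (1.0 - p) / (1.0 + 2 * a * p)
    return w * (1.0 - p) / ((2 * a + 1) * (1.0 - p + w * p))


# Hypothetical link with a = 10 and frame error probability P = 0.01.
for w in (1, 7, 127):
    print(w,
          round(selective_reject_throughput(w, 10, 0.01), 4),
          round(go_back_n_throughput(w, 10, 0.01), 4))
```

For W = 1 both expressions reduce to the stop-and-wait result (1 − P) / (1 + 2a), which is a quick consistency check on the formulas.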
TCP Traffic Control
Overview of TCP Flow Control
  End-to-end flow control mechanism
  Connection-oriented transport protocol
  High impact on the congestion level in the network
Sliding-window mechanism
  Decoupled acknowledgments and permissions to send additional data
TCP Flow Control Mechanism
Credit allocation scheme
Parameters (TCP header)
  Sequence number (SN)
  Acknowledgment number (AN)
  Window (W)
Working principle
  Data segment including SN
  Acknowledgment including (AN=i, W=j)
    All octets through SN=i-1 are acknowledged
    Permission is granted to send an additional W=j octets of data
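The credit allocation rule can be sketched in a few lines of Python. The helper names and the sample numbers are hypothetical; the only thing taken from the slide is the meaning of (AN=i, W=j): octets through i-1 are acknowledged and j further octets may be sent, i.e. octets i through i+j-1.

```python
def send_limit(an, window):
    """Highest sequence number the sender may use after receiving (AN=an, W=window)."""
    return an + window - 1


def usable_credit(an, window, next_sn):
    """Octets that may still be sent if octets up to next_sn - 1 were already sent."""
    return max(0, an + window - next_sn)


# Hypothetical exchange: the receiver acknowledges through octet 1000 and grants 1400 octets.
print(send_limit(1001, 1400))           # last octet covered by the credit
print(usable_credit(1001, 1400, 1601))  # credit left after 600 octets are in flight
```

Because the acknowledgment and the credit are separate fields, a receiver can acknowledge data without granting new credit, or grant credit without new data having arrived.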
Effects of Window Size on TCP Performance
Notation
  W = TCP window size (octets)
  R = data rate (bps) at TCP source available to a given TCP connection
  D = propagation delay (seconds) between TCP source and destination
Normalized throughput
$$S = \begin{cases} 1 & W \ge RD/4 \\ \dfrac{4W}{RD} & W < RD/4 \end{cases}$$
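A small Python sketch of the window-limited throughput may make the RD/4 threshold concrete: a round trip takes 2D seconds, so 2RD bits, or RD/4 octets, fit into the pipe. The function name and the two sample connections are assumptions for illustration.

```python
def tcp_window_throughput(w_octets, rate_bps, delay_s):
    """Normalized throughput S for window W (octets), rate R (bps), one-way delay D (s)."""
    pipe_octets = rate_bps * delay_s / 4.0   # 2*R*D bits per round trip = R*D/4 octets
    if w_octets >= pipe_octets:
        return 1.0                           # the window keeps the pipe full
    return 4.0 * w_octets / (rate_bps * delay_s)


# Hypothetical connections using the largest window without scaling (65535 octets).
print(tcp_window_throughput(65535, 64e3, 0.03))   # slow link: window is ample
print(tcp_window_throughput(65535, 1e9, 0.03))    # 1 Gbps link: window is the bottleneck
```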
TCP Error Control
Retransmission strategy
  No explicit negative acknowledgment such as REJ or SREJ
  TCP relies exclusively on positive acknowledgments and retransmissions when an acknowledgment does not arrive within a given timeout period
Retransmission timer
  Key element of TCP
  Two strategies to set the timer
    Fixed
      – If too small, there will be many unnecessary retransmissions
      – If too large, the protocol will be sluggish in responding to a lost segment
    Variable
      – Should be set to a value somewhat longer than the round-trip delay
      – Adapting to the current network behavior
Adaptive Retransmission Timer
Average round-trip time
$$ARTT(K+1) = \frac{1}{K+1} \sum_{i=1}^{K+1} RTT(i)$$
$$ARTT(K+1) = \frac{K}{K+1}\,ARTT(K) + \frac{1}{K+1}\,RTT(K+1)$$
Weighted average round-trip time
$$SRTT(K+1) = \alpha \cdot SRTT(K) + (1-\alpha) \cdot RTT(K+1)$$
$$SRTT(K+1) = (1-\alpha) \cdot RTT(K+1) + \alpha(1-\alpha) \cdot RTT(K) + \dots + \alpha^{K}(1-\alpha) \cdot RTT(1)$$
Retransmission timer (retransmission timeout) (RFC793)
$$RTO(K+1) = \beta \cdot SRTT(K+1)$$
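The averaging schemes on this slide can be tried out with a short Python sketch. The function names, the value alpha = 0.875, and the choice to seed SRTT with the first sample are assumptions made for illustration; only the update rules themselves come from the slide.

```python
def simple_average(samples):
    """ARTT after each sample: ARTT(K+1) = K/(K+1) * ARTT(K) + RTT(K+1)/(K+1)."""
    artt, out = 0.0, []
    for k, rtt in enumerate(samples):   # k = number of samples seen so far
        artt = (k / (k + 1)) * artt + rtt / (k + 1)
        out.append(artt)
    return out


def smoothed_rtt(samples, alpha=0.875, beta=2.0):
    """(SRTT, RTO) after each sample beyond the first.

    SRTT(K+1) = alpha * SRTT(K) + (1 - alpha) * RTT(K+1), RTO = beta * SRTT.
    Assumption: SRTT is seeded with the first sample.
    """
    srtt, out = samples[0], []
    for rtt in samples[1:]:
        srtt = alpha * srtt + (1 - alpha) * rtt
        out.append((srtt, beta * srtt))
    return out


rtts = [1.0, 1.2, 0.9, 3.0, 1.1]   # hypothetical measurements in seconds
print(simple_average(rtts))
print(smoothed_rtt(rtts))
```

A larger alpha gives more weight to history, so a single outlier such as the 3.0 s sample moves SRTT only slightly.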
TCP Implementation Policy Options
Send policy
  TCP may construct a segment for each batch of user data, or it may wait until a certain amount of data accumulates before sending a segment
Deliver policy
  TCP may deliver data as each in-order segment is received, or it may buffer data from a number of segments before delivery
Accept policy
  In-order: accept only segments that arrive in-order
  In-window: accept all segments that are within the receive window
TCP Implementation Policy Options II
Retransmit policy
  First-only: one retransmission timer per queue; if the timer expires, retransmit the first segment in the queue
  Batch: one retransmission timer per queue; if the timer expires, retransmit all segments in the queue
  Individual: one timer for each segment in the queue
Acknowledge policy
  Immediate: immediately transmit an empty (no data) segment containing the appropriate acknowledgment number
  Cumulative: wait for data on which to piggyback the acknowledgment; if no data is available (timer), transmit an empty segment containing the acknowledgment number
TCP Congestion Control
Congestion control in a TCP/IP-based internet is a complex and difficult undertaking due to the following factors:
  IP is a connectionless, stateless protocol that includes no provision for detecting, much less controlling, congestion
  TCP provides only end-to-end flow control and can only deduce the presence of congestion within the intervening internet by indirect means
  TCP entities cannot cooperate to maintain a certain total level of flow and, indeed, are more likely to compete selfishly for available resources
The only tool in TCP that relates to network congestion is the sliding-window flow and error control mechanism
TCP Flow and Congestion Control
Pacing effect in TCP
  The rate at which a TCP entity can send data is determined by the rate of incoming ACKs to previous segments with new credit
TCP Flow and Congestion Control II
Pacing effect in TCP
  The rate of ACKs is determined by the bottleneck in the round-trip path between the source and the destination, and that bottleneck may be either the destination or the internet
TCP Flow and Congestion Control III
Pacing effect in TCP
  The source has no way of knowing whether the pacing rate at which it receives ACKs reflects the status of the internet (congestion control) or the status of the destination (flow control)
TCP Congestion Control Mechanisms
Retransmission timer management
  RTT variance estimation
  Exponential RTO backoff
  Karn’s algorithm
RTT Variance Estimation (Jacobson’s Algorithm)
Retransmission timer (retransmission timeout) (RFC793)
$$RTO(K+1) = \beta \cdot SRTT(K+1)$$
  A typical value is β = 2. In a stable environment, with low variance of RTT, this leads to an unnecessarily high value of RTO.
Estimation of the variability in RTT values
  AERR(K) is the sample mean deviation, ADEV is the mean deviation
$$AERR(K+1) = RTT(K+1) - ARTT(K)$$
$$ADEV(K+1) = \frac{1}{K+1} \sum_{i=1}^{K+1} |AERR(i)| = \frac{K}{K+1}\,ADEV(K) + \frac{1}{K+1}\,|AERR(K+1)|$$
RTT Variance Estimation (Jacobson’s Algorithm) II
Algorithm proposed by Jacobson
$$SRTT(K+1) = (1-g) \cdot SRTT(K) + g \cdot RTT(K+1)$$
$$SERR(K+1) = RTT(K+1) - SRTT(K)$$
$$SDEV(K+1) = (1-h) \cdot SDEV(K) + h \cdot |SERR(K+1)|$$
$$RTO(K+1) = SRTT(K+1) + f \cdot SDEV(K+1)$$
Proposed values for the constants:
  g = 1/8 = 0.125
  h = 1/4 = 0.25
  f = 2
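Jacobson's update rules translate almost directly into code. The following Python sketch uses the constants from the slide; the class name, the seeding of SRTT with the first measurement, and the initial deviation of zero are assumptions. The deviation is updated with the absolute value of the error, as is usual for a mean deviation.

```python
class JacobsonEstimator:
    """Sketch of an RTO estimator with smoothed RTT and smoothed mean deviation."""

    def __init__(self, first_rtt, g=0.125, h=0.25, f=2.0):
        self.g, self.h, self.f = g, h, f
        self.srtt = first_rtt   # assumption: seed with the first measurement
        self.sdev = 0.0         # assumption: start with zero deviation

    def update(self, rtt):
        """Feed RTT(K+1) and return RTO(K+1)."""
        serr = rtt - self.srtt                                   # uses the old SRTT(K)
        self.srtt = (1 - self.g) * self.srtt + self.g * rtt
        self.sdev = (1 - self.h) * self.sdev + self.h * abs(serr)
        return self.srtt + self.f * self.sdev


est = JacobsonEstimator(first_rtt=1.0)
for sample in (1.0, 1.1, 0.9, 2.0, 1.0):   # hypothetical RTT samples in seconds
    print(round(est.update(sample), 4))
```

With stable samples the RTO stays close to SRTT, while a burst of variation widens the safety margin, which is the improvement over the fixed factor β.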
Exponential RTO Backoff
When a TCP sender times out on a segment, it must retransmit the segment. RFC 793 assumes that the same RTO value will be used for this retransmitted segment
If a number of parallel TCP connections suffer from lost packets, the retransmissions will be started at roughly the same time (constant RTO)
Backoff process: the RTO will be increased at each retransmission
$$RTO = q \cdot RTO$$
With the common value q = 2, this technique is also referred to as binary exponential backoff
Karn’s Algorithm
If no segments are retransmitted, the sampling process for Jacobson’s algorithm is straightforward
However, if an acknowledgment is received after a retransmission, there are two possibilities
  This is the ACK to the first transmission; in this case the RTT is simply longer than expected
  This is the ACK to the second transmission
The TCP source cannot distinguish between these two cases
Consider the second case
  Measuring the RTT between the first transmission and the ACK will give a value that is much too long. Jacobson’s algorithm will produce an unnecessarily high value of SRTT and therefore RTO
  An even worse approach would be to measure the RTT from the second transmission to the receipt of the ACK. If this is in fact the ACK to the first transmission, then the measured RTT will be much too small, producing too low a value of SRTT and RTO.
Karn’s Algorithm II
Karn’s algorithm solves this problem with the following rules:
  Do not use the measured RTT for a retransmitted segment to update SRTT and SDEV
  Calculate the backoff RTO when a retransmission occurs
  Use the backoff RTO value for succeeding segments until an acknowledgement arrives for a segment that has not been retransmitted
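The interplay of exponential backoff and Karn's rules can be sketched as a small state holder in Python. The class and method names, the cap max_rto, and the way a fresh estimator value is passed in are assumptions for illustration; the RTT estimator itself is left outside this sketch.

```python
class RetransmitTimer:
    """Sketch of RTO handling with exponential backoff and Karn's rules."""

    def __init__(self, rto, q=2.0, max_rto=64.0):
        self.base_rto = rto      # latest value computed by the RTT estimator
        self.rto = rto           # value currently in use
        self.q = q
        self.max_rto = max_rto   # assumption: upper bound on the backed-off RTO

    def on_timeout(self):
        """A segment timed out and is retransmitted: RTO = q * RTO."""
        self.rto = min(self.rto * self.q, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted, estimator_rto=None):
        """Handle an ACK; estimator_rto is the RTO from a valid new RTT sample."""
        if was_retransmitted:
            return self.rto                  # Karn: ignore the sample, keep the backoff RTO
        if estimator_rto is not None:
            self.base_rto = estimator_rto    # sample from a segment sent only once
        self.rto = self.base_rto             # leave the backoff state
        return self.rto


timer = RetransmitTimer(rto=1.0)
print(timer.on_timeout())                    # first timeout
print(timer.on_timeout())                    # second timeout
print(timer.on_ack(was_retransmitted=True))  # ambiguous ACK: RTO unchanged
print(timer.on_ack(False, estimator_rto=1.2))
```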
TCP Congestion Control Mechanisms
Window management
  Slow start
  Dynamic window sizing on congestion
  Fast retransmit
  Fast recovery
  Limited transmit
Slow Start
The larger the send window used in TCP, the more segments a TCP source can send before it must wait for an acknowledgment
In the ordinary course of events, the self-clocking nature of TCP paces TCP appropriately
However, when a connection is first initialized, it has no such pacing to guide it
Simple strategy: start with some relatively large window, hoping to approximate the window size that would ultimately be provided by the connection
This is risky because the sender might flood the internet with many segments before it realizes from timeouts that the flow is excessive
Slow Start II
Jacobson recommends a procedure known as slow start
$$awnd = \text{MIN}[\,credit,\ cwnd\,]$$
  awnd = allowed window, currently allowed window size
  cwnd = congestion window, used during startup and to reduce flow during periods of congestion
  credit = amount of unused credit granted in the most recent acknowledgment
When a new connection is opened, TCP initializes cwnd = 1
Each time an acknowledgment arrives, cwnd is increased by 1, up to some maximum value
Slow Start III
The term slow start is a bit of a misnomer, because cwnd actually grows exponentially. When the first ACK arrives, TCP sets cwnd to 2 and can send two segments. When these two segments are ACKed, TCP increases cwnd by 1 for each incoming ACK. Therefore, at this point TCP can send four segments.
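The exponential growth can be reproduced with a few lines of Python. This is a simplified sketch that assumes every segment of a round trip is acknowledged individually and that the advertised credit never limits the sender; the function name and max_cwnd are hypothetical.

```python
def slow_start_rounds(rounds, max_cwnd=64):
    """cwnd (in segments) at the start of each round trip during slow start."""
    cwnd, trace = 1, []
    for _ in range(rounds):
        trace.append(cwnd)
        acks = cwnd                          # one ACK per segment sent in this round
        cwnd = min(cwnd + acks, max_cwnd)    # cwnd grows by 1 for each ACK
    return trace


print(slow_start_rounds(6))   # [1, 2, 4, 8, 16, 32]
```

Adding 1 per ACK doubles the window once per round trip, so the growth per ACK is linear while the growth over time is exponential.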
Dynamic Window Sizing on Congestion
The slow start algorithm has been found to work effectively for initializing a connection
Can the same procedure be used in congestion?
Suppose that, on a running connection, cwnd has reached its maximum and a single segment is lost (timeout). This is a signal that congestion is occurring.
Because it is not clear how serious the congestion is, it would be a prudent procedure to reset cwnd = 1 and begin the slow start process all over
On the other hand, this procedure might be too aggressive because the exponential growth might even worsen the congestion
Dynamic Window Sizing on Congestion II
Jacobson proposed the use of slow start to begin with, followed by a linear growth in cwnd
When a timeout occurs
  Set a slow start threshold equal to half the current congestion window; that is, set ssthresh = cwnd / 2
  Set cwnd = 1 and perform the slow start process until cwnd = ssthresh
  For cwnd ≥ ssthresh, increase cwnd by one for each round-trip time
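The three steps after a timeout can be traced per round trip with a small Python sketch. It assumes that the timeout occurred at cwnd = 16 (so ssthresh = 8), that no further losses occur, and that the credit never limits the sender; the function name and parameters are hypothetical.

```python
def cwnd_trace(rounds, cwnd_at_timeout=16, max_cwnd=64):
    """cwnd per round trip after a timeout: slow start up to ssthresh, then linear growth."""
    ssthresh = cwnd_at_timeout // 2
    cwnd, trace = 1, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: doubles each round trip
        else:
            cwnd = min(cwnd + 1, max_cwnd)   # congestion avoidance: +1 per round trip
    return trace


print(cwnd_trace(8))   # [1, 2, 4, 8, 9, 10, 11, 12]
```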
Slow Start and Congestion Avoidance
[Figure: a) Slow start; b) Dynamic window sizing on congestion (ssthresh = 8)]
Fast Retransmit
The retransmission timer (RTO) will generally be noticeably longer than the actual round-trip time (RTT). Both RFC 793 and Jacobson’s algorithm set the value of RTO somewhat greater than the estimated RTT because:
  RTO is a prediction of the next RTT; if delays in the network fluctuate, the estimated RTT may be smaller than the actual RTT
  Similarly, if delays at the destination fluctuate, the estimated RTT becomes unreliable
  The destination may not ACK each segment but cumulatively ACK multiple segments, while at the same time sending ACKs when it has any data to send. This behavior contributes to the fluctuation in RTT.
The consequence is that if a segment is lost, TCP may be slow to retransmit
Fast Retransmit II
TCP rule
  When a TCP entity receives a segment out of order, it must immediately issue an ACK for the last in-order segment that was received
  TCP will continue to repeat this ACK with each incoming segment until the missing segment arrives
When a source TCP receives a duplicate ACK, it means that
  the segment following the ACKed segment was delayed, or
  that segment was lost.
Fast retransmit (Jacobson)
  To make sure that the second case is true, wait until three duplicate ACKs to the same segment are received (that is, a total of four ACKs to the same segment)
  Retransmit the lost segment
Fast Recovery
When a TCP entity retransmits a segment using fast retransmit, it knows that a segment was lost, even if it has not yet timed out on that segment. Accordingly, the TCP entity should take congestion avoidance measures.
Jacobson pointed out that the slow start/congestion avoidance procedure is unnecessarily conservative
Because multiple ACKs have returned, this indicates that data segments are getting through fairly regularly to the other side
Fast recovery technique
  Retransmit the lost segment
  Cut cwnd in half
  Proceed with the linear increase of cwnd
This technique avoids the initial exponential slow start process
Fast Recovery II
When the third duplicate ACK arrives
  Set ssthresh = cwnd / 2
  Retransmit the missing segment
  Set cwnd = ssthresh + 3
  The reason for adding 3 to ssthresh is that this accounts for the number of segments that have left the network and that the other end has cached
Each time an additional duplicate ACK arrives, increase cwnd by 1 and transmit a segment if possible
When the next ACK arrives that acknowledges new data, set cwnd = ssthresh
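The window arithmetic of fast retransmit and fast recovery can be sketched as a small state machine in Python. It tracks only the counters (cwnd and ssthresh in segments) and does not send anything; the class name, the counter of retransmissions, and the lower bound of 2 segments on ssthresh are assumptions for illustration.

```python
class FastRecoverySender:
    """Sketch of the cwnd/ssthresh bookkeeping for fast retransmit and fast recovery."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmissions = 0

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3 and not self.in_recovery:
            self.ssthresh = max(self.cwnd // 2, 2)   # assumption: floor of 2 segments
            self.retransmissions += 1                # fast retransmit of the missing segment
            self.cwnd = self.ssthresh + 3            # three segments have left the network
            self.in_recovery = True
        elif self.in_recovery:
            self.cwnd += 1                           # each further duplicate ACK inflates cwnd

    def on_new_ack(self):
        if self.in_recovery:
            self.cwnd = self.ssthresh                # deflate and continue linearly
            self.in_recovery = False
        self.dup_acks = 0


s = FastRecoverySender(cwnd=16)
for _ in range(4):
    s.on_duplicate_ack()
print(s.ssthresh, s.cwnd)   # after four duplicate ACKs
s.on_new_ack()
print(s.cwnd)               # after the ACK for new data
```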
Fast Recovery III
[Figure labels: last acknowledged sequence number; sequence number plus send window]
Limited Transmit
To improve the behavior of TCP, especially for short-lived connections, in cases of small receive windows, and for congestion control over connections with a small RD product, RFC 3042 defines a mechanism for transmitting new segments when the following three conditions are met:
  Two consecutive duplicate ACKs are received. That is, a total of three ACKs to a transmitted segment are received.
  The destination TCP entity’s advertised window allows the transmission of the segment. That is, sufficient credit remains available to the source TCP entity to be able to send a new segment.
  The amount of outstanding data, after sending the new segment, would remain less than or equal to cwnd + 2. That is, the sender may only send two segments beyond the congestion window.
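The three conditions can be written as a single predicate. The Python sketch below follows the wording of the slide and counts everything in segments; the function name and its parameters are hypothetical.

```python
def limited_transmit_allowed(dup_acks, credit, outstanding, cwnd):
    """True if one new segment may be sent under the three conditions listed above.

    dup_acks:    consecutive duplicate ACKs received so far
    credit:      segments the advertised window still allows
    outstanding: segments sent but not yet acknowledged
    cwnd:        congestion window in segments
    """
    two_duplicates = dup_acks == 2
    window_allows = credit >= 1
    within_bound = outstanding + 1 <= cwnd + 2   # at most two segments beyond cwnd
    return two_duplicates and window_allows and within_bound


print(limited_transmit_allowed(dup_acks=2, credit=5, outstanding=10, cwnd=10))
print(limited_transmit_allowed(dup_acks=2, credit=5, outstanding=12, cwnd=10))
```

Sending new data in this situation keeps ACKs flowing, so a third duplicate ACK can still arrive and trigger fast retransmit even when the window is small.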
Traffic and Congestion Control in ATM Networks
The types of traffic patterns imposed on ATM networks, as well as the transmission characteristics of those networks, differ markedly from those of other switching networks.
Most packet switching networks carry non-real-time data traffic.
Typically, the traffic on individual virtual circuits is bursty in nature, and the receiving system expects to receive incoming traffic on each connection in a bursty fashion.
As a result,
  The network does not need to replicate the exact timing pattern of incoming traffic at the exit node.
  Therefore, simple statistical multiplexing can be used to accommodate multiple logical connections over the physical interface. The average data rate required by each connection is less than the burst rate, and the user-network interface need only be designed for a capacity somewhat greater than the sum of the average data rates for all connections.
Traffic and Congestion Control in ATM Networks II
Typical congestion control schemes are inadequate for ATM networks for the following reasons:
  The majority of traffic is not amenable to flow control (e.g. voice or video traffic)
  Feedback is slow due to drastically reduced cell transmission time compared to propagation delay across the network
  ATM networks typically support a wide range of applications requiring capacity ranging from a few kbps to several hundred Mbps
  Applications on ATM networks may generate very different traffic patterns (e.g. CBR vs. VBR)
  Different applications on ATM networks require different network services (e.g. delay-sensitive service for voice and video)
  The very high speeds in switching and transmission make ATM networks more volatile in terms of congestion control
Latency / Speed Effects
Consider the transfer of ATM cells over a network at a data rate of 150 Mbps; at that rate it takes $(53 \times 8\ \text{bits}) / (150 \times 10^{6}\ \text{bps}) \approx 2.8 \times 10^{-6}$ seconds to insert a single cell onto the network
Ignoring the switching delay and assuming propagation at the speed of light, the round-trip time between the opposite coasts of the United States is about $48 \times 10^{-3}$ seconds
With these conditions in place, an additional N cells are transmitted before congestion (detected by loss of data) can be deduced, where
$$N = \frac{48 \times 10^{-3}\ \text{seconds}}{2.8 \times 10^{-6}\ \text{seconds/cell}} = 1.7 \times 10^{4}\ \text{cells} = 7.2 \times 10^{6}\ \text{bits}$$
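The arithmetic on this slide is easy to re-check in Python. The variable names are arbitrary; the numbers (53-octet cells, 150 Mbps, 48 ms round trip) are the ones given above. Without intermediate rounding the cell count comes out slightly below the rounded value on the slide.

```python
CELL_BITS = 53 * 8            # one ATM cell in bits
rate_bps = 150e6              # link data rate
round_trip_s = 48e-3          # coast-to-coast round-trip time

cell_time_s = CELL_BITS / rate_bps       # time to insert one cell
n_cells = round_trip_s / cell_time_s     # cells sent during one round trip
n_bits = n_cells * CELL_BITS             # same amount expressed in bits

print(f"cell insertion time: {cell_time_s:.2e} s")
print(f"cells in flight:     {n_cells:.0f}")
print(f"bits in flight:      {n_bits:.2e}")
```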
Cell Delay Variation II
Network contribution
  Queuing effects, processing time for the header, routing
  In ATM networks very low because
    fixed size cells, fixed header formats, no flow and error control
    Negligible processing time for an individual node
UNI contribution
  Processing required at the three layers of the ATM model
  Connection multiplexing
  OAM (operation and maintenance) insertion
Resource Management Using Virtual Paths
Aggregate peak demand: The network may set the capacity (data rate) on the VPC equal to the total of the peak data rates on all VCCs.
Statistical multiplexing: If the network sets the capacity of the VPC to be greater than or equal to the average data rates of all the VCCs but less than the aggregate peak demand, then statistical multiplexing is supplied.
Traffic Policing
Generic Cell Rate Algorithm (GCRA)
  Leaky bucket algorithm
  I … increment
  L … limit
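A minimal Python sketch of GCRA(I, L) is given below. It uses the virtual scheduling formulation with a theoretical arrival time (TAT), which is commonly presented as equivalent to the continuous-state leaky bucket; the class and method names, the time unit, and the sample arrival times are assumptions for illustration.

```python
class GCRA:
    """Sketch of the Generic Cell Rate Algorithm in its virtual scheduling form."""

    def __init__(self, increment, limit):
        self.increment = increment   # I: expected spacing between conforming cells
        self.limit = limit           # L: tolerance for early arrivals
        self.tat = None              # theoretical arrival time of the next cell

    def conforms(self, arrival_time):
        """Return True if a cell arriving at arrival_time is conforming."""
        if self.tat is None:
            self.tat = arrival_time              # first cell defines the schedule
        if arrival_time < self.tat - self.limit:
            return False                         # too early: non-conforming, TAT unchanged
        self.tat = max(arrival_time, self.tat) + self.increment
        return True


policer = GCRA(increment=10, limit=2)
arrivals = [0, 10, 18, 25, 40]                   # hypothetical cell arrival times
print([policer.conforms(t) for t in arrivals])   # [True, True, True, False, True]
```

The cell at time 18 is two units early and still within the limit, whereas the cell at time 25 arrives before TAT − L = 28 and is flagged as non-conforming.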