CS551: Queue Management

Congestion control vs. resource allocation

• Network’s key role is to allocate its transmission resources to users or applications

• Two sides of the same coin:
  – let the network do resource allocation (e.g., VCs)
    • difficult to do allocation of distributed resources
    • can be wasteful of resources
  – let sources send as much data as they want
    • recover from congestion when it occurs
    • easier to implement, but may lose packets

Connectionless Flows
• How can a connectionless network allocate anything to a user?
  – It doesn’t know about users or applications
• Flow:
  – a sequence of packets between the same source-destination pair, following the same route
• A flow is visible to routers; it is not a channel, which is an end-to-end abstraction
• Routers may maintain soft state for a flow
• A flow can be implicitly defined or explicitly established (similar to a VC)
  – Different from a VC in that routing is not fixed

Taxonomy
• Router-centric vs. host-centric
  – router-centric: address the problem from inside the network; routers decide what to forward and what to drop
    • A variant not captured in the taxonomy: adaptive routing!
  – host-centric: address the problem at the edges; hosts observe network conditions and adjust behavior
  – not always a clear separation: hosts and routers may collaborate, e.g., routers advise hosts

..Taxonomy..
• Reservation-based vs. feedback-based
  – Reservations: hosts ask for resources, network responds yes/no
    • implies router-centric allocation
  – Feedback: hosts send with no reservation, adjust according to feedback
    • either router- or host-centric: explicit (e.g., ICMP source quench) or implicit (e.g., loss) feedback

..Taxonomy
• Window-based vs. rate-based
• Both tell the sender how much data to transmit
• Window: TCP flow/congestion control
  – flow control: advertised window
  – congestion control: cwnd
• Rate: still an open area of research
  – may be the logical choice for reservation-based allocation


Service Models
• The three taxonomy axes allow eight combinations; in practice, fewer are used
• Best-effort networks
  – Mostly host-centric, feedback-based, window-based
  – TCP as an example
• Networks with flexible Quality of Service
  – Router-centric, reservation-based, rate-based

Queuing Disciplines
• Each router MUST implement some queuing discipline regardless of what the resource allocation mechanism is
• Queuing allocates bandwidth, buffer space, and promptness:
  – bandwidth: which packets get transmitted
  – buffer space: which packets get dropped
  – promptness: when packets get transmitted

FIFO Queuing
• FIFO: first-in-first-out (or FCFS: first-come-first-served)
• Arriving packets get dropped when the queue is full, regardless of flow or importance; this implies drop-tail
• Important distinction:
  – FIFO: scheduling discipline (which packet to serve next)
  – Drop-tail: drop policy (which packet to drop next)
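
The split between scheduling discipline and drop policy can be sketched in a few lines of Python. This is a minimal illustration, not a router implementation; the class name `DropTailFifo` and the packet-count `capacity` are assumptions made for the example.

```python
from collections import deque

class DropTailFifo:
    """FIFO scheduling combined with a drop-tail drop policy (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed limit, counted in packets
        self.queue = deque()
        self.dropped = []          # packets rejected on arrival

    def enqueue(self, packet):
        # Drop policy: when the buffer is full, the arriving packet is dropped.
        if len(self.queue) >= self.capacity:
            self.dropped.append(packet)
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        # Scheduling discipline: serve packets strictly in arrival order.
        return self.queue.popleft() if self.queue else None
```

Swapping only `enqueue` (for example, to evict a queued packet instead) would change the drop policy while leaving the FIFO scheduling untouched.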

[Figure: queue-management design space: per-connection state, single class, or class-based queuing; drop position (head, tail, random location); early drop vs. overflow drop]


..FIFO
• FIFO + drop-tail is the simplest queuing algorithm
  – used widely in the Internet
• Leaves responsibility for congestion control to the edges (e.g., TCP)
• FIFO lets a large user get more data through but shares congestion with others
  – does not provide isolation between different flows
  – no policing



Fair Queuing
• Main idea:
  – maintain a separate queue for each flow currently flowing through the router
  – router services queues in round-robin fashion
• Changes the interaction between packets from different flows
  – Provides isolation between flows
  – Ill-behaved flows cannot starve well-behaved flows
  – Allocates buffer space and bandwidth fairly
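
As a rough illustration of the main idea, the sketch below keeps one queue per flow and serves one packet per non-empty queue per round. It is packet-by-packet round-robin only, so it ignores packet lengths; the function name `round_robin` and the dictionary input format are assumptions for the example.

```python
from collections import deque

def round_robin(flows):
    """Serve one packet per non-empty flow queue per round.

    `flows` maps a flow id to a list of packets in arrival order.
    Returns the transmission order as (flow_id, packet) pairs.
    """
    queues = {fid: deque(pkts) for fid, pkts in flows.items()}
    order = []
    while any(queues.values()):          # an empty deque is falsy
        for fid, q in queues.items():
            if q:
                order.append((fid, q.popleft()))
    return order
```

A flow with many queued packets gets at most one transmission per round, so it cannot starve a flow that has only one packet waiting.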

FQ Illustration
[Figure: per-flow queues for Flow 1, Flow 2, ..., Flow n]


Variation: Weighted Fair Queuing (WFQ)

Some Issues
• What constitutes a user?
  – Several granularities at which one can express flows
  – For now, assume the granularity of a source-destination pair, but this assumption is not critical
• Packets are of different lengths
  – A source sending longer packets can still grab more than its share of resources
  – We really need bit-by-bit round-robin
  – Fair Queuing simulates bit-by-bit RR
    • not feasible to interleave bits!

Bit-by-bit RR
• Router maintains a local clock
• Single flow: suppose the clock ticks when a bit is transmitted. For packet i:
  – P_i: length, A_i: arrival time, S_i: begin-transmit time, F_i: finish-transmit time
  – F_i = S_i + P_i
  – S_i = max(F_{i-1}, A_i), so F_i = max(F_{i-1}, A_i) + P_i
• Multiple flows: the clock ticks when a bit from each active flow has been transmitted
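
The single-flow recurrence can be checked with a short Python sketch. The function name `finish_times` and the starting value F_0 = 0 are assumptions made for the example; times are in ticks of the bit clock.

```python
def finish_times(packets):
    """Compute F_i = max(F_{i-1}, A_i) + P_i for one flow.

    `packets` is a list of (arrival_time, length_in_bits) pairs in
    arrival order.
    """
    finishes = []
    prev_finish = 0  # assumed F_0 = 0: nothing was sent before the first packet
    for arrival, length in packets:
        start = max(prev_finish, arrival)   # S_i
        prev_finish = start + length        # F_i = S_i + P_i
        finishes.append(prev_finish)
    return finishes
```

For arrivals at ticks 0, 50, and 500 with lengths 100, 200, and 50, the second packet waits for the first to finish, while the third starts on arrival because the flow has gone idle.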

Fair Queuing
• While we cannot actually perform bit-by-bit interleaving, we can compute F_i for each packet, then use F_i to schedule packets
  – Transmit earliest F_i first
• Still not completely fair
  – But the difference is now bounded by the size of the largest packet
  – Compare with the previous approach
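
A minimal sketch of "transmit earliest F_i first", under simplifying assumptions: all packets are known up front, arrival times are already expressed in rounds of the bit-by-bit clock, and ties are broken arbitrarily by flow id. The function name `fq_order` is hypothetical, and a real router would compute the round number online.

```python
def fq_order(flows):
    """Return packets in order of increasing finish number F_i.

    `flows` maps a flow id to a list of (arrival, length) pairs, where
    `arrival` is already expressed in rounds of the bit-by-bit clock.
    """
    tagged = []
    for fid, packets in flows.items():
        prev_finish = 0  # assumed F_0 = 0 for every flow
        for seq, (arrival, length) in enumerate(packets):
            prev_finish = max(prev_finish, arrival) + length
            tagged.append((prev_finish, fid, seq))
    # Ties on F_i fall back to flow id, then sequence number (arbitrary choice).
    return [(fid, seq, finish) for finish, fid, seq in sorted(tagged)]
```

With one flow sending 1000-bit packets and another sending 100-bit packets, the short packets are scheduled ahead of the long ones, which is the effect packet-by-packet round-robin cannot provide.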

Fair Queuing Example
[Figure: two panels showing Flow 1, Flow 2, and the Output link; in one panel a Flow 1 packet arrives while a Flow 2 packet is being transmitted]
• Cannot preempt the packet currently being transmitted

Delay Allocation
• Aim: give less delay to those using less than their fair share
• Advance finish times for sources whose queues drain temporarily
• B_i = P_i + max(F_{i-1}, A_i - δ)
• Schedule earliest B_i first

Allocate Promptness
• B_i = P_i + max(F_{i-1}, A_i - δ)
• δ gives added promptness:
  – if A_i < F_{i-1}, the conversation is active and δ does not affect it: B_i = F_i = P_i + F_{i-1}
  – if A_i > F_{i-1}, the conversation is inactive and δ determines how much history to take into account
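
The effect of the promptness term can be seen by evaluating the bid for an active and an inactive conversation. This is a small sketch; the function name `bid` and the parameter name `delta` (the promptness parameter subtracted from the arrival time) are naming assumptions.

```python
def bid(prev_finish, arrival, length, delta):
    """B_i = P_i + max(F_{i-1}, A_i - delta)."""
    return length + max(prev_finish, arrival - delta)
```

When the arrival is earlier than the previous finish time, the bid equals the ordinary finish time whatever `delta` is. When the flow has been idle, a larger `delta` pulls the bid earlier, but never below `prev_finish + length`.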

Notes on FQ
• FQ is a scheduling policy, not a drop policy
• Still achieves statistical multiplexing: one flow can fill the entire pipe if there are no contenders
  – FQ is work-conserving
• WFQ is a possible variation
  – need to learn about weights offline
  – default is one bit per flow, but sending more bits is possible

More Notes on FQ
• Router does not send explicit feedback to the source; still needs e2e congestion control
  – FQ isolates ill-behaved users by forcing users to share overload with themselves
  – user: flow, transport protocol, etc.
• Optimal behavior at the source is to keep one packet in the queue
• But maintaining per-flow state can be expensive
  – Flow aggregation is a possibility

Congestion Avoidance
• TCP’s approach is reactive:
  – detect congestion after it happens
  – increase load, trying to maximize utilization, until loss occurs
  – TCP has a congestion avoidance phase, but that’s different from what we’re talking about here
• Alternatively, we can be proactive:
  – we can try to predict congestion and reduce rate before loss occurs
  – this is called congestion avoidance

Router Congestion Notification

• Routers are well-positioned to detect congestion
  – Router has a unified view of queuing behavior
  – Routers can distinguish between propagation and persistent queuing delays
  – Routers can decide on transient congestion, based on workload
• Hosts themselves are limited in their ability to infer these from perceived behavior

Router Mechanisms
• Congestion notification
  – the DEC-bit scheme
    • explicit congestion feedback to the source
  – Random Early Detection (RED)
    • implicit congestion feedback to the source
    • well suited for TCP

Design Choices for Feedback
• What kind of feedback
  – Separate packets (source quench)
  – Mark packets, receiver propagates marks in ACKs
• When to generate feedback
  – Based on router utilization
    • You can be near 100% utilization without seeing a throughput degradation
  – Queue lengths
    • But what queue lengths (instantaneous, average)?

A Binary Feedback Scheme for Congestion Control in Computer Networks (DEC-bit) [Ramakrishnan90]


The Dec-bit Scheme
• Basic ideas:
  – on congestion, router sets a congestion indication (CI) bit on the packet
  – receiver relays the bit to the sender in acknowledgements
  – sender uses the feedback to adjust its sending rate
• Key design questions:
  – Router: feedback policy (how and when does a router generate feedback?)
  – Source: signal filtering (how does the sender respond?)

Why Queue Lengths?
• It is desirable to implement FIFO
  – Fast implementations possible
  – Shares delay among connections
  – Gives low delay during bursts
• FIFO queue length is then a natural choice for detecting the onset of congestion

The Use of Hysteresis
• If we use queue lengths, at what queue lengths should we generate feedback?
  – Threshold or hysteresis?
• Surprisingly, simulations showed that if you want to increase power:
  – Use no hysteresis
  – Use an average queue length threshold of 1
  – Maximizes the power function
• Power = throughput/delay

Computing Average Queue Lengths

• Possibilities:
  – Instantaneous
    • Premature, unfair
  – Averaged over a fixed time window, or exponential average
    • Can be unfair if the time window differs from the round-trip time
• Solution
  – Adaptive queue length estimation: busy/idle cycles
  – But need to account for long current busy periods

Sender Behavior
• How often should the source change its window?
• In response to what received information should it change its window?
• By how much should the source change its window?
  – We already know the answer to this: AIMD
    • DEC-bit scheme uses a multiplicative factor of 0.875

How Often to Change Window?

• Not on every ACK received
  – Window size would oscillate dramatically because it takes time for a window change’s effects to be felt
    • If the window changes to W, it takes (W+1) packets for feedback about that window to be received
• Correct policy: wait for (W+W’) acks
  – where W is the window size before the update and W’ is the size after the update

Using Received Information
• Use the CI bits from W’ acks to decide whether congestion still persists
• Clearly, if some fraction of the bits are set, then congestion exists
• What fraction?
  – Depends on the policy used to set the threshold
  – When the queue size threshold is 1, the cutoff fraction should be 0.5
  – This has the nice property that the resulting power is relatively insensitive to this choice

Changing the Sender’s Window

• Sender policy
  – monitor packets within a window
  – change the window based on the fraction of packets that had CI set:
    • if < 50% had CI set, then increase window by 1
    • else new window = window * 0.875
  – additive increase, multiplicative decrease for stability
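
The sender policy above can be sketched as a single update function. The name `decbit_update`, the list-of-bits input, and the choice to leave the window unchanged when no acks were observed are assumptions made for the example.

```python
def decbit_update(window, ci_bits, decrease_factor=0.875, cutoff=0.5):
    """Return the new window given the CI bits seen in the last window of acks."""
    if not ci_bits:
        return window                    # assumed: no feedback, no change
    fraction_set = sum(ci_bits) / len(ci_bits)
    if fraction_set < cutoff:
        return window + 1                # additive increase
    return window * decrease_factor      # multiplicative decrease
```

For example, a window of 8 grows to 9 when one of four acks carries CI, and shrinks to 7.0 when half of them do.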


Dec-bit Evaluation
• Relatively easy to implement
• No per-connection state
• Stable
• Assumes cooperative sources
• Conservative window increase policy
• Some analytical intuition to guide design
  – Most design parameters determined by extensive simulation

Random Early Detection (RED) [Floyd93]

Random Early Detection (RED)

• Motivation:
  – high bandwidth-delay flows have large queues to accommodate transient congestion
  – TCP detects congestion from loss, after queues have built up and increased delay
• Aim:
  – keep throughput high and delay low
  – accommodate bursts

Why Active Queue Management? (RFC 2309)
• Lock-out problem:
  – drop-tail allows a few flows to monopolize the queue space, locking out other flows (due to synchronization)
• Full-queues problem:
  – drop-tail maintains full or nearly full queues during congestion; but queue limits should reflect the size of bursts we want to absorb, not steady-state queuing

Other Options
• Random drop:
  – a packet arriving when the queue is full causes some random packet to be dropped
• Drop front:
  – on full queue, drop the packet at the head of the queue
• Random drop and drop front solve the lock-out problem but not the full-queues problem

Solving the Full Queues Problem

• Drop packets before the queue becomes full (early drop)
• Intuition: notify senders of incipient congestion
  – example: early random drop (ERD):
    • if qlen > drop level, drop each new packet with fixed probability p
    • does not control misbehaving users
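
The ERD rule is small enough to write out directly. In this sketch the function name `erd_should_drop` and the injectable `rng` argument (used only to make the behavior reproducible) are assumptions.

```python
import random

def erd_should_drop(qlen, drop_level, p, rng=random):
    """Early random drop: above the drop level, drop arrivals with probability p."""
    if qlen <= drop_level:
        return False
    return rng.random() < p
```

Because p is fixed and applied to every arrival alike, a flow that ignores drops simply keeps a proportional share of what gets through, which is why ERD does not control misbehaving users.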

Differences With Dec-bit
• Random marking/dropping of packets
• Exponentially weighted queue lengths
• Senders react to a single packet
• Rationale:
  – Exponential weighting is better for high-bandwidth connections
  – No bias when the weighting interval differs from the round-trip time, since packets are marked randomly
  – Random marking avoids bias against bursty traffic

RED Goals
• Detect incipient congestion, allow bursts
• Keep power (throughput/delay) high
  – keep average queue size low
  – assume hosts respond to lost packets
• Avoid window synchronization
  – randomly mark packets
• Avoid bias against bursty traffic
• Some protection against ill-behaved users

RED Operation
[Figure: queue with min thresh and max thresh shown against the average queue length; plot with minthresh and maxthresh marked on the avg length axis]


Queue Estimation
• Standard EWMA: avg ← (1 - w_q) avg + w_q qlen
• Upper bound on w_q depends on min_th
  – want to set w_q to allow a certain burst size
• Lower bound on w_q to detect congestion relatively quickly
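
A short sketch of the EWMA update shows why the weight bounds matter. The function name `ewma_update` and the weight value used in the usage note are illustrative assumptions.

```python
def ewma_update(avg, qlen, wq):
    """One step of avg <- (1 - wq) * avg + wq * qlen."""
    return (1 - wq) * avg + wq * qlen


def average_after_burst(burst_len, qlen, wq, avg=0.0):
    """Average after `burst_len` consecutive samples at queue length `qlen`."""
    for _ in range(burst_len):
        avg = ewma_update(avg, qlen, wq)
    return avg
```

With an illustrative weight of 0.002, a burst of 50 arrivals that each see an instantaneous queue of 100 moves the average from 0 to only about 9.5, so a short burst stays below a moderate minimum threshold. A much smaller weight would also delay the response to persistent congestion, which is the lower-bound concern.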

Thresholds
• min_th determined by the utilization requirement
  – Needs to be high for fairly bursty traffic
• max_th set to twice min_th
  – Rule of thumb
  – Difference must be larger than the queue size increase in one RTT
• Bandwidth dependence

Packet Marking
• Marking probability based on queue length
  – P_b = max_p (avg - min_th) / (max_th - min_th)
• Just marking based on P_b can lead to clustered marking -> global synchronization
• Better to bias P_b by the history of unmarked packets
  – P_a = P_b / (1 - count * P_b), where count is the number of unmarked packets since the last mark
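
The two marking formulas can be combined into one function, sketched below. The function name `red_mark_probability` is hypothetical. Returning 0 below the minimum threshold, 1 at or above the maximum threshold, and clamping once count times P_b reaches 1 are assumptions following the usual description of RED, not something the formulas above state.

```python
def red_mark_probability(avg, count, min_th, max_th, max_p):
    """Return the final marking probability P_a for an arriving packet."""
    if avg < min_th:
        return 0.0                      # assumed: no marking below min_th
    if avg >= max_th:
        return 1.0                      # assumed: mark everything at or above max_th
    p_b = max_p * (avg - min_th) / (max_th - min_th)
    denom = 1 - count * p_b
    # When count * P_b reaches 1 the formula diverges; mark with certainty (assumption).
    return 1.0 if denom <= 0 else min(1.0, p_b / denom)
```

As `count` grows, P_a rises from P_b toward 1, so marks are spaced more evenly than independent coin flips with probability P_b would give.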

RED Algorithm

RED Variants
• FRED: Fair Random Early Drop (Sigcomm, 1997)
  – maintain per-flow state only for active flows (ones having packets in the buffer)
• CHOKe (choose and keep/kill) (Infocom 2000)
  – compare new packet with a random packet in the queue
  – if from the same flow, drop both
  – if not, use RED to decide the fate of the new packet
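
The three CHOKe steps listed above can be sketched as follows. This follows only those steps; the function name `choke_admit`, the `(flow_id, payload)` packet tuples, and the `red_decides_drop` callback standing in for the RED decision are all hypothetical.

```python
import random

def choke_admit(queue, packet, red_decides_drop, rng=random):
    """Apply the CHOKe admission test to `packet`; return True if it was enqueued.

    `queue` is a list of (flow_id, payload) tuples and is modified in place.
    `red_decides_drop` is a hypothetical callback for the RED drop decision.
    """
    if queue:
        idx = rng.randrange(len(queue))
        if queue[idx][0] == packet[0]:
            del queue[idx]      # drop the matching queued packet
            return False        # and drop the arriving packet as well
    if red_decides_drop(packet):
        return False
    queue.append(packet)
    return True
```

A flow that occupies most of the buffer is the most likely to be picked in the random comparison, so it loses two packets at a time without the router keeping any per-flow state.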

Extending RED for Flow Isolation

• Problem: what to do with non-cooperative flows?
• Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers
• Pricing can have a similar effect
  – But needs much infrastructure to be developed
• How can we isolate unresponsive flows without per-flow state?

RED Penalty Box
• With RED, monitor the history of packet drops and identify flows that use disproportionate bandwidth
• Isolate and punish those flows

Flows That Must Be Regulated

• Unresponsive:
  – fail to reduce load in response to increased loss
• Not TCP-friendly:
  – long-term usage exceeds that of TCP under the same conditions
• Using disproportionate bandwidth:
  – use disproportionately more bandwidth than other flows during congestion
• Assumptions:
  – We can monitor a flow’s arrival rate

Identifying Flows to Regulate
• Not TCP-friendly: use a TCP model
  – TCP tput: 1.5 * sqrt(2/3) * B / (RTT * sqrt(p))
  – B: packet size in bytes, p: packet drop rate
  – Better approximation in the Padhye et al. paper
  – Problems:
    • Needs bounds on packet sizes and RTTs
• Unresponsive
  – if the drop rate increases by a factor of x, then the arrival rate should decrease by a factor of sqrt(x)
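
A small sketch of the TCP-friendliness test, assuming the throughput model in the form 1.5 * sqrt(2/3) * B / (RTT * sqrt(p)), which is the form that agrees with the window-based bound on the next slide. The function names and the example numbers in the usage note are illustrative.

```python
import math

def tcp_friendly_rate(packet_bytes, rtt_seconds, drop_rate):
    """Model throughput in bytes/second: 1.5 * sqrt(2/3) * B / (RTT * sqrt(p))."""
    return 1.5 * math.sqrt(2 / 3) * packet_bytes / (rtt_seconds * math.sqrt(drop_rate))


def is_tcp_friendly(arrival_rate, packet_bytes, rtt_seconds, drop_rate):
    """True if the measured arrival rate does not exceed the model throughput."""
    return arrival_rate <= tcp_friendly_rate(packet_bytes, rtt_seconds, drop_rate)
```

With 1500-byte packets, a 100 ms RTT, and a 1% drop rate, the model gives roughly 184 kB/s. Quadrupling the drop rate halves that figure, which matches the sqrt(x) rule used for the unresponsiveness test.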


..Flows to Regulate
• Flows using disproportionate bandwidth
  – assume only additive-increase, multiplicative-decrease flows
  – assume cwin = W at loss
  – can be shown that: loss prob <= 8/(3W^2)
  – for segment size B: tput < 0.75 * W * B / RTT
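
As a worked step, the two bounds above can be combined to eliminate W and express the throughput limit in terms of the drop rate p. The symbols T (throughput) and p (loss probability) are shorthand introduced here.

```latex
\begin{align*}
p \le \frac{8}{3W^2}
  &\;\Longrightarrow\; W \le \sqrt{\frac{8}{3p}} \\
T < \frac{0.75\,W B}{\mathit{RTT}}
  &\;\le\; \frac{0.75\,B}{\mathit{RTT}}\sqrt{\frac{8}{3p}}
   \;=\; \frac{1.5\sqrt{2/3}\;B}{\mathit{RTT}\,\sqrt{p}}
\end{align*}
```

The last equality uses 0.75 * sqrt(8/3) = 1.5 * sqrt(2/3), which is about 1.22.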