Congestion Control and Active Queue Management
• Congestion Control, Efficiency and Fairness
• Analysis of TCP Congestion Control
– A simple TCP throughput formula
• RED and Active Queue Management
– How RED works
– Fluid model of TCP and RED interaction
BISS 2010: FAN 1
– Other AQM mechanisms
• XCP: congestion control for large bandwidth-delay products
– Router-based mechanism
– Decoupling congestion control from fairness
Readings: please do the required readings, and the optional readings if interested
Why Congestion Control?
• Inefficiency and congestion collapse
– "self-interest" vs. "social welfare"
• Inefficiency: a simple "artificial" example
Topology: s1 → x over link 1 (C1 = 100 kb/s); s2 → x over link 2 (C2 = 1000 kb/s); x → y over link 3 (C3 = 110 kb/s); y → d1 over link 4 (C4 = 100 kb/s); y → d2 over link 5 (C5 = 10 kb/s)
– source 1 rate λ1 = 100 kb/s → source 1 throughput µ1 = 10 kb/s (!)
– source 2 rate λ2 = 1000 kb/s → source 2 throughput µ2 = 10 kb/s
(At link 3, the offered 1100 kb/s is squeezed into 110 kb/s proportionally, leaving source 1 with 10 kb/s and source 2 with 100 kb/s; link 5 then cuts source 2 down to 10 kb/s.)
Assumption: when total offered traffic exceeds link capacity, all sources see their traffic reduced in proportion of their offered traffic (e.g., when FIFO is used)
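Under this proportional-reduction assumption, the example's throughputs can be recomputed with a short sketch (the `share_link` helper and the link-by-link ordering are illustrative, not from the slides):

```python
def share_link(offered, capacity):
    """Proportionally reduce offered rates (kb/s) when they exceed capacity
    (the FIFO assumption above)."""
    total = sum(offered)
    if total <= capacity:
        return list(offered)
    return [r * capacity / total for r in offered]

# Both sources cross link 3 (110 kb/s); source 1 then exits via
# link 4 (100 kb/s) and source 2 via link 5 (10 kb/s).
r1, r2 = share_link([100, 1000], 110)   # link 3: r1 = 10, r2 = 100
mu1 = share_link([r1], 100)[0]          # link 4
mu2 = share_link([r2], 10)[0]           # link 5
print(mu1, mu2)  # → 10.0 10.0
```

Source 2's aggressiveness gains it nothing, yet it crushes source 1 at the shared link: total useful throughput is 20 kb/s where 110 kb/s was achievable.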
Fairness
• Consider a simple scenario:
– N users want to transmit data over a link of bandwidth C
– Each user i wants bandwidth ri
– If Σ ri ≤ C, no problem!
– But if Σ ri > C, what should we do?
• Suppose all users are of equal "importance":
– To be fair, allocate the same share, C/N, to each user
– OK if every ri ≥ C/N; but what if some ri < C/N?
• i.e., some users want less than their "fair share"
• how do we allocate the "residual" bandwidth left by these users?
– the "Fair Queueing" algorithm: WFQ where wi = 1 for all i
• If not all users are equal: importance denoted by wi
– weighted fair queueing (WFQ)
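The residue-allocation idea can be sketched as a single-link computation (the helper name and demand values are illustrative; WFQ approximates these shares packet by packet):

```python
def fair_shares(demands, capacity):
    """Fair shares on one link: any user demanding no more than the current
    equal share keeps its demand; the leftover "residue" is re-split among
    the remaining users until everyone is either satisfied or capped."""
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    cap = float(capacity)
    while active:
        share = cap / len(active)
        satisfied = [i for i in active if demands[i] <= share]
        if not satisfied:                 # everyone wants more: equal split
            for i in active:
                alloc[i] = share
            break
        for i in satisfied:               # small users keep their demand
            alloc[i] = float(demands[i])
            cap -= demands[i]
            active.remove(i)
    return alloc

print(fair_shares([2, 4, 10, 10], 20))  # → [2.0, 4.0, 7.0, 7.0]
```

The weighted case replaces `cap / len(active)` with a split proportional to the wi of the active users.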
Max-Min Fairness
• Network scenario: a simple line network example
[figure: line network — user 0 crosses every link; user i uses only link i, which has bandwidth Ci]
• Maximizing throughput at each router/link (or total network throughput) may lead to an unfair bw allocation
• How to allocate bw fairly?
– let xij be a "feasible" bw share of user i at link j
– bw allocation to user i: xi = minj xij
– ideally, we want to maximize xi = max minj xij for all users
• Max-min fairness:
– Let {xi} be a bw allocation vector (bav); it is max-min fair if for any other bav y, whenever yi > xi for some i, there exists j with xj ≤ xi and yj < xj
– Unfortunately, such a max-min fair bav may not always exist!
Fairness (cont'd)
• (Abstract) network model: S sources and L links, link l with capacity cl
– Al,s (routing matrix): fraction of traffic of source s on link l
– feasible (rate) allocation: Σs Al,s xs ≤ cl for every link l
– formal definition of a "bottleneck" link (with respect to source s): a saturated link carrying s on which s has the largest rate among the sources using it
• Some facts (theorems):
– A feasible rate allocation is max-min fair if and only if every source has a bottleneck link
– Under this network model, with the routing matrix fixed, there exists a unique max-min fair allocation!
– Fair Queueing implements max-min fairness
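These facts suggest the standard progressive-filling construction, sketched here for the line network example (function and variable names are assumptions, not from the slides):

```python
def max_min_rates(routes, capacities, eps=1e-9):
    """Progressive filling: raise every unfrozen source's rate at the same
    pace; when a link saturates, it is a bottleneck for the sources crossing
    it, and those sources freeze at their current rate.
    routes[s] is the set of link indices used by source s."""
    n, L = len(routes), len(capacities)
    x = [0.0] * n
    frozen = [False] * n
    while not all(frozen):
        # largest equal increment before some link with an unfrozen source saturates
        candidates = []
        for l in range(L):
            unfrozen = [s for s in range(n) if l in routes[s] and not frozen[s]]
            if unfrozen:
                load = sum(x[s] for s in range(n) if l in routes[s])
                candidates.append((capacities[l] - load) / len(unfrozen))
        inc = min(candidates)
        for s in range(n):
            if not frozen[s]:
                x[s] += inc
        # freeze every source crossing a now-saturated link
        for l in range(L):
            load = sum(x[s] for s in range(n) if l in routes[s])
            if load >= capacities[l] - eps:
                for s in range(n):
                    if l in routes[s]:
                        frozen[s] = True
    return x

# user 0 crosses all three links; user i (i = 1, 2, 3) uses only link i-1
routes = [{0, 1, 2}, {0}, {1}, {2}]
print(max_min_rates(routes, [1.0, 2.0, 3.0]))  # → [0.5, 0.5, 1.5, 2.5]
```

Each iteration saturates at least one new link, so the loop terminates after at most L rounds, and every source ends up with a bottleneck link by construction.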
TCP Congestion Control Behavior
• congestion control (TCP runs at the end-hosts):
– decrease sending rate when loss is detected; increase when there is no loss
• routers:
– discard or mark packets when congestion occurs (a congested router drops packets)
• interaction between end systems (TCP) and routers?
– want to understand (quantify) this interaction
Generic TCP CC Behavior: Additive Increase
• window algorithm (window W)
– up to W packets in the network
– return of an ACK allows the sender to send another packet
– cumulative ACKs
• increase window by one per RTT:
W ← W + 1/W per ACK ⇒ W ← W + 1 per RTT
• seeks available network bandwidth
• ignoring the "slow start" phase, during which the window increases by one per ACK:
W ← W + 1 per ACK ⇒ W ← 2W per RTT
[figure: sender/receiver packet timeline]
Generic TCP CC Behavior: Multiplicative Decrease
• window algorithm (window W)
• increase window by one per RTT: W ← W + 1/W per ACK
• loss is an indication of congestion
• decrease window by half on detection of loss (triple duplicate ACKs, "TD"): W ← W/2
[figure: sender/receiver timeline with TD events]
Generic TCP CC Behavior: After Time-Out (TO)
• window algorithm (window W)
• increase window by one per RTT: W ← W + 1/W per ACK
• halve window on detection of loss: W ← W/2
• timeouts due to lack of ACKs → window reduced to one: W ← 1
[figure: sender/receiver timeline with a TO event]
Generic TCP Behavior: Summary
• window algorithm (window W)
• increase window by one per RTT (or one over window per ACK, W ← W + 1/W)
• halve window on detection of loss: W ← W/2
• timeouts due to lack of ACKs: W ← 1
• successive timeout intervals grow exponentially long, up to six times
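The rules above can be caricatured in a few lines; the per-RTT loop and the independent-loss assumption are simplifications, and the timeout path (W ← 1 with exponential backoff) is omitted:

```python
import random

def aimd_trace(rtts=300, loss_prob=0.005, seed=7):
    """Per-RTT caricature of TCP congestion avoidance: W <- W + 1 after a
    lossless RTT, W <- W/2 on a triple-duplicate-ACK loss."""
    random.seed(seed)
    W, trace = 1.0, []
    for _ in range(rtts):
        # each of the ~W packets sent this RTT is lost independently
        lost = any(random.random() < loss_prob for _ in range(int(W)))
        W = max(W / 2, 1.0) if lost else W + 1
        trace.append(W)
    return trace

trace = aimd_trace()
print(max(trace), trace[-1])
```

Plotting `trace` reproduces the familiar sawtooth that the following slides analyze.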
Understanding TCP Behavior
• can simulate (ns-2)
+ faithful to the operation of TCP
− expensive, time consuming
• deterministic approximations
+ quick
− ignore some TCP details; steady state only
• fluid models
+ capture transient behavior
− ignore some TCP details
TCP Throughput/Loss Relationship
Idealized model:
• W is the maximum supportable window size (then loss occurs)
• TCP window starts at W/2, grows to W, then halves, then grows to W, then halves, …
• one window's worth of packets is sent each RTT
• to find: throughput as a function of loss and RTT
[figure: TCP window size sawtooth between W/2 and W over time (in RTTs); loss occurs at W]
TCP Throughput/Loss Relationship (cont'd)
# packets sent per "period"
= W/2 + (W/2 + 1) + … + W
= Σ (n = 0 to W/2) (W/2 + n)
= (W/2 + 1)(W/2) + (1/2)(W/2)(W/2 + 1)
= (3/8)W² + (3/4)W
≈ (3/8)W²
[figure: one sawtooth "period" of the window, from W/2 up to W]
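The closed form can be spot-checked against a direct summation (assuming an even W, as the figure does):

```python
def packets_per_period(W):
    """Exact count for one sawtooth period: W/2 + (W/2 + 1) + ... + W."""
    return sum(W // 2 + n for n in range(W // 2 + 1))

# the exact count matches (3/8)W^2 + (3/4)W for several even windows
for W in (8, 16, 64, 200):
    assert packets_per_period(W) == 3 * W * W / 8 + 3 * W / 4

print(packets_per_period(16))  # → 108
```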
TCP Throughput/Loss Relationship (cont'd)
# packets sent per "period" ≈ (3/8)W²
1 packet lost per "period" implies:
p_loss ≈ 8/(3W²), or W = sqrt(8/(3·p_loss))
avg. throughput = B = (3/4)W packets per rtt
= (1.22/sqrt(p_loss)) packets per rtt
The B throughput formula can be extended to model timeouts and slow start [PFTK'98].
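As a quick numeric sketch of the formula (the constant 1.22 is sqrt(3/2), obtained by substituting W = sqrt(8/(3·p_loss)) into B = (3/4)W per rtt):

```python
from math import sqrt

def tcp_throughput(p_loss, rtt):
    """Simplified square-root formula: B = sqrt(3/2)/(rtt*sqrt(p_loss)),
    in packets per second when rtt is in seconds."""
    return sqrt(1.5) / (rtt * sqrt(p_loss))

# e.g., 1% loss and a 100 ms RTT:
print(round(tcp_throughput(0.01, 0.100)))  # → 122 packets/sec
```

Note the inverse dependence on RTT: at the same loss rate, a connection with twice the RTT gets half the throughput.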
Drawbacks of FIFO with Tail-drop
• Sometimes too late a signal to end system about network congestion – in particular, when RTT is large
• Buffer lock out by misbehaving flows
• Synchronizing effect for multiple TCP flows
• Burst or multiple consecutive packet drops– Bad for TCP fast recovery
FIFO Router with Two TCP Sessions
Active Queue Management
• Dropping/marking packets depends on the average queue length x → p = p(x)
• Advantages:
– signals end systems earlier
– absorbs bursts better
– avoids synchronization
• Examples:
– RED
– REM
– …
[figure: marking probability p vs. average queue length x — zero below tmin, rising linearly to pmax at tmax, then up to 1 at 2tmax]
RED: Parameters
• min_th – minimum threshold
• max_th – maximum threshold
• avg_len – average queue length
– avg_len = (1−w)·avg_len + w·sample_len
[figure: discard probability (0 to 1) vs. average queue length, with thresholds min_th and max_th and the full queue at queue_len]
RED: Packet Dropping
• If (avg_len < min_th) → enqueue packet
• If (avg_len > max_th) → drop packet
• If (min_th ≤ avg_len < max_th) → drop packet with probability P
[figure: discard probability P vs. average queue length]
RED: Packet Dropping (cont'd)
• P = max_P·(avg_len − min_th)/(max_th − min_th)
[figure: discard probability rising linearly from 0 at min_th to max_P at max_th; P is the value at the current avg_len]
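Combining the EWMA update with the three cases gives a compact sketch of RED's drop decision; the count-based spacing of successive drops and the "gentle" region above max_th found in real RED implementations are omitted here:

```python
import random

class RedQueue:
    """Minimal sketch of RED's per-arrival drop decision."""
    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, w=0.002):
        self.min_th, self.max_th, self.max_p, self.w = min_th, max_th, max_p, w
        self.avg_len = 0.0

    def should_drop(self, queue_len):
        # EWMA: avg_len = (1-w)*avg_len + w*sample_len
        self.avg_len = (1 - self.w) * self.avg_len + self.w * queue_len
        if self.avg_len < self.min_th:
            return False                                   # enqueue
        if self.avg_len >= self.max_th:
            return True                                    # drop
        p = self.max_p * (self.avg_len - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p                         # drop w.p. P

red = RedQueue(w=1.0)    # w = 1 makes avg_len track the instantaneous queue
print(red.should_drop(2), red.should_drop(20))  # → False True
```

The small default weight w keeps avg_len smooth, so short bursts pass through while persistent congestion raises the drop probability.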
RED Router with Two TCP Sessions
Dynamic (Transient) Analysis of TCP Fluids
• model TCP traffic as a fluid
• describe the behavior of flows and queues using Ordinary Differential Equations (ODEs)
• solve the resulting ODEs numerically

Loss Model
[figure: Sender → AQM router (throughput B(t), drop/mark probability p(t)) → Receiver]
Loss rate as seen by the sender: λ(t) = B(t−τ)·p(t−τ), where τ is the round trip delay
A Single Congested Router
• focus on a single bottlenecked AQM router
– capacity C (packets/sec)
– queue length q(t)
– discard probability p(t)
• N TCP flows through the router
– window sizes Wi(t)
– round trip times Ri(t) = Ai + q(t)/C
– throughputs Bi(t) = Wi(t)/Ri(t)
Adding RED to the Model
RED: marking/dropping based on the average queue length x(t)
x(t): a smoothed, time-averaged version of q(t)
[figures: marking probability p vs. average queue length x (thresholds tmin, tmax, pmax, 2tmax); q(t) and x(t) vs. time]
TCP Window Dynamic Model
TCP class – TCP flows sharing the same route
average window size of a TCP class: additive increase minus multiplicative decrease, driven by the loss arrival rate; in standard fluid-model form,
dWk/dt = (1 − pk(t))/Rk(t) − (Wk(t)/2)·λk(t), with λk(t) = (Wk(t)/Rk(t))·pk(t)

Link Model: RED
the average queue length x(t) (with averaging parameter w) determines the packet loss/mark probability p = p(x)
Traffic Propagation Model
track each TCP class's arrival and departure rates at each queue:
– the arrival rate of TCP class i at the kth queue
– the departure rate of TCP class i from the kth queue
(a class's departure rate from one queue becomes its arrival rate at the next queue on its route)
Putting it Together
[diagram: TCP window dynamics → throughput → queue → queueing delay; queue → AQM averaging → loss probability → back to TCP]
Coupled differential equations, solved numerically
A Queue is not a Network
Network – a set of AQM routers V; session i traverses a sequence of routers Vi
Round trip time – aggregate delay:
Ri(t) = Ai + Σ (v ∈ Vi) qv(t)/Cv
Link bandwidth constraints and queue equations: as for the single router, per link
Loss/marking probability – cumulative probability:
pi(t) = 1 − Π (v ∈ Vi) (1 − pv(qv(t)))
Steady State Behavior
• let t → ∞
• this yields dW/dt → 0, p(t) → pk, W(t) → Wk, R(t) → Rk
• 0 = (1 − pk) − (Wk²·pk)/2, or Wk = sqrt(2(1 − pk)/pk)
• the throughput is
Bk = Wk/Rk = (1/Rk)·sqrt(2(1 − pk)/pk) ≈ (1/Rk)·sqrt(2/pk) for small pk
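The limit can be checked numerically by Euler-integrating the single-class window ODE to its fixed point (R is held constant at 1 here for simplicity):

```python
from math import sqrt

def steady_state_window(p, R=1.0, dt=0.01, T=200.0):
    """Integrate dW/dt = (1-p)/R - (W/2)(W p/R) until it settles."""
    W = 1.0
    for _ in range(int(T / dt)):
        W += dt * ((1 - p) / R - 0.5 * W * W * p / R)
    return W

p = 0.02
print(steady_state_window(p), sqrt(2 * (1 - p) / p))  # both ≈ 9.90
```

This is the same square-root law as the sawtooth analysis earlier, up to the constant: sqrt(2) here vs. sqrt(3/2) there, reflecting the different averaging assumptions.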
How well does it work?
• topology with OC-12 and OC-48 links
• RED with target delay 5 msec
• 2600 TCP flows
• decrease to 1300 flows at t = 30 sec
• increase back to 2600 flows at t = 90 sec
[figure: instantaneous delay vs. time — simulation and fluid model]
Good queue length match
matches average window size
[figure: average window size vs. time (sec) — simulation and fluid model]
Numerical Solution: time-stepped simulation
• solve the ODEs using:
– Matlab: low efficiency, poor flexibility
– C program: fixed step-size Runge-Kutta method
• time-stepped simulation:
– update the windows of all TCP classes
– calculate departure/arrival rates at each queue
– update queue lengths and packet loss/mark probabilities
• computation cost depends on:
– step size
– number of TCP classes
– number of links
Other Model Enhancements
• adjustments for TCP implementations:
– Reno, NewReno, SACK
– window backoff size vs. average window size
• different AQMs:
– PI controller
– AVQ
– REM
• adjustments for RED implementations:
– geometric and uniform dropping
– "wait" option
• load variations: {class 1, 2} → {class 1} → {class 1, 2, 3}
Accuracy: Transient Behavior
compare with packet-level simulation: the network simulator (ns)
[topology: sources S1–S3 and destinations D1–D3 for three TCP classes, sharing bottlenecks B1 and B2]
[figures: queue length and window size vs. time — fluid model vs. ns (individual flows and ns average)]
Scalability: Link Bandwidth & Flow Population
8 links, 3 TCP classes (class 1, class 2, class 3)
scale bandwidth and flow population with k = 1, 10, 50, 100:
– link bandwidths: 10M·k and 100M·k
– flow population of each class: 40·k
Scalability: Link Bandwidth & Flow Population (cont.)

Scale k   | 1         | 10          | 50            | 100
NS        | 12.5 sec  | 2 min 2 sec | 16 min 23 sec | 27 min 56 sec
FFM       | 0.766 sec | 0.766 sec   | 0.766 sec     | 0.766 sec
Speed-up  | 16.3      | 159.3       | 1,283         | 2,188

[figures: rate vs. time for k = 1, 10, 100]
Issues with RED
• Parameter sensitivity
– how to set min_th, max_th, and max_p
– Goal: maintain the avg. queue size below the midpoint between min_th and max_th
• max_th needs to be significantly smaller than the maximum queue size, to absorb transient peaks
• max_p determines the drop rate
– In reality, these parameters are hard to set
• RED uses the avg. queue length, which may introduce a large feedback delay and lead to instability
Other AQM Mechanisms
• Adaptive RED (ARED)
• BLUE
• Virtual Queue
• Random Exponential Marking (REM)
• Proportional Integral (PI) Controller
• Adaptive Virtual Queue (AVQ)
– Improved AQMs are designed based on control theory, to provide better and faster response to congestion and more stable systems
Explicit Congestion Notification (ECN)
• Standard TCP:
– losses are needed to detect congestion
– wasteful and unnecessary
• ECN (RFC 2481):
– routers mark packets instead of dropping them
– the receiver returns the marks to the sender in ACK packets
– the sender adjusts its window accordingly
• Two bits in the IP header:
– ECT: ECN-capable transport (set to 1)
– CE: congestion experienced (set to 1)
TCP congestion control performs poorly as bandwidth or delay increases
Shown analytically in [Low01] and via simulations:
• 50 flows in both directions; buffer = BW × delay
• bandwidth sweep with RTT = 80 ms; delay sweep with BW = 155 Mb/s
[figures: utilization vs. bottleneck bandwidth (Mb/s) and vs. round trip delay (sec)]
Because TCP lacks fast response:
• spare bandwidth is available ⇒ TCP increases by 1 pkt/RTT even if the spare bandwidth is huge
• when a TCP starts, it increases exponentially ⇒ too many drops ⇒ flows ramp up by 1 pkt/RTT, taking forever to grab the large bandwidth
Solution: Decouple Congestion Control from Fairness
• Goals: high utilization; small queues; few drops; a flexible bandwidth allocation policy
• In TCP the two are coupled because a single mechanism, Additive-Increase Multiplicative-Decrease (AIMD), controls both
• XCP: eXplicit congestion Control Protocol

Why Decoupling?
How does decoupling solve the problem?
1. To control congestion: use MIMD, which gives fast response
2. To control fairness: use AIMD, which converges to fairness
Characteristics of XCP Solution
1. Improved congestion control (in high bandwidth-delay as well as conventional environments):
• small queues
• almost no drops
2. Improved fairness
3. Scalable (no per-flow state)
4. Flexible bandwidth allocation: max-min fairness, proportional fairness, differential bandwidth allocation, …
XCP: An eXplicit Control Protocol
1. Congestion Controller
2. Fairness Controller

How does XCP Work?
• each packet carries a congestion header: Round Trip Time, Congestion Window, and a Feedback field filled in by the sender (e.g., Feedback = +0.1 packet)
• routers along the path update the Feedback field (e.g., to Feedback = −0.3 packet)
• the feedback is echoed back, and the sender sets: Congestion Window = Congestion Window + Feedback
• routers compute the feedback without any per-flow state
• XCP uses ECN and a "core stateless" mechanism (i.e., state is carried in the packet header)
How Does an XCP Router Compute the Feedback?
• Congestion Controller (MIMD)
– Goal: match input traffic to link capacity and drain the queue
– Looks at aggregate traffic and the queue
– Algorithm: change aggregate traffic by ∆, with ∆ ~ spare bandwidth and ∆ ~ −queue size:
∆ = α·davg·Spare − β·Queue
(davg: average RTT; Spare: spare bandwidth; Queue: persistent queue size)
• Fairness Controller (AIMD)
– Goal: divide ∆ between flows so as to converge to fairness
– Looks at a flow's state in the Congestion Header
– Algorithm:
If ∆ > 0 ⇒ divide ∆ equally between flows
If ∆ < 0 ⇒ divide ∆ between flows proportionally to their current rates
Theorem: the system converges to optimal utilization (i.e., is stable) for any link bandwidth, delay, and number of sources if:
0 < α < π/(4·sqrt(2)) and β = α²·sqrt(2)
(Proof based on the Nyquist criterion.) No parameter tuning.

Getting the devil out of the details …
The Fairness Controller's algorithm (divide ∆ equally when ∆ > 0, proportionally to current rates when ∆ < 0) needs to estimate the number of flows N, still with no per-flow state:
N = Σ (pkts in T) 1/(T·(Cwndpkt/RTTpkt))
RTTpkt: Round Trip Time in the header; Cwndpkt: Congestion Window in the header; T: counting interval
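The estimator can be sanity-checked numerically: a flow with window cwnd and round trip time rtt contributes about cwnd/rtt × T packets during an interval of length T, so the per-packet weights sum back to the flow count (the packet values below are illustrative):

```python
def estimate_flows(packets, T):
    """XCP router's flow-count estimate over an interval of length T.
    Each packet carries its sender's (rtt, cwnd); weighting a packet by
    1/(T * cwnd/rtt) makes every flow contribute about 1 to the sum."""
    return sum(1.0 / (T * cwnd / rtt) for rtt, cwnd in packets)

# synthetic traffic: three flows, each sending cwnd/rtt * T packets in T = 1 s
T = 1.0
packets = []
for rtt, cwnd in [(0.1, 10.0), (0.2, 10.0), (0.05, 20.0)]:
    n_pkts = int(cwnd / rtt * T)        # 100, 50, 400 packets respectively
    packets += [(rtt, cwnd)] * n_pkts

print(estimate_flows(packets, T))  # → ≈ 3.0
```

The router only keeps the running sum, so the estimate costs O(1) state regardless of how many flows traverse the link.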