Sigcomm IMW - 2001; A. Shaikh, A. Greenberg; Nov 01
UCSC
Experience in Black-box OSPF Measurement
Aman Shaikh, UCSC
Albert Greenberg, AT&T Labs-Research
Sigcomm IMW – November 2001
Why Measure OSPF?
• OSPF behavior in large ISPs not well understood, yet
  – any meaningful performance assurance depends on routing stability
  – an internal network change (OSPF event) can have major impact on flows and customers, during which
    • intra-domain routing reconverges
    • inter-domain routing reconverges (BGP uses OSPF metrics)
• Internal OSPF processing delays matter!
  – message processing, routing calculation, table update
  – add up to impact convergence, instabilities
• OSPF measurements also needed for
  – guidance in tuning configurable parameters
  – head-to-head vendor comparisons
How to Measure OSPF?
• Problem: Instrumenting routing code for measuring delays is challenging
  – commercial implementations are proprietary
  – may involve grappling with numerous code versions, hardware platforms, and developers
• Solution: black-box measurements
  – measure the timing delays using external observations
• Contribution: black-box measurements for internal OSPF delays
  – applied to Cisco and GateD OSPF implementations
• Key prior work:
  – IS-IS measurements by Packet Design [draft-alaettinoglu-ISIS-convergence-00]
Black-box Techniques are Effective
• Works across a wide range of timing delays
  – 100 µsec for packet processing
  – 10s of msec for routing calculation
• Works even for purely CPU-bound tasks
  – packet processing subtasks, Dijkstra's shortest path calculation
• Captures scaling
  – O(n²) time for shortest path calculation, for full n × n mesh topologies
OSPF Background
• Link-state routing protocol
  – all routers in the domain come to a consistent view of the topology by exchange of Link State Advertisements (LSAs)
    • set of LSAs (self-originated + received) at a router = topology
• SPF Calculation
  – each router calculates a single-source shortest path tree
• Forwarding Information Base (FIB)
  – each router uses the tree to build its FIB, which governs packet forwarding
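The SPF and FIB steps above can be sketched as follows. This is a minimal illustration, not the routine used by any router discussed in the talk: it runs a textbook Dijkstra over a hypothetical four-router topology and derives a next-hop table from the resulting tree. The function names (`shortest_path_tree`, `build_fib`) and the topology are assumptions made for the example.

```python
import heapq


def shortest_path_tree(graph, source):
    """Dijkstra's algorithm: return (dist, parent) for the tree rooted at source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, weight in graph[u].items():
            candidate = d + weight
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, parent


def build_fib(parent, source):
    """Map each destination to the first hop on its tree path from source."""
    fib = {}
    for dest in parent:
        if dest == source:
            continue
        hop = dest
        while parent[hop] != source:
            hop = parent[hop]
        fib[dest] = hop
    return fib


# Hypothetical topology: router -> {neighbor: OSPF link weight}
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

dist, parent = shortest_path_tree(topology, "A")
fib = build_fib(parent, "A")
print(dist)  # distances from A
print(fib)   # next hop from A toward each destination
```

In this example every destination is reached through B, because the path A-B-C-D (cost 4) is cheaper than the direct A-C link followed by C-D (cost 5).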
Link-state Advertisement (LSA)
• LSA propagation: each router
  – describes local connectivity in an LSA
  – floods LSA to other routers in the domain
  – acknowledges LSA in an LS Ack packet
• Duplicate LSAs: each router
  – can receive multiple copies of a given LSA
    • first copy received is termed “new”
    • copies received later are termed “duplicate”
• Duplicate LSAs MUST be acknowledged immediately (RFC 2328)
  – allows us to build a timestamp
Router Model
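[Figure: router model. Components shown: Route Processor (CPU) running the OSPF process, with its topology view and SPF calculation; interface cards, each with a FIB and forwarding; switching fabric. Data packets pass between the interface cards; LSAs and LS Acks pass to and from the OSPF process.]
Tasks shown:
• Forwarding
• LSA Processing
• LSA Flooding
• SPF Calculation
• FIB Update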
Methodology
[Figure: testbed. TopTracker sends LSAs describing an emulated topology to the target router.]
• Load emulated topology on target router
• Initiate task of interest
• Measure the time for task
Measuring Task Time
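[Figure: timeline showing, in order, the top bracket event, task start time, task finish time, and bottom bracket event, with intervals A, B, C and X marked]
1. Use a black-box method to bracket task start and finish times
2. Subtract out the intervals that fall inside the bracket but outside the task
X = A - (B + C)
where A is the interval between the two bracket events, X is the task time, and B and C are the intervals between the top bracket event and task start, and between task finish and the bottom bracket event.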
Methodology for SPF Calculation
[Figure: timeline between TopTracker and the target router, with intervals A, X, B, C, D and E marked]
TopTracker:
1. Load desired topology
2. Send initiator LSA
3. Send duplicate LSA
4. Ack for duplicate LSA arrives
Target router:
1. Initiator LSA arrives
2. SPF calculation starts
3. SPF calculation ends
4. Send ack for duplicate LSA
• X = A - (B + C + D + E)
• Estimate the overhead = B + C + D + E
Estimating the Overhead
• Remove SPF calculation from bracket
  – spf_delay = 60 seconds
[Figure: timeline between TopTracker and the target router, with intervals B, C, D and E making up the overhead]
TopTracker:
1. Send initiator LSA
2. Send duplicate LSA
3. Ack for duplicate LSA arrives
Target router:
1. Initiator LSA arrives
2. Initiator LSA processing done
3. Duplicate LSA arrives
4. Duplicate LSA processing done; send ack
5. SPF calculation starts
• overhead = B + C + D + E
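The subtraction X = A - (B + C + D + E) can be illustrated with a short sketch. It assumes two sets of runs: one where the bracket includes the SPF calculation (giving A) and one where spf_delay pushes the SPF calculation out of the bracket (giving the overhead alone). Subtracting the means of the two sets is an assumption of this sketch; the sample values and the name `estimate_spf_time` are hypothetical, not measurements from the talk.

```python
from statistics import mean


def estimate_spf_time(bracket_samples, overhead_samples):
    """Estimate X as mean(A) - mean(B + C + D + E), all in seconds."""
    if not bracket_samples or not overhead_samples:
        raise ValueError("need at least one sample of each kind")
    return mean(bracket_samples) - mean(overhead_samples)


# Hypothetical intervals in seconds, measured at the sender:
# time from sending the initiator LSA to receiving the ack for the duplicate LSA.
with_spf_in_bracket = [0.0152, 0.0148, 0.0150]    # A
with_spf_delayed_out = [0.0031, 0.0029, 0.0030]   # B + C + D + E only

spf_time = estimate_spf_time(with_spf_in_bracket, with_spf_delayed_out)
print(f"estimated SPF time: {spf_time * 1000:.1f} ms")
```

Because the overhead is estimated in separate runs, the result is an estimate of the mean task time rather than of any single run.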
Results
• Results for Cisco GSR, 7513 and GateD
  – for GateD, comparison of black-box results with those obtained using instrumentation (white-box)
  – route processors
    • Cisco: 200 MHz R5000 processor
    • GateD: 500 MHz AMD-K6 processor
• Topology used is a full n × n mesh with random OSPF edge weights
  – vary n in the range 10, 20, …, 100
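A full-mesh topology of this kind could be generated along the following lines. This is a sketch only: the weight range (1 to 100), the use of symmetric weights, and the function name `full_mesh` are assumptions, since the slides say only that the edge weights are random.

```python
import itertools
import random


def full_mesh(n, min_weight=1, max_weight=100, seed=None):
    """Return {node: {neighbor: weight}} for a full mesh on n nodes.

    Weights are drawn uniformly and applied symmetrically (an assumption).
    """
    rng = random.Random(seed)
    graph = {node: {} for node in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        weight = rng.randint(min_weight, max_weight)
        graph[u][v] = weight
        graph[v][u] = weight
    return graph


# Sweep n = 10, 20, ..., 100 as in the experiments.
for n in range(10, 101, 10):
    mesh = full_mesh(n, seed=n)
    links = sum(len(neighbors) for neighbors in mesh.values()) // 2
    print(n, links)  # link count is n * (n - 1) / 2
```

The link count grows as n(n - 1)/2, so an SPF run that examines every link does work proportional to n² on these topologies.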
Results for Cisco Routers
• Similar results for two models
• SPF calculation time is O(n²)
[Figure: Mean SPF time (Cisco GSR). x-axis: number of nodes (n), 0 to 100; y-axis: time (seconds), 0 to 0.03]
[Figure: Mean SPF time (Cisco 7513). x-axis: number of nodes (n), 0 to 100; y-axis: time (seconds), 0 to 0.03]
Results for GateD
• Black-box over-estimates white-box measurement
• Black-box captures the characteristics very well
[Figure: Mean SPF time (GateD), with black-box and white-box curves. x-axis: number of nodes (n), 0 to 100; y-axis: time (seconds), 0 to 0.018]
OSPF Task Delays (Cisco)
• LSA processing
  – 100-800 microseconds
• LSA flooding
  – 30-40 milliseconds
  – pacing timer is the determining factor
• SPF calculation
  – 1-40 milliseconds
  – O(n²) behavior for full n × n mesh
• FIB update time
  – 100-300 milliseconds
  – no dependence on the size of the topology
Toolkit
• Use of topology emulator
  – loads topologies
  – generates specific patterns of LSAs
• Use of protocol dynamics mandated by standards
  – duplicate LSA mechanism: OSPF is required to ack a duplicate LSA immediately
  – useful for estimating end-point of tasks like SPF calculation
• Use of vendor-specific parameters:
  – spf_delay
  – spf_holdtime
  – pacing timer
Conclusions
• Black-box methods for estimating OSPF processing delays:
  – LSA processing and flooding
  – SPF calculation and FIB update
• Applied techniques to Cisco GSR and 7513 routers as well as GateD
• Black-box methods worked
• Future work
  – develop techniques for other protocols, in particular BGP
Backup
OSPF Overview : Example
[Figure: example OSPF domain (single area) with routers A through J and weighted links, shown next to the shortest-path tree (SPT) at G]
LSA Processing
[Figure: flow chart of LSA processing]
1. Receive an LSA
2. New or duplicate?
  – new: update topology view; schedule SPF calculation if required; send LS Ack packet back; flood the LSA out
  – duplicate: acknowledge LSA immediately
3. LSA processing over
SPF calculation is paced by a hold-down timer (spf_delay)
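One reading of the LSA processing flow chart can be sketched in code. This is a simplified illustration, not an OSPF implementation: an LSA is treated as a duplicate only when its sequence number matches the stored copy, every new LSA schedules an SPF calculation, and the class and action names are hypothetical. RFC 2328 also compares checksum and age and treats older instances separately, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LSA:
    adv_router: str  # advertising router ID
    seq: int         # sequence number


@dataclass
class Router:
    lsdb: dict = field(default_factory=dict)  # adv_router -> stored seq
    spf_scheduled: bool = False

    def receive(self, lsa):
        """Process one LSA and return the actions taken, in order."""
        if self.lsdb.get(lsa.adv_router) == lsa.seq:
            # Duplicate copy: acknowledge right away, nothing else changes.
            return ["ack_immediately"]
        # New copy: update the topology view and schedule SPF
        # (simplification: always scheduled, not only "if required").
        self.lsdb[lsa.adv_router] = lsa.seq
        self.spf_scheduled = True
        return ["update_topology", "schedule_spf", "send_ls_ack", "flood"]


router = Router()
first = router.receive(LSA("10.0.0.1", 1))   # first copy is "new"
second = router.receive(LSA("10.0.0.1", 1))  # later copy is "duplicate"
print(first)
print(second)
```

The immediate acknowledgment on the duplicate branch is what the black-box method relies on as an externally visible timestamp.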
SPF Calculation
[Figure: flow chart of SPF calculation]
1. LSA processing over
2. SPF calculation starts
3. SPF calculation ends
4. FIB is updated
Internal OSPF Tasks to Measure
• Processing Link State Advertisements (LSAs)
• Flooding LSAs
• Performing SPF calculation
  – described in this talk
• Updating the Forwarding Information Base (FIB)