dynamics of hot-potato routing in ip networks jennifer rexford at&t labs—research jrex joint...

40
Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research http://www.research.att.com /~jrex Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin

Post on 22-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

Dynamics of Hot-Potato Routing in IP Networks

Jennifer RexfordAT&T Labs—Research

http://www.research.att.com/~jrex

Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin

Page 2: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

2

Network “Operations” Research

Understand key operations tasks Traffic engineering Planned maintenance Problem troubleshooting

Develop sound models and methods Problem the operator is trying to solve Underlying protocols and mechanisms Control knobs the operator to tune

Incorporate network measurements Characterize system behavior Detect and diagnose problems Input into “what if” models

Page 3: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

3

Outline

Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing

Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams

Performance evaluation Characterization on AT&T’s network Implications on operational practices

Conclusions and future directions

Page 4: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

4

Autonomous Systems

1

2

3

4

5

67

ClientWeb server

AS path: 6, 5, 4, 3, 2, 1

Multiple links Middle of path

Page 5: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

5

Interdomain Routing (BGP)

Border Gateway Protocol (BGP) IP prefix: block of destination IP addresses AS path: sequence of ASes along the path

Policy configuration by the operator Path selection: which of the paths to use? Path export: which neighbors to tell?

1 2 3

12.34.158.5

“I can reach 12.34.158.0/24”

“I can reach 12.34.158.0/24 via AS 1”

Page 6: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

6

Intradomain Routing (IGP) Interior Gateway Protocol (OSPF and IS-IS)

Shortest path routing based on link weights Routers flood link-state information to each other Routers compute “next hop” to reach other routers

Link weights configured by the operator Simple heuristics: link capacity or physical distance Traffic engineering: tuning link weights to the traffic

32

2

1

13

1

4

5

3Path cost: 2+1+5

Page 7: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

7

Two-Level Internet Routing

Hierarchical routing Intra-domain

Metric based Inter-domain

Reachability and policy

Design principles Scalability Isolation Simplicity of reasoning

AS 1

AS 2

AS 3

intra-domainrouting (IGP)

inter-domainrouting (BGP)

Autonomous system (AS) = network with unified administrative routing policy (ex. AT&T, Sprint, UCSD)

Page 8: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

8

packet to dst

Coupling: Hot-Potato Routing

dst

XISP network

Y

10

911

Z

Hot-potato routing = ISPs policy of routing to closest exit point when there is more than one route to destination

Consequences:Routers CPU overloadTransient forwarding instabilityTraffic shiftInter-domain routing changes

ISP networkfailureplanned maintenancetraffic engineering

Routes to thousandsof destinations switch exit point!!!

Page 9: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

9

BGP Decision Process

Ignore if exit point unreachable Highest local preference Lowest AS path length Lowest origin type Lowest MED (with same next hop AS) Lowest IGP cost to next hop Lowest router ID of BGP speaker

Hot potato

Page 10: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

10

Hot-Potato Routing Model

Cost vector for Z: cX=10, cW=8, and cY=9

Egress set for dst: {X, Y} Best route for Z: through Y, which is closest

X

Y

10

99

Z

dst

8

W

Hot-potato change: change in cost vector causes change in best route

Page 11: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

11

The Big Picture

Interdomainchanges

Cost vector changes

Hot-potato routing changes(interdomain changes caused by intradomain changes)

Page 12: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

12

Why Care about Hot Potatoes?

Understanding of Internet routing Frequency of hot-potato routing changes Influence on end-to-end performance

Operational practices Knowing when hot-potato changes happen Avoiding unnecessary hot-potato changes Analyzing externally-caused BGP updates

Distributed root cause analysis Each AS can tell what BGP updates it caused Someone should know why each change happened

Page 13: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

13

Outline

Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing

Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams

Performance evaluation Characterization on AT&T’s network Implications on network practices

Conclusions and future directions

Page 14: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

14

Why is This So Darn Hard?

Noisy signals Multiple IGP messages per event Multiple BGP messages per change Large background of BGP updates

Routing protocols Protocols don’t divulge their reasons Highly-configurable BGP policies Vendor-specific details (e.g., timers)

Monitoring limitations Limited number of vantage points Delivering the measurement data Time synchronization across collectors

Page 15: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

15

Our Approach

Measure both protocols BGP and OSPF monitors

Correlate the two streams Match BGP updates with OSPF events

Analyze the interaction

X

Y

Z

M

AT&T backboneOSPF messagesBGP updates

Page 16: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

16

Heuristic for Matching

Classify BGP updates by possible OSPF causes

Transform stream of OSPFmessages into routing changes

link failurerefresh weight change

chg cost

del

chg cost

Match BGP updateswith OSPF events thathappen close in time

Stream of OSPF messages

Stream of BGP updates

time

Page 17: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

17

Pre-processing OSPF LSAs

Transform OSPF messages into routing changes from a router’s perspective

MX

Y

Z

1

1

12 1

22 10LSA weight change, 10

10LSA weight change, 10

X 5Y 4

CHG Y, 7X 5Y 7

LSA deleteLSA add, 1

DEL X

Y 7

ADD X, 5

X 5 Y 7

OSPF routing changes:

2

1

Page 18: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

18

Classifying BGP Updates

BGP update from Z

Announcement of dst, X Withdrawal of dst, Y

Replacement of routeto dst

different route through Y

See next slide!

Y X

X

Y

Z

dst

M

= “Can’t be caused by a cost change”

Page 19: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

19

Classifying BGP Updates

route via X is betterroute via X is worse

routes are equally good

CHG X, CHG Y?

Y X

X

Y

Z

dst

M

Page 20: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

20

The Role of Time

IGP link-state advertisements Multiple LSAs from a single physical event Group into single cost vector change

BGP update messages Multiple BGP updates during convergence Group into single BGP routing change

Matching IGP to BGP Avoid matching unrelated IGP and BGP changes Match related changes that are close in time

Characterize the measurement data to determine the right windows

10 sec

70 sec

180 sec

Page 21: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

21

Outline

Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing

Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams

Performance evaluation Characterization on AT&T’s network Implications on network practices

Conclusions and future directions

Page 22: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

22

Summary Results (June 2003)

High variability in % of BGP updates

One LSA can have a big impact

location min max days > 10%

close to peers 0% 3.76% 0

between peers 0% 25.87% 5

location no impact prefixes impacted

close to peers 97.53% less than 1%

between peers 97.17% 55%

Page 23: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

23

Delay for BGP Routing Change

Router vendor scan timer BGP table scan every 60 seconds OSPF changes arrive uniformly in interval

Internal BGP hierarchy (route reflectors) Wait for another router to change best route Introduces a wait for a second BGP scan

Transmitting many BGP messages Latency for transferring the data

We have seen delays of up to 180 seconds!

Page 24: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

24

Delay for BGP Change (1st Prefix)

Cisco BGP scan timer

Two BGP scans

Page 25: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

25

iBGP Route Reflectors

20

10

8

9

X

Y

Z

Wdst

dst Y, 18dst W, 20

11

dst Y, 21dst W, 20

Announcement X dst X,19dst W, 20

Scalability trade-off: Less BGP state

vs. Number of BGP updates from Z and longer convergence delay

Page 26: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

26

Transferring Multiple Prefixes

Page 27: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

27

BGP Updates Over PrefixesC

umul

ativ

e %

BG

P u

pdat

es

% prefixes

Non-OSPF triggeredAll

OSPF-triggered

OSPF-triggered BGP updatesaffects ~50% of prefixesuniformly

prefixes with onlyone exit point

Contrast with non-OSPFtriggered BGP updates

Page 28: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

28

Operational Implications

Forwarding plane convergence Accuracy of active measurements

Router proximity to exit points Likelihood of hot-potato routing changes

Cost in/out of links during maintenance Avoid triggering BGP routing changes

Page 29: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

29

Forwarding Convergence

R1R2

dst

10

100 10111

R2 starts using R1 to reach dstScan process

runs in R2

R1’s scan processcan take up to

60 seconds to run

Packets to dst may be caught in a loop

for 60 seconds!

Page 30: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

30

Measurement Accuracy

Measurements of customer experience Probe machines have just one exit point!

R1R2

dst

10

100 111

loop to reach dst

W1 W2

Page 31: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

31

Avoid Equal-distance Exits

Z

10

10X

Y

Z

1000

1X

Y

dst dst

Small changes will make Z switch exit points to dst

More robust to intra-domainrouting changes

Page 32: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

32

Careful Cost in/out Links

Z

X

Y

55

1010

10

dst

4

100

Traffic is more predictableFaster convergenceLess impact on neighbors

Page 33: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

33

Ongoing Work

Black-box testing of the routers Scan timer and its effects (forwarding loops) Vendor interactions (with Cisco)

Impact of the IGP-triggered BGP updates Changes in the flow of traffic Forwarding loops during convergence Externally visible BGP routing changes

Improving isolation (cooling those potatoes!) Operational practices: preventing interactions Protocol design: weaken the IGP/BGP coupling Network design: internal topology/architecture

Page 34: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

34

Thanks!

Any questions?

Page 35: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

35

Exporting Routing Instability

Z

X

Y

Z

X

Y

dst

dst

Announcement

No change => no announcement

Page 36: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

36

What to do?

Increase estimate for forwarding convergence For destinations/customers with multiple exit points

Extensions to measurement infrastructure Multiple access links for a probe machine Multiple probe machines with same address

Better BGP implementation on the router Decrease scan timer (maybe too much overhead?) Event-driven IGP/BGP implementation

Page 37: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

37

Time Lag

OSPF-triggered BGP updatesfor June 25th, 2003

Cum

ulat

ive

%B

GP

upd

ates

time BGP – time OSPF (seconds)

~15% of OSPF-triggeredBGP updates in a day

most OSPF-triggered BGP updates lagfrom 10 to 60 seconds

Page 38: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

38

Challenges

Lack of information on routing messages Routing protocols are designed to determine a path

between two hosts, but not to give reason

X

Y

Z

M

dst

Example 1: BGP update caused by OSPF

BGP: announcement: dst, XOSPF: CHG: X, 8

9

10

dst, Ydst, X

8

Page 39: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

39

M

Challenges

Lack of information on routing messages Routing protocols are designed to determine a path

between two hosts, but not to give reason

X

Y

Z

dst

Example 2: BGP update NOT caused by OSPF

9

10

dst, Ydst, X

BGP: announcement: dst, X

Page 40: Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research jrex Joint work with Renata Teixeira, Aman

40

M

Challenges

Lack of information on routing messages Routing protocols are designed to determine a path

between two hosts, but not to give reason

X

Y

Z

dst

Example 2: BGP update NOT caused by OSPF

9

10

dst, Ydst, X

8

BGP: announcement: dst, XOSPF: CHG: X, 8