Global Internet & BGP

Upload: gray-gould

Post on 04-Jan-2016



TRANSCRIPT

Page 1: Global Internet & BGP

Global Internet & BGP

Page 2: Global Internet & BGP

Acknowledgements (texts & figs)

• Kurose

• Govindan

• Zahid

• Peterson & Davie

• Kevin

Page 3: Global Internet & BGP

Hierarchies

• What?
  – Logical structure overlaid on collections of nodes
• Why?
  – Together with information abstraction, the only known solution to scaling issues

Page 4: Global Internet & BGP

Routing Hierarchies

• Flat routing doesn’t scale
  – Each node cannot be expected to have routes to every destination (or destination network)
• Key observation
  – Need less information with increasing distance to destination
• Two radically different approaches for routing
  – The area hierarchy
  – The landmark hierarchy

Page 5: Global Internet & BGP

Inter-AS routing

Page 6: Global Internet & BGP

Autonomous systems

• What is an AS?
  – A set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to route packets within the AS, and using an exterior gateway protocol (EGP) to route packets to other ASes.
  – Sometimes ASes use multiple IGPs and metrics, but appear as single ASes to other ASes.

Page 7: Global Internet & BGP

Subnetting & CIDR

Page 8: Global Internet & BGP

Global Addresses

• Properties
  – IPv4 uses a 32-bit address space
  – globally unique
  – hierarchical: network + host
• Dot notation
  – 10.3.2.4
  – 128.96.33.81
  – 192.12.69.77
• Assigning authority
  – Jon Postel ran IANA until 1998
  – Assigned by ICANN

Class   Leading bits   Network bits   Host bits
A       0              7              24
B       10             14             16
C       110            21             8
D       1110           (multicast)
E       1111           (experimental)
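The class of an address follows directly from its leading bits. A minimal sketch of that rule, assuming Python's standard ipaddress module for parsing; the function name address_class is made up for illustration:

```python
import ipaddress


def address_class(dotted: str) -> str:
    """Return the classful category (A-E) implied by the leading bits."""
    first_octet = int(ipaddress.IPv4Address(dotted)) >> 24
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # leading bits 10
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # leading bits 110
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1110, multicast
    return "E"      # leading bits 1111, experimental


# The three dot-notation examples from the slide.
for addr in ("10.3.2.4", "128.96.33.81", "192.12.69.77"):
    print(addr, "is class", address_class(addr))
```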

Page 9: Global Internet & BGP

How to Make Routing Scale

• Flat (Ethernet) versus hierarchical (Internet) addresses
  – All hosts attached to the same network have the same network address
• Problem: inefficient use of hierarchical address space
  – class C with 2 hosts (2/255 = 0.78% efficient)
  – class B with 256 hosts (256/65535 = 0.39% efficient)
• Problem: still too many networks
  – routing tables do not scale
    • Big tables make routers expensive
  – route propagation protocols do not scale

Page 10: Global Internet & BGP

Today’s Internet

• Consists of ISPs (Internet Service Providers) that run ASes (Autonomous Systems)
• All you need to become an ISP is some address space, an AS number and a peer or two
  – Easier said than done
  – Getting addresses and an AS number is the tricky part
• There are public peering points (MAE East, Central and West)
  – NAPs run by MCI where peering can take place
• Most peering points are private
• The number of connections has been doubling for some time. How do we deal with this kind of scaling?

Page 11: Global Internet & BGP

Subnetting - 1985

• Original intent was for the network part to identify one physical network
  – Lots of small networks are what we actually have. How do we handle this?
• Solution: add another level to the address/routing hierarchy: the subnet
• Subnet masks define a variable partition of the host part
  – 1s identify the subnet, 0s identify hosts within the subnet
  – Mechanism for sharing a single network number among multiple networks
• Subnets are visible only within a site

[Figure: subnet addressing]
Class B address:             Network number | Host number
Subnet mask (255.255.255.0): 111111111111111111111111 | 00000000
Subnetted address:           Network number | Subnet ID | Host ID

Page 12: Global Internet & BGP

Subnetting

• Subnetting is the process of creating multiple segments within a single IP network address space

• From the perspective of a node outside the network, all nodes on any of the subnetworks appear to be on the original single network

• Internet routing tables are not affected by subnetting, i.e., routing tables need not be overloaded with information about routes to all internal subnets, only a route to the access router/gateway

Page 13: Global Internet & BGP

Subnetting (Continued)

• Classes A, B and C in IP addressing are designed with two levels of hierarchy (netid & hostid)

• Problem: An organization with a class B address cannot have more than one network, and all 2^16 hosts are attached to that network. This is a nightmare to manage: a single broadcast domain, security issues, etc.

• Subnetting creates another level of hierarchy (netid, subnetid and hostid). Delivery of IP packets involves three steps: delivery to the site router, delivery to the subnet router, delivery to the host

Page 14: Global Internet & BGP

Subnet Example

[Figure: three subnets joined by routers R1 and R2]
• Subnet 128.96.34.0 (mask 255.255.255.128): H1 at 128.96.34.15, R1 at 128.96.34.1
• Subnet 128.96.34.128 (mask 255.255.255.128): R1 at 128.96.34.130, R2 at 128.96.34.129, H2 at 128.96.34.139
• Subnet 128.96.33.0 (mask 255.255.255.0): R2 at 128.96.33.1, H3 at 128.96.33.14

Forwarding table at router R1

Subnet Number    Subnet Mask        Next Hop
128.96.34.0      255.255.255.128    interface 0
128.96.34.128    255.255.255.128    interface 1
128.96.33.0      255.255.255.0      R2

Page 15: Global Internet & BGP

A network with two levels of hierarchy

[Figure: router R connects network 141.14.0.0 to the Internet. Hosts shown: 141.14.2.20, 141.14.2.21, 141.14.2.105, 141.14.7.96, 141.14.7.95, 141.14.7.44, 141.14.22.64, 141.14.22.8.]

Page 16: Global Internet & BGP

A network with three levels of hierarchy

[Figure: the previous network is divided into 3 subnets, 141.14.2.0, 141.14.7.0 and 141.14.22.0, behind router R, which connects to the Internet. Hosts shown: 141.14.2.20, 141.14.2.21, 141.14.2.105, 141.14.7.96, 141.14.4.45, 141.14.22.9, 141.14.22.64.]

Page 17: Global Internet & BGP

Subnet Masking

• Subnetting is achieved by “stealing” some bits from the hostid field to represent the subnet portion of the address
• The bits used for the subnetid are identified through a subnet mask
• Masking is the process of extracting the address of the physical network (if subnetting is not used) or the subnet address (if subnetting is used) from an IP address
• A subnet mask is a 32-bit pattern with a “1” in every netid and subnetid position and a “0” in every hostid position

Page 18: Global Internet & BGP

Subnet masking (Continued)

• Subnet masking is performed (both at the host and at the router) by applying “bit-wise-and” operation between the IP address and the subnet mask

• Example 1: Class B network without subnetting
  – 141.14.2.21 is 10001101.00001110.00000010.00010101
  – 255.255.0.0 is 11111111.11111111.00000000.00000000
  – “Bit-wise and” is 10001101.00001110.00000000.00000000
  – The network address is hence 141.14.0.0

[Figure: IP address 141.14.2.21 combined with mask 255.255.0.0 gives network address 141.14.0.0]

Page 19: Global Internet & BGP

Subnet Masking (Continued)

• Example 2: Class B network with subnetting
  – 141.14.2.21 is 10001101.00001110.00000010.00010101
  – 255.255.255.0 is 11111111.11111111.11111111.00000000
  – “Bit-wise and” is 10001101.00001110.00000010.00000000
  – The subnet address is hence 141.14.2.0

[Figure: IP address 141.14.2.21 combined with mask 255.255.255.0 gives subnet address 141.14.2.0]
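Both masking examples can be checked with a few lines of Python. This is only a sketch: the helper name apply_mask is invented here, and the standard ipaddress module is used just to convert between dotted notation and integers.

```python
import ipaddress


def apply_mask(address: str, mask: str) -> str:
    """Bit-wise AND of an IPv4 address and a mask, returned in dot notation."""
    addr_int = int(ipaddress.IPv4Address(address))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(addr_int & mask_int))


# Example 1: class B network without subnetting.
print(apply_mask("141.14.2.21", "255.255.0.0"))    # 141.14.0.0
# Example 2: the same address with a 24-bit subnet mask.
print(apply_mask("141.14.2.21", "255.255.255.0"))  # 141.14.2.0
```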

Page 20: Global Internet & BGP

Forwarding Algorithm

D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
    D1 = SubnetMask & D
    if D1 = SubnetNum
        if NextHop is an interface
            deliver datagram directly to D
        else
            deliver datagram to NextHop
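A runnable sketch of the forwarding loop, using the three entries of R1's forwarding table from the subnet example. The names Entry, R1_TABLE and forward, and the returned strings, are illustrative choices, not part of the slides; a real router would hand the datagram to a link layer rather than return text.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    subnet_num: str
    subnet_mask: str
    next_hop: str
    is_interface: bool  # True when next_hop names a directly attached interface


# Forwarding table at router R1, taken from the subnet example.
R1_TABLE = [
    Entry("128.96.34.0", "255.255.255.128", "interface 0", True),
    Entry("128.96.34.128", "255.255.255.128", "interface 1", True),
    Entry("128.96.33.0", "255.255.255.0", "R2", False),
]


def to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def forward(dest: str, table: List[Entry], default_router: Optional[str] = None) -> str:
    """Return a description of what the router does with a datagram for dest."""
    d = to_int(dest)
    for entry in table:
        # D1 = SubnetMask & D; compare with SubnetNum.
        if (d & to_int(entry.subnet_mask)) == to_int(entry.subnet_num):
            if entry.is_interface:
                return f"deliver directly to {dest} via {entry.next_hop}"
            return f"forward to {entry.next_hop}"
    if default_router is not None:
        return f"forward to default router {default_router}"
    return "no route"


print(forward("128.96.34.139", R1_TABLE))  # H2, on the second subnet
print(forward("128.96.33.14", R1_TABLE))   # H3, reached through R2
```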

Page 21: Global Internet & BGP

Forwarding

• Use a default router if nothing matches
• Not necessary for all 1s in subnet mask to be contiguous
• Can put multiple subnets on one physical network
• Subnets not visible from the rest of the Internet
• This is a simple, toy example!

Page 22: Global Internet & BGP

Subnets

• Subnetting is not the only way to solve scalability problems

• Additional router support is necessary to include netmask and forwarding functionality

• Non-contiguous netmask numbers can be used

– They make administration more difficult

• Multiple subnets can reside on a single network

– Requires routers within the network

• Subnets help solve scalability problems

– Do not require us to use class B or C address for each physical network

– Help us to aggregate information

• Chief advantage of IP addresses: routers could keep one entry per network instead of one per destination host

Page 23: Global Internet & BGP

Continued Problems with IPv4 Addresses

• Problem:
  – Potential exhaustion of IPv4 address space (due to inefficiency)
    • Class B network numbers are highly prized
    • Lots of class C addresses but no one wants them
  – Growth of backbone routing tables
    • We don’t want lots of small networks since this causes large routing tables
• Solution:
  – Allow addresses assigned to a single entity to span multiple classed prefixes
  – Enhance route aggregation

Page 24: Global Internet & BGP

Supernetting

• Assign block of contiguous network numbers to nearby networks
• Called CIDR: Classless Inter-Domain Routing
  – Breaks rigid boundaries between address classes
  – If an ISP needs 16 class C addresses, make them contiguous
    • E.g., 192.4.16 to 192.4.31 enables a 20-bit network number
• Represent blocks (number of class C networks) with a single pair (first_network_address, count)
• Restrict block sizes to powers of 2
• Use a bit mask (CIDR mask) to identify block size
• All routers must understand CIDR addressing
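The 192.4.16 to 192.4.31 example can be verified with Python's standard ipaddress module: sixteen contiguous class C networks collapse into one 20-bit prefix. The variable names are illustrative only.

```python
import ipaddress

# The 16 contiguous class C networks 192.4.16.0/24 ... 192.4.31.0/24.
class_c_blocks = [ipaddress.ip_network(f"192.4.{third}.0/24") for third in range(16, 32)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregated = list(ipaddress.collapse_addresses(class_c_blocks))

print(aggregated)             # a single /20 network
print(aggregated[0].netmask)  # the corresponding CIDR mask
```

The merge only yields a single prefix because the block is a power of 2 in size and starts on a matching boundary, which is why the slide restricts block sizes to powers of 2.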

Page 25: Global Internet & BGP

IP addressing: CIDR

• CIDR: Classless InterDomain Routing
  – network portion of address of arbitrary length
  – address format: a.b.c.d/x, where x is # bits in network portion of address

[Figure: 200.23.16.0/23 in binary; the first 23 bits are the network part, the last 9 bits the host part]
11001000 00010111 0001000 | 0 00000000

Page 26: Global Internet & BGP

CIDR (continued)

• Why?
  – Reduce amount of global routing information via aggregation

[Figure: a service provider serving networks 204.71.0.0, 204.71.1.0, 204.71.2.0, 204.71.3.0 and 204.71.4.0 advertises the single prefix 204.71.0.0/16 to the global Internet routing mesh]

Page 27: Global Internet & BGP

CIDR Addresses

• Identifying a CIDR block requires both an address and a mask
  – Slash notation
  – 128.211.168.0/21 for addresses 128.211.168.0 to 128.211.175.255
    • Here the /21 indicates a 21-bit mask
  – All possible CIDR masks can easily be generated
    • /8, /16, /24 correspond to traditional class A, B, C categories
• IP addresses are now arbitrary integers, not classes
• Raises interesting questions about lookups
  – Routers cannot determine the division between prefix and suffix just by looking at the address
    • Hashing does not work well
    • Interesting lookup algorithms have been developed and analyzed
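A short sketch of both points: the address range covered by 128.211.168.0/21, and a lookup that must pick the longest matching prefix because the address alone does not reveal the prefix length. The linear scan in longest_prefix_match (a name made up here) is only for illustration; it is not one of the specialised lookup algorithms the slide alludes to.

```python
import ipaddress
from typing import Iterable, Optional

block = ipaddress.ip_network("128.211.168.0/21")
print(block.netmask)        # the 21-bit mask in dot notation
print(block[0], block[-1])  # first and last address of the block


def longest_prefix_match(dest: str, prefixes: Iterable[str]) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific prefix containing dest, or None if nothing matches."""
    addr = ipaddress.ip_address(dest)
    candidates = [ipaddress.ip_network(p) for p in prefixes]
    matches = [net for net in candidates if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


print(longest_prefix_match("128.211.170.9", ["128.211.0.0/16", "128.211.168.0/21"]))
```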

Page 28: Global Internet & BGP

CIDR – A Couple Details

• ISPs can further subdivide their blocks of addresses using CIDR
• Some prefixes are reserved for private addresses
  – 10/8, 172.16/12, 192.168/16, 169.254/16
  – These are not routable in the Internet
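A minimal membership test against the four reserved prefixes listed on the slide. The names RESERVED_PREFIXES and is_reserved are illustrative; the list deliberately contains only the prefixes the slide mentions.

```python
import ipaddress

# The prefixes listed on the slide, written out in full.
RESERVED_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def is_reserved(dotted: str) -> bool:
    """True if the address falls inside one of the listed non-routable prefixes."""
    addr = ipaddress.ip_address(dotted)
    return any(addr in net for net in RESERVED_PREFIXES)


print(is_reserved("172.31.255.1"))  # inside 172.16/12
print(is_reserved("172.32.0.1"))    # just outside 172.16/12
```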

Page 29: Global Internet & BGP

Inter-domain routing

BGP

Page 30: Global Internet & BGP

AS Numbers (ASNs)

ASNs are 16-bit values.

64512 through 65535 are “private”

• Genuity: 1
• MIT: 3
• Harvard: 11
• UC San Diego: 7377
• AT&T: 7018, 6341, 5074, …
• UUNET: 701, 702, 284, 12199, …
• Sprint: 1239, 1240, 6211, 6242, …
• …

ASNs represent units of routing policy

Currently over 11,000 in use.

Page 31: Global Internet & BGP

How Many ASNs are there?

Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001

Page 32: Global Internet & BGP

When will we run out of ASNs?

[Plot: ASN growth projected toward the limit of 64,511; 2005? 2007?]

Page 33: Global Internet & BGP

Internet inter-AS routing: BGP

• BGP (Border Gateway Protocol): the de facto standard

• Path Vector protocol:

– similar to Distance Vector protocol

– each Border Gateway broadcasts to its neighbors (peers) the entire path (i.e., sequence of ASes) to the destination

– E.g., Gateway X may send its path to dest. Z:

Path (X,Z) = X,Y1,Y2,Y3,…,Z

Page 34: Global Internet & BGP

Internet inter-AS routing: BGP

Suppose: gateway X sends its path to peer gateway W
• W may or may not select the path offered by X
  – cost, policy (don’t route via a competitor’s AS), loop prevention reasons
• If W selects the path advertised by X, then: Path (W,Z) = W, Path (X,Z)
• Note: X can control incoming traffic by controlling its route advertisements to peers:
  – e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z

Page 35: Global Internet & BGP

Internet inter-AS routing: BGP

• BGP messages exchanged using TCP.
• BGP messages:
  – OPEN: opens TCP connection to peer and authenticates sender
  – UPDATE: advertises new path (or withdraws old)
  – KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
  – NOTIFICATION: reports errors in previous msg; also used to close connection

Page 36: Global Internet & BGP

Why different Intra- and Inter-AS routing?

Policy:
• Inter-AS: admin wants control over how its traffic is routed and who routes through its net
• Intra-AS: single admin, so no policy decisions needed

Scale:
• hierarchical routing saves table size and reduces update traffic

Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance

Page 37: Global Internet & BGP

Architecture of Dynamic Routing

[Figure: AS 1 and AS 2 connected by BGP, with OSPF and EIGRP running inside the ASes]

EGP = Exterior Gateway Protocol

IGP = Interior Gateway Protocol

Metric based: OSPF, IS-IS, RIP, EIGRP (Cisco)

Policy based: BGP

The Routing Domain of BGP is the entire Internet

Page 38: Global Internet & BGP

Many Routing Processes Can Run on a Single Router

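[Figure: a single router running an OSPF process, a RIP process and a BGP process, each with its own routing tables. The OSPF process faces the OSPF domain, the RIP process the RIP domain, and the BGP process its BGP peers. A forwarding table manager combines the routing tables into the forwarding table held in the OS kernel.]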

Page 39: Global Internet & BGP

AS categories

– Stub: an AS that has only a single connection to one other AS - carries only local traffic.

– Multihomed: an AS that has connections to more than one AS, but refuses to carry transit traffic

– Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)

Page 40: Global Internet & BGP

Nontransit vs. Transit ASes

[Figure: NET A connects to both ISP 1 and ISP 2; arrows show IP traffic]
• A nontransit AS might be a corporate or campus network. Could be a “content provider”
• Traffic NEVER flows from ISP 1 through NET A to ISP 2 (at least not intentionally!)
• Internet service providers (often) have transit networks

Page 41: Global Internet & BGP

Selective Transit

[Figure: NET A connects to NET B, NET C and NET D; arrows show IP traffic]
• NET A provides transit between NET B and NET C and between NET D and NET C
• NET A DOES NOT provide transit between NET D and NET B
• Most transit networks transit in a selective manner…

Page 42: Global Internet & BGP

Choices for global routing

• Link state or distance vector?
  – no universal metric - policy decisions
• Problems with distance-vector:
  – Bellman-Ford algorithm may not converge
• Problems with link state:
  – metric used by routers not the same - loops
  – LS database too large - entire Internet
  – may expose policies to other ASes

Page 43: Global Internet & BGP

Solution: Path Vectors

• Each routing update carries the entire path
• Loops are detected as follows:
  – when an AS gets a route, check whether the AS is already in the path
  – if yes, reject the route
  – if no, add self and advertise the route further
• Advantage:
  – metrics are local - AS chooses path, protocol ensures no loops
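The loop check is small enough to sketch directly. The function name receive_route and the tuple representation of a path are assumptions made for this illustration; the AS numbers in the usage lines are the ones that appear on the ASPATH slides.

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]


def receive_route(my_asn: int, as_path: AsPath) -> Optional[AsPath]:
    """Apply the path-vector loop check.

    Returns None when the route must be rejected (our AS is already in the
    path); otherwise returns the path to advertise further, with our own AS
    number added at the front.
    """
    if my_asn in as_path:
        return None
    return (my_asn,) + as_path


# AS 7018 hears the prefix from its originator AS 6341, then AS 1239 hears it from AS 7018.
print(receive_route(7018, (6341,)))
print(receive_route(1239, (7018, 6341)))
# AS 7018 is offered a path that already contains 7018: rejected.
print(receive_route(7018, (1, 333, 7018, 877)))
```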

Page 44: Global Internet & BGP

ASPATH Attribute

[Figure: propagation of prefix 135.207.0.0/16, originated by AS 6341 (AT&T Research). ASes shown: AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 3549 (Global Crossing), AS 1129 (Global Access) and AS 12654 (RIPE NCC RIS project). Each AS adds its own number to the front of the path before advertising the route further.]

Advertisements shown:
• 135.207.0.0/16, AS Path = 6341
• 135.207.0.0/16, AS Path = 7018 6341 (shown on two links)
• 135.207.0.0/16, AS Path = 3549 7018 6341
• 135.207.0.0/16, AS Path = 1239 7018 6341
• 135.207.0.0/16, AS Path = 1755 1239 7018 6341
• 135.207.0.0/16, AS Path = 1129 1755 1239 7018 6341

Page 45: Global Internet & BGP

Interdomain Loop Prevention

BGP at AS YYY will never accept a route with ASPATH containing YYY.

[Figure: AS 1 offers AS 7018 a route to 12.22.0.0/16 with ASPATH = 1 333 7018 877. Don’t accept!]

Page 46: Global Internet & BGP

Problems

• Routing table size
  – need an entry for all paths to all networks
• Required memory = O((N + M*A) * K)
  – N: number of networks
  – M: mean AS distance
  – A: number of ASes
  – K: number of BGP peers
• Problem reduced with CIDR

Page 47: Global Internet & BGP

Routing information bases (RIB)

• Routes are stored in RIBs

• Adj-RIBs-In: routing info that has been learned from other routers (unprocessed routing info)

• Loc-RIB: local routing information selected from Adj-RIBs-In (routes selected locally)

• Adj-RIBs-Out: info to be advertised to peers (routes to be advertised)

Page 48: Global Internet & BGP

Conceptual Mode of Operation

• RIB = Routing information base

[Figure: Adj-RIB-In (one per BGP neighbor) feeds Loc-RIB, which feeds Adj-RIB-Out (one per BGP neighbor)]

Page 49: Global Internet & BGP

Routing table size

Networks (NLRI)   Mean AS distance   # of ASes   BGP peers/net   Memory
2,100             5                  59          3               27,000
4,000             10                 100         6               108,000
10,000            15                 300         10              490,000
100,000           20                 3,000       20              1,040,000
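The first three memory figures in the table can be reproduced from the required-memory expression on the Problems slide if one assumes 4 bytes per network and 2 bytes per AS number. Those byte sizes are an assumption of this sketch and are not stated on the slides. Under the same assumption the fourth row evaluates to 10,400,000 rather than the printed 1,040,000, so the sketch makes no claim about that row.

```python
def bgp_memory_bytes(
    networks: int,
    mean_as_distance: int,
    num_ases: int,
    peers: int,
    bytes_per_network: int = 4,  # assumed size of one network entry
    bytes_per_asn: int = 2,      # assumed size of one 16-bit AS number
) -> int:
    """Evaluate (N + M*A) * K with assumed per-item byte sizes."""
    path_bytes = mean_as_distance * num_ases * bytes_per_asn
    return (networks * bytes_per_network + path_bytes) * peers


rows = [
    (2_100, 5, 59, 3),
    (4_000, 10, 100, 6),
    (10_000, 15, 300, 10),
]
for n, m, a, k in rows:
    print(n, m, a, k, bgp_memory_bytes(n, m, a, k))
```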

Page 50: Global Internet & BGP

Policy with BGP

• BGP provides capability for enforcing various policies

• Policies are not part of BGP: they are provided to BGP as configuration information

• BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s

Page 51: Global Internet & BGP

So Many Choices

Which route should Frank pick to 13.13.0.0/16?

[Figure: Frank’s Internet Barn has links to AS 1, AS 2, AS 3 and AS 4; the links are labeled peer, peer, customer and provider. The destination prefix is 13.13.0.0/16.]

Page 52: Global Internet & BGP

Implementing Backup Links with Local Preference (Outbound Traffic)

Forces outbound traffic to take primary link, unless link is down.

[Figure: AS 65000 connects to AS 1 over a primary link and a backup link]
• Primary link: set Local Pref = 100 for all routes from AS 1
• Backup link: set Local Pref = 50 for all routes from AS 1

We’ll talk about inbound traffic soon …

Page 53: Global Internet & BGP

Back to Frank …

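[Figure: Frank’s topology again (AS 1, AS 2, AS 3, AS 4; links labeled peer, peer, customer, provider; destination 13.13.0.0/16), with routes now tagged local pref = 80, local pref = 100 and local pref = 90]

Higher local preference values are more preferred

Local preference only used in iBGP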
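A sketch of how local preference drives the choice. The three local pref values (80, 100, 90) come from the slide, but which neighbor gets which value, the neighbor labels and the AS paths below are all hypothetical, and the AS-path-length tie-break is a simplification of the full BGP decision process.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str        # hypothetical label for the neighbor
    local_pref: int
    as_path: Tuple[int, ...]


def best_route(routes: List[Route]) -> Route:
    """Prefer the highest local pref; break ties with the shorter AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# Hypothetical assignment of the slide's three values to Frank's neighbors.
candidates = [
    Route("13.13.0.0/16", "provider", 80, (4,)),
    Route("13.13.0.0/16", "customer", 100, (3, 9, 9)),
    Route("13.13.0.0/16", "peer", 90, (1, 9)),
]

chosen = best_route(candidates)
print(chosen.learned_from, chosen.local_pref)
```

Note that the chosen route has the longest AS path of the three: local preference is compared before path length.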

Page 54: Global Internet & BGP

CIDR and BGP

[Figure: AS X (197.8.2.0/24) and AS Y (197.8.3.0/24) connect to their provider AS T (197.8.0.0/23), which connects to AS Z]

What should T announce to Z?

Page 55: Global Internet & BGP

Options

• Advertise all paths:
  – Path 1: through T can reach 197.8.0.0/23
  – Path 2: through T can reach 197.8.2.0/24
  – Path 3: through T can reach 197.8.3.0/24
• But this does not reduce routing tables! We would like to advertise:
  – Path 1: through T can reach 197.8.0.0/22
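That the three prefixes really do combine into 197.8.0.0/22 can be checked with Python's standard ipaddress module. The variable names are illustrative.

```python
import ipaddress

# The three prefixes reachable through T.
reachable_via_t = [
    ipaddress.ip_network("197.8.0.0/23"),  # T's own block
    ipaddress.ip_network("197.8.2.0/24"),  # AS X
    ipaddress.ip_network("197.8.3.0/24"),  # AS Y
]

# Merge adjacent prefixes into the fewest covering prefixes.
aggregate = list(ipaddress.collapse_addresses(reachable_via_t))
print(aggregate)

# Every original prefix is covered by the aggregate.
print(all(net.subnet_of(aggregate[0]) for net in reachable_via_t))
```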

Page 56: Global Internet & BGP

Sets and Sequences

• Problem: what do we list in the route?
  – list T: omitting information - not acceptable, may lead to loops
  – list T, X, Y: misleading, appears as 3-hop path
• Solution: restructure AS Path attribute as:
  – Path: (Sequence (T), Set (X, Y))
  – if Z wants to advertise path:
    • Path: (Sequence (Z, T), Set (X, Y))
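One way to picture the sequence-plus-set structure is a small immutable record. The class AsPath and its methods are hypothetical names for this sketch, and AS labels are kept as the letters used on the slide. Counting the whole set as a single hop in length() is how I understand BGP-4 to treat an AS set; treat it as an assumption here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AsPath:
    sequence: Tuple[str, ...]            # ordered ASes the route has traversed
    as_set: FrozenSet[str] = frozenset() # unordered ASes hidden by aggregation

    def prepend(self, asn: str) -> "AsPath":
        """Path advertised by asn: it adds itself to the front of the sequence."""
        return AsPath((asn,) + self.sequence, self.as_set)

    def contains(self, asn: str) -> bool:
        """Loop check must look in both the sequence and the set."""
        return asn in self.sequence or asn in self.as_set

    def length(self) -> int:
        """Sequence length, plus one hop for the set as a whole (assumption)."""
        return len(self.sequence) + (1 if self.as_set else 0)


t_announces = AsPath(("T",), frozenset({"X", "Y"}))
z_announces = t_announces.prepend("Z")
print(z_announces.sequence, sorted(z_announces.as_set))
print(z_announces.contains("X"), z_announces.length())
```

Keeping X and Y in the set preserves loop detection for them without making the aggregate look like a 3-hop path.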

Page 57: Global Internet & BGP

Routing Areas

• Address areas hierarchically
  – sequentially number top-level areas
  – sub-areas of area are labeled relative to that area
  – nodes are numbered relative to the smallest containing area
    • nodes can have multiple addresses

Page 58: Global Internet & BGP

Routing

• Within area
  – each node has routes to every other node
• Outside area
  – each node has routes for other top-level areas only
  – inter-area packets are routed to nearest border router
• Can result in sub-optimal paths

Page 59: Global Internet & BGP

Path Suboptimality

[Figure: top-level areas 1, 2 and 3 with sub-areas 1.1, 1.2, 2.1, 2.2, 3.1, 3.2 and node 2.2.1; a 3-hop red path vs. a 2-hop green path]

Page 60: Global Internet & BGP

Shorter Doesn’t Always Mean Shorter

[Figure: AS 1, AS 2, AS 3 and AS 4]

Mr. BGP says that path 4 1 is better than path 3 2 1. Duh!

In fairness: could you do this “right” and still scale?

Exporting internal state would dramatically increase global instability and the amount of routing state

Page 61: Global Internet & BGP

BGP limitations

Delayed Internet Routing Convergence

Page 62: Global Internet & BGP

BGP Limitations: Policy

A

B

C

D

E

F

Page 63: Global Internet & BGP

BGP Limitations: Oscillations

A

B C

D

Page 64: Global Internet & BGP

Daily Update Count

Page 65: Global Internet & BGP

What is the Sound of One Route Flapping?

Page 66: Global Internet & BGP

Implementation Does Matter!

Thanks to Abha Ahuja and Craig Labovitz for this plot.

stateless withdraws widely deployed

stateful withdraws widely deployed

Page 67: Global Internet & BGP

A Few Bad Apples …

Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC

Typically, 80% of the updates are for less than 5% of the prefixes.

Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.

Percent of BGP table prefixes

Page 68: Global Internet & BGP

30 Second Bursts

Page 69: Global Internet & BGP

How Long Does BGP Take to Adapt to Changes?

[Plot: cumulative percentage of events (0 to 100) against seconds until convergence (0 to 160), with curves Tup, Tshort, Tlong and Tdown]

Thanks to Abha Ahuja and Craig Labovitz for this plot.