
UNIVERSITY OF CALIFORNIA

Los Angeles

Understanding the Internet AS-level Structure

A dissertation submitted in partial satisfaction

of the requirements for the degree

Doctor of Philosophy in Computer Science

by

Ricardo V. Oliveira

2009

The dissertation of Ricardo V. Oliveira is approved.

Yingnian Wu

Beichuan Zhang

Mario Gerla

Songwu Lu

Leonard Kleinrock

Lixia Zhang, Committee Chair

University of California, Los Angeles

2009


To my Parents . . .

for their unconditional love and affection


TABLE OF CONTENTS

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1 Internet Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Inter-domain Connectivity and Peering . . . . . . . . . . . . . . . . . 7

2.3 Ground Truth vs. Observed Map . . . . . . . . . . . . . . . . . . . . 9

3 Topology Liveness and Completeness Problems . . . . . . . . . . . . . . 13

4 A Solution to the Liveness Problem . . . . . . . . . . . . . . . . . . . . 18

4.1 An Empirical Model of Observed Topology Dynamics . . . . . . . . 18

4.1.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.1.2 An Empirical Model . . . . . . . . . . . . . . . . . . . . . . 20

4.1.3 Comparison with router configuration files from a Tier-1 . . . 29

4.1.4 Comparison with Internet Registry Data . . . . . . . . . . . . 31

4.1.5 Evaluation of Traceroute Data . . . . . . . . . . . . . . . . . 32

4.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2.1 More Accurate View of the Topology . . . . . . . . . . . . . 38

4.2.2 Evaluating Theoretical Models . . . . . . . . . . . . . . . . . 42

4.2.3 Characterizing Evolution Trends . . . . . . . . . . . . . . . . 44

4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47


5 Quantifying the Topology (in)Completeness . . . . . . . . . . . . . . . . 51

5.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.2 Establishing the Ground Truth . . . . . . . . . . . . . . . . . . . . . 53

5.3 Case studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.3.1 Tier-1 Network . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.3.2 Tier-2 Network . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.3.3 Abilene and Geant . . . . . . . . . . . . . . . . . . . . . . . 65

5.3.4 Content provider . . . . . . . . . . . . . . . . . . . . . . . . 67

5.3.5 Simple stubs . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.4 Completeness of the public view . . . . . . . . . . . . . . . . . . . . 71

5.4.1 “Public view” vs. ground truth . . . . . . . . . . . . . . . 71

5.4.2 Network Classification . . . . . . . . . . . . . . . . . . . . . 73

5.4.3 Coverage of the public view . . . . . . . . . . . . . . . . . . 77

5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6 Path Exploration and Internet Topology . . . . . . . . . . . . . . . . . 82

6.1 BGP Path Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.2 Methodology and Data Set . . . . . . . . . . . . . . . . . . . . . . . 83

6.2.1 Data Set and Preprocessing . . . . . . . . . . . . . . . . . . . 85

6.2.2 Clustering Updates into Events . . . . . . . . . . . . . . . . . 86

6.2.3 Classifying Routing Events . . . . . . . . . . . . . . . . . . . 89

6.2.4 Comparing AS Paths . . . . . . . . . . . . . . . . . . . . . . 91

6.3 Characterizing Events . . . . . . . . . . . . . . . . . . . . . . . . . . 98


6.3.1 The Impact of Unstable Prefixes . . . . . . . . . . . . . . . . 104

6.4 Policies, Topology and Routing Convergence . . . . . . . . . . . . . 104

6.4.1 MRAI Timer . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6.4.2 The Impact of Policy and Topology on Routing Convergence . 106

6.4.3 Origin of Fail-down Events . . . . . . . . . . . . . . . . . . . 111

6.4.4 Impact of Fail-down Convergence . . . . . . . . . . . . . . . 112

7 Prefix Hijacking and Internet Topology . . . . . . . . . . . . . . . . . . 115

7.1 Prefix Hijacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

7.2 Hijack Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . 116

7.3 Evaluating Hijacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

7.3.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . 119

7.3.2 Characterizing Topological Resilience . . . . . . . . . . . . . 120

7.3.3 Factors Affecting Resilience . . . . . . . . . . . . . . . . . . 122

7.4 Prefix Hijack Incidents in the Internet . . . . . . . . . . . . . . . . . 125

7.4.1 Case I: Prefix Hijacks by AS-27506 . . . . . . . . . . . . . . 125

7.4.2 Case II: Prefix Hijacks by AS-9121 . . . . . . . . . . . . . . 128

7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

8 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

8.1 Internet Topology Modeling . . . . . . . . . . . . . . . . . . . . . . 134

8.2 Path Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

8.3 Prefix Hijacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137


9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


LIST OF FIGURES

2.1 Route propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 A sample IXP. ASes A through G connect to each other through a layer-2 switch in subnet 195.69.144/24. . . . . . . . . . . . . . . . . 8

2.3 A set of interconnected ASes; each node represents an AS. (a) shows an example of hidden links, and (b) an example of invisible links. . . . 10

3.1 Observing Topology Over Time . . . . . . . . . . . . . . . . . . . . 14

4.1 Number of links captured by different sets of monitors . . . . . . . . 19

4.2 Number of monitors in RouteViews and RIPE-RIS combined . . . . . 19

4.3 Number of links, Tier-1 monitor with different starting times . . . . . 19

4.4 Visible links seen by all monitors . . . . . . . . . . . . . . . . . . . . 19

4.5 Link disappearance period . . . . . . . . . . . . . . . . . . . . . . . 21

4.6 Link disappearance period, by all monitors . . . . . . . . . . . . . . . 21

4.7 Observation period as a function of confidence level for links . . . . . 24

4.8 Node birth from RIR . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.9 Link birth from IRR . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.10 Link death from IRR . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.11 Visible links in Skitter, λ = 0.00598, b = 39.86. . . . . . . . . . . . . 28

4.12 Comparison between routers’ config files connectivity and BGP data

(cumulative) from a Tier-1 network. . . . . . . . . . . . . . . . . . . 30

4.13 Comparison of appearance times between routers’ config files and BGP

data of a Tier-1 network. . . . . . . . . . . . . . . . . . . . . . . . . 30


4.14 Link disappearance period, by Skitter, µ = 0.0385, d = 57.61. . . . . 31

4.15 Comparison of appearance timestamps between Skitter and BGP. . . . 31

4.16 Comparison of disappearance timestamps between Skitter and BGP. . 33

4.17 Number of reachable addresses in Skitter destination list. . . . . . . . 33

4.18 Trade-off between liveness and completeness for topology snapshot. . 37

4.19 Fraction of multi-homed customers. . . . . . . . . . . . . . . . . . . 37

4.20 Attachment probability distribution for a target node degree. . . . . . 38

4.21 Model evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.22 Node net growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.23 Link net growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.24 Net growth of node wirings. . . . . . . . . . . . . . . . . . . . . . . 44

4.25 Frequency of link changes . . . . . . . . . . . . . . . . . . . . . . . 45

4.26 Number of collected links in DIMES. . . . . . . . . . . . . . . . . . 48

4.27 Diurnal pattern of new link appearances. . . . . . . . . . . . . . . . . 48

4.28 Weekly pattern of new link appearances. . . . . . . . . . . . . . . . . 49

4.29 Link growth for Abilene (AS11537). . . . . . . . . . . . . . . . . . . 49

5.1 Output of “show ip bgp summary” command. . . . . . . . . . . . . . 54

5.2 Configuring remote BGP peerings. R0 and R2 are physically directly

connected, while R1 and R3 are not. . . . . . . . . . . . . . . . . . . 54

5.3 Connectivity of the Tier-1 network (since 2004). . . . . . . . . . . . . 58

5.4 Connectivity of the Tier-1 network (since 2007). . . . . . . . . . . . . 58

5.5 Capturing the connectivity of the Tier-1 network through table snapshots and updates. . . . . . . . . . . . . . . . . . . . . . . . . . 59


5.6 Tier-2 network connectivity. . . . . . . . . . . . . . . . . . . . . . . 59

5.7 Capturing Tier-2 network connectivity through table snapshots and updates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.8 Abilene connectivity. . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.9 Projection of the number of peer ASes of a representative content

provider. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.10 Customer-provider links can be revealed over time, but downstream

peer links are invisible to upstream monitors. . . . . . . . . . . . . . 74

5.11 Distribution of number of downstream customers per AS. . . . . . . . 74

5.12 Example of a prefix hijack scenario where AS2 announces prefix p

belonging to AS1. Because of the invisible peer link AS2–AS3, the

number of ASes affected by the attack is underestimated. . . . . . . . 74

6.1 Path exploration triggered by a fail-down event. . . . . . . . . . . . . 83

6.2 CCDF of inter-arrival times of BGP updates for the 8 beacon prefixes

as observed from the 50 monitors. . . . . . . . . . . . . . . . . . . . 88

6.3 Difference in number of events per [monitor, prefix] for T=2 and 8 minutes, relative to T=4 minutes, during a one-month period. . . . . . 88

6.4 Event taxonomy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

6.5 Usage time per ASPATH-Prefix for router 12.0.1.63, Jan 2006. . . . . 93

6.6 Validation of path preference metric. . . . . . . . . . . . . . . . . . . 95

6.7 Comparison between Ccorrect, Cequal and Cwrong of length, policy and usage time metrics for (a) Tup and (b) Tdown events of beacon prefixes. 95

6.8 Comparison between accuracy of length, policy and usage time metrics. 95

6.9 Number of Tdown events per monitor. . . . . . . . . . . . . . . . . . . 99


6.10 Duration of events for January 2006. . . . . . . . . . . . . . . . . . . 100

6.11 Duration of events for February 2006. . . . . . . . . . . . . . . . . . 100

6.12 Number of Updates per Event, January 2006. . . . . . . . . . . . . . 101

6.13 Number of Unique Paths Explored per Event, January 2006. . . . . . 101

6.14 Duration of events for unstable prefixes, January 2006. . . . . . . . . 101

6.15 Duration of events for stable prefixes, January 2006. . . . . . . . . . . 101

6.16 Determining MRAI configuration. . . . . . . . . . . . . . . . . . . . 106

6.17 Duration of Tdown events as seen by monitors at different tiers. . . . . 109

6.18 Number of unique paths explored during Tdown as seen by monitors at

different tiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6.19 Topology example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6.20 Duration of Tdown events observed and originated in different tiers. . . 110

6.21 Number of paths explored during Tdown events observed and originated

in different tiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6.22 Median of duration of Tdown events observed and originated in different tiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6.23 Number of Tdown events over time. . . . . . . . . . . . . . . . . . . . 112

6.24 Case where Tdown convergence disrupts data delivery. . . . . . . . . . 114

7.1 Hijack scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.2 Distribution of node resilience. . . . . . . . . . . . . . . . . . . . . . 121

7.3 Resilience of nodes in different tiers. . . . . . . . . . . . . . . . . . . 121

7.4 Understanding resilience of tier-1 nodes . . . . . . . . . . . . . . . . 124

7.5 Resilience of nodes with different number of Tier-1 providers. . . . . 124


7.6 Case study: AS-27506 as false origin . . . . . . . . . . . . . . . . . . 127

7.7 Case studies with AS-9121 as false origin . . . . . . . . . . . . . . . 130


LIST OF TABLES

4.1 Model Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.2 Comparison Between Stub and Transit changes. . . . . . . . . . . . . 41

5.1 IXP membership data, July 2007. . . . . . . . . . . . . . . . . . . . . 53

5.2 Connectivity of stub networks. . . . . . . . . . . . . . . . . . . . . . 71

5.3 Coverage of BGP monitors. . . . . . . . . . . . . . . . . . . . . . . . 78

5.4 Coverage of BGP monitors for different network types. . . . . . . . . 79

6.1 Event Statistics for Jan 2006 (31 days) . . . . . . . . . . . . . . . . . 99

6.2 Event Statistics for Feb 2006 (28 days) . . . . . . . . . . . . . . . . . 100

6.3 Tdown Events by Origin AS . . . . . . . . . . . . . . . . . . . . . . . 113


ACKNOWLEDGMENTS

First and foremost, I would like to acknowledge my dissertation advisor, Dr. Lixia Zhang, for her constant support and guidance throughout my dissertation. I would also like to acknowledge Dr. Mohit Lad for his infinite patience, helpful discussions and relentless support. I am also grateful to Dr. Beichuan Zhang for contributing to the original idea of modeling topology evolution by a birth/death model, Dr. Walter Willinger and Dr. Dan Pei for their guidance during the AT&T internship, Dr. Christophe Diot for his guidance during the Thomson internship, and Dr. Qingming Ma for his supervision while at Juniper Networks. I would like to extend a special note of thanks to Verra Morgan for her time and support during my Ph.D. Various friends and colleagues have played an important role during my Ph.D.; notable among them are Dr. Vasilis Pappas, Dr. Dan Massey, Rafit Izhak-Ratzin, Cesar Marcondes, Cassio Lopes, Niko Palaskas, Bruno Miranda, Leonardo Alves and my sister, Raquel Oliveira. Finally, I would like to acknowledge the Portuguese “Fundacao para a Ciencia e Tecnologia” (FCT) for the scholarships under which my Ph.D. work was supported.


VITA

1978 Born, Povoa de Varzim, Portugal

2001 B.E. Electrical Engineering, Faculty of Engineering of Porto University, Portugal.

2001–2002 Software developer, Oberonsis, Portugal.

2002–2003 Telecommunications Engineer, TMN, Portugal.

2005 M.Sc. Computer Science, University of California, Los Angeles.

2007 Intern at AT&T Labs Research.

2007 Intern at Thomson, Paris.

2008 Intern at Juniper Networks.

PUBLICATIONS

1. Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, Lixia Zhang, “The (in)Completeness of the Observed Internet AS-level Structure”, to appear in IEEE/ACM Transactions on Networking.

2. Ricardo Oliveira, Beichuan Zhang, Dan Pei, Lixia Zhang, “Quantifying Path Exploration in the Internet”, to appear in IEEE/ACM Transactions on Networking, June 2009.

3. Italo Cunha, Fernando Silveira, Ricardo Oliveira, Renata Teixeira, Christophe Diot, “Uncovering Artifacts of Flow Measurement Tools”, Passive and Active Measurement Conference, April 2009.

4. He Yan, Ricardo Oliveira, Kevin Burnett, Dave Matthews, Lixia Zhang, Dan Massey, “BGPmon: A real-time, scalable, extensible monitoring system”, Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH), March 2009.

5. Ricardo Oliveira, Fernando Silveira, Renata Teixeira, Christophe Diot, “The Elusive Effect of Routing Dynamics on Traffic Anomalies”, Technical Report, Thomson, CR-PRL-2008-02-0001.

6. Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, Lixia Zhang, “Quantifying the Completeness of the Observed Internet AS-level Structure”, Technical Report, UCLA CS Department, TR 080026, September 2008.

7. Ying-Ju Chi, Ricardo Oliveira, Lixia Zhang, “Cyclops: The Internet AS-level Observatory”, ACM SIGCOMM Computer Communication Review (CCR), October 2008.

8. Ricardo Oliveira, Ying-Ju Chi, Mohit Lad, Lixia Zhang, “Cyclops: The Internet AS-level Observatory”, NANOG 43, Brooklyn, New York, June 2008.

9. Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, Lixia Zhang, “In Search of the Elusive Ground Truth: The Internet’s AS-level Connectivity Structure”, ACM SIGMETRICS, Annapolis, USA, June 2008.

10. Ricardo Oliveira, Mohit Lad, Beichuan Zhang, Lixia Zhang, “Geographically Informed Inter-Domain Routing”, in IEEE ICNP, Beijing, China, October 2007.

11. Mohit Lad, Ricardo Oliveira, Dan Massey, Lixia Zhang, “Inferring the Origin of Routing Changes using Link Weights”, in IEEE ICNP, Beijing, China, October 2007.

12. Ricardo Oliveira, Beichuan Zhang, Lixia Zhang, “Observing the Evolution of Internet AS Topology”, in ACM SIGCOMM, Kyoto, Japan, August 2007.

13. Ricardo Oliveira, Ying-Ju Chi, Ioannis Pefkianakis, Mohit Lad, Lixia Zhang, “Visualizing Internet Topology Dynamics with Cyclops”, in ACM SIGCOMM (poster session), Kyoto, Japan, August 2007.

14. Mohit Lad, Ricardo Oliveira, Beichuan Zhang, Lixia Zhang, “Understanding the Resiliency of Internet Topology Against False Origin Attacks”, in IEEE/IFIP DSN, Edinburgh, UK, June 2007.

15. Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, Lixia Zhang, “Quantifying Path Exploration in the Internet”, ACM SIGCOMM/USENIX Internet Measurement Conference (IMC), Rio de Janeiro, Brazil, October 2006.

16. Mohit Lad, Ricardo Oliveira, Beichuan Zhang, Lixia Zhang, “Understanding the Impact of Prefix Hijacks in Internet Routing”, ACM SIGCOMM (poster session), Pisa, Italy, September 2006.

17. Beichuan Zhang, Vamsi Kambhampati, Daniel Massey, Ricardo Oliveira, Dan Pei, Lan Wang, Lixia Zhang, “A Secure and Scalable Internet Routing Architecture (SIRA)”, ACM SIGCOMM (poster session), Pisa, Italy, September 2006.

18. Ricardo Oliveira, Mohit Lad, Beichuan Zhang, Dan Pei, Daniel Massey, Lixia Zhang, “Placing BGP Monitors in the Internet”, Technical Report, UCLA CS Department, TR 060017, May 2006.

19. Vidyut Samanta, Ricardo Oliveira, Advait Dixit, Parixit Aghera, Petros Zerfos, Songwu Lu, “Impact of Video Encoding Parameters on Dynamic Video Transcoding”, in IEEE COMSWARE, Delhi, India, January 2006.

20. Ricardo Oliveira, Rafit Izhak-Ratzin, Beichuan Zhang, Lixia Zhang, “Measurement of Highly Active Prefixes in BGP”, in IEEE GLOBECOM, St. Louis, USA, November 2005.


ABSTRACT OF THE DISSERTATION

Understanding the Internet AS-level Structure

by

Ricardo V. Oliveira

Doctor of Philosophy in Computer Science

University of California, Los Angeles, 2009

Professor Lixia Zhang, Chair

The Internet is a vast distributed system consisting of a myriad of independent networks interconnected by business relationships. The Border Gateway Protocol (BGP) is the glue that keeps this structure connected. Characterizing and modeling the Internet topology is important to our understanding of Internet routing and its interplay with technical, economic and social forces. In this thesis we address several challenges that emerge when studying Internet connectivity. First, not all observed changes in connectivity correspond to actual changes in the topology: some are caused by transient routing dynamics, while others are real topology changes. The problem of distinguishing between these two types of changes is non-trivial, and we call it the liveness problem. We propose a solution to this problem based on a birth/death model of observed links, which allows us to accurately detect permanent changes in the Internet topology graph and measure its dynamics. The second problem in obtaining accurate topology models is the completeness problem: establishing how much of the real topology is missing from the observed data. We address the completeness problem by deriving bounds on how (in)complete the graph provided by current observations is. The results, using ground truth information obtained from a Tier-1 ISP, indicate that the observed Internet graph contains most of the customer-provider links, but may be missing the vast majority of the peer-peer links. Finally, we study how protocol properties such as routing convergence and resilience to prefix hijack attacks depend on the connectivity and relationships between networks. We find that networks at the border of the Internet undergo more severe path exploration because of the higher number of paths available to reach other destinations; conversely, Tier-1 networks have the fastest convergence times because of their limited number of alternative routes. In terms of prefix hijack attacks, we find, surprisingly, that Tier-1 networks at the core of the Internet are vulnerable to hijacks launched by their customers because of the business nature of BGP route selection. Based on our observations, we formulate a connectivity recommendation for ISPs to increase their resilience to this type of attack.


CHAPTER 1

Introduction

The Internet has been evolving rapidly over recent years, much like a living organism, and its topology has become more complex. Characterizing the structure and evolution trends of the Internet topology is an important research topic for several reasons. It provides an essential input to understanding the limitations of existing routing protocols, evaluating new designs, and projecting future needs; and it advances our understanding of the interplay between networking technology, the resulting topology, and the economic forces behind them.

Many research projects have used a graph representation of the Internet AS-level topology, where nodes represent entire autonomous systems (ASes) and two nodes are connected if and only if the two ASes are engaged in a business relationship to exchange data traffic. Due to the Internet's decentralized architecture, however, this AS-level construct is not readily available, and obtaining accurate AS maps has remained an active area of research. A common feature of all the AS maps used by the research community is that they have been inferred from either BGP-based or traceroute-based data. Unfortunately, both types of measurements reflect more what we can measure than what we would really like to measure, resulting in fundamental limitations in their ability to reveal the Internet's true AS-level connectivity structure.

While these limitations inherent in the available data have long been recognized, there has been little effort to assess the degree of completeness, accuracy, and ambiguity of the resulting AS maps. Although it is relatively easy to collect a more or less complete set of ASes, it has proven difficult, if not impossible, to collect the complete set of inter-AS links. The sheer scale of the AS-level Internet makes it infeasible to install monitors everywhere or crawl the topology exhaustively. At the same time, big stakeholders of the AS-level Internet, such as Internet service providers and large content providers, tend to view their AS connectivity as proprietary information and are generally unwilling to disclose it. As a result, the quality of the currently used AS maps has remained by and large unknown. Yet numerous projects [37, 55, 59, 20, 31, 29, 91] have been conducted using these maps of unknown quality, raising serious scientific and practical concerns about the validity of the claims made and the accuracy of the results reported.

Obtaining accurate and complete Internet topology data is a challenging task. First, observed AS topology snapshots capture only a subset of the real Internet topology [27, 99, 60, 77, 96, 70]. This is referred to as the completeness problem. The incompleteness of the observed AS topology stems from the fact that our main source of connectivity data is Border Gateway Protocol (BGP) routing tables, and BGP was designed to propagate routing information, not AS adjacencies. In BGP, only the best path is propagated to neighbors, and not all neighbors receive all routes; it is therefore only natural that connectivity information is missing when a limited number of vantage points is used. Using ground truth information from a Tier-1 ISP, we quantify to some extent the degree of incompleteness of the observed topology. We find that the current set of vantage points is able to capture the totality of customer-provider links, but as much as 90% of the peer links still escape observation. The invisible peer links exist mainly between nodes at the border of the network.

Second, a new problem arises when we try to measure topology changes over time: changes in the observed topology do not necessarily reflect changes in the real topology, and vice versa. Because the observed topology is normally inferred from routing or data paths, its changes can be due either to real topology changes or to transient routing dynamics (e.g., caused by link failures or router crashes). The challenge, then, is to differentiate, among all the changes in the observed topology over time, those caused by real topology changes from those caused by transient routing dynamics; we call this the liveness problem. Only after solving the liveness problem can we provide empirical topology evolution data, such as when and where an AS or an inter-AS link is added to or removed from the Internet. In this thesis we develop a solution to the liveness problem based on the analysis of available data. Our analysis shows that the effect of transient routing dynamics on the observed topology decreases exponentially over time, and that real topology changes can be modeled as the combination of a constant-rate birth process and a constant-rate death process.
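The intuition behind this decomposition can be sketched numerically. Assume, as a simplification, that monitors reveal pre-existing links at an exponentially decaying rate λ while new links are born at a constant rate b, so the cumulative number of distinct links observed by day t is N(t) = N0(1 − e^(−λt)) + bt. The function name and parameter values below are illustrative, not the dissertation's fitted values.

```python
import math

def links_seen(t, n0=10000.0, lam=0.006, b=40.0):
    """Cumulative number of distinct links observed after t days:
    discovery of n0 pre-existing links at exponential rate lam,
    plus new links born at constant rate b."""
    return n0 * (1.0 - math.exp(-lam * t)) + b * t

# Early on the curve is dominated by discovery of existing links; once the
# exponential term dies out, the slope approaches the birth rate b. This is
# what lets real topology growth be separated from observation effects.
early_slope = links_seen(1) - links_seen(0)
late_slope = links_seen(1001) - links_seen(1000)
print(early_slope > late_slope)  # True: the curve flattens toward rate b
```

In this view, the long-term slope of the observed-link curve estimates the real birth rate, while the early transient measures how quickly the vantage points exhaust the existing topology.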

Several properties of BGP depend on the structure of the Internet topology. In this thesis we study two of these properties: path exploration and resilience to prefix hijacks. Before declaring a destination unreachable, BGP explores all backup paths until it finds a valid one; we call this process path exploration. In order to reduce delays and data loss during routing convergence, path exploration should happen as fast as possible. We show that path exploration depends on the number of alternative paths between the source and the destination, and that nodes at the border of the network, which have more alternative paths, experience more severe convergence delays than nodes at the core. Another protocol property (or deficiency) that depends heavily on the topology is a node's resilience to prefix hijacks. A prefix hijack attack happens when a network X starts announcing address space that belongs to a network Y. The end result is that a fraction of the traffic is diverted to the false origin; in some cases the false origin can even intercept the traffic and send it back to the true origin. After conducting a set of Internet-scale simulations, we find that networks connected to multiple Tier-1s are the most resilient to this type of attack. Furthermore, we find, surprisingly, that Tier-1s at the core of the network are more vulnerable to prefix hijacks launched by their customers because of the policy factor in BGP route selection.
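Why more alternative paths mean slower fail-down convergence can be seen with a minimal sketch, under the simplifying assumption that a router announces each of its (now stale) backup paths once, most preferred first, before sending the final withdrawal. The helper function and path values are hypothetical, not real BGP code.

```python
def fail_down_updates(paths):
    """paths: alternative AS paths sorted by preference, best first;
    all become invalid once the destination fails, but the router only
    discovers this one path at a time, announcing each before withdrawing."""
    return [("announce", p) for p in paths[1:]] + [("withdraw", None)]

# A border AS with three alternatives generates three messages before
# giving up, while a node with a single path withdraws immediately --
# richer connectivity means longer path exploration during fail-down.
print(len(fail_down_updates([[3, 1], [5, 3, 1], [5, 2, 1]])))  # 3
print(len(fail_down_updates([[1]])))                           # 1
```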

The main contributions of this dissertation can be summarized as follows. First, we formulate the topology liveness problem and propose a solution for it; this is described in Chapter 4. Second, we investigate the completeness of the observed AS topology by quantifying and explaining why AS adjacencies are missing from commonly used data sources; this is described in Chapter 5. Third, in Chapter 6 we establish the dependency between the convergence of BGP routes and the topological location of both the monitor and the origin of the routes. Lastly, in Chapter 7 we show how the resilience of networks to prefix hijacks depends on how close to the Tier-1 core each network is connected.


CHAPTER 2

Background

In this chapter we present the relevant background on Internet routing and the relationships between different networks.

2.1 Internet Routing

The Internet consists of more than thirty thousand networks called "Autonomous Systems" (ASes). Each AS is represented by a unique numeric ID known as its AS number, and may advertise one or more IP address prefixes. For example, the prefix 131.179.0.0/16 represents a range of 2^16 IP addresses belonging to AS-52 (UCLA). Internet Registries such as ARIN and RIPE assign prefixes to organizations, which then become the owners of those prefixes. Autonomous Systems run the Border Gateway Protocol (BGP) [78] to propagate prefix reachability information among themselves. In the rest of the thesis, we abstract an autonomous system into a single entity called an AS node (or simply node), and the BGP connection between two autonomous systems into an AS link (or simply link).
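As a quick check of the prefix arithmetic above: a /16 covers 2^(32−16) = 65,536 IPv4 addresses, which Python's standard ipaddress module confirms.

```python
import ipaddress

# The prefix 131.179.0.0/16 spans 2**(32 - 16) = 65,536 addresses.
prefix = ipaddress.ip_network("131.179.0.0/16")
print(prefix.num_addresses)      # 65536
print(prefix.network_address)    # 131.179.0.0
print(prefix.broadcast_address)  # 131.179.255.255
```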

BGP uses routing update messages to propagate routing changes. As a path-vector routing protocol, BGP lists the entire AS path to reach a destination prefix in its routing updates. Route selection and announcement in BGP are determined by networks' routing policies, in which the business relationship between two connected ASes plays a major role. AS relationships can generally be classified as customer-provider or peer-peer¹. In a customer-provider relationship, the customer AS pays the provider AS for access service to the rest of the Internet. The peer-peer relationship does not usually involve monetary flow; the two peer ASes exchange traffic between their respective customers only. Usually a customer AS does not forward traffic between its providers, nor does a peer AS forward traffic between two other peers. For example, in Figure 2.1, AS-1 is a customer of AS-2 and AS-3, and hence would not want to provide transit between AS-2 and AS-3, since it would have to pay both AS-2 and AS-3 for traffic exchanged between them. This results in the so-called valley-free BGP paths [39] generally observed in the Internet. When ASes choose their best path, they usually follow the order of customer routes, peer routes, and provider routes. This policy of no valley, prefer customer is generally followed by most networks in the Internet. As we will see later, the no valley, prefer customer policy plays an important role in determining the impact of prefix hijacks, and hence we present a simple example to illustrate how it works.

Figure 2.1 provides a simple example illustrating route selection and propagation. AS-1 announces a prefix (e.g., 131.179.0.0/16) to its upstream service providers AS-2 and AS-3. The AS announcing a prefix to the rest of the Internet is called the origin AS of that prefix. Each of these providers then prepends its own AS number to the path and propagates the path to its neighbors. Note that AS-3 receives paths from its customer, AS-1, as well as from its peer, AS-2; it selects the customer path over the peer path, and thus advertises the path {3 1} to its neighbors AS-4 and AS-5. AS-5 receives routes from AS-2 and AS-3; we assume AS-5 selects the route announced by AS-3 and announces the path {5 3 1} to its customer AS-6. In general, an AS chooses which routes to import from its neighbors and which routes to export to its neighbors based on import and export routing policies. An AS receiving multiple routes picks

1 Sometimes the relationship between two AS nodes can be "siblings," usually because they belong to the same organization.


[Figure: six interconnected ASes (1 through 6), including a Tier-1 core, linked by provider-customer and peer-peer edges; labels show the AS paths announced across each link, e.g. 1, 2-1, 3-1, and 5-3-1.]

Figure 2.1: Route propagation.

the best route based on policy preference. Metrics such as path length and other BGP

parameters are used in route selection if the policy is the same for different routes. The

BGP decision process also contains many more parameters that can be configured to

mark the preference of routes. A good explanation of these parameters can be found

in [41].
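The selection order just described amounts to a two-level comparison: first the relationship class of the neighbor that announced the route (customer preferred over peer, peer over provider), then AS-path length as a tie-breaker among equally preferred routes. A minimal sketch follows; the candidate routes mirror AS-3's choice in Figure 2.1, and this is of course a simplification of the full BGP decision process, not an implementation of it.

```python
# Rank candidate routes the way the "prefer customer" policy does:
# customer-learned first, then peer-learned, then provider-learned;
# shorter AS paths break ties within the same class.

PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def best_route(candidates):
    """candidates: list of (relationship_to_neighbor, as_path) tuples."""
    return min(candidates, key=lambda r: (PREFERENCE[r[0]], len(r[1])))

# AS-3's choice in Figure 2.1: a customer path from AS-1 versus a
# peer path from AS-2 for the same prefix.
routes = [
    ("peer", [2, 1]),       # learned from peer AS-2
    ("customer", [1]),      # learned directly from customer AS-1
]
chosen = best_route(routes)  # the customer route wins
```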

2.2 Inter-domain Connectivity and Peering

As a path-vector protocol, BGP includes in its routing updates the entire AS-level path

to each prefix, which can be used to infer the AS-level connectivity. Projects such as

RouteViews [15] and RIPE-RIS [14] host multiple data collectors that establish BGP

sessions with operational routers, which we term monitors, in hundreds of ASes to

obtain their BGP forwarding tables and routing updates over time.

Among all the ASes, less than 10% are transit networks, and the rest are stub net-

works. A transit network is an Internet Service Provider (ISP) whose business is to

provide packet forwarding service between other networks. Stub networks, on the

[Figure: ASes A through G attached to an IXP's shared layer-2 cloud, each with an address in 195.69.144.0/24.]

Figure 2.2: A sample IXP. ASes A through G connect to each other through a layer-2 switch in subnet 195.69.144/24.

other hand, do not forward packets for other networks. In the global routing hier-

archy, stub networks are at the bottom or at the edge, and need transit networks as

their providers to reach the rest of the Internet. Transit networks may have their own

providers and peers, and are usually described as different tiers, e.g., regional ISPs,

national ISPs, and global ISPs. At the top of this hierarchy are a dozen or so tier-1

ISPs, which connect to each other in a full mesh to form the core of the global routing infrastructure. The majority of stub networks today multi-home with more than

one provider, and some stub networks also peer with each other. In particular, content

networks, e.g., networks supporting search engines, e-commerce, and social network

sites, tend to peer with a large number of other networks.

Peering is a delicate but also important issue in inter-domain connectivity. A

network has incentives to peer with other networks to reduce the traffic sent to its

providers, hence saving operational costs. But peering also comes with its own issues.

Besides the additional equipment and management costs, ISPs also do not want to establish peer-peer relationships with potential customers. Therefore ISPs in general are very selective in choosing their peers. Common criteria include the number of

co-locations, ratio of inbound and outbound traffic, and certain requirements on prefix

announcements [2, 1]. In recent years, with the fast growth of available content in the

Internet, content networks have been keen on peering with other networks to bypass

their providers. Because they have no concern regarding transit traffic or potential cus-

tomers, content networks generally have an open peering policy and peer with a large

number of other networks.

AS peering can be realized through either private peering or public peering. A

private peering is a dedicated connection between two networks. It provides dedicated bandwidth and makes troubleshooting easier, but has a higher cost. Public peering

usually happens at the Internet Exchange Points (IXPs), which are third-party main-

tained physical infrastructures that enable physical connectivity between their mem-

ber networks2. Currently most IXPs connect their members through a shared layer-2

switching fabric (or layer-2 cloud). Figure 2.2 shows an IXP that interconnects ASes

A through G using a subnet 195.69.144.0/24. Though an IXP provides physical con-

nectivity among all participants, it is up to individual networks to decide with whom to

establish BGP sessions. It is often the case that one network only peers with some of

the other participants in the same IXP. Public peering has a lower cost but its available

bandwidth capacity between any two parties can be limited. However, with the recent

increase in bandwidth capacity, we have seen a trend to migrate private peerings to

public peerings.

2.3 Ground Truth vs. Observed Map

To study AS-level connectivity, we need a clear definition of what constitutes an inter-

AS link. A link between two ASes exists if the two ASes have a contractual agreement

2Note that private and public peering can happen in the same physical facility.

[Figure: two panels of interconnected ASes (nodes 1 through 10) with provider-customer and peer-peer links, best paths marked, and prefixes p0, p1, p2.]

Figure 2.3: A set of interconnected ASes; each node represents an AS. (a) shows an example of hidden links, and (b) an example of invisible links.

to exchange traffic over one or multiple BGP sessions. The ground truth of the Inter-

net AS-level connectivity is the complete set of AS links. As the Internet evolves, its

AS-level connectivity also changes over time. We use Greal(t) to denote the ground

truth of the entire Internet AS-level connectivity at time t.

Ideally, if each ISP maintained an up-to-date list of its AS links and made the list accessible, obtaining the ground truth would be trivial. However, such a list is proprietary

and rarely available, especially for large ISPs with a large and changing set of links.

In this thesis, we derive the ground truth of several individual networks whose data

is made available to us, including their router configurations, syslogs, BGP command

outputs, as well as personal communications with the operators.

From router configurations, syslogs and BGP command outputs, we can infer

whether there is a working BGP session, i.e., a BGP session that is in the established

state as specified in RFC 4271 [78]. We assume there is a link between two ASes if

there is at least one working BGP session between them. However if all the BGP ses-

sions between two ASes are down at the moment of data collection, the link may not


appear in the ground truth on that particular day, even though the two ASes have a valid

agreement to exchange traffic. Fortunately we have continuous daily data going back

for years, thus the problem of missing links due to transient failures should be neg-

ligible. When inferring connectivity from router configurations, extra care is needed

to remove stale BGP sessions, i.e. , sessions that appear to be correctly configured in

router configurations, but are actually no longer active. We use syslog data in this case

to remove the stale entries (as described in detail in the next section). We believe that

this careful filtering makes our inferred connectivity a very good approximation of the

real ground-truth.

We denote an observed global AS topology at time t by Gobsv(t), which typically

provides only a partial view of the ground truth. There are two types of missing links

when we compare Gobsv and Greal: hidden links and invisible links. Given a set of

monitors, a hidden link is one that has not yet been observed but could possibly be

revealed at a later time. An invisible link is one that is impossible to be observed by

the given set of monitors. For example, in Figure 2.3(a), assume that AS5 hosts a

monitor (either a BGP monitoring router or a traceroute probing host) which sends to

the collector all the AS paths used by AS5. Between the two customer paths to reach

prefix p0, AS5 picks the best one, [5-2-1], so we are able to observe the existence of

AS links 2-1 and 5-2. The three other links, 5-4, 4-3, and 3-1, are hidden at the time,

but will be revealed when AS5 switches to path [5-4-3-1] if a failure along the primary

path [5-2-1] occurs. In Figure 2.3(b), the monitor in AS10 uses paths [10-8-6] and [10-

9-7] to reach prefixes p1 and p2, respectively. In this case, link 8-9 is invisible to the

monitor in AS10, because it is a peer link that will not be announced to AS10 under

any circumstances due to the no-valley policy.
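The gap between observed and real topologies can be made concrete with a few set operations: the observed topology is the union of the links on the paths the monitors currently export, and the missing links are the ground-truth links outside that union. The sketch below reproduces Figure 2.3(a); whether each missing link is hidden or invisible depends on routing policy, which this sketch deliberately does not model.

```python
def links_on_path(path):
    """Undirected AS links along one AS path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def observed_topology(monitor_paths):
    """Union of links over all AS paths exported by the monitors."""
    links = set()
    for path in monitor_paths:
        links |= links_on_path(path)
    return links

# Figure 2.3(a): ground truth around the monitor in AS5.
g_real = {frozenset(l) for l in [(5, 2), (2, 1), (5, 4), (4, 3), (3, 1)]}
g_obsv = observed_topology([[5, 2, 1]])  # AS5 exports only its best path
missing = g_real - g_obsv                # links 5-4, 4-3, 3-1 are unseen
```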

Hidden links are typically revealed if we build AS maps using routing data (e.g.,

BGP updates) collected over an extended period. However, a new problem arises from


this approach: the introduction of potentially stale links; that is, links that existed

some time ago but are no longer present. An empirical solution for removing possible

stale links has been developed in [72]. To discover all invisible links, we would need additional monitors in most, if not all, edge ASes, where routing updates can contain

the peering links as permitted by routing policy. The issues of hidden and invisible

links are shared by both BGP logs and traceroute measurements.


CHAPTER 3

Topology Liveness and Completeness Problems

Because individual ASes apply private routing policies to BGP updates, generally

speaking one cannot observe the complete AS topology. We denote the complete real

Internet AS topology graph by Greal, and the topology graph that one infers from mea-

surement data by Gobsv. The observed portion of the AS topology is a subset of the

real topology, i.e. , Gobsv ⊂ Greal. Knowing how much these two topologies differ is

what we term the Completeness Problem.

Gobsv can be constructed in multiple ways. One way is to have data collectors

establish BGP sessions with a set of operational routers, which we call monitors, to

obtain their BGP routing tables and updates. Another way is to have a set of van-

tage points send traceroute probes and then to convert the obtained router paths to AS

paths1. For example, in Fig. 3.1, at time t0, we measure the topology from monitor

A by either examining A’s routing table or probing other two nodes B and C. The

resulting Gobsv misses one link, B-C, from Greal. To study graph properties of the

AS topology it is important to minimize the number of missing links. Existing efforts

in this area include deploying additional monitors and incorporating data from other

sources (e.g. , routing registry [96]). For example, if B is also a monitor, then one can

observe the existence of link B-C.

1 Different from BGP monitors, traceroute vantage points are usually end hosts. However, in this thesis we term both as monitors.


Figure 3.1: Observing Topology Over Time

As a direct consequence of our inability to observe the complete topology, another

problem, which we call the Liveness Problem, arises when we study topology evolution

over time. That is, an observed change in Gobsv does not necessarily reflect a change

in Greal. For example, in Fig. 3.1, at time t1, link A-C goes down due to a physical

failure, but this failure does not change the contractual relationship between A and

C, i.e., link A-C still exists in Greal. However, the routing protocol will adapt to the failure and link A-C disappears from the observation. As a result, comparing Gobsv(t0)

with Gobsv(t1), we will see one link removal (A-C) and one link addition (B-C). In

another example, consider the changes from time t2 to time t3 in Fig. 3.1. D changes its

service provider by switching from C to B. This is a real topology change and results

in one link removal (D-C) and one link addition (D-B) in both Greal and Gobsv. In

both cases, what we observe are changes in Gobsv, and the question is how to tell which ones are real topology changes that happened in Greal.

We use appearance and disappearance to name the addition and removal of ele-

ments (i.e. , links and nodes) in Gobsv respectively, and birth and death to name the

addition and removal of elements in Greal respectively. The liveness problem concerns


how to infer the real births and deaths from observed appearances and disappearances.

More specifically, when a link or node disappears from Gobsv, is it still alive in Greal?

When a link or node appears for the very first time, has it been alive in Greal before?

Answering these questions is critical to studying topology evolution, as we need to

know when and where births and deaths occur in Greal.

The liveness problem and completeness problem are related in that solving one

will help solve the other. If the liveness of links and nodes is known, we can combine

observations made at different times to form a more complete topology estimate. For

example, in Fig. 3.1, combining Gobsv(t0) and Gobsv(t1) will give a more complete

topology at time t1, provided that we know link A-C is still alive at time t1. Similarly, if

the complete topology is known, we will be able to differentiate real topology changes

from transient routing changes. For example, if we know the complete topology in

Fig. 3.1, we will not take the appearance of link B-C at time t1 as a birth.
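This combination step can be sketched as a union of snapshots filtered by a liveness oracle; the oracle below is a trivial stand-in for the inference developed later in this thesis, and the node names follow the Fig. 3.1 example.

```python
def combine_snapshots(snapshots, is_alive):
    """Merge observed topologies taken at different times, keeping only
    links the liveness oracle still considers alive at the latest time."""
    merged = set()
    for g in snapshots:
        merged |= g
    return {link for link in merged if is_alive(link)}

# Fig. 3.1: link A-C fails (but stays alive in Greal) between t0 and t1.
g_t0 = {frozenset("AB"), frozenset("AC")}
g_t1 = {frozenset("AB"), frozenset("BC")}
alive = lambda link: True  # stand-in: assume no deaths occurred by t1
g_combined = combine_snapshots([g_t0, g_t1], alive)
# g_combined now contains A-B, A-C, and B-C
```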

However the liveness problem and completeness problem are also fundamentally

different. On the one hand, even if we know the liveness of all the observed links and

nodes over time and are able to combine observations made through a long time period,

we still do not know whether the combined topology is complete, or how incomplete

it may be. For example, in Fig. 3.1, from time t2 to time t3, knowing the liveness of

links and nodes does not help tell whether link B-C exists. On the other hand, even if

monitors are placed at every node to capture all the links (except those having failures

at the moment), when link A-C disappears from the observation at time t1, we still

cannot tell instantly whether it is due to an operational failure or the termination of

the inter-AS contract, although observations over time can provide a good estimate as

described later in this thesis.

Both the liveness problem and completeness problem are important to a full under-

standing of the Internet topology and its evolution. An ideal solution would be having


all the ISPs register their inter-AS connectivity at a central registry and keep their en-

tries up-to-date, which, unfortunately, does not seem feasible in the current Internet.

A near ideal solution would be placing a monitor in each AS, which is also infeasi-

ble in reality. A number of research efforts have been devoted to making Gobsv more

complete, without knowing exactly how close the obtained Gobsv is to Greal. However,

to our knowledge, no one has addressed the liveness problem, which has been a ma-

jor hurdle to empirical studies of topology evolution. In this thesis, we focus on the

liveness problem and propose a solution based on the analysis of available topology

data.

Intuitively, real topology changes generally occur over relatively long time inter-

vals (e.g. , months or even years), while transient routing changes happen within much

shorter periods (e.g. , minutes or hours). Thus if we keep observing the topology

over time, we should be able to differentiate topology changes from transient routing

changes. For example, if a link disappears and re-appears after a short period of time,

it is most likely that the disappearance is not a death. If a link disappears and never

re-appears again over a long time period, it is most likely that the link no longer ex-

ists. The research question is how long one should wait before declaring a birth or

death with a given level of confidence. We develop an empirical model that captures

the effects of long-term topology changes and short-term routing changes on observed

topologies.

Internet topology can be abstracted at different granularities, e.g., router-level topology, AS-level topology, and ISP-level topology (a number of ISPs have multiple ASes).

Although this thesis focuses on the AS-level topology, the liveness problem is a general problem that exists independently of whether the nodes in Fig. 3.1 are routers,

ASes, or ISPs. Thus we believe that solving the problem at the AS level could lead the way to liveness solutions at other granularities. For example, if we can identify real


topology changes for each AS, by combining the behavior of ASes that belong to the

same ISP, we will get the topology changes for the ISP-level topology. One direction for future work is to apply the methodology developed in this thesis to other types of topologies.


CHAPTER 4

A Solution to the Liveness Problem

In this chapter we develop a solution to the topology liveness problem based on em-

pirical data and provide some example applications of the model.

4.1 An Empirical Model of Observed Topology Dynamics

We develop the model using BGP log data, verify its consistency with information

extracted from Internet registries, and evaluate the suitability of router configuration

files and traceroute data sets in solving the liveness problem.

4.1.1 Data Sets

We use data from four different types of sources: BGP, router configurations, tracer-

oute, and Internet registries. The BGP data consists of both routing tables and updates

collected by RouteViews [15] and RIPE-RIS [14] from a few hundred monitors between January 1, 2004 and December 1, 2006, a period of almost three years.1 From

BGP routing tables and updates, we extract topology information (i.e. , AS nodes

and links) and record the timestamps of appearances and disappearances of links and

nodes. In total there are 27,972 nodes and 123,182 links in the entire data set. To

evaluate the effects of different monitors, we group BGP data into three sets.

1 The main reason for starting from 2004 instead of earlier is to have an adequate number of monitors for the entire measurement period.


Figure 4.1: Cumulative number of links captured by different sets of monitors (Tier-1, Set-54, All) over the number of days since January 1st, 2004.

Figure 4.2: Number of monitors (peers) in RouteViews and RIPE-RIS combined versus the number of covered ASes.

Figure 4.3: Cumulative number of links seen by the Tier-1 monitor with different starting times (2004-01-01, 2004-07-01, 2005-01-01, 2005-07-01, 2006-01-01).

Figure 4.4: Visible links seen by all monitors, with the linear component and the fitted curve.

• Tier-1: data from a single monitor residing in a Tier-1 network.

• Set-54: data from a set of 54 monitors residing in 35 ASes; these monitors are

present throughout the entire measurement period.

• ALL: data from all monitors.

The traceroute data is collected and kindly provided to us by three research projects:

Skitter [17], DIMES [82], and iPlane [57]. They all have monitors around the globe to

periodically traceroute thousands of destination IP addresses, and convert router paths

to AS paths. They differ in the number of monitors, locations of monitors, probing

frequency, and the list of destinations to probe. Both Skitter and DIMES have data

from January 1, 2004 to December 1, 2006, but iPlane’s data collection only started

from late June, 2006. Each data set comes with an AS adjacency list describing the

AS topology it observes.

We also extract AS number allocation data from Regional Internet Registries (RIR) [12],

and AS connectivity information from Internet Routing Registries (IRR) [7].

In addition to the above publicly available data sources, we also made use of router

configuration data of all the routers of a Tier-1 backbone network, which includes

historical configuration files of more than one thousand routers filtered as described

in [70]. Moreover, we have access to iBGP feeds of several routers in this network.

Finally, in Section 4.3 we also use iBGP data provided by Abilene, the US research

and educational network.

4.1.2 An Empirical Model

We first use BGP data to develop an empirical model for observed topology changes.

Before starting the model development, we would like to note an important difference

between links and nodes in terms of their observability. Due to the relatively small


Figure 4.5: Cumulative number of links versus link disappearance period (Tier-1, Set-54, All).

Figure 4.6: Link disappearance period as seen by all monitors, with the linear component and the fitted curve.

number of existing monitors and the rich connectivity among ASes, many links are

not seen on the first day of observation; some of them get revealed through routing

dynamics over time. However, because most ASes (over 99%) originate one or more

prefixes, they appear in the global routing table on the first day of observation; the

small number of remaining transit ASes behave in the same way as links in terms of

their observability. As a result, the same model applies to both links and nodes. We

will focus on developing the model for links, and only show the results of applying the

model to nodes.

4.1.2.1 The Appearance of Links and Nodes

Observations: Fig. 4.1 shows the cumulative number of unique links captured by dif-

ferent monitor sets over time. Taking the Tier-1 curve for instance: on the first day,

the observed links are those in the monitor’s routing table on January 1, 2004; a point

(200, 40000) on the curve means that during the first 200 days, this monitor has seen

40000 unique links in total from its BGP routing tables and updates.


As shown in Fig. 4.1, all three curves share a common pattern: they start with a relatively high growth rate that slows down over time and settles on a more or less constant growth rate. For the ALL curve, even though the number of monitors has been changing over time (see Fig. 4.2), its overall shape is the same as that of the other two,

except for slight bumpiness at the beginning. The same pattern also holds across different

starting times of the observation, as shown in Fig. 4.3. Therefore, this pattern hints at

something fundamental to topology observation.

Intuitively, we can interpret the linear portion of the curve as due to real topology

changes (i.e. , link births) and the initial fast growth as caused by originally hidden

links being revealed by transient routing dynamics. The curves show that, within the

first 100 to 200 days, most links that could be revealed have shown up. After that

point the effect of the revelation process becomes minimal, and the curves would have

flattened out eventually had there been no link birth. The sustained linear increase of

the curves gives a strong indication of topology changes by link births. We derive an

empirical model to quantify this intuition as follows.

Modeling: According to their observability, we sort all links into three types: Visible

(links that have been observed), Invisible (links that cannot be observed by the given

set of monitors2) and Hidden (links that could be observed but have not yet been).

Fig. 4.1 and Fig. 4.3 show the cumulative number of unique visible links over time.

We make the following two simple assumptions:

• Constant Birth Rate: Let bv be the birth rate of visible links, bh the birth rate of

hidden links, then the total birth rate of visible and hidden links b = bv + bh.

• Uniform Revelation Probability: The probability for each hidden link to be revealed during a small time interval ∆t is λ∆t.2

2 Invisible links exist because of routing policies, e.g., a peer-to-peer link between two ASes will not be advertised to their providers, thus it cannot be observed by monitors in the provider networks.

At a given time t, let v(t) be the cumulative number of visible links observed from

time 0 to time t, and h(t) be the number of hidden links at time t. Consider a small

time interval from t to t+∆t. During this period, λ ·h(t)∆t hidden links are revealed;

at the same time, bh ·∆t new hidden links are born. Therefore,

∆h = h(t + ∆t) − h(t) = −λh(t)∆t + bh∆t = (bh − λh(t))∆t

∆h / (bh − λh(t)) = ∆t

Integrating both sides from time 0 to time t, we have:

h(t) = h0 e^(−λt) + (bh/λ)(1 − e^(−λt))

where h0 is the number of hidden links at time 0. Since h(t → ∞) = bh/λ, we can re-write the above equation as

h(t) = h0 e^(−λt) + h∞(1 − e^(−λt))

Now consider the number of observed links v(t). Between time t and t + ∆t,

∆v = λh(t)∆t + bv∆t = λ(h0 − h∞)e^(−λt)∆t + b∆t

Integrating both sides from time 0 to time t, we get

v(t) = v0 + bt + (h0 − h∞)(1 − e^(−λt))    (4.1)

where v0 is the number of links observed on the first day, bt reflects the linear birth

process, h0 is the initial number of hidden links, h∞ is the number of hidden links as

observation time t → ∞, and the impact of the revelation process decreases exponentially over time.
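Eq. 4.1 is easy to evaluate numerically. With the link parameters later fitted from the data (Table 4.1: b ≈ 67.3/day, λ ≈ 0.0151/day, h0 − h∞ ≈ 11013), the curve's daily growth starts near b + λ(h0 − h∞) and decays toward the pure birth rate b; v0 below is an arbitrary placeholder, not a fitted value.

```python
import math

def v(t, v0, b, lam, h_gap):
    """Eq. 4.1: cumulative visible links after t days of observation.
    h_gap stands for (h0 - h_inf)."""
    return v0 + b * t + h_gap * (1.0 - math.exp(-lam * t))

# Link parameters from the regression (Table 4.1); v0 is a placeholder.
b, lam, h_gap = 67.3, 0.0151, 11013.0
v0 = 30000.0

early_slope = v(1, v0, b, lam, h_gap) - v(0, v0, b, lam, h_gap)
late_slope = v(1000, v0, b, lam, h_gap) - v(999, v0, b, lam, h_gap)
# early_slope is roughly b + lam * h_gap; late_slope is essentially b
```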

Results: We perform non-linear regressions on the data based on Eq. 4.1, and the fit is

very good for all three sets of monitors, including the ALL curve (Fig. 4.4), which has a


Figure 4.7: Observation period (days) as a function of confidence level for link births and deaths.

Figure 4.8: Node birth from RIR: cumulative number of ASes over the number of days since January 1st, 2004, with linear fit b = 10.4.

changing set of monitors over the measurement period. All regression results presented in this thesis (e.g., Fig. 4.4, 4.6, 4.11 and 4.14) have a high coefficient of determination, R2 > 99.5%. The good fit indicates that the simple model approximates the real data

satisfactorily. As explained earlier, the same model should apply to nodes as well,

since hidden nodes are revealed in the same way as hidden links by routing dynamics.

The fit to node data is also very good. Values of model parameters are obtained from

the regressions and listed in Table 4.1.

Through further examination of the inter-AS relationships, we also made the following observations. First, the hidden AS links correspond to backup customer-

provider links which are revealed over time. Second, there should be no hidden peer

links, because, generally speaking, peer links are always used to carry routes. They

are used in primary rather than backup routes, thus peer links are immediately visible

unless they are invisible. Furthermore, the truly invisible links correspond to peer links

between lower tier ASes which do not have monitors installed. That is, the invisible

links are under the line of sight of all the existing monitors. Additional elaborations on these observations are presented in [70].


Parameter                 Links     Nodes
Birth rate b (day^-1)     67.3      10.3
Revelation λ (day^-1)     0.0151    0.0223
(h0 − h∞)                 11013     240
Death rate d (day^-1)     45.7      2.87
Revelation µ (day^-1)     0.0196    0.0172
f0/µ                      10545     797

Table 4.1: Model Parameters

4.1.2.2 The Disappearance of Links and Nodes

Observations: A link that disappeared from Gobsv(t) can be a real death, or it can still be alive in Greal, not observed by any monitor at the moment but possibly re-appearing sometime

in the future. Assuming the observation period ends on day n, we define that a link has

a disappearance period of (n−m) days if the link disappeared on day m and has not

re-appeared until the end of the observation. Note that even though a link may appear

and disappear many times in the entire observation period, only the last disappearance

counts in calculating the link’s disappearance period.
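Given a per-link event log, the disappearance period is therefore determined by the final event alone. The log format below is an illustrative assumption for the sketch:

```python
def disappearance_periods(events, last_day):
    """events: dict mapping link -> chronological list of
    (day, "appear" | "disappear") events.
    Returns the disappearance period (in days) for each link absent
    from the observed topology at the end of the observation."""
    periods = {}
    for link, log in events.items():
        day, kind = log[-1]  # only the last disappearance matters
        if kind == "disappear":
            periods[link] = last_day - day
    return periods

events = {
    "2-1": [(0, "appear")],                       # still in view
    "5-4": [(10, "appear"), (50, "disappear"),
            (60, "appear"), (700, "disappear")],  # last gone on day 700
}
periods = disappearance_periods(events, last_day=1000)
# link 5-4 has a disappearance period of 300 days; link 2-1 has none
```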

Fig. 4.5 shows the cumulative number of links over the disappearance period. For

instance, a data point at (200, 21000) on the Tier-1 curve means that, at the end of the

observation, 21000 links have a disappearance period less than or equal to 200 days

as seen by the Tier-1 monitor. Interestingly, the curve also exhibits the pattern of an

initial exponential component plus a stable linear component over time, and this same

pattern holds across different monitor sets and different observation ending times.

Modeling: We divide visible links into two subtypes: in-sight (links that are in the


currently observed topology Gobsv(t)), and out-of-sight (links that have been seen previously, but are not in Gobsv(t), and may come back to Gobsv sometime later). We make

two simple assumptions:

• Constant Death Rate: In-sight links disappear from the monitors’ view at a rate

of d + f0, where d is the number of link deaths, f0 the number of links that

become out-of-sight, per unit time.

• Uniform Revelation Probability: For each out-of-sight link, the probability of

being revealed (i.e. , become in-sight again) during a small time interval ∆t

is µ∆t. This revelation process is essentially the same as the one described

earlier for appearance. We use different notations, λ and µ, since the former is

computed from the first appearance of links, and the latter is computed from the

re-appearance of links.

Suppose the observation ends at time tend. Consider the f0 links that become out-

of-sight at time td, and s = tend − td. Let f(x) be the number of these links that have

not re-appeared since time td through time td + x. After a short time period ∆x, some

of these links may be revealed by routing dynamics. Therefore,

∆f(x) = f(x + ∆x) − f(x) = −µf(x)∆x

Integrating from x = 0 to x = s,

f(s) = f0 e^(−µs)

At the end of observation, the number of links with disappearance period of s is equal

to d + f(s). Let z(s) be the cumulative number of links whose disappearance period

is less than or equal to s, then

z(s) = ∫_0^s (d + f(y)) dy = (f0/µ)(1 − e^(−µs)) + sd    (4.2)


Figure 4.9: Link birth from IRR: cumulative number of new links observed over the number of days since April 5th, 2005, with linear fit b = 57.6.

Figure 4.10: Link death from IRR: cumulative number of links versus disappearance period (days), with linear fit d = 18.6.

where the death process is captured by a linear term, and the revelation process (of

disappeared links) is captured by an exponential term.
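Eq. 4.2 is equally easy to evaluate. With the fitted link parameters (Table 4.1: d ≈ 45.7/day, µ ≈ 0.0196/day, f0/µ ≈ 10545), the slope of z(s) approaches the death rate d once s is large compared with 1/µ; the sketch below simply checks this behavior numerically.

```python
import math

def z(s, d, mu, f0_over_mu):
    """Eq. 4.2: cumulative number of links with disappearance period <= s."""
    return f0_over_mu * (1.0 - math.exp(-mu * s)) + s * d

# Link parameters from the regression (Table 4.1).
d, mu, f0_over_mu = 45.7, 0.0196, 10545.0

slope_late = z(500, d, mu, f0_over_mu) - z(499, d, mu, f0_over_mu)
# for s >> 1/mu the exponential term has died out and slope_late ~ d
```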

Results: Eq. 4.2 fits data well and the results are shown in Fig. 4.6. The same model

can also be applied to nodes. Model parameters, for both appearance and disappearance of links and nodes, are listed in Table 4.1. Even though λ is estimated from first

appearance and µ is estimated from re-appearance, they have similar numerical values,

which is consistent with our model that both parameters characterize the same revela-

tion process. Note that in deriving the model for link appearance, we did not take into

account the death process of visible or hidden links for clarity. The death of visible

links does not affect Eq. 4.1 because Eq. 4.1 is about cumulative number of observed

links. Assuming the death rate for hidden links is dh, the only change it makes in

Eq. 4.1 is to replace bh by (bh − dh), e.g. , b = bv + bh − dh instead of bv + bh.

4.1.2.3 Distinguishing Topology Changes from Transient Routing Changes

Based on our empirical model, the effects of transient routing dynamics on observed

topology decrease exponentially over time, while the real birth and death occur at

constant rates. If one observes the topology long enough, the observed changes will


[Figure 4.11: Visible links in Skitter, λ = 0.00598, b = 39.86. Cumulative number of links observed vs. number of days since January 1, 2004; empirical data, linear component, and fit.]

be dominated by real topology changes. Assume the observation starts at time tstart,

ideally one can wait for a sufficiently long time B so that every new link appearance

at time t > tstart + B can be considered a birth with high confidence. Similarly, after

a link disappears, one can wait for a sufficiently long time D and if the link does not

re-appear during this time, it can be considered a death with high confidence. Now we

are ready to quantify B and D with certain confidence.

According to our model, on each day the newly discovered links come from two

sources: bv from birth and λh(t) from revelation. If we count all the newly discovered

links on day t as birth, the chance of being correct is

confidence(t) = bv / (bv + λh(t)) = bv / (b + λ(h0 − h∞)e^{−λt})    (4.3)

From regression on data, we can obtain the values of b, λ, and (h0−h∞). To estimate

bv, we assume bv ≈ b. This is based on the following observation. Since b = bv + bh − dh, our assumption is equivalent to bh ≈ dh. If bh and dh differ significantly,

the number of hidden links at the beginning of observation would vary significantly

with different starting dates, i.e. , h0 as well as (h0 − h∞) would change significantly

over different starting dates. However, our examination of data shows that this is not


the case, i.e. , (h0 − h∞) remains relatively stable over different starting dates, which

validates the assumption of bh ≈ dh and bv ≈ b.

Knowing the values of the parameters, we can now calculate B for a given confi-

dence level using Eq. 4.3. Similarly, D can be calculated for a given confidence level

based on our model of disappearance. Fig. 4.7 shows the values of B and D for dif-

ferent confidence values. For instance, after 205 days from January 1, 2004, we can

count all newly discovered links as real births with 90% or higher probability of being

correct. If a link disappears and does not show up after 189 days, with 90% chance

this is a real death.
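Setting confidence(t) = c in Eq. 4.3 and solving for t (with bv ≈ b) gives a closed form for the wait time B, sketched below. The parameter values are illustrative, not the ones estimated from the BGP data.

```python
# Hypothetical sketch: solve confidence(B) = c from Eq. 4.3 with b_v ~ b.
# From c = b / (b + lam*(h0-hinf)*exp(-lam*B)) it follows that
#   B = (1/lam) * ln( lam*(h0-hinf)*c / (b*(1-c)) ).
# The numbers below are made up for illustration.
import math

def wait_time_B(lam, h0_minus_hinf, b, c):
    """Days after which newly seen links are births with confidence >= c."""
    return (1.0 / lam) * math.log(lam * h0_minus_hinf * c / (b * (1.0 - c)))

B = wait_time_B(lam=0.014, h0_minus_hinf=10000, b=60.0, c=0.90)
print(round(B))  # roughly seven months with these illustrative numbers
```

D follows the same way from the disappearance model, with µ and d in place of λ and b.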

So far we have considered links and nodes separately. For a given observation

period (B or D), the confidence level of nodes usually is higher than that of links.

Thus considering both together sometimes can improve the confidence on links. For

instance, a node birth is always accompanied by some link births. Therefore we can

decide these link births with higher confidence than what we would have by only con-

sidering links.

4.1.3 Comparison with router configuration files from a Tier-1

In this section we compare the connectivity of a Tier-1 network extracted from Route-

views and RIPE-RIS since early 2004 (“BGP data”) with the connectivity obtained

from router configuration files as described in [70]. This process involves collecting

all router configuration files from the Tier-1 network and processing them to extract AS

adjacencies. To reduce the number of false positives, we only consider BGP sessions

for which there is a valid route configured between the two incident routers. Fur-

thermore, we remove BGP sessions for which there have been no syslog messages for a reasonably long time (the process is explained in more detail in [70]). As a first step

we compare the appearance rate of incident links to the Tier-1 as observed from BGP


data and as extracted from router configs, which is shown in Figure 4.12. We note

that the curves are very close to each other (the top curve has only ∼ 7% additional

slope), and the gap is solely caused by neighbors that announce prefixes that are longer

than /24, which are aggregated into a shorter prefix originated by the Tier-1. These links

never appear in eBGP, even though they are revealed in iBGP routes inside the Tier-1.

Figure 4.13 shows the difference between the appearance timestamps of incident AS

links of the Tier-1 as seen in BGP data from RouteViews and RIPE (birthBGP ) and

from router configs (birthcon). We observe that almost 80% of the AS links appear in

router configs in the same day as they appear in BGP. The remaining 20% have some

lag between the time they appear in the configs and the time they appear in BGP, which

can be on the order of several months. This is expected since the router configs in this

study are only from one side of the BGP session, i.e., the neighbor AS may take more

time to configure the session on their routers. The fact that we were using a monitor

of the Tier-1 network in the BGP view contributes to the high accuracy of the link

appearance timestamps, since direct links to the monitor are preferred in BGP routes,

and hence immediately revealed (versus being hidden).

[Figure 4.12: Comparison between router config-file connectivity and BGP data (cumulative, normalized) from a Tier-1 network, vs. number of days since January 1, 2006.]

[Figure 4.13: Comparison of appearance times between router config files and BGP data of a Tier-1 network; CDF of birthBGP − birthcon (days).]


[Figure 4.14: Link disappearance period, by Skitter, µ = 0.0385, d = 57.61. Cumulative number of links vs. disappearance period (days); empirical data, linear component, and fit.]

[Figure 4.15: Comparison of appearance timestamps between Skitter and BGP; CDF of Bsk − Bbgp (days).]

4.1.4 Comparison with Internet Registry Data

Our model is built upon the assumption that a linear term captures real topology

changes in Greal. Here we use Internet registry data to check the soundness of this

assumption. The registry data is particularly useful because it is not affected by rout-

ing dynamics.

Regional Internet Registries (RIRs) [12] maintain a complete history of AS number

allocations. Fig. 4.8 shows the cumulative number of allocations since January 1, 2004.

It can be approximated by a straight line with slope of 10.4 nodes/day, very close to

10.3 nodes/day obtained from our model.3 Since there is a variable delay between an

AS number’s allocation and its announcement in the global routing system, we cannot

use the allocation date to verify the birth date of nodes. However, the fact that the

AS allocation exhibits a growth rate extremely close to that derived from our model

provides supporting evidence for our assumption of linear node birth. We are not

3Some AS numbers are allocated but never used for global routing. The cumulative number of such ASes grows over time, which may explain the slight difference between node birth rates obtained from RIR and BGP data.


able to check node death rate with RIR data since deallocation of AS numbers is not

mandated and usually is not done in practice.

Internet Routing Registries (IRR) [7] are databases for registering inter-AS con-

nections and routing policies. Registration with IRR is done voluntarily. It is known

that information in IRR is incomplete, and out-of-date for some entries. Historic IRR

files are not publicly available, but we have been downloading a daily copy since April

5, 2005. Fig. 4.9 shows link appearances and Fig. 4.10 shows link disappearances

based on IRR data. Again, one cannot use the IRR data to verify the birth/death dates

of individual links, since registering a link and bringing a link up usually happen on

different days. Both figures exhibit linear behavior over time, which is consistent with

our assumption of linear link birth and death. The rates obtained from IRR data are

lower than that from BGP data, which is likely due to the incompleteness of IRR data.

More specifically, the birth rate estimated in Fig. 4.9 is about 86% of that from BGP

data, whereas the death rate estimated in Figure 4.10 is only 40% of that from BGP

data. This indicates that even some operators are willing to register their connectivity,

they still tend to overlook the removal of stale information.

4.1.5 Evaluation of Traceroute Data

Besides BGP routing tables and updates, traceroute data is another major source for

AS topology information. In this subsection, we analyze existing traceroute data with

regard to their effectiveness in solving the liveness problem.

4.1.5.1 Measuring AS Topology by Traceroute

BGP and traceroute measurements work differently. A BGP data collector passively

listens to routing updates regarding all the globally announced IP prefixes from indi-

vidual monitors and logs the updates as well as routing tables for each monitor. A


[Figure 4.16: Comparison of disappearance timestamps between Skitter and BGP; CDF of Dsk − Dbgp (days).]

[Figure 4.17: Number of reachable addresses in the Skitter destination list over time (2004–2007).]

traceroute monitor actively sends UDP or ICMP probes to a list of IP addresses and

records the router-level paths, which is then converted to AS-level paths.

BGP and traceroute data can be complementary in topology measurement. Since

usually their monitors are placed in very different locations, they may be able to see

different parts of the Internet topology. Also, BGP data records the routing paths, while

traceroute records the data paths, which can be different in some cases. For example,

if a provider AS P aggregates a customer AS C’s prefix in P ’s routing announcement,

a BGP monitor may not be able to see the link P -C, but traceroute can reveal its

existence. However, as pointed out in [64, 44, 28], accurately converting router paths

to AS paths remains an open issue, and there can be many pitfalls in this process.

One commonly used method of converting router IP addresses to AS numbers is to

look up BGP routing table and RIR address allocation database, which, as shown in

previous work, may introduce false AS links. For example, assuming three ASes are

connected as A-B-C, if B's border router uses one of A's IP addresses, then the simple conversion will give a false AS link A-C.
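This pitfall can be reproduced with a toy longest-prefix-match converter; the prefixes, AS labels, and addresses below are invented for illustration and are not a real IP-to-AS mapping.

```python
# Toy illustration (not a real converter) of the false-link pitfall: ASes are
# connected A-B-C, but B's border router answers with an address from A's
# space, so naive longest-prefix-match conversion yields the false link A-C.
import ipaddress

prefix_to_as = {
    ipaddress.ip_network("10.1.0.0/16"): "A",
    ipaddress.ip_network("10.2.0.0/16"): "B",
    ipaddress.ip_network("10.3.0.0/16"): "C",
}

def ip_to_as(addr):
    """Naive conversion: the longest matching prefix wins."""
    a = ipaddress.ip_address(addr)
    matches = [(n.prefixlen, asn) for n, asn in prefix_to_as.items() if a in n]
    return max(matches)[1] if matches else None

# Router-level path: B's border router replies with 10.1.255.1 (A's space).
router_path = ["10.1.0.1", "10.1.255.1", "10.3.0.1"]
as_path = []
for hop in router_path:
    asn = ip_to_as(hop)
    if asn and (not as_path or as_path[-1] != asn):
        as_path.append(asn)

links = set(zip(as_path, as_path[1:]))
print(as_path, links)  # ['A', 'C'] -- the false adjacency A-C appears
```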


As a sanity check on traceroute data, we compared the links observed in BGP data

with that reported by Skitter during January 2007, and manually verified the differ-

ence between the two data sets by contacting the operators of 20 ASes. For these 20

ASes, Skitter reported 447 links that were not in BGP data. However, only 16 out

of these 447 links (4%) were confirmed by the operators. Unfortunately, all three

major traceroute data sets mentioned later in this section used BGP tables and WHOIS

lookups for the IP address to AS number conversion, thus they may potentially suffer

the same conversion errors4. Due to the potentially false links in the data sets, we did

not include traceroute data in our model development. However, since traceroute data

potentially can be a very valuable source for AS topology information, we evaluate the

three data sets and discuss the impact of two important measurement factors: probing

frequency and destination list.

4.1.5.2 Skitter

During our measurement period, Skitter has about 20 monitors around the globe. Ev-

ery day each monitor probes every IP address in a fixed list of around 970,000 ad-

dresses [18]. We apply our model to Skitter data. Fig. 4.11 shows the cumulative

number of unique links observed, and Fig. 4.14 shows the disappearance period of

links. The curves have the same shape as BGP data and our model fits the data well,

which means that the model can be applied to the topology dynamics observed by

traceroute too. However, the observed parameters may not reflect the real rates of birth

and death due to the data set’s limitations.

First, Skitter's revelation parameters differ from those obtained from BGP data: its λ is less than half of BGP's, and its µ is about twice BGP's. These differences

can be explained as follows. When a routing change happens, a BGP monitor will be

4iPlane improves the conversion by first doing alias resolution of IP addresses and then mapping each IP address to the AS to which the majority of the addresses in its alias set map.


notified by triggered routing updates. But for a traceroute monitor to see the change, it

must probe the path when the change is still in effect. For instance, if a hidden backup

link is exposed for 2 hours and the monitor probes once every 24 hours, the chance to

discover this link is only 2/24 ≈ 8%. Since primary paths (links) are in use most of the time, and Skitter only probes each destination once every day, the chance

of observing backup paths (links) is small. Therefore, Skitter is slow in discovering

backup links (i.e. , small λ), but quick in picking up recovered primary links (i.e. ,

large µ). This is further verified by examining the links observed by both BGP and

Skitter. Fig. 4.15 shows the difference between BGP’s and Skitter’s timestamps for the

same link, when they discover the link for the first time. BGP discovers 60% of links

earlier than Skitter, Skitter discovers 10% of links earlier than BGP, and they discover

30% of links on the same day. Fig. 4.16 shows the difference between BGP’s and

Skitter’s timestamps when they see a particular link for the last time. BGP observes

50% of links for longer time, Skitter observes 10% of links for longer time, and they

see 40% of links on the same day for the last time. Clearly, to be more effective in

observing topology dynamics, traceroute data collection needs to probe destinations

frequently5.
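The probing-frequency argument can be made slightly more quantitative. Under the simplifying assumption (ours, not the dissertation's) that each daily probe is an independent, uniformly timed sample, the chance of ever discovering a briefly exposed backup link grows with the number of probing days:

```python
# Back-of-the-envelope sketch: a backup link exposed for e hours in a day is
# discovered by a single uniformly timed daily probe with probability e/24;
# over n independent days the discovery probability is 1 - (1 - e/24)**n.
# Independence across days is an assumption of this sketch.
def discovery_prob(exposed_hours, days):
    p_day = exposed_hours / 24.0
    return 1.0 - (1.0 - p_day) ** days

print(round(discovery_prob(2, 1), 3))   # ~0.083, the 2/24 figure from the text
print(round(discovery_prob(2, 30), 3))  # much higher after a month of probing
```

This is why increasing the probing frequency (or probing for longer) raises λ as seen by a traceroute monitor.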

Second, Skitter data shows a higher death rate (57.61/day) than the birth rate

(39.86/day), which implies that the observed topology shrinks every day, and we have

verified it by counting the number of unique nodes and links observed by Skitter every day. This is a result of Skitter using a fixed list of destination IP addresses over

several years. Using a fixed destination list underestimates the birth rate because new

ASes (which announce new prefixes) and their links will not be probed. It also over-

estimates the death rate because over time some IP addresses on the fixed list become

unreachable due to various reasons. For example, a noticeable percentage of prefixes

stop being announced over time [66], or an ISP may decide to block ICMP traffic.

5High probing frequency may cause security concerns from networks being probed.


Fig. 4.17 clearly shows the decline of reachable addresses in Skitter’s destination list.

Many links that Skitter no longer observes are due to its shrinking probing scope, not

because the links are dead. To be effective in observing topology dynamics, traceroute

data collections must update the destination list constantly to include all ASes.

4.1.5.3 DIMES

DIMES [82] is a distributed measurement infrastructure consisting of a large num-

ber of agents installed by end users on their computers. These agents periodically

traceroute or ping a given set of destinations. The number of DIMES agents has been

growing rapidly, from a few hundred in early 2005 to almost 12,000 in late 2006.

Each DIMES agent probes a different destination list, which is computed and updated

periodically [82]. The probing frequency varies for different agents, and can be as

low as once per week. The purpose of using partial destination lists and low probing

frequency for each agent is to distribute the measurement load among agents while

maintaining a good coverage of the topology. Fig. 4.26 shows the cumulative number

of links observed by DIMES over time. Although its overall trend is similar to BGP’s

and Skitter’s, the curve is much more irregular due to the fast growth of DIMES agents,

partial and changing destination lists, and different probing frequencies.

4.1.5.4 iPlane

iPlane [57] employs around 200 monitors installed on PlanetLab nodes [33]. All iPlane

monitors use a common destination list extracted from BGP routing tables daily, and

probe one address inside every /24 address block. The destinations are being probed

once every day by every monitor. iPlane’s measurement scheme seems very promising

in terms of observing topology dynamics. However, its starting date is recent (June

2006), whereas the BGP data set goes back to 2004.


[Figure 4.18: Trade-off between liveness and completeness for a topology snapshot. Number of valid links vs. liveness confidence.]

[Figure 4.19: Fraction of multi-homed customers vs. number of days since January 1, 2004.]

In summary, traceroute data is an important source of topology information. We

can potentially use it to enhance our understanding of the liveness problem. However,

existing data sets that we have examined do not seem suitable for studying topology

dynamics due to a few well known limitations. To be effective in capturing topology

dynamics, we must first find an effective means to accurately convert router paths to

corresponding AS paths. Furthermore, traceroute monitors collectively should have a

destination list that is both broad (i.e., destinations represent all ASes) and fresh (i.e., the destination list is kept updated), and all the destinations must be probed frequently.

4.2 Applications

This section uses three applications to illustrate the importance of solving the liveness

problem. First, knowing the liveness of links and nodes helps obtain a more com-

plete topology and its properties more accurately. Second, the birth and death dates

inferred from our empirical model can be used to evaluate theoretical models of topol-

ogy evolution. Third, the same data of birth and death dates can be used to empirically


[Figure 4.20: Attachment probability distribution for a target node degree; attachment probability vs. target node degree.]

characterize topology growth trends. Although these applications demonstrate the use-

fulness of our model, the accuracy of the numerical results may be affected by the raw

data available. Recent work [96, 70] points out that this topology may miss a significant number of peer-to-peer links.6

4.2.1 More Accurate View of the Topology

The AS topology on a particular day is often referred to as a topology snapshot. Un-

derstanding the graph properties of a topology snapshot (static view) and the changes

of the properties over time (dynamic view) is an active research area. Knowing the

liveness of links and nodes can help capture more accurate views of the topology.

6The topology in [96] is collected from BGP routing tables, IRR, and traceroute data. Our topology is collected from both BGP routing tables and routing updates. A comparison of the two topology data sets on May 12, 2005 shows that our set has 9154 additional links but misses 7056 links.


4.2.1.1 Static View

Some previous works (e.g., [37], [32], [60]) obtain a topology snapshot by extracting

AS nodes and links from BGP routing tables of a single day. This approach yields an

incomplete static view of the topology. Besides the invisible links, which the moni-

tors are unable to capture, there are many hidden links that can be captured, but are

missing in the routing tables on the particular sampling day. One way to obtain a more

complete topology snapshot on day t is to include live links and nodes that appeared in routing tables and routing updates of the recent past, i.e., since day t − L. The value

of L depends on how confident we want to be that the links added are still alive on

day t. For instance, L = 0 means that we have 100% confidence that all the links are

alive on day t; however, the topology will be rather incomplete. As L increases, the topology becomes more complete; however, the confidence in link liveness decreases

as the topology may contain links that are already dead. By adjusting the value of

L we can make trade-offs between the liveness and the completeness of the resulting

topology snapshot. Fig. 4.18 shows the number of links in a topology snapshot of

November 30, 2006, as a function of the liveness confidence. For instance, a point

(0.6, 75000) represents a snapshot with 75000 links in which all links have more than

60% chance of being alive on November 30, 2006. The liveness confidence is obtained

by (1− death confidence), where death confidence is calculated using the equivalent

of Eq. 4.3 for disappearance.7 Depending on the liveness confidence we want to put in the snapshot, the number of links in the topology graph can vary from about 64000 to more than 88000.

7The gap in the curve from x = 0.85 to x = 1 is due to using the parameters extracted from BGP data; Eq. 4.3 does not have a solution for death confidence lower than 15% with a time resolution of a day.


4.2.1.2 Dynamic View

In order to study how graph properties of the topology change over time, the effects

of the revelation process on the observed topology must be taken into consideration. As

an example, assume we want to measure the percentage of multi-homed stub ASes

over time. A stub AS is the one that always appears as the last AS in an AS path; it

corresponds to a customer network at the bottom tier of the Internet routing hierarchy.

A stub AS may connect to multiple network service providers, but it does not forward

data traffic between its providers. Fig. 4.19 shows the percentage of multi-homed

stub ASes without considering the effects of the revelation process: starting with an initial

snapshot on the first day, the topology is updated every day by adding links as they first

appear and removing links according to their last-seen time, and the percentage of stub

ASes is calculated and plotted for each day’s topology. The curve shows a fast increase

at the beginning of the study period and a fast decrease at the end, which might look

puzzling at first, but can be easily explained by the revelation process. At the begin-

ning, there are many hidden links that take some time to appear, and as they appear,

we discover that existing single-homed stub ASes are in fact multi-homed. Near the

end, many in-sight links become out-of-sight, and are prematurely discounted from the

topology graph, resulting in false classification of multi-homed stub ASes into single-

homed. To take into account the effects of the revelation process, in Fig. 4.19 we draw two

vertical lines corresponding to the 95% confidence margins calculated from Eq. 4.3.

Only the part of the curve between these two vertical lines reflects the real percentage

of multi-homed stub ASes with a high confidence level. This example illustrates the

importance and usefulness of our revelation model in topology measurement and other

similar types of topological studies.


[Figure 4.21: Model evaluation. Fraction of target ASes with degree > dmed vs. number of days after January 1, 2004; data (GLP), data (BA), and model median.]

[Figure 4.22: Node net growth. Net growth of nodes (stub vs. transit) vs. number of days since January 1, 2004.]

  (per day)    Transit   Stub   Total
  Node Birth       2.4    8.3    10.7
  Node Death       0.8    2.5     3.3
  Net Growth       1.6    5.8     7.4
  Link Birth      37.7   29.2    66.9
  Link Death      29.0   16.7    45.7
  Net Growth       8.7   12.5    21.2

Table 4.2: Comparison between stub and transit changes.


4.2.2 Evaluating Theoretical Models

A number of theoretical models have been proposed for network topology evolution.

They generally model the decision process of where to add/remove new links and

nodes into/from the topology at each time step. A common evaluation method of these

models is to let the network grow to a certain size by simulating the link/node addi-

tion and removal processes, then compare graph properties of the resulting network

topology with that of an observed topology of a similar size. This approach to evalua-

tion, however, is limited in its effectiveness because different evolution processes can

generate topologies that share certain graph properties. A much better approach is to

compare the decisions of the theoretical models directly with the observed link/node

births and deaths, which is made possible once the liveness problem is solved. In this

section, we demonstrate the value of this new approach by applying it to evaluate two

theoretical models.

Barabasi et al. [23] proposed an evolution model (the BA model) that explains

the emergence of the power-law distribution of node degrees in complex networks.

One of the key elements in this model is the preferential attachment or rich-get-richer

paradigm. The basic idea is that new nodes tend to connect to existing nodes that

already have high degrees. More precisely, according to the BA model, a new node

attaches to an existing node i at time t with probability pi(t) = di(t) / ∑_j dj(t),

where di(t) is the degree of node i at time t, and the summation is over all the existing

nodes in the topology. Thus, the probability that a new node attaches to a node with

degree d at time t is

P(d, t) = N(d, t) · d / ∑_j dj(t)

where N(d, t) is the number of existing nodes with degree d at time t. Fig. 4.20

shows the distribution of P (d, t) on a particular day from BGP data, and the vertical

line at dmed = 49 represents the median of the distribution. Since the distribution of


node degrees is heavy-tailed, i.e. , there exists a small number of nodes with high

degrees and a large number of nodes with small degrees, on average a new node has

a higher chance to attach to a small degree node than attaching to a high degree node.

To evaluate the BA model, we extract the birth and death events of links and nodes

from BGP data using 90% confidence margins provided by our model. In other words,

we record a topological change only when we are at least 90% sure that it is a real

topological change. Appearances and disappearances near the beginning and the end

of the study period are discarded to eliminate the effect of revelation process. One

main problem in evaluating a probabilistic theoretical model is that, on the one hand,

we need a large number of node births to make a meaningful sample set; on the other

hand, the degree distribution changes over time as nodes join the topology, therefore

each sample of node degrees is only good for a specific instance in time. To overcome

this problem, we use the following evaluation scheme inspired by the Monte Carlo

method: (1) Compute the distribution P (d, t) for day t and its median dmed(t). (2) For

each node born on day t + 1, check if the node it connects to has a degree higher or

lower than dmed(t); if it is higher, increment counter h, otherwise increment counter l.

(3) Repeat for every day. By the law of large numbers, if the node-join process follows

the BA model, the ratio h/(h+ l) should converge to 50%.
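The three-step scheme above can be sketched as follows. The function attachment_median implements step (1) by accumulating the probability mass of P(d, t) over sorted degrees; tally implements steps (2)-(3). The degree sequences are toy inputs, not BGP data.

```python
# Sketch of the Monte-Carlo-style evaluation scheme (steps 1-3 above).
from collections import Counter

def attachment_median(degrees):
    """Median degree of the attachment distribution P(d, t) = N(d,t)*d / sum_j d_j."""
    total = sum(degrees)
    mass = 0.0
    for d, n in sorted(Counter(degrees).items()):
        mass += n * d / total
        if mass >= 0.5:
            return d

def tally(days):
    """days: list of (degree_sequence_on_day_t, degree_of_node_attached_to)."""
    h = l = 0
    for degrees, target_degree in days:
        dmed = attachment_median(degrees)
        if target_degree > dmed:
            h += 1
        elif target_degree < dmed:
            l += 1
    return h, l

# Toy input: a skewed degree sequence where the degree-10 node holds 10/19 of
# the attachment mass, so the median of P(d, t) is 10; then two "births".
days = [([1] * 9 + [10], 10), ([1] * 9 + [10], 1)]
print(attachment_median([1] * 9 + [10]), tally(days))
```

Run over every day of real birth events, the ratio h/(h + l) is what is compared against the model's predicted 50%.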

Fig. 4.21 plots the value of h/(h + l) over time and shows that it actually converges to 58% in the long run, not the 50% predicted by the BA model. This indicates that, during the evolution of

Internet AS topology, high degree nodes attracted more new nodes than what the BA

model describes. This result is also consistent with the conclusion in [32], which used

an earlier data set and different evaluation techniques.

Bu et al. [24] proposed the Generalized Linear Preference (GLP) model, in which

the probability that a new node attaches to a node with degree d is given by:

P(d, t) = N(d, t) · (d − β) / ∑_j (dj(t) − β)


[Figure 4.23: Link net growth. Net growth of links (stub vs. transit) vs. number of days since January 1, 2004.]

[Figure 4.24: Net growth of node wirings. Net growth of wirings (stub vs. transit) vs. number of days since January 1, 2004.]

where β ≈ 0.8. Fig. 4.21 plots the result of applying our technique to the GLP model.

Note that Fig. 4.21 only checks for the median of the P (d, t) distribution, and it shows

that GLP model matches the median of the empirical data for node attachment. To

further evaluate the model, we need to compare other percentiles of the distribution as

well.

4.2.3 Characterizing Evolution Trends

We use the same data of link/node births and deaths to empirically characterize the

trends of topology evolution. Generally speaking, ASes can be classified into two

categories: stub and transit. A stub AS only appears as the last AS in an AS path,

while a transit AS will appear in the middle of some AS paths. A stub AS corresponds

to a customer network, which does not forward traffic between its neighbors. A transit

AS corresponds to a network service provider, which provides data delivery service

for other networks. In the context of the AS topology graph, we refer to them as stub

nodes and transit nodes, respectively. Links between transit nodes are called transit


[Figure 4.25: Frequency of link changes; CDF of time between connectivity adjustments (days) for p2p, c2p stub, and c2p transit links.]

links, and links between transit and stub nodes are called stub links8.
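The stub/transit distinction can be sketched directly from observed AS paths: any AS that ever appears in a middle position of a path is transit, and the rest are stubs. This simplified rule and the sample paths are assumptions of the sketch; in practice the first AS of a path is the monitor's own AS and needs separate handling.

```python
# Minimal sketch of stub/transit classification from AS paths: an AS seen in
# the middle of any path forwards traffic and is transit; all other observed
# ASes are treated as stubs. Sample paths are invented.
def classify(as_paths):
    seen, transit = set(), set()
    for path in as_paths:
        seen.update(path)
        transit.update(path[1:-1])  # middle positions => transit
    return seen - transit, transit

paths = [["AS1", "AS2", "AS3"], ["AS4", "AS2", "AS5"]]
stubs, transit = classify(paths)
print(sorted(stubs), sorted(transit))
```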

Provider networks and customer networks are fundamentally different business en-

tities in the Internet, and our data shows that their growth trends in the Internet topol-

ogy are also very different. Table 4.2 shows the breakdown of birth, death, and net

growth rates for transit and stub nodes of the topology, using our model with 90%

confidence margin. It is clear that most node dynamics, including birth, death, and

net growth, happen mainly to the stub nodes. In particular, one may note that the stub

nodes’ net growth rate is 3.6 times of the transit nodes’. Fig. 4.22 shows the node net

growth on a daily basis, with the curves already compensated by the number of hidden

and out-of-sight nodes (which is minimal in the case of nodes). These results indi-

cate that the size of the Internet, as measured by the number of AS nodes, is rapidly

increasing mainly due to new customers joining the Internet.

Table 4.2 also shows the breakdown of link growth rates, and Fig. 4.23 plots the

daily link net growth, with the curves already adjusted to compensate the impact of the

8There also exist links between stub nodes; usually these links are not observable in BGP data as they are not announced to other ASes.


hidden and out-of-sight links. Although the transit nodes make up only about 28% of the total nodes and their percentage is decreasing, their link birth rate is 29% higher than

that of stub links, and the death rate 74% higher. Note that the link birth and death

counts lump together all the link changes that can be either due to node birth/death

or due to link adjustment between existing nodes. To quantify the latter, we define

wiring as the addition of a new link between two existing nodes, and unwiring as the

removal of a link between two live nodes (i.e., the two incident nodes are still alive in

the topology after the link is removed). Stub wirings and unwirings reflect customers’

actions of changing providers, while transit wirings and unwirings reflect providers’

actions of adjusting their inter-ISP connectivity.
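The wiring/unwiring distinction above can be made concrete with a short sketch. This is an illustrative Python fragment, not part of the measurement pipeline; the event tuples and liveness sets are hypothetical inputs standing in for the daily topology snapshots.

```python
def classify_link_events(events, alive_before, alive_after):
    """Split link additions/removals into (un)wirings vs. node-driven changes.

    A "wiring" is a new link between two nodes that already existed, and an
    "unwiring" is a removed link whose endpoints both remain alive afterwards.
    Hypothetical event format: (kind, u, v) with kind in {"add", "del"}.
    """
    wirings, unwirings = [], []
    for kind, u, v in events:
        if kind == "add" and u in alive_before and v in alive_before:
            wirings.append((u, v))      # link added between existing nodes
        elif kind == "del" and u in alive_after and v in alive_after:
            unwirings.append((u, v))    # both incident nodes still alive
    return wirings, unwirings
```

A link added to a newborn node, or removed together with a dying node, is counted as node birth/death rather than as a wiring change.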

Fig. 4.24 shows the net growth of wirings between July 24, 2004 and May 25, 2006,

a period that falls within our 90% confidence margins. Here we consider only the nodes

that are present throughout this measurement period and find 10349 such stub nodes

and 6319 transit nodes. We then count the wiring and unwiring events among these

nodes. Fig. 4.24 shows that provider networks get more densely connected over time,

perhaps as a result of serving ever increasing customer demands9. Keeping in mind

that the number of transit nodes is much lower than the number of stub nodes, and that

a transit wiring event means adding a link between two transit nodes only, we see a

high net growth rate in transit links, despite the high death rate shown in Table 4.2.

This confirms a general observation that over recent years, provider networks have

been actively adjusting their inter-connectivity.

What types of inter-ISP connectivity are providers actively adjusting? Based

on the type of inter-AS business relationships, links are generally classified into three

classes: customer-to-provider (c2p), peer-to-peer (p2p), and sibling-to-sibling (e.g.,

ASes that belong to the same company). Since the sibling relationship is relatively

9Other relevant factors, such as increases in link capacity and the number of BGP sessions between neighboring ASes, are not observable in BGP data.

rare, here we focus on c2p and p2p links only. In a c2p relationship, the customer (or

lower-tier ISP) pays its provider (or upper-tier ISP) to gain global reachability.

In a p2p relationship, data traffic is exchanged free of charge between the two peers,

but only traffic originated from a peer AS (or its customers) is allowed on this link.

We applied the PTE algorithm [39] to infer AS link relationships, and the algorithm

was able to infer the relationships for 75% of all the links involved in wirings and

unwirings. We classify wiring and unwiring events according to their link types. For

each type, we calculate the time interval between two consecutive events and plot the

distribution of the intervals in Fig. 4.25. We can see that, among c2p links, stub links

are more stable compared to c2p transit links, and that the p2p links have much shorter

intervals between their connectivity adjustments than all c2p links. According to [88],

a common ISP operational practice is to set up a p2p link and re-evaluate it periodically

(e.g., once every few months). Based on whether the p2p link helps reduce the overall

cost, it may be either kept or terminated.
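The interval statistics behind Fig. 4.25 can be reproduced from a per-link-class event log. A minimal sketch, assuming each link class is given simply as a list of event days:

```python
def inter_event_intervals(event_days):
    """Days between consecutive connectivity adjustments on one link class."""
    days = sorted(event_days)
    return [b - a for a, b in zip(days, days[1:])]

def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs,
    i.e. the kind of distribution plotted in Fig. 4.25."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

For example, events on days 0, 10, 40, and 100 yield intervals of 10, 30, and 60 days; the CDF then gives the fraction of adjustments that occurred within each interval length.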

4.3 Discussion

Our model captures the main characteristics of observed AS topology changes by three

dynamic processes: birth, death, and revelation. This model can help us obtain key

parameters of different dynamic processes and separate real topology changes from

transient routing changes with a given confidence. At the same time, we must also

understand the limitations of this model.

The model is mainly descriptive and not derived from first principles. It matches

the data well and is useful in studying topology dynamics. However, it is possible to

have other models that also fit the data well, e.g., node birth may be modeled by an

exponential function with a small exponent [43]. Our model does not provide an

explanation for why birth and death rates are constant, or why the revelation probabilities are

[Figure 4.26: Number of collected links in DIMES. Cumulative number of links vs. number of days since Sept 23rd 2004.]

[Figure 4.27: Diurnal pattern of new link appearances. Fraction of link births vs. hour of day (UTC).]

uniform. Answering these questions is likely to require looking deep into economic,

technological, and operational factors behind the Internet evolution, and our model can

serve as an important input to such a study.

We can also articulate the reasons why the model works well. One sound reason

relates to the model’s macroscopic granularity and the large scale of the Internet.

Individual factors influencing the AS topology evolution are probably not constant or

uniformly distributed. However, given the large scale of the Internet, and the large

number of (perhaps independent) factors in action, fluctuations caused by individual

factors may even out when we measure macroscopic properties using data aggregated

from many different views.

Had we restricted our study to a small geographic area, or to a subgraph of the

Internet (e.g., an academic network), we might have obtained very different results.

As an example, compare the curves of (1) Figure 4.4, which shows the growth of all

observable links, (2) Figure 4.12, which shows the growth of the incident links of

a Tier-1 network, and (3) Figure 4.29, which shows the growth of incident links of

Abilene (a US research network). Even though the first two cases apparently can be

[Figure 4.28: Weekly pattern of new link appearances. Fraction of link births vs. day of week (0 = Thursday).]

[Figure 4.29: Link growth for Abilene (AS11537). Cumulative number of links vs. number of days since Jan 1st 2007.]

modeled as a constant rate process, the scale of Abilene in terms of number of links

does not seem large enough to reveal the characteristic of linear growth. The model

also may not apply to very different time scales, e.g., looking at link/node appearances

on an hourly basis (or over decades) might show a different pattern. This is evident when

looking at the diurnal and weekly patterns of link appearances in Figures 4.27 and

4.28 respectively. From Figure 4.27, we observe that if we break the day into hourly units,

there is actually a diurnal cycle, where the spike of new link appearances corresponds

roughly to night time in the US. We believe this is because new AS links are put

into service during night time when traffic levels are lower and service disruptions are

minimized. Figure 4.28 shows the weekly pattern of new link appearances. We note

that the number of links that appear during weekdays is roughly the same, but during

the weekends (Saturday and Sunday), the number of new links is much smaller. We

think this is because router reconfigurations are usually done on weekdays, since most

of the staff involved in establishing peerings do not work on weekends. Even though

these weekly patterns affect the per day constant rate of our model, when studying the

topology in the scale of years, these periodic patterns balance each other, and in the


long run, the appearance of new links can be well approximated by a constant rate

process. Furthermore, we developed the model based on data from the three most recent years,

and we are yet to see whether the model will hold in the future.
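The diurnal and weekly profiles above amount to binning link-birth timestamps by hour of day and day of week. A minimal Python sketch (the UNIX-timestamp input format is an assumption; note that Python numbers weekdays from Monday = 0, whereas Figure 4.28 uses 0 = Thursday):

```python
from collections import Counter
from datetime import datetime, timezone

def birth_profiles(timestamps):
    """Fraction of link births per UTC hour-of-day and per day-of-week,
    as in Figures 4.27 and 4.28. Timestamps are UNIX seconds."""
    hours, days = Counter(), Counter()
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        hours[t.hour] += 1
        days[t.weekday()] += 1      # 0 = Monday in Python, unlike the figure
    n = len(timestamps)
    return ({h: c / n for h, c in hours.items()},
            {d: c / n for d, c in days.items()})
```

Aggregated over years, these periodic fluctuations average out, which is why a per-day constant rate remains a good approximation at the time scale of our model.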

Our model also makes some simplified assumptions. For instance, in reality dif-

ferent links may have different revelation probabilities (λ and µ), depending on their

connectivity and routing policies. According to our previous findings in [70], peer links

should be immediately revealed in routes due to their heavy usage, whereas customer-

provider links may take some time to appear in BGP routes (up to hundreds of days).


CHAPTER 5

Quantifying the Topology (in)Completeness

In this chapter we address the topology incompleteness problem. Based on the use of

ground truth information from some case studies, we are able to extract insights that

allow us to characterize the type and bound the number of invisible AS adjacencies.

5.1 Data Sets

We use the following data sources to infer the AS-level connectivity and the ground

truth of individual ASes.

BGP data: The public view (PV) of the AS-level connectivity is derived from

all public BGP data at our disposal. These data include BGP forwarding tables and

updates from ∼700 routers in ∼400 ASes provided by Routeviews, RIPE-RIS, Abi-

lene [19], and the China Education and Research Network [3], BGP routing tables

extracted from ∼80 route servers, and “show ip bgp sum” outputs from ∼150 looking

glasses located worldwide. In addition, we use “show ip bgp” outputs from Abilene

and Geant [5] to infer their ground truth. Note that we currently do not use AS topo-

logical data derived from traceroute measurements due to issues in converting router

paths to AS paths, as extensively reported in previous work [28, 64, 44, 72]. For re-

sults reported in Section 5.4, we use Routeviews and RIPE-RIS data collected over a

7-month period from 2007-06-01 to 2007-12-31. Due to the overlap in covered ASes

between Routeviews and RIPE-RIS and the fact that some ASes have multiple mon-


itors, the set of monitors with full routing tables covers only 126 ASes. All Tier-1

ASes are included in this set except AS209 (Qwest), but fortunately one of AS209’s

customer ASes hosts a monitor.

IXP data: There are a number of websites, including Packet Clearing House (PCH) [8],

Peeringdb [9], and Euro-IX [4], that maintain a list of IXPs worldwide together with a

list of ISP participants in some IXPs. The list of IXP facilities is believed to be close

to complete [10], but the list of ISP participants at the different IXPs is likely

incomplete or outdated, since it is populated by the ISPs on a voluntary basis. However,

most IXPs publish the subnet prefixes they use in their layer-2 clouds, and the best

current practice [6] recommends that each IXP participant keeps reverse DNS entries

for their assigned IP addresses inside the IXP subnet. Based on the above information,

we adopted the method used in [96] to infer IXP participants. The basic idea is to do

reverse DNS lookups on the IXP subnet IP addresses, and then infer the participating

ISPs from the returned DNS names. From the aforementioned three data sources, we

were able to derive a total of 6,084 unique presences corresponding to 2,786 ASes in

204 IXPs worldwide. Table 5.1 shows the breakdown of the observed presences per

data source. Note that a presence means that there exists an AS-IXP pair. For example,

if two ASes peer at two IXPs, it will be counted as two presences. Although we do

not expect our list to be complete, we noticed that the total number of presences we

obtained is very close to the sum of the number of participants in each IXP disclosed

on the PCH website.
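The name-parsing step of this inference can be sketched as follows. The PTR lookups themselves are omitted here, and the `as<number>` naming pattern is only a hypothetical example; real IXPs use a variety of reverse-DNS naming schemes, which is why the method in [96] needs per-IXP heuristics.

```python
import re

# Hypothetical pattern: many (not all) participants embed their AS number
# in the reverse-DNS name of their IXP-facing interface, e.g. "as3356.<ixp>".
AS_PATTERN = re.compile(r"(?:^|[.-])as(\d+)(?:[.-])", re.IGNORECASE)

def infer_participant(ptr_name):
    """Guess the participant AS number embedded in a reverse-DNS name
    returned for an address inside an IXP peering subnet; None if no match."""
    m = AS_PATTERN.search(ptr_name)
    return int(m.group(1)) if m else None
```

Names that do not match any known pattern are left unresolved rather than guessed, which keeps the inferred participant list conservative.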

IRR data: The Internet Routing Registry (IRR) [7] is a database to register inter-

AS connectivity and routing polices. Since registration with IRR is done by ISP op-

erators on a voluntary basis, the data is known to be incomplete and many records are

outdated. We filtered IRR records by ignoring all entries that had a “Last Modified”

date that was more than one year old.


Presences (AS-IXP pairs)            Peeringdb   Euro-IX     PCH

Listed on source website                2,203     2,478      575

Inferred from reverse DNS               2,878         -    3,613

Unique within the source                4,092     2,478    3,870

Total unique across all sources                  6,084

Table 5.1: IXP membership data, July 2007.

Proprietary Router Configurations and Syslogs: This is a major source for de-

riving the ground truth for our Tier-1 and Tier-2 ISPs, where the latter is a transit

provider and a direct customer of the former. The data include historical configuration

files of more than one thousand routers in these two networks, historical syslog files

from all routers in the Tier-1 network, and “show ip bgp sum” outputs from all routers

in the Tier-2 network. We also have access to iBGP feeds from several routers in these

two networks.

Other Proprietary Data: To obtain the ground truth for other types of networks,

we had conversations with the operators of a small number of content providers. Since

large content providers are unwilling to disclose their connectivity information in gen-

eral, in this thesis we present a fictitious content provider whose numbers of AS neigh-

bors, peer links, and IXP presences are consistent with the data we collected privately.

We also obtained the ground truth of the AS-level connectivity for four stub networks

from their operators.

5.2 Establishing the Ground Truth

We describe here the method we use to obtain the ground truth of AS level connectivity

of the Tier-1 network; we use a similar process for the other networks. To obtain the


Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

4.68.1.166 4 3356 387968 6706 1652742 0 0 4d15h 231606

64.71.255.61 4 812 600036 6706 1652742 0 0 4d15h 230964

64.125.0.137 4 6461 0 0 0 0 0 never Idle

65.106.7.139 4 2828 466128 6706 1652742 0 0 4d15h 232036

Figure 5.1: Output of “show ip bgp summary” command.

[Figure 5.2: Configuring remote BGP peerings. R0 and R2 are physically directly connected, while R1 and R3 are not.]

AS-level connectivity ground truth, we need to know at each instant in time the BGP

sessions that are in the established state for all the BGP routers in the network. A

straightforward way to do this is to launch the command “show ip bgp summary” in

all the routers simultaneously. Figure 5.1 shows an example output produced by this

command. The state of each BGP session can be inferred by looking at the column

“State/PfxRcd”: when this column shows a numeric value, it refers to the number of

prefixes received from the neighbor router, and it is implied that the BGP session is in

established state. In this example, all connections are in the established state except

for the session with neighbor 64.125.0.137, which is in the idle state.
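This inference rule is easy to mechanize. A minimal parser sketch for output shaped like Figure 5.1 (the 10-column layout is an assumption about this particular IOS output format):

```python
def established_sessions(summary_text):
    """Return {neighbor_ip: remote_as} for sessions in the established state,
    inferred from a numeric State/PfxRcd column (as in Figure 5.1)."""
    sessions = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Data rows have 10 columns and a dotted-quad neighbor address.
        if len(fields) == 10 and fields[0].count(".") == 3:
            if fields[-1].isdigit():    # numeric => prefixes received => established
                sessions[fields[0]] = int(fields[2])
    return sessions
```

Sessions whose last column is a state name such as “Idle” or “Active” are excluded, matching the rule described in the text.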

Due to the large size of the Tier-1 network under study, it is infeasible to run the


“show ip bgp sum” command over all the routers of the network and over a long study

period. It is also impossible to obtain any historic “show ip bgp sum” data for the

past. Therefore, we resort to an alternative way to infer the connectivity ground truth

- analyzing routers’ configuration files. These files are a valuable

source of information about AS-level connectivity. Before setting up a BGP session

with a remote AS, each router needs to have a minimum configuration state. As an

example, in Figure 5.2, for router R0 in AS10 to open a BGP session with R2 in AS20,

it needs to have a “neighbor 129.213.1.2 remote-as 20” entry in its configuration file,

as well as IP connectivity between R0 and R2 through a configured route to reach

R2. Similarly, R2 needs to have a configured route to reach R0. The IP connectivity

between the two routers of a BGP session can be established in one of the following

two ways:

• Single-hop: two routers are physically connected directly, as the case of R0

and R2 in Figure 5.2. More specifically R0 can (1) define a subnet for the lo-

cal interface at R0 that includes the remote address 129.213.1.2 of R2, e.g. “

ip address 129.213.1.1 255.255.255.252” (where 255.255.255.252 is the subnet

mask) or (2) set a static route in R0 to the remote address 129.213.1.2 of R2,

e.g. “ip route 129.213.1.0 255.255.255.252 Serial4/1/1/24:0” (in this case Se-

rial4/1/1/24:0 refers to the name of the local interface at R0).

• Multi-hop: two routers (such as R1 and R3 in Figure 5.2) are not directly con-

nected, but connected via other routers. To configure such a multi-hop BGP

session, R1 configures e.g. “neighbor 175.220.1.2 ebgp-multihop 3” (here 3

refers to the number of IP hops between R1 and R3); R1 reaches R3 by doing

longest prefix matching of 175.220.1.2 in its routing table.
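Extracting candidate neighbor ASes from such configurations reduces to scanning for `neighbor ... remote-as` statements. A rough sketch (IOS-style syntax assumed; it deliberately ignores the IP-connectivity checks described above):

```python
import re

# Matches IOS-style lines such as "neighbor 129.213.1.2 remote-as 20".
NEIGHBOR_RE = re.compile(r"^\s*neighbor\s+(\S+)\s+remote-as\s+(\d+)", re.M)

def neighbor_ases(config_text):
    """Set of remote AS numbers with at least one 'neighbor ... remote-as'
    entry in a configuration file (the minimal BGP session state)."""
    return {int(asn) for _ip, asn in NEIGHBOR_RE.findall(config_text)}
```

As the text notes, such an entry alone does not prove the session is active; the IP-connectivity configuration and the syslog filtering described next are still required.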

Ideally, we would like to verify the existence of a BGP session by checking the

configuration files on both sides of a session. Unfortunately it is impossible to get


the router configurations of the neighbor ASes. We thus limit ourselves to check only

the configuration files of routers belonging to the Tier-1 network. We noticed that a

number of entries in the router configuration files did not satisfy the minimal BGP

configuration described above, probably because the sessions were already inactive,

and these sessions should be discarded. After searching systematically through the

historic archive of router configuration files, we ended up with a list of neighbor ASes

that have at least one valid BGP configuration. The “router configs” curve in Figure

5.3 shows the number of neighbor ASes in this list over time1.

However, even after this filtering, we still noticed a considerable number of neigh-

bor ASes that appeared to be “correctly configured”, but did not have any established

BGP session. This could be due to routers on the other side of the sessions not being

configured correctly. Given that we do not have the configuration files for those neigh-

bor routers, we utilize router syslog data to filter out the possible stale entries in the

Tier-1’s router configurations. Syslog records include information about BGP session

failures and recoveries, indicating at which time each session comes up or goes down.

More specifically, a BGP-5-ADJCHANGE syslog message has the following format:

“timestamp local-router BGP-5-ADJCHANGE: neighbor remote-ip-address Down”,

and it indicates the failure of the session between the local-router and the neighbor

router whose IP address is remote-ip-address. We use the following two simple rules

to further filter the previous list of neighbors:

1. If the last message of a session occurs at day t and the content was “session

down”, and there is no other message from the session in the period [t, t + 1

month], then we assume the session was removed at day t (i.e. we wait at least

one month before discarding the session).

2. If a session is seen in a router configuration at day t, but does not appear in

syslog for the period [t, t + 1 year], then we assume the session was removed at

day t (i.e., we wait at least one year before discarding the session).

1Note that the number is normalized for non-disclosure reasons.
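The two rules can be expressed as a small predicate over per-session records. A sketch under the stated thresholds (one month and one year; the input fields are hypothetical summaries of the syslog and config archives):

```python
from datetime import date, timedelta

def session_removed(last_msg_day, last_msg_was_down, config_day, today):
    """Apply the two staleness rules to a single BGP session.

    Rule 1: the last syslog message was 'session down' and nothing followed
    for a month. Rule 2: the session appears in the configs but produced no
    syslog message at all for a year."""
    if last_msg_day is not None:
        return last_msg_was_down and today > last_msg_day + timedelta(days=30)
    return today > config_day + timedelta(days=365)
```

An AS-level link is then discarded only when every one of its BGP sessions satisfies this predicate, as described below.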

Note that the above thresholds were empirically selected to minimize the number of

false positives and false negatives in the inferred ground truth. A smaller value would

increase the number of false negatives (i.e. sessions that are prematurely removed by

our scheme while still in the ground truth), whereas a higher value would increase the

false positives (i.e. sessions that are no longer in the ground truth, but have not been

removed yet by our scheme). We calibrated the thresholds using AS adjacencies that

were present in both the syslog messages and in the public view, e.g. we quantified the

false negatives by looking at adjacencies that we excluded using the syslog thresholds,

but were actually still visible in the public view. Even though these threshold values

worked well in this case, depending on the stability of links and routers’ configuration

state, other networks may require different values. Note also that these two rules are for

individual BGP sessions only. An AS-level link between the Tier-1 ISP and a neighbor

AS will be removed only when all of the sessions between them are removed by the

above two rules. The sessions between the Tier-1 ISP and its peers tend to be stable

with infrequent session failures [89], thus it is possible that a session never fails within

a year. But our second rule above is unlikely to remove the AS-level link between

the Tier-1 ISP and its peer because there are usually multiple BGP sessions between

them and the probability that none of the sessions have any failures for an entire year

is very small. Similarly, this argument is true for large customer networks which have

multiple BGP sessions with the Tier-1 ISP. On the other hand, small customers tend

to have a small number of sessions with the Tier-1 ISP (perhaps one or two), and the

sessions tend to be less stable thus have more failures and recoveries. Thus if the AS

link exists, the above two rules should not filter it out since some syslog session up

or down messages will be seen. For similar reasons, the results are not significantly

[Figure 5.3: Connectivity of the Tier-1 network (since 2004). Normalized number of links vs. number of days since Jan 1st 2007, for the curves: router configs, router configs + syslog, Public view (2004), single peer view (2004), and single customer view (2004).]

[Figure 5.4: Connectivity of the Tier-1 network (since 2007). Normalized number of links vs. number of days since Jan 1st 2007, for the curves: router configs + syslog, Public view (2007), single peer view (2007), and single customer view (2007).]

affected by the fact that some syslog messages might be lost in transmission due to the

unreliable transport protocol (UDP). Using the two simple rules above, we removed

a considerable number of entries from the config files, and obtained the curve “router

configs+syslog” in Figure 5.3; note that our measurement started on 2006-01-01, but

we used an initial 1-year window to apply the second syslog rule. In the next section

we compare in detail the inferred ground truth with the observable connectivity in the

public view for different networks, including the Tier-1.

5.3 Case studies

In this section we compare the ground truth of networks for which we have operational

data with the connectivity derived from the public view to find out what links are

missing from the latter and why they are missing.

[Figure 5.5: Capturing the connectivity of the Tier-1 network through table snapshots and updates. Normalized number of links vs. number of days since Jan 1st 2007, for the curves: router configs + syslog, Oregon RV RIB+updates, and Oregon RV RIB snapshot.]

[Figure 5.6: Tier-2 network connectivity. Normalized number of links vs. number of days since March 10th 2007, for the curves: router configs, Public view, single customer view, single provider view, show ip bgp sum, and Public view (show ip bgp sum).]

5.3.1 Tier-1 Network

Once we achieved a good approximation of the ground truth as described in the pre-

vious section, we compared it to the public view derived connectivity. For each day

t, we compared the list of ASes in the inferred ground truth Ttier1(t) obtained from

router configs+syslog, with the list of ASes seen in public view as connected to the

Tier-1 network up to day t. The “Public view (2004)” curve in Figure 5.3 is obtained

by accumulating public view BGP-derived connectivity since 2004. We first note that

all the Tier-1 ISP’s links to its peers and sibling ASes are captured by the public view.

In particular, we note that the public view captured all the peer-peer links of the Tier-1

ISP. The peer links of an AS are visible as long as a monitor resides in the AS itself,

or in any of the AS’s customers, or the customers’ customers. In fact the public view

captured all the peer-peer links for all tier-1 ASes, due to the small number of tier-1

networks and the fairly large set of monitors used by public view.

Comparing the “Public view (2004)” curve with the “router configs+syslog” curve

in Figure 5.3, we also note that there is an almost constant small gap, which is of the

[Figure 5.7: Capturing Tier-2 network connectivity through table snapshots and updates. Normalized number of links vs. number of days since Jan 1st 2007, for the curves: router configs, Oregon RV RIB+updates, and Oregon RV RIB snapshot; an annotation marks when the customer view became available.]

[Figure 5.8: Abilene connectivity. Number of links vs. number of days since Feb 22nd 2006, for the curves: show ip bgp sum (ipv4+ipv6), show ip bgp sum (ipv4 only), Abilene eBGP feed, and Public view.]

order of some tens of links (3% of the total links in “router configs+syslog”). We

manually investigated these links, and found that there are three main causes for why

they do not show up in the public view: (1) the links that connect to the Tier-1’s

customer ASes which only advertise prefixes longer than /24; these long prefixes are

then aggregated by the Tier-1 AS before announcing to other neighbors. This category

accounts for about half of the missing links; (2) there is one special purpose AS number

(owned by the Tier-1 ISP) which is only used by the Tier-1 ISP; (3) false positives,

i.e. ASes that were wrongly inferred as belonging to Ttier1(t), including stale entries,

as well as newly allocated ASes whose sessions were not up yet. The false positives

contribute to about half of the “missing links” (which should not be called “missing”).

Figure 5.4 shows similar curves using the same vertical scale as in Figure 5.3,

but this time the public view BGP data collection started at the beginning of 2007.

When comparing “Public view (2007)” and “router configs+syslog” we note the gap

is bigger, indicating that some entries in “router configs+syslog” did not show up in

public view after 2007, but they did show up before, which likely means they are stale


entries (false positives).

The “Single customer view” and “Single peer view” curves in both Figures 5.3 and

5.4 represent the Tier-1 connectivity as seen from a single router in a customer of the

Tier-1 ISP and a single router in a peer of the ISP, both from the public view. The sin-

gle peer view captures slightly less links than the single customer view, corresponding

to about ∼1.5% of the total number of links of the Tier-1 network. Further analysis

revealed that this small delta corresponds to the peer links of the Tier-1, which are

included in routes advertised to the customer but not advertised to the peer. This is

expected and consistent with the no-valley routing policy. We also note that the “Sin-

gle peer view” and “Single customer view” curves in Figure 5.4 show an exponential

increase in the first few days of the x-axis, which is caused by the revelation of hidden

links, as explained in Chapter 2. However, the nine months of the measurement should

be enough to reveal the majority of the hidden links [72]. In addition, note that in both

figures, the “Single customer view” curve is very close to the public view curve, which

means that the connectivity of the Tier-1 as seen by the customer is representative of

what is visible from the public view.

Figure 5.5 shows the difference between using routing table snapshots (RIB) ver-

sus using an initial RIB plus BGP updates from all the routers at Oregon RouteViews

(a subset of 46 routers of the entire public view). Note that on each day, the num-

ber of links in the curves “Oregon RV (RouteViews) RIB snapshot” and “Oregon RV

RIB+updates” represent the overlap with the set of links in the inferred ground truth

represented by the curve “router configs+syslog”, i.e. , those links not in “router con-

figs + syslog” are removed from the two “Oregon RV” curves. Even though both

curves start at the same point, after more than nine months of measurement, “Oregon

RV RIB+updates” reveals about 10% more links than those revealed by “Oregon RV

RIB snapshot”; these are the links that were revealed in BGP updates of alternative


routes encountered during path exploration as described in [71]. We also note that the

links in the difference between the two curves are all customer-provider links, and all the Tier-1

ISP’s links to the peers are captured by the “Oregon RV RIB snapshot”, because of the

large number of routes that go through these peer-peer links.

Summary:

• A single snapshot of the Oregon RV RIB can miss a noticeable percentage (e.g.,

10%) of the Tier-1’s AS-level links, all of them customer-provider links, when

compared to using RIBs plus updates accumulated in several months.

• The Tier-1 AS’s links are covered fairly completely by the public view over

time. All the peer-peer and sibling links are covered; the small percentage (e.g.,

1.5%) of links missing from public view are the links to customer ASes who

only announce prefixes longer than /24 and hence their routes are aggregated.

• The Tier-1 AS’s links are covered fairly completely by a single customer by us-

ing the historic BGP tables and updates, which can be considered representative

of the public view.

• The Tier-1 AS’s links are covered fairly completely by a single peer (when the

historic BGP table and updates are used), and the about 1.5% missing links are

all peer-peer links.

5.3.2 Tier-2 Network

The Tier-2 network we studied differs from the previous Tier-1 case in a few important

ways. First of all, not being a Tier-1 network, the Tier-2 has providers. Second, it

is considerably smaller in size as measured by the number of routers, however it has

considerably more peer links than the Tier-1 network. Third, while the Tier-1 network peers

exclusively through private peering, this Tier-2 network had close to 2/3 of its peers

through IXPs. We do an analysis similar to the Tier-1 case, except that we did not have

access to syslog data.

The “router configs” curve in Figure 5.6 shows the number of neighbor ASes ob-

tained from router configurations over time. Let us assume for now this is a good

approximation of the ground truth of the Tier-2 network connectivity. We include in

Figure 5.6 two single router view curves, one is obtained from a router in a customer of

the Tier-2 network, and the other is derived from a router in a provider of the Tier-2

network, both are in the public view. Note that this time we started the measurement

in March 2007 when the BGP data for the customer router became available in the

public view. This customer router became unavailable after August 13, 2007, hence

the single customer view curve is chopped off after that date. Figure 5.6 shows that the

provider view misses a significant number of links that are captured by the customer

view. This difference amounts to more than 12% of the Tier-2’s links captured by the

customer, which are all the peer links of the Tier-2 network. For comparison, we also

included the public view curve, starting at March 10th 2007. Note that the public view

captured a very small number of neighbors that are not in the customer view. We found

that most of the links in this small gap were revealed in the routes that were originated

by the Tier-2’s customers and had several levels of AS prepending. The customer we

used for the customer view curve did not pick these routes because of the path inflation

due to the AS prepending, however following the prefer-customer policy, routers in the

Tier-2 network picked these prepended routes, and one of these routers is in the public

view data set.

From Figure 5.6 we also note that the connectivity captured by the public view is

∼85% of that inferred from router configs, which could be due to incorrect or stale

entries in the router configuration files. To verify whether this is the case, we launched


a “show ip bgp summary” command on all the routers of the network on 2007-09-03,

and we take into account only those BGP sessions that were in the established state.

The number of neighbors with at least one such session is shown in Figure 5.6 by the

“show ip bgp sum” point, which has only 80% of the connectivity inferred from the

router configurations. This means that about 20% of the connectivity extracted from

router configs were false positives. On the other hand, we observe that by accumulating

BGP updates over time, we also increase the number of false positives, i.e. adjacencies

that were active in the past and became inactive. By comparing the curves “Public

View” and “Public view (show ip bgp sum)”, we note that about 1 − 0.75/0.85 ≈ 0.12 (or

12%) of the links accumulated in public view over the 6-month period correspond to

false positives. There are however ways to filter these false positives: (1) by removing

the short-lived links, since most likely they correspond to misconfigurations, or (2) by

timing out links after a certain period of time. The point “Public view (show ip bgp

sum)” in the figure represents the intersection between the set of neighbors extracted

from “show ip bgp sum” and the set of neighbors seen so far in the public view. Note

that the public view missed ∼7% of the links given by “show ip bgp sum”, which amounts

to a few tens of links. One of these links was the RouteViews passive monitoring

feed, some others were internal AS numbers, and the remaining ones were to ASes announcing routes longer than /24 (that were aggregated). Note also that the fairly

complete coverage of the Tier-2 network’s connectivity is due to the existence of a

monitor residing in a customer of the Tier-2. As we explained in the Tier-1’s study,

the public view can capture all the links, including all peer links of an AS, if a monitor

resides in either the AS itself, or in the AS’s customer or customer’s customers.
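The two filtering heuristics described above (dropping short-lived links and timing out stale ones) can be sketched as a small routine. This is an illustrative sketch, not the code used in the study; the data layout and the 90-day timeout are our assumptions, while the 2-day lifetime threshold echoes the filtering used later in the relationship-inference experiment.

```python
from datetime import datetime, timedelta

def filter_links(link_events, window_end, min_lifetime=timedelta(days=2),
                 timeout=timedelta(days=90)):
    """Keep a link only if it lived longer than min_lifetime (heuristic 1:
    short-lived links are likely misconfigurations) and was last seen
    within `timeout` of the end of the window (heuristic 2: time out
    links that became inactive).

    link_events: {link: (first_seen, last_seen)} with datetime values.
    """
    kept = {}
    for link, (first_seen, last_seen) in link_events.items():
        if last_seen - first_seen < min_lifetime:
            continue  # short-lived: likely a misconfiguration
        if window_end - last_seen > timeout:
            continue  # stale: active in the past, inactive now
        kept[link] = (first_seen, last_seen)
    return kept
```

Both thresholds trade false positives against false negatives: tightening the timeout removes stale adjacencies but risks discarding hidden links that simply took long to reappear.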

Figure 5.7 shows the difference between using a single RIB snapshot versus initial

RIB+updates from RouteViews Oregon collector, using the same vertical scale as in

Figure 5.6. In this case, using updates reveals ∼12% more links than those revealed

by router RIB snapshots in the long run. Note that there is a lack of configuration files at the beginning of 2007, hence the missing initial part of the “router configs” curve. The

jump in the figure is due to the addition of the monitor in the Tier-2 customer AS,

which revealed the peer links of the Tier-2 network.

Summary:

• A single snapshot of the Oregon RV RIB can miss a noticeable percentage (e.g.,

12%) of the Tier-2’s AS-level links, all of them customer-provider links, when

compared to using RIBs+updates accumulated over several months.

• The Tier-2 AS’s links are covered fairly completely by a single customer over

time (RIBs+updates), which can be considered representative of the entire public view.

• A single provider view can miss a noticeable percentage (e.g., 12%) of the Tier-

2’s links, and all the missing links are peer-peer links.

• A Tier-2 AS’s links are covered fairly completely by the public view over time

if there is a monitor in it, or its customer or its customer’s customers, in which

case all the peer-peer links are revealed. The small percentage (e.g., 7%) of

links missing from the public view are those connecting to customers who only

announce prefixes longer than /24 or those ASes dedicated for internal use.

5.3.3 Abilene and Geant

Abilene: Abilene (AS11537) is the network interconnecting universities and research

institutions in the US. The Abilene Observatory [19] keeps archives of the output of

“show ip bgp summary” for all the routers in the network. Using this data set, we built

a list of Abilene AS neighbors over time, which is shown in the “show ip bgp sum,

ipv4+ipv6” curve in Figure 5.8. Even though Abilene does not provide commercial


transit, it enables special arrangements where its customers may inject their prefixes to

commercial providers through Abilene, and receive routes from commercial providers

through Abilene. The academic-to-commercial service is called Commercial Peer-

ing Service (or CPS) versus the default academic-to-academic Research & Education

(R&E) service. These two services are implemented by two different VPNs over the

Abilene backbone. BGP sessions for both VPNs are included in the output of “show ip

bgp summary”. We compare Abilene connectivity ground truth with that derived from

a single router eBGP feed (residing in Abilene) containing only the R&E sessions. In

addition, we do a similar comparison with our public view, which should contain both

CPS and R&E sessions (public view contains eBGP+iBGP Abilene feeds, as well as

BGP data from commercial providers of Abilene). However, since there are a consid-

erable number of neighbors in Abilene that are using IPv6 only, and since the BGP

data in our data set are mostly IPv4-only, we decided to place the IPv4-only neighbors

in a separate set. The curve “show ip bgp sum, ipv4 only” in Figure 5.8 shows only

the AS neighbors that have at least one IPv4 session connected to Abilene [2]. Contrary

to the “show ip bgp sum, ipv4+ipv6” curve which includes all sessions, the IPv4-only

curve shows a decreasing trend. We believe this is because some of the IPv4 neighbors

have been migrating to IPv6 over time. When comparing the “show ip bgp sum, ipv4

only” curve with the one derived from the eBGP feed, we find there is a constant gap

of about 10 neighbors. A closer look into these cases revealed that these AS num-

bers belonged to commercial ASes with sessions associated with the CPS service. The

small gap between the public view and the IPv4-only curve corresponds to the passive

monitoring session with RouteViews (AS6447).

Geant: Geant (AS20965) is a European research network connecting 26 R&E networks representing 30 countries across Europe. In contrast to Abilene where the focus

[2] Note that there was a period of time between days 350 and 475 for which there was no “show ip bgp sum” data from Abilene.


is on establishing academic-to-academic connectivity, Geant enables its members to

connect to the commercial Internet using its backbone. We inferred Geant connectiv-

ity ground truth by running the command “show ip bgp sum” in all its routers through

its looking glass site [5]. We found a total of 50 AS neighbors with at least one ses-

sion in the established state. By comparing Geant ground truth with the connectivity

revealed in public view, we found a match on all neighbor ASes except two. One of

the exceptions was a neighbor which was running only IPv6 multicast sessions, and

therefore hidden from public view which consists mostly of IPv4-only feeds. The other

exception seems to be due to a passive monitoring session to a remote site, which explains

why its AS number was missing from BGP feeds.

Summary: In Abilene and Geant, the public view matches the connectivity ground

truth (no invisible or hidden links), capturing all the customer-provider and peer links.

Abilene represents a special case, where depending on the viewpoint there can be in-

visible links. For instance, some Abilene connectivity may be invisible to its customers

due to the academic-to-commercial special arrangements.

5.3.4 Content provider

Content networks are fundamentally different from transit providers such as the Tier-1

and Tier-2 cases we studied earlier. Content networks are edge ASes and do not transit

traffic between networks; thus they have only peers and providers. They generally try

to reduce the amount of (more expensive) traffic sent to providers by directly peering

with as many other networks as possible; direct peerings can also help improve per-

formance. Consequently, content networks in general have a heavy presence at IXPs,

where they can peer with multiple different networks. While two transit providers

usually peer at every location where they have a common presence in order to disperse

traffic to closer exit-points, peering of content networks is more “data-driven” (versus


Figure 5.9: Projection of the number of peer ASes of a representative content provider. [Plot: number of links vs. connection probability per IXP (q); curves: IXP-based projection, Public view, Public view + IRR.]

“route-driven”), and may happen in only a fraction of the IXPs where two networks

have common locations. Based on this last observation, we estimate the connectivity

of a representative content provider C, and compare it to the connectivity observed

from the public view. We assume that in each IXP where C has presence, it connects

to a fixed fraction q of the networks that are also present at that IXP, i.e., if C has n

common locations with another network X , the chances that C and X are connected

in at least one IXP are given by 1 − (1 − q)^n. More generally, the expected number of peer ASes of C, P_C, is given by P_C = Σ_i (1 − (1 − q)^(n_i)), where i is summed over all the networks that have at least one common presence with C, and n_i is the number

of IXPs where both C and i have presence. In our data set, C has presence in 30 IXPs

worldwide, which is very close to the number that was disclosed to us by the operators

of C. Furthermore, we know that the number of providers of C is negligible com-

pared to the number of its peers, and that more than 95% of its peerings are at IXPs.

Therefore it is reasonable to represent the AS-level connectivity of C by its peerings

at IXPs.
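The projection P_C = Σ_i (1 − (1 − q)^(n_i)) is straightforward to compute; the following is a minimal sketch, with function and variable names of our own choosing for illustration.

```python
def expected_peers(colocation_counts, q):
    """Expected number of peer ASes of a content provider C.

    colocation_counts: one entry n_i per network i that shares at least
    one IXP with C, giving the number of IXPs where both are present.
    q: probability that C peers with a co-located network at any one IXP.
    Each term 1 - (1 - q)**n is the chance the pair peers at >= 1 IXP.
    """
    return sum(1 - (1 - q) ** n for n in colocation_counts)
```

For instance, with q = 1 every co-located network becomes a peer, so the result equals the number of networks sharing at least one IXP with C.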

Figure 5.9 shows the projection of the number of neighbor ASes of C as a function


of the connection probability q at each IXP. For comparison purposes, we also include

the number of neighbor ASes of C as inferred from the public view over a window of

6 months. From discussions with C’s operators, we know that at each IXP, C peers

with about 80-95% of the participants at the IXP (parameter q), and that the total

number of BGP sessions of C is more than 2,000, even though we do not know the

total number of unique peer ASes [3]. In view of these numbers, the projection in Figure

5.9 seems reasonable, even after taking into account that our IXP membership data

is incomplete. The most striking observation is the amount of connectivity missed

from the public view, which is on the order of thousands of links and represents about

90% of C’s connectivity. This result is not entirely surprising, however, because based

on no-valley policy, the content provider C does not announce its peer-peer links to

anyone, and a peer-peer link is visible only if the public view has a monitor in C, or

in the peer or a customer of the peer. Yet the number of available monitors is much

smaller than the projected total number of C’s peers. We believe this result holds true

for other large content providers, search engines, and content distribution networks.

Trying to close the gap between reality and the public view, we looked for addi-

tional connectivity in the IRR, as described in Section 5.1. We discovered 62 additional

neighbor ASes for C that were not present in the initial set of 155 ASes seen in the

public view. Even though this addition increased the number of covered neighbor ASes

of C to 217, it is still only about 15% of the AS-level connectivity of C.

Summary: The public view misses about 90% of C’s connectivity, and we believe

all of them are invisible peer-peer links, and most of them are likely at IXPs. Using

IRR information reduces the missing connectivity slightly, to 85%. The public BGP

view’s inability to catch these peer-peer links is due to the no-valley policy and the

absence of monitors in the peers or their customers of the content network.

[3] The number of unique neighbor ASes is less than the total number of BGP sessions, as there exist multiple BGP sessions with the same neighbor AS.


5.3.5 Simple stubs

Stub networks are those ASes that do not have customers (or have a very small number of customers) [4]. Stubs represent the vast majority of ASes, and they are typically sorted according to their business rationale into three classes: 1) content, 2) eyeball, and 3) simple.

Content networks have heavy outbound traffic, whereas eyeballs are heavy inbound

(e.g., cable/DSL providers). Simple stubs represent enterprise customers such as universities and small companies. We obtained the AS-level connectivity ground truth of

4 simple stubs by directly contacting their operators. Table 5.2 shows for each net-

work the number of neighbor ASes in the ground truth as reported by the operators, as

well as the number of neighbor ASes captured by the BGP-derived public view. Note

that for public view we use 6 month worth of BGP RIB and updates to accumulate

the topology to account for hidden links that take time to be revealed [72]. Network

D is the only case where there is a perfect match between ground truth and public

view. For network A, there are two neighbors included in public view that were dis-

connected during the 6-month window (false positives). For network B, the public

view was missing a neighbor due to a special agreement in which the routes learned

from the neighbor are not announced to B’s provider. Finally, for network C there was

an extra neighbor in public view that was never connected to C, but appeared in routes

during one day in the 6-month window. We believe this case originated either from a misconfiguration or from a malicious false-link attack.

Summary: The 6-month accumulated public view captured all the customer-provider

links of the stub networks studied. In total, the public view has one false negative (invisible link) and 3 false positives; the latter can be eliminated by shortening the observation window of the public view.

[4] The details about stub classification are described in Section 5.4.2.


Network | # of neighbor ASes in ground truth | # of neighbor ASes in public view
A | 8 | 10
B | 7 | 6
C | 3 | 4
D | 2 | 2

Table 5.2: Connectivity of stub networks.

5.4 Completeness of the public view

In this section, we first summarize the classes of topological information that are cap-

tured and necessarily missed in the public view. Based on this observation, we then

describe a novel method to infer the business relationships between ASes. We use the

inferred relationships to do AS classification and determine how much of the topology

is covered by the current set of monitors in the public view.

5.4.1 “Public view” vs. ground truth

We use Figure 5.10 as an illustration to summarize the degree of completeness of the

observed topology as seen by the public view. Our observations presented here are

the natural results of the no-valley-and-prefer-customer policy, and some of them have

been briefly suggested in previous work. In this thesis we quantify and verify the

degree of completeness by comparing the ground truth with the observed topology.

Though the few classes of networks we have examined are not necessarily exhaustive,

we believe the observations drawn from these case studies provide insights that are

valid for the Internet as a whole.

First, if a monitor resides in an AS A, the public view should be able to capture


all of A’s direct links, including both customer-provider and peer links. However, not

all the links of the AS may show up in a snapshot observation. It takes time, which

may be as long as a few years, to have all hidden customer-provider links exposed by

routing dynamics. Second, a monitor in a provider network should be able to capture

all the provider-customer links between itself and all of its downstream customers, and

a monitor in a customer network should be able to capture all the customer-provider

links between itself and its upstream providers. For example, in Figure 5.10, a monitor

in AS2 can capture not only its direct provider-customer links (2-6 and 2-7), but also

the provider-customer links between its downstream customers (6-8, 6-9, 7-9, and 7-

10). AS5, as a peer of AS2, is also able to capture all the provider-customer links

downstream of AS2 since AS2 will announce its customer routes to its peers. Again,

it can take quite a long time to reveal all the hidden links. Third, a monitor cannot

observe a peer link of its customer, or peer links of its neighbors at the same tier [5]. For

example, a monitor at AS5 will not be able to capture the peer link 6-7 or 1-2, because

a peer route is not announced to providers or other peers according to the no-valley

policy. Fourth, capturing a peer link requires a monitor in one of the peer ASes or in a

downstream customer of the two ASes incident to the link. For example, a monitor at

AS9 can observe the peer links 6-7 and 5-2, but not the peer link 1-3 since AS9 is not

a downstream customer of either AS1 or AS3.
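The fourth rule can be checked mechanically: under the no-valley policy, a peer link A–B is visible to a monitor m only if m is A or B itself, or a transitive downstream customer of A or B (equivalently, A or B lies upstream of m via provider-customer links). A minimal sketch, with data structures of our own choosing:

```python
def visible_peer_links(monitor, peer_links, providers):
    """Peer links observable by `monitor` under the no-valley policy:
    link (a, b) is visible iff the monitor is a or b, or a transitive
    downstream customer of a or b.

    providers: {AS: set of its direct provider ASes}
    """
    # collect the monitor itself plus everything upstream of it
    upstream, stack = {monitor}, [monitor]
    while stack:
        for p in providers.get(stack.pop(), ()):
            if p not in upstream:
                upstream.add(p)
                stack.append(p)
    return {(a, b) for (a, b) in peer_links if a in upstream or b in upstream}
```

On the Figure 5.10 example (AS9 a customer of AS6 and AS7, which are customers of AS2), a monitor at AS9 sees the peer links 6-7 and 5-2 but not 1-3, while a monitor at AS5 sees neither 6-7 nor 1-2, matching the text.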

The current public view has monitors in all the Tier-1 ASes except one, and that

particular Tier-1 AS has a direct customer AS that hosts a monitor. Applying the

above observations, we can summarize and generalize the completeness of the AS-

level topology captured by the public view as follows.

• Coverage of Tier-1 links: The public view contains all the links of all the Tier-1 ASes.

[5] We assume that the provider-customer links do not form a cycle.


• Coverage of customer-provider links: There is no invisible customer-provider

link. Thus over time the public view should be able to reveal all the customer-

provider links in the Internet topology, i.e. , the number of hidden customer-

provider links should gradually approach zero with the increase of the observa-

tion period length. This is supported by our empirical findings: in all our case

studies we found all the customer-provider links from BGP data collected over

a few years.

• Coverage of peer links: The public view misses a large number of peer links,

especially peer links between lower tier ASes in the Internet routing hierarchy.

The public view will not capture a peer link A–B unless there is a monitor

installed in either A or B, or in a downstream customer of A or B. Presently, the public monitors are in about 400+ ASes out of a total of over 27,000 existing ASes; this ratio gives a rough perspective on the percentage of peer links missing from

the public view. Peer links between stub networks (i.e., links 8-9 and 9-10 in

Figure 5.10) are among the most difficult ones to capture. Unfortunately, with

the recent growth of content networks, it is precisely these links that are rapidly

increasing in numbers.

5.4.2 Network Classification

The observations from the last section led us to a novel and simple method for inferring

the business relationships between ASes, which also allows us to classify ASes into different types.


Figure 5.10: Customer-provider links can be revealed over time, but downstream peer links are invisible to upstream monitors.

Figure 5.11: Distribution of the number of downstream customers per AS. [Plot: CDF vs. number of customer ASes downstream.]

Figure 5.12: Example of a prefix hijack scenario where AS2 announces prefix p belonging to AS1. Because of the invisible peer link AS2–AS3, the number of ASes affected by the attack is underestimated.


5.4.2.1 Inferring AS Relationships

The last section concluded that, assuming routes follow a no-valley policy, monitors

at the top of the routing hierarchy (i.e., those in Tier-1 ASes) are able to reveal all the

downstream provider-customer connectivity over time. This is an important observa-

tion since, by definition, each non-Tier-1 AS is a customer of at least one Tier-1 AS, so essentially all the provider-customer links in the topology can be observed by

the Tier-1 monitors over time. This is the basic idea of our AS relationship inference

algorithm.

We start with the assumption that the set of Tier-1 ASes is already known [6]. By

definition of Tier-1 ASes, all links between Tier-1s are peer links, and a Tier-1 AS

is not a customer of any other ASes. Suppose a monitor at Tier-1 AS m reveals an

ASPATH m-a1-a2-...-an. The link m-a1 can be either a provider-customer link, or a

peer link (this is because in certain cases a Tier-1 may have a specially arranged peer

relationship with a lower-tiered AS). However, according to the no-valley policy, a1-a2, a2-a3, ..., a(n−1)-an must be provider-customer links, because a peer or provider

route should not be propagated upstream from a1 to m. Therefore the segment a2, ...,

an must correspond to a customer route received by a1. To infer the relationship of

m-a1, we note that according to no-valley policy, if m-a1 is a provider-customer link,

this link should appear in the routes propagated from m to other Tier-1 ASes, whose

monitors will reveal this link. On the other hand, if m-a1 is a peer link, it should

never appear in the routes received by the monitors in other Tier-1 ASes. Given we

have monitors in all Tier-1 ASes or their customer ASes, we can accurately infer the

relationship m-a1 by examining whether it is revealed by other Tier-1 ASes. Using

this method, we can first find and label all the provider-customer links, and then label

all the other links revealed by the monitors as peer links.

[6] The list of Tier-1 ASes can be obtained from websites such as http://en.wikipedia.org/wiki/Tier_1_carrier


Our algorithm is illustrated in Figure 5.10, where 1, 2, 3, and 4 are known to be

Tier-1s. Suppose the AS 2 monitor reveals an ASPATH 2-5-6-8 and another ASPATH 2-7-9, while the monitor at AS 4 reveals an ASPATH 4-2-7-9, but none of 1, 3, or 4 reveals an ASPATH with a segment of 2-5-6-8. According to our new method, 5-6, 6-8, and

7-9 are definitely provider-customer links. 2-7 is a provider-customer link since it is

revealed by Tier-1s other than 2, while 2-5 is a peer link since it is not revealed by any

other Tier-1s. Furthermore, suppose AS 6 is a monitor and it reveals link 6-7, and 6-7

is never revealed by Tier-1 ASes 1, 2, 3, or 4. Then we can conclude that 6-7 is a

peer link.
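The inference procedure described above can be sketched as follows. This is an illustrative reimplementation, not the dissertation's code; it assumes the input is a mapping from each Tier-1 monitor AS to the AS paths it observes (monitor first).

```python
def infer_relationships(paths_by_monitor):
    """Label links as provider-customer ("p2c", first AS is the provider)
    or "peer", from AS paths observed at Tier-1 monitors."""
    rel = {}
    first_hops = []            # (monitor m, first neighbor a1) links
    seen_by = {m: set() for m in paths_by_monitor}
    for m, paths in paths_by_monitor.items():
        for path in paths:
            if len(path) < 2:
                continue
            seen_by[m].update(zip(path, path[1:]))
            # beyond the first hop, no-valley implies provider-customer
            for a, b in zip(path[1:], path[2:]):
                rel[(a, b)] = "p2c"
            first_hops.append((path[0], path[1]))
    for m, a1 in first_hops:
        if (m, a1) in rel:     # already revealed as p2c deeper in a path
            continue
        # m-a1 is p2c iff some *other* Tier-1 monitor also reveals it
        elsewhere = any((m, a1) in seen_by[t] for t in seen_by if t != m)
        rel[(m, a1)] = "p2c" if elsewhere else "peer"
    return rel
```

On the Figure 5.10 example, the paths 2-5-6-8, 2-7-9, and 4-2-7-9 yield provider-customer labels for 5-6, 6-8, 7-9, and 2-7, and peer labels for 2-5 and 4-2.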

From BGP data collected from all the Tier-1 monitors over a 7-month period, we

were able to infer a total of 70,698 provider-customer links. We also noticed that a

small number of these links only existed in routes that had a very short lifetime (less

than 2 days). These cases are most likely caused by BGP misconfigurations (e.g. route

leakages) or route hijacks, as described in [61]. After filtering all the routes with a

lifetime less than 2 days over the 7-month measurement period, we excluded 5,239

links, ending up with a total of 65,459 provider-customer links. Note that even though

our relationship inference has the advantage of being simple, its accuracy can still be

improved. For instance, we could use the algorithm in [42] to select a maximal set of

AS paths that do not create cycles in relationships and are valley-free, and only consider such relationships as valid. Note that our algorithm differs from Gao's classic algorithm [39] in several ways. First, our algorithm is able to infer all the customer-provider relations based on only a very limited number of sources (the Tier-1 routers). Second, contrary to [39], we do not rely on node degree to infer peer relationships. In fact, the node degree depends on the monitor set, which is the main reason why [39] produces such different results with varying monitor sets. Our inference of peer relationships is purely based on the no-valley premise that peer routes are not propagated upstream; therefore we believe our inference results are more accurate.


5.4.2.2 AS classification

AS classification schemes are typically based on each AS’s node degree (the num-

ber of neighbors) or the number of prefixes originated. However, the degree can be

misleading since it is a mix of providers, peers and customers in one count, and the

number of prefixes originated is not very reliable either, since prefix lengths differ and the routes carried downstream are not accounted for. With the inferred

provider-customer relations in hand, we decided to use the number of downstream

customer ASes (or “customer cone”) as also defined in [34]. Figure 5.11 shows the

distribution of the number of downstream customers per AS. We note that over 80%

of the ASes have no customers, and a noticeable fraction of ASes have a very small

number of customers. We label as stub those ASes with 4 or fewer customers, which

encompass about 92% of the ASes. This should correspond to end networks which

either do not provide transit or provide only limited transit to a few local customers, e.g.,

universities providing transit to small local research facilities. Based on the knee of

the distribution in Figure 5.11, we label as small ISPs those ASes with between 5 and

50 downstream customers. They correspond to about 6% of the total ASes. The re-

maining non-tier-1 ASes in the long tail are labeled as large ISPs. Table 5.4 shows the

number of ASes in each class. We analyzed the sensitivity of the classification thresholds by changing their values by some delta, and did not notice a significant difference in the end result.
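A sketch of this classification using the thresholds above; function and label names are our own, and Tier-1 membership is taken as a given, fixed set.

```python
def classify(cone_sizes, tier1):
    """Classify ASes by customer-cone size: stub (<= 4 downstream
    customers), small ISP (5-50 customers), large ISP (> 50 customers).

    cone_sizes: {AS: number of downstream customer ASes}
    tier1: the known set of Tier-1 ASes (classified separately).
    """
    classes = {}
    for asn, cone in cone_sizes.items():
        if asn in tier1:
            classes[asn] = "tier-1"
        elif cone <= 4:
            classes[asn] = "stub"
        elif cone <= 50:
            classes[asn] = "small ISP"
        else:
            classes[asn] = "large ISP"
    return classes
```

Sweeping the two thresholds by a small delta, as described above, is then a matter of re-running the function with different cutoff values.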

5.4.3 Coverage of the public view

With our new method for AS relationship inference and AS classification, we now

attempt a rough quantification of the completeness of the AS topology as observed by

the public view. According to our observations in 5.4.1, a monitor can uncover all

the upstream connectivity over time. For example, in Figure 5.10, a monitor at AS


Parameter | Full tables | Full+partial tables
No. monitored ASes | 121 | 411
Covered ASes | 1,101 / 28,486 ≈ 4% | 1,552 / 28,486 ≈ 5%

Table 5.3: Coverage of BGP monitors.

7 will receive routes from upstream providers that will carry the peer links existing

upstream, in this case the links 2-1, 2-3, 2-4 and 2-5 (in addition to the upstream

provider-customer links). Therefore, by starting at AS 7 and following all provider-

customer links upstream, we pass through all the ASes that are covered by a monitor

in AS 7, in the sense that this monitor is able to reveal all their connectivity. In Figure

5.10, AS 7 only covers AS 2, but AS 9 covers 4 upstream ASes: 6, 7, 2, and 5.
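This upstream traversal is a transitive closure over customer-to-provider edges. In the sketch below, the provider map used in the test is reconstructed from the description of Figure 5.10; in particular, treating AS 5 as a provider of AS 6 is our assumption, made so that the example reproduces the stated coverage of AS 9.

```python
def covered_ases(monitor_as, providers):
    """ASes whose connectivity a monitor can fully reveal over time:
    everything reachable by repeatedly following customer->provider
    edges upstream from the monitor.

    providers: {AS: set of its direct provider ASes}
    """
    covered, stack = set(), [monitor_as]
    while stack:
        for p in providers.get(stack.pop(), ()):
            if p not in covered:
                covered.add(p)
                stack.append(p)
    return covered
```

Summing, over all monitors, the union of the sets returned by this function gives the coverage figures reported in Table 5.3.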

We applied this reasoning to the monitored ASes in the public view, and the re-

sults are shown in Table 5.3. For comparison purposes, we included the results from

using the set of monitors with full routing tables and those from using all the monitors

with either full or partial routing tables; the difference between the two sets is small.

Among the 400+ monitors, only a minority have full tables, and due to the overlap in

covered ASes between Routeviews and RIPE-RIS, the set of monitors with full tables

corresponds to only 126 ASes. This set of monitors in the public view is only able to cover 4% of the total number of ASes in the Internet. This result indicates that the AS

topologies derived from the public view, which have been widely used by the research

community, may miss most of the peer connectivity within the remaining 96% of the

ASes (or 57% of the transits).

Finally, we look at the covered ASes in terms of their classes, which is shown in

Table 5.4. The column “Covered ASes-aggregated” refers to the fraction of covered

ASes in each AS class, whereas the column “Covered ASes-by covering type” refers


Type | ASes | Monitored ASes | Covered ASes (aggregated) | Covered ASes (by covering type)
Tier-1 | 9 | 8 | 9 (100%) | 8
Large ISP | 436 | 45 | 337 (77.3%) | 954
Small ISP | 1,829 | 36 | 629 (34.4%) | 269
Stubs | 26,209 | 37 | 126 (0.5%) | 160

Table 5.4: Coverage of BGP monitors for different network types.

to the total number of ASes covered by the monitors in each class. For instance, 77.3%

of the large ISPs are covered by monitors, and monitors in large ISPs cover a total of

954 ASes. The numbers in the table indicate that Tier-1s are fully covered, large

ISPs are mostly covered, small ISPs remain largely uncovered (just 34.4%), and stubs

are almost completely uncovered (99.5%). These results are due to the fact that most

of the monitors reside in the core of the network. In order to cover a stub, we would

need to place a monitor in that stub, which is infeasible due to the very large number

of stubs in the Internet.

5.5 Discussion

The defects in the inferred AS topologies, as revealed by our case studies, may have

different impacts on the different research projects and studies that use an inferred AS

topology. In the following, we use a few specific examples to illustrate some of the

problems that can arise.

Stub AS growth rates and network diameter: Given that the public view captures

almost all the AS nodes and customer-provider links, it provides an adequate data

source for studies on AS-topology metrics including network diameter; growth rates


and trends for the number of stub ASes; and quantifying customer multihoming (where

multihoming here does not account for peer links).

Other graph-theoretic metrics: Given that the public view is largely inadequate in

covering peer links, and given that these peer links typically allow for shortcuts in

the data plane, relying on the public view can clearly cause major distortions when

studying generic graph properties such as node degrees, path lengths, node clustering,

etc.

Impact of prefix hijacking: Prefix hijacking is a serious security threat facing the Internet; it happens when an AS announces prefixes that belong to other ASes. Recent

work on this topic [53, 100, 22, 92] evaluates the proposed solutions by using the in-

ferred AS topologies from the public view. Depending on the exact hijack scenario, an

incomplete topology can lead to either an underestimate or overestimate of the hijack

impact. Figure 5.12 shows an example of a hijack simulation scenario, where AS2

announces prefix p that belongs to AS1. Because of the invisible peer link 1–2, the

number of impacted ASes is underestimated, i.e. ASes 3,5 and 6 are believed to pick

the route originated by AS1, whereas in reality they would pick the more preferred

peer route coming from the hijacker AS2. At the same time, an incomplete topology

could also lead simulations to overestimate the impact of a hijack. For example, the

content network C considered in Section 5.3 has a large number of direct peers who

are unlikely to be impacted by a hijack from a remote AS, so missing 90% of C’s peer

links in the topology would significantly overestimate the impact of such a hijack. On

the other hand, if C is a hijacker, then the incomplete topology would result in a vast

underestimation of the impact.

Relationship inference/path inference: Several studies have addressed the problem

of inferring the relationship between ASes based on observed routing paths [39, 85,

63]. There can be cases where customer-provider links are wrongly inferred as peer


links based on the observed set of paths, creating a no-valley violation. Knowledge

of the invisible peer links in paths could avoid some of these errors. The path infer-

ence heuristics [63, 67, 68] are also impacted by the incompleteness problem, mainly

because they a priori exclude all paths that traverse invisible peer links.

Routing resiliency to failures: Studies that address robustness properties of the

Internet under different failure scenarios (e.g., see [35, 92]) also heavily depend on

having a complete and accurate AS-level topology, on top of which failures are sim-

ulated. One can easily envision scenarios where two parts of the network are thought

to become disconnected after a failure, while in reality there are invisible peer links

connecting them. Given that currently inferred AS maps tend to miss a substantial

number of peer links, robustness-related claims based on these inferred maps need to be

viewed with a grain of salt.

Evaluation of new inter-domain protocols: The evaluation of new inter-domain

routing protocols also heavily relies on the accuracy of the AS-level topology over

which a new protocol is supposed to run. For instance, [87] proposes a new protocol

where a path-vector protocol is used among Tier-1 ASes, and all the ASes under each

Tier-1 run link-state routing. The design is based on an assumption that customer trees

of Tier-1 ASes are largely disjoint, and violations of this assumption are handled as

rare exceptions. However, in view of our findings, there are a substantial number of

invisible peer links interconnecting ASes at lower tiers and around the edge of the Internet,

so connectivity between different customer trees becomes the rule rather than

the exception. We would expect the performance of the proposed protocol under

complete and incomplete topologies to differ, possibly quite significantly.


CHAPTER 6

Path Exploration and Internet Topology

In this chapter we study how the topology structure and relationship between different

networks constrain the path exploration process that occurs in BGP.

6.1 BGP Path Exploration

A number of previous analytical and measurement studies ([51, 52, 62]) have shown

the existence of BGP path exploration and slow convergence in the operational Internet

routing system, which can potentially lead to severe performance problems in data

delivery. Path exploration suggests that, in response to path failures or routing policy

changes, some BGP routers may try a number of transient paths before selecting a

new best path or declaring unreachability to a destination. Consequently, a long time

period may elapse before the whole network eventually converges to the final decision,

resulting in slow routing convergence. An example of path exploration is depicted in

Figure 6.1, where node C’s original path to node E (path 1) fails due to the failure of

link D-E. C reacts to the failure by attempting two alternative paths (paths 2 and 3)

before it finally gives up. The experiments in [51, 52, 62] show that some BGP routers

can spend up to several minutes exploring a large number of alternate paths before

declaring a destination unreachable.

The analytical models used in the previous studies tend to represent worst case sce-

narios of path exploration [51, 52], and the measurement studies have all been based


Figure 6.1: Path exploration triggered by a fail-down event.

on controlled experiments with a small number of beacon prefixes. In the Internet op-

erational community there exist various different views regarding whether BGP path

exploration and slow convergence represent a significant threat to the network perfor-

mance, or whether the severity of the problem, as shown in simulations and controlled

experiments, would be rather rare in practice. A systematic study is needed to quantify

the pervasiveness and significance of BGP slow convergence in the operational routing

system.

6.2 Methodology and Data Set

Previous measurement results on BGP slow convergence were obtained through con-

trolled experiments. In these experiments, a small number of “beacon” prefixes are

periodically announced and withdrawn by their origin ASes at fixed time intervals [11,

13], and the resulting routing updates are collected at remote monitoring routers and

analyzed. In addition to generating announcements and withdrawals (Tup and Tdown

events), one can also use a beacon prefix to generate Tlong events by doing AS prepend-

ing [51]. For a given beacon prefix, because one knows exactly what, when, and where


is the root cause of each routing update, one can easily measure the routing conver-

gence time by calculating the difference between when the root cause is triggered and

when the last update due to the same root cause is observed. Although routing up-

dates for beacon prefixes may also be generated by unexpected path changes in the

network, those updates can be clearly identified through the use of anchor prefixes as

explained later in this section. Unfortunately one cannot assess the overall Internet

routing performance from observing the small number of existing beacon prefixes.

Our observation of routing dynamics is based on a set of routers termed monitors,

which propagate their routing table updates to collector boxes that store them on disk

(e.g., RouteViews [15]). To obtain a comprehensive understanding of BGP path ex-

plorations in the operational Internet, we first cluster routing updates from the same

monitor and for the same prefix into events, sort all the routing events into several

classes, and then measure the duration and number of paths explored for each class of

events. Our task is significantly more difficult than measuring the convergence delay

of beacon prefixes for the following reasons. First, there is no easy way to tell whether

a sequence of routing updates is due to the same, or different root causes in order

to properly group them into events. Second, upon receiving an update for a prefix,

one cannot tell the root cause of the update, as one can with beacon

prefixes. Furthermore, when the path to a given destination prefix changes, it is difficult

to determine whether the new path is a more, or less, preferred path compared to the

previous one, i.e. whether the prefix experiences a Tshort or a Tlong event in our event

classification.

To address the above problems, we take advantage of beacon updates to develop

and calibrate effective heuristics and then apply them to all the prefixes. In the rest of

this section, we first describe our data set, then discuss how we use beacon updates to

validate a timer-based mechanism for grouping routing updates into events, and how


we use beacon updates to develop a usage-based path ranking method which is then

used in our routing event classifications.

6.2.1 Data Set and Preprocessing

To develop and calibrate our update grouping and path ranking heuristics, we used

eight BGP beacons, one from PSG [11] (psg01), the other seven from RIPE [13]

(rrc01, rrc03, rrc05, rrc07, rrc10, rrc11, and rrc12). All eight beacon prefixes are

announced and withdrawn alternately every 2 hours. We preprocessed the beacon up-

dates following the methods developed in [62]. First, we removed from the update

stream all the duplicate updates, as well as the updates that differ only in COMMU-

NITY or MED attribute values, because these updates are usually caused by internal

dynamics inside the last-hop AS. Second, we used the anchor prefix of each beacon to

detect routing changes other than those generated by the beacon origins. An anchor

prefix is a separate prefix announced by a beacon prefix’s origin AS, and is never with-

drawn after its announcement. Thus it serves as a calibration point to identify routing

events that are not originated by the beacon injection/removal mechanism. Because the

anchor prefix shares the same origin AS, and hopefully the same routing path, with the

beacon prefix, any routing changes that are not associated with the beacon mechanism

will trigger routing updates for both the anchor and the beacon prefixes. To remove

all beacon updates triggered by such unexpected routing events, for each anchor prefix

update at time t, we ignore all beacon updates during the time window [t − W, t + W].

We set W's value to 5 minutes, as the results reported in [62] show that the number of

beacon updates remains more or less constant for W > 5 minutes. After the above two

steps of preprocessing, beacon updates are mainly comprised of those triggered by the

scheduled beacon activity at the origin ASes.
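The anchor-based filtering step above can be sketched as follows. This is a minimal illustration, not the thesis code: updates are assumed to be plain (timestamp, payload) tuples, and parsing of real BGP update archives is omitted.

```python
import bisect

W = 300  # window half-width in seconds (5 minutes, as chosen above)

def filter_beacon_updates(beacon_updates, anchor_times, w=W):
    """Drop every beacon update within [t - w, t + w] of any update
    observed for the corresponding anchor prefix at time t."""
    anchor_times = sorted(anchor_times)
    kept = []
    for upd in beacon_updates:
        ts = upd[0]
        # locate the nearest anchor updates via binary search
        i = bisect.bisect_left(anchor_times, ts)
        neighbors = anchor_times[max(0, i - 1):i + 1]
        if all(abs(ts - a) > w for a in neighbors):
            kept.append(upd)
    return kept
```

For example, with a single anchor update at t = 950, a beacon update at t = 1000 falls inside the ±300 s window and is discarded, while updates at t = 0 and t = 100 are kept.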

To assess the degree of path exploration for all the prefixes in the global routing


table, we used the public BGP data collected from 50 full-table monitoring points by

RIPE [14] and RouteViews [15] collectors during the months of January and Febru-

ary 2006. We used the data from January to evaluate the different path comparison

metrics and we later analyzed the events in both months. We removed from the data

all the updates that were caused by BGP session resets between the collectors and the

monitors, using the minimum collection time method described in [97]. Those updates

correspond to BGP routing table transfers between the collectors and the monitors, and

therefore should not be counted in our study of the convergence process.

The 50 monitors were chosen based on the fact that each of them provided full

routing tables and continuous routing data during our measurement period. One month

was chosen as our measurement period based on the assumption that ISPs are unlikely

to change their interconnectivity much within a one-month period, so that we

can assume the AS-level topology did not change much over our measurement

period, an assumption used in our AS path comparison later in the thesis.

6.2.2 Clustering Updates into Events

Some of the previous BGP data analysis studies [79, 25, 38] developed a timer-based

approach to cluster routing updates into events. Based on the observation that BGP

updates come in bursts, two adjacent updates for the same prefix are assumed to be due

to the same routing event if they are separated by a time interval less than a threshold

T . A critical step in taking this approach is to find an appropriate value for T . A value

that is too high can incorrectly group multiple events into one. On the other hand, a

value that is too low may divide a single event into multiple ones. Since the root causes

of beacon routing events are known, and the beacon update streams contain little noise

after the preprocessing, we use beacon prefixes to find an appropriate value for T .

Figure 6.2 shows the distribution of update inter-arrival times of the eight beacon


prefixes as observed from the 50 monitors. All the curves start flattening out either

before or around 4 minutes (the vertical line in the figure). If we use 4 minutes as

the threshold value to separate updates into different events, i.e. T = 4 minutes, in

the worst case (rrc01 beacon) we incorrectly group about 8% of messages of the same

event into different events; this corresponds to the inter-arrival time difference between

the cutting point of the rrc01 curve at 4 minutes and the horizontal tail of the curve. The

tail drop of all the curves at 7200 seconds corresponds to the 2-hour interval between

the scheduled beacon prefix activities.¹

Although the data for the beacon updates suggests that a threshold of T = 4 min-

utes may work well for grouping updates into events, no single value of T would be a

perfect fit for all the prefixes and all the monitors. Thus we need to assess how sensitive

our results may be with the choice of T = 4 minutes. Figure 6.3 compares the result

of using T = 4 minutes with that of T = 2 minutes and T = 8 minutes for clustering

the updates of all the prefixes collected from all the 50 monitors during our one-month

measurement period. Let E(m, p, 4) be the number of events identified by monitor m

for prefix p using T = 4 minutes; E(m, p, 2) and E(m, p, 8) are similarly defined but

with T = 2 minutes and T = 8 minutes respectively. Figure 6.3 shows the distribution

of |E(m, p, 8)−E(m, p, 4)| and |E(m, p, 2)−E(m, p, 4)|, which reflects the impact of

using a higher or lower timeout value, respectively. As one can see from the figure, in

about 50% of the cases the three different T values result in the same number of events,

and in more than 80% of the cases the results from using the different T values differ

by at most 2 events. Based on the data we can conclude that the result of event cluster-

ing is insensitive to the choice of T = 4 minutes. This observation is also consistent

¹The psg01 curve reaches a plateau earlier than the other curves, indicating that it suffers less from slow routing convergence. However, one may note its absence of update inter-arrivals between 100 seconds and 3600 seconds, followed by a high number of inter-arrivals around 3600 seconds. As hinted in [62], this behavior could be explained by BGP's route flap damping: one hour is the default maximum suppression time applied to an unstable prefix when its announcement goes through a router which enforces BGP damping.


Figure 6.2: CCDF of inter-arrival times of BGP updates for the 8 beacon prefixes as observed from the 50 monitors.

Figure 6.3: Difference in number of events per [monitor, prefix] for T = 2 and 8 minutes, relative to T = 4 minutes, during a one-month period.


Figure 6.4: Event taxonomy.

with previous work. For example [38] experimented with various timeout threshold

values between 2 minutes and 16 minutes, and found no significant difference in the

clustering results. In the rest of the thesis, we use T = 4 minutes.
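The timer-based grouping just described can be sketched as follows, operating on the update timestamps of a single (monitor, prefix) pair; this is a hypothetical minimal form of the clustering step, not the thesis implementation.

```python
def cluster_updates(timestamps, T=240):
    """Cluster update timestamps (in seconds) for one (monitor, prefix)
    pair into events: two adjacent updates belong to the same event iff
    they are separated by less than T seconds (T = 4 minutes here)."""
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] < T:
            events[-1].append(ts)  # gap < T: continues the current event
        else:
            events.append([ts])    # gap >= T: a new event starts
    return events
```

For instance, updates at 0, 30, 100, 500, and 520 seconds form two events with T = 240: the gap of 400 seconds between the third and fourth updates exceeds the threshold.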

6.2.3 Classifying Routing Events

After the routing updates are grouped into events, we classify the events into different

types based on the effect that each event has on the routing path. Let us consider two

consecutive events n and n+ 1 for the same prefix observed by the same monitor. We

define the path in the last update of event n as the ending path of event n, which is

also the starting path for event n+ 1. Let pstart and pend denote an event’s starting and

ending paths, respectively, and ε denote the path in a withdrawal message (representing

an empty path). If the last update in an event is a withdrawal, we have pend = ε. Based

on the relation between pstart and pend of each event, we classify all the routing events

into one of the following categories as shown in Figure 6.4.²

1. Same Path (Tspath): A routing event is classified as a Tspath if its pstart = pend,

and every update in the event reports the same AS path as pstart, although they

may differ in some other BGP attribute such as MED or COMMUNITY value.

Tspath events typically reflect the routing dynamics inside the monitor’s AS.

²To establish a valid starting state, we initialize pstart for each (monitor, prefix) pair with the path extracted from the routing table of the corresponding monitor.


2. Path Disturbance (Tpdist): A routing event is classified as Tpdist if its pstart =

pend, and at least one update in the event carries a different AS path. In other

words, the AS path is the same before and after the event, with some transient

change(s) during the event. Tpdist events likely result from multiple root

causes, such as a transient failure closely followed by a quick recovery, hence

the name of the event type. When multiple root causes occur closely in time,

the updates they produce also follow each other very closely, and no timeout

value would be able to accurately separate them out by the root causes. In our

study we identify these Tpdist events but do not include them in the convergence

analysis.

3. Path Change: A routing event is classified as a path change if its pstart ≠ pend.

In other words, the paths before and after the event are different. Path change

events are further classified into five categories, based on whether the destina-

tion becomes available or unavailable, or changed to a more preferred or less

preferred path, at the end of the event. Let pref(p) represent a router’s prefer-

ence of path p, with a higher value representing a higher preference.

• Tup: A routing event is classified as a Tup if its pstart = ε. A previously

unreachable destination becomes reachable through path pend by the end of

the event.

• Tdown: A routing event is classified as Tdown if its pend = ε. That is, a

previously reachable destination becomes unreachable by the end of the

event.

• Tshort: A routing event is classified as Tshort if its pstart ≠ ε, pend ≠ ε and

pref(pend) > pref(pstart), indicating a reachable destination has changed

the path to a more preferred one by the end of the event.


• Tlong: A routing event is classified as a Tlong event if its pstart ≠ ε, pend ≠

ε and pref(pend) < pref(pstart), indicating a reachable destination has

changed the path to a less preferred one by the end of the event.

• Tequal: A routing event is classified as Tequal if its pstart ≠ ε, pend ≠ ε and

pref(pend) = pref(pstart). That is, a reachable destination has changed

the path by the end of the event, but the starting and ending paths have the

same preference.

A major challenge in event classification is how to differentiate between Tlong and

Tshort events, a task that requires judging the relative preference between two given

paths. Individual routers use locally configured routing policies to choose the most

preferred path among available ones. Because we do not have precise knowledge of the

routing policies, we must derive effective heuristics to infer a router's path preference.

It is possible that our heuristics label two paths with equal preference, in which case

the event will be classified as Tequal. However, a good path ranking heuristic should

minimize such ambiguity.
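The taxonomy above can be sketched as a small classifier. This is an illustrative sketch under the assumption that a caller supplies pref, a path-preference function such as the one developed in the next section; the empty path ε is modeled as None.

```python
EMPTY = None  # models the empty path (a withdrawal)

def classify_event(p_start, p_end, explored, pref):
    """Classify one routing event per the taxonomy of Figure 6.4.
    `explored` lists the AS paths carried by the event's updates;
    `pref(p)` returns a numeric preference (higher = more preferred)."""
    if p_start == p_end:
        # same starting and ending path: Tspath if no update deviated,
        # Tpdist if some transient path appeared during the event
        return "Tspath" if all(p == p_start for p in explored) else "Tpdist"
    if p_start == EMPTY:
        return "Tup"
    if p_end == EMPTY:
        return "Tdown"
    if pref(p_end) > pref(p_start):
        return "Tshort"
    if pref(p_end) < pref(p_start):
        return "Tlong"
    return "Tequal"
```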

6.2.4 Comparing AS Paths

If a routing event has non-empty pstart and pend, then the relative preference between

pstart and pend determines whether the event is a Tlong or Tshort. In the controlled

experiments using beacon prefixes, one can create such events by manipulating AS

paths. For example in [51], AS paths with length up to 30 AS hops were used to

simulate Tlong events.

However in general there has been no good way to infer routers’ preferences among

multiple available AS paths to the same destination. Given a set of available paths,

a BGP router chooses the most preferred one through a decision process. During


this process, the router usually considers several factors in the following order: local

preference (which reflects the local routing policy configuration), AS path length, the

MED attribute value, IGP cost, and tie-breaking rules. Some of the previous efforts in

estimating path preference tried to emulate a BGP router’s decision process to various

degrees. For example, [51, 52, 38] used path length only. Because BGP is not a

shortest-path routing protocol, however, it is known that the most preferred BGP paths

are not always the shortest paths. In addition, there often exist multiple shortest paths

with equal AS hop lengths. There are also a number of other efforts in inferring AS

relationship and routing policies. However as we will show later in this section, none

of the existing approaches significantly improves the inference accuracy.

To infer path preference with high accuracy for our event classification, we took

a different approach from all the previous studies. Instead of emulating the router’s de-

cision process, we propose to look at the end result of the router’s decision: the usage

time of each path. The usage time is defined as the cumulative duration of time that

a path remains in the router’s routing table for each destination (or prefix). Assuming

that Internet routing is relatively stable most of the time and failures are recovered

promptly, the most preferred paths should be used the most and thus remain in the routing

table for the longest time. Given that our study period is only one month, it is unlikely

that significant changes happened to routing policies and/or ISP

peering connections in the Internet. Thus we conjecture that relative preferences of

routing paths remained stable for most, if not all, the destinations during our study pe-

riod. Figure 6.5 shows the path usage time distribution for the monitor with IP address

12.0.1.63 (AT&T). The total number of distinct ASPATH-prefix pairs that appeared in

this router's routing table during the month is slightly less than 650,000 (correspond-

ing to about 190,000 prefixes). About 23% of the ASPATH-prefix pairs (the 150,000

on the left side of the curve) stayed in the table for the entire measurement period, and

about 500,000 ASPATH-prefix pairs appeared in the routing table for only a fraction


Figure 6.5: Usage time per ASPATH-Prefix for router 12.0.1.63, Jan 2006.

of the period, ranging from a few days down to a few seconds.
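The usage-time metric can be computed from a time-sorted stream of best-route changes, roughly as follows. This is a sketch under the assumption that each change is a (timestamp, prefix, path) tuple, with path = None for a withdrawal; handling of real routing table dumps is omitted.

```python
from collections import defaultdict

def path_usage_time(route_changes, end_time):
    """Accumulate, per (prefix, AS path), the cumulative time the path
    stayed in the routing table over the measurement period."""
    usage = defaultdict(float)
    current = {}  # prefix -> (path currently installed, installed since)
    for ts, prefix, path in route_changes:
        if prefix in current:
            old_path, since = current[prefix]
            if old_path is not None:
                usage[(prefix, old_path)] += ts - since
        current[prefix] = (path, ts)
    # close out the paths still installed at the end of the period
    for prefix, (path, since) in current.items():
        if path is not None:
            usage[(prefix, path)] += end_time - since
    return dict(usage)
```

A path that is re-installed after a transient change accumulates usage across both intervals, which is exactly the cumulative-duration definition above.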

We compare this new Usage Time based approach with three other existing meth-

ods for inferring path preference: Length, Policy, and Policy+Length. Usage Time

uses the usage time to rank paths. Length infers path preference according to the AS path

length. Policy derives path preference based on inferred inter-AS relationships. We

used the algorithm developed in [39] to classify the relationships between ASes into

customer, provider, peer, and sibling. A path that goes through a customer is preferred

over a path that goes through a peer, which is preferred over a path that goes through

a provider.³ Policy+Length infers path preference by using the policies first, and then

using AS length for those paths that have the same AS relationship.
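The three baseline metrics can be expressed as numeric preference functions, sketched below; a higher return value means a more preferred path. The relationship labels ("customer", "peer", "provider") would come from an inference algorithm such as [39]; here they are simply supplied by the caller, so this is an illustration of the ranking logic only.

```python
REL_RANK = {"customer": 3, "peer": 2, "provider": 1}

def pref_length(as_path):
    # Length metric: a shorter AS path is more preferred
    return -len(as_path)

def pref_policy(as_path, next_hop_rel):
    # Policy metric: customer route over peer route over provider route
    # (the AS path itself is not consulted)
    return REL_RANK[next_hop_rel]

def pref_policy_length(as_path, next_hop_rel):
    # Policy+Length: policy first, AS path length as the tie-breaker
    # (tuple comparison gives the lexicographic order)
    return (REL_RANK[next_hop_rel], -len(as_path))
```

Because Policy collapses many paths onto only three preference values, it produces many "equal" comparisons, which is consistent with the results reported below.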

One challenge in conducting this comparison is how to verify the path ranking re-

sults without knowing the router’s routing policy configurations. We tackle this prob-

lem by leveraging our understanding about Tdown and Tup events. During Tdown events,

routers explore multiple paths in the order of decreasing preference; during Tup events,

routers explore paths in the order of increasing preference. Since we can identify Tdown

and Tup events fairly accurately, we can use the information learned from these events

to verify the results from different path ranking methods.

³We ignore those cases in which we could not establish the policy relation between two ASes. Such cases happened in less than 1% of the total paths.


In an ideal scenario where paths explored during a Tdown (or Tup) event follow a

monotonically decreasing (or increasing) preference order, we can take samples of ev-

ery consecutive pair of routing updates and rank order the paths they carried. However

due to the difference in update timing and propagation delays along different paths,

the monotonicity does not hold true all the time. For example, we observed path with-

drawals appearing in the middle of update sequences during Tdown events. Therefore,

instead of comparing the AS paths carried in adjacent updates during a routing event,

we compare the paths explored during an event with the stable path used either be-

fore or after the event. Figure 6.6 shows our procedure in detail. All the updates in

the figure are for the same prefix P . Before the Tup event occurs, the router does

not have any route to reach P . The first four updates are clustered into a Tup event

that stabilizes with path p4. After p4 is in use for some period of time, the prefix P

becomes unreachable. During the Tdown event, paths p5 and p6 are tried before the

final withdrawal update. From this example, we can extract the following pairs of

path preference: pref(p1) < pref(p4), pref(p2) < pref(p4), pref(p3) < pref(p4),

pref(p5) < pref(p4), and pref(p6) < pref(p4).

After extracting path preference pairs from Tdown and Tup events, we apply the

four path ranking methods in comparison to the same set of routing updates and see

whether they produce the same path ranking results as we derived from Tdown and

Tup events. We keep three counters Ccorrect, Cequal and Cwrong for each method. For

instance, in the example of Figure 6.6, if a method results in p1 and p2 being worse than

p4, and p3 having the same preference as p4 (equal), then for the Tup event we have

Ccorrect = 2, Cequal = 1 and Cwrong = 0. Likewise, for the Tdown event, if a method

results in p5 being better than p4 and p6 being equal to p4, then we have Ccorrect = 0,

Cequal = 1 and Cwrong = 1. To quantify the accuracy of different inference methods,

we define Pcorrect = Ccorrect / (Ccorrect + Cequal + Cwrong). We use Pcorrect as a measure of accuracy in

our comparison.
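The derivation of preference pairs and the accuracy measure can be sketched as follows. The representation is hypothetical: each Tup or Tdown event is reduced to its stable path plus the transient paths explored during the event.

```python
def preference_pairs(stable_path, transient_paths):
    """From one Tup/Tdown event, derive (less preferred, more preferred)
    pairs: every transient path is less preferred than the stable path."""
    return [(p, stable_path) for p in transient_paths if p != stable_path]

def p_correct(pairs, pref):
    """P_correct = C_correct / (C_correct + C_equal + C_wrong) for a
    ranking method `pref` (higher return value = more preferred)."""
    correct = equal = wrong = 0
    for worse, better in pairs:
        if pref(better) > pref(worse):
            correct += 1
        elif pref(better) == pref(worse):
            equal += 1
        else:
            wrong += 1
    return correct / (correct + equal + wrong)
```

Note that a pair the method ranks as "equal" counts against its accuracy, which is why methods producing many ties (such as Policy) score a low Pcorrect even when they rarely invert a pair outright.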


Figure 6.6: Validation of path preference metric.

Figure 6.7: Comparison between Ccorrect, Cequal and Cwrong of length, policy and usage time metrics for (a) Tup and (b) Tdown events of beacon prefixes.

Figure 6.8: Comparison between accuracy of length, policy and usage time metrics.


To compare the four different path ranking methods, we first applied them to our

beacon data set which contains updates generated by Tup and Tdown events, and com-

puted the values of Ccorrect, Cequal and Cwrong for each of the four methods. Figure

6.7 shows the result. As one can see from the figure, Length works very well in rank-

ing paths explored during Tdown events, giving 93% correct cases and 5% equal cases.

However, it performs much worse in ranking the paths explored during Tup events,

producing 40% correct cases and 40% wrong cases. During Tdown events, many “in-

valid” paths are explored and they are very likely to be longer than the stable path.

However during Tup events, only “valid” paths are explored and their preferences are

not necessarily based on their path lengths.

Policy performs roughly equally for ranking paths during Tdown and Tup events.

It does not make many wrong choices, but produces a large number of equal cases

(around 70% of the total). This demonstrates that the inferred AS relationship and

routing policies provide insufficient information for path ranking. They do not take

into account many details, such as traffic engineering, AS-internal routing metrics, etc.,

that affect actual routes being used. Compared with Length, Policy+Length has a

slightly worse performance with Tdown events, and a moderate improvement with Tup

events. Our observations are consistent with a recent study that concludes that per-AS

relationships are not fine-grained enough to compute routing paths correctly [68].

Usage Time works surprisingly well and outperforms the other three in both Tdown

and Tup events. Its Pcorrect is about 96.3% in Tup and 99.4% in Tdown events. Its Cequal

value is 0 in both Tup and Tdown events. This is because we are measuring the path

usage time in units of seconds, which effectively puts all the paths in strict rank

order. We also notice that for Tup events, about 3.7% of the comparisons are wrong,

whereas for Tdown events this number is as low as 0.6%. We believe this noticeable

percentage of wrong comparisons in Tup events is due to path changes caused by topo-


logical changes, such as a new link established between two ASes as a result of a

customer switching to a new provider. Because the new paths have low usage time,

our Usage Time based inference will give them a low rank, although these paths are ac-

tually the preferred ones. Nevertheless, the data confirmed our earlier assumption that,

during our 1-month measurement period, there were no significant changes in Internet

topology or routing policies; otherwise we would have seen a much higher percentage

of wrong cases produced by Usage Time.

We now examine how the value of Pcorrect varies between different monitors under

each of the four path ranking methods. Figure 6.8 shows the distribution of Pcorrect for

different methods, with X-axis representing the monitors sorted in decreasing order

of their Pcorrect value. The value of Pcorrect for each monitor is calculated over all

the Tdown and Tup events in our beacon data set. When using the path usage time for

path ranking, we observe an accuracy between 84% and 100% across all the monitors,

whereas when using path length for ranking, we observe that the Pcorrect value can be as low

as 31% for some monitors. Using policy for path ranking leads to even lower Pcorrect

values.

After we developed and calibrated the usage time based path ranking method using

beacon updates, we applied the method, together with the other three, to the BGP

updates for all the prefixes collected from all the 50 monitors, and we obtained the

results that are similar to that from the beacon update set. Considering the aggregate

of all monitors and all prefixes, Pcorrect is 17% for Policy, 65% for Length, 73% for

Policy+Length, and 96.5% for Usage Time. Thus we believe usage time works very

well for our purpose and use it throughout our study.

To the best of our knowledge, we are the first to propose the method of using usage

time to infer relative path preference. We believe this new method can be used for

many other studies on BGP routing dynamics. For example, [38] pointed out that if


after a routing event, the stable path is switched from P1 to P2, the root cause of the

event should lie on the better of the two paths. The study used path length only in its

ranking, and the root cause inference algorithm produced mixed results. Our result

shows that using length for path ranking gives only about 65% accuracy, and usage

time can give more than 96% accuracy. Using usage time to rank paths can potentially

improve the results of the root cause inference scheme proposed in [38].

6.3 Characterizing Events

After applying the classification algorithm to BGP data, we count the number of Tdown

events observed by each monitor as a sanity check. A Tdown event means that a pre-

viously reachable prefix becomes unreachable, suggesting that the root cause of the

failure is very likely at the AS that originates the prefix, and should be observed by all

the monitors. Therefore, we expect every monitor to observe roughly the same number

of Tdown events. Figure 6.9 shows the number of Tdown events seen by each monitor.

Most monitors observe a similar number of Tdown events, but there are also a few outliers

that observe either too many or too few Tdown events. Too many Tdown events can be

due to failures that are close to monitors and partition the monitors from the rest of the

Internet, or underestimation of the relative timeout T used to cluster updates. Too few

Tdown events can be due to missing data during monitor downtime, or overestimation

of the relative timeout T . In order to keep consistency among all monitors, we decided

to exclude the head and tail of the distribution, reducing the data set to 32 monitors.
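The sanity check and outlier trimming can be sketched as follows. The number of monitors trimmed from each end is an assumed parameter: the text only reports that the 50 monitors were reduced to 32, i.e. 9 from each end of the distribution.

```python
def trim_outlier_monitors(tdown_counts, trim=9):
    """Given monitor -> number-of-Tdown-events, drop the `trim` monitors
    at each end of the distribution (50 - 2*9 = 32 in our data set)."""
    ranked = sorted(tdown_counts, key=tdown_counts.get)
    return ranked[trim:len(ranked) - trim] if trim else ranked
```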

Now we examine the results of event classification. Tables 6.1 and 6.2 show the

statistics for January and February respectively for each event class, including the total

number of events, the average event duration, the average number of updates per event,

and the average number of unique paths explored per event. We exclude Tequal events

from the table since their percentage is negligible. Comparing the results from the two


Figure 6.9: Number of Tdown events per monitor.

          No. of Events   Duration    No. of    No. of
          (×10⁶)          (seconds)   Updates   Paths
Tup            3.39          45.26      2.30      1.59
Tdown          3.35         116.34      4.10      1.95
Tshort         7.37          31.32      1.71      1.27
Tlong          8.04          69.93      2.52      1.62
Tpdist        15.51         174.19      4.66      2.33
Tspath        23.24          38.91      1.52      1.00

Table 6.1: Event Statistics for Jan 2006 (31 days)


          No. of Events   Duration    No. of    No. of
          (×10⁶)          (seconds)   Updates   Paths
Tup            2.88          42.54      2.20      1.54
Tdown          2.85         118.98      4.00      1.90
Tshort         8.09          39.68      2.46      1.51
Tlong          8.94          67.26      2.51      1.70
Tpdist        16.01         190.79      4.80      2.31
Tspath        20.44          30.42      1.44      1.00

Table 6.2: Event Statistics for Feb 2006 (28 days)

[Figure 6.10: Duration of events for January 2006 (CDF; curves for Tup, Tdown, Tshort, Tlong, Tpdist).]

[Figure 6.11: Duration of events for February 2006 (CDF).]


[Figure 6.12: Number of updates per event, January 2006 (CDF).]

[Figure 6.13: Number of unique paths explored per event, January 2006 (CDF).]

[Figure 6.14: Duration of events for unstable prefixes, January 2006 (CDF).]

[Figure 6.15: Duration of events for stable prefixes, January 2006 (CDF).]


months, we note that the values are very close, as can also be observed by comparing

the distributions of event duration in Figures 6.10 and 6.11. Given this similarity, we

will base our following analysis on January data, although the same observations apply

to February.

There are three observations. First, the three high-level event categories in Figure

6.4 have approximately the same number of events: Path-Change events are about 36%

of all the events, Same-Path 34% and Path-Disturbance 30%. Breaking down Path-

Change events, we see that the number of Tdown balances that of Tup, and the number

of Tlong balances that of Tshort. This makes sense since Tdown failures are recovered

with Tup events, and Tlong failures are recovered with Tshort events.

Second, the average durations of the different event types can be ordered as follows: Tshort < Tspath ≈ Tup < Tlong ≪ Tdown < Tpdist.⁴ Figure 6.10 shows the distributions of event durations,⁵ which follow the same order. Note that the shape of the curves is stepwise, with jumps at multiples of approximately 26.25 seconds. The next section will explain that this is due to the MinRouteAdvertisementInterval (MRAI) timer, which controls the interval between consecutive updates sent by a router. The default MRAI timer configuration has an average value of 26.25 seconds, making events last for multiples

of this value. Table 6.1 also shows that Tpdist events have the longest duration, the most

updates and explore the most unique paths. This suggests that Tpdist likely contains two

events very close in time, e.g., a link failure followed shortly by its recovery. A study

[65] on network failures inside a tier-1 provider revealed that about 90% of the failures

on high-failure links take less than 3 minutes to recover, while 50% of optical-related

failures take less than 3.5 minutes to recover. Therefore there are many short-lived

network failures and they can very well generate routing events like Tpdist. On the

⁴ The order of the Tspath and Tshort average durations is inverted in February 2006, even though the values remain very close to each other.

⁵ The Tspath curve is omitted from the figure for clarity.


other hand, Tspath events are much shorter and have fewer updates. This is because Tspath events are likely due to routing changes inside the AS hosting the monitor, and thus do not involve inter-domain path exploration.

Third, among the path changing events, Tdown events last the longest, have the

most updates, and explore the most unique paths. Figures 6.10, 6.12 and 6.13 show

the distributions of event duration, number of updates per event, and number of unique

paths explored per event respectively. The results show that route fail-down events

(Tdown) last considerably longer than route fail-over events (Tlong). In fact, Figure 6.10

shows that about 60% of Tlong events have duration of zero, while 50% of Tdown events

last more than 80 seconds. In addition, Figure 6.12 shows that about 60% of Tlong

events have only 1 update, while about 70% of Tdown events have 3 or more updates.

Figure 6.13 shows that Tdown explore more unique paths than Tlong. These results are

in accordance with our previous analytical results in [73], but contrary to the results

of previous measurement work [52], which concluded that the duration of Tlong events

is similar to that of Tdown and longer than that of Tup and Tshort. In [73] we showed

that the upper bound of Tlong convergence time is proportional to M(P − J), where

M is the MRAI timer value, P is the path length of to the destination after the event,

and J is the distance from the failure location to the destination. Since P is typically

small for most Internet paths, and J could be anywhere between 0 and P , the duration

of most Tlong events should be short. We believe that the main reason [52] reached a

different conclusion is because they conducted measurements by artificially increasing

P to 30 AS hops using AS prepending. The analysis in [73] shows that an overestimate

of P would result in a longer Tlong convergence time, which would explain why they

observed longer durations for beacon prefixes than what we observed for operational

prefixes.


6.3.1 The Impact of Unstable Prefixes

So far we have been treating all destination prefixes in the same way by aggregating

them into a single set in our measurements. However, previous work [79] showed that most routing instabilities affect a small number of unstable prefixes, and popular

destinations (with high traffic volume) are usually stable. Therefore, it might be the

case that the results we just described are biased towards those unstable prefixes, since

these prefixes are associated with more events. In order to verify if this is the case, we

classify each prefix p into one of two classes based on the number of events associated with it. Letting E be the median of the distribution of the number of events per prefix, E(p), we classify each prefix p as (1) unstable if E(p) ≥ E, or (2) stable if E(p) < E. Of the 205,980 prefixes in our set, only 28,954 (or 14%) were classified

as unstable, i.e. 14% of prefixes were responsible for 50% of events. In Figures 6.14

and 6.15 we show the distribution of event duration for unstable and stable prefixes

respectively. Note that not only are these two distributions very similar, but they are

also very close to the original distribution of the aggregate in Figure 6.10. Based on

these observations, we believe there is no significant bias in the aggregated results shown

before.
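The median-based split described above can be sketched in a few lines (illustrative code, not the original measurement scripts):

```python
def split_by_median(events_per_prefix):
    """Split prefixes into unstable (event count >= median) and
    stable (event count < median), as in Section 6.3.1.
    `events_per_prefix` maps prefix -> number of events."""
    counts = sorted(events_per_prefix.values())
    n = len(counts)
    # Median of the per-prefix event counts.
    median = (counts[n // 2] if n % 2 == 1
              else (counts[n // 2 - 1] + counts[n // 2]) / 2)
    unstable = {p for p, c in events_per_prefix.items() if c >= median}
    stable = set(events_per_prefix) - unstable
    return unstable, stable

unstable, stable = split_by_median({"p1": 1, "p2": 2, "p3": 9})
```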

6.4 Policies, Topology and Routing Convergence

In this section we compare the extent of slow convergence across different prefixes

and different monitors to examine the impacts of routing policies and topology on slow

convergence.


6.4.1 MRAI Timer

In order to make fair comparisons of slow convergence observed by different monitors,

we need to be able to tell whether a monitor enables the MRAI timer or not. The BGP

specification (RFC 4271 [78]) defines the MinRouteAdvertisementInterval (MRAI) as

the minimum amount of time that must elapse between two consecutive updates sent

by a router regarding the same destination prefix. The lack of an MRAI timer may lead to significantly more update messages and longer global convergence times [40]. Even though enabling the MRAI timer is a recommended practice, not all routers are configured this way. Since the MRAI timer affects the observed event duration and number of updates, for the purpose of studying the impacts of policies and topology we should only compare MRAI monitors with each other, or non-MRAI monitors with each other, but never MRAI with non-MRAI monitors.

By default the MRAI timer is set to 30 seconds plus a jitter to avoid unwanted synchronization. The amount of jitter is determined by multiplying the base value (e.g., 30 seconds) by a random factor uniformly distributed in the range [0.75, 1]. Assuming routers are configured with the default MRAI values, we should (1) not observe consecutive updates for the same destination prefix spaced by less than 30 × 0.75 = 22.5 seconds, and (2) observe a considerable number of inter-arrival times between 22.5 and 30 seconds, centered around the expected value 30 × (0.75 + 1)/2 = 26.25 seconds.
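A quick simulation of the default jitter confirms these properties (a sketch, not part of the measurement pipeline; the exact mean of 30 · U[0.75, 1] is 26.25 seconds):

```python
import random

def mrai_interval(base=30.0, rng=random):
    """One jittered MRAI interval: the base value times a random factor
    drawn uniformly from [0.75, 1]."""
    return base * rng.uniform(0.75, 1.0)

rng = random.Random(42)  # fixed seed for reproducibility
samples = [mrai_interval(rng=rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)

# All intervals fall in [22.5, 30], and the mean is close to 26.25 s.
assert all(22.5 <= s <= 30.0 for s in samples)
assert abs(mean - 26.25) < 0.1
```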

For each monitor, we define a Non-MRAI Likelihood, LM , as the probability of

finding consecutive updates for the same prefix spaced by less than 22 seconds. Figure

6.16 shows LM for all the 50 monitors in our initial set. Clearly, there are monitors

with very high LM and monitors with very small LM. The curve has a sharp knee, hinting at a clear separation between the two configurations. Based on this, we set LM = 0.05 as

a threshold to differentiate MRAI and non-MRAI monitors. Those with LM < 0.05


[Figure 6.16: Determining MRAI configuration: per-monitor probability of same-prefix update inter-arrival times below 22 seconds and below 10 seconds.]

are classified as MRAI monitors, and those with LM ≥ 0.05 are classified as non-

MRAI monitors. However, there could still be monitors with an MRAI timer configured just slightly below the RFC recommendation, which our method would classify as non-MRAI and therefore exclude. To ensure this was not the case, we

show in Figure 6.16 the curve corresponding to the probability of finding consecutive

updates spaced by less than 10 seconds. We note that the 10-second curve is very close

to the 22-second curve, and therefore we are effectively only excluding monitors that

depart significantly from the 30-second base value of the RFC.
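The non-MRAI likelihood LM can be computed per monitor roughly as follows (an illustrative sketch; `updates` maps each prefix to its sorted update timestamps):

```python
def non_mrai_likelihood(updates, threshold=22.0):
    """Fraction of consecutive same-prefix update pairs spaced by less
    than `threshold` seconds, across all prefixes of one monitor."""
    short, total = 0, 0
    for timestamps in updates.values():
        for prev, cur in zip(timestamps, timestamps[1:]):
            total += 1
            if cur - prev < threshold:
                short += 1
    return short / total if total else 0.0

# One pair 10 s apart (counted), two pairs 30 s apart (not counted).
lm = non_mrai_likelihood({"p1": [0, 10, 40], "p2": [5, 35]})
```

A monitor with lm below 0.05 would then be classified as MRAI-enabled under the threshold chosen above.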

Using this technique, we detect that 15 routers from the initial set of 50 are non-MRAI (see the vertical line in Figure 6.16), and 10 of them are part of the set of 32 routers used in the previous section. We will use the resulting set of 32 − 10 = 22 monitors in the next subsection to compare the extent of slow convergence across monitors.

6.4.2 The Impact of Policy and Topology on Routing Convergence

Internet routing is policy-based. The “no-valley” policy [39], which is based on inter-

AS relationships, is the most prevalent one in practice. Generally, ASes have either provider-customer or peer-peer relationships with their neighbors. In a provider-customer relationship, the customer AS pays the provider AS for access to the rest of the Internet; in a peer-peer relationship, the two ASes freely exchange traffic between

their respective customers. As a result, a customer AS does not forward packets be-

tween its two providers, and a peer-peer link can only be used for traffic between the

two incident ASes’ customers. For example, in Figure 6.19, paths [C E D], [C E F] and

[C B D] all violate the “no-valley” policy and generally are not allowed in the Internet.

Based on AS connectivity and relationships, the Internet routing infrastructure can

be viewed as a hierarchy.

• Core: consisting of a dozen or so tier-1 providers forming the top level of the

hierarchy.

• Middle: ASes that provide transit service but are not part of the core.

• Edge: stub ASes that do not provide transit service (they are customers only).

We collect an Internet AS topology [99], infer inter-AS relationships using the al-

gorithm from [94], and then classify all ASes into these three tiers. Core ASes are

manually selected based on their connectivity and relationships with other ASes [99];

Edge ASes are those that only appear at the end of AS paths; and the rest are middle

ASes. With this classification, we can locate monitors and prefix origins with regard

to the routing hierarchy.

Our set of 22 monitors consists of 4 monitors in the core, 15 in the middle and 3

at the edge. We would like to have a more representative set of monitors at the edge,

but we only found these many monitors in this class with consistent data from the

RouteViews and RIPE data archive. The results presented in this subsection might not

be quantitatively accurate due to the limitation of monitor set, but we believe they still

illustrate qualitatively the impact of monitor location on slow convergence.


In the previous section we showed that, of all path-change events, Tdown events have both the longest convergence time and the most path exploration. Furthermore,

in a Tdown event, the root cause of the failure is most likely inside the destination AS,

and thus all monitors should observe the same set of events. Therefore, the Tdown

events provide a common basis for comparison across monitors and prefixes, where differences in convergence time and number of updates should be most pronounced. In this subsection we examine how the locations of prefix origins and monitors impact the extent of slow convergence.

Figure 6.17 shows the duration of Tdown events seen by monitors in each tier. The

order of convergence time is core < middle < edge, and the medians of convergence

times are 60, 84 and 84 seconds for core, middle and edge respectively. Taking into

account that our edge monitor ASes are well connected: one has 3 providers in the core

and the other two reach the core within two AS hops, we believe that in reality edge

will generally experience even longer convergence times than the values we measured.

Figure 6.18 shows that monitors in the middle and at the edge explore 2 or more paths

in about 60% of the cases, whereas monitors in the core explore at most one path in

about 65% of the cases.

In a Tdown event, the monitor will not finish the convergence process until it has

explored all alternative paths. Therefore, the event duration depends on the number of

alternative paths between the event origin and the monitor. In general, due to the no-valley

policy [39], tier-1 ASes have fewer paths to explore than lower tier ASes. For example,

in Figure 6.19, node D (representing a tier-1 AS) has only one no-valley path to reach

node G (path 4), while node E has three paths to reach the same destination: paths 1,2

and 3. In order to reach a destination, tier-1 ASes can only utilize provider-customer

links and peer-peer links to other tier-1s, but a lower tier AS can also use customer-

provider links and peer-peer links in the middle tier, which leads to more alternative


[Figure 6.17: Duration of Tdown events as seen by monitors at different tiers (CDF; core, middle, edge).]

[Figure 6.18: Number of unique paths explored during Tdown as seen by monitors at different tiers (CDF).]

paths to explore during Tdown events.

We have studied how Tdown events are experienced by monitors in different tiers.

Now we study how the origin of the event impacts the convergence process. Note

that we must again divide the results according to monitor location; otherwise we may introduce bias caused by the fact that most of our monitors are in the middle tier. We use the notation x → y, where x is the tier in which the Tdown event originates and y is the tier of the monitor that observes the event. In our measurements, the convergence times of each x → y case were close to those of the corresponding y → x case. Therefore, of each such pair we only show the case for which we have a higher percentage of monitors. For instance, between the core → edge and edge → core cases we only show the latter, since our monitor set covers about 27% of the core but

only a tiny percentage of the edge. Figure 6.20 shows the duration of Tdown events for

prefixes originated and observed at different tiers. We omit the cases middle → core

and middle → middle for clarity of the figure, since they almost overlap with the edge → core and edge → middle curves respectively. The figure shows that the core → core case is the fastest, and the edge → middle and edge → edge cases are the slowest.


[Figure 6.19: Topology example: nodes A-G arranged in core, middle, and edge tiers, with provider-customer and peer-peer links and candidate paths 1-4.]

[Figure 6.20: Duration of Tdown events observed and originated in different tiers (CDF; core → core, edge → core, edge → middle, edge → edge).]

[Figure 6.21: Number of paths explored during Tdown events observed and originated in different tiers (CDF).]

                   Tdown duration (s)
core → core        54
middle → core      60
edge → core        61
middle → middle    83
edge → edge        85
edge → middle      87

Figure 6.22: Median duration of Tdown events observed and originated in different tiers.


This observation is also confirmed by Figure 6.21, which shows the number of paths explored during Tdown. Figure 6.22 lists the median durations of Tdown events originated and observed at different tiers. Events observed by the core have the shortest durations, which confirms our previous observation (Figure 6.17). Note that the edge → edge convergence is slightly faster than the edge → middle convergence. We believe this happens because, as mentioned before, our edge monitors are very close to the core. Therefore, they may not observe as much path exploration as the middle monitors, which may have a number of additional peer links to reach other edge nodes without going through the core.

Note that we expect that the edge → edge case reflects most of the slow routing

convergence observed in the Internet because about 80% of the autonomous systems

in the Internet are at the edge, and about 68% of the Tdown events are originated at the

edge, as will be shown in the next subsection.

6.4.3 Origin of Fail-down Events

We will now examine where Tdown events originate in the Internet hierarchy. Since we expect the set of Tdown events to be common to all 32 monitors in our data set (Section 6.3), in this subsection we use a single monitor, the router 144.228.241.81 from Sprint. Similar results are obtained from other monitors.

Because our data set spans a one-month period, we do not know whether any high-impact event triggered an abnormal number of Tdown failures during this time, which could bias our results if we simply used daily or hourly counts. Instead, Figure 6.23 plots the cumulative number of Tdown events observed by the monitor during January 2006, at one-second granularity. The cumulative number of events grows linearly, at an approximately constant rate of 3,600 Tdown events per day. This uniform distribution along the time dimension also seems to suggest that


[Figure 6.23: Number of Tdown events over time during January 2006 (empirical curve and linear fit).]

most fail-down events have a random nature.

Table 6.3 shows the breakdown of Tdown events by the tier in which they originate. We observe that about 68% of the events originate at the edge. However, the edge also announces 56% of the prefixes. Therefore, in order to assess

the stability of each tier, and since our identification of events is based on prefix, a

simple event count is not enough. A better measure is to divide the number of events

originated at each tier by the total number of prefixes originated from that tier. The

row "No. of events per prefix" in Table 6.3 shows that if the core originates n events per prefix, the middle originates 2n and the edge originates 3n such events, yielding the interesting proportion 1:2:3. This seems to indicate that, generally, prefixes in the

middle are twice as unstable as prefixes in the core, and prefixes at the edge are three

times as unstable as prefixes in the core.

6.4.4 Impact of Fail-down Convergence

The ultimate goal of routing is to deliver data packets. One may argue that although

Tdown events have the longest convergence time, they do not make the performance of

data delivery worse because the data packets would be dropped anyway if the prefix is

unreachable. However, this is not necessarily true. In the current Internet, sometimes


                            Core     Middle   Edge
No. of events               3,011    34,514   78,149
No. of prefixes originated  14,367   81,988   122,877
No. of events per prefix    0.21     0.42     0.63

Table 6.3: Tdown Events by Origin AS

the same destination network can be reached via multiple prefixes. Therefore, the fail-

ure to reach one prefix does not necessarily mean that the destination is unreachable,

because the destination may be reachable via another prefix.

Figure 6.24 shows a typical example. Network A has two providers, B and C.

To improve the availability of its Internet access, A announces prefix 131.179/16 via

B and prefix 131.179.100/24 via C. In this case, 131.179/16 is called the “covering

prefix” [66] of 131.179.100/24. As routing is done by longest prefix match, data traffic

destined to 131.179.100/24 normally takes link A-C to enter network A. When link

A-C fails, ideally, data traffic should switch to link A-B quickly with minimal damage

to data delivery performance. However, the failure of link A-C will result in a Tdown

event for 131.179.100/24. Before the convergence process completes, routers will

keep trying obsolete paths to 131.179.100/24, rather than switching to paths towards

131.179/16. This can result in packet lost and long delays, which probably will have

serious negative impacts on data delivery performance.

We analyzed routing tables from RouteViews and RIPE monitors to see how common the scenario illustrated in Figure 6.24 is. The results show that routing announcements like the one in Figure 6.24 are a common practice in the Internet. In the

global routing table, 50% of prefixes have covering prefixes being announced through


[Figure 6.24: Case where Tdown convergence disrupts data delivery: network A announces 131.179/16 via provider B and 131.179.100/24 via provider C; the link A-C fails.]

a different provider, and are therefore vulnerable to the negative impacts caused by

fail-down convergence. A recent study [47] showed that about 50% of VOIP glitches

as perceived by end users may be caused by BGP slow convergence.


CHAPTER 7

Prefix Hijacking and Internet Topology

In this chapter we study how hijack attacks launched from different places in the network have different impact, and how this impact depends on both the location of the attacker and the location of the target network.

7.1 Prefix Hijacking

A prefix hijack occurs when an AS announces prefixes that it does not own. Now,

suppose AS-6 wrongly announces the prefix that belongs to AS-1, as shown in Figure

7.1. Note that AS-5 previously routed through AS-3 to reach AS-1. On receiving a

customer route through AS-6, it prefers the customer route over the peer route and

hence believes the false route. This is an example of a prefix hijack, in which a false

origin AS-6 announces a prefix it does not own, and deceives AS-5. In current routing

practice, it is difficult for an AS to differentiate between a true origin and a false origin.

Even though Internet Routing Registries (IRR) provide databases of prefix ownership,

their contents are not kept up to date, and not all BGP routers check these databases. Hence, when presented with multiple paths to reach the same prefix, a BGP

router will often choose the best path regardless of who originates this prefix, thus

allowing hijacked routes to propagate through the Internet. Prefix hijacks can be due to

malicious attacks or router mis-configurations. When legitimate data traffic is diverted

to the false origin, the data may be discarded, resulting in a traffic blackhole, or be


exploited for malicious purposes. A recent study [76] reported that some spammers

hijack prefixes in order to send spam without revealing their network identities.

The hijack depicted in Figure 7.1 is called a false origin prefix hijack, where an

AS announces the exact prefix owned by another AS. Another type of hijack, called

sub-prefix hijack, involves an AS announcing a more specific prefix (e.g. hijacker

announces a /24, when the true origin announces a /16). In this case, BGP routers will

usually treat them as different prefixes and maintain two separate entries in routing

tables. However, due to longest prefix matching in routing table lookups, data destined

to IP addresses in the /24 range will be forwarded to the false origin, instead of the true

origin. Prefix hijack can also involve a false AS link advertised in the AS path without

a change of origin. Our aim in this chapter is to understand that, how the topological

characteristics of two AS nodes announcing the same prefix influence the impact of

the hijack. Studying the impact of sub-prefix hijacks and false link hijacks involves

different considerations and is beyond the scope of this study. In the rest of this thesis,

we use the term prefix hijack to refer to false-origin prefix hijacks. We call the AS announcing a prefix it does not own the false origin, and the AS whose prefix is being attacked the true origin. Upon receiving the routes from both the false origin

as well as the true origin, an AS that believes the false origin is said to be deceived,

while an AS that still routes to the true origin is said to be unaffected.

7.2 Hijack Evaluation Metrics

For our simulations, we model the Internet topology as a graph, in which each node

represents an AS, and each link represents a logical relationship between two neighbor-

ing AS nodes. Note that two neighboring nodes may have multiple physical links between them. However, BGP paths are represented in the form of AS-AS links, and hence we abstract the connections between two AS nodes as a single logical link. For


[Figure 7.1: Hijack scenario: the prefix of true origin AS-1 is also announced by false origin AS-6; nodes AS-1 through AS-6 are connected by provider-customer and peer-peer links, with tier-1 nodes marked.]

simplicity, each node owns exactly one unique prefix, i.e. no two nodes announce the

same prefix except during hijack. A prefix hijack at any given time involves only one

hijacker, and the hijacker can target only one node.

To capture the interaction between the entities involved in a hijack, we introduce a

variable β(a, t, v), a function of the false origin a, the true origin t, and a node v:

    β(a, t, v) = 1 if node v is deceived by false origin a
                 for true origin t's prefix,
                 and 0 otherwise.                              (7.1)

Due to the rich connectivity in Internet topology, a node often has multiple equally

good paths to reach the same prefix. Figure 7.1 shows a case where AS-4 has three

equally good paths to reach the same prefix, two to the true origin AS-1 (through AS-2

and AS-3), and one to the false origin AS-6. In our model, we assume a node will

break the tie randomly. Therefore, we define the expected value of β as follows. Let

p(v, n) be the number of equally preferred paths (e.g. same policy, same path length)

from the node v to node n. E.g., in Figure 7.1, p(4, 1) = 2 since AS-4 has two paths

via AS-2 and AS-3 to reach AS-1, and p(4, 6) = 1 since AS-4 has only one route via

AS-5 to reach AS-6. If nodes use random tie-break to decide between multiple equally


good preferred paths, then the expected value of β is defined as:

    β(a, t, v) = p(v, a) / (p(v, a) + p(v, t))                 (7.2)

yielding β(6, 1, 4) = 1/3 for the example in the figure. β is the probability of a node v being deceived by a given false origin a announcing a route belonging to true origin t.
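Equation (7.2) translates directly into code; checking it against the Figure 7.1 example (the function name is ours):

```python
def beta(p_va, p_vt):
    """Expected value of beta(a, t, v) from Eq. (7.2): the probability
    that node v is deceived, given p(v, a) equally good paths to the
    false origin and p(v, t) equally good paths to the true origin."""
    return p_va / (p_va + p_vt)

# Figure 7.1 example: p(4, 6) = 1 path to the false origin and
# p(4, 1) = 2 paths to the true origin, so beta(6, 1, 4) = 1/3.
assert abs(beta(1, 2) - 1/3) < 1e-9
```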

Impact

We use the term impact to measure the attacking power of a node launching prefix

hijacks. We define impact of a node a as the fraction of the nodes that believe the false

origin a during an attack on true origin t. More formally, the impact of a node a is

given by:

    I(a) = ( Σ_{t∈N} Σ_{v∈N} β(a, t, v) ) / ((N − 1)(N − 2))   (7.3)

Note that the outer sum is over the N − 1 true origins (we exclude the false origin) and the inner sum is over N − 2 nodes (excluding both the false origin and the true origin).

Resilience

We use the term resilience to measure the defensive power of a node against hijacks

launched against its prefix. We define the resilience of a node t as the fraction of nodes

that believe the true origin t given an arbitrary hijack against t. More formally, the

node resilience R(t) of a node t is given by:

    R(t) = ( Σ_{a∈N} Σ_{v∈N} β(t, a, v) ) / ((N − 1)(N − 2))   (7.4)

Note that higher R(t) values indicate better resilience against hijacks, and higher I(a)

values indicate higher impact as an attacker.
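Given β values for every (false origin, true origin, observer) triple, impact and resilience follow directly from Equations (7.3) and (7.4). A sketch over a precomputed β table (the data layout and toy values are our own, purely illustrative):

```python
def impact(beta_table, nodes, a):
    """Eq. (7.3): average of beta(a, t, v) over all true origins t != a
    and all observers v not in {a, t}. `beta_table` maps (a, t, v) -> value,
    with missing entries treated as 0."""
    n = len(nodes)
    total = sum(beta_table.get((a, t, v), 0.0)
                for t in nodes if t != a
                for v in nodes if v not in (a, t))
    return total / ((n - 1) * (n - 2))

def resilience(beta_table, nodes, t):
    """Eq. (7.4): the same average with the roles of true and false
    origin swapped, so R(t) equals I(t) in this model."""
    return impact(beta_table, nodes, t)

nodes = ["A", "B", "C", "D"]
# Toy table: every observer is deceived with probability 1 whenever A
# announces another node's prefix; all other entries default to 0.
table = {("A", t, v): 1.0 for t in nodes for v in nodes
         if t != "A" and v not in ("A", t)}
assert abs(impact(table, nodes, "A") - 1.0) < 1e-9
```

The shared implementation makes the observation of the next subsection concrete: a node's resilience as a true origin equals its impact as a false origin.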


Relation between Impact and Resilience

The true origin t and the false origin a compete with each other to make nodes in the Internet route to themselves. For example, in Figure 7.1, false origin AS-6 is hijacking a

prefix belonging to true origin AS-1. In this case, only AS-5 believes the false origin

and AS-4 has a 1/3 chance of being deceived. Therefore, the probability that a node believes the false origin AS-6 when it hijacks AS-1 is given by (1 + 1/3)/4 = 1/3.

Now if AS-1 were to hijack a prefix belonging to AS-6, then AS-5 would still believe AS-6 and AS-4 would believe it with probability 1/3. Thus, in this case, the probability that a node believes the true origin AS-6 when it is hijacked by AS-1 is also (1 + 1/3)/4 = 1/3.

We see that the resilience of the node as a true origin is equal to its impact as

a false origin. We note that in our model, when the roles of attacker and target are

switched, the impact of a node becomes its resilience. In the rest of the thesis, we

focus on resilience, while keeping in mind that a highly resilient node can also cause

high impact as a false origin.

7.3 Evaluating Hijacks

In this section, we aim to understand the topological resilience of nodes against prefix

hijacks by performing simulations on an Internet derived topology. We first explain

the simulation setup, followed by the main results of our simulation and the insight

behind the results.

7.3.1 Simulation Setup

For our simulations, we use an AS topology collected from BGP routing tables and

updates, representing a snapshot of the Internet as of Feb 15 2006 (available from


[98]). The details of how this topology was constructed are described in [99]. Our

topology consists of 22,467 AS nodes and 63,883 links. We assume each AS node

owns and announces a single prefix to its neighbors. We classify AS nodes into three

tiers: Tier-1 nodes, transit nodes, and stub nodes. To choose the set of Tier-1 nodes, we

started with a well known list, and added a few high degree nodes that form a clique

with the existing set. Nodes that are not Tier-1s but provide transit service to other AS nodes are classified as transit nodes, and the remaining nodes are classified as stub

nodes. This classification results in 8 Tier-1 nodes, 5,793 transit nodes, and 16,666 stub

nodes. We classify each link as either customer-provider or peer-peer using the PTE algorithm [39] and use the no-valley prefer-customer routing policy to infer routing paths (also used in previous works such as [95]). We abstracted the router decision process into the following priorities: (1) local policy based on relationship, (2) AS path length, and (3) a random tie-breaker.
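This abstracted decision process can be sketched as a comparison key, roughly as follows (the relationship ranks, the `best_route` helper, and the sample routes are hypothetical simplifications, not the simulator's actual code):

```python
import random

# Abstracted decision process from the simulations: prefer by business
# relationship (customer over peer over provider), then by shorter AS
# path, then break remaining ties at random.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def best_route(routes, rng=random.Random(0)):
    """routes: list of (relationship, as_path); returns the chosen route."""
    return min(routes, key=lambda r: (PREFERENCE[r[0]], len(r[1]), rng.random()))

# A customer route wins even against a shorter peer route, which is the
# behavior behind the tier-1 vulnerability discussed later.
assert best_route([("peer", [4]), ("customer", [1, 27506])]) == ("customer", [1, 27506])
```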

Of the 22,467 AS nodes in our topology, we randomly picked 1,000 AS nodes

to represent false origins that would launch attacks on other AS nodes. We checked

the degree distribution of this set of 1,000 AS nodes, and found it to be similar to

the degree distribution of all the AS nodes. For each of the 22,467 AS nodes as a

true origin, we simulated a hijack with the 1,000 false origins. Thus we simulated

22,467 × 1,000 ≈ 22.5 million hijack scenarios in total.
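The tier classification used in this setup can be sketched as follows (a simplified version; the toy topology and the `classify` helper are illustrative, with provider links standing in for the inferred customer-provider relationships):

```python
# Sketch of the tier classification: a seed Tier-1 clique, transit nodes
# (non-Tier-1 nodes that have at least one customer), and stubs.
def classify(nodes, tier1, providers_of):
    """providers_of: customer AS -> set of its provider ASes."""
    has_customer = {p for provs in providers_of.values() for p in provs}
    return {n: "tier1" if n in tier1
            else "transit" if n in has_customer
            else "stub"
            for n in nodes}

# Toy topology: 701 is Tier-1, AS-1 provides transit to stub AS-7.
tiers = classify([701, 1, 7], tier1={701}, providers_of={7: {1}, 1: {701}})
assert tiers == {701: "tier1", 1: "transit", 7: "stub"}
```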

7.3.2 Characterizing Topological Resilience

Figure 7.2 shows the distribution of the resilience (average curve) for all the nodes

in our topology from our simulated hijacks. Since the resilience of each node results

from the average over 1,000 attackers, we also show the standard deviation range.

Note that higher values indicate greater resilience against hijacks.

This distribution shows that node resilience varies fairly linearly except at the two


Figure 7.2: Distribution of node resilience.


Figure 7.3: Resilience of nodes in different tiers.

extremes. Figure 7.2 also shows that the deviations at the two extremes are quite small

compared to the middle, indicating that some nodes (top left) are very resilient against

hijacks, while some others (bottom right) are easily attacked, regardless of the location

of the false origin.

As a first step in understanding how different nodes differ in their resilience, we

classify nodes into the three classes already described (tier-1, transit, and stub) and plot the average resilience distribution (CDF) of each class of nodes in Figure 7.3. We

observe that the resilience distribution is very similar for transits and stubs, with transit

nodes being a little more resilient than stubs.

In contrast, tier-1 nodes show a very different distribution from the stubs and tran-


sits. From Figure 7.3 we observe that all the tier-1 nodes have an average resilience

value between 0.4 and 0.5. In addition, we note that about 40% of stubs and 55%

of transit nodes are more resilient than all tier-1 nodes. With tier-1 nodes being the

ones with the highest degree, it is surprising to see that close to 50% of the nodes

in the Internet are more resilient than tier-1s. Next, we explain why tier-1 nodes are

more vulnerable to hijacks than a lot of other nodes and generalize this explanation to

understand the characteristics impacting resilience.

7.3.3 Factors Affecting Resilience

We first understand the resilience of tier-1 nodes with a simple hijack scenario in Fig-

ure 7.4. AS-2, AS-3, AS-4 and AS-5 represent 4 tier-1 nodes inter-connected through

a peer-peer relationship. AS-1 and AS-6 are small ISPs connected to tier-1 AS nodes

through a customer-provider relationship. Finally AS-7 is a multi-homed customer of

AS-1 and AS-6. In Figure 7.4, AS-7 represents the false origin that hijacks a prefix

belonging to a tier-1 node, AS-4.

Recall that in the no-valley prefer-customer policy, a customer route is preferred over a peer route, which in turn is preferred over a provider route. When AS-7 hijacks AS-4's prefix and announces the false route to AS-1 and AS-6, both AS-1 and AS-6 prefer the hijacked route over the genuine route to AS-4 since it is a customer route. AS-1 in

turn announces the hijacked route to its tier-1 providers AS-2 and AS-3. These tier-1

AS nodes, AS-2 and AS-3, now have to choose between a customer route through AS-1 (the hijacked route) and a peer route through AS-4 (the genuine route). Again, due to policy

preference, the tier-1 nodes will choose the customer route which happens to be the

hijacked route. Similarly, AS-5 will also choose the hijacked route. Once big ISPs like

tier-1 nodes are deceived by the hijacker, their huge customer base (many of whom are

single homed) are also deceived, thus causing a high impact. One can see from this example that the main reason for the low resilience in the case of a hijack on a tier-1 node is that tier-1 nodes inter-connect through peer-peer relationships, thus rendering a genuine route less preferred by other tier-1 nodes than hijacked routes from customers.

The key to high resilience is to make the tier-1 nodes and other big ISPs always

believe the true origin. The way to achieve this is to reach as many tier-1 nodes as

possible using a provider route. In addition, when a node has to choose between two

routes of the same preference, path length becomes a deciding factor, and thus the

shorter the number of hops to reach the tier-1 nodes, the better the resilience. From

our observations from simulation results, we found that the most resilient nodes are

direct customers of many tier-1 nodes and other big ISPs. As an example, in our

simulations, the node with the highest resilience is a stub (AS-6432, DoubleClick) directly connected to 6 tier-1 nodes, with a resilience value of 0.95. The nodes with the lowest resilience were single-homed customers connected to poorly connected providers.

To better understand the influence of tier-1 nodes, we classified the nodes in the

Internet based on the number of direct tier-1 providers. Figure 7.5 shows the distribution of resilience for nodes with different connectivity to Tier-1s. Note that the closer the curve is to the right-hand side of the figure (x=1), the better the resilience of that set of nodes. There are about 21,888 nodes with fewer than 3 connections to Tier-1s, and

we observe in Figure 7.5 that these nodes are the least resilient. A total of 379 nodes

are directly connected to 3 Tier-1s and 104 nodes are connected to 4 Tier-1s. Only

88 nodes are connected to more than 4 Tier-1s, and these nodes prove to be the most

resilient, highlighting the role of connecting to multiple tier-1 nodes.
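The grouping behind Figure 7.5 can be sketched as follows (node IDs and resilience values are made up for illustration):

```python
from collections import defaultdict

# Group nodes by their number of direct Tier-1 providers and average
# resilience within each group, mirroring the buckets in Figure 7.5.
def resilience_by_tier1_degree(tier1_providers, resilience):
    """tier1_providers: node -> number of direct Tier-1 providers;
    resilience: node -> resilience value in [0, 1]."""
    groups = defaultdict(list)
    for node, k in tier1_providers.items():
        bucket = "<3" if k < 3 else "=3" if k == 3 else "=4" if k == 4 else ">4"
        groups[bucket].append(resilience[node])
    return {b: sum(v) / len(v) for b, v in groups.items()}

avg = resilience_by_tier1_degree({10: 1, 11: 2, 12: 5},
                                 {10: 0.2, 11: 0.4, 12: 0.9})
assert abs(avg["<3"] - 0.3) < 1e-9 and avg[">4"] == 0.9
```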

Summary: In this section, we used an Internet scale topology with no-valley prefer

customer policy routing to evaluate the resilience of nodes against random hijackers.

The key to achieving high resilience is to protect tier-1 nodes and other big ISPs from

being deceived by the hijacker. Our main result shows that the nodes that are direct


Figure 7.4: Understanding resilience of tier-1 nodes


Figure 7.5: Resilience of nodes with different number of Tier-1 providers.


customers of multiple tier-1 nodes are the most resilient to hijacks. On the other hand,

the tier-1 nodes themselves in spite of being so well connected, are much less resilient

to hijack. The next question we seek to answer in Section 7.4 is whether there is

evidence of such behavior in reality, where the routing decision process is much more

complex.

7.4 Prefix Hijack Incidents in the Internet

In this section we examine two hijack events, one from January 2006 which affected

a few tens of prefixes, and the other from December 2004 when over 100,000 prefixes

were hijacked. To gauge the impact of the prefix hijacks, we analyzed the BGP routing

data collected by the Oregon collector of the RouteViews project. The Oregon collec-

tor receives BGP updates from over 40 routers. These 40 routers belong to 35 different

AS nodes (a few AS nodes have more than one BGP monitor) and we consider an AS

as deceived by a hijack if at least one BGP monitor from that AS believes the hijacker.

We call these 35 AS nodes monitors, as they provide BGP monitoring information

to the Oregon collector. The impact of a hijack is then gauged by the ratio of monitors

in the Internet that were deceived.
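This monitor-based impact metric can be sketched as follows (the monitor names and the monitor-to-AS mapping are hypothetical):

```python
# Impact metric used in this section: the fraction of monitored ASes in
# which at least one BGP monitor believes the false origin.
def hijack_impact(monitor_origin, monitor_as, false_origin):
    """monitor_origin: monitor -> origin AS it reports for the prefix;
    monitor_as: monitor -> AS the monitor belongs to."""
    deceived_ases = {monitor_as[m] for m, o in monitor_origin.items()
                     if o == false_origin}
    return len(deceived_ases) / len(set(monitor_as.values()))

# Two monitors in AS-100 (one deceived) and one clean monitor in AS-200:
# AS-100 counts as deceived, so the impact is 1/2.
origins = {"m1": 27506, "m2": 20282, "m3": 20282}
owner = {"m1": 100, "m2": 100, "m3": 200}
assert hijack_impact(origins, owner, false_origin=27506) == 0.5
```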

7.4.1 Case I: Prefix Hijacks by AS-27506

On January 22, 2006, AS-27506 announced a number of prefixes that did not belong

to it. This hijack incident was believed to be due to operational errors, and most of

the hijacked prefixes were former customers of AS-27506. We observed a total of

40 prefixes being hijacked by AS-27506. These 40 prefixes belonged to 22 unique

ASes. We present two representative prefixes; for the first prefix the false origin could

only deceive a small number of monitors, while for the second prefix the false origin


deceived the majority of the monitors. We examine the topological connectivity of the

true origins as compared to that of the false origin and the relation to the true origin’s

resiliency.

7.4.1.1 High Resiliency against Hijack

We examine a hijacked prefix that belongs to the true origin AS-20282. The impact

of hijacking this prefix is just over 10%, that is, 4 out of the 35 monitored ASes were

deceived by the hijack. Figure 7.6(a) depicts the connectivity of some of the entities

involved in this hijack incident. The nodes colored in gray are the nodes deceived by

the false origin AS-27506, and the white nodes persisted with the true origin. The

true origin AS-20282 is a direct customer of two tier-1 nodes, AS-701 and AS-3356.

Before the hijack incident, all the 35 monitors used routes containing one of these two

tier-1 ASes as the last hop in the AS path to reach the prefix. The hijacker AS-27506

is a customer of AS-2914, another tier-1 node. When AS-27506 hijacked the prefix,

AS-2914 chose the false customer route from AS-27506 over an existing peer route

through AS-701. The false route was further announced by AS-2914 to other tier-1

peers including AS-701 and AS-3356, however neither of them adopted the new route

because they chose the customer route announced by the true origin AS-20282. Other

tier-1 ASes, such as AS-1239 (not shown in the figure), did not adopt the false route

from AS-2914 either, most likely because the newly announced false route was 2 hops

in length, the same as that of their existing route through AS-701 or AS-3356, and

the recommended practice suggests to avoid unnecessary best path transitions between

equal external paths [36]. However, we note that AS-3130, which is a customer of both a deceived and an unaffected tier-1 provider, was also deceived, possibly because the new path {2914, 27506} is shorter than the original path, which contained 3 AS hops.

(a) High resiliency: Tier-1 provider 2914 preferred the customer route to false origin 27506 instead of the peer route. Similarly, tier-1 providers 701 and 3356 stayed with their customer routes to the true origin 20282. Other tier-1 providers like X received a peer route to the false origin that was no better than their existing route and did not change routes. 3130 routed to the false origin since the route via one of its providers, 2914, was shorter.

(b) Low resiliency: Tier-1 providers like Y with a customer route to true origin 23011 were not deceived by the false origin. Other tier-1 providers like X received a shorter peer route through 2914 and hence routed to the false origin. 286 preferred the shorter route to 27506 via 2914 and was deceived.

Figure 7.6: Case study: AS-27506 as false origin

7.4.1.2 Low Resiliency against Hijack

Next, we examine another hijacked prefix which belonged to AS-23011. The average

impact of this hijacked prefix is 0.6, i.e. 21 out of the 35 monitors were deceived by

the hijack. Figure 7.6(b) shows the most relevant entities involved in this prefix hijack.

The true origin of this prefix was an indirect customer of 5 tier-1 ASes (not all of them

are shown in the figure) through its direct providers AS-12006 and AS-10910. The

connectivity of the hijacker is the same as before, and AS-2914 was deceived by the

hijack. The 5 tier-1 ASes on the provider path of the true origin stayed with the route


from the true origin AS-23011, however the rest of the tier-1 ASes were deceived this

time, possibly because the peer route to false origin through AS-2914 was shorter than

any other peer route to the true origin. AS-286 is a customer of the providers of both

the true and false origins, and it picked the false route through AS-2914 because it was

shorter. We note that, in this case, the true origin being an indirect customer of multiple

tier-1 ASes ensured that those tier-1 ASes themselves did not get deceived, however

due to its longer distance to reach these tier-1 providers (compared to the true origin

in Figure 7.6(a)), other tier-1 ASes and their customers chose the shorter route to the

false origin.

One of the tier-1 providers that propagated the false route is known to verify the

origin of received routes with the Internet Routing Registries (IRR). However, it did not

block the hijack because the registry entries were outdated and still listed AS-27506

as an origin for the hijacked prefixes, and hence the hijack announcements passed the

registry check.

7.4.2 Case II: Prefix Hijacks by AS-9121

In this hijack incident, operational errors led AS-9121 to falsely announce routes to

over 100,000 prefixes on December 24, 2004. We use this case to evaluate the re-

siliency of tier-1 ASes as compared to that of direct customers of multiple tier-1 ASes.

Due to the large number of prefixes being falsely announced, some BGP protection

mechanisms such as prefix filters and maximum prefix limit, where an AS sets an up-

per limit on the number of routes a given neighbor may announce, were triggered and

affected the overall impact. Given that multiple factors were involved in such

a large scale hijack event, it is difficult to accurately model the impact on an AS as a

function of its topological connectivity. Our objective in examining this case is to find

supporting evidence for our observations made in Section 7.3, as opposed to a detailed


study over all the hijacked prefixes. Similar to Case I, we observed how many moni-

tors were deceived for each hijacked prefix and used this result to gauge the resiliency

of the true origin AS.

7.4.2.1 Hijacked Tier-1 AS Prefix

In order to understand how tier-1 ASes fared against AS-9121 hijack, we studied the

impact of those hijacked prefixes that belonged to AS-7018, a tier-1 AS. Note that

AS-7018 announced over 1500 prefixes, and the impacts of different prefixes varied

noticeably, with around 7 to 8 monitors being deceived for most prefixes. For our case

study, we examine one of the hijacked prefixes which deceived the majority of the

monitors. Figure 7.7(a) shows the entities involved in the hijack of this tier-1 prefix.

The hijacker AS-9121 was connected to 3 providers, one of which was AS-1239,

a tier-1 AS. The true origin of the prefix in question was AS-7018, another tier-1 AS.

The grey nodes in the figure indicate those deceived by the hijack. All the 3 providers

of AS-9121, namely AS-1239, AS-6762, and AS-1299 were deceived into believing

the false origin. AS-1299 also propagated the false route to its tier-1 AS providers.

From our observations, a total of 19 out of 35 monitors were deceived by this hijack.

7.4.2.2 Hijacked Prefix belonging to Customer of Tier-1s

Next, we see how the AS-9121 hijack incident affected the prefixes belonging to an

AS that was a direct customer of multiple tier-1 ASes. We picked AS-6461 as an

example here because it connected to all the 8 tier-1 ASes. AS-6461 announced over

100 prefixes, 87 of which were hijacked by AS-9121. No more than 2 monitors were

deceived by the false origin for any of the hijacked prefixes. Figure 7.7(b) shows the entities

involved in the hijack of one of the prefixes belonging to AS-6461. As before, AS-

6762 believed the false origin and was one of the monitors deceived for all the hijacked

(a) Tier-1 prefix hijacked: Tier-1 providers like 1239 and X preferred the customer route to false origin 9121 instead of the peer route to the true origin 7018, also a tier-1.

(b) Multi-homed customer of tier-1s hijacked: Providers of false origin 9121 got deceived, but all tier-1s, including 1239, stayed with the one-hop customer route to true origin 6461.

Figure 7.7: Case studies with AS-9121 as false origin

prefixes of AS-6461. However, because all the tier-1 ASes were direct providers of

AS-6461, they stayed with the original one-hop customer route to the true origin; in

particular, note that AS-1239 was a provider for both the true origin and the hijacker,

and it stayed with the original correct route. As a result, the hijack of AS-6461’s

prefixes made a very low impact.

In addition to AS-6461, we also studied the impacts of prefixes belonging to a few

other transit ASes that were very well connected to tier-1 ASes, and found the impact

pattern for their prefixes to be very similar to the AS-6461 case. To summarize, this

real life hijack event showed strong evidence that direct multi-homing to all or most

tier-1 ASes can greatly increase an AS’s resiliency against prefix hijacks.


7.5 Discussion

It has been long recognized that prefix hijacking can be a serious security threat to the

Internet. Several hijack prevention solutions have been proposed, such as SBGP [46],

so-BGP [69], and more recently the effort in the IETF Secure Inter-Domain Routing

Working Group [16]. These proposed solutions use cryptographic-based origin au-

thentication mechanisms, which require coordinated efforts among a large number of

organizations and thus will take time to get deployed. Meanwhile prefix hijack inci-

dents occur from time to time and our work provides an assessment of the potential

impacts of these incidents. Several hijack detection systems have also been developed,

for example MyASN [80] and PHAS [54]. However, since these systems are reactive in

nature, it is still important for network customers to understand the relations between

their networks' topological connectivity and the potential vulnerability in the face of prefix

hijacks.

Our simulation and analysis show that AS nodes with large node-degrees (e.g.,

tier-1 networks) are not the most resilient against hijacks of their own prefixes. An AS

can gain high resiliency against prefix hijacks by being direct or indirect customers

of multiple tier-1 providers with the shortest possible AS paths. Conversely, such

customer AS nodes can also make the most impact over the entire Internet, if they

inject false routes into the Internet. This finding suggests that securing the routing

announcements from the major ISPs alone is not effective in curbing a high impact

attack, and that it is even more important to watch the announcements from lower-tier

networks with good topological connectivity.

On the other hand, customer networks that are far away from their indirect tier-1

providers can be greatly affected if their prefixes get hijacked. These topologically

disadvantaged AS nodes are in the most need for investigating other means to pro-

tect themselves. Subscribing to prefix hijack detection systems, such as MyASN and


PHAS, would be helpful. To reduce the transient impact during the detection delay,

one may also look into another proposed solution called PGBGP [45], which is briefly

described in Section 8.

Note that the topological connectivity required for resiliency against prefix hijacks

is different from that required for fast routing convergence [71]. Fast convergence

benefits from fewer alternative paths when the routes change, thus prefixes announced

by tier-1 providers meet this requirement well, while hijack resiliency benefits from

being a direct or indirect customer of a large number of tier-1 providers, thus prefixes

are better hosted by well connected non-tier-1 AS nodes.

We would like to end this discussion by stressing the importance of understand-

ing prefix hijack impacts, even when the protection mechanisms are put in place. Our

evaluations on an Internet scale topology in Section 7.3 used a no-valley prefer cus-

tomer routing policy and showed that tier-1 AS nodes are not very resilient to hijacks

of their own prefixes since other tier-1 AS nodes prefer customer routes to the false origin.

However, in reality a tier-1 AS may use various mechanisms, such as Internet Routing

Registries (IRR), to check the origin of a prefix before forwarding the route. Such

mechanisms would probably boost the resiliency of tier-1 AS nodes against being hijacked.

On the other hand, these protection mechanisms can also fail or backfire, thus expos-

ing the vulnerability of a network. As we saw in Case I of Section 7.4, most of the hijacked prefixes belonged to former customers of the false origin AS and were recorded

in the Internet Routing Registry (IRR), which was not updated. Outdated registries

resulted in false routes being propagated to the rest of the Internet.

Another example of a protection mechanism is the maximum prefix filter in BGP

that allows an AS to configure the maximum number of routes received from a neigh-

bor. Thus, by limiting the total number of routes received from a neighbor, an AS

can limit the damage in case of the neighbor announcing false routes. In case II from


Section 7.4, AS-9121 announced over 100,000 false routes and one of its neighbors,

AS-1299, had a maximum prefix limit set to a relatively low value. AS-1299 accepted only 1,849

routes directly from AS-9121, but since the max prefix limit is per neighbor, AS-

1299 received hijacked routes from other neighbors as well. It learned a total of over

100,000 bad routes from all the neighbors combined, thus infecting a major portion

of its routing table [74]. These examples show how easily protection mechanisms can

fail due to human errors, underlining the need to understand the impact of hijacks in the face of protection failures, and the need to protect networks by multiple means such as

PGBGP and PHAS.
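A small sketch illustrates why a per-neighbor maximum-prefix limit alone does not cap the total damage (the cap and route counts below are illustrative, not the actual values from this incident):

```python
# Why a per-neighbor maximum-prefix limit can fail: every neighbor stays
# under its own cap, yet the bad routes still add up across neighbors.
MAX_PREFIX_PER_NEIGHBOR = 2000  # illustrative cap

def accepted_routes(announced):
    """announced: neighbor -> number of routes it announces."""
    return {n: min(count, MAX_PREFIX_PER_NEIGHBOR)
            for n, count in announced.items()}

# A hijacker's routes arrive through many neighbors of the victim AS.
accepted = accepted_routes({f"neighbor-{i}": 1849 for i in range(60)})
assert all(c <= MAX_PREFIX_PER_NEIGHBOR for c in accepted.values())
assert sum(accepted.values()) > 100_000  # limit is per neighbor, not total
```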


CHAPTER 8

Related Work

8.1 Internet Topology Modeling

Three main types of data sets have been available for AS-level topology inference: (1)

BGP tables and updates, (2) traceroute measurements, and (3) Internet Routing Reg-

istry (IRR) information. BGP tables and updates have been collected by the University

of Oregon RouteViews project [15] and by RIPE-RIS in Europe [14]. Traceroute-based

datasets have been gathered by CAIDA as part of the Skitter project [17], by an EU-

project called Dimes [82], and more recently by the iPlane project [57]. Other efforts

have extended the above measurements by including data from the Internet Routing

Registry [27, 83, 96]. However, studies that have critically relied on these topology

measurements have rarely examined the data quality in detail, thus the (in)sensitivity

of the results and claims to the known or suspected deficiencies in the measurements

has largely gone unnoticed.

Chang et al. [27, 30, 26] were among the first to study the completeness of com-

monly used BGP-derived topology maps; later studies [99, 77, 96], using different data

sources, yielded similar results confirming that 40% or more AS links may exist in the

actual Internet but are missed by the BGP-derived AS maps. He et al. [96] report an

additional 300% of peer links in IRR compared to those extracted from BGP data; however, this percentage is likely inflated since they only took RIB snapshots from 35 of

the ∼700 routers providing BGP feeds to RouteViews and RIPE-RIS. All these efforts


have in common that they try to incrementally close the completeness gap, without first

quantifying the degree of (in)completeness of currently inferred AS maps. This thesis

relies on the ground truth of AS-level connectivity of different types of ASes to shed

light on what and how much is missing from the commonly-used AS maps and why.

Dimitropoulos et al. [34] use AS adjacencies as reported by several ISPs to validate an

AS relationship inference heuristic. They found that most links reported by ISPs that

are not in the public view are peer links. In contrast to their work, most of our findings

are inferred from iBGP tables, router configs, and syslog records collected over time

from thousands of routers. Our approach yields an accurate picture of the ground truth

as far as BGP adjacencies are concerned and allows us to verify precisely for each AS

link x, why x was missing from public view. Lastly, in view of the recent work [81]

that concludes that an estimated 700 route monitors would suffice to see 99.9% of all

AS-links, our approach shows that such an overall estimate comes with an important

qualifier: what is important is not the total number of monitors, but their locations

within the AS hierarchy. In fact, our findings suggest a simple strategy for placing

monitors to uncover the bulk of missing links, but unfortunately researchers have in

general little input when it comes to the actual placement of new monitors.

8.2 Path Exploration

There are two types of BGP update characterization work in the literature: passive

measurements [49, 50, 48, 90, 21, 58, 79, 93, 38], and active measurements [51, 52,

62]. The work presented in this thesis belongs to the first category. We conducted a

systematic measurement to classify routing instability events and quantify path explo-

ration for all the prefixes in the Internet. Our measurement also showed the impact of

an AS's tier level on the extent of path exploration.

Existing measurements of path exploration and slow convergence have all been


based on active measurements [51, 52, 62], where controlled events were injected into

the Internet from a small number of beacon sites. These measurement results demon-

strated the existence of BGP path exploration and slow convergence, but did not show

to what extent they exist on the Internet under real operational conditions. In contrast,

in this thesis we classify routing events of all prefixes, as opposed to a small number of

beacon sites, into different categories, and for each category we provide measurement

results on the updates per event and event durations. Given we examine the updates

from multiple peers for all the prefixes in the global routing table, we are able to iden-

tify the impact of AS tier levels on path exploration. Regarding the relation between

the tier levels of origin ASes, our results agree with previous active measurement work

[52] (using a small number of beacon sites) that prefixes originated from tier-1 ASes

tend to experience less slow convergence compared to prefixes originated from lower

tier ASes. Moreover, our results also showed that, for the same prefix, routers of dif-

ferent AS tiers observe different degrees of slow convergence, with tier-1 ASes seeing

much less than lower tier ASes.

Existing passive measurements have studied the instability of all the prefixes. The

focus has been on update inter-arrival time, event duration, location of insta-

bility, and characterization of individual updates [49, 50, 48, 90, 21, 58, 79, 93, 38].

There is no previous work on classifying routing events according to their effects (e.g.

whether the path becomes better or worse after the event). This thesis describes a novel path preference heuristic based on path usage time, and studies in detail the characteristics

of different classes of instability events in the Internet.

Our approach shares a certain similarity with [79, 93, 38] in that we all use a timeout-based approach to group updates into events. Such an approach can mistakenly group

updates of multiple root causes that happened close to each other or overlapped in

time into a single event. As we discussed earlier, the events in our Path-Disturbance


category can be examples of grouping updates of overlapping root causes, because the path

to a prefix changed at least twice, and often more times, during one event. We moved

a step forward by detecting and separating these overlapping events into a different

category. It is most likely that those Path-Change events with very long durations are

also overlapping events, and one possible way to identify them is to set a time threshold

on the event duration, which we plan to do in the future.

8.3 Prefix Hijacking

Previous efforts on prefix hijacking can be broadly sorted into two categories: hijack

prevention and hijack detection. Generally speaking, prefix hijack prevention solutions

are based on cryptographic authentications [86, 69, 46, 56, 84] where BGP routers

sign and verify the origin AS and AS path of each prefix. In addition to added router

workload, these solutions require changes to all router implementations, and some

of them also require a public key infrastructure. Due to these obstacles, none of the

proposed prevention schemes is expected to see deployment in the near future.

A number of prefix hijack detection schemes have been developed recently [80, 54,

75, 45]. A commonality among these solutions is that they do not use cryptography-based mechanisms. In [75], any suspicious route announcements received by an AS

trigger verification probes to other AS nodes and the results are reported to the true

origin. In PGBGP [45], each router monitors the origin AS nodes in BGP announce-

ments for each prefix over time; any newly observed origin AS of a prefix is considered

anomalous, and the router avoids using anomalous routes if the previously existing

route to the same prefix is still available. Different from the above en route detection

schemes, MyASN [80] is an offline prefix hijack alert service provided by RIPE. A

prefix owner registers the valid origin set for a prefix, and MyASN sends an alarm via

regular email when any invalid origin AS is observed in BGP routing updates. PHAS


[54] is also an off-path prefix hijack detection system which uses BGP routing data

collected by RouteViews and RIPE. Instead of asking prefix owners to register valid

origin AS sets as is done by MyASN, PHAS keeps track of the origin AS set for each

announced prefix, and sends hijack alerts via multiple path email delivery to the true

origin.
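The PGBGP-style origin monitoring described above can be sketched as follows (a drastic simplification of [45]; the history window and route selection are reduced to their essence, and all names are illustrative):

```python
# PGBGP-style origin monitoring: origins seen for a prefix in the recent
# past are trusted; a route from a new origin is treated as anomalous and
# avoided while a trusted alternative still exists.
def choose_route(prefix, candidates, history):
    """candidates: list of (origin_as, route); history: prefix -> set of
    previously observed origin ASes for that prefix."""
    trusted = [r for r in candidates if r[0] in history.get(prefix, set())]
    return (trusted or candidates)[0]

history = {"192.0.2.0/24": {7018}}
routes = [(9121, "path via hijacker"), (7018, "path via true origin")]
assert choose_route("192.0.2.0/24", routes, history) == (7018, "path via true origin")
```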

Unlike the prevention schemes, a hijack detection mechanism provides only half

of the solution: after a prefix hijack is detected, correction steps must follow. A recent

proposal called MIRO [95] gives end users the ability to perform correction after

detecting a problem. MIRO is a new inter-domain routing architecture that utilizes

multiple path routing. In MIRO, AS nodes can negotiate alternative routes to reach a

given destination, potentially bypassing nodes affected by hijack attacks.

The work presented in this thesis can be considered orthogonal to all the existing

efforts in the area. It examines the relation between an AS node’s topological con-

nectivity and its resiliency against false route attacks, or conversely, an AS node’s

topological connectivity and its impact as a launching pad for prefix hijacks.


CHAPTER 9

Conclusion

Assessing the quality of inferred AS-level Internet topology maps is an important and difficult problem. It has been generally accepted that the public view is good at capturing customer-provider links but may miss peering links; however, there has been no systematic effort to provide hard evidence to either confirm or dismiss this notion. This thesis represents an important step towards addressing this challenging problem. Recognizing that it is impractical to obtain a complete AS topology through the data collection efforts currently pursued, we approach the problem from a new and different angle: obtaining the ground truth of sample ASes' connectivity structures and comparing them with the AS connectivity inferred from publicly available data sets. A key benefit of this way of tackling the problem is a basic understanding not only of which parts of the actual topology may be missing from the inferred ones, but also of how severe the incompleteness problem may be.

A critical aspect of our search for the ground truth of AS-level Internet connectivity, and of the proposed pragmatic approach to constructing realistic and viable AS maps, is that both treat ASes not as generic nodes but as objects with a rich, important, and diverse internal structure. Exploiting this structure is at the heart of this work. The nature of this AS-internal structure permeates our definition of the "ground truth" of AS-level connectivity, our analysis of the available data sets in search of this ground truth, our detailed understanding of the reasons behind, and the importance of, the deficiencies of commonly used AS-level Internet topologies, and our proposed efforts to construct realistic and viable maps of the Internet's AS-level ecosystem. Faithfully accounting for this internal structure can also be expected to favor constructions of AS maps that withstand scrutiny by domain experts. Such constructions stand a better chance of representing fully functional and economically viable AS-level topologies than models where the interconnections between different ASes are determined solely by independent coin tosses. Validating an approach to understanding the AS-level Internet that exploits the network-intrinsic meaning of what a node and a link represent clearly requires extra effort and creativity, and will therefore feature prominently in our future research in this area.

The Internet is becoming flat: paths are becoming shorter, and more networks establish direct connectivity to avoid upstream costs and to speed up content delivery to their customers. It is important to understand how routing protocol properties depend on this evolving trend. In this thesis we have studied two such properties: BGP path exploration and resiliency to prefix hijacks. As the connectivity density of the network increases, we expect the extent of path exploration to increase as well, because there will be more possible paths for lower-tier networks to explore when reaching other lower-tier networks. On the other hand, our insights from the prefix hijack analysis are particularly relevant for making connectivity recommendations that minimize the impact of prefix hijacks; in this respect, the most resilient scenario is to connect directly to all the Tier-1 networks. We are currently developing a solution to recover from hijack attacks based on this insight, which is also part of our future work.
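The effect of connectivity on hijack resilience can be sketched with a toy model (an assumption-laden simplification, not the analysis used in this thesis: each AS simply prefers the shorter AS path, ties go to the true origin announced first, and relationship-based routing policies are ignored; the topologies below are hypothetical):

```python
from collections import deque

def hop_counts(graph, src):
    """BFS hop distance from src to every reachable AS in an
    undirected AS-level graph (adjacency dict)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def resilience(graph, true_origin, hijacker):
    """Fraction of other ASes that keep the route to the true origin
    when the hijacker announces the same prefix, under the shortest-path
    tie-to-true-origin assumption."""
    d_true = hop_counts(graph, true_origin)
    d_bad = hop_counts(graph, hijacker)
    others = [a for a in graph if a not in (true_origin, hijacker)]
    kept = sum(1 for a in others
               if d_true.get(a, float("inf")) <= d_bad.get(a, float("inf")))
    return kept / len(others)

# Hypothetical graph: ASes 1-3 form a Tier-1 clique; the victim (AS 10)
# multihomes to all three Tier-1s; the hijacker (AS 20) sits behind AS 3.
multi = {1: [2, 3, 10, 30], 2: [1, 3, 10], 3: [1, 2, 10, 20],
         10: [1, 2, 3], 20: [3], 30: [1]}
print(resilience(multi, 10, 20))  # -> 1.0 (fully resilient)
```

Under this model, a victim connected to every Tier-1 is at most as far from any observer as the hijacker, so no observer switches away, which is consistent with the connectivity recommendation above.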


REFERENCES

[1] AOL peering requirements. http://www.atdn.net/settlement free int.shtml.
[2] AT&T peering requirements. http://www.corp.att.com/peering/.
[3] CERNET BGP feeds. http://bgpview.6test.edu.cn/bgp-view/.
[4] European Internet Exchange Association. http://www.euro-ix.net.
[5] Geant2 looking glass. http://stats.geant2.net/lg/.
[6] Good practices in Internet exchange points. http://www.pch.net/resources/papers/ix-documentation-bcp/ix-documentation-bcp-v14en.pdf.
[7] Internet Routing Registry. http://www.irr.net/.
[8] Packet Clearing House IXP directory. http://www.pch.net/ixpdir/Main.pl.
[9] PeeringDB website. http://www.peeringdb.com/.
[10] Personal communication with Bill Woodcock, PCH.
[11] PSG Beacon List.
[12] Regional Internet Registry data. ftp://www.ripe.net/pub/stats.
[13] RIPE Beacon List.
[14] RIPE Routing Information Service project. http://www.ripe.net/.
[15] RouteViews routing table archive. http://www.routeviews.org/.
[16] Secure Inter-Domain Routing (SIDR) Working Group. http://www1.ietf.org/html.charters/sidr-charter.html.
[17] Skitter AS adjacency list. http://www.caida.org/tools/measurement/skitter/as adjacencies.xml.
[18] Skitter destination list. http://www.caida.org/analysis/topology/macroscopic/list.xml.
[19] The Abilene Observatory data collections. http://abilene.internet2.edu/observatory/data-collections.html.
[20] Reka Albert and Albert-Laszlo Barabasi. Topology of evolving networks: local events and universality. Physical Review Letters, 85(24):5234–5237, 2000.

[21] David Andersen, Nick Feamster, Steve Bauer, and Hari Balakrishnan. Topology inference from BGP routing dynamics. In ACM SIGCOMM Internet Measurement Workshop (IMW), 2002.
[22] Hitesh Ballani, Paul Francis, and Xinyang Zhang. A study of prefix hijacking and interception in the Internet. In Proc. of ACM SIGCOMM, 2007.
[23] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[24] T. Bu and D. Towsley. On distinguishing between Internet power law topology generators. In Proc. of IEEE INFOCOM, 2002.
[25] D. Chang, R. Govindan, and J. Heidemann. The temporal and topological characteristics of BGP path changes. In Proc. of the Int'l Conf. on Network Protocols (ICNP), November 2003.
[26] H. Chang. An Economic-Based Empirical Approach to Modeling the Internet Inter-Domain Topology and Traffic Matrix. PhD thesis, University of Michigan, 2006.
[27] H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger. Towards capturing representative AS-level Internet topologies. Elsevier Computer Networks Journal, 2004.
[28] H. Chang, S. Jamin, and W. Willinger. Inferring AS-level Internet topology from router-level path traces. In SPIE ITCom, 2001.
[29] H. Chang, S. Jamin, and W. Willinger. To peer or not to peer: modeling the evolution of the Internet's AS-level topology. In Proc. of IEEE INFOCOM, 2006.
[30] H. Chang and W. Willinger. Difficulties measuring the Internet's AS-level ecosystem. In Annual Conference on Information Sciences and Systems (CISS'06), pages 1479–1483, 2006.
[31] Hyunseok Chang, Sugih Jamin, and Walter Willinger. Internet connectivity at the AS-level: an optimization-driven modeling approach. In Proc. of ACM SIGCOMM MoMeTools Workshop, 2003.
[32] Qian Chen, Hyunseok Chang, Ramesh Govindan, Sugih Jamin, Scott Shenker, and Walter Willinger. The origin of power-laws in Internet topologies revisited. In Proc. of IEEE INFOCOM, 2002.

[33] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: an overlay testbed for broad-coverage services. ACM SIGCOMM Computer Comm. Review (CCR), 33(3):3–12, 2003.
[34] Xenofontas Dimitropoulos, Dmitri Krioukov, Marina Fomenkov, Bradley Huffaker, Young Hyun, kc claffy, and George Riley. AS relationships: inference and validation. ACM SIGCOMM Comput. Commun. Rev., 2007.
[35] Danny Dolev, Sugih Jamin, Osnat Mokryn, and Yuval Shavitt. Internet resiliency to attacks and failures under BGP policy routing. Computer Networks, 50(16):3183–3196, 2006.
[36] E. Chen and S. Sangli. Avoid BGP Best Path Transitions from One External to Another. Internet Draft, IETF, June 2006. http://www.ietf.org/internet-drafts/draft-ietf-idr-avoid-transition-04.txt.
[37] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proc. of ACM SIGCOMM, 1999.
[38] Anja Feldmann, Olaf Maennel, Z. Morley Mao, Arthur Berger, and Bruce Maggs. Locating Internet routing instabilities. In Proc. of ACM SIGCOMM, 2004.
[39] Lixin Gao. On inferring autonomous system relationships in the Internet. ACM/IEEE Transactions on Networking, 2001.
[40] Timothy G. Griffin and Brian J. Premore. An experimental analysis of BGP convergence time. In Proc. of the Int'l Conf. on Network Protocols (ICNP), 2001.
[41] B. Halabi and D. McPherson. Internet Routing Architectures. Cisco Press, 2nd edition, 2000.
[42] Benjamin Hummel and Sven Kosub. Acyclic type-of-relationship problems on the Internet: an experimental analysis. In ACM IMC, 2007.
[43] G. Huston. AS Number Consumption. The ISP Column, September 2005.
[44] Y. Hyun, A. Broido, and kc claffy. On third-party addresses in traceroute paths. In Proc. of Passive and Active Measurement Workshop (PAM), 2003.
[45] J. Karlin, S. Forrest, and J. Rexford. Pretty Good BGP: protecting BGP by cautiously selecting routes. Technical Report TR-CS-2005-37, University of New Mexico, October 2005.

[46] S. Kent, C. Lynn, and K. Seo. Secure Border Gateway Protocol. IEEE Journal on Selected Areas in Communications, 18(4), 2000.
[47] Nate Kushman, Srikanth Kandula, and Dina Katabi. Can you hear me now?! It must be BGP. SIGCOMM Comput. Commun. Rev., 37(2):75–84, 2007.
[48] C. Labovitz, A. Ahuja, and F. Jahanian. Experimental study of Internet stability and wide-area network failures. In Proceedings of FTCS'99, June 1999.
[49] C. Labovitz, G. Malan, and F. Jahanian. Internet Routing Instability. In Proc. of ACM SIGCOMM, September 1997.
[50] C. Labovitz, R. Malan, and F. Jahanian. Origins of Internet routing instability. In Proc. of IEEE INFOCOM, 1999.
[51] Craig Labovitz, Abha Ahuja, Abhijit Abose, and Farnam Jahanian. Delayed Internet routing convergence. IEEE/ACM Transactions on Networking, 9(3):293–306, June 2001.
[52] Craig Labovitz, Abha Ahuja, Roger Wattenhofer, and Srinivasan Venkatachary. The impact of Internet policy and topology on delayed routing convergence. In Proc. of IEEE INFOCOM, April 2001.
[53] M. Lad, R. Oliveira, B. Zhang, and L. Zhang. Understanding the resiliency of Internet topology against false origin attacks. In Proc. of IEEE DSN, 2007.
[54] Mohit Lad, Dan Massey, Dan Pei, Yiguo Wu, Beichuan Zhang, and Lixia Zhang. PHAS: a prefix hijack alert system. In 15th USENIX Security Symposium, 2006.
[55] Lun Li, David Alderson, Walter Willinger, and John Doyle. A first-principles approach to understanding the Internet's router-level topology. In Proc. of ACM SIGCOMM, 2004.
[56] M. Zhao, S. W. Smith, and D. Nicol. Aggregated path authentication for efficient BGP security. In 12th ACM Conference on Computer and Communications Security (CCS), November 2005.
[57] H. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani. iPlane: an information plane for distributed services. In Proc. of OSDI, 2006.
[58] Olaf Maennel and Anja Feldmann. Realistic BGP traffic for test labs. In Proc. of ACM SIGCOMM, 2002.

[59] Priya Mahadevan, Dimitri Krioukov, Kevin Fall, and Amin Vahdat. Systematic topology analysis and generation using degree correlations. In Proc. of ACM SIGCOMM, 2006.
[60] Priya Mahadevan, Dmitri Krioukov, Marina Fomenkov, Xenofontas Dimitropoulos, kc claffy, and Amin Vahdat. The Internet AS-level topology: three data sources and one definitive metric. ACM SIGCOMM Computer Comm. Review (CCR), 2006.
[61] R. Mahajan, D. Wetherall, and T. Anderson. Understanding BGP misconfiguration. In Proc. of ACM SIGCOMM, 2002.
[62] Z. Morley Mao, Randy Bush, Tim Griffin, and Matt Roughan. BGP beacons. In ACM SIGCOMM Internet Measurement Conference (IMC), 2003.
[63] Z. Morley Mao, Lili Qiu, Jia Wang, and Yin Zhang. On AS-level path inference. In Proc. of SIGMETRICS, 2005.
[64] Zhuoqing Morley Mao, Jennifer Rexford, Jia Wang, and Randy H. Katz. Towards an accurate AS-level traceroute tool. In Proc. of ACM SIGCOMM, 2003.
[65] Athina Markopoulou, Gianluca Iannaccone, Supratik Bhattacharyya, Chen-Nee Chuah, and Christophe Diot. Characterization of failures in an IP backbone. In IEEE INFOCOM, Hong Kong, 2004.
[66] Xiaoqiao Meng, Zhiguo Xu, Beichuan Zhang, Geoff Huston, Songwu Lu, and Lixia Zhang. IPv4 address allocation and BGP routing table evolution. In ACM SIGCOMM Computer Communication Review (CCR) special issue on Internet Vital Statistics, January 2005.
[67] Wolfgang Muhlbauer, Anja Feldmann, Olaf Maennel, Matthew Roughan, and Steve Uhlig. Building an AS-topology model that captures route diversity. In Proc. of ACM SIGCOMM, 2006.
[68] Wolfgang Muhlbauer, Steve Uhlig, Bingjie Fu, Mickael Meulle, and Olaf Maennel. In search for an appropriate granularity to model routing policies. In Proc. of ACM SIGCOMM, 2007.
[69] J. Ng. Extensions to BGP to Support Secure Origin BGP. ftp://ftp-eng.cisco.com/sobgp/drafts/draft-ng-sobgp-bgp-extensions-02.txt, April 2004.
[70] Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, and Lixia Zhang. In search of the elusive ground truth: the Internet's AS-level connectivity structure. In Proc. of ACM SIGMETRICS, 2008.

[71] Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, and Lixia Zhang. Quantifying path exploration in the Internet. In ACM Internet Measurement Conference (IMC), October 2006.
[72] Ricardo Oliveira, Beichuan Zhang, and Lixia Zhang. Observing the evolution of Internet AS topology. In ACM SIGCOMM, 2007.
[73] Dan Pei, Beichuan Zhang, Daniel Massey, and Lixia Zhang. An analysis of path-vector routing protocol convergence algorithms. Computer Networks, 50(3):398–421, 2006.
[74] Alin C. Popescu, Brian J. Premore, and Todd Underwood. Anatomy of a leak: AS 9121. NANOG-34, May 2005.
[75] Sophie Qiu, Fabian Monrose, Andreas Terzis, and Patrick McDaniel. Efficient techniques for detecting false origin advertisements in inter-domain routing. In Second Workshop on Secure Network Protocols (NPSec), 2006.
[76] Anirudh Ramachandran and Nick Feamster. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM, 2006.
[77] Danny Raz and Rami Cohen. The Internet dark matter: on the missing links in the AS connectivity map. In Proc. of IEEE INFOCOM, 2006.
[78] Y. Rekhter, T. Li, and S. Hares. Border Gateway Protocol 4. RFC 4271, Internet Engineering Task Force, January 2006.
[79] Jennifer Rexford, Jia Wang, Zhen Xiao, and Yin Zhang. BGP routing stability of popular destinations. In ACM SIGCOMM Internet Measurement Workshop (IMW), 2002.
[80] RIPE. Routing Information Service: myASn system. http://www.ris.ripe.net/myasn.html.
[81] Matthew Roughan, Simon Jonathan Tuke, and Olaf Maennel. Bigfoot, Sasquatch, the Yeti and other missing links: what we don't know about the AS graph. In ACM IMC, 2008.
[82] Yuval Shavitt and Eran Shir. DIMES: let the Internet measure itself. ACM SIGCOMM Computer Comm. Review (CCR), 2005.
[83] G. Siganos and M. Faloutsos. Analyzing BGP policies: methodology and tool. In Proc. of IEEE INFOCOM, 2004.
[84] B. R. Smith, S. Murphy, and J. J. Garcia-Luna-Aceves. Securing the border gateway routing protocol. In Global Internet'96, November 1996.

[85] L. Subramanian, S. Agarwal, J. Rexford, and R. Katz. Characterizing the Internet hierarchy from multiple vantage points. In INFOCOM, 2002.
[86] L. Subramanian, V. Roth, I. Stoica, S. Shenker, and R. H. Katz. Listen and whisper: security mechanisms for BGP. In Proceedings of NSDI 2004, March 2004.
[87] Lakshminarayanan Subramanian, Matthew Caesar, Cheng Tien Ee, Mark Handley, Morley Mao, Scott Shenker, and Ion Stoica. HLP: a next generation inter-domain routing protocol. In Proc. of ACM SIGCOMM, 2005.
[88] W. B. Norton. The art of peering: the peering playbook, 2002.
[89] Lan Wang, Malleswari Saranu, Joel M. Gottlieb, and Dan Pei. Understanding BGP session failures in a large ISP. In Proc. of IEEE INFOCOM, 2007.
[90] Lan Wang, Xiaoliang Zhao, Dan Pei, Randy Bush, Daniel Massey, Allison Mankin, S. F. Wu, and Lixia Zhang. Observation and analysis of BGP behavior under stress. In ACM SIGCOMM Internet Measurement Workshop (IMW), 2002.
[91] X. Wang and D. Loguinov. Wealth-based evolution model for the Internet AS-level topology. In Proc. of IEEE INFOCOM, 2006.
[92] J. Wu, Y. Zhang, Z. Mao, and K. Shin. Internet routing resilience to failures: analysis and implications. In Proc. of ACM CoNEXT, 2007.
[93] Jian Wu, Zhuoqing Morley Mao, Jennifer Rexford, and Jia Wang. Finding a needle in a haystack: pinpointing significant BGP routing changes in an IP network. In Symposium on Networked System Design and Implementation (NSDI), May 2005.
[94] Jianhong Xia and Lixin Gao. On the evaluation of AS relationship inferences. In Proc. of IEEE GLOBECOM, December 2004.
[95] Wen Xu and Jennifer Rexford. MIRO: multi-path interdomain routing. In SIGCOMM, pages 171–182, 2006.
[96] Y. He, G. Siganos, M. Faloutsos, and S. V. Krishnamurthy. A systematic framework for unearthing the missing links: measurements and impact. In Proc. of NSDI, 2007.
[97] Beichuan Zhang, Vamsi Kambhampati, Mohit Lad, Daniel Massey, and Lixia Zhang. Identifying BGP routing table transfers. In ACM SIGCOMM Mining the Network Data (MineNet) Workshop, August 2005.

[98] Beichuan Zhang, Raymond Liu, Dan Massey, and Lixia Zhang. Internet Topology Project. http://irl.cs.ucla.edu/topology/.
[99] Beichuan Zhang, Raymond Liu, Daniel Massey, and Lixia Zhang. Collecting the Internet AS-level topology. ACM SIGCOMM Computer Comm. Review (CCR), 2005.
[100] Changxi Zheng, Lusheng Ji, Dan Pei, Jia Wang, and Paul Francis. A light-weight distributed scheme for detecting IP prefix hijacks in real-time. In Proc. of ACM SIGCOMM, 2007.