two days in the life of datcat overview and demonstration...overview and demonstration two days in...

overview and demonstrationTwo days in the life of

three DNS root servers

cooperative association for internet data analysis

16 November 2006

Bradley Huffaker

DatCatSM+

DNS anycast analysis

2

Using tcpdump from three roots

we examined the geographic and toplogical

clustering of DNS clients.

Data collected by OARC ISC with

COGENT and RIPE NCC.

Outline

3

DNS anycast analysis

- data source- analysis✦ diurnal patterns

✦ num. of requests/addresses per instance

✦ geographic reltionships / distances

✦ coverage topological / geographic

- visualization ✦ Influence Map

Data Source

4

• dates- January 10th-11th 2006 (47 hours)

• DNS sources- c-root (Cogent) 4 instances- f-root (ISC) 32 instances- k-root (RIPE) 11 instances

• geographic- Netacuity database used for geographic mapping

• topological - Route Views used for ASs and prefixes (January 10th)

Diurnal Patterns

5

The local instances mad1 and mty1 show clear diurnal patterns

0 100 200 300 400 500 600

00:0001/10

04:0001/10

08:0001/10

12:0001/10

16:0001/10

20:0001/10

00:0001/11

04:0001/11

08:0001/11

12:0001/11

16:0001/11

20:0001/11

00:0001/12

04:0001/12

# Re

ques

ts/s

ec

UTC time

Madrid (noon)

Monterrey (noon)

LA (noon)

mad1mty1lax1

Num. of Requests/Addresses

6

Instances sorted by the number of requests per second.

0 10 20 30 40 50 600

2

4

6

8

10x 105

Instance

# IP

add

ress

es

C!rootF!rootK!root

jfk1*

lax1*

ord1*

iad1*

sfo2*

pao1* linx*

delhi* tokyo*denicnap*

ams!ix*

muc1

ams1

gru1

lga1

0 10 20 30 40 50 600

500

1000

1500

2000

2500

Instance

# R

eque

sts

/ sec

C!rootF!rootK!root

iad1*

ord1* lax1*

jfk1*

pao1*

sfo2*

linx*

nap*

denic tokyo*

delhi*

ams!ix*

muc1 ams1 gru1 lga1

*global instance

Client Geographic locations

7

Instances sorted by the number of requests per second.

0

25

50

75

100

%

N.Amer S.Amer Europe Africa Asia Oceania

OceaniaAsiaAfricaEuropeS. AmerN. Amer

F: San Francisco (US) F: New York (US)

F: Sao Paulo (BR) F: Santiago (CL)

K: Reykjavik (IS) K: Helsinki (FI) F:Johannesburg(SA)

F: Tel Aviv (IL) K: Tokyo (JP) F: Brisbane (AU)

F: Auckland (NZ)

sfo2* pao1* delhi*ams!ix*

jfk1* lax1* ord1* iad1* nap* linx* tokyo*

0

25

50

75

100

%

N.Amer S.Amer Europe Africa Asia Oceania

OceaniaAsiaAfricaEuropeS. AmerN. Amer

F: San Francisco (US) F: New York (US)

F: Sao Paulo (BR) F: Santiago (CL)

K: Reykjavik (IS) K: Helsinki (FI) F:Johannesburg(SA)

F: Tel Aviv (IL) K: Tokyo (JP) F: Brisbane (AU)

F: Auckland (NZ)

sfo2* pao1* delhi*ams!ix*

jfk1* lax1* ord1* iad1* nap* linx* tokyo*

* global instance

CDF of distance from instance to client

8

Inflection points between 5000 and 7000 km are the result of clients clustering on continents.

0 5000 10000 15000 200000

20

40

60

80

100

Distance (km)

%

All

0 5000 10000 15000 200000

20

40

60

80

100

Distance (km)

%

LocalAllGlobal

0 5000 10000 15000 200000

20

40

60

80

100

Distance (km)

%

LocalAllGlobal

c-root f-root k-root

CDF of additional distance from optimal

9

The additional distance a client’s requests were routed from its geographically closest router.

0 5000 10000 15000 2000020

30

40

50

60

70

80

90

100

Additional Distance (km)

%

C!rootK!rootF!root

Topological Coverage

10

Instances sorted by AS coverage.

0 10 20 30 40 50 600

10

20

30

40

50

60

70

Instance

%

C!rootF!rootK!root

iad1*

lax*ord*jfk*

pao1*

sfo2*

linx*

tokyo*denic

delhi*

nap*

ams!ix*

ams1 muc1

lga1

0 10 20 30 40 50 600

10

20

30

40

50

60

70

Instance%

C!rootF!rootK!root

iad1*

lax1*

ord1*

jfk1*

pao1*

sfo2*

linx*

tokyo*denicdelhi*

nap* ams!ix*

ams1 muc1 lga1

prefix coverageAS coverage

*global instance

1 2 3 4 51

10

100

1K

10K

100K

1M

10M

# instances requested by clients#

clie

nts

C!rootF!rootK!root

Client Stability

11

the number of instances queried by clients

the percentage of clients seen by multiple servers

C!root F!root K!root0

1

2

3

4

5

%

Influence Map Breakdown

12

k-apni c

nap

k-denic

ams-ix

delhi

linx

k-cern

tokyo

k-qtel

k-emix

k-mix

k-isnic

k-ficix

k-grnet

k-apnic

napk-denic

ams-ix

delhi

linx

tokyo

k-mix

k-apni

delhi

average distance to clients (size)

number of clients (size)

number of clients (color)wedge

nodegeographic location

displacement location

displacement mapDistance from client’s centriod and DNS server’s geographic location.

location mapNumber of DNS

clients in a given direction and average distance in that direction.

k-root

5 global instances6 local instances

c-root

4 global instances

f-root

2 global instances30 local instances

Influence Maps

13

c-jfk

c-iad

c-lax

c-ord

c-jfk

c-iad

c-laxc-ord

k-apni c

nap

k-denic

ams-ix

delhi

linx

k-cern

tokyo

k-qtel

k-emix

k-mix

k-isnic

k-ficix

k-grnet

k-apnic

napk-denic

ams-ix

delhi

linx

tokyo

k-mix

f-pao1

f-mty

f-tpe

f-mad

f-kix

f-sel1y

f-lax

f-prg

f-hkg

f-gru

f-lga

sfo2

f-bne

f-yyz

f-lis

f-yowf-scl

f-bcn

f-jnb

f-akl

f-tlv

f-dac

f-dxb

f-cgk

f-maa

f-khi

f-mty

f-laxf-hkg

f-ams

f-lga

sfo2

f-bne

f-yyz

f-sin

f-bcnf-jnb

f-maa

f-khi

Server Keynodenumber of clients10010,0001,000,000

halonumber of clients1 2-8 9-7273-619 620-5289 5290-4511745118-384790

conclusion

• Geographic clustering of clients to local instances is high.- 60% of clients experience small distance penalties between selected

and optimal instances

- local instances have strong diurnal patterns of use

• Small minority of clients experience a change in instance. • ASes/IP addresses unevenly spread across instances,

especially for f-root

• Propose second data colletion 9th-10th, January 2007

14

http://imdc.datcat.org

is an NSF funded Internet measurement

data catalog.

DatCat

15

http://imdc.datcat.orgTarget Problems• data everywhere and lots of it- caida alone has over 50 terabytes of data

• pcap (tcpdump) packet traces• skitter topology traces• routing tables• etc

• data is hard to find- no central indexing- many one-off data collections

16

http://imdc.datcat.orgGoalsMake the following processes easy for users.

• finding data sets of interest- Many researchers lack access and/or expertise to collect data

needed for their research.

• adding new data sets to the catalog- Contributors (who are generally underfunded and providing

data out of dedication to the general good) want to minimize time lost to their own research.

• annotating data sets in the catalog- Provides a flexible way for contributors and users to mark up

interesting facts about data sets. Such as the number of packets or that a given file is corrupted.

17

http://imdc.datcat.orgCurrent Statuscurrent implementation

• user accounts• data sets- only CAIDA data so far- 57,088 files indexed - 4.8 TB of data indexed- 11 separate collections- 100+ accounts registered

• browse simple/advanced search• help/tutorial• API for bulk contributions

18

http://imdc.datcat.orgOverview

19

http://imdc.datcat.orgBrowse Catalog

• Browse page is a list of data file collections.- Collections are a high level group of related files.- A file may belong to more than one collection.

• User clicks on collection of interest to view the collection details.

select collection of interest

20

http://imdc.datcat.orgCollection Details

• Collection details displays specifics of the collection.

- description of collection and its contents- motivation behind the collection

• If the collection matches the researchers’ interest they can then list the data objects.

go to a list of collection’s data objects

21

http://imdc.datcat.orgData Files

• A listing of the data files contained within the selected collection. Also shown: general statistics about the files such as format and size.

• After the user has selected the set of files they are interested in, they then find packages which contain them.

select desired data

find the data’spackages.

22

http://imdc.datcat.orgPackages

• Packages are the downloadable groupings of one or more data files.

• Once the user has selected a set of packages, he then finds download locations.

select desired packages

find download locations

23

http://imdc.datcat.orgLocations

• Locations are the place or process by which the package can be obtained.

- Provides a simple URL or instructions which must be followed to get the data.

• Get your data!

select a location forthe package

24

http://imdc.datcat.orgFuture Work

• small number of invited third party contributors• public contributors• tools• studies- collections specialized for papers / experiments / etc

25

two days in the life of datcat overview and demonstration...overview and demonstration two days in...

Documents