
DNSR: Domain Name Suffix-based Routing in Overlay Networks*

Demetrios Zeinalipour-Yazti
Dept. of Computer Science
University of California
Riverside, CA 92507, U.S.A.
[email protected]

Abstract. Overlay Peer-to-Peer (P2P) networks are application-layer networks that allow users to perform distributed functions, such as keyword searches, over the data of other users. An important problem in such networks is that the connections among peers are arbitrary, leading to an overlay topology that does not match the underlying physical topology. This mismatch leads to excessive network resource consumption in Wide Area Networks as well as degraded user experience because of the incurred network delays.

Most state-of-the-art research concentrates on structuring overlay networks so that query messages can reach the appropriate nodes within some hop-count bound. These approaches do not take the underlying topology mismatch into account, which makes them inappropriate for wide-area routing.

In this work we propose and evaluate DNSR (Domain Name Suffix-based Routing), a novel technique for routing query messages in overlay networks based on the "domain closeness" of the node sending the message. We describe DNSR and present simulation experiments performed over PeerWare, our distributed infrastructure which runs on a network of 50 workstations. Our simulations are based on real data gathered from one of the largest open P2P networks, namely Gnutella.

1 Introduction

Advances in public networks over recent years have increased the demand for distributed application-layer collaboration suites that can be used in the context of multicast [2], object location [8, 9], ad-hoc collaboration [27] and information retrieval [5, 6, 7]. Moreover, the recent initial success of centralized and distributed Peer-to-Peer systems such as Napster [29, 23] and Gnutella [26] has proven that distributed applications are feasible and that they may come to dominate the client-server model in the coming years.

Overlay Peer-to-Peer (P2P) networks are application-layer networks that allow users to perform distributed functions, such as keyword searches, over the data of other users. In unstructured P2P networks, network hosts have neither global knowledge nor structure, and efficient query routing is based on routing indices [3], heuristics [7] and caching [5]. In structured P2P networks [8, 9], on the other hand, network hosts and objects are organized in such a way that object location can be guaranteed within some hop-count bound. An important problem in both types of networks is that the connection between any two peers is arbitrary, leading to an overlay topology that does not match the underlying physical topology. This mismatch leads to excessive network resource consumption in Wide Area Networks as well as degraded user experience because of the incurred network delays.

[Figure 1: log-log plot of the number of IPs contributed versus the rank of domains (ranked by IPs contributed to Gnutella), for the domains observed in June 2002.]

Fig. 1. The number of IPs contributed by each ISP or domain in Gnutella on the 1st of June 2002. The total IP set includes on average 300,000 IP addresses, and the figure shows that most of these IPs are contributed by only a small number of ISPs or domains.

* Course project for "CS202 - Advanced Operating Systems", with Vana Kalogeraki at the University of California, Riverside, Department of Computer Science, April 2003. http://www.cs.ucr.edu/~csyiazti/cs202.html

In this paper we describe DNSR, a simple routing algorithm for overlay networks which attempts to route messages to nodes whose domain name is closest to the domain name of the node that needs to forward a message. Such a protocol can be useful in many different settings. In the context of distributed web crawling, for instance, it is desirable to send a crawling request to the nodes that are closest to the target web server, since this reduces the network latency for the subsequent downloads between the crawler and the web server. Another application might be in the context of distributed file sharing. A user looking for a file would benefit greatly from first querying nodes that belong to the same ISP. Moreover, once the file is located, the actual download time might also be reduced significantly.

1.1 Motivation

Our work is motivated by our study of the Gnutella network traffic in [4]. In that study we found, among other things, that most of the network hosts in the Gnutella network belong to only a small number of ISPs or domains (see figure 1). More specifically, we found that 58.73% of the IP addresses found in the Gnutella network are owned by only 20 ISPs. In this work we want to exploit this observation in order to build and evaluate a more efficient routing algorithm for overlay networks.

[Figure 2: node 66-215-0-xx1.oc-nod.charterpipeline.net in Riverside connected to peers in London (pc-62-30-117-83-cr.blueyonder.co.uk), Melbourne (sdcax6-097.dialup.optusnet.com.au), Tokyo (p237-165.yahoo.co.jp), Rochester (roc-24-169-109-208.rochester.rr.com), Seattle (12-224-0-236.client.attbi.com) and Riverside (66-215-0-xx2.oc-nod.charterpipeline.net). The measured links range from an average RTT of 9 ms over 4 router hops for the intra-domain connection up to 184 ms over 22 router hops and 12,764 km for the connection to Melbourne.]

Fig. 2. Intra-domain routing becomes attractive compared to routing between geographically dispersed Autonomous Systems, because the average Round Trip Time (RTT) is reduced significantly.

The effect of creating overlay topologies without being aware of the underlying topology can have a dramatic impact on the performance of the application running on top of the overlay. Let us consider for example figure 2, where node 66-215-0-xx1.oc-nod.charterpipeline.net, located in Riverside (U.S.A.), is connected to six other peers in Tokyo (Japan), Melbourne (Australia), London (U.K.), Rochester (U.S.A.), Seattle (U.S.A.) and Riverside (U.S.A.). The presented IP addresses are taken from the set of IP addresses found in the Gnutella network. For each host we calculate the average Round Trip Time (RTT) and the number of intermediate routers. We can see that the RTT for intra-domain routing is minimal (≈ 10 ms), so the propagation delay, which is a function of the physical distance between a node and its peers, is kept minimal. The same holds for the queuing delays, which are a function of the number of intermediate routers. Therefore it would be desirable to have node 66-215-0-xx1.oc-nod.charterpipeline.net connect to fellow users in the same domain (e.g. 66-215-0-xx2.oc-nod.charterpipeline.net) rather than to somebody in a different domain.

  #  Sub-domain        Number of IPs
  1  .tampabay.rr.com           1364
  2  .nyc.rr.com                1276
  3  .houston.rr.com            1047
  4  .austin.rr.com              951
  5  .twcny.rr.com               855

Table 1. The first five sub-domains of the RoadRunner ISP (rr.com). The table shows that it is feasible to form clusters of IPs which have the longest domain name suffix match.

One might expect that the main obstacle to our effort is that domain names reveal little about the geographic location of a host. For example, the web server of Telia.com physically resides in Sweden although it has a .com domain name. Fortunately, as we describe in this report, the DNSR algorithm does not rely on the geographic location of a domain but rather on the assumption that users of the same domain are clustered together. Given such a topology, DNSR tries to keep message routing local (i.e. within hosts of the same domain).

Another point to note is that hosts within the same domain (e.g. rr.com) may be further structured. We can see this in table 1, where we present the first five sub-domains of the RoadRunner ISP. Interestingly, this particular ISP, like some others, reveals the geographical location of a particular host. This observation is utilized in the GeoTrack technique presented in [12]. Unfortunately it is not always possible to infer geographic location, because a domain name does not inherently contain any such indication. For example, we have observed that many ISPs like America Online (aol.com) use domain names of the form hashcode.aol.com. Although such a structure can be characterized as a flat domain name structure, we believe that the hashcode assignment is not random. Given the reverse hash function, one should be able to map the domain name to some geographic location, but this is just a conjecture.

The main design objectives of the DNSR routing scheme can be summarized as follows:

1. Decentralized Routing Algorithm. Routing algorithms that use global knowledge, such as the Link State Routing algorithm used in OSPF [15], tend to have a significant communication overhead, which makes them inefficient for dynamic environments where the participants are not known a priori. DNSR is a decentralized routing algorithm which aims to use information about direct neighbors only. This characteristic provides DNSR with scalability.

2. Avoid Routing Updates. Our routing scheme is targeted towards dynamic environments where nodes join and leave in an ad-hoc manner. Routing updates come at a certain price in such environments, because by the time a change has propagated it may already be outdated. Besides that, they introduce a huge communication overhead which is difficult to sustain. Therefore we want to avoid the propagation of routing updates as much as possible. In order to overcome failure cases we use certain amounts of redundancy.

3. Constant Routing Table Size. The routing table size is an important issue in routing. Approaches like Chord [8] use a routing table of size log N, where N is the size of the network. DNSR, on the contrary, does not define the routing table as a function of the network size but rather uses a constant table of size k (i.e. if a node can accept k connections it maintains information for those k nodes only). This reduces both the lookup time and the time to add or remove entries for nodes that join or leave.

4. Simplicity. The last and most important design objective is that we want to build a simple and efficient routing algorithm. We believe that overlay networks can become difficult to analyze and predict if the protocols are too complicated. Successful client-server protocols such as HTTP [16], SMTP [18] and POP3 [17] owe much of their success to their simplicity.

[Figure 3: architecture of gnuDC, the Gnutella Distributed Crawler. Several gnuDC Bricks (each with a Log Manager, a P2P Network Module, local logs and a config.txt) connect to the Gnutella network and report to an IP Index Server; a Logs Aggregator collects the local logs into a local repository, which a Logs Analyzer turns into results.]

Fig. 3. gnuDC - Gnutella Distributed Crawler.

  #  Country         Dom.  IPs     %      |   #  Country      Dom.  IPs    %
  1  Network         .net  94,456  38.88% |  11  Belgium      .be   2,527  1.04%
  2  US Commercial   .com  81,943  33.73% |  12  Italy        .it   2,038  0.84%
  3  Canada          .ca    8,039   3.31% |  13  Sweden       .se   1,532  0.63%
  4  France          .fr    5,565   2.29% |  14  Spain        .es   1,495  0.62%
  5  US Educational  .edu   5,102   2.10% |  15  Singapore    .sg   1,333  0.55%
  6  England         .uk    4,118   1.69% |  16  Switzerland  .ch   1,256  0.52%
  7  Germany         .de    3,693   1.52% |  17  Japan        .jp   1,089  0.45%
  8  Australia       .au    3,663   1.51% |  18  Norway       .no   1,010  0.42%
  9  Austria         .at    2,962   1.22% |  19  Brazil       .br     775  0.32%
 10  Netherlands     .nl    2,625   1.08% |  20  New Zealand  .nz     651  0.27%

Table 2. Distribution of Gnutella IP addresses to domains.

2 Analysis and Experiences with the Gnutella Network.

In this section we describe some interesting network properties of the Gnutella network that we obtained with gnuDC in [4]. gnuDC is our large-scale distributed Gnutella crawler, which allows us to obtain various Gnutella network traffic metrics. Our analysis includes 294,000 unique IP addresses that were gathered from pong messages routed through our system in June 2002, over a period of 5 hours. The gathered data and observations initiated our effort in building and evaluating DNSR.

gnuDC consists of several gnuBricks, which are Gnutella clients that log various network activities. Each gnuBrick maintains a local hashtable of all IPs it has seen before, in order to avoid sending duplicate IP addresses to the Index Server. The Index Server is responsible for filtering out duplicate IP entries, since it maintains a global view of all IP addresses observed by the system.

After performing a reverse DNS lookup on the IP addresses that we gathered, we ended up with a set of 244,522 resolved IP addresses; the remaining 49,478 (16.92%) were not resolvable. We note that the non-resolvable set of IPs contains both hosts which were not reachable at the time of the resolution and IP addresses which are allocated to private networks [19] (i.e. 192.168.x.x, 172.16.x.x and 10.x.x.x).
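A minimal sketch of this resolution pipeline is shown below (in Java, the language of our infrastructure; the class ReverseResolver and its helper method are illustrative names, not the actual gnuDC code):

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch: de-duplicate gathered IPs and reverse-resolve them,
    // dropping private ranges [19] and addresses without a reverse mapping.
    public class ReverseResolver {
        private final Set<String> seen = new HashSet<>(); // local duplicate filter

        /** Returns the DNS name for ip, or null if duplicate/private/unresolvable. */
        public String resolve(String ip) {
            if (!seen.add(ip)) return null;   // already processed
            if (isPrivate(ip)) return null;   // 10.x.x.x, 172.16-31.x.x, 192.168.x.x
            try {
                String name = InetAddress.getByName(ip).getCanonicalHostName();
                // getCanonicalHostName() falls back to the textual IP when no
                // reverse (PTR) record exists; we treat that as "not resolvable".
                return name.equals(ip) ? null : name;
            } catch (UnknownHostException e) {
                return null;
            }
        }

        private static boolean isPrivate(String ip) {
            return ip.startsWith("10.") || ip.startsWith("192.168.")
                || ip.matches("172\\.(1[6-9]|2[0-9]|3[01])\\..*");
        }
    }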

2.1 Domain Distributions of Gnutella Internet Hosts

In this subsection we investigate which domains the Gnutella users come from. Clip2 [25] reported in 2000 that Gnutella was a truly international phenomenon. Our measurements indicate that although Gnutella has a worldwide audience, most of its users come from only a few countries (i.e. U.S.A., Germany, Canada, France and England).

Overall Ranking of Organizations (ISPs)

  #  ISP               Domain         Country  %     |   #  ISP               Domain           Country  %
  1  Road Runner       rr.com         US       9.43% |  11  Adelphia Comm.    adelphia.net     US       1.73%
  2  America Online    aol.com        US       7.49% |  12  Wanadoo           wanadoo.fr       France   1.67%
  3  T-Online          t-dialin.net   Germany  6.48% |  13  Rogers Comm.      rogers.com       Canada   1.62%
  4  AT&T              attbi.com      US       6.02% |  14  Woolworths Gr.    co.uk            England  1.58%
  5  Comcast           comcast.net    US       3.98% |  15  ntl Group LTD     ntl.com          England  1.34%
  6  Cox Comm.         cox.net        US       3.35% |  16  Verizon           verizon.net      US       1.27%
  7  Shaw              shawcable.net  Canada   2.30% |  17  SBC Pacific Bell  pacbell.net      US       1.26%
  8  Sympatico Lycos   sympatico.ca   Canada   2.15% |  18  Verizon (DSL)     dsl-verizon.net  US       1.03%
  9  CSC Holdings      optonline.net  US       2.09% |  19  British Telecom   btopenworld.com  England  1.00%
 10  BellSouth Telec.  bellsouth.net  US       2.00% |  20  SBC Internet      swbell.net       US       0.94%

Table 3. Overall ranking of domains based on the number of hosts they contribute to the Gnutella network.

Table 2 presents the top 20 domains from which Gnutella users come. Although it was expected that the .net and .com domains would dominate this measurement, since these domains are globally used by ISPs, we also found that the number of Gnutella users from the various domains is more a function of how advanced the networks of the ISPs in those countries are than of the actual number of Internet users in those countries.

2.2 Internet Service Provider Share of Gnutella Internet Hosts

Table 3 presents the overall ranking of ISPs based on the share of Gnutella hosts they contribute to the network. We can clearly see that U.S., Canadian, German, French and English organizations dominate the Gnutella network. This table shows that the largest part of the Gnutella network is occupied by only a few countries. The table also reveals that Asian countries with advanced networks, such as Japan, are not particularly active in this community, even though their popular Napster-like File Rogue [24] service was suspended.

The DSS group also verified that Gnutella is a truly international phenomenon, since one in three hosts was found to be located on a non US-centric domain. Their study analyzed 3.3 million addresses, of which 1.3 million (39%) were resolvable to non-numeric hostnames. On this subset of addresses they found a domination ratio of 19:8:2:1 for the domains COM, NET, EDU and the combined {ORG, US, GOV, MIL} respectively.

3 DNSR - Domain Name Suffix-based Routing.

The DNSR protocol is a decentralized routing algorithm whose main objective is to keep the traffic generated by P2P applications within the same domain. In this section we provide a technical description of the algorithm and show how it can be deployed in a real setting.

3.1 Basic Notation

In the rest of this paper we make use of the following conventions:

Network or Topology, denoted as N, consists of m hosts {n1, n2, ..., nm} inter-connected with some topology (such as random or DNSR).

Degree of a node, denoted as di, is the number of connections a node ni can maintain at any given point. di can be further divided into di_in and di_out, denoting the number of incoming and outgoing connections respectively. We assume that di bounds their sum, so that di_in + di_out = di.

λ-suffix of a node, denoted as λi, is the suffix of the node's DNS name consisting of its last i+1 labels. For example, a node with the DNS name "cs6368146-17.austin.rr.com" has λ0 = "com", λ1 = "rr.com" and λ2 = "austin.rr.com". The λl-suffix of a node is the largest λ-suffix of its DNS name; for "cs6368146-17.austin.rr.com", the λl-suffix is "austin.rr.com".

λ-similarity(dns1, dns2), between two DNS names dns1 and dns2, is the largest i for which the λi-suffixes of the two names match. For dns1 = "cs6368146-17.austin.rr.com" and dns2 = "othernode.aol.com", λ-similarity = 0, since the two names match only on λ0. For dns1 = "cs6368146-17.austin.rr.com" and dns2 = "othernode.rr.com", λ-similarity = 1, since the two names match on λ0 and λ1. If the two DNS names have no common suffix at all, λ-similarity = -1.
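These definitions reduce to a few lines of string processing. The following minimal sketch (in Java, the language of our PeerWare implementation; the class LambdaSuffix and its method names are illustrative choices, not part of the DNSR sources) computes both quantities:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Illustrative helper for the λ-suffix notation defined above.
    public final class LambdaSuffix {

        /** All λ-suffixes of a DNS name: index i holds λi (the last i+1 labels). */
        public static List<String> suffixes(String dns) {
            String[] labels = dns.split("\\.");
            List<String> out = new ArrayList<>();
            // λ0 = "com", λ1 = "rr.com", ..., λl = everything but the host label
            for (int i = labels.length - 1; i >= 1; i--) {
                out.add(String.join(".", Arrays.copyOfRange(labels, i, labels.length)));
            }
            return out;
        }

        /** λ-similarity: the largest i whose λi-suffixes agree, or -1 if none do. */
        public static int similarity(String dns1, String dns2) {
            List<String> s1 = suffixes(dns1), s2 = suffixes(dns2);
            int sim = -1;
            for (int i = 0; i < Math.min(s1.size(), s2.size()); i++) {
                if (!s1.get(i).equals(s2.get(i))) break;
                sim = i;
            }
            return sim;
        }

        public static void main(String[] args) {
            // The examples from the definitions above:
            System.out.println(similarity("cs6368146-17.austin.rr.com", "othernode.aol.com")); // 0
            System.out.println(similarity("cs6368146-17.austin.rr.com", "othernode.rr.com"));  // 1
        }
    }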

Sibling Factor (sfi) of a node is the fraction of its connections a node ni aims to maintain to its λl-suffix match nodes. For a node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point sfi of its connections (either incoming or outgoing) to other nodes with the same λl-suffix (i.e. austin.rr.com).

Parent Factor (pfi) of a node is the fraction of its connections a node ni aims to maintain to its λl−q-suffix match nodes, for some q ∈ [1..l). ni tries to maintain connections to parent nodes by keeping the value of q as small as possible. This yields a form of hierarchical topology, which is desirable in DNSR. For a node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point pfi of its connections (either incoming or outgoing) to other nodes with the λl−1-suffix match (i.e. ".rr.com") or the λl−2-suffix match (i.e. ".com") if the former are not available.

Child Factor (cfi) of a node is the fraction of its connections a node ni accepts from its λl+q-suffix match nodes, for some q > 0. ni tries to maintain connections to child nodes by keeping the value of q as small as possible. For a node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point cfi of its connections to other nodes with the λl+1-suffix match (i.e. ".subdomain.austin.rr.com") or a larger λ-suffix match if no such node is available.

[Figure 4: notation example. Node ni = cs6368146-17.austin.rr.com (di = 10) keeps sfi = 0.8 of its connections to siblings in *.austin.rr.com, pfi = 0.1 to its parent nj = node-17.rr.com (against which ni counts as cfj) and cfi = 0.1 to children.]

Fig. 4. Notation.

The relation of Degree (di) and Parent/Sibling/Child Factors (pfi/sfi/cfi). The purpose of the pfi, sfi and cfi factors (pfi + sfi + cfi = 1) is to allow ni to determine how to allocate its di connections to its peers. Whether these connections are incoming or outgoing is orthogonal, since each ni can in any case sustain only di connections at any given point. The three factors allow a node to set an order of preference for the connections it establishes or accepts. In the current scheme we assume that a node creates outgoing connections only to parent and sibling nodes, and accepts incoming connections only from its child and sibling nodes. To make the text more readable, we refer to the three factors (Parent, Sibling and Child) collectively as the Level factors.
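As a small worked example of this allocation, using the values of figure 4 (the rounding policy below is an assumption made for illustration; the paper does not fix one):

    // Worked example: splitting di = 10 connection slots using the factors
    // of figure 4. The rounding policy is an assumption for illustration.
    public class SlotAllocation {
        public static void main(String[] args) {
            int d = 10;
            double pf = 0.1, sf = 0.8, cf = 0.1;      // pf + sf + cf = 1
            int siblings = (int) Math.round(sf * d);  // 8 sibling slots
            int parents  = (int) Math.round(pf * d);  // 1 parent slot
            int children = d - siblings - parents;    // 1 child slot (remainder)
            System.out.printf("parents=%d siblings=%d children=%d%n",
                    parents, siblings, children);
            // An unfilled child slot may temporarily be lent to a sibling or
            // parent node, as section 3.2 describes.
        }
    }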

3.2 DNSR Topology

A DNSR topology is a semi-hierarchical topology where nodes having the same λl-suffix (i.e. sibling nodes) are highly connected, while connections to the parent or child layers are more sparse. This objective is achieved by tuning the pfi, sfi and cfi factors. Since the objective of DNSR is to keep traffic within the same domain, we assign a large sfi value to ni such that sfi ≫ pfi and sfi ≫ cfi. In figure 4 we can see that ni has sfi = 0.8, pfi = 0.1 and cfi = 0.1. Given that di = 10, ni aims to be connected to sfi · di = 8 sibling nodes, pfi · di = 1 parent node and cfi · di = 1 child node. Since ni has no child nodes, it may temporarily assign that slot to a sibling or parent node (preferably to a sibling). In that way ni achieves greater connectivity and potentially obtains a larger horizon.

[Figure 5: a DNSR topology instance with di = 3 and pfi = sfi = cfi = di/3, showing two Level-2 sibling clusters (nodes in cs.ucr.edu and cs.ucsd.edu) attached to Level-1 nodes such as n1.ucr.edu, n2.ucr.edu and n6.ucsd.edu.]

Fig. 5. A snapshot of a DNSR topology of 11 nodes. Each node has a degree d = 3 and launches outgoing connections to 2d/3 sibling nodes and d/3 parent nodes.

A DNSR topology snapshot for 11 hosts, each with a degree of 3, can be seen in figure 5. As we can see, the upper levels of the DNSR topology have sparser connections among them, while the leaf nodes have denser connections. In this particular example the number of hosts is very small, which does not allow us to illustrate the full potential of a DNSR topology. If the topology and the sfi factor were larger, we would observe more clearly that most of the connections stay within the same domain.

As we will see later in subsection 3.6, given such a topology a node can route messages in such a way that most of the incurred traffic remains within the same domain.

3.3 Joining the DNSR Network

Let nj denote a P2P client which wants to join a DNSR network. Since nj does not know which other nodes are currently active in the network, nj has to consult a discovery service D to obtain an initial list. DNSR does not specify the details of the initial discovery phase; it assumes that an out-of-band discovery service will provide nj with a random list L of active hosts, L = {nrand1, nrand2, ..., nrandk}. The discovery service D might in fact be implemented similarly to techniques currently deployed in P2P networks such as Gnutella, where two different techniques are used: i) discovery through a HostCache [21, 22] and ii) discovery by randomly probing nodes to which nj was connected in the past.

The role of D might also be extended so that it provides nj with a "selective" set of hosts (i.e. hosts that better match the Level factor needs of nj). We nevertheless believe that the discovery service can be implemented in many different ways and that its exact operation depends on the service that will utilize the DNSR protocol.

Once nj has obtained the random list L = {nrand1, nrand2, ..., nrandk}, DNSR requires nj to probe the k random nodes for the best entry point(s). It does so by sending to all k nodes a message of the form:

  nj:     LOOKUP mynode.domainX.com
  nrandi: +LOOKUPOK othernode.domainX.com

nrandi finds the most appropriate node for nj's request by performing a Domain-Name Lookup in the DNSR network. The most appropriate node is the node with the largest λ-suffix match for nj. The lookup operation is described in further detail in the next subsection.

3.4 Domain-Name Lookup in the DNSR Network

We already described in the previous subsection that the Domain-Name Lookup is used for admitting a node nj to the network, by providing nj with the nodes that have the largest λ-suffix match with it. To achieve this, each node ni receiving the lookup DNS name chains the query by forwarding it to only one of its connections, the one with the largest λ-suffix match to that name. If no such entry exists, ni can forward the lookup to a random node.

In the example of figure 6 we can see that a lookup is initiated on nj's behalf and is then chained through a number of intermediate hosts until we reach a host with the highest λ-similarity to the .cs.ucsd.edu domain suffix we are looking for. In order to keep routing simple we deploy the following scheme. Each time a node receives a lookup message, it calculates the λ-similarity between itself and the predicate (the DNS name being looked up). If the λ-similarity is -1 (no similarity at all) or 0 (similarity only on λ0), the lookup is forwarded to a parent node. If, on the contrary, λ-similarity = λl (the largest suffix of the predicate, which means that the predicate is a sibling node), the lookup terminates and a response is sent back along the same path the lookup arrived on. Finally, if 0 < λ-similarity < λl (which means that the predicate is a child node), the query is forwarded downwards until λ-similarity = λl, at which point the query terminates.

Of course there is a possibility that the λ-similarity never becomes equal to λl, since a node with the same λl-suffix might not exist in the network. To cope with this problem, the last node in the chain can either simply return a LOOKUPOK message urging nj to join it (since the last node in the chain is in any case the most appropriate node found), or it might forward the query to one of its sibling nodes, since they are equally appropriate.
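To make the three cases concrete, the following sketch expresses the forwarding decision as a pure function. It reuses the illustrative LambdaSuffix helper from section 3.1 and returns an action instead of touching any network code; it is not the PeerWare implementation.

    // Sketch of the LOOKUP forwarding rule described above; illustrative only.
    public class LookupRouting {
        public enum Action { ANSWER, FORWARD_TO_PARENT, FORWARD_TO_CHILD }

        public static Action decide(String selfDns, String lookupDns) {
            int sim = LambdaSuffix.similarity(selfDns, lookupDns);
            // λl is the largest suffix index of the predicate being looked up
            int lambdaL = LambdaSuffix.suffixes(lookupDns).size() - 1;
            if (sim == lambdaL) return Action.ANSWER;            // sibling: reply +LOOKUPOK
            if (sim <= 0)       return Action.FORWARD_TO_PARENT; // -1 or 0: climb up
            return Action.FORWARD_TO_CHILD;                      // partial match: descend
        }

        public static void main(String[] args) {
            // The lookup of figure 6: searching for nj = n5.cs.ucsd.edu
            System.out.println(decide("n4.cs.ucr.edu", "n5.cs.ucsd.edu"));  // FORWARD_TO_PARENT
            System.out.println(decide("n6.ucsd.edu", "n5.cs.ucsd.edu"));    // FORWARD_TO_CHILD
            System.out.println(decide("n3.cs.ucsd.edu", "n5.cs.ucsd.edu")); // ANSWER
        }
    }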

[Figure 6: a Domain-Name Lookup in a DNSR topology. A lookup for nj = n5.cs.ucsd.edu enters the network at n1 = n4.cs.ucr.edu, is first chained to a random sibling n2 = n5.cs.ucr.edu, then level up to n3 = n2.ucr.edu, level up again to n4 = n6.ucsd.edu, and finally level down to n5 = n3.cs.ucsd.edu; the lookup response travels back along the same path of overlay connections.]

Fig. 6. A Domain-Name Lookup in the DNSR topology is used to find the node(s) with the largest λ-suffix match to the DNS name we are looking for.

3.5 Leaving a DNSR Topology

Leaving a DNSR topology does not require any form of a priori notification; nodes can leave the network in an ad-hoc manner. Each node is expected to try to keep its degree at some pre-determined value di. Therefore, if a node nj leaves, nj's neighbors must try to establish a connection to a different host, while keeping the Level factors at the right values. If for some reason a node is not able to find appropriate nodes that keep its Level factors at the pre-determined values, it may temporarily keep them unbalanced until a more appropriate node is found.

Of course the problem remains of how to discover new nodes in the network. Again, a number of different techniques can be deployed. One potential technique is to exchange PING/PONG descriptors (like Gnutella) and actively discover nodes whenever a particular Level factor is not satisfied. If, for example, a node needs 2 parent links but is able to find only 1, it may decide to send a PING descriptor to its parent in order to discover another parent able to accommodate its connection request.

An alternative technique is to repeat the procedure described in the joining phase, where a node contacts a discovery service, obtains a random list and then performs a LOOKUP to find the most appropriate entry point. In the context of this project we did not have adequate time to evaluate either of the above techniques and therefore leave the issue open for future work.

3.6 Searching in a DNSR Network

In the previous subsections we have shown how a node nj joins a DNSR topology. This positions nj near other nodes that have the largest λ-similarity with nj. In this subsection we show how nj can search the contents of other nodes. Fortunately, the DNSR topology does not restrict which search algorithm is deployed on top of it. A number of different techniques, such as Breadth-First-Search (BFS), Random BFS or ISM [5], can be deployed. The bottom line with all techniques is that the bulk of the incurred traffic remains within the same domain, since the sfi factor is set to a large value such that sfi ≫ pfi and sfi ≫ cfi. Therefore only a modest amount of traffic makes its way to a different level of the network.

[Figure 7: searching in a DNSR topology using BFS. A QUERY issued at n1 = n2.cs.ucr.edu floods its siblings in cs.ucr.edu (e.g. n2 = n3.cs.ucr.edu) and only occasionally crosses to the parent level (n3 = n2.ucr.edu); QUERYHIT messages travel back along the reverse paths.]

Fig. 7. A search in the DNSR topology can be done with a number of different techniques. In this example we use BFS for simplicity. The important point to notice is that only a modest fraction of query messages make their way through to a different level. (In this example sfi = 0.33.)

A node nj searching for some content sends a message of the following form to some of its neighbors:

  nj: QUERY some query
  ni: +QUERYHIT IP, PORT

If the nodes deploy the BFS algorithm, nj sends the query to all of its neighbors (parents, siblings and children). The same happens at each node that receives the query, until a TTL (time-to-live) parameter becomes zero. The TTL parameter starts out at some constant value (e.g. 7) and is decreased by one at each query forward; therefore after 7 hops the query terminates. The TTL technique is widely used in network applications. The important point in the context of DNSR is that at each forward only a cfi + pfi fraction of the messages makes its way to a different level.
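A minimal sketch of this TTL-bounded flooding step follows (Neighbor is a placeholder interface assumed for illustration, not part of PeerWare's API):

    import java.util.List;

    // Sketch of the BFS forwarding step with a TTL bound; illustrative only.
    public class BfsForwarding {
        public interface Neighbor { void send(String query, int ttl); }

        /** Forward a QUERY to all neighbors until its TTL is exhausted. */
        public static void forward(String query, int ttl, List<Neighbor> neighbors) {
            if (ttl <= 0) return;           // the query dies after (initially) 7 hops
            for (Neighbor n : neighbors) {
                n.send(query, ttl - 1);     // decrement the TTL at each forward
            }
            // Under DNSR most neighbors are siblings (sf >> pf, cf), so only a
            // cf + pf fraction of these forwards crosses to another level.
        }
    }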

The DNSR topology leaves space for more sophisticated search techniques. A large-scale application, for instance, may decide not to forward a query to any parent or child node at all, given that the query might be satisfiable by the sibling nodes alone. For example, table 1 shows that approximately 1,300 Gnutella nodes can be found in the .tampabay.rr.com domain alone. If these nodes were interconnected with a DNSR topology, it would probably make sense to route messages to sibling nodes only. Nevertheless, we believe that a DNSR topology will be able to host different search techniques, depending on the context in which they are used.

4 Experimental Evaluation.

In order to test the applicability of the DNSR algorithm we would need access to a number of hosts running on hundreds or thousands of sites (i.e. domains). Since this was not feasible in the context of this project, we decided to simulate the DNSR algorithm over the PeerWare simulation infrastructure in our LAN. PeerWare [5] is our distributed middleware infrastructure, which runs on a network of 50 workstations and allows us to benchmark different routing algorithms for P2P networks. Probing different query-routing algorithms over middleware P2P systems is interesting from many points of view:

1. In real settings the scalability of various query-routing algorithms may be explored to the fullest extent, since there are none of the assumptions that are typical in simulation environments.

2. Many properties, such as network failures, dropped queries due to overloaded peers and others, may reveal interesting patterns.

3. Finally, in a middleware infrastructure we are also able to capture the actual time to satisfy queryhits.

PeerWare consists of three main components:

1. graphGen - the Network Graph Generator, which generates a network topology to be used for the simulation. graphGen generates a number of files which contain initialization information for the various dataPeers that comprise a simulation.

2. dataPeer - the Data Node, which is a P2P client that answers queries with queryhits if it meets the search criterion. A dataPeer initializes a number of connections to hosts, as these hosts are indicated by graphGen.

3. searchPeer - the Search Node, which is a P2P client that submits a number of queries to a PeerWare network and harvests the returned results. In contrast with dataPeer, searchPeer consists only of a Network Module and a Result Logging Mechanism. Besides logging the number of results, it also gathers a number of other statistics, such as the number of nodes that answered a particular query and the time to receive the results.

In order to customize PeerWare to the needs of the DNSR algorithm, we first extended graphGen to generate two different types of topologies: i) Random and ii) DNSR. For each of the topologies we map a subset of the DNS entries that we analyzed in section 2. These DNS addresses are mapped onto the physical IP addresses and port numbers of the hosts participating in a given simulation. In that way a dataPeer can simulate a DNS name although its physical address is different, which allows us to simulate such scenarios within our network of workstations.

  #  Domain  Number of Nodes  Percentage  |   #  Domain  Number of Nodes  Percentage
  1  com     397              39%         |   6  au      18               1%
  2  net     388              38%         |   7  be      13               1%
  3  ca       35               3%         |   8  de      12               1%
  4  edu      21               2%         |   9  uk      12               1%
  5  fr       19               1%         |  10  at       9               0%

Table 4. Distribution of λ0 for the 1000 randomly sampled DNS entries. This table shows that the sample from the initial set of 294,000 IP addresses captures the actual distribution (see table 2) of DNS entries.

  #  Domain        Number of Nodes  Percentage  |   #  Domain         Number of Nodes  Percentage
  1  rr.com        109              10%         |   6  cox.net        31               3%
  2  aol.com        87               8%         |   7  bellsouth.net  22               2%
  3  t-dialin.net   69               6%         |   8  shawcable.net  22               2%
  4  attbi.com      64               6%         |   9  sympatico.ca   22               2%
  5  comcast.net    35               3%         |  10  optonline.net  17               1%

Table 5. Distribution of λ1 for the 1000 randomly sampled DNS entries. This table shows that the sample from the initial set of 294,000 IP addresses captures the actual distribution (see table 3) of DNS entries.

4.1 Generating Simulation Topologies

In section 2 we presented an analysis of a set of ≈ 244,000 IP addresses gathered from the Gnutella network. Since we are not able to simulate 244,000 DNS names, due to a shortage of PCs, we decided to randomly sample 1000 entries from the initial set.

Tables 4 and 5 show the distribution of λ0 and λ1 of the sampled hosts. The tables indicate that the sampling is uniform and that it preserves the initial distributions of tables 2 and 3 respectively. For example, in both the initial and the sampled sets, the λ1 share of the .rr.com domain is ≈ 10%.

After obtaining the sampled set we generate two different topologies: i) Random and ii) DNSR. In the Random topology we use an out-degree of 3, which generates nodes whose average degree is 6 (incoming and outgoing connections). We use the same degree value for the DNSR topology so that the two topologies are comparable. graphGen generates a set of configuration files which can be read by the various nodes that comprise the simulation network topology. graphGen starts out by reading graph.conf, which contains, among others, the following parameters:

1. Outdegree of a node, which is used in the case of a random graph.

2. Topology of the P2P network (e.g. random graph).

3. IP list of hosts that will participate in a simulation. This allows us to map a logical topology (e.g. Node1 -> Node10) to many different IP topologies.

  # UCR Random Graph Generator
  # These are my settings
  MYDNS = acbee1bf.ipt.aol.com
  MYIP = 283-25.cs.ucr.edu
  MYPORT = 10094
  # Peers that I should connect to:
  abbef1ef.ipt.aol.com = 283-22.cs.ucr.edu, 10707
  bcdec1ba.ipt.aol.com = 283-21.cs.ucr.edu, 10720
  cable-33-247.sssnet.com = 283-20.cs.ucr.edu, 10020

Table 6. The myhosts.graph file for "acbee1bf.ipt.aol.com" shows the outgoing connections that will be established during initialization.

The output of graphGen is a directory of several myhosts.graph files (see table 6). Each file contains the IP addresses and ports of the hosts to which a particular dataPeer must connect. Each dataPeer ni reads upon initialization a myhosts.graph file which contains the IPs and ports of the other dataPeers to which ni must connect. Each dataPeer continuously tries to establish and maintain its outgoing connections; therefore we are not required to incorporate any topological sort algorithm. Connections among dataPeers are achieved by the use of TCP sockets and are persistent (they remain open until ni shuts down). If a TCP connection goes down because of an overloaded peer, the node automatically re-establishes the connection after some small interval.
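A minimal sketch of how a dataPeer might parse this file format follows (illustrative only; the actual PeerWare initialization code is not reproduced here):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Sketch of a myhosts.graph reader following the format of table 6.
    public class GraphFileReader {
        public static void main(String[] args) throws IOException {
            for (String line : Files.readAllLines(Paths.get("myhosts.graph"))) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue;  // skip comments
                String[] kv = line.split("=", 2);
                String key = kv[0].trim(), value = kv[1].trim();
                if (key.equals("MYDNS") || key.equals("MYIP") || key.equals("MYPORT")) {
                    System.out.println("own setting: " + key + " = " + value);
                } else {
                    // "simulatedDns = physicalHost, port" describes one outgoing link
                    String[] hostPort = value.split(",");
                    String host = hostPort[0].trim();
                    int port = Integer.parseInt(hostPort[1].trim());
                    System.out.println("connect to " + key + " at " + host + ":" + port);
                    // a real dataPeer would open a persistent TCP socket here and
                    // keep retrying if the connection drops
                }
            }
        }
    }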

4.2 Experiments

For the purpose of the experimentation we deploy 1000 dataPeers running on a network of 25 workstations, each of which has an AMD Athlon4 1.4 GHz processor with 1 GB RAM running Mandrake Linux 8.0 (kernel 2.4.3-20), all interconnected by a 10/100 LAN.

Obviously, launching a large number of dataPeers on many different machines is a tedious procedure. We have therefore constructed a set of UNIX shell scripts which automatically (by the use of ssh and public/private keys) connect to any number of machines and launch the dataPeers. Bringing up a PeerWare network of 1000 dataPeers on 25 machines takes about a minute.

[Figure 8 (pie chart), Random topology: .com 47%, .net 45%, .ca 4%, .edu 2%, .fr 2%.]

Fig. 8. Distribution of QUERY messages reaching λ0 hosts in a Random topology. The graph shows that the distribution of hosts contacted closely follows the actual distribution of the hosts (see table 4).

[Figure 9 (pie chart), DNSR topology: .com 85%, .ca 7%, .fr 4%, .net 2%, .be 2%.]

Fig. 9. Distribution of QUERY messages reaching λ0 hosts in a DNSR topology. The graph shows that the hosts contacted come almost exclusively from the ".com" domain: 85% of the traffic remains in the .com domain.

After the PeerWare network is brought up, we connect to one host ni and submit, using BFS, a total of 40 queries, each with a TTL of 7. None of the hosts actually answers any of the queries, as we are not interested in the QUERYHIT messages; what we are interested in is the distribution of hosts contacted by these queries. Our evaluation metric for the experiments was therefore the distribution of hosts contacted under a DNSR topology as compared to a Random topology. It is expected that DNSR keeps the bulk of the traffic within the same domain, yielding distributions where one particular domain receives most of the queries.

As we can see in pie charts 8 and 9, the distribution of QUERY messages reaching λ0 hosts in a Random topology is much like the actual distribution of the hosts (see table 4). The Random topology does not favor any particular domain, and the distribution of hosts that receive a QUERY message is clearly a function of the actual distribution of hosts in the network. On the other hand, the DNSR topology favors the hosts that have the greatest λ-similarity with the host ni we initially connected to. This happens because ni is expected to be connected preferentially to hosts that have the largest possible λ-similarity with it. In this case, since ni belonged to the rr.com domain, figure 9 shows that 85% of the traffic affected ".com" hosts.

In pie charts 10 and 11 we can see more clearly that 24% of the hosts contacted in the DNSR topology belong to ".rr.com", compared to 11% in the Random topology. This number could be much greater in the case of the DNSR topology if the topology included more hosts and if the sfi factor were larger. In this experiment the sfi factor was 0.6, because we used only 3 outgoing connections per host.

[Figure 10 (pie chart), Random topology - ISP level: other domains 64%, .rr.com 11%, .aol.com 10%, .t-dialin.net 8%, .attbi.com 7%.]

Fig. 10. Distribution of QUERY messages reaching λ1 hosts in a Random topology. The graph again shows that the distribution of hosts contacted is much like the actual distribution of the hosts (see table 5).

[Figure 11 (pie chart), DNSR topology - ISP level: .rr.com 24%, .aol.com 12%, .attbi.com 12%, .rogers.com 4%, other domains 48%.]

Fig. 11. Distribution of QUERY messages reaching λ1 hosts in a DNSR topology. The graph shows that the distribution of hosts contacted favors the ".rr.com" domain (24%). The node to which we submitted the queries belonged to the rr.com domain.

4.3 Implementation

The PeerWare infrastructure is implemented entirely in Java. Its implementation consists of approximately 10,000 lines of code. For this project we extended PeerWare by adding dnsr.apps.*, which contains the implementations of the DataNode and the QueryNode. The package consists of approximately 1,000 lines of code. The DNSR networking core package dnsr.core.* contains only a few additions to the initial PeerWare package set. In DNSR we added a feature by which each node along a queryhit path inserts its identity into the QUERYHIT message, so that the sender becomes aware of the path the QUERYHIT message travelled. The package dnsr.graphgen.* is also an extended version of the initial PeerWare graphgen package set. It consists of 2,200 lines of code and is able to generate both random and DNSR graphs. Finally, a prototype version of a hostcache implementation is included in dnsr.multiserver.*. Its implementation uses connection pooling to increase the performance of the system.

Java was chosen for a variety of reasons. Its object-oriented design enhances the software development process, supports rapid prototyping and enables the re-use and easy integration of existing components. Java class libraries provide support for key features of PeerWare: platform independence, multithreading, network programming, high-level programming of distributed applications, string processing, code mobility, compression, etc. Other Java features, such as automatic garbage collection, persistence and exception handling, are crucial in making our system more tolerant to run-time faults.

The choice of Java, however, comes with a certain risk factor that arises from known performance problems of this programming language and its run-time environment. Notably, performance and robustness are issues of critical importance for a distributed system like PeerWare, which is expected to run on several machines and to sustain high loads for short periods of time. In our experiments, we found the performance of Java SDK 1.3 satisfactory.

5 Related Work.

5.1 The Domain Name Service (DNS)

The Domain Name Service protocol, described in [20], is an application-layer protocol that uses UDP and translates "mnemonic" hostnames (e.g. www.cs.ucr.edu) to their underlying numeric IP address(es) (e.g. 138.23.169.15). At its core, the protocol consists of a distributed database implemented as a hierarchy of name servers. There are three different types of name servers: (i) Local Name Servers, which are in charge of caching and maintaining the DNS-to-IP mappings so that they can serve the clients of a given Autonomous System, (ii) Authoritative Name Servers, which register DNS-to-IP mappings (there are usually two name servers per domain), and (iii) Root Name Servers, which forward DNS resolution requests from Local Name Servers to Authoritative Name Servers in case the former do not know the mapping.

The success of the distributed DNS database is attributed, in our opinion, to the facts that (i) domain names do not change often, which leaves space for sparse DNS updates (which are reflected in the Local Name Servers' caches), and (ii) the name server hierarchy is assumed to be static (since DNS daemons run on high-end servers). In the context of overlay P2P networks, both advantages of DNS are unfortunately not applicable, since we have a completely dynamic topology where hosts join and leave at a high pace. Therefore, designing a DNS-like hierarchy of nodes for an overlay network, where nodes can locate other nodes that belong to the same domain, might not be efficient or even applicable.

5.2 Topologically-Aware Overlay Construction and Server Selection.

In [1] the authors present a binning scheme in which nodes partition themselves into disjoint "bins", such that nodes that fall in the same bin are relatively close to each other in the network. Such a scheme, like DNSR, is incorporated for performance optimization rather than correct operation. Their scheme is simple and relies only on a small number of landmarks which are positioned at well-known addresses in the network. A node, prior to joining the network, measures the network latency (i.e. RTT) between itself and k landmarks. The ascending RTT ordering of the k landmarks determines the bin of a particular node; two nodes with the same or a similar ordering of the landmarks are expected to be close to each other. For the server selection process, a node would query a specialized DNS server that holds DNS-to-{IP, bin} mappings and would be returned the most appropriate entry. The idea of the specialized DNS server could be extended into a Hostcache server that returns peers that are relatively close to the peer making the request.

A main disadvantage of the scheme is that it relies on the reliability of the landmarks. If one or more landmarks leave the network, because they might in fact be ordinary nodes, then the bins maintained by the various peers are no longer preserved. These nodes then need to find another landmark, which might become an expensive operation in dynamic topologies where nodes join and leave at a high pace.
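As an illustration of the binning idea (our own sketch based on the description in [1]; the encoding of the bin as a string is an assumption for illustration):

    import java.util.Arrays;
    import java.util.Comparator;

    // Sketch of landmark binning: a node's bin is the ascending-RTT
    // ordering of k well-known landmarks.
    public class LandmarkBin {
        /** rtts[i] = measured RTT (in ms) to landmark i; returns e.g. "1-2-0". */
        public static String bin(double[] rtts) {
            Integer[] order = new Integer[rtts.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            Arrays.sort(order, Comparator.comparingDouble(i -> rtts[i]));
            StringBuilder sb = new StringBuilder();
            for (int i : order) sb.append(sb.length() == 0 ? "" : "-").append(i);
            return sb.toString();  // nodes with identical orderings share a bin
        }

        public static void main(String[] args) {
            // Three hypothetical landmarks with RTTs of 140, 9 and 46 ms:
            System.out.println(bin(new double[]{140.0, 9.0, 46.0}));  // "1-2-0"
        }
    }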

5.3 Narada and End System Multicast

The Narada application-layer multicast protocol is described in [2]. The main objective of the protocol is to make multicast successful by moving its logic from the network layer to the application layer; in that way the protocol does not rely on the intermediate routers. The protocol initially constructs a richer connected graph, denoted as a mesh, and then uses a mesh optimization algorithm to generate a mesh that has certain performance properties. More specifically, it attempts to ensure (i) that the shortest-path delay between any pair of members along the mesh is at most K, where K is a small constant, and (ii) that each member has a limited number of neighbors in the mesh. Narada uses mechanisms for member joins, leaves and failures, ensuring that the mesh is kept connected and that the mesh quality improves over time.

On top of the self-improving mesh, Narada runs a Distance Vector Routing (DVR) algorithm to achieve data delivery to the members of a multicast group. The metric used in the DVR algorithm is the latency between neighbors. The protocol is simulated over a number of different types of topologies as well as over a real setting of 13 hosts which are geographically distributed throughout the United States. Their prototype system shows that the generated overlay spanning tree matches the actual underlying physical topology.

The main differences between Narada and DNSR are the following:

1. Narada is a multicast protocol for overlay topologies, while DNSR is a variation of a broadcast protocol where messages are routed to hosts which have the longest domain name suffix match.

2. DNSR is targeted at large ad-hoc communities where nodes join and leave at a high pace.

3. As part of the mesh quality improvement algorithm, Narada nodes randomly probe each other and calculate the perceived gain in utility. In DNSR, on the other hand, this costly procedure is avoided, since nodes are already assumed to be connected to the best peers (i.e. peers within the same domain).

5.4 Connecting to Semantically Similar Nodes.

Semantically clustering nodes provides a different direction in P2P optimization. The parameter optimized is user satisfaction, as a function of the quantity and quality of the returned results. Techniques such as [13, 14] present an altogether different philosophy and are not directly comparable to DNSR, since the latter optimizes the network efficiency parameter. In fact, the network efficiency and user satisfaction criteria might conflict, and the tradeoff between these two parameters is not evident.

6 Conclusions.

In this work we propose and evaluate DNSR (Domain Name Suffix-based Routing), a novel technique for routing query messages in overlay networks based on the "domain closeness" of the node sending the message. We describe DNSR and present simulation experiments performed over PeerWare, our distributed infrastructure which runs on a network of 50 workstations. Our simulations are based on real data gathered from one of the largest open P2P networks, namely Gnutella.

The experiments show that the idea of domain name suffix-based routing of messages in large-scale P2P communities is highly applicable, and that ISPs and corporations have much to gain from the deployment of such a scheme. The DNSR scheme is a simple technique which is expected to behave, in the worst case, like a Random topology.

In the future we plan to investigate more carefully the integration of DNS routing updates at the highest level of the DNSR topology. Such a feature may provide nodes with the ability to more easily locate domains in the case of lookup queries. We are also interested in deploying a larger simulation over the PlanetLab [28] distributed overlay testbed, which is expected to run on over 1,000 geographically distributed machines in the next two years.

References

1. Sylvia Ratnasamy, Mark Handley, Richard Karp and Scott Shenker, "Topologically-Aware Overlay Construction and Server Selection", Proceedings of IEEE INFOCOM'02, 2002.

2. Yang-hua Chu, Sanjay G. Rao and Hui Zhang, "A Case For End System Multicast", Proceedings of ACM SIGMETRICS, Santa Clara, CA, June 2000, pp. 1-12.

3. Arturo Crespo and Hector Garcia-Molina, "Routing Indices for Peer-to-Peer Systems", In ICDCS, 2002.

4. D. Zeinalipour-Yazti and T. Folias, "Quantitative Analysis of the Gnutella Network Traffic", Dept. of Computer Science, University of California, Riverside, June 2002.

5. V. Kalogeraki, D. Gunopulos and D. Zeinalipour-Yazti, "A Local Search Mechanism for Peer-to-Peer Networks", 11th International Conference on Information and Knowledge Management (CIKM 2002), McLean, Virginia, USA, November 4-9, 2002.

6. Francisco Matias Cuenca-Acuna and Thu D. Nguyen, "Text-Based Content Search and Retrieval in ad hoc P2P Communities", International Workshop on Peer-to-Peer Computing, Springer-Verlag, May 2002.

7. B. Yang and H. Garcia-Molina, "Efficient Search in Peer-to-Peer Networks", Proc. Int. Conf. on Distributed Computing Systems, 2002.

8. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for Internet applications", Proc. of ACM SIGCOMM 2001.

9. A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001.

10. Miguel Castro, Peter Druschel, Y. Charlie Hu and Antony Rowstron, "Topology-aware routing in structured peer-to-peer overlay networks", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001.

11. Marcel Waldvogel and Roberto Rinaldi, "Efficient Topology-Aware Overlay Network", ACM Computer Communication Review, Vol. 33, No. 1, January 2003.

12. Venkata N. Padmanabhan and Lakshminarayanan Subramanian, "An Investigation of Geographic Mapping Techniques for Internet Hosts", Proceedings of ACM SIGCOMM 2001, San Diego, CA, USA, August 2001.

13. M. K. Ramanathan, V. Kalogeraki and J. Pruyne, "Finding Good Peers in Peer-to-Peer Networks", International Parallel and Distributed Processing Symposium (IPDPS), Fort Lauderdale, Florida, April 2002.

14. Arturo Crespo and Hector Garcia-Molina, "Semantic Overlay Networks", Stanford University.

15. J. Moy, "Open Shortest Path First (OSPF) Version 2", Request for Comments 2328, Network Working Group, April 1998.

16. J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee, "Hypertext Transfer Protocol (HTTP/1.1)", Request for Comments 2616, Network Working Group, June 1999.

17. J. Myers and M. Rose, "Post Office Protocol - Version 3 (POP3)", Request for Comments 1939, Network Working Group, May 1996.

18. Jonathan B. Postel, "Simple Mail Transfer Protocol (SMTP)", Request for Comments 821, Network Working Group, August 1982.

19. Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot and E. Lear, "RFC 1918 - Address Allocation for Private Internets", February 1996.

20. P. Mockapetris, "Domain Names - Implementation and Specification", Request for Comments 1035, Network Working Group, November 1987.

21. GWebCache, http://www.zero-g.net/gwebcache/specs.html.

22. GnetCache, http://sourceforge.net/projects/gnetcache/.

23. "The SETI@home (Search for Extraterrestrial Intelligence at Home) Project", UC Berkeley, http://setiathome.ssl.berkeley.edu/.

24. File Rogue, File Rogue Inc., http://www.filerogue.com/.

25. Clip2, Clip2.com, http://www.clip2.com/.

26. Gnutelliums, Gnutella, http://www.gnutelliums.com/.

27. Groove, Groove Networks Inc., http://www.groove.net/.

28. PlanetLab, "An open testbed for developing, deploying, and accessing planetary-scale services", http://www.planet-lab.org/.

29. Napster, Napster.com, http://www.napster.com/.