868 ieee transactions on mobile computing, vol. 10, no....

13
Data Delivery Properties of Human Contact Networks Nishanth Sastry, Student Member, IEEE, D. Manjunath, Member, IEEE, Karen Sollins, Member, IEEE, and Jon Crowcroft, Fellow, IEEE Abstract—Pocket Switched Networks take advantage of social contacts to opportunistically create data paths over time. This work employs empirical traces to examine the effect of the human contact process on data delivery in such networks. The contact occurrence distribution is found to be highly uneven: contacts between a few node pairs occur too frequently, leading to inadequate mixing in the network, while the majority of contacts occur rarely, but are essential for global connectivity. This distribution of contacts leads to a significant variation in the fraction of node pairs that can be connected over time windows of similar duration. Good time windows tend to have a large clique of nodes that can all reach each other. It is shown that the clustering coefficient of the contact graph over a time window is a good predictor of achievable connectivity. We then examine all successful paths found by flooding and show that though delivery times vary widely, randomly sampling a small number of paths between each source and destination is sufficient to yield a delivery time distribution close to that of flooding over all paths. This result suggests that the rate at which the network can deliver data is remarkably robust to path failures. Index Terms—Pocket Switched Networks, human mobility networks, flooding, statistical properties, path failure tolerance. Ç 1 INTRODUCTION T HE Pocket Switched Network (PSN) paradigm [1] was proposed as a means of ferrying data using human social contacts. At its core is the idea that as the storage capacities of mobile devices increase, and support for short- range data transfer protocols (e.g., bluetooth) becomes more prevalent, we could use these devices to construct data paths in a store-carry-forward fashion: various intermediate nodes store the data on behalf of the sender and carry it to another contact opportunity where they forward the data to the destination or another node that can take the data closer to the destination. A typical contact opportunity where data hops from one node to another happens when two mobile devices are in range of each other, usually when the humans carrying the devices meet in a social situation. Thus, the path from source to destination is constructed over time and consists of a chain of intermediate nodes that altruistically carry the sender’s data at various times before the destination gets it. Designing for, and exploiting human mobility in this manner becomes important in situations where networking infrastructure has been damaged (e.g., after a disaster hits a city), or does not exist (e.g., in remote areas and in some developing countries). By the same token, certain applica- tions, such as email, database synchronization and certain types of event notifications, are inherently asynchronous and can tolerate relatively long delays. For such applica- tions, mobility can be exploited to provide multiuser diversity gains [2]. New applications, such as exchanging music and media files over an ad-hoc collocated peer-to- peer network, are being proposed [3] to take advantage of short-range connectivity between mobile devices. At an abstract level, the PSN of N nodes can be thought of as a temporally evolving undirected contact graph with N vertices. At any given instant, very few edges exist. Paths are created over time, with each contact corresponding to a new (momentary) edge in the graph. Edges appear and disappear according to some underlying stochastic process that corresponds to the social contacts. The sequences of edges (contacts) that occur constitutes a trace of the PSN. This work explores how contact occurrences shape data delivery, using empirical traces. Our simulations discover the quickest possible paths by flooding data at every contact opportunity. We then study the achievable performance of the contact network in terms of the fraction of data delivered (delivery ratio), as well as the time to delivery. The delivery ratio is indicative of the fraction of node pairs that can be connected and is, therefore, a measure of the connectivity of the network. We study performance from three different perspectives. First, we study the distribution of contact occurrences, by examining the evolution of the delivery ratio over a “natural” timescale that counts time in terms of the number of contacts made globally in the entire network. Next we study the distribution of delivery ratios over time windows of fixed duration. Finally, we study the distribution of delivery times of all the paths discovered by the flooding process. 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011 . N. Sastry and J. Crowcroft are with the Computer Laboratory, University of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom. E-mail: {nishanth.sastry, jon.crowcroft}@cl.cam.ac.uk. . D. Manjunath is with the Department of Electrical Engineering, Indian Institute of Technology (IIT) Bombay, Powai, Mumbai 400 076, India, and with the Bharti Centre for Communications. E-mail: [email protected]. . K. Sollins is with the Computer Science and Artificial Intelligence Laboratory, Masachusetts Institute of Technology, The Stata Center, Building 32, 32 Vassar Street, Cambridge, MA 02139. E-mail: [email protected]. Manuscript received 13 Feb. 2010; revised 1 July 2010; accepted 26 Aug. 2010; published online 15 Nov. 2010. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2010-02-0071. Digital Object Identifier no. 10.1109/TMC.2010.225. 1536-1233/11/$26.00 ß 2011 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS

Upload: others

Post on 22-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

Data Delivery Propertiesof Human Contact Networks

Nishanth Sastry, Student Member, IEEE, D. Manjunath, Member, IEEE,

Karen Sollins, Member, IEEE, and Jon Crowcroft, Fellow, IEEE

Abstract—Pocket Switched Networks take advantage of social contacts to opportunistically create data paths over time. This work

employs empirical traces to examine the effect of the human contact process on data delivery in such networks. The contact

occurrence distribution is found to be highly uneven: contacts between a few node pairs occur too frequently, leading to inadequate

mixing in the network, while the majority of contacts occur rarely, but are essential for global connectivity. This distribution of contacts

leads to a significant variation in the fraction of node pairs that can be connected over time windows of similar duration. Good time

windows tend to have a large clique of nodes that can all reach each other. It is shown that the clustering coefficient of the contact

graph over a time window is a good predictor of achievable connectivity. We then examine all successful paths found by flooding and

show that though delivery times vary widely, randomly sampling a small number of paths between each source and destination is

sufficient to yield a delivery time distribution close to that of flooding over all paths. This result suggests that the rate at which the

network can deliver data is remarkably robust to path failures.

Index Terms—Pocket Switched Networks, human mobility networks, flooding, statistical properties, path failure tolerance.

Ç

1 INTRODUCTION

THE Pocket Switched Network (PSN) paradigm [1] wasproposed as a means of ferrying data using human

social contacts. At its core is the idea that as the storagecapacities of mobile devices increase, and support for short-range data transfer protocols (e.g., bluetooth) becomes moreprevalent, we could use these devices to construct datapaths in a store-carry-forward fashion: various intermediatenodes store the data on behalf of the sender and carry it toanother contact opportunity where they forward the data tothe destination or another node that can take the data closerto the destination.

A typical contact opportunity where data hops from onenode to another happens when two mobile devices are inrange of each other, usually when the humans carrying thedevices meet in a social situation. Thus, the path fromsource to destination is constructed over time and consists ofa chain of intermediate nodes that altruistically carry thesender’s data at various times before the destination gets it.

Designing for, and exploiting human mobility in thismanner becomes important in situations where networkinginfrastructure has been damaged (e.g., after a disaster hits a

city), or does not exist (e.g., in remote areas and in somedeveloping countries). By the same token, certain applica-tions, such as email, database synchronization and certaintypes of event notifications, are inherently asynchronousand can tolerate relatively long delays. For such applica-tions, mobility can be exploited to provide multiuserdiversity gains [2]. New applications, such as exchangingmusic and media files over an ad-hoc collocated peer-to-peer network, are being proposed [3] to take advantage ofshort-range connectivity between mobile devices.

At an abstract level, the PSN of N nodes can be thoughtof as a temporally evolving undirected contact graph withN vertices. At any given instant, very few edges exist. Pathsare created over time, with each contact corresponding to anew (momentary) edge in the graph. Edges appear anddisappear according to some underlying stochastic processthat corresponds to the social contacts. The sequences ofedges (contacts) that occur constitutes a trace of the PSN.

This work explores how contact occurrences shape datadelivery, using empirical traces. Our simulations discoverthe quickest possible paths by flooding data at every contactopportunity. We then study the achievable performance ofthe contact network in terms of the fraction of datadelivered (delivery ratio), as well as the time to delivery.The delivery ratio is indicative of the fraction of node pairsthat can be connected and is, therefore, a measure of theconnectivity of the network.

We study performance from three different perspectives.First, we study the distribution of contact occurrences, byexamining the evolution of the delivery ratio over a “natural”timescale that counts time in terms of the number of contactsmade globally in the entire network. Next we study thedistribution of delivery ratios over time windows of fixedduration. Finally, we study the distribution of delivery timesof all the paths discovered by the flooding process.

868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

. N. Sastry and J. Crowcroft are with the Computer Laboratory, Universityof Cambridge, William Gates Building, 15 JJ Thomson Avenue, CambridgeCB3 0FD, United Kingdom.E-mail: {nishanth.sastry, jon.crowcroft}@cl.cam.ac.uk.

. D. Manjunath is with the Department of Electrical Engineering, IndianInstitute of Technology (IIT) Bombay, Powai, Mumbai 400 076, India, andwith the Bharti Centre for Communications. E-mail: [email protected].

. K. Sollins is with the Computer Science and Artificial IntelligenceLaboratory, Masachusetts Institute of Technology, The Stata Center,Building 32, 32 Vassar Street, Cambridge, MA 02139.E-mail: [email protected].

Manuscript received 13 Feb. 2010; revised 1 July 2010; accepted 26 Aug.2010; published online 15 Nov. 2010.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TMC-2010-02-0071.Digital Object Identifier no. 10.1109/TMC.2010.225.

1536-1233/11/$26.00 � 2011 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS

Page 2: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

We find that the distribution of contact occurrences ishighly skewed: an edge randomly chosen from the NðN �1Þ=2 possible edges in the PSN’s graph is much more likelyto be a rare contact in the trace than a frequently occurringcontact; a rare contact could occur fewer than ten times inthe trace, whereas a frequent contact could occur hundredsof times. The PSN’s connectivity, i.e., its ability to connecttwo nodes over time, depends crucially on the rare contacts.On the other hand, frequent contacts often occur withoutthere being new data to exchange, even when flooding isthe route selection strategy used. This inadequate inter-mingling of contacts increases the total number of contactsrequired to achieve a given delivery ratio.

Although it would appear from this macroscopic picturethat PSNs are largely inefficient, we find that over timewindows of fixed duration, there is a significant variation inthe achieved delivery ratio. We discover that in timewindows in which a large fraction of data gets delivered,rather than all nodes being able to uniformly reach eachother with the same efficiency, there is usually a large clique

of nodes that have 100 percent reachability amongthemselves. We show how to identify such time windowsand the nodes involved in the clique by computing aclustering coefficient on the contact graph.

Finally, we examine the effect of path failures on datadelivery. Flooding finds a number of paths between eachsource and destination. Unless all these paths fail, data willeventually be delivered. However, there is a wide variationin the delivery time distributions of the quickest and slowestpaths between node-pairs. Thus time to delivery is expectedto increase if some of the quicker paths fail. To understandthis, we examine the impact of failing a randomly chosensubset of the paths found by flooding between each senderand destination and find that the delivery time distributionis remarkably resilient to path failures.

Specifically, we study two failure modes among pathsfound by flooding: the first, proportional flooding, examinesdelivery times when only a fixed fraction � of pathsbetween a sender and destination can be used. The second,k-copy flooding, assumes that at most a fixed number k > 1

of the paths between a node-pair survive. In both cases, wefind that the delivery time distribution of the quickest pathscan be closely approximated with relatively small � and k. Itis shown that a constant increase in � (respectively, k) bringsthe delivery time distribution of proportional (k-copy)flooding exponentially closer to the delivery time distribu-tion of the quickest paths found by flooding.

Note that the empirically observed cumulative distribu-tion of delivery times can also be interpreted as the evolutionin time of delivery ratio, normalized by the ratio eventuallyachieved at the end of the time window.1 Thus, the aboveresult indicates that the rate at which the network can deliverdata is not greatly affected even if many of the paths fail forone reason or another. The success of k > 1-copy flooding canalso provide a loose motivation for heuristics-based routingalgorithms that explore multiple paths simultaneously.

The rest of the paper is structured as follows: In Section 2,we discuss the details of our simulations and the methodol-ogy used to probe for the delivery properties in our traces.The effect of the contact occurrence distribution and theorder of contact occurrences on data deliveries are pre-sented in Section 3. In Section 4, we show how to exploitperiods of good connectivity by identifying cliques of nodeswhich can all reach each other. Next we explore the timeevolution of delivery ratio by studying the delivery timedistribution: In Section 5, we discuss components thatcontribute to delay on successful paths and Section 6 showsthat random sampling closely approximates the quickestpossible delivery times. Section 7 discusses related work inthe area and Section 8 concludes.

2 SETUP AND METHODOLOGY

This section motivates the choice of traces, the simulationsetup and the performance measures used.

2.1 Traces

We imagine the participants of a PSN would be a finite groupof people who are at least loosely bound together by somecontext—for instance, first responders at a disaster situation,who need to send data to each other. Multiple PSNs couldcoexist for different contexts, and a single individual couldconceivably participate in several different PSNs.2

Our model that PSN participants form a cohesive groupplaces the requirement that an ideal PSN should be able tocreate paths between arbitrary source-destination pairs. Thisis reflected in our simulation setup, where the destinationsfor each source node are chosen randomly. Also, our tracesare picked to be close to the limits of Dunbar’s number(¼ 147:8, 95 percent confidence limits: 100.2-231.1), theaverage size for cohesive groups of humans [5].

The first trace comes from a four week subset of theUCSD Wireless Topology Discovery [6] project whichrecorded Wi-Fi Access Points seen by subjects’ PDAs. Wetreat PDAs simultaneously in range of the same Wi-Fiaccess point as a contact opportunity. This data has N ¼ 202subjects. The second trace consists of bluetooth contactsrecorded from November 1, 2004 to January 1, 2005 betweenparticipants of the MIT Reality Mining project [7]. Weconservatively set five minutes as the minimum alloweddata transfer opportunity and discarded contacts of smallerdurations. This trace has contacts between N ¼ 91 subjects.

The subjects in the MIT trace consist of a mixture ofstudents and faculty at the MIT Media Lab, and incomingfreshmen at the MIT Sloan Business School. The UCSD traceis comprised of a select group of freshmen, all from UCSD’sSixth College. As such, we can expect subjects in both tracesto have reasons for some amount of interaction, leading to aloosely cohesive group structure. Prior work on communitymining using the same traces supports this [8].

It is important to emphasize that our focus is solely onthe capability and efficiency of the human contact processin forming end-to-end paths. The precise choice of the

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 869

1. If the empirical probability that the delivery time is less than t is r, thena fraction r of the data that eventually get delivered have been delivered bytime t.

2. Note that this is in contrast to a single unboundedly large network ofsocially unrelated individuals as in the famous “small-world” experiment[4] that examined reachability in a network essentially comprising allAmericans and discovered an average 5.2 (�6) degrees of separation.

Page 3: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

minimum data transfer opportunity is less important—it isentirely possible that a new technology would allow forfaster node-node transfers. Indeed, our results are qualita-tively similar for other cutoff values tested. Similarly, adifferent technology for local node-node transfers couldhave different “reach,” allowing more nodes to be in contactwith each other simultaneously. Nevertheless, the substan-tial similarities (see rest of the paper) between results basedon two different technologies and traces—the Wi-Fi-basedUCSD trace and the bluetooth-based MIT trace—gives ussome confidence that the results below may be applicablebeyond the traces and technologies we have considered.

2.2 Simulation Setup and Measurement

Setup. At the beginning of simulation, data is created,marked for a randomly chosen destination, and associatedwith the source node. An oracle with complete knowledgeof the future can choose to transfer data at appropriatecontact opportunities and thereby form the quickest path tothe destination. To simulate this, we exhaustively enumer-ate all possible paths found by flooding data at each contactopportunity, and choose the quickest.

Performance measure. Consider the time-ordered se-quence (with ties broken arbitrarily) of contacts that occurglobally in the network. Since there are NðN � 1Þ quickestpaths between different sender-destination pairs, a max-imum3 of NðN � 1Þ contacts in the global sequence ofcontacts act as path completion points. Of these, Nd become“interesting” when there are d destinations per sender.Since the destinations are chosen randomly, we mightexpect that on average, if k path completion points haveoccurred, the fraction of these that are interesting isindependent of d: when d is greater, more data getsdelivered after k path completion points, but there is alsomore data to deliver.

The above discussion motivates our method of measur-ing the efficiency of the PSN: at any point in the simulation,the delivery ratio, measured as the fraction of data that hasbeen delivered, or equivalently, the number of “interest-ing” path completion points we have seen, is taken as afigure of merit. The more efficient the PSN is, the faster thedelivery ratio evolves to one, as the number of contacts andtime increase.

Unless otherwise specified, our experiments examinedelivery ratio evolution statistically averaged over 10 in-dependent runs, with each run starting at a random point inthe trace, and lasting for 6,000 contacts. We confirm ourintuition in Fig. 1, which shows that the delivery ratioevolves similarly, whether d is 1 or a maximum of N � 1destinations per sender. We note that the graph alsorepresents the fastest possible evolution of the delivery ratiounder the given set of contacts, due to the use of flooding.

3 ORDER AND DISTRIBUTION OF CONTACTS

A PSN contact trace is determined by the distribution ofcontact occurrences and the time order in which thesecontacts occur. In this section, we examine how theseproperties affect delivery ratio evolution.

Given two traces, the more efficient one will manage toachieve a given delivery ratio with fewer number of contacts.Our approach is to create a synthetic trace from the originaltrace by disrupting the property we wish to study. Compar-ing delivery ratio evolution in the original and synthetictraces informs us about the effects of the property.

Our main findings are that in both the traces weexamine, time correlations between contacts that occur toofrequently leads to noneffective contacts in which no newdata can be exchanged, and that the progress of the deliveryratio as well as the connectivity of the PSN itself areprecariously dependent on rare contacts.

3.1 Frequent Contacts are Often Noneffective

To investigate the effect of the time order in which contactsoccur, we replay the trace, randomly shuffling the timeorder in which links occur. Observe in Fig. 2 that the curvemarked “shuffled” evolves faster than “trace” implying thatthe delivery ratio increases faster after random shuffling.The random shuffle has the effect of removing any timecorrelations of contacts in the original trace. Thus theimproved delivery ratio evolution implies that timecorrelations of the contacts in the original data sloweddown the exchange of data among the nodes, causing themto be delivered later.

Manual examination reveals several time correlatedcontacts where two nodes see each other multiple timeswithout seeing other nodes. At their first contact, one or bothnodes could have data that the other does not, which is thenshared by flooding. After this initial flooding, both nodescontain the same data—subsequent contacts are “non-effective,” and only increase the number of contacts happen-ing in the network without increasing the delivery ratio.

To quantify the impact, in the curve marked “effective”on Fig. 2, we plot delivery ratio evolution in the originaltrace, counting only the contacts in which data could beexchanged. This coincides well with the time-shuffled trace,showing that noneffective contacts are largely responsiblefor the slower delivery ratio evolution in the original trace.

Next, we construct a synthetic trace that has the samenumber of nodes as the original trace, as well as thesame contact occurrence distribution. By this, we mean thatthe probability of contact between any pair of nodes is thesame as in the original trace. The delivery ratio evolution ofthis trace, depicted as “link distr” in Fig. 2, is seen to evolve

870 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

3. The actual number could be lesser because a contact with a rarelyactive node could complete multiple paths that end in that node.

Fig. 1. Fraction of data delivered as a function of the number of contacts,for the MIT and UCSD traces (number of destinations per sender shownin brackets). The curves for each network are clustered together,showing that the delivery ratio evolves independently of the load.

Page 4: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

in a similar fashion as the time-shuffled trace. This indicatesthat once time correlations are removed, the deliveryproperties are determined mainly by the contact occurrencedistribution.

3.2 Connectivity Depends on Rare Contacts

The fact that three different traces (shuffled, effective, andlink distr), which are based on the same contact occurrencedistribution, essentially evolve in the same manner leads usto examine this distribution further.

Fig. 3 shows the distribution: a random contact in thetrace is much more likely to be a rare one than a frequentlyoccurring contact; a rare contact could occur fewer than tentimes in the trace, whereas a frequent contact could occurhundreds of times.

Fig. 4a shows that the rare contacts are extremelyimportant for the nodes to stay connected. When contactsthat occur fewer than a minimum cutoff number of timesare removed, the number of nodes remaining in the tracefalls sharply. This implies that there are a number of nodeswhich are connected to the rest of the nodes by only a fewrare contacts.

On the other hand, removing the frequently occurringedges (by specifying a maximum cutoff frequency forcontact occurrences) does not affect connectivity greatly.For instance, the MIT trace remains connected even whencontacts which occur as few as 10 times in the trace areremoved). This suggests that nodes which contact eachother very frequently are also connected by other paths,comprising only rare edges.

Interestingly, Fig. 4b shows that with the most frequentedges removed, achieving a given delivery ratio can takefewer contacts. This appears paradoxical but can be explainedas follows: in terms of time, data delayed waiting for theoccurrence of a rare contact still take the same amount of timeto reach the destination, and data previously sent on pathscontaining more frequent edges alone are delayed, becausethey now have to be rerouted over rare contacts. However, thereliance on rare contacts allows “batch-processing”: eachnode involved in the rare contact has more data to exchangewhen the contact happens, thus decreasing the overall countof contacts taken to achieve a given delivery ratio.

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 871

Fig. 2. Delivery ratio evolution for synthetically derived variants of MIT(top), UCSD (bottom) traces. “Trace” is the original. “Shuffled,” the sametrace with time order of contacts randomly shuffled. “Effective” replays“trace,” counting only contacts where data was exchanged. “Link distr” isan artificial trace with the same size and contact occurrence distributionas the original.

Fig. 3. Contact occurrence distributions (log-log): A random edgeappears n times with probability pðnÞ. To the left of the dashed line atn ¼ 45, the distributions for both traces coincidentally happen to besimilar. The inset shows the difference when normalized by the numberof contacts in the trace. In the inset, a random edge constitutes a fractionf of the trace with probability pðfÞ.

Fig. 4. Relative importance of rare and frequent contacts. (a) Robustnessto cutoff: MIT (bottom), UCSD (top). Max cutoff specifies a maximumcutoff for the frequency of contacts, thus removing the most frequentlyoccurring ones. Min cutoff specifies a minimum frequency of con-tacts—removing the rarest contacts causes the number of nodes that areconnected to drop precipitously. (b) Evolution of delivery ratio withcontacts that occur more than cutoff times removed. MIT (left), UCSD(right). The network still remains connected, and manages to deliver datawith fewer contacts.

Page 5: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

4 DELIVERY OVER FIXED DURATION WINDOWS

The previous section showed that at a macroscopic level, aPSN is a challenged network, with connectivity cruciallydependent on rare contacts, and frequent contacts none-ffective for data transfer. This section examines timewindows of fixed duration. It is observed that there canbe a large variation in the delivery ratio achieved betweenwindows of same duration. Time windows which achieve ahigh delivery ratio are characterized by unequal connectiv-ity, with a large clique of nodes having 100 percentconnectivity among themselves and much worse connec-tivity among the other nodes.

Fig. 5 shows the distribution of delivery ratios achievedby flooding data between every possible source-destinationpair over time windows of different sizes. On average,allowing more time increases the delivery ratio. This isexpected because the number of contacts can only increaseover time. However, there is still significant variation,especially within windows of shorter duration, and thedistributions are clearly skewed (observe the positions ofthe median).

4.1 Large Cliques Correlate with Good Connectivity

Over fixed time windows, the temporal contact graph of thePSN can be viewed as constructing a static reachabilitygraph where a directed edge is drawn from node s to t if thesender s can transfer data to destination t during thatwindow. The reachability graph is constructed by floodingdata during the window between every possible source-destination pair. We examine this graph for clues aboutsuccessful time windows which achieve high delivery ratios.

A preliminary examination (Fig. 6) shows that theaverage number of paths connecting a source and destina-tion in the contact graph in the UCSD trace exhibits asignificant correlation with the achieved delivery ratio. Asimilar result can be obtained for the MIT trace.

To uncover the reason, we focus on the MIT trace anddivide the entire duration of the trace into nonoverlappingthree day windows and examine the reachability graph ofeach window for subsets of nodes with large numbers ofpaths. We find that windows with high delivery ratio tendto have a large subset of nodes that form a clique in thereachability graph (Fig. 7). By definition, each member of a

clique in the reachability graph can reach every othermember of the clique, leading to large numbers of pathswhen data is flooded.

While we expect delivery ratio to be high when thereachability graph has a large clique (implying that there iscomplete connectivity between a large fraction of nodes), it israther surprising that the converse is true, viz. whenever thedelivery ratio is high, there is a large clique in thereachability graph. To understand the implication, consideras example an arbitrarily chosen three-day window in theMIT trace, with 77 active nodes, a clique of size 46 and anoverall delivery ratio of 0.68. If nodes were equallyconnected, most nodes should be able to reach � 68 percentof the other nodes during this time window. The cliqueimplies that a subset of 46 nodes (� 60 percent of the nodes)actually have 100 percent reachability among themselves.The 31 nodes outside the clique form 31 � 30 source-destination pairs, of which only 59 percent have pathsbetween them. The table below details the skewed reach-ability between these classes:

In the same window as above, Fig. 8 looks at the qualityof the paths between nodes in the clique, outside the clique,as well as the paths that go from source nodes in the cliqueto destinations outside it, and vice-versa. Plotting the

872 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

Fig. 5. Distribution of delivery ratios over different time windows (MIT).Allowing more time generally results in more delivery, but there issignificant variation. Each box extends from the lower to upper quartilevalues, with a line at the median. Whiskers extend from the box to showthe range. Outliers past the whiskers are plotted individually.

Fig. 6. Scatter plot showing a correlation between delivery ratio duringrandom time windows of different sizes and the mean number of pathsconnecting node pairs. Squares, circles, and triangles represent windowsof one-hour, one-day, and three days, respectively (UCSD trace).

Fig. 7. Delivery ratio in the contact graph correlates with size ofmaximum clique observed in the reachability graph (MIT trace,nonoverlapping three-day windows).

Page 6: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

cumulative distribution functions of the delivery times ofthe quickest paths for each category shows that data istransferred faster when both the sender and receiver aremembers of the large clique.

Thus, during time intervals when there is a large cliquein the reachability graph, the PSN is very successful, butonly for the subset of nodes in the maximum cliqueobserved. It is hard to predict the nodes involved becauseclique membership changes significantly (an average of33 nodes are added or deleted over successive windows).Clique sizes also vary widely, with a mean size of 28.8, anda standard deviation of 18.2.

4.2 Clustering Coefficient Predicts Delivery Ratio

The clique occurs in the reachability graph, and cannot beeasily detected without flooding all paths and performingextensive computation. We now show that the “cliquish-ness” of the contact graph can serve as an approximation.

Suppose a vertex v has neighbors NðvÞ, with jN ðvÞj ¼ kv.At most kvðkv � 1Þ=2 edges can exist between them (thisoccurs when v is part of a kv-clique). The clustering coefficient[9] of the vertex, Cv, is defined as the fraction of these edgesthat actually exist. The clustering coefficient of the graph isdefined as the average clustering coefficient of all the verticesin the graph. In friendship networks, Cv measures the extentto which friends of v are friends of each other, and hence,approximates the cliquishness of the graph.

Fig. 9 shows that the average clustering coefficient of thecontact graph correlates well with the delivery ratioachieved during time windows of various sizes. In practice,a node can compute its current clustering coefficient byobtaining a list of recent contacts from each node it meets.The average clustering coefficient of the network can beapproximated by propagating current local estimates, forexample, by adapting a distributed algorithm to computeaggregates, such as [10].

By using clustering coefficient as a predictor for deliveryratio, senders (or other nodes on their behalf) can makeinformed decisions about how to deliver data. For instance,a node which observes a low clustering coefficient canaggressively send multiple copies, resend the same dataover time, or even bypass the PSN entirely and use a moreexpensive infrastructure-based communication mechanismsuch as a satellite connection.

5 UNDERSTANDING PATH DELAYS

We move from considering the fraction of data delivered tothe time taken to deliver data. This section focuses onunderstanding the delays on unicast paths that form duringa fixed time window. Paths are discovered by flooding datafrom every sender to every destination, as described inSection 2.

In our method of flooding, every nondestination node(including the source) forwards data to each nondestinationnode it meets that does not yet have a copy of the data, butreceives a data item at most once. The destination nodeaccepts all copies it gets, but does not forward the datafurther. Note that this does not uncover all the paths that formover time in the contact graph. Rather, since each nondestina-tion node receives data at most once, a tree of paths, rooted atthe source, forms over time. We call this the “flood-tree.”

First, we examine the quickest paths. Then, we obtain anexpression for path delay on any path discovered by flooding,in terms of the delay per hop and number of hops. Finally, welook at the distribution of the number of paths between allpossible source-destination pairs. The ultimate goal is toobtain an approximate expression for delivery time, as theminimum of the path delays of a random number of paths.

5.1 Delivery Time: Path Delay on the Quickest Path

Delivery time is the time taken by the first path to reach thedestination. Hence, it is the minimum of the path delaysalong all paths connecting the source and destination overtime. Although there is a huge difference in the path delays ofthe first (quickest) and the last (slowest) paths that formduring a time window, the Quantile-Quantile plot in Fig. 10ashows that delivery time distribution for the quickest paths isalmost exponential. Thus, most source-destination pairshave quick paths.

Small and Haas [11] and Zhang et al. [12] deriveanalytical expressions which are similar in spirit. However,these assume a constant (averaged) contact rate, whereasthe contact rates in our empirical traces are highlyheterogeneous (see Section 3). Plugging in the averagecontact rate from the empirical traces into those expressionsyields bad fits. Appendix A extends Section 5.2 and derivesChernoff bounds for the delivery times.

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 873

Fig. 8. CDFs of delivery times during a three-day window with a 46 nodeclique. The four categories shown are different combinations of sender-receiver pairs when the source (or destination) is inside (or outside) theclique. clique-clique transfers are faster than other combinations (MIT).

Fig. 9. Scatter plot showing the correlation between delivery ratio duringrandom time windows of different sizes and the clustering coefficient ofthe contact graph for that time window. Squares, circles and trianglesrepresent windows of one-hour, one-day and three days, respectively(MIT trace).

Page 7: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

5.2 Characterizing Path Delays

Next, we examine all successful paths, and express pathdelay in terms of the delay per hop and number of hops.

5.2.1 Hop Delay (H)

The hop delay distribution captures the time that elapsesbetween successive hops on the same path. Fig. 10b showsthat the UCSD curve can be fitted to an exponentialdistribution with rate � � 0:0001, giving a mean time of2.6 hours to next hop. A similar fit can be obtained for theMIT curve.

For both the MIT and UCSD traces, the fit for the entirecurve is approximate (“raw,” in Fig. 10b), with goodness offit varying between different time windows. The fit can beimproved by removing outlier values which are likely anartifact of the data set (“smoothed” in Fig. 10b). While the fitis not exact in many windows, we will assume that H isexponential, to simplify the analysis in the followingsection. This assumption does not limit the applicability ofour analysis. We will later show that the key results ofSection 6 will apply equally for any other distribution whichhas a moment-generating function.

Note that there is no conflict between the nearlyexponential distribution of hop delays and the previouslyreported power laws (with exponential tails) for intercontacttime distribution [13], [14]. Intercontact time is the timebetween repeated meetings of the same pair of nodes,whereas hop delay measures the time taken by the floodingprocess: from the time a node receives some data to the time itmeets a new node that does not already have a copy ofthe data. The longer the duration a node carries the data, thegreater the chance that data has already been flooded to thenodes it meets. Because of flooding, the hop delay distribu-tion decays rapidly, unlike the intercontact time distribution.

5.2.2 Number of Hops (N)

Several factors work together to limit the number of hops ina successful path. First, we only consider paths that formduring a fixed time window. Second, the small-worldnature of the human contact graph makes for short paths to

a destination; and paths are frozen at the destinationbecause the destination does not forward data further.Third, each node can join the flood-tree at most once. As thetree grows, the number of nodes available to grow the treeand extend a path decrease. Thus extremely long paths arerare. Fig. 10c shows that the number of hops in paths thatreach the destination during a one week time window of theMIT trace closely follows a Poisson distribution (The meannumber of hops is � ¼ 5:58 in this window, and has beenfound to vary between five to six in different windowstested). A similar fit can be obtained for the UCSD trace.

5.2.3 Path Delay (D)

From the above empirically found distributions, we canderive the distribution of the path delay D on a randompath as the sum of N hop delays. Thus, D can be written interms of its moment-generating function (see Table 1 fornotation and a summary of the component distributions):

MDðsÞ ¼ GNðMHðsÞÞ ¼ e����s�1ð Þ: ð1Þ

The average path delay is simply M 0Dð0Þ ¼ ���1. Applying a

Chernoff-type bound,

P ½D � t� � mins>0

e�stMDðsÞ ¼ mins>0

e����s�1ð Þ�st: ð2Þ

Minimizing by setting s ¼ � �ffiffiffiffiffiffiffiffiffiffi��=t

p, we get (for t <

ffiffiffiffiffiffiffiffi�=�

p)

P D � t½ � � e�ðffiffiffi�tp�ffiffi�pÞ2 : ð3Þ

Remark 1. The form of (3) indicates that the path delay on arandom path is also close-to-exponentially distributed.

5.3 Characterizing the Number of Paths (L)

Since each node joins the flood-tree at most once, there canbe at most N � 1 nodes (nodes other than destination) onthe tree. Therefore, there can be at most N � 1 pathsreaching a destination during the time window.

The actual number of paths depends on the number ofunique nodes met by the destination: if the PSN is well

874 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

Fig. 10. Distributions related to successful paths. Each Q-Q plot shows fit through correspondence between sample deviates generatedaccording to the theoretical distribution (predicted) and empirical (actual) values. Closeness to predicted ¼ actual diagonal indicates better fit.Different combinations of trace and time window sizes are used to show generality of fit. (a) Delivery time: roughly exponential (MIT, one weekwindow). (b) Hop delay: roughly exponential (UCSD, 12 hour window). (c) No. of hops: Poisson (MIT, one week window). (d) No. of paths (alsodegree): Negative Binomial (UCSD, 6 hour window).

TABLE 1Summary of Notation Used to Characterize Components of Path Delay and Delivery Times

Page 8: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

mixed and the window is long enough, eventually allintermediate nodes become reachable from the source andget attached to the flood-tree. Thus the number of paths to adestination is determined by the number of distinctneighbors met by the destination over the time window.

Fig. 11 empirically confirms this argument, by showingthat the median (mean can also be used, instead) number ofpaths reaching a destination correlates with the number ofdistinct neighbors it has. As a consequence of this “eventualreachability” phenomenon, the distribution of the numberof paths to a destination is simply given by the degreedistribution of the who-met-whom graph of the PSN, takenover the entire time window.

Fig. 10d shows that the degree distribution (and thenumber of paths) in the UCSD trace fits a negative binomialdistribution. A similar fit can be obtained for the MIT trace.

The fact that the number of group members that anindividual has contact with (the number of neighbors seen)follows the same distribution in both the traces, across timewindows of different sizes suggests the possibility of anunderlying stochastic mechanism.

The negative binomial is a versatile distribution that canarise in a number of ways [15]. One possible way thenegative binomial arises is as a continuous mixture ofPoisson distributions where the mixing distribution of thePoisson rate is a gamma distribution. The model assumesthat people acquire new neighbors according to a Poissonprocess with rate �p. Heterogeneity in the population ismodeled by drawing �p from some population distributionP ð�pÞ. If P ð�Þ follows the gamma distribution

P ð�p ¼ �Þ ¼e�ð�=�1Þð�=�1Þð�2�1Þ

�1�ð�2Þ;

then the observed distribution of the number of neighborsseen by individuals in the group would fit the negativebinomial distribution. One interpretation [16] of this is thatpeople acquire new neighbors by searching for peoplesatisfying some personal criterion. People continue toacquire new neighbors until they have �2 partners. Eachpotential new partner satisfies the search criterion indepen-dently with a probability pc, which defines the scaleparameter �1 ¼ ð1� pcÞ=pc.

6 EVALUATING THE EFFECT OF FAILURES

Many studies on Pocket Switched Networks, includingours, implicitly assume that every contact on any pathwhich occurs can be used to successfully transfer data. Inreality, many factors could prevent a path from beinguseful: an intermediate node may run out of storage; thetime available during a contact opportunity may not besufficient; or a node can simply fail.

Since only one path between every source-destinationpair needs to succeed for data delivery, individual pathfailures do not greatly impact the delivery ratio achieved atthe end of a time window (unless all paths between asource-destination pair fail, disconnecting the network).However, it can affect the rate at which the delivery ratioevolves: suppose the quickest path between a pair of nodeswould have arrived at t1, but cannot be used because of afailure. If the first usable path connects the nodes at timet2 > t1, then between t2 and t1 the fraction of data deliveredis decreased on account of the path failure. In other words,there is a delay in data delivery, which temporarily shiftsthe cumulative distribution of delivery times to the right.

Given a sequence of contacts, flooding achieves the bestpossible delivery times by exploring every contact opportu-nity and thereby finding the path with the minimum pathdelay. This section looks at the degradation in the deliverytime distribution when not all of the paths found byflooding can be explored.

Specifically, we study two failure modes: the first,proportional flooding, explores a fixed fraction � of thepaths found by flooding between each source and destina-tion. We show that a constant increase in the fraction ofpaths explored brings the delivery time distribution ofproportional flooding exponentially closer to that of flood-ing over all paths. The second failure mode, k-copyflooding, explores no more than a fixed number k > 1 ofthe paths found by flooding between each source anddestination. Again, a constant increase in k brings thedelivery time distribution exponentially close to the optimaldelivery time distribution of flooding all paths. Empirically,even small values of k (e.g., k ¼ 2 or k ¼ 5) closelyapproximate delivery times found by flooding.

The results of this section imply that the human contactnetwork is remarkably resilient to path failures and thedelivery ratio evolves at a close-to-optimal rate even whenthe majority of paths fail and only a small fraction or asmall, bounded number of paths can transport data to thedestination. The success of k-copy flooding can provide aloose motivation for routing algorithms that use multiplepaths between each sender and destination pair since thiscould obtain a close-to-optimal delivery time distribution.However, heuristics-based routing algorithms may not findthe same paths as found by flooding. Thus, the correspon-dence is not exact.

6.1 Proportional Flooding

Consider an arbitrary source-destination pair. We willmodel the path delays between them as being chosenindependently and identically from the distribution in (1).Suppose copies of the data are sent along l randomlychosen paths between them. The obtained delivery time D�lis the minimum of the path delays across all l paths. Using(3), we can write

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 875

Fig. 11. Median number of successful paths reaching a destination node

correlates with the number of distinct neighbors it has. Diagonal shows

x axis ¼ y axis. (MIT, one week window)

Page 9: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

P D�l � t� �

¼ 1�Yli¼1

P D � t½ � � 1� e�lffiffiffi�tp�ffiffi�pð Þ2 : ð4Þ

Note that the above assumes that the l path delays areindependent. In reality, paths found by flooding all fan outfrom a single source node, and the first few hops, close tothe source, are typically shared with other paths, violatingthe independence assumption. Therefore, the model in thissection is to be considered only as a simple formulationdesigned to gain insight into proportional flooding. It isworth mentioning however that in the empirical data sets,we frequently find that the major component of path delayis contributed by the part of the paths closest to thedestination, which are not shared with other paths. Also, inthe case when only a few paths on flood-tree are beingrandomly sampled, the number of hops shared is limited.

Consider source-destination pairs with L ¼ m pathsconnecting them. Full flooding finds the quickest of allm paths and obtains a delivery time distribution P ½D�L �tjL ¼ m�. Proportional flooding chooses a fraction � ofthem. From (4), the difference �ðt;�Þ, in the delivery timedistributions between full and proportional flooding, isupper bounded by

�ðt;�Þ � P D�L � tjL ¼ m� �

� 1þ e��mffiffiffi�tp�ffiffi�pð Þ2 : ð5Þ

Remark 2. A constant increase in � has an exponential effecton �: for any t, if � is increased by some constant, thefraction of data delivered by proportional floodingduring ½0; t� becomes exponentially closer to that deliv-ered by full flooding. Thus, proportional floodingquickly becomes very effective as � is increased.

While the above is not unexpected given the model andthe resulting distributions as obtained in Section 5.2, thisobservation can be generalized: for a hop delay distributionwith moment-generating function MHðsÞ, (1-5) can berederived to get

�ðt;�Þ � P D�L � tjL ¼ m� �

� 1þ expð�mFHðtÞÞ; ð6Þ

where FHðtÞ ¼ �MHðsminðtÞÞ � sminðtÞt� � and sminðtÞ mini-mizes s in (2).4 Thus the exponential decrease in � with aconstant increase in � is obtained as long as FHðtÞ < 0. Inother words, our results hold when there are a Poisson

number of hops in paths formed over fixed time windows,for any hop delay distribution H that has a momentgenerating function and satisfies FHðtÞ < 0.

Also, since

@�

@�¼ mFHðtÞe�mFHðtÞ > 0;

� decreases when � is increased. Furthermore, the rate ofdecrease is higher for smaller �—increasing � from � ¼ 0:1to � ¼ 0:2 results in a greater decrease than an increase from� ¼ 0:6 to � ¼ 0:7.

Fig. 12 empirically shows the difference between D�ðtÞ,the delivery time distribution obtained by flooding over allpaths, and D��ðtÞ, the delivery time distribution forproportional flooding using a randomly selected fraction� of paths between every source and destination. Thedifference is measured using the Kolmogorov-Smirnovstatistic given by D ¼ maxtðD�ðtÞ �D��ðtÞÞ. Note that theY -axis is log scale; a constant increase in � shows anexponential decrease in D.

6.2 From Proportional to Bounded Number of Paths

Section 5.3 showed that the number of paths to adestination (L), is a random variable that is well modeledby the negative binomial, a positively (or right) skeweddistribution. Thus a majority of sender-destination pairshave a small (fewer than average) number of paths, but aminority have a large number of paths, which pulls theaverage higher. The expected number of paths that need towork for successful proportional flooding is given by�IE½L�, which is higher than it would be if the minority ofnode pairs with large numbers of paths were notconsidered. In the worst case, when there are ðN � 1Þ pathsbetween a sender and destination, proportional floodingworks only if �ðN � 1Þ of these are functional.

This suggests an alternate bounded cost strategy thatexplores at most a fixed number, k, of the paths betweenevery sender and destination. We call this k-copy flooding.Unlike proportional flooding, k-copy flooding explicitlylimits the number of paths explored, and therefore cantolerate a larger number of path failures in the worst case,when there are a large number of paths between a node-pair.

Fig. 13 shows empirically that in our data sets, even forsmall k (¼ 2; 5), the delivery time distribution of k-copy

876 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

Fig. 12. K-S statistic (D) measuring the difference between the deliverytime distributions of full flooding and proportional flooding for different �.X-axis is linear, Y -axis is log-scale. MIT: one week window. UCSD: sixhour window.

Fig. 13. k-copy flooding: Nodes are connected by multiple paths withdifferent delays (distributions of the quickest and slowest are shown).Yet, randomly choosing at most k of the paths to each destinationclosely approximates the quickest, even for small k. (MIT trace, oneweek window)

4. When H is exponentially distributed, we get FHðtÞ ¼ �ðffiffiffiffiffi�tp�

ffiffiffi�pÞ2.

Plugging this value into (6) yields (5).

Page 10: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

flooding starts to closely approximate full flooding. To seewhy, consider the equivalent fraction �

kof paths in

proportional flooding that gives the same expected numberof paths as k-copy forwarding:

Xkl¼0

lP L ¼ l½ � þ kP L > k½ � ¼ �kIE L½ �: ð7Þ

Suppose k is increased by a constant h, resulting in a newequivalent fraction �

kþh . Equation (7) becomes

Xkl¼0

lP L ¼ l½ � þXhj¼1

ðkþ jÞP L ¼ kþ j½ �

þ ðkþ hÞP L > kþ h½ � ¼ �kþhIE L½ �:

Regrouping, we get

Xkl¼0

lP L ¼ l½ � þ kXhj¼1

P L ¼ kþ j½ � þ P L > kþ h½ � !

þXhj¼1

jP L ¼ kþ j½ � þ hP L > kþ h½ � ¼ �kþh IE L½ �:

Comparing with (7), we can write

�kIE L½ � þ

Xhj¼1

jP L ¼ kþ j½ � þ hP L > kþ h½ � ¼ �kþhIE L½ �:

Thus the increase in the equivalent fraction of paths is

�kþh � �k

� h

IE L½ �Xhj¼1

P L ¼ kþ j½ � þ P L > kþ h½ � !

¼ h P L > k½ �=IE L½ �ð Þ:ð8Þ

Remark 3. A constant increase in k is equivalent to at least a(scaled) constant increase in the fraction of paths exploredby proportional flooding. Thus, as a simple consequence ofRemark 2, a constant increase in the number of pathsexplored in k-copy forwarding moves its delivery timedistribution exponentially closer to that of full flooding.

This explains why exploring at most a small number k ofpaths has a delivery time distribution approaching that offlooding over all paths. Fig. 14 empirically shows theequivalent fractions �k for the k ¼ 2 and k ¼ 5 cases

discussed previously. Appendix B derives a closed formfor �

kwhen the number of paths L is negative binomial.

6.3 Resilience and Load Balancing

The results above suggest that the human contact networkis remarkably resilient to path failures and the network’soptimal rate of data delivery can still be approximated evenwhen many of the paths do not succeed in delivering data.To conclude, we briefly elucidate this from the perspectiveof intermediate node failures.

The source and destination nodes rely on the rest of thenodes in the network to serve as intermediate nodes andform connecting paths. The use of multiple paths provides adegree of resilience against path failures, since only one ofthe paths needs to succeed. Deterministic strategies thatselectively favor particular next hop nodes or evenparticular paths can potentially overburden certain inter-mediate nodes that end up getting selected more often, butrandomized strategies such as the ones we discuss are moreresilient and have fewer bottlenecks.

For any strategy, we can measure the burden placed on agiven intermediate node by any given source (respectively,destination) by counting the number of paths of the sender(destination), chosen according to the strategy, in which thenode figures as an intermediate hop. We call this numberthe betweenness centrality of the node for a given source(destination).

Source nodes (destinations) with very few central nodesare vulnerable to being disconnected if the central nodes fail.As Fig. 15 shows, deterministic strategies such as picking thequickest possible path can result in a network that is sparsein central nodes. A randomized strategy results in morecentral nodes and thus renders the network more resilientagainst failures. A sender (destination) that has few centralnodes is also likely to face congestion if the central nodeshit capacity bottlenecks. The availability of more centralnodes opens the possibility of load balancing in such cases.

7 RELATED WORK

Conceptually, PSNs are Delay-Tolerant Networks [17], andgeneric results from that framework apply. For instance, aforwarding algorithm that has more knowledge aboutcontacts is likely to be more successful [18], and the bestperformance is achieved by an oracle with knowledge offuture contacts.

Nevertheless, the fact that our underlying network ismade up of human contacts and is less predictable has alarge impact: For instance, reasonably predictable trafficpatterns of buses allow a distributed computation of routemetrics for packets in vehicular DTNs [18], [19]. Similarly,fixed bus routes allow the use of throwboxes [20] to reliablytransfer data between nodes that visit the same location, butat different times.

The variability of PSNs has naturally led to a statisticalapproach: The intercontact time distribution of humansocial contacts has been used to model transmission delaybetween a randomly chosen source-destination pair [13],[14]. In this work, we take a more macroscopic view andlook at the ability of the PSN to simultaneously deliver databetween multiple source-destination pairs. This leads us tolook at the distribution of the number of contacts betweenrandomly chosen source-destination pairs, and find that

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 877

Fig. 14. Proportional flooding with �2 ¼ 0:15 of paths has similar delivery

times as k ¼ 2-copy flooding. Similarly k ¼ 5 corresponds to �5¼ 0:5.

(MIT, one week window)

Page 11: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

this distribution is not only crucial for global data deliveryperformance, but also for the connectivity of the PSN itself.

This paper uses variants of flooding to obtain a betterunderstanding of achievable data delivery properties ofhuman contact networks. However, unbounded flooding isexpensive. To mitigate this, various routing protocols havebeen proposed. These typically use various ad-hoc metrics,such as betweenness centrality [21], history of previousmeetings [22], and inferred community structure [23].Computing such metrics can be costly and the computationcan be inaccurate due to the high variability inherent inPSNs. Our results point to simpler techniques that couldexploit time windows of good connectivity or the use ofmultiple paths.

Small and Haas [11], Zhang et al. [12], Groenevelt et al.[24] model the performance of epidemic routing and itsvariants. In particular, they derive a closed form fordelivery time distribution, which is shown to be accuratefor certain common mobility models. However, severalsimplifying assumptions are made, including an exponen-tial intercontact time between node pairs. Unfortunately,human contact networks are known to have power lawintercontact times with exponential tails [13], [14]. Further-more, [11], [12], [24] use a constant contact rate, whereas ourstudies show that human contacts are highly heteroge-neous. Mickens and Noble [25] considers heterogeneouscontact rates between mobile devices but only in the contextof establishing an epidemic threshold for virus spread. Ourwork overcomes the problems of heterogeneous contactrates and power-law inter contact times by directlyconsidering properties (hop delay, number of hops, numberof paths) of paths found by flooding.

The number of paths found by flooding is crucial to thesuccess of proportional and k-copy flooding. Countingdifferently, [26] reports a phenomenon of “path explosion”wherein thousands of paths reach a destination shortly afterthe first, many of which are duplicates, shifted in time. Incontrast, duplicate paths are prevented in our method ofcounting, by having nodes remember if they have alreadyreceived some data, resulting in a maximum of N � 2 pathsbetween a source and destination.

The power of using multiple paths has been recog-nized. Binary Spraying, which forms the basis for two

schemes (spray and wait, spray and focus) has beenshown to be optimal in the simple case when nodemovement is independent and identically distributed [27].Erramilli et al. [28] noted that among routing schemesevaluated, those using more than one copy performedbetter. Furthermore, all algorithms employing multiplepaths showed similar average delivery times. The successof k-copy flooding suggests a possible explanation for thisresult. Similarly, [29] finds that the delivery ratio achievedby a given time is largely independent of the propensity ofnodes to carry other people’s data. They suggest theexistence of multiple paths as an explanation. At anabstract level, the refusal of a node to carry another node’sdata can be treated as a path failure. Thus Section 6corroborates [29] and provides a direct explanation.

We mention in passing that our finding of large cliquesin the reachability graphs is loosely analogous to the giantstrongly connected component in the WWW graph thataccounts for most of its short paths [30]. Similarly, ourfinding that the human contact graph is resilient to pathfailure is echoed in the attack tolerance demonstrated forstatic graphs of many complex networks [31].

8 CONCLUSION

This work examined the data delivery properties of human

contact networks using empirical traces of contacts

between loosely connected groups of people. The ability

to deliver data was measured by the fraction of data

delivered by flooding data between every possible source-

destination pair.The effectiveness of the network in delivering data is

determined by its contact occurrence distribution and theirorder of occurrence. The contact occurrence distributionexhibits an interesting dichotomy with many contactsoccurring rarely, and a few occurring very frequently. Ata macroscopic level, this suggests that delivery of packets isdifficult: the connectivity of the network is cruciallydependent on rare contacts occurring, and inadequatemixing of data due to repeated occurrences of frequentcontacts increases the global count of contacts required fordelivering a given fraction of data.

878 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011

Fig. 15. Scatterplots showing node numbers of source on the X-axis, and on the Y -axis, nodes numbers which cross a threshold (¼ 5) betweennesscentrality for the corresponding sources. Left figure shows the central nodes when paths are picked according to a deterministic strategy (thequickest path). Right figure shows the central nodes when up to five paths are randomly selected. The random strategy selects more central nodesand spreads the load more evenly. A similar set of figures can be obtained by looking at the most central nodes for a given destination. (MIT trace,one week window). (a) Central nodes on the quickest path. (b) Central nodes on five randomly chosen paths.

Page 12: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

We also find that there is huge variation over differenttime windows of same duration. Time windows that aresuccessful are a result of unequal connectivity and arecharacterized by a large clique of nodes that can all reacheach other. It was demonstrated that the successful timewindows as well as the clique members can be identified bylooking at the clustering co-efficient over the contact graph.

We then examined distributions of random variables thatcontribute to the delay on paths that successfully connectsource nodes to their destinations. Among other results, itwas found that the number of hops on a path follows aPoisson distribution. Primarily as a consequence of this, itwas shown that the human contact network exhibits aremarkable resilience to random path failures. Increasingthe number or fraction of paths explored between sourceand destination by a constant brings the delivery timedistribution exponentially close to the optimal. Thus, thenetwork can continue to deliver data at a near optimal rateeven when a large number of random path failures occur.

The results of this work could be used to adopt aprincipled approach to the development of routing algo-rithms for Pocket Switched Networks, rather than relyingon heuristics as motivation. For instance, one possibility isto exploit periods of good connectivity by computing thecurrent clustering co-efficient. Similarly, randomized sam-pling among paths considered could lead to better resilienceand load balancing properties.

APPENDIX A

CHERNOFF BOUNDS FOR DELIVERY TIMES

Using (4), the delivery time distribution D� of all sourcedestination pairs can be bounded from below by

P D� � t½ � ¼Xl

P D�l � tjL ¼ l� �

P L ¼ l½ �

� 1� IE�e�Lð

ffiffiffi�tp�ffiffi�pÞ2�

¼ 1�ML

��� ffiffiffiffiffi�tp�

ffiffiffi�p �2�

;

ð9Þ

where ML is the moment generating function of L.Assuming that L is a negative binomial with mean � anddispersion �,

P D� � t½ � � 1� 1

1þ ��

�1� e�ð

ffiffiffi�tp�ffiffi�pÞ2�

!�

: ð10Þ

Similarly, we can use a Chernoff bound for P ½D � t� ¼P ½a�D� 0� � estMDð�sÞ¼ e�ð

ffiffiffi�tp�ffiffi�pÞ2 , where s¼

ffiffiffiffiffiffiffiffiffiffi��=t

p� �

gives the best bound. Substituting for P ½D � t� ¼ 1�P ½D � t� in (4)

P D� � tjL ¼ l½ � � 1��1� e�ð

ffiffiffi�tp�ffiffi�pÞ2�l: ð11Þ

and performing the same calculations, we obtain a lowerbound for D�.

P D� � t½ � ¼Xl

P D� � tjL ¼ l½ �P L ¼ l½ �

� 1� IE��

1� e�ðffiffiffi�tp�ffiffi�pÞ2�L�

¼ 1�GL

�1� eð�ð

ffiffiffi�tp�ffiffi�pÞ2Þ�:

ð12Þ

When L is a negative binomial, as above,

P D� � t½ � � 1� 1

1þ �� e�ðffiffiffi�tp�ffiffi�pÞ2

!�

: ð13Þ

APPENDIX B

DERIVING AN EQUIVALENT FRACTION �k

FOR

L NegBinð�; �ÞThis section derives an equivalence between k-copy forward-ing and proportional flooding, from the general form (7),when the degree distribution (equivalently, the number ofpaths to a destination) isLNegBinðmean¼�; dispersion¼�Þ.The probability mass function for L can be written as [32]:

P L ¼ l; �; �½ � ¼ �ð�þ lÞðlÞ! �ð�Þ

� þ �

� �� �

� þ �

� �l:

First, using p ¼ �=ð� þ �Þ, we have [32]

P L > k½ � ¼ 1� P L � k½ � ¼ 1� Ipð�; kþ 1Þ; ð14Þ

where Ixða; bÞ ¼ Bðx; a; bÞ=Bða; bÞ is the regularized incom-

plete Beta function. Next,

Xkl¼0

lP L ¼ l½ � ¼Xkl¼1

�ð�þ lÞðl� 1Þ! �ð�Þ p

�ð1� pÞl

¼ � ð1� pÞp

Xkl¼1

�ð�þ 1þ l� 1Þðl� 1Þ! �ð�Þ p�þ1ð1� pÞl�1

¼ � ð1� pÞp

Xk�1

l0¼0

P L ¼ l0; �; �þ 1½ �

¼ �Ipð�þ 1; kÞ:ð15Þ

Substituting (14) and (15) into (7), we get

�k¼ Ipð�þ 1; kÞ þ k

�I1�pðkþ 1; �Þ: ð16Þ

REFERENCES

[1] P. Hui et al., “Pocket Switched Networks and the Consequences ofHuman Mobility in Conference Environments,” Proc. ACMSIGCOMM First Workshop Delay Tolerant Networking and RelatedTopics, 2005.

[2] M. Grossglauser and D.N.C. Tse, “Mobility Increases the Capacityof Ad Hoc Wireless Networks,” IEEE/ACM Trans. Networks,vol. 10, no. 4, pp. 477-486, Aug. 2002.

[3] L. McNamara, C. Mascolo, and L. Capra, “Media Sharing Based onColocation Prediction in Urban Transport,” Proc. ACM MobiCom,2008.

[4] J. Travers and S. Milgram, “An Experimental Study of the SmallWorld Problem,” Sociometry, http://www.jstor.org/stable/2786545, vol. 32, no. 4, pp. 425-443, 1969,

[5] R.I.M. Dunbar, “Co-Evolution of Neocortex Size, Group Size andLanguage in Humans,” Behavioral and Brain Sciences, vol. 16,pp. 681-735, 1993.

[6] University of California San Diego, “Wireless Topology DiscoveryProject,” http://sysnet. ucsd.edu/wtd/wtd.html, 2004.

[7] N. Eagle and A.S. Pentland, “CRAWDAD Data Set Mit/Reality(v. 2005-07-01),” http://crawdad.cs.dartmouth.edu/mit/reality,2010.

[8] E. Yoneki, P. Hui, and J. Crowcroft, “Visualizing CommunityDetection in Opportunistic Networks,” Proc. Second ACM Work-shop Challenged Networks (CHANTS ’07), pp. 93-96, 2007.

SASTRY ET AL.: DATA DELIVERY PROPERTIES OF HUMAN CONTACT NETWORKS 879

Page 13: 868 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. …people.cs.vt.edu/~irchen/6204/pdf/Sastry-TMC11.pdf · tions, mobility can be exploited to provide multiuser diversity gains

[9] D.J. Watts and S.H. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” Nature, vol. 393, no. 6684, pp. 440-442, June1998.

[10] Y.J. Zhao, R. Govindan, and D. Estrin, “Computing Aggregates forMonitoring Wireless Sensor Networks,” Proc. First IEEE Int’lWorkshop Sensor Network Protocols and Applications (SNPA ’03), 2003.

[11] T. Small and Z.J. Haas, “The Shared Wireless Infostation Model: ANew Ad Hoc Networking Paradigm (or Where There Is a Whale,There Is a Way),” Proc. ACM MobiHoc, 2003.

[12] X. Zhang, G. Neglia, J. Kurose, and D. Towsley, “PerformanceModeling of Epidemic Routing,” Computer Networks, vol. 51,no. 10, pp. 2867-2891, 2007.

[13] T. Karagiannis, J.-Y. Le Boudec, and M. Vojnovic, “Power Lawand Exponential Decay of Inter Contact Times between MobileDevices,” Proc. ACM MobiCom, 2007.

[14] A. Chaintreau et al., “Impact of Human Mobility on OpportunisticForwarding Algorithms,” IEEE Trans. Mobile Computing, vol. 6,no. 6, pp. 606-620, June 2007.

[15] M.T. Boswell and G.P. Patil, “Chance Mechanisms Generating theNegative Binomial Distribution,” Random Counts for ScientificWork, vol. 1, pp. 3-22, Pennsylvania State Univ. Press, 1970.

[16] M.S. Handcock and J.H. Jones, “Interval Estimates for EpidemicThresholds in Two-Sex Network Models,” Theoretical PopulationBiology, vol. 70, no. 2, pp. 125-134, 2006.

[17] K. Fall, “A Delay-Tolerant Network Architecture for ChallengedInternets,” Proc. SIGCOMM, 2003.

[18] S. Jain, K. Fall, and R. Patra, “Routing in a Delay TolerantNetwork,” SIGCOMM Computer Comm. Rev., vol. 34, no. 4, pp. 145-158, 2004.

[19] A. Balasubramanian, B.N. Levine, and A. Venkataramani, “DTNRouting as a Resource Allocation Problem,” Proc. SIGCOMM,2007.

[20] W. Zhao et al., “Capacity Enhancement Using Throwboxes inDTNs,” Proc. IEEE Int’l Conf. Mobile Ad Hoc and Sensor Systems(MASS ’06), 2006.

[21] E. Daly and M. Haahr, “Social Network Analysis for Routing inDisconnected Delay-Tolerant Manets,” Proc. ACM MobiHoc, 2007.

[22] A. Lindgren, A. Doria, and O. Schelen, “Probabilistic Routing inIntermittently Connected Networks,” Proc. Workshop ServiceAssurance with Partial and Intermittent Resources (SAPIR ’04), 2004.

[23] P. Hui, J. Crowcroft, and E. Yoneki, “Bubble Rap: Social-BasedForwarding in Delay Tolerant Networks,” Proc. ACM MobiHoc,2008.

[24] R. Groenevelt, P. Nain, and G. Koole, “The Message Delay inMobile Ad Hoc Networks,” Performance Evaluation, vol. 62, nos. 1-4, pp. 210-228, 2005.

[25] J.W. Mickens and B.D. Noble, “Modeling Epidemic Spreading inMobile Environments,” Proc. Fourth ACM Workshop WirelessSecurity (WiSe ’05), 2005.

[26] V. Erramilli, A. Chaintreau, M. Crovella, and C. Diot, “Diversity ofForwarding Paths in Pocket Switched Networks,” Proc. ACMInternet Measurement Conf., Oct. 2007.

[27] T. Spyropoulos, K. Psounis, and C.S. Raghavendra, “EfficientRouting in Intermittently Connected Mobile Networks: TheMultiple-Copy Case,” IEEE/ACM Trans. Network, vol. 16, no. 1,pp. 77-90 , Feb. 2008.

[28] V. Erramilli, M. Crovella, A. Chaintreau, and C. Diot, “DelegationForwarding,” Proc. ACM MobiHoc, 2008.

[29] P. Hui, K. Xu, V. Li, J. Crowcroft, V. Latora, and P. Lio,“Selfishness, Altruism and Message Spreading in Mobile SocialNetworks,” Proc. First IEEE Int’l Workshop Network Science ForComm. Networks (NetSciCom ’09), 2009.

[30] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R.Stata, A. Tomkins, and J. Wiener, “Graph Structure in the Web,”Computer Networks, vol. 33, nos. 1-6, pp. 309-320, 2000.

[31] R. Albert, H. Jeong, and A.-L. Barabasi, “Error and AttackTolerance of Complex Networks,” Nature, vol. 406, pp. 378-382,2000.

[32] N. Kotz, A. Kemp, and S. Kotz, Univariate Discrete Distributions,third ed. Wiley, 2005.

Nishanth Sastry received the bachelor’s degree(with distinction) in computer science and en-gineering from the R.V. College of Engineering,Bangalore University, India, and the master’sdegree in computer science from the Universityof Texas at Austin. He is currently workingtoward the PhD degree in the Computer Labora-tory at the University of Cambridge, where hehas been funded by the St. John’s Benefactor’sScholarship. Prior to this, he was at IBM for five

years: first in the software group and then at IBM Research. He alsoworked for a year at Cisco Systems. His honours include a BestUndergraduate Project Award, a Best Paper Award from the ComputerSociety of India, a Cisco Achievement Program Award, a YunusInnovation Challenge Award at the Massachusetts Institute of Technol-ogy IDEAS competition, and several awards for his work at IBM. His workhas ranged over several layers of the network stack and he is currentlyinvolved in building better networked systems by harnessing socialnetwork information. He is a student member of the ACM and the IEEE.

D. Manjunath received the BE degree fromMysore University, the MS degree from IITMadras, India, and the PhD degree fromRensselaer Polytechnic Institute, Troy, NewYork, in 1986, 1989 and 1993, respectively. Hehas worked in the Corporate R&D Center ofGeneral Electic in Scehenectady, New York, in1990, in the Computer and Information SciencesDepartment, University of Delaware, from 1992-1993, and in the Computer Science Department,

University of Toronto, from 1993-1994. He was with the Department ofElectrical Engineering of IIT Kanpur from 1994-1998. He has been withthe Electrical Engineering Department of IIT Bombay since July 1998,where he is now a professor. He won the best paper award at ACMSigmetrics 2010 and was a Benjamin Meaker visiting professor at theUniversity of Bristol during 2010. His research interests include thegeneral areas of communication networks and performance analysis.His recent research has concentrated on network traffic and perfor-mance measurement, analysis of random wireless data and sensornetworks, network pricing, and queue control. He is a coauthor ofCommunication Networking: An Analytical Approach (Morgan-Kaufman,2004) and Wireless Networking (Morgan-Kaufman, 2008). He is amember of the IEEE.

Karen Sollins is a principal scientist at theMassachusetts Institute of Technology (MIT),where she also received her graduate degrees.Her research concentrates on internet scalenetwork architecture, with a current focus ondistributed network management. She has pub-lished in the areas of security, naming, networkarchitecture, and information centric networking,among others. She is cochair of the MITCommunications Futures Program’s Privacy

and Security Working Group, and spent two years as a senior programdirector at the US National Science Foundation. She is a member of theIEEE and the ACM.

Jon Crowcroft is the Marconi Professor ofNetworked Systems in the Computer Laboratoryof the University of Cambridge. Prior to that, hewas a professor of networked systems atUniversity College London (UCL) in the Compu-ter Science Department. He has supervisedmore than 45 PhD students and 150 mastersstudents. He is a fellow of the ACM, the BritishComputer Society, the IEE, the Royal Academyof Engineering, and the IEEE. He was a member

of the Internet Architecture Board (IAB) from 1996-2002 and went to thefirst 50 IETF meetings, was general chair for the ACM SIGCOMM 1995-1999, and was the recipient of Sigcomm Award in 2009. He is theprinciple investigator in the Computer Lab for the EU Social Networksproject; the EPSRC-funded Horizon Digital Economy project, hubbed atNottingham; the EPSRC-funded project on federated sensor nets projectFRESNEL, in collaboration with Oxford; and a new five year projecttoward a Carbon Neutral Internet with Leeds.

880 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 10, NO. 6, JUNE 2011