ieee/acm transactions on networking, vol. 25, no. 5...

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 25, NO. 5, OCTOBER 2017 2829

Transient Community Detection and Its Applicationto Data Forwarding in Delay Tolerant Networks

Xiaomei Zhang, Member, IEEE, and Guohong Cao, Fellow, IEEE

Abstract— Community detection has received considerableattention because of its applications to many practical prob-lems in mobile networks. However, when considering temporalinformation associated with a community (i.e., transient com-munity), most existing community detection methods fail due totheir aggregation of contact information into a single weightedor unweighted network. In this paper, we propose a contact-burst-based clustering method to detect transient communitiesby exploiting pairwise contact processes. In this method, weformulate each pairwise contact process as a regular appearanceof contact bursts, during which most contacts between the pairof nodes happen. Based on this formulation, we detect transientcommunities by clustering the pairs of nodes with similar contactbursts. Since it is difficult to collect global contact informationat individual nodes, we further propose a distributed method todetect transient communities. In addition to transient communitydetection, we also propose a new data forwarding strategy fordelay tolerant networks, in which transient communities serveas the data forwarding unit. Evaluation results show that ourstrategy can achieve a much higher data delivery ratio thantraditional community-based strategies with comparable networkoverhead.

Index Terms— Transient community, data forwarding, delaytolerant networks.

I. INTRODUCTION

IN DELAY Tolerant Networks (DTNs) [1], mobile devicesare only intermittently connected due to mobility and low

node density. As a result, it is hard to maintain an end-to-end path, which makes data forwarding in DTNs extremelydifficult. To address this problem, researchers have proposedapproaches to exploit social network concepts, such as cen-trality [2]–[5], community [6]–[9], and friendship [10]–[12].

Communities have received considerable attention becauseof their applications to data forwarding in DTNs, wormcontainment [13], and other aspects. However, it is a challengeto detect communities in a large network [14] [15], espe-cially considering the temporal information associated withcommunities. For example, a class community consisting ofstudents attending a class may only appear during the day, and

Manuscript received September 23, 2015; revised July 6, 2016; acceptedMay 9, 2017; approved by IEEE/ACM TRANSACTIONS ON NETWORKINGEditor F. E. Bustamante. Date of publication June 9, 2017; date of currentversion October 13, 2017. This work was supported by the National ScienceFoundation under Grant CNS-1421578, and by Network Science CTA underGrant W911NF-09-2-0053. (Corresponding author: Xiaomei Zhang.)

X. Zhang is with the Department of Mathematics and ComputationalScience, University of South Carolina Beaufort, Bluffton, SC 29909 USA(e-mail: [email protected]).

G. Cao is with the Department of Computer Science and Engineering,The Pennsylvania State University, University Park, PA 16802 USA (e-mail:[email protected]).

Digital Object Identifier 10.1109/TNET.2017.2708090

Fig. 1. False mixture: (a) TC1 appears during daytime and TC2 appearsat night. (b) The two TCs are falsely mixed into one community using atraditional method.

a dormitory community consisting of students in a dormitorymay only appear at night. Since these communities normallyappear during a time period and disappear thereafter, they arereferred to as transient communities (TCs). Although manycommunity detection methods have been proposed in theliterature, there is hardly any method for TC detection.

Existing community detection methods are generallybased on weighted networks or unweighted networks. Forexample, algorithms have been proposed in [16] and [17] todetect communities in weighted networks. Methods like labelpropagation [18] have been proposed to detect communitiesin unweighted networks. To detect communities in bothweighted and unweighted networks, the Clique PercolationMethod (CPM) [19], also known as K-clique, has beenproposed. Recently, AFOCS [8] has been proposed to detectstatic communities and track community dynamics based onunweighted network snapshots. However, AFOCS aggregatescontact information into a weighted or unweighted network.As a result, important contact information, such as thetime when nodes contact, is lost. Losing such temporalinformation may result in two problems related to TCdetection: false mixture and false separation, as shownin Figure 1 and Figure 2.

• False Mixture There are originally two TCs in thenetwork, as shown in Figure 1 (a). TC1 is a classcommunity that occurs during the day, and TC2 isa dormitory community that occurs at night. The twocommunities share several students, who take classestogether and live together. Since there is a large overlapbetween them, traditional community detection methodsmay falsely mix these two communities as one. Forexample, CPM (K-clique) and AFOCS fail to distinguishthe two communities when the overlap is larger than acertain threshold.

• False Separation Figure 2 (a) shows one TC in thenetwork. Because the network is not strongly connected

1063-6692 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

2830 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 25, NO. 5, OCTOBER 2017

Fig. 2. False separation: (a) The network has one TC. (b) The TC is falselyseparated into two communities using a traditional method.

(i.e., only one node connects the two parts), a traditionalmethod may separate them into two communities. How-ever, they should not be separated since they may indeedattend the same class at the same time.

With false mixture, two highly-overlapping TCs cannot bedistinguished by ignoring temporal contact information. Withfalse separation, one large TC may be falsely separated if thenode connections are not dense enough. Therefore, traditionalcommunity detection methods fail to detect TCs.

In this paper, we propose a Contact-burst-based Cluster-ing Method (CCM) to detect TCs by exploiting pairwisecontact processes. CCM is based on the detailed pairwisecontact information between nodes, instead of networks withunweighted on-off edges or weighted edges. In CCM, we for-mulate each pairwise contact process as the regular appearanceof contact bursts, during which most contacts between the pairof nodes appear. Based on this formulation, we detect TCs byclustering the pairs of nodes with similar contact bursts. Sinceit is hard to collect global contact information at individualnodes, we also propose a distributed method. In addition toTC detection, we also apply TCs to data forwarding in DTNs.The contributions of this paper are summarized as follows:

• We propose a CCM method to detect TCs. Comparedwith existing methods, such as CPM and AFOCS, whichdo not consider the temporal information of communities,our method has much fewer false mixtures and falseseparations.

• Since it is difficult to collect global contact information,we further propose a distributed CCM method to detectTCs. Trace-driven simulation results show that the dis-tributed CCM can effectively detect TCs.

• A TC may periodically appear, and therefore we proposetechniques to identify this appearance pattern, which isuseful in many applications.

• We propose a data forwarding strategy in DTNs basedon TCs, where data are forwarded to TCs with betterrelaying capability to the destination, considering thetime constraints of data. Evaluation results show thatour strategy outperforms other existing data forwardingstrategies in DTNs.

The rest of the paper is organized as follows. Section IIpresents our CCM method and distributed CCM method forTC detection. In Section III, we present the TC-based dataforwarding strategy. Section IV provides an overview of therelated work, and Section V concludes the paper.

II. TRANSIENT COMMUNITY DETECTION

In this section, we first give some preliminaries, and then wepresent our CCM method and the distributed CCM method for

TABLE I

TRACE SUMMARY

TC detection. Finally, we compare our TC detection methodwith other community detection methods and compare thedistributed CCM method with CCM.

A. Preliminary

In this section, we describe the contact traces and introducesome definitions that will be used in this paper.

1) Contact Trace: We study the social contact patterns onthree sets of traces: Dartmouth campus trace [20], MIT realitytrace [21], and UCSD campus trace [22]. These traces recordcontacts among users carrying mobile devices on campus.In Dartmouth and UCSD traces, devices are WiFi-enabled. TheDartmouth trace uses SNMP logs from Access Points (APs).The original trace contains records of several thousandsWiFi-enabled devices and lasts almost 4 years. In this paper,we focus on the data collected from September 2004 toDecember 2004. A contact is recorded when two devicesdetect the same AP simultaneously. The UCSD trace recordsthe WiFi association of human-carried PDAs with APs, anda contact is recorded when two devices detect the same AP.In the MIT Reality trace, the devices periodically detect theirpeers via Bluetooth interfaces, and a contact is recorded whentwo devices move into the communication range of each other.The details of these three traces are shown in Table I.

2) Contact Burst: Simply extracting communities fromweighted or unweighted networks is not enough to detect TCs.Therefore, instead of simplifying each pairwise contact processto a weight, our method processes the pairwise contact infor-mation directly.

From the trace summary, we can see that each trace hashundreds of thousands of contacts, and detecting TCs fromthese contacts directly will be difficult. Also, the opportunisticnature of the contacts hardly provides any clue on how theyare related to TCs. Thus, we propose a different solution.We model the pairwise contact process using simple units,which can represent the contact information between nodesand be directly processed. We formulate each pairwise contactprocess as a series of contact bursts, during which mostcontacts between the pair of nodes appear. A contact burstis defined as follows:

Definition 1: A contact burst TB = [ts, te] betweentwo nodes is a time period when contacts frequently appearbetween these two nodes. Two adjacent contacts belong to onecontact burst if and only if the inter-contact time between themis shorter than λ, where λ is a pre-defined threshold.

Figure 3 shows three contact bursts of two nodes, whereeach vertical arrow indicates a contact and TBi denotes theith contact burst. A single contact is not considered a contactburst because it is more like a random contact.

ZHANG AND CAO: TC DETECTION AND ITS APPLICATION TO DATA FORWARDING IN DTNs 2831

Fig. 3. Three contact bursts. An arrow represents a contact betweentwo nodes.

TABLE II

HOW CONTACT BURSTS CAN REPRESENT CONTACT PROCESSES (λ = 1)

Fig. 4. How the percentage of the contacts within the contact bursts changeswith parameter λ.

With the concept of the contact burst, each pairwise contactprocess between two nodes is modeled as a series of contactbursts TBi = [tsi , tei ], i = 1, 2, 3 . . . . To verify if the contactbursts can really represent node contacts, we have conductedsome experiments based on the three traces to evaluate howmany contacts can be represented by contact bursts. Table IIshows the percentage of the contacts that are within the contactbursts and the duration of the contact bursts as a percentage ofthe total trace time. The value of λ is set empirically based ondifferent traces. We have found that in all of our three traces,with λ = 1 hour, most contacts occur within some contactbursts, which only account for a small portion of the total time.We also evaluate how λ affects the percentage of the contactsthat are within the contact bursts. The results are shownin Figure 4. As can be seen, there is a large percentage increasewhen λ increases from 0.5 to 1. As λ further increases,the percentage increase slows down. On the other hand, asλ increases, adjacent contacts with long inter-contact time aremore likely to be falsely assigned to the same contact burst.Thus, we set λ = 1 hour in this paper.

Contact bursts are used to find TCs. A contact burst [ts, te]between a pair of nodes is the time interval in which the twonodes frequently contact. In a TC with duration [ts, te], itsmembers usually have frequent contacts with each other duringthis interval; i.e., their contact bursts are all with durationssimilar to [ts, te]. Therefore, if we are able to find a groupof contact bursts with similar durations and they can mutually

Fig. 5. A connected graph built on contact bursts. Six contact bursts havesimilar contact durations, and they may be from one TC, circled by red line.The other four have different contact durations, which are not in this TC.

form a connected graph, they will form a TC. For example,in Figure 5, there are 10 contact bursts, where a double arrowrepresents a contact burst. Among them, six contact burstsstart around 1pm and end around 2pm. Then, they can form aconnected graph, and are most likely from the same TC withduration around [1, 2] pm. On the other hand, the other fourcontact bursts in the example have different contact durations,and are thus not in this TC. Although the similarity in thissimple example can be easily calculated, in a complex networkwhere contact bursts may not have the exact same starting andending time, we need techniques to calculate their similarity.

3) The Similarity of Contact Bursts: In order to detect TC,our first objective is to find out contact bursts with similartime periods.

Definition 2: The similarity of two contact burstsTB1 = [ts1, te1] and TB2 = [ts2, te2] is defined as the Jaccardsimilarity coefficient

S(TB1, TB2) =TB1 ∩ TB2

TB1 ∪ TB2

=

⎧⎪⎪⎨

⎪⎪⎩

min(te1, te2)−max(ts1, ts2)max(te1, te2)−min(ts1, ts2)

,

if min(te1, te2) > max(ts1, ts2)0, if min(te1, te2) ≤ max(ts1, ts2)

Jaccard similarity coefficient is commonly used to measure thesimilarity between two equal length “0-1” sequences; i.e., thetotal number of positions where both sequences have 1 dividedby the total number of positions where at least one sequencehas 1. A similar technique can be applied to our case when twotime intervals are compared. The Jaccard similarity coefficientfor two time intervals is the intersection between them dividedby their union.

B. TC Detection With CCM

Based on contact bursts and their similarity, we cluster sim-ilar bursts together, from which we find TCs. There are manyclustering methods in the literature, such as K-means [23],spectral clustering [24] and hierarchical clustering [25]. Thesemethods aim to cluster a set of subjects, with the similaritywell defined. Using the K-means algorithm, the number ofclusters should be pre-known before clustering. Spectral clus-tering partitions nodes into clusters by using the eigenvectors


Fig. 6. For two clusters to merge, they must share at least one commonnode. (a) No common node, nodes U, V and nodes X, Y may come fromtwo different TCs which appear at similar time. (b) Two contact bursts withcommon node U , and these three nodes form a TC.

of a pairwise similarity matrix. Among them, hierarchicalclustering is effective and efficient, with only one parameter– the minimum allowed similarity (γ) that can be flexibly set.Thus, our CCM is based on hierarchical clustering.

With a set of n contact bursts, CCM runs as follows:1) Initialization: Each of the n contact bursts starts to form

its own cluster.2) Merge clusters: Pick two clusters with the largest sim-

ilarity and merge them together. The similarity of thetwo clusters is defined as the average of all the pairwiseJaccard similarity coefficients of the contact bursts inthese two clusters. This step repeats until the terminationcondition is satisfied, as defined in the next step.

3) Termination: The algorithm terminates if the largestsimilarity between all clusters in one round is smallerthan the minimum allowed similarity γ.

In order for the algorithm to generate TCs, we need tomodify the cluster merging phase. With this algorithm, twoclusters with a similar occuring time may be merged as a TC.However, this is not always true. For example, if two clustersdo not share any common node, it is more likely that they arefrom two TCs which happened at a similar time but at differentlocations. Thus, we add one condition for cluster merging; thatis, the picked clusters must have at least one common node(as shown in Figure 6). For all pairs that satisfy this condition,those with the largest similarity are merged.

With CCM, contact bursts with similar time periods areclustered together. One cluster corresponds to one TC, wherenodes attached to the contact bursts will be in that TC. Whenthe algorithm terminates, there may be some “tiny” clusters,which only include one contact burst. One contact burst onlyconsists of two nodes, and it is more like a personal meetingrather than a TC. For a contact burst [ts, te], its duration isdefined as te − ts. For a cluster (TC) that includes multiplecontact bursts, its duration is defined as the average of thedurations of these contact bursts, and its starting time andending time are respectively defined to be the average of thestarting time and ending time of the contact bursts. A TC withshort duration may represent an occasional encountering ofseveral nodes. Such TCs with short durations may not reappearand should not be counted as TCs. To further verify this, weconduct an experiment to test whether a TC’s reappearanceis relevant to the TC’s duration. The result is shown inFigure 7. We find that in all three traces, the TCs with avery short duration (< 15 min) are less likely to reappear

Fig. 7. The effects of TC duration on TC reappearance in the future.

Fig. 8. The number of communities (left, blue) and the average size of thecommunities (right, green) with varying γ (UCSD trace).

than those with a long duration (≥ 15 min). Therefore, weeliminate the TCs with durations shorter than a thresholdTmin. We have tested different values for Tmin (i.e., from10 min to 20 min), and find that there is no noticeablechange on the reappearance ratio. Therefore, we simply setTmin = 15 min (0.25 hour). In addition, TCs with onlyone contact burst are also deleted, since these TCs only havetwo members. This is consistent with most existing work oncommunity detection (e.g., [8], [19]), in which the detectedcommunities should have at least three members.

Parameter γ: The minimum allowed similarity γ affectsthe number of TCs and the community size. We run CCM onthe UCSD trace to observe the impacts of γ, and the resultsare shown in Figure 8. As shown in the figure, increasingγ results in more TCs with small community size. When γis small, the minimum allowed similarity between clustersbecomes smaller. Then, more clusters are merged, resultingin fewer TCs, but larger community size. If γ is too large,the clustering process ends quickly, and contact bursts thatshould belong to the same TC may not be clustered beforethe algorithm terminates. Thus, it is important to find the rightvalue for γ.

In a mobile network, contacts that happen between membersin the same community are called intra-community contacts,and contacts that happen between members in different com-munities are called inter-community contacts. A good commu-nity structure should have more intra-community contacts andfewer inter-community contacts. Since a TC also has temporalinformation, a contact in a TC is called an intra-communitycontact only when the contact happens between members inthe same TC during the TC’s existing time period.

To obtain the percentage of intra-community contacts, were-run the experiment with the TCs’ information and count the


Fig. 9. Percentage (PCT) of intra-community contacts with varying γ.

number of intra-community contacts. The percentages of intra-community contacts with different γ are shown in Figure 9,and a larger percentage is better. When γ is small, more TCsare merged even though their durations may be different. Then,the durations of these individual TCs may be dramaticallydifferent from the duration of the merged TC (the durationof the merged TC is computed as the average of the durationsof these individual TCs). As a result, the occurring time ofmost contacts may fall out of the duration of the merged TC(i.e, the TC’s existing time period), and these contacts cannotbe counted as intra-community contacts. Therefore, when γis small, the percentage of intra-community contacts is low.When γ is large (e.g. close to 1), only TCs with very similardurations are merged. Then, most of the resulting TCs havevery small sizes (less than 3), and these TCs will be eliminatedby CCM. Contacts in these eliminated TCs will no longer becounted as intra-community contacts. Thus, the percentage ofintra-community contacts is also low when γ is large. As canbe seen, the percentage of intra-community contacts reachesmaximum when γ is 0.4, and we choose γ = 0.4 for the restof the paper. Note that many contacts are not intra-communitycontacts even at γ = 0.4. This is because these contacts aremost likely random contacts, which are common in networkswith high node mobility.

C. TC Detection With Distributed CCM

CCM is a centralized detection method that requires allof the pairwise contact information between nodes in thenetwork. Since it is difficult to collect global contact infor-mation at individual nodes, we further propose a distributedCCM. In this subsection, we first explain the basic idea of thedistributed CCM and then present the detailed algorithm.

1) Basic Idea: Similar to CCM, the distributed CCM detectsTCs by clustering the contact bursts. Since it is difficultto collect all contact bursts at individual nodes, each nodeonly records the contact bursts with its neighbors, which arereferred to as local contact bursts. If a node v has frequentcontacts with node w in the recent time period and the adjacentcontacts have intervals smaller than λ = 1 hour, a contact

burst between v and w exists. An indicator I(v,w)e (t) =

{true, false} is used to indicate whether the contact burstbetween v and w exists at time t.

In addition to local contact bursts, each node also maintainsthe TCs it has detected locally, which are referred to aslocal TCs or local clusters (a TC is actually a cluster of

contact bursts). The local clusters detected at node v shouldinclude v as a member and have pairwise similarities smallerthan the minimum allowed similarity γ = 0.4; otherwise,similar clusters should be further merged. With this rule,node v adds new clusters to its local clusters in the followingtwo ways:

• Adding local contact bursts: As node v finishes its contactburst with another node w (i.e., the last contact betweenv and w happens λ = 1 hour before), the contactburst T

(v,w)B is treated as a new cluster Cnew = {T (v,w)

B }and is added to the local clusters of v. Since the newcluster may have a duration similar to existing localclusters (with a similarity larger than γ), the similarclusters should be merged.

• Adding local clusters of other nodes: Simply using thelocal contact bursts to detect TCs is not enough. Nodesalso learn about the local clusters of other nodes uponpairwise contacts. Specifically, as node v contacts node w,it exchanges the local clusters with node w. Afterwards,node v selects some clusters from w and merges with thelocal clusters at v.

2) The Distributed Algorithm: When a node v initializesthe TC detection process, it has no local contact bursts andlocal clusters. Whenever node v encounters another node w,it updates the local contact bursts and exchanges the localclusters with w. Then node v uses the new information toupdate its local clusters. Algorithm 1 outlines the processof updating the local clusters as node v contacts node w attime t. We first give the notations used in the algorithm, andthen discuss the detailed process and the functions used in thealgorithm.

At node v, T(v,k)B = [t(v,k)

s , t(v,k)e ] is used to denote the con-

tact burst with node k (k �= v), and I(v,k)e (t) ∈ {true, false}

indicates whether T(v,k)B exists or not at time t. The set of local

clusters at v is denoted as Cv. Each cluster C in Cv consistsof a set of contact bursts. S(C1, C2) denotes the similarity oftwo clusters. If C1 and C2 do not share any common node,the similarity is 0; otherwise, it is computed as the averageof all pairwise Jaccard similarity coefficients of the contactbursts in the two clusters. N denotes the set of nodes in thenetwork.

Algorithm 1 consists of two steps in general. In the first step(Line 1-16), node v updates the local contact bursts at time t.Specifically, if I

(v,k)e (t) = true and the last contact in T

(v,k)B

happened λ before (t−t(v,k)e > λ), it means T

(v,k)B has finished

and I(v,k)e (t) is changed to false. Corresponding to each of

the finished contact bursts, a new cluster is created and addedto the local clusters. As the new cluster may have a durationsimilar to existing local clusters, we should determine if itwill be merged with these clusters. The function NewCluster,which will be discussed later, is used to determine how thenew cluster is merged with existing clusters.

Node v then updates the contact burst with the encoun-tered node w by incorporating the contact at time t.If I

(v,w)e (t) = false, a new contact burst begins at time t.

Otherwise, the contact burst T(v,w)B is extended to time t by

including the current contact (t(v,w)e ← t).


Algorithm 1 Updating the Local Clusters When Node vContacts Node w at Time t1: /* Update the contact bursts with other nodes. */2: for k ∈ N \ v do3: if I

(v,k)e (t) = true and t− t

(v,k)e ≥ λ then

4: I(v,k)e (t)← false;

5: Cnew ← {T (v,k)B };

6: Cv ← Cv ∪ {Cnew};7: NewCluster(Cv, Cnew);8: end if9: end for

10: /* Update the contact burst between v and w. */11: if I

(v,w)e (t) = false then

12: I(v,w)e (t)← true;

13: t(v,w)s ← t, t

(v,w)e ← t;

14: else15: t

(v,w)e ← t;

16: end if17: /* Exchange the local clusters between v and w */18: Cw ← the set of local clusters in node w;19: MergeClusters(Cv, Cw);

Function 1 - NewCluster

20: function NewCluster(C, Cnew)21: /* Find in C the most similar cluster Cm to Cnew . */22: Sm ← 0;23: for C ∈ C \ {Cnew} do24: if S(C, Cnew) > Sm then25: Sm ← S(C, Cnew);26: Cm ← C;27: end if28: end for29: if Sm < γ then return30: else31: Cm ← Cm ∪Cnew ;32: C ← C \ {Cnew};33: NewCluster(C, Cm);34: end if

Function 2 - MergeClusters

35: function MergeClusters(C1, C2)36: Sm ← 1;37: while Sm ≥ γ do38: /* Find the pair of cluster with highest similarity. */39: Sm ← 0;40: for C1 ∈ C1 do41: for C2 ∈ C2 do42: if S(C1, C2) > Sm then43: Sm ← S(C1, C2);44: Cm ← C2;45: end if46: end for47: end for48: Cnew ← Cm;49: C1 ← C1 ∪ {Cnew};50: NewCluster(C1, Cnew);51: end while

In the second step (Line 17-19), node v exchanges the localclusters with node w and checks if the local clusters of w canbe merged with its local clusters. Node v uses the functionMergeClusters to decide how to merge the clusters in node wwith the local clusters. Generally, with MergeClusters, onlyclusters having high similarities with the local clusters aremerged, since the clusters with low similarities usually haveno relation to node v.

Algorithm 1 only shows the updating process at node v astwo nodes v and w contact. The updating process at node wis similar to the process at node v and therefore is not shownhere. Next, we discuss the two functions used in Algorithm 1:NewCluster and MergeClusters.

• NewCluster: The objective of NewCluster is to check ifthe new cluster Cnew should be merged with one of theexisting clusters. It works by finding the cluster Cm thathas the maximum similarity to Cnew. If the similarity issmaller than γ, nothing needs to be done and the functionreturns. Otherwise, Cnew is merged with Cm by addingthe contact bursts in Cnew to Cm. Cnew is then deletedfrom the local clusters. Since Cm has been updated byadding more contacts bursts, it is treated as another newcluster and the function NewCluster will be recursivelycalled to check whether the updated cluster should bemerged with other clusters.

• MergeClusters: Given two sets of local clustersC1 and C2, MergeClusters is used to select similar clustersin C2 and merge those clusters with the clusters in C1. Thefunction works iteratively. In each iteration, the pairwisesimilarities between clusters in C1 and C2 are calculated.Then, for the pair of clusters that has the largest similarity,if the similarity is not smaller than γ, the cluster from C2is selected and added to C1 as a new cluster. Afterwards,the function NewCluster is used to check if the newcluster should be merged with the existing clusters in C1.

After all contacts have been processed, we use the sameprocedures as CCM to delete the clusters that only haveone contact burst and delete clusters with a duration shorterthan Tmin. Finally, each cluster corresponds to a TC whichincludes all nodes attached to the contact bursts.

D. Evaluations

1) Evaluations of CCM: We compare the performanceof our TC detection algorithm CCM with two commonlyused community detection algorithms: CPM (K-clique) andAFOCS. The performance is compared based on six metrics,covering various properties of community. Here, we only showthe results based on the UCSD trace, since the results onother traces are similar. Based on the UCSD trace, the threealgorithms are used to detect communities in each day overthe trace’s period.

Number of communities and community size: Figure 10 (a)shows that CCM can detect many more TCs than communitiesdetected by CPM and AFOCS. This indicates that CPM andAFOCS may have mixed some TCs into one community.Figure 10 (b) shows that communities detected by CPM are


Fig. 10. Comparisons among CMP, AFOCS and our CCM algorithm.(a) Number of communities. (b) Size of communities. (c) Proportion ofnodes in communities. (d) Number of associated communities for each user.(e) Proportion of intra-community contacts. (f) Location distortion.

usually bigger than those detected by AFOCS and CCM.This further confirms that CPM may have mixed some TCs,increasing the overall community size.

Proportion of nodes involved in community: Communitystructure is usually used to help data forwarding in DTNs; thusit is better if more nodes can be included in the community.Figure 10 (c) shows that CCM usually involves more nodesthan CPM and AFOCS. In CPM and AFOCS, the node thathas infrequent contacts with others may be ignored.

Number of associated communities for one user: We usethis metric to show how much communities overlap. FromFigure 10 (d), we can see that a node is normally attached toone community in CPM or AFOCS. This means communitiesare almost mutually disjointed, which does not achieve theobjective of detecting overlapping communities. However, TCsdetected in CCM have a strong overlapping property where anode belongs to an average of three TCs. Because TCs aredetected using temporal information, it has no limitation onhow much communities overlap. Other methods have limitson how much two communities can overlap.

Proportion of intra-community contacts: A commu-nity structure should incorporate more contacts. For CPMand AFOCS, a contact is considered an intra-communitycontact when it appears between two members in the samecommunity. We need another temporal requirement for acontact in a TC to be considered an intra-community contact;

that is, it must occur within the community’s duration. There-fore, we can see that TC incorporates less intra-communitycontacts than CPM and AFOCS, as shown in Figure 10 (e).CPM has the highest number of intra-community contacts.This can be explained by the fact that CPM detects many largecommunities, which counts more contacts as intra-communitycontacts.

Location distortion: It measures the difference betweennodes’ locations within a community. Intuitively, a smallerlocation distortion means that users in the same communityare near each other. On the contrary, a larger location distor-tion means that users may appear at multiple places in thiscommunity. A community’s location distortion is the standarddeviation of the locations of all intra-community contacts.A contact’s location is calculated by averaging two nodelocations at the contact time.

Loc<i,j> =Loci,t<i,j> + Locj,t<i,j>

2(1)

A community’s mean location is calculated by averaging thelocations of all intra-community contacts.

Loc(C) =

∑<i,j>∈C Loc<i,j>∑

<i,j>∈C 1(2)

The location distortion within a community C is definedas the standard deviation of all intra-community contacts’positions.

Distortion(C) = stdev{Loc<i,j>} (< i, j >∈ C)

=√

var{Loc<i,j>} (< i, j >∈ C)

=

√∑<i,j>∈C(Loc<i,j> − Loc(C))2

∑<i,j>∈C 1

(3)

The UCSD trace does not record users’ location informa-tion, so we estimate user location through the detected APs’locations at that time. From Figure 10 (f), we can observe thatthe location distortion in TCs detected by CCM is smallerthan that by CPM and AFOCS; i.e., nodes usually gatherat one place in a TC. In a traditional community-based dataforwarding strategy, data are intended to be forwarded to thecommunities that include the destination node. In such anapplication, a smaller location distortion is better, because thismeans nodes are near each other in one community. Oncethe data reaches the destination community, it must be nearthe destination node. In communities detected by CPM andAFOCS, nodes do not have a gathering period, so the contactsbetween them can appear at any time and any place, whichresults in a large location distortion.

Verifying false mixture and false separation: Here, weuse examples to illustrate that traditional community detectionmethod (i.e., AFOCS) may result in false mixture or falseseparation, while our CCM method works correctly.

Example of false mixture: To verify false mixture, we takeone AFOCS community from one day of the UCSD trace as anexample. The details of this community are shown in Table III.In Table III, Member lists the IDs of the community members.Location is the location of the community, as computed inEquation 2, represented as coordinates. Location distortion


Fig. 11. Evaluation of the distributed CCM: (a) the performance of distributed CCM as the variation of running time (number of running days), (b) comparingwith CCM on the proportion of intra-community contacts, (c) CDF of the similarities of distributed TCs to centralized TCs.

TABLE III

FALSE MIXTURE: AFOCS COMMUNITY C

TABLE IV

TRANSIENT COMMUNITY TC1 AND TC2

measures the difference between member locations within thecommunity, as computed in Equation 3. As can be seen, theAFOCS community has six members 3, 34, 48, 77, 101, 263,and a very large location distortion 181.537. The large locationdistortion indicates that the intra-community contacts betweenthe six members occur at different locations. Then, we furtherdetect transient communities during this day, and find thatthere are actually two TCs, as shown in Table IV. In Table IV,Duration is used to denote the existing time period of the TC,represented as the time in that day. As can be seen, TC1,which has five members: 3, 34, 77, 101, 263, occurred inthe afternoon. TC2, which has three members: 48, 77, 101,occurred at night. The locations of the two TCs are far away,and the location distortion inside each TC is small. Thisindicates that TC1 and TC2 are from two distinct socialactivities that occurred at different time and locations. Becausethe two TCs share two common nodes, 77 and 101, they arefalsely combined into one community by AFOCS, resulting ina false mixture.

Example of false separation: We also identify the existenceof false separation in AFOCS. From one day in the UCSDtrace, we identify two AFOCS communities, C1 and C2.As shown in Table V, C1 has four members: 112, 163,225, 248, and C2 has five members: 16, 41, 180, 212, 225.C1 and C2 share one common node 225. The locations ofthe two communities are close to each other, and the loca-tion distortion inside each community is small. The locationinformation indicates that members of C1 and C2 gather at thesame place, and they may be from the same social community

TABLE V

FALSE SEPARATION: AFOCS COMMUNITY C1 AND C2

TABLE VI

TRANSIENT COMMUNITY TC

but are falsely separated. In fact, in the same day, we detecta large TC (as shown in Table VI) including all membersof C1 and C2, and the location distortion of the large TC issmall. This large TC can well represent the social community,in which all members of C1 and C2 participate. This alsoconfirms that C1 and C2 are from the same TC which is falselyseparated by AFOCS.

2) Evaluations of Distributed CCM: We next evaluate theperformance of the distributed CCM based on the mobilitytraces.

Effects of running time: Using distributed CCM, TCs areupdated upon pairwise contacts. With a longer running time,each node encounters more nodes and more TC informationcan be learned. In the first experiment, we vary the runningtime and examine whether increasing the running time willenhance the performance of the distributed CCM. Here, theperformance is evaluated based on the proportion of intra-community contacts. For distributed CCM, a contact is con-sidered to be an intra-community contact if it is incorporatedby the local TCs of the two nodes in contact. As can be seenfrom Figure 11 (a), in all three traces, increasing the runningtime from 1 day to 2 days enhances performance significantly,but further increasing the running time does not bring obviousperformance gain. Therefore, we set the running time of thedistributed CCM to be 2 days for the remainder of this paper.

Comparing with CCM: We next compare the distributedCCM with CCM. The performance is first compared by mea-suring the proportion of intra-community contacts. As shownin Figure 11 (b), distributed CCM incorporate 80%− 90% asmuch contacts as CCM in all three traces. This demonstrates


that the TCs detected by distributed CCM (referred to asdistributed TCs) can well represent the TCs detected by CCM(referred to as centralized TCs).

We further evaluate the distributed TCs by measuring theirsimilarities to the centralized TCs. To measure the similaritybetween two communities, previous work [26] calculates theJaccard similarity coefficient between members in the twocommunities. Since TCs also consider the duration of the com-munities, the similarity on duration should also be considered.Similarly, the similarity on durations is calculated using theJaccard similarity coefficient. The similarity between two TCsTC1 and TC2 is calculated as the average of similarities onmembers and durations:

Sm,t(TC1, TC2) =Sm(TC1, TC2) + St(TC1, TC2)

2(4)

where Sm(TC1, TC2) and St(TC1, TC2) respectively denotethe similarities on members and durations.

In order to measure the similarity of a distributed TC TCd tothe centralized TCs, we find the corresponding centralized TCTC∗

d that has the highest similarity to TCd. It is preferable ifevery distributed TC can find a corresponding centralized TCwith high similarity. Otherwise, it usually means the detecteddistributed TC is inaccurate or unnecessary.

The distribution (CDF) of the similarities of distributedTCs to centralized TCs is displayed in Figure 11 (c). Forthe MIT and Dartmouth traces, more than 80% distributedTCs can find corresponding centralized TCs with similaritylarger than 0.7. For the UCSD trace, about 50% distributedTCs can find corresponding centralized TCs with similaritieslarger than 0.7. Thus, most of the detected distributed TCs canfind corresponding centralized TCs with high similarity. In themeantime, for all three traces, there exist some distributedTCs with very low similarities to the centralized TCs, whichimplies that the distributed CCM may detect some TCs thatare inaccurate or unnecessary.

III. APPLICATION TO DATA FORWARDING

Community has been widely used for data forwardingin DTNs. However, ignoring community appearance timemay lead to non-optimal forwarding paths. For example, incommunity-based data forwarding, data are always intendedto be forwarded to a node within the destination’s community.This is not optimal when considering two problems. First,because a traditional community usually has a large locationdistortion, delivering data to the destination’s community doesnot mean it is getting closer to the destination node. Second,considering the appearance time of communities, some of thedestination communities to which data are forwarded maynot even appear before the data expire. Our TC-based dataforwarding strategy solves these problems by utilizing TCs asthe forwarding unit and always forwarding data to TCs thathave better capability of relaying data to the destination nodewithin a short time constraint.

Utilizing TCs to data forwarding requires knowledge on TCappearance patterns. In this section, we first study the periodicappearance patterns of TCs and discuss how to compute therelaying capabilities of TCs, and then present the TC-baseddata forwarding strategy.

Fig. 12. CDF of TCs’ appearance times.

A. Periodic Appearance of Transient Communities

With the proposed CCM, we can detect TCs on a dailybasis. We run the algorithm based on the traces and findthat there are many TCs that share similar members andappear at different times. These TCs represent one social groupthat appears periodically, like a class. In addition to theseperiodically appearing TCs, there exist TCs that only appearonce. Such randomly appearing TCs are usually formed due tothe opportunistic meeting of unfamiliar nodes. Later, we willno longer consider these opportunistically formed TCs. In thissubsection, we will identify the periodic appearance of TCsbased on the traces and exploit their appearance patterns.

1) Identifying the Periodic Appearances of TCs: The algo-rithm we use to cluster similar TCs is the same hierarchicalclustering that we have presented in Section II-B. TCs areclustered according to the similarity of their members, whichis also based on the Jaccard similarity coefficient.

We are interested in knowing how many times each TCappears. We run the clustering algorithm on three traces, eachwith 75 days of data. The CDFs of TCs’ appearance timesare shown in Figure 12. TCs that only appear once are notincluded in the figure. As can be seen, more than 10% ofTCs in the Dartmouth trace and about 5% of TCs in the MITtrace and UCSD trace appear more than 10 times. These TCsare considered as frequently appeared TCs, which may havethe property of periodic appearance. Thus, the next questionis how to find the appearance patterns of these frequentlyappeared TCs.

2) Appearance Patterns of TCs: In this subsection, we studythe periodic appearance patterns of TCs on three sets of traces.We formulate the appearance patterns of TCs on a daily basis,based on the fact that many social groups, such as families,offices, and classes, are formed daily. Although there are manysocial groups formed on a weekly or monthly base or witha more complex pattern, we will not consider these patternshere and leave them as future work. The appearance pattern isformulated in two aspects. One is the distribution of the TCstarting time, and the other is the distribution of TC duration.

We have two observations. First, the starting time of aTC within a day can be well approximated by a normaldistribution. We take one TC in the Dartmouth trace as anexample, and the distribution of the TC starting time is shownin Figure 13 (a). The parameters of the distribution are shownin Table VII. The TC usually appears at around 1 to 3 pm.


Fig. 13. The PDF of a TC’s starting time on a daily basis and the PDF of the TC’s duration. (a) Starting time. (b) Duration.

TABLE VII

THE PARAMETERS OF NORMAL AND EXPONENTIAL DISTRIBUTION

Second, the duration of a TC can be approximated by anexponential distribution, as shown in Figure 13 (b). Theparameter λ is 0.374 (as shown in Table VII), which meansthat the TC has an average duration of 1/λ = 2.674 hours.

In our traces, a TC only appears with limited times. As aresult, the samples used to train the distributions are limited.Therefore, the approximation does not seem perfect, especiallyfor the normal distribution. To further verify if the TC startingtime follows normal distribution, we use Anderson-DarlingTest [27] to test the normality of the distribution. The testreveals that about 72% of TCs pass the normality test fortheir starting time. This result is acceptable considering thelimited sample size. If we only include the TCs that appearvery frequently (e.g. appearing every day), the percentageof TCs that pass the test can reach 90%. If there are moredata available, we believe that the distribution fitting wouldbe better and more TCs can pass the normality test for theirstarting time.

Determining the probability of TC appearance within atime interval: Given a time interval [ts, te], we are interestedin determining the probability that a TC will appear. This isaffected by three factors: the probability that the TC appearsin that day, the distribution of its starting time in that day,and the distribution of its duration. The probability that a TCappears in a day is simply calculated as the percentage ofdays when the TC has appeared during the warm-up period.Based on this, whether or not the TC appears within the pre-defined interval includes two possibilities. The first possibilityis that the TC starts in the time period [ts, te], and the secondis that the TC starts before the time interval but lasts untilthe start of the time interval. Suppose the starting time isrepresented by a normal distribution with parameters μ and σ,and the duration is represented by the exponential distributionwith parameter λ. The probability of the first possibility is:

P1[ts, te] =∫ te

ts

1σ√

2πe−

(t−μ)2

2σ2 dt

= normcdf(te, μ, σ2)− normcdf(ts, μ, σ2)

where normcdf(t, μ, σ2) is the CDF of the normal distributionwith mean μ and standard deviation σ. The probability of thesecond possibility is:

P2[ts, te] =∫ ts

0

1σ√

2πe−

(t−μ)2

2σ2 ∗ e−λ(ts−t)dt

= e−λ(ts−μ)+ λ2σ22

∫ ts

0

1σ√

2πe−

(t−μ−λσ2)2

2σ2 dt

The part under the integral is actually the probability densityfunction of the distribution norm(μ+λσ2, σ2). Thus, the inte-gral can be easily computed by normcdf(T1, μ + λσ2, σ2)−normcdf(0, μ + λσ2, σ2).

Then, the probability of the TC appearance within this timeinterval is:

PTC [ts, te] = pd ∗ (P1[ts, te] + P2[ts, te]) (5)

where pd is the probability that the TC appears in the day.

B. Relaying Capability

By estimating the current TC’s capability of forwarding datato the destination node within a future time period [t, t+T ], wecan choose users in TCs with better forwarding capability asthe data carriers. We denote the destination node as d and thecurrent TC as TCc. The destination TCs that d belongs to aredenoted as TC1

d , . . . , TCnd

d . Specifically, we first compute therelaying capability of TCc to each of TC1

d , . . . , TCnd

d within[t, t +T ], respectively. Then we obtain the relaying capabilityfrom TCc to d by summing the computed relaying capabilitiesto TC1

d , . . . , TCnd

d .We next discuss in detail how to compute the relaying

capability from TCc to one of the destination TCs (TCid).

The relaying capability from TCc to TCid within the time

period [t, t + T ] is computed by summing the probability thateach node of TCc will appear in TCi

d in [t, t + T ]. Thisis determined by the number of common nodes k betweenthem and the probability that TCi

d will appear in [t, t + T ].The probability that TCi

d will appear in [t, t + T ], denoted asPTCi

d[t, t+T ], is computed in Equation (5). Then, the relaying

capability from TCc to TCid is computed as:

RTCc→TCid[t, t + T ] = k ∗ PTCi

d[t, t + T ] (6)

The relaying capability from the current TC TCc to thedestination node d is computed by accumulating the relaying


capability from TCc to all the destination TCs d belongs to:

RTCc→d[t, t + T ] =nd∑

i=1

RTCc→TCid[t, t + T ] (7)

C. TC-Based Data Forwarding Strategy

This paper focuses on unicast in DTNs, where the sourcenode s, the destination node d, data’s initialization time Tin

and data’s expiration time Tex are given. The objectiveis to forward the data item from s to d within the timeperiod [Tin, Tex]. The basic idea of our TC-based data for-warding strategy is always forwarding data to a TC thathas better relaying capability. Specifically, our strategy isbased on the relaying capability of TCs in the recent timeperiod [t, t + T ]. Forwarding decisions are made upon nodecontacts. If a node meets another node that is in a TC witha larger relaying capability to the destination, data will beforwarded. Since the decision on data forwarding depends onthe current TC the node belongs to, it is important for thenode to know which TC it is currently in.

1) Finding the Current TC: To keep track of the TC thata node belongs to, each node keeps a queue (Qw) of therecently met nodes and the contact time within the last timewindow (Tw). Tw is a constant and how to set the value of Tw

will be discussed in Section III-E.2. Based on Qw and allthe TCs that it has ever known, a matching score is assignedto each TC. The node will consider the TC with the largestmatching score as the TC it currently belongs to. The matchingscore is computed as follows:

Score(TC) =

∑i∈Qw∧i∈TC(Tw − (t− ti))

|TC| (8)

where t is the current time, and ti is the node’s contact timewith node i in the queue.

Even though the distributed CCM method can also detectTCs distributedly, we do not use it to identify the current TCin the experiment. This is because distributed CCM cannotpromptly detect TCs considering that the updates on contactbursts usually have delays. Therefore, we first use the CCMto detect TCs during the warm-up period. Then, during theexperiment, we use the aforementioned method to quicklyidentify the current TC of each node by matching to the TCsdetected in the warm-up period.

2) Data Forwarding Based on TCs: As TCs are the dataforwarding units, data are always forwarded to the TC withbetter relaying capability to the destination. Therefore, once adata item reaches a new TC with a larger relaying capability,the data item is distributed to all nodes met in the TC.However, nodes do not carry the data permanently, and theyonly carry the data for a time period T . If the node that carriesdata no longer contacts any other node with a larger relayingcapability or reaches a TC with larger relaying capability,it will delete the data at the end of T . The detailed dataforwarding process is shown in the following steps:

When node v1 contacts node v2 at time t,

1) Update Qw for v1, and find the TC that v1 is currentlyin: TC1.

2) For each data that v1 carries.

a) Check if the data should be deleted from v1’s buffer.The data should be deleted when the node has neithercontacted a node in a TC with a larger relayingcapability nor gone to a TC with a larger relayingcapability in the period [t − T, t]. Once the data itemis deleted, check the next data item.

b) Check if the data item is carried by v2. If so, skip thisdata item.

c) Check v2’s current TC, TC2. If TC2 and TC1 arethe same, v1 forwards data to v2 so that the data canbe distributed in the current TC. If TC2 has a betterrelaying capability to d, v1 also forwards data to v2.

D. Performance Evaluations

We evaluate the performance of the TC-based data for-warding strategy with the Dartmouth trace, the MIT Realitytrace, and the UCSD trace, as presented in Section II-A.1.For each trace, we use the first half as the warm-up period,during which the TCs and TCs’ appearing patterns are detectedand related information are transmitted among nodes. Thesecond half of the trace is used to evaluate the performance.For each data item, the source and the destination are pickedrandomly and the generation time is randomly chosen in thedaytime, since nodes’ activity remains low at night, which mayresults in inaccurate comparison. Each experiment is repeated1000 times for statistical convergence. The value of the timeperiod T , with which we evaluate TCs’ relaying capability anddecide when to delete data, is empirically set at each trace.We set T = 1 hour in all three traces, with which we canachieve high delivery ratio with a small number of data copiesas can be seen in the following results.

Our TC-based data forwarding strategy is compared withthree traditional community-based forwarding strategies andthe Epidemic strategy that serves as the upper bound. A briefoverview of these strategies is shown below:

• Epidemic [28]: When a node carrying the data itemcontacts another node, the data item is always forwardedto the contacted node if it does not have the data. Thismethod has the best data delivery ratio and the highestnetwork overhead. Therefore, it is used as an upper boundfor comparison.

• Label [7]: This strategy is the first proposed community-based data forwarding strategy. In this strategy, the dataitem is forwarded to nodes that share at least one commoncommunity with the destination node. CPM (K-clique) isused to detect communities.

• Bubble Rap [6]: This strategy uses both centrality andcommunity. CPM (K-clique) is used to detect commu-nities. The data item is always forwarded to a highercentrality node, until it reaches a node that belongs to thesame community as the destination node. When the dataitem reaches the destination community, it is forwardedto higher-centrality node within the community’s scope,until the destination node is reached.

• AFOCS: This strategy was proposed in [8], and itis used to evaluate the communities detected by the


Fig. 14. Data delivery ratio of various methods based on three traces. (a)Dartmouth. (b)MIT. (c)UCSD.

Fig. 15. Delay of various methods based on three traces. (a)Dartmouth. (b)MIT. (c)UCSD.

Fig. 16. Data forwarding overhead measured by the number of data copies. (a)Dartmouth. (b)MIT. (c)UCSD.

AFOCS method. This method is based on how manycommon communities a node has with the destinationnode. The data item is only forwarded when the contactednode has more common communities with the destina-tion node than the original carrier.

The performance is measured with three metrics, includingthe data delivery ratio, delay, and network overhead. The datadelivery ratio is the proportion of data items successfullydelivered before the data expires. The delay is the averagetime used to deliver each data item. The network overhead isthe average number of data copies existing in the network ateach moment. The results on the three metrics are respectivelyshown in Figure 14, Figure 15, and Figure 16. Generallyspeaking, our TC-based method performs much better thanthe other three community-based methods by achieving alarger delivery ratio with less delay and comparable networkoverhead.

From Figure 14 we can see that, the TC-based strategyconsistently performs better than the other community-baseddata forwarding strategies and even has comparable deliveryratio with Epidemic when the time constraint is small. Thegood performance within small time constraints is due to thefact that we study and predict users’ behavior in TCs withina short time period T and forward data to nodes in TCs thathave good relaying capability to the destination within T .

Figure 15 shows the delay of each strategy. As can be seen,the TC-based strategy consistently achieves low delay com-pared with other community-based strategies, demonstratingthe superiority of the TC-based strategy on data forwarding.

Figure 16 shows the overhead generated by each strategy.Epidemic always has the highest overhead among all the strate-gies. Our TC-based strategy generates overhead comparableto Bubble Rap. Since the TC-based strategy removes datawhen there is no better TC formed or contacted within T ,


Fig. 17. The impact of prediction in the TC-based data forwarding strategy. (a)Dartmouth. (b)MIT. (c)UCSD.

TABLE VIII

EFFECT OF Tw ON DATA DELIVERY RATIO

the network overhead is kept low. Even though Label andAFOCS consume less network overhead than the TC-basedstrategy, the delivery ratio and delay of these two strategiesare much worse than the TC-based strategy. Therefore, theycannot be effectively used for data forwarding in DTNs.

E. Discussions

1) Effect of Prediction on Performance: In our TC-baseddata forwarding strategy, predicting when and whether a TCwill happen in the future is important. With prediction, we cancompute the relaying capability of each TC to the destinationnode and then use the nodes in TCs with good relayingcapability to carry data. To evaluate how the prediction affectsthe performance, we compare our strategy with a TC-basedstrategy without prediction (i.e., only distributing data whena node finds that the contacted node or itself is within thesame TC of the destination node). Figure 17 compares theirdata delivery ratios. We can clearly see the superiority of thestrategy with prediction.

2) Effect of Time Window Tw: In Section III-C.1, weused the information of recently met nodes in the last timewindow Tw to identify the current TC. To find the optimalvalue of Tw in each trace, we conduct an experiment to seehow Tw impacts the data delivery ratio. Here, the data deliveryratio is recorded with a time constraint of 12 hours. The resultsare shown in Table VIII. As can be seen, Tw = 0.5 hour hasthe best data delivery ratio for all traces.

3) Data Forwarding With Distributed CCM: For the currentresults on data forwarding, TCs are detected using cen-tralized CCM. We further evaluate the performance of theTC-based strategy where TCs are detected using distributedCCM (referred to as distributed TC-based strategy). Thestrategy is compared with the strategy where TCs are detectedusing the centralized CCM (referred to as TC-based strategy).The results on the three traces are respectively shown in

Fig. 18. Evaluation of distributed TC-based strategy based on Dartmouthtrace. (a) Delivery Ratio. (b) Overhead.

Fig. 19. Evaluation of distributed TC-based strategy based on MIT trace.(a) Delivery Ratio. (b) Overhead.

Fig. 20. Evaluation of distributed TC-based strategy based on UCSD trace.(a) Delivery Ratio. (b) Overhead.

Figure 18, Figure 19 and Figure 20, where the performance isevaluated in terms of data delivery ratio and network overhead.

As shown in these figures, the distributed TC-based strategyhas similar delivery ratio compared to the TC-based strategy inall three traces. This confirms our observation that the detected


distributed TCs can be used to represent most centralized TCs,including the destination TCs of each data item. By identifyingthe destination TCs, data can be successfully delivered tothe destination. However, we find that the overhead of thedistributed TC-based strategy is higher in most cases. Thisis because there are some unnecessary TCs detected by thedistributed CCM, which create more data copies inside TCs.

IV. RELATED WORK

Complex networks usually consist of communities. Muchresearch has been done on detecting communities by inter-disciplinary researchers from physics, biology, and computerscience. They originally aimed to identify communities ina static network. There are many methods that focus ondetecting disjoint communities, such as the modularity-basedmethods [16], [29] and label propagation [18]. Techniqueshave also been proposed to detect overlapping communities,such as CPM [19], link partitions [30], and AFOCS [8].In addition, Domenico et al. [31] proposed a method todetect community structures in multilayer networks. A surveyof existing community detection methods is given in [14].The aforementioned community detection methods are allcentralized methods, which require that the global networkinformation is known a priori. Considering that global net-work information is hard to obtain in mobile networks suchas DTNs, Hui et al. [26] proposed distributed communitydetection methods based on existing centralized methods.Clauset [32] proposed methods to detect local communitiesby distributedly exploring the graph near individual vertexes.In addition, Bae and Howe [33] achieved community detectionin large-scale networks by proposing distributed detectionalgorithms.

Although community detection has been well studied instatic networks, it remains difficult to detect time-varyingcommunities in a dynamic network, such as a social network,in which people’s behaviors are highly dynamic. Recently,more research has been focused on studying evolving com-munity structures based on network structures at successivenetwork snapshots. Palla et al. [34] developed an algorithmto identify evolving communities by first detecting over-lapping community structure at each network snapshot andthen mapping similar communities at different snapshots.Nguyen et al. [8] proposed AFOCS, which can adaptivelyupdate the community structure at each network snapshotbased on the history without detecting communities againat each snapshot. Similar methodologies are also utilizedin [35]–[37] to detect evolving communities based on thenetworks at different snapshots. These methods are intendedto analyze the long-term evolution of community structurescaused by permanent change of humans’ behaviors or habits.There is still a dearth of work examining transient communi-ties caused by the periodic changes of human behavior.

Although Gao et al. proposed the concept of transientcommunity in [38], they only investigated the community rela-tionship between individual pairs of nodes without providingcomplete knowledge about the transient community structure.Peixoto and Rosvall [39] studied dynamic community struc-ture in temporal networks based on a Markov chain model.

However, their focus was not DTN and they did not provideany approach to detect the reappearance of communitieswhich is critical to data forwarding in DTNs. Pietiläanenand Diot [40] proposed algorithms to detect temporal com-munities that are similar to transient communities. The basicidea is to extract static community structure from each networksnapshot. However, the time interval of the network snapshot isdifficult to determine; thus it is challenging to accurately detecttransient communities. Instead of detecting communities fromnetwork snapshots, our work directly studies nodes’ pairwisecontact processes, and thus can accurately detect transientcommunities.

Community structure has been extensively utilized toaddress the problem of data forwarding in DTNs. It is believedthat nodes within the same community have a higher chance ofcontacting each other. Hui et al. [6] and Hui and Crowcroft [7]have proposed strategies to forward data to the nodesthat are within the destination’s community. Li et al. [41]and Wu et al. [42] defined the concepts of space-crossingcommunities and home communities, respectively, to bettercharacterize human mobility and facilitate data forwardingin DTNs. However, these research efforts do not considerdynamic or temporal community structures. By introducing adynamic community structure, AFOCS [8] further consideredcommunity evolution in community-based data forwarding.Transient communities in [38] were used to determine thescope for evaluating node centrality. Reference [40] studiedhow temporal communities contribute to data disseminationand concluded that temporal communities tend to limit datadissemination in DTNs. To the extent of our knowledge, ourpaper is the first work to fully utilize transient communitiesfor data forwarding.

V. CONCLUSIONS

In this paper, we proposed a contact-burst-based clusteringmethod (CCM) to detect TCs by exploiting the pairwise con-tact processes. We formulated each pairwise contact processas the regular appearances of contact bursts, during whichmost contacts between the pair of nodes appear. Based on thisformulation, we detected transient communities by clusteringthe pairs of nodes with similar contact bursts. Trace-drivensimulations showed that CCM can detect TCs more effec-tively compared with existing community detection methods.We also proposed a distributed CCM method to make theTC detection feasible in individual nodes, and demonstratedthat this method can effectively detect TCs. Finally, TCs areapplied to data forwarding in DTNs, where data are forwardedto TCs that have better relaying capability to the destinationnode. Trace-drive simulations showed that our strategy outper-forms traditional community-based data forwarding strategies.

REFERENCES

[1] K. Fall, “A delay-tolerant network architecture for challenged internets,”in Proc. ACM SIGCOMM, 2003, pp. 27–34.

[2] E. M. Daly and M. Haahr, “Social network analysis for routing indisconnected delay-tolerant manets,” in Proc. ACM MobiHoc, 2007,pp. 32–40.

[3] X. Zhang and G. Cao, “Efficient data forwarding in mobile social net-works with diverse connectivity characteristics,” in Proc. IEEE ICDCS,Jul. 2014, pp. 31–40.


[4] W. Gao and G. Cao, “User-centric data dissemination in disrup-tion tolerant networks,” in Proc. IEEE INFOCOM, Apr. 2011,pp. 3119–3127.

[5] X. Sun, Z. Lu, X. Zhang, M. Salathé, and G. Cao, “Targeted vaccinationbased on a wireless sensor system,” in Proc. IEEE PerCom, Mar. 2015,pp. 215–220.

[6] P. Hui, J. Crowcroft, and E. Yoneki, “BUBBLE rap: Social-basedforwarding in delay-tolerant networks,” IEEE Trans. Mobile Comput.,vol. 10, no. 11, pp. 1576–1589, Nov. 2011.

[7] P. Hui and J. Crowcroft, “How small labels create big improvements,”in Proc. IEEE PerCom Workshops, Mar. 2007, pp. 65–70.

[8] N. P. Nguyen, T. N. Dinh, S. Tokala, and M. T. Thai, “Overlapping com-munities in dynamic networks: Their detection and mobile applications,”in Proc. ACM MobiCom, 2011, pp. 85–96.

[9] W. Gao, Q. Li, B. Zhao, and G. Cao, “Social-aware multicast indisruption-tolerant networks,” IEEE/ACM Trans. Netw., vol. 20, no. 5,pp. 1553–1566, Oct. 2012.

[10] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather:Homophily in social networks,” Annu. Rev. Sociol., vol. 27, pp. 415–444,Aug. 2001.

[11] B. Han and A. Srinivasan, “Your friends have more friends than youdo: Identifying influential mobile users through random walks,” in Proc.ACM MobiHoc, 2012, pp. 5–14.

[12] Y. Zhang et al., “Social-aware data diffusion in delay tolerant MANETs,”in Handbook of Optimization in Complex Networks. New York, NY,USA: Springer, 2012, pp. 457–481.

[13] Z. Zhu, G. Cao, S. Zhu, S. Ranjan, and A. Nucci, “A social networkbased patching scheme for worm containment in cellular networks,” inProc. IEEE INFOCOM, Apr. 2009, pp. 1476–1484.

[14] S. Fortunato and C. Castellano. (Dec. 2007). “Community structure ingraphs.” [Online]. Available: https://arxiv.org/abs/0712.2716

[15] S. Asur, S. Parthasarathy, and D. Ucar, “An event-based framework forcharacterizing the evolutionary behavior of interaction graphs,” in Proc.ACM SIGKDD, Nov. 2009, p. 16.

[16] M. E. J. Newman, “Analysis of weighted networks,” Phys. Rev. E, Stat.Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 70, no. 5, p. 056131,Nov. 2004.

[17] Z. Lu, X. Sun, Y. Wen, G. Cao, and T. L. Porta, “Algorithms andapplications for community detection in weighted networks,” IEEETrans. Parallel Distrib. Syst., vol. 26, no. 11, pp. 2916–2926, Nov. 2014.

[18] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algo-rithm to detect community structures in large-scale networks,” Phys.Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 76, no. 3,p. 036106, Sep. 2007.

[19] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlap-ping community structure of complex networks in nature and society,”Nature, vol. 435, no. 7043, pp. 814–818, Apr. 2005.

[20] T. Henderson, D. Kotz, and I. Abyzov, “The changing usage of a maturecampus-wide wireless network,” in Proc. ACM MobiCom, Oct. 2008,pp. 2690–2712.

[21] N. Eagle and A. Pentland, “Reality mining: Sensing complex socialsystems,” Pers. Ubiquitous Comput., vol. 10, no. 4, pp. 255–268, 2006.

[22] M. McNett and G. Voelker, “Access and mobility of wireless PDAusers,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 9, no. 2,pp. 40–55, 2005.

[23] J. A. Hartigan and M. A. Wong, “Algorithm as 136: A k-meansclustering algorithm,” J. Roy. Statist. Soc. C (Appl. Statist.), vol. 28,no. 1, pp. 100–108, 1979.

[24] I. S. Dhillon, Y. Guan, and B. Kulis, “Kernel k-means: Spectralclustering and normalized cuts,” in Proc. ACM SIGKDD, Aug. 2004,pp. 551–556.

[25] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of StatisticalLearning (Springer Series in Statistics), vol. 1. Berlin, Germany:Springer, 2001.

[26] P. Hui, E. Yoneki, S. Y. Chan, and J. Crowcroft, “Distributed communitydetection in delay tolerant networks,” in Proc. ACM/IEEE Int. WorkshopMobility Evol. Internet Archit., Aug. 2007, p. 7.

[27] T. W. Anderson and D. A. Darling, “Asymptotic theory of certain‘goodness of fit’ criteria based on stochastic processes,” Ann. Math.Statist., vol. 23, no. 2, pp. 193–212, 1952.

[28] A. Vahdat et al., “Epidemic routing for partially connected ad hocnetworks,” Dept. Comput. Sci., Duke Univ., Durham, NC, USA,Tech. Rep. CS-200006, 2000.

[29] M. E. J. Newman and M. Girvan, “Finding and evaluating communitystructure in networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat.Interdiscip. Top., vol. 69, p. 026113, Feb. 2004.

[30] T. S. Evans and R. Lambiotte, “Line graphs, link partitions, andoverlapping communities,” Phys. Rev. E, Stat. Phys. Plasmas FluidsRelat. Interdiscip. Top., vol. 80, no. 1, p. 016105, Jul. 2009.

[31] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, “Identi-fying modular flows on multilayer networks reveals highly overlappingorganization in interconnected systems,” Phys. Rev. X, vol. 5, no. 1,p. 011027, Mar. 2015.

[32] A. Clauset, “Finding local community structure in networks,” Phys.Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 72, no. 2,p. 026132, Aug. 2005.

[33] S.-H. Bae and B. Howe, “GossipMap: A distributed community detec-tion algorithm for billion-edge directed graphs,” in Proc. IEEE/ACM SC,Nov. 2015, p. 27.

[34] G. Palla, A.-L. Barabási, and T. Vicsek, “Quantifying social groupevolution,” Nature, vol. 446, no. 7136, pp. 664–667, 2007.

[35] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng, “Facetnet:A framework for analyzing communities and their evolutions in dynamicnetworks,” in Proc. ACM WWW, 2008, pp. 685–694.

[36] D. Duan, Y. Li, Y. Jin, and Z. Lu, “Community mining on dynamicweighted directed graphs,” in Proc. ACM Int. Workshop Complex Netw.Meet Inf. Knowl. Manage., 2009, pp. 11–18.

[37] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin, “A Bayesian approachtoward finding communities and their evolutions in dynamic socialnetworks,” in Proc. SDM SIAM, 2009, pp. 990–1001.

[38] W. Gao, G. Cao, T. L. Porta, and J. Han, “On exploiting transient socialcontact patterns for data forwarding in delay-tolerant networks,” IEEETrans. Mobile Comput., vol. 12, no. 1, pp. 151–165, Jan. 2013.

[39] T. P. Peixoto and M. Rosvall, “Modeling sequences and temporalnetworks with dynamic community structures,” CoRR, Sep. 2015.

[40] A. K. Pietilänen and C. Diot, “Dissemination in opportunistic socialnetworks: The role of temporal communities,” in Proc. ACM MobiHoc,2012, pp. 165–174.

[41] Z. Li, C. Wang, S. Yang, C. Jiang, and I. Stojmenovic, “Space-crossing:Community-based data forwarding in mobile social networks under thehybrid communication architecture,” IEEE Trans. Wireless Commun.,vol. 14, no. 9, pp. 4720–4727, Sep. 2015.

[42] J. Wu, M. Xiao, and L. Huang, “Homing spread: Community home-based multi-copy routing in mobile social networks,” in Proc. IEEEINFOCOM, Apr. 2013, pp. 2319–2327.

Xiaomei Zhang received the B.E. degree from theUniversity of Science and Technology of Chinain 2010, and the Ph.D. degree in computer scienceand engineering from The Pennsylvania State Uni-versity in 2016. Since then, she has been withthe Department of Mathematics and ComputationalScience, University of South Carolina Beaufort,where she is currently an Assistant Professor. Herresearch interests include mobile computing, mobilesocial networks, wireless communications, and datascience.

Guohong Cao (S’98–A’99–M’03–SM’06–F’11)received the Ph.D. degree in computer science fromOhio State University in 1999. Since then, he hasbeen with the Department of Computer Scienceand Engineering, The Pennsylvania State University,where he is currently a Professor. He has authoredor coauthored over 200 papers in his research areaswhich have been cited over 17,000 times, with an h-index of 66. His research interests include wirelessnetworks, mobile systems, Internet of Things, andwireless security and privacy. He has served on the

Editorial Board of the IEEE TRANSACTIONS ON MOBILE COMPUTING, theIEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, and the IEEETRANSACTIONS ON VEHICULAR TECHNOLOGY, and has served on theorganizing and technical program committees of many conferences, includingthe TPC Chair/Co-Chair of the IEEE SRDS 2009, the MASS 2010, and theINFOCOM 2013. He was a recipient of the NSF CAREER Award in 2001.

ieee/acm transactions on networking, vol. 25, no. 5...

Documents