ssgq

9
On Socio-Spatial Group Query for Location-Based Social Networks De-Nian Yang Chih-Ya Shen Wang-Chien Lee Ming-Syan Chen Academia Sinica National Taiwan Univ. The Penn State Univ. National Taiwan Univ. Taipei, Taiwan Taipei, Taiwan State College, PA, USA Taipei, Taiwan [email protected] [email protected] [email protected] [email protected] ABSTRACT Challenges faced in organizing impromptu activities are the requirements of making timely invitations in accordance with the locations of candidate attendees and the social relation- ship among them. It is desirable to find a group of attendees close to a rally point and ensure that the selected attendees have a good social relationship to create a good atmosphere in the activity. Therefore, this paper proposes Socio-Spatial Group Query (SSGQ) to select a group of nearby attendees with tight social relation. Efficient processing of SSGQ is very challenging due to the tradeoff in the spatial and so- cial domains. We show that the problem is NP-hard via a proof and design an efficient algorithm SSGSelect, which includes effective pruning techniques to reduce the running time for finding the optimal solution. We also propose a new index structure, Social R-Tree to further improve the efficiency. User study and experimental results demonstrate that SSGSelect significantly outperforms manual coordina- tion in both solution quality and efficiency. Categories and Subject Descriptors H.2.4 [Systems]: Query processing General Terms Algorithms Keywords Query Processing, Spatial Indexing, Social Networks 1. INTRODUCTION With the emergence of GPS-enabled mobile devices and Web 2.0 technology, mobile users can easily capture and upload their own locations to various location-based social networking (LBSN) applications, such as Loopt, Geomium, Buddy Beacon, FindMe, Facebook Place, BrightKite, Meetup, and Google Map. All these LBSN applications allow users to share their location information with friends and thus, as we envisage, may potentially enable instant organization Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD’12, August 12–16, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6 /12/08 ...$15.00. of impromptu activities, e.g., having a dinner with friends nearby. However, with today’s technology, the selection of invitees for these events still requires significant manual co- ordination. In this work, we aim to develop new techniques for an impromptu activity planning service to alleviate this effort. Due to the inherited nature as an LBSN application, such an impromptu activity planning service needs to consider the following spatial and social factors: i) the selected friends need to be close enough to a rally point in order to minimize the waiting time for an activity; and ii) the selected friends need to have a good social relationship to create a good at- mosphere in the activity. In other words, challenges faced in organizing impromptu activities lie in meeting the require- ments of making timely invitations in accordance with the locations of candidate invitees and the social relationship among them. Notice that friends located nearby may not have tight social relationships with each other, while close friends may not happen to located near an intended rally point. Moreover, when the number of intended invitees in- creases, the invitation process becomes more complicated and thereby too tedious to coordinate manually. Therefore, efficient search algorithms that automatically suggest the most suitable attendees for an activity is a mandate for the impromptu activity planning service. As the first step towards the aforementioned impromptu activity planning service, we propose a new query, namely Socio-Spatial Group Query (SSGQ), aiming to find a set of most suitable candidates for a planned activity by taking into account certain spatial and social constraints. In this work, we assume that the service provider has access to the social connectivity of its users and their locations. 1 Thus, given a social graph consisting of socially connected users and their (published) locations, an SSGQ, specified a rally point q, the number of activity invitees p, and the average number of unfamiliar people an invitee may have k, returns a set of p invitees from the social graph of the activity initiator such that the average spatial distance for each invitee to the rally point is minimized. It is worth noting that the query incorporates a social constraint (denoted by k), to govern the unfamiliarity between invitees. Each attendee on average can have no social relationship with at most k other invitees. With a proper setting of k, the tightness of social relationships among invitees (and hopefully activity atmosphere) are ensured. As such, the SSGQ can facilitate 1 The privacy issues on the locations and social connectivity of users are very important but out of scope of this study. A simple way is to allow each user to specify a friend list and time intervals, such that only the query issued from the friends in the list during those intervals can access the location information and social connectivity of the user. 949

Upload: nitish-upreti

Post on 11-Sep-2015

212 views

Category:

Documents


0 download

DESCRIPTION

Socio Spatial Group Query

TRANSCRIPT

  • On Socio-Spatial Group Query for Location-Based SocialNetworks

    De-Nian Yang Chih-Ya Shen Wang-Chien Lee Ming-Syan ChenAcademia Sinica National Taiwan Univ. The Penn State Univ. National Taiwan Univ.Taipei, Taiwan Taipei, Taiwan State College, PA, USA Taipei, Taiwan

    [email protected] [email protected] [email protected] [email protected]

    ABSTRACTChallenges faced in organizing impromptu activities are therequirements of making timely invitations in accordance withthe locations of candidate attendees and the social relation-ship among them. It is desirable to nd a group of attendeesclose to a rally point and ensure that the selected attendeeshave a good social relationship to create a good atmospherein the activity. Therefore, this paper proposes Socio-SpatialGroup Query (SSGQ) to select a group of nearby attendeeswith tight social relation. Ecient processing of SSGQ isvery challenging due to the tradeo in the spatial and so-cial domains. We show that the problem is NP-hard viaa proof and design an ecient algorithm SSGSelect, whichincludes eective pruning techniques to reduce the runningtime for nding the optimal solution. We also propose anew index structure, Social R-Tree to further improve theeciency. User study and experimental results demonstratethat SSGSelect signicantly outperforms manual coordina-tion in both solution quality and eciency.

    Categories and Subject DescriptorsH.2.4 [Systems]: Query processing

    General TermsAlgorithms

    KeywordsQuery Processing, Spatial Indexing, Social Networks

    1. INTRODUCTIONWith the emergence of GPS-enabled mobile devices and

    Web 2.0 technology, mobile users can easily capture andupload their own locations to various location-based socialnetworking (LBSN) applications, such as Loopt, Geomium,Buddy Beacon, FindMe, Facebook Place, BrightKite, Meetup,and Google Map. All these LBSN applications allow usersto share their location information with friends and thus,as we envisage, may potentially enable instant organization

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.KDD12, August 1216, 2012, Beijing, China.Copyright 2012 ACM 978-1-4503-1462-6 /12/08 ...$15.00.

    of impromptu activities, e.g., having a dinner with friendsnearby. However, with todays technology, the selection ofinvitees for these events still requires signicant manual co-ordination. In this work, we aim to develop new techniquesfor an impromptu activity planning service to alleviate thiseort.Due to the inherited nature as an LBSN application, such

    an impromptu activity planning service needs to consider thefollowing spatial and social factors: i) the selected friendsneed to be close enough to a rally point in order to minimizethe waiting time for an activity; and ii) the selected friendsneed to have a good social relationship to create a good at-mosphere in the activity. In other words, challenges faced inorganizing impromptu activities lie in meeting the require-ments of making timely invitations in accordance with thelocations of candidate invitees and the social relationshipamong them. Notice that friends located nearby may nothave tight social relationships with each other, while closefriends may not happen to located near an intended rallypoint. Moreover, when the number of intended invitees in-creases, the invitation process becomes more complicatedand thereby too tedious to coordinate manually. Therefore,ecient search algorithms that automatically suggest themost suitable attendees for an activity is a mandate for theimpromptu activity planning service.As the rst step towards the aforementioned impromptu

    activity planning service, we propose a new query, namelySocio-Spatial Group Query (SSGQ), aiming to nd a set ofmost suitable candidates for a planned activity by takinginto account certain spatial and social constraints. In thiswork, we assume that the service provider has access to thesocial connectivity of its users and their locations.1 Thus,given a social graph consisting of socially connected usersand their (published) locations, an SSGQ, specied a rallypoint q, the number of activity invitees p, and the averagenumber of unfamiliar people an invitee may have k, returns aset of p invitees from the social graph of the activity initiatorsuch that the average spatial distance for each invitee tothe rally point is minimized. It is worth noting that thequery incorporates a social constraint (denoted by k), togovern the unfamiliarity between invitees. Each attendeeon average can have no social relationship with at most kother invitees. With a proper setting of k, the tightness ofsocial relationships among invitees (and hopefully activityatmosphere) are ensured. As such, the SSGQ can facilitate

    1The privacy issues on the locations and social connectivityof users are very important but out of scope of this study.A simple way is to allow each user to specify a friend listand time intervals, such that only the query issued fromthe friends in the list during those intervals can access thelocation information and social connectivity of the user.

    949

  • users to organize dierent kinds of activities based on boththe specied spatial and social conditions.Note that SSGQ can be generalized for supporting other

    spatial/social relevant planning services that do not requirea specic initiator. For example, when an user queries anearby restaurant using a mobile device, an SSGQ may beissued to automatically recommend a group of socially closefriends near the restaurant. Moreover, location-based ad-vertisements can leverage SSGQ to a group of friends for apreferred restaurant in order to push mobile coupon, whichis benecial to both customers and restaurants.2 Finally, theSSGQ is particularly useful for coordinating search and res-cue operations in some scenarios of emergency and disasters,such as earthquakes and tsunamis. As social relationship,e.g., previous co-working experiences of the searchers, andspatial factor, e.g., distance for searchers to reach the dis-aster area, are important for success of rescue operations,the SSGQ can be employed to obtain the most suitablesearchers.The problem of processing SSGQ eciently is challenging

    due to the tradeo in the spatial distance and the social rela-tionships. A simple strategy for processing SSGQ is to enu-merate all possible groups of p invitees, eliminate those thatdo not satisfy the social constraints, and then return thegroup with minimal total spatial distance. This approachneeds to evaluate Cfp candidate groups, where f is the totalnumber of candidate attendees, in the solution space andthus is not desirable. Note that, as we will show later, theproblem of processing SSGQ is NP-hard. Nevertheless, dueto the importance of spatial and social factors considered inSSGQ, we exploit the spatial and social constraints speciedin SSGQ while forming the candidate groups to alleviate theprocessing cost. Our idea is to expand the solution space ina systematical way such that we can eciently nd the so-lution without examining all candidate groups in details. Inother words, we incrementally select the invitees by givingpriority to those (a) who are located close to the rally pointand (b) who are close friends. Finding a group of friendswho best meet both conditions of (a) and (b) is not trivial.An algorithm that prioritizes on (a) is inclined to generatecandidate groups with small total spatial distance quicklybut it does not always come up with a solution that satisesthe familiarity constraint, especially those activities speci-ed with a small k. On the other hand, an algorithm thatprioritizes on (b) may quickly nd a group of friends whoknow each other very well but do not easily satisfy the min-imum total spatial distance. In other words, the challengein nding optimal SSGQ answer comes from the dilemmaof (1) reducing the total spatial distance3 and (2) ensuringthat the solution satisfying the familiarity constraint.Specically, in this paper, we rst prove that SSGQ is an

    NP-hard problem but, fortunately, while the problem is verychallenging, it is still tractable since the size of p is relativelysmall in most impromptu activities. Therefore, to ecientlyprocess this query, we design a new algorithm, called SSGS-elect, to nd the optimal solution. We design various strate-gies, Distance Ordering, Socio-Spatial Ordering, Distance

    2Notice that location-based advertisement and socialnetwork-based advertisements have been exploited by web-sites, such as Google Map and Livingsocial.com. However,the two aspects have not been considered in an integratedfashion.3Reducing the maximum spatial distance of an attendee isan alternative. This paper proposes to minimize the totalspatial distance because many social activities can begin orwarm up when most attendees arrives, and a few distantattendees are allowed to arrive late.

    Pruning, and Familiarity Pruning, to eectively reduce theprocessing time. During the selection of each attendee, weaddress both the spatial distance and social connectivity ofthe attendee, together with the characteristics of the socialnetwork for other candidate attendees that have not beenconsidered, in order to prune unnecessary search space andnd the optimal solution eciently.The above strategies incrementally consider and examine

    each candidate attendee. A potential approach to furtherimprove the eciency is to select multiple candidate atten-dees jointly, because the optimal solution can be constructedwith fewer iterations. When an attendee close to the rallypoint is chosen, nearby candidate attendees are potential tobe good candidates for joint selection due to their smallerspatial distances to the rally point. Nearby candidate atten-dees can be eciently identied by leveraging the spatial in-dex structure, such as R-Tree, since they are usually locatedin the same Minimum Bounding Rectangle (MBR) of R-Tree. Nevertheless, choosing all candidate attendees in thesame MBR may not be able to generate a solution followingthe familiarity constraint with k since the social connectiv-ity among them is not carefully examined. To eectivelytackle this issue, we propose Social R-Tree (SR-Tree), whichincorporates and summarizes the social information for thecandidate attendees in an MBR. In SR-Tree, all candidateattendees in each MBR are indexed into multiple social clus-ters with dierent social densities to support SSGQ withdierent k.With the proposed SR-Tree, SSGSelect is enhanced by a

    new strategy, Joint Insertion, which chooses multiple can-didate attendees simultaneously to reduce the number ofrequired iterations for nding the optimal solution. We im-plement SSGSelect in Facebook and conduct user study on206 people. The contributions of this paper are summarizedas follows.

    We formulate a useful query for social networking ap-plication, namely, SSGQ, to obtain the optimal set ofattendees. SSGQ can be used to plan for various typesof activities by specifying the familiarity constraint k.We prove that the problem is NP-hard.

    To eciently process SSGQ, we propose SSGSelectwith various strategies, including Distance Ordering,Socio-Spatial Ordering, Distance Pruning, and Famil-iarity Pruning, for nding the optimal solution. Inaddition, we design a new index structure, named SR-Tree, together with Joint Insertion, to simultaneouslyselect multiple nodes at each iteration for nding theoptimal solution in smaller time.

    We implement SSGSelect in Facebook and conduct anuser study with 206 people. The results demonstratethat SSGSelect signicantly outperforms manual coor-dination in both solution quality and eciency.

    The rest of this paper is summarized as follows. Section2 introduces the related works. Section 3 formulates SSGQand prove that it is NP-hard. We design algorithm SSGSe-lect in Section 4 and enhance it with the proposed SR-Treein Section 5. Results of the user study and experiments areshown in Section 6, and we conclude this paper in Section7.

    2. RELATED WORKSome LBSN applications, e.g., Meetup, have been devel-

    oped for activity coordination. However, an activity ini-tiator still needs to specify candidate attendees of the ac-tivity. This is undesirable because social familiarity is not

    950

  • ensured, while the waiting time is not minimized, either.Manual assignment of candidate attendees is tedious andtime-consuming for the initiator, especially for a large ac-tivity. Thus, SSGQ is very useful for such sites which rec-ommends suitable attendees with ecient query processing.In recent works, several researches on nding groups of so-

    cially connected members, e.g., team formation [2][3], com-munity search [4], and Social-Temporal Group Query [7],have been reported in the literature. Nevertheless, their re-search context and objectives are totally dierent from ourstudy that explores both the spatial and social dimensionsin nding a group for impromptu activity planning. Specif-ically, team formation [2][3] nds a group of experts withthe required skills, while aiming to minimize the communi-cations cost between these experts. Community search [4]nds a compact community that contains particular mem-bers, aiming to minimize the total degree in the community.Social-Temporal Group Query [7] checks the available timesof attendees to nd the group with the most suitable activ-ity time. To the best knowledge of the authors, researcheson nding groups that consider constraints in both the spa-tial and social dimensions have not appeared. In contrast,our work examines the inter-play in both social and spatialdimensions, with an objective to nd a group of mutuallyfamiliar attendees such that the total spatial distance to arally point is minimized. As mentioned, we envisage thatour research result can be employed in various LBSN appli-cations for group recommendation.Relevant to our work, spatial queries for selecting a set

    of spatial points, aiming to minimize the total spatial dis-tance, have been proposed for various scenarios [5, 6, 8, 9].However, in these works, the (social) connectivity among thespatial points is not considered. Specically, given two setsof points P and Q, together with the number of points to beselected k, Group Nearest Neighbor Query [5] nds a set of kpoints in P such that the total spatial distance of the pointsto all points in Q is minimized. On the other hand, for aline segment and a set of points, Continuous Nearest Neigh-bor Search [6] returns the nearest neighbor of each pointon the line segment, while the Continuous Visible NearestNeighbor Queries [8] extends Continuous Nearest NeighborSearch [6] by incorporating the obstacles in the problem de-sign, which may aect the visibility or distance between twopoints and lead to dierent results. Meanwhile, ContinuousObstructed Nearest Neighbor Query [9] retrieves the near-est neighbor with regard to the obstructed distance, i.e., theshortest path without crossing any obstacle. Therefore, theabove-mentioned queries focus only on the spatial dimen-sion and thereby are not applicable to our scenario of LBSNapplications.

    3. PROBLEM FORMULATIONGiven a social graph G = (V,E), where each vertex v

    V is a candidate attendee with a location lv, and any twomutually acquainted vertices u and v are connected by anedge eu,v. A Socio-Spatial Group Query SSGQ(p, q, k) ndsa set F of p vertices from G where the total spatial distancefrom the vertices in F to the rally point q, i.e.,

    vF dv,q,

    is minimized and each vertex on average can share no edgewith at most k other vertices in F .Notice that the three parameters in an SSGQ, i.e., p, q,

    and k, determine the size of the answer group, spatial con-straint, and social constraint of the query, respectively, andthus have signicant implications regarding the processingstrategy. First, as the size of group p increases, the solutionspace (which consists of all candidate groups) rapidly grows.

    Indeed, we prove that processing SSGQ is an NP-hard prob-lem but, fortunately, while the problem is very challenging,it is still tractable since the size of p is relatively small inmost cases. Second, the position of q hints that candidateinvitees located close to q should be considered with prior-ity as the search criteria aims to minimize the total spatialdistance from members of the group to q. Finally, k dictatesthe tightness of social relationship among members of thegroup. A small k in SSGQ results in a group of mutuallyfamiliar invitees, while a larger k allows more unfamiliaritywithin the returned group of invitees. The spatial and socialconstraints can be employed for early pruning of unqualiedcandidate groups.In this paper, we aim to develop ecient query process-

    ing algorithms for SSGQ with a new index structure, SocialR-Tree. Moreover, we also devise several access orderingand pruning strategies to eectively reduce the computationtime. In the next section, we design an ecient query pro-cessing algorithm by pruning unqualied candidate groups.In the following, we rst prove the hardness result.

    Theorem 1. SSGQ is NP-hard.

    Proof. We prove that SSGQ is NP-hard with the re-duction from p-clique. Decision problem p-clique is given agraph Gc to nd whether the graph contains a clique, i.e.,a complete graph with an edge connecting every two ver-tices, with p vertices. In SSGQ, we let G = Gc, k = 0, anddv,q = 1 for every vertex v V . We rst prove the neces-sary condition. If Gc contains a p-clique, there must exist agroup with the same vertices in the p-clique such that everyperson has social relations with all the other attendees of thegroup, and the total spatial distance is p. We then prove thesucient condition. If G in SSGQ contains a group with thesize as p and k as 0, Gc in problem p-clique must contain asolution with size p, too. The theorem follows.

    4. ALGORITHM DESIGN FOR SSGQIn this section, we propose Algorithm SSGSelect to nd

    the optimal solution for SSGQ. To guide an ecient explo-ration of the search space, we design eective strategies toselection new attendees and prune unnecessary search space.Later in Section 5, equipped with SR-Tree, SSGSelect is en-hanced by Joint Insert to select multiple attendees at eachiteration. Specically, let SI and SR denote the intermediatesolution set and the remaining set of candidate attendees,respectively. Initially, SI is empty, and SR is V . At eachiteration afterward, we select a vertex from SR to SI . WhenSI includes p vertices and follows the familiarity constraint,SI is regarded as a feasible solution. Afterward, to improvethe feasible solution, our algorithm backtracks to the pre-vious iteration and previous SI for choosing another vertexfrom SR to SI , while a branch-and-bound tree is maintainedto keep track of the exploration process for backtracking.The selection of the vertex from SR into SI at each itera-

    tion is critical. It is essential to avoid choosing a vertex thatwill lead to a large increment of the spatial distance. Withthis important factor in mind, Distance Ordering selects avertex with a small spatial distance to the rally point q. Tofurther consider the social connectivity, Socio-Spatial Or-dering extends Distance Ordering by jointly considering thespatial and social domains. On the other hand, to trim un-necessary search space, Distance Pruning eectively avoidsunnecessary exploration of SR that will not lead to a bettersolution, by examining the sum of the current total spatialdistance from SI to q and the lower bound of the distanceincrement in selecting any vertex from SR. In addition to

    951

  • M1

    q

    M2

    MINDIST(M2,q) = 3

    MINDIST(M1,q) = 15

    M0

    15

    22

    7

    48

    19

    (a)

    a

    a,b

    a,b,c a,b,d

    a

    b

    c

    d

    a,cc

    states in the branch and bound treemove a vertex from SR to SIbacktracking

    a,b,e

    e

    a,b,f

    f

    (b)

    Figure 1: Example of Distance Ordering andBranch-and-Bound Search.

    the above spatial domain, Familiarity Pruning examines so-cial connectiviy of the vertices in SR and stops choosing anyvertex from SR if eventually it is not able to generate anyfeasible solution following the familiarity constraint.Notice that SSGSelect is able to nd the optimal solution

    because Distance Pruning only removes unqualied solutions(the solutions with larger total spatial distances to q), andFamiliarity Pruning only trims infeasible solutions (the so-lutions that do not follow the familiarity constraints). Theoptimal solution does not belong the above two categories.Moreover, with Socio-Spatial Ordering, SSGSelect exploitsboth the spatial and social domain information to guide anecient search for the optimal solution.

    4.1 Distance OrderingAt each iteration, Distance Ordering selects the vertex

    with the minimum spatial distance to q, to minimize theincrement of the total spatial distance for SI . Naturally, astraightforward approach is to sort all vertices in G beforethe algorithm starts. However, this approach is not scalablefor a large social network and is very computation intensivebecause q is allowed be dierent for every SSGQ. In con-trast, a more ecient approach is leveraging a spatial indexstructure, such as R-Tree, and its distance browsing strat-egy [10] with a priority queue for vertex selection. Eachelement in the priority queue is either a vertex or a Mini-mum Bounding Rectangle (MBR). The distance of a vertexis its spatial distance to q, while the distance of an MBR Mis the MINDIST (M,q), which denotes the minimum dis-tance between q and the border of M [1]. Finding the vertexwith the minimum distance can be achieved by iterativelyextracting the rst element of the priority queue. Initially,the priority queue contains only the root MBR of R-Tree.When the extracted element is an MBR, we traverse the R-Tree and insert all its child MBRs into the priority queue.Therefore, the rst extracted vertex from the queue is theone with the minimum distance to q.For example, Figure 1(a) shows six vertices in an R-Tree,

    where q denotes the rally point. Assume that MINDISTto q for MBRs M0, M1 and M2 are 0, 15 and 3, respectively,and the actual distances of the vertices to q are marked be-sides each vertex. Initially, the priority queue contains only{M0}. Afterward, we extractM0 and insert M1 and M2 intopriority queue. Now the rst element of the priority queueis M2 because MINDIST (M2, q) < MINDIST (M1, q).Then, M2 is popped from the priority queue with its chil-dren vertices a, b and c inserted. Now the elements in thepriority queue are {a, b, c,M1}. Therefore, Distance Order-ing rst selects and adds a to SI because its spatial distanceto q is the smallest one.Figure 1(b) illustrates a part of the branch-and-bound tree

    with p = 3, where Distance Ordering iteratively moves thevertex with the minimum distance to q from in SR to SI .Notice that SR is {c, d, e, f} during the rst visit of the statewith SI = {a, b}, and c is then selected and added to SIto create a leaf node with SI = {a, b, c} with p = 3 inFigure 1(b). Since the number of vertices in SI is sucientfor p, Algorithm SSGSelect backtracks from SI = {a, b, c}to SI = {a, b}, and SR now becomes {d, e, f} since vertexc was just considered. Similarly, when backtracked fromSI = {a, b, d} to SI = {a, b}, SR becomes {e, f} since bothc and d has been explored.

    4.2 Socio-Spatial OrderingWith R-Tree, we demonstrated that Distance Ordering

    can eciently select the vertex for SI to minimize the in-crement of the total spatial distance to q, i.e., the objectivevalue, at each iteration. Nevertheless, it is also desirable thatthe selected vertex has good social connectivity to other ver-tices in SI in order to follow the familiarity constraint. Toincorporate this key factor, Socio-Spatial Ordering enhancesDistance Ordering by considering both the spatial distanceand social connectivity during the selection of a vertex fromSR to SI . Intuitively, when a vertex is extracted from thepriority queue, we select the vertex and add it to SI onlywhen the vertex satises Intra-Familiarity Condition. Thiscondition ensures that SI together with the selected vertexleads to a mutually familiar group of attendees specied byk. If the vertex does not follow the condition, we extract andexamine the next vertex from the priority queue. Therefore,both spatial and social factors are captured in Socio-SpatialOrdering.To ensure that each selected vertex v has good social con-

    nectivity to the vertices in SI , a simple approach is to re-strict that v can be selected only when the number of edgesbetween v and the vertices in SI exceeds a given threshold.With a larger threshold, a candidate attendee that is famil-iar with more attendees in SI is inclined to be chosen, suchthat it is easier for any solution growing from SI to satisfythe familiarity constraint. Nevertheless, the connectivity ofSI is not examined during the selection of a vertex in theabove approach. Intuitively, if SI is already a mutually fa-miliar group, it is expected that choosing a vertex v withfewer edges connecting SI is more proper if it can minimizethe total spatial distance. In addition, whether the connec-tivity of SI is sucient depends on k, which can be dierentin every SSGQ. Therefore, it is desirable to incorporate theconnectivity of the vertices in SI and parameter k during theselection of a vertex from SR to SI at each iteration. Withthe above observations in mind, we propose the followingcondition for Socio-Spatial Ordering.Intra-Familiarity Condition (IFC). When a vertex v

    is extracted from the priority queue, we select the vertexand add it from SR to SI only when the vertex satisesIntra-Familiarity Condition. To eectively consider the so-cial domain, this condition rst assumes that v is added toSI and then examines whether the social connectivity of thenew group SI {v} is sucient according to the criterion k.Specically, let F (SI {v}) denote the intra-familiarity ofthe group,

    F (SI {v}) = 1|SI {v}|

    vSI{v}|Nv |,

    where Nv is the set of neighbors of v in SI {v}. For everyattendee in SI {v}, intra-familiarity represents the averagenumber of acquainted attendees in SI {v}. Vertex v isallowed to be selected and added to SI if it satises IFCspecied as follows,

    952

  • F (SI {v}) |SI {v}| k (|SI {v}|)p 1 1,

    where k here is a ltering parameter and set as k initially.Intuitively, when k = p 1, the activity allows all atten-dees to be mutually unfamiliar, and Distance Order in thiscase is the best strategy. In fact, IFC in this situation be-comes F (SI {v}) 1, and Socio-Spatial Ordering here isidentical to Distance Order. In another extreme case withk = 0, IFC becomes F (SI {v}) |SI {v}| 1, and eachattendee in SI {v} needs to be acquainted with all theothers in SI {v}.It is worth noting that IFC incorporates a ltering param-

    eter k, instead of including k directly, in order to properlyhandle other cases with 0 < k < p 1. When k = 0, if novertex from SR can satisfy IFC, it is not necessary to addany vertex v from SR to SI because every solution grow-ing from SI {v} cannot follow the familiarity constraint.When k > 0, it is worth noting that if no vertex from SR cansatisfy IFC, it does not imply that every solution growingfrom SI {v} does not have sucient social connectivity. Incontrast, it is possible to nd a vertex v in SR and a solutiongrowing from SI {v} following the familiarity constraint,when other vertices added later bring a sucient number ofedges to the solution. Therefore, for any SSGQ with k > 0,

    Socio-Spatial Ordering sets k as k initially and increases k ifno vertex from SR can satisfy IFC, until at least one vertexfollows IFC and is able to be selected for SI . In other words,IFC rst maintains a high criterion for social connectivity

    by setting k as k, in order to prioritize a vertex leading tosucient social connectivity. If no vertex from SR can sat-

    isfy such a high criterion, IFC increases k to avoid lteringout any feasible solution, and any vertex in SR that did not

    satisfy IFC previously will be examined later with a large k.At each iteration, the vertex chosen by Distance Order-

    ing and Socio-Spatial Ordering may be dierent. The fol-lowing example demonstrates that the result obtained byDistance Ordering enjoys a smaller total social distance butdoes not follow the familiarity constraint. For the six ver-tices in Figure 1(a) with the corresponding social graph in2(a), where p = 3 and k = 0, the exploration of Socio-Spatial Ordering is shown as the thick line in Figure 2(b).

    In this example, at the beginning k = 0 and SI = , sincea is the vertex with the minimum spatial distance to q andF ({a}) = 0

    1 11 01

    2satises IFC, Socio-Spatial Or-

    dering moves vertex a from SR to SI rst and let SI = {a}.However, F (SI {b}) = 02 < 21 022 , which does not sat-isfy IFC. Therefore, Socio-Spatial Ordering examines vertexc and nds F (SI {c}) = 12 2 = 1 2 1 022 , whichsatises IFC. Therefore, vertex c is moved into SI and nowwe have SI = {a, c}. We then try to expand SI by checkingIFC with vertex d. F (SI {d}) = 13 6 = 2 3 1 032 .Therefore, d is moved to SI , and SI = {a, c, d} now is a fea-sible solution. By contrast, Distance Ordering rst selectsvertex b as shown in the dashed-line in Figure 2(b) and thensequentially constructs four states {a, b, c}, {a, b, d}, {a, b, e}and {a, b, f} in the branch-and-bound tree. Unfortunatelyall of them cannot follow the familiarity constraint. There-fore, this example illustrates that it is desirable to jointlyconsider spatial and social domain in order to nd a feasiblesolution for SSGSelect earlier, because this feasible solutionis a key factor for the pruning strategy in the next subsec-tion.

    4.3 Distance PruningIt is expected that Socio-Spatial Ordering can acquire

    the rst feasible solution, which follows the familiarity con-

    (a)

    Distance OrderingSocio-Spatial Ordering

    (b)

    Figure 2: Example of Socio-Spatial Ordering.

    straint, prior to Distance Ordering because social connec-tivity is examined during the selection of vertices. With therst feasible solution and its total spatial distanceD as a cri-terion, SSGSelect is able to prune unnecessary search space,in which the total spatial distance of any solution is no bet-ter than D. Moreover, to further improving the pruningcapability, SSGSelect updates D when a better feasible so-lution is obtained afterward. As shown in our experimentalresults in Section 6, pruning of unnecessary search space cansignicantly reduce the time to nd the optimal solution.For any intermediate solution SI , there are p|SI | vertices

    required to be selected from SR to SI . Apparently, furtherprocessing of SI is unnecessary if choosing the p |SI | ver-tices with the smallest spatial distances to q is not able togenerate a solution better than D. Nevertheless, nding thep|SI | vertices with the smallest spatial distances for everySR in the branch-and-bound tree is inclined to incur moredelay, especially for a large p. Therefore, Distance Pruningidenties a lower bound and stops processing SI when thefollowing condition holds,

    uSIdu,q + (p |SI |)dvmin,q D,

    where the rst term is the total spatial distance from the ver-tices in SI to q. For SR, only the vertex vmin with the small-est spatial distance to q is accessed here, and (p|SI |)dvmin,qrepresents a lower bound on the total spatial distance for theabove p |SI | vertices in SR. Distance Pruning is computa-tionally ecient since vertex vmin can be extracting from thepriority queue in Section 4.1. On the other hand, if the rstelement of the priority queue is an MBR, Distance Pruningis specied as follows,

    vSIdv,q + (p |SI |)MINDIST (M, q) D,

    whereMINDIST (M,q) is the minimum distance between qand the border ofM . Thus, the spatial distance of every ver-tex in M must be no smaller than MINDIST (M, q). Thatis,MINDIST (M,q) vmin, and (p|SI |)MINDIST (M, q)is a lower bound on the total spatial distance of the abovep |SI | vertices in SR.Consider an example with p = 3 with the spatial loca-

    tions and social graph shown in Figure 1(a) and 2(a), re-spectively. After a feasible solution {a, c, d} is explored, itstotal spatial distance 27 is assigned to D. When SSGSelectconsiders SI = {a, b} and SR = {e, f}, since uSI du,q +(p |SI |)dvmin,q = 11 + 1 19 = 30 > 27, Distance Prun-ing removes states {a, b, e} and {a, b, f}, stops processingSI = {a, b}, and backtracks to the previous state accord-ingly.

    4.4 Familiarity PruningDistance Pruning trims the search space by stopping pro-

    cessing SI and backtracks to the previous state in the branch-and-bound tree, when the total spatial distance from each

    953

  • vertex in SR to q is too large to generate a solution betterthan the current feasible solutionD in hand. In contrast, Fa-miliarity Pruning examines the social connectivity of all pos-sible solutions growing from SI . Familiarity Pruning stopsprocessing SI and backtracks to the previous state if everypossible solution growing from SI via the exploration of SRis not able to satisfy the familiarity constraint.The challenge of designing a pruning strategy in social do-

    main lies in various social connectivity allowed to appear inall possible solutions. It is expensive to examine each possi-ble solution according to the familiarity constraint. There-fore, to eciently prune unnecessary search space, Familiar-ity Pruning derives an upper bound on the social closeness ofevery possible solution growing from SR. If the upper boundindicates that each attendee on average is acquainted withfewer than p k 1 other attendees, every possible solutionis not able to follow the familiarity constraint.Specically, the edges in any solution growing from SI

    can be divided into three categories: 1) the set of edges EIconnecting any two vertices in SI , 2) the edges of edges ERconnecting any two vertices selected from SR, and 3) theset of edges EIR connecting any two vertices in SI and thevertices selected from SR. Apparently, |EI | is 12

    vSI |N

    Iv |,

    where NIv is the set of acquainted neighbors of v in SI . Sincethe selected vertices in SR is not clear now, a good way is tond an upper bound on |ER|, i.e., 12 (p|SI |)maxvSR |NRv |,where NRv is the set of acquainted neighbors of v in SR.It is an upper bound because the vertex with the maxi-mum degree in SR is identied, and (p |SI |) vertices areselected from SR. Similarly, an upper bound on |EIR| is

    vSI |InterEdge(v)|, where InterEdge(v) is the set ofedges connecting v in SI to any vertices in SR.Notice that the number of edges in a feasible solution

    is half of the degree sum of all the vertices in the solu-tion, where the degree of a vertex represents the numberof acquainted neighbors in the solution. Therefore, with theabove three categories of edges, Familiarity Pruning stopsprocessing SI when the following condition holds,

    1

    p

    [vSI

    |NIv |+ (p |SI |) maxvSR

    |NRv |

    +2 vSI

    |InterEdge(v)|

    < (p k 1),where the left-hand-side is an upper bound on the averagenumber of attendees acquainted to each person in any feasi-ble solution growing from SI . In the above condition, eachattendee on average is acquainted with fewer than p k 1other attendees. Familiarity Pruning stops processing SIand backtracks to the previous state if every possible solu-tion growing from SI via the exploration of SR is not ableto satisfy the familiarity constraint.For the social graph in Figure 2(a) with p = 3, k = 0,

    if SI = {b, d} and SR = {e, f}, Familiarity Pruning stopsprocessing SI because

    13(0+1 1+ 2 1) = 1 < (3 0 1) =

    2. In other words, moving any vertex from SR to SI willnever generate a feasible solution following the familiarityconstraint.

    5. ENHANCED STRATEGY FOR SSGQThe above strategies eciently select and examine each

    candidate attendee. A potential approach to further im-prove the eciency is to select multiple candidate attendeesjointly, because the optimal solution can be constructed with

    1

    2

    3

    4

    5

    (a)

    e

    f

    g

    a

    c

    d

    b

    ki

    j

    h

    nl

    m

    1

    2

    3

    4

    5

    Social Cluster Set, C1

    Social Cluster Set, C2

    Social Cluster Set, C3

    ......

    ...

    ...

    ...

    ...

    Social Cluster Set, C4

    ...

    ...

    Social Cluster Set, C5

    ...

    (b)

    Figure 3: Illustrations of SR-Tree.

    fewer iterations. When an attendee close to the rally pointis chosen, nearby candidate attendees in the same MBR ofR-Tree are also potential to be good candidates for jointselection due to their smaller spatial distances to the rallypoint. Nevertheless, jointly selecting all candidate attendeesin the same MBR may not be able to generate a solutionfollowing the familiarity constraint since the social connec-tivity among them is not examined. To eectively tacklethis issue, we propose Social R-Tree (SR-Tree), in whichcandidate attendees in each MBR are indexed into multiplesocial clusters with dierent social connectivity to supportSSGQ with dierent k. Equipped with SR-Tree, SSGQ re-duces the number of iterations required to nd the optimalsolution with a new strategy, named Joint Insertion, whichselects multiple vertices jointly at each iteration for SI togrow faster.

    5.1 Social R-TreeIn section 4.1, the feature of hierarchical MBRs in R-

    Tree enables SSGQ to eciently extract the vertex with thesmallest spatial distance to q. It is expected that the verticesin the same MBR are spatially closer to each other. Never-theless, each vertex may not be familiar with all the othersin the same MBR. For example, Figure 3(a) illustrates thatonly the vertices with the same color are acquainted witheach other, and inserting all the vertices in M1 to SI is po-tential to be too aggressive to generate a solution satisfyingthe familiarity constraint. On the other hand, choosing agroup of mutually acquainted vertices is not ecient, espe-cially for a large k, since more vertices in the same MBRcan be selected jointly. Thus, it is desirable to identify so-cial clusters with dierent connectivity in order to considervarious SSGQ with dierent k possibly specied by users.In the following, we propose SR-Tree by nding a set of so-

    cial clusters Ci for each MBR Mi of R-Tree, as shown in Fig-ure 3(b). Ci for each MBR Mi is constructed in a bottom-up manner. We iteratively combine small social clusters ina lower level to large social cluster in the higher level in Fig-ure 5. At the bottom level (level 0), each cluster is a singlevertex. Multiple clusters in level i are combined to generatea new cluster in level i + 1, while each cluster in level i isallowed to participate in multiple clusters in level i+1. Forexample, cluster {a} at level 0 in Figure 5 appears in {a, b}and {a, d} in level 1. We tailor the social clusters for SSGQaccording to the following three requirements.(1) Cross MBR Requirement. To avoid generating du-plicated clusters in multiple MBRs, each cluster c must spanat least two child MBRs of the current MBRMi, since a clus-ter spanning only one child MBR can be generated when weconsider the child MBR. Figure 5 shows an example, whereclusters {a, c}, {a, e}, {b, d}, {b, f}, {c, e} and {d, f} arenot generated since the vertices in each of them are locatedwithin the same MBR.(2) Social Connectivity Requirement. This require-

    954

  • (a)

    q

    (b)

    Figure 4: Example of a social network and the cor-responding MBRs.

    Figure 5: Hierarchically constructing the set of so-cial cluster Ci for MBR Mi.

    ment species the minimum social connectivity allowed foreach generated cluster c. By lowering this requirement, moresocial clusters can be generated at the cost of more memoryconsumption. To strike a good balance between eciencyand memory requirement, a good way is referring to param-eters k and p appearing in the past SSGQ history. Let PSdenote the average k

    pspecied in SSGQ of the history. For

    each vertex v in c, PS represents the percentage of vertices inc with no edge connecting to v. Therefore, Social Connectiv-ity Requirement species that on average each vertex in a so-cial cluster is allowed to share no edge with at most |c|PSother vertices in c. Here the ceiling function is adopted suchthat small social clusters are easier to be generated and thencombined to large clusters. Consider an example with thesocial graph in Figure 4(a) and the corresponding MBRs inFigure 4(b), where vertices g and h are not located in theseMBRs. When PS = 0.3, cluster {a, b, c} cannot be createdin level 1 of Figure 5, because each vertex on average sharesno edge with 4

    3other vertices and exceeds |c|PS = 1. Sim-

    ilarly, clusters {a, c, d}, {a, c, d, f} and {c, d, f} do not havesucient social connectivity to be created.(3) Spatial Distance Requirement. To avoid the casethat the vertices of a social cluster is distributed diverselyin a large MBR, it is necessary to limit the spatial distribu-tion of a cluster, and Spatial Distance Requirement ensuresthat the average distance from a vertex to the geometric me-dian of c must not exceed a threshold Td. Given the coordi-nates (xj, yj) of each vertex vj in c, the coordinates (xm, ym)of the geometric median m is (xm, ym) = argmin(xm,ym)

    vjc(xm xj)2 + (ym yj)2. The geometric median is

    regarded as a spatial index in SR-Tree and plays an impor-tant role in Join Insertion to prioritize the selection of asocial cluster. In Figure 5, cluster {e, f} cannot appear inlevel 1 if Td is identical to the distance from a to b.

    5.2 Joint InsertionTo improve the eciency of SSGSelect, Joint Insertion

    selects a social cluster with multiple vertices and moves themfrom SR to SI . A Joint Insertion operation can be regardedas a shortcut in the branch-and-bound tree, such that agood feasible solution D can be acquired earlier for DistancePruning to remove unnecessary vertices that are not able togenerate a better solution. Therefore, equipped with SR-

    Table 1: dm,q and corresponding actual average spa-tial distance to q.

    {a, b} {a, b, d, e} {a, b, c, d, e} {a, d}dm,q 9.6 10 11.2 11.3

    vcdv,q|c| 9.8 10.8 11.7 11.8

    Tree and Joint Insertion, SSGSelect can nd the optimalsolution with smaller time.Similar to Socio-Spatial Ordering, since it is desirable to

    choose a dense social cluster with a small average spatialdistance to q, both the social and spatial domains are exam-ined during the selection of a social cluster. Nevertheless,with only the spatial location of each vertex in c, it is ex-pensive for a large cluster to nd the average distance to q,because the spatial distance from each vertex to q speciedat this query needs to be rst calculated and then summedup on-line, while q is allowed to be dierent in each SSGQ.Therefore, Joint Insertion leverages the geometric medianm in Section 5.1, which can be computed o-line and uti-lized for each query, and regards dm,q as the estimation ofthe average spatial distance from c to q. Note that JointInsertion does not consider any cluster with |SI |+ |c| > p orSI c = . The reason is that the geometric median of c inthe case with SI c = cannot correctly estimate the av-erage spatial distance for the vertices to be jointly inserted,since some of them is already in SI .The social clusters are sorted according to the spatial dis-

    tance from the geometric median of each cluster to q. Sim-ilar to Socio-Spatial Ordering, the rst social cluster thatsatises Intra-Familiarity Condition (IFC) in Section 4.2 isselected and inserted to SI , while IFC here examines SI c,instead of SI {v}.Consider an example with the social graph and the cor-

    responding MBRs in Figure 4, where SI = {g, h}, p = 6,k = 3 and the rally point q is shown in Figure 4(b). Assumethe total spatial distance from SI to q is 13, and Table 1lists the clusters with dm,q , together with their correspond-ing actual average spatial distance to q. Joint Insertion rstnds out that {a, b} cannot be moved to SI because it doesnot satisfy IFC with k = 3, i.e., F (SI {a, b}) = 14 2 =0.5 < 4 1 34

    5. We then consider the intra-familiarity of

    {a, b, d, e}, F (SI{a, b, d, e}) = 16 16 61 365 , which sat-ises IFC. Therefore, Joint Insertion simultaneously movesa, b, d, and e from SR to SI at one iteration, which pro-duces a feasible solution with the total spatial distance as13 + 10.8 4 = 56.2.Join Insertion of c to SI represents a shortcut in the

    branch-and-bound tree to eciently nd a feasible solutionfor Distance Pruning. After nding the feasible solution,SSGSelect backtracks along the original shortcut back to SIand explores the search space in the same way described inSection 4 afterward. It is worth noting that some vertices inc needs to be considered again in order to construct anotherfeasible solution with those vertices, SI , and other verticesnot appearing in SI c, to ensure that SSGSelect is able tond the optimal solution eventually.

    6. EXPERIMENTAL RESULTS6.1 Experiment SetupWe implement SSGSelect in Facebook. Figure 6(a) shows

    the input page of SSGQ to specify p, k, and the coordinateof q, which is captured by clicking a location in the map.The result of SSGSelect is displayed in Figure 6(b) withthe names of the selected attendees and their locations. We

    955

  • (a)

    (b)

    Figure 6: Implementation of SSGQ in Facebook.

    invite 206 people from various communities, e.g., schools,government, and business to join our user study, which com-pares the solution quality and the time to answer SSGQ bymanual coordination and SSGSelect. Each of them answers24 SSGQ problems with the social graphs extracted fromtheir social networks in Facebook, together with their spa-tial locations sampling from their Facebook Checkin recordsshared to the user study. The 24 problems span various pand network sizes. Dierent k are assigned in the rst 12problems, while k is not specied in the other 12 problems,to let each user freely select p people for nding out thefamiliarity preferred by each person in activities with dier-ent p. In addition to the real dataset DataSet FB from the206 people, we evaluate the performance and the solutionquality of SSGSelect on a large real dataset, DataSet Large,which is obtained by crawling Foursquare [11], one of themost representative LBSNs, for a month. DataSet Largecontains both the social and spatial information of 153577individuals. The rally point q is assigned randomly, and wemeasure 50 samples in each scenario. Our algorithm is im-plemented in an IBM 3650 server with Linux Ubuntu 4.2.4,two Quadcore Intel X5450 3.0 GHz CPUs, and 8 GB RAM.To evaluate the eectiveness of the proposed strategies

    in SSGSelect, three additional algorithms, Baseline, SSGSe-lectNoJI, and SSGHeuri(i) are evaluated here. Baseline is abrute-force algorithm that enumerates every possible solu-tion. SSGSelectNoJI is identical to SSGSelect with SR-Treeand Joint Insertion removed. SSGHeuri(i) is a heuristic al-gorithm used for DataSet Large and outputs the i-th feasiblesolution obtained by SSGSelect.

    6.2 User StudyFigures 7(a)-(f) compare manual coordination and SSGS-

    elect to answer SSGQ in the user study. Figure 7(a) com-pares the time to nd the solutions in dierent scenarios.The result indicates that SSGQ is challenging for manualcoordination, especially for a large network size. In contrast,SSGSelect obtains the optimal solution with less than 1 sec-ond. Figure 7(b) with p = 5 and k = 3 demonstrates thatthe solutions from manual coordination lead to larger spatialdistance and thereby are not optimal. With a larger networksize, i.e., more friends nearby, it is easier to nd a group ofattendees with a smaller total spatial distance to q. In ad-dition, the solution quality in Figure 7(c) shows that evenin p = 5, each solution obtained by manual coordination isnot guaranteed to follow the familiarity constraint, accord-ing to the correctness rate shown in Figure 7(c), because itis very challenging for each person to jointly minimize thetotal spatial distance and ensure the familiarity constraint,and the correctness rate drops dramatically as the networksize increases.In Figures 7(d) and (e), we let each user freely select p

    0

    60

    120

    180

    7 9 11 13 15

    Network Size (p=5, k=3)

    Tim

    e (s

    )

    Manual SSGSelect

    (a)

    60

    120

    180

    240

    7 9 11 13 15

    Network Size (p=5, k=3)

    Dis

    tanc

    e

    Manual SSGSelect

    (b)

    88% 84% 83%

    67%57%

    67%

    150

    250

    350

    450

    (5,4,1) (7,5,2) (9,6,2) (11,7,2) (13,8,3) (15,9,3)(Network Size, p, k)

    Dist

    ance

    Manual Correctness Manual SSGSelect

    (c)

    1.32.5 3.0

    3.6 4.55.5

    0

    60

    120

    180

    (5,4) (7,5) (9,6) (11,7) (13,8) (15,9)(Network Size, p)

    Tim

    e (s)

    Manual Minimum k Manual SSGSelect

    (d)

    1.3 2.5 3.0 3.6 4.5 5.5150

    200

    250

    300

    350

    400

    (5,4) (7,5) (9,6) (11,7) (13,8) (15,9)(Network Size, p)

    Dist

    ance

    Manual Minimum k Manual SSGSelect

    (e)

    61% 72% 76% 78%100

    200

    300

    400

    500

    4 5 6 7k (Network Size=15, p=9)

    Dis

    tanc

    e

    Manual Correctness Manual SSGSelect

    (f)

    Figure 7: Results of user study.

    people to nd out the familiarity preferred by each personin activities with dierent p. The minimum k here repre-sents the smallest k for each solution to follow the familiarityconstraint. With this parameter extracted from the manualsolution, we regard it as an input parameter for a SSGQ inthe same social network, and the results demonstrate thatSSGSelect can nd a better solution following the same kwith smaller time. In other words, even when an user doesnot specify k, it is possible to analyze the previous manualcoordination results and nd out a suitable k for the user,such that SSGSelect is able to nd the optimal solutions ineach query afterward.Figure 7(f) with network size as 15 and p as 9 compares

    the results of dierent k. As k decreases, the correctnessrate of manual coordination drops because it becomes moredicult for an user to nd a tighter social group with thesame number of attendees. Moreover, the solution obtainedby manual coordination is still worse than the solution ofSSGSelect even with a loose requirement on social connec-tivity, i.e., a large k.

    6.3 Performance EvaluationFigures 8(a)-(b) evaluate the eciency of dierent algo-

    rithms on DataSet FB. Figure 8(a) compares the executiontime of the four algorithms with dierent network sizes.Since the search space of Baseline is enormous, Baseline isunable to return a solution within 12 hours when the net-work size is larger than 50. Therefore, there is only a sin-gle node for Baseline in Figure 8(a). On the other hand,with Socio-Spatial Ordering, Distance Pruning, and Famil-iarity Pruning, SSGSelect and SSGSelectNoJI both returnthe optimal solution in much smaller time. Furthermore,SSGSelect enhanced by SR-Tree and Joint Insertion outper-forms SSGSelectNoJI because it can nd feasible solutionsin a shorter time for Distance Pruning to remove unneces-sary search space earlier. Figure 8(b) compares the numberof explored states in the branching-and-bound tree requiredto nd a feasible solution of SSGSelect and SSGSelectNoJI.

    956

  • 1.E-021.E-011.E+001.E+011.E+021.E+031.E+041.E+05

    50 100 150 200 250Network Size (p=12, k=3)

    Exec

    utio

    n Ti

    me

    (s)SSGSelect SSGSelectNoJISSGHeuri(1) Baseline

    (a)

    1.E+031.E+041.E+051.E+061.E+071.E+08

    50 100 150 200 250

    Network Size (p=12, k=3)

    Num

    ber o

    f Sta

    tes

    SSGSelect SSGSelectNoJI

    (b)

    0.001

    0.01

    0.1

    1

    10

    100

    8 9 10 11 12p (k=3)

    Exec

    utio

    n Ti

    me

    (s)

    Network Size=100 Network Size=150Network Size=200 Network Size=250

    (c)

    05

    1015202530

    3 5 7 9 11k (p=12)

    Exec

    utio

    n Ti

    me

    (s)

    Network Size=100 Network Size=150Network Size=200 Network Size=250

    (d)

    1.E-01

    1.E+00

    1.E+01

    1.E+02

    1.E+03

    1.E+04

    50 100 150 200 250Network Size (p=12, k=3)

    Exec

    utio

    n Ti

    me

    (s)

    SO+DP+FP SO+DP DP+FP DP

    (e)

    1.E+041.E+051.E+061.E+071.E+081.E+091.E+101.E+11

    50 100 150 200 250

    Network Size (p=12, k=3)

    Num

    ber o

    f Sta

    tes

    SO+DP DP

    (f)

    Figure 8: Experimental results on DataSet FB.

    0

    150

    300

    450

    1 4 7 10i-th Feasible Solution (Network Size=10000, k=6)

    Dis

    tann

    ce

    p=7 p=8 p=9 p=10

    (a)

    0

    100

    200

    300

    400

    1 4 7 10i-th Feasible Solution (p=9, k=6)

    Dis

    tanc

    e

    Network Size=10000 Network Size=20000Network Size=30000 Network Size=40000

    (b)

    Figure 9: Experimental results on DataSet Large.

    It demonstrates that SSGSelect only needs 2% of the statesin SSGSelectNoJI to nd a feasible solution.In addition to the network size, we compare SSGSelect

    with dierent p and k in Figures 8(c) and (d). Figure 8(c)indicates that the execution time increases as p grows, be-cause SSGSelect in this case needs to explore a larger searchspace to nd the optimal solution. On the other hand, Fig-ure 8(d) shows that a larger k leads to a smaller executiontime because it becomes easier to obtain feasible solutionsfor Distance Pruning to trim the search space. Figures 8(e)and (f) analyze the proposed strategies in Section 4. The re-sult indicates that Socio-Spatial Ordering (SO) plays a cru-cial role to reduce the execution time for at least 40 times.This is because Socio-Spatial Ordering considers both spa-tial and social domains and thereby is able to guide theecient search of the feasible solutions and optimal solutionwith fewer states explored in the branch-and-bound tree, asshown in Figure 8(f).Figure 9 shows the solution quality of SSGHeuri(i) on

    the real dataset, DataSet Large. Figure 9(a) demonstratesthat the solution converges quickly as i increases, and SS-GHeuri(i) can obtain good solutions within the rst fewfeasible solutions. In Figure 9(b), we compare the i-th fea-sible solutions with dierent network sizes. The distancesdecrease when the network size grows since it is expectedthat a larger network includes more candidate nodes withsmaller spatial distances.

    7. CONCLUSION AND FUTURE WORKTo the best of our knowledge, there is no real system

    and existing work in the literature that addresses the is-sues of automatic activity planning based on social and spa-tial relationship of activity attendees. In this paper, we de-ne a useful query, namely, SSGQ, to obtain the optimalset of attendees. We show that the problem is NP-hardand devise an ecient algorithm, namely, SSGSelect to pro-cess SSGQ. Various strategies, including Distance Ordering,Socio-Spatial Ordering, Distance Pruning, and FamiliarityPruning are explored to prune unnecessary search space andobtain the optimal solution eciently. Moreover, with SR-Tree and Joint Insert, SSGSelect is improved and requires asmaller time to nd the optimal solution. The algorithm isimplemented in Facebook, and our user study demonstratesthat the proposed algorithm signicantly outperforms man-ual coordination in both solution quality and eciency.User study also brings useful suggestions to further enrich

    SSGQ as our future work. For example, since each candidateattendee is associated with multiple attributes in Facebook,such as age, gender, interest, company name, etc, those at-tributes can be specied as the input parameters for SSGQto rst lter unsuitable candidate attendees. A more inter-esting extension is to allow each user to specify the minimumnumber of attendees with each attribute value required to beselected, such as ensuring an activity with the same numberof male and female attendees when considering the genderattribute. Some users also suggest us to integrate the pro-posed query processing system with automatically deliveryof invitation messages and navigation instructions.

    8. REFERENCES[1] N. Roussopoulos, S. Kelley and F. Vincent. Nearest

    Neighbor Queries. SIGMOD, 1995.[2] T. Lappas, K. Liu, and E. Terzi. Finding a Team of

    Experts in Social Networks. KDD, 2009.[3] C.-T. Li and M.-K. Shan. Team Formation for

    Generalized Tasks in Expertise Social Networks.SocialCom, 2010.

    [4] M. Sozio and A. Gionis. The Community-SearchProblem and How to Plan a Successful Cocktail Party.KDD, 2010.

    [5] D. Papadias, Q. Shen, Y. Tao and K. Mouratidis.Group Nearest Neighbor Queries. ICDE, 2004.

    [6] Y. Tao, D. Papadias and Q. Shen. Continuous NearestNeighbor Search. VLDB, 2002.

    [7] D.-N. Yang, Y.-L. Chen, W.-C. Lee and M.-S. Chen.On Social-Temporal Group Query with AcquaintanceConstraint. VLDB, 2011.

    [8] Y. Gao, B. Zheng, W.-C. Lee, G. Chen. ContinuousVisible Nearest Neighbor Queries. EDBT, 2009.

    [9] Y. Gao and B. Zheng. Continuous Obstructed NearestNeighbor Queries in Spatial Databases. SIGMOD,2009.

    [10] G. Hjaltason and H. Samet. Distance Browsing inSpatial Databases. TODS, 1999.

    [11] M. Ye, P. Yin, W.-C. Lee and D.-L. Lee. ExploitingGeographical Inuence for CollaborativePoint-of-Interest Recommendation. SIGIR, 2011.

    957

    /ColorImageDict > /JPEG2000ColorACSImageDict > /JPEG2000ColorImageDict > /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 300 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict > /GrayImageDict > /JPEG2000GrayACSImageDict > /JPEG2000GrayImageDict > /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 600 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict > /AllowPSXObjects false /CheckCompliance [ /PDFX1a:2003 ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False

    /Description > /Namespace [ (Adobe) (Common) (1.0) ] /OtherNamespaces [ > /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles false /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /DocumentCMYK /PreserveEditing true /UntaggedCMYKHandling /LeaveUntagged /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> ]>> setdistillerparams> setpagedevice