geosocial co-clustering: a novel framework for geosocial...

16
Geosocial Co-Clustering: A Novel Framework for Geosocial Community Detection Jungeun Kim , Jae-Gil Lee †* , Byung Suk Lee , Jiajun Liu § Graduate School of Knowledge Service Engineering, KAIST Department of Computer Science, University of Vermont § School of Information, Renmin University {je_kim, jaegil}@kaist.ac.kr, [email protected], § [email protected] Abstract—As location-based services using mobile devices have become globally popular these days, social network analysis (especially, community detection) increasingly benefits from com- bining social relationships with geographic preferences. In this regard, we formalize the problem of geosocial co-clustering, which co-clusters the users in social networks and the locations they vis- ited. Geosocial co-clustering detects higher-quality communities than existing approaches by improving the mapping separability, where the users in the same community tend to visit locations in the same region that has not been visited by the users in other communities. While geosocial co-clustering is soundly formalized as non-negative matrix tri-factorization, conventional matrix tri- factorization algorithms suffer from a significant computational overhead when handling large-scale data sets. Thus, we also develop an efficient framework for geosocial co-clustering, called GEO social CO arsening and DE composition (GEOCODE). In order to achieve efficient matrix tri-factorization, GEOCODE reduces the numbers of users and locations through coarsening and then decomposes the single whole matrix tri-factorization into a set of multiple smaller sub-matrix tri-factorizations. Thorough experiments conducted using real-world geosocial networks show that GEOCODE reduces the elapsed time by 1969 times while achieving the accuracy of up to 94.8% compared with the state- of-the-art co-clustering algorithm. Furthermore, the benefit of the mapping separability is clearly demonstrated through a local expert recommendation application. I. I NTRODUCTION The widespread use of location-aware mobile devices has made it feasible to collect precise locations of the users. In many social networking services, the users’ locations play an important role in social analytics by enriching social relationships between users [1]. Particularly, geosocial net- working services such as Foursquare and Yelp have attracted much attention since they can provide location intelligence through the analysis of not only users’ social relationships but also their favorite places. For example, they recommend to users some popular places that their friends also like. Additionally, traditional social networking services such as Foursquare and Twitter have adopted geo-tagging capabilities to capture the location preferences of their users. Along these lines, a majority of social networking services of today retain abundant data about users’ location preferences as well as social relationships. The strong correlation between social relationships and spa- tial (or geographic) preferences has been empirically proven * Jae-Gil Lee is the corresponding author. by several researchers [2], [3], [4]. That is, people tend to interact more frequently with other people who live closer than farther [2] or tend to visit nearby places which their friends or friends-of-friends have already visited [4]. Thus, it is widely recognized that the quality of social network analysis improves by considering spatial preferences together with social relationships [5], [6]. Among various related analytic problems, we tackle the problem of community detection [7] or, more specifically, geosocial community detection for finding the sets of users who are densely connected through both close social relationships and similar spatial preferences. Most of existing algorithms use clustering based on a simi- larity measure that combines social similarity and spatial sim- ilarity. Their approaches are distinguished into two categories depending on which side the clustering performed is centered on. In the user-centric approach, the clustering algorithm is essentially applied on the user side, with the spatial similarity incorporated into the social similarity represented as the edge weight [8], [9]. On the other hand, in the location-centric approach, the clustering algorithm is essentially applied on the location side, with the social similarity incorporated into the spatial similarity measure [10]. Like these, each category puts emphasis on a different perspective. As, however, geosocial features become increasingly integrated in real applications, it is imperative to use an approach that exploits the two sides of similarity equitably and simultaneously. A. Geosocial Co-Clustering In this paper, we formulate disjoint geosocial community detection by way of geosocial co-clustering. Co-clustering is a generic technique that clusters the rows and columns of a matrix simultaneously [11]. In geosocial co-clustering, a row and a column correspond to a user and a location, respectively. Therefore, users are clustered based on their spatial prefer- ences (i.e., the distributions of the locations visited by them) and, at the same time, locations are clustered based on their user populations (i.e., the distributions of the users visiting them). As a result, it can produce pairs of a cluster on the user side and a cluster on the location side at once. Geosocial co-clustering improves the quality of the detected communities by achieving mapping separability, which in- dicates how well the mappings between the users and the locations are separated into disjoint sets of matching user-

Upload: others

Post on 26-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • Geosocial Co-Clustering: A Novel Frameworkfor Geosocial Community Detection

    Jungeun Kim†, Jae-Gil Lee†∗, Byung Suk Lee‡, Jiajun Liu§

    † Graduate School of Knowledge Service Engineering, KAIST‡ Department of Computer Science, University of Vermont

    § School of Information, Renmin University†{je_kim, jaegil}@kaist.ac.kr, ‡ [email protected], § [email protected]

    Abstract—As location-based services using mobile devices havebecome globally popular these days, social network analysis(especially, community detection) increasingly benefits from com-bining social relationships with geographic preferences. In thisregard, we formalize the problem of geosocial co-clustering, whichco-clusters the users in social networks and the locations they vis-ited. Geosocial co-clustering detects higher-quality communitiesthan existing approaches by improving the mapping separability,where the users in the same community tend to visit locations inthe same region that has not been visited by the users in othercommunities. While geosocial co-clustering is soundly formalizedas non-negative matrix tri-factorization, conventional matrix tri-factorization algorithms suffer from a significant computationaloverhead when handling large-scale data sets. Thus, we alsodevelop an efficient framework for geosocial co-clustering, calledGEOsocial COarsening and DEcomposition (GEOCODE). In orderto achieve efficient matrix tri-factorization, GEOCODE reducesthe numbers of users and locations through coarsening andthen decomposes the single whole matrix tri-factorization intoa set of multiple smaller sub-matrix tri-factorizations. Thoroughexperiments conducted using real-world geosocial networks showthat GEOCODE reduces the elapsed time by 19–69 times whileachieving the accuracy of up to 94.8% compared with the state-of-the-art co-clustering algorithm. Furthermore, the benefit ofthe mapping separability is clearly demonstrated through a localexpert recommendation application.

    I. INTRODUCTIONThe widespread use of location-aware mobile devices has

    made it feasible to collect precise locations of the users. Inmany social networking services, the users’ locations playan important role in social analytics by enriching socialrelationships between users [1]. Particularly, geosocial net-working services such as Foursquare and Yelp have attractedmuch attention since they can provide location intelligencethrough the analysis of not only users’ social relationshipsbut also their favorite places. For example, they recommendto users some popular places that their friends also like.Additionally, traditional social networking services such asFoursquare and Twitter have adopted geo-tagging capabilitiesto capture the location preferences of their users. Along theselines, a majority of social networking services of today retainabundant data about users’ location preferences as well associal relationships.

    The strong correlation between social relationships and spa-tial (or geographic) preferences has been empirically proven

    ∗ Jae-Gil Lee is the corresponding author.

    by several researchers [2], [3], [4]. That is, people tend tointeract more frequently with other people who live closerthan farther [2] or tend to visit nearby places which theirfriends or friends-of-friends have already visited [4]. Thus, it iswidely recognized that the quality of social network analysisimproves by considering spatial preferences together withsocial relationships [5], [6]. Among various related analyticproblems, we tackle the problem of community detection [7] or,more specifically, geosocial community detection for findingthe sets of users who are densely connected through both closesocial relationships and similar spatial preferences.

    Most of existing algorithms use clustering based on a simi-larity measure that combines social similarity and spatial sim-ilarity. Their approaches are distinguished into two categoriesdepending on which side the clustering performed is centeredon. In the user-centric approach, the clustering algorithm isessentially applied on the user side, with the spatial similarityincorporated into the social similarity represented as the edgeweight [8], [9]. On the other hand, in the location-centricapproach, the clustering algorithm is essentially applied on thelocation side, with the social similarity incorporated into thespatial similarity measure [10]. Like these, each category putsemphasis on a different perspective. As, however, geosocialfeatures become increasingly integrated in real applications, itis imperative to use an approach that exploits the two sides ofsimilarity equitably and simultaneously.

    A. Geosocial Co-ClusteringIn this paper, we formulate disjoint geosocial community

    detection by way of geosocial co-clustering. Co-clustering isa generic technique that clusters the rows and columns of amatrix simultaneously [11]. In geosocial co-clustering, a rowand a column correspond to a user and a location, respectively.Therefore, users are clustered based on their spatial prefer-ences (i.e., the distributions of the locations visited by them)and, at the same time, locations are clustered based on theiruser populations (i.e., the distributions of the users visitingthem). As a result, it can produce pairs of a cluster on theuser side and a cluster on the location side at once.

    Geosocial co-clustering improves the quality of the detectedcommunities by achieving mapping separability, which in-dicates how well the mappings between the users and thelocations are separated into disjoint sets of matching user-

  • Users

    Mappings

    Locations

    (a) User-centric approach.

    Users

    Mappings

    Locations

    (b) Location-centric approach.

    Users

    Mappings

    Locations

    (c) Co-clustering approach (ours).Fig. 1: Comparison of the approaches for geosocial community detection.

    location pairs. In other words, higher mapping separabilitymeans that there is a higher degree of one-to-one mappingbetween mutually exclusive subsets of users and mutuallyexclusive subsets of locations. Intuitively, it means that userswho belong to the same community visit a certain groupof locations that are not visited by users who belong to adifferent community. In contrast, the existing algorithms (e.g.,[8], [10], [9]) lack the mapping separability because theyperform clustering only on a single side, i.e., either users orlocations.

    Example 1: Figures 1a and 1b respectively show what theclusters from the user-centric and location-centric approacheswould look like. As a result of satisfying both social con-straints and spatial constraints, the clusters show that closelyconnected users visited close locations and vice versa. How-ever, the subsets of locations mapped from the disjoint subsetsof users in Figure 1a overlap with one another, and so do thesubsets of users mapped from the disjoint subsets of locationsin Figure 1b. That is, both approaches show poor mappingseparability. In contrast, Figure 1c show what the clustersfrom our co-clustering approach would look like. It showsbetter mapping separability, as there are few crossings betweenthe subsets of mappings. That is, the mappings form nearlydisjoint subsets and are separated into bundles of mappingedges. �

    Many applications suitable for disjoint community detectioncan benefit from this mapping separability. For instance, inan application to monitor potential or identified terrorists(or criminals), the mapping separability helps to find theterrorists (or criminals) that are collaboratively targeting thesame locations as the top priority. For another instance, inan application to recommend local experts [12], the mappingseparability helps to find a community of mutually familiarusers who are expert especially in a given region, becauseactivities concentrated on a certain topic often indicates highexpertise in that topic [13]. Another similar application isto recommend carpool drivers [14], for which the mappingseparability would help to find a group of mutually acquainteddrivers who frequently visit locations in the same region.

    We formulate geosocial co-clustering as a matrix tri-factorization problem. Here, a user-to-location matrix is con-structed with elements that represent the frequencies of visitsto a set of locations by a set of users. This matrix is thenfactorized into a user membership matrix, a scale matrix, anda location membership matrix. The user membership matrix

    defines the membership of a user in a user-side cluster, andthe location membership matrix defines the membership ofa location in a location-side cluster. While this formulationis theoretically sound, actually computing the solution isknown to be very costly. The state-of-the-art algorithm [15]has quadratic time complexity per iteration with the numberof users and the number of locations.

    B. GEOCODE Framework

    In order to address the efficiency issue of geosocial co-clustering, we propose a framework for finding an approximatesolution of the matrix factorization. We call this frameworkGEOCODE, which stands for GEOsocial COarsening andDEcomposition. GEOCODE is geared for geosocial networkswhich have a few intrinsic properties: notably, (i) socialrelationships and geographic preferences are strongly corre-lated [2], [3], [4], and (ii) the degrees of social relationshipsand the populations of the places all follow the power-lawdistribution. We take advantage of these properties in orderto improve the efficiency of geosocial co-clustering withoutlosing the accuracy.

    Our GEOCODE framework follows a divide-and-conquerstrategy and consists of three steps: the first step reduces thesize of the whole data set, and then the second step divides itinto multiple subsets; the third step conquers each subset.• Step 1 (Coarsening): This step coarsens users and loca-

    tions in order to reduce the numbers of rows and columnsin the user-to-location matrix. Here, a set of users iscollapsed into a (virtual) user, and a set of locations iscollapsed into a (virtual) location. Because of the power-law distributions, it is reasonable for heavy users or biglocations to absorb nearby insignificant users or locations.Furthermore, because of the correlation between users andlocations, this coarsening does not significantly change thestructure of the user-to-location matrix.

    • Step 2 (Decomposition): This step decomposes the two-layer graph in Figure 1c into multiple subgraphs suchthat the mappings between a pair of clusters do notcross subgraph boundaries. For this purpose, this stepfirst orders users and locations separately in a line bycrossing minimization [16] and then detects the best cuts bythe minimum description length (MDL) [17] principle. Eachsubgraph serves as a sub-matrix of the user-to-locationmatrix. The time complexity of this step is only linearlogarithmic in terms of the numbers of users or locations.

  • • Step 3 (Partial Co-Clustering): This step runs the state-of-the-art algorithm [15] for the matrix factorization againsteach sub-matrix identified in Step 2. Running the algorithmon each sub-matrix finishes much faster than running it onthe whole matrix and can overlap through parallel execu-tion. Therefore, the elapsed time for partial co-clusteringis much shorter than that of co-clustering on the whole.

    The source code of GEOCODE is fully available at https://github.com/jaegil/GEOCODE.

    C. SummaryKey Contributions:• Problem Formalization (Section III): We define the prob-

    lem of geosocial co-clustering to guarantee the map-ping separability in geosocial community detection. Then,we formulate this problem as non-negative matrix tri-factorization.

    • Efficiency Enhancement (Section IV): More importantly,we develop the GEOCODE framework to quickly find anapproximate solution. Our framework reduces the size ofthe problem by coarsening and decomposition as well assupporting parallel execution of the sub-problems.

    • Comprehensive Evaluation (Section V): We empiricallydemonstrate the benefit of geosocial co-clustering and theefficiency improvement of GEOCODE (19–69 times) usingfour real-world data sets. GEOCODE successfully findsthe geosocial communities whose members’ interests arehighly coherent.

    Problem Significance: We propose to use co-clustering forgeosocial community detection. Although co-clustering hasbeen popularly used for related problems with different goals,e.g., geosocial recommendation [18], it has not been a dom-inant player in geosocial community detection. Thus, it isworthwhile to promote co-clustering for geosocial communitydetection in order to share the benefit of the co-clusteringapproach.Scope: We deal with disjoint community detection in thispaper and leave an extension for overlapping communitydetection as future work. Although the overlapping is shownto be more favorable than the disjoint in some cases, it isnot universally agreed [19]. Disjoint community detection hasbeen actively employed, as indicated in a recent benchmarkproposed by Wang et al. [20], in parallel to overlapping com-munity detection.

    II. RELATED WORKA. Geosocial Community Detection

    The major algorithms can be classified depending on howthe two types of similarity are combined.• User-centric: In these algorithms, a social distance measure

    is extended to consider spatial proximity [8], [9]. Shakar-ian et al. [8] proposed a community detection algorithm,Louvain-D, based on modularity maximization. Instead ofusing the original modularity, they used the distance modu-larity where the probability that two vertices are connectedin a null model is dependent on the distance betweenthe two vertices. Gennip et al. [9] proposed a spectral

    clustering algorithm for geosocial networks. The value ofthe adjacency matrix is updated by the weighted sum of theoriginal value (i.e., connectivity) and the Euclidean distancebetween two individuals.

    • Location-centric: In these algorithms, a spatial distancemeasure is extended to consider social proximity [10]. Shiet al. [10] proposed a model called density-based clusteringplaces in geo-social networks (DCPGS). A new geosocialdistance measure between two places is defined by theweighted sum of social distance and spatial distance be-tween the two places, where the social distance is definedby the number of users who visited both places. Then, avariant of DBSCAN is performed using this new distancemeasure.

    None of these algorithms, however, cares about the mappingseparability.

    B. Co-ClusteringCo-clustering is based on the duality between data

    points (e.g., locations) and features (e.g., users). That is, datapoints are grouped according to their distributions on features,and features are grouped according to their distributions ondata points. A majority of the algorithms adopt matrix factor-ization [21], [22], [15], [23].

    Algorithms based on matrix factorization are categorizedby the numbers of factors and regularizers. Ding et al. [22]decompose a matrix into two factor matrices with one regu-larizer on data points only. Cai et al. [21] decompose a matrixinto three factor matrices without regularizers. More recently,Gu and Zhou [15] and Shang et al. [23] decompose a matrixinto three factor matrices with two regularizers on data pointsand features, respectively. Because of the dual regularizers,the algorithm by Gu and Zhou is called dual regularized co-clustering (DRCC). It optimizes one variable while fixing theother two variables, and the dynamic variable that is optimizedalternates among the three variables. This procedure repeatsuntil convergence, and convergence is theoretically guaranteed.The cost of this iterative optimization is known to be veryhigh [24]. In fact, the work on co-clustering has used onlymedium-size data sets which contain up to a few hundredthousand data points.

    C. Other Related Problems1) Geosocial Recommendation: Geosocial recommenda-

    tion (e.g., [25], [18], [26], [4], [27]) and geosocial communitydetection commonly try to exploit social influence (socialrelationship), geographic influence (spatial distance), and userpreference (mapping between users and locations) all together.However, the goal of geosocial recommendation is clearlydifferent from that of geosocial community detection, becausegeosocial recommendation aims at predicting the probabilitythat a user will visit an unvisited location. Refer to AppendixA-A for more details.

    Because a few recommendation algorithms such asCLR [18] and CCCF [28] adopt co-clustering, one might arguethat the communities obtained by these algorithms could be

    https://github.com/jaegil/GEOCODEhttps://github.com/jaegil/GEOCODE

  • TABLE I: Summary of the notation.Notation Description

    G = (V,E, L,F)a geosocial network (V : users; E: user-userrelationships; L: locations; F : user-to-locationmappings)

    vi or vj ∈ V a vertex, i.e., a userli or lj ∈ L a locationnU = |V | the total number of usersnL = |L| the total number of locationsf(vi, lj) frequency of user-to-location mappingsL(vi) the location vector of the user viV(li) the visitor vector of the location li

    DS(vi, vj) the social distance between vi and vjDP (li, lj) the spatial distance between li and lj

    GL = {VL, EL}a square grid graph (VL: grid centers; EL: gridconnections)

    C ∈ C a geosocial communitynC = |C| the number of geosocial communities

    U user membership matrixR ∈ R a geosocial region

    nR = |R| the number of geosocial regionsL location membership matrixX user-to-location matrix

    used for our purpose. However, co-clustering in recommenda-tion is mainly used to confine the search space to a specificcommunity. Thus, co-clustering should be accompanied withfurther refinement or collaborative filtering, and it does notneed to be as sophisticated as our geosocial co-clustering. Asfar as we know, none of co-clustering in recommendation fullyuses the above-mentioned three aspects.

    2) Geosocial Group Querying: Geosocial group query-ing (e.g., [29], [30], [31], [32], [33]) and geosocial communitydetection commonly aim at finding users that satisfy bothsocial and spatial constraints, but geosocial group queryingis clearly distinct in that it requires a query user or locationto be given whereas geosocial community detection doesnot. Besides, although geosocial group querying is provento be useful for several applications, there are still manyother applications that need geosocial community detection,especially when providing a query point or user is infeasibleor inapplicable, e.g., terrorist monitoring [34], [8] and criminalmonitoring [9]. Refer to Appendix A-B for more details.

    III. PROBLEM STATEMENT

    In this section, we formally define the problem of geosocialco-clustering and formulate it by matrix factorization. Table Isummarizes the notation used throughout this paper.

    A. Geosocial Co-ClusteringFirst, the user-to-location matrix X is constructed by Eq.

    (1), where f(vi, lj) indicates the number of visits by a uservi to a location lj . The location vector and the visitor vectorare introduced using X in Definitions 1 and 2.

    X = [Xi,j ], where Xi,j = f(vi, lj) (1)

    Definition 1: The location vector of a user vi, L(vi), is thei-th row vector of X, which represents the frequency that vihas visited lj (j = 1, . . . , nL). �

    𝜀

    𝑙𝑖

    𝑙𝑗

    𝑔𝑗𝑔𝑖

    Fig. 2: Square grid graph over the spatial domain.

    Definition 2: The visitor vector of a location lj , V(lj), isthe j-th column vector of X, which represents the frequencythat lj has been visited by vi (i = 1, . . . , nU ). �

    Next, a geosocial network is formally defined by adding themappings to an ordinary graph, as in Definition 3.

    Definition 3: A geosocial network is an attributed graphG = (V,E, L,F), where V is the set of vertices (i.e., users),E is the set of edges (i.e., social relationships between users),L is the set of locations, and F is a mapping from a user vionto a location vector L(vi). Formally, F : V → NnL . �

    For V and L, any distance metrics can be used for thesocial distance DS and the spatial distance DP . We use thefollowing metrics for simplicity, both based on the commonlyused notion of the shortest-path distance in a graph.• DS(vi, vj): We use the shortest-path distance [35] between

    two users vi and vj on G. That is, it is the length (numberof edges) of the shortest paths.

    • DP (li, lj): We use the Manhattan-like distance because theusers are likely to move within cities. We first partitionthe spatial extent into square grids of side length ε, thusconstructing a square grid graph GL = {VL, EL} in Figure2. Here, a vertex is the center of a grid, and an edgeconnects the centers of two adjacent grids. Let gi and gj bethe vertices (i.e., grid centers) holding two locations li andlj . Then, DP (li, lj) is the shortest-path distance betweengi and gj on the square grid graph.

    Then, the communities on the user side and the clus-ters (regions) on the location side are defined in Definitions 4and 5 respectively. Here, the number of geosocial communi-ties, nC , and the number of geosocial regions, nR, are given byusers as in the k-means algorithm. There are several heuristics(e.g., [36], [37], [38]) that can be applied to determine thevalues of nC and nR.

    Definition 4: Geosocial communities are disjoint sets ofsimilar users in terms of their spatial presences and socialrelationships. The optimal set of nC geosocial communitiesis defined as C = {C1, . . . , CnC} that satisfies the followingconditions for all i and j (i 6= j, 1 ≤ i, j ≤ |C|, C ∈ C):(1)

    ∑C∈C

    ∑i,j ‖L(vi)− L(vj)‖ is minimized, and

    (2)∑

    C∈C∑

    i,j DS(vi, vj) is minimized. �Definition 5: Geosocial regions are disjoint sets of similar

    locations in terms of their spatial distances and shared visitors.The optimal set of nR geosocial regions is defined as R ={R1, . . . , RnR} that satisfies the following conditions for all iand j (i 6= j, 1 ≤ i, j ≤ |R|, R ∈ R):(1)

    ∑R∈R

    ∑i,j ‖V(li)−V(lj)‖ is minimized, and

    (2)∑

    R∈R∑

    i,j DP (li, lj) is minimized. �In addition, the goodness of geosocial co-clustering is

    formalized by Definition 6.

  • Definition 6: The mapping separability between commu-nities and regions is the proportion of the mappings in thematching pairs of a community and a region (e.g., solid-lineboxes in Figure 1c) out of all pairs between them. To formalizethe matching pairs, first let MS(C,R) be the strength ofmapping between a community C and a region R definedby Eq. (2).

    MS(C,R) =∑

    vi∈C,lj∈R

    f(vi, lj) (2)

    Then,M(C,R) defined by Eq. (3) is the set of matching pairs,with the maximum strength of mapping, between C and R; themapping is from the larger set to the smaller set, and argmaxhere returns a singleton set by breaking a tie in the mappingstrength arbitrarily.

    M(C,R)=

    {{〈C,R′〉|C∈C,R′∈argmaxR∈RMS(C,R)} if nC≥nR{〈C′,R〉|C′∈argmaxC∈CMS(C,R), R∈R} otherwise

    (3)

    The mapping separability is then formulated by Eq. (4), whichcalculates the ratio of the total mapping strength of matchingpairs over the total mapping strength of all pairs from C andR.

    Separability(C,R) =∑〈C,R〉∈M(C,R)MS(C,R)∑C∈C,R∈RMS(C,R)

    � (4)

    There are three aspects considered altogether in geosocialco-clustering: (i) social similarity on the user side, (ii) spatialsimilarity on the location side, and (iii) mapping between usersand locations. The social similarity aspect is addressed by thesecond requirement of Definition 4, and the spatial similarityaspect by the second requirement of Definition 5. The mappingaspect, which is the mapping separability, is satisfied by thefirst requirements of Definitions 4 and 5 because they makethe mappings from a subset on one side head for the samesubset on the other side.

    Based on these three aspects, the geosocial co-clustering isdefined as in Definition 7.

    Definition 7: Given a geosocial network G, geosocial co-clustering finds a disjoint set of geosocial communities, C ={C1, . . . , CnC} and a disjoint set of geosocial regions, R ={R1, . . . , RnR} in the same process to maximize an objectivethat combines the three aspects in Definitions 4, 5, and 6. �

    B. Formulation by Matrix FactorizationWe translate the problem of geosocial co-clustering in Def-

    inition 7 to that of non-negative matrix tri-factorization [15].1) Input Data Matrices: There are three input matrices for

    the three aspects of geosocial co-clustering.• A user affinity matrix WU in Eq. (5) is an nU×nU matrix,

    where an element WUi,j represents whether two users vi andvj are adjacent in G.

    WUi,j =

    {1, DS(vi, vj) ≤ 10, otherwise

    (5)

    • A location affinity matrix WL in Eq. (6) is an nL × nLmatrix, where an element WLi,j represents whether the

    locations li and lj are placed in the same grid or in twoadjacent grids.

    WLi,j =

    {1, DP (li, lj) ≤ 10, otherwise

    (6)

    • A user-to-location matrix X in Eq. (1) is an nU × nLmatrix.

    2) Output Membership Matrices: The goal of our problemis to group the users into nC communities C = {C1, . . . , CnC}as well as to group the locations into nR regions R ={R1, . . . , RnR}. Thus, we need to obtain the matrices thatrepresent the membership of each user to a community andthat of each location to a region. The membership matrixU ∈ {0, 1}nU×nC holds the clustering result of users. Ui,jbecomes 1 if the user vi belongs to the community Cj and0 otherwise. Similarly, we use another membership matrixL ∈ {0, 1}nL×nR that holds the clustering result of locations.

    3) Objective Function: The objective function incorporatesthe three aspects of geosocial co-clustering.

    First, to support the social similarity aspect, high similaritybetween two users is penalized unless they belong to the samecommunity. This requirement is formulated by Eq. (7). Here,the graph Laplacian is given by LU = DU −WU ; WUdefined as Eq. (5) is the symmetric adjacency matrix, and DU

    is a diagonal matrix with the degree of each vertex whereDUi,i =

    ∑j W

    Ui,j , i.e., the column sums of W

    U .

    Jsocial =∑i,j

    ‖Ui − Uj‖2WUi,j

    =2Tr(UTDUU)− 2Tr(UTWUU) = 2Tr(UTLUU)(7)

    Second, to support the spatial similarity aspect, high sim-ilarity between two locations is penalized unless they belongto the same region. This requirement is formulated by Eq. (8).Here, the graph Laplacian is given by LL = DL−WL, whereDLi,i =

    ∑j W

    Li,j .

    Jspatial =∑i,j

    ‖Li − Lj‖2WLi,j

    =2Tr(LTDLL)− 2Tr(LTWLL) = 2Tr(LTLLL)(8)

    Third, to support the mapping aspect, the users in a specificcommunity and the locations in a specific region shouldhave as tight connections as possible. This is where co-clustering [15], [23], [24] kicks in. The objective function ofco-clustering is usually represented by non-negative matrix(tri-)factorization, as in Eq. (9). Here, ‖ · ‖ represents theFrobenius norm. Compared to two-factor factorization, tri-factorization provides a better approximation by the additionof the scale matrix S, which absorbs the different scales of X,U, and L [24].

    Jmapping = ‖X−USLT ‖2F (9)

    Putting Eqs. (7), (8), and (9) together, our problem is tominimize Eq. (10). Here, λ ≥ 0 and µ ≥ 0 are used tobalance the reconstruction error of co-clustering in the firstterm (Eq. (9)) against the quality of the geosocial communitiesin the second term (Eq. (7)) and that of the geosocial regionsin the third term (Eq. (8)), respectively. Thus, Eq. (10) is

  • Algorithm 1 GEOCODE (Main Framework)INPUT: G, nC , nR: Table I; λ, µ: regularization parameters;OUTPUT: C and R;

    1: C ← ∅, R← ∅; /* Initialization */2: /* STEP 1: COARSENING (SECTION IV-A) */3: G′ = (V ′, E′, L′,F ′) ← Coarsening(G);4: /* STEP 2: DECOMPOSITION (SECTIONS IV-B–IV-C) */5: /* STEP 2-1: Algorithm 2 */6: Ordered V ′ and L′ ← Crossing Minimization(G′);7: /* STEP 2-2: Algorithm 3 */8: G′1,G′2, . . . ← Cut Point Detection(Ordered V ′ and L′);9: /* STEP 3: PARTIAL CO-CLUSTERING (SECTION IV-D) */

    10: parallel for each G′i do11: n′C , n

    ′R ← Parameter Setting(G′i, nC , nR);

    12: Ci,Ri ← Partial Co-Clustering(G′i, n′C , n′R, λ, µ);13: C ← C ∪ Ci, R← R∪Ri;14: end parallel for15: return C and R;

    represented as non-negative matrix tri-factorization with dualregularizers (i.e., the second term and the third term). We relaxthe binary constraints of U and L to facilitate optimization asin other co-clustering algorithms.JGEO CC =

    ‖X−USLT ‖2F + λ · 2Tr(UTLUU) + µ · 2Tr(LTLLL)s.t.U ≥ 0,L ≥ 0

    (10)

    4) Co-Clustering Algorithm: The DRCC algorithm [15] op-timizes the objective with respect to one variable while fixingthe other variables. This procedure repeats until convergence,because convergence is theoretically guaranteed. Using dualregularizers and tri-factorization is shown to improve thequality of the clustering result [15], [23]. Despite this advan-tage, the algorithm notoriously suffers from slow computationspeed because of intensive matrix multiplications involved ineach iteration step [24], which makes co-clustering difficult tosupport large-scale geosocial networks in the real world. Itstime complexity is quite high, as in Theorem 1.

    Theorem 1: The time complexity of the DRCC algo-rithm [15] is O(t (nUnL(nR+nC)+n2UnC +n

    2LnR)), where

    t is the number of iterations performed. This complexity canbe expressed as O(t(nUnL+n2U +n

    2L)) under the assumption

    that nU � nC and nL � nR hold, and further as O(tN2),where N = max(nU , nL).

    Proof: See Appendix B-A.

    IV. GEOCODE FRAMEWORKThis section proposes GEOCODE developed for fast and

    accurate approximate processing of geosocial co-clustering inSection III.Recap on Overall Procedure: GEOCODE consists of threesteps, as illustrated in Figure 3. Algorithm 1 formally de-scribes the overall procedure of GEOCODE, which is self-explanatory.• Step 1 (Coarsening) coarsens users and locations in order

    to reduce the size of a geosocial network (Section IV-A).For example, v′5 replaces a merger of v5, v7, and v9 inFigure 3.

    • Step 2 (Decomposition) first reorders users and locationsfor easy decomposition (Section IV-B) and then finds the

    optimal cuts to derive multiple sub-matrices (Section IV-C).For example, v1 ∼ v4 and l7 ∼ l10 constitute a sub-matrixin Figure 3.

    • Step 3 (Partial Co-Clustering) solves co-clustering foreach sub-matrix and merge the solutions (Section IV-D).

    Design Rationale: The coarsening and decomposition donot damage the accuracy significantly, due to the followingrationales behind them.• Coarsening is mainly motivated by the power-law distribu-

    tions of social connections and spatial densities. The activeusers with many social connections become the centers ofgroups in social coarsening, and other neighboring userscollapse into their groups. A similar idea holds for verydensely-populated grids in spatial coarsening.

    • Decomposition benefits from the correlation betweenfriendship and distance [2], [3], [4]. The locations visitedby a user have strong clustering tendency because theirfrequency degrades inversely proportional to the distancefrom home [4], [27]. This clustering tendency tends to beshared by the users tied with social friendship. Therefore,the subsets of the users and locations connected by thecommon user-to-location mapping are typically cleanlyidentifiable.

    A. Step 1: CoarseningA geosocial network G = (V,E,L,F) is reduced in size

    to G′ = (V ′, E′, L′,F ′) through social and spatial coarsen-ing. Social coarsening encapsulates a socially connected usergroup into one virtual user (Definition 8). Spatial coarseningapproximates the spatial coordinate of locations by spacepartitioning (Definition 9).

    Definition 8: Social coarsening converts the original set ofvertices and edges (V,E) to a reduced set of vertices andedges (V ′, E′), as defined by Eqs. (11) and (12). A single edgeei,j ∈ E is selected by a certain rule, and the two endpointsvi, vj ∈ V are merged into vnew. The edges that were incidentto either vi or vj now point to the newly added vertex vnew.�

    V ′ = V \ {vi, vj | ei,j ∈ E} ∪ {vnew} (11)

    E′ = E \ {ei,j}\ {ei,k | ei,k ∈ E ∧ 1 ≤ k ≤ nU}\ {ek,j | ek,j ∈ E ∧ 1 ≤ k ≤ nU}∪ {enew,k | ei,k ∈ E ∧ 1 ≤ k ≤ nU}∪ {ek,new | ek,j ∈ E ∧ 1 ≤ k ≤ nU}

    (12)

    In social coarsening, the key issue is how to pick the edgeswhose end vertices will be merged. We use a simple yeteffective heuristic called the sorted heavy edge matching [39],following the power-law distributions of social connections. Itvisits the vertices in the ascending order of their degrees whilebreaking ties randomly; for each vertex visited, it selects anincident edge with the highest weight among those whose endvertices have not merged yet. A benefit of this heuristic is thatthe partitions of a coarser graph tend to have relatively smalledge cuts [39].

    Definition 9: Spatial coarsening converts the original setof locations L to a reduced set of locations L′ by increasing

  • 𝑣2

    𝑣1

    𝑣4

    𝑣3

    𝑣5𝑣7

    𝑣9

    𝑣8

    𝑣6

    𝑣10

    𝑙8𝑙7

    𝑙10𝑙9

    𝑙1 𝑙4

    𝑙2 𝑙3

    𝑙5

    𝑙6

    𝑣5′ 𝑣6

    𝑙1′ 𝑙2

    𝑣1 𝑣2 𝑣3 𝑣4 𝑣5′ 𝑣6

    𝑙7 𝑙8 𝑙9 𝑙10𝑙1′ 𝑙2

    𝑣1𝑣2𝑣3𝑣4𝑣5′ 𝑣6

    𝑙7 𝑙8 𝑙9 𝑙10𝑙1′ 𝑙2

    1. Coarsening 2. Decomposition

    Users

    Locations

    𝐷1 𝐷2

    3. Partial Co-Clustering

    𝐷1

    𝐷2 Res

    ults

    Fig. 3: Illustration of the overall procedure of GEOCODE.

    v1 v2 v3 v4 v5

    l1 l2 l3 l4 l5

    (a) Input bipartite graph.

    v1 v3 v5 v2 v4

    l1 l3 l5 l2 l4

    (b) Optimal solution.Fig. 4: Example of crossing minimization.

    v1 v2 v3 v4 v5

    l1 l2 l3 l4 l5𝑟𝑗 1 2 3 4 5

    sm5,5=1sm3,5=1

    𝑟𝑖𝑛𝑒𝑤1.5 1.5 3.54 3.5 44.3

    Fig. 5: Concept of the motif-based weighting.the side length from ε to ε′ (> ε). Please notice that thelocations are collapsed to the center points of the grids of GL.(See Figure 2 about GL.) �

    The mapping between users and locations, F , is updated toreflect the coarsened users and locations.

    B. Step 2-1: Crossing MinimizationThis step aims to achieve the same effects as the three

    terms in Eq. (10) with regard to the co-clustering. In Eq.(10), the first term pressures the algorithm toward clusteringthe mappings between users and locations into “bundles” (i.e.,partitioned collections of mappings), the second term towardclustering similar users together, and the third term towardclustering similar locations together. With the coarsening inplace, in GEOCODE we use crossing minimization to pres-sure the algorithm toward clustering the mappings and usetwo types of motifs—social motif and spatial motif —towardclustering similar users and similar locations, respectively.

    Crossing minimization reorders the vertices on both sidesof a bipartite graph such that the number of edge crossings isminimized. For example, the input bipartite graph in Figure4a has 28 crossings, whereas the optimal solution in Figure4b has only 10 crossings. It has the same effect of minimizingthe first term in Eq. (10), because it can be used to get asolution of spectral clustering [16], which is in turn equivalentto ordinary non-negative matrix factorization [40], i.e., thefirst term. Because optimal crossing minimization is NP-hard [16], several heuristics have been proposed, including thebarycenter heuristic [16]. This heuristic assigns a new rank ofa vertex on one side using the weighted mean of those of itsneighbors on the other side until convergence is reached. Theintuition here is that the vertices with similar mappings shouldhave similar ranks.

    1) Motif-Based Weighting Scheme: In order to support thesecond and third terms as well, we extend the barycenterheuristic by proposing a novel weighting scheme, called the

    motif-based weighting. In Definition 10, the weight of an edge〈vi, lj〉 between a user vi and a location lj contains not onlythe number of visits by vi to lj but also the number of triadsformed by social relationship, called social motifs, and thatof triads formed by spatial adjacency, called spatial motifs.For example, a social motif is formed by v3, v5, and l5 inFigure 5, where the edge between v3 and v5 indicates socialrelationship.

    Definition 10: The motif-based weighting is to assign theweight of an edge 〈vi, lj〉 by wi,j = Xi,j + λ · smi,j + µ ·pmi,j , where smi,j and pmi,j are, respectively, the numbersof social and spatial motifs that the edge belongs to. Xi,j is thenumber of visits by vi to lj , and λ and µ are the regularizationparameters in Eq. (10). �

    By injecting the motif-based weighting into the heuristic,the rank of a vertex on each side is determined by Eq. (13).

    rnewi =

    ∑j∈N(i) wi,j × rj∑j∈N(i) wi,j

    (13)

    Here, rnewi is the new rank of the i-th vertex on one side, rjis the current rank of the j-th vertex on the other side, andN(i) is the neighbors of the i-th vertex.

    The weight of an edge is boosted if it is involved in social orspatial motifs, which makes the rank of a common neighborvertex more heavily affect those of the vertices socially orspatially close. For example, the ranks of v3 and v5, whichare socially close, are heavily affected by the rank of l5. Inthis way, reordering takes account of social relationship andspatial adjacency as well as mapping separability.

    Example 2: In Figure 5, suppose that Xi,j’s are all 1. Inthe typical weighting scheme using only Xi,j , the new rankson the user side are determined by the averages of the ranksof the neighbors, which are 1.5, 1.5, 3.5, 3.5, and 4. Then, v3and v5 may not have consecutive ranks since v4 can be placedbetween them. In contrast, assuming λ = µ = 1 for simplicityof calculation, the motif-based weighting boosts the ranks ofv3 and v5 to 4 and 4.3 respectively. Now, v3 and v5 shouldhave consecutive ranks. �

    2) Algorithm Details: Algorithm 2 outlines the steps ofmotif-based crossing minimization. One side of the inputbipartite graph in Figure 1 is static, and the other sides isdynamic; the roles of the two sides are switched through iter-ations until there occurs no more change of ranks. Specifically,in each iteration, for each vertex i on the dynamic side (eitherV ′ or L′), its new rank rnewi is calculated by Eq. (13) (Line5). Finally, the ranks are determined by assigning a uniqueinteger in the order of ri (Lines 10 to 11).

    Theorem 2: The time complexity of the motif-based cross-ing minimization in Algorithm 2 is O(t (|V ′|+ |V ′| log |V ′|+

  • Algorithm 2 Motif-Based Crossing MinimizationINPUT: G′; unordered V ′ and L′;OUTPUT: ordered V ′ and L′;

    1: dynamicSide← V ′, staticSide← L′;2: repeat3: isChanged← false;4: for each i ∈ dynamicSide do5: rnewi ← Calculate using Eq. (13);6: if ri 6= brnewi c and ri 6= drnewi e then7: ri ← rnewi , isChanged← true;8: end if9: end for

    10: Sort all vertices in dynamicSide by ri’s;11: Adjust their ranks in the sorted order;12: Swap dynamicSide and staticSide;13: until isChanged 6= true;14: return ordered V ′ and L′;

    |L′| + |L′| log |L′|)), where t is the number of iterationsperformed. This complexity can be expressed as O(tN logN),where N = max(|V ′|, |L′|).

    Proof: Suppose that the dynamic side is V ′. In eachiteration of the REPEAT loop (Lines 2 to 13), calculating rnewifor each vertex takes O(〈dv〉 · |V ′|), where 〈dv〉 is the averagenumber of user-to-location mappings per user (Lines 4 to 9);sorting all vertices takes O(|V ′| log |V ′|) (Line 10); adjustingthe ranks of the vertices takes O(|V ′|). When the dynamicside is switched to L′, |V ′| should be replaced with |L′|. Sincedv � |V ′| and dl � |L′|, the total complexity of the REPEATloop can be expressed as O(t (|V ′| + |V ′| log |V ′| + |L′| +|L′| log |L′|)). It can be further simplified to O(tN logN),where N = max(|V ′|, |L′|).C. Step 2-2: Cut Detection

    After the vertices on the two sides are reordered individ-ually, this step aims to split them into multiple subsets likethose boxed in Figure 4b, each of which forms a sub-matrixof the user-to-location matrix X. A set of k−1 cuts divide theset of users (or locations) into k subsets with maintaining theirorder. That is, the elements until the cut and those after thecut are allocated to different subsets. Then, each sub-matrixis determined by Definition 11. The two subsets of the sameorder form a sub-matrix since the users should have densemappings with the locations of similar ranks, and vice versa.

    Definition 11: V ′ is split into {V ′1 , . . . , V ′k} where ∀i 6= j,V ′i ∩V ′j = ∅ and

    ⋃ki=1 V

    ′i = V

    ′; L′ is split into {L′1, . . . , L′k}where ∀i 6= j, L′i ∩ L′j = ∅ and

    ⋃kj=1 L

    ′i = L

    ′. Then, asub-matrix Di = 〈V ′i , L′i〉 is defined by a pair of V ′i and L′i.�

    The key technical challenge is to select the optimal set ofcuts. To this end, we contend that conciseness and homogeneityshould be satisfied. As for conciseness, we do not wanttoo many small sub-matrices, because the communities orregions crossing sub-matrix boundaries cannot be discoveredby GEOCODE. As for homogeneity, the user-to-location map-pings in each sub-matrix should be as similar as possible.

    Evidently, there is a trade-off between these two properties.In one extreme, one can make the entire user-to-locationmatrix a sub-matrix, but then the homogeneity in the entire

    TABLE II: The notation for the MDL formulation.Notation Description

    Di a sub-matrixsV (Di) the number of users in DisL(Di) the number of locations in Di

    n(Di)the number of possible mappings in Di (= sV (Di) ×sL(Di))

    n0(Di) the number of mapping absences in Din1(Di) the number of mapping existences in DiC(Di) the code cost of Di

    matrix will be very poor. In the other extreme, one canmake each pair of vertices a sub-matrix, which achieves thehighest homogeneity because a single user or location hasthe same mapping by itself. It thus offers an optimizationproblem, which can be formulated by the minimum descriptionlength (MDL) principle [17].

    1) Formalization Using the MDL Principle: The cost ofthe MDL principle consists of two components: L(H) andL(D|H). Here, H means the hypothesis, and D meansthe data. The two components are informally stated as fol-lows [17]: “L(H) is the length, in bits, of the description ofthe hypothesis; and L(D|H) is the length, in bits, of thedescription of the data when encoded with the help of thehypothesis.” The best hypothesis H to explain D is the onethat minimizes the sum of L(H) and L(D|H).H corresponds to the cuts of the user and location sides, and

    D corresponds to the user-to-location mappings. As a result,finding a good split translates to finding the best hypothesisusing the MDL principle. Table II summarizes the notationused here.• L(H) is formulated by Eq. (14). The first term is required

    to describe the number of sub-matrices. Here, log∗ is theuniversal code length for integers [41]. The second andthird terms are required to describe the number of users andthe number of locations, respectively, in each sub-matrixDi.

    L(H) = log∗ k +

    k∑i=1

    {dlog2 sV (Di)e+ dlog2 sL(Di)e

    }(14)

    • L(D|H) is formulated by Eq. (15). The first term isrequired to describe the number of possible mappings inDi. The second term, called the code cost, represents thenumber of bits required to transmit the user-to-locationmappings within Di. If the mappings coincide among allusers (or locations), which means that Di is perfectlyhomogeneous, C(Di) becomes zero, but if the mappingsare different with one another, C(Di) becomes high.

    L(D|H) =k∑i=1

    {dlog2(n(Di) + 1)e+ C(Di)

    }(15)

    For the derivation of C(Di), we regard that a mappingbetween a user and a location is binary, ignoring thefrequency. Then, n(Di) = n0(Di)+n1(Di). By the infor-mation theory, an existence of a mapping and an absence ofa mapping can be encoded using − log2 n1(Di)/n(Di) bitsand − log2 n0(Di)/n(Di) bits, respectively, on average.Thus, C(Di) is formulated by Eq. (16).

    C(Di) = −n(Di)∑

    j∈{0,1}

    nj(Di)

    n(Di)log2

    nj(Di)

    n(Di)(16)

  • Algorithm 3 Cut Detection by MDLINPUT: Ordered V ′ and L′ by Algoritm 2;OUTPUT: A set of sub-matrices {G′1, . . . ,G′k}

    1: D← {〈V ′, L′〉}, minCost←MDL(D);2: repeat3: isDecreasing ← false;4: /* Pick up the one with the largest code cost */5: m← argmax1≤m≤|D| C(〈V ′m, L′m〉);6: /* Find the best split on the location side */7: p← argmin1≤p≤lend

    {C(〈V ′m, L′m[1, p]〉)+C(〈V ′m, L′m(p, lend]〉)

    };

    8: /* Find the best split on the user side */9: q ← argmin1≤q≤vend

    {C(〈V ′m[1, q], L′m[1, p]〉)+C(〈V ′m(q, vend], L′m(p, lend]〉)

    };

    10: /* Check if the MDL cost decreases */11: D′ ← D \ {〈V ′m, L′m〉}∪

    {〈V ′m[1, q], L′m[1, p]〉, 〈V ′m(q, vend], L′m(p, lend]〉};12: newCost←MDL(D′);13: if newCost < minCost then14: D← D′, minCost← newCost;15: isDecreasing ← true;16: end if17: until isDecreasing 6= false;18: return D; /* Best set of sub-matrices */

    2) Algorithm Details: Although theoretically plausible,finding the best hypothesis is computationally prohibitive sincewe need to consider all possible splits of the entire matrixeven without knowing the number of cuts. Thus, Algorithm 3is designed as an approximation algorithm that progressivelydivides the matrix and finds a better set of sub-matricesalternately for the users and for the locations in a greedyfashion. Initially, all users and locations form a single sub-matrix (Line 1). In every iteration, the sub-matrix with thelargest code cost is selected for split (Lines 4 to 5). Then, thebest split point is determined each for the location side andthe user side such that the split minimizes the code cost (Lines6 to 9). Here, a split is denoted by the interval of ranks in asubset, e.g., [1, p] and (p, lend]. If this split reduces the MDLcost, the set of sub-matrices D is updated to reflect the split(Lines 10 to 16). These procedures repeat until the MDL costdoes not decrease any longer. We note that the number of cutsis automatically determined.

    Theorem 3: The time complexity of the cut detection inAlgorithm 3 is O(t (|V ′| + |L′|)), where t is the number ofiterations performed. This complexity can be expressed asO(tN), where N = max(|V ′|, |L′|).

    Proof: In each iteration of the REPEAT loop (Lines 2 to17), finding the best split takes at most O(|L′|) for the locationside (Line 7) and O(|V ′|) for the user side (Line 9), becausethere are up to lend ≤ |L′| and vend ≤ |V ′| possible choices,respectively. Thus, the total complexity of the REPEAT loopcan be expressed as O(t (|V ′|+ |L′|)).

    Figure 6 illustrates the first two iterations of Algorithm3. In this figure, the rows indicate users, and the columnsindicate locations. A mapping is denoted by a point at theintersection of a user and a location. In the first iteration, thebest split is found along the location side (Figure 6(a)) andthen the user side of the entire matrix (Figure 6(b)). In the

    (a) (b)

    (c) (d)

    locations

    users

    p

    q

    p

    q

    1st

    2nd

    Fig. 6: The first two iterations of Algorithm 3.second iteration, there are two sub-matrices, and the secondsub-matrix is selected for split. The same procedure is appliedto this sub-matrix (Figures 6(c)–(d)).

    D. Step 3: Partial Co-ClusteringAfter identifying the set of sub-matrices, we can use any

    conventional co-clustering algorithm for each sub-matrix. Inthis paper, our natural choice is the DRCC algorithm [15].When applying DRCC to each sub-matrix, we need to providethe number of geosocial communities and the number ofgeosocial regions in that sub-matrix. To this end, nC andnR are distributed to each sub-matrix in proportion to thesum of Xi,j (i.e., the numbers of visits) in that sub-matrixout of the total sum of Xi,j . Then, the DRCC algorithmruns concurrently by using multiple threads or machines. Theresults of DRCC for all sub-matrices are merged to form thefinal result of geosocial co-clustering.

    V. EVALUATIONThe evaluation was done through two sets of experiments

    with the following respective focuses: (i) the benefits ofusing the geosocial co-clustering approach on the clusterquality (Section V-B) and (ii) the resulting performances ofGEOCODE (Section V-C).

    All experiments were performed on a PC with Intel Core i7-3770 3.4 GHz CPU and 32 Gbyte RAM, running Windows 7.The number of parallel threads in Step 3 was set to be 4, equalto the number of cores. All algorithms were implemented inJava.

    A. Experiment Setup1) Data Sets: Table III shows the profiles of four real-

    world geosocial networks used in our experiments. Brightkiteand Gowalla are the popular data sets available at the SNAPrepository [42]. Foursquare is provided by the courtesy ofSarwat et al. [43]. Foursquare s is a proprietary data set [44],which was collected from Gangnam District in Seoul, Koreaduring the period from April to December 2012; this dataset contains the reviews left by users for each venue. SinceFoursquare s covers only a single district, we use it only forcase study in Section V-B2.

    2) Compared Algorithms: In the first set of experi-ments, we compared GEOCODE with the user-centric ap-proach of Louvain-D [8] and the location-centric approach ofDCPGS [10]. In the second set of experiments, we compared

  • : Louvain-D : DCPGS : GEOCODE

    0.4

    0.5

    0.6

    0.7

    0.8

    Brightkite Gowalla Foursquare

    1 -

    Co

    nd

    uct

    ance

    (a) User-side community quality.

    0.1

    0.2

    0.3

    0.4

    0.5

    Brightkite Gowalla Foursquare

    Sil

    houet

    te

    (b) Location-side cluster quality.

    0

    0.1

    0.2

    0.3

    0.4

    Brightkite Gowalla Foursquare

    Map

    pin

    g s

    epar

    ability

    (c) Mapping separability.Fig. 7: Comparison of clustering quality in three aspects between geosocial co-clustering and existing approaches.

    TABLE III: Profiles of the four real-world geosocial networks.Data #users #locations #check-in’s #edgesSet (nU ) (nL) (

    ∑ij f(vi, lj)) (|E|)

    Brightkite 50,886 72,434 4,730,778 197,167Gowalla 99,563 79,725 6,272,297 456,830

    Foursquare 301,228 152,427 10,021,823 2,616,276Foursquare s 383 451 9,188 258

    GEOCODE with the state-of-the-art co-clustering algorithmDRCC [15].

    3) Parameter Settings: For both GEOCODE and DRCC,which are based on co-clustering, we used the same parametervalues. The regularization parameters in Eq. (10) were setto be λ = µ = 250 as empirically determined by Gu andZhou [15]. We also verify the impact of these parameter valuesin Section V-C3. Additionally, we set nC and nR to be acommon number by a widely-accepted heuristic for estimatingthe number of clusters [36], [37], [38], i.e., to bnU ·nL/|X>0|c,where |X>0| denotes the number of non-zero elements inX. For the parameters unique to GEOCODE, the number ofsub-matrices was automatically determined by Algorithm 3,and the proportions of nC and nR for each sub-matrix weredistributed by the heuristic in Section IV-D.

    For user-centric and location-centric approaches, since thesemantics of their parameters are not compatible with ourapproach, we used the default settings suggested by the au-thors. For DCPGS, we set the minimum number of neighborsminPts, social constraint τ , geosocial distance constraint �,and balancing parameter ω for social and spatial distance to be5, 0.7, 0.4, and 0.5, respectively, following the default values.maxD, the maximum distance between two adjacent spatialgrids, was naturally 2

    √2×grid size. For Louvain-D, we set the

    exponential distance decay γ to be 1/e, following the defaultvalue, too.

    4) Coarsening Schemes: Coarsening is performed to reducethe numbers of users and locations almost without sacrific-ing accuracy. Level 0 means the original sets of users andlocations. At level 0, the grid size in Figure 2 was set tobe 5 kilometers in the first three data sets and 100 metersin the fourth data set because it was collected from a muchsmaller region. As the level increases by 1, the numbers ofusers and locations are reduced by approximately 20%. Insocial coarsening, the sorted heavy matching algorithm [39]stops when the 20% criterion is met; in spatial coarsening,the grid size is gradually enlarged until the criterion is met.As a result, |V ′| ≈ 0.80|V | and |L′| ≈ 0.80|L| at level 1,|V ′| ≈ 0.64|V | and |L′| ≈ 0.64|L| at level 2, and so on.

    5) Performance Metrics: The following quality metricswere adopted or designed to show the merits of GEOCODE.

    User-Side Community Quality (Figure 7a): The conduc-tance [45] was used because it is designed for clustering ofgraph data.1 It is defined as the fraction of total edge volumethat points outside the cluster. A lower value means highercluster quality.Location-Side Cluster Quality (Figure 7b): The silhouettecoefficient [46] was used because it is designed for clusteringof spatial or numeric data. It is defined as the degree ofhow similar an object is to its own cluster compared to otherclusters. A higher value means higher cluster quality.Mapping Separability (Figures 7c, 9, and 12): Eq. (4) inDefinition 6 was used to quantify the mapping separability.Semantic Similarity (Table IV and Figure 8): Additionally,we designed the following two semantic similarity metricsfor the quantifiable benefits of geosocial co-clustering in real-world applications discussed in Section V-B2.• The review similarity in Eq. (17) means the textual sim-

    ilarity of the review collections written by the users in acommunity C. For this measure, a term vector is createdfor each user using his/her reviews, and cosine similarityis calculated for each pair of the term vectors ui and uj .Then, the cosine similarities between pairs, cos(ui,ui), areaveraged over all pairs.

    Review similarity =

    ∑|C|i=1

    ∑|C|j=1 cos(ui,uj)

    |C|(|C| − 1)/2 , where i < j(17)

    • The category similarity in Eq. (18) means the categoricalsimilarity of the set R of the places visited by the users ina community. For each pair of such places li and lj , thesum of the path lengths to the lowest common ancestor,LCA(li, lj), is measured in the hierarchical category tree2.Then, converting dissimilarity to similarity by an exponen-tial decay function, e−LCA(li,lj) between pairs are averagedover all pairs.

    Category similarity =

    ∑|R|i=1

    ∑|R|j=1 e

    −LCA(li,lj)

    |R|(|R| − 1)/2 , where i < j(18)

    High review and category similarities in a community orcluster indicate coherent interests of users in the community,that is, the high quality of the cluster as a community repre-sentation.B. Benefits of Geosocial Co-Clustering

    1) Quantitative Analysis: In order to verify that geosocialco-clustering works as designed with real-world data sets, we

    1We also used the internal density [45] as another metric. The resultsshowed almost the same trend as the conductance.

    2https://developer.foursquare.com/categorytree

    https://developer.foursquare.com/categorytree

  • TABLE IV: Semantic similarity of the users in communities.Review similarity Category similarity

    Louvain-D 0.141 0.011DCPGS 0.152 0.014

    GEOCODE 0.228 0.024

    compared GEOCODE, the user-centric approach of Louvain-D, and the location-centric approach of DCPGS in threeaspects: (i) the quality of communities on the user side, (ii) thequality of regions (clusters) on the location side, and (iii) themapping separability between them, as in Figure 7. In theseexperiments, the coarsening level was set to be 0. Since theuser-centric and location-centric approaches produce clusterson only one side, the clusters on the other side were derivedfor comparison purpose through the mappings, as shown withthe overlapping broken-line boxes in Figure 1.

    As for the quality of user-side communities in Figure 7a,Louvain-D performed the best since its clustering was centeredon the user side, and DCPGS performed the worst. GEOCODEachieved high quality, too, as close as 81.2–86.7% of Louvain-D. In contrast, as for the quality of location-side clustersin Figure 7b, DCPGS performed the best, and Louvain-Dperformed the worst. Again, GEOCODE achieved high quality,too, as close as 78.7–87.7% of DCPGS. Therefore, GEOCODEachieved balanced high quality on both sides as opposed to theone-sided quality (at the expense of the other side) achievedby Louvain-D and DCPGS.

    As for the mapping separability in Figure 7c, GEOCODEsignificantly outperformed the other two approaches by up to2.6 times, as expected from the fact that GEOCODE improvesthe mapping separability through multiple iterations alternat-ing on each side. In summary, we confirm that GEOCODEadditionally guarantees the mapping separability while hardlydegrading the quality of communities or regions on either side.Appendix C-A shows the results of additional experiments onmapping separability.

    2) Case Study with Real Application: We aim to demon-strate the effect of the mapping separability in local expertrecommendation using the Foursquare s data set. As dis-cussed earlier, the mapping separability enables us to findthe communities of local experts who exclusively concentrateon specific topics. Thus, high coherency within a communityor region should be achieved by GEOCODE. The reviewsimilarity measures the coherency within a community (i.e., onthe user side) in terms of the review contents, and the categorysimilarity measures the coherency within a region (i.e., on thelocation side) in terms of the venue categories.

    Louvain-D, DCPGS, and GEOCODE produced 18, 16, and18 communities or regions, respectively, from this data set.Table IV shows the average values of the review and categorysimilarities from the three approaches. GEOCODE naturallyachieved significantly higher similarities than Louvain-D andDCPGS. This result means that the interests of the users wereshared more closely in GEOCODE than in the other twoapproaches.Location-Side Semantics: Figure 8 shows the category simi-larity aspect through visualization. In this figure, the categories

    of different venues are color-coded, and the boundaries ofregions (clusters) are shown in broken lines; the categorysimilarity of each region is shown in parentheses next to thecluster label. In Figure 8a, Louvain-D incorrectly separatedvenues of the same category (in the orange area) into differentregions. In contrast, GEOCODE in Figure 8c correctly put thevenues in the same region. In Figure 8b, DCPGS incorrectlyput venues of several different categories (in the grey area) inthe same region. In contrast, GEOCODE in Figure 8c morecorrectly put them in separate regions. Overall, GEOCODE isdoing the best job in putting venues in regions coherently andcompletely among the three approaches. Note that the categoryinformation is not used at all for geosocial co-clustering.User-Side Semantics: Regarding the review similarity aspect,Table V shows the top ten most frequent keywords appearingin the reviews by the users of the communities mapped fromthe colored areas in Figure 8. The keywords are color-codedby the colors of their categories. Louvain-D and GEOCODEwere compared for the orange area in Figures 8a and 8c, andDCPGS and GEOCODE for the grey area in Figures 8b and8c. In both cases, the topics (i.e., colors) of the keywords weremore coherent in GEOCODE than in Louvain-D or DCPGS.For instance, for the DCPGS in the grey area, the keyword“noodle” about eastern restaurants blurred the overall maintopics, western restaurants and nightlife. Overall, the reviewsimilarity observed is highly coherent, thus showcasing thecommunity quality of GEOCODE on the user side as well.

    C. Performances of GEOCODE1) Comparison with Co-Clustering on the Whole: Please

    recall that GEOCODE transforms co-clustering into the paral-lel execution of multiple partial co-clustering. In this regard,we aim to verify that GEOCODE significantly improvesthe efficiency over the state-of-the-art co-clustering algorithmDRCC while compromising the accuracy only insignificantly.Table VI first reports the number of sub-matrices determined inStep 2 and provided to Step 3 in GEOCODE. Figures 9 and 10show the accuracy and efficiency, respectively, resulting fromGEOCODE and DRCC with varying the coarsening level foreach data set. Evidently, the coarsening level had a nontrivialeffect in GEOCODE but was irrelevant to DRCC. Besides,DRCC failed to produce results for the Foursquare data set(in Figures 9c and 10c) because of an out-of-memory error.Accuracy: The accuracy shown in Figure 9 is the mappingseparability measured when the coarsening level varied from 0to 5. The value of GEOCODE was very close to that of DRCC(different by as small as 5.2–7.2%) when the coarsening levelwas 0 (i.e., no coarsening), and when the level was increasedto 5, the gap of the value increased to only 22.1–24.7%, about4 times the initial gap.Efficiency: The efficiency shown in Figure 10 is the elapsedtime when the coarsening level varied from 0 to 5. Evenwithout coarsening (i.e., at level 0), GEOCODE greatly out-performed DRCC by 19–23 times and by the time the levelwas increased to 5, the ratio reached 56–69 times, about 3times the initial ratio.

  • : Eastern Restaurant : Dessert : Western Restaurant : Nightlife : Shop&Service

    𝐂𝐋𝟏(0.019) 𝐂𝐋𝟐(0.007)

    𝐂𝐋𝟑(0.015)

    (a) Louvain-D.

    𝐂𝐃𝟏(0.027) 𝐂𝐃𝟐(0.008)

    (b) DCPGS.

    𝐂𝐆𝟏(0.029) 𝐂𝐆𝟐(0.022)

    𝐂𝐆𝟑(0.019)

    (c) GEOCODE.Fig. 8: Location-side clusters discovered by Louvain-D, DCPGS, and GEOCODE (best viewed in color).

    TABLE V: Top ten keywords from the reviews in the colored areas of Figure 8 (best viewed in color).Louvain-D vs. GEOCODE(orange area) DCPGS vs. GEOCODE (grey area)

    Louvain-D (CL1 & part of CL3) GEOCODE (CG3) DCPGS (CD2) GEOCODE (CG2 & part of CG3)coffee, ice flakes, spicy seafood

    noodle, bread, beer, parking, refill,wine, service, spaghetti

    bread, beer, pizza, fried food,salad, music, reservation,discount, wine, weekend

    coffee, ice flakes, beer, refill,service, pizza, wine, kindness,

    spaghetti, noodle

    coffee, ice flakes, refill, service, pizza,beer, wine, bread, pizza, salad

    : DRCC : GEOCODE

    0

    0.1

    0.2

    0.3

    0.4

    0 1 2 3 4 5

    Map

    pin

    g s

    epar

    abil

    ity

    Coarsening level

    Only 5.2% diff 22.1% diff

    (a) Brightkite.

    0.1

    0.2

    0.3

    0.4

    0.5

    0 1 2 3 4 5

    Map

    pin

    g s

    epar

    abil

    ity

    Coarsening level

    Only 7.2% diff 24.7% diff

    (b) Gowalla.

    0

    0.1

    0.2

    0.3

    0.4

    0 1 2 3 4 5

    Map

    pin

    g s

    epar

    abil

    ity

    Coarsening level

    DRCC: N/A

    (c) Foursquare.Fig. 9: Mapping separability of GEOCODE and DRCC with varying the coarsening level.

    0 1 2 3 4 5

    Ela

    pse

    d t

    ime

    (sec

    )

    Coarsening level

    19X56X

    105

    104

    103

    102

    (a) Brightkite.

    0 1 2 3 4 5

    Ela

    pse

    d t

    ime

    (sec

    )

    Coarsening level

    23X69X

    105

    104

    103

    102

    (b) Gowalla.

    0 1 2 3 4 5E

    lap

    sed

    tim

    e (s

    ec)

    Coarsening level

    105

    104

    103

    102

    DRCC: N/A

    (c) Foursquare.Fig. 10: Elapsed time of GEOCODE and DRCC with varying the coarsening level.

    : Step 1 : Step 2 : Step 3

    0 1 2 3 4 5

    Ela

    pse

    d t

    ime

    (sec

    )

    Coarsening level

    1000

    500

    0

    (a) Brightkite.

    0 1 2 3 4 5

    Ela

    pse

    d t

    ime(

    sec)

    Coarsening level

    3000

    1000

    0

    2000

    (b) Gowalla.

    0 1 2 3 4 5

    Ela

    pse

    d t

    ime

    (sec

    )

    Coarsening level

    20000

    0

    10000

    (c) Foursquare.Fig. 11: Breakdown of GEOCODE elapsed time with varying the coarsening level.

    : DRCC : GEOCODE

    0

    0.1

    0.2

    0.3

    0.4

    0 250 500 750 1000 1250

    Map

    pin

    g s

    epar

    abil

    ity

    Regularizer λ(=μ)

    (a) Brightkite.

    0.1

    0.2

    0.3

    0.4

    0.5

    0 250 500 750 1000 1250

    Map

    pin

    g s

    epar

    abil

    ity

    Regularizer λ(=μ)

    (b) Gowalla.

    0

    0.1

    0.2

    0.3

    0.4

    0 250 500 750 1000 1250

    Map

    pin

    g s

    epar

    abil

    ity

    Regularizer λ(=μ)

    DRCC: N/A

    (c) Foursquare.Fig. 12: Impact of the regularization parameters λ and µ (λ = µ) on the mapping separability.

    These results confirm GEOCODE’s ability to enhance theefficiency of geosocial co-clustering at only marginal expenseof the accuracy. We assert that this improved efficiency issignificant enough to make GEOCODE resolve the inability

    of the conventional co-clustering algorithm to handle largenetworks due to the high computational overhead.

    2) Breakdown of Elapsed Time: Figure 11 shows the break-down of the elapsed time of GEOCODE into the three steps

  • TABLE VI: Number of sub-matrices provided to Step 3.

    DataC. level 0 1 2 3 4 5

    Brightkite 5 4 4 4 3 3Gowalla 6 6 5 4 4 4

    Foursquare 8 8 7 7 6 6

    discussed in Section IV. We observe that the first two stepstook only a minor portion of the total elapsed time—no morethan 6.8% for Step 1 and no more than 21.2% for Step 2,both much less than 72.6–82.1% for Step 3. This differenceis understandable from the running time complexities of theindividual steps. Step 1 (coarsening) takes linear-logarithmictime [39], Step 2 (decomposition) also takes linear-logarithmictime by Theorems 2 and 3, and Step 3 (partial co-clustering)takes quadratic time by Theorem 1. The breakdown profilethus indicates that the overhead of the extra two steps (Steps1 and 2) on the total elapsed time is not significant at all;in other words, the efficiency gain achieved by GEOCODEcomes at little additional cost for the pre-processing steps.

    3) Sensitivity to Regularization Parameters: We performeda sensitivity testing of the mapping separability3 for varyingthe values of the two regularization parameters λ and µ. Pleaserecall that λ and µ are used to adjust the influences of thesecond and third terms, respectively, in the GEOCODE motif-based weighting scheme (Definition 10) and also in the DRCCobjective function (Eq. (10)). Thus, our goal in this testingwas to see whether and how the second and third terms of themodif-based weighting scheme play the same roles as thosein the DRCC objective function.

    Figure 12 shows the results, where λ and µ were always setto the same value varying from 0 to 1, 250, and the coarseninglevel was set to be 0. We observe that GEOCODE and DRCCshowed almost exactly the same pattern as the parametervalues changed. The quality remained stable and close to themaximum while the parameter values were in the range of 250to 750. Additionally, the parameter value 250, used for both λand µ for DRCC according to Gu and Zhou [15], was shownto be a reasonable value for GEOCODE as well. Thus, theresults empirically confirm that the social and spatial motifsof the motif-based weighting scheme do address the socialsimilarity and spatial similarity aspects, respectively.

    4) Scalability with Data Size: We examined how theelapsed time of GEOCODE (as well as the user-centricLouvain-D and the location-centric DCPGS) increased forvarying data size. The theoretical time complexity ofGEOCODE was empirically verified. See Appendix C-B forthe results.

    VI. CONCLUSIONIn this paper, we proposed geosocial co-clustering which

    efficiently co-clusters the users in social networks and thelocations they visited. Geosocial co-clustering proved to detectmore desirable communities than the existing approaches bymaximizing the mapping separability. We first formulated it as

    3We conducted the tests on the qualities of communities and regions aswell, and the results looked almost the same.

    a mathematical non-negative matrix tri-factorization problem.Then, we developed the GEOCODE framework to improveits computational efficiency through the coarsening and de-composition of social users and locations, respectively, intogroups. The decomposition used the crossing minimizationtechnique as well as the MDL principle and enabled par-titioning of a user-to-location matrix into sub-matrices sothat individual sub-matrices could be co-clustered separately.The effect was several orders of magnitude speedup withcomparable accuracy. Besides, the overheads of the coarseningand decomposition steps proved to be very light, thus addingto the efficacy of the GEOCODE framework. Overall, webelieve that GEOCODE is a practical and useful frameworkfor running geosocial community detection from large-scalegeosocial networks.

    The future work includes supporting overlapping commu-nities and handling temporal evolution of communities.

    REFERENCES

    [1] J.-G. Lee and M. Kang, “Geospatial big data: Challenges and opportu-nities,” Big Data Research, vol. 2, no. 2, pp. 74–81, 2015.

    [2] L. Backstrom, E. Sun, and C. Marlow, “Find me if you can: Improvinggeographical prediction with social and spatial proximity,” in Proc. 19thInt’l Conf. on World Wide Web, 2010, pp. 61–70.

    [3] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility:User movement in location-based social networks,” in Proc. 17th ACMSIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, 2011,pp. 1082–1090.

    [4] H. Wang, M. Terrovitis, and N. Mamoulis, “Location recommendationin location-based social networks using user check-in data,” in Proc. 21stACM SIGSPATIAL Int’l Conf. on Advances in Geographic InformationSystems, 2013, pp. 364–374.

    [5] X. Wang, Y. Zhang, W. Zhang, and X. Lin, “Distance-aware influencemaximization in geo-social network,” in Proc. 32nd IEEE Int’l Conf. onData Engineering, 2016, pp. 1–12.

    [6] H. Yin, Z. Hu, X. Zhou, H. Wang, K. Zheng, N. Q. V. Hung, andS. W. Sadiq, “Discovering interpretable geo-social communities foruser behavior prediction,” in Proc. 32nd IEEE Int’l Conf. on DataEngineering, 2016, pp. 942–953.

    [7] J. Kim, J.-G. Lee, and S. Lim, “Differential flattening: A novel frame-work for community detection in multi-layer graphs,” ACM Trans. onIntelligent Systems and Technology, vol. 8, no. 2, p. 27, 2016.

    [8] P. Shakarian, P. Roos, D. Callahan, and C. Kirk, “Mining for geograph-ically disperse communities in social networks by leveraging distancemodularity,” in Proc. 19th ACM SIGKDD Int’l Conf. on KnowledgeDiscovery and Data Mining, 2013, pp. 1402–1409.

    [9] Y. van Gennip, B. Hunter, R. Ahn, P. Elliott, K. Luh, M. Halvorson,S. Reid, M. Valasik, J. Wo, G. E. Tita, A. L. Bertozzi, and P. J.Brantingham, “Community detection using spectral clustering on sparsegeosocial data,” SIAM Journal of Applied Mathematics, vol. 73, no. 1,pp. 67–83, 2013.

    [10] J. Shi, N. Mamoulis, D. Wu, and D. W. Cheung, “Density-based placeclustering in geo-social networks,” in Proc. 2014 ACM SIGMOD Int’lConf. on Management of Data, 2014, pp. 99–110.

    [11] I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoreticco-clustering,” in Proc. 9th ACM SIGKDD Int’l Conf. on KnowledgeDiscovery and Data Mining, 2003, pp. 89–98.

    [12] Z. Cheng, J. Caverlee, H. Barthwal, and V. Bachani, “Who is thebarbecue king of texas?: A geo-spatial approach to finding local expertson Twitter,” in Proc. 37th Int’l ACM SIGIR Conf. on Research andDevelopment in Information Retrieval, 2014, pp. 335–344.

    [13] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman, “Knowledgesharing and yahoo answers: Everyone knows something,” in Proc. 17thInt’l Conf. on World Wide Web, 2008, pp. 665–674.

    [14] A. Elbery, M. ElNainay, F. Chen, C.-T. Lu, and J. Kendall, “A carpoolingrecommendation system based on social VANET and geo-social data,”in Proc. 21st ACM SIGSPATIAL Int’l Conf. on Advances in GeographicInformation Systems, 2013, pp. 556–559.

  • [15] Q. Gu and J. Zhou, “Co-clustering on manifolds,” in Proc. 15th ACMSIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, 2009,pp. 359–368.

    [16] W. Ahmad and A. Khokhar, “chawk: An efficient biclustering algorithmbased on bipartite graph crossing minimization,” in Proc. VLDB Work-shop on Data Mining in Bioinformatics, 2007, pp. 1553–1558.

    [17] P. D. Grünwald, I. J. Myung, and M. A. Pitt, Advances in MinimumDescription Length: Theory and Applications. MIT press, 2005.

    [18] K. W.-T. Leung, D. L. Lee, and W.-C. Lee, “CLR: A collaborative loca-tion recommendation framework based on co-clustering,” in Proc. 34thInt’l ACM SIGIR Conf. on Research and Development in InformationRetrieval, 2011, pp. 305–314.

    [19] S. Gregory, “Finding overlapping communities using disjoint communitydetection algorithms,” Complex Networks, vol. 207, pp. 47–61, 2009.

    [20] M. Wang, C. Wang, J. X. Yu, and J. Zhang, “Community detectionin social networks: An in-depth benchmarking study with a procedure-oriented framework,” Proc. VLDB Endowment, vol. 8, no. 10, pp. 998–1009, 2015.

    [21] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegativematrix factorization for data representation,” IEEE Trans. on PatternAnalysis and Machine Intelligence, vol. 33, no. 8, pp. 1548–1560, 2011.

    [22] C. Ding, T. Li, W. Peng, and H. Park, “Orthogonal nonnegative matrixtri-factorizations for clustering,” in Proc. 12th ACM SIGKDD Int’l Conf.on Knowledge Discovery and Data Mining, 2006, pp. 126–135.

    [23] F. Shang, L. C. Jiao, and F. Wang, “Graph dual regularization non-negative matrix factorization for co-clustering,” Pattern Recognition,vol. 45, no. 6, pp. 2237–2250, 2012.

    [24] H. Wang, F. Nie, H. Huang, and C. Ding, “Nonnegative matrix tri-factorization based high-order co-clustering and its fast implementation,”in Proc. 11th IEEE Int’l Conf. on Data Mining, 2011, pp. 774–783.

    [25] J. Bao, Y. Zheng, and M. F. Mokbel, “Location-based and preference-aware recommendation using sparse geo-social networking data,” inProc. 20th ACM SIGSPATIAL Int’l Conf. on Advances in GeographicInformation Systems, 2012, pp. 199–208.

    [26] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee, “Exploiting geographical influ-ence for collaborative point-of-interest recommendation,” in Proc. 34thInt’l ACM SIGIR Conf. on Research and Development in InformationRetrieval, 2011, pp. 325–334.

    [27] J.-D. Zhang and C.-Y. Chow, “iGSLR: Personalized geo-social locationrecommendation - a kernel density estimation approach,” in Proc. 21stACM SIGSPATIAL Int’l Conf. on Advances in Geographic InformationSystems, 2013, pp. 334–343.

    [28] Y. Wu, X. Liu, M. Xie, M. Ester, and Q. Yang, “CCCF: Improvingcollaborative filtering via scalable user-item co-clustering,” in Proc. 9thACM Int’l Conf. on Web Search and Data Mining, 2015, pp. 73–82.

    [29] N. Armenatzoglou, R. Ahuja, and D. Papadias, “Geo-social ranking:Functions and query processing,” The VLDB Journal, vol. 24, no. 6, pp.783–799, 2015.

    [30] N. Armenatzoglou, S. Papadopoulos, and D. Papadias, “A generalframework for geo-social query processing,” Proc. VLDB Endowment,vol. 6, no. 10, pp. 913–924, 2013.

    [31] Y. Li, R. Chen, J. Xu, Q. Huang, H. Hu, and B. Choi, “Geo-social k-cover group queries for collaborative spatial computing,” IEEE Trans.on Knowledge and Data Engineering, vol. 27, no. 10, pp. 2729–2742,2015.

    [32] K. Mouratidis, J. Li, Y. Tang, and N. Mamoulis, “Joint search bysocial and spatial proximity,” IEEE Trans. on Knowledge and DataEngineering, vol. 27, no. 3, pp. 781–793, 2015.

    [33] D.-N. Yang, C.-Y. Shen, W.-C. Lee, and M.-S. Chen, “On socio-spatialgroup query for location-based social networks,” in Proc. 18th ACMSIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, 2012,pp. 949–957.

    [34] R. M. Medina and G. F. Hepner, “Advancing the understanding ofsociospatial dependencies in terrorist networks,” Transactions in GIS,vol. 15, no. 5, pp. 577–597, 2011.

    [35] D. Liben-Nowell and J. M. Kleinberg, “The link-prediction problemfor social networks,” Journal of the American Society for InformationScience and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

    [36] F. Can and E. A. Ozkarahan, “Concepts and effectiveness of thecover-coefficient-based clustering methodology for text databases,” ACMTrans. on Database Systems, vol. 15, no. 4, pp. 483–517, 1990.

    [37] C. Castro-Herrera, C. Duan, J. Cleland-Huang, and B. Mobasher, “Arecommender system for requirements elicitation in large-scale software

    projects,” in Proc. 2009 ACM Sympo. on Applied Computing, 2009, pp.1419–1426.

    [38] N. Hariri, C. Castro-Herrera, M. Mirakhorli, J. Cleland-Huang, andB. Mobasher, “Supporting domain analysis through mining and recom-mending features from online product listings,” IEEE Trans. on SoftwareEngineering, vol. 39, no. 12, pp. 1736–1752, 2013.

    [39] G. Karypis and V. Kumar, “A fast and high quality multilevel schemefor partitioning irregular graphs,” SIAM Journal on Scientific Computing,vol. 20, no. 1, pp. 359–392, 1998.

    [40] C. Ding, X. He, and H. D. Simon, “On the equivalence of nonnegativematrix factorization and spectral clustering,” in Proc. 2005 SIAM Int’lConf. on Data Mining, 2005, pp. 606–610.

    [41] T. Lee, “An introduction to coding theory and the two-part minimumdescription length principle,” International Statistical Review, vol. 69,no. 2, pp. 169–183, 2001.

    [42] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large networkdataset collection,” http://snap.stanford.edu/data, Jun. 2014.

    [43] M. Sarwat, J. J. Levandoski, A. Eldawy, and M. F. Mokbel, “LARS*:An efficient and scalable location-aware recommender system,” IEEETrans. on Knowledge and Data Engineering, vol. 26, no. 6, pp. 1384–1399, 2014.

    [44] M. Choy, J.-G. Lee, G. Gweon, and D. Kim, “Glaucus: Exploiting thewisdom of crowds for location-based queries in mobile environments,”in Proc. 8th AAAI Int’l Conf. on Weblogs and Social Media, 2014, pp.61–70.

    [45] J. Leskovec, K. J. Lang, and M. Mahoney, “Empirical comparison ofalgorithms for network community detection,” in Proc. 19th Int’l Conf.on World Wide Web, 2010, pp. 631–640.

    [46] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation andvalidation of cluster analysis,” Journal of Computational and AppliedMathematics, vol. 20, pp. 53–65, 1987.

    APPENDIX AMORE DETAILED SURVEY

    A. Geosocial Recommendation

    The algorithms for geosocial recommendation predict themissing probability that a user will visit unvisited locations.In many algorithms, collaborative filtering is improved usingvarious information in geosocial networks. Bao et al. [25] pro-posed a preference-aware location recommendation algorithmthat provides a user with a set of locations within a geospatialrange with the consideration of location history and socialopinions. Wang et al. [4] proposed a location recommendationalgorithm based on personalized PageRank, which integratesthe similarities of users into the user-based collaborativefiltering and filters out the locations that are far away from thevisited locations of the target user. Ye et al. [26] and Zhanget al. [27] proposed location recommendation algorithms thatexploit the distribution of the distances between two locationsvisited by the same user in order to model the user’s check-inbehaviors. The power-law distribution is simply assumed [26]while the distribution is learned from the user’s check-inhistory using kernel density estimation [27].

    Co-clustering has also been adopted for (location) recom-mendation. Leung et al. [18] and Wu et al. [28] proposed co-clustering based algorithms, CLR and CCCF respectively, tonarrow the search space of recommendation. CLR clusters auser-activity-location tripartite graph, focusing on the map-pings to activity types; CCCF clusters a user-item bipartitegraph, considering the mappings between users and items. Forfinal recommendation in both algorithms, additional collabo-rative filtering should be applied to each co-cluster.

    http://snap.stanford.edu/data

  • Algorithm 4 Dual Regularized Co-Clustering [15]INPUT: X, LU, LL; nU , nL, nC , nR: see Table I; λ, µ: regulariza-

    tion parameters;OUTPUT: Optimally partitioned U and L;

    1: Initialize U and L using k-means;2: while not convergent do3: S← (UTU)−1UTXL(LTL)−1;

    4: Uij ← Uij√

    [λL−UU+(XLST )++U(SLTLST )−]ij

    [λL+UU+(XLST )−+U(SLTLST )+]ij

    ;

    5: Lij ← Lij√

    [µL−LL+(XTUS)++L(STUTUS)−]ij

    [µL+LL+(XTUS)−+L(STUTUS)+]ij

    ;

    6: end while7: return U and L;

    B. Geosocial Group QueryingThe algorithms for geosocial group querying receive a

    reference point or user and then return a group of users who areclose to the reference point or user both socially and spatially.Their differences mostly lie in how they define the conditionsfor social proximity and spatial proximity. Armenatzoglouet al. [30] proposed a nearest star group (NSG) query; givena query point q and an integer m, an NSG query returnsa user group of size m, which forms a star subgraph inthe social network and minimizes the aggregated distanceof its members to q. Armenatzoglou et al. [29] proposed thegeosocial ranking (GSR) that ranks the users based on theirdistance to a query point q, the number of friends in thevicinity of q, and the connectivity of those friends. Li et al. [31]proposed a geosocial k-cover group (GSKCG) query; given aset of query points, a GSKCG query retrieves a minimumuser group in which each user is socially related to at least kother users and the associated regions of the users can jointlycover all the query points. Yang et al. [33] proposed a socio-spatial group query (SSGQ); given a rally point q, the numberp of activity invitees, and the average number k of unfamiliarpeople an invitee may have, an SSGQ returns a set of pusers from the social graph of the activity initiator such thatthe average spatial distance for each user to q is minimized.Mouratidis et al. [32] proposed a social and spatial rankingquery (SSRQ); given a query user uq , an SSRQ returns the top-k users according to the integrated measure obtained by theweighted sum of spatial proximity and social proximity to uq .All these algorithms are accompanied by novel indexes such asthe SaR-tree [31], the SR-tree [33], and the AIS index [32] thatstore social information in the leaf nodes of a spatial index.

    APPENDIX BPROOF OF THEOREMS

    A. Theorem 1Algorithm 4 shows the pseudocode of the DRCC algo-

    rithm [15]. Here, M, M+ and M− are matrices containingonly the positive elements and only the negative elements,respectively, of M such that M = M+ −M−.

    First, in Line 1, k-means takes O(ikn), where i is thenumber of iterations, k is the number of clusters, and n isthe number of data points, using the Lloyds algorithm. Thus,in our work the k-means clustering on U and L takes a total ofO(i(nUnC+nLnR)), where i is the larger number of iterations

    performed during k-means between U and L. Under theassumption that nU � nC and nL � nR hold, the complexitycan be expressed as O(i(nU +nL)). Second, in each iterationof the WHILE loop (Lines 2 to 6), matrix multiplications arethe dominant cost items among all matrix algebra operations.From the time complexity O(mnp) for multiplying an m×nmatrix and an n × p matrix, the time complexity of eachmatrix multiplication step in this algorithm is expressed inone of the following cubic forms: O(nUnRnC), O(nUnRnL),O(nCnLnU ), O(nCnLnR), O(n2RnL), O(nRn

    2L), O(n

    2CnU ),

    O(nCn2U ), O(n

    2CnR), and O(nCn

    2R). Under the assump-

    tion that nU � nC and nL � nR hold, O(nUnLnR),O(nUnLnC), O(n2UnC), and O(n

    2LnR) are dominant, and

    these terms can be further reduced to O(nUnL), O(n2U ), andO(n2L) if we regard nC and nR as small constants. Then,the total complexity of the WHILE loop can be expressedas O(t(nUnL + n2U + n

    2L)). It can be further simplified to

    O(tN2), where N = max(nU , nL). Since typically t � iholds, the total complexity of this algorithm is determined byLines 2 to 6.

    APPENDIX CMORE EXPERIMENT RESULTS

    A. Additional Experiments on Mapping SeparabilityTo further demonstrate the advantage of geosocial co-

    clustering on mapping separability, we conducted additionalexperiments using cross-validation commonly used for rec-ommendation systems.

    1) Settings: We used five-fold cross validation. The entireset of user-to-location mappings was randomly partitioned intofive folds. Four folds (i.e., 80%) were used as training data toobtain matching pairs of a community and a region,M(C,R),with Louvain-D, DCPGS, and GEOCODE respectively. Then,using the remaining one fold (i.e., 20%) as test data, weexamined how many mappings of that fold fell into a pair〈C,R〉 ∈ M(C,R).

    Following the convention of location recommendation, wedefine the precision, recall, and F1-score in Eqs. (19), (20),and (21). Here, the notation in Definition 6 is used; in addition,X ′ is the set of mappings in the remaining one fold, andδ(v, l) returns 1 if the mapping between v and l falls into apair 〈C,R〉 ∈ M(C,R) and otherwise 0. For a pair 〈C,R〉∈ M(C,R), the precision indicates how many mappings inX ′ from C head for R, and the recall indicates how manymappings in X ′ heading for R are from C.

    Precision =

    ∑〈C,R〉∈M(C,R)

    ∑v∈C∧〈v,l〉∈X′ f(v, l) · δ(v, l)∑

    〈C,R〉∈M(C,R)∑v∈C∧〈v,l〉∈X′ f(v, l)

    (19)

    Recall =

    ∑〈C,R〉∈M(C,R)

    ∑v∈C∧〈v,l〉∈X′ f(v, l) · δ(v, l)∑

    〈C,R〉∈M(C,R)∑l∈R∧〈v,l〉∈X′ f(v, l)

    (20)

    F1 = 2 ·Precision · Recall

    Precision + Recall(21)

  • : Louvain-D : DCPGS : GEOCODE

    0.1

    0.2

    0.3

    0.4

    0.5

    Brightkite Gowalla Foursquare

    F1

    score

    Fig. 13: Accuracy results of cross validation.2) Results: Figure 13 shows the F1-scores of Louvain-

    D, DCPGS, and GEOCODE for the three data sets. Werepeated the same experiment five times with different randompartitions and calculated the averages. It is observed thatGEOCODE achieved the highest F1-score in all three datasets. More specifically, GEOCODE outperformed Louvain-Dby 2.0–2.7 times and DCPGS by 1.3–1.5 times. These resultsmean that the communities and regions obtained by 80% ofmappings (training data) reflect the unknown 20% of mappings(test data) more precisely and completely in GEOCODE thanin Louvain-D and DCPGS.

    : Louvain-D : DCPGS : GEOCODE 12 : GEOCODE

    Ela

    pse

    d t

    ime

    (sec

    )

    Data size

    103

    102

    104

    105

    x1 x2 x3 x4 x5

    Fig. 14: Scalability with data size (×1 to ×5).B. Scalability with Data Size

    We increased the size of the Foursquare data set, the largestamong the four data sets in Table III, up to 5 times. In each stepof the increase, we replicated the original network struct