Data and Information Systems Laboratory University of Illinois Urbana-Champaign

ASONAM 2011 July, 2011

Geo-Friends Recommendation in GPS-Based Cyber-Physical Social Network

Xiao Yu, Ang Pan, Lu-An Tang, Zhenhui Li, Jiawei Han

University of Illinois at Urbana-Champaign

Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), IBM & Boeing

Roadmap

• Motivation

• Background and Preliminaries

• Geo-friend Finding Framework

• GPS Pattern Extraction

• Build Pattern-Based Information Network

• Random Walk with Restart on Heterogeneous Information Network

• Experiments

• Conclusions

Motivation: Popularity of Mobile Devices

• Mobile devices: Very popular, a major medium of communication

• Data from mobile devices (like real time GPS location, moving trajectories): Reflect users’ daily activities and real life social interactions

• Social network services: Allow users to store and share locations and trajectories collected from their mobile devices

A List of Major Location-Based Social Network Services

Foursquare, Facebook Places, Google Latitude, Twitter Location Update, Yelp Check-in, Google+, …

Motivation: Geo-Friends Recommendation

• A social network with data collected from sensors is usually referred to as a Cyber-Physical Social Network

• Problem to be solved: Friend recommendation in GPS-based cyber-physical social networks, by combining GPS data with social network information

• Our method discovers real life friends on web-based social networks

• Geo-Friends: Potential real life friends, who have both social similarities and geographical correlation

A Geo-Friend Finding Example

• Real life friends play an important role in off-line social events, while most virtual on-line friends cannot fulfill such a social function

• Alex needs geo-friends to join him in a local charity event

• Bob is a college friend who now lives in another country

• Carlos is a co-worker but has no social network similarity with Alex

• David shares common friends with Alex and goes to the same gym and the same game store

David is more likely to be Alex’s geo-friend, but we cannot get this information by analyzing the social network or the GPS data alone.

Contribution

• Propose a geo-friend recommendation problem, and discuss how it differs from the previously studied link prediction problem

• Define and generate a set of GPS patterns to describe people’s real life social interaction and correlation

• Propose a random walk-based statistical framework for geo-friend recommendation

• Design and conduct a series of experiments on both synthetic and real-world datasets

• Demonstrate the power of our method in various situations

Data Model

• GPS Trajectory: The GPS records of a particular user, connected sequentially in ascending order of timestamps

• GPS-Based Cyber-Physical Social Network: G(S, V, E)

• V: Set of people in the network

• E: Set of edges, representing all the links between people

• S: Set of GPS trajectories associated with people

Problem Definition

• Given G(S, V, E), and a particular query posed by person v∗

• Return a ranked list of people nodes in V such that, for each element v′ in the list, $\langle v^*, v' \rangle \notin E$ (v′ is not already linked to v∗)

• The ranking score should depend on both the GPS trajectories S and the social network (V, E)

Geo-Friends Finding Framework: 3 Steps

• GPS pattern extraction

• Convert raw, noisy GPS data to meaningful and representative GPS patterns

• Pattern-based heterogeneous information network building

• Combine geographical and social information together in one network

• Random walk with restart on the network

• Use random walk score to measure similarity between people vertices

GPS Pattern Extraction

• Based on empirical observations and heuristics, we propose four different GPS patterns to capture people’s real life social interaction and correlation

• First, convert the raw GPS trajectory dataset S to a categorical dataset Scat and a sequential dataset Sseq

• Scat: Discards temporal information and keeps the discretized locations in an unordered manner

• Sseq: Locations are sequentially connected in the order of their timestamps
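
The conversion can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each GPS record is a `(timestamp, latitude, longitude)` tuple and that locations are discretized by snapping them to a square grid. The names `discretize`, `to_sequential` and `to_categorical`, the `cell` size, and the collapsing of consecutive repeats of the same cell are all assumptions made for the example.

```python
import math
from typing import FrozenSet, List, Tuple

Record = Tuple[float, float, float]  # (timestamp, latitude, longitude)
Cell = Tuple[int, int]               # discretized location (grid cell id)


def discretize(lat: float, lon: float, cell: float = 0.5) -> Cell:
    """Snap a coordinate to a square grid cell (assumed discretization)."""
    return (math.floor(lat / cell), math.floor(lon / cell))


def to_sequential(traj: List[Record], cell: float = 0.5) -> List[Cell]:
    """Sseq entry: cells in ascending timestamp order.

    Consecutive repeats of the same cell are collapsed (an assumption).
    """
    seq: List[Cell] = []
    for _, lat, lon in sorted(traj):
        loc = discretize(lat, lon, cell)
        if not seq or seq[-1] != loc:
            seq.append(loc)
    return seq


def to_categorical(traj: List[Record], cell: float = 0.5) -> FrozenSet[Cell]:
    """Scat entry: the unordered set of visited cells, timestamps dropped."""
    return frozenset(discretize(lat, lon, cell) for _, lat, lon in traj)
```

Both functions take one person's trajectory; applying them to every trajectory in S would yield Sseq and Scat respectively.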

FL-Pattern

• FL-Pattern: A closed frequent pattern with support ≥ 2 in Scat is defined as a Frequent Location Pattern

• Frequent patterns in Scat could be generated using FP-Growth

• Heuristic: GPS locations can reflect people’s interests, and people tend to go to their interest-related locations more often

• If two people share common locations, which suggests they might share common interests, the probability that they become friends would be higher.
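
To make the definition concrete, here is a small brute-force sketch of closed frequent itemset mining. It is not FP-Growth (which the slide names and which scales far better); it simply enumerates candidates, which is only practical for tiny inputs. It assumes each entry of Scat is one person's set of locations and that support counts how many entries contain the pattern; the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Sequence


def closed_frequent_itemsets(
    transactions: Sequence[FrozenSet[Hashable]], min_support: int = 2
) -> Dict[FrozenSet[Hashable], int]:
    """Brute-force closed frequent itemsets (illustration, not FP-Growth)."""
    items = sorted(set().union(*transactions))
    support: Dict[FrozenSet[Hashable], int] = {}
    # Count the support of every candidate itemset.
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = frozenset(cand)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                support[s] = count
    # Keep a pattern only if no proper superset has the same support.
    return {
        s: c
        for s, c in support.items()
        if not any(s < other and c == oc for other, oc in support.items())
    }
```

For three people with location sets {gym, store, cafe}, {gym, store} and {cafe}, this keeps {gym, store} and {cafe}; {gym} and {store} are dropped because {gym, store} has the same support.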

FT-Pattern

• FT-Pattern: A closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq is a Frequent Trajectory Pattern

• Sequential patterns in Sseq can be generated using PrefixSpan

• Heuristic: GPS trajectory segments indicate people’s habits and routines

• People who share similar routines tend to become friends
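
A compact sketch of this step is shown below. It implements a basic PrefixSpan for sequences whose elements are single locations, then filters the result by length and closedness. It assumes each entry of Sseq is one person's ordered list of locations; the function names and the closedness check by pairwise comparison are choices made for the example, not details taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def prefixspan(sequences: Sequence[Sequence[Hashable]],
               min_support: int = 2) -> Dict[Pattern, int]:
    """Frequent sequential patterns (single-item elements) with supports."""
    found: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start offset of the suffix).
        counts: Dict[Hashable, int] = {}
        for sid, pos in projected:
            for item in set(sequences[sid][pos:]):
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count < min_support:
                continue
            pattern = prefix + (item,)
            found[pattern] = count
            # Project each suffix past the first occurrence of item.
            new_proj = []
            for sid, pos in projected:
                seq = sequences[sid]
                for j in range(pos, len(seq)):
                    if seq[j] == item:
                        new_proj.append((sid, j + 1))
                        break
            mine(pattern, new_proj)

    mine((), [(i, 0) for i in range(len(sequences))])
    return found


def is_subsequence(p: Pattern, q: Pattern) -> bool:
    """True if p can be obtained from q by deleting elements."""
    it = iter(q)
    return all(x in it for x in p)


def ft_patterns(sequences: Sequence[Sequence[Hashable]],
                min_support: int = 2, min_length: int = 2) -> Dict[Pattern, int]:
    """Closed sequential patterns with the given support and length bounds."""
    freq = prefixspan(sequences, min_support)
    return {
        p: s for p, s in freq.items()
        if len(p) >= min_length
        and not any(len(q) > len(p) and s == t and is_subsequence(p, q)
                    for q, t in freq.items())
    }
```

The pairwise closedness check is quadratic in the number of frequent patterns, so a dedicated closed-pattern miner would be preferable on real data.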

FLT-Pattern

• FLT-Pattern: For each FL-Pattern, if its locations share the same timestamps in all corresponding GPS trajectories, and no super-pattern with the same support can be generated by adding another time-constrained location, the pattern is a Frequent Location with Time Constraint Pattern

• Heuristic: If two people share the same locations at the same timestamps in their GPS trajectories, they should be geographically related.
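
One possible reading of the time constraint is sketched below: each visit becomes a (location, time slot) item, and an item is kept when at least two people have it. This covers only single timed locations; a full FLT-Pattern miner would additionally run closed-pattern mining over these timed items. The one-hour slot width, the input layout, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Visit = Tuple[float, Hashable]        # (timestamp in seconds, location)
TimedItem = Tuple[Hashable, int]      # (location, time-slot index)


def timed_items(visits: List[Visit], slot: float = 3600.0) -> Set[TimedItem]:
    """Turn visits into (location, time-slot index) items."""
    return {(loc, int(ts // slot)) for ts, loc in visits}


def shared_timed_locations(
    trajectories: Dict[str, List[Visit]],
    min_support: int = 2,
    slot: float = 3600.0,
) -> Dict[TimedItem, Set[str]]:
    """Timed locations visited by at least min_support people."""
    visitors: Dict[TimedItem, Set[str]] = defaultdict(set)
    for person, visits in trajectories.items():
        for item in timed_items(visits, slot):
            visitors[item].add(person)
    return {item: people for item, people in visitors.items()
            if len(people) >= min_support}
```

Fixed slots miss co-visits that straddle a slot boundary; overlapping windows would be one way to soften that.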

FTT-Pattern

• FTT-Pattern: Similarly to the FLT-Pattern, a Frequent Trajectory with Time Constraint Pattern is a closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq that occurs in the same time period in the corresponding GPS trajectories

• Heuristic: If two people share the same routine in a specific time period, they are likely hanging out together in that time period

• If two people hang out, the probability of their becoming geo-friends is higher

Pattern-Based Social Network

• Build a pattern-based heterogeneous information network by combining GPS patterns and social network structures

• Given G(S, V, E), first discard raw GPS trajectory set S

• Then for each GPS pattern, create an additional node p, and link corresponding person node v with p if this GPS pattern exists in person v’s GPS trajectory history

Pattern-Based Social Network (2)

• Create a new edge <v, p>, and add it to E′. Set E′ contains three types of edges: edges between people, edges from person nodes to pattern nodes, and edges from pattern nodes to person nodes.
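
The construction can be sketched as follows. The example assumes friendships are symmetric and that pattern mining has already produced, for each person, the set of pattern ids found in that person's trajectory history; `build_pattern_network` and its argument names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def build_pattern_network(
    friendships: Iterable[Edge],
    patterns_of: Dict[Hashable, Set[Hashable]],
) -> Tuple[Set[Edge], Set[Edge], Set[Edge]]:
    """Return the three directed edge sets that make up E'."""
    person_person: Set[Edge] = set()
    for u, v in friendships:
        # Friendship is treated as symmetric (an assumption).
        person_person.update({(u, v), (v, u)})
    # Link person v to pattern p when p occurs in v's trajectory history.
    person_pattern = {(v, p) for v, pats in patterns_of.items() for p in pats}
    # Every person-to-pattern edge gets a matching pattern-to-person edge.
    pattern_person = {(p, v) for v, p in person_pattern}
    return person_person, person_pattern, pattern_person
```

Two people who are not friends but share a pattern node become reachable from each other in two hops, which is what lets the later random walk pick up geographical correlation.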

Pattern Refinement

• Adding a large number of GPS patterns without selection may badly degrade performance

• Common locations, e.g., bus stops and hospitals, carry no social similarity

• Instead of manually refining patterns, we employ an entropy-based thresholding measure* to refine and select discriminative GPS patterns

• This method filters out patterns with high frequency and low length

* J.N. Kapur, P.K. Sahoo and A.K.C. Wong. A new method for gray-level picture thresholding using the entropy of the histogram. In Computer Vision, Graphics, and Image Processing, March 1985.
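
For reference, the cited thresholding idea picks the cut point of a histogram that maximizes the sum of the entropies of the two resulting classes. The sketch below applies it to a generic histogram. How the authors build the histogram from pattern frequency and length is not stated in the slides, so the input here is simply a hypothetical list of counts per bin, and `kapur_threshold` is an illustrative name.

```python
import math
from typing import Sequence


def kapur_threshold(hist: Sequence[float]) -> int:
    """Index t maximizing the summed entropies of bins [0..t] and (t..end].

    Expects a histogram with a positive total count.
    """
    total = float(sum(hist))
    probs = [h / total for h in hist]

    def entropy(part: Sequence[float], mass: float) -> float:
        # Entropy of the bins in `part`, renormalized by their total mass.
        return -sum((p / mass) * math.log(p / mass) for p in part if p > 0)

    best_t, best_h = 0, float("-inf")
    cum = 0.0
    for t in range(len(probs) - 1):
        cum += probs[t]
        if cum <= 0.0 or cum >= 1.0:
            continue  # one side would be empty
        h = entropy(probs[:t + 1], cum) + entropy(probs[t + 1:], 1.0 - cum)
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

On a bimodal histogram the returned index falls between the two modes, which is the behaviour a pattern filter would rely on to separate common patterns from discriminative ones.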

Edge Weights: between Pattern Nodes and Person Nodes

• After the construction of the heterogeneous information network, edge weights between nodes need to be defined

• From person nodes to the different types of GPS pattern nodes

Nbp(v) is the set of pattern nodes connected to person node v

length(p) denotes the length of pattern p

timespan(p) denotes the time span of a time constraint pattern p

Parameters α, β, γ and θ control pattern importance

Edge Weights (2)

• From pattern nodes to person nodes

• Nbv(p) denotes the set of person nodes connecting to pattern node p

• From person nodes to person nodes

• Nbv(v) denotes the set of person nodes connected to person node v

Transition Matrix

• In order to apply random walk with restart on the network, we need to convert the network into a transition matrix and then normalize the edge weights of pattern nodes

• Pr(V) is a |V| × |V| matrix representing the transition probability from person nodes to person nodes

• Pr(A) is a |P| × |V| matrix representing the transition probability from GPS pattern nodes to person nodes

• Pr(B) is a |V| × |P| matrix representing the transition probability from person nodes to GPS pattern nodes
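
A minimal sketch of the assembly step follows. The edge-weight formulas are not reproduced in this transcript, so the three weight matrices are taken as given inputs. The block layout (person nodes first, pattern nodes second, no pattern-to-pattern edges) and plain row normalization are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def row_normalize(weights: Sequence[Sequence[float]]) -> Matrix:
    """Scale each row of a non-negative weight matrix to sum to 1."""
    out: Matrix = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out


def block_transition(
    w_vv: Sequence[Sequence[float]],   # |V| x |V| person -> person weights
    w_vp: Sequence[Sequence[float]],   # |V| x |P| person -> pattern weights
    w_pv: Sequence[Sequence[float]],   # |P| x |V| pattern -> person weights
) -> Matrix:
    """Assemble [[V->V, V->P], [P->V, 0]] and row-normalize it."""
    n_patterns = len(w_pv)
    rows = [list(vv) + list(vp) for vv, vp in zip(w_vv, w_vp)]
    rows += [list(pv) + [0.0] * n_patterns for pv in w_pv]
    return row_normalize(rows)
```

After normalization the upper-left, lower-left and upper-right blocks play the roles of Pr(V), Pr(A) and Pr(B) respectively.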

Why Choose Random Walk with Restart

• Random Walk with Restart can simulate the following aspects of friend finding in a GPS-based social network

• If a GPS pattern contains more geographical information, the in-coming probability from person nodes to this pattern should be higher, which increases the probability of moving from one person to another via this GPS pattern

• If two people share more GPS patterns, the overall probability for one person to link to another via these GPS pattern nodes is higher

• If a GPS pattern is rare, the out-going probability of its node is larger, so people connected to this pattern have a higher probability of being linked together

Random Walk with Restart

• Denote the query person as v∗. The random walk process can be represented as:

• $R_N$ is a vector that represents the link relevance of all the nodes to the query person v*

• $R_N^{(t)}$ represents the link relevance of each node at the t-th iteration

• We set $R_N^{(0)}(v^*) = 1$, where v* is the query node, and all the other elements to 0
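
The iteration can be sketched as below. The slide's own update equation is not reproduced in this transcript, so the code uses the common form of random walk with restart, r ← c·r⁰ + (1 − c)·Mᵀr, where M is a row-stochastic transition matrix over person and pattern nodes. The restart probability `c = 0.15`, the stopping rule, and the helper `recommend` are assumptions for illustration.

```python
from typing import List, Sequence, Set


def random_walk_with_restart(
    M: Sequence[Sequence[float]],
    query: int,
    restart: float = 0.15,
    max_iter: int = 200,
    tol: float = 1e-12,
) -> List[float]:
    """Iterate r <- restart * r0 + (1 - restart) * M^T r until it settles."""
    n = len(M)
    r0 = [0.0] * n
    r0[query] = 1.0          # all initial relevance sits on the query node
    r = r0[:]
    for _ in range(max_iter):
        nxt = [
            restart * r0[j]
            + (1.0 - restart) * sum(r[i] * M[i][j] for i in range(n))
            for j in range(n)
        ]
        done = max(abs(a - b) for a, b in zip(nxt, r)) < tol
        r = nxt
        if done:
            break
    return r


def recommend(scores: Sequence[float], query: int, n_people: int,
              friends: Set[int]) -> List[int]:
    """Rank person nodes (indices < n_people) not yet linked to the query."""
    candidates = [v for v in range(n_people)
                  if v != query and v not in friends]
    return sorted(candidates, key=lambda v: scores[v], reverse=True)
```

`recommend` drops pattern nodes and existing friends from the ranking, matching the requirement that returned people are not already linked to the query person.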

Datasets

• We generate 4 synthetic datasets with different sizes, attributes and distributions in order to cover different scenarios and thoroughly test our framework

• We also apply our method to the MIT Reality Mining dataset

Competitor Methods

• Random: random selection

• Same Edge: choose friends based on the number of common friends

• GPS Similarity: choose friends by measuring GPS location and trajectory similarity

• Random Walk without GPS Patterns: Recommend friends by applying random walk with restart on the original social network

• Bluetooth (only MIT dataset): Recommend friends by returning people who share high meeting frequency
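
As an illustration of the simplest structural baseline, the sketch below ranks candidates by how many friends they share with the query person. This is one reading of the "Same Edge" competitor; the slides give no further detail, so the data layout, tie-breaking rule, and function name are assumptions.

```python
from typing import Dict, Hashable, List, Set


def same_friend_ranking(friends: Dict[Hashable, Set[Hashable]],
                        query: Hashable) -> List[Hashable]:
    """Rank non-friends of query by number of shared friends, descending."""
    mine = friends[query]
    scores = {
        v: len(mine & theirs)
        for v, theirs in friends.items()
        if v != query and v not in mine
    }
    # Sort by score, then by name so ties are broken deterministically.
    return sorted(scores, key=lambda v: (-scores[v], str(v)))
```

This baseline uses the social graph only, so a person like David in the earlier example is ranked purely on shared friends and gets no credit for visiting the same gym or game store.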

Performance (1)

[Figures: precision and recall on the gpsnet120 dataset]

Performance (2)

[Figures: precision and recall on the MIT dataset]

Performance (3)

[Figures: precision-recall curves on the gpsnet120 dataset and the MIT dataset, comparing Random Walk with Restart without GPS information against our method]

Please refer to the paper for more experiment results and analysis

Conclusions

• Propose a problem of identifying geographically related friends, and also a three-step statistical framework which combines geo-information with social analysis

• Future work

• Domain-oriented GPS pattern definition

• Friend recommendation based on the user and his/her interests

• Real-time friend recommendation by tracking user GPS usage on the fly
