


Online Learning of User-Specific Destination Prediction Models

Erfan Davami
Department of EECS
University of Central Florida
Orlando, FL 32816
Email: [email protected]

Gita Sukthankar∗
Department of EECS
University of Central Florida
Orlando, FL 32816
Email: [email protected]

ABSTRACT

In this paper, we introduce and evaluate two different mechanisms for efficient online updating of user-specific destination prediction models. Although users can experience long periods of regular behavior during which it is possible to leverage the visitation time to learn a static user-specific model of transportation patterns, many users exhibit a substantial amount of variability in their travel patterns, either because their habits slowly change over time or because they oscillate between several different routines. Our methods combat this problem by performing an online modification of the contribution of past data to account for this drift in user behavior. By learning model updates, our proposed mechanisms, Discount Factor updating and Dynamic Conditional Probability Table assignment, improve on the prediction accuracy of the best non-updating methods on two challenging location-based social networking datasets while remaining robust to the effects of missing check-in data.

I INTRODUCTION

Mechanisms for learning predictive models of human transportation patterns are often foiled by the conflict between two forces: 1) strong correlations between destination and visitation time, and 2) long periods of disruption when regular habits are not observed. People are often at work at 10:00 am, in bed at 1:00 am, and have numerous regular periodic commitments. This characteristic can dominate the error metric on the training set, and most feature selection paradigms will identify time and day as important features for predicting destinations.

However, there always exist long periods of disruption when regular habits are not observed. Users go on trips, experience deviations in their work and home routines, or change their lifestyles. In some cases, their behavior patterns will return to the learned baseline, but often the disruption represents a permanent change. During these periods, visitation time and temporal dependencies will not be informative, and overreliance on those cues is punished. Unless the model can adapt to these changes in behavior, its accuracy will plummet, since the majority of samples will be predicted incorrectly. A non-adaptive learning mechanism will attempt to learn the model that predicts the majority of the samples, effectively sacrificing the samples that occur during those periods.

To combat this problem, we introduce two methods for online learning of user-specific destination prediction models: Discount Factor updating and Dynamic Conditional Probability Table assignment. The key to our methods is the use of efficient online updating procedures that modify the contribution of past data to the current prediction of the user's behavior. The baseline non-adaptive learning mechanism used in this paper is a Bayes net, which for these location-based social networking datasets achieves performance comparable to a set of specialized methods for modeling human mobility [2]. Our two adaptation mechanisms perform online modifications of the conditional probability tables used for inference to model the user's current transportation patterns; however, the ideas behind the adaptive mechanisms could be generalized to other types of classifiers as well. This paper demonstrates that online adaptation can offer significant improvements in prediction accuracy, particularly for users with certain mobility profiles.

This paper is organized as follows. Section II presents a selection of related work on learning models of human transportation patterns. Section III describes the location-based social media datasets and our proposed online learning methods. In Section IV, we present an evaluation of our proposed methods against several specialized methods for learning human mobility patterns before concluding the paper.

II RELATED WORK

Learning techniques that leverage temporal dependencies between subsequent locations can perform well at modeling human transportation patterns from GPS data. Although the assignment of GPS readings to road segments can be a noisy process, GPS generally provides a good continuous stream of data that can be used to learn a variety of models such as dynamic Bayesian networks [8], hidden Markov models [7], or conditional random fields [9]. The problem can also be formulated as an inverse reinforcement learning problem [12] in which the users are attempting to select trajectories that maximize an unknown reward function. Another predictive assumption that can be made is that the users are operating according to a steering model that minimizes velocity changes; this model can be combined with hidden state estimation techniques to predict future user positions [11].

However, in this paper, the datasets that we are using contain user check-ins collected from defunct location-based social networking sites (part of the Stanford Large Network Dataset Collection [1]). Unlike in the Reality Mining dataset [3] or the Microsoft Multiperson Location Survey (MSMLS) [6], the user must voluntarily check in to the social media site to announce his/her presence to other users. If the user doesn't check in, no data is collected. Thus, there are often significant discontinuities in the data when the user neglects to check in, and it is likely that users opt to underreport their presence at certain locations. For this type of dataset, we found that a dynamic Bayes network which utilizes temporal dependencies actually performs slightly worse than the simple Bayes net used as the baseline for our model. Our proposed techniques are dynamic in the sense that they change their predictions over time, but they do not explicitly maintain conditional dependencies between subsequent check-ins.

Rather than trying to learn temporal dependencies, our aim is to use the visitation time as the key feature, which is less sensitive to discontinuous data but very sensitive to local changes in the users' habits. These patterns can be discovered by performing an eigendecomposition analysis of the data [4], and interestingly can be predictive of users' activities several years into the future, as shown in [10]. Cho et al. [2] demonstrate that a large section of this dataset can be fitted using a two-state mixture of Gaussians with a time-dependent state prior (Periodic Mobility Model), which we use as one of our comparison benchmarks; the two latent states in their model correspond to the user's home and work locations. The main contribution of this paper is to demonstrate how online learning can improve destination prediction by making the learned models more robust to temporary disruptions in user behavior patterns.

III METHOD

This section describes:

1. the location-based social network datasets used to learn and evaluate our destination prediction models;

2. our baseline non-adaptive Bayes net model;

3. our first proposed method, Dynamic Conditional Probability Table assignment (DCPTA), for creating multiple region-specific models for each user;

4. Discount Factor adaptation (DF), our second proposed method for diminishing the effects of stale data in the conditional probability tables with a discount factor.

1 DATASETS

The datasets used in this research were extracted from two location-based social networking websites, Gowalla and Brightkite. Cho et al. [2] have made both datasets publicly available at the Stanford Large Network Dataset Collection [1]. Gowalla (2007-2012) gave users the option to check in at locations through either its mobile app or its website, and Brightkite was a similar social networking website that was active from 2007 to 2011. The data from these two websites consists of one user record per check-in that stores the user ID, the exact time and date of the check-in, and the ID and coordinates of the check-in location. Table 1 shows some features of these datasets, and Figure 1 shows a map of user activity within the United States.

Table 1: Location-based Social Media Datasets

Dataset                      Gowalla     Brightkite
Records                      6,442,857   4,492,538
Users                        107,092     50,687
Average check-ins per user   60.16       88.63
Median check-ins per user    25          11



Figure 1: The scope of user check-ins across the United States for the Brightkite location-based social networking dataset. This location-based service was primarily active in the United States, Europe, and Japan between 2007 and 2011.

2 BASELINE MODEL

For our non-adaptive model, we implemented a simple Bayes net with our modified version of the Bayes Net Toolbox in Matlab. A Bayes net is a probabilistic graphical model that represents random variables and their conditional dependencies in the form of a directed acyclic graph. Figure 2 shows the Bayes net structure that we identified after experimenting with other more complicated model structures and dynamic Bayes networks in which the variables were conditioned on their values from the previous time step.

Figure 2: Structure of the Bayes net used as the baseline model for inferring the user's latitude and longitude from the check-in day and time.

In this paper we use a fast, simple method for training the network and extracting the most probable values of the output variables (the latitude and longitude nodes). The data structure of the network consists of h × d × l matrices for the CPTs (Conditional Probability Tables), in which h and d are respectively the hour and day of the week at which the observation occurs and l is the list of possible check-in locations.

For parameter learning, the corresponding cells of the CPTs of the output nodes are incremented; predictions are made by looking up the argmax latitude and longitude values for the user's location based on the check-in time. This method is feasible given the simple independence assumptions in this model and the large size of the dataset.
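The count-and-argmax procedure above can be sketched as follows. This is an illustrative Python/NumPy rendering, not the authors' Matlab implementation; for brevity, a single location table stands in for the separate latitude and longitude CPTs, and the location discretization into l candidate check-in locations is assumed.

```python
import numpy as np

H, D, L = 24, 7, 700            # hours, days of the week, candidate locations
cpt = np.zeros((H, D, L))       # the h x d x l count table described above

def train(cpt, hour, day, loc):
    """Parameter learning: increment the cell matching the observed check-in."""
    cpt[hour, day, loc] += 1

def predict(cpt, hour, day):
    """Prediction: look up the argmax location for the given check-in time."""
    return int(np.argmax(cpt[hour, day, :]))

# A user who repeatedly checks in at location 42 on Mondays at 9:00
for _ in range(5):
    train(cpt, hour=9, day=0, loc=42)
train(cpt, hour=9, day=0, loc=7)    # one stray check-in elsewhere

assert predict(cpt, hour=9, day=0) == 42
```

Both the update and the lookup touch only one slice of the table, which is what makes this approach cheap enough to run per check-in over millions of records.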

The main problem with the non-adaptive model is the large distortion that occurs in the probability table when the user makes a long-range trip. Imagine a particular user being at some specific location and following a repetitive pattern of activities for some months. If the user then goes on vacation for a month, the non-adaptive model will deliver a series of incorrect predictions based on the previously learned CPT, only slowly adapting to the new situation. Even once the user is back from the vacation, the effect of the probability distortion (caused by check-ins during the trip) is still clearly visible. We propose two new online learning algorithms capable of overcoming this problem, described in the next sections.

3 DYNAMIC CONDITIONAL PROBABILITY TABLE ASSIGNMENT (DCPTA)

The movement pattern of most users in the dataset consists of a regular pattern of periodic short-range movements punctuated by occasional long-range movements. Figure 3 shows the movement pattern of one randomly selected user in the Brightkite dataset.

The average distance between subsequent check-ins ends up being a good measure of the user's mobility. When the user's movement exceeds twice the average distance between check-ins, it generally signals the start of a new mobility pattern. DCPTA (Dynamic Conditional Probability Table Assignment) uses this measure to determine when to learn a new user profile. By dividing the data into sections each time this jump in movement occurs, we can segment the movement of any user into sections with a relatively low variance; these sections are stored in separate conditional probability tables and can be recovered if the user returns to those regions. Algorithm 1 describes how the DCPTA algorithm works.

Figure 3: Movement history of a user in the Brightkite dataset. (left) Initially, the latitude and longitude of the user's check-ins are converted to a single distance measurement relative to the first recorded check-in. (middle) The movement history can be divided into sections of low variance for learning the user's transportation pattern in a particular region by segmenting the data stream based on movement jumps that exceed twice the average distance between check-ins. (right) DCPTA learns a separate conditional probability table for each segment; these tables correspond to different aspects of the user's routine.

Data: Check-ins of a particular user
Result: Dynamic Conditional Probability Table Assignment
Let D be the set of observed check-in distances
Let S be the set of stored segments
for every new check-in do
    Determine the distance of the current check-in from the initial check-in:
    d_i = dist(coordinates_i − coordinates_0);
    if d_i ≤ 2 · mean(D) then
        load CPT(argmin_{s∈S} |d_i − s|)
    else
        S ← S ∪ CPT(d_i);
    end
end

Algorithm 1: DCPTA (Dynamic Conditional Probability Table Assignment). This algorithm maintains a running average of the user's movements relative to an initial location and creates a new location-specific conditional probability table whenever the user's relative movements exceed a certain threshold.
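The segmentation criterion can be illustrated in isolation. The sketch below labels each check-in in a one-dimensional movement history (distance from the first check-in, as in Figure 3) with a segment index; it is our own illustrative rendering of the jump rule only, and omits the CPT storage and segment-reuse logic of Algorithm 1.

```python
def dcpta_segments(positions):
    """Label each check-in with a segment index: a new segment starts
    whenever the jump between consecutive check-ins exceeds twice the
    average jump seen so far. Sketch of the DCPTA segmentation rule;
    reloading a stored CPT on a return trip is not modeled here."""
    jumps = []                      # distances between subsequent check-ins
    labels = [0]                    # the first check-in opens segment 0
    current = 0
    for prev, cur in zip(positions, positions[1:]):
        jump = abs(cur - prev)
        if jumps and jump > 2 * (sum(jumps) / len(jumps)):
            current += 1            # long-range move: open a new segment/CPT
        jumps.append(jump)
        labels.append(current)
    return labels

# Short-range moves around home, then a long-range trip:
assert dcpta_segments([0, 1, 2, 1, 50, 51, 52]) == [0, 0, 0, 0, 1, 1, 1]
```

In the full algorithm, each segment index would select its own h × d × l conditional probability table for training and prediction.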

4 DISCOUNT FACTOR ADAPTATION (DF)

DCPTA is most effective when the user returns to regions governed by previously learned conditional probability tables, and least effective when the user keeps changing his/her habits. For instance, users who are unemployed have greater flexibility in their daily schedule, which translates into a data series with a less defined temporal structure. To learn prediction models for users that exhibit erratic check-in behaviors, we introduce a discount factor, γ, into the process of updating the CPT, such that the existing entry is discounted before incrementing the entry for the new observation. γ can range between 0 and 1; our results indicate that the use of the discount factor improves the online learning but that the learning is relatively insensitive to the magnitude of the parameter. Algorithm 2 gives the procedure for discounting conditional probability tables.

Data: Check-ins of a particular user
Result: Discounted Conditional Probability Table
for all h, d, l do
    CPT_Latitude(h, d, l) ← γ · CPT_Latitude(h, d, l);
    CPT_Longitude(h, d, l) ← γ · CPT_Longitude(h, d, l);
end

Algorithm 2: DF (Discount Factor Adaptation). Before the CPT is updated with the incoming observation, the discounting procedure is applied. Discounting the conditional probability table reduces the effect of older check-ins on future predictions. This technique works well if the user's behavior changes slowly over time, rather than rapidly switching between destination-specific transportation patterns.

The discount factor reduces the effect of previous observations on the network, making the most recent check-ins more influential on the location prediction procedure. The advantage of this method compared to the previously proposed method is its lower computational and programming complexity. Applying the discount factor effectively limits the location prediction to a few recent observations while discarding the stale data from older check-ins.
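The discounted update can be sketched as follows (an illustrative Python/NumPy version, not the paper's Matlab code; a single location table again stands in for the separate latitude and longitude CPTs). It shows the key property of DF: a handful of recent check-ins quickly outweighs a long-standing but stale routine.

```python
import numpy as np

GAMMA = 0.5   # the paper reports accuracy is fairly insensitive near 0.5

def df_update(cpt, hour, day, loc, gamma=GAMMA):
    """Discount every existing CPT entry before counting the new check-in,
    as in Algorithm 2, so recent observations dominate predictions."""
    cpt *= gamma                 # decay stale evidence everywhere
    cpt[hour, day, loc] += 1.0   # record the incoming observation
    return cpt

cpt = np.zeros((24, 7, 3))
for _ in range(10):              # long-standing routine at location 0
    df_update(cpt, 9, 0, 0)
for _ in range(3):               # the routine changes to location 2
    df_update(cpt, 9, 0, 2)

# Three recent check-ins already outweigh ten discounted old ones:
assert int(np.argmax(cpt[9, 0])) == 2
```

With γ = 0.5, the ten old observations decay to a combined weight of about 0.25 after three new check-ins, while the new location accumulates a weight of 1.75, so the prediction flips almost immediately.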



IV RESULTS

We employ two datasets, Gowalla and Brightkite, containing real users' check-in information [2]. Our evaluations are performed over the subset of users with greater than 100 check-ins, corresponding to 7600 and 8800 users from Brightkite and Gowalla, respectively. We directly compare our methods against the techniques proposed by Cho et al. [2] and Gonzalez et al. [5].

As an additional baseline, we performed location prediction using the Bayes net (BN) described in Section III. This network consists of the four nodes shown in Figure 2, where the predicted latitudes and longitudes are conditionally dependent on the weekday and hour of observation. For each user, the Bayes net is first trained using 15% of the check-in data so that it can gain some information about the periodic and geographical movement patterns of the user. The test results for this strategy and also the DCPTA strategy are shown in Figure 4.

1 APPLYING A DISCOUNT FACTOR ON THE CPTS (THE DF METHOD)

As discussed above, applying a time-dependent discount factor can be a useful way of eliminating travel distortion in the conditional probability tables of our Bayes net. A discount factor of 0 implies a stateless system where counts are reset after each observation; conversely, a factor of 1 applies no temporal decay to the system. The first question we address is whether this discount factor is dataset-specific and how it impacts prediction accuracy. Figure 5 shows the effect of varying the discount factor from 0 to 1. We observe that the performance is relatively insensitive to the precise value of the discount factor and that a discount factor of 0.5 maximizes prediction accuracy for both datasets; this value is used in subsequent experiments.

Figure 4 shows how the prediction rate of the original Bayes net improves with the enhancement of dynamic conditional probability table assignment (DCPTA) and the discount factor (DF). Specifically, we examine how accuracy varies with tolerance, which is defined as the level of error that is acceptable (considered correct), expressed as a fraction of the total distance traveled by the user. For instance, a tolerance of 0.05 specifies that a prediction must lie within 5% of a check-in to be counted as correct. In this figure, we see that the proposed enhancements improve over the baseline BN over the entire curve, adding approximately 5% to the prediction rate, with DF slightly outperforming DCPTA over the entire curve.
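The tolerance metric can be made concrete with a small helper (an illustrative sketch; the function and variable names are ours, not the paper's):

```python
def accuracy_at_tolerance(errors_km, total_travel_km, tolerance):
    """Fraction of predictions whose error falls within `tolerance` times
    the user's total distance traveled -- the accuracy-vs-tolerance metric
    plotted in Figure 4."""
    allowed = tolerance * total_travel_km
    return sum(e <= allowed for e in errors_km) / len(errors_km)

# A user who traveled 1000 km in total, with per-check-in errors in km:
errors = [10, 40, 60, 5]
assert accuracy_at_tolerance(errors, 1000, 0.05) == 0.75  # 50 km allowed
```

Sweeping `tolerance` from 0 to 1 and plotting the resulting accuracy produces curves of the kind shown in Figure 4.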

Figure 6 compares the location prediction results from our methods to the following five recent methods described in the literature:

1. Periodic Mobility Model [2], denoted as PMM;

2. Periodic and Social Mobility Model [2], denoted as PSMM;

3. Gaussian Mixture Model [5], denoted as G;

4. Last-known location model [2], denoted as RW;

5. Most frequent location model [2], denoted as MF.

Figure 6: Prediction performance of the Bayes net predictor and the proposed enhancements on the Bayes net, along with the performance of prior art, on the Brightkite dataset.

The Periodic Mobility Model (PMM) assumes that the majority of the human movement in a network is based on periodic movement between a small set of locations. The Periodic and Social Mobility Model (PSMM) adds additional parameters to model movement driven by one's social relationships with other members of the network.

Our Bayes net methods are denoted as BN, BN&DCPTA, and BN&DF, and the comparison employs a tolerance level of 2.7%.



Figure 4: Prediction performance of the Bayes net predictor and the proposed enhancements on the Bayes net (DCPTA and DF) on the Brightkite dataset (left) and the Gowalla dataset (right). Tolerance is the fraction of the total distance traveled by a user that is considered an acceptable distance between the prediction and the actual location of the user at each check-in.

Figure 5: Prediction performance using the proposed Bayes net vs. discount factor on the Brightkite dataset (left) and the Gowalla dataset (right). We observe that the prediction accuracy is relatively insensitive to the choice of discount factor and that a factor near 0.5 maximizes performance on both datasets.

Figure 7: Comparing the movement patterns of different users in the Brightkite dataset. The top two patterns are better predicted using the DCPTA method; however, the DF method performs better at predicting the bottom two patterns. A possible hypothesis is that DCPTA performs better than DF updating for users who take multiple short trips.



We observe that the BN without enhancement (36%) performs almost as well as the best of the state-of-the-art approaches, PMM (36.5%) and PSMM (36.3%). However, with our enhancements, accuracy increases by almost 6%, with BN&DF at 42% slightly outperforming BN&DCPTA at 41%. The remaining baselines (G, RW, MF) are not competitive.

The results shown in Figure 6 are averages over many predictions. Figure 7 provides a more detailed look at some specific instances, and we see that DCPTA does occasionally outperform DF on users with certain features. The top row of the figure shows examples of users for whom DCPTA performs best, while the bottom row shows some for whom DF is a better predictor. We hypothesize that users who spend time shuttling between a small set of locations and relatively little time on infrequent long-range trips are better predicted using DCPTA; conversely, DF is better able to handle users who go on long trips and make frequent check-ins away from home.

2 HANDLING MISSING DATA

Missing data can be a severe problem for location prediction algorithms. Algorithms designed for GPS data often do not work as well when dealing with check-in data due to the high inconsistency of the datapoints. Unfortunately, because of the relatively high overhead a check-in action imposes on users, missing check-ins are inevitable.

In this section we examine the robustness of the proposed algorithms to missing data. Seven experiments were conducted using both datasets in which a percentage of the check-in data was randomly withheld. Figure 8 summarizes the prediction results on each dataset. All of the proposed methods are quite robust to missing data, with the best (DF) showing a drop of only 10% for 70% missing data on the Brightkite dataset (left) and negligible loss on the Gowalla dataset (right). This confirms our belief that there is significant redundancy in the second dataset that can be exploited. Somewhat surprisingly, we observe a slight improvement in DCPTA's performance with missing data on the Gowalla dataset. We attribute this to the fact that withholding data has the effect of reducing check-ins corresponding to long-range travel, which results in a reduction of such outliers.
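The withholding step of such an experiment can be sketched as follows. This is our own illustrative sampling procedure; the paper does not specify its exact sampling details, and the function name is ours.

```python
import random

def withhold_checkins(checkins, fraction, seed=0):
    """Randomly withhold roughly `fraction` of a user's check-ins,
    mimicking the missing-data experiments summarized in Figure 8."""
    rng = random.Random(seed)           # seeded for reproducible experiments
    return [c for c in checkins if rng.random() >= fraction]

kept = withhold_checkins(list(range(1000)), fraction=0.7)
assert 200 < len(kept) < 400            # roughly 30% of check-ins survive
```

The reduced check-in sequence is then fed to the belief network and the proposed enhancements exactly as the full sequence would be.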

Figure 8: Prediction rates of the proposed algorithms when applied to datasets with missing data. Some fraction of the check-in data is randomly withheld and then predicted using the belief network and the proposed enhancements. Our approaches exhibit robustness to missing data.

3 COMPLEXITY

We briefly summarize the computational complexity and storage requirements of the proposed methods. The core data structure behind our methods is the conditional probability table (described in Section III.2). Storing such a discretized table, even at double precision, is cheap: a table with 24 × 7 × 700 double-precision cells requires less than 1 MB of memory.
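The storage estimate is easy to check directly (assuming 8 bytes per double-precision cell):

```python
# One CPT: 24 hours x 7 days of the week x 700 candidate locations,
# each cell stored at double precision (8 bytes).
H, D, L = 24, 7, 700
cells = H * D * L
assert cells == 117_600            # on the order of 100K cells per table
assert cells * 8 == 940_800        # bytes per CPT
assert cells * 8 < 1_000_000       # under 1 MB, as stated above
```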

Conditional probability tables are also computationally efficient, affording constant-time updates. Finding the maximum in the table employs an exhaustive scan that is linear, O(N), in the number of cells N; in practice, since the number of cells is around 100K, this remains very efficient.

The DCPTA method (Section III.3) requires a separate CPT for every segment of the user's movement pattern, thus requiring memory growth of O(s), where s denotes the number of segments. In terms of computational complexity, the algorithm must search the s CPTs in order to load the right CPT for future use, resulting in a computational complexity of O(sN).

Finally, the DF method simply multiplies the CPT by a real number (the discount factor). This procedure has no impact on the memory usage of the belief network, but it increases the computational cost by a scalar-matrix multiplication, which is theoretically O(N) but very efficient on current hardware.

Table 2 presents measured running times for each method on both datasets. The processing was done using Matlab 2012a on an Intel quad-core Xeon processor with 18GB of memory.



Table 2: Computational time required for each dataset (minutes)

Name of Dataset       Gowalla     Brightkite
Processed Users       8800        7600
Processed check-ins   2,694,344   3,399,651
Belief Network        9           12
BN&DF                 10          12
BN&DCPTA              19          20

V CONCLUSION

In this paper we present two new algorithms for online learning of user-specific destination prediction models: Dynamic Conditional Probability Table Assignment (DCPTA) and Discount Factor updating (DF). Although we describe the use of our online update procedures for a Bayes net model, the same intuitions behind the discounting of stale data and threshold-based switching between multiple models can be applied to online learning procedures for other types of classifiers. Our proposed destination prediction model leverages the predictive power of visitation times while rapidly adapting to schedule changes by the users. Adapting to changing user habits allows our model to achieve better predictive performance than the best static models, which are continually penalized by non-stationary user behavior.

ACKNOWLEDGMENTS

This research was funded by AFOSR YIP FA9550-09-1-0525 and DARPA award N10AP20027.

References

[1] Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/index.html.

[2] E. Cho, S. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2011.

[3] N. Eagle and A. Pentland. Reality mining: Sensing complex social systems. Journal of Personal and Ubiquitous Computing, 2005.

[4] N. Eagle and A. Pentland. Eigenbehaviors: Identifying structure in routine. In Proceedings of the ACM International Conference on Ubiquitous Computing, 2006.

[5] M. Gonzalez, C. Hidalgo, and A. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, 2008.

[6] J. Krumm and E. Horvitz. Predestination: Inferring destinations from partial trajectories. In Proceedings of the ACM International Conference on Ubiquitous Computing, 2006.

[7] J. Letchner, J. Krumm, and E. Horvitz. Trip router with individualized preferences (TRIP): Incorporating personalization into route planning. In Proceedings of the Conference on Innovative Applications of Artificial Intelligence, 2006.

[8] L. Liao, D. Fox, and H. Kautz. Learning and inferring transportation routines. In Proceedings of the National Conference on Artificial Intelligence, 2004.

[9] L. Liao, D. Fox, and H. Kautz. Extracting places and activities from GPS traces using hierarchical conditional random fields. International Journal of Robotics Research, 26:119–134, 2006.

[10] A. Sadilek and J. Krumm. Far Out: Predicting long-term human mobility. In Proceedings of the National Conference on Artificial Intelligence, 2012.

[11] B. Tastan and G. Sukthankar. Leveraging human behavior models to improve path prediction and tracking in indoor environments. Pervasive and Mobile Computing, 7:319–330, 2011.

[12] B. Ziebart, A. Maas, J. A. Bagnell, and A. Dey. Maximum entropy inverse reinforcement learning. In Proceedings of the National Conference on Artificial Intelligence, volume 3, pages 1433–1438, 2008.

© ASE 2012, ISBN: 978-1-62561-004-1