
Research Article

A Joint Deep Recommendation Framework for Location-Based Social Networks

Omer Tal and Yang Liu

Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Ontario, Canada

Correspondence should be addressed to Yang Liu; yangliu@wlu.ca

Received 1 December 2018; Revised 11 February 2019; Accepted 5 March 2019; Published 19 March 2019

Guest Editor: Jiajie Xu

Copyright © 2019 Omer Tal and Yang Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Location-based social networks, such as Yelp and TripAdvisor, which allow users to share experiences about visited locations with their friends, have gained increasing popularity in recent years. However, as more locations become available, the need for accurate systems able to present personalized suggestions arises. By providing such a service, point-of-interest recommender systems have attracted much interest from different communities, leading to improved methods and techniques. Deep learning provides an exciting opportunity to further enhance these systems by utilizing additional data to better understand users' preferences. In this work we propose the Textual and Contextual Embedding-based Neural Recommender (TCENR), a deep framework that employs contextual data, such as users' social networks and locations' geospatial data, along with textual reviews. To make the best use of these inputs, we utilize multiple types of deep neural networks, each best suited to its type of data. TCENR adopts the popular multilayer perceptron to analyze historical activities in the system, while the learning of textual reviews is achieved using two variations of the suggested framework: one is based on convolutional neural networks to extract meaningful data from textual reviews, and the other employs recurrent neural networks. Our proposed network is evaluated over the Yelp dataset and found to outperform multiple state-of-the-art baselines in terms of accuracy, mean squared error, precision, and recall. In addition, we provide further insight into the design selections and hyperparameters of our recommender system, hoping to shed light on the benefit of deep learning for location-based social network recommendation.

1. Introduction

In today's age of information, it has become a prerequisite to have reliable data prior to making any decision. Preferably, we expect to obtain reviews from like-minded users before investing our time and money in any product or service. Location-based social networks (LBSN), such as Yelp, TripAdvisor, and Foursquare, provide such information about potential locations, while allowing users to connect with their friends and other users that have similar tastes. And indeed, these networks are gaining popularity. Yelp recently reported having 72 million monthly mobile active users (https://www.yelp.ca/factsheet), while TripAdvisor achieved no less than 390 million monthly users (https://www.tripadvisor.com/TripAdvisorInsights/w828). However, as LBSNs gain more users with varied preferences, and additional locations are added on a daily basis, it becomes time consuming to browse through all possibilities before finding an appropriate restaurant or venue to visit.

Point-of-interest (POI) recommendation, a subfield of recommender systems (RS), attempts to provide LBSN users with personalized suggestions. Properly exploited, it can save time and effort for the end user and encourage her to make future use of the LBSN, both as a consumer and as a content provider. Collaborative filtering (CF) is probably the most prominent RS paradigm. It is based on the assumption that users who had resembling past activities will have similar future preferences.

While POI recommendation methods usually attempt to solve a similar problem to the standard recommender system, they are required to overcome additional challenges. First, data sparsity, commonly referred to as the cold-start problem, stems from the fact that most users will only interact with a small fraction of the possible locations, and vice versa. When adopting methods of CF, it becomes harder to represent users and locations based on similar past activities when the variation is high. This issue is worsened for users and locations that have few or no past interactions known to the system, as available data is insufficient to fully capture their features. Second, in contrast to other recommendation scenarios, the decision whether to visit a location depends not only on the target user's preferences, but also on those of her friends [1]. They might share different interests that are unknown to the system, resulting in a more complex decision process for the RS to learn. Furthermore, attributes that relate to the location itself and are unknown to the system may have an impact on the decision. For example, the availability of parking, convenient public transport, or the popularity of the area itself can be the decisive factor for a user. These challenges proved the classic CF methods to be insufficient and prone to overfitting the data, as reported by previous works [2, 3].

A common approach to address sparsity while acknowledging the user's driving factors is to utilize contextual data, such as social networks [2, 4], geographical locations [5, 6], categories [1], and textual reviews [7, 8]. Deep neural networks have been enthusiastically adopted in recent years [2, 9, 10] to employ such complex contextual inputs while being able to process the vast amount of available data. Different neural network architectures have been introduced to improve recommendation performance, such as multilayer perceptrons (MLP) [3, 11, 12], convolutional neural networks (CNN) [13–15], and recurrent neural networks (RNN) [16–18]. These techniques were shown to highly benefit from the addition of contextual attributes. As more data becomes available, incorporating these features presents a promising opportunity to improve the personalized POI recommendation process, as will be demonstrated in this paper.

Although numerous works established the potential of recommender systems based on deep learning, most have focused on only a single type of neural network that best suited their given task. In this work, however, we incorporate multiple types of neural networks, i.e., MLP, CNN, and RNN, to provide POI recommendation using various types of inputs. First, we describe our proposed method, the Textual and Contextual Embedding-based Neural Recommender (TCENR), a framework that takes users' social networks, locations' geographical data, and textual reviews, along with historical activities, to provide personalized top-k POI recommendations. By optimizing MLP and CNN jointly, the proposed model learns different aspects of the same user-location interaction in conjunction, and is therefore better suited to capture the underlying factors in the user's selection. We then present an extension of TCENR, denoted as TCENRseq, where we replace the CNN component with an RNN and attempt to learn user preferences and location attributes by treating the written reviews as a sequential input, rather than by focusing on their most important words. Although the proposed solution has been developed to provide recommendations for specific types of inputs, we claim it can be easily generalized to a framework able to support additional features.

The main contributions of our work are as follows:

(1) We present TCENR, a framework that jointly trains MLP and CNN to provide POI recommendations, as well as a variation, TCENRseq, that performs the same task while adopting RNN instead of the CNN component.

(2) To the best of our knowledge, no previous work has jointly trained MLP and CNN for the task of POI recommendation using social networks, geographical locations, and natural language reviews as inputs.

(3) Evaluated over the Yelp dataset, our proposed frameworks were found to outperform seven state-of-the-art baselines in terms of accuracy, MSE, precision, and recall.

(4) By comparing the two alternatives of our suggested model, we provide insight into the impact gained by analyzing textual reviews as a secondary input to the common past interactions, as well as a comparison of CNN and RNN for the task of sentiment analysis in the same experimental settings.

(5) We further present a comprehensive analysis of the most important hyperparameters and design selections of our proposed networks, shedding some light on the different components of deep neural networks in the task of POI recommendation.

In the following section we first lay the background and introduce works related to our model. In Section 3 we develop our proposed frameworks, which are evaluated and analyzed in Section 4. Finally, we summarize our work and introduce future directions in Section 5.

2. Related Work

Incorporating additional data about users and items is a common strategy to provide meaningful recommendations and to mitigate the cold-start problem. In the area of POI recommender systems, where the data is highly sparse, such practices are essential.

2.1. Context-Based Recommender Systems. Location-based social networks are usually rich in contextual inputs, which present various opportunities for data enrichment in RS. Such features include time [9], spatial location [19], the user's social network [4], the item's meta-data [1], photos [20], and demographics [11].

Contextual data is usually incorporated into RS either as part of the input or as a regularizing factor. References [11, 21] exploit the strengths of MLP-based networks in modeling complex relationships by concatenating multiple feature embeddings to the input before feeding it to a series of nonlinear layers. The final layer's output is then a representation of a user-item interaction adjusted to the given context. Sequentiality is very often introduced to recommender systems by treating sequences of past user and item interactions in a timely manner. In the case of POI recommendation, it consists of feeding sequences of check-ins to RNN-based layers, such as long short-term memory (LSTM) [22] or the more concise gated recurrent units (GRU) [23].


The use of spatial data is often done by dividing the input space into roles and regions. Assuming users' behavior varies when traveling far from home, previous works [5, 6] generated two profiles for each user: one to be used in her home region, and another in more distant locations. A recent approach [24–26] attempts to divide the input space into geographical regions before incorporating them into the model, often by hierarchical structures. Although methods based on regions and roles are able to better distinguish user behaviors in varied locations, they do not provide a personalized user representation and can ignore potential shifts of preferences from one region to another. For example, a user might prefer to visit a Starbucks location in different regions, close to or far from home. Enriching geographical features with additional data is demonstrated in [25, 27], where the next location prediction is partially determined by past sequences of check-ins. However, these methods are not generic and cannot be extended to include additional features derived from social networks or textual data.

Furthermore, for tasks in highly sparse environments such as POI recommendation, adding user- or item-specific inputs may diminish the model's ability to generalize. However, applying the same data as a regularizing factor can enhance the model's performance and reduce overfitting. This was done in [4], where the similarity between connected users in the social network was used to constrain a matrix factorization (MF) model. Reference [2] utilized social networks and geographical distances to enforce similar embeddings for users and locations, thus improving the model's ability to generalize for users and locations with few historical records.

2.2. Text-Based Recommendation. Since many websites encourage users to provide a written explanation for their numeric ratings, textual reviews are one of the most popular types of data to be integrated into RS. Previous works adopted probabilistic approaches to alleviate the data sparsity problem using textual input [28–31]. By expressing each review as a bag of words, LDA-based models are able to extract topics, which can be used to represent users' interests and locations' characteristics [5, 6, 25]. These probabilistic methods are usually successful in handling issues that standard CF approaches struggle with, such as out-of-town recommendations, where similar users lack sufficient historical data. However, as demonstrated in recent works [17, 32], failing to preserve the original order of words and ignoring their semantic meaning prevent the successful modeling of a given review. On the other hand, an emerging trend of adopting neural networks over reviews allows such learning without the loss of data. These implementations can generally be categorized into RNN-based [17, 18] and CNN-based [14, 15] models.

RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes an accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].

Following its success in the field of computer vision [33], CNN-based models are gaining popularity in other areas, such as textual modeling [34]. By employing a sliding window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprised of an embedding to represent the semantic meaning of each word, a convolution layer for generating local features, and a max pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.

We claim that by jointly learning contextual- and textual-based deep models, our proposed method better exploits the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This is achieved by learning users' and locations' representations as similarities in direct interactions, along with the correlation in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, where each is responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extendability to many types of input, our proposed method is tailored for the task of POI recommendation.

3. The Neural Recommender

3.1. Neural Network Architecture. The following recommender system aims to improve the POI recommendation task by learning user-location interactions using two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model the user-POI preferences using social and geographical attributes, and is based on a multilayer perceptron structure [2, 3]. Shown on the right side of Figure 1, the convolutional neural network is responsible for the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks models the user/POI input individually, with regard to their shared interaction defined in the merge layers.

3.1.1. Context-Based Network Layers. To better capture the complex relations between users and locations in LBSN, we chose to adopt the multilayer perceptron architecture. By stacking multiple layers of nonlinearities, MLP is capable of learning relevant latent factors of its inputs. It is first fed with user and location vectors of sizes $N$ and $M$, where each input tuple $\langle u, p \rangle$ is transformed into sparse one-hot encoding representations. The two fully connected embedding layers found on top of the input layer project the sparse representations of users and locations into smaller and denser vectors.

[Figure 1: The TCENR framework. The left branch feeds user $u$ and POI $p$ through embedding layers $E_u$ and $E_p$, regularized by softmax graph outputs $\psi_c(E_u)$ and $\psi_c(E_p)$ against the user graph output $g_u$ and POI graph output $g_p$, followed by a merge layer and hidden layers. The right branch feeds user reviews $d_u$ and POI reviews $d_p$ through word embeddings $V_u$, $V_p$, convolution layers $Z_u$, $Z_p$, pooling layers $O_u$, $O_p$, and fully connected layers $h_u$, $h_p$. A final merge layer and hidden layer produce the prediction $\hat{y}$.]

For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.

We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi_c(E_u) \in \mathbb{R}^N$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as

$$\psi_c(E_u) = a(W_c^u \times E_u + b_c^u) \tag{1}$$

where $W_c^u$ and $b_c^u$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi_c(E_p)$ will not be developed in this section. Enforcing $\psi_c(E_u)$ and $\psi_c(E_p)$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, results in a smoothing factor that limits the amount by which connected entities' embeddings differ.

The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer for the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where

$$x = [E_u, E_p] \tag{2}$$

Since a simple concatenation of the user-location embedded vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as

$$h^q_{context}(x) = ReLU(W_q h^{q-1}_{context}(x) + b_q) \tag{3}$$

where $W_q$ and $b_q$ are the $q$-th layer's parameters.
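As an illustration, the context-based branch maps naturally onto a few lines of Keras (the library used in our implementation; see Section 4.2). The following is a minimal sketch, not the original code: the dataset sizes N and M are hypothetical, while the embedding size 10 and the 32/16 tower follow Section 4.1.

```python
from tensorflow.keras import layers

N, M, k_u, k_p = 10000, 5000, 10, 10  # N, M are assumed dataset sizes

user_in = layers.Input(shape=(1,), name="user_id")
poi_in = layers.Input(shape=(1,), name="poi_id")

# Embedding layers E_u and E_p project sparse ids into dense vectors.
E_u = layers.Flatten()(layers.Embedding(N, k_u)(user_in))
E_p = layers.Flatten()(layers.Embedding(M, k_p)(poi_in))

# Softmax graph outputs psi_c(E_u) and psi_c(E_p) of equation (1), trained
# to resemble the social graph g_u and the spatial graph g_p.
psi_u = layers.Dense(N, activation="softmax", name="user_context")(E_u)
psi_p = layers.Dense(M, activation="softmax", name="poi_context")(E_p)

# Concatenation merge (equation (2)) and ReLU tower (equation (3)): 32 -> 16.
x = layers.Concatenate()([E_u, E_p])
h_context = layers.Dense(16, activation="relu")(
    layers.Dense(32, activation="relu")(x))
```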

3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a textual-based network is introduced. It simultaneously learns the same interaction as the context-based network, but with a natural language input. Two additional vectors, $d_u$ and $d_p$, representing user $u$'s and location $p$'s textual reviews, respectively, are applied as inputs to this network. Each vector is comprised of all $n$ words written by the user or about the location, merged together and kept in their original order. These words are then mapped to $k_w$-dimensional vectors defining their semantic meaning in the following embedding layers. The output of the user embedding layer is the representation of all words used by a user $u$ in the form of a matrix, and can be denoted as

$$V_u = [\phi(d_u^1), \ldots, \phi(d_u^l), \ldots, \phi(d_u^n)] \tag{4}$$

where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_u^l$ is user $u$'s $l$-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.

[Figure 2: Proposed alternatives to learn user and location representations from textual reviews. (a) The CNN-based component employed in TCENR: word embeddings, convolutional layers, max pooling, and a fully connected layer. (b) The suggested RNN extension: word embeddings, bidirectional GRU, max pooling, and a fully connected layer.]

Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce $t$ feature maps over the embedded word vectors, using a window of size $ws$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer:

$$z_{jl}^u = ReLU(V_{l:l+ws}^u * K_j^u + b_j^u), \qquad z_j^u = [z_{j1}^u, \ldots, z_{jl}^u, \ldots, z_{jn}^u] \tag{5}$$

where $V_l^u$ is user $u$'s $l$-th input word embedding and $z_j^u$ is the $j$-th feature extracted from the complete text. Following the standard CNN structure, the feature maps produced by the convolution layers are reduced by a pooling layer:

$$o_j^u = \max(z_j^u), \qquad O_u = [o_1^u, \ldots, o_j^u, \ldots, o_t^u] \tag{6}$$

where max pooling is selected to identify the most important words and their latent values, and $O_u$ is the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers to jointly model the different features, resulting in the latent textual representations for user $u$ and location $p$, denoted as $h_u$ and $h_p$, respectively:

$$h_u = ReLU(W_1^u \times O_u + b_1^u) \tag{7}$$

To combine the outputs of the users' and locations' fully connected layers into the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:

$$h_{reviews} = ReLU(W_2 \times [h_u, h_p] + b_2) \tag{8}$$
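Continuing the Keras sketch above, the textual branch (equations (4)-(8)) could be assembled as follows. The sequence length, vocabulary size, and the shared layer width are assumptions here; the 50-unit embeddings, 10-word window, stride of 3, and pool size of 2 echo the settings of Section 4.1.

```python
from tensorflow.keras import layers

n, vocab, k_w, t = 3000, 50000, 50, 3  # sequence length and vocab are assumed

def review_tower(name):
    # d_u / d_p: the n words of a user's or location's concatenated reviews.
    words = layers.Input(shape=(n,), name=f"{name}_words")
    # phi: pre-trained word embeddings (e.g., GloVe [36]) kept frozen; eq. (4).
    V = layers.Embedding(vocab, k_w, trainable=False)(words)
    # t feature maps over a 10-word window with stride 3; eq. (5).
    Z = layers.Conv1D(filters=t, kernel_size=10, strides=3, activation="relu")(V)
    # Max pooling keeps the most salient features; eq. (6).
    O = layers.Flatten()(layers.MaxPooling1D(pool_size=2)(Z))
    # Fully connected latent representation h_u / h_p; eq. (7).
    return words, layers.Dense(32, activation="relu")(O)

du_in, h_u = review_tower("user")
dp_in, h_p = review_tower("poi")
# Shared layer over the concatenated towers; eq. (8). The width is assumed.
h_reviews = layers.Dense(16, activation="relu")(layers.Concatenate()([h_u, h_p]))
```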

The two neural networks are then finally merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer, responsible for blending the learning:

$$\hat{y}_{up} = \sigma(W_3 \times [h_{context}, h_{reviews}] + b_3) \tag{9}$$

where the sigmoid function was selected to transform the hidden layer output to the desired range of $[0, 1]$.

3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted as TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture sentences' sequential nature, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient base for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves competitive performance compared to LSTM but with fewer parameters, making it more efficient:

$$\begin{aligned} f_l &= \sigma(W_f V_l^u + R_f h_{l-1} + b_f) \\ s_l &= \sigma(W_s V_l^u + R_s h_{l-1} + b_s) \\ c_l &= \tanh(W_c V_l^u + f_l \odot R_c h_{l-1} + b_c) \\ h_l &= (1 - s_l) \odot h_{l-1} + s_l \odot c_l \end{aligned} \tag{10}$$

where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_l^u$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product, and $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.

Since the context of a word can be determined by other preceding and successive words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise and combined representation of a word, while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To allow the method of textual modeling to be the only variant between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect RNN has on textual modeling for POI recommender systems compared to CNN, as well as enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.

3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^-$.

Due to the implicit feedback nature of the recommendation task, the algorithm's output can be considered a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output probability can be defined as

$$p(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p') \in Y^-} (1 - \hat{y}_{up'}) \tag{11}$$

where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:

$$L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log(1 - \hat{y}_{up}) \tag{12}$$

As there are two more outputs in the model, the users' social network $\psi_c(E_u)$ and the locations' distance graph $\psi_c(E_p)$, two additional loss functions are required to train the network. We follow the process of [2], assuming two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:

$$L_{u\_context} = -\sum_{(u,u_c)} \Big( \psi_c(E_u) - \log \sum_{u'_c \in C_u} \exp(\psi_{c'}(E_u)) \Big) \tag{13}$$

where $\psi_c(E_u)$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding to minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:

$$L_{u\_context} = -I(y \in Y) \log \sigma(\psi_c(E_u)) - I(y \in Y^-) \log \sigma(-\psi_c(E_u)) \tag{14}$$

where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.

We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\_context}$, and $L_{p\_context}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution:

$$L = L_{pred} + \lambda_1 L_{u\_context} + \lambda_2 L_{p\_context} \tag{15}$$

To optimize the combined loss function, a method of gradient descent can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. To avoid additional overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
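Putting the earlier sketches together, the joint objective of equation (15) maps onto Keras' multi-output training. This is again only a sketch: the categorical cross-entropy over the softmax graph outputs stands in for the context losses of (13) and (14), and the early-stopping patience is an assumed value, while the Adam learning rate, epoch budget, batch size, and loss weights follow Section 4.1.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

# Final blend of the two views; eq. (9). h_context, h_reviews, user_in,
# poi_in, du_in, dp_in, psi_u, psi_p come from the earlier sketches.
y_hat = layers.Dense(1, activation="sigmoid", name="pred")(
    layers.Concatenate()([h_context, h_reviews]))

model = Model(inputs=[user_in, poi_in, du_in, dp_in],
              outputs=[y_hat, psi_u, psi_p])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
    loss={"pred": "binary_crossentropy",               # L_pred, eq. (12)
          "user_context": "categorical_crossentropy",  # stand-in for eq. (13)/(14)
          "poi_context": "categorical_crossentropy"},
    loss_weights={"pred": 1.0, "user_context": 0.1, "poi_context": 0.1})  # eq. (15)

# Early stopping guards against overfitting, as described above.
early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                         restore_best_weights=True)
# model.fit(train_inputs, train_targets, validation_data=val_data,
#           epochs=50, batch_size=512, callbacks=[early])
```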

4. Experiments and Evaluation

4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews, along with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews and 98.08% sparsity for the rating matrix. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.
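As an illustration of the geographical graph construction, the 1 km adjacency test could be implemented with a haversine distance. This is a naive O(n²) sketch with hypothetical inputs, not the code used in our experiments:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def poi_edges(pois):
    """pois: dict of poi_id -> (lat, lon); connect pairs within 1 km."""
    ids = list(pois)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if haversine_km(*pois[a], *pois[b]) <= 1.0]
```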

To test our models' performance, the original data was split into training, validation, and test sets by random sampling, with respective ratios of 56%-24%-20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
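A minimal sketch of such negative sampling (the 4:1 ratio from above; function and variable names are illustrative, and users are assumed not to have visited every location):

```python
import random

def sample_negatives(interactions, all_pois, ratio=4):
    """interactions: list of (user, poi) check-ins; all_pois: list of poi ids."""
    visited = {}
    for u, p in interactions:
        visited.setdefault(u, set()).add(p)
    negatives = []
    for u, _ in interactions:
        for _ in range(ratio):
            # Redraw until we hit a location the user has not checked in to.
            p_neg = random.choice(all_pois)
            while p_neg in visited[u]:
                p_neg = random.choice(all_pois)
            negatives.append((u, p_neg))
    return negatives
```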

To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.

In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. This results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden units, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used, over a maximum of 50 epochs and with a batch size of 512 samples.

4.2. Baselines. To evaluate our algorithm, we compare it to the following seven empirically proven frameworks:

(i) HPF [39]: Hierarchical Poisson Factorization, a Bayesian framework for modeling implicit data using Poisson factorization.

(ii) NMF [40]: Nonnegative Matrix Factorization, a CF method that takes only the rating matrix as input.

(iii) Geo-SAGE [5]: a generative method that predicts user check-ins in LBSNs using geographical data and crowd behaviors.

(iv) LCARS [6]: Location Content Aware Recommender System, a probabilistic model that exploits local preferences in LBSN and content information about POIs.

(v) NeuMF [3]: Neural Matrix Factorization, a state-of-the-art model combining MF with MLP on implicit ratings.

(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework with the addition of contextual graph smoothing for POI recommendation.

(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.

Table 1: Performance comparison over the Yelp dataset. The improvement of TCENR compared to each method is shown in brackets.

Model      Accuracy          MSE               Pre@10            Rec@10
HPF        0.8141 (1.69%)    0.1800 (34.94%)   0.5526 (18.51%)   0.3699 (40.98%)
NMF        0.8222 (0.69%)    0.1189 (1.51%)    0.7851 (-16.58%)  0.3517 (48.28%)
Geo-SAGE   0.7995 (3.55%)    0.1807 (35.19%)   0.2912 (124.89%)  0.4145 (25.81%)
LCARS      0.8142 (1.68%)    0.1612 (27.36%)   0.6408 (2.2%)     0.5127 (1.72%)
NeuMF      0.8273 (0.07%)    0.1421 (17.59%)   0.6488 (0.94%)    0.5586 (-6.64%)
PACE       0.8239 (0.49%)    0.1186 (1.26%)    0.6406 (2.23%)    0.5049 (3.29%)
DeepCoNN   0.8037 (3.01%)    0.1454 (19.46%)   0.5385 (21.62%)   0.3230 (64.46%)
TCENR      0.8279            0.1171            0.6549            0.5215
TCENRseq   0.8273 (0.07%)    0.1161 (-0.86%)   0.6655 (-1.59%)   0.4738 (10.07%)

To evaluate our model and the baselines, we apply Accuracy and Mean Square Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the average top 10 predictions per user.
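For concreteness, a sketch of how Pre@10 and Rec@10 can be computed per user and then averaged; the data layout and names are illustrative:

```python
def precision_recall_at_k(scores, relevant, k=10):
    """scores: user -> list of (poi, predicted score); relevant: user -> set of
    held-out positive pois. Returns mean Precision@k and Recall@k."""
    precisions, recalls = [], []
    for u, ranked in scores.items():
        top_k = [p for p, _ in sorted(ranked, key=lambda x: -x[1])[:k]]
        hits = len(set(top_k) & relevant[u])
        precisions.append(hits / k)
        recalls.append(hits / max(len(relevant[u]), 1))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```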

The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.

4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.

As can be witnessed from the results, the proposed model TCENR achieves the best overall results compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN for $p < 0.05$, based on a one-sided unpaired t-test, in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer, but more relevant, recommendations to the user. While NMF provides the best precision score compared to all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on the dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

[Figure 3: Runtime (seconds) of all models on the Yelp dataset (HPF, NMF, Geo-SAGE, LCARS, NeuMF, PACE, DeepCoNN, TCENR, TCENRseq); the vertical axis ranges from 0 to 60,000 seconds.]

Comparing TCENR with its proposed extension TCENRseq provides contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. It may be that, by accurately capturing different aspects from user reviews, the model is able to reinforce its hypotheses and therefore reduce the uncertainty in some cases. However, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance further offers insight toward the selection of CNN or RNN for the task of language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters is increased due to the use of recurrent layers, our RNN-based extension takes 32.9% longer to train compared to TCENR, while achieving comparable results.

4.4. Model Design Analysis. In this section we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, require close attention, as they affect the networks' ability to jointly learn, as well as the prediction itself. To properly select the fusion operator, the following methods were considered:

(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as $TCENR_{con}$ and described in (9).

(ii) Merging the last hidden layers using dot product, resulting in a model named $TCENR_{dot}$ that can be defined as

$$\hat{y}_{up} = h_{context} \cdot h_{reviews} \tag{16}$$

(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as $TCENR_{dot\_con}$ and can be developed by combining (9) and (16) using addition, translating the result to the range $[0, 1]$ with the sigmoid function:

$$\hat{y}_{up} = \sigma(\sigma(W_4 \times [h_{context}, h_{reviews}] + b_4) + h_{context} \cdot h_{reviews}) \tag{17}$$

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as $TCENR_{weight}$, this model can be defined as

$$\hat{y}_{up} = \lambda_1 \sigma(W_5 \times h_{context} + b_5) + \lambda_2 \sigma(W_6 \times h_{reviews} + b_6) \tag{18}$$

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of utilizing the latent features learned by each subnetwork jointly. When combined with the underperforming method of dot product in $TCENR_{dot\_con}$, the use of concatenation improves over dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in $TCENR_{con}$, produces the best results and is therefore integrated into the final model.

[Figure 4: Comparison of merging methods in terms of accuracy: $TCENR_{dot\_con}$ 0.812, $TCENR_{weight}$ 0.823, $TCENR_{dot}$ 0.802, $TCENR_{con}$ 0.828.]
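Each fusion operator is a one-liner in Keras; a sketch reusing h_context and h_reviews from the earlier fragments, with equal weights assumed for $TCENR_{weight}$ (the paper does not fix $\lambda_1$, $\lambda_2$ for this variant):

```python
from tensorflow.keras import layers

# TCENR_con: concatenation followed by a sigmoid layer; eq. (9).
con = layers.Dense(1, activation="sigmoid")(
    layers.Concatenate()([h_context, h_reviews]))
# TCENR_dot: plain dot product of the two views; eq. (16).
dot = layers.Dot(axes=1)([h_context, h_reviews])
# TCENR_dot_con: sigmoid over the sum of the two; eq. (17).
dot_con = layers.Activation("sigmoid")(layers.Add()([con, dot]))
# TCENR_weight: average of per-branch sigmoid predictions; eq. (18),
# with lambda_1 = lambda_2 = 0.5 assumed.
p_ctx = layers.Dense(1, activation="sigmoid")(h_context)
p_rev = layers.Dense(1, activation="sigmoid")(h_reviews)
weight = layers.Average()([p_ctx, p_rev])
```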

4.4.2. MLP Layer Design. Although it was found by [3] that adding more layers and units to the MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggests this is a subject worth investigating.

Table 2: Model accuracy with different numbers of hidden layers (columns) and first-layer sizes (rows).

1st layer   H=1     H=2     H=3     H=4
16          0.824   0.827   -       -
32          0.823   0.837   0.825   -
64          0.822   0.829   0.830   0.827
128         0.823   0.828   0.829   0.827

To this end, we test the proposed algorithm with 1-4 hidden layers used to learn the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. The results in terms of test set accuracy are presented in Table 2, where the number of hidden layers is given in the columns and the size of the first layer in the rows. Unlike previous results, we find that two hidden layers, with 32 and 16 hidden units, result in the best performance for our dataset.

4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words, and eventually for the whole text. Our final dataset, however, is composed of very long reviews, where fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to the range of 500-6000. As can be witnessed from Figure 5, there is a slight improvement in accuracy as the number of words increases up to 3000, while additional words result in an increased bias towards users and locations with longer reviews, and in turn reduce the model's learning capabilities.

[Figure 5: Accuracy as a function of the number of words used per user and location (accuracy axis spanning roughly 0.822-0.834, word counts 0-6000).]

5. Conclusion and Future Work

In this paper we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins, regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

For future work, we intend to extend our models' evaluation over additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data, while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by a National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.
[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.
[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.
[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.
[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.
[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.
[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.
[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.
[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, November 2017.
[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.
[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.
[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.
[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 2643–2651, 2013.
[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.
[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.
[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.
[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.
[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.
[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.
[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference - MM '17, pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.
[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.
[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.
[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.
[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.
[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.
[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.
[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.
[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782–2788, NY, USA, July 2016.
[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.
[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.
[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI 2015, pp. 1340–1346, Argentina, July 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.
[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, November 2017.
[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.
[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326–335, Netherlands, July 2015.
[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: A Joint Deep Recommendation Framework for …downloads.hindawi.com/journals/complexity/2019/2926749.pdfrecommendation process, as will be demonstrated in this paper. Although numerous

2 Complexity

locations that have few or no past interactions known to the system, as the available data is insufficient to fully capture their features. Second, in contrast to other recommendation scenarios, the decision whether to visit a location depends not only on the target user's preferences but also on those of her friends [1]. They might share different interests that are unknown to the system, resulting in a more complex decision process for the RS to learn. Furthermore, attributes that relate to the location itself and are unknown to the system may impact the decision. For example, the availability of parking, convenient public transport, or the popularity of the area itself can be the decisive factor for a user. These challenges have proven classic CF methods insufficient and prone to overfitting the data, as reported by previous works [2, 3].

A common approach to address sparsity while acknowledging the user's driving factors is to utilize contextual data, such as social networks [2, 4], geographical locations [5, 6], categories [1], and textual reviews [7, 8]. Deep neural networks have been enthusiastically adopted in recent years [2, 9, 10] to employ such complex contextual inputs while being able to process the vast amount of available data. Different neural network architectures have been introduced to improve recommendation performance, such as multilayer perceptrons (MLP) [3, 11, 12], convolutional neural networks (CNN) [13-15], and recurrent neural networks (RNN) [16-18]. These techniques were shown to benefit greatly from the addition of contextual attributes. As more data becomes available, incorporating these features presents a promising opportunity to improve the personalized POI recommendation process, as will be demonstrated in this paper.

Although numerous works have established the potential of recommender systems based on deep learning, most have focused on only a single type of neural network that best suited their given task. In this work, however, we incorporate multiple types of neural networks, i.e., MLP, CNN, and RNN, to provide POI recommendation using various types of inputs. First, we describe our proposed method, Textual and Contextual Embedding-based Neural Recommender (TCENR), a framework that takes users' social networks, locations' geographical data, and textual reviews, along with historical activities, to provide personalized top-k POI recommendations. By optimizing MLP and CNN jointly, the proposed model learns different aspects of the same user-location interaction in conjunction and is therefore better suited to capture the underlying factors in the user's selection. We then present an extension of TCENR, denoted TCENRseq, where we replace the CNN component with an RNN and attempt to learn user preferences and location attributes by treating the written reviews as sequential input rather than by focusing on their most important words. Although the proposed solution has been developed for specific types of inputs, we claim it can easily be generalized to a framework able to support additional features.

The main contributions of our work are as follows:

(1) We present TCENR, a framework that jointly trains MLP and CNN to provide POI recommendations, as well as a variation, TCENRseq, that performs the same task while adopting an RNN instead of the CNN component.

(2) To the best of our knowledge, no previous work has jointly trained MLP and CNN for the task of POI recommendation using social networks, geographical locations, and natural language reviews as inputs.

(3) Evaluated over the Yelp dataset, our proposed frameworks were found to outperform seven state-of-the-art baselines in terms of accuracy, MSE, precision, and recall.

(4) By comparing the two alternatives to our suggested model, we provide insight into the impact gained by analyzing textual reviews as a secondary input to the common past interactions, as well as a comparison of CNN and RNN for the task of sentiment analysis in the same experimental settings.

(5) We further present a comprehensive analysis of the most important hyperparameters and design selections of our proposed networks, shedding some light on the different components of deep neural networks in the task of POI recommendation.

In the following section, we first lay the background and introduce works related to our model. In Section 3, we develop our proposed frameworks, which are evaluated and analyzed in Section 4. Finally, we summarize our work and introduce future work in Section 5.

2. Related Work

Incorporating additional data about users and items is a common strategy to provide meaningful recommendations and to mitigate the cold-start problem. In the area of POI recommender systems, where the data is highly sparse, such practices are essential.

2.1. Context-Based Recommender Systems. Location-based social networks are usually rich in contextual inputs, which present various opportunities for data enrichment in RS. Such features include time [9], spatial location [19], the user's social network [4], the item's meta-data [1], photos [20], and demographics [11].

Contextual data is usually incorporated into RS either as part of the input or as a regularizing factor. References [11, 21] exploit the strengths of MLP-based networks in modeling complex relationships by concatenating multiple feature embeddings to the input before feeding it to a series of nonlinear layers. The final layer's output is then a representation of a user-item interaction adjusted to the given context. Sequentiality is often introduced to recommender systems by treating sequences of past user and item interactions in a timely manner. In the case of POI recommendation, this consists of feeding sequences of check-ins to RNN-based layers, such as long short-term memory (LSTM) [22] or the more concise gated recurrent units (GRU) [23].


The use of spatial data is often done by dividing the input space into roles and regions. Assuming users' behavior varies when traveling far from home, previous works [5, 6] generated two profiles for each user: one to be used in her home region and another in more distant locations. A recent approach [24-26] attempts to divide the input space into geographical regions before incorporating them into the model, often by hierarchical structures. Although methods based on regions and roles are able to better distinguish user behaviors in varied locations, they do not provide a personalized user representation and can ignore potential shifts of preferences from one region to another. For example, a user might prefer to visit a Starbucks location in different regions, close to or far from home. Enriching geographical features with additional data is demonstrated in [25, 27], where the next location prediction is partially determined by past sequences of check-ins. However, these methods are not generic and cannot be extended to include additional features derived from social networks or textual data.

Furthermore, in tasks set in highly sparse environments, such as POI recommendation, adding user- or item-specific inputs may diminish the model's ability to generalize. However, applying the same data as a regularizing factor can enhance the model's performance and reduce overfitting. This was done in [4], where the similarity between connected users in the social network was used to constrain a matrix factorization (MF) model. Reference [2] utilized social networks and geographical distances to enforce similar embeddings for users and locations, thus improving the model's ability to generalize for users and locations with few historical records.

2.2. Text-Based Recommendation. Since many websites encourage users to provide a written explanation for their numeric ratings, textual reviews are one of the most popular types of data to be integrated into RS. Previous works adopted probabilistic approaches to alleviate the data sparsity problem using textual input [28-31]. By expressing each review as a bag of words, LDA-based models are able to extract topics, which can be used to represent users' interests and locations' characteristics [5, 6, 25]. These probabilistic methods are usually successful in handling issues that standard CF approaches struggle with, such as out-of-town recommendations, where similar users lack sufficient historical data. However, as demonstrated in recent works [17, 32], failing to preserve the original order of words and ignoring their semantic meaning prevent the successful modeling of a given review. On the other hand, an emerging trend of adopting neural networks over reviews allows such learning without the loss of data. These implementations can generally be categorized into RNN-based [17, 18] and CNN-based [14, 15] models.

RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].

Following its success in the field of computer vision [33], CNN-based models are gaining popularity in other areas, such as textual modeling [34]. By employing a sliding window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprised of an embedding to represent the semantic meaning of each word, a convolution layer for generating local features, and a max pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.

We claim that by jointly learning contextual- and textual-based deep models, our proposed method will better exploit the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This will be achieved by learning users' and locations' representations as similarities in direct interactions, along with the correlation in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, where each is responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extendability to many types of input, our proposed method is tailored for the task of POI recommendation.

3. The Neural Recommender

3.1. Neural Network Architecture. The proposed recommender system aims to improve the POI recommendation task by learning user-location interactions using two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model the user-POI preferences using social and geographical attributes and is based on a multilayer perceptron structure [2, 3]. Shown on the right side of Figure 1, the convolutional neural network is responsible for the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks models the user/POI input individually, with their shared interaction defined in the merge layers.

3.1.1. Context-Based Network Layers. To better capture the complex relations between users and locations in LBSN, we chose to adopt the multilayer perceptron architecture. By stacking multiple layers of nonlinearities, MLP is capable of learning relevant latent factors of its inputs. It is first fed with user and location vectors of sizes $N$ and $M$, where each input tuple $\langle u, p \rangle$ is transformed into sparse one-hot encoding representations. The two fully connected embedding layers found on top of the input layer project the sparse representations of users and locations into smaller and denser vectors.

[Figure 1: The TCENR framework. Left branch: user $u$ and POI $p$ one-hot inputs, embeddings $E_u$/$E_p$ with softmax context outputs $\psi_c^{E_u}$/$\psi_c^{E_p}$ matching the user graph output $g_u$ and POI graph output $g_p$, followed by merge and hidden layers. Right branch: user/POI reviews $d_u$/$d_p$, word embeddings $V_u$/$V_p$, convolutional layers $Z_u$/$Z_p$, pooling layers $O_u$/$O_p$, and fully connected layers $h_u$/$h_p$. The two branches merge to produce the prediction $\hat{y}$.]

For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.

We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi_c^{E_u} \in \mathbb{R}^N$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as

$$\psi_c^{E_u} = a(W^u_c \times E_u + b^u_c) \quad (1)$$

where $W^u_c$ and $b^u_c$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi_c^{E_p}$ will not be developed in this section. Enforcing $\psi_c^{E_u}$ and $\psi_c^{E_p}$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, results in a smoothing factor that limits the amount by which connected entities' embeddings differ.

The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer for the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where

$$x = [E_u, E_p] \quad (2)$$

Since a simple concatenation of the user-location embedded vectors does not allow interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as

$$h^q_{\text{context}}(x) = \mathrm{ReLU}(W_q h^{q-1}_{\text{context}}(x) + b_q) \quad (3)$$

where $W_q$ and $b_q$ are the $q$-th layer's parameters.
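To make the structure concrete, this context branch can be sketched in Keras (the library used in our experiments, see Section 4); the corpus sizes and all identifiers below are illustrative assumptions rather than the authors' actual code, while the embedding and hidden-layer sizes follow the settings later reported in Section 4.1.

```python
# A rough Keras sketch of the context-based branch (Section 3.1.1), assuming
# integer-encoded user/POI ids. Corpus sizes and names are illustrative.
from tensorflow.keras import layers, Model

N_USERS, N_POIS, EMB_DIM = 10_000, 5_000, 10  # assumed corpus sizes

user_in = layers.Input(shape=(1,), name="user_id")
poi_in = layers.Input(shape=(1,), name="poi_id")

# Dense embeddings replace the one-hot projections E_u and E_p.
e_u = layers.Flatten()(layers.Embedding(N_USERS, EMB_DIM)(user_in))
e_p = layers.Flatten()(layers.Embedding(N_POIS, EMB_DIM)(poi_in))

# Merge by concatenation, Eq. (2), then stack ReLU layers, Eq. (3).
x = layers.Concatenate()([e_u, e_p])
x = layers.Dense(32, activation="relu")(x)
h_context = layers.Dense(16, activation="relu")(x)

context_net = Model([user_in, poi_in], h_context)
```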

3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a textual-based network is introduced. It simultaneously learns the same interaction as the contextual-based network, but with a natural language input. Two additional vectors, $d_u$ and $d_p$, representing user $u$'s and location $p$'s textual reviews, respectively, are applied as inputs for this network. Each vector is comprised of all $n$ words written by the user or about the location, merged together and kept in their original order. These words are then mapped to $c$-dimensional vectors defining their semantic meaning in the following embedding layers. The output of the user embedding layer is the representation of all words used by a user $u$ in the form of a matrix and can be denoted as

$$V_u = [\phi(d^u_1), \ldots, \phi(d^u_l), \ldots, \phi(d^u_n)] \quad (4)$$

where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d^u_l$ is user $u$'s $l$-th word, and $\phi: D \rightarrow \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.
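As an illustration, the lookup $\phi$ can be realized as a frozen Keras Embedding layer initialized from pre-trained GloVe vectors [36]; the file path and the toy vocabulary below are assumptions, while $k_w = 50$ follows Section 4.1.

```python
# Sketch: a frozen Embedding layer initialized from pre-trained GloVe
# vectors [36]. The file path and toy vocabulary are assumptions.
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding

K_W = 50
vocab = {"delicious": 1, "steak": 2}  # toy word->index map; 0 is padding

emb_matrix = np.zeros((len(vocab) + 1, K_W))
with open("glove.6B.50d.txt", encoding="utf8") as f:  # assumed GloVe file
    for line in f:
        parts = line.rstrip().split(" ")
        if parts[0] in vocab:
            emb_matrix[vocab[parts[0]]] = np.asarray(parts[1:], dtype="float32")

word_embedding = Embedding(len(vocab) + 1, K_W,
                           embeddings_initializer=Constant(emb_matrix),
                           trainable=False)
```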

[Figure 2: Proposed alternatives for learning user and location representations from textual reviews. (a) The CNN-based component used in TCENR: word embeddings, convolutional layers, max pooling, fully connected. (b) The RNN-based extension used in TCENRseq: word embeddings, bidirectional GRU, max pooling, fully connected.]


Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce $t$ feature maps over the embedded word vectors, using a window size of $ws$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer:

$$z^u_{j,l} = \mathrm{ReLU}(V^u_{l:l+ws} \ast K^u_j + b^u_j), \quad z^u_j = [z^u_{j,1}, \ldots, z^u_{j,l}, \ldots, z^u_{j,n}] \quad (5)$$

where $V^u_l$ is user $u$'s $l$-th input word embedding and $z^u_j$ is the $j$-th feature extracted from the complete text. Following the standard CNN structure, the feature maps produced by the convolution layers are reduced by a pooling layer:

$$o^u_j = \max(z^u_j), \quad O_u = [o^u_1, \ldots, o^u_j, \ldots, o^u_t] \quad (6)$$

where max-pooling is selected to identify the most important words and their latent values. $O_u$ is the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers to jointly model the different features, resulting in the latent textual representations for user $u$ and location $p$, respectively denoted $h_u$ and $h_p$:

$$h_u = \mathrm{ReLU}(W^u_1 \times O_u + b^u_1) \quad (7)$$
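A minimal sketch of this branch for the user side follows (the POI side is symmetric); the filter count, window size, stride, pool size, and 32-unit dense layer follow Section 4.1, while the helper name is illustrative.

```python
# Sketch of the CNN textual branch for the user side (Eqs. (5)-(7)).
from tensorflow.keras import layers

def text_branch(n_words, vocab_size, k_w=50):
    doc_in = layers.Input(shape=(n_words,))
    v = layers.Embedding(vocab_size, k_w)(doc_in)      # word embeddings V_u
    z = layers.Conv1D(filters=3, kernel_size=10, strides=3,
                      activation="relu")(v)            # feature maps, Eq. (5)
    o = layers.MaxPooling1D(pool_size=2)(z)            # pooling, Eq. (6)
    o = layers.Flatten()(o)
    h = layers.Dense(32, activation="relu")(o)         # h_u, Eq. (7)
    return doc_in, h
```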

To combine the outputs of the users' and locations' fully connected layers into the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:

$$h_{\text{reviews}} = \mathrm{ReLU}(W_2 \times [h_u, h_p] + b_2) \quad (8)$$

The two neural networks are then finally merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer responsible for blending the learning:

$$\hat{y}_{up} = \sigma(W_3 \times [h_{\text{context}}, h_{\text{reviews}}] + b_3) \quad (9)$$

where the sigmoid function is selected to transform the hidden layer output to the desired range of [0, 1].

3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension denoted TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture sentences' sequential nature, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient basis for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves performance competitive with LSTM but with fewer parameters, making it more efficient:

$$f_l = \sigma(W_f V^u_l + R_f h_{l-1} + b_f)$$
$$s_l = \sigma(W_s V^u_l + R_s h_{l-1} + b_s)$$
$$c_l = \tanh(W_c V^u_l + f_l \odot R_c h_{l-1} + b_c)$$
$$h_l = (1 - s_l) \odot h_{l-1} + s_l \odot c_l \quad (10)$$

where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V^u_l$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product, and $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.

Since the context of a word can be determined by other preceding and successive words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted $\overrightarrow{h^1_l}$ and $\overleftarrow{h^1_l}$, respectively. To learn a more concise and combined representation of a word, while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h^1_l}$ and $\overleftarrow{h^1_l}$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h^1_l}, \overleftarrow{h^1_l}]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To allow the method of textual modeling to be the only variant between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect RNN has on textual modeling for POI recommender systems compared to CNN, as well as enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.

3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted $Y^-$.

Due to the implicit feedback nature of the recommendation task, the algorithm's output can be considered a binary classification problem. As the sigmoid activation function is used over the last hidden layer, the output probability can be defined as

$$p(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p') \in Y^-} (1 - \hat{y}_{up'}) \quad (11)$$

where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:

$$L_{\text{pred}} = -\sum_{(u,p) \in Y \cup Y^-} \left[ y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log(1 - \hat{y}_{up}) \right] \quad (12)$$

As there are two more outputs in the model, the users' social network output $\psi_c^{E_u}$ and the locations' distance graph output $\psi_c^{E_p}$, two additional loss functions are required to train the network. We follow the process done in [2], assuming two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:

$$L_{u,\text{context}} = -\sum_{(u,u_c)} \left( \psi_c^{E_u} - \log \sum_{u_c' \in C_u} \exp(\psi_{c'}^{E_u}) \right) \quad (13)$$

where $\psi_c^{E_u}$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding with minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the class label $y$:

$$L_{u,\text{context}} = -I(y \in Y) \log \sigma(\psi_c^{E_u}) - I(y \in Y^-) \log \sigma(-\psi_c^{E_u}) \quad (14)$$

where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is not developed here due to space limitations.

We simultaneously minimize the three loss functions $L_{\text{pred}}$, $L_{u,\text{context}}$, and $L_{p,\text{context}}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters to weight the contextual contribution:

$$L = L_{\text{pred}} + \lambda_1 L_{u,\text{context}} + \lambda_2 L_{p,\text{context}} \quad (15)$$

To optimize the combined loss function, a method of gradient descent can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. To avoid additional overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
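In a Keras implementation, this weighted combination can be expressed through named outputs and loss weights, assuming a `model` assembled from the branches sketched above; the output names are illustrative, while the learning rate and $\lambda$ values follow Section 4.1.

```python
# Sketch of the joint objective of Eq. (15), assuming `model` has three
# named outputs: the prediction and the two softmax context outputs.
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=0.005),
    loss={"prediction": "binary_crossentropy",    # L_pred, Eq. (12)
          "user_context": "binary_crossentropy",  # L_u_context, Eq. (14)
          "poi_context": "binary_crossentropy"},  # L_p_context
    loss_weights={"prediction": 1.0,
                  "user_context": 0.1,            # lambda_1
                  "poi_context": 0.1})            # lambda_2
```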

4. Experiments and Evaluation

4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews, along with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews and 98.08% sparsity for the rating matrix. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.

To test our models' performance, the original data was split into training, validation, and test sets by random sampling, with respective ratios of 56%-24%-20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
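A simple sketch of this sampling step, assuming check-ins are given as (user, POI) pairs; all names are illustrative.

```python
# Sketch of the negative-sampling step: 4 unvisited POIs are drawn for every
# observed (user, poi) check-in.
import random

def sample_negatives(positives, all_pois, n_neg=4):
    visited = {}
    for u, p in positives:
        visited.setdefault(u, set()).add(p)
    samples = []
    for u, p in positives:
        samples.append((u, p, 1))                 # positive instance
        for _ in range(n_neg):
            cand = random.choice(all_pois)
            while cand in visited[u]:             # reject visited POIs
                cand = random.choice(all_pois)
            samples.append((u, cand, 0))          # negative instance
    return samples
```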

To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half that of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.

In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. This results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden units, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used, over a maximum of 50 epochs and with a batch size of 512 samples.

4.2. Baselines. To evaluate our algorithm, we chose to compare it to the following seven empirically proven frameworks:

(i) HPF [39]: Hierarchical Poisson Factorization, a Bayesian framework for modeling implicit data using Poisson factorization.

(ii) NMF [40]: Nonnegative Matrix Factorization, a CF method that takes only the rating matrix as input.

(iii) Geo-SAGE [5]: a generative method that predicts user check-ins in LBSNs using geographical data and crowd behaviors.

(iv) LCARS [6]: Location Content Aware Recommender System, a probabilistic model that exploits local preferences in LBSN and content information about POIs.

(v) NeuMF [3]: Neural Matrix Factorization, a state-of-the-art model combining MF with MLP on implicit ratings.

(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework with the addition of contextual graph smoothing for POI recommendation.

(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.

Table 1: Performance comparison over the Yelp dataset. The improvement of TCENR over each method is shown in brackets.

Model    | Accuracy       | MSE             | Pre@10           | Rec@10
HPF      | 0.8141 (1.69%) | 0.1800 (34.94%) | 0.5526 (18.51%)  | 0.3699 (40.98%)
NMF      | 0.8222 (0.69%) | 0.1189 (1.51%)  | 0.7851 (-16.58%) | 0.3517 (48.28%)
Geo-SAGE | 0.7995 (3.55%) | 0.1807 (35.19%) | 0.2912 (124.89%) | 0.4145 (25.81%)
LCARS    | 0.8142 (1.68%) | 0.1612 (27.36%) | 0.6408 (2.20%)   | 0.5127 (1.72%)
NeuMF    | 0.8273 (0.07%) | 0.1421 (17.59%) | 0.6488 (0.94%)   | 0.5586 (-6.64%)
PACE     | 0.8239 (0.49%) | 0.1186 (1.26%)  | 0.6406 (2.23%)   | 0.5049 (3.29%)
DeepCoNN | 0.8037 (3.01%) | 0.1454 (19.46%) | 0.5385 (21.62%)  | 0.3230 (64.46%)
TCENR    | 0.8279         | 0.1171          | 0.6549           | 0.5215
TCENRseq | 0.8273 (0.07%) | 0.1161 (-0.86%) | 0.6655 (-1.59%)  | 0.4738 (10.07%)

For the task of evaluating our model and the baselines, we chose to apply Accuracy and Mean Squared Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the average top 10 predictions per user.
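For reference, the ranking metrics can be computed per user as sketched below; the container names `scores` and `relevant` are assumptions.

```python
# Sketch of the per-user Pre@10 / Rec@10 computation. `scores` maps a user
# to (poi, predicted_score) pairs; `relevant` maps her to ground-truth POIs.
def precision_recall_at_k(scores, relevant, k=10):
    precisions, recalls = [], []
    for user, pairs in scores.items():
        top_k = {p for p, _ in sorted(pairs, key=lambda x: -x[1])[:k]}
        hits = len(top_k & relevant[user])
        precisions.append(hits / k)
        recalls.append(hits / max(len(relevant[user]), 1))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```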

The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.

4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.

As can be seen from the results, the proposed model TCENR achieves the best overall performance compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN for $p < 0.05$, based on a one-sided unpaired t-test, in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer, but more relevant, recommendations to the user. While NMF provides the best precision score compared to all methods, it underperforms in all other measures, making it a less desirable model. A closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on this dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

[Figure 3: Runtime (seconds) of all models on the Yelp dataset.]

Comparing TCENR and its proposed extension TCENRseq provides contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and improved precision score, while accuracy and recall are worsened. It may be that by accurately capturing different aspects from user reviews, the model is able to reinforce its hypotheses and therefore reduce uncertainty in some cases. However, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance further offers insight towards the selection of CNN and RNN for the task of language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters is increased due to the use of recurrent layers, our RNN-based extension takes 329% longer to train compared to TCENR, while achieving comparable results.

4.4. Model Design Analysis. In this section, we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, require close attention, as they affect the networks' ability to jointly learn, as well as the prediction itself. To properly select the fusion operator, the following methods were considered:

(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted TCENR_con and described in (9).

(ii) Merging the last hidden layers using dot product, resulting in a model named TCENR_dot that can be defined as

$$\hat{y}_{up} = h_{\text{context}} \cdot h_{\text{reviews}} \quad (16)$$

(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model is denoted TCENR_dot_con and can be developed by combining (9) and (16) using addition and translating the result to a range of [0, 1] with the sigmoid function:

$$\hat{y}_{up} = \sigma\left(\sigma(W_4 \times [h_{\text{context}}, h_{\text{reviews}}] + b_4) + h_{\text{context}} \cdot h_{\text{reviews}}\right) \quad (17)$$

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted TCENR_weight, this model can be defined as

$$\hat{y}_{up} = \lambda_1 \sigma(W_5 \times h_{\text{context}} + b_5) + \lambda_2 \sigma(W_6 \times h_{\text{reviews}} + b_6) \quad (18)$$
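Given the two branch outputs as Keras tensors, the four variants can be sketched as follows; for the dot-product variants the two outputs are assumed to share the same dimensionality, and the 8-unit hidden layer in the concatenation variant follows Section 4.1.

```python
# Sketch of the four merge variants of Section 4.4.1, given h_context and
# h_reviews as Keras tensors of equal width (an assumption for dot product).
from tensorflow.keras import layers

con = layers.Concatenate()([h_context, h_reviews])

# (i) TCENR_con, Eq. (9): hidden layer over the concatenation.
y_con = layers.Dense(1, activation="sigmoid")(
    layers.Dense(8, activation="relu")(con))

# (ii) TCENR_dot, Eq. (16): plain dot product of the two representations.
y_dot = layers.Dot(axes=1)([h_context, h_reviews])

# (iii) TCENR_dot_con, Eq. (17): sigmoid over the sum of both signals.
y_dot_con = layers.Activation("sigmoid")(
    layers.Add()([layers.Dense(1, activation="sigmoid")(con), y_dot]))

# (iv) TCENR_weight, Eq. (18): weighted average of per-branch predictions
# (shown with equal weights; the paper weights them by lambda_1, lambda_2).
y_weight = layers.Average()([
    layers.Dense(1, activation="sigmoid")(h_context),
    layers.Dense(1, activation="sigmoid")(h_reviews)])
```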

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of utilizing the latent features learned by each subnetwork jointly. When combined with the underperforming dot product method in TCENR_dot_con, the use of concatenation improves over dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in TCENR_con, produces the best results and is therefore integrated into the final model.

4.4.2. MLP Layer Design. Although it was found by [3] that adding more layers and units to the MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggests this is a subject worth investigating.

[Figure 4: Comparison of merging methods in terms of accuracy: TCENR_dot_con 0.812, TCENR_weight 0.823, TCENR_dot 0.802, TCENR_con 0.828.]

Table 2: Model accuracy with different numbers of hidden layers (columns) and first-layer sizes (rows).

1st layer | H=1   | H=2   | H=3   | H=4
16        | 0.824 | 0.827 | -     | -
32        | 0.823 | 0.837 | 0.825 | -
64        | 0.822 | 0.829 | 0.830 | 0.827
128       | 0.823 | 0.828 | 0.829 | 0.827

To this end, we test the proposed algorithm with 1-4 hidden layers used to learn the user-item interaction with contextual regularization, in varying sizes from 8 to 128 hidden units. The results in terms of test set accuracy are presented in Table 2, where the number of hidden layers is given as columns and the size of the first layer is given as rows. Unlike previous results, we find that two hidden layers, with 32 and 16 hidden units, result in the best performance for our dataset.

4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words, and eventually for the whole text. Our final dataset, however, is composed of very long reviews, where fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500-6000. As can be seen from Figure 5, accuracy improves slightly as the number of words increases up to 3000, while additional words introduce a bias towards users and locations with longer reviews, in turn reducing the model's learning capabilities.

[Figure 5: Accuracy as a function of the number of words used per user/location (500-6000); accuracy peaks around 3000 words.]
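A minimal sketch of this truncation step using Keras' sequence utilities; `doc_sequences` is an assumed list of integer word-id sequences, one per user or location.

```python
# Sketch of the word-budget truncation; 3000 is the best-performing budget
# observed in Figure 5.
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 3000
doc_sequences = [[4, 12, 7] * 2000, [9, 3]]   # toy data of uneven lengths
doc_matrix = pad_sequences(doc_sequences, maxlen=MAX_WORDS,
                           padding="post", truncating="post")
print(doc_matrix.shape)  # -> (2, 3000)
```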

5. Conclusion and Future Work

In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users and locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins, regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

For future work, we intend to extend our models' evaluation over additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data, while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975-984, ACM, San Francisco, California, USA, August 2016.

[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254, ACM, Halifax, NS, Canada, August 2017.

[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173-182, Perth, Australia, April 2017.

[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287-296, ACM, February 2011.

[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255-1264, ACM, Sydney, NSW, Australia, August 2015.

[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221-229, ACM, Chicago, Illinois, USA, August 2013.

[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448-456, ACM, August 2011.

[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244, ACM, Sydney, NSW, Australia, August 2015.

[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429-1438, ACM, Singapore, November 2017.

[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053-1058, IEEE, Barcelona, Spain, December 2016.

[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10, ACM, 2016.

[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.

[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 2643-2651, 2013.

[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425-434, ACM, Cambridge, UK, February 2017.

[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233-240, ACM, 2016.

[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.

[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147-154, ACM, Vienna, Austria, September 2015.

[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107-114, ACM, 2016.

[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.

[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916-924, ACM Press, Mountain View, California, USA, October 2017.

[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198, ACM, 2016.

[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495-503, ACM, 2017.

[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46-54, ACM, Marina Del Rey, CA, USA, 2018.

[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551, 2017.

[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631-1640, ACM, Melbourne, Australia, October 2015.

[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.

[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.

[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844, Lyon, France, April 2018.

[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782-2788, NY, USA, July 2016.

[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309-2318, London, UK, August 2018.

[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639-648, Lyon, France, April 2018.

[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340-1346, Argentina, July 2015.

[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.

[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449-1458, ACM, Singapore, November 2017.

[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.

[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.

[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.

[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326-335, Netherlands, July 2015.

[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273-1284, 2014.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 3: A Joint Deep Recommendation Framework for …downloads.hindawi.com/journals/complexity/2019/2926749.pdfrecommendation process, as will be demonstrated in this paper. Although numerous

Complexity 3

The use of spatial data is often done by dividing theinput space into roles and regions Assuming usersrsquo behaviorvaries when traveling far from home previous works [5 6]generated two profiles for each user one to be used in herhome region while another in more distant locations Arecent approach [24ndash26] attempts to divide the input spaceinto geographical regions before incorporated into themodeloften by hierarchical structures Although methods based onregions and roles are able to better distinguish user behaviorsin varied locations they do not provide a personalized userrepresentation and can ignore potential shifts of preferencesfrom one region to another For example a user might preferto visit a Starbucks location in different regions close or farfrom home Enriching geographical features with additionaldata is demonstrated in [25 27] where the next locationprediction is partially determined by past sequences of check-ins However these methods are not generic and cannot beextended to include additional features derived from socialnetworks or textual data

Furthermore in case of tasks in highly sparse environ-ments such as POI recommendation adding user or itemspecific inputs may diminish the modelrsquos ability to generalizeHowever applying the same data as a regulating factor canenhance the modelrsquos performance and reduce over-fittingSuch has been done in [4] where the similarity betweenconnected users in the social network was used to constraina matrix factorization (MF) model Reference [2] utilizedsocial networks and geographical distances to enforce similarembeddings for users and locations thus improving themodelrsquos ability to generalize for users and locations with fewhistorical records

22 Text-Based Recommendation Since many websites en-courage users to provide a written explanation to theirnumeric ratings textual reviews are one of the most populartypes of data to be integrated into RS Previous works hadadopted probabilistic-based approaches to alleviate the datasparsity problem using textual input [28ndash31] By expressingeach review as a bag of words LDA-based models are ableto extract topics which can be used to represent usersrsquointerests and locationsrsquo characteristics [5 6 25] These prob-abilistic methods are usually successful in handling issuesthat standard CF approaches struggle with such as out-of-town recommendations where similar users lack sufficienthistorical data However as demonstrated in recent works[17 32] failing to preserve the original order of wordsand ignoring their semantic meaning prevent the successfulmodeling of a given review On the other hand an emergingtrend of adopting neural networks over reviews allows suchlearning without the loss of data These implementations cangenerally be categorized into RNN-based [17 18] and CNN-based [14 15] models

RNN-based recommender systems usually rely on the sequential structure of sentences to learn their meaning. In [18], the words describing a target item are fed to a bidirectional RNN layer. Following the GRU architecture, the model utilizes an accumulated context from successive words to provide a better representation of each word. To preserve and update the context of words, the recurrent model has to manage a large number of parameters. The problem becomes more prominent when adopting the popular LSTM paradigm, as done in [17].

Following its success in the field of computer vision [33], CNN-based models are gaining popularity in other areas, such as textual modeling [34]. By employing a sliding window over a given document, such networks are able to represent different features found within the text by identifying relevant subsets of words. Reference [15] follows the standard CNN structure, comprised of an embedding layer to represent the semantic meaning of each word, a convolution layer for generating local features, and a max pooling operation to identify the most relevant factors. In [14], two CNNs are developed to represent the target user and item based on their reviews. The resulting vectors are regarded as the user and item representations, which are then fed to a nonlinear layer to learn their corresponding rating.

We claim that, by jointly learning contextual- and textual-based deep models, our proposed method will better exploit the strengths of collaborative filtering while being more resilient to its shortcomings in sparse scenarios. This is achieved by learning users' and locations' representations as similarities in direct interactions, along with the correlation in underlying features extracted from their written reviews. The notable work of [35] proposed JRL, a framework that similarly attempts to jointly optimize multiple models, where each is responsible for learning a unique perspective of the same task by focusing on different inputs. However, while JRL is a general framework focused on extendability to many types of input, our proposed method is tailored for the task of POI recommendation.

3. The Neural Recommender

3.1. Neural Network Architecture. The following recommender system aims to improve the POI recommendation task by learning user-location interactions using two parallel neural networks, as shown in Figure 1. The context-based network, presented in the left part of the figure, is designed to model the user-POI preferences using social and geographical attributes, and is based on a multilayer perceptron structure [2, 3]. Shown in the right side of Figure 1, the convolutional neural network is responsible for the textual modeling unit [14]. It attempts to learn the same preference by analyzing the underlying meaning in users' and locations' reviews. Each of the two networks is based on modeling the user/POI input individually with regard to their shared interaction, defined in the merge layers.

3.1.1. Context-Based Network Layers. To better capture the complex relations between users and locations in LBSNs, we chose to adopt the multilayer perceptron architecture. By stacking multiple layers of nonlinearities, MLP is capable of learning relevant latent factors of its inputs. It is first fed with user and location vectors of sizes $N$ and $M$, where each input tuple $\langle u, p \rangle$ is transformed into sparse one-hot encoding representations. The two fully connected embedding layers found on top of the input layer project the sparse representations of users and locations into smaller and denser vectors.

Figure 1: TCENR framework. The left (context-based) branch feeds one-hot user $u$ and POI $p$ inputs through embedding layers ($E_u$, $E_p$), whose softmax outputs ($\psi_c E_u$, $\psi_c E_p$) are regularized against the user social graph ($g_u$) and the POI spatial graph ($g_p$); the embeddings are concatenated in a merge layer and passed through hidden layers. The right (textual) branch feeds user reviews ($d_u$) and POI reviews ($d_p$) through word embeddings ($V_u$, $V_p$), convolutional layers ($Z_u$, $Z_p$), pooling layers ($O_u$, $O_p$), and fully connected layers ($h_u$, $h_p$) into a merge layer and a hidden layer. A final merge layer and hidden layer blend the two branches into the prediction $\hat{y}$.

For $u$ and $p$, the respective embedding matrices are $E_u \in \mathbb{R}^{k_u \times N}$ and $E_p \in \mathbb{R}^{k_p \times M}$, where $k_u$ and $k_p$ are the corresponding dimensions.

We exploit the users' social networks along with the locations' geographical graphs to constrain the learned embeddings. Two softmax layers take the representations $E_u$ and $E_p$ as input and transform them back to $N$- and $M$-sized vectors, respectively. The user output layer $\psi_c E_u \in \mathbb{R}^N$ describes the similarity of a user's embedding to all $N$ users and can be formally denoted as

$\psi_c E_u = a(W_c^u \times E_u + b_c^u)$ (1)

where $W_c^u$ and $b_c^u$ are the layer's weight matrix and bias vector, and $a$ is a nonlinear activation function. Due to the similarity between the user- and POI-specific layers, the location output layer $\psi_c E_p$ will not be developed in this section. Enforcing $\psi_c E_u$ and $\psi_c E_p$ to resemble user $u$'s social graph $g_u$ and location $p$'s spatial graph $g_p$, respectively, results in a smoothing factor that limits the amount by which connected entities' embeddings differ.

The user and location embeddings are projected to a merge layer and combined by a concatenation operator. Using concatenation instead of dot product allows varied embedding structures, which in turn improves the generated representations [3]. As the input layer for the following neural network, the merge layer can be represented as $h^{(0)}(x)$, where

$x = [E_u, E_p]$ (2)

Since simple concatenation of the user-location embedded vectors does not allow for interactions to be modeled, hidden layers are added to learn these connections. The popular Rectified Linear Unit (ReLU) is employed as the activation function for these layers. More formally, the $q$-th hidden layer can be defined as

$h^{q}_{context}(x) = ReLU(W_q h^{q-1}_{context}(x) + b_q)$ (3)

where $W_q$ and $b_q$ are the $q$-th layer's parameters.
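To make the layer stack concrete, the sketch below shows how this context branch could be assembled in Keras, the framework our models are implemented in (Section 4.2). The user/POI counts, the variable names, and the use of integer ids in place of explicit one-hot vectors are our own illustrative assumptions, not released code; the layer sizes follow Section 4.1.

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    N, M = 10000, 8000        # assumed numbers of users and POIs
    K_U = K_P = 10            # embedding sizes as in Section 4.1

    user_in = keras.Input(shape=(1,), name="user_id")
    poi_in = keras.Input(shape=(1,), name="poi_id")

    # Embedding layers E_u and E_p; integer ids stand in for one-hot inputs
    e_u = layers.Flatten()(layers.Embedding(N, K_U)(user_in))
    e_p = layers.Flatten()(layers.Embedding(M, K_P)(poi_in))

    # Softmax context outputs of Eq. (1), later tied to the social and
    # spatial graphs through the context losses of Section 3.3
    psi_u = layers.Dense(N, activation="softmax", name="user_context")(e_u)
    psi_p = layers.Dense(M, activation="softmax", name="poi_context")(e_p)

    # Concatenation merge (Eq. (2)) and the ReLU tower of Eq. (3): 32 -> 16
    x = layers.Concatenate()([e_u, e_p])
    x = layers.Dense(32, activation="relu")(x)
    h_context = layers.Dense(16, activation="relu")(x)

Keeping the two softmax heads as named model outputs is what later allows the graph-based context losses to be attached to them during training.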

3.1.2. Textual Modeling Network Layers. To improve the model's coverage, a textual-based network is introduced. It simultaneously learns the same interaction as the contextual-based network, but with a natural language input. Two additional vectors, $d_u$ and $d_p$, representing user $u$'s and location $p$'s textual reviews, respectively, are applied as inputs for this network. Each vector is comprised of all $n$ words written by the user or about the location, merged together and kept in their original order. These words are then mapped to $c$-dimensional vectors defining their semantic meaning in the following embedding layers. The output of the user embedding layer is the representation of all words used by a user $u$ in the form of a matrix and can be denoted as

$V_u = [\phi(d_u^1), \ldots, \phi(d_u^l), \ldots, \phi(d_u^n)]$ (4)

where $[\phi_1, \phi_2]$ denotes the concatenation of two vectors, $d_u^l$ is user $u$'s $l$-th word, and $\phi: D \to \mathbb{R}^{k_w}$ is a lookup function to a pre-trained textual embedding layer [36, 37] that represents each word in vocabulary $D$ as a vector of size $k_w$. Similarly, $V_p$ denotes the word embedding matrix for location $p$.

Figure 2: Proposed alternatives to learn user and location representations from textual reviews. (a) The CNN-based component employed in TCENR: word embeddings, convolutional layers, max pooling, and a fully connected layer. (b) The suggested RNN extension: word embeddings, a bidirectional GRU, max pooling, and a fully connected layer.

Due to the large number of parameters required to train the aforementioned contextual model, the textual network is implemented using a CNN-based architecture, which is usually more computationally efficient than RNN. The semantic representations of users' and locations' reviews are fed to convolution layers to detect the parts of the text that best capture the review's meaning. These layers produce $t$ feature maps over the embedded word vectors, using a window size of $ws$ and filter $K \in \mathbb{R}^{k_w \times t}$. As suggested by [14], ReLU is used as the activation function for this layer:

$z_u^{j,l} = ReLU(V_u^{l:l+ws} \ast K_u^j + b_u^j)$, $z_u^j = [z_u^{j,1}, \ldots, z_u^{j,l}, \ldots, z_u^{j,n}]$ (5)

where $V_u^l$ is user $u$'s $l$-th input word embedding and $z_u^j$ is the $j$-th feature extracted from the complete text. Based on the standard CNN structure, the feature maps produced by the convolution layers are reduced by a pooling layer:

$o_u^j = \max(z_u^j)$, $O_u = [o_u^1, \ldots, o_u^j, \ldots, o_u^t]$ (6)

where max-pooling is selected to identify the most important words and their latent values. $O_u$ is the collection of all concise features extracted from user $u$'s textual input. These are followed by fully connected layers to jointly model the different features, resulting in the latent textual representations for user $u$ and location $p$, respectively denoted as $h_u$ and $h_p$:

$h_u = ReLU(W_1^u \times O_u + b_1^u)$ (7)

To combine the outputs of the users' and locations' fully connected layers into the same feature space, a shared layer is utilized. It concatenates its two inputs and learns their interaction by employing an additional hidden layer:

$h_{reviews} = ReLU(W_2 \times [h_u, h_p] + b_2)$ (8)
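Continuing the sketch (same imports and hedged assumptions as above), the review branch of Eqs. (4)-(8) maps onto a small CNN tower per input, with the Section 4.1 settings of 50-dimensional pretrained embeddings, window size 10, stride 3, 3 feature maps, and a pool size of 2; the vocabulary and sequence sizes are our assumptions:

    VOCAB, SEQ_LEN, EMB_DIM = 30000, 3000, 50   # assumed vocabulary and text length

    def review_tower(name):
        """One CNN tower; the user and POI review inputs use identical structure."""
        doc_in = keras.Input(shape=(SEQ_LEN,), name=name + "_reviews")
        # Word embedding V of Eq. (4); weights would be loaded from GloVe/word2vec
        v = layers.Embedding(VOCAB, EMB_DIM, trainable=False)(doc_in)
        # Convolution of Eq. (5): t=3 feature maps, window ws=10, stride 3, ReLU
        z = layers.Conv1D(filters=3, kernel_size=10, strides=3, activation="relu")(v)
        # Max pooling (Eq. (6)) followed by the fully connected layer of Eq. (7)
        o = layers.Flatten()(layers.MaxPooling1D(pool_size=2)(z))
        return doc_in, layers.Dense(32, activation="relu")(o)

    du_in, h_u = review_tower("user")
    dp_in, h_p = review_tower("poi")
    # Shared interaction layer of Eq. (8); the 8-unit size follows Section 4.1
    h_reviews = layers.Dense(8, activation="relu")(layers.Concatenate()([h_u, h_p]))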

The two neural networks are then finally merged to produce a prediction $\hat{y}_{up} \in [0, 1]$. The last layers of the two networks, each representing a different view of the user-location interaction, are concatenated and fed to yet another hidden layer, responsible for blending the learning:

$\hat{y}_{up} = \sigma(W_3 \times [h_{context}, h_{reviews}] + b_3)$ (9)

where the sigmoid function was selected to transform the hidden layer output to the desired range of [0, 1].

3.2. Sequential Textual Modeling. To further investigate the gain achieved by integrating a textual modeling component over reviews in TCENR, we suggest an extension, denoted as TCENRseq. Following its success in previous language modeling tasks [17, 18] and its ability to capture sentences' sequential nature, we employ an RNN component to learn latent features from reviews. An illustration of the proposed extension is presented in Figure 2(b), while the CNN method used in the vanilla TCENR is shown in Figure 2(a) to provide a convenient base for comparison. More specifically, we follow the findings of previous works [18, 23] and implement our recurrent network using GRU, an architecture that achieves competitive performance compared to LSTM but with fewer parameters, making it more efficient:

$f_l = \sigma(W_f V_u^l + R_f h_{l-1} + b_f)$
$s_l = \sigma(W_s V_u^l + R_s h_{l-1} + b_s)$
$c_l = \tanh(W_c V_u^l + f_l \odot R_c h_{l-1} + b_c)$
$h_l = (1 - s_l) \odot h_{l-1} + s_l \odot c_l$ (10)

where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_u^l$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. $\odot$ denotes the element-wise product, and $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.

Since the context of a word can be determined by other preceding and successive words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$, respectively. To learn a more concise and combined representation of a word, while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h}^1_l$ and $\overleftarrow{h}^1_l$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h}^1_l, \overleftarrow{h}^1_l]$. The second recurrent unit will output $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To allow the method of textual modeling to be the only variant between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect RNN has on textual modeling for POI recommender systems compared to CNN, as well as enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
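Under the same assumptions as the earlier sketches, the TCENRseq variant swaps only the convolutional stage for the stacked bidirectional GRUs; the per-direction hidden size (16 here) is our own guess, as the paper does not report it:

    def seq_review_tower(name):
        """TCENRseq tower: two stacked BiGRUs replace the Conv1D stage."""
        doc_in = keras.Input(shape=(SEQ_LEN,), name=name + "_reviews")
        v = layers.Embedding(VOCAB, EMB_DIM, trainable=False)(doc_in)
        # First BiGRU emits forward/backward states per word, concatenated
        h1 = layers.Bidirectional(layers.GRU(16, return_sequences=True))(v)
        # Second BiGRU consumes e_l^2 = [h_fwd, h_bwd] and re-emits per-word vectors
        h2 = layers.Bidirectional(layers.GRU(16, return_sequences=True))(h1)
        # Same pooling and fully connected stages as Eqs. (6)-(7)
        o = layers.Flatten()(layers.MaxPooling1D(pool_size=2)(h2))
        return doc_in, layers.Dense(32, activation="relu")(o)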

3.3. Training the Network. To train the recommendation models, we adopt a pointwise loss objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative samples from the dataset, denoted as $Y^-$.

Due to the implicit feedback nature of the recommendation task, the algorithm's output can be considered as a binary classification problem. As the sigmoid activation function is being used over the last hidden layer, the output probability can be defined as

$p(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p') \in Y^-} (1 - \hat{y}_{up'})$ (11)

where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively. Similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:

$L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} \big( y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log(1 - \hat{y}_{up}) \big)$ (12)

As there are two more outputs in the model, the users' social network $\psi_c E_u$ and the locations' distance graph $\psi_c E_p$, two additional loss functions are required to train the network. We follow the process done in [2], assuming two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:

$L_{u\_context} = -\sum_{(u, u_c)} \Big( \psi_c E_u - \log \sum_{u_c' \in C_u} \exp(\psi_{c'} E_u) \Big)$ (13)

where $\psi_c E_u$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding with minimizing the cross-entropy loss of user $i$ and context $c$ with respect to the $y$ class label:

$L_{u\_context} = -I(y \in Y) \log \sigma(\psi_c E_u) - I(y \in Y^-) \log \sigma(-\psi_c E_u)$ (14)

where $I$ is a function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.

We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\_context}$, and $L_{p\_context}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution:

$L = L_{pred} + \lambda_1 L_{u\_context} + \lambda_2 L_{p\_context}$ (15)

To optimize the combined loss function, a method of gradient descent can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. In order to avoid additional overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
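Tying the pieces together, the joint objective of (15) maps naturally onto weighted multi-output training in Keras. In this hedged sketch the two context heads are trained with categorical cross-entropy against graph-derived target distributions, a simplification of (13)-(14) rather than the paper's exact losses; the hyperparameters follow Section 4.1:

    # Final blend of the two views (Eq. (9))
    pred = layers.Dense(1, activation="sigmoid", name="prediction")(
        layers.Concatenate()([h_context, h_reviews]))

    model = keras.Model(inputs=[user_in, poi_in, du_in, dp_in],
                        outputs=[pred, psi_u, psi_p])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.005),
        loss=["binary_crossentropy",
              "categorical_crossentropy",   # stand-in for L_u_context
              "categorical_crossentropy"],  # stand-in for L_p_context
        loss_weights=[1.0, 0.1, 0.1])       # lambda_1 = lambda_2 = 0.1

    early_stop = keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)
    # model.fit(..., batch_size=512, epochs=50, callbacks=[early_stop])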

4. Experiments and Evaluation

4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews, along with the users' friends and the businesses' locations.


Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with less than 100 written reviews or less than 10 friends are removed. The filtered dataset includes 141,028 reviews and 98.08% sparsity for the rating matrix. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are up to 1 km apart.

To test our models' performance, the original data was split into training-validation-test sets by random sampling, with the respective ratios of 56%-24%-20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
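The 4:1 negative sampling can be realized in a few lines; a minimal sketch, assuming check-ins are stored as (user, poi) pairs and negatives are drawn uniformly from POIs the user has not visited:

    import random

    def sample_negatives(check_ins, num_pois, ratio=4):
        """Pair each observed check-in with `ratio` unvisited POIs labeled 0."""
        visited = {}
        for u, p in check_ins:
            visited.setdefault(u, set()).add(p)
        samples = []
        for u, p in check_ins:
            samples.append((u, p, 1))                 # positive instance
            for _ in range(ratio):
                neg = random.randrange(num_pois)
                while neg in visited[u]:              # resample already-visited POIs
                    neg = random.randrange(num_pois)
                samples.append((u, neg, 0))           # negative instance
        return samples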

To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.

In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden layers, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used, over 50 maximum epochs and with a batch size of 512 samples.

4.2. Baselines. To evaluate our algorithm, we compare it against the following seven empirically proven frameworks:

(i) HPF [39]: Hierarchical Poisson Factorization, a Bayesian framework for modeling implicit data using Poisson factorization.

(ii) NMF [40]: Nonnegative Matrix Factorization, a CF method that takes only the rating matrix as input.

(iii) Geo-SAGE [5]: a generative method that predicts user check-ins in LBSNs using geographical data and crowd behaviors.

(iv) LCARS [6]: Location Content Aware Recommender System, a probabilistic model that exploits local preferences in LBSNs and content information about POIs.

(v) NeuMF [3]: Neural Matrix Factorization, a state-of-the-art model combining MF with MLP on implicit ratings.

(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework with the addition of contextual graphs' smoothing for POI recommendation.

(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.

Table 1: Performance comparison over the Yelp dataset. Improvement of TCENR compared to each method is shown in brackets.

Model     | Accuracy         | MSE              | Pre@10            | Rec@10
HPF       | 0.8141 (1.69%)   | 0.1800 (34.94%)  | 0.5526 (18.51%)   | 0.3699 (40.98%)
NMF       | 0.8222 (0.69%)   | 0.1189 (1.51%)   | 0.7851 (-16.58%)  | 0.3517 (48.28%)
Geo-SAGE  | 0.7995 (3.55%)   | 0.1807 (35.19%)  | 0.2912 (124.89%)  | 0.4145 (25.81%)
LCARS     | 0.8142 (1.68%)   | 0.1612 (27.36%)  | 0.6408 (2.2%)     | 0.5127 (1.72%)
NeuMF     | 0.8273 (0.07%)   | 0.1421 (17.59%)  | 0.6488 (0.94%)    | 0.5586 (-6.64%)
PACE      | 0.8239 (0.49%)   | 0.1186 (1.26%)   | 0.6406 (2.23%)    | 0.5049 (3.29%)
DeepCoNN  | 0.8037 (3.01%)   | 0.1454 (19.46%)  | 0.5385 (21.62%)   | 0.3230 (64.46%)
TCENR     | 0.8279           | 0.1171           | 0.6549            | 0.5215
TCENRseq  | 0.8273 (0.07%)   | 0.1161 (-0.86%)  | 0.6655 (-1.59%)   | 0.4738 (10.07%)

For the task of evaluating our model and the baselines, we apply Accuracy and Mean Squared Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the average top-10 predictions per user.
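For reference, the top-10 metrics are commonly computed as below; this reflects our reading of the setup rather than the authors' evaluation script:

    import numpy as np

    def precision_recall_at_k(scores, labels, k=10):
        """Per-user Pre@k / Rec@k from predicted scores and 0/1 ground truth."""
        top = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
        hits = labels[top].sum()
        precision = hits / k
        recall = hits / max(labels.sum(), 1)   # guard users without positives
        return precision, recall

Averaging the two values over all users yields the Pre@10 and Rec@10 columns of Table 1.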

The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.

4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.

As can be witnessed from the results, the proposed model, TCENR, achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN for p < 0.05, based on a one-sided unpaired t-test, in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer, but more relevant, recommendations to the user. While NMF provides the best precision score compared to all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested.

Figure 3: Runtime (seconds) of all models on the Yelp dataset (y-axis: 0-60,000 seconds).

In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on the dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

Comparing TCENR and its proposed extension, TCENRseq, provides contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. It may be considered that, by accurately capturing different aspects from user reviews, the model is able to reinforce its hypotheses and therefore reduce the uncertainty in some cases. However, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance further suggests additional insight towards the selection of CNN or RNN for the task of language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters is increased due to the use of recurrent layers, our RNN-based extension takes 329% longer to train compared to TCENR, while achieving comparable results.

4.4. Model Design Analysis. In this section, we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The importance of the model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, requires close attention, as it affects the networks' ability to jointly learn, as well as the prediction itself. To properly select the fusion operator, the following methods were considered:

(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as $TCENR_{con}$ and described in (9).

(ii) Merging the last hidden layers using dot product, resulting in a model named $TCENR_{dot}$ that can be defined as

$\hat{y}_{up} = h_{context} \cdot h_{reviews}$ (16)

(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model, denoted as $TCENR_{dot\_con}$, can be developed by combining (9) and (16) using addition, translating the result to the range [0, 1] with the sigmoid function:

$\hat{y}_{up} = \sigma\big(\sigma(W_4 \times [h_{context}, h_{reviews}] + b_4) + h_{context} \cdot h_{reviews}\big)$ (17)

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as $TCENR_{weight}$, this model can be defined as

$\hat{y}_{up} = \lambda_1 \sigma(W_5 \times h_{context} + b_5) + \lambda_2 \sigma(W_6 \times h_{reviews} + b_6)$ (18)

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of jointly utilizing the latent features learned by each subnetwork. When combined with the underperforming method of dot product in $TCENR_{dot\_con}$, the use of concatenation improves over dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in $TCENR_{con}$, produces the best results and is therefore integrated into the final model.
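The four fusion operators can be expressed compactly on top of the earlier sketches; $W_4$-$W_6$ are realized as Dense layers, and since the dot product requires both vectors to share a dimension, we project $h_{context}$ first, an implementation detail of our own:

    # (i) concatenation, Eq. (9) -- the variant kept in the final TCENR
    y_con = layers.Dense(1, activation="sigmoid")(
        layers.Concatenate()([h_context, h_reviews]))
    # (ii) dot product, Eq. (16), after projecting to a shared size
    h_ctx = layers.Dense(8)(h_context)
    y_dot = layers.Dot(axes=1)([h_ctx, h_reviews])
    # (iii) concatenation and dot product blended by addition, Eq. (17)
    y_dot_con = layers.Activation("sigmoid")(layers.Add()([y_con, y_dot]))
    # (iv) weighted average of per-branch predictions, Eq. (18);
    # equal weights stand in for the lambda coefficients
    p_ctx = layers.Dense(1, activation="sigmoid")(h_context)
    p_rev = layers.Dense(1, activation="sigmoid")(h_reviews)
    y_weight = layers.Average()([p_ctx, p_rev])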

4.4.2. MLP Layer Design. Although it was found by [3] that adding more layers and units to the MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggests this is a subject worth investigating.


Figure 4: Comparison of merging methods in terms of accuracy: $TCENR_{con}$ 0.828, $TCENR_{weight}$ 0.823, $TCENR_{dot\_con}$ 0.812, $TCENR_{dot}$ 0.802.

Table 2: Model's accuracy with different numbers of layers.

1st layer | H=1   | H=2   | H=3   | H=4
16        | 0.824 | 0.827 | -     | -
32        | 0.823 | 0.837 | 0.825 | -
64        | 0.822 | 0.829 | 0.83  | 0.827
128       | 0.823 | 0.828 | 0.829 | 0.827

To this end, we test the proposed algorithm with 1-4 hidden layers used to learn the user-item interaction with contextual regularization, in varying sizes from 8 to 128 hidden units. The results in terms of the test set's accuracy are presented in Table 2, where the number of hidden layers is given as columns and the size of the first layer is presented as rows. Unlike previous results, we find that two hidden layers, with 32 and 16 hidden units, result in the best performance for our dataset.

4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews, where more than 20,000 words are required to fully learn a single user or location, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500-6000. As can be witnessed from Figure 5, there is a slight improvement in accuracy as the number of words increases, up to 3000, while additional words result in an increased bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.

Figure 5: Number of words comparison in terms of accuracy (x-axis: number of words, 0-6000; y-axis: accuracy, 0.822-0.834).

5. Conclusion and Future Work

In this paper we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins, regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.


For future work, we intend to extend our models' evaluation over additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data, while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.

[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.

[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173–182, Perth, Australia, April 2017.

[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.

[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.

[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.

[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.

[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.

[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, November 2017.

[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.

[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.

[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.

[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 2643–2651, 2013.

[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.

[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.

[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.

[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.

[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.

[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.

[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.

[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.

[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.

[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.

[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.

[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.

[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.

[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.

[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.

[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782–2788, NY, USA, July 2016.

[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.

[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.

[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340–1346, Argentina, July 2015.

[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014.

[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, November 2017.

[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.

[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013.

[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.

[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326–335, Netherlands, July 2015.

[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: A Joint Deep Recommendation Framework for …downloads.hindawi.com/journals/complexity/2019/2926749.pdfrecommendation process, as will be demonstrated in this paper. Although numerous

4 Complexity

Prediction y

Hidden Layer

Hidden Layer

Merge Layer

Merge Layer

Hidden Layer

Merge Layer

oplus oplus

oplus oplusoplus oplus

User GraphOutput gu

POI GraphOutput gp

Somax Layer(cEu)

Somax Layer(cEp)

User Embedding(Eu)

POI Embedding(Ep)

User u POI p

Full Connected (ℎu) Fully Connected (ℎp)

Pooling Layer (Ou) Pooling Layer (Op)

Conv Layer (Zu) Conv Layer (Zp)

Word Embedding (Vu) Word Embedding (Vp)

User Reviews (du) POI Reviews (dp)

Figure 1 TCENR Framework

vectors For 119906 and 119901 the respective embedding matrices are119864119906 isin R119896119906times119873 and 119864119901 isin R119896119901times119872 where 119896119906 and 119896119901 are thecorresponding dimensions

We exploit the usersrsquo social networks along with the loca-tionsrsquo geographical graphs to constrain the learned embed-dings Two softmax layers take the representations 119864119906 and 119864119901as input and transform them back to N and M sized vectorsrespectively The user output layer 120595119888119864119906 isin R119873 describes thesimilarity of a userrsquos embedding to all N users and can beformally denoted as

120595119888119864119906 = 119886 (119882119906119888 times 119864119906 + 119887119906119888 ) (1)

where 119882119906119888 and 119887119906119888 are the layerrsquos weight matrix and biasvector and a is a nonlinear activation function Due tothe similarity between the user and POI specific layers thelocation output layer 120595119888119864119901 will not be developed in thissection Enforcing 120595119888119864119906 and 120595119888119864119901 to resemble user ursquos socialgraph 119892119906 and location prsquos spatial graph 119892119901 respectivelyresults in a smoothing factor that limits the amount by whichconnected entitiesrsquo embeddings differ

The user and location embeddings are projected to amerge layer and combined by a concatenation operatorUsing concatenation instead of dot-product allows variedembedding structures which in turn improves the generatedrepresentations [3] As the input layer for the following neuralnetwork the merge layer can be represented as ℎ(0)(119909) where

119909 = [119864119906 119864119901] (2)

Since simple concatenation of the user-location embeddedvectors does not allow for interactions to bemodeled hiddenlayers are added to learn these connections The popularRectified Linear Units (ReLU) is employed as the activationfunction for these layers More formally the q-th hidden layercan be defined as

ℎ119902119888119900119899119905119890119909119905 (119909) = 119877119890119871119880 (119882119902ℎ119902minus1119888119900119899119905119890119909119905 (119909) + 119887119902) (3)

where119882119902 and 119887119902 are the q-th layer parameters

312 Textual Modeling Network Layers To improve themodelrsquos coverage a textual-based network is introduced Itsimultaneously learns the same interaction as the contextual-based network but with a natural language input Twoadditional vectors 119889119906 and 119889119901 representing user ursquos andlocation prsquos textual reviews respectively are applied as inputsfor this network Each vector is comprised of all n wordswritten by the user or about the location merged togetherkept in their original order These words are then mappedto c-dimensional vectors defining their semantic meaningin the following embedding layers The output of the userembedding layer is the representation of all words used bya user u in the form of a matrix and can be denoted as

119881119906 = [120601 (1198891199061) 120601 (119889119906119897 ) 120601 (119889119906119899)] (4)

where [1206011 1206012] denotes the concatenation of two vectors 119889119906119897 isuser 119906rsquos 119897-th word and 120601 119863 997888rarr R119896119908 is a lookup function to

Complexity 5

Fully Connected

Max Pooling

ConvolutionalLayers

WordEmbeddings

delicious healthy food steak is amazing

(a) Textual modeling component using CNN

Fully Connected

BI-directionalGRU

Max Pooling

WordEmbeddings

delicious healthy food steak is amazing

(b) Textual modeling component using RNN

Figure 2 Proposed alternatives to learn user and location representations from textual reviews 2(a) is a CNN-based solution employed inTCENR while 2(b) illustrates the suggested extension using RNN

a pre-trained textual embedding layer [36 37] that representseach word in vocabulary 119863 as a vector in size 119896119908 Similarly119881119901 denotes the word embedding matrix for location 119901

Due to the large amount of parameters required to trainthe aforementioned contextual model the textual network isimplemented using a CNN-based architecture which is usu-ally more computationally efficient than RNNThe semanticrepresentations of usersrsquo and locationsrsquo reviews are fed toconvolution layers to detect parts of the text that best capturethe reviewrsquos meaning These layers produce 119905 feature mapsover the embedded word vector using a window size of 119908119904and filter119870 isin R119896119908times119905 As suggested by [14] ReLU is used as anactivation function for this layer

119911119906119895119897 = 119877119890119871119880 (119881119906119897119897+119908119904 lowast 119870119906119895 + 119887119906119895 ) 119911119906119895 = [1199111199061198951 119911119906119895119897 119911119906119895119899] (5)

where 119881119906119897 is user 119906rsquos 119897-th input word embedding and 119911119906119895 the119895-th feature extracted from the complete textBased on the standard CNN structure feature maps

produced by the convolution layers are reduced by a poolinglayer

119900119906119895 = max (119911119906119895 ) 119874119906 = [1199001199061 119900119906119895 119900119906119905 ] (6)

wheremax-pooling is selected to identify the most importantwords and their latent values 119874119906 is the collection of allconcise features extracted from user 119906rsquos textual input Theseare followed by fully connected layers to jointlymodel the dif-ferent features and result in the latent textual representationsfor user 119906 and location 119901 respectively denoted as ℎ119906 and ℎ119901

ℎ119906 = 119877119890119871119880 (1198821199061 times 119874119906 + 1198871199061 ) (7)

To combine the outputs of the users and locations fullyconnected layers to the same feature space a shared layeris utilized It concatenates its two inputs and learns theirinteraction by employing an additional hidden layer

ℎ119903119890V119894119890119908119904 = 119877119890119871119880 (1198822 times [ℎ119906 ℎ119901] + 1198872) (8)

The two neural networks are then finally merged toproduce a prediction 119910119906119901 isin [0 1] The last layers of thetwo networks each representing a different view of the user-location interaction are concatenated and fed to yet anotherhidden layer responsible to blend the learning

119910119906119901 = 120590 (1198823 times [ℎ119888119900119899119905119890119909119905 ℎ119903119890V119894119890119908119904] + 1198873) (9)

where the sigmoid function was selected to transform thehidden layer output to the desired range of [0 1]32 Sequential Textual Modeling To further investigate thegain achieved by integrating a textual modeling componentover reviews in TCENR we suggest an extension denoted asTCENRseq Following its success in previous language model-ing tasks [17 18] and its ability to capture sentencesrsquo sequentialnature we employ anRNNcomponent to learn latent featuresfrom reviews An illustration of the proposed extension ispresented in Figure 2(b) while the CNN method used inthe vanilla TCENR is shown in Figure 2(a) to provide aconvenient base for comparisons More specifically we followthe findings of previous works [18 23] and implement ourrecurrent network using GRU an architecture that achievescompetitive performance compared to LSTM but with fewerparameters making it more efficient

119891119897 = 120590 (119882119891119881119906119897 + 119877119891ℎ119897minus1 + 119887119891)119904119897 = 120590 (119882119904119881119906119897 + 119877119904ℎ119897minus1 + 119887119904)

6 Complexity

119888119897 = tanh (119882119888119881119906119897 + 119891119897 ⊙ 119877119888ℎ119897minus1 + 119887119888)ℎ119897 = (1 minus 119904119897) ⊙ ℎ119897minus1 + 119904119897 ⊙ 119888119897

(10)

where 119891119897 is the forget gate for input word 119897 119904119897 is the outputgate 119888119897 is the new candidate state combining the current wordembedding 119881119906119897 with the previous hidden state and ℎ119897 iscurrent state for word 119897modeled by the output gate ⊙ denotesthe element-wise product and 119882119891 119877119891 119882119904 119877119904 119882119888 and 119877119888are the GRU weight matrices while 119887119891 119887119904 and 119887119888 are the biasvectors

Since the context of a word can be determined byother preceding and successive words or sentences ourproposed method employs a bidirectional GRU over theuser embedding 119881119906 and the location 119881119901 Each word 119897rsquoshidden state is learned by forward and backward GRU layers

denoted as997888rarrℎ1119897 and larr997888ℎ1119897 respectively To learn a more concise

and combined representation of a word while taking intoaccount the context of all surrounding words we feed the

concatenation of997888rarrℎ1119897 andlarr997888ℎ1119897 to an additional bidirectionalGRU

layer such that its input for every word 119897 is 1198902119897 = [997888rarrℎ1119897 larr997888ℎ1119897 ]The second recurrent unit will output 119899 latent vectors eachis a sequentially infused representation of a word written bythe target user or about the candidate item To allow themethod of textual modeling to be the only variant betweenTCENR and TCENRseq and to further reduce the numberof learned parameters all modified word vectors will befed to the pooling and fully connected layers originallypresented in (6) and (7) respectively This will allow us todirectly determine the effect RNN has on textual modelingfor POI recommender systems compared to CNN as well asenabling the model to learn a more concise user and locationrepresentations As in TCENR the resulting vectors will bemerged in order to learn the user-location interaction

33 Training the Network To train the recommendationmodels we adopt a pointwise loss objective function as donein [2 3 14] where the difference between the prediction 119910119906119901and the actual value 119910119906119901 is minimized To address the implicitfeedback nature of LBSNs we sample a set of negative samplesfrom the dataset denoted as 119884

Due to the implicit feedback nature of the recommenda-tion task the algorithmrsquos output can be considered as a binaryclassification problem As the sigmoid activation function isbeing used over the last hidden layer the output probabilitycan be defined as

119901 (119884 119884 | 119864119906 119864119901 119881119906 119881119901 Θ119891)= prod(119906119901)isin119884

119910119906119901 prod1199061199011015840isin119884

(1 minus 1199101199061199011015840) (11)

where 119864119906 and 119864119901 are the embedding layers for users andlocations respectively Similarly 119881119906 and 119881119901 are the textualreviews embedding layers and Θ119891 represents the modelparameters Taking the negative log-likelihood of p results

in the binary cross-entropy loss function for the predictionportion of the model

119871119901119903119890119889 = minus sum(119906119901)isin119884cup119884

119910119906119901 log119910119906119901+ (1 minus 119910119906119901) log (1 minus 119910119906119901)

(12)

As there are two more outputs in the model the usersrsquosocial network 120595c119864119906 and the locationsrsquo distance graph 120595119888119864119901two additional loss functions are required to train the net-work We follow the process done in [2] assuming two userswho share the same context should have similar embeddingsThis is achieved by minimizing the log-loss of the contextgiven the instance embedding

119871119906 119888119900119899119905119890119909119905= minus sum(119906119906119888)

log(120595119888119864119906 minus log sum1199061015840119888isin119862119906

exp (1205951198881015840119864119906)) (13)

where 120595119888119864119906 is as defined in (1) Taking the binary classlabel into account prompts the following loss functioncorrespondingwithminimizing the cross-entropy loss of useri and context c with respect to the y class label

119871119906 119888119900119899119905119890119909119905 = minus119868 (119910 isin 119884) log 120590 (120595119888119864119906)minus 119868 (119910 isin 119884 ) log 120590 (minus120595119888119864119906) (14)

where I is a function that returns 1 if y is in the given setand 0 otherwise The same logic is used to formulate the lossfunction for the POI context and will not be provided due tospace limitations

We simultaneously minimize the three loss functions119871119901119903e119889 119871119906 119888119900119899119905119890119909119905 and 119871119901 119888119900119899119905119890119909119905 The joint optimizationimproves the recommendation accuracy while enforcingsimilar representations for locations in close proximity andusers connected in the social network The loss functionsare combined using two hyper-parameters to weight thecontextual contribution

119871 = 119871119901119903119890119889 + 1205821119871119906 119888119900119899119905119890119909119905 + 1205822119871119901 119888119900119899119905119890119909119905 (15)

To optimize the combined loss function a method ofgradient descent can be adopted and more specifically weutilize the Adaptive Moment Estimation (Adam) [38] Thisoptimizer automatically adjusts the learning rate and yieldsfaster convergence than the standard gradient descent inaddition to making the learning rate optimization processmore efficient In order to avoid additional overfitting whentraining the model an early stopping criteria is integratedThe model parameters are initialized with Gaussian distri-bution while the output layerrsquos parameters are set to followuniform distribution

4 Experiments and Evaluation

41 Experimental Setup To evaluate our proposed algorithmwe use Yelprsquos real-world dataset (httpswwwyelpcomdata-setchallenge) It includes a subset of textual reviews along

Complexity 7

with the usersrsquo friends and the businesses locations Due tothe limited resources used in the model evaluation we choseto filter the dataset and keep only a concise subset whereall users and locations with less than 100 written reviews orless than 10 friends are removedThe filtered dataset includes141028 reviews and 9808 sparsity for the rating matrixThesocial and geographical graphs were constructed by randomwalks 10 of the original vertices were sampled as basenodes while 20 and 30 vertices were connected to each basenode for users and locations respectively with a window sizeof 3 To build the POI graph two locations are considereddirectly connected if they are up to 1 km apart

To test our models' performance, the original data was split into training, validation, and test sets by random sampling, with the respective ratios of 56%-24%-20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
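A simple sketch of this 4:1 negative sampling, where `visited` (mapping each user to the set of POIs they reviewed) and `all_pois` (the list of candidate POI ids) are assumed structures built from the training split:

```python
import numpy as np

# Draw k POIs the user has not visited, for every positive instance.
def sample_negatives(user, visited, all_pois, k=4):
    rng = np.random.default_rng()
    negatives = set()
    while len(negatives) < k:
        p = rng.choice(all_pois)
        if p not in visited[user]:
            negatives.add(p)
    return list(negatives)
```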

To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.
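Under these settings, the MLP branch can be sketched in Keras as follows; `n_users` and `n_locations` are assumed dataset constants, and merging the embeddings by concatenation follows the style of [3]:

```python
from tensorflow.keras import layers, Input

# Sketch of the MLP branch: 10-dimensional user and location embeddings,
# merged and passed through the 32 -> 16 tower.
user_in = Input(shape=(1,))
loc_in = Input(shape=(1,))
user_emb = layers.Flatten()(layers.Embedding(n_users, 10)(user_in))
loc_emb = layers.Flatten()(layers.Embedding(n_locations, 10)(loc_in))
merged = layers.Concatenate()([user_emb, loc_emb])
h = layers.Dense(32, activation='relu')(merged)
h_context = layers.Dense(16, activation='relu')(h)
```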

In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3. It results in 3 feature maps that are flattened after performing the max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two hidden units, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used, over a maximum of 50 epochs and with a batch size of 512 samples.
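A sketch of this CNN component under the stated hyperparameters, where `vocab_size`, `max_words`, and the pretrained `emb_matrix` (e.g., GloVe [36]) are assumed to be prepared beforehand; freezing the embeddings is an illustrative assumption:

```python
from tensorflow.keras import layers, Input

# CNN textual component: 50-d pretrained word embeddings, 3 feature maps
# with window size 10 and stride 3, max-pooling with pool size 2,
# flattening, and a 32-unit hidden layer.
words_in = Input(shape=(max_words,))
emb = layers.Embedding(vocab_size, 50, weights=[emb_matrix],
                       trainable=False)(words_in)  # frozen: an assumption
conv = layers.Conv1D(filters=3, kernel_size=10, strides=3,
                     activation='relu')(emb)
pooled = layers.MaxPooling1D(pool_size=2)(conv)
h_text = layers.Dense(32, activation='relu')(layers.Flatten()(pooled))
```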

4.2. Baselines. To evaluate our algorithm, we compare it to the following seven empirically proven frameworks:

(i) HPF [39]: Hierarchical Poisson Factorization, a Bayesian framework for modeling implicit data using Poisson factorization.

(ii) NMF [40]: Nonnegative Matrix Factorization, a CF method that takes only the rating matrix as input.

(iii) Geo-SAGE [5]: a generative method that predicts user check-ins in LBSNs using geographical data and crowd behaviors.

(iv) LCARS [6]: Location Content Aware Recommender System, a probabilistic model that exploits local preferences in LBSNs and content information about POIs.

(v) NeuMF [3]: Neural Matrix Factorization, a state-of-the-art model combining MF with MLP on implicit ratings.

(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework with the addition of contextual graph smoothing for POI recommendation.

(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.

Table 1: Performance comparison over the Yelp dataset. The improvement of TCENR compared to each method is shown in brackets.

Model      Accuracy         MSE               Pre@10            Rec@10
HPF        0.8141 (1.69%)   0.1800 (34.94%)   0.5526 (18.51%)   0.3699 (40.98%)
NMF        0.8222 (0.69%)   0.1189 (1.51%)    0.7851 (-16.58%)  0.3517 (48.28%)
Geo-SAGE   0.7995 (3.55%)   0.1807 (35.19%)   0.2912 (124.89%)  0.4145 (25.81%)
LCARS      0.8142 (1.68%)   0.1612 (27.36%)   0.6408 (2.2%)     0.5127 (1.72%)
NeuMF      0.8273 (0.07%)   0.1421 (17.59%)   0.6488 (0.94%)    0.5586 (-6.64%)
PACE       0.8239 (0.49%)   0.1186 (1.26%)    0.6406 (2.23%)    0.5049 (3.29%)
DeepCoNN   0.8037 (3.01%)   0.1454 (19.46%)   0.5385 (21.62%)   0.3230 (64.46%)
TCENR      0.8279           0.1171            0.6549            0.5215
TCENRseq   0.8273 (0.07%)   0.1161 (-0.86%)   0.6655 (-1.59%)   0.4738 (10.07%)

For the task of evaluating our model and the baselines, we chose to apply Accuracy and Mean Square Error (MSE) over all $n$ test samples, as well as Precision (Pre@10) and Recall (Rec@10) for the average top 10 predictions per user.
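For concreteness, the ranking metrics can be computed roughly as follows, where `scores` (per-user predicted scores over candidate POIs) and `relevant` (per-user sets of held-out visited POIs) are assumed structures:

```python
import numpy as np

# Rough sketch of Pre@10 / Rec@10, averaged over users.
def pre_rec_at_k(scores, relevant, k=10):
    precisions, recalls = [], []
    for u, s in scores.items():
        top_k = set(np.argsort(-s)[:k].tolist())  # indices of top-k POIs
        hits = len(top_k & relevant[u])
        precisions.append(hits / k)
        recalls.append(hits / max(len(relevant[u]), 1))
    return float(np.mean(precisions)), float(np.mean(recalls))
```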

The proposed models were implemented using Keras (https://keras.io) on top of the TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.

4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.

As can be witnessed from the results, the proposed model, TCENR, achieves the best results overall compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN for $p < 0.05$, based on a one-sided unpaired t-test, in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer, but more relevant, recommendations to the user. While NMF provides the best precision score compared to all methods, it underperforms in all other measures, making it a less desirable model. Taking a closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall. This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on the dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

Figure 3: Runtime (seconds) of all models on the Yelp dataset.

Comparing TCENR and its proposed extension, TCENRseq, provides contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. It may be considered that, by accurately capturing different aspects from user reviews, the model is able to reinforce its hypotheses and therefore reduce the uncertainty in some cases; however, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures to learn different data types, rather than employing a single method over all inputs. Moreover, it shows the positive impact of using textual data in conjunction with historical activities. The reported performance further offers insight towards the selection of CNN or RNN for the task of language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters is increased due to the use of recurrent layers, our RNN-based extension takes 32.9% longer to train compared to TCENR, while achieving comparable results.

4.4. Model Design Analysis. In this section, we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The importance of the model's final layers, responsible for combining the dense outputs of both the MLP and convolutional networks, requires close attention, as it affects the networks' ability to jointly learn, as well as the prediction itself. To properly select the fusion operator, the following methods were considered:

(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted as TCENR_con and described in (9).

(ii) Merging the last hidden layers using the dot product, resulting in a model named TCENR_dot that can be defined as

$\hat{y}_{up} = h_{context} \cdot h_{reviews}$  (16)

(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model is denoted as TCENR_dot_con and can be developed by combining (9) and (16) using addition, translating the result to the range $[0, 1]$ with the sigmoid function:

$\hat{y}_{up} = \sigma\big(\sigma(W_4 \times [h_{context}, h_{reviews}] + b_4) + h_{context} \cdot h_{reviews}\big)$  (17)

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as

$\hat{y}_{up} = \lambda_1 \sigma(W_5 \times h_{context} + b_5) + \lambda_2 \sigma(W_6 \times h_{reviews} + b_6)$  (18)
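To make the four operators concrete, the following Keras sketch wires them up, assuming `h_context` and `h_reviews` are the final hidden tensors of the two subnetworks (built elsewhere); the equal weights in the last variant are purely illustrative:

```python
from tensorflow.keras import layers

# The four fusion operators compared in this section.
con = layers.Dense(1, activation='sigmoid')(
    layers.Concatenate()([h_context, h_reviews]))      # TCENR_con, Eq. (9)
dot = layers.Dot(axes=1)([h_context, h_reviews])       # TCENR_dot, Eq. (16)
dot_con = layers.Activation('sigmoid')(
    layers.Add()([con, dot]))                          # TCENR_dot_con, Eq. (17)
weight = layers.Average()([
    layers.Dense(1, activation='sigmoid')(h_context),
    layers.Dense(1, activation='sigmoid')(h_reviews)]) # TCENR_weight, Eq. (18)
```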

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of jointly utilizing the latent features learned by each subnetwork. When combined with the underperforming dot product in TCENR_dot_con, the use of concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in TCENR_con, produces the best results and is therefore integrated into the final model.

4.4.2. MLP Layer Design. Although it was found by [3] that adding more layers and units to the MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggests this is a subject worth investigating.


Figure 4: Comparison of merging methods in terms of accuracy (TCENR_dot_con: 0.812, TCENR_weight: 0.823, TCENR_dot: 0.802, TCENR_con: 0.828).

Table 2: The model's accuracy with different numbers of hidden layers (H, columns) and first-layer sizes (rows).

1st layer   H=1     H=2     H=3     H=4
16          0.824   0.827   -       -
32          0.823   0.837   0.825   -
64          0.822   0.829   0.830   0.827
128         0.823   0.828   0.829   0.827

To this end, we test the proposed algorithm with 1-4 hidden layers used to learn the user-item interaction with contextual regularization, in varying sizes from 8 to 128 hidden units. The results in terms of the test set's accuracy are presented in Table 2, where the number of hidden layers is given as columns and the size of the first layer as rows. Unlike previous results, we find that two hidden layers, with 32 and 16 hidden units, result in the best performance for our dataset.
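A small helper in the spirit of this grid search, assuming the tower convention stated in Section 4.1 (each layer half the width of its predecessor); `first_units` and `n_layers` correspond to the rows and columns of Table 2:

```python
from tensorflow.keras import layers

# Illustrative tower builder: stacks n_layers ReLU layers, halving the
# width each time (e.g., first_units=32, n_layers=2 gives 32 -> 16).
def tower(x, first_units, n_layers):
    units = first_units
    for _ in range(n_layers):
        x = layers.Dense(units, activation='relu')(x)
        units //= 2
    return x
```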

4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited, by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews, where fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500-6000. As can be witnessed from Figure 5, there is a slight improvement in accuracy as the number of words increases up to 3000, while additional words result in an increased bias towards users and locations with longer reviews, and in turn reduce the model's learning capabilities.

Figure 5: Number of words comparison in terms of accuracy (word counts up to 6000; observed accuracy between 0.822 and 0.834).
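A minimal sketch of this word-count cap using Keras preprocessing; `tokenized_docs` is an assumed list of word-index sequences, one per user or location:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Truncate (and pad) every document to max_words tokens; 3000 was the
# best-performing cap in our experiments.
max_words = 3000
docs = pad_sequences(tokenized_docs, maxlen=max_words,
                     padding='post', truncating='post')
```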

5. Conclusion and Future Work

In this paper, we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins, regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

For future work, we intend to extend our models' evaluation over additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data, while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975-984, ACM, San Francisco, California, USA, August 2016.

[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254, ACM, Halifax, NS, Canada, August 2017.

[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173-182, Perth, Australia, April 2017.

[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287-296, ACM, February 2011.

[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255-1264, ACM, Sydney, NSW, Australia, August 2015.

[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221-229, ACM, Chicago, Illinois, USA, August 2013.

[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448-456, ACM, August 2011.

[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244, ACM, Sydney, NSW, Australia, August 2015.

[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429-1438, ACM, Singapore, November 2017.

[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053-1058, IEEE, Barcelona, Spain, December 2016.

[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10, ACM, 2016.

[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.

[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643-2651, 2013.

[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425-434, ACM, Cambridge, UK, February 2017.

[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233-240, ACM, 2016.

[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.

[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147-154, ACM, Vienna, Austria, September 2015.

[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107-114, ACM, 2016.

[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.

[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference - MM '17, pp. 916-924, ACM Press, Mountain View, California, USA, October 2017.

[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198, ACM, 2016.

[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495-503, ACM, 2017.

[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46-54, ACM, Marina Del Rey, CA, USA, 2018.

[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551, 2017.

[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631-1640, ACM, Melbourne, Australia, October 2015.

[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.

[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.

[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844, Lyon, France, April 2018.

[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2782-2788, NY, USA, July 2016.

[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309-2318, London, UK, August 2018.

[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639-648, Lyon, France, April 2018.

[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1340-1346, Argentina, July 2015.

[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Association for Computational Linguistics, Doha, Qatar, 2014.

[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449-1458, ACM, Singapore, November 2017.

[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.

[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, Nevada, 2013.

[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.

[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp. 326-335, Netherlands, July 2015.

[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative-matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273-1284, 2014.



33 Training the Network To train the recommendationmodels we adopt a pointwise loss objective function as donein [2 3 14] where the difference between the prediction 119910119906119901and the actual value 119910119906119901 is minimized To address the implicitfeedback nature of LBSNs we sample a set of negative samplesfrom the dataset denoted as 119884

Due to the implicit feedback nature of the recommenda-tion task the algorithmrsquos output can be considered as a binaryclassification problem As the sigmoid activation function isbeing used over the last hidden layer the output probabilitycan be defined as

119901 (119884 119884 | 119864119906 119864119901 119881119906 119881119901 Θ119891)= prod(119906119901)isin119884

119910119906119901 prod1199061199011015840isin119884

(1 minus 1199101199061199011015840) (11)

where 119864119906 and 119864119901 are the embedding layers for users andlocations respectively Similarly 119881119906 and 119881119901 are the textualreviews embedding layers and Θ119891 represents the modelparameters Taking the negative log-likelihood of p results

in the binary cross-entropy loss function for the predictionportion of the model

119871119901119903119890119889 = minus sum(119906119901)isin119884cup119884

119910119906119901 log119910119906119901+ (1 minus 119910119906119901) log (1 minus 119910119906119901)

(12)

As there are two more outputs in the model the usersrsquosocial network 120595c119864119906 and the locationsrsquo distance graph 120595119888119864119901two additional loss functions are required to train the net-work We follow the process done in [2] assuming two userswho share the same context should have similar embeddingsThis is achieved by minimizing the log-loss of the contextgiven the instance embedding

119871119906 119888119900119899119905119890119909119905= minus sum(119906119906119888)

log(120595119888119864119906 minus log sum1199061015840119888isin119862119906

exp (1205951198881015840119864119906)) (13)

where 120595119888119864119906 is as defined in (1) Taking the binary classlabel into account prompts the following loss functioncorrespondingwithminimizing the cross-entropy loss of useri and context c with respect to the y class label

119871119906 119888119900119899119905119890119909119905 = minus119868 (119910 isin 119884) log 120590 (120595119888119864119906)minus 119868 (119910 isin 119884 ) log 120590 (minus120595119888119864119906) (14)

where I is a function that returns 1 if y is in the given setand 0 otherwise The same logic is used to formulate the lossfunction for the POI context and will not be provided due tospace limitations

We simultaneously minimize the three loss functions119871119901119903e119889 119871119906 119888119900119899119905119890119909119905 and 119871119901 119888119900119899119905119890119909119905 The joint optimizationimproves the recommendation accuracy while enforcingsimilar representations for locations in close proximity andusers connected in the social network The loss functionsare combined using two hyper-parameters to weight thecontextual contribution

119871 = 119871119901119903119890119889 + 1205821119871119906 119888119900119899119905119890119909119905 + 1205822119871119901 119888119900119899119905119890119909119905 (15)

To optimize the combined loss function a method ofgradient descent can be adopted and more specifically weutilize the Adaptive Moment Estimation (Adam) [38] Thisoptimizer automatically adjusts the learning rate and yieldsfaster convergence than the standard gradient descent inaddition to making the learning rate optimization processmore efficient In order to avoid additional overfitting whentraining the model an early stopping criteria is integratedThe model parameters are initialized with Gaussian distri-bution while the output layerrsquos parameters are set to followuniform distribution

4 Experiments and Evaluation

41 Experimental Setup To evaluate our proposed algorithmwe use Yelprsquos real-world dataset (httpswwwyelpcomdata-setchallenge) It includes a subset of textual reviews along

Complexity 7

with the usersrsquo friends and the businesses locations Due tothe limited resources used in the model evaluation we choseto filter the dataset and keep only a concise subset whereall users and locations with less than 100 written reviews orless than 10 friends are removedThe filtered dataset includes141028 reviews and 9808 sparsity for the rating matrixThesocial and geographical graphs were constructed by randomwalks 10 of the original vertices were sampled as basenodes while 20 and 30 vertices were connected to each basenode for users and locations respectively with a window sizeof 3 To build the POI graph two locations are considereddirectly connected if they are up to 1 km apart

To test our modelsrsquo performance the original data wassplit to training-validation-test sets by random samplingwith the respective ratios of 56-24-20 resulting in78899 training instances In addition the input data was neg-atively sampled with 4 negative locations for every positiveone

To effectively compare our proposal with other alterna-tives we adopt the same settings as applied in [2 3] TheMLP input vectors are represented with an embedding size of10 while two hidden layers are added on top of the mergedresult Following the tower architecture where the size ofeach layer is half the size of its predecessor the numbers ofhidden units are 32 and 16 for the first and second layersrespectively

In the CNN component each word is represented by apretrained embedding layer with 50 units while the convo-lutional layer is constructed with a window size of 10 anda stride of 3 It results in 3 feature maps that are flattenedafter performing the max-pooling operation with a pool sizeof 2 The results are further modeled by a hidden layer with32 units Following the merge of the two hidden units theirinteraction is learned using another hidden layer with 8 unitsTo combine the three loss functions as described in (15) wefollow the results of [2] and set the hyperparameters 1205821 =1205822 = 01 For the training phase of the model a learning rateof 0005 was used over 50 maximum epochs and a batch sizeof 512 samples

42 Baselines To evaluate our algorithm we chose to com-pare it to these seven empirically proven frameworks

(i) HPF [39] Hierarchical Poisson matrix FactorizationA Bayesian framework for modeling implicit datausing Poisson Factorization

(ii) NMF [40] Nonnegative Matrix Factorization a CFmethod that takes only the rating matrix as input

(iii) Geo-SAGE [5] A generative method that predictsuser check-ins in LBSNs using geographical data andcrowd behaviors

(iv) LCARS [6] Location Content Aware RecommenderSystem A probabilistic model that exploits localpreferences in LBSN and content information aboutPOIs

(v) NeuMF [3] Neural Matrix Factorization A state-of-the-art model combining MF with MLP on implicitratings

Table 1 Performance comparison over the Yelp dataset Improve-ment of TCENR compared to each method is shown in brackets

Model Accuracy MSE Pre10 Rec10HPF 08141 01800 05526 03699

(169) (3494) (1851) (4098)NMF 08222 01189 07851 03517

(069) (151) (-1658) (4828)Geo-SAGE 07995 01807 02912 04145

(355) (3519) (12489) (2581)LCARS 08142 01612 06408 05127

(168) (2736) (22) (172)NeuMF 08273 01421 06488 05586

(007) (1759) (094) (-664)Pace 08239 01186 06406 05049

(049) (126) (223) (329)DeepCoNN 08037 01454 05385 0323

(301) (1946) (2162) (6446)TCENR 08279 01171 06549 05215TCENRseq 08273 01161 06655 04738

(007) (-086) (-159) (1007)

(vi) PACE [2] Preference and Context Embedding AMLP-based framework with the addition of contex-tual graphsrsquo smoothing for POI recommendation

(vii) DeepCoNN [14] Deep Cooperative Neural Net-works A CNN-based method that jointly learns anexplicit prediction by exploiting usersrsquo and locationsrsquonatural language reviews

For the task of evaluating ourmodel and the baselines wechose to apply Accuracy and Mean Square Error (MSE) overall n test samples as well as Precision (Pre10) and Recall(Rec10) for the average top 10 predictions per user

The proposed models were implemented using Keras(httpskerasio) on top of TensorFlow (httpswwwten-sorfloworg) backend All experiments were conducted usingNivida GTX 1070 GPU

43 Performance Evaluation The performance of our pro-posed algorithms and the seven baselines is reported inTable 1 along with the improvement ratio of TCENR overeach method in brackets The presented results are based onthe average of three individual executions

As can be witnessed from the results the proposedmodel TCENR achieves the best results overall comparedto all baselines Furthermore it was found to significantlyoutperformHPFNMF Geo-SAGE LCARS Pace and Deep-CoNN for 119901 lt 005 based on a one-sided unpaired t-testin terms of accuracy and MSE The contrasting results interms of precision and recall compared to NeuMF suggestthat TCENR offers lessbut more relevant recommendationsto the user While NMF provides the best precision scorecompared to all methods it underperforms in all othermeasures making it a less desirable model Taking a closerlook shows that surprisingly NeuMF outperforms PACE inaccuracy precision and recall This may be due to the less

8 Complexity

0

10000

20000

30000

40000

50000

60000

Runt

ime (

Seco

nds)

HPF NMF NeuMF DeepCoNN TCENR TCENRseqPACELCARSGeo-SAGE

Figure 3 Runtime (seconds) of all models on the Yelp dataset

sparse dataset tested which does not allow the contextualregularization to be fully harvested In addition the use ofonly the first 500words to represent the textual input for eachuser and location may explain the relatively low scores of theDeepCoNN model on the dataset while the performance ofGeo-SAGE and LCARS demonstrates that relying solely ongeographical data does not allow suchmodels to fully captureusersrsquo preferences in LBSNs

Comparing TCENR and its proposed extensionTCENRseq provides contrasting results By employingRNN instead of CNN to extract user and location featuresfrom textual reviews TCENRseq achieves lower error rateand improved precision score while accuracy and recallare worsened It may be considered that by accuratelycapturing different aspects from user reviews the modelis able to reinforce its hypotheses and therefore reducethe uncertainty in some cases However when faced witha contrast between textual aspects and the ground truthit might choose the wrong class label Nonetheless theresults demonstrate the importance of adopting the mostsuitable techniques and measures to learn different datatypes rather than employing a single method over all inputsMoreover it shows the positive impact of using textualdata in conjunction to historical activities The reportedperformance further suggests additional insight towards theselection of CNN and RNN for the task of language modelingin future recommendation tasks

To further evaluate our suggested frameworks and theseven baselines in terms of runtime the average time requiredto fully train each method is presented in Figure 3 Asdemonstrated by the results TCENR is competitivewithmostbaselines and found to be more efficient than DeepCoNNand LCARS The reported runtime of TCENRseq furtherdemonstrates the relative efficiency of CNN-based solutionsfor textual modeling tasks As the number of trainableparameters is increased due to the use of recurrent layers ourRNN-based extension takes 329 longer to train comparedto TCENR while achieving comparative results

44 Model Design Analysis In this section we discuss theeffect of several design selections over the suggested modelrsquosperformance

441Merge Layer The importance of themodelrsquos final layersresponsible for combining the dense output of both the MLPand convolutional networks requires a close attention as it

effects the networksrsquo ability to jointly learn and the predictionitself To properly select the fusion operator the followingmethods had been considered

(i) Combining the last hidden layers of the two modelsusing concatenation A model using this method willbe denoted as 119879119862119864119873119877119888119900119899 and described in (9)

(ii) Merging the last hidden layers using dot productresulting in a model named 119879119862119864119873119877119889119900119905 that can bedefined as

119910119906119901 = ℎ119888119900119899119905119890119909119905 sdot ℎ119903119890V119894119890119908119904 (16)

(iii) Combining the two previously described methodswhere the two representations will be jointly learnedby concatenation and dot productThe resultedmodelwill be denoted as 119879119862119864119873119877119889119900119905 119888119900119899 and can be devel-oped by combining (9) and (16) using addition andtranslating the result to a range of [0 1] with thesigmoid function

119910119906119901 = 120590 (120590 (1198824 times [ℎ119888119900119899119905119890119909119905 ℎ119903119890V119894119890119908119904] + 1198874) + ℎ119888119900119899119905119890119909119905sdot ℎ119903119890V119894119890119908119904) (17)

(iv) Adopting a weighted average for the prediction resultof the two networks Denoted as 119879119862119864119873119877119908119890119894119892ℎ119905 thismodel can be defined as

119910119906119901 = 1205821120590 (1198825 times ℎ119888119900119899119905119890119909119905 + 1198875)+ 1205822120590 (1198826 times ℎ119903119890V119894119890119908119904 + 1198876) (18)

As shown in Figure 4 adopting the more simple methodsof weighted average and dot product leads to an inferiorperformance of TCENR demonstrating the added valueof utilizing the latent features learned by each subnetworkjointly When combined with the underperforming methodof dot product in 119879119862119864119873119877119889119900119905 119888119900119899 the use of concatenationimproves over dot product alone However since the twomethods are integrated using a simple average employingonly concatenation as done in 119879119862119864119873119877119888119900119899 produces the bestresults and therefore integrated into the final model

442 MLP Layer Design Although it was found by [3] thatadding more layers and units to the MLP-based recom-mender has a positive effect the use of CNN and the addi-tional hidden layer suggests it is a subject worth investigating

Complexity 9

0812

0823

0802

0828

TCENRdot_con

TCENRweight

TCENRdot

TCENR con

Figure 4 Comparison of merging methods in terms of accuracy

Table 2 Modelrsquos accuracy with different layers

1st layer H=1 H=2 H=3 H=416 0824 0827 - -32 0823 0837 0825 -64 0822 0829 083 0827128 0823 0828 0829 0827

To this end we test the proposed algorithm with 1-4 hiddenlayers used to learn the user-item interaction with contextualregularization in varying sizes from 8 to 128 hidden unitsThe results in terms of test setrsquos accuracy are presented inTable 2 where the number of hidden layers is defined ascolumns and the size of the first unit is presented as rowsUnlike previous results we find that two hidden layers with32 and 16 hidden units result in the best performance for ourdataset

443 Number of Words The use of written reviews in theiroriginal order allows the strengths of CNN and RNN to beexploited by finding the best representation for every fewwords and eventually for the whole text Our final datasethowever is composed of very long reviews where to fullylearn a single user or location more than 20000 words arerequired making it computationally expensive to extract rel-evant representations To benefit from the sequential natureof the written reviews while keeping the solution feasible thenumber of words was limited to a range of 500-6000 As canbe witnessed from Figure 5 there is a slight improvement inaccuracy as the number of words increase up to 3000 whileadditional words result in an increased bias towards users andlocations with longer reviews and in turn reduce the modelrsquoslearning capabilities

5 Conclusion and Future Work

In this paper we developed a neural POI recommendersystem called TCENR The model exploits data about userslocations spatial data social networks and textual reviewsto predict the implicit preference of users regarding POIsTCENR models two types of user-location interactionsnative check-ins regularized by contextual information andthe words used to describe the usersrsquo experiences We furtherextended our proposed method and presented TCENRseqwhere textual data was modeled using RNN instead of CNNEvaluated over the Yelp dataset the proposed algorithms

0822

0825

0828

0831

0834

0 2000 4000 6000

Figure 5 Number of words comparison in terms of accuracy

consistently achieved superior results compared to sevenstate-of-the-art baselines in terms of accuracy and MSE

For future work we intend to extend our modelsrsquoevaluation over additional LBSN datasets In addition weplan to investigate the proposed frameworksrsquo contributionto the cold-start problem by analyzing its performance onadditional data while taking newusers and locationswith fewreviews into account

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China Grant (61572289) and NSERCDiscovery Grants

References

[1] H Li Y Ge R Hong and H Zhu ldquoPoint-of-interest rec-ommendations learning potential check-ins from friendsrdquo inProceedings of the 22nd ACM SIGKDD International Conferenceon Knowledge Discovery And Data Mining pp 975ndash984 ACMSan Francisco California USA August 2016

[2] C Yang L Bai C Zhang Q Yuan and J Han ldquoBridgingcollaborative filtering and semi-supervised learning a neuralapproach for poi recommendationrdquo in Proceedings of the 23rdACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 1245ndash1254 ACMHalifaxNS CanadaAugust 2017

[3] X He L Liao H Zhang L Nie X Hu and T-S Chua ldquoNeuralcollaborative filteringrdquo in Proceedings of the 26th InternationalConference onWorldWideWeb InternationalWorldWideWebConferences Steering Committee pp 173ndash182 Perth AustraliaApril 2017

[4] H Ma D Zhou C Liu M R Lyu and I King ldquoRecommendersystems with social regularizationrdquo in Proceedings of the 4thACM International Conference onWeb Search and DataMiningpp 287ndash296 ACM February 2011

10 Complexity

[5] W Wang H Yin L Chen Y Sun S Sadiq and X ZhouldquoGeo-sage a geographical sparse additive generative model forspatial item recommendationrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1255ndash1264 ACM Sydney NSW AustraliaAugust 2015

[6] H Yin Y Sun B Cui Z Hu and L Chen ldquoLcars a location-content-aware recommender systemrdquo in Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 221ndash229 ACMChicago Illinois USAAugust 2013

[7] C Wang and D M Blei ldquoCollaborative topic modeling for rec-ommending scientific articlesrdquo in Proceedings of the 17th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo11) pp 448ndash456 ACM August 2011

[8] HWang NWang andD-Y Yeung ldquoCollaborative deep learn-ing for recommender systemsrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1235ndash1244 ACM Sydney NSW AustraliaAugust 2015

[9] J Manotumruksa C Macdonald and I Ounis ldquoA deep recur-rent collaborative filtering framework for venue recommen-dationrdquo in Proceedings of the 2017 ACM on Conference onInformation and KnowledgeManagement pp 1429ndash1438 ACMSingapore Singapore November 2017

[10] Q Liu S Wu D Wang Z Li and L Wang ldquoContext-awaresequential recommendationrdquo in Proceedings of the 2016 IEEE16th International Conference on Data Mining (ICDM) pp1053ndash1058 IEEE Barcelona Spain December 2016

[11] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo inProceedings of the 1stWorkshop onDeep Learning for Recommender Systems pp 7ndash10 ACM 2016

[12] Y Yu L Zhang C Wang R Gao W Zhao and J JiangldquoNeural personalized ranking via poisson factormodel for itemrecommendationrdquoComplexity vol 2019 Article ID 3563674 16pages 2019

[13] A Van Den Oord S Dieleman and B Schrauwen ldquoDeepcontent-based music recommendationrdquo in Proceedings of the26th International Conference on Neural Information ProcessingSystems Advances in neural information processing systemspp 2643ndash2651 2013

[14] L Zheng V Noroozi and P S Yu ldquoJoint deepmodeling of usersand items using reviews for recommendationrdquo in Proceedingsof the Tenth ACM International Conference on Web Search andData Mining pp 425ndash434 ACM Cambridge UK Feburary2017

[15] D Kim C Park J Oh S Lee andH Yu ldquoConvolutional matrixfactorization for document context-aware recommendationrdquoin Proceedings of the 10th ACM Conference on RecommenderSystems pp 233ndash240 ACM 2016

[16] B Hidasi A Karatzoglou L Baltrunas and D Tikk ldquoSession-based recommendations with recurrent neural networksrdquo 2015httpsarxivorgabs151106939

[17] A Almahairi K Kastner K Cho and A Courville ldquoLearningdistributed representations from reviews for collaborative filter-ingrdquo in Proceedings of the 9th ACMConference on RecommenderSystems pp 147ndash154 ACM Vienna Austria September 2015

[18] T Bansal D Belanger and A McCallum ldquoAsk the gru multi-task learning for deep text recommendationsrdquo in Proceedings ofthe 10th ACMConference on Recommender Systems pp 107ndash114ACM 2016

[19] J Chen W Zhang P Zhang P Ying K Niu and M ZouldquoExploiting spatial and temporal for point of interest recom-mendationrdquoComplexity vol 2018 Article ID 6928605 16 pages2018

[20] P Zhao X Xu Y Liu V S Sheng K Zheng and H XiongldquoPhoto2trip exploiting visual contents in geo-tagged photos forpersonalized tour recommendationrdquo in Proceedings of the 2017ACM on Multimedia Conference - MM 17 pp 916ndash924 ACMPress Mountain View California USA October 2017

[21] P Covington J Adams and E Sargin ldquoDeep neural networksfor youtube recommendationsrdquo in Proceedings of the 10th ACMConference on Recommender Systems pp 191ndash198 ACM 2016

[22] C-Y Wu A Ahmed A Beutel A J Smola and H JingldquoRecurrent recommender networksrdquo in Proceedings of the TenthACM International Conference onWeb Search and DataMiningpp 495ndash503 ACM 2017

[23] A Beutel P Covington S Jain et al ldquoLatent cross making useof context in recurrent recommender systemsrdquo inProceedings ofthe Eleventh ACM International Conference on Web Search andDataMining pp 46ndash54 ACMMarinaDel Rey CA USA 2018

[24] H Yin W Wang H Wang L Chen and X Zhou ldquoSpatial-aware hierarchical collaborative deep learning for POI rec-ommendationrdquo IEEE Transactions on Knowledge and DataEngineering vol 29 no 11 pp 2537ndash2551 2017


$$ c_l = \tanh(W_c V_{u_l} + f_l \odot R_c h_{l-1} + b_c), \qquad h_l = (1 - s_l) \odot h_{l-1} + s_l \odot c_l $$  (10)

where $f_l$ is the forget gate for input word $l$, $s_l$ is the output gate, $c_l$ is the new candidate state combining the current word embedding $V_{u_l}$ with the previous hidden state, and $h_l$ is the current state for word $l$, modeled by the output gate. Here $\odot$ denotes the element-wise product; $W_f$, $R_f$, $W_s$, $R_s$, $W_c$, and $R_c$ are the GRU weight matrices, while $b_f$, $b_s$, and $b_c$ are the bias vectors.

Since the context of a word can be determined by other preceding and successive words or sentences, our proposed method employs a bidirectional GRU over the user embedding $V_u$ and the location embedding $V_p$. Each word $l$'s hidden state is learned by forward and backward GRU layers, denoted as $\overrightarrow{h^1_l}$ and $\overleftarrow{h^1_l}$, respectively. To learn a more concise and combined representation of a word while taking into account the context of all surrounding words, we feed the concatenation of $\overrightarrow{h^1_l}$ and $\overleftarrow{h^1_l}$ to an additional bidirectional GRU layer, such that its input for every word $l$ is $e^2_l = [\overrightarrow{h^1_l}, \overleftarrow{h^1_l}]$. The second recurrent unit outputs $n$ latent vectors, each a sequentially infused representation of a word written by the target user or about the candidate item. To keep the method of textual modeling the only variant between TCENR and TCENRseq, and to further reduce the number of learned parameters, all modified word vectors are fed to the pooling and fully connected layers originally presented in (6) and (7), respectively. This allows us to directly determine the effect of RNN on textual modeling for POI recommender systems compared to CNN, while also enabling the model to learn more concise user and location representations. As in TCENR, the resulting vectors are merged in order to learn the user-location interaction.
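A minimal Keras sketch of this stacked bidirectional encoder is given below; the vocabulary size, GRU width, and the ReLU activation of the dense head are illustrative assumptions, while the two stacked bidirectional GRU layers and the shared pooling/dense head follow the description above.

    import tensorflow as tf
    from tensorflow.keras import layers

    def bigru_text_encoder(n_words=3000, vocab_size=50000, gru_units=32):
        # Word indices of one user's (or one location's) concatenated reviews.
        tokens = layers.Input(shape=(n_words,), dtype="int32")
        embedded = layers.Embedding(vocab_size, 50)(tokens)  # V_u / V_p
        # First bidirectional layer: forward and backward hidden states per
        # word, concatenated into e_l^2 = [h1_fwd_l ; h1_bwd_l].
        h1 = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(embedded)
        # Second bidirectional layer consumes e_l^2 and emits n latent vectors.
        h2 = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(h1)
        # The same pooling and dense head as the CNN variant (Eqs. (6)-(7)).
        pooled = layers.MaxPooling1D(pool_size=2)(h2)
        flat = layers.Flatten()(pooled)
        return tf.keras.Model(tokens, layers.Dense(32, activation="relu")(flat))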

3.3. Training the Network. To train the recommendation models, we adopt a pointwise objective function, as done in [2, 3, 14], where the difference between the prediction $\hat{y}_{up}$ and the actual value $y_{up}$ is minimized. To address the implicit feedback nature of LBSNs, we sample a set of negative instances from the dataset, denoted as $Y^-$.

Due to the implicit feedback nature of the recommendation task, the algorithm's output can be treated as a binary classification problem. As the sigmoid activation function is applied over the last hidden layer, the output probability can be defined as

$$ p(Y, Y^- \mid E_u, E_p, V_u, V_p, \Theta_f) = \prod_{(u,p) \in Y} \hat{y}_{up} \prod_{(u,p') \in Y^-} (1 - \hat{y}_{up'}) $$  (11)

where $E_u$ and $E_p$ are the embedding layers for users and locations, respectively; similarly, $V_u$ and $V_p$ are the textual review embedding layers, and $\Theta_f$ represents the model parameters. Taking the negative log-likelihood of $p$ results in the binary cross-entropy loss function for the prediction portion of the model:

$$ L_{pred} = -\sum_{(u,p) \in Y \cup Y^-} \left[ y_{up} \log \hat{y}_{up} + (1 - y_{up}) \log(1 - \hat{y}_{up}) \right] $$  (12)

As there are two more outputs in the model, the users' social network output $\psi_c(E_u)$ and the locations' distance graph output $\psi_c(E_p)$, two additional loss functions are required to train the network. We follow the process of [2], assuming that two users who share the same context should have similar embeddings. This is achieved by minimizing the log-loss of the context given the instance embedding:

$$ L_{u\_context} = -\sum_{(u,u_c)} \left( \psi_c(E_u) - \log \sum_{u'_c \in C_u} \exp(\psi_{c'}(E_u)) \right) $$  (13)

where $\psi_c(E_u)$ is as defined in (1). Taking the binary class label into account prompts the following loss function, corresponding to minimizing the cross-entropy loss of user $u$ and context $c$ with respect to the class label $y$:

$$ L_{u\_context} = -I(y \in Y) \log \sigma(\psi_c(E_u)) - I(y \in Y^-) \log \sigma(-\psi_c(E_u)) $$  (14)

where $I$ is an indicator function that returns 1 if $y$ is in the given set and 0 otherwise. The same logic is used to formulate the loss function for the POI context, which is omitted due to space limitations.

We simultaneously minimize the three loss functions $L_{pred}$, $L_{u\_context}$, and $L_{p\_context}$. The joint optimization improves the recommendation accuracy while enforcing similar representations for locations in close proximity and for users connected in the social network. The loss functions are combined using two hyperparameters that weight the contextual contribution:

$$ L = L_{pred} + \lambda_1 L_{u\_context} + \lambda_2 L_{p\_context} $$  (15)

To optimize the combined loss function, a gradient descent method can be adopted; more specifically, we utilize Adaptive Moment Estimation (Adam) [38]. This optimizer automatically adjusts the learning rate and yields faster convergence than standard gradient descent, in addition to making the learning rate optimization process more efficient. To avoid additional overfitting when training the model, an early stopping criterion is integrated. The model parameters are initialized with a Gaussian distribution, while the output layer's parameters are set to follow a uniform distribution.
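A minimal Keras sketch of this training procedure follows, assuming a model exposing the three outputs described above; the target names and the early stopping patience are illustrative choices, while the optimizer, loss weights, learning rate, epoch limit, and batch size follow the settings reported in Section 4.1.

    import tensorflow as tf

    def compile_and_fit(model, train_x, train_targets, val_data):
        # `model` is assumed to expose three sigmoid outputs: the check-in
        # prediction and the two context outputs psi_c(E_u) and psi_c(E_p).
        # Per-output binary cross-entropy realizes Eqs. (12) and (14); the
        # loss weights realize Eq. (15) with lambda_1 = lambda_2 = 0.1.
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
                      loss=["binary_crossentropy"] * 3,
                      loss_weights=[1.0, 0.1, 0.1])
        # Early stopping guards against overfitting; the patience value is
        # an illustrative choice, not a reported setting.
        stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                                restore_best_weights=True)
        return model.fit(train_x, train_targets, validation_data=val_data,
                         epochs=50, batch_size=512, callbacks=[stop])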

4. Experiments and Evaluation

4.1. Experimental Setup. To evaluate our proposed algorithm, we use Yelp's real-world dataset (https://www.yelp.com/dataset/challenge). It includes a subset of textual reviews, along with the users' friends and the businesses' locations. Due to the limited resources used in the model evaluation, we chose to filter the dataset and keep only a concise subset, where all users and locations with fewer than 100 written reviews or fewer than 10 friends are removed. The filtered dataset includes 141,028 reviews and a rating matrix with 98.08% sparsity. The social and geographical graphs were constructed by random walks: 10% of the original vertices were sampled as base nodes, while 20 and 30 vertices were connected to each base node for users and locations, respectively, with a window size of 3. To build the POI graph, two locations are considered directly connected if they are at most 1 km apart.
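For illustration, a minimal sketch of the 1 km connectivity test is given below; the paper specifies only the distance threshold, so the haversine great-circle formula is an assumed implementation.

    import math

    def connected_in_poi_graph(lat1, lon1, lat2, lon2, max_km=1.0):
        # Haversine great-circle distance between two POIs; they become
        # directly connected in the geographical graph when <= 1 km apart.
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a)) <= max_km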

To test our models' performance, the original data was split into training, validation, and test sets by random sampling, with respective ratios of 56%, 24%, and 20%, resulting in 78,899 training instances. In addition, the input data was negatively sampled, with 4 negative locations for every positive one.
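A minimal sketch of this negative sampling step follows; the function and variable names are illustrative, and the exact sampling procedure beyond the 4:1 ratio is not specified (we assume enough unvisited POIs exist for each user).

    import random

    def sample_negatives(visited, all_pois, ratio=4):
        # Draw `ratio` unvisited POIs for each positive (user, POI) pair,
        # matching the 4:1 negative-to-positive sampling described above.
        negatives = set()
        while len(negatives) < ratio:
            candidate = random.choice(all_pois)
            if candidate not in visited:
                negatives.add(candidate)
        return list(negatives)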

To effectively compare our proposal with other alternatives, we adopt the same settings as applied in [2, 3]. The MLP input vectors are represented with an embedding size of 10, while two hidden layers are added on top of the merged result. Following the tower architecture, where the size of each layer is half the size of its predecessor, the numbers of hidden units are 32 and 16 for the first and second layers, respectively.
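A minimal Keras sketch of this MLP branch is shown below; the concatenation merge and the ReLU activations are assumptions, while the embedding and layer sizes follow the settings above.

    from tensorflow.keras import layers, Model

    def mlp_branch(n_users, n_pois, embed_dim=10):
        # 10-d user and POI embeddings, merged and passed through the
        # 32 -> 16 tower described in the text.
        user_in = layers.Input(shape=(1,), dtype="int32")
        poi_in = layers.Input(shape=(1,), dtype="int32")
        e_u = layers.Flatten()(layers.Embedding(n_users, embed_dim)(user_in))
        e_p = layers.Flatten()(layers.Embedding(n_pois, embed_dim)(poi_in))
        h = layers.Concatenate()([e_u, e_p])
        h = layers.Dense(32, activation="relu")(h)
        h = layers.Dense(16, activation="relu")(h)
        return Model([user_in, poi_in], h)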

In the CNN component, each word is represented by a pretrained embedding layer with 50 units, while the convolutional layer is constructed with a window size of 10 and a stride of 3, resulting in 3 feature maps that are flattened after a max-pooling operation with a pool size of 2. The results are further modeled by a hidden layer with 32 units. Following the merge of the two subnetworks' hidden layers, their interaction is learned using another hidden layer with 8 units. To combine the three loss functions as described in (15), we follow the results of [2] and set the hyperparameters $\lambda_1 = \lambda_2 = 0.1$. For the training phase of the model, a learning rate of 0.005 was used, over at most 50 epochs and with a batch size of 512 samples.
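A minimal Keras sketch of this CNN branch follows, reading "3 feature maps" as three convolutional filters; the ReLU activations and the use of a constant initializer for the pretrained vectors are assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers

    def cnn_text_branch(n_words, embedding_matrix):
        # `embedding_matrix` holds pretrained 50-d word vectors (e.g., GloVe [36]).
        tokens = layers.Input(shape=(n_words,), dtype="int32")
        emb = layers.Embedding(
            embedding_matrix.shape[0], 50,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)
        )(tokens)
        # Window size 10 and stride 3; three filters give the three feature maps.
        conv = layers.Conv1D(filters=3, kernel_size=10, strides=3,
                             activation="relu")(emb)
        pooled = layers.MaxPooling1D(pool_size=2)(conv)
        flat = layers.Flatten()(pooled)
        return tf.keras.Model(tokens, layers.Dense(32, activation="relu")(flat))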

4.2. Baselines. To evaluate our algorithm, we compare it to the following seven empirically proven frameworks:

(i) HPF [39]: Hierarchical Poisson Factorization, a Bayesian framework for modeling implicit data using Poisson factorization.

(ii) NMF [40]: Nonnegative Matrix Factorization, a collaborative filtering method that takes only the rating matrix as input.

(iii) Geo-SAGE [5]: a generative method that predicts user check-ins in LBSNs using geographical data and crowd behaviors.

(iv) LCARS [6]: Location-Content-Aware Recommender System, a probabilistic model that exploits local preferences in LBSNs and content information about POIs.

(v) NeuMF [3]: Neural Matrix Factorization, a state-of-the-art model combining MF with an MLP on implicit ratings.

(vi) PACE [2]: Preference and Context Embedding, an MLP-based framework with the addition of contextual graph smoothing for POI recommendation.

(vii) DeepCoNN [14]: Deep Cooperative Neural Networks, a CNN-based method that jointly learns an explicit prediction by exploiting users' and locations' natural language reviews.

Table 1: Performance comparison over the Yelp dataset. The improvement of TCENR over each method is shown in brackets.

Model      Accuracy         MSE               Pre@10            Rec@10
HPF        0.8141 (1.69%)   0.1800 (34.94%)   0.5526 (18.51%)   0.3699 (40.98%)
NMF        0.8222 (0.69%)   0.1189 (1.51%)    0.7851 (-16.58%)  0.3517 (48.28%)
Geo-SAGE   0.7995 (3.55%)   0.1807 (35.19%)   0.2912 (124.89%)  0.4145 (25.81%)
LCARS      0.8142 (1.68%)   0.1612 (27.36%)   0.6408 (2.20%)    0.5127 (1.72%)
NeuMF      0.8273 (0.07%)   0.1421 (17.59%)   0.6488 (0.94%)    0.5586 (-6.64%)
PACE       0.8239 (0.49%)   0.1186 (1.26%)    0.6406 (2.23%)    0.5049 (3.29%)
DeepCoNN   0.8037 (3.01%)   0.1454 (19.46%)   0.5385 (21.62%)   0.3230 (64.46%)
TCENR      0.8279           0.1171            0.6549            0.5215
TCENRseq   0.8273 (0.07%)   0.1161 (-0.86%)   0.6655 (-1.59%)   0.4738 (10.07%)

For the task of evaluating our model and the baselines, we apply accuracy and mean squared error (MSE) over all $n$ test samples, as well as precision (Pre@10) and recall (Rec@10) for the average top 10 predictions per user.
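The exact ranking protocol is not spelled out here, but a standard per-user computation of Pre@10 and Rec@10, averaged over users, would look as follows; the argument names are illustrative.

    def precision_recall_at_k(ranked, relevant, k=10):
        # Pre@10 / Rec@10 for one user: `ranked` is the model's score-ordered
        # candidate list, `relevant` the user's held-out positive POIs.
        hits = len(set(ranked[:k]) & set(relevant))
        precision = hits / float(k)
        recall = hits / float(len(relevant)) if relevant else 0.0
        return precision, recall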

The proposed models were implemented using Keras (https://keras.io) on top of a TensorFlow (https://www.tensorflow.org) backend. All experiments were conducted using an Nvidia GTX 1070 GPU.

4.3. Performance Evaluation. The performance of our proposed algorithms and the seven baselines is reported in Table 1, along with the improvement ratio of TCENR over each method in brackets. The presented results are based on the average of three individual executions.

As can be seen from the results, the proposed model TCENR achieves the best overall results compared to all baselines. Furthermore, it was found to significantly outperform HPF, NMF, Geo-SAGE, LCARS, PACE, and DeepCoNN for $p < 0.05$, based on a one-sided unpaired t-test, in terms of accuracy and MSE. The contrasting results in terms of precision and recall compared to NeuMF suggest that TCENR offers fewer, but more relevant, recommendations to the user. While NMF provides the best precision score of all methods, it underperforms on all other measures, making it a less desirable model. A closer look shows that, surprisingly, NeuMF outperforms PACE in accuracy, precision, and recall.

[Figure 3: Runtime (seconds) of all models on the Yelp dataset; bars for HPF, NMF, Geo-SAGE, LCARS, NeuMF, PACE, DeepCoNN, TCENR, and TCENRseq, with runtimes up to 60,000 seconds.]

This may be due to the less sparse dataset tested, which does not allow the contextual regularization to be fully harvested. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on this dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

Comparing TCENR and its proposed extension TCENRseq provides contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and an improved precision score, while accuracy and recall are worsened. One explanation is that by accurately capturing different aspects from user reviews, the model is able to reinforce its hypotheses and therefore reduce uncertainty in some cases; however, when faced with a contrast between textual aspects and the ground truth, it might choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the most suitable techniques and measures for learning each data type, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance also offers insight into the choice between CNN and RNN for language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and is found to be more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks. As the number of trainable parameters is increased by the use of recurrent layers, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.

4.4. Model Design Analysis. In this section we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The model's final layers, responsible for combining the dense outputs of the MLP and convolutional networks, require close attention, as they affect both the networks' ability to learn jointly and the prediction itself. To properly select the fusion operator, the following methods were considered (a code sketch of all four follows the list):

(i) Combining the last hidden layers of the two models using concatenation. A model using this method is denoted $TCENR_{con}$ and described in (9).

(ii) Merging the last hidden layers using the dot product, resulting in a model named $TCENR_{dot}$ that can be defined as

$$ \hat{y}_{up} = h_{context} \cdot h_{reviews} $$  (16)

(iii) Combining the two previously described methods, where the two representations are jointly learned by concatenation and dot product. The resulting model, denoted $TCENR_{dot\_con}$, is obtained by combining (9) and (16) using addition and translating the result to the range $[0, 1]$ with the sigmoid function:

$$ \hat{y}_{up} = \sigma(\sigma(W_4 \times [h_{context}, h_{reviews}] + b_4) + h_{context} \cdot h_{reviews}) $$  (17)

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted $TCENR_{weight}$, this model can be defined as

$$ \hat{y}_{up} = \lambda_1 \sigma(W_5 \times h_{context} + b_5) + \lambda_2 \sigma(W_6 \times h_{reviews} + b_6) $$  (18)
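A minimal Keras sketch of the four fusion heads is given below; the dense sigmoid layer realizing (9) and the weights of the averaged variant are assumptions, and `h_context` / `h_reviews` stand for the final hidden tensors of the two subnetworks.

    from tensorflow.keras import layers

    def merge_heads(h_context, h_reviews, lam1=0.5, lam2=0.5):
        # TCENR_con: concatenation followed by a dense sigmoid layer (Eq. (9)).
        y_con = layers.Dense(1, activation="sigmoid")(
            layers.Concatenate()([h_context, h_reviews]))
        # TCENR_dot: dot product of the two vectors (Eq. (16)).
        y_dot = layers.Dot(axes=1)([h_context, h_reviews])
        # TCENR_dot_con: sum of both signals squashed to [0, 1] (Eq. (17)).
        y_dot_con = layers.Activation("sigmoid")(layers.Add()([y_con, y_dot]))
        # TCENR_weight: weighted average of two per-branch predictions
        # (Eq. (18)); the weights lam1/lam2 are illustrative.
        p_ctx = layers.Dense(1, activation="sigmoid")(h_context)
        p_rev = layers.Dense(1, activation="sigmoid")(h_reviews)
        y_weight = layers.Add()([layers.Lambda(lambda t: lam1 * t)(p_ctx),
                                 layers.Lambda(lambda t: lam2 * t)(p_rev)])
        return y_con, y_dot, y_dot_con, y_weight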

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of jointly utilizing the latent features learned by each subnetwork. When combined with the underperforming dot product in $TCENR_{dot\_con}$, the use of concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing only concatenation, as done in $TCENR_{con}$, produces the best results and is therefore integrated into the final model.

[Figure 4: Comparison of merging methods in terms of accuracy: $TCENR_{dot\_con}$ 0.812, $TCENR_{weight}$ 0.823, $TCENR_{dot}$ 0.802, $TCENR_{con}$ 0.828.]

4.4.2. MLP Layer Design. Although [3] found that adding more layers and units to an MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer suggests this is a subject worth reexamining.

Table 2: Model accuracy with different numbers of hidden layers (columns) and first-layer sizes (rows).

1st layer   H=1     H=2     H=3     H=4
16          0.824   0.827   -       -
32          0.823   0.837   0.825   -
64          0.822   0.829   0.830   0.827
128         0.823   0.828   0.829   0.827

To this end, we test the proposed algorithm with 1-4 hidden layers used to learn the user-item interaction with contextual regularization, in sizes varying from 8 to 128 hidden units. The results, in terms of test set accuracy, are presented in Table 2, where the number of hidden layers is given in the columns and the size of the first layer in the rows. Unlike previous findings, we observe that two hidden layers with 32 and 16 hidden units yield the best performance for our dataset.
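The explored tower grid can be enumerated as below; the 8-unit floor that explains the dashes in Table 2 is our reading of the stated 8-128 unit range.

    def tower_sizes(first_layer, depth, floor=8):
        # Tower architecture: each hidden layer has half the units of its
        # predecessor; configurations whose last layer would fall below
        # the assumed 8-unit floor are skipped (the dashes in Table 2).
        sizes = [first_layer // 2 ** i for i in range(depth)]
        return sizes if sizes[-1] >= floor else None

    for first in (16, 32, 64, 128):
        for depth in (1, 2, 3, 4):
            cfg = tower_sizes(first, depth)
            if cfg is not None:
                print(first, depth, cfg)  # e.g., 32 2 [32, 16] is the best setting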

4.4.3. Number of Words. The use of written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews, where fully learning a single user or location would require more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500-6,000. As can be seen from Figure 5, accuracy improves slightly as the number of words increases up to 3,000, while additional words result in an increased bias towards users and locations with longer reviews and, in turn, reduce the model's learning capabilities.
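A minimal sketch of this truncation step using Keras utilities follows; the input name is illustrative, and keeping the beginning of each document when truncating is an assumption.

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    def cap_documents(token_id_lists, max_words=3000):
        # Truncate each user's or location's review document to `max_words`
        # tokens (3,000 performed best in Figure 5) and zero-pad shorter
        # documents to a uniform length.
        return pad_sequences(token_id_lists, maxlen=max_words,
                             padding="post", truncating="post")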

5. Conclusion and Future Work

In this paper we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN.

[Figure 5: Number of words comparison in terms of accuracy; y-axis 0.822-0.834, x-axis 0-6,000 words.]

Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

For future work, we intend to extend our models' evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by a National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975-984, ACM, San Francisco, California, USA, August 2016.

[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245-1254, ACM, Halifax, NS, Canada, August 2017.

[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp. 173-182, Perth, Australia, April 2017.

[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287-296, ACM, February 2011.

[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255-1264, ACM, Sydney, NSW, Australia, August 2015.

[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221-229, ACM, Chicago, Illinois, USA, August 2013.

[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448-456, ACM, August 2011.

[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244, ACM, Sydney, NSW, Australia, August 2015.

[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429-1438, ACM, Singapore, November 2017.

[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053-1058, IEEE, Barcelona, Spain, December 2016.

[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10, ACM, 2016.

[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.

[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems (Advances in Neural Information Processing Systems), pp. 2643-2651, 2013.

[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425-434, ACM, Cambridge, UK, February 2017.

[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233-240, ACM, 2016.

[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.

[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147-154, ACM, Vienna, Austria, September 2015.

[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107-114, ACM, 2016.

[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.

[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916-924, ACM Press, Mountain View, California, USA, October 2017.

[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198, ACM, 2016.

[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495-503, ACM, 2017.

[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46-54, ACM, Marina Del Rey, CA, USA, 2018.

[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551, 2017.

[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631-1640, ACM, Melbourne, Australia, October 2015.

[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.

[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.

[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844, Lyon, France, April 2018.

[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782-2788, NY, USA, July 2016.

[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309-2318, London, UK, August 2018.

[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639-648, Lyon, France, April 2018.

[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340-1346, Argentina, July 2015.

[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.

[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449-1458, ACM, Singapore, November 2017.

[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.

[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems (Advances in Neural Information Processing Systems), pp. 3111-3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.

[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.

[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326-335, Netherlands, July 2015.

[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273-1284, 2014.


Page 7: A Joint Deep Recommendation Framework for …downloads.hindawi.com/journals/complexity/2019/2926749.pdfrecommendation process, as will be demonstrated in this paper. Although numerous

Complexity 7

with the usersrsquo friends and the businesses locations Due tothe limited resources used in the model evaluation we choseto filter the dataset and keep only a concise subset whereall users and locations with less than 100 written reviews orless than 10 friends are removedThe filtered dataset includes141028 reviews and 9808 sparsity for the rating matrixThesocial and geographical graphs were constructed by randomwalks 10 of the original vertices were sampled as basenodes while 20 and 30 vertices were connected to each basenode for users and locations respectively with a window sizeof 3 To build the POI graph two locations are considereddirectly connected if they are up to 1 km apart

To test our modelsrsquo performance the original data wassplit to training-validation-test sets by random samplingwith the respective ratios of 56-24-20 resulting in78899 training instances In addition the input data was neg-atively sampled with 4 negative locations for every positiveone

To effectively compare our proposal with other alterna-tives we adopt the same settings as applied in [2 3] TheMLP input vectors are represented with an embedding size of10 while two hidden layers are added on top of the mergedresult Following the tower architecture where the size ofeach layer is half the size of its predecessor the numbers ofhidden units are 32 and 16 for the first and second layersrespectively

In the CNN component each word is represented by apretrained embedding layer with 50 units while the convo-lutional layer is constructed with a window size of 10 anda stride of 3 It results in 3 feature maps that are flattenedafter performing the max-pooling operation with a pool sizeof 2 The results are further modeled by a hidden layer with32 units Following the merge of the two hidden units theirinteraction is learned using another hidden layer with 8 unitsTo combine the three loss functions as described in (15) wefollow the results of [2] and set the hyperparameters 1205821 =1205822 = 01 For the training phase of the model a learning rateof 0005 was used over 50 maximum epochs and a batch sizeof 512 samples

42 Baselines To evaluate our algorithm we chose to com-pare it to these seven empirically proven frameworks

(i) HPF [39] Hierarchical Poisson matrix FactorizationA Bayesian framework for modeling implicit datausing Poisson Factorization

(ii) NMF [40] Nonnegative Matrix Factorization a CFmethod that takes only the rating matrix as input

(iii) Geo-SAGE [5] A generative method that predictsuser check-ins in LBSNs using geographical data andcrowd behaviors

(iv) LCARS [6] Location Content Aware RecommenderSystem A probabilistic model that exploits localpreferences in LBSN and content information aboutPOIs

(v) NeuMF [3] Neural Matrix Factorization A state-of-the-art model combining MF with MLP on implicitratings

Table 1 Performance comparison over the Yelp dataset Improve-ment of TCENR compared to each method is shown in brackets

Model Accuracy MSE Pre10 Rec10HPF 08141 01800 05526 03699

(169) (3494) (1851) (4098)NMF 08222 01189 07851 03517

(069) (151) (-1658) (4828)Geo-SAGE 07995 01807 02912 04145

(355) (3519) (12489) (2581)LCARS 08142 01612 06408 05127

(168) (2736) (22) (172)NeuMF 08273 01421 06488 05586

(007) (1759) (094) (-664)Pace 08239 01186 06406 05049

(049) (126) (223) (329)DeepCoNN 08037 01454 05385 0323

(301) (1946) (2162) (6446)TCENR 08279 01171 06549 05215TCENRseq 08273 01161 06655 04738

(007) (-086) (-159) (1007)

(vi) PACE [2] Preference and Context Embedding AMLP-based framework with the addition of contex-tual graphsrsquo smoothing for POI recommendation

(vii) DeepCoNN [14] Deep Cooperative Neural Net-works A CNN-based method that jointly learns anexplicit prediction by exploiting usersrsquo and locationsrsquonatural language reviews

For the task of evaluating ourmodel and the baselines wechose to apply Accuracy and Mean Square Error (MSE) overall n test samples as well as Precision (Pre10) and Recall(Rec10) for the average top 10 predictions per user

The proposed models were implemented using Keras(httpskerasio) on top of TensorFlow (httpswwwten-sorfloworg) backend All experiments were conducted usingNivida GTX 1070 GPU

43 Performance Evaluation The performance of our pro-posed algorithms and the seven baselines is reported inTable 1 along with the improvement ratio of TCENR overeach method in brackets The presented results are based onthe average of three individual executions

As can be witnessed from the results the proposedmodel TCENR achieves the best results overall comparedto all baselines Furthermore it was found to significantlyoutperformHPFNMF Geo-SAGE LCARS Pace and Deep-CoNN for 119901 lt 005 based on a one-sided unpaired t-testin terms of accuracy and MSE The contrasting results interms of precision and recall compared to NeuMF suggestthat TCENR offers lessbut more relevant recommendationsto the user While NMF provides the best precision scorecompared to all methods it underperforms in all othermeasures making it a less desirable model Taking a closerlook shows that surprisingly NeuMF outperforms PACE inaccuracy precision and recall This may be due to the less

8 Complexity

0

10000

20000

30000

40000

50000

60000

Runt

ime (

Seco

nds)

HPF NMF NeuMF DeepCoNN TCENR TCENRseqPACELCARSGeo-SAGE

Figure 3 Runtime (seconds) of all models on the Yelp dataset

sparse dataset tested which does not allow the contextualregularization to be fully harvested In addition the use ofonly the first 500words to represent the textual input for eachuser and location may explain the relatively low scores of theDeepCoNN model on the dataset while the performance ofGeo-SAGE and LCARS demonstrates that relying solely ongeographical data does not allow suchmodels to fully captureusersrsquo preferences in LBSNs

Comparing TCENR and its proposed extensionTCENRseq provides contrasting results By employingRNN instead of CNN to extract user and location featuresfrom textual reviews TCENRseq achieves lower error rateand improved precision score while accuracy and recallare worsened It may be considered that by accuratelycapturing different aspects from user reviews the modelis able to reinforce its hypotheses and therefore reducethe uncertainty in some cases However when faced witha contrast between textual aspects and the ground truthit might choose the wrong class label Nonetheless theresults demonstrate the importance of adopting the mostsuitable techniques and measures to learn different datatypes rather than employing a single method over all inputsMoreover it shows the positive impact of using textualdata in conjunction to historical activities The reportedperformance further suggests additional insight towards theselection of CNN and RNN for the task of language modelingin future recommendation tasks

To further evaluate our suggested frameworks and theseven baselines in terms of runtime the average time requiredto fully train each method is presented in Figure 3 Asdemonstrated by the results TCENR is competitivewithmostbaselines and found to be more efficient than DeepCoNNand LCARS The reported runtime of TCENRseq furtherdemonstrates the relative efficiency of CNN-based solutionsfor textual modeling tasks As the number of trainableparameters is increased due to the use of recurrent layers ourRNN-based extension takes 329 longer to train comparedto TCENR while achieving comparative results

44 Model Design Analysis In this section we discuss theeffect of several design selections over the suggested modelrsquosperformance

441Merge Layer The importance of themodelrsquos final layersresponsible for combining the dense output of both the MLPand convolutional networks requires a close attention as it

effects the networksrsquo ability to jointly learn and the predictionitself To properly select the fusion operator the followingmethods had been considered

(i) Combining the last hidden layers of the two modelsusing concatenation A model using this method willbe denoted as 119879119862119864119873119877119888119900119899 and described in (9)

(ii) Merging the last hidden layers using dot productresulting in a model named 119879119862119864119873119877119889119900119905 that can bedefined as

119910119906119901 = ℎ119888119900119899119905119890119909119905 sdot ℎ119903119890V119894119890119908119904 (16)

(iii) Combining the two previously described methodswhere the two representations will be jointly learnedby concatenation and dot productThe resultedmodelwill be denoted as 119879119862119864119873119877119889119900119905 119888119900119899 and can be devel-oped by combining (9) and (16) using addition andtranslating the result to a range of [0 1] with thesigmoid function

119910119906119901 = 120590 (120590 (1198824 times [ℎ119888119900119899119905119890119909119905 ℎ119903119890V119894119890119908119904] + 1198874) + ℎ119888119900119899119905119890119909119905sdot ℎ119903119890V119894119890119908119904) (17)

(iv) Adopting a weighted average for the prediction resultof the two networks Denoted as 119879119862119864119873119877119908119890119894119892ℎ119905 thismodel can be defined as

119910119906119901 = 1205821120590 (1198825 times ℎ119888119900119899119905119890119909119905 + 1198875)+ 1205822120590 (1198826 times ℎ119903119890V119894119890119908119904 + 1198876) (18)

As shown in Figure 4 adopting the more simple methodsof weighted average and dot product leads to an inferiorperformance of TCENR demonstrating the added valueof utilizing the latent features learned by each subnetworkjointly When combined with the underperforming methodof dot product in 119879119862119864119873119877119889119900119905 119888119900119899 the use of concatenationimproves over dot product alone However since the twomethods are integrated using a simple average employingonly concatenation as done in 119879119862119864119873119877119888119900119899 produces the bestresults and therefore integrated into the final model

442 MLP Layer Design Although it was found by [3] thatadding more layers and units to the MLP-based recom-mender has a positive effect the use of CNN and the addi-tional hidden layer suggests it is a subject worth investigating

Complexity 9

0812

0823

0802

0828

TCENRdot_con

TCENRweight

TCENRdot

TCENR con

Figure 4 Comparison of merging methods in terms of accuracy

Table 2 Modelrsquos accuracy with different layers

1st layer H=1 H=2 H=3 H=416 0824 0827 - -32 0823 0837 0825 -64 0822 0829 083 0827128 0823 0828 0829 0827

To this end we test the proposed algorithm with 1-4 hiddenlayers used to learn the user-item interaction with contextualregularization in varying sizes from 8 to 128 hidden unitsThe results in terms of test setrsquos accuracy are presented inTable 2 where the number of hidden layers is defined ascolumns and the size of the first unit is presented as rowsUnlike previous results we find that two hidden layers with32 and 16 hidden units result in the best performance for ourdataset

443 Number of Words The use of written reviews in theiroriginal order allows the strengths of CNN and RNN to beexploited by finding the best representation for every fewwords and eventually for the whole text Our final datasethowever is composed of very long reviews where to fullylearn a single user or location more than 20000 words arerequired making it computationally expensive to extract rel-evant representations To benefit from the sequential natureof the written reviews while keeping the solution feasible thenumber of words was limited to a range of 500-6000 As canbe witnessed from Figure 5 there is a slight improvement inaccuracy as the number of words increase up to 3000 whileadditional words result in an increased bias towards users andlocations with longer reviews and in turn reduce the modelrsquoslearning capabilities

5 Conclusion and Future Work

In this paper we developed a neural POI recommendersystem called TCENR The model exploits data about userslocations spatial data social networks and textual reviewsto predict the implicit preference of users regarding POIsTCENR models two types of user-location interactionsnative check-ins regularized by contextual information andthe words used to describe the usersrsquo experiences We furtherextended our proposed method and presented TCENRseqwhere textual data was modeled using RNN instead of CNNEvaluated over the Yelp dataset the proposed algorithms

0822

0825

0828

0831

0834

0 2000 4000 6000

Figure 5 Number of words comparison in terms of accuracy

consistently achieved superior results compared to sevenstate-of-the-art baselines in terms of accuracy and MSE

For future work we intend to extend our modelsrsquoevaluation over additional LBSN datasets In addition weplan to investigate the proposed frameworksrsquo contributionto the cold-start problem by analyzing its performance onadditional data while taking newusers and locationswith fewreviews into account

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China Grant (61572289) and NSERCDiscovery Grants

References

[1] H Li Y Ge R Hong and H Zhu ldquoPoint-of-interest rec-ommendations learning potential check-ins from friendsrdquo inProceedings of the 22nd ACM SIGKDD International Conferenceon Knowledge Discovery And Data Mining pp 975ndash984 ACMSan Francisco California USA August 2016

[2] C Yang L Bai C Zhang Q Yuan and J Han ldquoBridgingcollaborative filtering and semi-supervised learning a neuralapproach for poi recommendationrdquo in Proceedings of the 23rdACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 1245ndash1254 ACMHalifaxNS CanadaAugust 2017

[3] X He L Liao H Zhang L Nie X Hu and T-S Chua ldquoNeuralcollaborative filteringrdquo in Proceedings of the 26th InternationalConference onWorldWideWeb InternationalWorldWideWebConferences Steering Committee pp 173ndash182 Perth AustraliaApril 2017

[4] H Ma D Zhou C Liu M R Lyu and I King ldquoRecommendersystems with social regularizationrdquo in Proceedings of the 4thACM International Conference onWeb Search and DataMiningpp 287ndash296 ACM February 2011

10 Complexity

[5] W Wang H Yin L Chen Y Sun S Sadiq and X ZhouldquoGeo-sage a geographical sparse additive generative model forspatial item recommendationrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1255ndash1264 ACM Sydney NSW AustraliaAugust 2015

[6] H Yin Y Sun B Cui Z Hu and L Chen ldquoLcars a location-content-aware recommender systemrdquo in Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discov-ery andDataMining pp 221ndash229 ACMChicago Illinois USAAugust 2013

[7] C Wang and D M Blei ldquoCollaborative topic modeling for rec-ommending scientific articlesrdquo in Proceedings of the 17th ACMSIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo11) pp 448ndash456 ACM August 2011

[8] HWang NWang andD-Y Yeung ldquoCollaborative deep learn-ing for recommender systemsrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1235ndash1244 ACM Sydney NSW AustraliaAugust 2015

[9] J Manotumruksa C Macdonald and I Ounis ldquoA deep recur-rent collaborative filtering framework for venue recommen-dationrdquo in Proceedings of the 2017 ACM on Conference onInformation and KnowledgeManagement pp 1429ndash1438 ACMSingapore Singapore November 2017

[10] Q Liu S Wu D Wang Z Li and L Wang ldquoContext-awaresequential recommendationrdquo in Proceedings of the 2016 IEEE16th International Conference on Data Mining (ICDM) pp1053ndash1058 IEEE Barcelona Spain December 2016

[11] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo inProceedings of the 1stWorkshop onDeep Learning for Recommender Systems pp 7ndash10 ACM 2016

[12] Y Yu L Zhang C Wang R Gao W Zhao and J JiangldquoNeural personalized ranking via poisson factormodel for itemrecommendationrdquoComplexity vol 2019 Article ID 3563674 16pages 2019

[13] A Van Den Oord S Dieleman and B Schrauwen ldquoDeepcontent-based music recommendationrdquo in Proceedings of the26th International Conference on Neural Information ProcessingSystems Advances in neural information processing systemspp 2643ndash2651 2013

[14] L Zheng V Noroozi and P S Yu ldquoJoint deepmodeling of usersand items using reviews for recommendationrdquo in Proceedingsof the Tenth ACM International Conference on Web Search andData Mining pp 425ndash434 ACM Cambridge UK Feburary2017

[15] D Kim C Park J Oh S Lee andH Yu ldquoConvolutional matrixfactorization for document context-aware recommendationrdquoin Proceedings of the 10th ACM Conference on RecommenderSystems pp 233ndash240 ACM 2016

[16] B Hidasi A Karatzoglou L Baltrunas and D Tikk ldquoSession-based recommendations with recurrent neural networksrdquo 2015httpsarxivorgabs151106939

[17] A Almahairi K Kastner K Cho and A Courville ldquoLearningdistributed representations from reviews for collaborative filter-ingrdquo in Proceedings of the 9th ACMConference on RecommenderSystems pp 147ndash154 ACM Vienna Austria September 2015

[18] T Bansal D Belanger and A McCallum ldquoAsk the gru multi-task learning for deep text recommendationsrdquo in Proceedings ofthe 10th ACMConference on Recommender Systems pp 107ndash114ACM 2016

[19] J Chen W Zhang P Zhang P Ying K Niu and M ZouldquoExploiting spatial and temporal for point of interest recom-mendationrdquoComplexity vol 2018 Article ID 6928605 16 pages2018

[20] P Zhao X Xu Y Liu V S Sheng K Zheng and H XiongldquoPhoto2trip exploiting visual contents in geo-tagged photos forpersonalized tour recommendationrdquo in Proceedings of the 2017ACM on Multimedia Conference - MM 17 pp 916ndash924 ACMPress Mountain View California USA October 2017

[21] P Covington J Adams and E Sargin ldquoDeep neural networksfor youtube recommendationsrdquo in Proceedings of the 10th ACMConference on Recommender Systems pp 191ndash198 ACM 2016

[22] C-Y Wu A Ahmed A Beutel A J Smola and H JingldquoRecurrent recommender networksrdquo in Proceedings of the TenthACM International Conference onWeb Search and DataMiningpp 495ndash503 ACM 2017

[23] A Beutel P Covington S Jain et al ldquoLatent cross making useof context in recurrent recommender systemsrdquo inProceedings ofthe Eleventh ACM International Conference on Web Search andDataMining pp 46ndash54 ACMMarinaDel Rey CA USA 2018

[24] H Yin W Wang H Wang L Chen and X Zhou ldquoSpatial-aware hierarchical collaborative deep learning for POI rec-ommendationrdquo IEEE Transactions on Knowledge and DataEngineering vol 29 no 11 pp 2537ndash2551 2017

[25] H Yin X Zhou Y Shao H Wang and S Sadiq ldquoJointmodeling of user check-in behaviors for point-of-interest rec-ommendationrdquo in Proceedings of the 24th ACM Internationalon Conference on Information and Knowledge Management pp1631ndash1640 ACMMelbourne Australia October 2015

[26] P Zhao X Xu Y Liu et al ldquoExploiting hierarchical structuresfor POI recommendationrdquo in Proceedings of the 2017 IEEEInternational Conference on Data Mining (ICDM) IEEE NewOrleans LA USA November 2017

[27] P Zhao H Zhu Y Liu et al ldquoWhere to go next a spatio-temporal gated network for next poi recommendationrdquo inProceedings of the 33rdAAAIConference onArtificial Intelligence(AAAI 2019) 2019

[28] HWang F Zhang X Xie andMGuo ldquoDkn deep knowledge-aware network for news recommendationrdquo in Proceedings of the2018World Wide Web Conference pp 1835ndash1844 Lyon FranceApril 2018

[29] Y Gong and Q Zhang ldquoHashtag recommendation usingattention-based convolutional neural networkrdquo in Proceedingsof the 25th International Joint Conference on Artificial Intelli-gence IJCAI 2016 pp 2782ndash2788 NY USA July 2016

[30] Y Tay A T Luu and S C Hui ldquoMulti-pointer co-attentionnetworks for recommendationrdquo in Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Discovery ampData Mining pp 2309ndash2318 London UK August 2018

[31] Z Cheng Y Ding L Zhu and M Kankanhalli ldquoAspect-awarelatent factormodel rating prediction with ratings and reviewsrdquoin Proceedings of the 2018World WideWeb Conference pp 639ndash648 Lyon France April 2018

[32] D Tang BQin T Liu andY Yang ldquoUsermodeling with neuralnetwork for review rating predictionrdquo in Proceedings of the 24thInternational Conference on Artificial Intelligence IJCAI 2015pp 1340ndash1346 Argentina July 2015

[33] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2324 1998

[34] Y Kim ldquoConvolutional neural networks for sentence classi-ficationrdquo in Proceedings of the 2014 Conference on Empirical

Complexity 11

Methods in Natural Language Processing (EMNLP) pp 1746ndash1751 Association for Computational Linguistics Doha Qatar2014 httpsaclanthologyinfopapersD14-1181d14-1181

[35] Y Zhang Q Ai X Chen andW B Croft ldquoJoint representationlearning for top-N recommendation with heterogeneous infor-mation sourcesrdquo in Proceedings of the 2017 ACM on Conferenceon Information and Knowledge Management pp 1449ndash1458ACM Singapore Singapore November 2017

[36] J Pennington R Socher and C Manning ldquoGloVe globalvectors for word representationrdquo in Proceedings of the 2014 Con-ference on Empirical Methods in Natural Language Processing(EMNLP) pp 1532ndash1543 2014

[37] T Mikolov I Sutskever K Chen G S Corrado and JDean ldquoDistributed representations of words and phrases andtheir compositionalityrdquo in Proceedings of the 26th Interna-tional Conference on Neural Information Processing SystemsAdvances in neural information processing systems pp 3111ndash3119 Lake TahoeNevada 2013 httpsdlacmorgcitationcfmid=2999959

[38] D P Kingma and J Ba ldquoAdam a method for stochasticoptimizationrdquo 2014 httpsarxivorgabs14126980

[39] P Gopalan J M Hofman and DM Blei ldquoScalable recommen-dationwith hierarchical Poisson factorizationrdquo inProceedings ofthe 31st Conference on Uncertainty in Artificial Intelligence UAI2015 pp 326ndash335 Netherlands July 2015

[40] X Luo M Zhou Y Xia and Q Zhu ldquoAn efficient non-negativematrix-factorization-based approach to collaborative filteringfor recommender systemsrdquo IEEE Transactions on IndustrialInformatics vol 10 no 2 pp 1273ndash1284 2014

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Figure 3: Runtime (seconds) of all models on the Yelp dataset (HPF, NMF, NeuMF, DeepCoNN, TCENR, TCENRseq, PACE, LCARS, and Geo-SAGE; y-axis from 0 to 60,000 seconds).

This may be partly attributed to the sparse dataset tested, which does not allow the contextual regularization to be fully leveraged. In addition, the use of only the first 500 words to represent the textual input for each user and location may explain the relatively low scores of the DeepCoNN model on this dataset, while the performance of Geo-SAGE and LCARS demonstrates that relying solely on geographical data does not allow such models to fully capture users' preferences in LBSNs.

Comparing TCENR and its proposed extension TCENRseq yields contrasting results. By employing RNN instead of CNN to extract user and location features from textual reviews, TCENRseq achieves a lower error rate and improved precision, while accuracy and recall are worse. A possible explanation is that by accurately capturing different aspects of user reviews, the model is able to reinforce its hypotheses and thereby reduce uncertainty in some cases; however, when faced with a contradiction between textual aspects and the ground truth, it may choose the wrong class label. Nonetheless, the results demonstrate the importance of adopting the techniques and measures best suited to each data type, rather than employing a single method over all inputs. Moreover, they show the positive impact of using textual data in conjunction with historical activities. The reported performance also offers guidance for choosing between CNN and RNN for language modeling in future recommendation tasks.

To further evaluate our suggested frameworks and the seven baselines in terms of runtime, the average time required to fully train each method is presented in Figure 3. As demonstrated by the results, TCENR is competitive with most baselines and is more efficient than DeepCoNN and LCARS. The reported runtime of TCENRseq further demonstrates the relative efficiency of CNN-based solutions for textual modeling tasks: as the number of trainable parameters grows with the use of recurrent layers, our RNN-based extension takes 32.9% longer to train than TCENR while achieving comparable results.
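To give a rough sense of where the extra cost of the recurrent variant comes from, the following sketch compares the trainable parameter counts of a 1D-convolutional text encoder and a GRU of comparable width. The dimensions are placeholders, not the paper's hyperparameters, and a GRU merely stands in for whichever recurrent unit is used; note also that, beyond parameter count, recurrence prevents parallelizing across time steps, which further slows training.

```python
# Illustrative parameter-count comparison: CNN vs. RNN text encoder.
# All sizes below are assumptions for the sake of the example.
import torch.nn as nn

embed_dim, hidden, kernel = 300, 100, 3
cnn = nn.Conv1d(embed_dim, hidden, kernel_size=kernel)  # slides over word windows
gru = nn.GRU(embed_dim, hidden, batch_first=True)       # processes words sequentially


def n_params(m):
    return sum(p.numel() for p in m.parameters())


print(n_params(cnn))  # 300*100*3 + 100 = 90,100
print(n_params(gru))  # 3*(300*100 + 100*100 + 2*100) = 120,600 (3 gates)
```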

4.4. Model Design Analysis. In this section we discuss the effect of several design selections on the suggested model's performance.

4.4.1. Merge Layer. The model's final layers, which are responsible for combining the dense outputs of the MLP and convolutional networks, require close attention, as they affect both the networks' ability to learn jointly and the prediction itself. To select a suitable fusion operator, the following methods were considered (a code sketch of the four operators follows the list):

(i) Combining the last hidden layers of the two models using concatenation. A model using this method will be denoted as TCENR_con and is described in (9).

(ii) Merging the last hidden layers using the dot product, resulting in a model named TCENR_dot that can be defined as

\[ y_{up} = h_{context} \cdot h_{reviews} \tag{16} \]

(iii) Combining the two previously described methods, so that the two representations are jointly learned by concatenation and dot product. The resulting model, denoted as TCENR_dot_con, can be developed by combining (9) and (16) using addition and translating the result to the range [0, 1] with the sigmoid function:

\[ y_{up} = \sigma\left(\sigma\left(W_4 \times \left[h_{context}, h_{reviews}\right] + b_4\right) + h_{context} \cdot h_{reviews}\right) \tag{17} \]

(iv) Adopting a weighted average of the prediction results of the two networks. Denoted as TCENR_weight, this model can be defined as

\[ y_{up} = \lambda_1 \sigma\left(W_5 \times h_{context} + b_5\right) + \lambda_2 \sigma\left(W_6 \times h_{reviews} + b_6\right) \tag{18} \]
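To make the four operators concrete, the following PyTorch sketch implements them side by side. The class name, the single-layer form of the concatenation branch standing in for (9), and treating λ1, λ2 as learnable parameters are illustrative assumptions rather than the exact architecture:

```python
# Minimal sketch of the four fusion operators of Section 4.4.1.
import torch
import torch.nn as nn

class MergeLayer(nn.Module):
    def __init__(self, dim, method="con"):
        super().__init__()
        self.method = method
        self.w4 = nn.Linear(2 * dim, 1)                    # W4, b4 in (17)
        self.w5 = nn.Linear(dim, 1)                        # W5, b5 in (18)
        self.w6 = nn.Linear(dim, 1)                        # W6, b6 in (18)
        self.lam = nn.Parameter(torch.tensor([0.5, 0.5]))  # lambda_1, lambda_2

    def forward(self, h_context, h_reviews):
        dot = (h_context * h_reviews).sum(dim=-1)          # h_context . h_reviews
        con = torch.sigmoid(
            self.w4(torch.cat([h_context, h_reviews], dim=-1))).squeeze(-1)
        if self.method == "dot":        # eq. (16): raw dot product
            return dot
        if self.method == "con":        # concatenation branch, as in (9)
            return con
        if self.method == "dot_con":    # eq. (17): add both, re-squash to [0, 1]
            return torch.sigmoid(con + dot)
        if self.method == "weight":     # eq. (18): weighted average of predictions
            p1 = torch.sigmoid(self.w5(h_context)).squeeze(-1)
            p2 = torch.sigmoid(self.w6(h_reviews)).squeeze(-1)
            return self.lam[0] * p1 + self.lam[1] * p2
        raise ValueError(f"unknown merge method: {self.method}")
```

For instance, `MergeLayer(dim=16, method="con")` reproduces the variant that, as discussed below, performed best in our experiments.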

As shown in Figure 4, adopting the simpler methods of weighted average and dot product leads to inferior performance of TCENR, demonstrating the added value of jointly utilizing the latent features learned by each subnetwork. When combined with the underperforming dot product in TCENR_dot_con, concatenation improves over the dot product alone. However, since the two methods are integrated using a simple average, employing concatenation on its own, as done in TCENR_con, produces the best results and is therefore integrated into the final model.

4.4.2. MLP Layer Design. Although it was found by [3] that adding more layers and units to an MLP-based recommender has a positive effect, the use of CNN and the additional hidden layer in our architecture make this a subject worth investigating.

Figure 4: Comparison of merging methods in terms of accuracy (TCENR_dot_con: 0.812; TCENR_weight: 0.823; TCENR_dot: 0.802; TCENR_con: 0.828).

Table 2: Model accuracy with different numbers of hidden layers (columns) and first-layer sizes (rows).

1st layer | H=1   | H=2   | H=3   | H=4
16        | 0.824 | 0.827 | -     | -
32        | 0.823 | 0.837 | 0.825 | -
64        | 0.822 | 0.829 | 0.830 | 0.827
128       | 0.823 | 0.828 | 0.829 | 0.827

To this end, we test the proposed algorithm with 1 to 4 hidden layers used to learn the user-item interaction with contextual regularization, with sizes varying from 8 to 128 hidden units. The results in terms of test-set accuracy are presented in Table 2, where the number of hidden layers is given in the columns and the size of the first layer in the rows. Unlike previous results, we find that two hidden layers with 32 and 16 hidden units yield the best performance on our dataset; a sketch of such a tower follows.
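The following is a minimal sketch of such an interaction tower, assuming (as is common in NCF-style models) that each hidden layer halves the previous width, so a first layer of 32 with two layers yields the best [32, 16] setting; the input dimension and ReLU activations are assumptions:

```python
# Sketch of an MLP tower with halving layer widths, e.g. [32, 16].
import torch.nn as nn

def build_tower(in_dim, first_layer=32, n_layers=2):
    widths = [first_layer >> i for i in range(n_layers)]  # 32, 16, ...
    layers, prev = [], in_dim
    for w in widths:
        layers += [nn.Linear(prev, w), nn.ReLU()]
        prev = w
    return nn.Sequential(*layers)

tower = build_tower(in_dim=64)  # in_dim is illustrative
```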

4.4.3. Number of Words. Using written reviews in their original order allows the strengths of CNN and RNN to be exploited by finding the best representation for every few words and, eventually, for the whole text. Our final dataset, however, is composed of very long reviews, where fully learning a single user or location requires more than 20,000 words, making it computationally expensive to extract relevant representations. To benefit from the sequential nature of the written reviews while keeping the solution feasible, the number of words was limited to a range of 500 to 6,000, as sketched below. As shown in Figure 5, accuracy improves slightly as the number of words increases up to 3,000, while additional words introduce a bias towards users and locations with longer reviews and, in turn, reduce the model's learning capability.
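A simple way to enforce such a word budget is to truncate or pad the concatenated review text of each user (or location) to a fixed length. The sketch below illustrates this preprocessing step; whitespace tokenization and the pad token are simplifying assumptions, not the paper's exact pipeline:

```python
# Cap a user's (or location's) concatenated reviews at a fixed word budget.
def cap_words(reviews, max_words=3000, pad="<pad>"):
    words = [w for review in reviews for w in review.split()]
    words = words[:max_words]                   # truncate long histories
    words += [pad] * (max_words - len(words))   # pad short ones to fixed length
    return words

capped = cap_words(["great tacos", "friendly staff"], max_words=8)
```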

Figure 5: Number of words versus accuracy (x-axis: 0 to 6,000 words; y-axis: accuracy, 0.822 to 0.834).

5. Conclusion and Future Work

In this paper we developed a neural POI recommender system called TCENR. The model exploits data about users, locations, spatial data, social networks, and textual reviews to predict the implicit preference of users regarding POIs. TCENR models two types of user-location interactions: native check-ins, regularized by contextual information, and the words used to describe the users' experiences. We further extended our proposed method and presented TCENRseq, where textual data is modeled using RNN instead of CNN. Evaluated over the Yelp dataset, the proposed algorithms consistently achieved superior results compared to seven state-of-the-art baselines in terms of accuracy and MSE.

For future work, we intend to extend our models' evaluation to additional LBSN datasets. In addition, we plan to investigate the proposed frameworks' contribution to the cold-start problem by analyzing their performance on additional data while taking new users and locations with few reviews into account.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Grant (61572289) and NSERC Discovery Grants.

References

[1] H. Li, Y. Ge, R. Hong, and H. Zhu, "Point-of-interest recommendations: learning potential check-ins from friends," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 975–984, ACM, San Francisco, California, USA, August 2016.

[2] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han, "Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254, ACM, Halifax, NS, Canada, August 2017.

[3] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web, pp. 173–182, International World Wide Web Conferences Steering Committee, Perth, Australia, April 2017.

[4] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pp. 287–296, ACM, February 2011.

[5] W. Wang, H. Yin, L. Chen, Y. Sun, S. Sadiq, and X. Zhou, "Geo-SAGE: a geographical sparse additive generative model for spatial item recommendation," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264, ACM, Sydney, NSW, Australia, August 2015.

[6] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "LCARS: a location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 221–229, ACM, Chicago, Illinois, USA, August 2013.

[7] C. Wang and D. M. Blei, "Collaborative topic modeling for recommending scientific articles," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 448–456, ACM, August 2011.

[8] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244, ACM, Sydney, NSW, Australia, August 2015.

[9] J. Manotumruksa, C. Macdonald, and I. Ounis, "A deep recurrent collaborative filtering framework for venue recommendation," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438, ACM, Singapore, November 2017.

[10] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, "Context-aware sequential recommendation," in Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1053–1058, IEEE, Barcelona, Spain, December 2016.

[11] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.

[12] Y. Yu, L. Zhang, C. Wang, R. Gao, W. Zhao, and J. Jiang, "Neural personalized ranking via Poisson factor model for item recommendation," Complexity, vol. 2019, Article ID 3563674, 16 pages, 2019.

[13] A. Van Den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 2643–2651, 2013.

[14] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 425–434, ACM, Cambridge, UK, February 2017.

[15] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, "Convolutional matrix factorization for document context-aware recommendation," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240, ACM, 2016.

[16] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, "Session-based recommendations with recurrent neural networks," 2015, https://arxiv.org/abs/1511.06939.

[17] A. Almahairi, K. Kastner, K. Cho, and A. Courville, "Learning distributed representations from reviews for collaborative filtering," in Proceedings of the 9th ACM Conference on Recommender Systems, pp. 147–154, ACM, Vienna, Austria, September 2015.

[18] T. Bansal, D. Belanger, and A. McCallum, "Ask the GRU: multi-task learning for deep text recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 107–114, ACM, 2016.

[19] J. Chen, W. Zhang, P. Zhang, P. Ying, K. Niu, and M. Zou, "Exploiting spatial and temporal for point of interest recommendation," Complexity, vol. 2018, Article ID 6928605, 16 pages, 2018.

[20] P. Zhao, X. Xu, Y. Liu, V. S. Sheng, K. Zheng, and H. Xiong, "Photo2Trip: exploiting visual contents in geo-tagged photos for personalized tour recommendation," in Proceedings of the 2017 ACM on Multimedia Conference (MM '17), pp. 916–924, ACM Press, Mountain View, California, USA, October 2017.

[21] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198, ACM, 2016.

[22] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, "Recurrent recommender networks," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503, ACM, 2017.

[23] A. Beutel, P. Covington, S. Jain et al., "Latent cross: making use of context in recurrent recommender systems," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54, ACM, Marina Del Rey, CA, USA, 2018.

[24] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou, "Spatial-aware hierarchical collaborative deep learning for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537–2551, 2017.

[25] H. Yin, X. Zhou, Y. Shao, H. Wang, and S. Sadiq, "Joint modeling of user check-in behaviors for point-of-interest recommendation," in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640, ACM, Melbourne, Australia, October 2015.

[26] P. Zhao, X. Xu, Y. Liu et al., "Exploiting hierarchical structures for POI recommendation," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, November 2017.

[27] P. Zhao, H. Zhu, Y. Liu et al., "Where to go next: a spatio-temporal gated network for next POI recommendation," in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.

[28] H. Wang, F. Zhang, X. Xie, and M. Guo, "DKN: deep knowledge-aware network for news recommendation," in Proceedings of the 2018 World Wide Web Conference, pp. 1835–1844, Lyon, France, April 2018.

[29] Y. Gong and Q. Zhang, "Hashtag recommendation using attention-based convolutional neural network," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 2782–2788, NY, USA, July 2016.

[30] Y. Tay, A. T. Luu, and S. C. Hui, "Multi-pointer co-attention networks for recommendation," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318, London, UK, August 2018.

[31] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, "Aspect-aware latent factor model: rating prediction with ratings and reviews," in Proceedings of the 2018 World Wide Web Conference, pp. 639–648, Lyon, France, April 2018.

[32] D. Tang, B. Qin, T. Liu, and Y. Yang, "User modeling with neural network for review rating prediction," in Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015), pp. 1340–1346, Argentina, July 2015.

[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[34] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Association for Computational Linguistics, Doha, Qatar, 2014, https://aclanthology.info/papers/D14-1181/d14-1181.

[35] Y. Zhang, Q. Ai, X. Chen, and W. B. Croft, "Joint representation learning for top-N recommendation with heterogeneous information sources," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1449–1458, ACM, Singapore, November 2017.

[36] J. Pennington, R. Socher, and C. Manning, "GloVe: global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.

[37] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nevada, 2013, https://dl.acm.org/citation.cfm?id=2999959.

[38] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.

[39] P. Gopalan, J. M. Hofman, and D. M. Blei, "Scalable recommendation with hierarchical Poisson factorization," in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), pp. 326–335, Netherlands, July 2015.

[40] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1273–1284, 2014.
