Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains
Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains
Cao et al., ICML 2010. Presented by Danushka Bollegala.
Link Prediction: predict links (relations) between entities
- Recommend items to users (MovieLens, Amazon)
- Recommend users to users (social recommendation)
- Similarity search (suggest similar web pages)
- Query suggestion (suggest related queries issued by other users)
Collective Link Prediction (CLP): perform multiple prediction tasks for the same set of users simultaneously
- Predict/recommend multiple item types (e.g., books and movies)
Pros:
- Prediction tasks might not be independent, so one task can benefit from another (books vs. movies vs. food)
- Less affected by data sparseness (the cold-start problem)
Transfer Learning + Collective Link Prediction (this paper) combines:
- Gaussian Process Regression (GPR) (PRML Sec. 6.4)
- Link prediction as matrix factorization
- Probabilistic Principal Component Analysis (PPCA) (Tipping & Bishop, 1999; PRML Chapter 12)
- Probabilistic non-linear matrix factorization (Lawrence & Urtasun, ICML 2009)
- Task similarity matrix, T
Link Modeling via Matrix Factorization
- Link matrix X (x_{i,j} is the rating given by user i to item j)
- x_{i,j} is modeled by f(u_i, v_j, ε)
  - f: link function
  - u_i: latent representation of user i
  - v_j: latent representation of item j
  - ε: noise term
- Generalized matrix approximation
  - Assumption: ε is Gaussian noise, N(0, σ²I)
  - Use Y = f^{-1}(X); then Y follows a multivariate Gaussian distribution
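The model above can be sketched in a few lines of numpy (the sizes are hypothetical, and the link function f is taken as the identity purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 2       # hypothetical sizes; k = latent dimension

U = rng.normal(size=(n_users, k))   # rows u_i: latent user representations
V = rng.normal(size=(n_items, k))   # rows v_j: latent item representations
sigma = 0.1                         # noise std dev, eps ~ N(0, sigma^2)

# Y = U V^T + noise is multivariate Gaussian; observed ratings are X = f(Y)
Y = U @ V.T + sigma * rng.normal(size=(n_users, n_items))
f = lambda y: y                     # identity link, purely for illustration
X = f(Y)
```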
Gaussian Process Regression: revision (PRML Section 6.4)
Functions as Vectors
- We can view a function as an infinite-dimensional vector f = (f(x1), f(x2), ...)^T
- Each point in the domain is mapped by f to one dimension of the vector
- In machine learning we must find functions (e.g., linear predictors) that map input values to their corresponding output values, while avoiding over-fitting
- This can be visualized as sampling from a distribution over functions with certain properties: a preference bias (cf. restriction bias)
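Sampling from a distribution over functions can be illustrated by drawing from a zero-mean Gaussian prior over a finite grid of points (a minimal sketch; the grid size, length-scale, and jitter are arbitrary choices):

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

rng = np.random.default_rng(1)
xs = np.linspace(0, 5, 50)            # a finite grid standing in for the domain
K = rbf(xs, xs) + 1e-8 * np.eye(50)   # small jitter for numerical stability

# Each draw is one "function": a 50-dimensional vector (f(x_1), ..., f(x_50))
samples = rng.multivariate_normal(np.zeros(50), K, size=3)
```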
Gaussian Process (GP) (1/2)
- Linear regression model: y(x) = w^T φ(x); we get different output functions y for different weight vectors w
- Let us impose a Gaussian prior over w
- Training dataset: {(x1, y1), ..., (xN, yN)}; targets: y = (y1, ..., yN)^T
- Design matrix Φ, with elements Φ_{nk} = φ_k(x_n)
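The claim that a Gaussian prior on w induces a Gaussian distribution over y = Φw can be checked numerically. A toy sketch with an assumed polynomial basis; the empirical covariance of y should match the Gram matrix K = α⁻¹ΦΦᵀ:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 4)
Phi = np.column_stack([np.ones(4), x, x**2])  # design matrix, basis (1, x, x^2)
alpha = 2.0                                   # prior precision: w ~ N(0, I/alpha)

# Draw many weight vectors from the prior; each induces an output vector y = Phi w
W = rng.normal(scale=alpha**-0.5, size=(100_000, 3))
Y = W @ Phi.T

emp_cov = np.cov(Y, rowvar=False)   # empirical covariance of the outputs
theory = Phi @ Phi.T / alpha        # Gram matrix K = (1/alpha) Phi Phi^T
```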
Gaussian Process (2/2)
- When we impose a Gaussian prior over the weight vector, the target vector y = Φw is also Gaussian, with zero mean and covariance K
- K: kernel (Gram) matrix, with elements K_{nm} = k(x_n, x_m)
- k: kernel function
Gaussian Process: Definition
- A Gaussian process is a probability distribution over functions y(x) such that the values y(x1), ..., y(xN) evaluated at an arbitrary set of points x1, ..., xN jointly have a Gaussian distribution
- p(y(x1), ..., y(xN)) is Gaussian; often the mean is set to zero (a non-informative prior), and then the kernel function fully defines the GP
- Examples: Gaussian kernel, exponential kernel
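The two kernels can be written directly (a minimal sketch; the hyperparameter names `sigma` and `theta` are illustrative):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((np.asarray(x1) - np.asarray(x2)) ** 2) / (2 * sigma**2))

def exponential_kernel(x1, x2, theta=1.0):
    """Exponential kernel: k(x, x') = exp(-theta * |x - x'|)."""
    return np.exp(-theta * np.abs(x1 - x2))
```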
Gaussian Process Regression (GPR): predict outputs with noise.
(Slide figure: graphical model relating input x, function value y, noise ε, and observed target t.)
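The GPR predictive distribution for a new input x* has mean kᵀC⁻¹t and variance c − kᵀC⁻¹k, with C = K + β⁻¹I (PRML Eqs. 6.66–6.67). A toy sketch; the sine data and hyperparameters are invented for illustration:

```python
import numpy as np

def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

# Noisy observations t_n = y(x_n) + eps_n of an underlying function (here sin)
x_train = np.array([0.0, 1.0, 2.0, 3.0])
t_train = np.sin(x_train)
beta = 100.0                                   # noise precision

C = rbf(x_train, x_train) + np.eye(4) / beta   # C = K + beta^{-1} I
x_star = np.array([1.5])
k = rbf(x_train, x_star)                       # k_n = k(x_n, x_star)

# Predictive mean and variance (PRML Eqs. 6.66-6.67)
mean = k.T @ np.linalg.solve(C, t_train)
var = rbf(x_star, x_star) + 1 / beta - k.T @ np.linalg.solve(C, k)
```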
Probabilistic Matrix Factorization
- PMF can be seen as a Gaussian process with latent variables (GP-LVM) [Lawrence & Urtasun, ICML 2009]
- Generalized matrix approximation model: Y = f^{-1}(X) follows a multivariate Gaussian distribution
- A Gaussian prior is set on U
- A non-linear version of the probabilistic PCA model of Tipping & Bishop (1999)
- Mapping back to X through the link function f
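The GP view can be checked empirically: if the latent item vectors v_j are drawn from N(0, I) and marginalized out, each column y_j = U v_j + ε of Y is Gaussian with covariance UUᵀ + σ²I, i.e. a GP over users with a linear kernel (toy sizes, assumed values):

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, k, sigma = 3, 2, 0.5
U = rng.normal(size=(n_users, k))        # fixed latent user matrix

# Marginalize V: draw many columns y_j = U v_j + eps with v_j ~ N(0, I)
V = rng.normal(size=(200_000, k))
Y = V @ U.T + sigma * rng.normal(size=(200_000, n_users))

emp = np.cov(Y, rowvar=False)                   # empirical covariance over users
theory = U @ U.T + sigma**2 * np.eye(n_users)   # linear kernel plus noise
```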
Ratings are not Gaussian!
Collective Link Prediction
- Instead of a separate GP model for each task, use a single model for all tasks
Tensor Product
- Known as the Kronecker product for two matrices (e.g., numpy.kron(a, b))
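The joint covariance over (task, user) pairs combines the task-similarity matrix T with the user kernel K through the Kronecker product (toy matrices for illustration):

```python
import numpy as np

# Task-similarity matrix T (2 tasks) and user kernel K (3 users), toy values
T = np.array([[1.0, 0.8],
              [0.8, 1.0]])
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])

# Joint covariance over (task, user) pairs: Sigma = T (x) K
Sigma = np.kron(T, K)   # block (s, t) of Sigma equals T[s, t] * K
```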
Generalized Link Functions
- Each task might have a different rating distribution
- c, α, b are parameters that must be estimated from the data
- We can relax the constraint α > 0 if we have no prior knowledge regarding the negativity of the skewness of the rating distribution
Predictive Distribution
- Similar to GPR prediction
- Predicting y = g(x), then predicting x
Parameter Estimation
- Compute the likelihood of the dataset
- Use stochastic gradient descent (SGD) for optimization
- Non-convex optimization; sensitive to initial conditions
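The SGD step can be sketched on a plain latent-factor rating model (a generic illustration, not the paper's exact objective; the learning rate, regularisation weight, and toy ratings are invented):

```python
import random
import numpy as np

rng = np.random.default_rng(4)
n_users, n_items, k = 20, 15, 3
U = 0.1 * rng.normal(size=(n_users, k))   # random init: the non-convex
V = 0.1 * rng.normal(size=(n_items, k))   # objective is sensitive to it

# Hypothetical observed ratings as (user, item, rating) triples
ratings = [(i, j, 3.0 + 0.1 * ((i + j) % 5))
           for i in range(n_users) for j in range(n_items)]

def mae(U, V, ratings):
    return float(np.mean([abs(r - U[i] @ V[j]) for i, j, r in ratings]))

before = mae(U, V, ratings)
lr, lam = 0.01, 0.01
random.seed(0)
for epoch in range(50):
    random.shuffle(ratings)
    for i, j, r in ratings:
        e = r - U[i] @ V[j]                   # error on one observed rating
        U[i] += lr * (e * V[j] - lam * U[i])  # stochastic gradient steps on the
        V[j] += lr * (e * U[i] - lam * V[j])  # squared-error + L2 objective
after = mae(U, V, ratings)
```

Different initialisations can end in different local optima, which is the sensitivity noted above.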
Experimental Setting
- Use each dataset and predict multiple item types
Datasets:
- MovieLens: 100,000 ratings on a 1-5 scale, 943 users, 1,682 movies, 5 popular genres
- Book-Crossing: 56,148 ratings on a 1-10 scale, 28,503 users, 9,909 books, the 4 most general Amazon book categories
- Douban: a social-network-based recommendation service; 10,000 users, 200,000 items; movies, books, music
Evaluation
- Evaluation measure: Mean Absolute Error (MAE)
Baselines:
- I-GP: independent link prediction using GP
- CMF: collective matrix factorization (non-GP, classical NMF)
- M-GP: joint link prediction using a multi-relational GP; does not consider the similarity between tasks
- Proposed method: CLP-GP
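MAE itself is a one-liner (lower is better):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum_n |y_true_n - y_pred_n|."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```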
Results
Note: (1) smaller values are better; (2) with (+) / without (-) link function.
Total data sparseness
Target task data sparseness
Task similarity matrix (T)
- Romance and Drama are very similar; Action and Comedy are very dissimilar
My Comments
- Elegant model and well-written paper
- Few parameters need to be specified (only the latent space dimension k); all other parameters can be learnt
- Applicable to a wide range of tasks
Cons:
- Computational complexity: predictions require kernel matrix inversion
- SGD updates might not converge; the problem is non-convex