Latent Interest and Topic Mining on User-item Bipartite Networks

Jinliang Xu, Shangguang Wang*, Sen Su
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, China
{jlxu,sgwang,susen}@bupt.edu.cn

Sathish A. P. Kumar
Department of Computer Science and Information Systems, Coastal Carolina University, South Carolina, USA

Wu Chou
Huawei Technologies Co., Ltd, Shenzhen, China
Abstract—The Latent Factor Model (LFM) is extensively used for dealing with user-item bipartite networks in service recommendation systems. To alleviate the limitations of LFM, this paper presents a novel unsupervised learning model, the Latent Interest and Topic Mining model (LITM), to automatically mine latent user interests and item topics from user-item bipartite networks. In particular, we introduce the motivation and objectives of this bipartite-network-based approach, and detail the model development and optimization process of the proposed LITM. This work not only provides an efficient method for latent user interest and item topic mining, but also highlights a new way to improve the accuracy of service recommendation. Experimental studies are performed, and the results validate LITM's efficiency in model training and demonstrate its ability to provide better service recommendation performance on user-item bipartite networks.
Keywords—service recommendation, user-item bipartite network, latent topic and interest, interpretability, and efficiency.
I. INTRODUCTION & RELATED WORK
Service recommendation systems collect information on the preferences of their users for a set of items, to help users identify useful items in a considerably large search space (e.g., Taobao Mall, Netflix movies). With the rapid development of e-commerce and the mobile internet, new challenges have been posed to researchers trying to mine the inner patterns of users, items, and the relationships between them, such as exploiting purchasing behaviors in order to identify additional possible matches, i.e., connecting users with items they would otherwise not be able to find.
In the study of mining the relationships between users and
items, an important and special class of networks is a bipartite
network, where nodes can be divided into two disjoint sets,
such that no two nodes within the same set are connected [1],
[2], [3]. Many systems can be naturally modeled as bipartite
networks. For example, users are connected with the movies they have rated (e.g., MovieLens [4]), web users are connected with the web sites they have bookmarked, and consumers are connected with the goods they have purchased from the market.
In dealing with user-item bipartite networks, the traditional Latent Factor Model (LFM) has been shown to be effective [5], [6]. LFM factors the user-item bipartite network into two smaller matrices, consisting of latent user factor vectors and latent item factor vectors, respectively.
However, the output latent factor vectors generated by LFM may contain arbitrary values, including negative numbers, causing low interpretability [7]. As a result, it is hard for LFM to explain to its users how a specific recommendation is derived. Furthermore, none of the output latent factor vectors is a probability vector (a probability vector here is a vector with non-negative elements that sum to 1); this property makes it difficult to build more complex, realistic models on top of LFM [8]. Similar approaches to recommender systems include LSA [9] and NMF [10]. On the other hand,
LDA [11], [12] can output a probabilistic distribution of latent topics when taking a user-item bipartite network as input, and it has better interpretability than LFM. However, LDA-based models cannot model both users and items [12]. In addition, LDA-based models may not be applied directly to mine user interests [2]. For example, Haveliwala et al. [13] computed a set of scores for each item, where each score is related to a topic; however, those topics cannot be automatically mined from the user-item bipartite network.
In this paper, by combining the best of LFM and LDA, we propose a novel unsupervised learning model, named the Latent Interest and Topic Mining model (LITM), to model both users and items with better interpretability from user-item
bipartite networks. As shown in Fig. 1, LITM is based on the following: 1) for each item there is a probability vector called the latent item topic vector; 2) for each user there is a probability vector of the same length, called the latent user interest vector, in which each position corresponds to the same position of the item topic vector; and 3) the occurrence probability of each connection is determined by the corresponding user's latent interest vector and the corresponding item's latent topic vector.

2016 IEEE International Conference on Services Computing
978-1-5090-2628-9/16 $31.00 © 2016 IEEE
DOI 10.1109/SCC.2016.105

Fig. 1. Concept of latent user interests and item topics in LITM. On the left is a typical user-item bipartite network in a recommender system, where a user is connected with an item if he has purchased it. The latent user interest and item topic vectors are shown on the right.
The latent user interest vectors and latent item topic vectors generalize the intrinsic patterns underlying the data generation process into a smaller number of latent factors. Taking MovieLens as an example, each movie has a probability distribution over different topics, and each user has a probability distribution of interests over all topics.
The remainder of this paper is organized as follows. In Section II we detail the model development and optimization process of the proposed LITM. In Section III, our experimental studies show that the proposed LITM is more efficient in model training than the baseline, and that it improves the accuracy of service recommendation. Finally, we summarize the findings of this paper in Section IV.
II. THE PROPOSED MODEL LITM
In this section we provide the details of the model devel-
opment and optimization process of LITM.
A. Concepts and Notation
We explicitly describe the concepts of latent item topic vectors and latent user interest vectors as follows:
• latent item topic vector $v_m$ ($m = 1, 2, \cdots, M$): for each item $m$, we use a $K$-dimensional probability vector $v_m$ to represent its topic probability distribution over $K \in \mathbb{Z}^+$ topics;
• latent user interest vector $u_n$ ($n = 1, 2, \cdots, N$): for each user $n$, we use a probability vector $u_n$ of the same length to represent the probability distribution of the user's interest over the $K$ topics. Each position of $u_n$ corresponds to the same position in $v_m$.
Then we define the occurrence probability of the connection between user $n$ and item $m$ (denoted by $u_n - v_m$), i.e., the probability that user $n$ selects item $m$, as

$$u_n^T v_m = \sum_{k=1}^{K} u_{n,k} v_{m,k}, \qquad (1)$$

where $u_{n,k}$ and $v_{m,k}$ are the $k$-th elements of $u_n$ and $v_m$, respectively. By doing so, we let $u_{n,k} v_{m,k}$ be the occurrence probability of $u_n - v_m$ on the $k$-th topic; the sum of the occurrence probabilities over all topics (Eq. 1) then represents the occurrence probability of the connection $u_n - v_m$.
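As a concrete check of Eq. 1, the following sketch (toy dimensions of our own choosing, not from the paper) draws two random probability vectors and verifies that their inner product is itself a valid probability:

```python
import numpy as np

K = 4                                   # number of latent topics (arbitrary)
rng = np.random.default_rng(0)

def random_prob_vector(k, rng):
    """Draw a random probability vector (non-negative, sums to 1)."""
    x = rng.random(k)
    return x / x.sum()

u_n = random_prob_vector(K, rng)        # latent user interest vector
v_m = random_prob_vector(K, rng)        # latent item topic vector

# Eq. 1: the occurrence probability of the connection u_n - v_m is the
# inner product, i.e. the sum of the per-topic products u_{n,k} * v_{m,k}.
p = float(u_n @ v_m)
```

Each term $u_{n,k} v_{m,k} \le u_{n,k}$, so the sum lies in $[0, 1]$ and can indeed be read as a probability.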
B. LITM
Suppose both $u_n$ ($n = 1, 2, \cdots, N$) and $v_m$ ($m = 1, 2, \cdots, M$) are known; then the occurrences of all connections $u_n - v_m$ in the bipartite network $G$ are independent, according to the d-separation criterion in probabilistic graphical models. So we can define the objective function as the occurrence probability of the whole bipartite network $G$:

$$P(G|U,V) = \prod_{n=1}^{N} \prod_{m=1}^{M} \left( u_n^T v_m \right)^{\sigma_{m,n}}, \qquad (2)$$

where $\sigma_{m,n}$ indicates whether the connection $u_n - v_m$ exists ($\sigma_{m,n} = 1$) or not ($\sigma_{m,n} = 0$), $U = \{u_n \mid n = 1, 2, \cdots, N\}$, and $V = \{v_m \mid m = 1, 2, \cdots, M\}$.
For ease of optimization, we take the negative logarithm of the objective function and add two regularization terms, obtaining the following problem:

$$\min_{U,V} \; -\sum_{n=1}^{N} \sum_{m=1}^{M} \sigma_{m,n} \log\left( u_n^T v_m \right) + \frac{1}{N} \sum_{n=1}^{N} \|u_n\|^2 + \frac{1}{M} \sum_{m=1}^{M} \|v_m\|^2$$
$$\text{s.t.} \quad u_n \geq 0, \; \|u_n\|_1 = 1, \; n = 1, 2, \cdots, N; \quad v_m \geq 0, \; \|v_m\|_1 = 1, \; m = 1, 2, \cdots, M. \qquad (3)$$

Note that we add the latter two $\ell_2$-norm regularization terms to alleviate the potential overfitting.
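For illustration, the regularized objective in Eq. 3 can be evaluated as below. This is a minimal sketch assuming dense NumPy arrays; `litm_objective` and the small epsilon guard are our own choices, not part of the paper:

```python
import numpy as np

def litm_objective(U, V, sigma, eps=1e-12):
    """Regularized negative log-likelihood of Eq. 3.
    U: N x K user interest vectors (rows on the probability simplex);
    V: M x K item topic vectors (rows on the probability simplex);
    sigma: N x M binary connection matrix of the bipartite network."""
    N, M = sigma.shape
    P = U @ V.T                                    # P[n, m] = u_n^T v_m
    nll = -np.sum(sigma * np.log(P + eps))         # eps guards log(0)
    reg = np.sum(U ** 2) / N + np.sum(V ** 2) / M  # the two l2 regularization terms
    return nll + reg

# Toy usage: 2 users, 3 items, K = 2 topics.
U = np.array([[0.7, 0.3], [0.2, 0.8]])
V = np.array([[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]])
sigma = np.array([[1, 1, 0], [1, 0, 1]])
obj = litm_objective(U, V, sigma)
```

Only existing connections ($\sigma_{m,n} = 1$) contribute to the log-likelihood term, matching Eq. 3.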
Analyzing the above problem shows the following: 1) The objective function is non-convex (because of the $u_n^T v_m$ term). Note, however, that if we fix $U$ and treat it as constant, the objective is a convex function of $V$, and vice versa. We therefore fix $V$ and optimize $U$, then fix $U$ and optimize $V$, thereby obtaining a series of efficiently solvable subproblems. Each update of $U$ and $V$ decreases the objective function, so the process will converge and eventually find a local minimum. 2) If we fix $V$ and optimize $U$, then each $u_n$, $n = 1, 2, \cdots, N$, can be optimized individually, and vice versa. Taking the optimization of $u_n$ as an example, we get the following subproblem:
$$\min_{u_n} \; -\sum_{m=1}^{M} \sigma_{m,n} \log\left( u_n^T v_m \right) + \frac{1}{N} \|u_n\|^2 \quad \text{s.t.} \quad u_n \geq 0, \; \|u_n\|_1 = 1, \qquad (4)$$

where $v_m \geq 0$, $\|v_m\|_1 = 1$, $m = 1, 2, \cdots, M$.
Clearly, the subproblem (Eq. 4) has no closed-form solution, and the gradient descent method cannot be used directly to obtain an optimal numerical solution under the constraints $u_n \geq 0$, $\|u_n\|_1 = 1$. However, since this is a convex programming problem with linear constraints, according to the convexity-preserving rules we can apply the Reduced Gradient method to search for an optimal solution to the abovementioned subproblems.
Algorithm 1: The adaptive Reduced Gradient method
Input: σ_{m,n} ∈ {0,1}; v_m, m = 1, 2, ..., M; K ∈ Z+
Output: u
 1  Initialize u > 0 with ||u||_1 = 1, d ≠ 0;
 2  while d ≠ 0 do
 3      u_max = max{u};
 4      u_N = u \ u_max                        // u_N contains all but u_max
 5      g = -Σ_{m=1}^{M} σ_{m,n} v_m / (u^T v_m) + (2/N) u   // gradient vector
 6      g_max = g_j where u_j = u_max          // the gradient value at u_max
 7      for j = 1; j ≤ K; j++ do
 8          if u_j ∈ u_N then
 9              r_j = g_j - g_max;
10          else
11              loc = j;
12              r_j = 0;
13      for j = 1; j ≤ K; j++ do
14          if r_j > 0 then
15              d_j = -u_j r_j;
16          else
17              d_j = -r_j;
18      d_loc = -sum{d}                        // sum{d} is the sum of the elements of d
19      λ_max = ∞;
20      for j = 1; j ≤ K; j++ do
21          if d_j < 0 then
22              λ_max = min{λ_max, -u_j / d_j};
23      search for the best step length λ_best in [0, λ_max];
24      u = u + λ_best d;
25  return u;
The Reduced Gradient method uses the equality constraint to eliminate a subset of variables, thereby reducing the original problem to a bound-constrained problem in the space of the remaining variables. Algorithm 1 lists the whole process of obtaining an optimal solution for the subproblem (Eq. 4) using the adaptive Reduced Gradient method. Lines 3-4 select the largest variable as the basic variable. Lines 9-14 ensure the feasibility of the new solution. Lines 15-19 make sure the search moves in a direction that decreases the objective function. Lines 21-24 calculate the largest possible step length in the optimizing direction.
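To make Algorithm 1 concrete, here is a Python sketch of one subproblem solve. This is our own interpretation under stated assumptions: the paper does not specify the line search of line 23, so a simple grid search stands in for it, and the outer loop is capped as a convergence guard.

```python
import numpy as np

def subproblem_obj(u, V_conn, N):
    """Objective of Eq. 4 for one user; V_conn stacks the topic vectors
    (rows) of the items this user is connected to."""
    return -np.sum(np.log(V_conn @ u + 1e-12)) + np.dot(u, u) / N

def reduced_gradient_solve(u, V_conn, N, n_grid=50, max_iter=100):
    """Adaptive Reduced Gradient method, following Algorithm 1 (sketch)."""
    for _ in range(max_iter):
        # line 5: gradient of Eq. 4
        g = -(V_conn / (V_conn @ u + 1e-12)[:, None]).sum(axis=0) + (2.0 / N) * u
        loc = int(np.argmax(u))               # lines 3-4: basic variable
        r = g - g[loc]                        # lines 7-12: reduced gradient
        r[loc] = 0.0
        d = np.where(r > 0, -u * r, -r)       # lines 13-17: adaptive direction
        d[loc] = 0.0
        d[loc] = -d.sum()                     # line 18: components of d sum to 0
        if np.allclose(d, 0.0):               # line 2: stop when d = 0
            break
        neg = d < 0                           # lines 19-22: largest feasible step
        lam_max = np.min(-u[neg] / d[neg]) if neg.any() else 1.0
        # line 23: crude grid search for the best step length in [0, lam_max]
        lams = np.linspace(0.0, lam_max, n_grid)
        vals = [subproblem_obj(u + lam * d, V_conn, N) for lam in lams]
        u = u + lams[int(np.argmin(vals))] * d
        u = np.clip(u, 0.0, None)
        u = u / u.sum()                       # guard against round-off drift
    return u

# Toy usage: a user connected to 3 items over K = 4 topics.
rng = np.random.default_rng(0)
V_conn = rng.random((3, 4)); V_conn /= V_conn.sum(axis=1, keepdims=True)
u0 = np.full(4, 0.25)
u_opt = reduced_gradient_solve(u0, V_conn, N=10)
```

Because the grid includes step length 0, each iteration is non-increasing in the objective, and the simplex constraints $u \ge 0$, $\|u\|_1 = 1$ are preserved throughout.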
Based on Algorithm 1 and the previous analysis, we summarize the basic idea of optimizing the proposed LITM (Eq. 3) in Algorithm 2, where lines 3-4 fix $V$ and optimize $U$, lines 5-6 fix $U$ and optimize $V$, and line 2 repeats the two operations until convergence. This optimization process leads to a local minimum of the objective function.
Algorithm 2: Basic idea for optimizing LITM
Input: σ_{m,n} ∈ {0,1}, n = 1, 2, ..., N, m = 1, 2, ..., M; K ∈ Z+
Output: u_n, v_m, n = 1, 2, ..., N, m = 1, 2, ..., M
 1  Initialize u_n and v_m;
 2  while not converged do
 3      for n = 1; n ≤ N; n++ do
 4          u_n = argmin_{u_n} -Σ_{m=1}^{M} σ_{m,n} log(u_n^T v_m) + (1/N)||u_n||^2
            // get the solution by the Reduced Gradient method (see Alg. 1)
 5      for m = 1; m ≤ M; m++ do
 6          v_m = argmin_{v_m} -Σ_{n=1}^{N} σ_{m,n} log(u_n^T v_m) + (1/M)||v_m||^2
            // get the solution by the Reduced Gradient method (see Alg. 1)
 7  return u_n, v_m, n = 1, 2, ..., N, m = 1, 2, ..., M;
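The alternating scheme of Algorithm 2 can be sketched end to end in Python. As a hedge, the inner solver here is a simple projected-gradient step onto the probability simplex, a stand-in for Algorithm 1; the learning rate and iteration budgets are our own arbitrary choices:

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection of x onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, x.size + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

def fit_litm(sigma, K, outer=20, inner=10, lr=0.05, seed=0):
    """Alternating optimization of Eq. 3 (Algorithm 2 sketch)."""
    rng = np.random.default_rng(seed)
    N, M = sigma.shape
    U = rng.random((N, K)); U /= U.sum(axis=1, keepdims=True)
    V = rng.random((M, K)); V /= V.sum(axis=1, keepdims=True)
    eps = 1e-12
    for _ in range(outer):           # line 2: fixed budget instead of a convergence test
        for _ in range(inner):       # lines 3-4: fix V, update every u_n
            P = U @ V.T + eps
            grad_U = -(sigma / P) @ V + (2.0 / N) * U
            U = np.apply_along_axis(project_simplex, 1, U - lr * grad_U)
        for _ in range(inner):       # lines 5-6: fix U, update every v_m
            P = U @ V.T + eps
            grad_V = -(sigma / P).T @ U + (2.0 / M) * V
            V = np.apply_along_axis(project_simplex, 1, V - lr * grad_V)
    return U, V

# Toy usage on a 4-user, 5-item bipartite network with K = 2 topics.
sigma = (np.random.default_rng(1).random((4, 5)) > 0.5).astype(float)
U, V = fit_litm(sigma, K=2)
```

The projection keeps every row of $U$ and $V$ a valid probability vector after each gradient step, mirroring the constraints of Eq. 3.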
III. EXPERIMENTAL EVALUATION
In this section, we describe a series of experiments conducted on the benchmark MovieLens dataset to evaluate our proposed LITM and explore how much it can improve the effectiveness of service recommendation, as well as its efficiency in model training.
A. Experimental Dataset & Baseline Model
The MovieLens data is from GroupLens Research¹. It comprises ratings on 1,682 items (movies) by 943 users, where each user rates movies on a five-point discrete scale (1-5). To generate the user-item bipartite network, we consider the connection between an item and a user to exist if and only if the given rating is at least 3. In this way, we obtained a user-item bipartite network with 82,520 entries (an entry means an actual user-item connection) as our experimental dataset.
In our experiments, the dataset was randomly partitioned into two parts: the training set contained 90% of the total entries, and the remaining 10% (8,252 entries) constituted the test set.
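The preprocessing described above can be sketched as follows; the toy rating triples stand in for the real MovieLens file, while the threshold of 3 and the 90/10 split follow the text:

```python
import numpy as np

# Toy (user, item, rating) triples standing in for the MovieLens ratings file.
ratings = [
    (0, 0, 5), (0, 1, 2), (1, 0, 3), (1, 2, 4), (2, 1, 1), (2, 2, 5),
]
N, M = 3, 3                          # users, items

# A connection exists iff the rating is at least 3.
sigma = np.zeros((N, M), dtype=int)
for n, m, r in ratings:
    if r >= 3:
        sigma[n, m] = 1

# Each entry is an actual user-item connection; split entries 90/10.
entries = np.argwhere(sigma == 1)
rng = np.random.default_rng(42)
rng.shuffle(entries)
split = int(0.9 * len(entries))
train, test = entries[:split], entries[split:]
```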
In regards to the baseline model (we set K = 19 currently, and more work is under way), we preliminarily adopted the traditional LFM [5] described in Section I. We chose LFM because it has been extensively used in the domain of service recommendation and, similar to our proposed LITM, in LFM each user and each item has a latent vector representation.
B. Experimental Results & Discussions
LFM-based recommender systems can provide each user with an ordered queue of all of his uncollected items. Similarly, our proposed LITM can rank a user's uncollected items in order of the occurrence probability of the connections using Eq. 1.

¹ http://www.grouplens.org

Fig. 2. The predicted position of each entry in the test data, ranked in ascending order.
Fig. 3. The hitting rate as a function of the length of the recommendation list.
Fig. 4. Learning curves for LFM and LITM model training.

For a user $u_n$, if the connection $u_n - v_m$ is in the test set, we measure the position of $v_m$ in the ordered queue. For example, if there are 1000 uncollected items for $u_n$, and $v_m$ is the 40th from the top, we say the position of $v_m$ is the top 40/1000, denoted by $rank_{n,m} = 0.04$. Since the test entries are actual connections, a good model is expected to rank them highly, indicated by their good rank-order positions. Fig. 2 shows the position values of the 8,252 test entries, ordered from the top position ($rank \to 0$) to the bottom position ($rank \to 1$). Clearly, the proposed LITM performs better than LFM.

To evaluate the proposed LITM in recommender systems
from another point of view, we adopted a measure of recommendation accuracy that depends on the length of the recommendation list. The recommendation list of length $L$ for a user $u_n$ contains the $L$ highest-ranking items generated by the model. For each entry $u_n - v_m$ in the test data, if $v_m$ is in $u_n$'s recommendation list, we say the entry $u_n - v_m$ is "hit" by the model. The ratio of hit entries is called the "hitting rate". For a given $L$, the model with the higher hitting rate is better. Clearly, the hitting rate monotonically increases with $L$, with an upper bound of 1 for sufficiently large $L$. In Fig. 3, we report the hitting rate as a function of $L$ for LFM and LITM; it is easy to see that LITM performed better than LFM.

We also evaluated the efficiency of model training. Both
training processes of LFM and LITM generate a series of solutions, and we plotted the base-10 logarithm of the objective function values of these solutions in Fig. 4. It shows that the proposed LITM converged faster than the traditional LFM, which demonstrates LITM's efficiency in model training. By changing the learning rate in model training, we obtained similar results to those described above.
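The two accuracy measures above, the rank position and the hitting rate, can be sketched as follows; the function names and toy scores are our own illustration, not the paper's code:

```python
def rank_position(scores, collected, m):
    """Relative position of test item m among a user's uncollected items,
    ranked by predicted connection probability (Eq. 1), highest first.
    E.g. 40th of 1000 uncollected items gives rank = 40/1000 = 0.04."""
    uncollected = [j for j in range(len(scores)) if j not in collected]
    order = sorted(uncollected, key=lambda j: scores[j], reverse=True)
    return (order.index(m) + 1) / len(order)

def hitting_rate(test_entries, scores_of, collected_of, L):
    """Fraction of test entries u_n - v_m whose item v_m appears in
    user u_n's length-L recommendation list."""
    hits = 0
    for n, m in test_entries:
        scores = scores_of(n)
        uncollected = [j for j in range(len(scores)) if j not in collected_of(n)]
        top_L = sorted(uncollected, key=lambda j: scores[j], reverse=True)[:L]
        hits += m in top_L
    return hits / len(test_entries)

# Toy usage: one user, 4 items, item 0 already collected.
scores = [0.9, 0.1, 0.5, 0.3]
r = rank_position(scores, {0}, 2)    # item 2 ranks 1st of the 3 uncollected
hr = hitting_rate([(0, 2)], lambda n: scores, lambda n: {0}, L=1)
```

A good model drives the rank positions of test entries toward 0 and the hitting rate toward 1 for modest $L$, which is what Figs. 2 and 3 report.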
The performance evaluation experiments, using three distinct metrics, consistently demonstrated LITM's efficiency in model training and its ability to provide better service recommendation performance.
IV. CONCLUSION
This paper presented a novel unsupervised learning model, LITM, to automatically mine latent user interest and item topic distributions from user-item bipartite networks. LITM improves on LFM's low interpretability. In addition, this work not only provides an efficient method for latent user interest and item topic mining, but also highlights a new way to improve the accuracy of service recommendation.
ACKNOWLEDGMENTS
The work presented in this study is supported by NSFC (61472047) and NSFC (61571066).
REFERENCES
[1] T. Hu, H. Xiong, and S. Y. Sung, "Co-preserving patterns in bipartite partitioning for topic identification," in SIAM International Conference on Data Mining (SDM), pp. 509-514, SIAM, 2007.
[2] X. Tang, M. Zhang, and C. C. Yang, "User interest and topic detection for personalized recommendation," in Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 442-446, IEEE Computer Society, 2012.
[3] T. de Paulo Faleiros and A. de Andrade Lopes, "Bipartite graph for topic extraction," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pp. 4361-4362, AAAI Press, 2015.
[4] F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and context," ACM Transactions on Interactive Intelligent Systems, vol. 5, no. 4, p. 19, 2015.
[5] R. Bell, Y. Koren, and C. Volinsky, "Modeling relationships at multiple scales to improve accuracy of large recommender systems," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 95-104, ACM, 2007.
[6] Y. Shen and R. Jin, "Learning personal+social latent factor model for social recommendation," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1303-1311, ACM, 2012.
[7] G. Friedrich and M. Zanker, "A taxonomy for generating explanations in recommender systems," AI Magazine, vol. 32, no. 3, pp. 90-98, 2011.
[8] S. R. Eddy, "A probabilistic model of local sequence alignment that simplifies statistical significance estimation," PLOS Computational Biology, vol. 4, no. 5, p. e1000069, 2008.
[9] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50-57, ACM, 1999.
[10] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 556-562, 2001.
[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[12] J. Tang, R. Jin, and J. Zhang, "A topic modeling approach and its integration into the random walk framework for academic search," in Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pp. 1055-1060, IEEE, 2008.
[13] T. H. Haveliwala, "Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, 2003.