
Latent Interest and Topic Mining on User-item Bipartite Networks

Jinliang Xu, Shangguang Wang∗, Sen Su
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, China
{jlxu,sgwang,susen}@bupt.edu.cn

Sathish A.P Kumar
Department of Computer Science and Information Systems, Coastal Carolina University, South Carolina, USA
[email protected]

Wu Chou
Huawei Technologies Co., Ltd, Shenzhen, China
[email protected]

Abstract—The Latent Factor Model (LFM) is extensively used for dealing with user-item bipartite networks in service recommendation systems. To alleviate the limitations of LFM, this paper presents a novel unsupervised learning model, the Latent Interest and Topic Mining model (LITM), to automatically mine latent user interests and item topics from user-item bipartite networks. In particular, we introduce the motivation and objectives of this bipartite-network-based approach, and detail the model development and optimization process of the proposed LITM. This work not only provides an efficient method for latent user interest and item topic mining, but also highlights a new way to improve the accuracy of service recommendation. Experimental studies are performed, and the results validate LITM's efficiency in model training and demonstrate its ability to provide better service recommendation performance on user-item bipartite networks.

Keywords—service recommendation, user-item bipartite network, latent topic and interest, interpretability, and efficiency.

I. INTRODUCTION & RELATED WORK

Service recommendation systems collect information on the preferences of their users for a set of items to help users identify useful items from a considerably large search space (e.g., Taobao Mall, Netflix movies). With the rapid development of e-commerce and the mobile internet, new challenges have been posed to researchers who are trying to mine the inner patterns of users, items, and the relationships between them, such as exploiting purchasing behaviors in order to identify additional possible matches, i.e., connecting users with items they would otherwise not be able to find.

In the study of mining the relationships between users and items, an important and special class of networks is the bipartite network, where nodes can be divided into two disjoint sets such that no two nodes within the same set are connected [1], [2], [3]. Many systems can be naturally modeled as bipartite networks. For example, users are connected with the movies they have rated (e.g., MovieLens [4]), web users are connected with the web sites they have bookmarked, and consumers are connected with the goods they have purchased from the market.

In dealing with user-item bipartite networks, traditional Latent Factor Models (LFM) have been shown to be effective [5], [6]. LFM factors the user-item bipartite network into two smaller matrices, consisting of latent user factor vectors and latent item factor vectors respectively. However, the latent factor vectors produced by LFM may contain arbitrary values, including negative numbers, causing low interpretability [7]. As a result, it is hard for LFM to explain to its users how a specific recommendation is derived. Furthermore, none of the output latent factor vectors is a probability vector (a probability vector here is a vector with non-negative elements that add up to 1), which makes LFM difficult to use for building more complex, realistic models [8]. Similar approaches to recommendation systems include LSA [9] and NMF [10].

On the other hand, LDA [11], [12] can output probabilistic distributions of latent topics when taking a user-item bipartite network as input, and it has better interpretability than LFM. However, LDA-based models cannot model both users and items [12]. In addition, LDA-based models may not be directly applicable to mining user interests [2]. For example, Haveliwala et al. [13] computed a set of scores for each item, where each score is related to a topic; however, the topics there cannot be automatically mined from the user-item bipartite network.

In this paper, by combining the best of LFM and LDA, we propose a novel unsupervised learning model, named the Latent Interest and Topic Mining model (LITM), to model both users and items with better interpretability from user-item bipartite networks. As shown in Fig. 1, LITM is based on the following: 1) for each item there is a probability vector called the latent item topic vector; 2) for each user there is a vector of the same length called the latent user interest vector, in which each position of the latter vector corresponds to the same position of the former; and 3) the occurrence probability of each connection is determined by the corresponding user's latent interest vector and the corresponding item's latent topic vector. Latent user interest vectors and latent item topic vectors generalize the intrinsic patterns underlying the data generation process into a smaller number of latent factors. Taking MovieLens as an example, each movie has a probability distribution over different topics, and each user has a probability distribution of interests over all topics.

2016 IEEE International Conference on Services Computing. 978-1-5090-2628-9/16 $31.00 © 2016 IEEE. DOI 10.1109/SCC.2016.105

Fig. 1. Concept of latent user interests and item topics in LITM. On the left is a typical user-item bipartite network in a recommender system, where a user is connected with an item if he has purchased it. The latent user interest and item topic vectors are shown on the right.

The remainder of this paper is organized as follows. In Section II we detail the model development and optimization process of the proposed LITM. In Section III, our experimental studies show that the proposed LITM is more efficient in model training than the baseline and that it improves the accuracy of service recommendation. Finally, we summarize the findings of this paper in Section IV.

II. THE PROPOSED MODEL LITM

In this section we provide the details of the model development and optimization process of LITM.

A. Concepts and Notations

We explicitly describe the concepts of latent item topic vectors and latent user interest vectors as follows:

• latent item topic vector $v_m$ ($m = 1, 2, \cdots, M$): for each item $m$, we use a $K$-dimensional probability vector $v_m$ to represent its topic probability distribution over $K \in \mathbb{Z}^+$ topics;

• latent user interest vector $u_n$ ($n = 1, 2, \cdots, N$): for each user $n$, we use a probability vector $u_n$ of the same length to represent the distribution of the user's interest over the $K$ topics. Each position of $u_n$ corresponds to the same position in $v_m$.

Then we define the occurrence probability of the connection between user $n$ and item $m$ (denoted by $u_n - v_m$), i.e., the probability that user $n$ selects item $m$, as

$$
u_n^{T} v_m = \sum_{k=1}^{K} u_{n,k}\, v_{m,k}, \qquad (1)
$$

where $u_{n,k}$ and $v_{m,k}$ are the $k$-th elements of $u_n$ and $v_m$, respectively. In this way, $u_{n,k} v_{m,k}$ is the occurrence probability of $u_n - v_m$ on the $k$-th topic, so the sum of the occurrence probabilities over all topics (Eq. 1) represents the occurrence probability of the connection $u_n - v_m$.
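To make Eq. 1 concrete, here is a minimal sketch in Python (the function name and example vectors are ours, not from the paper): the occurrence probability is simply the dot product of two probability vectors.

```python
def occurrence_probability(u_n, v_m):
    """Occurrence probability of the connection u_n - v_m (Eq. 1):
    the sum over the K topics of u_{n,k} * v_{m,k}."""
    return sum(u * v for u, v in zip(u_n, v_m))

# Illustrative vectors: a user mostly interested in topic 0 and an
# item mostly about topic 0; both are probability vectors (sum to 1).
u_n = [0.7, 0.2, 0.1]   # latent user interest vector
v_m = [0.8, 0.1, 0.1]   # latent item topic vector
print(occurrence_probability(u_n, v_m))  # 0.7*0.8 + 0.2*0.1 + 0.1*0.1 ≈ 0.59
```

Because both vectors are non-negative and sum to 1, the result always lies in [0, 1], which is what allows it to be treated as a probability in what follows.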

B. LITM

Suppose both $u_n$ ($n = 1, 2, \cdots, N$) and $v_m$ ($m = 1, 2, \cdots, M$) are known. The occurrences of all connections $u_n - v_m$ in the bipartite network $G$ are then independent, according to the d-separation criterion in probabilistic graphical models. So we can define the objective function as the occurrence probability of the whole bipartite network $G$:

$$
P(G \mid U, V) = \prod_{n=1}^{N} \prod_{m=1}^{M} \left(u_n^{T} v_m\right)^{\sigma_{m,n}}, \qquad (2)
$$

where $\sigma_{m,n}$ indicates whether the connection $u_n - v_m$ exists ($\sigma_{m,n} = 1$) or not ($\sigma_{m,n} = 0$), and $U = \{u_n \mid n = 1, 2, \cdots, N\}$, $V = \{v_m \mid m = 1, 2, \cdots, M\}$.

For ease of optimization, we take the negative logarithm of the objective function and add two regularization terms, obtaining the following optimization problem:

$$
\begin{aligned}
\min_{U,V}\quad & -\sum_{n=1}^{N} \sum_{m=1}^{M} \sigma_{m,n} \log\left(u_n^{T} v_m\right) + \frac{1}{N} \sum_{n=1}^{N} \|u_n\|^{2} + \frac{1}{M} \sum_{m=1}^{M} \|v_m\|^{2} \\
\text{s.t.}\quad & u_n \ge 0,\ \|u_n\|_{1} = 1,\ n = 1, 2, \cdots, N \\
& v_m \ge 0,\ \|v_m\|_{1} = 1,\ m = 1, 2, \cdots, M.
\end{aligned}
\qquad (3)
$$

Note that the latter two $\ell_2$-norm regularization terms are added to alleviate potential overfitting.
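For illustration, the objective of Eq. 3 can be evaluated directly from $\sigma$, $U$, and $V$. The sketch below (all names are ours) computes the regularized negative log-likelihood; it is only the quantity being minimized, not the optimizer itself.

```python
import math

def litm_objective(sigma, U, V):
    """Regularized negative log-likelihood of Eq. 3.

    sigma[m][n] in {0, 1} marks existing connections; U holds the N
    latent user interest vectors, V the M latent item topic vectors
    (all probability vectors of length K).
    """
    N, M = len(U), len(V)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Negative log-likelihood term, summed over existing connections only.
    nll = -sum(sigma[m][n] * math.log(dot(U[n], V[m]))
               for n in range(N) for m in range(M) if sigma[m][n])
    # The two l2-norm regularization terms of Eq. 3.
    reg = sum(dot(u, u) for u in U) / N + sum(dot(v, v) for v in V) / M
    return nll + reg
```

With one user, one item, and one connection, e.g. `litm_objective([[1]], [[0.5, 0.5]], [[0.5, 0.5]])`, the value is $-\log 0.5 + 0.5 + 0.5 = 1 + \log 2$.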

Analyzing the above problem yields the following observations. 1) The objective function is non-convex (due to the $u_n^{T} v_m$ term). Note, however, that if we fix $U$ and treat it as constant, then the objective is a convex function of $V$, and vice versa. We therefore fix $V$ and optimize $U$, then fix $U$ and optimize $V$, obtaining a series of efficiently solvable subproblems. Since each update of $U$ and $V$ decreases the objective function, this process converges and eventually finds a local minimum. 2) If we fix $V$ and optimize $U$, then each $u_n$, $n = 1, 2, \cdots, N$, can be optimized individually, and vice versa. Taking the optimization of $u_n$ as an example, we get a subproblem of the following form:

$$
\begin{aligned}
\min_{u_n}\quad & -\sum_{m=1}^{M} \sigma_{m,n} \log\left(u_n^{T} v_m\right) + \frac{1}{N} \|u_n\|^{2} \\
\text{s.t.}\quad & u_n \ge 0,\ \|u_n\|_{1} = 1,
\end{aligned}
\qquad (4)
$$

where $v_m \ge 0$, $\|v_m\|_1 = 1$, $m = 1, 2, \cdots, M$.

Clearly the subproblem (Eq. 4) has no closed-form solution, and the gradient descent method cannot be applied directly to obtain an optimal numerical solution under the constraints $u_n \ge 0$, $\|u_n\|_1 = 1$. However, since this is a convex programming problem with linear constraints, by the convexity-preserving rules we can apply the Reduced Gradient method to search for an optimal solution to the abovementioned subproblems.

Algorithm 1: The adaptive Reduced Gradient method

Input: σ_{m,n} ∈ {0,1}; v_m, m = 1, 2, ..., M; K ∈ Z+
Output: u

1:  Initialize u > 0 with ||u||_1 = 1, and d ≠ 0;
2:  while d ≠ 0 do
3:      u_max = max{u};
4:      u_N = u \ u_max;  // u_N contains all elements but u_max
5:      g = −∑_{m=1}^{M} σ_{m,n} v_m / (u^T v_m) + (2/N) u;  // gradient vector
6:      g_max = g_j where u_j = u_max;  // the gradient value at u_max
7:      for j = 1 to K do
8:          if u_j ∈ u_N then
9:              r_j = g_j − g_max;
10:         else
11:             loc = j;
12:             r_j = 0;
13:     for j = 1 to K do
14:         if r_j > 0 then
15:             d_j = −u_j r_j;
16:         else
17:             d_j = −r_j;
18:     d_loc = −sum{d};  // sum{d} is the sum of the elements of d
19:     λ_max = ∞;
20:     for j = 1 to K do
21:         if d_j < 0 then
22:             λ_max = min{λ_max, −u_j / d_j};
23:     search for the best step length λ_best in [0, λ_max];
24:     u = u + λ_best d;
25: return u;

The Reduced Gradient method uses the equality constraint to eliminate a subset of variables, thereby reducing the original problem to a bound-constrained problem in the space of the remaining variables. Algorithm 1 lists the whole process of obtaining an optimal solution to the subproblem (Eq. 4) using the adaptive Reduced Gradient method. Lines 3-4 select the largest variable as the basic variable. Lines 9-14 ensure the feasibility of the new solution. Lines 15-19 make sure the search moves in a direction that decreases the objective function. Lines 21-24 calculate the largest possible step length in the optimizing direction.
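A loose Python sketch of one pass of the loop body of Algorithm 1 is given below, under simplifying assumptions: the line search of line 23 is replaced by simply capping the step at the largest feasible value (at most 1), and all names are ours. It illustrates how the update keeps u on the probability simplex.

```python
def reduced_gradient_step(u, items, n_users):
    """One iteration of the reduced-gradient update for u_n (Alg. 1 sketch).

    u       : current interest vector (probability vector, length K)
    items   : the item topic vectors v_m with sigma_{m,n} = 1
    n_users : N, appearing in the gradient of the (1/N)||u||^2 regularizer
    """
    K = len(u)
    # Gradient of Eq. 4: -sum_m v_m / (u^T v_m) + (2/N) u   (line 5)
    g = [2.0 / n_users * u[k] for k in range(K)]
    for v in items:
        s = sum(u[k] * v[k] for k in range(K))
        for k in range(K):
            g[k] -= v[k] / s
    # Basic variable: the largest coordinate of u (lines 3-4, 6)
    loc = max(range(K), key=lambda k: u[k])
    # Reduced gradient r and feasible descent direction d (lines 7-18)
    r = [g[k] - g[loc] if k != loc else 0.0 for k in range(K)]
    d = [(-u[k] * r[k] if r[k] > 0 else -r[k]) for k in range(K)]
    d[loc] = -sum(d[k] for k in range(K) if k != loc)
    # Largest feasible step keeping u >= 0 (lines 19-22)
    lam = min([-u[k] / d[k] for k in range(K) if d[k] < 0], default=0.0)
    lam = min(lam, 1.0)  # crude stand-in for the line search of line 23
    return [u[k] + lam * d[k] for k in range(K)]
```

Because the direction d sums to zero and the step never exceeds the largest feasible λ, the update preserves ||u||_1 = 1 and u ≥ 0.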

Based on Algorithm 1 and the preceding analysis, we summarize the basic idea of optimizing the proposed LITM (Eq. 3) in Algorithm 2, where lines 3-4 fix V and optimize U, lines 5-6 fix U and optimize V, and line 2 repeats the two operations until convergence. This optimization process leads to a local minimum of the objective function.

Algorithm 2: Basic idea for optimizing LITM

Input: σ_{m,n} ∈ {0,1}, n = 1, 2, ..., N, m = 1, 2, ..., M; K ∈ Z+
Output: u_n, v_m, n = 1, 2, ..., N, m = 1, 2, ..., M

1: Initialize u_n and v_m;
2: while not converged do
3:     for n = 1 to N do
4:         u_n = argmin_{u_n} −∑_{m=1}^{M} σ_{m,n} log(u_n^T v_m) + (1/N)||u_n||^2;  // solved by the Reduced Gradient method (see Alg. 1)
5:     for m = 1 to M do
6:         v_m = argmin_{v_m} −∑_{n=1}^{N} σ_{m,n} log(u_n^T v_m) + (1/M)||v_m||^2;  // solved by the Reduced Gradient method (see Alg. 1)
7: return u_n, v_m, n = 1, 2, ..., N, m = 1, 2, ..., M;
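The alternating structure of Algorithm 2 can be sketched as follows. This is a skeleton under our own naming; the inner solver is passed in as a function (e.g., an implementation of Algorithm 1), and a fixed iteration cap stands in for the convergence test of line 2.

```python
def optimize_litm(sigma, U, V, solve_subproblem, max_iters=50):
    """Alternating optimization of LITM (Alg. 2 sketch).

    sigma[m][n] in {0, 1}; U, V are initial probability vectors;
    solve_subproblem(vec, connected, weight) returns the updated vector,
    e.g. via the Reduced Gradient method of Alg. 1.
    """
    N, M = len(U), len(V)
    for _ in range(max_iters):
        # Fix V, optimize each u_n independently (lines 3-4)
        for n in range(N):
            connected = [V[m] for m in range(M) if sigma[m][n]]
            U[n] = solve_subproblem(U[n], connected, N)
        # Fix U, optimize each v_m independently (lines 5-6)
        for m in range(M):
            connected = [U[n] for n in range(N) if sigma[m][n]]
            V[m] = solve_subproblem(V[m], connected, M)
    return U, V
```

Each inner update touches only the vectors connected to the variable being optimized, which is what makes the subproblems independent and the alternating scheme efficient.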

III. EXPERIMENTAL EVALUATION

In this section, we describe a series of experiments conducted on the benchmark MovieLens dataset to evaluate our proposed LITM and to explore how much it can improve the effectiveness of service recommendation, as well as its efficiency in model training.

A. Experimental Dataset & Baseline Model

The MovieLens data is from GroupLens Research¹. It comprises ratings of 1,682 items (movies) by 943 users, where each user rates movies on a five-point discrete scale (1-5). To generate the user-item bipartite network, we consider the connection between an item and a user to exist if and only if the given rating is at least 3. In this way, we obtained a user-item bipartite network with 82,520 entries (an entry is an actual user-item connection) as our experimental dataset.

In our experiments, the dataset was randomly partitioned into two parts: the training set contained 90% of the total entries, and the remaining 10% (8,252 entries) constituted the test set.
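The preprocessing just described (keep a connection iff the rating is at least 3, then split 90/10) can be sketched as follows; the data layout of (user, item, rating) triples is an assumption on our part, not the paper's file format.

```python
import random

def build_bipartite(ratings, threshold=3):
    """Turn (user, item, rating) triples into user-item connections.

    A connection exists iff the rating is at least `threshold`,
    mirroring the paper's preprocessing of MovieLens."""
    return [(u, i) for (u, i, r) in ratings if r >= threshold]

def split_entries(entries, train_frac=0.9, seed=0):
    """Random 90/10 split into training and test sets."""
    entries = list(entries)
    random.Random(seed).shuffle(entries)
    cut = int(len(entries) * train_frac)
    return entries[:cut], entries[cut:]

# Toy example with four ratings; only three clear the threshold.
ratings = [(1, 10, 5), (1, 11, 2), (2, 10, 3), (2, 12, 4)]
edges = build_bipartite(ratings)   # [(1, 10), (2, 10), (2, 12)]
train, test = split_entries(edges)
```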

As the baseline model (we currently set K = 19, and more work is under way), we preliminarily adopted the traditional LFM [5] described in Section I. We chose LFM because it has been used extensively in the domain of service recommendation and because, similarly to our proposed LITM, in LFM each user and each item has a latent vector representation.

B. Experimental Results & Discussions

LFM-based recommender systems can provide each user with an ordered queue of all of his uncollected items. Similarly, our proposed LITM can rank a user's uncollected items in order of the occurrence probability of the connections using Eq. 1.

¹ http://www.grouplens.org


Fig. 2. The predicted position of each entry in the test data, ranked in ascending order.

Fig. 3. The hitting rate as a function of the length of the recommendation list.

Fig. 4. Learning curves for LFM and LITM model training.

For a user $u_n$, if the connection $u_n - v_m$ is in the test set, we measure the position of $v_m$ in the ordered queue. For example, if there are 1000 uncollected items for $u_n$ and $v_m$ is the 40th from the top, we say the position of $v_m$ is the top 40/1000, denoted by $rank_{n,m} = 0.04$. Since the test entries are actual connections, a good model is expected to recommend them highly, as indicated by good rank positions. Fig. 2 shows the position values of the 8,252 test entries, ordered from the top position ($rank \to 0$) to the bottom position ($rank \to 1$). Clearly, the proposed LITM performs better than LFM.

To evaluate the proposed LITM in recommender systems from another point of view, we adopted a measure of recommendation accuracy that depends on the length of the recommendation list. The recommendation list for a user $u_n$ with length $L$ contains the $L$ highest-ranking items generated by the model. For each entry $u_n - v_m$ in the test data, if $v_m$ is in $u_n$'s recommendation list, we say the entry $u_n - v_m$ is "hit" by the model. The ratio of hit entries is called the "hitting rate". For a given $L$, the model with the higher hitting rate is better. Clearly, the hitting rate increases monotonically with $L$, with an upper bound of 1 for sufficiently large $L$. In Fig. 3, we report the hitting rate as a function of $L$ for LFM and LITM; it is easy to see that LITM performed better than LFM.

We also evaluated the efficiency of model training. The training processes of both LFM and LITM generate a series of solutions, and we plotted the base-10 logarithm of the objective function values of these solutions in Fig. 4. It shows that the proposed LITM converged faster than the traditional LFM, which demonstrates LITM's efficiency in model training. Changing the learning rate in model training produced similar results.

Although three distinct metrics were used in the performance evaluation experiments, they consistently demonstrated LITM's efficiency in model training and its ability to deliver better service recommendation performance.
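The two accuracy metrics described above can be sketched as follows (function names and data layout are ours): `rank_position` reproduces the Fig. 2 metric (e.g., 40th of 1000 gives 0.04), and `hitting_rate` the Fig. 3 metric, given a precomputed top-L list per user.

```python
def rank_position(scores, target_item):
    """Relative rank of a test item among a user's uncollected items.

    scores maps each uncollected item to its predicted connection
    probability (Eq. 1); rank -> 0 means the model recommends the
    test item highly.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return (ordered.index(target_item) + 1) / len(ordered)

def hitting_rate(test_entries, top_l):
    """Fraction of test entries (n, m) whose item m appears in
    user n's length-L recommendation list top_l[n]."""
    hits = sum(1 for (n, m) in test_entries if m in top_l[n])
    return hits / len(test_entries)

scores = {'a': 0.9, 'b': 0.5, 'c': 0.1}   # toy predictions for one user
print(rank_position(scores, 'b'))          # 2nd of 3 items -> 2/3
```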

IV. CONCLUSION

This paper presented a novel unsupervised learning model, LITM, to automatically mine latent user interest and item topic distributions from user-item bipartite networks. LITM improves on LFM's low interpretability. In addition, this work not only provides an efficient method for latent user interest and item topic mining, but also highlights a new way to improve the accuracy of service recommendation.

ACKNOWLEDGMENTS

The work presented in this study is supported by NSFC (61472047) and NSFC (61571066).

REFERENCES

[1] T. Hu, H. Xiong, and S. Y. Sung, "Co-preserving patterns in bipartite partitioning for topic identification," in SIAM International Conference on Data Mining (SDM), pp. 509–514, SIAM, 2007.

[2] X. Tang, M. Zhang, and C. C. Yang, "User interest and topic detection for personalized recommendation," in Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 442–446, IEEE Computer Society, 2012.

[3] T. de Paulo Faleiros and A. de Andrade Lopes, "Bipartite graph for topic extraction," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pp. 4361–4362, AAAI Press, 2015.

[4] F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and context," ACM Transactions on Interactive Intelligent Systems, vol. 5, no. 4, p. 19, 2015.

[5] R. Bell, Y. Koren, and C. Volinsky, "Modeling relationships at multiple scales to improve accuracy of large recommender systems," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 95–104, ACM, 2007.

[6] Y. Shen and R. Jin, "Learning personal + social latent factor model for social recommendation," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1303–1311, ACM, 2012.

[7] G. Friedrich and M. Zanker, "A taxonomy for generating explanations in recommender systems," AI Magazine, vol. 32, no. 3, pp. 90–98, 2011.

[8] S. R. Eddy, "A probabilistic model of local sequence alignment that simplifies statistical significance estimation," PLOS Computational Biology, vol. 4, no. 5, p. e1000069, 2008.

[9] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57, ACM, 1999.

[10] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 556–562, 2001.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] J. Tang, R. Jin, and J. Zhang, "A topic modeling approach and its integration into the random walk framework for academic search," in Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pp. 1055–1060, IEEE, 2008.

[13] T. H. Haveliwala, "Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784–796, 2003.
