mining heterogeneous information...

64
APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS Yizhou Sun College of Computer and Information Science Northeastern University [email protected] July 25, 2015

Upload: others

Post on 21-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Yizhou SunCollege of Computer and Information Science

Northeastern University

[email protected]

July 25, 2015

Page 2: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Heterogeneous Information Networks

• Multiple object types and/or multiple link types

1

Venue Paper Author

DBLP Bibliographic Network The IMDB Movie Network

Actor

Movie

Director

Movie

Studio

1. Homogeneous networks are Information loss projection of heterogeneous networks!2. New problems are emerging in heterogeneous networks!

The Facebook Network

Directly Mining information richer heterogeneous networks

Page 3: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Outline

• Why Heterogeneous Information Networks?

• Entity Recommendation

• Information Diffusion

• Ideology Detection

• Summary

2

Page 4: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Recommendation Paradigm

3

recommender system recommendation

user

feedback

external knowledge

product features

community user-item feedback

Collaborative FilteringE.g., K-Nearest Neighbor (Sarwar WWW’01), Matrix Factorization (Hu ICDM’08, Koren IEEE-CS’09), Probabilistic Model (Hofmann SIGIR’03)

Content-Based MethodsE.g., (Balabanovic Comm. ACM’ 97, Zhang SIGIR’02)

Hybrid MethodsE.g., Content-Based CF (Antonopoulus, IS’06), External Knowledge CF (Ma WSDM’11)

Page 5: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Problem Definition

4

recommender system recommendation

user

feedback

information network

implicit user feedback

hybrid collaborative filteringwith information networks

Page 6: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Hybrid Collaborative Filtering with Networks

• Utilizing network relationship information canenhance the recommendation quality

• However, most of the previous studies only usesingle type of relationship between users or items(e.g., social network Ma,WSDM’11, trust relationshipEster, KDD’10, service membership Yuan, RecSys’11)

5

Page 7: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

The Heterogeneous Information Network View of Recommender System

6

Avatar TitanicAliens Revolution-ary Road

James Cameron

Kate Winslet

Leonardo Dicaprio

Zoe Saldana Adventure

Romance

Page 8: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Relationship Heterogeneity Alleviates Data Sparsity

7

# of users or items

A small numberof users and itemshave a largenumber of ratings

Most users and items havea small number of ratings

# o

f ra

tin

gs

Collaborative filtering methods suffer from data sparsity issue

• Heterogeneous relationships complement each other

• Users and items with limited feedback can be connected to the

network by different types of paths

• Connect new users or items (cold start) in the information

network

Page 9: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Relationship Heterogeneity Based Personalized Recommendation Models

8

Different users may have different behaviors or preferences

Aliens

James Cameron fan

80s Sci-fi fan

Sigourney Weaver fan

Different users may beinterested in the samemovie for different reasons

Two levels of personalizationData level• Most recommendation methods use

one model for all users and rely on

personal feedback to achieve

personalization

Model level• With different entity relationships, we

can learn personalized models for

different users to further distinguish

their differences

Page 10: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Preference Propagation-Based Latent Features

9

Alice

Bob

Kate Winslet

Naomi Watts

Titanic

revolutionary road

skyfall

King Konggenre: drama

Sam Mendes

tag: Oscar NominationCharlie

Generate L different meta-path (path types)

connecting users and items

Propagate user implicit feedback along each meta-

path

Calculate latent-features for users and items for each

meta-path with NMFrelated method

Ralph Fiennes

Page 11: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Luser-cluster similarity

Recommendation Models

10

Observation 1: Different meta-paths may have different importance

Global Recommendation Model

Personalized Recommendation Model

Observation 2: Different users may require different models

ranking score

the q-th meta-path

features for user i and item j

c total soft user clusters

(1)

(2)

Page 12: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Parameter Estimation

11

• Bayesian personalized ranking (Rendle UAI’09)

• Objective function

minΘ

sigmoid function

for each correctly ranked item pairi.e., 𝑢𝑖 gave feedback to 𝑒𝑎 but not 𝑒𝑏

Soft cluster users with NMF + k-means

For each user cluster, learn one

model with Eq. (3)

Generate personalized model for each user on the

fly with Eq. (2)

(3)

Learning Personalized Recommendation Model

Page 13: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Experiment Setup

• Datasets

• Comparison methods:

• Popularity: recommend the most popular items to users

• Co-click: conditional probabilities between items

• NMF: non-negative matrix factorization on user feedback

• Hybrid-SVM: use Rank-SVM with plain features (utilize

both user feedback and information network)

12

Page 14: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Performance Comparison

13

HeteRec personalized recommendation (HeteRec-p) provides the best recommendation results

Page 15: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Performance under Different Scenarios

14

HeteRec–p consistently outperform other methods in different scenariosbetter recommendation results if users provide more feedbackbetter recommendation for users who like less popular items

p p

user

Page 16: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Contributions

• Propose latent representations for users and itemsby propagating user preferences along differentmeta-paths

• Employ Bayesian ranking optimization technique tocorrectly evaluate recommendation models

• Further improve recommendation quality byconsidering user differences at model level anddefine personalized recommendation models

• Two levels of personalization

15

Entity Recommendation in InformationNetworks with Implicit User Feedback

(RecSys’13, WSDM’14a)

Page 17: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Outline

• Why Heterogeneous Information Networks?

• Entity Recommendation

• Information Diffusion

• Ideology Detection

• Summary

16

Page 18: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Information Diffusion in Networks

• Action of a node is triggered by the actions of their neighbors

17

Page 19: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Linear Threshold Model

• [Granovetter, 1978]

• If the weighted activation number of its neighbors is bigger

than a pre-specified threshold 𝜃𝑢, the node u is going to be

activated

• In other words

• 𝑝𝑢(𝑡 + 1) = 𝐸[1 𝑣∈Γ 𝑢 𝑤𝑣,𝑢𝛿 𝑢, 𝑡 > 𝜃𝑢 ]

18

Page 20: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Heterogeneous Bibliographic Network

• Multiple types of objects

• Multiple types of links

19

Page 21: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Derived Multi-Relational Bibliographic Network

• Collaboration: Author-Paper-Author

• Citation: Author-Paper->Paper-Author

• Sharing Co-authors: Author-Paper-Author-Paper-Author

• Co-attending venues: Author-Paper-Venue-Paper-Author

20

How to generate these meta-paths ?PathSim: Sun et.al, VLDB’11

Page 22: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

How Topics Are Propagated among Authors?

• To Apply Existing approaches

• Select one relation between authors (say,

A-P-A)

• Use all the relations, but ignore the relation

types

• Do different relation types play different roles?

• Need new models!

21

Page 23: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Two Assumptions for Topic Diffusion in Multi-Relational Networks

• Assumption 1: Relation independent diffusion

22

Model-level aggregation

Page 24: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

• Assumption 2: Relation interdependent diffusion

23

Relation-level aggregation

Page 25: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Two Models under the Two Assumptions

• Two multi-relational linear threshold models

• Model 1: MLTM-M

• Model-level aggregation

• Model 2: MLTM-R

• Relation-level aggregation

24

Page 26: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

MLTM-M

• For each relation type k

• The activation probability for object i at time t+1:

• The collective model

• The final activation probability for object i is an aggregation

over all relation types

25

Page 27: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Properties of MLTM-M

26

Page 28: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

MLTM-R

• Aggregate multi-relational network with different weights

• Treat the activation as in a single-relational network

27

To make sure the activation probability non-negative, weights 𝛽′𝑠 are required non-negative

Page 29: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Properties of MLTM-R

28

Page 30: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

How to Evaluate the Two Models?

• Test on the real action log on multiple topics!

• 𝐴𝑐𝑡𝑖𝑜𝑛 𝑙𝑜𝑔: {< 𝑢𝑖 , 𝑡𝑖 >}

• Diffusion model learning from action log

• MLE estimation over 𝛽′𝑠

29

Page 31: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Two Real Datasets

• DBLP

• Computer Science

• Relation types

• APA, AP->PA, APAPA, APVPA

• APS

• Physics

• Relation types

• APA, AP->PA, APAPA, APOPA

30

Page 32: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Topics Selected

• Select topics with increasing trends

31

Page 33: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Evaluation Methods

• Global Prediction

• How many authors are activated at t+1

• Error rate = ½(predicted#/true# + true#/predicted#)-1

• Local Prediction

• Which author is likely to be activated at t+1

• AUPR (Area under Precision-Recall Curve)

32

Page 34: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Global Prediction

33

Page 35: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Local Prediction - AUPR

• 1: Different Relation Play Different Roles in Diffusion Process

• 2: Relation-Level Aggregation is better than Model-Level Aggregation

34

Page 36: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Case Study

35

Page 37: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Prediction Results on “social network” Diffusion

36

Page 38: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

37

Page 39: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

38

WIN!

Page 40: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Outline

• Why Heterogeneous Information Networks?

• Entity Recommendation

• Information Diffusion

• Ideology Detection

• Summary

39

Page 41: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

• Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (KDD’14, Gu, Sun et al.)

40

Page 42: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Background

Federal Legislation

(bill)Law

The HouseSenate

Ronald Paul

Bill 1 Bill 2 ……

Barack Obama

Ronald Paul

liberal conservative

Politician

Republican Democrat

Barack Obama

41

United Stated Congress

The House Senate

Page 43: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Legislative Voting Network

42

Page 44: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Problem Definition

Input: Legislative Network

Output:𝒙𝑢: Ideal Points for Politician 𝑢𝒂𝑑: Ideal Points for Bill 𝑑

43

𝒙𝑢’s on different topics

Page 45: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Existing Work

• 1-dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011)

• High-dimensional ideal point model (Poole and Rosenthal, 1997)

• Issue-adjusted ideal point model (Gerrish and Blei, 2012)

44

Page 46: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Motivations

45

Topic 1Topic 2Topic 3Topic 4

• Voters have different positions on different topics.

• Traditional matrix factorization method cannot give the meanings for each dimension.

𝑀 𝑈≈ ⋅ 𝑉𝑇

𝑘𝑡ℎ latent factor

• Topics of bills can influence politician’s voting, and the voting behavior can better interpret the topics of bills as well.

Topic Model:• Health• Public Transport• …

Voting-guided Topic Model:• Health Service• Health Expenses• Public Transport• …

Page 47: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Topic-Factorized IPM

46

𝑢 𝑑

𝑤

PoliticiansBills

Terms

Heterogeneous Voting Network

𝑛(𝑑, 𝑤)𝑣𝑢𝑑

Entities:• Politicians• Bills• Terms

Links:• (𝑃, 𝐵)• (𝐵, 𝑇)

Parameters to maximize the likelihood of generating two types of links:• Ideal points for politicians• Ideal points for bills• Topic models

Page 48: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Text Part

47

PoliticiansBills

Terms

Page 49: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Text Part

• We model the probability of each word in each document as a mixture of categorical distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003)

𝑑 𝑘 𝑤𝜃𝑑𝑘 = 𝑝(𝑘|𝑑) 𝛽𝑘𝑤 = 𝑝(𝑤|𝑘)

Bill Topic Word

𝒘𝑑 = 𝑛 𝑑, 1 , 𝑛 𝑑, 2 ,… , 𝑛 𝑑,𝑁𝑤

𝑝 𝒘𝑑 𝜽,𝜷 ∝

𝑤

(

𝑘

𝜃𝑑𝑘𝛽𝑘𝑤)

𝑛(𝑑,𝑤)

𝑝 𝑾 𝜽,𝜷 ∝

𝑑

𝑤

(

𝑘

𝜃𝑑𝑘𝛽𝑘𝑤)

𝑛(𝑑,𝑤)

48

Page 50: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Voting Part

49

PoliticiansBills

Terms

Intuitions:• The more similar of

the ideal points of uand d, the higher probability of “YEA” link

• The higher portion a bill belongs to topic k, the higher weight of ideal points on topic k

Page 51: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Voting Part

YEA𝑝 𝑣𝑢𝑑 = 1 = 𝜎(

𝑘

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)

𝑥𝑢1 𝑥𝑢2 𝑥𝑢𝑘 𝑥𝑢𝐾

𝑎𝑑1 𝑎𝑑2 𝑎𝑑𝑘 𝑎𝑑𝐾

𝒙𝑢

𝒂𝑑

Topic1

Topic2

…… Topic

𝑘Topic

𝐾……

0 1 -1 1 1 1 1 1

0 0 -1 1 1 1 -1 1

1 1 1 1 -1 1 0 0

𝑢1

𝑢2

𝑢𝑁𝑈

𝑑1 𝑑2 𝑑𝑁𝐷……

……

User-Bill voting matrix 𝑽

𝑟𝑢𝑑 =

𝑘=1

𝐾

𝑥𝑢𝑘𝑎𝑑𝑘

𝑟𝑢𝑑 =

𝑘=1

𝐾

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘

𝑝 𝑣𝑢𝑑 = −1 = 1 − 𝜎(

𝑘

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)NAY

Voter 𝑢

Bill 𝑑

𝑝 𝑽 𝜽,𝑿, 𝑨, 𝒃 =

𝑢,𝑑 :𝑣𝑢𝑑≠0

(𝑝 𝑣𝑢𝑑 = 11+𝑣𝑢𝑑2 𝑝 𝑣𝑢𝑑 = −1

1−𝑣𝑢𝑑2 )

50

𝑥𝑢1

𝑥𝑢𝑘

𝑥𝑢𝐾

𝑎𝑑1

𝑎𝑑𝑘

𝑎𝑑𝐾

……

……

𝜃𝑑𝑘

𝜃𝑑1

𝜃𝑑𝐾

𝑥𝑢𝑘 ∈ 𝑹

𝑎𝑑𝑘 ∈ 𝑹

𝐼{𝑣𝑢𝑑=1} 𝐼{𝑣𝑢𝑑=−1}

Page 52: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Combining Two Parts Together

• The final objective function is a linear combination of the two average log-likelihood functions over the word links and voting links.

• We also add an 𝑙2 regularization term to 𝐴 and 𝑋 to reduce over-fitting.

51

Page 53: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Learning Algorithm

• An iterative algorithm where ideal points related parameters (𝑋, 𝐴, 𝑏) and topic model related parameters (𝜃, 𝛽) enhance each other.

• Step 1: Update 𝑋, 𝐴, 𝑏 given 𝜃, 𝛽• Gradient descent

• Step 2: Update 𝜃, 𝛽 given 𝑋, 𝐴, 𝑏• Follow the idea of expectation-maximization (EM) algorithm:

maximize a lower bound of the objective function in each iteration

52

Page 54: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Learning Algorithm

• Update 𝜃: A nonlinear constrained optimization problem.

Remove the constraints by a logistic function based transformation:

and update 𝜇𝑑𝑘 using gradient descent.

• Update 𝛽:

Since 𝛽 only appears in the topic model part, we use the same updating rule as in PLSA:

where

𝑒𝜇𝑑𝑘

1 + 𝑘′=1𝐾−1 𝑒𝜇𝑑𝑘′

1

1 + 𝑘′=1𝐾−1 𝑒𝜇𝑑𝑘′

if 1 ≤ 𝑘 ≤ 𝐾 − 1

if 𝑘 = 𝐾

𝜃𝑑𝑘 =

53

Page 55: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Data Description

• Dataset:

• U.S. House and Senate roll call data in the years between 1990

and 2013.∗

• 1,540 legislators

• 7,162 bills

• 2,780,453 votes (80% are “YEA”)

• Keep the latest version of a bill if there are multiple versions.

• Randomly select 90% of the votes as training and 10% as

testing.

∗Downloaded from http://thomas.loc.gov/home/rollcallvotes.html

54

Page 56: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Evaluation Measures

• Root mean square error (RMSE) between the predicted vote score and the ground truth

RMSE = 𝑢,𝑑 :𝑣𝑢𝑑≠0

1+𝑣𝑢𝑑2−𝑝 𝑣𝑢𝑑=1

2

𝑁𝑉

• Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy)

Accuracy = 𝑢,𝑑(𝐼 𝑝 𝑣𝑢𝑑=1 >0.5 && 𝑣𝑢𝑑=1

+𝐼 𝑝 𝑣𝑢𝑑=1 <0.5 && 𝑣𝑢𝑑=−1)

𝑁𝑉

• Average log-likelihood of the voting link

AvelogL = 𝑢,𝑑 :𝑣𝑢𝑑≠0

1+𝑣𝑢𝑑2log 𝑝 𝑣𝑢𝑑=1 +

1−𝑣𝑢𝑑2log 𝑝(𝑣𝑢𝑑=−1)

𝑁𝑉

55

Page 57: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Experimental Results

Training Data set Testing Data set

56

Page 58: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Parameter Study

Parameter study on 𝜆 Parameter study on 𝜎 (regularization coefficient)

57

𝐽 𝜽, 𝜷, 𝑿, 𝑨, 𝒃 = 1 − 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿 𝑡𝑒𝑥𝑡 + 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿(𝑣𝑜𝑡𝑖𝑛𝑔) −1

2𝜎2(

𝑢

𝒙𝒖 22+

𝑑

𝒂𝑑 22)

Page 59: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Foreign

Education

Individual Property

MilitaryFinancial Institution

Law

Health Service

Health Expenses

Funds

Public Transportation

Ronald Paul Barack Obama Joe Lieberman

Case Studies

• Ideal points for three famous politicians: (Republican, Democrat)

• Ronald Paul (R), Barack Obama (D), Joe Lieberman (D)

58

Page 60: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Case Studies

• Scatter plots over selected dimensions:(Republican, Democrat)

59

Page 61: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

𝑥𝑢𝑘

Case Studies

𝑝 𝑣𝑢𝑑 = 1 = 𝜎(

𝑘

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)

Bill: H_RES_578 — 109th Congress (2005-2006)It is about supporting the government of Romania to improve the standard health care and well-being of children in Romania.

YEAR. Paul H_RES_578

60

Topic Model

TF-IPM

Experts/Algorithm

𝜽𝑑

𝒙𝑢

𝒂𝑑

𝑝 𝑣𝑢𝑑 = 1

For Unseen Bill 𝑑:

Page 62: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Outline

• Why Heterogeneous Information Networks?

• Entity Recommendation

• Information Diffusion

• Ideology Detection

• Summary

61

Page 63: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Summary

• Heterogeneous Information Networks are networks with multiple types of objects and links

• Principles in mining heterogeneous information networks

• Meta-path-based mining

• Systematically form new types of relations

• Relation strength-aware mining

• Different types of relations have different strengths

• Relation semantic-aware mining

• Different types of relations need different modeling

62

Page 64: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015  · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio

Q & A

63