SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS

Chen LIN *, Jiang-Ming YANG +, Rui CAI +, Xin-jing WANG +, Wei WANG *, Lei ZHANG +

* Fudan University
+ Microsoft Research Asia



OUTLINE

Motivation
Challenges
Model
Application
  Reply reconstruction
  Junk post detection
  Expert finding
Experiments
Conclusion

THREADED DISCUSSIONS

Mailing lists
Chat rooms
IMs
Web forums

(Figure labels: root, reply)

IMPORTANT DATA SOURCES

MINING SEMANTICS & STRUCTURE

Junk Identification
Expert Search
Measure post quality

CHALLENGE

Semantics & Structure

SEMANTICS & STRUCTURE

Semantics: topics
Structure: who replies to whom

CHALLENGE

Junk Post

JUNK POST

CHALLENGE

Post Quality

POST QUALITY

(Figure label: valuable post)

MODEL

Purpose: simultaneously model
  Semantics
  Structure

Methodology
  Intuitive
  Matrix based
  Sparse coding

(Figure labels: root, reply)

INTUITION

A THREAD HAS SEVERAL TOPICS

SEMANTIC REPRESENTATION OF THREAD

D: word-by-post matrix (rows word1 … wordV, columns post1 … postL)
X: word-by-topic matrix (rows word1 … wordV, columns topic1 … topicT)
Θ: topic-by-post matrix (rows topic1 … topicT, columns post1 … postL)

Minimize: the error of reconstructing D from the product XΘ

Project posts to topic space
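The slide's objective is not reproduced in this transcript. A minimal sketch of the idea, assuming a squared Frobenius-norm loss (an assumption, as are the helper names `matmul` and `reconstruction_error`), measures how well the topic matrix X and the coefficients Θ rebuild the word-by-post matrix D:

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def reconstruction_error(D, X, Theta):
    """Squared Frobenius norm of D - X*Theta (the loss form is an assumption)."""
    approx = matmul(X, Theta)
    return sum(
        (d - a) ** 2
        for d_row, a_row in zip(D, approx)
        for d, a in zip(d_row, a_row)
    )


# Toy sizes: V = 3 words, T = 2 topics, L = 2 posts.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # word-by-topic
Theta = [[2.0, 0.0], [0.0, 3.0]]           # topic-by-post
D = matmul(X, Theta)                       # posts built exactly from the topics
D_noisy = [[3.0, 0.0], [0.0, 3.0], [2.0, 3.0]]  # one word count off by one

exact_error = reconstruction_error(D, X, Theta)
noisy_error = reconstruction_error(D_noisy, X, Theta)
```

Each column of Θ is the post's position in topic space; a smaller error means the topics explain the posts' words better.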

A POST IS RELATED TO PREVIOUS POSTS

Θ: topic-by-post matrix (rows topic1 … topicT, columns post1 … postL)

Minimize: the error of approximating each post (a column of Θ) from the posts before it
b: coefficients that approximate each post as a linear combination of previous posts
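As an illustration of the structure term, the sketch below computes how far a post's topic vector is from a weighted sum of the earlier posts' topic vectors. The squared-error form and the function name `structure_residual` are assumptions; only the idea of the coefficients b comes from the slide.

```python
def structure_residual(theta, i, b):
    """Squared error of approximating post i by a weighted sum of earlier posts.

    theta: list of topic vectors, one per post, in posting order.
    b: one weight per earlier post (posts 0 .. i-1).
    """
    if len(b) != i:
        raise ValueError("b needs exactly one weight per earlier post")
    num_topics = len(theta[i])
    approx = [sum(b[k] * theta[k][t] for k in range(i)) for t in range(num_topics)]
    return sum((theta[i][t] - approx[t]) ** 2 for t in range(num_topics))


# Post 2 mixes the topics of posts 0 and 1 equally.
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
residual_mixed = structure_residual(theta, 2, [0.5, 0.5])      # both parents
residual_root_only = structure_residual(theta, 2, [1.0, 0.0])  # root only
```

Large weights in b point at the earlier posts a reply most likely responds to.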

A POST IS RELATED TO A FEW TOPICS

(Figure labels: government, cobol)

SPARSE SEMANTICS OF POST

D: word-by-post matrix (rows word1 … wordV, columns post1 … postL)
X: word-by-topic matrix (rows word1 … wordV, columns topic1 … topicT)
Θ: topic-by-post matrix (rows topic1 … topicT, columns post1 … postL)

Minimize: the error of reconstructing D from the product XΘ, with each post's column of Θ kept sparse

A POST IS RELATED TO A FEW POSTS

Θ: topic-by-post matrix (rows topic1 … topicT, columns post1 … postL)

Minimize: the error of approximating each post (a column of Θ) from the posts before it
b (sparse): coefficients that approximate each post as a linear combination of a few previous posts
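One common way to obtain sparse coefficients is an L1 penalty solved by iterative soft-thresholding (ISTA). The sketch below is that generic technique, not necessarily the optimizer used by the authors; `sparse_reply_weights`, `lam`, and `iters` are hypothetical names and settings.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    return max(x - t, 0.0) - max(-x - t, 0.0)


def sparse_reply_weights(prev, target, lam=0.1, iters=500):
    """Minimize 0.5 * ||target - sum_k b[k] * prev[k]||^2 + lam * ||b||_1 with ISTA.

    prev: topic vectors of the earlier posts; target: topic vector of the current post.
    """
    n = len(prev)
    # 1 / (squared Frobenius norm) is a safe step size for the gradient step.
    step = 1.0 / (sum(v * v for p in prev for v in p) or 1.0)
    b = [0.0] * n
    for _ in range(iters):
        approx = [sum(b[k] * prev[k][t] for k in range(n)) for t in range(len(target))]
        resid = [a - y for a, y in zip(approx, target)]
        grad = [sum(r * prev[k][t] for t, r in enumerate(resid)) for k in range(n)]
        b = [soft_threshold(b[k] - step * grad[k], step * lam) for k in range(n)]
    return b


# The current post shares its topic with post 1 only.
weights = sparse_reply_weights([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.9])
```

The weight for the unrelated post stays exactly zero, which is the behaviour the "a few posts" intuition asks for.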

OPTIMIZE THEM TOGETHER

Model semantics
Model structure
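The joint objective is not reproduced in this transcript. One way to write it, assuming squared-error losses and L1 sparsity penalties (the weights λ1, λ2, λ3 and the per-post coefficient vectors b^(i) are notation introduced here for illustration), is:

```latex
\min_{X,\,\Theta,\,b}\;
\underbrace{\lVert D - X\Theta \rVert_F^2}_{\text{model semantics}}
\;+\; \lambda_1 \underbrace{\sum_{i=2}^{L} \Bigl\lVert \theta_i - \sum_{k<i} b^{(i)}_k\,\theta_k \Bigr\rVert_2^2}_{\text{model structure}}
\;+\; \lambda_2 \sum_{i=1}^{L} \lVert \theta_i \rVert_1
\;+\; \lambda_3 \sum_{i=2}^{L} \lVert b^{(i)} \rVert_1
```

Here θ_i is the i-th column of Θ (post i in topic space). The last two terms encode "a post is related to a few topics" and "a post is related to a few posts".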

APPLICATIONS

Reply reconstruction: capability of recognizing structure
Junk identification: capability of capturing semantics
Expert finding: capability of measuring post quality

REPLY RECONSTRUCTION

Document similarity
Topic similarity
Structure similarity
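How the three similarities are combined is not shown on the slide. A minimal sketch, assuming cosine similarity per view and a weighted sum with hypothetical mixing weights, picks as parent the earlier post with the highest combined score:

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_parent(views, i, weights):
    """Return the index of the earlier post most similar to post i (requires i >= 1).

    views: dict mapping a view name to a list of per-post vectors.
    weights: dict mapping the same names to mixing weights (assumed values).
    """
    def score(k):
        return sum(w * cosine(views[name][i], views[name][k]) for name, w in weights.items())

    return max(range(i), key=score)


views = {
    "document": [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],  # word vectors
    "topic": [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],                    # columns of Theta
}
parent_of_post_2 = predict_parent(views, 2, {"document": 0.5, "topic": 0.5})
```

A structure view (for example the coefficient vectors b) could be added as a third entry in `views` with its own weight.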

DATA SET

                   Slashdot   Apple discussion
No. threads        1154       4488
No. posts          203210     80008
Avg. thread len.   176.09     17.84
Avg. words/post    73.53      78.36
Avg. posts/user    15.32      4.69

BASELINES

NP: Reply to Nearest Post
RR: Reply to Root
DS: Document Similarity
LDA: Latent Dirichlet Allocation; projects documents to topic space
SWB: Special Words Topic Model with Background distribution; projects documents to topic and junk topic space
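The two structural baselines need no content at all. The sketch below implements them and scores predictions as the fraction of replies whose predicted parent is correct; that scoring rule and the function names are assumptions, since the slides do not name the metric.

```python
def reply_to_nearest_post(num_posts):
    """NP: every reply is assumed to answer the post immediately before it."""
    return {i: i - 1 for i in range(1, num_posts)}


def reply_to_root(num_posts):
    """RR: every reply is assumed to answer the root post (index 0)."""
    return {i: 0 for i in range(1, num_posts)}


def accuracy(predicted, actual):
    """Fraction of replies whose predicted parent matches the true parent."""
    correct = sum(1 for post, parent in actual.items() if predicted.get(post) == parent)
    return correct / len(actual)


# True parents in a toy 5-post thread (post 0 is the root).
actual = {1: 0, 2: 1, 3: 2, 4: 0}
np_accuracy = accuracy(reply_to_nearest_post(5), actual)
rr_accuracy = accuracy(reply_to_root(5), actual)
```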

EVALUATION

         Slashdot                 Apple
Method   All Posts   Good Posts   All Posts   Good Posts
NP       0.021       0.012        0.289       0.239
RR       0.183       0.319        0.269       0.474
DS       0.463       0.643        0.409       0.628
LDA      0.465       0.644        0.410       0.648
SWB      0.463       0.644        0.410       0.641
SMSS     0.524       0.737        0.517       0.772

JUNK IDENTIFICATION

D = word-by-post matrix (rows word1 … wordV, columns post1 … postL)
X = word-by-topic matrix (rows word1 … wordV, columns topic1 … topicT, topicbg)
Θ = topic-by-post matrix (rows topic1 … topicT, topicbg, columns post1 … postL)

Probability of junk
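The slide does not show how the probability of junk is derived from Θ. One plausible reading, assumed here purely for illustration, scores each post by the share of its coefficient mass that falls on the background topic (topicbg); `junk_scores` is a hypothetical helper.

```python
def junk_scores(theta_topics, theta_bg):
    """Share of each post's coefficient mass assigned to the background topic.

    theta_topics: T rows (one per regular topic), each with L per-post coefficients.
    theta_bg: L coefficients for the background topic.
    This normalisation is an assumption, not the formula from the slides.
    """
    scores = []
    for j, bg in enumerate(theta_bg):
        total = abs(bg) + sum(abs(row[j]) for row in theta_topics)
        scores.append(abs(bg) / total if total else 0.0)
    return scores


# Post 0 is on-topic; post 1 is explained mostly by the background topic.
scores = junk_scores([[0.9, 0.0], [0.1, 0.1]], [0.0, 0.9])
```

Posts whose score exceeds a chosen threshold would then be flagged as junk.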

DATA SET

Slashdot
Apple discussion

BASELINES

DF
SVM: classifies posts as junk posts and non-junk posts
SWB: Special Words Topic Model with Background distribution; projects documents to topic and junk topic space

EVALUATION

Method   Precision   Recall   F-measure
SWB      0.48        0.22     0.30
SVM      0.37        0.24     0.20
DF       0.34        0.40     0.36
SMSS     0.38        0.45     0.41
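For reference, the sketch below computes the balanced F-measure (harmonic mean of precision and recall), assuming that is the variant reported in the table. Values recomputed from the rounded precision and recall can differ slightly from the reported ones.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute two rows of the table from their rounded precision and recall.
smss_f = round(f_measure(0.38, 0.45), 2)
swb_f = round(f_measure(0.48, 0.22), 2)
```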

EXPERT FINDING

Methods
  HITS
  PageRank
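As a sketch of how HITS could rank users on a reply graph, the code below runs the standard hub/authority power iteration. It assumes an edge (u, v) means user u replied to user v, and `hits` is a hand-written helper, not a library call; how the reply graph is built and weighted in the original work is not shown on the slide.

```python
import math


def hits(edges, iters=50):
    """Standard HITS iteration on a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of the nodes pointing at you.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of authority scores of the nodes you point at.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Toy reply graph: bob and carol reply to alice, carol also replies to bob.
hub, auth = hits([("bob", "alice"), ("carol", "alice"), ("carol", "bob")])
top_expert = max(auth, key=auth.get)
```

Under this edge direction, users with high authority scores are the ones many active repliers respond to, which is one way to read "expert".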

BASELINES

LM: Formal Models for Expert Finding in Enterprise Corpora, SIGIR '06. Achieves stable performance in the expert finding task using a language model.
PageRank: benchmark nodal ranking method
HITS: finds hub nodes and authority nodes
EABIF: Personalized Recommendation Driven by Information Flow, SIGIR '06. Finds the most influential node.

EVALUATION

Bayesian estimate

Method           MRR     MAP     P@10
LM               0.821   0.698   0.800
EABIF(ori.)      0.674   0.362   0.243
EABIF(rec.)      0.742   0.318   0.281
PageRank(ori.)   0.675   0.377   0.263
PageRank(rec.)   0.743   0.321   0.266
HITS(ori.)       0.906   0.832   0.900
HITS(rec.)       0.938   0.822   0.906
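The three ranking metrics in the table have standard definitions, sketched below for a single query. MRR and MAP are the means of the first two quantities over all queries. The user IDs are made up for the example.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


ranked_users = ["u3", "u1", "u2"]   # system output, best first
true_experts = {"u1", "u2"}         # ground-truth experts for this query
rr = reciprocal_rank(ranked_users, true_experts)
ap = average_precision(ranked_users, true_experts)
p_at_2 = precision_at_k(ranked_users, true_experts, k=2)
```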

DISCUSSION

Parameters vs. Model Complexity
  Linear regression
  SMSS model

Although the number of parameters increases, the prior knowledge shrinks the projection space.

(Figure labels: Prior knowledge)

CONCLUSION

Purpose
  Mine the semantics
  Mine the structure

Highlight
  Simultaneously model the semantics and the structure

Applications are designed to evaluate the model
  Reply reconstruction
  Junk identification
  Expert finding