Yuan Yao, joint work with Hanghang Tong, Feng Xu, and Jian Lu. Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint. Aug 24-27, KDD 2014

Page 1

Yuan Yao

Joint work with

Hanghang Tong, Feng Xu, and Jian Lu

Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint

Aug 24-27, KDD 2014

Page 2

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 3

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 4

CQA

Page 5

Long-Term Impact

What is the Long-Term Impact of a Q/A post?

Q: How many users will find it beneficial?

Page 6

Challenges

Q: Why not off-the-shelf data mining algorithms?

Challenge 1: Multi-aspect
C1.1. Coupling between questions and answers
C1.2. Feature non-linearity
C1.3. Posts dynamically arrive

Challenge 2: Efficiency

Page 7

C1.1 Coupling

Strong positive correlation! [Yao+@ASONAM’14]

[Diagram: question features Fq give the predicted question impact Yq (1); answer features Fa give the predicted answer impact Ya (2); voting consistency links the two predictions (3).]

[Yao+@ASONAM’14] Y. Yao, H. Tong, T. Xie, L. Akoglu, F. Xu, J. Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.

Page 8

C1.1 Coupling

LIP-M [Yao+@ASONAM’14]:

[Equation: the LIP-M objective, whose terms are labeled (1) question prediction, (2) answer prediction, and (3) voting consistency, followed by a regularization term.]
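
Going by the four term labels on this slide, a joint objective of this kind could take the following form. This is a hedged sketch, not the authors' exact formulation: the coefficient vectors \alpha_q and \alpha_a, the observed impacts y_q and y_a, the question-to-answer mapping matrix M, and the weights \theta and \lambda are all assumed symbols; only F_q and F_a appear on the slides.

```latex
\min_{\alpha_q,\,\alpha_a}\;
\underbrace{\lVert F_q \alpha_q - y_q \rVert_2^2}_{(1)\ \text{question prediction}}
+ \underbrace{\lVert F_a \alpha_a - y_a \rVert_2^2}_{(2)\ \text{answer prediction}}
+ \theta \underbrace{\lVert M F_q \alpha_q - F_a \alpha_a \rVert_2^2}_{(3)\ \text{voting consistency}}
+ \lambda \underbrace{\left( \lVert \alpha_q \rVert_2^2 + \lVert \alpha_a \rVert_2^2 \right)}_{\text{regularization}}
```

Under this reading, term (3) penalizes disagreement between the predicted impact of each answer and that of the question it belongs to, which is how the coupling from C1.1 would enter the model.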

Page 9

C1.2 Non-linearity

The kernel trick (e.g., SVM)
Mercer kernel
Kernel matrix as new feature matrix
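
To make "kernel matrix as new feature matrix" concrete, here is a minimal sketch in plain Python. It assumes a Gaussian (RBF) kernel and ordinary ridge regression rather than the authors' model; the function names, `gamma`, `lam`, and the toy data are illustrative assumptions.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel, a standard Mercer kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_matrix(X, Z, gamma=1.0):
    """K[i][j] = k(X[i], Z[j])."""
    return [[rbf_kernel(x, z, gamma) for z in Z] for x in X]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit(X, y, lam=1e-3, gamma=1.0):
    """Ridge regression with K used in place of the feature matrix:
    solve (K + lam * I) alpha = y."""
    K = kernel_matrix(X, X, gamma)
    for i in range(len(K)):
        K[i][i] += lam
    return solve(K, y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Prediction is a kernel-weighted sum over the training examples."""
    K_new = kernel_matrix(X_new, X_train, gamma)
    return [sum(k * a for k, a in zip(row, alpha)) for row in K_new]

# Toy non-linear target (assumed for illustration): y = x^2
X_train = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y_train = [x[0] ** 2 for x in X_train]
alpha = fit(X_train, y_train)
fitted = predict(X_train, alpha, X_train)
```

A linear model on the raw feature `x` cannot fit `x^2`, but the same linear solve on the kernel matrix can. The cost is a dense n x n system, which is where the cubic complexity in the number of examples comes from.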

Page 10

C1.3 Dynamics

Solution: recursive least squares regression [Haykin2005]

[Diagram: the current model (Model_t), fit on the existing examples (x_1, y_1), (x_2, y_2), ..., (x_t, y_t), is combined with the new example (x_{t+1}, y_{t+1}) to give the new model (Model_{t+1}).]

[Haykin2005] S. Haykin. Adaptive Filter Theory. 2005.
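
The update from Model_t to Model_{t+1} can be illustrated with textbook recursive least squares for a linear model. This is a minimal sketch under assumed names (`rls_init`, `rls_update`, `lam`) and toy data, not the authors' algorithm: each new example updates the weights and the inverse matrix `P` directly, without refitting on the existing examples.

```python
def rls_init(dim, lam=1e-3):
    """Start from w = 0 and P = (lam * I)^-1, i.e. a ridge prior."""
    w = [0.0] * dim
    P = [[(1.0 / lam if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    return w, P

def rls_update(w, P, x, y):
    """Fold one new example (x, y) into the model in O(dim^2)."""
    dim = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(dim)) for i in range(dim)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(dim))
    gain = [v / denom for v in Px]
    # Prediction error of the current model on the new example.
    err = y - sum(w[i] * x[i] for i in range(dim))
    w_new = [w[i] + gain[i] * err for i in range(dim)]
    # Rank-one correction of the inverse (P is symmetric, so x^T P = (P x)^T).
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(dim)] for i in range(dim)]
    return w_new, P_new

# Toy stream generated from y = 2*x1 - 3*x2 (assumed for illustration).
stream = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -3.0),
    ([1.0, 1.0], -1.0),
    ([2.0, 1.0], 1.0),
    ([1.0, 2.0], -4.0),
]
w, P = rls_init(dim=2)
for x, y in stream:
    w, P = rls_update(w, P, x, y)
```

After the five updates, `w` is close to the generating coefficients, matching what a batch ridge fit on all five examples would give.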

Page 11

This Paper

Q1: How to comprehensively capture the multiple aspects (coupling, non-linearity, and dynamics) in one algorithm?

Q2: How to make the long-term impact prediction algorithm efficient?

Page 12

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 13

Modeling Non-linearity

Basic idea: kernelize LIP-M

Details - LIP-KM: [Equation: the kernelized objective, whose terms are labeled (1) question prediction, (2) answer prediction, and (3) voting consistency, followed by a regularization term.]

Closed-form solution: [Equation]

Complexity: O(n^3)

Page 14

Modeling Dynamics

Basic idea: recursively update LIP-KM

Details - LIP-KIM: [Equation: recursive update, derived with the matrix inverse lemma]

Complexity: O(n^3) -> O(n^2)
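
The drop from cubic to quadratic cost can be illustrated with the block form of the matrix inverse lemma. The sketch below is not the LIP-KIM update itself: it assumes a symmetric, regularized kernel matrix that gains one row and one column per new example, and the names `grow_inverse`, `rbf`, `lam`, and the toy points are illustrative.

```python
import math

def rbf(x, z, gamma=0.5):
    """Gaussian kernel on scalars (assumed kernel for this sketch)."""
    return math.exp(-gamma * (x - z) ** 2)

def grow_inverse(A_inv, b, c):
    """Given A_inv = A^-1 for a symmetric n x n matrix A, return the inverse
    of the bordered matrix [[A, b], [b^T, c]] in O(n^2) operations."""
    n = len(A_inv)
    u = [sum(A_inv[i][j] * b[j] for j in range(n)) for i in range(n)]  # A^-1 b
    s = c - sum(b[i] * u[i] for i in range(n))  # Schur complement (scalar)
    new = [
        [A_inv[i][j] + u[i] * u[j] / s for j in range(n)] + [-u[i] / s]
        for i in range(n)
    ]
    new.append([-u[j] / s for j in range(n)] + [1.0 / s])
    return new

points = [0.0, 1.0, 2.5, 4.0]
lam = 0.1
# Regularized kernel matrix K + lam * I.
K = [
    [rbf(p, q) + (lam if i == j else 0.0) for j, q in enumerate(points)]
    for i, p in enumerate(points)
]

# Build the inverse one example at a time, never inverting from scratch.
K_inv = [[1.0 / K[0][0]]]
for t in range(1, len(points)):
    K_inv = grow_inverse(K_inv, K[t][:t], K[t][t])

# Check: K times K_inv should be the identity.
n = len(points)
product = [[sum(K[i][k] * K_inv[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))
```

Each call touches every entry of the stored inverse once, so the per-example cost is quadratic in the number of examples seen so far, compared with cubic for a fresh inversion.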

Page 15

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 16

Approximation Method (1)

Basic idea: compress the kernel matrix

Details - LIP-KIMA:
1) Separate decomposition (Nyström approximation)
2) Make decomposition reusable (eigen-decomposition)
3) Apply decomposition on LIP-KIM (SVD on X1, eigen-decomposition)

Complexity: O(n^2) -> O(n)
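
As an illustration of compressing a kernel matrix, the sketch below shows the basic Nyström approximation K ≈ C W^-1 C^T built from a few landmark examples. It covers only that first ingredient, not the reusable decomposition or its application to LIP-KIM; the linear kernel, the landmark choice, and all function names are assumptions made for the example.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (fine for a small m x m block)."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def nystrom_factors(X, landmarks, kernel):
    """Return C (n x m) and W^-1 (m x m); the n x n kernel matrix is never formed."""
    C = [[kernel(x, X[j]) for j in landmarks] for x in X]
    W = [[kernel(X[i], X[j]) for j in landmarks] for i in landmarks]
    return C, inverse(W)

def approx_entry(C, W_inv, i, j):
    """Entry (i, j) of C W^-1 C^T."""
    m = len(W_inv)
    return sum(C[i][a] * W_inv[a][b] * C[j][b] for a in range(m) for b in range(m))

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [3.0, 2.0]]
C, W_inv = nystrom_factors(X, landmarks=[2, 3], kernel=linear_kernel)
max_err = max(
    abs(approx_entry(C, W_inv, i, j) - linear_kernel(X[i], X[j]))
    for i in range(len(X))
    for j in range(len(X))
)
```

The reconstruction is exact here only because a linear kernel on 2-D data has rank 2 and the two landmarks are linearly independent. For a general kernel the result is an approximation whose quality depends on how many landmarks are used. Storage drops from n x n to n x m plus m x m, which is linear in n for a fixed number of landmarks m.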

Page 17

Approximation Method (2)

Basic idea: filter less informative examples

Details - LIP-KIMAA:

[Diagram: new examples (x_{t+1}, y_{t+1}) pass through a filtering step before being combined with the current model (Model_t, fit on the existing examples (x_1, y_1), (x_2, y_2), ..., (x_t, y_t)) to give the new model (Model_{t+1}).]

Complexity: O(n) -> less than O(n) (sub-linear)
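
The slide does not say how "less informative" is measured. One common choice, assumed here purely for illustration, is to skip new examples that the current model already predicts well. The function name, the tolerance `tol`, and the toy model are all hypothetical.

```python
def select_informative(predict, examples, tol):
    """Keep only the new examples whose absolute prediction error under the
    current model exceeds tol (an assumed informativeness criterion)."""
    return [(x, y) for x, y in examples if abs(predict(x) - y) > tol]

# Hypothetical current model: y is roughly 2 * x.
def current_model(x):
    return 2.0 * x

new_examples = [(1.0, 2.05), (2.0, 4.0), (3.0, 9.0), (4.0, 7.9), (5.0, 3.0)]
kept = select_informative(current_model, new_examples, tol=0.5)
```

Only the kept examples would trigger a model update, so the update cost scales with the number of surprising examples rather than with every arriving post.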

Page 19

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 20

Experiment Setup

Datasets (http://blog.stackoverflow.com/category/cc-wiki-dump/): Stack Overflow, Mathematics Stack Exchange

Features: content (bag-of-words) and contextual features

[Diagram: posts ordered along a time axis and divided into a training set, made up of an initial set and an incremental set, and a test set.]
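
A time-ordered split of this kind can be sketched as follows. The sketch assumes the training data precedes the test data in time and that each post carries a creation timestamp; the field name `created`, the set sizes, and the function name are hypothetical and are not taken from the slides.

```python
def split_by_time(posts, n_initial, n_incremental):
    """Sort posts by creation time, then cut them into an initial set,
    an incremental set (together the training set), and a test set."""
    ordered = sorted(posts, key=lambda p: p["created"])
    initial = ordered[:n_initial]
    incremental = ordered[n_initial:n_initial + n_incremental]
    test = ordered[n_initial + n_incremental:]
    return initial, incremental, test

# Toy posts with made-up timestamps.
posts = [{"id": i, "created": t} for i, t in enumerate([5, 1, 9, 3, 7, 2, 8])]
initial, incremental, test = split_by_time(posts, n_initial=3, n_incremental=2)
```

The initial set would be used to fit the first model in batch mode, the incremental set would be fed to the recursive update one example at a time, and the test set would be held out for evaluation.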

Page 21

Evaluation Objectives

O1: Effectiveness. How accurate are the proposed algorithms for long-term impact prediction?

O2: Efficiency. How scalable are the proposed algorithms?

Page 22

Effectiveness Results

Comparisons with existing models.

[Figure: effectiveness comparison with existing models, annotated with the "better" direction and "Our methods".]

Page 23

Efficiency Results

Speed comparisons.

[Figure: speed comparison, annotated with the "better" direction, "Ours", and "LIP-KIMAA (sub-linear)".]

Page 24

Quality-Speed Trade-off

[Figure: quality versus speed, annotated with the "better" direction and "Our methods".]

Page 25

Roadmap

Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions

Page 26

Conclusions

A family of algorithms for long-term impact prediction of CQA posts

Q1: How to capture coupling, non-linearity, and dynamics?
A1: Voting consistency + kernel trick + recursive updating

Q2: How to make the algorithms scalable?
A2: Approximation methods

Empirical evaluations
Effectiveness: up to 35.8% improvement
Efficiency: up to 390x speedup and sub-linear scalability

Page 27

Q&A

Thanks!

Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu

Yuan Yao, [email protected]