learning from the past: answering new questions with past answers

Date: 2012/11/22
Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor
Source: WWW ’12
Advisor: Dr. Jia-Ling Koh
Speaker: Yi-Hsuan Yeh




OUTLINE

• Introduction
• Description of approach
    • Stage one: top candidate selection
    • Stage two: top candidate validation
• Experiment
    • Offline
    • Online
• Conclusion


INTRODUCTION

Users struggle to express their needs as short queries.



Community-based Question Answering (CQA) sites, such as Yahoo! Answers or Baidu Zhidao

[Figure: an example CQA question, showing its title and body]

15% of the questions go unanswered.

Idea: answer new questions with the answers of past resolved questions.



A TWO-STAGE APPROACH

Stage one (top candidate selection): find the most similar past question.

Stage two (top candidate validation): decide whether or not to serve its answer.


STAGE ONE: TOP CANDIDATE SELECTION

Model: vector-space unigram model with TF-IDF weights.

Ranking: cos(Qpast title+body, Qnew title+body).

Only past questions whose cosine similarity passes a threshold α are kept; the top-ranked one is the candidate past question, and its best answer A is the candidate answer.

[Figure: TF-IDF weight vectors over terms w1 ... wn for Qnew and for past questions Qpast 1 ... Qpast n]
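A minimal Python sketch of stage one, assuming naive whitespace tokenization, raw-count TF, and log(N/df) IDF computed over the past questions plus the new one. Each question is passed as one title+body string. The function names and threshold values are illustrative, not taken from the slides.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer with light punctuation stripping (an assumption)
    tokens = (t.strip("?!.,").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_top_candidate(q_new, past_questions, alpha):
    """Rank past questions (title + body strings) by cosine similarity to q_new.

    Returns (index, score) of the best past question whose score is at
    least alpha, or None when no past question passes the threshold.
    """
    vectors = tfidf_vectors(past_questions + [q_new])
    new_vec = vectors[-1]
    scored = [(i, cosine(v, new_vec)) for i, v in enumerate(vectors[:-1])]
    passing = [pair for pair in scored if pair[1] >= alpha]
    return max(passing, key=lambda pair: pair[1]) if passing else None
```

Returning None when nothing passes α means no answer is proposed at all, so stage two only runs when a sufficiently similar past question exists.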


STAGE TWO: TOP CANDIDATE VALIDATION

Train a classifier that validates whether A can be served as an answer to Qnew.


SURFACE-LEVEL FEATURES

• Surface-level statistics: text length, number of question marks, stop-word count, maximal, minimal, and average IDF over all terms in the text, IDF standard deviation, HTTP link count, and number of figures.
• Surface-level similarity: cosine similarity between TF-IDF weighted word-unigram vectors (vector space model), for the pairs:
    • Qnew title vs. Qpast title
    • Qnew body vs. Qpast body
    • Qnew title+body vs. Qpast title+body
    • Qnew title+body vs. Answer
    • Qpast title+body vs. Answer
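A sketch of the surface-level statistics for a single text (title, body, or answer). It assumes regex word tokenization, length measured in tokens, a toy stop-word list, and an IDF table precomputed over the archive. "Number of figures" is left out because the slide does not define it.

```python
import re
import statistics

# Toy stop-word list (an assumption); a real system would use a full list
STOP_WORDS = {"a", "an", "the", "my", "is", "it", "to", "of", "and", "i", "do", "does"}


def surface_statistics(text, idf):
    """Surface-level statistics for one text.

    `idf` maps term -> IDF weight, assumed precomputed over the archive;
    terms missing from it are ignored in the IDF statistics.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    idfs = [idf[t] for t in tokens if t in idf]
    return {
        "length": len(tokens),  # measured in tokens (an assumption)
        "question_marks": text.count("?"),
        "stop_words": sum(t in STOP_WORDS for t in tokens),
        "max_idf": max(idfs, default=0.0),
        "min_idf": min(idfs, default=0.0),
        "avg_idf": statistics.mean(idfs) if idfs else 0.0,
        "std_idf": statistics.pstdev(idfs) if idfs else 0.0,
        "http_links": len(re.findall(r"https?://", text)),
    }
```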


LINGUISTIC ANALYSIS

Latent topics: LDA (Latent Dirichlet Allocation) topic distributions of Qnew, Qpast, and A.

            Qnew   Qpast   A
Topic 1     0.3    0.1     0.25
Topic 2     0.03   0.1     0.02
Topic 3     0.15   0.08    0.12
...
Topic n     0.06   0.13    0.05

Features:
• Entropy
• Most probable topic
• JS divergence
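Given per-text topic distributions from an LDA model (the model itself is not shown), the three listed features can be computed as below. Natural logarithms and the toy distributions are assumptions for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a topic distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def most_probable_topic(p):
    """Index of the topic with the highest probability."""
    return max(range(len(p)), key=lambda i: p[i])


def kl_divergence(p, q):
    # Terms with p_i = 0 contribute nothing; assumes q_i > 0 wherever p_i > 0
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture m = (p + q) / 2."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy topic distributions for Qnew and Qpast (hypothetical values)
q_new = [0.6, 0.3, 0.1]
q_past = [0.5, 0.4, 0.1]
features = {
    "entropy_new": entropy(q_new),
    "same_top_topic": most_probable_topic(q_new) == most_probable_topic(q_past),
    "js_new_past": js_divergence(q_new, q_past),
}
```

JS divergence is used here rather than plain KL because it is symmetric and stays finite even when one distribution assigns zero probability to a topic.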


Lexico-syntactic analysis: the Stanford dependency parser is used to extract the main verb, subject, object, main noun, and adjective.

Example:

Q1: Why doesn’t my dog eat? Main predicate: eat; main predicate argument: dog

Q2: Why doesn’t my cat eat? Main predicate: eat; main predicate argument: cat

The two questions share the main predicate but differ in its argument.
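A sketch of match features over the extracted fields, using the dog/cat example. The parser is not called here: the dictionaries stand in for its output, and the field names are hypothetical. Presumably, a predicate match with an argument mismatch signals that the past answer may not fit the new question.

```python
# Hypothetical field names for the parse output; the slides extract these
# with the Stanford dependency parser, which is not called here.
FIELDS = ("predicate", "argument", "subject", "object", "noun", "adjective")


def lexico_syntactic_matches(parse_new, parse_past):
    """Binary match features between two pre-parsed questions.

    Each parse is a dict like {"predicate": "eat", "argument": "dog"}.
    A feature is 1 only when the new question has the field and the past
    question has the same value for it.
    """
    return {
        field + "_match": int(field in parse_new and parse_new[field] == parse_past.get(field))
        for field in FIELDS
    }


q1 = {"predicate": "eat", "argument": "dog"}  # Why doesn't my dog eat?
q2 = {"predicate": "eat", "argument": "cat"}  # Why doesn't my cat eat?
matches = lexico_syntactic_matches(q1, q2)
```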


RESULT LIST ANALYSIS

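Query clarity: the KL divergence between the language model of the result list retrieved for Qnew (Qpast1, Qpast2, Qpast3) and the language model of all past questions (Qpastall).

[Figure: example term distributions over terms A, B, C, D for Qpast1, Qpast2, Qpast3, and Qpastall]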
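A simplified query-clarity sketch: it pools the result list into one maximum-likelihood unigram model and measures its KL divergence from the model of all past questions. The standard clarity score also weights retrieved documents by query likelihood and smooths their models; this version does neither, and the epsilon floor for unseen terms is an assumption.

```python
import math
from collections import Counter


def language_model(docs):
    """Maximum-likelihood unigram model pooled over a list of token lists."""
    counts = Counter(t for tokens in docs for t in tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def query_clarity(result_list, collection, epsilon=1e-9):
    """Simplified clarity score: KL(result-list model || collection model).

    Both arguments are lists of token lists (e.g. the past questions retrieved
    for Qnew, and all past questions). Terms unseen in the collection get the
    floor probability `epsilon` (an assumption, in place of proper smoothing).
    """
    p = language_model(result_list)
    q = language_model(collection)
    return sum(pt * math.log(pt / q.get(t, epsilon)) for t, pt in p.items())
```

A focused result list (few, distinctive terms) diverges more from the collection and scores higher than a list that looks like the archive as a whole.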


Query feedback: the informational similarity between two queries can be effectively estimated by the similarity between their ranked document lists.

Result list length: the number of past questions that pass the threshold α.
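A sketch of both features. For query feedback, overlap in the top k is used as one simple list-similarity measure; the slides do not say which measure is used, so this choice, the names, and k=10 are assumptions. Comparing the list retrieved for Qnew with the one retrieved for Qpast is one hypothetical way to apply it.

```python
def overlap_at_k(ranking_a, ranking_b, k=10):
    """Share of the top-k positions that two ranked lists of question ids have in common.

    One simple list-similarity measure; the slides do not specify which one is used.
    """
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


def result_list_length(scores, alpha):
    """Number of past questions whose cosine similarity to Qnew is at least alpha."""
    return sum(1 for s in scores if s >= alpha)
```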


CLASSIFIER MODEL

Random forest classifier: each tree is built from a random subset of the features and a random sample of the training past questions.
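A toy random forest in pure Python to illustrate the two sources of randomness: bootstrap samples of past questions and random feature subsets per tree. To stay short it uses depth-1 trees (decision stumps) rather than full trees. The feature vectors and labels (1 = serve A) are hypothetical; a real system would use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(X, y, features):
    """Depth-1 decision tree: the single threshold split with the best training accuracy."""
    default = majority(y)
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            # Each side predicts its majority label; an empty side falls back to the overall majority
            l_lab = majority(left) if left else default
            r_lab = majority(right) if right else default
            correct = left.count(l_lab) + right.count(r_lab)
            if best is None or correct > best[0]:
                best = (correct, f, threshold, l_lab, r_lab)
    _, f, threshold, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= threshold else r_lab


def train_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Each tree sees a bootstrap sample of the past questions and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over trees: 1 = serve the past answer A, 0 = do not serve it."""
    votes = sum(tree(row) for tree in forest)
    return int(2 * votes > len(forest))
```

For example, rows could hold features such as the title and body similarities; an uninformative feature is then ignored by any stump that also sees an informative one.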



OFFLINE

Dataset: Yahoo! Answers, categories Beauty & Style, Health, and Pets, February to December 2010. Included questions have a best answer that was chosen by the asker and received at least three stars.


Labeling: Amazon Mechanical Turk (MTurk); inter-annotator agreement measured with Fleiss’s kappa.
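Fleiss’s kappa measures agreement among a fixed number of raters per item, corrected for chance. A sketch of the standard formula; the input layout (per-item counts of raters choosing each category) is an assumption about how the MTurk labels would be tabulated.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table ratings[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters n >= 2, and the
    raters must not all use a single category (otherwise P_e = 1).
    """
    N = len(ratings)
    n = sum(ratings[0])
    k = len(ratings[0])
    # Proportion of all assignments that went to each category
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Per-item agreement: fraction of rater pairs that agree on the item
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean less agreement than chance.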



ONLINE




CONCLUSIONS

Short questions might suffer from vocabulary mismatch problems and sparsity.

Long, cumbersome descriptions introduce many irrelevant aspects that can hardly be separated from the essential question details (even by a human reader).

Terms that are repeated in both the past question and its best answer should usually be given more weight, as they relate to the expressed need.


A general informative answer can satisfy a number of topically connected but different questions.

A general social answer may often satisfy a certain type of question.

In future work, we would like to better understand time-sensitive questions, such as those common in the Sports category.