learning from the past: answering new questions with past answers date: 2012/11/22 author: anna...

LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS

Date: 2012/11/22

Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor

Source: WWW ’12

Advisor: Dr. Jia-Ling Koh

Speaker: Yi-Hsuan Yeh

OUTLINE

• Introduction
• Description of approach
    • Stage one: top candidate selection
    • Stage two: top candidate validation
• Experiment
    • Offline
    • Online
• Conclusion

INTRODUCTION

Users struggle to express their needs as short queries.

INTRODUCTION

Community-based Question Answering (CQA) sites, such as Yahoo! Answers or Baidu Zhidao

15% of the questions go unanswered

Answer new questions with past resolved questions


A TWO STAGE APPROACH

Stage one: find the most similar past question.

Stage two: decide whether or not to serve its answer.

STAGE ONE: TOP CANDIDATE SELECTION

Vector-space unigram model with TF-IDF weights

Ranking: Cos(Qpast title+body, Qnew title+body)

=> the top candidate past question and its answer A

TF-IDF weights of title terms:

           w1     w2     w3     ...    wn
Qnew       0.1    0.2    0.12   ...    0.8
Qpast 1    0.3    0.5    0.2    ...    0.1
Qpast 2    0.2    0      0.1    ...    0.6
...
Qpast n    0.9    0.3    0.5    ...    0.1

Cosine similarity => threshold α
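The selection step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that title similarity is first filtered by the threshold α and that the surviving past questions are then ranked by title+body similarity, which is one reading of the slides. The tokenizer, the smoothed IDF formula, the default `alpha=0.5`, and the dictionary keys `title`, `body`, and `answer` are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def idf_table(docs):
    """Smoothed IDF over a list of token lists: log((N + 1) / (df + 1)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the past questions are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_top_candidate(new_q, past_qs, alpha=0.5):
    """Return (best past question, title+body similarity), or (None, 0.0).

    Assumed order: keep past questions whose title similarity reaches alpha,
    then rank the survivors by title+body similarity.
    """
    title_docs = [tokenize(q["title"]) for q in past_qs]
    full_docs = [tokenize(q["title"] + " " + q["body"]) for q in past_qs]
    title_idf, full_idf = idf_table(title_docs), idf_table(full_docs)
    new_title = tfidf_vector(tokenize(new_q["title"]), title_idf)
    new_full = tfidf_vector(tokenize(new_q["title"] + " " + new_q["body"]), full_idf)

    best, best_score = None, 0.0
    for q, t_doc, f_doc in zip(past_qs, title_docs, full_docs):
        if cosine(new_title, tfidf_vector(t_doc, title_idf)) < alpha:
            continue  # title is not similar enough
        score = cosine(new_full, tfidf_vector(f_doc, full_idf))
        if best is None or score > best_score:
            best, best_score = q, score
    return best, best_score


# Hypothetical data, for illustration only.
past_questions = [
    {"title": "Why doesn't my dog eat?",
     "body": "He has not touched his food for two days.",
     "answer": "Take him to a vet."},
    {"title": "How do I curl my hair?",
     "body": "I have straight hair.",
     "answer": "Use a curling iron."},
]
new_question = {"title": "Why doesn't my dog eat his food?",
                "body": "My dog stopped eating two days ago."}
candidate, similarity = select_top_candidate(new_question, past_questions)
```

The returned candidate carries its answer, which is what stage two then validates.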

STAGE TWO: TOP CANDIDATE VALIDATION

Train a classifier that validates whether A can be served as an answer to Qnew.

SURFACE-LEVEL FEATURES

Surface-level statistics: text length, number of question marks, stop word count, maximal IDF within all terms in the text, minimal IDF, average IDF, IDF standard deviation, http link count, number of figures.

Surface-level similarity: cosine similarity in a TF-IDF weighted word unigram vector space model, computed between:

• Qnew title and Qpast title
• Qnew body and Qpast body
• Qnew title+body and Qpast title+body
• Qnew title+body and Answer
• Qpast title+body and Answer
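As a rough illustration of the surface-level statistics listed above, the sketch below computes them for a single text. It is only a sketch under stated assumptions: the stop word list is a tiny made-up sample, the `idf` dictionary is supplied by the caller, "number of figures" is read here as the number of purely numeric tokens, and the feature names are hypothetical.

```python
import re
import statistics

# Tiny illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "my", "is", "to", "of", "and", "why", "do", "does"}


def surface_stats(text, idf):
    """Surface-level statistics of one text, given a term -> IDF dictionary."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    idfs = [idf[t] for t in tokens if t in idf]  # terms without an IDF are skipped
    return {
        "length": len(tokens),
        "question_marks": text.count("?"),
        "stop_words": sum(t in STOP_WORDS for t in tokens),
        "max_idf": max(idfs, default=0.0),
        "min_idf": min(idfs, default=0.0),
        "avg_idf": statistics.fmean(idfs) if idfs else 0.0,
        "std_idf": statistics.pstdev(idfs) if idfs else 0.0,
        "http_links": len(re.findall(r"https?://", text)),
        # Assumption: "figures" means numeric tokens such as "3".
        "figures": sum(t.isdigit() for t in tokens),
    }


# Hypothetical input, for illustration only.
features = surface_stats(
    "Why doesn't my dog eat? He is 3 years old. See http://example.com",
    {"dog": 2.0, "eat": 1.0},
)
```

The same function could be applied to the new question, the past question, and the answer to obtain one group of features per text.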

LINGUISTIC ANALYSIS

Latent topics: LDA (Latent Dirichlet Allocation)

           Qnew    Qpast   A
Topic 1    0.3     0.1     0.25
Topic 2    0.03    0.1     0.02
Topic 3    0.15    0.08    0.12
...
Topic n    0.06    0.13    0.05

• Entropy
• Most probable topic
• JS divergence
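The three topic-based measures named above have standard definitions, which the following sketch implements for topic distributions given as plain lists of probabilities. The distributions in the usage lines are hypothetical, and how the features are paired across Qnew, Qpast, and A is an assumption of the example rather than something the slides spell out.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability topics contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def most_probable_topic(p):
    """Index of the topic with the highest probability."""
    return max(range(len(p)), key=p.__getitem__)


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log base 2."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions, for illustration only.
q_new = [0.7, 0.2, 0.1]
q_past = [0.6, 0.3, 0.1]
same_top_topic = most_probable_topic(q_new) == most_probable_topic(q_past)
topic_distance = js_divergence(q_new, q_past)
```

A low JS divergence and a shared most probable topic would suggest that the two texts are about the same subject.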

Lexico-syntactic analysis: Stanford dependency parser

Main verb, subject, object, the main noun and adjective

Ex:

Q1: Why doesn’t my dog eat?
Main predicate: eat
Main predicate argument: dog

Q2: Why doesn’t my cat eat?
Main predicate: eat
Main predicate argument: cat

RESULT LIST ANALYSIS

Query clarity: language model & KL divergence

[Figure: Qpast1, Qpast2, Qpast3 and Qpastall; value 0.2]

Query feedback: informational similarity between two queries can be effectively estimated by the similarity between their ranked document lists.

Result list length: the number of questions that pass the threshold α
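To make two of these features concrete, the sketch below computes a simplified query clarity score as the KL divergence between a unigram language model of the top results and one of the whole collection, plus the result list length. It is an assumption-laden illustration: the language models are unsmoothed and every result is weighted equally, the top results are assumed to be drawn from the collection, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(texts):
    """Maximum-likelihood unigram language model over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def clarity(top_results, collection):
    """KL(result-list model || collection model) in bits.

    Assumes every text in top_results also appears in collection, so each
    term of the result-list model has non-zero collection probability.
    """
    p = unigram_lm(top_results)
    q = unigram_lm(collection)
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items())


def result_list_length(similarities, alpha):
    """Number of past questions whose similarity passes the threshold alpha."""
    return sum(s >= alpha for s in similarities)


# Hypothetical collection of past questions, for illustration only.
all_past = ["dog eat food", "cat eat fish", "curl hair iron", "dye hair color"]
top_past = all_past[:2]
focus = clarity(top_past, all_past)
```

A result list that is topically focused differs more from the collection as a whole and therefore gets a higher clarity score, while a list that looks like a random sample of the collection scores close to zero.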

CLASSIFIER MODEL

Random forest classifier

Random n features & n training past questions


OFFLINE

Dataset

• Yahoo! Answers categories: Beauty & Style, Health, and Pets.
• Includes best answers that were chosen by the askers and received at least three stars.
• Between Feb and Dec 2010

MTurk evaluation, with inter-annotator agreement measured by Fleiss’s kappa
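For reference, Fleiss's kappa can be computed from a table of rating counts as in the sketch below. The slides only name the measure, so the rating table in the usage lines is hypothetical and the function is a plain implementation of the standard formula, not the authors' evaluation code.

```python
def fleiss_kappa(ratings):
    """Fleiss's kappa for a table of rating counts.

    ratings[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    The value is undefined when all ratings fall into a single category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_categories)]
    # Observed agreement among rater pairs, per item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in ratings]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical judgments: 4 question-answer pairs, 3 raters each,
# two categories (answer satisfies the question / does not).
judgments = [[3, 0], [0, 3], [2, 1], [3, 0]]
agreement = fleiss_kappa(judgments)
```

Values near 1 indicate agreement well above chance, while values near 0 indicate agreement no better than chance.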

ONLINE


CONCLUSIONS

Short questions might suffer from vocabulary mismatch problems and sparsity.

Long, cumbersome descriptions introduce many irrelevant aspects which can hardly be separated from the essential question details (even for a human reader).

Terms that are repeated in the past question and in its best answer should usually be emphasized more as related to the expressed need.

A general informative answer can satisfy a number of topically connected but different questions.

A general social answer may often satisfy a certain type of question.

In future work, we would like to better understand time-sensitive questions, such as those common in the Sports category.
