Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin,Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16


Named Entity Recognition in Query

Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009)

Speaker: Yi-Lin, Hsu
Advisor: Dr. Koh, Jia-ling

Date: 2009/11/16


Outline

• Introduction to NERQ
• NERQ Problem
• Implementation
• WSLDA
• Experimental Results
• Conclusion and Future Work


Introduction to NERQ

• Named entity recognition (NER) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of time, quantities, monetary values, and percentages.


Introduction to NERQ

• NERQ involves two tasks:
– 1. Detection of the named entity in a given query
– 2. Classification of the named entity into predefined classes
– Example: mine movie titles
– Applications: Web search, etc.
• Challenges
– Queries are usually very short
– Queries are not necessarily in standard form


Query Data

• New data source for NER
– About 70% of search queries contain named entities.
– Rich context for determining the classes of entities.
• Query context
– “harry potter walkthrough” → “harry potter cheats” (contexts in the same class)
• Wisdom of crowds
• Very large-scale data that keeps growing
• Frequent updates with emerging named entities


NERQ Problem

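• A query containing one named entity is represented as a triple (e, t, c):
– e : named entity
– t : context of e, written α#β, where # marks the position of e and α, β are the words to its left and right
– c : class of e
– Example: for “harry potter walkthrough”, e = “harry potter” and t = “# walkthrough”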


Probabilistic Approach

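• (e,t,c)* = argmax_{(e,t,c)} Pr(q,e,t,c)
= argmax_{(e,t,c)} Pr(q|e,t,c) Pr(e,t,c)
= argmax_{(e,t,c) ∈ G(q)} Pr(e,t,c)   (1)

– G(q): the set of triples (e,t,c) that can be generated from query q

• Pr(e,t,c) = Pr(e) Pr(c|e) Pr(t|e,c)
= Pr(e) Pr(c|e) Pr(t|c)   (2)

– An assumption is made in the last step of (2): the context t depends only on the class c, not on the specific entity e.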


Topic Model for NERQ

• Given training data T = {(e_i, t_i, c_i) | i = 1..N}, the learning problem can be formalized as:
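One reading consistent with Eq. (2), stated here as an assumption rather than as the slide's exact formula: the parameters are chosen to maximize the likelihood of the training triples. The second line is the form that follows from Eq. (2) when the class c_i is not observed and is summed out.

```latex
\[
\max \prod_{i=1}^{N} \Pr(e_i, t_i, c_i)
  = \max \prod_{i=1}^{N} \Pr(e_i)\,\Pr(c_i \mid e_i)\,\Pr(t_i \mid c_i)
\]
\[
\Pr(e_i, t_i) = \Pr(e_i) \sum_{c} \Pr(c \mid e_i)\,\Pr(t_i \mid c)
\]
```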


Implementation

• Offline Training
• Online Prediction


Offline Training

• Seeds: e.g., Harry Potter
• Scan the query log for the seed named entities and collect the queries that contain them
• Matching queries in the query log, e.g.:
– Harry Potter trail
– Harry Potter walk through
– Harry Potter cheats
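The scan over the query log can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `extract_context` and `collect_contexts`, the whitespace tokenization, and the lowercase matching are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def extract_context(query: str, entity: str) -> Optional[str]:
    """Return the query with the first occurrence of `entity` replaced by '#'.

    Matching is done on lowercased whitespace tokens (an assumption).
    Returns None when the entity does not occur in the query.
    """
    q_tokens = query.lower().split()
    e_tokens = entity.lower().split()
    n = len(e_tokens)
    if n == 0:
        return None
    for i in range(len(q_tokens) - n + 1):
        if q_tokens[i:i + n] == e_tokens:
            return " ".join(q_tokens[:i] + ["#"] + q_tokens[i + n:])
    return None


def collect_contexts(query_log: Iterable[str],
                     seeds: List[str]) -> Dict[str, Counter]:
    """Count, for each seed entity, the contexts it appears with in the log."""
    counts: Dict[str, Counter] = {seed: Counter() for seed in seeds}
    for query in query_log:
        for seed in seeds:
            context = extract_context(query, seed)
            if context is not None:
                counts[seed][context] += 1
    return counts


if __name__ == "__main__":
    log = ["Harry Potter trail", "Harry Potter walk through",
           "Harry Potter cheats", "new moon"]
    for context, freq in collect_contexts(log, ["Harry Potter"])["Harry Potter"].items():
        print(context, freq)
```

The per-seed context counts are the kind of input a topic model such as WS-LDA would be trained on, with each seed playing the role of a document and each context the role of a word.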


Offline Training
• Pr(e) : estimated from the total frequency of queries containing e in the query log
• Pr(c|e) : estimated by WS-LDA
• Pr(t|c) : fixed
• Figure: a query (e.g., “Harry Potter trails”, “New Moon”) is split into named entity, context, and class (e.g., movie)


Online Prediction

• Example query: harry potter trails
• Find the most likely triple (e,t,c) in G(q)
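A minimal sketch of this prediction step, assuming G(q) is built by treating every contiguous token span of the query as a candidate entity and scoring each triple with Eq. (2). The probability tables below are made-up toy numbers, and the names `generate_triples` and `predict` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, context, class)


def generate_triples(query: str, classes: List[str]) -> Iterator[Triple]:
    """Enumerate G(q): each contiguous token span is a candidate entity."""
    tokens = query.lower().split()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            entity = " ".join(tokens[i:j])
            context = " ".join(tokens[:i] + ["#"] + tokens[j:])
            for c in classes:
                yield entity, context, c


def predict(query: str,
            classes: List[str],
            p_e: Dict[str, float],
            p_c_given_e: Dict[Tuple[str, str], float],
            p_t_given_c: Dict[Tuple[str, str], float]
            ) -> Tuple[Optional[Triple], float]:
    """Return the triple maximizing Pr(e) * Pr(c|e) * Pr(t|c), and its score."""
    best: Optional[Triple] = None
    best_score = 0.0
    for e, t, c in generate_triples(query, classes):
        score = (p_e.get(e, 0.0)
                 * p_c_given_e.get((c, e), 0.0)
                 * p_t_given_c.get((t, c), 0.0))
        if score > best_score:
            best, best_score = (e, t, c), score
    return best, best_score


# Toy probability tables (illustrative values only, not from the paper).
CLASSES = ["Movie", "Game", "Book", "Music"]
P_E = {"harry potter": 0.6, "harry": 0.1, "potter": 0.1}
P_C_GIVEN_E = {("Movie", "harry potter"): 0.5,
               ("Book", "harry potter"): 0.3,
               ("Game", "harry potter"): 0.2}
P_T_GIVEN_C = {("# trails", "Movie"): 0.3, ("# trails", "Game"): 0.05}

if __name__ == "__main__":
    print(predict("harry potter trails", CLASSES, P_E, P_C_GIVEN_E, P_T_GIVEN_C))
```

Triples whose entity or context was never seen in training score zero here; a real system would need some policy for such cases, which this sketch does not attempt.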


WSLDA

• Introduce weak supervision
– Objective: LDA log likelihood + soft constraints

L(w, y) = log p(w) + C(y)

(log p(w): LDA probability; C(y): soft constraints)

– Soft constraints

C(y) = Σ_i y_i z_i

(z_i: document probability on the i-th class; y_i: document binary label on the i-th class)
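To make the soft constraint concrete, the sketch below evaluates C(y) = Σ_i y_i z_i and adds it to a given log likelihood. The LDA log likelihood itself is not computed; `log_p_w` is simply passed in as a number, and the function names are hypothetical.

```python
from typing import Sequence


def soft_constraint(y: Sequence[int], z: Sequence[float]) -> float:
    """C(y) = sum_i y_i * z_i.

    y: binary labels of the document on each class.
    z: probability of the document on each class.
    """
    if len(y) != len(z):
        raise ValueError("y and z must have one entry per class")
    return sum(y_i * z_i for y_i, z_i in zip(y, z))


def ws_objective(log_p_w: float, y: Sequence[int], z: Sequence[float]) -> float:
    """LDA log likelihood plus the soft constraint term."""
    return log_p_w + soft_constraint(y, z)


if __name__ == "__main__":
    y = [1, 0, 1, 0]               # labeled as belonging to classes 0 and 2
    z = [0.5, 0.2, 0.25, 0.05]     # current class distribution of the document
    print(soft_constraint(y, z))
    print(ws_objective(-12.0, y, z))
```

The constraint rewards class distributions that put mass on the labeled classes without forbidding mass elsewhere, which is what makes the supervision "soft".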


WSLDA

• Objective Function:


Experiments
• A real data set consisting of 6 billion queries
• 930 million unique queries
• Four semantic classes: “Movie”, “Game”, “Book”, and “Music”
• 4 human annotators
• 180 named entities were selected from the web sites of Amazon, GameSpot, and Lyrics
• 120 for training and 60 for testing
• Finally, we obtain 432,304 contexts and about 1.5 million named entities


Experiments
• Randomly sampled 400 queries from the recognition results (0.14 million) for evaluation.

Example Queries

pics of fight club | braveheart quote
watch gladiator online | american beauty company
12 angry men characters | mario kart guide
pc mass effect | crysis mods
mother teresa images | condemned screenshots
4 minutes lyric | king kong
the black swan summary | blackwater novel
new moon | rehab the song
nineteen minutes synopsis | umbrella chords
all summer long video | girlfriend lyrics


Experiments
• The performance of NERQ is evaluated in terms of Top-N accuracy.
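A small sketch of how Top-N accuracy could be computed, assuming a query counts as correct when its gold triple appears among the first N ranked predictions. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, context, class)


def top_n_accuracy(ranked: Dict[str, List[Triple]],
                   gold: Dict[str, Triple],
                   n: int) -> float:
    """Fraction of gold queries whose correct triple is in the top n predictions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not gold:
        return 0.0
    hits = sum(1 for query, triple in gold.items()
               if triple in ranked.get(query, [])[:n])
    return hits / len(gold)


if __name__ == "__main__":
    gold = {"king kong": ("king kong", "#", "Movie"),
            "crysis mods": ("crysis", "# mods", "Game")}
    ranked = {"king kong": [("king kong", "#", "Movie")],
              "crysis mods": [("crysis", "# mods", "Movie"),
                              ("crysis", "# mods", "Game")]}
    print(top_n_accuracy(ranked, gold, 1), top_n_accuracy(ranked, gold, 2))
```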


Experiments

• We performed experiments to compare the WS-LDA approach with two baseline methods: Determ and LDA.

• Determ learns the contexts of a class by simply aggregating all the contexts of the named entities belonging to that class.

• LDA and WS-LDA take a probabilistic approach.
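The Determ baseline as described above can be sketched in a few lines. This assumes each seed entity carries exactly one class label and that contexts are stored as per-entity counts; the function name `determ_contexts` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict


def determ_contexts(entity_contexts: Dict[str, Counter],
                    entity_class: Dict[str, str]) -> Dict[str, Counter]:
    """Aggregate the context counts of all entities belonging to each class."""
    per_class: Dict[str, Counter] = defaultdict(Counter)
    for entity, contexts in entity_contexts.items():
        per_class[entity_class[entity]].update(contexts)
    return dict(per_class)


if __name__ == "__main__":
    entity_contexts = {
        "harry potter": Counter({"# cheats": 2, "# walkthrough": 1}),
        "mario kart": Counter({"# cheats": 3, "# guide": 1}),
    }
    entity_class = {"harry potter": "Movie", "mario kart": "Game"}
    print(determ_contexts(entity_contexts, entity_class))
```

In this toy input the game-related context "# cheats" is credited to Movie because "harry potter" has a single hard label. The probabilistic approaches avoid this by letting an entity belong to several classes with different weights.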


Experiments

• Figure: learned contexts for each class (panels: Movie Contexts, Game Contexts, Book Contexts, Music Contexts), each comparing Determ, LDA, and WS-LDA


• Table 5: Comparisons on Learned Named Entities of Each Class (P@N)

– Columns: Movie, Game, Book, Music, Average-Class


Experiments

• Comparisons between WS-LDA and LDA


Conclusion

• Formalized the problem of NERQ
• Proposed a novel method for NERQ
• Developed a new topic model called WSLDA
• Future work:
– We plan to add more classes and conduct further experiments.
– The proposed method focuses on queries with a single named entity.
– Some queries contain named entities outside the predefined classes (e.g., American beauty company).
– Some contexts were not learned by our approach because they are uncommon (e.g., “lyrics for # by chris brown”).
