Deep Learning for Information Retrieval - Hang Li

Machine Learning for Information Retrieval Hang Li Noah’s Ark Lab Huawei Technologies The Third Asian Summer School in Information Access Kyoto Japan Aug 5, 2016

Page 1:

Machine Learning for Information Retrieval

Hang Li

Noah’s Ark Lab

Huawei Technologies

The Third Asian Summer School in Information Access

Kyoto, Japan

Aug 5, 2016

Page 2:

Outline of Tutorial

• Introduction

• Part 1: Learning to Rank

• Part 2: Learning to Match

• Part 3: Deep Learning for Information Retrieval

• Summary

Page 3:

Overview of Information Retrieval

Information and Knowledge Base

Information Retrieval System

Query

Relevant Result

Intent: Key Words, Question

Content: Documents, Images, Relational Tables

Machine Learning can play an important role

• Key questions: how to represent intent and content, how to match intent and content

• Ranking, indexing, etc. are less essential

• Interactive IR is not particularly considered here

Page 4:

Approach in Traditional IR

Query: star wars the force awakens reviews

Document: Star Wars: Episode VII. Three decades after the defeat of the Galactic Empire, a new threat arises.

[Figure: query and document are each mapped to a vector (q and d), and the pair is scored by $f(q, d)$]

$$f_{VSM}(q, d) = \frac{\langle q, d \rangle}{\|q\| \cdot \|d\|}$$

• Representing query and document as tf-idf vectors

• Calculating cosine similarity between them

• BM25, LM4IR, etc. can be considered as non-linear variants
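The vector space scoring on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tutorial's own code: the function names `tfidf_vector` and `cosine`, the pre-tokenized inputs, and the idf formula log(N / df) are all chosen here for brevity.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, num_docs):
    """Map a token list to a sparse tf-idf vector (dict: term -> weight).

    doc_freq maps a term to the number of documents containing it
    (a hypothetical, precomputed statistic).
    """
    counts = Counter(tokens)
    return {
        term: tf * math.log(num_docs / doc_freq[term])
        for term, tf in counts.items()
        if doc_freq.get(term, 0) > 0
    }


def cosine(q_vec, d_vec):
    """Cosine similarity <q, d> / (||q|| * ||d||); 0.0 if either vector is zero."""
    dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)
```

Terms that occur in every document receive weight zero under this idf choice, so they do not influence the score.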

Page 5:

Approach in Modern IR

• Conducting query and document understanding

• Representing query and document as feature vectors

• Calculating multiple matching scores between query and document using learning to match

• Training ranker with matching scores as features using learning to rank

Query: star wars the force awakens reviews

Document: Star Wars: Episode VII. Three decades after the defeat of the Galactic Empire, a new threat arises.

[Figure: the query is represented by feature vectors $v_{q_1}, \ldots, v_{q_m}$ and the document by feature vectors $v_{d_1}, \ldots, v_{d_n}$, which are matched and ranked by $f(q, d)$]

Page 6:

“Easy” Problems in IR

• Search

– Matching between query and document

• Question Answering from Documents

– Matching between question and answer

• Learning to match & learning to rank: well studied so far

• Deep Learning may not help so much

Page 7:

“Hard” Problems in IR

• Image Retrieval

– Matching between text and image

– Not the same as traditional setting

• Question Answering from Knowledge Base

– Complicated matching between question and fact in knowledge base

• Generation-based Question Answering

– Generating answer to question based on facts in knowledge base

• Not well studied so far

• Deep Learning for Information Retrieval can make a big difference

Page 8:

Part 1. Learning to Rank

Page 9:

1.1. Overview of Learning to Rank

Page 10:

Ranking Plays Key Role in Many Applications

Page 11:

Ranking Problem: Example = Document Retrieval

Query: $q$

Documents: $D = \{d_1, d_2, \ldots, d_N\}$

Ranking model: $f(q, d)$

Ranking of documents: $d_{q,1}, d_{q,2}, \ldots, d_{q,n_q}$

Ranking based on relevance, importance, preference

Page 12:

Ranking Problem: Example = Recommender System

Item1 Item2 Item3 ... ItemN

User1 5 4

User2 1 2 2

... ? ? ?

UserM 4 3

Page 13:

Ranking Problem: Example = Machine Translation

Sentence in source language: $f$

Generative Model: produces ranked sentence candidates in target language $e_1, e_2, \ldots, e_{1000}$

Re-Ranking Model: outputs re-ranked top sentence in target language $\tilde{e}$

Page 14:

1.2. Problem and Approaches of Learning to Rank

Page 15:

Ranking Problem: Example = Document Search

Query: $q$

Documents: $D = \{d_1, d_2, \ldots, d_N\}$

Ranking model: $f(q, d)$

Ranking of documents: $d_{q,1}, d_{q,2}, \ldots, d_{q,n_q}$

Ranking based on relevance, importance, preference

Page 16:

Traditional Approach = Probabilistic Model

Query: $q$

Documents: $d_1, d_2, \ldots, d_N$

Model: $P(r \mid q, d)$, $r \in \{0, 1\}$

Ranking of documents: $d_1 \sim P(r \mid q, d_1)$, $d_2 \sim P(r \mid q, d_2)$, ..., $d_n \sim P(r \mid q, d_n)$

Page 17:

BM25 [Robertson & Walker 94]

Query: $q$

Documents: $d_1, d_2, \ldots, d_N$

Ranking function:

$$\sum_{w \in q \cap d} \frac{(k + 1)\, tf(w)}{k \left( (1 - b) + b \, \frac{dl}{avgdl} \right) + tf(w)}$$

Model: $P(r \mid q, d)$, $r \in \{0, 1\}$

Ranking of documents: $d_1 \sim P(r \mid q, d_1)$, $d_2 \sim P(r \mid q, d_2)$, ..., $d_n \sim P(r \mid q, d_n)$
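A minimal Python sketch of the term-frequency form of BM25 follows: it sums (k + 1) tf(w) / (k ((1 - b) + b dl / avgdl) + tf(w)) over the terms shared by query and document. The function name `bm25_score`, the token-list inputs, and the default values k = 1.2 and b = 0.75 are assumptions made for illustration. Widely used BM25 variants also multiply each term by an IDF weight, which is left out here.

```python
from collections import Counter


def bm25_score(query_terms, doc_terms, avgdl, k=1.2, b=0.75):
    """Score one document against one query with the tf-saturation form of BM25.

    avgdl is the average document length of the collection (supplied by the caller).
    """
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    # Length normalization shared by every term of this document.
    length_norm = k * ((1.0 - b) + b * dl / avgdl)
    score = 0.0
    for w in set(query_terms):
        if tf[w] > 0:  # only terms occurring in both query and document count
            score += (k + 1.0) * tf[w] / (length_norm + tf[w])
    return score
```

Longer-than-average documents get a larger denominator, so the same term frequency contributes less to their score.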

Page 18:

PageRank [Page et al., 1999]

$$P(d_i) = \alpha \sum_{d_j \in M(d_i)} \frac{P(d_j)}{L(d_j)} + (1 - \alpha) \frac{1}{n}$$
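PageRank scores satisfy the recursion P(d_i) = α Σ P(d_j) / L(d_j) + (1 - α) / n, where the sum runs over the pages d_j linking to d_i, L(d_j) is the number of out-links of d_j, and n is the number of pages. A minimal power-iteration sketch is given below. The function name `pagerank`, the adjacency-dict input, the damping value 0.85, the fixed iteration count, and the even redistribution of rank from pages without out-links are assumptions for illustration.

```python
def pagerank(out_links, alpha=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    out_links: dict mapping each page to the list of pages it links to;
    every linked page must itself be a key of the dict.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the uniform (1 - alpha) / n share.
        new_rank = {p: (1.0 - alpha) / n for p in pages}
        for p, targets in out_links.items():
            if targets:
                share = alpha * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page without out-links: spread its rank evenly (assumption).
                share = alpha * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

The scores always sum to one, so they can be read as a probability distribution over pages.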

Page 19:

New Approach = Learning to Rank

Training data: queries with associated documents

$q_1$: $d_{1,1}, d_{1,2}, \ldots, d_{1,n_1}$

...

$q_m$: $d_{m,1}, d_{m,2}, \ldots, d_{m,n_m}$

Learning System: learns the ranking model $f(q, d)$ from the training data

Ranking System: applies $f(q, d)$ to a new query $q_{m+1}$ and documents $D = \{d_1, d_2, \ldots, d_N\}$

Output ranking: $d_{m+1,1}$ with $f(q_{m+1}, d_{m+1,1})$, $d_{m+1,2}$ with $f(q_{m+1}, d_{m+1,2})$, ..., $d_{m+1,n_{m+1}}$ with $f(q_{m+1}, d_{m+1,n_{m+1}})$

Page 20:

Training Process

Input: queries with associated documents, $q_i$: $d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}$ for $i = 1, \ldots, m$

1. Data Labeling (rank): $q_i$: $(d_{i,1}, y_{i,1}), (d_{i,2}, y_{i,2}), \ldots, (d_{i,n_i}, y_{i,n_i})$

2. Feature Extraction: $(x_{i,1}, y_{i,1}), (x_{i,2}, y_{i,2}), \ldots, (x_{i,n_i}, y_{i,n_i})$

3. Learning: $f(x)$

Page 21:

Testing Process

Input: test query $q_{m+1}$ with documents $d_{m+1,1}, d_{m+1,2}, \ldots, d_{m+1,n_{m+1}}$

1. Data Labeling (rank): $(d_{m+1,1}, y_{m+1,1}), (d_{m+1,2}, y_{m+1,2}), \ldots, (d_{m+1,n_{m+1}}, y_{m+1,n_{m+1}})$

2. Feature Extraction: $(x_{m+1,1}, y_{m+1,1}), (x_{m+1,2}, y_{m+1,2}), \ldots, (x_{m+1,n_{m+1}}, y_{m+1,n_{m+1}})$

3. Ranking with $f(x)$: $(x_{m+1,1}, f(x_{m+1,1}), y_{m+1,1}), (x_{m+1,2}, f(x_{m+1,2}), y_{m+1,2}), \ldots, (x_{m+1,n_{m+1}}, f(x_{m+1,n_{m+1}}), y_{m+1,n_{m+1}})$

4. Evaluation: Evaluation Result

Page 22:

Notes

• Features are functions of query and document

• Query and associated documents form a group

• Groups are i.i.d. data

• Feature vectors within group are not i.i.d. data

• Ranking model is function of features

• Several data labeling methods (here labeling of grade)

Page 23:

Issues in Learning to Rank

• Data Labeling

• Feature Extraction

• Evaluation Measure

• Learning Method (Model, Loss Function, Algorithm)

Page 24:

Data Labeling Problem

• E.g., relevance of documents w.r.t. query

Doc A

Doc B

Doc C

Query

Page 25:

Data Labeling Methods

• Labeling of Grades

– Multiple levels (e.g., relevant, partially relevant, irrelevant)

– Widely used in IR

• Labeling of Ordered Pairs

– Ordered pairs between documents (e.g. A>B, B>C)

– Implicit relevance judgment: derived from click-through data

• Creation of List

– List (or permutation) of documents is given

– Ideal but difficult to implement

Page 26:

Implicit Relevance Judgment

Ranking of documents at search system: Doc A, Doc B, Doc C

Users often clicked on Doc B

Ordered pair: B > A

Page 27:

Feature Extraction

[Figure: for the query paired with each of Doc A, Doc B, and Doc C, a feature vector is built from query-document features (e.g., BM25), document features (e.g., PageRank), and further features]

Page 28:

Example Features

Page 29:

Evaluation Measures

• Important to rank top results correctly

• Measures

– NDCG (Normalized Discounted Cumulative Gain)

– MAP (Mean Average Precision)

– MRR (Mean Reciprocal Rank)

– WTA (Winner Takes All)

– Kendall’s Tau

Page 30:

NDCG

• Evaluating ranking using labeled grades

• NDCG at position j:

$$n_j \sum_{i=1}^{j} \frac{2^{r(i)} - 1}{\log_2(1 + i)}$$

Page 31:

NDCG (cont’)

• Example: perfect ranking

– (3, 3, 2, 2, 1, 1, 1) grade, r = 3, 2, 1

– (7, 7, 3, 3, 1, 1, 1) gain: $2^{r(j)} - 1$

– (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) position discount: $1 / \log_2(1 + j)$

– (7, 11.41, 12.91, …) DCG: $\sum_{i=1}^{j} (2^{r(i)} - 1) / \log_2(1 + i)$

– (1/7, 1/11.41, 1/12.91, …) normalizing factor: $n_j$

– (1, 1, 1, 1, 1, 1, 1) NDCG for perfect ranking

Page 32:

NDCG (cont’)

• Example: imperfect ranking

– (2, 3, 2, 3, 1, 1, 1) grade

– (3, 7, 3, 7, 1, 1, 1) gain

– (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) position discount

– (3, 7.41, 8.91, …) DCG

– (1/7, 1/11.41, 1/12.91, …) normalizing factor

– (0.43, 0.65, 0.69, …) NDCG

• Imperfect ranking decreases NDCG
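The two NDCG examples can be checked with a short Python sketch. It assumes the gain 2^r - 1 and a base-2 logarithm in the position discount (the base is implied by the discount values 1, 0.63, 0.5, ...), and it takes the normalizing factor to be one over the DCG of the same grades sorted in descending order. The function names `dcg_at` and `ndcg_at` are illustrative.

```python
import math


def dcg_at(grades, j):
    """DCG at position j for a ranked list of grades (position 1 first)."""
    return sum(
        (2 ** r - 1) / math.log2(1 + i)
        for i, r in enumerate(grades[:j], start=1)
    )


def ndcg_at(grades, j):
    """NDCG at position j: DCG divided by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at(sorted(grades, reverse=True), j)
    return dcg_at(grades, j) / ideal_dcg if ideal_dcg > 0 else 0.0


imperfect = [2, 3, 2, 3, 1, 1, 1]
print([round(ndcg_at(imperfect, j), 2) for j in (1, 2, 3)])  # [0.43, 0.65, 0.69]
```

The DCG values of the perfect ranking come out as 7, 11.4165, and 12.9165 at positions 1 to 3, which agrees with the slide's 7, 11.41, 12.91 up to truncation.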

Page 33:

Relations with Other Learning Tasks

• No need to predict category (vs. classification)

• No need to predict value of $f(q, d)$ (vs. regression)

• Relative ranking order is more important (vs. ordinal regression)

• Learning to rank can be approximated by classification, regression, ordinal regression

Page 34:

Ordinal Regression (Ordinal Classification)

• Categories are ordered

– 5, 4, 3, 2, 1

– e.g., rating restaurants

• Prediction

– Map to ordered categories

Page 35:

Three Major Approaches

• Pointwise approach

• Pairwise approach

• Listwise approach

• SVM based

• Boosting based

• Neural Network based

• Others

Page 36:

Categorization of Learning to Rank Methods

Page 37:

Pointwise Approach

• Transforming ranking to regression, classification, or ordinal classification

• Query-document group structure is ignored

Page 38:

Pointwise Approach

Page 39:

Pointwise Approach

Page 40:

Pairwise Approach

• Transforming ranking to pairwise classification

• Query-document group structure is ignored

Page 41:

Pairwise Approach

Page 42:

Listwise Approach

• List as instance

• Query-document group structure is used

• Straightforwardly represents learning to rank problem

Page 43:

Listwise Approach

Page 44:

Evaluation Results

• Pairwise approach and listwise approach perform better than pointwise approach

• LambdaMART performs best in Yahoo Learning to Rank Challenge

• No significant difference among pairwise and listwise methods

Page 45:

Page 46:

1.3. Methods of Learning to Rank

Page 47:

Ranking SVM

Herbrich et al., 1999

Page 48:

Pairwise Classification

• Converting document list to document pairs

[Figure: Doc A, Doc B, and Doc C retrieved for a query are labeled with grades 1, 2, and 3 and then converted into ordered document pairs]

Page 49:

Transforming Ranking to Pairwise Classification

• Input space: $X$

• Ranking function: $f: X \to \mathbb{R}$

• Ranking: $x_i \succ x_j \Leftrightarrow f(x_i; w) > f(x_j; w)$

• Linear ranking function: $f(x; w) = \langle w, x \rangle$, so that $\langle w, x_i - x_j \rangle > 0 \Leftrightarrow f(x_i; w) > f(x_j; w)$

• Transforming to pairwise classification: $(x_i - x_j, z)$, with $z = +1$ if $x_i \succ x_j$ and $z = -1$ if $x_j \succ x_i$

Page 50:

Ranking Problem

[Figure: instances $x_1$, $x_2$, $x_3$ drawn from three grades (grade 1, grade 2, grade 3), ranked by their projection onto the weight vector $w$]

Page 51:

Transformed Pairwise Classification Problem

[Figure: difference vectors separated by the hyperplane of $f(x; w)$]

Positive Examples (+1): $x_1 - x_2$, $x_1 - x_3$, $x_2 - x_3$

Negative Examples (-1): $x_2 - x_1$, $x_3 - x_1$, $x_3 - x_2$ (redundant)

Page 52:

Ranking SVM

• Pairwise classification on differences of feature vectors

• Corresponding positive and negative examples

• Negative examples are redundant and can be discarded

• Hyperplane passes through the origin

• Soft margin and kernel can be used

• Ranking SVM = pairwise classification SVM
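The conversion from one query's graded documents to pairwise classification examples can be sketched as follows. This is a minimal illustration: the function name `to_pairwise` and the list-based feature vectors are assumptions. Each unordered pair of documents with different grades is emitted once as (x_i - x_j, z), so the mirrored, redundant examples are never created.

```python
from itertools import combinations


def to_pairwise(features, grades):
    """Build (difference vector, label) examples for one query.

    features: list of feature vectors (lists of floats), one per document.
    grades:   list of relevance grades, higher means more relevant.
    """
    examples = []
    for i, j in combinations(range(len(features)), 2):
        if grades[i] == grades[j]:
            continue  # equal grades express no preference
        diff = [a - b for a, b in zip(features[i], features[j])]
        z = 1 if grades[i] > grades[j] else -1
        examples.append((diff, z))
    return examples
```

Pairs are formed only within a query, which keeps documents of different queries from being compared with each other.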

Page 53:

Learning of Ranking SVM

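$$\min_{w} \sum_{i=1}^{N} \left[ 1 - y_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \right]_+ + \lambda \|w\|^2$$

$$[s]_+ = \max(0, s), \qquad \lambda = \frac{1}{2C}$$

$$\min_{w, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$$

$$\text{s.t. } y_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N$$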
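Learning a linear Ranking SVM amounts to minimizing a regularized hinge loss over document pairs: the sum over pairs of max(0, 1 - y_i <w, x_i(1) - x_i(2)>) plus λ ||w||^2. The sketch below minimizes this objective by plain subgradient descent rather than by solving the quadratic program. The function names, the tuple layout of `pairs`, and the step size and epoch count are assumptions for illustration.

```python
def ranking_svm_objective(w, pairs, lam):
    """Hinge loss over pairs plus L2 regularization.

    pairs: list of (x1, x2, y) with y = +1 if x1 should rank above x2, else -1.
    """
    hinge = 0.0
    for x1, x2, y in pairs:
        margin = y * sum(wk * (a - b) for wk, a, b in zip(w, x1, x2))
        hinge += max(0.0, 1.0 - margin)
    return hinge + lam * sum(wk * wk for wk in w)


def train_ranking_svm(pairs, dim, lam=0.01, lr=0.1, epochs=200):
    """Minimize the objective with full-batch subgradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * lam * wk for wk in w]  # gradient of the regularizer
        for x1, x2, y in pairs:
            diff = [a - b for a, b in zip(x1, x2)]
            if y * sum(wk * d for wk, d in zip(w, diff)) < 1.0:
                # Pair violates the margin: hinge subgradient is -y * diff.
                for k, d in enumerate(diff):
                    grad[k] -= y * d
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w
```

Documents are then ranked by the learned score <w, x>, with higher scores placed first.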

Page 54:

IR SVM

Cao et al., 2006

Page 55:

Cost-sensitive Pairwise Classification

• Converting to document pairs

[Figure: Doc A, Doc B, and Doc C retrieved for a query are labeled with grades 1, 2, and 3 and converted into document pairs, which are marked as either Critical or Not Critical]

Page 56:

Problems with Ranking SVM

• Not sufficient emphasis on correct ranking on top

– grades: 3, 2, 1

– ranking 1: 2 3 2 1 1 1 1

– ranking 2: 3 2 1 2 1 1 1

– ranking 2 should be better than ranking 1

– Ranking SVM views them as the same

• Numbers of pairs vary according to queries

– q1: 3 2 2 1 1 1 1

– q2: 3 3 2 2 2 1 1 1 1 1

– number of pairs for q1: 2*(3-2) + 4*(3-1) + 8*(2-1) = 14

– number of pairs for q2: 6*(3-2) + 10*(3-1) + 15*(2-1) = 31

– here (a-b) labels the pairs between grade a and grade b; it is not a subtraction

– Ranking SVM is biased toward q2
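The pair counts for q1 and q2 can be reproduced with a short Python sketch: the number of pairs between two grades is the product of the numbers of documents holding those grades. The function name `count_pairs` is illustrative.

```python
from collections import Counter
from itertools import combinations


def count_pairs(grades):
    """Number of document pairs with different grades for one query."""
    counts = Counter(grades)
    return sum(counts[a] * counts[b] for a, b in combinations(counts, 2))


q1 = [3, 2, 2, 1, 1, 1, 1]
q2 = [3, 3, 2, 2, 2, 1, 1, 1, 1, 1]
print(count_pairs(q1), count_pairs(q2))  # 14 31
```

With more than twice as many pairs, q2 contributes more than twice as many terms to the loss as q1, which is the bias the slide describes.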

Page 57:

IR SVM

• Solving the two problems of Ranking SVM

• Higher weight on important grade pairs: $\tau_{k(i)}$

• Normalization weight on pairs in query: $\mu_{q(i)}$

• IR SVM = Ranking SVM using modified hinge loss

Page 58:

Modified Hinge Loss function

$$\min_{w} \sum_{i=1}^{l} \tau_{k(i)} \mu_{q(i)} \left[ 1 - y_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \right]_+ + \lambda \|w\|^2$$

[Figure: loss plotted against $y f(x^{(1)} - x^{(2)})$]

Page 59:

Learning of IR SVM

$$\min_{w} \sum_{i=1}^{l} \tau_{k(i)} \mu_{q(i)} \left[ 1 - y_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \right]_+ + \lambda \|w\|^2$$

$$\min_{w, \xi} \frac{1}{2} \|w\|^2 + \sum_{i=1}^{l} C_i \xi_i$$

$$\text{s.t. } y_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l$$

$$C_i = \frac{\tau_{k(i)} \mu_{q(i)}}{2 \lambda}$$
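The IR SVM objective differs from the Ranking SVM objective only in the two weights placed on each pair's hinge loss: a grade-pair weight τ and a per-query normalization weight μ. A minimal sketch of the loss follows. The function names, the tuple layout of `pairs`, the string keys used for grade-pair types, and the choice μ = 1 / (number of pairs of the query) are assumptions for illustration and are not taken from the slides.

```python
def ir_svm_loss(w, pairs, tau, mu, lam):
    """Weighted hinge loss plus L2 regularization.

    pairs: list of (x1, x2, y, k, q) where k identifies the grade-pair type
           and q the query the pair belongs to (both hypothetical keys).
    tau:   dict k -> weight of that grade-pair type.
    mu:    dict q -> normalization weight of that query.
    """
    total = 0.0
    for x1, x2, y, k, q in pairs:
        margin = y * sum(wk * (a - b) for wk, a, b in zip(w, x1, x2))
        total += tau[k] * mu[q] * max(0.0, 1.0 - margin)
    return total + lam * sum(wk * wk for wk in w)


def query_normalizers(pairs):
    """One possible mu: the reciprocal of each query's pair count (assumption)."""
    counts = {}
    for *_, q in pairs:
        counts[q] = counts.get(q, 0) + 1
    return {q: 1.0 / n for q, n in counts.items()}
```

With this choice of μ, every query contributes the same total weight to the loss regardless of how many pairs it generates.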

Page 60:

AdaRank

Xu and Li, 2007

Page 61:

Listwise Loss

[Figure: for each training query $q_1, \ldots, q_m$, the loss is defined on the whole list of its feature vectors $x_{i,1}, \ldots, x_{i,n_i}$ and labels $y_{i,1}, \ldots, y_{i,n_i}$]

Page 62:

AdaRank

• Optimizing exponential loss function

• Algorithm: AdaBoost-like algorithm for ranking

Page 63:

Loss Function of AdaRank

Any evaluation measure taking values in [-1, +1]

Page 64:

AdaRank Algorithm

Page 65:

Part 2. Learning to Match

Page 66:

2.1. Semantic Matching in Search

Page 67:

Query Document Mismatch is Biggest Challenge in Search

Page 68:

Query Document Mismatch

• Same intent can be represented by different queries (representations)

• Search is still mainly based on term level matching

• Query document mismatch occurs when searcher and author use different representations

Page 69:

Examples of Query Document Mismatch

| Query | Document | Term Matching | Semantic Matching |
| seattle best hotel | seattle best hotels | no | yes |
| pool schedule | swimmingpool schedule | no | yes |
| natural logarithm transformation | logarithm transformation | partial | yes |
| china kong | china hong kong | partial | no |
| why are windows so expensive | why are macs so expensive | partial | no |

Page 70:

Matching at Different Levels

Semantic Matching

Form Phrase Sense Topic Structure

Term Matching

Page 71:

Query Understanding

Input query: michael jordan berkele

• Term (Spelling Error Correction): query form: michael jordan berkeley

• Phrase (Phrase Identification): phrase: michael jordan; phrase: berkeley

• Sense (Similar Query Finding): similar query: michael i. jordan

• Topic (Topic Identification): topic: machine learning, berkeley

• Structure (Structure Identification): main phrase: michael jordan

Page 72:

Document Understanding

Input document: title "Homepage of Michael Jordan", body "Michael Jordan is Professor in the Department of Electrical Engineering ……"

• Phrase (Phrase Identification): phrase: michael jordan, professor, department, electrical engineering

• Key Phrase (Key Phrase Identification): key phrase: michael jordan, professor, electrical engineering

• Topic (Topic Identification): topic: machine learning, berkeley

• Structure (Title Structure Identification): main phrase in title: michael jordan

Page 73:

Semantic Matching

Query Representation

– Query form: michael jordan berkeley

– Similar query: michael i jordan

– Main phrase: michael jordan

– Phrase: michael jordan, berkeley

– Topic: machine learning

Document Representation

– Document: michael jordan homepage

– Main phrase in title: michael jordan

– Key phrase: michael jordan, berkeley

– Phrase: michael jordan, professor, department of electrical engineering

– Topic: machine learning, berkeley

Query representation and document representation are the inputs to Query Document Matching, which is followed by Relevance Ranking

Page 74:

Matching in Different Ways

[Figure: query $q$, document $d$, transformed query $q'$, transformed document $d'$, and a common representation $c$, illustrating four ways of matching]

• Query reformulation

• Document transformation

• Query and document transformation

• No transformation

Page 75:

Web Search System

[Figure: architecture of a web search system connecting the Web and the User, with the components Crawling, Document Understanding, Indexing, Index, User Interface, Query Understanding, Retrieving, Query Document Matching, and Ranking]

Page 76:

Machine Learning for Query Document Matching in Web Search

Page 77:

Learning for Matching between Query and Document

• Learning matching function $f_M(q, d)$ or $p_M(r \mid q, d)$

• Using training data $(q_1, d_1, r_1), \ldots, (q_N, d_N, r_N)$

• $q_1, q_2, \ldots, q_N$ and $d_1, d_2, \ldots, d_N$ can be id’s or feature vectors

• $r_1, r_2, \ldots, r_N$ can be binary or numerical values

• Using relations in data and/or prior knowledge

Page 78:

Long Tail Challenge

• Head pages have rich anchor texts and click data

• Tail queries and pages suffer more from mismatch

• Problem of propagating information and knowledge from head to tail

Page 79:

Relation between Matching and Ranking

• In traditional IR:

– Ranking = matching: $f(q, d) = f_{BM25}(q, d)$ or $f(q, d) = P_{LMIR}(d \mid q)$

• Web search:

– Ranking and matching become separated

– Learning to rank becomes state-of-the-art: $f(q, d) = f_{BM25}(q, d) + g_{PageRank}(d)$

– Matching = feature learning for ranking

Page 80:

Matching vs Ranking

| | Matching | Ranking |
| Prediction | Matching degree between query and document | Ranking list of documents |
| Model | f(q, d) | f(q, d1), f(q, d2), …, f(q, dn) |
| Challenge | Mismatch | Correct ranking on top |

In search, first matching and then ranking

Page 81:

Matching Functions as Features in Learning to Rank

• Term level matching

• Phrase level matching

• Sense level matching

• Topic level matching

• Structure level matching

• Term level matching (spelling, stemming)

Matching functions shown on the slide: $f_{BM25}(q, d)$, $f_P(q, d)$, $f_T(q, d)$, $f_C(q, d)$, $f_S(q, d)$, $q \to q'$, $f_{BM25n}(q, d)$

Page 82:

Page 83:

2.2. Overview of Learning to Match

Page 84:

Matching between Heterogeneous Data is Everywhere

• Matching between user and product (collaborative filtering)

• Matching between text and image (image annotation)

• Matching between people (dating)

• Matching between languages (machine translation)

• Matching between receptor and ligand (drug design)


Page 85: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Formulation of Learning Problem

• Learning matching function: $f(x, y)$

• Training data: $(x_1, y_1, r_1), \ldots, (x_N, y_N, r_N)$

• Generated according to: $x \sim P(X)$, $y \sim P(Y \mid X)$, $r \sim P(R \mid X, Y)$

Page 86: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Formulation of Learning Problem

• Loss Function: $L(r, f(x, y))$

• Risk Function: $R(f) = \int_{X \times Y \times R} L(r, f(x, y)) \, dP(x, y, r)$

• Objective Function in Learning: $\min_{f \in F} \sum_{i=1}^{N} L(r_i, f(x_i, y_i)) + \Omega(f)$
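The objective function above (sum of losses over the training triples plus a regularizer on f) can be sketched numerically. The slides do not fix the form of f, the loss, or the regularizer, so this is only a minimal sketch under assumptions: a bilinear matching function f(x, y) = xᵀWy, squared loss, a squared Frobenius norm as the regularizer, and synthetic data. All names (`empirical_objective`, `gradient`, `lam`) are hypothetical.

```python
import numpy as np

def empirical_objective(W, X, Y, r, lam=0.1):
    """Sum of squared losses L(r_i, f(x_i, y_i)) plus lam * ||W||_F^2 (assumed regularizer)."""
    scores = np.einsum("ij,jk,ik->i", X, W, Y)       # f(x_i, y_i) = x_i^T W y_i
    return np.sum((r - scores) ** 2) + lam * np.sum(W ** 2)

def gradient(W, X, Y, r, lam=0.1):
    """Gradient of the objective above with respect to W."""
    scores = np.einsum("ij,jk,ik->i", X, W, Y)
    return -2.0 * np.einsum("i,ij,ik->jk", r - scores, X, Y) + 2.0 * lam * W

# Synthetic training triples (x_i, y_i, r_i); purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(50, 3))
W_true = rng.normal(size=(4, 3))
r = np.einsum("ij,jk,ik->i", X, W_true, Y)

W = np.zeros((4, 3))
start = empirical_objective(W, X, Y, r)
for _ in range(500):                                  # plain gradient descent
    W -= 0.001 * gradient(W, X, Y, r)
end = empirical_objective(W, X, Y, r)
```

Any other loss or function class can be substituted; only the structure "empirical loss plus regularizer, minimized over f" comes from the slide.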

Page 87: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Matching Problem: Instance Matching

[Figure: bipartite graph with weighted edges between x1, x2, x3, …, xm and y1, y2, y3, …, yn; label: Instances]

Can be represented as matching between nodes in bipartite graph

Page 88: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Matching Problem: Feature Matching

[Figure: bipartite graph with weighted edges between x1, x2, x3, …, xm and y1, y2, y3, …, yn; label: Features]

Can be represented as matching between objects in two spaces

Page 89: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Matching Problem: Structure Matching

[Figure: bipartite graph with weighted edges between x1, x2, x3, …, xm and y1, y2, y3, …, yn; label: Structures]

Page 90: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

2-3. Methods of Learning to Match


Page 91: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Regularized Latent Semantic Indexing

Wang et al. 2011

Page 92: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Regularized Latent Semantic Indexing

• Motivation
– Matching between query and document at topic level
– Scale up to large datasets (vs. existing methods)

• Approach
– Matrix Factorization
– Regularization on topics and documents (vs. Sparse Coding)
– Learning problem can be easily decomposed

• Results
– l1 on topics leads to sparse topics and l2 on documents leads to accurate matching
– Comparable with existing methods in topic discovery and search relevance
– But can easily scale up to large document sets

Page 93: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Query and Document Matching in Topic Space

[Figure: query q and documents d1, d2, …, dn in Document Space are mapped into Topic Space, where q is matched against d1, d2, …, dn]

Page 94: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Regularized Latent Semantic Indexing

[Figure: term-document matrix D (term × document) ≈ U (term × topic) × V (topic × document). Column n of D is the term representation of doc n, the columns of U are the topics, and column n of V is the topic representation of doc n. Topics are sparse; documents are smooth.]

Page 95: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Optimization Strategy

Coordinate Descent

Analytic Solution

Page 96: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

RLSI Algorithm

terms processed in parallel

docs processed in parallel

• Single machine multi-core version
• Multiple machine version (MapReduce and MPI)
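The alternating scheme named on the RLSI slides (coordinate descent for the sparse topics, an analytic solution for the documents) can be sketched as follows. This is a simplified sketch, not the authors' implementation: it assumes the objective ‖D − UV‖² + λ₁‖U‖₁ + λ₂‖V‖², runs a single coordinate-descent pass over the topics per outer iteration, and uses random data. The function names and default values are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t (the update induced by an l1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(D, U, V, lam1, lam2):
    return np.sum((D - U @ V) ** 2) + lam1 * np.abs(U).sum() + lam2 * np.sum(V ** 2)

def rlsi(D, K, lam1=0.5, lam2=0.5, iters=15, seed=0):
    M, N = D.shape
    U = np.random.default_rng(seed).random((M, K))    # term-topic matrix
    history = []
    for _ in range(iters):
        # Documents: ridge regression with an analytic solution, independent per column.
        V = np.linalg.solve(U.T @ U + lam2 * np.eye(K), U.T @ D)
        # Terms: one coordinate-descent pass over topics; rows of U are independent.
        S, R = V @ V.T, D @ V.T
        for k in range(K):
            if S[k, k] == 0.0:
                U[:, k] = 0.0
                continue
            rho = R[:, k] - U @ S[:, k] + U[:, k] * S[k, k]
            U[:, k] = soft_threshold(rho, lam1 / 2.0) / S[k, k]
        history.append(objective(D, U, V, lam1, lam2))
    return U, V, history

D = np.random.default_rng(1).random((30, 20))         # hypothetical term-document matrix
U, V, history = rlsi(D, K=5)
```

Because each row of U and each column of V is updated independently of the others, both loops can be split across cores or machines, which is the property the slide highlights.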

Page 97: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Regularized Matching in Latent Space (RMLS)

Wu et al., 2013


Page 98: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Matching in Latent Space

• Motivation

– Matching between query and document in latent space

• Assumption

– Queries have similarity

– Documents have similarity

– Click-through data represent “similarity” relations between queries and documents

• Approach

– Projection to latent space

– Regularization or constraints

• Results

– Significantly enhance accuracy of query document matching

Page 99: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Matching in Latent Space

[Figure: queries q1, q2, …, qm in Query Space and documents d1, d2, …, dn in Document Space are projected by $L_q$ and $L_d$ into a common Latent Space]

Page 100: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

• Matching between Heterogeneous Data

• Example: Image Annotation

[Figure: images annotated with keywords: hook, fishing, singer, soldier, warrior, microphone]

Page 101: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Projecting Keywords and Images into Latent Space

Page 102: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Partial Least Square (PLS)

• Setting

– Two spaces: $q \in Q$, $d \in D$

• Input

– Training data: $\{(q_i, d_i, c_i)\}_{1 \le i \le N}$

• Output

– Matching function: $f(q, d)$

• Assumption

– Two linear (and orthonormal) transformations: $L_q$, $L_d$

– Dot product as similarity function: $\langle L_q q, L_d d \rangle$

• Optimization

$\arg\max_{L_q, L_d} \sum_{(q_i, d_i)} c_i \langle L_q q_i, L_d d_i \rangle$ s.t. $L_q L_q^T = I$, $L_d L_d^T = I$

Page 103: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Solution of Partial Least Square

• Non-convex optimization

• Global optimal solution exists

• Global optimum can be found by solving SVD (Singular Value Decomposition)
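A minimal sketch of the SVD solution mentioned above, assuming the PLS objective is the weighted sum of dot products in the latent space with orthonormal rows in both projection matrices. Under that assumption the objective equals trace(L_q M L_dᵀ) with M = Σ cᵢ qᵢ dᵢᵀ, and the top-k singular vectors of M attain the maximum. The data and the helper names `pls` and `objective` are hypothetical.

```python
import numpy as np

def pls(Q, D, c, k):
    """Q: N x m query vectors, D: N x n document vectors, c: N weights.
    Returns L_q (k x m), L_d (k x n) and the top-k singular values of M."""
    M = (Q * c[:, None]).T @ D                 # M = sum_i c_i q_i d_i^T, shape m x n
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k].T, Vt[:k, :], s[:k]

def objective(L_q, L_d, Q, D, c):
    """sum_i c_i <L_q q_i, L_d d_i>"""
    return float(np.sum(c * np.sum((Q @ L_q.T) * (D @ L_d.T), axis=1)))

rng = np.random.default_rng(0)
Q = rng.normal(size=(40, 6))                   # synthetic queries
D = rng.normal(size=(40, 5))                   # synthetic documents
c = rng.random(40)                             # synthetic click weights
L_q, L_d, top = pls(Q, D, c, k=2)
best = objective(L_q, L_d, Q, D, c)

# A random pair of orthonormal projections for comparison.
R_q = np.linalg.qr(rng.normal(size=(6, 2)))[0].T
R_d = np.linalg.qr(rng.normal(size=(5, 2)))[0].T
other = objective(R_q, R_d, Q, D, c)
```

The value reached by the SVD solution is the sum of the top-k singular values of M, so no other orthonormal pair can exceed it.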

Page 104: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Regularized Mapping to Latent Space (RMLS)

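• Setting

– Two spaces: $q \in Q$, $d \in D$

• Input

– Training data: $\{(q_i, d_i, c_i)\}_{1 \le i \le N}$

• Output

– Matching function: $f(q, d)$

• Assumption

– Two linear (and sparse) transformations: $L_q$, $L_d$

– Dot product as similarity function: $\langle L_q q, L_d d \rangle$

• Optimization

$\arg\max_{L_q, L_d} \sum_{(q_i, d_i)} c_i \langle L_q q_i, L_d d_i \rangle$ s.t. $|l_q| \le \lambda_q$, $|l_d| \le \lambda_d$, $\|l_q\| \le \theta_q$, $\|l_d\| \le \theta_d$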

Page 105: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Solution of Regularized Mapping to Latent Space (RMLS)

• Coordinate Descent

• Repeat

– Fix $L_q$, update $L_d$

– Fix $L_d$, update $L_q$

• No guarantee to find global optimum

• Updates can be parallelized by rows

Page 106: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Comparison between PLS and RMLS

                    | PLS                          | RMLS
Assumption          | Orthogonal                   | L1 and L2 Regularization
Optimization Method | Singular Value Decomposition | Coordinate Descent
Optimality          | Global optimum               | Local optimum
Efficiency          | Low                          | High
Scalability         | Low                          | High

Page 107: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Part 3. Deep Learning for Information Retrieval


Page 108: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

3-1: Overview of Deep Learning for Information Retrieval

Page 109: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Hard Problems in IR

Key Questions: How to Represent Intent and Content, How to Match Intent and Content

• Question Answering from Knowledge Base

Q: How tall is Yao Ming?

Name      | Height | Weight
Yao Ming  | 2.29m  | 134kg
Liu Xiang | 1.89m  | 85kg

• Image Retrieval

Q: A dog catching a ball

(No tag on images)

• Generation-based Question Answering

Q: How far is sun from earth?

The average distance between the Sun and the Earth is about 92,935,700 miles.

A: It is about 93 million miles

Page 110: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Deep Learning and IR

[Figure: Intent, Matching, Content]

Deep Learning

Recent Progress: Deep Learning Is Particularly Effective for Hard IR Problems

Page 111: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

3-2: Basics of Deep Learning

Page 112: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Word Embedding

Page 113: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Word Embedding

• Motivation: representing words with low-dimensional real-valued vectors, utilizing them as input to deep learning methods, vs one-hot vectors

• Method: SGNS (Skip-Gram with Negative Sampling)

• Tool: Word2Vec

• Input: words and their contexts in documents

• Output: embeddings of words

• Assumption: similar words occur in similar contexts

• Interpretation: factorization of pointwise mutual information matrix

• Advantage: compact representations (usually 100 or more dimensions)

Page 114: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

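Skip-Gram with Negative Sampling (Mikolov et al., 2013)

• Input: co-occurrences between words and contexts

[Figure: sparse word-context co-occurrence count matrix M with rows w1, w2, w3 and columns c1, …, c5]

• Probability model:

$P(D = 1 \mid w, c) = \sigma(\vec{w} \cdot \vec{c}) = \frac{1}{1 + e^{-\vec{w} \cdot \vec{c}}}$

$P(D = 0 \mid w, c) = \sigma(-\vec{w} \cdot \vec{c}) = \frac{1}{1 + e^{\vec{w} \cdot \vec{c}}}$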

Page 115: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Skip-Gram with Negative Sampling

• Word vector and context vector: lower dimensional (parameter) vectors $\vec{w}$, $\vec{c}$

• Goal: learning of the probability model from data

• Take co-occurrence data as positive examples

• Negative sampling: randomly sample k unobserved pairs $(w, c_N)$ as negative examples

• Objective function in learning

$L = \sum_{w} \sum_{c} \#(w, c) \left( \log \sigma(\vec{w} \cdot \vec{c}) + k \cdot E_{c_N \sim P_C} [\log \sigma(-\vec{w} \cdot \vec{c}_N)] \right)$

• Algorithm: stochastic gradient descent
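One stochastic gradient step on the SGNS objective can be sketched as below. This is an illustrative sketch, not the Word2Vec implementation: it assumes dense NumPy arrays for the word and context vectors, takes the negative contexts as an explicit argument instead of sampling them from a noise distribution, and uses a hypothetical toy vocabulary of 3 words and 5 contexts. The function name `sgns_step` is hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgns_step(W, C, w, c, negatives, lr=0.1):
    """One gradient ascent step on log sigma(w.c) + sum over c_N of log sigma(-w.c_N)."""
    w_vec = W[w].copy()
    grad_w = np.zeros_like(w_vec)
    for ctx, label in [(c, 1.0)] + [(n, 0.0) for n in negatives]:
        err = label - sigmoid(w_vec @ C[ctx])   # derivative of the log-likelihood w.r.t. the score
        grad_w += err * C[ctx]
        C[ctx] += lr * err * w_vec              # update the context vector
    W[w] += lr * grad_w                         # update the word vector

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 8))          # word vectors
C = rng.normal(scale=0.1, size=(5, 8))          # context vectors
for _ in range(300):
    # Observed pair (w1, c2) with k = 2 negatives, fixed here to keep the demo deterministic.
    sgns_step(W, C, w=0, c=1, negatives=[2, 3])

p_pos = sigmoid(W[0] @ C[1])                    # P(D = 1 | w, c) for the observed pair
p_neg = [sigmoid(W[0] @ C[n]) for n in (2, 3)]  # P(D = 1 | w, c_N) for the negatives
```

After training, the observed pair receives a probability above 0.5 and the negative pairs fall below it, which is the behaviour the probability model is meant to learn.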

Page 116: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Interpretation as Matrix Factorization (Levy & Goldberg 2014)

• Pointwise Mutual Information Matrix M, with entries $\log \frac{P(w, c)}{P(w) P(c)}$

[Figure: sparse matrix M with rows w1, w2, w3 and columns c1, …, c5, holding PMI values such as 3, -0.5, 2, 1, 1.5]

Page 117: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Interpretation as Matrix Factorization

$M = W C^T$: matrix factorization, equivalent to SGNS

[Figure: the PMI matrix M (rows w1, w2, w3; columns c1, …, c5) is factorized; the factor W (rows w1, w2, w3; columns t1, t2, t3) is the word embedding]
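The factorization view can be illustrated by building a PMI matrix from counts and factorizing it explicitly. This is a sketch under assumptions: the count matrix is hypothetical (the positions of the counts in the slide's figure are not recoverable), unobserved pairs are set to 0 rather than minus infinity, and a truncated SVD stands in for the implicit factorization. Levy & Goldberg show that SGNS implicitly factorizes the PMI matrix shifted by log k; the shift is omitted here.

```python
import math
import numpy as np

def pmi_matrix(counts):
    """Entries log P(w, c) / (P(w) P(c)); unobserved pairs are set to 0 (assumption)."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi

def factorize(M, dim):
    """M ~ W C^T with the singular values split evenly between the two factors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    W = U[:, :dim] * np.sqrt(s[:dim])
    C = Vt[:dim, :].T * np.sqrt(s[:dim])
    return W, C

# Hypothetical co-occurrence counts: 3 words (rows) x 5 contexts (columns).
counts = np.array([[5.0, 1.0, 2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 3.0, 0.0, 1.0]])
M = pmi_matrix(counts)
W, C = factorize(M, dim=3)      # rows of W are the word embeddings
```

With `dim` equal to the number of words the product reproduces M exactly; a smaller `dim` gives the compact, approximate embeddings.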

Page 118: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Recurrent Neural Network

Page 119: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Recurrent Neural Network

• Motivation: representing sequence of words and utilizing the representation in deep learning methods

• Input: sequence of word embeddings, denoting sequence of words (e.g., sentence)

• Output: sequence of internal representations (hidden states)

• Variants: LSTM and GRU, to deal with long distance dependency

• Learning of model: stochastic gradient descent

• Advantage: handling arbitrarily long sequence; can be used as part of deep model for sequence processing (e.g., language modeling)

Page 120: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Recurrent Neural Network (RNN) (Mikolov et al. 2010)

[Figure: RNN unrolled over the sentence "the cat sat on the mat"; at each position the unit takes $x_t$ and $h_{t-1}$ and produces $h_t$]

$h_t = f(h_{t-1}, x_t)$

Page 121: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Recurrent Neural Network

$h_t = f(h_{t-1}, x_t) = \tanh(W_h h_{t-1} + W_x x_t + b_{hx})$

[Figure: neuron with inputs $h_{t-1}$, $x_t$, and bias +1, and output $h_t$]
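The recurrence above can be run directly over a sequence of word embeddings. A minimal sketch, assuming random parameters and random 4-dimensional embeddings standing in for the six words of "the cat sat on the mat"; `rnn_forward` is a hypothetical helper name.

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b, h0=None):
    """Apply h_t = tanh(W_h h_{t-1} + W_x x_t + b) along the sequence xs."""
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x + b)
        states.append(h)
    return np.stack(states)            # one hidden state per input position

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 4))           # six word embeddings (illustrative values)
W_h = rng.normal(size=(3, 3))
W_x = rng.normal(size=(3, 4))
b = np.zeros(3)
states = rnn_forward(xs, W_h, W_x, b)  # shape (6, 3)
```

The same parameters are reused at every position, which is why the network can process sequences of any length.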

Page 122: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

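Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997)

$i_t = \sigma(W_{ih} h_{t-1} + W_{ix} x_t + b_i)$
$f_t = \sigma(W_{fh} h_{t-1} + W_{fx} x_t + b_f)$
$o_t = \sigma(W_{oh} h_{t-1} + W_{ox} x_t + b_o)$
$g_t = \tanh(W_{gh} h_{t-1} + W_{gx} x_t + b_g)$
$c_t = i_t \odot g_t + f_t \odot c_{t-1}$
$h_t = o_t \odot \tanh(c_t)$

• A memory (vector) to store values of previous state
• Input gate, output gate, and forget gate to control the memory
• Gate: element-wise product with vector of values in [0,1]

[Figure: LSTM cell with memory $c_t$ and candidate $g_t$; Input Gate $i_t$, Forget Gate $f_t$, and Output Gate $o_t$ are each computed from $(h_{t-1}, x_t)$; output $h_t$]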
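A single LSTM step following the gate equations on this slide can be sketched as below. The parameter layout (a dictionary keyed by names such as `W_ih` and `b_i`), the initialization, and the sizes are assumptions made for illustration. The demo forces the forget gate open and the input gate closed to show that the memory is then carried over unchanged.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p holds W_{gate}h, W_{gate}x, b_{gate} for gate in i, f, o, g."""
    i = sigmoid(p["W_ih"] @ h_prev + p["W_ix"] @ x + p["b_i"])   # input gate
    f = sigmoid(p["W_fh"] @ h_prev + p["W_fx"] @ x + p["b_f"])   # forget gate
    o = sigmoid(p["W_oh"] @ h_prev + p["W_ox"] @ x + p["b_o"])   # output gate
    g = np.tanh(p["W_gh"] @ h_prev + p["W_gx"] @ x + p["b_g"])   # candidate values
    c = i * g + f * c_prev                                       # new memory
    h = o * np.tanh(c)                                           # new hidden state
    return h, c

def init_params(n_hidden, n_input, rng):
    p = {}
    for gate in "ifog":
        p[f"W_{gate}h"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"W_{gate}x"] = rng.normal(scale=0.1, size=(n_hidden, n_input))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p

rng = np.random.default_rng(0)
p = init_params(3, 4, rng)
p["b_f"][:] = 50.0      # forget gate ~ 1: keep the old memory
p["b_i"][:] = -50.0     # input gate ~ 0: ignore the candidate
x = rng.normal(size=4)
h_prev = np.zeros(3)
c_prev = np.array([0.3, -0.2, 0.7])
h, c = lstm_step(x, h_prev, c_prev, p)
```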

Page 123: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

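Gated Recurrent Unit (GRU) (Cho et al., 2014)

$r_t = \sigma(W_{rh} h_{t-1} + W_{rx} x_t + b_r)$
$z_t = \sigma(W_{zh} h_{t-1} + W_{zx} x_t + b_z)$
$g_t = \tanh(W_{gh} (r_t \odot h_{t-1}) + W_{gx} x_t + b_g)$
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot g_t$

• A memory (vector) to store values of previous state
• Reset gate and update gate to control the memory

[Figure: GRU cell with inputs $x_t$ and $h_{t-1}$; Reset Gate $r_t$ and Update Gate $z_t$ are each computed from $(x_t, h_{t-1})$; candidate $g_t$; output $h_t$]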
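The GRU equations can be sketched in the same way as the LSTM step. The parameter names, sizes, and initialization are assumptions for illustration. The demo sets the update gate close to 1 and then close to 0 to show the two extremes: keeping the previous state and replacing it with the candidate.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU step; p holds W_{gate}h, W_{gate}x, b_{gate} for gate in r, z, g."""
    r = sigmoid(p["W_rh"] @ h_prev + p["W_rx"] @ x + p["b_r"])         # reset gate
    z = sigmoid(p["W_zh"] @ h_prev + p["W_zx"] @ x + p["b_z"])         # update gate
    g = np.tanh(p["W_gh"] @ (r * h_prev) + p["W_gx"] @ x + p["b_g"])   # candidate state
    return z * h_prev + (1.0 - z) * g

def init_params(n_hidden, n_input, rng):
    p = {}
    for gate in "rzg":
        p[f"W_{gate}h"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"W_{gate}x"] = rng.normal(scale=0.1, size=(n_hidden, n_input))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p

rng = np.random.default_rng(0)
p = init_params(3, 4, rng)
x = rng.normal(size=4)
h_prev = np.array([0.9, -0.9, 0.5])

p["b_z"][:] = 50.0                  # update gate ~ 1: previous state is kept
h_keep = gru_step(x, h_prev, p)

p["b_z"][:] = -50.0                 # update gate ~ 0: state is replaced by the candidate
h_new = gru_step(x, h_prev, p)
```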

Page 124: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Recurrent Neural Network Language Model

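Model

$h_t = \tanh(W_h h_{t-1} + W_x x_t + b_{hx})$
$p_t = P(x_t \mid x_1 \cdots x_{t-1}) = \mathrm{softmax}(W h_t + b)$

Objective of Learning

Average log-likelihood: $\frac{1}{T} \sum_{t=1}^{T} \log \hat{p}_t$

[Figure: RNN unit with hidden states $h_{t-1}$, $h_t$, taking $x_{t-1}$ as input and predicting $x_t$]

• Input one sequence and output another
• In training, input sequence is same as output sequence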

Page 125: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Convolutional Neural Network

Page 126: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Convolutional Neural Network

• Motivation: representing sequence of words and utilizing the representation in deep learning methods

• Input: sequence of word embeddings, denoting sequence of words (e.g., sentence)

• Output: representation of input sequence

• Learning of model: stochastic gradient descent

• Advantage: robust extraction of n-gram features; can be used as part of deep model for sequence processing (e.g., sentence classification)

Page 127: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Convolutional Neural Network (CNN) (Kim 2014, Blunsom et al. 2014, Hu et al., 2014)

[Figure: for the sentence "the cat sat on the mat", Convolution is applied to sliding word windows ("the cat sat", "cat sat on", "sat on the", "on the mat"), Max pooling selects among adjacent windows, and the results go through Concatenation]

• Shared parameters on same level
• Fixed length, zero padding

Page 128: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Example: Image Convolution

[Figure: binary image scanned by a filter, with example overlap counts 0, 0, 3, 1. Figure credit: Leow Wee Kheng]

Dark Pixel Value = 1, Light Pixel Value = 0
Dot in Filter = 1, Others = 0

Page 129: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Example: Image Convolution

Feature Map

0 0 0 0 0
0 0 1 1 0
0 1 3 2 0
0 1 3 1 0
0 1 1 0 0

Convolution Operation

• Scanning image with filter having 3*3 cells, among them 3 are dot cells
• Counting number of dark pixels overlapping with dot cells at each position
• Creating feature map (matrix), each element represents similarity between filter pattern and pixel pattern at one position
• Equivalent to extracting feature using the filter
• Translation-invariant
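The counting procedure described on this slide can be sketched in a few lines. The image and filter below are hypothetical (the slide's own image is not recoverable from the transcript): a 4×4 image with a dark diagonal and a 3×3 filter whose 3 dot cells lie on the diagonal. No padding is applied, so the feature map is smaller than the image.

```python
def feature_map(image, filt):
    """Slide filt over image and count dark pixels (1) that overlap dot cells (1)."""
    ih, iw = len(image), len(image[0])
    fh, fw = len(filt), len(filt[0])
    return [
        [
            sum(image[r + i][c + j] * filt[i][j] for i in range(fh) for j in range(fw))
            for c in range(iw - fw + 1)
        ]
        for r in range(ih - fh + 1)
    ]

image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
filt = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
fmap = feature_map(image, filt)   # [[3, 0], [0, 3]]
```

The count reaches 3, the number of dot cells, wherever the diagonal pattern is present, regardless of its position in the image.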

Page 130: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Convolution

[Figure: a filter, acting as a neuron, produces a feature map; the neuron has input $z_i^{(l-1)}$ and bias +1, parameters $w^{(l,f)}$ and $b^{(l,f)}$, and output $z_i^{(l,f)}$]

$z_i^{(l,f)} = \sigma(w^{(l,f)} z_i^{(l-1)} + b^{(l,f)})$, $f = 1, 2, \ldots, F_l$

$z_i^{(l,f)}$ is output of neuron of type $f$ for location $i$ in layer $l$
$w^{(l,f)}, b^{(l,f)}$ are parameters of neuron of type $f$ in layer $l$
$\sigma$ is sigmoid function
$z_i^{(l-1)}$ is input of neuron for location $i$ from layer $l-1$
$z_i^{(0)}$ is input from concatenated word vectors for location $i$: $z_i^{(0)} = [x_i^T, x_{i+1}^T, \ldots, x_{i+h-1}^T]^T$

Equivalent to n-gram feature extraction at each position

Page 131: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Max Pooling

$z_i^{(l,f)} = \max(z_{2i-1}^{(l-1,f)}, z_{2i}^{(l-1,f)})$

$z_i^{(l,f)}$ is output of pooling of type $f$ for location $i$ in layer $l$
$z_{2i-1}^{(l-1,f)}, z_{2i}^{(l-1,f)}$ are input of pooling of type $f$ for location $i$ in layer $l$

Equivalent to n-gram feature selection
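The first convolution layer over word windows and the pairwise max pooling can be sketched together. This is a minimal sketch with assumed shapes: T word vectors of dimension d, window size h, F filter types, and zero padding at the end of the sentence so that every position has a full window. The helper names `conv_layer` and `max_pool` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_layer(X, W, b, h):
    """X: T x d word vectors, W: F x (h*d) filters, b: F biases.
    Returns a T x F array: one feature per filter type and location."""
    T, d = X.shape
    padded = np.vstack([X, np.zeros((h - 1, d))])                         # zero padding
    windows = np.stack([padded[i:i + h].reshape(-1) for i in range(T)])   # concatenated word vectors
    return sigmoid(windows @ W.T + b)

def max_pool(Z):
    """Element-wise max over adjacent locations (2i-1, 2i), per filter type."""
    T = Z.shape[0] - Z.shape[0] % 2       # drop a trailing unpaired location
    return np.maximum(Z[0:T:2], Z[1:T:2])

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))               # six words, 4-dimensional embeddings (illustrative)
W = rng.normal(size=(5, 3 * 4))           # F = 5 filter types over windows of h = 3 words
b = np.zeros(5)
Z = conv_layer(X, W, b, h=3)              # shape (6, 5)
P = max_pool(Z)                           # shape (3, 5)

toy = max_pool(np.array([[1, 5], [3, 2], [0, 0], [4, 1]]))   # [[3, 5], [4, 1]]
```

Each row of `Z` scores one word window against every filter, and pooling keeps the stronger of two adjacent windows, which matches the slides' reading of convolution as n-gram feature extraction and pooling as n-gram feature selection.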

Page 132: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Sentence Classification Using Convolutional Neural Network

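$y = f(x) = \mathrm{softmax}(W z + b)$
$z = \mathrm{CNN}(x)$

[Figure: network from input $x$ to output $y$ built from Convolution, Max Pooling, and Concatenation layers; the final representation is $z^{(L)}$]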

Page 133: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

3-3. Methods of Deep Learning for Information Retrieval

Page 134: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Retrieval based Question Answering

Hu et al. 2014

Ji et al. 2014

Page 135: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Retrieval-based Question Answering

Q: What is the population of Hong Kong?
A: It is 7.18 million as in 2013.

Q: How many people are there in Hong Kong?
A: There are about 7 million.

Learning System

Question Answering System

Q: Do you know Hong Kong’s population?
A: There are about 7 million.

Page 136: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Retrieval based Question Answering System

[Figure: Online: Question → Retrieval (over Index of Questions and Answers) → Retrieved Questions and Answers → Matching → Matched Answers → Ranking → Ranked Answers → Best Answer. Offline: Matching Models and Ranking Model]

Page 137: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Deep Match CNN - Architecture I

• First represent two sentences as vectors, and then match the vectors

[Figure: Sentence X and Sentence Y are each encoded into a vector, and an MLP matches the two vectors]

Page 138: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Deep Match CNN - Architecture II

• Represent and match two sentences simultaneously

• Two dimensional model

[Figure: layers from bottom to top: Sentence X and Sentence Y, 1D Convolution, Max-Pooling, 2D Convolution, More 2D Convolution & Pooling, MLP, Matching Degree]

Page 139: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Generation based Question Answering

Shang et al. 2015

Page 140: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Generation-based Question Answering

Q: What is the population of Hong Kong?
A: It is 7.18 million as in 2013.

Q: How many people are there in Hong Kong?
A: There are about 7 million.

Learning System

Question Answering System

Q: Do you know Hong Kong’s population?
A: It is 7 million

Page 141: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Neural Responding Machine

• Encoding questions to internal representations

• Decoding internal representations to answers

• Using GRU

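[Figure: the Encoder reads the Question $\mathbf{x} = (x_1, x_2, \ldots, x_T)$ into hidden representations $h$; the Context Generator produces the context $c$; the Decoder generates the Answer $\mathbf{y} = (y_1, y_2, \ldots, y_t)$]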

Page 142: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Decoder

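[Figure: decoder chain with context vectors $c_1, c_2, \ldots, c_t, \ldots, c_{T'}$, hidden states $s_1, \ldots, s_{t-1}, s_t, \ldots, s_{T'}$, and outputs $y_1, \ldots, y_{t-1}, y_t, \ldots, y_{T'}$]

$P(y_t \mid y_{t-1}, \ldots, y_1, \mathbf{x}) = g(y_{t-1}, s_t, c_t)$
$s_t = f(y_{t-1}, s_{t-1}, c_t)$

$y_t$ is one-hot vector
$s_t$ is hidden state of decoder
$c_t$ is context vector
$g(\cdot)$ is softmax function, $f(\cdot)$ is GRU

Similar to attention mechanism in RNN Encoder-Decoder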

Page 143: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Encoder

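[Figure: Local Encoder and Global Encoder over inputs $x_1, \ldots, x_{t-1}, x_t, \ldots, x_T$ with hidden states $h_1, \ldots, h_{t-1}, h_t, \ldots, h_T$; the context vector $c_t$ is computed from the hidden states and the decoder state $s_{t-1}$]

$h_t = f(x_t, h_{t-1})$

$x_t$ is word embedding
$h_t$ is hidden state of encoder
$f(\cdot)$ is GRU

Combination of global and local encoders

$c_t = \sum_{j=1}^{T} \alpha_{tj} [h_j^l : h_T^g]$, $\alpha_{tj} = q(h_j, s_{t-1})$

$c_t$ is context vector, $\alpha_{tj}$ is weight
$[h_j^l : h_T^g]$ is concatenation of local and global hidden states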
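The context vector that combines the local and global encoders can be sketched as a weighted sum over positions. The slides do not specify the weighting function q, so this sketch assumes a bilinear score followed by a softmax; the matrix `A`, the sizes, and the random hidden states are hypothetical.

```python
import numpy as np

def context_vector(H_local, h_global, s_prev, A):
    """H_local: T x n local hidden states, h_global: final global hidden state (n,),
    s_prev: previous decoder state, A: assumed bilinear scoring matrix (2n x m)."""
    T = H_local.shape[0]
    H = np.hstack([H_local, np.tile(h_global, (T, 1))])   # row j is [h_j^l : h_T^g]
    scores = H @ A @ s_prev                               # stand-in for q(h_j, s_{t-1})
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                  # weights alpha_tj sum to 1
    return alpha @ H, alpha                               # c_t and the weights

rng = np.random.default_rng(0)
H_local = rng.normal(size=(5, 3))    # five encoder positions
h_global = rng.normal(size=3)
s_prev = rng.normal(size=4)
A = rng.normal(size=(6, 4))
c_t, alpha = context_vector(H_local, h_global, s_prev, A)
```

Because the weights sum to 1, the global half of `c_t` is always the global state itself, while the local half shifts toward the positions that score highest against the decoder state.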

Page 144: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Question Answering from Relational Database

Yin et al. 2016

Page 145: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Question Answering from Relational Database

Relational Database

year | city    | #_days | #_medals
2000 | Sydney  | 20     | 2,000
2004 | Athens  | 35     | 1,500
2008 | Beijing | 30     | 2,500
2012 | London  | 40     | 2,300

Q: How many people participated in the game in Beijing?
A: 4,200
SQL: select #_participants, where city=beijing

Q: When was the latest game hosted?
A: 2012
SQL: argmax(city, year)

Learning System

Question Answering System

Q: Which city hosted the longest Olympic game before the game in Beijing?
A: Athens

Page 146: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Neural Enquirer

• Query Encoder: encoding query
• Table Encoder: encoding entries in table
• Five Executors: executing query against table

Conducting matching between question and database entries multiple times

Page 147: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Query Encoder and Table Encoder

• Creating query embedding using RNN
• Creating table embedding for each entry using DNN

[Figure: Query Encoder: Query → RNN → Query Representation. Table Encoder: Field and Value → DNN → Entry Representation → Table Representation]

Page 148: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Executors

• Five layers; each layer except the last has reader, annotator, and memory
• Reader fetches important representation for each row, e.g., city=beijing
• Annotator encodes result representation for each row, e.g., row where city=beijing

Select #_participants where city = beijing

Page 149: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Question Answering from Knowledge Graph

Yin et al. 2016

Page 150: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Question Answering from Knowledge Graph

Knowledge Graph

(Yao-Ming, spouse, Ye-Li)
(Yao-Ming, born, Shanghai)
(Yao-Ming, height, 2.29m)
… …
(Ludwig van Beethoven, place of birth, Germany)
… …

Q: How tall is Yao Ming?
A: He is 2.29m tall and is visible from space.
(Yao Ming, height, 2.29m)

Q: Which country was Beethoven from?
A: He was born in what is now Germany.
(Ludwig van Beethoven, place of birth, Germany)

Learning System

Question Answering System

Q: How tall is Liu Xiang?
A: He is 1.89m tall

Answer is generated

Page 151: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

GenQA

• Interpreter: creates representation of question using RNN

• Enquirer: retrieves top k triples with highest matching scores using CNN model

• Generator: generates answer based on question and retrieved triples using attention-based RNN

• Attention model: controls generation of answer

[Figure: the question "How tall is Yao Ming?" goes to the Interpreter, whose output is held in Short Term Memory; the Enquirer consults Long Term Memory (Knowledge Base); the Generator, controlled by the Attention Model, outputs "He is 2.29m tall"]

Key idea:
• Generation of answer based on question and retrieved result
• Combination of neural processing and symbolic processing

Page 152: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Enquirer: Retrieval and Matching

• Retaining both symbolic representations and vector representations
• Using question words to retrieve top k triples
• Calculating matching scores between question and triples using CNN model
• Finding best matched triples

[Figure: the question (how, tall, is, liu, xiang) and its embedding are matched against the retrieved top k triples and their embeddings: <liu xiang, height, 1.90m>, <yao ming, height, 2.26m>, … …, <liu xiang, birth place, shanghai>]

Page 153: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Generator: Answer Generation

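• Generating answer using attention mechanism
• At each position, a variable decides whether to generate a word or use the object of top triple

[Figure: for the question "How tall is Yao Ming ?", the decoder with states $s_2$, $s_3$ and context $c_3$ has produced "He is"; at position 3 the variable $z_3$ ($z_3 = 0$ or $z_3 = 1$) chooses between generating a word such as "tall" and using the object "2.29m" of the top triple <yao ming, height, 2.29m> to produce $y_3$]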

Page 154: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Image Retrieval

Ma et al. 2015

Page 155: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Image Retrieval

a lady in a car

a man holds a cell phone

two ladies are chatting

Learning System

Retrieval System

Having dinner with friends in restaurant

Image Index

Page 156: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Multimodal CNN

• Represent text and image as vectors and then match the two vectors

• Word-level matching, phrase-level matching, sentence-level matching

• CNN model works better than RNN models (state of the art) for text

A dog is catching a ball

CNN CNN

MLP

Page 157: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Sentence-level Matching

• Combining image vector and sentence vector

……

CNN

MLP

a dog is catching a ball

Page 158: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Word-level Matching Model

• Adding image vector to word vectors

……

a dog is catching a ball

CNN

MLP

Page 159: Deep Learning for Information Retrieval - Hang Li...star wars the force awakens reviews Star Wars: Episode VII Three decades after the defeat of the Galactic Empire, a new threat arises

Summary

Summary

• Learning to Rank
– Approaches: pointwise, pairwise, listwise approaches
– Methods: Ranking SVM, IR SVM, AdaRank

• Learning to Match
– Semantic Matching
– Methods: RLSI, RMLS

• Deep Learning for Information Retrieval
– Deep learning is effective for hard IR problems
– Basic tools: word embedding, CNN, RNN
– Methods: Deep Match CNN, NRM, GenQA, Neural Enquirer, Multimodal CNN

References

• Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li. Neural Generative Question Answering. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16), 2972-2978, 2016.

• Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao. Neural Enquirer: Learning to Query Tables with Natural Language. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16), 2308-2314, 2016.

• Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li. Multimodal Convolutional Neural Networks for Matching Image and Sentence. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2623-2631, 2015.

• Lifeng Shang, Zhengdong Lu, Hang Li. Neural Responding Machine for Short Text Conversation. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP'15), 2015.

• Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen. Convolutional Neural Network Architectures for Matching Natural Language Sentences. Proceedings of Advances in Neural Information Processing Systems 27 (NIPS'14), 2042-2050, 2014.

• Hang Li, Jun Xu. Semantic Matching in Search. Foundations and Trends in Information Retrieval, Now Publishers, 2014.

• Wei Wu, Zhengdong Lu, Hang Li. Learning Bilinear Model for Matching Queries and Documents. Journal of Machine Learning Research (JMLR), 14: 2519-2548, 2013.

References

• Quan Wang, Jun Xu, Hang Li, Nick Craswell. Regularized Latent Semantic Indexing: A New Approach to Large Scale Topic Modeling. ACM Transactions on Information Systems (TOIS), 31(1): 5, 2013.

• Hang Li. A Short Introduction to Learning to Rank. IEICE Transactions on Information and Systems, E94-(10), 2011.

• Quan Wang, Jun Xu, Hang Li, Nick Craswell. Regularized Latent Semantic Indexing. Proceedings of the 34th Annual International ACM SIGIR Conference (SIGIR’11), 685-694, 2011.

• Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing. Synthesis Lectures on Human Language Technology, Lecture 12, Morgan & Claypool Publishers, 2011.

• Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li. Learning to Rank: From Pairwise Approach to Listwise Approach. Proceedings of the 24th International Conference on Machine Learning (ICML’07), 129-136, 2007.

• Jun Xu and Hang Li. AdaRank: A Boosting Algorithm for Information Retrieval. Proceedings of the 30th Annual International ACM SIGIR Conference (SIGIR’07), 391-398, 2007.

• Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon. Adapting Ranking SVM to Document Retrieval. Proceedings of the 29th Annual International ACM SIGIR Conference (SIGIR’06), 186-193, 2006.

Thank you!