Statistic Models for Web/Sponsored Search Click Log Analysis The Chinese University of Hong Kong 1 Some slides are revised from Mr Guo Fan’s tutorial at CIKM 2009.


Statistic Models for Web/Sponsored Search Click Log Analysis

The Chinese University of Hong Kong


Some slides are revised from Mr Guo Fan’s tutorial at CIKM 2009.


Index

• Background.
• A Simple Click Model.
  – Dependent click model [WSDM09].
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model (BBM) [Liu09].
  – Click chain model (CCM) [Guo09].
• Course Project.

Scenario: Web Search


User Click Log

[Figure: number of clicks on the results at positions 1 to 5: 36, 23, 18, 11, and 36.]

Eye-tracking User Study

• Users are biased toward examining the top results.

Position-bias Identification


• Higher positions receive more user attention (eye fixation) and clicks than lower positions.

• This is true even in the extreme setting where the order of positions is reversed.

• “Clicks are informative but biased”.

[Figure: two panels, "Normal Position" and "Reversed Impression", each with a "Percentage" axis.] [Joachims07]

Answer to Previous Example

• Result 5 is more relevant than Result 1, because Result 5 has less opportunity to be examined.

[Figure: the click log from the example, with 36, 23, 18, 11, and 36 at positions 1 to 5.]

Click Model Motivation

• Model the user's click behavior in an interpretable manner and estimate the pure relevance of a query-document/ad pair regardless of bias.
  – Position bias is the main problem.
  – Other kinds of bias:
    • Influence among documents/ads
    • Attractiveness bias
    • Search intent bias
    • …
• Intuition for the pure relevance of a query-document/ad pair:
  – When the query is submitted to the search engine and only a single document/ad is shown, what is the click-through rate of this query-document/ad pair?

Examination Hypothesis [Richardson07]

• A document must be examined before a click.
• The probability of a click conditioned on being examined depends on the pure relevance of the query-document/ad pair.
• The click probability can be decomposed.
  – Global component: the examination probability, which reflects the position bias.
  – Local component (pure relevance): the click probability of the (query, URL) pair conditioned on being examined.
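As a worked formula, the decomposition can be written as follows. The symbols $E_i$ (examination at position $i$) and $C_i$ (click at position $i$) are an assumption here, borrowed from the notation used for the click models later in the deck; the slide itself states the decomposition only in words.

$$
p(C_i = 1) = \underbrace{p(E_i = 1)}_{\text{global: position bias}} \cdot \underbrace{p(C_i = 1 \mid E_i = 1)}_{\text{local: pure relevance}}
$$

For example, a result that is examined in half of its impressions and has pure relevance 0.4 would show a click-through rate of $0.5 \times 0.4 = 0.2$. These numbers are purely illustrative.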


Click Models

• Key tasks.
  – How to design the user examination behavior?
  – How to estimate the relevance of a query-doc/ad pair?
• Desired properties.
  – Effective: aware of position bias and other biases, and addresses them properly.
  – Scalable: linear complexity in both time and space, easy to parallelize.
  – Incremental: flexible for model updates based on new data.

From this slide onward, “relevance” means “pure relevance”.

Importance of Understanding Logs

• Better matching of queries and documents/ads.
• All the participants would benefit.
  – Users: better relevance.
  – Search engines: more revenue from advertisers and more users.
  – Advertisers: more return on investment (ROI).

[Figure: Advertiser, User, and Publisher connected by "Better Match".]

Growth of Web Users


Growth of Web Revenue


Index

• Background.
• A Simple Click Model.
  – Dependent Click Model [WSDM09].
• Advanced Design.
• Advanced Estimation.
• Projects.

Notations

– $E_i$: binary r.v. for the examination event at position $i$;
– $C_i$: binary r.v. for the click event at position $i$;
– $r_i = p(C_i = 1 \mid E_i = 1)$: relevance of the query-document pair at position $i$.

Click Model Design

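$$
\begin{aligned}
p(E_1 = 1) &= 1 \\
p(C_i = 1 \mid E_i = 0) &= 0 \\
p(C_i = 1 \mid E_i = 1) &= r_i \\
p(E_{i+1} = 1 \mid E_i = 0) &= 0 \\
p(E_{i+1} = 1 \mid C_i = 0, E_i = 1) &= 1 \\
p(E_{i+1} = 1 \mid C_i = 1, E_i = 1) &= \lambda_i
\end{aligned}
$$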

Dependent Click Model (DCM) [GUO09]
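The DCM rules can be read as a generative process: the user starts at position 1, clicks an examined result with probability r_i, always moves on after a non-click, and moves on after a click with probability λ_i. The following is a minimal sketch of that process, not code from the original tutorial; the function name `simulate_dcm_session` and the parameter values are hypothetical.

```python
import random

def simulate_dcm_session(relevances, lambdas, rng):
    """Sample one click vector under the DCM rules.

    relevances[i] is r_i and lambdas[i] is lambda_i for position i + 1.
    """
    clicks = []
    examined = True  # p(E_1 = 1) = 1
    for r, lam in zip(relevances, lambdas):
        if not examined:
            clicks.append(0)  # p(C_i = 1 | E_i = 0) = 0
            continue
        clicked = rng.random() < r  # p(C_i = 1 | E_i = 1) = r_i
        clicks.append(int(clicked))
        if clicked:
            # After a click, the next position is examined with probability lambda_i.
            examined = rng.random() < lam
        # After a non-click, the next position is always examined.
    return clicks

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
# Hypothetical parameters for a four-result page.
print(simulate_dcm_session([0.6, 0.3, 0.2, 0.1], [0.7, 0.5, 0.4, 0.3], rng))
```

Setting every λ_i to 0 makes the user stop at the first click, which is a quick way to check that the examination chain behaves as the rules describe.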


Parameters in DCM

• $r = p(C = 1 \mid E = 1)$ is a local parameter.
  – It models the relevance of a query-document/ad pair.
  – The position bias has been modeled by $p(E = 1)$.
• $\lambda$ is a global parameter.
  – It models $p(E_{i+1} = 1 \mid C_i = 1, E_i = 1)$.

Parameter estimation: maximum log-likelihood method.

Estimation of r: Step 1

• Define $l$ as the last clicked position.
• When there is no click, $l$ is the last position.

Query: cikm (clicks at positions 2 and 5, so $l = 5$)

Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     1
6    Ir.iit.edu/cikm2004  0

Query: cikm (no click, so $l = 6$)

Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Estimation of r: Step 2

• Log-likelihood of a query session.

$$
\begin{aligned}
L_{DCM} &= \sum_{i=1}^{l-1} \bigl( C_i (\log r_i + \log \lambda_i) + (1 - C_i) \log(1 - r_i) \bigr) \\
&\quad + C_l \log r_l + (1 - C_l) \log(1 - r_l) \\
&\quad + \log\Bigl(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M} (1 - r_j)\Bigr) \\
&\ge \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
$$

Estimation of r: Step 3

• By maximizing the lower bound of the log-likelihood, we have

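$$
L_{DCM} \ge \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
$$

$$
\frac{\partial L_{DCM}^{All}}{\partial r} = \frac{M}{r} - \frac{N}{1 - r} = 0
\quad\Rightarrow\quad
r = \frac{M}{M + N} = \frac{\#\text{click}}{\#\text{impression before or on position } l}
$$

Suppose the current pair has occurred in different sessions. In M sessions, it occurs before or on position l and is clicked; in N sessions, it occurs before or on position l and is not clicked.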
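The closed form for r reduces to two counters per query-URL pair, which can be sketched as below. This is an illustrative sketch rather than the authors' implementation; the function names and the three toy sessions are hypothetical.

```python
from collections import defaultdict

def last_click_position(clicks):
    """Return l (1-based): the last clicked position, or the last position if no click."""
    clicked = [i for i, c in enumerate(clicks, start=1) if c == 1]
    return clicked[-1] if clicked else len(clicks)

def estimate_relevance(sessions):
    """sessions: list of (urls, clicks) pairs for a single query.

    Returns r = #click / #impression before or on position l for each URL.
    """
    click_count = defaultdict(int)       # M in the text above
    impression_count = defaultdict(int)  # M + N in the text above
    for urls, clicks in sessions:
        l_pos = last_click_position(clicks)
        # Only impressions at positions 1..l count; later ones are ignored.
        for url, c in list(zip(urls, clicks))[:l_pos]:
            impression_count[url] += 1
            click_count[url] += c
    return {url: click_count[url] / impression_count[url] for url in impression_count}

# Hypothetical sessions for one query.
sessions = [
    (["a", "b", "c"], [0, 1, 0]),
    (["a", "c", "b"], [0, 0, 0]),
    (["b", "a", "c"], [1, 0, 1]),
]
relevance = estimate_relevance(sessions)
print(relevance)
```

In the first toy session, URL "c" sits below the last click and therefore contributes neither a click nor an impression, which is how the estimator discounts results the user probably never examined.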


Estimation of λ

• For a specific position $i$, by maximizing the lower bound of the log-likelihood, we have

$$
L_{DCM} \ge \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
$$

$$
\frac{\partial L_{DCM}^{All}}{\partial \lambda_i} = \frac{B}{\lambda_i} - \frac{C}{1 - \lambda_i} = 0
\quad\Rightarrow\quad
\lambda_i = \frac{B}{B + C} = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
$$

Suppose there are A sessions in total. In B sessions, position i is clicked and the last clicked position l is larger than i. In C sessions, l is exactly equal to i. All other cases fall in the remaining A − B − C sessions.
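The estimate of λ_i likewise needs only the counts B and C per position. A minimal sketch follows, assuming click vectors are given as 0/1 lists; the function name and the sample click vectors are hypothetical.

```python
from collections import defaultdict

def estimate_lambdas(click_vectors):
    """Return lambda_i = B / (B + C) for every position i that was clicked at least once.

    B counts sessions where position i is clicked and a later position is also clicked.
    C counts sessions where position i is the last clicked position.
    """
    clicked_not_last = defaultdict(int)  # B in the text above
    clicked_last = defaultdict(int)      # C in the text above
    for clicks in click_vectors:
        clicked = [i for i, c in enumerate(clicks, start=1) if c == 1]
        for i in clicked[:-1]:
            clicked_not_last[i] += 1
        if clicked:
            clicked_last[clicked[-1]] += 1
    positions = set(clicked_not_last) | set(clicked_last)
    return {
        i: clicked_not_last[i] / (clicked_not_last[i] + clicked_last[i])
        for i in sorted(positions)
    }

# Hypothetical click vectors over six positions.
click_vectors = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
lambdas = estimate_lambdas(click_vectors)
print(lambdas)
```

Sessions without any click contribute to neither count, matching the "other cases" in the text. Positions that are never clicked get no estimate in this sketch.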


Property Verification

• Effective.

• Scalable and Incremental.

$$
r = \frac{\#\text{click}}{\#\text{impression before or on position } l}
$$

$$
\lambda_i = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
$$

Evaluation Criteria for DCM

• Log-likelihood.
  – Given the document impressions in the test set.
  – Compute the chance of recovering the entire click vector.
  – Averaged over different query sessions.
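To make this criterion concrete, the sketch below evaluates the exact DCM log-likelihood of a click vector (the expression before the lower bound is taken) and averages it over sessions. It is an illustration under stated assumptions: all r and λ values lie strictly between 0 and 1, each session carries its own per-position relevances, and the function names are hypothetical.

```python
import math

def dcm_session_log_likelihood(clicks, relevances, lambdas):
    """Exact DCM log-likelihood of one click vector (0-based lists of equal length)."""
    m = len(clicks)
    clicked = [i for i, c in enumerate(clicks) if c == 1]
    last = clicked[-1] if clicked else m - 1  # l: last click, or last position
    ll = 0.0
    for i in range(last):
        if clicks[i]:
            ll += math.log(relevances[i]) + math.log(lambdas[i])
        else:
            ll += math.log(1.0 - relevances[i])
    ll += math.log(relevances[last]) if clicks[last] else math.log(1.0 - relevances[last])
    # Either the user stops after l, or continues and clicks nothing below l.
    no_click_below = math.prod(1.0 - relevances[j] for j in range(last + 1, m))
    ll += math.log(1.0 - lambdas[last] + lambdas[last] * no_click_below)
    return ll

def average_log_likelihood(sessions, lambdas):
    """sessions: list of (clicks, relevances) pairs from the test set."""
    total = sum(dcm_session_log_likelihood(c, r, lambdas) for c, r in sessions)
    return total / len(sessions)

# Hypothetical test set with two sessions over three positions.
test_sessions = [([0, 1, 0], [0.5, 0.5, 0.5]), ([0, 0, 0], [0.5, 0.5, 0.5])]
print(average_log_likelihood(test_sessions, [0.5, 0.5, 0.5]))
```

A value closer to 0 means the model assigns higher probability to the observed click vectors.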


Experimental Result for DCM


Some Other Evaluations

• Log-likelihood.
  – http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
• Perplexity.
  – http://en.wikipedia.org/wiki/Perplexity
• Root mean square error (RMSE).
  – http://en.wikipedia.org/wiki/Root-mean-square_deviation
• Area under ROC curve.
  – http://en.wikipedia.org/wiki/Receiver_operating_characteristic

Index

• Background.
• A Simple Click Model.
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
• Project.

1 Dependency from Previous Docs/Ads

• Does position 4 have the same chance of being examined in the following two cases?

• Intuitively, it has less chance in the left case, since the user may find the URL he/she wants at position 2 and stop the session.

Left case. Query: cikm

Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Right case. Query: cikm

Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         1
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Solution: Click Chain Model [Guo09]

• The chance of being examined depends on the relevance of the previous documents/ads.

• Other similar work includes [Dupret08][Liu09].

2 Perceived vs. Actual Relevance

• After a doc/ad is clicked, its actual relevance, judged from the landing page, might differ from the user's perceived relevance.

[Figure: the query "Pizza" with Ad1 and Ad2, shown before examination and after examination.]

Solution: Dynamic Bayesian Network [Chapelle09]

• For each ad, two kinds of relevance are defined: perceived relevance r and actual relevance s. The actual relevance s influences the examination probability of the subsequent docs/ads.

3 Aggregate vs. Instance Relevance

• Users might have different intents for the same query.
• The click event could indicate the intent.

[Figure: the query "Canon" with Ad1 and Ad2, shown for the aggregate search (e.g., learn the parameters) and for an instance search (e.g., buy a camera).]

Solution: Joint Relevance Examination Model [Srikant10]

• Add a correction factor, which is determined by the click events of other docs/ads.
• Other similar work includes [Hu11].

4 Competing Influence in Docs/Ads

• When a doc/ad co-occurs with a highly relevant doc/ad, its perceived relevance decreases.

Solution: Temporal Click Model [Xu10]

• The docs/ads compete for the priority to be examined.

5 Incorporating Features

• Feature example: dwell time.

Solution: Post-Clicked Click Model [Zhong 10]

• Incorporate features to determine the relevance.
• Other similar work includes [Zhu 10].

Index

• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.

Limitation of Maximum Log-likelihood

• It does not readily satisfy the scalable and incremental properties.
  – A closed-form solution is difficult to obtain when the model is complex.
  – Even for DCM, as shown on this page, we need to approximate with a lower bound for easy calculation.
• No prior information can be utilized in such a sparse-data environment.

Log-likelihood of DCM

$$
\begin{aligned}
L_{DCM} &= \sum_{i=1}^{l-1} \bigl( C_i (\log r_i + \log \lambda_i) + (1 - C_i) \log(1 - r_i) \bigr) \\
&\quad + C_l \log r_l + (1 - C_l) \log(1 - r_l) \\
&\quad + \log\Bigl(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M} (1 - r_j)\Bigr) \\
&\ge \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
$$

A Coin-Toss Example for the Bayesian Framework

• Scenario: estimate the probability of tossing a head from the following five training samples.

• The probability is a variable X = x.
• Each training sample is denoted by $C_i$, e.g., $C_1 = 1$, $C_4 = 0$.

• According to the Bayesian rule, we have

$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$

Bayesian Estimation of Coin-tossing

[Figure: $X$ with the five samples $C_1$, $C_2$, $C_3$, $C_4$, $C_5$.]

Bayesian rule:

$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$

Uniform prior:

$$
p(x) = 1
$$

Independent sampling:

$$
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x) = \prod_{i=1}^{5} x^{C_i} (1 - x)^{1 - C_i}
$$

Distribution:

$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} x^{C_i} (1 - x)^{1 - C_i}
$$

Estimation:

$$
E(X \mid C_{1:5})
$$

Density Function Update of Coin-tossing

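[Figure: the density function (not normalized), updated from prior to posterior.]

Density function (not normalized) after each of the five samples:

$x^1(1-x)^0$, $x^2(1-x)^0$, $x^3(1-x)^0$, $x^3(1-x)^1$, $x^4(1-x)^1$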
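The density updates can be reproduced with two counters, since each sample only raises the exponent of x or of (1 − x). The sketch below assumes the five samples are 1, 1, 1, 0, 1, which is what the listed densities imply. It uses the standard fact that under a uniform prior the normalized posterior is a Beta(heads + 1, tails + 1) distribution. The function names are hypothetical.

```python
from fractions import Fraction

def posterior_exponents(samples):
    """Return (heads, tails): the exponents of x and (1 - x) in the unnormalized posterior."""
    heads = sum(samples)
    return heads, len(samples) - heads

def posterior_mean(samples):
    """E(X | samples) under a uniform prior, i.e. the mean of Beta(heads + 1, tails + 1)."""
    heads, tails = posterior_exponents(samples)
    return Fraction(heads + 1, heads + tails + 2)

samples = [1, 1, 1, 0, 1]  # assumed from the density updates: C_4 = 0, all others 1
for n in range(1, len(samples) + 1):
    print(posterior_exponents(samples[:n]))  # (1, 0), (2, 0), (3, 0), (3, 1), (4, 1)
print(posterior_mean(samples))  # 5/7
```

The estimate 5/7 is lower than the maximum-likelihood value 4/5 because the uniform prior acts like one extra head and one extra tail.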


Click Data Scenario

[Figure: a query and the documents (a to g) shown in its sessions.]

Bayesian rule:

$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$

Uniform prior:

$$
p(x) = 1
$$

Independent sampling:

$$
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x)
$$

Distribution:

$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
$$

Factor Trick

• If the factors of p(C|X) are arbitrary, a unique factor must be stored for each training sample, which is space-consuming.

• However, if the factors of p(C|X) come from a small discrete set, only the exponents need to be stored.

Distribution:

$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
$$

Updating Example

Starting from the prior, the density function (not normalized) is stored as the exponents of a fixed set of factors. The exponents after each of the five updates are:

Factor     Update 1  Update 2  Update 3  Update 4  Update 5
x          1         1         2         3         3
(1-x)      0         1         1         1         1
(1-0.6x)   0         0         0         1         1
(1+0.3x)   1         1         2         2         2
(1-0.5x)   0         0         0         0         1
(1-0.2x)   0         0         0         0         0
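The updating example can be sketched in code: the posterior is kept as one integer exponent per factor, and an update only increments exponents. This is a minimal illustration, not the original implementation. The factor set and the exponents are read off the example, while the grouping of factors into updates, the function names, and the midpoint-rule integration are assumptions.

```python
# The fixed factor set from the updating example; each entry maps x to a number.
FACTORS = [
    lambda x: x,
    lambda x: 1 - x,
    lambda x: 1 - 0.6 * x,
    lambda x: 1 + 0.3 * x,
    lambda x: 1 - 0.5 * x,
    lambda x: 1 - 0.2 * x,
]

def update(exponents, factor_indices):
    """Return new exponents after multiplying the density by the given factors."""
    new_exponents = list(exponents)
    for k in factor_indices:
        new_exponents[k] += 1
    return new_exponents

def unnormalized_density(exponents, x):
    value = 1.0
    for factor, exponent in zip(FACTORS, exponents):
        value *= factor(x) ** exponent
    return value

def posterior_mean(exponents, steps=10000):
    """Approximate E(X) on [0, 1] with the midpoint rule."""
    xs = [(k + 0.5) / steps for k in range(steps)]
    weights = [unnormalized_density(exponents, x) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)

exponents = [1, 0, 0, 1, 0, 0]  # exponents after the first update in the example
for factor_indices in ([1], [0, 3], [0, 2], [4]):  # assumed grouping of later updates
    exponents = update(exponents, factor_indices)
print(exponents)  # [3, 1, 1, 2, 1, 0]
print(posterior_mean(exponents))
```

Storage stays at six integers no matter how many samples are processed, which is the point of the factor trick.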


How to realize the factor trick?

• Set a global parameter for all cases.
  – Bayesian browsing model (BBM) [Liu09].
• Assume all other docs/ads follow the same distribution and integrate them out.
  – Click chain model (CCM) [Guo09].

In the following two examples, we are only concerned with the estimation of r using the Bayesian framework. The other parameters are all estimated by maximizing the log-likelihood, similarly to DCM. Please refer to the original papers for details.

Index

• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.

BBM Variable Definition

• For a specific query session, let
  – $r_i$ be the relevance variable at position $i$;
  – $E_i$ be the binary examination variable at position $i$;
  – $C_i$ be the binary click variable at position $i$;
  – $n_i$ be the last clicked position before position $i$;
  – $d_i$ be the distance between position $i$ and its previous clicked position.

Small Discrete Set of Beta

• Suppose M = 3 for simplicity of illustration.
• There are only 6 values of beta, one for each pair (n, d):

n=0, d=1
n=0, d=2
n=0, d=3
n=1, d=1
n=1, d=2
n=2, d=1
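The six pairs are all combinations with n ≥ 0, d ≥ 1, and n + d ≤ M, so their number is M(M+1)/2. A short sketch that enumerates them for any M follows; the function name is hypothetical.

```python
def beta_index_pairs(m):
    """All (n, d) with n >= 0, d >= 1 and n + d <= m, in the order listed above."""
    return [(n, d) for n in range(m) for d in range(1, m - n + 1)]

pairs = beta_index_pairs(3)
print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (2, 1)]
print(len(pairs))  # 6
```

For a typical result page with M = 10 this gives 55 values of beta, still a small discrete set.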


Estimation Algorithms

$$
p(r \mid C) \propto p(r)\, r^{N_1} \prod_{n \ge 0,\ d \ge 1,\ n + d \le M} (1 - \beta_{n,d}\, r)^{N_{2,n,d}}
$$

– $N_1$: how many times the doc/ad was clicked.
– $N_{2,n,d}$: how many times the doc/ad was not clicked with the probability of $\beta_{n,d}$.

$$
\begin{aligned}
p(X = x \mid C_{1:5}) &\propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x) \\
&= p(x) \prod_{i=1}^{5} \bigl( p(E_a = 1)\, x \bigr)^{C_i} \bigl( 1 - p(E_a = 1)\, x \bigr)^{1 - C_i}
\end{aligned}
$$

Toy Example Step 1

• Only the top M = 3 positions are shown; there are 3 query sessions and 4 distinct URLs.

[Figure: the URLs shown at positions 1 to 3 in Query Session 1, Query Session 2, and Query Session 3.]

Toy Example Step 2

• Initialize M(M+1)/2+1 counts for each URL.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        0

Toy Example Step 3

• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        0

Toy Example Step 4

• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        1

Toy Example Step 5

• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    1       0        0        0        0        0        1

Toy Example Step 6

• The posterior for URL 4.

• Interpretation:
  – The larger the probability of examination, the stronger the penalty for a non-click.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    1       0        0        0        0        0        1
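The counting steps of the toy example can be sketched as a single pass over the sessions. Everything below is illustrative: the three sessions are hypothetical and chosen only so that the counts for URL 4 match the final table, and the value of beta is made up. The posterior form (clicks contribute a factor r, non-clicks a factor 1 − beta·r, with a uniform prior) is also an assumption following the BBM estimation formula.

```python
from collections import Counter

def bbm_counts(sessions, target_url):
    """Return (clicks, non_click_counts) for target_url.

    sessions: list of (urls, clicks) pairs; positions are 1-based.
    non_click_counts maps (n, d) to the number of non-clicked impressions, where
    n is the last clicked position before the URL (0 if none) and d is the
    distance from that position.
    """
    clicks_total = 0
    non_click_counts = Counter()
    for urls, clicks in sessions:
        last_click = 0
        for position, (url, clicked) in enumerate(zip(urls, clicks), start=1):
            if url == target_url:
                if clicked:
                    clicks_total += 1
                else:
                    non_click_counts[(last_click, position - last_click)] += 1
            if clicked:
                last_click = position
    return clicks_total, non_click_counts

def unnormalized_posterior(r, clicks_total, non_click_counts, beta):
    """Uniform prior times r^clicks times the product of (1 - beta[n, d] * r)^count."""
    value = r ** clicks_total
    for key, count in non_click_counts.items():
        value *= (1 - beta[key] * r) ** count
    return value

# Hypothetical sessions: URL 4 is absent, then skipped at position 3, then clicked.
sessions = [
    ([1, 2, 3], [1, 0, 0]),
    ([1, 3, 4], [0, 1, 0]),
    ([4, 1, 3], [1, 0, 0]),
]
clicks_total, non_click_counts = bbm_counts(sessions, target_url=4)
print(clicks_total, dict(non_click_counts))  # 1 {(2, 1): 1}

beta = {(2, 1): 0.9}  # made-up examination probability for n=2, d=1
print(unnormalized_posterior(0.5, clicks_total, non_click_counts, beta))
```

With these counts the unnormalized posterior is r(1 − beta·r) for n=2, d=1, so a larger beta pushes more mass toward small r, in line with the interpretation above.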


Algorithm Complexities

• Let

• Initializing and updating the counts:
  – Time: linear in the size of the click log.
  – Space: almost constant storage required.

Index

• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.

User Behavior Description

[Figure: flowchart of user behavior. The user examines the document and decides whether to click: yes with probability $R_i$, no with probability $1 - R_i$. After either outcome the user decides whether to see the next doc; after a click this happens with probability $\alpha_2 (1 - R_i) + \alpha_3 R_i$. "Yes" leads to examining the next document and "No" leads to "Done".]

Estimation Algorithms

• By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be described by a small discrete set.

$$
p(R_j \mid C^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(C^n \mid R_j)
$$

Five Cases

• The current doc/ad may occur in five different cases.
• For each case, there would be unique factors for p(C|Ri).

Case 1

$$
P(C \mid R_i) \propto P(C_i = 0 \mid E_i = 1, R_i) = 1 - R_i
$$

• The doc/ad must be examined.
• Other R can be seen as constants.

Case 2


Case 3


All Cases

• By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be described by a small discrete set.

$$
p(R_j \mid C^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(C^n \mid R_j)
$$

Index

• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
• Project.

Description

• Fake dataset.
• Format.
  – queryId
  – ad1Id, click
  – ad2Id, click
  – ad3Id, click
• Evaluation metric: ROC.
• Baseline.
  – Average (Avg).
• Current competitive method.
  – Simplified CCM (SCCM).
• Task.
  – Implement another advanced click model.
  – Compare the result with Avg and SCCM.
  – Analyze the reasons for the improvement.

End
