1 Estimating Rates of Rare Events at Multiple Resolutions Deepak Agarwal Andrei Broder Deepayan Chakrabarti Dejan Diklic Vanja Josifovski Mayssam Sayyadian

Posted on 20-Dec-2015


1

Estimating Rates of Rare Events at Multiple Resolutions

Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian

2

Estimation in the “tail”

Contextual advertising
• Show an ad on a webpage (an "impression")
• Revenue is generated if a user clicks

Problem: estimate the click-through rate (CTR) of an ad on a page
• Most (ad, page) pairs have very few impressions, if any, and even fewer clicks
• Severe data sparsity
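A back-of-the-envelope calculation shows why sparsity is so severe. It assumes clicks are independent Bernoulli trials with a CTR on the order of 10^-3, the rate scale this deck cites for its application. The helper name is hypothetical.

```python
def prob_zero_clicks(ctr: float, impressions: int) -> float:
    """Probability that an (ad, page) pair with the given CTR receives no
    clicks in `impressions` independent impressions: (1 - ctr) ** impressions."""
    return (1.0 - ctr) ** impressions


# Even with a few hundred impressions, zero clicks is the most likely outcome.
for n in (10, 100, 1000):
    print(n, round(prob_zero_clicks(1e-3, n), 3))
```

At 100 impressions the chance of seeing no clicks is about 0.90. For most pairs, the raw ratio clicks/impressions is therefore 0 even though the true CTR is not.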

3

Estimation in the “tail”

• Use an existing, well-understood hierarchy
  – Categorize ads and webpages to leaves of the hierarchy
  – CTR estimates of siblings are correlated
• The hierarchy allows us to aggregate data
  – Coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions
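To make the aggregation idea concrete, here is a minimal sketch that rolls (clicks, impressions) counts up a hierarchy, so that each coarser node pools the data of its descendants. The `parent` dictionary encoding and the node names are hypothetical. The sketch only illustrates pooling; it is not the estimator used in the deck.

```python
from collections import defaultdict


def aggregate_up(parent, leaf_counts):
    """Add each leaf's (clicks, impressions) to the leaf and all its ancestors.

    parent: dict mapping node -> parent node (the root maps to None).
    leaf_counts: dict mapping leaf -> (clicks, impressions).
    Returns a dict mapping node -> (clicks, impressions).
    """
    totals = defaultdict(lambda: [0, 0])
    for leaf, (clicks, imps) in leaf_counts.items():
        node = leaf
        while node is not None:  # walk from the leaf up to the root
            totals[node][0] += clicks
            totals[node][1] += imps
            node = parent[node]
    return {node: tuple(c) for node, c in totals.items()}


# Hypothetical hierarchy: "sports" has two leaf classes.
parent = {"root": None, "sports": "root", "golf": "sports", "tennis": "sports"}
totals = aggregate_up(parent, {"golf": (0, 40), "tennis": (1, 900)})
```

The leaf `golf` has no clicks, so its raw CTR is 0. Its parent `sports` pools 940 impressions and one click, which gives a nonzero coarse estimate that can inform the leaf.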

4

System overview

[Figure: system pipeline]
Retrospective data [URL, ad, isClicked]
→ Crawl a sample of URLs
→ Classify pages and ads
→ Impute impressions, fix sampling bias
→ Rare event estimation using hierarchy

5

Sampling of webpages

• Naïve strategy: sample at random from the set of URLs
  – Sampling errors in impression volume AND click volume
• Instead, we propose: crawl all URLs with at least one click, and a sample of the remaining URLs
  – Variability is only in impression volume
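The proposed sampling scheme can be sketched as follows. We assume each URL record carries its click and impression counts, and that zero-click URLs are kept independently with a hypothetical rate `p`. Scaling the sampled non-clicked impressions by K = 1/p (our assumption about what the deck's K denotes) gives an unbiased estimate of their total. Clicked URLs are kept in full, so click volume carries no sampling error.

```python
import random


def sample_urls(urls, p, seed=0):
    """Keep every URL with >= 1 click; keep each zero-click URL with probability p.

    urls: list of (url, clicks, impressions) tuples.
    Returns (clicked_pool, sampled_nonclicked_pool).
    """
    rng = random.Random(seed)
    clicked = [u for u in urls if u[1] > 0]
    nonclicked = [u for u in urls if u[1] == 0 and rng.random() < p]
    return clicked, nonclicked


def estimate_volume(clicked, nonclicked, p):
    """Estimated total impressions: sum(n) + K * sum(m), assuming K = 1/p."""
    K = 1.0 / p
    n = sum(imps for _, _, imps in clicked)
    m = sum(imps for _, _, imps in nonclicked)
    return n + K * m


urls = [("a.com", 2, 300), ("b.com", 0, 50), ("c.com", 0, 70)]
clicked, sampled = sample_urls(urls, p=0.5)
print(estimate_volume(clicked, sampled, p=0.5))
```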

6

Imputation of impression volume

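[Figure: a matrix with page classes as rows and ad classes as columns. Each cell ij records #impressions = nij + mij + xij, where nij comes from the clicked pool, mij from the sampled non-clicked pool, and xij is the excess impressions to be imputed.]

• Column constraint: each column sums to the #impressions on ads of that ad class
• Row constraint: each row sums to ∑nij + K·∑mij
• The whole matrix sums to the total impressions (known)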

7

Imputation of impression volume

[Figure: the page hierarchy and the ad hierarchy, from level 0 down to level i; at each level, page classes and ad classes form a grid whose cells are regions.]

• Region = (page node, ad node)
• Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy
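Here is a minimal sketch of the cross-product region hierarchy. It assumes both hierarchies have the same depth and that a region pairs a page node and an ad node at the same level. The tiny hierarchies and helper names are hypothetical.

```python
from itertools import product

# Hypothetical two-level hierarchies (level 0 = root).
page_levels = {0: ["P"], 1: ["P1", "P2"]}
ad_levels = {0: ["A"], 1: ["A1", "A2", "A3"]}
page_parent = {"P1": "P", "P2": "P"}
ad_parent = {"A1": "A", "A2": "A", "A3": "A"}


def regions_at_level(level):
    """Regions at a level: every (page node, ad node) pair at that level."""
    return list(product(page_levels[level], ad_levels[level]))


def region_parent(region):
    """The parent of (page, ad) is (parent of page, parent of ad)."""
    page, ad = region
    return (page_parent[page], ad_parent[ad])


print(len(regions_at_level(1)))
print(region_parent(("P2", "A3")))
```

With 2 page classes and 3 ad classes at level 1, there are 6 regions, and all of them share the single level-0 parent ("P", "A").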

8

Imputation of impression volume

[Figure: block constraint. The cells of a block at level i+1 sum to the corresponding region at level i.]

9

Imputing xij

Iterative Proportional Fitting [Darroch+/1972]

• Initialize xij = nij + mij
• Iteratively scale the xij values to match the row, column, and block constraints
• Ordering of constraints: top-down, then bottom-up, and repeat

[Figure: regions at level i and level i+1; each block of page classes × ad classes at level i+1 lies under one region at level i.]
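The core scaling step can be sketched as classic two-margin iterative proportional fitting. It alternately rescales rows and columns of a table until both sets of margins match. Block constraints and the top-down/bottom-up ordering across levels are omitted. The margins must share the same grand total, and the function name is hypothetical.

```python
def ipf(x, row_targets, col_targets, max_iters=1000, tol=1e-9):
    """Iterative proportional fitting of a 2-D table to row and column sums.

    x: list of lists of nonnegative starting values (e.g. nij + mij).
    row_targets / col_targets: desired margins with the same total.
    """
    x = [[float(v) for v in row] for row in x]
    for _ in range(max_iters):
        # Row step: rescale each row to its target sum.
        for i, target in enumerate(row_targets):
            s = sum(x[i])
            if s > 0:
                x[i] = [v * target / s for v in x[i]]
        # Column step: rescale each column to its target sum.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in x)
            if s > 0:
                for row in x:
                    row[j] *= target / s
        # Columns now match; stop once the rows still match as well.
        if max(abs(sum(row) - t) for row, t in zip(x, row_targets)) < tol:
            break
    return x


fitted = ipf([[5, 1, 2], [1, 3, 1]], row_targets=[10, 5], col_targets=[6, 4, 5])
```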

10

Imputation: Summary

Given
• nij (impressions in the clicked pool)
• mij (impressions in the sampled non-clicked pool)
• # impressions on ads of each ad class in the ad hierarchy

We get
• Estimated impression volume Ñij = nij + mij + xij in each region ij of every level

11

System overview

[Figure: system pipeline]
Retrospective data [page, ad, isclicked]
→ Crawl a sample of pages
→ Classify pages and ads
→ Impute impressions, fix sampling bias
→ Rare event estimation using hierarchy

12

Rare rate modeling

1. Freeman-Tukey transform: yij = F-T(clicks and impressions at ij) ≈ transformed CTR
• Variance-stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling
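Here is a sketch of the transform. It assumes the Poisson form of the Freeman-Tukey transform, sqrt(c) + sqrt(c + 1) for c clicks, scaled by 1/(2·sqrt(N)) for N impressions so that the result is roughly sqrt(CTR). The slides do not give the exact constants, so treat this form and the variance approximation of about 1/(4N) as assumptions. With c = 0 the value is 1/(2·sqrt(N)), so zero-click regions are still told apart by their impression counts.

```python
import math


def freeman_tukey(clicks, impressions):
    """Transformed CTR (assumed form): (sqrt(c/N) + sqrt((c+1)/N)) / 2.

    For zero clicks this is 1 / (2 * sqrt(N)): more impressions without a
    click give a smaller value."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    c, n = clicks, impressions
    return 0.5 * (math.sqrt(c / n) + math.sqrt((c + 1) / n))


def approx_variance(impressions):
    """Approximate Var(y) under a Poisson click model: about 1 / (4N),
    which does not depend on the underlying CTR."""
    return 1.0 / (4.0 * impressions)


print(freeman_tukey(0, 100), freeman_tukey(0, 10_000), freeman_tukey(3, 10_000))
```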

13

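Rare rate modeling

2. Generative Model (Tree-structured Markov Model)

[Figure: each region ij has an observed transformed CTR yij, with covariates (coefficients βij) and variance Vij, and an unobserved "state" Sij with variance Wij. Sij is linked to the state Sparent(ij) of the parent region, which has its own yparent(ij), βparent(ij), Vparent(ij), and Wparent(ij).]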
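Here is a minimal simulation of the tree-structured Markov model as we read it from the figure. Each state is its parent's state plus noise with variance W, and each observation is its state plus noise with variance V. Covariates (β) are omitted and Gaussian noise is assumed. The tree and the variance values are hypothetical.

```python
import random


def simulate(children, root, W, V, seed=0):
    """Sample states S and observations y top-down over a tree.

    children: dict node -> list of child nodes.
    W, V: dicts node -> state variance and observation variance.
    S[root] ~ N(0, W[root]); S[c] = S[parent] + N(0, W[c]); y[n] = S[n] + N(0, V[n]).
    """
    rng = random.Random(seed)
    S, y = {}, {}
    stack = [(root, 0.0)]  # (node, parent's state); the root's "parent" is 0
    while stack:
        node, parent_state = stack.pop()
        S[node] = parent_state + rng.gauss(0.0, W[node] ** 0.5)
        y[node] = S[node] + rng.gauss(0.0, V[node] ** 0.5)
        for c in children.get(node, []):
            stack.append((c, S[node]))
    return S, y


tree = {"r": ["a", "b"], "a": ["a1", "a2"]}
W = {"r": 1.0, "a": 0.1, "b": 0.1, "a1": 0.01, "a2": 0.01}
V = {n: 0.5 for n in W}
S, y = simulate(tree, "r", W, V)
```

Small W values for deep nodes keep siblings' states close to their parent's, which is the correlation among siblings that the hierarchy exploits.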

14

Rare rate modeling

Model fitting with a 2-pass Kalman filter:
• Filtering: leaf to root
• Smoothing: root to leaf

Linear in the number of regions
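The two passes can be sketched as exact Gaussian message passing on a tree, which yields the posteriors a two-pass Kalman smoother computes. The sketch uses a simplified model with no covariates:
- S_root ~ N(0, W_root).
- Each child's state is its parent's plus N(0, W) noise.
- An observed node's y is its state plus N(0, V) noise.

The upward pass aggregates evidence from leaves to root, and the downward pass propagates it back from root to leaves. Both are linear in the number of nodes. This is our own sketch, not the authors' code, and it omits the EM step that estimates β, V, and W.

```python
def tree_smooth(children, root, W, y, V):
    """Posterior (mean, variance) of every state S[node] on a tree.

    children: node -> list of children; W: node -> state variance (> 0 at root).
    y, V: observations and their variances; unobserved nodes are absent from y.
    """
    order, stack = [], [root]
    while stack:  # pre-order: parents before children
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    prec, info, msg = {}, {}, {}  # info = precision * mean
    for node in reversed(order):  # pass 1 (filtering): leaves to root
        if node in y:
            p, h = 1.0 / V[node], y[node] / V[node]
        else:
            p, h = 0.0, 0.0
        for c in children.get(node, []):
            p += msg[c][0]
            h += msg[c][1]
        prec[node], info[node] = p, h
        if node != root:
            # Seen from the parent, this subtree's evidence gains variance W[node].
            mp = p / (1.0 + p * W[node])
            msg[node] = (mp, mp * h / p if p > 0 else 0.0)

    P = prec[root] + 1.0 / W[root]  # combine with the prior on S[root]
    post = {root: (info[root] / P, 1.0 / P)}
    for node in order:  # pass 2 (smoothing): root to leaves
        mean, var = post[node]
        for c in children.get(node, []):
            # Belief at node excluding c's own message (information form).
            cp = 1.0 / var - msg[c][0]
            ch = mean / var - msg[c][1]
            # Push it through S[c] = S[node] + noise: variance grows by W[c].
            pp = cp / (1.0 + cp * W[c])
            ph = pp * ch / cp
            Pc = prec[c] + pp
            post[c] = ((info[c] + ph) / Pc, 1.0 / Pc)
    return post


# Two sibling regions under one parent; only "a" has an observation.
post = tree_smooth({"r": ["a", "b"]}, "r", {"r": 1.0, "a": 1.0, "b": 1.0}, {"a": 2.0}, {"a": 1.0})
```

Region `b` has no observation, so its posterior mean equals its parent's (2/3 here) and its variance is larger than the parent's.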

15

Experiments

• 503M impressions
• 7-level hierarchy, of which the top 3 levels were used
• Zero clicks in
  – 76% of regions in level 2
  – 95% of regions in level 3
• Full dataset DFULL, and a 2/3 sample DSAMPLE

16

Experiments

• Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE
• Some of these regions (R>0) get clicks in DFULL
• A good model should predict higher CTRs for R>0 than for the other regions in R

17

Experiments

We compared 4 models:
• TS: our tree-structured model
• LM (level-mean): each level smoothed independently
• NS (no smoothing): CTR proportional to 1/Ñ
• Random: assuming |R>0| is given, randomly predict the membership of R>0 out of R
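The slides do not name the evaluation metric, so this sketch uses a hypothetical one: precision among the top |R>0| regions when ranked by predicted CTR. It also shows the NS baseline (score 1/Ñ) and the expected precision of the Random baseline. That expected precision is |R>0|/|R|, because every region in a uniformly random subset is positive with that probability. The impression counts and scores are made up for illustration.

```python
def precision_at_k(scores, positives):
    """Fraction of the top-k regions by score that are in `positives`,
    with k = len(positives); ties are broken by region name."""
    k = len(positives)
    ranked = sorted(scores, key=lambda r: (-scores[r], r))
    return sum(1 for r in ranked[:k] if r in positives) / k


def ns_scores(impressions):
    """NS (no smoothing) baseline: predicted CTR proportional to 1 / Ñ."""
    return {r: 1.0 / n for r, n in impressions.items()}


def random_expected_precision(num_regions, num_positive):
    """Expected precision of picking |R>0| of the |R| regions uniformly at random."""
    return num_positive / num_regions


imps = {"a": 10, "b": 1000, "c": 50, "d": 5000}  # hypothetical Ñ per region
positives = {"b", "d"}  # hypothetical R>0
print(precision_at_k(ns_scores(imps), positives))
print(random_expected_precision(len(imps), len(positives)))
```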

18

Experiments

[Figure: comparison of the models; curves labeled TS, Random, and "LM, NS"]

19

Experiments

• Enough impressions: little "borrowing" from siblings
• Few impressions: estimates depend more on siblings
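Here is a back-of-the-envelope illustration of this behavior. As a simplification, a region's prior comes from its parent with variance W, and its own transformed CTR has variance V ≈ 1/(4Ñ), as for a Freeman-Tukey transform. Under these assumptions, the posterior weight on the region's own data is W / (W + V), which approaches 1 as Ñ grows. The value of W is hypothetical.

```python
def own_data_weight(impressions, W):
    """Weight on a region's own transformed CTR, W / (W + V),
    assuming V = 1 / (4 * impressions)."""
    V = 1.0 / (4.0 * impressions)
    return W / (W + V)


def shrunk_estimate(y, parent_mean, impressions, W):
    """Posterior mean: weighted average of own data and the parent's estimate."""
    w = own_data_weight(impressions, W)
    return w * y + (1.0 - w) * parent_mean


for n in (10, 1_000, 100_000):
    print(n, round(own_data_weight(n, W=1e-4), 3))
```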

20

Related Work

• Multi-resolution modeling has been studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94]
• Imputation has been studied in statistics [Darroch+/1972]
• Applying such models to estimate the rates of rare events (rates of ~10^-3) is novel

21

Conclusions

We presented a method to estimate rates of extremely rare events at multiple resolutions under severe sparsity constraints.

Our method has two parts:
• Imputation: incorporates the hierarchy and fixes sampling bias
• Tree-structured generative model: extremely fast parameter fitting

22

Rare rate modeling

1. Freeman-Tukey transform

Distinguishes between regions with zero clicks based on the number of impressions

Variance-stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling

[Equation: the transform for region r is a function of the # clicks in region r and the # impressions in region r]

23

Rare rate modeling

Generative model
• Sij values can be quickly estimated using a Kalman filtering algorithm
• The Kalman filter requires knowledge of β, V, and W, so EM is wrapped around the Kalman filter

[Figure: filtering and smoothing passes]

24

Rare rate modeling

Fitting using a Kalman filtering algorithm
• Filtering: recursively aggregate data from leaves to root
• Smoothing: propagate information from root to leaves

Complexity: linear in the number of regions, for both time and space

[Figure: filtering and smoothing passes]

25

Rare rate modeling

Fitting using a Kalman filtering algorithm
• Filtering: recursively aggregate data from leaves to root
• Smoothing: propagate information from root to leaves

The Kalman filter requires knowledge of β, V, and W, so EM is wrapped around the Kalman filter

[Figure: filtering and smoothing passes]

26

Imputing xij

[Figure: levels Z(i) and Z(i+1) of the region hierarchy]

Iterative Proportional Fitting [Darroch+/1972]

Initialize xij = nij + mij

Top-down:

• Scale all xij in every block in Z(i+1) to sum to its parent in Z(i)

• Scale all xij in Z(i+1) to sum to the row totals

• Scale all xij in Z(i+1) to sum to the column totals

Repeat for every level Z(i)

Bottom-up: Similar

[Figure: a block of page classes × ad classes]
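One top-down step for a single pair of levels can be sketched by following the three scaling steps listed above: block, then rows, then columns. The dictionary encoding of cells as (page class, ad class) keys and all names are hypothetical. The bottom-up pass and the repeat-until-convergence loop are not shown.

```python
from collections import defaultdict


def rescale(x, group_of, targets):
    """Scale the cells of x so that each group's cells sum to targets[group]."""
    sums = defaultdict(float)
    for cell, v in x.items():
        sums[group_of(cell)] += v
    return {cell: v * targets[group_of(cell)] / sums[group_of(cell)]
            if sums[group_of(cell)] > 0 else v
            for cell, v in x.items()}


def top_down_step(x, parent_vals, page_parent, ad_parent, row_totals, col_totals):
    """One top-down step for level i+1; cells are keyed by (page class, ad class)."""
    # 1. Block constraint: each block sums to its parent region at level i.
    x = rescale(x, lambda c: (page_parent[c[0]], ad_parent[c[1]]), parent_vals)
    # 2. Row constraint: each page class sums to its row total.
    x = rescale(x, lambda c: c[0], row_totals)
    # 3. Column constraint: each ad class sums to its column total.
    return rescale(x, lambda c: c[1], col_totals)


cells = {("P1", "A1"): 1.0, ("P1", "A2"): 1.0, ("P2", "A1"): 1.0, ("P2", "A2"): 1.0}
res = top_down_step(cells, {("P", "A"): 10.0},
                    {"P1": "P", "P2": "P"}, {"A1": "A", "A2": "A"},
                    {"P1": 4.0, "P2": 6.0}, {"A1": 5.0, "A2": 5.0})
```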