Léon Bottou Learning to Interact




this

I’ll be back!


Using log statistics to justify product changes


Product logic

Logged data

Outcomes

Almost there!


Counterfactual reasoning

Offline policy evaluation

Contextual bandits

Explore/Exploit

Multi-world testing

And many more people working on connected topics.


reason about the outcome of manipulations


A: Highlight suggestion in red

B: User takes suggestion


after

A: Highlight suggestion in red

B: User takes suggestion

?


A: Highlight suggestion in red

B: User takes suggestion

C: User often takes suggestions


Red highlight increases take rate.

despite the fact that he dislikes red

Red highlight decreases take rate.

A: Highlight suggestion in red

B: User takes suggestion

C: User often takes suggestions

?


                 Average take rate   Take rate for C=false   Take rate for C=true
No highlight     24/200 (12%)
Red highlight    30/200 (15%)

1. Users take suggestions more often with red highlights.


                 Average take rate   Take rate for C=false   Take rate for C=true
No highlight     24/200 (12%)        18/182 (10%)            6/18 (33.3%)
Red highlight    30/200 (15%)        14/150 (9.3%)           16/50 (32%)

1. Users take suggestions more often with red highlights.

2. Bug favors red highlights when user is known to like suggestions.

3. In fact, the users take the suggestion because they like suggestions, and despite slightly disliking the red highlights!

This effect is named “Simpson’s paradox” (1951).
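The reversal in the table can be checked with a few lines of arithmetic. The sketch below is an illustration added to the transcript, not part of the talk; it hard-codes the counts from the table, and the names `counts` and `take_rate` are hypothetical.

```python
from fractions import Fraction

# (takes, impressions) copied from the table above, keyed by
# (highlight, C) where C means "user often takes suggestions".
counts = {
    ("none", False): (18, 182),
    ("none", True): (6, 18),
    ("red", False): (14, 150),
    ("red", True): (16, 50),
}


def take_rate(highlight, c=None):
    """Take rate for one highlight setting, pooled over C or within one stratum."""
    cells = [
        v for (h, cc), v in counts.items()
        if h == highlight and (c is None or cc == c)
    ]
    takes = sum(t for t, _ in cells)
    shown = sum(n for _, n in cells)
    return Fraction(takes, shown)


for c in (None, False, True):
    label = "average" if c is None else f"C={c}"
    none, red = take_rate("none", c), take_rate("red", c)
    print(f"{label:8s} no highlight {float(none):.3f}  red {float(red):.3f}")
```

Pooled over all users, red comes out ahead (15% against 12%), yet within each value of C the red rate is the lower one, which is the pattern the slide calls Simpson's paradox.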


Red highlights are a bad idea for both kinds of users.

Using more data won’t make the answer correct!
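To see why more data does not help, one can scale every cell of the table by the same factor: the pooled comparison keeps favoring red while each stratum keeps favoring no highlight. This is a minimal sketch, assuming the logging bias stays exactly the same as traffic grows; `scaled`, `pooled_gap`, `stratum_gaps` and the factor `k` are hypothetical names.

```python
from fractions import Fraction

# (takes, impressions) per (highlight, C), from the table above.
TABLE = {
    ("none", False): (18, 182),
    ("none", True): (6, 18),
    ("red", False): (14, 150),
    ("red", True): (16, 50),
}


def scaled(table, k):
    """Same logging bias, k times more traffic (an idealized assumption)."""
    return {key: (t * k, n * k) for key, (t, n) in table.items()}


def pooled_gap(table):
    """Red minus no-highlight take rate, ignoring C."""
    rate = {}
    for h in ("none", "red"):
        takes = sum(table[(h, c)][0] for c in (False, True))
        shown = sum(table[(h, c)][1] for c in (False, True))
        rate[h] = Fraction(takes, shown)
    return rate["red"] - rate["none"]


def stratum_gaps(table):
    """Red minus no-highlight take rate within each value of C."""
    return {
        c: Fraction(*table[("red", c)]) - Fraction(*table[("none", c)])
        for c in (False, True)
    }


for k in (1, 1000):
    t = scaled(TABLE, k)
    print(k, float(pooled_gap(t)), {c: float(g) for c, g in stratum_gaps(t).items()})
```

The pooled gap stays at +3 points for any `k`, so a larger log only makes the misleading answer look more certain.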


Unaware

positive correlation

more red highlights

a bad idea all along


A: Randomly highlight suggestion in red

B: User takes suggestion

Possible unknown common causes
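A small simulation can illustrate what randomizing A buys. It is a sketch under stated assumptions, not the speaker's experiment: the per-stratum take rates come from the red-highlight table, the share of users with C=true (68 of 400 logged impressions) and the biased assignment probabilities (50/68 and 150/332) are read off the same counts, and `simulate` is a hypothetical helper.

```python
import random

# P(take | highlight, C), from the table; C = "user often takes suggestions".
TAKE = {
    ("none", False): 18 / 182, ("none", True): 6 / 18,
    ("red", False): 14 / 150, ("red", True): 16 / 50,
}
P_C = 68 / 400  # assumed share of C=true users, read off the logged counts


def simulate(p_red_given_c, n, seed=0):
    """Return the observed (red - none) take-rate gap over n impressions.

    p_red_given_c maps C to the probability of showing a red highlight.
    """
    rng = random.Random(seed)
    takes = {"none": 0, "red": 0}
    shown = {"none": 0, "red": 0}
    for _ in range(n):
        c = rng.random() < P_C
        h = "red" if rng.random() < p_red_given_c[c] else "none"
        shown[h] += 1
        takes[h] += rng.random() < TAKE[(h, c)]
    return takes["red"] / shown["red"] - takes["none"] / shown["none"]


# Buggy logic: red is shown more often to users who like suggestions.
biased = simulate({True: 50 / 68, False: 150 / 332}, n=400_000)
# Randomized: a fair coin decides, independently of C.
randomized = simulate({True: 0.5, False: 0.5}, n=400_000)
print(f"biased gap     {biased:+.4f}")
print(f"randomized gap {randomized:+.4f}")
```

Under these assumptions the biased log should show a gap near +3 points, while the randomized log should show a small negative gap, because the coin flip cuts the link between C and the highlight decision.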


no

no

A: Randomly select treatment

B: Patient recovers

Possible unknown common causes

?


A/B Testing

“Multi-world” Testing

randomization.


Same as randomized drug testing!


Outcomes

One user

BingAB


Ad Placement

Users, Advertisers, Click Prediction

Queries; Ads & Bids; Ads; Prices; Clicks (and other outcomes)

ADVERTISER FEEDBACK LOOP

CLICK PREDICTION FEEDBACK LOOP

USER FEEDBACK LOOP


Formulate

“”

Code product-grade

Allocate traffic

Collect data long enough


Collect data carefully randomized

Formulate

when the data was collected?”

Answer offline

Pay the data collection price only once.

Iterate very quickly because evaluation happens offline.


Action probability: 10%, 47%, 25%, 18%

Policy computes the probability of each action.

One action is randomly selected.

The reward 𝑟 depends on both context and action.
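The three steps above can be sketched as one round of interaction. This is an illustrative sketch only: the four probabilities are the ones on the slide, while the action labels, the `policy`, `reward_fn` and `interact` functions, and the toy context with a `preferred` field are all hypothetical.

```python
import random


def policy(context):
    """Return a probability for each action (hypothetical fixed policy)."""
    # The four probabilities from the slide; the labels are made up, and a
    # real policy would compute them from the context.
    return {"w": 0.10, "x": 0.47, "y": 0.25, "z": 0.18}


def reward_fn(context, action):
    """Stand-in for the user's response; depends on context and action."""
    return 1.0 if action == context["preferred"] else 0.0


def interact(context, rng):
    """One round: compute probabilities, sample an action, observe the reward."""
    probs = policy(context)
    actions = list(probs)
    action = rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
    reward = reward_fn(context, action)
    # Keep the probability of the chosen action next to the outcome.
    return {"context": context, "action": action,
            "prob": probs[action], "reward": reward}


rng = random.Random(0)
log = [interact({"preferred": "x"}, rng) for _ in range(5)]
for record in log:
    print(record)
```

Storing the probability of the selected action in each record is what later makes it possible to reweight the logged rewards for a different policy.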


policy 𝜋′ data collection policy 𝜋?”

Action   𝝅      𝝅′
𝒂𝟏       10%    3%
𝒂𝟐       25%    12%
𝒂𝟑       47%    35%    (observed, reward 𝑟)
𝒂𝟒       18%    50%

The observed reward 𝑟 occurs with p=47% under policy 𝜋 and with p=35% under policy 𝜋′.

Estimate the average reward under policy 𝜋′ by giving weight 35/47 to this reward 𝑟.
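The reweighting described here is often called importance weighting or inverse propensity scoring. The sketch below applies it to the probabilities from the table; it is a minimal illustration rather than the estimator used in the cited work, and the three-entry `log` and the name `ips_estimate` are hypothetical.

```python
PI = {"a1": 0.10, "a2": 0.25, "a3": 0.47, "a4": 0.18}       # data collection policy
PI_NEW = {"a1": 0.03, "a2": 0.12, "a3": 0.35, "a4": 0.50}   # candidate policy


def ips_estimate(log, pi_new):
    """Average of logged rewards, each weighted by pi_new(a) / pi(a).

    Each log entry is (action, probability under the logging policy, reward).
    """
    total = 0.0
    for action, logged_prob, reward in log:
        total += reward * pi_new[action] / logged_prob
    return total / len(log)


# Hypothetical log of three interactions collected under PI.
log = [("a3", PI["a3"], 1.0), ("a1", PI["a1"], 0.0), ("a4", PI["a4"], 1.0)]
weight_a3 = PI_NEW["a3"] / PI["a3"]  # the 35/47 weight from the slide
print(weight_a3, ips_estimate(log, PI_NEW))
```

The weights only exist when the logging policy gave the logged action a nonzero probability, which is one reason the data must be collected with randomization.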


Work in progress

A cloud service targeting contextual-bandit-style problems.


Yahoo (2011) – Personalized news.

AdCenter (2011) – “Metropolis” randomization.

LinkedIn (2013) – LinkedIn ad placement.

Bing (2014) – Optimization of click metrics in Bing Speller.

Li, Chu, Langford, Wang – “Unbiased offline evaluation …”, WSDM 2011.

Bottou et al. – “Counterfactual reasoning and learning systems …”, JMLR 2013.

Agarwal – Simons Foundation Workshop, Berkeley, 2013.

Li et al. – Submitted, KDD 2014.


Envision the full spectrum of user interaction policies from the start.

Deploy simple policy with randomized exploration.

Use interaction data to tune the policy and learn how to “delight users”.

Logging is often under-appreciated.

Correctness and completeness of the logs are critical.

Example: logging outcomes rather than decisions…
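One way to read the advice above is sketched below: a simple policy with randomized exploration that logs its decision, and the probability of that decision, at the moment it is made. Epsilon-greedy is used here only as a familiar stand-in for "randomized exploration"; the talk does not name a scheme, and `ACTIONS`, `EPSILON`, `action_probs` and `decide_and_log` are hypothetical.

```python
import json
import random

ACTIONS = ["plain", "red"]  # hypothetical action set
EPSILON = 0.1               # hypothetical exploration rate


def action_probs(scores):
    """Epsilon-greedy: mostly the best-scoring action, sometimes a random one."""
    best = max(ACTIONS, key=lambda a: scores[a])
    probs = {a: EPSILON / len(ACTIONS) for a in ACTIONS}
    probs[best] += 1.0 - EPSILON
    return probs


def decide_and_log(context, scores, rng):
    """Pick an action and return the decision record to be written to the log."""
    probs = action_probs(scores)
    action = rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]
    # Log the decision and its probability now; the outcome is joined later.
    return {"context": context, "action": action, "prob": probs[action]}


rng = random.Random(0)
record = decide_and_log({"user": "u1"}, {"plain": 0.12, "red": 0.10}, rng)
print(json.dumps(record))
```

A log that kept only the outcome (for example, whether a suggestion was taken) would lose the decision and its probability, and with them the ability to evaluate other policies offline.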


Large-scale user interaction data offers extraordinary opportunities to improve our products.

This is more complex than vanilla machine learning because correlation does not imply causation.

MSR has world-class expertise and technology in this area.
