machine learning at criteo - paris datageeks

37
Shallow learning @ Criteo Nicolas Le Roux Scientific Program Manager – R&D

Upload: nicolas-le-roux

Post on 07-May-2015

8.420 views

Category:

Technology


0 download

DESCRIPTION

Machine learning at Criteo - How to scale machine learning algorithms to perform real-time prediction of CTR and recommendation of products.

TRANSCRIPT

Page 1: Machine learning at Criteo - Paris Datageeks

Shallow learning @ Criteo

Nicolas Le RouxScientific Program Manager – R&D

Page 2: Machine learning at Criteo - Paris Datageeks

I N T R O D U C T I O N

Copyright © 2013 Criteo

Page 3: Machine learning at Criteo - Paris Datageeks

WHAT DOES CRITEO DO?

3

• We buy advertising space on websites

• We display ads for our partners

• We get paid if the user clicks on the ad

Copyright © 2013 Criteo

Page 4: Machine learning at Criteo - Paris Datageeks

HOW DO WE BUY ADVERTISING SPACE?

4

• Real-time bidding (RTB): it is an auction

• Second-price

• Optimal strategy: bid the expected gain

• Expected gain = CPC * CTR

Copyright © 2013 Criteo

Page 5: Machine learning at Criteo - Paris Datageeks

WHAT TO DO ONCE WE WIN THE DISPLAY?

5

• Choose the best products

• Choose the color, the font, the layout…

• Generate the banner

Copyright © 2013 Criteo

Page 6: Machine learning at Criteo - Paris Datageeks

WHERE IS THE ML AT CRITEO?

6

• Click prediction model

• Product recommendation

• Testing, …

Copyright © 2013 Criteo

Page 7: Machine learning at Criteo - Paris Datageeks

WHERE IS THE ML AT CRITEO?

7

• Click prediction model

• Product recommendation

• Testing, …

Copyright © 2013 Criteo

Page 8: Machine learning at Criteo - Paris Datageeks

INFRASTRUCTURE AT CRITEO

8

• 5 data centers on 3 continents

• More than 3 500 servers

• More than 30B HTTP requests per day

• More than 2B banners displayed per day

• Peak traffic: 500K HTTP requests per second

Copyright © 2013 Criteo

Page 9: Machine learning at Criteo - Paris Datageeks

C L I C K P R E D I C T I O N M O D E L

Copyright © 2013 Criteo

Page 10: Machine learning at Criteo - Paris Datageeks

CTR PREDICTION MODEL

10

𝑃 (𝑐𝑙𝑖𝑐𝑘|𝑎𝑖 )= 𝑓 (𝑎𝑖 ,𝜃 )Copyright © 2013 Criteo

Page 11: Machine learning at Criteo - Paris Datageeks

CTR PREDICTION MODEL

11

• : features computed for this advertiser

Copyright © 2013 Criteo

Page 12: Machine learning at Criteo - Paris Datageeks

CTR PREDICTION MODEL

12

• : features computed for this advertiser

• : parameters of the model

Copyright © 2013 Criteo

Page 13: Machine learning at Criteo - Paris Datageeks

CTR PREDICTION MODEL

13

• : features computed for this advertiser

• : parameters of the model

• : prediction model

Copyright © 2013 Criteo

Page 14: Machine learning at Criteo - Paris Datageeks

CTR PREDICTION MODEL

14

• : [TimeOfDay, AdvertiserID, SizeOfAd]

• : learnt using an optimisation algorithm

• : dot product

Copyright © 2013 Criteo

Page 15: Machine learning at Criteo - Paris Datageeks

15-SECOND ML CRASH COURSE

15

• Get a massive amount of (, click) pairs

• Find the value of which gives the best prediction on these pairs.

Copyright © 2013 Criteo

Page 16: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

16

𝑃 (𝑐𝑙𝑖𝑐𝑘|𝑎𝑖 )= 𝑓 (𝑎𝑖 ,𝜃 )Copyright © 2013 Criteo

Page 17: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

17

• « You should use [his PhD topic] for f, it’s awesome! »

Copyright © 2013 Criteo

Page 18: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

18

• « You should use [his PhD topic] for f, it’s awesome! »

• « To f, you should use [other part of his PhD] »

Copyright © 2013 Criteo

Page 19: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

19

• « You should use [his PhD topic] for f, it’s awesome! »

• « To f, you should use [other part of his PhD] »

• « I think you should use [a particular optimizer he used during his PhD] »

Copyright © 2013 Criteo

Page 20: Machine learning at Criteo - Paris Datageeks

REAL TIME CONSTRAINTS FOR

20

• Bid must happen within a few ms

• Includes extracting variables, scoring tens of campaigns

• Leaves s per prediction

• Process is extremely noisyCopyright © 2013 Criteo

Page 21: Machine learning at Criteo - Paris Datageeks

IS WEIRD

21

• Around features

• Extremely sparse

• Website URL, Merchant ID, ...

Copyright © 2013 Criteo

Page 22: Machine learning at Criteo - Paris Datageeks

OPTIMIZING

22

• Merchant ID is too raw

• We can learn fancy features for each merchant: demographics, size, ...

• Hard to not penalize large merchants

Copyright © 2013 Criteo

Page 23: Machine learning at Criteo - Paris Datageeks

OPTIMIZING THE MODEL

23

• Computing is extremely fast

• The botteleneck is the streaming of the data

• Not all points are created equal

• Best optimizer is the one that needs the fewest data

Copyright © 2013 Criteo

Page 24: Machine learning at Criteo - Paris Datageeks

DEALING WITH MANY VARIABLES

24

• can be extremely large

• It makes learning and prediction complicated

Copyright © 2013 Criteo

Page 25: Machine learning at Criteo - Paris Datageeks

DEALING WITH MANY VARIABLES

25

• can be extremely large

• It makes learning and prediction complicated

• Solution: hash into

• New model is

Copyright © 2013 Criteo

Page 26: Machine learning at Criteo - Paris Datageeks

OTHER CHALLENGES

26

• How long do you wait for sales after a click?

• How much in the past do you go to learn the model?

• Do you value more long term or short term variables?

• How to choose the best variables?

Copyright © 2013 Criteo

Page 27: Machine learning at Criteo - Paris Datageeks

P R O D U C T R E C O M M E N D A T I O N

Copyright © 2013 Criteo

Page 28: Machine learning at Criteo - Paris Datageeks

THE CHALLENGES OF PRODUCT RECOMMENDATION

28

• We have the list of products seen

• A catalog can contain products

• How do we choose the right products to show?

Copyright © 2013 Criteo

Page 29: Machine learning at Criteo - Paris Datageeks

THE CHALLENGES OF PRODUCT RECOMMENDATION

29

• We have the list of products seen

• A catalog can contain products

• How do we choose the right products to show?

• In less than 20ms

Copyright © 2013 Criteo

Page 30: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

30

• « You should use [his PhD topic], it’s awesome! »

Copyright © 2013 Criteo

Page 31: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

31

• « You should use [his PhD topic], it’s awesome! »

• « It just takes a week to train. »

Copyright © 2013 Criteo

Page 32: Machine learning at Criteo - Paris Datageeks

NEW HIRE REACTION

32

• « You should use [his PhD topic], it’s awesome! »

• « It just takes a week to train. »

• « In real-time, you just have to score every product. »

Copyright © 2013 Criteo

Page 33: Machine learning at Criteo - Paris Datageeks

MAJOR HURDLES

33

• Products come and go

• There might be a sale (Black Friday)

• The « panties syndrome »

• Complementary vs. similar products

Copyright © 2013 Criteo

Page 34: Machine learning at Criteo - Paris Datageeks

OTHER CHALLENGES

34

• Multiple products in a banner

• Interaction between products and layout

• Different timeframes for different products

Copyright © 2013 Criteo

Page 35: Machine learning at Criteo - Paris Datageeks

A FIRST RECIPE FOR SUCCESS

35

• There are many sources of success/failure

• It is often suboptimal to focus on one

• The first step to address each source is often manual

Copyright © 2013 Criteo

Page 36: Machine learning at Criteo - Paris Datageeks

CONCLUSION

36

• Prediction is at the core of our business

• Huge engineering constraints

• Different bottlenecks than in academia

• Build from the ground up, not the other way around

Copyright © 2013 Criteo