Copyright © 2015 Criteo

Large-Scale Real-Time Product Recommendation at Criteo

Simon Dollé

RecSys FR, December 1st, 2015

We buy

Ad spaces

We sell

Clicks that convert a lot

We take the risk

10 000 displays

lead to

50 clicks

which lead to

1 sale
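
The rates implied by this funnel are easy to work out. The sketch below uses only the three numbers on the slide; the variable names are illustrative and not taken from the talk.

```python
# Funnel figures taken from the slide; the names are illustrative.
displays = 10_000
clicks = 50
sales = 1

click_through_rate = clicks / displays   # clicks per display
conversion_rate = sales / clicks         # sales per click
sales_per_display = sales / displays     # sales per display

print(f"Click-through rate: {click_through_rate:.2%}")
print(f"Conversion rate:    {conversion_rate:.2%}")
print(f"Sales per display:  {sales_per_display:.4%}")
```

With so few sales per display, buying ad space while being paid for clicks means the buyer carries the risk, which is the point of the previous slide.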

3 billion ads/day

3 billion products

10 ms to pick relevant products

7 data centers

15 000 servers

1200-node Hadoop cluster

Ad display data: 20B events / day

Browsing history: 2B events / day

Catalog data: 3B+ products

How do we do it?

Recommend products for a user

• What we want: reco(user) = products

• 1B users x 3B products!

• But we need to scale and keep it fresh

• What we can do:

Pre-select products offline

Refine scoring online to get final candidates
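
Scoring every user against every product would mean 1B x 3B = 3 x 10^18 pairs, which is why the work is split in two. The sketch below illustrates that split under stated assumptions: `PRESELECTIONS`, `recommend`, the product names and the toy scores are all hypothetical stand-ins, not Criteo's actual data structures or model.

```python
from typing import Callable, Dict, List

# Hypothetical offline output: for each product a user may have seen,
# a short list of pre-selected candidate products.
PRESELECTIONS: Dict[str, List[str]] = {
    "orange_shoes": ["red_shoes", "orange_sandals", "shoe_polish"],
    "blue_jeans": ["black_jeans", "leather_belt"],
}


def recommend(seen: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Online step: gather pre-selected candidates, score them, keep the top k."""
    candidates: List[str] = []
    for product in seen:
        for candidate in PRESELECTIONS.get(product, []):
            # Skip products already seen and candidates already collected.
            if candidate not in seen and candidate not in candidates:
                candidates.append(candidate)
    return sorted(candidates, key=score, reverse=True)[:k]


# Toy scorer standing in for the real online model.
toy_scores = {"red_shoes": 0.18, "orange_sandals": 0.12, "shoe_polish": 0.03}
top = recommend(["orange_shoes"], lambda p: toy_scores.get(p, 0.0), k=2)
print(top)
```

The online step only touches the handful of candidates attached to the products a user has seen, so its cost does not depend on the size of the catalog.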

Bob saw orange shoes

Some candidate products

Historical

Similar

Complementary

Most viewed

[Architecture diagram: browsing history (50B) feeds Hadoop; preselection computation Map-Reduce jobs produce preselections (500M, 12h), which are served by the Recommendation Service (20K qps)]
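
The talk does not say how the preselection jobs compute their candidates. As one possible illustration, the sketch below counts co-views in a map/reduce style and keeps the most frequent partners per product. Co-view counting, the function names `map_phase` and `reduce_phase`, and the sample events are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical browsing history: (user, product) view events.
events = [
    ("bob", "orange_shoes"), ("bob", "red_shoes"),
    ("alice", "orange_shoes"), ("alice", "red_shoes"), ("alice", "socks"),
    ("carol", "orange_shoes"), ("carol", "socks"),
    ("dave", "orange_shoes"), ("dave", "red_shoes"),
]


def map_phase(events):
    """Group views by user, then emit one (product, co_viewed_product) pair
    for every ordered pair of distinct products the same user viewed."""
    by_user = defaultdict(set)
    for user, product in events:
        by_user[user].add(product)
    for products in by_user.values():
        for a, b in permutations(sorted(products), 2):
            yield a, b


def reduce_phase(pairs, top_n):
    """Count co-views per product and keep the top_n most frequent partners."""
    counts = defaultdict(Counter)
    for a, b in pairs:
        counts[a][b] += 1
    return {a: [b for b, _ in c.most_common(top_n)] for a, c in counts.items()}


preselections = reduce_phase(map_phase(events), top_n=2)
print(preselections["orange_shoes"])
```

On a real cluster the two phases would run as distributed jobs keyed by user and by product; here they are plain Python functions so the logic can be read end to end.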

Online: sources

Similarities Most viewed Most bought

Online: merge of products

Similarities Most viewed Most bought
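
A minimal sketch of such a merge, assuming each source returns a list of product identifiers. The function name `merge_sources` and the sample products are hypothetical; the only idea taken from the slide is that candidates from the three sources end up in one pool.

```python
from typing import Dict, List


def merge_sources(sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge candidate lists, keeping each product once and recording
    which sources proposed it (insertion order is preserved)."""
    merged: Dict[str, List[str]] = {}
    for source_name, products in sources.items():
        for product in products:
            merged.setdefault(product, []).append(source_name)
    return merged


sources = {
    "similarities": ["red_shoes", "orange_sandals"],
    "most_viewed": ["white_sneakers", "red_shoes"],
    "most_bought": ["white_sneakers", "black_boots"],
}
merged = merge_sources(sources)
print(list(merged))
print(merged["red_shoes"])
```

Keeping the list of originating sources per product is a design choice of this sketch: it lets a later scoring step use the provenance as a feature if desired.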

ML model

• Logistic regression models because:

• They scale

• They are fast

• They can handle lots of features

Features: product-specific, user-specific, user-product interactions, display-specific
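
To make the model concrete, here is a minimal sketch of logistic regression over sparse binary features. The feature names, weights and bias are invented for illustration, with one hypothetical feature per family named on the slide; the talk does not describe the actual features or how they are encoded.

```python
import math
from typing import Dict, List


def predict(weights: Dict[str, float], bias: float, active_features: List[str]) -> float:
    """Logistic regression over sparse binary features:
    sigmoid(bias + sum of the weights of the features that are present)."""
    z = bias + sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, one illustrative feature per family on the slide.
weights = {
    "product:category=shoes": 0.4,          # product-specific
    "user:recent_views>=5": 0.3,            # user-specific
    "user_product:viewed_yesterday": 1.2,   # user-product interaction
    "display:banner_size=300x250": -0.1,    # display-specific
}
bias = -4.0

p = predict(weights, bias, ["product:category=shoes", "user_product:viewed_yesterday"])
print(round(p, 4))
```

Prediction is a handful of dictionary lookups and one exponential per candidate, which is consistent with the "they are fast" argument: the cost grows with the number of active features, not with the total number of features the model knows about.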

[Architecture diagram, complete version: browsing history (50B) and display, click, sale logs feed Hadoop; Hadoop produces preselections (500M, 12h; preselection computation Map-Reduce jobs) and prediction models (6h), both served by the Recommendation Service (20K qps)]

Online: scoring

Similarities Most viewed Most bought

0.02 0.12 0.06 0.18 0.03 0.05 0.01 0.005 0.011 0.013 0.004 0.007

Online: scoring (sorted)

0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004

Online: candidates

[Banner illustration: SHOP SHOP SHOP SHOP, -50%]
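
The scoring and candidate slides amount to a sort followed by a cut. The sketch below uses the twelve scores from the slides; `banner_slots = 4` is an assumption made for the example, since the talk does not state how many products a banner holds.

```python
import heapq

# Scores as shown on the slide, in their original (unsorted) order.
scores = [0.02, 0.12, 0.06, 0.18, 0.03, 0.05,
          0.01, 0.005, 0.011, 0.013, 0.004, 0.007]

# Full ranking, highest score first.
ranked = sorted(scores, reverse=True)

# Assumption: the banner shows four products.
banner_slots = 4

# nlargest avoids sorting the whole list when only a few items are needed.
top = heapq.nlargest(banner_slots, scores)

print(ranked)
print(top)
```

When only a few slots need filling, `heapq.nlargest` gives the same top items as a full sort while doing less work on long candidate lists.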

What’s next?

What’s next for us: Upcoming challenges

• Long(er)-term user profiles

• More and better product information (images, semantic, NLP)

• Instant-update of similarities

• Joint product scoring (score the full banner rather than each product independently)

What’s next for you: Fancy a try?

On your own:

• We published datasets for click prediction

• 4GB display-click data: Kaggle challenge in 2014 http://bit.ly/1vgw2XC

• 1TB display-click data (industry’s largest dataset): http://bit.ly/1PyH4Vq

• 4 billion observations

• 156 billion feature values

• available on Microsoft Azure

• used by edX (UC Berkeley)

With us!

http://labs.criteo.com/jobs/

Questions?

Thank you!

s.dolle@criteo.com

@simondolle @recsysfr

Credits: Creative Stall, Gilbert Bages
