Paris ML Meetup
TRANSCRIPT
Machine Learning @ Netflix (and some lessons learned)
Yves Raimond (@moustaki)
Research/Engineering Manager
Search & Recommendations Algorithm Engineering
Netflix evolution
Netflix scale
● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic
Recommendations @ Netflix
● Goal: Help members find content to watch and enjoy, to maximize satisfaction and retention
● Over 80% of what people watch comes from our recommendations
● Top Picks, Because You Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...
Models & Algorithms
▪ Regression (linear, logistic, elastic net)
▪ SVD and other matrix factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees / Random Forests
▪ Gaussian Processes
▪ …
Some lessons learned
Build the offline experimentation framework first
When tackling a new problem
● What offline metrics can we compute that capture what online improvements we're actually trying to achieve?
● How should the input data for that evaluation be constructed (train, validation, test)? (see the sketch after this list)
● How fast and easy is it to run a full cycle of offline experimentation?
○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
○ Provenance (see Dagobah)
○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
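To make the questions above concrete, here is a minimal sketch of such an offline evaluation harness in Python. Everything in it (the time-based split, the recall@k metric, the `run_cycle` driver) is a hypothetical illustration, not Netflix's actual framework.

```python
# Hypothetical sketch (not Netflix's framework): split logged plays by time,
# train on the past, and measure recall@k against what members then watched.

def time_based_split(plays, cutoff):
    """Split (member, title, timestamp) events so we never train on the
    future relative to the evaluation window."""
    train = [p for p in plays if p[2] < cutoff]
    test = [p for p in plays if p[2] >= cutoff]
    return train, test

def recall_at_k(recommended, watched, k=10):
    """Fraction of actually watched titles that appear in the top-k recs."""
    if not watched:
        return 0.0
    return len(set(recommended[:k]) & set(watched)) / len(watched)

def run_cycle(recommender, plays, cutoff, k=10):
    """One full offline cycle: train, recommend, score. Keeping this loop
    fast is what 'minimize time to first metric' means in practice."""
    train, test = time_based_split(plays, cutoff)
    model = recommender.fit(train)
    per_member = {}
    for member, title, _ in test:
        per_member.setdefault(member, []).append(title)
    scores = [recall_at_k(model.recommend(m, k), watched, k)
              for m, watched in per_member.items()]
    return sum(scores) / len(scores)
```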
When tackling an old problem
● Same…
○ Are the metrics designed when experimentation in that space first started still appropriate now?
Think about distribution from the outermost layers
1. For each combination of hyper-parameters
(e.g. grid search, random search, Gaussian processes, …) — sketch below
2. For each subset of the training data
a. Multi-core learning (e.g. Hogwild!)
b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
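A quick sketch of distributing at layer 1: each hyper-parameter combination is an independent training job, so the outermost layer parallelizes with no communication between workers. `train_and_evaluate` and its fake metric are hypothetical stand-ins, not the actual training code.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_evaluate(params):
    """Hypothetical stand-in: fit a model with these hyper-parameters on a
    single machine and return its offline metric."""
    lr, reg = params
    metric = -(lr - 0.1) ** 2 - reg     # fake metric peaking at lr = 0.1
    return {"lr": lr, "reg": reg, "metric": metric}

if __name__ == "__main__":
    grid = list(product([0.01, 0.1, 1.0], [0.0, 0.001, 0.01]))
    # Each grid point is an independent job: no communication between
    # workers, so the outermost layer distributes almost for free.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_and_evaluate, grid))
    print(max(results, key=lambda r: r["metric"]))
```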
When to use distributed learning?
● The communication overhead of distributed ML algorithms is non-trivial
● Is your data big enough that the gains from distribution offset the communication overhead? (see the back-of-envelope sketch below)
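One way to reason about that question is a back-of-envelope cost model. This is my own illustrative assumption, not a formula from the talk: total time is compute divided across workers, plus per-round synchronization.

```python
# Back-of-envelope model (illustrative assumption, not from the talk):
# total time = compute split across workers + per-round synchronization.
def training_time(n_examples, n_workers, sec_per_example,
                  comm_sec_per_round, n_rounds):
    compute = n_examples * sec_per_example / n_workers
    communication = comm_sec_per_round * n_rounds if n_workers > 1 else 0.0
    return compute + communication

# Small data: 1 worker beats 8, because synchronization dominates.
print(training_time(1e6, 1, 1e-5, 0, 0))       # ~10 s on a single machine
print(training_time(1e6, 8, 1e-5, 2.0, 100))   # ~201 s distributed
# Big data: distribution pays off.
print(training_time(1e10, 8, 1e-5, 2.0, 100))  # ~12,700 s vs ~100,000 s
```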
Example: Uncollapsed Gibbs sampler for LDA
(more details here)
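Since the slide only names the example, here is a minimal reconstruction of one uncollapsed Gibbs sweep for LDA (my sketch, not the Netflix implementation). The relevant property for distribution: given θ and φ, the token assignments are conditionally independent across documents, so documents can be sampled on different workers, at the cost of communicating φ each sweep.

```python
# Minimal uncollapsed Gibbs sweep for LDA (illustrative reconstruction).
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 20                    # topics, vocabulary size
alpha, beta = 0.1, 0.01         # symmetric Dirichlet priors
docs = [rng.integers(V, size=50) for _ in range(10)]  # toy corpus

theta = rng.dirichlet([alpha] * K, size=len(docs))    # doc-topic proportions
phi = rng.dirichlet([beta] * V, size=K)               # topic-word distributions

for sweep in range(20):
    n_dk = np.zeros((len(docs), K))                   # doc-topic counts
    n_kv = np.zeros((K, V))                           # topic-word counts
    for d, words in enumerate(docs):
        # Given theta and phi, assignments are independent across documents,
        # so each document could be sampled on a different worker.
        p = theta[d][:, None] * phi[:, words]         # K x n_tokens
        p /= p.sum(axis=0)
        for i, w in enumerate(words):
            z = rng.choice(K, p=p[:, i])
            n_dk[d, z] += 1
            n_kv[z, w] += 1
    # Resample theta and phi from their Dirichlet posteriors given counts.
    theta = np.array([rng.dirichlet(alpha + row) for row in n_dk])
    phi = np.array([rng.dirichlet(beta + row) for row in n_kv])
```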
Design production code to be experimentation-friendly
Example development process
Idea → Data → Offline modeling (R, Python, MATLAB, …) → Iterate → Implement in production system (Java, C++, …) → Production environment (A/B test) → Actual output
Where the actual output diverges from the final offline model:
● Missing post-processing logic
● Performance issues
● Code discrepancies
● Data discrepancies
Avoid dual implementations
Experiment code → Shared Engine ← Production code
One Shared Engine backs both the Experiment and Production paths (sketch below).
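An illustrative sketch of the pattern in Python (the names are mine, not Netflix's): the ranking logic lives in one shared function, and both the offline experiment harness and the production serving path call it, so there is no second implementation to drift out of sync.

```python
def rank(candidates, features, weights):
    """The shared engine: order candidate titles by model score.
    This is the ONLY place the scoring logic is implemented."""
    def score(title):
        return sum(weights.get(k, 0.0) * v
                   for k, v in features[title].items())
    return sorted(candidates, key=score, reverse=True)

def offline_eval(sessions, weights, k=10):
    """Experiment path: replay logged sessions through the shared engine."""
    hits = sum(watched in rank(cands, feats, weights)[:k]
               for cands, feats, watched in sessions)
    return hits / len(sessions)

def serve_request(candidates, features, weights, page_size=40):
    """Production path: serve a live request through the same engine."""
    return rank(candidates, features, weights)[:page_size]
```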
To be continued...