Barcelona ML Meetup - Lessons Learned

Lessons Learned

Upload: xavier-amatriain

Post on 16-Apr-2017

Lessons Learned

More Data vs. Better Models

Really?

Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart

Sometimes, it’s not about more data

Norvig: “Google does not have better Algorithms only more Data”

Many features/low-bias models

Sometimes, it’s not about more data

You Might not need all your “big Data”

Sometimes you do need a Complex Model

It pays off to be smart about Hyperparameters
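
One common way to be smart about hyperparameters is to replace an exhaustive grid with random search. The slide does not say which method the talk recommends, so the following is only a minimal sketch of that one option. `validation_loss` is a hypothetical stand-in for training a model and scoring it on held-out data, and the two hyperparameter names and their ranges are assumptions made for illustration.

```python
import random


def validation_loss(learning_rate, regularization):
    """Hypothetical stand-in for 'train a model, score it on held-out data'."""
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2


def random_search(n_trials, seed=0):
    """Return (best_loss, best_params) over n_trials random configurations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample each hyperparameter log-uniformly between 1e-4 and 1
        # (assumed ranges, chosen only for this example).
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "regularization": 10 ** rng.uniform(-4, 0),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=200)
print(best_loss, best_params)
```

Sampling on a log scale spends trials evenly across orders of magnitude, which tends to matter for parameters such as learning rates and regularization strengths.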

Supervised plus Unsupervised Learning

Everything is an ensemble

The output of your model will be the input of another one

(and other system design problems)

The pains & gains of Feature Engineering

Implicit signals beat explicit ones

(almost always)

Be thoughtful about your Training Data

Your Model will learn what you teach it to learn

Learn to deal with Presentation Bias

More likely to see

Less likely
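
Presentation bias means that items shown in prominent positions collect more clicks simply because they are more likely to be seen. One standard correction, which the slide does not name and which is swapped in here as an illustration, is inverse propensity weighting: each click is weighted by the inverse of the probability that its position was examined. The sketch below assumes those examination probabilities are already known. `ipw_click_rate` and the sample log are hypothetical.

```python
def ipw_click_rate(impressions, propensity_by_position):
    """Estimate a position-corrected click rate per item.

    impressions: iterable of (item, position, clicked) tuples.
    propensity_by_position: assumed-known probability that a user
        examines each position (1.0 means always seen).
    """
    weighted_clicks = {}
    counts = {}
    for item, position, clicked in impressions:
        # A click in a rarely examined position counts for more.
        weight = 1.0 / propensity_by_position[position]
        weighted_clicks[item] = weighted_clicks.get(item, 0.0) + (weight if clicked else 0.0)
        counts[item] = counts.get(item, 0) + 1
    return {item: weighted_clicks[item] / counts[item] for item in counts}


# Hypothetical log: item A is always shown at position 0, item B at position 1.
log = (
    [("A", 0, True)] * 4 + [("A", 0, False)] * 6
    + [("B", 1, True)] * 2 + [("B", 1, False)] * 8
)
rates = ipw_click_rate(log, {0: 1.0, 1: 0.5})
print(rates)
```

In this made-up log the raw click-through rates differ (0.4 for A, 0.2 for B), but after weighting by the assumed propensities both items come out equally attractive.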

Data and Models are great. You know what’s even better?

The right evaluation approach!

You don’t need to distribute your ML algorithm

But if you do, you should understand at what level to do it

The three levels of Distribution/Parallelization

● For each subset of the population (e.g. region)
● For each combination of the hyperparameters
● For each subset of the training data

Each level has different requirements
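
The second level, one independent training run per hyperparameter combination, is the easiest to parallelize because the runs share no state. A minimal sketch, assuming a hypothetical `train_and_score` function and made-up parameter names; a thread pool is used here only to keep the example self-contained, and CPU-bound training would more likely use a process pool or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in: train one model, return (validation_loss, params)."""
    learning_rate, depth = params
    return ((learning_rate - 0.1) ** 2 + (depth - 4) ** 2, params)


def parallel_grid_search(learning_rates, depths, max_workers=4):
    """Evaluate every hyperparameter combination concurrently."""
    grid = list(product(learning_rates, depths))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each combination is an independent task; no communication is needed.
        results = list(pool.map(train_and_score, grid))
    # Tuples compare by loss first, so min() picks the best combination.
    return min(results)


best_loss, best_params = parallel_grid_search([0.01, 0.1, 1.0], [2, 4, 8])
print(best_loss, best_params)
```

Splitting by population subset has the same embarrassingly parallel shape. Splitting a single training run across subsets of the training data does not, since the workers have to exchange parameters or gradients.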

ANN Training over distributed GPUs

Some things are better done Online and others Offline… and there is Nearline for everything in between

System Overview

● Blueprint for multiple personalization algorithm services
    ● Ranking
    ● Row selection
    ● Ratings
    ● …

● Recommendation involving multi-layered Machine Learning

Matrix Factorization Example
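
The slide's own example is not preserved in this transcript. As a reminder of what the technique computes, here is a minimal sketch of matrix factorization trained with stochastic gradient descent on observed ratings only. The function names, the toy ratings, and all hyperparameter values are assumptions made for illustration, and nothing here reflects how the system in the talk splits the work across online, nearline, and offline components.

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


# Toy (user, item, rating) triples; unobserved pairs are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
sse = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
print(sse, predict(P, Q, 0, 2))
```

The last line also prints a prediction for a user-item pair that never appears in the training triples, which is the quantity a recommender would rank on.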

The two faces of your ML infrastructure

Why you should care about answering questions (about your model)

The untold story of Data Science and ML Engineering
