TRANSCRIPT
The Essentials of Effective Machine Learning
Scott Ernst, Director of Data Science & Engineering
Reinventing Employee Scheduling Software
A better, happier, more efficient hourly workforce
Computational Astrophysics
What happens
when stars collide?
“Magnetohydrodynamic Shock-wave Stability Simulations”
ML-Enhanced CG Animation & VFX
Can CG Characters
Walk without Animators?
“Supervised & Unsupervised Behavioral Character Locomotion”
Dinosaur Data Science
“Predict behavioral information from massive tracksite in Switzerland”
Excavation Site Map
What to do?
● No Legacy Constraints
● Freedom of Choice
● Support from Leadership
● Help from Peers
The Essentials of Effective Machine Learning
Machine Learning
A very real example
The Very Real Example

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example
$ python example.py
Accuracy: 94.74%
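The slide's six-line pipeline can be made concrete as a runnable sketch. Everything beyond those six lines — load_data(), the toy 1-nearest-neighbour Model, count_correct() — is a hypothetical stand-in, since the talk never shows those helpers:

```python
# A runnable sketch of the slide's pipeline using only the standard
# library. load_data(), Model, and count_correct() are invented
# stand-ins, not the talk's actual helpers.
import random
from types import SimpleNamespace


def make_point(label):
    # Two clusters: label 0 near (0, 0), label 1 near (4, 4).
    center = 0.0 if label == 0 else 4.0
    return [random.gauss(center, 1.0), random.gauss(center, 1.0)]


def load_data():
    random.seed(0)
    labels = [i % 2 for i in range(200)]
    points = [make_point(lbl) for lbl in labels]
    return SimpleNamespace(
        training_features=points[:150],
        training_labels=labels[:150],
        testing_features=points[150:],
        testing_labels=labels[150:],
    )


class Model:
    """1-nearest-neighbour classifier: predicts the label of the
    closest training point."""

    def fit(self, features, labels):
        self.features = features
        self.labels = labels

    def predict(self, features):
        def closest_label(point):
            dists = [
                (sum((a - b) ** 2 for a, b in zip(point, other)), lbl)
                for other, lbl in zip(self.features, self.labels)
            ]
            return min(dists)[1]

        return [closest_label(p) for p in features]


def count_correct(expected, predicted):
    return sum(e == p for e, p in zip(expected, predicted))


data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
```

With well-separated clusters the toy model scores high accuracy — which is exactly the effortless "victory" the next slides poke fun at.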
The Very Real Example
Victory! Machine Learning is Easy!
The Improved Very Real Example

data = load_data()
model = FancierModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python improved_example.py
Accuracy: 95.66%
$ python example.py
Accuracy: 94.74%
Improved Victory! Machine Learning is Very Easy!
The Even More Improved Very Real Example

data = load_data()
model = EvenFancierModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python even_more_improved_example.py
Accuracy: 97.09%
$ python improved_example.py
Accuracy: 95.66%
Even Bigger Victory! Machine Learning is Really Easy!
The Super Improved Very Real Example

data = load_data()
model = SuperFancyModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python super_improved_example.py
Accuracy: 97.11%
$ python even_more_improved_example.py
Accuracy: 97.09%
Super Victory! Machine Learning is Super Easy!
So What Now?
Resume
Scott Ernst
(555) 555-1234 ● [email protected] ● https://linkedin.com/in/mlmaster
Experienced Machine Learning practitioner capable of solving challenging problems with creativity and efficiency.
Skills
Expert in Machine Learning with:
● Model
● FancyModel
● FancierModel
● SuperFancyModel
Machine Learning is a collection of tools
Just because you can use a hammer
doesn’t mean you can build a house.
Some Historical Perspective
DOTCOM BUBBLE
1995–2001
Answer to the Ultimate Question of Life, the Universe, and Everything...
Circa 2000
Web Development
Success & Failures
One of the biggest failures at Boo was to assume that [web development] was not a technology issue. Up through launch and beyond, the [web] team was first reporting to business development and then to marketing.
Boo.com Postmortem
- Tristan Louis, CTO
Joel Spolsky
Co-Founder & CEO
Founder
The Joel Test: Yes/No Questions
For Assessing the Quality of a Software Team
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Answer to the Ultimate Question of Life, the Universe, and Everything...
Circa 2019
Machine Learning & AI
What is the “Joel Test”
for Machine Learning?
Keeping in mind that
Success is not the absence of complete failure
RoI: Return on Investment
Bad RoI
[Chart: Invest -$200K → Outcome +$100K]

Better RoI
[Chart: Invest -$200K → Outcome +$600K]
In Machine Learning
Time is the Investment
Bad RoI
[Chart: Invest 100k person hours → Outcome +$100K]

Better RoI
[Chart: Invest 60k person hours → Outcome +$600K]

Much Better RoI
[Chart: Invest 600 person hours → Outcome +$600K]
The Holistic Process
Machine Learning Process (Simplified)
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
A Lot More to it than .fit().predict()
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
“80% of what we call analytics is not analytics at all but just hard work”
- Werner Vogels, CTO @ Amazon.com
Degrees of Execution Quality
Poor
OK
Good
The Ideal Scenario
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: high across all six stages]
Common Case #1: “The Garbage Plant”
“Garbage Plant”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: poor in the early stages]
Poor experiments & improper data collection produce garbage results
Data is Incorrect
Some Data is Incorrect
The Very Real Example (again)

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example + Noise

data = add_noise(load_data(), amount=0.01)
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example with 1% Noise
$ python example_with_noise.py
Accuracy: 81.58%
$ python example.py
Accuracy: 94.74%
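The talk never shows add_noise; one plausible reading is label noise, i.e. randomly flipping a fraction of the labels. A minimal sketch of that assumption:

```python
# Hypothetical sketch of add_noise: randomly flip a fraction of
# binary (0/1) labels to simulate incorrect data.
import random


def add_noise(labels, amount=0.01, seed=0):
    rng = random.Random(seed)
    noisy = list(labels)
    flip_count = int(len(noisy) * amount)
    for index in rng.sample(range(len(noisy)), flip_count):
        noisy[index] = 1 - noisy[index]  # flip 0 <-> 1
    return noisy


clean = [i % 2 for i in range(1000)]
noisy = add_noise(clean, amount=0.01)
flipped = sum(c != n for c, n in zip(clean, noisy))
print(f'Flipped {flipped} of {len(clean)} labels '
      f'({100 * flipped / len(clean):.1f}%)')
```

Even this tiny corruption rate is enough to account for the double-digit accuracy drop the slide shows.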
@ When I Work
Volume: 2.1 billion / month
Velocity: 1,400 / second
Variety: 250+ distinct streams
2 second availability
Basic Toolchain
Ingest Store Consume
How do these Tools Help Overcome...
Entropy: Fragmented, Inconsistent & Disparate Data
Biases: Thumbs on the Scale
A Maslow’s Hierarchy of Needs
“food” “shelter” “water”
Ingest Store Consume
Governance
Strict Governance
Data Catalog
Quarantine
Ingest
Validated Stream
Data Lake
Data Immutability: Write Once
Validated Stream
Data Lake
Read Only
Write Once
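The quarantine idea above can be sketched as a validation gate: each incoming record is checked against a schema before it may enter the validated stream, and failures land in quarantine for review. The record shape and schema below are illustrative assumptions, not When I Work's actual pipeline:

```python
# Illustrative sketch of strict ingest governance: validate each
# incoming record against a minimal schema, routing failures to a
# quarantine area instead of the validated stream.
validated_stream = []
quarantine = []


def is_valid(record):
    # Hypothetical schema: user_id must be a positive int,
    # event must be a non-empty string.
    return (
        isinstance(record.get('user_id'), int)
        and record['user_id'] > 0
        and isinstance(record.get('event'), str)
        and record['event'] != ''
    )


def ingest(record):
    if is_valid(record):
        validated_stream.append(record)
    else:
        quarantine.append(record)


for raw in [
    {'user_id': 42, 'event': 'shift_started'},
    {'user_id': -1, 'event': 'shift_started'},  # bad id -> quarantine
    {'user_id': 7, 'event': ''},                # bad event -> quarantine
]:
    ingest(raw)

print(len(validated_stream), len(quarantine))  # prints: 1 2
```

Because only validated records ever reach the write-once store, downstream consumers never have to re-litigate data quality.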
Common Case #2: “The Silver Bullet”
“Silver Bullet”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
Machine Learning will save us
Epsilon Modeling
ε
Machine Learning RoI
[Chart: Invest 600 person hours → Outcome +$600K]

$ python version-1.py
Accuracy: 97.09%

Machine Learning RoI
[Chart: Invest 600 + 200 more person hours → Outcome +$600K + ???]

$ python version-2.py
Accuracy: 97.11%
Does the +Δ0.02% Deliver Enough Additional Value?
$ python version-2.py
Accuracy: 97.11%
$ python version-1.py
Accuracy: 97.09%
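One way to answer the Δ0.02% question is to compare it against the statistical uncertainty of the accuracy measurement itself, using the standard normal-approximation confidence interval for a proportion. The test-set size of 10,000 below is an assumption for illustration; the talk doesn't state one:

```python
# 95% confidence half-width for an accuracy estimate, via the
# normal approximation: 1.96 * sqrt(p * (1 - p) / n).
import math


def accuracy_half_width(accuracy, n, z=1.96):
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


n_test = 10_000  # assumed test-set size
half_width = accuracy_half_width(0.9709, n_test)
print(f'95% CI half-width: +/-{100 * half_width:.2f} points')
print('Observed improvement: +0.02 points')
```

At that size the 95% interval is roughly ±0.33 points, so a +0.02-point change is indistinguishable from measurement noise — before even asking whether it delivers business value.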
Netflix Envy
Less Successful Cases
$ python version-1.py
Accuracy: 27.4%
Less Successful Cases
$ python version-2.py
Accuracy: 27.7%
$ python version-1.py
Accuracy: 27.4%
Less Successful Cases
$ python version-3.py
Accuracy: 28.3%
$ python version-2.py
Accuracy: 27.7%
Less Successful Cases
$ python version-4.py
Accuracy: 29.0%
$ python version-3.py
Accuracy: 28.3%
Future RoI: Unknown
[Chart: Invest ??? → Outcome ???]
Non-Tech Example
Blockbuster Movies
Example: Rogue One
[Chart: Invested $520M → Earned +$1.1B]

Example: Solo
[Chart: Invested $450M → Earned +$400M (earned vs. projected)]
E-Commerce Recommendation Engine
“When you’re fundraising, it’s AI.
When you’re hiring, it’s ML.
When you’re implementing, it’s logistic regression.”
— everyone on Twitter ever
Common Case #3: “The Academic Exercise”
“Academic Exercise”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
Subject Matter Experts reign supreme!
Highly Interdisciplinary Field
Machine Learning Skills
Requires expertise in a very wide range of skills
First Hire
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
How were they chosen?

Selection Bias
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
A Data Scientist is someone like me...

A Team Missing Skills
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
Important elements for success are lacking
Built in the Laboratory, Works in the Laboratory
Can’t Survive in the Wild
Local Development

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 240 ms
******* RESULTS *******
Data: 200k rows
Mean: 12.4
Variance: 4.2
Error: 2.3
The Dream of Transparent Scaling

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 320 ms
******* RESULTS *******
Data: 20B rows
Mean: 12.4
Variance: 4.2
Error: 2.3
Not a Reality

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 320 ms
******* RESULTS *******
Data: 20B rows
Mean: 12.4
Variance: 4.2
Error: 2.3
Common Case #4: “Doing Agile”
“Doing Agile”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
What looks good in our tickets is...
Application Engineering ≠ Data Engineering
Applications can be Engineered
“We will be able to deliver value VN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields V1 Yields V2 Yields VN
“We will be able to add the automatic logout feature next week.”
Insight cannot be Engineered
“We might be able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields I1? Yields I2? Yields IN?
“We might be able to reduce false positives by 5% next week.”
Insight cannot be Engineered
“We weren’t able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields T1 Yields T2 Yields TN
“We weren’t able to reduce false positives by 5% last week.”
shift into
Task-based Deliverables
Value-based Deliverables
Tasks can be Engineered
“We will be able to finish task TN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields T1 Yields T2 Yields TN
“We will be able to add a new feature to the model next week.”
Engineer for Insight
Don’t Try to Engineer Insight
Engineers Shouldn’t Write ETL
A Guide to Building a High Functioning Data Science Department
Jeff Magnusson, VP Data Platform
https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
engineers must deploy platforms, services, abstractions, and
frameworks that allow the data scientists to conceive of,
develop, and deploy their ideas ... I like to think of it in terms
of Lego blocks. Engineers design new Lego blocks that data
scientists assemble in creative ways to create new data
science.
Increased Operational Tempo
“It’s highly likely we’ll be able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields I1... Yields I2... Yields IN...
“We’ll likely be able to reduce false positives by 5% next week.”
Common Case #5: “Blindly Charging Ahead”
“Blindly Charging Ahead”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
What effect are we even having...
“All models are wrong but some are useful.”
- George Box
https://en.wikipedia.org/wiki/George_E._P._Box
ML Always has an Answer
But how wrong is it?
Solar System Orbits
Heliocentric (Sun is the center)
Solar System Orbits
Geocentric (Earth is the center)
Solar System Orbits
Two “Valid” Solutions
The Very Real Example (Yet Again)

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
Modify for a Rare Event Data Set

data = load_rare_event_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
Modify for a Rare Event Data Set

data = load_rare_event_data()
model = AlwaysFalseModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python rare_event_example.py
Accuracy: 95.00%
The Rare Event Example
Because True only occurs 5% of the time.
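The trap is easy to reproduce. The AlwaysFalseModel below is a minimal stand-in matching the slide's name, evaluated on a set where True occurs 5% of the time:

```python
# Minimal AlwaysFalseModel reproducing the 95%-accuracy trap on a
# rare-event data set where True occurs only 5% of the time.
class AlwaysFalseModel:
    def fit(self, features, labels):
        pass  # ignores the data entirely

    def predict(self, features):
        return [False] * len(features)


# 5% positives, 95% negatives.
testing_labels = [True] * 5 + [False] * 95
testing_features = [[0.0]] * len(testing_labels)

model = AlwaysFalseModel()
predictions = model.predict(testing_features)

correct = sum(e == p for e, p in zip(testing_labels, predictions))
true_positives = sum(e and p for e, p in zip(testing_labels, predictions))
print(f'Accuracy: {100 * correct / len(predictions):.2f}%')  # 95.00%
print(f'True positives found: {true_positives}')             # 0
```

A model that never fires scores 95% while catching none of the rare events it exists to detect — accuracy alone says nothing about usefulness here.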
Mocking a Heuristic as a scikit-learn Estimator
Eric Ness, Sr. Data Scientist
https://medium.com/when-i-work-data/mocking-a-heuristic-as-a-scikit-learn-estimator-9200bd2fb100
Creating mock models using a heuristic is an excellent way to
remove bottlenecks in the development cycle. [They are also
useful] in establishing the minimum performance necessary
for a model to be valuable. For example, if the model is
trying to predict which customers will leave and which will
stay, then a naive model might predict that all customers will
stay. While it has high accuracy, its precision will be poor.
Any viable model will need to beat the naive model’s
performance.
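A sketch of that naive baseline for the churn example in the quote. The blog post's implementation subclasses scikit-learn's BaseEstimator; this stand-alone version only mimics the fit/predict interface, and the class name and data are invented for illustration:

```python
# A heuristic mock model exposing the scikit-learn-style fit/predict
# interface. It encodes the naive baseline "every customer stays"
# (label 0), establishing the floor a real model must beat.
class EveryoneStaysModel:
    def fit(self, features, labels):
        return self  # nothing to learn; the heuristic is fixed

    def predict(self, features):
        return [0] * len(features)  # 0 = stays, 1 = leaves


# Assumed test set: 10% of customers leave.
labels = [1] * 10 + [0] * 90
features = [[0.0]] * len(labels)

baseline = EveryoneStaysModel().fit(features, labels)
predictions = baseline.predict(features)

accuracy = sum(e == p for e, p in zip(labels, predictions)) / len(labels)
predicted_leavers = sum(predictions)
print(f'Baseline accuracy: {accuracy:.0%}')        # high accuracy...
print(f'Leavers identified: {predicted_leavers}')  # ...but zero found
```

Because the mock shares the real model's interface, it can be dropped into the same pipeline while the real model is still being built, unblocking everything downstream.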
More than a Score
Understanding is Critical
Two “Valid” Solutions
Domain Expertise is Critical
For big data to mature beyond marketing hype towards truly transformative solutions, it must “grow up” out of the computer science labs that gave birth to it and spend more time on understanding the domain-specific [problems] it is applied to than on the computing algorithms that operationalize them.
- Kalev Leetaru
https://www.wired.com/2014/06/how-to-teach-heartless-computers-to-really-get-what-were-feeling/
E-Commerce Recommendation Engine
The Ideal Scenario
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: high across all six stages]
Self Mockery
Kevin Schiroo, Data Engineer
https://medium.com/when-i-work-data/self-mockery-2f6eabf27b21
One of the biggest reasons that I believe we succeed in
managing all of [our many] projects is our commitment to
our practice, to not only focus on the final results we deliver,
but also on the path we take to deliver them.
So what is the “Joel Test”
for Machine Learning?
.fit().predict()
What I do Know
Machine Learning is a powerful set of tools that require a holistic approach to use effectively.
The Essentials of Effective Machine Learning
Scott Ernst ● [email protected] ● linkedin.com/in/swernst/