TRANSCRIPT
The Essentials of Effective Machine Learning
Scott Ernst, Director of Data Science & Engineering
Reinventing Employee Scheduling Software
A better, happier, more efficient hourly workforce
Computational Astrophysics
What happens
when stars collide?
“Magnetohydrodynamic Shock-wave Stability Simulations”
ML-Enhanced CG Animation & VFX
Can CG Characters
Walk without Animators?
“Supervised & Unsupervised Behavioral Character Locomotion”
Dinosaur Data Science
“Predict behavioral information from massive tracksite in Switzerland”
Excavation Site Map
What to do?
● No Legacy Constraints
● Freedom of Choice
● Support from Leadership
● Help from Peers
The Essentials of Effective Machine Learning
Machine Learning
A very real example
The Very Real Example

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example
$ python example.py
Accuracy: 94.74%
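The slide's six-line pipeline can be made concrete as a runnable sketch. Everything beyond those six lines — load_data(), the toy 1-nearest-neighbour Model, count_correct() — is a hypothetical stand-in, since the talk never shows those helpers:

```python
# A runnable sketch of the slide's pipeline using only the standard
# library. load_data(), Model, and count_correct() are invented
# stand-ins, not the talk's actual helpers.
import random
from types import SimpleNamespace


def make_point(label):
    # Two clusters: label 0 near (0, 0), label 1 near (4, 4).
    center = 0.0 if label == 0 else 4.0
    return [random.gauss(center, 1.0), random.gauss(center, 1.0)]


def load_data():
    random.seed(0)
    labels = [i % 2 for i in range(200)]
    points = [make_point(lbl) for lbl in labels]
    return SimpleNamespace(
        training_features=points[:150],
        training_labels=labels[:150],
        testing_features=points[150:],
        testing_labels=labels[150:],
    )


class Model:
    """1-nearest-neighbour classifier: predicts the label of the
    closest training point."""

    def fit(self, features, labels):
        self.features = features
        self.labels = labels

    def predict(self, features):
        def closest_label(point):
            dists = [
                (sum((a - b) ** 2 for a, b in zip(point, other)), lbl)
                for other, lbl in zip(self.features, self.labels)
            ]
            return min(dists)[1]

        return [closest_label(p) for p in features]


def count_correct(expected, predicted):
    return sum(e == p for e, p in zip(expected, predicted))


data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
```

With well-separated clusters the toy model scores high accuracy — which is exactly the effortless "victory" the next slides poke fun at.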
The Very Real Example
Victory! Machine Learning is Easy!
The Improved Very Real Example

data = load_data()
model = FancierModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python improved_example.py
Accuracy: 95.66%
$ python example.py
Accuracy: 94.74%
Improved Victory! Machine Learning is Very Easy!
The Even More Improved Very Real Example

data = load_data()
model = EvenFancierModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python even_more_improved_example.py
Accuracy: 97.09%
$ python improved_example.py
Accuracy: 95.66%
Even Bigger Victory! Machine Learning is Really Easy!
The Super Improved Very Real Example

data = load_data()
model = SuperFancyModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python super_improved_example.py
Accuracy: 97.11%
$ python even_more_improved_example.py
Accuracy: 97.09%
Super Victory! Machine Learning is Super Easy!
So What Now?
Resume
Scott Ernst
(555) 555-1234 ● [email protected] ● https://linkedin.com/in/mlmaster
Experienced Machine Learning practitioner capable of solving challenging problems with creativity and efficiency.
Skills
Expert in Machine Learning with:
● Model
● FancyModel
● FancierModel
● SuperFancyModel
Machine Learning is a collection of tools
Just because you can use a hammer
doesn’t mean you can build a house.
Some Historical Perspective
DOTCOM BUBBLE
1995–2001
Answer to the Ultimate Question of Life, the Universe, and Everything...
Circa 2000
Web Development
Success & Failures
One of the biggest failures at Boo was to assume that [web development] was not a technology issue. Up through launch and beyond, the [web] team was first reporting to business development and then to marketing.
Boo.com Postmortem
- Tristan Louis, CTO
Joel Spolsky
Co-Founder & CEO
Founder
The Joel Test: Yes/No Questions
For Assessing the Quality of a Software Team
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Answer to the Ultimate Question of Life, the Universe, and Everything...
Circa 2019
Machine Learning & AI
What is the “Joel Test”
for Machine Learning?
Keeping in mind that
Success is not the absence of complete failure
RoI: Return on Investment
Bad RoI
[Chart: Invest -$200K → Outcome +$100K]

Better RoI
[Chart: Invest -$200K → Outcome +$600K]
In Machine Learning
Time is the Investment
Bad RoI
[Chart: Invest 100k person hours → Outcome +$100K]

Better RoI
[Chart: Invest 60k person hours → Outcome +$600K]

Much Better RoI
[Chart: Invest 600 person hours → Outcome +$600K]
The Holistic Process
Machine Learning Process (Simplified)
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
A Lot More to it than .fit().predict()
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
“80% of what we call analytics is not analytics at all but just hard work”
- Werner Vogels, CTO @ Amazon.com
Degrees of Execution Quality
Poor
OK
Good
The Ideal Scenario
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: high across all six stages]
Common Case #1: “The Garbage Plant”
“Garbage Plant”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: poor in the early stages]
Poor experiments & improper data collection produce garbage results
Data is Incorrect
Some Data is Incorrect
The Very Real Example (again)

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example + Noise

data = add_noise(load_data(), amount=0.01)
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
The Very Real Example with 1% Noise
$ python example_with_noise.py
Accuracy: 81.58%
$ python example.py
Accuracy: 94.74%
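The talk never shows add_noise; one plausible reading is label noise, i.e. randomly flipping a fraction of the labels. A minimal sketch of that assumption:

```python
# Hypothetical sketch of add_noise: randomly flip a fraction of
# binary (0/1) labels to simulate incorrect data.
import random


def add_noise(labels, amount=0.01, seed=0):
    rng = random.Random(seed)
    noisy = list(labels)
    flip_count = int(len(noisy) * amount)
    for index in rng.sample(range(len(noisy)), flip_count):
        noisy[index] = 1 - noisy[index]  # flip 0 <-> 1
    return noisy


clean = [i % 2 for i in range(1000)]
noisy = add_noise(clean, amount=0.01)
flipped = sum(c != n for c, n in zip(clean, noisy))
print(f'Flipped {flipped} of {len(clean)} labels '
      f'({100 * flipped / len(clean):.1f}%)')
```

Even this tiny corruption rate is enough to account for the double-digit accuracy drop the slide shows.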
@ When I Work
Volume: 2.1 billion / month
Velocity: 1,400 / second
Variety: 250+ distinct streams
2 second availability
Basic Toolchain
Ingest Store Consume
How do these Tools Help Overcome...
Entropy: Fragmented, Inconsistent & Disparate Data
Biases: Thumbs on the Scale
A Maslow’s Hierarchy of Needs
“food” “shelter” “water”
Ingest Store Consume
Governance
Strict Governance
Data Catalog
Quarantine
Ingest
Validated Stream
Data Lake
Data Immutability: Write Once
Validated Stream
Data Lake
Read Only
Write Once
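The quarantine idea above can be sketched as a validation gate: each incoming record is checked against a schema before it may enter the validated stream, and failures land in quarantine for review. The record shape and schema below are illustrative assumptions, not When I Work's actual pipeline:

```python
# Illustrative sketch of strict ingest governance: validate each
# incoming record against a minimal schema, routing failures to a
# quarantine area instead of the validated stream.
validated_stream = []
quarantine = []


def is_valid(record):
    # Hypothetical schema: user_id must be a positive int,
    # event must be a non-empty string.
    return (
        isinstance(record.get('user_id'), int)
        and record['user_id'] > 0
        and isinstance(record.get('event'), str)
        and record['event'] != ''
    )


def ingest(record):
    if is_valid(record):
        validated_stream.append(record)
    else:
        quarantine.append(record)


for raw in [
    {'user_id': 42, 'event': 'shift_started'},
    {'user_id': -1, 'event': 'shift_started'},  # bad id -> quarantine
    {'user_id': 7, 'event': ''},                # bad event -> quarantine
]:
    ingest(raw)

print(len(validated_stream), len(quarantine))  # prints: 1 2
```

Because only validated records ever reach the write-once store, downstream consumers never have to re-litigate data quality.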
Common Case #2: “The Silver Bullet”
“Silver Bullet”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
Machine Learning will save us
Epsilon Modeling
ε
Machine Learning RoI
[Chart: Invest 600 person hours → Outcome +$600K]

$ python version-1.py
Accuracy: 97.09%

Machine Learning RoI
[Chart: Invest 600 + 200 more person hours → Outcome +$600K + ???]

$ python version-2.py
Accuracy: 97.11%
Does the +Δ0.02% Deliver Enough Additional Value?
$ python version-2.py
Accuracy: 97.11%
$ python version-1.py
Accuracy: 97.09%
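One way to answer the Δ0.02% question is to compare it against the statistical uncertainty of the accuracy measurement itself, using the standard normal-approximation confidence interval for a proportion. The test-set size of 10,000 below is an assumption for illustration; the talk doesn't state one:

```python
# 95% confidence half-width for an accuracy estimate, via the
# normal approximation: 1.96 * sqrt(p * (1 - p) / n).
import math


def accuracy_half_width(accuracy, n, z=1.96):
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


n_test = 10_000  # assumed test-set size
half_width = accuracy_half_width(0.9709, n_test)
print(f'95% CI half-width: +/-{100 * half_width:.2f} points')
print('Observed improvement: +0.02 points')
```

At that size the 95% interval is roughly ±0.33 points, so a +0.02-point change is indistinguishable from measurement noise — before even asking whether it delivers business value.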
Netflix Envy
Less Successful Cases
$ python version-1.py
Accuracy: 27.4%
Less Successful Cases
$ python version-2.py
Accuracy: 27.7%
$ python version-1.py
Accuracy: 27.4%
Less Successful Cases
$ python version-3.py
Accuracy: 28.3%
$ python version-2.py
Accuracy: 27.7%
Less Successful Cases
$ python version-4.py
Accuracy: 29.0%
$ python version-3.py
Accuracy: 28.3%
Future RoI: Unknown
[Chart: Invest ??? → Outcome ???]
Non-Tech Example
Blockbuster Movies
Example: Rogue One
[Chart: Invested $520M → Earned +$1.1B]

Example: Solo
[Chart: Invested $450M → Earned +$400M (earned vs. projected)]
E-Commerce Recommendation Engine
“When you’re fundraising, it’s AI.
When you’re hiring, it’s ML.
When you’re implementing, it’s logistic regression.”
— everyone on Twitter ever
Common Case #3: “The Academic Exercise”
“Academic Exercise”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
Subject Matter Experts reign supreme!
Highly Interdisciplinary Field
Machine Learning Skills
Requires expertise in a very wide range of skills
First Hire
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
How were they chosen?

Selection Bias
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
A Data Scientist is someone like me...

A Team Missing Skills
[Chart: Expertise (Low–Extreme) across Machine Learning Skills]
Important elements for success are lacking
Built in the Laboratory, Works in the Laboratory
Can’t Survive in the Wild
Local Development

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 240 ms
******* RESULTS *******
Data: 200k rows
Mean: 12.4
Variance: 4.2
Error: 2.3
The Dream of Transparent Scaling

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 320 ms
******* RESULTS *******
Data: 20B rows
Mean: 12.4
Variance: 4.2
Error: 2.3
Not a Reality

stopwatch.start()
result = model.fit(data)
stopwatch.stop()
print(stopwatch.elapsed())
print(result)

Elapsed: 320 ms
******* RESULTS *******
Data: 20B rows
Mean: 12.4
Variance: 4.2
Error: 2.3
Common Case #4: “Doing Agile”
“Doing Agile”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
What looks good in our tickets is...
Application Engineering ≠ Data Engineering
Applications can be Engineered
“We will be able to deliver value VN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields V1 Yields V2 Yields VN
“We will be able to add the automatic logout feature next week.”
Insight cannot be Engineered
“We might be able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields I1? Yields I2? Yields IN?
“We might be able to reduce false positives by 5% next week.”
Insight cannot be Engineered
“We weren’t able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields T1 Yields T2 Yields TN
“We weren’t able to reduce false positives by 5% last week.”
shift into
Task-based Deliverables
Value-based Deliverables
Tasks can be Engineered
“We will be able to finish task TN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields T1 Yields T2 Yields TN
“We will be able to add a new feature to the model next week.”
Engineer for Insight
Don’t Try to Engineer Insight
Engineers Shouldn’t Write ETL
A Guide to Building a High Functioning Data Science Department
Jeff Magnusson, VP Data Platform
https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
engineers must deploy platforms, services, abstractions, and
frameworks that allow the data scientists to conceive of,
develop, and deploy their ideas ... I like to think of it in terms
of Lego blocks. Engineers design new Lego blocks that data
scientists assemble in creative ways to create new data
science.
Increased Operational Tempo
“It’s highly likely we’ll be able to deliver insight IN in iteration N.”
Iteration 1 Iteration 2 Iteration N
Yields I1... Yields I2... Yields IN...
“We’ll likely be able to reduce false positives by 5% next week.”
Common Case #5: “Blindly Charging Ahead”
“Blindly Charging Ahead”
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
What effect are we even having...
“All models are wrong but some are useful.”
- George Box
https://en.wikipedia.org/wiki/George_E._P._Box
ML Always has an Answer
But how wrong is it?
Solar System Orbits
Heliocentric (Sun is the center)
Solar System Orbits
Geocentric (Earth is the center)
Solar System Orbits
Two “Valid” Solutions
The Very Real Example (Yet Again)

data = load_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
Modify for a Rare Event Data Set

data = load_rare_event_data()
model = Model()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
Modify for a Rare Event Data Set

data = load_rare_event_data()
model = AlwaysFalseModel()
model.fit(data.training_features, data.training_labels)
predictions = model.predict(data.testing_features)
correct = count_correct(
    expected=data.testing_labels,
    predicted=predictions
)
print(f'Accuracy: {100 * correct / len(predictions):,.2f}%')
$ python rare_event_example.py
Accuracy: 95.00%
The Rare Event Example
Because True only occurs 5% of the time.
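The trap is easy to reproduce. The AlwaysFalseModel below is a minimal stand-in matching the slide's name, evaluated on a set where True occurs 5% of the time:

```python
# Minimal AlwaysFalseModel reproducing the 95%-accuracy trap on a
# rare-event data set where True occurs only 5% of the time.
class AlwaysFalseModel:
    def fit(self, features, labels):
        pass  # ignores the data entirely

    def predict(self, features):
        return [False] * len(features)


# 5% positives, 95% negatives.
testing_labels = [True] * 5 + [False] * 95
testing_features = [[0.0]] * len(testing_labels)

model = AlwaysFalseModel()
predictions = model.predict(testing_features)

correct = sum(e == p for e, p in zip(testing_labels, predictions))
true_positives = sum(e and p for e, p in zip(testing_labels, predictions))
print(f'Accuracy: {100 * correct / len(predictions):.2f}%')  # 95.00%
print(f'True positives found: {true_positives}')             # 0
```

A model that never fires scores 95% while catching none of the rare events it exists to detect — accuracy alone says nothing about usefulness here.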
Mocking a Heuristic as a scikit-learn Estimator
Eric Ness, Sr. Data Scientist
https://medium.com/when-i-work-data/mocking-a-heuristic-as-a-scikit-learn-estimator-9200bd2fb100
Creating mock models using a heuristic is an excellent way to
remove bottlenecks in the development cycle. [They are also
useful] in establishing the minimum performance necessary
for a model to be valuable. For example, if the model is
trying to predict which customers will leave and which will
stay, then a naive model might predict that all customers will
stay. While it has high accuracy, its precision will be poor.
Any viable model will need to beat the naive model’s
performance.
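A sketch of that naive baseline for the churn example in the quote. The blog post's implementation subclasses scikit-learn's BaseEstimator; this stand-alone version only mimics the fit/predict interface, and the class name and data are invented for illustration:

```python
# A heuristic mock model exposing the scikit-learn-style fit/predict
# interface. It encodes the naive baseline "every customer stays"
# (label 0), establishing the floor a real model must beat.
class EveryoneStaysModel:
    def fit(self, features, labels):
        return self  # nothing to learn; the heuristic is fixed

    def predict(self, features):
        return [0] * len(features)  # 0 = stays, 1 = leaves


# Assumed test set: 10% of customers leave.
labels = [1] * 10 + [0] * 90
features = [[0.0]] * len(labels)

baseline = EveryoneStaysModel().fit(features, labels)
predictions = baseline.predict(features)

accuracy = sum(e == p for e, p in zip(labels, predictions)) / len(labels)
predicted_leavers = sum(predictions)
print(f'Baseline accuracy: {accuracy:.0%}')        # high accuracy...
print(f'Leavers identified: {predicted_leavers}')  # ...but zero found
```

Because the mock shares the real model's interface, it can be dropped into the same pipeline while the real model is still being built, unblocking everything downstream.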
More than a Score
Understanding is Critical
Two “Valid” Solutions
Domain Expertise is Critical
For big data to mature beyond marketing hype towards truly transformative solutions, it must “grow up” out of the computer science labs that gave birth to it and spend more time on understanding the domain-specific [problems] it is applied to than on the computing algorithms that operationalize them.
- Kalev Leetaru
https://www.wired.com/2014/06/how-to-teach-heartless-computers-to-really-get-what-were-feeling/
E-Commerce Recommendation Engine
The Ideal Scenario
1. COLLECT → 2. PROCESS → 3. RESEARCH → 4. DEVELOP → 5. DEPLOY → 6. VALIDATE
[Quality: high across all six stages]
Self Mockery
Kevin Schiroo, Data Engineer
https://medium.com/when-i-work-data/self-mockery-2f6eabf27b21
One of the biggest reasons that I believe we succeed in
managing all of [our many] projects is our commitment to
our practice, to not only focus on the final results we deliver,
but also on the path we take to deliver them.
So what is the “Joel Test”
for Machine Learning?
.fit().predict()
What I do Know
Machine Learning is a powerful set of tools that require a holistic approach to use effectively.
The Essentials of Effective Machine Learning
Scott Ernst ● [email protected] ● linkedin.com/in/swernst/