Full Stack Deep Learning (March 2019), Lecture 6: Troubleshooting Deep Neural Networks

Josh Tobin, Sergey Karayev, Pieter Abbeel



Lifecycle of a ML project

Per-project activities:

• Planning & project setup

• Data collection & labeling

• Training & debugging

• Deploying & testing

Cross-project infrastructure:

• Team & hiring

• Infra & tooling

Why talk about DL troubleshooting?

XKCD, https://xkcd.com/1838/

Common sentiment among practitioners:

• 80-90% of time debugging and tuning

• 10-20% deriving math or implementing things

Why is DL troubleshooting so hard?

0. Why is troubleshooting hard?

Suppose you can’t reproduce a result

[Figure: the published training curve vs. your learning curve]

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Why is your performance worse?

Poor model performance has many possible sources. The first: implementation bugs.

Most DL bugs are invisible

Example: labels out of order! Training still runs and the loss still goes down, so nothing fails loudly.


Why is your performance worse?

Another source: hyperparameter choices.

Models are sensitive to hyperparameters

Andrej Karpathy, CS231n course notes
He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.


Why is your performance worse?

Another source: data/model fit.

Data / model fit

Example: the data in the paper is ImageNet; yours is self-driving car images.


Why is your performance worse?

A final source: dataset construction.

Constructing good datasets is hard

[Figure: amount of lost sleep over data vs. models, during a PhD vs. at Tesla]
Slide from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018

Common dataset construction issues

• Not enough data

• Class imbalances

• Noisy labels

• Train / test from different distributions

• etc

Takeaways: why is troubleshooting hard?

• Hard to tell if you have a bug

• Lots of possible sources for the same degradation in performance

• Results can be sensitive to small changes in hyperparameters and dataset makeup

Strategy for DL troubleshooting

Key mindset for DL troubleshooting

Pessimism.

Key idea of DL troubleshooting

Since it’s hard to disambiguate errors, start simple and gradually ramp up complexity.

Strategy for DL troubleshooting

Start simple → Implement & debug → Evaluate → meets requirements? If yes, done. If not, tune hyper-parameters or improve model/data, then evaluate again.


Quick summary

• Start simple: choose the simplest model & data possible (e.g., LeNet on a subset of your data)

• Implement & debug: once the model runs, overfit a single batch & reproduce a known result

• Evaluate: apply the bias-variance decomposition to decide what to do next

• Tune hyperparameters: use coarse-to-fine random searches

• Improve model/data: make your model bigger if you underfit; add data or regularize if you overfit
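The "coarse-to-fine random search" step can be sketched in a few lines: sample hyperparameters log-uniformly over a wide range, then re-sample in a narrower range around the best trial. Here `val_error` is a hypothetical stand-in for "train the model and return validation error", not a real training run:

```python
import math
import random

random.seed(0)

def sample_lr(lo, hi):
    # Learning rates live on a log scale, so sample log-uniformly
    return 10 ** random.uniform(math.log10(lo), math.log10(hi))

def val_error(lr):
    # Hypothetical stand-in for "train the model, return validation error";
    # this toy objective is minimized at lr = 3e-4
    return abs(math.log10(lr) - math.log10(3e-4))

# Coarse stage: 20 trials over a wide range
coarse = [(val_error(lr), lr) for lr in (sample_lr(1e-6, 1e-1) for _ in range(20))]
best_err, best_lr = min(coarse)

# Fine stage: 20 more trials in a narrow range around the best coarse trial
fine = [(val_error(lr), lr) for lr in (sample_lr(best_lr / 3, best_lr * 3) for _ in range(20))]
best_err, best_lr = min(coarse + fine)
```

The same two-stage pattern applies to any hyperparameter; the key design choice is sampling randomly (not on a grid) and on the right scale.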

We’ll assume you already have…

• Initial test set

• A single metric to improve

• Target performance based on human-level performance, published results, previous baselines, etc

Running example: pedestrian detection, labeling images as 0 (no pedestrian) or 1 (yes pedestrian).
Goal: 99% classification accuracy

Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting

Tune hyper-parameters

Implement & debugStart simple Evaluate

Improve model/data

Meets re-quirements

Strategy for DL troubleshooting

!35

1. Start simple

Steps:

a. Choose a simple architecture

b. Use sensible defaults

c. Normalize inputs

d. Simplify the problem

Demystifying architecture selection

• Images: start with a LeNet-like architecture; consider using ResNet later

• Sequences: start with an LSTM with one hidden layer (or temporal convs); consider an attention model or WaveNet-like model later

• Other: start with a fully connected neural net with one hidden layer; later choices are problem-dependent


Dealing with multiple input modalities

Example: an image plus the words “This” “is” “a” “cat”.

1. Map each input into a lower-dimensional feature space, e.g., ConvNet + Flatten for the image, an LSTM for the text (giving, say, 64-, 72-, and 48-dim features for inputs 1-3)

2. Concatenate the features (184-dim)

3. Pass through fully connected layers to the output: FC (256-dim) → FC (128-dim) → Output (T/F)
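The three steps above can be sketched in plain NumPy, with random vectors standing in for the (hypothetical) ConvNet and LSTM encoder outputs; the dimensions match the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned encoders: each maps its modality to a fixed-size feature vector
image_feat = rng.normal(size=64)   # e.g., ConvNet + Flatten output
text_feat  = rng.normal(size=72)   # e.g., final LSTM hidden state
other_feat = rng.normal(size=48)   # e.g., flattened features of a third input

# 2. Concatenate into a single 184-dim vector
fused = np.concatenate([image_feat, text_feat, other_feat])

# 3. Pass through fully connected layers to a single output
W1, b1 = rng.normal(size=(256, 184)), np.zeros(256)
W2, b2 = rng.normal(size=(128, 256)), np.zeros(128)
w3, b3 = rng.normal(size=128), 0.0

h1 = np.maximum(0, W1 @ fused + b1)   # ReLU
h2 = np.maximum(0, W2 @ h1 + b2)
logit = w3 @ h2 + b3                  # T/F decision via sigmoid(logit) > 0.5
```

In a real model the encoders and FC weights would be trained jointly; the point here is only the map-concatenate-FC wiring.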


Recommended network / optimizer defaults

• Optimizer: Adam with learning rate 3e-4

• Activations: ReLU (FC and conv models), tanh (LSTMs)

• Initialization: He et al. normal (ReLU), Glorot normal (tanh)

• Regularization: none

• Data normalization: none

Definitions of recommended initializers

• He et al. normal (used for ReLU): N(0, sqrt(2/n))

• Glorot normal (used for tanh): N(0, sqrt(2/(n+m)))

• (n is the number of inputs, m is the number of outputs)
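A minimal NumPy sketch of these two initializers; the layer sizes and tolerance are illustrative:

```python
import numpy as np

def he_normal(n_in, n_out, rng):
    # He et al. normal: std = sqrt(2 / n), where n is the number of inputs
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))

def glorot_normal(n_in, n_out, rng):
    # Glorot normal: std = sqrt(2 / (n + m)), inputs plus outputs
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_out, n_in))

rng = np.random.default_rng(0)
W_he = he_normal(512, 256, rng)
W_gl = glorot_normal(512, 256, rng)

# Empirical stds match the target values closely
assert abs(W_he.std() - np.sqrt(2 / 512)) < 1e-3
assert abs(W_gl.std() - np.sqrt(2 / (512 + 256))) < 1e-3
```

Most frameworks ship these under names like `HeNormal`/`GlorotNormal`; writing them out once makes the variance formulas concrete.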


Important to normalize scale of input data

• Subtract the mean and divide by the standard deviation

• For images, it’s fine to scale values to [0, 1] or [-0.5, 0.5] (e.g., by dividing by 255). Careful: make sure your library doesn’t already do it for you!
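Both options in NumPy, on a fake batch of 8-bit images (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake batch of 8-bit grayscale images: (batch, height, width)
images = rng.integers(0, 256, size=(32, 28, 28)).astype(np.float64)

# Option 1: subtract the mean and divide by the standard deviation
mean, std = images.mean(), images.std()
standardized = (images - mean) / std
assert abs(standardized.mean()) < 1e-9
assert abs(standardized.std() - 1.0) < 1e-9

# Option 2 (images): just scale to [0, 1] by dividing by 255
scaled = images / 255.0
assert scaled.min() >= 0.0 and scaled.max() <= 1.0
```

In practice, compute the mean and std on the training set only and reuse them at test time.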


Consider simplifying the problem as well

• Start with a small training set (~10,000 examples)

• Use a fixed number of objects, classes, image size, etc.

• Create a simpler synthetic training set

Simplest model for the pedestrian detection running example

• Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test

• Use a LeNet architecture with sigmoid cross-entropy loss

• Adam optimizer with LR 3e-4

• No regularization

Starting simple: summary

a. Choose a simple architecture: LeNet, LSTM, or fully connected

b. Use sensible defaults: Adam optimizer & no regularization

c. Normalize inputs: subtract mean and divide by std, or just divide by 255 (images)

d. Simplify the problem: start with a simpler version of your problem (e.g., smaller dataset)


2. Implement & debug

Implementing bug-free DL models

Steps:

a. Get your model to run

b. Overfit a single batch

c. Compare to a known result
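Overfitting a single batch is a cheap correctness check: with enough steps, training loss on one small batch should drive toward zero, and if it doesn't, something is broken. A minimal sketch with a toy logistic-regression "model" in NumPy (the model and data are stand-ins for illustration, not the pedestrian detector):

```python
import numpy as np

rng = np.random.default_rng(0)

# One small batch: 8 examples, 10 features, binary labels
X = rng.normal(size=(8, 10))
y = (rng.random(8) > 0.5).astype(float)

w, b = np.zeros(10), 0.0
lr = 0.5

def loss_and_grads(w, b):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # sigmoid predictions
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    err = (p - y) / len(y)                        # gradient of loss w.r.t. logits
    return loss, X.T @ err, err.sum()

initial_loss, _, _ = loss_and_grads(w, b)
for _ in range(5000):
    _, gw, gb = loss_and_grads(w, b)
    w -= lr * gw
    b -= lr * gb
final_loss, _, _ = loss_and_grads(w, b)

# The single batch should be (nearly) memorized; if not, suspect a bug
assert final_loss < 0.05 < initial_loss
```

With a real network the loop is the same: fix one batch, train on it repeatedly, and watch the loss go to ~0 before training on the full dataset.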

Preview: the five most common DL bugs

• Incorrect shapes for your tensors. Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x + y).shape = (None, None)

• Pre-processing inputs incorrectly. E.g., forgetting to normalize, or doing too much pre-processing

• Incorrect input to your loss function. E.g., passing softmaxed outputs to a loss that expects logits

• Forgetting to set up train mode for the net correctly. E.g., toggling train/eval, controlling batch norm dependencies

• Numerical instability (inf/NaN). Often stems from an exp, log, or div operation
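The broadcasting pitfall is easy to reproduce in NumPy; concrete sizes here stand in for the (None,) batch dimensions above:

```python
import numpy as np

x = np.zeros(3)        # shape (3,)   -- e.g., per-example predictions
y = np.zeros((3, 1))   # shape (3, 1) -- e.g., per-example labels

# Intended: an elementwise (3,) result. Actual: a silent outer broadcast.
assert (x + y).shape == (3, 3)

# Fix: make the shapes agree before combining
assert (x + y.squeeze(-1)).shape == (3,)
```

No error is raised, which is exactly why this bug is invisible: the (3, 3) result flows into the loss and training still "works", just wrongly.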


Page 57: Full Stack Deep Learning...Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting Suppose you can’t reproduce a result!7 He, Kaiming,

Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting

General advice for implementing your model

!57

Build complicated data pipelines later

• Start with a dataset you can load into memory

Use off-the-shelf components, e.g.,

• Keras

• tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x))

• tf.losses.cross_entropy(…) instead of writing out the exp

Lightweight implementation

• Minimum possible new lines of code for v1

• Rule of thumb: <200 lines

• (Tested infrastructure components are fine)

2. Implement & debug
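One reason to prefer a library loss over writing out the exp is numerical stability. A minimal NumPy sketch of what a stable implementation does internally (the max-subtraction trick; the specific logit values are illustrative):

```python
import numpy as np

logits = np.array([1000.0, 0.0])  # large logits are common early in training

with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits))  # exp(1000) overflows -> nan

# Library losses subtract the max logit first, which is mathematically a no-op
shifted = logits - logits.max()
stable = np.exp(shifted) / np.sum(np.exp(shifted))

print(naive)   # [nan  0.]
print(stable)  # [1. 0.]
```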

Implementing bug-free DL models

Steps:
a. Get your model to run
b. Overfit a single batch
c. Compare to a known result

a. Get your model to run

Common issues and recommended resolutions:
• Shape mismatch: step through model creation and inference in a debugger
• Casting issue: step through model creation and inference in a debugger
• OOM: scale back memory-intensive operations one-by-one
• Other: standard debugging toolkit (Stack Overflow + interactive debugger)

Debuggers for DL code

• PyTorch: easy; use ipdb
• TensorFlow: trickier
  - Option 1: step through graph creation
  - Option 2: step into the training loop and evaluate tensors using sess.run(…)
  - Option 3: use tfdbg, which stops execution at each sess.run(…) and lets you inspect:
    python -m tensorflow.python.debug.examples.debug_mnist --debug


Shape mismatch

Common issues and most common causes:
• Undefined shapes:
  - Confusing tensor.shape, tf.shape(tensor), and tensor.get_shape()
  - Reshaping things to a shape of type Tensor (e.g., when loading data from a file)
• Incorrect shapes:
  - Flipped dimensions when using tf.reshape(…)
  - Took sum, average, or softmax over the wrong dimension
  - Forgot to flatten after conv layers
  - Forgot to get rid of extra "1" dimensions (e.g., if shape is (None, 1, 1, 4))

Casting issue

Common issue: data not in float32. Most common causes:
• Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, and loaded it as float32)
• Forgot to cast images from uint8 to float32
• Generated data using numpy in float64, forgot to cast to float32
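The uint8 pitfall in concrete form; a minimal NumPy sketch (pixel values illustrative). Integer arithmetic on raw image bytes wraps around instead of going negative:

```python
import numpy as np

img = np.array([[100, 200]], dtype=np.uint8)  # raw image bytes

bad = img - 128                            # stays uint8: 100 - 128 wraps to 228
good = img.astype(np.float32) - 128.0      # cast first, then center

print(bad)   # [[228  72]]
print(good)  # [[-28.  72.]]
```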

OOM (out of memory)

Common issues and most common causes:
• Too big a tensor:
  - Too large a batch size for your model (e.g., during evaluation)
  - Too large fully connected layers
• Too much data:
  - Loading too large a dataset into memory, rather than using an input queue
  - Allocating too large a buffer for dataset creation
• Duplicating operations:
  - Memory leak due to creating multiple models in the same session
  - Repeatedly creating an operation (e.g., in a function that gets called over and over again)
• Other processes:
  - Other processes running on your GPU

Other common errors

Most common causes:
• Forgot to initialize variables
• Forgot to turn off bias when using batch norm
• "Fetch argument has invalid type": usually you overwrote one of your ops with an output during training


b. Overfit a single batch

Common issues and most common causes:
• Error goes up: flipped the sign of the loss function / gradient; learning rate too high; softmax taken over wrong dimension
• Error explodes: numerical issue (check all exp, log, and div operations); learning rate too high
• Error oscillates: data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly); learning rate too high
• Error plateaus: learning rate too low; gradients not flowing through the whole model; too much regularization; incorrect input to loss function (e.g., softmax instead of logits, accidentally added ReLU on output); data or labels corrupted

2. Implement & debug
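Overfitting a single batch in practice; a minimal NumPy sketch with a toy logistic-regression "model" and a hand-picked separable batch (all values illustrative). If the loss will not drive toward zero on one small, fixed batch, something upstream is broken:

```python
import numpy as np

# One tiny, fixed batch: the label is the sign of the first feature
X = np.array([[ 1.0,  0.3], [ 2.0, -0.5], [ 1.5,  0.8],
              [-1.0,  0.2], [-2.0, -0.7], [-1.5,  0.4]])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad = p - y                             # dLoss/dlogits for cross-entropy
    w -= 0.5 * X.T @ grad / len(y)
    b -= 0.5 * grad.mean()

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
acc = float(((p > 0.5) == (y > 0.5)).mean())
print(round(float(loss), 3), acc)  # loss near 0, accuracy 1.0
```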

c. Compare to a known result

Hierarchy of known results (from more useful to less useful):

1. Official model implementation evaluated on a similar dataset to yours. You can walk through the code line-by-line and ensure you have the same output, and ensure your performance is up to par with expectations.
2. Official model implementation evaluated on a benchmark (e.g., MNIST). You can walk through the code line-by-line and ensure you have the same output.
3. Unofficial model implementation. Same as above, but with lower confidence.
4. Results from the paper (with no code). You can ensure your performance is up to par with expectations.
5. Results from your model on a benchmark dataset (e.g., MNIST). You can make sure your model performs well in a simpler setting.
6. Results from a similar model on a similar dataset. You can get a general sense of what kind of performance can be expected.
7. Super simple baselines (e.g., average of outputs or linear regression). You can make sure your model is learning anything at all.

2. Implement & debug
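Even the bottom of the hierarchy is useful; a minimal sketch of a majority-class baseline (toy labels, purely illustrative). Any trained model should beat this, or it isn't learning:

```python
import numpy as np

y_train = np.array([0, 0, 0, 1, 0, 1])  # toy training labels
y_val = np.array([0, 1, 0, 0])          # toy validation labels

majority = np.bincount(y_train).argmax()        # always predict the most common class
baseline_acc = float((y_val == majority).mean())
print(majority, baseline_acc)  # 0 0.75
```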

Summary: how to implement & debug

a. Get your model to run: step through in a debugger and watch out for shape, casting, and OOM errors
b. Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors
c. Compare to a known result: keep iterating until the model performs up to expectations

Strategy for DL troubleshooting

Start simple → implement & debug → evaluate → improve model/data or tune hyper-parameters → evaluate again, looping until the model meets requirements.

3. Evaluate

Bias-variance decomposition

[Figure: breakdown of test error by source. Train error = irreducible error + avoidable bias (i.e., underfitting); val error additionally includes variance (i.e., overfitting); test error additionally includes val set overfitting.]

Test error = irreducible error + bias + variance + val overfitting

This assumes train, val, and test all come from the same distribution. What if not?
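The same-distribution decomposition, applied to the pedestrian-detection numbers used later in this lecture (goal 1%, train 20%, val 27%, test 28%):

```python
goal, train, val, test = 0.01, 0.20, 0.27, 0.28

avoidable_bias = train - goal   # under-fitting
variance = val - train          # over-fitting
val_overfitting = test - val

print(round(avoidable_bias, 2), round(variance, 2), round(val_overfitting, 2))  # 0.19 0.07 0.01
```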

Handling distribution shift

Use two val sets: one sampled from the training distribution and one from the test distribution.

The bias-variance tradeoff

Bias-variance with distribution shift

[Figure: breakdown of test error by source, now with two val sets. Train error = irreducible error + avoidable bias (i.e., underfitting); train-val error additionally includes variance; test-val error additionally includes distribution shift; test error additionally includes val overfitting.]
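With two val sets, the distribution-shift term becomes directly measurable. A sketch with hypothetical error values (the chart's exact numbers did not survive extraction):

```python
# Hypothetical error values, for illustration only
goal, train = 0.01, 0.20
train_val, test_val, test = 0.27, 0.32, 0.34

variance = train_val - train        # over-fitting (same-distribution val set)
dist_shift = test_val - train_val   # distribution shift
val_overfitting = test - test_val

print(round(variance, 2), round(dist_shift, 2), round(val_overfitting, 2))  # 0.07 0.05 0.02
```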

Train, val, and test error for pedestrian detection

Running example. Goal: 99% classification accuracy, i.e., 1% error (0 = no pedestrian, 1 = yes pedestrian).

Error source     | Value
Goal performance | 1%
Train error      | 20%
Validation error | 27%
Test error       | 28%

• Train - goal = 19% (under-fitting)
• Val - train = 7% (over-fitting)
• Test - val = 1% (looks good!)

Summary: evaluating model performance

Test error = irreducible error + bias + variance + distribution shift + val overfitting


4. Improve

Prioritizing improvements (i.e., applied bias-variance)

Steps:
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)

Addressing under-fitting (i.e., reducing bias), ordered from try-first to try-later:

A. Make your model bigger (i.e., add layers or use more units per layer)
B. Reduce regularization
C. Error analysis
D. Choose a different (closer to state-of-the-art) model architecture (e.g., move from LeNet to ResNet)
E. Tune hyper-parameters (e.g., learning rate)
F. Add features

Train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error)

Interventions, left to right: baseline; add more layers to the ConvNet; switch to ResNet-101; tune learning rate.

Error source     | Baseline | +Layers | ResNet-101 | Tune LR
Goal performance | 1%       | 1%      | 1%         | 1%
Train error      | 20%      | 7%      | 3%         | 0.8%
Validation error | 27%      | 19%     | 10%        | 12%
Test error       | 28%      | 20%     | 10%        | 12%


Addressing over-fitting (i.e., reducing variance), ordered from try-first to try-later:

A. Add more training data (if possible!)
B. Add normalization (e.g., batch norm, layer norm)
C. Add data augmentation
D. Increase regularization (e.g., dropout, L2, weight decay)
E. Error analysis
F. Choose a different (closer to state-of-the-art) model architecture
G. Tune hyperparameters
H. Early stopping (not recommended!)
I. Remove features (not recommended!)
J. Reduce model size (not recommended!)

4. Improve
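A minimal sketch of data augmentation (item C above), assuming image batches shaped (N, H, W); random left-right flips are among the cheapest augmentations to add:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(batch):
    """Flip each image left-right with probability 0.5 (batch shape: N, H, W)."""
    out = batch.copy()
    flip = rng.random(len(batch)) < 0.5   # per-image coin flip
    out[flip] = out[flip, :, ::-1]        # reverse the width axis
    return out

imgs = np.arange(12, dtype=np.float32).reshape(2, 2, 3)  # two tiny toy "images"
aug = random_flip(imgs)
```

Each output image is either the original or its mirror, so labels are unchanged for label-symmetric tasks.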

Page 109: Full Stack Deep Learning...Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting Suppose you can’t reproduce a result!7 He, Kaiming,

Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting

Train, val, and test error for pedestrian detection

!109

Error source Value

Goal performance 1%

Train error 0.8%

Validation error 12%

Test error 12%

0 (no pedestrian) 1 (yes pedestrian)

Goal: 99% classification accuracy

Running example

4. Improve

Page 110: Full Stack Deep Learning...Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting Suppose you can’t reproduce a result!7 He, Kaiming,

Full Stack Deep Learning (March 2019) Pieter Abbeel, Sergey Karayev, Josh Tobin L6: Troubleshooting

Train, val, and test error for pedestrian detection

!110

Error source Value Value

Goal performance 1% 1%

Train error 0.8% 1.5%

Validation error 12% 5%

Test error 12% 6%

Increase dataset size to 250,000

0 (no pedestrian) 1 (yes pedestrian)

Goal: 99% classification accuracy

Running example

4. Improve

Page 111

Train, val, and test error for pedestrian detection: add weight decay

Error source       Baseline   +More data   +Weight decay
Goal performance   1%         1%           1%
Train error        0.8%       1.5%         1.7%
Validation error   12%        5%           4%
Test error         12%        6%           4%

4. Improve

Page 112

Train, val, and test error for pedestrian detection: add data augmentation

Error source       Baseline   +More data   +Weight decay   +Augmentation
Goal performance   1%         1%           1%              1%
Train error        0.8%       1.5%         1.7%            2%
Validation error   12%        5%           4%              2.5%
Test error         12%        6%           4%              2.6%

4. Improve

Page 113

Train, val, and test error for pedestrian detection: tune num layers, optimizer params, weight initialization, kernel size, weight decay

Error source       Baseline   +More data   +Weight decay   +Augmentation   +Tuning
Goal performance   1%         1%           1%              1%              1%
Train error        0.8%       1.5%         1.7%            2%              0.6%
Validation error   12%        5%           4%              2.5%            0.9%
Test error         12%        6%           4%              2.6%            1.0%

4. Improve
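The arithmetic behind reading such a table can be sketched in a few lines of Python (the function and key names here are illustrative, not from the lecture):

```python
def decompose_errors(goal, train, val, test):
    """Break the gap between goal and test error into the three buckets
    used in this lecture. Assumes train and val are drawn from the same
    distribution; otherwise the val - train gap also contains shift."""
    return {
        "underfitting (avoidable bias)": train - goal,
        "overfitting (variance)": val - train,
        "val/test mismatch": test - val,
    }

# Final column of the table: train 0.6%, val 0.9%, test 1.0%
gaps = decompose_errors(goal=0.01, train=0.006, val=0.009, test=0.010)
```

For the final column, variance (0.3%) is the largest remaining term and the bias term is already negative (train error below the goal), consistent with the slide's conclusion that the tuned model meets the 1% target.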

Page 114

Prioritizing improvements (i.e., applied bias-variance)

Steps
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)

4. Improve

Page 115

Addressing distribution shift

Try first
• A. Analyze test-val set errors & collect more training data to compensate
• B. Analyze test-val set errors & synthesize more training data to compensate

Try later
• C. Apply domain adaptation techniques to training & test distributions

4. Improve

Page 116

Error analysis

[Images: test-val set errors vs. train-val set errors (no pedestrian detected)]

4. Improve

Page 117

Error analysis: Error type 1: hard-to-see pedestrians

[Images: test-val set errors vs. train-val set errors (no pedestrian detected)]

4. Improve

Page 118

Error analysis: Error type 2: reflections

[Images: test-val set errors vs. train-val set errors (no pedestrian detected)]

4. Improve

Page 119

Error analysis: Error type 3 (test-val only): night scenes

[Images: test-val set errors vs. train-val set errors (no pedestrian detected)]

4. Improve

Page 120

Error analysis

Error type 1: Hard-to-see pedestrians. Error %: 0.1% (train-val), 0.1% (test-val). Potential solutions: better sensors. Priority: Low.

Error type 2: Reflections. Error %: 0.3% (train-val), 0.3% (test-val). Potential solutions: collect more data with reflections; add synthetic reflections to the train set; try to remove reflections with pre-processing; better sensors. Priority: Medium.

Error type 3: Nighttime scenes. Error %: 0.1% (train-val), 1% (test-val). Potential solutions: collect more data at night; synthetically darken training images; simulate night-time data; use domain adaptation. Priority: High.

4. Improve
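One way to turn such a table into priorities (a hypothetical scoring rule, not from the lecture) is to rank error types by their test-val error rate, which bounds the gain from fixing them, and to flag types whose test-val rate far exceeds their train-val rate as distribution-shift-driven:

```python
def prioritize(error_types, shift_ratio=2.0):
    """Rank error types by test-val error rate (potential gain), and flag
    likely distribution shift when the test-val rate is much higher than
    the train-val rate. `shift_ratio` is an arbitrary threshold."""
    ranked = sorted(error_types, key=lambda e: e["test_val"], reverse=True)
    for e in ranked:
        e["distribution_shift"] = e["test_val"] > shift_ratio * e["train_val"]
    return ranked

errors = [
    {"name": "hard-to-see pedestrians", "train_val": 0.001, "test_val": 0.001},
    {"name": "reflections", "train_val": 0.003, "test_val": 0.003},
    {"name": "nighttime scenes", "train_val": 0.001, "test_val": 0.010},
]
ranked = prioritize(errors)
```

On the running example, this recovers the slide's conclusion: nighttime scenes come out on top and are flagged as a distribution-shift problem.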

Page 121

Domain adaptation

What is it?
Techniques to train on a “source” distribution and generalize to another “target” distribution, using only unlabeled data or limited labeled data from the target.

When should you consider using it?
• Access to labeled data from the test distribution is limited
• Access to relatively similar data is plentiful

4. Improve

Page 122

Types of domain adaptation

Supervised: you have limited labeled data from the target domain. Example techniques: fine-tuning a pre-trained model; adding target data to the train set.

Unsupervised: you have lots of unlabeled data from the target domain. Example techniques: Correlation Alignment (CORAL); domain confusion; CycleGAN.

4. Improve
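To give a flavor of the unsupervised techniques listed above, here is a minimal 1-D sketch in the spirit of Correlation Alignment (CORAL). The real method aligns full feature covariance matrices; this scalar toy version only matches the mean and standard deviation of the unlabeled target features:

```python
import statistics

def coral_1d(source_feats, target_feats):
    """Shift and rescale source features so their first two moments match
    the (unlabeled) target features. 1-D toy version of CORAL."""
    s_mu, s_sd = statistics.mean(source_feats), statistics.pstdev(source_feats)
    t_mu, t_sd = statistics.mean(target_feats), statistics.pstdev(target_feats)
    scale = t_sd / s_sd if s_sd > 0 else 1.0
    return [(x - s_mu) * scale + t_mu for x in source_feats]

aligned = coral_1d([0.0, 1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0])
```

The idea is that a model trained on the re-colored source features sees statistics closer to what it will encounter at test time, without needing any target labels.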

Page 123

Prioritizing improvements (i.e., applied bias-variance)

Steps
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)

4. Improve

Page 124

Rebalancing datasets

• If (test-)val performance looks significantly better than test performance, you have overfit to the val set
• This happens with small val sets or lots of hyperparameter tuning
• When it does, recollect the val data

4. Improve
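A crude check for this condition can be automated (the threshold here is a hypothetical choice, not from the lecture):

```python
def overfit_to_val(val_err, test_err, rel_tol=0.25):
    """Flag likely val-set overfitting: test error more than `rel_tol`
    relatively worse than val error suggests the val set has been 'used up'
    by repeated tuning and should be recollected."""
    return test_err > val_err * (1.0 + rel_tol)
```

For example, 2% val error with 4% test error trips the flag, while 2% val with 2.1% test does not.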

Page 125

Strategy for DL troubleshooting

Start simple → Implement & debug → Evaluate → if the model meets requirements, stop; otherwise → Tune hyperparameters / Improve model & data → Evaluate again

Page 126

Hyperparameter optimization

Model & optimizer choices?
Network: ResNet
• How many layers?
• Weight initialization?
• Kernel size?
• Etc.
Optimizer: Adam
• Batch size?
• Learning rate?
• beta1, beta2, epsilon?
Regularization
• …

Running example: 0 (no pedestrian) vs. 1 (pedestrian); goal: 99% classification accuracy.

5. Tune

Page 127

Which hyperparameters to tune?

Choosing hyperparameters
• Models are more sensitive to some hyperparameters than others
• Sensitivity depends on the choice of model
• The table below gives rules of thumb (only)
• Sensitivity is relative to default values! (e.g., if you are using all-zeros weight initialization or vanilla SGD, changing to sensible defaults will make a big difference)

Hyperparameter                               Approximate sensitivity
Learning rate                                High
Learning rate schedule                       High
Optimizer choice                             Low
Other optimizer params (e.g., Adam beta1)    Low
Batch size                                   Low
Weight initialization                        Medium
Loss function                                High
Model depth                                  Medium
Layer size                                   High
Layer params (e.g., kernel size)             Medium
Weight of regularization                     Medium
Nonlinearity                                 Low

5. Tune

Page 128

Method 1: manual hyperparameter optimization

How it works
• Understand the algorithm (e.g., a higher learning rate means faster but less stable training)
• Train & evaluate the model
• Guess a better hyperparameter value & re-evaluate
• Can be combined with other methods (e.g., manually select parameter ranges to optimize over)

Advantages
• For a skilled practitioner, may require the least computation to get a good result

Disadvantages
• Requires detailed understanding of the algorithm
• Time-consuming

5. Tune

Page 129

Method 2: grid search

[Figure: evenly spaced grid over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate)]

How it works: choose a few values for each hyperparameter and train on every cross-combination.

Advantages
• Super simple to implement
• Can produce good results

Disadvantages
• Not very efficient: need to train on all cross-combinations of hyperparameters
• May require prior knowledge about parameters to get good results

5. Tune
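The multiplicative cost of the cross-combinations is easy to see in code. A self-contained sketch, where the `train_eval` callback stands in for a real train-and-validate run:

```python
from itertools import product

def grid_search(train_eval, grid):
    """Try every cross-combination of the listed values; the number of runs
    multiplies with each added hyperparameter (here 3 x 4 = 12 runs)."""
    best_score, best_cfg = float("-inf"), None
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_eval(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# Toy objective that prefers lr = 1e-3 and batch size 64
toy = lambda cfg: -abs(cfg["lr"] - 1e-3) - abs(cfg["batch_size"] - 64) / 1000
cfg, score = grid_search(toy, {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128, 256]})
```

Adding one more hyperparameter with 5 candidate values would multiply the run count from 12 to 60, which is why grid search scales poorly.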

Page 130

Method 3: random search

[Figure: random samples over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate)]

How it works: sample each hyperparameter independently at random from a chosen range instead of from a fixed grid.

Advantages
• Easy to implement
• Often produces better results than grid search

Disadvantages
• Not very interpretable
• May require prior knowledge about parameters to get good results

5. Tune
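A minimal random-search sketch follows; note the learning rate is drawn log-uniformly, since its useful values span orders of magnitude (the ranges and the `train_eval` callback are placeholders):

```python
import random

def sample_config(rng):
    """Draw one random configuration; learning rate is sampled
    log-uniformly, batch size from a small discrete set."""
    return {
        "lr": 10 ** rng.uniform(-5, -1),
        "batch_size": rng.choice([32, 64, 128, 256]),
    }

def random_search(train_eval, n_trials=20, seed=0):
    rng = random.Random(seed)
    best = max(
        ((train_eval(cfg), cfg) for cfg in (sample_config(rng) for _ in range(n_trials))),
        key=lambda t: t[0],
    )
    return best[1], best[0]

cfg, score = random_search(lambda c: -abs(c["lr"] - 1e-3), n_trials=50, seed=1)
```

Unlike grid search, every trial probes a fresh value of each hyperparameter, which is why random search tends to find good settings of the sensitive parameters faster.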

Page 131

Method 4: coarse-to-fine

[Figure: coarse random search over Hyperparameter 1 (e.g., batch size) vs. Hyperparameter 2 (e.g., learning rate)]

How it works: run a coarse random search over the full range, identify the best performers, narrow the search range around them, and repeat.

5. Tune


Page 135

Method 4: coarse-to-fine

[Figure: the search range repeatedly narrowed around the best-performing region]

Advantages
• Can narrow in on very high-performing hyperparameters
• Most used method in practice

Disadvantages
• Somewhat manual process

5. Tune
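The narrow-and-repeat loop can be sketched for a single scalar hyperparameter (e.g., log10 learning rate); the shrink factor and round counts below are illustrative choices, not from the lecture:

```python
import random

def coarse_to_fine(train_eval, lo, hi, rounds=3, samples=10, shrink=0.3, seed=0):
    """Coarse-to-fine random search over one scalar hyperparameter:
    sample uniformly, then repeatedly shrink the search range around
    the best point found so far."""
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    for _ in range(rounds):
        for _ in range(samples):
            x = rng.uniform(lo, hi)
            score = train_eval(x)
            if score > best_score:
                best_x, best_score = x, score
        # Narrow the range around the current best performer
        half_width = (hi - lo) * shrink / 2
        lo, hi = best_x - half_width, best_x + half_width
    return best_x, best_score

# Toy objective with its optimum at x = -3
best_x, best_score = coarse_to_fine(lambda x: -(x + 3.0) ** 2, lo=-5.0, hi=-1.0)
```

In practice each round is a full random-search sweep, and the "somewhat manual" part is deciding how aggressively to shrink and when to stop.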

Page 136

Method 5: Bayesian hyperparameter optimization

How it works (at a high level)
• Start with a prior estimate of parameter distributions
• Maintain a probabilistic model of the relationship between hyperparameter values and model performance
• Alternate between:
  • training with the hyperparameter values that maximize the expected improvement, and
  • using the training results to update the probabilistic model

Advantages
• Generally the most efficient hands-off way to choose hyperparameters

Disadvantages
• Difficult to implement from scratch
• Can be hard to integrate with off-the-shelf tools

To learn more, see: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f

5. Tune
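The propose/evaluate/update loop can be illustrated with a toy that mimics the *structure* only; this is emphatically not a faithful Gaussian-process implementation. The "surrogate" predicts a candidate's score as that of its nearest observed point, and the acquisition adds an exploration bonus that grows with distance from the observations:

```python
import random

def toy_surrogate_search(train_eval, lo, hi, n_init=3, n_iter=10, seed=0):
    """Toy illustration of the Bayesian-optimization loop: propose the
    candidate that maximizes a crude acquisition function, evaluate it,
    and fold the result back into the set of observations."""
    rng = random.Random(seed)
    obs = [(x, train_eval(x)) for x in (rng.uniform(lo, hi) for _ in range(n_init))]

    def acquisition(x):
        dist, nearest_y = min((abs(x - xo), yo) for xo, yo in obs)
        return nearest_y + dist  # predicted score + uncertainty bonus

    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(100)]
        x_next = max(candidates, key=acquisition)
        obs.append((x_next, train_eval(x_next)))  # update the 'model'
    return max(obs, key=lambda p: p[1])

# Toy objective with its optimum at x = 2
best_x, best_y = toy_surrogate_search(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)
```

Real implementations replace the nearest-neighbor surrogate with a Gaussian process or tree-structured estimator and the distance bonus with a principled expected-improvement calculation.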

Page 137

More on tools to do this automatically in the infrastructure & tooling lecture!

5. Tune

Page 138

Summary of how to optimize hyperparams

• Use coarse-to-fine random searches
• Consider Bayesian hyperparameter optimization solutions as your codebase matures

5. Tune

Page 139

Conclusion

Page 140

Conclusion

• DL debugging is hard due to many competing sources of error
• To train bug-free DL models, we treat building our model as an iterative process
• The following steps can make the process easier and catch errors as early as possible

Page 141

How to build bug-free DL models

Overview: Start simple → Implement & debug → Evaluate → Tune hyperparameters → Improve model/data

• Choose the simplest model & data possible (e.g., LeNet on a subset of your data)
• Once the model runs, overfit a single batch & reproduce a known result
• Apply the bias-variance decomposition to decide what to do next
• Use coarse-to-fine random searches
• Make your model bigger if you underfit; add data or regularize if you overfit

Page 142

Where to go to learn more

• Andrew Ng’s book Machine Learning Yearning (http://www.mlyearning.org/)
• This Twitter thread: https://twitter.com/karpathy/status/1013244313327681536
• This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/