Lecture: Introduction to Deep Learning
vision.stanford.edu/.../files/19_deep_learning.pdf

Page 1:

Lecture: Introduction to Deep Learning

Juan Carlos Niebles and Ranjay Krishna
Stanford Vision and Learning Lab

6 Dec 2016

Slides adapted from Justin Johnson

Page 2:

So far this quarter

Edge detection

RANSAC

SIFT

K-Means

Mean-shift

Linear classifiers

Image features

PCA / Eigenfaces


Page 3:

Current Research

ECCV 2016 (photo credit: Justin Johnson)

Page 4:

Current Research

ECCV 2016 (photo credit: Justin Johnson)

Convolutional neural networks

Recurrent neural networks

Autoencoders

Fully convolutional

networks

Multilayer perceptrons

What are these things?

Why are they important?


Page 5:

Deep Learning

● Learning hierarchical representations from data
● End-to-end learning: raw inputs to predictions
● Can use a small set of simple tools to solve many problems
● Has led to rapid progress on many problems
● (Very loosely!) Inspired by the brain
● Today: a 1-hour introduction to deep learning fundamentals
● To learn more, take CS 231n in the Spring

Page 6:

Deep learning for many different problems

Page 7:

Page 8:

Slide credit: CS 231n, Lecture 1, 2016

Page 9:

Slide credit: Kaiming He, ICCV 2015

Page 10:

Slide credit: Kaiming He, ICCV 2015

Human benchmark (≈5.1% error) from Russakovsky et al, “ImageNet Large Scale Visual Recognition Challenge”, IJCV 2015

Page 11:

Slide credit: Kaiming He, ICCV 2015

Page 12:

Deep Learning for Object Detection

Slide credit: Ross Girshick, ICCV 2015

Page 13:

Deep Learning for Object Detection
Current SOTA: 85.6 mAP (He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016)

Slide credit: Ross Girshick, ICCV 2015

Page 14:

Object segmentation

Figure credit: Dai, He, and Sun, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, CVPR 2016

Page 15:

Pose Estimation

Figure credit: Cao et al, “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, arXiv 2016

Page 16:

Deep Learning for Image Captioning

Figure credit: Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015

Page 17:

Dense Image Captioning

Figure credit: Johnson*, Karpathy*, and Fei-Fei, “DenseCap: Fully Convolutional Localization Networks for Dense Captioning”, CVPR 2016

Page 18:

Visual Question Answering

Figure credit: Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015
Figure credit: Zhu et al, “Visual7W: Grounded Question Answering in Images”, CVPR 2016

Page 19:

Image Super-Resolution

Figure credit: Ledig et al, “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, arXiv 2016

Page 20:

Generating Art

Figure credit: Gatys, Ecker, and Bethge, “Image Style Transfer using Convolutional Neural Networks”, CVPR 2016
Figure credit: Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016, https://github.com/jcjohnson/fast-neural-style
Figure credit: Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural Networks”, https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

Page 21:

Outside Computer Vision

Machine Translation
Figure credit: Bahdanau, Cho, and Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate”, ICLR 2015

Speech Recognition
Figure credit: Amodei et al, “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin”, arXiv 2015

Speech Synthesis
Figure credit: van den Oord et al, “WaveNet: A Generative Model for Raw Audio”, arXiv 2016, https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Text Synthesis
Figure credit: Sutskever, Martens, and Hinton, “Generating Text with Recurrent Neural Networks”, ICML 2011

Page 22:

Deep Reinforcement Learning

Playing Atari games
Mnih et al, “Human-level control through deep reinforcement learning”, Nature 2015

AlphaGo beats Lee Sedol
Silver et al, “Mastering the game of Go with deep neural networks and tree search”, Nature 2016
Image credit: http://www.newyorker.com/tech/elements/alphago-lee-sedol-and-the-reassuring-future-of-humans-and-machines

Page 23:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 24:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 25:

Recall: Image Classification

Write a function that maps images to labels (also called object recognition)

No obvious way to implement this function!

Dataset: ETH-80, by B. Leibe; Slide credit: L. Lazebnik
Slide credit: CS 231n, Lecture 2, 2016

Page 26:

Recall: Image Classification

Example training set

Slide credit: CS 231n, Lecture 2, 2016

Page 27:

Recall: Image Classification

Figure credit: D. Hoiem, L. Lazebnik

Page 28:

Recall: Image Classification

Features remain fixed; classifier is learned from data

Figure credit: D. Hoiem, L. Lazebnik

Page 29:

Recall: Image Classification

Features remain fixed; classifier is learned from data

Problem: How do we know which features to use? We may need different features for each problem!

Figure credit: D. Hoiem, L. Lazebnik

Page 30:

Recall: Image Classification

Features remain fixed; classifier is learned from data

Problem: How do we know which features to use? We may need different features for each problem!

Solution: Learn the features jointly with the classifier!

Figure credit: D. Hoiem, L. Lazebnik

Page 31:

Image Classification: Feature Learning

Training: training images + training labels → learned model (learned features + learned classifier)
Prediction: new image → learned features → learned classifier → predicted label

Page 32:

Image Classification: Deep Learning

Training: training images + training labels → learned model
Learned model: low-level features → mid-level features → high-level features → classifier

Model is deep: it has many layers
Models can learn hierarchical features

Page 33:

Image Classification: Deep Learning

Training: training images + training labels → learned model
Learned model: low-level features → mid-level features → high-level features → classifier

Model is deep: it has many layers
Models can learn hierarchical features

Figure credit: Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014

Page 34:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 35:

Supervised Learning in 3 easy steps
How to learn models from data

Step 1: Define Model

Page 36:

Supervised Learning in 3 easy steps
How to learn models from data

Step 1: Define Model. A model structure f maps input data x (image pixels) and model weights w to a predicted output ŷ (image label): ŷ = f(x; w)

Page 37:

Supervised learning

Page 38:

Supervised Learning in 3 easy steps
How to learn models from data

Step 1: Define Model. Predicted output ŷ = f(x; w), with input data x (image pixels) and model weights w

Step 2: Collect data. Pairs of (training input, true output)

Page 39:

Supervised Learning in 3 easy steps
How to learn models from data

Step 1: Define Model. Predicted output ŷ = f(x; w), with input data x (image pixels) and model weights w

Step 2: Collect data. Pairs of (training input, true output)

Step 3: Learn the model

Page 40:

Supervised Learning in 3 easy steps
How to learn models from data

Step 1: Define Model. Predicted output ŷ = f(x; w), with input data x (image pixels) and model weights w

Step 2: Collect data. Pairs of (training input, true output)

Step 3: Learn the model. Find the learned weights by minimizing the average loss over the training set

Loss function: measures “badness” of a prediction (predicted output vs. true output)
Regularizer: penalizes complex models
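
The objective the three steps describe has the standard form (a LaTeX reconstruction; N, L, R, and λ are the usual symbols for dataset size, loss, regularizer, and regularization strength, not taken from the slide):

w^* = \arg\min_w \; \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i; w),\, y_i\big) \;+\; \lambda\, R(w)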

Page 41:

Supervised Learning: Linear Regression
Sometimes called “Ridge Regression” when used with a regularizer

Learning Problem:
Input and output are vectors
Model is just a matrix multiply
Loss is Euclidean distance
Regularizer is the Frobenius norm of the matrix (sum of squares of entries)
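
In symbols, a sketch consistent with the slide's description (W denotes the weight matrix, λ the regularization strength):

f(x) = Wx, \qquad
\min_W \; \frac{1}{N} \sum_{i=1}^{N} \lVert W x_i - y_i \rVert_2^2 \;+\; \lambda \lVert W \rVert_F^2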

Page 42:

Supervised learning

Page 43:

Supervised Learning: Linear Regression
Sometimes called “Ridge Regression” when used with a regularizer

Learning Problem:
Input and output are vectors
Model is just a matrix multiply
Loss is Euclidean distance
Regularizer is the Frobenius norm of the matrix (sum of squares of entries)

Page 44:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
New Model: model is two matrix multiplies

Page 45:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
New Model: model is two matrix multiplies

Question: Is the new model “more powerful” than Linear Regression? Can it represent any functions that Linear Regression cannot?

Page 46:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
New Model: model is two matrix multiplies

Question: Is the new model “more powerful” than Linear Regression? Can it represent any functions that Linear Regression cannot?

Answer: NO! We can write W = W2 W1, since W2 (W1 x) = (W2 W1) x = W x, and recover Linear Regression

Page 47:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
Neural Network: model is two matrix multiplies, with an elementwise nonlinearity

Page 48:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
Neural Network: model is two matrix multiplies, with an elementwise nonlinearity

Common nonlinearities: Rectified Linear (ReLU), Logistic Sigmoid

Page 49:

Supervised Learning: Neural Network
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Input and output are vectors
Loss is Euclidean distance

Linear Regression: model is just a matrix multiply
Neural Network: model is two matrix multiplies, with an elementwise nonlinearity

Common nonlinearities: Rectified Linear (ReLU), Logistic Sigmoid

Neural Network is more powerful than Linear Regression!
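
Written out (standard definitions of the two nonlinearities, and the two-matrix-multiply model with ReLU in between; the symbols W1 and W2 are assumptions):

\mathrm{ReLU}(z) = \max(0, z), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
f(x) = W_2 \,\max(0,\, W_1 x)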

Page 50:

Neural Networks with many layers
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Two Layer network

Page 51:

Neural Networks with many layers
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Two Layer network
Three Layer network

Page 52:

Neural Networks with many layers
Sometimes called “Fully-Connected Network” or “Multilayer Perceptron”

Two Layer network
Three Layer network

Hidden layers are learned feature representations of the input!
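
The per-layer equations are images on the slides and didn't survive extraction; with ReLU as the nonlinearity they take the usual form:

\text{Two layers:}\;\; f(x) = W_2 \max(0,\, W_1 x) \qquad
\text{Three layers:}\;\; f(x) = W_3 \max(0,\, W_2 \max(0,\, W_1 x))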

Page 53:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 54:

How to find best weights?

Page 55:

How to find best weights?

Slide credit: CS 231n, Lecture 3

Page 56:

How to find best weights?

Idea #1: Closed Form Solution
Works for some models (linear regression) but in general very hard (even for a one-layer net)

Slide credit: CS 231n, Lecture 3

Page 57:

How to find best weights?

Idea #1: Closed Form Solution
Works for some models (linear regression) but in general very hard (even for a one-layer net)

Idea #2: Random Search
Try a bunch of random values for w, keep the best one
Extremely inefficient in high dimensions

Slide credit: CS 231n, Lecture 3

Page 58:

How to find best weights?

Idea #1: Closed Form Solution
Works for some models (linear regression) but in general very hard (even for a one-layer net)

Idea #2: Random Search
Try a bunch of random values for w, keep the best one
Extremely inefficient in high dimensions

Idea #3: Go downhill
Start somewhere random, take tiny steps downhill, and repeat
Good idea!

Slide credit: CS 231n, Lecture 3

Page 59:

Which way is downhill?

Recall from single-variable calculus:
If w is a scalar, the derivative tells us how much g will change if w changes a little bit

Page 60:

Which way is downhill?

Recall from single-variable calculus:
If w is a scalar, the derivative tells us how much g will change if w changes a little bit

And multivariable calculus:
If w is a vector, then we can take partial derivatives with respect to each component
The gradient is the vector of all partial derivatives; it has the same shape as w

Page 61:

Which way is downhill?

Recall from single-variable calculus:
If w is a scalar, the derivative tells us how much g will change if w changes a little bit

And multivariable calculus:
If w is a vector, then we can take partial derivatives with respect to each component
The gradient is the vector of all partial derivatives; it has the same shape as w

Useful fact: The gradient of g at w gives the direction of steepest increase
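
The formulas themselves didn't survive extraction; the standard definitions being recalled are:

\frac{dg}{dw} = \lim_{h \to 0} \frac{g(w+h) - g(w)}{h}, \qquad
\nabla_w\, g = \left( \frac{\partial g}{\partial w_1}, \ldots, \frac{\partial g}{\partial w_n} \right)

so downhill, the direction of steepest decrease, is -∇_w g.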

Page 62:

Gradient descent
The simple algorithm behind all deep learning

Initialize w randomly
While true:
    Compute gradient at current point
    Move downhill a little bit: w ← w - α ∇g(w)

Page 63:

Gradient descent
The simple algorithm behind all deep learning

Initialize w randomly
While true:
    Compute gradient at current point
    Move downhill a little bit: w ← w - α ∇g(w)

Learning rate α: how big each step should be

Page 64:

Gradient descent
The simple algorithm behind all deep learning

Initialize w randomly
While true:
    Compute gradient at current point
    Move downhill a little bit: w ← w - α ∇g(w)

Learning rate α: how big each step should be

In practice we estimate the gradient using a small minibatch of data (Stochastic gradient descent)

Page 65:

Gradient descent
The simple algorithm behind all deep learning

Initialize w randomly
While true:
    Compute gradient at current point
    Move downhill a little bit: w ← w - α ∇g(w)

Learning rate α: how big each step should be

In practice we estimate the gradient using a small minibatch of data (Stochastic gradient descent)

The update rule is the formula for updating the weights at each iteration; in practice people use fancier rules (momentum, RMSProp, Adam, etc.)
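
A minimal numpy sketch of this loop; the function grad_loss, the hyperparameter values, and the array-shaped data are assumptions for illustration, not from the slides:

import numpy as np

def sgd(w, data, grad_loss, learning_rate=1e-3, steps=1000, batch_size=64):
    """Vanilla minibatch stochastic gradient descent (illustrative sketch).

    data: numpy array of training examples
    grad_loss(w, batch): returns the gradient of the average loss at w
    """
    for _ in range(steps):
        # Estimate the gradient using a small random minibatch of data
        batch = data[np.random.choice(len(data), batch_size)]
        grad = grad_loss(w, batch)
        # Move downhill a little bit: w <- w - alpha * gradient
        w = w - learning_rate * grad
    return w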

Page 66:

Gradient descent
The simple algorithm behind all deep learning

Image credit: Alec Radford

Page 67:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 68:

How to compute gradients?

Work it out on paper?
Doable for simple models (e.g. Linear Regression)

Page 69:

How to compute gradients?

Work it out on paper?
Gets hairy for more complex models (e.g. a three-layer network)

Page 70:

Backpropagation

Computational graph
e.g. x = -2, y = 5, z = -4

Slide credit: CS 231n, Lecture 4

Pages 71-81:

Backpropagation

Computational graph
e.g. x = -2, y = 5, z = -4
Want: ∂f/∂x, ∂f/∂y, ∂f/∂z

(The backward pass is filled in one node at a time across these slides.)

Chain rule: downstream gradient = local gradient × upstream gradient

Slide credit: CS 231n, Lecture 4
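
The function being differentiated is an image on the slides; in the CS 231n lecture these slides credit, the example is f(x, y, z) = (x + y) · z. Under that assumption, the worked pass is:

q = x + y = 3, \qquad f = q z = -12

\frac{\partial f}{\partial z} = q = 3, \qquad
\frac{\partial f}{\partial q} = z = -4, \qquad
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial x} = -4, \qquad
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial y} = -4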

Page 82:

Backpropagation

Computational graph nodes can be vectors or matrices or n-dimensional tensors

Example graph: x multiplied by W1, then by W2, producing a prediction that is compared against y

Page 83:

Backpropagation

Computational graph nodes can be vectors or matrices or n-dimensional tensors

Example graph: x multiplied by W1, then by W2, producing a prediction that is compared against y

Forward pass: run graph “forward” to compute the loss
Backward pass: run graph “backward” to compute gradients with respect to the loss
Easily compute gradients for big, complex models!

Page 84:

Backpropagation

Computational graph nodes can be vectors or matrices or n-dimensional tensors

Example graph: x multiplied by W1, then by W2, producing a prediction that is compared against y

Forward pass: run graph “forward” to compute the loss
Backward pass: run graph “backward” to compute gradients with respect to the loss
Easily compute gradients for big, complex models!

Tensors flow along edges in the graph

Page 85:

A two-layer neural network in 25 lines of code

Page 86:

A two-layer neural network in 25 lines of code

Randomly initialize weights

Page 87:

A two-layer neural network in 25 lines of code

Get a batch of (random) data

Page 88:

A two-layer neural network in 25 lines of code

Forward pass: compute loss
ReLU nonlinearity
Loss is Euclidean distance

Page 89:

A two-layer neural network in 25 lines of code

Backward pass: compute gradients

Page 90:

A two-layer neural network in 25 lines of code

Update weights

Page 91:

A two-layer neural network in 25 lines of code

When you run this code: loss goes down!
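
The 25 lines themselves are an image on the slides and didn't survive extraction. A numpy sketch that follows the annotations on pages 86-91; the layer sizes and learning rate are illustrative assumptions:

import numpy as np

# Sizes: batch N, input D_in, hidden H, output D_out (illustrative values)
N, D_in, H, D_out = 64, 1000, 100, 10

# Get a batch of (random) data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute loss (ReLU nonlinearity, Euclidean loss)
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: compute gradients of the loss w.r.t. w2 and w1
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0              # gradient of the ReLU
    grad_w1 = x.T.dot(grad_h)

    # Update weights: as t increases, loss goes down
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2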

Page 92:

Outline

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks

Page 93:

Convolution Layer

32x32x3 image (width 32, height 32, depth 3)

Slide credit: CS 231n, Lecture 7

Page 94:

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”

Slide credit: CS 231n, Lecture 7

Page 95:

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”

Filters always extend the full depth of the input volume

Slide credit: CS 231n, Lecture 7

Page 96:

Convolution Layer

32x32x3 image, 5x5x3 filter

Each output is 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product, plus a bias)

Slide credit: CS 231n, Lecture 7

Page 97:

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve (slide) over all spatial locations, producing a 28x28x1 activation map

Translation Invariance!

Slide credit: CS 231n, Lecture 7
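
Why 28: with stride S = 1 and no zero-padding (which is what the slide's numbers imply), each spatial output size follows the standard formula:

W_{\text{out}} = \frac{W_{\text{in}} - F}{S} + 1 = \frac{32 - 5}{1} + 1 = 28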

Page 98:

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve (slide) over all spatial locations, producing a 28x28x1 activation map

Consider a second, green filter: it produces its own 28x28x1 activation map

Slide credit: CS 231n, Lecture 7

Page 99:

Convolution Layer

For example, if we had 6 5x5 filters, we’ll get 6 separate 28x28 activation maps

We stack these up to get a “new image” of size 28x28x6!

Slide credit: CS 231n, Lecture 7

Page 100:

Convolutional Neural Networks (CNN)

A CNN is a sequence of convolution layers and nonlinearities:

32x32x3 image → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → …

Slide credit: CS 231n, Lecture 7
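
A minimal numpy sketch of this pipeline, with explicit loops for clarity and assuming stride 1 and no padding; the function and variable names are illustrative, not from the slides:

import numpy as np

def conv_layer(image, filters, biases):
    """Valid convolution, stride 1: image (H, W, C), filters (K, F, F, C) -> (H-F+1, W-F+1, K)."""
    H, W, C = image.shape
    K, F, _, _ = filters.shape
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                # Dot product between filter k and a small FxFxC chunk of the image, plus bias
                out[i, j, k] = np.sum(image[i:i+F, j:j+F, :] * filters[k]) + biases[k]
    return out

x = np.random.randn(32, 32, 3)                                                  # 32x32x3 image
a1 = np.maximum(conv_layer(x, np.random.randn(6, 5, 5, 3), np.zeros(6)), 0)     # CONV, ReLU -> (28, 28, 6)
a2 = np.maximum(conv_layer(a1, np.random.randn(10, 5, 5, 6), np.zeros(10)), 0)  # CONV, ReLU -> (24, 24, 10)
print(a1.shape, a2.shape)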

Page 101:

Case Study: GoogLeNet [Szegedy et al., 2014]

Inception module
ILSVRC 2014 winner (6.7% top-5 error)

Slide credit: CS 231n, Lecture 7

Page 102:

Recap

● Applications
● Motivation
● Supervised learning
● Gradient descent
● Backpropagation
● Convolutional Neural Networks