Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016


Page 1:

Lecture 17: Neural Networks and Deep Learning

Jack Lanchantin
Dr. Yanjun Qi

UVA CS 6316 / CS 4501-004 Machine Learning
Fall 2016

Page 2:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 3:

(Roadmap diagram: input x = (x1, x2, x3) feeding a multi-layer network with weights W1, W2, w3 and output ŷ)

Page 4:

Logistic Regression

Sigmoid function (aka logistic, logit, "S", soft-step):

P(Y=1|x) = e^(wx+b) / (1 + e^(wx+b))
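The sigmoid above can be computed directly; a minimal NumPy sketch (the weight and input values are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^(-z))
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-z)),
                    np.exp(z) / (1.0 + np.exp(z)))

# Logistic regression: P(Y=1|x) = sigmoid(w.x + b)
w = np.array([0.5, -0.25, 0.1])
b = 0.0
x = np.array([1.0, 2.0, 3.0])
p = sigmoid(w @ x + b)   # a probability strictly between 0 and 1
```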

Page 5:

Expanded Logistic Regression

z = wᵀ·x + b        (x: p×1, wᵀ: 1×p, z: 1×1; here p = 3)
ŷ = sigmoid(z) = e^z / (1 + e^z) = P(Y=1|x,w)

(Diagram: input x = (x1, x2, x3) is multiplied by weights w1, w2, w3, combined with bias b1 in a summing function Σ to give z, which passes through the sigmoid function to give ŷ)

Page 6:

"Neuron"

(Same diagram as the previous slide: inputs x1, x2, x3 with weights w1, w2, w3 and bias b1, summed by Σ to give z)

Page 7:

Neurons

(Figure from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)

Page 8:

Neuron

z = wᵀ·x        (x: p×1, wᵀ: 1×p, z: 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)

From here on, we leave out the bias for simplicity.

(Diagram: input x = (x1, x2, x3), weights w1, w2, w3, summing function Σ giving z, sigmoid giving ŷ)

Page 9:

"Block View" of a Neuron

Input x → [ *w ] → z → [ sigmoid ] → output ŷ
(the dot product [ *w ] is a parameterized block)

z = wᵀ·x        (x: p×1, wᵀ: 1×p, z: 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)

Page 10:

Neuron Representation

The linear transformation and nonlinearity together are typically considered a single neuron.

(Diagram: inputs x1, x2, x3 with weights w1, w2, w3 into Σ and a nonlinearity, producing ŷ)

Page 11:

Neuron Representation

The linear transformation and nonlinearity together are typically considered a single neuron.

(Block view: x → [ *w ] → ŷ)

Page 12:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 13:

(Roadmap diagram: input x, weights W1, W2, w3, output ŷ)

Page 14:

1-Layer Neural Network (with 4 neurons)

Input x = (x1, x2, x3) → [ W ] → z → sigmoid → outputs ŷ1, ŷ2, ŷ3, ŷ4

The linear step is now a matrix-vector product (W is a matrix, z is a vector), followed by an element-wise sigmoid; together the four neurons form 1 layer.

Page 15:

1-Layer Neural Network (with 4 neurons)

z = Wᵀ·x        (x: p×1, Wᵀ: d×p, z: d×1; here p = 3 inputs, d = 4 neurons)
ŷ = sigmoid(z) = e^z / (1 + e^z)        (element-wise on vector z; ŷ: d×1)

Page 16:

(Repeats the diagram and equations of the previous slide)

Page 17:

"Block View" of a Neural Network

Input x → [ *W ] → z → [ sigmoid ] → output ŷ

W is now a matrix, and z is now a vector:
z = Wᵀ·x        (x: p×1, Wᵀ: d×p, z: d×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)        (ŷ: d×1)
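The block view above translates almost line-for-line into NumPy; a small sketch using the slide's shapes (p = 3 inputs, d = 4 neurons; the random weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p, d = 3, 4                  # p inputs, d neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(p, d))  # weight matrix, shape p x d
x = np.array([1.0, 2.0, 3.0])

z = W.T @ x                  # d-dimensional pre-activation (d x 1)
y_hat = sigmoid(z)           # element-wise sigmoid, one output per neuron
```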

Page 18:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 19:

(Roadmap diagram: input x, weights W1, W2, w3, output ŷ)

Page 20:

Multi-Layer Neural Network (Multi-Layer Perceptron (MLP) Network)

(Diagram: input x = (x1, x2, x3) → hidden layer (weights W1) → output layer (weights w2) → ŷ)

The weight subscript represents the layer number. This is a 2-layer NN.

Page 21:

Multi-Layer Neural Network (MLP)

(Diagram: input x → 1st hidden layer (W1) → 2nd hidden layer (W2) → output layer (w3) → ŷ)

This is a 3-layer NN.

Page 22:

Multi-Layer Neural Network (MLP)

z1 = W1ᵀ·x        h1 = sigmoid(z1)        (hidden layer 1 output)
z2 = W2ᵀ·h1       h2 = sigmoid(z2)
z3 = w3ᵀ·h2       ŷ = sigmoid(z3)

Page 23:

Multi-Class Output MLP

z1 = W1ᵀ·x        h1 = sigmoid(z1)
z2 = W2ᵀ·h1       h2 = sigmoid(z2)
z3 = W3ᵀ·h2       ŷ = sigmoid(z3)

(The output weights are now a matrix W3, so ŷ is a vector with one entry per class)

Page 24:

"Block View" of MLP

x → [ *W1 ] → z1 → sigmoid → h1 → [ *W2 ] → z2 → sigmoid → h2 → [ *W3 ] → z3 → sigmoid → ŷ
    (1st hidden layer)            (2nd hidden layer)            (output layer)
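The block view can be sketched as a chain of matrix products and sigmoids; the layer sizes below are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Illustrative sizes: 3 inputs -> 4 hidden -> 4 hidden -> 2 outputs
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(4, 2))

x = np.array([1.0, 2.0, 3.0])
h1 = sigmoid(W1.T @ x)       # 1st hidden layer
h2 = sigmoid(W2.T @ h1)      # 2nd hidden layer
y_hat = sigmoid(W3.T @ h2)   # output layer
```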

Page 25:

"Deep" Neural Networks (i.e. > 1 hidden layer)

Researchers have successfully trained object classifiers with as many as 1000 layers.

Page 26:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 27:

(Roadmap diagram: input x, weights W1, W2, w3, output ŷ, loss E(ŷ))

Page 28:

Binary Classification Loss

ŷ = P(y=1|X,W)

E = loss = −log P(Y = y | X = x) = −y log(ŷ) − (1 − y) log(1 − ŷ)

where y is the true output. (This example is for a single sample x.)

Page 29:

Regression Loss

E = loss = (1/2)(y − ŷ)²

where y is the true output.

Page 30:

Multi-Class Classification Loss

(Diagram: input x → network (W1, W2, W3) → z1, z2, z3 → Σ-normalization → outputs ŷ1, ŷ2, ŷ3)

ŷi = P(ŷi = 1 | x), computed by the "softmax" function: a normalizing function which converts each class output to a probability.

E = loss = −Σ_{j=1..K} yj ln ŷj        (here K = 3)

where y is "0" for all except the true class.
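The three losses above can be sketched in NumPy as follows (function names are my own; the max-subtraction inside softmax is a standard numerical-stability trick, not something shown on the slide):

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    # -y log(yhat) - (1-y) log(1-yhat), for a single sample
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

def squared_error(y, y_hat):
    # regression loss (1/2)(y - yhat)^2
    return 0.5 * (y - y_hat) ** 2

def softmax(z):
    e = np.exp(z - z.max())   # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(y_onehot, y_hat):
    # multi-class loss: -sum_j y_j ln(yhat_j), y one-hot
    return -np.sum(y_onehot * np.log(y_hat))

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)                                    # probabilities summing to 1
loss = cross_entropy(np.array([1.0, 0.0, 0.0]), p)
```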

Page 31:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 32:

(Roadmap diagram: input x, weights W1, W2, w3, output ŷ, loss E(ŷ))

Page 33:

Training Neural Networks

How do we learn the optimal weights WL for our task? Gradient descent:

WL(t+1) = WL(t) − η ∂E/∂WL(t)

(LeCun et al., Efficient BackProp, 1998)

But how do we get the gradients of lower layers? Backpropagation!
○ Repeated application of the chain rule of calculus
○ Locally minimizes the objective
○ Requires all "blocks" of the network to be differentiable

Pages 34-46:

Backpropagation Intro

(Worked example of computing gradients on a simple computational graph by repeated application of the chain rule; figures from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)

Tells us: by increasing x by a scale of 1, we decrease f by a scale of 4.

Page 47:

Backpropagation (binary classification example)

Example on a 1-hidden-layer NN for binary classification: input x = (x1, x2, x3), weights W1 and w2, output ŷ = P(y=1|X,W).

Pages 48-62:

Backpropagation (binary classification example)

Block view: x → [ *W1 ] → z1 → sigmoid → h1 → [ *w2 ] → z2 → sigmoid → ŷ

E = loss = −y log(ŷ) − (1 − y) log(1 − ŷ)

Gradient descent to minimize the loss needs ∂E/∂w2 and ∂E/∂W1. How do we get them? Exploit the chain rule!

Output weights:
∂E/∂w2 = (∂E/∂ŷ)(∂ŷ/∂z2)(∂z2/∂w2) = (ŷ − y) h1

Hidden weights, reusing the already computed ∂E/∂z2 = ŷ − y:
∂E/∂h1 = (ŷ − y) w2
∂E/∂z1 = ∂E/∂h1 ⊙ h1 ⊙ (1 − h1)        (element-wise, since sigmoid′(z) = sigmoid(z)(1 − sigmoid(z)))
∂E/∂W1 = x (∂E/∂z1)ᵀ

Page 63:

"Local-ness" of Backpropagation

(Figure: a block f with inputs x and y; activations flow forward, gradients flow backward, and each block only needs its "local gradients". From http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)

Page 64:

(Repeats the figure of the previous slide)
Page 65:

Example: Sigmoid Block

sigmoid(x) = σ(x) = 1 / (1 + e^(−x))

Its local gradient has the convenient form dσ/dx = σ(x)(1 − σ(x)).

Page 66:

Deep Learning = Concatenation of Differentiable Parameterized Layers (linear & nonlinearity functions)

x → [ *W1 ] → z1 → h1 → [ *W2 ] → z2 → h2 → [ *W3 ] → z3 → ŷ
    (1st hidden layer)   (2nd hidden layer)   (output layer)

We want to find the optimal weights W to minimize some loss function E!

Page 67:

Backprop Whiteboard Demo

(Network: inputs x1, x2; hidden units h1, h2; weights w1..w6; biases b1, b2, b3; blocks f1..f4)

z1 = x1·w1 + x2·w3 + b1
z2 = x1·w2 + x2·w4 + b2
h1 = exp(z1) / (1 + exp(z1))
h2 = exp(z2) / (1 + exp(z2))
ŷ = h1·w5 + h2·w6 + b3
E = (y − ŷ)²

w(t+1) = w(t) − η ∂E/∂w(t),    ∂E/∂w = ??
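As a sketch of the whiteboard demo, the forward pass and one gradient (∂E/∂w5 = −2(y − ŷ)·h1, by the chain rule) can be verified numerically; the input and weight values here are made up:

```python
import numpy as np

def forward(x1, x2, w, b, y):
    # Forward pass of the whiteboard network; w = (w1..w6), b = (b1..b3)
    z1 = x1 * w[0] + x2 * w[2] + b[0]
    z2 = x1 * w[1] + x2 * w[3] + b[1]
    h1 = 1 / (1 + np.exp(-z1))
    h2 = 1 / (1 + np.exp(-z2))
    y_hat = h1 * w[4] + h2 * w[5] + b[2]
    return (y - y_hat) ** 2, h1, h2, y_hat

x1, x2, y = 1.0, -2.0, 1.0
w = np.array([0.1, 0.2, -0.3, 0.4, 0.5, -0.6])
b = np.array([0.0, 0.0, 0.0])

E, h1, h2, y_hat = forward(x1, x2, w, b, y)

# Analytic gradient for w5 via the chain rule:
# dE/dw5 = dE/dyhat * dyhat/dw5 = -2 (y - yhat) * h1
dw5 = -2 * (y - y_hat) * h1

# Finite-difference check
eps = 1e-6
wp = w.copy(); wp[4] += eps
Ep, _, _, _ = forward(x1, x2, wp, b, y)
```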

Page 68:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 69:

Nonlinearity Functions (i.e. transfer or activation functions)

(Diagram: inputs x1, x2, x3 multiplied by weights w1, w2, w3, summed with bias b1 by the summing function Σ to give z = x·w + b, then passed through the sigmoid function)

Pages 70-72:

Nonlinearity Functions (aka transfer or activation functions)

(Table of common activation functions, listing name, plot, equation, and derivative w.r.t. x, from https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions; one of them is annotated "usually works best in practice")

Page 73:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 74:

Neural Net Pipeline

(Diagram: x → network with weights W1, W2, w3 → ŷ → loss E)

1. Initialize weights
2. For each batch of input samples S:
   a. Run the network "forward" on S to compute outputs and the loss
   b. Run the network "backward" using the outputs and loss to compute gradients
   c. Update the weights using SGD (or a similar method)
3. Repeat step 2 until the loss converges
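The pipeline above can be sketched end-to-end for the earlier 1-hidden-layer binary classifier; the toy data and hyperparameters are made up, and full-batch gradient descent stands in for SGD:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
# Toy binary classification data (illustrative only)
X = rng.normal(size=(64, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# 1. Initialize weights
W1 = rng.normal(size=(3, 4)) * 0.1
w2 = rng.normal(size=4) * 0.1
lr = 0.5

for epoch in range(200):
    # 2a. Forward: outputs and loss
    h1 = sigmoid(X @ W1)                 # (64, 4)
    y_hat = sigmoid(h1 @ w2)             # (64,)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # 2b. Backward: gradients via the chain rule
    dz2 = (y_hat - y) / len(y)
    dw2 = h1.T @ dz2
    dz1 = np.outer(dz2, w2) * h1 * (1 - h1)
    dW1 = X.T @ dz1
    # 2c. Weight update
    W1 -= lr * dW1
    w2 -= lr * dw2

acc = np.mean((y_hat > 0.5) == (y == 1))   # training accuracy
```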

Page 75:

Non-Convexity of Neural Nets

In very high dimensions, there exist many local minima which are about equally good.

(Pascanu et al., On the saddle point problem for non-convex optimization, 2014)

Page 76:

Building Deep Neural Nets

(Figure: composing differentiable blocks f with inputs x and outputs y; from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)

Page 77:

Building Deep Neural Nets

(Figure: "GoogLeNet" architecture for object classification)

Page 78:

Block Example Implementation

(Figure from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)
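The referenced slide shows example code for implementing a block; a minimal sketch of the forward/backward convention described earlier (the class and method names are my own, not the slide's):

```python
import numpy as np

class MultiplyBlock:
    """A differentiable 'block': forward computes z = W.T @ x;
    backward turns dE/dz into dE/dx and dE/dW using local gradients."""

    def __init__(self, W):
        self.W = W

    def forward(self, x):
        self.x = x                        # cache the activation for backward
        return self.W.T @ x

    def backward(self, dz):
        dx = self.W @ dz                  # gradient w.r.t. the input
        self.dW = np.outer(self.x, dz)    # gradient w.r.t. the parameters
        return dx                         # passed to the previous block

W = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
block = MultiplyBlock(W)
z = block.forward(np.array([1.0, 1.0, 1.0]))
dx = block.backward(np.array([1.0, 0.0]))
```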

Page 79:

Advantage of Neural Nets

As long as the whole model is fully differentiable, we can train it to automatically learn features for us.

Page 80:

Advanced Deep Learning Models: Convolutional Neural Networks & Recurrent Neural Networks

(Most slides from http://cs231n.stanford.edu/)

Page 81:

Convolutional Neural Networks (aka CNNs and ConvNets)

Pages 82-83:

Challenges in Visual Recognition

(Figures illustrating the challenges, not captured in the transcript)

Page 84:

Problems with "Fully Connected Networks" on Images

A 1000×1000 image with 1M hidden units → 10^12 parameters.

Spatial correlation is local! → Connect units locally.

(In a fully connected layer, each neuron corresponds to a specific pixel.)

Page 85:

(Same slide, adding:) How do we deal with multiple dimensions in the input? Length, height, channel (R, G, B).

Pages 86-89:

Convolutional Layer

(Figures, not captured in the transcript)

Pages 90-91:

Convolution

Pages 92-93:

Neuron View of Convolutional Layer

Pages 94-95:

Convolutional Layer

Page 96:

Neuron View of Convolutional Layer

Pages 97-100:

Convolutional Neural Networks

(Figures; see http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/)
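The core operation of a convolutional layer can be sketched naively in a few lines (as in most deep-learning libraries, this is technically cross-correlation; the filter and image values are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image and take
    the dot product of each overlapping patch with the kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # horizontal difference filter
out = conv2d(image, edge)        # responds to horizontal intensity changes
```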

Page 101:

Pooling Layer

Page 102:

Pooling Layer (Max Pooling example)
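Max pooling, as in the slide's example, can be sketched as follows (the 2×2 window and example values are illustrative):

```python
import numpy as np

def max_pool(x, size=2):
    """Max pooling with stride equal to the window size:
    keep the maximum of each non-overlapping size x size window."""
    H, W = x.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 0.],
              [5., 6., 3., 1.]])
pooled = max_pool(x)   # downsamples 4x4 -> 2x2
```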

Page 103:

History of ConvNets

1998: Gradient-based learning applied to document recognition (LeCun, Bottou, Bengio, Haffner)
2012: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, Hinton)

Page 104:

Page 105:

Page 106:

Page 107:

Page 108:

Recurrent Neural Networks

Page 109:

Standard “Feed-Forward” Neural Network

Page 110:

Standard “Feed-Forward” Neural Network

Page 111:

Recurrent Neural Networks (RNNs): RNNs can handle variable-length sequences

Page 112:

Recurrent Neural Networks (RNNs): RNNs can handle variable-length sequences

Page 113:

Recurrent Neural Networks (RNNs)

Traditional “Feed-Forward” Neural Network: input → hidden → output

Recurrent Neural Network: input → hidden (state feeds back at each timestep) → output, predicting a vector at each timestep

Page 114:

Recurrent Neural Networks

(diagram: hidden state h_t computed from previous state h_{t-1} and input x_t)

Page 115:

Recurrent Neural Networks

(diagram: hidden state h_t computed from previous state h_{t-1} and input x_t)

Page 116:

“Vanilla” Recurrent Neural Network

(diagram: h_t = tanh(W_hh · h_{t-1} + W_xh · x_t))
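The vanilla RNN update can be sketched in a few lines of NumPy. This is a toy sketch with random, untrained weights; the `rnn_step` name and the chosen sizes are illustrative assumptions, not from the lecture.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the new hidden state mixes the current
    input with the previous hidden state through a tanh nonlinearity."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
hidden, inp = 4, 3
W_xh = rng.normal(size=(hidden, inp)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(hidden, hidden)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden)

h = np.zeros(hidden)                      # initial hidden state
for x_t in rng.normal(size=(5, inp)):     # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (4,)
```

The same weights are applied at every timestep; only the hidden state changes as the sequence is consumed.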

Page 117:

Character Level Language Model with an RNN

Page 118:

Character Level Language Model with an RNN

Page 119:

Character Level Language Model with an RNN

Page 120:

Character Level Language Model with an RNN

Page 121:

Character Level Language Model with an RNN

Page 122:

Character Level Language Model with an RNN

Page 123:

Character Level Language Model with an RNN
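A character-level language model of the kind these slides walk through can be illustrated end to end in a few lines. Everything here is a hedged toy: the 4-character vocabulary, the random untrained weights, and the variable names are assumptions for illustration; a real model learns the weights with backpropagation through time.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']      # toy vocabulary
V, H = len(vocab), 8
rng = np.random.default_rng(1)
W_xh = rng.normal(size=(H, V)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
W_hy = rng.normal(size=(V, H)) * 0.1

def one_hot(i):
    x = np.zeros(V)
    x[i] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)
for ch in "hell":                 # feed the context one character at a time
    h = np.tanh(W_xh @ one_hot(vocab.index(ch)) + W_hh @ h)
p = softmax(W_hy @ h)             # distribution over the next character
print(dict(zip(vocab, np.round(p, 3))))
```

Sampling a character from `p` and feeding it back in as the next input is how text (Shakespeare, C code) is generated on the following slides.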

Page 124:

Example: Generating Shakespeare with RNNs

Page 125:

Example: Generating C Code with RNNs

Page 126:

Long Short Term Memory Networks (LSTMs)

Recurrent networks suffer from the “vanishing gradient problem”:
● Aren’t able to model long-term dependencies in sequences

(diagram: input → hidden → output)

Page 127:

Long Short Term Memory Networks (LSTMs)

Recurrent networks suffer from the “vanishing gradient problem”:
● Aren’t able to model long-term dependencies in sequences

Use “gating units” to learn when to remember

(diagram: input → gated hidden units → output)
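The gating idea can be sketched as a single LSTM step. This is an illustrative implementation with assumed shapes; packing all four gates into one weight matrix `W`, and their ordering, are conventions chosen here, not prescribed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x_t; h_prev] to the four gate
    pre-activations; the forget/input/output gates decide what to
    erase, write, and expose from the cell state c."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0*H:1*H])        # forget gate
    i = sigmoid(z[1*H:2*H])        # input gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell update
    c = f * c_prev + i * g         # gated cell state: "when to remember"
    h = o * np.tanh(c)             # exposed hidden state
    return h, c

rng = np.random.default_rng(2)
inp, hid = 3, 4
W = rng.normal(size=(4 * hid, inp + hid)) * 0.1
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
for x_t in rng.normal(size=(6, inp)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state `c` is updated additively (scaled by the forget gate) rather than repeatedly squashed through a nonlinearity, gradients flow back over many more timesteps than in a vanilla RNN.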

Page 128:

RNNs and CNNs Together