Source: u.cs.biu.ac.il/~jkeshet/teaching/iml2016/iml2016_tirgul09.pdf

Deep Neural Networks

Tirgul 9 (Recitation 9)


Setup

• Handful of labeled examples, say images of cats with the label “Cat” and images of other things with the label “Not Cat”

• An algorithm “learns” to identify images of cats and, when fed a new image, should produce the correct label.

• Incredibly general setting:
  • Data: symptoms; labels: illnesses
  • Image recognition, automatic caption generation, speech recognition, etc.


Perceptrons: Early Deep Learning Algorithms


• Basic neural network building block: perceptron

• Say we have n points in the plane, labeled ‘0’ and ‘1’. We’re given a new point and we want to guess its label

• Solution: find a separating hyperplane, i.e. pick a line that best separates the labeled data and use that as your classifier.


• Each piece of input data is represented as a vector x = (x1, x2).

• Our function would be: ‘0’ if below the line, ‘1’ if above.

• The decision boundary: f(x) = w · x + b

• Activation function:
  h(x) = 1 if f(x) = w · x + b > 0, and 0 otherwise

*The activation function of a node defines the output of that node given an input.
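A minimal sketch of this decision rule (the function name and the example point/weights below are illustrative, not from the slides):

```python
def perceptron_predict(x, w, b):
    """Step-activation perceptron: h(x) = 1 if w·x + b > 0, else 0."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b   # decision function f(x) = w·x + b
    return 1 if f > 0 else 0

# A point on the positive side of the line w·x + b = 0 gets the label 1.
print(perceptron_predict((2.0, 1.0), w=(1.0, -1.0), b=0.5))   # -> 1
```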


Training the Perceptron

• Feed the perceptron multiple training samples and calculate the output for each of them.

• After each sample, the weights w are adjusted so as to minimize the output error. For y ∈ {0, 1}, the update rule is: w ← w + (y − ŷ) x
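A minimal training sketch built around this update rule (the function name and toy data are illustrative, and the bias is treated as a weight on a constant input of 1, which the slide does not spell out):

```python
def train_perceptron(samples, epochs=10):
    """Perceptron training: predict each sample, then apply w <- w + (y - y_hat) * x."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            # The update is non-zero only when the prediction is wrong (y - y_hat = ±1).
            w = [wi + (y - y_hat) * xi for wi, xi in zip(w, x)]
            b = b + (y - y_hat)   # bias update: treat b as a weight on a constant input 1
    return w, b

# Linearly separable toy data: label 1 roughly when x1 + x2 > 1.
data = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((0.2, 0.1), 0), ((0.9, 0.8), 1)]
print(train_perceptron(data))
```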


Perceptron omnipotent?

• Logic Gate - AND

AND | 0     | 1
0   | FALSE | FALSE
1   | FALSE | TRUE

Perceptron implementing the AND gate (from the figure): two 0/1 inputs, weights 1 and 1, bias −1.5, step-activation output.


• Logic Gate - OR

OR  | 0     | 1
0   | FALSE | TRUE
1   | TRUE  | TRUE

Perceptron implementing the OR gate (from the figure): two 0/1 inputs, weights 1 and 1, bias −0.5.


• Logic Gate - NOT

NOT | 0    | 1
    | TRUE | FALSE

Perceptron implementing the NOT gate (from the figure): a single 0/1 input, weight −1, bias 0.5.
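As a quick check (a sketch, not the slides' code), a step-activation perceptron with the weights shown above reproduces all three truth tables:

```python
def step_perceptron(x, w, b):
    """Return 1 if w·x + b > 0, else 0 (step activation)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def AND(a, b):   # weights (1, 1), bias -1.5, as in the AND-gate figure
    return step_perceptron((a, b), (1, 1), -1.5)

def OR(a, b):    # weights (1, 1), bias -0.5, as in the OR-gate figure
    return step_perceptron((a, b), (1, 1), -0.5)

def NOT(a):      # weight -1, bias 0.5, as in the NOT-gate figure
    return step_perceptron((a,), (-1,), 0.5)

for a in (0, 1):
    print(f"NOT({a}) = {NOT(a)}")
    for b in (0, 1):
        print(f"AND({a},{b}) = {AND(a, b)}   OR({a},{b}) = {OR(a, b)}")
```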


• Logic Gate - XOR

XOR | 0     | 1
0   | FALSE | TRUE
1   | TRUE  | FALSE

? — XOR is not a linear function: its TRUE and FALSE outputs cannot be separated by a single line, so no single perceptron can compute it.


Single Perceptron Drawbacks

• Can only learn linearly separable functions.

• To address this problem, we’ll need to use a multilayer perceptron, a.k.a. a feedforward neural network.

• Multiple perceptrons give us a more powerful mechanism for learning.


Multilayer Perceptron

• A neural network is a composition of perceptrons, connected in different ways.

• Example: a small two-layer composition is sketched below.
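The example figure itself is not reproduced here. As a small concrete composition (a standard construction, not taken from the slides), XOR, which no single perceptron can represent, comes out of two layers of the gate perceptrons above:

```python
def step(x, w, b):
    """Step-activation perceptron: 1 if w·x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def XOR(a, b):
    """Layer 1 computes OR(a, b) and AND(a, b); layer 2 fires only when OR did and AND did not."""
    o = step((a, b), (1, 1), -0.5)       # OR perceptron (weights from the OR slide)
    n = step((a, b), (1, 1), -1.5)       # AND perceptron (weights from the AND slide)
    return step((o, n), (1, -1), -0.5)   # o AND (NOT n)

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a},{b}) = {XOR(a, b)}")   # -> 0, 1, 1, 0
```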


Feedforward Neural Networks for Deep Learning


• An input layer, an output layer, and one or more hidden layers. Here: a 3-unit input layer, a 4-unit hidden layer, and an output layer with 2 units.

• Each unit is a single perceptron.

• The units of the input layer serve as inputs for the units of the hidden layer, while the hidden layer units are inputs to the output layer.


• Each connection between two neurons has a weight w.

• Fully connected case: Each unit of layer t is typically connected to every unit of the previous layer t – 1.

• The information moves in only one direction, forward, from the input nodes, through the hidden and to the output nodes.


Beyond Linearity

• What if each of our perceptrons is only allowed to use a linear activation function?
  • A composition of linear functions is still just a linear function.

• The final output of our network will still be some linear function of the inputs.

• If we’re restricted to linear activation functions, then the feedforward neural network is no more powerful than the perceptron, no matter how many layers it has.
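To make the composition argument concrete for one hidden layer (a standard identity, stated here for completeness): with linear layers h = W1 x + b1 and output W2 h + b2,
W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2),
which is again a single linear (affine) function of x, no matter what W1, W2, b1, b2 are.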


• Because of this, most neural networks use non-linear activation functions like the logistic (sigmoid), tanh, or rectifier (ReLU).

• Without them the network can only learn functions which are linear combinations of its inputs.


Activation Functions

Function                     | Range
Logistic (sigmoid)           | (0, 1)
tanh                         | (−1, 1)
Rectified linear unit (ReLU) | [0, ∞)
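A minimal sketch of the three functions in the table, using only the standard library (the function names are illustrative):

```python
import math

def sigmoid(x):
    """Logistic (sigmoid): output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: output in (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: output in [0, inf)."""
    return max(0.0, x)

for v in (-2.0, 0.0, 2.0):
    print(f"x={v:+.1f}  sigmoid={sigmoid(v):.4f}  tanh={tanh(v):+.4f}  relu={relu(v):.1f}")
```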


Layer diagram (from the figure): x → h1 → h2 → y

θi = the weights of layer i
fi = the activation function used in layer i
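A hedged sketch of that composition (layer sizes, weight values, and names are mine, and sigmoid is assumed for every f_i): each layer applies its weights θ_i and its activation f_i to the previous layer's output.

```python
import math

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def dense(x, weights, bias):
    """One fully connected layer: each output unit computes w·x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weights, bias)]

def forward(x, layers):
    """Compose the layers: the output of layer i-1 is the input of layer i."""
    for weights, bias in layers:
        x = sigmoid_vec(dense(x, weights, bias))
    return x

# Illustrative 2-3-1 network with arbitrary weights (not the example from the slides).
layers = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.1, 0.1, 0.1]),   # theta_1: 3 hidden units
    ([[0.7, 0.8, 0.9]], [0.2]),                                # theta_2: 1 output unit
]
print(forward([1.0, 0.5], layers))
```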


Loss

• The measure of how well the model fits the training set is given by a suitable loss function L(x, y; θ), e.g.:
  • Sum-of-squares: Σ_{i=1..K} (y_i − ŷ_i)²
  • Negative log likelihood: − log p(class = k | x; θ)

• The loss depends on the input x, the target label y, and the parameters θ, where θi = (wi, bi).
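As a small sketch of the two losses named above (names are illustrative; note that the worked example later multiplies the sum-of-squares by ½):

```python
import math

def sum_of_squares(y, y_hat):
    """Sum-of-squares over the K output units: sum_i (y_i - y_hat_i)^2."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat))

def negative_log_likelihood(class_probs, k):
    """Negative log likelihood of the target class k: -log p(class = k | x; theta)."""
    return -math.log(class_probs[k])

print(sum_of_squares([0.01], [0.7513]))             # single output unit
print(negative_log_likelihood([0.2, 0.7, 0.1], 1))  # target class k = 1
```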


Activation Functions Derivatives

Function                     | Derivative (w.r.t. x)
Logistic (sigmoid)           | σ(x)(1 − σ(x))
tanh                         | 1 − tanh²(x)
Rectified linear unit (ReLU) | 1 if x > 0, 0 if x < 0 (conventionally taken as 0 at x = 0)


DNN algorithm

1. Feedforward

2. Backpropagation


Feedforward

• Values are fed in at the input layer and propagate forward: from the input layer to the hidden layer(s), and from the hidden layer(s) to the output layer.


Feedforward Example

• Activation function: sigmoid, σ(Net)

• Loss function: sum of squares, E = ½ Σ_i (y_i − ŷ_i)²

• Network (from the figure): inputs x(1), x(2); hidden neurons h1(1), h1(2); output neuron z. Each neuron computes Net = Σ_i w_i x_i + b and then applies the activation function.

• Notation: h_i(j) refers to the j-th neuron at the i-th hidden layer.


• x = (0.05, 0.1), y = 0.01

• Suppose the weight values of our network are given (shown in red in the figure):
  • Hidden layer: w11 = (0.15, 0.20), w12 = (0.25, 0.30), bias b1 = 0.35
  • Output layer: w2 = (0.40, 0.45), bias b2 = 0.60

• Notation: w_lj denotes the weight vector entering neuron j of layer l.


• x = (0.05, 0.1), y = 0.01

• Net_h1(1) = Σ_i x(i) · w11(i) + 1 · b1 = x(1) · w11(1) + x(2) · w11(2) + 1 · b1
            = 0.05 · 0.15 + 0.1 · 0.20 + 1 · 0.35 = 0.3775

• Out_h1(1) = σ(Net_h1(1)) = 1 / (1 + e^(−Net_h1(1))) = 1 / (1 + e^(−0.3775)) = 0.59326


• Out_h1(1) = 0.59326

• Out_h1(2) = 0.59688  ← calculated in the same way

• Net_z = Σ_i h1(i) · w2(i) + 1 · b2 = h1(1) · w2(1) + h1(2) · w2(2) + 1 · b2
        = 0.59326 · 0.40 + 0.59688 · 0.45 + 1 · 0.60 = 1.1059

• Out_z = σ(Net_z) = 1 / (1 + e^(−1.1059)) = 0.7513


Calculating the Error

• Loss function: E = ½ Σ_i (y_i − ŷ_i)², where i iterates over the output nodes (in this case there is only one).

• Our error: E = ½ (0.7513 − 0.01)² = 0.2747
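To check the arithmetic, here is a small sketch (not from the slides) that reproduces the forward pass and the error above, assuming the weight layout reconstructed earlier (w11 = (0.15, 0.20), w12 = (0.25, 0.30), b1 = 0.35, w2 = (0.40, 0.45), b2 = 0.60); the variable names are mine.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

x = (0.05, 0.10)
y = 0.01
w11, w12, b1 = (0.15, 0.20), (0.25, 0.30), 0.35    # hidden layer
w2, b2 = (0.40, 0.45), 0.60                        # output layer

net_h1 = x[0] * w11[0] + x[1] * w11[1] + b1        # 0.3775
net_h2 = x[0] * w12[0] + x[1] * w12[1] + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)  # ~0.59326, ~0.59688

net_z = out_h1 * w2[0] + out_h2 * w2[1] + b2       # ~1.1059
out_z = sigmoid(net_z)                             # ~0.7513

E = 0.5 * (y - out_z) ** 2                         # ~0.2747, up to rounding
print(out_h1, out_h2, out_z, E)
```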


Backpropagation

• Goal: update every weight vector so that the output ŷ becomes closer to the target y, thus minimizing the error of the network.


• We will start by updating w2, using gradient descent.

• Recall the update rule: w2 ← w2 − η ∂E/∂w2


• Find ∂E/∂w2. We will use the chain rule.

• E = ½ (y − ŷ)²

• ∂E/∂w2 = ∂E/∂Out_z · ∂Out_z/∂Net_z · ∂Net_z/∂w2


The Chain Rule

• Feedforward calculations:

  1. Net_h1(j) = ⟨w1j, x⟩ (plus the bias term)
  2. Out_h1(j) = σ(Net_h1(j))
  3. Net_z = ⟨w2, Out_h1⟩ (plus the bias term)
  4. Out_z = σ(Net_z)


Backpropagation:

• ∂E/∂w2 = ∂E/∂Out_z · ∂Out_z/∂Net_z · ∂Net_z/∂w2

• ∂E/∂Out_z = 2 · ½ (y − Out_z) · (−1)

• ∂Out_z/∂Net_z = σ(Net_z)(1 − σ(Net_z))

• ∂Net_z/∂w2 = Out_h1

Recall the feedforward calculations: Net_h1(j) = ⟨w1j, x⟩, Out_h1(j) = σ(Net_h1(j)), Net_z = ⟨w2, Out_h1⟩, Out_z = σ(Net_z); and the error E = ½ (y − Out_z)².


The Update Rule

• ∂E/∂w2 = ∂E/∂Out_z · ∂Out_z/∂Net_z · ∂Net_z/∂w2
         = 2 · ½ (y − Out_z) · (−1) · σ(Net_z)(1 − σ(Net_z)) · Out_h1
         = −(y − Out_z) · Out_z (1 − Out_z) · Out_h1

• Updating w2:
  w2 ← w2 − η [ −(y − Out_z) · Out_z (1 − Out_z) · Out_h1 ]
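Continuing the numeric example, the gradient and update for w2 can be evaluated directly from the expression above (a sketch; the rounded values are taken from the slides, and the learning rate η = 0.5 is an assumed value, since the slides leave η unspecified):

```python
# Values from the forward pass (rounded as on the slides).
y, out_z = 0.01, 0.7513
out_h = (0.59326, 0.59688)
w2 = (0.40, 0.45)

# delta_z collects the first two chain-rule factors: -(y - Out_z) * Out_z * (1 - Out_z).
delta_z = -(y - out_z) * out_z * (1 - out_z)

# Third factor: dNet_z/dw2 = Out_h1, giving one gradient component per weight in w2.
grad_w2 = tuple(delta_z * h for h in out_h)

eta = 0.5   # assumed learning rate
w2_new = tuple(wi - eta * gi for wi, gi in zip(w2, grad_w2))
print(delta_z, grad_w2, w2_new)
```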


The Chain Rule

• Next, we continue the backward pass in order to compute ∂E/∂w1j and update w1j.


Backpropagation:

• ∂E/∂w1j = ∂E/∂Out_z · ∂Out_z/∂Net_z · ∂Net_z/∂Out_h1(j) · ∂Out_h1(j)/∂Net_h1(j) · ∂Net_h1(j)/∂w1j

• ∂Net_z/∂Out_h1(j) = w2(j)

• ∂Out_h1(j)/∂Net_h1(j) = σ(Net_h1(j))(1 − σ(Net_h1(j)))

• ∂Net_h1(j)/∂w1j = x

(∂E/∂Out_z and ∂Out_z/∂Net_z were calculated previously.)


The Update Rule

• ∂E/∂w1j = ∂E/∂Out_z · ∂Out_z/∂Net_z · ∂Net_z/∂Out_h1(j) · ∂Out_h1(j)/∂Net_h1(j) · ∂Net_h1(j)/∂w1j
          = 2 · ½ (y − Out_z) · (−1) · Out_z (1 − Out_z) · w2(j) · σ(Net_h1(j))(1 − σ(Net_h1(j))) · x
          = −(y − Out_z) · Out_z (1 − Out_z) · w2(j) · Out_h1(j) (1 − Out_h1(j)) · x
  (the first two factors were calculated previously)

• Updating w1j:
  w1j ← w1j − η [ −(y − Out_z) · Out_z (1 − Out_z) · w2(j) · Out_h1(j) (1 − Out_h1(j)) · x ]
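Pulling the feedforward pass and both update rules together, a hedged sketch of one full gradient-descent step for this 2-2-1 sigmoid network (the function name and η = 0.5 are my assumptions, and the biases are left unchanged because the slides only derive the weight updates):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, y, w1, b1, w2, b2, eta=0.5):
    """One feedforward + backpropagation step for the 2-2-1 network of the example."""
    # Feedforward.
    net_h = [sum(wi * xi for wi, xi in zip(w1[j], x)) + b1 for j in range(2)]
    out_h = [sigmoid(n) for n in net_h]
    net_z = sum(wi * hi for wi, hi in zip(w2, out_h)) + b2
    out_z = sigmoid(net_z)

    # Backpropagation, using the chain-rule factors derived on the slides.
    delta_z = -(y - out_z) * out_z * (1 - out_z)
    new_w2 = [wi - eta * delta_z * hi for wi, hi in zip(w2, out_h)]
    new_w1 = [[w1[j][i] - eta * delta_z * w2[j] * out_h[j] * (1 - out_h[j]) * x[i]
               for i in range(2)] for j in range(2)]

    error = 0.5 * (y - out_z) ** 2   # error before the update
    return new_w1, new_w2, error

# Weights from the example; biases b1 = 0.35, b2 = 0.60.
w1 = [[0.15, 0.20], [0.25, 0.30]]
w2 = [0.40, 0.45]
w1, w2, err = train_step((0.05, 0.10), 0.01, w1, 0.35, w2, 0.60)
print(err, w2)
```

Calling train_step repeatedly on the same example keeps shrinking the error, which is what the next slide describes.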


Updated Weights

• Finally, we updated the weights.

• When we fed forward the first input (0.05, 0.1), the error was: 0.2983

• After a single update, the error is down to: 0.29102

• After repeating the process 10,000 times, the error is: 0.00003


Summary

• Single-layer perceptron
  • Only solves linearly separable problems

• Neural networks
  • Non-linear activation functions
  • Feedforward
  • Backpropagation

• Example from: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
