Neural Networks Workshop
Tony Allen
Department of Mathematics, Purdue University
July 1, 2019
Tony Allen NN Workshop July 1, 2019 1 / 30
Helpful Resources
- Goodfellow, Bengio, and Courville's Deep Learning: https://www.deeplearningbook.org/
- Francis Chollet's Deep Learning with Python: https://github.com/fchollet/deep-learning-with-python-notebooks
- Dr. Buzzard's MA598 Course Notes: https://www.math.purdue.edu/~buzzard/MA598-Spring2019/
- Nick Winovich's SIAM@Purdue TensorFlow Workshop: https://www.math.purdue.edu/~nwinovic/workshop.html
What is Machine Learning?
Machine Learning shifts the paradigm from programming for answers to programming to discover rules.
Diagram adapted from Francis Chollet's Deep Learning with Python
Neural Networks
Artificial Neuron: the building block of a neural network
Mathematically, y = f(w1 x1 + w2 x2 + w3 x3 + b) = f(w^T x + b)
Diagram from Nick Winovich
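As a minimal sketch, the neuron above can be written in a few lines of NumPy (the function names `neuron` and `sigmoid` and the sample values are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # A common choice for the activation f
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # A single artificial neuron: y = f(w^T x + b)
    return f(np.dot(w, x) + b)

# Three inputs, as in the diagram
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.2
y = neuron(x, w, b)  # a scalar in (0, 1) for sigmoid
```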
Neural Networks
Neural Networks
We can combine the corresponding equations
y1 = f(w1^T x + b1)
y2 = f(w2^T x + b2)
into one matrix-vector product equation
y = f(Wx + b)
If we have N inputs and M outputs, then W is a dense M × N matrix.
(for the picky: f is applied element-wise)
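The matrix-vector form above can be sketched directly in NumPy (the name `dense` and the dimensions are illustrative):

```python
import numpy as np

def dense(x, W, b, f=np.tanh):
    # Dense layer: y = f(Wx + b), with f applied element-wise
    return f(W @ x + b)

rng = np.random.default_rng(0)
N, M = 4, 3                       # N inputs, M outputs
W = rng.standard_normal((M, N))   # W is a dense M x N matrix
b = rng.standard_normal(M)
x = rng.standard_normal(N)
y = dense(x, W, b)                # y has M components
```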
Dense (Fully Connected) Layer
Diagram from Nick Winovich
Network Depth
y = f2(W2(f1(W1 x + b1)) + b2)
Diagram from Nick Winovich
Network Depth
y = f3(W3(f2(W2(f1(W1 x + b1)) + b2)) + b3)
Diagram from Nick Winovich
A Word on Activation Functions
Activation functions are a fundamental component of network architecture; they provide non-linear modeling capacity and control the gradient flow that guides training.
Figures from Nick Winovich
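Two of the most common activation functions can be sketched as follows (the specific values are illustrative, not from the workshop figures):

```python
import numpy as np

def relu(z):
    # ReLU passes positive inputs through unchanged and zeroes the rest
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes inputs into (0, 1) but saturates for large |z|,
    # which can make gradients vanish in deep networks
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(sigmoid(z))  # ~[0.119 0.5 0.881]
```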
Optimization (How to learn)
Goal: learn weights so the network gives the desired output.
Everything today will be Supervised Learning:

    x → Neural Net → ŷ,   loss(y, ŷ) → update weights

Adjust the weights w_ij to minimize the loss.
Gradient Descent
From calculus: the greatest decrease of a function is in the direction opposite the gradient.
Let θ be all the parameters (weights and biases) and E the total loss over all the data. Then iteratively apply a method called Gradient Descent:
θ_{k+1} = θ_k − α_k ∇E(θ_k)
However, computing the gradient of the loss over all the data can be expensive, so instead we compute it over random subsets of the data (batches). This leads to Stochastic Gradient Descent algorithms.
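A minimal sketch of stochastic gradient descent on a toy loss (everything here, including the name `sgd`, the quadratic loss, and the step size, is illustrative, not from the workshop notebook):

```python
import numpy as np

def sgd(theta, grad_fn, data, alpha=0.1, batch_size=32, steps=100, seed=0):
    # Plain SGD: theta <- theta - alpha * (gradient over a random batch)
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Sample a batch with replacement
        batch = data[rng.choice(len(data), size=batch_size)]
        theta = theta - alpha * grad_fn(theta, batch)
    return theta

# Toy example: E(theta) = mean over the data of (theta - x_i)^2,
# whose minimizer is the data mean (2.5 here)
data = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda th, batch: 2.0 * np.mean(th - batch)
theta_star = sgd(np.array(0.0), grad, data, steps=500)
```

Because each step uses only a batch, the iterates fluctuate around the true minimizer rather than converging exactly to it.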
Back Propagation
How do we compute the gradient, i.e. ∂E/∂w_ij?
The answer: the Chain Rule!
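The chain rule can be worked by hand on a tiny network with one neuron per layer (this toy setup, and the names `grads`, `f`, `df`, are illustrative):

```python
import numpy as np

# Tiny network: a1 = f(w1*x), y = f(w2*a1), loss E = (y - t)^2
f  = np.tanh
df = lambda z: 1.0 - np.tanh(z) ** 2   # f'(z)

def grads(x, t, w1, w2):
    # Forward pass, keeping intermediate values
    z1 = w1 * x;  a1 = f(z1)
    z2 = w2 * a1; y  = f(z2)
    # Backward pass: apply the chain rule layer by layer
    dE_dy  = 2.0 * (y - t)
    dE_dw2 = dE_dy * df(z2) * a1                # last layer
    dE_dw1 = dE_dy * df(z2) * w2 * df(z1) * x   # propagated back one layer
    return dE_dw1, dE_dw2

g1, g2 = grads(x=0.7, t=0.5, w1=0.3, w2=-0.8)
```

A finite-difference check on E confirms that both gradients match the chain-rule values.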
Back Propagation
Diagrams from Nick Winovich
Let’s actually do something! (Exercise 1)
The Notebook to follow along can be found on the Workshop homepage: https://engineering.purdue.edu/ChanGroup/MLworkshop2019.html
Overfitting
In some cases, a network can learn too much: it performs well on the training data but fails to generalize to test data. Solutions include Regularization and Dropout.
Regularization and Dropout
Regularization adds a penalty for large weights to the loss function. Commonly, we use
- the L1 norm, which encourages sparsity
- the L2 norm, which encourages small weights

    loss ← loss + λ‖θ‖₁   (or λ‖θ‖₂)
Dropout temporarily ignores random nodes (each with fixed probability p) during each training iteration. This ensures no individual node dominates.
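Both ideas fit in a few lines of NumPy. This sketch uses "inverted" dropout, the variant most frameworks implement; the names `dropout` and `l1_penalty` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p, training=True):
    # Inverted dropout: zero each activation with probability p during
    # training, and rescale survivors by 1/(1-p) so the expected
    # activation is unchanged (no rescaling needed at test time)
    if not training:
        return a
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

a = np.ones(10_000)
out = dropout(a, p=0.5)
# About half the entries are zeroed; the survivors become 2.0,
# so the mean stays close to 1

# An L1 regularization term, added to the loss:
lam, theta = 1e-3, np.array([0.5, -2.0, 0.1])
l1_penalty = lam * np.abs(theta).sum()
```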
Exercise 2
In this exercise, we will try to improve the previous model by adding dropout. Use the Keras documentation (https://keras.io/) to create a network with at least 2 hidden layers that use dropout. Plot the training loss, print the test accuracy, and compare to the previous model.
Convolutional Neural Networks (CNNs)
CNNs are useful when the data is spatially structured (e.g. images).
The key concept behind CNNs is that of kernels/filters. These have long been used in hand-crafted feature detection.
What are good, distinguishing features? How do we mathematically extract such features?
https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
Convolution
https://github.com/PetarV-/TikZ/tree/master
Pop Quiz
Let

    I = [ 1 2 0 1 3
          2 0 1 4 0
          7 0 9 5 5
          8 5 2 6 0
          8 0 0 1 4 ]

and

    K = [ 1 1 0
          0 0 0
          0 0 2 ].

1. What is (I ∗ K)(1, 1)?
2. What is the size of I ∗ K?
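The quiz can be checked in NumPy; `conv2d_valid` below is an illustrative helper implementing the "valid" convolution (as cross-correlation, the CNN convention):

```python
import numpy as np

I = np.array([[1, 2, 0, 1, 3],
              [2, 0, 1, 4, 0],
              [7, 0, 9, 5, 5],
              [8, 5, 2, 6, 0],
              [8, 0, 0, 1, 4]])
K = np.array([[1, 1, 0],
              [0, 0, 0],
              [0, 0, 2]])

def conv2d_valid(I, K):
    # Slide K over I (no padding) and sum the element-wise products
    m, n = K.shape
    H, W = I.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+m, j:j+n] * K)
    return out

out = conv2d_valid(I, K)
# out is 3x3; out[0, 0] (the slide's 1-based (I ∗ K)(1, 1)) is 21
print(out)
```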
Matrix View
In practice, we perform a convolution as one large matrix-vector product that does all the work in one go.
With

    I = [ 1 2 0 1 3          K = [ 1 1 0
          2 0 1 4 0                0 0 0
          7 0 9 5 5                0 0 2 ],
          8 5 2 6 0
          8 0 0 1 4 ]   and

we flatten I into a vector; each row of the (mostly zero) matrix holds the kernel entries in the positions they overlay:

    I ∗ K  =  [ 1 1 0 ··· 2 0 0 ··· 0 ]   [ 1 ]     [ 21 ]
              [ 0 1 1 ··· 0 2 0 ··· 0 ]   [ 2 ]     [ 12 ]
              [ 0 0 1 ··· 0 0 2 ··· 0 ] × [ 0 ]  =  [ 11 ]
              [  ⋮                 ⋮  ]   [ ⋮ ]     [  ⋮ ]
              [ 0 0 0 ··· 1 1 0 ··· 2 ]   [ 4 ]     [ 22 ]

Key words: Toeplitz, sparse.
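The matrix view can be built explicitly for the example above; `conv_matrix` is an illustrative helper, not code from the workshop:

```python
import numpy as np

I = np.array([[1, 2, 0, 1, 3],
              [2, 0, 1, 4, 0],
              [7, 0, 9, 5, 5],
              [8, 5, 2, 6, 0],
              [8, 0, 0, 1, 4]], dtype=float)
K = np.array([[1, 1, 0],
              [0, 0, 0],
              [0, 0, 2]], dtype=float)

def conv_matrix(K, H, W):
    # Build the sparse Toeplitz-structured matrix T such that
    # T @ image.ravel() equals the 'valid' convolution of an HxW image with K
    m, n = K.shape
    oh, ow = H - m + 1, W - n + 1
    T = np.zeros((oh * ow, H * W))
    for i in range(oh):
        for j in range(ow):
            for a in range(m):
                for b in range(n):
                    # Kernel entry K[a, b] multiplies pixel (i+a, j+b)
                    T[i * ow + j, (i + a) * W + (j + b)] = K[a, b]
    return T

T = conv_matrix(K, 5, 5)
y = T @ I.ravel()        # one matrix-vector product does all the work
print(y.reshape(3, 3))   # first entry 21, last entry 22
```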
Convolutional Neural Network
Key Ideas of a CNN:
1. Instead of expensive dense matrix-vector products, do convolutions
2. Everything else stays the same (activation functions, training, etc.)
CNNs scale very well to large images because of their sparse connections and natural spatial invariance.
Of course I have skipped some details, so let me touch on those:
- Stride
- Padding
- Pooling
Stride and Padding
When defining a convolution, we need to specify how fast and to what extent the kernel slides over the image: the stride and the padding, respectively. Both determine the size of the output.
In Keras,
- "strides = 2" determines how many pixels the kernel moves at a time (in this case, two)
- "padding = same" puts zeros around the image so that the output is the same size as the input. Called zero-padding.
- "padding = valid" adds no padding: the kernel only visits positions where it fits entirely inside the image, so the output shrinks
Max Pooling
Often, we care only about the existence of a feature. Max Pooling is one way to reduce dimensionality while keeping information about the existence of a feature.
In my experience, you see this applied after a stride = 1 convolution withzero-padding.
https://computersciencewiki.org/index.php/Max-pooling
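Max pooling can be sketched in NumPy as follows (the name `max_pool2d` and the 4x4 example are illustrative):

```python
import numpy as np

def max_pool2d(a, size=2, stride=2):
    # Take the maximum over each size x size window, moving by `stride`
    H, W = a.shape
    oh = (H - size) // stride + 1
    ow = (W - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = a[i*stride:i*stride+size,
                          j*stride:j*stride+size].max()
    return out

a = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(a))  # 2x2 output: [[6, 8], [3, 4]]
```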
Exercise 3
In this exercise, we implement a CNN and see how much better it performs on our image classification task.
Fin.