Deep Learning: An Introduction for Applied Mathematicians
TRANSCRIPT
Deep Learning: An Introduction for Applied Mathematicians
Manuel Baumann
June 13, 2018
CSC Reading Group Seminar
Disclaimer
Claim: I am not an expert in machine learning – But we are!
Reference
C. F. Higham and D. J. Higham (2018). Deep Learning: An Introduction for Applied Mathematicians. arXiv:1801.05894v1.
New names for old friends

| Machine Learning | Numerical Analysis |
| --- | --- |
| neuron | smoothed step function |
| neural network | directed graph |
| training phase | parameter fitting → p |
| stochastic gradient | steepest descent variant |
| learning rate | step size in line search |
| back propagation | adjoint equation |
| hidden layers | parameter (over-)fitting |
| 'deep' learning | large-scale |
| artificial intelligence | $f_p(x)$ |
| ??? | model-order reduction |
Overview
1. The General Set-up
2. Stochastic Gradient
3. Back Propagation
4. MATLAB Example
5. Current Research Directions
The General Set-up
The sigmoid function,

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

models the behavior of a neuron in the brain.
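As a quick aside (not on the original slides), a minimal NumPy sketch of the sigmoid and its derivative, using the identity $\sigma' = \sigma(1 - \sigma)$ that reappears in the back-propagation proof below:

```python
import numpy as np

def sigmoid(x):
    # Smoothed step function: sigma(x) = 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)
```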
A neural network
$$a^{[1]} = x \in \mathbb{R}^{n_1},$$
$$a^{[l]} = \sigma\left(W^{[l]} a^{[l-1]} + b^{[l]}\right) \in \mathbb{R}^{n_l}, \qquad l = 2, 3, \ldots, L$$
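A minimal sketch of this recursion, assuming the weights and biases are stored as Python lists of NumPy arrays, with `W[0]` playing the role of $W^{[2]}$ and so on (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b):
    # a[1] = x; a[l] = sigma(W[l] a[l-1] + b[l]) for l = 2, ..., L.
    # Each W entry has shape (n_l, n_{l-1}); each b entry has shape (n_l,).
    a = x
    for W_l, b_l in zip(W, b):
        a = sigmoid(W_l @ a + b_l)
    return a
```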
The entering example

$$\min_{W^{[j]},\, b^{[j]},\ j \in \{2,3,4\}} \ \frac{1}{10} \sum_{i=1}^{10} \frac{1}{2} \left\| y(x_i) - F(x_i) \right\|_2^2, \quad \text{equivalently} \ \min_{p \in \mathbb{R}^{23}},$$

$$F(x) = \sigma\!\left(W^{[4]} \sigma\!\left(W^{[3]} \sigma\!\left(W^{[2]} x + b^{[2]}\right) + b^{[3]}\right) + b^{[4]}\right) \in \mathbb{R}^2$$
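The dimension $p \in \mathbb{R}^{23}$ can be checked from the layer sizes $(n_1, n_2, n_3, n_4) = (2, 2, 3, 2)$ used in the Higham & Higham example; a quick count, under that assumption:

```python
# Layer sizes n_1..n_4 from the Higham & Higham example network.
sizes = [2, 2, 3, 2]
# Each layer l contributes n_l * n_{l-1} weights plus n_l biases.
n_params = sum(n_out * n_in + n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(n_params)  # -> 23, i.e. p is a vector in R^23
```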
Stochastic Gradient

Objective function:

$$\mathcal{J}(p) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left\| y(x_i) - a^{[L]}(x_i, p) \right\|_2^2 =: \frac{1}{N} \sum_{i=1}^{N} C_i(x_i, p)$$

Steepest descent:

$$p \leftarrow p - \eta\, \nabla \mathcal{J}(p), \qquad \eta \in \mathbb{R}_+ \ \text{is called the 'learning rate'}.$$

Stochastic gradient:

$$\nabla \mathcal{J}(p) = \frac{1}{N} \sum_{i=1}^{N} \nabla_p C_i(x_i, p) \approx \frac{1}{|I|} \sum_{i \in I} \nabla_p C_i(x_i, p)$$
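A hedged sketch of one stochastic-gradient step, assuming a callable `grad_C(i, p)` that returns $\nabla_p C_i(x_i, p)$ (the name and signature are illustrative, not from the paper):

```python
import numpy as np

def sgd_step(p, grad_C, N, eta, batch_size, rng):
    # Draw a random mini-batch I, then update
    # p <- p - eta * (1/|I|) * sum_{i in I} grad C_i(x_i, p).
    I = rng.choice(N, size=batch_size, replace=False)
    g = sum(grad_C(i, p) for i in I) / len(I)
    return p - eta * g

# Example wiring (hypothetical): rng = np.random.default_rng(0)
#                                p = sgd_step(p, grad_C, N, 0.05, 8, rng)
```

With `batch_size = N` the update reduces to steepest descent; `batch_size = 1` gives the classical stochastic gradient.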
Back Propagation (1/3)

Now, $p \sim \left\{ \left[ W^{[l]} \right]_{j,k}, \left[ b^{[l]} \right]_j \right\}$. Let $z^{[l]} := W^{[l]} a^{[l-1]} + b^{[l]}$ and $\delta^{[l]}_j := \frac{\partial C}{\partial z^{[l]}_j}$.

Lemma (Back Propagation). The partial derivatives are given by

$$\delta^{[L]} = \sigma'(z^{[L]}) \circ (a^{[L]} - y), \tag{1}$$
$$\delta^{[l]} = \sigma'(z^{[l]}) \circ \left( (W^{[l+1]})^T \delta^{[l+1]} \right), \quad 2 \le l \le L-1, \tag{2}$$
$$\frac{\partial C}{\partial b^{[l]}_j} = \delta^{[l]}_j, \quad 2 \le l \le L, \tag{3}$$
$$\frac{\partial C}{\partial w^{[l]}_{jk}} = \delta^{[l]}_j\, a^{[l-1]}_k, \quad 2 \le l \le L, \tag{4}$$

where $\circ$ denotes the componentwise (Hadamard) product.
Back Propagation (2/3)

Proof. We prove (1) component-wise:

$$\delta^{[L]}_j = \frac{\partial C}{\partial z^{[L]}_j} = \frac{\partial C}{\partial a^{[L]}_j} \frac{\partial a^{[L]}_j}{\partial z^{[L]}_j} = (a^{[L]}_j - y_j)\, \sigma'(z^{[L]}_j) = (a^{[L]}_j - y_j) \left( \sigma(z^{[L]}_j) - \sigma^2(z^{[L]}_j) \right)$$

Next, we prove (2) component-wise:

$$\delta^{[l]}_j = \frac{\partial C}{\partial z^{[l]}_j} = \sum_{k=1}^{n_{l+1}} \frac{\partial C}{\partial z^{[l+1]}_k} \frac{\partial z^{[l+1]}_k}{\partial z^{[l]}_j} = \sum_{k=1}^{n_{l+1}} \delta^{[l+1]}_k \frac{\partial z^{[l+1]}_k}{\partial z^{[l]}_j} = \sum_{k=1}^{n_{l+1}} \delta^{[l+1]}_k\, w^{[l+1]}_{kj}\, \sigma'(z^{[l]}_j),$$

where $z^{[l+1]}_k = \sum_{s=1}^{n_l} w^{[l+1]}_{ks}\, \sigma(z^{[l]}_s) + b^{[l+1]}_k$.

(3) and (4) follow similarly. ∎
Back Propagation (3/3)

Interpretation:

- Evaluation of $a^{[L]}$ requires a so-called forward pass: $a^{[1]}, z^{[2]} \to a^{[2]}, z^{[3]} \to \ldots \to a^{[L]}$
- Compute $\delta^{[L]}$ via (1)
- Backward pass via (2): $\delta^{[L]} \to \delta^{[L-1]} \to \ldots \to \delta^{[2]}$
- Gradients via (3) and (4), as sketched below
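Putting the pieces together, a compact NumPy sketch of one forward/backward sweep implementing (1)–(4) for a single training pair (x, y); an illustrative reconstruction, not the paper's MATLAB code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, W, b):
    # Gradients of C = 0.5 * ||y - a[L]||^2 w.r.t. all W[l], b[l].
    # W, b are lists of arrays for layers l = 2, ..., L.
    a, zs, activations = x, [], [x]
    for W_l, b_l in zip(W, b):          # forward pass: a[1], z[2] -> a[2], ...
        z = W_l @ a + b_l
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    sp = lambda z: sigmoid(z) * (1.0 - sigmoid(z))
    delta = sp(zs[-1]) * (a - y)        # eq. (1): delta[L]
    grads_W, grads_b = [], []
    for l in reversed(range(len(W))):   # backward pass: delta[L] -> ... -> delta[2]
        grads_b.append(delta)                                # eq. (3)
        grads_W.append(np.outer(delta, activations[l]))      # eq. (4)
        if l > 0:
            delta = sp(zs[l - 1]) * (W[l].T @ delta)         # eq. (2)
    return list(reversed(grads_W)), list(reversed(grads_b))
```

Combined with the `sgd_step` sketch above, this provides all the ingredients for training the entering example.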
Some more details

Convolutional Neural Network:

- The presented approach is infeasible for large data ($W^{[l]}$ is dense).
- Layers can be pre- and post-processing steps used in image analysis: filtering, max pooling, average pooling, ... (a small pooling sketch follows below)

Avoiding Overfitting:

- A trained network works well on the given data, but not on new data.
- Splitting: training vs. validation data
- Dropout: independently remove neurons
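Since pooling is only named on the slide, here is a tiny NumPy illustration of 2×2 max pooling (a toy sketch, not tied to any particular CNN library):

```python
import numpy as np

def max_pool_2x2(A):
    # 2x2 max pooling with stride 2; A must have even shape (m, n).
    m, n = A.shape
    return A.reshape(m // 2, 2, n // 2, 2).max(axis=(1, 3))

A = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(A))  # block maxima: [[ 5.  7.], [13. 15.]]
```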
Current Research Directions

Research directions:

- proofs, e.g. when data is assumed to be i.i.d.
- in practice: design of layers
- perturbation theory: update a trained network
- autoencoders: $\|x - G(F(x))\|_2^2 \to \min$

Two software examples:

- http://www.vlfeat.org/matconvnet/
- http://scikit-learn.org/
Summary

New names for old friends.

| Machine Learning | Numerical Analysis |
| --- | --- |
| neuron | smoothed step function |
| neural network | directed graph |
| training phase | parameter fitting → p |
| stochastic gradient | steepest descent variant |
| learning rate | step size in line search |
| back propagation | adjoint equation |
| hidden layers | parameter (over-)fitting |
| 'deep' learning | large-scale |
| artificial intelligence | $f_p(x)$ |
| PCA | POD |
Further readings
- Andrew Ng, Machine Learning, Coursera (online courses).
- M. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Boston, 2016.
- Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–444.
- S. Mallat, Understanding deep convolutional networks, Philosophical Transactions of the Royal Society of London A, 374 (2016).