Introduction to Deep Learning - unibas.ch
Introduction to Deep Learning
Standard feed-forward neural network with 3 hidden layers
A convolutional neural network (AlexNet, Krizhevsky et al ‘12)
Example applications
● Image classification (AlexNet, Krizhevsky et al ‘12)
● Generating image descriptions (Karpathy et al ‘15)
● Translation (Wu et al ‘15)
● Face generation (Berthelot et al ‘17)
Very Brief History
● 90s and earlier: Neural networks
● Afterwards: Support Vector Machines, kernel methods, ...
● 2012: AlexNet
● Since then: Deep Neural Networks
Seminar plan
Meeting #   Speaker        Topic
1           David Belius   Intro to Machine Learning
2           Marko Thiel    Intro to Artificial Neural Networks
3           tba            DL Basics: Regularization
4           tba            DL Basics: Optimization
5           tba            DL Basics: Convolutional neural networks
6           tba            DL Basics: Recurrent Neural Networks
7-11        tba            Advanced topics
Introduction to ML
ML: Learn to generalize from data
● MNIST: 60k handwritten digits, 28x28 grayscale pixels
– x: image of a handwritten digit
– y: digit label, e.g. 0, 8, 3, 7
● CIFAR100: 50k images of objects from 100 classes, 32x32 RGB pixels
– x: image of an object
– y: class label, e.g. train, sunflower, elephant, cow
● Europarl EN-DE: 1.7m sentence pairs
– Frau Präsidentin, können Sie mir sagen, warum sich dieses Parlament nicht... | Madam President, can you tell me why this Parliament does not...
– It is why we cannot say a clear yes. | Deswegen können wir nicht eindeutig ja sagen.
ML: Learn to generalize from data
● Supervised learning: labeled data
– Space of “inputs” (e.g. image of handwritten digit)
– Space of labels/“outputs” (e.g. digit 0-9)
– Data point: an input together with its label
● Unsupervised learning: unlabeled data
– Space of data
– Data point: an element of the space of data
– E.g. clustering, dimensionality reduction
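Probabilistic model of data
● Data set $(x_1, y_1), \dots, (x_N, y_N)$ (or $x_1, \dots, x_N$ for unlabeled data)
● The data points are iid samples from an unknown probability distribution $P$ on $\mathcal{X} \times \mathcal{Y}$ (or on $\mathcal{X}$ for unlabeled data)
● Goal of learning: Get information about $P$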
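Basic ML tasks: Supervised learning
● Classification, regression
– Predict $y$ from $x$, i.e. learn the conditional distribution of $Y$ given $X = x$
– Often assume $Y$ is a deterministic function of $X$
– Then have “truth” $f: \mathcal{X} \to \mathcal{Y}$ with $Y = f(X)$
– Seek estimate $\hat{f}: \mathcal{X} \to \mathcal{Y}$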
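– Classification
  ● $\mathcal{Y}$ is a finite set of labels, e.g. $\mathcal{Y} = \{0, 1, \dots, 9\}$
  ● Want the estimate $\hat{f}(x)$ to equal the truth $f(x)$ for most $x$
– Regression
  ● $\mathcal{Y}$ is continuous, e.g. $\mathcal{Y} = \mathbb{R}$
  ● Want $|\hat{f}(x) - f(x)|$ small for most $x$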
Basic ML tasks: Unsupervised learning
– Density estimation
  ● $p$ is the probability density of $X$ on $\mathcal{X}$
  ● Seek estimate $\hat{p}$
  ● E.g. outlier detection: flag $x$ as an outlier when $\hat{p}(x)$ is small
– Sampling/synthesis
  ● Learn how to simulate a sample from a probability law that approximates the distribution of the data
  ● E.g.: Learn to generate an image of a realistic looking human face
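Evaluating performance
● Classification
– True error for fixed estimate $\hat{f}$: $\mathbb{P}(\hat{f}(X) \neq Y)$, where $(X, Y)$ is drawn from the data distribution
– True error is unknown
– If $(x_1, y_1), \dots, (x_n, y_n)$ are iid samples from the data distribution, then $\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{f}(x_i) \neq y_i\}$ is an unbiased estimator for the true error.
– Warning: Only true if $(x_1, y_1), \dots, (x_n, y_n)$ were not used to construct $\hat{f}$!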
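Evaluating performance
● Regression
– True Mean Squared Error (MSE) for fixed estimate $\hat{f}$: $\mathbb{E}[(\hat{f}(X) - Y)^2]$, where $(X, Y)$ is drawn from the data distribution
– Not known, but unbiased estimate: $\frac{1}{n} \sum_{i=1}^{n} (\hat{f}(x_i) - y_i)^2$ for iid samples $(x_1, y_1), \dots, (x_n, y_n)$, if they were not used to construct $\hat{f}$!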
Train data and test data
● Split data set into
– Training set (~80%)
– Test set (~20%)
● Construct the estimate $\hat{f}$ using the training set.
● Evaluate performance using the test set.
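The split-then-evaluate procedure can be sketched in plain Python. This is only an illustration on made-up toy data: the helper names `train_test_split` and `error_rate`, the threshold rule used as the estimate, and the fixed seeds are assumptions for the example, not something taken from the slides.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(predict, data):
    """Empirical classification error: fraction of wrong predictions."""
    return sum(predict(x) != y for x, y in data) / len(data)


def mean(values):
    return sum(values) / len(values)


# Toy labeled data (hypothetical): x uniform in [0, 1), label 1 iff x > 0.5.
rng = random.Random(1)
data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(100))]

train, test = train_test_split(data)

# Construct the estimate from the training set only: threshold halfway
# between the mean input of class 0 and the mean input of class 1.
threshold = 0.5 * (mean([x for x, y in train if y == 0])
                   + mean([x for x, y in train if y == 1]))


def predict(x):
    return int(x > threshold)


train_error = error_rate(predict, train)
test_error = error_rate(predict, test)  # estimate of the true error
print(f"train error {train_error:.2f}, test error {test_error:.2f}")
```

Because the test points play no role in choosing `threshold`, `test_error` is the quantity that estimates the true error; `train_error` does not have that property.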
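How to learn
● Non-parametric algorithms
– k-nearest neighbours classification, decision trees, k-means clustering
● Parametric algorithms (“fitting”)
– Hypothesis set $\mathcal{H} = \{f_\theta : \theta \in \mathbb{R}^p\}$ of potential estimates, parametrized by some number $p$ of real parameters
– Error of $f_\theta$ on training set $(x_1, y_1), \dots, (x_N, y_N)$ (regression): $L(\theta) = \frac{1}{N} \sum_{i=1}^{N} (f_\theta(x_i) - y_i)^2$
– Learning: find $\hat{\theta}$ with small error on training set and set $\hat{f} = f_{\hat{\theta}}$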
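Crucial to restrict class somehow
● Without any restriction, an estimate that just memorizes the training set, e.g. $\hat{f}(x) = y_i$ if $x = x_i$ and $\hat{f}(x) = 0$ otherwise, has zero error on training set.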
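A small sketch of why an unrestricted class is a problem: a lookup table reaches zero training error and still predicts badly on new inputs. The toy relationship y = 2x + 1, the name `memorizer` and the default value 0 for unseen inputs are assumptions made for this illustration.

```python
import random

rng = random.Random(0)


def sample(n):
    """Draw n points from the toy relationship y = 2x + 1 (an assumption)."""
    return [(x, 2 * x + 1) for x in (rng.random() for _ in range(n))]


def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


train, test = sample(50), sample(50)

# "Learning" without any restriction: store every training pair.
memory = dict(train)


def memorizer(x):
    # Returns the stored label for a known input and 0 for anything else.
    return memory.get(x, 0.0)


train_mse = mse(memorizer, train)  # exactly 0: every training point is stored
test_mse = mse(memorizer, test)    # large: unseen inputs all map to 0
print(train_mse, test_mse)
```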
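Example: Linear regression
● $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \mathbb{R}$
● Hypothesis set: $f_{w,b}(x) = w \cdot x + b$ with $w \in \mathbb{R}^d$, $b \in \mathbb{R}$
● Find $w, b$ that minimize $L(w, b) = \frac{1}{N} \sum_{i=1}^{N} (w \cdot x_i + b - y_i)^2$
● Estimate: $\hat{f}(x) = \hat{w} \cdot x + \hat{b}$ for the minimizing $\hat{w}, \hat{b}$
(Figure: line of best fit)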
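Example: Linear regression
● Recall: there is a closed-form formula for the optimal $w, b$ (least squares, normal equations)
● But also: the training error $L(w, b) = \frac{1}{N} \sum_{i=1}^{N} (w \cdot x_i + b - y_i)^2$ is a smooth function of $(w, b)$ → Loss function
● Furthermore $L$ is convex
– Has a unique global minimum (provided the inputs are not degenerate)
– which can be found by numerical optimization: gradient descent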
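For one-dimensional inputs the closed-form least-squares solution is short enough to write out. The sketch below assumes scalar x and uses made-up data points; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares line y = w*x + b for one-dimensional inputs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Solving the normal equations in 1D gives w = cov(x, y) / var(x).
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    w = cov_xy / var_x
    b = y_mean - w * x_mean
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]  # made-up points scattered around y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)
```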
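Example: Linear regression
● Gradient descent on the loss $L(\theta)$, with $\theta = (w, b)$
– $\theta_0$ arbitrary (random)
– $\eta > 0$ small (step size/learning rate)
– $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$
● “Always” finds global minimum of smooth, convex loss function
● But typically not for non-convex function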
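Gradient descent on the mean squared error of a line can be sketched as follows. The data, the step size 0.05 and the 2000 iterations are assumptions chosen so that this small example converges; they are not values from the slides.

```python
import random


def loss_and_grad(w, b, xs, ys):
    """Mean squared error of y ≈ w*x + b and its gradient in (w, b)."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = 2.0 * sum(residuals) / n
    return loss, grad_w, grad_b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # noiseless toy data on the line y = 2x + 1

rng = random.Random(0)
w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # arbitrary (random) start
eta = 0.05                                      # step size / learning rate

for step in range(2000):
    loss, grad_w, grad_b = loss_and_grad(w, b, xs, ys)
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b, loss)
```

With a step size that is too large for the curvature of the loss the iteration would diverge, which is why the slides ask for a small step size.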
General recipe of parametric ML
1. Define hypothesis set $\mathcal{H} = \{f_\theta\}$
2. Define smooth loss function $L(\theta)$
3. Numerically minimize $L$ to find estimate $\hat{f} = f_{\hat{\theta}}$
● Traditional ML: make sure $L$ is convex to have guarantees for numerical minimization
● Deep Learning/Neural Networks:
– $L$ is highly non-convex
– Somehow, still works. Gradient descent finds “good” minima.
Example: Linear regression
● Can fit data that is basically linear
● Can’t fit other relationships
● Solution: Make hypothesis set richer!
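Example: Polynomial regression
● $\mathcal{X} = \mathbb{R}$, $\mathcal{Y} = \mathbb{R}$
● Hypothesis set: polynomials of degree at most $D$, $f_a(x) = \sum_{k=0}^{D} a_k x^k$ with $a \in \mathbb{R}^{D+1}$
● Loss $L(a) = \frac{1}{N} \sum_{i=1}^{N} (f_a(x_i) - y_i)^2$ is smooth and convex.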
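As an illustration of a richer hypothesis set, the sketch below fits polynomials by solving the normal equations with a small hand-written Gaussian elimination. The helpers `solve`, `fit_polynomial` and `mse` and the toy data from y = 1 + 2x^2 are assumptions for the example; for real problems a numerical library would be the more robust choice.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients a_0..a_degree via the normal equations."""
    features = [[x ** k for k in range(degree + 1)] for x in xs]
    gram = [[sum(f[i] * f[j] for f in features) for j in range(degree + 1)]
            for i in range(degree + 1)]
    moment = [sum(f[i] * y for f, y in zip(features, ys))
              for i in range(degree + 1)]
    return solve(gram, moment)


def mse(coeffs, xs, ys):
    predictions = [sum(a * x ** k for k, a in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x * x for x in xs]  # toy data from y = 1 + 2x^2

line = fit_polynomial(xs, ys, degree=1)      # too poor: cannot bend
parabola = fit_polynomial(xs, ys, degree=2)  # rich enough for this data
print(mse(line, xs, ys), mse(parabola, xs, ys))
```

The degree-1 fit leaves a large training error on this data, while the degree-2 fit recovers the generating coefficients.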
Capacity, overfitting, underfitting
● Capacity: the “richness” of the hypothesis set
● Mathematical definitions exist (e.g. VC-dimension) but often used as intuitive notion
● Polynomial regression: low degree means less capacity, high degree means more capacity
● Too little capacity: can’t fit train data → underfitting
● Too much capacity: generalize badly → overfitting
Polynomial regression: underfitting/overfitting
(Credit: Francois Fleuret, EPFL)
Capacity, overfitting, underfitting
● Underfitting: train error large, test error large
● Overfitting: train error small, test error large
● Trade-off: must find appropriate level of capacity for data distribution
(Figure: train error and test error plotted against capacity, from less to more capacity. Underfitting at low capacity, overfitting at high capacity, best compromise in between.)
Capacity, overfitting, underfitting
● Traditional ML: low to moderate capacity
● Deep Learning: Enormous capacity.
– Millions of parameters (>> # training examples)
– Still don’t overfit. Why?
Model selection (Hyperparameter selection)
● If test set is used to evaluate performance of different hypothesis sets (different models):
– Test error is no longer an unbiased estimator of true error!
(Figures: three evaluation protocols, labeled “Good”, “Still good” and “Bad”. Credit: Francois Fleuret, EPFL)
Model selection (Hyperparameter selection)
● Solution: Further split train data into
– Training set (~80%)
– Validation set (~20%)
● Pick algorithm that evaluates the best on validation set
● Report performance on test set
(Figure: evaluation protocol labeled “Good”. Credit: Francois Fleuret, EPFL)
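A minimal sketch of this protocol, using k-nearest neighbours on one-dimensional toy data with the number of neighbours k as the hyperparameter. The data, the candidate values for k and the split sizes are assumptions for illustration.

```python
import random


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1D inputs)."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return int(2 * votes > k)


def error_rate(train, data, k):
    return sum(knn_predict(train, x, k) != y for x, y in data) / len(data)


# Toy data (hypothetical): label 1 iff x > 0.5, with 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

rng.shuffle(data)
test, rest = data[:60], data[60:]          # ~20% held out as test set
validation, train = rest[:48], rest[48:]   # ~20% of the rest for validation

# Model selection: compare hyperparameter values on the validation set only.
candidates = [1, 5, 15, 45]
validation_errors = {k: error_rate(train, validation, k) for k in candidates}
best_k = min(candidates, key=validation_errors.get)

# The test set is touched once, for the selected model.
test_error = error_rate(train, test, best_k)
print(best_k, validation_errors, test_error)
```

The validation error of `best_k` is optimistically biased because it was used for the selection; the test error is not, since the test set played no part in any choice.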
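Loss for categorical data
● True error for classification: $\mathbb{P}(\hat{f}(X) \neq Y)$
● Empirical training error $\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{f_\theta(x_i) \neq y_i\}$ of a parametrized estimate $f_\theta$ is not a smooth function of $\theta$! Can’t be used as loss for gradient based numerical optimization.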
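Loss for categorical data
● Solution: Formulate as really predicting the conditional distribution of $Y$ given $X = x$
● Specify a probability distribution on $\mathcal{Y} = \{1, \dots, K\}$ as a vector of probabilities
● $p = (p_1, \dots, p_K)$ with $p_k \ge 0$ and $\sum_{k=1}^{K} p_k = 1$
● True cond. distribution is $p(x)$ with $p(x)_k = \mathbb{P}(Y = k \mid X = x)$
● Seek estimate $\hat{p}(x)$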
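Loss for categorical data
● To quantify error made in prediction:
– Use relative entropy/Kullback-Leibler divergence as distance between prob. measures.
– Recall: $D_{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k}$
● Error made in prediction for fixed $x$: $D_{KL}(p(x) \,\|\, \hat{p}(x))$, with $p(x)$ the true conditional distribution and $\hat{p}(x)$ the estimate
● Unknown true error: $\mathbb{E}[D_{KL}(p(X) \,\|\, \hat{p}(X))]$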
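Loss for categorical data
● Unknown true error $\mathbb{E}[D_{KL}(p(X) \,\|\, \hat{p}(X))]$
● Unbiased estimator $\frac{1}{n} \sum_{i=1}^{n} D_{KL}(p(x_i) \,\|\, \hat{p}(x_i))$ for iid samples $(x_1, y_1), \dots, (x_n, y_n)$
– Here: when $Y$ is a deterministic function of $X$, $p(x_i)$ is the one-hot encoding of $y_i$, i.e. $p(x_i)_k = \mathbf{1}\{y_i = k\}$
● Concretely, estimator equals: $-\frac{1}{n} \sum_{i=1}^{n} \log \hat{p}(x_i)_{y_i}$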
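Loss for categorical data
● Concretely, estimator equals: $-\frac{1}{n} \sum_{i=1}^{n} \log \hat{p}(x_i)_{y_i}$, where $\hat{p}(x)_k$ is the predicted probability of class $k$
● The training loss function $L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)_{y_i}$ of a parametrized estimate $p_\theta$ is smooth! → Can use numerical optimization
● Remarks:
– MSE loss can be justified in terms of predicting a Gaussian dist.
– Not all loss functions derived in such a principled way.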
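The relation between the KL divergence with one-hot targets and the usual cross-entropy loss can be checked numerically. The predicted probability vectors below are made up, classes are indexed from 0 as is customary in code, and the helper names are assumptions for the sketch.

```python
import math


def one_hot(label, num_classes):
    """Probability vector that puts all mass on `label`."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def kl_divergence(p, q):
    """Relative entropy D_KL(p || q); terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def cross_entropy_loss(predictions, labels):
    """Average of -log(predicted probability of the correct class)."""
    return -sum(math.log(p[y]) for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical predicted distributions over 3 classes for two inputs.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
labels = [0, 2]

loss = cross_entropy_loss(predictions, labels)
# With one-hot targets the KL divergence reduces to the same expression.
kl_average = sum(kl_divergence(one_hot(y, 3), p)
                 for p, y in zip(predictions, labels)) / len(labels)
print(loss, kl_average)
```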
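Example: Logistic regression
● Hypothesis set: $p_{W,b}(x) = \mathrm{softmax}(Wx + b)$, i.e. $p_{W,b}(x)_k = \frac{\exp((Wx + b)_k)}{\sum_{j=1}^{K} \exp((Wx + b)_j)}$, with $W \in \mathbb{R}^{K \times d}$, $b \in \mathbb{R}^K$
● Loss $L(W, b) = -\frac{1}{N} \sum_{i=1}^{N} \log p_{W,b}(x_i)_{y_i}$ is convex in $W, b$! → Can find global minimum with gradient descent.
● To predict one class: output $\arg\max_k p_{W,b}(x)_k$
– Equivalently: output $k$ with largest $(Wx + b)_k$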
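Example: Logistic regression
● To predict one class: output the class $k$ with the largest predicted probability
– Equivalently: output $k$ with largest $(Wx + b)_k$
● Logistic regression can fit linearly separable data
(Figure: decision boundary. Linearly separable data: can fit. Data that is not linearly separable: can’t fit.)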
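A compact sketch of multi-class logistic regression trained by gradient descent on the cross-entropy loss, assuming the softmax parametrization with a weight matrix W and a bias b. The six toy points, the step size and the iteration count are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Turn real scores into probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    return softmax([sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_k
                    for row, b_k in zip(W, b)])


def loss(W, b, xs, ys):
    return -sum(math.log(predict_proba(W, b, x)[y])
                for x, y in zip(xs, ys)) / len(xs)


def gradient_step(W, b, xs, ys, eta):
    """One gradient descent step on the average cross-entropy loss."""
    n, num_classes, dim = len(xs), len(W), len(W[0])
    grad_W = [[0.0] * dim for _ in range(num_classes)]
    grad_b = [0.0] * num_classes
    for x, y in zip(xs, ys):
        p = predict_proba(W, b, x)
        for k in range(num_classes):
            delta = p[k] - (1.0 if k == y else 0.0)  # p - one_hot(y)
            grad_b[k] += delta / n
            for j in range(dim):
                grad_W[k][j] += delta * x[j] / n
    for k in range(num_classes):
        b[k] -= eta * grad_b[k]
        for j in range(dim):
            W[k][j] -= eta * grad_W[k][j]


# Linearly separable toy data in the plane (made up for illustration).
xs = [(-1.0, -1.0), (-2.0, -1.0), (-1.0, -2.0),
      (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
ys = [0, 0, 0, 1, 1, 1]

W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
initial_loss = loss(W, b, xs, ys)
for _ in range(500):
    gradient_step(W, b, xs, ys, eta=0.1)

predicted = [max(range(2), key=lambda k: predict_proba(W, b, x)[k]) for x in xs]
print(initial_loss, loss(W, b, xs, ys), predicted)
```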
Example: Logistic regression
● If data not linearly separable, can look for a transformation $\phi$ that makes it more so
● Train on data $(\phi(x_1), y_1), \dots, (\phi(x_N), y_N)$
● $\phi(x)$ = a representation of $x$
● Traditional ML: Construct representation by hand
● Deep learning: Algorithm finds good representation during training
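A hand-constructed representation can be illustrated in one dimension, where "linearly separable" simply means that a single threshold splits the classes. The data and the choice of squaring as the transformation are assumptions for this toy example.

```python
def separable_by_threshold(values, labels):
    """True if a single threshold on `values` splits class 0 from class 1."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)


def phi(x):
    """Hand-made representation (an assumption for this toy data): x -> x^2."""
    return x * x


# Class 1 sits on both sides of class 0, so no threshold on x works.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, 0, 0, 0, 1, 1]

raw_ok = separable_by_threshold(xs, ys)
transformed_ok = separable_by_threshold([phi(x) for x in xs], ys)
print(raw_ok, transformed_ok)
```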
Example: Logistic regression
● MNIST: 60k handwritten digits, 28x28 grayscale pixels
– x: image of a handwritten digit
– y: digit label, e.g. 0, 8, 3, 7
● Logistic regression on MNIST: test error rate ~7%
Classification: overfitting/underfitting
(Figure: decision boundaries illustrating a good fit, overfitting and underfitting. Credit: Wikipedia)
Data encoding
● MNIST: 60k handwritten digits, 28x28 grayscale pixels
– x: image of a handwritten digit
– y: digit label, e.g. 0, 8, 3, 7 (one-hot)
● CIFAR100: 50k images of objects from 100 classes, 32x32 RGB pixels
– x: image of an object
– y: class label, e.g. train, sunflower, elephant, cow (one-hot)
● Europarl EN-DE: 1.7m sentence pairs (one-hot)
– Frau Präsidentin, können Sie mir sagen, warum sich dieses Parlament nicht... | Madam President, can you tell me why this Parliament does not...
– It is why we cannot say a clear yes. | Deswegen können wir nicht eindeutig ja sagen.
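One-hot encoding of labels takes only a few lines. In the sketch below the four-entry class list stands in for the 100 CIFAR100 classes, and flattening an image into a vector of 784 pixel values is an assumed input encoding, not one stated on the slide.

```python
def one_hot(index, num_classes):
    """Vector of length num_classes with a single 1 at position index."""
    vector = [0] * num_classes
    vector[index] = 1
    return vector


def flatten(image):
    """Turn a 2D grid of pixel values into one long input vector."""
    return [pixel for row in image for pixel in row]


# MNIST-style label: digit 3 out of 10 classes.
digit = one_hot(3, 10)

# CIFAR100-style label, using a tiny made-up class list instead of all 100.
classes = ["train", "sunflower", "elephant", "cow"]
label = one_hot(classes.index("elephant"), len(classes))

# A blank 28x28 grayscale image becomes a vector with 784 entries.
x = flatten([[0.0] * 28 for _ in range(28)])
print(digit, label, len(x))
```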
Feature engineering / Representation engineering
● Traditional ML: Use hand-engineered features as inputs to algorithms
● Deep Learning: Feed algorithm raw data (pixels, character level text, ...)
Standard data sets: Used as benchmarks
(Figure: Performance on MNIST. Credit: https://srconstantin.wordpress.com/)
(Figure: Performance on CIFAR10. Credit: Francois Fleuret, EPFL)
Collective overfitting of test set by ML community
● Recall
– Good: (figure)
– For heavily used dataset: (figure)
● Need new datasets to appear periodically
(Credit: Francois Fleuret, EPFL)
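Bias-variance tradeoff
● Related to underfitting/overfitting
● Fix one $x$
● Fit $\hat{f}(x)$ is a random variable (depends on train data)
● Decompose true MSE error at $x$, where $Y = f(x) + \varepsilon$ with noise $\varepsilon$ of mean zero and variance $\sigma^2$:
$\mathbb{E}[(\hat{f}(x) - Y)^2] = \underbrace{\mathrm{Var}(\hat{f}(x))}_{\text{Variance}} + \underbrace{(\mathbb{E}[\hat{f}(x)] - f(x))^2}_{\text{Bias}} + \underbrace{\sigma^2}_{\text{Irreducible error}}$
● Small variance, high bias: underfit
● Large variance, low bias: overfit
● Small bias, small variance is hard → Tradeoff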
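The decomposition of the true MSE at a fixed $x$ follows from a short calculation. The sketch below assumes an additive noise model, $Y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$ and $\varepsilon$ independent of the training data; this notation is an assumption and is not taken from the slide.

```latex
% Assumption: Y = f(x) + \varepsilon, E[\varepsilon] = 0, Var(\varepsilon) = \sigma^2,
% and \varepsilon is independent of the training data (hence of \hat{f}(x)).
\begin{align*}
\mathbb{E}\big[(\hat{f}(x) - Y)^2\big]
 &= \mathbb{E}\Big[\big((\hat{f}(x) - \mathbb{E}\hat{f}(x))
      + (\mathbb{E}\hat{f}(x) - f(x)) - \varepsilon\big)^2\Big] \\
 &= \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}\hat{f}(x))^2\big]
      + \big(\mathbb{E}\hat{f}(x) - f(x)\big)^2
      + \mathbb{E}[\varepsilon^2] \\
 &= \mathrm{Var}(\hat{f}(x))
      + \big(\mathbb{E}\hat{f}(x) - f(x)\big)^2
      + \sigma^2 .
\end{align*}
```

The three cross terms vanish: $\hat{f}(x) - \mathbb{E}\hat{f}(x)$ has mean zero, $\mathbb{E}\hat{f}(x) - f(x)$ is a constant, and $\varepsilon$ has mean zero and is independent of $\hat{f}(x)$.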
Bias-variance tradeoff
(Credit: Francois Fleuret, EPFL)
Bias-variance tradeoff
(Figure: variance, bias and test error plotted against capacity, from less capacity (underfitting) to more capacity (overfitting).)
Test error = variance + bias (+ irreducible error)
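Maximum likelihood interpretation
● Example: Logistic regression
● Model: $p_\theta(x)_k$ is the predicted probability that $Y = k$ given $X = x$
● True cond. distribution is $p(x)$ with $p(x)_k = \mathbb{P}(Y = k \mid X = x)$
● Seek estimate $p_{\hat{\theta}}(x)$
● Likelihood of train y given train x: $\prod_{i=1}^{N} p_\theta(x_i)_{y_i}$
● Loss is neg. log-likelihood (up to the factor $1/N$): $L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)_{y_i}$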
Bayesian interpretation
● Consider model parameters as random with a prior distribution
● Bayes’ rule gives posterior distribution on parameters conditioned on data
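Written out, Bayes' rule for the parameters reads as follows. The symbols are assumed notation: $\theta$ for the model parameters, $D$ for the observed data set, $p(\theta)$ for the prior density and $p(D \mid \theta)$ for the likelihood.

```latex
% \theta: model parameters, D: observed data set, p(\theta): prior density (assumed notation).
\[
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta')\, p(\theta')\, d\theta' .
\]
```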
Deep Learning
● Parametric ML with hypothesis class:
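As a rough sketch, assuming the hypothesis class meant here is the standard feed-forward network shown on the title slide, a member of that class alternates affine maps with a nonlinearity. The ReLU nonlinearity, the layer sizes and the parameter values below are assumptions for illustration only.

```python
def relu(vector):
    """Elementwise max(0, v), a common choice of nonlinearity."""
    return [max(0.0, v) for v in vector]


def affine(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def forward(layers, x):
    """Apply affine maps with ReLU in between; the last layer stays linear."""
    for W, b in layers[:-1]:
        x = relu(affine(W, b, x))
    W, b = layers[-1]
    return affine(W, b, x)


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
output = forward(layers, [1.0, -2.0])
print(output)
```

The parameters of this hypothesis class are all the weight matrices and bias vectors in `layers`; training would adjust them by gradient descent on a loss, as in the general recipe of parametric ML.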
General references
● Goodfellow, Bengio, Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org
● EPFL course slides and videos
– Prof. Francois Fleuret
– https://documents.epfl.ch/users/f/fl/fleuret/www/dlc/
Organisation
Meeting #   Speaker          Topic
Today       David Belius     Intro to Machine Learning
Next week   Marko Thiel      Intro to Artificial Neural Networks
March 8     Master Student   Regularization (Bengio Ch 7)
March 15    Master Student   Optimization (Bengio Ch 8)
March 22    Master Student   Convolutional neural networks (Bengio Ch 9)
April 12    Master Student   Recurrent Neural Networks (Bengio Ch 10)
April 26+   PhD Students     Advanced topics
● First four student talks
– Master student speakers: e-mail me any preferences
– Set up preliminary meeting with Marko and me
● Optional practical sessions (programming)
– E-mail me if interested
● No meeting April 19