deep learning - theory and practiceleap.ee.iisc.ac.in/.../dl20/slides/lecture_7_12_03_2020.pdf ·...

24
Deep Learning - Theory and Practice 12-03-2020 Deep Neural Networks http://leap.ee.iisc.ac.in/sriram/teaching/DL20/ [email protected]

Upload: others

Post on 23-Dec-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Deep Learning - Theory and Practice

12-03-2020Deep Neural Networks

http://leap.ee.iisc.ac.in/sriram/teaching/DL20/

[email protected]

Page 2: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Logistic Regression

Bishop - PRML book (Chap 3)

❖ 2- class logistic regression

❖ K-class logistic regression

❖ Maximum likelihood solution

❖ Maximum likelihood solution

Page 3: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Typical Error Surfaces

Typical Error Surface as a function of parameters

(weights and biases)

Page 4: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Learning with Gradient Descent

Error surface close to a local

Page 5: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Learning Using Gradient Descent

Page 6: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Parameter Learning

• Solving a non-convex optimization.

• Iterative solution. • Depends on the initialization. • Convergence to a local

optima. • Judicious choice of learning

rate

Page 7: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Least Squares versus Logistic Regression

Bishop - PRML book (Chap 4)

Page 8: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Least Squares versus Logistic Regression

Bishop - PRML book (Chap 4)

Page 9: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Neural Networks

Page 10: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Perceptron Algorithm

What if the data is not linearly separable

Perceptron Model [McCulloch, 1943, Rosenblatt, 1957]

Targets are binary classes [-1,1]

Similar to the logistic

regression

Page 11: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Multi-layer PerceptronMulti-layer Perceptron [Hopfield, 1982]

thresholding function

non-linear function (tanh,sigmoid)

Page 12: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,
Page 13: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,
Page 14: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,
Page 15: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,
Page 16: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,
Page 17: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Neural Networks

Multi-layer Perceptron [Hopfield, 1982]

thresholding function

non-linear function (tanh,sigmoid)

• Useful for classifying non-linear data boundaries - non-linear class separation can be realized given enough data.

Page 18: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Neural Networks

tanh sigmoid ReLu

Cost-Function

are the desired outputs

Mean Square Error Cross Entropy

Types of Non-linearities

Page 19: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Learning Posterior Probabilities with NNsChoice of target function

• Softmax function for classification

• Softmax produces positive values that sum to 1 • Allows the interpretation of outputs as posterior

probabilities

Page 20: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Need For Deep NetworksModeling complex real world data like speech, image, text

• Single hidden layer networks are too restrictive.

• Needs large number of units in the hidden layer and

trained with large amounts of data.

• Not generalizable enough.

Networks with multiple hidden layers - deep networks

(Open questions till 2005)

• Are these networks trainable ?

• How can we initialize such networks ?

• Will these generalize well or over train ?

Page 21: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Deep Networks IntuitionNeural networks with multiple hidden layers - Deep networks [Hinton, 2006]

Page 22: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Neural networks with multiple hidden layers - Deep networks

Deep Networks Intuition

Page 23: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Neural networks with multiple hidden layers - Deep networks

Deep networks perform hierarchical data abstractions which enable the non-linear separation of complex data samples.

Deep Networks Intuition

Page 24: Deep Learning - Theory and Practiceleap.ee.iisc.ac.in/.../DL20/slides/lecture_7_12_03_2020.pdf · 2020. 4. 2. · Need For Deep Networks Modeling complex real world data like speech,

Deep Networks

- Are these networks trainable ?

• Advances in computation and processing

• Graphical processing units (GPUs) performing multiple

parallel multiply accumulate operations.

• Large amounts of supervised data sets