

Introduction to Machine Learning
Neural Networks

Bhaskar Mukhoty, Shivam Bansal

Indian Institute of Technology Kanpur
Summer School 2019

June 4, 2019


Lecture Outline

Neural Networks

Backpropagation Algorithm

Convolution NN

Recurrent NN


Recap

Linear models: Learn a linear hypothesis function h in the input/attribute space X.

Kernelized models: Map inputs φ(x) from attribute space X to feature space F and learn a linear hypothesis function h in the feature space.


Neural Networks

A neural network consists of an input layer, an output layer, and one or more hidden layers.

Each node in a hidden layer computes a nonlinear transform of the inputs it receives.


Neural network with single hidden layer

Each input xn is transformed into several "pre-activations" using linear models,

a_nk = w_k^T x_n = Σ_{d=1}^{D} w_dk x_nd

A non-linear activation is applied to each pre-activation,

h_nk = g(a_nk)


Neural network with single hidden layer

A linear model is applied on the new features h_n,

s_n = v^T h_n = Σ_{k=1}^{K} v_k h_nk

Finally, the output is produced as y_n = o(s_n).

The overall effect is a non-linear mapping from inputs to outputs.
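The two-slide computation above can be sketched in a few lines of NumPy (a minimal illustration; the array shapes and function names are our own):

```python
import numpy as np

def forward(x, W, v, g=np.tanh, o=lambda s: s):
    """Single-hidden-layer network: x -> pre-activations -> h -> output.

    x: (D,) input, W: (D, K) with columns w_k, v: (K,) output weights,
    g: hidden activation, o: output activation (identity for regression).
    """
    a = W.T @ x        # a_k = w_k^T x   (pre-activations)
    h = g(a)           # h_k = g(a_k)    (hidden features)
    s = v @ h          # linear model on the learned features
    return o(s)        # y = o(s)

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # D = 3 input attributes
W = rng.normal(size=(3, 2))       # K = 2 hidden units
v = rng.normal(size=2)
y = forward(x, W, v)
```

With g set to the identity, the whole network collapses to a single linear model, which is why the nonlinearity is essential.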


Neural Network


Fully-connected Feedforward Neural Network

Fully-connected: All pairs of nodes between adjacent layers are connected to each other.

Feedforward: No backward connections. Also, only adjacent-layer nodes are connected.


Neural networks are feature learners

A NN tries to learn features that can predict the output well.


Neural Networks as Feature Learners

Figure: [Zeiler and Fergus, 2014]


Learning Neural Networks via Backpropagation

Backpropagation is gradient descent using the chain rule of derivatives.

Chain rule of derivatives: for example, if y = f1(x) and x = f2(z), then ∂y/∂z = (∂y/∂x)(∂x/∂z).


Learning Neural Networks via Backpropagation

Backpropagation iterates between a forward pass and a backward pass.

Forward pass computes the errors using the current parameters.

Backward pass computes the gradients and updates the parameters, starting from the parameters at the top layer and then moving backwards.

Using backpropagation in neural nets enables us to reuse previous computations efficiently.
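For the single-hidden-layer network from the earlier slides, one forward/backward iteration might look as follows (a sketch under our own choices: squared-error loss, tanh hidden units, linear output):

```python
import numpy as np

def backprop_step(x, y, W, v, lr=0.05):
    """One gradient-descent step via backpropagation on L = (yhat - y)^2 / 2."""
    # Forward pass: compute activations and the error with current parameters.
    a = W.T @ x                 # pre-activations
    h = np.tanh(a)              # hidden features
    yhat = v @ h                # network output
    err = yhat - y              # dL/dyhat
    # Backward pass: chain rule, starting from the top layer.
    grad_v = err * h            # dL/dv   (reuses h from the forward pass)
    dh = err * v                # dL/dh
    da = dh * (1.0 - h**2)      # tanh'(a) = 1 - tanh(a)^2
    grad_W = np.outer(x, da)    # dL/dW
    return W - lr * grad_W, v - lr * grad_v

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), 1.0
W, v = rng.normal(size=(3, 4)), rng.normal(size=4)
for _ in range(1000):           # fit a single training point
    W, v = backprop_step(x, y, W, v)
```

Note how h, computed once in the forward pass, is reused by both gradients: this reuse is what makes backpropagation efficient.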


Activation Functions

Sigmoid: h = σ(a) = 1 / (1 + exp(−a))

tanh (hyperbolic tangent): h = (exp(a) − exp(−a)) / (exp(a) + exp(−a)) = 2σ(2a) − 1

ReLU (Rectified Linear Unit): h = max(0, a)

Leaky ReLU: h = max(βa, a), where β is a small positive number
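The four activations above, written directly in NumPy:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # Equivalent to np.tanh, via the identity on the slide.
    return 2.0 * sigmoid(2.0 * a) - 1.0

def relu(a):
    return np.maximum(0.0, a)

def leaky_relu(a, beta=0.01):
    # For a < 0 the output is beta * a instead of 0.
    return np.maximum(beta * a, a)
```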


Activation Functions

Sigmoid and tanh can have issues with saturating gradients.

If the weights are too large, the gradient for the weights is close to zero and learning becomes slow or may stop.

Pic credit: Andrej Karpathy

Activation Functions

The ReLU activation function can suffer from the "dead ReLU" problem.

If the weights are initialized such that the output of a node is 0, the gradient for its weights is zero and the node never fires.

Pic credit: Andrej Karpathy

Preventing overfitting in Neural Networks

Weight decay: l1 or l2 regularization on the weights.

Early stopping: Stop when validation error starts increasing.

Dropout: Randomly remove units (with some probability p ∈ (0, 1)) during training.
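Dropout is the easiest of these to get subtly wrong. Below is a sketch of the common "inverted dropout" variant; the rescaling by 1/(1−p) is an implementation choice, not something the slide fixes:

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each unit with probability p during training; rescale survivors
    by 1/(1-p) so the expected activation matches test time."""
    if not train:
        return h                       # no-op at test time
    keep = rng.random(h.shape) >= p    # Bernoulli keep-mask
    return h * keep / (1.0 - p)
```

Because the survivors are rescaled during training, no correction is needed at test time.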


Convolutional Neural Network

CNNs are feedforward neural networks.

Weights are shared among the connections.

The set of distinct weights defines a filter or local feature detector.


Convolution

An operation that captures spatially local patterns in the input.

Usually several filters {W^k}_{k=1}^K are applied, each producing a separate feature map.

These filters are learned using backpropagation.
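A direct (slow but explicit) implementation of the operation, for a single 2-D input and one filter; like most deep-learning libraries, it actually computes cross-correlation (no filter flip):

```python
import numpy as np

def conv2d(x, w):
    """'Valid' convolution: dot product of the filter with every local patch."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

# A vertical-gradient image and a difference filter: the filter acts as a
# local feature detector for intensity change along the rows.
x = np.outer(np.arange(5.0), np.ones(5))
edges = conv2d(x, np.array([[-1.0], [1.0]]))
```

Applying K such filters to the same input gives the K feature maps mentioned above.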


Pooling

An operation that reduces the dimension of the input.

The pooling operation is fixed beforehand, not learned.

Popular pooling approaches: Max-pooling, average pooling
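Both popular variants can be written as a single reshape (a sketch for non-overlapping windows on one 2-D feature map):

```python
import numpy as np

def pool2d(x, size=2, op=np.max):
    """Apply `op` (np.max or np.mean) over each non-overlapping size x size block."""
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]   # drop ragged border rows/columns
    blocks = x.reshape(H // size, size, W // size, size)
    return op(blocks, axis=(1, 3))
```

A 2x2 pooling halves each spatial dimension, which is how pooling reduces the input dimension.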


Convolutional Neural Network


Modeling sequential data

Examples of sequential data: videos, text, speech.

FFNN on a single observation x_n

FFNN on sequential data x_1, ..., x_T

For sequential data, we want dependencies between the h_t's of different observations.


Recurrent Neural Networks

A neural network for sequential data.

Each hidden state h_t = f(W x_t + U h_{t−1}), where U is a K × K matrix and f is some activation function.
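The recurrence is one line of NumPy applied repeatedly; note that the same W and U are shared across all time steps (the shapes here are our own example):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, f=np.tanh):
    """One recurrence: h_t = f(W x_t + U h_{t-1})."""
    return f(W @ x_t + U @ h_prev)

def rnn_forward(xs, W, U, h0):
    """Run the recurrence over a sequence, collecting every hidden state."""
    h, states = h0, []
    for x_t in xs:                 # same parameters reused at every step
        h = rnn_step(x_t, h, W, U)
        states.append(h)
    return states

rng = np.random.default_rng(2)
W, U = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))   # K = 4, inputs in R^3
xs = [rng.normal(size=3) for _ in range(5)]               # sequence of length 5
states = rnn_forward(xs, W, U, h0=np.zeros(4))
```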


Different types of RNN

Both input and output can be sequences of different lengths.


Backpropagation through time

Think of the time dimension as another hidden layer; then it is just like standard backpropagation for feedforward neural nets.


RNN Limitation

Vanishing or exploding gradients: Repeated multiplication can cause gradients to vanish or explode.

Weak long-term dependency: Repeated composition of functions causes the sensitivity of the hidden states to a given part of the input to become weaker as we move along the sequence.
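A tiny numerical illustration (our own construction) of the first point: backpropagation through T time steps multiplies the gradient by roughly a power of U, so a U with spectral radius below 1 shrinks it geometrically, while one above 1 blows it up.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 8
U = 0.3 * rng.normal(size=(K, K)) / np.sqrt(K)   # spectral radius well below 1
g = rng.normal(size=K)                           # gradient arriving at step T
for _ in range(50):                              # 50 steps of BPTT
    g = U.T @ g                                  # repeated multiplication
# The norm of g is now vanishingly small; scaling U up instead makes it explode.
```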


Long Short-Term Memory

An RNN with hidden nodes having gates to remember or forget information.

An open gate is denoted by 'o' and a closed gate by '-'.

Minor variations of the LSTM exist depending on the gates used, e.g., the GRU.


Gated Recurrent Unit (Simplified)

An RNN computes hidden states as

h_t = tanh(W x_t + U h_{t−1})

For the RNN, the state update is multiplicative (weak memory and gradient issues).

A GRU computes hidden states as

h̃_t = tanh(W x_t + U h_{t−1})

Γ_u = σ(P x_t + Q h_{t−1})

h_t = Γ_u × h̃_t + (1 − Γ_u) × h_{t−1}

For the GRU, the state update is additive.
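The three GRU equations above translate directly to code, with the gate applied elementwise (the parameter shapes are our own example):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U, P, Q):
    """Simplified GRU update with a single (update) gate, as on the slide."""
    h_tilde = np.tanh(W @ x_t + U @ h_prev)   # candidate new state
    gamma_u = sigmoid(P @ x_t + Q @ h_prev)   # update gate, elementwise in (0, 1)
    return gamma_u * h_tilde + (1.0 - gamma_u) * h_prev

rng = np.random.default_rng(4)
K, D = 4, 3
W, U = rng.normal(size=(K, D)), rng.normal(size=(K, K))
P, Q = rng.normal(size=(K, D)), rng.normal(size=(K, K))
x_t = rng.normal(size=D)
h = gru_step(x_t, np.zeros(K), W, U, P, Q)
```

When the gate saturates near 0, h_t ≈ h_{t−1} and information is carried forward unchanged; this additive shortcut is what eases the memory and gradient issues of the plain RNN.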


Questions?


References I

Andrew Ng (2019). Sequence models. https://www.coursera.org/learn/nlp-sequence-models.

Carter, S. (2019). Visualize feed-forward neural network. https://playground.tensorflow.org/.

Kar, P. (2017). Introduction to machine learning. https://web.cse.iitk.ac.in/users/purushot/courses/ml/2017-18-a.


References II

Rai, P. (2018). Introduction to machine learning. https://www.cse.iitk.ac.in/users/piyush/courses/ml_autumn18/index.html.

Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press.

Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer.
