Post on 30-Sep-2020


Page 1: CS456: Machine Learning · Outlines A brief history of neural network Multilayer neural network Backpropagation Deep neural network Convolutional neural networks (CNNs) Long-Short


CS456: Machine LearningArtificial Neural Networks & Deep learning

Jakramate Bootkrajang

Department of Computer ScienceChiang Mai University

Jakramate Bootkrajang CS456: Machine Learning 1 / 86


Objectives

To understand the philosophy behind neural network classifiers

To understand how to train neural network models

To understand the effect of important neural network parameters

To understand what deep neural networks are


Outlines

A brief history of neural network

Multilayer neural network

Backpropagation

Deep neural network

▶ Convolutional neural networks (CNNs)

▶ Long Short-Term Memory (LSTM)


Artificial Neural Networks


Human brain


A (biological) neuron


1940s: Artificial neuron

Warren McCulloch (neuroscientist) and Walter Pitts (logician) modelled a logic unit on theories of how a neuron works

Figure: https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1


1950s: Perceptron

F. Rosenblatt’s perceptron with learnable weights

Figure: https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53


1950s: Perceptron

Rosenblatt’s perceptron acts as a linear function of the inputs, $w^T x$


1970s: XOR problem

M. Minsky showed that the perceptron is not suitable for non-linear problems

Figure: https://www.rawanyat.com/data-science/2018/1/25/classification-via-perceptrons
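The limitation can be illustrated with a minimal sketch. It assumes the classic perceptron learning rule with a bias term and a step threshold (neither is spelled out on the slide), and the function names are hypothetical. A single perceptron learns the linearly separable AND function but cannot fit XOR.

```python
def predict(w, b, x):
    """Perceptron output: threshold the linear function w^T x + b."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0


def train_perceptron(data, epochs=100, lr=1.0):
    """Perceptron rule (assumed here): adjust weights only on mistakes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def accuracy(data, w, b):
    """Fraction of points classified correctly."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

acc_and = accuracy(AND, *train_perceptron(AND))
acc_xor = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

No straight line separates the two XOR classes, so at least one of the four points is always misclassified, however long training runs.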


1980s: AI winter

No significant developments


1990s: Multilayer perceptron

D. Rumelhart and G. Hinton put forward the multilayer perceptron with a non-linear differentiable activation function, together with the backpropagation algorithm to train the model

Figure: Shao, Changpeng. “A Quantum Model for Multilayer Perceptron.” (2018).


2010s: Deep learning

G. Hinton and Y. LeCun increased the depth of the MLP and proposed ways to deal with learning millions of parameters (weight sharing, pretraining)

Figure: https://mc.ai/deep-learning-overview-of-neurons-and-activation-functions/


The timeline


Neural network

A bio-inspired (directed) weighted graph whose vertices (nodes) are organised into sets, where any two sets of vertices may form a bipartite graph.

A set of vertices is referred to as a layer

Typically, there are three types of layer:

▶ input layer

▶ hidden layer

▶ output layer

The edges represent the weights


Neural network layers

Note: a perceptron is a neural network with no hidden layer


Input layer

An interface where data enters the network

Usually, the size of the layer (number of nodes) equals the dimension of the data


Hidden layer

A hidden layer adds capacity to the network

The more hidden layers, the higher the capacity

▶ able to represent complicated non-linear functions

The number of hidden layers is referred to as the depth of the network

▶ while the number of neurons in one hidden layer is referred to as the width of that layer


Output layer

The layer where the prediction is made

The number of output nodes depends on the dimension of y, the target

▶ regression: (usually) 1 output node

▶ binary classification: (usually) 1 output node

▶ multiclass classification: k output nodes (k is the number of classes)


Node and its activation function

A node in a neural network

▶ receives inputs,

▶ aggregates them (often using a simple weighted sum),

▶ passes the result through the node’s activation function,

▶ sends the output as an input to nodes in the next layer
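The steps above can be sketched for a single node. This is a minimal illustration that assumes a sigmoid activation and an optional bias term; the weights and the name `node_output` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Aggregate the inputs with a weighted sum, then apply the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


# Hypothetical weights; the inputs reuse the example point [0.5, 0.1].
out = node_output([0.5, 0.1], [0.4, -0.2])
print(out)
```

The returned value is what the node would pass on to every node it is connected to in the next layer.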


Activation function

Mimics the behaviour of synapses in the human brain

Sends out information only if the input is strong enough


Various Activation functions

Figure: V. Sze et al., “Efficient Processing of Deep Neural Networks: A Tutorial and Survey”
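Since the figure itself is not reproduced in this transcript, here is a short sketch of a few commonly used activation functions. Which functions the original figure shows is an assumption, and the leaky ReLU slope `alpha` is a hypothetical default.

```python
import math


def sigmoid(z):
    """Maps z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Maps z to (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive inputs unchanged and blocks negative ones."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.1):
    """Like ReLU, but lets a small fraction of negative inputs through."""
    return z if z > 0 else alpha * z


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```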


Types of NNs

Fully connected NN

Recurrent NN

Lateral NN

Partially connected NN


Fully connected neural network

Every node in the previous layer is connected to all nodes in the next layer


Recurrent neural network

A node from the next layer may be connected to nodes in the previous layer


Lateral networks

There exist connections between nodes in the same layer


Partially connected NN

A subset of all possible connections of a fully connected network. More similar to the human brain.


How to train NNs?

Supervised learning (current focus)

Unsupervised learning

▶ e.g., for training a self-organising map or an autoencoder

Reinforcement learning

▶ e.g., for training a Deep Q-Network (DQN)


Feedforward and backpropagation (supervised learning)

Data flows from left to right to generate the output (feedforward)

The generated output is compared with the desired target so that we can measure the error (this is a form of supervised learning)

The error is propagated back in reverse order, from right to left, and is used to update the weights along the way


(Un)supervised learning

Figure: https://mc.ai/auto-encoder-in-biology/

Goal: to find the compact representation of inputs

Usually trained using backpropagation with the input itself as target


Reinforcement learning

https://java.works-hub.com/learn/using-deep-q-learning-in-fifa-18-to-perfect-the-art-of-free-kicks-9bf7f


Backpropagation


Let’s do a simple forward run

Given (x, y) = ([0.5, 0.1], 1),

Figure: network diagram with inputs x1 and x2, a hidden layer, and output y (layers labelled Input, Hidden, Output)
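A forward run on the given point can be sketched as follows. The weights from the original diagram are not recoverable from this transcript, so the values below are hypothetical, as are the choice of two hidden nodes, zero biases and sigmoid activations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Compute one layer's outputs; weights[j][i] links input i to node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x, y = [0.5, 0.1], 1

# Hypothetical parameters (not taken from the slides).
W_hidden = [[0.1, 0.2], [-0.3, 0.4]]
b_hidden = [0.0, 0.0]
W_out = [[0.5, -0.5]]
b_out = [0.0]

hidden = layer(x, W_hidden, b_hidden)     # input layer -> hidden layer
y_hat = layer(hidden, W_out, b_out)[0]    # hidden layer -> output node
error = 0.5 * (y - y_hat) ** 2            # squared error against the target

print(hidden, y_hat, error)
```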


Measuring the error

For regression: squared error

▶ $L = \sum_{i=1}^{N} \frac{1}{2} (y_i - \hat{y}_i)^2$

For binary classification: binary cross-entropy (negative log-likelihood)

▶ $L = -\sum_{i=1}^{N} \big( \delta(y_i = 0) \log(P_0) + \delta(y_i = 1) \log(P_1) \big)$ for $y \in \{0, 1\}$

▶ $P_1$ is represented by the sigmoid function $\frac{1}{1 + e^{-w^T x}}$

For multi-class classification: cross-entropy

▶ $L = -\sum_{i=1}^{N} \sum_{k=1}^{K} \delta(y_i = k) \log(P_k)$ for $y \in \{1, \dots, K\}$, with $P_k = \frac{e^{-w_k^T x}}{\sum_{k'} e^{-w_{k'}^T x}}$

▶ $P_k$ is represented by the softmax function

These are called loss functions, denoted $L$
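The three losses can be written down directly. This is a sketch under a few stated assumptions: the function names are hypothetical, class labels are 0-based indices (the slide uses 1..K), and the softmax takes generic scores as its exponents, with the maximum subtracted for numerical stability.

```python
import math


def squared_error(targets, preds):
    """Sum over examples of 1/2 (y - y_hat)^2."""
    return sum(0.5 * (y - p) ** 2 for y, p in zip(targets, preds))


def binary_cross_entropy(targets, p1s):
    """Negative log-likelihood; p1 is the predicted probability of class 1."""
    return -sum(
        math.log(p1) if y == 1 else math.log(1.0 - p1)
        for y, p1 in zip(targets, p1s)
    )


def softmax(scores):
    """Turn a list of scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, prob_rows):
    """Negative log of the probability assigned to each true class."""
    return -sum(math.log(probs[y]) for y, probs in zip(targets, prob_rows))


print(squared_error([1.0], [0.5]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(cross_entropy([2], [softmax([1.0, 2.0, 3.0])]))
```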


Minimising the error

Minimising the error can be done by minimising the loss function

Standard gradient descent method can be employed

The optimisation plan

1 Decide the loss function for our problem

2 Find the derivative of the loss w.r.t. $w_{ij}$ for all i, j

3 Update $w_{ij}^{new} = w_{ij}^{old} - \eta \nabla_{w_{ij}} L$
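The plan can be sketched on a toy problem with a single weight. The loss $L(w) = (w - 3)^2$ and the learning rate are hypothetical choices made only to show the update rule in action.

```python
def gradient_descent(grad, w, eta=0.1, steps=100):
    """Repeatedly apply w_new = w_old - eta * dL/dw."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


# Toy loss L(w) = (w - 3)^2, whose derivative is 2 (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

Each step shrinks the distance to the minimiser at 3 by a constant factor here, so the weight settles very close to it.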


Problem with the plan

It is quite inefficient (the same gradient terms are calculated repeatedly)

Therefore, it does not scale to deep networks

Backpropagation solves this inefficiency by

▶ computing the derivative of the error w.r.t. $w_{ij}$ in terms of the derivatives already computed for the next layer

▶ instead of working out the derivative for each $w_{ij}$ from the final error from scratch


Chain rule

Useful for finding derivative of a composite function

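$$\frac{\partial f(g(h(x)))}{\partial x} = \frac{\partial f(g(h(x)))}{\partial g(h(x))} \frac{\partial g(h(x))}{\partial x} = \frac{\partial f(g(h(x)))}{\partial g(h(x))} \frac{\partial g(h(x))}{\partial h(x)} \frac{\partial h(x)}{\partial x} \quad (1)$$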

Note: a neural network can be viewed as a big composite function
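The chain rule can be checked numerically. The functions f, g and h below are hypothetical examples chosen for this sketch; the product of the three local derivatives is compared with a finite-difference estimate of the derivative of the composite.

```python
import math


def h(x):
    return 3.0 * x


def g(v):
    return math.sin(v)


def f(u):
    return u ** 2


def composite(x):
    return f(g(h(x)))


def chain_rule_derivative(x):
    """df/dg * dg/dh * dh/dx, each factor evaluated at its own input."""
    df_dg = 2.0 * g(h(x))
    dg_dh = math.cos(h(x))
    dh_dx = 3.0
    return df_dg * dg_dh * dh_dx


def numeric_derivative(func, x, eps=1e-6):
    """Central finite-difference estimate of d func / dx."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)


x0 = 0.4
analytic = chain_rule_derivative(x0)
numeric = numeric_derivative(composite, x0)
print(analytic, numeric)
```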


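Let’s find the update rules (1/3)

Assume $L = \sum_{i=1}^{N} \frac{1}{2} (y_i - \hat{y}_i)^2$. For node j, $o_j = \sigma(net_j) = \sigma\big(\sum_{k} w_{kj} o_k\big)$, where k runs over the nodes feeding into node j.

Figure: network diagram with inputs x1 and x2 and output y

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial o_j} \frac{\partial o_j}{\partial w_{ij}} = \frac{\partial L}{\partial o_j} \frac{\partial o_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}} \quad (2)$$

$$\frac{\partial net_j}{\partial w_{ij}} = \frac{\partial \sum_{k} w_{kj} o_k}{\partial w_{ij}} = o_i \quad (3)$$

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = \sigma(net_j)(1 - \sigma(net_j)) \quad (4)$$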


Let’s find the update rules (2/3)

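Two cases for $\frac{\partial L}{\partial o_j}$

1. node j is the output node, so we know that $\hat{y}_j = o_j$

$$\frac{\partial L}{\partial o_j} = \frac{\partial \sum_{i=1}^{N} \frac{1}{2} (y_i - \hat{y}_i)^2}{\partial \hat{y}_j} = -(y_j - \hat{y}_j) \quad (5)$$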


Let’s find the update rules (3/3)

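Figure: network diagram with inputs x1 and x2 and output y

2. node j is an intermediate node, so $o_j$ acts as input to nodes in the next layer

[observation] $o_j$ affects $L$ via $net_k$ for all k in the next layer

$$\frac{\partial L}{\partial o_j} = \sum_{k} \left( \frac{\partial L}{\partial net_k} \frac{\partial net_k}{\partial o_j} \right) = \sum_{k} \left( \frac{\partial L}{\partial o_k} \frac{\partial o_k}{\partial net_k} \frac{\partial net_k}{\partial o_j} \right) = \sum_{k} \left( \frac{\partial L}{\partial o_k} \sigma(net_k)(1 - \sigma(net_k)) w_{jk} \right) \quad (6)$$

Recursion! The derivative w.r.t. $o_j$ depends on the derivative w.r.t. $o_k$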


Backpropagation algorithm

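Starting from the output layer

Calculate the weight gradients

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial o_j} \frac{\partial o_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}} = \delta_j o_i \quad (7)$$

if node j is an output node, $\delta_j = -(y - \hat{y}) \sigma(net_j)(1 - \sigma(net_j)) = -(y - \hat{y}) o_j (1 - o_j)$

if node j is an intermediate node, $\delta_j = \big(\sum_{l} w_{jl} \delta_l\big) o_j (1 - o_j)$

Update $w_{ij}^{new} = w_{ij}^{old} - \eta \frac{\partial L}{\partial w_{ij}} = w_{ij}^{old} - \eta \delta_j o_i$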
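The algorithm can be sketched for a tiny network and checked against a finite-difference gradient. This is a minimal illustration under stated assumptions: two inputs, two sigmoid hidden nodes, one sigmoid output, no bias terms, squared error on a single example, and hypothetical weight values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, W2):
    """W1[i][j]: input i -> hidden j. W2[j]: hidden j -> the single output."""
    hidden = [
        sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))))
        for j in range(len(W2))
    ]
    y_hat = sigmoid(sum(h * w for h, w in zip(hidden, W2)))
    return hidden, y_hat


def loss(x, y, W1, W2):
    return 0.5 * (y - forward(x, W1, W2)[1]) ** 2


def backprop(x, y, W1, W2):
    """Return dL/dW1 and dL/dW2 using the delta recursion."""
    hidden, y_hat = forward(x, W1, W2)
    # Output node: delta = -(y - y_hat) * o * (1 - o)
    delta_out = -(y - y_hat) * y_hat * (1.0 - y_hat)
    grad_W2 = [delta_out * h for h in hidden]
    # Hidden node j: delta_j = (w_j,out * delta_out) * o_j * (1 - o_j)
    delta_hidden = [
        W2[j] * delta_out * hidden[j] * (1.0 - hidden[j])
        for j in range(len(W2))
    ]
    grad_W1 = [
        [delta_hidden[j] * x[i] for j in range(len(W2))]
        for i in range(len(x))
    ]
    return grad_W1, grad_W2


x, y = [0.5, 0.1], 1
W1 = [[0.1, -0.3], [0.2, 0.4]]   # hypothetical weights
W2 = [0.5, -0.5]                 # hypothetical weights
grad_W1, grad_W2 = backprop(x, y, W1, W2)

# Finite-difference check on one weight, W1[0][0].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][0] += eps
W1_minus[0][0] -= eps
numeric = (loss(x, y, W1_plus, W2) - loss(x, y, W1_minus, W2)) / (2.0 * eps)
print(grad_W1[0][0], numeric)
```

The hidden-layer deltas reuse `delta_out` rather than differentiating the loss again, which is the saving that backpropagation provides.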


Solution landscape

The objective function of an MLP with hidden layers is non-convex (there are multiple local optima)

Gradient descent may get stuck in local optima


Stochastic gradient descent (SGD)

Compute the gradient based on a small batch of data

Might help escape local optima
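A mini-batch version of gradient descent can be sketched as below. The model (a single weight with prediction w * x), the noise-free toy data and all hyperparameters are hypothetical; the point is only that each update uses the gradient of a small random batch rather than of the whole dataset.

```python
import random


def sgd(data, w=0.0, eta=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y = w * x by mini-batch SGD on the squared error."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of 1/2 (y - w x)^2 over the batch only.
            grad = sum(-(y - w * x) * x for x, y in batch) / len(batch)
            w -= eta * grad
    return w


# Toy data generated from y = 2 x.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 21)]
w_sgd = sgd(data)
print(w_sgd)
```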


Momentum

Just like in physics, momentum helps a moving object maintain its direction

Momentum helps stochastic gradient descent reach the target more quickly
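One common way to add momentum is to keep a running velocity that accumulates past gradients. The exact form of the update is an assumption here (the slides do not give a formula), as are the toy loss and the hyperparameter values.

```python
def gd_momentum(grad, w, eta=0.01, beta=0.9, steps=100):
    """Gradient descent with a velocity term; beta = 0 gives plain descent."""
    v = 0.0
    for _ in range(steps):
        v = beta * v - eta * grad(w)  # keep part of the previous direction
        w = w + v
    return w


def toy_grad(w):
    """Derivative of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_plain = gd_momentum(toy_grad, w=0.0, beta=0.0)
w_mom = gd_momentum(toy_grad, w=0.0, beta=0.9)
print(abs(w_plain - 3.0), abs(w_mom - 3.0))
```

With the same learning rate and number of steps, the run with momentum ends closer to the minimiser on this toy problem.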


Learning rate and rate decay

Assumption: when near a stationary point, avoid jumping too far, as it might overshoot the target

Decaying the learning rate helps stay focused on the target
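A decay schedule can be sketched as follows. Inverse-time decay is only one of several possible schedules and is an assumption of this example, as are the function names, the toy loss and the constants.

```python
def decayed_rate(eta0, step, decay=0.01):
    """Inverse-time decay: eta_t = eta0 / (1 + decay * t)."""
    return eta0 / (1.0 + decay * step)


def gd_with_decay(grad, w, eta0=0.9, steps=200):
    """Gradient descent whose step size shrinks as training proceeds."""
    for t in range(steps):
        w -= decayed_rate(eta0, t) * grad(w)
    return w


# Toy loss L(w) = (w - 3)^2: the large early steps overshoot and
# oscillate around 3, and the later, smaller steps settle onto it.
w_decay = gd_with_decay(lambda w: 2.0 * (w - 3.0), w=0.0)
print(decayed_rate(0.9, 0), decayed_rate(0.9, 100), w_decay)
```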


Overfitting

The network was left learning for too long. It memorises the training data but cannot generalise

Remedies: early stopping, dropout, regularisation


Convolutional Neural Network


NN for visual related tasks

One of the most popular deep neural network models

Found its use in various visual recognition tasks

Figure: https://neurohive.io/en/datasets/new-datasets-for-3d-object-recognition/


Typical computer vision pipeline

Figure: https://mc.ai/cnn-application-on-structured-data-automated-feature-extraction/

Composed of two steps

▶ Feature extraction (requires domain knowledge)

▶ Classification step


Deep NN based computer vision pipeline

Figure: https://mc.ai/cnn-application-on-structured-data-automated-feature-extraction/

Train an end-to-end convolutional neural network

▶ No explicit feature extraction (features are learned)

▶ Classification step


Convolutional Neural Network

Figure: https://adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/

A deep neural network composed of convolutional layers for the (trainable) feature extraction part and a fully-connected NN for the classification part


Anatomy of CNN

Convolutional layer

Pooling layer

Fully-connected layer


Convolution operation motivation

Why? We would like to detect some patterns within the input image

How? Define the patterns in a kernel/filter and perform pattern matching

▶ the output of this operation should indicate how likely it is that the pattern is found in a specific area


Filter: image processing vs CNN

A filter contains the pattern that we want to detect in the input image

In image processing, the filter is often predefined

In a CNN, the filter is learned from the data


Sobel edge detection kernel


CNN kernels learned from ImageNet

Figure: Krizhevsky et al. “ImageNet Classification with Deep Convolutional Neural Networks”


How the output is calculated ? (1/2)

Cross-correlation operation (widely used in CNNs despite the name “convolution”)

Compute the sum of element-wise products
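The operation can be sketched for a single-channel image stored as a list of rows. The function name, the sample image and the kernel are hypothetical, and the sketch assumes no padding and a stride of one.

```python
def cross_correlate2d(image, kernel):
    """Slide the kernel over the image; sum element-wise products each time."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

result = cross_correlate2d(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```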


How the output is calculated ? (2/2)

Convolution operation (flip the kernel from right to left and from top to bottom, then perform cross-correlation)
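The relationship between the two operations can be sketched directly: flip the kernel in both directions, then cross-correlate. The function names and sample values are hypothetical, and the sketch assumes a single channel, no padding and a stride of one.

```python
def cross_correlate2d(image, kernel):
    """Sum of element-wise products at every kernel position."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def flip_kernel(kernel):
    """Reverse the row order (top-bottom) and each row (left-right)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """True convolution: cross-correlation with the flipped kernel."""
    return cross_correlate2d(image, flip_kernel(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
print(corr)  # [[37, 47], [67, 77]]
print(conv)  # [[23, 33], [53, 63]]
```

Because the filter values in a CNN are learned, whether the kernel is flipped makes no practical difference: the network would simply learn the flipped values.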


Convolutional layer in CNN

A flat filter can be seen as a matrix. However, a filter can also have depth, resulting in a mathematical object called a tensor


Convolutional layer in CNN


Convolutional layer in CNN


Convolutional layer in CNN


Convolutional layer in CNN


Convolutional layer in CNN


Stride

The number of rows and columns to skip when sliding the kernel


Padding

Augmenting the input border (e.g. with zeros) so that the kernel output can also be computed at the edges
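
As a small illustration, the sketch below zero-pads a toy input with `np.pad` and computes the resulting output size. The helper `output_size` and the 4x4 input are assumptions for illustration; the formula is the usual one for an input of width n, kernel width k, stride s and padding p on each side.

```python
import numpy as np

def output_size(n, k, stride=1, pad=0):
    """Output width for input width n, kernel width k, given stride and padding."""
    return (n + 2 * pad - k) // stride + 1

image = np.arange(16, dtype=float).reshape(4, 4)  # toy input (assumption)

# Add one ring of zeros around the input.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0.0)

size_without_padding = output_size(4, 3)       # 3x3 kernel shrinks 4x4 to 2x2
size_with_padding = output_size(4, 3, pad=1)   # padding of 1 keeps the size at 4x4
```

With a 3x3 kernel and stride 1, one ring of zero padding keeps the output the same size as the input.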

Pooling layer

Compresses the input by summarizing each window of size p (e.g. by its maximum or average)

Makes the representation approximately invariant to small translations
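
A minimal sketch of max-pooling with non-overlapping p x p windows follows. The function name `max_pool2d` and the toy inputs are assumptions for illustration. The second input is the first one shifted by one cell inside the same window, which is the kind of small translation pooling can absorb.

```python
import numpy as np

def max_pool2d(x, p=2):
    """Max-pool a 2-D array with non-overlapping p x p windows."""
    h, w = x.shape
    h, w = h - h % p, w - w % p  # drop rows/columns that do not fill a window
    blocks = x[:h, :w].reshape(h // p, p, w // p, p)
    return blocks.max(axis=(1, 3))

x = np.zeros((4, 4))
x[0, 0] = 1.0            # a single active cell

shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0      # same activation, moved by one row and one column

pooled = max_pool2d(x)
pooled_shifted = max_pool2d(shifted)
```

Both inputs pool to the same 2x2 output here. A shift that crosses a window boundary would change the result, so the invariance is only approximate.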

Fully-connected layer

The final output of the convolutional layers can be seen as a feature vector

The features can be fed into a fully-connected NN for classification

Some researchers have successfully used the features to train other classifiers (SVM, LR)
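
The sketch below shows one way this hand-over could look: feature maps are flattened into a vector and passed through a single fully-connected softmax layer. The shapes (8 channels of 4x4), the 3 classes and the random weights are assumptions for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last convolutional/pooling layer: 8 maps of 4x4.
feature_maps = rng.normal(size=(8, 4, 4))
features = feature_maps.reshape(-1)  # flatten into a feature vector of length 128

n_classes = 3  # assumption
W = rng.normal(scale=0.01, size=(n_classes, features.size))  # untrained weights
b = np.zeros(n_classes)

logits = W @ features + b
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()                   # softmax: class probabilities
predicted_class = int(np.argmax(probs))
```

The same `features` vector could equally be passed to another classifier, such as an SVM or logistic regression.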

Transfer learning

Many recognition tasks share important low-level features

The trained convolutional layers of an existing network can be used as they are (freezing the weights)

Only the fully-connected NN is trained to match the current task

Note: task similarity dictates the success of transfer learning (but how to measure it?)
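
The idea of freezing the feature extractor and training only the classifier can be sketched as below. Everything here is a stand-in chosen for illustration: `W_frozen` plays the role of the pretrained convolutional layers (a fixed random ReLU projection, not a real pretrained network), the data are synthetic, and the head is a logistic regression trained by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-in for pretrained layers; these weights are never updated.
W_frozen = rng.normal(scale=0.2, size=(20, 16))
W_frozen_before = W_frozen.copy()

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # linear map followed by ReLU

# Synthetic data standing in for the new task (assumption).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

features = extract_features(X)  # computed once, since the extractor is frozen
w_head = np.zeros(features.shape[1])
b_head = 0.0

def cross_entropy(w, b):
    p = np.clip(sigmoid(features @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

initial_loss = cross_entropy(w_head, b_head)
learning_rate = 0.1
for _ in range(500):
    p = sigmoid(features @ w_head + b_head)
    # Gradients are taken with respect to the head parameters only.
    w_head -= learning_rate * features.T @ (p - y) / len(y)
    b_head -= learning_rate * np.mean(p - y)
final_loss = cross_entropy(w_head, b_head)
```

Only `w_head` and `b_head` change during training; `W_frozen` stays exactly as it was.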

Case study: AlexNet

Max-pooling layers follow the first, second, and fifth convolutional layers

The number of neurons in each layer is given by 253440, 186624, 64896, 64896, 43264, 4096, 4096, 1000
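
Several of these counts can be read as height x width x channels of a layer's output. The sketch below checks this for the second to fifth convolutional layers. The output shapes are an assumption based on the commonly cited AlexNet architecture; they do not appear in the slides. The first count (253440) is not reproduced here, since it depends on how the first layer's output is counted.

```python
# Assumed (height, width, channels) of each convolutional layer's output.
conv_shapes = {
    "conv2": (27, 27, 256),
    "conv3": (13, 13, 384),
    "conv4": (13, 13, 384),
    "conv5": (13, 13, 256),
}

# Number of neurons = number of output activations of the layer.
neurons = {name: h * w * c for name, (h, w, c) in conv_shapes.items()}
```

The last three numbers in the list (4096, 4096, 1000) are the sizes of the fully-connected layers.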

Recurrent Neural Network

Motivation

In some situations, input instances are not completely independent but have temporal dependencies.

Examples
▶ text generation: “I was born in France, I fluently speak . . . ”

▶ weather forecast: tomorrow’s weather may depend on today’s and yesterday’s

A feed-forward NN or CNN does not model such temporal dependencies

Recurrent Neural Network (RNN)

An RNN is a neural network where the output of a node can flow back into the node at the next time step

Figure: (left) typical RNN diagram, (right) diagram unfolded through time
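
The unfolded view can be sketched as a loop in which the hidden state from one time step is fed back in at the next. This is a minimal sketch assuming a simple tanh recurrence; the names `W_xh`, `W_hh`, `b_h` and the toy sizes are assumptions for illustration. It also perturbs the first input to show that early inputs can still affect later hidden states.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a simple RNN over a sequence and return the hidden state at every step."""
    h = h0
    hs = []
    for x in xs:  # one loop iteration corresponds to one unfolded time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # previous h flows back into the node
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 4, 5  # toy sizes (assumption)
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
xs = rng.normal(size=(T, n_in))

hs = rnn_forward(xs, W_xh, W_hh, b_h, h0=np.zeros(n_hidden))

# Change only the first input and run the same network again.
xs_changed = xs.copy()
xs_changed[0] += 1.0
hs_changed = rnn_forward(xs_changed, W_xh, W_hh, b_h, h0=np.zeros(n_hidden))
```

The same weights are reused at every time step; only the hidden state carries information forward.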

Issue with RNN

RNN works well, but sometimes out-of-context information persists in the memory for too long

Here, x_0 and x_1 still have some influence on the output h_{t+1}

Long Short Term Memory (LSTM)

LSTM tries to solve this problem with a forgetting mechanism

Based heavily on https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Investigating LSTM components

Memory lane (cell state)

Forgetting gate layer

Input gate layer

Output layer

Memory lane

It runs through the whole chain

Information can be removed (via the × mark)

or added to memory (via the + mark)

C_t is sometimes called the cell state

Forgetting gate layer

f_t is between 0 and 1

f_t = 0 indicates that C_{t-1} should be wiped out

f_t = 1 fully retains the state C_{t-1}
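
For reference, the forget gate is commonly written as below. The notation is an assumption that follows the colah.github.io post these slides are based on: σ is the logistic sigmoid, [h_{t-1}, x_t] is the previous output concatenated with the current input, and W_f, b_f are the gate's weights and bias.

```latex
f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)
```

The sigmoid is what keeps each component of f_t between 0 and 1.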

Input gate layer

i_t tells which components of C_t to update

C̃_t (the candidate state) holds the update values
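
In the same assumed notation (following the colah.github.io post), the input gate and the candidate values are usually written as:

```latex
i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)
\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)
```

Here i_t lies between 0 and 1 and acts as a mask, while the candidate values lie between -1 and 1.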

Input gate layer

The new cell state C_t is the sum of the old memory kept by the forget gate and the new memory admitted by the input gate
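
Written out in the assumed notation of the colah.github.io post, with ⊙ denoting element-wise multiplication, f_t the forget gate, i_t the input gate and C̃_t the candidate values, the update is:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

The first term is the × mark on the memory lane and the addition is the + mark.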

Output layer

The output layer produces the prediction and also sends information back into the node at the next time step
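
In the assumed notation of the colah.github.io post, the output gate o_t decides which parts of the cell state C_t are exposed as the output h_t:

```latex
o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)
h_t = o_t \odot \tanh(C_t)
```

The same h_t serves as the prediction at time t and as the recurrent input at time t+1.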

Python code for LSTM

Figure: https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
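
A minimal NumPy sketch of one LSTM time step is given below. It assumes the standard gate equations in the notation of the colah.github.io post; the parameter names, sizes and random weights are assumptions for illustration. It is not the code shown in the referenced article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new output h_t and cell state c_t."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                     # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # new output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4  # toy sizes (assumption)
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

# Run the cell over a short random sequence, carrying h and c forward.
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
```

As in a plain RNN, the same parameters are reused at every time step; the difference is that the cell state `c` is carried along in addition to `h`.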

Applications of LSTM

Figure: by Raymond J. Mooney

Wait! How to train an LSTM?

Backpropagation, applied to the network unfolded through time

Figure: https://www.kdnuggets.com/2019/05/understanding-backpropagation-applied-lstm.html

Objectives: revisited

To understand the philosophy behind neural network classifiers

To understand how to train neural network models

To understand the effect of important neural network parameters

To understand what deep neural networks are
▶ CNN

▶ LSTM

Reading list

https://www.math.univ-toulouse.fr/~besse/Wikistat/pdf/st-m-hdstat-rnn-deep-learning.pdf

https://project.inria.fr/deeplearning/files/2016/05/session3.pdf

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
