Posted on 14-Jan-2017
From Artificial Neural Networks to Deep Learning
Viet-Trung Tran
Perceptron
• Rosenblatt, 1957
• Input signals x1, x2, …
• Bias x0 = 1
• Net input = weighted sum = Net(w,x)
• Activation/transfer function = f(Net(w,x))
• Output
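The perceptron above can be sketched in a few lines of plain Python; the AND weights below are illustrative values, not from the slides:

```python
def perceptron(weights, x):
    # Net(w, x): bias weight w0 (for x0 = 1) plus the weighted sum of inputs
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    # Hard-limiter (step) activation: output 1 if net >= 0, else 0
    return 1 if net >= 0 else 0

# Illustrative weights making the perceptron compute logical AND
and_weights = [-1.5, 1.0, 1.0]  # [bias w0, w1, w2]
print(perceptron(and_weights, [1, 1]))  # 1
print(perceptron(and_weights, [0, 1]))  # 0
```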
Weighted Sum and Bias
• Weighted sum: Net(w,x) = w1x1 + w2x2 + … + wnxn
• Bias: the weight w0 on the constant input x0 = 1, added to the weighted sum
Hard-limiter function
• Hard-limiter
– Threshold function
– Discontinuous function
– Discontinuous derivative
Threshold logic function
• Saturating linear function
• Continuous function
• Discontinuous derivative
Sigmoid function
• Most popular
• Output in (0, 1)
• Continuous derivative
• Easy to differentiate
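A minimal sketch of the sigmoid and its derivative; the identity f′(z) = f(z)(1 − f(z)) is what makes it easy to differentiate:

```python
import math

def sigmoid(z):
    # Output squashed into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # Easy to differentiate: f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```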
Artificial neural network – ANN structure
• Number of input/output signals • Number of hidden layers • Number of neurons per layer • Neuron weights • Topology • Biases
12
Feed-forward neural network
• Connections between the units do not form a directed cycle
Recurrent neural network
• A class of artificial neural network where connections between units form a directed cycle
Why hidden layers
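The classic answer is XOR: it is not linearly separable, so no single perceptron can compute it, but one hidden layer of threshold units makes it representable. A sketch with hand-picked (not learned) weights:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit acting like OR
    h2 = step(-x1 - x2 + 1.5)   # hidden unit acting like NAND
    return step(h1 + h2 - 1.5)  # output unit: AND of the hidden units

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```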
Neural network learning
• Two types of learning
– Parameter learning: learn the neuron connection weights
– Structure learning: learn the ANN structure from training data
Error function
• Consider an ANN with n neurons
• For each training example (x,d)
– Training error caused by the current weights w
• Training error caused by w over the entire set of training examples
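Assuming the usual squared-error measure (the slide's formula image is not reproduced in this transcript), the training error over all examples can be sketched as follows, with a plain linear unit standing in for the network output:

```python
def training_error(w, examples):
    # E(w) = 1/2 * sum over examples of (d - o)^2,
    # with a linear unit o = w . x standing in for the network output
    def output(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * sum((d - output(x)) ** 2 for x, d in examples)

examples = [([1.0, 2.0], 5.0), ([2.0, 0.0], 2.0)]
print(training_error([1.0, 2.0], examples))  # 0.0: these weights fit both examples
print(training_error([0.0, 0.0], examples))  # 14.5
```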
Learning principle
Neuron error gradients
Parameter learning: backpropagation of error
• Calculate the total error at the top
• Calculate contributions to the error at each step going backwards
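A minimal sketch of the idea with one hidden and one output sigmoid neuron (scalar weights and squared error; real networks use weight matrices): the error is computed at the top, then its contribution is propagated backwards.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w_h, w_o, x, d, lr=0.5):
    # Forward pass
    h = sigmoid(w_h * x)          # hidden activation
    o = sigmoid(w_o * h)          # network output
    # Error signal at the top: derivative of 0.5*(d - o)^2 through the sigmoid
    delta_o = (o - d) * o * (1 - o)
    # Contribution propagated backwards to the hidden neuron
    delta_h = delta_o * w_o * h * (1 - h)
    # Gradient-descent weight updates
    return w_h - lr * delta_h * x, w_o - lr * delta_o * h

w_h, w_o = 0.5, 0.5
for _ in range(2000):
    w_h, w_o = backprop_step(w_h, w_o, x=1.0, d=1.0)
out = sigmoid(w_o * sigmoid(w_h * 1.0))
print(out)  # the output has moved close to the target d = 1.0
```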
Backpropagation discussion
• Initial weights
• Learning rate
• Number of neurons per hidden layer
• Number of hidden layers
Stochastic gradient descent (SGD)
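A sketch of SGD on a one-parameter linear model y = w·x with squared loss; unlike batch gradient descent, each weight update uses a single training example:

```python
import random

def sgd(examples, lr=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(examples)       # visit examples in random order
        for x, d in examples:
            grad = (w * x - d) * x  # d/dw of 0.5 * (w*x - d)^2
            w -= lr * grad          # update after every single example
    return w

# Data generated by w = 3, so SGD should recover approximately 3.0
print(sgd([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]))
```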
Deep learning
Google Brain
GPU
Learning from tagged data
• @Andrew Ng
2006 breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep architectures
Deep Learning trends
• @Andrew Ng
AI will transform the internet
• @Andrew Ng
• Technology areas with potential for a paradigm shift:
– Computer vision
– Speech recognition & speech synthesis
– Language understanding: machine translation; web search; dialog systems; …
– Advertising
– Personalization/recommendation systems
– Robotics
• All this is hard: scalability, algorithms.
Deep learning
CONVOLUTIONAL NEURAL NETWORK
http://colah.github.io/
Convolution
• Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
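The definition can be sketched for discrete 1-D signals, where (f ∗ g)[n] = Σ_m f[m]·g[n−m]:

```python
def convolve(f, g):
    # Discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```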
Convolutional neural networks
• A conv net is a kind of neural network that uses many identical copies of the same neuron
– Large number of neurons
– Large computational models
– Number of actual weights (parameters) to be learned stays fairly small
A 2D Convolutional Neural Network
• A convolutional neural network can learn a neuron once and use it in many places, making the model easier to learn and reducing error.
Structure of Conv Nets
• Problem: predict whether a human is speaking or not
• Input: audio samples at different points in time
Simple approach
• Just connect them all to a fully-connected layer
• Then classify
A more sophisticated approach
• Local properties of the data
– frequency of sounds (increasing/decreasing)
• Look at a small window of the audio samples
– Create a group of neurons A to compute certain features
– The output of this convolutional layer is fed into a fully-connected layer, F
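A sketch of that convolutional layer: the same neuron (here a single illustrative unit with a ReLU activation, an assumption since the slides do not fix one) slides over every small window of the audio samples, so the weights are shared across positions.

```python
def conv_layer(xs, weights, bias):
    # Apply the SAME neuron to every window of the input (shared weights)
    k = len(weights)
    outputs = []
    for start in range(len(xs) - k + 1):
        window = xs[start:start + k]
        net = bias + sum(w * v for w, v in zip(weights, window))
        outputs.append(max(0.0, net))  # ReLU activation (illustrative choice)
    return outputs

samples = [0.0, 1.0, 2.0, 1.0, 0.0]
# Weights [1, -1] fire where the signal is decreasing
print(conv_layer(samples, [1.0, -1.0], 0.0))  # [0.0, 0.0, 1.0, 1.0]
```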
Max pooling layer
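Max pooling can be sketched in one line: keep only the strongest response in each non-overlapping window, which downsamples the feature map:

```python
def max_pool(xs, size=2):
    # Keep the maximum of each non-overlapping window of length `size`
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

print(max_pool([0.0, 1.0, 3.0, 2.0, 0.5, 0.4]))  # [1.0, 3.0, 0.5]
```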
2D convolutional neural networks
Three-dimensional convolutional networks
Group of neurons: A
• A bunch of neurons in parallel
• All get the same inputs and compute different features
Network in Network (Lin et al., 2013)
Conv Nets breakthroughs in computer vision
• Krizhevsky et al. (2012)
Different Levels of Abstraction
RECURRENT NEURAL NETWORKS
http://colah.github.io/
Recurrent Neural Networks (RNN) have loops
• A loop allows information to be passed from one step of the network to the next.
Unrolling an RNN
• Recurrent neural networks are intimately related to sequences and lists.
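Unrolling can be sketched with a scalar hidden state (illustrative scalar weights; real RNNs use matrices): the same weights are applied at every step, and the hidden state carries information from one step to the next.

```python
import math

def run_rnn(xs, w_h=0.5, w_x=1.0, b=0.0):
    h = 0.0
    states = []
    for x in xs:
        # Same weights at every unrolled step; h carries information forward
        h = math.tanh(w_h * h + w_x * x + b)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0])
print(states)  # the first input's influence persists, decaying through the states
```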
Examples
• Predict the last word in “the clouds are in the sky”
• The gap between the relevant information and the place where it is needed is small
• RNNs can learn to use the past information
• “I grew up in France… I speak fluent French.”
• As the gap grows, RNNs become unable to learn to connect the information.
LONG SHORT-TERM MEMORY NETWORKS
LSTM Networks
LSTM networks
• A special kind of RNN
• Capable of learning long-term dependencies
• Structured as a chain of repeating modules of neural network layers
RNN
• The repeating module has a very simple structure, such as a single tanh layer
• The tanh(z) function is a rescaled version of the sigmoid, and its output range is [−1, 1] instead of [0, 1].
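The rescaling can be checked numerically via the identity tanh(z) = 2·sigmoid(2z) − 1:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh is the sigmoid rescaled from (0, 1) to (-1, 1)
for z in (-2.0, -0.3, 0.0, 1.5):
    assert abs(math.tanh(z) - (2 * sigmoid(2 * z) - 1)) < 1e-12
print("identity holds")
```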
LSTM networks
• The repeating module consists of four neural network layers, interacting in a very special way
Core idea behind LSTMs
• The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
• The cell state runs straight down the entire chain, with only some minor linear interactions.
• It is easy for information to just flow along it unchanged.
Gates
• The ability to remove or add information to the cell state, carefully regulated by structures called gates
• Sigmoid
– How much of each component should be let through
– Zero means let nothing through
– One means let everything through
• An LSTM has three of these gates
LSTM step 1
• Decide what information to throw away from the cell state
• Forget gate layer
LSTM step 2
• Decide what new information to store in the cell state
• Input gate layer
LSTM step 3
• Update the old cell state, Ct−1, into the new cell state Ct
LSTM step 4
• Decide what to output
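The four steps can be sketched as a scalar LSTM cell (hypothetical scalar weights W; real LSTMs apply weight matrices to the concatenation of the previous hidden state and the input):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(c_prev, h_prev, x, W):
    z = h_prev + x                              # scalar stand-in for [h_prev, x]
    f = sigmoid(W["f"] * z + W["bf"])           # step 1: forget gate
    i = sigmoid(W["i"] * z + W["bi"])           # step 2: input gate
    c_tilde = math.tanh(W["c"] * z + W["bc"])   # step 2: candidate values
    c = f * c_prev + i * c_tilde                # step 3: new cell state Ct
    o = sigmoid(W["o"] * z + W["bo"])           # step 4: output gate
    h = o * math.tanh(c)                        # step 4: new hidden state
    return c, h

# Hypothetical weights, all 1.0 with zero biases, for illustration only
W = {"f": 1.0, "bf": 0.0, "i": 1.0, "bi": 0.0,
     "c": 1.0, "bc": 0.0, "o": 1.0, "bo": 0.0}
c, h = lstm_step(0.0, 0.0, 1.0, W)
print(c, h)
```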
RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS
APPENDIX
Perceptron 1957
Perceptron 1986
Perceptron
Activation function
Backpropagation 1974/1986
• Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.
• No successful attempts were reported before 2006… Exception: convolutional neural networks (LeCun, 1998).
• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (a shallow architecture).
• Breakthrough in 2006!
2006 breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep architectures
• Beat the state of the art in many areas:
– Language modeling (Mikolov et al., 2012)
– Image recognition (Krizhevsky won the 2012 ImageNet competition)
– Sentiment classification (Socher et al., 2011)
– Speech recognition (Dahl et al., 2010)
– MNIST hand-written digit recognition (Ciresan et al., 2010)
Credits
• Roelof Pieters, www.graph-technologies.com
• Andrew Ng
• http://colah.github.io/