
COMP3411/9414: Artificial Intelligence

Deep Learning Introduction

© Anthony Knittel, 2014


Deep Learning Networks

• Backpropagation trains a network by passing corrections from the output nodes back to earlier units in the network (a sketch of the weight update follows this list).
• Training is based on classification: the network weights are initially random and are adjusted according to the error gradient.
• With increased depth, learning is less effective. The gradient becomes weaker with depth, as the learning signal must pass down from the classification layer.
• Backpropagation networks require extensive training data, which can be difficult or costly to produce.
• Training can become stuck in local minima and fail to find better solutions.
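A minimal sketch of the gradient-based training described above, for a small fully connected network with one hidden layer; the data, layer sizes, learning rate and sigmoid activation are illustrative assumptions, and biases are omitted:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy data: 4 examples with 3 inputs and 1 target each (made-up values).
    X = rng.random((4, 3))
    T = rng.random((4, 1))

    # Initially random weights for the hidden and output layers.
    W1 = rng.standard_normal((3, 5)) * 0.1
    W2 = rng.standard_normal((5, 1)) * 0.1
    lr = 0.5

    for epoch in range(1000):
        # Forward pass.
        H = sigmoid(X @ W1)          # hidden activations
        Y = sigmoid(H @ W2)          # network output

        # Correction (delta) at the output nodes, from the squared error.
        dY = (Y - T) * Y * (1 - Y)

        # Pass the correction back to the previous layer.
        dH = (dY @ W2.T) * H * (1 - H)

        # Adjust the weights according to the gradient.
        W2 -= lr * H.T @ dY
        W1 -= lr * X.T @ dH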


Deep Learning Networks

• Deep Learning techniques address a number of these issues:
• Discovery of re-used features, with potential for more modular learning
• Learning from unlabelled data (followed by supervised learning)
• The ability to train deeper networks, and to capture intermediate features


Deep Learning Networks

• Machine learning theory says that a network with a single hidden layer can learn any function to whatever accuracy we want, so why bother? 2-layer MLPs and SVMs are “universal”.
• The right representation can be much more efficient for particular tasks.
• There is significant modularity in the brain: deep networks of re-used features are seen in vision, and are useful for audio and natural language tasks.
• Deep networks are a more promising approach for more general AI.
• http://www.technologyreview.com/featuredstory/513696/deep-learning


Deep Learning Networks

• Common techniques:
◮ Unsupervised learning, to pre-train the network
◮ Feature learning takes place one layer at a time: the outputs from the features of one layer are used as inputs for the next (see the sketch after this list)
◮ After pre-training, supervised learning is performed on the network using backpropagation
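A minimal sketch of this greedy layer-wise scheme, assuming a generic train_unsupervised_layer procedure (a hypothetical stand-in for an autoencoder or RBM trainer; only the loop structure is the point here):

    import numpy as np

    def train_unsupervised_layer(data, n_hidden):
        """Hypothetical single-layer unsupervised trainer (e.g. an autoencoder
        or RBM). Returns the learned weights and the layer's outputs."""
        rng = np.random.default_rng(0)
        W = rng.standard_normal((data.shape[1], n_hidden)) * 0.1
        # ... unsupervised training of W on `data` would happen here ...
        outputs = 1.0 / (1.0 + np.exp(-(data @ W)))
        return W, outputs

    def pretrain(data, layer_sizes):
        """Train one layer at a time: the outputs of each trained layer
        become the inputs for the next layer."""
        weights, layer_input = [], data
        for n_hidden in layer_sizes:
            W, layer_input = train_unsupervised_layer(layer_input, n_hidden)
            weights.append(W)
        return weights   # used to initialise the deep network before fine-tuning

    # Example: pre-train three feature layers on toy data.
    pretrained = pretrain(np.random.default_rng(1).random((50, 10)), [8, 6, 4])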


Layer-wise Pre-Training

[Figure a: layer-wise pre-training of a stacked network; see figure credits]


Deep Learning Networks

• Main approaches:
◮ Autoencoder networks (unsupervised pre-training)
◮ Restricted Boltzmann Machine networks (unsupervised pre-training)
◮ Convolutional Neural Networks (sparse, deep topology)


Autoencoder networks

[Figure b: autoencoder network; see figure credits]


Autoencoder networks

• Data is provided as input, and the output of the network tries to reconstruct the input.
• Learning is performed using backpropagation or related methods.
• The target output of the network is set to the input itself.
• The aim of training is to minimise the reconstruction error.
• A reduced set of hidden units is used, creating an information bottleneck (a minimal sketch follows).
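A minimal sketch of an autoencoder along these lines, with a single bottleneck hidden layer trained by backpropagation; the data, sizes and learning rate are illustrative, and biases are omitted:

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = rng.random((20, 8))        # toy data: 20 examples, 8 inputs

    n_hidden = 3                   # fewer hidden units than inputs: the bottleneck
    W_enc = rng.standard_normal((8, n_hidden)) * 0.1
    W_dec = rng.standard_normal((n_hidden, 8)) * 0.1
    lr = 0.5

    for epoch in range(2000):
        H = sigmoid(X @ W_enc)     # encode through the bottleneck
        R = sigmoid(H @ W_dec)     # attempt to reconstruct the input
        err = R - X                # target output is the input itself

        # Backpropagate the reconstruction error and adjust the weights.
        dR = err * R * (1 - R)
        dH = (dR @ W_dec.T) * H * (1 - H)
        W_dec -= lr * H.T @ dR
        W_enc -= lr * X.T @ dH

    print("mean squared reconstruction error:",
          np.mean((sigmoid(sigmoid(X @ W_enc) @ W_dec) - X) ** 2))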


Restricted Boltzmann Machine Networks

[Figure d: restricted Boltzmann machine with visible and hidden layers; see figure credits]

• RBMs are another technique for pre-training, used to capture features of the input. They are recurrent networks with a number of stable states.
• Given an input, the network can be sampled: activations are passed from the visible (input) layer to the hidden layer, then from the hidden layer back to the visible layer, repeating until stability is reached.
• The visible layer then provides a reconstruction of the input. Training the network allows it to capture features of the input.


Restricted Boltzmann Machine Networks

• Stable states of the network have low energy values.
• Gibbs sampling is used to find a low-energy state.
• The energy values of configurations are defined by the weights.
• Probabilistic: weights → energy values → probabilities (the standard energy function is written out below).
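For reference, the standard RBM energy function over visible units v_i and hidden units h_j, with weights w_ij and biases a_i, b_j (the usual formulation, not spelled out on the slide), is

    E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j,
    \qquad
    P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v', h'} e^{-E(v', h')}}

so configurations with lower energy are assigned higher probability: the weights define the energies, and the energies define the probabilities.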


Restricted Boltzmann Machine Networks

[Figure: activation function, plotting the firing probability P(y) against the input sum, rising from 0 to 1]

• Units are binary stochastic neurons: each unit is either on or off, and firing is determined by a probability value that depends on the sum of its inputs.
• The activation function describes the probability of firing, P(y_j), as a function of the input ∑_i x_i w_ij (written out below).
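The curve in the figure is the logistic sigmoid, the usual choice of activation for binary stochastic units (the specific form is an assumption here, consistent with the 0-to-1 curve on the slide):

    P(y_j = 1) = \sigma\Big(\sum_i x_i w_{ij}\Big) = \frac{1}{1 + \exp\big(-\sum_i x_i w_{ij}\big)}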


Alternating Gibbs Sampling

[Figure c: alternating Gibbs sampling between the visible and hidden layers; see figure credits]

• The input is presented at the visible units.
• Hidden units are updated based on probabilistic activations, then the visible units are updated in turn. This repeats until stability is reached (a minimal sketch follows).
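A minimal sketch of alternating Gibbs sampling, assuming binary stochastic units with sigmoid firing probabilities; the weights, biases and sizes are made-up toy values:

    import numpy as np

    rng = np.random.default_rng(2)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sample_binary(p):
        """Binary stochastic units: each unit fires (1) with probability p."""
        return (rng.random(p.shape) < p).astype(float)

    def alternating_gibbs(v, W, b_hid, b_vis, n_steps=10):
        """Alternate hidden and visible updates, starting from the input v."""
        for _ in range(n_steps):
            h = sample_binary(sigmoid(v @ W + b_hid))    # update hidden units
            v = sample_binary(sigmoid(h @ W.T + b_vis))  # update visible units
        return v, h   # v is a reconstruction of the input

    # Toy RBM: 6 visible units, 4 hidden units, random weights.
    W = rng.standard_normal((6, 4)) * 0.1
    v0 = sample_binary(np.full(6, 0.5))
    v_recon, h = alternating_gibbs(v0, W, np.zeros(4), np.zeros(6))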


Pre-trained Deep Networks

Putting it all together:

• This approach can be used to pre-train a network before performing supervised learning.
• A classification layer can be added on top of the network, representing the classes for supervised learning. A fine-tuning stage then adjusts the weights by backpropagation, using an error function defined at the output nodes (a sketch follows).
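A minimal sketch of this fine-tuning stage, assuming pretrained is the list of weight matrices from the earlier layer-wise pre-training sketch; the softmax output layer and cross-entropy error are assumed choices, and biases are omitted:

    import numpy as np

    rng = np.random.default_rng(3)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def fine_tune(X, labels, pretrained, n_classes, lr=0.1, epochs=100):
        """Add a classification layer on top of the pre-trained stack and
        fine-tune all weights by backpropagation."""
        Ws = [W.copy() for W in pretrained]
        Ws.append(rng.standard_normal((Ws[-1].shape[1], n_classes)) * 0.1)
        T = np.eye(n_classes)[labels]                 # one-hot class targets
        for _ in range(epochs):
            # Forward pass through the pre-trained layers, then the classifier.
            acts = [X]
            for W in Ws[:-1]:
                acts.append(sigmoid(acts[-1] @ W))
            Y = softmax(acts[-1] @ Ws[-1])
            # Error defined at the output nodes, passed back through every layer.
            delta = (Y - T) / len(X)
            grads = [acts[-1].T @ delta]
            for i in range(len(Ws) - 2, -1, -1):
                delta = (delta @ Ws[i + 1].T) * acts[i + 1] * (1 - acts[i + 1])
                grads.insert(0, acts[i].T @ delta)
            Ws = [W - lr * g for W, g in zip(Ws, grads)]
        return Ws

    # Example use (with X, integer labels and the pre-trained weights above):
    # Ws = fine_tune(X, labels, pretrained, n_classes=3)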


Deep Learning Networks

[Figure f: from Zeiler & Fergus 2013; see figure credits]


Convolutional Neural Networks

[Figure e: convolutional neural network architecture; see figure credits]


Convolutional Neural Networks

• CNNs are a form of deep neural network with a specific topology, based on structure seen in the visual system.
• Each unit has a limited receptive field.
• Units are convolutional: the same set of weights is used to find a response at multiple positions.
• Convolutional and sub-sampling layers perform specific functions, acting in a manner similar to the simple and complex cells of the visual system (a minimal sketch follows).
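A minimal sketch of one convolutional layer followed by a sub-sampling layer, with a single 3x3 filter whose weights are shared across every position; the input, filter and pooling size are illustrative:

    import numpy as np

    rng = np.random.default_rng(4)

    def convolve2d(image, kernel):
        """Apply the same weights at every position (limited receptive field,
        shared weights); valid mode, no padding."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def subsample(feature_map, size=2):
        """Sub-sampling (max pooling) over non-overlapping size x size blocks."""
        h, w = feature_map.shape
        h, w = h - h % size, w - w % size
        blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
        return blocks.max(axis=(1, 3))

    image = rng.random((8, 8))                   # toy input "image"
    kernel = rng.standard_normal((3, 3)) * 0.1   # one learned filter
    feature_map = np.maximum(convolve2d(image, kernel), 0)   # response map
    pooled = subsample(feature_map)              # 6x6 feature map -> 3x3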


Summary

• Deep Learning approaches introduce a number of new techniques that allow an increase in the depth and modularity of neural networks.
• Unsupervised learning allows structure to be captured from observations, without relying on feedback from classifications.
• Unsupervised pre-training improves the reliability and accuracy of supervised learning.
• These techniques offer many new opportunities for machine learning and more general artificial intelligence.


Application: Speech Recognition

• “Context-Dependent Deep Neural Network Hidden Markov Model”
• Hidden Markov Model: learns the relationships of transitions between hidden states, and between hidden states and observations.


Application: Speech Recognition

• The HMM system can be used to infer the phoneme symbols from audio samples, based on the relationships between acoustic patterns and symbols, and on the probabilities of symbol sequences.
• Hidden Markov Model: learns the relationships of transitions between hidden states, and between hidden states and observations.
• The Deep Neural Network is used to learn the probability distribution over symbolic states from the audio (a sketch of how the two parts combine follows).
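A minimal sketch of how the two parts can be combined in a hybrid system. The stand-in “DNN” here is just a random softmax layer so the sketch runs, and converting the DNN state posteriors into HMM emission scores by dividing by the state priors is the commonly used recipe, assumed rather than taken from the slide:

    import numpy as np

    rng = np.random.default_rng(5)

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # Stand-in for the trained deep network: maps acoustic feature frames to a
    # probability distribution over (tri)phone states. Weights are random here,
    # purely so the sketch is runnable.
    n_features, n_states = 39, 10
    W_dnn = rng.standard_normal((n_features, n_states)) * 0.1

    def dnn_state_posteriors(frames):
        return softmax(frames @ W_dnn)               # P(state | frame)

    def emission_scores(posteriors, state_priors):
        # Scaled likelihoods P(frame | state), up to a constant, for the HMM.
        return posteriors / state_priors

    frames = rng.standard_normal((100, n_features))  # toy utterance: 100 frames
    priors = np.full(n_states, 1.0 / n_states)       # assumed uniform priors
    scores = emission_scores(dnn_state_posteriors(frames), priors)
    # An HMM decoder (e.g. Viterbi search) would combine these per-frame scores
    # with the state transition probabilities to infer the phoneme sequence.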


Application: Speech Recognition

• 7 layers, with 2048 hidden units at each layer (summarised in the sketch below)
• Trained on 309 hours of training data
• Each layer pre-trained as a Restricted Boltzmann Machine
• Fine-tuned using 9304 triphone states at the output layer
• Error reduced from 27.4% to 18.5% (roughly a 30% relative improvement)
• Demonstration: http://www.youtube.com/watch?v=Nu-nlQqFCKg
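The shape of that network, written out as a small configuration sketch; the layer sizes come from the slide, while the input dimension (the size of the acoustic feature window) is deliberately left unspecified:

    # 7 hidden layers of 2048 units each, plus an output layer over the
    # 9304 triphone states. The input dimension depends on the acoustic
    # features and context window used, so it is left as a placeholder.
    input_dim = None
    layer_sizes = [input_dim] + [2048] * 7 + [9304]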


References

• “Improving neural networks by preventing co-adaptation of feature detectors”, Hinton et al. 2012, http://arxiv.org/abs/1207.0580
• “Reducing the Dimensionality of Data with Neural Networks”, Hinton & Salakhutdinov 2006
• “Conversational Speech Transcription Using Context-Dependent Deep Neural Networks”, Seide et al. 2011, http://research.microsoft.com/apps/pub
• “Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition”, Dahl et al. 2012, http://ieeexplore.ieee.org/stamp


• “Scientists See Promise in Deep-Learning Programs”, http://www.nytimes.com/20
• “ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky et al. 2012, http://nips.cc/Conferences/2012/Program/even
• “Building high-level features using large scale unsupervised learning”, Le et al. 2011, http://arxiv.org/abs/1112.6209
• “In a Big Network of Computers, Evidence of Machine Learning”, http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-co


Figure credits:
a. Yoshua Bengio, Montreal, “Learning Deep Architectures for AI”
b. Andrew Ng, Stanford, “Sparse Autoencoder”
c. Geoff Hinton, Toronto, “The next generation of neural networks”
d. LISA lab, Toronto, “Deep Learning tutorials: Restricted Boltzmann Machines”, http://deeplearning.net/tutorial/rbm.html
e. LISA lab, Toronto, “Deep Learning tutorials: Convolutional Neural Networks”, http://deeplearning.net/tutorial/lenet.html
f. Zeiler & Fergus 2013