Introduction to Neural Networks
TRANSCRIPT
Pictures are taken from
http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html
http://research.microsoft.com/~cmbishop/PRML/index.htm
By Nobel Khandaker
INTRODUCTION TO NEURAL NETWORKS
Neural Networks – An Introduction
Overview of Neural Networks
Origin, Definitions, examples
Basic building blocks of Neural Networks
Perceptrons, Sigmoids
Gradient Descent Algorithm
BACKPROPAGATION Algorithm
What is a Neural Network? - I
A general, practical method for learning real-valued, discrete-valued and vector-valued functions from examples
Uses of Neural Networks:
Recognizing handwritten characters (Microsoft uses ANN)
Recognizing spoken words
Recognizing human faces
Interpreting visual scenes
Learning robot control strategies
What is a Neural Network? - II
Neural Network is a set of connected INPUT/OUTPUT UNITS, where each connection has a WEIGHT associated with it.
Neural Network learning is also called CONNECTIONIST learning due to the connections between units.
It is a case of SUPERVISED, INDUCTIVE or CLASSIFICATION learning.
A Neural Network learns by adjusting the weights so as to correctly classify the training data and hence, after the testing phase, to classify unknown data.
Example of Neural Network
Use of Neural Network
The ALVINN system uses Neural Networks to steer an autonomous vehicle (at 70 mph)
The Neural Network uses the camera input to determine the steering direction
Invention of Neural Networks
Biological learning systems are built of very complex webs of interconnected neurons, e.g., the human brain
Your brain takes about 10^-1 seconds to recognize your mother
Neural networks are built using densely interconnected sets of simple units
Each unit takes a number of real-valued inputs and produces a single real-valued output
Strengths and Weaknesses of Neural Networks - I
Strengths
Can handle complex data (i.e., problems with many parameters)
Can handle noise in the training data
Prediction accuracy is generally high
Neural Networks are robust and work well even when training examples contain errors
Neural Networks can handle missing data well
Strengths and Weaknesses of NNs - II
Neural Network implementations are slow in the training phase
A major disadvantage of neural networks lies in their knowledge representation:
Acquired knowledge, in the form of a network of units connected by weighted links, is difficult for humans to interpret.
This factor has motivated research in extracting the knowledge embedded in trained neural networks and representing it in the form of symbolic rules
Perceptron
Use of Perceptron
Say +1 represents TRUE and -1 represents FALSE
How can we set the weights of a perceptron to represent AND?
w0 = -0.8, w1 = w2 = 0.5
Name a Boolean function that cannot be represented by a single perceptron:
XOR
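As a quick sketch (assuming the usual threshold form o = sgn(w0 + w1*x1 + w2*x2)), the stated weights can be checked directly:

```python
# A perceptron with the stated weights: outputs +1 if w0 + w1*x1 + w2*x2 > 0, else -1.
def perceptron(x1, x2, w0=-0.8, w1=0.5, w2=0.5):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

# With +1 for TRUE and -1 for FALSE, only the input (1, 1) clears the threshold,
# so the perceptron computes AND.
truth_table = {(x1, x2): perceptron(x1, x2) for x1 in (-1, 1) for x2 in (-1, 1)}
```

No such weight setting exists for XOR, since XOR's positive and negative examples cannot be separated by a single line.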
Perceptron Training Rule - I
Problem: Determine the weight vector that causes the perceptron to produce correct output for the training examples.
Several algorithms exist:
Perceptron Rule
Delta Rule
Both of these algorithms are guaranteed to converge
For perceptron rule, training examples are assumed to be linearly separable
Perceptron Training Rule - II
The perceptron rule updates each weight as: w_i ← w_i + η(t − o)x_i
Learning will converge if:
the training examples are linearly separable
η is sufficiently small
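A minimal sketch of the perceptron rule w_i ← w_i + η(t − o)x_i; the ±1 encoding and the AND training set are illustrative assumptions:

```python
def train_perceptron(examples, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    w = [0.0, 0.0, 0.0]                 # w[0] is the bias weight (its input is fixed at 1)
    for _ in range(epochs):
        converged = True
        for (x1, x2), t in examples:
            o = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
            if o != t:                  # update only on misclassified examples
                converged = False
                for i, xi in enumerate((1, x1, x2)):
                    w[i] += eta * (t - o) * xi
        if converged:                   # a full pass with no errors: done
            break
    return w

# AND with +/-1 encoding is linearly separable, so the rule converges:
data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(data)
```

If the data were not linearly separable (e.g., XOR), this loop would never reach a full error-free pass, which is why the delta rule below is needed.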
Gradient Descent and Delta Rule - I
How can we train perceptrons when the training examples are not linearly separable?
Use the delta rule
Key idea of the delta rule:
Use gradient descent to search the hypothesis space to find the weights that best fit the training examples
Gradient Descent and Delta Rule -II
D – set of training examples
t_d – target output for training example d
o_d – output of the linear unit for training example d
E(w) = 1/2 Σ_{d∈D} (t_d − o_d)^2 – the training error to be minimized
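The delta rule can be sketched as batch gradient descent on E(w) = 1/2 Σ_d (t_d − o_d)^2 for a linear unit o = w0 + w1*x1 + w2*x2; the learning rate and the (non-thresholded) AND data are illustrative:

```python
def delta_rule(examples, eta=0.05, epochs=500):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2
    for a linear unit o = w0 + w1*x1 + w2*x2."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        delta = [0.0, 0.0, 0.0]
        for (x1, x2), t in examples:
            o = w[0] + w[1] * x1 + w[2] * x2       # linear unit, no threshold
            for i, xi in enumerate((1.0, x1, x2)):
                delta[i] += eta * (t - o) * xi      # -eta * dE/dw_i, accumulated over D
        w = [wi + di for wi, di in zip(w, delta)]   # one batch update per epoch
    return w

# For the AND data with +/-1 encoding the least-squares fit is w = (-0.5, 0.5, 0.5):
data = [((-1.0, -1.0), -1.0), ((-1.0, 1.0), -1.0), ((1.0, -1.0), -1.0), ((1.0, 1.0), 1.0)]
w = delta_rule(data)
```

Unlike the perceptron rule, this converges (to the best-fit weights) whether or not the examples are linearly separable.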
Gradient Descent and Delta Rule - III
The plane of the weights w0, w1 represents the entire hypothesis space
The vertical axis represents the error E
The gradient descent search determines the weight vector that minimizes E
Gradient Descent Algorithm
Multilayer Networks
Single perceptrons can only express linear decision surfaces
Multilayer networks can express non-linear decision surfaces
We need a network that can represent highly non-linear functions
We can use sigmoid units.
Example of a Multilayer Network
The network was trained to recognize 1 of 10 vowel sounds
The network inputs consist of F1 and F2, obtained from spectral analysis of the sound
The network prediction is the output whose value is highest
Decision regions of a multilayer feed-forward network
Sigmoid Units
A sigmoid unit computes its output o as:
o = σ(w · x), where σ(y) = 1 / (1 + e^(−y))
The range of the output function is (0, 1)
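A minimal sketch of a sigmoid unit (bias folded in as weights[0], matching the perceptron convention used earlier):

```python
import math

def sigmoid(y):
    """Logistic function: sigma(y) = 1 / (1 + e^(-y)); output lies in (0, 1).
    Its derivative, sigma(y) * (1 - sigma(y)), is what BACKPROPAGATION uses."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_unit(weights, inputs):
    """o = sigma(w . x), with weights[0] acting as the bias weight."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)
```

Because σ is smooth (unlike the perceptron's hard threshold), the network's output is differentiable in the weights, which is what makes gradient descent applicable.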
BACKPROPAGATION Algorithm - I
Backpropagation(training_examples, η, n_in, n_out, n_hidden)
Each training example is a pair ⟨x, t⟩, where
x denotes the vector of network input values
t denotes the vector of target network output values
η = learning rate
n_in = number of network inputs
BACKPROPAGATION Algorithm - II
Backpropagation(training_examples, η, n_in, n_out, n_hidden)
n_out = number of network outputs
n_hidden = number of units in the hidden layer
x_ji denotes the input from unit i to unit j
w_ji denotes the weight from unit i to unit j
Since this is a network with multiple output units, the error function is defined as:
E(w) = 1/2 Σ_{d∈D} Σ_{k∈outputs} (t_kd − o_kd)^2
BACKPROPAGATION Algorithm - III
Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units
Initialize all network weights to small random numbers (e.g., between -0.05 and 0.05)
Until the termination condition is met, Do
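The steps above can be sketched as the stochastic-gradient version for one hidden layer of sigmoid units; the AND training task, the fixed epoch count as the termination condition, and the helper names are illustrative assumptions:

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict(w_hidden, w_out, x):
    """Forward pass; w[j][0] is the bias weight of unit j."""
    h = [sigmoid(wj[0] + sum(wi * xi for wi, xi in zip(wj[1:], x))) for wj in w_hidden]
    return [sigmoid(wk[0] + sum(wi * hi for wi, hi in zip(wk[1:], h))) for wk in w_out]

def backpropagation(training_examples, eta, n_in, n_hidden, n_out, epochs=5000, seed=0):
    rnd = random.Random(seed)
    # initialize all weights to small random numbers in (-0.05, 0.05)
    w_hidden = [[rnd.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rnd.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):                 # "until the termination condition is met"
        for x, t in training_examples:
            # forward pass
            h = [sigmoid(wj[0] + sum(wi * xi for wi, xi in zip(wj[1:], x))) for wj in w_hidden]
            o = [sigmoid(wk[0] + sum(wi * hi for wi, hi in zip(wk[1:], h))) for wk in w_out]
            # output error terms: delta_k = o_k * (1 - o_k) * (t_k - o_k)
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # hidden error terms: delta_j = h_j * (1 - h_j) * sum_k w_kj * delta_k
            d_hid = [hj * (1 - hj) * sum(w_out[k][j + 1] * d_out[k] for k in range(n_out))
                     for j, hj in enumerate(h)]
            # weight updates: w_ji <- w_ji + eta * delta_j * x_ji
            for k in range(n_out):
                for i, xi in enumerate([1.0] + h):
                    w_out[k][i] += eta * d_out[k] * xi
            for j in range(n_hidden):
                for i, xi in enumerate([1.0] + list(x)):
                    w_hidden[j][i] += eta * d_hid[j] * xi
    return w_hidden, w_out

# Illustrative task: learn AND with a 0/1 encoding.
data = [((0, 0), (0,)), ((0, 1), (0,)), ((1, 0), (0,)), ((1, 1), (1,))]
w_h, w_o = backpropagation(data, eta=0.3, n_in=2, n_hidden=2, n_out=1)
```

The weight update per example is exactly stochastic gradient descent on the per-example squared error, which is why the errors must be propagated from the outputs back through the hidden layer.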
BACKPROPAGATION Algorithm - IV
The BACKPROPAGATION algorithm uses a gradient descent search through the space of possible network weights, iteratively reducing E
Gradient descent may get trapped in any one of the local minima
It is only guaranteed to converge to some local minimum of E
However, in practice, the BACKPROPAGATION algorithm performs well
Gradient descent over complex error surfaces is poorly understood
BACKPROPAGATION Algorithm - V
No methods exist to predict with certainty when local minima will cause difficulties
Heuristics used to alleviate the problem of local minima:
Train multiple networks on the same data, but initialize each network with different random weights
Use stochastic gradient descent
Add a momentum term to the weight-update rule
Example of BACKPROPAGATION - I
Example of BACKPROPAGATION - II
Example of BACKPROPAGATION - III
[Figure: a neural network for simulating the AND function — input units, hidden units, and one output unit (units numbered 1–6), with all connection weights set to 0.5.]
Example of BACKPROPAGATION - III
The given network was trained using initial weights randomly set between (-1.0, 1.0)
Learning rate η = 0.3
(x, y) = (number of iterations of the outer loop, sum of squared errors)
Example of BACKPROPAGATION - IV
Evolution of the hidden layers
(x, y) = (number of iterations of the outer loop, hidden unit values)
Example of BACKPROPAGATION - V
Evolution of individual weights
(x, y) = (number of iterations of the outer loop, weights from the inputs to one hidden unit)
Representational Power of Feedforward Networks
Set of functions that can be represented:
Boolean functions
Number of hidden units required grows exponentially with the number of network inputs in the worst case
Continuous functions
Every bounded continuous function can be approximated with a network of two layers
Arbitrary functions
Any arbitrary function can be approximated to an arbitrary accuracy by a network of three layers
Regularization - I
The number of inputs and outputs in a network is determined by the dimensionality of the data and the number of classes
The number of hidden units (M) is a free parameter that can be adjusted to give the best predictive performance
M also determines the number of weights and biases in the network
A sub-optimal value of M can result in under-fitting or over-fitting
Regularization - II
Examples of two-layer networks trained on 10 data points drawn from the sinusoidal data set
Regularization - III
How can we control the complexity of a neural network to avoid over-fitting?
We can choose a relatively large value of M and then control the complexity by adding a regularizer term
A simple regularizer is:
Ẽ(w) = E(w) + (λ/2) wᵀw
This regularizer is also known as weight decay
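A minimal sketch of how the weight-decay term enters training; E and grad_E stand for any underlying error function and its gradient (the function names are illustrative):

```python
def regularized_error(E, weights, lam):
    """E~(w) = E(w) + (lam / 2) * w . w  -- the weight-decay regularizer."""
    return E + 0.5 * lam * sum(w * w for w in weights)

def gradient_step(weights, grad_E, eta, lam):
    """dE~/dw_i = dE/dw_i + lam * w_i, so every gradient step also
    shrinks ('decays') each weight toward zero."""
    return [w - eta * (g + lam * w) for w, g in zip(weights, grad_E)]
```

Even when the data gradient is zero, each step multiplies the weights by (1 − ηλ), which is where the name "weight decay" comes from.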
Regularization - IV
Problem: The simple weight-decay regularizer is inconsistent with the scaling properties of network mappings
Solution: A regularizer invariant under linear transformations:
(λ1/2) Σ_{w∈W1} w² + (λ2/2) Σ_{w∈W2} w²
W_i – the set of weights in the i-th layer
This regularizer remains unchanged under such transformations provided λ1 → a^(1/2) λ1 and λ2 → c^(1/2) λ2
Invariances - I
Predictions of a classifier should remain invariant under certain transformations of the input variables
Example: in handwritten character recognition:
Each character should be classified correctly irrespective of its position (translation invariance)
Each character should be classified correctly irrespective of its size (scale invariance)
A neural network can learn such invariances given a sufficient number of training examples
Invariances - II
What if we do not have enough training examples?
Augment the training set using replicas of the training patterns
Example: make multiple copies of the training set for the character recognition problem, where each character is shifted to a different position
Add a regularization term to the error function that penalizes changes in the model output when the input is transformed
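The first idea (translated replicas) can be sketched for tiny binary character images; representing a character as a 2-D 0/1 grid is an illustrative assumption:

```python
def shifted_copies(image, shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Augment a 2-D 0/1 'character' image with translated replicas.
    Pixels shifted outside the grid are dropped; vacated cells become 0."""
    rows, cols = len(image), len(image[0])
    copies = []
    for dr, dc in shifts:
        out = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    out[rr][cc] = image[r][c]
        copies.append(out)
    return copies

# A 2x2 "image" with one on-pixel, shifted down by one row:
img = [[0, 1],
       [0, 0]]
augmented = shifted_copies(img)
```

Each replica keeps the original label, so the network sees the same character at several positions and is pushed toward translation invariance.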
Invariances - III
Synthetic warping of a handwritten digit. The top-right digits show the warped versions of the input digit (left), generated using random displacements smoothed by Gaussians of width 0.01, 30, and 60. The corresponding displacement fields are shown in the bottom-right row.
Bayesian Neural Networks
Laplace approximation for a Bayesian neural network with 8 hidden units and a single output unit
Conclusion
What have we learned about Neural Networks?
What is a Neural Network – Definition, Examples
Strengths and weaknesses of Neural Networks
Basic building blocks – Perceptrons, Sigmoids
Perceptron Training Rules – Delta Rules, Gradient Descent
Multilayer Networks
BACKPROPAGATION Algorithm – description, example
Regularization
Invariances
Bayesian Neural Networks