Artificial Neural Network

Page 1:

Artificial Neural Network

Page 2:

Introduction

• Introducing some of the fundamental techniques and principles of Neural Network Systems

• Investigating some common models and their applications

Page 3:

What are Neural Networks?

• Neural Networks (NNs) are networks of neurons, for example, as found in real (i.e. biological) brains.

• Artificial Neurons are crude approximations of the neurons found in brains. They may be physical devices, or purely mathematical constructs.

• Artificial Neural Networks (ANNs) are networks of Artificial Neurons, and hence constitute crude approximations to parts of real brains. They may be physical devices, or simulated on conventional computers.

• From a practical point of view, an ANN is just a parallel computational system consisting of many simple processing elements connected together in a specific way in order to perform a particular task.

Page 4:

A Brief History of ANN

• 1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron model.

• 1949 Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.

• 1958 Rosenblatt introduced the simple single layer networks now called Perceptrons.

• 1969 Minsky and Papert’s book Perceptrons demonstrated the limitation of single layer perceptrons, and almost the whole field went into hibernation.

• 1982 Hopfield published a series of papers on Hopfield networks.

• 1982 Kohonen developed the Self-Organizing Maps that now bear his name.

• 1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole field took off again.

• 1990s The sub-field of Radial Basis Function Networks was developed.

• 2000s The power of Ensembles of Neural Networks and Support Vector Machines became apparent.

Page 5:

Applications of ANN

• Brain Modeling
  – Models of human development – help children with developmental problems
  – Simulations of adult performance – aid our understanding of how the brain works
  – Neuropsychological models – suggest remedial actions for brain-damaged patients

• Artificial System Building
  – Pattern Recognition – speech recognition, hand-writing recognition, sonar signals
  – Data analysis – data compression, data mining
  – Noise reduction – function approximation, ECG noise reduction
  – Bioinformatics – protein secondary structure, DNA sequencing
  – Control systems – autonomous adaptable robots, microwave controllers
  – Financial Modeling

Page 6:

Structure of Human Brain

Page 7:

Features of Human Brain

• Ten billion (10^10) neurons

• Neuron switching time > 10^-3 seconds

• Face recognition ~0.1 seconds

• On average, each neuron has several thousand connections

• Hundreds of operations per second

• High degree of parallel computation

• Distributed representations

Page 8:

Brain vs. Computer

There are approximately 10 billion neurons in the human cortex, compared with tens of thousands of processors in the most powerful parallel computers. Each biological neuron is connected to several thousand other neurons, similar to the connectivity in powerful parallel computers. The lack of processing units can be compensated for by speed. The typical operating speed of a biological neuron is measured in milliseconds (10^-3 s), while a silicon chip can operate in nanoseconds (10^-9 s). The human brain is extremely energy efficient, using approximately 10^-16 joules per operation per second, whereas the best computers today use around 10^-6 joules per operation per second.

Page 9:

Structure of Neuron

[Figure: a neuron, showing the dendrites, the axon, and the terminal branches of the axon]

Page 10:

Biological Neural Network

• The majority of neurons encode their outputs or activations as a series of brief electrical pulses (i.e. spikes or action potentials).

• Dendrites are the receptive zones that receive input or activation from other neurons.

• The cell body (soma) of the neuron processes the incoming activations and converts them into output activations.

• Axons are transmission lines that send activation to other neurons.

• Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large biological neural networks.

Page 11:

• A neuron has a cell body, a branching input structure (the dendrite) and a branching output structure (the axon).

• Signals can be transmitted unchanged or they can be altered by synapses. A synapse is able to increase or decrease the strength of the connection from neuron to neuron and cause excitation or inhibition of a subsequent neuron. This is where information is stored.

• The information processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANNs is to capture this kind of highly parallel computation based on distributed representations.

Page 12:

Biological and Artificial Neuron

Page 13:

[Figure: a biological neuron (dendrites, axon, terminal branches of the axon) shown alongside an artificial neuron, whose inputs x1, x2, x3, ..., xn carry weights w1, w2, w3, ..., wn into a summation unit Σ]

Page 14:

The McCulloch-Pitts Neuron

• This vastly simplified model of real neurons is also known as a Threshold Logic Unit:

– A set of synapses (i.e. connections) brings in activations from other neurons.

– A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing/transfer/threshold function).

– An output line transmits the result to other neurons.
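To make these three steps concrete, here is a minimal Python sketch of such a threshold unit; the weights, threshold value, and the AND-gate demonstration are illustrative assumptions, not taken from the slides:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: sum the weighted inputs, then apply a hard threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))  # synapses bring in activations
    return 1 if total > threshold else 0                 # non-linear threshold function

# Illustrative: with weights (1, 1) and threshold 1.5 the unit acts as an AND gate.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcp_neuron((x1, x2), (1, 1), 1.5))
```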

Page 15:

Perceptron

• Linear threshold unit (LTU)

o(x1, ..., xn) = 1 if Σ(i=0..n) wi xi > 0, and -1 otherwise,

where x0 = 1 is a fixed bias input with weight w0.

[Figure: inputs x0 = 1, x1, x2, ..., xn with weights w0, w1, w2, ..., wn feeding a summation unit Σ wi xi, whose result is thresholded to produce the output o]
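A direct transcription of this unit into Python might look as follows (a sketch; the demo weights are illustrative). Folding the threshold into the bias weight w0 on the fixed input x0 = 1 is exactly what the summation starting at i = 0 expresses:

```python
def ltu(xs, ws):
    """Linear threshold unit: o = 1 if sum_{i=0..n} wi*xi > 0 else -1, with x0 = 1."""
    total = ws[0] + sum(w * x for w, x in zip(ws[1:], xs))  # ws[0] is w0 on the bias
    return 1 if total > 0 else -1

print(ltu((0, 1), (-0.8, 0.5, 0.5)))  # -> -1 (illustrative weights)
```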

Page 16:

• Many simple neuron-like threshold switching (linear) units

• Many weighted interconnections among units

• Highly parallel, distributed processing

• Learning by tuning the connection weights

Page 17:

Perceptron Training

[Figure: a perceptron with threshold t = 0.0; input I1 = -1 is a bias input, and the weights on I1, I2, I3 are W1 = 0.3, W2 = 0.5, W3 = -0.4]

I1  I2  I3  Summation                               Output
-1   0   0  (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3    0
-1   0   1  (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7    0
-1   1   0  (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2    1
-1   1   1  (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2    0

For AND:

A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1

With these initial weights the perceptron's outputs (0, 0, 1, 0) do not match the AND truth table, so the weights must be adjusted by training.
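The summation column above can be reproduced with a few lines of Python (a sketch; variable names are mine):

```python
weights = (0.3, 0.5, -0.4)  # W1 (on the bias input I1 = -1), W2, W3
threshold = 0.0

for i2, i3 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    s = -1 * weights[0] + i2 * weights[1] + i3 * weights[2]
    print(i2, i3, round(s, 2), 1 if s > threshold else 0)
```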

Page 18:

Perceptron Learning Rule

wi ← wi + Δwi, where Δwi = η (t - o) xi

• t = c(x) is the target value
• o is the perceptron output
• η is a small constant (e.g. 0.1) called the learning rate

• If the output is correct (t = o), the weights wi are not changed.

• If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.

• The algorithm converges to the correct classification
  – if the training data is linearly separable
  – and η is sufficiently small.
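Below is a minimal sketch of this rule applied to the AND data from the previous slide; the learning rate, zero initial weights, and fixed epoch count are illustrative choices:

```python
def train_perceptron(data, eta=0.1, epochs=100):
    """Perceptron learning rule: wi <- wi + eta * (t - o) * xi, with x0 = 1 as bias."""
    ws = [0.0, 0.0, 0.0]                           # [w0, w1, w2], zero-initialized
    for _ in range(epochs):
        for (x1, x2), t in data:
            o = 1 if ws[0] + ws[1] * x1 + ws[2] * x2 > 0 else -1
            for i, xi in enumerate((1, x1, x2)):   # x0 = 1
                ws[i] += eta * (t - o) * xi        # no change when t == o
    return ws

# AND with +1/-1 targets is linearly separable, so the rule converges.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(and_data))
```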

Page 19:

Decision Boundaries

• In simple cases, divide feature space by drawing a hyperplane across it.

• Known as a decision boundary.

• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).

• Problems which can be classified in this way are linearly separable.
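As a small sketch, the discriminant function for a line w0 + w1 x1 + w2 x2 = 0 simply reports the sign of the left-hand side; the weights below are illustrative:

```python
def discriminant(x1, x2, w0=-0.8, w1=0.5, w2=0.5):
    """Positive on one side of the line w0 + w1*x1 + w2*x2 = 0, negative on the other."""
    return w0 + w1 * x1 + w2 * x2

print(discriminant(0, 0) > 0)  # False: (0, 0) lies on the negative side
print(discriminant(1, 1) > 0)  # True:  (1, 1) lies on the positive side
```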

Page 20:

Linearly Separable

[Figure: points of class A and class B scattered in the (X1, X2) plane, separated by a straight-line decision boundary]

Page 21:

Single Perceptron Used to Represent the AND Function

The two-input perceptron can implement the AND function when we set the weights:

w0 = -0.8, w1 = w2 = 0.5

<Training examples>

x1  x2  output
 0   0   -1
 0   1   -1
 1   0   -1
 1   1    1

Decision hyperplane:

w0 + w1 x1 + w2 x2 = 0
-0.8 + 0.5 x1 + 0.5 x2 = 0

<Test results>

x1  x2  Σ wi xi  output
 0   0   -0.8     -1
 0   1   -0.3     -1
 1   0   -0.3     -1
 1   1    0.2      1

[Figure: the three negative points and the single positive point in the (x1, x2) plane, separated by the line -0.8 + 0.5 x1 + 0.5 x2 = 0]

Page 22:

Single Perceptron Used to Represent the OR Function

The two-input perceptron can implement the OR function when we set the weights:

w0 = -0.3, w1 = w2 = 0.5

<Training examples>

x1  x2  output
 0   0   -1
 0   1    1
 1   0    1
 1   1    1

Decision hyperplane:

w0 + w1 x1 + w2 x2 = 0
-0.3 + 0.5 x1 + 0.5 x2 = 0

<Test results>

x1  x2  Σ wi xi  output
 0   0   -0.3     -1
 0   1    0.2      1
 1   0    0.2      1
 1   1    0.7      1

[Figure: the single negative point and the three positive points in the (x1, x2) plane, separated by the line -0.3 + 0.5 x1 + 0.5 x2 = 0]
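Both weight settings can be checked with the ltu sketch from Page 15 (a quick verification, not part of the slides):

```python
def ltu(xs, ws):
    total = ws[0] + sum(w * x for w, x in zip(ws[1:], xs))  # ws[0] = w0 on x0 = 1
    return 1 if total > 0 else -1

for name, ws in (("AND", (-0.8, 0.5, 0.5)), ("OR", (-0.3, 0.5, 0.5))):
    print(name, [ltu((x1, x2), ws) for x1 in (0, 1) for x2 in (0, 1)])
# AND [-1, -1, -1, 1]
# OR  [-1, 1, 1, 1]
```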

Page 23:

XOR Function

It is impossible to implement the XOR function with a single perceptron.

<Training examples>

x1  x2  output
 0   0   -1
 0   1    1
 1   0    1
 1   1   -1

[Figure: the two positive and two negative XOR points in the (x1, x2) plane; no single straight line can separate them]

A two-layer network of perceptrons can represent the XOR function. [Figure: a two-layer perceptron network and the XOR equation it computes]
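One standard two-layer construction computes XOR as "OR and not AND". The sketch below reuses the AND and OR weights from the previous slides; the output unit's weights are an illustrative assumption, and this is one possible network rather than the specific one pictured on the slide:

```python
def ltu(xs, ws):
    total = ws[0] + sum(w * x for w, x in zip(ws[1:], xs))
    return 1 if total > 0 else -1

def xor(x1, x2):
    h_or  = ltu((x1, x2), (-0.3, 0.5, 0.5))      # hidden unit 1: OR
    h_and = ltu((x1, x2), (-0.8, 0.5, 0.5))      # hidden unit 2: AND
    return ltu((h_or, h_and), (0.0, 0.5, -0.5))  # output: OR and not AND (weights assumed)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))  # -1, 1, 1, -1
```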

Page 24:

Different Non-Linearly Separable Problems and their Perceptron Structure

Structure     Types of Decision Regions
Single-Layer  Half plane bounded by hyperplane
Two-Layer     Convex open or closed regions
Three-Layer   Arbitrary (complexity limited by no. of nodes)

[The original table also shows, for each structure, example diagrams of the Exclusive-OR problem and of classes with meshed regions, drawn with points of two classes A and B]

Page 25:

Activation Function

• Transforms a neuron's input into its output

• Ensures the neuron's response is bounded

• Sigmoid
  – Monotonic (no discontinuity at the origin)
  – Bounded
  – Simple derivative
  – Non-linear

• Hard Limiter
  – Not monotonic (discontinuity at the origin)
  – Not easily differentiable
  – Linear at both its upper and lower bounds

Page 26:

Activation Functions

Page 27:

Standard Activation Functions

• The hard-limiting threshold function
  – Corresponds to the biological paradigm: the neuron either fires or it does not

• Sigmoid functions ('S'-shaped curves)
  – The logistic function: f(x) = 1 / (1 + e^(-ax))
  – The hyperbolic tangent (symmetrical)
  – Both functions have a simple derivative
  – Only the shape is important
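A small sketch of both families of functions; the derivative identities in the comments are standard results, and the slope parameter a comes from the formula above:

```python
import math

def hard_limiter(x):
    """Threshold function: the neuron either fires (1) or does not (0)."""
    return 1 if x > 0 else 0

def logistic(x, a=1.0):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-ax)); its derivative is a * f(x) * (1 - f(x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def tanh_act(x):
    """Hyperbolic tangent, the symmetric sigmoid; its derivative is 1 - tanh(x)**2."""
    return math.tanh(x)

print(hard_limiter(0.3), round(logistic(0.3), 4), round(tanh_act(0.3), 4))
```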

Page 28:

Training of ANN

• The paradigm (network) developed consists of artificial neurons.

• The neurons may be interconnected in different ways, and the learning process is not the same for them all.

• The paradigm follows a learning rule described by a mathematical expression called the learning equation.

• Different learning methodologies exist – different learning techniques suit different networks.

Page 29:

Learning Methods

• Supervised Learning

• Unsupervised Learning

• Reinforced Learning

• Competitive Learning

• Delta Rule

• Gradient Descent Rule

Page 30:

Supervised Learning

• Inputs are applied to the paradigm, which produces an output response.

• The response is compared with an a priori desired output signal, the target response.

• If the actual response differs from the target response, the network generates an error signal, which is then used to calculate the adjustment that should be made to the network's synaptic weights, so that the actual response matches the target output.

• The error is minimized, possibly to zero.

• This error minimization requires a special circuit known as a teacher or supervisor, hence the name supervised learning.

Page 31:

• The amount of calculation required to minimize the error depends on the algorithm used; it is a mathematical tool derived from optimization techniques.

• Some parameters to watch:
  – The time required per iteration
  – The number of iterations per input pattern needed for the error to reach a minimum during the training session
  – Whether the network has reached a global minimum or a local one, and if it is a local one, whether the network has escaped from it or remains trapped

Page 32:

Local Minimum

Page 33:

Unsupervised Learning

• Does not require any teacher, i.e. no target outputs.

• During training the NN receives many input patterns and arbitrarily organizes them into categories.

• During testing (when input is later applied), the NN gives an output response indicating the class to which the input belongs.

• If a class cannot be found for the given input, a new class is generated.

• Even though it does not require a teacher, it requires guidelines to determine how it will form the groups.

• If no guidelines are given, the grouping may or may not be successful.

• To classify more accurately, some feature-selecting guidelines are required.

Page 34:

Reinforced Learning

• Requires one or more neurons at the output layer.

• Another form of supervised learning.

• Unlike supervised learning, the teacher does not indicate how close the actual output is to the desired output, but only whether the actual output is the same as the target or not.

• The error signal generated during training is binary: pass or fail.

• If the supervisor indicates a bad result, the network readjusts its parameters again and again until its output response is right.

• There is no indication of whether the network is moving in the right direction or not.

Page 35:

Competitive Learning

• Another form of supervised learning.

• Several neurons are at the output layer.

• When an input is applied, each output neuron competes with the others to produce the output signal closest to the target.

• This output then becomes the dominant one, and the other neurons cease producing an output signal for this input.

• For another input signal, another output neuron may become the dominant one, and so on.

• Thus, each neuron is trained to respond to a different set of inputs.
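The winner-take-all selection at the heart of this scheme can be sketched in a few lines; the target value, the competing output signals, and the neuron count are illustrative assumptions:

```python
# Each output neuron produces a signal; the one closest to the target becomes dominant.
target = 0.9
outputs = [0.2, 0.75, 0.4]  # signals from three competing output neurons

winner = min(range(len(outputs)), key=lambda i: abs(outputs[i] - target))
signals = [outputs[i] if i == winner else 0.0 for i in range(len(outputs))]
print(winner, signals)  # neuron 1 dominates; the others are silenced for this input
```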

Page 36:

Delta Rule

• Continuously adjusts the values of the weights such that the difference (error) between the desired and actual output value of a processing element is reduced.

• Also known as the Least Mean Square (LMS) rule.

Page 37:

Gradient Descent Rule

• The values of the weights are adjusted by an amount proportional to the first derivative of the error with respect to the value of the weight.

• The goal is to decrease the error function, avoiding local minima and reaching the actual or global minimum.
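As a closing sketch: for a linear unit the delta (LMS) rule is gradient descent on the squared error E = ½ (t - o)², which gives the update Δwi = η (t - o) xi. The training example and learning rate below are illustrative:

```python
def delta_rule_step(ws, xs, t, eta=0.05):
    """One gradient-descent step on E = 0.5 * (t - o)**2 for a linear unit o = sum(wi*xi)."""
    o = sum(w * x for w, x in zip(ws, xs))
    return [w + eta * (t - o) * x for w, x in zip(ws, xs)]  # dE/dwi = -(t - o) * xi

# Illustrative: fit a linear unit (x0 = 1 bias, x1 = 3.0) to the target t = 6.0.
ws = [0.0, 0.0]
for _ in range(50):
    ws = delta_rule_step(ws, [1.0, 3.0], 6.0)
print([round(w, 3) for w in ws])  # -> [0.6, 1.8]; output 0.6 + 1.8*3.0 = 6.0
```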