artificial neural network 24-01-12
TRANSCRIPT
Artificial Neural
Network
Introduction
• Introducing some of the fundamental techniques and principles of neural network systems
• Investigating some common models and their applications
What are Neural Networks?
• Neural Networks (NNs) are networks of neurons, for example, as found in
real (i.e. biological) brains.
• Artificial Neurons are crude approximations of the neurons found in
brains. They may be physical devices, or purely mathematical constructs.
• Artificial Neural Networks (ANNs) are networks of Artificial Neurons,
and hence constitute crude approximations to parts of real brains. They may
be physical devices, or simulated on conventional computers.
• From a practical point of view, an ANN is just a parallel computational
system consisting of many simple processing elements connected together
in a specific way in order to perform a particular task.
A Brief History of ANN
• 1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron model.
• 1949 Hebb published his book The Organization of Behavior, in which the
Hebbian learning rule was proposed.
• 1958 Rosenblatt introduced the simple single layer networks now called Perceptrons.
• 1969 Minsky and Papert’s book Perceptrons demonstrated the limitation of single layer perceptrons, and almost the whole field went into hibernation.
• 1982 Hopfield published a series of papers on Hopfield networks.
• 1982 Kohonen developed the Self-Organizing Maps that now bear his name.
• 1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole field took off again.
• 1990s The sub-field of Radial Basis Function Networks was developed.
• 2000s The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent.
Applications of ANN
• Brain Modeling
• Models of human development – help children with developmental problems
• Simulations of adult performance – aid our understanding of how the brain works
• Neuropsychological models – suggest remedial actions for brain damaged patients
• Artificial System Building
• Pattern Recognition – speech recognition, hand-writing recognition, sonar signals
• Data analysis – data compression, data mining
• Noise reduction – function approximation, ECG noise reduction
• Bioinformatics – protein secondary structure, DNA sequencing
• Control systems – autonomous adaptable robots, microwave controllers
• Financial Modeling
Structure of Human Brain
Features of Human Brain
• Ten billion (10^10) neurons
• Neuron switching time ~10^-3 secs
• Face recognition ~0.1 secs
• On average, each neuron has several thousand
connections
• Hundreds of operations per second
• High degree of parallel computation
• Distributed representations
Brain Vs. Computer
There are approximately 10 billion neurons in the human cortex, compared with tens of thousands of processors in the most powerful parallel computers. Each biological neuron is connected to several thousand other neurons, similar to the connectivity in powerful parallel computers. The lack of processing units can be compensated for by speed: the typical operating speed of a biological neuron is measured in milliseconds (10^-3 s), while a silicon chip can operate in nanoseconds (10^-9 s). The human brain is extremely energy efficient, using approximately 10^-16 joules per operation per second, whereas the best computers today use around 10^-6 joules per operation per second.
Structure of Neuron
[Figure: a biological neuron, showing the dendrites, the axon, and the terminal branches of the axon]
Biological Neural Network
• The majority of neurons encode their outputs or activations as a series of brief electrical pulses (i.e. spikes or action potentials).
• Dendrites are the receptive zones that receive input or activation from other neurons.
• The cell body (soma) of the neuron processes the incoming activations and converts them into output activations.
• Axons are transmission lines that send activation to other neurons.
• Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large biological neural networks.
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Signals can be transmitted unchanged or they can be altered by synapses. A synapse is able to increase or decrease the strength of the connection from neuron to neuron and cause excitation or inhibition of a subsequent neuron. This is where information is stored.
• The information processing abilities of biological neural
systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANN is to capture this kind of highly parallel computation based on distributed representations.
Biological and Artificial Neuron
[Figure: a biological neuron (dendrites, axon, terminal branches of the axon) alongside an artificial neuron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, and a summation unit Σ]
The McCulloch-Pitts Neuron
• This vastly simplified model of real neurons is also known as a Threshold Logic Unit:
– A set of synapses (i.e. connections) brings in activations from other neurons.
– A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing/transfer/threshold function).
– An output line transmits the result to other neurons.
Perceptron
• Linear threshold unit (LTU)

o(x) = 1 if Σ(i=0 to n) w_i x_i > 0, and -1 otherwise

[Figure: an LTU with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, a summation unit computing Σ(i=0 to n) w_i x_i, and output o]
• Many simple neuron-like threshold switching (linear) units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Learning by tuning the connection weights
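The LTU computation described above can be sketched in Python; this is a minimal illustration (the function name and example weights are assumptions, not from the original slides):

```python
def perceptron_output(weights, inputs):
    """Linear threshold unit: weighted sum, then a hard threshold.

    weights[0] is the bias weight w0; it multiplies a fixed input x0 = 1.
    Returns +1 if the weighted sum is positive, otherwise -1.
    """
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1

# Example: weights w0 = -0.3, w1 = w2 = 0.5 applied to inputs (1, 1)
print(perceptron_output([-0.3, 0.5, 0.5], [1, 1]))  # 0.7 > 0, so prints 1
```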
Perceptron Training
[Figure: a perceptron with threshold t = 0.0 and three inputs: a fixed bias input of -1 and the inputs x and y, with weights W = 0.3, W = 0.5 and W = -0.4 matching the summations below]
I1 I2 I3 Summation Output
-1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0
-1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0
-1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1
-1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0
For AND
A B Output
0 0 0
0 1 0
1 0 0
1 1 1
Perceptron learning Rule
w_i = w_i + Δw_i,  where Δw_i = η (t - o) x_i

t = c(x) is the target value
o is the perceptron output
η is a small constant (e.g. 0.1) called the learning rate

• If the output is correct (t = o) the weights w_i are not changed.
• If the output is incorrect (t ≠ o) the weights w_i are changed such that the output of the perceptron for the new weights is closer to t.
• The algorithm converges to the correct classification
• if the training data is linearly separable
• and η is sufficiently small.
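The learning rule above can be sketched as a short training loop on the AND examples; the learning rate and epoch count here are illustrative assumptions, not values from the original slides:

```python
# A minimal sketch of the perceptron learning rule applied to the AND
# function. Inputs use a fixed bias input x0 = 1; targets are +1/-1 as in
# the text.
eta = 0.1  # the small constant called the learning rate

def train_perceptron(samples, targets, epochs=100):
    w = [0.0, 0.0, 0.0]  # w0 (bias weight), w1, w2
    for _ in range(epochs):
        for (x1, x2), t in zip(samples, targets):
            s = w[0] + w[1] * x1 + w[2] * x2
            o = 1 if s > 0 else -1
            # delta w_i = eta * (t - o) * x_i; no change when t == o
            w[0] += eta * (t - o) * 1
            w[1] += eta * (t - o) * x1
            w[2] += eta * (t - o) * x2
    return w

and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
w = train_perceptron(and_inputs, and_targets)
outputs = [1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
           for x1, x2 in and_inputs]
print(outputs)  # converges to [-1, -1, -1, 1]: AND is linearly separable
```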
Decision Boundaries
• In simple cases, divide feature space by
drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be thus classified are
linearly separable.
Linearly Separable
[Figure: a linearly separable problem in the (X1, X2) plane, with the class A points on one side of a straight decision boundary and the class B points on the other]
Single Perceptron Used to Represent AND Function
The two-input perceptron can implement the AND function when we set the weights:
w0 = -0.8, w1 = w2 = 0.5
x1 x2 output
0 0 -1
0 1 -1
1 0 -1
1 1 1
<Training examples>
Decision hyperplane :
w0 + w1 x1 + w2 x2 = 0
-0.8 + 0.5 x1 + 0.5 x2 = 0
x1 x2 Σ wi xi output
0 0 -0.8 -1
0 1 -0.3 -1
1 0 -0.3 -1
1 1 0.2 1
<Test Results>
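The test-results table can be reproduced with a few lines of Python, using the weights from the decision hyperplane above (a minimal sketch, not code from the original material):

```python
# Reproducing the <Test Results> table for AND with w0 = -0.8,
# w1 = w2 = 0.5 (the decision hyperplane -0.8 + 0.5 x1 + 0.5 x2 = 0).
w0, w1, w2 = -0.8, 0.5, 0.5
rows = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = w0 + w1 * x1 + w2 * x2      # sum of w_i * x_i with x0 = 1
    o = 1 if s > 0 else -1          # hard threshold
    rows.append((x1, x2, round(s, 1), o))
    print(x1, x2, round(s, 1), o)
# Only (1, 1) lands on the positive side of the hyperplane.
```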
[Figure: the decision boundary -0.8 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane, with the three "-" examples on one side and the single "+" example (1, 1) on the other]
Single Perceptron Used to Represent OR Function
The two-input perceptron can implement the OR function when we set the weights:
w0 = -0.3, w1 = w2 = 0.5
x1 x2 output
0 0 -1
0 1 1
1 0 1
1 1 1
<Training examples>

Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.3 + 0.5 x1 + 0.5 x2 = 0
x1 x2 Σ wi xi output
0 0 -0.3 -1
0 1 0.2 1
1 0 0.2 1
1 1 0.7 1
<Test Results>
[Figure: the decision boundary -0.3 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane, with the single "-" example (0, 0) on one side and the three "+" examples on the other]
XOR Function
It is impossible to implement the XOR function with a single perceptron.
x1 x2 output
0 0 -1
0 1 1
1 0 1
1 1 -1
<Training examples>
[Figure: the XOR examples in the (x1, x2) plane: "+" at (0, 1) and (1, 0), "-" at (0, 0) and (1, 1); no single straight line separates the two classes]
A two-layer network of perceptrons can represent the XOR function.
XOR Equation of Two-Layer Perceptron
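One standard construction (an assumption here, not necessarily the exact weights from the original slide) uses a hidden OR unit and a hidden NAND unit feeding an AND output unit:

```python
def ltu(w0, w1, w2, x1, x2):
    """Linear threshold unit with bias weight w0 (fixed input x0 = 1)."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

def xor_net(x1, x2):
    """Two-layer perceptron network computing XOR (outputs +1/-1)."""
    # Hidden layer: one unit computes OR, the other computes NAND.
    h_or   = ltu(-0.3,  0.5,  0.5, x1, x2)
    h_nand = ltu( 0.8, -0.5, -0.5, x1, x2)
    # Output layer: AND of the two hidden activations (which are +1/-1).
    return ltu(-0.8, 0.5, 0.5, h_or, h_nand)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # -1, 1, 1, -1: the XOR truth table
```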
Different Non-Linearly Separable Problems and Their Perceptron Structure

Structure      Types of Decision Regions
Single-Layer   Half plane bounded by hyperplane
Two-Layer      Convex open or closed regions
Three-Layer    Arbitrary (complexity limited by the number of nodes)

[Figure: for each structure, example decision regions for the Exclusive-OR problem and for classes with meshed regions, separating classes A and B]
Activation Function
• Transforms a neuron's input into its output
• Ensures the neuron's response is bounded
• Sigmoid
– Monotonic (no discontinuity at the origin)
– Bounded
– Simple derivative
– Non-linear
• Hard Limiter
– Not monotonic (discontinuity at the origin)
– Not easily differentiable
– Linear at both its upper and lower bounds
Activation Functions
• The hard-limiting threshold function
– Corresponds to the biological paradigm: the neuron either fires or it does not
• Sigmoid functions ('S'-shaped curves)
– The logistic function
– The hyperbolic tangent (symmetrical)
– Both functions have a simple differential
– Only the shape is important
f(x) = 1 / (1 + e^(-ax))
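The logistic function and its simple derivative can be sketched in Python (the names `logistic` and `logistic_deriv` are illustrative, not from the original material):

```python
import math

def logistic(x, a=1.0):
    """The logistic function f(x) = 1 / (1 + e^(-a x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    """Its simple derivative: f'(x) = a * f(x) * (1 - f(x))."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

print(logistic(0.0))    # 0.5 at the origin
print(math.tanh(0.0))   # 0.0: the symmetric hyperbolic-tangent variant
```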
Standard Activation Functions
Training of ANN
• The paradigm (network) developed consists of artificial neurons.
• The neurons may be interconnected in different ways, and the learning process is not the same for them all.
• The paradigm follows a learning rule described by a mathematical expression called the learning equation.
• Different learning methodologies – a different learning technique suits a different network.
Learning Methods
• Supervised Learning
• Unsupervised Learning
• Reinforced Learning
• Competitive Learning
• Delta Rule
• Gradient Descent Rule
Supervised Learning
• Inputs are applied to the paradigm, that results in an output
response
• Response compared with a prior desired output signal, the target
response
• If the actual response differs from the target response, the network generates an error signal, which is then used to calculate the adjustment that should be made to the network's synaptic weights, so that the actual response matches the target output.
• The error is minimized, possibly to zero
• This error minimization requires special circuit known as teacher or
supervisor, hence the name supervised learning.
• The amount of calculation required to minimize the error depends on the algorithm used.
• It is a mathematical tool derived from optimization techniques
• Some parameters to watch
• Time required per iteration
• The number of iterations per input pattern needed for the error to reach a minimum during the training session
• Whether the network has reached the global minimum or a local one, and if a local one, whether it has escaped from it or remains trapped there
Local Minimum
Unsupervised Learning
• Does not require any teacher, i.e. no target outputs
• During training the NN receives many input patterns, and it arbitrarily
organizes the patterns into categories.
• While testing (when an input is later applied), the NN provides an output response indicating the class to which the input belongs.
• If the class cannot be found for the given input, a new class is generated.
• Even though it does not require a teacher, it requires guidelines to determine how it will form the groups.
• If no guidelines are given, the grouping may or may not be successful.
• To classify more accurately, some feature selecting guidelines are required.
Reinforced Learning
• Requires one or more neurons at the output
• Another form of supervised learning
• Unlike supervised learning, the teacher does not indicate how close the actual output is to the desired output, but only whether the actual output is the same as the target or not.
• Error signal generated during the training is binary: Pass or Fail
• If the supervisor indicates a bad output, the network readjusts its parameters again and again until its output response is right.
• There is no indication of whether the network is moving in the right direction or not.
Competitive Learning
• Another form of supervised learning
• Several neurons are at the output layer
• When an input is applied, each output neuron competes with the others to produce the output signal closest to the target.
• This output then becomes the dominant one, and the others cease producing an output signal for this input.
• For another input signal, another output neuron may become the dominant one, and so on.
• Thus, each neuron is trained to respond to a different set of inputs.
Delta Rule
• Continuously adjusts the values of the weights such that the difference between the desired and actual output values of a processing element is reduced.
• Also known as the Least Mean Square (LMS) rule.
Gradient Descent Rule
• The values of the weights are adjusted by an amount proportional to the first derivative of the error with respect to the value of the weight.
• The goal is to decrease the error function, avoiding local minima and reaching the actual (global) minimum.
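The rule above can be sketched for a single linear unit with squared error; the data, learning rate, and iteration count are illustrative assumptions:

```python
# One linear unit o = sum_i w_i * x_i with squared error E = 0.5*(t - o)^2.
# The first derivative is dE/dw_i = -(t - o) * x_i, so stepping against the
# gradient gives the update w_i += eta * (t - o) * x_i (the delta/LMS rule).
eta = 0.05  # learning rate (illustrative)

def gradient_step(w, x, t):
    o = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(200):               # repeated steps shrink the error
    w = gradient_step(w, [1.0, 2.0], 1.0)
print([round(wi, 2) for wi in w])  # -> [0.2, 0.4], the minimum-error weights
```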