artificial neural network 24-01-12
TRANSCRIPT
Artificial Neural
Network
Introduction
• Introducing some of the fundamental techniques and principles of neural network systems
• Investigating some common models and their applications
What are Neural Networks?
• Neural Networks (NNs) are networks of neurons, for example, as found in
real (i.e. biological) brains.
• Artificial Neurons are crude approximations of the neurons found in
brains. They may be physical devices, or purely mathematical constructs.
• Artificial Neural Networks (ANNs) are networks of Artificial Neurons,
and hence constitute crude approximations to parts of real brains. They may
be physical devices, or simulated on conventional computers.
• From a practical point of view, an ANN is just a parallel computational
system consisting of many simple processing elements connected together
in a specific way in order to perform a particular task.
A Brief History of ANN
• 1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron model.
• 1949 Hebb published his book The Organization of Behavior, in which the
Hebbian learning rule was proposed.
• 1958 Rosenblatt introduced the simple single layer networks now called Perceptrons.
• 1969 Minsky and Papert’s book Perceptrons demonstrated the limitation of single layer perceptrons, and almost the whole field went into hibernation.
• 1982 Hopfield published a series of papers on Hopfield networks.
• 1982 Kohonen developed the Self-Organizing Maps that now bear his name.
• 1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole field took off again.
• 1990s The sub-field of Radial Basis Function Networks was developed.
• 2000s The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent.
Applications of ANN
• Brain Modeling
• Models of human development – help children with developmental problems
• Simulations of adult performance – aid our understanding of how the brain works
• Neuropsychological models – suggest remedial actions for brain damaged patients
• Artificial System Building
• Pattern Recognition – speech recognition, hand-writing recognition, sonar signals
• Data analysis – data compression, data mining
• Noise reduction – function approximation, ECG noise reduction
• Bioinformatics – protein secondary structure, DNA sequencing
• Control systems – autonomous adaptable robots, microwave controllers
• Financial Modeling
Structure of Human Brain
Features of Human Brain
• Ten billion (10^10) neurons
• Neuron switching time ~10^-3 secs
• Face recognition ~0.1 secs
• On average, each neuron has several thousand
connections
• Hundreds of operations per second
• High degree of parallel computation
• Distributed representations
Brain Vs. Computer
There are approximately 10 billion neurons in the human cortex, compared with tens of thousands of processors in the most powerful parallel computers. Each biological neuron is connected to several thousand other neurons, similar to the connectivity in powerful parallel computers. The lack of processing units can be compensated for by speed: the typical operating speed of a biological neuron is measured in milliseconds (10^-3 s), while a silicon chip can operate in nanoseconds (10^-9 s). The human brain is extremely energy efficient, using approximately 10^-16 joules per operation per second, whereas the best computers today use around 10^-6 joules per operation per second.
Structure of Neuron
[Figure: a biological neuron, showing the dendrites, the axon, and the terminal branches of the axon]
Biological Neural Network
• The majority of neurons encode their outputs or activations as a series of brief electrical pulses (i.e. spikes or action potentials).
• Dendrites are the receptive zones that receive input or activation from other neurons.
• The cell body (soma) of the neuron processes the incoming activations and converts them into output activations.
• Axons are transmission lines that send activation to other neurons.
• Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large biological neural networks.
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Signals can be transmitted unchanged or they can be altered by synapses. A synapse is able to increase or decrease the strength of the connection from neuron to neuron and cause excitation or inhibition of a subsequent neuron. This is where information is stored.
• The information processing abilities of biological neural
systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANN is to capture this kind of highly parallel computation based on distributed representations.
Biological and Artificial Neuron
[Figure: a biological neuron (dendrites, axon, terminal branches of the axon) alongside an artificial neuron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, and a summation unit Σ]
The McCulloch-Pitts Neuron
• This vastly simplified model of real neurons is also known as a Threshold Logic Unit:
– A set of synapses (i.e. connections) brings in activations from other neurons.
– A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing/transfer/threshold function).
– An output line transmits the result to other neurons.
Perceptron
• Linear threshold unit (LTU)

o(x) = 1 if Σ(i=0 to n) w_i x_i > 0, and -1 otherwise

[Figure: an LTU with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, a summation unit computing Σ(i=0 to n) w_i x_i, and output o]
• Many simple neuron-like threshold switching (linear) units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Learning by tuning the connection weights
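The LTU computation described above can be sketched in Python; this is a minimal illustration (the function name and example weights are assumptions, not from the original slides):

```python
def perceptron_output(weights, inputs):
    """Linear threshold unit: weighted sum, then a hard threshold.

    weights[0] is the bias weight w0; it multiplies a fixed input x0 = 1.
    Returns +1 if the weighted sum is positive, otherwise -1.
    """
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1

# Example: weights w0 = -0.3, w1 = w2 = 0.5 applied to inputs (1, 1)
print(perceptron_output([-0.3, 0.5, 0.5], [1, 1]))  # 0.7 > 0, so prints 1
```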
Perceptron Training
[Figure: a perceptron with threshold t = 0.0 and three inputs: a fixed bias input of -1 and the inputs x and y, with weights W = 0.3, W = 0.5 and W = -0.4 matching the summations below]
I1 I2 I3 Summation Output
-1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0
-1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0
-1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1
-1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0
For AND
A B Output
0 0 0
0 1 0
1 0 0
1 1 1
Perceptron learning Rule
w_i = w_i + Δw_i,  where Δw_i = η (t - o) x_i

t = c(x) is the target value
o is the perceptron output
η is a small constant (e.g. 0.1) called the learning rate

• If the output is correct (t = o) the weights w_i are not changed.
• If the output is incorrect (t ≠ o) the weights w_i are changed such that the output of the perceptron for the new weights is closer to t.
• The algorithm converges to the correct classification
• if the training data is linearly separable
• and η is sufficiently small.
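The learning rule above can be sketched as a short training loop on the AND examples; the learning rate and epoch count here are illustrative assumptions, not values from the original slides:

```python
# A minimal sketch of the perceptron learning rule applied to the AND
# function. Inputs use a fixed bias input x0 = 1; targets are +1/-1 as in
# the text.
eta = 0.1  # the small constant called the learning rate

def train_perceptron(samples, targets, epochs=100):
    w = [0.0, 0.0, 0.0]  # w0 (bias weight), w1, w2
    for _ in range(epochs):
        for (x1, x2), t in zip(samples, targets):
            s = w[0] + w[1] * x1 + w[2] * x2
            o = 1 if s > 0 else -1
            # delta w_i = eta * (t - o) * x_i; no change when t == o
            w[0] += eta * (t - o) * 1
            w[1] += eta * (t - o) * x1
            w[2] += eta * (t - o) * x2
    return w

and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
w = train_perceptron(and_inputs, and_targets)
outputs = [1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
           for x1, x2 in and_inputs]
print(outputs)  # converges to [-1, -1, -1, 1]: AND is linearly separable
```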
Decision Boundaries
• In simple cases, divide feature space by
drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be thus classified are
linearly separable.
Linearly Separable
[Figure: a linearly separable problem in the (X1, X2) plane, with the class A points on one side of a straight decision boundary and the class B points on the other]
Single Perceptron Used to Represent AND Function
The two-input perceptron can implement the AND function when we set the weights:
w0 = -0.8, w1 = w2 = 0.5
x1 x2 output
0 0 -1
0 1 -1
1 0 -1
1 1 1
<Training examples>
Decision hyperplane :
w0 + w1 x1 + w2 x2 = 0
-0.8 + 0.5 x1 + 0.5 x2 = 0
x1 x2 Σ wi xi output
0 0 -0.8 -1
0 1 -0.3 -1
1 0 -0.3 -1
1 1 0.2 1
<Test Results>
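The test-results table can be reproduced with a few lines of Python, using the weights from the decision hyperplane above (a minimal sketch, not code from the original material):

```python
# Reproducing the <Test Results> table for AND with w0 = -0.8,
# w1 = w2 = 0.5 (the decision hyperplane -0.8 + 0.5 x1 + 0.5 x2 = 0).
w0, w1, w2 = -0.8, 0.5, 0.5
rows = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = w0 + w1 * x1 + w2 * x2      # sum of w_i * x_i with x0 = 1
    o = 1 if s > 0 else -1          # hard threshold
    rows.append((x1, x2, round(s, 1), o))
    print(x1, x2, round(s, 1), o)
# Only (1, 1) lands on the positive side of the hyperplane.
```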
[Figure: the decision boundary -0.8 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane, with the three "-" examples on one side and the single "+" example (1, 1) on the other]
Single Perceptron Used to Represent OR Function
The two-input perceptron can implement the OR function when we set the weights:
w0 = -0.3, w1 = w2 = 0.5
x1 x2 output
0 0 -1
0 1 1
1 0 1
1 1 1
<Training examples>

Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.3 + 0.5 x1 + 0.5 x2 = 0
x1 x2 Σ wi xi output
0 0 -0.3 -1
0 1 0.2 1
1 0 0.2 1
1 1 0.7 1
<Test Results>
[Figure: the decision boundary -0.3 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane, with the single "-" example (0, 0) on one side and the three "+" examples on the other]
XOR Function
It is impossible to implement the XOR function with a single perceptron.
x1 x2 output
0 0 -1
0 1 1
1 0 1
1 1 -1
<Training examples>
[Figure: the XOR examples in the (x1, x2) plane: "+" at (0, 1) and (1, 0), "-" at (0, 0) and (1, 1); no single straight line separates the two classes]
A two-layer network of perceptrons can represent the XOR function.
XOR Equation of Two-Layer Perceptron
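One standard construction (an assumption here, not necessarily the exact weights from the original slide) uses a hidden OR unit and a hidden NAND unit feeding an AND output unit:

```python
def ltu(w0, w1, w2, x1, x2):
    """Linear threshold unit with bias weight w0 (fixed input x0 = 1)."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

def xor_net(x1, x2):
    """Two-layer perceptron network computing XOR (outputs +1/-1)."""
    # Hidden layer: one unit computes OR, the other computes NAND.
    h_or   = ltu(-0.3,  0.5,  0.5, x1, x2)
    h_nand = ltu( 0.8, -0.5, -0.5, x1, x2)
    # Output layer: AND of the two hidden activations (which are +1/-1).
    return ltu(-0.8, 0.5, 0.5, h_or, h_nand)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # -1, 1, 1, -1: the XOR truth table
```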
Different Non-Linearly Separable Problems and Their Perceptron Structure

Structure      Types of Decision Regions
Single-Layer   Half plane bounded by hyperplane
Two-Layer      Convex open or closed regions
Three-Layer    Arbitrary (complexity limited by the number of nodes)

[Figure: for each structure, example decision regions for the Exclusive-OR problem and for classes with meshed regions, separating classes A and B]
Activation Function
• Transforms a neuron's input into its output
• Ensures the neuron's response is bounded
• Sigmoid
– Monotonic (no discontinuity at the origin)
– Bounded
– Simple derivative
– Non-linear
• Hard Limiter
– Not monotonic (discontinuity at the origin)
– Not easily differentiable
– Linear at both its upper and lower bounds
Activation Functions
• The hard-limiting threshold function
– Corresponds to the biological paradigm: the neuron either fires or it does not
• Sigmoid functions ('S'-shaped curves)
– The logistic function
– The hyperbolic tangent (symmetrical)
– Both functions have a simple differential
– Only the shape is important
f(x) = 1 / (1 + e^(-ax))
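The logistic function and its simple derivative can be sketched in Python (the names `logistic` and `logistic_deriv` are illustrative, not from the original material):

```python
import math

def logistic(x, a=1.0):
    """The logistic function f(x) = 1 / (1 + e^(-a x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    """Its simple derivative: f'(x) = a * f(x) * (1 - f(x))."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

print(logistic(0.0))    # 0.5 at the origin
print(math.tanh(0.0))   # 0.0: the symmetric hyperbolic-tangent variant
```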
Standard Activation Functions
Training of ANN
• The paradigm (network) developed consists of artificial neurons.
• The neurons may be interconnected in different ways, and the learning process is not the same for them all.
• The paradigm follows a learning rule described by a mathematical expression called the learning equation.
• Different learning methodologies – a different learning technique suits a different network.
Learning Methods
• Supervised Learning
• Unsupervised Learning
• Reinforced Learning
• Competitive Learning
• Delta Rule
• Gradient Descent Rule
Supervised Learning
• Inputs are applied to the paradigm, that results in an output
response
• Response compared with a prior desired output signal, the target
response
• If the actual response differs from the target response, the network generates an error signal, which is then used to calculate the adjustment that should be made to the network's synaptic weights, so that the actual response matches the target output.
• The error is minimized, possibly to zero
• This error minimization requires special circuit known as teacher or
supervisor, hence the name supervised learning.
• The amount of calculation required to minimize the error depends on the algorithm used.
• It is a mathematical tool derived from optimization techniques
• Some parameters to watch
• Time required per iteration
• The number of iterations per input pattern needed for the error to reach a minimum during the training session
• Whether the network has reached the global minimum or a local one, and if a local one, whether it has escaped from it or remains trapped there
Local Minimum
Unsupervised Learning
• Does not require any teacher, i.e. no target outputs
• During training the NN receives many input patterns, and it arbitrarily
organizes the patterns into categories.
• While testing (when an input is later applied), the NN provides an output response indicating the class to which the input belongs.
• If the class cannot be found for the given input, a new class is generated.
• Even though it does not require a teacher, it requires guidelines to determine how it will form the groups.
• If no guidelines are given, the grouping may or may not be successful.
• To classify more accurately, some feature selecting guidelines are required.
Reinforced Learning
• Requires one or more neurons at the output
• Another form of supervised learning
• Unlike supervised learning, the teacher does not indicate how close the actual output is to the desired output, but only whether the actual output is the same as the target or not.
• Error signal generated during the training is binary: Pass or Fail
• If the supervisor indicates a bad output, the network readjusts its parameters again and again until its output response is right.
• There is no indication of whether the network is moving in the right direction or not.
Competitive Learning
• Another form of supervised learning
• Several neurons are at the output layer
• When an input is applied, each output neuron competes with the others to produce the output signal closest to the target.
• This output then becomes the dominant one, and the others cease producing an output signal for this input.
• For another input signal, another output neuron may become the dominant one, and so on.
• Thus, each neuron is trained to respond to a different set of inputs.
Delta Rule
• Continuously adjusts the values of the weights such that the difference between the desired and actual output values of a processing element is reduced.
• Also known as the Least Mean Square (LMS) rule.
Gradient Descent Rule
• The values of the weights are adjusted by an amount proportional to the first derivative of the error with respect to the value of the weight.
• The goal is to decrease the error function, avoiding local minima and reaching the actual (global) minimum.
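The rule above can be sketched for a single linear unit with squared error; the data, learning rate, and iteration count are illustrative assumptions:

```python
# One linear unit o = sum_i w_i * x_i with squared error E = 0.5*(t - o)^2.
# The first derivative is dE/dw_i = -(t - o) * x_i, so stepping against the
# gradient gives the update w_i += eta * (t - o) * x_i (the delta/LMS rule).
eta = 0.05  # learning rate (illustrative)

def gradient_step(w, x, t):
    o = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(200):               # repeated steps shrink the error
    w = gradient_step(w, [1.0, 2.0], 1.0)
print([round(wi, 2) for wi in w])  # -> [0.2, 0.4], the minimum-error weights
```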