

Page 1: AI - Characteristics of Neural Networks

5

Neural Networks


http://rajakishor.co.cc Page 2

What is a Neural Network?

The human brain is a highly complex, nonlinear and parallel computer. It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations many times faster than the fastest digital computer in existence today.

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

It resembles the brain in two respects:

1. Knowledge is acquired by the network from its environment through a process of learning.

2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

The procedure used to perform the learning process is called a learning algorithm. Its function is to modify the synaptic weights of the network to attain a desired design objective.

Neural networks are also referred to as neurocomputers, connectionist networks, or parallel distributed processors.

Benefits of Neural Network

A neural network derives its computing power through

1. its massively parallel distributed structure

2. its ability to learn and generalize

Properties and Capabilities of Neural Networks

1. Nonlinearity

Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for generation of the input signal is inherently nonlinear.

2. Input-output mapping

Supervised learning

Working through training samples or task examples.


3. Adaptivity

Adapting the synaptic weights to change in the surrounding environments.

4. Evidential response

5. Contextual information

6. Fault tolerance

7. VLSI implementability

8. Uniformity of analysis and design

9. Neurobiological analogy

Human Brain

The human nervous system may be viewed as a three-stage system.

Central to the nervous system is the brain. It is represented by the neural net. The brain continually receives the information, perceives it, and makes appropriate decisions. The arrows pointing from left to right indicate the forward transmission of information – bearing signals through the system. The arrows pointing from right to left signify the presence of feedback in the system.

The receptors convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (the brain). The effectors convert electrical impulses generated by the neural net into discernible responses as system outputs.

Typically, neurons are five to six orders of magnitude slower than silicon logic gates. Events in a silicon chip happen in the nanosecond (10^-9 s) range, whereas neural events happen in the millisecond (10^-3 s) range.

It is estimated that there are approximately 10 billion neurons and 60 trillion synapses or connections in the human brain.

Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse.

A chemical synapse operates as follows. A pre-synaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then


acts on a post-synaptic process. Thus, a synapse converts a pre-synaptic electric signal into a chemical signal and then back into a post synaptic electrical signal.

Structural organization of levels in the brain

The synapses represent the most fundamental level, depending on molecules and ions for their action.

A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity to produce a functional operation of interest.

The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons.

The whole neuron is about 100 µm in size. It contains several dendritic subunits.

The local circuits are made up of neurons with similar or different properties. Each circuit is about 1mm in size. The neural assemblies perform operations on characteristics of a localized region in the brain.


The interregional circuits are made up of pathways, columns and topographic maps, which involve multiple regions located in different parts of the brain.

Topographic maps are organized to respond to incoming sensory information.

The central nervous system is the final level of complexity where the topographic maps and other interregional circuits mediate specific types of behavior.

Models of a Neuron

A neuron is an information-processing unit that is fundamental to the operation of a neural network. Its model can be shown in the following block diagram.

The neuronal model has three basic elements:

1. A set of synapses, each of which is characterized by a weight or strength of its own. Each synapse j carries an input signal xj and a weight wkj, where wkj refers to the weight of the kth neuron with respect to the jth input signal. The synaptic weights may range through positive as well as negative values.

2. An adder for summing the input signals, weighted by the respective synapses of the neuron.

3. An activation function for limiting the amplitude of the output of a neuron.

The neuron model also includes an externally applied bias, bk. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively.


A neuron k may be mathematically described by the following pair of equations:

uk = Wk1x1 + Wk2x2 + … + Wkmxm ---- (1)

yk = φ(uk + bk)

where x1, x2, …, xm are the input signals; Wk1, Wk2, …, Wkm are the synaptic weights of neuron k; uk is the linear combiner output due to the input signals; bk is the bias; vk is the induced local field; φ(.) is the activation function; and yk is the output signal of neuron k.

The use of the bias bk has the effect of applying an affine transformation to the output uk of the linear combiner. So, we can have

vk = uk + bk ---- (2)

Now, equation (1) can be rewritten as

yk = φ(vk)

Due to this affine transformation, the graph of vk versus uk no longer passes through the origin.

vk is called the induced local field or activation potential of neuron k. Equivalently, the bias may be absorbed into vk by adding a new synapse whose input is x0 = +1 and whose weight is Wk0 = bk.
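The neuronal model described above can be sketched in code. The following is a minimal illustration (all numbers are made up): it computes the linear combiner output uk, the induced local field vk, and the neuron output yk.

```python
def neuron_output(x, w, b, phi):
    """Compute y_k = phi(v_k) for one neuron, where
    u_k = sum_j w_kj * x_j  (linear combiner / adder)
    v_k = u_k + b_k         (induced local field)."""
    u = sum(wj * xj for wj, xj in zip(w, x))  # adder
    v = u + b                                 # affine shift by the bias
    return phi(v)                             # activation function

# Illustrative numbers (not from the text):
x = [1.0, 0.5]           # input signals x_1, x_2
w = [0.4, -0.2]          # synaptic weights w_k1, w_k2
b = 0.1                  # bias b_k
identity = lambda v: v   # a trivial activation function
print(neuron_output(x, w, b, identity))  # ≈ 0.4 (u = 0.3, v = 0.4)
```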


Types of Activation Function

1. Threshold function

2. Piecewise-linear function

3. Sigmoid function

The activation function defines the output of a neuron in terms of the induced local field vk.

1. Threshold function

The function is defined as

φ(v) = 1, if v ≥ 0
       0, if v < 0

This form of a threshold function is also called the Heaviside function. Correspondingly, the output of neuron k is expressed as

yk = 1, if vk ≥ 0
     0, if vk < 0

where

vk = Σ (j = 1 to m) Wkj xj + bk

This model is also called the McCulloch-Pitts model. In this model, the output of a neuron is 1 if the induced local field of that neuron is nonnegative, and 0 otherwise. This statement describes the all-or-none property of the model.
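The all-or-none behavior of a threshold (McCulloch-Pitts) unit can be sketched as follows; the weights and bias below are illustrative, not from the text.

```python
def threshold_neuron(x, w, b):
    """McCulloch-Pitts unit: y = 1 if v_k >= 0, else 0,
    with v_k = sum_j w_kj * x_j + b_k."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if v >= 0 else 0

# All-or-none behavior on two inputs:
print(threshold_neuron([1, 1], [0.5, 0.5], -0.7))  # v = 0.3 >= 0 -> 1
print(threshold_neuron([1, 0], [0.5, 0.5], -0.7))  # v = -0.2 < 0 -> 0
```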


2. Piecewise-linear function

The activation function, here, is defined as

φ(v) = 1,  if v ≥ +1/2
       v,  if +1/2 > v > -1/2
       0,  if v ≤ -1/2

where the amplification factor inside the linear region of operation is assumed to be unity. Two situations can be observed for this function:

A linear combiner arises if the linear region of operation is maintained without running into saturation.

The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large.

3. Sigmoid function

This is the most common form of activation function used in the construction of artificial neural networks. It is defined as a strictly increasing function that exhibits a graceful balance between linear and nonlinear behavior.

An example of sigmoid function is the logistic function, which is defined as

φ(v) = 1 / (1 + e^(-av))

where a is the slope parameter of the sigmoid function.
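As a quick illustration of the logistic function and its slope parameter a (the values below are chosen arbitrarily):

```python
import math

def logistic(v, a=1.0):
    """Logistic sigmoid phi(v) = 1 / (1 + exp(-a*v)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * v))

print(logistic(0.0))        # 0.5: the curve passes through 1/2 at the origin
print(logistic(1.0, a=1))   # a gentle slope
print(logistic(1.0, a=10))  # a larger slope approaches the threshold function
```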


Neural Networks and Directed Graphs

The neural network can be represented through a signal-flow graph. A signal-flow graph is a network of directed links (branches) that are interconnected at certain points called nodes. A typical node j has an associated node signal xj. A typical directed link originates at node j and terminates on node k; it has an associated transfer function or transmittance that specifies the manner in which the signal yk at node k depends on the signal xj at node j.

The flow of signals in the various parts of the graph is directed by three basic rules.

Rule-1:

A signal flows along a link only in the direction defined by the arrow on the link. There are two types of links:

Synaptic links: whose behavior is governed by a linear input-output relation. Here, we have yk = Wkjxj.

For example,

Activation links: whose behavior is governed by a nonlinear input-output relation.

For example,

Rule-2:

A node signal equals the algebraic sum of all signals entering the pertinent node via the incoming links. This is also called synaptic convergence or fan-in.

For example,


Rule-3:

The signal at a node is transmitted to each outgoing link originating from that node.

For example,

This rule is also called the synaptic divergence or fan-out.

A neural network is a directed graph consisting of nodes with interconnecting synaptic and activation links. It is characterized by four properties:

1. Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link. The bias is represented by a synaptic link connected to an input fixed at +1.

2. The synaptic links of a neuron weight their respective input signals.

3. The weighted sum of the input signals defines the induced local field of the neuron under study.

4. The activation link squashes the induced local field of the neuron to produce an output.

Note: A digraph describes not only the signal flow from neuron to neuron, but also the signal flow inside each neuron.

Neural Networks and Architectures

There are three fundamentally different classes of network architectures:

1. Single-layer feedforward networks

2. Multilayer feedforward networks

3. Recurrent networks or neural networks with feedback

1. Single-layer feedforward networks

In a layered neural network, the neurons are organized in the form of layers. The simplest form of a layered network has an input layer of source nodes that project onto an output layer but not vice versa.


For example,

The above network is of the feedforward or acyclic type. It is also called a single-layer network, the single layer referring to the output layer, since computations take place only at the output nodes.

2. Multilayer feedforward networks

In this class, a neural network has one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of hidden neurons is to intervene between the external input and the network output in a useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics. This is essentially required when the size of the input layer is large. For example,


The source nodes in the input layer supply respective elements of the activation pattern (input vector), which constitutes the input signals applied to the second layer.

The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.

The set of output signals of the neurons in the output layer constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input layer.
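The layer-by-layer signal flow described above can be sketched as a simple forward pass. This is an illustrative toy network (2 inputs, 3 hidden neurons, 1 output neuron) with made-up weights, using the logistic activation at every node:

```python
import math

def forward(x, layers):
    """Propagate an input vector through a list of layers.
    Each layer is (W, b): W is a list of weight rows, b a list of biases.
    Every computation node uses the logistic activation."""
    phi = lambda v: 1.0 / (1.0 + math.exp(-v))
    signal = x
    for W, b in layers:
        signal = [phi(sum(w * s for w, s in zip(row, signal)) + bk)
                  for row, bk in zip(W, b)]
    return signal

# A 2-3-1 network with illustrative weights:
hidden = ([[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]], [0.0, 0.1, -0.1])
output = ([[0.3, -0.2, 0.7]], [0.05])
print(forward([1.0, 0.5], [hidden, output]))  # overall response of the network
```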

3. Recurrent networks or neural networks with feedback

In this class, a network will have at least one feedback loop.

For example,

The above is a recurrent network with no hidden neurons. The presence of feedback loops has an impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of unit-delay elements (denoted by z^-1), which result in nonlinear dynamical behavior of the network.


Knowledge Representation

Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world.

Knowledge representation involves the following:

1. Identifying the information that is to be processed

2. Physically encoding the information for subsequent use

Knowledge representation is goal directed. In real-world applications of “intelligent” machines, a good solution depends on a good representation of knowledge.

A major task for a neural network is to provide a model for a real-time environment into which it is embedded. Knowledge of the world consists of two kinds of information:

1. Prior information: It gives the known state of the world. It is represented by facts about what is and what has been known.

2. Observations: These are the measures of the world. These are obtained by the sensors that probe the environment where the neural network operates.

The set of input-output pairs, with each pair consisting of an input signal and the corresponding desired response, is called a set of training data or a training sample.

Ex: Handwritten digit recognition.

The training sample consists of a large variety of handwritten digits that are representative of a real-time situation. Given such a set of examples, the design of a neural network may proceed as follows:

Step-1: Select an appropriate architecture for the NN, with an input layer consisting of source nodes equal in number to the pixels of an input image, and an output layer consisting of 10 neurons (one for each digit). A subset of examples is then used to train the network by means of a suitable algorithm. This phase is the learning phase.

Step-2: The recognition performance of the trained network is tested with data not seen before. Here, an input image is presented to the network, but not its corresponding digit. The NN now compares the input image with the stored images of digits and then produces the required output digit. This phase is called the generalization phase.

Note: The training data for a NN may consist of both positive and negative examples.


Example: A simple neuronal model for recognizing handwritten digits.

Consider an input set X of key patterns X1, X2, X3, ……

Each key pattern represents a specific handwritten digit.

The network has k neurons.

Let W = {w1j(i), w2j(i), w3j(i), ……}, for j = 1, 2, 3, …, k, be the set of weights of X1, X2, X3, … with respect to each of the k neurons in the network, where i refers to an instance.

Let y(j) be the generated output of neuron j for j=1,2,…k.

Let d(j) be the desired output of neuron j, for j=1,2,…..k.

Let e(j)= d(j) – y(j) be the error that is calculated at neuron j, for j = 1,2,…,k.

Now we design the neuronal model for the system as follows.

In the above model, each neuron computes a specific digit j. With every key pattern, synapses are established to every neuron in the model. We assume that the weight of each key pattern can be either 0 or 1.


Ex: Let the key pattern x1 correspond to the handwritten digit 1. Then its synaptic weight w11(i) should be 1 for the 1st neuron, and all other synaptic weights for x1 must be 0.

Weight matrix for the above model can be as follows.

Now the output for the neuron will be computed as follows.

Y(1) = w11x1 + w21x2 + w31x3 + … + w91x9
     = 1.(x1) + 0.(x2) + 0.(x3) + … + 0.(x9)
     = x1

This means that neuron 1 is designed to recognize only the key pattern x1, which corresponds to the handwritten digit 1. In the same way, all other neurons in the model recognize their respective digits.
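The computation above can be checked with a short sketch. It assumes, as in the text, a model whose weight matrix wires each neuron only to the key pattern of its own digit (an identity-style matrix, shown here for 9 digits):

```python
# Identity-style weight matrix: w[j][i] = 1 only when neuron j is wired
# to key pattern x_i of the same digit.
K = 9
W = [[1 if i == j else 0 for i in range(K)] for j in range(K)]

def outputs(x):
    """y(j) = sum_i w_ji * x_i for each of the K neurons."""
    return [sum(W[j][i] * x[i] for i in range(K)) for j in range(K)]

# Present key pattern x1 (digit 1): only the first component is active.
x = [1] + [0] * (K - 1)
print(outputs(x))  # only neuron 1 fires: [1, 0, 0, 0, 0, 0, 0, 0, 0]
```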

Rules for knowledge representation

Rule-1: Similar inputs from similar classes should produce similar representations inside the network, and should belong to the same category. The concept of Euclidean distance is used as a measure of the similarity between inputs.

Let Xi denote the m x 1 input vector

Xi = [xi1, xi2, …, xim]T.

The vector Xi defines a point in an m-dimensional space called Euclidean space, denoted by Rm.

Now, the Euclidean distance between Xi and Xj is defined by

d(Xi, Xj) = ||Xi - Xj|| = [ Σ (k = 1 to m) (xik - xjk)^2 ]^(1/2)


The two inputs Xi and Xj are said to be similar if d(Xi, Xj) is minimum.
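The distance measure of Rule-1 can be sketched directly from the formula (the vectors below are illustrative):

```python
import math

def euclidean(xi, xj):
    """d(X_i, X_j) = ||X_i - X_j|| = sqrt(sum_k (x_ik - x_jk)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

print(euclidean([0, 0], [3, 4]))        # 5.0
print(euclidean([1, 2, 3], [1, 2, 3]))  # 0.0: identical inputs are maximally similar
```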

Rule-2: Items to be categorized as separate classes should be given widely different representations in the network.

This rule is the exact opposite of rule-1.

Rule-3: If a particular feature is important, then there should be a large number of neurons involved in the representation of that item in the network.

Ex: A radar application involving the detection of a target in the presence of clutter. The detection performance of such a radar system is measured in terms of two probabilities.

Probability of detection

Probability of false alarm

Rule-4: Prior information and invariances should be built into the design of a NN, thereby simplifying the network design by not having to learn them.

How to build prior information into NN design?

We can use a combination of two techniques:

1. Restricting the network architecture through the use of local connections known as receptive fields.

2. Constraining the choice of synaptic weight through the use of weight sharing.

How to build invariances into NN design?

Coping with a range of transformations of the observed signals.

Pattern recognition.

Need of a system that is capable of understanding the whole environment.

A primary requirement of pattern recognition is to design a classifier that is invariant to the transformations.

There are three techniques for rendering classifier-type NNs invariant to transformations:

1. Invariance by structure

2. Invariance by training

3. Invariant feature space


Basic Learning Laws

A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels. The network becomes more knowledgeable after each iteration of the learning process.

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.

The operation of a neural network is governed by neuronal dynamics. Neuronal dynamics consists of two parts: one corresponding to the dynamics of the activation state and the other corresponding to the dynamics of the synaptic weights.

The Short Term Memory (STM) in neural networks is modeled by the activation state of the network. The Long Term Memory (LTM) corresponds to the encoded pattern information in the synaptic weights due to learning.

Learning laws are merely implementation models of synaptic dynamics. Typically, a model of synaptic dynamics is described in terms of expressions for the first derivative of the weights. They are called learning equations.

Learning laws describe the weight vector for the ith processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:

Wi(t+1) = Wi(t) + ΔWi(t)

where ΔWi(t) is the change in the weight vector.

There are different methods for implementing the learning feature of a neural network, leading to several learning laws. Some basic learning laws are discussed below. All these learning laws use only local information for adjusting the weight of the connection between two units.

Hebb’s Laws

Here the change in the weight vector is given by

ΔWi(t) = η f(WiTa) a

Therefore, the jth component of ΔWi is given by

Δwij = η f(WiTa) aj
     = η si aj, for j = 1, 2, …, M

where si is the output signal of the ith unit, a is the input vector, and η is the learning rate parameter.

Hebb's law states that the weight increment is proportional to the product of the input data and the resulting output signal of the unit. This law requires weight initialization to small random values around wij = 0 prior to learning. This law represents unsupervised learning.
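A minimal sketch of one Hebbian update, assuming an identity output function f and a learning rate η of 0.1 (both illustrative):

```python
def hebb_update(w, x, f, eta=0.1):
    """Hebbian increment: delta_w_ij = eta * s_i * a_j,
    where s_i = f(w_i^T a) is the unit's output signal."""
    s = f(sum(wj * xj for wj, xj in zip(w, x)))   # output signal s_i
    return [wj + eta * s * xj for wj, xj in zip(w, x)]

# Small initial weights near zero (as the law requires), identity f:
w = [0.01, 0.02]
w = hebb_update(w, [1.0, 0.5], f=lambda v: v)
print(w)  # each weight grows in proportion to s_i * a_j
```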


Perceptron Learning Law

Here the change in the weight vector is given by

ΔWi = η [di - sgn(WiTa)] a

where sgn(x) is the sign of x. Therefore, we have

Δwij = η [di - sgn(WiTa)] aj
     = η (di - si) aj, for j = 1, 2, …, M.

The perceptron law is applicable only for bipolar output functions f(.). This is also called the discrete perceptron learning law. The expression for Δwij shows that the weights are adjusted only if the actual output si is incorrect, since the term in the square brackets is zero for the correct output.

This is a supervised learning law, as the law requires a desired output for each input. In implementation, the weights can be initialized to any random initial values, as they are not critical. The weights converge to the final values eventually by repeated use of the input-output pattern pairs, provided the pattern pairs are representable by the system.
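One step of the perceptron law can be sketched as follows (bipolar outputs, with an illustrative learning rate η = 0.5):

```python
def perceptron_update(w, x, d, eta=0.5):
    """delta_w_ij = eta * (d_i - sgn(w_i^T a)) * a_j; no change when correct."""
    s = 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1  # bipolar output
    return [wj + eta * (d - s) * xj for wj, xj in zip(w, x)]

w = [0.0, 0.0]
# Desired output -1 for input [1, 1]; the initial output is +1, so weights move:
w = perceptron_update(w, [1.0, 1.0], d=-1)
print(w)  # [-1.0, -1.0]
# Presenting the same pair again now yields the correct output -> no change:
print(perceptron_update(w, [1.0, 1.0], d=-1))  # still [-1.0, -1.0]
```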

Delta Learning Law

Here the change in the weight vector is given by

ΔWi = η [di - f(WiTa)] f'(WiTa) a

where f'(x) is the derivative of f with respect to x. Hence,

Δwij = η [di - f(WiTa)] f'(WiTa) aj
     = η [di - si] f'(xi) aj, for j = 1, 2, …, M.

This law is valid only for a differentiable output function, as it depends on the derivative of the output function f(.). It is a supervised learning law since the change in the weight is based on the error between the desired and the actual output values for a given input.

Delta learning law can also be viewed as a continuous perceptron learning law.

In implementation, the weights can be initialized to any random values, as the values are not very critical. The weights converge to the final values eventually by repeated use of the input-output pattern pairs. The convergence can be more or less guaranteed by using more layers of processing units in between the input and output layers. The delta learning law can be generalized to the case of multiple layers of a feedforward network.
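The delta rule can be sketched with the logistic output function, whose derivative is f'(v) = f(v)(1 - f(v)). The pattern pair and learning rate below are illustrative; repeated presentation moves the output toward the desired value.

```python
import math

def delta_update(w, x, d, eta=0.5):
    """delta_w_ij = eta * (d_i - f(v)) * f'(v) * a_j with the logistic f."""
    v = sum(wj * xj for wj, xj in zip(w, x))
    s = 1.0 / (1.0 + math.exp(-v))   # f(v)
    fprime = s * (1.0 - s)           # logistic derivative f'(v) = f(v)(1 - f(v))
    return [wj + eta * (d - s) * fprime * xj for wj, xj in zip(w, x)]

w = [0.0, 0.0]
for _ in range(200):                 # repeated presentation of one pattern pair
    w = delta_update(w, [1.0, 0.5], d=1.0)
v = w[0] * 1.0 + w[1] * 0.5
print(1.0 / (1.0 + math.exp(-v)))    # the output has moved toward the target 1.0
```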


Widrow and Hoff LMS Learning Law

Here, the change in the weight vector is given by

ΔWi = η [di - WiTa] a

Hence

Δwij = η [di - WiTa] aj, for j = 1, 2, …, M.

This is a supervised learning law and is a special case of the delta learning law, where the output function is assumed linear, i.e., f(xi) = xi.

In this case the change in the weight is made proportional to the negative gradient of the error between the desired output and the continuous activation value, which is also the continuous output signal due to linearity of the output function. Hence, this is also called the Least Mean Squared (LMS) error learning law.

In implementation, the weights may be initialized to any values. The input-output pattern pairs data is applied several times to achieve convergence of the weights for a given set of training data. The convergence is not guaranteed for any arbitrary training data set.

Correlation Learning Law

Here, the change in the weight vector is given by

ΔWi = η di a

Therefore,

Δwij = η di aj

This is a special case of Hebbian learning with the output signal (si) being replaced by the desired signal (di). But Hebbian learning is unsupervised learning, whereas correlation learning is supervised learning, since it uses the desired output value to adjust the weights. In the implementation of the learning law, the weights are initialized to small random values close to zero, i.e., wij ≈ 0.


Instar (Winner-take-all) Learning Law

This is relevant for a collection of neurons, organized in a layer as shown below.

All the inputs are connected to each of the units in the output layer in a feedforward manner. For a given input vector a, the output from each unit i is computed using the weighted sum WiTa. The unit k that gives the maximum output is identified. That is,

WkTa = max over i (WiTa)

Then the weight vector leading to the kth unit is adjusted as follows:

ΔWk = η (a - Wk)

Therefore,

Δwkj = η (aj - wkj), for j = 1, 2, …, M.

The final weight vector tends to represent a group of input vectors within a small neighbourhood. This is a case of unsupervised learning. In implementation, the values of the weight vectors are initialized to random values prior to learning, and the vector lengths are normalized during learning.
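One winner-take-all step can be sketched as follows (two units, illustrative initial weights, η = 0.5):

```python
def instar_step(weights, x, eta=0.5):
    """Find the winning unit k = argmax_i w_i^T a, then move only its
    weight vector toward the input: delta_w_kj = eta * (a_j - w_kj)."""
    k = max(range(len(weights)),
            key=lambda i: sum(w * a for w, a in zip(weights[i], x)))
    weights[k] = [w + eta * (a - w) for w, a in zip(weights[k], x)]
    return k

# Two units competing for one input:
W = [[0.9, 0.1], [0.1, 0.9]]
x = [1.0, 0.0]
winner = instar_step(W, x)
print(winner)  # unit 0 wins: its weights were most aligned with x
print(W[0])    # the winner's weights have moved halfway toward x
```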

Outstar Learning Law

The outstar learning law is also related to a group of units arranged in a layer as shown below.


In this law the weights are adjusted so as to capture the desired output pattern characteristics. The adjustment of the weights is given by

ΔWjk = η (dj - wjk), for j = 1, 2, …, M

where the kth unit is the only active unit in the input layer. The vector d = (d1, d2, …, dM)T is the desired response from the layer of M units.

The outstar learning is a supervised learning law, and it is used with a network of instars to capture the characteristics of the input and output patterns for data compression. In implementation, the weight vectors are initialized to zero prior to learning.

Pattern Recognition

Data refers to the collection of raw facts, whereas, the pattern refers to an observed sequence of facts.

The main difference between human and machine intelligence comes from the fact that humans perceive everything as a pattern, whereas for a machine everything is data. Even in routine data consisting of integer numbers (like telephone numbers, bank account numbers, car numbers) humans tend to perceive a pattern. If there is no pattern, then it is very difficult for a human being to remember and reproduce the data later.

Thus storage and recall operations in human beings and machines are performed by different mechanisms. The pattern nature in storage and recall automatically gives robustness and fault tolerance for the human system.

Pattern recognition tasks

Pattern recognition is the process of identifying a specified sequence that is hidden in a large amount of data.

Following are the pattern recognition tasks.

1. Pattern association

2. Pattern classification

3. Pattern mapping

4. Pattern grouping

5. Feature mapping

6. Pattern variability

7. Temporal patterns

8. Stability-plasticity dilemma


Basic ANN Models for Pattern Recognition Problems

1. Feedforward ANN

Pattern association

Pattern classification

Pattern mapping/classification

2. Feedback ANN

Autoassociation

Pattern storage (LTM)

Pattern environment storage (LTM)

3. Feedforward and Feedback (Competitive Learning) ANN

Pattern storage (STM)

Pattern clustering

Feature mapping

In any pattern recognition task we have a set of input patterns and the corresponding output patterns. Depending on the nature of the output patterns and the nature of the task environment, the problem could be identified as one of association or classification or mapping.

The given set of input-output pattern pairs form only a few samples of an unknown system. From these samples the pattern recognition model should capture the characteristics of the system.

Without looking into the details of the system, let us assume that the input-output patterns are available or given to us. Without loss of generality, let us also assume that the patterns could be represented as vectors in multidimensional spaces.