Additional NN Models
(I) Reinforcement Learning (RL)

• Basic ideas:
– Supervised learning (delta rule, BP): samples (x, f(x)) are given to learn f(.); a precise error can be determined and is used to drive the learning.
– Unsupervised learning (competitive, BM): no target/desired output is provided to help learning; learning is self-organized (clustering).
– Reinforcement learning: in between the two. No target output is given for the input vectors in the training samples; instead, a judge/critic evaluates the output:
  good: reward signal (+1); bad: penalty signal (-1)

• RL exists in many places
– It originated from psychology (animal training).
– In the machine learning community there are different theories and algorithms. A major difficulty is credit/blame distribution, e.g.,
  chess playing: win/lose known only at the end (multi-step)
  soccer playing: win/lose shared by the team (multi-player)
– In many applications it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide the precise correct answer/error.
– It is up to the learning process to improve the system's performance based on the critic's signal.

• Principle of RL
– Let r = +1 denote reward (good output) and r = -1 denote penalty (bad output).
– If r = +1, the system is encouraged to continue what it is doing; if r = -1, the system is encouraged not to do what it is doing.
– The system needs to search for a better output, because r = -1 does not indicate what the good output should be; a common method is "random search".
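The reward/penalty principle above can be sketched with a single stochastic unit. This is a minimal illustration, not the ARP algorithm itself: the names (critic, TARGET, ETA) and the update rule w += eta·r·z are made up for the sketch, and the critic only ever says +1 or -1, never what the target is.

```python
import math
import random

random.seed(0)
ETA = 0.5
TARGET = 1  # hidden from the learner; only the critic knows it


def critic(output):
    """Judge the output: +1 (reward) if good, -1 (penalty) otherwise."""
    return 1 if output == TARGET else -1


w = 0.0  # bias-like weight; P(output = +1) grows with w
for step in range(200):
    p = 1.0 / (1.0 + math.exp(-w))        # logistic probability of emitting +1
    z = 1 if random.random() < p else -1  # random search: stochastic output
    r = critic(z)
    # if rewarded, reinforce what was just done; if penalized, move away from it
    w += ETA * r * z

print(w > 0)  # True: the unit now strongly prefers the rewarded output
```

Note that the learner never sees TARGET directly; it recovers the good output purely from the +1/-1 signal, which is the defining feature of RL.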
• ARP: the associative reward-and-penalty algorithm for NNs (Barto and Anandan, 1985)
– Architecture
  [Figure: the input x(k) feeds a layer of stochastic units z(k), which produce the output y(k); a critic evaluates the output and returns r(k).]
  input: x(k); output: y(k); stochastic units: z(k), for random search
– Random search by stochastic units z_i:
    p(z_i = 1) = 1 / (1 + e^(-2·net_i/T)),   p(z_i = -1) = 1 / (1 + e^(2·net_i/T)),   where net_i = Σ_j w_ij x_j
  Alternatively, let z_i obey a continuous probability distribution function, or let z_i contain a random noise term that obeys a certain distribution.
  Key: z is not a deterministic function of x; this gives z a chance to be a good output.
– Prepare a (temporary) desired output:
    d(k) = y(k)   if r(k) = +1
    d(k) = -y(k)  if r(k) = -1
– Compute the errors at the z layer:
    e(k) = d(k) - E(z(k))
  where E(z(k)) is the expected value of z(k), because z is a random variable. How to compute E(z(k)):
  • take the average of z over a period of time;
  • compute it from the distribution, if possible;
  • if the logistic sigmoid above is used,
    E(z_i) = (+1)·p(z_i = 1) + (-1)·p(z_i = -1) = tanh(net_i / T)
– Training: BP or another method to minimize the error.
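With the logistic form given above (including the factor 2 in the exponent), the expectation of the ±1-valued stochastic unit collapses to tanh(net_i/T). A quick numeric check (the values of net and T are arbitrary):

```python
import math

def p_plus(net, T):
    """p(z = +1) for the ARP stochastic unit."""
    return 1.0 / (1.0 + math.exp(-2.0 * net / T))

def expected_z(net, T):
    """E[z] over the two outcomes {+1, -1}."""
    p1 = p_plus(net, T)
    return (+1) * p1 + (-1) * (1.0 - p1)

net, T = 0.7, 1.5
print(abs(expected_z(net, T) - math.tanh(net / T)) < 1e-12)  # True
```

The identity holds because 2·p(z = 1) - 1 = (1 - e^(-2·net/T)) / (1 + e^(-2·net/T)) = tanh(net/T).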
(II) Probabilistic Neural Networks
1. Purpose: classify a given input pattern x into one of the pre-defined classes by the Bayesian decision rule. Suppose there are k predefined classes s_1, ..., s_k.
  P(s_i): prior probability of class s_i
  P(x|s_i): conditional probability of x, given class s_i
  P(x): probability of x
  P(s_i|x): posterior probability of s_i, given x
Example: S = s_1 ∪ s_2 ∪ ... ∪ s_k, the set of all patients;
  s_i: the set of all patients having disease i;
  x: a description of a patient (manifestation);
  P(x|s_i): probability that one with disease i will have description x;
  P(s_i|x): probability that one with description x has disease i.
By Bayes' theorem:
  P(s_i|x) = P(x|s_i)·P(s_i) / P(x)
Since P(x) is the same constant for all classes, classify x into class s_i iff
  P(s_i|x) = max_j P(s_j|x), i.e., iff P(x|s_i)·P(s_i) = max_j P(x|s_j)·P(s_j)
In PNN, the P(x|s_i) are learned from examples.
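The decision rule above can be sketched directly: pick the class maximizing P(x|s_i)·P(s_i), with P(x) dropped as a common constant. The two one-dimensional Gaussian class densities and the priors below are made-up illustrations, not part of any PNN training procedure.

```python
import math

def gauss(x, mu, sigma):
    """One-dimensional Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

priors = {"s1": 0.5, "s2": 0.5}                       # P(s_i), assumed known
likelihood = {"s1": lambda x: gauss(x, 0.0, 1.0),     # P(x|s1), illustrative
              "s2": lambda x: gauss(x, 3.0, 1.0)}     # P(x|s2), illustrative

def classify(x):
    # P(x) is identical for every class, so comparing P(x|s_i)*P(s_i) suffices
    return max(priors, key=lambda s: likelihood[s](x) * priors[s])

print(classify(0.2), classify(2.9))  # s1 s2
```

With equal priors the rule reduces to maximum likelihood; unequal priors would shift the decision boundary toward the less probable class.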
2. PNN architecture: feedforward with 2 hidden layers. Learning is used not to minimize an error but to obtain P(x|s_i).
3. Learning assumptions: the P(s_i) are known, and P(x|s_i) obeys a Gaussian distribution.
Estimate P(x|s_i) from the m_i training samples x_i1, ..., x_im_i of class s_i. Recall the one-dimensional Gaussian density
  f(x) = (1 / (√(2π)·σ)) · exp(-(x - μ)² / (2σ²))
For n-dimensional x, PNN averages one Gaussian per training sample of the class:
  P(x|s_i) = (1 / ((2π)^(n/2) · σ^n · m_i)) · Σ_{j=1}^{m_i} exp(-‖x - x_ij‖² / (2σ²))
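The estimate above is an equally weighted sum of spherical Gaussians, one centered on each stored training sample of class s_i. A minimal sketch, with SIGMA and the sample set made up:

```python
import math

SIGMA = 1.0  # illustrative smoothing width

def p_x_given_class(x, samples):
    """Average-of-Gaussians estimate of P(x|s_i) from the class's samples."""
    n = len(x)
    norm = (2 * math.pi) ** (n / 2) * SIGMA ** n * len(samples)
    total = 0.0
    for s in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(x, s))  # squared distance to sample
        total += math.exp(-d2 / (2 * SIGMA ** 2))
    return total / norm

class_samples = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.3)]  # made-up training data
near = p_x_given_class((0.0, 0.1), class_samples)
far = p_x_given_class((5.0, 5.0), class_samples)
print(near > far)  # True: density is higher near the stored samples
```

In the PNN architecture each stored sample becomes one pattern-layer node, which is why classification trades nodes for time.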
4. Comments:
(1) Bayesian classification by the decision rule above.
(2) Fast classification (especially if implemented on a parallel machine).
(3) Fast learning.
(4) Trades nodes for time (not good with large training samples/clusters).
(III) Recurrent BP
1. Recurrent networks: networks with feedback links.
  - The state (output) of the network evolves along the time.
  - May or may not have hidden nodes.
  - May or may not stabilize when t → ∞.
  - The learning problem: how to learn W so that an initial state (input) will lead to a stable state with the desired output.
2. Unfolding: for any recurrent network with a finite evolution time, there is an equivalent feedforward network.
  Problems: too many repetitions (too many layers) when the network needs a long time to reach a stable state, and standard BP needs to be revised to handle the duplicated (shared) weights.
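The equivalence behind unfolding can be sketched as follows: running a recurrent net for T steps computes exactly what a T-layer feedforward net with the same (duplicated) weights in every layer would compute. The tiny 2-unit net and T are made up.

```python
import math

W = [[0.0, 0.5], [-0.5, 0.0]]  # illustrative recurrent weights
x = [1.0, 0.0]                  # external input
T = 4                           # finite evolution time

def step(v):
    """One recurrent update: v <- g(W v + x), with g = tanh."""
    return [math.tanh(sum(W[i][j] * v[j] for j in range(2)) + x[i])
            for i in range(2)]

# recurrent view: iterate the same update T times
v = [0.0, 0.0]
for _ in range(T):
    v = step(v)

# unfolded view: T distinct "layers", each reusing the same W
u = [0.0, 0.0]
for _layer in range(T):
    u = step(u)

print(v == u)  # True: the two views compute the same states
```

The catch, as noted above, is that the unfolded net's layers share one weight matrix, so the standard per-layer BP update must be modified to keep the copies identical.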
3. Recurrent BP (1987)
System dynamics:
  dv_i/dt = -v_i + g(h_i),  where h_i = Σ_j w_ij v_j + x_i
Assume at least one fixed point exists for the system with the given initial state. When a fixed point is reached,
  v_i = g(Σ_j w_ij v_j + x_i)
can be obtained. Error (summed over the output units k, with targets t_k):
  E = (1/2) Σ_k e_k²,  where e_k = t_k - v_k
Take the gradient descent approach to minimize E by updating W. Direct derivation gives
  ΔW_st = -η · ∂E/∂W_st = η Σ_k e_k · ∂v_k/∂W_st = η Σ_k e_k · q_ks · g'(h_s) · v_t
where q_ks is an element of the matrix Q = P^(-1), and P has elements
  p_ij = δ_ij - g'(h_i)·w_ij,  with δ_ij = 1 if i = j and 0 if i ≠ j
Computing P^(-1) is very time consuming. Pineda's and Almeida's proposal: the quantity
  z_s = Σ_k e_k · q_ks
can be computed by another recurrent net with a structure identical to the original RN but with the direction of each arc reversed (the transposed network): if the weight from node j to node i is w_ij in the original network, the weight from node j to node i in the transposed network is w_ji. The transposed network evolves by
  dz_i/dt = -z_i + Σ_j g'(h_j)·w_ji·z_j + e_i
Its fixed point satisfies z_i = Σ_j g'(h_j)·w_ji·z_j + e_i, which is exactly Σ_j z_j·p_ji = e_i, i.e., z_s = Σ_k e_k·q_ks. The weight update then becomes
  ΔW_st = η · z_s · g'(h_s) · v_t
Weight-update procedure for RBP, with a given input and its desired output (y(0), d):
1. Relax the original network to a fixed point.
2. Compute the error e.
3. Relax the transposed network to a fixed point.
4. Update the weights of the original network.

The complete learning algorithm (incremental/sequential): W is updated on the presentation of each learning pair using the weight-update procedure. To ensure the learned network is stable, the learning rate must be small (much smaller than the rate for standard BP learning). It is time consuming: two relaxation processes are involved for each step of weight update. It shows better performance than BP in some applications.
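The four-step procedure above can be sketched on a tiny fully recurrent net. This is an illustration under made-up settings (network size, rates, target), not a reference implementation: relaxation is done by Euler integration of the two dynamics, and the update is ΔW_st = η·z_s·g'(h_s)·v_t.

```python
import math
import random

random.seed(1)
N = 3                               # 3 units; unit 2 is the output unit
ETA, STEPS, DT = 0.1, 200, 0.1
W = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(N)]
x = [0.5, -0.3, 0.0]                # external input
t = {2: 0.8}                        # desired output for unit 2

g = math.tanh
def gprime(h):
    return 1.0 - math.tanh(h) ** 2  # derivative of tanh

def relax_forward():
    """Step 1: relax dv_i/dt = -v_i + g(sum_j w_ij v_j + x_i) to a fixed point."""
    v = [0.0] * N
    for _ in range(STEPS):
        h = [sum(W[i][j] * v[j] for j in range(N)) + x[i] for i in range(N)]
        v = [v[i] + DT * (-v[i] + g(h[i])) for i in range(N)]
    h = [sum(W[i][j] * v[j] for j in range(N)) + x[i] for i in range(N)]
    return v, h

def relax_transposed(h, e):
    """Step 3: relax dz_i/dt = -z_i + sum_j g'(h_j) w_ji z_j + e_i."""
    z = [0.0] * N
    for _ in range(STEPS):
        z = [z[i] + DT * (-z[i]
             + sum(gprime(h[j]) * W[j][i] * z[j] for j in range(N))
             + e[i]) for i in range(N)]
    return z

def error_sq():
    v, _ = relax_forward()
    return sum((t[k] - v[k]) ** 2 for k in t)

before = error_sq()
for _ in range(30):                 # a few presentations of the learning pair
    v, h = relax_forward()                                    # step 1
    e = [(t[i] - v[i]) if i in t else 0.0 for i in range(N)]  # step 2
    z = relax_transposed(h, e)                                # step 3
    for s in range(N):                                        # step 4
        for u in range(N):
            W[s][u] += ETA * z[s] * gprime(h[s]) * v[u]
print(error_sq() < before)  # error at the fixed point decreased
```

Note the cost remark above in action: every weight update requires two full relaxations, one of the original net and one of the transposed net.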
(IV) Networks of Radial Basis Functions
1. Motivation: better function approximation.
  – BP networks (hidden units are sigmoid): training time is very long; generalization (with non-training input) is not always good.
  – Counter-Propagation (hidden units are WTA): poor approximation, especially with interpolation; any input is forced to be classified into one class and in turn produces that class's output as its function value.
2. Architecture: input → hidden → output (similar to BP and CPN); operation/learning is similar to CPN:
  input → hidden: competitive learning to determine the class of the input
  hidden → output: delta rule (LMS error) for the mapping
  difference: the hidden units obey a radial basis function
3. Hidden unit: Gaussian function. Suppose unit i represents a class of inputs with centroid C_i:
  φ_i(x) = exp(-‖x - C_i‖² / (2σ²)),  where ‖x - C_i‖² = Σ_j (x_j - c_ij)²
This is called a radial basis function: input vectors with equal distance to C_i produce the same output,
  if ‖x1 - C_i‖ = ‖x2 - C_i‖ then φ_i(x1) = φ_i(x2)
Each hidden unit i has a receptive field with C_i as its center: if x = C_i, unit i has the largest output, and the farther x is from C_i, the smaller the output. The size of the receptive field is determined by σ.
During computation the hidden units are not WTA (no lateral inhibition): with an input x, usually more than one hidden unit has a non-zero output. These outputs are combined at the output layer to produce a better approximation.
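The two properties just stated (maximal output at the centroid, equal output for equidistant inputs) follow directly from the Gaussian form. A minimal sketch with a made-up centroid and σ:

```python
import math

SIGMA = 1.0
C = (1.0, 0.0)  # centroid of hidden unit i (illustrative)

def phi(x):
    """Gaussian radial basis activation of unit i for input x."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, C))
    return math.exp(-d2 / (2 * SIGMA ** 2))

print(phi(C) == 1.0)                                     # largest output at x = C_i
print(abs(phi((2.0, 0.0)) - phi((0.0, 0.0))) < 1e-12)    # equidistant inputs agree
```

Both printed values are True: (2, 0) and (0, 0) lie at distance 1 from C on opposite sides, so their activations are identical.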
4. Learning
  input → hidden: the centroids C_i are learned by competitive learning, based on net_i; σ is set ad hoc (performance is not sensitive to σ)
  hidden → output: delta rule (LMS)
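The hidden → output stage above can be sketched as follows, with the competitive stage skipped: the centroids are simply assumed already found, and σ, the learning rate, and the sin target are made-up illustrations. Only the output weights are trained, by the delta rule (LMS).

```python
import math

SIGMA, ETA = 0.6, 0.2
CENTERS = [0.0, 0.5, 1.0]           # assume centroids already found (made up)

def hidden(x):
    """Gaussian RBF activations of the hidden layer for scalar input x."""
    return [math.exp(-(x - c) ** 2 / (2 * SIGMA ** 2)) for c in CENTERS]

w = [0.0] * len(CENTERS)             # output weights, to be learned
target = math.sin                    # function to approximate on [0, 1]
samples = [i / 20 for i in range(21)]

def predict(x):
    return sum(wi * hi for wi, hi in zip(w, hidden(x)))

for _ in range(500):                 # delta rule: w <- w + eta * error * h
    for x in samples:
        err = target(x) - predict(x)
        h = hidden(x)
        w = [wi + ETA * err * hi for wi, hi in zip(w, h)]

mse = sum((target(x) - predict(x)) ** 2 for x in samples) / len(samples)
print(mse < 1e-2)  # the 3-unit RBF net fits sin on [0, 1] closely
```

Note how several hidden units respond to each input (no WTA), and the output layer blends their overlapping receptive fields into a smooth interpolation.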
5. Comments
• Compared with BP:
  – can approximate any L2 function (same as BP)
  – may have better …
  – usually requires many more training samples and many more hidden units
  – only one hidden layer is needed
  – training is faster
• Compared with CPN:
  – much better function approximation
  – theoretical analysis is only preliminary