
Upload: tyler-norton

Post on 29-Jan-2016


Introduction to Neural Networks

Applied to OCR and Speech Recognition

[Figure: an actual neuron, labeled with its axon, dendrites, synapse, and soma, shown beside a crude model of a neuron]

• An actual neuron

• A crude model of a neuron

• Computational Neural Networks: a computational approach inspired by the architecture of the biological nervous system


Cat Neural Probe to Study Response

[Figure: a probe inserted into a neuron of an (unlucky) cat feeds an oscilloscope; the stimulus is a light that is either shone or off, and the oscilloscope traces the neural response in millivolts along a time axis]


The Perceptron Model

[Figure: inputs I1, I2, I3, I4 are multiplied by weights w1, w2, w3, w4, summed, and passed through a threshold to produce the output O]
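As a minimal sketch of the sum-and-threshold unit (the function name and sample numbers are my own):

```python
def perceptron(inputs, weights, threshold):
    # Weighted sum of the inputs, then a hard threshold:
    # output 1 if the sum exceeds the threshold, else 0.
    v = sum(i * w for i, w in zip(inputs, weights))
    return 1 if v > threshold else 0

# Four inputs I1..I4 with weights w1..w4, as in the figure.
print(perceptron([1, 0, 1, 1], [0.5, 0.2, 0.3, 0.1], 0.7))  # 0.5 + 0.3 + 0.1 = 0.9 > 0.7, so 1
```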


Truth tables for AND, OR, and NOT:

I1  I2 | AND  OR  NOT(I1)
 0   0 |  0    0     1
 0   1 |  0    1     1
 1   0 |  0    1     0
 1   1 |  1    1     0

The unit computes the weighted sum I1·w1 + I2·w2 and outputs 1 if it exceeds the threshold T, else 0.

Example Weights: AND, OR & NOT Problems

          w1   w2    T
AND:       1    1   1.5
OR:        1    1   0.5
NOT(I1):  -1    0  -0.5
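The example weights can be checked directly; this small sketch (names are my own) runs all four input pairs through a two-input threshold unit:

```python
def unit(i1, i2, w1, w2, t):
    # Output 1 if I1*w1 + I2*w2 exceeds the threshold T, else 0.
    return 1 if i1 * w1 + i2 * w2 > t else 0

for i1 in (0, 1):
    for i2 in (0, 1):
        and_o = unit(i1, i2, 1, 1, 1.5)    # AND: w1=1, w2=1, T=1.5
        or_o  = unit(i1, i2, 1, 1, 0.5)    # OR:  w1=1, w2=1, T=0.5
        not_o = unit(i1, i2, -1, 0, -0.5)  # NOT(I1): w1=-1, w2 unused, T=-0.5
        print(i1, i2, and_o, or_o, not_o)
```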


Weight Adjustments

Eqn: V = I1·w1 + I2·w2 + 1·w3, output 1 when V > 0

For the AND table, each training item constrains the weights:

I1  I2  O   constraint
 0   0  0   w3 <= 0
 0   1  0   w2 + w3 <= 0
 1   0  0   w1 + w3 <= 0
 1   1  1   w1 + w2 + w3 > 0
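One concrete weight choice satisfying all four AND constraints is w1 = 1, w2 = 1 with the threshold folded into the bias weight as w3 = -1.5; a quick check (illustration only):

```python
w1, w2, w3 = 1, 1, -1.5  # candidate weights for AND, with w3 as the bias weight

# The four constraints from the AND table (V = I1*w1 + I2*w2 + 1*w3, output 1 iff V > 0):
assert w3 <= 0             # (0,0) -> 0
assert w2 + w3 <= 0        # (0,1) -> 0
assert w1 + w3 <= 0        # (1,0) -> 0
assert w1 + w2 + w3 > 0    # (1,1) -> 1
print("all four AND constraints satisfied")
```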


3-D and 2-D Plot of AND Table

Problem is to find a plane that separates the “on” circle from the “off” circles.

- Output is 0

- Output is 1


Training Procedure

1. First assign any values to w1, w2 and w3.

2. Using the current weight values w1, w2 and w3 and the next training item inputs I1 and I2, compute the value:

V = I1·w1 + I2·w2 + 1·w3

3. If V > 0, set the computed output C to 1; else set it to 0.

4. If the computed output C is not the same as the current training item output O, adjust the weights.

5. Repeat steps 2-4. If you run out of training items, start again with the first training item. Stop repeating if no weight changes through 1 complete training cycle.


Gradient Descent Algorithm

w1(next) = w1(current) − η·I1·(C − O)

w2(next) = w2(current) − η·I2·(C − O)

w3(next) = w3(current) − η·(C − O)

where C is the computed output, O is the training item output, and η is a gain factor (see the Appendix for the derivation).
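The training procedure and the weight update can be sketched together (a minimal illustration; the function name, learning rate, and zero initialization are my own choices):

```python
def train_perceptron(samples, eta=0.5, max_epochs=100):
    # Perceptron training: steps 1-5 of the procedure, with the
    # gradient-descent weight adjustment on each misclassified item.
    w1 = w2 = w3 = 0.0                        # step 1: initial weights
    for _ in range(max_epochs):
        changed = False
        for (i1, i2), o in samples:           # step 2: next training item
            v = i1 * w1 + i2 * w2 + 1 * w3    # V = I1*w1 + I2*w2 + 1*w3
            c = 1 if v > 0 else 0             # step 3: threshold
            if c != o:                        # step 4: adjust weights on error
                w1 -= eta * i1 * (c - o)
                w2 -= eta * i2 * (c - o)
                w3 -= eta * (c - o)
                changed = True
        if not changed:                       # step 5: stop after an error-free cycle
            break
    return w1, w2, w3

and_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_table))  # weights that realize AND
```

Because AND is linearly separable, the loop terminates with weights that classify all four rows correctly.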


Linearly vs. Non-Linearly Separable


The XOR Problem is Linearly Non-separable
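One way to see this is an exhaustive search: over a grid of candidate weights and thresholds (the grid and names are my own), a single threshold unit can realize AND but never XOR:

```python
def solvable(table, grid):
    # True if some (w1, w2, t) in the grid makes a single
    # threshold unit reproduce every row of the truth table.
    for w1 in grid:
        for w2 in grid:
            for t in grid:
                if all((1 if i1 * w1 + i2 * w2 > t else 0) == o
                       for (i1, i2), o in table):
                    return True
    return False

grid = [x / 2 for x in range(-6, 7)]  # -3.0 .. 3.0 in steps of 0.5
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(solvable(AND, grid))  # True  - AND is linearly separable
print(solvable(XOR, grid))  # False - no line separates the XOR outputs
```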


The Back Propagation Model

[Figure: a backpropagation network drawn as layers between input and output — five 6-input, 1-output perceptrons feed four 5-input, 1-output perceptrons, which feed three 4-input, 1-output perceptrons; the middle layers are the hidden layers. One backprop unit is the familiar perceptron: inputs I1–I4 weighted by w1–w4, summed, and thresholded to give O]
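A forward pass through such a layered network of threshold units might be sketched like this (all names and the random weights are my own; the layer sizes follow the 6-input, 5-unit, 4-unit, 3-unit structure above):

```python
import random

def layer(inputs, weights, thresholds):
    # One layer of threshold units: every unit sees all the inputs,
    # with one weight vector and one threshold per unit.
    return [1 if sum(i * w for i, w in zip(inputs, ws)) > t else 0
            for ws, t in zip(weights, thresholds)]

random.seed(0)
def rand_layer(n_units, n_inputs):
    return ([[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)],
            [random.uniform(-1, 1) for _ in range(n_units)])

# 6 inputs -> five 6-input units -> four 5-input units -> three 4-input units
w1, t1 = rand_layer(5, 6)
w2, t2 = rand_layer(4, 5)
w3, t3 = rand_layer(3, 4)

x = [1, 0, 1, 1, 0, 0]
out = layer(layer(layer(x, w1, t1), w2, t2), w3, t3)
print(out)  # three binary outputs
```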


Advantage of Backprop over Perceptron

[Figure: input points from two classes clustered by two features (feature 1 vs. feature 2). Layer 1: decision boundaries drawn. Layer 2: decision regions determined. Layer 3: decision regions grouped]


Backprop Learning Algorithm

1. Assign random values to all the weights

2. Choose a pattern from the training set (similar to perceptron).

3. Propagate the signal through to get final output (similar to perceptron).

4. Compute the error for the output layer (similar to the perceptron).

5. Compute the errors in the preceding layers by propagating the error backwards.

6. Change the weight between neuron A and each neuron B in another layer by an amount proportional to the observed output of B and the error of A.

7. Repeat from step 2 with the next training sample.
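A compact sketch of these steps on the XOR problem (a 2–3–1 network; all names, the learning rate, and the sigmoid activation are my own choices — the slides' hard threshold is replaced by a differentiable sigmoid so the error can be propagated backwards):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
# Step 1: random weights. Each weight vector carries a trailing bias weight.
hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3 units, 2 inputs + bias
output = [random.uniform(-1, 1) for _ in range(4)]                      # 3 hidden outputs + bias
eta = 0.5
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(i1, i2):
    x = [i1, i2, 1.0]  # inputs plus bias input
    h = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in hidden]
    o = sigmoid(sum(w * v for w, v in zip(output, h + [1.0])))
    return x, h, o

for _ in range(20000):
    (i1, i2), target = random.choice(XOR)  # step 2: choose a pattern
    x, h, o = forward(i1, i2)              # step 3: propagate the signal
    d_o = (target - o) * o * (1 - o)       # step 4: output-layer error
    d_h = [output[j] * d_o * h[j] * (1 - h[j]) for j in range(3)]  # step 5: propagate error back
    hb = h + [1.0]
    for j in range(4):                     # step 6: change weights in proportion
        output[j] += eta * d_o * hb[j]     #         to the error and the observed output
    for j in range(3):
        for k in range(3):
            hidden[j][k] += eta * d_h[j] * x[k]

for (i1, i2), target in XOR:
    print((i1, i2), round(forward(i1, i2)[2], 3))
```

With enough updates the outputs usually approach the XOR targets, though convergence from a random start is not guaranteed.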


Application: Needs Enough Training

[Figure: the same two-class clusters shown with a small and a large training set]

- A small training set means many possible decision boundaries.

- A large training set constrains the decision boundary more.


Application to Speech Processing


Speech


Appendix

V = I1·w1 + I2·w2 + 1·w3

C = threshold(V) = current observed output

O = current training set output

The difference between the observed output C and the desired output O may be used to measure the error function E:

E = (C − O)²   (2)

The maximum decrease in the error would be accomplished when the weights move against the gradient of E:

dw/dt = −η·∂E/∂w   (3)

since then dE/dt = (∂E/∂w)·(dw/dt) = −η·(∂E/∂w)² ≤ 0.

Differentiating Equation 2 to get an expression for dw (treating C as V itself, so that E is differentiable):

∂E/∂w1 = 2·(C − O)·I1

dw1/dt = −2·η·(C − O)·I1

w(next) = w(current) + Δw = w(current) − η·I·(C − O)

where η is a gain factor.
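The derivative in the appendix can be checked numerically; this sketch (all names and numbers are my own) compares a finite-difference estimate of ∂E/∂w1 against 2·(C − O)·I1 for the linear unit C = V:

```python
def error(w1, w2, w3, i1, i2, o):
    # Treat C as V itself (no threshold) so that E = (C - O)^2 is differentiable.
    c = i1 * w1 + i2 * w2 + 1 * w3
    return (c - o) ** 2

w1, w2, w3 = 0.3, -0.2, 0.1
i1, i2, o = 1, 1, 1
eps = 1e-6

# Central finite difference approximates dE/dw1.
numeric = (error(w1 + eps, w2, w3, i1, i2, o) - error(w1 - eps, w2, w3, i1, i2, o)) / (2 * eps)
c = i1 * w1 + i2 * w2 + 1 * w3
analytic = 2 * (c - o) * i1
print(abs(numeric - analytic) < 1e-6)  # True
```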


Error Propagation Algorithm

If neuron A in one layer is connected to neurons B, C, and D in the output layer, it is partly responsible for the errors observed at B, C, and D. Thus the error at A is computable by summing the errors at B, C, and D, each weighted by the connection strength between A and that neuron.
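As a tiny numeric sketch (the error values and connection strengths are invented for illustration), the error at A would be computed as:

```python
# Error at hidden neuron A as the weighted sum of output-layer errors.
errors = {"B": 0.2, "C": -0.1, "D": 0.05}         # errors observed at output neurons
weights_from_A = {"B": 0.5, "C": 0.8, "D": -0.3}  # connection strengths A -> B, C, D

error_A = sum(weights_from_A[n] * errors[n] for n in errors)
print(error_A)  # 0.5*0.2 + 0.8*(-0.1) + (-0.3)*0.05 ≈ 0.005
```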