
Rutgers CS440, Fall 2003

Neural networks

Reading: Ch. 20, Sec. 5, AIMA 2nd Ed

Outline

• Human learning, brain, neurons
• Artificial neural networks
• Perceptron
• Learning in NNs
• Multi-layer NNs
• Hopfield networks

Brains & neurons

• The brain is composed of over a billion nerve cells that communicate with each other through specialized contacts called synapses. With ~1000 synapses per neuron, this yields extensive and elaborate neural circuits.

• Some synapses excite neurons and cause them to generate signals called action potentials: large, transient voltage changes that propagate down their axons. Other synapses are inhibitory and prevent the neuron from generating action potentials. The action potential propagates down the axon to the sites where the axon has made synapses with the dendrites of other nerve cells.

http://www.medicine.mcgill.ca/physio/cooperlab/cooper.htm

Artificial neurons

• Computational neuroscience
• McCulloch-Pitts model

[Diagram: a McCulloch-Pitts unit j. Input links carry x_1, ..., x_i, ..., x_N plus a fixed offset/bias input x_0 = -1; the weights w_j0, w_j1, ..., w_ji, ..., w_jN feed the input function, which computes in_j; the activation function g_j produces the output x_j on the output links.]

x_j = g_j(in_j) = g_j( Σ_{i=0}^{N} w_ji x_i ) = g_j( w_j' x ),   where  w_j = (w_j0, w_j1, ..., w_jN)'  and  x = (x_0, x_1, ..., x_N)'  with  x_0 = -1
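To make the unit above concrete, here is a minimal Python sketch of the forward computation; the helper name unit_output and the example weights/inputs are made up for illustration.

```python
# A McCulloch-Pitts style unit: x_j = g_j(sum_i w_ji * x_i), with a fixed
# offset input x_0 = -1. The weights and inputs below are made-up examples.
def unit_output(weights, inputs, g):
    x = [-1.0] + list(inputs)                           # prepend offset input x_0 = -1
    in_j = sum(w * xi for w, xi in zip(weights, x))     # input function: weighted sum
    return g(in_j)                                      # activation function

threshold = lambda v: 1.0 if v >= 0 else 0.0
print(unit_output([0.5, 1.0, 1.0], [1, 0], threshold))  # -> 1.0
```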

Activation function

• Threshold activation function (hard threshold, step function)
• Sigmoid activation function (soft threshold)
• Bias / offset shifts the activation function left/right

[Plots: g_j(in_j) for the hard threshold (step at in_j = 0) and for the sigmoid (smooth S-shaped curve).]

Threshold:  g(in) = 1 if in ≥ 0,  g(in) = 0 if in < 0
Sigmoid:    g(in) = 1 / (1 + e^{-in})
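A small Python sketch of the two activation functions above (the sample inputs are arbitrary):

```python
import math

def g_threshold(in_):
    # hard threshold / step function: 1 if in >= 0, else 0
    return 1.0 if in_ >= 0 else 0.0

def g_sigmoid(in_):
    # soft threshold: 1 / (1 + e^{-in})
    return 1.0 / (1.0 + math.exp(-in_))

for v in (-2.0, 0.0, 2.0):
    print(v, g_threshold(v), round(g_sigmoid(v), 3))
```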

Implementing functions

• Artificial neurons can implement different (Boolean) functions; with a threshold unit and offset input x_0 = -1, suitable weights give the basic gates (a quick check follows below):
  – OR:  w_x = 1, w_y = 1, w_0 = 0.5   →  g(X + Y - 0.5) = X V Y
  – AND: w_x = 1, w_y = 1, w_0 = 1.5   →  g(X + Y - 1.5) = X ^ Y
  – NOT: w_x = -1, w_0 = -0.5          →  g(-X + 0.5) = NOT X
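A quick Python check of the gate weights listed above; the helper threshold_unit is not from the slides, just a hypothetical wrapper around a threshold unit with offset input x_0 = -1.

```python
def threshold_unit(w0, ws, xs):
    # in = sum_i w_i x_i - w0  (the offset term comes from the fixed input x_0 = -1)
    in_ = sum(w * x for w, x in zip(ws, xs)) - w0
    return 1 if in_ >= 0 else 0

for x in (0, 1):
    for y in (0, 1):
        print(x, y,
              "OR:",  threshold_unit(0.5, [1, 1], [x, y]),
              "AND:", threshold_unit(1.5, [1, 1], [x, y]))
for x in (0, 1):
    print(x, "NOT:", threshold_unit(-0.5, [-1], [x]))
```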

Networks of neurons

• Feed-forward networks:
  – single-layer networks
  – multi-layer networks
  – feed-forward networks implement functions and have no internal state

• Recurrent networks:
  – Hopfield networks have symmetric weights (Wij = Wji)
  – Boltzmann machines use stochastic activation functions
  – recurrent neural nets have directed cycles with delays

Perceptron

• Rosenblatt, 1958
• Single-layer, feed-forward network

[Diagram: a single-layer feed-forward network; input layer x1, ..., x5 connected by weights wji to output layer y1, y2, y3; also a smaller example with inputs x1, x2 and a single output y1.]

Expressiveness of perceptron

• With threshold function, can represent AND, OR, NOT, majority
• Linear separator (linear discriminant classifier)

X+ :  Σ_{i=0}^{N} w_i x_i ≥ 0        X- :  Σ_{i=0}^{N} w_i x_i < 0

equivalently   X+ :  W'X ≥ 0,   X- :  W'X < 0

Line: W'X = 0

Expressiveness of perceptron (cont’d)

• Can only separate linearly separable classes

[Plots over the unit square in (x1, x2): AND and OR are linearly separable; XOR is not (no single line separates its two classes). A further plot shows two linearly separable classes C1 and C2 split by a line in the (x1, x2) plane.]

Perceptron example: geometric interpretation

• Two inputs, one output

Decision boundary:  w'x - w0 = 0,   with  w = (w1, w2)',  x = (x1, x2)',  w1² + w2² = 1  (w1 = cos θ, w2 = sin θ)

Output:  y = g(w1 x1 + w2 x2 - w0) = g(w'x - w0)

[Diagram: the line w'x - w0 = 0 in the (x1, x2) plane; for a point xp, the quantity w'xp - w0 is its signed distance from the line.]
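A small numeric check of the picture above, assuming ||w|| = 1: w'xp - w0 is then the signed distance of a point xp from the line. The angle, offset, and point below are arbitrary.

```python
import math

theta, w0 = math.radians(30), 1.0           # arbitrary orientation and offset
w = (math.cos(theta), math.sin(theta))      # unit-length normal: w1 = cos(theta), w2 = sin(theta)
xp = (2.0, 1.5)                             # an arbitrary point

signed_dist = w[0] * xp[0] + w[1] * xp[1] - w0   # w'xp - w0
print(round(signed_dist, 3))                # positive: xp lies on the X+ side of the line
```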

Perceptron learning

• Given a set of (linearly separable) points, X+ and X- (or a set of points with labels, D = { (x, y)_k }, y ∈ {0, 1}), find a perceptron that separates (classifies) the two

• Algorithm:
  – Minimize the error of classification. How?
  – Start with some random weights and adjust them (iteratively) so that the error is minimal

E(w) = 1/2 Σ_{k=1}^{K} ( y_k - g( Σ_{i=0}^{N} w_i x_ki ) )²  =  1/2 Σ_{k=1}^{K} err_k²

w* = argmin_w E(w)

Gradient descent:  w_i^(l) = w_i^(l-1) - α ∂E/∂w_i,   where  ∂E/∂w_i = - Σ_{k=1}^{K} err_k g'(in_k) x_ki

[Plot: the derivative g'(in) of the activation function.]
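A minimal Python sketch of the gradient-descent rule above for a single sigmoid unit; the data set, learning rate, and number of iterations are made up for illustration.

```python
import math, random

def g(v):                                   # sigmoid activation
    return 1.0 / (1.0 + math.exp(-v))

# made-up linearly separable data: (x1, x2) -> label y in {0, 1}
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # w_0 (offset weight), w_1, w_2
alpha = 0.5                                 # learning rate

for _ in range(1000):
    grad = [0.0, 0.0, 0.0]
    for (x1, x2), y in data:
        x = [-1.0, x1, x2]                  # fixed offset input x_0 = -1
        in_k = sum(wi * xi for wi, xi in zip(w, x))
        err = y - g(in_k)
        gprime = g(in_k) * (1.0 - g(in_k))  # derivative of the sigmoid
        for i in range(3):                  # dE/dw_i = -sum_k err_k g'(in_k) x_ki
            grad[i] -= err * gprime * x[i]
    w = [wi - alpha * gi for wi, gi in zip(w, grad)]

print([round(wi, 2) for wi in w])           # learned weights separating the two classes
```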

Analysis of the perceptron learning algorithm

• Consider 1D case: 1 input, 1 output

With a single input x and weight w_0, the unit computes  y = g(w_0 x),  and

∂E/∂w_0 = - Σ_{k=1}^{K} err_k g'(in_k) x_k,     w_0^(l) = w_0^(l-1) - α ∂E/∂w_0

If only example x_3 is misclassified, err = (0, 0, -1, 0, 0), then

∂E/∂w_0 = -( 0 + 0 + (-1) g'(in_3) x_3 + 0 + 0 ) = g'(in_3) x_3

w_0^(l) = w_0^(l-1) - α ∂E/∂w_0 = w_0^(l-1) - α g'(in_3) x_3

[Plot: the gradient step from w_0^(l-1) to w_0^(l), driven by g'(in_3) x_3.]
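One numeric step of the 1D update above, with made-up values for x_3, the current weight w_0^(l-1), and the learning rate α:

```python
import math

g = lambda v: 1.0 / (1.0 + math.exp(-v))        # sigmoid activation
gprime = lambda v: g(v) * (1.0 - g(v))          # its derivative

w0_old, x3, alpha = 0.2, 1.5, 0.1               # made-up values
in3 = w0_old * x3                               # in_3 = w_0^(l-1) * x_3
dE_dw0 = gprime(in3) * x3                       # err = (0, 0, -1, 0, 0): only x_3 contributes
w0_new = w0_old - alpha * dE_dw0                # w_0^(l) = w_0^(l-1) - alpha * dE/dw_0
print(round(w0_new, 4))
```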

Perceptron learning (cont’d)

• Converges to a consistent classifier if the data are linearly separable

Multilayer perceptrons

• Number of hidden layers
• Usually fully connected
• 2 layers can represent all continuous functions
• 3 layers can represent all functions
• Learning: the backpropagation algorithm, an extension of perceptron learning (see the sketch after the diagram below)
  – backpropagate the error from the output layer into the hidden layer(s)

[Diagram: a multilayer network; input layer x1, ..., x5 connected by weights w(1)ji to a hidden layer, which is connected by weights w(2)ji to output layer y1, y2.]
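A compact Python sketch of one backpropagation step for a network like the one above, with sigmoid units and one hidden layer; the layer sizes, example input/target, and learning rate are made up, and bias inputs are omitted for brevity.

```python
import math, random

def g(v):                                   # sigmoid activation
    return 1.0 / (1.0 + math.exp(-v))

def backprop_step(x, y, W1, W2, alpha):
    # forward pass: input -> hidden -> output
    h = [g(sum(wji * xi for wji, xi in zip(row, x))) for row in W1]
    o = [g(sum(wkj * hj for wkj, hj in zip(row, h))) for row in W2]
    # backward pass: output deltas, then backpropagate them into the hidden layer
    d_out = [(yk - ok) * ok * (1 - ok) for yk, ok in zip(y, o)]
    d_hid = [hj * (1 - hj) * sum(W2[k][j] * d_out[k] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    # weight updates (descend the squared error)
    for k, row in enumerate(W2):
        for j in range(len(row)):
            row[j] += alpha * d_out[k] * h[j]
    for j, row in enumerate(W1):
        for i in range(len(row)):
            row[i] += alpha * d_hid[j] * x[i]

# made-up example: 2 inputs, 2 hidden units, 1 output
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
W2 = [[random.uniform(-1, 1) for _ in range(2)]]
backprop_step([1.0, 0.0], [1.0], W1, W2, alpha=0.5)
```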

Application: handwritten digit recognition

Probabilistic interpretation

• Two classes, X+ and X-, elements of which are distributed randomly according to some densities p+(x) and p-(x)

• Classes have prior probabilities p+ and p-

• Given an element x, decide which class it belongs to.

[Diagram: the two class regions X+ and X-.]

Decide X+ if   log[ p(x|+) / p(x|-) ] + log[ p+ / p- ] ≥ 0

Class(x) = g( log[ p(x|+) p+ / ( p(x|-) p- ) ] )

P(Class is + | x) = p(x|+) p+ / ( p(x|+) p+ + p(x|-) p- ) = 1 / ( 1 + e^{-a} ),   where  a = log[ p(x|+) p+ / ( p(x|-) p- ) ]

• Output of NN is the posterior probability of a data class
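A numeric check of the identity above: the sigmoid of a = log[ p(x|+) p+ / ( p(x|-) p- ) ] equals the Bayes posterior. The density values and priors below are made up.

```python
import math

px_pos, px_neg = 0.30, 0.05        # made-up class densities p(x|+), p(x|-) at some x
prior_pos, prior_neg = 0.4, 0.6    # made-up priors p+, p-

posterior = px_pos * prior_pos / (px_pos * prior_pos + px_neg * prior_neg)
a = math.log((px_pos * prior_pos) / (px_neg * prior_neg))
sigmoid_of_a = 1.0 / (1.0 + math.exp(-a))

print(round(posterior, 6), round(sigmoid_of_a, 6))   # both print 0.8
```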

Hopfield NN

• “Associative memory” – associate an unknown input with entities encoded in the network

x^(t) = g( W x^(t-1) )

(x^(t-1): activation potential / state of the network at “time” t-1;  x^(t): activation potential / state at time t)

• Learning: use a set of examples X = { x1, …, xK } and set

W = (1/K) Σ_{k=1}^{K} x_k x_k' - I,   where I is the identity matrix (so units have no self-connections)

• Association: To associate an unknown input xu with one of the K memorized elements, set x(0) = xu and let the network converge
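A minimal Python sketch of the learning and association steps above, assuming ±1-valued patterns, the weight formula W = (1/K) Σ x_k x_k' - I, and a hard-threshold activation; the patterns and the corrupted input are made up.

```python
import numpy as np

# made-up +/-1 training patterns x_1 ... x_K (one per row)
X = np.array([[ 1,  1,  1, -1, -1, -1],
              [ 1, -1,  1, -1,  1, -1]], dtype=float)
K, N = X.shape

# learning: W = (1/K) * sum_k x_k x_k' - I  (zero diagonal: no self-connections)
W = (X.T @ X) / K - np.eye(N)

# association: start from a corrupted version of pattern 1 and iterate x(t) = g(W x(t-1))
x = np.array([1, 1, -1, -1, -1, -1], dtype=float)   # unknown input x_u (third bit flipped)
for t in range(5):
    x = np.where(W @ x >= 0, 1.0, -1.0)             # hard-threshold activation
print(x)                                            # recovers the first stored pattern
```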

Hopfield NN example

[Figure: the training set of stored patterns, a corrupted test example, and the state of the network at t = 0, 1, 2, 3 as it converges to the closest stored pattern.]