
Introduction to Artificial Neural Network

- theory, application and practice using WEKA-

Anto Satriyo Nugroho, Dr.Eng
Center for Information & Communication Technology,
Agency for the Assessment & Application of Technology (PTIK-BPPT)
Email: [email protected]
URL: http://asnugroho.net

Agenda

1. Brain, Biological neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of neural network
5. Practice using WEKA
6. Important & useful references

Brain vs Computer

                              Brain                                          Computer
Information processing        Low speed, fuzzy, parallel                     Fast, accurate, sequential
Specialization                Pattern recognition                            Numerical computation
Information representation    Analog                                         Digital
Number of elements            ~10 billion                                    ~10^6
Speed                         Slow (10^3/s)                                  Fast (10^9/s)
Performance improvement       Learning                                       Software upgrade
Memory                        Associative (distributed among the synapses)   Address-based

Biological Neuron

• ~1.4 x 10^11 neurons
• Structure:
  - Cell Body
  - Dendrite
  - Axon
  - Synapse (10^3 ~ 10^4 per neuron)

Biological Neural Network

1. Principle of a neuron: collection, processing, and dissemination of electrical signals
2. The information processing capacity of the brain emerges from the network of neurons

Mathematical Model of Neuron

• McCulloch & Pitts (1943)

  y = f\left( \sum_{i=1}^{n} x_i w_i \right)

(figure: input signals x_1, x_2, x_3, ..., x_n are multiplied by the weights w_1, w_2, w_3, ..., w_n, summed, and passed through f to give the output signal y)

w = synapses, f = activation function

• The input signals can be considered as the dendrites of the biological neuron
• The output signal can be considered as the axon of the biological neuron

Components of a neuron
• Synapse (weight)
• Calculator of the weighted input signals
• Activation function

  y = f\left( \sum_{i=1}^{n} x_i w_i \right)

Activation Function

1. Threshold function (Heaviside function)

   f(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}

   • used by McCulloch & Pitts
   • all-or-none characteristic

2. Piecewise-linear function

   f(v) = \begin{cases} 1 & v \ge +\tfrac{1}{2} \\ v & +\tfrac{1}{2} > v > -\tfrac{1}{2} \\ -1 & v \le -\tfrac{1}{2} \end{cases}

Activation Function

3. Sigmoid function

   f(x) = \frac{1}{1 + e^{-cx}}

   (figure: sigmoid curves for c = 4, c = 2, c = 1; a larger c gives a steeper transition)
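The three activation functions above are easy to express in code. A minimal Python sketch (the function names are illustrative, not from the slides):

import math

def heaviside(v):
    # Threshold (Heaviside) function: all-or-none output
    return 1 if v > 0 else 0

def piecewise_linear(v):
    # Linear in the middle region, saturating at +1 and -1
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return -1.0
    return v

def sigmoid(x, c=1.0):
    # f(x) = 1 / (1 + e^(-c x)); a larger c gives a steeper curve
    return 1.0 / (1.0 + math.exp(-c * x))

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, heaviside(v), piecewise_linear(v), round(sigmoid(v, c=4.0), 3))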

How to calculate neuron's output (without bias)?

Input: x = [0, 1]^T, weights w = [0.5, -0.5]^T

v = 0 \times 0.5 + 1 \times (-0.5) = -0.5

Heaviside activation function: f(v) = 1 if v > 0, 0 if v \le 0

f(v) = 0

How to calculate neuron's output (with bias)?

Input: x = [0, 1]^T, weights w = [0.5, -0.5]^T, bias -0.7

v = (0 \times 0.5 + 1 \times (-0.5)) - (-0.7) = 0.2

Heaviside activation function: f(v) = 1 if v > 0, 0 if v \le 0

f(v) = 1
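Both worked examples can be checked directly in code. A minimal Python sketch (variable names are illustrative):

def heaviside(v):
    return 1 if v > 0 else 0

x = [0, 1]        # input vector
w = [0.5, -0.5]   # synaptic weights

# Without bias: v = 0*0.5 + 1*(-0.5) = -0.5, so the output is 0
v = sum(xi * wi for xi, wi in zip(x, w))
print(v, heaviside(v))                       # -0.5 0

# With bias (threshold) -0.7: v = -0.5 - (-0.7) = 0.2, so the output is 1
theta = -0.7
v = sum(xi * wi for xi, wi in zip(x, w)) - theta
print(round(v, 2), heaviside(v))             # 0.2 1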

Artificial Neural Network

1. Architecture: how the neurons are connected to each other
   1. Feed-forward networks
   2. Recurrent networks

2. Learning algorithm: how the network is trained to fit an input-output mapping/function
   LMS, Delta rule, Backpropagation, etc.

Agenda

1. Brain, Biological neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of neural network
5. Practice using WEKA
6. Important & useful references

Christopher M. Bishop: Pattern Recognition & Machine Learning, Springer, 2006, p.196

Perceptron Learning (taking the AND function as an example)

x1  x2  y
 0   0  0
 0   1  0
 1   0  0
 1   1  1

y = x_1 \wedge x_2

Perceptron Learning (taking the AND function as an example)

Training set: 4 examples, each consisting of a 2-dimensional input vector and its teaching signal (desired output):

([0, 0]^T, 0), ([0, 1]^T, 0), ([1, 0]^T, 0), ([1, 1]^T, 1)

Learn by adjusting the weights to reduce the error on the training set. The squared error for an example with input x and true output (teaching signal) y is

E = \frac{1}{2} Err^2 \equiv \frac{1}{2} \left( y - h_W(x) \right)^2

Gradient Descent Optimization

Perform optimization search by gradient descent:

w^{(t+1)} = w^{(t)} - \alpha \nabla E(w^{(t)})

Weight Update Rule

\frac{\partial E}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} = Err \times \frac{\partial}{\partial W_j}\left( y - g\Big(\sum_{j=0}^{n} W_j x_j\Big) \right) = -Err \times g'(in) \times x_j

where in = \sum_{j=0}^{n} W_j x_j

Simple weight update rule:

W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j

What if we use the Sigmoid function as g? Like this!

g(x) = \frac{1}{1 + e^{-x}}

\frac{d}{dx} g(x) = \frac{d}{dx}\left( \frac{1}{1 + e^{-x}} \right) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \times \left( 1 - \frac{1}{1 + e^{-x}} \right) = g(x)\left(1 - g(x)\right)

Weight Update Rule (using Sigmoid as Activation Function)

Perform optimization search by gradient descent:

\frac{\partial E}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} = Err \times \frac{\partial}{\partial W_j}\left( y - g\Big(\sum_{j=0}^{n} W_j x_j\Big) \right) = -Err \times g'(in) \times x_j

where in = \sum_{j=0}^{n} W_j x_j and, for the sigmoid, g'(in) = g(in)(1 - g(in))

Simple weight update rule:

W_j \leftarrow W_j + \alpha \times Err \times g(in)\left(1 - g(in)\right) \times x_j

Perceptron Learning Algorithm

(AIMA, p.742)

Input: x[e]

For (e = 1; e < n; e++):

  Output calculation:  in \leftarrow \sum_{j=0}^{n} W_j x_j[e],  output = g(in)

  Error calculation:   Err \leftarrow y[e] - g(in)

  Weight update:       W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j[e]
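The loop above can be run end to end on the AND examples from the earlier slide. A minimal Python sketch of perceptron learning with a Heaviside unit (initial weights, learning rate and epoch count are illustrative; the g'(in) factor is dropped, as in the classic perceptron rule for threshold units):

# Perceptron learning for the AND function with a Heaviside activation
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w = [0.0, 0.0, 0.0]   # [w0 (bias weight), w1, w2]; x0 = 1 is a constant bias input
alpha = 0.1           # learning rate

def g(v):
    return 1 if v > 0 else 0

for epoch in range(20):
    for x, y in examples:
        xe = [1] + x                                           # prepend the bias input
        in_ = sum(wj * xj for wj, xj in zip(w, xe))            # in = sum_j W_j x_j
        err = y - g(in_)                                       # Err = y - g(in)
        w = [wj + alpha * err * xj for wj, xj in zip(w, xe)]   # W_j <- W_j + alpha*Err*x_j

print(w)
print([g(sum(wj * xj for wj, xj in zip(w, [1] + x))) for x, _ in examples])   # [0, 0, 0, 1]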

AND function using Perceptron

(figure: perceptron with weights w_1 = 1.0, w_2 = 1.0 and threshold 1.5, i.e. v = 1.0 x_1 + 1.0 x_2 - 1.5)

x1  x2  y
 0   0  0
 0   1  0
 1   0  0
 1   1  1

y = x_1 \wedge x_2

Heaviside activation function: f(v) = 1 if v > 0, 0 if v \le 0

OR function using Perceptron

(figure: perceptron with weights w_1 = 1.0, w_2 = 1.0 and threshold 0.5, i.e. v = 1.0 x_1 + 1.0 x_2 - 0.5)

x1  x2  y
 0   0  0
 0   1  1
 1   0  1
 1   1  1

y = x_1 \vee x_2

Heaviside activation function: f(v) = 1 if v > 0, 0 if v \le 0
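Plugging the weights from these two slides into the neuron model confirms that they realize AND and OR. A short Python sketch (1.5 and 0.5 are read from the figures and treated as thresholds subtracted from the weighted sum):

def perceptron(x1, x2, w1, w2, theta):
    v = w1 * x1 + w2 * x2 - theta    # weighted sum minus threshold
    return 1 if v > 0 else 0         # Heaviside activation

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              perceptron(x1, x2, 1.0, 1.0, 1.5),   # AND: 0 0 0 1
              perceptron(x1, x2, 1.0, 1.0, 0.5))   # OR:  0 1 1 1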

Result of XOR output

(figure: MSE vs. iteration — the error does not converge)

A problem appears when the perceptron is used to learn a NON-linear function.

(figure: the four XOR examples plotted in the (x_1, x_2) plane; class 0 and class 1 cannot be separated by a single straight line)

XOR
x1  x2  y
 0   0  0
 0   1  1
 1   0  1
 1   1  0

A non-linear mapping can be realized by inserting a hidden layer, but the learning algorithm was not known until 1986.

Marvin Minsky (cognitive scientist)    Seymour Papert (MIT mathematician)

Agenda

1. Brain, Biological neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of neural network
5. Practice using WEKA
6. Important & useful references

David E. Rumelhart: A Scientific Biography, http://www.cnbc.cmu.edu/derprize/
"Learning Internal Representations by Error Propagation", 1986, Chap. 8, pp. 318-362

Backpropagation Learning

Forward pass

(figure: input data X fed through Input layer → Hidden layer → Output layer)

1. Input a datum from the training set to the Input Layer, and calculate the output of each neuron in the Hidden and Output Layers.

Backpropagation Learning

2. Calculate the Error, that is, the difference (Δ) between the output of each neuron in the output layer and the desired value (teaching signal).

(figure: input data X at the Input layer, teaching signal applied at the Output layer, a Δ for each output neuron)

Backpropagation Learning

2. Calculate the Error, that is, the difference (Δ) between the output of each neuron in the output layer and the desired value (teaching signal).

Input data: an image of "B"

(figure: Input layer → Hidden layer → Output layer, with one output neuron per class A, B, C)

Output neuron "A": output value 0.5, teaching signal 0, so Δ = 0 - 0.5
Output neuron "B": output value 0.3, teaching signal 1, so Δ = 1 - 0.3
Output neuron "C": output value 0.1, teaching signal 0, so Δ = 0 - 0.1

Backpropagation Learning

Backward pass

3. Using the Δ values, update the weights between the Output and Hidden Layers, and between the Hidden and Input Layers.

(figure: the Δ values are propagated backwards from the Output layer toward the Input layer)

Backpropagation Learning

4. Repeat steps 1 to 3 until a stopping criterion is satisfied.

Stopping criteria:
- maximum number of epochs/iterations
- MSE (Mean Square Error)

BP for 3-layer MLP

(figure: Input Layer (neurons i, outputs I_i) → Hidden Layer (neurons j, outputs H_j) → Output Layer (neurons k, outputs O_k), with weights w_{ji} between input and hidden layers and w_{kj} between hidden and output layers)

Forward Pass (1)

Input layer - Hidden layer:

I_i = x_i

net_j = \sum_i w_{ji} I_i + \theta_j      (\theta_j: bias)

H_j = f(net_j) = \frac{1}{1 + e^{-net_j}}

Forward Pass (2)

Hidden layer - Output layer:

net_k = \sum_j w_{kj} H_j + \theta_k

O_k = f(net_k) = \frac{1}{1 + e^{-net_k}}
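The two forward-pass steps map directly onto a couple of matrix-vector products. A minimal NumPy sketch (layer sizes, the random initialization and all names are illustrative):

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W_ji, theta_j, W_kj, theta_k):
    # Forward pass of a 3-layer MLP: returns hidden activations H and outputs O
    I = x                            # input layer copies the input: I_i = x_i
    H = sigmoid(W_ji @ I + theta_j)  # net_j = sum_i w_ji I_i + theta_j,  H_j = f(net_j)
    O = sigmoid(W_kj @ H + theta_k)  # net_k = sum_j w_kj H_j + theta_k,  O_k = f(net_k)
    return H, O

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0])
W_ji, theta_j = rng.normal(size=(2, 2)), np.zeros(2)   # 2 inputs -> 2 hidden neurons
W_kj, theta_k = rng.normal(size=(1, 2)), np.zeros(1)   # 2 hidden -> 1 output neuron
H, O = forward(x, W_ji, theta_j, W_kj, theta_k)
print(H, O)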

Backward Pass 1: Hidden-Output Layer

Error (MSE: Mean Square Error), with t_k the teaching signal:

E = \frac{1}{2} \sum_k (t_k - O_k)^2

\delta_k = (t_k - O_k)\, O_k (1 - O_k)

Weight update (\eta: learning rate):

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta\, \delta_k H_j

w_{kj}^{new} = w_{kj}^{old} + \Delta w_{kj}

The modification of the weights between the Output and Hidden Layers due to the error E is calculated as follows:

\frac{\partial E}{\partial O_k} = -(t_k - O_k)

\frac{\partial O_k}{\partial net_k} = \frac{\partial}{\partial net_k}\left( \frac{1}{1 + e^{-net_k}} \right) = O_k (1 - O_k)

\frac{\partial net_k}{\partial w_{kj}} = H_j

\frac{\partial E}{\partial w_{kj}} = -(t_k - O_k)\, O_k (1 - O_k)\, H_j = -\delta_k H_j, \quad \text{where } \delta_k = (t_k - O_k)\, O_k (1 - O_k)

Thus the weight correction is obtained as follows (\eta is the learning rate):

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta\, \delta_k H_j
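Continuing the NumPy sketch from the forward pass, the output-layer correction can be written as follows (t is an illustrative teaching signal; H, O and the weights come from the forward-pass sketch):

t = np.array([1.0])                      # teaching signal for the single output neuron
eta = 0.5                                # learning rate

delta_k = (t - O) * O * (1 - O)          # delta_k = (t_k - O_k) O_k (1 - O_k)

dW_kj = eta * np.outer(delta_k, H)       # Delta w_kj = eta * delta_k * H_j
dtheta_k = eta * delta_k                 # the bias acts as a weight on a constant input of 1

W_kj = W_kj + dW_kj                      # w_new = w_old + Delta w
theta_k = theta_k + dtheta_k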

Backward Pass 2: Input-Hidden Layer

Hidden layer - Input layer:

\delta_j = H_j (1 - H_j) \sum_k \delta_k w_{kj}

\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_j x_i

Weight update:

w_{ji}^{new} = w_{ji}^{old} + \Delta w_{ji}

The weight corrections between the Hidden and Input layers are determined in a similar way:

\frac{\partial E}{\partial w_{ji}} = \left[ \sum_k \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} \frac{\partial net_k}{\partial H_j} \right] \frac{\partial H_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}}

with

\frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} = -(t_k - O_k)\, O_k (1 - O_k) = -\delta_k

\frac{\partial net_k}{\partial H_j} = w_{kj}

\frac{\partial H_j}{\partial net_j} = \frac{\partial}{\partial net_j}\left( \frac{1}{1 + e^{-net_j}} \right) = H_j (1 - H_j)

net_j = \sum_i w_{ji} x_i, \qquad \frac{\partial net_j}{\partial w_{ji}} = I_i = x_i

hence

\frac{\partial E}{\partial w_{ji}} = -\left( \sum_k \delta_k w_{kj} \right) H_j (1 - H_j)\, I_i = -\delta_j I_i, \quad \text{where } \delta_j = H_j (1 - H_j) \sum_k \delta_k w_{kj}

The correction of the weight vector is

\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_j x_i
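The matching hidden-layer correction, again continuing the NumPy sketch (names follow the slide notation):

delta_j = H * (1 - H) * (W_kj.T @ delta_k)   # delta_j = H_j (1 - H_j) sum_k delta_k w_kj

dW_ji = eta * np.outer(delta_j, x)           # Delta w_ji = eta * delta_j * x_i
dtheta_j = eta * delta_j                     # bias correction for the hidden layer

W_ji = W_ji + dW_ji                          # w_new = w_old + Delta w
theta_j = theta_j + dtheta_j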

Momentum

Add inertia to the motion through weight space, preventing oscillation.

Plain updates:

\Delta w_{kj}(t) = \eta\, \delta_k H_j
\Delta w_{ji}(t) = \eta\, \delta_j x_i

With a momentum term \alpha:

Output-Hidden Layer:  \Delta w_{kj}(t) = \eta\, \delta_k H_j + \alpha\, \Delta w_{kj}(t-1)
Hidden-Input Layer:   \Delta w_{ji}(t) = \eta\, \delta_j x_i + \alpha\, \Delta w_{ji}(t-1)
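The momentum term only requires remembering the previous update. A small sketch extending the update step above (the value of the momentum coefficient is illustrative):

alpha = 0.9                               # momentum coefficient
prev_dW_kj = np.zeros_like(W_kj)          # Delta w_kj(t-1), initially zero

dW_kj = eta * np.outer(delta_k, H) + alpha * prev_dW_kj   # Delta w_kj(t)
W_kj = W_kj + dW_kj
prev_dW_kj = dW_kj                        # remember for the next iteration

# the hidden-input weights w_ji are updated in the same way using delta_j and x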

Training Process: Forward Pass

1. Calculate the output of the Input Layer:
   I_i = x_i

2. Calculate the output of the Hidden Layer:
   net_j = \sum_i w_{ji} I_i + \theta_j, \qquad H_j = f(net_j) = \frac{1}{1 + e^{-net_j}}

3. Calculate the output of the Output Layer:
   net_k = \sum_j w_{kj} H_j + \theta_k, \qquad O_k = f(net_k) = \frac{1}{1 + e^{-net_k}}

Training Process: Backward Pass

1. Calculate the δ of the Output Layer:
   \delta_k = O_k (1 - O_k)(t_k - O_k)

2. Update the weights between the Hidden & Output Layers:
   \Delta w_{k,j} = \eta\, \delta_k H_j, \qquad w_{k,j(new)} = w_{k,j(old)} + \Delta w_{k,j}

3. Calculate the δ of the Hidden Layer:
   \delta_j = H_j (1 - H_j) \sum_k \delta_k w_{k,j}

4. Update the weights between the Input & Hidden Layers:
   \Delta w_{j,i} = \eta\, \delta_j I_i, \qquad w_{j,i(new)} = w_{j,i(old)} + \Delta w_{j,i}
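Putting the forward and backward passes together, the whole training process can be sketched end to end. A compact NumPy example that trains a 2-2-1 MLP on the XOR problem from the earlier slides (network size, learning rate, epoch count and initialization are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# XOR training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

W_ji, theta_j = rng.normal(size=(2, 2)), np.zeros(2)   # input -> hidden
W_kj, theta_k = rng.normal(size=(1, 2)), np.zeros(1)   # hidden -> output
eta = 0.5

for epoch in range(10000):
    for x, t in zip(X, T):
        # forward pass
        H = sigmoid(W_ji @ x + theta_j)
        O = sigmoid(W_kj @ H + theta_k)
        # backward pass
        delta_k = (t - O) * O * (1 - O)
        delta_j = H * (1 - H) * (W_kj.T @ delta_k)
        W_kj += eta * np.outer(delta_k, H); theta_k += eta * delta_k
        W_ji += eta * np.outer(delta_j, x); theta_j += eta * delta_j

for x in X:
    H = sigmoid(W_ji @ x + theta_j)
    print(x, sigmoid(W_kj @ H + theta_k))   # should approach 0, 1, 1, 0
# (a 2-2-1 network can occasionally get stuck in a local minimum; re-run with another seed if so)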

1. Implementation of Neural Network for Handwriting Numeral Recognition System in a Facsimile Auto-dialing System

(figure: a facsimile draft with the dial number "123-456-7890" written at its head)

① Write the dial number at the head of the facsimile draft
② Insert the draft
③ The dial number will be recognized and displayed
④ Auto-dialing
⑤ Sending the draft

Hand-written Auto-dialing Facsimile (SFX-70CL)

Related Publication: Hand-written Numeric Character Recognition for Facsimile Auto-dialing by Large Scale Neural Network CombNET-II, Proc. of 4th International Conference on Engineering Applications of Neural Networks, pp.40-46, June 10-12, 1998, Gibraltar

2. Automatic System for Locating Characters Using a Stroke Analysis Neural Network

• Applications: robot eyes; support system for the visually handicapped

(pipeline: Camera → Input image → Find the text region → Character recognition → Text-to-speech synthesizer)

Related Publication: An algorithm for locating characters in color image using stroke analysis neural network, Proc. of the 9th International Conference on Neural Information Processing (ICONIP’02), Vol.4, pp.2132-2136, November 18-22, 2002, Singapore

3. Fog Forecasting by the Large-Scale Neural Network CombNET-II

• Predicting fog events based on meteorological observations
• The prediction was made every 30 minutes and the result was used to support aircraft navigation
• The number of fog events was very small compared to no-fog events, so this can be considered a pattern classification problem involving imbalanced training sets
• Observations were made every 30 minutes at Long. 141.70 E, Lat. 42.77, 25 m above sea level, by the Shin Chitose Meteorological Observatory Station (Hokkaido Island, Japan)
• A fog event is defined as the condition where:
  - range of visibility < 1000 m
  - the weather shows the appearance of fog
• Winner of the competition (1999)

Observed Information

No.  Meteorological Information          No.  Meteorological Information
1    Year                                14   Weather
2    Month                               15   Cloudiness (1st layer)
3    Date                                16   Cloud Shape (1st layer)
4    Time                                17   Cloud Height (1st layer)
5    Atmospheric Pressure [hPa]          18   Cloudiness (2nd layer)
6    Temperature [°C]                    19   Cloud Shape (2nd layer)
7    Dew Point Temperature [°C]          20   Cloud Height (2nd layer)
8    Wind Direction [°]                  21   Cloudiness (3rd layer)
9    Wind Speed [m/s]                    22   Cloud Shape (3rd layer)
10   Max. Inst. Wind Speed [m/s]         23   Cloud Height (3rd layer)
11   Change of Wind (1) [°]              24   Cloudiness (4th layer)
12   Change of Wind (2) [°]              25   Cloud Shape (4th layer)
13   Range of Visibility                 26   Cloud Height (4th layer)

Example: 1984 1 1 4.5 1008 0.0 –7.0 270 6 –1 –1 –1 9999 85 0 2 10 0 4 25 –1 –1 –1 –1 –1 –1

Proposed Method

(figure: CombNET-II, combining a Modified Counter Propagation NN and a Probabilistic NN)

Result of 1999 Fog Forecasting Contest

Problem: given the complete observation data of 1984-1988 and 1990-1994 for designing the model, predict the appearance of fog events during 1989 and 1995.

Fog Events (539 correct)
Predictions                    622    169    908
Correctly predicted            374    127    178
Number of false predictions    370    445    734

Achievements

This study won the first prize award in the 1999 Fog Forecasting Contest sponsored by Neurocomputing Technical Group of IEICE-Japan

Related Publications:
1. A Solution for Imbalanced Training Sets Problem by CombNET-II and Its Application on Fog Forecasting, IEICE Trans. on Information & Systems, Vol.E85-D, No.7, pp.1165-1174, July 2002
2. Mathematical perspective of CombNET and its application to meteorological prediction, Special Issue of Meteorological Society of Japan on Mathematical Perspective of Neural Network and its Application to Meteorological Problem, Meteorological Research Note, No.203, pp.77-107, October 2002

4. NETtalk

• T.J. Sejnowski and C.R. Rosenberg: a parallel network that learns to read aloud, Cognitive Science, 14:179-211, 1990. Simulation: "Continuous Informal Speech", pp.194-203
• Network architecture: 203-120-26 (trained in 30,000 iterations)
• Input: text (1000 words): THE OF AND TO IN ... etc.
• Output: phoneme (accuracy 98%)

http://www.cnl.salk.edu/ParallelNetsPronounce/index.php

5. Handwritten Digit Recognition

• The MNIST database consists of 60,000 training examples and 10,000 testing examples
• Linear Classifier: 8.4% error
• K-Nearest Neighbor Classifier, L3: 1.22% error
• SVM, Gaussian kernel: 1.4% error
• SVM, degree-4 polynomial: 1.1% error
• 2-layer ANN with 800 hidden units: 0.9% error
• Currently (26 October 2009) the best accuracy is achieved using a Large Convolutional Network (0.39% error)

http://yann.lecun.com/exdb/mnist/

Agenda

1. Brain, Biological neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of neural network
5. Practice using WEKA
6. Important & useful references

Flow of an AI experiment

(figure: the AI model and the three data sets)

• Training Set → model fitting
• Validation Set → error estimation of the selected model
• Testing Set → generalization assessment of the final chosen model, before it is applied to the real world

How to make an experiment using ANN?

Step 1: Prepare three data sets which are independent of each other: Training Set, Validation Set and Testing Set.

Step 2: Train the neural network using an initial parameter setting:
- stopping criteria (training is stopped if it exceeds t iterations OR the MSE is lower than z)
- number of hidden neurons
- learning rate
- momentum

Step 3: Evaluate the performance of the initial model by measuring its accuracy on the validation set.

Step 4: Change the parameters and repeat Steps 2 and 3 until a satisfactory result is achieved.

Step 5: Evaluate the performance of the neural network by measuring its accuracy on the testing set.
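The five steps can be sketched as a small model-selection loop. An illustrative Python example using scikit-learn's MLPClassifier as a stand-in for the WEKA MultilayerPerceptron used in the practice session (the dataset, parameter grid and split ratios are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Step 1: three independent sets (60% train, 20% validation, 20% test)
X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Steps 2-4: train with different parameter settings, keep the best on the validation set
best_model, best_val_acc = None, 0.0
for hidden in (2, 4, 8):                      # number of hidden neurons
    for lr in (0.1, 0.3):                     # learning rate
        model = MLPClassifier(hidden_layer_sizes=(hidden,), solver="sgd",
                              learning_rate_init=lr, momentum=0.9,
                              max_iter=2000, random_state=0)
        model.fit(X_train, y_train)           # Step 2: model fitting
        val_acc = model.score(X_val, y_val)   # Step 3: accuracy on the validation set
        if val_acc > best_val_acc:
            best_model, best_val_acc = model, val_acc

# Step 5: generalization assessment of the final chosen model
print("validation accuracy:", best_val_acc)
print("test accuracy:", best_model.score(X_test, y_test))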

Performance Evaluation

• Training set: model fitting
• Validation set: estimation of the prediction error for model selection
• Testing set: assessment of the generalization error of the final chosen model

(figure: the data split into Train | Validation | Test)

Agenda

1. Brain, Biological neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of neural network
5. Practice using WEKA
6. Important & useful references

Important & Useful References for Neural Network

•  Neural Networks for Pattern Recognition, Christopher M. Bishop, Oxford University Press, 1995

•  Neural Networks: A Comprehensive Foundation (2nd edition), Simon Haykin, Prentice Hall, 1998

•  Pattern Classification, Richard O. Duda, Peter E. Hart, David G. Stork, John Wiley & Sons Inc, 2000

•  Artificial Intelligence: A Modern Approach, Stuart J. Russell, Peter Norvig, Prentice Hall, 2002

•  Introduction to Data Mining, Pang Ning Tan, Michael Steinbach, Vipin Kumar, Addison Wesley, 2006

•  Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Ian H. Witten, Eibe Frank, Morgan Kaufmann, June 2005

•  FAQ Neural Network ftp://ftp.sas.com/pub/neural/FAQ.html

•  Backpropagator’s review http://www.dontveter.com/bpr/bpr.html

•  UCI Machine Learning Repository http://archive.ics.uci.edu/ml/index.html

•  WEKA: http://www.cs.waikato.ac.nz/~ml/weka/

•  Kangaroos and Training Neural Networks: http://www.sasenterpriseminer.com/documents/Kangaroos%20and%20Training%20Neural%20Networks.txt