
Master Thesis
Computer Science
Thesis no: MSE-2009:34
May 2009

Multistage neural networks for pattern recognition

Maciej Zieba

School of Engineering
Blekinge Institute of Technology
Box 520
SE – 372 25 Ronneby
Sweden


This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 24 weeks of full time studies.

Contact Information:

Author:

Maciej Zieba

E-mail: [email protected]

University advisors:

Jerzy Swiatek, Professor

Dept. of Computer Science and Management

Wrocław University of Technology, Poland

Ludwik Kuzniarz, Doctor

Dept. of Software Engineering and Computer Science

Blekinge Institute of Technology, Sweden

School of Engineering Internet : www.bth.se/tek

Blekinge Institute of Technology Phone : +46 457 38 50 00

Box 520 Fax : +46 457 271 25

SE – 372 25 Ronneby

Sweden


Abstract

In this work the concept of multistage neural networks is presented. The possibility of using this type of structure for pattern recognition is discussed and examined on a chosen problem from the field. The results of the experiment are confronted with other possible methods used for the problem.

Keywords: two-stage neural network, multistage neural network, multistage pattern recognition, two-stage pattern recognition, writer identification, iconic gesture recognition, online recognition


Contents

Abstract
Symbols and abbreviations

1 Introduction
  1.1 Background
  1.2 Motivation and goal
  1.3 Related works
  1.4 Expected outcomes
  1.5 Research questions
  1.6 Thesis outline

2 Structure of MPR model
  2.1 Introduction
  2.2 Two-stage pattern recognition model
  2.3 MPR model

3 Introduction to NN
  3.1 Model of neuron
    3.1.1 Activation function - survey
    3.1.2 Model of perceptron
  3.2 Multilayer Feedforward Network (MFN)
    3.2.1 Taxonomy of Neural Networks
    3.2.2 Model of MFN
    3.2.3 Neural networks for pattern recognition

4 MNN model
  4.1 Two-stage neural network
    4.1.1 Two-stage identification concept
    4.1.2 Two-stage neuron
    4.1.3 Binding functions
    4.1.4 Model of two-stage MFN
    4.1.5 Two-stage MFN for pattern recognition
  4.2 Multistage generalization for two-stage neural networks

5 Introduction to estimation of MNN parameters
  5.1 Estimation of parameters for two-stage identification
  5.2 Parameter estimation for neural networks - learning methods
    5.2.1 Learning taxonomy
    5.2.2 Backpropagation algorithm

6 MNN learning method
  6.1 Two-stage MFN learning process
  6.2 Multistage MFN learning process
  6.3 Parameter estimation for MFNs used for two-stage PR problem

7 Experiment
  7.1 Experiment description
    7.1.1 Description of considered pattern recognition problem
    7.1.2 Dataset description
  7.2 Features selection
    7.2.1 g-48 features
    7.2.2 Cosine Representation
  7.3 Methods used for the experiment
    7.3.1 Method 1 - MFN network
    7.3.2 Method 2 - Two-stage pattern recognition system with two MFN networks on each stage
    7.3.3 Method 3 - Two-stage MFN for PR
  7.4 Testing and validation details
  7.5 Results of experiment
  7.6 Method 4 - Two-stage smart switching

8 Conclusions

9 Future Works

Bibliography

A Features description


List of Figures

2.1 Pattern recognition model
2.2 Two-stage pattern recognition model
3.1 Simple neuron model
3.2 Bipolar sigmoid function
3.3 Unipolar sigmoid function
3.4 Simple perceptron model
3.5 Model of Multilayer Feedforward Network
3.6 Pattern recognition model with neural networks
4.1 Two-stage identification model
4.2 Two-stage neuron model
4.3 Two-stage neuron model with direct dependency between stages
4.4 Two-stage Multilayer Feedforward Network model
4.5 Two-stage pattern recognition procedure using neural networks
4.6 Two-stage neural network used for pattern recognition
7.1 The set of plotted icons
7.2 Example of usage of Cosine Representation
7.3 Results of first experiment
7.4 Wrongly recognized icons
7.5 Results of second experiment


Symbols and abbreviations

x̄ - vector of features
Ψ - algorithm of recognition
Ψ - algorithm of identification
j - result of recognition
n − m − k − p - configuration of NN
W = [w_{i,j}] - matrix of weights
x - input vector
y - output vector
f, g, h - activation functions of the first, second and third layer of the NN
γ - activation function parameter
α - stepsize
u - input vector for activation functions
o - output vector on hidden layers
F - function describing an object
F - binding function
reshape(·) - procedure of reshaping a matrix to a vector

NN - Neural Network
MNN - Multistage Neural Network
PR - Pattern Recognition
MPR - Multistage Pattern Recognition
MFN - Multilayer Feedforward Network

The notation used in Chapter 7: Experiment is taken from items [29] and [8].


Chapter 1

Introduction

1.1 Background

Pattern Recognition (PR) is one of the most important disciplines in machine learning. The aim of PR is to classify an object into one of K possible classes. The problem of PR is sometimes very sophisticated, and the choice of classification method depends on the knowledge about the object. In some cases instant recognition of the object is difficult or even impossible. For example, the results of a preliminary examination give no final recognition of a disease, but indicate further examination of the patient, and only after a couple of sub-recognition stages can the final diagnosis be given. The result of recognition at each stage indicates in which way the object is going to be recognized at the next stage. The algorithm of the recognition process depends not only on the current input, but also on the results of recognition at directly related stages. This problem is known as Multistage Pattern Recognition (MPR) [14].

Neural networks, one of the most common tools in Artificial Intelligence (AI), are widely used for PR problems [16, 21, 31]. The idea of multistage neural networks (MNN) is based on the multistage identification concept. The parameter values of the neurons on the n-th stage depend on the output values of the neurons which create the (n + 1)-th stage. This kind of structure can be used for the MPR problem. Each stage of recognition is combined with the proper stage of the MNN. The final result of the recognition process is provided by the output of the first stage of the MNN.


1.2 Motivation and goal

Many PR problems can be easily decomposed into related subproblems. For instance, the style of writing can be recognized first instead of direct handwritten character recognition. On the other hand, it can be much easier to recognize the character first and then identify the writer in a way determined by the previous recognition result. There are plenty of direct data mining methods which can be combined to be used for MPR problems, but there are only a few methods directly designed for it. The model of MNN can be used for this case.

The main goal of the thesis is to define, design and implement the MNN and use it for a chosen pattern recognition problem. The results of the experiment should be compared to the results of other methods used for the problem.

1.3 Related works

There are no visible traces of MNN in the literature. Books describing various models of NN, such as [13, 15, 16, 21], were analysed to find some information about MNN, but without success. The concept of a two-stage NN was presented in [3, 18], but it differs from the thesis idea.

The thesis model of MNN was inspired by the multistage identification concept. Item [25] is a monograph which describes in detail the problem of multistage identification. The author of the book concentrates on basic aspects of the problem, such as parameter estimation or the choice of the best model. There are also a couple of practical examples. Despite the fact that the item was published in 1987, the concept is still current, which is proved by the related PhD thesis [9], defended in 2005.

The introduction to NN is presented in [21]. The book is used as a course book for the BTH course Neural Networks. It is a very interesting compendium which describes NN from the biological background up to sophisticated and complex network models. Also very useful for the topic of the thesis are the lecture notes related to the mentioned course [7]. The notation used in this item is also used in the thesis. Also interesting for the topic are [15, 16]. These books describe models of NN used for solving practical problems.


The problem of writer identification was chosen as the PR problem. Related problems are widely described in the articles [3, 19, 23, 24, 26]. The mentioned items are strongly related to writer identification based on the analysis of text samples. The multistage concept is presented in the thesis, so the literature on handwritten gesture and character recognition was studied to create a two-stage writer identification system [1, 8, 11, 17, 20, 29, 30]. There are a couple of articles related to the problem of iconic gesture recognition for crisis management [20, 29, 30]. Item [29] is a technical report in which the way of calculating features from trajectory-based data of handwritten signs is presented. The other two publications present the results of recognition using methods like NN or Support Vector Machines. The experiments described in the mentioned publications were made using a fully accessible dataset which can be found at [27]. A very interesting way of presenting the features of a gesture or character with the Cosine Representation is presented in [8].

Very useful for the thesis experiment is [31]. It is a course book in which an interesting chapter about validation and testing methods can be found.

1.4 Expected outcomes

A suggestion for a definition of MNN is expected as an outcome of the thesis. An effective method for MNN parameter estimation should be presented. The results of the experiment using the chosen model of MNN for the MPR problem are to be presented, analysed and compared with other neural network based methods which can be used for the problem.

1.5 Research questions

The thesis addresses the following research questions:

Research Question 1: How can the MNN be defined?

Research Question 2: How can the parameters of the MNN be estimated?

Research Question 3: How can the MNN be designed for PR?

Research Question 4: How does the MNN perform for the chosen PR problem compared to other methods?

Research Question 5: How can the MNN method for pattern recognition be improved to increase the correctness of recognition?

1.6 Thesis outline

Chapter 2: Structure of MPR model. In this chapter the MPR model is presented and described.

Chapter 3: Introduction to NN. The basic definitions and problems related to neural networks are presented in this chapter, and the mechanism of using NN for PR is described. The notation used in this chapter is very important for a better understanding of the thesis content.

Chapter 4: MNN model. The definition of the two-stage neuron is presented in this part. The two-stage multilayer feedforward network is also defined in this chapter, and the relation between the classical two-stage recognition system and the two-stage NN for PR is described.

Chapter 5: Introduction to estimation of MNN parameters. In this chapter the mechanism of parameter estimation for the two-stage identification model is described. The idea of the backpropagation algorithm is also presented in this part.

Chapter 6: MNN learning method. In this chapter the mechanism of learning a two-stage NN is presented.

Chapter 7: Experiment. In this chapter an experiment of using a two-stage neural network for a real PR problem is presented.

Chapter 8: Conclusions. In this part an effort to answer the research questions is made. Additional conclusions after the empirical verification of MNN are added.

Chapter 9: Future Work. In this chapter possible next steps related to the topic of the thesis are described.


Chapter 2

Structure of MPR model

2.1 Introduction

The goal of PR problems is to classify the considered object into one of K possible classes (Figure 2.1). The object is described by the previously selected features x̄ from X, the universe of possible feature values. The object is recognised using an algorithm Ψ. The vector x̄ is the input of the algorithm and the output is the value j, which represents the result of recognition [14].

Figure 2.1: Pattern recognition process.

In some cases it is ineffective, difficult or even impossible to recognize the object directly. The way of recognizing the object may depend on special conditions. These conditions can be the results of a recognition problem related to the object but different from the considered one. In this case the used algorithm depends not only on the vector of features, but also on the results of sub-recognition problems which take as input other vectors of features related to the object. This problem is known as the MPR problem.


2.2 Two-stage pattern recognition model

A special case of MPR is the two-stage PR model presented in Figure 2.2. Objects considered for the PR problem are represented by features from the universe X. On each stage of the recognition process a vector of features is selected. The selection process on the second stage is independent, and as a result the vector of features x̄(2) is gained. The result of the recognition on the second stage (j2) is achieved using the algorithm Ψ2. The process of selecting the features on the first stage depends on the result of classification on the second stage, so having j2 the vector x̄(1)(j2) can be gained. To achieve the desired result of recognition j1 the algorithm Ψ1 is used. The result of this algorithm also depends on the recognition process on the second stage.

Figure 2.2: Two-stage pattern recognition model (based on [14]).

The two-stage model can be described using the equations:

$$j_1 = \Psi_1(\bar{x}(1)(j_2), j_2) \quad (2.1)$$

$$j_2 = \Psi_2(\bar{x}(2)) \quad (2.2)$$

For further deliberation in the thesis a simpler model is going to be considered:

1. The selection process on all stages is independent. It means that x̄(1)(j2) = x̄(1).

2. Only the result of the final recognition on the first stage is considered for the problem. It means that j2 does not need to be obtained at the end of the recognition process.


2.3 MPR model

In the MPR model each recognition process depends on the results of recognition on higher stages. The MPR model with N stages is going to be considered. On each stage of the model the vectors of features x̄(1), . . . , x̄(N) are given as input. Independent feature selection is assumed, as it was mentioned for the two-stage model. The MPR model can be described using the equations:

$$j_n = \Psi_n(\bar{x}(n), j_{n+1}, \ldots, j_N) \quad (2.3)$$

$$j_N = \Psi_N(\bar{x}(N)) \quad (2.4)$$

where n = 1, . . . , N − 1. The recognition algorithms on each stage are denoted as Ψ1, Ψ2, . . . , ΨN.

A special case of the MPR model is the cascade model, where the result of recognition on the n-th stage depends, besides the value x̄(n), only on jn+1. For this model equation 2.3 is replaced by:

$$j_n = \Psi_n(\bar{x}(n), j_{n+1}) \quad (2.5)$$
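A minimal Python sketch of this cascade flow may make the structure concrete (this is an illustration only, not part of the original thesis material; the classifier functions psi1 and psi2 are hypothetical placeholders for the stage algorithms):

```python
from typing import Callable, Optional, Sequence

# Each stage algorithm Psi_n maps (x(n), j_{n+1}) -> j_n; the top stage
# Psi_N has no stage above it, so it receives None instead of a result.
Stage = Callable[[Sequence[float], Optional[int]], int]

def cascade_recognize(stages, features):
    """Cascade MPR (equations 2.4 and 2.5).

    stages[0] is Psi_1, stages[-1] is Psi_N; features[n-1] is the
    feature vector x(n) of stage n. Returns j_1, the final result.
    """
    j = None
    # Evaluate from the highest stage N down to stage 1.
    for psi, x in zip(reversed(stages), reversed(features)):
        j = psi(x, j)
    return j

# Toy usage: stage 2 recognizes a coarse class, stage 1 refines it.
psi2 = lambda x, _: int(x[0] > 0.5)             # j_2 from x(2) only
psi1 = lambda x, j2: 2 * j2 + int(x[0] > 0.0)   # j_1 from x(1) and j_2
print(cascade_recognize([psi1, psi2], [[0.3], [0.7]]))  # prints 3
```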

There are plenty of algorithms used for PR. The most common are methods which use supervised learning. They can be rules or decision trees [31] for nominal and real kinds of features. If the features are only real-valued, algorithms like k-NN, support vector machines (SVM) [31] or neural networks (NN) can be used. For the thesis only algorithms based on NN are considered.


Chapter 3

Introduction to NN

3.1 Model of neuron

The model of a simple neuron is well defined in the literature [7, 10, 13, 15, 16, 21] and is shown in Figure 3.1. The model of the neuron can be described by the equation:

$$y = f(u) = f\left(\sum_{j=1}^{n} w_j x_j\right) \quad (3.1)$$

The input of the network is denoted as the vector x = [x1, . . . , xn]. The weights w1, . . . , wn are parameters of the neuron which are estimated during training. The function f(u) is called the activation function. In some cases the activation function also depends on a parameter value: f(u, γ). The function parameter is in most cases chosen arbitrarily, but the literature provides solutions for some types of activation functions to find optimal parameter values during training [10, 12, 32].

Figure 3.1: Simple neuron without bias term.
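As a small hedged illustration of equation 3.1 (not code from the thesis; the function name is chosen for the example), the neuron output can be computed directly for any activation function passed in:

```python
def neuron_output(x, w, f):
    """Equation 3.1: y = f(u) with u = sum_j w_j * x_j (no bias term)."""
    u = sum(wj * xj for wj, xj in zip(w, x))
    return f(u)

# Example with a linear activation f(u) = u (no parameter needed).
print(neuron_output(x=[0.5, -1.0, 2.0], w=[0.1, 0.4, 0.2], f=lambda u: u))
```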


3.1.1 Activation function - survey

There are usually three types of activation functions used in neurons:

1. Linear function.

2. Bipolar sigmoid function.

3. Unipolar sigmoid function.

A linear function used as an activation function does not need any parameters. This can be easily proved for a single neuron with N inputs and a linear activation function f(u) = au:

$$f\left(\sum_{j=1}^{N} w_j x_j\right) = a \sum_{j=1}^{N} w_j x_j = \sum_{j=1}^{N} a w_j x_j = \sum_{j=1}^{N} w'_j x_j \quad (3.2)$$

In the bipolar sigmoid function a parameter γ called the slope is used. The function, a rescaled hyperbolic tangent, is described by the equation:

$$f(u) = \tanh\left(\frac{\gamma u}{2}\right) = \frac{1 - e^{-\gamma u}}{1 + e^{-\gamma u}} \quad (3.3)$$

Figure 3.2: Plot of the bipolar sigmoid function for different γ values.

The plot of the bipolar sigmoid function is presented in Figure 3.2. The possible values of the function lie between -1 and 1. When values between 0 and 1 are expected on the output, the unipolar sigmoid function can be used (Figure 3.3). The equation for this function is presented below:

$$f(u) = \frac{1}{1 + e^{-\gamma u}} \quad (3.4)$$


Figure 3.3: Plot of the unipolar sigmoid function for different γ values.

In some cases the parameter value γ = 1 is chosen arbitrarily. However, an improperly chosen γ value can destroy the results of an otherwise well designed neural network.
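The two parametric activations of equations 3.3 and 3.4 can be written down directly; the following sketch is an illustration added here (the function names are mine):

```python
import math

def bipolar_sigmoid(u: float, gamma: float = 1.0) -> float:
    """Equation 3.3: values in (-1, 1); gamma is the slope."""
    return (1.0 - math.exp(-gamma * u)) / (1.0 + math.exp(-gamma * u))

def unipolar_sigmoid(u: float, gamma: float = 1.0) -> float:
    """Equation 3.4: values in (0, 1); gamma is the slope."""
    return 1.0 / (1.0 + math.exp(-gamma * u))

# A larger slope sharpens the transition around u = 0.
for gamma in (0.5, 1.0, 4.0):
    print(gamma, bipolar_sigmoid(1.0, gamma), unipolar_sigmoid(1.0, gamma))
```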

3.1.2 Model of perceptron

There is also the possibility that the output of the neuron takes boolean values. In this case the activation function is a signum-type (threshold) function, and the model is called the perceptron (Figure 3.4).

Figure 3.4: Simple perceptron model.

The θ parameter is called the shift or threshold value. The output y takes values from {0, 1}, while the vectors x, w ∈ ℝⁿ.

Perceptrons are widely used for separating linearly separable classes.
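A minimal sketch of the perceptron just described (my own illustration; the convention that the output is 1 when the weighted sum exceeds the threshold θ is an assumption of the example):

```python
def perceptron(x, w, theta):
    """Threshold unit: y = 1 if sum_j w_j * x_j > theta, else 0."""
    u = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if u > theta else 0

# Linearly separable example: logical AND of two binary inputs.
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, w=[1.0, 1.0], theta=1.5))
```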


3.2 Multilayer Feedforward Network (MFN)

3.2.1 Taxonomy of Neural Networks

The following taxonomy can be presented according to the literature [2]:

1. Feedforward networks

(a) Single-layer perceptron.

(b) Multilayer perceptron (MLP).

(c) Radial Basis Function (RBF) nets.

2. Recurrent (feedback) networks

(a) Competitive networks.

(b) Kohonen's self-organizing nets.

(c) Hopfield networks.

(d) Adaptive Resonance Theory (ART) models.

Feedforward networks are models of networks without loops, while recurrent networks contain feedback connections between the output and the input of the network.

There is also ambiguity in the taxonomy related to the MLP model. In the literature [8, 10, 20, 30] the MLP model is defined as a network consisting of neurons with arbitrary activation functions, not only of perceptrons. In the thesis, instead of MLP, the name Multilayer Feedforward Network (MFN) is going to be used [7, 21]. The RBF nets are going to be categorized as Modular Neural Networks. In this paper only MFN models are going to be considered as the components of MNN; however, it would be interesting to analyze the other types of networks for this case in the future.

3.2.2 Model of MFN

The model of the MFN is presented in Figure 3.5. It has been proved that 3-layer networks are sufficient for solving most desired problems, under the condition that a sufficient number of neurons is used [10, 22]. The configuration of the network is n − m − k − p, which means that the first, second and third layers consist of m, k and p neurons respectively. The functions f, g, h are the activation functions of the neurons in these layers. The weights of each layer are described by the matrices $W^{[1]} = [w^{[1]}_{i,j}]$, $W^{[2]} = [w^{[2]}_{i,j}]$, $W^{[3]} = [w^{[3]}_{i,j}]$. The input vector is denoted as the (n+1)-size vector $x = [x_1, x_2, \ldots, x_n, x_{n+1}]$, which includes the bias term $x_{n+1} = 1$. The i-th output value of the first layer ($o^{[1]}_i$) can be calculated in the following way:

$$o^{[1]}_i = f\left(\sum_{j=1}^{n+1} w^{[1]}_{i,j} x_j\right) \quad (3.5)$$

The input vector for the next layer, $o^{[1]}$, is constructed using the output values $o^{[1]}_i$ and extended with the bias term $o^{[1]}_{m+1} = 1$. The output values of the second layer $o^{[2]}_i$ are gained in a similar way:

$$o^{[2]}_i = g\left(\sum_{j=1}^{m+1} w^{[2]}_{i,j} o^{[1]}_j\right) \quad (3.6)$$

The process is repeated on the last layer and the output values of the network $y_i = o^{[3]}_i$ are equal to:

$$y_i = h\left(\sum_{j=1}^{k+1} w^{[3]}_{i,j} o^{[2]}_j\right) \quad (3.7)$$

Figure 3.5: Model of MFN

The MFN model can be described using the matrix equation:

$$y = h(W^{[3]}[g(W^{[2]}[f(W^{[1]}x^T), 1]^T), 1]^T) \quad (3.8)$$
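The forward pass of equation 3.8 fits in a few lines of NumPy; the following is a hedged sketch added for illustration (the bias handling follows the convention above of appending a constant 1 to each layer's input):

```python
import numpy as np

def mfn_forward(x, W1, W2, W3, f, g, h):
    """Forward pass of an n-m-k-p MFN (equation 3.8).

    x  : input of length n (bias terms are appended here)
    W1 : m x (n+1), W2 : k x (m+1), W3 : p x (k+1) weight matrices
    f, g, h : elementwise activation functions of the three layers
    """
    o1 = f(W1 @ np.append(x, 1.0))     # first layer,  equation 3.5
    o2 = g(W2 @ np.append(o1, 1.0))    # second layer, equation 3.6
    return h(W3 @ np.append(o2, 1.0))  # output layer, equation 3.7

# Example: a 2-3-3-2 network with tanh hidden layers and linear output.
rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 3)), rng.normal(size=(3, 4)), rng.normal(size=(2, 4))
y = mfn_forward(np.array([0.5, -0.2]), W1, W2, W3, np.tanh, np.tanh, lambda u: u)
print(y)
```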

3.2.3 Neural networks for pattern recognition

Neural networks are widely used to solve pattern recognition problems. The recognition process using a neural network is presented in Figure 3.6. Three steps of the Ψ algorithm can be distinguished in this case: pre-processing, calculating the neural network output values, and interpreting the network output and mapping it to the result of recognition [5].

Figure 3.6: Pattern recognition procedure using neural networks.

Pre-processing is the step in which the input vector of features x̄ is transformed into the network input vector x. For instance, x̄ may contain nominal features, so using x̄ as an input directly is impossible. Another example is the handwritten character recognition problem, in which each character is represented by a sequence of coordinates of varying length. During pre-processing all sequences must be transformed to vectors of the same length.

In the mapping procedure the network output vector y must be interpreted and transformed to the value j, which is the result of recognition. For example, y_j, the j-th member of the vector y, can represent the confidence that the object is recognized as a member of the j-th class. In this case the class with the highest confidence rate is chosen. The pre-processing procedure strongly depends on the vector of features. The mapping process is more independent, because there are universal methods of transferring the network output to the classification result.
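The universal mapping just described reduces to one line; as a hedged illustration (names are mine):

```python
import numpy as np

def map_output_to_class(y: np.ndarray) -> int:
    """Interpret network outputs as class confidences: j = argmax_i y_i."""
    return int(np.argmax(y))

print(map_output_to_class(np.array([0.1, 0.7, 0.2])))  # prints 1
```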


Chapter 4

MNN model

4.1 Two-stage neural network

4.1.1 Two-stage identification concept

The idea of the two-stage neural network is based on the two-stage identification problem [9, 25]. The model of the two-stage identification system can be observed in Figure 4.1. The system can be described using two equations:

$$y = F_1(x_1, a_1) \quad (4.1)$$

$$a_1 = F_2(x_2, a_2) \quad (4.2)$$

Figure 4.1: Two-stage identification model [25].

The object O1 (the object on the first stage) is described by the function F1. The output y of the object on the first stage depends on the input x1 and the parameter a1, which is the output of the object on the second stage (O2). The object O2 is described by the function F2, which depends on the input x2 and the parameter a2.


To construct a two-stage identification system which will be able to solve the identification problem, it is necessary to define the model of the function on each stage. Then the parameters of the functions should be estimated to give the system the opportunity to produce satisfactory identification results.

The complex network system presented in the thesis combines the two-stage (multistage) identification concept with the MFN model. In this case the considered objects are neural networks, and the influence of one network on the parameters of another is observed. Two types of parameters are going to be considered: weight values and activation function parameters.

4.1.2 Two-stage neuron

The model of the two-stage neuron is going to be presented first. Consider the structure presented in Figure 4.2. It consists of l + 1 neurons. One neuron, which can be identified by its parameters - the vector of weights $w(1) = [w(1)_1, \ldots, w(1)_{n(1)}]$ and the activation function parameter γ - is the first stage neuron. A fixed model of the activation function f with a single, scalar parameter is assumed in this case. All parameters of this neuron depend on the second stage neurons. There are l neurons on the second stage, which can be identified by the weight vectors $w(2,i) = [w(2,i)_1, \ldots, w(2,i)_{n(2)_i}]$, where i = 1, . . . , l. The activation function parameters were omitted on the second stage, because constant values were assumed for them to simplify the notation. The binding items between the stages are the functions $F_1, \ldots, F_{n(1)+1}$. They take as input the output values of the second stage neurons and give the parameter values of the first stage neuron as output.

The two-stage neuron can be defined as a structure of simple neurons: one neuron described by the equation:

$$y(1) = f\left(\gamma,\ \sum_{j=1}^{n(1)} w(1)_j\, x(1)_j\right) \quad (4.3)$$

and l neurons described by the equations:

$$y(2)_i = g\left(\sum_{j=1}^{n(2)_i} w(2,i)_j\, x(2,i)_j\right) \quad (4.4)$$


Figure 4.2: Two-stage neuron model.


where i = 1, . . . , l. The relationships between the neurons are described by the binding functions:

$$w(1)_j = F_j(y(2)_1, \ldots, y(2)_l) \quad (4.5)$$

$$\gamma = F_{n(1)+1}(y(2)_1, \ldots, y(2)_l) \quad (4.6)$$

where j = 1, . . . , n(1).

In the presented two-stage model the dependency of the activation function parameter γ is included. There are some attempts at tuning certain models of activation functions during network training [10, 12, 32]. The optimal value of the γ parameter can be obtained during this process, which requires estimating the network parameters. However, a constant γ value, chosen arbitrarily by the network designer, is going to be considered in the thesis.

4.1.3 Binding functions

The set of binding functions is created by the functions $F_i : Y(2)_1 \times \cdots \times Y(2)_l \to V(1)_i$, where $Y(2)_j$ is the universe of possible values of $y(2)_j$ and $V(1)_i$ is the universe of possible values of $w(1)_i$.

The most common example of a binding function model is a linear dependency between the stages. In this case the set of binding functions consists of the following functions:

$$F_i(y(2)_1, \ldots, y(2)_l) = a_{(1,i)}\, y(2)_1 + \cdots + a_{(l,i)}\, y(2)_l \quad (4.7)$$

where i = 1, . . . , n(1). In matrix notation this set of equations is equivalent to:

$$w(1) = A\, y(2)^T \quad (4.8)$$

where $w(1) = [w(1)_1, \ldots, w(1)_{n(1)}]$, $y(2) = [y(2)_1, \ldots, y(2)_l]$ and $A = [a_{(i,j)}]$. It is easy to notice that in this case the set of binding functions creates one layer of a linear network. The binding function can be interpreted as the activation function of the neuron created in this way. For instance, the second stage neurons can create a one-layer linear neural network. It was proved in [7] that an N-layer linear neural network can be reduced to a one-layer linear neural network, so creating an additional linear layer using binding functions is pointless. It was mentioned in the thesis that a three-layer MFN is sufficient for all considered problems, so again a next layer created by the binding functions is useless. The model of the two-stage neuron (Figure 4.2) presented in the thesis is only theoretical and makes sense only if the second stage neurons do not create a network structure. In practice, there is no sense in creating binding functions other than in the way presented in Figure 4.3. It means that:

$$F_i(y(2)_1, \ldots, y(2)_l) = F_i(y(2)_i) = y(2)_i \quad (4.9)$$

Figure 4.3: An example of a two-stage neuron model with direct dependency between stages.

The dependency between the stages is direct. This type of binding between stages is going to be considered in this thesis.
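The direct binding of equation 4.9 can be sketched in a few lines; the following is my own hedged construction (activation choices are assumptions of the example): each weight of the first stage neuron is simply the output of one second stage neuron, as in Figure 4.3.

```python
import math

def simple_neuron(x, w):
    """Second stage neuron (equation 4.4) with a unipolar sigmoid g."""
    u = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-u))

def two_stage_neuron(x1, x2_list, w2_list, gamma=1.0):
    """Two-stage neuron with direct binding (equation 4.9).

    x1      : input of the first stage neuron, length n(1)
    x2_list : inputs of the l = n(1) second stage neurons
    w2_list : weight vectors of the second stage neurons
    """
    # Equations 4.5 and 4.9: w(1)_j = y(2)_j, computed by stage two.
    w1 = [simple_neuron(x2, w2) for x2, w2 in zip(x2_list, w2_list)]
    # Equation 4.3 with a bipolar sigmoid f and a fixed slope gamma.
    u = sum(wj * xj for wj, xj in zip(w1, x1))
    return (1.0 - math.exp(-gamma * u)) / (1.0 + math.exp(-gamma * u))

print(two_stage_neuron(x1=[0.2, -0.5],
                       x2_list=[[1.0, 0.0], [0.5, 0.5]],
                       w2_list=[[0.3, -0.1], [0.2, 0.4]]))
```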

4.1.4 Model of two-stage MFN

To define the two-stage MFN, the following MFNs must be considered first. They can be described using the matrix equations:

$$y(1) = h(1)(W(1)^{[3]}[g(1)(W(1)^{[2]}[f(1)(W(1)^{[1]}x(1)^T), 1]^T), 1]^T) \quad (4.10)$$

$$y(2) = h(2)(W(2)^{[3]}[g(2)(W(2)^{[2]}[f(2)(W(2)^{[1]}x(2)^T), 1]^T), 1]^T) \quad (4.11)$$

where:

$x(j_{stage})$ - input vector of the network situated on stage $j_{stage}$,

$y(j_{stage})$ - output vector of the network situated on stage $j_{stage}$,

$f(j_{stage})$, $g(j_{stage})$, $h(j_{stage})$ - activation functions of the first, second and third layers of the NN situated on stage $j_{stage}$,

$W(j_{stage})^{[j_{layer}]} = [w(j_{stage})^{[j_{layer}]}_{i,j}]$ - matrix of weights of layer $j_{layer}$ of the NN situated on stage $j_{stage}$.

As can be noticed, constant values of the activation function parameters are assumed. The configurations of the networks are n(1) − m(1) − k(1) − p(1) and n(2) − m(2) − k(2) − p(2) respectively. The two-stage MFN is a neural network structure which consists of simple and two-stage neurons. The first stage is composed of first stage neurons, which create the MFN described by equation 4.10. The second stage neurons create the last layer of the second stage MFN described by equation 4.11; the other layers of the second stage network consist of simple neurons. For instance, the i(1)-th two-stage neuron, creating the first layer of the first stage network and the last layer of the second stage, can be described using the equation:

$$o(1)^{[1]}_{i(1)} = f(1)\left(\sum_{j(1)=1}^{n(1)+1} w(1)^{[1]}_{i(1),j(1)}\, x(1)_{j(1)}\right) \quad (4.12)$$

and the set of equations describing the neurons whose output values are directly the weight values on the first stage of the considered i(1)-th two-stage neuron:

$$w(1)^{[1]}_{i(1),j(1)} = h(2)\left(\sum_{j(2)=1}^{k(2)+1} w(2)^{[3]}_{m(1)(i(1)-1)+j(1),\, j(2)}\; o(2)^{[2]}_{j(2)}\right), \qquad j(1) = 1, \ldots, n(1)+1 \quad (4.13)$$

where:

$o(j_{stage})^{[j_{layer}]}_i$ - output of the i-th neuron of layer $j_{layer}$ of the network situated on stage $j_{stage}$,

$x(j_{stage})_j$ - the j-th coordinate of the vector $x(j_{stage})$.

Defining the two-stage network structure using two-stage neurons is complex and can be unclear. It is much easier to define the two-stage MFN as two MFN networks described by equations 4.10 and 4.11, bound by the vector equation:

$$y(2) = [\mathrm{reshape}(W(1)^{[1]}),\ \mathrm{reshape}(W(1)^{[2]}),\ \mathrm{reshape}(W(1)^{[3]})] \quad (4.14)$$

The function reshape changes an m × n matrix W = [w_{i,j}] into a vector of length m · n which consists of all members of the matrix:

$$\mathrm{reshape}(W) = [w_{1,1}, w_{2,1}, \ldots, w_{m,1}, \ldots, w_{1,j}, w_{2,j}, \ldots, w_{m,j}, \ldots, w_{1,n}, w_{2,n}, \ldots, w_{m,n}] \quad (4.15)$$

It is also important to mention that a vector of vectors is interpreted as a simple vector with all coordinates of the included vectors in the indicated order. For instance the vector $v = [v_1\ v_2]$, where $v_1 = [v_{1,1}, \ldots, v_{1,n_1}]$ and $v_2 = [v_{2,1}, \ldots, v_{2,n_2}]$, is simply the vector $v = [v_{1,1}, \ldots, v_{1,n_1}, v_{2,1}, \ldots, v_{2,n_2}]$.
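Equation 4.15 corresponds to a column-major flattening, so a hedged NumPy one-liner (my own illustration) reproduces both it and the concatenation of equation 4.14:

```python
import numpy as np

def reshape_to_vector(W: np.ndarray) -> np.ndarray:
    """Equation 4.15: column-major flattening w11, w21, ..., wm1, w12, ..."""
    return W.flatten(order="F")

W = np.array([[1, 2],
              [3, 4],
              [5, 6]])           # m = 3, n = 2
print(reshape_to_vector(W))      # -> [1 3 5 2 4 6]

# Equation 4.14: the second stage target is the concatenation of the
# reshaped first stage weight matrices (a vector of vectors read as one).
W1, W2, W3 = W, W.T, np.eye(2)
y2 = np.concatenate([reshape_to_vector(M) for M in (W1, W2, W3)])
print(y2.shape)                  # (6 + 6 + 4,) = (16,)
```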

Figure 4.4: Two-stage MFN model.

The number of MFN parameters can be very large. In this case it is better to consider more than one network on the second stage, to avoid a huge number of outputs of one network. It is also important to discuss the possibility of only partial dependency between the stages. In this case, simple neurons can occur in the first stage network. However, the problem of estimating the independent weights is visible here: the independent parameters should fit the problem globally, while the dependent parameters fit it locally, in a way determined by the second stage input. The model of a two-stage network with partial dependency can be transformed into a complete dependency model; the knowledge related to the independent parameters would be accumulated in the bias term.

4.1.5 Two-stage MFN for pattern recognition

If the schema of two-stage pattern recognition from Figure 2.2 is taken and extended by using neural networks as the recognition algorithms, the structure presented in Figure 4.5 is gained. As was presented in Chapter 2, the algorithm on the first stage takes as an input the result of recognition on the second stage. The result j2 can indicate what kind of neural network, which type of activation function or which values of parameters are going to be used on the first stage. The main goal of recognition on the second stage is to 'choose' the best model of the network for the first stage. If a three-layer feedforward network model with arbitrarily chosen activation functions is assumed, the behaviour of the network on the first stage can be changed by manipulating the values contained in the weight matrices, which are denoted as W. The values of the weights are switched by the results of recognition on the second stage.

Figure 4.5: Two-stage pattern recognition procedure using neural networks

A typical model of a two-stage neural network for a pattern recognition problem is presented in Figure 4.6. In this case there are no two separate algorithms on the two stages. There is only one algorithm Ψ(x̄(1), x̄(2)) which includes a complex, two-stage mechanism similar to the two-stage recognition model.

Figure 4.6: Two-stage neural network used for pattern recognition

The influence of the second stage structure is bounded by the set of possible j2 values for the two-stage pattern recognition model. The model of the two-stage neural network enables a variety of possible influences determined by y. For example, the vector x̄(1) can represent the features characteristic of a handwritten character, while the vector x̄(2) can contain features describing the style of writing. Using the two-stage pattern recognition model, the writer (or a group of writers with a similar handwriting style) can be identified first using the Ψ2 algorithm. Then the result of recognition j2 determines the choice of weights for the first stage network, which is the recognition method used in the algorithm Ψ1. Finally the desired recognition value j1 is obtained. There could be a situation in which the style of writing of the considered character is almost equally similar to two of the possible styles. In this case the algorithm Ψ2 would choose one of the two similar styles of writing, and the knowledge of the similarity to the other style would be lost. If a two-stage neural network is used, the directly taken network output y can determine a style of writing composed of the two similar styles. The chosen weight values in this case will enable recognition of a character written in the combined style, which was impossible in the two-stage pattern recognition process.

Two-stage neural networks do not constrain the recognition process on the second stage. For instance, the sub-universe X(2) ⊂ X can be taken and a clustering algorithm can be used to gain the possible classes. The second stage network, taking the vector of features x̄(2), would return the cluster number, which would indicate the proper weight values W.

4.2 Multistage generalization for two-stage neural networks

The model of the two-stage MFN can be extended to a multistage MFN model. A multistage MFN consists of N MFNs described by the equations:

$$y(j) = h(j)(W(j)^{[3]}[g(j)(W(j)^{[2]}[f(j)(W(j)^{[1]}x(j)^T), 1]^T), 1]^T) \quad (4.16)$$

where j = 1, . . . , N. The binding between the stages can be described using N − 1 equations:

$$y(j) = [\mathrm{reshape}(W(j-1)^{[1]}),\ \mathrm{reshape}(W(j-1)^{[2]}),\ \mathrm{reshape}(W(j-1)^{[3]})] \quad (4.17)$$

where j = 2, . . . , N.

The multistage MFN can be used for MPR in an analogous way as was presented for the two-stage PR problem. Instead of using an algorithm Ψj on each stage of the recognition process, the MNN is used, which takes the vectors of features related to the stages.

MFNs are objects which must be described by plenty of parameters. This differs the MNN from multistage identification, in which the number of parameters describing the object depends on the model of the function. Parameter estimation can be unacceptably time-consuming even for a few stages of MNN.


Chapter 5

Introduction to estimation of MNN parameters

5.1 Estimation of parameters for two-stage identification

As an introduction to MNN parameter estimation, it is necessary to present the procedure of parameter estimation for the two-stage identification problem. The model of two-stage identification is presented in Figure 4.1 and is described by equations 4.1 and 4.2. We assume that the models of the functions F1, F2 of the objects O1 and O2 respectively are known. The goal of identification in this case is to estimate the value of the parameter a2, which is the parameter of the function F2 and the only independent one for the considered system, having the following training set:

$$(x_{2,1}, x_{1,1,1}, y_{1,1}), \ldots, (x_{2,1}, x_{1,1,N_1}, y_{1,N_1}), \ldots, (x_{2,j}, x_{1,j,1}, y_{j,1}), \ldots, (x_{2,j}, x_{1,j,N_j}, y_{j,N_j}), \ldots, (x_{2,M}, x_{1,M,1}, y_{M,1}), \ldots, (x_{2,M}, x_{1,M,N_M}, y_{M,N_M})$$

In the two-stage identification system the parameter a1 should be estimated for each constant input value of x2. This provides the set of input values $x_{2,j}$ and corresponding output values $a_{1,j}$, which are the estimated values of the parameter on the first stage. The estimation of the parameter a1 is made M times, once for each constant value $x_{2,j}$ (j = 1, . . . , M), using the sequence of observed input values $x_{1,j,1}, \ldots, x_{1,j,i}, \ldots, x_{1,j,N_j}$ and output values $y_{j,1}, \ldots, y_{j,i}, \ldots, y_{j,N_j}$ on the first stage with a supervised algorithm of identification:

$$a_{1,j} = \Psi_1((x_{1,j,1}, y_{j,1}), \ldots, (x_{1,j,i}, y_{j,i}), \ldots, (x_{1,j,N_j}, y_{j,N_j})) \quad (5.1)$$


Each tact provides a new member of the training set, which can be used for estimating the parameter on the second stage. The complete sequence of input values $x_{2,1}, \ldots, x_{2,j}, \ldots, x_{2,M}$ and output values $a_{1,1}, \ldots, a_{1,j}, \ldots, a_{1,M}$ enables the estimation of the parameter a2 using a supervised algorithm of estimation on the second stage:

$$a^*_2 = \Psi_2((x_{2,1}, a_{1,1}), \ldots, (x_{2,j}, a_{1,j}), \ldots, (x_{2,M}, a_{1,M})) \quad (5.2)$$

The parameter a1 is estimated M times, once for each constant value of the input x2 on the second stage. Each time the training set $(x_{1,j,1}, y_{j,1}), \ldots, (x_{1,j,i}, y_{j,i}), \ldots, (x_{1,j,N_j}, y_{j,N_j})$ is used for the estimation. Then the second stage parameter is gained using a training set composed of the pairs $(x_{2,j}, a_{1,j})$.

The estimation of parameters for multistage identification is analogous to the one presented for the two-stage example. The process of parameter estimation, also for the multistage identification model, is described in detail in [25].

One can notice that this kind of estimation method can be easily used for two-stage networks. The weights on the first stage can be estimated during training for each input value on the second stage. The training set for the second stage network can then be gained and the parameter values for this network estimated. The backpropagation algorithm for MFN is presented next to complete the survey for MNN learning.

5.2 Parameter estimation for neural networks - learning methods

5.2.1 Learning taxonomy

The following taxonomy of learning methods can be presented for neural networks [7]:

1. Supervised learning.

2. Unsupervised learning.

(a) Corrective learning.

(b) Reinforced learning.


For supervised learning the training set includes input and desired output values. The members of the training set are used for parameter estimation. The backpropagation algorithm for MFN is the most common example of supervised learning. There are also unsupervised learning methods, for which only input values are known for the estimation. Here the division into corrective and reinforced learning can be mentioned: the first kind of learning methods is related to real-valued desired output signals and the second one to boolean values of the output.

5.2.2 Backpropagation algorithm

The backpropagation algorithm is the most typical method for learning an MFN. The method is based on the steepest descent method, in which the estimated parameter is updated iteratively by subtracting the gradient of the error multiplied by the parameter α.

The MFN presented in Figure 3.5 is considered for backpropagation learning. The weight values of each layer, $w^{[j_{layer}]}_{i,j}$, are iteratively estimated using the training set $(x_1, t_1), \ldots, (x_{i_{train}}, t_{i_{train}}), \ldots, (x_N, t_N)$ with the procedure:

$$w^{[j_{layer}](t_{n+1})}_{i,j} = w^{[j_{layer}](t_n)}_{i,j} - \alpha \frac{\partial E_{i_{train}}}{\partial w^{[j_{layer}](t_n)}_{i,j}} = w^{[j_{layer}](t_n)}_{i,j} + \Delta w^{[j_{layer}]}_{i,j} \quad (5.3)$$

where:

α - stepsize,

$w^{[j_{layer}](t_n)}_{i,j}$ - value of the weight $w^{[j_{layer}]}_{i,j}$ at the moment $t_n$,

$E_{i_{train}}$ - the error value for the training member $(x_{i_{train}}, t_{i_{train}})$:

$$E_{i_{train}} = \frac{1}{2}\sum_{i=1}^{p}\left(t_{i,i_{train}} - y_{i,i_{train}}\right)^2 = \frac{1}{2}\sum_{i=1}^{p} e^2_{i,i_{train}} \quad (5.4)$$

where $y_{i_{train}}$ is the vector of output values for the input vector $x_{i_{train}}$. The initial values $w^{[j_{layer}](t_0)}_{i,j}$ are chosen randomly and the updating process is stopped when the error value reaches the stopping criterion.

At the beginning the estimation process is considered on the last layer. The weights are updated in the following way:


$$\Delta w^{[3]}_{i,j} = -\alpha \frac{\partial E_{i_{train}}}{\partial w^{[3]}_{i,j}} = -\alpha \frac{\partial E_{i_{train}}}{\partial u^{[3]}_i}\,\frac{\partial u^{[3]}_i}{\partial w^{[3]}_{i,j}} \quad (5.5)$$

$u^{[3]}_i$ can be described as:

$$u^{[3]}_i = \sum_{j=1}^{k+1} w^{[3]}_{i,j}\, o^{[2]}_j \quad (5.6)$$

It is important to mention that $o^{[2]}$ is in this case an extended vector, which means that $o^{[2]}_{k+1} = 1$. So it can be written:

$$\frac{\partial u^{[3]}_i}{\partial w^{[3]}_{i,j}} = o^{[2]}_j \quad (5.7)$$

Next, the local error on the third layer is defined as:

$$\delta^{[3]}_i = -\frac{\partial E_{i_{train}}}{\partial u^{[3]}_i} = -\frac{\partial E_{i_{train}}}{\partial e_{i,i_{train}}}\,\frac{\partial e_{i,i_{train}}}{\partial u^{[3]}_i} = -e_{i,i_{train}}\,\frac{\partial e_{i,i_{train}}}{\partial u^{[3]}_i} \quad (5.8)$$

$$\delta^{[3]}_i = -e_{i,i_{train}}\,\frac{\partial e_{i,i_{train}}}{\partial y_i}\,\frac{\partial y_i}{\partial u^{[3]}_i} = e_{i,i_{train}}\,\frac{\partial h(u^{[3]}_i)}{\partial u^{[3]}_i} \quad (5.9)$$

So the weights on the third layer can be updated in the following way:

$$\Delta w^{[3]}_{i,j} = \alpha\, \delta^{[3]}_i\, o^{[2]}_j \quad (5.10)$$

Next, the updating process for the weights of the second layer is presented:

$$\Delta w^{[2]}_{i,j} = -\alpha \frac{\partial E_{i_{train}}}{\partial w^{[2]}_{i,j}} = -\alpha \frac{\partial E_{i_{train}}}{\partial u^{[2]}_i}\,\frac{\partial u^{[2]}_i}{\partial w^{[2]}_{i,j}} = -\alpha \frac{\partial E_{i_{train}}}{\partial u^{[2]}_i}\, o^{[1]}_j \quad (5.11)$$

As on the previous layer, the local error $\delta^{[2]}_i$ is defined:

$$\delta^{[2]}_i = -\frac{\partial E_{i_{train}}}{\partial u^{[2]}_i} = -\frac{\partial E_{i_{train}}}{\partial o^{[2]}_i}\,\frac{\partial o^{[2]}_i}{\partial u^{[2]}_i} = -\frac{\partial E_{i_{train}}}{\partial o^{[2]}_i}\,\frac{\partial g(u^{[2]}_i)}{\partial u^{[2]}_i} \quad (5.12)$$


Then $-\frac{\partial E_{i_{train}}}{\partial o^{[2]}_i}$ must be calculated:

$$-\frac{\partial E_{i_{train}}}{\partial o^{[2]}_i} = -\sum_{j=1}^{p}\frac{\partial E_{i_{train}}}{\partial u^{[3]}_j}\,\frac{\partial u^{[3]}_j}{\partial o^{[2]}_i} = \sum_{j=1}^{p}\delta^{[3]}_j\, w^{[3]}_{j,i} \quad (5.13)$$

So the weights are updated in the following way:

$$\Delta w^{[2]}_{i,j} = \alpha\, \delta^{[2]}_i\, o^{[1]}_j \quad (5.14)$$

In the same way the weights of the first layer are modified:

$$\Delta w^{[1]}_{i,j} = \alpha\, \delta^{[1]}_i\, x_{j,i_{train}} \quad (5.15)$$

The current pair $(x_{i_{train}}, t_{i_{train}})$ is chosen randomly during each iteration. The authors of [7] present the backpropagation algorithm in the following way:

Given

• the model of the three-layer feedforward network

• the training set $(x_1, t_1), \ldots, (x_{i_{train}}, t_{i_{train}}), \ldots, (x_N, t_N)$

• the stepsize value α

• the stopping criterion value ε

Find

• the estimated values of the weights: $W^{[1]}$, $W^{[2]}$, $W^{[3]}$.

Step 1

Initiate the weight matrices $W^{[1]}$, $W^{[2]}$, $W^{[3]}$. The following rule of initialization is proposed in [7]: "Pick values randomly in the interval < −0.5, 0.5 > and divide with fan-in, which is the number of units feeding the layer."

Step 2

Pick a member $(x_{i_{train}}, t_{i_{train}})$ of the training set. Calculate the output $y_{i_{train}}$ for the input value $x_{i_{train}}$ using equation 3.8. If $\frac{1}{p}\sum_{i=1}^{p}(y_{i,i_{train}} - t_{i,i_{train}})^2 \le \varepsilon$ then stop (ε is the maximal acceptable error value, set before training).

Step 3

Find the weight corrections for each layer: $\Delta W^{[3]} = [\Delta w^{[3]}_{i,j}]$, $\Delta W^{[2]} = [\Delta w^{[2]}_{i,j}]$, $\Delta W^{[1]} = [\Delta w^{[1]}_{i,j}]$, using equations 5.10, 5.14 and 5.15.

Step 4

Update the weight values: $W^{[3]} = W^{[3]} + \Delta W^{[3]}$, $W^{[2]} = W^{[2]} + \Delta W^{[2]}$ and $W^{[1]} = W^{[1]} + \Delta W^{[1]}$. Go to Step 2.

Additional features improving convergence speed and generalization are possible. The most common is the momentum term, for which the process of updating the weight values includes an average of the previous gradients. The momentum term is described in detail in [7] and [10].
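The steps above fit in a compact NumPy sketch; the following is my own hedged implementation, assuming unipolar sigmoid activations on all three layers and online updates, and should not be read as the exact procedure of [7]:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, Ws):
    """Return activations o[0] = x, o[1], o[2], o[3] = y (biases appended)."""
    os = [x]
    for W in Ws:
        os.append(sigmoid(W @ np.append(os[-1], 1.0)))
    return os

def backprop_step(x, t, Ws, alpha):
    """One online update of the three matrices in Ws (equations 5.3-5.15)."""
    os = forward(x, Ws)
    y = os[-1]
    e = t - y
    # Local error on the output layer; sigmoid'(u) = o * (1 - o).
    delta = e * y * (1.0 - y)                 # equation 5.9
    for layer in (2, 1, 0):                   # third, second, first layer
        o_prev = np.append(os[layer], 1.0)
        grad = np.outer(delta, o_prev)        # equations 5.10 / 5.14 / 5.15
        if layer > 0:
            # Equation 5.13 (drop the bias column) times the derivative
            # of the previous layer's activation (equation 5.12).
            delta = (Ws[layer][:, :-1].T @ delta) * os[layer] * (1.0 - os[layer])
        Ws[layer] += alpha * grad
    return 0.5 * np.sum(e ** 2)               # equation 5.4

# Toy run: a 2-3-3-1 MFN on XOR, initialized per Step 1 (fan-in scaled).
rng = np.random.default_rng(1)
Ws = [rng.uniform(-0.5, 0.5, size=s) / s[1] for s in ((3, 3), (3, 4), (1, 4))]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
for _ in range(20000):
    i = rng.integers(4)                       # Step 2: pick a random member
    backprop_step(X[i], T[i], Ws, alpha=0.5)
# Outputs should approach [0, 1, 1, 0] (a toy check, convergence not guaranteed).
print(np.round([forward(x, Ws)[-1][0] for x in X], 2))
```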


Chapter 6

MNN learning method

The concept of the two-stage MFN with its multistage generalization was presented in Chapter 4. The process of parameter estimation for the two-stage identification system was presented in Chapter 5. In this part of the thesis the process of parameter estimation for MNN is described. The author concentrates mostly on the two-stage case of MNN, but a brief description of the learning procedure for a larger number of stages is provided. It is also important to add a few words about the process of estimating the weights of the MFNs for the classical problem of two-stage PR using NN.

6.1 Two-stage MFN learning process

The process of weight estimation for the two-stage MFN is very similar to the method of parameter estimation used in two-stage identification. The model of the two-stage MFN described in Chapter 4 is known. The aim of estimation in this case is to find the weight values on the second stage (the values of W(2)[1], W(2)[2], W(2)[3]) from the training set (x(2)_1, x(1)_{1,1}, y(1)_{1,1}), . . . , (x(2)_1, x(1)_{1,N_1}, y(1)_{1,N_1}), . . . , (x(2)_j, x(1)_{j,1}, y(1)_{j,1}), . . . , (x(2)_j, x(1)_{j,N_j}, y(1)_{j,N_j}), . . . , (x(2)_M, x(1)_{M,1}, y(1)_{M,1}), . . . , (x(2)_M, x(1)_{M,N_M}, y(1)_{M,N_M}).

For each constant j-th vector of values x(2)_j of the vector x(2), the network on the first stage is trained with the training set (x(1)_{j,1}, y(1)_{j,1}), . . . , (x(1)_{j,i}, y(1)_{j,i}), . . . , (x(1)_{j,N_j}, y(1)_{j,N_j}) using the backpropagation algorithm. In this process of estimation the values W(1)[1]_j, W(1)[2]_j, W(1)[3]_j of the weight matrices are obtained. After reshaping (see equation 4.14) the pair (x(2)_j, y(2)_j) is gained. The values of W(2)[1], W(2)[2], W(2)[3] are calculated with the training set (x(2)_1, y(2)_1), . . . , (x(2)_j, y(2)_j), . . . , (x(2)_M, y(2)_M) using the backpropagation algorithm. These values are the desired result of estimation, because the weights accumulated in the matrices W(2)[1], W(2)[2], W(2)[3] are the only parameters of the two-stage MFN.

In practice, the possibility of having a training set with a sufficient number of members for each constant value of x(2) is very low. Especially for a two-stage neural network used for PR it is better to train the network not for one constant value x(2)_j, but for a set of similar, related values x(2)_{j,1}, . . . , x(2)_{j,N(2)_j}. In one tact the members (x(2)_{j,1}, y(2)_j), . . . , (x(2)_{j,N(2)_j}, y(2)_j) of the training set used on the second stage are gained instead of only one pair. For a two-stage MFN used for two-stage PR the number of tacts during estimation can be equal to the number of possible recognition results on the second stage of the two-stage recognition process.
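A compact sketch of this two-stage estimation loop (Python; train_mfn stands in for the backpropagation procedure of Chapter 5 and reshape_weights for equation 4.14, both assumed rather than reproduced here):

    def estimate_two_stage(tacts, train_mfn, reshape_weights):
        # tacts: list of (x2_values, first_stage_samples) pairs, one per tact;
        # x2_values holds one or more similar x(2) vectors for the tact
        second_stage_samples = []
        for x2_values, samples_j in tacts:
            Ws_j = train_mfn(samples_j)          # first-stage backpropagation
            y2_j = reshape_weights(Ws_j)         # eq. 4.14: matrices -> vector
            second_stage_samples += [(x2, y2_j) for x2 in x2_values]
        # the second-stage weights are the only parameters of the model
        return train_mfn(second_stage_samples)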

6.2 Multistage MFN learning process

It is easy to extend the procedure of estimation to the multistage MFN. The algorithm for estimating the parameters of an N-stage network can be presented in the following way:

Given

• the model of the N-stage MFN

• a training set composed of members (x(N)_{j_N}, x(N−1)_{j_N,j_{N−1}}, . . . , x(2)_{j_N,j_{N−1},...,j_2}, x(1)_{j_N,j_{N−1},...,j_2,j_1}, y(1)_{j_N,j_{N−1},...,j_2,j_1}), where j_N = 1, . . . , M, j_{N−1} = 1, . . . , N_{j_N}, . . . , j_1 = 1, . . . , N_{j_N,...,j_2}

• the parameters of the backpropagation algorithm for the MFN

Find

• the estimated values of the weights: W(N)[1], W(N)[2], W(N)[3].

Step 1

Set jstage = 1.


Step 2

For each constant vector of values x(j_stage+1)_{j_N,j_{N−1},...,j_{j_stage+1}}, estimate the weight matrix values W(j_stage)[1]_{j_N,...,j_{j_stage+1}}, W(j_stage)[2]_{j_N,...,j_{j_stage+1}}, W(j_stage)[3]_{j_N,...,j_{j_stage+1}} using the backpropagation algorithm with the training set (x(j_stage)_{j_N,...,j_{j_stage+1},1}, y(j_stage)_{j_N,...,j_{j_stage+1},1}), (x(j_stage)_{j_N,...,j_{j_stage+1},2}, y(j_stage)_{j_N,...,j_{j_stage+1},2}), . . . , (x(j_stage)_{j_N,...,j_{j_stage+1},N_{j_N,...,j_{j_stage+1}}}, y(j_stage)_{j_N,...,j_{j_stage+1},N_{j_N,...,j_{j_stage+1}}}).

Step 3

Transform each triple of weight matrices calculated in Step 2, W(j_stage)[1]_{j_N,...,j_{j_stage+1}}, W(j_stage)[2]_{j_N,...,j_{j_stage+1}}, W(j_stage)[3]_{j_N,...,j_{j_stage+1}}, into the vector of values y(j_stage+1)_{j_N,...,j_{j_stage+1}} using the reshape method. Increase j_stage: j_stage = j_stage + 1. If j_stage < N go to Step 2.

Step 4

Estimate the values of the weights W(N)[1], W(N)[2], W(N)[3] using the backpropagation algorithm with the training sequence (x(N)_1, y(N)_1), . . . , (x(N)_M, y(N)_M). These values are the desired result of estimation.

For the pattern recognition usage of the MNN it is necessary to modify the method of estimation in the way presented for the two-stage NN.
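The N-stage procedure above can be sketched as a loop over stages (Python; train_mfn and reshape_weights are again assumed stand-ins, and grouping samples in dictionaries keyed by the tuple of higher-stage indices (j_N, . . . , j_{stage+1}) is an illustrative data layout, not the thesis' notation):

    def estimate_n_stage(first_stage_sets, x_next, N, train_mfn, reshape_weights):
        # first_stage_sets[ctx]: training set for context ctx = (j_N, ..., j_2)
        # x_next[ctx]: constant input vector of the next stage for context ctx
        current = first_stage_sets                       # Step 1: j_stage = 1
        for stage in range(1, N):                        # Steps 2 and 3
            nxt = {}
            for ctx, samples in current.items():
                Ws = train_mfn(samples)                  # Step 2
                y = reshape_weights(Ws)                  # Step 3 (eq. 4.14)
                nxt.setdefault(ctx[:-1], []).append((x_next[ctx], y))
            current = nxt
        return train_mfn(current[()])                    # Step 4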

6.3 Parameter estimation for MFNs used for the two-stage PR problem

The two-stage PR process with NNs used as the recognition algorithms is presented in Figure 4.5 and was described in Chapter 4. To estimate the parameters of the networks situated on the first and second stage of the structure it is necessary to have a sample of pairs of feature vectors and corresponding class values for each stage. It means that the following training set is known: (x̄(1)_1, x̄(2)_1, j_{1,1}, j_{2,1}), . . . , (x̄(1)_M, x̄(2)_M, j_{1,M}, j_{2,M}). It is important to highlight that the presented training set is for the PR algorithms, not for the networks directly.

The second-stage recognition process is totally independent and can be used separately. To train the NN used for the algorithm Ψ2 it is necessary to obtain a training set suitable for network estimation. The feature vectors x̄(2)_i should be transformed into input vectors x(2)_i in pre-processing. Having the rules of the mapping procedure, it is easy to retrieve the output vectors y(2)_i from the known class values j_{2,i}. Having the training sequence for the network on the second stage, consisting of the pairs (x(2)_i, y(2)_i), the weight values can be gained using the backpropagation algorithm.

The process of estimation on the first stage is more complex. The algorithm Ψ1 depends on the result of recognition on the second stage. The weight values are changed each time a different result occurs on the second stage. It is easy to observe that the weight values should therefore be estimated for all possible values of j_2. For each possible j_2 = 1, . . . , K_2 only the members (x̄(1)_i, x̄(2)_i, j_{1,i}, j_2) are taken, and the weight values corresponding to the current value of j_2 are estimated with the training set (x(1)_{j_2,i}, y(1)_{j_2,i}) using the backpropagation algorithm. As a result K_2 sets of weight values are gained and the process of estimation is completed. In a two-stage pattern recognition system the second-stage weight values and the weight values for all possible scenarios on the first stage must be remembered. The two-stage MFN, in contrast, is described only by the weight parameters of the second stage, because the first-stage weight values are directly dependent on the second-stage NN output.
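A sketch of this per-class estimation (Python; train_mfn and the preprocess helper that maps a raw pair (x̄(1)_i, j_{1,i}) to a network sample (x(1)_i, y(1)_i) are assumptions):

    def estimate_first_stage(training_set, K2, train_mfn, preprocess):
        # training_set: tuples (x1_raw, x2_raw, j1, j2)
        weights_by_j2 = {}
        for j2 in range(1, K2 + 1):
            samples = [preprocess(x1_raw, j1)
                       for x1_raw, _, j1, j2_i in training_set
                       if j2_i == j2]
            weights_by_j2[j2] = train_mfn(samples)   # one set per class j2
        return weights_by_j2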


Chapter 7

Experiment

7.1 Experiment description

7.1.1 Description of considered pattern recognition problem

The problem of writer identification based on online handwritten iconic gestures is going to be considered as the chosen pattern recognition problem in the experiment. There are plenty of literature items related to the problem of handwritten character recognition [1, 6, 8, 11, 17, 20, 26, 30], but only a few in which writer recognition is considered [3, 4, 19, 23]. In some areas correct and instant writer recognition is as important as recognizing the written character. A domain in which time plays an extremely important role is crisis management. In this case the communication system must work extremely fast. There is a set of publications [20, 29, 30] in which the authors consider a set of 14 icons (Figure 7.1) representing emergency situations. As they concluded, this kind of communication tool is much faster than handwriting, easy to learn and remember, and has visually meaningful shapes.

All considered papers [20, 29, 30] are related to iconic gesture recognition. In this thesis the problem of writer recognition is going to be considered. The direct solution using an MFN is going to be used in the experiment. The two-stage recognition system and the two-stage MFN are going to be tested as alternative methods.

7.1.2 Dataset description

The dataset used for the experiment is fully available on the web page [27]. It

contains a set of icons drawn by 32 volunteers.

Figure 7.1: The set of icons plotted by one of the participants (taken from [30]).

Every participant of the experiment was supposed to fill out 22 paper forms. Each form contained 35 boxes arranged in 7 rows and 5 columns, two calibration crosses and an identification area. As the result, a set consisting of 24,441 samples was gained. The set was reduced to 8 writers and 3256 samples due to the time-consuming data transformation process. Each sample can be described by a writer ID, an icon ID and a trajectory:

σ_i = {(x_i, y_i), f_i, t_i}    (7.1)

where (x_i, y_i) is the current position during drawing, f_i is the current pressure and t_i is the current time of drawing.

7.2 Features selection

Writer identification techniques are in most cases text-based [4, 19, 23] or signature-based [3, 28] methods. In this case the writer is going to be recognized from only one icon produced by him. It is important to highlight that features for the typical sign recognition problem are often used for identification of the producer [3, 26]. It seems natural that features like the number of straight lines or the convex hull area differentiate not only the types of icons or characters but also the styles of writing. In this section two types of features which could be used for writer and gesture recognition are going to be presented: the g-48 features [29] and the Cosine Representation [8] based on the Discrete Cosine Transform.

7.2.1 g-48 features

The set of g-48 features was presented in the technical report related to the dataset [29]. These features can be used not only for iconic gesture recognition; the writer can also be identified using them. The set consists of space-based, dynamic and force-based features. Most of the feature definitions are presented in Appendix A; the others can be found in the mentioned technical report. The categories of the used features are going to be briefly described next.

Length and Area Features

This kind of features are related with shape of the icon. The features like length

of trajectory or the distance first and last sample are considered here. The convex

hull Area is very important in the group. It can be easily calculated using Graham

Algorithm. There are also principal components based features like orientation of the

principal axis or the centroid offset.
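As an illustration, a sketch of two features from this group (Python; scipy's ConvexHull is used for the hull area instead of a hand-written Graham scan):

    import numpy as np
    from scipy.spatial import ConvexHull

    def length_and_hull_area(points):
        pts = np.asarray(points, dtype=float)          # trajectory samples (x, y)
        length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
        area = ConvexHull(pts).volume                  # in 2D, .volume is the area
        return length, area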

Direction, Curvature and Perpendicularity

This kind of subset of features is related with angles between vectors which de-

scribes current writing direction. The most important features in this case are: Aver-

age, absolute and squared curvature, perpendicularity and maximum angular differ-

ence.

Octans

Octans are eight features which describes the the distribution of sample points on

the surface. The circle covering whole icon with center in centroid point is taken and

is divided into eight, equal parts. For each part the number of in-points is calculated

as one feature.
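A sketch of the octant counts (Python; the sector numbering convention is an assumption):

    import numpy as np

    def octant_counts(points):
        pts = np.asarray(points, dtype=float)
        centred = pts - pts.mean(axis=0)                   # centroid at origin
        angles = np.arctan2(centred[:, 1], centred[:, 0])  # in (-pi, pi]
        sectors = ((angles + np.pi) // (np.pi / 4)).astype(int) % 8
        return np.bincount(sectors, minlength=8)           # one count per octant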

Trajectory-based features

As it was mentioned in technical document, trajectory-based features use the

relation between different parts of the trajectory as features. The number of crossings

seems to be the most important in this group of features for iconic gesture and writer

recognition.

Straight Lines

This kinds of features are related to a set of straight lines in the plotted icon. The

technical report present the efficient algorithm for finding the straight lines. The basic

idea of the algorithm is to find the lines not shorter than arbitrarily taken minimal

value, for which the distance for subsequent set of sample points of the trajectory is

closer to the line as a predefined threshold.


Cups

The notion of the cups is very common in handwritten characters analysis. Cups

are U-shape parts of the trajectory. The technical report presents three algorithms

for finding cups related features like: the number of cups present in a trajectory, the

offset (relative start position) of the

first cup and the offset (relative end position) of the last cup.

Dynamic Features

Dynamic features are time depending features. For this group issues like average

velocity or acceleration of the writing can be mentioned.

Force-based Features

This set of features consider the force values of the trajectory. The average pressure

is obvious feature in this case.

7.2.2 Cosine Representation

Figure 7.2: (a) the original plot of the icon; (b) the same icon retrieved from the cosine transform.

The sequences of both coordinates of the sample points s_i = (x_i, y_i) can be transformed into the Cosine Representation using the Discrete Cosine Transform. For a sequence of coordinates (x_i, y_i), where i = 1, . . . , N, the transformed coordinates (v_k, z_k) can be gained in the following way:

v_0 = (1/N) Σ_{n=0}^{N−1} x_n    (7.2)

v_k = (2/N) Σ_{n=0}^{N−1} x_n cos(k t_n)    (7.3)

z_0 = (1/N) Σ_{n=0}^{N−1} y_n    (7.4)

z_k = (2/N) Σ_{n=0}^{N−1} y_n cos(k t_n)    (7.5)

where k = 0, . . . , K − 1 and K is the length of the desired Cosine Representation sequence. The value t_n is defined as:

t_n = (π/N)(n + 1/2)    (7.6)

The symbol can be easily retrieved with a small data loss using the inverse transform [8]. The main advantage of the Cosine Representation is that it stores the features of each symbol in sequences of the same length, so it can easily be used as an input to the neural network.
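A sketch of equations 7.2-7.6 for one coordinate sequence (Python; scipy.fft.dct would give the same family of coefficients up to normalization, so the explicit loop here is purely illustrative):

    import numpy as np

    def cosine_representation(coords, K=16):
        x = np.asarray(coords, dtype=float)
        N = len(x)
        t = np.pi / N * (np.arange(N) + 0.5)              # eq. 7.6
        c = np.array([(2.0 / N) * np.sum(x * np.cos(k * t))
                      for k in range(K)])                 # eqs. 7.3 / 7.5
        c[0] = x.mean()                                   # eqs. 7.2 / 7.4
        return c

    # e.g. 16 coefficients for the x values and 16 for the y values,
    # as used in the experiment:
    # features = np.r_[cosine_representation(xs), cosine_representation(ys)]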

7.3 Methods used for the experiment

The following methods and systems are going to be used for writer recognition:

1. Method 1 - MFN for pattern recognition.

2. Method 2 - Two-stage recognition system with MFNs, with iconic gesture recognition on the second stage.

3. Method 3 - Two-stage MFN for PR.

7.3.1 Method 1 - MFN network

A three-layer feedforward network is used as the first method. This kind of network was presented in Figure 3.5. In each layer bipolar sigmoid functions (see Figure 3.2) are used with a constant activation parameter value γ = 1. The configuration of the network is 89−150−100−8, which means that there are 89 input values and 8 output values, and that the network consists of 3 layers with 150, 100 and 8 neurons respectively. The input vector consists of 57 normalized feature values (the features are presented in Appendix A; one feature value was omitted) and 32 values representing the Cosine Representation: 16 representing the x_i values and 16 representing the y_i values. The output vector takes values from I^8, where I = [−1, 1]. The mapping process (see Figure 3.6) is the following: the maximal value of the output vector is taken, and the position of this value determines the recognition result. The output vector (if the output were normalized, or if a unipolar function were used instead) can be interpreted as a vector of confidence rates of writer recognition; the i-th value of the output vector is related to the i-th writer in this case.

As the learning procedure the backpropagation algorithm was used. The value of the parameter α was found experimentally. Each member of the training set used for estimation consisted of two vectors: the vector of feature values of the considered object, and a target vector which has the value 1 at the i-th position (where i is the current writer's ID) and −1 at the other positions.
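A sketch of this target encoding and of the output mapping (Python; the 1-based writer IDs and the function names are assumptions):

    import numpy as np

    def encode(writer_id, n_classes=8):
        # target vector: 1 at the writer's position, -1 elsewhere
        t = -np.ones(n_classes)
        t[writer_id - 1] = 1.0
        return t

    def decode(output):
        # the position of the maximal output value is the recognition result
        return int(np.argmax(output)) + 1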

7.3.2 Method 2 - Two-stage pattern recognition system with an MFN on each stage

The two-stage recognition system with two three-layer neural networks is going to be used as the next method. The schema of two-stage recognition using NNs was presented in Figure 4.5. On the first stage the writer is going to be recognized; the recognition on this level depends on the recognition result of the second stage. On the second stage the iconic gesture is going to be recognized. The network structure is similar to the MFN used for writer recognition, and the set of features considered for writer recognition is the same as for recognizing the icons. The configuration of the network on the second stage is 89−150−100−14. The mapping process in this case is analogous to the one for writer recognition using the direct solution.

The process of iconic gesture recognition influences both the feature selection and the network parameter selection. Each time a different icon is plotted, the set of features is changed and the network parameter values are switched. In the experiment two sets of features are considered: the whole set, which contains the g-48 and the Cosine Representation features, and a second one, which contains only the g-48 features. As the empirical research shows, for some icons the extended set of features with the Cosine Representation gives better recognition results, but for other icons the results are worse than for the g-48 feature set only. It is easy to observe that for different results on the second stage the input vector of the first-stage network is going to have different lengths. Besides the dependent feature selection, the weight values must be changed in a way determined by the second-stage recognition result. As a result there are 14 sets of possible weight values on the first stage. The number of neurons creating the first-stage network is constant; however, due to the different input vector lengths the number of parameters in the sets of weights is also different. It is important to notice that all network parameters depend on the second-stage recognition result, so the process of switching weights is actually the process of switching the whole network.

The weights on the second stage were estimated using the backpropagation algorithm. This network is totally independent and can be used separately for iconic gesture recognition. For each possible output of the second stage, two sets of parameters describing the network on the first stage are estimated using the backpropagation algorithm. In the first case the network is trained using the whole vector of features; the second kind of training does not include the Cosine Representation. In both cases the training set contains only samples of the considered gesture. As a result, 28 sets of weight values are gained. From each pair of sets related to a chosen icon, the set which gave the better performance on the testing set is chosen; in practice this says which set of features should be taken to achieve a better accuracy of writer recognition. Concluding, if the recognition result on the second stage is j_{2,i}, the set of features which gave the better performance on the testing set is going to be selected. Then the weight values corresponding to the result j_{2,i} are selected and used to gain j_{1,k}, which is the result of recognition on the first stage.
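A sketch of the resulting inference procedure (Python; forward and decode are the assumed network evaluation and mapping helpers, feature_sets[j2] is the extractor chosen for icon j2 on the testing set, and writer_nets[j2] holds the matching first-stage weights):

    def two_stage_recognize(sample, icon_net, writer_nets, feature_sets,
                            forward, decode):
        full = feature_sets["full"]                  # g-48 + Cosine Representation
        j2 = decode(forward(icon_net, full(sample))) # second stage: icon
        extract = feature_sets[j2]                   # per-icon feature choice
        j1 = decode(forward(writer_nets[j2], extract(sample)))
        return j2, j1                                # icon and writer results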

7.3.3 Method 3 - Two-stage MFN for PR

The third method considered in the experiment is the two-stage MFN. The model of this kind of network was described in Chapter 4. The second stage of the model consists of 586 networks with an output vector of size 50. As it was presented in Chapter 4, each output value on the second stage is directly connected with the corresponding weight value. The input vector on the first stage consists of the g-48 features, while the vector on the second stage also includes the Cosine Representation.


The two-stage MFN was estimated using the modified method described in Chapter 6. Each tact of estimation was related to one type of iconic gesture.

7.4 Testing and validation details

As the criterion for evaluating the performance of the recognition systems, the accuracy of recognition is taken. Matlab v. 7.5 was used as the experimental tool. The dataset used for the experiment was split into three sets using stratified sampling [31]:

1. Training set: 36%.

2. Testing set: 24%.

3. Evaluation set: 40%.

The training set is used for parameter estimation. During training the current condition of the learning process is examined using the testing set. To avoid overfitting, those parameters are chosen for which the average training and testing recognition accuracy is the highest. The real recognition accuracy is checked after training using the evaluation set.
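A sketch of such a stratified split (Python; the 36/24/40 fractions follow the list above, while the per-class rounding is an assumption):

    import numpy as np

    def stratified_split(labels, fracs=(0.36, 0.24, 0.40), seed=0):
        # split indices into training/testing/evaluation sets while
        # preserving the per-class proportions
        rng = np.random.default_rng(seed)
        labels = np.asarray(labels)
        train, test, evaluation = [], [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            n1 = int(round(fracs[0] * len(idx)))
            n2 = n1 + int(round(fracs[1] * len(idx)))
            train += list(idx[:n1]); test += list(idx[n1:n2])
            evaluation += list(idx[n2:])
        return train, test, evaluation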

7.5 Results of experiment

Figure 7.3: Values of the accuracy rate for the presented methods.


                       Method 1    Method 2    Method 3
Time of estimation     ∼ 115 s     ∼ 8 min.    ∼ 19 h

Table 7.1: Times of estimation for the methods used in the experiment.

The results of the experiment are presented in Figure 7.3. The direct solution (Method 1) gave significantly better results than Method 2 and Method 3. The reason for the bad performance of Method 2 is the poor recognition rate on the second stage: the icons were recognized correctly in about 95% of cases. Figure 7.4 presents the icons which unexpectedly occurred as a result of recognition on the second stage more than 2 times during testing. The process of testing during training makes it possible to detect such unexpected outcomes on the second stage. If repeatedly bad recognition can be observed on the second stage while testing, there is a high probability that some area of the PR process on this stage does not work well. To correct this problem Method 4 is going to be presented.

Figure 7.4: Icons which wrongly occurred as a result of icon recognition on the second stage more than 2 times during testing.

Method 3, which uses the two-stage NN, gave the worst results. The structure of the network was very complex, which was caused by the large number of parameters on the second stage: it was necessary to train 586 networks there. If the number of networks is low, there is a possibility to train the network a couple of times for different α values and choose the best values of the weights. This was impossible for Method 3. In such a complex system a couple of badly estimated weight values on the second stage can have a big influence on the whole network performance.


It is also important to analyse the parameter estimation times for the considered methods. The main disadvantage of using NNs is the estimation time for complex problems. As shown in Table 7.1, the parameter estimation times for the systems were extremely large. For the single MFN it was almost two minutes, which was caused by the extensive and frequent process of testing while training and by the use of a poor simulation environment. For Method 2 the estimation took about 8 minutes. For these two methods a time reduction is possible, but the estimation time for Method 3 was totally unacceptable.

7.6 Method 4 - Two-stage smart switching

As it was mentioned before, the main reason for the bad performance of the two-stage recognition system presented as Method 2 was the insufficient recognition rate on the second stage. To eliminate it, the simple Method 4 is going to be presented. Figure 7.4 presents the gestures with a high probability of being badly classified, which was examined on the testing set. It can be assumed that this type of recognition mistake will also occur in practical usage of the network. This kind of incorrectness can be easily eliminated: the two-stage system presented in this section switches to the network used in Method 1 if one of the icons from Figure 7.4 occurs as the recognition result on the output of the second stage. Otherwise, the system works as the typical two-stage recognition system presented as Method 2. This simple modification, which combines the two previously used methods, gave significantly better results for writer recognition (Figure 7.5).

Figure 7.5: Values of the accuracy rate for Method 4 compared to the previously used methods.
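A sketch of the switching rule (Python; direct_recognize and two_stage_recognize stand for the Method 1 and Method 2 classifiers sketched earlier, and unreliable_icons is the set of icon IDs from Figure 7.4):

    def method4_recognize(sample, direct_recognize, two_stage_recognize,
                          unreliable_icons):
        # fall back to the direct classifier whenever the second stage
        # reports an icon that was repeatedly misrecognized during testing
        j2, writer = two_stage_recognize(sample)
        return direct_recognize(sample) if j2 in unreliable_icons else writer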


Chapter 8

Conclusions

Research Question 1: How can the MNN be defined?

The topic of this thesis is related to a concept not defined in the literature. As it was mentioned in the introduction, there are only a couple of items related to the notion of the MNN. The idea of the MNN presented in the thesis is strongly rooted in the multistage identification concept. Before presenting the definition of the MNN it was necessary to present a brief review of NNs. As there are plenty of network models, only the MFN was chosen to be considered in this work. Defining the multistage MFN was introduced by presenting the concept of the two-stage neuron model. It was important to consider the influence of a couple of neurons on the weight values, so some kind of binding function was needed. In practice, if two-stage neurons construct some NN, a character of binding between the stages other than totally direct makes no sense. There is no need to add a new complex binding function, because the influence of the input vector on the second stage is accumulated in the network. Some trials of defining the two-stage MFN with the two-stage neuron were made, but it was observed that it is much easier to define it as two directly bound MFNs, characterized by the weight values on the second stage. This seems natural, because even a single MFN trained with the backpropagation algorithm is just a kind of equation with parameters estimated with the steepest descent algorithm; finding a relationship with neurobiology seems artificial and commercial. Concluding, the two-stage MFN model is a special case of two-stage identification with an arbitrarily chosen model of the function and estimation method. The multistage MFN was defined at the end of Chapter 4.

Research Question 2: How can the parameters of the MNN be estimated?


As a relationship between multistage identification and the MNN can be observed, the process of estimating the multistage MFN is analogous to the one used for identification. The estimation algorithm used during each tact is simply the backpropagation algorithm. The formally described estimation algorithm was presented in Chapter 6.

Research Question 3: How can the MNN be designed for PR?

The MNN concept seemed to fit identification problems ideally. The MNN model can also be used instead of the classical MPR. The MNN should consist of as many stages as the considered recognition problem. The network on the first stage should be designed as for a typical PR problem. The process of constructing the MNN for PR is described in detail in Chapter 4.

It is also important to highlight the need to modify the parameter estimation if the MNN is used for a PR problem. The process of tacting for each constant value of the lower-stage input can be difficult to achieve. Instead, there is the possibility of tacting for each possible recognition result related to the stage where the constant value should be assumed. This modification is described in Chapter 6.

Research Question 4: How does the MNN perform for the chosen PR problem compared to other methods?

The problem of writer identification was taken as the chosen PR problem. Three methods were considered for the problem: direct recognition with a one-stage MFN, a two-stage recognition model with iconic gesture recognition on the second stage, and a two-stage NN which took the vector of features describing the icon on the second stage and the vector of features describing the writer on the first stage. The results were best for the direct solution. The two-stage PR model performed poorly due to the poor gesture recognition rate; this was corrected by the fourth method, which was a combination of the direct and the two-stage solution. The usage of the two-stage network for the problem was very ineffective and gave the worst results. This was caused by the too complex structure, which made the training process on the second stage very difficult to control. The MNN should be used for problems which require a low number of parameters on the first stage.

Research Question 5: How can the MNN method for pattern recognition be improved to increase the correctness of recognition?


A good solution to improve the MNN method is to use the two-stage PR model extended with the smart switching process. This method performed best in the experiment described in Chapter 7. This kind of improvement minimizes the incorrectness on the first stage of the recognition process and in parallel maximizes the correctness of recognition on the second stage.


Chapter 9

Future Works

As it was mentioned in the previous chapter, the usage of the MNN for PR problems is ineffective and does not improve the accuracy of recognition. It would be interesting to consider the MNN for multistage identification problems. The networks should be designed in an effective way to avoid too complex structures.

As for the experiment provided in the thesis, it would be interesting to analyse the two-stage model with various other methods used as recognition algorithms; the SVM can be a good solution in this case. It is also worth analysing more deeply the feature selection indicated by the result of gesture recognition.

It can also be worthwhile to study hybrid NN systems. It could be interesting to examine the possibility of steering the Hebbian learning of a Hopfield network by a three-layer neural network.


Bibliography

[1] AKHTAR JAMEEL, Experiments with various Recurrent Neural Network Architectures for Handwritten Character Recognition, IEEE Xplore, 1994

[2] ANIL K. JAIN, JIANCHANG MAO, K. M. MOHIUDDIN, Artificial Neural Networks - A Tutorial, IEEE Xplore, 1996

[3] BALTZAKIS H., PAPAMARKOS N., A new signature verification technique based on a two-stage neural network classifier, Elsevier, 2001

[4] BENSEFIA A., NOSARY A., PAQUET T., HEUTTE L., Writer Identification By Writers Invariants, IEEE Xplore, 2002

[5] BISHOP C. M., Neural Networks for Pattern Recognition, Oxford University Press, 2005

[6] CONNELL S. D., JAIN A. K., Writer Adaptation for Online Handwriting Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 3, March 2002

[7] CORNELIUS P., GRBIC N., Neural Networks - Lecture notes for course Neural Networks (ETD007), BTH, Karlskrona, 1999

[8] DUY BUI, Classifying Online Handwriting Characters under Cosine Representation, IEEE Xplore, 2007

[9] GIERACHA J., Recursive two-stage estimation algorithms, Wroclaw University of Technology - PhD thesis, Wroclaw, 2005 (in Polish)

[10] GRBIC N., Development of a General Purpose On-Line Update Multiple Layer Feedforward Backpropagation Neural Network, Master Thesis MEE 97-04, BTH, Karlskrona/Ronneby, 1997

[11] JAEGER S., MANKEL S., REICHERT J., WAIBEL A., Online handwriting recognition: the NPen++ recognizer, Springer, IJDAR (2001) 3: 169-180

[12] KAMRUZZAMAN J., AZIZ S. M., A Note on Activation Function in Multilayer Feedforward Network, IEEE Xplore, 2002

[13] KOSINSKI R. A., Artificial Neural networks: non-linear dynamics and chaos, WNT, Warszawa, 2007 (in Polish)

[14] KURZYNSKI M., Pattern recognition - statistic methods, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 1997 (in Polish)

[15] KWASNICKA H., Evolutionary designing of neural networks, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 2007 (in Polish)

[16] KWASNICKA H., MARKOWSKA-KACZMAR U., Neural networks in practise, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 2005 (in Polish)

[17] MARUKATAT S., SICARD R., ARTIERES T., GALLINARI P., A Flexible Recognition Engine for Complex On-line Handwritten Character Recognition, IEEE Xplore, 2003

[18] MOHAMED N. AHMED, ALY A. FARAG, Two-stage neural network for volume segmentation of medical images, Elsevier, 1996

[19] NIELS R., VUURPIJL L., Generating copybooks from consistent handwriting styles, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008

[20] NIELS R., WILLEMS D., VUURPIJL L., The NicIcon Database of Handwritten Icons for Crisis Management, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008, link: http://unipen.nici.ru.nl/NicIcon/

[21] ROJAS R., Neural Networks - A Systematic Introduction, Springer, Berlin, 1996

[22] RUMELHART D. E., MCCLELLAND J. L., Parallel Distributed Processing, Cambridge, MA.: The M.I.T. Press, 1986

[23] SANTANA O., TRAVIESO C. M., ALONSO J. B., FERRER M. A., Writer Identification Based on Graphology Techniques, IEEE Xplore, 2008

[24] SCHLAPBACH A., LIWICKI M., BUNKE H., A writer identification system for on-line whiteboard data, Elsevier, Pattern Recognition 41 (2008) 2381-2397

[25] SWIATEK J., Two-stage identification and its technical and biomedical applications, Wydawnictwo Politechniki Wroclawskiej, Wroclaw, 1987 (in Polish)

[26] TENG LONG, LIAN-WEN JIN, LI-XIN ZHEN, JIAN-CHENG HUANG, One Stroke Cursive Character Recognition Using Combination of Directional and Positional Features, IEEE Xplore, 2005

[27] UNIPEN FOUNDATION, link: http://www.unipen.org/

[28] VIELHAUER C., STEINMETZ R., MAYERHOFER, Biometric Hash based on Statistical Features of Online Signatures, IEEE Xplore, 2002

[29] WILLEMS D., NIELS R., Definitions for Features used in Online Pen Gesture Recognition, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008, link: http://unipen.nici.ru.nl/NicIcon/

[30] WILLEMS D., NIELS R., VAN GERVEN M., VUURPIJL L., Iconic and multi-stroke gesture recognition, Elsevier, Pattern Recognition (2009), doi: 10.1016/j.patcog.2009.01.030

[31] WITTEN I. H., FRANK E., Data Mining. Practical Machine Learning Tools and Techniques, Elsevier, San Francisco, 2005

[32] YAMADA TAKAYUKI, YABUTA TETSURO, Remarks on Neural Network Controller Using Different Sigmoid Functions, IEEE Xplore, 1994

[33] YANG FENG, YANG FAN, Character Recognition Using Parallel BP Neural Network, IEEE Xplore, 2008


Appendix A

Features description

Lp  Name                    Definition

1   Length                  Φ_1 = Σ_{n=1}^{N−1} ‖s_{n+1} − s_n‖
2   Area                    Φ_2 = A
3   Compactness             Φ_3 = Φ_1^2 / A
4   Eccentricity            Φ_4 = √(1 − b^2/a^2)
5   Ratio coord. axes       Φ_5 = b′/a′
6   Closure                 Φ_6 = Σ_{n=1}^{N−1} ‖s_{n+1} − s_n‖ / ‖s_N − s_1‖
7   Circular variance       Φ_7 = Σ_{n=1}^{N} (‖s_n − µ‖ − Φ_73)^2 / (N Φ_73^2)
8   Curvature               Φ_8 = Σ_{n=2}^{N−1} Ψ_{s_n}
9   Avg. curvature          Φ_9 = (1/(N−2)) Σ_{n=2}^{N−1} Ψ_{s_n}
10  Abs. curvature          Φ_62 = Σ_{n=2}^{N−1} |Ψ_{s_n}|
11  Squared curvature       Φ_63 = Σ_{n=2}^{N−1} Ψ_{s_n}^2
12  Avg. direction          Φ_12 = (1/(N−1)) Σ_{n=1}^{N−1} arctan((y_{n+1} − y_n)/(x_{n+1} − x_n))
13  Perpendicularity        Φ_13 = Σ_{n=2}^{N−1} sin^2 Ψ_{s_n}
14  Avg. perpendicularity   Φ_14 = (1/(N−2)) Σ_{n=2}^{N−1} sin^2 Ψ_{s_n}
15  Centroid offset         Φ_16 = ‖p(µ − c)‖
16  Length princ. axis      Φ_17 = α

Table A.1: Table of features g-48 (1) (taken from [30]).


17  Orient. princ. axis     Φ_18 = sin Ψ, Φ_19 = cos Ψ
18  Ratio of princ. axes    Φ_67 = β/α
19  Length b. box diag.     Φ_57 = √(a^2 + b^2)
20  Angle b. box diag.      Φ_58 = arctan(b/a)
21  Rectangularity          Φ_20 = A/(αβ)
22  Max. ang. difference    Φ_21 = max_{1+k ≤ n ≤ N−k} Ψ^k_{s_n}
23  Cup count               see [29]
24  Last cup offset         see [29]
25  First cup offset        see [29]
26  Initial hor. offset     Φ_35 = (x_1 − x_min)/a
27  Final hor. offset       Φ_36 = (x_N − x_min)/a
28  Initial ver. offset     Φ_37 = (y_1 − y_min)/b
29  Final ver. offset       Φ_38 = (y_N − y_min)/b
30  N straight lines        see [29]
31  Straight line ratio     see [29]
32  Largest str. line ratio see [29]
33  Sample ratio octants    see [29]
34  N connected comp.       see [29]
35  N crossings             see [29]
36  Initial angle           Φ_55 = (x_3 − x_1)/‖s_3 − s_1‖, Φ_56 = (y_3 − y_1)/‖s_3 − s_1‖
37  Dist. first-last        Φ_59 = ‖s_N − s_1‖
38  Angle first-last        Φ_60 = (x_N − x_1)/‖s_N − s_1‖, Φ_61 = (y_N − y_1)/‖s_N − s_1‖
39  Avg. centr. radius      Φ_73 = (1/N) Σ_{n=1}^{N} ‖s_n − µ‖
40  Duration                Φ_24 = t_N − t_1
41  Avg. velocity           Φ_25 = (1/(N−2)) Σ_{n=2}^{N−1} V_n
42  Max. velocity           Φ_26 = max_{2 ≤ n ≤ N−1} V_n
43  Avg. acceleration       Φ_28 = (1/(N−4)) Σ_{n=3}^{N−2} a_n
44  Max. acceleration       Φ_29 = max_{3 ≤ n ≤ N−2} a_n
45  Max. deceleration       Φ_30 = min_{3 ≤ n ≤ N−2} a_n
46  N pen down              see [29]
47  Avg. pressure           Φ_20 = (1/N) Σ_{n=1}^{N} f_n
48  Penup/down ratio        see [29]

Table A.2: Table of features g-48 (2) (taken from [30]).


Lp  Description                                   Notation

1   Unit vectors (x- and y-axes) spanning R^2     e_1 = (1, 0), e_2 = (0, 1)
2   Pen trajectory with N sample points           {σ_1, . . . , σ_N}
3   Sample                                        σ_i = {s_i, f_i, t_i}
4   Position                                      s_i = (x_i, y_i)
5   Area of the convex hull                       A
6   Angle between subsequent segments             Ψ_{s_n} = arctan((s_n − s_{n−1})(s_{n+1} − s_n) / (‖s_n − s_{n−1}‖ ‖s_{n+1} − s_n‖))
7   Length along the x-axis                       a = max_{1 ≤ i < j ≤ N} |x_i − x_j|
8   Length along the y-axis                       b = max_{1 ≤ i < j ≤ N} |y_i − y_j|
9   Center of the bounding box                    c = (x_min + (x_max − x_min)/2, y_min + (y_max − y_min)/2)
10  Longest edge-length of the bounding box       a′ = a if a > b, else a′ = b
11  Shortest edge-length of the bounding box      b′ = b if b < a, else b′ = a
12  Principal components                          p_i
13  Angle of first principal axis                 Ψ = arctan(p_1 e_2 / p_1 e_1)
14  Length of first principal axis                α = 2 max_{0 ≤ n < N} |p_1(c − s_n)|
15  Length of second principal axis               β = 2 max_{0 ≤ n < N} |p_2(c − s_n)|
16  Centroid                                      µ = (1/N) Σ_{n=1}^{N} s_n
17  Velocity                                      V_i = (s_{i+1} − s_{i−1}) / (t_{i+1} − t_{i−1})
18  Acceleration                                  a_i = (V_{i+1} − V_{i−1}) / (t_{i+1} − t_{i−1})

Table A.3: The notation and definitions used in Appendix A (taken from [30]).