
Characters Extraction for Traffic Sign Destination Boards in Video and Still Images
Qiu Peng
2010.9.30
Master Thesis, Computer Engineering, Nr: E3986D



  • DEGREE PROJECT
    Computer Engineering

    Programme: Masters Programme in Computer Engineering - Applied Artificial Intelligence
    Reg number: E3986D
    Extent: 15 ECTS
    Name of student: Qiu Peng
    Year-Month-Day: 2010.9.30
    Supervisor: Hasan Fleyeh
    Company/Department: Computer Science
    Supervisor at the Company/Department: Hasan Fleyeh

    Title: Recognition of characters on the destination board
    Keywords: RGB, HSV, extract character, SVM

    Abstract

    Traffic control signs and destination boards on roadways offer significant information for
    drivers. Regulatory signs govern things like speed and turns; warning signs alert drivers to
    conditions ahead to help them avoid accidents; destination signs show distances and directions
    to various locations; service signs display the locations of hospitals, petrol stations, rest
    areas, and so on. Because these signs are so important, and because there is always a certain
    distance between them and the driver, drivers should get this information clearly and easily
    even in bad weather or other difficult situations. The idea is to develop software that
    collects images from a camera mounted at the front of a moving car, extracts the important
    information, and finally shows it to the driver. For example, when a frame contains a
    destination board with text such as "Linkoping 50", the software should extract every
    character of "Linkoping 50" and compare each one with the known character data in the
    database; if an extracted character matches, say, "k" in the database, it contributes to
    recognizing the destination name, which is then shown to the driver. In this project C++ is
    used to write the code for this software.

  • ACKNOWLEDGMENT

    First, I would like to thank my advisor, Mr. Hasan Fleyeh. The decision he made last fall to
    take me as one of his graduate assistants gave me the opportunity to do the research I am
    interested in. I am grateful for his support and advice ever since. He creates a wonderful
    and dynamic environment for me to learn in and gives me the freedom to explore interesting
    problems in the fields of Computer Vision and Digital Image Processing.

  • TABLE OF CONTENTS

    1. Chapter 1 Introduction .......... 3
      1.1 The background .......... 4
      1.2 Application of road sign recognition system .......... 4
      1.3 Aim .......... 5
      1.4 Contents arranged .......... 5
    2. Chapter 2 Image processing theory .......... 7
      2.1 Image acquisition .......... 8
      2.2 The HSV color model .......... 8
        2.2.1 Theory details .......... 8
        2.2.2 HSV color model definition .......... 9
      2.3 Image segmentation .......... 10
      2.4 Shadow and highlight invariant color segmentation .......... 10
        2.4.1 Theory details .......... 11
      2.5 The noise problem .......... 11
        2.5.1 Problem with noise filters .......... 11
    3. Chapter 3 Support vector machine .......... 15
      3.1 Introduction .......... 16
      3.2 Machine learning .......... 16
      3.3 Statistical learning theory .......... 16
      3.4 Support vector machine .......... 17
      3.5 Two situations .......... 18
        3.5.1 Linear separable problem .......... 18
        3.5.2 Non-linear separable problem .......... 20
      3.6 Kernel function .......... 21
      3.7 Use kernel function to solve non-linear problem .......... 21
    4. Chapter 4 The implementation .......... 22
      4.1 Real time traffic signs recognition flowchart .......... 24
      4.2 Application components .......... 25
        4.2.1 Implementation of background extraction module .......... 25
        4.2.2 Apply shadow and highlight invariant segmentation algorithm .......... 29
        4.2.3 Algorithm implementation .......... 30
        4.2.4 Extraction area implementation .......... 31
        4.2.5 Second time image processing module .......... 34
        4.2.6 Character extraction module implementation .......... 37
        4.2.7 Training and testing with SVM .......... 42
    5. Chapter 5 Analysis and result .......... 47
      5.1 Analysis of the application .......... 48
        5.1.1 Analysis of the character extraction part .......... 48
        5.1.2 Analysis of HSV color model image .......... 48
        5.1.3 Analysis of the character extraction algorithm .......... 48
        5.1.4 Analysis of the noise filter algorithm .......... 50
        5.1.5 Result .......... 51
        5.1.6 Character recognition .......... 66
        5.1.7 SVM applied here .......... 66
        5.1.8 Test with linear function .......... 68
        5.1.9 Test with polynomial function .......... 70
        5.1.10 Test with RBF function .......... 72
        5.1.11 Test with sigmoid .......... 74
    6. Chapter 6 Conclusion and future works .......... 76
      6.1 Conclusion .......... 77
      6.2 Future works .......... 77

  • Qiu Peng, Masters, September 2010, E3986D
    Dalarna University, Röda vägen 3, S-781 88 Borlänge, Sweden
    Tel: +46(0)23 7780000, Fax: +46(0)23 778080, http://www.du.se


    Chapter 1 Introduction


    1.1 The background

    We drive in an environment full of traffic signs and city names. These signs play an
    important role in regulating traffic and warn drivers to refrain from certain actions for
    their own safety and the safety of their passengers.

    Road signs use colors, shapes, and markings to communicate messages to drivers on the road.
    Without such information the motion of traffic would be disorderly and unpredictable. It is
    crucial for drivers to identify road signs at the right time and in the right place, but at
    times when everything is expected to be perfect, from others of course, we tend to forget the
    inherent imperfection of mankind. Noticing these safety precaution signs on the road greatly
    depends on the physical and mental health of the drivers. Their visual perception ability can
    be affected by stress, tension, and physical illness, and sometimes the problem is simply a
    lack of knowledge about road signs. According to a recent poll conducted by the motoring
    website New Car Net, one in three motorists fails to recognize even the most basic road signs.
    It is because of these reasons that autonomous, robust, real-time road sign recognition
    systems have gained interest over the last two decades. The very first paper appeared in 1984
    and aimed at testing various computer vision methods for the detection of objects in outdoor
    scenes. Since then many research groups and companies have been interested and have conducted
    research in the field. Computer vision has been applied to a wide variety of intelligent
    transport systems (ITS) [1] such as traffic monitoring systems, traffic-related parameter
    estimation, and intelligent vehicles, and an important part of intelligent vehicles is the
    detection and recognition of road signs. A robust, real-time, automatic road sign detection
    and recognition system can really support and disburden drivers by giving information at the
    right time; it can increase driving efficiency, save lives, and provide driving comfort.

    1.2 Application of Road Sign Recognition System

    Road Sign Recognition is a field of applied computer vision research concerned with the
    automatic detection and classification of road signs in traffic scene images acquired from a
    moving car. The result of this research effort will be a subsystem of a Driver Support System
    (DSS). The aim is to provide the DSS with the ability to understand its neighboring
    environment and so permit advanced driver support such as collision prediction and avoidance.

    Employing computer vision technology in smart vehicle design calls for consideration of all
    its advantages and disadvantages. Firstly, a vision subsystem incorporated into the DSS may
    exploit all the information processed by human drivers without any requirement for new
    traffic infrastructure devices (a very hard and expensive task). Smart cars equipped with
    vision-based systems will be able to adapt themselves to operate in different countries (with
    often quite dissimilar traffic devices).

    As the integration of various technologies in the field of traffic engineering (ITS) has been
    introduced, the convenience of using computer vision has become more obvious. We may observe
    this trend, e.g., in the proceedings of the annual IEEE International Conference on
    Intelligent Vehicles (IVS): more than 50% of the papers are focused on image processing and
    computer vision methods.


    Obviously, there also exist disadvantages of the vision-based approach. Smart vehicles will
    operate in real traffic conditions on the road, so the algorithms must be robust enough to
    give good results even under adverse illumination and weather conditions. Although this
    system property may seem easy to solve, it is the real challenge for algorithm developers.
    For example, Fridtjof Stein, main project manager of the Cleopatra project (Clusters of
    embedded parallel time-critical applications), said that "reliable optical detection is the
    biggest hurdle the project must overcome".

    Absolute system reliability cannot be assured, and the system will not be "fail-safe",
    because of the definition of an individual transportation system. The aim is to provide a
    level of safety similar to or higher than that of human drivers. For example, the system
    could assist drivers with signs they did not recognize before passing them. Specifically,
    speed limit sign recognition could present the driver with the current speed limit as well
    as give an alert if the car is driven faster than that limit.

    In the future, autonomous vehicles will have to be controlled by automatic road sign
    recognition. As with any vehicle, an autonomous vehicle driving on public roads must obey
    the rules of the road. Many of these rules are conveyed through road signs, so an autonomous
    vehicle must be able to detect and recognize signs and change its behavior accordingly.

    1.3 Aim

    The aim of this research project is to present an Intelligent Road Sign Recognition System
    based on a state-of-the-art technique, the Support Vector Machine, together with image
    processing skills. The project is an extension of an already known system that can recognize
    traffic signs. The application extracts every character on the destination board and then
    outputs the city name.

    1.4 Contents arranged

    Chapter 1 Image processing
      Image acquisition (this part introduces how and what types of images were captured)
      HSV color model (this part introduces the move from the RGB color model to the HSV color
        model, and the advantage HSV has over RGB in the segmentation field)
      Shadow and Highlight Invariant Color Segmentation Algorithm (this part shows how the color
        is extracted and can be distinguished from other colors)
      Character extraction (shows how the characters are extracted)
      Character Normalization (normalizes the extracted characters to 30*30 pixels)

    Chapter 2 SVM theory
      Introduction (introduces the SVM theory)
      Machine learning (shows the origin of SVM)
      Statistical Learning Theory (another part of the origin of SVM)
      Support Vector Machine (what SVM is and how it works)
      Two situations (the linear and non-linear problems)
      Kernel Function (introduces the kernel formula)

    Chapter 3 Implementation (shows the steps of how the theory works with real-life problems)

    Chapter 4 Analysis (analyzes the application based on these theories, and shows how the
      application works and the results we get from it)


    Chapter 2 Theoretical background


    2.1 Image acquisition

    Image acquisition is the first step of Traffic Sign Recognition. An input image can either
    be taken from the live stream of a camera mounted on the vehicle or taken from a video for
    experimental purposes. The video format accepted by the OpenCV platform should be AVI, and
    each frame of the video is an RGB image. The dimension of the captured image is set to
    400 x 600 pixels by my application. The figure below shows an example.

    Figure 2.1 Sample image from video stream

    2.2 The HSV color model

    1. The image acquired by the camera is in RGB format, which is greatly sensitive to chromatic
       variation of the daylight. The coordinates of the three colors are highly correlated; as a
       result, any variation in the ambient light intensity affects the RGB system by shifting
       the cluster of colors towards the white or the black corners, making it hard to recognize
       the object.

    2. HSV is an ideal color model for the recognition problem since it decouples the chromatic
       and achromatic notions of light. This method is also preferable because the Hue feature is
       invariant to shadows and highlights.

    3. HSV represents colors in a way similar to how the human eye senses color.

    2.2.1 Theory details

    Every color in this space is represented by three components:

    1. Hue (H): the apparent light color (determined by the dominant wavelength).
    2. Saturation (S): the purity of the light.
    3. Value (V): the total light across all frequencies.

    The HSV model is illustrated as a conical object. The cone is usually represented in the three

    The HSV model is illustrated as a conical object. The cone is usually represented in the three


    dimensional form. The hue is represented by the circular part of the cone. The saturation is

    calculated using the radius of the cone and value is the height of the cone. Advantage of the

    conical model is that it is able to represent the HSV color space in a single object.

    2.2.2 HSV color model definition

    Figure 2.2 HSV color model

    The hue

    Red, Green, and Blue (RGB) are the three primary colors used by computer monitors. 180
    degrees away from a primary, none of that primary is mixed in; these colors are the
    complementary hues, i.e., Cyan, Magenta, and Yellow. The next-level colors, between the
    secondary and primary colors, are called the tertiary hues. This process continues, creating
    a solid ring of colors around the primaries. This definition describes just one dimension of
    color, hue. Hue is more specifically described by the dominant wavelength. Hue describes a
    dimension of color readily experienced by the eye; hence it is the dimension of color
    interpreted by the human brain.

    The value

    Value is the brightness of the color; it ranges from 0 to 100% and varies with color
    saturation. When the value is 0, the color is completely black. In terms of a spectral
    definition of color, value describes the overall intensity or strength of the light. If hue
    can be thought of as a dimension going around a wheel, then value is a linear axis running
    through the middle of the wheel, as shown in the figure above.

    The saturation

    Saturation refers to the dominance of hue in the color. On the outer edge of the hue wheel
    are the 'pure' hues. Toward the center of the wheel, the hue describing the color dominates
    less and less; exactly in the center, no hue dominates. Colors directly on the central axis
    are considered de-saturated. These de-saturated colors constitute the gray scale, ranging
    from 0 to 100% and running from white to black with all of the intermediate grays in between,


    perpendicular to the Value axis. In terms of a spectral definition of color, saturation is
    the ratio of the dominant wavelength to the other wavelengths of the color. White light is
    white because it contains an even balance of all wavelengths.

    Below are two example images, one in the RGB color field and one in the HSV color field.

    Figure 2.3 RGB and HSV image
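    The conversion underlying these figures can be sketched as follows. This is the standard
    RGB-to-HSV conversion formula rather than the project's exact code; hue and saturation are
    rescaled to [0, 255], matching the normalization used later in Section 2.4.1.

    ```cpp
    #include <algorithm>
    #include <cmath>

    struct Hsv { double h, s, v; };  // each channel normalized to [0, 255]

    // Convert one RGB pixel (components in [0, 255]) to HSV.
    Hsv rgbToHsv(double r, double g, double b) {
        double maxc = std::max({r, g, b});
        double minc = std::min({r, g, b});
        double delta = maxc - minc;

        double v = maxc;                                // value = brightest channel
        double s = (maxc > 0.0) ? delta / maxc : 0.0;   // saturation in [0, 1]

        double h = 0.0;                                 // hue in degrees [0, 360)
        if (delta > 0.0) {
            if (maxc == r)      h = 60.0 * std::fmod((g - b) / delta, 6.0);
            else if (maxc == g) h = 60.0 * ((b - r) / delta + 2.0);
            else                h = 60.0 * ((r - g) / delta + 4.0);
            if (h < 0.0) h += 360.0;
        }
        return { h * 255.0 / 360.0, s * 255.0, v };     // rescale H and S to [0, 255]
    }
    ```

    For example, pure blue (0, 0, 255) has hue 240 degrees, which maps to 170 on the [0, 255]
    scale, while any gray pixel has zero saturation, i.e. it is achromatic.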

    2.3 Image segmentation

    1. Image segmentation is a process by which specific objects in the image are distinguished
       from the background. Based on the color information, candidate traffic signs need to be
       separated from the rest of the image.

    2. By segmenting the image into a binary image, only two types of pixels are left to be
       processed: white and black. In this way the complexity of the image processing for
       Traffic Sign Recognition is reduced.

    3. The processing time is improved too, because only two intensity levels are used for
       processing the image.
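    As a minimal illustration of the idea, the sketch below maps a grayscale image onto the two
    intensity levels. The threshold value is an assumption for illustration only; the actual
    segmentation in this project is color-based, as described in Section 2.4.

    ```cpp
    #include <vector>

    // Segment a grayscale image (values 0-255) into a binary image:
    // pixels at or above the threshold become white (255), the rest black (0).
    std::vector<unsigned char> binarize(const std::vector<unsigned char>& gray,
                                        unsigned char threshold) {
        std::vector<unsigned char> out(gray.size());
        for (std::size_t i = 0; i < gray.size(); ++i)
            out[i] = (gray[i] >= threshold) ? 255 : 0;
        return out;
    }
    ```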

    2.4 Shadow and Highlight Invariant Color Segmentation Algorithm

    Most of the time, the weather conditions cause big problems when extracting traffic signs;
    for example, strong sunshine can make some of the colors of traffic signs go missing.
    The figures below show this.

    Figure 2.4 original image


    Figure 2.5 color segmentation with better algorithm

    2.4.1 Theory details

    The color segmentation algorithm is carried out on RGB images taken by a digital camera
    mounted on a moving car. The images are converted to HSV color space, and the hue,
    saturation, and value are normalized into [0, 255]. The HSV color space is chosen because
    the Hue feature is invariant to shadows and highlights.

    While the normalized Hue is used as a priori knowledge for the algorithm, the normalized
    Saturation and Value are used to identify and avoid the achromatic subspaces of the HSV
    color space. When the hue of a pixel in the input image is within the color range specified
    in the figure below, and the pixel is not in the achromatic area, the corresponding pixel in
    the output image is set to white. The output image is then divided into a number of
    16x16-pixel sub-images which are used to calculate the seeds for the region growing
    algorithm: a seed is initiated if the number of white pixels in a sub-image is above a
    certain threshold level. The region growing algorithm is then applied to find all the
    objects in the output image which are big enough to initiate at least one seed. Noise and
    other small objects are rejected by the region growing algorithm. This has the advantage
    that no further filtering is needed to delete these objects, and the remaining objects are
    only the ones which can be used for recognition.
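    The per-pixel test described above can be sketched as follows. The achromatic cut-offs sMin
    and vMin, and any hue range passed in, are illustrative assumptions; the thesis's tuned
    thresholds are not reproduced here.

    ```cpp
    #include <vector>

    // One HSV pixel, all channels normalized to [0, 255] as in Section 2.4.1.
    struct HsvPixel { unsigned char h, s, v; };

    // Shadow/highlight-invariant segmentation sketch: a pixel becomes white (true)
    // when its hue lies in the target range AND it is not achromatic, i.e. its
    // saturation and value are both above a cut-off.
    std::vector<bool> segmentByHue(const std::vector<HsvPixel>& img,
                                   unsigned char hMin, unsigned char hMax,
                                   unsigned char sMin = 40, unsigned char vMin = 40) {
        std::vector<bool> mask(img.size());
        for (std::size_t i = 0; i < img.size(); ++i) {
            const HsvPixel& p = img[i];
            bool chromatic = p.s >= sMin && p.v >= vMin;   // reject gray/black/white
            mask[i] = chromatic && p.h >= hMin && p.h <= hMax;
        }
        return mask;
    }
    ```

    The resulting mask would then be divided into 16x16 blocks to initiate seeds for the region
    growing step described above.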

    2.5 The extraction of every character (traffic sign)

    2.5.1 Character extraction algorithm

    Studying traffic sign boards gives us some very important observations. Every traffic board
    in the world has a background, and the city name and other information are placed on that
    background. The background is painted because its color is chosen very carefully to be
    totally different from the surrounding environment, so that it is easily seen and draws
    people's attention to the information. If there were no such background, with only
    characters floating in the air, people would very easily ignore them.

    The algorithm is built on this observation, combined with the background color used in
    Sweden: light blue.

    First, the application gets an image from the video stream. Then the HSV algorithm is
    applied to process the image: the light blue color is extracted, the light blue area is
    turned white, and the rest is turned black.


    Next, the image is rescanned pixel by pixel from four directions: from top to bottom, from
    bottom to top, from left to right, and from right to left. On each scan, as soon as the
    pointer meets a white pixel the scan breaks, and the position where the white pixel was met
    is saved. After the rescans we have four position values: the topmost, bottommost, leftmost,
    and rightmost white pixels. These points determine the rectangle that will be the area used
    to extract the characters: the topmost and leftmost values give the upper-left bound, and
    the rightmost and bottommost values give the lower-right bound. This fixes the area from
    which to extract.
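    The four-direction scan amounts to finding the tightest box around all white pixels. A
    single-pass sketch that produces the same bounds, assuming the binary image is stored as
    rows of booleans, could look like this:

    ```cpp
    #include <vector>

    struct Box { int top, bottom, left, right; };  // inclusive bounds; -1 if no white pixel

    // Return the tightest rectangle enclosing every white (true) pixel;
    // this rectangle is the candidate board area used for character extraction.
    Box boundingBox(const std::vector<std::vector<bool>>& img) {
        Box b{-1, -1, -1, -1};
        int rows = static_cast<int>(img.size());
        int cols = rows ? static_cast<int>(img[0].size()) : 0;
        for (int y = 0; y < rows; ++y)
            for (int x = 0; x < cols; ++x)
                if (img[y][x]) {
                    if (b.top == -1) b.top = y;               // first white row from the top
                    b.bottom = y;                             // last white row seen so far
                    if (b.left == -1 || x < b.left) b.left = x;
                    if (x > b.right) b.right = x;
                }
        return b;
    }
    ```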

    Commonly, two colors of characters are written on the board: white and black.

    First, for the white characters, the HSV algorithm is applied to turn the white color of the
    image absolutely white and the rest black. The image is then rescanned pixel by pixel inside
    the area, from the top-left bound to the bottom-right bound. When a white pixel is found on
    a line, a matrix (bounding rectangle) is started and the scan continues; when a line is
    reached on which no white pixel can be found, the matrix is closed and stored in an array.

    Second, for the black characters, the HSV algorithm is applied to turn the black color of
    the image absolutely white and the rest black, and the same scan is performed inside the
    area; each resulting matrix is stored in the same array as the white characters.

    Next, every matrix is popped out and its position values are calculated. Using these values,
    each row is scanned from left to right, top to bottom. If any color change can be found, a
    new matrix is started to hold it and is pushed onto the bottom of the array; if none can be
    found, the matrix is a single character and is put into another array that only keeps
    characters after separation.

    When every character has been separated, we obtain a final array that keeps every character.
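    The start-a-matrix / stop-a-matrix scan described above can be sketched as a column split on
    a binary strip: a character begins at the first column containing a white pixel and ends
    just before the next entirely black column. This is a simplified one-axis version of the
    thesis's two-pass row/column scan, given here only to make the idea concrete.

    ```cpp
    #include <utility>
    #include <vector>

    // Split a binary text strip into per-character column spans (inclusive).
    std::vector<std::pair<int, int>> splitColumns(const std::vector<std::vector<bool>>& strip) {
        int rows = static_cast<int>(strip.size());
        int cols = rows ? static_cast<int>(strip[0].size()) : 0;
        std::vector<std::pair<int, int>> spans;
        int start = -1;                                  // -1 means no open character
        for (int x = 0; x < cols; ++x) {
            bool hasWhite = false;
            for (int y = 0; y < rows; ++y)
                if (strip[y][x]) { hasWhite = true; break; }
            if (hasWhite && start == -1) start = x;      // a character begins
            if (!hasWhite && start != -1) {              // blank gap: close the character
                spans.push_back({start, x - 1});
                start = -1;
            }
        }
        if (start != -1) spans.push_back({start, cols - 1});
        return spans;
    }
    ```

    Each resulting span would then be cropped and normalized to 30*30 pixels before being passed
    to the classifier.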


    The figures below show this:

    Figure 2.6 original image


    Figure 2.7 image process result


    Chapter 3 Support vector machine


    3.1 Introduction

    Support vector machines are widely used for pattern classification because of their good
    generalization ability compared with conventional classifiers. In support vector machines
    the input space is mapped to a higher-dimensional space called the Feature Space. The aim is
    to find an optimal hyperplane in this higher-dimensional feature space that can separate the
    data in the best way possible. Since the training of a support vector machine is formulated
    as a quadratic optimization problem with the number of variables equal to the number of
    training data, a globally optimal solution can be achieved. Among the training data, the
    instances necessary for the construction of the decision function are the ones closest to
    the class boundary; these are called the Support Vectors.

    3.2 Machine Learning

    Being a broad field of Artificial Intelligence, Machine Learning is concerned with the
    development of algorithms and techniques that allow computers to learn. It has a wide
    spectrum of applications including object recognition, medical diagnosis, speech and
    handwriting recognition, robot locomotion, computer vision, and many more. To be more
    specific, the goal of machine learning is to emulate the learning and adaptation abilities
    of living species in computers; more deeply, to program computers to use past experience to
    solve a given problem. Machine learning underwent a great deal of advancement in the late
    eighties and nineties with the active research done in the fields of Artificial Intelligence
    and Neural Networks. These advancements in machine learning will lead researchers to an
    understanding of the learning behavior of humans and animals. Systems like the I-Swarm
    robots, which imitate the behavior of ant colonies to perform tasks that are much too
    difficult and unsafe for humans to perform, and the success of the DARPA Grand Challenge
    have shown the achievements and upcoming challenges in this field. Learning can be
    categorized into various types, some as follows:

    Supervised learning
    • Learning from examples.
    • Learning by taking advice.

    Unsupervised learning
    • Competitive learning.
    • Clustering.
    • Reinforcement learning.

    In the context of object recognition, machine learning aims at finding a pattern of
    similarity or structure in a data set that will lead to generalization of the learning
    system and consequently identification of unknown data.

    3.3 Statistical Learning Theory

    Support vector algorithms are considered the first practical spin-off of statistical
    learning theory. Therefore, it is important to have a little insight into statistical
    learning theory before going into the details of the Support Vector Machine. Statistical
    learning theory addresses the fundamental issue of how to control the generalization ability
    of a neural network in mathematical terms. Since SVM is a set of supervised learning
    algorithms, statistical learning theory is reviewed only in that context.


    There are three basic components, interrelated with each other, in a supervised learning
    model: the environment, the supervisor (teacher), and the learning machine. The feasibility
    of the system depends on how much information the training set, generated by the joint
    probability distribution function of the environment and supervisor R(x, d), contains for
    the learning system to achieve good generalization. The supervised learning problem can be
    viewed as an approximation problem.

    3.4 Support Vector Machine

    The Support Vector Machine is a linear classifier built on the roots of statistical learning
    theory and the very powerful kernel function, and it is increasingly used for solving
    classification and regression problems. It is a linear machine closely related to classical
    Neural Networks; in fact, a support vector machine with a sigmoid kernel function acts as a
    two-layer feed-forward neural network. SVM is based on the concept of decision planes that
    define the decision boundaries. Perhaps the easiest way to explain the main idea of a
    support vector machine is the scenario of separating patterns that arises in the context of
    pattern classification. In that case the role of the support vector machine is to draw a
    decision surface, called a Hyperplane, such that the distance between the closest samples
    and the hyperplane is maximized. This distance between the closest samples and the
    hyperplane is known as the Margin, and the closest samples with respect to which we
    calculate the margin are called the Support Vectors.

    Finding a hyperplane with maximum margin is very important. It helps prevent the data
    over-fitting problem and enables the system to classify unknown samples from the testing set
    which come close to the hyperplane. A hyperplane with maximum margin is called the Optimal
    Hyperplane.

    Any classification task consists of data instances divided into two sets:

    • Training set: used to train the system.
    • Testing set: used to test the learning of the system.

    Each instance in the training set has one "target value", called the Class Label, along with
    several "attributes", called Features. The task of selecting the most suitable features for
    learning and testing is called Feature Selection. It is these features that help the
    learning system define the hyperplane.
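    To make the notions of hyperplane and margin concrete, the following C++ sketch evaluates
    the decision function sign(w . x + b) and a sample's distance to the hyperplane. The weight
    vector w and bias b would come from training, so any values passed here are illustrative
    placeholders, not a trained model.

    ```cpp
    #include <cmath>
    #include <vector>

    // Inner product of two equally sized vectors.
    double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Sign of the decision surface w.x + b: class +1 on one side, -1 on the other.
    int classify(const std::vector<double>& w, double b, const std::vector<double>& x) {
        return (dot(w, x) + b >= 0.0) ? +1 : -1;
    }

    // Geometric distance from sample x to the hyperplane; the smallest such
    // distance over the training set is the margin the SVM maximizes.
    double margin(const std::vector<double>& w, double b, const std::vector<double>& x) {
        return std::abs(dot(w, x) + b) / std::sqrt(dot(w, w));
    }
    ```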


    3.5 Two situations

    3.5.1 Linear separable problem

    In this problem, all the data can be linearly separated, as the figure below shows.

    Figure 3.1 Linear separation model

    The SVM can easily find straight lines that separate them. According to the image, consider
    a finite training set drawn from the input space:

        {(x_i, d_i)}, i = 1, ..., N                                                 (3.1)

    generated through a probability distribution function, where x_i represents a data instance
    from the input space X and d_i represents the corresponding output label from { -1, +1 }.

    Optimal Margin Hyperplane:

    Figure 3.2 optimal margin hyperplane

    In neural terms a hyperplane separating a linearly separable data is represented by following

    equation:


    w^T x + b = 0    (3.2)

where w is the weight vector orthogonal to the hyperplane (decision surface), controlling the

angular orientation of the hyperplane, and b is the bias, controlling the displacement of the

hyperplane relative to the origin.
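A minimal Python sketch (the names are mine, not from the thesis) of how equation (3.2) acts as a decision surface: the sign of w.x + b tells which side of the hyperplane a sample falls on.

```python
def classify(w, b, x):
    """Evaluate the hyperplane equation w.x + b and return the side
    of the decision surface the sample x falls on (+1 or -1)."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if activation >= 0 else -1

# Hypothetical hyperplane: w orthogonal to the surface, b shifting it.
w, b = [1.0, -1.0], 0.0
print(classify(w, b, [2.0, 1.0]))  # falls on the +1 side
print(classify(w, b, [1.0, 2.0]))  # falls on the -1 side
```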

The figure below illustrates this:

Figure 3.3 calculating the separation line

The formal equation:

Figure 3.4 equation of the separation line

To emphasize the effect of choosing the decision surface with maximum margin, consider two

hyperplanes whose orientations allow one to have a greater margin than the other.


Training sets

Figure 3.5 separation line with training sets

Testing sets

Figure 3.6 separation line with testing sets

From this we can easily see that the data points in green are the points that come inside the

margin but are still distinguished by the hyperplane, whereas the data points in blue are the

ones that are not recognized by the hyperplane.[6]

We can conclude from the example figures that some data instances came very close to the

hyperplane. The hyperplane on the left, the one with the greater margin, was able to classify

them because of its flexibility, but the hyperplane with the small margin, the one on the right,

was not able to classify some of the data instances, as they lie on the hyperplane. Such a

flexible hyperplane is called the Optimal Hyperplane, giving optimal results on both the

training and the testing set.

3.5.2 Non-linearly separable problem
Most real-world problems are non-linear.[11][12] These kinds of problems require a

non-linear dividing line to separate the instances into two classes, such as the one shown in

the figure below:


Figure 3.7 non-linear separation model

This is the point where a more advanced technique is required, and this is where the concept

of the kernel comes in handy. Rather than fitting a non-linear curve to the data set, the

support vector machine uses a kernel function to map the data into a different space where a

linear hyperplane can be used as the dividing line.

This higher-dimensional mapping space is called the Feature Space. The concept of kernel

mapping is very important and powerful: it allows SVM models to perform separations even

on data sets with very complex boundaries by using N-dimensional hyperplanes.

3.6 Kernel function
A kernel defines the function that maps the classes from a space in which they are not

linearly separable to another space in which they are. Based on the kernel function, we can

train on the samples to obtain a template, feed in input data, and finally obtain the result.

3.7 Using a kernel function to solve a non-linear problem
A Support Vector Machine (SVM) performs classification by constructing an N-dimensional

hyperplane that optimally separates the data into two categories. SVM models are closely

related to neural networks; in fact, an SVM model using a sigmoid kernel function is

equivalent to a two-layer perceptron neural network. SVM models

are a close cousin to classical multilayer perceptron neural networks. Using a kernel function,

SVMs provide an alternative training method for polynomial, radial basis function, and multi-

layer perceptron classifiers in which the weights of the network are found by solving a

quadratic programming problem with linear constraints, rather than by solving a non-convex,

unconstrained minimization problem as in standard neural network training. In the parlance of

the SVM literature, a predictor variable is called an attribute, and a transformed attribute that

is used to define the hyperplane is called a feature. The task of choosing the most suitable

representation is known as feature selection. A set of features that describes one case (i.e., a

row of predictor values) is called a vector. So the goal of SVM modeling is to find the optimal

hyperplane that separates clusters of vectors in such a way that cases with one category of the


target variable are on one side of the plane and cases with the other category are on the other

side of the plane. The vectors near the hyperplane are the support vectors. The figure below

presents an overview of the SVM process.

    Figure 3.8 A Two-Dimensional example

    If all analyses consisted of two-category target variables with two predictor variables, and the

    cluster of points could be divided by a straight line, life would be easy. Unfortunately, this is

    not generally the case, so SVM must deal with (a) more than two predictor variables, (b)

    separating the points with non-linear curves, (c) handling the cases where clusters cannot be

    completely separated, and (d) handling classifications with more than two categories.

The kernel mapping functions mentioned in the last chapter can be used, and in principle

there are probably an infinite number of candidates. But a few kernel functions have been

found to work well for a wide variety of applications. The default and recommended kernel

function is the Radial Basis Function (RBF).
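As a sketch of the RBF kernel's behavior (the gamma value below is an arbitrary choice, not one used in the thesis):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial Basis Function kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    It implicitly maps samples into a space where a linear hyperplane
    can separate them, without computing the mapping explicitly."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))         # identical points -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]) < 0.01)  # distant points -> near 0
```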


    Chapter 4 The Implementation


4.1 Real-time traffic sign recognition flowchart
The system is based upon four main steps (including sub-steps), plus one more step of

"Tracking" for a faster search by predicting the next search region. The flow chart in the

figure depicts the final design of the real-time traffic sign recognition system:

Figure 4.1 flow chart of the project


4.2 Application components
The image shown below is the structure of the character recognition system.

Figure 4.2 the system processing flowchart

4.2.1 Implementation of the background extraction module

In this part, the job is to extract the traffic sign background using HSV color segmentation.

When the characters are extracted directly, the environment introduces a lot of noise; this

noise is very difficult to remove, or removing it takes a lot of CPU processing resources.

Extracting the traffic sign background color first produces less noise and makes it easy to

define a region in which characters are extracted in gray mode only.

The reason for extracting the background color is that the destination board background plays

a very important role in the character extraction part. So for every image, first extract the

background as shown below, then do the image segmentation.

Because the boards are designed to attract people's attention, it is very easy to extract the

light blue board background from the image. The method and implementation are shown in

section 4.2.4.

To segment every board image, three methods are given here, and they work with almost

every destination board. Use an array to save the matrices, then pop each matrix, check it,

and separate it. If the scan count for a matrix is 2, record the result; if not, continue popping,

checking, and separating until the count is 2.


    Figure 4.3 Original image

    Figure 4.4 Extract the traffic sign background color use color segmentation

The color segmentation algorithm is shown below.

Calculation formula:


HSV is defined mathematically by transformations between the r, g, and b coordinates. Let r,

g, b ∈ [0, 1] be the red, green, and blue coordinates in RGB color space, let max be the

greatest of r, g, and b, and let min be the least of r, g, and b. To find the hue angle h ∈ [0, 360] for HSV, compute the following equation:

    h = 0,                                   if max = min
    h = (60 × (g − b)/(max − min)) mod 360,  if max = r
    h = 60 × (b − r)/(max − min) + 120,      if max = g
    h = 60 × (r − g)/(max − min) + 240,      if max = b    (4.2)

Figure 4.5 R image

Figure 4.6 G image


Figure 4.7 B image

Figure 4.8 RGB and H image

Figure 4.9 RGB and S image

Figure 4.10 RGB and V image

After converting from the RGB color model to the HSV color model, the color range to

extract can easily be found according to the HSV color model.
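The hue computation of equation (4.2) can be sketched in Python as follows; the cross-check uses the standard library's `colorsys`, which returns hue in [0, 1).

```python
import colorsys

def hue(r, g, b):
    """Hue angle in [0, 360) computed from r, g, b in [0, 1],
    following equation (4.2)."""
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        return 0.0                              # achromatic: hue undefined
    d = mx - mn
    if mx == r:
        return (60.0 * (g - b) / d) % 360.0
    if mx == g:
        return 60.0 * (b - r) / d + 120.0
    return 60.0 * (r - g) / d + 240.0           # mx == b

print(hue(0.0, 0.0, 1.0))  # pure blue -> 240.0
# Cross-check against the standard library:
print(abs(hue(0.2, 0.4, 0.6) - colorsys.rgb_to_hsv(0.2, 0.4, 0.6)[0] * 360) < 1e-6)
```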


Figure 4.11 HSV color model for finding the color range

4.2.2 Applying the Shadow and Highlight Invariant Color Segmentation Algorithm
The Swedish National Road Administration defined the colors used for the signs in the

CMYK color space. These values are converted into normalized Hue and normalized

Saturation as shown in the table below.

The color segmentation algorithm is carried out on RGB images taken with a digital camera

mounted on a moving car. The images are converted to HSV color space, and the hue,

saturation, and value are normalized into [0, 255]. The HSV color space is chosen because

the Hue feature is invariant to shadows and highlights.

While the normalized Hue is used as a priori knowledge in the algorithm, the normalized

Saturation and Value are used to identify and avoid the achromatic subspaces of the HSV

color space.

When the hue value of a pixel in the input image is within the color range specified in the

table below, and the pixel is not in the achromatic area, the corresponding value in the

output image is set to white.

    Table 4.1 color space relationship with different conditions


    Table 4.2 specified color table

4.2.3 Algorithm Implementation
Step 1. Convert the RGB image into HSV color space.

Step 2. Normalize the grey level of every pixel in the H image from [0, 360] to [0, 255].

Step 3. Normalize the grey level of every pixel in the S image from [0, 1] to [0, 255].

Step 4. Normalize the grey level of every pixel in the V image from [0, 1] to [0, 255].

Step 5. For all pixels in the H image:

    If (H pixel value > 240) OR (H pixel value >= 0 AND H pixel value < 10) Then H pixel value = 255

Step 6. For all pixels in the S image:

    If the corresponding S pixel value < 40 Then H pixel value = 0

Step 7. For all pixels in the V image:

    If the corresponding V pixel value < 30 OR V pixel value > 230 Then H pixel value = 0
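The per-pixel logic of steps 5-7 can be sketched as follows. This is a sketch under my reading of the steps: hues outside the specified range are assumed to map to 0, which the steps do not state explicitly.

```python
def segment_pixel(h, s, v):
    """Apply steps 5-7 to one pixel whose h, s, v values are already
    normalized to [0, 255]; returns the output H value."""
    # Step 5: hue inside the specified (wrap-around) color range -> 255.
    out = 255 if (h > 240 or 0 <= h < 10) else 0
    # Step 6: low saturation means the pixel is achromatic -> reject.
    if s < 40:
        out = 0
    # Step 7: very dark or very bright pixels are achromatic -> reject.
    if v < 30 or v > 230:
        out = 0
    return out

print(segment_pixel(250, 100, 100))  # in range, chromatic -> 255
print(segment_pixel(250, 20, 100))   # low saturation -> 0
print(segment_pixel(250, 100, 240))  # too bright -> 0
```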

Application result:

Figure 4.12 the improved algorithm applied

Figure 4.13 result with the improved algorithm


4.2.5 Extraction area implementation
The destination board may be placed in a very complex environment, and the surroundings

may be white, the same color as the characters on the board; this happens especially often in

Sweden. Extracting the characters directly under such conditions gives too much noise and

may cause the extraction to fail, so the background of the destination board becomes very

important.

Blue objects attract human sight, and the algorithm depends on this. First extract the blue

background color and determine its area, then record the coordinates of the blue area, and

then scan inside that area. In this way the characters can easily be found with less noise.


See the figures below:

Figure 4.14 the image after extracting the blue background


Figure 4.15 the extraction area found

In this step, keep the four coordinates in an array: top, bottom, left, right.
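A minimal sketch of recording those four coordinates from a binary background mask (pure Python, with nested lists standing in for the image):

```python
def bounding_box(mask):
    """Find the top, bottom, left, and right coordinates of the
    foreground (1) pixels in a binary mask (a list of rows)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, px in enumerate(row) if px]
    # Order matches the text: top, bottom, left, right.
    return min(rows), max(rows), min(cols), max(cols)

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_box(mask))  # (1, 2, 1, 3)
```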


4.2.4 Second-Time Image Processing Module
In this part, because using the HSV color mode to extract the characters directly may give a

lot of noise, which causes big trouble for the character extraction module, a gray image is

used here to extract the characters.

    Figure 4.16 Gray image


Extract black characters

(define a threshold: pixels on one side of it are set to black, the rest to white)

Figure 4.17 the black characters found


Extract white characters

(define a threshold: pixels on one side of it are set to black, the rest to white)

Figure 4.17 the white characters found
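Both extraction steps are plain global thresholding; a sketch follows (the threshold direction for each case is my reading of the parentheticals above, with 0 = black and 255 = white):

```python
def extract_black(gray, t):
    """Pixels darker than the threshold become black (0), others white (255)."""
    return [[0 if px < t else 255 for px in row] for row in gray]

def extract_white(gray, t):
    """Pixels brighter than the threshold become black (0), others white
    (255), so white characters are marked for the next module."""
    return [[0 if px > t else 255 for px in row] for row in gray]

gray = [[10, 200], [90, 250]]
print(extract_black(gray, 100))  # [[0, 255], [0, 255]]
print(extract_white(gray, 100))  # [[255, 0], [255, 0]]
```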

From these two images we can still see a lot of noise, which causes trouble for the character

extraction, especially in Figure 4.13. However, in the last section a region was already

defined, so now we only need to apply the four coordinates to obtain the region shown as the

red-line area in the image.

The character extraction module then only needs to scan inside the red-line area, which

leaves only a little noise and reduces the calculation time and CPU resources.


4.2.5 Character extraction module implementation
1. The first category: the texts on the image are well arranged, row by row or line by line.

Figure 4.14 destination board suitable for the height scan

Step 1.

(1) Scan every pixel from left to right and from top to bottom.

Figure 4.15 scan the image from L to R    Figure 4.16 scan the image from T to B

(2) Whenever the color changes, increase a counter.

(3) For this image, the width scan counts 3 and the height scan counts 1.

(4) The width count is greater than the height count.

Step 2.

(1) Use the width scan to separate the image and find the matrices; see the figure below:

Figure 4.17 image after separation

(2) Put all the matrices into the array one by one, as shown below:

Figure 4.18 prepare the array


Figure 4.19 how the image was put into the array

Figure 4.20 put the image rects into the container

Step 3. Take the first matrix from the array and redo step 1 and step 2; see the figures below:

Figure 4.21 get the first matrix

Figure 4.22 redo step 1 and step 2 and get new matrices

Step 4. Put these new matrices into the array again, as shown below:

Figure 4.23 put these matrices into the array again

Figure 4.24 third round of scanning and image separation

Step 5. Take the matrices out one by one and redo step 1. When the scan count equals 1, the

image cannot be separated any more; otherwise apply step 2 to separate it and put the pieces

into the array again. When no image can be separated any more, collect the result.
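The recursive scan-and-split procedure of steps 1-5 can be sketched as follows (pure Python on a binary nested-list image; a simplification of the thesis's method, with no merge threshold):

```python
def split_runs(flags):
    """Turn per-row (or per-column) 'contains ink' flags into
    (start, end) index ranges, one range per run of ink."""
    ranges, start = [], None
    for i, f in enumerate(list(flags) + [0]):
        if f and start is None:
            start = i
        elif not f and start is not None:
            ranges.append((start, i))
            start = None
    return ranges

def segment(img):
    """Recursively split a binary image (1 = ink) along the axis whose
    scan gives more counts, until no sub-image can be split further."""
    rows = split_runs([1 if any(r) else 0 for r in img])
    cols = split_runs([1 if any(r[c] for r in img) else 0
                       for c in range(len(img[0]))])
    if len(rows) <= 1 and len(cols) <= 1:
        return [img]                   # cannot be separated any more
    pieces = []
    if len(rows) >= len(cols):         # more row runs: horizontal strips
        for a, b in rows:
            pieces += segment(img[a:b])
    else:                              # more column runs: vertical strips
        for a, b in cols:
            pieces += segment([r[a:b] for r in img])
    return pieces

print(len(segment([[1, 1, 0, 1],
                   [1, 0, 0, 1]])))    # two character blocks
```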


Shown below:

Figure 4.25 results

2. The second category: texts on the image together with crosses and arrows that interfere

with the text separation.

Figure 4.26 destination board suitable for the width scan

Step 1.

(1) Scan from left to right and from top to bottom.

Figure 4.27 width scan    Figure 4.28 height scan

(2) Whenever the color changes, increase a counter.

(3) For this image, the height scan counts 3 and the width scan counts 1.

(4) The height count is greater than the width count.

Step 2.

(1) Use the height scan to separate the image; see the figure below:

Figure 4.29 image after separation

Step 3.

(1) An image like this cannot give the right city name after processing.

(2) Add a threshold: if the gap between two color regions is smaller than a given number,

(3) do not separate them.


Figure 4.30 the threshold

Figure 4.31 after merging the bounds

Step 4.

(1) Use the method introduced in the last section to get the result:

Figure 4.32 result

3. The third category: the image is not a normal image but somewhat strange, so some

thresholds should be defined in the image to help separate it and output the correct city name.

All the texts are linked together and in different colors, so the scanning method can only find

one big chunk. The image is shown below:

Step 1.

(1) Width or height scan, but with a threshold:

Figure 4.33 image not well arranged

Step 2.

(1) To deal with these kinds of images, a threshold should be applied at the beginning.

(2) For example, for one row, if the number of black pixels is less than the threshold, set

them all white. See the figures below:


Figure 4.34 find the threshold

Figure 4.35 calculate according to the threshold

Step 3.

(1) Apply the method shown in part 1; it should then be very easy to find the city name.

Figure 4.36 final output of the figure shown in 4.9


4.2.6 Training and testing with SVM

In this part, I train on the images coming from the image processing module and test them

with the application linked to an SVM library downloaded from the internet, using four

different kernel functions. The results are shown in chapter 5.

The figure shows how an image is processed into a feature vector.

But first of all, building a character transfer table is very important, because this SVM library

only accepts digits for its calculations, not characters.

The table is shown below:

 0 → noises    1 → a    2 → b    3 → c    4 → d    5 → e    6 → f
 7 → g         8 → h    9 → i   10 → j   11 → k   12 → l   13 → m
14 → n        15 → o   16 → p   17 → q   18 → r   19 → s   20 → t
21 → u        22 → v   24 → w   25 → x   26 → y   27 → z


Figure 4.39 image transferred to a feature vector

Three steps:

1. Prepare the input data for training (samples).

2. Train on these data and get a result file.

3. Predict, based on the result file, and get the output.
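Step 1 amounts to writing one text line per sample. The thesis does not name the library, but LIBSVM-style tools read the sparse "label index:value" text format, which can be produced like this (a sketch; label 1 stands for the character a in the transfer table above):

```python
def to_libsvm_line(label, pixels):
    """Encode one binarized character image (flattened to a pixel list)
    as a line in the sparse 'label index:value' text format used by
    LIBSVM-style tools. Indices are 1-based; zero features are omitted."""
    feats = " ".join(f"{i + 1}:{v}" for i, v in enumerate(pixels) if v)
    return f"{label} {feats}"

# A tiny 5-pixel example instead of a full 30*30 image:
print(to_libsvm_line(1, [0, 1, 1, 0, 1]))  # 1 2:1 3:1 5:1
```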

Step 1. Training data

If the problem is linear, we can apply the linear function. Because the data here is not

linearly separable, a kernel function is applied to separate it.

(4.3) Table of the kernel functions (RBF is the most common method and is applied here)

Figure 4.40 training data based on the kernel function

The images are transferred into feature data for training, as the figure below shows:


    Figure 4.41 training data

The template after processing the training data:

    Figure 4.42 the template


    Figure 4.43 template file summary

Step 2. Input data

The images are transferred into feature data for testing, as the figure below shows:

Figure 4.44 testing data

Step 3. Finally, get the output file that shows the result:

Figure 4.45 output (0 means noise, 1 means the character a)



    Chapter 5 Analysis and result


5.1 Analysis of the application

5.1.1 Analysis of the character extraction part

5.1.2 Analysis of the HSV color model image

1. Advantages of converting to HSV
The HSV color space is quite similar to the way in which humans perceive color. The other

models define color in relation to the primary colors, whereas the colors used in HSV are

clearly defined by human perception, which is not always the case with RGB or CMYK.

Hue plays the central role in the color detection because it is invariant to variations in

lighting conditions: it is scale invariant, shift invariant, and invariant under saturation changes.

The HSV model has been very helpful in resolving the problems of shadows and highlights

and the chromatic variation of daylight. For example, a faded image is considered one with

low saturation, and the saturation value of that color can be tuned according to the weather

conditions. The model is therefore able to preserve the maximum image information.

2. Problems with hue in the HSV color space
The hue coordinate is unstable, and small changes in RGB can cause strong variations in hue.

It suffers from three problems, as stated by Fleyeh:

• When the intensity is very low or very high, the hue is meaningless.

• When the saturation is very low, the hue is meaningless.

• When the saturation is less than the threshold value, the hue becomes unstable.

5.1.3 Analysis of the character extraction algorithm

Due to the special environment, especially in Sweden, the characters on the destination board

are normally white; but in the Swedish winter everything is white, so using the HSV method

directly to extract the white characters gives a lot of noise. In this application, all the

extracted images are of size 30*30 and are then normalized to fill the image. Here a problem

arises: for the character l, the filled image is a black block, but a small black spot (noise),

once filled, is also a black block, which causes a recognition problem.

Keeping the noise as small as possible is therefore very important. Even after extracting the

blue background to determine the area that should be extracted, there is still noise, so region

growing is used here to filter the image a second time.
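The second-pass filter can be sketched as follows (pure Python; the 4-connectivity and the `min_size` parameter are my choices, not stated in the thesis): grow each black region by flood fill and erase any region too small to be a character.

```python
from collections import deque

def filter_small_regions(img, min_size):
    """Grow each black (1) region with a flood fill and erase regions
    smaller than min_size, leaving only character-sized blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                region, q = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while q:                      # grow the region
                    y, x = q.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(region) < min_size:    # too small: it is noise
                    for y, x in region:
                        img[y][x] = 0
    return img

img = [[1, 0, 0],
       [0, 0, 1],
       [0, 1, 1]]
print(filter_small_regions(img, 2))  # [[0, 0, 0], [0, 0, 1], [0, 1, 1]]
```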

The extraction is divided into two parts. The first part extracts the area containing the full

city name, as shown below:


Figure 5.1 first part extraction

After getting this area, put the regions into an array.

The second part takes them out of the array and uses recursion to split off every character.

The processing time of the first part is linear, so it is easy to calculate and estimate. The time

consumption is shown below:

Figure 5.2 first-part processing time growth

In the second part, nobody can be sure of the structure of the city names in advance, so

recursion finishes the splitting, and the time spent is non-linear. If there are too many

characters inside a city name, or an arrow, a cross, or something similar, processing

consumes a lot of time. The time consumption is shown below:


Figure 5.3 second-part processing time growth

5.1.4 Analysis of the noise filter algorithm

A noise filter algorithm has been applied here, and it truly filters out a lot of noise. Its time

consumption is linear and very easy to control. For one testing image, the difference can be

seen in the figure below:

Figure 5.4 noise filter

The red columns represent the noise present in the image; from the figure it is easy to see

that when this filter is applied, the noise is reduced a lot. Because the time consumption is

linear, users can decide to filter the image more times and with different sizes, which can

give better results.


5.1.5 Result

1. Image result

The algorithm used in this part can separate every character with good accuracy, but for

some images a threshold still needs to be defined, because of the distance of the destination

board from the camera when the image is captured; when the destination board is far away,

the threshold may give a wrong result.

So under complex destination board conditions, for the whole video stream there may be

only one or two frames from which the application can get a perfect result.

Ten images are tested here; the results are below:


    Image1:

    Figure 5.1 image processing

    Figure 5.2 character result


    Image2:

    Figure 5.3 image processing

    Figure 5.4 character result


    Image3:

    Figure 5.5 image processing

    Figure 5.6 character result


    Image4:

    Figure 5.7 image processing

    Figure 5.8 character result


    Image5:

    Figure 5.9 image processing

    Figure 5.10 character result


    Image6:

    Figure 5.11 image processing

    Figure 5.12 character result


    Image 7:

    Figure 5.13 image processing

    Figure 5.14 character result


    Image 8:

    Figure 5.15 image processing

    Figure 5.16 character result


    Image 9:

    Figure 5.17 image processing

    Figure 5.18 character result


    Image 10:

    Figure 5.19 image processing

    Figure 5.20 character result


    2. Result analysis

    This character application runs on:

    CPU:

    Operating system: Windows 7, 64-bit

    Processing time of the algorithm: because every image is different and the amount of data the
    application processes varies, some images, and some parts of the video stream, take a long
    time to process, but the average processing time of the application's core calculation is
    4.8 ms, and the algorithm's time complexity is O(n log n).

    Image Nr    Time      Total    Extracted    Failed    Rate
    Image 1     0.578s    54       54           5         0.90
    Image 2     0.328s    72       72           11        0.84
    Image 3     0.429s    24       24           0         1.00
    Image 4     0.414s    11       5            6         0.54
    Image 5     0.371s    51       51           12        0.76


    Image 6     0.489s    15       7            8         0.46
    Image 7     0.398s    18       11           7         0.61
    Image 8     0.542s    9        9            0         1.00
    Image 9     0.372s    39       15           24        0.38
    Image 10    0.401s    25       19           6         0.76

    The accuracy rate of character extraction:

    Overall, for normal destination boards, the extraction accuracy rate reaches about 70%. For
    complex destination boards, the extraction depends on the threshold. Because this application
    uses a threshold to make sure that every character, as well as the arrows and crosses among
    them, can be extracted, in some special cases it fails to extract them or gives an inaccurate
    result.
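The threshold-based splitting described above can be illustrated as a column-projection scan. This is a minimal sketch, not the thesis code; the helper name and the exact threshold rule are assumptions for illustration:

```python
# Sketch of threshold-based splitting by vertical projection (hypothetical
# helper, not the thesis code): a column whose black-pixel count reaches
# `threshold` is treated as part of a character; runs of such columns
# become character segments, and low-count gaps split them.

def segment_columns(binary_rows, threshold=1):
    """binary_rows: equal-length rows of 0/1 values, 1 = black pixel.
    Returns (start, end) column ranges, end exclusive."""
    width = len(binary_rows[0])
    # black-pixel count per column (the "width scan")
    counts = [sum(row[x] for row in binary_rows) for x in range(width)]
    segments, start = [], None
    for x, c in enumerate(counts):
        if c >= threshold and start is None:
            start = x                    # a character begins
        elif c < threshold and start is not None:
            segments.append((start, x))  # a gap ends the character
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

# Two 3-column "characters" separated by one empty column.
img = [
    [1, 1, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 0, 1],
]
print(segment_columns(img))  # → [(0, 3), (4, 7)]
```

The same scan applied to rows gives the height split. The failure cases occur when a gap's count is close to a character column's count, so no single threshold separates them cleanly.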

    The figures below show such cases:


    Figure 5.21 the sloped background

    For the image shown, it is very difficult for the application to extract ENKOPING. This is
    because the splitting is based on the black-pixel counts from the top of the board down to the
    characters of ENKOPING: the structure is different, but the total black-pixel counts are very
    close, so the application cannot extract ENKOPING.

    Figure 5.22 the I and l character problem

    From the figure shown here, one can easily see that, because the application extracts each
    character and resizes it to fill the full region (30×30 pixels in this application), the
    characters I and l become almost completely black squares. The same happens with some noise
    blobs extracted from the image. This causes a big problem for the character-recognition part.
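Why a narrow glyph turns into a solid square can be shown with a tiny nearest-neighbour resize. This is an illustration only, not the thesis code:

```python
# Illustration (not the thesis code) of why stretching a character crop to a
# fixed 30x30 box destroys narrow glyphs: a 1-pixel-wide "I" fills the whole
# square with black, indistinguishable from a noise blob.

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of 0/1 pixels."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

letter_i = [[1] for _ in range(10)]  # a 10x1 crop: just a vertical bar
square = resize_nearest(letter_i, 30, 30)
print(all(p == 1 for row in square for p in row))  # → True: solid black
```

Padding the crop to a square before resizing, instead of stretching it to fill the region, would preserve the bar shape of I and l.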


    Figure 5.23 the problem the threshold causes

    In the image above, one can easily take the character to be an O, but in fact it is not an O:
    it is the Swedish character Ö, an O with two dots above it, and the two dots are missing. This
    problem is caused by the splitting threshold we chose. There should exist a balance that
    handles both cases well; if there is none, the system produces errors.

    Figure 5.24 the light problems

    Because of the environment lighting (this problem may not be produced by the processing
    application itself, but it should still be mentioned here), the image shown is dark. The grey
    level of the white characters differs from the background, so they are no problem to extract,
    but the grey level of the black characters is close to the background level, so extracting
    them is very difficult. This leads to two situations: either the extracted character is so
    fuzzy that even a human with good judgement has difficulty recognizing it, or it becomes a
    totally white


    chunk, and no one can tell what it was. (Figure 5.17, before this section, shows how the image
    looked during processing.)

    Figure 2.25 special character combination

    From the image above, one can easily see that some unusual spellings and special characters,
    such as V and A placed next to each other, can cause problems. This application has no method
    to solve this kind of problem, but training such pairs as a special combination and then
    giving a two-character output for them may solve it.

    5.1.6 Character recognition

    5.1.7 SVM applied here

    SVM was applied in this application because:

    1. For empirical risk minimization, SVM is better than a neural network.
    2. For a neural network it is very difficult to decide the number of hidden layers, but for
       SVM the kernel functions are given, with no further changes needed.

    Figure 5.25 characters for testing from the image

    Every character image shown before is scanned from top to bottom and left to right. Each
    pixel is given an index number; if the pixel value is 0, a 0 is written after the number, and
    if it is 255, a 1 is written.
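This pixel-numbering scheme matches the LIBSVM input format; a minimal sketch follows (the helper name and the row-major scan order are assumptions, since the exact scan convention is not fully specified):

```python
# Sketch (assumed row-major scan and helper name, not the thesis code) of
# encoding a binary character image as a LIBSVM-style training line:
# each pixel gets a 1-based index, value 0 stays 0 and value 255 becomes 1.

def to_libsvm_line(label, pixels):
    """pixels: 2-D list of grey values (0 or 255)."""
    feats = []
    i = 1
    for row in pixels:               # top to bottom
        for p in row:                # left to right
            feats.append("%d:%d" % (i, 1 if p == 255 else 0))
            i += 1
    return str(label) + " " + " ".join(feats)

tiny = [[255, 0], [0, 255]]          # 2x2 stand-in for a 30x30 crop
print(to_libsvm_line(1, tiny))       # → 1 1:1 2:0 3:0 4:1
```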

    Figure 5.26 training sets

    A template (model) file is then generated from the training sets and used to predict on the
    testing sets.


    As the figure below shows:

    Figure 5.27 testing sets

    Then we get the result; here, the label '1' corresponds to the character 'p'.

    The accuracy rate is tested with all four kernel functions.
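The four kernels compared in the tables below can be written as plain functions in LIBSVM's formulation. The gamma, coef0 and degree values here are illustrative defaults, not the settings used in the thesis:

```python
# The four LIBSVM kernel functions as plain Python (parameter values are
# illustrative defaults, not the thesis settings).
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, gamma=0.5, coef0=0.0, degree=3):
    return (gamma * linear(x, y) + coef0) ** degree

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid(x, y, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * linear(x, y) + coef0)

x, y = [1.0, 0.0], [1.0, 1.0]
print(linear(x, y))   # → 1.0
print(rbf(x, x))      # → 1.0 (identical vectors map to 1 under RBF)
```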


    5.1.8 Test with linear function

    Image Nr    Correctness (270 training images)    Correctness (570 training images)
    Image 1     32%                                  27%
    Image 2     35%                                  38%
    Image 3     46%                                  43%
    Image 4     34%                                  36%
    Image 5     41%                                  32%
    Image 6     43%                                  44%


    Image 7     41%                                  41%
    Image 8     32%                                  33%
    Image 9     27%                                  19%
    Image 10    33%                                  33%


    5.1.9 Test with polynomial function

    Image Nr    Correctness (270 training images)    Correctness (570 training images)
    Image 1     68%                                  72%
    Image 2     62%                                  64%
    Image 3     72%                                  77%
    Image 4     69%                                  71%
    Image 5     73%                                  73%
    Image 6     73%                                  78%


    Image 7     75%                                  62%
    Image 8     77%                                  79%
    Image 9     69%                                  72%
    Image 10    64%                                  62%


    5.1.10 Test with RBF function

    Image Nr    Correctness (270 training images)    Correctness (570 training images)
    Image 1     72%                                  77%
    Image 2     74%                                  75%
    Image 3     76%                                  79%
    Image 4     72%                                  75%
    Image 5     80%                                  80%
    Image 6     73%                                  73%


    Image 7     75%                                  78%
    Image 8     79%                                  82%
    Image 9     72%                                  75%
    Image 10    77%                                  79%


    5.1.11 Test with sigmoid function

    Image Nr    Correctness (270 training images)    Correctness (570 training images)
    Image 1     62%                                  64%
    Image 2     67%                                  69%
    Image 3     64%                                  65%
    Image 4     63%                                  55%
    Image 5     62%                                  54%
    Image 6     63%                                  71%


    Image 7     67%                                  62%
    Image 8     70%                                  70%
    Image 9     69%                                  75%
    Image 10    77%                                  78%

    The results show that, because the character images are small and not very clear, the accuracy
    is not very high. Image recognition is a non-linear problem, so training with the linear
    kernel and then predicting gives almost entirely wrong output. Comparing the other three
    kernel functions, RBF, polynomial and sigmoid, RBF gives the best output here. Regarding the
    number of training images, the results show that training with more sample vectors can
    sometimes give lower accuracy: when the kernel function maps the non-linear space into a
    high-dimensional space and tries to find a linearly separable space, too many sample vectors
    can make it very difficult to find an accurate separating space, so the accuracy can be lower
    than with fewer training vectors.

    So, for each problem type, finding the best number of training vectors for the kernel
    function in use will give very good results.
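The selection step described above can be sketched as a simple grid evaluation over kernel and training-set size on a held-out set. The helper is hypothetical and the numbers are illustrative only, loosely echoing the tables above:

```python
# Skeleton (hypothetical helper, illustrative numbers only) of the selection
# step described above: evaluate each (kernel, training-set size) pair on a
# held-out set and keep the best-scoring pair.

def select_best(accuracy):
    """accuracy: dict mapping (kernel, n_train) -> held-out accuracy."""
    return max(accuracy, key=accuracy.get)

# Numbers loosely echoing the tables: more samples do not always help.
acc = {
    ("linear", 270): 0.36,  ("linear", 570): 0.35,
    ("poly", 270): 0.70,    ("poly", 570): 0.71,
    ("rbf", 270): 0.75,     ("rbf", 570): 0.77,
    ("sigmoid", 270): 0.66, ("sigmoid", 570): 0.66,
}
print(select_best(acc))  # → ('rbf', 570)
```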


    Chapter 6 Conclusion and future work


    6.1 Conclusion

    To conclude this thesis report, the real-time city-name recognition system presented here has
    been a success. The aim of producing the correct city name as output, which is the necessary
    parameter for the real-time environment, has been accomplished. The average accuracy achieved
    by the application is about 60%.

    1. Image segmentation was found to be the most critical task of the whole project. This is
    because of the illumination conditions, especially highlights and shadows. There is always a
    need to tune the segmentation parameters during processing. However, the image segmentation
    done in this project was satisfactory.

    2. Noise filtering with the multiple median filter has been quite efficient, as it reduces the
    number of objects appointed as candidate traffic signs for recognition. The multiple median
    filter is harmless to the internal information carried by the traffic sign, which can be used
    for classification. It is therefore well suited to a real-time traffic sign recognition
    system.

    3. Character extraction is also an important part of this thesis. Using width and height scans
    to extract the characters from the image only works with the normal images shown before. If
    the image is very complex, a threshold must be added, or only the useful frames of the video
    stream can be processed, not all of them.

    4. In the character extraction part, because of noise, some images, especially those with a
    white frame inside the blue background, can fail completely if the board is sloped: every
    character extraction fails, and a big black chunk is output instead.

    5. The support vector machine is very powerful for classifying characters and gives a high
    recognition rate. One disadvantage compared with neural networks is that the noise images must
    also be trained and given a feature space, so that the system can recognize them as noise.

    6. For this SVM algorithm, I first used another approach, which was to save all the images in
    an array, and