Object Recognition in the Surveillance Area of Visually Impaired
CHAPTER 1
INTRODUCTION
Blindness is the condition of lacking visual perception due
to physiological or neurological factors. Various scales have been
developed to describe the extent of vision loss and define blindness.
Total blindness is the complete lack of form and visual light perception
and is clinically recorded as NLP, an abbreviation for no light
perception. The term blindness is also frequently used to describe severe
visual impairment with residual vision. Those described as having only
light perception have no more sight than the ability to tell light from
dark and the general direction of a light source. Visually impaired
people need some assistance in order to move from one place to another in
day-to-day life, either in a dependent manner with the help of others, or
in an independent manner with the help of canes, trained dogs, etc. to
guide them. In both cases, their primary objective is to detect the
obstacle in front of them and avoid it while moving. With the advent of
electronic technologies, self-assistive devices have been made to help
them. Some of the present technologies are as follows.
1.1 LASER CANE
This is an electronic cane that uses invisible laser beams to
detect obstacles, drop-offs, and similar hazards in the surroundings.
Once the cane detects an obstacle or drop-off using the laser beams, it
produces a specific audio signal. The cane has three distinct audio
signals; each indicates a specific distance. The audio signal informs the
user of the distance of the obstacle or the height of the drop-off. This
device can detect objects and hazards up to a distance of 12 feet.
http://en.wikipedia.org/wiki/Visual_perceptionhttp://en.wikipedia.org/wiki/Physiologyhttp://en.wikipedia.org/wiki/Neurologyhttp://en.wikipedia.org/wiki/Vision_losshttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wiktionary.org/wiki/residualhttp://en.wikipedia.org/wiki/Light_sourcehttp://en.wikipedia.org/wiki/Light_sourcehttp://en.wiktionary.org/wiki/residualhttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Vision_losshttp://en.wikipedia.org/wiki/Neurologyhttp://en.wikipedia.org/wiki/Physiologyhttp://en.wikipedia.org/wiki/Visual_perception -
7/31/2019 Object Recognition in the Surveillance Area of Visually Impaired
2/65
2
Figure 1.1 BLIND PERSON WITH LASER CANE
A part of the cane's handle also vibrates when there is an object
in front of the user. The laser cane is suitable for persons who are blind
and persons who are deaf-blind. It can be used on its own. However,
mobility experts strongly recommend that blind persons first learn the
use of the long white cane before using the laser cane. The laser cane
emits beams of invisible light which result in sounds or vibrations
when a beam encounters an object, so as to alert the user to an
obstruction ahead. It weighs one pound and is made of aluminum and steel.
1.2 SONIC MOBILITY DEVICE
This is a device that is generally mounted on the user's head. It
uses ultrasonic technology to detect obstacles and other objects that are
located in the user's path. The sonic mobility device uses the eight
tones of the musical scale to indicate the distance of the object. Each
tone signifies a particular distance from the obstruction. The user hears
the tone through the device's earpiece.
Figure 1.2 SONIC MOBILITY DEVICES
-
7/31/2019 Object Recognition in the Surveillance Area of Visually Impaired
3/65
3
1.3 GPS DEVICES FOR THE BLIND
Although mainly used for identifying one's location, GPS
(Global Positioning System) devices also help blind persons to
travel independently. Blind persons can use portable GPS systems
to determine and verify the correct travel route, and they can use these
devices whether they are walking or riding a vehicle. GPS devices
for the blind include screen readers so the user can hear the
information.
Other GPS devices are connected to a Braille display so the
user can read the information in Braille. A blind person should
use a particular mobility device in addition to the GPS system.
Figure 1.3 GPS DEVICES
Braille devices and software help blind people improve
their reading and writing skills. Becoming literate is very important
for such individuals because it allows them to hope for a
productive future and, at the same time, live with confidence. These
innovative Braille devices and software help visually impaired
individuals print and store information quickly, quietly, and reliably.
1.4 ULTRASOUND BASED DETECTION
Here a wearable system for visually impaired users is
implemented which allows them to detect and avoid obstacles. It is
based on ultrasound sensors which can acquire range data from the
objects in the environment by estimating the time of flight of the
ultrasound signal. Using a hemispherical sensor array, obstacles can be
detected and the directions to be avoided determined. However, the
ultrasound sensors are only used to detect whether obstacles are
present in front of the user. Unimpeded directions are determined by
analyzing patterns of the range values from successive frames.
Feedback is presented to users in the form of voice commands and
vibration patterns.
1.4.1 NEW BRAILLE TECHNOLOGY
Using this technology, visually impaired persons can read the
emotions or facial expressions of the person with whom they are
conversing. To make this possible, an ordinary web camera, hardware as
small as a coin, and a tactile display are used. This enables the
visually impaired to directly interpret human emotions.
Visual information is transferred from the camera into
advanced vibrating patterns displayed on the skin. The vibrators are
sequentially activated to provide dynamic information about what kind
of emotion a person is expressing and the intensity of the emotion
itself.
Figure 1.4 BRAILLE DEVICES AND SOFTWARE
The first step for a user is to learn the patterns of different
facial expressions, which can be done by displaying the emotions in
front of a camera that translates them into vibration patterns. In this
learning phase the visually impaired person has a tactile display mounted
on the back of a chair. When interacting with other people, a sling on
the forearm can be used instead.
The main research focus is to characterize different emotions
and to find a way to present them by means of advanced biomedical
engineering and computer vision technologies. This technology can
also be implemented on mobile phones for tactile rendering of live
football games and human emotion information through vibrations,
which is an interesting way of enhancing the experience of mobile
users.
1.5 COMPUTER ASSISTIVE TECHNOLOGY FOR THE BLIND
The most important advancement since blind assistive
technology began to appear in the 1970s is screen reading software,
which simulates the human voice reading the text on a computer screen
or renders hard-copy output into Braille. Screen readers are designed to
pick out things that would catch a sighted person's attention, such as
colors and blinking cursors, and can be modified to select the areas the
user does or does not want read.
Figure 1.5 VISUALLY IMPAIRED ASSISTIVE DEVICES
1.6 CANE WITH SENSOR
The cane is essential for the safe mobility of vision-impaired
people. With this device, they are able to stroll around without
worrying about bumps. Along with innovations in technology, the canes
used by blind people have been improved in terms of safety and
functionality.
Figure 1.6 CANE WITH SENSOR
1.7 BATTERY-OPERATED SPHYGMOMANOMETER
Blind persons can also be subject to hypertension. It is
good to know that, with the availability of a beeping or talking
sphygmomanometer, vision-impaired individuals can now accurately take
or monitor their blood pressure. This type of medical equipment is
battery-operated. The blood pressure and pulse readings are announced in
a clear voice and shown simultaneously on a digital display.
Figure 1.7 BATTERY-OPERATED SPHYGMOMANOMETER
1.8 NAVIGATIONAL AID HEADSET
This device is still a concept. However, if successfully
launched, the aid headset will help a blind person to confidently,
independently and safely walk through city streets. The navigational
aid device comes with a built-in microphone and audio transducer. It
will also incorporate a GPS system, speech recognition, and obstacle
detection technology. Using the microphone, the user will state his
destination; from the audible information, the GPS system will direct
the user to his desired location, and the obstacle detection technology
will help him safely reach the place by informing him of any impediments
he might encounter.
Figure 1.8 NAVIGATIONAL AID HEADSET
It is estimated that 7.4 million people in Europe are visually
impaired [11]. For many, known destinations along familiar routes can
be reached with the aid of white canes or guide dogs. By contrast, for
new or unknown destinations along unfamiliar routes (that may change
dynamically) the limitations of these aids become apparent [12, 13, 14]
(e.g. white canes are ineffective for detecting obstacles beyond 3-6
feet). These mobility aids are only useful for assisting visually
impaired people through the immediate environment (termed
micro-navigation), but do not assist the traveller in more distant
environments (termed macro-navigation).
Figure 1.9 ELECTRONIC TRAVEL AIDS (ETAs)
With the proliferation of context-aware research and
development, Electronic Travel Aids (ETAs) such as obstacle
avoidance systems (e.g. the Laser Cane and ultrasonic obstacle avoiders)
have been developed to assist visually impaired travellers with
micro-navigation, whereas Global Positioning Systems (GPS) and
Geographical Information Systems (GIS) are being developed for
macro-navigation (e.g. the MOBIC Travel Aid and the Personal Guidance
System).
However, despite recent technological advancements, there is
still considerable scope for Human Computer Interaction (HCI)
research. Previous work has predominantly focused on developing
technologies and testing their functionality, as opposed to utilizing
HCI principles (e.g. Task Analysis) to actively assess the impact on the
user. For instance, Dodson et al. [12] simply assume that, since a blind
human is the intended navigator, a speech user interface should be used.
However, despite the contextual complexity of a visually
impaired traveller interacting with various mobility aids (i.e. a
navigational system and a guide dog/white cane), existing research has
failed to fully address the interaction of contextual components and
how usability is influenced. Further, as more contextual sources are
used to identify and discover a user's context, it is becoming
increasingly important that information is managed appropriately and
displayed in a way that is tailored to the visually impaired traveller's
task, situation and environment.
1.9 WHITE CANE
A white cane is used by many people who are blind or visually
impaired, both as a mobility tool and as a courtesy to others. Not all
modern white canes are designed to fulfil the same primary function,
however: there are at least five varieties of this tool, each serving a
slightly different need.
TYPES:
Long cane: This "traditional" white cane, also known as a "Hoover"
cane after Dr. Richard Hoover, is designed primarily as a mobility tool
used to detect objects in the path of a user. Cane length depends upon
the height of the user, and traditionally extends from the floor to the
user's sternum. Some organizations favour the use of much longer canes.
Figure 1.10 LONG WHITE CANE
"Kiddie" cane: This version works in the same way as an
adult's long cane, but is designed for use by children.
Figure 1.11 KIDDIE CANE
Identification cane ("Symbol Cane" in British English):
The ID cane is used primarily to alert others as to the bearer's visual
impairment. It is often lighter and shorter than the long cane, and has
no use as a mobility tool.
Figure 1.12 IDENTIFICATION CANE
Support cane: The white support cane is designed primarily
to offer physical stability to a visually impaired user. By virtue of its
colour, the cane also works as a means of identification. This tool has
very limited potential as a mobility device.
Figure 1.13 SUPPORT CANE
CHAPTER 2
PROBLEM DESCRIPTION
Visually impaired people cannot navigate easily in their
day-to-day life. They need the help of others, a cane, other electronic
mobility devices, or guide dogs to guide them in an appropriate manner.
They therefore need a self-assistive device that guides them and makes
them independent, rather than dependent on others, for navigation. The
first and most significant task is the detection of obstacles in front
of them and their avoidance. In this project we classify objects,
recognize obstacles, identify them and track them through image
processing techniques, and suggest an alternative path.
CHAPTER 3
LITERATURE SURVEY AND RELATED WORKS
3.1 VISUAL ATTENTION
The visual system is not capable of fully processing all of the
visual information that arrives at the eye. To get around this
limitation, a mechanism that selects regions of interest for additional
processing is used. This selection is done bottom-up, using saliency
information, and top-down, using cueing.
The processing of visual information starts at the retina. The
neurons in the retina have a center-surround organization of their
receptive fields. The shapes of these receptive fields are modeled,
among other ways, by the difference of Gaussians (DoG). This function
captures the Mexican-hat shape of the retinal ganglion cell's receptive
field. These cells emphasize boundaries and edges. Further up the visual
processing pathway is the visual cortex area V1. Here there are cells
that are orientation selective; these cells can be modeled by a 2D Gabor
function. Itti and Koch's implementation of Koch and Ullman's
saliency map is one of the best performing biologically plausible
attention models [1][2][3]. Itti et al. [3] implemented bottom-up
saliency detection by modeling specific feature-selective retina cells
and cells further up the visual processing pathway. The retina cells use
a center-surround receptive field, which is modeled in [2] by taking the
DoG. They also model orientation-selective cells using 2D Gabor filters.
For each receptive field there is an inhibitory variant. For example, if
an on-center off-surround receptive field shows excitation on a certain
input, then the same input will cause the opposite off-center
on-surround receptive field to inhibit.
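As an illustration of this center-surround model, the following MATLAB sketch (assuming the Image Processing Toolbox for fspecial and imfilter; the sigma values and the input file name are illustrative assumptions, not parameters taken from [2]) builds a difference-of-Gaussians kernel and applies it together with its inhibitory counterpart:

% Difference-of-Gaussians (DoG) receptive field sketch.
% sigmaC and sigmaS are assumed values, not the parameters of [2].
I = im2double(rgb2gray(imread('scene.jpg')));  % hypothetical input image

sigmaC = 1.0;                    % narrow Gaussian: excitatory center
sigmaS = 3.0;                    % wide Gaussian: inhibitory surround
hsize  = 2*ceil(3*sigmaS) + 1;   % support large enough for the surround

gC  = fspecial('gaussian', hsize, sigmaC);
gS  = fspecial('gaussian', hsize, sigmaS);
dog = gC - gS;                   % Mexican-hat shaped kernel

onCenter  = imfilter(I, dog, 'replicate');   % on-center, off-surround
offCenter = -onCenter;                       % inhibitory counterpart

% Rectify: each receptive field type only signals excitation.
onCenter(onCenter < 0)   = 0;
offCenter(offCenter < 0) = 0;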
The sub-modalities that Itti et al. [3] use for creating a saliency
map are intensity, color and orientation. For each of these
sub-modalities a Gaussian scale pyramid is computed to obtain
scale-invariant features. For each of these image scales, feature maps
are created with a receptive field and its inhibitory counterpart. For
the intensity sub-modality, on-center off-surround and off-center
on-surround feature maps for different scales are computed based on the
pixel intensity. For the color sub-modality, feature maps are computed
with center-surround receptive fields using a color pixel value as
center with its opponent color as surround. The color combinations used
for this are red-green and blue-yellow. The feature maps for the
orientation sub-modality are created using the 2D Gabor filters for the
orientations 0, 45, 90 and 135 degrees.
To obtain a saliency map from all these features, a weighting
process is executed in several stages to obtain the most salient
features. In the first stage feature maps are weighted across the
different receptive fields, in the second stage across the scales, and
in the final stage across the sub-modalities. By combining the feature
maps obtained in the last stage, a saliency map is created.
The visual system has limited capacity and cannot process
everything that falls onto the retina. Instead, the brain relies on
attention to bring salient details into focus and filter out background
clutter. Two recent studies by researchers at the Salk Institute for
Biological Studies, one study employing computational modeling
techniques and the other experimental techniques, have helped to
unravel the mechanisms underlying attention. The strength of visual
input fluctuates over orders of magnitude. The visual system reacts
automatically to these changes by adjusting its sensitivity, becoming
more sensitive in response to faint inputs and reducing sensitivity to
strong inputs. For example, when we walk into a darkened lecture hall
on a sunny day, at first we see little, but over time our visual system
adapts, increasing its sensitivity to match the environment.
Neurons in the visual cortex view the world through their
"receptive fields," the small portion of the visual field individual
neurons actually "see" or respond to. Whenever a stimulus falls within
the receptive field, the cell produces a volley of electrical spikes,
known as "action potentials," that convey information about the
stimulus in the receptive field.
But the strength and fidelity of these signals also depend on
other factors. Scientists generally agree that neurons typically respond
more strongly when attention is directed to the stimulus in their
receptive fields. In addition, the response of individual neurons can be
strongly influenced by what is happening within the immediate
surroundings of the receptive field, a phenomenon known as contextual
modulation.
The visual attention mechanism may have at least the
following basic components:
(1) The selection of a region of interest in the visual field.
(2) The selection of feature dimensions and values of interest.
(3) The control of information flow through the network of
neurons that constitutes the visual system.
(4) The shifting from one selected region to the next in time.
The biologically motivated computational attention
system VOCUS (Visual Object detection with a CompUtational
attention System) detects regions of interest in images. It operates
in two modes: an exploration mode in which no task is provided, and
a search mode with a specified target. In exploration mode, regions
of interest are defined by strong contrasts (e.g. color or intensity
contrasts) and by the uniqueness of a feature. For example, a black
sheep is salient in a flock of white sheep. In search mode, the system
uses previously learned information about a target object to bias the
saliency computations with respect to the target.
In various experiments, it has been shown that the target is on
average found with fewer than three fixations, that usually fewer than
five training images suffice to learn the target information, and that
the system is mostly robust with regard to viewpoint changes and
illumination variations. VOCUS provides a powerful approach to improving
existing vision systems by concentrating computational resources on
regions that are more likely to contain relevant information. The more
the complexity and power of vision systems increase in the future, the
more they will profit from an attentional front-end like VOCUS.
3.2 PSYCHOPHYSICAL MODELS OF ATTENTION
FEATURE INTEGRATION THEORY (FIT):
Both anatomical and physiological evidence support the
hypothesis that the visual system divides input visual information into
distinct subsystems that analyze and code different properties in
various specialized areas. This raises the critical problem of how these
dispersed representations are combined into a unified perception, i.e.,
the binding problem.
FIT consists of a master map which codes locations of feature
discontinuities in luminance, color, depth or motion, and a separate set
of feature maps for processing information about the current spatial
layout of the features. An attention window moves within a location
map, selecting the features attended to and temporarily excluding
others from the feature maps, thus putting the "what" and "where"
pathways together.
There are three spatially selective mechanisms used in FIT to
solve the binding problem: selection by a spatial attention window,
inhibition of location feature maps containing unwanted features, and
top-down activation of the location containing the currently attended
object.
Figure 3.1 FEATURE INTEGRATION THEORY (FIT)
3.3 SALIENCY MAP
SALIENCY:
Something is said to be salient if it stands out; for example,
road signs should have high saliency.
Figure 3.1 SALIENCY MAP
A saliency map is also defined as a topographically arranged
map that represents the visual saliency of a corresponding visual
scene. The saliency map model localizes salient points in the visual
field; saliency is based on (bottom-up) scene-based properties; it
reduces computation by a selection on the basis of pre-attentively
computed simple features; and it addresses some problems with the
integration of different feature dimensions into a space-related map.
Given an image, we assign to each pixel a value of how
informative the pixel is with respect to the human visual system
(HVS). Research in this area is generally divided into two topics:
3.3.1 BOTTOM-UP SALIENCY
Bottom-up saliency is also known as pre-attentive vision. It is
useful for rapid scene understanding and is linked to human survival
mechanisms. Bottom-up saliency methods depend only on the instantaneous
sensory input, without taking into account the internal state of the
organism. A dramatic example of a stimulus that attracts attention using
bottom-up mechanisms is a firecracker going off suddenly. Bottom-up
attention is also easier to understand than attention that is influenced
by internal states.
Possibly the most influential attempt at understanding bottom-
up attention and the underlying neural mechanisms was made by
Christof Koch and Shimon Ullman (Koch and Ullman, 1985). They
proposed that the different visual features that contribute to attentive
selection of a stimulus (color, orientation, movement, etc.) are
combined into one single topographically oriented map, the saliency map,
which integrates the normalized information from the individual feature
maps into one global measure of conspicuity. In analogy to the
center-surround representations of elementary visual features, bottom-up
saliency is thus determined by how different a stimulus is from its
surround, in many sub-modalities and at many scales. To quote from Koch
and Ullman (1985), "Saliency at a given location is determined primarily
by how different this location is from its surround in color,
orientation, motion, depth etc."
Figure 3.2 BOTTOM-UP SALIENCY
3.3.2 TOP-DOWN SALIENCY
Top-down saliency is goal-driven, controlled by higher-order
brain processes for tasks such as object recognition and tracking.
Top-down control, on the other hand, does take into account the
internal state, such as the goals the organism has at the time,
personal history and experiences, etc. An example of top-down attention
is a hungry animal focusing on difficult-to-find food items, ignoring
more "salient" stimuli.
Figure 3.3 TOP-DOWN SALIENCY
EXAMPLE OF A SALIENCY MAP:
The figure shows a complex visual scene and the corresponding
saliency map, as computed from the algorithm in Niebur and Koch
(1996). The scene is static so the motion component of the algorithm
does not yield a contribution. The surf line is well-represented in the
saliency map since it combines input from several feature maps:
intensity, orientation and color all have substantial local contrast at
several spatial scales in this area. The same is the case for the clouds
and the island in the distance.
Figure 3.4 SALIENCY MAP FOR A STOOL: THE ORIGINAL IMAGE (LEFT) AND ITS
SALIENCY MAP (RIGHT)
The original definition of the saliency map by Koch and Ullman
(1985) is in terms of neural processes and transformations, rather than
in terms of cognitive or higher-order constructs. The question of where
the saliency map is located in the brain thus arises quite naturally.
There is no logical necessity that it arises in one particular location,
and it could be understood as a functional map whose components
are distributed over many brain areas. It is also possible that there
is more than one topographically organized saliency map.
However, given that many feature maps of early vision are, in fact,
localized in specific parts of the central nervous system, it has been
proposed that the same might also be the case for the saliency map.
Koch and Ullman (1985) proposed that it may be located in the lateral
geniculate nucleus of the thalamus, an area previously suggested as
playing a major role in attentional control.
Another thalamic nucleus, the pulvinar, is known to be involved in
attention (Robinson and Petersen 1992) and has also been suggested as
a candidate for housing the saliency map. Another possibility is the
superior colliculus, likewise known to be involved in the control of
attention (Kustov and Robinson 1996).
Several neocortical areas have been suggested as well, including V1
(Li 2002), V4 (Mazer and Gallant 2003), and posterior
parietal cortex (Gottlieb 2007).
Thus, there are a number of identified candidates which may
correspond to different flavours of salience, perhaps more bottom-up
driven in some areas and more strongly modulated by behavioural goals
in others.
APPLICATIONS
Visual saliency and the saliency model introduced above
have several applications in computer vision and artificial
intelligence systems, as well as in more recently developed marketing
analyses. The following are only some of the applications described in
the literature:
Machine Vision and Mobile Robots: Visual saliency is used as a
visual landmark for mobile robot applications, to efficiently compute a
robot's localization relative to its environment or to track objects in
the environment (e.g. the VOCUS system: A Visual Attention System for
Object Detection and Goal-Directed Search).
Neuromorphic Vision: Developing high-speed robotics
applications that include visual saliency and can handle as large an
amount of information as the human visual system.
Automatic target detection: For example, finding hidden military
vehicles or traffic signs.
Image and Video Compression: The eye does not sense the image
at a constant resolution but has a fovea at its center, where
photoreceptors are much more closely spaced than in the periphery. This
foveation property of the vertebrate eye has been exploited in image and
video processing algorithms; foveated algorithms combined with visual
saliency have been used for video compression.
Medical Imaging: Tumour detection in mammograms using
topographic maps based on salient regions.
Advertisement Design: The prediction of human fixations based on
visual saliency can be used to improve the design of advertisements or
magazine covers.
CHAPTER 4
PROPOSED METHODOLOGY
Figure 4.1 OVERALL BLOCK DIAGRAM
In this Block Diagram representation of the architecture, modules are
visualized by blocks and information streams by arrows.
The goals of the proposed technology are self-learning,
self-configuration and self-adjustment. The proposed system addresses
the following challenging requirements:
1. The system should be able to report the location, distance and
direction of items in the room such as equipment, furniture, doors and
even other users.
2. It must be a reliable system that minimizes the impact of
installation and maintenance on the building owner.
A great number of benefits are realized from the implementation of
such systems: greater safety, autonomy and self-esteem and,
eventually, a better quality of life.
The visual saliency detection module receives a camera image and
computes a saliency map. The visual location module returns the
location of the most salient object. The feature extraction module
computes image features from the salient image region. The visual
saliency detection architecture that is described in this section is
derived from the work of Itti et al.
4.1 Visual Saliency Detection
This method implements bottom-up saliency detection by
modelling specific feature-selective retina cells and cells further up
the visual processing pathway.
The retina cells use a center-surround receptive field, which is
modelled in [28] by taking the difference of Gaussians (DoG).
Orientation-selective cells are modelled using 2D Gabor filters. The
features used for creating a saliency map are intensity, color and
orientation. For each of these features a Gaussian scale pyramid is
computed to obtain scale-invariant features using receptive fields.
The input image is decomposed through several pre-attentive feature
detection mechanisms (sensitive to color, intensity, etc.), which
operate in parallel over the entire visual scene. The input consists of
static color images (640x480), each represented at several spatial
scales (640x480, 320x240, 160x120, and so on); the different scales are
used for computing centre-surround differences.
Neurons in the feature maps then encode for spatial contrast in
each of those feature channels. In addition, neurons in each feature map
spatially compete for salience, through long-range connections that
extend far beyond the spatial range of the classical receptive field of
each neuron (here shown for one channel; the others are similar).
After competition, the feature maps are combined into a
unique saliency map, which topographically encodes for saliency
irrespective of the feature channel in which stimuli appeared salient.
The saliency map is sequentially scanned by attention through the
interplay between a winner-take-all network (which detects the point of
highest saliency at any given time) and inhibition of return (which
suppresses the last attended location from the saliency map, so that
attention can focus onto the next most salient location). Top-down
attentional bias and training can modulate most stages of this bottom-up
model.
The model is based on four major principles: visual attention acts on
a multi-featured input; the saliency of locations is influenced by the
surrounding context; the saliency of locations is represented on a
scalar map, the saliency map; and winner-take-all and inhibition of
return are suitable mechanisms to allow attention shifts. In the
following, the implementation details of the four main steps of the
model are given.
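The winner-take-all scan with inhibition of return described above can be sketched as follows in MATLAB (the stand-in saliency map, the number of fixations and the suppression radius are illustrative assumptions; the winner-take-all network is reduced here to a global maximum search):

% Winner-take-all scan with inhibition of return (illustrative sketch).
S = rand(120, 160);          % stand-in saliency map for demonstration
numFixations = 5;            % assumed number of attention shifts
r = 15;                      % assumed inhibition-of-return radius (pixels)

[rows, cols] = size(S);
[X, Y] = meshgrid(1:cols, 1:rows);
for k = 1:numFixations
    [~, idx] = max(S(:));                 % winner-take-all: global peak
    [yk, xk] = ind2sub(size(S), idx);
    fprintf('fixation %d at (%d, %d)\n', k, xk, yk);
    % Inhibition of return: suppress a disc around the attended point
    S((X - xk).^2 + (Y - yk).^2 <= r^2) = 0;
end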
4.1.1 FEATURE MAPS
First, a number of features j (j = 1, ..., n) are extracted from the
scene by computing the so-called feature maps Fj. Such a map represents
the image of the scene based on a well-defined feature, which leads to a
multi-featured representation of the scene. In his implementation, Itti
considered seven different features, which are computed from an RGB
color image and which belong to three main cues, namely intensity,
color, and orientation.

Extraction of Early Visual Features

Let r, g, and b be the red, green, and blue channels of the input image.

Intensity image:

I = (r + g + b)/3
The r, g, and b channels are normalized by I in order to decouple hue
from intensity. Since hue variations are not perceivable at very low
luminance, normalization is only applied at locations where

I(i,j) > max(I)/10

Intensity feature:

F1 = I = 0.3·R + 0.59·G + 0.11·B

Two chromatic features are based on the two color opponency filters
R+G− and B+Y−, where the yellow signal is defined as
Y = (r + g)/2 − |r − g|/2. Such chromatic opponency exists in the human
visual cortex:

F2 = R − G
F3 = B − Y
The normalization of the chromatic features by I decouples hue from
intensity.
Four local orientation features F4..F7 are computed according to the
angles {0°, 45°, 90°, 135°}. Gabor filters, which represent a suitable
mathematical model of the receptive field impulse response of
orientation-selective neurons in primary visual cortex, are used to
compute the orientation features. In this implementation of the model,
it is possible to use an arbitrary number of orientations. However, it
has been noticed that using more than four orientations does not improve
the performance of the model drastically.
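A compact MATLAB sketch of this feature extraction step is given below (a sketch, not the original implementation: the input file name, the Gabor envelope and wavelength, and the filter support are assumed values; the 1/10-of-maximum threshold follows the description above):

% Early visual features: intensity F1, chromatic opponency F2/F3, and
% four Gabor orientation maps (assumed filter parameters).
rgb = im2double(imread('scene.jpg'));        % hypothetical input image
r = rgb(:,:,1);  g = rgb(:,:,2);  b = rgb(:,:,3);

I = (r + g + b) / 3;                         % intensity image

% Decouple hue from intensity, only where I > max(I)/10
mask = I > max(I(:)) / 10;
rn = zeros(size(r));  gn = rn;  bn = rn;
rn(mask) = r(mask) ./ I(mask);
gn(mask) = g(mask) ./ I(mask);
bn(mask) = b(mask) ./ I(mask);

Yc = (rn + gn)/2 - abs(rn - gn)/2;           % yellow signal
F2 = rn - gn;                                % red/green opponency
F3 = bn - Yc;                                % blue/yellow opponency

% Orientation features via hand-built 2D Gabor filters (a built-in
% gabor() function is not assumed to be available)
[x, y] = meshgrid(-15:15);
sigma = 4;  lambda = 8;                      % assumed envelope/wavelength
thetas = [0 45 90 135];
Fo = cell(1, numel(thetas));
for k = 1:numel(thetas)
    t  = thetas(k) * pi / 180;
    xt = x*cos(t) + y*sin(t);
    gb = exp(-(x.^2 + y.^2)/(2*sigma^2)) .* cos(2*pi*xt/lambda);
    Fo{k} = imfilter(I, gb, 'replicate');    % orientation map for theta
end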
4.1.2 Center-Surround Difference
In a second step, each feature map is transformed into its
conspicuity map, which highlights the parts of the scene that strongly
differ, according to a specific feature, from their surroundings. In
biologically plausible models, this is usually achieved by using a
center-surround mechanism. Practically, this mechanism can be
implemented with a difference-of-Gaussians filter (DoG), which can be
applied to feature maps to extract local activities for each feature
type. A visual attention task has to detect conspicuous regions
regardless of their sizes; thus, a multiscale conspicuity operator is
required.
Center-surround is then implemented as the difference between fine
(c for center) and coarse (s for surround) scales. Indeed, for a feature
j (j = 1..n), a set of intermediate multiscale conspicuity maps Mj,k
(k = 1..K) is computed according to the following equation, giving rise
to (n x K) maps for the n considered features:

Mj,k = |Pj(ck) ⊖ Pj(sk)|
where ⊖ is a cross-scale difference operator that first interpolates the
coarser scale to the finer one and then carries out a point-by-point
subtraction. Taking the absolute value of the difference between the
center and the surround allows the simultaneous computation of both
sensitivities: dark center on bright surround and bright center on dark
surround (red/green and green/red, or blue/yellow and yellow/blue, for
color).
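In code, the cross-scale operator ⊖ amounts to upsampling the coarse surround level to the size of the fine center level and subtracting point by point. A minimal MATLAB sketch (assuming pyr is a cell array of pyramid levels, with pyr{1} the full-resolution scale, as built in the next subsection; this would be saved as centerSurround.m):

function M = centerSurround(pyr, c, s)
% Across-scale difference M = |P(c) (-) P(s)| between a fine center
% level c and a coarse surround level s of a Gaussian pyramid.
center   = pyr{c};
surround = imresize(pyr{s}, size(center));   % interpolate to fine scale
M = abs(center - surround);                  % both contrast polarities
end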
Creating the Gaussian pyramid
In this step the original input image I is convolved with a
linearly separable 5x5 Gaussian kernel G and subsampled into nine
(s = 0..8) different spatial scales. The subsampling is obtained by
smoothing the previous scale and halving the resolution in each
direction:

I(s) = [G * I(s−1)] ↓2,  s = 1..8
Gaussian Scale Pyramids:
Gaussian scale pyramids are used for scale-invariant receptive
field feature extraction. This is a commonly used method in image
processing, but it is computationally rather expensive. Gaussian
pyramids are used to compute scale-invariant features. Different image
scales are normally used so that the filter mask with which an image is
convolved does not have to change. The convolution of an image with a
large mask is rather time consuming: O(nm), where n is the number of
pixels in the image and m is the number of entries in the filter mask.
Figure 4.2 GAUSSIAN SCALE PYRAMID
When a Gaussian pyramid is used, several processing steps have
to be taken. First the input image needs to be scaled down, which can
be done by sub-sampling. Sub-sampling can lead to aliasing, and to
overcome this problem the spatial frequencies of the image which are
above the sampling frequency must be removed.
This can be done by smoothing the image with a Gaussian filter
before sub-sampling it. When the receptive field filter is applied, the
filtered image needs to be scaled back up. In the original
implementation, 9 spatial scales were used and all filtered maps were
resized to scale 4; with 4 scales and 2 receptive field sizes, all maps
would be resized to scale 2. When scaling up, some form of interpolation
needs to be used for anti-aliasing.
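A minimal pyramid construction consistent with this description (5x5 separable Gaussian smoothing followed by factor-2 subsampling, nine levels) might look as follows; the 5-tap binomial kernel is a standard approximation assumed here, not necessarily the exact kernel of the original implementation:

% Dyadic Gaussian pyramid: smooth with a separable 5-tap kernel,
% then subsample by 2, for scales s = 0..8 (pyr{1} holds scale 0).
I = im2double(rgb2gray(imread('scene.jpg')));  % hypothetical input
k = [1 4 6 4 1] / 16;                          % 5-tap binomial kernel

pyr = cell(1, 9);
pyr{1} = I;
for s = 2:9
    blurred = conv2(k, k, pyr{s-1}, 'same');   % separable 5x5 smoothing
    pyr{s}  = blurred(1:2:end, 1:2:end);       % anti-aliased subsampling
end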
4.1.3 NORMALIZATION STRATEGIES
The saliency-based model of visual attention performs two
kinds of map combination. The first is the cross-scale combination of
the multiscale conspicuity maps Mj,k in order to compute a unique
conspicuity map Cj for each scene feature; the second, described in the
next subsection, combines the conspicuity maps into the saliency map.
4.1.4 SALIENCY MAP
The purpose of the saliency map is to represent saliency at all
locations with a scalar quantity. The feature maps are combined into
three conspicuity maps: Intensity (I), Color (C) and Orientation (O).
Before they are combined, they need to be normalized.
Creating the saliency map: the combination of multiple maps is
obtained by the linear combination of the conspicuity maps.
The overall computation goal is to have a single map, in which
the most salient object of an image stands out more than others and to
have a mechanism that models the shift to the next most salient object.
The input image is decomposed through several pre-attentive
feature detection mechanisms (sensitive to color, intensity and
orientation), which operate in parallel over the entire visual scene.
The model's saliency map is endowed with internal dynamics which
generate attentional shifts. This model consequently represents a
complete account of bottom-up saliency and does not require any
top-down guidance to shift attention.
This framework provides a massively parallel method for the
fast selection of a small number of interesting image locations to be
analyzed by more complex and time-consuming object-recognition
processes. Extending this approach to guided search, feedback from
higher cortical areas (e.g. knowledge about targets to be found) was
used to weight the importance of different features. Input is provided
in the form of static color images, usually digitized at 640x480
resolution.
Nine spatial scales are created using dyadic Gaussian pyramids [10],
which progressively low-pass filter and subsample the input image,
yielding horizontal and vertical image-reduction factors ranging from
1:1 (scale zero) to 1:256 (scale eight) in eight octaves.
Each feature is computed by a set of linear center-surround
operations akin to visual receptive fields: typical visual
neurons are most sensitive in a small region of the visual space (the
center), while stimuli presented in a broader, weaker antagonistic
region concentric with the center (the surround) inhibit the neuronal
response. Such an architecture, sensitive to local spatial discontinuities,
is particularly well-suited to detecting locations which stand out from
their surround and is a general computational principle in the retina,
lateral geniculate nucleus, and primary visual cortex [11].
Center-surround is implemented in the model as the difference
between fine and coarse scales: the center is a pixel at scale
c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale
s = c + δ, with δ ∈ {3, 4}. The across-scale difference between two
maps, denoted ⊖ below, is obtained by interpolation to the finer scale
and point-by-point subtraction. Using several scales not only for c but
also for δ = s − c yields truly multiscale feature extraction, by
including different size ratios between the center and surround
regions.
Extraction of early visual features
With r, g, and b being the red, green, and blue channels of the
input image, an intensity image I is obtained as I = (r + g + b)/3. I is
used to create a Gaussian pyramid I(σ), where σ ∈ [0..8] is the scale.
The r, g, and b channels are normalized by I in order to decouple hue
from intensity. However, because hue variations are not perceivable at
very low luminance (and hence are not salient), normalization is only
applied at the locations where I is larger than 1/10 of its maximum over
the entire image (other locations yield zero r, g, and b). Four
broadly-tuned color channels are created: R = r − (g + b)/2 for red,
G = g − (r + b)/2 for green, B = b − (r + g)/2 for blue, and
Y = (r + g)/2 − |r − g|/2 − b for yellow (negative values are set to
zero). Four Gaussian pyramids R(σ), G(σ), B(σ), and Y(σ) are created
from these color channels. The first set of feature maps encodes
intensity contrast:

I(c,s) = |I(c) ⊖ I(s)|
A second set of maps is similarly constructed for the color
channels which, in cortex, are represented using a so-called color
double-opponent system: in the center of their receptive fields,
neurons are excited by one color (e.g., red) and inhibited by another
(e.g., green), while the converse is true in the surround. Such spatial
and chromatic opponency exists for the red/green, green/red,
blue/yellow, and yellow/blue color pairs in human primary visual
cortex [12]. Accordingly, maps RG(c,s) are created in the model to
simultaneously account for red/green and green/red double opponency,
and BY(c,s) for blue/yellow and yellow/blue double opponency:

RG(c,s) = |(R(c) − G(c)) ⊖ (G(s) − R(s))|
BY(c,s) = |(B(c) − Y(c)) ⊖ (Y(s) − B(s))|

Local orientation information is obtained from I using oriented
Gabor pyramids O(σ,θ), where σ ∈ [0..8] represents the scale and
θ ∈ {0°, 45°, 90°, 135°} is the preferred orientation [11]. (Gabor
filters, which are the product of a cosine grating and a 2D Gaussian
envelope, approximate the receptive field sensitivity profile (impulse
response) of orientation-selective neurons in primary visual cortex
[12].) Orientation feature maps O(c,s,θ) encode, as a group, local
orientation contrast between the center and surround scales:

O(c,s,θ) = |O(c,θ) ⊖ O(s,θ)|

In total, 42 feature maps are computed: six for intensity, 12 for
color, and 24 for orientation. The predictions of saliency models, that
is, which locations are most likely to be attended to, have been
compared at the quantitative level against the scan paths generated by
human observers.
Saliency Map
The purpose of the saliency map is to represent the saliency at
every location in the visual field by a scalar quantity and to guide the
selection of attended locations, based on the spatial distribution of
saliency. A combination of the feature maps provides bottom-up input
to the saliency map, modelled as a dynamical neural network. One
difficulty in combining different feature maps is that they represent
a priori non-comparable modalities, with different dynamic ranges and
extraction mechanisms. Also, because all 42 feature maps are
combined, salient objects appearing strongly in only a few maps may
be masked by noise or by less-salient objects present in a larger number
of maps. In the absence of top-down supervision, a map normalization
operator N(.) is therefore proposed, which globally promotes maps in
which a small number of strong peaks of activity (conspicuous locations)
is present, while globally suppressing maps which contain numerous
comparable peak responses.
The following steps are used to compute the normalization N(.):
(1) normalizing the values in the map to a fixed range [0..M], in order
to eliminate modality-dependent amplitude differences;
(2) finding the location of the map's global maximum M and computing
the average m of all its other local maxima; and
(3) globally multiplying the map by (M − m)².

Only local maxima of activity are considered, such that N(.)
compares responses associated with meaningful activation spots in
the map and ignores homogeneous areas. Comparing the maximum
activity in the entire map to the average overall activation measures
how different the most active location is from the average. When this
difference is large, the most active location stands out, and the map is
strongly promoted. When the difference is small, the map contains
nothing unique and is suppressed. The biological motivation behind the
design of N(.) is that it coarsely replicates cortical lateral inhibition
mechanisms, in which neighbouring similar features inhibit each other
via specific, anatomically defined connections [13].
The motivation for the creation of three separate channels I, C
and O, and their individual normalization, is the hypothesis that
similar features compete strongly for saliency, while different
modalities contribute independently to the saliency map. The three maps
are normalized and summed into the final input S to the saliency map:

S = 1/3 (N(I) + N(C) + N(O))

where N(.) is the normalization operator.
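A sketch of the normalization operator and the final combination is given below (MATLAB, assuming mat2gray and imregionalmax from the Image Processing Toolbox; after scaling, the fixed range is [0..1], so M = 1; the conspicuity maps Ibar, Cbar and Obar are assumed to have been resized to a common scale before the call):

function out = normalizeMap(map)
% N(.): promote maps with one strong peak, suppress maps with many
% comparable peaks (steps 1-3 above, with M fixed to 1 by the scaling).
map = mat2gray(map);                 % step 1: scale to the range [0..1]
M   = max(map(:));                   % step 2: global maximum...
lm  = imregionalmax(map);            % ...and the local maxima
lm(map == M) = 0;                    % exclude the global maximum itself
if any(lm(:))
    m = mean(map(lm));               % average of the other local maxima
else
    m = 0;                           % a single peak gets maximal boost
end
out = map * (M - m)^2;               % step 3: global promotion
end

The final input to the saliency map is then a single line:

S = (normalizeMap(Ibar) + normalizeMap(Cbar) + normalizeMap(Obar)) / 3;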
Three images were taken by the camera, and saliency detection was
performed on them using the above method to find the most attractive
portion of each image. In this method, the first step is to extract the
three different features, namely intensity, color and orientation; the
Haar transform is then performed on the three images to remove noise.
After that, for each of these features a Gaussian scale pyramid is
computed to obtain scale-invariant features using the receptive fields.
To obtain a real-time saliency detection system, the most
computationally expensive parts are replaced by the calculation of the
center-surround difference. Normalization is then performed, all the
features are combined using linear combinations, and a saliency map is
formed for each of the three images (different chairs and tables). This
information is then stored in a database. The obstacle image is then
compared with the images in memory: if a match is found, the system
reports that the object or obstacle was found; if no match is found, it
returns no match. By using this proposed method, visually impaired
people can recognize and track the objects in their surveillance area.
4.1.5 Platform Used
MATLAB R2010a is used for implementing the image processing
algorithm on the input image.
4.1.5.1 About MATLAB
MATLAB is a high-performance language for technical
computing. It integrates computation, visualization and
programming in an easy-to-use environment where problems and
solutions are expressed in familiar mathematical notation.
MATLAB is an interactive system whose basic
data element is a matrix. This allows formulating solutions to many
technical computing problems, especially those involving matrix
representations, in a fraction of the time it would take to write a program
in a scalar non-interactive language such as C.
The name MATLAB stands for Matrix Laboratory.
MATLAB was written originally to provide easy access to matrix
and linear algebra software that previously required writing
FORTRAN programs to use. Today MATLAB incorporates state of
the art numerical computation software that is highly optimized for
modern processors and memory architectures.
MATLAB is the computational tool of choice for research,
development and analysis. MATLAB is complemented by a family of
application-solutions called toolboxes. The Image Processing Toolbox is
a collection of MATLAB functions that extend the capability of
MATLAB environment for the solution of digital image processing
problems. Other toolboxes that are sometimes used to complement
the Image Processing Toolbox are the Signal Processing, Neural
Networks, Fuzzy Logic, and Wavelet Toolboxes.
The power that MATLAB brings to digital image
processing is an extensive set of functions for processing
multidimensional arrays of which images are a special case.
The MATLAB Desktop is the main working environment. It
is a set of graphics tools for tasks such as running MATLAB
commands, viewing output, editing and managing files and variables
and viewing session histories.
4.2 Comparison of Saliency Maps
The most important task in an object recognition system is to
differentiate between objects. The test image is compared against
multiple images of objects so that the accuracy of the algorithm can be
calculated. The training images of the different objects are saved to a
folder named train database, and the test image is then captured. The
threshold on the coefficient of determination is set to 0.75; the
comparison with the highest value above this threshold is taken as the
recognized image.
Figure 4.3 Algorithm of Training and Testing Images Comparison
Coefficient of Determination
The coefficient of determination R² is used in the context of
statistical models whose main purpose is the prediction of future
outcomes on the basis of other related information. It is the proportion
of variability in a data set that is accounted for by the statistical
model. The coefficient of determination R² (or sometimes r²) is another
measure of how well the least squares equation

ŷ = b0 + b1·x

performs as a predictor of y.
There are several different definitions of R² which are only
sometimes equivalent. One class of such cases includes that of linear
regression. In this case, if an intercept is included then R² is simply
the square of the sample correlation coefficient between the outcomes
and their predicted values or, in the case of simple linear regression,
between the outcomes and the values of the single regressor being used
for prediction.
In such cases, the coefficient of determination ranges from 0
to 1. Important cases where the computational definition of R² can yield
negative values, depending on the definition used, arise where the
predictions being compared to the corresponding outcomes have not been
derived from a model-fitting procedure using those data, and where
linear regression is conducted without including an intercept.
Additionally, negative values of R² may occur when fitting non-linear
trends to data [2]. In these instances, the mean of the data provides a
fit to the data that is superior to that of the trend under this
goodness-of-fit analysis.
A data set has values yi, each of which has an associated
modelled value fi (also sometimes referred to as ŷi). Here, the
values yi are called the observed values and the modelled values fi are
sometimes called the predicted values.
The variability of the data set is measured through different sums of
squares:

the total sum of squares (proportional to the sample variance):
SStot = Σi (yi − ȳ)²

the regression sum of squares, also called the explained sum of squares:
SSreg = Σi (fi − ȳ)²

the sum of squares of residuals, also called the residual sum of squares:
SSres = Σi (yi − fi)²

In the above, ȳ = (1/n) Σi yi is the mean of the observed data, where n
is the number of observations.
The notations SSR and SSE should be avoided, since in some texts
their meaning is reversed to residual sum of squares and explained
sum of squares, respectively.
The most general definition of the coefficient of determination is

R² = 1 − SSres / SStot
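A direct MATLAB sketch of this definition, with assumed observed and modelled values:

% Coefficient of determination R^2 = 1 - SSres/SStot (sketch).
y = [2.1 3.9 6.2 7.8 10.1];          % assumed observed values
f = [2.0 4.0 6.0 8.0 10.0];          % assumed modelled values

ybar  = mean(y);
SStot = sum((y - ybar).^2);          % total sum of squares
SSres = sum((y - f).^2);             % residual sum of squares
R2    = 1 - SSres / SStot            % close to 1 for a good fit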
Hypothesis Test:
The null and alternative hypotheses are

Ho: ρ = 0 (no actual correlation; the null hypothesis)
Ha: ρ ≠ 0 (there is some correlation; the alternative hypothesis)

Using the coefficient of determination algorithm, the comparison is made
between the training and testing images. The most important task in an
object recognition system is to differentiate between two objects. The
test image is compared against different objects so that the accuracy of
the algorithm can be calculated. The training images of the different
objects are saved to the folder train database, and the test images are
stored in a different folder named test database. By applying the
coefficient of determination algorithm, the comparison is made between
the different objects stored in the two databases, as sketched below.
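The following MATLAB sketch illustrates this database comparison (the folder and file names follow the text but are assumptions about the actual layout; the saliency maps are assumed to be stored as grayscale images, R² is computed here as the squared sample correlation coefficient as in the simple linear regression case above, and 0.75 is the threshold quoted in Section 4.2):

% Compare a test saliency map against every training map using R^2
% and report a match when the best score reaches the 0.75 threshold.
files = dir(fullfile('train database', '*.jpg'));
test  = im2double(imread(fullfile('test database', 'test1.jpg')));
best  = -Inf;  bestName = '';
for i = 1:numel(files)
    train = im2double(imread(fullfile('train database', files(i).name)));
    train = imresize(train, size(test));   % common size for comparison
    r  = corrcoef(test(:), train(:));
    R2 = r(1,2)^2;                         % squared correlation
    if R2 > best
        best = R2;  bestName = files(i).name;
    end
end
if best >= 0.75
    fprintf('Match found: %s (R^2 = %.2f)\n', bestName, best);
else
    fprintf('No match (best R^2 = %.2f)\n', best);
end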
Figure 4.4 Train Image Folder for Different Train Image
The testing images are saved to a folder named Test Database, which
consists of different images of chairs, tables and furniture. The images
are stored after saliency detection has been performed. The comparison
between the two folders is then made using the coefficient of
determination algorithm.
Figure 4.5 Test Image Folder for different test image
4.3 Graphical User Interface Design
A graphical user interface (GUI) can be described as a
graphical display that contains devices, or components, that enable a
user to perform interactive tasks without creating a script or typing
commands at the command line. These components can be push
buttons, menus, toggle buttons, toolbars, checkboxes, radio buttons,
sliders, etc.
Data can also be displayed in graphical form as plots
or groups. The user need not know the details of the task. A
simple GUI supported by MATLAB, with its rich set of tools, is shown in
Figure 4.6.
Figure 4.6 GUI Supported by MATLAB
Creating a GUI using MATLAB's Graphical User Interface
Development Environment (GUIDE) is divided into two relatively
manageable and independent tasks, viz:
1) GUI component layout
2) GUI programming
In GUI component layout, GUIDE enables the user to lay out the GUI as required. It involves clicking and dragging the components from the component palette to the layout area. These components can be aligned, resized and given a tab order using tools accessible from the Layout Editor. Saving this GUI layout generates a MATLAB M-file which helps to control how the GUI works. This and subsequent activities constitute the GUI programming task.
The generated M-file provides code to initialize the GUI when it is launched and contains a framework for the GUI callbacks: the routines that execute in response to user-generated events such as a mouse click. Adding code to the callback functions using the M-file editor enables the GUI to perform its intended operations.
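As a hedged illustration, a GUIDE-generated callback skeleton might be filled in as follows; the tag names browse, axes1 and edit1 are hypothetical, since GUIDE derives the actual names from the layout.

% Callback skeleton generated by GUIDE and filled in by the programmer.
function browse_Callback(hObject, eventdata, handles)
% Executes when the user clicks the BROWSE pushbutton.
[file, folder] = uigetfile('*.jpg', 'Select an input image');
if isequal(file, 0)
    return;                             % user cancelled the dialog
end
img = imread(fullfile(folder, file));
axes(handles.axes1);                    % make the display axes current
imshow(img);
set(handles.edit1, 'String', file);     % show the file name in the edit box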
A graphical user interface provides the user with a familiar environment in which to work. This environment contains push buttons, toggle buttons, lists, menus, text boxes and so forth, all of which are already familiar to the user, so that he or she can concentrate on using the application rather than on the mechanics of operating it.
However, GUIs are harder for the programmer because a GUI-based program must be prepared for mouse clicks (or possibly keyboard input) for any GUI element at any time. Such inputs are known as events, and a program that responds to events is said to be event driven.
Three principal elements are required to create a MATLAB Graphical User Interface:
1. Components: Each item on a MATLAB GUI (pushbuttons, labels,
edit boxes, etc.) is a graphical component. The types of components
include graphical controls (pushbuttons, edit boxes, lists, sliders, etc.),
static elements (frames and text strings), menus, and axes. Graphical
controls and static elements are created by the function uicontrol, and
menus are created by the functions uimenu and uicontextmenu. Axes,
which are used to display graphical data, are created by the function axes.
2. Figures: The components of a GUI must be arranged within a
figure, which is a window on the computer screen. In the past, figures
have been created automatically whenever we have plotted data.
However, empty figures can be created with the function figure and can
be used to hold any combination of components.
3. Callbacks: Finally, there must be some way to perform an action when a user clicks the mouse on a button or types information on the keyboard. A mouse click or key press is an event, and the MATLAB program must respond to each event if the program is to perform its function.
For example, if a user clicks on a button, that event must cause the MATLAB code that implements the function of the button to be executed. The code executed in response to an event is known as a callback. There must be a callback to implement the function of each graphical component on the GUI.
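A minimal sketch tying the three elements together, built programmatically rather than with GUIDE; the window name, button label and message are illustrative.

% A figure (element 2) holding one component (element 1) whose
% callback (element 3) runs when the button is clicked.
fig = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(fig, 'Style', 'pushbutton', ...
          'String', 'Press me', ...
          'Position', [20 20 100 30], ...
          'Callback', @(src, event) disp('Button pressed'));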
Creating and Displaying a Graphical User Interface
MATLAB GUIs are created using a tool called guide, the GUI Development Environment. This tool allows a programmer to lay out the GUI, selecting and aligning the GUI components to be placed in it. Once the components are in place, the programmer can edit their properties: name, color, size, font, text to display and so forth. When guide saves the GUI, it creates a working program including skeleton functions that the programmer can modify to implement the behavior of the GUI. When guide is executed, it opens the Layout Editor. The large white area with grid lines is the layout area, where a programmer can lay out the GUI.
The Layout Editor window has a palette of GUI components along the left side of the layout area. A user can create any number of GUI components by first clicking on the desired component and then dragging its outline in the layout area. The top of the window has a toolbar with a series of useful tools that allow the user to distribute and align GUI components, modify the properties of GUI components, add menus to GUIs, and so on. The components used and their functions are:
Pushbuttons: A pushbutton is a component that a user can click on to trigger a specific action. The pushbutton generates a callback when the user clicks the mouse on it. A pushbutton is created by creating a uicontrol whose style property is 'pushbutton'. A pushbutton may be added to a GUI by using the pushbutton tool in the Layout Editor.
Figure 4.7 Layout of a simple GUI with a Pushbutton
Edit Boxes: An edit box is a graphical object that allows a user to enter a text string. The edit box generates a callback when the user presses the Enter key after typing a string into the box. An edit box is created by creating a uicontrol whose style property is 'edit'. An edit box may be added to a GUI by using the edit box tool in the Layout Editor.
Figure 4.8 Layout of a simple GUI with an Edit box
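A minimal sketch of an edit box created programmatically; the position and message are illustrative.

% An edit box whose callback reads the string typed by the user.
uicontrol('Style', 'edit', ...
          'Position', [20 60 160 25], ...
          'Callback', @(src, event) disp(['You typed: ' get(src, 'String')]));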
The GUI designed is shown in Figure 4.9. The GUI includes the BROWSE pushbutton for loading the input images stored in the database. Clicking the SALIENCY pushbutton runs the
saliency process and displays both the original and the saliency images. An image of the different chairs, tables and furniture is uploaded as the input and the SALIENCY pushbutton is pressed, after which the processed image is shown in the display area. Clicking the BROWSE pushbutton runs the process of selecting the saliency image stored in the testing database and displays the image name in the display area.
Figure 4.9 GUI Design
The pushbutton "COMPARISON" on click runs the process of
comparing the Training and Testing Database and displays the result of
training and testing images and then the bounding box is drawn in the
training database images . The testing image is compared and the bounding
box is drawn in that testing image which is stored in the training images.
The pushbutton "REFRESH" on click runs the process of refreshing the
display button.
CHAPTER 5
RESULTS
The proposed methodology for detecting objects is implemented in MATLAB and the outcomes are discussed below.
INTENSITY
An intensity image is a data matrix, I, whose values represent
intensities within some range. An intensity image is represented as a
single matrix, with each element of the matrix corresponding to one
image pixel. The matrix can be of class double, uint8, or uint16.
Figure 5.1 Intensity (left: original image; right: intensity image)
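As a hedged example, an intensity image of class double can be obtained from an RGB photograph as follows; the file name is illustrative.

% Convert an RGB image to an intensity image with values in [0, 1].
rgb = imread('chair.jpg');        % illustrative file name
I = rgb2gray(im2double(rgb));     % one matrix element per pixel, class double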
COLOR INTENSITY:
The values in a binary, intensity or RGB image can be of different data types. The data type of the image values determines which values correspond to black and white, as well as to the absence or saturation of color. The following figures show the color intensity for the different items of furniture, tables and chairs.
Figure 5.2 Color Intensity (left: original image; right: color intensity)
ORIENTATION:
Orientation is the process of rotating the images through different angles, such as 35, 90 and 125 degrees. The following figures show the 35-degree orientation for different chairs, tables and furniture.
Figure 5.3 35-degree Orientation (left: original image; right: rotated image)
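A minimal sketch of such a rotation; the input image and the interpolation and cropping options are illustrative choices.

% Rotate the intensity image by 35 degrees, cropping to the original size.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
rot35 = imrotate(I, 35, 'bilinear', 'crop');
imshow(rot35);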
The following figures show the 125-degree orientation for different chairs, tables and furniture.
Figure 5.4 125-degree Orientation (left: original image; right: rotated image)
The following figures show the 90-degree orientation for different chairs, tables and furniture.
Figure 5.5 90-degree Orientation (left: original image; right: rotated image)
IMAGE PYRAMIDS:
An image pyramid is used to represent an image at more than one resolution. The following figure shows the image pyramid at four levels for different chairs, tables and furniture.
Figure 5.6 Image Pyramid
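A hedged sketch of a four-level pyramid using the Image Processing Toolbox function impyramid; the input image is illustrative.

% Build a four-level Gaussian image pyramid by repeated reduction.
I  = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
p1 = I;                          % level 1: full resolution
p2 = impyramid(p1, 'reduce');    % level 2: half resolution
p3 = impyramid(p2, 'reduce');    % level 3: quarter resolution
p4 = impyramid(p3, 'reduce');    % level 4: one-eighth resolution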
HAAR VERTICAL TRANSFORM:
The Haar transform is a certain sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. The following figure shows the Haar vertical transform for different chairs, tables and furniture.
Figure 5.7 Haar Vertical Transform
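A hedged sketch of a single-level 2-D Haar decomposition using the Wavelet Toolbox; the vertical detail coefficients cV correspond to the vertical transform shown above, and the input image is illustrative.

% Single-level 2-D Haar wavelet decomposition of an intensity image.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
[cA, cH, cV, cD] = dwt2(I, 'haar');             % cV holds the vertical details
imshow(mat2gray(cV));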
HAAR TRANSFORMED IMAGE:
The following figure shows the transformed image for different chairs, tables and furniture.
Figure 5.8 Haar Transformed Image
HAAR TRANSFORM:
The following figure shows the Haar transform for different chairs, tables and furniture at three levels. The transformed image is transformed again at each level, yielding the next stage of the decomposition.
Figure 5.9 Haar Transform
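A hedged sketch of the three-level decomposition, re-transforming the approximation sub-band at each step; the input image is illustrative.

% Three-level Haar decomposition by re-transforming the approximation.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
[cA1, cH1, cV1, cD1] = dwt2(I,   'haar');       % level 1
[cA2, cH2, cV2, cD2] = dwt2(cA1, 'haar');       % level 2
[cA3, cH3, cV3, cD3] = dwt2(cA2, 'haar');       % level 3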
HISTOGRAM:
For each gray level, the number of pixels having that level is counted, and a stick (bar) represents that count; nearby levels can be grouped into a bin and the pixels within the bin counted together. The following figure shows the histograms for a ball, a mouse and a glass. In each histogram, the gray levels are shown on the x-axis and the number of pixels at each gray level on the y-axis.
Figure 5.10 Histogram
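A minimal sketch of the histogram computation; the input image is illustrative.

% Compute and display the 256-bin gray-level histogram.
I = rgb2gray(imread('ball.jpg'));   % illustrative input, class uint8
imhist(I);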
HISTOGRAM EQUALIZATION:
The main objective is that, after the transformation, the histogram becomes approximately flat. The following figure shows the histogram equalization of the ball, mouse and glass images. The histogram obtained after equalization is spread out over the entire scale of gray levels.
Figure 5.11 Histogram Equalization
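A minimal sketch of the equalization step; the input image is illustrative.

% Equalize the histogram and inspect the resulting spread of gray levels.
I = rgb2gray(imread('ball.jpg'));   % illustrative input
J = histeq(I);                      % equalized image
imhist(J);                          % histogram spread over the full scale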
SALIENCY MAP:
A saliency map represents saliency at all locations with a scalar quantity; saliency means the visually attractive portion of an image. In the figure below, the saliency of a particular image is shown for a ball, a mouse and a glass.
Figure 5.12 Saliency Map
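As a heavily simplified, hedged sketch of the center-surround idea behind a saliency map: the full Itti-Koch model combines many such maps across color, intensity and orientation features at several scales, so this single Difference-of-Gaussians response approximates only one channel. The input image and filter sizes are illustrative.

% Center-surround (Difference of Gaussians) response on one channel.
I  = rgb2gray(im2double(imread('ball.jpg')));   % illustrative input
g1 = fspecial('gaussian', [15 15], 2);          % "center" scale
g2 = fspecial('gaussian', [31 31], 6);          % "surround" scale
dog = imfilter(I, g1, 'replicate') - imfilter(I, g2, 'replicate');
sal = mat2gray(abs(dog));                       % normalize to [0, 1]
imshow(sal);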
COMPARISON OUTPUT:
The figure shown below consists of training images and testing images. The different types of chairs, tables and furniture are stored as the training images for comparison purposes, and the testing image is the one to be compared with the training images. The detected result is shown by drawing a bounding box around it, together with the message "This is the matched Object".
Figure 5.13 Comparison of Saliency Maps
GUI DESIGN:
The GUI is used to load the input image and to perform the saliency computation and the comparison for the different chairs, tables and furniture.
Figure 5.14 GUI Design for the Proposed Methodology
The above figure shows the GUI design of the proposed methodology. Clicking BROWSE runs the process of loading the input from the databases, and the edit box displays the name of the image chosen by the user for saliency and comparison; the GUI then looks as shown below. Clicking SALIENCY runs the saliency process and displays the saliency output as shown in Figure 5.12, and clicking the COMPARISON pushbutton shows the comparison output as shown in Figure 5.13.
Figure 5.15 GUI Design for Displaying the input image
Thus, using these outputs, the object can be recognized very accurately, and the correct object among the different objects can be detected easily for the visually impaired, since these objects are already stored in both the training and testing databases. The comparison result shows both the training and testing images, with the correct object indicated by a bounding box: the bounding box is drawn on the training image that matches the given testing image, so that the object is correctly detected.
ERROR RATE FOR THE PROPOSED METHODOLOGY

TOTAL NUMBER OF IMAGES    MATCHED    UNMATCHED
50 images                 50         0

As all 50 images were correctly recognized, the success rate for the proposed methodology is 100%.
CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENTS
The goal of this master's work was to develop algorithms for detecting objects in a real-time environment, so that visually impaired people can recognize the object in front of them and avoid it while moving from one place to another. From the literature survey, a comparative study of the implemented methodologies was made, and the methods giving the better performance were chosen for this work to recognize the object in front of the visually impaired person. The methods were analysed on the basis of accuracy and performance.
6.1 CONCLUSION
The algorithm is implemented on input images captured from a camera and stored in a database. There are 15 images in the training database, and the comparison is made between the testing and training images. Before the images are stored in the databases, saliency maps are computed for the different chairs, tables and furniture. With this project, visually impaired people can detect the obstacles in front of them and get about without anybody's help. Several steps are carried out. First, visual saliency detection is performed, as set out in the aim of the project: linear filtering extracts the color, intensity and orientation features; image pyramids are built by image reduction; the Haar transform is applied; center-surround differences are computed using Difference of Gaussians and Gabor filters; the results are normalized; and the saliency map is produced using
the method of Itti, Koch et al. Each object was processed through these steps, and the comparison was made using image processing techniques in MATLAB. All images are stored in the database. The comparison output displays both the training and testing images in one figure, and the matched-object window shows the correct detection by drawing a bounding box around the training image that matches the testing image. Thus, with this methodology, object recognition becomes easy and accurate for those who cannot identify the objects in front of them, helping them avoid obstacles while moving from one place to another.
6.2 FUTURE ENHANCEMENTS
In future work, audio saliency detection can be added. Just as visual saliency detection finds the attention-grabbing portion of an image, audio saliency detection finds the attractive portion of a sound, which can be used to recognize the sound of an object or of a person standing in front of the user; in this way, sound and speaker recognition can be carried out. For example, if the object is a chair, the device will say that it is a chair, recognizing the object and telling the visually impaired person to avoid it and take another way to the destination. With these enhancements, visually impaired people will need no assistance, whether dependent or independent, for moving from one place to another. The system should also be able to report the location, distance and direction of items in the room, such as equipment, furniture, doors and even other users.
REFERENCES:
[1] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4, 4 (1985), 219-227.
[2] ITTI, L., KOCH, C., AND NIEBUR, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254-1259.
[3] ITTI, L., AND KOCH, C. Computational modeling of visual attention. Nature Reviews Neuroscience 2, 3 (March 2001), 194-203.
[4] FRINTROP, S. VOCUS: A visual attention system for object detection and goal-directed search. Lecture Notes in Artificial Intelligence (LNAI), Vol. 3899 (2006).
[5] FRINTROP, S., AND ROME, E. Simulating visual attention for object recognition. In Proceedings of the Workshop on Early Cognitive Vision (2004), Isle of Skye, Scotland.
[6] FRINTROP, S., NUCHTER, A., SURMANN, H., AND HERTZBERG, J. Saliency-based object recognition in 3D data. Isle of Skye, Scotland.
[7] ITTI, L., AND KOCH, C. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging 10, 1 (January 2001), 161-169.
[8] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4, 4 (1985), 219-227.
[9] VANRULLEN, R. Visual saliency and spike timing in the ventral visual pathway. Journal of Physiology - Paris 97, 2-3 (March-May 2003), 365-377.
[10] GONZALEZ, R.C., AND WOODS, R.E. Digital Image Processing, pp. 525-626. Pearson Prentice Hall, Upper Saddle River, New Jersey, 2008.
[11] European Blind Union (2002). Statistical data on blind and partially sighted people in European countries. http://www.euroblind.org/fichiersGB/STAT.html
[12] DODSON, A.H., MOORE, T., AND MOON, G.V. (1999). A navigation system for the blind pedestrian. In Proceedings of GNSS 99, 3rd European Symposium on Global Navigation Satellite Systems, pp. 513-518, Genoa, Italy, October 1999.
[13] SHOVAL, S., ULRICH, I., AND BORENSTEIN, J. (2000). Computerized obstacle avoidance systems for the blind and visually impaired. Invited chapter in Intelligent Systems and Technologies in Rehabilitation Engineering. Editors: Teodorescu,