Object Recognition in the Surveillance Area of Visually Impaired
CHAPTER 1
INTRODUCTION
Blindness is the condition of lacking visual perception due
to physiological or neurological factors. Various scales have been
developed to describe the extent of vision loss and define blindness.
Total blindness is the complete lack of form and visual light perception
and is clinically recorded as NLP, an abbreviation for no light
perception. The term blindness is also frequently used to describe severe
visual impairment with residual vision. Those described as having only
light perception have no more sight than the ability to tell light from
dark and the general direction of a light source. Visually impaired
people need some assistance in order to move from one place to another in
day-to-day life, either in a dependent manner with the help of others, or
in an independent manner with the help of canes, trained dogs, etc. to
guide them. In both cases, their primary objective is to detect the
obstacle in front of them and avoid it while moving. With the advent of
electronic technologies, self-assistive devices have been made to help
them. Some of the present technologies are as follows.
1.1 LASER CANE
This is an electronic cane that uses invisible laser beams to
detect obstacles, drop-offs, and similar hazards in the surroundings.
Once the cane detects an obstacle or drop-off using the laser beams, it
produces a specific audio signal. The cane has three distinct audio
signals; each indicates a specific distance. The audio signal informs the
user of the distance of the obstacle or the height of the drop-off. This
device can detect objects and hazards up to a distance of 12 feet.
http://en.wikipedia.org/wiki/Visual_perceptionhttp://en.wikipedia.org/wiki/Physiologyhttp://en.wikipedia.org/wiki/Neurologyhttp://en.wikipedia.org/wiki/Vision_losshttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wiktionary.org/wiki/residualhttp://en.wikipedia.org/wiki/Light_sourcehttp://en.wikipedia.org/wiki/Light_sourcehttp://en.wiktionary.org/wiki/residualhttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Visual_impairmenthttp://en.wikipedia.org/wiki/Vision_losshttp://en.wikipedia.org/wiki/Neurologyhttp://en.wikipedia.org/wiki/Physiologyhttp://en.wikipedia.org/wiki/Visual_perception -
7/31/2019 Object Recognition in the Surveillance Area of Visually Impaired
2/65
2
Figure 1.1 BLIND PERSON WITH LASER CANE
A part of the cane's handle also vibrates when there is an object
in front of the user. The laser cane is suitable for persons who are blind
and persons who are deaf-blind. It can be used on its own. However,
mobility experts strongly recommend that blind persons first learn the
use of the long white cane before using the laser cane. The laser cane
emits beams of invisible light which result in sounds or vibrations
when a beam encounters an object, so as to alert the user to an
obstruction ahead. It weighs one pound and is made of aluminum and steel.
1.2 SONIC MOBILITY DEVICE
This is a device that is generally mounted on the user's head. It
uses ultrasonic technology to detect obstacles and other objects that are
located in the user's path. The sonic mobility device uses the eight
tones of the musical scale to indicate the distance of the object. Each
tone signifies a particular distance from the obstruction. The user hears
the tone through the device's earpiece.
Figure 1.2 SONIC MOBILITY DEVICES
-
7/31/2019 Object Recognition in the Surveillance Area of Visually Impaired
3/65
3
1.3 GPS DEVICES FOR THE BLIND
Although mainly used for identifying one's location, GPS
(Global Positioning System) devices also help blind persons to
travel independently. Blind persons can use portable GPS systems
to determine and verify the correct travel route, and they can use these
devices whether they are walking or riding a vehicle. GPS devices
for the blind include screen readers so the user can hear the
information.
Other GPS devices are connected to a Braille display so the
user can read the information in Braille. A blind person should
use a particular mobility device in addition to the GPS system.
Figure 1.3 GPS DEVICES
Braille devices and software help blind people improve
their reading and writing skills. Becoming literate is very important
for such individuals because it allows them to hope for a
productive future and, at the same time, live with confidence. These
innovative Braille devices and software help visually impaired
individuals print and store information quickly, quietly, and reliably.
1.4 ULTRASOUND BASED DETECTION
Here a wearable system for visually impaired users is
implemented which allows them to detect and avoid obstacles. It is
based on ultrasound sensors which can acquire range data from the
objects in the environment by estimating the time of flight of the
ultrasound signal. Using a hemispherical sensor array, obstacles can be
detected and the directions to be avoided determined. However, the
ultrasound sensors are only used to detect whether obstacles are
present in front of the user. Unimpeded directions are determined by
analyzing patterns of the range values from successive frames.
Feedback is presented to users in the form of voice commands and
vibration patterns.
1.4.1 NEW BRAILLE TECHNOLOGY
Using this technology, visually impaired persons can read the
emotions or facial expressions of the person with whom they are
conversing. To make this possible, an ordinary web camera, hardware as
small as a coin, and a tactile display are used. This enables the
visually impaired to directly interpret human emotions.
Visual information is transferred from the camera into
advanced vibrating patterns displayed on the skin. The vibrators are
sequentially activated to provide dynamic information about what kind
of emotion a person is expressing and the intensity of the emotion
itself.
Figure 1.4 BRAILLE DEVICES AND SOFTWARE
The first step for a user is to learn the patterns of different
facial expressions, which can be done by displaying the emotions in
front of a camera that translates them into vibration patterns. In this
learning phase the visually impaired person has a tactile display mounted
on the back of a chair. When interacting with other people, a sling on
the forearm can be used instead.
The main research focus is to characterize different emotions
and to find a way to present them by means of advanced biomedical
engineering and computer vision technologies. This technology can
also be implemented on mobile phones for tactile rendering of live
football games and human emotion information through vibrations,
which is an interesting way of enhancing the experience of mobile
users.
1.5 COMPUTER ASSISTIVE TECHNOLOGY FOR THE BLIND
The most important advancement since blind assistive
technology began to appear in the 1970s is screen reading software,
which simulates the human voice reading the text on a computer screen
or renders hard-copy output into Braille. Screen readers are designed to
pick out things that would catch a sighted person's attention, such as
colors and blinking cursors, and can be modified to select the areas the
user does or does not want read.
Figure 1.5 VISUALLY IMPAIRED ASSISTIVE DEVICES
1.6 CANE WITH SENSOR
The cane is essential for the safe mobility of vision-impaired
people. With this device, they are able to stroll around without
worrying about bumps. Along with innovations in technology, the canes
used by blind people have been improved in terms of safety and
functionality.
Figure 1.6 CANE WITH SENSOR
1.7 BATTERY-OPERATED SPHYGMOMANOMETER
Blind persons can also be subject to hypertension. It is
good to know that, with the availability of a beeping or talking
sphygmomanometer, vision-impaired individuals can now accurately take
or monitor their blood pressure. This type of medical equipment is
battery-operated. The blood pressure and pulse readings are announced in
a clear voice and shown simultaneously on a digital display.
Figure 1.7 BATTERY-OPERATED SPHYGMOMANOMETER
1.8 NAVIGATIONAL AID HEADSET
This device is still a concept. However, if successfully
launched, the aid headset will help a blind person to confidently,
independently and safely walk through city streets. The navigational
aid device comes with a built-in microphone and audio transducer. It
will also incorporate a GPS system, speech recognition, and obstacle
detection technology. Using the microphone, the user will state his
destination; from the audible information, the GPS system will direct
the user to his desired location, and the obstacle detection technology
will help him safely reach the place by informing him of any impediments
he might encounter.
Figure 1.8 NAVIGATIONAL AID HEADSET
It is estimated that 7.4 million people in Europe are visually
impaired [11]. For many, known destinations along familiar routes can
be reached with the aid of white canes or guide dogs. By contrast, for
new or unknown destinations along unfamiliar routes (that may change
dynamically) the limitations of these aids become apparent [12, 13, 14]
(e.g. white canes are ineffective for detecting obstacles beyond 3-6
feet). These mobility aids are only useful for assisting visually
impaired people through the immediate environment (termed
micro-navigation), but do not assist the traveller in more distant
environments (termed macro-navigation).
Figure 1.9 ELECTRONIC TRAVEL AIDS (ETAs)
With the proliferation of context-aware research and
development, Electronic Travel Aids (ETAs) such as obstacle
avoidance systems (e.g. the Laser Cane and ultrasonic obstacle avoiders)
have been developed to assist visually impaired travellers with
micro-navigation, whereas Global Positioning Systems (GPS) and
Geographical Information Systems (GIS) are being developed for
macro-navigation (e.g. the MOBIC Travel Aid and the Personal Guidance
System).
However, despite recent technological advancements, there is
still considerable scope for Human Computer Interaction (HCI)
research. Previous work has predominantly focused on developing
technologies and testing their functionality, as opposed to utilizing
HCI principles (e.g. Task Analysis) to actively assess the impact on the
user. For instance, Dodson et al. [12] simply assume that, since a blind
human is the intended navigator, a speech user interface should be used.
However, despite the contextual complexity of a visually
impaired traveller interacting with various mobility aids (i.e. a
navigational system and a guide dog/white cane), existing research has
failed to fully address the interaction of contextual components and
how usability is influenced. Further, as more contextual sources are
used to identify and discover a user's context, it is becoming
increasingly important that information is managed appropriately and
displayed in a way that is tailored to the visually impaired traveller's
task, situation and environment.
1.9 WHITE CANE
A white cane is used by many people who are blind or visually
impaired, both as a mobility tool and as a courtesy to others. Not all
modern white canes are designed to fulfil the same primary function,
however: there are at least five varieties of this tool, each serving a
slightly different need.
TYPES:
Long cane: This "traditional" white cane, also known as a "Hoover"
cane after Dr. Richard Hoover, is designed primarily as a mobility tool
used to detect objects in the path of a user. Cane length depends upon
the height of the user, and traditionally extends from the floor to the
user's sternum. Some organizations favour the use of much longer canes.
Figure 1.10 LONG WHITE CANE
"Kiddie" cane: This version works in the same way as an
adult's long cane, but is designed for use by children.
Figure 1.11 KIDDIE CANE
Identification cane ("Symbol Cane" in British English):
The ID cane is used primarily to alert others as to the bearer's visual
impairment. It is often lighter and shorter than the long cane, and has
no use as a mobility tool.
Figure 1.12 IDENTIFICATION CANE
Support cane: The white support cane is designed primarily
to offer physical stability to a visually impaired user. By virtue of its
colour, the cane also works as a means of identification. This tool has
very limited potential as a mobility device.
Figure 1.13 SUPPORT CANE
CHAPTER 2
PROBLEM DESCRIPTION
Visually impaired people cannot navigate easily in their
day-to-day life. They need the help of others, a cane, other electronic
mobility devices, or guide dogs to guide them in an appropriate manner.
They therefore need a self-assistive device that guides them and makes
them independent, rather than dependent on others, for navigation. The
first and most significant task is the detection of obstacles in front
of them and their avoidance. In this project we classify objects,
recognize obstacles, identify them and track them through image
processing techniques, and suggest an alternative path.
CHAPTER 3
LITERATURE SURVEY AND RELATED WORKS
3.1 VISUAL ATTENTION
The visual system is not capable of fully processing all of the
visual information that arrives at the eye. To get around this
limitation, a mechanism that selects regions of interest for additional
processing is used. This selection is done bottom-up, using saliency
information, and top-down, using cueing.
The processing of visual information starts at the retina. The
neurons in the retina have a center-surround organization of their
receptive fields. The shapes of these receptive fields are modeled,
among other ways, by the difference of Gaussians (DoG). This function
captures the Mexican-hat shape of the retinal ganglion cell's receptive
field. These cells emphasize boundaries and edges. Further up the visual
processing pathway is the visual cortex area V1. Here there are cells
that are orientation selective; these cells can be modeled by a 2D Gabor
function. Itti and Koch's implementation of Koch and Ullman's
saliency map is one of the best performing biologically plausible
attention models [1][2][3]. Itti et al. [3] implemented bottom-up
saliency detection by modeling specific feature-selective retina cells
and cells further up the visual processing pathway. The retina cells use
a center-surround receptive field, which is modeled in [2] by taking the
DoG. They also model orientation-selective cells using 2D Gabor filters.
For each receptive field there is an inhibitory variant. For example, if
an on-center off-surround receptive field shows excitation on a certain
input, then the same input will cause the opposite off-center
on-surround receptive field to inhibit.
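As an illustration of this center-surround model, the following MATLAB sketch (assuming the Image Processing Toolbox for fspecial and imfilter; the sigma values and the input file name are illustrative assumptions, not parameters taken from [2]) builds a difference-of-Gaussians kernel and applies it together with its inhibitory counterpart:

% Difference-of-Gaussians (DoG) receptive field sketch.
% sigmaC and sigmaS are assumed values, not the parameters of [2].
I = im2double(rgb2gray(imread('scene.jpg')));  % hypothetical input image

sigmaC = 1.0;                    % narrow Gaussian: excitatory center
sigmaS = 3.0;                    % wide Gaussian: inhibitory surround
hsize  = 2*ceil(3*sigmaS) + 1;   % support large enough for the surround

gC  = fspecial('gaussian', hsize, sigmaC);
gS  = fspecial('gaussian', hsize, sigmaS);
dog = gC - gS;                   % Mexican-hat shaped kernel

onCenter  = imfilter(I, dog, 'replicate');   % on-center, off-surround
offCenter = -onCenter;                       % inhibitory counterpart

% Rectify: each receptive field type only signals excitation.
onCenter(onCenter < 0)   = 0;
offCenter(offCenter < 0) = 0;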
The sub-modalities that Itti et al. [3] use for creating a saliency
map are intensity, color and orientation. For each of these
sub-modalities a Gaussian scale pyramid is computed to obtain
scale-invariant features. For each of these image scales, feature maps
are created with a receptive field and its inhibitory counterpart. For
the intensity sub-modality, on-center off-surround and off-center
on-surround feature maps for different scales are computed based on the
pixel intensity. For the color sub-modality, feature maps are computed
with center-surround receptive fields using a color pixel value as
center with its opponent color as surround. The color combinations used
for this are red-green and blue-yellow. The feature maps for the
orientation sub-modality are created using the 2D Gabor filters for the
orientations 0, 45, 90 and 135 degrees.
To obtain a saliency map from all these features, a weighting
process is executed in several stages to obtain the most salient
features. In the first stage feature maps are weighted across the
different receptive fields, in the second stage across the scales, and
in the final stage across the sub-modalities. By combining the feature
maps obtained in the last stage, a saliency map is created.
The visual system has limited capacity and cannot process
everything that falls onto the retina. Instead, the brain relies on
attention to bring salient details into focus and filter out background
clutter. Two recent studies by researchers at the Salk Institute for
Biological Studies, one study employing computational modeling
techniques and the other experimental techniques, have helped to
unravel the mechanisms underlying attention. The strength of visual
input fluctuates over orders of magnitude. The visual system reacts
automatically to these changes by adjusting its sensitivity, becoming
more sensitive in response to faint inputs and reducing sensitivity to
strong inputs. For example, when we walk into a darkened lecture hall
on a sunny day, at first we see little, but over time our visual system
adapts, increasing its sensitivity to match the environment.
Neurons in the visual cortex view the world through their
"receptive fields," the small portion of the visual field individual
neurons actually "see" or respond to. Whenever a stimulus falls within
the receptive field, the cell produces a volley of electrical spikes,
known as "action potentials," that convey information about the
stimulus in the receptive field.
But the strength and fidelity of these signals also depend on
other factors. Scientists generally agree that neurons typically respond
more strongly when attention is directed to the stimulus in their
receptive fields. In addition, the response of individual neurons can be
strongly influenced by what is happening within the immediate
surroundings of the receptive field, a phenomenon known as contextual
modulation.
The visual attention mechanism may have at least the
following basic components:
(1) The selection of a region of interest in the visual field.
(2) The selection of feature dimensions and values of interest.
(3) The control of information flow through the network of
neurons that constitutes the visual system.
(4) The shifting from one selected region to the next in time.
The biologically motivated computational attention
system VOCUS (Visual Object detection with a CompUtational
attention System) detects regions of interest in images. It operates
in two modes: an exploration mode in which no task is provided, and
a search mode with a specified target. In exploration mode, regions
of interest are defined by strong contrasts (e.g. color or intensity
contrasts) and by the uniqueness of a feature. For example, a black
sheep is salient in a flock of white sheep. In search mode, the system
uses previously learned information about a target object to bias the
saliency computations with respect to the target.
In various experiments, it has been shown that the target is on
average found with fewer than three fixations, that usually fewer than
five training images suffice to learn the target information, and that
the system is mostly robust with regard to viewpoint changes and
illumination variations. VOCUS provides a powerful approach to improving
existing vision systems by concentrating computational resources on
regions that are more likely to contain relevant information. The more
the complexity and power of vision systems increase in the future, the
more they will profit from an attentional front-end like VOCUS.
3.2 PSYCHOPHYSICAL MODELS OF ATTENTION
FEATURE INTEGRATION THEORY (FIT):
Both anatomical and physiological evidence support the
hypothesis that the visual system divides input visual information into
distinct subsystems that analyze and code different properties in
various specialized areas. This raises the critical problem of how these
dispersed representations are combined into a unified perception, i.e.,
the binding problem.
FIT consists of a master map which codes locations of feature
discontinuities in luminance, color, depth or motion, and a separate set
of feature maps for processing information about the current spatial
layout of the features. An attention window moves within a location
map, selecting the features attended to and temporarily excluding
others from the feature maps, thus putting the "what" and "where"
pathways together.
There are three spatially selective mechanisms used in FIT to
solve the binding problem: selection by a spatial attention window,
inhibition of location feature maps containing unwanted features, and
top-down activation of the location containing the currently attended
object.
Figure 3.1 FEATURE INTEGRATION THEORY (FIT)
3.3 SALIENCY MAP
SALIENCY:
Something is said to be salient if it stands out; for example,
road signs should have high saliency.
Figure 3.1 SALIENCY MAP
A saliency map is also defined as a topographically arranged
map that represents the visual saliency of a corresponding visual
scene. The saliency map model localizes salient points in the visual
field; saliency is based on (bottom-up) scene-based properties; it
reduces computation by a selection on the basis of pre-attentively
computed simple features; and it addresses some problems with the
integration of different feature dimensions into a space-related map.
Given an image, we assign to each pixel a value of how
informative the pixel is with respect to the human visual system
(HVS). Research in this area is generally divided into two topics:
3.3.1 BOTTOM-UP SALIENCY
Bottom-up saliency is also known as pre-attentive vision. It is
useful for rapid scene understanding and is linked to human survival
mechanisms. Bottom-up saliency methods depend only on the instantaneous
sensory input, without taking into account the internal state of the
organism. A dramatic example of a stimulus that attracts attention using
bottom-up mechanisms is a firecracker going off suddenly. Bottom-up
attention is also easier to understand than attention that is influenced
by internal states.
Possibly the most influential attempt at understanding bottom-
up attention and the underlying neural mechanisms was made by
Christof Koch and Shimon Ullman (Koch and Ullman, 1985). They
proposed that the different visual features that contribute to attentive
selection of a stimulus (color, orientation, movement, etc.) are
combined into one single topographically oriented map, the saliency map,
which integrates the normalized information from the individual feature
maps into one global measure of conspicuity. In analogy to the
center-surround representations of elementary visual features, bottom-up
saliency is thus determined by how different a stimulus is from its
surround, in many sub-modalities and at many scales. To quote from Koch
and Ullman (1985), "Saliency at a given location is determined primarily
by how different this location is from its surround in color,
orientation, motion, depth etc."
Figure 3.2 BOTTOM-UP SALIENCY
3.3.2 TOP-DOWN SALIENCY
Top-down saliency is goal-driven, controlled by higher-order
brain processes for tasks such as object recognition and tracking.
Top-down control, on the other hand, does take into account the
internal state, such as the goals the organism has at the time,
personal history and experiences, etc. An example of top-down attention
is a hungry animal focusing on difficult-to-find food items, ignoring
more "salient" stimuli.
Figure 3.3 TOP-DOWN SALIENCY
EXAMPLE OF A SALIENCY MAP:
The figure shows a complex visual scene and the corresponding
saliency map, as computed from the algorithm in Niebur and Koch
(1996). The scene is static so the motion component of the algorithm
does not yield a contribution. The surf line is well-represented in the
saliency map since it combines input from several feature maps:
intensity, orientation and color all have substantial local contrast at
several spatial scales in this area. The same is the case for the clouds
and the island in the distance.
Figure 3.4 SALIENCY MAP FOR A STOOL: THE ORIGINAL IMAGE (LEFT) AND ITS
SALIENCY MAP (RIGHT)
The original definition of the saliency map by Koch and Ullman
(1985) is in terms of neural processes and transformations, rather than
in terms of cognitive or higher-order constructs. The question of where
the saliency map is located in the brain thus arises quite naturally.
There is no logical necessity that it arises in one particular location,
and it could be understood as a functional map whose components
are distributed over many brain areas. It is also possible that there
is more than one topographically organized saliency map.
However, given that many feature maps of early vision are, in fact,
localized in specific parts of the central nervous system, it has been
proposed that the same might also be the case for the saliency map.
Koch and Ullman (1985) proposed that it may be located in the lateral
geniculate nucleus of the thalamus, an area previously suggested as
playing a major role in attentional control.
Another thalamic nucleus, the pulvinar, is known to be involved in
attention (Robinson and Petersen 1992) and has also been suggested as
a candidate for housing the saliency map. Another possibility is the
superior colliculus, likewise known to be involved in the control of
attention (Kustov and Robinson 1996).
Several neocortical areas have been suggested as well, including V1
(Li 2002), V4 (Mazer and Gallant 2003), and posterior
parietal cortex (Gottlieb 2007).
Thus, there are a number of identified candidates which may
correspond to different flavours of salience, perhaps more bottom-up
driven in some areas and more strongly modulated by behavioural goals
in others.
APPLICATIONS
Visual saliency and the saliency model introduced above
have several applications in computer vision and artificial
intelligence systems, as well as in more recently developed marketing
analyses. The following are only some of the applications described in
the literature:
Machine Vision and Mobile Robots: Visual saliency is used as a
visual landmark for mobile robot applications, to efficiently compute a
robot's localization relative to its environment or to track objects in
the environment (e.g. the VOCUS system: A Visual Attention System for
Object Detection and Goal-Directed Search).
Neuromorphic Vision: Developing high-speed robotics
applications that include visual saliency and can handle as large an
amount of information as the human visual system.
Automatic target detection: For example, finding hidden military
vehicles or traffic signs.
Image and Video Compression: The eye does not sense the image
at a constant resolution but has a fovea at its center, where
photoreceptors are much more closely spaced than in the periphery. This
foveation property of the vertebrate eye has been exploited in image and
video processing algorithms; foveated algorithms combined with visual
saliency have been used for video compression.
Medical Imaging: Tumour detection in mammograms using
topographic maps based on salient regions.
Advertisement Design: The prediction of human fixations based on
visual saliency can be used to improve the design of advertisements or
magazine covers.
CHAPTER 4
PROPOSED METHODOLOGY
Figure 4.1 OVERALL BLOCK DIAGRAM
In this Block Diagram representation of the architecture, modules are
visualized by blocks and information streams by arrows.
The goals of the proposed technology are self-learning,
self-configuration and self-adjustment. The proposed system addresses
the following challenging requirements:
1. The system should be able to report the location, distance and
direction of items in the room such as equipment, furniture, doors and
even other users.
2. It must be a reliable system that minimizes the impact of
installation and maintenance on the building owner.
A great number of benefits are realized from the implementation of
such systems: greater safety, autonomy and self-esteem and,
eventually, a better quality of life.
The visual saliency detection module receives a camera image and
computes a saliency map. The visual location module returns the
location of the most salient object. The feature extraction module
computes image features from the salient image region. The visual
saliency detection architecture that is described in this section is
derived from the work of Itti et al.
4.1 Visual Saliency Detection
This method implements bottom-up saliency detection by
modelling specific feature-selective retina cells and cells further up
the visual processing pathway.
The retina cells use a center-surround receptive field, which is
modelled in [28] by taking the difference of Gaussians (DoG).
Orientation-selective cells are modelled using 2D Gabor filters. The
features used for creating a saliency map are intensity, color and
orientation. For each of these features a Gaussian scale pyramid is
computed to obtain scale-invariant features using receptive fields.
The input image is decomposed through several pre-attentive feature
detection mechanisms (sensitive to color, intensity, etc.), which
operate in parallel over the entire visual scene. The input consists of
static color images (640x480), each represented at several spatial
scales (640x480, 320x240, 160x120, and so on); the different scales are
used for computing centre-surround differences.
Neurons in the feature maps then encode for spatial contrast in
each of those feature channels. In addition, neurons in each feature map
spatially compete for salience, through long-range connections that
extend far beyond the spatial range of the classical receptive field of
each neuron (here shown for one channel; the others are similar).
After competition, the feature maps are combined into a
unique saliency map, which topographically encodes for saliency
irrespective of the feature channel in which stimuli appeared salient.
The saliency map is sequentially scanned by attention through the
interplay between a winner-take-all network (which detects the point of
highest saliency at any given time) and inhibition of return (which
suppresses the last attended location from the saliency map, so that
attention can focus onto the next most salient location). Top-down
attentional bias and training can modulate most stages of this bottom-up
model.
The model is based on four major principles: visual attention acts on
a multi-featured input; the saliency of locations is influenced by the
surrounding context; the saliency of locations is represented on a
scalar map, the saliency map; and winner-take-all and inhibition of
return are suitable mechanisms to allow attention shifts. In the
following, the implementation details of the four main steps of the
model are given.
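The winner-take-all scan with inhibition of return described above can be sketched as follows in MATLAB (the stand-in saliency map, the number of fixations and the suppression radius are illustrative assumptions; the winner-take-all network is reduced here to a global maximum search):

% Winner-take-all scan with inhibition of return (illustrative sketch).
S = rand(120, 160);          % stand-in saliency map for demonstration
numFixations = 5;            % assumed number of attention shifts
r = 15;                      % assumed inhibition-of-return radius (pixels)

[rows, cols] = size(S);
[X, Y] = meshgrid(1:cols, 1:rows);
for k = 1:numFixations
    [~, idx] = max(S(:));                 % winner-take-all: global peak
    [yk, xk] = ind2sub(size(S), idx);
    fprintf('fixation %d at (%d, %d)\n', k, xk, yk);
    % Inhibition of return: suppress a disc around the attended point
    S((X - xk).^2 + (Y - yk).^2 <= r^2) = 0;
end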
4.1.1 FEATURE MAPS
First, a number of features j (j = 1, ..., n) are extracted from the
scene by computing the so-called feature maps Fj. Such a map represents
the image of the scene based on a well-defined feature, which leads to a
multi-featured representation of the scene. In his implementation, Itti
considered seven different features, which are computed from an RGB
color image and which belong to three main cues, namely intensity,
color, and orientation.

Extraction of Early Visual Features

Let r, g, and b be the red, green, and blue channels of the input image.

Intensity image:

I = (r + g + b)/3
The r, g, and b channels are normalized by I in order to decouple hue
from intensity. Since hue variations are not perceivable at very low
luminance, normalization is only applied at locations where

I(i,j) > max(I)/10

Intensity feature:

F1 = I = 0.3·R + 0.59·G + 0.11·B

Two chromatic features are based on the two color opponency filters
R+G− and B+Y−, where the yellow signal is defined as
Y = (r + g)/2 − |r − g|/2. Such chromatic opponency exists in the human
visual cortex:

F2 = R − G
F3 = B − Y
The normalization of the chromatic features by I decouples hue from
intensity.
Four local orientation features F4..F7 are computed according to the
angles {0°, 45°, 90°, 135°}. Gabor filters, which represent a suitable
mathematical model of the receptive field impulse response of
orientation-selective neurons in primary visual cortex, are used to
compute the orientation features. In this implementation of the model,
it is possible to use an arbitrary number of orientations. However, it
has been noticed that using more than four orientations does not improve
the performance of the model drastically.
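A compact MATLAB sketch of this feature extraction step is given below (a sketch, not the original implementation: the input file name, the Gabor envelope and wavelength, and the filter support are assumed values; the 1/10-of-maximum threshold follows the description above):

% Early visual features: intensity F1, chromatic opponency F2/F3, and
% four Gabor orientation maps (assumed filter parameters).
rgb = im2double(imread('scene.jpg'));        % hypothetical input image
r = rgb(:,:,1);  g = rgb(:,:,2);  b = rgb(:,:,3);

I = (r + g + b) / 3;                         % intensity image

% Decouple hue from intensity, only where I > max(I)/10
mask = I > max(I(:)) / 10;
rn = zeros(size(r));  gn = rn;  bn = rn;
rn(mask) = r(mask) ./ I(mask);
gn(mask) = g(mask) ./ I(mask);
bn(mask) = b(mask) ./ I(mask);

Yc = (rn + gn)/2 - abs(rn - gn)/2;           % yellow signal
F2 = rn - gn;                                % red/green opponency
F3 = bn - Yc;                                % blue/yellow opponency

% Orientation features via hand-built 2D Gabor filters (a built-in
% gabor() function is not assumed to be available)
[x, y] = meshgrid(-15:15);
sigma = 4;  lambda = 8;                      % assumed envelope/wavelength
thetas = [0 45 90 135];
Fo = cell(1, numel(thetas));
for k = 1:numel(thetas)
    t  = thetas(k) * pi / 180;
    xt = x*cos(t) + y*sin(t);
    gb = exp(-(x.^2 + y.^2)/(2*sigma^2)) .* cos(2*pi*xt/lambda);
    Fo{k} = imfilter(I, gb, 'replicate');    % orientation map for theta
end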
4.1.2 Center-Surround Difference
In a second step, each feature map is transformed into its
conspicuity map, which highlights the parts of the scene that strongly
differ, according to a specific feature, from their surroundings. In
biologically plausible models, this is usually achieved by using a
center-surround mechanism. Practically, this mechanism can be
implemented with a difference-of-Gaussians filter (DoG), which can be
applied to feature maps to extract local activities for each feature
type. A visual attention task has to detect conspicuous regions
regardless of their sizes; thus, a multiscale conspicuity operator is
required.
Center-surround is then implemented as the difference between fine
(c for center) and coarse (s for surround) scales. Indeed, for a feature
j (j = 1..n), a set of intermediate multiscale conspicuity maps Mj,k
(k = 1..K) is computed according to the following equation, giving rise
to (n x K) maps for the n considered features:

Mj,k = |Pj(ck) ⊖ Pj(sk)|
where ⊖ is a cross-scale difference operator that first interpolates the
coarser scale to the finer one and then carries out a point-by-point
subtraction. Taking the absolute value of the difference between the
center and the surround allows the simultaneous computation of both
sensitivities: dark center on bright surround and bright center on dark
surround (red/green and green/red, or blue/yellow and yellow/blue, for
color).
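In code, the cross-scale operator ⊖ amounts to upsampling the coarse surround level to the size of the fine center level and subtracting point by point. A minimal MATLAB sketch (assuming pyr is a cell array of pyramid levels, with pyr{1} the full-resolution scale, as built in the next subsection; this would be saved as centerSurround.m):

function M = centerSurround(pyr, c, s)
% Across-scale difference M = |P(c) (-) P(s)| between a fine center
% level c and a coarse surround level s of a Gaussian pyramid.
center   = pyr{c};
surround = imresize(pyr{s}, size(center));   % interpolate to fine scale
M = abs(center - surround);                  % both contrast polarities
end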
Creating the Gaussian pyramid
In this step the original input image I is convolved with a
linearly separable 5x5 Gaussian kernel G and subsampled into nine
(s = 0..8) different spatial scales. The subsampling is obtained by
smoothing the previous scale and halving the resolution in each
direction:

I(s) = [G * I(s−1)] ↓2,  s = 1..8
Gaussian Scale Pyramids:
Gaussian scale pyramids are used for scale-invariant receptive
field feature extraction. This is a commonly used method in image
processing, but it is computationally rather expensive. Gaussian
pyramids are used to compute scale-invariant features. Different image
scales are normally used so that the filter mask with which an image is
convolved does not have to change. The convolution of an image with a
large mask is rather time consuming: O(nm), where n is the number of
pixels in the image and m is the number of entries in the filter mask.
Figure 4.2 GAUSSIAN SCALE PYRAMID
When a Gaussian pyramid is used, several processing steps have
to be taken. First the input image needs to be scaled down, which can
be done by sub-sampling. Sub-sampling can lead to aliasing, and to
overcome this problem the spatial frequencies of the image which are
above the sampling frequency must be removed.
This can be done by smoothing the image with a Gaussian filter
before sub-sampling it. When the receptive field filter is applied, the
filtered image needs to be scaled back up. In the original
implementation, 9 spatial scales were used and all filtered maps were
resized to scale 4; with 4 scales and 2 receptive field sizes, all maps
would be resized to scale 2. When scaling up, some form of interpolation
needs to be used for anti-aliasing.
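A minimal pyramid construction consistent with this description (5x5 separable Gaussian smoothing followed by factor-2 subsampling, nine levels) might look as follows; the 5-tap binomial kernel is a standard approximation assumed here, not necessarily the exact kernel of the original implementation:

% Dyadic Gaussian pyramid: smooth with a separable 5-tap kernel,
% then subsample by 2, for scales s = 0..8 (pyr{1} holds scale 0).
I = im2double(rgb2gray(imread('scene.jpg')));  % hypothetical input
k = [1 4 6 4 1] / 16;                          % 5-tap binomial kernel

pyr = cell(1, 9);
pyr{1} = I;
for s = 2:9
    blurred = conv2(k, k, pyr{s-1}, 'same');   % separable 5x5 smoothing
    pyr{s}  = blurred(1:2:end, 1:2:end);       % anti-aliased subsampling
end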
4.1.3 NORMALIZATION STRATEGIES
The saliency-based model of visual attention performs two
kinds of map combination. The first is the cross-scale combination of
the multiscale conspicuity maps Mj,k in order to compute a unique
conspicuity map Cj for each scene feature; the second, described in the
next subsection, combines the conspicuity maps into the saliency map.
4.1.4 SALIENCY MAP
The purpose of the saliency map is to represent saliency at all
locations with a scalar quantity. The feature maps are combined into
three conspicuity maps: Intensity (I), Color (C) and Orientation (O).
Before they are combined, they need to be normalized.
Creating the saliency map: the combination of multiple maps is
obtained by the linear combination of the conspicuity maps.
The overall computation goal is to have a single map, in which
the most salient object of an image stands out more than others and to
have a mechanism that models the shift to the next most salient object.
The input image is decomposed through several pre-attentive
feature detection mechanisms (sensitive to color, intensity and
orientation), which operate in parallel over the entire visual scene.
The model's saliency map is endowed with internal dynamics which
generate attentional shifts. This model consequently represents a
complete account of bottom-up saliency and does not require any
top-down guidance to shift attention.
This framework provides a massively parallel method for the
fast selection of a small number of interesting image locations to be
analyzed by more complex and time-consuming object-recognition
processes. Extending this approach to guided search, feedback from
higher cortical areas (e.g. knowledge about targets to be found) was
used to weight the importance of different features. Input is provided
in the form of static color images, usually digitized at 640x480
resolution.
Nine spatial scales are created using dyadic Gaussian pyramids [10],
which progressively low-pass filter and subsample the input image,
yielding horizontal and vertical image-reduction factors ranging from
1:1 (scale zero) to 1:256 (scale eight) in eight octaves.
Each feature is computed by a set of linear center-surround
operations akin to visual receptive fields: typical visual
neurons are most sensitive in a small region of the visual space (the
center), while stimuli presented in a broader, weaker antagonistic
region concentric with the center (the surround) inhibit the neuronal
response. Such an architecture, sensitive to local spatial discontinuities,
is particularly well-suited to detecting locations which stand out from
their surround and is a general computational principle in the retina,
lateral geniculate nucleus, and primary visual cortex [11].
Center-surround is implemented in the model as the difference
between fine and coarse scales: the center is a pixel at scale
c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale
s = c + δ, with δ ∈ {3, 4}. The across-scale difference between two
maps, denoted ⊖ below, is obtained by interpolation to the finer scale
and point-by-point subtraction. Using several scales not only for c but
also for δ = s − c yields truly multiscale feature extraction, by
including different size ratios between the center and surround
regions.
Extraction of early visual features
With r, g, and b being the red, green, and blue channels of the
input image, an intensity image I is obtained as I = (r + g + b)/3. I is
used to create a Gaussian pyramid I(σ), where σ ∈ [0..8] is the scale.
The r, g, and b channels are normalized by I in order to decouple hue
from intensity. However, because hue variations are not perceivable at
very low luminance (and hence are not salient), normalization is only
applied at the locations where I is larger than 1/10 of its maximum over
the entire image (other locations yield zero r, g, and b). Four
broadly-tuned color channels are created: R = r − (g + b)/2 for red,
G = g − (r + b)/2 for green, B = b − (r + g)/2 for blue, and
Y = (r + g)/2 − |r − g|/2 − b for yellow (negative values are set to
zero). Four Gaussian pyramids R(σ), G(σ), B(σ), and Y(σ) are created
from these color channels. The first set of feature maps encodes
intensity contrast:

I(c,s) = |I(c) ⊖ I(s)|
A second set of maps is similarly constructed for the color
channels which, in cortex, are represented using a so-called color
double-opponent system: in the center of their receptive fields,
neurons are excited by one color (e.g., red) and inhibited by another
(e.g., green), while the converse is true in the surround. Such spatial
and chromatic opponency exists for the red/green, green/red,
blue/yellow, and yellow/blue color pairs in human primary visual
cortex [12]. Accordingly, maps RG(c,s) are created in the model to
simultaneously account for red/green and green/red double opponency,
and BY(c,s) for blue/yellow and yellow/blue double opponency:

RG(c,s) = |(R(c) − G(c)) ⊖ (G(s) − R(s))|
BY(c,s) = |(B(c) − Y(c)) ⊖ (Y(s) − B(s))|

Local orientation information is obtained from I using oriented
Gabor pyramids O(σ,θ), where σ ∈ [0..8] represents the scale and
θ ∈ {0°, 45°, 90°, 135°} is the preferred orientation [11]. (Gabor
filters, which are the product of a cosine grating and a 2D Gaussian
envelope, approximate the receptive field sensitivity profile (impulse
response) of orientation-selective neurons in primary visual cortex
[12].) Orientation feature maps O(c,s,θ) encode, as a group, local
orientation contrast between the center and surround scales:

O(c,s,θ) = |O(c,θ) ⊖ O(s,θ)|

In total, 42 feature maps are computed: six for intensity, 12 for
color, and 24 for orientation. The predictions of saliency models, that
is, which locations are most likely to be attended to, have been
compared at the quantitative level against the scan paths generated by
human observers.
Saliency Map
The purpose of the saliency map is to represent the saliency at
every location in the visual field by a scalar quantity and to guide the
selection of attended locations, based on the spatial distribution of
saliency. A combination of the feature maps provides bottom-up input
to the saliency map, modelled as a dynamical neural network. One
difficulty in combining different feature maps is that they represent
a priori non-comparable modalities, with different dynamic ranges and
extraction mechanisms. Also, because all 42 feature maps are
combined, salient objects appearing strongly in only a few maps may
be masked by noise or by less-salient objects present in a larger number
of maps. In the absence of top-down supervision, a map normalization
operator N(.) is therefore proposed, which globally promotes maps in
which a small number of strong peaks of activity (conspicuous locations)
is present, while globally suppressing maps which contain numerous
comparable peak responses.
The following steps are used to compute the normalization N(.):
(1) normalizing the values in the map to a fixed range [0..M], in order
to eliminate modality-dependent amplitude differences;
(2) finding the location of the map's global maximum M and computing
the average m of all its other local maxima; and
(3) globally multiplying the map by (M − m)².

Only local maxima of activity are considered, such that N(.)
compares responses associated with meaningful activation spots in
the map and ignores homogeneous areas. Comparing the maximum
activity in the entire map to the average overall activation measures
how different the most active location is from the average. When this
difference is large, the most active location stands out, and the map is
strongly promoted. When the difference is small, the map contains
nothing unique and is suppressed. The biological motivation behind the
design of N(.) is that it coarsely replicates cortical lateral inhibition
mechanisms, in which neighbouring similar features inhibit each other
via specific, anatomically defined connections [13].
The motivation for the creation of three separate channels I, C
and O, and their individual normalization, is the hypothesis that
similar features compete strongly for saliency, while different
modalities contribute independently to the saliency map. The three maps
are normalized and summed into the final input S to the saliency map:

S = 1/3 (N(I) + N(C) + N(O))

where N(.) is the normalization operator.
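A sketch of the normalization operator and the final combination is given below (MATLAB, assuming mat2gray and imregionalmax from the Image Processing Toolbox; after scaling, the fixed range is [0..1], so M = 1; the conspicuity maps Ibar, Cbar and Obar are assumed to have been resized to a common scale before the call):

function out = normalizeMap(map)
% N(.): promote maps with one strong peak, suppress maps with many
% comparable peaks (steps 1-3 above, with M fixed to 1 by the scaling).
map = mat2gray(map);                 % step 1: scale to the range [0..1]
M   = max(map(:));                   % step 2: global maximum...
lm  = imregionalmax(map);            % ...and the local maxima
lm(map == M) = 0;                    % exclude the global maximum itself
if any(lm(:))
    m = mean(map(lm));               % average of the other local maxima
else
    m = 0;                           % a single peak gets maximal boost
end
out = map * (M - m)^2;               % step 3: global promotion
end

The final input to the saliency map is then a single line:

S = (normalizeMap(Ibar) + normalizeMap(Cbar) + normalizeMap(Obar)) / 3;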
Three images were taken by the camera, and saliency detection was
performed on them using the above method to find the most attractive
portion of each image. In this method, the first step is to extract the
three different features, namely intensity, color and orientation; the
Haar transform is then performed on the three images to remove noise.
After that, for each of these features a Gaussian scale pyramid is
computed to obtain scale-invariant features using the receptive fields.
To obtain a real-time saliency detection system, the most
computationally expensive parts are replaced by the calculation of the
center-surround difference. Normalization is then performed, all the
features are combined using linear combinations, and a saliency map is
formed for each of the three images (different chairs and tables). This
information is then stored in a database. The obstacle image is then
compared with the images in memory: if a match is found, the system
reports that the object or obstacle was found; if no match is found, it
returns no match. By using this proposed method, visually impaired
people can recognize and track the objects in their surveillance area.
4.1.5 Platform Used
MATLAB R2010a is used for implementing the image processing
algorithm on the input image.
4.1.5.1 About MATLAB
MATLAB is a high-performance language for technical
computing. It integrates computation, visualization and
programming in an easy-to-use environment where problems and
solutions are expressed in familiar mathematical notation.
MATLAB is an interactive system whose basic
data element is a matrix. This allows formulating solutions to many
technical computing problems, especially those involving matrix
representations, in a fraction of the time it would take to write a program
in a scalar non-interactive language such as C.
The name MATLAB stands for Matrix Laboratory.
MATLAB was written originally to provide easy access to matrix
and linear algebra software that previously required writing
FORTRAN programs to use. Today MATLAB incorporates state of
the art numerical computation software that is highly optimized for
modern processors and memory architectures.
MATLAB is the computational tool of choice for research,
development and analysis. MATLAB is complemented by a family of
application-solutions called toolboxes. The Image Processing Toolbox is
a collection of MATLAB functions that extend the capability of
MATLAB environment for the solution of digital image processing
problems. Other toolboxes that are sometimes used to complement
the Image Processing Toolbox are the Signal Processing, Neural
Networks, Fuzzy Logic, and Wavelet Toolboxes.
The power that MATLAB brings to digital image
processing is an extensive set of functions for processing
multidimensional arrays of which images are a special case.
The MATLAB Desktop is the main working environment. It
is a set of graphics tools for tasks such as running MATLAB
commands, viewing output, editing and managing files and variables
and viewing session histories.
4.2 Comparison of Saliency Maps
The most important task in an object recognition system is to
differentiate between objects. The test image is compared against
multiple images of objects so that the accuracy of the algorithm can be
calculated. The training images of the different objects are saved to a
folder named train database, and the test image is then captured. The
threshold on the coefficient of determination is set to 0.75; the
comparison with the highest value above this threshold is taken as the
recognized image.
Figure 4.3 Algorithm of Training and Testing Images Comparison
Coefficient of Determination
The coefficient of determination R² is used in the context of
statistical models whose main purpose is the prediction of future
outcomes on the basis of other related information. It is the proportion
of variability in a data set that is accounted for by the statistical
model. The coefficient of determination R² (or sometimes r²) is another
measure of how well the least squares equation

ŷ = b0 + b1·x

performs as a predictor of y.
There are several different definitions of R² which are only
sometimes equivalent. One class of such cases includes that of linear
regression. In this case, if an intercept is included then R² is simply
the square of the sample correlation coefficient between the outcomes
and their predicted values or, in the case of simple linear regression,
between the outcomes and the values of the single regressor being used
for prediction.
In such cases, the coefficient of determination ranges from 0
to 1. Important cases where the computational definition of R² can yield
negative values, depending on the definition used, arise where the
predictions being compared to the corresponding outcomes have not been
derived from a model-fitting procedure using those data, and where
linear regression is conducted without including an intercept.
Additionally, negative values of R² may occur when fitting non-linear
trends to data [2]. In these instances, the mean of the data provides a
fit to the data that is superior to that of the trend under this
goodness-of-fit analysis.
A data set has values yi, each of which has an associated
modelled value fi (also sometimes referred to as ŷi). Here, the
values yi are called the observed values and the modelled values fi are
sometimes called the predicted values.
The variability of the data set is measured through different sums of
squares:

the total sum of squares (proportional to the sample variance):
SStot = Σi (yi − ȳ)²

the regression sum of squares, also called the explained sum of squares:
SSreg = Σi (fi − ȳ)²

the sum of squares of residuals, also called the residual sum of squares:
SSres = Σi (yi − fi)²

In the above, ȳ = (1/n) Σi yi is the mean of the observed data, where n
is the number of observations.
The notations SSR and SSE should be avoided, since in some texts
their meaning is reversed to residual sum of squares and explained
sum of squares, respectively.
The most general definition of the coefficient of determination is

R² = 1 − SSres / SStot
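A direct MATLAB sketch of this definition, with assumed observed and modelled values:

% Coefficient of determination R^2 = 1 - SSres/SStot (sketch).
y = [2.1 3.9 6.2 7.8 10.1];          % assumed observed values
f = [2.0 4.0 6.0 8.0 10.0];          % assumed modelled values

ybar  = mean(y);
SStot = sum((y - ybar).^2);          % total sum of squares
SSres = sum((y - f).^2);             % residual sum of squares
R2    = 1 - SSres / SStot            % close to 1 for a good fit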
Hypothesis Test:
The null and alternative hypotheses are

Ho: ρ = 0 (no actual correlation; the null hypothesis)
Ha: ρ ≠ 0 (there is some correlation; the alternative hypothesis)

Using the coefficient of determination algorithm, the comparison is made
between the training and testing images. The most important task in an
object recognition system is to differentiate between two objects. The
test image is compared against different objects so that the accuracy of
the algorithm can be calculated. The training images of the different
objects are saved to the folder train database, and the test images are
stored in a different folder named test database. By applying the
coefficient of determination algorithm, the comparison is made between
the different objects stored in the two databases, as sketched below.
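The following MATLAB sketch illustrates this database comparison (the folder and file names follow the text but are assumptions about the actual layout; the saliency maps are assumed to be stored as grayscale images, R² is computed here as the squared sample correlation coefficient as in the simple linear regression case above, and 0.75 is the threshold quoted in Section 4.2):

% Compare a test saliency map against every training map using R^2
% and report a match when the best score reaches the 0.75 threshold.
files = dir(fullfile('train database', '*.jpg'));
test  = im2double(imread(fullfile('test database', 'test1.jpg')));
best  = -Inf;  bestName = '';
for i = 1:numel(files)
    train = im2double(imread(fullfile('train database', files(i).name)));
    train = imresize(train, size(test));   % common size for comparison
    r  = corrcoef(test(:), train(:));
    R2 = r(1,2)^2;                         % squared correlation
    if R2 > best
        best = R2;  bestName = files(i).name;
    end
end
if best >= 0.75
    fprintf('Match found: %s (R^2 = %.2f)\n', bestName, best);
else
    fprintf('No match (best R^2 = %.2f)\n', best);
end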
Figure 4.4 Train Image Folder for Different Train Image
The testing images are saved to a folder named Test Database, which
consists of different images of chairs, tables and furniture. The images
are stored after saliency detection has been performed. The comparison
between the two folders is then made using the coefficient of
determination algorithm.
Figure 4.5 Test Image Folder for different test image
4.3 Graphical User Interface Design
A graphical user interface (GUI) can be described as a
graphical display that contains devices, or components, that enable a
user to perform interactive tasks without creating a script or typing
commands at the command line. These components can be push
buttons, menus, toggle buttons, toolbars, checkboxes, radio buttons,
sliders, etc.
Data can also be displayed in graphical form as plots
or groups. The user need not know the details of the task. A
simple GUI supported by MATLAB, with its rich set of tools, is shown in
Figure 4.6.
Figure 4.6 GUI Supported by MATLAB
Creating a GUI using MATLAB's Graphical User Interface
Development Environment (GUIDE) is divided into two relatively
manageable and independent tasks, viz:
1) GUI component layout
2) GUI programming
In GUI component layout, GUIDE enables the user to lay out the GUI as required. It involves clicking and dragging the components from the component palette to the layout area. These components can be aligned, resized and given a tab order using tools accessible from the Layout Editor. Saving this GUI layout generates a MATLAB M-file which helps to control how the GUI works. This and subsequent activities constitute the GUI programming task.
The generated M-file provides code to initialize the GUI when it is launched and contains a framework for the GUI callbacks: the routines that execute in response to user-generated events such as a mouse click. Adding code to the callback functions using the M-file editor enables the GUI to perform its intended operations.
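As a hedged illustration, a GUIDE-generated callback skeleton might be filled in as follows; the tag names browse, axes1 and edit1 are hypothetical, since GUIDE derives the actual names from the layout.

% Callback skeleton generated by GUIDE and filled in by the programmer.
function browse_Callback(hObject, eventdata, handles)
% Executes when the user clicks the BROWSE pushbutton.
[file, folder] = uigetfile('*.jpg', 'Select an input image');
if isequal(file, 0)
    return;                             % user cancelled the dialog
end
img = imread(fullfile(folder, file));
axes(handles.axes1);                    % make the display axes current
imshow(img);
set(handles.edit1, 'String', file);     % show the file name in the edit box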
A graphical user interface provides the user with a familiar environment in which to work. This environment contains push buttons, toggle buttons, lists, menus, text boxes and so forth, all of which are already familiar to the user, so that he or she can concentrate on using the application rather than on the mechanics of operating it.
However, GUIs are harder for the programmer because a GUI-based program must be prepared for mouse clicks (or possibly keyboard input) for any GUI element at any time. Such inputs are known as events, and a program that responds to events is said to be event driven.
Three principal elements are required to create a MATLAB Graphical User Interface:
1. Components: Each item on a MATLAB GUI (pushbuttons, labels,
edit boxes, etc.) is a graphical component. The types of components
include graphical controls (pushbuttons, edit boxes, lists, sliders, etc.),
static elements (frames and text strings), menus, and axes. Graphical
controls and static elements are created by the function uicontrol, and
menus are created by the functions uimenu and uicontextmenu. Axes,
which are used to display graphical data, are created by the function axes.
2. Figures: The components of a GUI must be arranged within a
figure, which is a window on the computer screen. In the past, figures
have been created automatically whenever we have plotted data.
However, empty figures can be created with the function figure and can
be used to hold any combination of components.
3. Callbacks: Finally, there must be some way to perform an action when a user clicks the mouse on a button or types information on the keyboard. A mouse click or key press is an event, and the MATLAB program must respond to each event if the program is to perform its function.
For example, if a user clicks on a button, that event must cause the MATLAB code that implements the function of the button to be executed. The code executed in response to an event is known as a callback. There must be a callback to implement the function of each graphical component on the GUI.
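A minimal sketch tying the three elements together, built programmatically rather than with GUIDE; the window name, button label and message are illustrative.

% A figure (element 2) holding one component (element 1) whose
% callback (element 3) runs when the button is clicked.
fig = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(fig, 'Style', 'pushbutton', ...
          'String', 'Press me', ...
          'Position', [20 20 100 30], ...
          'Callback', @(src, event) disp('Button pressed'));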
Creating and Displaying a Graphical User Interface
MATLAB GUIs are created using a tool called guide, the GUI Development Environment. This tool allows a programmer to lay out the GUI, selecting and aligning the GUI components to be placed in it. Once the components are in place, the programmer can edit their properties: name, color, size, font, text to display and so forth. When guide saves the GUI, it creates a working program including skeleton functions that the programmer can modify to implement the behavior of the GUI. When guide is executed, it opens the Layout Editor. The large white area with grid lines is the layout area, where a programmer can lay out the GUI.
The Layout Editor window has a palette of GUI components along the left side of the layout area. A user can create any number of GUI components by first clicking on the desired component and then dragging its outline in the layout area. The top of the window has a toolbar with a series of useful tools that allow the user to distribute and align GUI components, modify the properties of GUI components, add menus to GUIs, and so on. The components used and their functions are:
Pushbuttons: A pushbutton is a component that a user can click on to trigger a specific action. The pushbutton generates a callback when the user clicks the mouse on it. A pushbutton is created by creating a uicontrol whose style property is 'pushbutton'. A pushbutton may be added to a GUI by using the pushbutton tool in the Layout Editor.
Figure 4.7 Layout of a simple GUI with a Pushbutton
Edit Boxes: An edit box is a graphical object that allows a user to enter a text string. The edit box generates a callback when the user presses the Enter key after typing a string into the box. An edit box is created by creating a uicontrol whose style property is 'edit'. An edit box may be added to a GUI by using the edit box tool in the Layout Editor.
Figure 4.8 Layout of a simple GUI with an Edit box
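A minimal sketch of an edit box created programmatically; the position and message are illustrative.

% An edit box whose callback reads the string typed by the user.
uicontrol('Style', 'edit', ...
          'Position', [20 60 160 25], ...
          'Callback', @(src, event) disp(['You typed: ' get(src, 'String')]));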
The GUI designed is shown in Figure 4.9. The GUI includes the BROWSE pushbutton for loading the input images stored in the database. Clicking the SALIENCY pushbutton runs the
saliency process and displays both the original and the saliency images. An image of the different chairs, tables and furniture is uploaded as the input and the SALIENCY pushbutton is pressed, after which the processed image is shown in the display area. Clicking the BROWSE pushbutton runs the process of selecting the saliency image stored in the testing database and displays the image name in the display area.
Figure 4.9 GUI Design
The pushbutton "COMPARISON" on click runs the process of
comparing the Training and Testing Database and displays the result of
training and testing images and then the bounding box is drawn in the
training database images . The testing image is compared and the bounding
box is drawn in that testing image which is stored in the training images.
The pushbutton "REFRESH" on click runs the process of refreshing the
display button.
CHAPTER 5
RESULTS
The proposed methodology for detecting objects is implemented in MATLAB and the outcomes are discussed below.
INTENSITY
An intensity image is a data matrix, I, whose values represent
intensities within some range. An intensity image is represented as a
single matrix, with each element of the matrix corresponding to one
image pixel. The matrix can be of class double, uint8, or uint16.
Figure 5.1 Intensity (left: original image; right: intensity image)
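As a hedged example, an intensity image of class double can be obtained from an RGB photograph as follows; the file name is illustrative.

% Convert an RGB image to an intensity image with values in [0, 1].
rgb = imread('chair.jpg');        % illustrative file name
I = rgb2gray(im2double(rgb));     % one matrix element per pixel, class double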
COLOR INTENSITY:
The values in a binary, intensity or RGB image can be of different data types. The data type of the image values determines which values correspond to black and white, as well as to the absence or saturation of color. The following figures show the color intensity for the different items of furniture, tables and chairs.
Figure 5.2 Color Intensity (left: original image; right: color intensity)
ORIENTATION:
Orientation is the process of rotating the images through different angles, such as 35, 90 and 125 degrees. The following figures show the 35-degree orientation for different chairs, tables and furniture.
Figure 5.3 35-degree Orientation (left: original image; right: rotated image)
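A minimal sketch of such a rotation; the input image and the interpolation and cropping options are illustrative choices.

% Rotate the intensity image by 35 degrees, cropping to the original size.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
rot35 = imrotate(I, 35, 'bilinear', 'crop');
imshow(rot35);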
The following figures show the 125-degree orientation for different chairs, tables and furniture.
Figure 5.4 125-degree Orientation (left: original image; right: rotated image)
The following figures show the 90-degree orientation for different chairs, tables and furniture.
Figure 5.5 90-degree Orientation (left: original image; right: rotated image)
IMAGE PYRAMIDS:
An image pyramid is used to represent an image at more than one resolution. The following figure shows the image pyramid at four levels for different chairs, tables and furniture.
Figure 5.6 Image Pyramid
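A hedged sketch of a four-level pyramid using the Image Processing Toolbox function impyramid; the input image is illustrative.

% Build a four-level Gaussian image pyramid by repeated reduction.
I  = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
p1 = I;                          % level 1: full resolution
p2 = impyramid(p1, 'reduce');    % level 2: half resolution
p3 = impyramid(p2, 'reduce');    % level 3: quarter resolution
p4 = impyramid(p3, 'reduce');    % level 4: one-eighth resolution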
HAAR VERTICAL TRANSFORM:
The Haar transform is a certain sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. The following figure shows the Haar vertical transform for different chairs, tables and furniture.
Figure 5.7 Haar Vertical Transform
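A hedged sketch of a single-level 2-D Haar decomposition using the Wavelet Toolbox; the vertical detail coefficients cV correspond to the vertical transform shown above, and the input image is illustrative.

% Single-level 2-D Haar wavelet decomposition of an intensity image.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
[cA, cH, cV, cD] = dwt2(I, 'haar');             % cV holds the vertical details
imshow(mat2gray(cV));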
HAAR TRANSFORMED IMAGE:
The following figure shows the transformed image for different chairs, tables and furniture.
Figure 5.8 Haar Transformed Image
HAAR TRANSFORM:
The following figure shows the Haar transform for different chairs, tables and furniture at three levels. The transformed image is transformed again at each level, yielding the next stage of the decomposition.
Figure 5.9 Haar Transform
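A hedged sketch of the three-level decomposition, re-transforming the approximation sub-band at each step; the input image is illustrative.

% Three-level Haar decomposition by re-transforming the approximation.
I = rgb2gray(im2double(imread('chair.jpg')));   % illustrative input
[cA1, cH1, cV1, cD1] = dwt2(I,   'haar');       % level 1
[cA2, cH2, cV2, cD2] = dwt2(cA1, 'haar');       % level 2
[cA3, cH3, cV3, cD3] = dwt2(cA2, 'haar');       % level 3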
HISTOGRAM:
For each gray level, the number of pixels having that level is counted, and a stick (bar) represents that count; nearby levels can be grouped into a bin and the pixels within the bin counted together. The following figure shows the histograms for a ball, a mouse and a glass. In each histogram, the gray levels are shown on the x-axis and the number of pixels at each gray level on the y-axis.
Figure 5.10 Histogram
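A minimal sketch of the histogram computation; the input image is illustrative.

% Compute and display the 256-bin gray-level histogram.
I = rgb2gray(imread('ball.jpg'));   % illustrative input, class uint8
imhist(I);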
HISTOGRAM EQUALIZATION:
The main objective is that, after the transformation, the histogram becomes approximately flat. The following figure shows the histogram equalization of the ball, mouse and glass images. The histogram obtained after equalization is spread out over the entire scale of gray levels.
Figure 5.11 Histogram Equalization
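A minimal sketch of the equalization step; the input image is illustrative.

% Equalize the histogram and inspect the resulting spread of gray levels.
I = rgb2gray(imread('ball.jpg'));   % illustrative input
J = histeq(I);                      % equalized image
imhist(J);                          % histogram spread over the full scale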
SALIENCY MAP:
A saliency map represents saliency at all locations with a scalar quantity; saliency means the visually attractive portion of an image. In the figure below, the saliency of a particular image is shown for a ball, a mouse and a glass.
Figure 5.12 Saliency Map
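As a heavily simplified, hedged sketch of the center-surround idea behind a saliency map: the full Itti-Koch model combines many such maps across color, intensity and orientation features at several scales, so this single Difference-of-Gaussians response approximates only one channel. The input image and filter sizes are illustrative.

% Center-surround (Difference of Gaussians) response on one channel.
I  = rgb2gray(im2double(imread('ball.jpg')));   % illustrative input
g1 = fspecial('gaussian', [15 15], 2);          % "center" scale
g2 = fspecial('gaussian', [31 31], 6);          % "surround" scale
dog = imfilter(I, g1, 'replicate') - imfilter(I, g2, 'replicate');
sal = mat2gray(abs(dog));                       % normalize to [0, 1]
imshow(sal);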
COMPARISON OUTPUT:
The figure shown below consists of training images and testing images. The different types of chairs, tables and furniture are stored as the training images for comparison purposes, and the testing image is the one to be compared with the training images. The detected result is shown by drawing a bounding box around it, together with the message "This is the matched Object".
Figure 5.13 Comparison of Saliency Maps
GUI DESIGN:
The GUI is used to load the input image and to perform the saliency computation and the comparison for the different chairs, tables and furniture.
Figure 5.14 GUI Design for the Proposed Methodology
The above figure shows the GUI design of the proposed methodology. Clicking BROWSE runs the process of loading the input from the databases, and the edit box displays the name of the image chosen by the user for saliency and comparison; the GUI then looks as shown below. Clicking SALIENCY runs the saliency process and displays the saliency output as shown in Figure 5.12, and clicking the COMPARISON pushbutton shows the comparison output as shown in Figure 5.13.
Figure 5.15 GUI Design for Displaying the input image
Thus, using these outputs, the object can be recognized very accurately, and the correct object among the different objects can be detected easily for the visually impaired, since these objects are already stored in both the training and testing databases. The comparison result shows both the training and testing images, with the correct object indicated by a bounding box: the bounding box is drawn on the training image that matches the given testing image, so that the object is correctly detected.
ERROR RATE FOR THE PROPOSED METHODOLOGY

TOTAL NUMBER OF IMAGES    MATCHED    UNMATCHED
50 images                 50         0

As all 50 images were correctly recognized, the success rate for the proposed methodology is 100%.
CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENTS
The goal of this master's work was to develop algorithms for detecting objects in a real-time environment, so that visually impaired people can recognize the object in front of them and avoid it while moving from one place to another. From the literature survey, a comparative study of the implemented methodologies was made, and the methods giving the better performance were chosen for this work to recognize the object in front of the visually impaired person. The methods were analysed on the basis of accuracy and performance.
6.1 CONCLUSION
The algorithm is implemented on input images captured from a camera and stored in a database. There are 15 images in the training database, and the comparison is made between the testing and training images. Before the images are stored in the databases, saliency maps are computed for the different chairs, tables and furniture. With this project, visually impaired people can detect the obstacles in front of them and get about without anybody's help. Several steps are carried out. First, visual saliency detection is performed, as set out in the aim of the project: linear filtering extracts the color, intensity and orientation features; image pyramids are built by image reduction; the Haar transform is applied; center-surround differences are computed using Difference of Gaussians and Gabor filters; the results are normalized; and the saliency map is produced using
the method of Itti, Koch et al. Each object was processed through these steps, and the comparison was made using image processing techniques in MATLAB. All images are stored in the database. The comparison output displays both the training and testing images in one figure, and the matched-object window shows the correct detection by drawing a bounding box around the training image that matches the testing image. Thus, with this methodology, object recognition becomes easy and accurate for those who cannot identify the objects in front of them, helping them avoid obstacles while moving from one place to another.
6.2 FUTURE ENHANCEMENTS
In future work, audio saliency detection can be added. Just as visual saliency detection finds the attention-grabbing portion of an image, audio saliency detection finds the attractive portion of a sound, which can be used to recognize the sound of an object or of a person standing in front of the user; in this way, sound and speaker recognition can be carried out. For example, if the object is a chair, the device will say that it is a chair, recognizing the object and telling the visually impaired person to avoid it and take another way to the destination. With these enhancements, visually impaired people will need no assistance, whether dependent or independent, for moving from one place to another. The system should also be able to report the location, distance and direction of items in the room, such as equipment, furniture, doors and even other users.
REFERENCES:
[1] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4, 4 (1985), 219-227.
[2] ITTI, L., KOCH, C., AND NIEBUR, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254-1259.
[3] ITTI, L., AND KOCH, C. Computational modeling of visual attention. Nature Reviews Neuroscience 2, 3 (March 2001), 194-203.
[4] FRINTROP, S. VOCUS: A visual attention system for object detection and goal-directed search. Lecture Notes in Artificial Intelligence (LNAI), Vol. 3899 (2006).
[5] FRINTROP, S., AND ROME, E. Simulating visual attention for object recognition. In Proceedings of the Workshop on Early Cognitive Vision (2004), Isle of Skye, Scotland.
[6] FRINTROP, S., NUCHTER, A., SURMANN, H., AND HERTZBERG, J. Saliency-based object recognition in 3D data. Isle of Skye, Scotland.
[7] ITTI, L., AND KOCH, C. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging 10, 1 (January 2001), 161-169.
[8] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4, 4 (1985), 219-227.
[9] VANRULLEN, R. Visual saliency and spike timing in the ventral visual pathway. Journal of Physiology - Paris 97, 2-3 (March-May 2003), 365-377.
[10] GONZALEZ, R.C., AND WOODS, R.E. Digital Image Processing, pp. 525-626. Pearson Prentice Hall, Upper Saddle River, New Jersey, 2008.
[11] European Blind Union (2002). Statistical data on blind and partially sighted people in European countries. http://www.euroblind.org/fichiersGB/STAT.html
[12] DODSON, A.H., MOORE, T., AND MOON, G.V. (1999). A navigation system for the blind pedestrian. In Proceedings of GNSS 99, 3rd European Symposium on Global Navigation Satellite Systems, pp. 513-518, Genoa, Italy, October 1999.
[13] SHOVAL, S., ULRICH, I., AND BORENSTEIN, J. (2000). Computerized obstacle avoidance systems for the blind and visually impaired. Invited chapter in Intelligent Systems and Technologies in Rehabilitation Engineering. Editors: Teodorescu,