Object Recognition in the Surveillance Area of the Visually Impaired



    CHAPTER 1

    INTRODUCTION

    Blindness is the condition of lacking visual perception due

    to physiological or neurological factors. Various scales have been

developed to describe the extent of vision loss and define blindness.

    Total blindness is the complete lack of form and visual light perception

    and is clinically recorded as NLP, an abbreviation for no light

    perception. Blindness is frequently used to describe severe visual

    impairment with residual vision. Those described as having only light

    perception have no more sight than the ability to tell light from dark

and the general direction of a light source. Visually impaired people need some assistance to move from one place to another in day-to-day life. This may be in a dependent manner, with the help of others, or in an independent manner, with the aid of canes, trained guide dogs and similar tools. In both cases their primary objective is to detect obstacles in their path and avoid them while moving. With the advent of electronic technologies, self-assistive devices have been developed to help them. Some of the present technologies are as follows.

    1.1 LASER CANE

    This is an electronic cane that uses invisible laser beams to

    detect obstacles, drop offs, and similar hazards in the surroundings.

    Once the cane detects the obstacle or drop off using the laser beams, it

    will produce a specific audio signal. The cane has three distinct audio

    signals; each indicates a specific distance. The audio signal informs the

user of the distance of the obstacle or the height of the drop off. This

    device can detect objects and hazards up to a distance of 12 feet.


Figure 1.1 BLIND PERSON WITH LASER CANE

A part of the cane's handle also vibrates when there is an object in front of the user. The laser cane is suitable for persons who are blind and persons who are deaf-blind. It can be used on its own; however, mobility experts strongly recommend that blind persons first learn to use the long white cane before using the laser cane. The laser cane emits beams of invisible light that produce sounds or vibrations when a beam encounters an object, alerting the user to an obstruction ahead. It weighs about one pound and is made of aluminum and steel.

    1.2 SONIC MOBILITY DEVICE

This is a device that is generally mounted on the user's head. It uses ultrasonic technology to detect obstacles and other objects that are located in the user's path. The sonic mobility device uses the eight tones of the musical scale to indicate the distance of the object. Each tone signifies a particular distance from the obstruction. The user hears the tone through the device's earpiece.

Figure 1.2 SONIC MOBILITY DEVICES


    1.3 GPS DEVICES FOR THE BLIND

Although mainly used for identifying one's location, GPS (Global Positioning System) devices also help blind persons to travel independently. Blind persons can use portable GPS systems to determine and verify the correct travel route. They can use these devices whether they are walking or riding in a vehicle. GPS devices for the blind include screen readers so the user can hear the information.

Other GPS devices are connected to a Braille display so the user can read the information in Braille. A blind person should still use a particular mobility device in addition to the GPS system.

    FIGURE 1.3 GPS DEVICES

    The Braille devices and software help blind people to improve

their skills in reading and writing. Becoming literate is very important for these individuals because it allows them to hope for a productive future and, at the same time, to live with confidence. These

    innovative Braille devices and software help the visually-impaired

    individuals print and store information quickly, quietly, and reliably.

    1.4 ULTRASOUND BASED DETECTION

    Here a wearable system for visually impaired users is

    implemented which allows them to detect and avoid obstacles. This is

    based on ultrasound sensors which can acquire range data from the


    objects in the environment by estimating the time-of-flight of the

    ultrasound signal. Using a hemispherical sensor array, we can detect

obstacles and determine which directions should be avoided. However, the ultrasound sensors are only used to detect whether obstacles are present in front of the user. Unimpeded directions are determined by

    analyzing patterns of the range values from successive frames.

    Feedback is presented to users in the form of voice commands and

    vibration patterns.

    1.4.1 NEW BRAILLE TECHNOLOGY

Using this technology, a visually impaired person can read the emotions or facial expressions of the person with whom he or she is conversing. To make this possible, an ordinary web camera, hardware as small as a coin and a tactile display are used. This enables the visually impaired to directly interpret human emotions.

    Visual information is transferred from the camera into

    advanced vibrating patterns displayed on the skin. The vibrators are

    sequentially activated to provide dynamic information about what kind

    of emotion a person is expressing and the intensity of the emotion

    itself.

    Figure 1.4 BRAILLE DEVICES AND SOFTWARE


    The first step for a user is to learn the patterns of different

    facial expressions which can be done by displaying the emotions in

front of a camera which translates it into vibration patterns. In this learning phase the visually impaired person has a tactile display mounted

    on the back of a chair. When interacting with other people a sling on

    the forearm can be used instead.

    The main research focus is to characterize different emotions

    and to find a way to present them by means of advanced biomedical

    engineering and computer vision technologies. This technology can

also be implemented on mobile phones for tactile rendering of live football games and human emotion information through vibrations, which is an interesting way of enhancing the experience of mobile users.

1.5 COMPUTER ASSISTIVE TECHNOLOGY FOR THE BLIND

    The most important advancement since blind assistive

    technology began to appear in the 1970s is screen reading software,

    which simulates the human voice reading the text on computer screen

    or renders hard-copy output into Braille. Screen readers are designed to

pick out things that would catch a sighted person's attention, such as colors and blinking cursors, and can be modified to choose the areas the user wants or does not want.

Figure 1.5 VISUALLY IMPAIRED ASSISTIVE DEVICES


    1.6 CANE WITH SENSOR

The cane is essential for the safe mobility of vision-impaired people. With this device, they are able to stroll around without worrying about bumps. Along with the innovations made in technology, the canes used by blind people have been improved in terms of safety and functionality.

Figure 1.6 CANE WITH SENSOR

1.7 BATTERY-OPERATED SPHYGMOMANOMETER

Blind persons can also suffer from hypertension. With the availability of beeping or talking sphygmomanometers, vision-impaired individuals can now accurately take or monitor their blood pressure. This type of medical equipment is battery-

    operated. The blood pressure and pulse readings are announced in a

    clear voice and shown simultaneously on a digital display.

Figure 1.7 BATTERY-OPERATED SPHYGMOMANOMETER


1.8 NAVIGATIONAL AID HEADSET

This device is still a concept. However, if successfully launched, the aid headset will help blind persons to confidently, independently and safely walk through city streets. The navigational aid device comes with a built-in microphone and audio transducer. It will also incorporate a GPS system, speech recognition, and obstacle detection technology. Using the microphone, the user will state his destination; from the audible information, the GPS system will direct the user to his desired location, and the obstacle detection technology will help him safely reach the place by informing him of any impediments he might encounter.

Figure 1.8 NAVIGATIONAL AID HEADSET

    It is estimated that 7.4 million people in Europe are visually

    impaired [11]. For many, known destinations along familiar routes can

    be reached with the aid of white canes or guide dogs. By contrast, for

    new or unknown destinations along unfamiliar routes (that may change

    dynamically) the limitations of these aids become apparent [12, 13, 14]

    (e.g. white canes are ineffective for detecting obstacles beyond 3-6

feet). These mobility aids are only useful for assisting visually impaired people through the immediate environment (termed micro-navigation), but do not assist the traveller in more distant environments (termed macro-navigation).


Figure 1.9 ELECTRONIC TRAVEL AIDS (ETAs)

With the proliferation of context-aware research and development, Electronic Travel Aids (ETAs) such as obstacle avoidance systems (e.g. the laser cane and ultrasonic obstacle avoiders) have been developed to assist visually impaired travellers with micro-navigation, whereas Global Positioning Systems (GPS) and Geographical Information Systems (GIS) have been, or are being, developed for macro-navigation (e.g. the MOBIC Travel Aid and Personal Guidance System).

    However, despite recent technological advancements, there is

    still considerable scope for Human Computer Interaction (HCI)

    research. Previous work has predominantly focused on developing

    technologies and testing their functionality as opposed to utilizing HCI

    principles (e.g. Task Analysis) to actively assess the impact on the user.

For instance, Dodson et al. [12] simply assume that, since a blind human is the intended navigator, a speech user interface should be used.

    However, despite the contextual complexity of a visually

    impaired traveller interacting with various mobility aids (i.e.

    navigational system and guide dog/white cane), existing research has


    failed to fully address the interaction of contextual components and

    how usability is influenced. Further, as more contextual sources are

used to identify and discover a user's context, it is becoming increasingly paramount that information is managed appropriately and displayed in a way that is tailored to the visually impaired traveller's task, situation and environment.

    1.9 WHITE CANE

    A white cane is used by many people who

are blind or visually impaired, both as a mobility tool and as a courtesy

    to others. Not all modern white canes are designed to fulfil the same

    primary function, however: There are at least five varieties of this tool,

    each serving a slightly different need.

    TYPES:

    Long cane: This "traditional" white cane, also known as a "Hoover"

    cane, after Dr. Richard Hoover, is designed primarily as a mobility tool

    used to detect objects in the path of a user. Cane length depends upon

    the height of a user, and traditionally extends from the floor to the

user's sternum. Some organizations favour the use of much longer canes.

Figure 1.10 LONG WHITE CANE


    "Kiddie" cane: This version works in the same way as an

    adult's long cane, but is designed for use by children.

    Figure 1.11 KIDDIE CANE

    Identification cane ("Symbol Cane" in British English):

    The ID cane is used primarily to alert others as to the bearer's visual

    impairment. It is often lighter and shorter than the long cane, and has

    no use as a mobility tool.

Figure 1.12 IDENTIFICATION CANE

    Support cane: The white support cane is designed primarily

    to offer physical stability to a visually impaired user. By virtue of its

    colour, the cane also works as a means of identification. This tool has

    very limited potential as a mobility device.

    Figure 1.13 SUPPORT CANE



    CHAPTER 2

    PROBLEM DESCRIPTION

Visually impaired people cannot navigate easily in their day-to-day life. They need the help of others, a cane, other electronic mobility devices or guide dogs to guide them in an appropriate manner. They therefore need a self-assistive device that guides them and makes them independent of others for navigation. The first and most significant task is detecting the obstacles in front of them and avoiding them. In this project we need to classify objects, recognize obstacles, identify them and track them through image processing techniques, and suggest an alternative path to the user.


    CHAPTER 3

    LITERATURE SURVEY AND RELATED WORKS

3.1 VISUAL ATTENTION

The visual system is not capable of fully processing all of the visual information that arrives at the eye. In order to get around this limitation, a mechanism that selects regions of interest for additional processing is used. This selection is done bottom-up, using saliency information, and top-down, using cueing.

The processing of visual information starts at the retina. The neurons in the retina have a center-surround organization of their receptive fields. The shapes of these receptive fields are, among others, modeled by the difference of Gaussians (DoG). This function captures the Mexican-hat shape of the retinal ganglion cells' receptive fields. These cells emphasize boundaries and edges. Further up the visual processing pathway is visual cortex area V1. Here there are cells that are orientation selective. These cells can be modeled by a 2D Gabor function. Itti and Koch's implementation of Koch and Ullman's saliency map is one of the best performing biologically plausible attention models [1][2][3]. Itti et al. [3] implemented bottom-up saliency detection by modeling specific feature-selective retina cells and cells further up the visual processing pathway. The retina cells use a center-surround receptive field, which is modeled in [2] by taking the DoG. They also model orientation-selective cells using 2D Gabor filters. For each receptive field there is an inhibitory variant. For example, if an on-center off-surround receptive field shows excitation for a certain input, then the same input will cause the opposite off-center on-surround receptive field to inhibit.
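As an illustration of this center-surround modelling, a minimal MATLAB sketch of a DoG filter is given below; the kernel size, the two standard deviations and the input file name are illustrative assumptions rather than values taken from the cited models.

    % Minimal difference-of-Gaussians (DoG) center-surround sketch.
    % Kernel size, sigmas and the image file are illustrative assumptions.
    img      = im2double(rgb2gray(imread('scene.jpg')));
    center   = fspecial('gaussian', 21, 2);    % narrow (excitatory) Gaussian
    surround = fspecial('gaussian', 21, 6);    % broad (inhibitory) Gaussian
    dog      = center - surround;              % Mexican-hat shaped kernel
    response = imfilter(img, dog, 'replicate');
    imshow(mat2gray(abs(response)));           % strong responses at boundaries and edges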


The sub-modalities that Itti et al. [3] use for creating a saliency map are intensity, color and orientation. For each of these sub-modalities a Gaussian scale pyramid is computed to obtain scale-invariant features. For each of these image scales, feature maps are created with a receptive field and its inhibitory counterpart. For the intensity sub-modality, on-center off-surround and off-center on-surround feature maps for different scales are computed based on the pixel intensity. For the color sub-modality, feature maps are computed with center-surround receptive fields using a color pixel value as center with its opponent color as surround. The color combinations used for this are red-green and blue-yellow. The feature maps for the orientation sub-modality are created using 2D Gabor filters for the orientations 0, 45, 90 and 135 degrees.

    To obtain a saliency map from all these features, a weighting

    process is executed in several stages to obtain the most salient features.

In the first stage feature maps are weighted across the different receptive fields, in the second stage this is done across the scales, and in the final stage across the sub-modalities. By combining the feature maps obtained in the last stage, a saliency map is created.

    The visual system has limited capacity and cannot process

    everything that falls onto the retina. Instead, the brain relies on

    attention to bring salient details into focus and filter out background

    clutter. Two recent studies by researchers at the Salk Institute for

    Biological Studies, one study employing computational modeling

    techniques and the other experimental techniques, have helped to

    unravel the mechanisms underlying attention. The strength of visual


    input fluctuates over orders of magnitude. The visual system reacts

    automatically to these changes by adjusting its sensitivity, becoming

more sensitive in response to faint inputs, and reducing sensitivity to strong inputs. For example, when we walk into a darkened lecture hall

    on a sunny day at first we see little, but over time our visual system

    adapts, increasing its sensitivity to match the environment.

    Neurons in the visual cortex view the world through their

    "receptive fields," the small portion of the visual field individual

    neurons actually "see" or respond to. Whenever a stimulus falls within

    the receptive field, the cell produces a volley of electrical spikes,

    known as "action potentials" that convey information about the

    stimulus in the receptive field.

But the strength and fidelity of these signals also depend on other factors. Scientists generally agree that neurons typically respond more strongly when attention is directed to the stimulus in their receptive fields. In addition, the response of individual neurons can be

    strongly influenced by what's happening within the immediate

    surroundings of the receptive field, a phenomenon known as contextual

    modulation.

    The visual attention mechanism may have at least the

following basic components:

    (1) The selection of a region of interest in the visual field.

    (2) The selection of feature dimensions and values of interest.

    (3) The control of information flow through the network of

    neurons that constitutes the visual system.

(4) The shifting from one selected region to the next in time.


The biologically motivated computational attention system VOCUS (Visual Object detection with a CompUtational attention System) detects regions of interest in images. It operates in two modes: an exploration mode in which no task is provided, and a search mode with a specified target. In exploration mode, regions

    of interest are defined by strong contrasts (e.g. color or intensity

    contrasts) and by the uniqueness of a feature. For example, a black

    sheep is salient in a flock of white sheep. In search mode, the system

    uses previously learned information of a target object to bias the

    saliency computations with respect to the target.

In various experiments, it is shown that the target is on average found with fewer than three fixations, that usually fewer than five training images suffice to learn the target information, and that the system is mostly robust with regard to viewpoint changes and illumination variations. VOCUS provides a powerful approach to improving existing vision systems by concentrating computational resources on regions that

    are more likely to contain relevant information. The more the complexity

    and power of vision systems increases in the future, the more they will

    profit from an attentional front-end like VOCUS.

    3.2 PSYCHOPHYSICAL MODELS OF ATTENTION

FEATURE INTEGRATION THEORY (FIT):

    Both anatomical and physiological evidence support the

    hypothesis that the visual system divides input visual information into

    distinct subsystems that analyze and code different properties in various

    specialized areas. This raises a critical problem of how these dispersed

representations are combined together into a unified perception, i.e.,

    the binding problem.


    FIT consists of a master map which codes locations of feature

    discontinuities in luminance, color, depth or motion, and a separate set

of feature maps for processing information about the current spatial layout of the features. An attention window moves within a location

    map which selects the features attended to and temporarily excludes

    others from the feature maps, thus putting the what and where

    pathways together.

    There are three spatially selective mechanisms used in FIT to

    solve the binding problem: selection by a spatial attention window,

    inhibition of location feature maps containing unwanted features, and

    top-down activation of the location containing the currently attended

    object.

FIGURE 3.1 FEATURE INTEGRATION THEORY (FIT)

    3.3 SALIENCY MAP

    SALIENCY:

    Something is said to be salient if it stands out.

    E.g. road signs should have high saliency

    Figure 3.1 SALIENCY MAP


A saliency map is also defined as a topographically arranged map that represents the visual saliency of a corresponding visual scene. The saliency map model:

    Localizes salient points in the visual field.

    Saliency is based on (bottom-up) scene-based properties

Reduces computation by a selection on the basis of pre-attentively computed simple features.

    Addresses some problems with the integration of

    different feature dimensions into a space-related map.

    Given an image, we assign to each pixel a value of how

    informative the pixel is with respect to the Human visual system

    (HVS). Research in this area is generally divided into two topics:

    3.3.1 BOTTOM-UP SALIENCY

Also known as pre-attentive vision, this is useful for rapid scene understanding and is linked to human survival mechanisms.

The bottom-up saliency method depends only on the instantaneous sensory input, without taking into account the internal state of the organism.

A dramatic example of a stimulus that attracts attention using bottom-up mechanisms is a fire-cracker going off suddenly.

Bottom-up attention mechanisms are easier to understand than those that are influenced by internal states.

    Possibly the most influential attempt at understanding bottom-

    up attention and the underlying neural mechanisms was made by

    Christof Koch and Shimon Ullman (Koch and Ullman, 1985). They

    proposed that the different visual features that contribute to attentive


selection of a stimulus (color, orientation, movement, etc.) are combined into one single topographically oriented map, the saliency map, which integrates the normalized information from the individual feature maps into one global measure of conspicuity. In analogy to the center-surround representations of elementary visual features, bottom-up saliency is thus determined by how different a stimulus is from its surround, in many sub-modalities and at many scales. To quote from Koch and Ullman (1985): "Saliency at a given location is determined primarily by how different this location is from its surround in color, orientation, motion, depth etc."

Figure 3.2 BOTTOM-UP SALIENCY

    3.3.2 TOP-DOWN SALIENCY

Goal-driven. Controlled by higher-order brain processes for tasks such as object recognition and tracking.

Top-down control, on the other hand, does take into account the internal state, such as the goals the organism has at the time, personal history and experiences, etc.

    An example of top-down attention is the focusing onto difficult-

    to-find food items by an animal that is hungry, ignoring more

    "salient" stimuli.


    Figure 3.3 TOP-DOWN SALIENCY

EXAMPLE OF SALIENCY MAP:

    The figure shows a complex visual scene and the corresponding

    saliency map, as computed from the algorithm in Niebur and Koch

    (1996). The scene is static so the motion component of the algorithm

    does not yield a contribution. The surf line is well-represented in the

    saliency map since it combines input from several feature maps:

    intensity, orientation and color all have substantial local contrast at

    several spatial scales in this area. The same is the case for the clouds

    and the island in the distance.

Figure 3.4 SALIENCY MAP FOR A STOOL: ORIGINAL IMAGE ON THE LEFT-HAND SIDE AND ITS SALIENCY MAP ON THE RIGHT-HAND SIDE


    The original definition of the saliency map by Koch and Ullman

    (1985) is in terms of neural processes and transformations, rather than

in terms of cognitive or higher-order constructs. The question of where the saliency map is located in the brain thus arises quite naturally.

    There is no logical necessity that it arises in one particular location

    and it could be understood as a functional map whose components

could be distributed over many brain areas. It is also possible that there is more than one topographically organized saliency map.

    However, given that many feature maps of early vision are, in fact,

    localized in specific parts of the central nervous system, it has been

    proposed that the same might also be the case for the saliency map.

    Koch and Ullman (1985) proposed that it may be located in the lateral

    geniculate nucleus of the thalamus, an area previously suggested as

    playing a major role in attentional control.

    Another thalamic nucleus, the pulvinar, is known to be involved in

    attention (Robinson and Petersen 1992) and has also been suggested as

    a candidate for housing the saliency map. Another possibility is the

    superior colliculus, likewise known to be involved in the control of

    attention (Kustov and Robinson 1996).

    Several neocortical areas have been suggested as well, including V1

    (Li 2002), V4 (Mazer and Gallant 2003), and posterior

    parietal (Gottlieb 2007).

    Thus, there are a number of identified candidates which may

    correspond to different flavours of salience, perhaps more bottom-up

    driven in some area and more strongly modulated by behavioural goals

    in some other area.


    APPLICATIONS

    Visual saliency and the saliency model introduced in the previous

    chapter have several applications in computer vision, artificial

    intelligence systems as well as more recently developed marketing

    analyses. Following are only some of the applications described in the

    literature:

Machine Vision and Mobile Robots: Visual saliency is used as a visual landmark for mobile robot applications, to efficiently compute a robot's localization relative to its environment or to track objects in the environment (e.g. the VOCUS system, A Visual Attention System for Object Detection and Goal-Directed Search).

    Neuromorphic Vision: Developing a high-speed robotics

    application that includes visual saliency which can handle as large an

amount of information as the human visual system.

Automatic target detection: For example, finding hidden military vehicles or traffic signs.

    Image and Video Compression: The eye does not sense the image

at a constant resolution but has a fovea at its center, where photoreceptors are much more closely spaced than in the periphery. This

    foveation property of the vertebrate eye has been used in image and

video processing algorithms. Foveated algorithms combined with visual saliency have been used for video compression.

    Medical Imaging: Tumour detection in mammograms using

    topographic maps based on salient regions.

    Advertisement Design: The prediction of human fixations based on

    visual saliency can be used to improve the design of advertisements or

    magazine covers.


    CHAPTER 4

    PROPOSED METHODOLOGY

    Figure 4.1 OVERALL BLOCK DIAGRAM

    In this Block Diagram representation of the architecture, modules are

    visualized by blocks and information streams by arrows.

The goal of the proposed technology is self-learning, self-configuration and self-adjustment. The proposed system addresses the following challenging requirements:

1. The system should be able to report the location, distance and direction of items in the room such as equipment, furniture, doors and even other users.

2. It must be a reliable system that minimizes the impact of installation and maintenance on the building owner.

A great number of benefits are realized from the implementation of such systems, such as greater safety, autonomy and self-esteem and, eventually, a better quality of life.


    The visual saliency detection module receives a camera image and

    computes a saliency map. The visual location module returns the

location of the most salient object. The feature extraction module computes image features from the salient image region. The visual saliency detection architecture that will be described in this section is derived from the work of Itti et al.

    4.1 Visual Saliency Detection

This method implements bottom-up saliency detection by modelling specific feature-selective retina cells and cells further up the visual processing pathway.

    The retina cells use a center surround receptive field which is

    modelled in [28] by taking the difference of Gaussian (DoG). They also

    model orientation selective cells using 2D Gabor filters. The features

    that they use for creating a saliency map are intensity, color and

    orientation. For each of these features a Gaussian scale pyramid is

    computed to obtain scale invariant features using receptive fields.

The input image is decomposed through several pre-attentive feature

    detection mechanisms (sensitive to color, intensity, etc), which operate

    in parallel over the entire visual scene.

Input: static images (640x480).

Each image is represented at eight different scales (640x480, 320x240, 160x120, and so on).

Different scales are used for computing centre-surround differences.


    Neurons in the feature maps then encode for spatial contrast in

    each of those feature channels. In addition, neurons in each feature map

spatially compete for salience, through long-range connections that extend far beyond the spatial range of the classical receptive field of

    each neuron (here shown for one channel; the others are similar).

    After competition, the feature maps are combined into a

    unique saliency map, which topographically encodes for saliency

    irrespective of the feature channel in which stimuli appeared salient.

    The saliency map is sequentially scanned by attention through the

    interplay between a winner-take-all network (which detects the point of

    highest saliency at any given time) and inhibition of return (which

    suppresses the last attended location from the saliency map, so that

    attention can focus onto the next most salient location). Top-down

    attentional bias and training can modulate most stages of this bottom-up

    model.
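A minimal MATLAB sketch of this winner-take-all / inhibition-of-return scan is shown below; the placeholder saliency map, the suppression radius and the number of fixations are illustrative assumptions.

    % Sketch: scan a saliency map with winner-take-all and inhibition of return.
    S = rand(60, 80);                  % placeholder saliency map (normally computed as above)
    radius = 8;                        % suppression radius, assumed
    nFixations = 5;
    [rows, cols] = size(S);
    [X, Y] = meshgrid(1:cols, 1:rows);
    for k = 1:nFixations
        [~, idx] = max(S(:));          % winner-take-all: current most salient location
        [r, c] = ind2sub(size(S), idx);
        fprintf('Fixation %d at (row %d, col %d)\n', k, r, c);
        mask = (X - c).^2 + (Y - r).^2 <= radius^2;
        S(mask) = 0;                   % inhibition of return: suppress the attended region
    end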

It is based on four major principles: visual attention acts on a multi-featured input; the saliency of locations is influenced by the surrounding context; the saliency of locations is represented on a scalar map, the saliency map; and winner-take-all and inhibition of return are suitable mechanisms to allow attention shifts. The implementation details of the four main steps of the model are as follows.

    4.1.1 FEATURE MAPS

First, a number of features (1 ≤ j ≤ n) are extracted from the scene by computing the so-called feature maps Fj. Such a map represents the image of the scene, based on a well-defined feature, which leads to a

    multi-featured representation of the scene. In his implementation, Itti


considered seven different features which are computed from an RGB color image and which belong to three main cues, namely intensity, color, and orientation.

Extraction of Early Visual Features

Here r, g, and b are the red, green, and blue channels of the input image. The intensity image is

I = (r + g + b)/3

The r, g, and b channels are normalized by I in order to decouple hue from intensity. Since hue variations are not perceivable at very low luminance, this normalization is only applied at locations where I is larger than 1/10 of its maximum over the image.

Intensity feature:

F1 = I = 0.3 R + 0.59 G + 0.11 B

Two chromatic features are based on the two color opponency filters R+G- and B+Y-, where the yellow signal is defined as Y = (R + G)/2. Such chromatic opponency exists in the human visual cortex:

F2 = (R - G)/I

F3 = (B - Y)/I


The normalization of the chromatic features by I decouples hue from intensity.

Four local orientation features are computed according to the angles {0°, 45°, 90°, 135°}. Gabor filters,

    which represent a suitable mathematical model of the receptive field

    impulse response of orientation-selective neurons in primary visual

    cortex, are used to compute the orientation features. In this

    implementation of the model, it is possible to use an arbitrary number

    of orientations. However, it has been noticed that using more than four

    orientations does not improve the performance of the model drastically.
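A minimal MATLAB sketch of these seven early features for a single image is given below; the input file name, the Gabor kernel parameters and the exact form of the chromatic features follow the (partly reconstructed) formulas above and should be read as assumptions.

    % Sketch: the seven early features of this section for one RGB image
    % (intensity F1, two chromatic features F2/F3, four orientation features).
    rgb = im2double(imread('scene.jpg'));             % assumed input image
    r = rgb(:,:,1); g = rgb(:,:,2); b = rgb(:,:,3);
    I  = (r + g + b) / 3;
    F1 = 0.3*r + 0.59*g + 0.11*b;                     % intensity feature
    valid = I > max(I(:)) / 10;                       % use hue only where I is large enough
    Y  = (r + g) / 2;                                 % yellow signal, as assumed above
    F2 = zeros(size(I)); F3 = zeros(size(I));
    F2(valid) = (r(valid) - g(valid)) ./ I(valid);    % R+G- opponency, normalized by I
    F3(valid) = (b(valid) - Y(valid)) ./ I(valid);    % B+Y- opponency, normalized by I
    angles = [0 45 90 135];                           % four orientation features
    Forient = cell(1, 4);
    [x, y] = meshgrid(-10:10, -10:10);
    for k = 1:4
        th = angles(k) * pi / 180;
        xr = x*cos(th) + y*sin(th);
        gb = exp(-(x.^2 + y.^2) / (2*3^2)) .* cos(2*pi*xr/8);   % simple 2D Gabor kernel
        Forient{k} = imfilter(F1, gb, 'replicate');
    end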

4.1.2 Center Surround Difference

In a second step, each feature map is transformed into its conspicuity map, which highlights the parts of the scene that strongly

    differ, according to a specific feature, from their surroundings. In

    biologically plausible models, this is usually achieved by using a

    center-surround-mechanism. Practically, this mechanism can be

implemented with a difference-of-Gaussians (DoG) filter, which can be

    applied on feature maps to extract local activities for each feature type.

    A visual attention task has to detect conspicuous regions, regardless of

    their sizes. Thus, a multiscale conspicuity operator is required.

Center-surround is then implemented as the difference between fine (c for center) and coarse (s for surround) scales. Indeed, for a feature j (1 ≤ j ≤ n), a set of intermediate multiscale conspicuity maps Mj,k (1 ≤ k ≤ K) are computed according to the following equation, giving rise to (n × K) maps for the n considered features:

Mj,k = |Pj(ck) ⊖ Pj(sk)|


where ⊖ is a cross-scale difference operator that first interpolates the

    coarser scale to the finer one and then carries out a point-by-point

    subtraction.

    The absolute value of the difference between the center and

    the surround allows the simultaneous computing of both sensitivities,

    dark center on bright surround and bright center on dark surround

    (red/green and green/red or blue/yellow and yellow/blue for color).

    Creating the Gaussian pyramid

In this step the original input image I is convolved with a linearly separable 5x5 Gaussian kernel and is subsampled into nine (s in [0..8]) different spatial scales.

The subsampling is obtained as follows:

I(s) = [I(s - 1) * G] ↓ 1/2

i.e. each scale is obtained by smoothing the previous scale with the Gaussian kernel G and decimating it by a factor of two in each dimension.

    Gaussian Scale Pyramids:

Gaussian scale pyramids are used for scale-invariant receptive field feature extraction. This is a commonly used method in image processing, but it is computationally rather expensive. Gaussian pyramids are used to compute scale-invariant features. Different image scales are normally used so that the filter mask with which an image is convolved does not have to change. The convolution of an image with a larger mask is rather time consuming, O(nm), where

n is the number of pixels in the image,

m is the number of entries in the filter mask.


    Figure 4.2 GAUSSIAN SCALE PYRAMID

    When a Gaussian pyramid is used, several processing steps have

    to be taken. First the input image needs to be scaled down, which can

    be done by sub-sampling. Sub-sampling can lead to aliasing and to

    overcome this problem the spatial frequencies of the image which are

    above the sampling frequency must be removed.

    This can be done by smoothing the image with a Gaussian filter

    before sub-sampling it. When the receptive field filter is applied the

filtered image needs to be scaled up/back. In that implementation, 9 spatial scales are used and all filtered maps are resized to scale 4; if 4 scales and 2 receptive field sizes are used, all maps are resized to scale 2. When scaling

    up some sort of interpolation needs to be used for anti-aliasing.
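A minimal MATLAB sketch of such a dyadic Gaussian pyramid is given below; the image file, the Gaussian sigma and the number of levels are illustrative assumptions.

    % Sketch: dyadic Gaussian pyramid (smooth with a 5x5 Gaussian, subsample by 2).
    img = im2double(rgb2gray(imread('scene.jpg')));    % assumed input image
    nLevels = 9;                                       % scales s = 0..8
    G = fspecial('gaussian', 5, 1);                    % 5x5 kernel; sigma = 1 is assumed
    pyr = cell(1, nLevels);
    pyr{1} = img;
    for s = 2:nLevels
        smoothed = imfilter(pyr{s-1}, G, 'replicate'); % low-pass first to avoid aliasing
        pyr{s}   = smoothed(1:2:end, 1:2:end);         % decimate by a factor of two
    end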

    4.1.3 NORMALIZATION STRATEGIES

The saliency-based model of visual attention performs two kinds of map combination: on the one hand, the cross-scale combination of the multiscale conspicuity maps Mj,k in order to compute a unique conspicuity map Cj for each scene feature; on the other hand, the combination of the conspicuity maps into the final saliency map, described below.


    4.1.4 SALIENCY MAP

Purpose: represent saliency at all locations with a scalar quantity.

Feature maps are combined into three conspicuity maps:

Intensity (I)

Color (C)

Orientation (O)

Before they are combined they need to be normalized.

Creating the saliency map: The combination of multiple maps is obtained by the linear combination of the conspicuity maps.

    The overall computation goal is to have a single map, in which

    the most salient object of an image stands out more than others and to

    have a mechanism that models the shift to the next most salient object.

    The input image is decomposed through several pre-attentive

    feature detection mechanisms (sensitive to color, intensity, orientation),

which operate in parallel over the entire visual scene. The model's saliency map is endowed with internal dynamics which generate

    attentional shifts. This model consequently represents a complete

    account of bottom-up saliency and does not require any top-down

    guidance to shift attention.

    This framework provides a massively parallel method for the

    fast selection of a small number of interesting image locations to be

    analyzed by more complex and time consuming object-recognition

processes. Extending this approach to guided search, feedback from higher cortical areas (e.g., knowledge about targets to be found) was used to weight the importance of different features. Input is provided in the form of static color images, usually digitized at 640x480 resolution.


    Nine spatial scales are created using dyadic Gaussian pyramids [10],

    which progressively low-pass filter and subsample the input image,

yielding horizontal and vertical image-reduction factors ranging from 1:1 (scale zero) to 1:256 (scale eight) in eight octaves.

    Each feature is computed by a set of linear center-surround

    operations akin to visual receptive fields (Fig. 1): Typical visual

    neurons are most sensitive in a small region of the visual space (the

    center), while stimuli presented in a broader, weaker antagonistic

    region concentric with the center (the surround) inhibit the neuronal

    response. Such an architecture, sensitive to local spatial discontinuities,

    is particularly well-suited to detecting locations which stand out from

    their surround and is a general computational principle in the retina,

    lateral geniculate nucleus, and primary visual cortex [11].

Center-surround is implemented in the model as the difference between fine and coarse scales: the center is a pixel at scale c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale s = c + δ, with δ ∈ {3, 4}. The across-scale difference between two maps, denoted ⊖ below, is obtained by interpolation to the finer scale and point-by-point subtraction. Using several scales not only for c but also for δ = s - c yields truly multiscale feature extraction, by including different size ratios between the center and surround regions.
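As a MATLAB sketch of these across-scale differences for the intensity channel, the six intensity feature maps I(c, s) could be computed as follows; the pyramid is rebuilt here (with the same assumed kernel and depth as the earlier sketch) so that the fragment runs on its own.

    % Sketch: intensity center-surround maps I(c,s) = |I(c) (-) I(s)|,
    % with c in {2,3,4} and s = c + delta, delta in {3,4}.
    img = im2double(rgb2gray(imread('scene.jpg')));
    G = fspecial('gaussian', 5, 1);
    pyr = cell(1, 9); pyr{1} = img;                   % pyr{k} holds scale k-1
    for k = 2:9
        sm = imfilter(pyr{k-1}, G, 'replicate');
        pyr{k} = sm(1:2:end, 1:2:end);
    end
    intensityMaps = {};
    for c = [2 3 4]
        for delta = [3 4]
            s = c + delta;
            coarse = imresize(pyr{s+1}, size(pyr{c+1}), 'bilinear'); % interpolate coarse to fine
            intensityMaps{end+1} = abs(pyr{c+1} - coarse);           % point-by-point difference
        end
    end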

    Extraction of early visual features

With r, g, and b being the red, green, and blue channels of the input image, an intensity image I is obtained as I = (r + g + b)/3. I is used to create a Gaussian pyramid I(σ), where σ ∈ [0..8] is the scale. The r, g, and b channels are normalized by I in order to decouple hue


from intensity. However, because hue variations are not perceivable at very low luminance (and hence are not salient), normalization is only applied at the locations where I is larger than 1/10 of its maximum over the entire image (other locations yield zero r, g, and b). Four broadly-tuned color channels are created: R = r - (g + b)/2 for red, G = g - (r + b)/2 for green, B = b - (r + g)/2 for blue, and Y = (r + g)/2 - |r - g|/2 - b for yellow (negative values are set to zero). Four Gaussian pyramids R(σ), G(σ), B(σ), and Y(σ) are created from these color channels. The intensity feature maps are then computed as

I(c, s) = |I(c) ⊖ I(s)|

    A second set of maps is similarly constructed for the color

    channels, which, in cortex, are represented using a so-called color

    double-opponent system: In the center of their receptive fields,

    neurons are excited by one color (e.g., red) and inhibited by another

    (e.g., green), while the converse is true in the surround. Such spatial

    and chromatic opponency exists for the red/green, green/red,

blue/yellow, and yellow/blue color pairs in human primary visual cortex [12]. Accordingly, maps RG(c, s) are created in the model to simultaneously account for red/green and green/red double opponency, and BY(c, s) for blue/yellow and yellow/blue double opponency:

RG(c, s) = |(R(c) - G(c)) ⊖ (G(s) - R(s))|

BY(c, s) = |(B(c) - Y(c)) ⊖ (Y(s) - B(s))|
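A matching MATLAB sketch of these color double-opponency maps is given below; the broadly-tuned channels and their pyramids are rebuilt from a placeholder image so the fragment is self-contained, with the same assumed kernel and pyramid depth as the earlier sketches.

    % Sketch: color double-opponency maps RG(c,s) and BY(c,s).
    rgb = rand(480, 640, 3);                          % placeholder RGB input image
    r = rgb(:,:,1); g = rgb(:,:,2); b = rgb(:,:,3);
    R = r - (g + b)/2;  G = g - (r + b)/2;
    B = b - (r + g)/2;  Y = (r + g)/2 - abs(r - g)/2 - b;
    chan = {R, G, B, Y};  pyr = cell(4, 9);
    Gk = fspecial('gaussian', 5, 1);
    for i = 1:4                                       % dyadic pyramid per color channel
        pyr{i,1} = chan{i};
        for s = 2:9
            sm = imfilter(pyr{i,s-1}, Gk, 'replicate');
            pyr{i,s} = sm(1:2:end, 1:2:end);
        end
    end
    RGmaps = {}; BYmaps = {};
    for c = [2 3 4]
        for d = [3 4]
            s = c + d;
            up = @(m) imresize(m, size(pyr{1,c+1}), 'bilinear');                 % coarse -> fine
            RGmaps{end+1} = abs((pyr{1,c+1} - pyr{2,c+1}) - up(pyr{2,s+1} - pyr{1,s+1}));
            BYmaps{end+1} = abs((pyr{3,c+1} - pyr{4,c+1}) - up(pyr{4,s+1} - pyr{3,s+1}));
        end
    end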

Local orientation information is obtained from I using oriented Gabor pyramids O(σ, θ), where σ ∈ [0..8] represents the scale and θ ∈ {0°, 45°, 90°, 135°} is the preferred orientation [11]. (Gabor filters, which are the product of a cosine grating and a 2D Gaussian envelope, approximate the receptive field sensitivity profile (impulse response) of


orientation-selective neurons in primary visual cortex [12].) Orientation feature maps, O(c, s, θ), encode, as a group, local orientation contrast between the center and surround scales:

O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|

    In total, 42 feature maps are computed: six for intensity, 12 for

    color, and 24 for orientation. The predictions of saliency models, that

    is, which locations are most likely to be attended to, have been

compared at the quantitative level against the scan paths generated by

    human observers looking at the same images.

    Saliency Map

    The purpose of the saliency map is to represent the saliency at

    every location in the visual field by a scalar quantity and to guide the

    selection of attended locations, based on the spatial distribution of

    saliency. A combination of the feature maps provides bottom-up input

    to the saliency map, modelled as a dynamical neural network. One

    difficulty in combining different feature maps is that they represent a

    priori not comparable modalities, with different dynamic ranges and

    extraction. mechanisms. Also, because all 42 feature maps are

    combined, salient objects appearing strongly in only a few maps may

    be masked by noise or by less-salient objects present in a larger number

    of maps. In the absence of top-down supervision, we propose a map

    normalization operator, N(.), which globally promotes maps in which a

    small number of strong peaks of activity (conspicuous locations) is

present, while globally suppressing maps which contain numerous comparable peak responses.

The following steps are used to compute the normalization:


(1) normalizing the values in the map to a fixed range [0..M], in order to eliminate modality-dependent amplitude differences;

(2) finding the location of the map's global maximum M and computing the average m of all its other local maxima; and

(3) globally multiplying the map by (M - m)².

    Only local maxima of activity are considered, such that N(.)

compares responses associated with meaningful activation spots in the map and ignores homogeneous areas. Comparing the maximum activity in the entire map to the average over all activation spots measures

    how different the most active location is from the average. When this

    difference is large, the most active location stands out, and the map is

    strongly promoted. When the difference is small, the map contains

    nothing unique and is suppressed. The biological motivation behind the

    design of N(.) is that it coarsely replicates cortical lateral inhibition

    mechanisms, in which neighbouring similar features inhibit each other

    via specific, anatomically defined connections [13].

The motivation for the creation of three separate channels I, O and C, and their individual normalization, is the hypothesis that similar features compete strongly for saliency, while different modalities contribute independently to the saliency map. The three maps are normalized and summed into the final input S to the saliency map:

S = 1/3 (N(I) + N(O) + N(C))

where N represents the normalization operator.
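A minimal MATLAB sketch of this normalization operator N(.) and the final combination is shown below; the placeholder conspicuity maps and the use of imregionalmax to find local maxima are illustrative assumptions.

    % Sketch: map normalization operator N(.) and combination into the saliency map S.
    conspI = rand(60, 80); conspO = rand(60, 80); conspC = rand(60, 80);  % placeholders
    maps = {conspI, conspO, conspC};
    normed = cell(1, 3);
    for k = 1:3
        m = mat2gray(maps{k});                 % (1) normalize to a fixed range [0..1]
        M = max(m(:));                         % (2) global maximum ...
        locMax = imregionalmax(m);             %     ... and the other local maxima
        vals = m(locMax);  vals = vals(vals < M);
        if isempty(vals), mbar = 0; else, mbar = mean(vals); end
        normed{k} = m * (M - mbar)^2;          % (3) promote maps with few strong peaks
    end
    S = (normed{1} + normed{2} + normed{3}) / 3;   % S = 1/3 (N(I) + N(O) + N(C))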

Three images were taken by the camera, and saliency detection was performed on these images using the above method to find the single most attractive portion of each image. In this method, the first step is to compute the three different features, namely intensity, color and orientation; the three images are then processed with the Haar transform to remove noise. After that, for each of these features a Gaussian scale pyramid is computed to obtain scale-invariant features using the receptive fields. To obtain a real-time saliency detection system, the most computationally expensive parts are replaced by the calculation of the center-surround difference. Normalization is then performed, all the features are combined using linear combinations, and a saliency map is formed for each of the three images (different chairs and tables). This information is then stored in a database. The obstacle image is then compared with the images in memory: if a match is found, the system reports that the object or obstacle was found, and if no match is found, it returns no match. Using this proposed method, visually impaired people can recognize and track the objects in their surveillance area.

    4.1.5 Platform Used

MATLAB R2010a is used for implementing the image processing algorithm on the input image.

    4.1.5.1 About MATLAB

    MATLAB is a high performance language for technical

computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.


    MATLAB is an interactive system whose basic

    data element is a matrix. This allows formulating solutions to many

    technical computing problems, especially those involving matrix

    representations, in a fraction of the time it would take to write a program

    in a scalar non-interactive language such as C.

    The name MATLAB stands for Matrix Laboratory.

    MATLAB was written originally to provide easy access to matrix

    and linear algebra software that previously required writing

    FORTRAN programs to use. Today MATLAB incorporates state of

    the art numerical computation software that is highly optimized for

    modern processors and memory architectures.

    MATLAB is the computational tool of choice for research,

    development and analysis. MATLAB is complemented by a family of

    application-solutions called toolboxes. The Image Processing Toolbox is

    a collection of MATLAB functions that extend the capability of

    MATLAB environment for the solution of digital image processing

problems. Other toolboxes that are sometimes used to complement

    the Image Processing Toolbox are the Signal Processing, Neural

    Networks, Fuzzy Logic, and Wavelet Toolboxes.

    The power that MATLAB brings to digital image

    processing is an extensive set of functions for processing

    multidimensional arrays of which images are a special case.

    The MATLAB Desktop is the main working environment. It

    is a set of graphics tools for tasks such as running MATLAB

    commands, viewing output, editing and managing files and variables

    and viewing session histories.


    4.2 Comparison of Saliency Maps

The most important task in an object recognition system is to differentiate between the objects. The test image is compared against multiple images of objects so that the accuracy of the algorithm can be calculated. The training images of the different objects are saved to a folder named train database. Then the test image is captured. The threshold of the minimum coefficient of determination is set to 0.75, the value required for the system to recognize the image.

    Figure 4.3 Algorithm of Training and Testing Images Comparison


    Coefficient of Determination

The coefficient of determination R² is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model. The coefficient of determination R² (or sometimes r²) is another measure of how well the least squares equation

ŷ = b0 + b1x

performs as a predictor of y.

There are several different definitions of R², which are only sometimes equivalent. One class of such cases includes that of linear regression. In this case, if an intercept is included then R² is simply the square of the sample correlation coefficient between the outcomes and their predicted values, or, in the case of simple linear regression, between the outcomes and the values of the single regressor being used for prediction.

In such cases, the coefficient of determination ranges from 0 to 1. Important cases where the computational definition of R² can yield negative values, depending on the definition used, arise where the predictions being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept. Additionally, negative values of R² may occur when fitting non-linear trends to data [2]. In these instances, the mean of the data provides a fit to the data that is superior to that of the trend under this goodness-of-fit analysis.

    http://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficienthttp://en.wikipedia.org/wiki/Simple_linear_regressionhttp://en.wikipedia.org/wiki/Coefficient_of_determination#cite_note-1http://en.wikipedia.org/wiki/Coefficient_of_determination#cite_note-1http://en.wikipedia.org/wiki/Coefficient_of_determination#cite_note-1http://en.wikipedia.org/wiki/Goodness_of_fithttp://en.wikipedia.org/wiki/Goodness_of_fithttp://en.wikipedia.org/wiki/Goodness_of_fithttp://en.wikipedia.org/wiki/Goodness_of_fithttp://en.wikipedia.org/wiki/Coefficient_of_determination#cite_note-1http://en.wikipedia.org/wiki/Simple_linear_regressionhttp://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficienthttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regression

A data set has values yi, each of which has an associated modelled value fi (also sometimes written ŷi). The values yi are called the observed values and the modelled values fi are sometimes called the predicted values.

The "variability" of the data set is measured through different sums of squares:

the total sum of squares (proportional to the sample variance), SStot = Σi (yi − ȳ)²;

the regression sum of squares, also called the explained sum of squares, SSreg = Σi (fi − ȳ)²;

the sum of squares of residuals, also called the residual sum of squares, SSres = Σi (yi − fi)².

In the above, ȳ is the mean of the observed data, ȳ = (1/n) Σi yi, where n is the number of observations.

The notations SSR and SSE should be avoided, since in some texts their meaning is reversed to residual sum of squares and explained sum of squares, respectively.

The most general definition of the coefficient of determination is R² = 1 − SSres / SStot.

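Given this definition, a small MATLAB helper can compute R² between an observed and a predicted vector, for example two flattened saliency maps. This is only a sketch: the function name rsquared and the use of flattened maps are illustrative assumptions, not code taken from the project.

    function r2 = rsquared(y, f)
    % RSQUARED  Coefficient of determination R^2 = 1 - SSres/SStot
    % between observed values y and modelled (predicted) values f.
    y = double(y(:));                   % observed values as a column vector
    f = double(f(:));                   % predicted values as a column vector
    ss_res = sum((y - f).^2);           % residual sum of squares
    ss_tot = sum((y - mean(y)).^2);     % total sum of squares
    r2 = 1 - ss_res / ss_tot;
    end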

Hypothesis Test:

The null and alternative hypotheses are

Ho: ρ = 0 (no actual correlation; the null hypothesis)

Ha: ρ ≠ 0 (there is some correlation; the alternative hypothesis)
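A minimal MATLAB sketch of this test uses corrcoef, which returns both the sample correlation and its p-value. The variables trainImg and testImg below are assumed to be two equal-sized grayscale images already loaded into the workspace; they are not part of the project's actual code.

    % Test Ho: rho = 0 against Ha: rho ~= 0 for two flattened images.
    a = double(trainImg(:));        % trainImg, testImg: assumed grayscale images
    b = double(testImg(:));         % of the same size, loaded beforehand
    [R, P] = corrcoef(a, b);        % R(1,2): sample correlation, P(1,2): p-value
    if P(1, 2) < 0.05
        disp('Reject Ho: the images are significantly correlated.');
    else
        disp('Do not reject Ho: no significant correlation.');
    end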

By using the coefficient of determination algorithm, the comparison is made between the training and testing images. The most important task in the object recognition system is to differentiate between objects, so the test image is compared against images of different objects in order to calculate the accuracy of the algorithm. The training images of the different objects are saved to the folder Train Database, and the test images are stored in a separate folder named Test Database. The coefficient of determination algorithm is then applied to compare the objects stored in the two databases.

Figure 4.4 Train Image Folder for Different Training Images


The testing images are then saved to a folder named Testing Database, which consists of different images of chairs, tables and other furniture. The images are stored after saliency detection has been performed. The comparison between the two folders is then made using the coefficient of determination algorithm.

Figure 4.5 Test Image Folder for Different Test Images
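The sketch below illustrates this folder-to-folder comparison: one test image is compared with every training image, and the best match is accepted only if its coefficient of determination reaches the 0.75 threshold mentioned in Section 4.2. The folder names, file pattern, image size and the rsquared helper defined earlier are illustrative assumptions, not the project's exact code.

    % Compare one test image against every image in the training database.
    % The stored images are assumed to be color (RGB) JPEG files.
    testImg = im2double(rgb2gray(imread(fullfile('TestDatabase', 'test1.jpg'))));
    testImg = imresize(testImg, [256 256]);          % common size for comparison
    files   = dir(fullfile('TrainDatabase', '*.jpg'));
    bestR2   = -Inf;
    bestName = '';
    for k = 1:numel(files)
        trainImg = im2double(rgb2gray(imread(fullfile('TrainDatabase', files(k).name))));
        trainImg = imresize(trainImg, [256 256]);
        r2 = rsquared(testImg, trainImg);            % helper from the previous section
        if r2 > bestR2
            bestR2   = r2;
            bestName = files(k).name;
        end
    end
    if bestR2 >= 0.75
        fprintf('Matched object: %s (R^2 = %.3f)\n', bestName, bestR2);
    else
        fprintf('No match above the 0.75 threshold (best R^2 = %.3f)\n', bestR2);
    end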

    4.3 Graphical User Interface Design

A graphical user interface (GUI) can be described as a graphical display that contains devices, or components, that enable a user to perform interactive tasks without writing a script or typing commands at the command line. These components can be push buttons, menus, toggle buttons, toolbars, checkboxes, radio buttons, sliders and so on.


Data can also be displayed in graphical form as plots or groups. The user need not know the details of the underlying task. A simple GUI supported by MATLAB, with its rich set of tools, is shown in Figure 4.6.

    Figure 4.6 GUI Supported by MATLAB

Creating a GUI using MATLAB's Graphical User Interface Development Environment (GUIDE) is divided into two relatively independent tasks:

1) GUI component layout

2) GUI programming


In GUI component layout, GUIDE enables the user to lay out the GUI as required. It involves clicking and dragging components from the component palette to the layout area. These components can be aligned, resized, given a tab order and so on, using tools accessible from the Layout Editor. Saving this GUI layout generates a MATLAB M-file which controls how the GUI works. This and subsequent activities constitute the GUI programming task. The generated M-file provides code to initialize the GUI when it is launched and contains a framework for the GUI callbacks: the routines that execute in response to user-generated events such as a mouse click. Adding code to the callback functions using the M-file editor enables the GUI to perform its intended operations.

    A graphical user interface provides the user with a familiar

    environment in which to work. This environment contains

    pushbuttons, toggle buttons, lists, menus, textboxes and so forth, all

    of which are already familiar to the user, so that he or she can

    concentrate on using the application rather than on the mechanics

    involved in doing things.

    However, GUIs are harder for the programmer because a

GUI-based program must be prepared for mouse clicks (or possibly keyboard input) for any GUI element at any time. Such inputs are

    known as events, and a program that responds to events is said to be

    event driven.


Three principal elements are required to create a MATLAB Graphical User Interface:

    1. Components: Each item on a MATLAB GUI (pushbuttons, labels,

    edit boxes, etc.) is a graphical component. The types of components

    include graphical controls (pushbuttons, edit boxes, lists, sliders, etc.),

    static elements (frames and text strings), menus, and axes. Graphical

    controls and static elements are created by the function uicontrol, and

    menus are created by the functions uimenu and uicontextmenu. Axes,

    which are used to display graphical data, are created by the function axes.

2. Figures: The components of a GUI must be arranged within a

    figure, which is a window on the computer screen. In the past, figures

    have been created automatically whenever we have plotted data.

    However, empty figures can be created with the function figure and can

    be used to hold any combination of components.

3. Callbacks: Finally, there must be some way to perform an action if a user clicks the mouse on a button or types information on a keyboard. A mouse click or key press is an event, and the MATLAB program must respond to each event if the program is to perform its function. For example, if a user clicks on a button, that event must cause the MATLAB code that implements the function of the button to be executed. The code executed in response to an event is known as a callback. There must be a callback to implement the function of each graphical component on the GUI.
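As a minimal illustration of a component plus its callback, the sketch below creates a pushbutton programmatically with uicontrol. It is only a sketch, not the GUIDE-generated code used in this project; the window name, button label and positions are chosen for illustration.

    % A figure containing one pushbutton; the callback runs on every click.
    f = figure('Name', 'Callback demo', 'Position', [200 200 300 120]);
    uicontrol(f, 'Style',    'pushbutton', ...
                 'String',   'SALIENCY', ...
                 'Position', [100 40 100 40], ...
                 'Callback', @(src, event) disp('Saliency button pressed'));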


    Creating and Displaying a Graphical User Interface

MATLAB GUIs are created using a tool called guide, the GUI Development Environment. This tool allows a programmer to lay out the GUI, selecting and aligning the GUI components to be placed in it. Once the components are in place, the programmer can edit their properties: name, color, size, font, text to display and so forth. When guide saves the GUI, it creates a working program, including skeleton functions that the programmer can modify to implement the behavior of the GUI. When guide is executed, it creates the Layout Editor. The large white area with grid lines is the layout area, where a programmer can lay out the GUI.

The Layout Editor window has a palette of GUI components along the left side of the layout area. A user can create any number of GUI components by first clicking on the desired component and then dragging its outline in the layout area. The top of the window has a toolbar with a series of useful tools that allow the user to distribute and align GUI components, modify the properties of GUI components, add menus to GUIs, and so on. The components used and their functions are:

Pushbuttons: A pushbutton is a component that a user can click on to

    trigger a specific action. The pushbutton generates a callback when

    the user clicks the mouse on it. A pushbutton is created by creating a

    uicontrol whose style property is 'pushbutton'. A pushbutton may be

    added to a GUI by using the pushbutton tool in the Layout Editor.


Figure 4.7 Layout of a simple GUI with a Pushbutton

Edit Boxes: An edit box is a graphical object that allows a user to enter a text string. The edit box generates a callback when the user presses the Enter key after typing a string into the box. An edit box is created by creating a uicontrol whose style property is 'edit'. An edit box may be added to a GUI by using the edit box tool in the Layout Editor.

    Figure 4.8 Layout of a simple GUI with an Edit box
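A comparable sketch for an edit box, whose callback reads back whatever string the user typed, is shown below; the window name, default string and positions are purely illustrative assumptions.

    % An edit box whose callback fires when the user presses Enter.
    f = figure('Name', 'Edit box demo', 'Position', [200 200 320 120]);
    uicontrol(f, 'Style',    'edit', ...
                 'String',   'image name', ...
                 'Position', [60 45 200 30], ...
                 'Callback', @(src, event) fprintf('You typed: %s\n', get(src, 'String')));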

The GUI designed is shown in Figure 4.9. The GUI includes a BROWSE pushbutton for loading the input images stored in the database. The SALIENCY pushbutton, when clicked, runs the


process of saliency detection and displays both the original and the saliency images. An image of one of the chairs, tables or other furniture is uploaded as the input, the SALIENCY pushbutton is pressed, and the processed image is then shown in the display area. The BROWSE pushbutton, when clicked, runs the process of selecting a saliency image stored in the testing database and displays the image name in the display area.

    Figure 4.9 GUI Design

The COMPARISON pushbutton, when clicked, runs the process of comparing the training and testing databases, displays the training and testing images, and draws a bounding box around the matched image in the training database; the testing image is compared and the bounding box is drawn around the corresponding image stored among the training images. The REFRESH pushbutton, when clicked, resets the display.


    CHAPTER 5

    RESULTS

The proposed methodology for recognizing objects is implemented in MATLAB and the outcomes are discussed below.

    INTENSITY

    An intensity image is a data matrix, I, whose values represent

    intensities within some range. An intensity image is represented as a

    single matrix, with each element of the matrix corresponding to one

    image pixel. The matrix can be of class double, uint8, or uint16.


    Figure 5.1 Intensity
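A short sketch of producing such an intensity image from a color photograph is given below; the file name is hypothetical.

    rgb  = imread('chair1.jpg');   % hypothetical input image (RGB)
    gray = rgb2gray(rgb);          % uint8 intensity image
    I    = im2double(gray);        % intensity values rescaled to the range [0, 1]
    figure, imshow(I), title('Intensity');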


    COLOR INTENSITY:

The values in a binary, intensity, or RGB image can be of different data types. The data type of the image values determines which values correspond to black and white, as well as the absence or saturation of color. The following figures show the color intensity for the different furniture, tables and chairs.


    Figure 5.2 Color Intensity
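The project's color channels follow the Itti-Koch model; the sketch below only separates the raw red, green and blue planes of an RGB image to illustrate per-channel intensity, and the file name is hypothetical.

    rgb = im2double(imread('chair1.jpg'));   % hypothetical input image (RGB)
    R = rgb(:, :, 1);                        % red channel intensity
    G = rgb(:, :, 2);                        % green channel intensity
    B = rgb(:, :, 3);                        % blue channel intensity
    figure, imshow([R G B]), title('R, G and B channel intensities');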


    ORIENTATION:

Orientation is the process of rotating the images by different angles such as 35, 90 and 125 degrees. The following figures show the 35 degree orientation for different chairs, tables and furniture.


    Figure 5.3 35 degree Orientation
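A sketch of producing the rotated views shown in Figures 5.3 to 5.5 is given below; the file name is hypothetical.

    I = imread('chair1.jpg');                         % hypothetical input image
    for angle = [35 90 125]
        J = imrotate(I, angle, 'bilinear', 'crop');   % rotate, keep original size
        figure, imshow(J), title(sprintf('%d degree orientation', angle));
    end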


The following figures show the 125 degree orientation for different chairs, tables and furniture.


    Figure 5.4 125 degree orientation


The following figures show the 90 degree orientation for different chairs, tables and furniture.


    Figure 5.5 90 degree orientation


    IMAGE PYRAMIDS:

Image pyramids are used to represent images at more than one resolution. The following figure shows the image pyramids at four levels for different chairs, tables and furniture.

    Figure 5.6 Image Pyramid
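A sketch of building a four-level pyramid by repeated Gaussian-weighted reduction is shown below; the file name is hypothetical.

    I = im2double(rgb2gray(imread('chair1.jpg')));    % hypothetical input image
    pyramid = cell(1, 4);
    pyramid{1} = I;                                   % level 1: full resolution
    for level = 2:4
        pyramid{level} = impyramid(pyramid{level - 1}, 'reduce');   % halve each level
    end
    figure, imshow(pyramid{4}), title('Coarsest pyramid level');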


    HAAR VERTICAL TRANSFORM:

The Haar transform is a certain sequence of rescaled "square-shaped" functions which together form a wavelet family, or basis. The following figure shows the Haar vertical transform for different chairs, tables and furniture.

    Figure 5.7 Haar Vertical Transform
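A sketch of a single-level 2-D Haar decomposition with the Wavelet Toolbox is shown below; the file name is hypothetical. Repeating the step on the approximation coefficients cA yields the multi-level transform shown in Figure 5.9.

    I = im2double(rgb2gray(imread('chair1.jpg')));    % hypothetical input image
    [cA, cH, cV, cD] = dwt2(I, 'haar');               % approximation, horizontal,
                                                      % vertical and diagonal details
    figure, imshow(mat2gray(cV)), title('Haar vertical detail');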


    HAAR TRANSFORMED IMAGE:

The following figure shows the transformed image for different chairs, tables and furniture.

Figure 5.8 Haar Transformed Image


    HAAR TRANSFORM:

The following figure shows the Haar transform for different chairs, tables and furniture at three levels. The transformed image is transformed again to produce the next level of the decomposition.

    Figure 5.9 Haar Transform


    HISTOGRAM:

For each gray level, the number of pixels having that level is counted, and for each level a stick (bar) represents that count; nearby levels can also be grouped into a bin and the number of pixels in the bin counted. The following figure shows the histogram for a ball, a mouse and a glass. The histogram plots the gray levels on the x-axis and the number of pixels at each gray level on the y-axis.

    Figure 5.10 Histogram
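A sketch of computing such a gray-level histogram is given below; the file name is hypothetical.

    I = rgb2gray(imread('ball.jpg'));   % hypothetical input image
    figure, imhist(I);                  % x-axis: gray level, y-axis: pixel count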


    HISTOGRAM EQUALIZATION:

The main objective of histogram equalization is that, after the transformation, the histogram becomes approximately uniform. The following figure shows the histogram equalization of the ball, mouse and glass images. The histogram obtained after equalization is spread out over the entire scale of gray levels.

    Figure 5.11 Histogram Equalization
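A sketch of equalizing an image and comparing the histograms before and after is given below; the file name is hypothetical.

    I = rgb2gray(imread('ball.jpg'));   % hypothetical input image
    J = histeq(I);                      % spread gray levels over the full scale
    figure
    subplot(1, 2, 1), imhist(I), title('Before equalization');
    subplot(1, 2, 2), imhist(J), title('After equalization');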


    SALIENCY MAP:

A saliency map represents saliency at all locations with a scalar quantity. Saliency refers to the visually attractive portion of an image. In the figure below, the saliency of a particular image is shown for the ball, mouse and glass.

    Figure 5.12 Saliency Map
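The saliency maps in this project come from the full Itti-Koch pipeline described earlier; the sketch below is only a much simplified center-surround approximation, with the file name and filter sizes chosen purely for illustration.

    I = im2double(rgb2gray(imread('ball.jpg')));                          % hypothetical input image
    center   = imfilter(I, fspecial('gaussian', 9, 2),   'replicate');    % fine scale
    surround = imfilter(I, fspecial('gaussian', 41, 10), 'replicate');    % coarse scale
    saliency = mat2gray(abs(center - surround));                          % normalized to [0, 1]
    figure, imshow(saliency), title('Simplified saliency map');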


    COMPARISON OUTPUT:

The figure shown below consists of the training images and the testing image. The different types of chairs, tables and furniture are stored as the training images for comparison purposes, and the testing image is the one to be compared with the training images. The detected result is shown by drawing a bounding box around it, and the message "This is the matched Object" is also displayed.

    Figure 5.13 Comparison of Saliency Maps


    GUI DESIGN:

The GUI is used to get the input image and perform the saliency detection and comparison between the different chairs, tables and furniture.

Figure 5.14 GUI Design for Proposed Methodology

The above figure shows the GUI design of the proposed methodology. To load an input image, the user clicks BROWSE, which retrieves the input from the databases and displays, in the edit box, the name of the image chosen for the saliency and comparison process; the GUI then looks as shown below. Clicking SALIENCY runs the saliency process and displays the saliency output, as shown in Figure 5.12, and clicking the COMPARISON pushbutton shows the comparison output, as shown in Figure 5.13.


    Figure 5.15 GUI Design for Displaying the input image

Thus, by using these outputs, the object can be recognized very accurately, and the correct object among the different objects can be detected easily for the visually impaired, since the objects are already stored in both the training and testing databases. The comparison result shows both the training and testing images, with the correct object indicated by a bounding box drawn around the training image that matches the given testing image; finding the bounding box among the training images is what allows the object to be correctly detected.

ERROR RATE FOR THE PROPOSED METHODOLOGY

Total number of images    Matched    Unmatched
50                        50         0

As all 50 images were correctly recognized, the success rate for the proposed methodology is 100%.


    CHAPTER 6

    CONCLUSION AND FUTURE ENHANCEMENTS

The goal of this master's work was to develop algorithms for detecting objects in a real-time environment so that visually impaired people can recognize the object in front of them and avoid it while moving from one place to another. From the literature survey, a comparative study of the implemented methodologies was made, and the methods giving better performance were chosen in this work to recognize the object in front of the visually impaired person. The methods were analysed on the basis of accuracy and overall performance.

    6.1 CONCLUSION

The algorithm is implemented on input images captured from the camera, and those images are stored in a database. There are 15 images in total in the training database, and the comparison was made between the testing and training images. Before storing the images in the databases, saliency detection was performed for the different chairs, tables and furniture. With this project, visually impaired people can detect the obstacles in front of them and move about without anybody's help. Several steps are carried out for this. In the first step, visual saliency detection is performed, as given in the aim of the project: linear filtering over the color, intensity and orientation channels is applied; image pyramids are then built by image reduction; the Haar transform is applied; center-surround differences are computed using Difference of Gaussian and Gabor filters; the results are normalized; and the saliency map is finally obtained using


the method of Itti, Koch et al. The objects were captured, all of the above steps were performed, and the comparison was made for those objects using image processing techniques in MATLAB. All images are stored in the database. The comparison was made for the images, and the output displays both the training and testing images in one figure; the object-matched window shows the correct detection by drawing a bounding box around the matched object, i.e. the training image corresponding to the testing image being compared. Thus, with this methodology, object recognition becomes easy and accurate for those who are unable to identify the object in front of them, helping them avoid objects while moving from one place to another.

6.2 FUTURE ENHANCEMENTS

In future work, audio saliency detection can be added. Just as in visual saliency detection, saliency can be computed for sound, so that the attention-grabbing part of the sound can be detected in order to recognize the sound of the object or of the person standing in front of the user; with this audio saliency detection, sound and speaker recognition can be performed. For example, if the object is a chair, the device will recognize it, announce that it is a chair, and tell the visually impaired person to avoid the object and take another route to their destination. With these enhancements, visually impaired people would not need any assistance, whether dependent or independent, when moving from one place to another. The system should also be able to report the location, distance and direction of items in the room, such as equipment, furniture, doors and even other users.


    REFERENCES:

[1] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4, 4 (1985), 219-227.

[2] ITTI, L., KOCH, C., AND NIEBUR, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254-1259.

[3] ITTI, L., AND KOCH, C. Computational modelling of visual attention. Nature Reviews Neuroscience 2, 3 (March 2001), 194-203.

[4] FRINTROP, S. VOCUS: A visual attention system for object detection and goal-directed search. Lecture Notes in Artificial Intelligence (LNAI), Vol. 3899 (2006).

[5] FRINTROP, S., AND ROME, E. Simulating visual attention for object recognition. In Proceedings of the Workshop on Early Cognitive Vision (2004), Isle of Skye, Scotland.

[6] FRINTROP, S., NUCHTER, A., SURMANN, H., AND HERTZBERG, J. Saliency-based object recognition in 3D data. Isle of Skye, Scotland.

[7] ITTI, L., AND KOCH, C. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging 10, 1 (January 2001), 161-169.


[8] KOCH, C., AND ULLMAN, S. Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4, 4 (1985), 219-227.

[9] VANRULLEN, R. Visual saliency and spike timing in the ventral visual pathway. Journal of Physiology Paris 97, 2-3 (Mar-May 2003), 365-377.

[10] GONZALEZ, R.C., AND WOODS, R.E. Digital Image Processing, pp. 525-626, Pearson Prentice Hall, Upper Saddle River, New Jersey, 2008.

[11] European Blind Union (2002). Statistical Data on blind and partially sighted people in European countries. http://www.euroblind.org/fichiersGB/STAT.html

[12] DODSON, A.H., MOORE, T., AND MOON, G.V. (1999). A Navigation System for the Blind Pedestrian. In Proceedings of GNSS 99, 3rd European Symposium on Global Navigation Satellite Systems, pp. 513-518, Genoa, Italy, October 1999.

[13] SHOVAL, S., ULRICH, I., AND BORENSTEIN, J. (2000). Computerized Obstacle Avoidance Systems for the Blind and Visually Impaired. Invited chapter in Intelligent Systems and Technologies in Rehabilitation Engineering. Editors: Teodprescu,
