CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION
While tracking a target in realistic physical environments, the
sensor information related to the target is corrupted by incorrect data
arising from thermal noise, false alarms, clutter, occlusions and shadows.
Consequently, tracking performance degrades and the resulting tracking
errors are often far worse than those predicted by the tracking filter’s error
covariance matrix. The proposed research work provides solutions for
efficient tracking of targets in a radar sensor network, a wireless sensor
network and a camera sensor network. This chapter gives an overview of
existing techniques used for target tracking in various sensor networks. The
taxonomy of single and multiple target tracking techniques is presented, and
the various requirements and challenges in the design of target tracking
algorithms are also discussed. Section 2.2 covers tracking of targets in a
radar sensor network, Section 2.3 covers tracking of targets in a Wireless
Sensor Network (WSN), and Section 2.4 covers tracking of targets in a camera
sensor network.
2.2 TRACKING OF TARGETS UNDER RADAR SENSOR
NETWORK
Multiple Target Tracking (MTT) is an important topic in radar
surveillance, since many applications are based upon it, such as remote
sensing and observation systems, ground-based target recognition, detection
and tracking, vehicle speed detection and highway safety, target tracking in
Air Traffic Control (ATC), aircraft safety, electronic warfare, and ship safety
and navigation.
2.2.1 Existing Algorithms
Data association is the basic problem of MTT. Various methods for
multiple target tracking that have been analyzed in the literature are
described below.
Figure 2.1 shows the classification of the literature on target
tracking in radar sensor networks.
The existing literature on target tracking in radar sensor networks
can be broadly classified into maneuvering and non-maneuvering target
tracking, for both single and multiple targets. Various
methodologies such as data association, position estimation and classification
techniques are available for tracking multiple targets. This thesis mainly
focuses on data association and position estimation.
In MTT, the data (location information represented in spherical
coordinates) produced by the same source are identified and partitioned into
sets of tracks. MTT also determines the number of targets and parameters such
as position, velocity and acceleration for each track (Blackman 1986). Li and
Jilkov (2000) presented a methodology for identifying new targets, creating
new plots and updating existing tracks on each scan. Observations that are
not assigned to existing tracks are used to form new tentative tracks. Once a
tentative track is formed from the observations, it is updated by successive
scans. The gate size and the time allowed for confirming an observation can
be chosen as functions of the confidence in the validity of the original
observation. A track that is not updated by successive scans has to be
deleted.
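The track life cycle described above (tentative, confirmed, deleted) can be illustrated with a minimal sketch. The thresholds `confirm_hits` and `max_misses`, as well as the names `Track`, `step_track` and `run_scans`, are hypothetical choices made for this illustration and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    status: str = "tentative"  # "tentative", "confirmed" or "deleted"
    hits: int = 1              # scans in which the track received an observation
    misses: int = 0            # consecutive scans without an observation


def step_track(track, updated, confirm_hits=3, max_misses=2):
    """Advance a track by one scan (thresholds are illustrative assumptions)."""
    if track.status == "deleted":
        return track
    if updated:
        track.hits += 1
        track.misses = 0
        # A tentative track is confirmed once enough scans have updated it.
        if track.status == "tentative" and track.hits >= confirm_hits:
            track.status = "confirmed"
    else:
        track.misses += 1
        # A track that is not updated by successive scans is deleted.
        if track.misses >= max_misses:
            track.status = "deleted"
    return track


def run_scans(track, update_flags, **thresholds):
    """Apply a sequence of per-scan update flags and return the final status."""
    for flag in update_flags:
        step_track(track, flag, **thresholds)
    return track.status
```

For example, a track created from an unassigned observation and updated on the next two scans becomes confirmed under these assumed thresholds, whereas a track that receives no observation on two successive scans is deleted.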
Bar-Shalom and Fortmann (1988) explained that when tracking is
performed in an environment that contains clutter and/or more than one
object, the measurements need to be associated with the correct tracks. Not all
measurements convey information about the tracked object and the
measurements that are not informative about the tracked object are called
clutter. Determining which measurements are informative and which are not
is usually referred to as data association. As a result of this process, data
association produces a set of tracks for a target.
Blackman (1986) and Blackman et al (1993) described multiple target
tracking with radar applications and the data association of sensor data for
individual targets. When multiple observations are received by the tracking
system, it is necessary to assign each incoming observation report to a
specific target track. A popular mechanism for classifying reports is the
“nearest-neighbor rule” (Liggins et al 2009). The idea of the rule is to
estimate each target position at the time of a new position report, and then
assign that report to the target nearest to that estimate.
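The nearest-neighbor rule can be sketched in a few lines. This is an illustration only: the function name, the dictionary of predicted positions and the optional `gate` distance are assumptions introduced here, and Euclidean distance is used where a practical tracker would typically use a statistical distance.

```python
import math


def nearest_neighbor_assign(report, predicted_positions, gate=None):
    """Return the id of the track whose predicted position is nearest to the report.

    report              -- measured position, e.g. (x, y)
    predicted_positions -- dict mapping track id to predicted position at report time
    gate                -- optional maximum distance (hypothetical parameter);
                           reports farther than this from every track get None
    """
    best_id, best_dist = None, math.inf
    for track_id, predicted in predicted_positions.items():
        dist = math.dist(report, predicted)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    if gate is not None and best_dist > gate:
        return None
    return best_id
```

A report that falls outside the gate of every track is returned unassigned, which corresponds to the case in which an observation is used to start a new tentative track.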
Bar-Shalom and Tse (1975) have proposed an all-neighbor PDA
approach to correlate sensor data under the assumption of a single target. The
PDA method is based on computing the posterior probability of each
candidate measurement found in a validation gate, assuming that only one real
target is present and all other measurements are Poisson-distributed clutter.
The PDA and its extension, JPDA (Blackman 1986), are used for tracking
single and multiple targets respectively. In the JPDA method, joint posterior
probabilities are computed for multiple targets in Poisson clutter.
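To make the PDA computation concrete, the sketch below evaluates the association probabilities for a single target with a scalar measurement, using the commonly quoted parametric form with a Poisson clutter model. The function name and parameter names (`p_detect`, `p_gate`, `clutter_density`) are assumptions for this illustration, and the scalar case is chosen only to keep the example short.

```python
import math


def pda_weights(innovations, innovation_var, p_detect, p_gate, clutter_density):
    """Association probabilities for one target and scalar measurements.

    innovations     -- measurement minus predicted measurement, one per gated measurement
    innovation_var  -- innovation variance S (scalar case)
    p_detect        -- probability that the target is detected
    p_gate          -- probability that the true measurement falls inside the gate
    clutter_density -- spatial density of Poisson clutter

    Returns (beta0, betas): beta0 is the probability that no gated measurement
    originated from the target; betas[i] is the probability that measurement i did.
    """
    # Ratio of the Gaussian target-origin likelihood to the clutter density.
    likelihood_ratios = [
        p_detect * math.exp(-0.5 * nu * nu / innovation_var)
        / (math.sqrt(2.0 * math.pi * innovation_var) * clutter_density)
        for nu in innovations
    ]
    miss = 1.0 - p_detect * p_gate
    total = miss + sum(likelihood_ratios)
    beta0 = miss / total
    betas = [lr / total for lr in likelihood_ratios]
    return beta0, betas
```

The state update then weights each innovation by its probability, which is what distinguishes this all-neighbor approach from the nearest-neighbor rule.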
However, these methods are computationally heavy and have no
explicit provision for track initiation. Although many association (Smith and
Sameer 2006) and tracking algorithms (Liggins et al 2009) have been
suggested, it is still difficult to generate and maintain tracks in practice
(Musicki 2007).
Fortmann et al (1983) proposed a new JPDA algorithm for multiple
targets in clutter. This was a target oriented approach, in the sense that a set of
established targets is used to form gates in the measurement space and to
compute posterior probabilities.
Roecker et al (1995) proposed a multiple scan or n-back scan JPDA
algorithm that addresses the problem of measurement-to-track data
association in a multiple target and clutter environment. It uses multiple
scans of measurements along with the present target information to produce
better weights for data association.
In MTT, there are a number of methods for classifying the observed
data into tracks. Multiple Hypothesis Tracking (MHT) uses a track splitting
technique for accurate decision making from the observed data (Musicki and
Suvorova 2008). Under this
MHT scheme, the tracking system does not have to commit immediately or
irrevocably to a single assignment of each report. If a report is highly
correlated with more than one track, an updated copy of each track can be
created; subsequent reports can be used to determine which assignment is
correct. As more reports come in, the track associated with the correct
assignment will rapidly converge on the true target trajectory, whereas the
falsely updated tracks are less likely to be correlated with subsequent reports.
The n-backscan MHT approach requires information collected from
the previous n scans for making a decision. Hence it needs more
memory for maintaining numerous track hypotheses (Feo et al 1997). The
main drawback of n-backscan MHT is the exponential increase in
computational complexity and memory requirement. Bar-Shalom et al (2007)
discussed several theoretical issues relating to the score function for the
measurement-to-track association/assignment decision in the track-oriented
version of the MHT. The score function is the ratio of the Probability Density
Function (PDF) of a measurement having originated from a track to the PDF
of the measurement having a different origin, and is called the likelihood
ratio.
When the system is linear with additive Gaussian noise, there exists
an analytical solution to the Bayesian time and measurement update equations.
The solution is given by the Kalman Filter (KF). Many books describing
different aspects of the KF exist (Simon 2001). Since the system is linear and
Gaussian, the updated density remains Gaussian and is therefore completely
described by its first two moments (mean and covariance). The update
equations thus consist of a mean update and a covariance update. The KF was
introduced by Kalman (1960, 1961), and both continuous-time and discrete-time
versions were available early on. Much of the classical theory is described
in Anderson and Moore (1979). In the discretized linearization (Gustafsson
2000), the non-linear continuous-time system is first linearized and the
linearized system is then discretized. Anderson and Moore (1979) and
Bar-Shalom and Li (1993) have discussed the Extended Kalman Filter (EKF) for
the discrete-time case.
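The mean and covariance updates can be illustrated with a scalar, discrete-time sketch. The model assumed here is x_k = a x_{k-1} + w with process noise variance q, and z_k = h x_k + v with measurement noise variance r; the function and parameter names are choices made for this illustration and do not come from the cited texts.

```python
def kf_predict(mean, var, a=1.0, q=0.0):
    """Time update: propagate the mean and variance through x_k = a * x_{k-1} + w."""
    return a * mean, a * a * var + q


def kf_update(mean, var, z, h=1.0, r=1.0):
    """Measurement update for z = h * x + v, where v has variance r."""
    innovation = z - h * mean       # measurement residual
    s = h * h * var + r             # innovation variance
    gain = var * h / s              # Kalman gain
    new_mean = mean + gain * innovation
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var
```

With a prior of mean 0 and variance 1 and a measurement of 2 with unit noise variance, the gain is 0.5, so the updated mean is 1 and the updated variance is 0.5: the estimate moves halfway to the measurement because the prior and the measurement are equally uncertain.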
Farina et al (2002) compared the estimation performance
(error mean and standard deviation; consistency test) of nonlinear filters such
as the Extended Kalman Filter (EKF), statistical linearization, particle
filtering, and the Unscented Kalman Filter (UKF).
Singer et al (1974) proposed a new optimal filter for target tracking
in a dense multitarget environment. The sensitivity of the tracking accuracy
of the KF to data rate, maneuver magnitude, maneuver correlation coefficient,
and single-look measurement accuracy is also discussed.
The problems and issues involved in multitarget ocean tracking
using a heterogeneous set of passive acoustic measurements are outlined by
Fortmann and Baron (1979). They have also described an approach to solving the
data association and maneuver detection problems. Their method uses an EKF
with both geographic and acoustic states, and handles measurement vectors
such as bearing/frequency and delay/Doppler difference.
To resolve the problem of track-to-track association in a distributed
multisensor situation, He and Zhang (2006) presented independent and
dependent sequential track correlation algorithms based on those of Singer
(1970) and Bar-Shalom (1981). Building on the sequential track
correlation algorithm, their paper also explains the restricted memory and
attenuation memory track correlation algorithms and the sequential classic
assignment rules. In environments with dense targets, interfering noise and
track crossings, the correlation performance of the sequential algorithms is
much better than that of Singer (1970) and Bar-Shalom (1981), at the cost of
a little more computation and memory. The computational complexity of these
algorithms increases as the environmental parameters under consideration
increase. Also, the performance of these algorithms degrades as the number
of targets increases.
Keuk (1998) derived an optimal combinational method which can
be used under different operating conditions. The method, which is related to
MHT, uses a sequential likelihood ratio test and benefits from processing
signal strength information. Multiscan data association can significantly
enhance tracking performance (Battistelli et al 2011) in critical radar
surveillance scenarios involving multiple targets, low detection probability,
high false alarm probability, evasive target maneuvers, and finite radar
resolution. Unfortunately, the multiscan data association approach suffers
from high dimensionality, which hinders its real-time application to tracking
problems with short scan periods, a large number of scans in the association
logic, and/or many measurements per scan. To address this, Battistelli et al
(2011) formulated multiscan association as a multi-commodity or
single-commodity flow optimization problem, which allows a relaxation of the
association problem that provides close-to-optimal association
performance.
Li and Bar-Shalom (1996) presented the conditional PDF of the
Nearest Neighbor (NN) measurement under the events of correct and
incorrect data association, the probabilities of these data association events,
and the propagation of the matrix mean square error conditioned on these
events. The development of the above mentioned recursion relies heavily on
these conditional PDFs and probabilities.
Feo et al (1997) described the Interacting Multiple Model Joint
Probabilistic Data Association (IMMJPDA) and MHT algorithms for improved
tracking performance. They also provided a performance comparison between
three tracking algorithms, namely Nearest Neighbour (NN) correlation with KF,
IMMJPDA and MHT, in terms of track maintenance probability and tracking
errors. Both IMMJPDA and MHT performed better than NN with KF.
Roecker and McGillem (1988) compared the state vector fusion
method and the measurement fusion method for fusing the tracks of two
different sensors and showed the reduction achieved in the covariance of the
filtered state vector by utilizing measurement fusion.
The decision-based techniques for maneuvering target tracking,
which appeared after the decision-free adaptive KF techniques based on a
single model, have become quite popular and have been studied extensively in
the literature (Bar-Shalom and Li 1993, Bar-Shalom et al 2001). In decision-
based approaches, the state estimation is based on a target motion model
determined by maneuver detection. Therefore, making reliable and timely
decisions is the key to these approaches for satisfactory state estimation.
Many algorithms and techniques have been developed for the detection of
maneuvers. A comprehensive survey of such decision-based approaches is
given by Li and Jilkov (2002).
Li et al (1999) proposed a general Multiple Model (MM) estimator
with a Variable Structure (VSMM), called Model Group Switching (MGS)
algorithm. In this estimator, a set of model groups is used, each representing a
collection or cluster of closely related system models. The set of models is
made adaptive by switching among these groups to follow possible jumps
(across groups) of the system mode in such a way that it balances well
between the needs to have the smallest delay in switching and to have a
minimum false switching rate.
Li and Jilkov (2001) surveyed the problems and
techniques of tracking maneuvering targets in the absence of
measurement-origin uncertainty. Mallick et al (2004) proposed a
solution to multi-target tracking problems in clutter with a probability of
detection less than unity using the track-oriented MHT. They also
presented multiple hypothesis distributed tracking algorithms for track
initialization, gating, hypothesis generation, track update, computation of
track likelihood, formation of global hypotheses, and pruning using the pseudo
measurement formulation.
Ru et al (2009) proposed a technique that gives explicit solutions
for Gaussian-mixture prior distributions and can be applied to arbitrary prior
distributions through Gaussian-mixture approximations. The approach
essentially utilizes prior information about the maneuver accelerations in
typical tracking engagements and thus allows improved detection
performance as compared to traditional maneuver detectors.
Zhu and Li (2003) presented a fusion rule for distributed
multi-hypothesis decision systems with communication patterns among sensors
and the fusion center. They also proposed a scheme for generating optimum
sensor rules and optimum fusion rules, which reduces computation
tremendously as compared with the commonly used exhaustive search. They
also provided a guideline for assigning sensors to nodes in a signal detection
network with a given communication pattern.
Myung (2003) explained Maximum Likelihood Estimation (MLE)
for parameter estimation in statistics and in particular, in non-linear modeling
with non-normal data. Hwang et al (2004) proposed a filter based on JPDA
for target-measurement correlation combined with an identity management
algorithm incorporating suitable local information, when available, in a
manner that decreases the uncertainty, as measured by system entropy.
Sathyan and Sinha (2011) presented a new two-stage algorithm for
multitarget tracking using multiple asynchronous passive sensors. The
proposed algorithm used local bearings-only (mono) tracks for each sensor
and combined these tracks to generate complete kinematic (stereo) tracks in
the Cartesian coordinate frame. Once stereo tracks have been formed, in the
second stage known as the stereo tracking stage, bearing measurements are
directly used to update the stereo tracks. They have also used the assignment
based technique to solve various data association problems that arise due to
measurement origin uncertainty.
Scala and Pulford (2005) have described an algorithm for tracking a
maneuvering target in heavy clutter and/or with a low probability of detection.
They have used a computationally efficient algorithm for multi-scan target
tracking based on the Viterbi algorithm, known as the Viterbi Data
Association (VDA) algorithm. Tugnait (2004) proposed a novel suboptimal
filtering algorithm by applying the basic IMM approach and the JPDA
technique for tracking of two highly maneuvering closely spaced targets.
Jeong and Tugnait (2005) presented a filtering algorithm by
applying basic IMM approach and the PDA technique to a two sensor (radar
and infrared) scheme for tracking a highly maneuvering target in a cluttered
environment. Musicki and Suvorova (2008) have described IMM-integrated
PDA (IMM-IPDA), IMM–Joint IPDA (IMM-JIPDA), and linear multitarget
IMM-IPDA (IMM-LMIPDA) filters for tracking maneuvering targets in
clutter. These algorithms use the IMM approximation for multiple model
target trajectory estimation and the PDA approximation to estimate target
trajectories of individual IMM models in clutter. All algorithms recursively
update the probability of target existence, which has been used for false track
discrimination.
For tracking heavily maneuvering targets, a KF with a single
motion model will not be sufficient. If the true observation is much farther
away from the prediction than the nearest false observation, the measurement
update will be incorrect and thus the next prediction will be in the wrong
direction. In order to deal with this problem, Bar-Shalom and Fortmann
(1988) have used multiple motion models.
Maneuvering target tracking is an important problem, because
target accelerations are generally unknown and structural variations also exist
as the target moves into or out of the maneuvering mode. Li and Jilkov (2000,
2001, 2002 and 2003) provided a comprehensive survey of the problems and
techniques of tracking maneuvering targets. In 2000, they presented a
survey of various mathematical models of target dynamics for maneuvering
target tracking, including 2D and 3D maneuver models as well as
coordinate-uncoupled generic models. This survey emphasized the underlying
ideas and assumptions of the models. Li and Jilkov (2001) covered the problems
and techniques of tracking maneuvering targets in the absence of
measurement-origin uncertainty. Li and Jilkov (2002) provided
a survey of maneuvering target motion models used for tracking ballistic
targets.
An improved IMMJPDA algorithm for tracking multiple
maneuvering targets in clutter has been analyzed by Mao et al (2006).
Musicki et al (2007) have implemented a near-optimal algorithm for tracking
a single maneuvering target in clutter. The algorithm integrates the target
existence paradigm with a multi-scan target state estimation algorithm. The
target trajectory estimation calculates the target’s a posteriori PDF based on
all possible measurement detection histories and all possible target
maneuvering model histories.
Puranik and Tugnait (2007) have presented tracking of multiple
maneuvering targets in the presence of clutter using switching multiple target
motion models. A novel suboptimal filtering algorithm was developed by
applying the basic IMM approach and multiscan-JPDA technique. It showed
significant improvement in target position estimates by the proposed IMM
multiscan-JPDA compared with the results of the single-scan IMM/JPDA
algorithm for closely spaced targets.
Gerasimos (2012) has explained a derivative-free Kalman filtering
approach, which is suitable for state estimation based control of a class of
nonlinear systems. The considered systems are first subject to a linearization
transformation, and next state estimation is performed by applying the
standard Kalman filter to the linearized model. The proposed method provides
estimates of the state vector of the nonlinear system without the need for
derivatives and Jacobians calculation and without using linearization
approximations.
From the literature, it is found that multiple target tracking methods
are computationally heavy and have no explicit provision for track initiation.
Hence, there is a need for an MTT scheme with lower computational
complexity and memory requirement, and with a new correlation logic that has
a better response time than the existing tracking schemes.
2.3 TRACKING OF TARGETS IN WIRELESS SENSOR
NETWORK
Recently, with the rapid development of wireless communication
technologies, wireless sensors have become more popular in military and
civilian systems. Civilian applications include air traffic and marine
control, navigation, and person/object tracking (Culler et al 2004, Zhang and
Cao 2004). Since sensor nodes are small and cheap, a large number of sensors
can be deployed in the field of interest to retrieve real-world information.
Figure 2.2 shows the classification of the literature on target
tracking in wireless sensor networks.
The existing literature on target tracking in wireless sensor
networks can be broadly classified into single-sensor and multiple-sensor
approaches. Tracking is performed in both indoor and outdoor environments
using various sensors and techniques. This thesis mainly focuses on
multiple target tracking using multiple sensors in a WSN.
Due to the rapid development in sensor technology, Wireless
Sensor Networks are used for person tracking, home monitoring and
environment monitoring (Akyildiz et al 2002). Several solutions for human
detection and tracking have been proposed in the literature.
2.3.1 Existing Algorithms
Many intelligent environments and security systems deploy WSNs to
detect and track targets. To make a WSN economically feasible, the
individual nodes have to be low-end, inexpensive devices (Krishnamachari and
Sitharama 2004). When a WSN is randomly deployed, samples may
not arrive at regular time intervals. There is a need to predict the future
position of a target using sensor data based on the target dynamics even if
events are missed.
Umesh Babu et al (2006) proposed a KF-based method for tracking a
target in a sensor network. This approach used the acoustic signal and Time
Difference of Arrival (TDoA) for detection and localization. Hopper et al
(1993) presented a scheme based on the active badge system for
identifying targets, but the drawback of the scheme is its limited scalability.
Kim et al (2009) proposed a method for object tracking in an indoor
environment, in which an RFID system was used to increase the accuracy and
resolution of location estimation.
Taketoshi et al (2004) proposed a method to track multiple persons
by integrating distributed floor pressure sensors and RFID. The distributed
floor pressure sensor data were used to detect a person by finding high
pressure in the area where the person was present. When that failed, the
RFID made it possible to associate those areas with the IDs of
the persons in the sensor-covered areas.
Several schemes were able to reliably detect and track the
movement of persons in indoor and controlled environments with a unique
identification mark such as an RFID tag (Bharghavi et al 2010). Aydın et al
(2007) presented a study on localization and tracking of an object carrying an
active RFID tag. Received Signal Strength (RSS) measurements in outdoor,
obstacle-free indoor and obstructed indoor environments were analyzed for
that purpose.
For object tracking, the existing method (Roseveare and Natarajan
2012) aimed at minimizing the number of sensing nodes. A common way to
reduce the number of sensing nodes was through prediction techniques (Xu
et al 2004). On the other hand, Chen and Chung (2005) proposed the
Alert-based Object Tracking (AbOT) scheme to track irregular object movement
without depending on prediction of the object trajectory.
Chun Chen et al (2009) proposed a complete systematic
architecture, called Hierarchical Alert Model Architecture (HAMA), to
minimize the number of nodes participating in the tracking activities and
to efficiently manage the location information of all moving objects.
Yang et al (2008) described the probabilistic approaches, such as
Simultaneous Localization and Mapping (SLAM) algorithm for 2D trajectory
tracking with improved efficiency.
The KF is suitable for estimating the state of targets moving
with nearly constant velocity, but once the target starts maneuvering, a
single KF is no longer suitable for tracking. Various mathematical models
representing target motion have been developed in the literature. Bar-Shalom
and Birmiwal (1982) explained a KF-based target tracking scheme to
provide estimates of the states of the target. The velocity model KF gave
poor performance for a maneuvering target and the acceleration model KF
gave inferior performance when the target moved with linear velocity. Li and
Jilkov (2003) noted that precise estimation of the target state required
selecting the model that exactly matched the target maneuver. Hence a Multiple
Model (MM) filter was needed for efficient tracking of maneuvering and
non-maneuvering targets. In the Interacting Multiple Model (IMM) estimator,
multiple models were used to describe the motion of the target. The IMM made
use of a bank of KFs to accommodate various possible target trajectory
patterns and conditions (Blom
and Bar-Shalom 1988). The final estimate was obtained by the weighted sum
of estimates from sub-filters of the different models (Chen et al 2007) and
switching between models was obtained as per Markov transition probability
matrix (Mazor et al 1998). The IMM estimator with KF as sub filter used a set
of models to describe the target model (Li and Bar-Shalom 1993,
Yeddanapudi et al 1997).
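Two of the IMM ingredients mentioned above, the Markov prediction of the model probabilities and the probability-weighted combination of the sub-filter estimates, can be sketched for a scalar state as follows. This is a partial illustration under stated assumptions: the mixing step and the likelihood-based probability update of a full IMM cycle are omitted, and all function and parameter names are hypothetical.

```python
def predict_model_probs(model_probs, transition):
    """Propagate model probabilities through a Markov transition matrix.

    transition[i][j] is the probability of switching from model i to model j.
    """
    n = len(model_probs)
    return [
        sum(transition[i][j] * model_probs[i] for i in range(n))
        for j in range(n)
    ]


def combine_estimates(model_probs, means, variances):
    """Combine scalar sub-filter estimates into one overall estimate.

    The combined mean is the probability-weighted sum of the sub-filter means;
    the combined variance also includes the spread of the means around it.
    """
    mean = sum(mu * x for mu, x in zip(model_probs, means))
    var = sum(
        mu * (p + (x - mean) ** 2)
        for mu, x, p in zip(model_probs, means, variances)
    )
    return mean, var
```

When the sub-filters disagree, the spread term inflates the combined variance, so the overall estimate reports more uncertainty during a model switch than either sub-filter does on its own.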
Engin et al (2012) addressed the problem of target tracking based
on received signal strengths in a WSN. The Kalman gain matrix was
obtained as the solution to an optimization problem. Since each column of the
Kalman gain matrix corresponds to one sensor measurement, the
optimization problem was formulated with a sparsity-promoting penalty function
that penalizes the number of nonzero columns of the Kalman gain.
Sreekanth and Krishna (2011) considered the problem of providing
guided navigation in a target tracking enabled wireless sensor network. In
this work, a constant velocity model is considered and the location of the
target is computed using the predictive regeneration method and the weighted
centroid method. The position of the target at a particular time is computed
using the weighted centroid algorithm. It is then compared with the predicted
position. Depending upon the comparison, a correction is provided and the
position is recomputed.
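The two quantities compared in this approach, the weighted centroid and the constant velocity prediction, can be sketched as follows. The weights are assumed here to be non-negative numbers supplied by the caller (for example, derived from signal strength); how the cited work derives them and how it applies the correction are not specified in this survey, so neither is modelled. All names are hypothetical.

```python
def weighted_centroid(sensor_positions, weights):
    """Estimate a 2D target position as the weighted mean of sensor positions."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one sensor must have a positive weight")
    x = sum(w * p[0] for w, p in zip(weights, sensor_positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, sensor_positions)) / total
    return x, y


def predict_constant_velocity(position, velocity, dt):
    """Predict the position after dt seconds under a constant velocity model."""
    return position[0] + velocity[0] * dt, position[1] + velocity[1] * dt


def position_error(estimate, prediction):
    """Component-wise difference between the centroid estimate and the prediction."""
    return estimate[0] - prediction[0], estimate[1] - prediction[1]
```

For two sensors at (0, 0) and (10, 0) with weights 1 and 3, the centroid lies at (7.5, 0), three quarters of the way towards the more heavily weighted sensor.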
Gireesan et al (2001) described a target tracking application of
WSN based on an experimental testbed using a Digi Xbee device, a Passive
Infrared (PIR) sensor and MaxSonar ultrasonic ranging sensors. The experiments
showed that continuous ranging and reporting cannot be expected in
practice from low-power sensors. The stabilization time and false positive
probability are very significant when deploying sensors in an outdoor
environment. Also, the MaxSonar ultrasonic sensor does not have hardware
range inhibition.
The solutions proposed in the existing literature used sensors such as
acoustic, image and PIR sensors for person detection. Tracking was achieved
using techniques such as the Particle Filter (Ozdemir 2009), signatures, mobile
agents (Yu et al 2004) and the KF (Umesh Babu et al 2006). Although the PIR
sensor sensed the presence of an object, it failed to classify the object
as human or non-human or to count the number of objects. Hence,
in addition to the PIR sensor, another sensor is required to identify the person
for tracking applications. Randomly deployed sensor nodes exhibit
unreliable behavior and might not generate samples at regular time intervals.
The target (person) may be detected by more than one sensor or may not be
sensed by even a single sensor. Hence, an algorithm is required to estimate
the missing events and the future position of a target based on the available
measurements.
2.4 TRACKING OF TARGETS IN CAMERA SENSOR
NETWORK
Video surveillance has long been in use for monitoring
highly secured areas such as banks and malls. It is also used in athletic
performance analysis, industry and video conferencing. Traditionally,
the video streams are monitored online by human operators and stored for
future reference.
A considerable amount of work has been devoted to tracking
humans in the view of a single camera. However, single camera tracking can
monitor only a relatively narrow area due to the limited viewing angle of a
camera lens. Recently, growing interest has focused on tracking humans using
distributed monocular cameras (Sato et al 1994).
2.4.1 Existing Algorithms
The increasing need for intelligent visual surveillance in
commercial, law enforcement and military applications makes automated
visual surveillance one of the main current application
domains in computer vision. Vision-based multi-target tracking has been
studied extensively and several algorithms are available in the literature to
track people using camera images (Tsagkatakis and Savakis 2011, Iketani et
al 1998, Anurag and Davis 2002, Kang et al 2004). Figure 2.3 shows the
classification of the literature on target tracking in camera sensor networks.
The existing literature on target tracking in camera
sensor networks can be broadly classified into single-camera and
multiple-camera approaches. The major problems associated with visual
tracking are variations in the background, camera position and occlusion.
This thesis mainly focuses on occlusion handling and background estimation.
However, most of the existing methods have given inferior
performance due to camera position, varying pose, illumination conditions,
dynamic background and occlusion. Classifying multiple detected targets into
human, vehicle or animal is yet another difficult problem and is also
computationally intensive. The features used for object tracking include the
template, colour, contour and histogram of gradients of an object image
(Dalal and Triggs 2005).
Many existing techniques, such as background modelling techniques,
made assumptions that greatly restricted their generality in real-world
settings (Comaniciu et al 2003), where scenes often include many
other dynamic objects, fast changes in lighting, and complex object
interactions such as shadows and reflections that greatly influence the image.
Comaniciu et al (2003) proposed kernel-based object tracking, which
successfully handles camera motion, partial occlusions, clutter, and target
scale variations.
Single-camera tracking techniques commonly assumed
that distinct targets had distinct appearances with respect to colour, texture,
size, or contour features (Iketani et al 1998), and they also faced a
fundamental limitation caused by changing backgrounds. A real-time people
tracking system for an interactive environment used depth-based background
subtraction (Krumm et al 2000). These approaches require the objects in the
scene to have enough texture information for dense stereo reconstruction and
build background models assuming a static environment.
The region based stereo technique avoided many problems with
wide-baseline correspondence by matching regions instead of points (Kang et
al 2003). This approach required background modelling and assumed that
everyone in the scene was wearing uniquely coloured clothing to perform the
region based correspondence. The detection of moving objects is performed
by defining an adaptive background model that takes into account the camera
motion approximated by the affine transformation (Kang et al 2004).
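The idea of an adaptive background model can be illustrated with a running-average sketch on grey-level images stored as nested lists. This is a deliberately simplified stand-in: the camera-motion compensation by an affine transformation described above is not modelled, and the running average, the learning rate `alpha` and the `threshold` are assumptions made for this illustration rather than the method of the cited work.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average).

    alpha is a hypothetical learning rate: larger values adapt faster to
    changes such as lighting, at the risk of absorbing slow-moving objects.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels that differ from the background by more than the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

In use, each incoming frame is first compared with the current background to obtain the moving-object mask, and the background is then updated so that gradual scene changes are absorbed over time.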
The problems associated with automatic real-time visual
surveillance include tracking an unwanted target rather than the desired target,
changes in the background, occlusions and the assumption that the
background environment is static (Forsyth and Ponce 2003). Distance
metric learning reliably represented the similarity between different
appearances of the object as well as the difference in appearance between the
object and the background, and was also used in the detection of occlusions
(Tsagkatakis and Savakis 2011). Occlusion was identified by comparing the
distance between the object and the background.
A region based stereo technique (Darrell et al 2001) likewise
required background modelling and assumed that everyone in the scene was
wearing uniquely coloured clothing to perform the region based
correspondence.
Moreover, scenes often included many other dynamic objects, fast
changes in lighting, and complex object interactions such as shadows and
reflections that greatly influenced the image (Forsyth and Ponce 2003). In
many real-world settings, it was also not possible to place a camera in an
ideal location that minimized occlusions (such as a very high overhead view).
Hence a robust tracking technique was required to handle frequent and
prolonged occlusions (Isard and MacCormick 2001) and to work in crowded
areas with multiple views (Anurag and Davis 2002).
Person tracking was performed from offline data using
background subtraction and multiple cameras. This introduced some delay in
segmentation because sensor fusion was done by rendering the foregrounds of
the images from multiple sensors (Krumm et al 2000). Multi-camera techniques
needed to perform correspondence between the views and assumed that the
appearance of a feature in one view would be similar to its appearance in
another view.
This assumption failed for widely separated views where the scene geometry
and lighting could result in the lack of commonly observed features and very
different appearances of the same feature.
However, single camera tracking can monitor only a relatively
narrow area due to the limited viewing angle of a camera lens. Recently,
growing interest has focused on tracking humans using distributed monocular
cameras. In such a setup, the image of the target within the area monitored by
cameras will be present in at least one of the video sequences produced by the
cameras.
Jun et al (2006) proposed a novel vehicle classification scheme
for estimating important traffic parameters from video sequences. For
robustness, a background update method was used to keep the background
static. The desired target was detected through image differencing and
then tracked by a KF.
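The pipeline described above (background update, image differencing, then
KF tracking) can be illustrated with a minimal sketch. It is not the method of
Jun et al (2006): the running-average background update, the threshold, the
learning rate, the constant-velocity motion model, the noise values and all
function names are assumptions chosen only for illustration, and a single
image axis is filtered to keep the example short.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background update (alpha is an assumed learning rate)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def detect_centroid(background, frame, threshold=30):
    """Image differencing: centroid (row, col) of pixels that differ from the background."""
    rows = cols = count = 0
    for r, (brow, frow) in enumerate(zip(background, frame)):
        for c, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                rows, cols, count = rows + r, cols + c, count + 1
    return (rows / count, cols / count) if count else None

class ConstantVelocityKF:
    """Kalman filter for one image axis with state [position, velocity]."""

    def __init__(self, position, q=1e-2, r=1.0):
        self.x = [position, 0.0]               # state estimate
        self.P = [[10.0, 0.0], [0.0, 10.0]]    # state covariance
        self.q, self.r = q, r                  # assumed process / measurement noise

    def predict(self, dt=1.0):
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                         # innovation covariance, H = [1, 0]
        k0, k1 = a / s, c / s                  # Kalman gain
        y = z - self.x[0]                      # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

def make_frame(col, size=8):
    """Synthetic frame: uniform background with one bright pixel in row 4."""
    frame = [[10] * size for _ in range(size)]
    frame[4][col] = 200
    return frame

background = [[10.0] * 8 for _ in range(8)]
kf_col = ConstantVelocityKF(position=0.0)
estimates = []
for col in range(6):                           # target moves one column per frame
    frame = make_frame(col)
    centroid = detect_centroid(background, frame)
    kf_col.predict()
    if centroid is not None:
        kf_col.update(centroid[1])
    estimates.append(kf_col.x[0])
    background = update_background(background, frame)
```

In this toy sequence the filtered column estimate converges towards the true
column as the velocity estimate settles. When no foreground pixel is detected,
the filter simply coasts on its prediction.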
When surveillance was performed over a wide area, multi-camera
techniques needed to provide correspondence between views and assumed
that the appearance of a feature in one view would be similar to its
appearance in another view. This assumption failed for widely separated
views, where the scene geometry and lighting could result in a lack of
commonly observed features (Kang et al 2004).
KF was the first filter to be used for visual tracking. Various
extensions of the filter have shown considerable success (Dalal and Triggs
2005, Zhao 2004, Piater and Crowley 2001) for person tracking. When the
state space was discrete and made up of a finite number of states, the Hidden
Markov Model (HMM) explained by Rabiner (1986 and 1989) could be
applied for tracking.
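As a minimal sketch of how an HMM with a finite state space can be used for
tracking, the Viterbi algorithm below recovers the most likely sequence of
hidden states from noisy observations. The two-zone scene, the state and
observation labels, and all probabilities are hypothetical values chosen for
illustration and are not taken from the cited works.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical two-zone scene: the target tends to stay in its zone (0.9),
# and the detector reports the correct zone 80% of the time.
states = ["left", "right"]
start_p = {"left": 0.5, "right": 0.5}
trans_p = {"left": {"left": 0.9, "right": 0.1},
           "right": {"left": 0.1, "right": 0.9}}
emit_p = {"left": {"L": 0.8, "R": 0.2},
          "right": {"L": 0.2, "R": 0.8}}

crossing = viterbi(["L", "L", "R", "R", "R"], states, start_p, trans_p, emit_p)
outlier = viterbi(["L", "L", "R", "L", "L"], states, start_p, trans_p, emit_p)
```

With these assumed probabilities, a sustained change in the observations is
decoded as a zone crossing, whereas a single contradictory detection is
treated as noise and the decoded path stays in the same zone.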
Tracking of people with a particle filter (Osawa 2006) was
demonstrated in a cluttered office environment with two people, but the work
did not discuss the cost of rendering an image from a model per particle per
time step (Saad and Mubarak 2006). Other tracking methods were based on
visual hull techniques, which were sensitive to errors in foreground
segmentation and not suited to environments with many occlusions, because
the visual hull became loose and could not resolve individuals (Lopez et al
2007).
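The per-particle, per-time-step cost mentioned above can be seen in a minimal
bootstrap particle filter, sketched here for a one-dimensional position. This
is a generic illustration and not the tracker of Osawa (2006): the motion
model, the noise levels, the particle count and the function name are all
assumptions. A simple Gaussian likelihood stands in for the far more expensive
image rendering step.

```python
import math
import random

def particle_filter_step(particles, measurement, rng, velocity=1.0,
                         motion_std=0.5, meas_std=1.0):
    """One predict/weight/resample cycle of a bootstrap particle filter (1-D)."""
    # Predict: propagate each particle through the assumed motion model.
    predicted = [p + velocity + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: one likelihood evaluation per particle per time step.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:                       # all likelihoods underflowed
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate

rng = random.Random(0)                     # fixed seed for repeatability
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
true_position = 0.0
for _ in range(20):
    true_position += 1.0                   # target moves one unit per step
    measurement = true_position + rng.gauss(0.0, 1.0)
    particles, estimate = particle_filter_step(particles, measurement, rng)
```

The weighting loop runs once per particle at every time step, so replacing the
Gaussian likelihood with a rendered-image comparison multiplies the rendering
cost by the number of particles.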
Valera and Velastin (2005) presented the state of development of
intelligent distributed surveillance systems, including a review of current
image processing techniques that are used in different modules that constitute
part of the surveillance systems. They also explained that the ability to
recognize objects and humans and to describe their actions and interactions
from information acquired by sensors was essential for automated visual
surveillance. Gian et al (2005) proposed image and video processing
techniques for an advanced visual surveillance system using multicamera
systems to provide surveillance coverage across a wide area, ensuring object
visibility over a large range of depths. In the work proposed by Antonio et al
(2007), a cascaded structure for multiclass detection with a fragment
based approach was used. The object detection and classification were done
on a dynamic background to overcome the limitations observed with a static
background, such as illumination conditions and artifacts due to the
movement of leaves.
Xue et al (2009) proposed a multi-view visual target surveillance
system in WSN, which autonomously implemented target classification and
tracking with collaborative online learning and localization. Complex Event
Processing (CEP) for sensor networks was analyzed by Dunkel (2009), who
processed complex event streams in real time. The approach was based on
semantically rich event models using ontologies that allow representation of
structural properties of event types. A survey presented by Joshua et al
(2010) gave an overview of state-of-the-art developments in behavior
recognition algorithms for transit visual surveillance applications. These
techniques are often sensitive to poor
resolution, frame rate, drastic illumination changes, and frequent occlusions,
among other common problems prevalent in transit surveillance systems.
Norbert et al (2011) presented a comprehensive review of
computer vision techniques for traffic analysis systems, with a specific focus
on urban environments. There is increasing scope in intelligent transport
systems to adopt video analysis for traffic measurement. Traditional methods
used background estimation and performed top-down classification, which
could raise issues under challenging urban conditions. Methods from the
object recognition domain (bottom-up) have shown promising results,
overcoming some of the issues of traditional methods, but are limited in
different ways.
Liang et al (2012) proposed a scheme to track multiple video
targets and recover their trajectories against occlusion, interruption, and
background clutter, using a stochastic sampling algorithm to iteratively solve
the spatial graph partition and temporal graph matching. This algorithm was
designed under the Metropolis-Hastings method and did not need good
initializations.
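To illustrate the general idea of Metropolis-Hastings sampling applied to a
matching problem, the sketch below samples assignments between target
positions in two consecutive frames. It is a toy example and not the algorithm
of Liang et al (2012): the distance-based cost, the swap proposal, the
temperature, and all names and coordinates are assumptions made for
illustration.

```python
import math
import random

def assignment_cost(assignment, prev_points, curr_points):
    """Total distance when previous target i is matched to detection assignment[i]."""
    return sum(math.dist(prev_points[i], curr_points[j])
               for i, j in enumerate(assignment))

def metropolis_hastings_matching(prev_points, curr_points, iterations=2000,
                                 temperature=1.0, seed=0):
    """Sample assignments with probability proportional to exp(-cost / temperature)."""
    rng = random.Random(seed)
    current = list(range(len(prev_points)))
    rng.shuffle(current)                   # arbitrary starting assignment
    current_cost = assignment_cost(current, prev_points, curr_points)
    best, best_cost = current[:], current_cost
    for _ in range(iterations):
        # Symmetric proposal: swap the detections assigned to two random targets.
        i, j = rng.sample(range(len(current)), 2)
        proposal = current[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        proposal_cost = assignment_cost(proposal, prev_points, curr_points)
        # Accept with probability min(1, exp(-(cost increase) / temperature)).
        accept_p = math.exp(min(0.0, (current_cost - proposal_cost) / temperature))
        if rng.random() < accept_p:
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost

# Hypothetical positions of three well-separated targets in two frames.
prev_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
curr_points = [(1.0, 0.0), (10.0, 1.0), (1.0, 10.0)]
best, best_cost = metropolis_hastings_matching(prev_points, curr_points)
```

Because worse assignments are accepted only with small probability, the
sampler started from a random assignment settles on the matching that pairs
each target with its nearest detection in this example.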
Mirabi and Javadi (2012) presented an algorithm for accurate
segmentation and tracking of people in dynamic outdoor environments.
The existing literature proposes solutions for camera angle (Kang
et al 2004), pose variation, correspondence between the regions (Kang et al
2003), changing background (Iketani et al 1998), differences in appearance
between the object and background (Tsagkatakis and Savakis 2011), multiple
views (Krumm et al 2000, Isard and MacCormick 2001, Anurag and Davis
2002, Saad and Mubarak 2006, Xue et al 2009), and background modelling
(Darrell et al 2001). For occlusion handling, various techniques such as KF
(Dalal and Triggs 2005, Zhao 2004, Piater and Crowley 2001, Mirabi and
Javadi 2012), minimum allowable distance between the object and the
background (Tsagkatakis and Savakis 2011), HMM (Rabiner 1986 and 1989)
and PF (Osawa 2006) have been proposed. In this thesis, a Combined
Gaussian Hidden Markov Model based Kalman Filter (CGHMM-KF) scheme
is proposed to accurately detect, classify and track multiple persons in
complex scenarios.