
Three Camera-Based Human Tracking Using Weighted Color and Cellular LBP Histograms in a Particle Filter Framework

Abstract: In this paper, an effective three-view multiple-human tracking method based on the fusion of color and texture information is proposed. Since human motion is usually non-linear and non-Gaussian, a particle filter framework is used to estimate human position. The human model is jointly represented by weighted color and cellular LBP (cellular local binary pattern) histograms. The weighted color histogram is robust to scale changes and partial occlusion, but its main limitation arises when the object's color and the background's color are similar; using these two complementary features therefore improves tracking results. The method is robust against illumination changes and occlusions. A three-camera network is used to handle occlusion: tracking is performed separately for each camera, and when occlusion is detected in one view, the tracking results of the two other views are used to handle it. Experimental results demonstrate that the proposed method improves the performance of human tracking. Keywords: human tracking, cellular LBP, weighted color histogram, occlusion, three-camera network.

1. Introduction

Human tracking is one of the most important topics in computer vision, with many applications such as visual surveillance, human-computer interfaces, human monitoring, games, and medical applications. In recent years, many human tracking methods have been proposed, but their performance is affected by problems such as occlusion, illumination changes, and scene clutter. Multiple-camera approaches have recently received much attention from the computer vision research community. The use of multiple cameras in a tracking system provides a wide field of view, giving it advantages over a single view and allowing additional information to be extracted from the scene. When multiple cameras monitor a scene, some regions may be occluded in one view but visible in others. To resolve the occlusion problem, a two-view tracking method is introduced in [1], which uses the homography relation between two views to handle occlusion. Particle filters [4] are a sequential Monte Carlo method based on a sample representation of the probability density function. A particle filter allows objects to be tracked without the need to detect them in every frame. In [5], an important advantage of the particle filtering framework is demonstrated: it allows information from different sources to be fused in a principled manner. In [2], a real-time tracker based on color, texture, and motion information is introduced; an RGB color histogram and correlogram are exploited as color features, and texture properties are represented by local binary patterns (LBP). In [3], a particle-filter-based tracking method is proposed in which color and original LBP histograms are used to model objects. These features result in a system that is less sensitive to illumination changes and partial occlusion.

In this paper, a particle-filter-based multi-camera tracking system is proposed to track humans in video sequences. Human tracking is done in each camera separately, and the homography correspondence between cameras is used to handle occlusion. Weighted color and cellular LBP histograms are used to describe humans. LBP is a local structure descriptor and gives poor results in human tracking on its own, so this paper computes the human's LBP information in a cellular structure to improve performance.

The remainder of this paper is organized as follows. In Section 2, the weighted color and cellular LBP histograms are described. In Section 3, the human tracking algorithm based on a particle filter framework in one view and in three views is proposed. Experimental results are presented in Section 4, and the conclusion is given in Section 5.

2. Weighted Color and Cellular Local Binary Pattern Histograms

In this section, the weighted color and cellular LBP histograms used to model humans during tracking are explained. Neither feature gives reliable results alone, so this paper fuses them to improve tracking performance. Together, these features improve the robustness of the algorithm against illumination changes.

2.1 Weighted Color Histogram

Most tracking methods use color features to represent targets because of their robustness against target deformation and partial occlusion. The color histogram is calculated in RGB space, with a separate 8-bin histogram for each color channel (R, G, and B). Boundary pixels of the target area may belong to the background or be affected by occlusion, so a kernel function

S. Rahimi*, A. Aghagolzadeh**, and H. Seyedarabi***
* Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. ([email protected])
** Faculty of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran. ([email protected])
*** Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. ([email protected])

is used to improve the result [7]. The weighted histogram is obtained by assigning larger weights to central pixels and smaller weights to boundary pixels. A weighted histogram is computed for each color channel separately. Let $\hat{q} = \{\hat{q}_u\}_{u=1,\ldots,m}$ represent the human model. The probability of each histogram bin in the target model is calculated by:

$$\hat{q}_u = C \sum_{i=1}^{n} K\!\left(\frac{\|x - x_i\|^2}{H_x^2 + H_y^2}\right) \delta\big[b(x_i) - u\big] \qquad (1)$$

where n is the number of pixels in the target region and δ is the Kronecker delta function. $H_x$ and $H_y$ are the half width and half height of the target's bounding box, respectively, $b(x_i)$ maps the pixel $x_i$ to its histogram bin, and C is a normalization factor defined by

$$C = \frac{1}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{\|x - x_i\|^2}{H_x^2 + H_y^2}\right)} \qquad (2)$$

$$K(r) = \begin{cases} 1 - r^2 & \text{if } r < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $K(\cdot)$ is the Epanechnikov kernel function, r is the distance from the region center, and $\|x - x_i\|$ denotes the Euclidean distance from the region center x to pixel $x_i$. Color similarity between the target and the background is a limitation of this feature, so we use another feature to improve the tracking performance.
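The weighted histogram of Eqs. (1)-(3) can be sketched as follows; this is a minimal illustration (function and variable names are ours, not from the paper's code), with an Epanechnikov kernel down-weighting boundary pixels and one 8-bin histogram per RGB channel.

```python
def epanechnikov(r2):
    # K(r) = 1 - r^2 for r < 1, else 0 (Eq. 3); r2 is the squared, scaled distance
    return 1.0 - r2 if r2 < 1.0 else 0.0

def weighted_color_histogram(region, bins=8):
    """region: list of rows of (R, G, B) pixels, values in 0..255."""
    h, w = len(region), len(region[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # region center
    Hy, Hx = h / 2.0, w / 2.0               # half height / half width
    hist = [[0.0] * bins for _ in range(3)] # one histogram per channel
    norm = 0.0
    for y in range(h):
        for x in range(w):
            # squared distance to the center, scaled by Hx^2 + Hy^2 (Eq. 1)
            r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (Hx ** 2 + Hy ** 2)
            k = epanechnikov(r2)
            norm += k
            for c in range(3):
                hist[c][region[y][x][c] * bins // 256] += k
    # C = 1 / (sum of kernel weights) (Eq. 2)
    return [[v / norm for v in channel] for channel in hist]
```

Each channel's histogram sums to one after normalization, so it can be compared directly with the Bhattacharyya coefficient introduced later.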

2.2 Cellular Local Binary Pattern Histogram

Local Binary Pattern (LBP) is a simple texture operator with low computational complexity that describes an object's local structure efficiently [6] and is robust against illumination changes and object deformation. The original LBP operator labels the pixel $(x_c, y_c)$ in an image by thresholding its 3×3 neighborhood against the center value and forming the result as a binary number, as in Equation (4):

$$LBP(x_c, y_c) = \sum_{p=0}^{M-1} s(g_p - g_c)\, 2^p \qquad (4)$$

$$s(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad (5)$$

where $g_c$ is the gray value of the central pixel $(x_c, y_c)$ and $g_p$ denotes the gray values of the neighborhood pixels. The original LBP operator has $2^M = 256$ labels, where M = 8 is the number of neighbors. The histogram of these labels, with 256 bins, is used as a texture descriptor. This operator is shown in Fig. 1. The original LBP is a common feature in face tracking but is not well suited to describing humans; experimental results show its weakness in human description and tracking. Therefore, in this paper, a cellular LBP operator is used to improve the poor performance of the original LBP in the human tracking domain while retaining the advantages of the feature. First, the Sobel operator is applied to enhance the edge information of the gray-scale region [8]; then each region is resized to 128×64 pixels. The resized region is divided into 128 cells of size 8×8 pixels. An original LBP histogram is calculated independently for each cell (each cell's histogram has 256 bins), and all histograms are concatenated to form the final cellular LBP histogram with 128×256 = 32768 bins. The algorithm is shown in Fig. 2.
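The cell-and-concatenate structure above can be sketched as follows. This is an illustrative reduction: the Sobel pre-filtering and the fixed 128×64 resize are omitted, and the grid is kept generic so the idea stays visible; names are ours.

```python
def lbp_code(img, y, x):
    # Threshold the 8 neighbors against the center pixel (Eqs. 4-5)
    c = img[y][x]
    nbrs = [img[y-1][x-1], img[y-1][x], img[y-1][x+1], img[y][x+1],
            img[y+1][x+1], img[y+1][x], img[y+1][x-1], img[y][x-1]]
    return sum((1 << p) for p, g in enumerate(nbrs) if g >= c)

def cellular_lbp_histogram(img, cell=8):
    """img: gray-scale image as a list of rows of ints (0..255)."""
    h, w = len(img), len(img[0])
    hists = []
    for cy in range(0, h, cell):
        for cx in range(0, w, cell):
            hist = [0] * 256  # one 256-bin LBP histogram per cell
            # stay one pixel inside the image border for the 3x3 neighborhood
            for y in range(max(cy, 1), min(cy + cell, h - 1)):
                for x in range(max(cx, 1), min(cx + cell, w - 1)):
                    hist[lbp_code(img, y, x)] += 1
            hists.extend(hist)  # concatenate the per-cell histograms
    return hists
```

With the paper's 128×64 region and 8×8 cells this yields the 128×256 = 32768-bin descriptor described above.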

Fig. 1: LBP operator diagram. (a) the pixel and its eight neighbors. (b) thresholding result. (c) weights of the LBP template.

Fig. 2: Cellular LBP calculation: (a) input RGB image. (b) gray-scale image. (c) applying the Sobel operator. (d) dividing the image into 128 cells.

2.3 Bhattacharyya Distance

Consider two histograms, the target model $\hat{q} = \{\hat{q}_u\}_{u=1,\ldots,m}$ and the candidate model $\hat{p} = \{\hat{p}_u\}_{u=1,\ldots,m}$; the similarity between these two histograms is measured by the Bhattacharyya coefficient:

$$\rho[\hat{q}, \hat{p}] = \sum_{u=1}^{m} \sqrt{\hat{q}_u\, \hat{p}_u} \qquad (6)$$

$$d = \sqrt{1 - \rho[\hat{q}, \hat{p}]} \qquad (7)$$

where $\rho[\hat{q}, \hat{p}]$ is the Bhattacharyya coefficient and d is the Bhattacharyya distance.
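Eqs. (6)-(7) translate directly to a few lines of code; this minimal sketch assumes both histograms are normalized to sum to one.

```python
from math import sqrt

def bhattacharyya_distance(q, p):
    """Distance between two normalized histograms q and p (Eqs. 6-7)."""
    rho = sum(sqrt(qu * pu) for qu, pu in zip(q, p))  # Eq. (6)
    # clamp against tiny negative values from floating-point rounding
    return sqrt(max(0.0, 1.0 - rho))                  # Eq. (7)
```

Identical histograms give d = 0; histograms with no overlapping bins give d = 1.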

3. Human Tracking Algorithm

This paper focuses on human tracking in video sequences despite challenges such as occlusion and illumination changes. When the selected object is hidden by other objects, this is called occlusion. In the human tracking domain, if the selected human is hidden by other humans or objects such that they cannot be seen by a camera, the selected human is called the occluded human and that camera's view the occluded view. A multi-camera system is therefore used to overcome this problem. In this paper, humans are tracked in each camera separately and a three-camera network is used to handle occlusions. In this section, the single-camera tracking method is introduced first; then the three-camera network is introduced and the occlusion handling method is proposed.

3.1 Single Camera Tracking

The first step of a human tracking algorithm is human detection. Some papers select humans manually in the first frame; in this paper, the Gaussian mixture model background subtraction algorithm [9] is used to find humans in the first frame. After finding the humans, their weighted color and cellular LBP histograms are extracted to model them. A particle filter is then used to estimate each human's position in subsequent frames. The particle filter [10] is a sequential Monte Carlo method that estimates the posterior density function and the human state using a set of N weighted particles $\{X_t^k, w_t^k\}_{k=1:N}$, where $X_t^k$ and $w_t^k$ are the state and the weight of the k-th particle at time t, respectively. The particle filter estimates the state of the human (the human's bounding box), $X_t$, at time t via the posterior density function:

$$X_t = \{x_t, y_t, Hx_t, Hy_t\} \qquad (8)$$

$$p(X_t \mid Y_{1:t}) \approx \sum_{k=1}^{N} w_t^k\, \delta(X_t - X_t^k) \qquad (9)$$

where $x_t, y_t$ specify the center of the object's bounding box, $Hx_t, Hy_t$ are the half-width and half-height of the bounding box, $Y_{1:t}$ is the set of measurements up to time t, and δ is the Dirac delta function. The weight is calculated by:

$$w_t^k \propto w_{t-1}^k\, \frac{p(Y_t \mid X_t^k)\, p(X_t^k \mid X_{t-1}^k)}{q(X_t^k \mid X_{t-1}^k, Y_t)} \qquad (10)$$

where $\sum_{k=1}^{N} w_t^k = 1$.

The human's bounding box is obtained in two steps: a prediction step and an update step. In the prediction step, N particles are spread randomly around the human position calculated in the prior frame. In the update step, a weight is assigned to each particle according to the similarity between the human model and that particle's observation. The algorithm proceeds as follows:

• Initialization: in the first frame, the background subtraction result is used to locate humans. The weighted color and cellular LBP histograms of each human are calculated as the human model. For each human, particles $\{X_0^k, w_0^k\}_{k=1}^{N}$ are generated according to the human state, where $w_0^k = \frac{1}{N}$.

• Propagation: in subsequent frames, the particles are propagated using a random walk model:

$$X_t = X_{t-1} + n_{t-1} \qquad (11)$$

where $X_t$ is the human state at time t, $X_{t-1}$ is the human state at time t−1, and $n_{t-1}$ is a random value.

• Observation: according to each particle's state, a bounding box is obtained; the weighted color and cellular LBP histograms of each bounding box, $\hat{p} = \{\hat{p}_u\}_{u=1,\ldots,m}$, are calculated and compared with the human model, $\hat{q} = \{\hat{q}_u\}_{u=1,\ldots,m}$, to compute each particle's weight based on the Bhattacharyya distance:

$$w_c = \frac{1}{\sqrt{2\pi}\,\delta_c}\, e^{-\frac{d_c^2}{2\delta_c^2}} \qquad (12)$$

$$w_{cell\text{-}LBP} = \frac{1}{\sqrt{2\pi}\,\delta_{cell\text{-}LBP}}\, e^{-\frac{d_{cell\text{-}LBP}^2}{2\delta_{cell\text{-}LBP}^2}} \qquad (13)$$

$$w = \frac{d_{cell\text{-}LBP}^2}{d_c^2 + d_{cell\text{-}LBP}^2}\, w_c + \frac{d_c^2}{d_c^2 + d_{cell\text{-}LBP}^2}\, w_{cell\text{-}LBP} \qquad (14)$$

where $w_c$ and $w_{cell\text{-}LBP}$ are the weights corresponding to the weighted color and cellular LBP histograms, respectively, $d_c$ and $d_{cell\text{-}LBP}$ are calculated from (7), and w is the particle's weight, calculated by Wang's method [14]. This step is repeated for each particle.
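The cue fusion of Eqs. (12)-(14) can be sketched as follows; the sigma values are illustrative, and the cross-weighting in Eq. (14) means the cue with the smaller distance (the better match) gets the larger share of the fused weight.

```python
from math import exp, pi, sqrt

def fused_weight(d_color, d_lbp, sigma_c=0.2, sigma_l=0.2):
    """Fuse the color and cellular-LBP Bhattacharyya distances into one
    particle weight (Eqs. 12-14). Sigmas are illustrative choices."""
    w_c = exp(-d_color**2 / (2 * sigma_c**2)) / (sqrt(2 * pi) * sigma_c)  # Eq. 12
    w_l = exp(-d_lbp**2 / (2 * sigma_l**2)) / (sqrt(2 * pi) * sigma_l)    # Eq. 13
    s = d_color**2 + d_lbp**2
    if s == 0.0:              # both distances zero: either cue is a perfect match
        return w_c
    # Eq. 14: weight each cue's likelihood by the OTHER cue's squared distance
    return (d_lbp**2 * w_c + d_color**2 * w_l) / s
```

A particle whose observation matches the model on both cues receives a much larger weight than one that matches on neither.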

• Output: the current state of the human is estimated according to the minimum variance estimation method:

$$\hat{X}_t = \sum_{k=1}^{N} w_t^k\, X_t^k \qquad (15)$$

where $\hat{X}_t$ is the human's current state at time t, and $X_t^k$ and $w_t^k$ are the state and weight of the k-th particle at time t, respectively.

• Resample: in this paper, the residual resampling method [15] is used to avoid the degeneracy problem.

• Update: to improve the tracking result, the human model is updated over time as $q_{t+1} = (1 - \beta)\, q_t + \beta\, \hat{p}_{t+1}(\hat{X}_t)$, where β is an updating coefficient, $q_{t+1}$ is the updated model, and $\hat{p}_{t+1}(\hat{X}_t)$ is the new observation.
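The Output and Update steps above can be sketched in a few lines; names are illustrative, and the update function assumes both histograms are the same length.

```python
def estimate_state(particles, weights):
    """Minimum-variance estimate of Eq. (15): weighted mean of particle
    states, where each particle is a (x, y, Hx, Hy) tuple."""
    return tuple(sum(w * p[i] for p, w in zip(particles, weights))
                 for i in range(len(particles[0])))

def update_model(q, p_new, beta=0.1):
    """Timely model update: q_{t+1} = (1 - beta) q_t + beta p_{t+1}.
    beta is an illustrative updating coefficient."""
    return [(1 - beta) * qu + beta * pu for qu, pu in zip(q, p_new)]
```

A small β keeps the model stable against momentary appearance changes while still adapting to gradual ones.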

3.2 Three-Camera Tracking

Recent surveillance systems use multiple cameras to increase security and improve performance. The single-camera tracking algorithm proposed in the previous section performs poorly when tracking occluded humans because of its limited field of view, so a multi-camera tracking system is used to compensate for the lack of visibility and handle occlusions. In this paper, the tracking algorithm runs for each camera separately, and homography transformations are used to find the occluded human's position. A three-camera system is used to locate occluded humans: first, occluded humans are detected in each camera separately; then a homography transformation is used to locate them in the occluded view. The human model, $\hat{q}$, is called the non-occluded model.

The Bhattacharyya distance between the human model and the observations (weighted color and cellular LBP histograms) extracted from the human's bounding box is used as the criterion for occlusion detection. The Bhattacharyya distance is calculated for the three weighted color histograms and the cellular LBP histogram by:

$$d_{red} = \sqrt{1 - \rho[\hat{q}_{red}, \hat{p}_{red}]} \qquad (16)$$

$$d_{green} = \sqrt{1 - \rho[\hat{q}_{green}, \hat{p}_{green}]} \qquad (17)$$

$$d_{blue} = \sqrt{1 - \rho[\hat{q}_{blue}, \hat{p}_{blue}]} \qquad (18)$$

$$d_{color} = \frac{d_{red} + d_{green} + d_{blue}}{3} \qquad (19)$$

$$d_{Cell\text{-}LBP} = \sqrt{1 - \rho[\hat{q}_{Cell\text{-}LBP}, \hat{p}_{Cell\text{-}LBP}]} \qquad (20)$$

where $\hat{q}_{red}, \hat{p}_{red}$ are the human model and the human observation of the red color component; $\hat{q}_{green}, \hat{p}_{green}$ those of the green component; $\hat{q}_{blue}, \hat{p}_{blue}$ those of the blue component; and $\hat{q}_{Cell\text{-}LBP}, \hat{p}_{Cell\text{-}LBP}$ those of the cellular LBP feature. $d_{color}$ is the color feature distance and $d_{Cell\text{-}LBP}$ is the cellular LBP feature distance. $d_{color}$ and $d_{Cell\text{-}LBP}$ are compared with predefined thresholds ($Th_{color}$ and $Th_{Cell\text{-}LBP}$); occlusion is detected if $d_{color} > Th_{color}$ or $d_{Cell\text{-}LBP} > Th_{Cell\text{-}LBP}$.
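The occlusion test of Eqs. (16)-(20) can be sketched as below; the threshold values are illustrative (the paper does not state them), and the histograms are assumed normalized.

```python
from math import sqrt

def bhat_dist(q, p):
    # Bhattacharyya distance between two normalized histograms (Eq. 7)
    return sqrt(max(0.0, 1.0 - sum(sqrt(a * b) for a, b in zip(q, p))))

def occluded(q_rgb, p_rgb, q_lbp, p_lbp, th_color=0.4, th_lbp=0.4):
    """q_rgb/p_rgb: model/observation histograms per channel (R, G, B).
    Thresholds are illustrative, not the paper's values."""
    # Eqs. 16-19: per-channel distances averaged into the color distance
    d_color = sum(bhat_dist(q, p) for q, p in zip(q_rgb, p_rgb)) / 3.0
    d_lbp = bhat_dist(q_lbp, p_lbp)  # Eq. 20
    return d_color > th_color or d_lbp > th_lbp
```

Either cue exceeding its threshold flags the view as occluded, triggering the homography-based hand-off described next.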

After occlusion detection, the positions of occluded humans are calculated using their positions in the two other views and homography transformations. A homography transformation links the positions of the same point between two cameras; it can only be computed in the overlapping (common) regions of each pair of cameras. In this paper, three cameras are used, and a homography transformation is calculated for each pair, so three homography transformations are needed in total. Let $(x, y)$ and $(x', y')$ be a pair of corresponding points on the ground plane in view 1 and view 2. The homography transformation [11] is given by

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (21)$$

At least four corresponding points between two overlapping views are needed to estimate the homography matrix H; this paper uses six points. Each pair of corresponding points generates the two equations shown in Equation (22), where $(x_i, y_i)$ is the point position in view 1 and $(x'_i, y'_i)$ its position in view 2. The homography matrix is calculated by solving the resulting 12 equations (Equation (23)).

$$\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -y'_i x_i & -y'_i y_i \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix} = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix} \qquad (22)$$

$$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x'_1 x_1 & -x'_1 y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y'_1 x_1 & -y'_1 y_1 \\ \vdots & & & & & & & \vdots \\ x_6 & y_6 & 1 & 0 & 0 & 0 & -x'_6 x_6 & -x'_6 y_6 \\ 0 & 0 & 0 & x_6 & y_6 & 1 & -y'_6 x_6 & -y'_6 y_6 \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ \vdots \\ x'_6 \\ y'_6 \end{bmatrix} \qquad (23)$$

The final position of the occluded human is calculated as the weighted sum of the two other cameras' outputs:

$$X_{occluded} = \sum_{i \,\in\, \langle 2\ \text{cameras} \rangle} \varpi_i \left( H_{occluded,i}\, X_i \right) \qquad (24)$$

where $X_{occluded}$ is the occluded human's position in the occluded camera, i ranges over the two other cameras, and $\varpi_i$ is the weight of camera i, calculated according to the similarity between the human region and the human model of camera i using Equation (14).

$H_{occluded,i}$ is the homography matrix between the occluded camera and camera i, and $X_i$ is the human position in camera i. Each camera's weight must be normalized:

$$\varpi_i = \frac{\varpi_i}{\displaystyle\sum_{i \,\in\, \langle 2\ \text{cameras} \rangle} \varpi_i} \qquad (25)$$

4. Experimental Results

The proposed algorithm has been implemented in MATLAB and tested on a computer with a 2.4 GHz CPU and 4 GB of RAM. Cameras 1, 6 and 8 of the PETS2009 dataset [12] and cameras 1, 2 and 3 of the NLPR multi-camera dataset are used to evaluate the results. The NLPR dataset covers situations such as human-by-human occlusion; the PETS2009 dataset covers both human-by-human and human-by-background occlusions. The performance of the proposed algorithm is compared with that of a tracking algorithm using the color feature alone [13] and of a tracking algorithm using color and original LBP features [3]. The results show the robustness of the proposed method versus the methods in [13] and [3]. The experiments are repeated 25 times for each algorithm to compute the average position $\hat{X} = \frac{1}{25} \sum_{i=1}^{25} X_i$, with $\hat{X} = [\hat{x}, \hat{y}]$.

The average tracking error for these algorithms is shown in Fig. 3. This error is calculated for frames 291 to 350 of camera 1 of the PETS2009 dataset as $E_{avg} = \sqrt{(\hat{x} - x_{real})^2 + (\hat{y} - y_{real})^2}$, where $x_{real}, y_{real}$ are the real positions of the humans. The results in Fig. 3 show the robustness and accuracy of the cellular LBP feature compared with the original LBP feature, which makes cellular LBP a proper feature for the human tracking domain. The average tracking error of our algorithm is around 2 pixels, while that of the algorithm proposed in [3] is around 8 pixels, which shows the advantage of our algorithm. The proposed algorithm can track multiple occluded humans with lower error because of its three-camera network. The positions of humans in frames 238, 248, 278, 295, 304, 314, 321, 340, 345 and 354 of cameras 1, 6 and 8 of the PETS2009 dataset, and in frames 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 and 55 of cameras 1, 2 and 3 of the NLPR dataset, are shown in Fig. 4 and Fig. 5, respectively. The results of the two other cameras are used to handle occlusion in the occluded camera. Fig. 4 shows two kinds of occlusion: human-by-human and human-by-background. Each pair of cameras has an overlapping view that is used to calculate the homography transformation. For example, the points $(x_1,y_1)=(467,223)$, $(x_2,y_2)=(499,271)$, $(x_3,y_3)=(656,123)$, $(x_4,y_4)=(736,176)$, $(x_5,y_5)=(597,335)$ and $(x_6,y_6)=(286,215)$ of camera 1 of the PETS2009 dataset and the points $(x'_1,y'_1)=(272,212)$, $(x'_2,y'_2)=(314,265)$, $(x'_3,y'_3)=(494,141)$, $(x'_4,y'_4)=(629,165)$, $(x'_5,y'_5)=(563,384)$ and $(x'_6,y'_6)=(6,209)$ of camera 6 of the PETS2009 dataset are used to calculate the homography transform from camera 1 to camera 6 (Equation (26)).

$$H = \begin{pmatrix} 0.8131 & -0.6692 & -84.9787 \\ 0.0217 & 0.0710 & -108.6745 \\ -0.0000 & -0.0020 & 1 \end{pmatrix} \qquad (26)$$

5. Conclusion

In this paper, a human tracking algorithm is proposed. Weighted color and cellular LBP histograms are used to model humans. Experimental results show the robustness of the cellular LBP feature compared with the original LBP feature in the human tracking domain. Occlusion is one of the most challenging issues in tracking algorithms; in this paper, a three-camera network with overlapping views is used to handle occlusions and track humans.

Fig. 3: Average tracking error of our algorithm compared with the algorithms proposed in [3] and [13].

References

[1] Z. Yue, S. K. Zhou, and R. Chellappa, "Robust two-camera tracking using homography," in Proc. 2004 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 3, pp. 1-4.

[2] V. Takala, and M. Pietikainen, “Multi-object tracking using color, texture and motion,” in proc. 2007 IEEE Computer Vision and Pattern Recognition Conf., pp. 1-7.

[3] C. Ruiqing, Z. Zhaohui, L. Hanqing, C. Huiqing, and Y. Yukun, “Particle filter based object tracking with color and texture information fusion,” in proc. 2009 SPIE Automatic Target Recognition and Image Analysis Conf., vol. 7495, pp. 1-8.

[4] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174-188, 2002.

[5] P. Perez, J. Vermaak, and A. Blake, “Data fusion for visual tracking with particles,” in proc. 2004 IEEE, vol. 92, no. 3, pp. 495-513.

[6] T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture measures with classification based on feature distributions,” Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.

[7] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003.

[8] Tian Hui, Chen Yi-qin, Shen Ting-zhi, “Face tracking using multiple facial features based on particle filters,” in proc. 2010 IEEE Informatics in Control, Automation and Robotics (CAR) Conf., vol. 3, pp. 72-75.

[9] C. Stauffer, and W. Grimson, “Adaptive background mixture models for real-time tracking,” in proc. 1999 IEEE Computer Vision and Pattern Recognition Conf., vol. 2, pp. 246-252.

[10] P. Brasnett, L. Mihaylova, D. Bull, and N. Canagarajah, “Sequential Monte Carlo tracking by fusing multiple cues in video sequences, ” Image and Vision Computing, vol. 25, no. 8, pp. 1217-1227, 2007.

[11] R. Hartley, and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge Univ. Press, 2000.

[12] http://ftp.pets.rdg.ac.uk/PETS2009/, last accessed August 2012.

[13] K. Nummiaro, E. Koller-Meier, and L. V. Gool, "An adaptive color-based particle filter," Image and Vision Computing, vol. 21, no. 1, pp. 99-110, 2003.

[14] Y. Wang, Y. Tan, and J. Tian, “Adaptive hybrid likelihood model for visual tracking based on Gaussian particle filter,” Optical Engineering, vol. 49, no. 7, pp. 077004-077011, 2010.

[15] T. Higuchi, "Monte Carlo filter using the genetic algorithm operators," Journal of Statistical Computation and Simulation, vol. 59, no. 1, pp. 1-23, 1997.

[Fig. 3 plot: average tracking error (pixels) versus frame number (290-350) for three trackers: Color; Color and original LBP; Color and cellular LBP.]

Fig. 4: Three-camera tracking results on the PETS2009 dataset: (a) tracking results of camera 1. (b) tracking results of camera 6. (c) tracking results of camera 8.

Fig. 5: Three-camera tracking results on the NLPR dataset: (a) tracking results of camera 1. (b) tracking results of camera 2. (c) tracking results of camera 3.
