[ieee 2011 seventh international conference on natural computation (icnc) - shanghai, china...



Page 1: [IEEE 2011 Seventh International Conference on Natural Computation (ICNC) - Shanghai, China (2011.07.26-2011.07.28)] 2011 Seventh International Conference on Natural Computation -

978-1-4244-9953-3/11/$26.00 ©2011 IEEE 960

2011 Seventh International Conference on Natural Computation

Mean-shift Algorithm Integrating with SURF for Tracking

Jian Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou, China

Jun Fang, College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China

Jin Lu, College of Information Engineering, Zhejiang University of Technology, Hangzhou, China

Abstract—A new algorithm is proposed to handle the dynamically changing size of the tracking window in the Mean-shift process. First, the algorithm detects feature points in the target area of the current and previous frames using SURF. An Epanechnikov kernel function is introduced to increase the weights of feature points in the central area. After matching feature points between the two frames, we calculate the target scale parameters, which are used to adjust the tracking window size in the current frame and the bandwidth of the kernel function. The algorithm is shown to perform well in real-time tracking with a moving camera.

Keywords-adaptive bandwidth; SURF; Mean-shift; kernel function

I. INTRODUCTION

Real-time target tracking is a critical task in various computer vision systems, such as perceptual user interfaces, intelligent video compression, and traffic monitoring. Mean-shift is a non-parametric feature-space analysis technique whose application domains include clustering in computer vision and image processing. As a tracker, the Mean-shift algorithm creates a confidence map in the new image based on the color histogram of the object in the previous image, and finds the peak of the confidence map near the object's old position. Mean-shift is widely used in real-time tracking applications due to its robustness and low computational complexity.

The problem addressed in this paper is the selection of the bandwidth of the Mean-shift kernel, which plays an important role in the algorithm. The scale of the target often changes over time, so it is difficult to specify a constant bandwidth for the whole tracking procedure. The value should be proportional to the expected image area of the blob being tracked. In [1], the best scale is sought by increasing or decreasing the window size by 10%. This method tracks well when the target shrinks. When the target's size increases, however, the window shrinks because the Bhattacharyya coefficient reaches its maximum in a smaller local area. In [2], Mean-shift iterates over an additional kernel scale dimension, which is equivalent to using an Epanechnikov kernel. In [3], a backward-tracking method was proposed, but it does not address the case in which the target's size decreases. In [4], the bandwidth was obtained using a forgetting factor, which cannot prevent the window from shrinking when the target's size decreases. In [5], a bandwidth that adapts in scale and direction was proposed, at a high computational cost.

In this paper, a new approach to selecting the bandwidth of the Mean-shift kernel is presented. We use SURF to extract feature points in the tracking windows and match them between the previous and current frames. A scale factor calculated from the matched pairs is used to adjust the tracking window size in the current frame. An Epanechnikov kernel is introduced to increase the weight of feature points in the central area. Tests on various sequences showed that the method adapts effectively to changes in target size, with a computational complexity low enough for real-time tracking.

II. MEAN-SHIFT AND ITS LIMITATION

A. The classic Mean-shift algorithm

The target model is represented by an m-bin histogram. Let $\{x_i^*\}_{i=1\ldots n}$ be the normalized pixel locations in the region defined as the target model. The function $b: R^2 \to \{1\ldots m\}$ associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. A monotonic kernel profile $k$ assigns smaller weights to pixels farther from the center. The m-bin histogram model of the target can then be expressed as $\hat{q} = \{\hat{q}_u\}_{u=1\ldots m}$. The probability of the feature $u = 1\ldots m$ in the target model is computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)\,\delta[b(x_i^*) - u], \tag{1}$$

where $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$.
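As an illustration of Equation (1), the following minimal sketch builds a kernel-weighted histogram in plain Python. It is not the authors' implementation: the function names, the 0-based bin indices (the paper numbers bins from 1 to m), and the choice of the Epanechnikov profile k(r) = 1 - r inside the unit disk are assumptions made for the example.

```python
def epanechnikov_profile(r2):
    """Assumed kernel profile k(r2): linear in the squared radius, zero outside the unit disk."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def target_model(locations, bin_indices, m, profile=epanechnikov_profile):
    """Kernel-weighted m-bin histogram in the spirit of Equation (1).

    locations   -- normalized pixel coordinates (x, y), relative to the window center
    bin_indices -- b(x_i) for each pixel, as integers in 0..m-1 (0-based here)
    """
    q = [0.0] * m
    for (x, y), u in zip(locations, bin_indices):
        q[u] += profile(x * x + y * y)  # pixels near the center contribute more
    total = sum(q)                      # this is 1/C
    if total == 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    return [value / total for value in q]


# Hypothetical data: four pixels falling into three feature bins.
pixels = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5), (0.9, 0.0)]
bins = [0, 1, 1, 2]
q_hat = target_model(pixels, bins, m=3)
```

Dividing by the accumulated total plays the role of the constant C, so the returned histogram sums to one.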


Let $\{x_i\}_{i=1\ldots n_h}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. The same kernel profile $k(x)$ is used, with bandwidth $h$. The target candidate $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1\ldots m}$ satisfies $\sum_{u=1}^{m} \hat{p}_u(y) = 1$, where $\hat{p}_u(y)$ is the probability of the feature $u = 1\ldots m$ in the target candidate and is given by

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta[b(x_i) - u], \tag{2}$$

where $C_h$ is the normalization constant.

The Bhattacharyya coefficient is used in Mean-shift as the similarity measure. The Bhattacharyya coefficient between the target model $\hat{q}$ and the target candidate $\hat{p}$ is

$$\rho(y) = \sum_{u=1}^{m} \sqrt{\hat{q}_u\,\hat{p}_u(y)}. \tag{3}$$

The range of $\rho(y)$ is from 0 to 1.
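The similarity measure of Equation (3) is short enough to write out directly. The sketch below assumes both histograms are plain Python lists that already sum to one; the function name is chosen for the example.

```python
import math


def bhattacharyya(q, p):
    """Bhattacharyya coefficient between two normalized histograms, as in Equation (3)."""
    if len(q) != len(p):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(q_u * p_u) for q_u, p_u in zip(q, p))


same = bhattacharyya([0.5, 0.5], [0.5, 0.5])      # identical histograms give the maximum
disjoint = bhattacharyya([1.0, 0.0], [0.0, 1.0])  # no shared bins give the minimum
```

Identical histograms reach the upper end of the range and histograms with no overlapping bins reach the lower end, which matches the stated range of 0 to 1.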

To find the target location in the current frame, $\rho(y)$ should be maximized. Initialize the location of the target in the current frame with $y_0$. Using a Taylor expansion around the values $\hat{p}_u(y_0)$, the Bhattacharyya coefficient is approximated as

$$\rho(y) \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(y_0)\,\hat{q}_u} + \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(y)\sqrt{\frac{\hat{q}_u}{\hat{p}_u(y_0)}}. \tag{4}$$

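By substituting Equation (2) into Equation (4), we obtain

$$\rho(y) \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(y_0)\,\hat{q}_u} + \frac{C_h}{2}\sum_{i=1}^{n_h} w_i\,k\left(\left\|\frac{y - x_i}{h}\right\|^2\right), \tag{5}$$

where

$$w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(y_0)}}\;\delta[b(x_i) - u]. \tag{6}$$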

To maximize the coefficient, the second term in Equation (5) has to be maximized, which is done iteratively [1].

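In each iteration, the center of the tracking window is moved to the new location $y_1$, calculated according to the relation

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i\,w_i\,g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\,g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}, \tag{7}$$

where $g(x) = -k'(x)$ is the negative derivative of the kernel profile.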
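One iteration of the weights in Equation (6) and the update in Equation (7) can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes the Epanechnikov profile, for which g is constant inside the kernel support and zero outside, 0-based bin indices, pixel coordinates in image units, and a weight of zero for bins that are empty in the candidate (a guard against division by zero that the paper does not discuss).

```python
import math


def pixel_weights(q, p0, bin_indices):
    """w_i of Equation (6): sqrt(q_u / p_u(y0)) for the bin u that pixel i falls in."""
    weights = []
    for u in bin_indices:
        # Assumption: an empty candidate bin contributes zero weight.
        weights.append(math.sqrt(q[u] / p0[u]) if p0[u] > 0.0 else 0.0)
    return weights


def mean_shift_step(y0, pixels, weights, h):
    """One update of Equation (7), with g = 1 inside the bandwidth h and 0 outside."""
    num_x = num_y = den = 0.0
    for (x, y), w in zip(pixels, weights):
        r2 = ((y0[0] - x) ** 2 + (y0[1] - y) ** 2) / (h * h)
        g = 1.0 if r2 < 1.0 else 0.0
        num_x += x * w * g
        num_y += y * w * g
        den += w * g
    if den == 0.0:
        return y0  # nothing inside the kernel: keep the current center
    return (num_x / den, num_y / den)


# Hypothetical data: three pixels on a line, the last one weighted most heavily.
pts = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
y1_wide = mean_shift_step((1.0, 0.0), pts, [1.0, 1.0, 2.0], h=10.0)
y1_narrow = mean_shift_step((1.0, 0.0), pts, [1.0, 1.0, 2.0], h=2.0)
```

With the wide bandwidth all three pixels pull the center toward the heavily weighted one. With the narrow bandwidth the third pixel falls outside the kernel and is ignored, which is the kind of sensitivity to h discussed in the next subsection.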

B. Mean-shift and kernel bandwidth

Kernel bandwidth is a critical parameter in Mean-shift. The Epanechnikov function (Fig. 1) is often used to assign small weights to pixels farther from the center. Keeping the bandwidth fixed while the target scale changes increases the possibility of tracking failure. When the target grows beyond the tracking window, the algorithm converges to a local area of the target. When the target scale decreases, background enters the center of the tracking window, where points are assigned a greater weight, and the window settles between target and background.

Figure 1. Epanechnikov kernel.

III. PROPOSED SOLUTION

A. The affine model

Assuming that a rigid moving object obeys an affine model, we can obtain the scaling factors along the x and y axes, which are used to adjust the kernel bandwidth. The model is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + v, \tag{8}$$

where $(x, y)^T$ are the coordinates of a feature point in the previous frame, $(x', y')^T$ are the coordinates of the same feature point in the current frame, $v$ is the shift parameter between the two frames, and $s_x$, $s_y$ are the scaling parameters we seek [3].


B. Feature points detection and matching

SURF is a fast feature point detection algorithm with high accuracy. Juan and Gwun [6] reported that SURF is the best choice under conditions such as changing illumination and changing scale, which appear frequently in our setting.

We first calculate feature points in the most recent two adjacent frames t-1, t, where t represents the current frame. In order to avoid unnecessary computation, we detect feature points in a predicted area (Fig. 2) which contains the target.

Figure 2. Tracking window prediction.

Because background feature points may be detected near the edge of the window, we introduce the Epanechnikov kernel to assign small weights to these points. The Epanechnikov function is written as

$$E(s) = \frac{3}{4}\left(1 - s^2\right), \tag{9}$$

where $s = (y - x_i)/h$ and $x_i$ is the coordinate of a feature point.
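A minimal sketch of the feature point weighting of Equation (9) is given below. Two details are assumptions of the example rather than statements from the paper: s is taken as the Euclidean distance between the window center y and the feature point, divided by h, and the weight is clipped to zero for points outside the bandwidth.

```python
import math


def epanechnikov_weight(center, point, h):
    """Weight of a feature point following E(s) = 3/4 * (1 - s^2).

    Assumptions: s is the Euclidean distance from the window center divided by h,
    and points with s >= 1 receive zero weight.
    """
    s = math.hypot(center[0] - point[0], center[1] - point[1]) / h
    return 0.75 * (1.0 - s * s) if s < 1.0 else 0.0


w_center = epanechnikov_weight((50.0, 50.0), (50.0, 50.0), h=10.0)  # point at the center
w_half = epanechnikov_weight((50.0, 50.0), (55.0, 50.0), h=10.0)    # halfway to the edge
w_outside = epanechnikov_weight((50.0, 50.0), (65.0, 50.0), h=10.0) # beyond the bandwidth
```

Points near the window edge, where background features are most likely, receive weights close to zero and therefore have little influence on the scale estimate.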

We match the points in the two frames, then calculate the scaling factors in the x- and y-axis directions from these pairs of feature points.

C. Calculate the scaling factor

Let $s_x$, $s_y$ represent the scaling factors in the x- and y-axis directions respectively. For each combination of two matched feature points $j$ and $k$, a candidate factor is obtained according to

$$s_{x_i} = \frac{p_j^{t}.x - p_k^{t}.x}{p_j^{t-1}.x - p_k^{t-1}.x}, \tag{10}$$

where $p_j^{t}.x$ denotes the x coordinate of the feature point $j$ in frame $t$. All scaling factors are recorded in the structure $\{s_{x_i}, \mathit{weight}_i\}$, where $\mathit{weight}_i$ is the sum of the weights of the two points, i.e. $p_j^{t-1}.\mathit{weight} + p_k^{t-1}.\mathit{weight}$. If there are $N$ pairs of feature points, the number of combinations of $j$ and $k$ is $C_N^2$. Values greater than 1.5 or less than 0.5 are eliminated, on the assumption that the target size should not change so drastically between two adjacent frames.
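The pairwise computation of Equation (10), together with the 0.5 to 1.5 validity range, can be sketched as follows. The data layout (parallel lists of matched points and their frame t-1 weights), the function name, and the small `eps` guard that skips pairs with no separation along the chosen axis are assumptions of the example.

```python
from itertools import combinations


def pairwise_scales(prev_pts, curr_pts, weights, axis=0, lo=0.5, hi=1.5, eps=1e-6):
    """Candidate scaling factors from all combinations of two matches, as in Equation (10).

    prev_pts[i], curr_pts[i] -- positions of match i in frames t-1 and t
    weights[i]               -- kernel weight of match i in frame t-1
    axis                     -- 0 for the x direction, 1 for the y direction
    Returns a list of (scale, weight) records with scale inside [lo, hi].
    """
    records = []
    for j, k in combinations(range(len(prev_pts)), 2):
        denom = prev_pts[j][axis] - prev_pts[k][axis]
        if abs(denom) < eps:
            continue  # assumption: skip pairs that coincide along this axis
        s = (curr_pts[j][axis] - curr_pts[k][axis]) / denom
        if lo <= s <= hi:
            records.append((s, weights[j] + weights[k]))
    return records


# Hypothetical matches: the target grows by a factor of 1.2 along x and is shifted.
prev = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
curr = [(5.0, 1.0), (17.0, 1.0), (29.0, 7.0)]
records_x = pairwise_scales(prev, curr, [0.7, 0.5, 0.2], axis=0)
```

Because each factor is a ratio of coordinate differences, the shift v of the affine model cancels out and only the scale remains.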

Let $M$ represent the number of valid scaling factors. We can obtain an accurate scaling factor according to

$$s_x = \sum_{i=1}^{M} \frac{\mathit{weight}_i}{\mathit{sum\_weight\_x}}\, s_{x_i}, \tag{11}$$

where $\mathit{sum\_weight\_x}$ is $\sum_{i=1}^{M} \mathit{weight}_i$. The formula

$$\mathit{weight}_i' = \frac{\mathit{weight}_i}{\mathit{sum\_weight\_x}} \tag{12}$$

imposes the condition $\sum_{i=1}^{M} \mathit{weight}_i' = 1$.

$s_y$ can be obtained using the same method. The kernel bandwidth is then updated as $h = h \cdot \max(s_x, s_y)$.
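The weighted average of Equations (11) and (12) and the bandwidth update can be sketched as below. The input is assumed to be a list of `(scale, weight)` tuples holding the valid factors for one axis. Returning 1.0 when no valid factor exists is an assumption of the example, since the paper does not say what happens in that case.

```python
def weighted_scale(records):
    """Weighted average of valid scaling factors, as in Equations (11) and (12).

    records -- list of (scale, weight) tuples for one axis
    """
    total = sum(weight for _, weight in records)  # sum_weight_x
    if total <= 0.0:
        return 1.0  # assumption: keep the previous size when nothing is valid
    return sum(scale * weight for scale, weight in records) / total


def update_bandwidth(h, s_x, s_y):
    """Bandwidth update h * max(s_x, s_y)."""
    return h * max(s_x, s_y)


# Hypothetical valid factors for the x and y directions.
s_x = weighted_scale([(1.0, 1.0), (1.2, 3.0)])
s_y = weighted_scale([(1.05, 2.0)])
h_new = update_bandwidth(20.0, s_x, s_y)
```

Factors computed from heavily weighted (central) feature points dominate the average, so a few background matches near the window edge change the result only slightly.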

D. Algorithm description

In order to better meet the needs of real-time tracking, we predict the position of the tracking window in the current frame t from its positions in frames t-2 and t-1. The moving object is obtained by background subtraction.

The algorithm we developed is described as follows:

1) extract feature points in the tracking window in frame t-1 and in the predicted area in frame t respectively;

2) find the matching pairs of points between frame t-1 and frame t;

3) calculate all scaling factors and remove the invalid ones;

4) obtain the target scaling factor according to Equation (11) and adjust the bandwidth.
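The paper states that the window position in frame t is predicted from frames t-2 and t-1 but does not give the formula. The sketch below assumes a constant-velocity (linear) extrapolation of the window center and a hypothetical `margin` factor that enlarges the search area used in step 1. Both are illustrative choices, not the authors' stated method.

```python
def predict_center(center_t2, center_t1):
    """Assumed constant-velocity prediction: c_t = c_{t-1} + (c_{t-1} - c_{t-2})."""
    return (2 * center_t1[0] - center_t2[0], 2 * center_t1[1] - center_t2[1])


def predicted_area(center, width, height, margin=1.5):
    """Axis-aligned search area (x_min, y_min, x_max, y_max) around the predicted center.

    margin is a hypothetical enlargement factor so the area still contains
    the target if it moves or grows slightly more than predicted.
    """
    half_w = width * margin / 2.0
    half_h = height * margin / 2.0
    return (center[0] - half_w, center[1] - half_h,
            center[0] + half_w, center[1] + half_h)


center_t = predict_center((10, 10), (14, 12))
area_t = predicted_area(center_t, width=20, height=10, margin=1.5)
```

Restricting SURF detection to this area is what keeps the cost of step 1 proportional to the window size rather than to the full frame.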

IV. RESULTS

We compared the algorithm proposed in this paper with the classic Mean-shift. The test sequence has 204 frames of 320 × 240 pixels, in which a book was tracked. The CPU is an Intel dual-core E7400 at 2.8 GHz, running Windows XP. The computation is dominated by target modeling and matching and by feature point detection and matching, and it is proportional to the tracking window size.

Figure 3 shows object tracking results using the classic Mean-shift. The window size is fixed. As the target size changes, the window settles between target and background.


Figure 3. Tracking results of classic Mean-shift

Figure 4 shows object tracking results using the method we developed. The window size adapts correctly to changes in the book's size. The first 4 images show the results when the object's size decreases; the last 3 images show the results when the object's size increases.

Figure 4. Tracking results of this paper

Figure 5 shows the curve of scaling factor during the whole sequence. We can see there is no sharp change of scaling, indicating that this algorithm can adapt to changes in target size.

Figure 5. Scaling factor curve of book sequence

V. CONCLUSIONS

The limitations of Mean-shift are analyzed and a new method for choosing the correct kernel bandwidth is proposed in this paper. We use SURF to detect feature points in the two most recent adjacent frames, and then find the matching pairs of points. A reliable scaling factor can be calculated from these pairs. The results show that the algorithm is stable and keeps a computational complexity low enough for real-time tracking.

REFERENCES

[1] D. Comaniciu and V. Ramesh, "Mean shift and optimal prediction for efficient object tracking," in International Conference on Image Processing, Vancouver, BC, Canada, 2000, pp. 70-73.

[2] R. T. Collins, "Mean-shift blob tracking through scale space," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 2003, pp. 234-240.

[3] P. Ningsong, Y. Jie, L. Zhi, and Z. Fengchao, "Automatic selection kernel-bandwidth for Mean-Shift object tracking," Journal of Software, vol. 16, pp. 1542-1550, 2005.

[4] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, p. 603, 2002.

[5] Z. Heng, L. Lichun and Y. Qifeng, "Scale and direction adaptive Mean Shift tracking algorithm," Optics and Precision Engineering, vol. 16, pp. 1133-1139, 2008.

[6] L. Juan and O. Gwun, "A Comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing (IJIP), vol. 3, p. 143, 2010.