[communications in computer and information science] informatics engineering and information science...

A. Abd Manaf et al. (Eds.): ICIEIS 2011, Part II, CCIS 252, pp. 96–103, 2011. © Springer-Verlag Berlin Heidelberg 2011

A New SIFT-Based Camera Calibration Method for Hybrid Dual-Camera

Yi-Qian Low, Sze-Wei Lee, Bok-Min Goi, and Mow-Song Ng

Faculty of Engineering and Science Universiti Tunku Abdul Rahman

Kuala Lumpur, Malaysia [email protected],{leeszewei,goibm,ngms}@utar.edu.my

Abstract. Camera networks, consisting of various types of camera systems, play an important role in security surveillance systems. This paper presents a new calibration method for a hybrid multi-camera system; in particular, a video surveillance system with a static camera and a dynamic camera, used for environment monitoring and security purposes. The static wide-angle camera covers the complete scene, whereas the dynamic Pan-Tilt-Zoom (PTZ) camera provides multi-view-angle and multi-resolution images of the scene. The proposed calibration method is based on Lowe’s Scale-Invariant Feature Transform (SIFT) algorithm, with keypoints selected based on measures of their stability. To improve accuracy and robustness, a simple filtering technique based on trigonometry is also adopted to remove noise (unwanted keypoints). The experimental results show a large improvement in the detection rate of camera network calibration, from 55.71% to 94.87%.

Keywords: Camera Calibration, SIFT Algorithm, Multi-camera System.

1 Introduction

Surveillance is the monitoring of behaviour, activities, or changing information, usually of people and often in a surreptitious manner. Nowadays, video surveillance has become an important tool to help reduce crime and protect public spaces. Surveillance systems have reduced crime statistics, but the classic video camera has drawbacks: it provides low-resolution information and lacks the flexibility to observe the complete scene. To overcome these problems, we propose a method which uses two different types of cameras, namely a static wide-angle camera and an active Pan-Tilt-Zoom (PTZ) camera. The static wide-angle camera observes the complete scene at a distance to provide a global view, and is used to detect and track multiple objects. The accuracy of the camera intrinsic and extrinsic parameters affects the measurement accuracy [1].

Camera calibration is much more complex when handling different kinds of cameras. We assume that all cameras are linked in a camera network, with a planner controlling the PTZ camera and object detection performed on the static camera. However, the background appearance is not stationary, and the camera parameters keep changing over time as the PTZ camera pans, tilts and zooms. Hence, it is difficult to compute and verify the spatial position of an object [3]. Traditional calibration methods need a high-precision calibration object of known structure as a spatial reference, together with tailor-designed algorithms, to obtain the camera parameters that relate space points to image points. Two principal sources of difficulty in performing this task are: (a) the different appearance of an object under different viewpoints and illuminations; and (b) partial occlusion of the object of interest by other objects [4]. Therefore, to overcome the shortcomings of the traditional methods, Lowe’s Scale-Invariant Feature Transform (SIFT) algorithm is used to perform self-calibration in camera networks. It does not need any calibration object, as the calibration relies solely on the relationships between corresponding points across a number of images [1].

Fig. 1. The Hybrid Dual-Camera System

In this paper, we make three contributions:

a) We set up a testbed, the hybrid dual-camera system shown in Fig. 1, to collect data and fine-tune the parameters of the PTZ camera.

b) We propose, implement and test the new SIFT-based camera calibration method with various parameter sets.

c) We introduce a novel filtering method based on trigonometry. As a proof of concept, this filter is adopted in the proposed method, and the empirical results show that the detection rate of the calibration improves dramatically.


2 Background and Literature Review

The purpose of dual-camera calibration is to determine the coordinates of the region of interest in the two camera images, and to calibrate them to obtain more accurate coordinates of the region of interest. Feature-based approaches have been widely used in computer vision and image processing. The most common image features in previous work are image contours, corners, regions of interest and interest points. Feature-based algorithms involve the extraction of regions of interest in the image, followed by identification of their counterparts in the individual images of the sequence [5]. Good feature extraction reduces the amount of data to be processed and also yields a higher-level understanding of the scene, as these features are matched between frames. Lowe’s Scale-Invariant Feature Transform (SIFT) transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection [6].

2.1 SIFT Theory

The SIFT algorithm performs efficiently by using a staged filtering approach [6]. The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each point is used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine and 3D projections, by blurring image gradient locations. The resulting vectors are called SIFT keys. The SIFT keys derived from an image are used in a nearest-neighbour approach to indexing to identify candidate object models. The feature extraction can be computed efficiently by building an image pyramid with re-sampling between each level.

According to Lowe’s method [8], the SIFT algorithm detects keypoints using a cascade filtering approach to identify locations and scales that can be repeatedly assigned under differing views of the object.

a) Scale-space extrema detection: The first stage of computation searches over all scales and image locations. A difference-of-Gaussian function is used to identify potential interest points that are invariant to scale and orientation.

b) Keypoint localization: At each potential interest point, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.

c) Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.

d) Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
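As an illustration of stage (a), the following is a minimal sketch of difference-of-Gaussian extrema detection, simplified to a 1-D signal so that it stays short. The function names, the list of scales, the 3-sigma kernel truncation and the border clamping are all assumptions made for illustration; Lowe's method operates on 2-D images organised in octaves and adds the localization, orientation and descriptor stages, which are not shown here.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_extrema(signal, sigmas):
    """Return (scale_index, position) pairs that are strict extrema of the DoG stack."""
    blurred = [blur(signal, s) for s in sigmas]
    # Each DoG level is the difference between two adjacent blur levels.
    dog = [[b - a for a, b in zip(blurred[i], blurred[i + 1])]
           for i in range(len(blurred) - 1)]
    found = []
    for s in range(1, len(dog) - 1):
        for x in range(1, len(signal) - 1):
            value = dog[s][x]
            # Compare against the 8 neighbours in position and scale.
            neighbours = [dog[t][y]
                          for t in (s - 1, s, s + 1)
                          for y in (x - 1, x, x + 1)
                          if (t, y) != (s, x)]
            if value > max(neighbours) or value < min(neighbours):
                found.append((s, x))
    return found

# Toy signal: a flat background with a 5-sample bump centred at index 20.
signal = [1.0 if 18 <= i <= 22 else 0.0 for i in range(41)]
candidates = dog_extrema(signal, [1.0, 1.6, 2.56, 4.1, 6.55])
```

On this toy signal, the bump centre is reported as a candidate at one of the intermediate scales, which mirrors how a blob in an image produces a scale-space extremum near its own size.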


For each keypoint, the SIFT algorithm computes its location in the image as well as a distinctive 128-dimensional descriptor vector associated with it [8, 9]. Matching a keypoint against a keypoint database is usually done by identifying its nearest neighbour in that database. The nearest neighbour is defined as the keypoint with the minimum Euclidean distance to the keypoint descriptor [8]. To reject ambiguous matches, the ratio of the distance to the closest neighbour to that of the second-closest neighbour is computed.
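The nearest-neighbour matching with a distance ratio can be sketched as follows. This is a brute-force illustration under stated assumptions: the names `euclidean` and `match_keypoints` are hypothetical, the parameter `chi` is assumed to act as the threshold on the distance ratio, and the toy descriptors are 2-dimensional rather than 128-dimensional.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_keypoints(query, database, chi=0.4):
    """Return (query_index, database_index) pairs that pass the distance-ratio test."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        if len(ranked) < 2:
            continue  # the ratio needs a closest and a second-closest neighbour
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Accept only if the closest neighbour is clearly closer than the runner-up.
        if d1 < chi * d2:
            matches.append((qi, best))
    return matches

query = [[0.0, 0.0], [5.0, 5.0]]
database = [[0.1, 0.0], [10.0, 10.0], [5.0, 6.0], [6.0, 5.0]]
matches = match_keypoints(query, database, chi=0.4)
```

In this toy data the first query descriptor has one clearly closest neighbour and is matched, while the second has two equally close candidates and is rejected as ambiguous.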

Matching is an exhaustive search over high-dimensional keypoint descriptors, which is slow. To increase the processing speed, a k-dimensional tree (k-d tree) [12] is used for the nearest-neighbour search. Since an exact k-d tree search offers little speed-up over exhaustive search in spaces of more than about 10 dimensions, an approximate search such as best-bin-first [10] is commonly used with it. The tolerance threshold χ satisfies 0 < χ < 1; the smaller the value of χ, the higher the similarity required to match each keypoint.
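For readers unfamiliar with the data structure, a minimal sketch of an exact k-d tree nearest-neighbour query is given below. The names `build_kdtree`, `sq_dist` and `nearest` are hypothetical, the tree is stored as nested tuples for brevity, and the toy points are 2-dimensional; this is not the implementation used in the experiments.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinate axes
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (ordered[mid],
            build_kdtree(ordered[:mid], depth + 1),
            build_kdtree(ordered[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (exact search)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

The pruning test on the splitting plane is what saves work in low dimensions; in high-dimensional descriptor spaces it rarely succeeds, so most of the tree ends up being visited.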

2.2 Theorem Trigonometry Filter (TTF)

Plane geometry deals with shapes that can be represented in a 2-dimensional image. Because both images are captured at the same time by the hybrid dual-camera system, they contain highly similar information. This image-to-image similarity makes the matching process much simpler. We therefore assume that there is no rotation between the images and that there is only one orientation. Under this assumption, we calculate the gradient of the line joining each pair of matched keypoints and classify the pair as having either a positive or a negative orientation.

The Theorem Trigonometry Filter calculates and classifies the orientation of the matched pairs. The gradient angle θ allowed by the filter lies in the range 0 < θ < 90 degrees. Pairs whose gradient angle falls outside a tolerance band, for example the average gradient angle ± 5 degrees, are discarded.
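One possible reading of this filter is sketched below. Several details are assumptions rather than statements from the text: the two images are taken to be placed side by side so that the right image is shifted by `x_offset` pixels, the sign of the angle of the line joining each matched pair decides the positive or negative class, the larger class is kept, and the average is computed over that class only. The name `ttf_filter` and its parameters are hypothetical.

```python
import math

def ttf_filter(pairs, x_offset, tolerance_deg=5.0):
    """Keep matched pairs whose connecting-line angle is close to the dominant one.

    pairs: list of ((x1, y1), (x2, y2)), first point in the left image and
    second point in the right image. x_offset is the width of the left image,
    i.e. the horizontal shift of the right image in the side-by-side layout.
    """
    angles = [math.degrees(math.atan2(y2 - y1, (x2 + x_offset) - x1))
              for (x1, y1), (x2, y2) in pairs]
    # Classify each pair by the sign of its gradient angle.
    positive = [i for i, a in enumerate(angles) if a >= 0]
    negative = [i for i, a in enumerate(angles) if a < 0]
    majority = positive if len(positive) >= len(negative) else negative
    if not majority:
        return []
    mean_angle = sum(angles[i] for i in majority) / len(majority)
    # Discard pairs outside the band: mean angle +/- tolerance.
    return [pairs[i] for i in majority
            if abs(angles[i] - mean_angle) <= tolerance_deg]

pairs = [
    ((100, 100), (80, 138)), ((150, 120), (130, 159)), ((200, 50), (180, 90)),
    ((60, 30), (40, 71)), ((90, 10), (70, 52)),   # consistent positive pairs
    ((50, 10), (30, 120)),                         # positive but much steeper
    ((120, 200), (100, 100)), ((30, 220), (10, 120)),  # negative pairs
]
kept = ttf_filter(pairs, x_offset=320, tolerance_deg=5.0)
```

Because a plain average is sensitive to outliers, a single steep pair shifts the band; with few matches, a median or a second filtering pass might be more robust, although the text only mentions the average.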

Fig. 2. The setup of PTZ and Wide-Angle Cameras

3 Proposed Calibration Method

In our approach, different pan, tilt and zoom values were set for the PTZ camera. For each value, images were captured from both cameras to build a camera network database. The database was constructed frame by frame from the current views of the wide-angle camera and the PTZ camera. Image features were then extracted using the SIFT algorithm: each image was processed and its keypoint information gathered. After the keypoints from both cameras had been collected, the two images were matched by nearest-neighbour search in the feature descriptor space using a k-d tree. The Theorem Trigonometry Filter was then applied to increase the detection rate of the matching process. Lastly, we localized the respective view within the scene. These steps are repeated until the PTZ camera has covered the entire environment while matching against the wide-angle camera.

Through these processes, we stored all the coordinate values in a database. This large database is useful for tracking the region of interest in future enhancements, for example, tracking an object from one place to another.
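The overall loop can be outlined as the following skeleton. It is only a sketch of the control flow: the function `calibrate` and its callable parameters (`capture`, `extract`, `match`, `keep`, `localize`) are hypothetical placeholders for the camera interface, SIFT extraction, k-d tree matching, the filter and the localization step, and representing the localized view as a bounding box of the retained wide-angle keypoints is an assumption.

```python
def calibrate(ptz_settings, capture, extract, match, keep, localize):
    """Run the capture/extract/match/filter/localize loop for each PTZ setting."""
    database = {}
    for setting in ptz_settings:
        wide_image, ptz_image = capture(setting)       # one frame per camera
        wide_kp, ptz_kp = extract(wide_image), extract(ptz_image)
        pairs = keep(match(wide_kp, ptz_kp))           # match, then filter
        if pairs:
            database[setting] = localize(pairs)        # store coordinates
    return database

def bounding_box(pairs):
    """Axis-aligned box around the wide-angle keypoints of the retained pairs."""
    xs = [p[0][0] for p in pairs]
    ys = [p[0][1] for p in pairs]
    return (min(xs), min(ys), max(xs), max(ys))

# Stub components standing in for real cameras and feature extraction.
result = calibrate(
    ptz_settings=[(0, 0, 1), (10, 0, 1)],
    capture=lambda setting: ("wide", "ptz"),
    extract=lambda image: [(10, 20), (30, 5)],
    match=lambda a, b: list(zip(a, b)),
    keep=lambda pairs: pairs,
    localize=bounding_box,
)
```

Passing the stages in as callables keeps the loop independent of any particular camera driver or feature library, so each stage can be replaced or tested separately.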

Fig. 3. Flowchart of the Processing Steps of the Proposed Hybrid Dual-Camera System

[Flowchart: Start; set PTZ value and capture images from both cameras; extract image features using SIFT; match keypoints using a k-d tree; filter the keypoints using the Theorem Trigonometry Filter; localize the respective view and store the coordinate values in the database; if there is a next image, repeat; otherwise, End.]

4 Experiment Results

Two different types of cameras were used in this project: a Samsung Mini SmartDome (PTZ) camera and a normal wide-angle camera. The PTZ camera has 10-megapixel resolution and 10 times optical zoom, while the wide-angle camera has 2-megapixel resolution and 1 times optical zoom. Both were placed at a corner of the room, at a height of about 2.5 meters and 0.3 meter from each other. Images from both cameras were captured at a resolution of 320x240 pixels.

Fig. 4 shows examples of frames captured by the PTZ camera with different values of pan, tilt and zoom. These images are stored in the database, which is then matched against the wide-angle camera image. Fig. 5 shows the matching process between the wide-angle camera and the PTZ camera using the SIFT algorithm.

Fig. 4. Multiple Frames by the PTZ Camera with Different Values of Pan, Tilt and Zoom

In Fig. 5, the matching process was conducted to locate the PTZ camera’s view within the image from the wide-angle camera, using the PTZ camera’s image as a model. The images captured by the two cameras were placed side by side. Matching using the SIFT algorithm and the Theorem Trigonometry Filter is then demonstrated. The red solid lines show the correct matches, while the blue lines are matches that fall outside the TTF threshold. Lastly, we localize the respective view, shown as a yellow box, and only the keypoints retained as true matches are shown in the images. The red box is the region of interest once the matching process is completed.

Fig. 5. SIFT-based Keypoint Detection and Matching between Wide Angle Camera (Left) and PTZ Camera (Right)

Table 1. The accuracy (%) obtained with different threshold values

Kd-tree matching threshold | Positive matching pairs | Negative matching pairs | Without filter (%) | With filter (%)
0.4 | 39 | 31 | 55.71 | 94.87
0.5 | 55 | 46 | 54.46 | 90.90
0.6 | 75 | 67 | 52.81 | 90.67

In this experiment, different k-d tree matching threshold values were applied to a sample image. Following the process in Fig. 3, the threshold value 0.4 yielded 39 pairs of keypoints with positive gradient and 31 pairs with negative gradient at matching locations. Since the positive pairs form the majority, we take the positive gradient as covering more of the region of interest in the image. The matching percentage without the filter is therefore 39/(39 + 31) = 55.71%. The accuracy increases from 55.71% to 94.87% after the Theorem Trigonometry Filter is applied.
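The unfiltered percentages in Table 1 can be reproduced from the pair counts with a few lines of arithmetic. The function name `matching_rate` is hypothetical, and the sketch only covers the "without filter" column, since the text does not state the pair counts behind the filtered figures.

```python
def matching_rate(positive_pairs, negative_pairs):
    """Percentage of matched pairs that fall in the majority (positive) class."""
    return 100.0 * positive_pairs / (positive_pairs + negative_pairs)

# Pair counts taken from Table 1: threshold -> (positive, negative).
table = {0.4: (39, 31), 0.5: (55, 46), 0.6: (75, 67)}
rates = {t: round(matching_rate(pos, neg), 2) for t, (pos, neg) in table.items()}
for threshold, rate in rates.items():
    print(threshold, rate)
```

For the threshold 0.6 this gives 52.82 after rounding, against the 52.81 printed in Table 1, which is consistent with the table value having been truncated rather than rounded.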


5 Conclusion and Future Works

In this paper, a calibration method that copes with the different appearance of objects under different viewpoints and illuminations has been presented, based on a basic PTZ camera network. The empirical results show that the proposed method increases the calibration accuracy from 55.71% to 94.87%. Furthermore, the proposed algorithm does not require any 3D pre-processing to identify the region of interest from the view of the PTZ camera. This approach offers a better solution for camera calibration in different scenarios. However, the proposed approach has some limitations; in particular, the detection rate is affected by highly similar objects located near one another. In future work, we will further reduce the computational time and fine-tune the internal and external parameters that cause imperfections in the system.

References

1. Liu, R., Zhang, H., Liu, M., Xia, X., Hu, T.: Stereo Cameras Self-calibration Based on SIFT. In: 2009 International Conference on Measuring Technology and Mechatronics Automation (2009), doi:10.1109/ICMTMA.2009.338

2. de Agapito, L., Hayman, E., Reid, I.D.: Self-calibration of rotating and zooming cameras. International Journal of Computer Vision 45(2) (November 2001)

3. Del Bimbo, A., Dini, F., Lisanti, G., Pernici, F.: Exploiting distinctive visual landmark maps in pan–tilt–zoom camera networks. Computer Vision and Image Understanding 114, 611–623 (2010)

4. Bo, W., Nevatia, R.: Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors. International Journal of Computer Vision, doi:10.1007/s11263-006-0027-7

5. Zhou, H., Yuan, Y., Shi, C.: Object tracking using SIFT features and mean shift. Computer Vision and Image Understanding 113, 345–352 (2009)

6. Lowe, D.G.: Object Recognition from Scale-Invariant Features. In: Proc. of International Conference on Computer Vision, Corfu, pp. 1150–1157 (September 1999)

7. Lindeberg, T.: Scale-space theory: a basic tool for analysing structures at different scales. J. Appl. Statist. 2(2), 224–270 (1994)

8. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision (2004)

9. Liu, J., Hubbold, R.: Automatic Camera Calibration and Scene Reconstruction with Scale-Invariant Features. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci, V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006. LNCS, vol. 4291, pp. 558–568. Springer, Heidelberg (2006)

10. Beis, J.S., Lowe, D.G.: Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: CVPR, p. 1000 (1997)

11. Ke, Y., Sukthankar, R.: PCA-SIFT, A more distinctive representation for local image descriptors. In: CVPR, pp. 506–513 (2004)

12. Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3(3), 209–226 (1977)
