IEEE 2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)



Vehicle Logo Detection Using Convolutional Neural Network and Pyramid of Histogram of Oriented Gradients

Wasin Thubsaeng, Aram Kawewong, Karn Patanukhom

Visual Intelligence and Pattern Understanding Laboratory, Department of Computer Engineering, Chiang Mai University, Chiang Mai, Thailand

[email protected]

Abstract— This paper presents a new method for detecting and recognizing vehicle logos in images of the front and back views of vehicles. The proposed method is a two-stage scheme that combines a Convolutional Neural Network (CNN) with Pyramid of Histogram of Oriented Gradients (PHOG) features. The CNN is applied in the first stage to detect candidate regions and recognize the vehicle logos. PHOG with a Support Vector Machine (SVM) classifier is then employed in the second stage to verify the results of the first stage. Experiments are performed on a dataset of vehicle images collected from the internet. The results show that the proposed method locates and recognizes vehicle logos accurately and is more robust than conventional schemes. On a dataset of 20 vehicle-logo classes, the proposed method achieves up to 100% recall, 96.96% precision, and a 99.99% recognition rate.

Keywords— vehicle logo; logo detection; logo recognition; CNN; PHOG.

I. INTRODUCTION

Vehicle detection and recognition is a research problem in the computer vision community that is closely associated with intelligent transport systems. The task is to locate the areas of an image that contain vehicles and then identify their brands or models, which can lead to automatic systems that monitor and track vehicles and detect traffic-law violations. The main difficulty is that many vehicle models and designs exist for different brands today, and they change quickly over time. Therefore, the logo, which tends to be identical across vehicles of the same brand regardless of model and design, is chosen as the main feature for vehicle detection.

To find the logo in a scene image, researchers have applied a variety of image features and recognition models. However, many of these methods are not robust to complex backgrounds and vehicle textures. Apostolos et al. [1], [2] proposed a SIFT-based method for vehicle logo detection and recognition. System performance was enhanced by merging features from multiple images. A generalized Hough transform was used for feature clustering [1], while geometric verification was performed with an affine transformation.

Instead of using SIFT, vehicle logos can be located using the license plate location [2], [4], a symmetry axis [2], [3], or an edge image [3], [4]. Kai et al. [3] proposed a hybrid scheme for vehicle-logo localization using appearance features and the symmetry property. The combination of symmetry and edge-based features for grille detection, which provides information about the relative position of the vehicle logo, is used to detect and recognize vehicle logos efficiently in front-view vehicle images. The work outperforms related works that use only symmetry for logo detection. Yang et al. [4] proposed a vehicle-logo detection approach based on edge detection and projection, in which the logo is detected using the license plate location. After the plate is detected, vertical and horizontal projections are applied to locate the logo area, which is relatively close to the license plate. However, methods that rely on license plate detection may not perform well in many conditions, such as when the license plate is not located at the center of the vehicle's front or when the license plate is censored or blurred for privacy reasons, as in Google Street View.

Tong Sam et al. [5] proposed vehicle detection and recognition models using the Modest AdaBoost algorithm and radial Tchebichef moments. The system can recognize vehicle logos regardless of viewpoint variations such as rotation, scaling, translation, and skewing. To detect the vehicle logo, Haar-like features are extracted to represent parts of the image, and the Modest AdaBoost algorithm is used to classify them. After the logo has been detected, the system normalizes it and shifts it back to the normal front viewpoint. The image pattern is then represented by radial Tchebichef moments and recognized with a k-Nearest-Neighbor (k-NN) classifier.

In this paper, we address both the detection and the recognition of vehicle logos from the front and rear views of vehicles. The proposed method identifies the positions of vehicle logos and recognizes them. To improve accuracy, we propose a two-stage scheme that combines a Convolutional Neural Network (CNN) with PHOG, as illustrated in Fig. 1. In the first stage, a CNN, a framework used in many visual recognition works [6]-[9], is applied to select candidate regions that are likely to be the manufacturer logo. In the general process, the CNN extracts low-level features from the image. Then, an unsupervised learning method such as K-means clustering [6], [7], Restricted Boltzmann Machines (RBM), auto-encoders, or sparse coding is used to learn a set of features. In this work, we instead use the modified K-means of [6], [7] to learn the centroids, since this method runs faster than the others mentioned.

After the first stage, candidate regions are located. However, the results may contain falsely detected regions along with the correct logo areas. In this framework, the Pyramid of Histogram of Oriented Gradients (PHOG), which has been successfully applied to tasks such as vehicle detection [11] and human posture recognition [12], is used to verify the candidate regions and eliminate the false regions that do not contain a logo. This operation matters because it improves the detection rate for vehicle logos, which in turn improves the overall performance of the system.

This paper is organized as follows. Section II details the proposed vehicle logo detection and recognition scheme. Section III presents the experimental methodology and results used to evaluate the scheme, and Section IV concludes the work.

II. THE PROPOSED FRAMEWORK

A. System Overview

In this work, we propose a vehicle logo detection and recognition framework that locates the logo in front-view and rear-view vehicle images and recognizes the vehicle manufacturer from it. The proposed scheme is based on a combination of CNN and PHOG features. An overview of the framework is shown in Fig. 1. The system can locate logos and classify the vehicle manufacturer from them even when the test images have complex backgrounds. The process is divided into two main steps: locating candidate regions using the CNN framework, and verifying the candidate regions using PHOG features with a Support Vector Machine (SVM) classifier.

Before the CNN operations, a sliding-window search over a multi-scale pyramid representation is applied to obtain image windows. However, to improve detection speed in a practical implementation, the system does not pass every scanned window to the CNN classifier. Instead, it extracts a saliency map [14] and uses the map to reject windows with low edge density and low saliency before they are passed to the CNN classifier. In the CNN stage, the system then decides whether each window is a logo region and, if so, determines the class of the logo. The outputs of the CNN classifier are labels for the input windows. Windows whose labels belong to one of the logo classes are considered candidate regions for the next stage. Windows whose labels do not belong to any logo class are rejected, as shown in Fig. 1.
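The pre-filtering step can be sketched roughly as follows. This is a minimal single-scale illustration, not the authors' implementation: the saliency map is assumed to be given as an array of the same size as the image, edge density is approximated by thresholding the gradient magnitude, and the function names, stride, and thresholds are all hypothetical.

```python
import numpy as np

def edge_density(window, grad_threshold=0.1):
    """Fraction of pixels whose gradient magnitude exceeds a threshold."""
    gy, gx = np.gradient(window.astype(float))
    return float(np.mean(np.hypot(gx, gy) > grad_threshold))

def candidate_windows(image, saliency, size=32, stride=8,
                      min_edge_density=0.05, min_saliency=0.2):
    """Yield the (row, col) corner of each window that passes both cheap tests.

    Thresholds are illustrative assumptions; the paper does not report them.
    """
    rows, cols = image.shape
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            win = image[r:r + size, c:c + size]
            sal = saliency[r:r + size, c:c + size]
            if edge_density(win) >= min_edge_density and sal.mean() >= min_saliency:
                yield r, c
```

Only the windows yielded here would be passed on to the CNN classifier. A multi-scale search would repeat the same loop on resized copies of the image.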

However, the results classified by the CNN may contain some false detections. To verify each candidate region, PHOG features are extracted and a binary SVM classifier re-checks whether the region is a logo. The SVM classifier in the second stage rejects the regions that are not labeled as logos and keeps the class label obtained in the CNN stage as the final answer.
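The decision logic of the two stages can be summarized in a few lines. In this sketch the two classifiers are passed in as plain callables; `cnn_classify`, `phog_verify`, and the `"background"` label are hypothetical names used only for illustration.

```python
BACKGROUND = "background"  # assumed name for the non-logo class

def detect_logos(windows, cnn_classify, phog_verify):
    """Return (window, brand) pairs that survive both stages."""
    accepted = []
    for window in windows:
        label = cnn_classify(window)          # stage 1: multi-class label
        if label == BACKGROUND:
            continue                          # rejected by the CNN stage
        if phog_verify(window):               # stage 2: logo / non-logo check
            accepted.append((window, label))  # the CNN label is the final answer
    return accepted
```

Note that the second stage never changes the brand label. It can only discard a candidate, which is why it affects precision rather than the recognized class.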

B. Candidate Region Detection Using CNN

A CNN is a machine learning scheme for image feature extraction and classification that has been applied in many works, such as character recognition [6], [7] and face recognition [9]. A CNN takes an image as input. To train the CNN in the first stage, features are extracted from a training set of positive samples, which are logo images labeled with the class of each brand of interest, and negative samples, which are images of environmental backgrounds.

In this stage, each sub-window of the scanned image, obtained with the sliding-window and multi-scale pyramid scheme, is input to the multi-class CNN classifier to check whether it belongs to the environmental background class or to one of the logo classes. In the feature extraction process, the windowed images pass through two types of layers: convolution layers and sub-sampling layers. These two operations may be repeated more than once in the CNN structure. The output images of the final sub-sampling layer can then be used as inputs to a classifier such as a Multi-Layer Perceptron (MLP) or an SVM to label the class of the input image.

Fig. 1. System overview diagram. [Diagram labels: SALIENCY, CNN, PHOG, SVM, Yes/No, Reject, Result.]

Fig. 2. Architecture of CNN used in this work. [Feature map sizes: 32×32, 25×25×64, 5×5×64, 4×4×64, 2×2×64. Stages: Convolution, Subsampling, Convolution, Subsampling, Classification (SVM).]

Fig. 3. PHOG feature extraction process. [Panels: Level 0, Level 1, Level 2.]

To compute the CNN features, the input images of a convolution layer are convolved with a set of filters, one for each node of the layer. The filters are learned without supervision from patches of the sample images in the training set. In this work, the variant of K-means clustering presented in [6], [7] is used to group normalized and whitened patches of the sample images. The center of each cluster is used as a filter kernel in the convolution layer. According to the CNN architecture, before the filters are applied in the convolution layer, every input image is weighted according to the MLP architecture. Each filter is then convolved with every input image to obtain the response of that convolutional node. After the responses of all convolutional nodes are calculated, they are sent to the sub-sampling layer. In the sub-sampling layer, a spatial pooling scheme resizes the response images of the convolution process, which reduces the dimensionality of the feature space. As a result, after one set of convolution and sub-sampling processes, the number of feature maps increases while their resolution decreases. The outputs of the sub-sampling layer can be sent to further convolution and sub-sampling layers to repeat the same process. In this work, this set of processes is applied twice before the result is sent to the classification layer, as illustrated in Fig. 2. Finally, the classification layer takes the responses of the previous layer as the features to be classified. In this work, a multi-class SVM classifier estimates the class label of the input image from the features extracted by the convolution and sub-sampling layers.
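As a rough illustration of how filter kernels can be learned from patches, the sketch below normalizes and ZCA-whitens flattened patches and then runs plain Lloyd's K-means. It is not the specific K-means variant of [6], [7]; the function names, the regularization constants, and the number of iterations are assumptions made for the example.

```python
import numpy as np

def normalize(patches, eps=10.0):
    """Per-patch brightness and contrast normalization (eps is an assumption)."""
    patches = patches - patches.mean(axis=1, keepdims=True)
    return patches / np.sqrt(patches.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """Decorrelate the pixel dimensions with a ZCA transform."""
    mean = patches.mean(axis=0)
    cov = np.cov(patches - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return (patches - mean) @ transform

def learn_filters(patches, k=64, iters=10, seed=0):
    """Cluster whitened patches; each centroid serves as one filter kernel."""
    rng = np.random.default_rng(seed)
    x = zca_whiten(normalize(patches))
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distances via |x|^2 - 2 x.c + |c|^2, then nearest centroid.
        dist = ((x ** 2).sum(axis=1)[:, None] - 2.0 * x @ centroids.T
                + (centroids ** 2).sum(axis=1)[None, :])
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids
```

With 8-by-8 patches flattened to 64 values, `learn_filters(patches, k=64)` returns a 64-by-64 array whose rows can be reshaped to 8-by-8 kernels.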

C. Candidate Region Validation Using PHOG

The candidate regions detected in the CNN stage, which may contain some false positives, are verified using the PHOG descriptor. PHOG [10] is a feature used in many object recognition and detection tasks, such as vehicle detection [11] and vehicle model recognition [15]. It is a spatial shape descriptor computed from the distribution of edge orientations. The descriptor combines the Histogram of Oriented Gradients (HOG) descriptor [12] with multi-scale pyramid segmentation. To calculate the HOG descriptor, Canny edge detection is applied to the input image to locate edge pixels. Gradient directions are then calculated for every edge pixel and quantized into an N-bin histogram. The gradient magnitude of each edge pixel is accumulated in the bin corresponding to its orientation. Finally, the histogram is normalized, resulting in an N-dimensional HOG descriptor.
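The histogram construction can be sketched as below. To keep the example dependency-free, the Canny edge detector is replaced by a simple threshold on the gradient magnitude, which is a stand-in and not the method described above; the function name and the threshold value are likewise assumptions.

```python
import numpy as np

def hog_descriptor(image, n_bins=8, edge_threshold=0.1):
    """N-bin histogram of gradient orientations, weighted by magnitude."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0   # orientation in [0, 360)
    edges = magnitude > edge_threshold               # stand-in for Canny edges
    bins = (angle[edges] / (360.0 / n_bins)).astype(int) % n_bins
    hist = np.bincount(bins, weights=magnitude[edges], minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist       # normalize to sum to 1
```

For an image whose intensity increases only from left to right, every gradient points along 0 degrees, so all of the weight falls into the first bin.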

To calculate the PHOG descriptor, the input image is divided equally into segments whose sizes depend on the pyramid level. At pyramid level L, the image is divided into a 2^L × 2^L grid, that is, 4^L segments. Fig. 3 shows an example of PHOG extraction at pyramid levels 0, 1, and 2, where the image is divided equally into 1, 4, and 16 segments, respectively. The HOG descriptors of every image segment at every pyramid level are extracted and concatenated into the PHOG descriptor. This combination of HOG features captures the distribution of local gradients, or edge directions.
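The pyramid concatenation can be sketched as follows. This is a simplified illustration under stated assumptions: gradients are computed independently inside each segment, every pixel contributes (no edge detection step), and the final vector is normalized as a whole; the helper names are hypothetical.

```python
import numpy as np

def hog(cell, n_bins):
    """Unnormalized orientation histogram of one image segment."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = (angle / (360.0 / n_bins)).astype(int) % n_bins
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)

def phog(image, levels=1, n_bins=8):
    """Concatenate the HOGs of every segment at pyramid levels 0..levels."""
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level                           # segments per side
        for rows in np.array_split(image, cells, axis=0):
            for cell in np.array_split(rows, cells, axis=1):
                parts.append(hog(cell, n_bins))
    descriptor = np.concatenate(parts)
    total = descriptor.sum()
    return descriptor / total if total > 0 else descriptor
```

The descriptor length is N times the total number of segments. With N = 8 bins and levels 0 and 1, that is 8 × (1 + 4) = 40 values; adding level 2 gives 8 × (1 + 4 + 16) = 168.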

To verify the results of the CNN stage, the proposed framework extracts PHOG descriptors from the candidate regions and classifies them using a binary SVM with a radial basis function (RBF) kernel. The SVM classifier is trained on a dataset of both positive samples (logo images) and negative samples (backgrounds and other parts of vehicles). In this verification stage, the SVM classifier rejects the non-logo regions.

III. EXPERIMENTAL RESULTS

A. Experiment I

In the first experiment, three training datasets of image segments are used to evaluate the performance of the proposed scheme in comparison with the conventional CNN and PHOG methods. The first dataset has 3,000 image segments, composed of 1,000 segments of manually cropped logo images from 20 manufacturer brands and 2,000 segments manually cropped from environmental backgrounds. The second dataset has 5,000 segments, composed of the same positive samples as the first dataset with the number of negative samples increased to 4,000. Finally, the third dataset consists of 7,000 segments, which include 6,000 negative samples. The image segments in every dataset are 32 × 32 pixels. The manufacturer brands in the datasets are listed in Table III. Examples of logo segments and non-logo segments in the training set are given in Fig. 4 and Fig. 5, respectively.

Fig. 4. Examples of vehicle logo segments in training set.

Fig. 5. Examples of non-vehicle logo segments in training set.

Fig. 6. 8-by-8 grayscale filter patches from CNN operation.

The systems are tested on 200 vehicle images, ten per manufacturer brand. The images are mainly collected from websites such as used-car dealer sites and via Google image search. Example images from the test set are given in Fig. 7. Variation in image scale is not yet considered in this experiment. Therefore, to simplify the problem, the images in the test set are resized in advance so that the logo areas have a suitable size, and the multi-scale pyramid scheme is not applied.

For the CNN process, the input windows are taken by scanning the images with a 32 × 32 sliding window. The CNN structure with two convolution and sub-sampling layers shown in Fig. 2 is used in this experiment. The number of filters used in a convolution layer is 64. Examples of filter kernels obtained from the training data are illustrated in Fig. 6. Blocks of 5 × 5 and 2 × 2 are used in the pooling processes of the first and second sub-sampling layers, respectively. As a result, the feature extraction layers transform a 32 × 32 input image into 64 output images of size 2 × 2, which are used as the features for the SVM classifier in the classification layer.
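The feature map sizes can be checked with a small forward pass. The sketch below makes several assumptions that the text does not state: the first-layer kernels are 8 × 8 (suggested by the Fig. 6 caption), the second-layer kernels are 2 × 2 (inferred from the 5 × 5 to 4 × 4 step in Fig. 2), pooling is non-overlapping averaging, and no nonlinearity is applied. The filter values in the usage example are random placeholders.

```python
import numpy as np

def conv_valid(maps, filters):
    """Valid cross-correlation: (H, W, C) with (K, fh, fw, C) -> (H-fh+1, W-fw+1, K)."""
    H, W, _ = maps.shape
    K, fh, fw, _ = filters.shape
    out = np.empty((H - fh + 1, W - fw + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = maps[i:i + fh, j:j + fw, :]
            out[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def pool(maps, block):
    """Non-overlapping average pooling over block x block regions."""
    H, W, K = maps.shape
    return maps.reshape(H // block, block, W // block, block, K).mean(axis=(1, 3))

def extract_features(window, filters1, filters2):
    """32x32 window -> 25x25x64 -> 5x5x64 -> 4x4x64 -> 2x2x64 -> 256 values."""
    x = window[:, :, None]                  # add a single input channel
    x = pool(conv_valid(x, filters1), 5)    # first convolution + sub-sampling
    x = pool(conv_valid(x, filters2), 2)    # second convolution + sub-sampling
    return x.reshape(-1)
```

For example, with `filters1` of shape (64, 8, 8, 1) and `filters2` of shape (64, 2, 2, 64), `extract_features` returns a vector of 2 × 2 × 64 = 256 values, which is what the classification layer would receive.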

For PHOG-based classification, we use PHOG with pyramid level L = 1. The number of orientation bins used in this experiment is N = 8 over a range of 0-360°. For the individual PHOG scheme, which must classify the logo classes, a k-NN classifier is chosen for comparison with the proposed scheme because it empirically provides the best results on these datasets. In the verification stage of the proposed scheme, on the other hand, a binary SVM classifier with an RBF kernel classifies the PHOG descriptors. Because the verification stage does not need to distinguish between logo classes, the binary SVM classifier performs well.

To evaluate the performance of our method and the baseline methods, precision, recall, and accuracy are calculated from the results. The same training sets and test set are used for the CNN, PHOG, and proposed methods. The results are presented in Table I. Logo detection performance is evaluated via precision and recall. Precision is the ratio of the number of logo segments correctly found by the system to the number of all segments found by the system (both correct and incorrect). Recall is the ratio of the number of logo segments correctly found by the system to the number of logo segments appearing in the test set. According to Table I, the CNN scheme yields high recall but low precision. In contrast, the PHOG scheme provides higher precision with low recall. Compared with the baseline methods, the proposed scheme balances the trade-off between precision and recall. The results show that the high recall of the CNN and the high precision of PHOG complement each other in the proposed combination. Across the different dataset sizes, the conventional CNN achieves its best performance, 100% recall, with the training set of 3,000 segments, and the conventional PHOG achieves its best precision, 75.35%, with the training set of 7,000 segments. When the best training set is chosen for each stage, that is, training the CNN on 3,000 image segments and PHOG on 7,000 image segments, the proposed scheme achieves 100% recall with 96.96% precision, a significant improvement over the baseline CNN and PHOG schemes.
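The two detection measures follow directly from the counts of true and false positives. A minimal sketch, with a hypothetical function name:

```python
def precision_recall(true_positives, false_positives, n_logos):
    """Precision and recall as defined above, returned as fractions."""
    found = true_positives + false_positives   # all segments the system reported
    precision = true_positives / found if found else 0.0
    recall = true_positives / n_logos if n_logos else 0.0
    return precision, recall
```

For instance, the SUZUKI row of Table II reports 18 true positives and 2 false positives over 20 test images, which gives a precision of 18/20 = 90% and a recall of 18/20 = 90%.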

Logo recognition performance is measured by accuracy (recognition rate). In this experiment, there are 21 class labels (20 for the manufacturer brands and one for the unknown class) for classifying the 32 × 32 image windows in the 200 test images. Accuracy is calculated as the ratio of the number of image windows recognized correctly to the number of all image windows in the 200 test images. The results in Table I show that the proposed scheme provides higher accuracy than CNN and PHOG.

B. Experiment II

In this experiment, the proposed scheme is tested on detection and recognition with two datasets that contain 10 and 20 classes of logos. In both cases, a test set of 200 images is used, with training sets of 3,000 image segments for the CNN stage and 7,000 image segments for the PHOG stage. The detailed results for each vehicle brand are given in Table II and Table III for 10 and 20 classes, respectively. The "TP" and "FP" columns show the numbers of true positives and false positives, respectively. The "#Images" column gives the number of test images in each class. The experiments show satisfactory results in both detection and recognition.

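TABLE I. RESULTS FROM DIFFERENT SIZES OF TRAINING SETS

| Methods | Training Set | Recall (%) | Precision (%) | Accuracy (%) |
|---|---|---|---|---|
| CNN | 3000 | 100.00 | 49.22 | 99.89 |
| CNN | 5000 | 95.50 | 58.64 | 99.93 |
| CNN | 7000 | 95.50 | 54.13 | 99.90 |
| PHOG with k-NN | 3000 | 40.00 | 71.44 | 99.91 |
| PHOG with k-NN | 5000 | 39.50 | 69.96 | 99.91 |
| PHOG with k-NN | 7000 | 45.50 | 75.35 | 99.92 |
| Proposed | 3000 | 84.00 | 74.29 | 99.94 |
| Proposed | 5000 | 97.50 | 58.05 | 99.93 |
| Proposed | 7000 | 83.00 | 67.46 | 99.94 |
| Proposed | 3000 (CNN) & 7000 (PHOG) | 100.00 | 96.96 | 99.99 |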


In this experiment, we found that false positives occur when vehicle components or parts of the environment have shapes similar to the logo images. For example, the headlights and fog lights of a vehicle are often round, and their shape details are sometimes very similar to vehicle logos such as BMW and ALFA ROMEO. On the other hand, false negatives can be caused by shadows in the scene or by low-resolution images.

C. Experiment III

In this section, the proposed method is compared with other state-of-the-art methods. The results are given in Table IV. The methods compared are HOG with a k-NN classifier, PHOG with a k-NN classifier, the CNN classifier, SIFT with a relative matching scheme, and the proposed scheme. The training dataset of 3,000 image segments is used for the conventional CNN approach and the CNN stage of the proposed scheme, while the training dataset of 7,000 image segments is used for the HOG scheme, the PHOG scheme, and the PHOG verification stage of the proposed scheme. For the SIFT-based approach, SIFT keypoints are extracted from the training images to construct a set of keypoints for each class of logo. Since negative samples are not necessary in the SIFT-based approach, we use the 1,000 logo images shared by the 3,000- and 7,000-segment datasets to extract the prototype keypoints. The relative matching scheme used in [13] is then applied to detect logo keypoints in the test images. The test dataset for every method is the same set of 200 vehicle images of 20 classes used in Experiment I.

According to the results in Table IV, the proposed method outperforms the baseline methods in both logo detection and logo recognition. Examples of the results from each method are illustrated in Fig. 7. Both the CNN and the proposed scheme achieve 100% recall, while the proposed scheme improves precision by 47.74 percentage points, from 49.22% to 96.96%.

The SIFT matching method achieves approximately the same precision as the CNN, but its recall is significantly lower because the size and resolution of the logo images in the training set are not sufficient to extract SIFT keypoints effectively.

In contrast to the PHOG scheme, the HOG scheme performs better in terms of recall, although its precision is very low compared with the other methods. A possible reason is that the HOG descriptor captures only global detail, whereas PHOG provides more specific local detail at the higher pyramid levels. As Fig. 7 shows, a number of false detections can be observed for the HOG scheme, corresponding to its 20.57% precision, while the PHOG scheme fails to detect the logo, corresponding to its 45.50% recall.

IV. CONCLUSIONS

This paper proposed a new hybrid method for vehicle logo detection and recognition based on CNN and PHOG features. Since the CNN provides high recall in logo detection and high accuracy in logo recognition, the proposed scheme uses it in the first stage to locate and recognize candidate regions in the input image. However, the number of false detections from the CNN detector is still high, so PHOG features and a binary SVM classifier are used to verify the candidate regions. Because the PHOG-based classifier provides high precision, false detections can be rejected in the PHOG verification stage. In our experiments, the proposed two-stage scheme achieves high precision and recall in vehicle logo detection and high accuracy in recognition, and it copes with complex scenes. The results also show that the proposed scheme improves on the individual CNN and PHOG schemes as well as the other baseline methods.

TABLE II. DETECTION AND RECOGNITION RESULTS OF 10 CLASSES

| Models | #Images | TP | FP | Recall (%) | Precision (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| FORD | 20 | 20 | 0 | 100.00 | 100.00 | 100.00 |
| NISSAN | 20 | 20 | 0 | 100.00 | 100.00 | 100.00 |
| LEXUS | 20 | 18 | 0 | 90.00 | 100.00 | 99.98 |
| MAZDA | 20 | 19 | 0 | 95.00 | 100.00 | 99.99 |
| BENZ | 20 | 20 | 0 | 100.00 | 100.00 | 100.00 |
| HONDA | 20 | 20 | 0 | 100.00 | 100.00 | 100.00 |
| SUZUKI | 20 | 18 | 2 | 90.00 | 90.00 | 99.98 |
| AUDI | 20 | 18 | 1 | 90.00 | 94.73 | 99.94 |
| VOLK | 20 | 19 | 1 | 95.00 | 95.00 | 99.98 |
| CITROEN | 20 | 19 | 3 | 95.00 | 86.36 | 99.97 |
| Average | | | | 95.50 | 96.60 | 99.98 |

TABLE III. DETECTION AND RECOGNITION RESULTS OF 20 CLASSES

| Models | #Images | TP | FP | Recall (%) | Precision (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| TOYOTA | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| ROMEO | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| BMW | 10 | 10 | 1 | 100.00 | 90.90 | 99.99 |
| VOLK | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| AUDI | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| CITROEN | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| FIAT | 10 | 10 | 1 | 100.00 | 90.90 | 99.99 |
| PEUGEOT | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| BENZ | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| HONDA | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| HYUNDAI | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| KIA | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| FORD | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| NISSAN | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| LEXUS | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| MAZDA | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| MITSUBISHI | 10 | 10 | 2 | 100.00 | 83.33 | 99.99 |
| PROTON | 10 | 10 | 2 | 100.00 | 83.33 | 99.99 |
| TATA | 10 | 10 | 1 | 100.00 | 90.90 | 99.99 |
| SUZUKI | 10 | 10 | 0 | 100.00 | 100.00 | 100.00 |
| Average | | | | 100.00 | 96.96 | 99.99 |
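The 96.96% average precision in Table III appears to be the mean of the per-brand precisions rather than a figure pooled over all detections. A quick check under that reading, using the TP and FP counts from the table (the variable names are only for illustration):

```python
# False positives per brand in Table III order; every brand has 10 true positives.
false_positives = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 2, 2, 1, 0]
true_positives = 10

per_brand = [true_positives / (true_positives + fp) for fp in false_positives]
macro_precision = sum(per_brand) / len(per_brand)      # mean of per-brand values

total_tp = true_positives * len(false_positives)
pooled_precision = total_tp / (total_tp + sum(false_positives))

print(f"macro average: {macro_precision:.4%}")
print(f"pooled:        {pooled_precision:.4%}")
```

The macro average comes to about 96.97%, which matches the reported 96.96% if the value is truncated rather than rounded. Pooling all 200 true positives and 7 false positives would instead give 200/207, about 96.62%.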

TABLE IV. COMPARISON OF RESULTS FROM SEVERAL METHODS

| Methods | Recall (%) | Precision (%) | Accuracy (%) |
|---|---|---|---|
| HOG with k-NN | 65.50 | 20.57 | 99.66 |
| PHOG with k-NN | 45.50 | 75.35 | 99.92 |
| CNN | 100.00 | 49.22 | 99.89 |
| SIFT | 62.00 | 51.89 | 99.90 |
| Proposed | 100.00 | 96.96 | 99.99 |


ACKNOWLEDGEMENTS

This research was supported by Chiang Mai University Research Fund and Graduate School, Chiang Mai University, Thailand.

REFERENCES

[1] A. Psyllos, C. N. Anagnostopoulos, and E. Kayafas, “M-SIFT: A new method for Vehicle Logo Recognition,” IEEE Inter. Conf. on Vehicular Electronics and Safety, Istanbul Turkey, July 24-27, 2012.

[2] A. Psyllos, and C. N. Anagnostopoulos, “Vehicle Logo Recognition Using a SIFT-Based Enhanced Matching Scheme,” IEEE Trans. on Intelligent Transportation Systems, vol. 11.2, pp. 322-328, 2010.

[3] K. Zhou, K. Mahesh Varadarajan, M. Vincze, and F. Liu, “Hybridization of Appearance and Symmetry for Vehicle-Logo Localization,” IEEE Inter. Conf. on Intelligent Transportation Systems (ITSC), 2012.

[4] Y. Liu, and S. Li, “A Vehicle-logo Location Approach Based on Edge Detection and Projection,” IEEE Inter. Conf. on Vehicular Electronics and Safety (ICVES), 2011.

[5] S. Kam-Tong, and X. Lin Tian, “Vehicle Logo Recognition Using Modest AdaBoost and Radial Tchebichef Moments,” Inter. Conf. on Machine Learning and Computing (ICMLC), 2012.

[6] A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D. J. Wu, and A. Y. Ng, “Text detection and character recognition in scene images with unsupervised feature learning,” IEEE Inter. Conf. on Document Analysis and Recognition (ICDAR), 2011.

[7] T. Wang, D. J. Wu, A. Coates, A. Y. Ng, “End-to-End Text Recognition with Convolutional Neural Networks,” IEEE Inter. Conf. on Pattern Recognition (ICPR), 2012.

[8] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

[9] H. Khalajzadeh, M. Mansouri, and M. Teshnehlab, “Face Recognition Using Convolutional Neural Network and Simple Logistic Classifier,” Soft Computing in Industrial Applications, Springer International Publishing, pp. 197-207, 2014.

[10] A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” Proceed. Inter. Conf. on Image and Video Retrieval (CIVR), 2007.

[11] W. Hailuo, W. Bo, and L. Sun, “Pyramid Histogram of Oriented Gradient and Particles Swarm Optimization Based SVM for Vehicle Detection,” IEEE Inter. Conf. on Image and Graphics (ICIG), 2013.

[12] N. Dalal, and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005.

[13] D.G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60.2, pp. 91-110, 2004.

[14] J. Harel, C. Koch, and P. Perona, “Graph-Based Visual Saliency,” Inter. Conf. on Advances in Neural Information Processing Systems, vol. 19, pp. 545, 2007.

[15] T. Anakavej, A. Kawewong, and K. Patanukhom, “Internet-Vision Based Vehicle Model Query System Using Eigenfaces and Pyramid of Histogram of Oriented Gradients,” IEEE Inter. Conf. on Signal-Image Technology and Internet-Based Systems (SITIS), 2013.

Fig. 7. Comparison of results from each method. [Panels: CNN, PHOG, SIFT, HOG, PROPOSED; two panels are marked "Not found".]
