poster_logo detection and recognition


Upload: kaicheng-wang

Post on 14-Apr-2017



Logo Detection and Recognition

Kaicheng Wang

School of Engineering, Stanford University

1. Abstract

2. Image Features

4. Results and Brief Discussion


3. Learning Algorithms

1. Scale-invariant feature transform (SIFT)

•  Invariant local features

•  Local maximum in space and scale

•  128-dimensional vector per point
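The "local maximum in space and scale" idea can be sketched as a 26-neighbour comparison over a stack of response maps. This is only an illustration: the `stack` layout and both function names are assumptions, and a full SIFT implementation adds further steps (descriptor computation among them) that are not shown here.

```python
from itertools import product

def is_scale_space_maximum(stack, s, y, x):
    """Return True if stack[s][y][x] is strictly greater than all 26 neighbours.

    `stack` is a hypothetical list of equally sized 2-D response maps,
    one per scale.
    """
    center = stack[s][y][x]
    for ds, dy, dx in product((-1, 0, 1), repeat=3):
        if ds == 0 and dy == 0 and dx == 0:
            continue  # skip the point itself
        if stack[s + ds][y + dy][x + dx] >= center:
            return False
    return True

def find_maxima(stack):
    """Scan interior positions and collect (scale, row, col) of local maxima."""
    found = []
    for s in range(1, len(stack) - 1):
        for y in range(1, len(stack[s]) - 1):
            for x in range(1, len(stack[s][y]) - 1):
                if is_scale_space_maximum(stack, s, y, x):
                    found.append((s, y, x))
    return found

# Toy 3x3x3 stack with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
maxima = find_maxima(stack)
```

Only interior positions are scanned, so every candidate has a full set of neighbours in both space and scale.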

2. K-nearest neighbors (paired with Fisherfaces)

•  Sliding detection windows: The testing image is scanned by windows of different sizes. The window that best matches the training data (the red window in the figure) is reported.
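A minimal sketch of the multi-size window scan follows. The stride, the window sizes, and the `score` callable are hypothetical; the poster does not give them, and the toy score below merely stands in for whatever similarity measure is used.

```python
def sliding_windows(image_w, image_h, sizes, stride):
    """Yield (x, y, w, h) for every window of each size that fits in the image."""
    for w, h in sizes:
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)

def best_window(image_w, image_h, sizes, stride, score):
    """Return the highest-scoring window; `score` is a hypothetical callable."""
    return max(sliding_windows(image_w, image_h, sizes, stride), key=score)

def toy_score(win):
    """Toy stand-in for a similarity score: prefer windows centred near (50, 40)."""
    x, y, w, h = win
    cx, cy = x + w / 2, y + h / 2
    return -((cx - 50) ** 2 + (cy - 40) ** 2)

sizes = [(20, 20), (40, 40)]
windows = list(sliding_windows(100, 80, sizes, 10))
best = best_window(100, 80, sizes, 10, toy_score)
```

In a real detector the score would compare features inside the window with the training data, which is where most of the cost lies.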

2. Fisherfaces

•  Several basis vectors

•  Maximizing difference between clusters

•  Minimizing variance within clusters
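The two clustering goals above are usually combined in the standard Fisher criterion, shown here as a sketch. The symbols are notation assumed for illustration and do not appear on the poster: $w$ is a basis vector, $S_B$ and $S_W$ are the between-class and within-class scatter matrices, $\mu_c$ and $N_c$ are the mean and size of class $c$, and $\mu$ is the overall mean.

```latex
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}, \qquad
S_B = \sum_{c} N_c\, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \qquad
S_W = \sum_{c} \sum_{x_i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}
```

Maximizing $J(w)$ pushes class means apart (numerator) while keeping each class compact (denominator).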

• SIFT and Fisherfaces are used to extract features from training and testing images.

• Naïve Bayes and K-nearest neighbors are used to build models for logo detection and recognition.

5. Acknowledgement

The author would like to thank Prof. Andrew Ng for his CS 229 lectures and TA Albert Haque for his guidance.

1. Naïve Bayes (paired with SIFT)

Training data: different versions of the same brand (Gatorade1, Gatorade2, Gatorade3)

With Fisherfaces, every image is expressed as a linear combination of basis vectors. The number of basis vectors kept is the feature dimension, which is chosen by k-fold cross-validation. As a result, both training and testing images have very low-dimensional features.
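To make the "linear combination of basis vectors" concrete, the sketch below reduces an image to one coefficient per basis vector. It assumes, for illustration only, that the basis is orthonormal so each coefficient is a dot product with the mean-centred image; the poster does not say how its coefficients are computed, and `project`, `mean`, and `basis` are hypothetical names.

```python
def project(image, mean, basis):
    """Return the coefficient of `image` on each basis vector.

    Assumes (illustration only) an orthonormal `basis`, so each coefficient
    is a plain dot product with the mean-centred image.
    """
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(b_i * c_i for b_i, c_i in zip(b, centred)) for b in basis]

# Toy 4-pixel "image" reduced to 2 features.
mean = [1.0, 1.0, 1.0, 1.0]
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
features = project([3.0, 1.0, 3.0, 1.0], mean, basis)
```

Keeping fewer rows in `basis` gives a shorter feature vector, which is the quantity tuned by cross-validation.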

Testing data: a Gatorade advertisement with Michael Jordan

•  Point A is matched to all three training images.

P(A|y = Gatorade) = (3+1)/(3+2) = 0.8

•  Point B is matched to only one image (Gatorade1).

P(B|y = Gatorade) = (1+1)/(3+2) = 0.4

•  Laplace smoothing is used here.

Toy model of likelihood estimates in NB
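The toy estimates for points A and B can be reproduced with a few lines. The function names are hypothetical, and the log-score helper is an assumption about how per-point likelihoods might be combined under the Naïve Bayes independence assumption; the poster shows only the per-point estimates.

```python
import math

def smoothed_likelihood(n_matched, n_train, n_outcomes=2):
    """Laplace-smoothed estimate of P(point | brand).

    n_matched:  training images of the brand that the keypoint matched.
    n_train:    training images available for the brand.
    n_outcomes: 2, since a keypoint either matches an image or does not.
    """
    return (n_matched + 1) / (n_train + n_outcomes)

def log_score(match_counts, n_train):
    """Sum of log-likelihoods over keypoints, treated as independent."""
    return sum(math.log(smoothed_likelihood(m, n_train)) for m in match_counts)

p_a = smoothed_likelihood(3, 3)  # point A: matched to all three images
p_b = smoothed_likelihood(1, 3)  # point B: matched to Gatorade1 only
```

The smoothing keeps a point with zero matches from forcing the whole product to zero.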

Two parameters of this learning model are to be decided:

•  Value of K in KNN

•  Number of basis vectors (“reduced dimensions” in the plots)
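One way to pick such parameters is a small cross-validated search. The sketch below tunes only the number of neighbours on toy 2-D features; the data, labels, fold layout, and function names are all hypothetical, and the same loop could be repeated over candidate numbers of basis vectors.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(data, k, n_folds):
    """Fraction of held-out points classified correctly across all folds."""
    correct = 0
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        training = [d for i, d in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(training, x, k) == y for x, y in held_out)
    return correct / len(data)

def choose_k(data, candidates, n_folds):
    """Return the candidate k with the best cross-validation accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(data, k, n_folds))

# Toy 2-D features for two hypothetical sign classes.
data = [
    ([0.0, 0.1], "stop"), ([5.0, 5.1], "yield"),
    ([0.2, 0.0], "stop"), ([5.2, 4.9], "yield"),
    ([0.1, 0.3], "stop"), ([4.8, 5.0], "yield"),
]
```

For example, `choose_k(data, [1, 3], n_folds=3)` compares two neighbourhood sizes on the toy data.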

•  Spatial pyramid: Similarity between the detection window and the training data is evaluated at different resolutions. Matches at higher resolutions receive larger weights.
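The weighting idea can be illustrated with a simplified pyramid comparison of two point sets. Everything specific here is an assumption: the points stand in for feature locations inside a window, histogram intersection is used as the per-level similarity, and the weights simply double with each finer level; the poster does not state its actual weights or similarity measure.

```python
def cell_histogram(points, level, size):
    """Count points per cell on a 2^level x 2^level grid over a size x size window."""
    n = 2 ** level
    hist = [0] * (n * n)
    for x, y in points:
        col = min(int(x * n / size), n - 1)
        row = min(int(y * n / size), n - 1)
        hist[row * n + col] += 1
    return hist

def pyramid_similarity(points_a, points_b, levels, size):
    """Weighted mean of histogram intersections; finer levels weigh more.

    The doubling weights are an assumption for illustration.
    """
    weights = [2 ** level for level in range(levels)]
    total = 0.0
    for level, weight in enumerate(weights):
        hist_a = cell_histogram(points_a, level, size)
        hist_b = cell_histogram(points_b, level, size)
        total += weight * sum(min(a, b) for a, b in zip(hist_a, hist_b))
    return total / sum(weights)

window = [(1, 1), (6, 6), (2, 7)]
shifted = [(1, 1), (6, 6), (7, 2)]  # one point moved to another quadrant
same = pyramid_similarity(window, window, 3, 8)
moved = pyramid_similarity(window, shifted, 3, 8)
```

At the coarsest level the two sets look identical; the moved point is only penalized at the finer levels, where the weights are larger.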

1. Techniques

3. Accuracy

Method | Target | Training set size | Testing set size | Precision
SIFT + NB | Commercial trademark | 10 images/brand × 150 brands = 1500 images | 5 images/brand × 150 brands = 750 images | 88.27%
Fisherfaces + KNN | Traffic sign | 3 images/sign × 40 signs = 120 images | 9 images/sign × 40 signs = 360 images | 95.57%

2. Reasons for each choice

Commercial advertisements

•  Weaker assumption

•  SIFT: curse of dimensionality

•  NB: high accuracy

Traffic signs

•  Stronger assumption

•  Fisherfaces: lower dimensions

•  KNN: computationally efficient