
Prof. Feng Liu

Winter 2020

http://www.cs.pdx.edu/~fliu/courses/cs410/

02/27/2020

Last Time

Introduction to object recognition


The slides for this topic are adapted from Prof. S. Lazebnik.

Today

Machine learning approach to object recognition

◼ Classifiers

◼ Bag-of-features models


Recognition: A machine learning approach

Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, Kristen Grauman, and Derek Hoiem

The machine learning framework

Apply a prediction function to a feature representation of the image to get the desired output:

f(apple image) = “apple”
f(tomato image) = “tomato”
f(cow image) = “cow”

The machine learning framework

y = f(x), where y is the output, f is the prediction function, and x is the image feature

Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set

Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
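As a toy sketch of this train/test loop (hypothetical 1-D data and a simple threshold rule, not the course's code): training estimates f by minimizing training error, and testing applies the learned f to unseen x.

```python
# Toy 1-D illustration of y = f(x): "train" a threshold classifier
# by minimizing training error, then apply it to unseen examples.

def train_threshold(examples):
    """examples: list of (x, y) with y in {0, 1}. Returns the threshold t
    minimizing training error for the rule f(x) = 1 if x >= t else 0."""
    best_t, best_err = None, float("inf")
    for t in sorted(x for x, _ in examples):
        err = sum(1 for x, y in examples if (1 if x >= t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, x):
    return 1 if x >= t else 0

# Training set: small x -> class 0, large x -> class 1
train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
t = train_threshold(train)
print(predict(t, 0.2), predict(t, 0.8))  # prints: 0 1
```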

Steps

Training: training images + labels → image features → training → learned model
Testing: test image → image features → learned model → prediction

Slide credit: D. Hoiem

Features

Raw pixels

Histograms

GIST descriptors
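One of the simplest features listed above, a histogram, can be sketched in a few lines (the 2x2 "image" of gray values in 0..255 below is a made-up toy example):

```python
# Normalized intensity histogram as an image feature.
# `image` is a hypothetical 2-D grid of pixel values in 0..255.

def intensity_histogram(image, bins=8):
    counts = [0] * bins
    n = 0
    for row in image:
        for v in row:
            counts[min(v * bins // 256, bins - 1)] += 1  # bin index
            n += 1
    return [c / n for c in counts]  # normalize so bins sum to 1

img = [[0, 32], [128, 255]]
hist = intensity_histogram(img)
print(hist)  # prints: [0.25, 0.25, 0.0, 0.0, 0.25, 0.0, 0.0, 0.25]
```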

Classifiers: Nearest neighbor

f(x) = label of the training example nearest to x

All we need is a distance function for our inputs

No training required!

(Illustration: a test example classified among training examples from class 1 and class 2.)
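The nearest-neighbor rule above in a few lines (toy 2-D features and made-up class labels):

```python
# Minimal nearest-neighbor classifier: f(x) = label of the closest
# training example. Only a distance function is needed; no training step.

def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def nn_classify(train, x):
    """train: list of (feature_vector, label)."""
    return min(train, key=lambda ex: euclidean(ex[0], x))[1]

train = [((0.0, 0.0), "class 1"), ((1.0, 1.0), "class 2")]
print(nn_classify(train, (0.2, 0.1)))  # nearest to (0, 0) -> prints: class 1
```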

Classifiers: Linear

Find a linear function to separate the classes:

f(x) = sgn(w · x + b)
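Evaluating this decision rule directly (the weight vector w and bias b below are made-up illustration values, not a trained model):

```python
# The linear decision rule f(x) = sgn(w . x + b).

def sgn(v):
    return 1 if v >= 0 else -1  # sgn(0) taken as +1 here

def linear_classify(w, b, x):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = (1.0, -1.0), 0.5            # hypothetical separating hyperplane
print(linear_classify(w, b, (2.0, 1.0)))  # 2 - 1 + 0.5 = 1.5 -> prints: 1
print(linear_classify(w, b, (0.0, 2.0)))  # 0 - 2 + 0.5 = -1.5 -> prints: -1
```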

Recognition task and supervision

Images in the training set must be annotated with the “correct answer” that the model is expected to produce, e.g. “contains a motorbike”

Spectrum of supervision: unsupervised → “weakly” supervised → fully supervised

Definition depends on task

Generalization

How well does a learned model generalize from

the data it was trained on to a new test set?

Training set (labels known) Test set (labels unknown)

Generalization

Components of generalization error

◼ Bias: how much the average model, over all training sets, differs from the true model (error due to inaccurate assumptions or simplifications made by the model)

◼ Variance: how much models estimated from different training sets differ from each other

Underfitting: the model is too “simple” to represent all the relevant class characteristics

◼ High bias and low variance
◼ High training error and high test error

Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data

◼ Low bias and high variance
◼ Low training error and high test error

Bias-variance tradeoff

(Plot: training and test error vs. model complexity. Low complexity: high bias, low variance, underfitting; high complexity: low bias, high variance, overfitting.)

Slide credit: D. Hoiem

Bias-variance tradeoff

(Plot: test error vs. model complexity, with one curve for many training examples and one for few training examples. Low complexity: high bias, low variance; high complexity: low bias, high variance.)

Slide credit: D. Hoiem

Effect of Training Size

(Plot, for a fixed prediction model: training, testing, and generalization error vs. number of training examples.)

Slide credit: D. Hoiem

Datasets

Circa 2001: 5 categories, 100s of images per

category

Circa 2004: 101 categories

Today: up to thousands of categories, millions

of images

Caltech 101 & 256

Griffin, Holub, Perona, 2007

Fei-Fei, Fergus, Perona, 2004

http://www.vision.caltech.edu/Image_Datasets/Caltech101/
http://www.vision.caltech.edu/Image_Datasets/Caltech256/

Caltech-101: Intraclass variability

The PASCAL Visual Object Classes

Challenge (2005-present)

Challenge classes:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

http://host.robots.ox.ac.uk/pascal/VOC/

Main competitions

◼ Classification: For each of the twenty classes,

predicting presence/absence of an example of that

class in the test image

◼ Detection: Predicting the bounding box and label of

each object from the twenty target classes in the test

image

The PASCAL Visual Object Classes

Challenge (2005-present)

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

“Taster” challenges

◼ Segmentation:

Generating pixel-wise

segmentations giving

the class of the object

visible at each pixel, or

"background"

otherwise

◼ Person layout:

Predicting the

bounding box and label

of each part of a

person (head, hands,

feet)


“Taster” challenges

◼ Action classification


LabelMe: http://labelme.csail.mit.edu/ (Russell, Torralba, Murphy, Freeman, 2008)

80 Million Tiny Images: http://people.csail.mit.edu/torralba/tinyimages/

ImageNet: http://www.image-net.org/

Today

Machine learning approach to object recognition

◼ Classifiers

◼ Bag-of-features models


Bag-of-features models

Origin 1: Texture recognition

Texture is characterized by the repetition of basic

elements or textons

For stochastic textures, it is the identity of the

textons, not their spatial arrangement, that matters

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Origin 1: Texture recognition

(Illustration: a universal texton dictionary, and a texton histogram computed for each texture image.)

Origin 2: Bag-of-words models

Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/


Bag-of-features steps

1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
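Steps 3 and 4 can be sketched as code: quantize each local descriptor to its nearest visual word, then build a normalized word-frequency histogram. The vocabulary and descriptors below are toy 2-D values; steps 1 and 2 are assumed already done.

```python
# Steps 3-4 of the bag-of-features pipeline. Assumes feature extraction
# and vocabulary learning have already produced `descriptors` and
# `vocabulary`; both are hypothetical toy values here.

def nearest_word(vocabulary, d):
    """Index of the visual word (codevector) closest to descriptor d."""
    return min(range(len(vocabulary)),
               key=lambda k: sum((vi - di) ** 2
                                 for vi, di in zip(vocabulary[k], d)))

def bag_of_words(vocabulary, descriptors):
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(vocabulary, d)] += 1  # quantize, then count
    total = sum(hist)
    return [h / total for h in hist]  # frequencies of visual words

vocabulary = [(0.0, 0.0), (1.0, 1.0)]          # 2 visual words
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
print(bag_of_words(vocabulary, descriptors))   # prints: [0.5, 0.5]
```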

1. Feature extraction

Regular grid or interest regions: detect patches, normalize each patch, and compute a descriptor

Slide credit: Josef Sivic

2. Learning the visual vocabulary

Clustering the extracted descriptors produces the visual vocabulary.

Slide credit: Josef Sivic

K-means clustering

• Want to minimize the sum of squared Euclidean distances between points xi and their nearest cluster centers mk:

D(X, M) = Σ_k Σ_{point i in cluster k} (x_i − m_k)²

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
◼ Assign each data point to the nearest center
◼ Re-compute each cluster center as the mean of all points assigned to it
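The algorithm above, directly as code (toy 2-D points; `seed` is only for reproducible random initialization):

```python
import random

# K-means: random initialization, then alternate assignment and
# mean update until the assignments stop changing.

def nearest(pt, centers):
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2
                                 for p, c in zip(pt, centers[j])))

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)       # randomly initialize K centers
    assignment = None
    while True:
        new = [nearest(pt, centers) for pt in points]
        if new == assignment:             # converged: assignments stable
            return centers, assignment
        assignment = new
        for j in range(k):                # re-compute center as the mean
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:                   # keep old center if cluster empties
                centers[j] = tuple(sum(v) / len(members)
                                   for v in zip(*members))

points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
centers, assignment = kmeans(points, 2)
print(sorted(centers))  # one center near (0.1, 0), the other near (5.1, 5)
```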

Clustering and vector quantization

• Clustering is a common method for learning a visual

vocabulary or codebook

◼ Unsupervised learning process

◼ Each cluster center produced by k-means becomes a codevector

◼ Codebook can be learned on separate training set

◼ Provided the training set is sufficiently representative, the

codebook will be “universal”

• The codebook is used for quantizing features

◼ A vector quantizer takes a feature vector and maps it to the index

of the nearest codevector in a codebook

◼ Codebook = visual vocabulary

◼ Codevector = visual word

Example codebooks

(Appearance codebook images. Source: B. Leibe)

(Another example codebook, from Fei-Fei et al. 2005)

Visual vocabularies: Issues

• How to choose vocabulary size?

◼ Too small: visual words not representative of all

patches

◼ Too large: quantization artifacts, overfitting

• Computational efficiency

◼ Vocabulary trees (Nister & Stewenius, 2006)

Spatial pyramid representation

Extension of a bag of features: a locally orderless representation at several levels of resolution (level 0, level 1, level 2)

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Lazebnik, Schmid & Ponce (CVPR 2006)
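A sketch of the idea under simplifying assumptions: each feature is a quantized visual word with a normalized (x, y) position, and the representation concatenates per-cell word histograms at each level (unweighted here; the paper weights levels when matching). All values below are toy examples.

```python
# Two-level spatial pyramid (levels 0 and 1) over quantized features.
# Each feature is ((x, y), word) with x, y in [0, 1).

def cell_histogram(features, vocab_size, x0, y0, x1, y1):
    hist = [0] * vocab_size
    for (x, y), w in features:
        if x0 <= x < x1 and y0 <= y < y1:  # count words inside this cell
            hist[w] += 1
    return hist

def spatial_pyramid(features, vocab_size, levels=2):
    rep = []
    for level in range(levels):
        cells = 2 ** level                 # level 0: 1x1, level 1: 2x2, ...
        for i in range(cells):
            for j in range(cells):
                rep += cell_histogram(features, vocab_size,
                                      i / cells, j / cells,
                                      (i + 1) / cells, (j + 1) / cells)
    return rep

features = [((0.1, 0.1), 0), ((0.9, 0.9), 1)]
rep = spatial_pyramid(features, vocab_size=2)
# level-0 histogram followed by the four level-1 cell histograms:
print(rep)  # prints: [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
```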

Scene category dataset

Multi-class classification results (100 training images per class)

Caltech101 dataset

Multi-class classification results (30 training images per class)

Next Time

More classification

Visual saliency
