
Page 1: Prof. Feng Liu

Prof. Feng Liu

Winter 2020

http://www.cs.pdx.edu/~fliu/courses/cs410/

02/27/2020

Page 2: Prof. Feng Liu

Last Time

Introduction to object recognition


The slides for this topic are adapted from Prof. S. Lazebnik.

Page 3: Prof. Feng Liu

Today

Machine learning approach to object recognition

◼ Classifiers

◼ Bag-of-features models


The slides for this topic are adapted from Prof. S. Lazebnik.

Page 4: Prof. Feng Liu

Recognition: A machine learning approach

Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, Kristen Grauman, and Derek Hoiem

Page 5: Prof. Feng Liu

The machine learning framework

Apply a prediction function to a feature representation of

the image to get the desired output:

f(image of an apple) = “apple”

f(image of a tomato) = “tomato”

f(image of a cow) = “cow”

Page 6: Prof. Feng Liu

The machine learning framework

y = f(x)

Training: given a training set of labeled examples {(x1,y1),

…, (xN,yN)}, estimate the prediction function f by minimizing

the prediction error on the training set

Testing: apply f to a never-before-seen test example x and

output the predicted value y = f(x)

In y = f(x): y is the output, f is the prediction function, and x is the image feature.
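A minimal sketch of this framework in Python (an illustration, not the lecture's prescribed implementation); the flattened-pixel feature and nearest-class-mean predictor are assumed placeholder choices:

```python
# Sketch of the y = f(x) framework: feature extraction, training on labeled
# examples, and prediction on an unseen image. The flattened-pixel feature
# and nearest-class-mean predictor are illustrative placeholders.
import numpy as np

def extract_features(image):
    """Feature representation x: here simply the flattened raw pixels."""
    return image.astype(float).ravel()

def train(images, labels):
    """Training: estimate f from {(x1, y1), ..., (xN, yN)}; here f is the
    nearest-class-mean rule, so we store each class's mean feature vector."""
    X = np.stack([extract_features(im) for im in images])
    y = np.asarray(labels)
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def f(model, image):
    """Testing: apply f to a never-before-seen image and output y = f(x)."""
    x = extract_features(image)
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

# Toy usage with random arrays standing in for apple/tomato/cow photos.
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(32, 32, 3)) for _ in range(6)]
model = train(imgs, ["apple", "tomato", "cow"] * 2)
print(f(model, imgs[0]))   # most likely "apple"
```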

Page 7: Prof. Feng Liu

Steps

Training: training images + training labels → image features → training → learned model

Testing: test image → image features → learned model → prediction

Slide credit: D. Hoiem

Page 8: Prof. Feng Liu

Features

Raw pixels

Histograms

GIST descriptors
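For illustration, a sketch of the first two feature types (GIST is omitted, since it needs a dedicated implementation); the bin counts are arbitrary assumptions:

```python
# Two of the feature representations above, sketched for illustration:
# raw pixels (keep spatial layout) and a color histogram (discards it).
import numpy as np

def raw_pixel_feature(image):
    """Flatten the image into one long vector of pixel values."""
    return image.astype(float).ravel()

def color_histogram_feature(image, bins=8):
    """Joint R/G/B histogram: an orderless summary of the image's colors."""
    hist, _ = np.histogramdd(image.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

img = np.random.default_rng(1).integers(0, 256, size=(32, 32, 3))
print(raw_pixel_feature(img).shape)        # (3072,)
print(color_histogram_feature(img).shape)  # (512,)
```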

Page 9: Prof. Feng Liu

Classifiers: Nearest neighbor

f(x) = label of the training example nearest to x

All we need is a distance function for our inputs

No training required!

Figure: a test example plotted among training examples from class 1 and class 2.
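A sketch of the nearest-neighbor classifier, assuming Euclidean distance as the distance function and 2-D toy features:

```python
# Nearest-neighbor classifier sketch: f(x) = label of the closest training
# example under a chosen distance function (Euclidean here, by assumption).
import numpy as np

def nn_classify(train_X, train_y, x):
    """Return the label of the training example nearest to x."""
    dists = np.linalg.norm(train_X - x, axis=1)   # distance to every example
    return train_y[np.argmin(dists)]

# "Training" is just storing the examples; no parameters are estimated.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
train_y = np.array([1, 1, 2, 2])                  # class 1 and class 2
print(nn_classify(train_X, train_y, np.array([0.8, 0.9])))  # -> 2
```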

Page 10: Prof. Feng Liu

Classifiers: Linear

Find a linear function to separate the classes:

f(x) = sgn(w · x + b)
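A sketch of the linear classifier; the slide specifies only the decision function f(x) = sgn(w · x + b), so the perceptron update used here to estimate w and b is an assumed, illustrative training procedure:

```python
# Linear classifier sketch: f(x) = sgn(w . x + b). The perceptron rule below
# is an assumed way to find w, b for separable data (labels in {-1, +1}).
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:     # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

def f(w, b, x):
    return np.sign(w @ x + b)

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, y)
print(f(w, b, np.array([0.8, 0.9])))       # -> 1.0
```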

Page 11: Prof. Feng Liu

Images in the training set must be annotated with the

“correct answer” that the model is expected to produce

Contains a motorbike

Recognition task and supervision

Page 12: Prof. Feng Liu

Spectrum of supervision: unsupervised → “weakly” supervised → fully supervised

Definition depends on task

Page 13: Prof. Feng Liu

Generalization

How well does a learned model generalize from

the data it was trained on to a new test set?

Training set (labels known) Test set (labels unknown)

Page 14: Prof. Feng Liu

Generalization

Components of generalization error

◼ Bias: how much the average model over all training sets differs from

the true model?

Error due to inaccurate assumptions/simplifications made by the model

◼ Variance: how much models estimated from different training sets

differ from each other

Underfitting: model is too “simple” to represent all the relevant

class characteristics

◼ High bias and low variance

◼ High training error and high test error

Overfitting: model is too “complex” and fits irrelevant

characteristics (noise) in the data

◼ Low bias and high variance

◼ Low training error and high test error

Page 15: Prof. Feng Liu

Bias-variance tradeoff

Figure: training error and test error vs. model complexity. Training error decreases as complexity grows; test error is high at both extremes: underfitting (high bias, low variance) at low complexity, overfitting (low bias, high variance) at high complexity.

Slide credit: D. Hoiem

Page 16: Prof. Feng Liu

Bias-variance tradeoff

Figure: test error vs. model complexity for many vs. few training examples (low complexity: high bias, low variance; high complexity: low bias, high variance).

Slide credit: D. Hoiem

Page 17: Prof. Feng Liu

Effect of Training Size

Figure: for a fixed prediction model, training and testing error vs. the number of training examples; the two curves approach the generalization error as the training set grows.

Slide credit: D. Hoiem

Page 18: Prof. Feng Liu

Datasets

Circa 2001: 5 categories, 100s of images per

category

Circa 2004: 101 categories

Today: up to thousands of categories, millions

of images

Page 19: Prof. Feng Liu

Caltech 101 & 256

Caltech-101: Fei-Fei, Fergus, Perona, 2004

http://www.vision.caltech.edu/Image_Datasets/Caltech101/

Caltech-256: Griffin, Holub, Perona, 2007

http://www.vision.caltech.edu/Image_Datasets/Caltech256/

Page 20: Prof. Feng Liu

Caltech-101: Intraclass variability

Page 21: Prof. Feng Liu

The PASCAL Visual Object Classes

Challenge (2005-present)

Challenge classes:

Person: person

Animal: bird, cat, cow, dog, horse, sheep

Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train

Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

http://host.robots.ox.ac.uk/pascal/VOC/

Page 22: Prof. Feng Liu

Main competitions

◼ Classification: For each of the twenty classes,

predicting presence/absence of an example of that

class in the test image

◼ Detection: Predicting the bounding box and label of

each object from the twenty target classes in the test

image

The PASCAL Visual Object Classes

Challenge (2005-present)

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Page 23: Prof. Feng Liu

“Taster” challenges

◼ Segmentation:

Generating pixel-wise

segmentations giving

the class of the object

visible at each pixel, or

"background"

otherwise

◼ Person layout:

Predicting the

bounding box and label

of each part of a

person (head, hands,

feet)

The PASCAL Visual Object Classes

Challenge (2005-present)

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Page 24: Prof. Feng Liu

“Taster” challenges

◼ Action classification

The PASCAL Visual Object Classes

Challenge (2005-present)

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Page 25: Prof. Feng Liu

Russell, Torralba, Murphy, Freeman, 2008

LabelMe: http://labelme.csail.mit.edu/

Page 26: Prof. Feng Liu

80 Million Tiny Images

http://people.csail.mit.edu/torralba/tinyimages/

Page 27: Prof. Feng Liu

ImageNet http://www.image-net.org/

Page 28: Prof. Feng Liu

Today

Machine learning approach to object recognition

◼ Classifiers

◼ Bag-of-features models


The slides for this topic are adapted from Prof. S. Lazebnik.

Page 29: Prof. Feng Liu

Bag-of-features models

Page 30: Prof. Feng Liu

Origin 1: Texture recognition

Texture is characterized by the repetition of basic

elements or textons

For stochastic textures, it is the identity of the

textons, not their spatial arrangement, that matters

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 31: Prof. Feng Liu

Origin 1: Texture recognition

Universal texton dictionary

histogram

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 32: Prof. Feng Liu

Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Origin 2: Bag-of-words models

Page 33: Prof. Feng Liu

US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/

Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Origin 2: Bag-of-words models


Page 36: Prof. Feng Liu

1. Extract features

2. Learn “visual vocabulary”

3. Quantize features using visual vocabulary

4. Represent images by frequencies of “visual words”

Bag-of-features steps
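A compact end-to-end sketch of steps 2-4, assuming each image's local descriptors have already been extracted (step 1 is sketched on the next slides) and using scikit-learn's k-means as a convenience:

```python
# Sketch of bag-of-features steps 2-4: learn a visual vocabulary by
# clustering descriptors, quantize each descriptor to its nearest visual
# word, and represent each image as a histogram of word frequencies.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, k=100):
    """Step 2: cluster all training descriptors into k visual words."""
    all_desc = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def bag_of_words(vocab, descriptors):
    """Steps 3-4: quantize descriptors and build a normalized histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# Toy usage: 5 "images", each with 200 random 128-D descriptors.
rng = np.random.default_rng(0)
images = [rng.normal(size=(200, 128)) for _ in range(5)]
vocab = build_vocabulary(images, k=20)
print(bag_of_words(vocab, images[0]).shape)   # (20,)
```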

Page 37: Prof. Feng Liu

1. Feature extraction

Regular grid or interest regions

Page 38: Prof. Feng Liu

Detect patches → normalize each patch → compute descriptor

Slide credit: Josef Sivic

1. Feature extraction
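A sketch of step 1 on a regular grid; a contrast-normalized raw patch stands in for a real descriptor such as SIFT (an assumed simplification), and the patch size and stride are arbitrary:

```python
# Feature extraction on a regular grid: detect patches, normalize each
# patch, and compute a descriptor (here the normalized patch itself).
import numpy as np

def extract_patch_descriptors(gray, patch=16, stride=8):
    """Return an (n_patches x patch*patch) array of normalized patches."""
    descriptors = []
    h, w = gray.shape
    for r in range(0, h - patch + 1, stride):       # regular grid of patches
        for c in range(0, w - patch + 1, stride):
            p = gray[r:r + patch, c:c + patch].astype(float)
            p -= p.mean()                            # normalize brightness
            norm = np.linalg.norm(p)
            if norm > 1e-8:                          # and contrast
                p /= norm
            descriptors.append(p.ravel())
    return np.array(descriptors)

gray = np.random.default_rng(0).integers(0, 256, size=(64, 64))
print(extract_patch_descriptors(gray).shape)   # (49, 256)
```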

Page 39: Prof. Feng Liu

1. Feature extraction

Slide credit: Josef Sivic

Page 40: Prof. Feng Liu

2. Learning the visual vocabulary

Slide credit: Josef Sivic

Page 41: Prof. Feng Liu

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Page 42: Prof. Feng Liu

2. Learning the visual vocabulary

Clustering

Visual vocabulary

Slide credit: Josef Sivic

Page 43: Prof. Feng Liu

K-means clustering

• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k

Algorithm:

• Randomly initialize K cluster centers

• Iterate until convergence:

◼ Assign each data point to the nearest center

◼ Re-compute each cluster center as the mean of

all points assigned to it

Objective: D(X, M) = Σ_k Σ_{x_i in cluster k} ||x_i − m_k||²
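A from-scratch sketch of this algorithm in NumPy, alternating assignment and mean updates until the assignments stop changing; the toy data and seed are arbitrary:

```python
# K-means sketch: random initialization, then repeat (assign each point to
# the nearest center, re-compute each center as the mean of its points).
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random init
    assign = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assign each data point to the nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                                     # converged
        assign = new_assign
        # Re-compute each center as the mean of all points assigned to it.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

# Toy data: three well-separated 2-D blobs.
X = np.vstack([np.random.default_rng(1).normal(m, 0.1, size=(50, 2))
               for m in (0.0, 1.0, 2.0)])
centers, assign = kmeans(X, k=3)
print(np.round(centers, 2))
```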

Page 44: Prof. Feng Liu

Clustering and vector quantization

• Clustering is a common method for learning a visual

vocabulary or codebook

◼ Unsupervised learning process

◼ Each cluster center produced by k-means becomes a codevector

◼ Codebook can be learned on separate training set

◼ Provided the training set is sufficiently representative, the

codebook will be “universal”

• The codebook is used for quantizing features

◼ A vector quantizer takes a feature vector and maps it to the index

of the nearest codevector in a codebook

◼ Codebook = visual vocabulary

◼ Codevector = visual word
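A minimal sketch of such a vector quantizer, assuming the codebook is a k × d array of codevectors:

```python
# Vector quantizer sketch: map a feature vector to the index of the nearest
# codevector in the codebook (i.e., its visual-word id).
import numpy as np

def quantize(feature, codebook):
    """codebook: (k x d) array of codevectors; returns an index in [0, k)."""
    return int(np.argmin(np.linalg.norm(codebook - feature, axis=1)))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # toy visual words
print(quantize(np.array([0.9, 1.1]), codebook))            # -> 1
```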

Page 45: Prof. Feng Liu

Example codebook

Appearance codebook

Source: B. Leibe

Page 46: Prof. Feng Liu

Another codebook

Appearance codebook

Source: B. Leibe

Page 47: Prof. Feng Liu

Yet another codebook

Fei-Fei et al. 2005

Page 48: Prof. Feng Liu

Visual vocabularies: Issues

• How to choose vocabulary size?

◼ Too small: visual words not representative of all

patches

◼ Too large: quantization artifacts, overfitting

• Computational efficiency

◼ Vocabulary trees

(Nister & Stewenius, 2006)

Page 49: Prof. Feng Liu

Spatial pyramid representation

level 0

Extension of a bag of features

Locally orderless representation at several levels of resolution

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 50: Prof. Feng Liu

Spatial pyramid representation

level 0 level 1

Extension of a bag of features

Locally orderless representation at several levels of resolution

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 51: Prof. Feng Liu

Spatial pyramid representation

level 0 level 1 level 2

Extension of a bag of features

Locally orderless representation at several levels of resolution

Lazebnik, Schmid & Ponce (CVPR 2006)
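A sketch of the pyramid representation shown across the last three slides, for levels 0-2 (1×1, 2×2, and 4×4 grids of cells), assuming feature locations and their visual-word indices are given; the per-level weighting used by Lazebnik et al. is omitted for simplicity:

```python
# Spatial pyramid sketch: histogram the visual words falling in each cell of
# successively finer grids and concatenate the per-cell histograms.
import numpy as np

def spatial_pyramid(points, words, k, image_size, levels=(0, 1, 2)):
    """points: (n x 2) array of (x, y) feature locations; words: (n,) array
    of visual-word indices in [0, k); image_size: (width, height)."""
    w, h = image_size
    hists = []
    for level in levels:
        cells = 2 ** level                           # 1, 2, 4 cells per side
        cx = np.minimum((points[:, 0] * cells / w).astype(int), cells - 1)
        cy = np.minimum((points[:, 1] * cells / h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                hists.append(np.bincount(words[in_cell], minlength=k))
    hist = np.concatenate(hists).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(500, 2)) * [640, 480]   # feature locations
wrd = rng.integers(0, 20, size=500)                   # their visual words
print(spatial_pyramid(pts, wrd, k=20, image_size=(640, 480)).shape)  # (420,)
```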

Page 52: Prof. Feng Liu

Scene category dataset

Multi-class classification results (100 training images per class)

Page 53: Prof. Feng Liu

Caltech101 dataset

Multi-class classification results (30 training images per class)

Page 54: Prof. Feng Liu

Next Time

More classification

Visual saliency
