computer visioncrcv.ucf.edu/people/faculty/xborji/www/lectures/6.pdf1/2 1/4 (2x zoom) 1/8 (4x zoom)...

95
Computer Vision Image pyramids Ali Borji UWM

Upload: buitram

Post on 14-Mar-2018

229 views

Category:

Documents


2 download

TRANSCRIPT

Computer Vision

Image pyramids

Ali BorjiUWM

• This image is too big to fit on the screen. How can we reduce it?

• How to generate a half-sized version?

2Slide credit: S. Seitz

Image Scaling

Image Sub-Sampling

Throw away every other row and column to create a 1/2 size image

- called image sub-sampling

1/4

1/8

Slide credit: S. Seitz

Image Sub-Sampling

1/4 (2x zoom) 1/8 (4x zoom)1/2

Slide credit: S. Seitz

Good and Bad Sampling

Good sampling: •Sample often or, •Sample wisely

Bad sampling: •Aliasing!

Slide credit: S. Narasimhan

Aliasing

6Slide credit: F. Durand

Aliasing

• Occurs when your sampling rate is not high enough to capture the amount of detail in your image

• Can give you the wrong signal/image—an alias !

• To do sampling right, need to understand the structure of your signal/image

• Enter Monsieur Fourier… !

•To avoid aliasing: - sampling rate ≥ 2 * max frequency in the image

•said another way: ≥ two samples per cycle - This minimum sampling rate is called the Nyquist rate 7

Slide credit: L. Zhang

Aliasing• When downsampling by a factor of two

- Original image has frequencies that are too high

!

!

!• How can we fix this?

8Slide credit: N. Snavely

Gaussian pre-filtering

G 1/4

G 1/8

Gaussian 1/2

• Solution: filter the image, then subsample Slide credit: S. Seitz

Subsampling with Gaussian pre-filtering

G 1/4 G 1/8Gaussian 1/2

• Solution: filter the image, then subsampleSlide credit: S. Seitz

Compare with...

1/4 (2x zoom) 1/8 (4x zoom)1/2

Slide credit: S. Seitz

Gaussian pre-filtering• Solution: filter

the image, then subsample

blur

F0 H*

subsample blur subsample …F1

F1 H*

F2F0

Slide credit: N. Snavely

blur

F0 H*

subsample blur subsample …F1

F1 H*

F2F0

{Gaussian pyramid

Slide credit: N. Snavely

Template matching• Goal: find in image !

• Main challenge: What is a good similarity or distance measure between two patches? – Correlation – Zero-mean correlation – Sum Square Difference – Normalized Cross

Correlation

Matching with filters

• Goal: find in image • Method 0: filter the image with eye patch

Input Filtered Image

],[],[],[,

lnkmflkgnmhlk

++=∑

What went wrong?

f = image g = filter

Matching with filters

• Goal: find in image • Method 1: filter the image with zero-mean

eye

Input Filtered Image (scaled) Thresholded Image

)],[()],[(],[,

lnkmgflkfnmhlk

++−=∑

True detections

False detections

mean of f

Matching with filters

• Goal: find in image • Method 2: SSD

Input 1- sqrt(SSD) Thresholded Image

2

,

)],[],[(],[ lnkmflkgnmhlk

++−=∑

True detections

Matching with filters

• Goal: find in image • Method 2: SSD

Input 1- sqrt(SSD)

2

,

)],[],[(],[ lnkmflkgnmhlk

++−=∑

What’s the potential downside of SSD?

Matching with filters

• Goal: find in image • Method 3: Normalized cross-correlation

5.0

,

2,

,

2

,,

)],[()],[(

)],[)(],[(],[

!!"

#$$%

&−−−−

−−−−

=

∑ ∑

lknm

lk

nmlk

flnkmfglkg

flnkmfglkgnmh

Matlab: normxcorr2(template, im)

mean image patchmean template

Matching with filters

• Goal: find in image • Method 3: Normalized cross-correlation

Input Normalized X-Correlation Thresholded Image

True detections

Matching with filters

• Goal: find in image • Method 3: Normalized cross-correlation

Input Normalized X-Correlation Thresholded Image

True detections

Q: What is the best method to use?

!A: Depends • SSD: faster, sensitive to overall intensity • Normalized cross-correlation: slower,

invariant to local average intensity and contrast

• But really, neither of these baselines are representative of modern recognition.

Image pyramidsImage information occurs at all spatial scales !

• Gaussian pyramid • Laplacian pyramid • Wavelet/QMF pyramid • Steerable pyramid

Slide credit: B. Freeman and A. Torralba

The Gaussian pyramid• Smooth with gaussians, because

– a gaussian*gaussian=another gaussian • Gaussians are low pass filters, so

representation is redundant.

Slide credit: B. Freeman and A. Torralba

The computational advantage of pyramids

Slide credit: B. Freeman and A. Torralba[Burt and Adelson, 1983]

The Gaussian Pyramid

Slide credit: B. Freeman and A. Torralba[Burt and Adelson, 1983]

Slide credit: B. Freeman and A. Torralba

Convolution and subsampling as a matrix multiply (1D case)

!! 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0

(Normalization constant of 1/16 omitted for visual clarity.)Slide credit: B. Freeman and A. Torralba

Next pyramid level

!! 1 4 6 4 1 0 0 0

0 0 1 4 6 4 1 0

0 0 0 0 1 4 6 4

0 0 0 0 0 0 1 4

Slide credit: B. Freeman and A. Torralba

The combined effect of the two pyramid levels

!!! 1 4 10 20 31 40 44 40 31 20 10 4 1 0 0 0 0 0 0 0

0 0 0 0 1 4 10 20 31 40 44 40 31 20 10 4 1 0 0 0

0 0 0 0 0 0 0 0 1 4 10 20 31 40 44 40 30 16 4 0

0 0 0 0 0 0 0 0 0 0 0 0 1 4 10 20 25 16 4 0

Slide credit: B. Freeman and A. Torralba

Slide credit: B. Freeman and A. Torralba

Gaussian pyramids used for• up- or down- sampling images. • Multi-resolution image analysis

– Look for an object over various spatial scales

– Coarse-to-fine image processing: form blur estimate or the motion analysis on very low-resolution image, upsample and repeat. Often a successful strategy for avoiding local minima in complicated estimation tasks.

Slide credit: B. Freeman and A. Torralba

1D Gaussian pyramid matrix, for [1 4 6 4 1] low-pass filter

full-band image, highest

resolution

lower-resolution image

lowest resolution image

Slide credit: B. Freeman and A. Torralba

Image pyramidsImage information occurs at all spatial scales !

• Gaussian pyramid • Laplacian pyramid • Wavelet/QMF pyramid • Steerable pyramid

Slide credit: B. Freeman and A. Torralba

The Laplacian Pyramid• Synthesis

– Compute the difference between upsampled Gaussian pyramid level and Gaussian pyramid level.

– band pass filter - each level represents spatial frequencies (largely) unrepresented at other level.

Slide credit: B. Freeman and A. Torralba

The Laplacian Pyramid

36

Upsampling

6 1 0 0

4 4 0 0

1 6 1 0

0 4 4 0

0 1 6 1

0 0 4 4

0 0 1 6

0 0 0 4

Insert zeros between pixels, then apply a low-pass filter, [1 4 6 4 1]

Slide credit: B. Freeman and A. Torralba

Showing, at full resolution, the information captured at each level of a Gaussian (top) and Laplacian (bottom)

pyramid.

Slide credit: B. Freeman and A. Torralba

Laplacian pyramid reconstruction algorithm: recover x1 from L1, L2, L3 and x4

G# is the blur-and-downsample operator at pyramid level # F# is the blur-and-upsample operator at pyramid level # !Laplacian pyramid elements: L1 = (I – F1 G1) x1 L2 = (I – F2 G2) x2 L3 = (I – F3 G3) x3 x2 = G1 x1 x3 = G2 x2 x4 = G3 x3 !!Reconstruction of original image (x1) from Laplacian pyramid elements: x3 = L3 + F3 x4 x2 = L2 + F2 x3 x1 = L1 + F1 x2

Slide credit: B. Freeman and A. Torralba

Laplacian pyramid reconstruction algorithm: recover x1 from L1, L2, L3 and g3

+

++

Slide credit: B. Freeman and A. Torralba

Slide credit: B. Freeman and A. Torralba

Slide credit: B. Freeman and A. Torralba

1D Laplacian pyramid matrix, for [1 4 6 4 1] low-pass filter

high frequencies

mid-band frequencies

low frequencies

Slide credit: B. Freeman and A. Torralba

Laplacian pyramid applications• Texture synthesis • Image compression • Noise removal

Slide credit: B. Freeman and A. Torralba

Image blending

Slide credit: B. Freeman and A. Torralba

Szel

iski

, Com

pute

r Vi

sion

, 201

0

Slide credit: B. Freeman &

A. Torralba

Image blending

• Build Laplacian pyramid for both images: LA, LB • Build Gaussian pyramid for mask: G • Build a combined Laplacian pyramid: L(j) = G(j) LA(j) + (1-G(j)) LB(j) • Collapse L to obtain the blended image

33Slide credit: B. Freeman and A. Torralba

48

Image pyramidsImage information occurs at all spatial scales !

• Gaussian pyramid • Laplacian pyramid • Wavelet/QMF pyramid • Steerable pyramid

Slide credit: B. Freeman and A. Torralba

Linear transforms

Linear transform

Vectorized image

transformed image

! f = U−1 ! F

Note: not all important transforms need to have an inverse

Slide credit: A. Torralba

U= U-1=1 1

1 -1

0.5 0.5

0.5 -0.5

The simplest set of functions:

Haar transform

Slide credit: A. Torralba

U= U-1=1 1

1 -1

0.5 0.5

0.5 -0.5

To code a signal, repeat at several locations:

1 11 -

1 1 11 -

1 1 11 -

1 1 11 -

1

U=

The simplest set of functions:

1 11 -

1 1 11 -

1 1 11 -

1 1 11 -

1

U-1= ½

Haar transform

Slide credit: A. Torralba

1 1

1 -1

1 1

1 -1

1 1

1 -1

1 1

1 -1

1 11 1

1 11 1

1 -11 -1

1 -11 -1

Reordering rows

Low pass

High pass

Apply the same decomposition to the Low pass component:

1 11 1

1 11 1

1 1

1 -1

1 1

1 -1

=

1 1 1 11 1 -1 -1

1 1 1 11 1 -1 -1

And repeat the same operation to the low pass component, until length 1.Note: each subband is sub-sampled and has aliased signal components.

Haar transform

Slide credit: A. Torralba

1 1 1 1 1 1 1 11 1 1 1 -1 -1 -1 -11 1 -1 -1

1 1 -1 -11 -1

1 -11 -1

1 -1

The entire process can be written as a single matrix:

Average

Multiscale derivatives

Haar transform

Slide credit: A. Torralba

1 1 1 1 1 1 1 11 1 1 1 -1 -1 -1 -11 1 -1 -1

1 1 -1 -11 -1

1 -11 -1

1 -1

U= U-1=

0.125

0.125 0.25 0 0.5 0 0 0

0.125

0.125 0.25 0 -0.5 0 0 0

0.125

0.125 -0.25 0 0 0.5 0 0

0.125

0.125 -0.25 0 0 -0.5 0 0

0.125

-0.125 0 0.25 0 0 0.5 0

0.125

-0.125 0 0.25 0 0 -0.5 0

0.125

-0.125 0 -0.25 0 0 0 0.5

0.125

-0.125 0 -0.25 0 0 0 -0.5

Properties: • Orthogonal decomposition • Perfect reconstruction • Critically sampled

Haar transform

Slide credit: A. Torralba

11

1-1

1 1 1 -1

Basic elements:

56

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 11 11 1

= 2 Low pass

57

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

=

=

=

=

2 Low pass

58

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

1 -11 -1

1 1-1

-1

1 -1-

11

=

=

=

=

2

2

2

2

Low pass

59

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

1 -11 -1

1 1-1

-1

1 -1-

11

=

=

=

=

2

2

2

2

Low pass

High pass vertical

High pass horizontal

High pass diagonal

60

2D Haar transform

2D Haar transform

1 11 1

1 -11 -1

1 1-1 -1

1 -1-1 1

2

2

2

2 Horizontal high pass, vertical high pass

Horizontal high pass, vertical low-pass

Horizontal low pass, vertical high-pass

Horizontal low pass, Vertical low-pass

Sketch of the Fourier transform

Slide credit: B. Freeman and A. Torralba

Simoncelli and Adelson, in “Subband coding”, Kluwer, 1990.

Pyramid cascade

37Slide credit: B. Freeman and A. Torralba

Wavelet/QMF representation

1 -1

1 -1

1 1

-1 -1

1 -1

-1 1

Same number of pixels!Slide credit: B. Freeman and A. Torralba

Image pyramidsImage information occurs at all spatial scales !

• Gaussian pyramid • Laplacian pyramid • Wavelet/QMF pyramid • Steerable pyramid

Slide credit: B. Freeman and A. Torralba

Steerable Pyramid

Images from: http://www.cis.upenn.edu/~eero/steerpyr.html

2 Level decomposition of white circle example:

Low pass residual

Subbands

Slide credit: B. Freeman and A. Torralba

Steerable Pyramid

Images from: http://www.cis.upenn.edu/~eero/steerpyr.html

… …

Slide credit: B. Freeman and A. Torralba

We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below

Decomposition Reconstruction

Steerable Pyramid

We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below

Images from: http://www.cis.upenn.edu/~eero/steerpyr.html

Decomposition Reconstruction

Slide credit: B. Freeman and A. Torralba

Steerable Pyramid

But we need to get rid of the corner regions before starting the recursive circular filtering

Slide credit: B. Freeman and A. TorralbaSimoncelli and Freeman, ICIP 1995

Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE

There is also a high pass residual…Slide credit: B. Freeman and A. Torralba

70

Monroe

71

Dog or cat?

72

Almost no dog information

73

• Summary of pyramid representations

74

Image pyramids

Shows the information added in Gaussian pyramid at each spatial scale. Useful for noise reduction & coding.

Progressively blurred and subsampled versions of the image. Adds scale invariance to fixed-size algorithms.

Shows components at each scale and orientation separately. Non-aliased subbands. Good for texture and feature analysis. But overcomplete and with HF residual.

Bandpassed representation, complete, but with aliasing and some non-oriented subbands.

• Gaussian !!

• Laplacian !!

• Wavelet/QMF !!

• Steerable pyramid75

Schematic pictures of each matrix transform

Shown for 1-d images The matrices for 2-d images are the same idea, but

more complicated, to account for vertical, as well as horizontal, neighbor relationships.

Fourier transform, or Wavelet transform, or Steerable pyramid transform

Vectorized imagetransformed image

76

Gaussian pyramid

= *pixel image

Overcomplete representation. Low-pass filters, sampled appropriately for their blur.

Gaussian pyramid

Slide credit: B. Freeman and A. Torralba

Laplacian pyramid

= *pixel image

Overcomplete representation. Transformed pixels represent bandpassed image information.

Laplacian pyramid

Slide credit: B. Freeman and A. Torralba

Wavelet (QMF) transform

= *pixel imageOrtho-normal transform

(like Fourier transform), but with localized basis functions.

Wavelet pyramid

Slide credit: B. Freeman and A. Torralba

= *pixel image

Over-complete representation, but non-aliased subbands.

Steerable pyramid

Multiple orientations at one scale

Multiple orientations at the next

scale

the next scale…

Steerable pyramid

Slide credit: B. Freeman and A. Torralba

Why use image pyramids? • Handle real-world size variations with

a constant-size vision algorithm. • Remove noise • Analyze texture • Recognize objects • Label image features • Image priors can be specified naturally

in terms of wavelet pyramids.

Slide credit: B. Freeman and A. Torralba

Image representation

!• Pixels: great for spatial resolution, poor access

to frequency !

• Fourier transform: great for frequency, not for spatial info !

• Pyramids/filter banks: balance between spatial and frequency information

Slides credit: James Hayes

Major uses of image pyramids!

• Compression !

• Object detection – Scale search – Features !

• Detecting stable interest points !!

• Registration – Course-to-fine

Slides credit: James Hayes

Application: Representing Texture

Source: Forsyth

Texture and Material

http://www-cvr.ai.uiuc.edu/ponce_grp/data/texture_database/samples/

Texture and Orientation

http://www-cvr.ai.uiuc.edu/ponce_grp/data/texture_database/samples/

Texture and Scale

http://www-cvr.ai.uiuc.edu/ponce_grp/data/texture_database/samples/

What is texture?

! Regular or stochastic patterns caused by

bumps, grooves, and/or markings

How can we represent texture?

!• Compute responses of blobs and edges at

various orientations and scales

Overcomplete representation: filter banks

LM Filter Bank

Code for filter banks: www.robots.ox.ac.uk/~vgg/research/texclass/filters.html

Filter banks

• Process image with each filter and keep responses (or squared/abs responses)

How can we represent texture?

!• Measure responses of blobs and edges at

various orientations and scales !

• Idea 1: Record simple statistics (e.g., mean, std.) of absolute filter responses

Can you match the texture to the response?

Mean abs responses

FiltersA

B

C

1

2

3

Representing texture by mean abs response

Mean abs responses

Filters

Representing texture• Idea 2: take vectors of filter responses at each pixel and

cluster them, then take histograms (more on in later weeks)