computer visioncrcv.ucf.edu/people/faculty/xborji/www/lectures/6.pdf1/2 1/4 (2x zoom) 1/8 (4x zoom)...

Computer Vision

Image pyramids

Ali BorjiUWM

• This image is too big to fit on the screen. How can we reduce it?

• How to generate a half-sized version?

2Slide credit: S. Seitz

Image Scaling

Image Sub-Sampling

Throw away every other row and column to create a 1/2 size image

- called image sub-sampling

1/4

1/8

Slide credit: S. Seitz

Image Sub-Sampling

1/4 (2x zoom) 1/8 (4x zoom)1/2


Good and Bad Sampling

Good sampling: •Sample often or, •Sample wisely

Bad sampling: •Aliasing!

Slide credit: S. Narasimhan

Aliasing

6Slide credit: F. Durand

Aliasing

• Occurs when your sampling rate is not high enough to capture the amount of detail in your image

• Can give you the wrong signal/image—an alias !

• To do sampling right, need to understand the structure of your signal/image

• Enter Monsieur Fourier… !

•To avoid aliasing: - sampling rate ≥ 2 * max frequency in the image

•said another way: ≥ two samples per cycle - This minimum sampling rate is called the Nyquist rate 7

Slide credit: L. Zhang

Aliasing• When downsampling by a factor of two

- Original image has frequencies that are too high

!

!

!• How can we fix this?

8Slide credit: N. Snavely

Gaussian pre-filtering

G 1/4

G 1/8

Gaussian 1/2

• Solution: filter the image, then subsample Slide credit: S. Seitz

Subsampling with Gaussian pre-filtering

G 1/4 G 1/8Gaussian 1/2

• Solution: filter the image, then subsampleSlide credit: S. Seitz

Compare with...

1/4 (2x zoom) 1/8 (4x zoom)1/2


Gaussian pre-filtering• Solution: filter

the image, then subsample

blur

F0 H*

subsample blur subsample …F1

F1 H*

F2F0

Slide credit: N. Snavely

blur

F0 H*

subsample blur subsample …F1

F1 H*

F2F0

{Gaussian pyramid

Slide credit: N. Snavely

Template matching• Goal: find in image !

• Main challenge: What is a good similarity or distance measure between two patches? – Correlation – Zero-mean correlation – Sum Square Difference – Normalized Cross

Correlation

Matching with filters

• Goal: find in image • Method 0: filter the image with eye patch

Input Filtered Image

],[],[],[,

lnkmflkgnmhlk

++=∑

What went wrong?

f = image g = filter


• Goal: find in image • Method 1: filter the image with zero-mean

eye

Input Filtered Image (scaled) Thresholded Image

)],[()],[(],[,

lnkmgflkfnmhlk

++−=∑

True detections

False detections

mean of f


• Goal: find in image • Method 2: SSD

Input 1- sqrt(SSD) Thresholded Image

2

,

)],[],[(],[ lnkmflkgnmhlk

++−=∑

True detections


• Goal: find in image • Method 2: SSD

Input 1- sqrt(SSD)

2

,

)],[],[(],[ lnkmflkgnmhlk

++−=∑

What’s the potential downside of SSD?


• Goal: find in image • Method 3: Normalized cross-correlation

5.0

,

2,

,

2

,,

)],[()],[(

)],[)(],[(],[

!!"

#$$%

&−−−−

−−−−

=

∑ ∑

∑

lknm

lk

nmlk

flnkmfglkg

flnkmfglkgnmh

Matlab: normxcorr2(template, im)

mean image patchmean template


• Goal: find in image • Method 3: Normalized cross-correlation

Input Normalized X-Correlation Thresholded Image

True detections

Q: What is the best method to use?

!A: Depends • SSD: faster, sensitive to overall intensity • Normalized cross-correlation: slower,

invariant to local average intensity and contrast

• But really, neither of these baselines are representative of modern recognition.

Image pyramidsImage information occurs at all spatial scales !

• Gaussian pyramid • Laplacian pyramid • Wavelet/QMF pyramid • Steerable pyramid

Slide credit: B. Freeman and A. Torralba

The Gaussian pyramid• Smooth with gaussians, because

– a gaussian*gaussian=another gaussian • Gaussians are low pass filters, so

representation is redundant.


The computational advantage of pyramids

Slide credit: B. Freeman and A. Torralba[Burt and Adelson, 1983]

The Gaussian Pyramid

Slide credit: B. Freeman and A. Torralba[Burt and Adelson, 1983]

Convolution and subsampling as a matrix multiply (1D case)

!! 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 4 6 4 1 0

(Normalization constant of 1/16 omitted for visual clarity.)Slide credit: B. Freeman and A. Torralba

Next pyramid level

!! 1 4 6 4 1 0 0 0

0 0 1 4 6 4 1 0

0 0 0 0 1 4 6 4

0 0 0 0 0 0 1 4


The combined effect of the two pyramid levels

!!! 1 4 10 20 31 40 44 40 31 20 10 4 1 0 0 0 0 0 0 0

0 0 0 0 1 4 10 20 31 40 44 40 31 20 10 4 1 0 0 0

0 0 0 0 0 0 0 0 1 4 10 20 31 40 44 40 30 16 4 0

0 0 0 0 0 0 0 0 0 0 0 0 1 4 10 20 25 16 4 0


Gaussian pyramids used for• up- or downsampling images. • Multi-resolution image analysis

– Look for an object over various spatial scales

– Coarse-to-fine image processing: form blur estimate or the motion analysis on very low-resolution image, upsample and repeat. Often a successful strategy for avoiding local minima in complicated estimation tasks.


1D Gaussian pyramid matrix, for [1 4 6 4 1] low-pass filter

full-band image, highest

resolution

lower-resolution image

lowest resolution image


The Laplacian Pyramid• Synthesis

– Compute the difference between upsampled Gaussian pyramid level and Gaussian pyramid level.

– band pass filter - each level represents spatial frequencies (largely) unrepresented at other level.


The Laplacian Pyramid

36

Upsampling

6 1 0 0

4 4 0 0

1 6 1 0

0 4 4 0

0 1 6 1

0 0 4 4

0 0 1 6

0 0 0 4

Insert zeros between pixels, then apply a low-pass filter, [1 4 6 4 1]


Showing, at full resolution, the information captured at each level of a Gaussian (top) and Laplacian (bottom)

pyramid.


Laplacian pyramid reconstruction algorithm: recover x1 from L1, L2, L3 and x4

G# is the blur-and-downsample operator at pyramid level # F# is the blur-and-upsample operator at pyramid level # !Laplacian pyramid elements: L1 = (I – F1 G1) x1 L2 = (I – F2 G2) x2 L3 = (I – F3 G3) x3 x2 = G1 x1 x3 = G2 x2 x4 = G3 x3 !!Reconstruction of original image (x1) from Laplacian pyramid elements: x3 = L3 + F3 x4 x2 = L2 + F2 x3 x1 = L1 + F1 x2


Laplacian pyramid reconstruction algorithm: recover x1 from L1, L2, L3 and g3

+

++


1D Laplacian pyramid matrix, for [1 4 6 4 1] low-pass filter

high frequencies

mid-band frequencies

low frequencies


Laplacian pyramid applications• Texture synthesis • Image compression • Noise removal


Image blending


Szel

iski

, Com

pute

r Vi

sion

, 201

0

Slide credit: B. Freeman &

A. Torralba

Image blending

• Build Laplacian pyramid for both images: LA, LB • Build Gaussian pyramid for mask: G • Build a combined Laplacian pyramid: L(j) = G(j) LA(j) + (1-G(j)) LB(j) • Collapse L to obtain the blended image

33Slide credit: B. Freeman and A. Torralba

Linear transforms

Linear transform

Vectorized image

transformed image

€

! f = U−1 ! F

Note: not all important transforms need to have an inverse

Slide credit: A. Torralba

U= U-1=1 1

1 -1

0.5 0.5

0.5 -0.5

The simplest set of functions:

Haar transform


U= U-1=1 1

1 -1

0.5 0.5

0.5 -0.5

To code a signal, repeat at several locations:

1 11 -

1 1 11 -

1 1 11 -

1 1 11 -

1

U=

The simplest set of functions:

1 11 -

1 1 11 -

1 1 11 -

1 1 11 -

1

U-1= ½

Haar transform


1 1

1 -1

1 1

1 -1

1 1

1 -1

1 1

1 -1

1 11 1

1 11 1

1 -11 -1

1 -11 -1

Reordering rows

Low pass

High pass

Apply the same decomposition to the Low pass component:

1 11 1

1 11 1

1 1

1 -1

1 1

1 -1

=

1 1 1 11 1 -1 -1

1 1 1 11 1 -1 -1

And repeat the same operation to the low pass component, until length 1.Note: each subband is sub-sampled and has aliased signal components.

Haar transform


1 1 1 1 1 1 1 11 1 1 1 -1 -1 -1 -11 1 -1 -1

1 1 -1 -11 -1

1 -11 -1

1 -1

The entire process can be written as a single matrix:

Average

Multiscale derivatives

Haar transform


1 1 1 1 1 1 1 11 1 1 1 -1 -1 -1 -11 1 -1 -1

1 1 -1 -11 -1

1 -11 -1

1 -1

U= U-1=

0.125

0.125 0.25 0 0.5 0 0 0

0.125

0.125 0.25 0 -0.5 0 0 0

0.125

0.125 -0.25 0 0 0.5 0 0

0.125

0.125 -0.25 0 0 -0.5 0 0

0.125

-0.125 0 0.25 0 0 0.5 0

0.125

-0.125 0 0.25 0 0 -0.5 0

0.125

-0.125 0 -0.25 0 0 0 0.5

0.125

-0.125 0 -0.25 0 0 0 -0.5

Properties: • Orthogonal decomposition • Perfect reconstruction • Critically sampled

Haar transform


11

1-1

1 1 1 -1

Basic elements:

56

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 11 11 1

= 2 Low pass

57

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

=

=

=

=

2 Low pass

58

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

1 -11 -1

1 1-1

-1

1 -1-

11

=

=

=

=

2

2

2

2

Low pass

59

2D Haar transform

11

1-1

1 1 1 -1

Basic elements:

11

1 1

11

1 -1

1-1

1 1

1-1

1 -1

1 11 1

1 -11 -1

1 1-1

-1

1 -1-

11

=

=

=

=

2

2

2

2

Low pass

High pass vertical

High pass horizontal

High pass diagonal

60

2D Haar transform

2D Haar transform

1 11 1

1 -11 -1

1 1-1 -1

1 -1-1 1

2

2

2

2 Horizontal high pass, vertical high pass

Horizontal high pass, vertical low-pass

Horizontal low pass, vertical high-pass

Horizontal low pass, Vertical low-pass

Sketch of the Fourier transform


Simoncelli and Adelson, in “Subband coding”, Kluwer, 1990.

Pyramid cascade

37Slide credit: B. Freeman and A. Torralba

Wavelet/QMF representation

1 -1

1 -1

1 1

-1 -1

1 -1

-1 1

Same number of pixels!Slide credit: B. Freeman and A. Torralba

Steerable Pyramid

Images from: http://www.cis.upenn.edu/~eero/steerpyr.html

2 Level decomposition of white circle example:

Low pass residual

Subbands


Steerable Pyramid


… …


We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below

Decomposition Reconstruction

Steerable Pyramid

We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below


Decomposition Reconstruction


Steerable Pyramid

But we need to get rid of the corner regions before starting the recursive circular filtering

Slide credit: B. Freeman and A. TorralbaSimoncelli and Freeman, ICIP 1995

Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE

There is also a high pass residual…Slide credit: B. Freeman and A. Torralba

Monroe

71

Dog or cat?

72

Almost no dog information

73

• Summary of pyramid representations

74

Image pyramids

Shows the information added in Gaussian pyramid at each spatial scale. Useful for noise reduction & coding.

Progressively blurred and subsampled versions of the image. Adds scale invariance to fixed-size algorithms.

Shows components at each scale and orientation separately. Non-aliased subbands. Good for texture and feature analysis. But overcomplete and with HF residual.

Bandpassed representation, complete, but with aliasing and some non-oriented subbands.

• Gaussian !!

• Laplacian !!

• Wavelet/QMF !!

• Steerable pyramid75

Schematic pictures of each matrix transform

Shown for 1-d images The matrices for 2-d images are the same idea, but

more complicated, to account for vertical, as well as horizontal, neighbor relationships.

Fourier transform, or Wavelet transform, or Steerable pyramid transform

Vectorized imagetransformed image

76

Gaussian pyramid

= *pixel image

Overcomplete representation. Low-pass filters, sampled appropriately for their blur.

Gaussian pyramid


Laplacian pyramid

= *pixel image

Overcomplete representation. Transformed pixels represent bandpassed image information.

Laplacian pyramid


Wavelet (QMF) transform

= *pixel imageOrtho-normal transform

(like Fourier transform), but with localized basis functions.

Wavelet pyramid


= *pixel image

Over-complete representation, but non-aliased subbands.

Steerable pyramid

Multiple orientations at one scale

Multiple orientations at the next

scale

the next scale…

Steerable pyramid


Why use image pyramids? • Handle real-world size variations with

a constant-size vision algorithm. • Remove noise • Analyze texture • Recognize objects • Label image features • Image priors can be specified naturally

in terms of wavelet pyramids.


Image representation

!• Pixels: great for spatial resolution, poor access

to frequency !

• Fourier transform: great for frequency, not for spatial info !

• Pyramids/filter banks: balance between spatial and frequency information

Slides credit: James Hayes

Major uses of image pyramids!

• Compression !

• Object detection – Scale search – Features !

• Detecting stable interest points !!

• Registration – Course-to-fine

Slides credit: James Hayes

Application: Representing Texture

Source: Forsyth

Texture and Material

http://www-cvr.ai.uiuc.edu/ponce_grp/data/texture_database/samples/

Texture and Orientation


Texture and Scale


What is texture?

! Regular or stochastic patterns caused by

bumps, grooves, and/or markings

How can we represent texture?

!• Compute responses of blobs and edges at

various orientations and scales

Overcomplete representation: filter banks

LM Filter Bank

Code for filter banks: www.robots.ox.ac.uk/~vgg/research/texclass/filters.html

Filter banks

• Process image with each filter and keep responses (or squared/abs responses)

How can we represent texture?

!• Measure responses of blobs and edges at

various orientations and scales !

• Idea 1: Record simple statistics (e.g., mean, std.) of absolute filter responses

Can you match the texture to the response?

Mean abs responses

FiltersA

B

C

1

2

3

Representing texture by mean abs response

Mean abs responses

Filters

Representing texture• Idea 2: take vectors of filter responses at each pixel and

cluster them, then take histograms (more on in later weeks)

computer visioncrcv.ucf.edu/people/faculty/xborji/www/lectures/6.pdf1/2 1/4 (2x zoom) 1/8 (4x zoom)...

Documents