Description

Alexandre Xavier Falcão

Institute of Computing - UNICAMP

[email protected]

Alexandre Xavier Falcão MC940/MO445 - Image Analysis

Introduction

A descriptor is an algorithm that extracts a feature vector $\mathbf{x}(s) = (x_1(s), x_2(s), \ldots, x_n(s))$ from any sample $s \in Z$, where a sample may be defined as a pixel, a superpixel, the image of a segmented object, its shape, etc.

The descriptor may also include a distance function (e.g., $d(s, t) = \|\mathbf{x}(t) - \mathbf{x}(s)\|$) to measure the dissimilarity between samples $s$ and $t$ in the feature space.

In this module, we are interested in obtaining color, shape, and texture descriptors from the previously studied image representations.
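As a minimal sketch of the definition above, the Python snippet below pairs a hypothetical descriptor (the mean color of a sample, chosen only for illustration and not taken from the slides) with the Euclidean distance $d(s, t)$. The function names and the representation of a sample as a NumPy array of pixels are assumptions.

```python
import numpy as np


def mean_color_descriptor(sample):
    """Hypothetical descriptor: maps a sample (H x W x C pixel array)
    to a feature vector x(s) with one mean value per channel."""
    sample = np.asarray(sample, dtype=np.float64)
    return sample.reshape(-1, sample.shape[-1]).mean(axis=0)


def distance(x_s, x_t):
    """Euclidean distance d(s, t) = ||x(t) - x(s)|| in the feature space."""
    return float(np.linalg.norm(np.asarray(x_t) - np.asarray(x_s)))


# Two toy samples: a black patch and a patch with constant color (3, 4, 0).
s = np.zeros((2, 2, 3))
t = np.zeros((2, 2, 3))
t[..., 0] = 3.0
t[..., 1] = 4.0
print(distance(mean_color_descriptor(s), mean_color_descriptor(t)))  # 5.0
```

Any other pair of extraction algorithm and distance function fits the same pattern, which is what allows descriptors to be compared or combined later.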

Agenda

We will focus on the following descriptors and a genetic programming strategy for their combination.

Color descriptors: Color histogram and BIC [8].

Texture descriptors (the most popular ones): LBP, HoG, BoVW, and CNN.

Shape descriptors using multiscale fractal dimension [3], saliences [2], and tensor scale [1, 7].

Descriptor combination by genetic programming [4].

Color histogram

The histogram $h(i)$ of a grayscale image $I = (D_I, I)$ is defined as

\[
h(i) = \sum_{\forall p \in D_I} \delta(I(p) - i),
\qquad
\delta(I(p) - i) =
\begin{cases}
1, & \text{when } i = I(p), \\
0, & \text{otherwise.}
\end{cases}
\]

For color images $I = (D_I, \mathbf{I})$, where $\mathbf{I}(p) = (I_1(p), I_2(p), I_3(p))$ in some color space (e.g., RGB, YCbCr, Lab), this definition would lead to a sparse and, in this case, likely ineffective feature vector $\mathbf{x}(s) = \mathbf{h}$, where $s = I$ and $x_i(s) = h(i)$.
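The grayscale definition can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption of non-negative integer intensities (e.g., an 8-bit image with 256 levels) are not from the slides.

```python
import numpy as np


def gray_histogram(img, levels=256):
    """h(i) = number of pixels p in the image domain with I(p) = i."""
    values = np.asarray(img).ravel()
    return np.bincount(values, minlength=levels)


img = np.array([[0, 1],
                [1, 255]], dtype=np.uint8)
h = gray_histogram(img)
print(h[0], h[1], h[255])  # 1 2 1
```

Applying the same definition directly to a 24-bit color image would require $2^{24}$ bins, far more than the number of pixels in most images, which is why the resulting vector would be sparse.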

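The problem is addressed by dividing each axis of the color space into fixed intervals, called bins. For a 64-bin histogram of a 24-bit RGB image, with $I_1 = R$, $I_2 = G$, and $I_3 = B$, the histogram $h$ is

\[
h(i) = \sum_{\forall p \in D_I} \delta(V(p) - i),
\qquad
V(p) = \left\lfloor \frac{R(p)}{64} \right\rfloor + 4 \left\lfloor \frac{G(p)}{64} \right\rfloor + 16 \left\lfloor \frac{B(p)}{64} \right\rfloor,
\]

where the divisions are integer divisions, so that each channel is reduced to 4 levels and $V(p) \in [0, 63]$. It is also common to create normalized histograms by setting $h(i) \leftarrow \frac{h(i)}{|D_I|}$, $i = 0, 1, \ldots, 63$.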
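A possible NumPy sketch of the 64-bin color histogram follows. It assumes an 8-bit RGB image stored as an H x W x 3 array with channels in R, G, B order; the function names are hypothetical.

```python
import numpy as np


def quantize_rgb(img):
    """V(p) = R(p)//64 + 4*(G(p)//64) + 16*(B(p)//64), a value in [0, 63]."""
    img = np.asarray(img, dtype=np.int64)
    return img[..., 0] // 64 + 4 * (img[..., 1] // 64) + 16 * (img[..., 2] // 64)


def color_histogram(img, normalize=True):
    """64-bin color histogram, optionally divided by the number of pixels."""
    v = quantize_rgb(img)
    h = np.bincount(v.ravel(), minlength=64).astype(np.float64)
    return h / v.size if normalize else h


# Four pixels: white, and one pixel at level 64 in each of R, G, and B.
img = np.array([[[255, 255, 255], [64, 0, 0]],
                [[0, 64, 0], [0, 0, 64]]], dtype=np.uint8)
print(quantize_rgb(img))  # [[63  1] [ 4 16]]
```

With normalization, the bins sum to 1, so images of different sizes can be compared directly.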

BIC — Border and Interior Classification

The BIC descriptor consists of three components.

A simple yet effective algorithm that classifies pixels into image regions defined as either border (high frequency) or interior (low frequency).

A compact region representation based on color histograms.

A logarithmic distance function to compare histograms from two images.

Let $L(p) \in \{0, 1\}$ indicate whether $p$ is an interior ($L(p) = 0$) or a border ($L(p) = 1$) pixel.

Once $V(p)$ is computed for each pixel $p \in D_I$, a pixel $p$ is classified as follows.

\[
L(p) =
\begin{cases}
1, & \text{when } \exists q \in A_1(p) \mid V(q) \neq V(p), \text{ and} \\
0, & \text{otherwise.}
\end{cases}
\]

The normalized color histograms $h_0$ (interior) and $h_1$ (border) of each region are computed and then quantized from 0 to 255 by setting $h_0(i) \leftarrow 255 h_0(i)$ and $h_1(i) \leftarrow 255 h_1(i)$.

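Assuming the $L_1$ metric, the logarithmic distance ($\log_2$) is encoded in the histograms by mapping $h_b(i)$, $b \in \{0, 1\}$, from $[0, 255]$ to $[0, 9]$ (4 bits per bin) as follows, where the first condition that holds applies.

\[
h_b(i) \leftarrow
\begin{cases}
0 & \text{if } h_b(i) = 0, \\
1 & \text{if } h_b(i) < 1, \\
2 & \text{if } h_b(i) < 2, \\
3 & \text{if } h_b(i) < 4, \\
4 & \text{if } h_b(i) < 8, \\
5 & \text{if } h_b(i) < 16, \\
6 & \text{if } h_b(i) < 32, \\
7 & \text{if } h_b(i) < 64, \\
8 & \text{if } h_b(i) < 128, \\
9 & \text{otherwise.}
\end{cases}
\]

Finally, the histograms $h_0$ and $h_1$ are concatenated into a single histogram $\mathbf{x}$.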
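The three BIC steps can be put together in a short sketch. Several choices here are assumptions rather than statements from the slides: $A_1(p)$ is taken as the 4-neighborhood, neighbors outside the image domain are ignored, both histograms are normalized by the total number of pixels $|D_I|$, and all function names are hypothetical.

```python
import numpy as np

# Upper limits of the log2 intervals used to map [0, 255] to codes 1..9.
THRESHOLDS = np.array([1, 2, 4, 8, 16, 32, 64, 128], dtype=np.float64)


def quantize_rgb(img):
    """V(p) in [0, 63] for an 8-bit RGB image (H x W x 3)."""
    img = np.asarray(img, dtype=np.int64)
    return img[..., 0] // 64 + 4 * (img[..., 1] // 64) + 16 * (img[..., 2] // 64)


def border_mask(v):
    """L(p) = 1 when some 4-neighbor q of p has V(q) != V(p) (assumed A_1)."""
    border = np.zeros(v.shape, dtype=bool)
    diff_rows = v[1:, :] != v[:-1, :]   # vertical neighbors differ
    border[1:, :] |= diff_rows
    border[:-1, :] |= diff_rows
    diff_cols = v[:, 1:] != v[:, :-1]   # horizontal neighbors differ
    border[:, 1:] |= diff_cols
    border[:, :-1] |= diff_cols
    return border


def log_code(h255):
    """Map bin values in [0, 255] to the codes 0..9 of the log2 table."""
    codes = np.digitize(h255, THRESHOLDS) + 1
    codes[h255 == 0] = 0
    return codes


def bic(img):
    """Concatenate the log-coded interior (h0) and border (h1) histograms."""
    v = quantize_rgb(img)
    border = border_mask(v)
    parts = []
    for mask in (~border, border):
        h = np.bincount(v[mask], minlength=64) / v.size  # assumed normalization
        parts.append(log_code(255.0 * h))
    return np.concatenate(parts)


def bic_distance(x_s, x_t):
    """L1 distance between two BIC feature vectors."""
    return int(np.abs(x_s - x_t).sum())


# Toy image: left half black, right half white.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, 2:] = 255
x = bic(img)
print(x.shape)  # (128,)
```

Because the codes already hold logarithms of the bin values, the plain $L_1$ distance between two coded vectors behaves like a logarithmic distance between the original histograms.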

Texture: Local Binary Patterns (LBP)

Consider a grayscale image $I = (D_I, I)$ and the adjacency set $A_{\sqrt{2}}(p) = \{q_1, q_2, \ldots, q_8\}$ of each pixel $p \in D_I$.

A local binary pattern $B(p) \in [0, 255]$ is assigned to $p$ by setting each bit $b_k(z)$, $k = 1, 2, \ldots, 8$, of $z = B(p)$ as

\[
b_k(z) \leftarrow
\begin{cases}
1 & \text{if } I(p) > I(q_k), \\
0 & \text{otherwise.}
\end{cases}
\]

The histogram of the map $B$ can be used as the LBP feature vector.

Alternatively, $D_I$ can be divided into cells of $N \times N$ pixels, one LBP histogram can be extracted per cell, and the histograms concatenated into a single feature vector per image.
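The LBP map and its histogram can be sketched as below. The ordering of the neighbors $q_1, \ldots, q_8$ (clockwise from the top-left one), the mapping of $q_k$ to bit $k - 1$, and the decision to skip pixels on the image frame (whose 8-neighborhood is incomplete) are assumptions made for this illustration.

```python
import numpy as np

# Offsets (dy, dx) of q_1..q_8, clockwise from the top-left neighbor (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_map(img):
    """B(p) for every pixel with a complete 8-neighborhood.

    Bit k-1 of B(p) is 1 when I(p) > I(q_k). The result has shape (H-2, W-2).
    """
    img = np.asarray(img, dtype=np.int64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes |= (center > neighbor).astype(np.int64) << k
    return codes


def lbp_histogram(img):
    """Normalized 256-bin histogram of the LBP map."""
    codes = lbp_map(img)
    return np.bincount(codes.ravel(), minlength=256) / codes.size


# A bright pixel surrounded by darker neighbors sets all 8 bits.
img = np.zeros((3, 3), dtype=np.uint8)
img[1, 1] = 5
print(lbp_map(img))  # [[255]]
```

For the cell-based variant, the same `lbp_histogram` could be applied to each $N \times N$ cell and the results concatenated.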

However, the LBP histograms are not rotation invariant, which has inspired many variants.

Some of them treat rotation in the distance function and others incorporate rotation invariance in the feature vector.

The problem is also not critical when the images are aligned.

The extension to color images can simply concatenate the histograms from each band.

Texture: Histogram of Oriented Gradients (HoG)

In object detection, for instance, each sample is a subimage (called a window) around a candidate object.

The candidate objects reduce the number of windows for analysis, and they can be obtained by segmentation and simple component analysis.

An example is the detection of car license plates in a grayscale image $I = (D_I, I)$.

The problem can be reduced to extracting a HoG feature vector (or its concatenation with LBP) inside each window for pattern classification as car license plate or background.

The extension to color images can simply concatenate the HoG feature vectors of each band inside the window.

As a first step, the image intensities are normalized within an interval $[0, L]$ (e.g., by gamma correction).

\[
I'(p) = K \left[ \frac{I(p)}{I_{\max}} \right]^{\gamma},
\]

where $I_{\max} = \max_{\forall p \in D_I} \{I(p)\}$, $\gamma > 0$, and $K = 2^b - 1$.

Now, for each window of size $n_1 \times m_1$ pixels around a candidate object, the HoG feature vector requires the estimation of a gradient vector $\vec{g}(p)$ at each pixel $p$.

\[
\vec{g}(p) = \sum_{\forall q \in A_r(p)} [I(q) - I(p)] \exp\left( \frac{-\|q - p\|^2}{2\sigma^2} \right) \vec{pq},
\]

where $\sigma = r/3$, $\vec{pq} = \frac{q - p}{\|q - p\|}$, and $r \leq 1$.
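The two formulas above can be sketched with NumPy as follows. The sketch assumes that $A_r(p)$ contains the pixels $q \neq p$ with $\|q - p\| \leq r$, that neighbors outside the image domain are ignored, and that $x$ runs along columns and $y$ along rows; the function names and default parameter values are hypothetical.

```python
import numpy as np


def gamma_normalize(img, gamma=0.5, bits=8):
    """I'(p) = K * (I(p) / I_max) ** gamma, with K = 2**bits - 1."""
    img = np.asarray(img, dtype=np.float64)
    k = 2 ** bits - 1
    return k * (img / img.max()) ** gamma


def gradient(img, r=1.0):
    """Gradient components (gx, gy) as a Gaussian-weighted sum over A_r(p)."""
    img = np.asarray(img, dtype=np.float64)
    rows, cols = img.shape
    sigma = r / 3.0
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    reach = int(np.floor(r))
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            d2 = dx * dx + dy * dy
            if d2 == 0 or d2 > r * r:
                continue  # skip p itself and pixels farther than r
            weight = np.exp(-d2 / (2.0 * sigma ** 2))
            dist = np.sqrt(d2)
            # Pixels p whose neighbor q = p + (dx, dy) lies inside the image.
            p_rows = slice(max(0, -dy), rows - max(0, dy))
            p_cols = slice(max(0, -dx), cols - max(0, dx))
            q_rows = slice(max(0, dy), rows - max(0, -dy))
            q_cols = slice(max(0, dx), cols - max(0, -dx))
            diff = img[q_rows, q_cols] - img[p_rows, p_cols]
            gx[p_rows, p_cols] += diff * weight * dx / dist
            gy[p_rows, p_cols] += diff * weight * dy / dist
    return gx, gy


# Horizontal ramp I(x, y) = x: the gradient points along +x.
ramp = np.tile(np.arange(5.0), (4, 1))
gx, gy = gradient(ramp)
```

With $r = 1$ the sum reduces to a scaled central difference over the 4-neighborhood; the constant scale factor is harmless because the histograms are normalized later.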

The magnitude $\|\vec{g}(p)\|$ and orientation $\theta(p)$ (angle between $\vec{g}(p)$ and the $x$ axis) are used as follows.

The window is further divided into an integer number of cells containing $n_2 \times m_2$ pixels each.

[Figure: a window around a candidate object, divided into cells of pixels.]

One histogram of gradient orientations per cell is obtained with $n_b$ bins.

For $n_b = 9$ bins, for instance, bin 0 may be used to accumulate votes from pixels whose $\|\vec{g}(p)\| = 0$ and the remaining bins store votes from pixels whose $\theta(p)$ falls within $[0, 44]$, $[45, 89]$, \ldots, $[315, 359]$ degrees, respectively.

The orientation $\theta(p)$ for $h_x(p) = \frac{g_x(p)}{\|\vec{g}(p)\|}$ and $h_y(p) = \frac{g_y(p)}{\|\vec{g}(p)\|}$ is defined as

\[
\theta(p) =
\begin{cases}
\frac{180}{\pi} \cos^{-1}(h_x(p)) & \text{if } h_y(p) \geq 0, \\
360 - \frac{180}{\pi} \cos^{-1}(h_x(p)) & \text{if } h_y(p) < 0.
\end{cases}
\]
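The orientation formula and the 9-bin assignment can be sketched as below. The function names are hypothetical, and the integer intervals listed above are treated as the half-open ranges $[0, 45)$, $[45, 90)$, and so on, which is an assumption for real-valued angles.

```python
import math


def orientation_degrees(gx, gy):
    """theta(p) in degrees, measured from the x axis, for a non-zero gradient."""
    mag = math.hypot(gx, gy)
    if mag == 0.0:
        raise ValueError("orientation is undefined for a zero gradient")
    hx = max(-1.0, min(1.0, gx / mag))  # clamp against rounding errors
    angle = math.degrees(math.acos(hx))
    # h_y has the same sign as gy because the magnitude is positive.
    return angle if gy >= 0 else 360.0 - angle


def orientation_bin(gx, gy, nb=9):
    """Bin 0 for a zero gradient; bins 1..nb-1 split [0, 360) evenly."""
    if gx == 0 and gy == 0:
        return 0
    width = 360.0 / (nb - 1)
    theta = orientation_degrees(gx, gy) % 360.0
    return 1 + int(theta // width)


print(orientation_bin(0.0, 0.0))   # 0
print(orientation_bin(1.0, 0.5))   # 1
print(orientation_bin(0.5, -1.0))  # 7
```

This hard assignment is only the starting point; the slides that follow replace it with a soft vote shared among neighboring bins and cells.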

Each pixel $p$ distributes $\|\vec{g}(p)\|$ votes by trilinear interpolation between adjacent bins $b_1$ and $b_2$ of its four adjacent cells $q_1$, $q_2$, $q_3$, and $q_4$.

[Figure: a pixel $p$ in a window with its four adjacent cells $q_1$ to $q_4$, and the eight vertices $(q_i, b_1)$ and $(q_i, b_2)$, $i = 1, \ldots, 4$, that surround $p$.]

For $\theta = 30$, for instance, $b_1 = 22$ and $b_2 = 67$, since the centers of the 8 bins with non-zero gradient magnitude are represented by 22, 67, 112, 157, 202, 247, 292, and 337.
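A small sketch of how the two adjacent bins might be located for a given angle follows. It assumes exact bin centers at $22.5 + 45k$ degrees (the values above read as their integer parts) and wrap-around between the last and first bins; both points, as well as the function names, are assumptions.

```python
import math


def bin_center(index, n_bins=8):
    """Center, in degrees, of orientation bin `index` (0-based)."""
    width = 360.0 / n_bins
    return (index + 0.5) * width


def adjacent_bins(theta, n_bins=8):
    """Return (b1, b2, frac): the bins whose centers surround theta and the
    fraction of the way from the center of b1 to the center of b2."""
    width = 360.0 / n_bins
    pos = (theta % 360.0 - width / 2.0) / width
    lower = math.floor(pos)
    frac = pos - lower
    return lower % n_bins, (lower + 1) % n_bins, frac


b1, b2, frac = adjacent_bins(30.0)
print(int(bin_center(b1)), int(bin_center(b2)))  # 22 67
```

For angles below the first center, such as $\theta = 10$, the lower bin wraps around to the last one (center 337.5).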

The distribution of votes aims to handle relevant pixels with high gradient magnitude that might fall in adjacent cells.

Let $(x_p, y_p, z_p)$, $z_p = \theta(p)$, be the coordinate of $p$ in a 3D space.

Let $(x_i, y_i)$ be the center of the cell $q_i$, $i = 1, 2, 3, 4$, and $(q_1, b_1)$, $(q_2, b_1)$, $(q_3, b_1)$, $(q_4, b_1)$, $(q_1, b_2)$, $(q_2, b_2)$, $(q_3, b_2)$, and $(q_4, b_2)$ be the 8 vertices $(x_i, y_i, z_i)$, $i = 1, 2, \ldots, 8$, around $p$.

The gradient magnitude $w = \|\vec{g}(p)\|$ is a weight distributed among the 8 vertices by trilinear interpolation.

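The weight $w = \|\vec{g}(p)\|$ is first distributed between points $p_1$ and $p_2$ on opposite faces, then the weights on the faces are distributed among points $p_3$, $p_4$, $p_5$, $p_6$ of opposite edges, and finally the edge weights are distributed to the vertices $p_7$, $p_8$, $p_9$, $p_{10}$, $p_{11}$, $p_{12}$, $p_{13}$, and $p_{14}$ of the corresponding edges.

[Figure: the point $p$ inside a cuboid with axes $x$, $y$, and $z$, showing the face points $p_1$ and $p_2$, the edge points $p_3$ to $p_6$, and the vertices $p_7$ to $p_{14}$.]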

The weights $w_i$ of each point $p_i = (x_{p_i}, y_{p_i}, z_{p_i})$, $i = 1, 2, \ldots, 14$, are computed as

\[
\begin{aligned}
w_1 &= w \, \frac{x_{p_2} - x_p}{x_{p_2} - x_{p_1}}, &
w_2 &= w \, \frac{x_p - x_{p_1}}{x_{p_2} - x_{p_1}}, \\
w_3 &= w_1 \, \frac{y_{p_1} - y_{p_4}}{y_{p_3} - y_{p_4}}, &
w_4 &= w_1 \, \frac{y_{p_3} - y_{p_1}}{y_{p_3} - y_{p_4}}, \\
w_5 &= w_2 \, \frac{y_{p_2} - y_{p_6}}{y_{p_5} - y_{p_6}}, &
w_6 &= w_2 \, \frac{y_{p_5} - y_{p_2}}{y_{p_5} - y_{p_6}}, \\
w_7 &= w_3 \, \frac{z_{p_{11}} - z_{p_3}}{z_{p_{11}} - z_{p_7}}, &
w_{11} &= w_3 \, \frac{z_{p_3} - z_{p_7}}{z_{p_{11}} - z_{p_7}}, \\
w_8 &= w_4 \, \frac{z_{p_{12}} - z_{p_4}}{z_{p_{12}} - z_{p_8}}, &
w_{12} &= w_4 \, \frac{z_{p_4} - z_{p_8}}{z_{p_{12}} - z_{p_8}}, \\
w_{10} &= w_5 \, \frac{z_{p_{14}} - z_{p_5}}{z_{p_{14}} - z_{p_{10}}}, &
w_{14} &= w_5 \, \frac{z_{p_5} - z_{p_{10}}}{z_{p_{14}} - z_{p_{10}}}, \\
w_9 &= w_6 \, \frac{z_{p_{13}} - z_{p_6}}{z_{p_{13}} - z_{p_9}}, &
w_{13} &= w_6 \, \frac{z_{p_6} - z_{p_9}}{z_{p_{13}} - z_{p_9}}.
\end{aligned}
\]

Finally, the weights $w_i$ are accumulated in the corresponding bin of the cell represented by $p_i$, $i = 7, 8, \ldots, 14$.
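The face, edge, and vertex steps multiply one linear fraction per axis, so the eight vertex weights can also be written as products. The sketch below uses that equivalent product form rather than the $p_1, \ldots, p_{14}$ bookkeeping; the cuboid is described by its lower and upper corners, vertices are indexed by (0 or 1) per axis, and wrap-around of the orientation axis at 360 degrees is not handled. All names are hypothetical.

```python
def trilinear_votes(w, p, lo, hi):
    """Split the weight w of point p = (x, y, z) among the 8 vertices of the
    cuboid with corners lo and hi. Keys are (ix, iy, iz), where 0 selects the
    lo coordinate and 1 the hi coordinate on each axis."""
    fx, fy, fz = [(pc - a) / (b - a) for pc, a, b in zip(p, lo, hi)]
    votes = {}
    for ix, wx in ((0, 1.0 - fx), (1, fx)):          # faces along x
        for iy, wy in ((0, 1.0 - fy), (1, fy)):      # edges along y
            for iz, wz in ((0, 1.0 - fz), (1, fz)):  # vertices along z
                votes[(ix, iy, iz)] = w * wx * wy * wz
    return votes


# Cell centers 8 pixels apart, bin centers at 22 and 67 degrees.
lo, hi = (0.0, 0.0, 22.0), (8.0, 8.0, 67.0)
votes = trilinear_votes(10.0, (2.0, 3.0, 30.0), lo, hi)
print(round(sum(votes.values()), 9))  # 10.0
```

The votes always add up to $w$, and a pixel that coincides with a vertex gives its whole weight to that vertex.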

Now, each group of $n_3 \times m_3$ cells constitutes a block.

Adjacent blocks are defined by stride (displacement in $x$ and $y$).

[Figure: a window with blocks of 2x2 cells and a stride of one cell.]

The cell histograms in each block are concatenated from left to right, top to bottom, and normalized to treat contrast variations. Similarly, the block feature vectors are concatenated to output a HoG feature vector for the window.


Let $h_k(i)$, $i = 0, 1, \ldots, n_b - 1$ and $k = 1, 2, \ldots, n_3 \times m_3$, be the cell histograms in a block with $n_3 \times m_3$ cells.

Their concatenation from left to right, top to bottom, generates a vector with features $v_j$, $j = 1, 2, \ldots, n_b \times n_3 \times m_3$.

These features are normalized as

$$
v_j = \frac{v_j}{\sqrt{\sum_{l=1}^{n_b \times n_3 \times m_3} v_l v_l + \varepsilon}}
$$

where ε is a small number.
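The normalization above could be sketched as follows. The function name `normalize_block` and the default value of `eps` are assumptions made for illustration; the cell histograms are assumed to be given already in left-to-right, top-to-bottom order.

```python
import math


def normalize_block(cell_histograms, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize them.

    cell_histograms: list of histograms (lists of bin values), assumed to be
    ordered left to right, top to bottom. eps plays the role of the small
    number in the formula; its default value is an assumption.
    """
    v = [value for hist in cell_histograms for value in hist]
    norm = math.sqrt(sum(x * x for x in v) + eps)
    return [x / norm for x in v]


# Toy block with two cells of two bins each (hypothetical values).
block = [[3.0, 0.0], [0.0, 4.0]]
features = normalize_block(block, eps=0.0)
```

With `eps=0.0` the toy block has norm 5, so the features become 0.6, 0, 0, and 0.8. In practice a positive `eps` avoids division by zero in blocks without gradients.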


For instance, for a window with 126 × 36 pixels and cells with 6 × 6 pixels, each window contains 21 × 6 cells.

If each block is defined by 2 × 2 cells and the stride is 1 cell in x and y, each window generates 20 × 5 blocks.

The four cell histograms of 9 bins in each block are concatenated and normalized to compose a vector of 36 features per block.

The feature vectors of the blocks are then concatenated to form a HoG vector with 20 × 5 × 36 features.
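The arithmetic of this example can be checked with a short sketch. The helper name `hog_dimensions` and its argument order are assumptions; it simply counts cells, blocks, and features for a given configuration.

```python
def hog_dimensions(window, cell, block, stride, bins):
    """Return (cells, blocks, number of features) for a HoG window.

    window and cell are sizes in pixels; block and stride are sizes in cells.
    """
    cells = (window[0] // cell[0], window[1] // cell[1])
    blocks = ((cells[0] - block[0]) // stride[0] + 1,
              (cells[1] - block[1]) // stride[1] + 1)
    features_per_block = block[0] * block[1] * bins
    return cells, blocks, blocks[0] * blocks[1] * features_per_block


cells, blocks, n_features = hog_dimensions(
    window=(126, 36), cell=(6, 6), block=(2, 2), stride=(1, 1), bins=9)
# cells == (21, 6), blocks == (20, 5), n_features == 3600
```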


Texture: Bag of Visual Words (BoVW)

Let A, B, and C be examples of texts about different subjects.

A Bag of Words (BoW) is a dictionary of keywords identified as the most frequent ones in texts from different subjects.

By using the dictionary on the right to determine the frequency of its words in A, B, and C, each text is represented by the following histograms.

We expect higher similarity between the histograms of A and B than between any of them and the histogram of C.
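A toy version of this idea is sketched below. The texts, the dictionary, and the use of cosine similarity are all hypothetical choices made for illustration; they are not the examples shown on the slides.

```python
import math
from collections import Counter


def bow_histogram(text, dictionary):
    """Count how often each dictionary keyword occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms (0 if one is empty)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    return dot / (n1 * n2) if n1 > 0 and n2 > 0 else 0.0


# Hypothetical dictionary and texts: A and B share a subject, C does not.
dictionary = ["pixel", "image", "goal", "team"]
text_a = "the image has a pixel grid and each pixel has a color"
text_b = "an image filter changes every pixel of the image"
text_c = "the team scored a goal and the team won"

hist_a = bow_histogram(text_a, dictionary)
hist_b = bow_histogram(text_b, dictionary)
hist_c = bow_histogram(text_c, dictionary)
sim_ab = cosine_similarity(hist_a, hist_b)
sim_ac = cosine_similarity(hist_a, hist_c)
```

In this toy case the histograms of A and B overlap on "pixel" and "image", while C shares no keyword with either, so `sim_ab` is larger than `sim_ac`.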


There are a couple of drawbacks in BoW that need more attention.

For a given subject (category), the choice of its most frequent keywords is crucial, but BoW is unsupervised. How can we incorporate supervision in BoW?

Such keywords might not always occur in texts from that subject. How can we avoid errors of similarity with other subjects?


In Bag of Visual Words (BoVW), the problem is the same, but

the words are the local features of image patches extracted from key locations and

the most frequent ones are the keywords, which are determined by patch clustering as the group representatives.

[Figure from MathWorks.com]


[Figure from MathWorks.com]

One can separate training images from each category, build and merge the dictionaries, encode the training images, and train a classifier, but...


Can we identify the most discriminative visual words when merging dictionaries, and take this into account when encoding the images?

How do we choose the key patches, local features, clustering technique, similarity function, and image encoding rule?

By building a single dictionary, the image code becomes sparse and high-dimensional. What are the most suitable classifiers for this case?


Some visual words might be common to different categories, but be discriminative when used together with the other words.

The discriminative power of such words must be determined by some feature selection technique.

Sampling techniques, such as Scale Invariant Feature Transform (SIFT), grid, and random sampling, have been used to locate key patches.

The local features are usually color and texture features of those patches.


The most commonly used clustering technique is k-means, where k defines the size of the dictionary.

The similarity function between patches (and visual words) is inversely proportional to a distance function (e.g., cosine, Euclidean, etc.) suitable for the type of local description.

The images are usually coded by either hard or soft assignment.

hard assignment: each patch counts for its closest visual word only.

soft assignment: each patch counts for all visual words, with count directly proportional to the similarity between patch and word.
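The two encoding rules can be contrasted with a small sketch. The function names, the similarity 1 / (1 + d) based on Euclidean distance, and the per-patch normalization in the soft case are assumptions made for illustration; the slides only state that the count is proportional to the similarity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def encode_hard(patches, words):
    """Hard assignment: each patch votes only for its closest visual word."""
    hist = [0.0] * len(words)
    for patch in patches:
        distances = [euclidean(patch, word) for word in words]
        hist[distances.index(min(distances))] += 1.0
    return hist


def encode_soft(patches, words):
    """Soft assignment: each patch votes for all words, weighted by similarity.

    Similarity is taken as 1 / (1 + distance) and each patch's votes are
    normalized to sum to 1 (both are assumptions of this sketch).
    """
    hist = [0.0] * len(words)
    for patch in patches:
        sims = [1.0 / (1.0 + euclidean(patch, word)) for word in words]
        total = sum(sims)
        for k, s in enumerate(sims):
            hist[k] += s / total
    return hist


# Hypothetical dictionary with two visual words and three patches.
words = [(0.0, 0.0), (4.0, 0.0)]
patches = [(1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
hard = encode_hard(patches, words)
soft = encode_soft(patches, words)
```

Here `hard` counts one patch for the first word and two for the second, while `soft` spreads each patch over both words, so a patch lying between two words still contributes to both bins.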


The resulting feature vectors (histograms) call for classification techniques such as support vector machines and multi-layer perceptrons.

The histograms lose the spatial information (localization) of the visual words in each image.

The convolution between the image and a filter bank (dictionary) is equivalent to selecting patches for all pixels, computing the similarity to all kernels (visual words), as in soft assignment, and storing the result in a multiband image, preserving all spatial and texture information.

This is the strategy in Convolutional Networks (CNNs or ConvNets) for image description [6].


Texture: Convolutional Network (ConvNets)

Each layer of a ConvNet may consist of four operations.

Convolution between the input image and a kernel bank,

neuron activation,

pooling, and

normalization.

By stacking multiple layers, one after the other, low-level texture features are combined into high-level features for image description.


[Figure: ConvNet diagram with labels L0, L1, L2, and L3.]

The feature vector can be represented by the features of each pixel from left to right, top to bottom, after the third layer of the CNN for classification by SVM, or as the last hidden layer of a multi-layer perceptron (MLP) classifier.


Each of the main operations in a CNN layer (convolution between the input image and a kernel bank, neuron activation, pooling, and normalization) can be explained as follows.


First recall that the convolution between a grayscale image $I = (D_I, I)$ and a symmetric kernel $(A_r, W)$ outputs a grayscale image $J = (D_J, J)$, with

$$
J(p) = \sum_{k=1}^{K} I(q_k)\, w_k, \quad \forall p \in D_I,
$$

where $A_r(p) = \{q_1, q_2, \ldots, q_K\}$, $r \geq 1$, and $W = [w_1, w_2, \ldots, w_K]$.

However, the adjacency relation $A_r$ is usually defined as a box

$$
A_r : \left\{ (p, q) \in D_I \times D_I \;\middle|\; |x_q - x_p| \leq \frac{r}{2} \text{ and } |y_q - y_p| \leq \frac{r}{2} \right\}
$$

This operation is equivalent to the inner product $\langle v(p), w \rangle$ between the vectors $v(p) = (I(q_1), I(q_2), \ldots, I(q_K))$ and $w = (w_1, w_2, \ldots, w_K)$.
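The inner-product view of convolution can be sketched as below. The function names and the zero value used for adjacent pixels outside the image domain are assumptions of this sketch; because the kernel is assumed symmetric, no kernel flipping is performed.

```python
def patch_vector(image, p, r):
    """v(p): intensities of the box adjacents of p, scanned row by row.

    Adjacent pixels outside the image domain are taken as 0.0
    (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])
    xp, yp = p
    half = r // 2
    v = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            xq, yq = xp + dx, yp + dy
            inside = 0 <= xq < width and 0 <= yq < height
            v.append(image[yq][xq] if inside else 0.0)
    return v


def convolve(image, weights, r):
    """J(p) = <v(p), w> for every pixel p of the image."""
    height, width = len(image), len(image[0])
    return [[sum(i * w for i, w in zip(patch_vector(image, (x, y), r), weights))
             for x in range(width)]
            for y in range(height)]


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
mean_kernel = [1.0 / 9.0] * 9   # K = 9 weights for the 3x3 box given by r = 2
J = convolve(image, mean_kernel, r=2)
```

With the mean kernel, the central value `J[1][1]` is the average of all nine intensities, while border values are lowered by the zeros assumed outside the domain.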


One can also think of it as part of a perceptron (artificial neuron) operation at location p.

$$
J(p) = \langle v(p), w \rangle
$$

$$
O(p) =
\begin{cases}
J(p) + b, & \text{if } J(p) + b > \theta, \\
0, & \text{otherwise,}
\end{cases}
$$

where b is the bias and θ ≥ 0 is the neuron activation threshold.


Graphically, v(p) is a point in $\mathbb{R}^K$ and w is the normal vector of a hyperplane in $\mathbb{R}^K$. The bias b affects the position of the hyperplane and θ selects points v(p) with some margin on its positive side.

Convolution is then followed by activation.


Some of many alternatives for the activation function ψ(x), where x = J(p) + b, are

$$
\begin{aligned}
\psi(x) &= \max\{0, x - \theta\} && \text{rectified linear unit (ReLU)} \\
\psi(x) &= \ln(1 + \exp(x)) && \text{softplus} \\
\psi(x) &= \frac{1}{1 + \exp(-x)} && \text{logistic} \\
\psi(x) &= \tan^{-1}(x) && \text{arctan}
\end{aligned}
$$

Using the same activation function (e.g., ReLU) for all pixels $p \in D_I$ in a given layer is a common choice.

It is also common that ‖w‖ = 1. A CNN may also adopt kernels with mean weights equal to 0, especially when they are randomly chosen. In this case, the bias can be zero for all pixels.
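The activation functions listed above can be written directly with the standard library. The function names and the helper `activate`, which applies one function to every pixel of a single-band result, are assumptions made for illustration.

```python
import math


def relu(x, theta=0.0):
    """Rectified linear unit with activation threshold theta."""
    return max(0.0, x - theta)


def softplus(x):
    return math.log(1.0 + math.exp(x))


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def arctan(x):
    return math.atan(x)


def activate(J, bias, psi):
    """Apply psi(J(p) + b) with the same psi and bias for all pixels."""
    return [[psi(value + bias) for value in row] for row in J]


# Hypothetical convolution output of a 2x2 image.
J = [[-1.0, 0.5],
     [2.0, 0.0]]
O = activate(J, bias=0.0, psi=relu)
```

With zero bias and threshold, ReLU keeps the positive responses and sets the negative one to zero.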


For a kernel bank, $W$ is an $N \times K$ matrix, the input image is represented by a matrix $X_I$ of size $K \times m$, $m = |D_I|$, and their convolution becomes the matrix $X_J = W X_I$ of size $N \times m$, where

$$
W = \begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1K} \\
w_{21} & w_{22} & \dots & w_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
w_{N1} & w_{N2} & \dots & w_{NK}
\end{bmatrix},
\quad
X_I = \begin{bmatrix}
I(q_{11}) & I(q_{12}) & \dots & I(q_{1m}) \\
I(q_{21}) & I(q_{22}) & \dots & I(q_{2m}) \\
\vdots & \vdots & \ddots & \vdots \\
I(q_{K1}) & I(q_{K2}) & \dots & I(q_{Km})
\end{bmatrix},
$$

and

$$
X_J = \begin{bmatrix}
J(p_{11}) & J(p_{12}) & \dots & J(p_{1m}) \\
J(p_{21}) & J(p_{22}) & \dots & J(p_{2m}) \\
\vdots & \vdots & \ddots & \vdots \\
J(p_{N1}) & J(p_{N2}) & \dots & J(p_{Nm})
\end{bmatrix}.
$$

Each row in $W$ contains a kernel $i = 1, 2, \dots, N$ of the bank,

each column in $X_I$ contains the intensities of the adjacent pixels $q_{kj}$, $k = 1, 2, \dots, K$, of each pixel $p_j$, $j = 1, 2, \dots, m$, and

each column in $X_J$ represents the feature vector $\mathbf{J}(p_j)$ of the resulting multiband image $\hat{J} = (D_I, \mathbf{J})$. It is also common to eliminate the pixels within distance $r$ of the image border, reducing its domain to $D_J \subset D_I$.
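The matrix formulation can be tried on a tiny grayscale image. The sketch below is only an illustration under stated assumptions: the helper names (`box_adjacency`, `build_XI`, `matmul`) are hypothetical, the adjacency is taken as a $(2r+1) \times (2r+1)$ box, the kernels are applied without flipping (as is usual in ConvNets), and pixels closer than $r$ to the border are dropped.

```python
def box_adjacency(r):
    """Displacements (dy, dx) of a (2r+1) x (2r+1) box around a pixel."""
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def build_XI(image, r):
    """Return the K x m matrix X_I for the interior pixels of a 2D image.

    Row k holds the intensity of the k-th adjacent pixel of every pixel p_j;
    pixels closer than r to the border are skipped (reduced domain D_J).
    """
    rows, cols = len(image), len(image[0])
    pixels = [(y, x) for y in range(r, rows - r) for x in range(r, cols - r)]
    return [[image[y + dy][x + dx] for (y, x) in pixels]
            for (dy, dx) in box_adjacency(r)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# 4 x 4 ramp image and a bank with N = 2 kernels of K = 9 weights each.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
W = [[1.0 / 9.0] * 9,                    # mean filter
     [-1, 0, 1, -1, 0, 1, -1, 0, 1]]     # horizontal difference filter
XI = build_XI(image, r=1)                # 9 x 4 matrix
XJ = matmul(W, XI)                       # 2 x 4 matrix, one column per pixel
```

Each column of `XJ` holds the two filter responses of one interior pixel, which is the feature vector $\mathbf{J}(p_j)$ described above.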


When $\hat{I} = (D_I, \mathbf{I})$ is a multiband image,

the feature vectors $\mathbf{I}(q_{kj})$ are expanded along their respective columns in $X_I$,

each kernel $i = 1, 2, \dots, N$ must be multiband with vectorial weights $\mathbf{w}_{ik}$, $k = 1, 2, \dots, K$, expanded along their respective rows in $W$.

$$
J(p_{ij}) = \sum_{k=1}^{K} \langle \mathbf{I}(q_{kj}), \mathbf{w}_{ik} \rangle
$$

The bias and activation are then applied to $\hat{J} = (D_I, \mathbf{J})$ to create the output $\hat{O} = (D_J, \mathbf{O})$.
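A small numeric sketch may help to see why expanding the band vectors along columns and rows gives the same result as the sum of inner products. The function names, the toy values, and the choice of ReLU as activation are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product <u, v> of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def multiband_response(neighbors, kernel):
    """J(p_ij) = sum over k of <I(q_kj), w_ik> for one pixel and one kernel.

    neighbors: list of K band vectors I(q_kj);
    kernel:    list of K weight vectors w_ik with the same number of bands.
    """
    return sum(dot(I_q, w) for I_q, w in zip(neighbors, kernel))

def flatten(vectors):
    """Expand K band vectors into a single vector of length K * bands."""
    return [value for vec in vectors for value in vec]

# K = 2 adjacent pixels with 3 bands each, and one multiband kernel.
neighbors = [[1, 2, 3], [4, 5, 6]]
kernel = [[1, 0, -1], [1, 1, 1]]
bias = -3.0

J = multiband_response(neighbors, kernel)
J_flat = dot(flatten(neighbors), flatten(kernel))  # same value via W X_I
O = max(0.0, J + bias)                             # ReLU activation
```

Here `J` and `J_flat` coincide, so the multiband case still reduces to one row of $W$ times one column of $X_I$.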


For a given box adjacency $A_r$ of size $r \ge 1$ around pixel $p$, the pooling operation aggregates for $p$ the activations that might have occurred nearby.

This operation can also be applied only at every $d$-th pixel, with $r \ge d \ge 1$ (stride).

It makes the process robust to slight object translations and, for $d > 1$, reduces the output image domain.


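The pooling applied to image $\hat{O} = (D_J, \mathbf{O})$ creates an image $\hat{P} = (D_P, \mathbf{P})$, $D_P \subset D_J$, $\mathbf{P} = (P_1, P_2, \dots, P_N)$,

$$
P_i(p) = \sqrt[\alpha]{\sum_{\forall q \in A_r(p)} O_i(q)^{\alpha}},
$$

for $i = 1, 2, \dots, N$, and $\alpha \ge 1$ controls the operation from additive to max-pooling.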
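The effect of $\alpha$ is easy to check numerically: $\alpha = 1$ sums the activations in the box, while a large $\alpha$ approaches their maximum. The sketch below handles one band; it assumes non-negative activations (e.g., after ReLU), treats $r$ as the half-width of the box, and ignores neighbors outside the image. The name `alpha_pool` and the parameter `stride` (the $d$ of the text) are illustrative.

```python
def alpha_pool(band, r, alpha, stride=1):
    """Pool one band O_i with a (2r+1) x (2r+1) box at every stride-th pixel."""
    rows, cols = len(band), len(band[0])
    pooled = []
    for y in range(0, rows, stride):
        out_row = []
        for x in range(0, cols, stride):
            total = 0.0
            # Sum O_i(q)^alpha over the adjacent pixels q inside the image.
            for qy in range(max(0, y - r), min(rows, y + r + 1)):
                for qx in range(max(0, x - r), min(cols, x + r + 1)):
                    total += band[qy][qx] ** alpha
            out_row.append(total ** (1.0 / alpha))
        pooled.append(out_row)
    return pooled

band = [[0, 1, 0, 0],
        [0, 0, 0, 2],
        [0, 0, 0, 0],
        [3, 0, 0, 0]]

additive = alpha_pool(band, r=1, alpha=1)              # sums in each box
near_max = alpha_pool(band, r=1, alpha=50)             # close to max-pooling
strided = alpha_pool(band, r=1, alpha=1, stride=2)     # reduced domain
```

At pixel (1, 2) the box contains the activations 1 and 2, so additive pooling gives 3 and the large-$\alpha$ version gives a value very close to 2. With `stride=2` the output has half as many rows and columns.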


For a given box adjacency set $A_r(p) = \{q_1, q_2, \dots, q_K\}$ of size $r \ge 1$ around pixel $p$, the normalization enhances the isolated (relevant) activations to the detriment of the others. It takes the pooled image $\hat{P}$ as input and outputs image $\hat{Q} = (D_Q, \mathbf{Q})$, $D_Q \subset D_P$, $\mathbf{Q} = (Q_1, Q_2, \dots, Q_N)$, with

$$
Q_i(p) = \frac{P_i(p)}{\sqrt{\sum_{j=1}^{N} \sum_{k=1}^{K} P_j(q_k) P_j(q_k)}}.
$$

The process continues by having $\hat{Q}$ as input of the next layer.
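The following sketch illustrates how this divisive normalization favors isolated activations. It reads the formula as dividing each pooled value by the square root of the energy summed over all $N$ bands and all $K$ adjacent pixels, which is an interpretation of the slide. Keeping the border pixels (by clipping the box) and returning 0 where the energy vanishes are additional assumptions, and `normalize` is a hypothetical name.

```python
import math

def normalize(bands, r, eps=1e-12):
    """Divide each value by the root of the local energy over all bands."""
    rows, cols = len(bands[0]), len(bands[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in bands]
    for y in range(rows):
        for x in range(cols):
            energy = 0.0
            for band in bands:
                for qy in range(max(0, y - r), min(rows, y + r + 1)):
                    for qx in range(max(0, x - r), min(cols, x + r + 1)):
                        energy += band[qy][qx] ** 2
            norm = math.sqrt(energy)
            for i, band in enumerate(bands):
                out[i][y][x] = band[y][x] / norm if norm > eps else 0.0
    return out

isolated = [[[0, 0, 0], [0, 4, 0], [0, 0, 0]]]   # one band, single activation
uniform = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]    # one band, flat response

q_isolated = normalize(isolated, r=1)[0][1][1]
q_uniform = normalize(uniform, r=1)[0][1][1]
```

The isolated activation is mapped to 1, whereas the center of the flat response is reduced to 1/3, because its neighbors contribute to the denominator.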


Shape: Multiscale fractal dimension

The fractal dimension of a 2D point set $S$ (contour, skeleton), according to Minkowski-Bouligand, is a number $F \in [0, 2]$,

$$
F = 2 - \lim_{r \to 0} \frac{\ln(A(r))}{\ln(r)},
$$

where $A(r)$ is the number of propagated points (area) when $S$ is dilated by a disk of radius $r$.

The fractal dimension represents the self-similarity of $S$ when $r$ tends to zero.

Note that $A$ can be obtained from the cumulative histogram of the Euclidean distance map of $S$ up to the distance $r$.
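The relation between $A(r)$ and the distance map can be sketched as follows. The distance map is computed by brute force, which is only reasonable for tiny examples, and the points of $S$ themselves are counted in $A(r)$; both choices, as well as the name `area_curve`, are assumptions of this illustration.

```python
import math

def area_curve(points, rows, cols, r_max):
    """Return [A(1), ..., A(r_max)] for a point set S on a rows x cols grid."""
    hist = [0] * (r_max + 1)
    for y in range(rows):
        for x in range(cols):
            # Euclidean distance map value: distance to the closest point of S.
            d = min(math.hypot(y - py, x - px) for (py, px) in points)
            k = math.ceil(d)      # smallest integer radius that reaches (y, x)
            if k <= r_max:
                hist[k] += 1
    # Cumulative histogram: A(r) counts every pixel with distance <= r.
    areas, total = [], 0
    for count in hist:
        total += count
        areas.append(total)
    return areas[1:]

# S is a horizontal line crossing a 21 x 30 grid at row 10.
line = [(10, x) for x in range(30)]
areas = area_curve(line, rows=21, cols=30, r_max=3)
```

For this line, a dilation of radius $r$ covers $2r + 1$ rows of 30 pixels each, so `areas` is `[90, 150, 210]`.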


It is known, for instance, that the fractal dimension of a Koch star (on the left) is $F \approx 1.26$ [3].

[Figure: Koch star (left); plot of Log(A) versus Log(r) with the observed values and the fitted straight line (right).]

One can then fit a line to the logarithmic curve of the cumulative histogram of the EDT (on the right) and use the slope $m$ of the line to estimate $F = 2 - m \approx 1.23$.
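The line fitting step amounts to an ordinary least-squares regression in log-log space. The sketch below uses synthetic data that follows the power law $A(r) = 10\, r^{0.74}$ exactly, so the recovered slope is 0.74 and the estimate is $F = 1.26$; real curves obtained from a distance map would only approximate a line. Function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope m and intercept c of y = m * x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fractal_dimension(radii, areas):
    """Estimate F = 2 - m from the log-log curve of A(r)."""
    m, _ = fit_line([math.log(r) for r in radii],
                    [math.log(a) for a in areas])
    return 2.0 - m

# Synthetic power-law data (an assumption, not measured from a real shape).
radii = list(range(1, 65))
areas = [10.0 * r ** 0.74 for r in radii]
F = fractal_dimension(radii, areas)
```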


By fitting a polynomial curve (on the left) and computing its first derivative, the resulting curve $F$ (on the right) is a feature vector of the shape, called multiscale fractal dimension.

[Figure: plot of Log(A) versus Log(r) with the observed values and the polynomial regression (left); plot of the fractal dimension (F) versus Log(r) (right).]

Note that its maximum value (on the right) is $\approx 1.25$ for some value $\ln(r) \in [3.5, 4]$.
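A possible way to compute such a curve is sketched below. It assumes, by analogy with $F = 2 - m$ for the straight line, that the multiscale curve is $F(x) = 2 - p'(x)$, where $p$ is the fitted polynomial and $x = \ln(r)$. The polynomial is fitted through the normal equations, which is adequate for low degrees; the function names, the default degree, and the synthetic data are assumptions.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1] x + ... via normal equations."""
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] of the normal equations.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):                      # Gauss-Jordan elimination
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def multiscale_fractal_dimension(log_r, log_A, degree=3):
    """Return F(x) = 2 - p'(x) sampled at every x in log_r."""
    c = polyfit(log_r, log_A, degree)
    return [2.0 - sum(k * c[k] * x ** (k - 1) for k in range(1, len(c)))
            for x in log_r]

# Synthetic curve log A = 8 + 0.9 x - 0.02 x^2, so F(x) = 1.1 + 0.04 x.
log_r = [1.0 + 0.5 * i for i in range(10)]
log_A = [8.0 + 0.9 * x - 0.02 * x * x for x in log_r]
F = multiscale_fractal_dimension(log_r, log_A, degree=2)
```

The list `F` is the feature vector: one fractal dimension value per scale.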


Shape: Point/segment saliences

The dilation of a contour $S$ by a disk of small radius $r$ (e.g., $r = 10$) shows that the outside area $A_{out}(p)$ of the influence zone of a point $p \in S$ is higher than its inside area $A_{in}(p)$, when $p$ is convex, and the other way around when it is concave [2].

[Figure: influence zones of two contour points, A and B.]

Such areas come from the root histogram of the EDT, which can be normalized for the purpose of shape description.
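The influence areas can be illustrated on a filled square. In the sketch below, every pixel within distance $r$ of the contour is assigned to its closest contour point (its root), and the counts are split into pixels inside and outside the object. The brute-force search, the tie-breaking by list order, and the names are assumptions; an actual implementation would obtain the roots from the EDT.

```python
def influence_areas(contour, inside, rows, cols, r):
    """Count, per contour point, the pixels of its influence zone.

    contour: list of (y, x) contour pixels; inside: set of object pixels.
    Returns two dicts, A_out and A_in, keyed by contour point.
    """
    on_contour = set(contour)
    a_out = {p: 0 for p in contour}
    a_in = {p: 0 for p in contour}
    for y in range(rows):
        for x in range(cols):
            if (y, x) in on_contour:
                continue
            # Root of the pixel: its closest contour point.
            root = min(contour,
                       key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
            if (root[0] - y) ** 2 + (root[1] - x) ** 2 > r * r:
                continue
            if (y, x) in inside:
                a_in[root] += 1
            else:
                a_out[root] += 1
    return a_out, a_in

# Hypothetical test shape: a filled 10 x 10 square on a 20 x 20 grid.
square = {(y, x) for y in range(5, 15) for x in range(5, 15)}
contour = [p for p in sorted(square) if p[0] in (5, 14) or p[1] in (5, 14)]
a_out, a_in = influence_areas(contour, square, rows=20, cols=20, r=3)
```

At the convex corner (5, 5) the outside area exceeds the inside area, while at the middle of a straight side, such as (5, 10), both areas are equal.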


By considering only the salience points $p \in S$, a point-salience feature vector can encode the respective positive (convex) and negative (concave) areas.

[Figure: plot of influence area versus contour point, with the salience points labeled A to K.]


A similar idea works for a segment-salience feature vector, obtained by replacing points with contour segments around the salience points.

Note that, in any case, the distance function must account for possible shape rotations.
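One simple way to account for rotations, shown here only as an illustration, is to compare one feature vector with every cyclic shift of the other and keep the smallest distance. This assumes that both vectors have the same length and are sampled along the closed contour, so that a rotation of the shape (or a different starting point) acts as a cyclic shift. The matching used in [2] may differ from this sketch.

```python
def rotation_invariant_distance(u, v):
    """Smallest L1 distance between u and every cyclic shift of v."""
    n = len(u)
    if n != len(v):
        raise ValueError("feature vectors must have the same length")
    return min(sum(abs(u[i] - v[(i + shift) % n]) for i in range(n))
               for shift in range(n))

u = [0.2, -0.1, 0.0, 0.3]
v = [0.0, 0.3, 0.2, -0.1]          # same saliences, different starting point
w = [0.2, -0.1, 0.0, 0.1]

d_uv = rotation_invariant_distance(u, v)
d_uw = rotation_invariant_distance(u, w)
```

Here `d_uv` is zero because `v` is a cyclic shift of `u`, while `d_uw` is positive.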


Shape: Tensor scale

The tensor scale of each pixel contains orientation $\theta(p)$ and anisotropy $\alpha(p)$ values. One can divide the contour $S$ into segments and compute a weighted angular mean $\theta$ for each segment, using $\alpha(p)$ as weight and points $p$ of its internal influence zone $R$ [1].

$$
\theta = \arctan\left( \frac{\sum_{\forall p \in R} \alpha(p) \sin(2\theta(p))}{\sum_{\forall p \in R} \alpha(p) \cos(2\theta(p))} \right)
$$

The tensor-scale feature vector is composed of these weighted angular mean values.
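The weighted angular mean can be sketched directly from the formula. Doubling the angles makes orientations that differ by $\pi$ (the same line direction) reinforce each other instead of cancelling. The sketch uses `math.atan2` instead of a plain arctangent of the ratio so that the quadrant is resolved and a zero denominator is handled; that substitution and the function name are assumptions. The value returned is the angle given by the formula; whether it is later halved to map it back to an orientation is not stated in the text.

```python
import math

def weighted_angular_mean(thetas, alphas):
    """Angle of the anisotropy-weighted sum of the doubled orientations."""
    s = sum(a * math.sin(2.0 * t) for t, a in zip(thetas, alphas))
    c = sum(a * math.cos(2.0 * t) for t, a in zip(thetas, alphas))
    return math.atan2(s, c)

# All pixels share the orientation 0.3 rad: the result is 2 * 0.3.
same = weighted_angular_mean([0.3, 0.3, 0.3], [0.5, 1.0, 0.2])

# Orientations 0 and pi describe the same direction: the result is about 0.
opposite = weighted_angular_mean([0.0, math.pi], [1.0, 1.0])
```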


Shape Descriptor based on tensor scale

The distance function must consider possible rotations, which requires shape matching for distance computation.


Combining descriptors by genetic programming

Let $\{D_1, D_2, \dots, D_k\}$ be a set of descriptors such that $D_i = (v_i, d_i)$, $i = 1, 2, \dots, k$, consists of

an algorithm $v_i$ that extracts feature vectors $v_i(s)$ and $v_i(t)$ from samples $s$ and $t$, and

a distance function $d_i$ that assigns a dissimilarity value $d_i(s, t)$ in the feature space between samples $s$ and $t$ (e.g., $\|v_i(t) - v_i(s)\|$).

The distance functions $d_i$, $i = 1, 2, \dots, k$, can be combined into a single distance function $d$ using, for instance, Genetic Programming (GP) [4].

The resulting descriptor $D$ is called a composite descriptor.
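The pair $(v_i, d_i)$ and the combination of distances map naturally onto a small class and a closure. The sketch below is only illustrative: the class `Descriptor`, the toy extractors, and the use of a plain sum as the combiner are assumptions; a GP search would look for a better combining expression.

```python
import math

class Descriptor:
    """A pair D_i = (v_i, d_i): feature extractor plus distance function."""

    def __init__(self, extract, distance):
        self.extract = extract      # v_i: sample -> feature vector
        self.distance = distance    # d_i: compares two feature vectors

    def __call__(self, s, t):
        """Dissimilarity d_i(s, t) between samples s and t."""
        return self.distance(self.extract(s), self.extract(t))

def euclidean(x, y):
    """||y - x|| for two feature vectors of equal length."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))

def composite(descriptors, combine):
    """Build d(s, t) = combine([d_1(s, t), ..., d_k(s, t)])."""
    def d(s, t):
        return combine([D(s, t) for D in descriptors])
    return d

# Hypothetical toy descriptors on samples that are plain lists of numbers.
D1 = Descriptor(lambda s: [sum(s) / len(s)], euclidean)   # mean value
D2 = Descriptor(lambda s: [min(s), max(s)], euclidean)    # value range
d = composite([D1, D2], combine=sum)
```

For example, `d([1, 2, 3], [2, 3, 4])` adds the distance between the means (1) to the distance between the ranges ($\sqrt{2}$).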


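Illustration of a composite descriptor $D^*$ with the GP combiner $C$ (one may use any other optimization technique).

[Figure: (a) a descriptor $D_i$ applies $v_i$ to samples $s$ and $t$ and passes $v_i(s)$ and $v_i(t)$ to $d_i$, which outputs $d_i(s,t)$; (b) the composite descriptor $D^*$ passes $d_1(s,t), d_2(s,t), \dots, d_k(s,t)$ from $D_1, D_2, \dots, D_k$ to the combiner $C$, which outputs $d^*(s,t)$.]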


GP is an Artificial Intelligence technique based on the biological principles of inheritance and evolution.

Each candidate solution is an individual of a population, represented by a data structure (e.g., tree, list, stack) whose nodes are mathematical operations, rather than by a sequence of numbers, as in genetic algorithms.

From some initial random population, the most promising individuals pass through genetic transformations (e.g., mutations) that make the population more diverse and more suitable for solving the problem.


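Example of an individual, where the distances $d_i$, $i = 1, 2, \dots, k$, are the terminal nodes of a binary tree and the remaining nodes are other mathematical operations.

[Figure: tree of an individual with operation nodes *, /, +, and sqrt, terminal nodes d1, d2, and d3, and output d.]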
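Such an individual can be stored as a nested tuple and evaluated recursively. The tree below is hypothetical: it uses the same operations and terminals that appear in the figure, but its exact shape is an assumption. The protected division (returning 1 when the divisor is zero) and the square root of the absolute value are common GP conventions adopted here, not something stated in the text.

```python
import math

OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,   # protected division
    "sqrt": lambda a: math.sqrt(abs(a)),          # protected square root
}

def evaluate(node, distances):
    """Evaluate a tree whose leaves are names of distances d_i(s, t)."""
    if isinstance(node, str):                     # terminal node
        return distances[node]
    op, *children = node                          # operation node
    return OPS[op](*(evaluate(child, distances) for child in children))

# Hypothetical individual: d = sqrt(d1 * d2) + d3 / d1.
individual = ("+", ("sqrt", ("*", "d1", "d2")), ("/", "d3", "d1"))
d = evaluate(individual, {"d1": 4.0, "d2": 9.0, "d3": 2.0})
```

With the values above, the combined distance is $\sqrt{36} + 2/4 = 6.5$.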


In addition to the mathematical operations (see examples in [5]),

the reproduction selects the most effective individuals for the next population,

the crossover exchanges subtrees between selected individuals to increase diversity, generating new trees (offspring), and

the mutation replaces a subtree of a selected individual with another randomly chosen subtree.

The individuals are assessed by a fitness function, which can be the classification accuracy on a validation set.


The algorithm can be sketched as follows.

1. Generate an initial random population (first generation).

2. For each generation, up to a given maximum number, do:

3.    Evaluate each individual by the fitness function.

4.    Select a number of the most effective ones.

5.    Generate the next population by reproduction, crossover, and mutation of the selected individuals.

6. Select the best individual as the final solution.
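The six steps can be turned into a compact, runnable sketch. Everything specific in it is an assumption made for illustration: the operation set, the toy validation data, the fitness (the fraction of same-class/different-class pairs ranked correctly, standing in for classification accuracy), the operator probabilities, the size limit on trees, and the seeding of the random population with the single-distance trees so that the result is never worse than the best individual descriptor on the validation data.

```python
import random

LEAVES = ["d1", "d2", "d3"]
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "max": max, "min": min}

# Hypothetical validation data: component distances of sample pairs.
SAME = [{"d1": 0.1, "d2": 0.9, "d3": 0.2}, {"d1": 0.3, "d2": 0.1, "d3": 0.1}]
DIFF = [{"d1": 0.2, "d2": 0.8, "d3": 0.9}, {"d1": 0.9, "d2": 0.2, "d3": 0.7}]

def random_tree(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(sorted(OPS)),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, d):
    if isinstance(tree, str):
        return d[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, d), evaluate(right, d))

def fitness(tree):
    """Fraction of (same, different) pairs where the different pair is farther."""
    wins = sum(evaluate(tree, d) > evaluate(tree, s) for s in SAME for d in DIFF)
    return wins / (len(SAME) * len(DIFF))

def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if not isinstance(tree, str):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Exchange one randomly chosen subtree between a and b."""
    pa, pb = rng.choice(list(paths(a))), rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

def mutate(rng, tree):
    """Replace one randomly chosen subtree by a new random subtree."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def limit(child, parent, max_nodes=31):
    """Keep the parent when the child grows beyond max_nodes nodes."""
    return child if sum(1 for _ in paths(child)) <= max_nodes else parent

def evolve(fitness, generations=20, size=30, keep=10, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population (random trees plus the single distances).
    population = LEAVES + [random_tree(rng, 3) for _ in range(size - len(LEAVES))]
    for _ in range(generations):                                  # step 2
        ranked = sorted(population, key=fitness, reverse=True)    # step 3
        selected = ranked[:keep]                                  # step 4
        nxt = list(selected)                                      # reproduction
        while len(nxt) < size:                                    # step 5
            p1, p2 = rng.choice(selected), rng.choice(selected)
            if rng.random() < 0.7:
                c1, c2 = crossover(rng, p1, p2)
                nxt.extend([limit(c1, p1), limit(c2, p2)])
            else:
                nxt.append(limit(mutate(rng, p1), p1))
        population = nxt[:size]
    return max(population, key=fitness)                           # step 6

best = evolve(fitness)
```

Because the selected individuals are copied unchanged into the next population, the best fitness never decreases from one generation to the next.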


[1] F.A. Andaló, P.A.V. Miranda, R. da S. Torres, and A.X. Falcão. Shape feature extraction and description based on tensor scale. Pattern Recognition, 43(1):26–36, 2010.

[2] R. da S. Torres and A.X. Falcão. Contour salience descriptors for effective image retrieval and analysis. Image and Vision Computing, 25(1):3–13, 2007.

[3] R. da S. Torres, A.X. Falcão, and L. da F. Costa. A graph-based approach for multiscale shape analysis. Pattern Recognition, 37(6):1163–1174, 2004.

[4] R. da S. Torres, A.X. Falcão, M.A. Gonçalves, J.P. Papa, B. Zhang, W. Fan, and E.A. Fox. A genetic programming framework for content-based image retrieval. Pattern Recognition, 42(2):283–292, 2009. Learning Semantics from Multimedia Content.

[5] R. da S. Torres, A.X. Falcão, M.A. Gonçalves, B. Zhang, W. Fan, and E.A. Fox. A new framework to combine descriptors for content-based image retrieval. Technical Report IC-05-21, Institute of Computing, University of Campinas, September 2005.

[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[7] P.A.V. Miranda, R. da S. Torres, and A.X. Falcão. TSD: A shape descriptor based on a distribution of tensor scale local orientation. In XVIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'05), pages 139–146, Oct 2005.

[8] Renato O. Stehling, Mario A. Nascimento, and Alexandre X. Falcão. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 102–109, 2002.