
Image Representation

Global vectorial representations

• Image features have a feature vector representation that collects the numerical values of the feature descriptor. Therefore features can be regarded as points in a multidimensional feature space.

• Representations of images based on global features in vectorial form have a dimension that is determined by the number of feature properties used to describe the image patterns.

Feature histogram representations

• Histograms are a common feature vector representation that measures the frequency with which a feature appears in an image or in an image window. With histograms, feature values are quantized into a finite number of bins. Histogram-based holistic representations are widely used with color, edge, and line features.

• Being orderless, the histogram representation is invariant to viewing conditions and is also tolerant, to some extent, to partial occlusions.

• The main drawbacks of histogram representations are:

– Global representation of the image (window) content: it might be inaccurate in accounting for local properties and the spatial configuration of regions.

– Curse of dimensionality: histograms must have a high number of bins (>64) for a meaningful representation. This requires high-dimensional indexes for similarity search.

– Histogram distances: several histogram distances can be defined, but their choice is critical for a meaningful and sound comparison of feature distributions.

– Histogram binning: hard assignment of feature values to bins might result in boundary effects.

Color histogram

• The color histogram of an image describes the image color distribution. Color histograms tessellate the color space and hold the count of pixels for each color zone. For gray-level images, the gray-level histogram is built similarly.

• Main issues:

− Uniform tessellation can be inappropriate for a correct representation of color content.

− Color histogram size is typically 256 or more for meaningful representations.

− Quadratic distances are often more appropriate for color distribution matching.

− Accounting for spatial information is often fundamental for meaningful color comparison.


Image: I

Quantized colors: c1, c2, ..., cm

Distance between two pixels p1 = (x1, y1) and p2 = (x2, y2): |p1 – p2| = max(|x1 – x2|, |y1 – y2|)

Pixel set with color c: Ic = {p | I(p) = c}

Given distance: k

- Correlogram:

Dimensionality m × m × d if the number of different k is d

- Auto-correlogram:

Dimensionality m × d

• Practical consideration: use the auto-correlogram with m <= 64, d = 1, k = 1

• Correlogram is a variant of histogram that accounts for the local spatial correlation of colors. Correlogram is based on the estimation of the probability of finding a pixel of color j at a distance k from a pixel of color i in an image
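The auto-correlogram (same color at both ends of the pair) can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the image is already quantized to integer color indices in [0, m), uses the max-norm pixel distance given above, and the function name `auto_correlogram` is hypothetical.

```python
import numpy as np

def auto_correlogram(img, m, k=1):
    """Estimate, for each color c, the probability that a pixel at
    chessboard distance k from a pixel of color c also has color c.

    img: 2D array of quantized color indices in [0, m) (assumed input).
    Returns a vector of length m (dimensionality m x d with d = 1).
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    # All offsets whose max-norm distance is exactly k.
    offsets = [(dy, dx)
               for dy in range(-k, k + 1)
               for dx in range(-k, k + 1)
               if max(abs(dy), abs(dx)) == k]
    same = np.zeros(m)   # pairs where both pixels share the color
    total = np.zeros(m)  # all pairs, indexed by the color of the first pixel
    for dy, dx in offsets:
        # Region of first pixels whose shifted neighbor stays inside the image.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        a = img[y0:y1, x0:x1]
        b = img[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        total += np.bincount(a.ravel(), minlength=m)
        same += np.bincount(a[a == b], minlength=m)
    # Colors that never occur get probability 0.
    return np.divide(same, total, out=np.zeros(m), where=total > 0)
```

With m <= 64, d = 1, and k = 1, as suggested above, the descriptor has at most 64 values per image.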

Color Correlogram

• Edge histograms represent distribution of edge properties in one image

• Edge intensity histograms provide good discrimination between scenes.

Edge histograms

• Histograms of edge orientations can also be obtained to describe edge directionality. Each histogram bin accounts for the number of edges at a specific orientation.

– 8 orientations are typically sufficient.

– Interpolation of bin assignment can be necessary for meaningful representations.

• Histograms of line length and orientation provide a useful characterization of image content.

– Binning can be critical for discriminative representations.

Line histograms

[Figure: image lines with the corresponding line length histogram and line orientation histogram.]

Human localization capability

• Localization is the process of identifying landmarks of a scene.

• We as humans can distinguish:

– Indoors: strong assumptions of flat walls, narrow hallways…

– Outdoors: less conforming set of surfaces

• For localization, we as humans can use:

– Objects

– Regions

– The scene as a whole

Human vision architecture

• In the human visual system there is evidence of place-recognizing cells in the parahippocampal place area. Context information is obtained within an eye saccade, in approximately 150 ms.

• Two basic models: Gist and Saliency

• Visual Cortex: low level filters, center-surround, and normalization

• Saliency model: attends to pertinent regions

• Gist model: computes general image characteristics

• High Level Vision:

– Object recognition

– Layout recognition

– Scene understanding

Gist versus Saliency

• Gist is the term used to signify the essence, the holistic characteristics of an image. “It is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)” [A. Oliva and A. Torralba 2001]

• Gist utilizes the same visual cortex raw features as in the Saliency model. Gist is theoretically non-redundant with Saliency

• Gist versus Saliency. Gist:

– looks at the scene as a whole instead of looking at the most conspicuous locations in the image

– detects regularities (not irregularities)

– exploits cooperation (accumulation) instead of competition (winner takes all) among locations

– has less spatial emphasis (there is more spatial emphasis in Saliency)

Gist

• With images, an approximate global representation of human gist can be obtained by partitioning the image into a 4x4 grid (3x3 for small images such as 32x32) and taking orientations at different scales and center-surround differences of color and intensity at each grid cell.

• This is equivalent to computing gradient magnitude and orientation for each grid cell, plus color and intensity gradients. It does not imply any segmentation.

GIST global representation

• V1 raw image feature-maps:

‒ Orientation channel: Gabor filters at 4 angles (0°, 45°, 90°, 135°) on 4 scales = 16 sub-channels

‒ Color channel: red-green and blue-yellow center-surround with 6 scale combinations each = 12 sub-channels

‒ Intensity channel: dark-bright center-surround with 6 scale combinations = 6 sub-channels

• Total of 34 sub-channels

Gist model implementation

Gist vector


• Gist feature extraction: average values on the predetermined grid

• Dimension Reduction

– Original: 34 sub-channels x 16 features = 544 features

– PCA/ICA reduction: 80 features keep >95% of variance

• Place Classification

– Three-layer neural network
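The feature extraction step (average values on the predetermined grid) can be sketched as below. It assumes the 34 sub-channel maps have already been computed and stacked into one array; the function name `gist_vector` and the random stand-in maps are illustrative only, and the PCA/ICA reduction and the classifier are not shown.

```python
import numpy as np

def gist_vector(feature_maps, grid=4):
    """Average each sub-channel map over a grid x grid partition.

    feature_maps: array of shape (channels, height, width), assumed to hold
    the precomputed orientation, color and intensity sub-channel maps.
    Returns a vector of length channels * grid * grid.
    """
    channels, h, w = feature_maps.shape
    ys = np.linspace(0, h, grid + 1).astype(int)  # row boundaries of cells
    xs = np.linspace(0, w, grid + 1).astype(int)  # column boundaries of cells
    feats = []
    for ch in range(channels):
        for i in range(grid):
            for j in range(grid):
                cell = feature_maps[ch, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(cell.mean())
    return np.array(feats)

# Stand-in data: 34 random maps instead of real V1-like feature maps.
maps = np.random.default_rng(0).random((34, 64, 64))
print(gist_vector(maps).shape)  # (544,)
```

The 544 values match the 34 sub-channels x 16 grid cells stated above; a dimension reduction step would then map them to about 80 features.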


The MPEG-7 standard

• MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing multimedia content. It supports some degree of interpretation of the meaning of the information, which can be passed on to, or accessed by, a device or computer code. MPEG-7 includes:

– MPEG-7 Visual: the Description Tools dealing with Visual descriptions.

– MPEG-7 Audio: the Description Tools dealing with Audio descriptions.

• The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual content by enabling interoperability among devices and applications. Ideally, MPEG-7 facilitates exchange and reuse of multimedia content across different application domains

[Figure: scope of MPEG-7 relative to MPEG-1, -2, -4 across application tasks: acquisition, authoring, editing; browsing, navigation; filtering, management; transmission, retrieval, streaming; coding, compression; searching, indexing.]

• MPEG-7 provides four types of normative description elements:

– Descriptors

– Description Schemes (DSs)

– Description Definition Language (DDL)

– System Tools (coding schemes)

• A description consists of a Description Scheme and the set of Descriptor values:

– Descriptor: a representation of a feature. A Descriptor defines the syntax and the semantics of the feature representation.

– Description Scheme: the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.

MPEG-7 description elements

MPEG-7 Descriptors

• MPEG-7 Descriptors support a range of abstraction levels, from low-level signal characteristics to high-level semantic information. The abstraction level relates to the way we extract the features: we can automatically extract most low-level features, whereas high-level features usually need human supervision and annotation.

• Only the description format is fixed, not the extraction methodologies.

[Figure: description production (extraction), standard description, and description consumption. Only the standard description is the normative part of the MPEG-7 standard.]

MPEG-7 Description Scheme

• A Description Scheme deals with the structure of the description and describes both the structure and semantics of the audio-visual content. In addition MPEG-7 Description Scheme also supports the description of other types of information about the multimedia data such as the coding scheme used, the data size, place and time of recording, classification, and links to other relevant material.


MPEG-7 Segment Description Scheme tree

Annotate the whole image with StillRegion; spatial segmentation at different levels

• Among different regions we could use Segment Relationship description tools

MPEG-7 Segment Relationship Description Scheme graph

• Video Segment Relationship description tools can be used to model video shot segments and relationships between regions within video shots

MPEG-7 Description Definition Language and System Tools

• Basic tools of MPEG-7 are:

– Description Definition Language: “A language that allows the creation of new Description Schemes and, possibly, Descriptors. It also allows the extension and modification of existing Description Schemes.”

– Systems Tools: tools to support multiplexing of descriptions, synchronization of descriptions with content, delivery mechanisms, coded representations for efficient storage and transmission, and the management and protection of intellectual property in MPEG-7 Descriptions.

• MPEG-7 descriptions take two possible forms: a textual XML form suitable for editing, searching, and filtering, and the BiM binary form suitable for storage, transmission, and streaming delivery.

To reduce the space occupation of the stored MPEG-7 descriptors, due to the verbosity of the XML format, it is possible to use the BiM (Binary Format for MPEG-7) framework. BiM enables compression of any generic XML document, reaching an average 85% compression ratio of MPEG-7 data, and allows the parsing of BiM encoded files, without requiring their decompression

• Broadcast media selection (e.g., radio channel, TV channel).

• Cultural services (history museums, art galleries, etc.).

• Digital libraries (e.g., image catalogue, musical dictionary, film, video and radio archives).

• E-Commerce (e.g., personalised advertising, on-line catalogues).

• Education (e.g., repositories of multimedia courses, multimedia search for material).

• Multimedia directory services (e.g., yellow pages, tourist information, geographical information systems).

• Remote sensing (e.g., cartography, natural resources management).

• Surveillance and investigation services (e.g., human recognition, forensics, traffic control, surface transportation).

• ……

• MPEG-7 will also make the web as searchable for multimedia content as it is searchable for text today. This would apply especially to large content archives, which are being made accessible to the public, as well as to multimedia catalogues enabling people to identify content for purchase

Application Areas of MPEG-7

MPEG-7 Visual : Visual Descriptors

• Color Descriptors

• Texture Descriptors

• Shape Descriptors

• Motion Descriptors for Video

Color Descriptors

• Constrained color spaces:

- Scalable Color Descriptor uses HSV

- Color Structure Descriptor uses HMMD

- Color Layout Descriptor uses YCbCr

[Figure: overview of the color descriptors: Dominant Color; Scalable Color (HSV space); Color Structure (HMMD space); Color Layout (YCbCr space); Group of Frames/Pictures histogram.]

Scalable Color Descriptor

• Scalable Color Descriptor (SCD) is in the form of a color histogram in the HSV color space, encoded using a Haar transform. H is quantized to 16 bins and S and V are quantized to 4 bins each. The binary representation is scalable in the number of bins used and the number of bits per bin.

• After all the pixels are processed, the histogram is calculated with the probability for each bin, truncated into an 11-bit value. These values are then non-uniformly quantized into 4-bit values according to the table provided in the ISO specification 13 for more efficient encoding, giving higher significance to small values.

[Figure: HSV color space with Hue, Saturation, and Value axes. Hue angles: red (0°), yellow (60°), green (120°), cyan (180°), blue (240°), magenta (300°); Value runs from black to white.]

Haar Wavelet Transform*

• In numerical analysis and functional analysis, the Discrete Wavelet Transform refers to wavelet transforms for which the wavelets are discretely sampled

• The first Discrete Wavelet Transform was invented by the mathematician Alfréd Haar:

– for an input represented by a list of 2^n numbers, the Haar wavelet transform may be considered to pair up input values, storing the difference and passing the sum.

– This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in 2^n − 1 differences and 1 final sum.

• The Haar wavelet can be described as a step function F(x). In the discrete domain the transform is defined by a 2x2 matrix H:

\[
F(x) = \begin{cases} +1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
H = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\]

• Given a sequence (a_0, a_1, a_2, a_3, …, a_{2n+1}) of even length, this can be transformed into a sequence of two-component vectors (a_0, a_1), …, (a_{2n}, a_{2n+1}).

• If one multiplies each vector by the matrix H, one gets the result (s_0, d_0), …, (s_n, d_n) of one stage of the Haar wavelet transform (sum, difference).

• The two sequences s and d are separated and the process is repeated with the sequence s = (s_0, s_1, s_2, s_3, …, s_n).

• The discrete wavelet transform has nice properties:

– It can be performed in O(n) operations

– It captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e. the times at which these frequencies occur
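The pair-up, store-the-difference, pass-the-sum recursion can be sketched in a few lines. This is an unnormalized version that follows the matrix H above (sum = a + b, difference = a − b); the function name `haar_transform` is illustrative.

```python
def haar_transform(values):
    """Unnormalized Haar transform of a list whose length is a power of two.

    Returns (final_sum, details), where details is a list of difference
    lists ordered from the finest scale to the coarsest.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sums = list(values)
    details = []
    while len(sums) > 1:
        pairs = list(zip(sums[0::2], sums[1::2]))
        details.append([a - b for a, b in pairs])  # store the differences
        sums = [a + b for a, b in pairs]           # pass the sums on
    return sums[0], details

total, diffs = haar_transform([1, 2, 3, 4])
print(total, diffs)  # 10 [[-1, -1], [-4]]
```

For 2^2 = 4 inputs the result holds 3 differences and 1 final sum, and each input value is touched a constant number of times per level, which gives the O(n) cost.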

• With SCD summing pairs of adjacent histogram lines is equivalent to the calculation of a histogram with half number of bins. If this is performed iteratively starting with the H axis, S, V, and hence H….

• Usage of subsets of the coefficients in the Haar representation is equivalent to histograms of 128, 64, 32 bins, calculated from the source histogram

SCD computation

[Figure: SCD computation. The 256-bin histogram (H 16, S 4, V 4) is laid out as consecutive 16-bin H groups, one per (S, V) pair: S=0 V=0, S=1 V=0, S=2 V=0, S=3 V=0, followed by the groups for S=0-3 with V=1, V=2, and V=3. Successive summing reduces it from 256 to 128, 64, 32, and 16 bins.]

• The result of applying the Haar transform is a set of 16 low-pass coefficients and up to 240 high-pass coefficients. The high-pass (difference) coefficients of the Haar transform express the information contained in finer-resolution levels of the histogram.

• Natural image signals usually exhibit high redundancy between adjacent histogram lines. This can be explained by the slight variation of colors caused by variable illumination and shadowing effects.

• Hence, it can be expected that the high-pass coefficients expressing differences between adjacent histogram bins usually have only small values. Exploiting this property, it is possible to truncate the high-pass coefficients to integer representation with a low number of bits

Bin scaling

No. coeff   # bins: H   # bins: S   # bins: V
256         16          4           4
128         8           4           4
64          8           2           4
32          8           2           2
16          4           2           2

• SCD representations can be stored in different resolutions, ranging from 256 down to 16 coefficients per histogram.

• Table shows the relationship between number of Haar coefficients as specified in the SCD and partitions in the components of a corresponding HSV histogram that could be reconstructed from the coefficients

• The high-pass (difference) coefficients in the Haar transform can take either positive or negative values. The sign part is always retained, whereas the magnitude part can be scaled by skipping the least significant bits.

• Using the sign bit only (1 bit/coefficient) leads to an extremely compact representation, while good retrieval efficiency is retained.

• At the highest accuracy level, 1–8 bits are defined for integer representations of the magnitude part, depending on the relevance of the respective coefficients. In between these extremes, it is possible to scale to different resolution levels.

Bit scaling

[Figure: bit scaling, with stages labeled 11 bits/bin, 4 bits/bin, and N bits/bin (#bin < 256).]

• With SCD, reconstructing the color histogram from the Haar coefficients allows matching with the highest retrieval efficiency. Matching in the histogram domain is only useful to achieve high quality, i.e. when all coefficients are available.

• It is recommended to perform the matching directly in the Haar coefficient domain, which induces only marginal loss in the precision of the similarity matching with considerable savings in computational cost.

• For matching in the Haar coefficient domain it is recommended to use the L1 norm. The L1 norm is also recommended for matching in the histogram domain.

Matching with SCD

GoF/GoP Color Descriptor

• GoF/GoP Color Descriptor extends the Scalable Color Descriptor to a video segment or a group of pictures: a joint color histogram is computed and then processed as in SCD (Haar transform encoding).

• In this case two additional bits define how the joint histogram is calculated before applying the Haar transform. The standard allows average, median, or intersection histogram aggregation:

– Average: sensitive to outliers (lighting changes, occlusion, text overlays)

– Median: increased computational complexity due to sorting

– Intersection: a “least common” color trait viewpoint

• Applications:

– Browsing a large collection of images to find similar images

– Use histogram intersection as a color similarity measure for clustering a collection of images

– Represent each cluster by a GoP descriptor

[Figure: histogram intersection]
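The three aggregation modes can be illustrated with a small sketch. It assumes one histogram per frame with the same binning, takes the intersection as the bin-wise minimum across frames, and leaves out the subsequent Haar encoding; the function name `aggregate_histograms` is hypothetical.

```python
import numpy as np

def aggregate_histograms(histograms, method="average"):
    """Combine per-frame color histograms into one joint histogram.

    histograms: sequence of equally binned histograms, one per frame.
    method: "average", "median", or "intersection" (bin-wise minimum).
    """
    stack = np.asarray(histograms, dtype=float)
    if method == "average":
        return stack.mean(axis=0)
    if method == "median":
        return np.median(stack, axis=0)
    if method == "intersection":
        return stack.min(axis=0)
    raise ValueError(f"unknown aggregation method: {method}")

frames = [[4, 0, 2], [2, 2, 2], [0, 4, 2]]
print(aggregate_histograms(frames, "intersection"))  # [0. 0. 2.]
```

In this toy example only the third bin is present in every frame, so it is the only one that survives the intersection, which is the "least common" color trait mentioned above.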

Dominant Color Descriptor

• Dominant Color Descriptor (DCD) assumes that a given image is described in terms of a set of region labels and the associated color descriptors:

– Each pixel has a unique region label

– Each region is characterized by a color histogram

• Colors in a given region are clustered into a small number of representative colors. For each representative color the descriptor consists of:

– ci : representative color identifier

– pi : its percentage in the region

– vi : its color variance in the region

– s : the overall spatial coherency of the dominant colors in the region

• DCD variance is computed as the variance of each of the dominant colors (h are perceptual weights):

• Spatial coherency for each dominant color captures how coherent the pixels corresponding to the dominant color are and whether they appear to be a solid color in the given image region.

DCD computation

• Spatial coherency per dominant color is computed by the normalized average connectivity (8-connectedness: pixels with coordinates are counted if connected to the corresponding dominant color pixel )

• DCD spatial coherency gives an idea of the spatial homogeneity of the dominant colors of a region. It is computed as a single value by the weighted sum of per-dominant color spatial coherencies. The weight is proportional to the number of pixels corresponding to each dominant color.


• DCD is suitable for local (object or region) features, when a small number of colors is enough to characterize the color information. Before feature extraction, images must be segmented into regions:

− a maximum of 8 dominant colors can be used to represent the region (3 bits)

− percentage values are quantized to 5 bits each

− variance: 3 bits per dominant color

− spatial coherency: 5 bits

• The color quantization depends on the color space specifications defined for the entire database and need not be specified with each descriptor. The Luv uniform color space is recommended.

• Dominant color representation is sufficiently accurate and more compact than the traditional color histogram:

- color bins are quantized from each image region instead of being fixed

- 3 bins on average instead of 256 or more

Matching with DCD

a_{k,l} : similarity coefficient between two colors c_k and c_l

d_{k,l} : Euclidean distance between two colors c_k and c_l

\[ d_{k,l} = \lVert c_k - c_l \rVert \]

T_d : maximum distance for two colors to be considered similar

d_max = α T_d, with α values 1.0 - 1.5 and T_d values 10 - 20 in the Luv color space

• It supports efficient database indexing and search. Typically when using DCD image similarity is evaluated simply comparing the corresponding dominant color percentages and dominant color similarity (color distances):
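A sketch of such a comparison is given below. The formula itself is not shown in the text here, so the code assumes the commonly cited quadratic form D² = Σ p1_i² + Σ p2_j² − Σ Σ 2 a_{i,j} p1_i p2_j, with a_{k,l} = 1 − d_{k,l}/d_max when d_{k,l} ≤ T_d and 0 otherwise. The function name and the default values for T_d and α are illustrative picks within the ranges quoted above.

```python
import numpy as np

def dcd_distance_sq(colors1, perc1, colors2, perc2, td=15.0, alpha=1.2):
    """Squared dissimilarity between two dominant-color descriptors.

    colors*: (n, 3) arrays of dominant colors (assumed to be Luv values).
    perc*:   length-n arrays of percentages (fractions summing to 1).
    td, alpha: assumed defaults; d_max = alpha * td.
    """
    c1 = np.asarray(colors1, dtype=float)
    c2 = np.asarray(colors2, dtype=float)
    p1 = np.asarray(perc1, dtype=float)
    p2 = np.asarray(perc2, dtype=float)
    # Pairwise Euclidean distances d_{k,l} between the two color sets.
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    d_max = alpha * td
    # Similarity coefficients a_{k,l}: zero for colors farther apart than td.
    a = np.where(d <= td, 1.0 - d / d_max, 0.0)
    cross = np.sum(a * np.outer(p1, p2))
    return float(np.sum(p1 ** 2) + np.sum(p2 ** 2) - 2.0 * cross)
```

Two identical descriptors with well-separated colors give a value of 0, and descriptors with no similar colors give the sum of the squared percentages.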

Color Structure Descriptor

• Similar to a histogram, the Color Structure Descriptor (CSD) represents an image by both the color distribution and the local structure. Two images with the same color distribution but a different spatial arrangement of colors may be indistinguishable for the Scalable Color Descriptor, whereas the Color Structure Descriptor can tell them apart.

• CSD is obtained by scanning the image by an 8x8 structure element in a sliding window approach: with each shift of the structuring element, the number of times a particular color is contained in the structure element is counted, and a color histogram is constructed. The HMMD color space is used.

HMMD Color space*

• The HMMD color space regards the colors adjacent to a given color in the color space as the neighboring colors. It is closely related to HSV:

‒ Hue is the same as in the HSV space (0-360°)

‒ Max and Min are the maximum and minimum among the R, G, and B values, i.e. how much black and how much white are present, respectively

‒ Diff is the difference between Max and Min, i.e. how close a color is to a pure color

‒ Sum = (Max + Min) / 2 can also be defined, i.e. how bright the color is

• Only three of the four components are sufficient to describe the HMMD space (H, Max, Min) or (H, Diff, Sum). HMMD color space can be depicted using the double cone structure
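The component definitions above translate directly into code. The sketch below assumes 8-bit RGB input and borrows the hue from Python's standard `colorsys` module, since Hue is stated to be the same as in HSV; the function name `rgb_to_hmmd` is illustrative.

```python
import colorsys

def rgb_to_hmmd(r, g, b):
    """Convert 8-bit RGB values to (hue_degrees, max, min, diff, sum)."""
    mx, mn = max(r, g, b), min(r, g, b)
    # Hue is shared with HSV; colorsys returns it as a fraction of a turn.
    hue = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[0] * 360.0
    diff = mx - mn            # closeness to a pure color
    total = (mx + mn) / 2.0   # brightness
    return hue, mx, mn, diff, total

print(rgb_to_hmmd(255, 0, 0))  # (0.0, 255, 0, 255, 127.5)
```

Pure red has the largest possible Diff, while any gray has Diff = 0 and falls in the lowest Diff subinterval.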

• HMMD allows a color quantization that is close to the color changes sensed by the human eye, and can thereby improve the performance of content-based image search.

HMMD subspace quantization

Example: 128 bins (cells) of the HMMD color space

[Figure: 128-cell quantization of the HMMD color space, from black to white.]

Four nonuniform quantizations are defined that partition the space into 256, 128, 64, and 32 cells.

Each quantization is defined via five subspaces. The Diff axis is divided into 5 subintervals: [0, 6), [6, 20), [20, 60), [60, 110), [110, 255]. In each subspace, Sum and Hue are allowed to take all values in their ranges and are partitioned into uniform intervals according to a table. For the 128-cell example (16 + 16 + 32 + 32 + 32 = 128 cells):

Subspace   Hue   Sum
0          1     16
1          4     4
2          8     4
3          8     4
4          8     4

• The color structure histogram allows for m quantized colors c_m, where m ∈ {256, 128, 64, 32}.

• The bin value h(m) is the number of structuring elements containing one or more pixels with color c_m:

– consider the set of quantized color indexes of an image and the set of quantized color indexes existing inside the subimage region covered by the structuring element

– as the structuring element scans the image, the color histogram bins are accumulated

– the final value of h(m) is determined by the number of positions at which the structuring element contains color c_m
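The accumulation rule can be sketched as follows. This is a simplified illustration that assumes an already quantized image of color indices and slides the structuring element one pixel at a time; any subsampling for large images and the final bin quantization are left out, and the function name is hypothetical.

```python
import numpy as np

def color_structure_histogram(img, m, size=8):
    """Count, for each color, the window positions that contain it.

    img:  2D array of quantized color indices in [0, m) (assumed input).
    size: side of the square structuring element (8 in the text).
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    hist = np.zeros(m, dtype=int)
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            # Each color present in the window adds 1, however many
            # pixels of that color the window contains.
            present = np.unique(img[y:y + size, x:x + size])
            hist[present] += 1
    return hist

# 3x3 image with a single pixel of color 1 in the center, 2x2 element.
img = np.zeros((3, 3), dtype=int)
img[1, 1] = 1
print(color_structure_histogram(img, m=2, size=2))  # [4 4]
```

An ordinary histogram of the same toy image would be [8, 1]; the structure histogram instead records that both colors appear in all four window positions.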

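[Figure: an 8 x 8 structuring element placed over the image; the color bins C1, C3, and C7 found under it are each incremented by 1, while C0, C2, C4, C5, and C6 are unchanged.]

CSD computation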

• Given two images with CSD representations, matching is performed by computing the L1 distance between their CSDs:

\[ \mathrm{dist}(A, B) = \sum_i \lvert h_A(i) - h_B(i) \rvert \]

Matching with CSD

• Color Layout Descriptor (CLD) is a very compact descriptor (63 bits per image) based on:

– Grid-based dominant color in the YCbCr color space (the dominant color may also be the average color)

– DCT (Discrete Cosine Transform) on a 2D array of dominant colors

– Final quantization to 63 bits

Color Layout Descriptor

F ={CoefPattern, Y-DC_coef, Cb-DC_coef, Cr-DC_coef, Y-AC_coef, Cb-AC_coef, Cr-AC_coef}

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.169*R - 0.331*G + 0.500*B

Cr = 0.500*R - 0.419*G - 0.081*B

DCT applies to 8x8 image blocks

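For each block, DCT allows a shift from the spatial domain to the frequency domain:

\[ F(u,v) = \frac{1}{4} C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}},\ C(k) = 1 \text{ for } k > 0 \]

• f(i,j) is the value that is present in the (i,j) position of the 8x8 block of the original image

• F(u,v) is the DCT coefficient of the 8x8 block in the (u,v) position of the 8x8 matrix that encodes the transformed coefficients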

DCT (Discrete Cosine Transformation)*

[Figure: the 64 (8 x 8) DCT basis functions, with F[0,0] marked.]

CLD computation

• The image is partitioned into 64 (8x8) blocks

• A single representative color is selected from each block (the average of the pixel colors in a block is suggested as the representative color). The selection results in an 8x8 image

• The derived average colors are transformed into a series of coefficients by performing DCT

• A few low-frequency coefficients are selected using zigzag scanning and quantized to form a CLD (large quantization step for AC coefficients, small quantization step for DC coefficients).
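The steps above can be sketched for a single color channel as follows. This is a simplified illustration: it assumes the channel (Y, Cb, or Cr) has already been extracted, uses the block average as representative color, implements the 8x8 DCT with an orthonormal DCT-II matrix, and omits the final quantization. The function names and the hard-coded start of the zigzag order are illustrative.

```python
import numpy as np

# First entries of the zigzag scan over an 8x8 matrix, as (row, column).
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def cld_coefficients(channel, n_coeffs=6):
    """Unquantized CLD coefficients of one channel (at least 8x8 pixels)."""
    channel = np.asarray(channel, dtype=float)
    h, w = channel.shape
    ys = np.linspace(0, h, 9).astype(int)
    xs = np.linspace(0, w, 9).astype(int)
    # Steps 1-2: partition into 8x8 blocks and keep each block's average.
    tiny = np.array([[channel[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                      for j in range(8)] for i in range(8)])
    # Step 3: 2D DCT of the 8x8 image of representative colors.
    c = dct_matrix(8)
    coeffs = c @ tiny @ c.T
    # Step 4: keep a few low-frequency coefficients in zigzag order.
    return np.array([coeffs[u, v] for u, v in ZIGZAG[:n_coeffs]])
```

For a constant channel only the first (DC) coefficient is non-zero, which is why a handful of low-frequency coefficients is enough for smooth color layouts.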

If the time-domain (or spatial-domain) data is smooth, with little variation, then in the frequency domain the low-frequency coefficients are large and the high-frequency coefficients are small.


• CLD is efficient for:

– Sketch-based image retrieval

– Content filtering using image indexing

• The distance between two Color Layout Descriptors CLD and CLD’ with 12 coefficients (6 Y, 3 Cb, 3 Cr), CLD = {Y0, ..., Y5, Cr0, Cr1, Cr2, Cb0, Cb1, Cb2}, is defined as follows:
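The formula is not reproduced in the text here. The sketch below assumes the commonly cited form, a sum over the three channels of the square root of the weighted squared coefficient differences; the weights are an assumption and default to 1, and the dictionary layout and function name are illustrative.

```python
import math

def cld_distance(a, b, weights=None):
    """Distance between two CLDs under the assumed weighted form.

    a, b: dicts with keys "Y" (6 values), "Cb" (3 values), "Cr" (3 values).
    weights: optional dict with the same keys and per-coefficient weights;
             all weights default to 1.0 (an assumption, not a standard value).
    """
    total = 0.0
    for key in ("Y", "Cb", "Cr"):
        w = weights[key] if weights else [1.0] * len(a[key])
        total += math.sqrt(sum(wi * (x - y) ** 2
                               for wi, x, y in zip(w, a[key], b[key])))
    return total
```

Larger weights on the first coefficients of each channel would emphasize the coarse layout, which is one way to introduce the perceptual sensitivity mentioned for this descriptor.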

Matching with CLD

What applications

• Scalable Color descriptor is useful for image-to-image matching and retrieval based on color feature. Retrieval accuracy increases with the number of bits used in the representation.

• Dominant Color(s) descriptor is most suitable for representing local (object or image region) features where a small number of colors are enough to characterize the color information. A spatial coherency on the entire descriptor is also defined, and used in similarity retrieval.

• Color structure descriptor is suited to image-to-image matching and its intended use is for still natural-image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions.

• Color Layout descriptor allows image-to-image matching at very small computational cost and ultra-high-speed sequence-to-sequence matching, also at different resolutions. It is feasible to apply it to mobile terminal applications where the available resources are strictly limited. Users can easily introduce the perceptual sensitivity of the human vision system into the similarity calculation.

Texture Descriptors

• Homogeneous Texture Descriptor

• Non-Homogeneous Texture Descriptor (Edge Histogram)

Homogeneous Texture Descriptor

• Homogeneous Texture Descriptor (HTD) is composed of 62 numbers:

– #1, 2: respectively the mean and the standard deviation of the image

– #3-62: the energy (e) and the energy deviation (d) of the 30 Gabor-filtered responses of the channels, in the subdivision layout of the frequency domain (6 orientations and 5 scales)

• This design is based on the fact that response of the visual cortex is bandlimited and brain decomposes the spectra into bands in spatial frequency (from 4 to 8 frequency bands and approx as many orientations)

F = {fDC, fSD, e1,…, e30, d1,…, d30}

• The Gabor filter assumes that the function is first multiplied by a Gaussian function (as a window) and the resulting function is then subjected to a Fourier transform to derive the time-frequency analysis (fixed Gaussian and variable frequency of the modulating wave).

• The characteristic of optimal joint resolution in both space and frequency suggests that these filters are appropriate operators for tasks requiring simultaneous measurement in these domains, e.g. texture discrimination.

• In 1D the window function means that the signal near the time being analyzed will have higher weight. The Gabor transform of a signal x(t) is defined by this formula:

\[ G_x(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, e^{-\pi (t - \tau)^2}\, e^{-j \omega t}\, dt \]

Gabor filter*

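Energy in channel i is defined as:

\[ e_i = \log_{10}[1 + p_i] \]

where:

\[ p_i = \sum_{\omega=0}^{1} \sum_{\theta=0^\circ}^{360^\circ} \left[ G_{P_{s,r}}(\omega, \theta)\, P(\omega, \theta) \right]^2 \]

being P(ω,θ) the Fourier transform of an image represented in the polar frequency domain and G a Gaussian function:

\[ G_{P_{s,r}}(\omega, \theta) = \exp\left[ \frac{-(\omega - \omega_s)^2}{2 \sigma_{\omega_s}^2} \right] \exp\left[ \frac{-(\theta - \theta_r)^2}{2 \sigma_{\theta_r}^2} \right] \]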

2D-Gabor filter for HTD

• Extension to 2D is satisfied by a family of functions which can be realized as spatial filters consisting of sinusoidal plane waves within two-dimensional elliptical Gaussian envelopes.

• The corresponding Fourier transforms contain elliptical Gaussians displaced from the origin in the direction of orientation, with major and minor axes inversely proportional to those of the spatial Gaussian envelopes.

• Each channel filters a specific type of texture.

The center frequencies of the channels in the angular and radial directions are such that: θ_r = 30° × r with 0 ≤ r ≤ 5, and ω_s = ω_0 · 2^{-s} with 0 ≤ s ≤ 4, ω_0 = 3/4.
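As a quick check of the channel layout, the 30 channel centers (6 orientations x 5 scales) can be enumerated directly. The sketch assumes the angular centers are multiples of 30° and the radial centers halve at each scale starting from 3/4; the function name is illustrative.

```python
def htd_channel_centers(omega0=0.75, n_orient=6, n_scale=5):
    """List the (theta_r in degrees, omega_s) centers of the HTD channels."""
    return [(30.0 * r, omega0 * 2.0 ** (-s))
            for s in range(n_scale)
            for r in range(n_orient)]

centers = htd_channel_centers()
print(len(centers))   # 30
print(centers[0])     # (0.0, 0.75)
print(centers[-1])    # (150.0, 0.046875)
```

Each center corresponds to one pair (e_i, d_i) in the feature vector F.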

With HTD one can perform:

– Rotation-invariant matching

– Intensity-invariant matching (fDC removed from the feature vector)

– Scale-invariant matching

F = {fDC, fSD, e1,…, e30, d1,…, d30}

Matching with HTD

Texture Browsing Descriptor

• The Texture Browsing Descriptor (TBD) requires the same spatial filtering as the HTD and captures the regularity (or the lack of it) in the texture pattern. Its computation is based on the following observations:

– Structured textures usually consist of dominant periodic patterns.

– A periodic or repetitive pattern, if it exists, can be captured by the filtered images.

– The dominant scale and orientation information can be captured by analyzing projections of the filtered images.

• The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties and then use the HTD to get a precise similarity match list among the candidate images.

• The TBD descriptor is defined as follows:

TBD = [ v1, v2, v3, v4, v5 ]

– Regularity (v1): v1 represents the degree of regularity or structuredness of the texture. A larger value of v1 indicates a more regular pattern.

– Scale (coarseness) (v3, v5): These represent the two dominant scales of the texture. Similar to directionality, the more structured the texture, the more robust the computation of these two components.

– Direction (v2, v4): these values represent the two dominant orientations of the texture. The accuracy of computing these two components often depends on the level of regularity of the texture pattern. The orientation space is divided into 30° intervals.

TBD computation

[Figure: scale- and orientation-selective band-pass filters yield regularity (periodic to random), coarseness (grain to coarse), and directionality (in 30° steps).]

E.g. look for textures that are very regular and oriented at 30°.

Non-Homogeneous Texture Descriptor: Edge Histogram Descriptor

• Edge Histogram Descriptor (EHD) represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135°, and non-directional

– Dividing the image into 16 (4x4) blocks

– Generating a 5-bin histogram for each block

• EHD cannot be used for object-based image retrieval. If the edge threshold Th_edge is set to 0, EHD applies to binary edge images (sketch-based retrieval)

• EHD is scale invariant. The Extended EHD achieves better results than HTD but does not exhibit rotation invariance

• Algorithm

– Divide the image into 4x4 non-overlapping sub-images

– Generate a histogram of the edge distribution for each sub-image, using 2x2 filter masks to bin edges into vertical, horizontal, 45° diagonal, 135° diagonal, and non-directional.

EHD computation

• Edge map is obtained by using Canny edge operator.

• The basic EHD uses 5 bins for each sub-image. In total we have 80 bins. The histogram bin values are normalized by the total number of the image-blocks.

• The bin values are then non-linearly quantized to keep the size of the histogram as small as possible. With 3 bits/bin, 240 bits are needed in total per image (80 bins x 3 bits).
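A simplified sketch of the basic 80-bin computation is shown below. It makes several assumptions that are not stated in the text: image-blocks are plain 2x2 pixel blocks, the five 2x2 masks use the coefficients commonly quoted for the EHD, the threshold default of 11 is only a typical choice, and the final non-linear quantization is omitted. The function name is illustrative.

```python
import numpy as np

R2 = np.sqrt(2.0)
# Assumed 2x2 masks, flattened as [top-left, top-right, bottom-left, bottom-right].
FILTERS = np.array([
    [1.0, -1.0, 1.0, -1.0],   # vertical edge
    [1.0, 1.0, -1.0, -1.0],   # horizontal edge
    [R2, 0.0, 0.0, -R2],      # 45 degree diagonal edge
    [0.0, R2, -R2, 0.0],      # 135 degree diagonal edge
    [2.0, -2.0, -2.0, 2.0],   # non-directional edge
])

def edge_histogram(gray, threshold=11.0):
    """Basic EHD sketch: 4x4 sub-images x 5 edge types = 80 bins."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    ys = np.linspace(0, h, 5).astype(int)
    xs = np.linspace(0, w, 5).astype(int)
    hist = np.zeros((4, 4, 5))
    counts = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            sub = gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for y in range(0, sub.shape[0] - 1, 2):
                for x in range(0, sub.shape[1] - 1, 2):
                    block = sub[y:y + 2, x:x + 2].ravel()
                    strengths = np.abs(FILTERS @ block)
                    counts[i, j] += 1
                    # A block votes for its strongest edge type only if
                    # that strength reaches the threshold.
                    if strengths.max() >= threshold:
                        hist[i, j, strengths.argmax()] += 1
    # Normalize by the number of image-blocks in each sub-image.
    hist /= np.maximum(counts, 1)[:, :, None]
    return hist.reshape(80)
```

On an image made of alternating dark and bright columns, every block votes for the vertical bin, so each sub-image histogram becomes (1, 0, 0, 0, 0).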

[Figure: basic (80 bins), semi-global (13 clusters), and global edge histograms, which together form the extended histogram (150 bins).]

• For a good performance, we need the global edge distribution for the whole image and semi global, horizontal and vertical edge distributions.

• The Extended EHD is obtained by accumulating EHD bins for basic, semi-global and global. Global uses 5 EHD bins for all the sub-images. For the semi-global, four connected sub-images are clustered. In total, we have 150 bins (80 basic + 65 semi-global + 5 global )
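The bin count 80 + 65 + 5 = 150 can be reproduced with a short sketch. The exact composition of the 13 semi-global clusters is not given here, so the code assumes a common layout: the 4 rows, the 4 columns, and five 2x2 groups (four corners and the center) of the 4x4 sub-image grid. Each group is averaged rather than summed, which is also an assumption, and the function name is illustrative.

```python
import numpy as np

def extended_ehd(basic):
    """Build a 150-bin extended histogram from the 80 basic bins."""
    local = np.asarray(basic, dtype=float).reshape(4, 4, 5)
    # Global: one 5-bin histogram over all 16 sub-images.
    global_bins = local.mean(axis=(0, 1))
    # Semi-global (assumed layout): 4 rows + 4 columns + 5 square groups.
    rows = [local[i, :, :].mean(axis=0) for i in range(4)]
    cols = [local[:, j, :].mean(axis=0) for j in range(4)]
    squares = [local[i:i + 2, j:j + 2, :].mean(axis=(0, 1))
               for i, j in ((0, 0), (0, 2), (2, 0), (2, 2), (1, 1))]
    semi_global = np.concatenate(rows + cols + squares)  # 13 x 5 = 65 bins
    return np.concatenate([local.reshape(80), semi_global, global_bins])

print(extended_ehd(np.ones(80)).shape)  # (150,)
```

Each of the assumed groups covers four connected sub-images, in line with the description above.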

Extended EHD computation

What applications

• Homogeneous Texture descriptor is for searching and browsing through large collections of similar looking patterns. An image can be considered as a mosaic of homogeneous textures, so that the texture features associated with the regions can be used to index the image data.

• Texture Browsing descriptor is useful for representing homogeneous texture for browsing type applications. It provides a perceptual characterization of texture, similar to a human characterization, in terms of regularity, coarseness and directionality.

• Edge Histogram descriptor can retrieve images with similar semantic meaning, since edges play an important role in image perception. It targets image-to-image matching (by example or by sketch), especially for natural images with non-uniform edge distribution. The image retrieval performance can be significantly improved if the edge histogram descriptor is combined with other descriptors such as the color histogram descriptor.

Shape Descriptors

• Region-based Descriptor
• Contour-based Shape Descriptor
• 2D/3D Shape Descriptor
• 3D Shape Descriptor

• A shape is the outline or characteristic surface configuration of a thing: a contour; a form.

A shape cannot be described through text.

• Shape representation and matching is one of the major and oldest research topics of Pattern Recognition and Computer Vision.

• Invariance of the representation, meaning that the shape representation is left unaltered under a set of transformations, plays a very important role in recognizing the same object in its translated, rotated, scaled or shrunk views.

Region Based Descriptor

ART Algorithm

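‒ Perform edge detection
‒ Calculate $ART_{nm}$ for n = 0, …, N and m = 0, …, M according to:

$$ART_{nm} = \langle V_{nm}(\rho,\theta),\, f(\rho,\theta) \rangle = \int_0^{2\pi}\!\int_0^1 V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta$$

where $f(\rho,\theta)$ is an image function in polar coordinates and $V_{nm}(\rho,\theta)$ is the ART basis function. The ART basis functions are separable along the angular and radial directions:

$$V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho)$$

The angular and radial basis functions are defined as follows:

$$A_m(\theta) = \frac{1}{2\pi} \exp(jm\theta)$$

$$R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases}$$

with m = 0, …, 11 and n = 0, …, 2 (12 angular and 3 radial functions).

‒ Scale coefficients by $|ART_{00}|$ to normalize
‒ Perform matching on the features (magnitude of $ART_{nm}$).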

• Region Based Descriptor (RBD) expresses the pixel distribution within a 2D object region. It employs the 2D Angular Radial Transformation (ART), defined on a unit disk in polar coordinates.
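A discrete approximation of the ART coefficients can be sketched as follows. This is an illustration under assumptions rather than the standard's extraction procedure: the unit disk is centred on the image (not on the shape centroid), the integral is replaced by a sum over pixel centres, the defaults of 3 radial and 12 angular orders are assumed, and `art_descriptor` is a hypothetical name. The angular basis is exp(jmθ)/(2π) and the radial basis is 1 for n = 0 and 2cos(πnρ) otherwise.

```python
import numpy as np


def art_descriptor(image, n_radial=3, n_angular=12):
    """Return |ART_nm| / |ART_00| for a 2D region mask (non-zero = shape)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    radius = min(h, w) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Map pixel centres to the unit disk centred on the image (assumption).
    x = (xs - (w - 1) / 2.0) / radius
    y = (ys - (h - 1) / 2.0) / radius
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    pixel_area = 1.0 / radius ** 2          # rho*drho*dtheta == dx*dy
    coeffs = np.zeros((n_radial, n_angular), dtype=complex)
    for n in range(n_radial):
        radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
        for m in range(n_angular):
            basis = np.exp(1j * m * theta) / (2.0 * np.pi) * radial
            coeffs[n, m] = np.sum(np.conj(basis)[inside] * img[inside]) * pixel_area
    mags = np.abs(coeffs)
    return mags / mags[0, 0]                # assumes a non-empty shape
```

Because a rotation of the shape only changes the phase of each coefficient, the magnitudes returned for a mask and for its 90° rotation agree, which is the property the matching step relies on.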

Matching with RBD

• Applicable to figures (a) – (e)
• Distinguishes (i) from (g) and (h); (j)
• Finds similarities in (k) and (l)

• Advantages:
– Describes complex shapes with disconnected regions
– Robust to segmentation noise
– Fast extraction and matching

• Contour-Based Descriptor (CBD) captures perceptually meaningful features of the shape contour. It is based on Curvature Scale Space representation.

• Curvature Scale-Space
– Finds the curvature zero-crossing points of the shape's contour (keypoints)
– Reduces the number of keypoints step by step by applying Gaussian smoothing (the contour is gradually smoothed by repeated application of a low-pass filter kernel to the X and Y coordinates of the selected N contour points)
– The positions of the keypoints are expressed relative to the length of the contour curve

Contour Based Descriptor

• The number of curvature zeroes is a decreasing function of the smoothing scale $\sigma$.

• The diagram of the zero positions as $\sigma$ varies is known as the scale space diagram.
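The decrease in the number of curvature zeroes with the smoothing scale can be observed with a small sketch. It is a sketch under assumptions: the contour is closed and uniformly sampled, Gaussian smoothing is applied in the Fourier domain with `sigma` expressed in samples (rather than by repeated application of a small kernel), and `curvature_zero_crossings` is a hypothetical name.

```python
import numpy as np


def curvature_zero_crossings(x, y, sigma):
    """Count curvature sign changes of a closed contour smoothed at scale sigma."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    freq = np.fft.fftfreq(len(x))                        # cycles per sample
    gauss = np.exp(-2.0 * (np.pi * freq * sigma) ** 2)   # Fourier transform of a Gaussian
    d1 = 2j * np.pi * freq                               # spectral derivative operator
    X = np.fft.fft(x) * gauss
    Y = np.fft.fft(y) * gauss
    xu, yu = np.fft.ifft(d1 * X).real, np.fft.ifft(d1 * Y).real
    xuu, yuu = np.fft.ifft(d1 ** 2 * X).real, np.fft.ifft(d1 ** 2 * Y).real
    # The sign of the curvature is the sign of x'y'' - y'x''.
    positive = (xu * yuu - yu * xuu) >= 0
    return int(np.count_nonzero(positive != np.roll(positive, 1)))


# Five-lobed test contour: r(t) = 1 + 0.3 cos(5t).
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
r = 1.0 + 0.3 * np.cos(5.0 * t)
counts = [curvature_zero_crossings(r * np.cos(t), r * np.sin(t), s)
          for s in (1, 5, 10, 20, 40)]
```

Plotting the positions of the sign changes against `sigma`, instead of only counting them, would give the scale space diagram itself.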

[Figure: scale space diagram, showing the Gaussian filtered signal and its first derivative peaks]

Scale space diagram

• Comparison between two scale space diagrams can be made by considering only the points of maxima of the two diagrams.

• To obtain shape rotation invariance, invariance of the scale space diagram to horizontal shifting must be assured. A peak is aligned to the zero of the diagram and the other peaks are shifted accordingly.

• Properties of the scale space diagram:
− edge position may shift with increasing scale
− two edges may merge with increasing scale
− an edge may not split into two with increasing scale
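The peak alignment step can be illustrated with a minimal sketch. It assumes each peak is stored as a (position, height) pair with the position normalised to [0, 1) along the contour, and it picks the highest peak as the one moved to zero; both the data layout and the name `align_css_peaks` are assumptions made for illustration.

```python
import numpy as np


def align_css_peaks(peaks):
    """Shift peak positions circularly so the highest peak sits at position 0."""
    peaks = np.asarray(peaks, dtype=float)
    ref = peaks[np.argmax(peaks[:, 1]), 0]        # position of the highest peak
    shifted = peaks.copy()
    shifted[:, 0] = (peaks[:, 0] - ref) % 1.0     # wrap around the closed contour
    return shifted[np.argsort(shifted[:, 0])]     # order by aligned position


peaks = np.array([[0.10, 2.0], [0.45, 5.0], [0.80, 1.0]])
# A rotated shape starts the contour elsewhere: same heights, shifted positions.
rotated = peaks.copy()
rotated[:, 0] = (rotated[:, 0] + 0.3) % 1.0
```

After alignment the two peak sets coincide, so a distance computed on the aligned peaks does not depend on where the contour traversal started.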


Matching with CBD

• Applicable to (a)
• Distinguishes differences in (b)
• Finds similarities in (c) - (e)

• Advantages:
‒ Captures the shape very well
‒ Robust to noise, scale, and orientation
‒ It is fast and compact


RBD versus CBD

• Blue: Similar shapes by Region-Based
• Yellow: Similar shapes by Contour-Based

Global Curvature Vector

• Global Curvature Vector (GCV) specifies global parameters of the contour, namely the Eccentricity and Circularity:

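$$\text{circularity} = \frac{\text{perimeter}^2}{\text{area}}$$

For a circle, circularity is

$$C_{circle} = \frac{(2\pi r)^2}{\pi r^2} = 4\pi$$

$$\text{eccentricity} = \sqrt{\frac{i_{20} + i_{02} + \sqrt{i_{20}^2 + i_{02}^2 - 2 i_{20} i_{02} + 4 i_{11}^2}}{i_{20} + i_{02} - \sqrt{i_{20}^2 + i_{02}^2 - 2 i_{20} i_{02} + 4 i_{11}^2}}}$$

where

$$i_{02} = \sum (y - y_c)^2, \qquad i_{11} = \sum (x - x_c)(y - y_c), \qquad i_{20} = \sum (x - x_c)^2$$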
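Both GCV parameters can be computed directly from the contour vertices, as in the following sketch. It assumes a closed polygonal contour, sums the moments over the contour points, and takes (x_c, y_c) as the mean of those points; `global_curvature_vector` is a hypothetical name.

```python
import numpy as np


def global_curvature_vector(x, y):
    """Return (circularity, eccentricity) of a closed contour given by its vertices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn, yn = np.roll(x, -1), np.roll(y, -1)          # next vertex, wrapping around
    perimeter = np.hypot(xn - x, yn - y).sum()
    area = 0.5 * abs(np.sum(x * yn - xn * y))        # shoelace formula
    circularity = perimeter ** 2 / area
    dx, dy = x - x.mean(), y - y.mean()              # centroid assumed = mean of points
    i20, i02, i11 = np.sum(dx ** 2), np.sum(dy ** 2), np.sum(dx * dy)
    # i20^2 + i02^2 - 2*i20*i02 is written as (i20 - i02)^2 to stay non-negative.
    root = np.sqrt((i20 - i02) ** 2 + 4.0 * i11 ** 2)
    eccentricity = np.sqrt((i20 + i02 + root) / (i20 + i02 - root))
    return circularity, eccentricity


t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
circ_circle, ecc_circle = global_curvature_vector(np.cos(t), np.sin(t))
circ_ellipse, ecc_ellipse = global_curvature_vector(2.0 * np.cos(t), np.sin(t))
```

For the sampled unit circle the circularity is close to 4π and the eccentricity close to 1; for the 2:1 ellipse the eccentricity is close to 2, the ratio of the axes.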

What applications

• Region Shape descriptor makes use of all pixels constituting the shape within a frame and can describe any shape.

– It is characterized by small size and fast extraction and matching. The data size for this representation is fixed to 17.5 bytes.

– The feature extraction and matching processes have low computational complexity and are suitable for tracking shapes in video data processing.

• Contour Shape descriptor captures perceptually meaningful features of the shape, enabling similarity-based retrieval.

– It is robust to non-rigid motion.
– It is robust to partial occlusion of the shape.
– It is robust to perspective transformations, which result from changes of the camera parameters and are common in images and video.
