presentation to vii international workshop on advanced computing and analysis techniques in physics...

Presentation to VII International Workshop on Advanced Computing and Analysis

Techniques in Physics Research

October , 2000

2

Title of the Talk

Singular Value Decomposition (SVD) to simplify features recognitions analysis in very large collection of images

F.Guillon D.J.C Murray , SourceWorks Consulting Inc

and Philip Desautels ,Ereo Inc.

4

Internet was built for slow speed modems

Broadband adoption will cause rapid rich media adoption

Ereo is developing the first search engine (for images) that identifies and indexes the content and context of rich media files Goal is to answer : find one or more similar IMAGESfind one or more similar IMAGES

Ereo builds the largest most intelligent image database on the web

Introduction

5

Insatiable Thirst for Bandwidth

Bandwidth and content fuel one another… Reinvention of Web – 28.8 (text) to Broadband More Bandwidth – More Rich Media Content Cycle repeats itself Lack of Tools for Broadband Medium

© 2000 JUPITER COMMUNICATIONS

PUB. 2/00

6

• Next-Gen Rich Media Search

– Focus on Image – Content and Contextual Analysis– Most powerful search tools– Most Relevance– Biggest Database

Ereo Addresses The Market’s Needs

7

Ereo Technology Our Competitive Positioning

Ereo WorldWide 1.0

Ereo OnSite V 2.0

Ereo OnSite V 1.0

More Like This

RELEVANCY

Rich Media Database Size

8

Ereo Technology

Basic ProblemBasic Problem: Simple Image Search in a Database of 1 Million Imagesbased on color only will required searching for similar vectors in a high-dimensional space .

Each Image Color Vector represents the Color Histogram representation however the number of possible colors is large (16,777,216 values for the RGB color model ) .

At run-time the similarity search cannot afford this high-dimensional space cost so

SOME DIMENSIONALITY REDUCTION IS NEEDED

SolutionSolutionSVD1 ( Singular Value Decomposition ) Tool targeted to PCA (Principal Component Analysis) Dimensionality Reduction for the Color-Image Vector Space for Color Similarity Application

¹ Patent Pending under The United States Patent and Trademark Office

9

SVD Technique : Basic

• SVD methods deal with solving difficult linear-least squares problems such as the terms in documents case and here colors in images

• They are based on the theorem of Linear Algebra:– Any M x N matrix A whose number of rows M is greater than or equal to its

number of columns N can be written as the product of an M x N column-orthogonal matrix U , an N x N diagonal matrix W of singular values and the transpose of an N x N orthogonal matrix V:

=A Uw1

w2

WN

VT

10

SVD Solution

• We can envision building a A matrix corresponding to the 16 M Color Histogram of a stack of images upon which we would like to do Image Color Similarity Comparison.

• In order that the SVD process can be a usefull empirical type of model,we need to demonstrate that we can build a A matrix that is Sparse

11

SVD Implementation / Expt

• We use the SVD package (SVDPACKC ) from the Net Lib repository , in particular we used the Lanczos and subspace iteration-based methods for determining several of the largest singular triplets (singular values and corresponding left- and right-singular vectors, the U and V matrix ) for large sparse matrices.

• Test Run:– Stack of 1999 jpeg images ( 133 x 100 pixels) : 6,29 Mb– Use the las2 (Lanczos method) program with modified

data requirements– Use 16Millions ( RGB color model ) color Histogram as

starting point (Note R, G, B : discrete range from 0-255 )– We try to build a sparse matrix A of :

• Columns identify as image id• Rows identify as color C index (combination of R, G, B values )

12

SVD Run/ Results

• To be able to run the SVD process we have to obtain a matrix sufficiently sparse which means reducing effectively number of colors 16 M to 65,536

• In constructing the SVD Sparse matrix we used:

– 234 Mb of temporary storage to handle 1999 color histograms of 65536 color bins

– Final sparse matrix formatted using only non-zero values used about 78 Mb

– SVD process run on a Digital Alpha Linux Machine– (Two Alpha 21264 667 MHz CPUs with 4 MB Cache,– 1 GB SDRAM) – Run time is about 20 min for 200 eigenvalues

13

SVD Results : 700 eigenvalues

0 100 200 300 400 500 600 700

2000

4000

6000

8000

1 104

trace 1eigenvalues number

eige

nval

ue

14

PCA Dimensionality Reduction

• The singular value decomposition of the (65536 x 1999) matrix A yields a new 65536 x N matrix T containing the left singular vectors (equiv of U matrix ) corresponding to the N largest singular values of A . The N largest values retained as much information about the original histograms of the stack of images.

• The matrix T is used to project 65,536 dimensional color histograms into an N-dimensional space color histograms :

H65536 T = H N

15

SVD Results Impact on Similarity

• We can compare the similarity of images based on 512 colors bins Histograms versus the SVD Reduced N-Histograms :

– It is found that for about N = 50 eigenvalues (the most largest values ) we get the same similarity measure of image comparison using the well known P(10) metric .

– Therefore SVD Process is an efficient empirical method to reduce size of any function describing image properties for similarity purpose.

16

CONCLUSION

• Singular Value Decomposition Method can be an extremely valuable and simple method of analyzing data without the necessity of explicit mathematical model and provides data size reduction as well.

• The method has been successfully proven on text and now can be considered a power full method of similarity comparison based on image properties such as color and possibly morphology of objects (shapes) and textures contained within the image.

presentation to vii international workshop on advanced computing and analysis techniques in physics...

Documents

search boxes

search engine

multimedia search site

search software sales

rapid rich media adoption

text search offerings

rich media management

rich mediaereo