presentation to vii international workshop on advanced computing and analysis techniques in physics...
TRANSCRIPT
Presentation to VII International Workshop on Advanced Computing and Analysis
Techniques in Physics Research
October , 2000
2
Title of the Talk
Singular Value Decomposition (SVD) to simplify features recognitions analysis in very large collection of images
F.Guillon D.J.C Murray , SourceWorks Consulting Inc
and Philip Desautels ,Ereo Inc.
3
4
Internet was built for slow speed modems
Broadband adoption will cause rapid rich media adoption
Ereo is developing the first search engine (for images) that identifies and indexes the content and context of rich media files Goal is to answer : find one or more similar IMAGESfind one or more similar IMAGES
Ereo builds the largest most intelligent image database on the web
Introduction
5
Insatiable Thirst for Bandwidth
Bandwidth and content fuel one another… Reinvention of Web – 28.8 (text) to Broadband More Bandwidth – More Rich Media Content Cycle repeats itself Lack of Tools for Broadband Medium
© 2000 JUPITER COMMUNICATIONS
PUB. 2/00
6
• Next-Gen Rich Media Search
– Focus on Image – Content and Contextual Analysis– Most powerful search tools– Most Relevance– Biggest Database
Ereo Addresses The Market’s Needs
7
Ereo Technology Our Competitive Positioning
Ereo WorldWide 1.0
Ereo OnSite V 2.0
Ereo OnSite V 1.0
More Like This
RELEVANCY
Rich Media Database Size
8
Ereo Technology
Basic ProblemBasic Problem: Simple Image Search in a Database of 1 Million Imagesbased on color only will required searching for similar vectors in a high-dimensional space .
Each Image Color Vector represents the Color Histogram representation however the number of possible colors is large (16,777,216 values for the RGB color model ) .
At run-time the similarity search cannot afford this high-dimensional space cost so
SOME DIMENSIONALITY REDUCTION IS NEEDED
SolutionSolutionSVD1 ( Singular Value Decomposition ) Tool targeted to PCA (Principal Component Analysis) Dimensionality Reduction for the Color-Image Vector Space for Color Similarity Application
¹ Patent Pending under The United States Patent and Trademark Office
9
SVD Technique : Basic
• SVD methods deal with solving difficult linear-least squares problems such as the terms in documents case and here colors in images
• They are based on the theorem of Linear Algebra:– Any M x N matrix A whose number of rows M is greater than or equal to its
number of columns N can be written as the product of an M x N column-orthogonal matrix U , an N x N diagonal matrix W of singular values and the transpose of an N x N orthogonal matrix V:
=A Uw1
w2
WN
VT
10
SVD Solution
• We can envision building a A matrix corresponding to the 16 M Color Histogram of a stack of images upon which we would like to do Image Color Similarity Comparison.
• In order that the SVD process can be a usefull empirical type of model,we need to demonstrate that we can build a A matrix that is Sparse
11
SVD Implementation / Expt
• We use the SVD package (SVDPACKC ) from the Net Lib repository , in particular we used the Lanczos and subspace iteration-based methods for determining several of the largest singular triplets (singular values and corresponding left- and right-singular vectors, the U and V matrix ) for large sparse matrices.
• Test Run:– Stack of 1999 jpeg images ( 133 x 100 pixels) : 6,29 Mb– Use the las2 (Lanczos method) program with modified
data requirements– Use 16Millions ( RGB color model ) color Histogram as
starting point (Note R, G, B : discrete range from 0-255 )– We try to build a sparse matrix A of :
• Columns identify as image id• Rows identify as color C index (combination of R, G, B values )
12
SVD Run/ Results
• To be able to run the SVD process we have to obtain a matrix sufficiently sparse which means reducing effectively number of colors 16 M to 65,536
• In constructing the SVD Sparse matrix we used:
– 234 Mb of temporary storage to handle 1999 color histograms of 65536 color bins
– Final sparse matrix formatted using only non-zero values used about 78 Mb
– SVD process run on a Digital Alpha Linux Machine– (Two Alpha 21264 667 MHz CPUs with 4 MB Cache,– 1 GB SDRAM) – Run time is about 20 min for 200 eigenvalues
13
SVD Results : 700 eigenvalues
0 100 200 300 400 500 600 700
2000
4000
6000
8000
1 104
trace 1eigenvalues number
eige
nval
ue
14
PCA Dimensionality Reduction
• The singular value decomposition of the (65536 x 1999) matrix A yields a new 65536 x N matrix T containing the left singular vectors (equiv of U matrix ) corresponding to the N largest singular values of A . The N largest values retained as much information about the original histograms of the stack of images.
• The matrix T is used to project 65,536 dimensional color histograms into an N-dimensional space color histograms :
H65536 T = H N
15
SVD Results Impact on Similarity
• We can compare the similarity of images based on 512 colors bins Histograms versus the SVD Reduced N-Histograms :
– It is found that for about N = 50 eigenvalues (the most largest values ) we get the same similarity measure of image comparison using the well known P(10) metric .
– Therefore SVD Process is an efficient empirical method to reduce size of any function describing image properties for similarity purpose.
16
CONCLUSION
• Singular Value Decomposition Method can be an extremely valuable and simple method of analyzing data without the necessity of explicit mathematical model and provides data size reduction as well.
• The method has been successfully proven on text and now can be considered a power full method of similarity comparison based on image properties such as color and possibly morphology of objects (shapes) and textures contained within the image.