Similarity Search in Visual Data: Ph.D. Thesis Defense. Anoop Cherian, Department of Computer Science and Engineering, University of Minnesota, Twin Cities.

TRANSCRIPT

  • Slide 1
  • Similarity Search in Visual Data. Ph.D. Thesis Defense, Anoop Cherian*, Department of Computer Science and Engineering, University of Minnesota, Twin Cities. Adviser: Prof. Nikolaos Papanikolopoulos. *Contact: [email protected]
  • Slide 2
  • Talk Outline: Introduction; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 3
  • Thesis Related Publications
    Journals:
    1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) [accepted with minor revisions]. (Chapter 3)
    2. A. Cherian, V. Morellas, and N. Papanikolopoulos. Efficient Nearest Neighbor Retrieval via Sparse Coding. Pattern Recognition Journal [being submitted]. (Chapters 5, 7)
    Conference publications:
    1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Efficient Similarity Search on Covariance Matrices via the Jensen-Bregman LogDet Divergence. Intl. Conf. on Computer Vision (ICCV), 2011. (Chapter 3)
    2. A. Cherian, V. Morellas, N. Papanikolopoulos, and S. Bedros. Dirichlet Process Mixture Models on Symmetric Positive Definite Matrices for Appearance Clustering in Video Surveillance Applications. Computer Vision and Pattern Recognition (CVPR), 2011. (Chapter 4)
  • Slide 4
  • Thesis Related Publications (continued)
    3. A. Cherian, J. Andersh, V. Morellas, N. Papanikolopoulos, and B. Mettler. Motion Estimation of a Miniature Helicopter using a Single Onboard Camera. American Control Conference (ACC), 2010. (Chapter 5)
    4. A. Cherian, S. Sra, and N. Papanikolopoulos. Denoising Sparse Noise via Online Dictionary Learning. Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2011. (Chapter 6)
    5. A. Cherian, V. Morellas, and N. Papanikolopoulos. Robust Sparse Hashing. Intl. Conf. on Image Processing (ICIP), 2012. (Chapter 6) [Best Student Paper Award]
    6. A. Cherian, V. Morellas, and N. Papanikolopoulos. Approximate Nearest Neighbors via Dictionary Learning. Proceedings of SPIE, 2011. (Chapters 5, 6, 7)
    7. S. Sra and A. Cherian. Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval. European Conference on Machine Learning (ECML), 2011. (Chapter 8)
    8. A. Cherian and N. Papanikolopoulos. Large Scale Image Search via Sparse Coding. Minnesota Supercomputing Institute (MSI) Poster Presentation, 2012. [Best Poster Award]
  • Slide 5
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 6
  • Courtesy of Intel
  • Slide 7
  • Big-Data Challenge: how do we connect the information seeker to the right content? Solution: similarity search. Three fundamental steps in similarity search: 1. Represent the data. 2. Describe the query. 3. Retrieve the data most similar to the query.
  • Slide 8
  • Visual Data Challenges. Art courtesy of Thomas Kinkade, Pastoral House. "Never express yourself more clearly than you are able to think." (Niels Bohr) It is sometimes difficult to describe precisely in words what data is to be retrieved! This is especially the case in visual content retrieval, where similarity is judged by an unconscious process. Characterizing what we see is therefore hard, and it is even harder to teach a machine visual similarity.
  • Slide 9
  • A Few Applications using Visual Similarity Search: content-based image retrieval; medical image analysis; 3D reconstruction; visual surveillance; human-machine interaction.
  • Slide 10
  • 3D Scene Reconstruction: Technical Analysis. Courtesy: Google Street View. Goal: 3D street view. Input: a set of images. Algorithm: 1. Find point correspondences between pairs of images. 2. Estimate camera parameters. 3. Estimate camera motion. 4. Estimate 3D point locations.
  • Slide 11
  • 3D Scene Reconstruction: Technical Analysis. Courtesy: Google Street View. Typically SIFT descriptors (128-D) are used as point descriptors. Each image produces several thousand SIFT descriptors (say 10K SIFTs/image), and several thousand images are needed for a reliable reconstruction (assume 1K images). Thus there are approximately 10K x 1K = 10^7 SIFTs, and pairwise computations require 10^14 comparisons! This is for only one scene; think of millions of scenes in a Street View application. The computational bottleneck is efficient similarity computation. (A quick sanity check of these counts follows.)
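As a quick sanity check on these counts, a tiny Python sketch (a minimal illustration; the 10K descriptors/image and 1K images are the slide's round-number assumptions):

```python
# Back-of-the-envelope cost of brute-force descriptor matching,
# using the round numbers assumed on the slide.
sifts_per_image = 10_000      # ~10K SIFT descriptors per image
num_images = 1_000            # ~1K images for a reliable reconstruction

total_sifts = sifts_per_image * num_images     # 10^7 descriptors
pairwise = total_sifts ** 2                    # 10^14 comparisons

print(f"descriptors: {total_sifts:.0e}")         # 1e+07
print(f"pairwise comparisons: {pairwise:.0e}")   # 1e+14
```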
  • Slide 12
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 13
  • Problem Statement: Approximate Nearest Neighbor (ANN) retrieval. Given a query q, a dataset X, and an epsilon > 0, return a point x in X with d(q, x) <= (1 + epsilon) min over x* in X of d(q, x*). (A brute-force baseline sketch follows.)
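To make the problem concrete, here is a minimal NumPy sketch of the exact-NN baseline and of the (1 + epsilon)-ANN condition that relaxes it; the function names are illustrative, not from the thesis:

```python
import numpy as np

def exact_nn(q, X):
    """Brute-force exact nearest neighbor: O(n d) per query."""
    dists = np.linalg.norm(X - q, axis=1)
    i = int(np.argmin(dists))
    return i, dists[i]

def is_eps_ann(q, X, candidate, eps):
    """(1 + eps)-ANN condition: d(q, x) <= (1 + eps) * min_x* d(q, x*)."""
    _, d_star = exact_nn(q, X)
    return np.linalg.norm(X[candidate] - q) <= (1.0 + eps) * d_star

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 128))   # e.g., 10K SIFT-like 128-D vectors
q = rng.normal(size=128)
idx, d = exact_nn(q, X)
print(idx, d, is_eps_ann(q, X, idx, eps=0.1))  # the exact NN is trivially an ANN
```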
  • Slide 14
  • Problem Challenges. High dimensional data poses the curse of dimensionality: it becomes difficult to distinguish near and far points. Examples: SIFT (128-D), GIST (960-D). Large scale datasets: a needle in the haystack! Petabytes of visual data and billions of data descriptors. Desired similarity search algorithm properties: high retrieval accuracy; fast retrieval; low memory footprint; scalability to large datasets; scalability to high dimensional data; robustness to data perturbations; generalizability to various data descriptors. (Figure: unit ball inside a unit hypercube; a sketch of the vanishing volume ratio follows.)
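The unit-ball-in-a-hypercube picture can be checked numerically: the fraction of the cube occupied by the inscribed ball collapses to zero as the dimension grows, which is one face of the curse of dimensionality. A minimal sketch (the dimensions chosen are illustrative):

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the unit-radius d-ball, pi^(d/2) / Gamma(d/2 + 1),
    divided by the volume 2^d of its enclosing hypercube."""
    log_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_cube = d * math.log(2)
    return math.exp(log_ball - log_cube)

for d in (2, 8, 32, 128, 960):        # SIFT is 128-D, GIST is 960-D
    print(d, ball_to_cube_ratio(d))   # ~0.785 at d=2, essentially 0 by d=128
```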
  • Slide 15
  • Thesis Contributions. We propose NN retrieval algorithms for two different data modalities. Matrix-valued data (as symmetric positive definite matrices): a new similarity measure, the Jensen-Bregman LogDet Divergence, and an unsupervised clustering algorithm. High-dimensional vector-valued data: a novel connection between sparse coding and hashing, and a fast and accurate hashing algorithm for NN retrieval. We also provide theoretical analysis of our algorithms, and experimental validation against the state-of-the-art techniques in NN retrieval on several computer vision datasets.
  • Slide 16
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 17
  • Matrix (Covariance) Valued Data. Pipeline: appearance silhouette -> features (color + gradient + curvature) -> covariance descriptor. Advantages: multi-feature fusion; compact; real-time computable; robust to static noise, illumination changes, and affine transforms. (A sketch of the descriptor computation follows.)
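A minimal sketch of computing such a descriptor, assuming per-pixel feature vectors for a region are already extracted (the feature choice and the small ridge term are illustrative assumptions):

```python
import numpy as np

def region_covariance(features):
    """Covariance descriptor of an image region.

    features: (N, d) array with one d-dimensional feature vector per pixel
              (e.g., color, gradient, and curvature responses).
    Returns a d x d covariance, regularized to stay positive definite.
    """
    C = np.cov(features, rowvar=False)
    return C + 1e-6 * np.eye(C.shape[0])   # tiny ridge keeps C in S++

# Toy region: 500 pixels, 5 features each (say R, G, B, |Ix|, |Iy|)
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 5))
print(region_covariance(F).shape)   # (5, 5): compact, whatever the region size
```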
  • Slide 18
  • Importance of Covariance Valued Data in Vision: diffusion tensor imaging (3x3, DT-MRI); object tracking (5x5), Tuzel et al. 2006; activity recognition (12x12), Guo et al. 2009; emotion recognition (30x30), Zheng et al. 2010; face recognition (40x40), Pang et al. 2008; 3D object recognition (8x8), Fehr et al. 2012.
  • Slide 19
  • Geometry of Covariances. Owing to their positive definiteness property, covariances form a curved manifold embedded in Euclidean space: distances are not straight lines, but curves! Incorporating this curvature makes distance computation expensive. (Figure: points X and Y on the SPD manifold S++.)
  • Slide 20
  • Similarity Metrics on Covariances. Affine-Invariant Riemannian Metric (AIRM): the natural metric induced by the Riemannian geometry. Log-Euclidean Riemannian Metric (LERM): induced by approximating covariances in a flat geometry. Kullback-Leibler Divergence Metric (KLDM): treats covariances as parameters of an associated Gaussian distribution. Matrix Frobenius Distance (FROB): treats covariances as vectors in Euclidean space. (Hedged sketches of all four follow.)
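Hedged SciPy sketches of the four measures under common definitions (the symmetrized form used for KLDM here is one standard choice; the thesis may scale it differently):

```python
import numpy as np
from scipy.linalg import inv, logm, sqrtm

def airm(X, Y):
    """Affine-Invariant Riemannian Metric: ||log(X^-1/2 Y X^-1/2)||_F."""
    S = inv(sqrtm(X))
    return np.linalg.norm(logm(S @ Y @ S), 'fro')

def lerm(X, Y):
    """Log-Euclidean Riemannian Metric: ||log(X) - log(Y)||_F."""
    return np.linalg.norm(logm(X) - logm(Y), 'fro')

def kldm(X, Y):
    """Symmetrized KL divergence between N(0, X) and N(0, Y):
    (1/2) tr(X^-1 Y + Y^-1 X - 2 I)."""
    d = X.shape[0]
    return 0.5 * np.trace(inv(X) @ Y + inv(Y) @ X - 2 * np.eye(d))

def frob(X, Y):
    """Matrix Frobenius distance: treats covariances as Euclidean points."""
    return np.linalg.norm(X - Y, 'fro')

# Two random SPD matrices for a smoke test
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); X = A @ A.T + np.eye(5)
B = rng.normal(size=(5, 5)); Y = B @ B.T + np.eye(5)
print(airm(X, Y), lerm(X, Y), kldm(X, Y), frob(X, Y))
```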
  • Slide 21
  • Our Distance: Jensen-Bregman LogDet Divergence (JBLD). Let f be a convex function; the Bregman divergence d_f(X, Y) = f(X) - f(Y) - <grad f(Y), X - Y> measures the deviation of f at X from the tangent to f at Y (see figure on the right). The Jensen-Bregman divergence is the average deviation of f from the midpoint of X and Y: J_f(X, Y) = (1/2) [ d_f(X, (X+Y)/2) + d_f(Y, (X+Y)/2) ]. Our new measure is obtained by substituting f = -log det, where log det is the logdet function and X, Y are covariances: JBLD(X, Y) = log det((X + Y)/2) - (1/2) log det(XY). (A sketch follows.)
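A minimal implementation under the definition above; slogdet keeps things numerically stable, and, unlike AIRM, no matrix square roots, inverses, or logarithms are needed:

```python
import numpy as np

def jbld(X, Y):
    """Jensen-Bregman LogDet Divergence:
    JBLD(X, Y) = log det((X + Y)/2) - (1/2) log det(X Y).
    Only determinants are needed, so it is much cheaper than AIRM."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)
```

The last line uses log det(XY) = log det(X) + log det(Y). The square root of JBLD is a metric, a property the metric tree below relies on.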
  • Slide 22
  • Properties of JBLD. The measures are compared on: M (does it satisfy all metric properties?), G (are gradient computations fast?), NI (is the measure invariant to inversion?), AI (affine invariance?), NE (are negative eigenvalues at infinity?), O (will it not overestimate AIRM?), and FLOPS (computational complexity). FLOPS per distance evaluation:
    FROB: d(d+1)/2
    AIRM: 4d^3
    LERM: (8/3)d^3
    KLDM: (8/3)d^3
    JBLD: d^3
  • Slide 23
  • Computational Speedup using JBLD. (Plots: speedup in computing AIRM vs. JBLD, and in computing their gradients, for increasing matrix dimensionality.)
  • Slide 24
  • JBLD Geometry. (Figure panels: FROB, AIRM, KLDM, and JBLD distance surfaces.)
  • Slide 25
  • Nearest Neighbors using JBLD. For NN retrieval on any metric space we care about scalability, ease of exact NN retrieval, and ease of approximate NN retrieval. We decided to use a Metric Tree (MT) on JBLD for NN retrieval: the square root of JBLD is a metric, and the tree is basically a hierarchical k-means that, starting from the root (the entire dataset), bipartitions the data recursively. (A minimal sketch follows.)
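A minimal sketch of such a tree: a recursive bipartition with medoid centers under an arbitrary distance (one would pass the jbld function above). The thesis uses proper matrix means, so the medoid step here is a simplifying assumption:

```python
def two_means(data, dist, iters=10):
    """Bipartition `data` (a list of SPD matrices) around two centers.
    Centers are medoids under `dist` (a simplifying assumption)."""
    centers = [data[0], data[len(data) // 2]]
    for _ in range(iters):
        labels = [int(dist(x, centers[1]) < dist(x, centers[0])) for x in data]
        for c in (0, 1):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:  # medoid: member minimizing total distance to the rest
                centers[c] = min(members,
                                 key=lambda m: sum(dist(m, y) for y in members))
    return labels, centers

def build_metric_tree(data, dist, leaf_size=8):
    """Recursively bipartition the dataset into a binary metric tree."""
    if len(data) <= leaf_size:
        return {"leaf": data}
    labels, centers = two_means(data, dist)
    left = [x for x, l in zip(data, labels) if l == 0]
    right = [x for x, l in zip(data, labels) if l == 1]
    if not left or not right:          # degenerate split: stop here
        return {"leaf": data}
    return {"centers": centers,
            "children": [build_metric_tree(left, dist, leaf_size),
                         build_metric_tree(right, dist, leaf_size)]}

# Usage: tree = build_metric_tree(list_of_covariances, jbld)
```

At query time one descends to the child whose center is nearer the query for approximate NN; backtracking with the triangle inequality (valid since sqrt-JBLD is a metric) yields exact search.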
  • Slide 26
  • Experimental Results using JBLD
  • Slide 27
  • Experiments: Evaluation Datasets. Weizmann Actions dataset, ETH Tracking dataset, Brodatz Texture dataset, Faces in the Wild dataset.
    Dataset | Covariance size | Dataset size | Ground truth
    Actions | 12x12 | 65K | Available
    Textures | 8x8 | 27K | Available
    Faces | 40x40 | 31K | Available
    Tracking | 8x8 | 10K | AIRM
  • Slide 28
  • Experimental Results using JBLD. (Plots: metric tree creation time; exact NN via the metric tree; ANN via the metric tree.)
  • Slide 29
  • Unsupervised Clustering of Covariances. Clustering is an important step in NN retrieval, but k-means-type clustering needs a known number of clusters (K), and finding K is non-trivial in practice. We therefore propose an unsupervised clustering algorithm on covariances: an extension of the Dirichlet Process Mixture Model (DPMM) that uses a Wishart-Inverse-Wishart (WIW) conjugate pair. We also investigate other DPMM models, such as a Gaussian on log-Euclidean covariance vectors and a Gaussian on vectorized covariances.
  • Slide 30
  • Experimental Results. Purity is synonymous with accuracy. Abbreviations: le: LERM, f: FROB, l: KLDM, g: AIRM. (Plots: Faces, 40x40-D, 900 matrices, 110 clusters; Appearances, 5x5-D, 758 matrices, 31 clusters; simulation results for an increasing true number of clusters; DPMM computational expense against k-means (using AIRM) and EM (using MoW).)
  • Slide 31
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 32
  • Importance of Vector Valued Data in Vision. The fundamental data type in several applications: as histogram-based descriptors (examples: SIFT, spin images, etc.); as feature descriptors (example: image patches); as filter outputs (example: the GIST descriptor). (Illustrations: GIST, texture patches, SIFT.)
  • Slide 33
  • Related Work.
    KD-Tree: partitions space along fixed hyperplanes.
    Locality Sensitive Hashing (LSH), Indyk et al. 2008: generates hash codes by projecting data onto random hyperplanes.
    Spectral Hashing, Torralba et al. 2008: projection planes derived from orthogonal subspaces of PCA.
    Kernelized Hashing, Kulis et al. 2010: projection planes derived from PCA over a kernel matrix learned from data.
    Shift-Invariant Kernel Hashing, Lazebnik et al. 2009: spectral hashing with a cosine-based kernel.
    Product Quantization, Jegou et al. 2011: k-means sub-vector clustering followed by standard LSH.
    FLANN, Lowe et al. 2009: not a hashing algorithm, but a hybrid of hierarchical k-means and KD-trees.
    (Illustration: a KD-tree partition, and an LSH hash code such as 11010. A random-hyperplane LSH sketch follows.)
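For contrast with the sparse-coding approach that follows, here is the random-hyperplane LSH scheme from the illustration in a few lines (the 5-bit code mirrors the '11010' example; all names are illustrative):

```python
import numpy as np

def lsh_hash(x, hyperplanes):
    """Random-hyperplane LSH: one bit per hyperplane,
    bit i = 1 iff x lies on the positive side of hyperplane i."""
    bits = (hyperplanes @ x) > 0
    return ''.join('1' if b else '0' for b in bits)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 128))   # 5 random hyperplanes -> 5-bit codes
x = rng.normal(size=128)
print(lsh_hash(x, H))           # e.g., '11010'
```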
  • Slide 34
  • Our Approach. Based on Dictionary Learning (DL) and Sparse Coding (SC). Indexing: for each data vector v, (1) represent v as a sparse vector w using a dictionary B; (2) encode w as a hash code T; (3) store w at H(T), where H is a hash table indexed by T. Querying: given a query vector q, (1) generate its sparse vector w_q and hash code T_q; (2) find ANN(q) within the bucket H(T_q). (A sketch of this pipeline follows.)
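A minimal sketch of this pipeline using scikit-learn's OMP-based sparse_encode. The helper names (sct, build_hash_table, query) and the choice of OMP are illustrative assumptions; a real system would then rank the bucket's candidates by their true distance to q:

```python
import numpy as np
from collections import defaultdict
from sklearn.decomposition import sparse_encode

def sct(w, tol=1e-8):
    """Subspace Combination Tuple: the sorted support of the sparse code."""
    return tuple(np.flatnonzero(np.abs(w) > tol))

def build_hash_table(V, B, k=4):
    """Sparse-code every data vector against dictionary B (atoms in rows)
    and bucket it under its SCT."""
    W = sparse_encode(V, B, algorithm='omp', n_nonzero_coefs=k)
    table = defaultdict(list)
    for i, w in enumerate(W):
        table[sct(w)].append(i)
    return table

def query(q, B, table, k=4):
    """Hash the query and linearly scan only its bucket."""
    w_q = sparse_encode(q[None, :], B, algorithm='omp', n_nonzero_coefs=k)[0]
    return table.get(sct(w_q), [])
```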
  • Slide 35
  • Dictionary Learning and Sparse Coding. Dictionary learning: an algorithm to learn atoms from data. Sparse coding: an algorithm to represent data in terms of a few atoms of the dictionary. An analogy (dictionary learning): data -> a dictionary of basic atoms.
  • Slide 36
  • Dictionary Learning and Sparse Coding. Dictionary learning: an algorithm to learn atoms from data. Sparse coding: an algorithm to represent data in terms of a few atoms of the dictionary. An analogy (dictionary learning): image data -> a dictionary of basic atoms.
  • Slide 37
  • Dictionary Learning and Sparse Coding. Dictionary learning: an algorithm to learn atoms from data. Sparse coding: an algorithm to represent data in terms of a few atoms of the dictionary. An analogy (sparse coding): a data vector is a sparse atom selection, e.g., 0 x Na, 0 x Li, 0 x Be, 2 x H, 1 x O, 0 x Xe, 0 x Rn: a sparse representation with lots of zeros.
  • Slide 38
  • Dictionary Learning and Sparse Coding. Dictionary learning: an algorithm to learn atoms from data. Sparse coding: an algorithm to represent data in terms of a few atoms of the dictionary. An analogy (sparse coding): an image is a sparse atom selection, 0.0 x (atom) + 1.2 x (atom) + 0.4 x (atom) + 0.0 x (atom) + ...: a sparse representation with lots of zeros. (A runnable sketch of both steps follows.)
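A runnable sketch of both steps with scikit-learn (the patch dimensions, atom count, and sparsity level are illustrative):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))          # e.g., 8x8 image patches, flattened

# Learn 128 atoms; each sample is then coded with ~5 nonzero coefficients.
dl = DictionaryLearning(n_components=128, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5, max_iter=20,
                        random_state=0)
W = dl.fit_transform(X)                  # sparse codes, shape (1000, 128)
B = dl.components_                       # dictionary atoms, shape (128, 64)

print(np.count_nonzero(W[0]))            # <=5: lots of zeros, as in the analogy
```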
  • Slide 39
  • Sparse Codes as Hash Codes. Hashing illustration: data vector -> dictionary -> sparse code -> hash table. The indices of the nonzero coefficients of the sparse code, e.g., (10, 33, 77, 90), form the Subspace Combination Tuple (SCT), which serves as the hash code.
  • Slide 40
  • Sparse Coding & NN Retrieval Connection. (Figure: with high probability, a new data point selects the same dictionary atoms as nearby points.)
  • Slide 41
  • Sparse Coding & NN Retrieval Connection
  • Slide 42
  • Advantages of Sparse Coding for NN Retrieval. Hashing efficiency: a large number of hash codes, 2^k * C(n, k) for k-sparse codes, against the 2^k codes of LSH. Storage efficiency: only the sparse coefficients need to be stored, against entire data vectors as in LSH. Query efficiency: linear search on low-dimensional sparse vectors; no curse of dimensionality. Sparse coding complexity: O(ndk) for a dictionary of n atoms, each of dimension d, generating k-sparse codes. (Illustration: 1-sparse and 2-sparse partitions.)
  • Slide 43
  • Disadvantage: Sensitivity to Data Perturbation! Sparse coding fits hyperplanes to dense regions of the data; there are 2^k * C(n, k) such hyperplanes for a k-sparse code and an n-atom dictionary. Example: with n = 1024 and k = 10, we have ~10^30 hyperplanes. Data partitions can therefore be too small: small data perturbations can push data points into different partitions, different partitions imply different hash codes, and hashing fails!
  • Slide 44
  • Robust NN Retrieval. Robust Dictionary Learning: align the dictionary atoms to compensate for data perturbations; approaches: (i) treat perturbations as noise and develop a denoising model, (ii) make the data immune to the worst-case perturbation. Robust Sparse Coding: hierarchical data-space partitioning, where larger partitions subsume smaller partitions; generate multiple hash codes, one for each partition.
  • Slide 45
  • Robust Dictionary Learning.
    Denoising approach: the data has large and small perturbations; assume Gaussian noise for the small perturbations and Laplacian noise for the large but sparse ones. Denoise for Gaussian + Laplacian noise (subtract off the Laplacian noise, then the Gaussian noise) and learn the basis; the resulting denoised data should produce the same SCT!
    Robust optimization: no assumptions on the noise distribution. Learn the worst-case perturbation from a training set, project every data point as if perturbed by the worst-case noise, and learn the basis on the perturbed data; the resulting immunized data should produce the same SCT!
  • Slide 46
  • Robust Dictionary Learning: Experimental Results. (Plots: denoising approach vs. robust optimization, on the INRIA Copydays dataset and on the sequences Graf, Bike, Bark, Boat, Wall, Leu, UBC, Tree.)
  • Slide 47
  • Robust Sparse Coding. Based on the regularization path of sparse coding: similar data points have similar regularization paths, and hence similar basis activations; dissimilar data points have dissimilar activations. Main idea: generate multiple SCTs, one for each of several increasing regularization levels, the Multi-Regularization Sparse Coding (MRSC) algorithm. Increasing the regularization means bigger data partitions and more robustness. (A sketch follows.)
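A minimal sketch of the idea using the LARS/lasso regularization path from scikit-learn: walk the path and emit one SCT at each target sparsity level (a smaller support means heavier regularization and a bigger, more robust partition). The function and parameter names are illustrative:

```python
import numpy as np
from sklearn.linear_model import lars_path

def mrsc_codes(x, B, sparsity_levels=(1, 2, 4, 8)):
    """One SCT per regularization level along the LARS/lasso path."""
    # lars_path treats B as the design matrix: atoms are its columns.
    _, _, coefs = lars_path(B, x, method='lasso')
    codes = []
    for k in sparsity_levels:
        # Walk the path until more than k atoms become active.
        for j in range(coefs.shape[1]):
            support = np.flatnonzero(coefs[:, j])
            if len(support) > k:
                break
            last = tuple(sorted(support))
        codes.append(last)   # largest support on the path with <= k atoms
    return codes

rng = np.random.default_rng(0)
B = rng.normal(size=(64, 256))   # 256 atoms as columns of B
x = rng.normal(size=64)
print(mrsc_codes(x, B))          # nested SCTs, coarse to fine
```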
  • Slide 48
  • Robust Sparse Coding: Experimental Results. Datasets: MNIST digits; CIFAR-10 objects; SHREC spin images (2M); Holidays SIFT (2M).
  • Slide 49
  • Robust Sparse Coding: Experimental Results (SIFT). (Plots: timing, robustness, scalability, and timing/scalability.)
  • Slide 50
  • Sparse Coding for Covariances: Generalized Dictionary Learning. Basic idea: extend the sparse coding framework to matrix-valued data. The sparse vector becomes a sparse diagonal matrix, and the vector dictionary becomes a dictionary of nonnegative rank-one atoms; that is, a covariance S is approximated as S ~ sum_i w_i b_i b_i^T with a sparse, nonnegative w. (A sketch follows.)
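A minimal sketch under that model: vectorize the rank-one atoms and solve a nonnegative least-squares problem with SciPy. The thesis adds an explicit sparsity-promoting penalty; NNLS alone, used here as a simplifying assumption, already tends to give sparse weights:

```python
import numpy as np
from scipy.optimize import nnls

def gdl_code(S, atoms):
    """Nonnegative coding of an SPD matrix S as S ~ sum_i w_i b_i b_i^T.
    Vectorizes the rank-one atoms and solves nonnegative least squares."""
    A = np.column_stack([np.outer(b, b).ravel() for b in atoms])  # (d*d, n)
    w, _ = nnls(A, S.ravel())
    return w

rng = np.random.default_rng(0)
atoms = [rng.normal(size=5) for _ in range(40)]        # 40 rank-one atoms, d=5
S = np.cov(rng.normal(size=(100, 5)), rowvar=False)    # a 5x5 covariance
w = gdl_code(S, atoms)
print(np.count_nonzero(w > 1e-8), "active atoms")      # few atoms, as intended
```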
  • Slide 51
  • Generalized Dictionary Learning: Experimental Results.
    Dataset | Covariance size | Dataset size | Dictionary | % of dataset searched
    LabelMe objects | 7x7 | 25K | 7x28 | 5.11%
    Faces (FERET) | 40x40 | 10K | 40x160 | 3.54%
    Textures (Brodatz + CUReT) | 5x5 | 60K | 5x50 | 6.26%
    (Plots: appearances, faces, textures.)
  • Slide 52
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 53
  • Conclusion. We considered NN problems on two different data types: covariance data and high dimensional vector data. For covariance data, we proposed an efficient similarity measure, the Jensen-Bregman LogDet Divergence, and novel unsupervised clustering algorithms with high clustering accuracy. For vector data, we established a connection between LSH and sparse coding, and proposed efficient algorithms for robust NN retrieval. We also proposed a framework for sparse coding of covariances: Generalized Dictionary Learning.
  • Slide 54
  • Talk Outline: Introduction; Motivation; Problem Statement; Algorithms for Similarity Search in Matrix-Valued Data and in High-Dimensional Vector Data; Conclusion; Future Work
  • Slide 55
  • Future Work. Covariance data: application of JBLD to DT-MRI; semi-supervised Dirichlet process mixture models; metric learning on covariance manifolds; locality sensitive hashing on covariances. High-dimensional vector data: Hamming embedding via dictionary learning; dictionary learning under constraints; bulk sparse coding; large-scale dictionary learning.
  • Slide 56
  • Thank you! Image courtesy: http://www.spokanecriminaldefenseattorney.net/spokane-domestic-violence-attorney/