Minimal Loss Hashing for Compact Binary Codes – Mohammad Norouzi, David Fleet, University of Toronto
TRANSCRIPT
Minimal Loss Hashing for Compact Binary Codes
Mohammad Norouzi
David Fleet
University of Toronto
Near Neighbor Search
Similarity-Preserving Binary Hashing
Why binary codes?
Sub-linear search using hash indexing
(even exhaustive linear search is fast)
Binary codes are storage-efficient
Similarity-Preserving Binary Hashing
Hash function: b(x; W) = thr(Wx)
x – input vector
W – parameter matrix (the kth bit is determined by the kth row of W)
thr(·) – binary quantization
Random projections are used by locality-sensitive hashing (LSH) and related techniques [Indyk & Motwani '98; Charikar '02; Raginsky & Lazebnik '09]
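A minimal NumPy sketch of a hash function of this form, thresholding linear projections of the input (the dimensions, seed, and function names here are illustrative, not from the slides):

```python
import numpy as np

def hash_codes(X, W):
    """Map input vectors (rows of X) to binary codes via b(x; W) = thr(Wx).

    The kth bit of each code is 1 iff the kth row of W has a
    non-negative dot product with the input vector.
    """
    return (X @ W.T >= 0).astype(np.uint8)

# LSH-style baseline: W drawn as random Gaussian projections.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8))   # 32-bit codes for 8-D inputs
X = rng.standard_normal((5, 8))    # 5 example input vectors
B = hash_codes(X, W)               # shape (5, 32), entries in {0, 1}
```

Learning replaces the random W with one trained to preserve similarity.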
Learning Binary Hash Functions
Reasons to learn hash functions:
to find more compact binary codes
to preserve general similarity measures
Previous work
boosting [Shakhnarovich et al. '03]
neural nets [Salakhutdinov & Hinton '07; Torralba et al. '07]
spectral methods [Weiss et al. '08]
loss-based methods [Kulis & Darrell '09]
…
Formulation
Input data: x ∈ R^p
Similarity labels: s ∈ {0, 1}
Hash function: b(x; W) = thr(Wx)
Binary codes: h = b(x1; W), g = b(x2; W) ∈ {0, 1}^q
Loss Function
Hash code quality is measured by a loss function L(h, g, s):
h – code for item 1
g – code for item 2
s – similarity label
The loss measures the consistency of the codes with the label:
Similar items should map to nearby hash codes
Dissimilar items should map to very different codes
Hinge Loss
Similar items should map to codes within a radius of ρ bits
Dissimilar items should map to codes no closer than ρ bits
Empirical Loss
Given training pairs with similarity labels, the empirical loss sums the loss over all pairs.
Good: incorporates quantization and Hamming distance
Not so good: discontinuous, non-convex objective function
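A self-contained sketch of the empirical loss under the hinge-like form above (the threshold rho and the 0.5 weight on dissimilar pairs are illustrative assumptions):

```python
import numpy as np

def empirical_loss(X1, X2, S, W, rho=2):
    """Sum of hinge-like losses over training pairs (illustrative form).

    X1, X2: paired input vectors (one pair per row); S: 0/1 labels.
    """
    B1 = (X1 @ W.T >= 0).astype(int)     # quantized codes for items 1
    B2 = (X2 @ W.T >= 0).astype(int)     # quantized codes for items 2
    D = (B1 != B2).sum(axis=1)           # Hamming distances
    sim = np.maximum(D - rho + 1, 0)     # similar pairs: too far apart
    dis = np.maximum(rho - D + 1, 0)     # dissimilar pairs: too close
    return float(np.where(S == 1, sim, 0.5 * dis).sum())
```

Because the codes pass through the hard threshold, this objective is piecewise constant in W, which is exactly why direct gradient descent fails and a bound is minimized instead.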
We minimize an upper bound on the empirical loss, inspired by structural SVM formulations [Taskar et al. '03; Tsochantaridis et al. '04; Yu & Joachims '09]
Bound on loss
Bound on loss
Remarks:
piecewise linear in W
convex-concave in W
relates to structural SVM with latent variables [Yu & Joachims '09]
Bound on Empirical Loss
Loss-adjusted inference: exact and efficient
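Loss-adjusted inference finds the code pair maximizing loss plus score. The paper gives an exact and efficient algorithm; the sketch below instead brute-forces the same maximization for tiny code lengths only, as an illustration (the additive score h·(Wx1) + g·(Wx2) is the standard structural-SVM form; exhaustive search scales as 4^q and is not the paper's method):

```python
import itertools
import numpy as np

def loss_adjusted_inference(x1, x2, s, W, loss):
    """Brute-force loss-adjusted inference for tiny code lengths q.

    Maximizes loss(d(h, g), s) + h.(W x1) + g.(W x2) over all code
    pairs, where d is the Hamming distance between h and g.
    """
    q = W.shape[0]
    a1, a2 = W @ x1, W @ x2
    best, best_pair = -np.inf, None
    for h in itertools.product([0, 1], repeat=q):
        for g in itertools.product([0, 1], repeat=q):
            d = sum(hi != gi for hi, gi in zip(h, g))
            val = loss(d, s) + np.dot(h, a1) + np.dot(g, a2)
            if val > best:
                best, best_pair = val, (np.array(h), np.array(g))
    return best_pair
```

With a zero loss this reduces to ordinary inference, recovering the thresholded codes bit by bit.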
Perceptron-like Learning
Initialize W with LSH (random projections)
Iterate over training pairs:
• Compute the codes given by the current W
• Solve the loss-adjusted inference for the pair
• Update W toward the current codes and away from the loss-adjusted codes
[McAllester et al., 2010]
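One illustrative form of such a perceptron-like step (the structured-perceptron update on outer products is an assumption about the exact update; the step size eta is a hypothetical hyperparameter):

```python
import numpy as np

def perceptron_step(W, x1, x2, h1, h2, g1, g2, eta=0.1):
    """One perceptron-like update on the parameter matrix W.

    h1, h2: codes given by the current W (via thr(W x)).
    g1, g2: codes returned by loss-adjusted inference.
    Raises the score of the current codes relative to the
    loss-adjusted codes; no change when the two coincide.
    """
    grad = (np.outer(h1, x1) + np.outer(h2, x2)
            - np.outer(g1, x1) - np.outer(g2, x2))
    return W + eta * grad
```

When loss-adjusted inference returns the same codes as the current hash function, the pair incurs no update, mirroring the classic perceptron.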
Experiment: Euclidean ANN
Similarity based on Euclidean distance
Datasets:
LabelMe (GIST)
MNIST (pixels)
PhotoTourism (SIFT)
Peekaboom (GIST)
Nursery (8D attributes)
10D Uniform
Experiment: Euclidean ANN
22K LabelMe
512D GIST
20K training
2K testing
~1% of pairs are similar
Evaluation
Precision: #hits / number of items retrieved
Recall: #hits / number of similar items
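The two evaluation measures above can be sketched directly from their definitions (function and variable names are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set.

    Precision: hits / number of items retrieved.
    Recall:    hits / number of similar (relevant) items.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Sweeping the Hamming-radius threshold used for retrieval traces out the precision-recall curves shown on the following slides.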
Techniques of interest
MLH – minimal loss hashing (this work)
LSH – locality-sensitive hashing (Charikar '02)
SH – spectral hashing (Weiss, Torralba & Fergus '08)
SIKH – shift-invariant kernel hashing (Raginsky & Lazebnik '09)
BRE – binary reconstructive embedding (Kulis & Darrell '09)
Euclidean LabelMe – 32 bits
Euclidean LabelMe – 64 bits
Euclidean LabelMe – 128 bits
Euclidean LabelMe – 256 bits
Experiment: Semantic ANN
Semantic similarity measure based on annotations (object labels) from the LabelMe database:
512D GIST, 20K training, 2K testing
Techniques of interest
MLH – minimal loss hashing
NN – nearest neighbor in GIST space
NNCA – multilayer network with RBM pre-training and nonlinear NCA fine-tuning [Torralba et al. '09; Salakhutdinov & Hinton '07]
Semantic LabelMe
Summary
A formulation for learning binary hash functions based on:
structured prediction with latent variables
a hinge-like loss function for similarity search
Experiments show that with minimal loss hashing
binary codes can be made more compact
semantic similarity based on human labels can be preserved
Thank you!
Questions?