Graph Regularised Hashing
Sean Moran and Victor Lavrenko
Institute for Language, Cognition and Computation, School of Informatics
University of Edinburgh
ECIR’15 Vienna, March 2015
Graph Regularised Hashing (GRH)
Overview
GRH
Evaluation
Conclusion
Locality Sensitive Hashing

[Figure: the LSH pipeline. A hash function H maps every database item to a short binary code (e.g. 110101), which places it in a bucket of a hash table. At query time the same hash function H encodes the query; similarity is computed only against the items in the matching bucket, and the most similar candidates are returned as the nearest neighbours.]

Applications: content-based IR, location recognition, near-duplicate detection.
Images: Imense Ltd; Doersch et al.; Xu et al.
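The pipeline above can be sketched with random-hyperplane LSH. This is a minimal illustration; the hyperplanes, dimensions and data are invented for the example, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)
K, d = 6, 32                       # 6-bit codes over 32-d descriptors

# One random hyperplane per bit: the bit records which side a point falls on
H = rng.normal(size=(K, d))

def hash_code(x):
    return tuple((H @ x >= 0).astype(int))

# Index the database into hash-table buckets keyed by binary code
database = rng.normal(size=(1000, d))
table = {}
for i, x in enumerate(database):
    table.setdefault(hash_code(x), []).append(i)

# Query: hash, fetch only the matching bucket, then compute exact
# similarity against those few candidates instead of the whole database
query = database[0] + 0.01 * rng.normal(size=d)
candidates = table.get(hash_code(query), [])
nearest = sorted(candidates,
                 key=lambda i: np.linalg.norm(database[i] - query))[:5]
```

Close points tend to share codes, so a bucket holds far fewer items than the full database; this is what makes the lookup sub-linear.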
Previous work
I Data-independent: Locality Sensitive Hashing (LSH) [Indyk '98]
I Data-dependent (unsupervised): Anchor Graph Hashing (AGH) [Liu et al. '11], Spectral Hashing (SH) [Weiss '08]
I Data-dependent (supervised): Self Taught Hashing (STH) [Zhang '10], Supervised Hashing with Kernels (KSH) [Liu et al. '12], ITQ + CCA [Gong and Lazebnik '11], Binary Reconstructive Embedding (BRE) [Kulis and Darrell '09]
Previous work
Method    Data-Dependent  Supervised  Scalable  Effectiveness
LSH                                   X         Low
SH        X                                     Low
STH       X               X                     Medium
BRE       X               X                     Medium
ITQ+CCA   X               X                     Medium
KSH       X               X                     High
GRH       X               X           X         High
Graph Regularised Hashing (GRH)
I Two-step iterative hashing model:
I Step A: Graph Regularisation

    Lm ← sgn( α S D⁻¹ Lm−1 + (1−α) L0 )

I Step B: Data-Space Partitioning

    for k = 1…K:  min ‖hk‖² + C Σᵢ₌₁ᴺ ξik
    s.t.  Lik (hkᵀ xi + bk) ≥ 1 − ξik,  for i = 1…N

I Repeat for a set number of iterations (M)
Graph Regularised Hashing (GRH)
I Step A: Graph Regularisation [Diaz '07][1]

    Lm ← sgn( α S D⁻¹ Lm−1 + (1−α) L0 )

I S: affinity (adjacency) matrix
I D: diagonal degree matrix
I L: binary bits at specified iteration
I α: interpolation parameter (0 ≤ α ≤ 1)

[1] Diaz, F.: Regularizing query-based retrieval scores. In: IR (2007)
Graph Regularised Hashing (GRH)

[Figure: three points with codes a = (−1 −1 −1), b = (−1 1 1), c = (1 1 1); edges link a–b and b–c.]

S      a     b     c
a      1     1     0
b      1     1     1
c      0     1     1

D⁻¹    a     b     c
a     0.5    0     0
b      0    0.33   0
c      0     0    0.5

L0     b1    b2    b3
a      −1    −1    −1
b      −1     1     1
c       1     1     1
Graph Regularised Hashing (GRH)

[Figure: the same three points a, b, c.]

L1 = sgn ( [ −1      0     0
             −0.33  0.33  0.33
              0      1     1 ] )
Graph Regularised Hashing (GRH)

[Figure: the points with updated codes a = (−1 1 1), b = (−1 1 1), c = (1 1 1).]

L1     b1    b2    b3
a      −1     1     1
b      −1     1     1
c       1     1     1
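The worked example above can be checked numerically. One caveat: the intermediate matrix on the slide matches row-normalising S (i.e. D⁻¹S) with α = 1, so that is what this sketch computes, and ties at zero are broken to +1 to match the slide's result:

```python
import numpy as np

S = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)      # affinity matrix
D_inv = np.diag(1.0 / S.sum(axis=1))        # inverse degree matrix
L0 = np.array([[-1, -1, -1],
               [-1,  1,  1],
               [ 1,  1,  1]], dtype=float)  # initial bits

smoothed = D_inv @ S @ L0                   # neighbourhood average of the bits
L1 = np.where(smoothed >= 0, 1, -1)         # sgn, with ties broken to +1
```

Point a's first bit stays −1 (both it and its only neighbour b agree), while its second and third bits are pulled to +1 by the neighbourhood average.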
Graph Regularised Hashing (GRH)
I Step B: Data-Space Partitioning

    for k = 1…K:  min ‖hk‖² + C Σᵢ₌₁ᴺ ξik
    s.t.  Lik (hkᵀ xi + bk) ≥ 1 − ξik,  for i = 1…N

I hk: hyperplane k
I bk: bias of hyperplane k
I xi: data-point i
I Lik: bit k of data-point i
I ξik: slack variable ik
I K: # bits
I N: # data-points
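Step B amounts to training one linear SVM per bit, with the regularised bits as labels. A minimal sketch, under assumptions: plain subgradient descent stands in for the SVM solver, and the data and target bits are synthetic:

```python
import numpy as np

def fit_hyperplane(X, bits, C=1.0, lr=0.01, epochs=200):
    """Fit (h, b) minimising ||h||^2 + C*sum(hinge) so that sign(h.x + b) ~ bits."""
    n, d = X.shape
    h, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = bits * (X @ h + b)
        viol = margins < 1                  # points violating the margin
        # subgradient of ||h||^2 + C * sum of hinge losses
        grad_h = 2 * h - C * (bits[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * bits[viol].sum()
        h -= lr * grad_h
        b -= lr * grad_b
    return h, b

# Synthetic check: one bit that should separate the left/right half-planes
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
bits = np.where(X[:, 0] > 0, 1, -1)         # target bit per data-point
h, b = fit_hyperplane(X, bits)
accuracy = (np.where(X @ h + b >= 0, 1, -1) == bits).mean()
```

Repeating this for k = 1…K yields K hyperplanes; an unseen point's code is then just the sign pattern of (hkᵀx + bk) across the K hyperplanes.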
Graph Regularised Hashing (GRH)

[Figure: eight data-points a–h scattered in feature space.]
Graph Regularised Hashing (GRH)

[Figure: the eight points are assigned 3-bit codes, e.g. a = (−1 1 1), d = (1 1 1), h = (1 −1 −1).]

[Figure: graph regularisation flips the first bit of one point and the second bit of another, so that each point's code agrees with those of its graph neighbours.]

[Figure: Step B fits one hyperplane per bit to the regularised codes: h1 · x − b1 = 0 separates the negative (−1) half-space from the positive (+1) half-space for bit 1, and h2 · x − b2 = 0 does the same for bit 2.]
Evaluation
Datasets/Features
I Standard evaluation datasets [Liu et al. '12], [Gong and Lazebnik '11]:
I CIFAR-10: 60K images, GIST descriptors, 10 classes1
I MNIST: 70K images, grayscale pixels, 10 classes2
I NUSWIDE: 270K images, GIST descriptors, 21 classes3
I True NNs: images that share at least one class in common[Liu et al. ’12]
1 http://www.cs.toronto.edu/~kriz/cifar.html
2 http://yann.lecun.com/exdb/mnist/
3 http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
Evaluation Metrics
I Hamming ranking evaluation paradigm [Liu et al. '12], [Gong and Lazebnik '11]
I Standard evaluation metrics [Liu et al. '12], [Gong and Lazebnik '11]:
I Mean average precision (mAP)
I Precision at Hamming radius 2 (P@R2)
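For concreteness, Hamming-ranking mAP can be sketched as follows. This is an illustrative implementation, not the authors' evaluation code; `relevant[q, i]` encodes the "share at least one class" criterion above:

```python
import numpy as np

def hamming_rank_map(query_codes, db_codes, relevant):
    """mAP under Hamming ranking: rank the database by Hamming distance
    to each query, then average precision over the relevant items."""
    aps = []
    for q in range(len(query_codes)):
        dists = np.count_nonzero(db_codes != query_codes[q], axis=1)
        order = np.argsort(dists, kind="stable")   # rank by Hamming distance
        rel = relevant[q][order]
        if not rel.any():
            continue
        hits = np.cumsum(rel)                      # relevant seen so far
        precision_at_hit = hits[rel] / (np.flatnonzero(rel) + 1)
        aps.append(precision_at_hit.mean())        # average precision
    return float(np.mean(aps))

# Tiny example: one query, three database items, two of them relevant
q = np.array([[1, -1, 1, 1]])
db = np.array([[1, -1, 1, 1], [-1, 1, -1, -1], [1, -1, -1, 1]])
rel = np.array([[True, False, True]])
score = hamming_rank_map(q, db, rel)
```

P@R2 differs only in the cut-off: precision is computed over the items within Hamming radius 2 of the query rather than over the whole ranking.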
GRH vs Literature (CIFAR-10 @ 32 bits)

[Figure: bar chart of mAP (0.10–0.35) for LSH, BRE, STH, KSH, GRH (Linear) and GRH (RBF). GRH's straightforward objective outperforms the more complex objectives of the baselines.]
GRH vs Literature (CIFAR-10)

[Figure: mAP (0.10–0.40) versus # bits (16–64) for LSH, BRE, KSH and GRH. With only a small amount of supervision required, GRH outperforms the baselines by 25–30%.]
GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

[Figure: mAP (0.00–0.30) for linear and RBF GRH initialised with LSH versus ITQ+CCA codes. LSH initialisation suffices: the eigendecomposition is not necessary, saving O(d³).]
GRH vs # Supervisory Data-Points (CIFAR-10)

[Figure: mAP (0.00–0.40) for linear and RBF GRH with T = 1K versus T = 2K supervisory data-points. Only a small amount of supervision is required.]
GRH Timing (CIFAR-10 @ 32 bits)
Timings (s)

Method    Train    Test    Total
GRH       42.68    0.613   43.29
KSH [1]   81.17    0.103   82.27
BRE [2]   231.1    0.370   231.4
[1] Liu, W.: Supervised Hashing with Kernels. In: CVPR (2012)
[2] Kulis, B.: Binary Reconstructive Embedding. In: NIPS (2009)
Conclusion
Conclusions and Future Work
I Supervised hashing model that is both accurate and easily scalable
I Take-home messages:
I Regularising bits over a graph is effective (and efficient) for hashcode learning
I An intermediate eigendecomposition step is not necessary
I Hyperplanes (linear hypersurfaces) can achieve very good retrieval accuracy
I Future work: extend to the cross-modal hashing scenario (e.g. Image ↔ Text, English ↔ Spanish)