TEMPORAL EVENT CLUSTERING FOR DIGITAL PHOTO COLLECTIONS
Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox
ACM Multimedia; ACM Transactions on Multimedia Computing, Communications, and Applications
OUTLINE
- Introduction
- Feature extraction
- Clustering techniques
  - Supervised event clustering
  - Unsupervised event clustering
- Clustering goodness criteria
- Experimental results
- Conclusion
INTRODUCTION
- Users navigate their photos by temporal order and by visual content.
- They associate time and content with the notion of a specific "event."
- Photos associated with an event often exhibit little coherence in terms of low-level image features or visual similarity; however, photographs from the same event are taken in relatively close temporal proximity.
BASIC CONCEPTS --- EVENT
- Events are naturally associated with specific times and places: a birthday party, a vacation, a wedding.
BASIC CONCEPTS --- EXIF & CBIR
- Exchangeable Image File (EXIF) metadata: time, location, focal length, flash, etc. => season, place, weather, indoor/outdoor, etc.
- Content-Based Image Retrieval (CBIR): color, texture, shape, etc. => face and fingerprint recognition, etc.
FEATURE EXTRACTION
- EXIF headers are processed to extract each photo's timestamp.
- The N photos in the collection are then ordered in time so the resulting timestamps, {t_n : n = 1, . . . , N}, satisfy t_1 <= t_2 <= . . . <= t_N.
- The time difference between consecutive photos is nonuniform.

[Figure: photos t_1, t_2, . . . , t_N placed on a nonuniform timeline t]
FEATURE EXTRACTION --- Computing similarity matrices S_K

The temporal similarity matrix at scale K is

  S_K(i, j) = exp( -|t_i - t_j| / K )
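The temporal similarity matrix described above is a few lines of NumPy. This is a minimal sketch, not the authors' code; the function and variable names are ours:

```python
import numpy as np

def temporal_similarity(timestamps, K):
    """S_K(i, j) = exp(-|t_i - t_j| / K) over time-sorted timestamps.

    Larger K treats photos further apart in time as still similar.
    Sketch only; names are ours, not from the paper.
    """
    t = np.asarray(timestamps, dtype=float)
    diff = np.abs(t[:, None] - t[None, :])   # pairwise |t_i - t_j|
    return np.exp(-diff / K)

# Three photos a minute apart, then one a day later (times in seconds).
S = temporal_similarity([0, 60, 120, 86400], K=3600)
```

Photos close in time get similarity near 1, while the day-later photo scores near 0 against the rest.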
FEATURE EXTRACTION --- Computing the content-based similarity matrix

Low-frequency discrete cosine transform (DCT) coefficients v_i are extracted from each photo and compared with the cosine similarity measure:

  S_C(i, j) = <v_i, v_j> / ( ||v_i|| ||v_j|| )
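The content-based matrix is the same computation with feature vectors in place of timestamps. A sketch, with made-up feature values standing in for the low-frequency DCT coefficients:

```python
import numpy as np

def content_similarity(features):
    """S_C(i, j) = <v_i, v_j> / (||v_i|| ||v_j||).

    Each row of `features` stands in for one photo's low-frequency DCT
    coefficients; the example values below are hypothetical.
    """
    V = np.asarray(features, dtype=float)
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-length rows
    return Vn @ Vn.T

# Two visually similar photos and one dissimilar one (toy features).
SC = content_similarity([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
```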
FEATURE EXTRACTION --- Computing novelty scores

The novelty score correlates a checkerboard kernel g along the main diagonal of the similarity matrix:

  nu_K(i) = Sum_{l = -L}^{L} Sum_{m = -L}^{L} g(l, m) S_K(i + l, i + m)

[Figure: novelty scores at scales K = 1000, K = 10000, K = 100000]

Peaks in the novelty scores mark cluster boundaries between contiguous groups of similar photos.
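The novelty computation above can be sketched with a simple signed checkerboard kernel (an unweighted variant; the paper's kernel may include a Gaussian taper):

```python
import numpy as np

def checkerboard_kernel(L):
    """Signed checkerboard: +1 on same-side quadrants, -1 across the
    center, zero on the center row/column."""
    s = np.sign(np.arange(-L, L + 1))
    return np.outer(s, s)

def novelty(S, L):
    """nu_K(i) = sum over l, m in [-L, L] of g(l, m) * S(i+l, i+m)."""
    N = S.shape[0]
    g = checkerboard_kernel(L)
    nu = np.zeros(N)
    for i in range(L, N - L):   # skip edges where the kernel overhangs
        nu[i] = np.sum(g * S[i - L:i + L + 1, i - L:i + L + 1])
    return nu

# Two tight groups of timestamps; the event boundary sits between them.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
S = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)
nu = novelty(S, L=2)
```

The novelty curve peaks where the similarity matrix switches from one coherent block to the next.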
CLUSTERING TECHNIQUES
- Supervised event clustering: based on LVQ
- Unsupervised event clustering:
  - scale-space analysis of the raw timestamp data
  - temporal similarity analysis
  - combining time and content-based similarity
Supervised event clustering
- Let K take M values: K ≡ {K_1, . . . , K_M}.
- Define the M × N matrix N with N(j, i) = nu_{K_j}(i); the novelty feature vector for photo i is the column

  N_i = [ nu_{K_1}(i), . . . , nu_{K_M}(i) ]^T

- Based on LVQ (Learning Vector Quantization) [Kohonen 1989].
- The LVQ codebook discriminates between the two classes "event boundary" and "event interior."
- The codebook vectors for each class are used for nearest-neighbor classification of the novelty features for each photo in the test set.
Supervised event clustering
In the training phase, a codebook is calculated using an iterative procedure. At each step, the codebook vector c_M nearest to the current training sample x_N is determined and shifted toward or away from that sample:

  c_M(t + 1) = c_M(t) + alpha(t) [ x_N(t) - c_M(t) ]   if x_N and c_M are in the same class
  c_M(t + 1) = c_M(t) - alpha(t) [ x_N(t) - c_M(t) ]   if x_N and c_M are not in the same class
ALGORITHM 1 (LVQ-BASED PHOTO CLUSTERING).
(1) Calculate novelty features from labeled, sorted training data for each scale K:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
(2) Train the LVQ using the iterative procedure.
(3) Calculate novelty features for the test data for each K:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
(4) Classify each test sample's novelty features N_i using the LVQ codebook and the nearest-neighbor rule.
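The LVQ train/classify steps can be sketched as follows. This is a toy LVQ1 implementation with our own data and names; the learning rate is held constant rather than decayed as a full LVQ implementation would:

```python
import numpy as np

def lvq1_train(X, y, codebook, labels, alpha=0.1, epochs=20, seed=0):
    """Shift the nearest codebook vector toward a same-class sample
    and away from a different-class sample."""
    rng = np.random.default_rng(seed)
    C = np.array(codebook, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            m = int(np.argmin(np.linalg.norm(C - X[i], axis=1)))
            sign = 1.0 if labels[m] == y[i] else -1.0
            C[m] += sign * alpha * (X[i] - C[m])
    return C

def lvq_classify(X, codebook, labels):
    """Nearest-neighbor rule against the trained codebook."""
    d = np.linalg.norm(codebook[None, :, :] - X[:, None, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Toy 1-D novelty features: interiors score low, boundaries high.
X = np.array([[0.1], [0.2], [0.9], [1.0], [0.15], [0.95]])
y = np.array([0, 0, 1, 1, 0, 1])          # 0 = interior, 1 = boundary
C = lvq1_train(X, y, [[0.5], [0.6]], np.array([0, 1]))
pred = lvq_classify(X, C, np.array([0, 1]))
```

After training, the two codebook vectors settle near the interior and boundary feature clusters respectively.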
UNSUPERVISED EVENT CLUSTERING --- Scale-space analysis
- Operates on the raw timestamps T_0 = [t_1, . . . , t_N], so that T_0(i) = t_i.

ALGORITHM 2 (SCALE-SPACE PHOTO CLUSTERING).
(1) Extract timestamp data from the photo collection: {t_1, . . . , t_N}.
(2) For each sigma in descending order:
    (i) compute T_sigma
    (ii) detect peaks in T_sigma, tracing peaks from larger to smaller scales (decreasing sigma).

Here T_sigma is the timestamp signal smoothed with a Gaussian kernel at scale sigma:

  T_sigma(i) = Sum_j T_0(j) * (1 / (sigma * sqrt(2 pi))) * exp( -(i - j)^2 / (2 sigma^2) )
UNSUPERVISED EVENT CLUSTERING --- Temporal Similarity Analysis
- Locate peaks at each scale by analyzing the first difference of each novelty score nu_K, proceeding from coarse scale to fine (decreasing K).
- To build a hierarchical set of event boundaries, boundaries detected at coarse scales are included in the boundary lists for all finer scales.

  nu_K(i) = Sum_{l = -L}^{L} Sum_{m = -L}^{L} g(l, m) S_K(i + l, i + m),   S_K(i, j) = exp( -|t_i - t_j| / K )

[Figure: checkerboard kernel used to compute the novelty features]
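The coarse-to-fine boundary accumulation can be sketched as follows, with peak detection via the first difference of the novelty curve; the synthetic novelty arrays are ours:

```python
import numpy as np

def peaks(nu):
    """Peak = first difference changes from positive to non-positive."""
    d = np.diff(nu)
    return [i for i in range(1, len(nu) - 1) if d[i - 1] > 0 and d[i] <= 0]

def hierarchical_boundaries(novelty_by_scale):
    """Process scales K coarse-to-fine (large K first); boundaries found
    at coarse scales are carried into every finer scale's list."""
    carried, out = set(), {}
    for K in sorted(novelty_by_scale, reverse=True):
        carried |= set(peaks(novelty_by_scale[K]))
        out[K] = sorted(carried)
    return out

# One boundary visible at the coarse scale, two at the fine scale.
b = hierarchical_boundaries({
    1000: np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
    100:  np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0]),
})
```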
UNSUPERVISED EVENT CLUSTERING --- Combining Time and Content-Based Similarity
- Construct a content-based matrix S_C using low-frequency DCT features and the cosine similarity:

  S_C(i, j) = <v_i, v_j> / ( ||v_i|| ||v_j|| )

- Combine it with the temporal similarity S_K, using content information only within a 48-hour window:

  S_J(i, j) = S_K(i, j)                  if |t_i - t_j| > 48 h
  S_J(i, j) = S_K(i, j) * S_C(i, j)      otherwise

- An alternative variant takes the maximum instead of the product:

  S_J(i, j) = S_K(i, j)                      if |t_i - t_j| > 48 h
  S_J(i, j) = max( S_K(i, j), S_C(i, j) )    otherwise
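The fusion step can be sketched as below, under the assumption that content similarity is applied only within a 48-hour window, in either a product or a max combination (the exact form of the rule is our reading; the matrices and timestamps are toy values):

```python
import numpy as np

def combined_similarity(SK, SC, timestamps, mode="product"):
    """Fuse temporal and content similarity. Outside a 48 h window only
    S_K is used; within it, the product or max variant applies."""
    t = np.asarray(timestamps, dtype=float)
    within = np.abs(t[:, None] - t[None, :]) <= 48 * 3600
    fused = SK * SC if mode == "product" else np.maximum(SK, SC)
    return np.where(within, fused, SK)

t = [0, 3600, 7 * 86400]   # third photo a week later
SK = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])
SC = np.array([[1.0, 0.2, 0.9], [0.2, 1.0, 0.1], [0.9, 0.1, 1.0]])
SJ = combined_similarity(SK, SC, t)
```

Note how the visually similar but week-apart pair (photos 0 and 2) keeps its low temporal similarity: content cannot merge events separated by more than 48 hours.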
CLUSTERING GOODNESS CRITERIA
- Peak detection at each scale K results in a hierarchical set of candidate boundaries.
- A subset must be selected to define the final event clusters.
- Three automatic approaches:
  - similarity-based confidence score
  - boundary selection via dynamic programming
  - BIC-based boundary selection
Similarity-Based Confidence Score
- Detected boundaries at each level K, B_K = {b_1, . . . , b_{n_K}}, are indexed by photo: B_K ⊂ {1, . . . , N}.
- The score rewards high average intracluster similarity between the photos within each cluster and penalizes high average intercluster similarity between photos in adjacent clusters:

  C_S(B_K) = Sum_l [ (1 / (b_{l+1} - b_l)^2) Sum_{i = b_l}^{b_{l+1} - 1} Sum_{j = b_l}^{b_{l+1} - 1} S(i, j)
                   - (1 / (2 (b_{l+1} - b_l)(b_{l+2} - b_{l+1}))) Sum_{i = b_l}^{b_{l+1} - 1} Sum_{j = b_{l+1}}^{b_{l+2} - 1} S(i, j) ]
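The intra-minus-inter criterion can be sketched as follows; the exact normalization is our reading of it, and the block-diagonal similarity matrix is a toy example:

```python
import numpy as np

def confidence_score(S, boundaries):
    """Average intracluster similarity minus average intercluster
    similarity between adjacent clusters. `boundaries` lists the first
    photo of each cluster, plus N as a final sentinel."""
    b = list(boundaries)
    score = 0.0
    for l in range(len(b) - 1):
        score += S[b[l]:b[l + 1], b[l]:b[l + 1]].mean()          # intra
        if l + 2 < len(b):
            score -= S[b[l]:b[l + 1], b[l + 1]:b[l + 2]].mean()  # inter
    return score

# Block-diagonal similarity: the true boundary at photo 2 scores best.
S = np.full((4, 4), 0.1)
S[:2, :2] = 1.0
S[2:, 2:] = 1.0
good = confidence_score(S, [0, 2, 4])   # boundary at the true block edge
bad = confidence_score(S, [0, 1, 4])    # boundary inside a block
```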
Boundary Selection via Dynamic Programming
- Reduced complexity.
- Begin with the union of the peaks detected from the novelty features at all scales: B = ∪_K B_K.
- Cost of the cluster between boundaries b_i and b_j:

  C_F(b_i, b_j) = Sum_{n = b_i}^{b_j - 1} | t_n - mu_ij |,   where mu_ij = (1 / (b_j - b_i)) Sum_{n = b_i}^{b_j - 1} t_n
Boundary Selection via Dynamic Programming
- Optimal partitions with m boundaries are built from the optimal partitions with m - 1 boundaries; first, optimal partitions with two clusters are computed.
- E_F(j, m) is the cost of the optimal partition of the photos up to boundary b_j into m clusters:

  E_F(j, 2) = min_{1 < i < j} [ C_F(b_1, b_i) + C_F(b_i, b_j) ]
  E_F(j, M) = min_{i < j} [ E_F(i, M - 1) + C_F(b_i, b_j) ],   3 <= M <= j
Boundary Selection via Dynamic Programming
- As the number of clusters increases, the total cost of the partition decreases monotonically.
- The optimal number of clusters, M*, is selected from the total partition cost:

  M* = argmax_{2 <= m <= M - 1} g(m),   where g(m) = E_F(m) / E_F(m + 1)
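The dynamic-programming recursion can be sketched as follows. The absolute-deviation cluster cost is our reconstruction, and the cubic-time loop is kept deliberately plain:

```python
import numpy as np

def cluster_cost(t, bi, bj):
    """C_F(b_i, b_j): total absolute deviation of t[bi:bj] from its mean."""
    seg = t[bi:bj]
    return float(np.abs(seg - seg.mean()).sum())

def dp_partition(t, b, m):
    """Best m-cluster partition over candidate boundaries `b` via
    E(j, k) = min_i E(i, k-1) + C_F(b_i, b_j).
    `b` must start at 0 and end at len(t)."""
    n, INF = len(b), float("inf")
    E = np.full((n, m + 1), INF)
    back = np.zeros((n, m + 1), dtype=int)
    E[0, 0] = 0.0
    for j in range(1, n):
        for k in range(1, m + 1):
            for i in range(j):
                c = E[i, k - 1] + cluster_cost(t, b[i], b[j])
                if c < E[j, k]:
                    E[j, k], back[j, k] = c, i
    cuts, j, k = [b[-1]], n - 1, m       # backtrack chosen boundaries
    while k > 0:
        j = back[j, k]
        cuts.append(b[j])
        k -= 1
    return E[n - 1, m], sorted(cuts)

# Two tight groups: the optimal 2-cluster cut lands at photo index 3.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
cost, cuts = dp_partition(t, [0, 1, 2, 3, 4, 5, 6], m=2)
```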
BIC-Based Boundary Selection
- This method is based on the Bayes information criterion (BIC) [Schwarz 1978].
- Assumption: timestamps within an event are distributed normally around the event mean.
- The log-likelihood of the timestamps in segment [b_l, b_{l+1}) under the segment's sample mean mu_l and variance sigma_l^2 is

  l(b_l, b_{l+1}) = Sum_{n = b_l}^{b_{l+1} - 1} log N(t_n ; mu_l, sigma_l^2) = -((b_{l+1} - b_l) / 2) [ log(2 pi sigma_l^2) + 1 ]

- A candidate boundary b_l is tested by comparing

  l(b_{l-1}, b_l) + l(b_l, b_{l+1})   vs.   l(b_{l-1}, b_{l+1}) + (lambda / 2) log(b_{l+1} - b_{l-1})

- The left side is the log-likelihood of the two-segment model; the right side is the log-likelihood of the single-segment model plus the penalty term.
- lambda is 2, since we describe each segment using the sample mean mu and variance sigma^2.
BIC-Based Boundary Selection
- Employ the hierarchical coarse-to-fine approach.
- At each scale, test only the newly detected boundaries (those undetected at coarser scales).
- Add the boundaries for which the two-segment log-likelihood exceeds the penalized single-segment log-likelihood.
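The BIC test for a single candidate boundary can be sketched as follows; the variance floor is our safeguard against degenerate segments, not part of the paper:

```python
import numpy as np

def seg_loglik(t, a, b):
    """Gaussian log-likelihood of t[a:b] under its own sample mean and
    variance: -(n/2) * (log(2*pi*var) + 1)."""
    n = b - a
    var = max(float(np.var(t[a:b])), 1e-12)  # floor: our safeguard
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def bic_margin(t, a, b, c, lam=2.0):
    """Two-segment log-likelihood minus penalized one-segment
    log-likelihood; a boundary at b is accepted when this is > 0.
    lam = 2 for the per-segment mean and variance."""
    two = seg_loglik(t, a, b) + seg_loglik(t, b, c)
    one = seg_loglik(t, a, c) + 0.5 * lam * np.log(c - a)
    return two - one

# Two tight groups with a large gap: the true boundary at index 3 wins.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
```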
ALGORITHM 3 (SIMILARITY-BASED PHOTO CLUSTERING).
(1) Extract and sort photo timestamps, {t_1, . . . , t_N}.
(2) For each K in decreasing order:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
    (iii) detect peaks in nu_K
    (iv) form the event boundary list from the event boundaries of previous iterations and the newly detected peaks.
(3) Determine a final boundary subset from the collected boundaries over all scales according to one of the methods:
    (a) the confidence score
    (b) the DP boundary selection approach
    (c) the BIC boundary selection approach
EXPERIMENTAL RESULTS

[Table: run times, in seconds, for different-size photo collections]
- "No Conf." indicates times for Steps 1 and 2 only
- BIC: BIC peak selection
- DP: dynamic programming peak selection
- Conf.: similarity-based peak selection

Doubling the number of photos N, the time for the segmentation step (No Conf.) increases linearly, while including the confidence measure (Conf.) incurs a polynomial cost.
EXPERIMENTAL RESULTS

We compare the event clustering performance of eleven systems on two separate photo collections:
- Collection I consists of 1036 photos taken over 15 months.
- Collection II consists of 413 photos taken over 13 months.

The first four algorithms in the table are "hand-tuned" to maximize performance; the remaining algorithms are fully automatic.
EXPERIMENTAL RESULTS

- Precision indicates the proportion of detected boundaries that are correct:

  precision = (correctly detected boundaries) / (total number of detected boundaries)

- Recall measures the proportion of true boundaries detected:

  recall = (correctly detected boundaries) / (total number of ground-truth boundaries)

- The F-score is a composite of precision and recall:

  F-score = (2 * precision * recall) / (precision + recall)
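The three metrics above are straightforward to compute over sets of boundary indices (a sketch with made-up detections):

```python
def boundary_scores(detected, truth):
    """Precision, recall, and F-score over sets of boundary indices."""
    detected, truth = set(detected), set(truth)
    correct = len(detected & truth)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(truth) if truth else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# 3 detections, 2 of them correct, against 4 ground-truth boundaries.
p, r, f = boundary_scores([3, 7, 9], [3, 9, 12, 20])
```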
EXPERIMENTAL RESULTS

- The adaptive-thresholding algorithms exhibited high recall and low precision on both test sets, even with manual tuning.
- The scale-space and the two similarity-based approaches demonstrated more consistent performance and traded off precision and recall more evenly.
CONCLUSION
- We employed an automatic temporal similarity-based method that does not rely on preset thresholds or restrictive assumptions.
- As photo collections with location information become available, we hope to extend the system to combine temporal similarity, content-based similarity, and location-based similarity.
- In our testing, the automatic methods' performance exceeded that of manually tuned alternatives, and the methods have been well received by users of our photo management application.