TEMPORAL EVENT CLUSTERING FOR DIGITAL PHOTO COLLECTIONS
Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox
ACM Multimedia; ACM Transactions on Multimedia Computing, Communications, and Applications
OUTLINE
- Introduction
- Feature extraction
- Clustering techniques
  - Supervised event clustering
  - Unsupervised event clustering
- Clustering goodness criteria
- Experimental results
- Conclusion
INTRODUCTION
- Users navigate their photos by temporal order and by visual content.
- They associate time and content with the notion of a specific "event."
- Photos associated with an event often exhibit little coherence in terms of low-level image features or visual similarity; however, photographs from the same event are taken in relatively close temporal proximity.
BASIC CONCEPTS --- EVENT
- Events are naturally associated with specific times and places: a birthday party, a vacation, a wedding.
BASIC CONCEPTS --- EXIF & CBIR
- Exchangeable Image File (EXIF) metadata: time, location, focal length, flash, etc. => season, place, weather, indoor/outdoor, etc.
- Content-Based Image Retrieval (CBIR): color, texture, shape, etc. => face and fingerprint recognition, etc.
FEATURE EXTRACTION
- EXIF headers are processed to extract each photo's timestamp.
- The N photos in the collection are then ordered in time so the resulting timestamps, {t_n : n = 1, . . . , N}, satisfy t_1 <= t_2 <= . . . <= t_N.
- The time difference between consecutive photos is nonuniform.

[Figure: photos t_1, t_2, . . . , t_N placed on a nonuniform timeline t]
FEATURE EXTRACTION --- Computing similarity matrices S_K

The temporal similarity matrix at scale K is

  S_K(i, j) = exp( -|t_i - t_j| / K )
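The temporal similarity matrix described above is a few lines of NumPy. This is a minimal sketch, not the authors' code; the function and variable names are ours:

```python
import numpy as np

def temporal_similarity(timestamps, K):
    """S_K(i, j) = exp(-|t_i - t_j| / K) over time-sorted timestamps.

    Larger K treats photos further apart in time as still similar.
    Sketch only; names are ours, not from the paper.
    """
    t = np.asarray(timestamps, dtype=float)
    diff = np.abs(t[:, None] - t[None, :])   # pairwise |t_i - t_j|
    return np.exp(-diff / K)

# Three photos a minute apart, then one a day later (times in seconds).
S = temporal_similarity([0, 60, 120, 86400], K=3600)
```

Photos close in time get similarity near 1, while the day-later photo scores near 0 against the rest.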
FEATURE EXTRACTION --- Computing the content-based similarity matrix

Low-frequency discrete cosine transform (DCT) coefficients v_i are extracted from each photo and compared with the cosine similarity measure:

  S_C(i, j) = <v_i, v_j> / ( ||v_i|| ||v_j|| )
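The content-based matrix is the same computation with feature vectors in place of timestamps. A sketch, with made-up feature values standing in for the low-frequency DCT coefficients:

```python
import numpy as np

def content_similarity(features):
    """S_C(i, j) = <v_i, v_j> / (||v_i|| ||v_j||).

    Each row of `features` stands in for one photo's low-frequency DCT
    coefficients; the example values below are hypothetical.
    """
    V = np.asarray(features, dtype=float)
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-length rows
    return Vn @ Vn.T

# Two visually similar photos and one dissimilar one (toy features).
SC = content_similarity([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
```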
FEATURE EXTRACTION --- Computing novelty scores

The novelty score correlates a checkerboard kernel g along the main diagonal of the similarity matrix:

  nu_K(i) = Sum_{l = -L}^{L} Sum_{m = -L}^{L} g(l, m) S_K(i + l, i + m)

[Figure: novelty scores at scales K = 1000, K = 10000, K = 100000]

Peaks in the novelty scores mark cluster boundaries between contiguous groups of similar photos.
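The novelty computation above can be sketched with a simple signed checkerboard kernel (an unweighted variant; the paper's kernel may include a Gaussian taper):

```python
import numpy as np

def checkerboard_kernel(L):
    """Signed checkerboard: +1 on same-side quadrants, -1 across the
    center, zero on the center row/column."""
    s = np.sign(np.arange(-L, L + 1))
    return np.outer(s, s)

def novelty(S, L):
    """nu_K(i) = sum over l, m in [-L, L] of g(l, m) * S(i+l, i+m)."""
    N = S.shape[0]
    g = checkerboard_kernel(L)
    nu = np.zeros(N)
    for i in range(L, N - L):   # skip edges where the kernel overhangs
        nu[i] = np.sum(g * S[i - L:i + L + 1, i - L:i + L + 1])
    return nu

# Two tight groups of timestamps; the event boundary sits between them.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
S = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)
nu = novelty(S, L=2)
```

The novelty curve peaks where the similarity matrix switches from one coherent block to the next.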
CLUSTERING TECHNIQUES
- Supervised event clustering: based on LVQ
- Unsupervised event clustering:
  - scale-space analysis of the raw timestamp data
  - temporal similarity analysis
  - combining time and content-based similarity
Supervised event clustering
- Let K take M values: K ≡ {K_1, . . . , K_M}.
- Define the M × N matrix N with N(j, i) = nu_{K_j}(i); the novelty feature vector for photo i is the column

  N_i = [ nu_{K_1}(i), . . . , nu_{K_M}(i) ]^T

- Based on LVQ (Learning Vector Quantization) [Kohonen 1989].
- The LVQ codebook discriminates between the two classes "event boundary" and "event interior."
- The codebook vectors for each class are used for nearest-neighbor classification of the novelty features for each photo in the test set.
Supervised event clustering
In the training phase, a codebook is calculated using an iterative procedure. At each step, the codebook vector c_M nearest to the current training sample x_N is determined and shifted toward or away from that sample:

  c_M(t + 1) = c_M(t) + alpha(t) [ x_N(t) - c_M(t) ]   if x_N and c_M are in the same class
  c_M(t + 1) = c_M(t) - alpha(t) [ x_N(t) - c_M(t) ]   if x_N and c_M are not in the same class
ALGORITHM 1 (LVQ-BASED PHOTO CLUSTERING).
(1) Calculate novelty features from labeled, sorted training data for each scale K:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
(2) Train the LVQ using the iterative procedure.
(3) Calculate novelty features for the test data for each K:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
(4) Classify each test sample's novelty features N_i using the LVQ codebook and the nearest-neighbor rule.
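The LVQ train/classify steps can be sketched as follows. This is a toy LVQ1 implementation with our own data and names; the learning rate is held constant rather than decayed as a full LVQ implementation would:

```python
import numpy as np

def lvq1_train(X, y, codebook, labels, alpha=0.1, epochs=20, seed=0):
    """Shift the nearest codebook vector toward a same-class sample
    and away from a different-class sample."""
    rng = np.random.default_rng(seed)
    C = np.array(codebook, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            m = int(np.argmin(np.linalg.norm(C - X[i], axis=1)))
            sign = 1.0 if labels[m] == y[i] else -1.0
            C[m] += sign * alpha * (X[i] - C[m])
    return C

def lvq_classify(X, codebook, labels):
    """Nearest-neighbor rule against the trained codebook."""
    d = np.linalg.norm(codebook[None, :, :] - X[:, None, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Toy 1-D novelty features: interiors score low, boundaries high.
X = np.array([[0.1], [0.2], [0.9], [1.0], [0.15], [0.95]])
y = np.array([0, 0, 1, 1, 0, 1])          # 0 = interior, 1 = boundary
C = lvq1_train(X, y, [[0.5], [0.6]], np.array([0, 1]))
pred = lvq_classify(X, C, np.array([0, 1]))
```

After training, the two codebook vectors settle near the interior and boundary feature clusters respectively.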
UNSUPERVISED EVENT CLUSTERING --- Scale-space analysis
- Operates on the raw timestamps T_0 = [t_1, . . . , t_N], so that T_0(i) = t_i.

ALGORITHM 2 (SCALE-SPACE PHOTO CLUSTERING).
(1) Extract timestamp data from the photo collection: {t_1, . . . , t_N}.
(2) For each sigma in descending order:
    (i) compute T_sigma
    (ii) detect peaks in T_sigma, tracing peaks from larger to smaller scales (decreasing sigma).

Here T_sigma is the timestamp signal smoothed with a Gaussian kernel at scale sigma:

  T_sigma(i) = Sum_j T_0(j) * (1 / (sigma * sqrt(2 pi))) * exp( -(i - j)^2 / (2 sigma^2) )
UNSUPERVISED EVENT CLUSTERING --- Temporal Similarity Analysis
- Locate peaks at each scale by analyzing the first difference of each novelty score nu_K, proceeding from coarse scale to fine (decreasing K).
- To build a hierarchical set of event boundaries, boundaries detected at coarse scales are included in the boundary lists for all finer scales.

  nu_K(i) = Sum_{l = -L}^{L} Sum_{m = -L}^{L} g(l, m) S_K(i + l, i + m),   S_K(i, j) = exp( -|t_i - t_j| / K )

[Figure: checkerboard kernel used to compute the novelty features]
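The coarse-to-fine boundary accumulation can be sketched as follows, with peak detection via the first difference of the novelty curve; the synthetic novelty arrays are ours:

```python
import numpy as np

def peaks(nu):
    """Peak = first difference changes from positive to non-positive."""
    d = np.diff(nu)
    return [i for i in range(1, len(nu) - 1) if d[i - 1] > 0 and d[i] <= 0]

def hierarchical_boundaries(novelty_by_scale):
    """Process scales K coarse-to-fine (large K first); boundaries found
    at coarse scales are carried into every finer scale's list."""
    carried, out = set(), {}
    for K in sorted(novelty_by_scale, reverse=True):
        carried |= set(peaks(novelty_by_scale[K]))
        out[K] = sorted(carried)
    return out

# One boundary visible at the coarse scale, two at the fine scale.
b = hierarchical_boundaries({
    1000: np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
    100:  np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0]),
})
```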
UNSUPERVISED EVENT CLUSTERING --- Combining Time and Content-Based Similarity
- Construct a content-based matrix S_C using low-frequency DCT features and the cosine similarity:

  S_C(i, j) = <v_i, v_j> / ( ||v_i|| ||v_j|| )

- Combine it with the temporal similarity S_K, using content information only within a 48-hour window:

  S_J(i, j) = S_K(i, j)                  if |t_i - t_j| > 48 h
  S_J(i, j) = S_K(i, j) * S_C(i, j)      otherwise

- An alternative variant takes the maximum instead of the product:

  S_J(i, j) = S_K(i, j)                      if |t_i - t_j| > 48 h
  S_J(i, j) = max( S_K(i, j), S_C(i, j) )    otherwise
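The fusion step can be sketched as below, under the assumption that content similarity is applied only within a 48-hour window, in either a product or a max combination (the exact form of the rule is our reading; the matrices and timestamps are toy values):

```python
import numpy as np

def combined_similarity(SK, SC, timestamps, mode="product"):
    """Fuse temporal and content similarity. Outside a 48 h window only
    S_K is used; within it, the product or max variant applies."""
    t = np.asarray(timestamps, dtype=float)
    within = np.abs(t[:, None] - t[None, :]) <= 48 * 3600
    fused = SK * SC if mode == "product" else np.maximum(SK, SC)
    return np.where(within, fused, SK)

t = [0, 3600, 7 * 86400]   # third photo a week later
SK = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])
SC = np.array([[1.0, 0.2, 0.9], [0.2, 1.0, 0.1], [0.9, 0.1, 1.0]])
SJ = combined_similarity(SK, SC, t)
```

Note how the visually similar but week-apart pair (photos 0 and 2) keeps its low temporal similarity: content cannot merge events separated by more than 48 hours.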
CLUSTERING GOODNESS CRITERIA
- Peak detection at each scale K results in a hierarchical set of candidate boundaries.
- A subset must be selected to define the final event clusters.
- Three automatic approaches:
  - similarity-based confidence score
  - boundary selection via dynamic programming
  - BIC-based boundary selection
Similarity-Based Confidence Score
- Detected boundaries at each level K, B_K = {b_1, . . . , b_{n_K}}, are indexed by photo: B_K ⊂ {1, . . . , N}.
- The score rewards high average intracluster similarity between the photos within each cluster and penalizes high average intercluster similarity between photos in adjacent clusters:

  C_S(B_K) = Sum_l [ (1 / (b_{l+1} - b_l)^2) Sum_{i = b_l}^{b_{l+1} - 1} Sum_{j = b_l}^{b_{l+1} - 1} S(i, j)
                   - (1 / (2 (b_{l+1} - b_l)(b_{l+2} - b_{l+1}))) Sum_{i = b_l}^{b_{l+1} - 1} Sum_{j = b_{l+1}}^{b_{l+2} - 1} S(i, j) ]
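The intra-minus-inter criterion can be sketched as follows; the exact normalization is our reading of it, and the block-diagonal similarity matrix is a toy example:

```python
import numpy as np

def confidence_score(S, boundaries):
    """Average intracluster similarity minus average intercluster
    similarity between adjacent clusters. `boundaries` lists the first
    photo of each cluster, plus N as a final sentinel."""
    b = list(boundaries)
    score = 0.0
    for l in range(len(b) - 1):
        score += S[b[l]:b[l + 1], b[l]:b[l + 1]].mean()          # intra
        if l + 2 < len(b):
            score -= S[b[l]:b[l + 1], b[l + 1]:b[l + 2]].mean()  # inter
    return score

# Block-diagonal similarity: the true boundary at photo 2 scores best.
S = np.full((4, 4), 0.1)
S[:2, :2] = 1.0
S[2:, 2:] = 1.0
good = confidence_score(S, [0, 2, 4])   # boundary at the true block edge
bad = confidence_score(S, [0, 1, 4])    # boundary inside a block
```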
Boundary Selection via Dynamic Programming
- Reduced complexity.
- Begin with the union of the peaks detected from the novelty features at all scales: B = ∪_K B_K.
- Cost of the cluster between boundaries b_i and b_j:

  C_F(b_i, b_j) = Sum_{n = b_i}^{b_j - 1} | t_n - mu_ij |,   where mu_ij = (1 / (b_j - b_i)) Sum_{n = b_i}^{b_j - 1} t_n
Boundary Selection via Dynamic Programming
- Optimal partitions with m boundaries are built from the optimal partitions with m - 1 boundaries; first, optimal partitions with two clusters are computed.
- E_F(j, m) is the cost of the optimal partition of the photos up to boundary b_j into m clusters:

  E_F(j, 2) = min_{1 < i < j} [ C_F(b_1, b_i) + C_F(b_i, b_j) ]
  E_F(j, M) = min_{i < j} [ E_F(i, M - 1) + C_F(b_i, b_j) ],   3 <= M <= j
Boundary Selection via Dynamic Programming
- As the number of clusters increases, the total cost of the partition decreases monotonically.
- The optimal number of clusters, M*, is selected from the total partition cost:

  M* = argmax_{2 <= m <= M - 1} g(m),   where g(m) = E_F(m) / E_F(m + 1)
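The dynamic-programming recursion can be sketched as follows. The absolute-deviation cluster cost is our reconstruction, and the cubic-time loop is kept deliberately plain:

```python
import numpy as np

def cluster_cost(t, bi, bj):
    """C_F(b_i, b_j): total absolute deviation of t[bi:bj] from its mean."""
    seg = t[bi:bj]
    return float(np.abs(seg - seg.mean()).sum())

def dp_partition(t, b, m):
    """Best m-cluster partition over candidate boundaries `b` via
    E(j, k) = min_i E(i, k-1) + C_F(b_i, b_j).
    `b` must start at 0 and end at len(t)."""
    n, INF = len(b), float("inf")
    E = np.full((n, m + 1), INF)
    back = np.zeros((n, m + 1), dtype=int)
    E[0, 0] = 0.0
    for j in range(1, n):
        for k in range(1, m + 1):
            for i in range(j):
                c = E[i, k - 1] + cluster_cost(t, b[i], b[j])
                if c < E[j, k]:
                    E[j, k], back[j, k] = c, i
    cuts, j, k = [b[-1]], n - 1, m       # backtrack chosen boundaries
    while k > 0:
        j = back[j, k]
        cuts.append(b[j])
        k -= 1
    return E[n - 1, m], sorted(cuts)

# Two tight groups: the optimal 2-cluster cut lands at photo index 3.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
cost, cuts = dp_partition(t, [0, 1, 2, 3, 4, 5, 6], m=2)
```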
BIC-Based Boundary Selection
- This method is based on the Bayes information criterion (BIC) [Schwarz 1978].
- Assumption: timestamps within an event are distributed normally around the event mean.
- The log-likelihood of the timestamps in segment [b_l, b_{l+1}) under the segment's sample mean mu_l and variance sigma_l^2 is

  l(b_l, b_{l+1}) = Sum_{n = b_l}^{b_{l+1} - 1} log N(t_n ; mu_l, sigma_l^2) = -((b_{l+1} - b_l) / 2) [ log(2 pi sigma_l^2) + 1 ]

- A candidate boundary b_l is tested by comparing

  l(b_{l-1}, b_l) + l(b_l, b_{l+1})   vs.   l(b_{l-1}, b_{l+1}) + (lambda / 2) log(b_{l+1} - b_{l-1})

- The left side is the log-likelihood of the two-segment model; the right side is the log-likelihood of the single-segment model plus the penalty term.
- lambda is 2, since we describe each segment using the sample mean mu and variance sigma^2.
BIC-Based Boundary Selection
- Employ the hierarchical coarse-to-fine approach.
- At each scale, test only the newly detected boundaries (those undetected at coarser scales).
- Add the boundaries for which the two-segment log-likelihood exceeds the penalized single-segment log-likelihood.
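The BIC test for a single candidate boundary can be sketched as follows; the variance floor is our safeguard against degenerate segments, not part of the paper:

```python
import numpy as np

def seg_loglik(t, a, b):
    """Gaussian log-likelihood of t[a:b] under its own sample mean and
    variance: -(n/2) * (log(2*pi*var) + 1)."""
    n = b - a
    var = max(float(np.var(t[a:b])), 1e-12)  # floor: our safeguard
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def bic_margin(t, a, b, c, lam=2.0):
    """Two-segment log-likelihood minus penalized one-segment
    log-likelihood; a boundary at b is accepted when this is > 0.
    lam = 2 for the per-segment mean and variance."""
    two = seg_loglik(t, a, b) + seg_loglik(t, b, c)
    one = seg_loglik(t, a, c) + 0.5 * lam * np.log(c - a)
    return two - one

# Two tight groups with a large gap: the true boundary at index 3 wins.
t = np.array([0, 1, 2, 100, 101, 102], dtype=float)
```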
ALGORITHM 3 (SIMILARITY-BASED PHOTO CLUSTERING).
(1) Extract and sort photo timestamps, {t_1, . . . , t_N}.
(2) For each K in decreasing order:
    (i) compute the similarity matrix S_K
    (ii) compute the novelty score nu_K
    (iii) detect peaks in nu_K
    (iv) form the event boundary list from the event boundaries of previous iterations and the newly detected peaks.
(3) Determine a final boundary subset from the collected boundaries over all scales according to one of the methods:
    (a) the confidence score
    (b) the DP boundary selection approach
    (c) the BIC boundary selection approach
EXPERIMENTAL RESULTS

[Table: run times, in seconds, for different-size photo collections]
- "No Conf." indicates times for Steps 1 and 2 only
- BIC: BIC peak selection
- DP: dynamic programming peak selection
- Conf.: similarity-based peak selection

Doubling the number of photos N, the time for the segmentation step (No Conf.) increases linearly, while including the confidence measure (Conf.) incurs a polynomial cost.
EXPERIMENTAL RESULTS

We compare the event clustering performance of eleven systems on two separate photo collections:
- Collection I consists of 1036 photos taken over 15 months.
- Collection II consists of 413 photos taken over 13 months.

The first four algorithms in the table are "hand-tuned" to maximize performance; the remaining algorithms are fully automatic.
EXPERIMENTAL RESULTS

- Precision indicates the proportion of detected boundaries that are correct:

  precision = (correctly detected boundaries) / (total number of detected boundaries)

- Recall measures the proportion of true boundaries detected:

  recall = (correctly detected boundaries) / (total number of ground-truth boundaries)

- The F-score is a composite of precision and recall:

  F-score = (2 * precision * recall) / (precision + recall)
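The three metrics above are straightforward to compute over sets of boundary indices (a sketch with made-up detections):

```python
def boundary_scores(detected, truth):
    """Precision, recall, and F-score over sets of boundary indices."""
    detected, truth = set(detected), set(truth)
    correct = len(detected & truth)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(truth) if truth else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# 3 detections, 2 of them correct, against 4 ground-truth boundaries.
p, r, f = boundary_scores([3, 7, 9], [3, 9, 12, 20])
```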
EXPERIMENTAL RESULTS

- The adaptive-thresholding algorithms exhibited high recall and low precision on both test sets, even with manual tuning.
- The scale-space and the two similarity-based approaches demonstrated more consistent performance and traded off precision and recall more evenly.
CONCLUSION
- We employed an automatic temporal similarity-based method that does not rely on preset thresholds or restrictive assumptions.
- As photo collections with location information become available, we hope to extend the system to combine temporal similarity, content-based similarity, and location-based similarity.
- In our testing, the automatic methods' performance exceeded that of manually tuned alternatives, and the methods have been well received by users of our photo management application.