Joint Unsupervised Learning of Deep Representations and Image Clusters


1. Introduction: The Main Idea


Main Idea (1): Joint Unsupervised Learning of Representations and Image Clusters

Learning good image representations is beneficial to image clustering


Main Idea (2): Joint Unsupervised Learning of Representations and Image Clusters

Clustering provides supervisory signals for learning image representations


Main Idea (3): Joint Unsupervised Learning of Representations and Image Clusters

Integrate both processes into a single model:

● Unified triplet loss

● End-to-end training

[Diagram] Network ⇄ Clustering loop: the forward pass updates the clustering, the backward pass updates the representation parameters


2. Clustering: Agglomerative Clustering


Agglomerative Clustering: What?

● At the beginning, each image is its own cluster

● At each timestep, the two clusters with the highest affinity are merged (a minimal sketch follows below)
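The merging loop is simple enough to sketch. Below is a minimal, illustrative NumPy version with a pluggable affinity function; the toy negative-mean-distance affinity only stands in for the graph-based measure defined on the following slides.

```python
import numpy as np

def agglomerative_clustering(X, n_clusters_target, affinity):
    """Naive agglomerative clustering: start with one cluster per sample
    and repeatedly merge the pair of clusters with the highest affinity."""
    clusters = [[i] for i in range(len(X))]          # each image is a cluster
    while len(clusters) > n_clusters_target:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):               # find the most affine pair
            for b in range(a + 1, len(clusters)):
                s = affinity(X[clusters[a]], X[clusters[b]])
                if s > best:
                    best, best_pair = s, (a, b)
        a, b = best_pair                             # merge the two clusters
        clusters[a].extend(clusters.pop(b))
    return clusters

# Toy usage with negative mean pairwise distance as the affinity:
X = np.random.randn(20, 8)
aff = lambda A, B: -np.mean(np.linalg.norm(A[:, None] - B[None], axis=-1))
print(agglomerative_clustering(X, 4, aff))
```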


Agglomerative Clustering: Why?

● It is a recurrent process → naturally implemented in a recurrent framework

● It begins with over-clustering, which helps overcome bad initial representations

● Clusters are merged as better representations are learned


Agglomerative Clustering: Affinity Measure

● Build a directed graph over the image representations

● The affinity matrix corresponds to the edge set

Notation:

○ K_s → number of nearest neighbours

○ N_i^{K_s} → the K_s nearest neighbours of x_i

○ x_i, x_j → image representations (vertices)

○ n_s → number of samples

○ a → design parameter, set to 1

Agglomerative Clustering: Affinity Measure

● Affinity matrix corresponding to the edge set (reconstructed from the cited graph-degree-linkage paper):

$$W_{i,j} = \begin{cases} \exp\!\left(-\frac{\lVert x_i - x_j \rVert_2^2}{a\,\sigma^2}\right), & x_j \in N_i^{K_s} \\ 0, & \text{otherwise} \end{cases} \qquad \sigma^2 = \frac{a}{n_s K_s} \sum_{i} \sum_{x_j \in N_i^{K_s}} \lVert x_i - x_j \rVert_2^2$$

● Affinity measure between two clusters C_i and C_j (products of in- and out-degrees across the clusters):

$$\mathcal{A}(C_i, C_j) = \frac{1}{|C_i|^2}\,\mathbf{1}^\top W_{C_i,C_j} W_{C_j,C_i} \mathbf{1} \;+\; \frac{1}{|C_j|^2}\,\mathbf{1}^\top W_{C_j,C_i} W_{C_i,C_j} \mathbf{1}$$

W. Zhang, X. Wang, D. Zhao, and X. Tang. Graph degree linkage: Agglomerative clustering on a directed graph. ECCV 2012.

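As a concrete illustration, here is a NumPy sketch of the two reconstructed formulas; `Ks` and `a` follow the slide's notation, and the brute-force pairwise distances are for clarity only.

```python
import numpy as np

def affinity_matrix(X, Ks=20, a=1.0):
    """Directed Ks-NN graph affinity W (graph-degree-linkage style)."""
    d2 = np.sum((X[:, None] - X[None]) ** 2, axis=-1)       # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                            # no self-edges
    nn = np.argsort(d2, axis=1)[:, :Ks]                     # Ks nearest neighbours per vertex
    sigma2 = a * np.take_along_axis(d2, nn, axis=1).mean()  # sigma^2 from mean KNN distance
    W = np.zeros_like(d2)
    rows = np.arange(len(X))[:, None]
    W[rows, nn] = np.exp(-d2[rows, nn] / (a * sigma2))      # edges only toward neighbours
    return W

def cluster_affinity(W, Ci, Cj):
    """A(Ci, Cj): in/out-degree product linkage between two clusters."""
    Wij, Wji = W[np.ix_(Ci, Cj)], W[np.ix_(Cj, Ci)]
    return (Wij @ Wji).sum() / len(Ci) ** 2 + (Wji @ Wij).sum() / len(Cj) ** 2

# Toy usage:
X = np.random.randn(30, 8)
W = affinity_matrix(X, Ks=5)
print(cluster_affinity(W, [0, 1, 2], [3, 4]))
```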

3. System Formulation: Joint Optimization


System Formulation: Recurrent Framework

At timestep t:

○ X_t → image representations

○ h_t → image cluster labels (hidden state)

○ y_t → output


System Formulation: Recurrent Framework

● Unrolling strategy:

○ Complete

○ Partial:

■ Split the overall T timesteps into multiple periods

■ In each period, merge a number of clusters and update the CNN parameters


● Unrolling rate η: controls the number of timesteps per period

● n_c^s: number of clusters at the start of period p

● Number of timesteps unrolled in period p (reconstructed): T_p = ⌈η · n_c^s⌉ (see the schedule sketch below)
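A small sketch of how such a schedule could be computed; the formula T_p = ⌈η · n_c^s⌉ is the reconstruction above, and η = 0.9 is a purely illustrative value.

```python
import math

def unrolling_schedule(ns, nc_target, eta=0.9):
    """Sketch: merge timesteps per partial-unrolling period, assuming
    T_p = ceil(eta * nc) with nc clusters at the start of the period."""
    nc, periods = ns, []
    while nc > nc_target:
        T_p = min(math.ceil(eta * nc), nc - nc_target)  # timesteps this period
        periods.append(T_p)
        nc -= T_p                    # each timestep merges two clusters into one
    return periods

print(unrolling_schedule(ns=1000, nc_target=10))  # [900, 90]
```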

System Formulation: Algorithm


System Formulation: Objective Function

● Accumulate the losses from all timesteps

● For y_0, each image is taken as its own cluster; at timestep t, two clusters are merged given y_{t-1}

● In conventional agglomerative clustering:

○ The two clusters to merge are selected by the maximal affinity measure over all pairs of clusters

● In this approach:

○ The local structure surrounding the clusters is also considered


System Formulation: Objective Function

● Assume that from y_{t-1} to y_t we merged a cluster C_i^t with its nearest neighbour. The loss at timestep t (reconstructed) combines two terms:

$$\mathcal{L}_t = -\,\mathcal{A}\big(C_i^t, \mathcal{N}^1_{C_i^t}\big) \;-\; \frac{\lambda}{K_c - 1} \sum_{k=2}^{K_c} \Big[\mathcal{A}\big(C_i^t, \mathcal{N}^1_{C_i^t}\big) - \mathcal{A}\big(C_i^t, \mathcal{N}^k_{C_i^t}\big)\Big]$$

○ First term: affinity between cluster C_i^t and its nearest neighbour

○ Second term: difference between the affinity of C_i^t with its nearest neighbour and its affinities with its K_c nearest neighbouring clusters (λ is a trade-off weight)


System Formulation: Objective Function

● Introducing this loss term:

○ Considers the local structure surrounding the clusters

○ Allows the loss to be written in terms of triplets (a sketch follows below)

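To make the criterion concrete, here is a sketch that scores the merge of each cluster with its nearest neighbour, given a cluster-level affinity matrix `A`; the `Kc` and `lam` values are illustrative, and the loss follows the reconstructed equation above.

```python
import numpy as np

def merge_loss(A, i, Kc=5, lam=1.0):
    """Loss for merging cluster i with its nearest-neighbour cluster,
    given a cluster-level affinity matrix A (A[i, j] = affinity)."""
    order = np.argsort(-A[i])                 # neighbour clusters, most affine first
    order = order[order != i][:Kc]            # the Kc nearest neighbouring clusters
    a_nn = A[i, order[0]]                     # affinity to the nearest neighbour
    margin = np.mean(a_nn - A[i, order[1:]])  # how much the NN stands out locally
    return -a_nn - lam * margin

# The pair to merge at timestep t is the one minimizing this loss:
A = np.random.rand(8, 8); A = (A + A.T) / 2
best = min(range(len(A)), key=lambda i: merge_loss(A, i))
print("merge cluster", best, "with its nearest neighbour")
```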

System Formulation: Forward/Backward Pass

[Diagram] Network ⇄ Clustering loop: the forward pass updates the clustering, the backward pass updates the representation parameters


System Formulation: Forward Pass

● In the forward pass of the p-th period, p ∈ {1, …, P}, update the clusters with the network parameters fixed

● Overall loss for period p (reconstructed): $\mathcal{L}^p = \sum_{t \,\in\, \text{period } p} \mathcal{L}_t\big(y^t \mid y^{t-1}, \theta\big)$


System Formulation: Backward Pass

● In the forward pass of the p-th period, p ∈ {1, …, P}, a number of clusters have been merged

● In the backward pass, we aim to derive the optimal network parameters that minimize the losses generated in the forward passes

● We accumulate the losses of the p periods (reconstructed): $\min_\theta \sum_{p'=1}^{p} \mathcal{L}^{p'}$ (see the end-to-end sketch below)

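The alternating forward/backward structure can be shown end to end on toy data. In this sketch a linear map `theta` stands in for the CNN, centroid distance stands in for the graph-based affinity, and the backward step shrinks within-cluster spread rather than minimizing the paper's weighted triplet loss; all of these substitutions are simplifications.

```python
import numpy as np

def embed(images, theta):                        # linear stand-in for the CNN
    return images @ theta

def centroids(X, labels):
    ids = np.unique(labels)
    return ids, np.stack([X[labels == c].mean(0) for c in ids])

def merge_closest(X, labels):
    """One timestep: merge the two clusters with the smallest centroid
    distance (a stand-in affinity) and return the incurred distance."""
    ids, C = centroids(X, labels)
    d = np.linalg.norm(C[:, None] - C[None], axis=-1)
    np.fill_diagonal(d, np.inf)
    a, b = np.unravel_index(d.argmin(), d.shape)
    labels[labels == ids[b]] = ids[a]            # relabel cluster b as a
    return d[a, b]

rng = np.random.default_rng(0)
images = rng.standard_normal((40, 16))
theta = 0.1 * rng.standard_normal((16, 8))
labels = np.arange(len(images))                  # y_0: one cluster per image
eta, nc_target = 0.5, 4

while (nc := len(np.unique(labels))) > nc_target:
    X = embed(images, theta)                     # forward: theta fixed
    steps = min(int(np.ceil(eta * nc)), nc - nc_target)
    period_loss = sum(merge_closest(X, labels) for _ in range(steps))
    # Backward: update theta on the losses generated in the forward pass
    # (here: gradient of the within-cluster spread, centroids held fixed)
    ids, C = centroids(embed(images, theta), labels)
    grad = np.zeros_like(theta)
    for c, mu in zip(ids, C):
        diff = embed(images[labels == c], theta) - mu
        grad += images[labels == c].T @ diff
    theta -= 0.01 * grad / len(images)
    print(f"period loss {period_loss:.3f}, clusters left {len(np.unique(labels))}")
```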

System Formulation: From Cluster-Based Loss to Sample-Based Loss

● The loss is defined on clusters:

○ The entire dataset is needed to estimate it

○ This makes batch-based optimization difficult

● Solution: a sample-based loss approximation


System Formulation: From Cluster-Based Loss to Sample-Based Loss

● Sample-based loss intuition:

○ Agglomerative clustering starts with each datapoint as a cluster

○ Clusters at a higher level of the hierarchy are formed by merging lower-level clusters


System Formulation: From Cluster-Based Loss to Sample-Based Loss

● Affinities between clusters can therefore be expressed in terms of affinities between datapoints

● For the full derivation, see the supplement of the paper


System Formulation: From Cluster-Based Loss to Sample-Based Loss

● Finally, the loss takes the form of a weighted triplet loss over triplets (x_i, x_j, x_k) (a sketch follows below), where:

○ x_i and x_j are from the same cluster

○ x_k is from one of the K_c nearest neighbouring clusters

○ the triplet's weight depends on the corresponding cluster affinities

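A sketch of what such a weighted triplet loss could look like. The hinge-with-margin form, the `weights` array, and the `margin` value are illustrative stand-ins for the affinity-derived coefficients of the paper; only the triplet structure (x_i, x_j from the same cluster, x_k from a neighbouring cluster) is taken from the slide.

```python
import numpy as np

def weighted_triplet_loss(X, triplets, weights, margin=0.2):
    """Sketch of a weighted triplet loss over l2-normalized features X.
    Each triplet (i, j, k): i and j share a cluster, k comes from one of
    the Kc nearest neighbouring clusters."""
    i, j, k = np.asarray(triplets).T
    d_pos = np.sum((X[i] - X[j]) ** 2, axis=1)   # same-cluster distance
    d_neg = np.sum((X[i] - X[k]) ** 2, axis=1)   # neighbouring-cluster distance
    return np.sum(weights * np.maximum(0.0, d_pos - d_neg + margin))

# Toy usage:
X = np.random.randn(10, 160)
X /= np.linalg.norm(X, axis=1, keepdims=True)    # output of the l2-norm layer
print(weighted_triplet_loss(X, [(0, 1, 7), (2, 3, 9)], np.array([0.5, 1.0])))
```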

System Formulation: Optimization

● Given a dataset with n_s samples and a target of n_c clusters, there are T = n_s − n_c timesteps

○ Optimization becomes time-consuming on large datasets

● Solution: run a fast algorithm to determine an initial clustering

○ Agglomerative clustering via maximum incremental path integral


4. Experiments


Experiments: Experimental Setup

● They use Caffe/Torch

● Convolutional layers: 50 channels, 5×5 filters, stride = 1, padding = 0

● Pooling layers: 2×2 kernel, stride = 2

● Final size of the output feature map: 10×10

Face datasets


Experiments: Experimental Setup

● On top of every CNN they append an IP (inner-product, i.e. fully connected) layer of dimension 160

● Followed by an l2-norm layer (a sketch of such a network follows below)

● wt-loss = weighted triplet loss
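A minimal PyTorch re-creation of the backbone as described; the two conv/pool blocks, the 52×52 grayscale input (chosen so the final feature map comes out 10×10), and the ReLU activations are assumptions, since the paper varies the network depth per dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JULENet(nn.Module):
    def __init__(self, in_ch=1, feat_dim=160):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 50, kernel_size=5, stride=1, padding=0)
        self.conv2 = nn.Conv2d(50, 50, kernel_size=5, stride=1, padding=0)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.ip = nn.Linear(50 * 10 * 10, feat_dim)   # "IP" (fully connected) layer

    def forward(self, x):                             # x: (N, in_ch, 52, 52)
        x = self.pool(F.relu(self.conv1(x)))          # -> (N, 50, 24, 24)
        x = self.pool(F.relu(self.conv2(x)))          # -> (N, 50, 10, 10)
        x = self.ip(x.flatten(1))                     # -> (N, 160)
        return F.normalize(x, p=2, dim=1)             # l2-norm layer

print(JULENet()(torch.randn(4, 1, 52, 52)).shape)     # torch.Size([4, 160])
```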


Experiments: Robustness Analysis


Experiments: Quantitative Comparison Using Image Intensities as Input

Evaluation measure: NMI (Normalized Mutual Information)
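For reference, NMI can be computed directly with scikit-learn:

```python
from sklearn.metrics import normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same partition, permuted cluster ids
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0 (permutation-invariant)
```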

Experiments: Quantitative Comparison with Other State-of-the-Art Methods, Using Their Representations as Input


Experiments: Transfer Learning


Experiments: Face Verification

● Representations learned on the YouTube Faces dataset, evaluated on LFW


5. Conclusions


Conclusions

● They combine agglomerative clustering with CNNs and optimize both jointly in a recurrent framework.

● They propose a partial unrolling strategy that divides the timesteps into multiple periods.

● The approach obtains more precise image clusters and more discriminative representations.

● It generalizes well across many datasets and tasks.

