Joint Unsupervised Learning of Deep Representations and Image Clusters

Jianwei Yang, Devi Parikh, Dhruv Batra [arXiv] [GitHub]

Slides by Albert Jiménez [GDoc]

Computer Vision Reading Group (23/09/2016)

http://arxiv.org/find/cs/1/au:+Yang_J/0/1/0/all/0/1

http://arxiv.org/find/cs/1/au:+Parikh_D/0/1/0/all/0/1

http://arxiv.org/find/cs/1/au:+Batra_D/0/1/0/all/0/1

http://arxiv.org/abs/1604.03628

https://github.com/jwyang/joint-unsupervised-learning

https://docs.google.com/presentation/d/1MhgluIvGo3UDAhSbN3lxmdNLBrEOc6DQ_1TGFJQrPWk/edit?usp=sharing

https://imatge.upc.edu/web/teaching/computer-vision-reading-group

1. Introduction: The main idea

Main Idea (1): Joint unsupervised learning of representations and image clusters

Learning good image representations is beneficial to image clustering

Clustering provides supervisory signals for image representation learning

Integrate both processes into a single model

● Unified triplet loss

● End-to-end training

Network Clustering

Forward - Update Clustering

Backward - Update Representation parameters


2. Clustering: Agglomerative Clustering

Agglomerative Clustering: What?

● At the beginning each image is a cluster

● Merge two clusters at each timestep

Affinity Measure
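The two bullets above can be sketched as a small greedy loop. This is a minimal illustration, not the authors' implementation: the function names and the placeholder affinity (negative mean distance between 1-D points) are assumptions made only for the example.

```python
from itertools import combinations

def agglomerative(points, n_clusters, affinity):
    """Greedy agglomerative clustering: merge the most similar pair each timestep."""
    clusters = [[i] for i in range(len(points))]  # at the beginning each sample is a cluster
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the highest affinity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: affinity(clusters[p[0]], clusters[p[1]], points))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b always holds)
        del clusters[b]
    return clusters

def avg_neg_dist(ca, cb, points):
    """Placeholder affinity (assumption): negative mean distance between 1-D points."""
    return -sum(abs(points[i] - points[j]) for i in ca for j in cb) / (len(ca) * len(cb))

points = [0.0, 1.0, 2.0, 10.0, 11.0]
result = agglomerative(points, 2, avg_neg_dist)
print(result)
```

Any cluster-to-cluster affinity can be plugged in through the `affinity` argument.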


Agglomerative Clustering: Why?

● Recurrent process → Implemented in recurrent framework

● Begins with over-clustering (overcome bad initial representations)

● Clusters are merged as better representations are learned

Agglomerative Clustering: Affinity Measure

● Directed graph

● Affinity matrix corresponding to the edge set

K_s → number of nearest neighbors

N_i^{K_s} → the K_s nearest neighbors of x_i

x_i, x_j → image representations (vertices)

n_s → number of samples

a → design parameter (= 1)
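The equation for the affinity matrix did not survive on this slide. The sketch below assumes the usual Gaussian K_s-nearest-neighbor form suggested by the symbols listed: W(i, j) = exp(-||x_i - x_j||^2 / sigma^2) if x_j is among the K_s nearest neighbors of x_i, and 0 otherwise, with sigma^2 equal to a times the mean squared distance over all neighbor edges. The function name and this exact form are assumptions, not taken from the slide.

```python
import math

def knn_affinity(X, ks, a=1.0):
    """Directed K_s-NN affinity matrix (assumed Gaussian form, see lead-in)."""
    n = len(X)

    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    # K_s nearest neighbours of every sample, excluding the sample itself.
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sqdist(X[i], X[j]))
        nbrs.append(order[:ks])

    # sigma^2 = a / (n_s * K_s) * sum of squared distances over all kNN edges.
    sigma2 = a * sum(sqdist(X[i], X[j]) for i in range(n) for j in nbrs[i]) / (n * ks)

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            W[i][j] = math.exp(-sqdist(X[i], X[j]) / sigma2)
    return W

X = [[0, 0], [1, 0], [0, 1], [5, 5]]
W = knn_affinity(X, ks=2)
```

Because the neighbor relation is not symmetric, W is not symmetric either, which is why the graph is directed: here the outlier points to sample 1, but sample 1 does not point back.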

Agglomerative Clustering: Affinity Measure

● Affinity matrix corresponding to the edge set:

● Affinity measure:

W. Zhang, X. Wang, D. Zhao, and X. Tang. Graph degree linkage: Agglomerative clustering on a directed graph. https://arxiv.org/abs/1208.5092
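The affinity equation is also missing from this slide. As a hedged sketch, the code below assumes the graph degree linkage form from the cited work: A(C_i, C_j) = (1/|C_i|^2) 1^T W_{C_i,C_j} W_{C_j,C_i} 1 + (1/|C_j|^2) 1^T W_{C_j,C_i} W_{C_i,C_j} 1, where W_{C_i,C_j} is the block of W with rows in C_i and columns in C_j. The function names and the toy matrix are illustrative only.

```python
def gdl_affinity(W, ci, cj):
    """Cluster-to-cluster affinity from an affinity matrix W (assumed GDL form)."""

    def one_way(src, dst):
        # 1^T W[src, dst] W[dst, src] 1 / |src|^2, written as a sum over nodes b in dst
        # of (total weight from src into b) * (total weight from b back into src).
        total = 0.0
        for b in dst:
            indeg = sum(W[a][b] for a in src)
            outdeg = sum(W[b][a] for a in src)
            total += indeg * outdeg
        return total / len(src) ** 2

    return one_way(ci, cj) + one_way(cj, ci)

# Toy affinity matrix over three samples (values are made up for the example).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]

print(gdl_affinity(W, [0], [1]))     # strongly linked pair
print(gdl_affinity(W, [0], [2]))     # no edges between them
print(gdl_affinity(W, [0, 1], [2]))  # merged cluster vs. remaining sample
```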



3. System Formulation: Joint Optimization

System Formulation: Recurrent Framework

At timestep t:

h_t → Image cluster labels

y_t → Output

X_t → Image representations

System Formulation: Recurrent Framework

● Unrolling strategy:
○ Complete
○ Partial
■ Split the overall T timesteps into multiple periods
■ In each period we merge a number of clusters and update the CNN parameters

Unrolling rate: controls the number of timesteps in each period

n_c^s: number of clusters at the start of period p

Number of timesteps:
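The formula itself is missing from the slide. A minimal sketch, assuming the number of timesteps in a period is ceil(eta * n_c^s), where eta is the unrolling rate (the name `eta` and this exact rule are assumptions), shows how the periods shrink as clusters are merged:

```python
import math

def unrolling_schedule(n_start, n_target, eta):
    """Number of merge timesteps in each period under partial unrolling."""
    schedule = []
    n = n_start  # clusters at the start of the current period (n_c^s)
    while n > n_target:
        # Assumed rule: ceil(eta * n_c^s) merges, capped so we stop at n_target.
        steps = min(math.ceil(eta * n), n - n_target)
        schedule.append(steps)
        n -= steps  # each timestep merges two clusters into one
    return schedule

partial = unrolling_schedule(n_start=100, n_target=10, eta=0.2)
complete = unrolling_schedule(n_start=100, n_target=10, eta=1.0)
print(partial, complete)
```

With eta = 1.0 all merges happen in a single period, which corresponds to complete unrolling. Smaller values update the CNN more often between merges.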

System Formulation: Algorithm

System Formulation: Objective Function

● Accumulate the losses from all the timesteps

● For y_0, each image is its own cluster; at timestep t, two clusters are merged given y_{t-1}

● In conventional agglomerative clustering
○ The two clusters are selected by the maximal affinity over all pairs of clusters

● In this approach
○ The local structure surrounding the clusters is also considered


● Assume that from y_{t-1} to y_t we have merged a cluster C_i^t and its nearest neighbor (NN)

Affinity between cluster C_i and its NN

Difference between the affinity of C_i with its NN and the affinities of C_i with its K_c neighbor clusters
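The loss equation is not recoverable from the slide text, but the two annotations suggest a per-timestep loss of the following shape. This is a sketch under assumptions: the balancing factor `lam` and the 1/(K_c - 1) averaging do not appear in the slide text.

```python
def merge_loss(affinities, lam=1.0):
    """Loss for merging cluster C_i with its nearest neighbour cluster.

    affinities: A(C_i, N[k]) for the K_c nearest neighbour clusters of C_i,
    sorted in descending order, so affinities[0] belongs to the NN.
    Requires K_c >= 2.
    """
    kc = len(affinities)
    nearest = affinities[0]
    # Local-structure term: how much closer the NN is than the other neighbours.
    local = sum(nearest - a for a in affinities[1:]) / (kc - 1)
    # Minimising the loss favours a high NN affinity and a large margin.
    return -nearest - lam * local

loss = merge_loss([0.9, 0.5, 0.1], lam=1.0)
print(loss)
```

Setting `lam` to 0 recovers the conventional criterion, which looks only at the affinity of the pair being merged.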


● Introducing this loss term:
○ Takes into account the local structure surrounding the clusters
○ Allows the loss to be written in terms of triplets

System Formulation: Forward/Backward Pass

Network Clustering

Forward - Update Clustering

Backward - Update Representation parameters

System Formulation: Forward Pass

● In the forward pass of the p-th period p∈(1, ... , P), update the clusters with the network parameters fixed

Overall loss for period p

System Formulation: Backward Pass

● In the forward pass of the p-th period p∈(1, ... , P), we have merged a number of clusters

● In the backward pass we aim to derive the optimal parameters to minimize the losses generated in the forward pass.

We accumulate the losses of p periods
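The alternation between the two passes can be sketched as a skeleton loop. The callables `merge_step` and `update_params` are hypothetical placeholders standing in for the clustering update and the CNN update; nothing here reflects the authors' actual code.

```python
def joint_training(schedule, merge_step, update_params):
    """Alternate forward (clustering) and backward (representation) passes.

    schedule: number of merge timesteps in each period.
    """
    for period, steps in enumerate(schedule):
        # Forward pass: network parameters fixed, merge clusters.
        for _ in range(steps):
            merge_step()
        # Backward pass: cluster labels fixed, update the network parameters
        # to reduce the losses generated in the forward pass.
        update_params(period)

log = []
joint_training([2, 1],
               merge_step=lambda: log.append("merge"),
               update_params=lambda p: log.append(f"update{p}"))
print(log)
```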

System Formulation: From cluster-based loss to sample-based loss

● Loss defined on clusters
○ Needs the entire dataset to be estimated
○ Difficult to use with batch-based optimization

● Sample-based loss approximation

● Sample-based loss intuition
○ Agglomerative clustering starts with each datapoint as a cluster
○ Clusters at higher levels of the hierarchy are formed by merging lower-level clusters


● Affinities between clusters can be expressed in terms of affinities between datapoints

For more details, see the supplement of the paper





● Finally, the loss takes the form of a weighted triplet loss


is a weight whose value depends on

x_i and x_j are from the same cluster

x_k is from one of the K_c nearest neighbouring clusters
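A weighted triplet loss of this kind can be sketched as follows. The equation is missing from the slide, so the form used here, -lam / (K_c - 1) * sum over triplets of (gamma * A(x_i, x_j) - A(x_i, x_k)), is an assumption. The Gaussian sample affinity and the names `gamma`, `lam` and `sigma2` are also illustrative choices.

```python
import math

def sample_affinity(u, v, sigma2=1.0):
    """Gaussian affinity between two representations (assumed form)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / sigma2)

def weighted_triplet_loss(X, triplets, gamma=2.0, lam=1.0, kc=5):
    """Sum over (i, j, k): x_i and x_j share a cluster, x_k is from a neighbouring one."""
    total = 0.0
    for i, j, k in triplets:
        # gamma weights the positive pair against the negative pair.
        total += gamma * sample_affinity(X[i], X[j]) - sample_affinity(X[i], X[k])
    return -lam / (kc - 1) * total

X = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]
good = weighted_triplet_loss(X, [(0, 1, 2)])  # positive is close, negative is far
bad = weighted_triplet_loss(X, [(0, 2, 1)])   # positive is far, negative is close
print(good, bad)
```

Because the loss is a sum over sample triplets, it can be evaluated on mini-batches instead of the entire dataset.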

System Formulation: Optimization

● Given a dataset with n_s samples and n_c clusters, there are T = n_s - n_c timesteps
○ Time-consuming optimization on large datasets

● Run a fast algorithm to determine the initial clustering
○ Agglomerative clustering via maximum incremental path integral

https://github.com/waynezhanghk/gactoolbox

4. Experiments

Experiments: Experimental Setup

● They use Caffe/Torch
● Convolutional layers of 50 channels, 5x5 filters, stride = 1, padding = 0
● Pooling layers, 2x2 kernel, stride = 2
● Final size of the output feature map is 10x10

Face datasets
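The layer settings above determine how quickly the feature map shrinks. The sketch below applies the standard output-size arithmetic to one conv + pool block. The input sizes (55 and 32) are hypothetical, and floor rounding is assumed for pooling (some frameworks round up instead).

```python
def conv_out(size, kernel=5, stride=1, pad=0):
    """Spatial size after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after pooling (floor rounding assumed)."""
    return (size - kernel) // stride + 1

def feature_map_sizes(size, n_blocks):
    """Sizes after each conv(5x5, stride 1, pad 0) + pool(2x2, stride 2) block."""
    sizes = [size]
    for _ in range(n_blocks):
        size = pool_out(conv_out(size))
        sizes.append(size)
    return sizes

sizes = feature_map_sizes(55, 2)
print(sizes)
print(feature_map_sizes(32, 1))
```

The number of stacked blocks would be chosen per dataset so that the final map ends up at roughly the stated 10x10.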

Experiments: Experimental Setup

● On top of all the CNNs they append an inner-product (IP) layer (dimension 160)
● Followed by an L2-normalization layer
● wt-loss = weighted triplet loss

Experiments: Robustness Analysis

Experiments: Quantitative Comparison Using Image Intensities as Input

Measure = NMI = Normalized Mutual Information
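For reference, NMI can be computed from two label assignments as below. This is a minimal sketch: the normalization by sqrt(H(A) * H(B)) is one common choice and is an assumption here, since the slide does not say which variant was used.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same samples."""
    n = len(labels_true)
    ca, cb = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information from the joint and marginal label counts.
    mi = sum(c / n * math.log(c * n / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    # Entropies of the two labelings.
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, labels permuted
indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # independent partitions
print(same, indep)
```

NMI is invariant to how the cluster IDs are named, which makes it suitable for comparing predicted clusters against ground-truth classes.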

Experiments: Quantitative Comparison with Other State-of-the-Art Methods Using Their Representations as Input

Experiments: Transfer Learning

Experiments: Face Verification

● Representations learned from the YouTube-Face dataset, evaluated on LFW

5. Conclusions

Conclusions

● They combine agglomerative clustering with CNNs and optimize both jointly in a recurrent framework.

● They propose a partial unrolling strategy that divides the timesteps into multiple periods.

● The method obtains more precise image clusters and more discriminative representations.

● It generalizes well across many datasets and tasks.
