Jigsaws: joint appearance and shape clustering
John Winn with Anitha Kannan and Carsten Rother
Microsoft Research, Cambridge
Patch models
Used for: object recognition/detection, object segmentation
But also: stereo matching, photo stitching, texture synthesis, super-resolution, motion segmentation, image/video compression
Patch models
Patch clustering/codebook (e.g. Leibe & Schiele)
Epitome (Jojic et al.): parameter sharing + translation invariant
Issues with fixed patch size/shape
Patch includes background: patches containing the same object are not clustered together
Patch excludes part of object: patch is less discriminative
Patch includes occlusion: occluded and unoccluded objects are not clustered together
Patch size?
Small (single pixel): more sharing, less discriminative
Large (entire image): more discriminative, less sharing
Optimal size/shape?
Depends on:
• object size/shape
• object variability
• size of training set
Aims of jigsaw model
Learn patches (jigsaw pieces) which are
1. Shared: each piece is similar in shape and appearance to many regions of the training images;
2. Discriminative: each piece is as large as possible;
3. Exhaustive: all parts of the training images can be reconstructed from the set of jigsaw pieces.
The Jigsaw model
Image I1, offset map L1; image I2, offset map L2; ...; image IN, offset map LN
Jigsaw J
The Jigsaw model
Jigsaw J, with parameters (μ, λ)
Image I1, offset map L1; image I2, offset map L2; ...; image IN, offset map LN
$P(I \mid J, L) = \prod_{x} \mathrm{Normal}\big(I(x) \mid \mu(x - L(x)),\ \lambda(x - L(x))^{-1}\big)$
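The per-pixel Normal likelihood on this slide can be sketched in NumPy as below. This is a minimal illustration rather than the authors' implementation. It assumes a greyscale image, an integer (row, column) offset per pixel, and that each image pixel x reads the jigsaw pixel at x - L(x) with wrap-around at the jigsaw borders. The function name, argument names, array layout and wrap-around are assumptions introduced for the example.

```python
import numpy as np

def jigsaw_log_likelihood(image, offsets, mu, lam):
    """Log-likelihood of one image given a jigsaw and an offset map.

    image   : (H, W) float array, greyscale intensities
    offsets : (H, W, 2) int array, (row, col) offset for each pixel
    mu, lam : (Jh, Jw) float arrays, jigsaw mean and precision
    """
    H, W = image.shape
    Jh, Jw = mu.shape
    rows, cols = np.mgrid[0:H, 0:W]
    # Jigsaw pixel used by each image pixel: x - L(x), wrapped (assumption).
    jr = (rows - offsets[..., 0]) % Jh
    jc = (cols - offsets[..., 1]) % Jw
    m = mu[jr, jc]
    p = lam[jr, jc]
    # Sum of independent per-pixel Normal log-densities with precision p.
    log_density = 0.5 * np.log(p / (2.0 * np.pi)) - 0.5 * p * (image - m) ** 2
    return float(np.sum(log_density))
```

Working in log space keeps the product over pixels numerically stable. A colour image could be handled by summing the same term over channels.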
The Jigsaw model
Jigsaw J, with parameters (μ, λ)
Image I1, offset map L1; image I2, offset map L2; ...; image IN, offset map LN
Potts model:
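A Potts model on the offset map penalises every pair of neighbouring pixels whose offsets differ, which favours large regions that share one offset. The sketch below computes such an energy, so that the prior would be proportional to exp(-energy). It assumes a 4-connected pixel grid and a single penalty weight `gamma`. Both are assumptions for illustration, since the exact form used in the talk is not given in the transcript.

```python
import numpy as np

def potts_energy(offsets, gamma=1.0):
    """Potts energy of an offset map on a 4-connected grid.

    offsets : (H, W, 2) int array, (row, col) offset for each pixel
    gamma   : cost paid for each neighbouring pair with different offsets
    """
    # A pair disagrees if either offset component differs.
    vertical = np.any(offsets[1:, :] != offsets[:-1, :], axis=-1)
    horizontal = np.any(offsets[:, 1:] != offsets[:, :-1], axis=-1)
    return gamma * float(vertical.sum() + horizontal.sum())
```

A constant offset map has zero energy. Each boundary between two differently offset regions costs `gamma` per pixel pair along the boundary.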
Toy example
Training image Jigsaw
Learned using EM + graph cuts
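One part of such a learning loop can be sketched as follows: with the offset maps held fixed, re-estimate the jigsaw mean as the average of all image pixels that map to each jigsaw pixel. This is a simplified illustration under stated assumptions (greyscale images, integer offsets, wrap-around at the jigsaw borders, no prior on the parameters), not the authors' update. The graph-cut step that re-estimates the offset maps is not shown, and the function name is hypothetical.

```python
import numpy as np

def update_jigsaw_mean(images, offset_maps, jigsaw_shape):
    """Average the image pixels assigned to each jigsaw pixel.

    images       : list of (H, W) float arrays
    offset_maps  : list of (H, W, 2) int arrays, one per image
    jigsaw_shape : (Jh, Jw) size of the jigsaw
    Returns the new mean and the number of pixels mapped to each location.
    """
    Jh, Jw = jigsaw_shape
    total = np.zeros(jigsaw_shape)
    count = np.zeros(jigsaw_shape)
    for image, offsets in zip(images, offset_maps):
        H, W = image.shape
        rows, cols = np.mgrid[0:H, 0:W]
        jr = (rows - offsets[..., 0]) % Jh
        jc = (cols - offsets[..., 1]) % Jw
        # np.add.at accumulates correctly when indices repeat.
        np.add.at(total, (jr, jc), image)
        np.add.at(count, (jr, jc), 1)
    # Jigsaw pixels that no image pixel maps to are left at zero.
    mean = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return mean, count
```

Alternating this update with a step that re-estimates the offset maps gives a coordinate-ascent style loop.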
Dog example
Training image
32x32 jigsaw mean
Dog example
Reconstructed image
Learned segmentation
32x32 jigsaw mean
Epitome reconstruction
Faces example
128x128 jigsaw mean
100 64x64 images. Source: Olivetti face database
Learning the ‘pieces’
Image I1, offset map L1; image I2, offset map L2; ...; image IN, offset map LN
Jigsaw J
Faces example
Results of shape clustering on the face images
64x64 jigsaw
Object recognition (preliminary)
Training set: 20 street images (10 labelled)
64x64 jigsaw
Accuracy improves (~1%) if an additional 10 unlabelled images are included when learning the jigsaw.
Allow patches to deform (as in LayoutCRF, CVPR 2006).
Work in progress…
Training larger jigsaws on 100s of images
Incorporating shape clustering into the probabilistic model
Learning additional invariances e.g. to illumination
Object recognition results on MSRC and other datasets
Conclusions
The jigsaw model allows learning the shape and appearance of objects or object parts in images. It can also handle occlusion.
Clustering shape and appearance is much more powerful for recognition than appearance alone.
Can be used as a ‘plug-and-play’ replacement for fixed-size patches in any existing patch-based system.