Learning Shared Body Plans

Ian Endres, University of Illinois

Joint work with Derek Hoiem, Vivek Srikumar, and Ming-Wei Chang

How should we represent multiple related object categories?

Goal: detect, localize, and estimate the pose of a broad range of objects, including new ones

One option: independent detectors

• Basic-level categories: Cat Detector, Dog Detector, …
• Broad categories: 4-Legged Animal Detector
• Parts: Head Detector

Our previous work: Train separate detectors, Joint spatial model

[Figure: example predictions labeled with categories (Vehicle, Animal, Four-legged Mammal), parts (Wheel, Leg, Head), and attributes (Can run, Can jump, Facing right, Moves on road)]

Farhadi, Endres, Hoiem (2010)

Jointly trained multi-category models

• Train part/category detectors to jointly predict object structure
  – Only need to perform well in context defined by others
• Spatial model encodes likely part positions, number of parts, likely categories, etc.
  – Generalizes Felzenszwalb et al.: cross-category sharing, multiple parts with one model, variable size

Deformable Part Models

From Felzenszwalb et al.

Detection with Deformable Part Models

From Felzenszwalb et al.

Shared mixture of deformable parts: Body Plans

Include a body plan for background patches: no appearance models, just a bias

Body Plan Overview

[Figure: object center, head anchors (+), and high-scoring detections]

Anchor Point Score

$S_a = \text{bias} + \text{appearance score} - \text{deformation cost}$

• Appearance score: HOG-based deformable part model (Felzenszwalb et al.)
• Deformation cost: quadratic penalty in position and scale

Overall score must be greater than 0 to be detected
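The anchor point score can be illustrated with a minimal Python sketch. The `Anchor` container, the three-component offset (x, y, log-scale), and the per-dimension quadratic weights are assumptions made for illustration; the slides state only that the penalty is quadratic in position and scale and that a score must exceed 0 to count as a detection.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical container for one anchor point of a body plan."""
    bias: float
    # Expected offset from the object center: (dx, dy, dlog_scale). Assumed layout.
    expected: tuple
    # Non-negative quadratic weights on deviations in (dx, dy, dlog_scale).
    deform_weights: tuple


def anchor_score(anchor, appearance_score, offset):
    """S_a = bias + appearance score - deformation cost."""
    deformation_cost = sum(
        w * (o - e) ** 2
        for w, o, e in zip(anchor.deform_weights, offset, anchor.expected)
    )
    return anchor.bias + appearance_score - deformation_cost


def is_detected(score):
    """A hypothesis is kept only if its score is greater than 0."""
    return score > 0.0


head = Anchor(bias=-1.0, expected=(0.0, -0.5, 0.0), deform_weights=(2.0, 2.0, 1.0))
on_anchor = anchor_score(head, appearance_score=1.5, offset=(0.0, -0.5, 0.0))
displaced = anchor_score(head, appearance_score=1.5, offset=(1.0, -0.5, 0.0))
```

With the same appearance score, the detection sitting on the anchor passes the threshold while the displaced one is rejected by the deformation cost.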

Inference: Head

[Figure: candidate head detections (+); the selected detection is marked ✓]

Inference: Leg

[Figure: candidate leg detections (+); legs are selected one at a time (✓) until four are marked]

Search constraints: count, pairwise exclusion
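One way to read the leg inference sequence is as a greedy search under the two constraints named on the slide. The sketch below is an assumption-laden illustration, not the authors' procedure: the greedy order, the `select_parts` and `boxes_conflict` names, and the rule that any box intersection counts as a pairwise conflict are all hypothetical.

```python
def boxes_conflict(a, b):
    """True if boxes (x1, y1, x2, y2) intersect; a stand-in exclusion rule."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_parts(candidates, max_count):
    """Greedily pick positive-scoring candidates for one part type.

    candidates: list of (score, box) pairs.
    Count constraint: at most max_count boxes are selected.
    Pairwise exclusion: a box is skipped if it conflicts with a selected one.
    """
    selected = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if len(selected) == max_count or score <= 0:
            break
        if not any(boxes_conflict(box, kept) for _, kept in selected):
            selected.append((score, box))
    return selected


leg_candidates = [
    (2.0, (0, 0, 10, 10)),
    (1.8, (1, 0, 11, 10)),     # intersects the first box, so it is excluded
    (1.5, (20, 0, 30, 10)),
    (1.0, (40, 0, 50, 10)),
    (0.5, (60, 0, 70, 10)),
    (0.4, (80, 0, 90, 10)),    # dropped by the count constraint
    (-0.3, (100, 0, 110, 10)),
]
legs = select_parts(leg_candidates, max_count=4)
```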

Inference

Score for each body plan:

Overall score for an object hypothesis:

Benefits of Joint Learning

Only consider structures with:

Benefits of Joint Learning

No structures have

(Latent) Max Margin Structured Learning

Highest Scoring Valid Structure

Invalid Structure Loss

Soft margin slack
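The three labels above annotate the terms of a margin constraint. A standard margin-rescaled form consistent with those labels is sketched here; the notation ($w$ for weights, $\Phi$ for the joint feature map, $\mathcal{Y}_{\text{valid}}(x_i)$ for the valid structures of example $i$, $\Delta_i$ for the loss, $\xi_i$ for the slack, $C$ for the trade-off constant) is assumed and may differ from the form used in the talk.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\qquad \text{s.t.} \qquad
\underbrace{\max_{y \in \mathcal{Y}_{\text{valid}}(x_i)} w^\top \Phi(x_i, y)}_{\text{highest scoring valid structure}}
\;\ge\;
w^\top \Phi(x_i, \hat{y})
+ \underbrace{\Delta_i(\hat{y})}_{\text{invalid structure loss}}
- \underbrace{\xi_i}_{\text{soft margin slack}}
\qquad \forall i,\ \forall \hat{y} \notin \mathcal{Y}_{\text{valid}}(x_i)
```

The max on the left-hand side is what makes the problem latent: the best valid structure is not fixed in advance but chosen under the current weights.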

Valid Structures

Positive Examples

[Figure: labeled example with Four-legged, Elk, Head, and four Leg boxes]

• Object detectors: 50% overlap with ground truth
• Part detectors: 25% overlap with ground truth

Negative Examples

• Must select BG body plan
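The overlap thresholds for valid structures can be sketched as follows. The slides do not define "overlap"; this example assumes intersection over union, as in the PASCAL VOC criterion, and assumes the threshold is inclusive. The function and constant names are hypothetical.

```python
OBJECT_MIN_OVERLAP = 0.50  # object detectors
PART_MIN_OVERLAP = 0.25    # part detectors


def overlap(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2). Assumed measure."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def matches_ground_truth(box, gt_box, is_part):
    """True if the box overlaps the ground truth enough for its detector type."""
    threshold = PART_MIN_OVERLAP if is_part else OBJECT_MIN_OVERLAP
    return overlap(box, gt_box) >= threshold
```

Under these assumptions, a box with overlap of one third is acceptable as a part but not as an object.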

Loss

Positive Examples

[Figure: predicted structure with Four-legged, Elk, Head, and Leg boxes compared against ground truth]

• False positives: +1
• Duplicate detections: +1
• Missed detections: +1

Negative Examples

• Non-BG body plan: +1
• False positives: +1
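The loss terms listed above can be tallied with a short sketch. The input encoding is an assumption: each predicted box is represented by the index of the ground-truth box it matches (or `None`), and the function names are hypothetical.

```python
def positive_example_loss(assigned_gt, num_gt):
    """Loss of a predicted structure on a positive example.

    assigned_gt: for each predicted box, the index of the ground-truth box it
    matches, or None if it matches nothing (assumed encoding).
    num_gt: number of ground-truth boxes in the example.
    """
    seen = set()
    loss = 0
    for gt_index in assigned_gt:
        if gt_index is None:
            loss += 1            # false positive
        elif gt_index in seen:
            loss += 1            # duplicate detection
        else:
            seen.add(gt_index)
    loss += num_gt - len(seen)   # missed detections
    return loss


def negative_example_loss(selected_bg_plan, num_detections):
    """Loss on a negative example: +1 for a non-BG body plan, +1 per detection."""
    loss = 0 if selected_bg_plan else 1
    return loss + num_detections
```

For example, predictions `[0, 0, None, 2]` against four ground-truth boxes incur one duplicate, one false positive, and two misses.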

Optimization

• Latent Structured SVM
  – Non-convex: optimized with CCCP

• Stochastic gradient descent based cutting plane optimization

Optimization Challenges

1) Expensive search for violated constraints
  – Mine many violated constraints at once
  – Speeds convergence

2) Large feature vectors (100k+)
  – Can’t store every mined violated constraint
  – Requires careful caching
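The two challenges above suggest a bounded cache of mined constraints that is revisited by cheap gradient steps. The sketch below is a toy illustration under stated assumptions, not the authors' optimizer: constraints are stored as sparse feature differences with a loss value, the eviction rule (drop the least violated) is a guess at what "careful caching" could mean, and all names are hypothetical.

```python
def dot(w, sparse):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in sparse.items())


def hinge(w, constraint):
    """Violation of a constraint: loss - w . (phi_valid - phi_invalid)."""
    dphi, loss = constraint
    return max(0.0, loss - dot(w, dphi))


class ConstraintCache:
    """Bounded store of mined constraints, kept as sparse feature differences."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def add_many(self, w, constraints):
        # Add every newly mined constraint that is violated under w ...
        self.items.extend(c for c in constraints if hinge(w, c) > 0.0)
        # ... then evict the least violated ones if over capacity.
        self.items.sort(key=lambda c: hinge(w, c), reverse=True)
        del self.items[self.capacity:]


def sgd_pass(w, cache, learning_rate=0.1, regularization=0.01):
    """One pass of subgradient steps over the cached constraints."""
    for constraint in cache.items:
        for k in w:
            w[k] -= learning_rate * regularization * w[k]  # shrink toward 0
        if hinge(w, constraint) > 0.0:
            dphi, _ = constraint
            for k, v in dphi.items():
                w[k] = w.get(k, 0.0) + learning_rate * v
    return w


weights = {}
cache = ConstraintCache(capacity=2)
mined = [({"a": 1.0}, 1.0), ({"b": 1.0}, 1.0), ({"a": 1.0, "b": 1.0}, 0.5)]
cache.add_many(weights, mined)   # mine several constraints at once
for _ in range(50):
    sgd_pass(weights, cache)
```

In a full system the mining step would rerun inference on the training images between rounds of gradient passes, which is the expensive part that batching is meant to amortize.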

Experimental Setup

• CORE: Train + Test
  – Familiar Categories: Camel, Dog, Elephant, Elk
  – Parts: Head, Leg, Torso
  – Unfamiliar Categories: Cat, Cow

• Pascal 2008: Test
  – Unfamiliar Categories: Cat, Cow, Horse, Sheep

Familiar Objects

Unfamiliar Objects

Mistakes

Object Level Results (AP)

Familiar four-legged parts (AP)

Unfamiliar four-legged parts (AP)

Mixed Supervision

[Figure: training examples labeled with different subsets of boxes (Four-legged, Dog, Head, Leg), all used for learning]

Mixed Supervision - Learning

• Unlabeled boxes become latent variables
  – Compute most likely position
  – No loss for missed detections

Highest Scoring Valid Structure

Loss
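The treatment of unlabeled boxes can be sketched as a label-completion step followed by a loss that ignores latent boxes. The dictionary encoding (`None` for an unlabeled box), the use of the top-scoring candidate as the "most likely position", and the function names are assumptions for illustration.

```python
def complete_labels(labeled, candidates):
    """Fill in unlabeled boxes with the highest scoring candidate position.

    labeled: dict mapping box name -> box, or None when the box is unlabeled.
    candidates: dict mapping box name -> list of (score, box) hypotheses.
    Returns the completed structure and the set of names treated as latent.
    """
    completed, latent = {}, set()
    for name, box in labeled.items():
        if box is not None:
            completed[name] = box
            continue
        latent.add(name)
        options = candidates.get(name, [])
        if options:
            completed[name] = max(options, key=lambda c: c[0])[1]
    return completed, latent


def missed_detection_loss(predicted_names, labeled):
    """Count misses over labeled boxes only; latent boxes add no loss."""
    return sum(
        1 for name, box in labeled.items()
        if box is not None and name not in predicted_names
    )


labeled = {"dog": (0, 0, 100, 80), "head": None, "leg_1": (10, 50, 20, 80)}
candidates = {"head": [(0.2, (70, 0, 100, 30)), (1.1, (0, 0, 30, 30))]}
completed, latent = complete_labels(labeled, candidates)
```

Here the unlabeled head is placed at its best-scoring candidate, and a prediction that omits the head is not penalized for it.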

Mixed Supervision … Mixed Results (AP)

Conclusions

• Jointly representing related categories leads to better performance and generalization to unfamiliar categories

• Joint training is important to get the full benefit of the spatial model

Thanks
