

Taxonomic Multi-class Prediction and Person Layout using Efficient Structured Ranking

Arpit Mittal1, Matthew Blaschko2, Andrew Zisserman1 and Philip Torr3

1Visual Geometry Group, University of Oxford 2Center of Visual Computing, École Centrale Paris 3Oxford Brookes Vision Group, Oxford Brookes University

1. Objective and Contributions

2. A parts-based model in which the mis-classification score should be proportional to the number of parts mis-classified.

[Taxonomy figure: Animal → {Horse, Cow}; Vehicle → {Bus, Bike}]

Evaluation Measure: Mean accumulated taxonomic loss over the top scoring ‘t’ classes for a given image.

* Taxonomic loss between ‘bike’ and ‘bus’ is 2, whereas between ‘bike’ and ‘cow’ it is 4.
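The tree-distance loss on the toy taxonomy above can be sketched as follows. This is a minimal illustration, not the authors' code; the `PARENT` table and function names are ours:

```python
# Toy taxonomy from the poster: root -> {Animal, Vehicle},
# Animal -> {Horse, Cow}, Vehicle -> {Bus, Bike}.
PARENT = {
    "Horse": "Animal", "Cow": "Animal",
    "Bus": "Vehicle", "Bike": "Vehicle",
    "Animal": "root", "Vehicle": "root",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path

def taxonomic_loss(a, b):
    """Tree distance between two classes: the number of edges on the
    path from `a` to `b` through their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_a = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in depth_in_a:            # lowest common ancestor found
            return depth_in_a[n] + j
    raise ValueError("classes are not in the same tree")

print(taxonomic_loss("Bike", "Bus"))   # 2 (siblings under Vehicle)
print(taxonomic_loss("Bike", "Cow"))   # 4 (path goes through the root)
```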

We perform layout prediction in two stages:

(i) Part candidates are generated using individual detectors.

• The candidates are filtered and scored using local position and scale cues (ϕhead and ϕhand for the head and hand respectively).

(ii) Candidates are combined and ranked using the structured output ranking.

• ϕ(xi, yi) is formed by concatenating the feature vectors of the head and all hands: ϕ(xi, yi) = [ϕhead, (ϕhand)h].
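The concatenation step can be sketched as follows. The feature dimensions and values here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def layout_feature(phi_head, phi_hands):
    """Joint layout feature: the head feature vector followed by the
    feature vectors of all hand candidates,
    phi(x, y) = [phi_head, phi_hand_1, ..., phi_hand_H]."""
    return np.concatenate([phi_head] + list(phi_hands))

# Hypothetical per-part cues (e.g. detector score and a scale cue).
phi_head = np.array([0.9, 0.1])
phi_hands = [np.array([0.7, 0.2]), np.array([0.4, 0.3])]

phi = layout_feature(phi_head, phi_hands)
print(phi.shape)  # (6,)
```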

[Pipeline figure: Feature Vector → Structured Ranking SVM → Confidence Value]

3. Implementation and Results

Multi-class Taxonomic Prediction

• Experiments are performed using the Indoor Scene database [Quattoni et al., 2009] and the PASCAL VOC 2007 classification dataset.

• The Indoor Scene database has 67 classes structured in a two-level taxonomy, whereas the VOC 2007 dataset has a three-level taxonomy over 20 classes.

• The deeper taxonomy of the PASCAL VOC dataset yields better performance.

Person Layout

• The PASCAL VOC 2011 person layout dataset is used for the experiments.

• Our method is also compared with the PASCAL VOC 2010 submissions.

This research is funded by ERC grant VisRec no. 228180, ERC grant no. 259112, ONR MURI N00014-07-1-0182 and PASCAL2 Network of Excellence.

[Figure: samples ordered by compatibility score; the desired ranking satisfies Loss1 < Loss2 < Loss3]

Contributions:

• Show that the training time can be reduced from quadratic to linear in the number of training samples.

• Improved performance over the state of the art on both example problems.


2. Method

Linear Time Constraint Generation

• Generalizes ordinal regression to a structured output space.

• Learns the weight vector w such that input-output pairs with lower loss values (Δi) are assigned a higher compatibility score f(xi, yi) = wTϕ(xi, yi).

• The objective function pays a hinge loss proportional to the difference in losses for mis-ordering. (1)

• (1) has O(n²) constraints and slack variables.

• The equivalent 1-slack formulation (2) has a single slack variable.

• cij = |(Δi < Δj) ∧ ((wTϕ(xi, yi)) − (wTϕ(xj, yj)) < 1)|.

• A constraint is violated when the difference in the compatibility scores of the two samples is less than the margin.

• Training of (2) involves optimizing the objective function with the violated constraints.

• If the loss values are discrete and finite, all the violated constraints can be found in linear time.
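The violated-constraint indicator cij above can be evaluated directly. A minimal sketch on synthetic scores and losses (the data and variable names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
scores = rng.normal(size=n)           # compatibility scores w^T phi(x_i, y_i)
losses = rng.integers(0, 3, size=n)   # discrete loss values Delta_i

# c_ij = 1 iff Delta_i < Delta_j (sample i should be ranked above j)
# and the margin is violated: score_i - score_j < 1.
c = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        if losses[i] < losses[j] and scores[i] - scores[j] < 1:
            c[i, j] = 1

print(c.sum(), "violated constraints out of", n * n, "pairs")
```

This direct evaluation is O(n²); the algorithm below avoids enumerating the pairs explicitly.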


• Sort the data samples into a list S by decreasing compatibility score.

• Iterate through the set of loss values l = l1 … lL. For each l:

• If a sample j with Δj > l (black bar) is scored such that it violates the margin constraint with a sample i having Δi = l, it also violates the constraints with all subsequent samples i′ having Δi′ = l in the sorted list S.

• By recording all the samples having loss value l (grey bars), all the violated constraints for pairs (i, j) with Δi = l can be found in one pass through the samples.

• Code: http://www.robots.ox.ac.uk/~vgg/software/struct_rank/
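The sweep above can be sketched as a count of violated pairs. This is our own illustration of the idea, not the released code: for each loss value, a single monotone pointer pass over the score-sorted list replaces the pairwise check, and we compare it against a direct O(n²) count on synthetic data:

```python
import random

def count_violations_naive(scores, losses):
    """Direct O(n^2) count: pair (i, j) is violated when
    losses[i] < losses[j] and scores[i] - scores[j] < 1."""
    n = len(scores)
    return sum(1 for i in range(n) for j in range(n)
               if losses[i] < losses[j] and scores[i] - scores[j] < 1)

def count_violations_linear(scores, losses):
    """O(L * n) count using one pass per loss value."""
    # The list S: sample indices by decreasing compatibility score.
    S = sorted(range(len(scores)), key=lambda k: -scores[k])
    total = 0
    for l in sorted(set(losses)):
        A = [k for k in S if losses[k] == l]   # samples with Delta_i = l
        p = 0                                  # monotone pointer into A
        for j in S:                            # decreasing score order
            if losses[j] <= l:
                continue
            # Skip samples i that satisfy the margin w.r.t. j; because
            # scores[j] only decreases, the pointer never moves back.
            while p < len(A) and scores[A[p]] - scores[j] >= 1:
                p += 1
            total += len(A) - p                # the rest all violate
    return total

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(50)]
losses = [random.randrange(3) for _ in range(50)]
print(count_violations_linear(scores, losses),
      count_violations_naive(scores, losses))  # the two counts agree
```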

[Results figures: top-ranked predictions, where the target class is ‘bike’]

The number of loss values is logarithmic in the number of class labels.

• We learn the class hierarchies in a ranking setting.

• The image (xi) is represented by quantised SIFT descriptors encoded using locality-constrained linear encoding.

• The joint feature map (ϕ) is the standard one used in multi-class prediction: ϕ(xi, yi) = λ(yi) xi.

• Class attribute vector: λj(yi) = 1 if j = yi, and zero otherwise.
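The standard multi-class joint feature map can be sketched as follows. With λ(yi) a one-hot class indicator, ϕ places the image feature vector in the block corresponding to the class; the dimensions and values here are illustrative assumptions:

```python
import numpy as np

def joint_feature_map(x, y, num_classes):
    """phi(x, y): the feature vector x written into the block of a
    stacked vector that corresponds to class y, zeros elsewhere.
    Equivalent to kron(lambda(y), x) with lambda(y) one-hot."""
    d = len(x)
    phi = np.zeros(num_classes * d)
    phi[y * d:(y + 1) * d] = x
    return phi

x = np.array([0.2, 0.5, 0.3])   # e.g. a quantised-SIFT histogram
phi = joint_feature_map(x, y=1, num_classes=3)
print(phi)                      # nonzero only in the class-1 block
```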

The number of loss values is limited by the number of parts.

Objective: Structured output ranking for problems where the mis-classification score has a special structure.

Examples:

1. A taxonomy in which the mis-classification score for object classes that are close by (using the tree distance within the taxonomy) should be less than for those far apart.
