Sharing Features Between Visual Tasks at Different Levels of Granularity
Sung Ju Hwang1, Fei Sha2 and Kristen Grauman1
1 University of Texas at Austin, 2 University of Southern California
Problem
Sharing features between sub/superclasses
A single visual instance can have multiple labels at different levels of semantic granularity.
Main Idea
We propose to simultaneously learn shared features that are discriminative for tasks at different levels of semantic granularity.
Baselines
1) No sharing: baseline SVM classifier for object class recognition.
2) Sharing-Same level only: sharing features between object classifiers at the same level.
3) Sharing+Superclass: subclass classifiers trained with features shared with superclasses.*
4) Sharing+Subclass: superclass classifiers trained with features shared with subclasses.*
* We use the algorithm for kernel classifiers.
1) Finer-grained categorization tasks benefit from sharing features with their superclasses.
→ A subclass classifier learns features specific to its superclass, so it can better discriminate itself from the classes that do NOT belong to the same superclass.
2) Coarser-grained categorization tasks do not benefit from sharing features with their subclasses.
→ Features learned for subclasses capture intra-class variation, which only introduces confusion.
[Chart: recognition accuracy, sub/superclass experiment.]
Example object class / attribute predictions
(Three example test images from the poster; incorrect predictions were marked in red.)

Predicted object:
  Image 1 | NSO: Dolphin      | NSA: Walrus     | Ours: Grizzly bear
  Image 2 | NSO: Grizzly bear | NSA: Rhinoceros | Ours: Moose
  Image 3 | NSO: Giant Panda  | NSA: Rabbit     | Ours: Rhinoceros

Predicted attributes:
  Image 1 | NSA:  Fast, active, toughskin, chewteeth, forest, ocean, swims
          | Ours: Fast, active, toughskin, fish, forest, meatteeth, strong
  Image 2 | NSA:  Strong, inactive, vegetation, quadrapedal, slow, walks, big
          | Ours: Strong, toughskin, slow, walks, vegetation, quadrapedal, inactive
  Image 3 | NSA:  Quadrapedal, oldworld, walks, ground, furry, gray, chewteeth
          | Ours: Quadrapedal, oldworld, ground, walks, tail, gray, furry
[Figure: example attributes such as "White", "Spots", "Long leg" illustrated on animal images, e.g. a polar bear.]
Motivation: By regularizing the classifiers to use shared features, we aim to
• select features that are associated with semantic concepts at each level
• avoid overfitting when object-labeled data is lacking
1) Our method is more robust to background clutter, since sparsity regularization with attributes yields a more refined set of features.
2) Our method makes robust predictions in atypical cases.
3) When our method fails, it often makes semantically "closer" predictions.
[Figure: one visual instance (a Dalmatian image) carries labels at several levels of granularity: object class "Dalmatian"; superclasses "Dog, Canine, Placental mammal"; attributes such as "Spots".]
→ How can we learn new information from these extra labels that can aid object recognition?
[Diagram: input visual features x_1, x_2, ..., x_D are mapped to shared features u_1, u_2, ..., u_D, on which the object, superclass, and attribute classifiers all operate.]
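As a rough, hypothetical illustration of this diagram (all names and sizes below are placeholders, not the authors' code), every task scores the same transformed representation:

import numpy as np

rng = np.random.default_rng(0)
D, T = 512, 10            # D visual features; T tasks (objects + superclasses + attributes)

# In the learned model these come from training; random stand-ins here.
U, _ = np.linalg.qr(rng.standard_normal((D, D)))  # orthogonal transform to the shared space
A = rng.standard_normal((D, T))                   # one weight vector per task

x = rng.standard_normal(D)  # input visual features x_1 ... x_D
u = U.T @ x                 # shared features u_1 ... u_D
scores = A.T @ u            # object, superclass, and attribute scores from the same u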
Dataset
Dataset                 | # images | # classes | # attributes | Hierarchy levels
Animals with Attributes | 30,475   | 50 (40)   | 28           | 2
[Charts: recognition accuracy for each class, and overall recognition accuracy.]
1) We improve on 33 of the 50 AwA classes, and on all classes of OSR.
2) Classes with more distinctive attributes benefit more from feature sharing: e.g. dalmatian, leopard, giraffe, zebra.
Sharing features between objects/attributes
Baselines
1) No sharing-Obj.: baseline SVM classifier for object class recognition.
2) No sharing-Attr.: baseline object recognition on predicted attributes, as in Lampert et al.'s approach.
3) Sharing-Obj.: our multitask feature sharing with the object class classifiers only.
4) Sharing-Attr.: our multitask feature sharing with object class + attribute classifiers.
Conclusion / Future Work
By sharing features between classifiers learned at different levels of granularity, we improve object class recognition rates. The exploited semantics effectively regularize the object models.
Future work
1) Automatic selection of useful attributes/superclass grouping.
2) Leveraging the label structure to refine the degree of sharing.
Algorithm
Extension to kernel classifiers
1) Form the kernel matrix K.
2) Compute the basis vectors B and diagonal matrix S using a Gram-Schmidt process.
3) Transform the data according to the learned B and S.
4) Apply the linear-classifier algorithm below to the transformed features.
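Gram-Schmidt orthogonalization in the kernel-induced feature space is, up to ordering, a Cholesky factorization of K; here is a minimal sketch under that assumption (`empirical_feature_map` and the RBF setup are my illustrations, not the poster's exact B and S):

import numpy as np

def empirical_feature_map(K, eps=1e-8):
    """Turn an N x N kernel matrix into explicit N x N features Z
    such that Z @ Z.T reproduces K (up to the eps jitter)."""
    N = K.shape[0]
    # Jitter for numerical stability; Cholesky plays the role of
    # Gram-Schmidt on the implicit feature-space representers.
    return np.linalg.cholesky(K + eps * np.eye(N))  # row n = features for example n

# Usage: an RBF kernel on random data; the linear-classifier
# algorithm can then be run on Z, as in step 4 above.
X = np.random.default_rng(0).standard_normal((100, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
Z = empirical_feature_map(K)
assert np.allclose(Z @ Z.T, K, atol=1e-5)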
Learning shared features for linear classifiers
[Flowchart: Initialization → Independent classifier training → Feature learning, alternated until convergence.]
1) Initialize the covariance matrix $\Omega$ with a scaled identity matrix $I/D$.
2) Transform the variables using the covariance matrix $\Omega$:
   $\tilde{x}_n = \Omega^{1/2} x_n$ : transformed n-th feature vector
   $\tilde{w}_t = \Omega^{-1/2} w_t$ : transformed classifier weights
3) Solve for the optimal weights $\tilde{W} = [\tilde{w}_1, \ldots, \tilde{w}_T]$, while holding $\Omega$ fixed.
4) Update the covariance matrix:
   $\Omega = \dfrac{(W W^\top + \varepsilon I)^{1/2}}{\operatorname{tr}\big((W W^\top + \varepsilon I)^{1/2}\big)}$
   where $W = [w_1, \ldots, w_T]$ are the weight vectors and $\varepsilon$ is a smoothing parameter (for numerical stability).
Alternate steps 2)-4) until W converges.
We adopt the alternating optimization algorithm from [Argyriou08], which alternates between training the classifiers and learning the shared features.
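A minimal sketch of this alternating scheme, with ridge regression standing in for the poster's SVM loss for brevity; the function name and hyperparameters are illustrative, not from the paper:

import numpy as np

def multitask_feature_learning(X, Y, gamma=1.0, eps=1e-6, n_iters=50):
    """Alternating optimization in the style of [Argyriou08].
    X: (N, D) feature matrix; Y: (N, T) labels, one column per task.
    Returns W (D, T), the jointly regularized task weights."""
    N, D = X.shape
    Omega = np.eye(D) / D                       # 1) scaled-identity initialization
    for _ in range(n_iters):
        # 2) transform variables: x_tilde = Omega^{1/2} x
        vals, vecs = np.linalg.eigh(Omega)
        Om_half = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T
        Xt = X @ Om_half
        # 3) independent per-task training with Omega held fixed
        #    (ridge regression here; an SVM solver would slot in the same way)
        W_tilde = np.linalg.solve(Xt.T @ Xt + gamma * np.eye(D), Xt.T @ Y)
        W = Om_half @ W_tilde                   # undo the change of variables
        # 4) covariance update, smoothed with eps for numerical stability
        M = W @ W.T + eps * np.eye(D)
        e, V = np.linalg.eigh(M)
        M_half = V @ np.diag(np.sqrt(np.clip(e, 0, None))) @ V.T
        Omega = M_half / np.trace(M_half)
    return W

With the hinge loss, only step 3 changes: a standard SVM is trained per task on the transformed features; the structure of the alternation is unchanged.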
Learning shared features via regularization
Sharing features via sparsity regularization
Multitask feature learning
We train the object classifiers and attribute classifiers jointly, with an SVM loss function on the original feature space and a (2,1)-norm regularizer:

  $\min_{U, A}\; \sum_{t=1}^{T} \sum_{n=1}^{N} L\big(y_n^t,\; \langle a_t,\, U^\top x_n \rangle\big) \;+\; \gamma\, \|A\|_{2,1}^2$

  $x_n$ : n-th feature vector
  $y_n^t$ : n-th label for task t
  $a_t$ : parameter (weight vector) for task t (a column of A)
  $U$ : orthogonal transformation to a shared feature space ($U^\top x_n$ are the transformed, shared features)
  $\gamma$ : regularization parameter

The (2,1)-norm applies an L2-norm across tasks per shared dimension (joint data fitting) and an L1-norm across dimensions (sparsity), so the tasks jointly select features that are general rather than training-set specific; combined with the learned U, this is equivalent to trace norm regularization. Sparsity regularization on the parameters across different tasks thus yields shared features with better generalization power.
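To make the regularizer concrete, here is a small helper (my illustration, not the poster's code) that computes the (2,1)-norm of the D x T parameter matrix, plus a check that row-sparse parameters are penalized less than dense ones of equal Frobenius norm:

import numpy as np

def norm_21(A):
    # Per feature dimension (row): L2-norm across the T tasks,
    # then sum (L1-norm) over dimensions -> whole rows (features)
    # are encouraged to switch off jointly for all tasks.
    return np.sqrt((A ** 2).sum(axis=1)).sum()

# Row-sparse vs. dense parameters with the same Frobenius norm:
row_sparse = np.zeros((4, 3)); row_sparse[0] = 2.0
dense = np.full((4, 3), 1.0)
print(norm_21(row_sparse))  # 2*sqrt(3) ~ 3.46  (smaller penalty: preferred)
print(norm_21(dense))       # 4*sqrt(3) ~ 6.93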
[Argyriou08] A. Argyriou, T. Evgeniou, M. Pontil, Convex Multi-Task Feature Learning, Machine Learning, 2008.
Convex optimization
(Loss function over the transformed (shared) features; sparsity regularizer replaced by a penalty involving a covariance matrix.)
However, the (2,1)-norm is nonsmooth. We instead solve an equivalent form, in which the regularizer is expressed through a covariance matrix that measures the relative effectiveness of each feature dimension.
How can we promote common sparsity across the different tasks' parameters? → We use the (2,1)-norm regularizer above: an L2-norm across tasks followed by an L1-norm across feature dimensions [Argyriou08].
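For completeness, the identity behind this reformulation can be written out; this is my transcription of the standard variational form from [Argyriou08], with $a^d$ denoting the d-th row of A:

% Squared (2,1)-norm as a smooth minimization over a covariance matrix:
\|A\|_{2,1}^2
  \;=\; \Big(\sum_{d=1}^{D} \|a^d\|_2\Big)^{2}
  \;=\; \min_{\substack{\Omega \succeq 0,\ \operatorname{tr}(\Omega) \le 1 \\ \Omega \text{ diagonal}}}
        \;\sum_{t=1}^{T} \langle a_t,\, \Omega^{-1} a_t \rangle

Allowing a general (non-diagonal) $\Omega$ yields the squared trace norm instead, which is the smooth, jointly convex problem the alternating algorithm above optimizes.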
Our approach makes substantial improvements over the baselines. By exploiting the external semantics the auxiliary attribute tasks provide, our learned features generalize better, particularly when less training data is available.