Sharing Features Between Visual Tasks at Different Levels of Granularity
Sung Ju Hwang1, Fei Sha2 and Kristen Grauman1
1 University of Texas at Austin, 2 University of Southern California
Problem
Sharing features between sub/superclasses
A single visual instance can have multiple labels at different levels of semantic granularity.
Main Idea
We propose to simultaneously learn shared features that are discriminative for tasks at different levels of semantic granularity.
Baselines
1) No sharing: baseline SVM classifier for object class recognition.
2) Sharing-Same level only: sharing features between object classifiers at the same level.
3) Sharing+Superclass: subclass classifiers trained with features shared with superclasses.*
4) Sharing+Subclass: superclass classifiers trained with features shared with subclasses.*
* We use the algorithm for kernel classifiers.
1) Finer-grained categorization tasks benefit from sharing features with their superclasses.
→ A subclass classifier learns features specific to its superclass, so it can better discriminate itself from the classes that do NOT belong to the same superclass.
2) Coarser-grained categorization tasks do not benefit from sharing features with their subclasses.
→ Features learned for subclasses capture intra-class variation, which only introduces confusion.
[Chart: recognition accuracy, sub/superclass experiment.]
Example object class / attribute predictions
(Three example test images from the poster; incorrect predictions were marked in red.)

Predicted object:
  Image 1 | NSO: Dolphin      | NSA: Walrus     | Ours: Grizzly bear
  Image 2 | NSO: Grizzly bear | NSA: Rhinoceros | Ours: Moose
  Image 3 | NSO: Giant Panda  | NSA: Rabbit     | Ours: Rhinoceros

Predicted attributes:
  Image 1 | NSA:  Fast, active, toughskin, chewteeth, forest, ocean, swims
          | Ours: Fast, active, toughskin, fish, forest, meatteeth, strong
  Image 2 | NSA:  Strong, inactive, vegetation, quadrapedal, slow, walks, big
          | Ours: Strong, toughskin, slow, walks, vegetation, quadrapedal, inactive
  Image 3 | NSA:  Quadrapedal, oldworld, walks, ground, furry, gray, chewteeth
          | Ours: Quadrapedal, oldworld, ground, walks, tail, gray, furry
[Figure: example attributes such as "White", "Spots", "Long leg" illustrated on animal images, e.g. a polar bear.]
Motivation: By regularizing the classifiers to use shared features, we aim to
• select features that are associated with semantic concepts at each level
• avoid overfitting when object-labeled data is lacking
1) Our method is more robust to background clutter, since sparsity regularization with attributes yields a more refined set of features.
2) Our method makes robust predictions in atypical cases.
3) When our method fails, it often makes semantically "closer" predictions.
[Figure: one visual instance (a Dalmatian image) carries labels at several levels of granularity: object class "Dalmatian"; superclasses "Dog, Canine, Placental mammal"; attributes such as "Spots".]
→ How can we learn new information from these extra labels that can aid object recognition?
[Diagram: input visual features x_1, x_2, ..., x_D are mapped to shared features u_1, u_2, ..., u_D, on which the object, superclass, and attribute classifiers all operate.]
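As a rough, hypothetical illustration of this diagram (all names and sizes below are placeholders, not the authors' code), every task scores the same transformed representation:

import numpy as np

rng = np.random.default_rng(0)
D, T = 512, 10            # D visual features; T tasks (objects + superclasses + attributes)

# In the learned model these come from training; random stand-ins here.
U, _ = np.linalg.qr(rng.standard_normal((D, D)))  # orthogonal transform to the shared space
A = rng.standard_normal((D, T))                   # one weight vector per task

x = rng.standard_normal(D)  # input visual features x_1 ... x_D
u = U.T @ x                 # shared features u_1 ... u_D
scores = A.T @ u            # object, superclass, and attribute scores from the same u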
Dataset
Dataset                 | # images | # classes | # attributes | Hierarchy levels
Animals with Attributes | 30,475   | 50 (40)   | 28           | 2
[Charts: recognition accuracy for each class, and overall recognition accuracy.]
1) We improve on 33 of the 50 AwA classes, and on all classes of OSR.
2) Classes with more distinctive attributes benefit more from feature sharing: e.g. dalmatian, leopard, giraffe, zebra.
Sharing features between objects/attributes
Baselines
1) No sharing-Obj.: baseline SVM classifier for object class recognition.
2) No sharing-Attr.: baseline object recognition on predicted attributes, as in Lampert et al.'s approach.
3) Sharing-Obj.: our multitask feature sharing with the object class classifiers only.
4) Sharing-Attr.: our multitask feature sharing with object class + attribute classifiers.
Conclusion / Future Work
By sharing features between classifiers learned at different levels of granularity, we improve object class recognition rates. The exploited semantics effectively regularize the object models.
Future work
1) Automatic selection of useful attributes/superclass grouping.
2) Leveraging the label structure to refine the degree of sharing.
Algorithm
Extension to kernel classifiers
1) Form the kernel matrix K.
2) Compute the basis vectors B and diagonal matrix S using a Gram-Schmidt process.
3) Transform the data according to the learned B and S.
4) Apply the linear-classifier algorithm below to the transformed features.
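Gram-Schmidt orthogonalization in the kernel-induced feature space is, up to ordering, a Cholesky factorization of K; here is a minimal sketch under that assumption (`empirical_feature_map` and the RBF setup are my illustrations, not the poster's exact B and S):

import numpy as np

def empirical_feature_map(K, eps=1e-8):
    """Turn an N x N kernel matrix into explicit N x N features Z
    such that Z @ Z.T reproduces K (up to the eps jitter)."""
    N = K.shape[0]
    # Jitter for numerical stability; Cholesky plays the role of
    # Gram-Schmidt on the implicit feature-space representers.
    return np.linalg.cholesky(K + eps * np.eye(N))  # row n = features for example n

# Usage: an RBF kernel on random data; the linear-classifier
# algorithm can then be run on Z, as in step 4 above.
X = np.random.default_rng(0).standard_normal((100, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
Z = empirical_feature_map(K)
assert np.allclose(Z @ Z.T, K, atol=1e-5)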
Learning shared features for linear classifiers
[Flowchart: Initialization → Independent classifier training → Feature learning, alternated until convergence.]
1) Initialize the covariance matrix $\Omega$ with a scaled identity matrix $I/D$.
2) Transform the variables using the covariance matrix $\Omega$:
   $\tilde{x}_n = \Omega^{1/2} x_n$ : transformed n-th feature vector
   $\tilde{w}_t = \Omega^{-1/2} w_t$ : transformed classifier weights
3) Solve for the optimal weights $\tilde{W} = [\tilde{w}_1, \ldots, \tilde{w}_T]$, while holding $\Omega$ fixed.
4) Update the covariance matrix:
   $\Omega = \dfrac{(W W^\top + \varepsilon I)^{1/2}}{\operatorname{tr}\big((W W^\top + \varepsilon I)^{1/2}\big)}$
   where $W = [w_1, \ldots, w_T]$ are the weight vectors and $\varepsilon$ is a smoothing parameter (for numerical stability).
Alternate steps 2)-4) until W converges.
We adopt the alternating optimization algorithm from [Argyriou08], which alternates between training the classifiers and learning the shared features.
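A minimal sketch of this alternating scheme, with ridge regression standing in for the poster's SVM loss for brevity; the function name and hyperparameters are illustrative, not from the paper:

import numpy as np

def multitask_feature_learning(X, Y, gamma=1.0, eps=1e-6, n_iters=50):
    """Alternating optimization in the style of [Argyriou08].
    X: (N, D) feature matrix; Y: (N, T) labels, one column per task.
    Returns W (D, T), the jointly regularized task weights."""
    N, D = X.shape
    Omega = np.eye(D) / D                       # 1) scaled-identity initialization
    for _ in range(n_iters):
        # 2) transform variables: x_tilde = Omega^{1/2} x
        vals, vecs = np.linalg.eigh(Omega)
        Om_half = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T
        Xt = X @ Om_half
        # 3) independent per-task training with Omega held fixed
        #    (ridge regression here; an SVM solver would slot in the same way)
        W_tilde = np.linalg.solve(Xt.T @ Xt + gamma * np.eye(D), Xt.T @ Y)
        W = Om_half @ W_tilde                   # undo the change of variables
        # 4) covariance update, smoothed with eps for numerical stability
        M = W @ W.T + eps * np.eye(D)
        e, V = np.linalg.eigh(M)
        M_half = V @ np.diag(np.sqrt(np.clip(e, 0, None))) @ V.T
        Omega = M_half / np.trace(M_half)
    return W

With the hinge loss, only step 3 changes: a standard SVM is trained per task on the transformed features; the structure of the alternation is unchanged.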
Learning shared features via regularization
Sharing features via sparsity regularization
Multitask feature learning
We train the object classifiers and attribute classifiers jointly, with an SVM loss function on the original feature space and a (2,1)-norm regularizer:

  $\min_{U, A}\; \sum_{t=1}^{T} \sum_{n=1}^{N} L\big(y_n^t,\; \langle a_t,\, U^\top x_n \rangle\big) \;+\; \gamma\, \|A\|_{2,1}^2$

  $x_n$ : n-th feature vector
  $y_n^t$ : n-th label for task t
  $a_t$ : parameter (weight vector) for task t (a column of A)
  $U$ : orthogonal transformation to a shared feature space ($U^\top x_n$ are the transformed, shared features)
  $\gamma$ : regularization parameter

The (2,1)-norm applies an L2-norm across tasks per shared dimension (joint data fitting) and an L1-norm across dimensions (sparsity), so the tasks jointly select features that are general rather than training-set specific; combined with the learned U, this is equivalent to trace norm regularization. Sparsity regularization on the parameters across different tasks thus yields shared features with better generalization power.
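To make the regularizer concrete, here is a small helper (my illustration, not the poster's code) that computes the (2,1)-norm of the D x T parameter matrix, plus a check that row-sparse parameters are penalized less than dense ones of equal Frobenius norm:

import numpy as np

def norm_21(A):
    # Per feature dimension (row): L2-norm across the T tasks,
    # then sum (L1-norm) over dimensions -> whole rows (features)
    # are encouraged to switch off jointly for all tasks.
    return np.sqrt((A ** 2).sum(axis=1)).sum()

# Row-sparse vs. dense parameters with the same Frobenius norm:
row_sparse = np.zeros((4, 3)); row_sparse[0] = 2.0
dense = np.full((4, 3), 1.0)
print(norm_21(row_sparse))  # 2*sqrt(3) ~ 3.46  (smaller penalty: preferred)
print(norm_21(dense))       # 4*sqrt(3) ~ 6.93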
[Argyriou08] A. Argyriou, T. Evgeniou, M. Pontil, Convex Multi-Task Feature Learning, Machine Learning, 2008.
Convex optimization
(Loss function over the transformed (shared) features; sparsity regularizer replaced by a penalty involving a covariance matrix.)
However, the (2,1)-norm is nonsmooth. We instead solve an equivalent form, in which the regularizer is expressed through a covariance matrix that measures the relative effectiveness of each feature dimension.
How can we promote common sparsity across the different tasks' parameters? → We use the (2,1)-norm regularizer above: an L2-norm across tasks followed by an L1-norm across feature dimensions [Argyriou08].
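For completeness, the identity behind this reformulation can be written out; this is my transcription of the standard variational form from [Argyriou08], with $a^d$ denoting the d-th row of A:

% Squared (2,1)-norm as a smooth minimization over a covariance matrix:
\|A\|_{2,1}^2
  \;=\; \Big(\sum_{d=1}^{D} \|a^d\|_2\Big)^{2}
  \;=\; \min_{\substack{\Omega \succeq 0,\ \operatorname{tr}(\Omega) \le 1 \\ \Omega \text{ diagonal}}}
        \;\sum_{t=1}^{T} \langle a_t,\, \Omega^{-1} a_t \rangle

Allowing a general (non-diagonal) $\Omega$ yields the squared trace norm instead, which is the smooth, jointly convex problem the alternating algorithm above optimizes.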
Our approach makes substantial improvements over the baselines. By exploiting the external semantics the auxiliary attribute tasks provide, our learned features generalize better, particularly when less training data is available.