end-to-end localization and ranking for relative attributesyjlee/teaching/ecs289g... · our idea:...

End-to-End Localization and Ranking for Relative Attributes

Krishna Kumar Singh and Yong Jae Lee

Presented by Minhao Cheng

Visual attributes

High heel, Smile, Mountainous, Cozy

[Farhadi et al. 2009, Kumar et al. 2009, Lampert et al. 2009, Berg et al. 2010, Rastegari et al. 2012, …]

[Slide: Xiao and Lee, ICCV 2015]

Relative attributes

Is she smiling? Hard to say. It is much easier to say "the right one is smiling more."

[Parikh & Grauman 2011, Shrivastava et al. 2012, Kovashka et al. 2013, Sandeep et al. 2014, …]

[Slide: Xiao and Lee, ICCV 2015]

Localization of attributes

Spatial regions that are most relevant to a particular attribute

Mountainous, Cozy

Prior work on localizing attributes

• Attribute localization with human-in-the-loop: [Duan et al. 2012]

• Attribute localization with pre-trained detectors: [Bourdev et al. 2011, Zhang et al. 2014, Sandeep et al. 2014]

• Attribute localization with binary attributes: [Berg et al. 2010, Bourdev et al. 2011, Duan et al. 2012, Zhang et al. 2014]

Requires strong human supervision or binary attribute annotations

Prior work on localizing attributes

• Attribute localization in a weakly supervised setting: [Xiao and Lee, ICCV 2015]

“Pipeline” in which features, localizer, and classifier are trained separately and sequentially: suboptimal and slow

Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network

End-to-end network for attribute localization and ranking

[Singh and Lee, ECCV 2016]


Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network

[Figure, attribute Smile: ordered training pairs (weak to strong) used for training; test images used for testing]

Overview of our end-to-end approach

[Figure, attribute Smile: two Siamese branches, S1 and S2, each a Localization Network followed by a Ranker Network, joined by the loss function]

• Goal: given pairs of ordered training images, simultaneously localize the attribute in each image and learn a ranker

Our end-to-end approach

[Figure: Localization Network predicting transformation parameters θ for a grid generator, followed by the Ranker Network]

• The Localization Network discovers the region of interest for the attribute

• It learns transformation parameters θ that map the input image to the output (localized) region

• Based on Spatial Transformer Networks [Jaderberg et al. 2015]
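The localization step can be sketched in the spirit of a spatial transformer: a grid generator maps every output pixel through θ to a location in the input image, and a bilinear sampler reads the input there. This is a minimal pure-Python sketch, not the authors' code; it assumes θ is restricted to isotropic scale s and translation (tx, ty), and the helper names `scale_translate_theta`, `affine_grid`, and `bilinear_sample` are hypothetical. In the actual network θ would be predicted by the Localization Network, and the sampled crop would feed the Ranker Network.

```python
import math


def scale_translate_theta(s, tx, ty):
    # Hypothetical restricted parameterization: isotropic scale s plus translation (tx, ty).
    return [[s, 0.0, tx], [0.0, s, ty]]


def affine_grid(theta, out_h, out_w):
    """Grid generator: map each output pixel's normalized (x, y) in [-1, 1]
    to normalized source coordinates in the input image via the 2x3 matrix theta."""
    grid = []
    for r in range(out_h):
        y_t = -1.0 + 2.0 * r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            x_t = -1.0 + 2.0 * c / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sampler: read a 2-D image (list of rows) at the grid's source coordinates
    with bilinear interpolation; locations outside the image contribute 0."""
    h, w = len(image), len(image[0])

    def pixel(y, x):
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Normalized [-1, 1] -> pixel coordinates [0, w - 1] and [0, h - 1].
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            out_row.append(
                (1 - dx) * (1 - dy) * pixel(y0, x0)
                + dx * (1 - dy) * pixel(y0, x0 + 1)
                + (1 - dx) * dy * pixel(y0 + 1, x0)
                + dx * dy * pixel(y0 + 1, x0 + 1)
            )
        out.append(out_row)
    return out


# Usage: scale 0.5 centered at the origin crops the central 3x3 patch of a 5x5 image.
image = [[5 * r + c for c in range(5)] for r in range(5)]
crop = bilinear_sample(image, affine_grid(scale_translate_theta(0.5, 0.0, 0.0), 3, 3))
```

Here -1 and 1 map to the centers of the edge pixels. Because the interpolation weights depend smoothly on θ almost everywhere, gradients from the ranker can reach the localization parameters.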


• The Ranker Network takes the localized region and produces a ranking score

• Features of the whole image are combined with those of the localized region to provide global context
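A hypothetical sketch of how a ranker could combine the localized region with global context: features from the crop and from the whole image are concatenated and mapped to one scalar score. The concatenation, the single linear layer, and the name `ranking_score` are assumptions that stand in for the deeper Ranker Network; feature extraction itself is omitted.

```python
def ranking_score(local_features, global_features, weights, bias=0.0):
    """Hypothetical scoring head: concatenate localized-region features with
    global-image features and apply one linear layer to get a scalar score."""
    features = list(local_features) + list(global_features)
    if len(features) != len(weights):
        raise ValueError("expected %d weights, got %d" % (len(features), len(weights)))
    return sum(w * f for w, f in zip(weights, features)) + bias


# Usage with toy 2-D local features and 1-D global features.
score = ranking_score([1.0, 2.0], [0.5], weights=[0.1, 0.2, 1.0])
```

Concatenation keeps the two sources separate until the final layer, so the learned weights decide how much fine-grained versus global evidence each attribute relies on.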

Training

• Loss: cross entropy over each ordered training pair
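The slide names the loss only as cross entropy; its formula did not survive. One standard way to apply cross entropy to an ordered pair is the RankNet-style loss sketched below, which is an assumption here: with scores v_i and v_j from the two Siamese branches, the probability that image i ranks above image j is modeled as sigmoid(v_i - v_j) and compared against a target probability t (1 if i is stronger, 0 if j is, and, as a further assumption, 0.5 for an equal pair). Function names are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairwise_cross_entropy(v_i, v_j, target):
    """Cross entropy between the target probability that image i ranks above
    image j and the modeled probability sigmoid(v_i - v_j).

    -t*log(sigmoid(d)) - (1 - t)*log(1 - sigmoid(d)) simplifies to softplus(d) - t*d.
    """
    d = v_i - v_j
    return softplus(d) - target * d


def score_gradients(v_i, v_j, target):
    # dLoss/dv_i and dLoss/dv_j; these flow back into the two Siamese branches.
    g = sigmoid(v_i - v_j) - target
    return g, -g


loss = pairwise_cross_entropy(2.0, 0.5, target=1.0)
grad_i, grad_j = score_gradients(2.0, 0.5, target=1.0)
```

Because the gradient with respect to v_j is the negative of that for v_i, one update pushes the two scores apart in the target direction.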


• The localized region can fall outside the image bounds, which makes learning difficult
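The slides state the problem but not the remedy. One plausible remedy, sketched here purely as an assumption and not as the authors' method, is an extra penalty on θ. Assuming θ = [[s, 0, tx], [0, s, ty]], the crop spans [t - s, t + s] along each axis in normalized coordinates, so any part outside the image's [-1, 1] range can be penalized directly. The function name is hypothetical.

```python
def out_of_bounds_penalty(s, tx, ty):
    """Hypothetical penalty for a scale-and-translation crop.

    Along each axis the crop covers [t - |s|, t + |s|] in normalized
    coordinates; the amount by which it leaves [-1, 1] is penalized quadratically.
    """
    penalty = 0.0
    for t in (tx, ty):
        low, high = t - abs(s), t + abs(s)
        penalty += max(0.0, -1.0 - low) ** 2 + max(0.0, high - 1.0) ** 2
    return penalty


inside = out_of_bounds_penalty(0.5, 0.0, 0.0)    # crop fully inside the image
spilling = out_of_bounds_penalty(0.5, 0.75, 0.0)  # right edge at 1.25, i.e. 0.25 outside
```

Added to the ranking loss with some weight (another assumption), such a term is zero for valid crops and grows smoothly as a crop drifts off the image, giving the Localization Network a gradient back toward the image.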


• The whole network is optimized end-to-end using backpropagation and mini-batch stochastic gradient descent (SGD)
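To illustrate mini-batch SGD on ordered pairs, here is a toy stand-in that trains only a linear ranker on fixed 2-D features with a pairwise cross-entropy loss, whose gradient with respect to the score difference d is sigmoid(d) - 1 for a correctly ordered target. The synthetic data, hyperparameters, and function names are hypothetical; in the full network the same gradients would also flow back through the sampler into the Localization Network.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_linear_ranker(pairs, dim, lr=0.5, epochs=100, batch_size=8, seed=0):
    """Mini-batch SGD on the pairwise cross-entropy loss.

    pairs: list of (x_strong, x_weak) feature vectors, where x_strong shows the
    attribute more strongly. Learns w so that w . x_strong > w . x_weak.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * dim
            for k in batch:
                x_strong, x_weak = pairs[k]
                diff = [a - b for a, b in zip(x_strong, x_weak)]
                d = sum(wi * di for wi, di in zip(w, diff))
                g = sigmoid(d) - 1.0  # dLoss/dd with target 1 (strong above weak)
                for n in range(dim):
                    grad[n] += g * diff[n]
            # Step against the gradient averaged over the mini-batch.
            for n in range(dim):
                w[n] -= lr * grad[n] / len(batch)
    return w


def strength(p):
    # Hidden ground-truth attribute strength, used only to order the synthetic pairs.
    return 1.0 * p[0] - 0.5 * p[1]


data_rng = random.Random(1)
points = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(60)]
pairs = []
for _ in range(200):
    a, b = data_rng.sample(points, 2)
    if abs(strength(a) - strength(b)) > 0.1:  # skip near-ties
        pairs.append((a, b) if strength(a) > strength(b) else (b, a))
w = train_linear_ranker(pairs, dim=2)
```

Shuffling once per epoch and averaging the gradient over each mini-batch are the usual SGD conventions; the batch size and learning rate here are arbitrary.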

Progression of localized region over training epochs

[Figure: heatmaps of the localized region over training epochs for the attributes Smile and Dark hair]

• Heatmap: distribution of the localized region across the entire training dataset
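The slide does not say how the heatmap is built. A minimal sketch, assuming each training image's localized region is available as a pixel box in a common frame (for example, all images resized to the same size), simply counts how often each pixel is covered; computing this once per epoch would show the progression. The box format and function names are hypothetical.

```python
def region_heatmap(boxes, height, width):
    """Count, for every pixel, how many localized regions cover it.

    boxes: (x0, y0, x1, y1) pixel boxes, end-exclusive, in a shared
    height x width frame (an assumption of this sketch).
    """
    heat = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip to the frame, since a predicted region may extend past the image.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                heat[y][x] += 1
    return heat


def normalize(heat):
    # Scale counts to [0, 1] for display; an all-zero map stays all-zero.
    peak = max(max(row) for row in heat)
    if peak == 0:
        return [[0.0] * len(row) for row in heat]
    return [[v / peak for v in row] for row in heat]


heat = region_heatmap([(0, 0, 2, 2), (1, 1, 3, 3)], height=3, width=3)
```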

Testing

[Figure: a test image passes through a single Siamese branch (S1), Localization Network followed by Ranker Network, producing the score v_test]

• Localize the relevant attribute region

• Produce a ranking score for the test image
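At test time one branch suffices, since the Siamese branches are copies of the same network. A minimal sketch, where `score_fn` is a hypothetical stand-in for the Localization Network followed by the Ranker Network: each test image is scored independently, and sorting the scores yields the ranking.

```python
def rank_test_images(images, score_fn):
    """Score each test image independently with a single branch and sort.

    images: list of (name, image) pairs. score_fn: hypothetical stand-in for
    one Siamese branch (localization followed by ranking).
    Returns names ordered from strongest to weakest attribute.
    """
    scored = [(score_fn(image), name) for name, image in images]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]


# Toy usage: "images" are lists of numbers and the stand-in score is their sum.
order = rank_test_images([("a", [1, 2]), ("b", [5]), ("c", [0])], score_fn=sum)
```

Scoring images one at a time also means a new image can be placed into an existing ranking without re-running any pairwise comparisons.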

Experiments: Relative attribute datasets

• LFW-10 (2k images) [Sandeep et al., CVPR 2014]: Visible teeth, Eyes open, Dark hair, Smile, Good looking...

• UT-Zap50K (50k images) [Yu & Grauman, CVPR 2014]: Pointy, Open, Sporty, Comfort

Results: Discovered regions and ranking on LFW-10 Faces

[Figure: images ordered from weak to strong]

• Our network discovers relevant attribute regions

• Leads to accurate rankings

[Figure: Masculine looking, images ordered from weak to strong]

• Global attributes such as Masculine looking are harder to interpret

• The network focuses more on larger areas

Results: Discovered regions and ranking on UT-Zap50K Shoes

[Figure: Pointy, Sporty, and Comfort, images ordered from weak to strong]

Results: Image pair ranking accuracy

• Accuracy: percentage of test image pairs whose predicted relative attribute ordering is correct

• State-of-the-art results on LFW-10, UT-Zap50K, OSR, Shoe-with-Attribute
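The pair accuracy metric can be written down directly. In this sketch the data layout (a score per image id, each pair listed strongest first) and the function name are assumptions, and ties are counted as errors.

```python
def pair_ranking_accuracy(scores, pairs):
    """Percentage of ordered test pairs (i, j), meaning image i shows the attribute
    more strongly than image j, for which the predicted scores agree.
    Ties count as errors (an assumption of this sketch)."""
    if not pairs:
        raise ValueError("need at least one test pair")
    correct = sum(1 for i, j in pairs if scores[i] > scores[j])
    return 100.0 * correct / len(pairs)


scores = {"a": 2.0, "b": 1.0, "c": 3.0}
acc = pair_ranking_accuracy(scores, [("a", "b"), ("c", "a"), ("b", "c")])
```

Here two of the three pairs are ordered correctly, so the accuracy is two thirds, about 66.7%.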

Combining global image context with localized fine-grained information performs best

Conclusions

• Novel end-to-end network for ranking and localizing attributes.

• State-of-the-art attribute ranking performance on benchmark face, shoe, and outdoor scene datasets.

• Our approach is 100 times faster than [Xiao & Lee].

Question

• What if we used multiple localization networks instead of one to improve performance? (For example, features of the eyes could also help rank the smile attribute.)
