Using Attributes to Describe What People Wear
Andy Gallagher
October 14, 2013
with Huizhong Chen and Bernd Girod
Objective
List of attributes: Men’s, Black color, Sweater, Long sleeve, Solid pattern, Low skin exposure, …
Attribute learning
Outline
Attributes
Describing Clothing with Attributes
Miscellaneous Topics
Attributes
Attributes
Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, D. Forsyth, CVPR 2009
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, S. Harmeling, CVPR 2009
Many others
Computer Vision
image
features
classification
Computer Vision
image
features
classification
[ .1 -.9.1.231-.1]
?
Computer Vision
image
features
classification
What feature representation should we use?
Computer Vision
image
features
classification
[ .1 -.9.1.231-.1]
Now we can talk…
Attributes: has hair, has skin, has ear, has eye, has arms
Attributes
Properties shared by many objects
Explicit semantics
Facilitate human-CPU communication
Materials (glass, fur, wood, etc.)
Parts (has wheel, has tail, etc.)
Shape (boxy, cylindrical, etc.)
Based on a slide by David Forsyth
Example Attributes
FaceTracer Image Search
“Smiling Asian Men With Glasses”
Kumar et al., 2008
Example Attributes
Farhadi et al. 2009
Example Attributes
Lampert et al. 2009
Slide credit: Devi Parikh
Example Attributes
Welinder et al. 2010
Slide credit: Devi Parikh
Attribute Models Classifiers for binary attributes
Kumar et al. 2010
Slide credit: Devi Parikh
Why attributes?
How humans naturally describe visual concepts
Image search: I want elegant silver sandals with high heels
Slide credit: Devi Parikh
Example Attributes
Verification
classifier
SAME
Kumar et al., 2010
Why attributes?
An okapi is a mammal with a reddish dark back, with striking horizontal white stripes on the front and back legs. (Wikipedia)
Zero-shot Learning
Aye-ayes
Are nocturnal
Live in trees
Have large eyes
Have long middle fingers
Which one of these is an aye-aye?
Humans can learn from descriptions (zero examples).
Slide adapted from Christoph Lampert by Devi Parikh
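The zero-shot idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of Lampert et al.: the class names other than the aye-aye, the attribute scores, and the `zero_shot_classify` helper are all hypothetical, and attributes are treated as independent. Each class is described only by a binary attribute signature, and an unseen image is assigned to the class whose signature best explains the per-attribute classifier scores.

```python
from math import log

# Attribute order: nocturnal, lives in trees, large eyes, long middle fingers.
# Only the aye-aye signature comes from the slide; the other two are made up.
CLASS_SIGNATURES = {
    "aye-aye": [1, 1, 1, 1],
    "hypothetical_class_b": [0, 0, 0, 0],
    "hypothetical_class_c": [1, 1, 1, 0],
}

def zero_shot_classify(attribute_probs, signatures, eps=1e-6):
    """Return the class whose attribute signature best explains the scores.

    attribute_probs[i] is an assumed classifier estimate of
    P(attribute i is present | image). Attributes are treated as
    independent, which is a simplifying assumption.
    """
    best_class, best_score = None, float("-inf")
    for name, signature in signatures.items():
        score = 0.0
        for p, present in zip(attribute_probs, signature):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            score += log(p) if present else log(1.0 - p)
        if score > best_score:
            best_class, best_score = name, score
    return best_class

# Made-up attribute scores for an image of a class with no training examples
print(zero_shot_classify([0.9, 0.8, 0.95, 0.85], CLASS_SIGNATURES))
```

No image of the target class is needed at training time; only the attribute classifiers and the textual description are.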
Is this a giraffe? No.
Is this a giraffe? Yes.
Is this a giraffe? No.
Slide credit: Devi Parikh
I think this is a giraffe. What do you think?
No, its neck is too short for it to be a giraffe.
Ah! These must not be giraffes either then.
[Animals with even shorter necks]
……
Current belief
Focused feedback
Knowledge of the world
Feedback on one, transferred to many
Learner learns better from its mistakes
Accelerated discriminative learning with few examples
Parkash and Parikh, 2012
Slide credit: Devi Parikh
Which Attributes to Describe?
[Figure: panels (a) through (f)]
Please choose a person to the left of the person who is frowning
Sadovnik et al. 2013
Describing Clothing with Attributes
Objective
List of attributes: Men’s, Black color, Sweater, Long sleeve, Solid pattern, Low skin exposure, …
Attribute learning
Recommend and Analyze
Formal Sport
Recommendations
Related Work
Person identification with clothing
Bounding box under face [Anguelov, 2007]
Clothing segmentation [Gallagher, 2008]
Dataset Preparation
1856 people from the web. Images are unconstrained.
Dataset Preparation
$400 spent for collecting 283,107 labels on Amazon Mechanical Turk (AMT).
Dataset Statistics
23 binary attributes
3 multiclass attributes
The System
Pose estimation → Feature extraction & quantization → Attribute classifiers 1, 2, …, M → Multi-attribute CRF inference → Predictions
Each attribute classifier: Feature 1, …, Feature N → SVM 1, …, SVM N → Combine features SVM
Multi-attribute CRF: attribute nodes A1, A2, A3, A4, … linked to feature nodes F1, F2, F3, F4, … (A: attribute, F: feature)
Predictions: Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve
Pose Estimation [Eichner et al., 2010]
Perform upper body detection, by using complementary results from face detector and deformable part models.
Foreground highlighting within the enlarged upper body bounding box.
Parse the upper body into head, torso, upper and lower parts of the left and right arms.
Feature Extraction
SIFT descriptor extracted over the sampling grid.
Similar procedure for the arm regions.
Feature Extraction
Maximum Response Filters [Varma 2005]
LAB color
Skin probability
[Figure panels: RGB image, Skin probability, MRF bank]
Feature Extraction
Raw features are quantized using soft K-means (K=5 in our implementation).
Quantized features are aggregated over various body regions, by max or average pooling.
For learning color attributes, the feature is LAB color aggregated from non-skin regions.
Feature type: SIFT, Texture, Color, Skin probability
Region: Torso, Left upper arm, Right upper arm, Left lower arm, Right lower arm
Pooling method: Average, Max
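The quantization and pooling steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the slides do not give the soft-assignment formula, so the exponential weighting, the `beta` parameter, the toy cluster centers, and the helper names `soft_assign` and `pool` are all hypothetical.

```python
from math import exp

def soft_assign(descriptor, centers, beta=1.0):
    """Soft assignment of one descriptor to K cluster centers.

    Weights are proportional to exp(-beta * squared distance). This
    weighting and the value of beta are assumptions for illustration.
    """
    sq_dists = [sum((x - c) ** 2 for x, c in zip(descriptor, center))
                for center in centers]
    d_min = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def pool(codes, method="average"):
    """Aggregate the per-descriptor codes of one body region into one vector."""
    if not codes:
        raise ValueError("region contains no descriptors")
    columns = list(zip(*codes))
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Toy 2-D descriptors from a hypothetical torso region, with K = 2 centers
centers = [[0.0, 0.0], [1.0, 1.0]]
torso_descriptors = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
codes = [soft_assign(d, centers, beta=4.0) for d in torso_descriptors]
torso_avg = pool(codes, "average")
torso_max = pool(codes, "max")
print(torso_avg, torso_max)
```

Repeating this for each feature type, region, and pooling method yields one fixed-length vector per combination, regardless of how many descriptors fall inside a region.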
Feature Fusion
SVM is a kernel-based classification technique.
Feature fusion solution: combined SVM is trained using weighted sum of the kernels.
Combining features consistently outperforms the single best feature.
[Diagram: SVM 1, SVM 2, …, SVM N with kernels K1, K2, …, KN each yield a predicted accuracy (1, 2, …, N); kernels K1, K2, …, KN feed the combined SVM, which outputs the attribute prediction]
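The weighted kernel sum behind the combined SVM can be sketched as below. The slides name a chi-squared kernel and a weighted sum of kernels but do not give the exact kernel form or the weighting rule, so the exponential chi-squared form, the `gamma` parameter, the toy histograms, and the accuracy-like weights are assumptions, and all function names are hypothetical.

```python
from math import exp

def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel between two non-negative histograms."""
    dist = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip empty bins to avoid division by zero
            dist += (a - b) ** 2 / (a + b)
    return exp(-gamma * dist)

def gram_matrix(samples, gamma=1.0):
    """Pairwise kernel values for one feature type."""
    return [[chi2_kernel(u, v, gamma) for v in samples] for u in samples]

def combine_kernels(gram_matrices, weights):
    """Weighted sum of per-feature Gram matrices (weights normalized to sum to 1)."""
    total = sum(weights)
    n = len(gram_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, w in zip(gram_matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += (w / total) * gram[i][j]
    return combined

# Two hypothetical feature types computed for the same three people
sift_hists = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]
color_hists = [[0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]
# Assumed weights, for example each single-feature SVM's validation accuracy
weights = [0.8, 0.6]
K = combine_kernels([gram_matrix(sift_hists), gram_matrix(color_hists)], weights)
print(K)
```

The combined matrix `K` could then be handed to any SVM implementation that accepts a precomputed kernel, so a single classifier sees all feature types at once.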
Recap
[System diagram repeated: Pose estimation → Feature extraction & quantization → Attribute classifiers 1, 2, …, M → Multi-attribute CRF inference → Predictions: Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve]
Attribute Dependencies
Necktie and T-Shirt?
Attribute Inference with CRF
Each attribute is a node. All nodes are pair-wise connected.
The edge connecting 2 nodes corresponds to the joint probability of these 2 attributes.
Ai: Attribute i. Fi: Features for Ai.
[Diagram: fully connected attribute nodes A1 to A6, each linked to its feature node F1 to F6]
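CRF for Attribute Learning
For two attributes A_1, A_2 with features F_1, F_2:

$$
\begin{aligned}
P(A_1, A_2 \mid F_1, F_2) &\propto P(F_1, F_2 \mid A_1, A_2)\, P(A_1, A_2) \\
&= P(F_1 \mid A_1)\, P(F_2 \mid A_2)\, P(A_1, A_2) \quad \text{[Following CRF model]} \\
&\propto \frac{P(A_1 \mid F_1)}{P(A_1)} \cdot \frac{P(A_2 \mid F_2)}{P(A_2)} \cdot P(A_1, A_2)
\end{aligned}
$$

$$
\max_{A_1, A_2} \log P(A_1, A_2 \mid F_1, F_2) = \max_{A_1, A_2} \Big[ \underbrace{\log \frac{P(A_1 \mid F_1)}{P(A_1)}}_{\text{Node potential 1}} + \underbrace{\log \frac{P(A_2 \mid F_2)}{P(A_2)}}_{\text{Node potential 2}} + \underbrace{\log P(A_1, A_2)}_{\text{Edge potential}} \Big] + C
$$

For a fully connected CRF, we maximize:

$$
\Psi(A) = \underbrace{\sum_{A_i \in S} \phi(A_i)}_{\text{Node potential}} + \underbrace{\sum_{(A_i, A_j) \in E} \psi(A_i, A_j)}_{\text{Edge potential}}
$$

[Diagram: attribute nodes A1, A2, …, AM, each linked to its feature node F1, F2, …, FM]
The CRF potential is maximized using standard belief propagation technique [Tappen et al. 2003].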
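A small sketch of how the node and edge potentials interact. Note the swap: the slides use belief propagation, whereas this example enumerates every labeling, which is only feasible for a handful of binary attributes (2^M labelings). The probabilities are made up for illustration, all must be nonzero, and `map_assignment` is a hypothetical helper name.

```python
from itertools import combinations, product
from math import log

def map_assignment(posteriors, priors, pairwise_joint):
    """Exhaustive MAP labeling of binary attributes on a fully connected graph.

    posteriors[i]: P(A_i = 1 | F_i), the output of attribute classifier i
    priors[i]: P(A_i = 1)
    pairwise_joint[(i, j)][(a, b)]: P(A_i = a, A_j = b) for i < j
    """
    m = len(posteriors)
    best, best_score = None, float("-inf")
    for labels in product((0, 1), repeat=m):
        score = 0.0
        # Node potentials: log P(A_i | F_i) / P(A_i)
        for i, a in enumerate(labels):
            p_post = posteriors[i] if a else 1.0 - posteriors[i]
            p_prior = priors[i] if a else 1.0 - priors[i]
            score += log(p_post / p_prior)
        # Edge potentials: log P(A_i, A_j) for every pair of attributes
        for i, j in combinations(range(m), 2):
            score += log(pairwise_joint[(i, j)][(labels[i], labels[j])])
        if score > best_score:
            best, best_score = labels, score
    return best

# Made-up numbers: attribute 0 = wears necktie, attribute 1 = wears T-shirt
posteriors = [0.55, 0.60]
priors = [0.30, 0.40]
pairwise_joint = {(0, 1): {(0, 0): 0.31, (0, 1): 0.39, (1, 0): 0.29, (1, 1): 0.01}}

independent = tuple(int(p > 0.5) for p in posteriors)
joint = map_assignment(posteriors, priors, pairwise_joint)
print(independent, joint)
```

With these toy values, thresholding each classifier on its own predicts both a necktie and a T-shirt, while the joint score prefers a T-shirt without a necktie because the assumed co-occurrence probability of the pair is very low.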
Predictions, example 1: No necktie (Wear necktie), Has collar, Men’s, Has placket, Low exposure, No scarf, Solid pattern, Black, Short sleeve (Long sleeve), V-shape neckline, Dress (Suit)
Predictions, example 2: Wear necktie, Has collar, Men’s, Has placket, High exposure (Low exposure), No scarf, Solid pattern, Gray & black, Long sleeve, V-shape neckline, Suit
Predictions, example 3: No necktie, Has collar, Men’s, Has placket, Low exposure, Wear scarf, Solid pattern, Brown & black, No sleeve (long sleeve), V-shape neckline, Tank top (outerwear)
Experimental Results
Questions that we are interested in:
Does combining features improve performance?
Does the pose model help?
Does the CRF work?
Pose Vs No Pose - Experiment Setup
Positive and negative examples are balanced.
SVM classification
Chi-squared kernel
Leave-1-out cross validation
Comparison with attribute learning without pose model.
Features are extracted within a scaled clothing mask under the face.
Evaluation performed under the same experiment settings.
The clothing mask [Gallagher 2008]
[Figure: bar chart per attribute (Necktie, Gender, Skin exposure, Pattern solid, Pattern spot, Pattern plaid, Color red, Color green, Color blue, Color brown, Color gray, >2 colors, neckline). Vertical axis: Accuracy (binary-class) / MAP (multi-class), 45% to 95%. Series: Best feature (with pose), Combined feature (with pose), Combined feature (no pose)]
Multiclass Confusion Matrix
[Figure: bar chart per attribute (Necktie, Gender, Skin exposure, Pattern solid, Pattern spot, Pattern plaid, Color red, Color green, Color blue, Color brown, Color gray, >2 colors, neckline). Vertical axis: G-mean, 45% to 95%. Series: Before CRF, After CRF]
Steve Jobs: “solid pattern, men’s clothing, black color, long sleeves, round neckline, outerwear, wearing scarf”
The predicted dressing style of weddings:
Male: “solid pattern, suit, long-sleeves, V-shape neckline, wearing necktie, wearing scarf, has collar, has placket”
Female: “high skin exposure, no sleeves, dress, other neckline shapes, white, >2 colors, floral pattern”
Gender Recognition
Face-based: Project faces in the Fisher space.
Clothing-based: The gender output of our system.
Better gender recognition is achieved by combining face and clothing.
Conclusions
Clothing attributes can be better learned with a human pose model.
CRF offers improved performance by exploring attribute relations.
Proposed novel applications that exploit the predicted attributes.
Miscellaneous
What do you have?
AutoCropping
AutoCropping
Auction Probability: 97%
AutoCropping
Eigenvector
Quantized Eigenvector
How do photos affect value?
Angled, high contrast: ~$115
How do photos affect value?
Frontal, flash reflection: ~$88
Thank You!
Future Work
Expect even better performance by using the (almost) ground truth pose estimated by Kinect sensors [Shotton et al., Best Paper CVPR 2011].
Incorporate clothing information in person identification.
The Loop
Images and Computer Vision
What we know about people
The Loop: This talk
Examples of how social data has helped understand images of people
Some things I’ve learned about people from computer vision
What is context?
Context
Which monster is larger?
Shepard RN (1990) Mind Sights: Original Visual Illusions, Ambiguities, and other Anomalies, New York: WH Freeman and Company
Your brain specializes in faces
Find The Face In the beans:
http://www.michaelbach.de/ot/sze_muelue/index.html
Understanding images of people