“Now! – That should clear up a few things around here!”
The Challenge of Recognition
The Importance of Recognition
Navigation
Social interactions
Sexual selection
Foraging
Offspring care
Danger avoidance
Pavlovian conditioning
Object recognition
The Importance of Recognition
“Not only did Dr. P fail to see faces, but he saw faces when there were no faces to see. In the street he might pat the heads of water hydrants and parking meters, taking these to be the heads of children; he would amiably address carved knobs on the furniture and be astounded when they did not reply. Such incidents multiplied, causing embarrassment, perplexity and fear.”
From ‘The man who mistook his wife for a hat’ by Dr. Oliver Sacks
Brain Mechanisms of Recognition
Pawan Sinha
Department of Brain and Cognitive Sciences
MIT
• Current understanding
• Ongoing research
• Real-world applications
• What the future holds
Current Understanding – Lesion Studies
Which parts of the brain are involved in visual recognition?
Initial clues: Klüver-Bucy syndrome (1939)
Temporal lobe lesions in humans and monkeys lead to:
1. Visual agnosia, and
2. Hypersexuality
Responses of a Patient with Temporal Lobe Damage
(Farah, 1995)
Current Understanding – Electrophysiology
Face-cell 1
Face-cell 2
Desimone, 1984
Are there specific neurons involved in visual recognition?
Current Understanding – Brain Imaging
Kanwisher et al, 1997
Are there specific brain regions involved in visual recognition?
“Face area”
Current Understanding – Summary
The temporal lobe is involved in visual recognition.
So what?
This doesn’t tell us how the brain recognizes objects.
How Does the Brain Recognize Objects?
Primary Visual Cortex
Oriented bar and edge detector neurons
Hubel and Wiesel (1977)
A Proposal for How the Brain Recognizes Objects
Edges!
Further processing
Marr (1979)
A Proposal for How the Brain Recognizes Objects
Image → Edge map → Binocular processing → 3D estimate → Recognition!
A Proposal for How the Brain Recognizes Objects
What underlies the researchers’ fascination for edges?
Fine edges, but not the coarse structure, are expected to be invariant to imaging variations.
In principle, this can make the recognition task very easy.
The Problem with Edges…
In practice, fine edges turn out to be highly unstable.
Even after more than two decades of research, we have been unable to create a robust recognition system based on the proposed model.
The Problem with the Rest of the Model…
Image → Edge map → Binocular processing → 3D estimate → Recognition!
Recent experimental results suggest that recognition may precede the intermediate steps.
Sinha & Poggio, Nature, 1996
Jones, Sinha, Poggio & Vetter, Current Biology, 1997
Bülthoff, Bülthoff & Sinha, Nature Neuroscience, 1998
The Million Dollar Question…
If recognition has to happen before the image can be finely analyzed, what then is the minimum image information that suffices for recognition?
Image → Edge map → Binocular processing → 3D estimate → Recognition!
?
Reducing Image Information
An ecologically sound method: progressive blurring (equivalent to recognition at increasing distances)
[Figure: recognition performance vs. amount of blur, with a criterion level marked]
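Progressive blurring can be sketched in a few lines. The block below is a minimal illustration, not the procedure used in the experiments: it assumes a grayscale image stored as a list of rows, treats the blur amount as the standard deviation `sigma` of a Gaussian, truncates the kernel at about three standard deviations, and clamps at the image borders. All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    if sigma <= 0:
        return [1.0]  # no blur
    half = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - half, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def blur_series(image, sigmas):
    """One blurred copy of the image per blur level."""
    return [gaussian_blur(image, s) for s in sigmas]

# Usage: a single bright pixel spreads out more at each blur level.
dot = [[0.0] * 9 for _ in range(9)]
dot[4][4] = 1.0
series = blur_series(dot, [0, 1, 2])
```

A separable implementation keeps the cost linear in the kernel width, which matters when a whole series of blur levels is generated per stimulus.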
Face Detection – Experimental Protocol
Subject’s task: Given an image, to determine whether it is a face.
[Figure: example targets and distractors]
Distractor types: random patterns, symmetric patterns, and false alarms from an artificial detection system
All stimuli are presented at several blur levels.
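Scoring such a protocol comes down to tallying "face" responses per stimulus type and blur level: the rate on targets is the hit rate, and the rate on each distractor type is a false-alarm (FA) rate. The sketch below assumes a hypothetical trial format `(blur_level, stimulus_type, said_face)`; the sample trials are made up for illustration and are not experimental data.

```python
from collections import defaultdict

def detection_rates(trials):
    """Fraction of 'face' responses for each (blur_level, stimulus_type).

    For stimulus_type == 'face' this is the hit rate; for any distractor
    type it is the false-alarm rate.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [face responses, total trials]
    for blur_level, stimulus_type, said_face in trials:
        entry = counts[(blur_level, stimulus_type)]
        if said_face:
            entry[0] += 1
        entry[1] += 1
    return {key: yes / total for key, (yes, total) in counts.items()}

# Usage with made-up responses
example_trials = [
    (0, "face", True), (0, "face", True),
    (0, "random", False), (0, "random", True),
    (8, "face", True), (8, "face", False),
]
rates = detection_rates(example_trials)
```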
Face Detection – Results
[Figure: face detection performance (%) plotted against amount of blur (radius of Gaussian, 0 to 16). Curves: face hit rate, FP FA, Symm FA, Random FA. Reference lines: hit rate criterion and FA criterion.]
Face Detection – Analysis
Are there any useful invariants at such high levels of blur?
Yes! Sinha, 1994, 1995; Lipson, Grimson, Sinha, 1997
Concept learning algorithms
Ratio-template: A stable face-signature that comprises pairwise ordinal brightness relationships
http://www.ai.mit.edu/projects/cbcl/web-pawan/cartoon/cartoon.html
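The idea of pairwise ordinal brightness relationships can be illustrated with a toy matcher. This is only a sketch under stated assumptions: the region layout on a 6 x 6 patch, the list of relations (forehead and cheeks brighter than the eyes), and the all-relations-must-hold threshold are hypothetical choices for illustration, not the published ratio-template.

```python
def region_mean(image, region):
    """Mean brightness of a rectangular region (top, left, bottom, right), half-open."""
    top, left, bottom, right = region
    values = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(values) / len(values)

# Hypothetical regions on a 6 x 6 low-resolution patch
REGIONS = {
    "forehead": (0, 0, 2, 6),
    "left_eye": (2, 0, 3, 2),
    "right_eye": (2, 4, 3, 6),
    "left_cheek": (3, 0, 5, 2),
    "right_cheek": (3, 4, 5, 6),
}

# Hypothetical relations: (expected brighter region, expected darker region)
RELATIONS = [
    ("forehead", "left_eye"),
    ("forehead", "right_eye"),
    ("left_cheek", "left_eye"),
    ("right_cheek", "right_eye"),
]

def ordinal_score(image, regions=REGIONS, relations=RELATIONS):
    """Fraction of the ordinal brightness relations that hold in the patch."""
    means = {name: region_mean(image, box) for name, box in regions.items()}
    satisfied = sum(1 for brighter, darker in relations
                    if means[brighter] > means[darker])
    return satisfied / len(relations)

def looks_like_face(image, threshold=1.0):
    """Accept the patch when enough relations hold (threshold is an assumption)."""
    return ordinal_score(image) >= threshold

# Usage: halving the illumination leaves every ordinal relation unchanged.
patch = [[200] * 6 for _ in range(6)]
for x in (0, 1, 4, 5):
    patch[2][x] = 50  # dark eye regions
dimmed = [[v * 0.5 for v in row] for row in patch]
```

Because only the sign of each brightness comparison is used, the score is unaffected by any monotonic change in overall illumination, which is the kind of stability the slide attributes to the signature.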
Face Detection – Analysis (contd.)
Does the brain use a ‘ratio-template’-like invariant for detecting faces?
There is no direct evidence yet. However, there is some indirect evidence.
1. Neurons in the visual cortex have the properties needed to implement this model.
2. Computer implementations of this model yield good performance.
Performance of Ratio-templates
Benefits of Low-Resolution Approach
1. Permits face detection at a distance
2. Is robust to image degradations
3. Can generalize across facial variations
4. Is computationally simple
An Application of Our Face-detection System
The Nielsen People Meter
Beyond Mere Detection – Face Recognition
Low-resolution images may suffice for detection, but, surely, we must have fine detailed information for recognition…
A popular approach to face recognition – feature matching
Beyond Mere Detection – Face Recognition
Prediction of such an approach:
[Figure: predicted recognition performance vs. amount of blur]
Face Recognition – Experimental Protocol
Subject’s task: To recognize celebrity images subjected to different levels of blur.
Stimuli: Blur series for 36 celebrity faces.
Face Recognition - Results
[Figure: percent correct (0 to 100) vs. blur level (REF, 0 to 14). Series: Full_Face_NC_11, Internal_Intact_7, Internal_Broken_7, REF_Intact, REF_Broken. Image resolutions marked on the plot: 10 x 12 pixels/face and 70 x 70 pixels/face.]
Data labels on the plot: 29.3%, 37.9%, 56.1%, 66.7%, 82.1%, 87.4%, 91.2%, 91.4%; 0%, 0%, 0%, 2.4%, 7.1%, 29.8%, 65.9%, 79%; 0%, 0%, 0%, 0%, 0%, 2%, 28.6%, 52.4%; 94.1%, 85.3%
Face Recognition - inference
Overall face configuration supports much more robust recognition as compared to individual features.
Sinha & Poggio, Nature, 1996
“By and large, Dr. P recognized nobody: neither his family, nor his colleagues, nor his pupils. He recognized Einstein because he picked up the characteristic moustache, and the same thing happened with one or two other people. ‘Ach, Paul!’ he said, when shown a portrait of his brother. ‘That square jaw, those big teeth – I would know Paul anywhere!’”
From ‘The man who mistook his wife for a hat’ by Dr. Oliver Sacks
The Two Million Dollar Question
Which aspects of facial configuration are important and which are not?
Caricaturists probably know the answer to this question…
…but it is difficult for them to articulate this intuitive knowledge, sometimes even to themselves.
Caricaturists in their own words…
“When I’m having difficulty caricaturing someone, I just keep drawing by doing ten, twenty, thirty, forty sketches of the subject…”
- Bill Plympton
“I once spent about eighteen hours trying to caricature Barry Manilow. It was frustrating not being able to draw someone who is so funny looking in the first place.”
- Taylor Jones
The Hirschfeld Project
An attempt to make explicit the intuitive knowledge that caricaturists possess and, in the process, to gain insights into the brain’s face recognition strategies.
The Hirschfeld Project - Goal
Given multiple caricatures corresponding to every face image in a large set…
[Figure: m face images, each with n caricatures]
…the goal is to determine which facial measurements caricaturists consistently emphasize (or de-emphasize) and how the extent of distortion relates to the deviation of a given face from the population average.
The Hirschfeld Project - Caveats
“To effectively caricature a subject, I must feel that person’s personality. A caricature to me is not just a big nose, big ears and a big head on a little body – it is much more.”
- Gerald Scarfe, Caricaturist
“I wouldn’t be surprised if some day they would create a computer program to do caricature, but it would be terrible. What a caricaturist does involves his whole life experience, education and some indefinable thing that makes it all work.”
- Robert Grossman, Caricaturist
The Hirschfeld Project – in 10 Steps
Step 1: Caricature database creation (~50 caricaturists; ~100 faces of celebrities and others)
Step 2: Assessment of recognizability of caricatures
Step 3: Database digitization
Step 4: Measurements for each database entry
Step 5: Average face construction and measurement
Step 6: Over-complete attribute set creation for each database entry
Step 7: Determination of ‘deviations’ of input face attributes w.r.t. average face
Attributes $(a_0, a_1, a_2, \ldots, a_n)$: point coordinates, lengths, ratios of lengths, angles, areas, ratios of areas
$d_k = a_k^{\mathrm{inp}} / a_k^{\mathrm{avg}} - 1$
$d_k = 0$ -> Face attribute same as average
$d_k < 0$ -> Face attribute smaller than average
$d_k > 0$ -> Face attribute larger than average
Step 8: Determination of attribute exaggeration in caricatures
$e_k = a_k^{\mathrm{caric}} / a_k^{\mathrm{avg}} - 1$
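Steps 7 and 8 apply the same relative-deviation formula, once to the input face and once to its caricature. A minimal sketch follows; the attribute values are made-up numbers and the function names are hypothetical.

```python
def relative_deviation(value, average):
    """a_k / a_k_avg - 1: zero when equal to the average, negative when smaller."""
    return value / average - 1.0

def deviations(attributes, average_attributes):
    """Apply the relative deviation attribute by attribute."""
    return [relative_deviation(a, avg)
            for a, avg in zip(attributes, average_attributes)]

# Made-up measurements for three attributes
average_face = [2.0, 4.0, 10.0]
input_face = [2.0, 3.0, 15.0]
caricature = [2.0, 2.0, 25.0]

d = deviations(input_face, average_face)   # Step 7: input vs. average
e = deviations(caricature, average_face)   # Step 8: caricature vs. average
```

In this toy case the second attribute is below average in the face and pushed further below in the caricature, while the third is above average and pushed further above.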
Step 9: Plot e vs. d for each attribute across all inputs and perform linear regression.
The slope of the regression line provides a measure of the emphasis assigned to an attribute.
[Figure: e plotted against d with the best linear fit; regions of the plot are marked emphasis, de-emphasis and anti-emphasis]
Step 10: Rank order attributes using regression line slopes. This provides an estimate of their relative salience to the caricaturist and, perhaps, to the visual system.
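Steps 9 and 10 can be sketched with an ordinary least-squares fit per attribute followed by a sort on the slopes. The attribute names and (d, e) pairs below are invented for illustration, and ranking from largest to smallest slope is an assumption about how the rank order is read.

```python
def fit_line(ds, es):
    """Ordinary least-squares fit of e = slope * d + intercept."""
    n = len(ds)
    mean_d = sum(ds) / n
    mean_e = sum(es) / n
    sxx = sum((d - mean_d) ** 2 for d in ds)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(ds, es))
    slope = sxy / sxx  # requires at least two distinct d values
    intercept = mean_e - slope * mean_d
    return slope, intercept

def rank_attributes(samples):
    """samples maps attribute name -> list of (d, e) pairs across all inputs.

    Returns (name, slope) pairs sorted from largest to smallest slope.
    """
    slopes = {}
    for name, pairs in samples.items():
        ds = [d for d, _ in pairs]
        es = [e for _, e in pairs]
        slopes[name] = fit_line(ds, es)[0]
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)

# Made-up data: one attribute is exaggerated, the other is damped
samples = {
    "nose_length": [(-0.5, -1.0), (0.0, 0.0), (0.5, 1.0)],
    "ear_height": [(-0.5, -0.25), (0.0, 0.0), (0.5, 0.25)],
}
ranking = rank_attributes(samples)
```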
The Hirschfeld Project – Early Results
Prediction: The width and height dimensions of a face can be independently scaled without adversely affecting recognizability.
[Figure: face diagram with measurements labeled a through i]
Important attributes: $A_{\mathrm{hair}}/A_{\mathrm{face}}$, $a/b$, $c/d$, $e/d$, $f/g$, $h/g$, $i/(g+h)$
Testing the Prediction of Independent Scalability in x & y
Artificially Created Caricatures
We can also specialize this approach to generate caricatures in the style of a particular caricaturist.
One day, the Hirschfeld Project may allow us to create an artificial Mr. Hirschfeld.
Besides yielding clues about the brain’s recognition strategies, these results also provide a prescription for automatically creating caricatures.
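One way to read this prescription, sketched below under explicit assumptions: if the regression for attribute k gives e ≈ slope_k · d + intercept_k, then a caricature attribute can be obtained by inverting the definition of e, i.e. a_caric = a_avg · (1 + e). The function name and numbers are hypothetical, and turning the exaggerated measurements back into a drawing is not covered here.

```python
def caricature_attributes(face, average, slopes, intercepts=None):
    """Exaggerate each attribute according to its fitted emphasis.

    face, average: attribute values of the input face and the average face.
    slopes, intercepts: per-attribute regression parameters (assumed known).
    """
    if intercepts is None:
        intercepts = [0.0] * len(face)
    result = []
    for a, avg, slope, intercept in zip(face, average, slopes, intercepts):
        d = a / avg - 1.0            # deviation of the face from the average
        e = slope * d + intercept    # predicted exaggeration in the caricature
        result.append(avg * (1.0 + e))
    return result

# Usage with made-up numbers: a slope of 2 doubles the deviation from average.
exaggerated = caricature_attributes(face=[12.0], average=[10.0], slopes=[2.0])
```

Fitting the slopes on the work of a single caricaturist, rather than on the whole database, would be the natural way to specialize such a sketch to one artist's style.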
The Hirschfeld Project - Hurdles
Getting good caricaturists to help populate the database.
Interim Summary
1. The human brain can recognize objects well even in very low-resolution images.
2. In contrast to previous proposals, we are formulating a recognition scheme designed to utilize low-resolution image information. We call this the ‘Configural Recognition Scheme’.
3. We have made some headway on the task of face-detection and are currently exploring face-recognition.
Understanding the brain’s strategies for recognition may allow us to create useful artificial vision systems…
Pedestrian Detection
Collaboration with T. Poggio
Partly funded by Daimler-Benz
Driver’s attentional window
Achtung!
Pedestrian Detection – Early Results
Logo Search
The USPTO has more than 2 million logos on file. New logo registration requires a search to prevent design infringement.
Existing logo search method:
Retrieval via numerical design annotation system (USPTO)
Five pointed star: 01-01-03
Logo Search - Challenges
Problem: Many logos do not have simple annotations!
Logo Search - Results
Query
Query
Industrial Inspection
3000 components/PC board
60 seconds for inspection
Yield rate: 20%
Industrial Inspection - Results
Venturing into the Real World…
1st, MIT $50K Entrepreneurship Competition
What Does the Future Hold?
- A great deal more research
- More sophisticated artificial systems
The COG project: Brooks, Scassellati et al., MIT AI Lab
What Does the Future Hold?
Face detection and tracking by Cog using Ratio-templates
Scassellati, 1998
What Does the Future Hold?
Face motion imitation by Cog
What Does the Future Hold?
- Better psycho-forensic systems
Current IdentiKit systems use a piecemeal approach:
Some IdentiKit composites generated by a police operator:
What Does the Future Hold?
- Better tools for visual information management
[Figure: # of digital images on the web by year, 1992 to 1999; the level of 100 million is marked]
What Does the Future Hold?
Current tools for visual information management
If an image is worth a thousand words, how can we hope to describe it with just a few?
Textual annotations
What Does the Future Hold? (contd.)
Desirable features for an image retriever
- Graphical queries (“bring me images like this one”)
- No annotation required (content-based search)
- Quantitative measure of perceptual similarity
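These three features can be illustrated with a toy query-by-example retriever. The sketch below is a stand-in, not the retriever shown in the talk: it assumes grayscale images stored as lists of rows, summarizes each image by the mean brightness of a coarse grid of blocks, and uses the mean absolute difference between signatures as the similarity measure. All names are hypothetical.

```python
def grid_signature(image, rows=2, cols=2):
    """Mean brightness of each block in a rows x cols grid.

    Assumes the image has at least `rows` rows and `cols` columns.
    """
    height, width = len(image), len(image[0])
    signature = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * height // rows, (r + 1) * height // rows)
            xs = range(c * width // cols, (c + 1) * width // cols)
            block = [image[y][x] for y in ys for x in xs]
            signature.append(sum(block) / len(block))
    return signature

def distance(sig_a, sig_b):
    """Mean absolute difference between two signatures (smaller is more similar)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)

def retrieve(query, database, top_k=3):
    """Rank database images (name -> image) by distance to the query image."""
    query_sig = grid_signature(query)
    scored = sorted((distance(query_sig, grid_signature(img)), name)
                    for name, img in database.items())
    return [(name, dist) for dist, name in scored[:top_k]]

# Usage: the query is an image, and no textual annotation is consulted.
top_bright = [[255] * 4, [255] * 4, [0] * 4, [0] * 4]
bottom_bright = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
query = [[240] * 4, [240] * 4, [10] * 4, [10] * 4]
results = retrieve(query, {"top_bright": top_bright, "bottom_bright": bottom_bright})
```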
What Does the Future Hold? (contd.)
A content-based image retriever
What Does the Future Hold? (contd.)
Non-configural image retrievers
What Does the Future Hold?
- Better tools for visual information filtering (the flip side of searching)
[Figure: block diagram with the labels Training data, Web images, Content-based image filter, and Filtered content]
What Does the Future Hold?
- Understanding recognition in the other sensory modalities
Summary and Conclusion
• Current understanding
• Ongoing research
  • Face detection
  • Face recognition
• Real-world applications
• The future holds very exciting prospects