![Page 1: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/1.jpg)
Where have we been? Where are we going?
Fei-Fei Li
![Page 2: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/2.jpg)
The Beginning: CVPR 2009
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.
![Page 3: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/3.jpg)
The Impact of ImageNet
![Page 4: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/4.jpg)
4,386 Citations
2,847 Citations
on Google Scholar
…and many more.
![Page 5: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/5.jpg)
From Challenge Contestants to Startups
![Page 6: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/6.jpg)
A Revolution in Deep Learning
Why Deep Learning is Suddenly Changing Your Life
By Roger Parloff
The Great Artificial Intelligence Awakening
By Gideon Lewis-Kraus
The data that transformed AI research—and possibly the world
By Dave Gershgorn
![Page 7: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/7.jpg)
“The ImageNet of x”
SpaceNet DigitalGlobe, CosmiQ Works, NVIDIA
ShapeNet A.Chang et al, 2015
MusicNet J. Thickstun et al, 2017
EventNet G. Ye et al, 2015
Medical ImageNet Stanford Radiology, 2017
ActivityNet F. Heilbron et al, 2015
![Page 8: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/8.jpg)
An Explosion of Datasets
1627 Hosted Datasets
276 Commercial Competitions
1MM Data Scientists
4MM ML Models Submitted
1919 Student Competitions
![Page 9: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/9.jpg)
“Datasets—not algorithms—might be the key limiting factor to development of human-level artificial intelligence.”
Alexander Wissner-Gross, Edge.org, 2016
![Page 10: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/10.jpg)
The Untold History of ImageNet
![Page 11: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/11.jpg)
Hardly the First Image Dataset
Lotus Hill (2007) Yao et al, 2007
ESP (2006) Ahn et al, 2006
LabelMe (2005) Russell et al, 2005
MSRC (2006) Shotton et al. 2006
CalTech 101/256 (2005) Fei-Fei et al, 2004; Griffin et al, 2007
TinyImage (2008) Torralba et al. 2008
PASCAL (2007) Everingham et al, 2009
CAVIAR Tracking (2005) R. Fisher, J. Santos-Victor, J. Crowley
Middlebury Stereo (2002) D. Scharstein, R. Szeliski
UIUC Cars (2004) S. Agarwal, A. Awan, D. Roth
FERET Faces (1998) P. Phillips, H. Wechsler, J. Huang, P. Rauss
CMU/VASC Faces (1998) H. Rowley, S. Baluja, T. Kanade
MNIST digits (1998-10) Y LeCun & C. Cortes
COIL Objects (1996) S. Nene, S. Nayar, H. Murase
3D Textures (2005) S. Lazebnik, C. Schmid, J. Ponce
CUReT Textures (1999) K. Dana, B. Van Ginneken, S. Nayar, J. Koenderink
KTH human action (2004) I. Laptev & B. Caputo
Sign Language (2008) P. Buehler, M. Everingham, A. Zisserman
Segmentation (2001) D. Martin, C. Fowlkes, D. Tal, J. Malik.
![Page 12: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/12.jpg)
A Profound Machine Learning Problem Within Visual Learning
![Page 13: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/13.jpg)
Machine Learning 101: Complexity, Generalization, Overfitting
[Figure: training error and generalization error plotted against capacity (y-axis: error), marking the underfitting zone, the overfitting zone, the generalization gap, and the optimal capacity.]
![Page 14: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/14.jpg)
Fei-Fei et al, 2003, 2004
One-Shot Learning
![Page 15: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/15.jpg)
Fei-Fei et al, 2003, 2004
![Page 16: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/16.jpg)
How Children Learn to See
![Page 17: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/17.jpg)
[Figure: training error and generalization error plotted against capacity (y-axis: error), marking the underfitting zone, the overfitting zone, the generalization gap, and the optimal capacity.]
![Page 18: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/18.jpg)
A new way of thinking…
To shift the focus of Machine Learning for visual recognition
from modeling…
…to data. Lots of data.
![Page 19: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/19.jpg)
Internet Data Growth 1990-2010
[Chart: global data traffic in PB/month, with y-axis ticks at 3,750, 7,500, 11,250 and 15,000. Source: Cisco]
![Page 20: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/20.jpg)
What is WordNet?
Original paper by [George Miller, et al 1990] cited over 5,000 times
Organizes over 150,000 words into 117,000 categories called synsets.
Establishes ontological and lexical relationships in NLP and related tasks.
![Page 21: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/21.jpg)
Christiane Fellbaum
Senior Research Scholar, Computer Science Department, Princeton
President, Global WordNet Consortium
![Page 22: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/22.jpg)
German shepherd: breed of large shepherd dogs used in police work and as a guide for the blind.
microwave: kitchen appliance that cooks food by passing an electromagnetic wave through it.
mountain: a land mass that projects well above its surroundings; higher than a hill.
jacket: a short coat
A massive ontology of images to transform computer vision
Individually Illustrated WordNet Nodes
![Page 23: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/23.jpg)
Comrades
Prof. Kai Li, Princeton
Jia Deng, 1st Ph.D. student, Princeton
![Page 24: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/24.jpg)
Entity
Mammal
Dog
German Shepherd
Step 1: Ontological structure based on WordNet
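The chain on this slide (Entity, Mammal, Dog, German Shepherd) can be pictured as a parent map over "is a" links. The sketch below is only illustrative: `PARENT`, `path_to_root` and `is_a` are hypothetical names, and the four-node chain is copied from the slide rather than from the real WordNet graph, which has many intermediate synsets.

```python
# Toy "is a" hierarchy using only the four nodes shown on the slide.
# Hypothetical structure for illustration; real WordNet has many more levels.
PARENT = {
    "German Shepherd": "Dog",
    "Dog": "Mammal",
    "Mammal": "Entity",
    "Entity": None,  # root of the toy ontology
}

def path_to_root(node):
    """Return the chain of categories from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def is_a(node, ancestor):
    """True if `ancestor` lies on the path from `node` to the root."""
    return ancestor in path_to_root(node)

print(" -> ".join(path_to_root("German Shepherd")))
```

With a structure like this, an image filed under a specific category is implicitly also an example of every category above it.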
![Page 25: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/25.jpg)
Dog
German Shepherd
Step 2: Populate categories with thousands of images from the Internet
![Page 26: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/26.jpg)
Step 3: Clean results by hand
Dog
German Shepherd
![Page 27: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/27.jpg)
Three Attempts at Launching
![Page 28: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/28.jpg)
1st Attempt: The Psychophysics Experiment
ImageNet PhD Students
Miserable Undergrads
![Page 29: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/29.jpg)
1st Attempt: The Psychophysics Experiment
• # of synsets: 40,000 (subject to: imageability analysis)
• # of candidate images to label per synset: 10,000
• # of people needed to verify: 2-5
• Speed of human labeling: 2 images/sec (one fixation: ~200msec)
• Massive parallelism (N ~ 10^2 to 10^3)
40,000 × 10,000 × 3 / 2 = 600,000,000 sec ≈ 19 years
N
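The estimate on this slide is plain arithmetic, and a short check makes the inputs explicit. This is a sketch under the slide's own numbers (40,000 synsets, 10,000 candidate images each, 3 verifiers, 2 images per second); the function name and the 365.25-day year are assumptions made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average calendar year

def labeling_seconds(synsets, images_per_synset, verifiers, images_per_second):
    """Total serial labeling time if every image is checked by every verifier."""
    return synsets * images_per_synset * verifiers / images_per_second

total_seconds = labeling_seconds(40_000, 10_000, 3, 2)
total_years = total_seconds / SECONDS_PER_YEAR

print(f"{total_seconds:,.0f} sec is about {total_years:.1f} years")
```

The inputs give 6 × 10^8 seconds of continuous, serial labeling, which is where the figure of roughly 19 years comes from.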
![Page 30: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/30.jpg)
2nd Attempt: Human-in-the-Loop Solutions
![Page 31: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/31.jpg)
2nd Attempt: Human-in-the-Loop Solutions
Human-generated datasets transcend algorithmic limitations, leading to better machine perception.
Machine-generated datasets can only match the best algorithms of the time.
![Page 32: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/32.jpg)
3rd Attempt: A Godsend Emerges
ImageNet PhD Students
Crowdsourced Labor
49k Workers from 167 Countries 2007-2010
![Page 33: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/33.jpg)
The Result: ImageNet Goes Live in 2009
![Page 34: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/34.jpg)
![Page 35: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/35.jpg)
What We Did Right
![Page 36: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/36.jpg)
While Others Targeted Detail…
LabelMe: Per-Object Regions and Labels (Russell et al, 2005)
Lotus Hill: Hand-Traced Parse Trees (Yao et al, 2007)
![Page 37: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/37.jpg)
…We Targeted Scale
ImageNet, 15M [Deng et al. ’09]
SUN, 131K [Xiao et al. ’10]
LabelMe, 37K [Russell et al. ’07]
PASCAL VOC, 30K [Everingham et al. ’06-’12]
Caltech101, 9K [Fei-Fei, Fergus, Perona, ’03]
![Page 38: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/38.jpg)
Additional Goals
High Resolution
To better replicate human visual acuity
Free of Charge
To ensure immediate application and a sense of community
High-Quality Annotation
To create a benchmarking dataset and advance the state of machine perception, not merely reflect it
Carnivore - Canine - Dog - Working Dog - Husky
![Page 39: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/39.jpg)
An Emphasis on Community and Achievement
ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2010-2017)
![Page 40: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/40.jpg)
ILSVRC Contributors
Olga Russakovsky, Stanford
Fei-Fei Li, Stanford
Alex Berg, UNC Chapel Hill
Wei Liu, UNC Chapel Hill
Eunbyung Park, UNC Chapel Hill
Sean Ma, Stanford
Jonathan Krause, Stanford
Sanjeev Satheesh, Stanford
Hao Su, Stanford
Aditya Khosla, Stanford
Zhiheng Huang, Stanford
Jia Deng, Univ. of Michigan
![Page 41: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/41.jpg)
Our Inspiration: PASCAL VOC
2005-2012
![Page 42: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/42.jpg)
Our Inspiration: PASCAL VOC
Mark Everingham
1973-2012
Mark Everingham Prize @ ECCV 2016
Alex Berg, Jia Deng, Fei-Fei Li, Wei Liu, Olga Russakovsky
![Page 43: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/43.jpg)
Participation and Performance
[Chart: number of entries per year, 2010 to 2016. Extracted bar labels: 35, 29, 81, 123, 157, 172.]
![Page 44: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/44.jpg)
Participation and Performance
[Chart: number of entries per year, 2010 to 2016. Extracted bar labels: 35, 29, 81, 123, 157, 172. Overlaid curve: classification errors (top-5), labeled 0.28 and 0.03.]
![Page 45: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/45.jpg)
Participation and Performance
[Chart: number of entries per year, 2010 to 2016. Extracted bar labels: 35, 29, 81, 123, 157, 172. Overlaid curves: classification errors (top-5), labeled 0.28 and 0.03; average precision for object detection, labeled 0.23 and 0.66.]
![Page 46: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/46.jpg)
What we did to make ImageNet better
![Page 47: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/47.jpg)
Lack of Details
![Page 48: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/48.jpg)
Lack of Details…ILSVRC Detection Challenge
| Statistics | PASCAL VOC 2012 | ILSVRC 2013 | Increase |
| --- | --- | --- | --- |
| Object classes | 20 | 200 | 10x |
| Training images | 5.7K | 395K | 70x |
| Training objects | 13.6K | 345K | 25x |
![Page 49: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/49.jpg)
Evaluation of ILSVRC Detection
Need to annotate the presence of all classes (to penalize false detections)
Table Chair Horse Dog Cat Bird
+ + - - - -
+ - - - + -
+ + - - - -
# images: 400K
# classes: 200
# annotations = 400K × 200 = 80M!
![Page 50: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/50.jpg)
Evaluation of ILSVRC Detection
Hierarchical annotation
J. Deng, O. Russakovsky, J. Krause, M. Bernstein, A. Berg, & L. Fei-Fei. CHI, 2014
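To make the motivation for hierarchical annotation concrete, the sketch below compares exhaustive yes/no labeling with a toy two-level scheme that asks about a group first and descends only on "yes". It is an illustration, not the procedure from the CHI 2014 paper: the grouping of the six example classes into "furniture" and "animal" and both function names are assumptions made here.

```python
# Toy two-level hierarchy over the six classes used as the slide's example.
# The grouping into "furniture" and "animal" is an assumption for illustration.
HIERARCHY = {
    "furniture": ["Table", "Chair"],
    "animal": ["Horse", "Dog", "Cat", "Bird"],
}

def exhaustive_questions(num_images, num_classes):
    """One yes/no question per (image, class) pair."""
    return num_images * num_classes

def hierarchical_questions(present):
    """Questions for one image: ask per group, descend only on 'yes'."""
    asked = 0
    for classes in HIERARCHY.values():
        asked += 1  # "Is there anything from this group in the image?"
        if any(c in present for c in classes):
            asked += len(classes)  # then ask about each class in the group
    return asked

print(exhaustive_questions(400_000, 200))          # 80000000
print(hierarchical_questions({"Table", "Chair"}))  # 4, versus 6 exhaustive
print(hierarchical_questions({"Table", "Cat"}))    # 8, versus 6 exhaustive
```

The second toy image shows the trade-off: group questions only pay off when most groups are absent from an image, which is the expected case when there are 200 classes and only a few objects per image.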
![Page 51: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/51.jpg)
J. Deng, A. Berg & L. Fei-Fei, ECCV, 2010
What does classifying 10K+ classes tell us?
![Page 52: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/52.jpg)
Fine-Grained Recognition
“Cardigan Welsh Corgi” “Pembroke Welsh Corgi”
![Page 53: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/53.jpg)
Fine-Grained Recognition: cars
2567 classes, 700k images
[Gebru, Krause, Deng, Fei-Fei, CHI 2017]
![Page 54: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/54.jpg)
Expected Outcomes
ImageNet becomes a benchmark
Machine learning advances and changes dramatically
Breakthroughs in object recognition
![Page 55: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/55.jpg)
Unexpected Outcomes
![Page 56: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/56.jpg)
Neural Nets are Cool Again!
Krizhevsky, Sutskever & Hinton, NIPS 2012
13,259 Citations
![Page 57: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/57.jpg)
…And Cooler and Cooler
[Krizhevsky et al. NIPS 2012]
“AlexNet”
[Szegedy et al. CVPR 2015]
“GoogLeNet”
[Simonyan & Zisserman, ICLR 2015]
“VGG Net”
[He et al. CVPR 2016]
“ResNet”
![Page 58: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/58.jpg)
A Deep Learning Revolution
Neural Nets
GPUs
![Page 59: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/59.jpg)
Ontological Structure Not Used as Much
![Page 60: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/60.jpg)
[Figure: taxonomy tree rooted at Thing and connected by "is a" links. Animalia splits into Chordate and Arthropoda; nodes below include Mammal, Insect, Carnivora, Diptera, Felidae, Muscidae, Primate, Pongidae, Hominidae and Marsupial; the leaves shown are House Cat, Lion, Housefly, Chimpanzee, Human and Wombat.]
Deng, Krause, Berg & Fei-Fei, CVPR 2012
![Page 61: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/61.jpg)
[Figure: taxonomy tree rooted at Thing and connected by "is a" links, with leaves House Cat, Lion, Housefly, Chimpanzee, Human and Wombat.]
Labels from general to specific: Thing, Animal, Mammal, Marsupial, Wombat
Deng, Krause, Berg & Fei-Fei, CVPR 2012
![Page 62: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/62.jpg)
[Figure: taxonomy tree rooted at Thing and connected by "is a" links, with leaves House Cat, Lion, Housefly, Chimpanzee, Human and Wombat.]
Maximize Specificity(f) subject to Accuracy(f) ≥ 1 - ε
Deng, Krause, Berg & Fei-Fei, CVPR 2012
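One way to read the objective on this slide is: answer with the most specific category whose chance of being correct is still at least 1 - ε. The sketch below is a simplified toy and not the algorithm from the CVPR 2012 paper. It assumes calibrated leaf probabilities, uses tree depth as a stand-in for specificity, and works on a hypothetical mini-tree built from node names on the slides; the probabilities are made up.

```python
# Hypothetical mini-tree (child -> parent) built from node names on the slides.
PARENT = {
    "Wombat": "Marsupial", "Marsupial": "Mammal",
    "Chimpanzee": "Primate", "Primate": "Mammal",
    "Housefly": "Insect", "Insect": "Animal",
    "Mammal": "Animal", "Animal": "Thing", "Thing": None,
}

def chain(node):
    """The node itself followed by all of its ancestors up to the root."""
    out = []
    while node is not None:
        out.append(node)
        node = PARENT[node]
    return out

def node_probability(node, leaf_probs):
    """Probability mass of all leaves that fall under `node`."""
    return sum(p for leaf, p in leaf_probs.items() if node in chain(leaf))

def most_specific_label(leaf_probs, eps):
    """Deepest node whose probability is at least 1 - eps (depth as specificity)."""
    ok = [n for n in PARENT if node_probability(n, leaf_probs) >= 1 - eps - 1e-9]
    return max(ok, key=lambda n: len(chain(n)))

# Made-up classifier output for one image.
probs = {"Wombat": 0.6, "Chimpanzee": 0.3, "Housefly": 0.1}
for eps in (0.5, 0.2, 0.05):
    print(eps, most_specific_label(probs, eps))
```

As ε shrinks, the answer backs off from Wombat to Mammal to Animal: the label becomes less informative but stays within the accuracy budget.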
![Page 63: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/63.jpg)
Our Model
Optimizing with a Knowledge Ontology Results in Big Gains in Information at Arbitrary Accuracy
Deng, Krause, Berg & Fei-Fei, CVPR 2012
![Page 64: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/64.jpg)
Kuettel, Guillaumin, Ferrari. Segmentation Propagation in ImageNet. ECCV 2012
ECCV 2012 Best Paper Award
Relatively Few Works Have Used Ontology
![Page 65: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/65.jpg)
Most works still use 1M images to do pre-training
15M Images Total
1M Images
![Page 66: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/66.jpg)
“First, we find that the performance on vision tasks still increases linearly with orders of magnitude of training data size.”
C. Sun et al, 2017
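"Linearly with orders of magnitude" means the trend is linear in log10 of the dataset size, so every tenfold increase in data buys the same fixed gain. The sketch below only illustrates that shape; the intercept `A` and slope `B` are made-up placeholders, not values reported by Sun et al.

```python
import math

def predicted_performance(num_images, a, b):
    """Log-linear trend: performance = a + b * log10(num_images)."""
    return a + b * math.log10(num_images)

A, B = 10.0, 5.0  # hypothetical coefficients, chosen only for illustration

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} images -> {predicted_performance(n, A, B):.1f}")
```

Under such a trend, going from 1M to 10M images and going from 10M to 100M images yield the same increment `B`, even though the second step needs ten times as many extra images.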
![Page 67: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/67.jpg)
How Humans Compare
Andrej Karpathy. http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
![Page 68: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/68.jpg)
How Humans Compare
GoogLeNet
6.8% Top-5 error rate
Susceptible to:
• Small, thin objects
• Image filters
• Abstract representations
• Miscellaneous sources
Human
5.1% Top-5 error rate
Susceptible to:
• Fine-grained recognition
• Class unawareness
• Insufficient training data
Andrej Karpathy. http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
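The top-5 error rate quoted for both GoogLeNet and the human annotator counts an image as wrong only if the true label is missing from the five highest-ranked guesses. A minimal sketch of the metric follows; the function name, class names and scores are hypothetical and unrelated to the actual evaluation data.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is not among the 5 top-scored classes."""
    errors = 0
    for scores, truth in zip(scores_per_image, true_labels):
        top5 = sorted(scores, key=scores.get, reverse=True)[:5]
        if truth not in top5:
            errors += 1
    return errors / len(true_labels)

# Two made-up predictions over six classes each.
predictions = [
    {"malamute": 0.35, "husky": 0.30, "wolf": 0.15,
     "corgi": 0.10, "cat": 0.06, "lion": 0.04},
    {"cat": 0.40, "lion": 0.30, "husky": 0.10,
     "corgi": 0.10, "wolf": 0.06, "wombat": 0.04},
]
truths = ["husky", "wombat"]  # husky is ranked 2nd, wombat is ranked 6th

print(top5_error(predictions, truths))  # 0.5
```

The first image counts as correct even though the top guess is wrong, which is why top-5 error is more forgiving than top-1 error on fine-grained classes.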
![Page 69: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/69.jpg)
What Lies Ahead
![Page 70: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/70.jpg)
Moving from object recognition…
[Image labels: person (five instances), scale, room]
![Page 71: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/71.jpg)
…to human-level understanding.
[Image annotations: person, person, person, scale, room; "Standing on"; "Stepping on"; "Watching and laughing"; "Wants to weigh himself"; "Wants to play a prank"; "Stepping on a scale adds weight and ups the reading."]
![Page 72: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/72.jpg)
Inverse Graphics
Image credit: https://www.youtube.com/watch?v=ip-KIzQmcBo (Oliver Villar)
![Page 73: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/73.jpg)
ImageNet: Deng et al. 2009; COCO: Lin et al. 2014
lady
![Page 74: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/74.jpg)
Image labels: tree, ski jacket, boots, snow, sunglasses, vest, pole, coat, glove, head, building, leaves, equipment, bag, hat, sky, …, lady
COCO: Lin et al. 2014
“A lady in pink dress is skiing.”
![Page 75: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/75.jpg)
Q: What is the man in the center doing? A: Standing on a ski.
Q: What is the color of the sky? A: Blue.
Q: Where are the pine trees? A: Behind the hill.
…
<woman, wear, coat>
<trees, be, green>
<trees, behind, group (of people)>
<man, has, jacket>
<boots, be, yellow>
<lady, hold, skis>
“A man standing.”
“A clear blue sky at a ski resort.”
“A snowy hill is in front of pine trees.”
“There are several pine trees.”
“A group of people getting ready to ski.”
![Page 76: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/76.jpg)
![Page 77: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/77.jpg)
entire universe of images
[Johnson et al., CVPR 2015]
![Page 78: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/78.jpg)
Visual Genome Dataset
Goals
• Beyond nouns
– Objects, verbs, attributes
• Beyond object classification
– Relationships and contexts
• Sentences and QAs
• From Perception to Cognition
Specs
• 108,249 images (COCO images)
• 4.2M image descriptions
• 1.8M Visual QA (7W)
• 1.4M objects, 75.7K obj. classes
• 1.5M relationships, 40.5K rel. classes
• 1.7M attributes, 40.5K attr. classes
• Vision and language correspondences
• Everything mapped to WordNet Synset
Krishna et al. IJCV 2016
A dataset, a knowledge base, an ongoing effort to connect structural image concepts to language.
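The objects, attributes, and relationships counted in the specs can be pictured as a small graph per image. The sketch below is only an illustration, not the Visual Genome data format or API: the `SceneGraph` class and its method names are hypothetical, and it assumes that "be" triples such as <trees, be, green> are stored as attributes while the remaining triples are stored as relationships. The sample entries come from the skiing slide.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical structure, not the official format)."""
    objects: set = field(default_factory=set)
    attributes: dict = field(default_factory=lambda: defaultdict(set))
    relationships: list = field(default_factory=list)

    def add_attribute(self, obj, attr):
        # Register the object and attach one attribute to it.
        self.objects.add(obj)
        self.attributes[obj].add(attr)

    def add_relationship(self, subj, pred, obj):
        # Register both endpoints and store the <subject, predicate, object> triple.
        self.objects.update((subj, obj))
        self.relationships.append((subj, pred, obj))

    def query(self, subj=None, pred=None, obj=None):
        # Return triples matching every field that is not None.
        return [
            (s, p, o)
            for (s, p, o) in self.relationships
            if subj in (None, s) and pred in (None, p) and obj in (None, o)
        ]


# Triples taken from the skiing example slide.
graph = SceneGraph()
graph.add_relationship("woman", "wear", "coat")
graph.add_relationship("trees", "behind", "group")
graph.add_relationship("man", "has", "jacket")
graph.add_relationship("lady", "hold", "skis")
graph.add_attribute("trees", "green")
graph.add_attribute("boots", "yellow")

print(graph.query(pred="wear"))
```

Keeping attributes separate from relationships mirrors the separate counts in the specs; a query such as `graph.query(subj="trees")` then returns only the relational facts about the trees.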
![Page 79: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/79.jpg)
Visual Genome Dataset
A dataset, a knowledge base, an ongoing effort to connect structural image concepts to language.
Krishna et al. IJCV 2016
Q: What is the person sitting on the right of the elephant wearing? A: A blue shirt.
• DenseCap & Paragraph Generation: Karpathy et al. CVPR’16; Krause et al. CVPR’17
• Relationship Prediction: Krishna et al. ECCV’16
• Image Retrieval w/ Scene Graphs: Johnson et al. CVPR’15; Xu et al. CVPR’17
• Visual Q&A: Zhu et al. CVPR’16
![Page 80: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/80.jpg)
Workshop on Visual Understanding by Learning from Web Data 2017
26 July 2017 | Honolulu, Hawaii in conjunction with CVPR 2017
http://www.vision.ee.ethz.ch/webvision/workshop.html
![Page 81: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/81.jpg)
The Future of Vision and Intelligence
Agency: The integration of perception, understanding and action
Vision
Language
Understanding
Action
![Page 82: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/82.jpg)
Eight Years of Competitions
2010-2017
10× reduction of image classification error
3× improvement of detection precision
![Page 83: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/83.jpg)
What Happens Now?
We’re passing the baton to Kaggle: a community of more than 1M data scientists.
Why? Democratizing data is vital to democratizing AI.
image-net.org remains live at Stanford.
![Page 84: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/84.jpg)
What Happens Now?
ImageNet Object Localization Challenge
ImageNet Object Detection Challenge
ImageNet Object Detection from Video Challenge
![Page 85: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/85.jpg)
Contributors/Friends/Advisors
Alex Berg, Michael Bernstein, Edward Chang, Brendan Collins, Jia Deng, Minh Do, Wei Dong, Alexei Efros, Mark Everingham, Christiane Fellbaum, Adam Finkelstein, Thomas Funkhouser, Timnit Gebru, Derek Hoiem, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Jonathan Krause, Fei-Fei Li, Kai Li, Li-Jia Li, Wei Liu, Sean Ma, Xiaojuan Ma, Jitendra Malik, Dan Osherson, Eunbyung Park, Chuck Rosenberg, Olga Russakovsky, Sanjeev Satheesh, Richard Socher, Hao Su, Zhe Wang, Andrew Zisserman
49k Amazon Mechanical Turk Workers
![Page 86: Where have we been? Where are we going? - ACM Learning …• 1.4M objects, 75.7K obj. classes • 1.5M relationships, 40.5K rel. classes • 1.7M attributes, 40.5K attr. classes •](https://reader033.vdocuments.us/reader033/viewer/2022042222/5ec87e7f022b2a556d25c351/html5/thumbnails/86.jpg)
“This is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.”
Winston Churchill