![Page 1: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/1.jpg)
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-FeiDept. of Computer Science, Princeton University, USA
CVPR 2009
23/4/19ImageNet 1
Jiewen [email protected]
![Page 2: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/2.jpg)
This paper is mainly an introduction to ImageNet. The paper is organized as follows:
1. shows properties of ImageNet 2. Compare ImageNet with current related datasets
3. Constructing ImageNet - describes without concrete steps
4. ImageNet Applications mainly focus on the constructing ImageNet. It mostly relatives to Crawling and
PageRank.
23/4/19ImageNet 2
![Page 3: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/3.jpg)
A dataset- Datasets and Computer Vision Based on WordNet- Each node is depicted by images A knowledge ontology- Taxonomy- Partonomy
23/4/19ImageNet 3
![Page 4: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/4.jpg)
2-step process
23/4/19ImageNet 4
Step 1 :Collect candidate images Via the Internet
Step 2 :Clean up the candidate Images by humans
![Page 5: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/5.jpg)
For each synset, the queries are the set of WordNet synonyms
Accuracy of Internet Image search results: 10 %- For 500-1000 clean images, needs 10K images Query expansion- Synonyms: German police dog, German shepherd dog- Appending words form ancestors: sheepdog, dog Multiple Languages- Italian, Dutch, Spanish, Chinese e.g. 德国牧羊犬, pastore tedesco More engines: Yahoo! , flickr, Google
Parallel downloading
23/4/19ImageNet 5
![Page 6: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/6.jpg)
Rely on humans to verify each candidate image collected for a given synset
Amazon Mechanical Turk (AMT)- used for labeling vision data- 300 images: 0.02 dollar- 14,197,122 images: 946 dollars- 10 repetition: 9460 dollars- Jul 2008 -Apr 2010:11 million images Present the users with a set of
candidate images and the definition of the target synset
let users select the best match ones
23/4/19ImageNet 6
![Page 7: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/7.jpg)
23/4/19ImageNet 7
Workers do annotation on AMT-Multiple annotations for each images
Annotation Results- An average of > 97% accuracy
![Page 8: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/8.jpg)
Users Enhancement- Provide wiki and google links for definitions- Make sure workers read the definition - Definition quiz- Allow more feedback. E.g. “unimagable
synset” expert opinion
23/4/19ImageNet 8
![Page 9: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/9.jpg)
Human users make mistakes Not all users follow the instructions Users do not always agree with each other- Subtle or confusing synsets, e.g. Burmese cat Quality Control System
23/4/19ImageNet 9
![Page 10: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/10.jpg)
randomly sample an initial subset of image to users
- Have multiple users independently label same image obtain a confidence score table,
indicating the probability of an image being a good image given the user votes - Different categories requires different levels of consensus
Proceed until a pre-determined confidence score threshold reached
23/4/19ImageNet 10
![Page 11: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/11.jpg)
Scale: 12 subtrees,3,2 million images,5247 categories
Hierarchy: densely populated semantic hierarchy, based on WordNet
23/4/19ImageNet 11
![Page 12: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/12.jpg)
Accuracy: clean dataset at all level
23/4/19ImageNet 12
Diversity: variable appearances, positions, view points, poses, background clutter, occlusions.
![Page 13: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/13.jpg)
Non-parametric Object Recognition1. NN-voting + noisy ImageNet2. NN-voting + clean ImageNet3. Naive Bayesian Nearest Neighbor (NBNN)4. NBNN-100 Tree Based Image Classification Automatic Object Localization
23/4/19ImageNet 13
![Page 14: ImageNet : A Large-Scale Hierarchical Image Database](https://reader036.vdocuments.us/reader036/viewer/2022081421/56812a92550346895d8e3fbb/html5/thumbnails/14.jpg)
Pros1. Crowdsourcing
2. Benchmarking
3. Open: Download Original Images, URLs, Features, Object Attributes, API
Cons1. Improve algorithm: PageRank
2. AMT: hierarchical users based on their ability
3. Only one tag per image
23/4/19ImageNet 14