![Page 1: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/1.jpg)
Large dataset for object and scene recognition
A. Torralba, R. Fergus, W. T. Freeman
80 million tiny images
Ron Yanovich Guy Peled
![Page 2: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/2.jpg)
![Page 3: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/3.jpg)
http://royal.pingdom.com/
Internet 2012 in numbers
• 7 petabytes – How much photo content Facebook added every month.
• 300 million – Number of new photos added every day to Facebook.
• 5 billion – The total number of photos uploaded to Instagram since its start, reached in September 2012.
• 58 – Number of photos uploaded every second to Instagram.
• 1 – Apple iPhone 4S was the most popular camera on Flickr.
![Page 4: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/4.jpg)
Image Retrieval
• Image search is a specialized data search used to find images
• Search methods
– Image meta search
– Content-based image retrieval
![Page 5: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/5.jpg)
Image meta search
• Search of images based on associated metadata such as keywords, text, etc.
• Google Images
– The keywords for the image search are based on the filename of the image, the link text pointing to the image, and text adjacent to the image
http://en.wikipedia.org/wiki/Google_Images
![Page 6: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/6.jpg)
Content-based image retrieval (CBIR)
• The search analyzes the actual contents of the image: colors, shapes, textures, etc.
• The most common method for comparing two images in content-based image retrieval is an image distance measure.
• Many CBIR systems have been developed, but the problem of retrieving images on the basis of their pixel content remains largely unsolved.
http://en.wikipedia.org/wiki/Content-based_image_retrieval
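As an illustration of what an image distance measure can look like, here is a minimal sketch that compares two images by their color histograms. The choice of an L1 histogram distance, the bin count, and the function names are assumptions made for this example; the slides do not name a specific measure at this point.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for empty input
    return [h / total for h in hist]


def histogram_distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two normalized histograms (0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


red_image = [(255, 0, 0)] * 16
dark_red_image = [(250, 10, 5)] * 16
blue_image = [(0, 0, 255)] * 16

h_red = color_histogram(red_image)
print(histogram_distance(h_red, color_histogram(dark_red_image)))
print(histogram_distance(h_red, color_histogram(blue_image)))
```

A distance like this ignores where colors appear in the image, which is one reason pixel-content retrieval stays hard.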
![Page 7: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/7.jpg)
prag.diee.unica.it
www.dailydawdle.com
![Page 8: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/8.jpg)
Why not combine both methods?
![Page 9: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/9.jpg)
Primary goals
• 79,000,000 images collected from the WWW
• Image matching similar to Google search prediction
– “Did you mean?” tool
![Page 10: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/10.jpg)
The problem
• 79,000,000 images
– Large storage
– Long processing time
![Page 11: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/11.jpg)
Collecting ~80,000,000 images
• Using image search engines:
– Altavista, Ask, Flickr, Cydral, Google, Picsearch and Webshots
• 760GB on one hard disk?
www.apartmenttherapy.com
![Page 12: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/12.jpg)
Creating image dataset
• Each image is labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database.
• The result is a large semantic tree
![Page 13: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/13.jpg)
What is WordNet
• WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
http://wordnet.princeton.edu
![Page 14: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/14.jpg)
Example branches of the WordNet tree (figure):
carrot → plant root → plant organ → plant part → natural object → object, physical object → entity, physical thing → entity
sprinkler → mechanical device → mechanism
![Page 15: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/15.jpg)
http://www.cs.princeton.edu/courses/archive/spr07/cos226/assignments/wordnet.html
![Page 16: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/16.jpg)
Reduce space and process time
At a size of 32x32 we can get a correct recognition rate of more than 80%
![Page 17: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/17.jpg)
Reduce space and process time
• Moving from 256X256 to 32X32
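To make the saving concrete, the sketch below works out raw, uncompressed sizes at both resolutions. It assumes 3 color channels at 1 byte each and uses the 79,000,000 image count from the slides; these are back-of-the-envelope numbers under those assumptions, not the 760GB figure quoted for the collected data.

```python
def raw_bytes(width: int, height: int, channels: int = 3,
              bytes_per_channel: int = 1) -> int:
    """Uncompressed size of one image in bytes (assumed 8-bit RGB by default)."""
    return width * height * channels * bytes_per_channel


n_images = 79_000_000            # image count quoted on the slides
small = raw_bytes(32, 32)        # 3072 bytes per image
large = raw_bytes(256, 256)      # 196608 bytes per image

reduction = large // small       # 64 times fewer bytes per image
print(f"bytes per 32x32 image:   {small}")
print(f"bytes per 256x256 image: {large}")
print(f"reduction factor:        {reduction}")
print(f"all images at 32x32:     {n_images * small / 1e9:.1f} GB")
print(f"all images at 256x256:   {n_images * large / 1e12:.1f} TB")
```

Every per-image comparison also touches 64 times fewer values, so the same factor applies to matching time for simple pixel-wise measures.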
![Page 18: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/18.jpg)
Reduce space and process time
Studies of face perception have shown that only 16x16 pixels are needed for robust face recognition.
This remarkable performance is also found in a scene recognition task.
![Page 19: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/19.jpg)
Reduce space and process time
• Speech recognition uses 10^6 data points.
• Current experiments in object recognition typically use 10^2 - 10^4 data points.
![Page 20: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/20.jpg)
Reduce space and process time
Human visual space
• (100 years) * (30 frames per sec) ≈ 10^11 frames
• All 32x32 images = 10^7400 images
– Most of the images are just noise
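Both orders of magnitude can be checked with a few lines of arithmetic. The sketch assumes a 365.25-day year and 8-bit RGB pixels (256 values per channel, 3 channels); neither assumption is stated on the slide.

```python
import math

# Frames seen in a lifetime: 100 years at 30 frames per second.
seconds_per_year = 365.25 * 24 * 3600
lifetime_frames = 100 * seconds_per_year * 30
log10_frames = math.log10(lifetime_frames)

# Number of distinct 32x32 images: 256 ** (32 * 32 * 3).
# The number is too large for a float, so work with its base-10 logarithm.
values_per_image = 32 * 32 * 3
log10_images = values_per_image * math.log10(256)

print(f"lifetime frames ~ 10^{log10_frames:.2f}")
print(f"possible 32x32 images ~ 10^{log10_images:.2f}")
```

The gap between the two exponents is why almost every possible 32x32 image is noise that nobody will ever see.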
![Page 21: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/21.jpg)
Reduce space and process time
• We conclude that 32x32 pixels contain enough data for our purpose.
• The advantage is the ability to work with millions of images (~10^8).
![Page 22: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/22.jpg)
Statistics of low-res images
• Image matching methods:
– SSD (sum of squared differences)
– Warp
– Shift (per pixel)
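The three matching methods listed above can be sketched on small grayscale images as follows. This is a simplified illustration, not the authors' implementation: the normalization step, the set of transformations tried for the warp (small translations plus a horizontal mirror), the window size for the per-pixel shift, and all function names are assumptions.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def normalize(img: Image) -> Image:
    """Scale an image to zero mean and unit L2 norm."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    norm = math.sqrt(sum((v - mean) ** 2 for v in flat)) or 1.0
    return [[(v - mean) / norm for v in row] for row in img]


def ssd(a: Image, b: Image) -> float:
    """Sum of squared differences between two same-sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def translate(img: Image, dx: int, dy: int) -> Image:
    """Shift an image by (dx, dy), replicating border pixels."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def mirror(img: Image) -> Image:
    """Flip an image horizontally."""
    return [row[::-1] for row in img]


def warp_distance(a: Image, b: Image, max_shift: int = 2) -> float:
    """Smallest SSD over a few global transformations of b."""
    best = math.inf
    for candidate in (b, mirror(b)):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                best = min(best, ssd(a, translate(candidate, dx, dy)))
    return best


def shift_distance(a: Image, b: Image, window: int = 2) -> float:
    """Each pixel of a is matched to its best pixel of b within a small window."""
    h, w = len(a), len(a[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += min((a[y][x] - b[yy][xx]) ** 2
                         for yy in range(max(0, y - window), min(h, y + window + 1))
                         for xx in range(max(0, x - window), min(w, x + window + 1)))
    return total


stripe = normalize([[0, 0, 1, 0]] * 4)  # vertical stripe in column 2
flipped = mirror(stripe)                # same stripe, now in column 1
print(ssd(stripe, flipped), warp_distance(stripe, flipped),
      shift_distance(stripe, flipped))
```

Because the untransformed image is always among the candidates, the warp and shift distances in this sketch can never exceed the plain SSD.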
![Page 23: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/23.jpg)
Statistics of low-res images
![Page 24: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/24.jpg)
Statistics of low-res images
![Page 25: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/25.jpg)
![Page 26: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/26.jpg)
![Page 27: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/27.jpg)
Recognition
• The goal is to recognize objects and scenes using the simple SSD, warp, and shift methods instead of complex matching algorithms
• Given an image, the neighbors are found using some similarity measure (D-Shift)
![Page 28: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/28.jpg)
Recognition
• Each neighbor in turn votes for its branch within the WordNet tree.
• Classification
• Image Search returns an object
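The neighbor-voting idea can be sketched as below. The tree is a toy slice based on the carrot and sprinkler branches shown earlier in the slides, with the links above "mechanism" collapsed for brevity; the feature vectors, labels, and helper names are hypothetical and only serve the illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Toy child -> parent map (assumption: a shortened, simplified WordNet slice).
PARENT: Dict[str, str] = {
    "carrot": "plant root",
    "plant root": "plant organ",
    "plant organ": "plant part",
    "plant part": "natural object",
    "natural object": "object",
    "sprinkler": "mechanical device",
    "mechanical device": "mechanism",
    "mechanism": "object",  # intermediate levels collapsed in this toy tree
    "object": "entity",
}


def path_to_root(label: str) -> List[str]:
    """The label followed by all of its ancestors up to the root."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def k_nearest(query: Sequence[float],
              database: List[Tuple[Sequence[float], str]],
              distance: Callable[[Sequence[float], Sequence[float]], float],
              k: int) -> List[str]:
    """Labels of the k database entries closest to the query."""
    ranked = sorted(database, key=lambda item: distance(query, item[0]))
    return [label for _, label in ranked[:k]]


def vote(labels: List[str]) -> Counter:
    """Each neighbor votes for its own node and every ancestor in the tree."""
    votes: Counter = Counter()
    for label in labels:
        votes.update(path_to_root(label))
    return votes


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical one-dimensional "images" with labels.
database = [((0.0,), "carrot"), ((0.1,), "carrot"),
            ((0.2,), "sprinkler"), ((5.0,), "sprinkler")]
neighbors = k_nearest((0.0,), database, squared_distance, k=3)
votes = vote(neighbors)
print(neighbors)
print(votes["entity"], votes["plant root"], votes["mechanism"])
```

Accumulating votes along the whole branch lets noisy leaf labels still agree at higher levels of the tree, so a classifier can read off the decision at whichever level it needs.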
![Page 29: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/29.jpg)
![Page 30: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/30.jpg)
![Page 31: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/31.jpg)
![Page 32: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/32.jpg)
![Page 33: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/33.jpg)
Person detection
• Is it a person?
![Page 34: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/34.jpg)
Person detection
• Standard approach:
– Face detection algorithm
– Not good enough
![Page 35: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/35.jpg)
Person detection
• Better approach: using the image DB
• More than 23% of the images contain pictures of people
![Page 36: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/36.jpg)
Person detection
• Evaluating performance by two different sets of test images:
- Evaluation using randomly drawn images
- Evaluation using Altavista images
![Page 37: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/37.jpg)
Evaluation using randomly drawn images
• Randomly drawn 1,125 images from DB
• People were manually segmented on each image
• Findings:
– Large Appearance Better performance
– Weaker labels Largest object
![Page 38: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/38.jpg)
Large Appearance Better Performance
Better performance is achieved when the person occupies more than 20% of the image.
![Page 39: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/39.jpg)
![Page 40: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/40.jpg)
![Page 41: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/41.jpg)
Evaluation using Altavista images
• 1,018 images drawn by searching ‘person’ label
• Images classified using WordNet
Reordered labels
![Page 42: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/42.jpg)
![Page 43: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/43.jpg)
Scene recognition
• A search for images that match an entire scene rather than a specific object
• Tagging 1,125 randomly drawn pictures as: “City”, “River”, “Field”, “Mountain”
![Page 44: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/44.jpg)
DB size: 80,000,000 / 800,000 / 8,000
The larger the database, the higher the detection rate.
![Page 45: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/45.jpg)
![Page 46: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/46.jpg)
Achievements
• Building a large dataset of 79 million labeled 32x32 color images.
• Showing that a simple non-parametric method, in conjunction with a large dataset, can give reasonable performance on object recognition tasks.
• Tasks such as person detection and scene recognition perform as well as leading class-specific detectors.
![Page 47: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/47.jpg)
Conclusions
It is possible to put less effort into the modeling part of object recognition (seeking to develop a suitable parametric representation for recognition): improving the dataset itself can help to solve the same problem.
![Page 48: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/48.jpg)
References
• 80 million tiny images
– http://people.csail.mit.edu/torralba/publications/80millionImages.pdf
• ImageNet
– http://wordnet.cs.princeton.edu/papers/imagenet_cvpr09.pdf
• WordNet
– http://wordnet.princeton.edu/wordnet/
• Precision and recall
– http://en.wikipedia.org/wiki/Precision_and_recall
• ROC curve
– http://en.wikipedia.org/wiki/Receiver_operating_characteristic
• Images taken from:
– http://royal.pingdom.com/
– http://en.wikipedia.org/wiki/Google_Images
– http://en.wikipedia.org/wiki/Content-based_image_retrieval
– http://www.prag.diee.unica.it
– http://www.dailydawdle.com
– www.apartmenttherapy.com
– http://www.cs.princeton.edu/courses/archive/spr07/cos226/assignments/wordnet.html
![Page 49: Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d1f5503460f949f35ea/html5/thumbnails/49.jpg)
?