Human abilities Presented By Mahmoud Awadallah 1

Upload: jasper-clark

Post on 28-Dec-2015

Page 1: Human abilities Presented By Mahmoud Awadallah 1

Human abilities

Presented By Mahmoud Awadallah

Page 2: Human abilities Presented By Mahmoud Awadallah 1

What do we perceive in a glance of a real-world scene?

Bryan Russell

Page 3: Human abilities Presented By Mahmoud Awadallah 1

Motivation

• Much can be recognized quickly

• Investigate the early computations of an image

• Analyze real-world, complicated scenes

Page 5: Human abilities Presented By Mahmoud Awadallah 1

Stimuli: outdoor images

Page 6: Human abilities Presented By Mahmoud Awadallah 1

Stimuli: outdoor images

Page 7: Human abilities Presented By Mahmoud Awadallah 1

Stimuli: indoor images

Page 8: Human abilities Presented By Mahmoud Awadallah 1

Stimuli: indoor images

Page 12: Human abilities Presented By Mahmoud Awadallah 1

Experiment specifications

• 5 naïve scorers

• 105 attributes assessed for each description

• 2 scoring fields for each attribute:

– whether the attribute is described

– if yes, whether it is accurate

Page 13: Human abilities Presented By Mahmoud Awadallah 1

Computation of score

Attribute: building, Image: 52, PT: 500 ms

Subject   Correctly described?
1         Yes
2         No
3         Yes

Score: 0.67

For image 52, normalize by max score across all PT
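
The score computation on this slide can be sketched as follows. This is a minimal illustration, assuming the raw score is the fraction of subjects whose description was judged correct (2 of 3 gives 0.67) and that normalization divides by the largest raw score the image obtained at any presentation time (PT). The function names are hypothetical and do not come from the slides.

```python
def raw_score(judgments):
    """Fraction of subjects whose description of the attribute was judged correct."""
    return sum(judgments) / len(judgments)


def normalize_by_max(scores_by_pt):
    """Divide each PT's raw score by the maximum raw score across all PTs."""
    peak = max(scores_by_pt.values())
    if peak == 0:
        # No subject was correct at any PT: leave every score at zero.
        return {pt: 0.0 for pt in scores_by_pt}
    return {pt: score / peak for pt, score in scores_by_pt.items()}


# Slide example: attribute "building", image 52, PT 500 ms, subjects 1-3.
score_500 = raw_score([True, False, True])
print(round(score_500, 2))  # 0.67
```

After normalization, the PT with the best raw score for an image always maps to 1.0, so curves for different images become comparable.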

Page 14: Human abilities Presented By Mahmoud Awadallah 1

How the scorers perform

Building attribute

Page 16: Human abilities Presented By Mahmoud Awadallah 1

The “content” of a single fixation

Animate objects

Page 17: Human abilities Presented By Mahmoud Awadallah 1

The “content” of a single fixation

Inanimate objects

Page 18: Human abilities Presented By Mahmoud Awadallah 1

The “content” of a single fixation

Scene

Page 19: Human abilities Presented By Mahmoud Awadallah 1

The “content” of a single fixation

Social events

Page 20: Human abilities Presented By Mahmoud Awadallah 1

Outdoor vs. indoor bias

Page 21: Human abilities Presented By Mahmoud Awadallah 1

Outdoor vs. indoor bias

Page 22: Human abilities Presented By Mahmoud Awadallah 1

Summary plots

Page 23: Human abilities Presented By Mahmoud Awadallah 1

Summary plots

Page 25: Human abilities Presented By Mahmoud Awadallah 1

Sensory vs. object/scene

Page 26: Human abilities Presented By Mahmoud Awadallah 1

Sensory vs. object/scene

Page 27: Human abilities Presented By Mahmoud Awadallah 1

Sensory vs. object/scene

Page 28: Human abilities Presented By Mahmoud Awadallah 1

Correlation of object/scene perception

Page 29: Human abilities Presented By Mahmoud Awadallah 1

Scene vs. objects

Page 30: Human abilities Presented By Mahmoud Awadallah 1

Conclusions

• Outdoor scene bias

• Less information needed for shape/sensory recognition

• Weak correlation between scene and object perception

Page 31: Human abilities Presented By Mahmoud Awadallah 1

80 million tiny images: a large dataset for non-parametric object and scene recognition

Page 32: Human abilities Presented By Mahmoud Awadallah 1

A.I. for the postmodern world:

• All questions have already been answered… many times, in many ways

• Google is dumb, the “intelligence” is in the data

Page 33: Human abilities Presented By Mahmoud Awadallah 1

How about visual data?

• The key question here in this paper is: How big does the image dataset need to be to robustly perform recognition using simple nearest-neighbor schemes?

• Complex classification methods don’t extend well

• Can we use a simple classification method?

Page 34: Human abilities Presented By Mahmoud Awadallah 1

Past and future of image datasets in computer vision

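[Figure: growth of image dataset sizes over time. Vertical axis: number of pictures (10^0, 10^5, 10^10, 10^15, 10^20). Horizontal axis: time.]

• 1972: Lena, a dataset in one picture

• 1996: COREL, 40,000 pictures

• 2007: 2 billion pictures

• 2020?

• Human Click Limit: all humanity taking one picture/second during 100 years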

Slide by Antonio Torralba

Page 35: Human abilities Presented By Mahmoud Awadallah 1

How big is Flickr?

Credit: Franck_Michel (http://www.flickr.com/photos/franckmichel/)

100M photos updated daily

6B photos as of August 2011!

• ~3B public photos

Page 36: Human abilities Presented By Mahmoud Awadallah 1

How Annotated is Flickr? (tag search)

Party – 23,416,126

Paris – 11,163,625

Pittsburgh – 1,152,829

Chair – 1,893,203

Violin – 233,661

Trashcan – 31,200

Page 37: Human abilities Presented By Mahmoud Awadallah 1

Noisy Output from Image Search Engines

Page 38: Human abilities Presented By Mahmoud Awadallah 1

Thumbnail Collection Project

Collected 80M images

http://people.csail.mit.edu/torralba/tinyimages

Page 39: Human abilities Presented By Mahmoud Awadallah 1

Thumbnail Collection Project

Collect images for ALL objects

• List obtained from WordNet

• 75,378 non-abstract nouns in English

Page 40: Human abilities Presented By Mahmoud Awadallah 1

Web image dataset

79.3 million images

Collected using image search engines

List of nouns taken from WordNet

Save all images in 32x32 resolution

Page 41: Human abilities Presented By Mahmoud Awadallah 1

How Much is 80M Images?

One feature-length movie:

• 105 min = 151K frames @ 24 FPS

For 80M images, watch 530 movies

How do we store this?

• 1k * 80M = 80 GB

• Actual storage: 760GB
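
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the 1 KB per image figure is the slide's estimate, and the uncompressed 32x32 RGB size is an added assumption for comparison, not a number from the slides. The movie count comes out near 529, which the slide rounds to 530.

```python
FPS = 24
MOVIE_MINUTES = 105
NUM_IMAGES = 80_000_000

frames_per_movie = MOVIE_MINUTES * 60 * FPS      # 151,200 frames
movies_needed = NUM_IMAGES / frames_per_movie    # about 529 movies

# Slide's storage estimate: roughly 1 KB per image (decimal units).
estimated_gb = NUM_IMAGES * 1_000 / 1e9          # 80 GB

# Assumption for comparison: uncompressed 32x32 RGB, 1 byte per channel.
raw_bytes_per_image = 32 * 32 * 3                # 3,072 bytes
raw_gb = NUM_IMAGES * raw_bytes_per_image / 1e9  # about 246 GB

print(frames_per_movie, round(movies_needed), estimated_gb, round(raw_gb, 2))
```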

Page 42: Human abilities Presented By Mahmoud Awadallah 1

Powers of 10

Number of images on my hard drive: 10^4

Number of images seen during my first 10 years: 10^8 (3 images/second * 60 * 60 * 16 * 365 * 10 = 630720000)

Number of images seen by all humanity: 10^20

106,456,367,669 humans [1] * 60 years * 3 images/second * 60 * 60 * 16 * 365

[1] from http://www.prb.org/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx

Number of photons in the universe: 10^88

Number of all 8-bit 32x32 images: 10^7373

256^(32*32*3) ~ 10^7373
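
The orders of magnitude above can be reproduced directly. The sketch below uses only the quantities stated on the slide (3 images per second, 16 waking hours per day, the cited population figure, 60 years per person); the variable names are illustrative. Evaluated this way, 256^(32*32*3) has a base-10 exponent of about 7398, a little above the 7373 quoted on the slide.

```python
import math

SECONDS_AWAKE_PER_YEAR = 60 * 60 * 16 * 365  # 16 waking hours per day
IMAGES_PER_SECOND = 3

# Images seen by one person in ten years.
first_ten_years = IMAGES_PER_SECOND * SECONDS_AWAKE_PER_YEAR * 10

# Images seen by all humans who ever lived, at 60 years each.
all_humanity = 106_456_367_669 * 60 * IMAGES_PER_SECOND * SECONDS_AWAKE_PER_YEAR

# The count of distinct 8-bit 32x32 RGB images is 256 ** (32 * 32 * 3);
# compute its base-10 logarithm rather than the huge integer itself.
log10_all_images = 32 * 32 * 3 * math.log10(256)

print(first_ten_years)                   # 630720000
print(math.floor(math.log10(all_humanity)))
print(round(log10_all_images))
```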

Page 43: Human abilities Presented By Mahmoud Awadallah 1

Are 32x32 images enough?

Page 44: Human abilities Presented By Mahmoud Awadallah 1

Are 32x32 images enough?

Page 45: Human abilities Presented By Mahmoud Awadallah 1

Are 32x32 images enough?

Page 46: Human abilities Presented By Mahmoud Awadallah 1

Statistics of database of tiny images

Page 47: Human abilities Presented By Mahmoud Awadallah 1

Lots of Images

A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

Page 48: Human abilities Presented By Mahmoud Awadallah 1

Lots of Images

A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

Page 49: Human abilities Presented By Mahmoud Awadallah 1

Lots of Images

Page 50: Human abilities Presented By Mahmoud Awadallah 1

First Attempt

Used SSD++ to find nearest neighbors of query image

• Used first 19 principal components
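
A nearest-neighbor lookup under the sum of squared differences (SSD) can be sketched as below. This is a simplified illustration, not the presenters' implementation: it assumes every image has already been reduced to a short vector of principal-component coefficients (19 on the slide, 3 in the toy data here), and it does not reproduce whatever refinements "SSD++" adds. All names and data are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query, database, k=2):
    """Indices of the k database descriptors with the smallest SSD to the query."""
    order = sorted(range(len(database)), key=lambda i: ssd(query, database[i]))
    return order[:k]


# Toy descriptors standing in for PCA coefficients of tiny images.
database = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 3.0],
]
query = [0.9, 0.1, 0.0]
print(nearest_neighbors(query, database))  # [1, 0]
```

A brute-force scan like this is linear in the database size; working on a few principal components rather than all 3,072 pixel values is what keeps each comparison cheap.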

Page 51: Human abilities Presented By Mahmoud Awadallah 1

SSD says these are not similar

?

Page 52: Human abilities Presented By Mahmoud Awadallah 1

Another similarity measure

Page 53: Human abilities Presented By Mahmoud Awadallah 1

Wordnet Voting Scheme

Ground truth

One image – one vote

Page 54: Human abilities Presented By Mahmoud Awadallah 1

Classification at Multiple Semantic Levels

Votes:

Animal 6

Person 33

Plant 5

Device 3

Administrative 4

Others 22

Votes:

Living 44

Artifact 9

Land 3

Region 7

Others 10
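
The voting idea can be sketched as follows: each retrieved neighbor casts one vote for its own label and for every ancestor of that label, so counts accumulate at coarser semantic levels. The tiny hierarchy below is a hypothetical stand-in for WordNet, and only four of the slide's categories are included, which is why the "living" total matches the slide (6 + 33 + 5 = 44) while the "artifact" total does not.

```python
from collections import Counter

# Hypothetical fragment of a noun hierarchy: child -> parent.
PARENT = {
    "person": "living",
    "animal": "living",
    "plant": "living",
    "device": "artifact",
}


def branch(label):
    """The label followed by all of its ancestors in the hierarchy."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def vote(neighbor_labels):
    """One image, one vote: each neighbor votes for every node on its branch."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(branch(label))
    return votes


# Neighbor labels matching four of the vote counts on the slide.
labels = ["animal"] * 6 + ["person"] * 33 + ["plant"] * 5 + ["device"] * 3
votes = vote(labels)
print(votes["person"], votes["living"], votes["artifact"])  # 33 44 3
```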

Page 56: Human abilities Presented By Mahmoud Awadallah 1

Person Recognition

23% of all images in dataset contain people

Wide range of poses: not just frontal faces

Page 57: Human abilities Presented By Mahmoud Awadallah 1

Person Recognition – Test Set

• 1016 images from Altavista using “person” query

• High res and 32x32 available

• Disjoint from 79 million tiny images

Page 60: Human abilities Presented By Mahmoud Awadallah 1

Person Recognition

Task: person in image or not?

(c) shows the recall-precision curves for all 1018 images gathered from Altavista, and (d) shows curves for the subset of 173 images where people occupy at least 20% of the image.
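
For reference, a recall-precision curve for a "person or not" detector can be computed as in this sketch. It assumes each test image has a confidence score (for example, the share of neighbor votes for "person") and a ground-truth label, and that at least one image is a positive; the function name, scores, and labels are hypothetical toy values, not results from the paper.

```python
def precision_recall_curve(scores, labels):
    """(precision, recall) after each image, ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    curve = []
    true_positives = 0
    for rank, (_, is_person) in enumerate(ranked, start=1):
        true_positives += is_person
        precision = true_positives / rank
        recall = true_positives / total_positives
        curve.append((precision, recall))
    return curve


# Toy data: four images, two of which contain a person (label 1).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
curve = precision_recall_curve(scores, labels)
print(curve[0], curve[-1])  # (1.0, 0.5) (0.5, 1.0)
```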

Page 61: Human abilities Presented By Mahmoud Awadallah 1

Scene classification

yellow = 7,900 image training set; red = 790,000 images; blue = 79,000,000 images

Page 63: Human abilities Presented By Mahmoud Awadallah 1

What If we have Labels…