Grammars in computer vision, presented by Thomas Kollar, slides courtesy of Song-Chun Zhu

Post on 29-Jan-2016


Page 1: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu

Grammars in computer vision

Presented by: Thomas Kollar

Slides courtesy of Song-Chun Zhu

Page 2

Parts

Global appearance

Local context

Global context

Object size

Inside the object(intrinsic features)

Outside the object(contextual features)

Pixels

Kruppa & Schiele (03), Fink & Perona (03)

Carbonetto, Freitas, Barnard (03), Kumar, Hebert (03)

He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99)

Strat & Fischler (91), Torralba (03), Murphy, Torralba & Freeman (03)

Agarwal & Roth (02), Moghaddam, Pentland (97), Turk, Pentland (91), Vidal-Naquet, Ullman (03)

Heisele, et al. (01), Agarwal & Roth (02), Kremp, Geman, Amit (02), Dorko, Schmid (03)

Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona (03), Schneiderman, Kanade (00), Lowe (99), etc.

Context in computer vision

Page 3

Why grammars?

Guzman (SEE), 1968

Noton and Stark, 1971

Hansen & Riseman (VISIONS), 1978

Barrow & Tenenbaum, 1978

Brooks (ACRONYM), 1979

Marr, 1982

Ohta & Kanade, 1978

Yakimovsky & Feldman, 1973

[Ohta & Kanade 1978]

Page 4

Why grammars?

Page 5

Why grammars?

Page 6

Which papers?

F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005.

Z. Xu, A hierarchical compositional model for representation and sketching of high-resolution human images, PhD Thesis, 2007.

S.C. Zhu and D. Mumford, A stochastic grammar of images, 2007.

L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.

Page 7

Datasets

Page 8

Large-scale image labeling

Page 9

Our Goal:

Page 10

Three projects using and-or graphs

1. Modeling an environment with rectangles.

2. Creating sketches

Page 11

Commonalities

Use context-sensitive grammars, called And-Or graphs in these papers.

Provide top-down and bottom-up influence.

Most are generative all the way to the pixel level.

Configuration matters, e.g. they don't assume independence given the parent.

These dependencies can take the form of an MRF.
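As a rough illustration of the And-Or graph idea, the sketch below enumerates the terminal configurations a small graph can generate: an And-node composes all of its children, an Or-node picks exactly one alternative. The `Node` class, the `configurations` function, and the toy "person" grammar are hypothetical names invented for this example, and the sketch leaves out the sibling dependencies that make the grammars context-sensitive.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class Node:
    name: str
    kind: str = "terminal"  # "and", "or", or "terminal"
    children: List["Node"] = field(default_factory=list)


def configurations(node):
    """Enumerate every list of terminals that `node` can generate."""
    if node.kind == "terminal":
        return [[node.name]]
    if node.kind == "or":
        # An Or-node selects exactly one of its alternatives.
        return [cfg for child in node.children for cfg in configurations(child)]
    # An And-node concatenates one configuration from each child, in order.
    parts = [configurations(child) for child in node.children]
    return [sum(combo, []) for combo in product(*parts)]


# Hypothetical toy grammar: person = head AND top; hair and top are Or-nodes.
hair = Node("hair", "or", [Node("short_hair"), Node("long_hair")])
head = Node("head", "and", [hair, Node("face")])
top = Node("top", "or", [Node("shirt"), Node("jacket")])
person = Node("person", "and", [head, top])

configs = configurations(person)
for cfg in configs:
    print(cfg)
```

Two Or-nodes with two alternatives each give four configurations, which hints at how a small graph can cover a combinatorial number of instances.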

Page 12

Challenges

Objects have large within-category variations

Scenes have variation

Page 13

Challenges

Describing people has variation

Page 14

Grammar definition

Page 15

And-or graphs

Page 16

Modeling with rectangles

Page 17

Modeling with rectangles

Page 18

Six production rules

Page 19

Two examples

Page 20

Three phases

1. Bottom-up detection: Compute edge segments and a number of vanishing points. The segments are grouped into line sets by vanishing point, and rectangle hypotheses are found using RANSAC, generating a number of bottom-up rectangle proposals.

2. Initialize the terminal nodes greedily: At each step, pick the most promising hypothesis, i.e. the one with the heaviest weight, measured by the increase in posterior probability.

3. Incorporate top-down influence: Each step of the algorithm picks the most promising proposal among the 5 candidate rules by increase in posterior probability. When a new non-terminal node is accepted: (1) insert it and create a new proposal, (2) reweight the proposals, (3) pass attributes between the node and its parent.

Page 21

Probability Models

$$G^* = \arg\max_{G} \; p(I \mid C(G), C_{free})\, p(G)\, p(C_{free})$$

• p(C_free) follows the primal sketch model.

• p(G) is the probability of the parse tree

• p(I | G) is the reconstruction likelihood
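One way to connect this objective to the greedy steps is to work in log space. This is a sketch under an assumption: the slides describe the weight of a proposal only as an "increase in posterior probability", so writing it as a difference of log terms, and the primed symbols for the state after a move, are illustrative choices rather than the authors' stated definition.

```latex
\log p(G, C_{free} \mid I)
  = \log p(I \mid C(G), C_{free}) + \log p(G) + \log p(C_{free}) + \text{const}

% Assumed form of the weight of a move (G, C_{free}) \to (G', C'_{free}):
\Delta = \big[\log p(I \mid C(G'), C'_{free}) + \log p(G') + \log p(C'_{free})\big]
       - \big[\log p(I \mid C(G), C_{free}) + \log p(G) + \log p(C_{free})\big]
```

Under this reading a move is worth accepting when \Delta > 0, and the constant (the log evidence of the image) cancels in the difference.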

Page 22

Probability Models

$$p(G) = \prod_{A \in V_N(G)} p(l(A))\, p(n(A) \mid l(A))\, p(X(A) \mid l(A), n(A)) \prod_{B \in child(A)} p(X(B) \mid X(A))$$

• p(l) is the probability of a rule, e.g. $p(l(A) = \text{"cube"}) = q$.

• p(n | l) is the probability of the number of components given the type of rule, e.g. $p(n(A) = 3 \mid l(A) = \text{"cube"}) = 1$.

• p(X | l, n) is the probability of the geometry of A, e.g. each square should look reasonable.

• p(X(B) | X(A)) ensures regularities between the geometries (e.g. that aligned rectangles have almost the same shape); e.g. for the line rule, it enforces that everything lines up.
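The factorization of p(G) can be evaluated recursively over a parse tree. The following is a minimal sketch, assuming geometry is reduced to a single width per node and that both geometry terms are Gaussian; the `ParseNode` class, the probability tables `P_RULE` and `P_COUNT`, and all numeric values are invented for illustration and are not taken from the papers.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParseNode:
    rule: str    # l(A); unused for terminal rectangles
    width: float  # X(A), reduced to one number for illustration
    children: List["ParseNode"] = field(default_factory=list)


# Hypothetical tables for p(l) and p(n | l); the values are made up.
P_RULE = {"line": 0.6, "cube": 0.4}
P_COUNT = {("line", 2): 0.5, ("line", 3): 0.5, ("cube", 3): 1.0}


def log_gauss(x, mean, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_p_graph(node):
    """Sum the log factors of p(G) over the non-terminal nodes below `node`."""
    if not node.children:
        return 0.0  # terminals enter only through their parent's child term
    n = len(node.children)
    lp = math.log(P_RULE[node.rule])           # p(l(A))
    lp += math.log(P_COUNT[(node.rule, n)])    # p(n(A) | l(A))
    lp += log_gauss(node.width, 10.0, 5.0)     # p(X(A) | l, n), assumed broad prior
    for child in node.children:
        lp += log_gauss(child.width, node.width, 1.0)  # p(X(B) | X(A))
        lp += log_p_graph(child)
    return lp


def rect(width):
    return ParseNode("rectangle", width)


aligned = ParseNode("line", 10.0, [rect(10.0), rect(10.1), rect(9.9)])
ragged = ParseNode("line", 10.0, [rect(10.0), rect(14.0), rect(6.0)])
print(log_p_graph(aligned), log_p_graph(ragged))
```

With these assumed terms, the tree whose children nearly share the parent's width scores higher than the ragged one, which is the kind of regularity the p(X(B) | X(A)) factor is meant to reward.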

Page 23

Probability Models

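$$p(I \mid C) = \frac{1}{Z} \exp\left\{ -\sum_{k=1}^{N} \sum_{(x,y) \in \Lambda_{sk,k}} \frac{1}{2\sigma^2} \big(I(x,y) - B_k(x,y)\big)^2 - \sum_{m=1}^{M} \big\langle \beta_m, h(I_{\Lambda_{nsk,m}}) \big\rangle \right\}$$

• Primal sketch model: $I(x,y) = B_k(x,y) + n(x,y)$ for $(x,y) \in \Lambda_{sk,k}$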

Page 24

Inference: bottom-up detection of rectangles

• RANSAC is run to propose a number of rectangles using vanishing points
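The slide does not spell out how RANSAC is used, so the following is only a sketch of one related building block: a RANSAC loop that estimates a single vanishing point from line segments. Each iteration intersects two randomly chosen segments (via homogeneous cross products) and counts the segments that point at the intersection. The function names, the 2-degree tolerance, and the synthetic segments are assumptions for illustration; this is not the rectangle-proposal procedure from the paper.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(seg):
    """Homogeneous line through the two endpoints of a segment."""
    (x1, y1), (x2, y2) = seg
    return cross((x1, y1, 1.0), (x2, y2, 1.0))


def angle_to_point(seg, point):
    """Angle (radians) between the segment and the line from its midpoint to `point`."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    vx, vy = point[0] - (x1 + x2) / 2, point[1] - (y1 + y2) / 2
    norm = math.hypot(dx, dy) * math.hypot(vx, vy)
    if norm == 0:
        return 0.0
    return math.acos(min(1.0, abs(dx * vx + dy * vy) / norm))


def ransac_vanishing_point(segments, iters=200, tol=math.radians(2), seed=0):
    rng = random.Random(seed)
    best_vp, best_inliers = None, []
    for _ in range(iters):
        s1, s2 = rng.sample(segments, 2)
        x, y, w = cross(line_through(s1), line_through(s2))
        if abs(w) < 1e-9:
            continue  # parallel lines: point at infinity, skipped in this sketch
        vp = (x / w, y / w)
        inliers = [s for s in segments if angle_to_point(s, vp) < tol]
        if len(inliers) > len(best_inliers):
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers


# Synthetic data: four segments converging on (100, 50) plus two unrelated ones.
vp_true = (100.0, 50.0)
segments = [((vp_true[0] + 30 * math.cos(t), vp_true[1] + 30 * math.sin(t)),
             (vp_true[0] + 80 * math.cos(t), vp_true[1] + 80 * math.sin(t)))
            for t in (0.3, 0.9, 1.5, 2.1)]
segments += [((0.0, 0.0), (10.0, 0.0)), ((0.0, 100.0), (0.0, 120.0))]

vp, inliers = ransac_vanishing_point(segments)
print(vp, len(inliers))
```

The inlier set of each recovered vanishing point corresponds to a line set; pairs of line sets from different vanishing points are the natural raw material for rectangle hypotheses.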

Page 25

Inference: initialize terminal nodes

• Input: candidate set of rectangles from previous phase

• Output: a set of terminal nodes representing rectangles

• While (not done):
  • Re-compute weights
  • Greedily select the rectangle with the highest weight
  • Create a new terminal node in the grammar
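The loop above can be sketched as follows. The weight used here (a bottom-up support score minus a penalty for overlapping already accepted rectangles) is a stand-in assumption for the increase in posterior probability; `weight`, `greedy_initialize`, the penalty value, and the candidate list are all hypothetical.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def weight(candidate, accepted, penalty=10.0):
    """Assumed weight: support minus a penalty for the fraction already explained."""
    rect, support = candidate
    covered = sum(overlap_area(rect, a) for a, _ in accepted) / area(rect)
    return support - penalty * covered


def greedy_initialize(candidates, min_weight=0.0):
    remaining = list(candidates)
    accepted = []
    while remaining:
        # Re-compute weights, since earlier acceptances change them.
        weights = [weight(c, accepted) for c in remaining]
        best = max(range(len(remaining)), key=weights.__getitem__)
        if weights[best] <= min_weight:
            break  # no remaining candidate improves the score
        accepted.append(remaining.pop(best))  # becomes a new node
    return accepted


# Hypothetical candidates: (rectangle, bottom-up support score).
candidates = [((0, 0, 10, 10), 5.0), ((1, 1, 11, 11), 4.0), ((20, 0, 30, 10), 3.0)]
nodes = greedy_initialize(candidates)
print([rect for rect, _ in nodes])
```

Re-computing the weights inside the loop is what stops the second candidate, which mostly duplicates the first, from being accepted even though its initial score is higher than the third's.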

Page 26

Inference: initialize terminal nodes

• Input: terminal rectangle nodes from the previous step

• Output: a parse graph

• While (not done):
  • Re-compute weights
  • Greedily select the highest weight candidate rule
  • Add rule to parse graph along with any top-down predictions.

• Weights are computed similarly to before.
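To make "top-down predictions" concrete, here is a small sketch assuming a hypothetical line rule: once two evenly spaced rectangles have been grouped, the rule extrapolates where the next one should be, and any weak bottom-up candidate near that prediction gets its weight raised. The functions `predict_next` and `boost_weights`, the bonus, and the tolerance are illustrative assumptions, not the papers' procedure.

```python
def predict_next(rects):
    """Extrapolate the next (x, y, w, h) rectangle from the last two in a row."""
    (x0, y0, _, _), (x1, y1, w1, h1) = rects[-2], rects[-1]
    return (x1 + (x1 - x0), y1 + (y1 - y0), w1, h1)


def boost_weights(candidates, weights, predicted, bonus=2.0, tol=1.5):
    """Add `bonus` to every candidate within `tol` of the prediction in each coordinate."""
    boosted = []
    for rect, w in zip(candidates, weights):
        close = all(abs(a - b) <= tol for a, b in zip(rect, predicted))
        boosted.append(w + bonus if close else w)
    return boosted


# Two rectangles already grouped by the (hypothetical) line rule.
accepted = [(0, 0, 8, 8), (10, 0, 8, 8)]
predicted = predict_next(accepted)

# A weak candidate near the prediction and a stronger unrelated one.
candidates = [(20.5, 0.2, 8, 7.8), (40, 5, 3, 3)]
weights = [0.4, 1.0]
boosted = boost_weights(candidates, weights, predicted)
best = max(range(len(candidates)), key=boosted.__getitem__)
print(predicted, boosted, candidates[best])
```

Before the boost the unrelated candidate would be selected; after it, the candidate that agrees with the rule's prediction wins, which is the top-down influence the slide refers to.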

Page 27

Example of top-down/bottom-up inference

Page 28

Results

Page 29

Results

Page 30

Results

Page 31

Results

Page 32

ROC curve

Page 33

Generating sketches

Additional semantics

Page 34

Challenges

Geometric deformations: clothes are very flexible

Photometric variabilities: large variety of colors, shading and texture

Topological configurations: combinatorial number of clothes designs

Page 35

Decomposing a sketch

Page 36

And-Or graph

“In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”

Page 37

Sketch sub-parts

Page 38

Example grammar

Page 39

Sub-templates

Page 40

Probability model

Page 41

Overview of the algorithm

Page 42

Sketch results

Page 43

Sketch results

Page 44

Conclusions

A grammar-based model was presented for generating sketches.

Markov random fields at lowest level.

Top-down/bottom-up inference performed.