
Page 1:

MSRI Program: Mathematical, Computational and Statistical Aspects of Vision

Learning and Inference in Low and Mid Level Vision, Feb 21-25, 2005

Is there a simple Statistical Model of Generic Natural Images?

David Mumford

Page 2:

Outline of talk

1. What are we trying to do: the role of modeling, the analogy with language.

2. The scaling property of images and its implications: model I

3. High kurtosis as the universal clue to discrete structure: models IIa and IIb

4. The ‘phonemes’ of images

5. Syntax: objects+parts, gestalt grouping, segmentation

6. PCFG’s, random branching trees, models IIIa and IIIb

7. Two implementations: Zhu et al., Sharon et al.

8. Parse graphs: the problem of context sensitivity.

Page 3:

Part 1: What is the role of a model?

• How much of the complexities of reality can be usefully abstracted in a precise mathematical model?

• Early examples: Ibn al-Haytham, Galileo

• Recent examples: Navier-Stokes, Vapnik and PAC learning

• The model and the algorithm are two different things. Don’t judge a model by one slow implementation!

Analogy with language, 4 levels:

• Phonology ↔ Pixel statistics
• Syntax ↔ Grouping rules
• Semantics ↔ Object recognition
• Pragmatics ↔ Robotic applications

Page 4:

Part 2: Scaling properties of image statistics

“Renormalization fixed point” means that in a 2N×2N image, the marginal statistics of an N×N subimage and of the N×N image obtained by 2×2 block-averaging should be the same.

In the continuum limit, if a random image means a sample from a probability measure μ on the space of Schwartz distributions D′(R²), then scale-invariant means

(T_σ)_* μ = μ for all σ > 0, where (T_σ I)(x, y) = I(σx, σy), i.e. ⟨T_σ I, f⟩ = ⟨I, σ⁻² f(σ⁻¹x, σ⁻¹y)⟩.
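A minimal numpy sketch of this renormalization check, not from the talk: `block_average` and `dx_log_histogram` are hypothetical helper names, and the random array is only a placeholder for real natural-image data.

```python
import numpy as np

def block_average(img, k=2):
    """The 'renormalization' step: average k x k blocks, taking a 2N x 2N image to N x N."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def dx_log_histogram(img, bins):
    """Log-frequency histogram of horizontal derivatives (the marginal statistic on the next slide)."""
    dx = np.diff(img, axis=1).ravel()
    hist, _ = np.histogram(dx, bins=bins, density=True)
    return np.log(hist + 1e-12)

# For a scale-invariant ensemble these curves should coincide (up to noise) across
# dyadic scales once averaged over many natural images; a random array will not show this.
img = np.random.rand(512, 512)          # placeholder: substitute real natural images
bins = np.linspace(-0.5, 0.5, 65)
for level in range(5):
    print(level, np.round(dx_log_histogram(img, bins)[30:35], 2))
    img = block_average(img)
```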

Page 5:

Evidence of scaling in images: horizontal derivative of images from 2 different databases at 5 dyadic scales.

Vertical axis is log of frequency; dotted lines = 1 standard deviation.

Page 6:

The sources of scaling

The distance from the camera or eye to the object is random. When a scene is viewed nearer, all objects enlarge; further away, they shrink (modulo perspective corrections). Note blueberry pickers in the image above.

Page 7:

A less obvious root of scaling

• On the left, a woman and dog on a dirt road in a rural scene; on the right, enlargement of a patch.

• Note the dog is the same size as the texture patches in the dirt road; and in the enlargement, windows and shrub branches retreat into ‘noise’.

• There is a continuum from objects to texture elements to noise of unresolved details

Page 8:

Gaussian colored noise, spectrum 1/f² (looks like typical cumulus clouds). This image is not a measurable function!!

Left: 1/f⁴ noise, a true function. Right: white noise.
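Such fields can be synthesized in the Fourier domain; here is a hedged sketch (the function name, seed, and 256-pixel size are my choices, not from the slides).

```python
import numpy as np

def gaussian_power_law_noise(n, alpha, seed=0):
    """Sample an n x n Gaussian field with power spectrum ~ 1/f^alpha.
    alpha = 0: white noise; alpha = 2: the scale-invariant 'cloud' case; alpha = 4: a smooth true function."""
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n)[:, None]
    fy = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                                   # avoid division by zero at the DC term
    amplitude = f ** (-alpha / 2.0)                 # power ~ amplitude^2 ~ 1/f^alpha
    amplitude[0, 0] = 0.0                           # zero-mean field
    spectrum = amplitude * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    field = np.fft.ifft2(spectrum).real
    return (field - field.mean()) / field.std()

images = {alpha: gaussian_power_law_noise(256, alpha) for alpha in (0, 2, 4)}
```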

Page 9:

Part 3: Another basic statistical property -- high kurtosis

• Essentially all real-valued signals from nature have kurtosis (= μ₄/σ⁴) greater than 3 (their Gaussian value).

• Explanation I: the signal is a mixture of Gaussians with multiple variances (‘heteroscedastic’ to Wall Streeters). Thus random zero-mean 8×8 filter statistics have kurtosis > 3, and this disappears if the images are locally contrast normalized.

• Explanation II: a Markov stochastic process with i.i.d. increments always has values X_t with kurtosis ≥ 3, and if it is > 3, the process has discrete jumps. (Such variables are called infinitely divisible.) Thus high kurtosis is a signal of discrete events/objects in nature.
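A small numerical illustration of Explanation I (all parameters are mine; the lognormal ‘local contrast’ stands in for the fluctuating variance of real image patches):

```python
import numpy as np

def kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2      # equals 3 for a Gaussian

rng = np.random.default_rng(0)

# A Gaussian whose variance itself fluctuates (a 'mixture of Gaussians with
# multiple variances') always has kurtosis above the Gaussian value of 3 ...
local_contrast = rng.lognormal(mean=0.0, sigma=0.7, size=100_000)
signal = local_contrast * rng.normal(size=local_contrast.shape)

print(kurtosis(rng.normal(size=100_000)))        # ~ 3.0  (pure Gaussian)
print(kurtosis(signal))                          # well above 3 (heavy tails)

# ... and dividing out the local scale ('contrast normalization') removes the effect.
print(kurtosis(signal / local_contrast))         # ~ 3.0 again
```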

Page 10:

Huge tails are not needed for a process to have jumps: generate a stochastic process with gamma increments.
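A minimal simulation of this point, with parameter choices that are mine rather than the slide's: gamma increments have light, exponential-type tails, yet the resulting process moves entirely by jumps and its increments have kurtosis far above 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, dt = 2000, 1.0 / 2000

# Gamma process: independent Gamma(shape=dt, scale=1) increments. The increments have
# light tails, yet almost every sample path increases through many small discrete jumps.
gamma_increments = rng.gamma(shape=dt, scale=1.0, size=n_steps)
gamma_path = np.cumsum(gamma_increments)

# Brownian motion for comparison: Gaussian increments, kurtosis exactly 3, no jumps.
brownian_path = np.cumsum(rng.normal(scale=np.sqrt(dt), size=n_steps))

def kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2

print("kurtosis of gamma increments:   ", kurtosis(gamma_increments))          # >> 3
print("kurtosis of gaussian increments:", kurtosis(rng.normal(size=n_steps)))  # ~ 3
```

Plotting `gamma_path` next to `brownian_path` makes the jumps visible to the eye.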

Page 11:

The Lévy-Khintchine theorem for images

• If a random ‘natural’ image I is a vector-valued infinitely divisible variable, then Lévy-Khintchine applies, so

I = I_gauss + Σ_k I_k,   where {I_k} is a Poisson process sampled from the Lévy measure.

The I_k were called ‘textons’ by Julesz, and are the elements of Marr’s ‘primal sketch’.

What can we say about these elementary constituents of images?

Edges, bars, blobs, corners, T-junctions?

Seeking the textons experimentally in 2×2 and 3×3 patches: Ann Lee, Huang, Zhu, Malik, …

Page 12:

Lévy-Khintchine leads to the next level of image modeling

• Random natural images have translation and scale-invariant statistics

• This means the primitive objects should be ‘random wavelets’ (first formula below):

• Must worry about UV and IR limits – but it works.

• A complication: occlusion. This makes images extremely non-Markovian and leads to the ‘Dead Leaves’ (Matheron, Serra) or ‘random collage’ model (Ann Lee) (second formula below):

Random wavelet model:   I(x, y) = Σ_k I_k( e^{r_k}(x − x_k), e^{r_k}(y − y_k) ),   where the I_k are size- and position-normalized primitives.

Random collage / dead leaves model:   I(x, y) = I_k( e^{r_k}(x − x_k), e^{r_k}(y − y_k) ),   where k is the component covering (x, y) closest to the observer.
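A toy synthesis sketch of both variants, not from the talk: it seeds scale space with the Poisson density dx dy dr / r³ used later on page 22, and uses Gaussian bumps as primitives purely for illustration; the function names and every parameter are assumptions.

```python
import numpy as np

def sample_scale_space_poisson(n_points, r_min=0.02, r_max=0.5, rng=None):
    """Sample points (x, y, r) on the unit square with density proportional to dx dy dr / r^3
    (inverse-CDF sampling in r; the 1/r^3 density is what makes the ensemble scale-invariant)."""
    rng = rng or np.random.default_rng(0)
    u = rng.random(n_points)
    r = (r_min**-2 - u * (r_min**-2 - r_max**-2)) ** -0.5
    return rng.random(n_points), rng.random(n_points), r

def render(n_points=400, size=256, dead_leaves=False, rng=None):
    """Render a toy image: additive 'random wavelet' sum, or occluding 'dead leaves' collage."""
    rng = rng or np.random.default_rng(0)
    x, y, r = sample_scale_space_poisson(n_points, rng=rng)
    amp = rng.normal(size=n_points)                 # random contrast for each primitive
    xs, ys = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size))
    img = np.zeros((size, size))
    for k in np.argsort(-r):                        # paint large to small; an arbitrary depth order
        bump = np.exp(-((xs - x[k])**2 + (ys - y[k])**2) / (2 * r[k]**2))
        if dead_leaves:
            img[bump > 0.5] = amp[k]                # later (nearer) primitives occlude what is behind
        else:
            img += amp[k] * bump                    # transparent superposition
    return img

wavelet_img = render(dead_leaves=False)
collage_img = render(dead_leaves=True)
```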

Page 13:

4 random wavelet images with different primitives

Page 14:

Part 4: In search of the primitives: equi-probable contours in the joint histograms of adjacent wavelet pairs

(filters from E. Simoncelli, calculation by J. Huang)

Bottom left: horizontally adjacent horizontal filters. The diagonal corner illustrates the likelihood of contour continuation; the rounder corners on the x- and y- axes are line endings.

Bottom right: horizontally adjacent vertical filters. The anti-diagonal elongation comes from bars giving a contrast reversal; the rounded corners on the axes come from edges making one filter respond but not the other.

These cannot be produced by products of an independent scalar factor and a Gaussian.
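A hedged sketch of how such joint histograms can be computed, using simple horizontal-derivative filters as a stand-in for the Simoncelli wavelet filters; the equi-probable contours are then level sets of the returned log-histogram.

```python
import numpy as np

def joint_log_histogram(img, offset=2, bins=64):
    """Joint log-histogram of two horizontally adjacent horizontal-derivative responses,
    a crude stand-in for the adjacent wavelet-filter pairs on this slide."""
    dx = np.diff(img.astype(float), axis=1)          # horizontal derivative filter
    a = dx[:, :-offset].ravel()                      # response at one location
    b = dx[:, offset:].ravel()                       # response 'offset' pixels to the right
    lim = 4 * dx.std()
    hist, xedges, yedges = np.histogram2d(a, b, bins=bins,
                                          range=[[-lim, lim], [-lim, lim]])
    return np.log(hist + 1.0), xedges, yedges

# Equi-probable contours: e.g. matplotlib's plt.contour(log_hist.T). For real natural
# images they show the diagonal 'continuation' corner and the rounded 'line ending'
# corners described above; a random array shows nearly elliptical contours instead.
img = np.random.rand(512, 512)                       # placeholder; use a natural image
log_hist, xedges, yedges = joint_log_histogram(img)
```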

Page 15:

Image ‘phonemes’ obtained by k-means clustering on 8×8 image patches (from J. Huang)
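A minimal sketch of this kind of clustering, assuming scikit-learn is available; the normalization steps and the choice of 64 clusters are my guesses at reasonable defaults, not the settings used by Huang.

```python
import numpy as np
from sklearn.cluster import KMeans

def patch_phonemes(images, n_clusters=64, patch=8, per_image=2000, rng=None):
    """Cluster contrast-normalized 8x8 patches; the cluster centers play the role
    of the image 'phonemes' shown on this slide."""
    rng = rng or np.random.default_rng(0)
    patches = []
    for img in images:
        ys = rng.integers(0, img.shape[0] - patch, per_image)
        xs = rng.integers(0, img.shape[1] - patch, per_image)
        for y, x in zip(ys, xs):
            p = img[y:y + patch, x:x + patch].astype(float).ravel()
            p -= p.mean()                            # remove mean brightness
            norm = np.linalg.norm(p)
            if norm > 1e-6:
                patches.append(p / norm)             # normalize contrast
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(np.array(patches))
    return km.cluster_centers_.reshape(n_clusters, patch, patch)

centers = patch_phonemes([np.random.rand(256, 256) for _ in range(4)])  # placeholder images
```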

Page 16:

Ann Lee’s study of 3×3 patches

Take all 3×3 patches, normalize the mean to 0, take those with the top 20% contrast, and normalize contrast: the result is a datapoint on S⁷.

In this 7-sphere, perfect edges form a surface E. Plot the volume in tubular neighborhoods of E and the proportion of datapoints in them.

There is a huge concentration around E, with asymptotically infinite density.
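A sketch of the preprocessing that puts each patch on S⁷ (the density comparison around the edge surface E is not shown); note the original study measures contrast with the D-norm, while this sketch uses the Euclidean norm as a simplification.

```python
import numpy as np

def patches_on_seven_sphere(img, contrast_quantile=0.8):
    """Map 3x3 patches to points on the 7-sphere: subtract the mean (dropping one
    dimension), keep the top 20% by contrast, then normalize contrast to 1."""
    img = img.astype(float)
    h, w = img.shape
    # gather all 3x3 patches as rows of a (num_patches, 9) array
    rows = [img[y:y + 3, x:x + 3].ravel() for y in range(h - 2) for x in range(w - 2)]
    p = np.array(rows)
    p -= p.mean(axis=1, keepdims=True)               # mean 0: points now live in an 8-dim subspace
    contrast = np.linalg.norm(p, axis=1)
    keep = contrast >= np.quantile(contrast, contrast_quantile)
    return p[keep] / contrast[keep, None]            # unit norm: a point on S^7

points = patches_on_seven_sphere(np.random.rand(128, 128))   # placeholder image
```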

Page 17:

Part 5: Grouping laws and the syntax of images

• Our models are too homogeneous. Natural-world scenes break up into more homogeneously colored/textured parts with discontinuities between them: this is segmentation.

• Is segmentation well-defined? YES if you let it be hierarchical and not one fixed domain decomposition, allowing for the scaling of images.

Page 18:

3 images from the Malik database, each with 3 human segmentations


Page 19:

Segmentation of images is sometimes obvious, sometimes not: clutter is a consequence of scaling.

The classic mandrill: it segments unambiguously into eyes, nose, fleshy cheeks, whiskers, fur.

My own favorite for total clutter: an image of a log driver in the spring timber run. Logs do not form a consistent texture, background trees have contrast reversal, and the snow matches the white water.

Page 20:

The gestalt rules of grouping (Metzger, Wertheimer, Kanizsa, …)

Elements of images are linked on the basis of:

• Proximity

• Similar color/texture (these are the factors used in segmentation)

• Good continuation (this is studied as contour completion)

• Parallelism

• Symmetry

• Convexity

Reconstructed edges and objects can be amodal as well as modal:

Page 21:

Part 6: Random branching trees

• Start at the root. Each node decides to have k children with probability p_k, where Σ_k p_k = 1. Continue indefinitely or until there are no more children.

• λ = Σ_k k·p_k is the expected number of children; if λ ≤ 1, the tree is almost surely finite; if λ > 1, it is infinite with positive probability (see the sketch after this list).

• Can put labels, from some finite set L, on the nodes and make a labelled tree from a probability distribution which assigns probabilities to a label ℓ having k children with labels ℓ₁, …, ℓ_k.

• This is identical to what linguists call PCFGs (= probabilistic context-free grammars). For them, L is the set of attributed phrases (e.g. ‘singular feminine noun phrases’) plus the lexicon (which can have no children), and the tree is assumed almost surely to be finite.
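A quick simulation of the λ ≤ 1 vs. λ > 1 dichotomy (the offspring distributions and the node cap used as a proxy for ‘infinite’ are arbitrary choices):

```python
import numpy as np

def sample_tree_size(p, rng, max_nodes=100_000):
    """Sample one random branching (Galton-Watson) tree with offspring distribution p
    (p[k] = probability of k children); return its node count, or None if it hits the cap."""
    frontier, nodes = 1, 1
    while frontier > 0:
        children = rng.choice(len(p), size=frontier, p=p).sum()
        nodes += children
        frontier = children
        if nodes > max_nodes:
            return None
    return nodes

rng = np.random.default_rng(0)
subcritical = np.array([0.3, 0.5, 0.2])    # lambda = 0.9 <= 1 : almost surely finite
supercritical = np.array([0.2, 0.4, 0.4])  # lambda = 1.2 > 1 : infinite with positive probability
for p in (subcritical, supercritical):
    sizes = [sample_tree_size(p, rng) for _ in range(200)]
    print("lambda =", sum(k * pk for k, pk in enumerate(p)),
          " fraction exceeding cap:", np.mean([s is None for s in sizes]))
```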

Page 22:

Graft random branching trees into the random wavelet model

1. Seed scale space with a Poisson process (x_i, y_i, r_i) with density C·dx dy dr / r³.

2. Let each node grow a tree with growth rate < (r/r₀)².

3. Put primitives on seeds, e.g. face, tree, road. Each passes attributes to its children, e.g. eyes, trunk, car.
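A toy sketch of this grafting, not from the talk: the seeding reuses the 1/r³ density from step 1, and the offspring rate 2·min(1, (r/r₀)²) is only my guess at the ‘(r/r₀)²’ growth rate; every name and number here is an assumption.

```python
import numpy as np

def grow_scene_graph(n_seeds=20, r0=0.5, rng=None, max_nodes=500):
    """Graft branching trees onto scale-space seeds: each node at scale r spawns
    Poisson(2 * min(1, (r / r0) ** 2)) children at smaller scales near it."""
    rng = rng or np.random.default_rng(0)
    # step 1: seeds (x, y, r), with r sampled with density ~ 1/r^3 on [0.05, r0]
    u = rng.random(n_seeds)
    r = (0.05**-2 - u * (0.05**-2 - r0**-2)) ** -0.5
    nodes = [(rng.random(), rng.random(), ri, None) for ri in r]   # (x, y, r, parent index)
    frontier = list(range(len(nodes)))
    # step 2: each node grows children that are smaller and nearby, until scales are tiny
    while frontier and len(nodes) < max_nodes:
        i = frontier.pop()
        x, y, ri, _ = nodes[i]
        for _ in range(rng.poisson(2.0 * min(1.0, (ri / r0) ** 2))):
            child = (x + rng.normal(scale=ri / 2), y + rng.normal(scale=ri / 2),
                     ri * rng.uniform(0.3, 0.7), i)
            if child[2] > 0.01:
                nodes.append(child)
                frontier.append(len(nodes) - 1)
    return nodes   # step 3 would attach labels/primitives (face -> eyes, ...) to these nodes

scene = grow_scene_graph()
```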

Page 23:

Part 7a: Zhu’s algorithms: top-down synthesis and bottom-up tree construction

Page 24:

The nodes of the tree correspond to subsets of the image domain; edges show when one region is a subset of the other

Page 25:

Part 7b: AMG algorithm of Galun, Sharon, Basri and Brandt

The segmentation of a shell by progressive grouping: the colored regions at each level form the nodes of the grid at that level, and the grouping follows weights obtained by aggregating statistics from below.

Page 26:

A second example of AMG

Page 27:

Part 8: The next frontier -- context-sensitive grammars

Children of children need to share information, i.e. context. This can be done by giving more and more attributes to each node to pass down, but this gets absurd after a while. Here are two examples: a face and a sentence of a 2½-year-old.

Page 28:

‘Unification’ grammar, Shieber and Geman

Parsing the concept of a square. Parts of the square belong to more than one intermediate-level grouping. We no longer have a tree.

‘Compositional’ grammars (Geman-Bienenstock) are a beginning of a stochastic version of this construction.