Latent Variable / Hierarchical Models in Computational
Neural Science
Ying Nian Wu, UCLA Department of Statistics
March 30, 2011
Outline
• Latent variable models in statistics
• Primary visual cortex (V1)
• Modeling and learning in V1
• Layered hierarchical models
• Joint work with Song-Chun Zhu and Zhangzhang Si
Latent variable models
• Hidden variables; observed variables
• Examples: mixture model, factor analysis
• Learning: maximum likelihood, via EM/gradient
• Inference / explaining away: E-step / imputation
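As a concrete instance of the learning/inference pairing above, here is a minimal sketch (my illustration, not from the talk) of EM for a two-component 1-D Gaussian mixture: the E-step imputes the hidden labels, the M-step does weighted maximum likelihood.

```python
# Minimal sketch (my illustration, not from the talk): EM for a
# two-component 1-D Gaussian mixture, the textbook latent-variable
# example. E-step imputes hidden labels; M-step is weighted MLE.
import numpy as np

def em_gmm(y, n_iter=50):
    mu = np.array([y.min(), y.max()])          # crude initialization
    sigma, pi = np.ones(2), np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior p(z = k | y) for each observation (imputation)
        dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / sigma
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates
        nk = r.sum(axis=0)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(y)
    return mu, sigma, pi

y = np.r_[np.random.normal(0, 1, 300), np.random.normal(4, 1, 200)]
mu, sigma, pi = em_gmm(y)
```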
Computational neural science
• Z: internal representation by neurons (hidden)
• Y: sensory data from the outside environment (observed)
• Parameters: connection weights
• Hierarchical extension: model Z itself by another layer of hidden variables, which explain Z just as Z explains Y
• Inference / explaining away
[Figure omitted; source: Scientific American, 1999]
Visual cortex: layered hierarchical architecture
V1 (primary visual cortex): simple cells and complex cells
bottom-up/top-down
Simple V1 cells (Daugman, 1985)
Gabor wavelets: localized sine and cosine waves,
$$G(x_1, x_2) = \exp\left\{-\frac{1}{2}\left[\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right]\right\} e^{i x_1}$$
Translation, rotation, and dilation of the above function.
Image pixels → V1 simple cells: the wavelets $B_{x,s,\alpha}$ respond to edges,
$$\langle I, B_{x,s,\alpha} \rangle = \sum_{x'} I(x') \, B_{x,s,\alpha}(x')$$
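A minimal sketch (not the speaker's code) of how such simple-cell responses might be computed: a Gabor filter bank built from the Daugman form above, convolved with the image. Function names and parameter values (make_gabor, sigma1, sigma2, size) are illustrative assumptions.

```python
# Minimal sketch: V1 simple-cell responses <I, B_{x,s,alpha}> via a
# Gabor filter bank. All names/parameters are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve

def make_gabor(size=17, scale=1.0, alpha=0.0, sigma1=3.0, sigma2=5.0):
    """Gabor pair: Gaussian envelope times cosine and sine waves."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotate coordinates by orientation alpha, then dilate by scale
    x1 = (x * np.cos(alpha) + y * np.sin(alpha)) / scale
    x2 = (-x * np.sin(alpha) + y * np.cos(alpha)) / scale
    envelope = np.exp(-0.5 * (x1 ** 2 / sigma1 ** 2 + x2 ** 2 / sigma2 ** 2))
    return envelope * np.cos(x1), envelope * np.sin(x1)  # even, odd parts

def simple_cell_responses(image, scales, orientations):
    """Complex-valued <I, B_{x,s,alpha}> at every pixel x."""
    responses = {}
    for s in scales:
        for a in orientations:
            even, odd = make_gabor(scale=s, alpha=a)
            responses[(s, a)] = (fftconvolve(image, even, mode='same')
                                 + 1j * fftconvolve(image, odd, mode='same'))
    return responses

image = np.random.rand(64, 64)  # stand-in for a real image
sum1 = simple_cell_responses(image, scales=[1.0, 2.0],
                             orientations=np.linspace(0.0, np.pi, 4, endpoint=False))
```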
Complex V1 cells (Riesenhuber and Poggio, 1999)
$$\max_{x' \in A(x)} \left| \langle I, B_{x',s,\alpha} \rangle \right|^2$$
Image pixels → V1 simple cells (local sum) → V1 complex cells (local max).
• Larger receptive field
• Less sensitive to deformation
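A matching sketch of the complex-cell stage: local max pooling of squared simple-cell responses, in the spirit of the equation above. The pooling window size is an arbitrary assumption.

```python
# Minimal sketch: complex-cell responses as a local maximum of squared
# simple-cell responses (Riesenhuber-Poggio style pooling).
import numpy as np
from scipy.ndimage import maximum_filter

def complex_cell_responses(simple_response, pool=5):
    """max over a neighborhood A(x) of |<I, B_{x',s,alpha}>|^2."""
    energy = np.abs(simple_response) ** 2
    # larger receptive field, less sensitive to deformation
    return maximum_filter(energy, size=pool)

sum1_map = np.random.randn(64, 64) + 1j * np.random.randn(64, 64)
max1_map = complex_cell_responses(sum1_map)
```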
Independent Component Analysis (Bell and Sejnowski, 1996)
$$I = c_1 B_1 + \cdots + c_N B_N = BC, \qquad c_i \sim p(c) \text{ independently}, \; i = 1, \ldots, N, \qquad N = \dim(I)$$
$$C = B^{-1} I = A I$$
For multiple images: $I_m = c_{m,1} B_1 + \cdots + c_{m,N} B_N = B C_m$, so $C_m = B^{-1} I_m = A I_m$.
p(c): Laplacian/Cauchy (Hyvärinen, 2000).
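A hedged sketch of ICA in practice: scikit-learn's FastICA (Hyvärinen's algorithm) standing in for the Bell-Sejnowski infomax rule, run on random stand-in "patches". Shapes and component counts are arbitrary.

```python
# Hedged sketch: estimating filters A = B^{-1} so that C = A I has
# independent components; FastICA stands in for the infomax rule, and
# the "patches" are random stand-ins for whitened image patches.
import numpy as np
from sklearn.decomposition import FastICA

patches = np.random.randn(5000, 144)      # 5000 patches of 12x12, as rows
ica = FastICA(n_components=64, max_iter=500)
C = ica.fit_transform(patches)            # sparse/independent components c_i
B = ica.mixing_                           # columns approximate basis functions B_i
```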
Sparse coding Olshausen and Field, 1996
Laplacian/Cauchy/mixture Gaussians
Nicpci ,...,1tly independen )(~
NNBcBcI ...11
mNNmmm BcBcI ,11, ...)dim(IN
Inference: sparsification, non-linear lasso/basis pursuit/matching pursuit mode and uncertainty of p(C|I) explaining-away, lateral inhibition
Nicpci ,...,1tly independen )(~
Sparse coding / variable selection
Learning: mNNmmm BcBcI ,11, ...
)dim(IN
A dictionary of representational elements (regressors)
NNBcBcI ...11
Olshausen and Field, 1996
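The inference step can be illustrated with plain matching pursuit (Mallat, Zhang, 93): pick the element with the largest response, subtract its contribution from the residual (explaining away), and repeat. A minimal sketch with a random dictionary; all sizes are arbitrary.

```python
# Minimal sketch of matching pursuit for inferring a sparse code
# I = c_1 B_1 + ... + c_n B_n + residual from an over-complete
# dictionary; selecting an element explains away its share of the image.
import numpy as np

def matching_pursuit(I, dictionary, n_elements):
    """dictionary: columns are unit-norm basis elements B_i."""
    residual = I.astype(float).copy()
    code = np.zeros(dictionary.shape[1])
    for _ in range(n_elements):
        scores = dictionary.T @ residual          # <residual, B_i>
        i = int(np.argmax(np.abs(scores)))        # best-explaining element
        code[i] += scores[i]
        residual -= scores[i] * dictionary[:, i]  # explain away / inhibit
    return code, residual

D = np.random.randn(64, 256)
D /= np.linalg.norm(D, axis=0)                    # unit-norm columns
I = np.random.randn(64)
code, residual = matching_pursuit(I, D, n_elements=10)
```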
Restricted Boltzmann Machine (Hinton, Osindero and Teh, 2006)
$$p(H, V) = \frac{1}{Z(W)} \exp\Big\{ \sum_{i,j} W_{i,j} h_i v_j \Big\}$$
H: hidden, binary, $h_i$, $i = 1, \ldots, N$; V: visible.
P(V|H) and P(H|V): factorized, no explaining-away.
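Because both conditionals factorize, block Gibbs sampling alternates between the two layers with no explaining-away. A minimal sketch (binary units, logistic conditionals; sizes and weights are stand-ins):

```python
# Minimal sketch: one block-Gibbs step in an RBM. p(H|V) and p(V|H)
# both factorize, so each layer is sampled in parallel.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(20, 64))    # W[i, j] couples h_i and v_j

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W):
    p_h = sigmoid(W @ v)                    # p(h_i = 1 | V): factorized
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(W.T @ h)                  # p(v_j = 1 | H): factorized
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new

v = (rng.random(64) < 0.5).astype(float)
h, v = gibbs_step(v, W)
```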
Energy-based model (Teh, Welling, Osindero and Hinton, 2003)
$$p_B(I) = \frac{1}{Z} \exp\Big\{ \sum_i \lambda_i \big( \langle I, B_i \rangle \big) \Big\}$$
Features, no explaining-away.
Maximum entropy with marginals; exponential family with sufficient statistics:
$$p(I) = \frac{1}{Z(\Lambda)} \exp\Big\{ \sum_{x,s,\alpha} \lambda_{s,\alpha} \big( \langle I, B_{x,s,\alpha} \rangle \big) \Big\}$$
Markov random field / Gibbs distribution (Zhu, Wu, and Mumford, 1997; Wu, Liu, and Zhu, 2000).
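Fitting such a maximum-entropy model matches model feature statistics to observed ones by gradient ascent on the lambda's. A minimal sketch under strong simplifications: the "sufficient statistic" is a mean absolute filter response, and the MCMC samples the real algorithm needs are stubbed with random arrays.

```python
# Minimal sketch of maximum-entropy (FRAME-style) learning: raise
# lambda where the model under-predicts the observed statistic.
# The MCMC sampler is stubbed; everything here is a toy stand-in.
import numpy as np

def feature_stats(images, filters):
    """Mean |<I, B>| per filter, averaged over images."""
    return np.array([[abs(f.ravel() @ im.ravel()) for f in filters]
                     for im in images]).mean(axis=0)

def maxent_update(lmbda, observed_stats, model_samples, filters, lr=0.1):
    """One gradient-ascent step on the exponential-family log-likelihood."""
    return lmbda + lr * (observed_stats - feature_stats(model_samples, filters))

filters = [np.random.randn(8, 8) for _ in range(4)]
data = [np.random.randn(8, 8) for _ in range(20)]     # observed images (toy)
samples = [np.random.randn(8, 8) for _ in range(20)]  # MCMC samples (stubbed)
lmbda = maxent_update(np.zeros(4), feature_stats(data, filters), samples, filters)
```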
[Figure omitted; source: Scientific American, 1999]
Visual cortex: layered hierarchical architecture; bottom-up/top-down.
What is beyond V1? A hierarchical model?
Hierarchical ICA / energy-based model? Larger features; must introduce nonlinearities; purely bottom-up.

Hierarchical RBM (Hinton, Osindero and Teh, 2006)
Stacked layers: I → H → V → V′.
P(V, H) = P(H) P(V|H); the prior P(H) is in turn modeled by another layer, P(V′, H).
Discriminative correction by back-propagation: unfolding, untying, re-learning.
Hierarchical sparse coding
$$I = c_1 B_1 + \cdots + c_N B_N, \qquad B_{x,s,\alpha}$$
Attributed sparse coding elements: transformation group, topological neighborhood system.
$$I = \sum_{i=1}^{n} c_i B_{x_i, s_i, \alpha_i} + U$$
Layer above: further coding of the attributes of the selected sparse coding elements.
Active basis model (Wu, Si, Gong, Zhu, 10; Zhu, Guo, Wang, Xu, 05; cf. Yuille, Hallinan, Cohen, 92)
n-stroke template: n = 40 to 60, box = 100×100.
Simplicity:
• Simplest AND-OR graph (Pearl, 84; Zhu, Mumford, 06): AND composition and OR perturbations or variations of basis elements
• Simplest shape model: average + residual
• Simplest modification of the Olshausen-Field model
• Further sparse coding of attributes of sparse coding elements
Bottom layer: sketch against texture
Only need to pool a marginal q(c) as the null hypothesis:
• natural images: explicit q(I) of Zhu, Mumford, 97
• this image: explicit q(I) of Zhu, Wu, Mumford, 97
Maximum entropy (Della Pietra, Della Pietra, Lafferty, 97; Zhu, Wu, Mumford, 97; Jin, S. Geman, 06; Wu, Guo, Zhu, 08). Special case: density substitution (Friedman, 87; Jin, S. Geman, 06).
$$p(C, U) = p(C)\, p(U|C) = p(C)\, q(U|C) = p(C)\, q(U, C)/q(C)$$
Shared sketch algorithm: maximum likelihood learning
Prototype: shared matching pursuit (closed-form computation); a sketch follows below.
• Step 1: two maxes to explain images by maximum likelihood; no early decision on edge detection
• Step 2: arg-max for inferring hidden variables
• Step 3: arg-max explains away and thus inhibits (matching pursuit; Mallat, Zhang, 93)
Finding n strokes to sketch M images simultaneously: n = 60, M = 9.
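A minimal rendering of the shared-matching-pursuit idea (my sketch, not the authors' code): at each step pick the basis element whose pooled response, summed over all M images, is largest, then inhibit overlapping elements in every image. The overlap neighborhood here is a toy assumption.

```python
# My sketch of shared matching pursuit: sum responses over images,
# select the max element, then explain away / inhibit overlapping
# elements in every image. Sizes and the overlap rule are toys.
import numpy as np

def shared_matching_pursuit(responses, n_strokes, inhibit):
    """responses: M x K array of MAX1-pooled |<I_m, B_k>|^2 values;
    inhibit(k): indices of elements overlapping element k."""
    responses = responses.copy()
    selected = []
    for _ in range(n_strokes):
        k = int(np.argmax(responses.sum(axis=0)))  # sum over images, max over elements
        selected.append(k)
        responses[:, inhibit(k)] = 0.0             # explain away in all images
    return selected

M, K = 9, 1000
resp = np.abs(np.random.randn(M, K)) ** 2
overlap = lambda k: [j for j in range(K) if abs(j - k) < 5]  # toy neighborhood
template = shared_matching_pursuit(resp, n_strokes=60, inhibit=overlap)
```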
Bottom-up sum-max scoring (no early edge decision)
Top-down arg-max sketching
1. Reinterpreting MAX1: OR-node of AND-OR; MAX stands for ARG-MAX in the max-product algorithm.
2. Stick to the Olshausen-Field sparse top-down model: AND-node of AND-OR. Active basis, SUM2 layer: "neurons" memorize shapes by sparse connections to the MAX1 layer.
Hierarchical, recursive AND-OR / SUM-MAX.
Architecture: more top-down than bottom-up.
Neurons: more representational than operational (OR-neurons/AND-neurons).
Cortex-like sum-max maps: maximum likelihood inference
SUM1 layer: simple V1 cells of Olshausen, Field, 96. MAX1 layer: complex V1 cells of Riesenhuber, Poggio, 99.
Scan over multiple resolutions.
Bottom-up detection; top-down sketching.
Layers: SUM1 → MAX1 → SUM2 → arg-MAX1.
Sparse selective connection as a result of learning; explaining-away in learning but not in inference.
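A minimal sketch of the SUM2 scoring, assuming SUM1 maps are given and MAX1 is a local maximum filter; a "template" is a list of sparse elements (x, y, s, alpha, weight). All names are illustrative, not the authors' code.

```python
# Minimal sketch of SUM1 -> MAX1 -> SUM2 scoring. SUM2 is the weighted
# sum of MAX1 responses at the template's sparse element locations.
import numpy as np
from scipy.ndimage import maximum_filter

def sum2_score(sum1_maps, template, pool=5):
    """sum1_maps: dict (s, alpha) -> SUM1 response map."""
    max1 = {k: maximum_filter(np.abs(v) ** 2, size=pool)
            for k, v in sum1_maps.items()}         # MAX1: local max pooling
    return sum(w * max1[(s, a)][y, x] for (x, y, s, a, w) in template)

sum1_maps = {(1, 0.0): np.random.rand(64, 64)}     # one scale, one orientation
template = [(10, 12, 1, 0.0, 0.8), (30, 40, 1, 0.0, 1.1)]
score = sum2_score(sum1_maps, template)
```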
Bottom-up scoring and top-down sketching
Scan over multiple resolutions and orientations (rotating template)
Classification based on log likelihood ratio score (cf. Freund, Schapire, 95; Viola, Jones, 04).
Adjusting the Active Basis Model by L2-Regularized Logistic Regression (by Ruixun Zhang)
• Exponential family model with q(I) negatives: logistic regression for p(class | image), partial likelihood
• Generative learning without negative examples: basis elements and hidden variables
• Discriminative adjustment with hugely reduced dimensionality: correcting the conditional independence assumption
L2 regularized logistic regression: re-estimated lambda's.
Conditional on (1) selected basis elements and (2) inferred hidden variables; both (1) and (2) come from generative learning.
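A hedged sketch of the discriminative adjustment: ordinary L2-regularized logistic regression (scikit-learn) on the responses of the already-selected basis elements; the feature matrices, sample counts, and regularization strength are stand-ins.

```python
# Hedged sketch: re-estimating the lambda weights by L2-regularized
# logistic regression on the selected elements' responses, with hidden
# variables already inferred by arg-max. All data here are stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_pos = np.random.rand(50, 40)    # MAX1 responses of 40 selected elements (positives)
X_neg = np.random.rand(200, 40)   # same features on negative examples
X = np.vstack([X_pos, X_neg])
y = np.r_[np.ones(50), np.zeros(200)]

clf = LogisticRegression(penalty='l2', C=1.0)  # C = inverse regularization strength
clf.fit(X, y)
adjusted_lambdas = clf.coef_.ravel()           # re-estimated lambda's
```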
Active basis templates vs. Adaboost templates
# of negatives: 10556, 7510, 4552, 1493, 12217
Active basis:
• Arg-max inference and explaining away; no reweighting
• Residual images neutralize existing elements; same set of training examples
Adaboost:
• No arg-max inference or explaining-away inhibition
• Reweighted examples neutralize existing classifiers; changing set of examples
Compared at double the # of elements and at the same # of elements.
Mixture model of active basis templates, fitted by EM/maximum likelihood with random initialization.
MNIST, 500 images total.
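A minimal sketch of the EM loop for a mixture of templates, with active-basis learning and SUM2 scoring stubbed out by toy stand-ins (weighted mean as the "template", negative squared distance as the "score"); random initialization as on the slide.

```python
# Minimal sketch of EM for a mixture of templates. learn_template and
# score stand in for active-basis learning and SUM2 scoring.
import numpy as np

def em_mixture(images, K, learn_template, score, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.random((len(images), K))              # random initialization
    assign /= assign.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: re-learn each template from its softly assigned images
        templates = [learn_template(images, assign[:, k]) for k in range(K)]
        # E-step: soft assignments from the (log-likelihood) scores
        scores = np.array([[score(im, t) for t in templates] for im in images])
        scores -= scores.max(axis=1, keepdims=True)
        assign = np.exp(scores)
        assign /= assign.sum(axis=1, keepdims=True)
    return templates, assign

imgs = [np.random.rand(16) for _ in range(30)]
learn = lambda ims, w: np.average(ims, axis=0, weights=w + 1e-8)
score = lambda im, t: -np.sum((im - t) ** 2)
templates, assign = em_mixture(imgs, K=3, learn_template=learn, score=score)
```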
Learning active basis models from non-aligned images
EM-type maximum likelihood learning, initialized by single-image learning.
Hierarchical active basis, by Zhangzhang Si et al.
• And-OR graph: Pearl, 84; Zhu, Mumford, 06
• Compositionality and reusability: Geman, Potter, Chi, 02; L. Zhu, Lin, Huang, Chen, Yuille, 08
• Part-based methods: everyone et al.
• Latent SVM: Felzenszwalb, McAllester, Ramanan, 08
• Constellation model: Weber, Welling, Perona, 00
[Figure: examples ranked from low log-likelihood to high log-likelihood]
Simplicity
• Simplest and purest recursive two-layer AND-OR graph
• Simplest generalization of the active basis model
AND-OR graph and SUM-MAX maps: maximum likelihood inference
Cortex-like, related to Riesenhuber, Poggio, 99:
• Bottom-up sum-max scoring
• Top-down arg-max sketching
Hierarchical active basis by Zhangzhang Si et al.
Shape script by composing active basis shape motifs
Representing elementary geometric shapes (shape motifs) by active bases (Si, Wu, 10). Geometry = sketch that can be parametrized.
$$I = \sum_{i=1}^{n} c_i B_{x_i, s_i, \alpha_i} + U$$
Shape motif $k$: $(x_{k,i}, \alpha_{k,i},\; i = 1, \ldots, n)$, instantiated as the observed $(x_i, \alpha_i,\; i = 1, \ldots, n)$.
Summary
Bottom layer: Olshausen-Field (foreground) + Zhu-Wu-Mumford (background).
Maximum entropy tilting (Della Pietra, Della Pietra, Lafferty, 97): white noise → texture (high entropy) → sketch (low and mid entropy), reversing the central limit theorem effect of information scaling.
Build up layers:
(1) AND-OR, SUM-MAX (top-down arg-MAX)
(2) Perpetual sparse coding: further coding of attributes of the current sparse coding elements
(a) residuals of attributes: continuous OR-nodes
(b) mixture model: discrete OR-nodes