
MURI Projects: Cortical mechanisms of integration and inference. Biomimetic models uniting cortical architectures with graphical/statistical models




TRANSCRIPT

MURI Projects: Cortical Mechanisms of Integration and Inference
Biomimetic models uniting cortical architectures with graphical/statistical models
- Application of the Bayesian Hypercolumn model to target recognition in HSI (hyperspectral imagery)
- Image statistics of hyperspectral and spatiotemporal imagery
- Transition of models to MR spectroscopy for biomedical applications
Participants: University of Pennsylvania, Columbia University, MIT (Leif Finkel, Kwabena Boahen, Diego Contreras, Paul Sajda, Josh Zeevi, Ted Adelson, Yair Weiss)

The Fundamental Problem: Integration
- Bottom-up and top-down integration
- Integration of multiple cues (contour, surface, texture, color, motion, depth): horizontal integration
- Probabilistic models offer a unifying approach to integration (spatiotemporal, spatiospectral)

How Does the Brain Do It? The Cortical Hypercolumn
- The fundamental module for processing a localized patch (~2° of visual angle) of the visual field
- Contains the neural machinery needed to construct a statistical description (i.e., a multivariate PDF across orientation, scale, wavelength, disparity, velocity, etc.)

The Generalized Aperture Problem: Capturing Non-local Dependencies (i.e., Context)
[Figure: array of hypercolumns h1-h8]
- Possible mechanism for capturing non-local dependency structure: long-range cortico-cortical connections (Bosking et al., 1997)
- Statistical properties of natural images are consistent with such a mechanism (Geisler et al., 2001)

Approach: Bayesian Hypercolumn Network
- The Bayesian hypercolumn as a canonical unit in biological visual processing
- Bridging Bayesian networks and cortical processing
- A hypercolumn architecture for computing target salience
- A Bayesian network model for capturing contextual cues: applications to target classification, synthesis, and compression

[Figure: contour salience computation (Shaashua & Ullman, 1988)]
[Figure: orientation pinwheels in visual cortex (Shmuel & Grinvald, 2000)]
[Figure: anatomical connectivity in striate cortex, vertical/horizontal meridians (Bosking et al., 1997, J. Neurosci.)]
[Figure: physiology/psychophysics and natural image statistics of co-circularity (Geisler et al., Vision Research 41, 2001; Sigman et al., PNAS 98, 2001)]
[Figure: contour salience (R. Hess & D. Field, 1999, Trends in Cognitive Sciences)]
[Figure: intracellular in vivo physiological recordings (D. Contreras & L. Palmer, unpublished data)]

A Hypercolumn-Based Model for Estimating Co-Circularity
- Detect a match between a local and a distant hypercolumn
- A hypercolumn receives matched inputs from multiple other hypercolumns
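The slides do not spell out the co-circularity measure itself, but the standard constraint (two oriented elements are tangent to a common circle when theta1 + theta2 = 2*phi mod pi, where phi is the orientation of the chord joining them) suggests a minimal sketch of the kind of pairwise affinity a pair of hypercolumns might compute. The function names and falloff parameters below are illustrative assumptions, not the model's actual form:

```python
import numpy as np

def ang_diff(a, b, period=np.pi):
    """Smallest difference between two angles, modulo `period`."""
    d = (a - b) % period
    return min(d, period - d)

def cocircularity_affinity(p1, theta1, p2, theta2,
                           sigma_ang=0.3, sigma_dist=40.0):
    """Soft co-circularity score between two oriented edge elements.

    Elements at p1, p2 with orientations theta1, theta2 are tangent to
    a common circle when theta1 + theta2 = 2*phi (mod pi), phi being
    the orientation of the chord joining them.  The score decays with
    deviation from that constraint and with separation distance.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    phi = np.arctan2(dy, dx)                    # chord orientation
    dev = ang_diff(theta1 + theta2, 2.0 * phi)  # constraint violation
    dist = np.hypot(dx, dy)
    return (np.exp(-dev**2 / (2 * sigma_ang**2))
            * np.exp(-dist**2 / (2 * sigma_dist**2)))

# A collinear pair (a degenerate circle) scores near the distance ceiling:
print(cocircularity_affinity((0, 0), 0.0, (30, 0), 0.0))
```

In the slides' scheme, a high affinity between a local and a distant hypercolumn would count as a "match"; a hypercolumn receiving matched inputs from several others signals membership in a common contour.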
Transition to Chattering Behavior (D. McCormick)
- Multiple matches cause synchronization of chattering bursts, which detects a clique of connected hypercolumns
- Same chattering frequency: synchronizes; different frequencies: don't synchronize

[Figure: hypercolumn-based co-circularity measure (Shaashua & Ullman, 1988)]

A Bayesian Network Model for Capturing Contextual Cues: Applications to Target Classification, Synthesis and Compression

Problem: Integrating Multi-scale Features for Object Recognition/Detection
- Detecting small objects having few features
- Discriminating large objects having subtle differences
- The aim is to do this within a machine learning framework

Analogous Problems in Medical Imaging
- Anatomical and physiological context: breast cancers tend to be highly vascularized
- Context provided by multiple modalities: leakage seen in a fluorescein image can provide insight into the clinical significance of drusen in a fundus photo

Generative Probability Models
- Statistical pattern recognizers are important components of Automatic Target Recognition (ATR) and Computer-Aided Detection (CAD) systems.
- Most are trained as discriminative models: they model Pr(C | I), where C = class and I = image.
- However, there are advantages to generative models, which model Pr(I | C) or Pr(I).
[Figure: discriminative vs. generative decision boundaries]
- By applying Bayes' rule, generative models can be used for classification: Pr(C | I) = Pr(I | C) Pr(C) / Pr(I)

Utility of a Generative Model
- Novelty detection: compute the absolute (rather than relative) value of Pr(I | C) to detect images very different from those used to construct the model; this gives a confidence measure on the output of the ATR/CAD system
- Synthesis: by sampling Pr(I | C) we can generate new images for class C, giving insight into the image structure captured by the model
- Compression: knowing Pr(I | C) gives the optimal code for compressing the image (object-optimized compression)
- Also noise suppression, segmentation, etc.
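To make the Bayes'-rule classification and the novelty-detection confidence measure concrete, here is a minimal sketch; the log-likelihood inputs stand in for trained generative models of Pr(I | C) (e.g., the mass and non-mass models described below), and the novelty threshold is an assumed free parameter:

```python
import numpy as np

def posterior_positive(loglik_pos, loglik_neg, prior_pos=0.5):
    """Bayes' rule, Pr(C|I) = Pr(I|C) Pr(C) / Pr(I), in log space.

    loglik_pos/loglik_neg are log Pr(I|C) under the two generative
    models; Pr(I) is the normalizer over the two classes.
    """
    a = loglik_pos + np.log(prior_pos)
    b = loglik_neg + np.log(1.0 - prior_pos)
    return 1.0 / (1.0 + np.exp(b - a))

def is_novel(loglik_pos, loglik_neg, threshold):
    """Novelty detection: an image that is unlikely under *both* class
    models is unlike the training data, so the classifier's output
    should be assigned low confidence."""
    return max(loglik_pos, loglik_neg) < threshold

# Example with made-up log-likelihoods:
print(posterior_positive(-950.0, -980.0))           # ~1.0 -> "positive"
print(is_novel(-950.0, -980.0, threshold=-1500))    # False: in-distribution
print(is_novel(-2100.0, -2150.0, threshold=-1500))  # True: flag as novel
```

Working in log space avoids underflow, since image likelihoods are products over many positions and are astronomically small in raw form.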
The Hierarchical Image Probability (HIP) Model
- Coarse-to-fine conditional dependence
- Short-range (relative to pyramid level) dependencies captured by modeling the distribution of feature vectors
- Longer-range dependencies captured through a set of hidden variables
- Probabilities factored over position to make the model tractable

Coarse-to-fine Conditional Dependence
- The pyramid divides image structure into scales
- Finer scales are conditioned on coarser scales (i.e., objects contain parts, which contain sub-parts, etc.)

Factoring Across Scale: Models of Pr(G_l | I_{l+1})
- Factor over position to make the computations tractable
- Hidden variables (A) are needed to capture non-local dependencies
- Assume F_{l+1} and A carry the relevant information of I_{l+1}, where A and its dependencies are arbitrary

Capturing Long-range Dependencies (Context) with Hidden Variables
- Coarse-to-fine conditioning alone does not make dependencies local
- If a large area of I_{l+1} implies object class A, and class A implies a certain texture in I_l, then local structure in I_l depends on non-local information in I_{l+1}
- If I_{l+1} implies an object class which in turn implies a texture over the region of the object, but I_{l+1} contains no information for differentiating object classes A and B, then distant patches are mutually dependent

Tree Structure of Hidden Variables
- Choose a tree structure for the hidden variables/labels: a belief network or HMM on a tree (in 1-D and 2-D, labels a_{l+2}, a_{l+1}, a_l across levels)
- Hidden labels can be thought of as a learned segmentation of the image

One Model for Pr(G_l | I_{l+1})
- Choose a local integer label a_l at each position x in G_l, with coarse-to-fine conditioning of a_l on a_{l+1} to form the dependency tree
- Train the model using the Expectation-Maximization (EM) algorithm

Structure of the HIP Model
[Figure: variables a, g, f across levels l, l+1, l+2; a local label is the analog of a hypercolumn, and the label-to-label conditioning is the analog of long-range cortico-cortical connections]

Example: X-Ray Mammography, Dataset and Training
- Regions of interest (ROIs) provided by Dr. Maryellen Giger of the University of Chicago (UofC)
- ROIs represent outputs from the UofC CAD system for mass detection
- 72 positive and 96 negative ROIs; half of the data used for training, half for testing
- Two HIP models trained: masses (positives) and non-masses (negatives)
- Architecture chosen using the minimum description length (MDL) criterion, with the number of labels bounded above at 17
- Best architecture: 17, 17, 11, 2, and 1 hidden labels in levels 0-4, respectively

Mass Detection
- False positives reduced by 25% without loss in sensitivity

Novelty Detection
- Novelty detection establishes a confidence measure for the detector
[Figure: likelihood distributions for positive and negative cases]

Image Synthesis
- Synthesized images can be used to develop intuition about how well the model represents the data
[Figure: ROI image synthesized from the positive model; ROI images synthesized from the negative model]

Compression
[Figure: original vs. JPEG vs. HIP compression]

Results on Aerial Imagery
- A_z(HIP) = 0.87 vs. A_z(HPNN) = 0.86
- %correct(HIP) = 85% vs. %correct(D/V) = 78%
[Figure: example images, classification, synthesis, compression, and hidden-variable probabilities for labels 1 and 2]
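For intuition about how a factored, coarse-to-fine likelihood of the HIP type can be evaluated, here is a heavily simplified sketch. It uses raw Gaussian-pyramid pixels in place of the model's feature vectors f and g, and an unconditional label mixture per level in place of the coarse-to-fine label tree that HIP actually learns with EM; all parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import zoom

def gaussian_pyramid(img, levels):
    """Finest-to-coarsest pyramid via repeated 2x downsampling."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(zoom(pyr[-1], 0.5, order=1))
    return pyr

def log_likelihood(img, params, levels=4):
    """Factored log Pr(I): a sum over pyramid levels l and positions x
    of log sum_a Pr(g_l(x) | a) Pr(a), with a Gaussian observation
    model per hidden label a.  The full HIP model instead conditions
    a_l on its parent a_{l+1}, forming the dependency tree."""
    total = 0.0
    for lvl, g in enumerate(gaussian_pyramid(img, levels)):
        mu, var, pi = params[lvl]  # per-label means, variances, weights
        # log N(g | mu_a, var_a) + log pi_a for every label a, position x
        ll = (-0.5 * (g[..., None] - mu) ** 2 / var
              - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        # marginalize the label at each position, then sum over positions
        total += np.logaddexp.reduce(ll, axis=-1).sum()
    return total

# Toy usage: a random image and a two-label mixture at each of 4 levels.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
params = [(np.array([-1.0, 1.0]), np.array([1.0, 1.0]),
           np.array([0.5, 0.5])) for _ in range(4)]
print(log_likelihood(img, params))
```

Because the likelihood factors over position and scale, the cost is linear in the number of pixels across the pyramid, which is what makes training two such models (masses and non-masses) and comparing their log-likelihoods per ROI practical.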