
COMPUTER VISION

Dana H. Ballard
Christopher M. Brown

Department of Computer Science
University of Rochester
Rochester, New York

Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

Library of Congress Cataloging in Publication Data

Ballard, Dana Harry.
  Computer vision.
  Bibliography: p.
  Includes index.
  1. Image processing. I. Brown, Christopher M. II. Title.
  TA1632.B34  621.38'0414  81-20974
  ISBN 0-13-165316-4  AACR2

Cover design by Robin Breite

© 1982 by Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher.

Printed in the United States of America 10 9 8 7 6 5 4 3 2

ISBN 0-13-165316-4

Prentice-Hall International, Inc., London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall of Canada, Ltd., Toronto
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Prentice-Hall of Southeast Asia Pte. Ltd., Singapore
Whitehall Books Limited, Wellington, New Zealand

Contents

Preface xiii

Acknowledgments xv

Mnemonics for Proceedings and Special Collections Cited in References xix

1 COMPUTER VISION 1
1.1 Achieving Simple Vision Goals 1
1.2 High-Level and Low-Level Capabilities 2
1.3 A Range of Representations 6
1.4 The Role of Computers 9
1.5 Computer Vision Research and Applications 12

Part I GENERALIZED IMAGES 13

2 IMAGE FORMATION 17
2.1 Images 17
2.2 Image Model 18
  2.2.1 Image Functions, 18
  2.2.2 Imaging Geometry, 19
  2.2.3 Reflectance, 22
  2.2.4 Spatial Properties, 24
  2.2.5 Color, 31
  2.2.6 Digital Images, 35
2.3 Imaging Devices for Computer Vision 42
  2.3.1 Photographic Imaging, 44
  2.3.2 Sensing Range, 52
  2.3.3 Reconstruction Imaging, 56

3 EARLY PROCESSING 63
3.1 Recovering Intrinsic Structure 63
3.2 Filtering the Image 65
  3.2.1 Template Matching, 65
  3.2.2 Histogram Transformations, 70
  3.2.3 Background Subtraction, 72
  3.2.4 Filtering and Reflectance Models, 73
3.3 Finding Local Edges 75
  3.3.1 Types of Edge Operators, 76
  3.3.2 Edge Thresholding Strategies, 80
  3.3.3 Three-Dimensional Edge Operators, 81
  3.3.4 How Good Are Edge Operators? 83
  3.3.5 Edge Relaxation, 85
3.4 Range Information from Geometry 88
  3.4.1 Stereo Vision and Triangulation, 88
  3.4.2 A Relaxation Algorithm for Stereo, 89
3.5 Surface Orientation from Reflectance Models 93
  3.5.1 Reflectivity Functions, 93
  3.5.2 Surface Gradient, 95
  3.5.3 Photometric Stereo, 98
  3.5.4 Shape from Shading by Relaxation, 99
3.6 Optical Flow 102
  3.6.1 The Fundamental Flow Constraint, 102
  3.6.2 Calculating Optical Flow by Relaxation, 103
3.7 Resolution Pyramids 106
  3.7.1 Gray-Level Consolidation, 106
  3.7.2 Pyramidal Structures in Correlation, 107
  3.7.3 Pyramidal Structures in Edge Detection, 109

Part II SEGMENTED IMAGES 115

4 BOUNDARY DETECTION 119
4.1 On Associating Edge Elements 119
4.2 Searching Near an Approximate Location 121
  4.2.1 Adjusting A Priori Boundaries, 121
  4.2.2 Non-Linear Correlation in Edge Space, 121
  4.2.3 Divide-and-Conquer Boundary Detection, 122
4.3 The Hough Method for Curve Detection 123
  4.3.1 Use of the Gradient, 124
  4.3.2 Some Examples, 125
  4.3.3 Trading Off Work in Parameter Space for Work in Image Space, 126
  4.3.4 Generalizing the Hough Transform, 128
4.4 Edge Following as Graph Searching 131
  4.4.1 Good Evaluation Functions, 133
  4.4.2 Finding All the Boundaries, 133
  4.4.3 Alternatives to the A Algorithm, 136
4.5 Edge Following as Dynamic Programming 137
  4.5.1 Dynamic Programming, 137
  4.5.2 Dynamic Programming for Images, 139
  4.5.3 Lower Resolution Evaluation Functions, 141
  4.5.4 Theoretical Questions about Dynamic Programming, 143
4.6 Contour Following 143
  4.6.1 Extension to Gray-Level Images, 144
  4.6.2 Generalization to Higher-Dimensional Image Data, 146

5 REGION GROWING 149
5.1 Regions 149
5.2 A Local Technique: Blob Coloring 151
5.3 Global Techniques: Region Growing via Thresholding 152
  5.3.1 Thresholding in Multidimensional Space, 153
  5.3.2 Hierarchical Refinement, 155
5.4 Splitting and Merging 155
  5.4.1 State-Space Approach to Region Growing, 157
  5.4.2 Low-Level Boundary Data Structures, 158
  5.4.3 Graph-Oriented Region Structures, 159
5.5 Incorporation of Semantics 160

6 TEXTURE 166
6.1 What Is Texture? 166
6.2 Texture Primitives 169
6.3 Structural Models of Texel Placement 170
  6.3.1 Grammatical Models, 172
  6.3.2 Shape Grammars, 173
  6.3.3 Tree Grammars, 175
  6.3.4 Array Grammars, 178
6.4 Texture as a Pattern Recognition Problem 181
  6.4.1 Texture Energy, 184
  6.4.2 Spatial Gray-Level Dependence, 186
  6.4.3 Region Texels, 188
6.5 The Texture Gradient 189

7 MOTION 195
7.1 Motion Understanding 195
  7.1.1 Domain-Independent Understanding, 196
  7.1.2 Domain-Dependent Understanding, 196
7.2 Understanding Optical Flow 199
  7.2.1 Focus of Expansion, 199
  7.2.2 Adjacency, Depth, and Collision, 201
  7.2.3 Surface Orientation and Edge Detection, 202
  7.2.4 Egomotion, 206
7.3 Understanding Image Sequences 207
  7.3.1 Calculating Flow from Discrete Images, 207
  7.3.2 Rigid Bodies from Motion, 210
  7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach, 214
  7.3.4 Human Motion Understanding: A Model-Directed Approach, 217
  7.3.5 Segmented Images, 220

Part III GEOMETRICAL STRUCTURES 227

8 REPRESENTATION OF TWO-DIMENSIONAL GEOMETRIC STRUCTURES 231
8.1 Two-Dimensional Geometric Structures 231
8.2 Boundary Representations 232
  8.2.1 Polylines, 232
  8.2.2 Chain Codes, 235
  8.2.3 The ψ-s Curve, 237
  8.2.4 Fourier Descriptors, 238
  8.2.5 Conic Sections, 239
  8.2.6 B-Splines, 239
  8.2.7 Strip Trees, 244
8.3 Region Representations 247
  8.3.1 Spatial Occupancy Array, 247
  8.3.2 y-Axis, 248
  8.3.3 Quad Trees, 249
  8.3.4 Medial Axis Transform, 252
  8.3.5 Decomposing Complex Areas, 253
8.4 Simple Shape Properties 254
  8.4.1 Area, 254
  8.4.2 Eccentricity, 255
  8.4.3 Euler Number, 255
  8.4.4 Compactness, 256
  8.4.5 Slope Density Function, 256
  8.4.6 Signatures, 257
  8.4.7 Concavity Tree, 258
  8.4.8 Shape Numbers, 258

9 REPRESENTATION OF THREE-DIMENSIONAL STRUCTURES 264
9.1 Solids and Their Representation 264
9.2 Surface Representations 265
  9.2.1 Surfaces with Faces, 265
  9.2.2 Surfaces Based on Splines, 268
  9.2.3 Surfaces That Are Functions on the Sphere, 270
9.3 Generalized Cylinder Representations 274
  9.3.1 Generalized Cylinder Coordinate Systems and Properties, 275
  9.3.2 Extracting Generalized Cylinders, 278
  9.3.3 A Discrete Volumetric Version of the Skeleton, 279
9.4 Volumetric Representations 280
  9.4.1 Spatial Occupancy, 280
  9.4.2 Cell Decomposition, 281
  9.4.3 Constructive Solid Geometry, 282
  9.4.4 Algorithms for Solid Representations, 284
9.5 Understanding Line Drawings 291
  9.5.1 Matching Line Drawings to Three-Dimensional Primitives, 293
  9.5.2 Grouping Regions Into Bodies, 294
  9.5.3 Labeling Lines, 296
  9.5.4 Reasoning About Planes, 301

Part IV RELATIONAL STRUCTURES 313

10 KNOWLEDGE REPRESENTATION AND USE 317
10.1 Representations 317
  10.1.1 The Knowledge Base: Models and Processes, 318
  10.1.2 Analogical and Propositional Representations, 319
  10.1.3 Procedural Knowledge, 321
  10.1.4 Computer Implementations, 322
10.2 Semantic Nets 323
  10.2.1 Semantic Net Basics, 323
  10.2.2 Semantic Nets for Inference, 327
10.3 Semantic Net Examples 334
  10.3.1 Frame Implementations, 334
  10.3.2 Location Networks, 335
10.4 Control Issues in Complex Vision Systems 340
  10.4.1 Parallel and Serial Computation, 341
  10.4.2 Hierarchical and Heterarchical Control, 341
  10.4.3 Belief Maintenance and Goal Achievement, 346

11 MATCHING 352
11.1 Aspects of Matching 352
  11.1.1 Interpretation: Construction, Matching, and Labeling, 352
  11.1.2 Matching Iconic, Geometric, and Relational Structures, 353
11.2 Graph-Theoretic Algorithms 355
  11.2.1 The Algorithms, 357
  11.2.2 Complexity, 359
11.3 Implementing Graph-Theoretic Algorithms 360
  11.3.1 Matching Metrics, 360
  11.3.2 Backtrack Search, 363
  11.3.3 Association Graph Techniques, 365
11.4 Matching in Practice 369
  11.4.1 Decision Trees, 370
  11.4.2 Decision Tree and Subgraph Isomorphism, 375
  11.4.3 Informal Feature Classification, 376
  11.4.4 A Complex Matcher, 378

12 INFERENCE 383
12.1 First-Order Predicate Calculus 384
  12.1.1 Clause-Form Syntax (Informal), 384
  12.1.2 Nonclausal Syntax and Logic Semantics (Informal), 385
  12.1.3 Converting Nonclausal Form to Clauses, 387
  12.1.4 Theorem Proving, 388
  12.1.5 Predicate Calculus and Semantic Networks, 390
  12.1.6 Predicate Calculus and Knowledge Representation, 392
12.2 Computer Reasoning 395
12.3 Production Systems 396
  12.3.1 Production System Details, 398
  12.3.2 Pattern Matching, 399
  12.3.3 An Example, 401
  12.3.4 Production System Pros and Cons, 406
12.4 Scene Labeling and Constraint Relaxation 408
  12.4.1 Consistent and Optimal Labelings, 408
  12.4.2 Discrete Labeling Algorithms, 410
  12.4.3 A Linear Relaxation Operator and a Line Labeling Example, 415
  12.4.4 A Nonlinear Operator, 419
  12.4.5 Relaxation as Linear Programming, 420
12.5 Active Knowledge 430
  12.5.1 Hypotheses, 431
  12.5.2 HOW-TO and SO-WHAT Processes, 431
  12.5.3 Control Primitives, 431
  12.5.4 Aspects of Active Knowledge, 433

13 GOAL ACHIEVEMENT 438
13.1 Symbolic Planning 439
  13.1.1 Representing the World, 439
  13.1.2 Representing Actions, 441
  13.1.3 Stacking Blocks, 442
  13.1.4 The Frame Problem, 444
13.2 Planning with Costs 445
  13.2.1 Planning, Scoring, and Their Interaction, 446
  13.2.2 Scoring Simple Plans, 446
  13.2.3 Scoring Enhanced Plans, 451
  13.2.4 Practical Simplifications, 452
  13.2.5 A Vision System Based on Planning, 453

APPENDICES 465

A1 SOME MATHEMATICAL TOOLS 465
A1.1 Coordinate Systems 465
  A1.1.1 Cartesian, 465
  A1.1.2 Polar and Polar Space, 465
  A1.1.3 Spherical and Cylindrical, 466
  A1.1.4 Homogeneous Coordinates, 467
A1.2 Trigonometry 468
  A1.2.1 Plane Trigonometry, 468
  A1.2.2 Spherical Trigonometry, 469
A1.3 Vectors 469
A1.4 Matrices 471
A1.5 Lines 474
  A1.5.1 Two Points, 474
  A1.5.2 Point and Direction, 474
  A1.5.3 Slope and Intercept, 474
  A1.5.4 Ratios, 474
  A1.5.5 Normal and Distance from Origin (Line Equation), 475
  A1.5.6 Parametric, 476
A1.6 Planes 476
A1.7 Geometric Transformations 477
  A1.7.1 Rotation, 477
  A1.7.2 Scaling, 478
  A1.7.3 Skewing, 479
  A1.7.4 Translation, 479
  A1.7.5 Perspective, 479
  A1.7.6 Transforming Lines and Planes, 480
  A1.7.7 Summary, 480
A1.8 Camera Calibration and Inverse Perspective 481
  A1.8.1 Camera Calibration, 482
  A1.8.2 Inverse Perspective, 483
A1.9 Least-Squared-Error Fitting 484
  A1.9.1 Pseudo-Inverse Method, 485
  A1.9.2 Principal Axis Method, 486
  A1.9.3 Fitting Curves by the Pseudo-Inverse Method, 487
A1.10 Conics 488
A1.11 Interpolation 489
  A1.11.1 One-Dimensional, 489
  A1.11.2 Two-Dimensional, 490
A1.12 The Fast Fourier Transform 490
A1.13 The Icosahedron 492
A1.14 Root Finding 493

A2 ADVANCED CONTROL MECHANISMS 497
A2.1 Standard Control Structures 498
  A2.1.1 Recursion, 498
  A2.1.2 Co-Routining, 498
A2.2 Inherently Sequential Mechanisms 499
  A2.2.1 Automatic Backtracking, 499
  A2.2.2 Context Switching, 500
A2.3 Sequential or Parallel Mechanisms 500
  A2.3.1 Modules and Messages, 500
  A2.3.2 Priority Job Queue, 502
  A2.3.3 Pattern-Directed Invocation, 504
  A2.3.4 Blackboard Systems, 505

AUTHOR INDEX

SUBJECT INDEX

Preface

The dream of intelligent automata goes back to antiquity; its first major articulation in the context of digital computers was by Turing around 1950. Since then, this dream has been pursued primarily by workers in the field of artificial intelligence, whose goal is to endow computers with information-processing capabilities comparable to those of biological organisms. From the outset, one of the goals of artificial intelligence has been to equip machines with the capability of dealing with sensory inputs.

Computer vision is the construction of explicit, meaningful descriptions of physical objects from images. Image understanding is very different from image processing, which studies image-to-image transformations, not explicit description building. Descriptions are a prerequisite for recognizing, manipulating, and thinking about objects.

We perceive a world of coherent three-dimensional objects with many invariant properties. Objectively, the incoming visual data do not exhibit corresponding coherence or invariance; they contain much irrelevant or even misleading variation. Somehow our visual system, from the retinal to cognitive levels, understands, or imposes order on, chaotic visual input. It does so by using intrinsic information that may reliably be extracted from the input, and also through assumptions and knowledge that are applied at various levels in visual processing.

The challenge of computer vision is one of explicitness. Exactly what information about scenes can be extracted from an image using only very basic assumptions about physics and optics? Explicitly, what computations must be performed? Then, at what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and knowledge represented and used? This book is about the representations and mechanisms that allow image information and prior knowledge to interact in image understanding.

Computer vision is a relatively new and fast-growing field. The first experiments were conducted in the late 1950s, and many of the essential concepts

have been developed during the last five years. With this rapid growth, crucial ideas have arisen in disparate areas such as artificial intelligence, psychology, computer graphics, and image processing. Our intent is to assemble a selection of this material in a form that will serve both as a senior/graduate-level academic text and as a useful reference to those building vision systems. This book has a strong artificial intelligence flavor, and we hope this will provoke thought. We believe that both the intrinsic image information and the internal model of the world are important in successful vision systems.

The book is organized into four parts, based on descriptions of objects at four different levels of abstraction.

1. Generalized images: images and image-like entities.
2. Segmented images: images organized into subimages that are likely to correspond to "interesting objects."
3. Geometric structures: quantitative models of image and world structures.
4. Relational structures: complex symbolic descriptions of image and world structures.

The parts follow a progression of increasing abstractness. Although the four parts are most naturally studied in succession, they are not tightly interdependent. Part I is a prerequisite for Part II, but Parts III and IV can be read independently.

Parts of the book assume some mathematical and computing background (calculus, linear algebra, data structures, numerical methods). However, throughout the book mathematical rigor takes a back seat to concepts. Our intent is to transmit a set of ideas about a new field to the widest possible audience.

In one book it is impossible to do justice to the scope and depth of prior work in computer vision. Further, we realize that in a fast-developing field, the rapid influx of new ideas will continue. We hope that our readers will be challenged to think, criticize, read further, and quickly go beyond the confines of this volume.


Acknowledgments

Jerry Feldman and Herb Voelcker (and through them the University of Rochester) provided many resources for this work. One of the most important was a capable and forgiving staff (secretarial, technical, and administrative). For massive text editing, valuable advice, and good humor we are especially grateful to Rose Peet. Peggy Meeker, Jill Orioli, and Beth Zimmerman all helped at various stages.

Several colleagues made suggestions on early drafts: thanks to James Allen, Norm Badler, Larry Davis, Takeo Kanade, John Kender, Daryl Lawton, Joseph O'Rourke, Ari Requicha, Ed Riseman, Azriel Rosenfeld, Mike Schneier, Ken Sloan, Steve Tanimoto, Marty Tenenbaum, and Steve Zucker.

Graduate students helped in many different ways: thanks especially to Michel Denber, Alan Frisch, Lydia Hrechanyk, Mark Kahrs, Keith Lantz, Joe Maleson, Lee Moore, Mark Peairs, Don Perlis, Rick Rashid, Dan Russell, Dan Sabbah, Bob Schudy, Peter Selfridge, Uri Shani, and Bob Tilove. Bernhard Stuth deserves special mention for much careful and critical reading.

Finally, thanks go to Jane Ballard, mostly for standing steadfast through the cycles of elation and depression and for numerous engineering-to-English translations.

As Pat Winston put it: "A willingness to help is not an implied endorsement." The aid of others was invaluable, but we alone are responsible for the opinions, technical details, and faults of this book.

Funding assistance was provided by the Sloan Foundation under Grant 78-4-15, by the National Institutes of Health under Grant HL21253, and by the Defense Advanced Research Projects Agency under Grant N00014-78-C-0164.

The authors wish to credit the following sources for figures and tables. For complete citations given here in abbreviated form (as "from ..." or "after ..."), refer to the appropriate chapter-end references.

Fig. 1.2 from Shani, U., "A 3D model-driven system for the recognition of abdominal anatomy from CT scans," TR77, Dept. of Computer Science, University of Rochester, May 1980.


Fig. 1.4 courtesy of Allen Hanson and Ed Riseman, COINS Research Project, University of Massachusetts, Amherst, MA.
Fig. 2.4 after Horn and Sjoberg, 1978.
Figs. 2.5, 2.9, 2.10, 3.2, 3.6, and 3.7 courtesy of Bill Lampeter.
Fig. 2.7a painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America.
Fig. 2.8a courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics.
Fig. 2.8b courtesy of Tom Check.
Table 2.3 after Gonzalez and Wintz, 1977.
Fig. 2.18 courtesy of EROS Data Center, Sioux Falls, SD.
Figs. 2.19 and 2.20 from Herrick, C.N., Television Theory and Servicing: Black/White and Color, 2nd Ed. Reston, VA: Reston, 1976.
Figs. 2.21, 2.22, 2.23, and 2.24 courtesy of Michel Denber.
Fig. 2.25 from Popplestone et al., 1975.
Fig. 2.26 courtesy of Production Automation Project, University of Rochester.
Fig. 2.27 from Waag and Gramiak, 1976.
Fig. 3.1 courtesy of Marty Tenenbaum.
Fig. 3.8 after Horn, 1974.
Figs. 3.14 and 3.15 after Frei and Chen, 1977.
Figs. 3.17 and 3.18 from Zucker, S.W. and R.A. Hummel, "An optimal 3D edge operator," IEEE Trans. PAMI 3, May 1981, pp. 324-331.
Fig. 3.19 curves are based on data in Abdou, 1978.
Figs. 3.20, 3.21, and 3.22 from Prager, J.M., "Extracting and labeling boundary segments in natural scenes," IEEE Trans. PAMI 2, 1, January 1980. © 1980 IEEE.
Figs. 3.23, 3.28, 3.29, and 3.30 courtesy of Berthold Horn.
Figs. 3.24 and 3.26 from Marr, D. and T. Poggio, "Cooperative computation of stereo disparity," Science, Vol. 194, 1976, pp. 283-287. © 1976 by the American Association for the Advancement of Science.
Fig. 3.31 from Woodham, R.J., "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity," Proc. SPIE, Vol. 155, August 1978.
Figs. 3.33 and 3.34 after Horn and Schunck, 1980.
Fig. 3.37 from Tanimoto, S. and T. Pavlidis, "A hierarchical data structure for picture processing," CGIP 4, 2, June 1975, pp. 104-119.
Fig. 4.6 from Kimme et al., 1975.
Figs. 4.7 and 4.16 from Ballard and Sklansky, 1976.
Fig. 4.9 courtesy of Dana Ballard and Ken Sloan.
Figs. 4.12 and 4.13 from Ramer, U., "Extraction of line structures from photographs of curved objects," CGIP 4, 2, June 1975, pp. 81-103.
Fig. 4.14 courtesy of Jim Lester, Tufts/New England Medical Center.
Fig. 4.17 from Chien, Y.P. and K.S. Fu, "A decision function method for boundary detection," CGIP 3, 2, June 1974, pp. 125-140.
Fig. 5.3 from Ohlander, R., K. Price, and D.R. Reddy, "Picture segmentation using a recursive region splitting method," CGIP 8, 3, December 1979.
Fig. 5.4 courtesy of Sam Kapilivsky.
Figs. 6.1, 11.16, and A1.13 courtesy of Chris Brown.
Fig. 6.3 courtesy of Joe Maleson and John Kender.
Fig. 6.4 from Connors, 1979. Texture images by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Fig. 6.9 texture image by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966.
Figs. 6.11, 6.12, and 6.13 from Lu, S.Y. and K.S. Fu, "A syntactic approach to texture analysis," CGIP 7, 3, June 1978, pp. 303-330.

Fig. 6.14 from Jayaramamurthy, S.N., "Multilevel array grammars for generating texture scenes," Proc. PRIP, August 1979, pp. 391-398. © 1979 IEEE.
Fig. 6.20 from Laws, 1980.
Figs. 6.21 and 6.22 from Maleson et al., 1977.
Fig. 6.23 courtesy of Joe Maleson.
Figs. 7.1 and 7.3 courtesy of Daryl Lawton.
Fig. 7.2 after Prager, 1979.
Figs. 7.4 and 7.5 from Clocksin, W.F., "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields," Ph.D. dissertation, University of Edinburgh, 1980.
Fig. 7.7 courtesy of Steve Barnard and Bill Thompson.
Figs. 7.8 and 7.9 from Rashid, 1980.
Fig. 7.10 courtesy of Joseph O'Rourke.
Figs. 7.11 and 7.12 after Aggarwal and Duda, 1975.
Fig. 7.13 courtesy of Hans-Hellmut Nagel.
Fig. 8.1d after Requicha, 1977.
Figs. 8.2, 8.3, 8.21a, 8.22, and 8.26 after Pavlidis, 1977.
Figs. 8.10, 8.11, 9.6, and 9.16 courtesy of Uri Shani.
Figs. 8.12, 8.13, 8.14, 8.15, and 8.16 from Ballard, 1981.
Fig. 8.21b from Preston, K., Jr., M.J.B. Duff, S. Levialdi, P.E. Norgren, and J-i. Toriwaki, "Basics of cellular logic with some applications in medical image processing," Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 826-856.
Figs. 8.25, 9.8, 9.9, 9.10, and 11.3 courtesy of Robert Schudy.
Fig. 8.29 after Bribiesca and Guzman, 1979.
Figs. 9.1, 9.18, 9.19, and 9.27 courtesy of Ari Requicha.
Fig. 9.2 from Requicha, A.A.G., "Representations for rigid solids: theory, methods, systems," Computer Surveys 12, 4, December 1980.
Fig. 9.3 courtesy of Lydia Hrechanyk.
Figs. 9.4 and 9.5 after Baumgart, 1972.
Fig. 9.7 courtesy of Peter Selfridge.
Fig. 9.11 after Requicha, 1980.
Figs. 9.14 and 9.15b from Agin, G.J. and T.O. Binford, "Computer description of curved objects," IEEE Trans. on Computers 25, 1, April 1976.
Fig. 9.15a courtesy of Gerald Agin.
Fig. 9.17 courtesy of A. Christensen; published as frontispiece of ACM SIGGRAPH 80 Proceedings.
Fig. 9.20 from Marr and Nishihara, 1978.
Fig. 9.21 after Tilove, 1980.
Fig. 9.22b courtesy of Gene Hartquist.
Figs. 9.24, 9.25, and 9.26 from Lee and Requicha, 1980.
Figs. 9.28a, 9.29, 9.30, 9.31, 9.32, 9.35, and 9.37 and Table 9.1 from Brown, C. and R. Popplestone, "Cases in scene analysis," in Pattern Recognition, ed. B.G. Batchelor. New York: Plenum, 1978.
Fig. 9.28b from Guzman, A., "Decomposition of a visual scene into three-dimensional bodies," in Automatic Interpretation and Classification of Images, A. Grasseli, ed. New York: Academic Press, 1969.
Fig. 9.28c from Waltz, D., "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 9.28d after Turner, 1974.
Figs. 9.33, 9.38, 9.40, 9.42, 9.43, and 9.44 after Mackworth, 1973.


Figs. 9.39, 9.45, 9.46, and 9.47 and Table 9.2 after Kanade, 1978.
Figs. 10.2 and A2.1 courtesy of Dana Ballard.
Figs. 10.16, 10.17, and 10.18 after Russell, 1979.
Fig. 11.5 after Fischler and Elschlager, 1973.
Fig. 11.8 after Ambler et al., 1975.
Fig. 11.10 from Winston, P.H., "Learning structural descriptions from examples," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975.
Fig. 11.11 from Nevatia, 1974.
Fig. 11.12 after Nevatia, 1974.
Fig. 11.17 after Barrow and Popplestone, 1971.
Fig. 11.18 from Davis, L.S., "Shape matching using relaxation techniques," IEEE Trans. PAMI 1, 4, January 1979, pp. 60-72.
Figs. 12.4 and 12.5 from Sloan and Bajcsy, 1979.
Fig. 12.6 after Barrow and Tenenbaum, 1976.
Fig. 12.8 after Freuder, 1978.
Fig. 12.10 from Rosenfeld, A.R., A. Hummel, and S.W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC 6, 6, June 1976, p. 420.
Figs. 12.11, 12.12, 12.13, 12.14, and 12.15 after Hinton, 1979.
Fig. 13.3 courtesy of Aaron Sloman.
Figs. 13.6, 13.7, and 13.8 from Garvey, 1976.
Fig. A1.11 after Duda and Hart, 1973.
Figs. A2.2 and A2.3 from Hanson, A.R. and E.M. Riseman, "VISIONS: A computer system for interpreting scenes," in Computer Vision Systems, ed. A.R. Hanson and E.M. Riseman. New York: Academic Press, 1978.


Mnemonics for Proceedings and Special Collections Cited in the References

CGIP  Computer Graphics and Image Processing

COMPSAC  IEEE Computer Society's 3rd International Computer Software and Applications Conference, Chicago, November 1979

CVS  Hanson, A. R. and E. M. Riseman (Eds.). Computer Vision Systems. New York: Academic Press, 1978

DARPA IU  Defense Advanced Research Projects Agency Image Understanding Workshop: Minneapolis, MN, April 1977; Palo Alto, CA, October 1977; Cambridge, MA, May 1978; Carnegie-Mellon University, Pittsburgh, PA, November 1978; University of Maryland, College Park, MD, April 1980

IJCAI  International Joint Conference on Artificial Intelligence: 2nd, Imperial College, London, September 1971; 4th, Tbilisi, Georgia, USSR, September 1975; 5th, MIT, Cambridge, MA, August 1977; 6th, Tokyo, August 1979

IJCPR  International Joint Conference on Pattern Recognition: 2nd, Copenhagen, August 1974; 3rd, Coronado, CA, November 1976; 4th, Kyoto, November 1978; 5th, Miami Beach, FL, December 1980

MI4  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 4. Edinburgh: Edinburgh University Press, 1969

MI5  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 5. Edinburgh: Edinburgh University Press, 1970

MI6  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 6. Edinburgh: Edinburgh University Press, 1971

MI7  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 7. Edinburgh: Edinburgh University Press, 1972

PCV  Winston, P. H. (Ed.). The Psychology of Computer Vision. New York: McGraw-Hill, 1975

PRIP  IEEE Computer Society Conference on Pattern Recognition and Image Processing, Chicago, August 1979

1 COMPUTER VISION

Computer Vision Issues

1.1 ACHIEVING SIMPLE VISION GOALS

Suppose that you are given an aerial photo such as that of Fig. 1.1a and asked to locate ships in it. You may never have seen a naval vessel in an aerial photograph before, but you will have no trouble predicting generally how ships will appear. You might reason that you will find no ships inland, and so turn your attention to ocean areas. You might be momentarily distracted by the glare on the water, but realizing that it comes from reflected sunlight, you perceive the ocean as continuous and flat. Ships on the open ocean stand out easily (if you have seen ships from the air, you know to look for their wakes). Near the shore the image is more confusing, but you know that ships close to shore are either moored or docked. If you have a map (Fig. 1.1b), it can help locate the docks (Fig. 1.1c); in a low-quality photograph it can help you identify the shoreline. Thus it might be a good investment of your time to establish the correspondence between the map and the image. A search parallel to the shore in the dock areas reveals several ships (Fig. 1.1d).

Again, suppose that you are presented with a set of computer-aided tomographic (CAT) scans showing "slices" of the human abdomen (Fig. 1.2a). These images are products of high technology, and give us views not normally available even with x-rays. Your job is to reconstruct from these cross-sections the three-dimensional shape of the kidneys. This job may well seem harder than finding ships. You first need to know what to look for (Fig. 1.2b), where to find it in CAT scans, and how it looks in such scans. You need to be able to "stack up" the scans mentally and form an internal model of the shape of the kidney as revealed by its slices (Fig. 1.2c and 1.2d).

This book is about computer vision.

These two example tasks are typical computer vision tasks; both were solved by computers using the sorts of knowledge and techniques alluded to in the descriptive paragraphs. Computer vision is the enterprise of automating and integrating a wide range of processes and representations used for vision perception. It includes as parts many techniques that are useful by themselves, such as image processing (transforming, encoding, and transmitting images) and statistical pattern classification (statistical decision theory applied to general patterns, visual or otherwise). More importantly for us, it includes techniques for geometric modeling and cognitive processing.

1.2 HIGH-LEVEL AND LOW-LEVEL CAPABILITIES

The examples of Section 1.1 illustrate vision that uses cognitive processes, geometric models, goals, and plans. These high-level processes are very important; our examples only weakly illustrate their power and scope. There surely would be some overall purpose to finding ships; there might be collateral information that there were submarines, barges, or small craft in the harbor, and so forth. CAT scans would be used with several diagnostic goals in mind and an associated medical history available. Goals and knowledge are high-level capabilities that can guide visual activities, and a visual system should be able to take advantage of them.

Fig. 1.1 Finding ships in an aerial photograph. (a) The photograph; (b) a corresponding map; (c) the dock area of the photograph; (d) registered map and image, with ship location.

Even such elaborated tasks are very special ones and in their way easier to think about than the commonplace visual perceptions needed to pick up a baby, cross a busy street, or arrive at a party and quickly "see" who you know, your host's taste in decor, and how long the festivities have been going on. All these tasks require judgment and large amounts of knowledge of objects in the world, how they look, and how they behave. Such high-level powers are so well integrated into "vision" as to be effectively inseparable.

Knowledge and goals are only part of the vision story. Vision requires many low-level capabilities we often take for granted; for example, our ability to extract intrinsic images of "lightness," "color," and "range." We perceive black as black in a complex scene even when the lighting is such that some black patches are reflecting more light than some white patches. Similarly, perceived colors are not related simply to the wavelengths of reflected light; if they were, we would consciously see colors changing with illumination. Stereo fusion (stereopsis) is a low-level facility basic to short-range three-dimensional perception.

An important low-level capability is object perception: for our purposes it does not really matter if this talent is innate ("hard-wired"), or if it is developmental or even learned ("compiled-in"). The fact remains that mature biological vision systems are specialized and tuned to deal with the relevant objects in their environments.

Fig. 1.2 Finding a kidney in a computer-aided tomographic scan. (a) One slice of scan data; (b) prototype kidney model; (c) model fitting; (d) resulting kidney and spinal cord instances.

Further specialization can often be learned, but it is built on basic immutable assumptions about the world which underlie the vision system.

A basic sort of object recognition capability is the "figure/ground" discrimination that separates objects from the "background." Other basic organizational predispositions are revealed by the "Gestalt laws" of clustering, which demonstrate rules our vision systems use to form simple arrays of stimuli into more coherent spatial groups. A dramatic example of specialized object perception for


human beings is revealed in our "face recognition" capability, which seems to occupy a large volume of brain matter. Geometric visual illusions are more surprising symptoms of nonintuitive processing that is performed by our vision systems, either for some direct purpose or as a side effect of its specialized architecture. Some other illusions clearly reflect the intervention of high-level knowledge. For instance, the familiar "Necker cube reversal" is grounded in our three-dimensional models for cubes.

Low-level processing capabilities are elusive; they are unconscious, and they are not well connected to other systems that allow direct introspection. For instance, our visual memory for images is quite impressive, yet our quantitative verbal descriptions of images are relatively primitive. The biological visual "hardware" has been developed, honed, and specialized over a very long period. However, its organization and functionality is not well understood except at extreme levels of detail and generality: the behavior of small sets of cat or monkey cortical cells and the behavior of human beings in psychophysical experiments.

Computer vision is thus immediately faced with a very difficult problem; it must reinvent, with general digital hardware, the most basic and yet inaccessible talents of specialized, parallel, and partly analog biological visual systems. Figure 1.3 may give a feeling for the problem; it shows two visual renditions of a familiar subject. The inset is a normal image, the rest is a plot of the intensities (gray levels) in the image against the image coordinates. In other words, it displays information

F/ " " ' ' . '

1

' ' ' '

? I k. ; . , ' " ' ' , ' . > :

%

. . . \ ^ r : ; ' ; " ''

& f.'.K:i #* '\

iK v..' Vs

!

.'

" " V . .

Fig. 1.3 Tworepresentationsofan image.Oneisdirectlyaccessibletoour lowlevelprocesses;theotherisnot.


with "height" instead of "light." No information is lost, and the display is an imagelikeobject, butwedonotimmediatelyseeafaceinit.Theinitialrepresenta tion the computer has to work with is no better; it is typically just an array of numbers from which human beings could extract visual information only very painfully. Skipping the lowlevel processing we take for granted turns normally effortless perceptionintoaverydifficult puzzle. Computer vision is vitally concerned with both lowlevel or "early proc essing" issuesandwith thehighlevel and "cognitive" useofknowledge. Where doesvision leaveoff and reasoning and motivation begin? We donot knowpre cisely, but wefirmly believe (and hope toshow) that powerful, cooperating, rich representations oftheworldareneeded for anyadvanced visionsystem.Without them, nosystem can deriverelevant and invariant information from inputthatis beset witheverchanging lighting and viewpoint, unimportant shape differences, noise,andother largebutirrelevant variations.Theserepresentationscanremove somecomputational loadbypredictingorassumingstructureforthevisualworld. Finally, ifasystem isto be successful in avariety of tasks, it needs some "metalevel"capabilities:itmustbeabletomodelandreasonaboutitsowngoals and capabilities, and the success of its approaches. These complex and related modelsmustbemanipulatedbycognitiveliketechniques,eventhoughintrospec tivelytheperceptualprocessdoesnotalways"feel" touslikecognition.

Computer Vision Systems

1.3 A RANGE OF REPRESENTATIONS

Visual perception is the relation of visual input to previously existing models of the world. There is a large representational gap between the image and the models ("ideas," "concepts") which explain, describe, or abstract the image information. To bridge that gap, computer vision systems usually have a (loosely ordered) range of representations connecting the input and the "output" (a final description, decision, or interpretation). Computer vision then involves the design of these intermediate representations and the implementation of algorithms to construct them and relate them to one another.

We broadly categorize the representations into four parts (Fig. 1.4) which correspond with the organization of this volume. Within each part there may be several layers of representation, or several cooperating representations. Although the sets of representations are loosely ordered from "early" and "low-level" signals to "late" and "cognitive" symbols, the actual flow of effort and information between them is not unidirectional. Of course, not all levels need to be used in each computer vision application; some may be skipped, or the processing may start partway up the hierarchy or end partway down it.

Generalized images (Part I) are iconic (image-like) and analogical representations of the input data. Images may initially arise from several technologies.

Fig. 1.4 Examples of the four categories of representation used in computer vision. (a) Iconic; (b) segmented; (c) geometric; (d) relational.

Domain-independent processing can produce other iconic representations more directly useful to later processing, such as arrays of edge elements (gray-level discontinuities). Intrinsic images can sometimes be produced at this level; they reveal physical properties of the imaged scene (such as surface orientations, range, or surface reflectance). Often parallel processing can produce generalized images. More generally, most "low-level" processes can be implemented with parallel computation.

Segmented images (Part II) are formed from the generalized image by gathering its elements into sets likely to be associated with meaningful objects in the scene. For instance, segmenting a scene of planar polyhedra (blocks) might result in a set of edge segments corresponding to polyhedral edges, or a set of two-dimensional regions in the image corresponding to polyhedral faces.


In producing the segmented image, knowledge about the particular domain at issue begins to be important both to save computation and to overcome problems of noise and inadequate data. In the planar polyhedral example, it helps to know beforehand that the line segments must be straight. Texture and motion are known to be very important in segmentation, and are currently topics of active research; knowledge in these areas is developing very fast.

Geometric representations (Part III) are used to capture the all-important idea

of two-dimensional and three-dimensional shape. Quantifying shape is as important as it is difficult. These geometric representations must be powerful enough to support complex and general processing, such as "simulation" of the effects of lighting and motion. Geometric structures are as useful for encoding previously acquired knowledge as they are for re-representing current visual input. Computer vision requires some basic mathematics; Appendix 1 has a brief selection of useful techniques.

Relational models (Part IV) are complex assemblages of representations used to support sophisticated high-level processing. An important tool in knowledge representation is semantic nets, which can be used simply as an organizational convenience or as a formalism in their own right. High-level processing often uses prior knowledge and models acquired prior to a perceptual experience. The basic mode of processing turns from constructing representations to matching them. At high levels, propositional representations become more important. They are made up of assertions that are true or false with respect to a model, and are manipulated by rules of inference. Inference-like techniques can also be used for planning, which models situations and actions through time, and thus must reason about temporally varying and hypothetical worlds. The higher the level of representation, the more marked is the flow of control (direction of attention, allocation of effort) downward to lower levels, and the greater the tendency of algorithms to exhibit serial processing. These issues of control are basic to complex information processing in general and computer vision in particular; Appendix 2 outlines some specific control mechanisms.

Figure 1.5 illustrates the loose classification of the four categories into analogical and propositional representations. We consider generalized and segmented images as well as geometric structures to be analogical models. Analogical models capture directly the relevant characteristics of the represented objects, and are manipulated and interrogated by simulation-like processes. Relational models are generally a mix of analogical and propositional representations. We develop this distinction in more detail in Chapter 10.
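As a toy illustration of the semantic-net idea (ours, in Python, not a formalism from the book), a net can be mocked up as nodes with labeled arcs, echoing the outdoor scene of Fig. 1.4:

    # A toy semantic net: each node maps relation labels to related nodes.
    net = {
        "house-scene": {"part-of": ["house", "garage", "tree"]},
        "house":       {"above": ["grass"]},
        "garage":      {"beside": ["house"]},
        "tree":        {"above": ["grass"]},
    }

    def related(node, relation):
        """Follow one labeled arc type out of a node."""
        return net.get(node, {}).get(relation, [])

    print(related("house-scene", "part-of"))   # ['house', 'garage', 'tree']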

1.4 THE ROLE OF COMPUTERS

The computer is a congenial tool for research into visual perception.

• Computers are versatile and forgiving experimental subjects. They are easily and ethically reconfigurable, not messy, and their workings can be scrutinized in the finest detail.
• Computers are demanding critics. Imprecision, vagueness, and oversights are not tolerated in the computer implementation of a theory.
• Computers offer new metaphors for perceptual psychology (also neurology, linguistics, and philosophy). Processes and entities from computer science provide powerful and influential conceptual tools for thinking about perception and cognition.
• Computers can give precise measurements of the amount of processing they do. A computer implementation places an upper limit on the amount of computation necessary for a task.

Fig. 1.5 The knowledge base of a complex computer vision system, showing four basic representational categories.

Table 1.1 EXAMPLES OF IMAGE ANALYSIS TASKS

Robotics
  Objects: three-dimensional outdoor scenes, indoor scenes; mechanical parts
  Modality: light, x-rays; light, structured light
  Tasks: identify or describe objects in scene; industrial tasks
  Knowledge sources: models of objects; models of the reflection of light from objects

Aerial images
  Objects: terrain, buildings, etc.
  Modality: light, infrared, radar
  Tasks: improved images, resource analyses, weather prediction, spying, missile guidance, tactical analysis
  Knowledge sources: maps, geometrical models of shapes, models of image formation

Astronomy
  Objects: stars, planets
  Modality: light
  Tasks: chemical composition, improved images
  Knowledge sources: geometrical models of shapes

Medical (macro)
  Objects: body organs
  Modality: x-rays, ultrasound, isotopes, heat
  Tasks: diagnosis of abnormalities, operative and treatment planning
  Knowledge sources: anatomical models, models of image formation

Medical (micro)
  Objects: cells, protein chains, chromosomes
  Modality: electron microscopy, light
  Tasks: pathology, cytology, karyotyping
  Knowledge sources: models of shape

Chemistry
  Objects: molecules
  Modality: electron densities
  Tasks: analysis of molecular compositions
  Knowledge sources: chemical models, structured models

Neuroanatomy
  Objects: neurons
  Modality: light, electron microscopy
  Tasks: determination of spatial orientation
  Knowledge sources: neural connectivity

Physics
  Objects: particle tracks
  Modality: light
  Tasks: find new particles, identify tracks
  Knowledge sources: atomic physics

• Computers may be used either to mimic what we understand about human perceptual architecture and processes, or to strike out in different directions to try to achieve similar ends by different means.
• Computer models may be judged either by their efficacy for applications and on-the-job performance or by their internal organization, processes, and structures: the theory they embody.

1.5 COMPUTER VISION RESEARCH AND APPLICATIONS

"Pure" computer visionresearch often dealswithrelatively domainindependent considerations.Theresultsareuseful inabroadrangeofcontexts.Almostalways suchworkisdemonstrated inoneormoreapplicationsareas,andmoreoften than notaninitialapplicationproblem motivatesconsideration ofthegeneralproblem. Applicationsofcomputervisionareexciting,andtheirnumberisgrowingascom putervisionbecomesbetterunderstood.Table 1.1givesapartiallistof"classical" andcurrentapplicationsareas. Within the organization outlined above, this book presents many specific ideasandtechniqueswithgeneralapplicability.Itismeanttoprovideenoughbasic knowledgeandtoolstosupportattacksonbothapplicationsandresearchtopics.


Part I GENERALIZED IMAGES

The first step in the vision process is image formation. Images may arise from a variety of technologies. For example, most television-based systems convert reflected light intensity into an electronic signal which is then digitized; other systems use more exotic radiations, such as x-rays, laser light, ultrasound, and heat. The net result is usually an array of samples of some kind of energy.

The vision system may be entirely passive, taking as input a digitized image from a microwave or infrared sensor, satellite scanner, or a planetary probe, but more likely involves some kind of active imaging. Automated active imaging systems may control the direction and resolution of sensors, or regulate and direct their own light sources. The light source itself may have special properties and structure designed to reveal the nature of the three-dimensional world; an example is to use a plane of light that falls on the scene in a stripe whose structure is closely related to the structure of opaque objects. Range data for the scene may be provided by stereo (two images), but also by triangulation using light-stripe techniques or by "spot ranging" using laser light. A single hardware device may deliver range and multispectral reflectivity ("color") information. The image-forming device may also perform various other operations. For example, it may automatically smooth or enhance the image or vary its resolution.

The generalized image is a set of related image-like entities for the scene. This set may include related images from several modalities, but may also include the results of significant processing that can extract intrinsic images. An intrinsic image is an "image," or array, of representations of an important physical quantity such as surface orientation, occluding contours, velocity, or range. Object color, which is a different entity from sensed red-green-blue wavelengths, is an intrinsic quality. These intrinsic physical qualities are extremely useful; they can be related to physical objects far more easily than the original input values, which reveal the physical parameters only indirectly. An intrinsic image is a major step toward scene understanding and usually represents significant and interesting computations.

The information necessary to compute an intrinsic image is contained in the input image itself, and is extracted by "inverting" the transformation wrought by the imaging process, the reflection of radiation from the scene, and other physical processes. An example is the fusion of two stereo images to yield an intrinsic range image. Many algorithms to recover intrinsic images can be realized with parallel implementations, mirroring computations that may take place in the lower neurological levels of biological image processing.

All of the computations listed above benefit from the idea of resolution pyramids. A pyramid is a generalized image data structure consisting of the same image at several successively increasing levels of resolution. As the resolution increases, more samples are required to represent the increased information and hence the successive levels are larger, making the entire structure look like a pyramid. Pyramids allow the introduction of many different coarse-to-fine image-resolution algorithms which are vastly more efficient than their single-level, high-resolution-only counterparts.
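The pyramid structure is easy to make concrete. The following sketch (ours, in Python with NumPy; the function name and the 2 x 2 block-averaging rule are illustrative choices, not the book's) builds successively coarser levels from a high-resolution gray-level image:

    import numpy as np

    def build_pyramid(image, levels):
        """Return a list of images; each level halves the resolution of
        the previous one by averaging non-overlapping 2x2 blocks."""
        pyramid = [image]
        for _ in range(levels - 1):
            im = pyramid[-1]
            h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2  # trim odd edges
            im = im[:h, :w]
            # Average each 2x2 block into one sample at the coarser level.
            coarser = (im[0::2, 0::2] + im[0::2, 1::2] +
                       im[1::2, 0::2] + im[1::2, 1::2]) / 4.0
            pyramid.append(coarser)
        return pyramid

    image = np.random.rand(256, 256)     # a stand-in 256 x 256 gray-level image
    for level in build_pyramid(image, 4):
        print(level.shape)               # (256,256), (128,128), (64,64), (32,32)

Listing the levels from coarse to fine recovers the pyramid picture: each level is one quarter the size of the one below it.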


2 IMAGE FORMATION

2.1 IMAGES

Image formation occurs when a sensor registers radiation that has interacted with physical objects. Section 2.2 deals with mathematical models of images and image formation. Section 2.3 describes several specific image-formation technologies.

The mathematical model of imaging has several different components.

1. An image function is the fundamental abstraction of an image.
2. A geometrical model describes how three dimensions are projected into two.
3. A radiometrical model shows how the imaging geometry, light sources, and reflectance properties of objects affect the light measurement at the sensor.
4. A spatial frequency model describes how spatial variations of the image may be characterized in a transform domain.
5. A color model describes how different spectral measurements are related to image colors.
6. A digitizing model describes the process of obtaining discrete samples.

This material forms the basis of much image-processing work and is developed in much more detail elsewhere, e.g., [Rosenfeld and Kak 1976; Pratt 1978]. Our goals are not those of image processing, so we limit our discussion to a summary of the essentials.

The wide range of possible sources of samples and the resulting different implications for later processing motivate our overview of specific imaging techniques. Our goal is not to provide an exhaustive catalog, but rather to give an idea of the range of techniques available. Very different analysis techniques may be needed depending on how the image was formed. Two examples illustrate this point.


If the image is formed by reflected light intensity, as in a photograph, the image records both light from primary light sources and (more usually) the light reflected off physical surfaces. We show in Chapter 3 that in certain cases we can use these kinds of images together with knowledge about physics to derive the orientation of the surfaces. If, on the other hand, the image is a computed tomogram of the human body (discussed in Section 2.3.4), the image represents tissue density of internal organs. Here orientation calculations are irrelevant, but general segmentation techniques of Chapters 4 and 5 (the agglomeration of neighboring samples of similar density into units representing organs) are appropriate.

2.2 IMAGE MODEL

Sophisticated image models of a statistical flavor are useful in image processing [Jain 1981]. Here we are concerned with more geometrical considerations.

2.2.1 Image Functions

An image function is a mathematical representation of an image. Generally, an image function is a vector-valued function of a small number of arguments. A special case of the image function is the digital (discrete) image function, where the arguments to and value of the function are all integers. Different image functions may be used to represent the same image, depending on which of its characteristics are important. For instance, a camera produces an image on black-and-white film which is usually thought of as a real-valued function (whose value could be the density of the photographic negative) of two real-valued arguments, one for each of two spatial dimensions. However, at a very small scale (the order of the film grain) the negative basically has only two densities, "opaque" and "transparent."

Most images are presented by functions of two spatial variables f(x) = f(x, y), where f(x, y) is the brightness of the gray level of the image at a spatial coordinate (x, y). A multispectral image f is a vector-valued function with components (f_1, ..., f_n). One special multispectral image is a color image in which, for example, the components measure the brightness values of each of three wavelengths, that is,

f(x) = (f_red(x), f_blue(x), f_green(x))
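In a digital image the vector-valued function above becomes an array of samples. A minimal illustration (ours, using NumPy; the image size is arbitrary, and we keep the text's (red, blue, green) component order):

    import numpy as np

    # A 480 x 640 color image: f(x) = (f_red(x), f_blue(x), f_green(x)),
    # stored as one 8-bit sample per band at each discrete coordinate.
    f = np.zeros((480, 640, 3), dtype=np.uint8)

    f[100, 200] = (255, 0, 0)   # set one pixel's (red, blue, green) components
    r, b, g = f[100, 200]       # read the vector value at a spatial coordinate
    print(r, b, g)              # 255 0 0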

Time-varying images f(x, t) have an added temporal argument. For special three-dimensional images, x = (x, y, z). Usually, both the domain and range of f are bounded.

An important part of the formation process is the conversion of the image representation from a continuous function to a discrete function; we need some way of describing the images as samples at discrete points. The mathematical tool we shall use is the delta function. Formally, the delta function may be defined by


δ(x) = ∞ when x = 0
δ(x) = 0 when x ≠ 0          (2.1)

∫ δ(x) dx = 1

If some care is exercised, the delta function may be interpreted as the limit of a set of functions:

δ(x) = lim_{n→∞} δ_n(x)

where

δ_n(x) = n if |x| < 1/(2n), and 0 otherwise
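The delta-function formalism describes what sampling does; in practice the digital image is obtained by evaluating the image function on a discrete grid. A toy sketch (ours; the Gaussian test image and sample spacing are arbitrary):

    import numpy as np

    def f(x, y):
        """A continuous 'image function': a Gaussian blob of brightness."""
        return np.exp(-(x**2 + y**2))

    dx = 0.1                                 # sample spacing
    xs = np.arange(-2.0, 2.0, dx)
    X, Y = np.meshgrid(xs, xs)
    f_digital = f(X, Y)                      # sample f at every grid point
    print(f_digital.shape, round(f_digital.max(), 3))   # (40, 40) 1.0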

[Fragment of a table of Fourier transform pairs: cos 2πω₀x ↔ ½[δ(f − ω₀) + δ(f + ω₀)]; sin 2πω₀x ↔ (1/2j)[δ(f − ω₀) − δ(f + ω₀)], the imaginary part of F.]

Intuitively, one function is "swept past" (in one dimension) or "rubbed over" (in two dimensions) the other. The value of the convolution at any displacement is the integral of the product of the (relatively displaced) function values. One common phenomenon that is well expressed by a convolution is the formation of an image by an optical system. The system (say a camera) has a "point-spread function," which is the image of a single point. (In linear systems theory, this is the "impulse response," or response to a delta function input.) The ideal point-spread function is, of course, a point. A typical point-spread function is a two-dimensional Gaussian spatial distribution of intensities, but may include such phenomena as diffraction rings.

Fig. 2.5 (on facing page) (a) An image, f(x, y). (b) A rotated version of (a), filtered to enhance high spatial frequencies. (c) Similar to (b), but filtered to enhance low spatial frequencies. (d), (e), and (f) show the logarithm of the power spectrum of (a), (b), and (c). The power spectrum is the log squared modulus of the Fourier transform F(u, v). Considered in polar coordinates (ρ, θ), points of small ρ correspond to low spatial frequencies ("slowly varying" intensities), large ρ to high spatial frequencies contributed by "fast" variations such as step edges. The power at (ρ, θ) is determined by the amount of intensity variation at the frequency ρ occurring at the angle θ.


In any event, if the camera is modeled as a linear system (ignoring the added complexity that the point-spread function usually varies over the field of view), the image is the convolution of the point-spread function and the input signal. The point-spread function is rubbed over the perfect input image, thus blurring it.

Convolution is also a good model for the application of many other linear operators, such as line-detecting templates. It can be used in another guise (called correlation) to perform matching operations (Chapter 3) which detect instances of subimages or features in an image.

In the spatial domain, the obvious implementation of the convolution operation involves a shift-multiply-integrate operation which is hard to do efficiently. However, multiplication and convolution are "transform pairs," so that the calculation of the convolution in one domain (say the spatial) is simplified by first Fourier transforming to the other (the frequency) domain, performing a multiplication, and then transforming back.

The convolution of f and g in the spatial domain is equivalent to the pointwise product of F and G in the frequency domain:

ℱ(f * g) = FG          (2.24)

We shall show this in a manner similar to [Duda and Hart 1973]. First we prove the shift theorem. If the Fourier transform of f(x) is F(u), defined as

F(u) = ∫ f(x) exp[−j2π(ux)] dx          (2.25)

then

ℱ[f(x − a)] = ∫ f(x − a) exp[−j2π(ux)] dx          (2.26)

Changing variables so that x′ = x − a and dx = dx′,

ℱ[f(x − a)] = ∫ f(x′) exp{−j2π[u(x′ + a)]} dx′          (2.27)

Now exp[−j2πu(x′ + a)] = exp(−j2πua) exp(−j2πux′), where the first term is a constant. This means that

ℱ[f(x − a)] = exp(−j2πua) F(u)          (shift theorem)          (2.28)

Now we are ready to show that ℱ[f(x) * g(x)] = F(u)G(u):

ℱ(f * g) = ∫∫ f(x) g(y − x) exp(−j2πuy) dx dy          (2.29)
         = ∫ f(x) {∫ g(y − x) exp(−j2πuy) dy} dx

Recognizing that the terms in braces represent ℱ[g(y − x)] and applying the shift theorem, we obtain

ℱ(f * g) = ∫ f(x) exp(−j2πux) G(u) dx          (2.30)
         = F(u)G(u)          (2.31)
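The result is easy to check numerically with the discrete Fourier transform. A small sketch (ours, using NumPy; note that the discrete transform computes circular convolution, so the signals are zero-padded to make it agree with ordinary convolution):

    import numpy as np

    f = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])   # zero-padded signals
    g = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0])

    direct = np.convolve(f, g)[:len(f)]            # spatial-domain convolution

    F, G = np.fft.fft(f), np.fft.fft(g)
    via_fft = np.real(np.fft.ifft(F * G))          # multiply transforms, invert

    print(np.allclose(direct, via_fft))            # True: F(f*g) = FG

For long signals the transform route is much cheaper: two transforms, a pointwise product, and an inverse transform replace the shift-multiply-integrate loop.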

2.2.5 Color

Not all images are monochromatic; in fact, applications using multispectral images are becoming increasingly common (Section 2.3.2). Further, human beings intuitively feel that color is an important part of their visual experience, and is useful or even necessary for powerful visual processing in the real world. Color vision provides a host of research issues, both for psychology and computer vision. We briefly discuss two aspects of color vision: color spaces and color perception. Several models of the human visual system not only include color but have proven useful in applications [Granrath 1981].

Color Spaces

Color spaces are a way of organizing the colors perceived by human beings. It happens that weighted combinations of stimuli at three principal wavelengths are sufficient to define almost all the colors we perceive. These wavelengths form a natural basis or coordinate system from which the color measurement process can be described. Color perception is not related in a simple way to color measurement, however.

Color is a perceptual phenomenon related to human response to different wavelengths in the visible electromagnetic spectrum [400 (blue) to 700 nanometers (red); a nanometer (nm) is 10⁻⁹ meter]. The sensation of color arises from the sensitivities of three types of neurochemical sensors in the retina to the visible spectrum. The relative response of these sensors is shown in Fig. 2.6. Note that each sensor responds to a range of wavelengths. The illumination source has its own spectral composition f(λ) which is modified by the reflecting surface. Let r(λ) be this reflectance function. Then the measurement R produced by the "red" sensor is given by

R = ∫ f(λ) r(λ) h_R(λ) dλ          (2.32)

So the sensor output is actually the integral of three different wavelength-dependent components: the source f, the surface reflectance r, and the sensor h_R. Surprisingly, only weighted combinations of three delta function approximations to the different f(λ)h(λ), that is, δ(λ_R), δ(λ_G), and δ(λ_B), are necessary to produce the sensation of nearly all the colors.

Fig. 2.6 Spectral response of human color sensors.

This result is displayed on a chromaticity diagram. Such a diagram is obtained by first normalizing the three sensor measurements:

r = R / (R + G + B)
g = G / (R + G + B)          (2.33)
b = B / (R + G + B)

and then plotting perceived color as a function of any two (usually red and green). Chromaticity explicitly ignores intensity or brightness; it is a section through the three-dimensional color space (Fig. 2.7). The choice of (λ_R, λ_G, λ_B) = (410, 530, 650) nm maximizes the realizable colors, but some colors still cannot be realized since they would require negative values for some of r, g, and b.

Another more intuitive way of visualizing the possible colors from the RGB space is to view these measurements as Euclidean coordinates. Here any color can be visualized as a point in the unit cube. Other coordinate systems are useful for different applications; computer graphics has proved a strong stimulus for investigation of different color space bases.
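As a quick illustration (ours, in Python), the normalization of (2.33) shows why chromaticity ignores brightness: scaling all three measurements leaves (r, g, b) unchanged:

    def chromaticity(R, G, B):
        """Normalized chromaticity coordinates (2.33); intensity divides out."""
        total = R + G + B
        return R / total, G / total, B / total

    print(chromaticity(200, 100, 100))   # (0.5, 0.25, 0.25)
    print(chromaticity(100, 50, 50))     # same chromaticity at half the intensity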

(a)

(b)

Fig. 2.7 (a) An artist's conception of the chromaticity diagramsee color insert; (b) a more useful depiction. Spectral colors range along the curved boundary; the straight boun dary isthelineofpurples.32 Ch. 2 Image Formation
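A minimal Python sketch of the normalization of Eq. (2.33):

def chromaticity(R, G, B):
    """Eq. (2.33): sensor measurements -> chromaticity coordinates.
    Assumes R + G + B > 0; by construction r + g + b == 1, so any two
    coordinates (usually r and g) determine the third."""
    total = R + G + B
    return R / total, G / total, B / total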

Color Perception

Color perception is complex, but the essential step is a transformation of three input intensity measurements into another basis. The coordinates of the new basis are more directly related to human color judgments.

Although the RGB basis is good for the acquisition or display of color information, it is not a particularly good basis to explain the perception of colors. Human vision systems can make good judgments about the relative surface reflectance r(\lambda) despite different illuminating wavelengths; this reflectance seems to be what we mean by surface color. Another important feature of the color basis is revealed by an ability to perceive in "black and white," effectively deriving intensity information from the color measurements. From an evolutionary point of view, we might expect that color perception in animals would be compatible with preexisting noncolor perceptual mechanisms. These two needs, the need to make good color judgments and the need to retain and use intensity information, imply that we use a transformed, non-RGB basis for color space. Of the different bases in use for color vision, all are variations on this theme: intensity forms one dimension and color is a two-dimensional subspace. The differences arise in how the color subspace is described. We categorize such bases into two groups.

1. Intensity/Saturation/Hue (IHS). In this basis, we compute intensity as

intensity := R + G + B   (2.34)

The saturation measures the lack of whiteness in the color. Colors such as "fire engine" red and "grass" green are saturated; pastels (e.g., pinks and pale blues) are desaturated. Saturation can be computed from RGB coordinates by the formula [Tenenbaum and Weyl 1975]

saturation := 1 - \frac{3 \min(R, G, B)}{intensity}   (2.35)

Hue is roughly proportional to the average wavelength of the color. It can be defined using RGB by the following program fragment:

hue := \cos^{-1} \left\{ \frac{\frac{1}{2}[(R-G) + (R-B)]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right\}   (2.36)
if B > G then hue := 2\pi - hue

The IHS basis transforms the RGB basis in the following way. Thinking of the color cube, the diagonal from the origin to (1, 1, 1) becomes the intensity axis. Saturation is the distance of a point from that axis and hue is the angle about that axis from some reference to the point (Fig. 2.8). This basis is essentially that used by artists [Munsell 1939], who term saturation chroma. Also, this basis has been used in graphics [Smith 1978; Joblove and Greenberg 1978].

One problem with the IHS basis, particularly as defined by (2.34) through (2.36), is that it contains essential singularities where it is impossible to define the color in a consistent manner [Kender 1976]. For example, hue has an essential singularity for all values of (R, G, B) where R = G = B. This means that special care must be taken in algorithms that use hue.

Fig. 2.8 An IHS color space. (a) Cross section at one intensity; (b) cross section at one hue; see color inserts.
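A minimal Python sketch of Eqs. (2.34) through (2.36), assuming R, G, and B are not all equal (the singular case just noted):

import math

def rgb_to_ihs(R, G, B):
    """RGB -> intensity, saturation, hue following Eqs. (2.34)-(2.36).
    Hue is undefined (an essential singularity) when R == G == B;
    this sketch raises ZeroDivisionError there."""
    intensity = R + G + B                           # Eq. (2.34)
    saturation = 1 - 3 * min(R, G, B) / intensity   # Eq. (2.35)
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    hue = math.acos(num / den)                      # Eq. (2.36)
    if B > G:
        hue = 2 * math.pi - hue
    return intensity, saturation, hue

For example, rgb_to_ihs(1, 0, 0) gives hue 0 for a saturated red.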

2. Opponent processes. The opponent-process basis uses Cartesian rather than cylindrical coordinates for the color subspace, and was first proposed by Hering [Teevan and Birney 1961]. The simplest form of basis is a linear transformation from R, G, B coordinates. The new coordinates are termed "R - G", "Bl - Y", and "W - Bk":

\begin{bmatrix} R-G \\ Bl-Y \\ W-Bk \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}

The advocates of this representation, such as [Hurvich and Jameson 1957], theorize that this basis has neurological correlates and is in fact the way human beings represent ("name") colors. For example, in this basis it makes sense to talk about a "reddish blue" but not a "reddish green." Practical opponent-process models usually have more complex weights in the transform matrix to account for psychophysical data. Some startling experiments [Land 1977] show our ability to make correct color judgments even when the illumination consists of only two principal wavelengths. The opponent process, at the level at which we have developed it, does not demonstrate how such judgments are made, but does show how stimulus at only two wavelengths will project into the color subspace. Readers interested in the details of the theory should consult the references.

Commercial television transmission needs an intensity, or "W - Bk," component for black-and-white television sets while still spanning the color space. The National Television Systems Committee (NTSC) uses a "YIQ" basis extracted from RGB via


\begin{bmatrix} I \\ Q \\ Y \end{bmatrix} = \begin{bmatrix} 0.60 & -0.28 & -0.32 \\ 0.21 & -0.52 & 0.31 \\ 0.30 & 0.59 & 0.11 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}

This basis is a weighted form of (I, Q, Y) = ("R - cyan," "magenta - green," "W - Bk").
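A minimal Python sketch of this change of basis, using the matrix above; the final check reflects the fact that the chromatic components I and Q vanish for a neutral gray (R = G = B):

import numpy as np

# Rows give I, Q, Y in terms of R, G, B (NTSC weights from the text).
RGB_TO_IQY = np.array([[0.60, -0.28, -0.32],
                       [0.21, -0.52,  0.31],
                       [0.30,  0.59,  0.11]])

def rgb_to_iqy(rgb):
    """rgb: length-3 array-like of R, G, B values."""
    return RGB_TO_IQY @ np.asarray(rgb, dtype=float)

assert np.allclose(rgb_to_iqy([0.5, 0.5, 0.5])[:2], 0.0)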

2.2.6 Digital Images

The digital images with which computer vision deals are represented by m-vector discrete-valued image functions f(\mathbf{x}), usually of one, two, three, or four dimensions. Usually m = 1, and both the domain and range of f(\mathbf{x}) are discrete. The domain of f is finite, usually a rectangle, and the range of f is positive and bounded.

If F(u) = 0 for |u| > 1/(2x_0), then there is no overlap between successive replications of F(u) in the frequency domain. This is shown for the case of Fig. 2.15a, where we have arbitrarily used a triangular-shaped image transform to illustrate the effects of sampling. Incidentally, note that for this transform F(u) = F(-u) and that it has no imaginary part; from Table 2.2, the one-dimensional image must also be real and even. Now if F(u) is not band-limited, i.e., there are u > 1/(2x_0) for which F(u) \neq 0, then components of different replications of F(u) will interact to produce the composite function F_s(u), as shown in Fig. 2.15b. In the first case f(x) can be recovered from F_s(u) by multiplying F_s(u) by a suitable G(u):

G(u) = \begin{cases} x_0 & \text{for } |u| \le 1/(2x_0) \\ 0 & \text{otherwise} \end{cases}

Then f(x) = \mathcal{F}^{-1}[F_s(u)G(u)].

f(x, y) = \int_0^{\pi}\!\!\int F_\theta(\omega) \exp[j2\pi\omega(x\cos\theta + y\sin\theta)]\, |\omega|\, d\omega\, d\theta   (2.53)

Since x' = x\cos\theta + y\sin\theta, rewrite Eq. (2.53) as

f(x, y) = \int_0^{\pi} \mathcal{F}^{-1}\{F_\theta(\omega) H(\omega)\}\, d\theta, \quad H(\omega) = |\omega|

Expanding the expression for d^2, we can see that

d^2(\mathbf{y}) = \sum_{\mathbf{x}} [f^2(\mathbf{x}) - 2 f(\mathbf{x}) t(\mathbf{x} - \mathbf{y}) + t^2(\mathbf{x} - \mathbf{y})]   (3.2)

Notice that \sum_{\mathbf{x}} t^2(\mathbf{x} - \mathbf{y}) is a constant term and can be neglected. When f^2(\mathbf{x}) is approximately constant it too can be discounted, leaving what is called the cross-correlation between f and t:

R_{ft}(\mathbf{y}) = \sum_{\mathbf{x}} f(\mathbf{x}) t(\mathbf{x} - \mathbf{y})   (3.3)

This is maximized when the portion of the image "under" t is identical to t.
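A minimal Python sketch of Eq. (3.3), assuming a square image and template with the template kept wholly inside the image (the correlation-array sizes are discussed next):

import numpy as np

def cross_correlation(image, template):
    """Direct cross-correlation, Eq. (3.3), for square arrays."""
    n, m = image.shape[0], template.shape[0]   # n x n image, m x m template
    out = np.empty((n - m + 1, n - m + 1))
    for y0 in range(n - m + 1):
        for x0 in range(n - m + 1):
            out[y0, x0] = np.sum(image[y0:y0 + m, x0:x0 + m] * template)
    return out   # the offset of the maximum entry locates the best match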

Fig. 3.3 An industrial image and template for a hexagonal nut.

One may visualize the template-matching calculations by imagining the template being shifted across the image to different offsets; then the superimposed values at this offset are multiplied together, and the products are added. The resulting sum of products forms an entry in the "correlation array" whose coordinates are the offsets attained by the source template. If the template is allowed to take all offsets with respect to the image such that some overlap takes place, the correlation array is larger than either the template or the image. An n x n image with an m x m template yields an (n + m - 1) x (n + m - 1) correlation array. If the template is not allowed to shift off the image, the correlation array is (n - m + 1) x (n - m + 1).

... 0 otherwise; S = set of points x', y', d' such that |x - x'| ...

... p and q determined a priori remain constant throughout each iteration. The simplest place to specify a surface gradient is at an occluding contour (see Fig. 3.32), where the gradient is nearly 90\degree to the line of sight. Unfortunately, p and q are infinite at these points. Ikeuchi's elegant solution to this is to use a different coordinate system for gradient space, that of a Gaussian sphere (Appendix 1). In this system, the surface normal is described relative to where it intersects the sphere if the tail of the normal is at the sphere's origin. This is the point at which a plane perpendicular to the normal would touch the sphere if translated toward it (Fig. 3.32b).

In this system the radiance may be described in terms of the spherical coordinates \theta, \phi. For a Lambertian surface

R(\theta, \phi) = \cos\theta \cos\theta_S + \sin\theta \sin\theta_S \cos(\phi - \phi_S)

Fig. 3.34 Optical flow results. (a), (b), and (c) are three frames from the rotating sphere; (d) is the derived three-dimensional flow after 32 such time frames.

3.7 RESOLUTION PYRAMIDS

What is the best spatial resolution for an image? The sampling theorem states that the maximum spatial frequency in the image data must be less than half the sampling frequency in order that the sampled image represent the original unambiguously. However, the sampling theorem is not a good predictor of how easily objects can be recognized by computer programs. Often objects can be more easily recognized in images that have a very low sampling rate. There are two reasons for this. First, the computations are fewer because of the reduction in dimensionality. Second, confusing detail present in the high-resolution versions of the images may not appear at the reduced resolution. But even though some objects are more easily found at low resolutions, usually an object description needs detail only revealed at the higher resolutions. This leads naturally to the notion of a pyramidal image data structure in which the search for objects is begun at a low resolution, and refined at ever-increasing resolutions until one reaches the highest resolution of interest. Figure 3.35 shows the correspondence between pixels for the pyramidal structure. In the next three sections, pyramids are applied to gray-level images and edge images. Pyramids, however, are a very general tool and can be used to represent any image at varying levels of detail.

3.7.1 Gray-Level Consolidation

In some applications, redigitizing the image with a different sampling rate is a way to reduce the number of samples. However, most digitizer parameters are difficult to change, so that often computational means of reduction are needed. A straightforward method is to partition the digitized image into nonoverlapping


neighborhoods of equal size and shape and to replace each of those neighborhoods by the average pixel density in that neighborhood. This operation is consolidation. For an n x n neighborhood, consolidation is equivalent to averaging the original image over the neighborhood followed by sampling at intervals n units apart.

Consolidation tends to offset the aliasing that would be introduced by sampling the sensed data at a reduced rate. This is due to the effects of the averaging step in the consolidation process. For the one-dimensional case where

\hat{f}(x) = \frac{1}{2}[f(x) + f(x + \Delta)]   (3.63)

the corresponding Fourier transform [Steiglitz 1974] is

\hat{F}(u) = \cos\left(\pi \frac{u}{u_0}\right) \exp\left(j\pi \frac{u}{u_0}\right) F(u)   (3.64)

which has magnitude |H(u)| = \cos[\pi(u/u_0)] and phase \pi(u/u_0). The sampling frequency u_0 = 1/\Delta, where \Delta is the spacing between samples. Thus the averaging step has the effect of attenuating the higher frequencies of F(u), as shown in Fig. 3.36. Since the higher frequencies are involved in aliasing, attenuating these frequencies reduces the aliasing effects.

Fig. 3.36 Consolidation effects viewed in the spatial frequency domain. (a) Original transform; (b) transform of averaging operator; (c) transform of averaged image.
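A minimal Python sketch of consolidation and of building a resolution pyramid from repeated consolidations; the function names are illustrative only:

import numpy as np

def consolidate(image, p=2):
    """Average nonoverlapping p x p neighborhoods and sample one value
    per neighborhood, reducing each dimension by a factor of p."""
    h, w = image.shape
    h, w = h - h % p, w - w % p        # trim so neighborhoods tile exactly
    blocks = image[:h, :w].reshape(h // p, p, w // p, p)
    return blocks.mean(axis=(1, 3))

def pyramid(image, levels):
    """Level 0 is the full image; each further level is a consolidation."""
    pyr = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(consolidate(pyr[-1]))
    return pyr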

3.7.2 Pyramidal Structures in Correlation

With correlation matching, the use of multiple-resolution techniques can sometimes provide significant functional and computational advantages [Moravec 1977]. Binary search correlation uses pyramids of the input image and reference

patterns. The algorithm partakes of the computational efficiency of binary (as opposed to linear) search [Knuth 1973]. Further, the low-resolution correlation operations at high levels in the pyramid ensure that the earlier correlations are on gross image features rather than details.

In binary search correlation a feature to be located is at some unknown location in the input image. The reference version of the feature originates in another image, the reference image. The feature in the reference image is contained in a window of n x n pixels. The task of the correlator is to find an n x n window in the input image that best matches the reference-image window containing the feature. The details of the correlation process are given in the following algorithm.

Algorithm 3.6: Binary Search Correlation Control Algorithm

Definitions

OrigReference: an N x N image containing a feature centered at (FeatureX, FeatureY).
OrigInput: an M x M array in which an instance of the Feature is to be located. For simplicity, assume that it is at the same resolution as OrigReference.
n: a window size; an n x n window in OrigReference is large enough to contain the Feature.
Window: an n x n array containing a varying-resolution subimage of OrigReference centered on the Feature.
Input: a 2n x 2n array containing a varying-resolution subimage of OrigInput, centered on the best match for the Feature.
Reference: a temporary array.

Algorithm

1. Input := Consolidate OrigInput by a factor of 2n/M to size 2n x 2n.
2. Reference := Consolidate OrigReference by the same factor 2n/M to size 2nN/M x 2nN/M. This consolidation takes the Feature to a new (FeatureX, FeatureY).
3. Window := n x n window from Reference centered on the new (FeatureX, FeatureY).
4. Calculate the match metric of the window at the (n + 1)^2 locations in Input at which it is wholly contained. Say that the best match occurs at (BestMatchX, BestMatchY) in Input.
5. Input := n x n window from Input centered at (BestMatchX, BestMatchY), enlarged by a factor of 2.
6. Reference := Reference enlarged by a factor of 2. This takes the Feature to a new (FeatureX, FeatureY).
7. Go to 3.

Through time, the algorithm uses a reference image for matching that is always centered on the feature to be matched, but that homes in on the feature by being increased in resolution and thus reduced in linear image coverage by a factor of 2 each time. In the input image, a similar homing-in is going on, but the search area is usually twice the linear dimension of the reference window. Further, the center of the search area varies in the input image as the improved resolution refines the point of best match.

Binary search correlation is for matching features with context. The template at low resolution possibly corresponds to much of the area around the feature, while the feature may be so small in the initial consolidated images as to be invisible. The coarse-to-fine strategy is perfect for such conditions, since it allows gross features to be matched first and to guide the later high-resolution search for best match. Such matching with context is less useful for locating several instances of a shape dotted at random around an image.
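A minimal Python sketch in the spirit of Algorithm 3.6, though not its exact control structure: a full correlation search at the coarsest pyramid level, followed by a small local search, with coordinates doubled, at each finer level. It assumes the pyramid and cross_correlation sketches given earlier, square arrays throughout, and a template that fits inside the image at every level.

import numpy as np

def coarse_to_fine_match(image, template, levels=3):
    img_pyr = pyramid(image, levels)
    tmp_pyr = pyramid(template, levels)
    # Full correlation search only at the coarsest level.
    scores = cross_correlation(img_pyr[-1], tmp_pyr[-1])
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    # At each finer level, double the estimate and search a small neighborhood.
    for k in range(levels - 2, -1, -1):
        img, tmp = img_pyr[k], tmp_pyr[k]
        m = tmp.shape[0]
        cy, cx = 2 * y, 2 * x
        best, best_pos = -np.inf, (cy, cx)
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                yy, xx = cy + dy, cx + dx
                if 0 <= yy <= img.shape[0] - m and 0 <= xx <= img.shape[1] - m:
                    s = np.sum(img[yy:yy + m, xx:xx + m] * tmp)
                    if s > best:
                        best, best_pos = s, (yy, xx)
        y, x = best_pos
    return y, x   # top-left offset of the best match at full resolution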

3.7.3 Pyramidal Structures in Edge Detection

As an example of the use of pyramidal structures in processing, consider the use of such structures in edge detection. This application, after [Tanimoto and Pavlidis 1975], uses two pyramids, one to store the image and another to store the image edges. The idea of the algorithm is that a neighborhood in the low-resolution image where the gray-level values are the same is taken to imply that in fact there is no gray-level change (edge) in the neighborhood. Of course, the low-resolution levels in the pyramid tend to blur the image and thus attenuate the gray-level changes that denote edges. Thus the starting level in the pyramid must be picked judiciously to ensure that the important edges are detected.

Algorithm 3.7: Hierarchical Edge Detection

recursive procedure refine (k, x, y);
begin
  if k < MaxLevel then
    for dx := 0 until 1 do
      for dy := 0 until 1 do
        if EdgeOp (k + 1, 2x + dx, 2y + dy) > Threshold (k + 1)
        then refine (k + 1, 2x + dx, 2y + dy)
end;

Fig. 3.37 Pyramidal edge detection.

procedure FindEdges;
begin
  comment apply operator to every pixel in the starting level s, refining where necessary;
  for x := 0 until 2^s - 1 do
    for y := 0 until 2^s - 1 do
      if EdgeOp (s, x, y) > Threshold (s)
      then refine (s, x, y);
end;
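A minimal Python rendering of this control structure, assuming a pyramid addressing in which pixel (x, y) at level k has children (2x + dx, 2y + dy) at level k + 1; edge_op and threshold are caller-supplied stand-ins for EdgeOp and Threshold, and recording edges only at the finest level is an assumption made here for concreteness.

def find_edges(edge_op, threshold, start_level, max_level):
    """edge_op(k, x, y) -> edge magnitude; threshold(k) -> cutoff at level k."""
    edges = set()
    def refine(k, x, y):
        if k < max_level:
            for dx in (0, 1):
                for dy in (0, 1):
                    if edge_op(k + 1, 2 * x + dx, 2 * y + dy) > threshold(k + 1):
                        refine(k + 1, 2 * x + dx, 2 * y + dy)
        else:
            edges.add((x, y))          # an edge at the finest level examined
    side = 2 ** start_level            # the starting level is 2^s pixels on a side
    for x in range(side):
        for y in range(side):
            if edge_op(start_level, x, y) > threshold(start_level):
                refine(start_level, x, y)
    return edges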

Figure 3.37 shows Tanimoto's results for a chromosome image. The table inset shows the computational advantage, in terms of the calls to the edge operator, as a function of the starting level s.

Similar kinds of edge-detection strategies based on pyramids have been pursued by [Levine 1978; Hanson and Riseman 1978]. The latter effort is a little different in that processing within the pyramid is bidirectional; information from edges detected at a high-resolution level is projected to low-resolution levels of the pyramid.

EXERCISES

3.1 Derive an analytical expression for the response of the Sobel operator to a vertical step edge as a function of the distance of the edge to the center of the operator.
3.2 Use the formulas of Eqs. (3.31) to derive the digital template function for g1 in a 5 x 5 pixel domain.
3.3 Specify a version of Algorithm 3.1 that uses the gradient edge operator instead of the "crack" edge operator.
3.4 In photometric stereo, three or more light-source positions are used to determine a surface orientation. The dual of this problem uses surface orientations to determine light-source position. What is the usefulness of the latter formulation? In particular, how does it relate to Algorithm 3.3?
3.5 Using any one of Algorithms 3.1 through 3.4 as an example, show how it could be modified to use pyramidal data structures.
3.6 Write a reflectance function to capture the "grazing incidence" phenomenon: surfaces become more mirror-like at small angles of incidence (and reflectance).
3.7 Equations 3.49 and 3.50 were derived by minimizing the local error. Show how these equations are modified when the total error [i.e., \sum_{x,y} E(x, y)] is minimized.

REFERENCES

ABDOU, I. E. "Quantitative methods of edge detection." USCIPI Report 830, Image Processing Institute, Univ. Southern California, July 1978.
AKATSUKA, T., T. ISOBE, and O. TAKATANI. "Feature extraction of stomach radiograph." Proc., 2nd IJCPR, August 1974, 324-328.


ANDREWS, H. C. and B. R. HUNT. Digital Image Restoration. Englewood Cliffs, NJ: Prentice-Hall, 1977.
ATTNEAVE, F. "Some informational aspects of visual perception." Psychological Review 61, 1954.
BARROW, H. G. and J. M. TENENBAUM. "Computational vision." Proc. IEEE 69, 5, May 1981, 572-595.
BARROW, H. G. and J. M. TENENBAUM. "Recovering intrinsic scene characteristics from images." Technical Note 157, AI Center, SRI International, April 1978.
BINFORD, T. O. "Visual perception by computer." Proc., IEEE Conf. on Systems and Control, Miami, December 1971.
BLINN, J. E. "Computer display of curved surfaces." Ph.D. dissertation, Computer Science Dept., Univ. Utah, 1978.
FREI, W. and C. C. CHEN. "Fast boundary detection: a generalization and a new algorithm." IEEE Trans. Computers 26, 2, October 1977, 988-998.
GONZALEZ, R. C. and P. WINTZ. Digital Image Processing. Reading, MA: Addison-Wesley, 1977.
GRIFFITH, A. K. "Edge detection in simple scenes using a priori information." IEEE Trans. Computers 22, 4, April 1973.
HANSON, A. R. and E. M. RISEMAN (Eds.). Computer Vision Systems (CVS). New York: Academic Press, 1978.
HORN, B. K. P. "Determining lightness from an image." CGIP 3, 4, December 1974, 277-299.
HORN, B. K. P. "Shape from shading." In PCV, 1975.
HORN, B. K. P. and B. G. SCHUNCK. "Determining optical flow." AI Memo 572, AI Lab, MIT, April 1980.
HORN, B. K. P. and R. W. SJOBERG. "Calculating the reflectance map." Proc., DARPA IU Workshop, November 1978, 115-126.
HUBEL, D. H. and T. N. WIESEL. "Brain mechanisms of vision." Scientific American, September 1979, 150-162.
HUECKEL, M. "An operator which locates edges in digitized pictures." J. ACM 18, 1, January 1971, 113-125.
HUECKEL, M. "A local visual operator which recognizes edges and lines." J. ACM 20, 4, October 1973, 634-647.
IKEUCHI, K. "Numerical shape from shading and occluding contours in a single view." AI Memo 566, AI Lab, MIT, revised February 1980.
KIRSCH, R. A. "Computer determination of the constituent structure of biological images." Computers and Biomedical Research 4, 3, June 1971, 315-328.
KNUTH, D. E. The Art of Computer Programming. Reading, MA: Addison-Wesley, 1973.
LEVINE, M. D. "A knowledge-based computer vision system." In CVS, 1978.
LIU, H. K. "Two- and three-dimensional boundary detection." CGIP 6, 2, 1977, 123-134.
MARR, D. and T. POGGIO. "Cooperative computation of stereo disparity." Science 194, 1976, 283-287.
MARR, D. and T. POGGIO. "A theory of human stereo vision." AI Memo 451, AI Lab, MIT, November 1977.
MERO, L. and Z. VASSY. "A simplified and fast version of the Hueckel operator for finding optimal edges in pictures." Proc., 4th IJCAI, September 1975, 650-655.
MORAVEC, H. P. "Towards automatic visual obstacle avoidance." Proc., 5th IJCAI, August 1977, 584.
NEVATIA, R. "Evaluation of a simplified Hueckel edge-line detector." Note, CGIP 6, 6, December 1977, 582-588.
PHONG, B-T. "Illumination for computer generated pictures." Commun. ACM 18, 6, June 1975, 311-317.
PINGLE, K. K. and J. M. TENENBAUM. "An accommodating edge follower." Proc., 2nd IJCAI, September 1971, 1-7.

PRAGER, J. M. "Extracting and labeling boundary segments in natural scenes." IEEE Trans. PAMI 2, 1, January 1980, 16-27.
PRATT, W. K. Digital Image Processing. New York: Wiley-Interscience, 1978.
PREWITT, J. M. S. "Object enhancement and extraction." In Picture Processing and Psychopictorics, B. S. Lipkin and A. Rosenfeld (Eds.). New York: Academic Press, 1970.
QUAM, L. and M. J. HANNAH. "Stanford automated photogrammetry research." AIM-254, Stanford AI Lab, November 1974.
ROBERTS, L. G. "Machine perception of three-dimensional solids." In Optical and Electro-optical Information Processing, J. P. Tippett et al. (Eds.). Cambridge, MA: MIT Press, 1965.
ROSENFELD, A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
ROSENFELD, A., R. A. HUMMEL, and S. W. ZUCKER. "Scene labelling by relaxation operations." IEEE Trans. SMC 6, 1976, 430.
RUSSELL, D. L. (Ed.). Calculus of Variations and Control Theory. New York: Academic Press, 1976.
SHAPIRA, R. "A technique for the reconstruction of a straight-edge, wire-frame object from two or more central projections." CGIP 3, 4, December 1974, 318-326.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
STEIGLITZ, K. An Introduction to Discrete Systems. New York: Wiley, 1974.
STOCKHAM, T. J., Jr. "Image processing in the context of a visual model." Proc. IEEE 60, 7, July 1972, 828-842.
TANIMOTO, S. and T. PAVLIDIS. "A hierarchical data structure for picture processing." CGIP 4, 2, June 1975, 104-119.
TRETIAK, O. J. "A parametric model for edge detection." Proc., 3rd COMPSAC, November 1979, 884-887.
TURNER, K. J. "Computer perception of curved objects using a television camera." Ph.D. dissertation, Univ. Edinburgh, 1974.
WECHSLER, H. and J. SKLANSKY. "Finding the rib cage in chest radiographs." Pattern Recognition 9, 1977, 21-30.
WHITTED, T. "An improved illumination model for shaded display." Commun. ACM 23, 6, June 1980, 343-349.
WOODHAM, R. J. "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity." Proc., 22nd International Symp., Society of Photo-optical Instrumentation Engineers, San Diego, CA, August 1978, 136-143.
ZUCKER, S. W. and R. A. HUMMEL. "An optimal three-dimensional edge operator." Report 79-10, McGill Univ., April 1979.
ZUCKER, S. W., R. A. HUMMEL, and A. ROSENFELD. "An application of relaxation labeling to line and curve enhancement." IEEE Trans. Computers 26, 1977.


SEGMENTED IMAGES

(Part-opener figure: the knowledge base, with analogical and analogical/propositional models; generalized image; segmented image; geometric structures; relational structures; edge following; texture; motion.)

The idea of segmentation has its roots in work by the Gestalt psychologists (e.g., Kohler), who studied the preferences exhibited by human beings in grouping or organizing sets of shapes arranged in the visual field. Gestalt principles dictate certain grouping preferences based on features such as proximity, similarity, and continuity. Other results had to do with figure/ground discrimination and optical illusions. The latter have provided a fertile ground for vision theories to post-Gestaltists such as Gibson and Gregory, who emphasize that these grouping mechanisms organize the scene into meaningful units that are a significant step toward image understanding.

In computer vision, grouping parts of a generalized image into units that are homogeneous with respect to one or more characteristics (or features) results in a segmented image. The segmented image extends the generalized image in a crucial respect: it contains the beginnings of domain-dependent interpretation. At this descriptive level the internal domain-dependent models of objects begin to influence the grouping of generalized image structures into units meaningful in the domain. For instance, the model may supply crucial parameters to segmentation procedures.

In the segmentation process there are two important aspects to consider: one is the data structure used to keep track of homogeneous groups of features; the other is the transformation involved in computing the features.

Two basic sorts of segments are natural: boundaries and regions. These can be combined into a single descriptive structure, a set of nodes (one per region), connected by arcs representing the "adjacency" relation. The "dual" of this structure has arcs corresponding to boundaries connecting nodes representing points where several regions meet. Chapters 4 and 5 describe segmentation with respect to boundaries and regions respectively, emphasizing gray levels and gray-level differences as indicators of segments. Of course, from the standpoint of the


algorithms involved, it is irrelevant whether the features are intensity gray levels or intrinsic image values perhaps representing motion, color, or range.

Texture and motion images are addressed in Chapters 6 and 7. Each has several computationally difficult aspects, and neither has received the attention given static, nontextured images. However, each is very important in the segmentation enterprise.


4 BOUNDARY DETECTION

4.1 ON ASSOCIATING EDGE ELEMENTS

Boundaries of objects are perhaps the most important part of the hierarchy of structures that links raw image data with their interpretation [Marr 1975]. Chapter 3 described how various operators applied to raw image data can yield primitive edge elements. However, an image of only disconnected edge elements is relatively featureless; additional processing must be done to group edge elements into structures better suited to the process of interpretation. The goal of the techniques in this chapter is to perform a level of segmentation, that is, to make a coherent one-dimensional (edge) feature from many individual local edge elements. The feature could correspond to an object boundary or to any meaningful boundary between scene entities. The problems that edge-based segmentation algorithms have to contend with are shown by Fig. 4.1, which is an image of the local edge elements yielded by one common edge operator applied to a chest radiograph. As can be seen, the edge elements often exist where no meaningful scene boundary does, and conversely often are absent where a boundary is. For example, consider the boundaries of ribs as revealed by the edge elements. Missing edge elements and extra edge elements both tend to frustrate the segmentation process.

The methods in this chapter are ordered according to the amount of knowledge incorporated into the grouping operation that maps edge elements into boundaries. "Knowledge" means implicit or explicit constraints on the likelihood of a given grouping. Such constraints may arise from general physical arguments or (more often) from stronger restrictions placed on the image arising from domain-dependent considerations. If there is much knowledge, this implies that the global form of the boundary and its relation to other image structures is very constrained. Little prior knowledge means that the segmentation must proceed more on the basis of local clues and evidence and general (domain-independent) assumptions, with fewer expectations and constraints on the final resulting boundary.

Fig. 4.1 Edge elements in a chest radiograph.

These constraints take many forms. Knowledge of where to expect a boundary allows very restricted searches to verify the edge. In many such cases, the domain knowledge determines the type of curve (its parameterization or functional form) as well as the relevant "noise processes." In images of polyhedra, only straight-edged boundaries are meaningful, and they will come together at various sorts of vertices arising from corners, shadows of corners, and occlusions. Human rib boundaries appear approximately like conic sections in chest radiographs, and radiographs have complex edge structures that can compete with rib edges. All this specific knowledge can and should guide our choice of grouping method.

If less is known about the specific image content, one may have to fall back on general world knowledge or heuristics that are true for most domains. For instance, in the absence of evidence to the contrary, the shorter line between two points might be selected over a longer line. This sort of general principle is easily built into evaluation functions for boundaries, and used in segmentation algorithms that proceed by methodically searching for such groupings. If there are no a priori restrictions on boundary shapes, a general contour-extraction method is called for, such as edge following or linking of edge elements.

The methods we shall examine are the following:

1. Searching near an approximate location. These are methods for refining a boundary given an initial estimate.
2. The Hough transform. This elegant and versatile technique appears in various guises throughout computer vision. In this chapter it is used to detect boundaries whose shape can be described in an analytical or tabular form.
3. Graph searching. This method represents the image of edge elements as a graph. Thus a boundary is a path through a graph. Like the Hough transform, these techniques are quite generally applicable.

4. Dynamic programming. This method is also very general. It uses a mathematical formulation of the globally best boundary and can find boundaries in noisy images.
5. Contour following. This hill-climbing technique works best with good image data.

4.2 SEARCHING NEAR AN APPROXIMATE LOCATION

If the approximate or a priori likely location of a boundary has been determined somehow, it may be used to guide the effort to refine that boundary [Kelly 1971]. The approximate location may have been found by one of the techniques below applied to a lower-resolution image, or it may have been determined using high-level knowledge.

4.2.1 Adjusting A Priori Boundaries

This idea was described by [Bolles 1977] (see Fig. 4.2). Local searches are carried out at regular intervals along directions perpendicular to the approximate (a priori) boundary. An edge operator is applied to each of the discrete points along each of these perpendicular directions. For each such direction, the edge with the highest magnitude is selected from among those whose orientations are nearly parallel to the tangent at the point on the nearby a priori boundary. If sufficiently many elements are found, their locations are fit with an analytic curve such as a low-degree polynomial, and this curve becomes the representation of the boundary.
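A minimal Python sketch of this search, with edge_magnitude and edge_orientation standing in for whatever edge operator is in use, and with angle wraparound ignored for brevity:

import math

def refine_boundary(points, tangents, edge_magnitude, edge_orientation,
                    half_length=5, angle_tol=0.3):
    """points: sample points on the a priori boundary; tangents: their
    tangent angles. Returns the strongest nearly-tangent edge found along
    each perpendicular; the result would then be fit with an analytic curve."""
    refined = []
    for (x, y), t in zip(points, tangents):
        nx, ny = -math.sin(t), math.cos(t)       # unit normal to the boundary
        best, best_pt = 0.0, None
        for s in range(-half_length, half_length + 1):
            px, py = int(round(x + s * nx)), int(round(y + s * ny))
            mag = edge_magnitude(px, py)
            # keep only edges nearly parallel to the local tangent
            if mag > best and abs(edge_orientation(px, py) - t) < angle_tol:
                best, best_pt = mag, (px, py)
        if best_pt is not None:
            refined.append(best_pt)
    return refined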

Fig. 4.2 Search orientations from an approximate boundary location.

4.2.2 Non-Linear Correlation in Edge Space

In this correlation-like technique, the a priori boundary is treated as a rigid template, or piece of rigid wire along which edge operators are attached like beads. The a priori representation thus also contains relative locations at which the existence of edges will be tested (Fig. 4.3). An edge element returned by the edge operator application "matches" the a priori boundary if its contour is tangent to the template and its magnitude exceeds some threshold. The template is to be moved around the image, and for each location, the number of matches is computed. If the number of matches exceeds a threshold, the boundary location is declared to

Fig. 4.3 A template for edge-operator application.

be the current template location. If not, the template is moved to a different image point and the process is repeated. Either the boundary will be located or there will eventually be no more image points to try.

4.2.3 Divide-and-Conquer Boundary Detection

This is a technique that is useful in the case that a low-curvature boundary is known to exist between two edge elements and the noise levels in the image are low (Algorithm 8.1). In this case, to find a boundary point in between the two known points, search along the perpendiculars of the line joining the two points. The point of maximum magnitude (if it is over some threshold) becomes a break point on the boundary, and the technique is applied recursively to the two line segments formed between the three known boundary points. (Some fix must be applied if the maximum is not unique.) Figure 4.4 shows one step in this process. Divide-and-conquer boundary detection has been used to outline kidney boundaries on computed tomograms (these images were described in Section 2.3.4) [Selfridge et al. 1979].
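A minimal Python sketch of one reading of this technique, searching the perpendicular at the midpoint of the current segment; edge_magnitude is a stand-in for the edge operator, and min_gap bounds the recursion:

import math

def divide_and_conquer(p0, p1, edge_magnitude, threshold,
                       half_length=5, min_gap=2.0):
    """Return an ordered list of boundary points between p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length < min_gap:
        return [p0, p1]
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    nx, ny = -dy / length, dx / length            # unit perpendicular
    best, best_pt = threshold, None               # accept only above-threshold maxima
    for s in range(-half_length, half_length + 1):
        px, py = mx + s * nx, my + s * ny
        mag = edge_magnitude(int(round(px)), int(round(py)))
        if mag > best:
            best, best_pt = mag, (px, py)
    if best_pt is None:
        return [p0, p1]                           # no break point found; stop here
    left = divide_and_conquer(p0, best_pt, edge_magnitude, threshold,
                              half_length, min_gap)
    right = divide_and_conquer(best_pt, p1, edge_magnitude, threshold,
                               half_length, min_gap)
    return left[:-1] + right                      # splice, dropping the duplicate point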

Fig. 4.4 Divide-and-conquer technique.


Fig. 4.5 A line (a) in image space; (b) in parameter space.

4.3 THE HOUGH METHOD FOR CURVE DETECTION

T