
Localization of Sensors and Objects in Distributed Omnidirectional Vision

Takushi Sogo


Abstract

Recent progress in multimedia and computer graphics is producing practical application systems based on simple computer vision techniques. In particular, a practical approach that has recently attracted attention is the use of multiple vision sensors with simple visual processing.

The objective of this research is to introduce a distributed omnidirectional vision system and to develop various application systems. The system consists of a large number of omnidirectional vision sensors located in the environment and connected with a computer network. Compared to existing systems using standard vision sensors, the distributed omnidirectional vision system can provide wide scene coverage with fewer vision sensors. In addition, the system can observe an object from various viewpoints, which enables robust recognition of the object. These merits allow us to develop practical vision systems.

In this thesis, we study various techniques in the distributed omnidirectional vision system, especially in the following respects: methods for measuring the sensor and object locations by observation, methods for measuring the sensor locations by observation without complex numerical expressions, and the development of application systems.

The first issue deals with one of the most fundamental and important techniques in the distributed omnidirectional vision system. The distributed omnidirectional vision system has different aspects in localization compared to existing multiple camera systems. In addition, various methods should be considered according to the situation, e.g., whether the locations of the sensors and/or objects are known. The second issue addresses localization of the sensors by observation when the locations of reference objects are unknown. If the locations of the objects are unknown, it is in general difficult to compute the sensor locations by solving complex numerical expressions on account of the many unknown parameters. The third issue deals with various problems to be solved in order to realize application systems. The advantages of the distributed omnidirectional vision system are shown through the development of application systems.

With respect to the above issues, the thesis studies the following topics.

1. Methods for localizing the sensors and objects

We classified the localization methods into five groups, and studied various aspects of the methods. When the locations of the sensors are known, objects are localized by omnidirectional stereo, where the use of multiple sensors improves the measurement precision. When the locations of objects are known, the sensors are localized by triangulation, where the reference objects should be carefully selected in order to avoid localization errors. The sensors are also localized by observing the azimuth angles to other sensors and applying triangle constraints.

2. Statistical and qualitative methods for localizing the sensors

We have proposed two methods for localizing the sensors without complex numerical expressions. In the statistical localization of the sensors, the method observes objects, analyzes the azimuth angles, and estimates the baselines among the sensors, from which the sensors are localized using triangle constraints. In the qualitative localization, the method observes the motion directions of objects, applies three point constraints, and determines the qualitative locations of the sensors. This method directly acquires the qualitative representation from the qualitatively observed information.

3. Development of application systems


In the robot navigation system, we have proposed a method for navigating robots without precise sensor parameters using the distributed omnidirectional vision system. The system performs complicated robot tasks with simple visual processing, using the rich visual information provided by many sensors. In the real-time human tracking system, we have proposed N-ocular stereo for verifying the correspondence among multiple targets and measuring their locations in real time using multiple omnidirectional vision sensors. In addition, several methods have been developed for compensating observation errors, in order to improve the localization precision of omnidirectional stereo.

Thus, in this research we have made a systematic study of the distributed omnidirectional vision system, from the localization methods for the sensors and objects, as the most fundamental techniques, to the development of practical application systems. The techniques investigated in the thesis form the foundations of the distributed omnidirectional vision system, which opens a promising research direction in multiple camera systems.


Acknowledgments

I would like to express my sincere gratitude to Professor Toru Ishida for his continuous guidance, valuable advice, and helpful discussions. I would also like to thank Professor Hiroshi Ishiguro at Wakayama University for his patient and inspiring guidance. I gratefully acknowledge the valuable comments of the other members of my thesis committee, Professor Yahiko Kambayashi and Professor Tetsuro Sakai at Kyoto University.

I wish to thank Professor Mohan M. Trivedi at the University of California, San Diego, where I completed an important part of the work in this thesis. I am also thankful to his laboratory members for their kind support.

I am deeply indebted to Professor Tomomasa Sato at the University of Tokyo for his sharp and valuable criticism. His comments were very helpful in improving the quality of my papers.

I also wish to express my special thanks to Dr. Kurt G. Konolige at SRI International, where I gained valuable experience in vision research during my five-month stay.

Brief mention should be made of the work by my colleagues at Professor Ishida's laboratory. Goichi Tanaka developed an early version of the robot navigation system. Ryusuke Sagawa wrote a program for an early version of the real-time human tracking system.

Finally, I wish to thank all members of Professor Ishida's laboratory for their help and fruitful discussions.

This research was supported in part by a grant from the Japan Society for the Promotion of Science.


Contents

1 Introduction
  1.1 Background
    1.1.1 Approaches from Multimedia Systems
    1.1.2 Approaches from Intelligent Robots
  1.2 Distributed Omnidirectional Vision System
  1.3 Research Issues
  1.4 Thesis Outline

2 Quantitative Approaches
  2.1 Observation for Localizing Sensors and Objects
  2.2 Method M-1: Localization of Objects by Stereo
    2.2.1 Omnidirectional Stereo
    2.2.2 Uncertainty of Omnidirectional Stereo
    2.2.3 Using Multiple Omnidirectional Sensors
  2.3 Method M-2: Localization of Sensors by Observing Reference Objects
  2.4 Method M-3: Localization of Sensors by Observing Sensors
    2.4.1 Triangle Constraint
    2.4.2 Method M-3FOE: Estimation of Baselines among Sensors Based on FOE
  2.5 Method M-4 & M-5: Localization of Sensors and Objects by Observing Objects
  2.6 Summary

3 Statistical Approaches
  3.1 Statistical Estimation of the Baseline Directions among the Sensors
  3.2 Implementation of the Algorithm
  3.3 Increase Ratio of the Reliability
  3.4 Experimentation
    3.4.1 Simulated Environment
    3.4.2 Real Environment
  3.5 Discussion
    3.5.1 Statistical Estimation of the Object Correspondence
    3.5.2 Classification of the Statistical Methods
  3.6 Summary

4 Qualitative Approaches
  4.1 Introduction
    4.1.1 Vision Systems Using Qualitative Maps
    4.1.2 Representing Qualitative Locations
  4.2 Localization of the Sensors Based on Qualitative Information
    4.2.1 Previous Work
    4.2.2 Qualitative Spatial Model
    4.2.3 Qualitative Observation
    4.2.4 Overview of the Acquisition Process
    4.2.5 Acquiring Three Point Constraints
    4.2.6 Constraint Propagation
    4.2.7 Formalization of the Constraint Propagation
    4.2.8 Transforming into the Qualitative Spatial Model
  4.3 Experimental Results
    4.3.1 Verification in a Simple Environment
    4.3.2 Application to a Complex and Realistic Environment
    4.3.3 Observation Errors
  4.4 Discussion
    4.4.1 Completeness of the Algorithm
    4.4.2 Computational Costs
    4.4.3 Distributed Computation
    4.4.4 Stability of Detection of Motion Directions
  4.5 Application
  4.6 Determining Qualitative Locations by Observation
  4.7 Summary

5 Integration of the Fundamental Techniques toward Real Applications
  5.1 Merits and Limitations
  5.2 Integration of the Localization Methods
    5.2.1 Identification of Moving Objects
    5.2.2 Qualitative Localization and Object Tracking
    5.2.3 Verification of Baselines by FOE
    5.2.4 Stepwise Localization of the Sensors
  5.3 Examples
    5.3.1 Robot Navigation System
    5.3.2 Activity Monitoring System

6 Mobile Robot Navigation
  6.1 Introduction
  6.2 Development of a Prototype System
    6.2.1 Architecture
    6.2.2 Fundamental Functions for Robot Navigation
    6.2.3 Task Teaching
    6.2.4 Navigation of Mobile Robots
  6.3 Experimentation
  6.4 Discussion
    6.4.1 Localization Methods for Robust Processing
    6.4.2 Previous Work
  6.5 Summary

7 Real-Time Human Tracker
  7.1 Introduction
  7.2 Localization of Targets by N-Ocular Stereo
    7.2.1 Correspondence Problem and Trinocular Stereo
    7.2.2 Problems of Conventional Methods
    7.2.3 Basic Algorithm of N-Ocular Stereo
    7.2.4 Localization of Targets and Error Handling
    7.2.5 False Matches in N-Ocular Stereo
  7.3 Implementing N-Ocular Stereo
    7.3.1 Simplified N-Ocular Stereo
    7.3.2 Error Handling in the Simplified N-Ocular Stereo
  7.4 Experimentation
    7.4.1 Hardware Configuration
    7.4.2 Measurement Precision of N-ocular Stereo
    7.4.3 Tracking People
  7.5 Applications
    7.5.1 Monitoring System
    7.5.2 Gesture Recognition System
  7.6 Summary

8 Conclusion

Bibliography

Publications

A Generating Perspective Images
  A.1 Mirror Shapes and Omnidirectional Images
  A.2 Transforming an Omnidirectional Image into a Perspective Image
  A.3 Fast Transformation Using Lookup Tables

B Estimating Camera Parameters by Observation
  B.1 Estimation Method
  B.2 Estimation Error

List of Figures

1.1 Compact omnidirectional vision sensor
1.2 Distributed omnidirectional vision system
2.1 Configuration of the sensors
2.2 Observation for localizing sensors and/or objects
2.3 Binocular stereo with omnidirectional vision sensors
2.4 Unstable localization
2.5 Omnidirectional image
2.6 Uncertainty of omnidirectional binocular stereo
2.7 Uncertainty of conventional binocular stereo
2.8 Uncertainty of omnidirectional stereo with four sensors
2.9 Localization of omnidirectional vision sensor (top view)
2.10 Selection of reference objects that yields an unstable solution
2.11 Triangle constraint
2.12 Propagation of triangle constraints
2.13 FOE constraints
2.14 Localizing three sensors by observing five objects of unknown positions
3.1 Basic idea for baseline estimation
3.2 Three different positions on the baseline
3.3 Configuration of sensors and an object where at least one of the sensors observes the object in the baseline direction
3.4 Configuration of the sensors and an object where both of the sensors observe the object in a direction other than the baseline
3.5 Sensor configuration
3.6 The number of detected baselines in the simulated environment
3.7 Outdoor experimentation
3.8 Unwrapped image
3.9 The number of detected baselines in the real environment
3.10 Detected baselines in the real environment
3.11 Basic idea for correspondence estimation
4.1 Qualitative representation of positions
4.2 Observation for acquiring qualitative positions
4.3 Qualitative representation of positions
4.4 Qualitative representation of positional relations among three points
4.5 Process for acquiring the qualitative spatial model
4.6 Seven regions defined with three points
4.7 Three point constraints
4.8 An example of possible SCPs and 3PCs
4.9 An example of constraint propagation
4.10 Classifications for the constraint propagation
4.11 SCPs including the points on the classifying line
4.12 Transformation into the qualitative spatial model
4.13 Simple environment with 10 vision sensors and a moving object
4.14 The number of acquired components in the simple environment
4.15 Simple environment with 20 vision sensors and a moving object
4.16 The number of components acquired with 20 vision sensors in the simple environment
4.17 Complex environment with 35 vision sensors and 8 moving objects
4.18 The number of components acquired with 35 vision sensors in the complex environment
4.19 Qualitative positions of the vision sensors depicted based on the acquired 3PCs
4.20 Force F which acts on X with respect to a triangle ABC
4.21 The number of acquired and wrong components in a noisy environment
4.22 Motion directions detected by background subtraction and template matching
4.23 A model town and mobile robots
4.24 Observation by a moving robot for acquiring qualitative maps
4.25 Method Q-1: Qualitative localization of objects
4.26 Method Q-2: Qualitative localization of the sensors
4.27 Method Q-3: Qualitative localization of the sensors
5.1 Correspondence of objects
5.2 A three point constraint acquired from an SCP "{AC}, {BD}"
5.3 Qualitative localization of objects based on observation of motion directions of objects
5.4 An example of qualitative localization of objects based on the motion direction
5.5 Stepwise localization of the sensors
5.6 Adjacent sensors
5.7 Verification of the correspondence
6.1 A fully autonomous mobile robot with cameras mounted on it
6.2 A mobile robot navigated by cameras located in the environment
6.3 The architecture of the robot navigation system
6.4 Detecting free regions
6.5 Navigation paths taught by a human operator
6.6 Communication between VAs and a robot in the navigation phase
6.7 Generating a navigation plan
6.8 Model town
6.9 Hardware configuration
6.10 Images processed by VAs
6.11 A sequence of photographs of two robots being navigated by the system
6.12 Overlaps of the visual fields of the sensors
7.1 Localization in the real-time tracking system
7.2 Localization of a target considering observation errors
7.3 Localization of a target by binocular stereo
7.4 False matches in N-ocular stereo
7.5 Simplified N-ocular stereo
7.6 Error compensation
7.7 Overview of the real-time human tracking system
7.8 Detecting targets by background subtraction
7.9 Measurement precision of N-ocular stereo
7.10 Trajectories of a walking person
7.11 Tracking people with eight sensors
7.12 Showing people's locations and images
7.13 Range space search for synthesizing an image at a virtual viewpoint
7.14 Modeling human motions with multiple omnidirectional vision sensors
7.15 Screen shot of a gesture recognition system
A.1 Optical properties of the hyperboloidal mirror
A.2 Generating perspective images
A.3 R-z plane
B.1 Estimating camera parameters
B.2 Estimation error Δαi
B.3 Images used for estimating αi


List of Tables

2.1 Localization types of sensors and/or objects
5.1 Properties of each localization method
7.1 Averages and errors of the measured locations
B.1 αi of vision agents estimated by observation


Chapter 1

Introduction

1.1 Background

Recent progress in multimedia and computer graphics is producing practical application systems based on simple computer vision techniques. In particular, a practical approach that has recently attracted attention is the use of multiple vision sensors with simple visual processing. For example, several systems track people and automobiles in the real environment with multiple vision sensors, and other systems analyze their behaviors. On the other hand, recent research activities in robotics involve intelligent robots that perform complicated tasks using not only the vision sensors mounted on the robots, but also those located in the environment.

Thus, activities that aim to introduce a new platform based on multiple vision sensors are increasing in both multimedia research and intelligent robotics.

1.1.1 Approaches from Multimedia Systems

Generally, the purpose of research on multiple camera systems from the viewpoint of multimedia systems is to acquire more precise three-dimensional models of objects than those acquired by single camera systems, or to obtain wide scene coverage with multiple sensors, in order to realize various application systems. For example, if the system can acquire precise models, it can be applied to virtual reality; wide scene coverage is useful in monitoring systems. Thus, we can expect various application systems to be realized with multiple camera systems.

Boyd and others have developed MPI Video (Multi Perspective Interactive Video) [Boyd98], which builds precise three-dimensional models of people based on stereo with multiple cameras whose parameters have been precisely measured in advance. With the models, the system synthesizes video streams at an arbitrary viewpoint in the environment, realizing, for example, an interactive TV. Furthermore, this system has been developed into a monitoring system for a town, with which users can access a database to obtain various information such as images and behaviors of targets.

Kanade and others have proposed Virtualized Reality [Kanade99], which synthesizes video streams at an arbitrary viewpoint using multiple cameras that are calibrated precisely. This practical system employs a robust stereo method, called multiple baseline stereo [Okutomi93], to build precise three-dimensional models and provide low-noise images.

Matsuyama and others have proposed Distributed Cooperative Vision [Matsuyama98], which consists of observation stations located in a town and robots equipped with vision sensors. The goal of the research project is to realize monitoring systems, traffic control systems, teleconference systems, navigation systems for robots and handicapped people, etc. One of the systems in this project utilizes multiple cameras to track people, where the locations are measured by binocular stereo. In order to realize real-time cooperation of the cameras, a new framework called Dynamic Memory [Matsuyama00] has been proposed.

Grimson and others have proposed a system that tracks people and automobiles with multiple cameras, and statistically analyzes and classifies their behaviors [Grimson98]. For example, the system can detect a suspicious person in a parking lot. The locations of the targets are measured not by stereo, but by projecting the view of each camera onto a common ground plane. In this system, the sensor parameters are automatically computed by observation.

1.1.2 Approaches from Intelligent Robots

The purpose of research on multiple camera systems in robotics is to realize vision systems that support robots by providing visual information taken with multiple cameras located in the environment, since it is hard for a single robot to obtain sufficient visual information for performing complicated tasks.

Hoover and others have proposed the concept of sensor networked robotics [Hoover00], where multiple vision sensors located in the environment are used to support the visual functions of robots. In this system, the locations of the robots and obstacles are represented in an occupancy map, with which precise path planning of the robot is performed. Compared to a single robot equipped with several sensors on board, the system can perform robust observation and planning of complicated paths by using sensors located in the environment, even if there are objects other than the robots, such as people and obstacles, in the environment.

Robotic Room, proposed by Sato and others [Nishida97], is an intelligent room for patients. Instead of developing a robot that can perform complicated tasks to support them, the room has various facilities connected by a computer network, such as vision sensors, robotic arms, and tactile sensors, which are usually mounted on a robot, in order to support the activities of patients in homes and hospital rooms.

Thus, there are various research activities that attempt to realize novel vision systems using multiple vision sensors. Most of them, especially in the area of computer vision, are developing multiple camera systems to obtain more precise models of objects than a single camera system, or to provide synthesized images at an arbitrary viewpoint.

On the other hand, several projects attempt to develop an intelligent environment for robots and people. This approach is changing the paradigm of vision research, and will involve various research areas such as vision, robotics, computer networks, and even cognitive science for understanding the behaviors of the people observed by the system.

Figure 1.1: Compact omnidirectional vision sensor. The height is about 6 cm including a CCD camera unit.

1.2 Distributed Omnidirectional Vision System

The objective of this research is to introduce a distributed omnidirectional vision system and to develop various application systems. The system consists of a large number of omnidirectional vision sensors (see Figure 1.1), which are embedded in the environment and connected with a computer network, and monitors the environment as shown in Figure 1.2. Compared to standard vision sensors, multiple omnidirectional vision sensors provide a wide range of view, as well as rich and redundant visual information, which enables robust recognition of the targets and the environment. Thus, it is expected that the distributed omnidirectional vision system opens a new application area in both robotics and computer vision.


Figure 1.2: Distributed omnidirectional vision system.

Although various omnidirectional vision sensors have been proposed so far, the distributed omnidirectional vision system employs the sensor shown in Figure 1.1, which has been developed by Ishiguro [Ishiguro98]. Since the sensor is low-cost and compact compared to other existing ones, we can develop various application systems using many omnidirectional vision sensors.

1.3 Research Issues

In this thesis, we propose the distributed omnidirectional vision system, and study various techniques in the system, especially in the following respects:

1. development of methods for localizing sensors and objects in the distributed omnidirectional vision system,

2. development of localization methods that measure the sensor locations without complex numerical expressions, and

3. development of application systems.

The first issue deals with one of the most fundamental and important techniques in the distributed omnidirectional vision system. The distributed omnidirectional vision system has different aspects in localization compared to existing multiple camera systems; that is, target locations are measured by omnidirectional stereo using a pair of omnidirectional vision sensors located apart from each other. In addition, various methods should be considered according to the situation, e.g., whether the locations of the sensors and/or objects are known. The thesis systematically discusses these localization methods.

The second issue addresses localization of the sensors by observation when the locations of reference objects are unknown. Since the system consists of a large number of sensors, localization methods are necessary that autonomously compute the locations of the sensors by observing objects in the environment. If the locations of the objects are unknown, it is in general hard to compute the sensor locations by solving complex numerical expressions on account of the many unknown parameters. For this issue, the thesis takes two different approaches, i.e., statistical and qualitative approaches.

The third issue deals with various problems to be solved in order to realize application systems. In addition, the advantages of the distributed omnidirectional vision system are shown through the development of two application systems, i.e., a navigation system for mobile robots and a real-time human tracking system.

1.4 Thesis Outline

In Chapter 2, several localization methods are introduced that measure metrical locations of the sensors and objects by observation. Basically, these methods are based on omnidirectional stereo, which is an omnidirectional version of binocular stereo and measures precise locations from precisely observed information. This chapter also discusses the measurement precision specific to omnidirectional stereo, and the difficulty of localizing the sensors by observation without any known parameters.

Chapter 3 introduces a statistical method for detecting the baselines among the sensors, from which the sensor locations can be computed by triangulation. The method observes objects in the environment, and finds the baselines based on statistical analysis. The method is integrated with the quantitative methods explained in Chapter 2 to perform autonomous localization of the sensors and objects. This chapter also discusses a statistical method for estimating the correspondence among the projections of objects in the views of the sensors.

In Chapter 4, a new method is proposed for localizing the sensors based on qualitative information, i.e., the motion directions of objects observed by the sensors, which can be considered a purely qualitative localization method of a kind that has not been proposed before. In addition, several other qualitative localization methods are discussed. Although most existing vision systems make use of metrical maps of the sensors and objects, it is important to discuss qualitative approaches, since several existing systems employ qualitative maps as well as metrical maps.

Chapter 5 summarizes the localization methods explained in Chapters 2 through 4, showing the various merits and limitations they have and the conditions under which they can be used. It is also important to consider the integration of the methods to solve various application-specific problems. As an example, this chapter gives a stepwise localization method for the sensors, with which the system autonomously localizes the sensors by observing objects in the environment.

Chapters 6 and 7 present two application systems. In Chapter 6, a navigation system for mobile robots is explained. Generally, it is hard for a single robot equipped with on-board vision sensors to perform complicated tasks in the real environment. On the other hand, the robot can accomplish complicated tasks with the help of sensors located around it. Based on this idea, the system works as an infrastructure for mobile robots, and navigates them using the omnidirectional vision sensors embedded in the environment.

In Chapter 7, a real-time human tracking system is introduced. The system measures the locations of people by omnidirectional stereo in real time. When applying omnidirectional stereo to the real-time tracking system, we have to consider the observation errors, so several error compensation techniques are employed. Based on the redundant visual information taken with many omnidirectional vision sensors, this practical vision system performs robust localization with simple computer vision techniques.

Chapter 8 concludes the thesis with a summary of the results of this research, and discusses remaining issues.


Chapter 2

Quantitative Approaches

One of the most fundamental and important techniques in the distributed omnidirectional vision system is localization of the sensors and objects by observation. There are various conventional methods for localizing the sensors and objects, such as calibration of camera parameters using reference patterns, map building of the environment, and localization of objects.

In this chapter, the localization methods for the sensors and objects, including conventional ones and those especially introduced for the distributed omnidirectional vision system, are classified into five groups and discussed in detail.

2.1 Observation for Localizing Sensors and Objects

In general, some methods measure locations in 3-D space, and others on a 2-D plane. This thesis mainly discusses the latter methods (i.e., those which localize the sensors and objects based on information obtained by observing them with the sensors) under the following assumptions:

• The sensors stand vertically at the same height from the floor (see Figure 2.1).

• The sensors can observe each other, or objects, in all horizontal directions.

• Localization is performed on the plane where the sensors are.

Figure 2.1: Configuration of the sensors.

In some cases, the object may have a vision sensor and observe the sensors located in the environment in order to localize itself or the sensors. However, in this chapter, in order to simplify the discussion, it is assumed that the objects have no vision sensor and only the sensors located in the environment observe other sensors or the objects.


Table 2.1: Localization types of sensors and/or objects. Only the five types of localization indicated with numbers are possible.

Type  Sensor locations  Observation targets  Target locations  Parameters to be acquired
 -    Known             Sensors              Known             Sensors *1
 -    Known             Sensors              Known             Objects *2
 -    Known             Objects              Known             Sensors *1
 -    Known             Objects              Known             Objects *1
 -    Known             Sensors              Unknown *3        Sensors
 -    Known             Sensors              Unknown *3        Objects
 -    Known             Objects              Unknown           Sensors *1
 1    Known             Objects              Unknown           Objects
 -    Unknown           Sensors              Known *3          Sensors
 -    Unknown           Sensors              Known *3          Objects
 2    Unknown           Objects              Known             Sensors
 -    Unknown           Objects              Known             Objects *1
 3    Unknown           Sensors              Unknown           Sensors
 -    Unknown           Sensors              Unknown           Objects *2
 4    Unknown           Objects              Unknown           Sensors
 5    Unknown           Objects              Unknown           Objects

*1: locations are already known.
*2: trying to localize the objects by observing the sensors.
*3: conflict of known/unknown about the sensor locations.

With this assumption, observations for localizing and identifying sensors and objects are classified as shown in Table 2.1. Each column of the table has two possible values (known/unknown and sensors/objects), so sixteen types of localization can be considered. However, only the five of them indicated with numbers are possible for localizing the sensors and/or objects:

Type 1 Observe objects of unknown locations from sensors of known locations to localize objects (see Figure 2.2 (a)).

Type 2 Observe objects of known locations from sensors of unknown locations to localize sensors (see Figure 2.2 (b)).

Type 3 Observe sensors of unknown locations from sensors of unknown locations to localize sensors (see Figure 2.2 (c)).

Type 4 Observe objects of unknown locations from sensors of unknown locations to localize sensors (see Figure 2.2 (d)).

Type 5 Observe objects of unknown locations from sensors of unknown locations to localize objects (see Figure 2.2 (d)).

Figure 2.2: Observation for localizing sensors and/or objects. (a) Type 1; (b) Type 2; (c) Type 3; (d) Types 4 and 5.

The other observations shown in Table 2.1 are impossible. For example, the first row of the table is meaningless; it measures the locations of the sensors even though they are already known.

In the following sections, detailed discussions of methods M-1 through M-5 are given, each of which corresponds to one of the localization types 1 through 5.

Figure 2.3: Binocular stereo with omnidirectional vision sensors (top view).

2.2 Method M-1: Localization of Objects by Stereo

2.2.1 Omnidirectional Stereo

Given the precise sensor parameters (i.e., locations and directions of the sensors), objects in the environment can be identified and localized with a pair of omnidirectional vision sensors in the same way as with conventional binocular stereo.

In Figure 2.3, in order to compute the 2-D location of the point M with two omnidirectional vision sensors, each sensor observes the azimuth angle to M (shown as the broken lines in Figure 2.3). Since the angle is represented in the internal coordinates of each sensor, it is transformed into a representation in the world reference frame using the sensor parameters. Then, the intersection of the lines is taken as the location of M.
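As an illustration of this computation, the following minimal Python sketch (not from the thesis) intersects the two observation rays; the function name and interface are hypothetical. A target on or near the baseline makes the rays nearly parallel, which is exactly the unstable case discussed in the next subsection.

```python
import math

def triangulate(p1, th1, phi1, p2, th2, phi2):
    """Localize a point M by omnidirectional binocular stereo (top view).

    p1, p2     : known 2-D sensor locations
    th1, th2   : reference (zero) directions of the sensors in the world frame
    phi1, phi2 : azimuth of M measured in each sensor's internal coordinates
    Returns the intersection of the two observation rays, or None if the rays
    are (nearly) parallel, i.e. M lies along the baseline.
    """
    a1, a2 = th1 + phi1, th2 + phi2              # azimuths in the world frame
    d1 = (math.cos(a1), math.sin(a1))
    d2 = (math.cos(a2), math.sin(a2))
    det = d1[0] * (-d2[1]) + d2[0] * d1[1]       # 2x2 determinant (Cramer's rule)
    if abs(det) < 1e-9:
        return None
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (bx * (-d2[1]) + d2[0] * by) / det      # distance along the first ray
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two sensors 2 m apart, both with zero reference direction, observing (1, 2).
print(triangulate((0.0, 0.0), 0.0, math.atan2(2, 1),
                  (2.0, 0.0), 0.0, math.atan2(2, -1)))   # ~ (1.0, 2.0)
```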

The difference between omnidirectional stereo using omnidirectional vision sensors and conventional stereo using standard rectilinear vision sensors is that omnidirectional stereo assumes an arbitrary configuration of the sensors and objects, which involves the uncertainty problem in omnidirectional stereo explained in the next subsection.


Figure 2.4: Unstable localization.

2.2.2 Uncertainty of Omnidirectional Stereo

The problem in omnidirectional stereo is that the measurement precision of a target location becomes very low when the target is located along the baseline of the two sensors [Ishiguro92]. For example, the target M2 in Figure 2.4 is located near the baseline of sensors 1 and 2, so the target location measured by them is unstable. Generally, this problem does not occur in conventional stereo, but it does in omnidirectional stereo, since omnidirectional stereo assumes arbitrary locations of sensors and targets.

In order to compute the uncertainty of omnidirectional stereo, we first consider the angular resolution of the omnidirectional sensors. Figure 2.5 is an example of an omnidirectional image whose size is 640 × 480 pixels. The resolution of the image along the circumference in the horizontal direction (indicated with the white circle) is approximately 800 pixels, so the sensor has a resolution of approximately 0.5 to 1 degree.

Figure 2.6 shows the uncertainty of omnidirectional stereo using two omnidirectional vision sensors. The angular error of the sensors is ±0.5 degree, and the baseline length is 2 m. The dots represent the uncertainty at each location. As shown in Figure 2.6, omnidirectional stereo performs localization with a high degree of accuracy near the sensors except near the baseline, since the baseline is very long compared to conventional stereo. On the other hand, the measurement is very noisy near the baseline.

The uncertainty of conventional binocular stereo with standard vision sensors is shown in Figure 2.7 for comparison. The baseline length is 20 cm.


Figure 2.5: Omnidirectional image.

Here, it is assumed that the viewing angle of the sensors is 30 degrees, and the horizontal resolution is 640 pixels. Hence the maximum angular error of the standard sensors is about ±0.03 degree. The error is much smaller than that of the omnidirectional stereo, since the omnidirectional vision sensor observes 360 degrees in omnidirectional stereo, while the standard vision sensor observes only 30 degrees. However, by comparing Figure 2.6 with Figure 2.7, we can see that, on account of the long baseline, omnidirectional stereo has almost the same accuracy as conventional stereo except near the baseline, in spite of its large angular errors.
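The uncertainty plots in Figures 2.6 through 2.8 can be reproduced numerically by perturbing the two true azimuths by the angular quantization error and re-triangulating. The sketch below is illustrative only (it is not the procedure used in the thesis); the names are hypothetical and NumPy is assumed to be available.

```python
import itertools
import math
import numpy as np

def intersect(p1, az1, p2, az2):
    """Intersect two azimuth rays (world-frame angles) cast from p1 and p2."""
    d1 = np.array([math.cos(az1), math.sin(az1)])
    d2 = np.array([math.cos(az2), math.sin(az2)])
    t = np.linalg.solve(np.column_stack([d1, -d2]), np.asarray(p2) - np.asarray(p1))
    return np.asarray(p1) + t[0] * d1

def uncertainty_spread(p1, p2, target, err_deg=0.5):
    """Largest deviation of the triangulated position when each true azimuth
    is perturbed by +/- err_deg (the sensor's angular quantization)."""
    az1 = math.atan2(target[1] - p1[1], target[0] - p1[0])
    az2 = math.atan2(target[1] - p2[1], target[0] - p2[0])
    err = math.radians(err_deg)
    worst = 0.0
    for e1, e2 in itertools.product((-err, err), repeat=2):
        try:
            est = intersect(p1, az1 + e1, p2, az2 + e2)
            worst = max(worst, float(np.linalg.norm(est - np.asarray(target))))
        except np.linalg.LinAlgError:            # the perturbed rays became parallel
            worst = math.inf
    return worst

# 2 m baseline: a point well off the baseline vs. a point almost on it.
print(uncertainty_spread((0.0, 0.0), (2.0, 0.0), (1.0, 2.0)))   # a few centimetres
print(uncertainty_spread((0.0, 0.0), (2.0, 0.0), (4.0, 0.2)))   # of the order of a metre
```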

2.2.3 Using Multiple Omnidirectional Sensors

Figure 2.6: Uncertainty of omnidirectional binocular stereo (top view). The baseline length is 2 m.

In the distributed omnidirectional vision system, many combinatorial pairs of the sensors can be used for omnidirectional stereo. Figure 2.8 shows the uncertainty of omnidirectional stereo using four sensors. The angular error is the same as in Figure 2.6. In Figure 2.8, the smallest uncertainty at each location, given by the best sensor pair, is shown. Thus, by selecting a proper sensor pair, omnidirectional stereo covers a wide area with a high degree of accuracy. In other words, it is very important to select a proper sensor pair in order to perform fine localization.

Figure 2.8: Uncertainty of omnidirectional stereo with four sensors (top view).

Furthermore, by using more than two sensors for localization, it is expected that the object location can be measured precisely based on the redundant information from the multiple sensors.

Figure 2.7: Uncertainty of conventional binocular stereo (top view). The viewing angle of the cameras is 30 degrees and the baseline length is 20 cm.

2.3 Method M-2: Localization of Sensors by Observing Reference Objects

If the locations of the sensors are unknown but those of objects are known, the locations of the sensors can be computed by observing the objects. Basically, various conventional methods for calibrating camera parameters using many calibration points of known locations can be applied to sensor localization. Here, the following method is considered.

In order to localize the sensor, it observes the objects of known locations as reference points and measures the angle θ12 between the points (objects) M1 and M2 (see Figure 2.9 (a)). By this observation, the sensor is located along circular arc A12. By the sine rule, the radius of A12 is l12 / (2 sin θ12), where l12 is the length of line segment M1M2. In the same way, the sensor is located along circular arcs A12, A13, and A23 by observation of M1, M2, and M3 (see Figure 2.9 (b)). Hence, the sensor location C is estimated at the intersection of the circular arcs. Note that, in this method, all of the reference points (objects) should be identified, since we use their locations for sensor localization. Instead of identifying multiple objects, however, it is also possible for the sensor to observe a single moving object of known location and measure the directions to the object at various locations.
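As a hedged illustration of method M-2, the sketch below recovers the sensor position by nonlinear least squares on the subtended angles, rather than by explicitly intersecting the circular arcs; it assumes SciPy is available, that the measured angles lie in (0, π], and that a rough initial estimate is given. All names are hypothetical.

```python
import math
from scipy.optimize import least_squares

def localize_from_references(refs, angles, x0):
    """Estimate a 2-D sensor position from the angles subtended at the sensor
    by pairs of reference objects of known location (method M-2).

    refs   : list of (x, y) reference points M1, M2, ...
    angles : dict {(i, j): theta_ij} of measured angles (radians, in (0, pi])
             between the directions to refs[i] and refs[j]
    x0     : initial estimate of the sensor position
    """
    def residuals(c):
        res = []
        for (i, j), theta in angles.items():
            bi = math.atan2(refs[i][1] - c[1], refs[i][0] - c[0])
            bj = math.atan2(refs[j][1] - c[1], refs[j][0] - c[0])
            subtended = abs((bi - bj + math.pi) % (2 * math.pi) - math.pi)
            res.append(subtended - theta)
        return res
    return least_squares(residuals, x0).x

# Example: the true sensor sits at (1.0, -2.0) and observes three references.
refs = [(0.0, 0.0), (3.0, 0.0), (1.5, 2.0)]
true = (1.0, -2.0)
def subtended(i, j):
    bi = math.atan2(refs[i][1] - true[1], refs[i][0] - true[0])
    bj = math.atan2(refs[j][1] - true[1], refs[j][0] - true[0])
    return abs((bi - bj + math.pi) % (2 * math.pi) - math.pi)
angles = {(0, 1): subtended(0, 1), (0, 2): subtended(0, 2), (1, 2): subtended(1, 2)}
print(localize_from_references(refs, angles, x0=(0.5, -1.0)))   # ~ [1.0, -2.0]
```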

The measurement precision depends on the configuration of the sensor and the objects, which is similar to the case of standard camera localization based on triangulation. As discussed in [Madsen98], a proper combination of objects should be selected for a precise measurement of the sensor location. For example, with the configuration of the sensor and the objects shown in Figure 2.10, it is difficult to obtain a stable solution since the circular arcs intersect at very small angles.

Figure 2.9: Localization of omnidirectional vision sensor (top view). The sensor is located (a) along circular arc A12 with two reference objects, and (b) at point C with three objects.

Figure 2.10: Selection of reference objects that yields an unstable solution.

In general, quantitative methods [Nagel86], which use triangulation, stereo techniques, range sensors, and so on, are based on the accumulation of precise metrical information. Triangulation, for example, is generally sensitive to sensory noise and accumulates errors, especially for some configurations of landmarks [Madsen98], so that proper error models and noise filtering techniques are necessary for quantitative methods [Roach80, Matthies87, Ayache89, Broida90].

2.4 Method M-3: Localization of Sensors by Observing Sensors

In methods M-3, M-4, and M-5, the locations of both the sensors and the objects are unknown. In this section, we first discuss method M-3, i.e., localization of the sensors by observing one another.

2.4.1 Triangle Constraint

If the sensors are large enough to find each other in their views, the directions from each sensor to the other sensors can be obtained. Kato and others [Kato99] proposed a method for identifying and localizing the sensors with triangle constraints, as described below:

1. At each sensor, measure the directions to the other sensors, then compute the angles between every pair of directions (see Figure 2.11 (a)).

2. For every possible triplet of sensors, find a triplet of angles that add up to 180 degrees (triangle constraint, see Figure 2.11 (b)). This is a triangle candidate that consists of the three sensors.

Figure 2.11: Triangle constraint. (a) Observations by three sensors. (b) Find triangle candidates that consist of the three sensors and satisfy the triangle constraint, i.e., θ1 + θ2 + θ3 = π. Note that this constraint alone is not enough to determine the correspondence of the sensors (see text).

This method assumes that the sensors can find other sensors and measure the directions to them, but cannot identify them. Note that, with a single triangle candidate, the correspondence between the actual sensors and the projections on their retinas cannot be determined. In Figure 2.11 (b), for example, the triangle can be flipped, so the correspondence among the projections S11 through S32 in Figure 2.11 (a) cannot be determined from a single triangle candidate. In addition, the same angle may be observed by the sensors, which yields wrong triangle candidates. Therefore, triangle verification is necessary:

1. For a triangle candidate found in the above process (let this be a reference triangle Tref, which consists of sensors S1, S2, and S3), find an adjacent triangle (let this be Tadj) that shares one of the sides of triangle Tref (see Figure 2.12). Tadj contains the fourth sensor S4.

2. Check the locations of S1 through S4 using Tref and Tadj. If the sensors can be placed consistently, the triangle candidates are possible; if not, they are impossible.

3. Iterate the above steps for the remaining triangle candidates, using the sensor locations that have already been checked in the previous iterations (propagation of triangle constraints).

Figure 2.12: Propagation of triangle constraints.

In order to obtain a correct solution, all of the sensor locations should be refined (e.g., by a least squares method) whenever the propagation is performed. Thus, this method performs identification and localization of the sensors at the same time.
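A minimal sketch of the triangle-candidate search in step 2 above is given below (the verification and propagation steps are not included); the data layout and names are hypothetical.

```python
import itertools
import math

def triangle_candidates(observed_angles, tol=math.radians(1.0)):
    """Find triplets of observed angles, one per sensor, that add up to
    180 degrees within a tolerance (the triangle constraint).

    observed_angles : dict {sensor_id: [angles]} where each list contains the
                      angles (radians) between pairs of directions in which
                      that sensor sees other (unidentified) sensors
    Returns (sensor triplet, angle triplet) pairs; these are only candidates
    and still require the verification and propagation described in the text.
    """
    candidates = []
    for s1, s2, s3 in itertools.combinations(observed_angles, 3):
        for a1, a2, a3 in itertools.product(observed_angles[s1],
                                            observed_angles[s2],
                                            observed_angles[s3]):
            if abs(a1 + a2 + a3 - math.pi) < tol:
                candidates.append(((s1, s2, s3), (a1, a2, a3)))
    return candidates

# Toy example: three sensors forming an equilateral triangle, plus one spurious angle.
obs = {"A": [math.radians(60)],
       "B": [math.radians(60)],
       "C": [math.radians(60), math.radians(100)]}
print(triangle_candidates(obs))   # one candidate: ("A", "B", "C") with 60-60-60
```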

2.4.2 Method M-3FOE: Estimation of Baselines among Sensors Based on FOE

In the above method, the sensors need to observe each other. However, if the sensors are compact, they cannot detect other sensors in their views. In this case, the directions to the other sensors can be estimated by observing visual features in the environment as follows.

Let us suppose that two sensors observe the environment. If both of them observe two visual features with an interval of 180 degrees, the direction of the features indicates the other sensor's direction (FOE constraint, see Figure 2.13). Thus, the baselines among the sensors are obtained regardless of the environment structure, and the sensor locations can be computed in the same way as described in Section 2.4.1.

The FOE constraints can be found only if there are visual features along the baseline that can be identified with template matching, etc. If not, the method cannot obtain some of the FOE constraints.


Figure 2.13: FOE constraints. Observation of two visual features by two sensors with an interval of 180 degrees implies that there is a baseline between them.

However, the sensor locations can still be computed by the triangle constraints and their propagation as described in the previous subsection, even if not all of the FOE constraints among the sensors have been obtained.
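The following sketch illustrates one way the FOE constraint could be checked, assuming the features have already been matched across the two sensors (e.g., by template matching); the tolerance and the data layout are illustrative assumptions, not part of the original method.

```python
# Sketch of the FOE constraint: matched features that appear 180 degrees apart
# in BOTH sensors lie on the baseline, so their azimuths give the baseline
# direction at each sensor.
from itertools import combinations
from math import pi

def opposite(a, b, tol=0.03):
    d = abs(a - b) % (2 * pi)
    return abs(min(d, 2 * pi - d) - pi) < tol

def baseline_from_foe(features1, features2):
    """features1/2: dict feature_id -> azimuth observed by sensor 1 / sensor 2
    (features assumed matched across sensors).  Returns the baseline azimuth
    seen from each sensor, or None if no FOE pair is found."""
    common = set(features1) & set(features2)
    for f, g in combinations(common, 2):
        if opposite(features1[f], features1[g]) and opposite(features2[f], features2[g]):
            return features1[f], features2[f]   # directions along the baseline
    return None
```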

2.5 Method M-4 & M-5: Localization of Sensors and Objects by Observing Objects

The method described in the previous section can be used when the sensors can observe each other. In this section, we discuss methods that can be used when the sensors cannot observe each other but can observe objects of unknown locations in the environment.


Figure 2.14: Localizing three sensors by observing five objects of unknown positions. The sensors observe the objects in the environment.

Since it is basically hard to measure the sensor locations when the system consists of a large number of sensors, localization methods are necessary that can autonomously measure the sensor locations by observing objects in the environment.

Let us suppose that the sensors measure their locations by observing the directions to reference points of unknown locations as shown in Fig. 2.14. The following equation is an example representation of the locations of the sensors and objects:

\tan(\phi_{ij} + \theta_j) = \frac{m_{iy} - S_{jy}}{m_{ix} - S_{jx}},    (2.1)

where φ_ij represents the azimuth angle to object m_i measured from sensor S_j, θ_j represents the reference direction (zero direction) of sensor S_j, and (S_jx, S_jy) and (m_ix, m_iy) represent the locations of sensor S_j and object m_i, respectively. Note that, as shown in Figure 2.14, S_Ax = S_Ay = θ_A = 0.

In order to compute the sensor locations (S_jx, S_jy) and the reference directions θ_j, we have to solve nonlinear simultaneous equations obtained from Equation 2.1. For example, the following equations give θ_B and θ_C:

\frac{(X_1 - X_3)\{X_1\tan(\phi_{1B}+\theta_B) - X_2\tan(\phi_{2B}+\theta_B)\} - (X_1 - X_2)\{X_1\tan(\phi_{1B}+\theta_B) - X_3\tan(\phi_{3B}+\theta_B)\}}{(X_1 - X_2)\{\tan(\phi_{3C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\} - (X_1 - X_3)\{\tan(\phi_{2C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\}}
= \frac{(X_1 - X_4)\{X_1\tan(\phi_{1B}+\theta_B) - X_2\tan(\phi_{2B}+\theta_B)\} - (X_1 - X_2)\{X_1\tan(\phi_{1B}+\theta_B) - X_4\tan(\phi_{4B}+\theta_B)\}}{(X_1 - X_2)\{\tan(\phi_{4C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\} - (X_1 - X_4)\{\tan(\phi_{2C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\}}    (2.2)

\frac{(X_1 - X_3)\{X_1\tan(\phi_{1B}+\theta_B) - X_2\tan(\phi_{2B}+\theta_B)\} - (X_1 - X_2)\{X_1\tan(\phi_{1B}+\theta_B) - X_3\tan(\phi_{3B}+\theta_B)\}}{(X_1 - X_2)\{\tan(\phi_{3C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\} - (X_1 - X_3)\{\tan(\phi_{2C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\}}
= \frac{(X_1 - X_5)\{X_1\tan(\phi_{1B}+\theta_B) - X_2\tan(\phi_{2B}+\theta_B)\} - (X_1 - X_2)\{X_1\tan(\phi_{1B}+\theta_B) - X_5\tan(\phi_{5B}+\theta_B)\}}{(X_1 - X_2)\{\tan(\phi_{5C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\} - (X_1 - X_5)\{\tan(\phi_{2C}+\theta_C) - \tan(\phi_{1C}+\theta_C)\}}    (2.3)

where X_i is:

X_i = \frac{\tan\phi_{iA} - \tan(\phi_{iC}+\theta_C)}{\tan\phi_{iA} - \tan(\phi_{iB}+\theta_B)}.    (2.4)

These nonlinear simultaneous equations are sensitive to observation errors, especially for some configurations of the objects [Madsen98]. They can also be solved by numerical methods which compute approximate solutions; however, proper initial estimates are necessary.

Note that, in the scenario of this section, the equations representing the sensor and object locations are more sensitive than those of conventional triangulation, since the locations of the reference points (objects) are also unknown. In addition, if the objects are not small points and have complex shapes, the measurements of the directions to the objects may be noisy in practice. Therefore, iterative observation and noise filtering techniques are necessary [Roach80, Broida90].
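In practice, such a system would typically be handed to a generic nonlinear least-squares solver rather than solved symbolically. The sketch below, assuming SciPy is available, minimizes the bearing residuals of Equation 2.1 with sensor A fixed at the origin (θ_A = 0); the parameter packing, gauge fixing, and variable names are illustrative assumptions, and a reasonable initial estimate x0 is still required, as noted above.

```python
# Hedged sketch: numerical solution of Equation 2.1 by nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

def residuals(params, obs, n_sensors, n_objects):
    # params: [S_1x, S_1y, theta_1, ..., m_1x, m_1y, ...]; sensor 0 is the
    # fixed reference (origin, zero direction).  obs: list of (i_obj, j_sensor, phi_ij).
    poses = np.concatenate(([0.0, 0.0, 0.0], params[:3 * (n_sensors - 1)]))
    poses = poses.reshape(n_sensors, 3)                 # (x, y, theta) per sensor
    objects = params[3 * (n_sensors - 1):].reshape(n_objects, 2)
    res = []
    for i, j, phi in obs:
        dx, dy = objects[i] - poses[j, :2]
        predicted = np.arctan2(dy, dx) - poses[j, 2]    # predicted azimuth
        err = np.arctan2(np.sin(predicted - phi), np.cos(predicted - phi))
        res.append(err)                                 # wrapped bearing residual
    return np.array(res)

# usage (x0 is a rough initial guess, which also fixes the unobservable scale):
# sol = least_squares(residuals, x0, args=(obs, n_sensors, n_objects))
```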


2.6 Summary

In this chapter, several methods for measuring sensor and object locations have been discussed. These quantitative methods measure precise locations by reducing observation errors based on the rich and redundant visual information obtained with the omnidirectional vision sensors. This is one of the advantages of the distributed omnidirectional vision system.

However, with respect to methods M-4 and M-5, which localize the sensors and objects without any known parameters, it is basically difficult to solve the nonlinear simultaneous equations for localizing them, as shown in Section 2.5.

In the following chapters, we introduce sensor localization methods based on both statistical and qualitative approaches, which compute the locations of the sensors without complex numerical expressions.


Chapter 3

Statistical Approaches

In this chapter, we introduce a method that statistically detects the baselines among the sensors, with which the sensors are localized by triangulation. The method observes the directions to objects in the environment, memorizes them, and finds the baseline directions among the sensors by statistical analysis.

3.1 Statistical Estimation of the Baseline Directions among the Sensors

Figure 3.1 illustrates the basic idea of this method. There are two omnidirectional sensors, and an object is moving around them. When the object is at a, b, or c in Figure 3.1, sensors 1 and 2 observe it in the same direction. On the other hand, when the object is at d, e, or f, sensor 1 observes it in the same direction, but sensor 2 observes it in different directions. Thus, the sensors observe the object in the same direction only when the object is on the baseline of the sensors. By memorizing pairs of the directions to moving objects, the direction of the baseline between two sensors can be found.

Figure 3.1: Basic idea for baseline estimation.

Figure 3.2: Three different positions on the baseline.

In omnidirectional stereo, the object may be located at three different positions on the baseline as shown in Figure 3.2. For the actual directions to the other sensor (let these directions be θ1 and θ2), the following three direction pairs are obtained:

1. θ1 and θ2 (see Figure 3.2 (a)),

2. (θ1 +π) and θ2 (see Figure 3.2 (b)),

3. θ1 and (θ2 +π) (see Figure 3.2 (c)).

Theoretically, the direction pair (θ1 + π) and (θ2 + π) is not obtained, except in the case where sensor 1 observes an object in (θ1 + π) and sensor 2 observes another object in (θ2 + π), which rarely happens. By checking the number of observations of each pair, the actual direction pair to the other sensor, θ1 and θ2, is estimated.


3.2 Implementation of the Algorithm

Based on the above discussion, the baseline directions among the sensors are statistically estimated as follows. For sensor i, let the direction to the j-th object be θ_ij, which is represented in each sensor's local coordinate frame. The method memorizes a direction pair P_{i1 j1, i2 j2} = (θ_{i1 j1}, θ_{i2 j2}) in a baseline candidate set P.

Assuming that the sensors may be accidentally moved in the real environment, the method dynamically estimates the baseline directions based on real-time information of the observed directions as follows:

1. Observe the directions to objects.

2. For every combination of the directions P_{i1 j1, i2 j2} observed in step 1, do the following steps:

(a) Initialize P_dec: P_dec ← P. P_dec is the set of direction pairs whose reliability as a baseline direction should be decreased.

(b) Compare P_{i1 j1, i2 j2} with every element of P (let this be p_k):

- If both of the two directions θ_{i1 j1} and θ_{i2 j2} of P_{i1 j1, i2 j2} are the same as those of p_k, p_k is considered a pair of correct baseline directions. Increase the reliability of p_k, and remove it from P_dec.

- If only one of the two directions of P_{i1 j1, i2 j2} is the same as p_k's, p_k is considered a pair of wrong baseline directions. Leave p_k in P_dec.

(c) Decrease the reliability of the elements in P that are also contained in P_dec. If the reliability becomes smaller than a threshold, remove the element from P.


(d) In step 2b, if both of the two directions of P_{i1 j1, i2 j2} are different from those of all p_k, P_{i1 j1, i2 j2} is considered a new baseline direction candidate. Add it to P.

In step 2b, direction α is considered identical to direction β if α = β ± π (i.e., α is exactly the opposite direction to β). After several observations, the baseline candidates in P whose reliability is greater than a threshold are considered correct baseline directions.
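The update of the baseline candidate set can be summarized in a short sketch. The following Python fragment is one possible reading of the procedure above (in particular, candidates unrelated to the current observation are left untouched, and the ±π identification and the final threshold test are assumed to be handled by the caller); it is not the original implementation.

```python
# Sketch of the reliability update for baseline candidates (Section 3.2).
def update_candidates(P, observed_pairs, c_i=5, c_d=1, upper=500, prune=0):
    """P: dict {(dir1, dir2): reliability}.  observed_pairs: direction pairs
    (direction at sensor 1, direction at sensor 2) from the current frame."""
    for pair in observed_pairs:
        if pair in P:                        # both directions agree: reinforce
            P[pair] = min(P[pair] + c_i, upper)
        else:                                # unseen pair: new baseline candidate
            P[pair] = c_i
        conflicting = [pk for pk in P        # pairs sharing exactly one direction
                       if pk != pair and (pk[0] == pair[0] or pk[1] == pair[1])]
        for pk in conflicting:               # conflicting candidates lose reliability
            P[pk] -= c_d
            if P[pk] < prune:
                del P[pk]
    return P
```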

3.3 Increase Ratio of the Reliability

The quality of the results depends on the increase ratio of the reliability. For example, the method may detect many wrong baselines with a high ratio (e.g., increase : decrease = 10 : 1), while it cannot detect baselines with a low ratio (e.g., 1 : 10). In this section, we discuss how a proper ratio can be determined for baseline estimation.

Figure 3.3 shows the case where an object is located on the baseline between sensors 1 and 2. In this figure, θ indicates the angular resolution of the sensors, and k1 and k2 indicate the observed azimuth angles (0 ≤ k1θ, k2θ < π/2; k1 and k2 are integers). δ1 and δ2 (0 ≤ δ1, δ2 < θ) indicate the differential angles between the actual baseline direction and the direction of the reference axis of each sensor (the zero direction). d is the baseline length and D is the observable range of the sensors. N is the number of different angles, which is given by 2π/θ.

If the object is located in R (light gray regions in Figure 3.3), both of the sensors observe it in the baseline direction. On the other hand, if the object is located in S (dark gray regions), one of the sensors observes it in the baseline direction, but the other does not.

When the object is located in R, the direction pair (k1, k2) = (0, 0) is obtained, which indicates the baseline direction, and the reliability of the pair is increased by c_i. On the other hand, when the object is located in S, the direction pairs (k1, k2) = (0, ∗) and (k1, k2) = (∗, 0) ("∗" takes an arbitrary value except 0) are obtained, and the reliability of the pair (0, 0) is decreased by c_d.

Figure 3.3: Configuration of sensors and an object (top view) where at least one of the sensors observes the object in the baseline direction.

Hence, in order to avoid removal of the pair (0, 0), the increase ratio c_i and the decrease ratio c_d of the reliability of direction pairs should satisfy the following inequality:

c_i R > c_d S,    (3.1)

where R and S indicate the sizes of the regions R and S, respectively.

Next, the case is considered where the object is located in an arbitrary place other than the regions R and S shown in Figure 3.3 (see Figure 3.4). In Figure 3.4, when the object is located in R', the reliability of the direction pair (k1, k2) = (a, b) is increased by c_i. On the other hand, when the object is located in S', the direction pairs (k1, k2) = (a, ∗) and (k1, k2) = (∗, b) ("∗" takes an arbitrary value except b and a, respectively) are obtained, and the reliability of the pair (a, b) is decreased by c_d.

Figure 3.4: Configuration of the sensors and an object (top view) where both of the sensors observe the object in a direction other than the baseline.

Since the pair (a, b) does not indicate the baseline direction, it should be removed. Hence, c_i and c_d should satisfy the following inequality:

c_i R' < c_d S',    (3.2)

where R' and S' indicate the sizes of the regions R' and S', respectively. Inequalities 3.1 and 3.2 should be satisfied for arbitrary δ1, δ2, and d (0 < d < 2D, where d is the baseline length and D is the observable range). Consequently, c_i and c_d should satisfy:

S/R < c_i/c_d < S'/R'.    (3.3)


Figure 3.5: Sensor configuration (top view).

However, c_i and c_d that satisfy the above inequality for arbitrary δ1, δ2, and d do not exist, since:

- When δ1 and δ2 are close to 0 or θ (i.e., the difference between the reference axis and the actual baseline direction becomes large), R becomes small and S becomes large, so S/R (the left side of Inequality 3.3) becomes large.

- When d is relatively small compared to D (i.e., the two sensors are very close; for example, d < 0.5D), or d is larger than D, S'/R' (the right side of Inequality 3.3) becomes small.

In order to determine c_i and c_d that satisfy Inequality 3.3, the other values such as δ1, δ2, and d should be limited to a specific range. For example, the results of preliminary experimentation show that the ratio c_i/c_d = 7.0 approximately satisfies Inequality 3.3 on condition that 0.5D ≤ d ≤ 1.0D, θ = 2π/52, and the error margin of k1 and k2 is ±1.

3.4 Experimentation

We have evaluated the method in both a simulated and a real environment. Figure 3.5 shows the configuration of the four omnidirectional vision sensors, which is the same in both environments (actually, the sensor parameters were measured in the experimentation in the real environment). The sensors can observe the direction to objects with a resolution of 360°/52. The reliability of a baseline candidate is increased by 5 and decreased by 1. The upper limit of the reliability is 500, and the threshold above which candidates are considered correct is 250.

Figure 3.6: The number of detected baselines in the simulated environment, over 100,000 frames, for one, three, and five objects.

3.4.1 Simulated Environment

In the simulation, several objects are randomly placed in the environment, and the sensors measure the direction to the objects. As shown in Figure 3.6, when there is one object in the environment, the method has found all of the six baselines after 100,000 observations. With three objects, it has also found six baselines; however, two of them were duplicates located next to actual baselines, and two of the actual baselines have not been found. In the case of five objects, the method has found sixteen baselines on account of false matches of the objects. All of the six correct baselines were included in the sixteen, four of the remaining ten were next to the actual ones, and the remaining six were completely wrong.

Figure 3.7: Outdoor experimentation.

3.4.2 Real Environment

We have evaluated the method in an outdoor environment, with the same sensor configuration as in the simulation (see Figure 3.7). In this experimentation, the sensors detect objects (usually walking people) by background subtraction (see Figure 3.8) and measure the direction to the objects. In Figure 3.8, four omnidirectional images are shown, and the graphs at the bottom of each image show vertical sums of the intensity difference. The horizontal center of each peak is considered as the direction to the object.

Figure 3.8: Unwrapped image. Each of the white peaks indicates the object direction detected by background subtraction.

In the real environment, we need to ignore stationary objects, since the same direction pair is continuously yielded by observing a stationary object, which leads to an unexpected increase of the reliability of a direction pair that is not of the baseline. Therefore, in this experimentation, we have modified the acquisition process described in Section 3.2 to eliminate stationary objects as follows: in step 2, if both of the directions θ_{i1 j1} and θ_{i2 j2} that a direction pair consists of are the same as those observed in the previous frame, the method does nothing with the direction pair. This process also removes observation errors that are yielded by background noise and continuously detected in the same direction by background subtraction.

Figure 3.9: The number of detected baselines in the real environment.

In addition, a difference of ±1 (in units of 360/52 degrees, the angular resolution) when comparing two directions in step 2b is considered as zero (i.e., the same direction), since the method cannot detect the precise directions to objects in the real images, especially when the objects are located near the sensors.

Figure 3.9 shows the number of detected baselines in the real environment. After 250,000 observations (approximately 4.5 hours), the method has detected ten baselines. Figure 3.10 shows the directions of the obtained baselines overlaid on the actual sensor locations. Two of them, indicated with the arrows, are obviously wrong; however, the others indicate nearly correct directions. Note that twenty directions are shown in Figure 3.10, though only ten baselines have been acquired. This is because each baseline is represented with a pair of two directions.

Figure 3.10: Detected baselines in the real environment (top view). The arrows indicate wrong directions.

3.5 Discussion

3.5.1 Statistical Estimation of the Object Correspondence

In statistical baseline estimation, the method memorizes a pair of directions to an object observed by two sensors. If it instead memorizes combinations of three or more directions, the correspondence between an object and its projections onto the sensors can be estimated.

Figure 3.11: Basic idea for correspondence estimation.

Figure 3.11 illustrates the basic idea of this method. If there is one object in the environment, the projections of the object onto the sensors are unique (see Figure 3.11 (a)). If the correspondence between various object locations and the sensors' projections has been statistically acquired, then we can verify the correspondence of multiple objects projected onto the views of the sensors. In Figure 3.11 (b), for example, each sensor has two projections of the two objects a and b, and there are 2 × 2 × 2 = 8 combinations of the projections. However, if the knowledge of correct correspondences among the sensors' projections has been acquired, it is determined that only the combinations {θ11, θ21, θ31} and {θ12, θ22, θ32} are correct, and the others are wrong. Note that this method is not for measuring the locations of the objects (e.g., the metrical locations x and y in Figure 3.11 (b)), but for estimating correct combinations of the projections.
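A minimal sketch of this verification idea follows, assuming discretized directions and that single-object observations have already been memorized; the data structures are illustrative assumptions.

```python
# Sketch: validate correspondences of projections using memorized single-object
# observations (one direction per sensor).
from itertools import product

def memorize(memory, direction_tuple):
    """direction_tuple: one direction per sensor, observed for a single object."""
    memory.add(direction_tuple)

def consistent_combinations(memory, projections):
    """projections: per-sensor lists of observed directions (several objects).
    Returns the combinations (one direction per sensor) that match memory."""
    return [combo for combo in product(*projections) if combo in memory]

# usage:
# memory = set(); memorize(memory, (11, 21, 31)); memorize(memory, (12, 22, 32))
# consistent_combinations(memory, [[11, 12], [21, 22], [31, 32]])
#   -> [(11, 21, 31), (12, 22, 32)]
```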

3.5.2 Classification of the Statistical Methods

In the same way as the quantitative methods described in Chapter 2, the methods in this chapter can also be classified based on the observation.

With respect to the baseline estimation method described in Section 3.1, the locations of both the sensors and the objects are unknown. The method detects the baselines among the sensors, from which approximate locations of the sensors can be computed by the triangle constraints as described in Section 2.4. Therefore, this method is named "S-3," as it is similar to method M-3, which localizes the sensors by observing the sensors.

With respect to the identification method described in Section 3.5.1, the correspondence of objects in the views of the sensors is determined. This method does not measure the locations of the objects; however, it identifies them across the views of the sensors. Therefore, this method is named "S-5," as it is similar to method M-5, which localizes the objects by observing them with sensors of unknown locations.

3.6 Summary

In this chapter, two methods have been presented: one statistically detects the baselines among the sensors, and the other identifies objects projected onto the view of each sensor.

As described in Section 3.3, the increase ratio of the reliability should be properly determined in the baseline estimation method. Although the discussion on this point is not yet sufficient, we have shown by experimentation that this method can detect the baselines among the sensors without knowledge of the object correspondence. Further consideration of the increase ratio, as well as verification of the identification method, remains as future work.

The methods described in this chapter are classified into S-3 and S-5 as described in Section 3.5.2. Other types of methods, i.e., S-1, S-2, and S-4, should be considered in future research.


Chapter 4

Qualitative Approaches

4.1 Introduction

4.1.1 Vision Systems Using Qualitative Maps

Generally, most existing vision systems make use of metrical maps. For example, vision-guided mobile robots and automobiles refer to metrical maps to navigate in the environment, and people tracking systems measure the locations of people by triangulation using the sensors' parameters.

On the other hand, several vision systems make use of not only metrical maps but also qualitative maps. For example, Levitt and Lawton proposed a qualitative method for landmark-based robot navigation in an outdoor environment [Levitt90]. The robot observes the order of landmarks located around it, and refers to a map to identify its qualitative location. The map indicates precise locations of the landmarks and defines qualitative locations as shown in Figure 4.1.

Besides this method, several works have been reported which utilize pre-defined qualitative maps or qualitatively utilize standard geometrical maps. Especially in the area of robotics, it is expected that qualitative methods are not seriously affected by sensory noise and, for example, enable us to navigate robots in a wide environment. In the distributed omnidirectional vision system, a qualitative map representing the qualitative positions of the sensors can be used to determine in what order the sensors should be used for navigating robots.

Figure 4.1: Qualitative representation of positions.

4.1.2 Representing Qualitative Locations

Qualitative maps of sensors and objects used in such application systems are represented in various ways. However, here we consider the qualitative representation shown in Figure 4.1, where the point positions are represented as relative positions with respect to any possible lines passing over two points. The position of point C, for example, is represented as follows:

C is located on the left of the directed lines AB, AE, BE, BF, EF, and on the right of AD, AF, BD, DE, DF.

This is one of the simplest representations, and various methods have been proposed in the field of qualitative spatial reasoning [Forbus91, Freksa92, Latecki93, Isli00]. In particular, this representation can be used for map-based robot navigation [Levitt90]. In Section 4.2.2, a more detailed discussion of the qualitative representation is given.


4.2 Localization of the Sensors Based on Qualitative Information

4.2.1 Previous Work

Most of the previous works acquire qualitative landmark positions from geometrical maps measured by quantitative methods (e.g., triangulation). However, since the sensory data is noisy, the acquired qualitative positions may not be consistent, especially in a large-scale environment. Although it is possible to measure the precise positions by iterating observations and computing the positions systematically, this approach is inefficient for acquiring qualitative landmark positions, since qualitative positions carry the essential information about the positions but are approximate in general.

Several methods have been proposed that acquire qualitative spatial representations by quantitative observation. Yeap developed a method for acquiring a cognitive map based on a 2 1/2-D representation of local areas [Yeap88]. The map is acquired with range sensors. Kuipers and Byun proposed a method for acquiring a qualitative representation by exploration of a robot [Kuipers91]. The representation consists of corridors and intersections recognized from sensory input. These methods have to solve the problem of abstracting perceptual information of the real world into a qualitative representation, and also have to integrate local representations into a global representation. A method which acquires qualitative landmark positions from more low-level and reliable information would be useful; however, such a method has not been proposed so far.

Here, we introduce a novel method for directly acquiring qualitative positions of landmarks from qualitative information obtained by visual observation [Sogo01]. The method observes motion directions of moving objects in an environment from several landmarks as shown in Figure 4.2 (a). While the objects move around the environment, the method acquires qualitative positions of the landmarks with several rules based on geometrical constraints. Generally, we consider that qualitative information is abstracted from quantitative information. However, the correctness of the qualitative information obviously depends on the measurement methods. We use motion directions of moving objects as qualitative information since they are stably obtained by tracking the objects for a sufficiently long time. Thus, compared with the previous acquisition methods, this method focuses on how to acquire qualitative positions of landmarks from low-level, simple, and reliable information.

Figure 4.2: Observation for acquiring qualitative positions. A sensor at each landmark observes motion directions of moving objects in the environment.

4.2.2 Qualitative Spatial Model

In our method, the positions of points (in the remaining sections, we refer to landmarks as "points") are represented with relative positions with respect to lines passing over arbitrary two points as shown in Figure 4.3 (a).

Figure 4.3 (b), called a qualitative spatial model, is a formal representation of the qualitative positions of the points shown in Figure 4.3 (a). The model consists of several components, each of which represents the positional relation among arbitrary three points as follows [Schlieder95] (see Figure 4.4):

- p_i p_j p_k = + if p_i → p_j → p_k lie in counterclockwise order,

- p_i p_j p_k = − if p_i → p_j → p_k lie in clockwise order,

where p_i, p_j and p_k are arbitrary three points. In the case of six points as shown in Figure 4.3, C(6, 3) = 20 such components are needed to represent all positional relations among the points. The goal of this method is to acquire the qualitative spatial model as shown in Figure 4.3 (b).

Figure 4.3: Qualitative representation of positions. (a) An example configuration of points. (b) The corresponding qualitative spatial model.
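For simulation or ground truth, the components of such a model can be computed directly from known coordinates: the sign of a 2-D cross product tells whether three points lie counterclockwise or clockwise. The following sketch assumes the points are given as (x, y) pairs.

```python
# Sketch: build a qualitative spatial model from known point coordinates.
from itertools import combinations

def component_sign(pi, pj, pk):
    """'+' if pi -> pj -> pk is counterclockwise, '-' if clockwise."""
    cross = (pj[0] - pi[0]) * (pk[1] - pi[1]) - (pj[1] - pi[1]) * (pk[0] - pi[0])
    return '+' if cross > 0 else '-'

def qualitative_spatial_model(points):
    """points: dict name -> (x, y).  Returns {('A','B','C'): '+', ...}."""
    return {triple: component_sign(*(points[n] for n in triple))
            for triple in combinations(sorted(points), 3)}
```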

4.2.3 Qualitative Observation

The qualitative spatial model is acquired by observing motion directions of moving objects from each point. In the case of Figure 4.2 (a), for example, the vision sensors at points A through D simultaneously observe the instantaneous motion direction of the object. When the projection of the moving object moves clockwise in the omnidirectional retina of a vision sensor, the motion is qualitatively represented as "right," and when it moves counterclockwise, it is represented as "left."

With the observed motion directions, the points are classified into a spatially classified pair (SCP), which consists of a pair of point sets labeled "left" and "right." In the case of Figure 4.2, an SCP "{ABD}, {C}" is acquired by observation, which means that there is a straight line that classifies the points into such a pair of point sets. By iterating the observation while the object moves around the environment, various SCPs are acquired except inconsistent ones. For example, an SCP "{AD}, {BC}" is inconsistent with the configuration of the points shown in Figure 4.2, since there is no straight line which classifies the points into such a pair.

Figure 4.4: Qualitative representation of positional relations among three points.

Note that if a sensor cannot determine the motion direction of the object, the sensor (point) will not appear in the SCP. This means that the SCP has no information with respect to that point's position. The qualitative spatial model is acquired from the SCPs as described in the next section; however, the qualitative position of a point will not be acquired unless it observes the motion direction of the object.
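A spatially classified pair can be formed directly from one simultaneous observation, as in the following sketch (points whose motion direction could not be determined are simply omitted, as noted above); the representation as a pair of sets is an illustrative choice.

```python
# Sketch: form an SCP from one simultaneous observation of motion directions.
def make_scp(motion_directions):
    """motion_directions: dict point_name -> 'left' | 'right' | None."""
    left = frozenset(p for p, d in motion_directions.items() if d == 'left')
    right = frozenset(p for p, d in motion_directions.items() if d == 'right')
    return left, right

# usage (the situation of Figure 4.2):
# make_scp({'A': 'left', 'B': 'left', 'D': 'left', 'C': 'right'})
#   -> (frozenset({'A', 'B', 'D'}), frozenset({'C'}))
```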

4.2.4 Overview of the Acquisition Process

SCPs represent geometrical constraints among the positions of points as described in the previous section. The qualitative spatial model, an example of which is shown in Figure 4.3 (b), is acquired from the SCPs by iterating the following steps (see Figure 4.5):

Figure 4.5: Process for acquiring the qualitative spatial model.

1. Acquire an SCP by qualitative observation (see Section 4.2.3).

2. Acquire three point constraints (3PCs) from the SCP.


3. Classify the points into new SCPs based on the 3PCs, and acquire new 3PCs (constraint propagation).

4. Transform the 3PCs into the qualitative spatial model.

The following subsections explain steps 2 through 4 in detail.

4.2.5 Acquiring Three Point Constraints

In order to determine the qualitative positions of the points, our method checks the possible positions of a fourth point with respect to a triangle consisting of three points. Since a triangle is the minimum component to represent closed regions, we can represent the qualitative positions of all points by combining the triangles.

Figure 4.6: Seven regions defined with three points.

Let us consider four points A, B, C and X. The qualitative position of X with respect to A, B and C is represented with one of the seven regions defined with the three lines AB, AC and BC, and encoded as shown in Figure 4.6. Several constraints which limit the possible regions of X are acquired from SCPs based on geometrical constraints. Suppose A, B, C and X are classified into SCPs "P, Q" in various ways as shown in Figure 4.7. Considering the positional symmetry of the points, the geometrical constraints are summarized into the following cases:

1. When P includes A, B and C:

If P also includes X, there is no constraint on the position of X. If Q, which is the other set, includes X, then X is not located in the region 111 (see Figure 4.7 (1)).

2. When P includes A, and Q includes B and C:

If P also includes X, it is not located in the region 011 (see Figure 4.7 (2)). If Q includes X, it is not located in the region 100 (see Figure 4.7 (3)).

We call these constraints three point constraints (3PCs). In general, there are six different SCPs with respect to arbitrary four points as shown in Figure 4.8 (a). The six SCPs are acquired by observation if the motion directions of objects are sufficiently observed from the points, and they are transformed into six 3PCs with respect to each point's position as shown in Figure 4.8 (b), which uniquely determine the region of the point. In the same way, the qualitative positions of all points are determined when all possible SCPs are acquired by observation.
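The following sketch illustrates how an SCP could be turned into 3PCs that prune the candidate regions of a point X. The bit convention for the region codes (first bit: same side of line BC as A, second bit: same side of AC as B, third bit: same side of AB as C, so the interior is 111) is an assumption chosen to be consistent with Figure 4.6 up to relabeling; only the two elimination rules themselves come from the text.

```python
# Sketch: apply three point constraints (3PCs) to the candidate regions of X.
REGIONS = {'100', '101', '111', '110', '001', '011', '010'}

def restrict_region(possible, scp, A, B, C, X):
    """possible: set of candidate region codes for X w.r.t. triangle (A, B, C).
    scp: (set1, set2) of point names; unobserved points are simply absent."""
    vertices = (A, B, C)
    for P, Q in (scp, (scp[1], scp[0])):              # the pair is unordered
        if set(vertices) <= P and X in Q:
            possible.discard('111')                   # rule 1: X not inside ABC
        for i, V in enumerate(vertices):
            others = set(vertices) - {V}
            alone = ['0', '0', '0']; alone[i] = '1'       # e.g. '100' for V = A
            opposite = ['1', '1', '1']; opposite[i] = '0' # e.g. '011' for V = A
            if {V, X} <= P and others <= Q:
                possible.discard(''.join(opposite))   # rule 2: {A, X} vs {B, C}
            if {V} <= P and (others | {X}) <= Q:
                possible.discard(''.join(alone))      # rule 2: {A} vs {B, C, X}
    return possible
```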

4.2.6 Constraint Propagation

Various SCPs and 3PCs are acquired by iterating observation of motion directions of objects; however, in practice there are some limitations of observation. For example, vision sensors cannot observe objects in distant locations or behind walls. In this case, the observation will not provide sufficient SCPs (and 3PCs) for reconstructing a complete qualitative spatial model. However, the 3PCs acquired from SCPs provide further 3PCs. This can be considered as constraint propagation.

Figure 4.7: Three point constraints. The crosses represent regions where X is not located.

Figure 4.8: An example of possible SCPs and 3PCs. For a general configuration of four points, six SCPs are acquired by observation and transformed into six 3PCs: 1. {ABC...}, {D...}; 2. {AB...}, {CD...}; 3. {ABD...}, {C...}; 4. {AC...}, {BD...}; 5. {ACD...}, {B...}; 6. {AD...}, {BC...} (position of D with respect to ABC).

Figure 4.9: An example of constraint propagation. (a) An example of positions. (b) Propagated positions.

A simple example of the constraint propagation is as follows. Let us consider five points A through E. When the positions of D and E have been uniquely determined with twelve 3PCs with respect to A, B and C as shown in Figure 4.9 (a), the points C, D and E, for example, are classified into the following SCP by the line AB:

{D}, {CE}    · · · (1)

Note that the notation of the qualitative positions in Figures 4.9 through 4.12 is different from that of the 3PCs shown in Figures 4.7 and 4.8, for simpler representation. That is, 3PCs originally represent regions where a point is not located; however, these figures indicate regions where a point is located.

Furthermore, there are four lines around the line AB which classify the five points, including A and B, into the following SCPs (the numbers correspond to those in Figure 4.9 (a)):


1. {AD}, {BCE}
2. {BD}, {ACE}
3. {ABD}, {CE}
4. {D}, {ABCE}    · · · (2)

There are C(5, 2) = 10 lines which pass over two points out of A through E. Each line classifies the points into several SCPs in the same way. Consequently, the following seven different SCPs are acquired in the case of Figure 4.9 (a):

{ABCD}, {E}    {ACE}, {BD}
{ABCE}, {D}    {AD}, {BCE}
{ABD}, {CE}    {AE}, {BCD}
{ACD}, {BE}

Then, these SCPs are transformed into several 3PCs as described in Section 4.2.5. Figure 4.9 (b) shows an example of the possible positions of B and C with respect to A, D and E acquired from these SCPs.

4.2.7 Formalization of the Constraint Propagation

The process for acquiring new SCPs described in the previous subsection is as follows:

1. Acquire an SCP classified by a line passing over arbitrary two points (an example is the SCP (1) in the previous subsection).

2. Then, transform it into four SCPs including the two points (an example is the SCPs (2)).

This process can be formally summarized as follows.

Let us suppose that the positions of several points (i.e., the regions where they are located) with respect to a triangle ABC have been uniquely determined with 3PCs. Then, a line is considered which passes over two of the points and classifies the other points into an SCP. Considering the positional symmetry of the points, there are 15 kinds of selections of two points which the classifying line passes over, as shown in Figure 4.10, where the circles indicate the selected two points, and the points in the regions X and Y are classified into an SCP "{X}, {Y}" by the line. Figure 4.10 (1) corresponds to the case in which the selected points are two of A, B and C. (2) through (6) correspond to the cases in which one of the selected points is A, B or C. (7), (8) and (9) correspond to the cases in which the selected points are located in the same region. (10) through (15) correspond to the cases in which the selected points are located in different regions. Note that no SCP is acquired in (7), (8), (9) and (13), and an SCP "{X}, {∅}" is acquired in (10), (11) and (14).

Then, SCPs including the two points on the classifying line are considered. Suppose the line AB classifies the other points into an SCP "{X}, {Y}." Although A and B are not included in the SCP in the above discussion, there are four lines which classify A and B as well as X and Y into the following SCPs (see Figure 4.11):

{AX}, {BY}    {ABX}, {Y}
{BX}, {AY}    {X}, {ABY}

Thus, new SCPs can be acquired from 3PCs, and these SCPs are then transformed into new 3PCs as described in Section 4.2.5.
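The second half of this propagation step, generating the four SCPs of Figure 4.11 from a classified line, is a simple set operation, sketched below with illustrative data types.

```python
# Sketch: the four SCPs that also place the two points on the classifying line.
def propagate_line_scp(A, B, X, Y):
    """A, B: the two points on the classifying line; X, Y: frozensets of the
    points on either side.  Returns the four SCPs including A and B."""
    return [
        (X | {A}, Y | {B}),
        (X | {B}, Y | {A}),
        (X | {A, B}, Y),
        (X, Y | {A, B}),
    ]
```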

In the above discussion, the constraint propagation is performed when the positions of the points have been uniquely determined with 3PCs (i.e., each point is located in one of the seven regions of a triangle). However, even if they have not been uniquely determined, the constraint propagation can be performed with respect to the points each of which is located only in the region X or Y shown in Figure 4.10. In the experimentation of Section 4.3, the constraint propagation is performed under such a situation.

Figure 4.10: Classifications for the constraint propagation.

Figure 4.11: SCPs including the points on the classifying line.

4.2.8 Transforming into the Qualitative Spatial Model

The 3PCs are transformed into the qualitative spatial model (see Figure 4.3 (b)) as follows. For example, if the position of X with respect to A, B and C has been determined with 3PCs as shown in Figure 4.12, then the order of BCX (B → C → X) is determined to be opposite to that of ABC (A → B → C); that is, BCX = − if ABC = +, and BCX = + if ABC = −. If the order of ABC is given, the order of BCX is uniquely determined. Consequently, all components of the qualitative spatial model are uniquely determined when six 3PCs with respect to each point's position have been acquired, as shown in Figure 4.8 (b).

Figure 4.12: Transformation into the qualitative spatial model.
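Under the same region-code assumption as in the earlier sketch (first bit of the code: same side of line BC as A), the flip rule illustrated by Figure 4.12 can be written as follows; this is an illustration, not the thesis' implementation.

```python
# Sketch: derive the orientation of B -> C -> X from the region of X.
def orientation_of_BCX(region_code_of_X, orientation_of_ABC):
    """region_code_of_X: 3PC-determined region of X w.r.t. (A, B, C).
    If X lies on the same side of line BC as A, BCX has the same orientation
    as ABC; otherwise it is flipped (the situation of Figure 4.12)."""
    same_side_of_BC_as_A = region_code_of_X[0] == '1'
    if same_side_of_BC_as_A:
        return orientation_of_ABC
    return '-' if orientation_of_ABC == '+' else '+'
```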

4.3 Experimental Results

4.3.1 Verification in a Simple Environment

We have acquired a qualitative spatial model of the positions of vision sensors by the proposed method in simulations. First, we have verified the method in a simple environment. In the environment, there are 10 vision sensors and a moving object in a 10m × 10m space as shown in Fig. 4.13. Each vision sensor observes the object in all directions at any distance and detects the motion of the object as it randomly moves 1m.

Figure 4.13: Simple environment with 10 vision sensors and a moving object.

Figure 4.14: The number of acquired components in the simple environment.


Fig. 4.14 shows the averaged number of acquired components of the qualitative spatial model over five runs. In this experimentation, the total number of the components is C(10, 3) = 120; 80% of them have been acquired within 100 observations, and all components have been acquired within about 2,000 observations. The 2,000 observations provided 45 SCPs, which is equal to the number of possible SCPs for this configuration of the vision sensors. Since identical SCPs are frequently acquired by the observations, the number of the provided SCPs is far less than that of the observations. With this experimentation, we could verify the proposed method.

4.3.2 Application to a Complex and Realistic Environment

Next, we have evaluated the method with a complex and realistic environment. The purpose of this experimentation is to evaluate the practicality of the method. In the environment, there are 20 vision sensors and a moving object in a 20m × 20m space as shown in Figure 4.15. The vision sensors have omnidirectional views and observe motion directions of the object in all directions whenever it randomly moves 1m on the light gray region in Figure 4.15. However, they cannot observe the object at a distance of more than 10m or behind walls (indicated with the white lines in Figure 4.15). In this experiment, the number of components of the qualitative spatial model (represented with '+' and '−' as shown in Figure 4.3 (b)) is C(20, 3) = 1,140. However, the proposed method cannot acquire all of the components since the sensors cannot observe sufficient motion directions of the object. It is estimated from the configuration of the sensors that about 560 components will be acquired without constraint propagation.

Figure 4.16 shows the averaged number of acquired components over five runs. With 5,000 observations, the method has determined the directions ('+' or '−') of 490 components without constraint propagation. On the other hand, it has determined the directions of 969 components with constraint propagation, which is about twice as many as without constraint propagation, and 85% of all components.

Figure 4.15: Simple environment with 20 vision sensors and a moving object. The walls in the center of the environment obstruct the views of the sensors.

Figure 4.17 shows another environment, where there are 35 vision sensors and 8 moving objects in a 30m × 30m space. The objects are identified by their color. In this environment, the number of the components is C(35, 3) = 6,545, and it is estimated that about 540 components will be acquired without constraint propagation.

Figure 4.18 shows the averaged number of acquired components over five runs. Note that the acquisition of the components is accelerated to eight times faster than in the previous experiment, since there are eight moving objects. With 2,000 observations, the method has determined the directions of 513 components without constraint propagation, which is almost equal to the estimated number of 540. With constraint propagation, it has determined the directions of 707 components. In other words, the constraint propagation has acquired about 200 additional components, which represent the positions of the sensors in distant locations. However, the method could not determine the other components on account of the limitations of observation.

Figure 4.16: The number of components acquired with 20 vision sensors in the simple environment (with and without constraint propagation).

Figure 4.19 (a) and (b) show the qualitative positions of the sensors depicted based on the 3PCs acquired in the environments of Figures 4.15 and 4.17, respectively. The reason we have used the 3PCs instead of the components of the qualitative spatial model is that not all of the 3PCs can be transformed into components, and they include more constraints than the components. For obtaining Figure 4.19, we first located the sensors randomly, then dynamically adjusted the positions by iterating the following steps for arbitrary points, so as to satisfy the constraints of the acquired 3PCs:

1. Compute the force F (∝ exp(d)) with respect to an arbitrary triangle ABC, which moves a point X into a correct region. d is the distance between the current position of X and the correct region (see Figure 4.20).

2. Gradually move X based on the resultant of F .
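A rough sketch of this adjustment loop is given below. The computation of the distance and direction to a point's correct region is left abstract (supplied as a callable), and the step size and iteration count are illustrative assumptions; only the exponential force of step 1 follows the text.

```python
# Sketch: force-based adjustment of point positions so as to satisfy 3PCs.
import numpy as np

def adjust_positions(positions, constraints, step=0.01, iterations=1000):
    """positions: dict name -> np.array([x, y]).
    constraints: list of (X, dist_dir) where dist_dir(positions) returns
    (d, unit_vector) from X's current position toward its correct region
    with respect to some triangle (d = 0 if X is already inside)."""
    for _ in range(iterations):
        forces = {name: np.zeros(2) for name in positions}
        for X, dist_dir in constraints:
            d, direction = dist_dir(positions)
            if d > 0:
                forces[X] += np.exp(d) * direction   # step 1: F proportional to exp(d)
        for name in positions:                        # step 2: gradual movement
            positions[name] += step * forces[name]
    return positions
```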

Figure 4.17: Complex environment with 35 vision sensors and 8 moving objects. The white lines indicate walls.

By comparing Figure 4.19 (a) with Figure 4.15, and Figure 4.19 (b) with Figure 4.17, we can see that the acquired positions are topologically correct, that is, the order of connections among the sensors is correct. Although the method could not acquire all of the components, these qualitative maps are sufficient for map-based robot navigation as shown in Figure 4.3 (a) [Levitt90], since the qualitative positions of neighboring sensors have been acquired. Such a map can also be used for coarse path planning in the distributed vision system: if we suppose that the sensors (i.e., vision agents) observe the size of the robot, they know which sensor is nearest to the robot, and can determine the order in which they navigate the robot to its destination.

Figure 4.19: Qualitative positions of the vision sensors depicted based on the acquired 3PCs. (a) 20 vision sensors with 5,000 observations. (b) 35 vision sensors with 2,000 observations.

Figure 4.20: Force F which acts on X with respect to a triangle ABC.

Figure 4.18: The number of components acquired with 35 vision sensors in the complex environment (with and without constraint propagation).

4.3.3 Observation Errors

As described in Section 4.4.4, observation of motion directions is stable against sensory noise. The proposed method can acquire correct positions of points as long as the observed motion directions are correct; however, once a wrong direction is observed, several wrong SCPs and 3PCs may be acquired, which causes inconsistency in the qualitative spatial model. In this section, we consider observation errors in order to verify the sensitivity of the acquisition algorithm itself.

First, we have verified the method in a noisy environment, where the configuration of the sensors is the same as in Figure 4.17; however, approximately 16% of the motion directions of the objects are mistakenly determined. Figure 4.21 shows the averaged number of acquired components and that of wrong components (i.e., components whose directions ('+' or '−') are mistakenly determined) over five runs. With 2,000 observations, the method has acquired 795 components including 234 wrong components. Note that the number of acquired components is larger than in the experiments of Section 4.3.2, since various SCPs have been acquired on account of observation errors.

Figure 4.21: The number of acquired and wrong components in a noisy environment (with and without error elimination).

In order to eliminate wrong 3PCs, we consider the following statistical method. As described in Section 4.4.4, only six of the seven possible 3PCs with respect to arbitrary four points should be acquired by observation. If we suppose that the objects move around randomly and the various SCPs and 3PCs are each acquired multiple times by observation, a considerable number of wrong 3PCs can be eliminated by discarding the 3PCs acquired relatively fewer times than the others. With this error elimination, the proposed method has determined 499 components, as indicated with "error elimination" in Figure 4.21, and reduced the number of wrong components to 32. In other words, the error elimination method could reduce the ratio of wrong components from 29% to 6%, though it also reduced the total number of acquired components. However, more sophisticated error elimination methods will be necessary, since the above method could not eliminate all of the wrong 3PCs.
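A possible sketch of this elimination is shown below; the grouping key (the four points involved) and the relative-frequency threshold are illustrative assumptions.

```python
# Sketch: statistically eliminate 3PCs that are acquired relatively rarely.
from collections import defaultdict

def eliminate_rare_3pcs(acquired, ratio=0.2):
    """acquired: dict {3pc: count}, where a 3PC is e.g.
    (('A', 'B', 'C'), 'X', excluded_region).  Returns the 3PCs kept."""
    groups = defaultdict(list)
    for tpc, count in acquired.items():
        groups[(tpc[0], tpc[1])].append((tpc, count))   # group by the four points
    kept = {}
    for members in groups.values():
        most = max(count for _, count in members)
        for tpc, count in members:
            if count >= ratio * most:                   # drop relatively rare 3PCs
                kept[tpc] = count
    return kept
```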

4.4 Discussion

4.4.1 Completeness of the Algorithm

In the experimentation, the proposed method could not acquire the components representing positional relations of sensors (1) in distant locations, (2) behind walls, and (3) in a straight line (e.g., in Figure 4.15, sensors 13-15-18, 1-3-4, and so on). It is not easy to acquire all of the components with respect to (1) and (2) on account of the limitations of observation. In addition, if there are concavities in the environment (e.g., several sensors may be surrounded by walls), it will also be difficult to acquire the components. In future work, it should be analyzed how the acquisition process is affected by the structure of the environment. With respect to (3), the qualitative spatial model originally cannot represent such a positional relation; however, it can be estimated from the acquired 3PCs that several sensors are located in a straight line.

With respect to observation errors, the statistical error elimination method cannot remove all of the wrong 3PCs. In addition, it does not check consistency among the acquired 3PCs, that is, whether the points can be localized so as to satisfy all of the acquired 3PCs. In fact, the positional relations could not be depicted in the experiment of Section 4.3.3 on account of inconsistency in the acquired model. Therefore, geometrical approaches such as triangle constraints [Kim92] should be developed for more effective error elimination. In such methods, backtracking will be necessary, in the same way as in constraint satisfaction problems [Isli00]. Especially with distributed computing resources as discussed in Section 4.4.3, distributed constraint satisfaction techniques [Yokoo98, Yokoo99] can be applied to the error elimination.

4.4.2 Computational Costs

In this section, we discuss the computational costs of the algorithm. Note that the discussion excludes the constraint propagation, since its behavior is complicated.

As discussed in Section 4.2.5, the qualitative positions of all points are determined when all possible SCPs are acquired by observation (see Figure 4.8). The k-sets theory [Alon86] gives the number of possible SCPs (i.e., the number of straight lines which classify the points into different SCPs) as follows. In the k-sets theory, it has been proved that the upper bound of the number of point sets which contain at most k points and are cut off by straight lines from n points in a plane is kn (k < n/2). The case of the SCPs is considered as the same problem for k = n/2 (i.e., we consider all of the straight lines, including those which classify the n points into exact halves). The bound has not been proved for k = n/2 so far; however, it is estimated to be O(n^2) [Dey98]. Thus, the number of possible SCPs is O(n^2). In other words, our method has to observe different motion directions of objects O(n^2) times to acquire the qualitative positions of n points.

Next, the number of 3PCs which the method needs to check in order to acquire the qualitative spatial model is considered. If the motion directions of objects are observed from all of the n points, every SCP obtained by the observation contains all of the n points. Then, the algorithm chooses arbitrary three points out of the n points in the acquired SCP and checks 3PCs with respect to the remaining (n − 3) points as described in Section 4.2.5. Consequently, the algorithm checks C(n, 3) × (n − 3) = O(n^4) 3PCs for every SCP.

Since the number of SCPs needed to acquire the positions of n points isO(n2) as described above, the method checks O(n6) 3PCs to acquire thequalitative spatial model.

However, the number of 3PCs needed to represent the qualitative posi-tions of points is far less than that. As described in Section 4.2.5, the quali-tative position of a point with respect to arbitrary three points out of (n�1)is uniquely determined with six 3PCs as shown in Figure 4.8. Therefore,fn � �n�1

3

� � 6g = O(n4) 3PCs are needed to represent the qualitative posi-tions of n points. Since the number of 3PCs the method checks is O(n6) asdescribed above, it redundantly checks many 3PCs.
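For concreteness, this bookkeeping can be spelled out numerically. The sketch below uses n^2 as a stand-in for the O(n^2) SCP bound (whose constant factor is not known) and simply evaluates the two expressions above.

    from math import comb

    def threepc_counts(n):
        scps = n * n                              # stand-in for the O(n^2) SCP bound
        checked_per_scp = comb(n, 3) * (n - 3)    # 3PCs examined for one SCP: O(n^4)
        checked_total = scps * checked_per_scp    # total 3PCs examined: O(n^6)
        needed = n * comb(n - 1, 3) * 6           # 3PCs sufficient for n points: O(n^4)
        return checked_total, needed

    for n in (6, 10, 20):
        print(n, threepc_counts(n))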

4.4.3 Distributed Computation

As described above, the computational costs of the proposed method are rather high. However, for a practical implementation, we can expect to employ parallel computation using distributed computing resources, such as the distributed omnidirectional vision system itself. Let us suppose that each sensor has computational ability. The qualitative spatial model can be acquired in a distributed manner as follows:

1. Observe motion directions of objects simultaneously.

2. Exchange the motion directions with other sensors which observed the same object, and acquire an SCP.

3. At each sensor, independently compute 3PCs and the components of the qualitative spatial model related to the sensor position.


In the distributed computation, it is expected that the method can acquire the qualitative spatial model even if the number of sensors increases, since in practice vision sensors do not observe objects in distant locations, and the computation of SCPs and 3PCs is performed only at the sensors which observed the objects. However, the message exchange costs among the sensors may increase. In future work, the problems in the distributed computation should be considered in more detail.
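A minimal single-process sketch of this loop is shown below. The class and function names are ours, message exchange is reduced to a plain function call, and the 3PC derivation of Section 4.2.5 is left as a stub.

    import random

    class SensorNode:
        def __init__(self, sid):
            self.sid = sid
            self.local_3pcs = set()          # components involving this sensor

        def observe(self, obj_id):
            # stub: in the real system this is the image-based motion direction
            return random.choice(['L', 'R', None])

    def derive_3pcs_for(sid, scp):
        return set()                         # placeholder for the rule of Section 4.2.5

    def acquisition_round(nodes, obj_id):
        directions = {n.sid: n.observe(obj_id) for n in nodes}         # 1. observe
        left = frozenset(s for s, d in directions.items() if d == 'L')
        right = frozenset(s for s, d in directions.items() if d == 'R')
        scp = (left, right)                                            # 2. exchange -> SCP
        for n in nodes:                                                # 3. local computation
            n.local_3pcs |= derive_3pcs_for(n.sid, scp)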

4.4.4 Stability of Detection of Motion Directions

Theoretically, the qualitative spatial model can also be acquired by quantitative methods which transform metrical positions of points into qualitative positions. This section considers quantitative methods which directly acquire metrical positions of sensors from visual information, without any other sensory information, and discusses the advantages of the proposed method.

In the qualitative method, the acquisition process described in Section 4.2.4 is itself sensitive to errors in the motion directions, so that even a few errors may cause wrong qualitative positions in the qualitative spatial model. More specifically, observation errors may yield wrong SCPs and 3PCs as follows: if an observation error yields an inconsistent SCP (an inconsistent classification such as "{AD}, {BC}" in the case of Fig. 4.2), wrong 3PCs are acquired, which mistakenly determine the qualitative positions of points. In addition, only six kinds of 3PCs should be acquired with respect to arbitrary four points, as shown in Fig. 4.8; however, inconsistent SCPs may give all of the seven 3PCs, from which correct qualitative positions cannot be acquired. For this problem, Section 4.3.3 provides a simple method which statistically eliminates wrong 3PCs.

Thus, the algorithm itself is sensitive to observation errors. However, observation of the motion directions of moving objects is fairly stable in practice. Let us suppose the following method for detecting the motion directions of a robot with multiple vision sensors:

1. Detect the robot by background subtraction.

Figure 4.22: Motion directions detected by background subtraction and template matching. Five successive images taken with each vision sensor are shown. The numbers in parentheses indicate nL/nR/nN, and L, R, N, and '?' indicate the detected motion directions. '*' indicates that the detected motion direction (L or R) is correct.

2. Find several small regions, such as vertical edges, which can be used for template matching (indicated with the white rectangles in Fig. 4.22).

3. Compute optical flows by template matching (the horizontal lines in the white rectangles indicate the optical flows).

4. Check the direction of each flow and determine the motion direction of the robot based on a majority rule.

Figure 4.23: A model town and mobile robots. The scale is 1/12.

We have verified the above method with a model town (see Fig. 4.23), which was originally built as a platform for a robot navigation system (the details are described in Chapter 6). There are five vision sensors, which observe a robot at a distance between 50 cm and 150 cm, and take images of 160×120 pixels whenever the robot moves approximately 5 cm. In this experimentation, we have taken 250 images in total, and determined the motion direction D ∈ {L, R, N, ?} (left, right, no motion, and unknown, respectively) based on the number of templates as follows:

• D = L if nL ≥ 3 and nL ≥ 2(nR + nN),
• D = R if nR ≥ 3 and nR ≥ 2(nL + nN),
• D = N if nN ≥ 3 and nN ≥ 2(nL + nR),
• otherwise D = ?,

where nL and nR are the numbers of templates that moved left and right, respectively, and nN is the number of templates with no motion. The threshold for detecting the motion direction of a template is 1 pixel.
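In code, the decision rule reads as follows (a direct transcription of the thresholds above; the function name is ours):

    def classify_motion(n_left, n_right, n_none):
        # majority rule over the template flows observed by one sensor
        if n_left >= 3 and n_left >= 2 * (n_right + n_none):
            return 'L'
        if n_right >= 3 and n_right >= 2 * (n_left + n_none):
            return 'R'
        if n_none >= 3 and n_none >= 2 * (n_left + n_right):
            return 'N'
        return '?'

    assert classify_motion(9, 0, 2) == 'L'   # e.g. the counts 9/0/2 yield L
    assert classify_motion(2, 0, 2) == '?'   # too few and too ambiguous flows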

Fig. 4.22 shows part of the experimental results, where five successive images taken with each vision sensor and the estimated motion directions are shown. The above method correctly determined 113 motion directions (L* and R* in Fig. 4.22) out of 250, and could not determine 137 motion directions (21 of which are no motion (N) and 116 are unknown (?)). This means that the optical flow estimation by template matching provides correct motion directions if we refer to a sufficient number of templates. Furthermore, the intervals for taking images and the threshold for motion detection were fixed in this experimentation; however, if we dynamically adjust them according to the motion of the robot and use high-resolution vision sensors, the above method for detecting motion directions will provide better results. Note that the latter 137 results do not affect the correctness of the qualitative spatial model, as described in Section 4.2.3.

The method described above is the simplest one; more sophisticated methods exist. We can expect that, in general, correct motion directions of objects are obtained if they are tracked for a sufficiently long time. Thus, it is possible to stably acquire qualitative motion directions by image processing, and the qualitative method introduced in this chapter can be considered as an alternative to quantitative methods, acquiring qualitative positions of points by visual observation in a simple and stable manner.

4.5 Application

The proposed method acquires the qualitative sensor positions in the distributed omnidirectional vision system; however, it can also acquire qualitative maps of landmarks in general robot navigation. Let us suppose the robot has an omnidirectional vision sensor and observes the motion directions of landmarks in the omnidirectional retina, as shown in Figure 4.24. If the robot can identify all of the landmarks or keep track of them anywhere in the environment, it can acquire the qualitative landmark positions with this method by observing their motion directions. In addition, the acquired map can be used for map-based robot navigation [Levitt90]. Thus, the method solves one of the general and fundamental problems in robot navigation and map building.

Figure 4.24: Observation by a moving robot for acquiring qualitative maps. The robot observes the motion directions of the landmarks.

4.6 Determining Qualitative Locations by Observation

As described in Section 2.1, five types of localization are possible for measuring metrical locations of sensors and objects. In the same way, we consider five types of localization (see Figure 2.2) from the viewpoint of qualitative approaches, including the qualitative localization method of the sensors by observation described in the previous sections. Note that here we assume that the locations of both the sensors and the objects are represented in the same way as shown in Figure 4.1.

Method Q-1 (localization type 1: localization of objects by observing them from the sensors) If the sensors can directly observe each other, the objects are localized based on the order of the sensors and the objects projected onto the view of each sensor. In Figure 4.25, for example, sensor A observes the object between sensors C and F, etc., from which the object location is identified with the given qualitative locations of the sensors. The qualitative location of the object is identical unless the object moves out of the dark gray region in Figure 4.25.

Figure 4.25: Method Q-1: Qualitative localization of objects. The object is located in the dark gray region.

Note that this method identifies the object location, but it cannot determine the region occupied by the object without the metrical sensor locations, since the shapes of the regions shown in Figure 4.25 change according to the metrical locations of the sensors. Therefore, in the strict sense, this is not a localization method but an identification method of the qualitative locations of objects. This method can find that the object has come back to the same region where it was previously.

Method Q-2 (localization type 2: localization of the sensors by observing reference objects) The sensors are localized based on the directions to objects of known locations. In Figure 4.26, for example, the sensor measures the directions to the reference objects A through E to obtain the angles between pairs of objects. Then, for any two objects X and Y whose angle is less than 180 degrees, we know on which side of the line XY the sensor is located. The sensors need to identify the objects. This method is the same as the one Levitt has proposed [Levitt90].

Figure 4.26: Method Q-2: Qualitative localization of the sensors.

In the strict sense, however, this method is not a purely qualitative method, since it checks the angles between the objects as metric information. In addition, this method localizes the sensors with respect to the object locations; however, it does not acquire a qualitative representation of the locations among the sensors.
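The angle test of method Q-2 can be written compactly: the counterclockwise angle from the bearing of X to the bearing of Y is below 180 degrees exactly when the sensor lies to the left of the directed line from X to Y. The sketch below is ours (the function name, the left/right labelling, and the neglect of degenerate collinear cases are all assumptions).

    import math

    def sensor_side_of_lines(bearings):
        # bearings: {object_name: bearing in radians, counterclockwise, common frame}
        sides = {}
        names = list(bearings)
        for x in names:
            for y in names:
                if x == y:
                    continue
                ccw = (bearings[y] - bearings[x]) % (2 * math.pi)
                sides[(x, y)] = 'left' if ccw < math.pi else 'right'
        return sides

    # objects A at 0 and B at 90 degrees: the sensor is to the left of line A->B
    print(sensor_side_of_lines({'A': 0.0, 'B': math.pi / 2})[('A', 'B')])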

Method Q-3 (localization type 3: localization of the sensors by observing themselves) The sensors observe the directions to other sensors to acquire their qualitative locations in the same way as method Q-2 (see Figure 4.27). This method, however, is also not a purely qualitative method, since it uses the directions as metric information. The sensors need to observe and identify each other.

Method Q-4 (localization type 4: localization of the sensors by observing objects) The sensors observe objects of unknown locations and acquire the qualitative locations of the sensors. Section 4.2 discusses this method, which is a purely qualitative acquisition method, i.e., it acquires the qualitative locations of the sensors only from the motion directions of the objects, as qualitative information, acquired by observation. In this method, the sensors need to identify the objects.

Figure 4.27: Method Q-3: Qualitative localization of the sensors.

Method Q-5 (localization type 5: localization of the objects by observing objects) The qualitative locations of the objects are acquired by using method Q-1 after Q-3, or Q-1 after Q-4. However, this is not a method for directly acquiring the object locations. We will need further consideration of this method.

The above discussion is based on the assumption that the qualitative locations are represented in the same way as shown in Figure 4.1. If other qualitative representations are considered, different methods will be necessary; they should be discussed in future work.

4.7 Summary

In this chapter, we have explained the idea of qualitative approaches. It is generally expected that qualitative methods are not seriously affected by sensory noise and, for example, enable us to navigate robots in a wide environment.

Several methods have been proposed so far for acquiring qualitative locations of objects. However, they are not purely qualitative, since they obtain a qualitative representation by measuring metrical locations or by abstracting observed visual information. As a purely qualitative localization method, we have proposed a method that directly acquires qualitative locations.

The method acquires a qualitative spatial representation from qualitative motion information of moving objects. Key points of this method are as follows:

• Qualitative positions of landmarks are acquired from motion directions of objects, which are purely qualitative information and are obtained with stable observation.

• With constraint propagation, the positions of landmarks in distant locations can be acquired even when the sensors are only partially observable.

It has been confirmed with simulations that the method is valid for acquiring the qualitative positions of multiple vision sensors.

Finally, we discuss remaining problems. In the simulations, we assumed the use of omnidirectional vision sensors. If normal vision sensors are used instead, there will be some "blind spots" where they cannot simultaneously observe an object, which makes it difficult to acquire the qualitative spatial model with the proposed method on account of insufficient observation. However, the method can acquire the qualitative positions as long as the sensors simultaneously observe the motion directions of the object somewhere in the environment. In future work, the condition on the visual angles of the sensors needed to acquire the qualitative spatial model should be clarified.

The correspondence problem of multiple objects should also be addressed. In a real environment, it is difficult to identify many objects, especially when the positions of the sensors are unknown. Therefore, correspondence errors should also be checked with the observation error elimination method.


Chapter 5

Integration of the Fundamental Techniques toward Real Applications

Chapters 2 through 4 summarize various methods for localizing and identifying sensors and objects in quantitative, statistical, and qualitative approaches. They each have various merits and limitations, and are used under different situations. This chapter summarizes the properties of the methods, and discusses the integration of the methods for various applications.

5.1 Merits and Limitations

Table 5.1 summarizes the properties of each localization method. For example, the method M-1 needs the sensor locations, observes objects, and acquires the locations of the objects; the correspondence and the locations of the objects are not necessary.

The detailed explanations for each method are given below:

Method M-1 (localization of objects by omnidirectional stereo) This method needs precise metrical locations of the sensors to localize the objects, but does not need to identify the objects or to determine the correspondence of the objects projected onto the view of the sensors.

Table 5.1: Properties of each localization method.

Method    Sensor location  Observation target  Correspondence*2  Location  Acquires*3  Remarks
M-1       Yes              Objects             No                No        Mo
Q-1       Yes              Objects             C                 No        Qo          *6
M-2       No               Objects             I                 Yes       Ms
Q-2       No               Objects             I                 Yes       Qs
M-3       No               Sensors             No                No        Ms          *4
M-3FOE    No               Env*1               C                 No        B           *4,7
S-3       No               Objects             No                No        B           *4,5,7
Q-3       No               Sensors             I                 No        Qs
M-4, M-5  No               Objects             C                 No        Ms, Mo
Q-4       No               Objects             C                 No        Qs          *5
S-5       No               Objects             No                No        C           *4,5

*1: Env: visual features in the environment.
*2: I: need to identify the sensors or objects; C: need to determine the correspondence of the sensors or objects.
*3: Qs: qualitative locations of sensors; Qo: qualitative locations of objects; Ms: metrical locations of sensors; Mo: metrical locations of objects; B: baseline directions; C: correspondence of objects.
*4: Observation should be limited to a local area.
*5: Need to iterate observation.
*6: Does not localize but identifies the qualitative object location.
*7: Metrical locations are computed by M-3 from the baseline directions.

Method M-2 (localization of the sensors with reference objects of known locations) This method needs precise metrical locations of the objects, and the sensors need to identify the objects. The sensors are localized by solving nonlinear equations, which are unstable in general and should be solved in a sophisticated manner to perform localization with high precision.

Method M-3 (localization of the sensors with triangle constraints) Since this method needs the directions to the sensors, the sensors need to find each other in their views. The method does not need to identify the sensors or to determine the correspondence of the sensors projected onto the view of each sensor. The method should be used among sensors in a limited, local area, especially if the sensors are distributed in a wide area, since in such a case a number of similar angles between the sensors yield wrong triangle candidates, which are used for localizing the sensors. If the sensors are small and cannot be found by observation, the directions obtained by other methods such as M-3FOE and S-3 can be used for localizing the sensors with triangle constraints.

Method M-3FOE (baseline estimation based on FOE) In order to find the baselines, this method needs sufficient visual features in the direction of the baselines that can be found by the sensors. It cannot be used if there are similar visual features in the environment, especially in a wide area. In this case, the method should be used among sensors in a limited, local area.

Method M-4, M-5 (localization of the sensors and objects by observing the objects in the environment) This method needs the directions to the objects, and has to determine the correspondence of the objects among the sensors. If it is difficult to determine the correspondence, a single moving object is observed instead. Since the locations are defined with unstable nonlinear equations, the method should solve them in a sophisticated manner to localize the sensors (M-4) or objects (M-5) with high precision.

Method S-3 (statistical estimation of the baselines) This method needs only the directions to objects, and does not need to identify the objects or to determine the correspondence of the objects among the sensors. It is necessary to iterate observation of the directions to the objects. If the sensors are distributed in a wide area and there are several objects, the method may not be able to detect the baselines on account of a large number of wrong matches. In this case, the method should estimate the baselines among sensors in a limited, local area, or it needs to determine the correspondence of the objects.

Method S-5 (statistical estimation of the correspondence of objects) In the same way as the statistical estimation of the baselines, this method needs only the directions to objects, and does not need to identify the objects or to determine the correspondence of the objects among the sensors. It is necessary to iterate observation of the directions to the objects. If the sensors are distributed in a wide area and there are several objects, the method may not be able to detect the correspondences on account of a large number of wrong matches. In this case, the method should estimate the correspondence of objects observed by sensors in a limited, local area.

Method Q-1 (qualitative localization of objects by observing the order of the sensors and objects) In this method, the sensors need to find each other in their views as well as the objects, and to determine the correspondence of the objects projected onto the view of the sensors. The qualitative locations of the sensors are necessary. This method only identifies the qualitative location of the objects; it cannot determine the actual regions corresponding to the qualitative locations without the metrical sensor locations, since those regions vary according to the metrical locations of the sensors.

Method Q-2 (qualitative localization of the sensors by observing objects of known qualitative locations) The sensors need to identify the objects. In the strict sense, this is not a purely qualitative method, since the sensors are localized by observing the angles between the objects. That is, this method uses metrical information for localization.

Method Q-3 (qualitative localization of the sensors by observing the directions to other sensors) In this method, the sensors need to observe and identify other sensors. Basically, this method is the same as method Q-2.

Method Q-4 (qualitative localization of the sensors by observing the motion directions of moving objects) This method needs the motion directions of the objects, and has to determine the correspondence of the objects among the sensors. Assuming that the observation of the motion directions is stable against noise in the sensory data, this method acquires correct qualitative maps of the sensors. It is necessary to iterate observation of the motion directions of the objects.

Method Q-5 (qualitative localization of the objects by observing objects of unknown locations) This method is achieved by a combination of other qualitative methods, i.e., Q-3 and Q-1, or Q-4 and Q-1, as described in Section 4.6.

Thus, the methods have various merits and also limitations. According to the situation, a proper method should be selected.

5.2 Integration of the Localization Methods

The methods described above are fundamental localization methods in the distributed omnidirectional vision system. We can consider various methods for solving application-specific problems by integrating them.

5.2.1 Identification of Moving Objects

If the qualitative positions of the sensors have been acquired, the correspondence of the objects can be verified based on the motion directions of the objects.

Figure 5.1: Correspondence of objects. (a) Configuration (top view); (b) omnidirectional views.

Let us suppose that four vision sensors observe two objects M1 and M2 as shown in Figure 5.1 (a). The objects are projected onto the view of each omnidirectional vision sensor as shown in Figure 5.1 (b). Let the two objects in the view of sensor A be MA1 and MA2, the objects in the view of sensor B be MB1 and MB2, etc. If the two objects move as shown in Figure 5.1 (a), the following motion directions are observed by the sensors (see Figure 5.1 (b)):

Sensor A: MA1 = right, MA2 = left
Sensor B: MB1 = right, MB2 = left
Sensor C: MC1 = left, MC2 = left
Sensor D: MD1 = right, MD2 = left.

By observing the motion directions of the objects, we can verify the correspondence without visual features as follows. In the case of four sensors and two objects as shown in Figure 5.1, eight kinds of correspondence of the objects are possible, but the correct one is:

M1 = MA2-MB1-MC1-MD2, and
M2 = MA1-MB2-MC2-MD1.

Figure 5.2: A three point constraint acquired from an SCP "{AC}, {BD}". (a) SCP; (b) 3PC; (c) actual configuration.

Then, each hypothesis about the correspondence of the objects is verified with the qualitative spatial model, in the same way as the acquisition of the qualitative spatial model described in Section 4.2.5. For example, with respect to the hypothesis:

M1 = MA1-MB2-MC2-MD2, and
M2 = MA2-MB1-MC1-MD1,

it is supposed that sensors A through D observe that M1 is moving to the right, left, left, and left, respectively, and M2 is moving to the left, right, left, and right, respectively. Then, two SCPs {A}, {BCD} and {AC}, {BD} are obtained (see Section 4.2.3). However, the latter SCP {AC}, {BD} is transformed into a 3PC as shown in Figure 5.2, and we can find that it is inconsistent with the actual configuration of the sensors. Therefore, we know that the hypothesis is wrong.

Basically, this method finds only part of the wrong correspondences with a single observation of the motion directions of the objects. For example, for the hypothesis:

M1 = MA1-MB2-MC2-MD1, and
M2 = MA2-MB1-MC1-MD2,

sensors A through D observe that M1 is moving to the right, left, left, and right, respectively, and M2 is moving to the left, right, left, and left, respectively. Then two SCPs {AD}, {BC} and {ACD}, {B} are obtained, which are consistent with the actual configuration of the sensors, and this hypothesis is accepted as correct.

However, if the sensors can continuously observe the motion directions while the objects move around the environment, this method can eliminate more wrong correspondences. In the case of four sensors and two objects, various SCPs are obtained for each of the eight possible hypotheses. If there are one or more inconsistent SCPs in the series of SCPs obtained with respect to a hypothesis, the hypothesis is considered wrong. Thus, wrong hypotheses are gradually removed while the objects move around.

Figure 5.3: Qualitative localization of objects based on observation of motion directions of objects. The gray regions indicate the regions occupied by the object.

5.2.2 Qualitative Localization and Object Tracking

In the method Q-1 (i.e., qualitative localization of objects by observing the objects), the sensors need to observe each other as well as the objects. If the sensors cannot directly observe each other, the motion directions of objects observed by the sensors can be used for localization. The region occupied by the object is estimated based on the motion direction of the object, as shown in Figure 5.3. For example, for an arbitrary object and motion direction shown in Figure 5.3 (a), an SCP "{A}, {BC}" is obtained, from which the region occupied by the object is estimated as the gray regions in Figure 5.3 (a). Combining several SCPs obtained by observation of other sensors, the region occupied by the object can be roughly identified. For example, in Figure 5.4, the method observes the motion direction of the object and determines the regions occupied by the object (indicated with the gray regions in Figure 5.4).

Figure 5.4: An example of qualitative localization of objects based on the motion direction. The gray regions indicate the regions the object may occupy.

Note that, in this method, the objects must be identified by their visual features, colors, etc., or by the qualitative identification method described in the previous subsection.

5.2.3 Verification of Baselines by FOE

The baseline directions estimated by the statistical method can be verified by the following method, which is based on method M-3FOE.

When the baseline between two sensors has been detected by the statistical method, the sensors should observe the same visual feature in the baseline direction. If they observe different visual features in the baseline direction, we find that the detected direction is wrong.

Thus, by integrating the statistical baseline estimation method and the sensor localization method based on FOE, the baselines among the sensors can be detected more correctly.
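A sketch of such a check is given below. It assumes panoramic images stored as arrays of shape (height, 360) at one pixel per degree and expressed in a common world orientation, so that sensor A looking toward B and sensor B looking away from A sample the same distant scene along the baseline; the patches around that bearing are compared by normalized cross-correlation. All names and thresholds are illustrative.

    import numpy as np

    def baseline_consistent(pano_a, pano_b, bearing_deg, window=10, ncc_thresh=0.8):
        def patch(pano, deg):
            deg = int(round(deg))
            cols = np.arange(deg - window, deg + window + 1) % 360
            p = pano[:, cols].astype(float)
            return (p - p.mean()) / (p.std() + 1e-9)
        a, b = patch(pano_a, bearing_deg), patch(pano_b, bearing_deg)
        return float((a * b).mean()) >= ncc_thresh   # similar feature -> baseline accepted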


5.2.4 Stepwise Localization of the Sensors

Since the distributed omnidirectional vision system consists of a large number of vision sensors, it is very hard to precisely calibrate all the sensors. In addition, it is also difficult to maintain precise camera parameters, since sensors located in the real (and sometimes outdoor) environment may be accidentally moved. Therefore, the system needs a method that autonomously localizes the sensors by observation. In what follows, we introduce a stepwise localization method using several localization methods.

For example, the quantitative method for localizing the sensors by observing objects of unknown locations can be used in order to measure the metrical locations of the sensors. However, it is basically difficult to acquire precise locations by that method alone, especially with vision sensors only in the real environment, since the sensory data is noisy. In this case, approximate locations of the sensors may be helpful as an initial solution in computing an approximate solution of the nonlinear equations.

This idea leads to the following scenario for stepwise localization of the sensors (from approximate to precise, and from local to global locations) using various localization methods (see Figure 5.5):

1. Acquire the qualitative locations of the sensors by method Q-4 using moving objects that the sensors can identify (see Figure 5.5 (a) and (b)). The objects to be observed may be robots or automobiles with distinguishable colors, or people wearing clothes of specific colors. If the sensors know the motion parameters of the objects, the sensors can identify them with the parameters. For example, the sensors can identify an object when it starts to move or stops.

2. Compute approximate metrical locations of the sensors by energy minimization so as to satisfy the qualitative locations, as described in Section 4.3.2 (see Figure 5.5 (c)), from which we can estimate adjacent sensors for each sensor (a minimal sketch of this step is given at the end of this subsection).

3. Estimate the baseline directions by the statistical method S-3 (see Figure 5.5 (d) and (e)), using the approximate metrical locations obtained in step 2 to limit the computation to local areas.

Figure 5.5: Stepwise localization of the sensors. (a) Motion directions; (b) qualitative spatial model; (c) topological map; (d) azimuth angles; (e) baseline directions; (f) approximate locations; (g) azimuth angles; (h) precise locations.

4. From the estimated baseline directions, compute approximate but more precise metrical locations of the sensors by triangle constraints (M-3; see Figure 5.5 (f)). The propagation of the triangle constraints is performed among adjacent sensor locations using the approximate metrical locations acquired in step 2.

5. Compute the locations of objects by omnidirectional stereo (M-1) using the approximate locations acquired in step 4, then determine their correspondence. At the same time, based on the computed locations, select proper objects for calibration that will give stable solutions [Madsen98]. Note that the correspondence can also be determined by the statistical method (S-5).

6. Gradually refine the sensor locations by solving nonlinear equations using the sensory data observed in step 5 (M-4; see Figure 5.5 (h)), where the approximate locations obtained in step 4 can be used as an initial estimate for the simultaneous nonlinear equations.

Note that, in step 6, the precise locations of the sensors are computed from sensory data using the approximate locations. However, in terms of autonomous localization of the sensors, it is still generally hard to obtain sensory data precise enough for localization by visual observation without special reference objects such as those used in the calibration of a conventional stereo camera pair. For example, the directions to people or robots measured by the sensors are very noisy, especially when they are close to the sensors. Therefore, some error compensation methods will be necessary.

The above scenario is just a basic idea for stepwise localization of the sensors. The actual scenario depends on the situation, such as what can be observed, how detailed the information needed by the application system is, etc.
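As announced in step 2, the sketch below shows one way an energy-minimization embedding could look: every acquired 3PC is treated as a signed orientation constraint on a triple of sensors, violated (or nearly degenerate) triples contribute a hinge penalty, and the positions are updated along the gradient of the signed area. The hinge energy, the margin, and the gradient-descent settings are our assumptions; Section 4.3.2 may define the energy differently.

    import numpy as np

    def signed_area2(p, a, b, c):
        # twice the signed area of triangle (a, b, c); positive if counterclockwise
        return ((p[b, 0] - p[a, 0]) * (p[c, 1] - p[a, 1])
                - (p[b, 1] - p[a, 1]) * (p[c, 0] - p[a, 0]))

    def embed(components, n, steps=2000, lr=0.01, margin=0.1, seed=0):
        # components: list of ((a, b, c), sign) with sign = +1 (CCW) or -1 (CW)
        p = np.random.default_rng(seed).normal(size=(n, 2))
        for _ in range(steps):
            g = np.zeros_like(p)
            for (a, b, c), s in components:
                if s * signed_area2(p, a, b, c) < margin:   # constraint violated
                    # gradient of the signed area with respect to each vertex
                    g[a] += s * np.array([p[b, 1] - p[c, 1], p[c, 0] - p[b, 0]])
                    g[b] += s * np.array([p[c, 1] - p[a, 1], p[a, 0] - p[c, 0]])
                    g[c] += s * np.array([p[a, 1] - p[b, 1], p[b, 0] - p[a, 0]])
            p += lr * g    # move so that violated triples regain the required orientation
        return p

    # e.g. embed([((0, 1, 2), +1), ((0, 1, 3), -1), ((1, 2, 3), +1)], n=4)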


5.3 Examples

In this section, we discuss how the localization methods are used in two example application systems: robot navigation and activity monitoring.

5.3.1 Robot Navigation System

In the robot navigation system, the sensors are located so as to provide sufficient scene coverage of the environment. Even if the locations of the sensors are unknown, the system can navigate robots with the knowledge of navigation paths taught by a human operator (the details are described in Chapter 6). First, the operator shows navigation paths to the sensors by controlling a robot. If there are multiple robots or other moving objects in the environment, the robot should have a specific visual feature that can be identified by the sensors. Note that no method based on the metrical locations of the sensors can be used for identifying the robot, since the sensor locations are unknown at this moment.

While the operator shows navigation paths to the sensors, the system can directly memorize the correspondence of the robot among the sensors (i.e., the correspondence of the directions in which each sensor observes the robot) in the same way as method S-5 described in Section 3.5.1, but without iterating observation, since there is one robot in the environment and the correspondence is known to be unique.

In addition, the system can find which sensors are roughly adjacent to each other by checking the sensors that are simultaneously observing the robot. In Figure 5.6, for example, sensors A through D observe the robot, while sensors E and F do not, from which the system knows that A through D are roughly adjacent to each other (however, it does not know the detailed positional relations among them). Based on this knowledge, the system can roughly determine the correspondence of objects in the views of the sensors.
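A rough adjacency relation of this kind can be accumulated with a few lines of bookkeeping; the representation of the observations and the minimum co-observation count below are illustrative.

    from collections import Counter
    from itertools import combinations

    def rough_adjacency(observations, min_count=1):
        # observations: one set of sensor IDs per time step (the sensors that saw the robot)
        pair_counts = Counter()
        for seen_by in observations:
            for a, b in combinations(sorted(seen_by), 2):
                pair_counts[(a, b)] += 1
        return {pair for pair, c in pair_counts.items() if c >= min_count}

    # the situation of Figure 5.6: A through D see the robot, E and F do not
    print(rough_adjacency([{'A', 'B', 'C', 'D'}]))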

Figure 5.6: Adjacent sensors. Sensors A through D observe the robot, but sensors E and F do not, from which the system finds that A through D are roughly adjacent to each other.

Observing the identified robot in the environment, the system can determine the qualitative locations of the sensors by method Q-4. With the qualitative locations as shown in Figure 5.5 (c), the system can verify the correspondence of other objects by comparing the appearance projected onto the view of each sensor with the model the system has. In Figure 5.7, for example, each sensor observes two robots (Figure 5.7 (b)). Then, using the topological map of the sensors (c) and the appearance model (d), the system can verify the correspondence of the robots among the sensors (e). In Figure 5.7, the projection 1 in the view of sensor A (let this be A1) is compared with B1 and C1, and the method finds that, by roughly localizing the robot in the topological map, such a combination of appearances is possible. Then, D1 is verified, and the method finds that that combination is impossible. Thus, the correspondence of the robot among the sensors is verified, and finally a correct correspondence "A1-B1-C1-D2-E2-F2" is found (f).

By iterating the observation of the robot, the system can also estimate the baseline directions with method S-3 and obtain approximate locations of the sensors as shown in Figure 5.5 (f), with which the system can navigate robots more precisely, without the knowledge taught by an operator.

Figure 5.7: Verification of the correspondence. (a) Scene; (b) observation; (c) topological map; (d) appearance model; (e) verification; (f) correspondence (A1-B1-C1-D2-E2-F2).

5.3.2 Activity Monitoring System

As another example system, an activity monitoring system is considered. In most multiple camera systems for activity monitoring [Boyd98, Matsuyama98], the sensor parameters are precisely measured in advance. However, such systems are not flexible, since the sensors cannot be moved once the parameters have been calibrated. If the system autonomously calibrates the parameters based on observation [Grimson98], it realizes a more flexible system in which the sensors can be moved and added as required. The following discussion is based on this idea.

Here, we assume that the system localizes the sensors by observing moving objects such as walking people, running automobiles, etc., since a controllable object for localization, such as the robot in the navigation system, may not be available in the activity monitoring system. First, the sensors observe objects and the baselines among them are detected by method S-3. If there are sufficient visual features in the environment, the baseline directions can also be detected by FOE (i.e., method M-3FOE). In both cases, a rough knowledge about adjacent sensors as shown in Figure 5.6 may be necessary in order to limit the computation to local areas, and if so, it should be given in advance.

From the estimated baselines, the locations of the sensors can be computed in the same way as steps 2 through 6 in Section 5.2.4. With the precise locations as shown in Figure 5.5 (h), the system can measure the locations of objects by omnidirectional stereo.


Chapter 6

Mobile Robot Navigation

The distributed omnidirectional vision system provides rich visual information about the environment with many omnidirectional vision sensors. Since the sensors simultaneously observe the environment from various fixed viewpoints, the system has great advantages in monitoring, modeling, and recognizing events in the environment.

In this chapter, we discuss a navigation system for mobile robots as an example of application systems of the distributed omnidirectional vision system.

6.1 Introduction

Realizing autonomous robots that behave in the real world based on visual information has been one of the goals in robotics and artificial intelligence. For limited environments such as offices and factories, several types of autonomous robots which utilize vision sensors have been developed. However, it is still hard to realize autonomous robots behaving in dynamically changing real worlds such as an outdoor environment. One of the reasons is the difficulty of obtaining sufficient visual information with vision sensors fixed on the robots. In addition, the robots have to recognize the environment through a moving camera (see Figure 6.1).

Figure 6.1: A fully autonomous mobile robot with cameras mounted on it.

Figure 6.2: A mobile robot navigated by cameras located in the environment.

On the other hand, the sensors of the distributed omnidirectional vision system can easily process the image data and detect dynamic events, since they are fixed in the environment. For example, in order to find moving objects, the sensors only need to perform background subtraction. Thus, with the aid of the distributed omnidirectional vision system, the robots can obtain sufficient visual information from various viewpoints (see Figure 6.2).

Another reason for the difficulty in building autonomous robots behaving in the real world is that a single robot has to acquire a consistent model of a wide environment with the vision sensors fixed on its body. If the environment dynamically changes, it will be very hard to maintain the model. The distributed omnidirectional vision system can efficiently solve this problem by observing the changes with the sensors.

Figure 6.3: The architecture of the robot navigation system.

Thus, the distributed omnidirectional vision system is an ideal infrastructure for realizing autonomous mobile robots in the real world [Ishiguro97].

6.2 Development of a Prototype System

We have developed a prototype system for robot navigation based on the architecture described in the previous section. In this section, the details of the system are described.

6.2.1 Architecture

Fig. 6.3 shows the architecture of the robot navigation system. The system consists of multiple vision agents (VAs) that have omnidirectional vision sensors, robots, and a computer network connecting them.

The Image processor detects moving robots and tracks them by referring to the Knowledge database, which stores visual features of the robots. The Estimator receives the results and estimates camera parameters for establishing representation frames for sharing robot motion plans with other VAs. The Task manager memorizes the trajectories of robots as tasks shown by a human operator, and selects proper tasks in the memory in order to navigate robots. The Planner plans robot actions based on the memorized tasks and the estimated camera parameters. The Organization manager communicates with other VAs through the Communicator and selects proper plans. The selected plans are memorized in the Memory of organizations for planning robot tasks more properly. The Controller controls the modules according to requests from the robots and the system state, such as the task teaching phase and the navigation phase.

The robot receives the plans through its Communicator. The Command Integrator selects and integrates the plans, from which actuator commands are generated.

6.2.2 Fundamental Functions for Robot Navigation

Precision of Sensor Parameters and Navigation Methods

According to the representation and precision of the measured locations, the system navigates robots in different ways. For example, if precise sensor locations are given, the system can navigate the robots with an occupancy map and an environmental model representing free regions, obstacles, etc. [Hoover00]

On the other hand, in the early stages of the stepwise localization, the system has only rough information about the sensor locations and cannot navigate the robots with the occupancy map. However, in such cases, the system can navigate them if a human operator shows navigation paths: the system memorizes the navigation paths taught by the human operator (teaching phase), then navigates robots based on the memorized paths (navigation phase). In the following sections, we discuss the method for navigating robots without precise parameters of the sensors.


Navigation without Precise Sensor Parameters

As described above, robot navigation is achieved in two steps: the task teaching phase and the navigation phase. In the task teaching phase, each VA observes and memorizes the path of a robot which is controlled by a human operator or moves autonomously in the environment. In the navigation phase, the VAs communicate with each other to select the VAs which provide proper visual information for navigation, and then navigate robots based on the paths memorized in each sensor image.

The system needs the following functions for robot navigation:

1. Navigate robots on free regions where the robots can move.

2. Avoid collisions with other robots and obstacles.

3. Navigate robots to their destinations.

The system realizes these functions by using the visual information acquired by the VAs.

Navigation on free regions is realized as follows. The knowledge of the free regions is obtained if the VAs observe moving objects in the environment for a long time, assuming that the regions where the objects move around are free regions. In the developed system, the VAs observe a robot controlled by a human operator in the task teaching phase (described in the next subsection), and recognize free regions as shown in Fig. 6.4. Collision avoidance is realized by generating a navigation plan so as to avoid a collision: in the developed system, if the VAs find that a robot is on a possible collision course with other robots or obstacles, they temporarily modify the destination of the navigation plan in order to avoid the collision. Navigation to a destination is realized by teaching knowledge of paths for navigating robots to their destinations. In the following subsection, the task teaching phase is described.
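The free-region knowledge can be accumulated by voting, as in the sketch below: pixels where background subtraction detects the operator-controlled moving robot receive a vote, and a pixel is reported free once it has collected enough votes. Grayscale numpy images and the two thresholds are assumptions for illustration, not the prototype's actual implementation.

    import numpy as np

    def update_free_region(free_votes, background, frame, diff_thresh=30, votes_needed=5):
        # vote for pixels currently covered by a moving object
        moving = np.abs(frame.astype(int) - background.astype(int)) > diff_thresh
        free_votes += moving.astype(int)
        return free_votes >= votes_needed   # boolean mask of the free region so far

    # usage: free_votes = np.zeros(background.shape, dtype=int); call once per frame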

Figure 6.4: Detecting free regions (background images and detected free regions). The system obtains knowledge of the environment structure, such as free regions, by observing moving objects.

6.2.3 Task Teaching

The system switches into the task teaching phase at the instruction of a human operator. In this phase, the VAs memorize tasks shown by the operator. In this prototype, a task consists of several subtasks, each of which is a navigation between intersections. By linking subtasks, the system navigates robots to their destinations.

First, VAs detect robots in each view. Since the VAs are fixed in the environment, they can easily detect objects by background subtraction. Then, the VAs distinguish robots from other objects, and identify the robots by their colors. Each VA tracks the robot controlled by a human operator, and memorizes its apparent trajectory in the view as a navigation task. When the robot passes over a specific place (e.g., in front of a building), the operator notifies the system of the meaning of the place. The system divides the tasks shown by the operator into several subtasks of navigation between intersections. In this prototype, the subtasks are directly shown by the operator in order to simplify the experimentation, since the experimental environment is small and there are only two intersections, as shown in Fig. 6.5. The operator shows navigation paths 1 through 4 to the VAs as subtasks, and notifies the places A, B and C. In this prototype, A and C are navigation goals.

Figure 6.5: Navigation paths taught by a human operator.

In the distributed omnidirectional vision system, the VAs can robustly detect robots using knowledge of the local environments around them. In this prototype, for example, the VAs recognize objects as robots if their bottoms are in the free region in the view of the VAs, as shown in Fig. 6.4.

6.2.4 Navigation of Mobile Robots

After the task teaching phase, the system navigates robots by iterating the following steps (see Fig. 6.6):

1. A robot requests the system to navigate it to a destination.

2. The VAs communicate with each other to make a global navigation plan to the destination.

3. Each VA sets an instant navigation goal in front of the robot, then generates a navigation plan and sends it to the robot.

Figure 6.6: Communication between VAs and a robot in the navigation phase.

4. The robot receives the navigation plans from multiple VAs, then selects proper ones, integrates them, and moves based on the integrated plan.

Generating Navigation Plans

Each VA generates a navigation plan as follows. First, the VA estimates the nearest point on the memorized paths from the robot in its view (see Fig. 6.7 (1)). Here, we assume that the omnidirectional images taken with the omnidirectional vision sensors are transformed into robot-centered rectilinear images. Note that, if each sensor has a hyperboloidal mirror, the omnidirectional image can be transformed into a rectilinear image (see Appendix A).

Figure 6.7: Generating a navigation plan, in the view of VA i: (1) the nearest point on the memorized path, (2) the navigation goal at distance Δt, (3) the goal direction Gi, (4) the motion direction Vi, and (5) the angle θ*i.

Then, the VA sets an instant navigation goal at a certain distance Δt from the estimated position (Fig. 6.7 (2)), and computes an angle θ*i between the direction to the goal Gi = (gxi, gyi)^T and the current motion direction of the robot Vi = (vxi, vyi)^T as follows (see Fig. 6.7 (3), (4) and (5)):

\[
\theta^*_i = \arcsin\frac{V^*_i \times G^*_i}{|V^*_i|\,|G^*_i|}, \tag{6.1}
\]

where V*i and G*i are:

\[
V^*_i = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sin\alpha_i} \end{pmatrix} V_i, \qquad
G^*_i = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sin\alpha_i} \end{pmatrix} G_i .
\]

αi is the tilt angle of the virtual rectilinear image plane of VA i, which is estimated from the location of the robot in the omnidirectional image, assuming that the sensor stands vertical to the plane on which the robot moves. Note that we assume orthographic projection here. Each VA sends θ*i to the robot as a navigation plan. When a VA has detected an obstacle (including other robots) between the navigation goal and the robot, the VA modifies the goal so as to avoid the obstacle.


Selecting Proper Plans

After the robot has received the navigation plans θ*i from the VAs, it estimates the error of each θ*i in order to eliminate navigation plans which include large errors. The error of θ*i is caused by an observation error of the motion direction of the robot and by an estimation error of αi (the estimated tilt angle of the virtual rectilinear image plane of VA i). Here, we assume that the former error is inversely proportional to the apparent size of the robot wi in the view of VA i. The latter error (let this be Δθ*i), caused by the error of αi, is computed from equation (6.1) as follows:

\[
\Delta\theta^*_i = \arcsin\frac{V^{*\prime}_i \times G^{*\prime}_i}{|V^{*\prime}_i|\,|G^{*\prime}_i|}
- \arcsin\frac{V^*_i \times G^*_i}{|V^*_i|\,|G^*_i|}, \tag{6.2}
\]

where V*'i and G*'i are:

\[
V^{*\prime}_i = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sin(\alpha_i + \Delta\alpha_i)} \end{pmatrix} V_i, \qquad
G^{*\prime}_i = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sin(\alpha_i + \Delta\alpha_i)} \end{pmatrix} G_i ,
\]

and Δαi is the estimation error of αi. Then, if the size of the robot observed by a VA is less than 2/3 of the largest of all, the robot considers that the navigation plan generated by that VA includes a relatively large error, and eliminates it. Furthermore, the robot also eliminates navigation plans whose estimated errors (i.e., Δθ*i) are more than twice the smallest of all.

Integrating Navigation Plans

Figure 6.8: Model town

Next, the robot integrates the remaining navigation plans. Since they are represented with angles on a common coordinate system along the motion direction of the robot, they can be integrated by computing an average angle of them. Here, the robot computes an average angle of θ*_i weighted with the estimated error ∆θ*_i as follows:

\theta^* = \frac{\sum_i k_i \theta^*_i}{\sum_i k_i}, \qquad k_i = \frac{w_i}{|\Delta\theta^*_i|}, \qquad (6.3)

where w_i is the apparent size of the robot observed by VA i. Finally, the robot generates actuator commands based on θ*.
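A minimal sketch of the selection and integration described above is given below. The input format (one tuple per VA containing the proposed angle θ*_i, the apparent robot size w_i, and the estimated error ∆θ*_i) and the numerical guard are assumptions, not the thesis code.

```python
def integrate_plans(plans):
    """plans: list of (theta, w, dtheta) tuples, one per VA.  Discard plans
    from VAs that see the robot at less than 2/3 of the largest apparent size
    or whose estimated error exceeds twice the smallest one, then return the
    weighted average angle of equation (6.3) with k_i = w_i / |dtheta_i|.
    Assumes at least one plan survives both filters."""
    w_max = max(w for _, w, _ in plans)
    d_min = min(abs(d) for _, _, d in plans)
    kept = [(t, w, d) for t, w, d in plans
            if w >= (2.0 / 3.0) * w_max and abs(d) <= 2.0 * d_min]
    eps = 1e-9                      # guard against a zero estimated error
    weights = [w / max(abs(d), eps) for _, w, d in kept]
    return sum(k * t for k, (t, _, _) in zip(weights, kept)) / sum(weights)
```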

6.3 Experimentation

We have developed a prototype system based on the concept of the distributed vision described above. Fig. 6.8 shows a model town and mobile robots used in the experimentation. The model town, the scale of which is 1/12, has been made to represent realistic features of an outdoor environment, such as shadows, textures of trees, lawns and houses. Sixteen VAs have been established in the model town and used for navigating two mobile robots. Although we have used standard vision sensors instead of omnidirectional vision sensors in this experimentation, the navigation method discussed in the previous section can be used for robot navigation by estimating the sensor parameters α_i and ∆α_i based on observation (see Appendix B).

Figure 6.9: Hardware configuration

Fig. 6.9 shows the hardware configuration. Images taken with the VAs are sent to image encoders which integrate the sixteen images into one image, then sent to a color frame grabber. The size of the whole image is 640×480 pixels, and each VA's image is 160×120 pixels. The main computer, a Sun Sparc Station 10, executes sixteen VA modules, which process data from the color frame grabber at 5 frames per second and communicate with the two robots through serial devices. The robots avoid collisions based on the VAs' commands; however, if a collision is detected with their touch sensors, they move backward and change direction in order to avoid it.

In this prototype, a human operator shows the VAs two robots and each VA memorizes their colors (red and black) in order to identify them. Then, the operator shows several paths using one of the robots. Finally, the system simultaneously navigates the robots along the paths shown by the operator.

Figure 6.10: Images processed by VAs

Fig. 6.10 shows images taken by VAs in the navigation phase. The vertical axis and the horizontal axis indicate time and the ID numbers of the VAs, respectively. The solid and the broken rectangles indicate VAs selected for navigating the red and the black robot, respectively. As shown in Fig. 6.10, VAs are dynamically selected according to the navigation tasks. In other words, the system navigates the robots while observing them from proper viewpoints.

Fig. 6.11 shows robot trajectories navigated by the system exhibited at the international conference IJCAI'97. The sixteen cameras were simply located so as to provide entire scene coverage.

Figure 6.11: A sequence of photographs of two robots being navigated by the system.

Although their locations were not measured, the system continuously navigated the robots for three days during the exhibition. The features of the system, such as simple visual functions, flexible navigation strategies and the use of redundant visual information, realize robust navigation in a complex environment.

6.4 Discussion

6.4.1 Localization Methods for Robust Processing

In the navigation system discussed in this chapter, it is assumed that the system has no information about the locations of the sensors. However, as described in Section 5.3.1, the system can perform more robust processing by using the sensor locations.

Figure 6.12: Overlaps of the visual fields of the sensors. F1 through F4 indicate the observation range of each sensor.

The VAs can determine the correspondence of the robots more robustly with the knowledge of adjacent sensors. In Fig. 6.12, for example, the visual field of VA4 (indicated with F4) does not overlap with those of VA1 (F1) and VA3 (F3). When every VA is observing a robot, it is therefore estimated that VA4 is observing a different one from that of the other VAs. In the developed system, the knowledge of the overlaps of the visual fields is acquired by simultaneously observing the robot from all VAs in the task teaching phase, and it is then used in the navigation phase.
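A minimal sketch of how such overlap knowledge might be stored and queried is given below; the representation as a set of overlapping id pairs and the example table are assumptions for illustration only.

```python
def may_see_same_robot(va_a, va_b, overlaps):
    """Two VAs can be observing the same robot only if their visual fields
    were found to overlap during the task teaching phase."""
    return frozenset((va_a, va_b)) in overlaps

# Hypothetical overlap table loosely based on Figure 6.12, where the field
# of VA4 overlaps neither VA1's nor VA3's.
overlaps = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (2, 4)]}
assert may_see_same_robot(1, 3, overlaps)
assert not may_see_same_robot(1, 4, overlaps)
```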

6.4.2 Previous Work

In distributed artificial intelligence, several fundamental works dealing with systems using multiple sensors have been reported, such as the Distributed Vehicle Monitoring Testbed (DVMT) [Lesser83] and Partial Global Planning (PGP) [Durfee91]. In these systems, which are based on the blackboard model [Erman80], agents symbolize sensory information with a common representation, and gradually proceed with their recognition by exchanging these symbols. Thus, these systems deal with recognition based on symbolized information.

On the other hand, the purpose of the system here is to navigate robots.

In the task teaching phase, the VAs independently memorize navigation paths as robot tasks from their own viewpoints without symbolizing them. In the navigation phase, the VAs plan a global path of a robot by communicating with each other and generate instant navigation plans, and the robot generates an instant actuator command based on the plans. Thus, our navigation system deals with motion recognition by multiple agents, and regeneration of the robot tasks by cooperation of the agents.

6.5 Summary

We have developed a prototype of the navigation system. In this chapter, we mainly described the details of the navigation method of mobile robots using multiple VAs. In addition, the prototype system partly deals with communication among VAs and robots, and also deals with construction and management of environment models. With the experimental result of robot navigation, we have confirmed that the system can robustly navigate mobile robots in a complex environment.

As future work, more detailed communication among VAs should be considered. In the experimentation of Section 6.3, a human operator controls a robot to show subtasks of the robot (i.e., navigation between intersections) while directly indicating specific places such as intersections, and the VAs learn the subtasks by observing the robot motion. However, they need to communicate with each other to autonomously learn the subtasks. In addition, in the navigation phase, the VAs communicate with each other to make plans for navigating robots to their destinations; however, in the prototype the communication is simplified and specialized for the small experimental environment. For real world application systems, more sophisticated communication will be needed for the VAs to perform flexible planning.

Furthermore, the following issues should be considered for extending the scale of the navigation system:

• more accurate identification of multiple robots,

• dynamic organization for navigating many robots by a limited number of VAs.

With respect to the first point, the system does not utilize a geometrical map, which realizes the robustness and flexibility of the system. Instead, identification will be achieved by observing relations between robot commands and actual robot motions, and by utilizing a qualitative map [Sogo99] which represents rough positional relations of the VAs. With respect to the second point, we have to analyze the behavior of the system in such a situation and develop more sophisticated communication among the VAs.


Chapter 7

Real-Time Human Tracker

Recently, various practical application systems have been developed based on simple computer vision techniques, especially using multiple vision sensors with simple visual processing. For example, several systems track people and automobiles in the real environment with multiple vision sensors, and other systems analyze their behaviors. Compared with systems using a single vision sensor, these systems make it possible to observe a moving target in a wide area for a long time. However, they need to use many vision sensors to provide seamless coverage of the environment, since a single standard vision sensor itself has a narrow range of view.

On the other hand, the distributed omnidirectional vision system provides a wide range of view with multiple omnidirectional vision sensors, and enables robust recognition of the targets and the environment. As a practical application of the distributed omnidirectional vision system, we have developed a real-time human tracking system using multiple omnidirectional vision sensors. In this chapter, the details of the tracking system are given.

7.1 Introduction

The system detects people by background subtraction, measures azimuth angles with the omnidirectional vision sensors, and determines their locations by omnidirectional stereo (i.e., method M-1) in real time. In order to measure target locations, the following problems in omnidirectional stereo should be considered:

• the correspondence problem among multiple targets,

• the measurement precision of target locations, and

• deformable human bodies.

The first problem also occurs in conventional stereo using two or more vision sensors [Kanade96, Mori97, Okutomi93]. However, in our system it is more difficult to verify the correspondence of targets with visual features, since the baseline of the sensors is much longer than that of conventional stereo and the sensors may observe different sides of a target. The second problem is that the measurement precision of a target location becomes very low when the target is located along the baseline of two sensors [Ishiguro92], as shown in Figure 2.6. In addition, deformable human bodies should be properly handled in the real-time localization and tracking process.

In order to solve the above problems, we have extended trinocular stereo [Gurewitz86, Yachida86]. The extended method, called N-ocular stereo, verifies the correspondence of multiple targets without visual features. In addition, we have developed several methods for compensating observation errors in order to measure target locations more robustly and accurately.

In this system, it is assumed that the locations of the sensors are already given, since the purpose of this chapter is to discuss the localization method of the targets.

7.2 Localization of Targets by N-Ocular Stereo

7.2.1 Correspondence Problem and Trinocular Stereo

The target locations are basically measured by omnidirectional stereo as described in Section 2.2. In order to perform real-time processing, our system basically detects targets by background subtraction and measures their locations using the azimuth angles to the center of each target by omnidirectional stereo, as shown in Figure 7.1.

Figure 7.1: Localization in the real-time tracking system.

In the target localization, multiple targets in the environment cause the correspondence problem. In Figure 7.1, for example, there are two targets (the gray circles indicate the actual target locations); however, from the directions observed by sensors 1 and 2, it is estimated by stereo that the targets may exist at A through D in Figure 7.1. In general, this correspondence problem can be solved by using visual features of the targets. In our system, however, it is difficult to verify the correspondence of targets with visual features, since the sensors observe targets from various viewpoints and their visual features may differ.

Alternatively, the correspondence problem can also be solved by using three or more sensors. In Figure 7.1, the locations C and D are verified with sensor 3 and then eliminated, since sensor 3 does not observe the targets in these directions. This technique is known as trinocular stereo [Gurewitz86, Yachida86], and it can be applied to our system for verifying the target correspondence.


7.2.2 Problems of Conventional Methods

Observation Errors

When applying trinocular stereo to actual systems, we have to consider observation errors. In Figure 7.1, for example, the lines indicating the azimuth angles of a target intersect exactly at one point; in practice, however, they do not intersect in this way on account of observation errors. Generally, clusters of intersections are considered as target locations [Pattipati92, Sastry91, Shams96].

However, the information from vision systems is much noisier than that from radar systems and the like. Furthermore, our system cannot precisely detect the azimuth angles of targets for the following reasons:

• If targets are located near the sensors, they are widely projected on the sensors, which increases the error of the azimuth angles to the targets.

• In our system, the targets are humans, whose bodies are quite deformable. In addition, each vision sensor observes them from various viewpoints.

The conventional methods for localizing targets do not consider these problems. In addition, as described in Section 2.2.2, omnidirectional binocular stereo has a low-precision problem in localizing targets located along the baseline of the sensors [Ishiguro92]. These problems should be carefully considered in our approach.

Computational Costs

In order to solve the correspondence problems and to measure target locations properly, each azimuth angle of the targets detected by the sensors should be associated with at least one of the measured locations. This assignment of azimuth angles is an NP-hard optimization problem [Pattipati92]. Several methods for solving it have been proposed so far [Pattipati92, Shams96]; however, these methods need iterative computation (more than 10, or even 300, iterations). Therefore, a more efficient method is necessary for real-time processing.

The aim of the N-ocular stereo introduced in this chapter is to solve the correspondence problem with low computational cost and acceptable solution quality, rather than to compute optimal solutions. Basically, N-ocular stereo is also NP-hard; however, it does not require iterative computation. In addition, the computational costs can be reduced by checking the apparent sizes of targets and ignoring small (distant) ones.

In the following, we mainly discuss the handling of observation errors.

7.2.3 Basic Algorithm of N-Ocular Stereo

In trinocular stereo, three vision sensors are used to measure the target location and to verify the correspondence. In N-ocular stereo, on the other hand, more than three vision sensors are used. This is based on the idea that observation errors are reduced by using more visual information.

The basic process of N-ocular stereo is as follows (a code sketch is given after the steps):

1. Measure the location of a target from the azimuth angles detected by a pair of arbitrary vision sensors, as shown in Figure 7.1 (binocular stereo).

2. Check if another sensor observes the target at the location measured with (N−1)-ocular stereo. If so, the location is considered as a result of N-ocular stereo (see A and B in Figure 7.1). Iterate this step from N = 3 to N = (the number of sensors).

3. Finally, the locations measured with only two sensors (C and D in Figure 7.1) are considered as wrong matchings and erased from the list of candidates.
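The following is a minimal sketch of this loop under several assumptions that are not part of the thesis: sensors are given as 2-D positions with lists of azimuth angles in a common world frame, simple ray intersection is used in place of the circle and hexagon matching described later, and the angular tolerance max_err is arbitrary. All names are illustrative.

```python
import math

def _wrap(a):
    """Wrap an angle difference into (-pi, pi]."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def _intersect(p1, a1, p2, a2):
    """Intersection of two rays (origin, azimuth); None if parallel or behind."""
    d1 = (math.cos(a1), math.sin(a1))
    d2 = (math.cos(a2), math.sin(a2))
    det = -d1[0] * d2[1] + d1[1] * d2[0]
    if abs(det) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (-dx * d2[1] + dy * d2[0]) / det
    s = (d1[0] * dy - d1[1] * dx) / det
    if t <= 0 or s <= 0:
        return None
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def n_ocular_candidates(sensors, max_err=0.05):
    """sensors: {id: ((x, y), [azimuth, ...])}.  Step 1: binocular candidates
    from every sensor pair.  Step 2: count how many other sensors also observe
    a target in that direction.  Step 3: keep only candidates supported by
    three or more sensors."""
    ids = sorted(sensors)
    candidates = []
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            for a in sensors[i][1]:
                for b in sensors[j][1]:
                    p = _intersect(sensors[i][0], a, sensors[j][0], b)
                    if p is not None:
                        candidates.append((p, {i, j}))
    results = []
    for p, support in candidates:
        for k in ids:
            if k in support:
                continue
            (sx, sy), azimuths = sensors[k]
            bearing = math.atan2(p[1] - sy, p[0] - sx)
            if any(abs(_wrap(bearing - a)) < max_err for a in azimuths):
                support.add(k)
        if len(support) >= 3:
            results.append((p, support))
    return results
```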


Figure 7.2: Localization of a target considering observation errors.

7.2.4 Localization of Targets and Error Handling

As described in Section 7.2.2, observation errors of azimuth angles should be considered when measuring people's locations, since the human body deforms every moment and is widely projected on the sensors. Here, we suppose the human body is represented with a circle of a constant radius, and the location of a person is represented as the center of the circle. The errors in the model matching can be handled with the following two parameters:

• α: Detection errors of the right and left side of a target

• β: An error of the human model, i.e., the error of the circle's radius

With the parameters α and β, the center of the circle is localized within the hexagon shown in Figure 7.2. It is computed as follows: suppose that a target C with a radius r is observed from the sensor, and the detection error of the right and left side of the target is α, as shown in Figure 7.2. First, a circle C− with a radius (r−β) is considered. The black region in Figure 7.2 indicates a possible region for the center location of the circle C−, on condition that the right and left sides of the circle C− are projected within ±α from those of the target C, respectively. Here, the straight lines l and m are parallel to l′ and m′, respectively, and the black region indicates only the upper half of the possible region for the circle C−. In the same way, the dark gray region indicates a possible region for the center location of a circle C+ with a radius (r+β). Here, the straight lines l″ and m″ are parallel to l and m, respectively. Hence, the center of the circle whose radius is between (r−β) and (r+β) exists in the merged region of the black, the dark gray and the light gray regions (Figure 7.2 shows only the upper half of the region). This region indicates the location of the target C.

Figure 7.3: Localization of a target by binocular stereo.

In the above method, target matchings can be verified by checking whether the hexagons overlap each other. In the first step of N-ocular stereo, the target is then localized at the overlapped region of two hexagons, as shown in Figure 7.3. In the same way, in the second step, the target is localized at the overlapped region of N hexagons. If α and β are made smaller, the overlapped region also becomes smaller; when it finally becomes a point, it can be considered as the location of the target.


Figure 7.4: False matches in N-ocular stereo.

7.2.5 False Matches in N-Ocular Stereo

N-ocular stereo can solve the correspondence problem of multiple targets in most cases; however, it cannot solve the target correspondence for particular arrangements of targets. In Figure 7.4, for example, it is estimated by N-ocular stereo that targets exist at up to four locations, A through D, including a false one (called a ghost target). In general, there is no way to eliminate the ghost target except to observe the motion of the intersections for a while [Shams96].

A false match in N-ocular stereo occurs when an azimuth angle of a target is associated with multiple locations (in Figure 7.4, an azimuth angle observed by sensor 1 is used for the locations B and D). Therefore, if all of the azimuth angles which are used for measuring a target location are also used for other locations, it is estimated that the location may be a false match (the location D in the case of Figure 7.4).

In the implemented system, the false matches are considered in the process of target tracking. In the process, each of the measured locations is related to the nearest one of the previously measured locations, and the locations of false matches are checked after those of correct matchings.
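The false-match rule above can be expressed compactly; the following is a minimal sketch (not the thesis implementation), where the way a location records the rays that support it is an assumption.

```python
def flag_possible_ghosts(locations):
    """locations: {location_id: set of (sensor_id, azimuth_index) rays used
    to measure it}.  A location is flagged as a possible ghost when every ray
    it uses also supports some other location (e.g. D in Figure 7.4)."""
    ghosts = set()
    for loc, rays in locations.items():
        other_rays = set()
        for other, r in locations.items():
            if other != loc:
                other_rays |= r
        if rays and rays <= other_rays:
            ghosts.add(loc)
    return ghosts
```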


Figure 7.5: Simplified N-ocular stereo.

7.3 Implementing N-Ocular Stereo

7.3.1 Simplified N-Ocular Stereo

In the N-ocular stereo described in the previous section, the cost of verifying the overlapped regions of hexagons and that of the convergent operations are very high, and it is difficult to perform real-time computation. Therefore, we have simplified N-ocular stereo as follows (a code sketch of the merging step follows the list):

1. In the first step (binocular stereo), place a circle at the intersection of the azimuth angles detected by two arbitrary sensors, and consider the circle as the target location (see the three black circles shown in Figure 7.5). Here, the radius of the circle is assumed to be 30cm since the targets are people.

2. In the second step (N-ocular stereo), check if the circles overlap each other to verify whether the Nth sensor observes the target. If the circles overlap each other, place a new circle with a radius of 30cm at the center of gravity of the circles. It is considered as the target location measured with N sensors.
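A minimal sketch of the overlap-and-merge step is given below; the two-circle case and the 30cm radius follow the description above, while the function name and units are illustrative.

```python
R = 0.30  # radius of the human model in metres (30 cm)

def merge_circles(c1, c2):
    """If two candidate circles (centres in metres) overlap, return a new
    30 cm circle at their centre of gravity, as in step 2 of the simplified
    N-ocular stereo; otherwise return None."""
    (x1, y1), (x2, y2) = c1, c2
    if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (2 * R) ** 2:  # distance <= 2R
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    return None
```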


7.3.2 Error Handling in the Simplified N-Ocular Stereo

In the simplified N-ocular stereo, the errors α and β described in Section 7.2.4 are handled as follows.

α: Detection Errors of the Right and Left Side of a Target

As described in Section 7.2.2, binocular stereo using omnidirectional vision sensors has a low-precision problem with respect to targets located along the baseline of the sensors [Ishiguro92]. In the simplified N-ocular stereo, this causes the following problems: Figure 7.6 (a), (b) and (c) show examples in the step of binocular stereo, where there is a target but no circle is placed since there is no intersection on account of observation errors of the azimuth angles. Figure 7.6 (d) shows another example in the step of N-ocular stereo, where the target cannot be localized since the circles which are placed in the step of (N−1)-ocular stereo do not overlap each other on account of observation errors.

Here, we introduce the following techniques to cope with the above problems.

When there is no intersection of the two lines: If the angles between the baseline l of the two sensors and the azimuth angles detected by the sensors (let these be θ1 and θ2) are both equal to or less than α (see Figure 7.6 (a) and (b)), consider that a target exists on the baseline l. Then, locate the target in such a way that the ratio of the distances between the target and each sensor (let this be d1 : d2) matches that of the apparent sizes of the target observed by the sensors. If only one of the angles (let this be θ2) is equal to or less than α, consider that a target exists on the line representing the other azimuth angle (θ1). Then, correct the azimuth angle (θ2) by ∆θ (∆θ ≤ α), and locate the target in such a way that the ratio of the distances d1 : d2 is close to that of the apparent sizes of the target.

When two circles do not overlap each other: If the circles overlap each other after correcting one of the azimuth angles by ∆θ (∆θ ≤ α), consider that they overlap (see Figure 7.6 (d)).

Figure 7.6: Error compensation.

β: An Error of the Human Model

After the target is localized, the apparent size of the target projected on each sensor can be computed from the distance between the sensor and the measured target location. If it differs by more than β from the actual size observed by the sensor, consider that the measured location is a false match.
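As a rough illustration of this check (not the formula used in the thesis), one could predict the apparent size from the distance with a simple inverse-distance model; the prediction function, the scale factor and the 30cm radius below are assumptions.

```python
def size_consistent(observed_size, distance, beta, scale=1.0, radius=0.30):
    """Predict how large a 30 cm target should appear at 'distance' (metres)
    using an assumed inverse-distance model, and accept the match only if the
    prediction differs from the observed apparent size by at most beta."""
    predicted = scale * 2.0 * radius / distance
    return abs(predicted - observed_size) <= beta
```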

7.4 Experimentation

7.4.1 Hardware Configuration

We have developed a real-time human tracking system (see Figure 7.7), which measures people's locations based on N-ocular stereo and tracks them in real time. The system consists of four sensors, and the omnidirectional images taken with the sensors are merged into one image with a quadrant image unit, then sent to a standard image capture card (Matrox Meteor, 640×480 pixels) on a PC (Pentium II 400MHz with 128MB memory). The sensors are arranged in the center of a room (9m×7m) at a height of approximately 1m. In this system, the locations and the orientations of the sensors are measured before tracking.

Figure 7.7: Overview of the real-time human tracking system.

Figure 7.8: Detecting targets by background subtraction.

The system detects targets in the omnidirectional images by background subtraction, since the sensors are fixed in the environment. The top of Figure 7.8 shows an unwrapped image, and the bottom graph shows the vertical sum of the difference at each pixel. The targets A, B and C are detected with a threshold, shown with the broken line in Figure 7.8, which is determined by taking 10 frames. The centroid is regarded as the azimuth of the target.
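A minimal sketch of this detection step is given below, assuming grayscale unwrapped and background images as NumPy arrays and a precomputed threshold; the run-grouping of columns and the array shapes are assumptions, not the thesis code.

```python
import numpy as np

def detect_azimuths(unwrapped, background, threshold):
    """Column-wise sum of the absolute background difference over an
    unwrapped (panoramic) grayscale image of shape (height, width), then
    thresholding and taking the centroid column of each connected run as the
    azimuth of a target.  Returns azimuths in radians in [0, 2*pi)."""
    diff = np.abs(unwrapped.astype(np.int32) - background.astype(np.int32))
    profile = diff.sum(axis=0)                      # vertical sum per column
    active = profile > threshold
    azimuths, start = [], None
    for col, on in enumerate(np.append(active, False)):
        if on and start is None:
            start = col
        elif not on and start is not None:
            cols = np.arange(start, col)
            centroid = (cols * profile[start:col]).sum() / profile[start:col].sum()
            azimuths.append(centroid / unwrapped.shape[1] * 2 * np.pi)
            start = None
    return azimuths
```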

7.4.2 Measurement Precision of N-ocular Stereo

Figure 7.9 shows the error range of the target locations measured by the system. Here, we have used a white cylinder with a diameter of 30cm as a target, and placed it at precisely measured marks on the floor. The circles A through N in Figure 7.9 indicate the locations of the marks, and “+” indicates the cylinder locations measured by the system over 100 frames. Thus, the distribution of the measured locations (Figure 7.9) is analogous to that of the uncertainty of omnidirectional stereo (see Figure 2.8). Table 7.1 shows the averages and errors (distances between measured and actual target locations) of the measured locations. The maximum error is 0.17m, at location A.

Figure 7.9: Measurement precision of N-ocular stereo.

Table 7.1: Averages and errors of the measured locations.

Loc   Average [m]    Err [m]
A     (1.35, 5.11)   0.170
B     (2.37, 5.07)   0.098
C     (3.10, 5.15)   0.056
D     (3.88, 5.08)   0.086
E     (4.61, 5.12)   0.041
F     (1.60, 4.35)   0.077
G     (2.36, 4.34)   0.077
H     (3.85, 4.38)   0.043
I     (4.63, 4.38)   0.061
J     (1.61, 3.58)   0.092
K     (2.40, 3.61)   0.114
L     (3.10, 3.59)   0.053
M     (3.85, 3.59)   0.047
N     (4.63, 3.61)   0.056

In Figure 7.9, we can find that the target locations are measured within 5cm error if the target is located within 3m of three sensors (in N-ocular stereo, at least three sensors need to simultaneously observe the same target for measuring its location). However, the precision depends on the number of sensors, the arrangement of the sensors, the precision of background subtraction, and so on.

7.4.3 Tracking People

Figure 7.10: Trajectories of a walking person.

Figure 7.10 shows the trajectories of a walking person for one minute, with the same arrangement of sensors as in the experimentation of Section 7.4.2. The solid lines show the trajectories, and the dots on the lines show the person's locations at intervals of 0.5 second. As shown in Figure 7.10, the system could track the person without losing sight of the person. In this experimentation, the person's location measured by N-ocular stereo is smoothed over 0.5 second, so that there is a delay of about 0.25 second.
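The exact form of the 0.5-second smoothing is not specified in the text; one simple possibility, shown here only as an assumption-laden sketch, is a moving average over the most recent frames (15 frames at 30 fps), which indeed lags the true position by roughly half the window.

```python
from collections import deque

class PositionSmoother:
    """Moving-average smoother over the last 'window' frames; with a 15-frame
    window at 30 fps this averages about 0.5 s of data and delays the output
    by roughly 0.25 s."""
    def __init__(self, window=15):
        self.buf = deque(maxlen=window)

    def update(self, xy):
        self.buf.append(xy)
        n = len(self.buf)
        return (sum(p[0] for p in self.buf) / n,
                sum(p[1] for p in self.buf) / n)
```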


The broken lines in Figure 7.10 indicate the person's locations at every frame before smoothing. There are large errors around A and B. This is because (1) binocular stereo using omnidirectional vision sensors has a low-precision problem with respect to targets located along the baseline of the sensors (this corresponds to A in Figure 7.10), and (2) the result of background subtraction becomes noisy if the color of the person's clothes is similar to that of the background (this corresponds to B in Figure 7.10). In the latter case, general noise filtering techniques such as the Kalman filter may not be able to successfully eliminate the noise, since the noise is different from white noise. It is effective to add more sensors to cope with this kind of noise.

In this implementation, the system could simultaneously track three people at video rate (30 fps). The experimental results show that the system completely tracked one person, correctly tracked two persons for 99% of the time, and correctly tracked three persons for 89% of the time. Tracking errors occurred in the following cases:

• When two or more people moved close to each other, the system recognized them as one person in the background subtraction, or could not correctly identify them in the tracking phase.

• When a person moved behind another person, the system could not measure that person's location.

In order to reduce the former error, a more sophisticated method is needed for detecting people. The latter error will be reduced by adding sensors to the environment.

Figure 7.11 shows a tracking result with a different configuration of the sensors. Here, a walking person is tracked by eight omnidirectional vision sensors. In the figure, the solid line indicates the tracking result using eight sensors, and the broken lines indicate the results using only four sensors. We can see that the measurement precision has been improved by the use of eight sensors.


Figure 7.11: Tracking people with eight sensors.

7.5 Applications

The developed system can show live images of tracked people, as shown in Figure 7.12, as well as record their trajectories. The images are taken with the sensors, and zoomed in and out according to the distance between the people and the sensor. The side views of the people also enable the system to identify the people by the colors of their clothes, to recognize their behaviors by observing the motions of their heads and arms, and so on. In addition, more robust recognition will be achieved by using redundant visual information taken from various points of view with multiple sensors.

Figure 7.12: Showing people's locations and images.

7.5.1 Monitoring System

The distributed omnidirectional vision system can synthesize images at arbitrary viewpoints in the environment from the wide-viewing-angle images taken by the omnidirectional vision sensors. By combining this with real-time human tracking, the system can show users the images that the tracked person is viewing.

Virtual View Generation

Figure 7.13: Range space search for synthesizing an image at a virtual viewpoint.

Figure 7.13 shows the basic idea of the range space search for synthesizing virtual images at an arbitrary viewpoint [Ng99]. In order to synthesize a virtual image, the method computes every small region projected onto the virtual image plane as follows:

1. Assume that there is an object M at a distance d from the virtual viewpoint and that it is projected onto m on the virtual image plane. If the assumption is true, the object is also projected onto a specific location in the view of each sensor, m1 through m4 in Figure 7.13. These locations can be computed based on d and the viewing direction (see Appendix A). That is, m1 through m4 have similar template images if there is an object at M.

2. Compute the best match of the projected images among the sensors by varying d (range space search), and regard the template image of the best match as the image projected onto the small region on the virtual image plane.

The best match can be stably computed with larger templates than those used in standard dense stereo.
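The following sketch illustrates the search for one small region of the virtual image plane. It assumes a user-supplied sensor object with project(point) and patch(pixel, size) methods and scores agreement with a simple mean squared difference; these interfaces and the scoring are assumptions, not the implementation of [Ng99].

```python
import numpy as np

def range_space_search(viewpoint, direction, depths, sensors, patch_size=16):
    """For a viewing ray of the virtual camera (world-space 'viewpoint' and
    unit 'direction' as NumPy arrays), hypothesise an object at each depth d,
    project the 3-D point into every real sensor, and keep the depth whose
    extracted templates agree best.  Returns (best_depth, template)."""
    best_depth, best_score, best_template = None, float("inf"), None
    for d in depths:
        point = viewpoint + d * direction          # hypothesised object M
        templates = [s.patch(s.project(point), patch_size) for s in sensors]
        ref = templates[0].astype(np.float64)
        # Disagreement: mean squared difference of every template to the first.
        score = sum(float(np.mean((ref - t.astype(np.float64)) ** 2))
                    for t in templates[1:])
        if score < best_score:
            best_depth, best_score, best_template = d, score, templates[0]
    return best_depth, best_template
```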

7.5.2 Gesture Recognition System

By adding the function of gesture recognition to the tracking system, the system can recognize the motions of people wherever they are, with multiple omnidirectional vision sensors observing them from various viewpoints. This is a great advantage over conventional recognition systems using a single camera.


Figure 7.14: Modeling human motions with multiple omnidirectional vision sensors.

Modeling Human Motions

Our system has models of human motions for gesture recognition given in advance. The models, called view and motion-based aspect models (VAMBAM) [Ishiguro01], are simultaneously extracted from multiple vision sensors surrounding a person in the modeling phase (see Figure 7.14).

Compared with conventional models for gesture recognition, our model has rich visual information about the human motions. Existing methods extract motion models with a single camera located in front of a person. However, the best viewing angle varies according to the pose of the person; that is, cameras taking the side view of the person may give the best information about the motion. By observing the person from various directions, our method can model human motions more properly than conventional methods. Since the sensors surrounding the person have different priorities for motion recognition, the feature vector of each sensor is given a different weight according to the motion.

Recognizing Human Motions

In the recognition process of human motions, the system tracks a person in the environment and observes the motions with multiple sensors. The sensors digitally zoom in on the person according to the measured distance between the sensors and the person, in order to normalize the apparent size. Then, based on the relative orientation of the person and the sensors, the system compares the models with the motion observed from multiple viewpoints.

Figure 7.15: Screen shot of a gesture recognition system.

Figure 7.15 is a screen shot of the developed system. Our gesture recognition system based on the distributed omnidirectional vision system realizes location/orientation-independent gesture recognition with multiple omnidirectional vision sensors.


7.6 Summary

In this chapter, we have explained N-ocular stereo for verifying the correspondence among multiple targets and measuring their locations, using multiple omnidirectional vision sensors. In addition, several methods have been developed for compensating observation errors, in order to cope with the precision problem in omnidirectional stereo. With the experimentation, we have shown that the system can robustly track people in real time only with visual information.

Since the system gives live images of tracked people and records their trajectories, various application systems can be developed using the tracking system, such as a monitoring system. By adding the function of gesture recognition, the system can recognize human behaviors. Thus, the distributed omnidirectional vision system will recognize the behaviors of people and robots, and support them as an infrastructure for real world agents.


Chapter 8

Conclusion

One of the practical approaches recently focused on in the areas of computer vision and robotics is to use multiple vision sensors with simple visual processing. The distributed omnidirectional vision system proposed in this thesis is distinguished from existing multiple camera systems by the seamless coverage of the environment provided with multiple omnidirectional vision sensors. Taking advantage of the rich and redundant visual information, we can develop various practical vision systems.

In this research, we have studied the fundamental methods for localizing the sensors and objects in the distributed omnidirectional vision system. The distributed omnidirectional vision system has different aspects in localization compared to conventional multiple camera systems; that is, target locations are measured by omnidirectional stereo, and a pair of omnidirectional vision sensors located apart from each other is used for the stereo. In addition, various methods should be considered according to the situation, e.g., whether the locations of the sensors and/or objects are known.

The thesis has systematically discussed these issues by classifying them into five groups. When the locations of the sensors are known, objects are localized by omnidirectional stereo, where the use of multiple sensors improves the precision of the localization. When the locations of objects are known, the sensors are localized by triangulation, where the reference objects should be carefully selected in order to avoid localization errors. The sensors are also localized by observing the azimuth angles to other sensors and applying triangle constraints.

The localization methods of the sensors by observing reference objects of unknown locations are important, since the system consists of a large number of sensors. That is, with these methods, the locations of the sensors can be autonomously computed by observing objects in the environment. Generally, if the locations of the objects are unknown, it is difficult to compute the sensor locations by solving complex numerical expressions on account of the many unknown parameters. For this issue, the thesis proposed methods in two different approaches, i.e., statistical and qualitative approaches, with which the sensor locations can be computed without complex numerical expressions.

• In the statistical localization of the sensors, the method observes objects, analyzes the azimuth angles, and estimates the baselines among the sensors, from which the sensors are localized using triangle constraints. In this method, it is not necessary to determine the correspondence of the objects.

• In the qualitative localization, the method observes the motion directions of objects, applies three point constraints, and determines the qualitative locations of the sensors. This method directly acquires the qualitative representation from the observed qualitative information.

By integrating the localization methods according to the situation, various localization methods and application systems can be developed with the distributed omnidirectional vision system. For example, stepwise and autonomous localization of the sensors is possible by integrating qualitative, statistical, and quantitative localization methods. In multiple camera systems, including the distributed omnidirectional vision system, that consist of many vision sensors and work as an infrastructure in the real environment, this method is important for the flexibility of the system.

In addition to the discussion about the localization methods, we have developed application systems, i.e., a robot navigation system and a real-time human tracking system. In the implementation of the actual systems, we considered various problems such as modeling of the environment and the use of multiple vision sensors, as well as localization of the sensors and objects.

• In the robot navigation system, we have proposed a method for navigating robots without precise sensor parameters by the distributed omnidirectional vision system. The system performs complicated robot tasks with simple visual processing, using the rich visual information provided by many sensors.

• In the real-time human tracking system, we have proposed N-ocular stereo for verifying the correspondence among multiple targets and measuring their locations in real time using multiple omnidirectional vision sensors. In addition, several methods have been developed for compensating observation errors, in order to improve the localization precision in omnidirectional stereo.

Although they employ quite simple visual processing, these practical systems perform stable control of the robots, precise localization of people, etc., with the rich and redundant visual information provided by the distributed omnidirectional vision system.

Thus, in this research we have made a systematic study of the distributed omnidirectional vision system, from the localization methods of the sensors and objects as the most fundamental techniques, to the development of application systems. The techniques investigated in this thesis form the foundations of the distributed omnidirectional vision system.

Recently, many researchers have started to investigate social robot systems that interact with humans. The distributed omnidirectional vision system will solve various problems in realizing such systems; it supports robot activities in the real environment by providing visual information. In addition, it can recognize and record human behaviors by tracking, gesture recognition, etc., which is useful for analyzing social aspects of human activities. Thus, the distributed omnidirectional vision system opens up a promising research direction in multiple camera systems.


In conclusion, we point out future research directions. Although this thesis mainly discusses the localization methods as one of the most fundamental issues in the distributed omnidirectional vision system, the following issues should also be addressed:

• Distributed processing

In order to develop a large-scale system using a large number of omnidirectional vision sensors, the scalability of the system is important. There are several systems that employ parallel but centralized processing. However, distributed processing is necessary for infrastructure systems embedded in the real environment, since the flexibility of the system is important; for example, sensors may be added or removed if required. Therefore, we should consider distributed processing for memorizing the environmental models, tracking and recognizing targets, etc., according to dynamic events in the environment.

• Dynamic organization of the network

The network structure connecting the sensors (i.e., the processing elements) is also important. In existing systems using multiple cameras, the computer network has a simple and static structure such as a LAN. However, dynamic organization of the network is necessary for realizing complicated infrastructure systems that recognize dynamic events over the network. Since the function of each element (sensor, processor, etc.) of the system in distributed processing is not predefined but dynamically changes according to the events and tasks, the network should also be dynamically organized according to them, in order to achieve efficient processing. In addition, this dynamic organization itself may represent the events that happen in the environment, since there are close relations between the events and the information flows over the network.


Bibliography

[Alon86] Noga Alon and Ervin Gyori, “The number of small semispaces of a finite set of points in the plane,” J. Combinatorial Theory, Series A, Vol. 41, No. 1, pp. 154–157, 1986.

[Ayache89] Nicholas Ayache and Olivier Faugeras, “Maintaining representations of the environment of a mobile robot,” IEEE Trans. Robotics and Automation, Vol. 5, No. 6, pp. 804–819, 1989.

[Boyd98] Jeffrey E. Boyd, Edward Hunter, Patrick H. Kelly, Li-Cheng Tai, Clifton B. Phillips, and Ramesh C. Jain, “MPI-Video infrastructure for dynamic environments,” In Proc. IEEE Int. Conf. Multimedia Computing and Systems, pp. 249–254, 1998.

[Broida90] Ted J. Broida, S. Chandrashekhar, and Rama Chellappa, “Recursive 3-D motion estimation from a monocular image sequence,” IEEE Trans. Aerospace and Electronic Systems, Vol. 26, No. 4, pp. 639–656, 1990.

[Dey98] Tamal K. Dey, “Improved bounds for planar k-sets and related problems,” Discrete and Computational Geometry, Vol. 19, No. 3, pp. 373–382, 1998.

[Durfee91] Edmund H. Durfee and Victor R. Lesser, “Partial global planning: A coordination framework for distributed hypothesis formation,” IEEE Trans. Systems, Man, and Cybernetics, Vol. 21, No. 5, pp. 1167–1183, 1991.


[Erman80] Lee D. Erman, Frederick Hayes-Roth, Victor R. Lesser, and D. Raj Reddy, “The Hearsay-II speech-understanding system: Integrated knowledge to resolve uncertainty,” Comput. Surveys, Vol. 12, pp. 213–253, 1980.

[Forbus91] Kenneth D. Forbus, Paul Nielsen, and Boi Faltings, “Qualitative spatial reasoning: the CLOCK project,” Artificial Intelligence, Vol. 51, pp. 417–471, 1991.

[Freksa92] Christian Freksa, “Using orientation information for qualitative spatial reasoning,” In A. U. Frank, I. Campari, and U. Formentini Eds., Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, International Conference GIS, LNCS, Vol. 639, pp. 162–178, Springer, Berlin, 1992.

[Grimson98] W. E. L. Grimson, Chris Stauffer, Raquel Romano, and Lily Lee, “Using adaptive tracking to classify and monitor activities in a site,” In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 22–29, 1998.

[Gurewitz86] E. Gurewitz, I. Dinstein, and B. Sarusi, “More on the benefit of a third eye,” In Proc. ICPR, pp. 966–968, 1986.

[Hoover00] Adam Hoover and Bent David Olsen, “Sensor network perception for mobile robotics,” In Proc. IEEE Int. Conf. Robotics and Automation, pp. 342–347, 2000.

[Hosoda94] Koh Hosoda and Minoru Asada, “Versatile visual servoing without knowledge of true jacobian,” In Proc. Int. Conf. Intelligent Robots and Systems (IROS), pp. 186–193, 1994.

[Ishiguro97] Hiroshi Ishiguro, “Distributed vision system: A perceptual information infrastructure for robot navigation,” In Proc. IJCAI, pp. 36–41, 1997.


[Ishiguro98] Hiroshi Ishiguro, “Development of low-cost compact omnidirectional vision sensors and their applications,” In Proc. Int. Conf. Information Systems, Analysis and Synthesis, pp. 433–439, 1998.

[Ishiguro01] Hiroshi Ishiguro and Takuichi Nishimura, “VAMBAM: View and motion-based aspect models for distributed omnidirectional vision systems,” In Proc. IJCAI, pp. 1375–1380, 2001.

[Ishiguro92] Hiroshi Ishiguro, Masashi Yamamoto, and Saburo Tsuji, “Omni-directional stereo,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 257–262, 1992.

[Isli00] Amar Isli and Anthony G. Cohn, “A new approach to cyclic ordering of 2D orientations using ternary relation algebras,” Artificial Intelligence, Vol. 122, pp. 137–187, 2000.

[Kanade99] Takeo Kanade, Peter Rander, Sunder Vedula, and Hideo Saito, “Virtualized reality: Digitizing a 3D time-varying event as is and in real time,” In Yuichi Ohta and Hideyuki Tamura Eds., Mixed Reality, Merging Real and Virtual Worlds, pp. 41–57, Springer-Verlag, Berlin, 1999.

[Kanade96] Takeo Kanade, Atsushi Yoshida, Kazuo Oda, Hiroshi Kano, and Masaya Tanaka, “A stereo machine for video-rate dense depth mapping and its new applications,” In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 196–202, 1996.

[Kato99] Koji Kato and Hiroshi Ishiguro, “Identifying and localizing robots in a multi-robot system,” In Proc. Int. Conf. Intelligent Robots and Systems (IROS), pp. 966–972, 1999.

[Kim92] Hyun-Kyung Kim, “Qualitative kinematics of linkages,” In Boi Faltings and Peter Struss Eds., Recent Advances in Qualitative Physics, pp. 137–151, MIT Press, London, 1992.


[Kuipers91] Benjamin J. Kuipers and Yung-Tai Byun, “A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations,” J. Robotics and Autonomous Systems, Vol. 8, pp. 47–63, 1991.

[Latecki93] Longin Latecki and Ralf Rohrig, “Orientation and qualitative angle for spatial reasoning,” In Proc. IJCAI, pp. 1544–1549, 1993.

[Lesser83] Victor R. Lesser and Daniel D. Corkill, “The distributed vehicle monitoring testbed: A tool for investigating distributed problem solving networks,” AI Magazine, pp. 15–33, 1983.

[Levitt90] Tod S. Levitt and Daryl T. Lawton, “Qualitative navigation for mobile robots,” Artificial Intelligence, Vol. 44, pp. 305–360, 1990.

[Madsen98] Claus B. Madsen and Claus S. Andersen, “Optimal landmark selection for triangulation of robot position,” J. Robotics and Autonomous Systems, Vol. 23, No. 4, pp. 277–292, 1998.

[Matsuyama98] Takashi Matsuyama, “Cooperative distributed vision — dynamic integration of visual perception, action, and communication —,” In DARPA Image Understanding Workshop, pp. 365–384, 1998.

[Matsuyama00] Takashi Matsuyama, Shinsaku Hiura, Toshikazu Wada, Kentaro Murase, and Akio Yoshioka, “Dynamic memory: Architecture for real time integration of visual perception, camera action, and network communication,” In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 728–735, 2000.

[Matthies87] Larry Matthies and Steven A. Shafer, “Error modeling in stereo navigation,” IEEE J. Robotics and Automation, Vol. RA-3, No. 3, pp. 239–248, 1987.

[Mori97] Taketoshi Mori, Yoshikatsu Kamisuwa, Hiroshi Mizoguchi, and Tomomasa Sato, “Action recognition system based on human finder and human tracker,” In Proc. Int. Conf. Intelligent Robots and Systems (IROS), pp. 1334–1341, 1997.

[Nagel86] Hans-Hellmut Nagel, “Image sequences — ten (octal) years — from phenomenology towards a theoretical foundation,” In Proc. Int. Conf. Pattern Recognition, pp. 1174–1185, 1986.

[Ng99] Kim C. Ng, Hiroshi Ishiguro, Mohan M. Trivedi, and Takushi Sogo, “Monitoring dynamically changing environments by ubiquitous vision system,” In Second IEEE Workshop on Visual Surveillance, pp. 67–73, 1999.

[Nishida97] Yoshifumi Nishida, Masashi Takeda, Taketoshi Mori, Hiroshi Mizoguchi, and Tomomasa Sato, “Monitoring patient respiration and posture using human symbiosis system,” In Proc. Int. Conf. Intelligent Robots and Systems (IROS), pp. 632–639, 1997.

[Okutomi93] Masatoshi Okutomi and Takeo Kanade, “A multiple-baseline stereo,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 4, pp. 353–363, 1993.

[Pattipati92] Krishna R. Pattipati, Somnath Deb, Yaakov Bar-Shalom, and R. B. Washburn, “A new relaxation algorithm and passive sensor data association,” IEEE Trans. Automatic Control, Vol. 37, No. 2, pp. 198–213, 1992.

[Roach80] John W. Roach and J. K. Aggarwal, “Determining the movement of objects from a sequence of images,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 2, No. 6, pp. 554–562, 1980.

[Sastry91] C. R. Sastry, E. W. Kamen, and M. Simaan, “An efficient algorithm for tracking the angles of arrival of moving targets,” IEEE Trans. Signal Processing, Vol. 39, No. 1, pp. 242–246, 1991.

[Schlieder95] Christoph Schlieder, “Reasoning about ordering,” In Proc. Int. Conf. Spatial Information Theory, pp. 341–349, 1995.


[Shams96] Soheil Shams, “Neural network optimization for multi-target multi-sensor passive tracking,” Proc. IEEE, Vol. 84, No. 10, pp. 1442–1457, 1996.

[Sogo99] Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Acquisition of qualitative spatial representation by visual observation,” In Proc. IJCAI, pp. 1054–1060, 1999.

[Sogo01] Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Acquisition and propagation of spatial constraints based on qualitative information,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, pp. 268–278, 2001.

[Yachida86] Masahiko Yachida, “3-D data acquisition by multiple views,” In 3rd International Symposium on Robotics Research (ISRR’85), pp. 11–18, MIT Press, London, 1986.

[Yeap88] Wai K. Yeap, “Towards a computational theory of cognitive maps,” Artificial Intelligence, Vol. 34, pp. 297–360, 1988.

[Yokoo98] Makoto Yokoo, Edmund H. Durfee, Toru Ishida, and Kazuhiro Kuwabara, “The distributed constraint satisfaction problem: Formalization and algorithms,” IEEE Trans. Knowledge and Data Engineering, Vol. 10, No. 5, pp. 673–685, 1998.

[Yokoo99] Makoto Yokoo and Toru Ishida, “Search algorithms for agents,” In Gerhard Weiss Ed., Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, pp. 165–199, MIT Press, 1999.

Publications

Major Publications

Journals

1. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Spatial constraint propagation for identifying qualitative spatial structure,” Transactions of IEICE (The Institute of Electronics, Information and Communication Engineers), Vol. J81-D-II, No. 10, pp. 2311–2320, 1998 (in Japanese).

2. Takushi Sogo, Katsumi Kimoto, Hiroshi Ishiguro, and Toru Ishida, “Mobile robot navigation by a distributed vision system,” Journal of the Robotics Society of Japan, Vol. 17, No. 7, pp. 1009–1016, 1999 (in Japanese).

3. Takushi Sogo, Hiroshi Ishiguro, and Mohan M. Trivedi, “Real-time human tracking system with multiple omni-directional vision sensors,” Transactions of IEICE (The Institute of Electronics, Information and Communication Engineers), Vol. J83-D-II, No. 12, pp. 2567–2577, 2000 (in Japanese).

4. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Acquisition and propagation of spatial constraints based on qualitative information,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 23, No. 3, pp. 268–278, 2001.

5. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Mobile robot navigation by a distributed vision system,” New Generation Computing, Vol. 19, No. 2, pp. 121–137, 2001.

6. Takuichi Nishimura, Takushi Sogo, Shinobu Ogi, Ryuichi Oka, and Hiroshi Ishiguro, “Recognition of human motion behaviors using view-based aspect model based on motion change,” Transactions of IEICE (The Institute of Electronics, Information and Communication Engineers), D-II, 2001 (in Japanese, to appear).

7. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Studies on distributed omnidirectional vision systems,” IPSJ Transactions on Computer Vision and Image Media, 2001 (in Japanese, to appear).

International Conference

1. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Acquisition of qualitative spatial representation by visual observation,” International Joint Conference on Artificial Intelligence (IJCAI’99), pp. 1054–1060, 1999.

Chapter in Book

1. Takushi Sogo, Hiroshi Ishiguro, and Mohan M. Trivedi, “N-ocular stereo for real-time human tracking,” In Ryad Benosman and Sing Bing Kang Eds., Panoramic Vision: Sensors, Theory and Applications, pp. 359–375, Springer-Verlag, 2001.

Other Publications

Workshops

1. Hiroshi Ishiguro, Ryusuke Sagawa, Takushi Sogo, and Toru Ishida, “Human behavior recognition by a distributed vision system,” DiCoMo Workshop, pp. 615–620, 1997 (in Japanese).

2. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Spatial constraint propagation for identifying qualitative spatial structure,” 6th Workshop on Multi-Agent and Cooperative Computation (MACC), http://www.kecl.ntt.co.jp/csl/msrg/events/macc97/sogo.html, 1997 (in Japanese).

3. Kim C. Ng, Hiroshi Ishiguro, Mohan M. Trivedi, and Takushi Sogo, “Monitoring dynamically changing environments by ubiquitous vision system,” Second IEEE Workshop on Visual Surveillance (VS’99), pp. 67–73, 1999.

4. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Mobile robot navigation by distributed vision agents,” In Nakashima, H. and Zhang, C. Eds., Approaches to Intelligent Agents (Second Pacific Rim International Workshop on Multi Agents (PRIMA’99)), Lecture Notes in Artificial Intelligence, Vol. 1733, pp. 96–110, Springer-Verlag, Berlin, 1999.

5. Takuichi Nishimura, Takushi Sogo, Ryuichi Oka, and Hiroshi Ishiguro, “Recognition of human motion behaviors using multiple omni-directional vision sensors,” SIG-CII-2000-MAR-04, Japan Society for Artificial Intelligence, pp. 16–21, 2000.

6. Takushi Sogo, Hiroshi Ishiguro, and Mohan M. Trivedi, “Real-time target localization and tracking by N-ocular stereo,” IEEE Workshop on Omnidirectional Vision (OMNIVIS’00), pp. 153–160, 2000.

Conventions

1. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Building qualitative spatial models for distributed vision systems,” 55th National Convention IPSJ, 5AB-9, 1997 (in Japanese).

2. Takushi Sogo, Hiroshi Ishiguro, and Toru Ishida, “Identification of qualitative spatial structure by observation,” 12th Annual Conference of Japan Society for Artificial Intelligence, pp. 86–89, 1998 (in Japanese).

Appendix A

Generating Perspective Images

A.1 Mirror Shapes and Omnidirectional Images

As shown in Figure 1.1, the omnidirectional vision sensor has a mirror to obtain a 360 degree image with a standard camera mounted at the bottom of the sensor. There are various omnidirectional vision sensors of different mirror shapes, and different omnidirectional images are obtained according to the shape. The mirror shape of the sensor used in this research is a hyperboloid, which has a single center of projection and provides an omnidirectional image that can be transformed into a normal perspective image.

In the omnidirectional vision sensor with a hyperboloidal mirror, one of the focal points of the hyperboloidal mirror is exactly adjusted to the focal point OC of the camera, as shown in Figure A.1. With this configuration, an object point M at an arbitrary location is reflected on the surface of the hyperboloidal mirror and then projected into the view of the camera. An image point of the camera at OC can be uniquely related to an image point of a virtual camera at OM.

Therefore, a normal perspective image taken with a virtual camera at OM can be generated from the image taken with the camera at OC.

Figure A.1: Optical properties of the hyperboloidal mirror.

A.2 Transforming an Omnidirectional Image into a Perspective Image

The omnidirectional image PO = (x, y) is transformed into a virtual perspective image PV = (u, v) as follows (see Figure A.2).

Let the parameters of the hyperboloidal mirror be a and b. The mirror shape is represented as:

    \frac{x^2 + y^2}{a^2} - \frac{z^2}{b^2} = -1,

and its focal points are at OM = (0, 0, C) and OC = (0, 0, -C), where C = \sqrt{a^2 + b^2} (i.e., the eccentricity of the mirror). The focal length of the omnidirectional image is fO and that of the virtual perspective image is fV. The focal point of the virtual camera should be at OM, and the direction is represented with two angles as (θV, φV). Here, we assume that the tilt angle of the virtual image plane is zero, i.e., the u axis is parallel to the x-y plane.

Figure A.2: Generating perspective images.

The direction (θP, φP) of a point PV on the virtual image plane, viewed from OM, is represented as:

    \theta_P = \theta_V - \arctan\left(\frac{u}{f_V}\right)    (A.1)

    \phi_P = \phi_V - \arctan\left(\frac{v}{f_V}\right)    (A.2)

Note that tan φP must be less than b/a, otherwise PV falls inside the mirror.

Figure A.3: The R-z plane (line l through OM = (0, C), the hyperbolic curve H, and their intersection PM = (RM, zM)).

Then, PV is projected onto the mirror (let this point be PM = (xM, yM, zM)). Let us consider the plane defined by the three points PV, OM, and OC (let this be the R-z plane). PM is computed as follows (see also Figure A.3):

    \begin{pmatrix} x_M \\ y_M \\ z_M \end{pmatrix} =
    \begin{pmatrix} R_M \cos\theta_P \\ R_M \sin\theta_P \\ b\sqrt{(R_M/a)^2 + 1} \end{pmatrix},    (A.3)

where RM is the distance between PM and the z axis, which depends only on φP. RM is given by the intersection of the line l and the hyperbolic curve H on the R-z plane:

    l: z = \tan\phi_P \cdot R + C    (A.4)

    H: \frac{R^2}{a^2} - \frac{z^2}{b^2} = -1.    (A.5)

Then, RM is computed from the above equations:

    R_M = \begin{cases}
      \frac{C \tan\phi_P + b\sqrt{\tan^2\phi_P + 1}}{(b/a)^2 - \tan^2\phi_P} & \text{if } \tan\phi_P \neq -b/a \\[1ex]
      \frac{a^3}{2Cb} & \text{if } \tan\phi_P = -b/a
    \end{cases}    (A.6)
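For reference, (A.6) follows by substituting (A.4) into (A.5) and using C^2 = a^2 + b^2 (a brief sketch of the algebra, not spelled out in the original text):

    \left(\frac{b^2}{a^2} - \tan^2\phi_P\right) R^2 - 2C\tan\phi_P\, R - a^2 = 0.

Solving this quadratic for R and selecting the intersection on the mirror surface gives the first case of (A.6); when tan φP = -b/a the quadratic term vanishes and the remaining linear equation yields RM = a^3/(2Cb).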

Finally, PM is projected onto the image plane (let this point be PO = (x, y)). Since the ray going toward the focal point OM in the hyperboloidal mirror is reflected to the other focal point OC, PO is computed as follows:

    \begin{pmatrix} x \\ y \end{pmatrix} = \frac{f_O}{z_M + C} \begin{pmatrix} x_M \\ y_M \end{pmatrix}
      = \frac{f_O R_M}{z_M + C} \begin{pmatrix} \cos\theta_P \\ \sin\theta_P \end{pmatrix}.    (A.7)

Thus, the virtual perspective image point PV = (u, v) is projected onto the omnidirectional image point PO = (x, y). With equations (A.1), (A.2), (A.3), (A.6), and (A.7), the pixel values on the virtual perspective image can be computed from those on the omnidirectional image.
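The chain of equations (A.1) through (A.7) amounts to a single mapping per pixel. The following sketch (not from the original thesis; a minimal NumPy illustration assuming angles in radians and the parameter names used above) maps one virtual perspective pixel to omnidirectional image coordinates:

```python
import numpy as np

def perspective_to_omni(u, v, theta_V, phi_V, a, b, f_V, f_O):
    """Map a virtual perspective image point (u, v) to the omnidirectional
    image point (x, y), chaining equations (A.1)-(A.7).

    a, b           : parameters of the hyperboloidal mirror
    f_V, f_O       : focal lengths of the virtual and omnidirectional images
    theta_V, phi_V : viewing direction of the virtual camera [rad]
    """
    C = np.sqrt(a**2 + b**2)                      # eccentricity of the mirror

    # (A.1), (A.2): direction of P_V seen from O_M
    theta_P = theta_V - np.arctan(u / f_V)
    phi_P = phi_V - np.arctan(v / f_V)

    t = np.tan(phi_P)
    if t >= b / a:                                # P_V would fall inside the mirror
        raise ValueError("tan(phi_P) must be less than b/a")

    # (A.6): distance of the mirror point P_M from the z axis
    if np.isclose(t, -b / a):
        R_M = a**3 / (2.0 * C * b)
    else:
        R_M = (C * t + b * np.sqrt(t**2 + 1)) / ((b / a)**2 - t**2)

    # (A.3): z coordinate of P_M on the mirror surface
    z_M = b * np.sqrt((R_M / a)**2 + 1)

    # (A.7): projection of P_M onto the omnidirectional image plane
    s = f_O * R_M / (z_M + C)
    return s * np.cos(theta_P), s * np.sin(theta_P)
```

Looping this function over all (u, v) of the virtual image and resampling the omnidirectional image at the returned (x, y) (e.g., with bilinear interpolation) yields the perspective view; the lookup tables of Section A.3 simply cache these computations.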

A.3 Fast Transformation Using Lookup Tables

In order to realize fast transformation from PV = (u, v) to PO = (x, y), the use of lookup tables is considered.

The projection depends on the direction of the virtual image plane and the position of PV in the virtual image plane, i.e., four variables (θV, φV, u, v), and is represented as follows:

    (x, y) = f(θV, φV, u, v),

which obviously needs a huge lookup table.

However, equation (A.1) depends only on u and θV, equation (A.2) depends only on v and φV, and equations (A.3), (A.6), and (A.7) depend only on θP and φP. Therefore, the transformation from PV = (u, v) to PO = (x, y) can be computed with the following small lookup tables:

    θP = f(θV, u),

    φP = g(φV, v),

    (x, y) = h(θP, φP).
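A possible realization of these tables is sketched below (not the thesis implementation; NumPy, with assumed image sizes, mirror parameters, and table resolutions). Because f and g are just additive arctan offsets, they reduce to one-dimensional tables, while h is a two-dimensional table over (θP, φP):

```python
import numpy as np

# Assumed parameters (for illustration only).
a, b = 20.0, 15.0                      # hyperboloidal mirror parameters
f_V, f_O = 300.0, 280.0                # focal lengths (virtual / omnidirectional)
W_V, H_V = 320, 240                    # virtual perspective image size
N_TH, N_PH = 720, 180                  # angular resolution of table h
C = np.sqrt(a * a + b * b)

# 1-D tables for f and g: theta_P = theta_V - atan_u[u], phi_P = phi_V - atan_v[v].
atan_u = np.arctan((np.arange(W_V) - W_V / 2) / f_V)
atan_v = np.arctan((np.arange(H_V) - H_V / 2) / f_V)

# 2-D table for h(theta_P, phi_P) -> (x, y), via equations (A.3), (A.6), (A.7).
theta_P = np.linspace(-np.pi, np.pi, N_TH, endpoint=False)
phi_P = np.linspace(-1.2, np.arctan(b / a) - 1e-3, N_PH)    # keep tan(phi_P) < b/a
t = np.tan(phi_P)
R_M = (C * t + b * np.sqrt(t * t + 1)) / ((b / a) ** 2 - t * t)   # (A.6)
z_M = b * np.sqrt((R_M / a) ** 2 + 1)                             # (A.3)
s = f_O * R_M / (z_M + C)                                         # (A.7)
x_table = s[None, :] * np.cos(theta_P)[:, None]
y_table = s[None, :] * np.sin(theta_P)[:, None]

def lookup(u, v, theta_V, phi_V):
    """Nearest-neighbour table lookup of the omnidirectional point (x, y)
    for pixel indices (u, v) of the virtual image."""
    th = (theta_V - atan_u[u] + np.pi) % (2 * np.pi) - np.pi    # wrap to [-pi, pi)
    ph = phi_V - atan_v[v]
    i = int(np.abs(theta_P - th).argmin())
    j = int(np.abs(phi_P - ph).argmin())
    return x_table[i, j], y_table[i, j]
```

Whatever interpolation is used at lookup time, the storage stays small: two 1-D arrays of the image width and height plus one N_TH x N_PH grid, instead of a 4-D table over (θV, φV, u, v).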

Appendix B

Estimating Camera Parameters by Observation

In the robot navigation system shown in Chapter 6, the parameters αi and ∆αi of the omnidirectional vision sensor are used for generating navigation plans (see Section 6.2.4). Even if standard vision sensors are used instead of omnidirectional vision sensors, the navigation method can still be applied by estimating αi and ∆αi from observation as follows.

B.1 Estimation Method

In general, the position of a vision sensor is represented with six parameters: rotation and translation parameters. Hosoda and Asada [Hosoda94] proposed a method for estimating the camera parameters by visual feedback. In the navigation system shown in Chapter 6, each camera observes moving robots in the environment in order to estimate the camera parameters in the same way. However, the system considers the bottom of the robot as its position, so the robot position measured by observation is imprecise, which makes it difficult to estimate all six camera parameters. Therefore, we estimate the three parameters αi, βi, and γi shown in Fig. B.1 in an on-line manner.

Figure B.1: Estimating camera parameters (the path of the robot along the x axis, the robot velocity V, and its projection Vi = (ui, vi) in the view of VA i).

Let x and y be reference axes of rectangular coordinates, where the direction of the x axis indicates the motion direction of the robot, and let αi, βi, and γi be the tilt angle of sensor i, the angle between sensor i and the y axis, and the rotation angle around the viewing direction of sensor i, respectively. Assuming orthographic projection, the velocity V of the robot is projected into the view of sensor i as follows:

    V_i = S_i T_i R_i V

where the vector Vi = (ui, vi)^T is the velocity projected in the view of sensor i, Ri and Si represent rotation matrices of βi and -γi, respectively, and Ti represents a matrix of orthographic projection:

    R_i = \begin{pmatrix} \cos\beta_i & -\sin\beta_i \\ \sin\beta_i & \cos\beta_i \end{pmatrix}, \quad
    S_i = \begin{pmatrix} \cos\gamma_i & \sin\gamma_i \\ -\sin\gamma_i & \cos\gamma_i \end{pmatrix}, \quad
    T_i = \begin{pmatrix} 1 & 0 \\ 0 & \sin\alpha_i \end{pmatrix}.

Hence, the velocity V is represented as follows using Vi = (ui, vi)^T:

    V = R_i^{-1} T_i^{-1} S_i^{-1} V_i
      = \begin{pmatrix} \cos\beta_i & \frac{\sin\beta_i}{\sin\alpha_i} \\ -\sin\beta_i & \frac{\cos\beta_i}{\sin\alpha_i} \end{pmatrix}
        \begin{pmatrix} u'_i \\ v'_i \end{pmatrix}    (B.1)

where u'_i and v'_i are:

    \begin{pmatrix} u'_i \\ v'_i \end{pmatrix} = S_i^{-1} V_i
      = \begin{pmatrix} \cos\gamma_i & -\sin\gamma_i \\ \sin\gamma_i & \cos\gamma_i \end{pmatrix}
        \begin{pmatrix} u_i \\ v_i \end{pmatrix}.    (B.2)

Therefore,

    V^2 = {u'_i}^2 + \left(\frac{v'_i}{\sin\alpha_i}\right)^2.

If a human operator controls the robot with a constant speed, |V| is a known value. Consequently, αi can be computed with the following equation:

    \sin\alpha_i = \sqrt{\frac{{v'_i}^2}{V^2 - {u'_i}^2}} \qquad (v'_i \neq 0).    (B.3)

Furthermore, the y component of the velocity V is always zero, so that βi can be computed from equation (B.1) as follows:

    u'_i \sin\beta_i - \frac{v'_i}{\sin\alpha_i} \cos\beta_i = 0.    (B.4)

By observing two velocities of a robot (i.e., observing two different Vi), αi, (two different) βi, and γi are acquired based on equations (B.2), (B.3), and (B.4). In this experimentation, in order to simplify the estimation, we assume γi = 0, that is, the cameras are located parallel to the plane on which the robots move. Under this assumption, αi and βi are computed with equations (B.3) and (B.4), respectively, by observing one motion of a robot. Note that, in practice, the velocity of the robot Vi is normalized by wi (the size of the robot in the view of sensor i).
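Under the same assumption (γi = 0, known speed |V|, velocity already normalized by the robot size wi), the two estimates reduce to a few lines. The sketch below is only an illustration of equations (B.3) and (B.4), not the code used in the experiments:

```python
import numpy as np

def estimate_alpha_beta(u_i, v_i, speed):
    """Estimate the tilt angle alpha_i and the angle beta_i of sensor i from
    one observed robot velocity (u_i, v_i), assuming gamma_i = 0 so that
    (u'_i, v'_i) = (u_i, v_i) by equation (B.2).

    u_i, v_i : robot velocity in the view of sensor i (normalized by w_i)
    speed    : known robot speed |V| in the same units
    """
    if np.isclose(v_i, 0.0):
        raise ValueError("equation (B.3) requires v'_i != 0")

    # (B.3): sin(alpha_i) = sqrt(v'_i^2 / (|V|^2 - u'_i^2))
    sin_alpha = np.sqrt(v_i**2 / (speed**2 - u_i**2))
    alpha_i = np.arcsin(np.clip(sin_alpha, 0.0, 1.0))

    # (B.4): u'_i sin(beta_i) - (v'_i / sin(alpha_i)) cos(beta_i) = 0
    #        =>  tan(beta_i) = v'_i / (u'_i sin(alpha_i))
    beta_i = np.arctan2(v_i / sin_alpha, u_i)
    return alpha_i, beta_i

# Example: a sensor with alpha_i = 30 deg and beta_i = 60 deg observing a robot
# of unit speed would measure u_i = cos(60 deg), v_i = sin(60 deg) * sin(30 deg).
print(np.degrees(estimate_alpha_beta(np.cos(np.radians(60)),
                                     np.sin(np.radians(60)) * np.sin(np.radians(30)),
                                     1.0)))        # -> approximately [30. 60.]
```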

Figure B.2: Estimation error ∆αi [degree] versus βi [degree], for velocity observation errors of 1%, 5%, and 10% of |V|.

B.2 Estimation Error

The relation between an observation error (∆ui, ∆vi) of the robot velocity and an estimation error ∆αi of the tilt angle is represented as follows from equation (B.3):

    \Delta\alpha_i = \sin^{-1}\left\{\sqrt{\frac{(v_i + \Delta v_i)^2}{V^2 - (u_i + \Delta u_i)^2}}\right\} - \alpha_i,

where we assume γi = 0. Fig. B.2 shows ∆αi when αi = 30° and ∆ui and ∆vi are 1%, 5%, and 10% of |V|. In Fig. B.2, the horizontal axis indicates βi, since ui and vi are determined by equations (B.3), (B.4), and βi. Thus, the estimation error ∆αi becomes larger when βi approaches zero, that is, when the velocity of the robot approaches the horizontal direction in the view of sensor i. Note that ∆αi is used in equations (6.2) and (6.3) in Chapter 6 for integrating navigation plans generated by multiple sensors (i.e., VAs).
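The error expression above can be evaluated numerically. The sketch below is an illustration only; it fixes ∆ui = ∆vi to a given fraction of |V|, whereas the exact curves in Fig. B.2 depend on how the perturbation is applied:

```python
import numpy as np

def tilt_estimation_error(alpha_i, beta_i, rel_err, speed=1.0):
    """Error Delta-alpha_i of the tilt estimate when the measured velocity is
    perturbed by rel_err * |V| in both components (gamma_i = 0, radians)."""
    # True projected velocity for this alpha_i, beta_i (see Section B.1).
    u_i = speed * np.cos(beta_i)
    v_i = speed * np.sin(beta_i) * np.sin(alpha_i)
    du = dv = rel_err * speed
    denom = max(speed**2 - (u_i + du)**2, 1e-12)      # avoid a negative radicand
    sin_est = np.sqrt((v_i + dv)**2 / denom)
    return np.arcsin(np.clip(sin_est, 0.0, 1.0)) - alpha_i

alpha = np.radians(30.0)
for rel_err in (0.01, 0.05, 0.10):
    errs = [np.degrees(tilt_estimation_error(alpha, np.radians(b), rel_err))
            for b in range(5, 91, 5)]
    print(f"{rel_err:4.0%}: error at beta=5 deg: {errs[0]:5.1f}, at beta=90 deg: {errs[-1]:5.1f}")
```

As in Fig. B.2, the error is small when βi is near 90 degrees and grows sharply as βi approaches zero.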

Table B.1: αi of vision agents estimated by observation

    VA             VA1     VA2     VA3     VA4
    Actual         30      31      9       28
    Observation 1  21.7*   35.8    30.4*   6.49*
    Observation 2  24.9    8.08*   16.9*   34.4

Figure B.3: Images used for estimating αi (views of VA1–VA4 for Observation 1 and Observation 2).

Table B.1 shows an example of the tilt angles α (in degrees) of four sensors estimated by observing two motions of a robot, as shown in Fig. B.3. Compared with the actual angles (indicated as ‘Actual’ in Table B.1), the estimation error becomes larger (values marked with ‘*’ in Table B.1) when the velocity of the robot approaches the horizontal direction, as discussed above. The estimated parameters are not exactly precise; however, the system can still navigate robots with the estimated parameters, though the robots move in a zigzag. This is because the navigation plan is represented with a differential angle between the current motion direction of the robot and the direction to an instant navigation goal. Therefore, the direction (right or left) in which the robot is navigated is not affected by the estimation error ∆αi. Furthermore, the system can successfully navigate robots by integrating navigation plans generated by multiple VAs.
