computer vision

COMPUTER VISION

Introduction

Computer vision is the study and application of methods which allow computers to

"understand" image content or content of multidimensional data in general. The term

"understand" means here that specific information is being extracted from the image data

for a specific purpose: either for presenting it to a human operator (e. g., if cancerous

cells have been detected in a microscopy image), or for controlling some process (e. g.,

an industrial robot or an autonomous vehicle). The image data that is fed into a computer

vision system is often a digital gray-scale or colour image, but can also be in the form of

two or more such images (e. g., from a stereo camera pair), a video sequence, or a 3D

volume (e. g., from a tomography device). In most practical computer vision applications,

the computers are pre-programmed to solve a particular task, but methods based on

learning are now becoming increasingly common. Computer vision can also be described

as the complement (but not necessarily the opposite) of biological vision. In biological

vision and visual perception, real vision systems of humans and various animals are

studied, resulting in models of how these systems are implemented in terms of neural

processing at various levels.

State Of The Art

Relation between Computer vision and various other fields

The field of computer vision can be characterized as immature and diverse. Even though

earlier work exists, it was not until the late 1970s that a more focused study of the field

started when computers could manage the processing of large data sets such as images.

However, these studies usually originated from various other fields, and consequently

there is no standard formulation of the "computer vision problem". Also, and to an even

larger extent, there is no standard formulation of how computer vision problems should

be solved. Instead, there exists an abundance of methods for solving various well-defined

computer vision tasks, where the methods often are very task specific and seldom can be

generalized over a wide range of applications. Many of the methods and applications are

still in the state of basic research, but more and more methods have found their way into

commercial products, where they often constitute a part of a larger system which can

solve complex tasks (e.g., in the area of medical images, or quality control and

measurements in industrial processes).

A significant part of artificial intelligence deals with planning or deliberation for systems

which can perform mechanical actions such as moving a robot through some

environment. This type of processing typically needs input data provided by a computer

vision system, acting as a vision sensor and providing high-level information about the

environment and the robot. Other parts which sometimes are described as belonging to

artificial intelligence and which are used in relation to computer vision are pattern

recognition and learning techniques. As a consequence, computer vision is sometimes

seen as a part of the artificial intelligence field.

Since a camera can be seen as a light sensor, there are various methods in computer

vision based on correspondences between a physical phenomenon related to light and

images of that phenomenon. For example, it is possible to extract information about

motion in fluids and about waves by analyzing images of these phenomena. Also, a

subfield within computer vision deals with the physical process which, given a scene of

objects, light sources, and camera lenses, forms the image in a camera. Consequently,

computer vision can also be seen as an extension of physics.

A third field which plays an

important role is neurobiology, specifically the study of the biological vision system.

Over the last century, there has been an extensive study of eyes, neurons, and the brain

structures devoted to processing of visual stimuli in both humans and various animals.

This has led to a coarse, yet complicated, description of how "real" vision systems

operate in order to solve certain vision related tasks. These results have led to a subfield

within computer vision where artificial systems are designed to mimic the processing and

behaviour of biological systems, at different levels of complexity. Also, some of the

learning-based methods developed within computer vision have their background in

biology.

Yet another field related to computer vision is signal processing. Many existing methods

for processing of one-variable signals, typically temporal signals, can be extended in a

natural way to processing of two-variable signals or multi-variable signals in computer

vision. However, because of the specific nature of images there are many methods

developed within computer vision which have no counterpart in the processing of one-

variable signals. A distinct character of these methods is the fact that they are non-linear

which, together with the multi-dimensionality of the signal, defines a subfield in signal

processing as a part of computer vision.

Besides the above-mentioned views on computer vision, many of the related research

topics can also be studied from a purely mathematical point of view. For example, many

methods in computer vision are based on statistics, optimization or geometry. Finally, a

significant part of the field is devoted to the implementation aspect of computer vision;

how existing methods can be realized in various combinations of software and hardware,

or how these methods can be modified in order to gain processing speed without losing

too much performance.

Related Fields

Computer vision, Image processing, Image analysis, Robot vision and Machine vision are

closely related fields. If you look inside textbooks which have any of these names in

the title there is a significant overlap in terms of what techniques and applications they

cover. This implies that the basic techniques that are used and developed in these fields

are more or less identical, which can be interpreted as there being only one field

with different names. On the other hand, it appears to be necessary for research groups,

scientific journals, conferences and companies to present or market themselves as

belonging specifically to one of these fields and, hence, various characterizations which

distinguish each of the fields from the others have been presented. The following

characterizations appear relevant but should not be taken as universally accepted.

Image processing and Image analysis tend to focus on 2D images, how to transform one

image to another, e.g., by pixel-wise operations such as contrast enhancement, local

operations such as edge extraction or noise removal, or geometrical transformations such

as rotating the image. This characterization implies that image processing/analysis neither

requires assumptions nor produces interpretations about the image content.

Computer vision tends to focus on the 3D scene projected onto one or several images,

e.g., how to reconstruct structure or other information about the 3D scene from one or

several images. Computer vision often relies on more or less complex assumptions about

the scene depicted in an image.

Machine vision tends to focus on applications, mainly in industry, e.g., vision based

autonomous robots and systems for vision based inspection or measurement. This implies

that image sensor technologies and control theory often are integrated with the processing

of image data to control a robot and that real-time processing is emphasized by means of

efficient implementations in hardware and software.

There is also a field called Imaging which primarily focuses on the process of producing images, but sometimes also deals with

processing and analysis of images. For example, Medical imaging contains lots of work

on the analysis of image data in medical applications.

Finally, pattern recognition is a field which uses various methods to extract information

from signals in general, mainly based on statistical approaches. A significant part of this

field is devoted to applying these methods to image data.

A consequence of this state of

affairs is that you can be working in a lab related to one of these fields, apply methods

from a second field to solve a problem in a third field and present the result at a

conference related to a fourth field!

Typical Tasks Of Computer Vision

Each of the application areas of computer vision employs a range of computer vision tasks;

more or less well-defined measurement problems or processing problems, which can be

solved using a variety of methods. Some examples of typical computer vision tasks are

presented below.

Recognition

The classical problem in computer vision, image processing and machine vision is that of

determining whether or not the image data contains some specific object, feature, or

activity. This task can normally be solved robustly and without effort by a human, but is

still not satisfactorily solved in computer vision for the general case: arbitrary objects in

arbitrary situations. The existing methods for dealing with this problem can at best solve

it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human

faces, printed or hand-written characters, or vehicles, and in specific situations, typically

described in terms of well-defined illumination, background, and pose of the object

relative to the camera.

Different varieties of the recognition problem are described in the literature:

Recognition: one or several pre-specified or learned objects or object classes can

be recognized, usually together with their 2D positions in the image or 3D poses

in the scene.

Identification: An individual instance of an object is recognized. Examples:

identification of a specific person's face or fingerprint, or identification of a specific

vehicle.

Detection: the image data is scanned for a specific condition. Examples: detection

of possible abnormal cells or tissues in medical images or detection of a vehicle in

an automatic road toll system. Detection based on relatively simple and fast

computations is sometimes used for finding smaller regions of interesting image

data which can be further analyzed by more computationally demanding

techniques to produce a correct interpretation.

Several specialized tasks based on recognition exist, such as:

Content-based image retrieval: find all images which have a specific content in a

larger set or database of images.

Pose estimation: estimation of the position and orientation of a specific object

relative to the camera. Example: allowing a robot arm to pick up objects from a

conveyor belt.

Optical character recognition (or OCR): images of printed or handwritten text

are converted to computer readable text such as ASCII or Unicode.

Motion

Several tasks relate to motion estimation in which an image sequence is processed to

produce an estimate of the local image velocity at each point. Examples of such tasks are

Egomotion: determine the 3D rigid motion of the camera.

Tracking of one or several objects (e.g. vehicles or humans) through the image

sequence.

Surveillance: detection of possible activities based on motion.

Scene Reconstruction

Given two or more images of a scene, or a video, scene reconstruction aims at computing

a 3D model of the scene. In the simplest case the model can be a set of 3D points. More

sophisticated methods produce a complete 3D surface model.
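
As a minimal sketch of the simplest case (a set of 3D points), the following assumes a rectified stereo pair taken by two identical pinhole cameras, which the text above does not specify. Under that assumption, depth follows from disparity as Z = f * B / d. The function name and its parameters (focal_px, baseline_m, cx, cy) are hypothetical and chosen only for illustration.

```python
def triangulate(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in metres for pixel (u, v) of the left image.

    Assumption: rectified stereo pair of identical pinhole cameras.
    focal_px   focal length in pixels
    baseline_m distance between the two camera centres in metres
    (cx, cy)   principal point in pixels
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity   # depth from disparity
    x = (u - cx) * z / focal_px             # back-project the pixel column
    y = (v - cy) * z / focal_px             # back-project the pixel row
    return x, y, z


# One matched pixel with a disparity of 10 pixels.
point = triangulate(u=420, v=240, disparity=10, focal_px=500,
                    baseline_m=0.1, cx=320, cy=240)
print(point)
```

Applying the function to every matched pixel yields the point set mentioned above; producing a full surface model requires further steps that this sketch does not cover.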

Image Restoration

Given an image, an image sequence, or a 3D volume, which has been degraded by noise,

image restoration aims at producing the image data without the noise. Examples of noise

processes which are considered are sensor noise (e.g., ultrasonic images) and motion blur

(e.g., because of a moving camera or moving objects in the scene).
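
One common way to suppress isolated sensor noise is a median filter. The sketch below is an illustration only, not a method prescribed by the text: it assumes a gray-scale image stored as a list of rows and a 3x3 neighbourhood with border pixels replicated. It does not address motion blur, which needs a different kind of model.

```python
from statistics import median


def median_filter(image):
    """Replace each pixel by the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])

    def at(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    return [
        [median(at(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1))
         for c in range(cols)]
        for r in range(rows)
    ]


# A flat image with a single corrupted pixel in the centre.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
restored = median_filter(noisy)
```

The median is used here rather than the mean because a single outlier does not shift it, so the corrupted pixel is replaced without smearing its value into the neighbours.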

Computer Vision Systems

A typical computer vision system can be divided into the following subsystems:

Image acquisition

The image or image sequence is acquired with an imaging system

(camera, radar, lidar, tomography system). Often the imaging system has to be calibrated

before being used.

Preprocessing

In the preprocessing step, the image is treated with "low-level" operations. The aim

of this step is to reduce noise in the image (i.e., to separate the signal from the

noise) and to reduce the overall amount of data. This is typically done by

employing different (digital) image processing methods such as:

1. Downsampling the image.

2. Applying digital filters

3. Computing the x- and y-gradient (possibly also the time-gradient).

4. Segmenting the image.

a. Pixelwise thresholding.

5. Performing an eigentransform on the image

a. Fourier transform

6. Doing motion estimation for local regions of the image (also known as optical

flow estimation).

7. Estimating disparity in stereo images.

8. Multiresolution analysis
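
Three of the operations listed above (downsampling, the x-gradient, and pixelwise thresholding) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the image is a gray-scale list of rows, downsampling averages non-overlapping blocks, and the gradient uses central differences. The function names are hypothetical.

```python
def downsample(image, factor=2):
    """Average non-overlapping factor x factor blocks (remainder is cropped)."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def gradient_x(image):
    """Central-difference x-gradient; the border columns are left at 0."""
    out = []
    for row in image:
        g = [0.0] * len(row)
        for c in range(1, len(row) - 1):
            g[c] = (row[c + 1] - row[c - 1]) / 2.0
        out.append(g)
    return out


def threshold(image, level):
    """Pixelwise segmentation: 1 where the pixel exceeds level, else 0."""
    return [[1 if value > level else 0 for value in row] for row in image]


# A dark left half and a bright right half.
image = [[0, 0, 10, 10] for _ in range(4)]
small = downsample(image)      # 2x2 image
edges = gradient_x(image)      # non-zero only around the vertical edge
mask = threshold(image, 5)     # binary segmentation
```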

Feature extraction

The aim of feature extraction is to further reduce the data to a set of features, which ought

to be invariant to disturbances such as lighting conditions, camera position, noise and

distortion. Examples of feature extraction are:

1. Performing edge detection or estimation of local orientation.

2. Extracting corner features.

3. Detecting blob features.

4. Extracting spin images from depth maps.

5. Extracting geons or other three-dimensional primitives, such as superquadrics.

6. Acquiring contour lines and maybe curvature zero crossings.

7. Generating features with the Scale-invariant feature transform.

8. Calculating the Co-occurrence matrix of the image or sub-images to measure

texture.
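
The co-occurrence matrix in the last item can be illustrated as follows. The sketch assumes an image whose pixels are already quantized to integer gray levels 0 .. levels-1 and counts how often level i has level j at a fixed offset (dx, dy). The function name and the default offset are assumptions made for illustration.

```python
def cooccurrence(image, levels, dx=1, dy=0):
    """Count gray-level pairs (i, j) where j lies at offset (dx, dy) from i."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:   # skip pairs leaving the image
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


# Two gray levels; each row reads 0, 0, 1 from left to right.
texture = [[0, 0, 1],
           [0, 0, 1]]
matrix = cooccurrence(texture, levels=2)
```

Texture measures are then usually computed from this matrix after normalizing it to relative frequencies; which measures to use depends on the application.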

Registration

The aim of the registration step is to establish correspondence between the features in the

acquired set and the features of known objects in a model-database and/or the features of

the preceding image. The registration step must produce a final hypothesis. To name a

few methods:

1. Least squares estimation

2. Hough transform in many variations

3. Geometric hashing

4. Particle filtering
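
As an illustration of the first method, least squares estimation, the sketch below fits a 2D similarity transform (rotation, uniform scale, and translation) to matched feature positions. The choice of transform model, the function name, and the use of complex numbers to represent 2D points are assumptions made for brevity. The correspondences are taken as given, whereas a real registration step must also find them.

```python
def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b with complex a and b.

    Points are (x, y) tuples treated as complex numbers x + iy, so that
    abs(a) is the scale, the phase of a is the rotation, and b is the
    translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Closed-form solution of the normal equations on centred points.
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = q_mean - a * p_mean
    return a, b


# Model features and the same features rotated by 90 degrees,
# scaled by 2, and shifted by (3, 1).
model = [(0, 0), (1, 0), (0, 1), (1, 1)]
observed = [(3, 1), (3, 3), (1, 1), (1, 3)]
a, b = fit_similarity(model, observed)
```

Plain least squares is sensitive to wrong correspondences, which is one reason voting and hashing schemes such as the other listed methods are also used.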

Applications Of Computer Vision

The following is a non-exhaustive list of applications which are studied in computer vision.

In this category, the term application should be interpreted as a high level function which

solves a problem at a higher level of complexity. Typically, the various technical

problems related to an application can be solved and implemented in different ways.

Facial recognition system

A facial recognition system is a computer-driven application for automatically

identifying a person from a digital image. It does that by comparing selected facial

features from the live image with a facial database. It is typically used for security systems

and can be compared to other biometrics such as fingerprint or eye iris recognition

systems.

Popular recognition algorithms include eigenface, fisherface, the Hidden Markov model,

and the neuronally motivated Dynamic Link Matching. A newly emerging trend, claimed to

achieve previously unseen accuracies, is three-dimensional face recognition. Another

emerging trend uses the visual details of the skin, as captured in standard digital or

scanned images. Tests on the FERET database, the widely used industry benchmark,

showed that this approach is substantially more reliable than previous algorithms.

Polly (robot)

Polly was a robot created at the MIT Artificial Intelligence Laboratory by Ian Horswill

for his PhD, which was published in 1993 as a technical report. It was the first mobile

robot to move at animal-like speeds (1 m per second) using computer vision for its

navigation. It was an example of behavior-based robotics. For a few years, Polly was able

to give tours of the AI laboratory's seventh floor, using canned speech to point out

landmarks such as Anita Flynn's office. The Polly algorithm is a way to navigate in a

cluttered space using very low resolution vision to find uncluttered areas to move forward

into, assuming that the pixels at the bottom of the frame (the closest to the robot) show an

example of an uncluttered area. Since this could be done 60 times a second, the algorithm

only needed to discriminate three categories: telling the robot at each instant to go

straight, towards the right or towards the left.
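
The idea described above can be sketched roughly as follows. This is not Horswill's implementation: the per-column scan, the intensity tolerance, and the split of the frame into left, centre, and right bands are all assumptions made for illustration. Only the use of a low-resolution frame, the bottom row as a sample of clear floor, and the three-way output come from the description.

```python
def free_height(image, col, tolerance):
    """Count rows, from the bottom up, that still look like the floor."""
    reference = image[-1][col]   # bottom pixel is assumed to show clear floor
    height = 0
    for row in reversed(image):
        if abs(row[col] - reference) > tolerance:
            break                # first pixel that no longer matches the floor
        height += 1
    return height


def steer(image, tolerance=10):
    """Return 'straight', 'left' or 'right' for one low-resolution frame."""
    cols = len(image[0])
    if cols < 3:
        raise ValueError("need at least three columns")
    heights = [free_height(image, c, tolerance) for c in range(cols)]
    third = cols // 3
    bands = {
        "straight": heights[third:cols - third],
        "left": heights[:third],
        "right": heights[cols - third:],
    }
    # Mean free height per band; on a tie the first key ('straight') wins.
    scores = {name: sum(h) / len(h) for name, h in bands.items()}
    return max(scores, key=scores.get)


# Floor pixels are 50, obstacle pixels are 200; the right side is clear.
frame = [
    [200, 200, 200, 200, 50, 50],
    [200, 200, 200, 200, 50, 50],
    [200, 200, 50, 50, 50, 50],
    [50, 50, 50, 50, 50, 50],
]
command = steer(frame)
```

Because a new decision is made for every frame, a coarse three-way choice of this kind can be enough, as the paragraph above notes.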

Mobile robot

Mobile Robots are automatic machines that are capable of movement in a given

environment. Robots generally fall into two classes, linked manipulators (or Industrial

robots) and mobile robots. Mobile robots have the capability to move around in their

environment and are not fixed to one physical location. In contrast, industrial

manipulators usually consist of a jointed arm and gripper assembly (or end effector) that

is attached to a fixed surface.

The most common class of mobile robots are wheeled robots. A second class of mobile

robots includes legged robots while a third smaller class includes aerial robots, usually

referred to as unmanned aerial vehicles (UAVs). Mobile robots are the focus of a great

deal of current research and almost every major university has one or more labs that

focus on mobile robot research. Mobile robots are also found in industry, military and

security environments, and appear as consumer products.

Robot

Figure: A humanoid robot manufactured by Toyota "playing" a trumpet.

The word robot is used to refer to a wide range of machines, the common feature of

which is that they are all capable of movement and can be used to perform physical tasks.

Robots take on many different forms, ranging from humanoid, which mimic the human

form and way of moving, to industrial, whose appearance is dictated by the function they

are to perform. Robots can be grouped generally as mobile robots (e.g., autonomous

vehicles), manipulator robots (e.g., industrial robots) and self-reconfigurable robots, which

can conform themselves to the task at hand.

Robots may be controlled directly by a human, such as remotely-controlled bomb-

disposal robots, robotic arms, or shuttles, or may act according to their own decision

making ability, provided by artificial intelligence. However, the majority of robots fall in

between these extremes, being controlled by pre-programmed computers. Such robots

may include feedback loops such that they can interact with their environment, but do not

display actual intelligence.

The word "robot" is also used in a general sense to mean any machine which mimics the

actions of a human (biomimicry), in the physical sense or in the mental sense. It comes

from the Czech and Slovak word robota, meaning labour or work (also used in the sense of a serf's labour).

The word robot first appeared in Karel Čapek's science fiction play R.U.R. (Rossum's

Universal Robots) in 1921.

History

Figure: The construction of a Soviet-made robot of the 1970s. The robot was able to move,

reproduce pre-recorded sounds, imitate clever conversation using the built-in

radio station and show movies on the built-in screen. It was used in various

shows.

The word robot was introduced by Czech writer Karel Čapek in his play R.U.R.

(Rossum's Universal Robots) which was written in 1920 (See also Robots in literature for

details of the play). However, the verb robotovat, meaning "to work" or "to slave", and

the noun robota (meaning corvée), used in the Czech and Slovak languages, have been

used since the early 10th century. It was suggested that the word robot had been coined

by Karel Čapek's brother, painter and writer Josef Čapek.

An early automaton was built in 1738 by Jacques de Vaucanson, who created a

mechanical duck that was able to eat grain, flap its wings, and excrete.

One of the first humans to be killed by a robot was 37-year-old Kenji Urada, a Japanese factory

worker, in 1981. According to Economist.com, Urada "climbed over a safety fence at a

Kawasaki plant to carry out some maintenance work on a robot. In his haste, he failed to

switch the robot off properly. Unable to sense him, the robot's powerful hydraulic arm

kept on working and accidentally pushed the engineer into a grinding machine."

Smart Camera

A smart camera is an integrated machine vision system which, in addition to image

capture circuitry, includes a processor, which can extract information from images without

need for an external processing unit, and interface devices used to make results available

to other devices.

A smart camera or "intelligent camera" is a self-contained, standalone vision system

with built-in image sensor in the housing of an industrial video camera. It contains all

necessary communication interfaces, e.g. Ethernet. It is not necessarily larger than an

industrial or surveillance camera. This architecture has the advantage of a more compact

volume compared to PC-based vision systems and often achieves lower cost, at the

expense of a somewhat simpler (or missing altogether) user interface.

Figure: Early smart camera (ca. 1985, in red) with an 8 MHz Z80 compared to a modern device

featuring Texas Instruments' C64 @ 1 GHz.

A smart camera usually consists of several

(but not necessarily all) of the following components:

1. Image sensor (matrix or linear, CCD- or CMOS)

2. Image digitization circuitry

3. Image memory

4. Communication interface (RS232, Ethernet)

5. I/O lines (often optoisolated)

6. Lens holder or built-in lens (usually C- or CS-mount)

Examples Of Applications For Computer Vision

Another way to describe computer vision is in terms of application areas. One of the

most prominent application fields is medical computer vision or medical image

processing. This area is characterized by the extraction of information from image data

for the purpose of making a medical diagnosis of a patient. Typically image data is in the

form of microscopy images, X-ray images, angiography images, ultrasonic images, and

tomography images. An example of information which can be extracted from such image

data is detection of tumours, arteriosclerosis or other malignant changes. It can also be

measurements of organ dimensions, blood flow, etc. This application area also supports

medical research by providing new information, e.g., about the structure of the brain, or

about the quality of medical treatments.

A second application area in computer vision is in industry. Here, information is

extracted for the purpose of supporting a manufacturing process. One example is quality

control where details or final products are being automatically inspected in order to find

defects. Another example is measurement of position and orientation of details to be

picked up by a robot arm. See the article on machine vision for more details on this area.

Military applications are probably one of the largest areas for computer vision, even

though only a small part of this work is open to the public. The obvious examples are

detection of enemy soldiers or vehicles and guidance of missiles to a designated target.

More advanced systems for missile guidance send the missile to an area rather than a

specific target, and target selection is made when the missile reaches the area based on

locally acquired image data. Modern military concepts, such as "battlefield

awareness," imply that various sensors, including image sensors, provide a rich set of

information about a combat scene which can be used to support strategic decisions. In

this case, automatic processing of the data is used to reduce complexity and to fuse

information from multiple sensors to increase reliability.

Figure: Artist's concept of a rover on Mars. Notice the stereo cameras mounted on top of the

rover. (Credit: Maas Digital LLC)

One of the newer application areas is autonomous

vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars

or trucks), and aerial vehicles. An unmanned aerial vehicle is often denoted UAV. The

level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where

computer vision based systems support a driver or a pilot in various situations. Fully

autonomous vehicles typically use computer vision for navigation, e. g., a UAV looking

for forest fires. Examples of supporting systems are obstacle warning systems in cars and

systems for autonomous landing of aircraft. Several car manufacturers have demonstrated

systems for autonomous driving of cars, but this technology has still not reached a level

where it can be put on the market.

Software For Computer Vision

Animal

Animal (first implementation: 1988 - revised: 2004) is an interactive environment for

Image processing that is oriented toward the rapid prototyping, testing, and modification

of algorithms. To create ANIMAL (AN IMage ALgebra), XLISP of David Betz was

extended with some new types: sockets, arrays, images, masks, and drawables. The

theoretical framework and the implementation of the working environment are described

in the paper ANIMAL: AN IMage ALgebra.

In the theoretical framework of ANIMAL a

digital image is a boundless matrix. However, in the implementation it is bounded by a

rectangular region in the discrete plane and the elements outside the region have a

constant value. The size and position of the region in the plane (focus) is defined by the

coordinates of the rectangle. In this way all the pixels, including those on the border, have

the same number of neighbors (useful in local operators, such as digital filters).

Furthermore, pixelwise commutative operations remain commutative on image level,

independently of focus.
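
The bounded representation described above can be illustrated in Python (ANIMAL itself is an XLISP extension, and nothing below reflects its actual API). The class name, the `at` accessor, and the 3x3 mean operator are hypothetical. The sketch only shows how a constant value outside the focus gives every pixel, including border pixels, a full neighbourhood.

```python
class FocusedImage:
    """Pixels inside a rectangular focus; a constant value everywhere else."""

    def __init__(self, pixels, origin=(0, 0), outside=0):
        self.pixels = pixels      # list of rows inside the focus
        self.origin = origin      # (row, col) of the focus corner in the plane
        self.outside = outside    # value of every element outside the focus

    def at(self, r, c):
        """Value at plane coordinates (r, c), inside or outside the focus."""
        rr, cc = r - self.origin[0], c - self.origin[1]
        if 0 <= rr < len(self.pixels) and 0 <= cc < len(self.pixels[0]):
            return self.pixels[rr][cc]
        return self.outside


def box_mean(image, r, c):
    """3x3 mean at (r, c); border pixels need no special case."""
    total = sum(image.at(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return total / 9


img = FocusedImage([[9, 9], [9, 9]])
corner = box_mean(img, 0, 0)   # uses 4 pixels inside and 5 constant ones outside
```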

OpenCV

OpenCV is an open source computer vision library developed by Intel. The library is

cross-platform, and runs on both Windows and Linux. It focuses mainly on

real-time image processing. The application areas include

1. Human-Computer Interface (HCI)

2. Object Identification

3. Segmentation and Recognition

4. Face Recognition

5. Gesture Recognition

6. Motion Tracking

Visualization Toolkit (VTK)

Visualization Toolkit (VTK) is an open source, freely available software system for 3D

computer graphics, image processing, and visualization used by thousands of researchers

and developers around the world. VTK consists of a C++ class library, and several

interpreted interface layers including Tcl/Tk, Java, and Python. Professional support and

products for VTK are provided by Kitware, Inc. VTK supports a wide variety

of visualization algorithms including scalar, vector, tensor, texture, and volumetric

methods; and advanced modeling techniques such as implicit modelling, polygon

reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.

Commercial Computer Vision Systems

Automatix Inc., founded in January 1980, was the first company to market industrial

robots with built-in machine vision. Its founders were Victor Scheinman, inventor of the

Stanford arm; Phillippe Villers, Michael Cronin, and Arnold Reinhold of

Computervision; Jake Dias and Dan Nigro of Data General; Gordon VanderBrug of NBS;

and Norman Wittels of Clark University.

Figure: Automatix robots at the Robots 1985 show in Detroit, Michigan. Clockwise from lower

left: AID 600, AID 900 Seamtracker, Yaskawa Motoman.

Automatix mostly used robot

mechanisms imported from Hitachi at first and later from Yaskawa and KUKA. It did

design and manufacture a Cartesian robot called the AID-600. The 600 was intended for

use in precision assembly but was adapted for welding use, particularly tungsten inert

gas welding (TIG), which demands high accuracy and immunity from the intense

electromagnetic interference that the TIG process creates. Automatix was the first

company to market a vision-guided welding robot called Seamtracker. Structured laser

light and monochromatic filters were used to allow an image to be seen in the presence of

the welding arc. Another concept, invented by Mr. Scheinman, was RobotWorld, a

system of cooperating small modules suspended from a 2-D linear motor. The product

line was later sold to Yaskawa.

Automatix raised large amounts of venture capital, and went public in 1983, but was not

profitable until the early 1990s. In 1994, Automatix merged with another machine vision

company, Itran Corp., to form Acuity Imaging, Inc. Acuity was acquired by Robotics

Vision Systems Inc. (RVSI) in September 1995. As of 2004, RVSI still supported the

evolved Automatix machine vision package under the PowerVision brand.

RapidEye is a commercial multispectral remote sensing satellite mission being designed

and implemented by MDA for RapidEye AG. The RapidEye sensor images five optical

bands in the 400-850nm range and provides 5m pixel size at nadir. Rapid delivery and

short revisit times are provided through the use of a five-satellite constellation.

Scantron is the name of a United States company that makes and sells Scantron exam

answer sheets and the machines to grade them. The Scantron system usually takes the

form of a "multiple choice, fill-in-the-circle/square/rectangle" form of varying length and

width, from single column 50 answer tests, to multiple 8.5" x 11" page forms used in

standardized testing such as the SAT and ACT. The forms are sensed optically, using

optical mark recognition to detect markings in each place, in a "Scantron Machine" that

tabulates and can automatically grade results. Earlier versions were sensed electrically.

Figure: A typical 100-answer Scantron answer sheet. This is only half of it (the front side) with

the back side not being shown.

Commonly, there are two sides to Scantron answer sheets.

They can contain 50 answer blanks, 100 answer blanks, and so on. There is even a

smaller form called a "Quiz Strip" that contains only about 20 answer boxes to bubble-in.

On the larger sheets, there is a space on the back where answers can be manually written

in for separate questions, if a test giver issues them out. The full-sized 8.5" x 11" form

may contain a larger area for using it to work on math formulas, write short answers, etc.

Answers "A" and "B" are commonly used for "True" and "False" questions, as shown in

the image to the right on the top of each row.

Grading of Scantron sheets is performed first by creating an answer key. The answer key

is simply a standard Scantron answer sheet with all of the correct answers filled in, along

with the "key" rectangle at the top of the sheet. Once the answer key is ready, the

Scantron machine is powered on and the answer key is fed through. This stores the

answer key in the memory of the Scantron machine and any further sheets that are fed

through will be graded and marked according to the key in memory. Switching off the

Scantron machine will stop the paper feed and clear the memory.
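
The grading workflow described above (feed the key, grade later sheets against it, lose the key on power-off) can be modelled in a few lines. This is a hypothetical model for illustration, not Scantron's actual firmware or interface. The class and method names are invented, and each sheet is assumed to have already been read into a list of answer letters.

```python
class GradingMachine:
    """Toy model: holds one answer key in memory and grades sheets against it."""

    def __init__(self):
        self.key = None           # no key stored until one is fed through

    def feed(self, sheet, is_key=False):
        """Store the sheet as the key, or return the number of correct answers."""
        if is_key:
            self.key = list(sheet)
            return None
        if self.key is None:
            raise RuntimeError("no answer key in memory")
        return sum(1 for given, correct in zip(sheet, self.key)
                   if given == correct)

    def power_off(self):
        """Switching off clears the stored key."""
        self.key = None


machine = GradingMachine()
machine.feed(["A", "B", "C", "D"], is_key=True)
score = machine.feed(["A", "B", "D", "D"])   # three answers match the key
machine.power_off()                          # the key must be fed again afterwards
```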

Conclusion

Computer vision, unlike for example factory machine vision, happens in unconstrained

environments, potentially with changing cameras and changing lighting and camera

views. Also, some “objects” such as roads, rivers, bushes, etc. are just difficult to

describe. In these situations, engineering a model a priori can be difficult. With learning-based

vision, one just “points” the algorithm at the data and useful models for detection,

segmentation, and identification can often be formed. Learning can often easily fuse or

incorporate other sensing modalities such as sound, vibration, or heat. Since cameras and

sensors are becoming cheap and powerful and learning algorithms have a vast appetite

for computational threads, Intel is very interested in enabling geometric and learning-

based vision routines in its OpenCV library since such routines are vast consumers of

computational power.
