

  • 7/31/2019 Smart Camera Review


    Smart Cameras: A Review

    Yu Shi*, Serge Lichman

    Interfaces, Machines And Graphical ENvironments (IMAGEN)

    National Information and Communications Technology Australia (NICTA)

    Australian Technology Park, Bay 15 Locomotive Workshop

    Eveleigh, NSW 1430, Australia

    *Corresponding Author. Tel.: +61 2 8374 5565; Fax: +61 2 8374 5527.

    Abstract

    Smart cameras are cameras that can perform tasks far beyond simply taking photos and recording videos. Thanks to purpose-built intelligent image processing and pattern recognition algorithms, smart cameras can detect motion, measure objects, read vehicle number plates, and even recognize human behaviors. They are essential components for building active and automated control systems for many applications, and they will play a significant role in our daily life in the near future. This paper aims to provide a first comprehensive review of smart camera technologies and applications. Here, we analyse the reasons behind the recent rapid growth of smart cameras, discuss their different categories and review their system architectures. We also examine their intelligent algorithms, features and applications. Finally, we conclude with a discussion of design issues, challenges and future technological directions.

    Keywords: smart cameras, pattern recognition, machine vision, computer vision, video surveillance, embedded systems.

    1 Introduction

    What is a smart camera? Different researchers and camera manufacturers offer different definitions. There does not seem to be a well-established, agreed-upon definition in either the video surveillance or the machine vision industry, probably the two most active and advanced application areas for smart cameras at present. For the purpose of this paper, we define a smart camera as a vision system whose primary function is to produce a high-level understanding of the imaged scene and to generate application-specific data for use in an autonomous and intelligent system. The idea of smart cameras is to convert data to knowledge by processing information where it becomes available, and to transmit only results that are at a higher level of abstraction. A smart camera is smart because it performs application-specific information processing (ASIP), the goal of which is usually not to provide better quality images for human viewing but to understand and describe what is happening in the images for the purpose of better decision-making in an automated control system. For example, a motion-triggered surveillance camera captures video of a scene, detects motion in the region of interest, and raises an alarm when the detected motion satisfies certain criteria. In this case, the ASIP is motion detection and alarm generation.
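    The motion-triggered camera described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: frames are assumed to be grayscale lists of rows of 0-255 integers, motion is found by simple frame differencing inside a rectangular region of interest, and the alarm criterion is a hypothetical "fraction of changed pixels" threshold. The names `motion_alarm`, `pixel_threshold` and `area_threshold`, and their default values, are illustrative only.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def motion_alarm(prev: Frame, curr: Frame,
                 roi: Tuple[int, int, int, int],
                 pixel_threshold: int = 25,
                 area_threshold: float = 0.05) -> bool:
    """Return True (raise the alarm) if enough ROI pixels changed between frames.

    roi is (top, left, height, width). Both thresholds are illustrative
    assumptions, not values from the paper.
    """
    top, left, height, width = roi
    changed = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            # Frame differencing: a pixel "moves" if its intensity changed noticeably.
            if abs(curr[y][x] - prev[y][x]) > pixel_threshold:
                changed += 1
    # Alarm criterion: the fraction of changed ROI pixels exceeds area_threshold.
    return changed / (height * width) > area_threshold


# Usage: a bright 3x3 object appears in an otherwise static 8x8 scene.
background = [[10] * 8 for _ in range(8)]
intruder = [row[:] for row in background]
for y in range(2, 5):
    for x in range(2, 5):
        intruder[y][x] = 200
print(motion_alarm(background, intruder, roi=(0, 0, 8, 8)))  # True: 9 of 64 pixels changed
print(motion_alarm(background, intruder, roi=(5, 5, 3, 3)))  # False: object is outside this ROI
```

    The Boolean result is the kind of one-bit, high-level output a smart camera can transmit instead of the video itself; real systems typically add background modelling and noise filtering on top of plain differencing.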

    The important differences between a smart camera and normal cameras, such as consumer digital cameras and camcorders, lie in two aspects. The first is camera system architecture. A smart camera usually has a special image processing unit containing one or more high-performance microprocessors to run intelligent ASIP algorithms, whose primary objective is not to improve image quality but to extract information and knowledge from images. The image processing hardware in normal cameras is usually simpler and less powerful, its main aim being to achieve good visual image quality. The other main difference is in the primary camera output. A smart camera outputs either the features extracted from the captured images or a high-level description of the scene, which is fed into an automated control system, while for normal cameras the primary output is a processed version of the captured images for human consumption. For this reason, normal video cameras have large output bandwidth requirements (in direct proportion to the resolution of the image sensor used), while a smart camera can have very low data bandwidth requirements at the output (in the simplest case just one bit, with 1 meaning there is motion and 0 meaning there is no motion, for example). These differences are illustrated in Figure 1.

    [Figure 1 diagram. (a) Normal camera: image sensing → image processing → image/video output generation and communication → video to TV display or digital display for human consumption. (b) Smart camera: image sensing → ASIP → app. specific data generation and communication → meta data to an automated control system for decision making.]

    Figure 1: Differences between a normal camera (a) and a smart camera (b).
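    The bandwidth contrast in Figure 1 can be made concrete with simple arithmetic. The sketch below assumes a hypothetical uncompressed 640x480 (VGA) sensor with 8 bits per pixel at 25 frames per second, and a smart camera that sends one motion bit per frame; none of these numbers come from the paper, and real normal cameras usually compress their video, which narrows (but does not close) the gap.

```python
def raw_video_bitrate(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed output bit rate of a normal camera, in bits per second."""
    return width * height * bits_per_pixel * fps


def smart_camera_bitrate(bits_per_result: int, fps: float) -> float:
    """Bit rate of a smart camera that emits one result per frame."""
    return bits_per_result * fps


raw = raw_video_bitrate(640, 480, 8, 25)  # scales with sensor resolution
smart = smart_camera_bitrate(1, 25)       # one motion/no-motion bit per frame
print(f"raw: {raw / 1e6:.2f} Mbit/s, smart: {smart:.0f} bit/s, ratio: {raw / smart:,.0f}x")
```

    Doubling the sensor resolution doubles `raw` but leaves `smart` unchanged, which is the proportionality the text describes.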


    Smart cameras can exist where a camera is not expected to be. A good example is the ubiquitous optical mouse for PCs. Most optical mice contain a miniature digital video camera inside the mouse casing. They work by shining a bright light onto the surface below, then using the camera to take up to 1 500 pictures a second of that surface. An intelligent image processing circuit inside the mouse performs image enhancement and calculates the mouse motion from the image differences between successive frames. The resulting motion estimate is then used to move the cursor on the screen. The optical mouse is a good example of a smart camera in three respects: first, it is a stand-alone camera, with image capture and processing in a single embedded device; second, the camera is used not to take pictures or video for human consumption, but to produce a feature vector (the motion vector in the x and y directions) representing the displacement of the object (the mouse in this case); third, it shows that smart cameras are not restricted to a niche market, but can be adopted ubiquitously.
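    The paper does not describe the algorithm inside optical mouse chips, so the following is only a generic sketch of how a motion vector can be computed from two successive frames: exhaustive block matching, which tries every small shift and keeps the one with the lowest mean absolute difference over the overlapping pixels. The function name `estimate_shift`, the search range `max_shift` and the random test texture are assumptions for illustration.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def estimate_shift(prev: Frame, curr: Frame, max_shift: int = 2) -> Tuple[int, int]:
    """Return the motion vector (dx, dy) such that curr[y][x] ~= prev[y - dy][x - dx].

    Every candidate shift within +/-max_shift is scored by the mean absolute
    difference over the pixels where the two frames overlap.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            cost = total / count
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# Usage: a random texture stands in for the desk surface seen by the sensor.
SIZE = 16
rng = random.Random(0)  # fixed seed for reproducibility
surface = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
true_dx, true_dy = 1, -2  # hypothetical mouse motion between two frames
moved = [[surface[y - true_dy][x - true_dx]
          if 0 <= y - true_dy < SIZE and 0 <= x - true_dx < SIZE
          else rng.randrange(256)  # surface newly exposed at the frame edge
          for x in range(SIZE)] for y in range(SIZE)]
print(estimate_shift(surface, moved))  # (1, -2)
```

    Because the surface texture is what gets matched, this approach needs a textured surface; on a perfectly uniform surface every shift scores equally and no motion can be recovered.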

    Strictly speaking, a smart camera is a stand-alone, self-contained device that integrates image sensing, ASIP and communications in one single box. It is designed for a specific type of application (for example, surveillance or industrial machine vision). However, there are other types of vision systems that are often referred to as smart cameras as well, such as PC-based smart cameras. We analyze these different types of smart cameras in section 3. Unless specified otherwise, the term smart camera in this paper covers both stand-alone smart cameras and the other types of smart cameras described in section 3.1.

    The advent of smart cameras can be traced back to the early 1990s, when PCs became popular and video frame grabbers became available. Early solid-state CCD (Charge-Coupled Device) cameras of the mid-1970s were analog cameras. Later, digital signal processing (DSP) technologies pushed analog CCD cameras into the digital era with enhanced image quality, but the output of most of these cameras was still analog (e.g. NTSC/PAL signals). Frame grabbers allowed CCD cameras with analog output to be connected to computers, digitizing the video for versatile processing by computers. This marked the beginning of smart camera systems, with the camera performing image capture and the computer carrying out intelligent processing tasks such as motion detection and shape recognition. The first applications were in the areas of industrial machine vision and surveillance.

    The real interest in, and growth of, smart cameras started in the late 1990s and early 2000s, spurred by factors such as technological advances in chip manufacturing and embedded system design, and the coming-of-age of CMOS (Complementary Metal Oxide Semiconductor) image sensors. Market demands from surveillance and machine vision also played significant roles. Advanced smart camera systems often integrate the latest technologies in image sensors, optics, imaging systems, embedded systems, computer vision, video analysis and communication, networking, etc.

    The heart of a smart camera is its intelligent ASIP algorithms and the hardware that runs them. Image feature extraction and pattern recognition are probably among the most widely used algorithms in smart cameras. In a way, a smart camera can be thought of as an image feature extractor or a visual pattern recognizer. Research in computer vision, image understanding and pattern recognition has yielded many algorithms and solutions that can be used by smart cameras. However, the performance and robustness of ASIP algorithms deployed in cameras operating under real-world conditions are among the most important issues facing the development and commercialization of new smart cameras.

    In the remainder of this paper, we analyze the main reasons behind the rapid growth of smart cameras (section 2), review the system architectures of different smart cameras (section 3), review the state-of-the-art smart camera systems and ASIP algorithms for some applications (section 4), and finally discuss some design issues and conclude with some thoughts about technical challenges and future technological directions (section 5).

    2 The Rapid Growth of Smart Cameras

    2.1 Coming of Age of CMOS Image Sensors

    The advent of CMOS image sensors (CIS) in the late 1990s played an important role in the development of smart camera technology and systems, and has the potential to make smart cameras smaller, cheaper and more pervasive. Compared to CCDs, CIS have several advantages that make them excellent candidates for smart camera front-ends. These include smaller size, lower manufacturing cost, lower power consumption, the ability to build a camera-on-a-chip, the ability to integrate intelligent processing circuits onto the sensor chip, and significantly simplified camera system design.

    Most CISs are manufactured using the same process by which other semiconductor chips (CPUs, memories, etc.) are made. This means that many semiconductor manufacturers can make CIS, which increases competition and reduces cost. CCD sensors, by contrast, are made using a special chip manufacturing process, and there are only a few manufacturers in the world, mostly in Japan. CCD-based camera chipsets usually include at least three or four chips: a CCD pixel array, CDS (Correlated Double Sampling), a timing generator, and an ADC (Analog-to-Digital Converter). In the case of CIS, all these functions can be integrated onto one single chip, making it a real camera-on-a-chip with light in and pixels out. This greatly simplifies camera system design and reduces cost. Compared with CCD chipsets, there are many more sources from which a CIS can be purchased, even a single item at a time, which is very difficult to achieve in the case of CCDs. All this makes it much easier for researchers, students and camera manufacturers alike to develop smart cameras of their own.

    Probably the most important advantage of CIS over CCD lies in its ability to have the image sensor array and intelligent image processing circuits side by side on the same chip. This makes a single-chip smart camera possible. One example is a vision-based single-chip fingerprint reader with an on-chip CIS, processing circuitry performing pattern matching, and a memory storing templates of one or several users' fingerprints for real-time comparison and identification [1].
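    The store-templates-then-compare step of such a reader can be sketched generically. The matching method of the cited chip is not given here, and practical fingerprint matchers usually compare minutiae rather than raw bit strings; this sketch simply assumes each enrolled user is stored as a fixed-length binary code and identifies a query by nearest Hamming distance with a rejection threshold. The names `hamming`, `identify` and `max_distance`, and the 16-bit codes, are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def identify(query: int, templates: Dict[str, int], max_distance: int) -> Optional[str]:
    """Return the enrolled user whose stored template is closest to the query,
    or None if even the best match differs in more than max_distance bits."""
    best_user, best_dist = None, max_distance + 1
    for user, template in templates.items():
        d = hamming(query, template)
        if d < best_dist:
            best_user, best_dist = user, d
    return best_user


# Usage: two enrolled users; the query is alice's code with 2 noisy bits.
templates = {"alice": 0b1011_0110_1100_0011, "bob": 0b0100_1001_0011_1100}
query = templates["alice"] ^ 0b0000_0001_0000_0010
print(identify(query, templates, max_distance=4))  # alice
print(identify(0, templates, max_distance=4))      # None: no template is close enough
```

    The threshold trades false accepts against false rejects; in an on-chip design the loop over `templates` corresponds to scanning the template memory.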

    A recent market survey by Gartner Dataquest [2] estimated that there are about 40 suppliers of CIS worldwide, and that the global CIS market would increase from $3.2 billion in 2005 to $5.6 billion by 2008. The survey showed that automobile, medical imaging and surveillance applications are among the emerging markets for CIS products.
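    The survey quotes only the start and end values; the annual growth rate they imply can be derived with the standard compound annual growth rate (CAGR) formula, assuming smooth compound growth over the three years. The resulting rate (roughly 20% per year) is our derived figure, not one stated by Gartner Dataquest.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Gartner Dataquest figures quoted above: US$3.2bn (2005) -> US$5.6bn (2008).
print(f"implied CIS market CAGR 2005-2008: {cagr(3.2, 5.6, 2008 - 2005):.1%}")
```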

    2.2 Research in Computer Vision and Pattern Recognition

    What makes a camera smart is the intelligent ASIP: the application-specific information processor built into the camera system. Advances in academic and industrial research in real-time image processing and understanding, pattern recognition, machine learning, computer vision and video communication continue to provide a large library of intelligent algorithms for use by smart cameras in different applications. For example, Intel's OpenCV (Open Source Computer Vision) Library [3] has been very popular with academic researchers and students working on smart camera projects. Every year, numerous international journals, conferences and workshops give researchers worldwide forums to present their innovative work in areas such as computer vision and pattern recognition. Much of the work presented can be seen as the embryo of future smart cameras. Recently, the first international conferences and workshops focusing on the design of embedded vision systems have been held.


    2.3 Embedded System Technologies

    A stand-alone smart camera is essentially an embedded vision system. Compared with PC-based systems, an embedded system is usually subject to many constraints on the design, implementation and production of the device that encapsulates it, such as low power, limited resources, real-time processing and low cost. An embedded vision system is even more challenging to design due to video processing's insatiable demand for computing power and memory resources. In the last decade, embedded vision systems have made great progress thanks to the increasing affordability of powerful processors and memory chips, the availability of real-time operating systems, low-complexity intelligent algorithms and the coming-of-age of system development software and tools.

    Functional integration seems to be a trend in consumer electronics and ICT (Information and Communications Technology). For example, many cellular phones now come with a camera and can play music and receive radio. Some webcams have built-in intelligence such as face tracking. Functional integration can seemingly make a normal camera smart. For example, a camera with an integrated voice/sound detection component can take a picture of the surrounding area when a human voice is detected, or take a picture in the direction from which a gunshot has been detected [4].

    2.4 Socio-Economic Drivers

    Thanks to Moore's law, semiconductor chips and computer hardware continue to shrink in size, fall in cost and gain in performance. This has driven down the prices of cameras, frame grabbers and computers, and made smart camera systems, especially PC-based systems, more affordable to research and development on the one hand and to the market and end-users on the other. As cost constraints on hardware are lifted, software developers have more freedom to write "smarter" algorithms.

    One of the most significant developments in the surveillance and security industries in the last several years has been the wide use of CCTV (Closed Circuit Television) cameras and their impact on crime, terrorist attacks and the general public. Notably, after the 9/11 attacks in the US, video surveillance received more attention not only from the academic community, but also from industry and governments. The recent terrorist attacks on the London Underground in mid-2005, and the successful use of CCTV by police to identify the perpetrators, have intensified the talk about a new generation of intelligent video surveillance systems based on smart cameras. In fact, surveillance and security demands are an important driving force behind the ever-increasing scale of academic and industrial research in advanced vision algorithms such as object tracking and identification, and human behavior analysis.

    2.5 Market Demands and Analysis

    2.5.1 Digital Video Surveillance

    The first generation of CCTV cameras (1980s-1990s) consisted mostly of analog cameras with limited functionality and high cost. Digital CCTV cameras and the use of DVRs (Digital Video Recorders) represented the second generation (2G, 1990s-now). Digital CCTV cameras built using CCD and CMOS image sensors provide better video quality and some intelligent functions such as motion detection, electronic PTZ (Pan-Tilt-Zoom) and networking. 2G CCTV systems have become mass-market products, fuelled by improved affordability and society's increasing concerns over safety and security. According to estimates made in 2004 by the market research firm Datamonitor [5], digital video surveillance is a high-growth segment within the overall surveillance market, with an estimated 55% CAGR (Compound Annual Growth Rate) between 2003 and 2007. In dollar terms, this segment will grow from US$1.3bn to US$7.4bn globally between 2003 and 2007.

    However, 2G CCTV systems are not smart enough to help prevent crimes or terror attacks, even though they have proved very useful in post-event identification of crime perpetrators. 2G CCTV systems are mostly not automated and rely heavily on trained security personnel to perform image analysis, object tracking and identification. The increasing number of cameras makes real-time analysis by security personnel difficult. Network bandwidth is another important issue affecting the real-time processing needed for crime prevention. The intelligent video surveillance system (IVSS), also called the third-generation CCTV system, will try to provide solutions to these problems. Smart cameras will be one of the fundamental building blocks of the IVSS, making it possible to build and deploy automated, distributed and intelligent multi-sensory surveillance systems capable of tracking humans and suspicious objects, analyzing human behaviors, etc. Many market research firms have predicted significant growth in intelligent video systems and smart cameras. For example, the market researcher Frost & Sullivan [6] has forecast that the US$153.7 million video surveillance software market will grow at a healthy CAGR of 23.4% from 2004 to 2011 to reach US$670.7 million.
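    As a sanity check on the two forecasts quoted in this subsection, compounding each start value at its quoted rate roughly reproduces the quoted end value. The small gaps (about US$0.1bn and US$1m) presumably come from rounding of the published rates; that explanation is our assumption, not the analysts'.

```python
def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at a constant annual rate."""
    return start_value * (1 + rate) ** years


# Datamonitor: US$1.3bn in 2003 at 55% CAGR over 4 years (2003-2007).
print(f"{project(1.3, 0.55, 4):.1f}")     # about 7.5 (US$bn), vs 7.4 quoted
# Frost & Sullivan: US$153.7m at 23.4% CAGR over 7 years (2004-2011).
print(f"{project(153.7, 0.234, 7):.1f}")  # about 669.7 (US$m), vs 670.7 quoted
```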


    2.5.2 Industrial Machine Vision

    Industrial machine vision is probably the birthplace of smart cameras, at least in terms of the systematic use of commercial smart cameras. It is also one of their most active playgrounds. Most machine vision smart cameras are stand-alone cameras. The demand for these cameras has been steadily increasing over the years. The major end-user industries are robotics, semiconductors, electronics, pharmaceuticals, manufacturing, food, plastics and printing. The tasks these smart cameras usually perform include bar-code reading, part inspection, flaw detection, surface inspection, dimensional measurement, assembly verification, print verification, object sorting, OCR (optical character recognition) and maintenance. A recent survey of machine vision products by the Europe-based market research firm IMS Research [7] found that smart cameras account for a rapidly growing share of machine vision market revenue. Demand for smart cameras is primarily driven by the increasing demand for better production efficiency and quality control in industries such as manufacturing and medicine/pharmaceuticals. The survey revealed that whilst sales of more traditional PC-based products (cameras and frame grabbers) have fallen, sales of smart cameras and compact vision systems have continued to grow. The survey predicts that the machine vision market in Europe will grow at an average rate of 11.6% each year to 2006. The highest levels of growth, approaching 20%, are forecast for the smart sensor and smart camera product groups, which would more than double in value in dollar terms. The same company has forecast the same trend for the Asia-Pacific market [8]. The annual market study by the AIA (Automated Imaging Association) estimated the 2003 North American machine vision smart camera market at about US$57 million, growing at 15% per year in terms of revenue and 20% per year in terms of units [9].

    2.5.3 Other Significant Markets

    Other important markets for smart cameras are ITS (Intelligent Transport Systems), automobiles, HCI (Human Computer Interface), medical/healthcare, games, toys, video conferencing and biometrics.

    3 Review of Smart Camera System Architectures

    In recent years, smart cameras have attracted considerable attention from academic and industrial research and development (R&D) organizations. However, to the best of the authors' knowledge, a systematic approach to analyzing smart cameras has yet to be agreed upon. In this section, we first present one approach to classifying smart camera systems and provide an analysis of their system architectures, followed by a review of some R&D activities on the design of smart cameras as embedded systems.

    3.1 Classification of Smart Cameras

    Smart cameras can come in different system and physical configurations. Figure 2 shows one proposed classification of different types of vision systems and smart cameras.

    [Figure 2 diagram: Vision Systems are divided into Embedded, PC based, Network based and Hybrid Vision Systems; the smart camera types shown beneath them are Stand-alone, Non Stand-alone, Single Chip, PC-based, Networked and Other types of Smart Cameras.]

    Figure 2: One proposed classification of vision systems and smart cameras.

    As shown in Figure 2, stand-alone smart cameras are a subset of embedded vision systems. Non-stand-alone embedded smart cameras are sometimes called compact vision systems. Compact vision systems are usually composed of general-purpose cameras connected to an external embedded processing unit in a separate box that provides ASIP and communication/networking functionality. Single-chip smart cameras can be thought of as a special case of smart cameras because they require special system design considerations and are usually used in carefully targeted applications. Non-stand-alone smart cameras can be thought of as virtual smart cameras because, from the user's point of view, the cameras are smart, even though the ASIP that makes them smart may be performed by an external unit, such as a hardware accelerator board, a local PC or a networked PC. PC-based smart cameras, consisting of a general-purpose video camera, a frame grabber of some sort and a PC whose CPU performs the ASIP, are a very common and inexpensive platform for researchers, academics and students to conduct research on smart cameras. Sometimes a normal camera is connected to a PCI (Peripheral Component Interconnect) processing board within a PC. In this case, the PCI board may perform most of the ASIP and output generation, while the PC provides a flexible operator interface or additional processing power. This kind of system is a special case of both a compact vision system and a PC-based system. A digital CCTV surveillance system with intelligent features is an example of a network-based smart camera system, and the next generation of distributed intelligent video surveillance systems will be an exciting testing ground for smart cameras, especially stand-alone smart cameras. Hybrid vision systems may give rise to some special types of smart cameras. This category may also include smart camera systems that need some kind of human intervention to help provide high-accuracy data output.
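    One way to read the classification above is by asking where the ASIP runs. The sketch below encodes that reading as a lookup; it is a simplification of Figure 2 under our own assumptions, the enum `AsipLocation` and its labels are hypothetical, and hybrid systems are left out because the text does not tie them to a single ASIP location.

```python
from enum import Enum


class AsipLocation(Enum):
    """Where the application-specific information processing (ASIP) runs.
    These labels are illustrative assumptions, not terms defined in the paper."""
    SENSOR_CHIP = "on the image sensor chip"
    CAMERA_BODY = "inside the camera casing"
    EXTERNAL_EMBEDDED_UNIT = "in a separate embedded processing box"
    LOCAL_PC = "on a local PC"
    NETWORKED_PC = "on a networked computer"


def classify(location: AsipLocation) -> str:
    """Map the ASIP location to the smart camera type described in Section 3.1."""
    return {
        AsipLocation.SENSOR_CHIP: "single-chip smart camera (special stand-alone case)",
        AsipLocation.CAMERA_BODY: "stand-alone smart camera",
        AsipLocation.EXTERNAL_EMBEDDED_UNIT: "non-stand-alone embedded smart camera (compact vision system)",
        AsipLocation.LOCAL_PC: "PC-based smart camera",
        AsipLocation.NETWORKED_PC: "networked smart camera",
    }[location]


for loc in AsipLocation:
    print(f"ASIP {loc.value}: {classify(loc)}")
```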

    3.2 Analysis of Different Types of Smart Cameras

    3.2.1 Common Characteristics

    The common basic components of a normal digital video camera (consumer, professional or industrial) include optics, a solid-state image sensor (CCD or CMOS), image processor(s) and supporting hardware, an output generator, and communication ports. The main tasks performed by the image processor(s) are color interpolation, color correction or saturation, gamma correction, image enhancement and camera control such as white balance and exposure control. The output generator can be an NTSC/PAL encoder providing standard TV-compatible output, a video compression engine providing compressed video streams for communication over a network, or a digital video output generator such as a FireWire encoder. Communication ports such as Ethernet and RS232 provide, respectively, the basis for networked camera functionality and for camera configuration and firmware upgrading through a PC.
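    Two of the image processor tasks listed above, white balance and gamma correction, can be sketched as follows. The paper does not say which methods cameras use; the gray-world white balance (scale each channel so the channel means become equal) and the plain power-law gamma curve here are common textbook choices adopted as assumptions, and real camera pipelines and standards such as sRGB use more elaborate curves. Pixels are assumed to be linear RGB triples in [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]  # linear RGB triple, each channel in [0, 1]


def gray_world_white_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Scale each channel so all channel means equal their overall average
    (gray-world assumption), clipping the result to 1.0."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    target = sum(means) / 3
    gains = [target / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


def gamma_correct(pixels: List[Pixel], gamma: float = 2.2) -> List[Pixel]:
    """Power-law gamma encoding for display: out = in ** (1 / gamma)."""
    return [tuple(v ** (1 / gamma) for v in p) for p in pixels]


# Usage: a scene with a blue colour cast is neutralised, then gamma-encoded.
bluish = [(0.2, 0.2, 0.4), (0.4, 0.4, 0.8)]
print(gamma_correct(gray_world_white_balance(bluish)))
```

    The gray-world assumption fails on scenes dominated by one colour (a green lawn, for example), which is one reason commercial cameras combine several cues for white balance.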

    A smart camera typically includes all of the above essential components of a normal camera, with the following differences:

    A smart camera has a distinct and powerful signal processing unit to perform image feature extraction and/or pattern analysis based on application-specific requirements; and

    A smart camera has an output generator to produce a coded representation of the image features and/or the results of pattern matching, or in some cases, control signals for other devices (e.g. an alarm triggering signal) or actions (e.g. sending a picture of a speeding car's number plate to the police).
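    A "coded representation" of results can be far more compact than an image. The sketch below packs one detection result into a fixed binary record using Python's standard `struct` module; the record layout (camera id, timestamp, event code, object centre) and the `EVENT_ALARM` code are a hypothetical wire format, not one defined in the paper.

```python
import struct
from typing import Tuple

# Hypothetical record layout, big-endian with no padding (">"):
# camera id (uint16), timestamp in seconds (uint32), event code (uint8),
# object centre x and y in pixels (uint16 each) -> 11 bytes in total.
EVENT_FORMAT = ">HIBHH"
EVENT_ALARM = 1  # illustrative event code


def encode_event(camera_id: int, timestamp: int, event: int, x: int, y: int) -> bytes:
    """Pack one detection result into a compact binary message."""
    return struct.pack(EVENT_FORMAT, camera_id, timestamp, event, x, y)


def decode_event(message: bytes) -> Tuple[int, int, int, int, int]:
    """Unpack a message produced by encode_event."""
    return struct.unpack(EVENT_FORMAT, message)


msg = encode_event(camera_id=7, timestamp=1_136_073_600, event=EVENT_ALARM, x=320, y=240)
print(len(msg), "bytes per event versus", 640 * 480, "bytes per raw 8-bit VGA frame")
print(decode_event(msg))
```

    A fixed layout keeps decoding trivial on the receiving control system; a self-describing format such as XML would be larger but easier to extend.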

    System architecture design for smart cameras often involves significant system engineering effort. Clear application requirements and specifications are crucial to a successful design. The software architecture, the hardware architecture and, for network-based systems, the network architecture need to be designed jointly to maximize resource usage and efficiency, and to reduce cost and time to completion. More detailed design considerations are discussed in section 5.1.

    3.2.2 Stand-alone Smart Cameras

    A stand-alone smart camera integrates image capture, ASIP and application-specific output generation into a single device casing. A stand-alone smart camera may look very much like a normal industrial camera or a CCTV camera. While the primary function of a normal camera is to provide raw video for monitoring and recording, a smart camera is usually designed to perform specific, repetitive, high-speed and high-accuracy tasks in industries such as machine vision and surveillance. Most industrial machine vision cameras are stand-alone smart cameras. While a normal video camera may cost anywhere between US$50 and US$2 000, a machine vision smart camera can cost between US$1 000 and US$6 000 per unit or more [10], depending on the functionality and level of customization.

    Many pattern recognition techniques involve two types of processing tasks: data-intensive tasks such as image enhancement and feature extraction, and math-intensive tasks such as statistical pattern matching. While data-intensive tasks require high-speed hardware to deal with high pixel volumes and high frame rates, math-intensive tasks often require high-performance processors to deal with issues such as pipelining and floating-point arithmetic. For demanding applications, the camera hardware architecture may be based on a heterogeneous multiprocessor platform, with one or more processors capable of parallel processing (e.g. an FPGA, Field Programmable Gate Array) performing data-intensive tasks, and a DSP and/or a RISC (Reduced Instruction Set Computer) processor performing math-intensive tasks. A smart camera built for face detection and recognition by Broers et al. [11] is one such example. The system employs an FPGA and a parallel processor, Xetal, working in SIMD (Single Instruction Multiple Data) mode to perform data-intensive operations such as face detection. A high-performance DSP, TriMedia, with a VLIW (Very Long Instruction Word) core is used to run high-level programs such as face recognition. The system architecture can be represented as in Figure 3.

    [Figure 3 block diagram: image sensor and AFE/ADC blocks, camera control, memory, system communications/network interfaces, a data-intensive processing block (FPGA, Xetal), a math-intensive processing block (TriMedia) and a system data bus.]

    Figure 3: A stand-alone smart camera system architecture for face recognition [11].
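    The division of labour between the data-intensive and math-intensive blocks of this architecture can be illustrated in software. This is a conceptual sketch only: it is plain sequential Python, not FPGA, Xetal or TriMedia code, and the two stages (block average pooling as the per-pixel stage, and nearest-neighbour matching of an L2-normalised feature vector as the floating-point stage) are stand-ins chosen by us, not the face detection and recognition methods of Broers et al. All function names and test images are hypothetical.

```python
import math
from typing import Dict, List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def data_intensive_stage(frame: Frame, block: int = 2) -> List[float]:
    """The same simple operation over every pixel, the kind of work suited to an
    FPGA or SIMD array: average-pool block x block cells and flatten them."""
    h, w = len(frame), len(frame[0])
    features = []
    for by in range(0, h - h % block, block):
        for bx in range(0, w - w % block, block):
            cell = [frame[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
            features.append(sum(cell) / len(cell))
    return features


def math_intensive_stage(features: List[float], gallery: Dict[str, List[float]]) -> str:
    """Floating-point matching, the kind of work suited to a DSP or RISC core:
    L2-normalise the vectors and return the nearest gallery entry."""
    def normalise(v: List[float]) -> List[float]:
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / norm for a in v]

    f = normalise(features)
    return min(gallery, key=lambda name: math.dist(f, normalise(gallery[name])))


# Usage: enrol two patterns, then recognise a brighter copy of the first one.
frame_a = [[10, 10, 200, 200], [10, 10, 200, 200], [50, 50, 90, 90], [50, 50, 90, 90]]
frame_b = [row[::-1] for row in frame_a]  # mirrored pattern: a different "identity"
gallery = {"A": data_intensive_stage(frame_a), "B": data_intensive_stage(frame_b)}
query_frame = [[2 * v for v in row] for row in frame_a]  # same pattern, brighter lighting
print(math_intensive_stage(data_intensive_stage(query_frame), gallery))  # A
```

    The first stage touches every pixel but does trivial arithmetic, while the second touches only a few numbers but needs floating point, which is why splitting them across different processors can pay off.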

    3.2.3 Single-Chip Smart Cameras

    Single-board or single-chip smart cameras are a special kind of stand-alone smart camera. Single-chip smart cameras take advantage of the integration capability of CMOS image sensors by building intelligent ASIP circuits onto the image sensor chip, potentially relieving the host computer of cumbersome pixel processing tasks and minimizing the data transfer between camera and computer. In some cases, pixel-level ADC and processing can be achieved [12], which can lead to entirely new signal and image processing methodologies. Single-chip smart cameras make it possible to design very efficient, very small, low-power and (when produced in large volumes) low-cost cameras. As examples, the VISoc single-chip smart camera [13] integrates a 320x256 pixel CMOS image sensor, a RISC processor, a vision co-processor and I/O onto a single chip, which has been fabricated in a 0.35 µm process on an area of about 36 mm², with a typical power dissipation of about 1 W at 3.3 V and 60 MHz. Moorhead et al. [14] designed a smart CMOS camera chip which integrates an edge detection mechanism directly into the sensor array. Lee et al. [15] reported the design of a 30 frames/second VGA-format CMOS image sensor with an embedded massively parallel processor for real-time skin-tone detection.
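    To show what "edge detection in the sensor" computes, here is a digital software analogue. The circuit of Moorhead et al. is not described here and is likely analog, so this is only an assumption-labelled sketch: it thresholds an L1 gradient magnitude built from forward differences and outputs one bit per pixel. The function name `edge_map` and the default threshold are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def edge_map(frame: Frame, threshold: int = 50) -> List[List[int]]:
    """Binary edge map: 1 where the local intensity gradient is large.

    Uses forward differences |I(x+1, y) - I(x, y)| + |I(x, y+1) - I(x, y)|
    (an L1 gradient magnitude). Pixels in the last row or column, which lack
    a right or lower neighbour, are left at 0.
    """
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            edges[y][x] = 1 if abs(gx) + abs(gy) > threshold else 0
    return edges


# Usage: a vertical dark-to-bright boundary produces a column of edge pixels.
step = [[0, 0, 255, 255] for _ in range(4)]
for row in edge_map(step):
    print(row)
```

    Each output pixel needs only its right and lower neighbours, which is why this kind of operation maps naturally onto circuitry placed next to each pixel, and the one-bit output reduces the data leaving the chip by a factor of eight compared with 8-bit pixels.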

    In some applications, single-chip smart cameras can bring distinct advantages. For example, Shigematsu et al. argue that, compared with conventional multi-chip fingerprint readers, a fingerprint reader based on a single-chip smart camera can be much smaller, allowing much simpler integration into mobile devices such as mobile phones, lower in cost, and more secure [1]. The main disadvantage of the single-chip smart camera lies in the cost of chip design and manufacturing, unless a large volume of units can be produced to justify the initial capital investment. Nevertheless, a single-chip smart camera is a smart sensor that has the potential to make vision systems pervasive, especially when connected to wireless sensor networks.

    3.2.4 Embedded System based Smart Cameras

    This category of smart cameras most often consists of a camera (usually a general-purpose one) and an external embedded processing unit connected to it. For example, an embedded system based smart camera could be a general-purpose camera connected to a high-performance video processing board, which itself is connected to a PC, either through a PCI slot or through an RS232 port. This kind of configuration is not too different from a PC-based system. Many 2G digital CCTV systems with some intelligent features belong to this category.

    A dedicated, embedded processing unit is necessary in this type of smart camera because a PC, while flexible and versatile, is far from adequate for intensive image and video processing and pattern recognition tasks, particularly when high-resolution, high-frame-rate and low-latency processing is required. Another advantage of this kind of system is that, once proof of concept is achieved and end users are identified, it is easier to convert the system into a stand-alone smart camera if required.

    Smart cameras used in robotic and automobile applications can also be classified into this category. These cameras may share computing resources, such as a processor and memory, with other embedded devices in the robot or vehicle.

    3.2.5 PC and Network based Smart Cameras

    PC-based smart camera systems are probably the most popular within the academic research environment, as a first step in conducting computer vision and pattern recognition research and in building first prototypes as proofs of concept. It is a very simple and inexpensive configuration, as the prices of general-purpose video cameras and PCs continue to fall. Most often, a general-purpose camera is connected to a PC through either a frame grabber or a communication port such as USB, Firewire, CameraLink, or Ethernet. This type of system relies on the PC's CPU to perform image analysis, feature extraction and pattern recognition tasks. The availability of various vision processing libraries for PC platforms makes this kind of system very popular. PCs also provide a more flexible environment for building user interfaces.

    USB cameras, Firewire cameras and network cameras allow digital images to be transferred directly from the camera to a PC or to embedded processing hardware, avoiding the loss of signal integrity caused by the digital-to-analog conversion (DAC) inside many CCTV cameras and the analog-to-digital conversion (ADC) in frame grabbers. For high-resolution cameras, Firewire cameras are starting to become popular and affordable, but CameraLink remains dominant, especially for high-bandwidth, high-performance applications.

    The 2G CCTV system is a network based video surveillance system (NVSS). An NVSS with built-in intelligent surveillance features can be loosely considered as a network of virtual smart cameras. An NVSS is composed of four main layers: a CCTV camera (sensor) layer, a network layer, a central computer layer and a trained security personnel layer (Figure 4). As discussed in section 2.5.1, in most currently deployed NVSSes, ASIP tasks such as object tracking and identification and threat detection are still performed mostly by trained security personnel. However, human monitoring of surveillance video is very labor-intensive. It is generally agreed that watching video feeds requires a higher level of visual attention than most everyday tasks. In particular, vigilance, the ability to hold attention and to react to rarely occurring events, is extremely demanding and prone to error due to lapses in attention. A recent study by the US National Institute of Justice found that, after only 20 minutes of watching and evaluating monitor screens, the attention of most individuals degenerates to well below acceptable levels [16]. The next generation of video surveillance systems, intelligent video surveillance systems (IVSS), will try to solve these problems by providing automated video surveillance and crime preemption abilities. An IVSS will redistribute ASIP tasks among the four layers of the NVSS, notably shifting the processing load from security personnel to central computers or DVRs in the short term and, probably more importantly, to the surveillance cameras themselves in the mid and long term, that is, replacing passive or "dumb" CCTV cameras with (stand-alone) smart cameras. The use of smart cameras would greatly reduce the bandwidth problem caused by the increasing number of cameras in the system and enhance surveillance system performance, because sending raw pixels over the network is less efficient than sending the results of intermediate analysis. Smart cameras can also help to decentralize the overall surveillance system, which can lead to improved fault tolerance and allow more surveillance tasks to be realized than with traditional cameras [17].
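    The bandwidth argument can be made concrete with back-of-the-envelope arithmetic. The Python sketch below compares an uncompressed video stream with a stream of compact event reports; the frame size, frame rate, event rate and 64-byte event record are illustrative assumptions, not figures from the systems cited here. Real CCTV streams are usually compressed, which narrows the gap considerably.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Bits per second needed to stream uncompressed frames."""
    return width * height * bits_per_pixel * fps


def event_bitrate(events_per_second, bytes_per_event):
    """Bits per second needed to stream compact analysis results."""
    return events_per_second * bytes_per_event * 8


# Assumed scenario: 8-bit greyscale VGA at 25 fps versus
# 10 hypothetical event records per second of 64 bytes each.
raw = raw_video_bitrate(640, 480, 8, 25)
events = event_bitrate(10, 64)
print(f"raw: {raw / 1e6:.1f} Mbit/s, events: {events / 1e3:.2f} kbit/s, "
      f"ratio: {raw / events:.0f}x")
# prints "raw: 61.4 Mbit/s, events: 5.12 kbit/s, ratio: 12000x"
```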


    [Figure 4 diagram: sensor layer (Camera 1, Camera 2, ..., Camera N), network layer, central computer (server) layer and security personnel layer]

    Figure 4: Four layers of a network based video surveillance system (NVSS).

    3.3 Research in Smart Cameras as Embedded Systems

    Video processing is notoriously hungry for computing power, memory and other resources. Smart cameras as embedded systems have to meet the insatiable demands of video processing on the one hand, and the challenging demands of embedded systems, such as real-time operation, robustness and reliability under real-world conditions, on the other. This has made smart cameras a leading-edge application for embedded systems research [18]. Recently there has been a significant increase in research on building smart cameras as embedded systems. The first IEEE workshop on Embedded Computer Vision (ECV'05) was held in June 2005 [19]. The workshop addressed issues such as how to design smart algorithms that use embedded hardware efficiently, how to meet real-time constraints in embedded environments, and verification methods for mission-critical embedded vision systems. In particular, the workshop discussed the suitability of FPGAs for embedded vision systems.

    Apart from the numerous research groups developing smart cameras for video surveillance, a number of academic research groups worldwide are dedicated to building smart cameras as embedded systems. One prominent group is the Embedded Systems Group in Princeton University's Department of Electrical Engineering [18]. This group has developed an embedded smart camera system that can detect people and analyze their movement in real time, and is also working on a VLSI (Very Large Scale Integration) smart camera. An interesting research activity involving the design of stand-alone smart cameras is the SmartCam project at the University of Technology Eindhoven [20]. This project investigates multi-processor smart camera architectures and addresses the critical issue of determining the correct camera architectural parameters for a given application domain. Another project bearing the same name is being undertaken by the University of Technology in Graz, Austria [17]. That project aims to develop distributed smart cameras for traffic surveillance applications, and also investigates various issues involved in building smart cameras as embedded systems, such as resource-aware dynamic task allocation to support real-time requirements.

    Many industry research groups and companies are involved in smart camera research for machine vision, especially in Germany, Japan and the US. There are some very informative and useful journals and web portals for the machine vision world, such as IEEE Transactions on Pattern Analysis and Machine Intelligence, Advanced Imaging Magazine [21], Machine Vision Resources [22] and Machine Vision Online [23].

    A search on the USPTO (US Patent and Trademark Office) website reveals many patents filed or issued in relation to the concept and embodiment of smart cameras as embedded systems. For example, patent #6,985,780, filed in Aug 2004 under the title "Smart Camera" [24], claims a camera system that includes an image sensor and a processing module at the imaging location that processes the captured images before sending the results to a host computer. The processing module can perform tasks such as image feature extraction and filtering, convolution and deconvolution, correction of parallax and perspective image errors, and image compression.

    4 Review of ASIP Algorithms for Smart Cameras and State-of-the-Art Systems

    If cameras are extensions of human eyes, smart cameras are pushing the boundary of possibility to become extensions of the human brain as well. What makes a camera smart is the intelligent, application-specific information processing (ASIP) algorithms built into the software architecture of the camera system. In this section we first explore some common characteristics of intelligent algorithms for smart cameras. We then review several categories of algorithms as applied to machine vision, surveillance and other prominent applications, along with some state-of-the-art smart camera systems in use in these application areas.

    4.1 Common Characteristics of Algorithms for Smart Cameras

    The primary function of a smart camera is to autonomously analyze the content of an image or video and achieve a high-level understanding of what is happening in the scene. One of the most commonly adopted approaches is image processing-based pattern recognition, a branch of artificial intelligence. Pattern recognition assumes that the image may contain one or more objects and that each object belongs to one of several predetermined types or classes. Given a digitized image containing several objects, the pattern recognition process consists of three main phases, each including several processing tasks:

    Signal-level processing: image enhancement and image segmentation;

    Feature-level processing: feature extraction, feature measurement and tracking; and

    Object-level processing: object classification and estimation.

    This is illustrated in Figure 5. Figure 5 also shows a semantic-level processing component, which is central to the output or action side of smart cameras. The main tasks at this level include joint analysis of inputs from additional cameras, other sensors and databases (where available), data fusion, event description and control signal generation. Note that some tasks at different levels or phases may overlap during processing.

    [Figure 5 diagram: video capture → signal level (image enhancement and segmentation) → feature level (feature extraction and tracking) → object level (object classification and estimation) → semantic level (person, behavior and event description; control signal generation), with other camera, sensory and/or database inputs also feeding the semantic level]

    Figure 5: General processing flow of algorithms for pattern recognition and smart cameras.

    Image segmentation at the signal level is essential to all subsequent processing tasks; it divides an image into distinct parts, each sharing a common characteristic. Image segmentation can be based on color, texture, shape and motion. Feature extraction is crucial to pattern recognition. This is where the segmented regions or objects are measured. A measurement is the value of some quantitative property of an object. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristic of the object. This drastically reduced amount of information (compared with the original image) represents all the knowledge on which the subsequent classification decision must be based. Object classification outputs a decision regarding the class to which each object belongs. Each object is recognized as being of one particular type, and the recognition is implemented as a classification process [25].
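    The three phases can be illustrated with a deliberately tiny Python sketch: a global threshold performs signal-level segmentation, the bounding box and pixel count are the measurements from which two features (elongation and fill ratio) are computed, and a nearest-prototype rule performs object-level classification. The threshold, the class names and the prototype feature values are hypothetical, and the sketch assumes a single object per image; a real system would first label connected regions and use far richer features.

```python
from math import dist


def segment(image, threshold):
    """Signal level: binary segmentation by a global intensity threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def extract_features(mask):
    """Feature level: turn measurements of the segmented object into features."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        return None  # nothing was segmented
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    # Measurements: bounding-box height and width, and pixel count (area).
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    area = len(pixels)
    # Features: functions of the measurements (elongation, fill ratio).
    return (height / width, area / (height * width))


def classify(features, prototypes):
    """Object level: assign the class whose prototype is nearest in feature space."""
    return min(prototypes, key=lambda label: dist(features, prototypes[label]))


# Hypothetical class prototypes: (elongation, fill ratio).
PROTOTYPES = {"upright": (3.0, 0.8), "flat": (0.33, 0.8)}

frame = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
features = extract_features(segment(frame, 128))
print(features, classify(features, PROTOTYPES))  # (2.0, 1.0) upright
```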

    For simple applications, not all of these levels and tasks need to be implemented. For example, the camera in an optical mouse performs only signal- and feature-level processing tasks. On the other hand, for a given processing task, different applications can place quite different requirements on the camera's performance, robustness and reliability. For example, robustness requirements at all processing levels are much higher for video surveillance cameras monitoring human movement and behavior than for industrial machine vision cameras performing parts inspection or sorting.

    Tasks at the signal and feature levels are usually data-intensive and well suited to hardware-based implementation to meet speed demands. Tasks at the object level can be math-intensive and may need one or more high-performance processors. A stand-alone smart camera built on a multi-processor architecture might use one processing unit, such as a DSP or an FPGA, for signal- and feature-level tasks, and a high-performance DSP or RISC microprocessor for statistical object classification.

    When designing smart cameras as embedded systems for demanding applications such as surveillance and automobiles, several important and challenging issues need to be addressed, such as the development of low-complexity, low-cost algorithms suitable for hardware implementation, and hardware/software co-design to map algorithmic requirements onto hardware resources. These issues are discussed further in section 5.1.

    4.2 Application: Intelligent Video Surveillance Systems (IVSS)

    4.2.1 Current Research in Algorithms for IVSS

    Video surveillance in dynamic scenes, especially of humans and vehicles, is currently one of the most active research topics in computer vision and pattern recognition. In the last several years the IEEE and IEE have organized many workshops and conferences on "intelligent visual surveillance" and have published special journal issues that focus solely on "visual surveillance" or on human motion/behavior analysis. Hu et al. [26] and Valera et al. [27] recently conducted excellent surveys of various algorithms and techniques under research and development for video surveillance. They also reviewed some high-profile IVSS systems. Some comments in this section are derived from their papers.


    For video surveillance, image segmentation most often starts with motion detection, which aims to separate regions corresponding to moving objects from the rest of an image. Background modeling is indispensable to motion detection. 3-D models can provide more realistic background descriptions but are more costly; 2-D models are currently more widely used because of their simplicity. However, all modeling techniques need ways to reduce the effect of unfavorable factors such as illumination variation and moving shadows. Promising techniques for motion segmentation include simple background subtraction, temporal differencing and the more complex optical flow methods. Skin-color based segmentation can be very useful when human objects are close enough to the camera and lighting is consistent. Once segmentation has isolated the objects, feature extraction and measurements can be performed on each object. Simple algorithms for feature extraction include image moments, which can provide geometrical features of the objects. For gesture and behavior recognition, promising feature extraction algorithms include MEF (Most Expressive Features), extracted by Karhunen-Loeve projection, and MDF (Most Discriminative Features), extracted by multivariate discriminant analysis [28]. Because it is sometimes difficult to specify features explicitly, some applications with sufficiently small images take the whole image, or a transformed image, as the feature vector. Examples of object classification algorithms are shape-based and motion-based classification. After motion detection and object classification, video surveillance systems generally track moving objects from one frame to the next. Promising object tracking algorithms can be classified into four categories: region-based, active contour based, feature based and model based tracking. Particle filters have recently become a major approach to tracking moving objects.
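    As a minimal illustration of the two simplest motion segmentation techniques named above, the Python sketch below keeps a 2-D background model as an exponential running average and marks as foreground any pixel that differs from it by more than a threshold; temporal differencing applies the same test to two consecutive frames instead. The update rate alpha and the threshold are illustrative assumptions, and the sketch deliberately ignores the illumination changes and moving shadows that a deployed system must handle.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def difference_mask(reference, frame, threshold=25):
    """1 where |frame - reference| exceeds the threshold, else 0."""
    return [[1 if abs(f - r) > threshold else 0 for r, f in zip(r_row, f_row)]
            for r_row, f_row in zip(reference, frame)]


def background_subtraction(background, frame, threshold=25):
    """Foreground = pixels that differ from the background model."""
    return difference_mask(background, frame, threshold)


def temporal_differencing(previous_frame, frame, threshold=25):
    """Foreground = pixels that changed since the previous frame."""
    return difference_mask(previous_frame, frame, threshold)


background = [[50.0] * 3 for _ in range(3)]
frame = [[50, 50, 50], [50, 200, 50], [50, 50, 50]]  # one bright moving pixel
print(background_subtraction(background, frame))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
background = update_background(background, frame)  # slowly absorbs scene changes
```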

    Human behavior understanding and personal identification are among the most challenging tasks facing IVSS systems for high-end security applications. Behavior understanding involves the analysis and recognition of motion patterns and the production of high-level descriptions of actions and interactions. Promising approaches and algorithms for behavior understanding include dynamic time warping, finite state machines, HMMs (Hidden Markov Models) and time-delay neural networks. Personal identification is of increasing importance for many security applications. The human face and gait are now regarded as the main biometric features that can be used for personal identification in video surveillance systems. While face recognition research and development has made a lot of progress in recent years, research on gait recognition is still in its infancy.
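    Dynamic time warping, the first of the behavior-understanding approaches listed above, compares an observed motion sequence with stored templates while tolerating differences in speed. The Python sketch below is the textbook quadratic-time DTW recurrence with an absolute-difference cost; treating each sequence as one feature per frame (for example, a tracked hand height) and the template names are hypothetical simplifications.

```python
def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b waits
                                     cost[i][j - 1],      # b advances, a waits
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(sequence, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(sequence, templates[name]))


# Hypothetical one-feature-per-frame templates.
TEMPLATES = {"raise hand": [0, 1, 2, 3], "lower hand": [3, 2, 1, 0]}
observed = [0, 0, 1, 1, 2, 2, 3, 3]  # the same rising motion, performed slowly
print(recognize(observed, TEMPLATES))  # raise hand
```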


    4.2.2 State-of-the-Art IVSSes

    A number of high-profile IVSSes have been reported in recent years. These systems, some of which are deployed in real-world applications, apply various pattern recognition techniques described in the previous sections and provide features such as people tracking, behavior recognition and detection of unattended objects. Examples are the real-time visual surveillance system W4 [29], the Pfinder system developed by Wren et al. [30], the single-person tracking system TI developed by Olsen et al. [31], and a system at CMU (Carnegie Mellon University) [32] that can monitor activities over a large area using multiple cameras connected by a network.

    A few IVSSes based on stand-alone smart cameras have also been reported. The V2 system developed by Christensen and Alblas [33] avoids the disadvantages of a centralized computer server by moving many of the processing tasks directly into the camera, making the system a group of smart cameras connected across the network. Event detection and storage of event video can be performed autonomously by the camera, so it is normally only necessary to communicate with a central point when significant events occur. The VSAM project described by Collins [34, 35] is a multi-camera surveillance system composed of a network of smart sensors that are independent and autonomous vision modules. These vision sensors are capable of detecting and tracking objects, classifying moving objects into semantic categories such as "human" or "vehicle", and identifying simple human movements such as walking. Desurmont et al. [36] developed a smart network camera system with three smart cameras to perform people tracking and counting in shopping malls. Their system uses web services standards and XML-based metadata to implement inter-camera and camera-to-host coordination. Fleck et al. [37] designed a smart camera that contains an FPGA and a PowerPC processor to perform face tracking and people tracking, using particle filters on HSV (Hue, Saturation, Value) color distributions. The camera outputs the approximated PDF (probability distribution function) of the target state to a host computer.
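    To show what outputting the approximated PDF means, the Python sketch below runs a bare-bones particle filter on a one-dimensional position. It is not the algorithm of Fleck et al.: their HSV color-distribution likelihood is replaced here by a Gaussian likelihood on a noisy position measurement, and the particle count, noise levels and random-walk motion model are hypothetical. The weighted particle set after each update is the sample-based approximation of the target-state PDF that such a camera could send to a host.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std, meas_std, rng):
    """One predict-update-resample cycle; returns (new particles, weighted mean)."""
    # Predict: random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: Gaussian likelihood of the measurement for each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    total = sum(weights)
    if total == 0.0:  # every particle far from the measurement: fall back to uniform
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]
    # (predicted, weights) is the sample-based approximation of the state PDF.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample in proportion to weight so particles concentrate where the PDF is high.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
true_position = 20.0
for _ in range(20):
    true_position += 1.0  # target moves one unit per frame
    measurement = true_position + rng.gauss(0.0, 1.0)
    particles, estimate = particle_filter_step(particles, measurement, 1.5, 2.0, rng)
print(f"true {true_position:.1f}, estimated {estimate:.1f}")
```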

    4.3 Application: Industry Machine Vision

    While advanced algorithms for surveillance smart cameras are mostly still in the research and development stage, owing to their high complexity and the high level of robustness required for real-world applications, smart cameras for industrial machine vision have long been established in the market as mature players. Most machine vision cameras are stand-alone, autonomous smart cameras, for which communication with a PC or other central control unit is needed only for camera configuration, firmware upgrades or, in some cases, output data collection. Most algorithms implemented in these cameras follow a processing flow similar to that described in Figure 5. One important reason for the relative maturity of machine vision smart cameras, compared with surveillance smart cameras, is that the application requirements for machine vision cameras are much less restrictive. In other words, many pattern recognition algorithms and techniques have a much better chance of performing with satisfactory robustness and reliability in machine vision than in surveillance applications. This is because machine vision cameras mainly deal with conditions such as:

    indoor use, so good and consistent lighting can be more easily guaranteed;

    minimal occlusion problems;

    static and known backgrounds, so detection of unusual features is simpler;

    a limited set of object patterns to be recognized; and

    no need for human movement tracking and recognition.

    There are many proven software packages on the market that can be customized for, or directly implemented on, programmable machine vision cameras. Most of these packages target particular industry sectors, but some are general-purpose, including a few powerful up-market libraries such as the Halcon library [38]. The Halcon library provides algorithms for shape-based matching to find objects based on ROI (region of interest) modeling, blob analysis, metrology (both 1D and 3D), edge detection, edge and line extraction, contour processing, template matching and color processing.

    Thanks to advances in embedded system technologies and the improved affordability of processing power, functionality that was once found only in PC-based systems is migrating down to the smart camera level. Artificial intelligence is one such functionality. Pulnix America's ZiCAM camera, for example, uses a hardware neural network to eliminate the need for programming to execute image-understanding algorithms [39]. It can learn what is required for a machine vision application and, once taught, operates as a stand-alone smart camera. Wintriss Engineering manufactured a smart camera that sports a microprocessor, a DSP and multiple FPGAs with up to 130,000 gates [40]. The company offers both area-scan and line-scan versions of its smart cameras, with the line-scan version able to perform imaging-related processes on 5,150-pixel lines at 40 MHz. One such camera uses an FPGA to perform image sensor control and pixel correction, and combines this with the compute power in the camera head to run real-time digital filters, lighting correction, streak correction and input/output functions. Finally, geometrically and photometrically manifested flaws are discriminated using connectivity analysis, all performed within the camera.
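    The connectivity analysis mentioned above can be sketched as connected-component labeling followed by a size test on each component. The Python version below uses a breadth-first search over 4-connected defect pixels; the binary defect mask and the minimum flaw area are hypothetical inputs, and the source does not describe the cited camera's actual FPGA implementation (hardware designs often use a streaming, raster-scan labeling variant instead).

```python
from collections import deque


def label_blobs(mask):
    """4-connected component labeling by breadth-first search.

    Returns (labels, areas): a label image and a dict mapping label -> pixel count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                areas[next_label] = area
    return labels, areas


def find_flaws(mask, min_area):
    """Keep only blobs large enough to count as flaws (min_area is hypothetical)."""
    _, areas = label_blobs(mask)
    return {label: a for label, a in areas.items() if a >= min_area}


defect_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(find_flaws(defect_mask, min_area=2))  # {1: 4, 3: 3}
```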

    4.4 Application: Intelligent Transport Systems and Automobiles

    4.4.1 ITS Applications

    There is growing awareness of and interest in using smart cameras in Intelligent Transport Systems (ITS) and the automobile industry. In June 2005 the IEEE organized an international workshop on Machine Vision for Intelligent Vehicles [41]. Generally speaking, the application and algorithmic requirements for ITS are quite similar to those of IVSS. These requirements can be quite different for automobile applications, however, where high-speed imaging and processing are often needed, imposing higher demands on both hardware and software. Increased robustness is also required for car-mounted cameras to deal with varying weather conditions, speeds, road conditions and car vibrations. CMOS image sensors can overcome problems such as large intensity contrasts caused by weather conditions or road lights, as well as blooming, which is an inherent weakness of existing CCD image sensors [42].

    A number of successful applications of smart camera systems for ITS have been reported in the literature. The VIEWS system at the University of Reading [43] is a 3D model-based vehicle tracking system. Kumar et al. [44] described a real-time rule-based behavior-recognition system for traffic videos. The system is intended to support better traffic rule enforcement by detecting and signaling improper behaviors; it can detect potential accident situations and is designed for existing camera setups on road networks. Beymer et al. [45] presented a smart camera-based monitoring system for measuring traffic parameters. The system captures video from cameras placed on poles or other structures looking down at traffic. Once the video is captured, digitized and processed by the onsite smart camera, it is transmitted in summary form to a transportation management centre for computing multi-site statistics such as travel times. Bramberger et al. [42] described an embedded smart camera for stationary vehicle detection and discussed the mapping of high-level algorithms to embedded system components. Dimitropoulos et al. [46] described a network of smart cameras deployed at an airport to detect and track aircraft; each camera can autonomously detect aircraft traffic in multiple locations within its field of view. A data fusion module combines data from multiple cameras to determine the location and size of the aircraft. Other applications of smart cameras in ITS include monitoring vehicle behavior in parking lots, vision-based vehicle speed measurement, detection of red-light intrusion at traffic lights and vehicle number plate recognition. Some authors have expressed the need to integrate smart traffic surveillance systems with existing traffic control systems to develop the next generation of advanced traffic control and management systems [47].
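    The source does not say how the airport system fuses its per-camera measurements. One common textbook choice, shown here purely as an assumption, is inverse-variance weighting of independent position estimates that have already been projected into a shared ground-plane coordinate frame: more certain cameras receive more weight, and the fused variance is smaller than any single camera's.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent (x, y, variance) estimates.

    Assumes every camera reports a position in the same ground-plane frame with
    isotropic Gaussian error of the given variance.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / var for _, _, var in estimates]
    total = sum(weights)
    x = sum(w * ex for w, (ex, _, _) in zip(weights, estimates)) / total
    y = sum(w * ey for w, (_, ey, _) in zip(weights, estimates)) / total
    return x, y, 1.0 / total


# Hypothetical reports from two cameras (metres, metres, variance in m^2).
camera_reports = [(100.0, 50.0, 4.0), (104.0, 52.0, 4.0)]
print(fuse_estimates(camera_reports))  # (102.0, 51.0, 2.0)
```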

    4.4.2 Automobile Applications

    Intelligent vehicles will form an integral part of the next generation of ITS technology. Intelligent vehicles powered by smart cameras will be able to monitor the vehicle's environment comprehensively, including the driver's state and attention inside the vehicle as well as roads and obstacles outside it, so as to assist drivers and avoid accidents in emergencies. However, building and integrating smart cameras into vehicles is not an easy task. On the one hand, the algorithms require considerable computing power to work reliably in real time and under a wide range of lighting conditions. On the other hand, the cost must be kept low, the package size must be small and the power consumption must be low [48]. Applications of smart cameras in intelligent vehicles include lane departure detection, cruise control, parking assistance, blind-spot warning, driver fatigue detection, occupant classification and identification, obstacle and pedestrian detection, intersection-collision warning and overtaking vehicle detection. A few examples are given below.

    Stein [49] described an adaptive cruise control system for intelligent vehicles based on a single smart camera. In a paper on obstacle detection using stereo vision, Ruichek [68] focused on a multilevel- and neural-network-based stereo-matching method for real-time road obstacle detection with linear cameras for use in vehicles. Xu et al. [50] addressed the problem of pedestrian detection and tracking with night vision using a single infrared video camera installed on the vehicle. The EyeQ is a single-chip smart camera processor developed by Mobileye [51]. It is fabricated in 0.18 µm CMOS technology and operates at 120 MHz. It integrates two 32-bit ARM946E RISC CPUs, four Vision Computing Engines, a multi-channel DMA (Direct Memory Access) controller and several peripherals, and is designed for computationally intensive real-time visual recognition and scene interpretation in intelligent vehicle systems.
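    Whatever matching method is used, stereo obstacle detection ultimately converts the disparity of a matched point into a distance. For an ideal rectified camera pair with focal length f (in pixels) and baseline B, the depth is Z = fB/d for disparity d. The Python sketch below applies this relation; the focal length, baseline and disparities are hypothetical values, and it does not reproduce Ruichek's linear-camera matching method. It also shows why matching precision matters: at this range a one-pixel disparity error shifts the estimated distance by almost two metres.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a matched point for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical vehicle stereo rig: 800-pixel focal length, 0.3 m baseline.
for d in (12, 11):
    print(f"disparity {d} px -> {depth_from_disparity(800, 0.3, d):.1f} m")
# disparity 12 px -> 20.0 m
# disparity 11 px -> 21.8 m
```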

    4.5 Other Application Areas

    Other important applications of smart cameras include HCI, medical imaging, robotics, games and toys. Optical mice are widely used. Smart cameras performing gesture recognition will play an important role in the development of multimodal user interfaces. Bonato et al. [52] presented an FPGA-based smart vision system for mobile robots capable of real-time human gesture recognition. The RVT system developed by Leeser et al. [53], based on FPGA processing, allows surgeons to see live retinal images with the vasculature highlighted in real time during surgery.

    5 Smart Camera Design Considerations and Future Directions

    In this final section we discuss design considerations for smart cameras as embedded systems, identify several key issues that need to be addressed by the design and research community, and speculate on the future directions of smart camera research and development.

    5.1 Design Considerations

    5.1.1 Design and Development Process

    Figure 6 shows a typical design and development process for smart cameras as embedded systems (excluding single-chip smart cameras). As shown in Figure 6, the process can be iterative, especially if the initial application specification was not complete from the end user's point of view.

    [Figure 6 diagram: Project Definition → Application Requirements Specifications → System Architecture Design → Proof of Concept (Algorithm and Hardware) → Algorithm Conversion → Embedded System Integration and Debugging → Field Test / Evaluation → Requirements Met? Yes: Engineering Prototyping / Manufacturing; No: iterate]

    Figure 6: Design and development process for smart cameras as embedded systems.


    The system architecture design stage decides on the software and hardware architectures, based on performance, deadline and cost criteria. Algorithmic design and timing design suited to the targeted hardware platform also need to be defined, and the mapping between algorithm requirements and hardware resources is an important issue. The proof-of-concept stage may use a PC platform for research and algorithm development, usually together with a COTS (Commercial Off-The-Shelf) general-purpose camera. Hardware components need to be acquired, integrated and tested. However, this is not needed if, during the architecture design stage, a third-party camera development platform or a hardware accelerator unit for video processing is identified as an appropriate hardware platform (see section 5.1.6 for examples of smart camera development platforms). The algorithm conversion stage includes tasks such as converting floating-point arithmetic to fixed-point arithmetic, developing low-power and low-complexity versions of the algorithms, and implementation in an HDL (Hardware Description Language). The embedded system integration stage results in a prototype smart camera built on an embedded hardware platform running embedded versions of the algorithms.
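    Part of mapping algorithm requirements onto hardware resources can be done with simple arithmetic before any hardware is chosen. The Python sketch below is an illustration added here, not a method from the paper: it assumes a constant cost per pixel, and the function names and all numbers (VGA at 30 fps, about 10 operations per pixel, a processor sustaining 400 million operations per second) are hypothetical. Real budgets must also cover memory bandwidth, I/O and control code.

```python
def required_ops_per_second(width: int, height: int, fps: float, ops_per_pixel: float) -> float:
    """Operations per second needed to run a per-pixel algorithm at a given frame rate.

    Assumes a constant cost per pixel, which is a simplification: many
    vision algorithms have data-dependent cost.
    """
    return width * height * fps * ops_per_pixel


def utilization(required_ops: float, processor_ops_per_second: float) -> float:
    """Fraction of a processor's sustained throughput that the algorithm would use."""
    return required_ops / processor_ops_per_second


if __name__ == "__main__":
    # Hypothetical workload: VGA video at 30 fps, about 10 operations per pixel.
    need = required_ops_per_second(640, 480, 30, 10)
    print(f"required: {need / 1e6:.2f} M ops/s")
    # Hypothetical processor sustaining 400 M ops/s after overheads.
    print(f"utilization: {utilization(need, 400e6):.0%}")
```

    With these assumed numbers the algorithm needs about 92 million operations per second, roughly a quarter of the hypothetical processor's sustained throughput.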

    5.1.2 System Architecture and Design Methodology

    System architecture design depends on the application requirements, which can range from very simple (e.g. an optical mouse) to very complex (e.g. face recognition). It has to consider many factors, such as the hardware platform, cost, time to market and flexibility. Generally speaking, a heterogeneous, multiple-processor architecture can be ideal for smart camera development. For example, such an architecture may consist of an FPGA or a DSP acting as a data processor for image segmentation and feature extraction, and a high-performance DSP or media processor for math-intensive tasks such as statistical pattern classification. This kind of system allows better exploitation of pipelining and parallel processing, which are essential for achieving high frame rates and low latency. Some authors have studied the impact of hardware system architecture on the degree of pipelining and parallel processing that can be implemented in smart cameras [54, 55], and some initial work has been reported on design methodology for embedded vision systems [56, 57].
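    The benefit of the heterogeneous, pipelined arrangement described above can be estimated from per-stage processing times. In this minimal sketch (hypothetical function names; the stage times of 8 ms for segmentation and feature extraction and 20 ms for classification are invented for illustration), a pipeline's throughput is set by its slowest stage while its latency remains the sum of all stages. Transfer time between processors is ignored.

```python
from typing import Sequence, Tuple


def sequential_fps(stage_ms: Sequence[float]) -> float:
    """Frame rate when all stages run one after another on a single processor."""
    return 1000.0 / sum(stage_ms)


def pipelined_metrics(stage_ms: Sequence[float]) -> Tuple[float, float]:
    """(frame rate, latency in ms) when each stage runs on its own processor.

    Throughput is limited by the slowest stage; latency is still the sum of
    all stage times. Inter-processor transfer time is ignored in this sketch.
    """
    return 1000.0 / max(stage_ms), float(sum(stage_ms))


if __name__ == "__main__":
    stages = [8.0, 20.0]  # hypothetical: FPGA feature extraction, DSP classification
    print(f"single processor: {sequential_fps(stages):.1f} fps")
    fps, latency = pipelined_metrics(stages)
    print(f"pipelined: {fps:.1f} fps, latency {latency:.0f} ms")
```

    For the assumed stage times, running both stages on one processor gives about 36 frames per second, while splitting them across two processors raises throughput to 50 frames per second with an unchanged latency of 28 ms.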

    5.1.3 Embedded Processors

    There are generally four main families of embedded processors that can be used for smart cameras: microcontrollers, ASICs (Application-Specific Integrated Circuits), DSPs (Digital Signal Processors) and PLDs (Programmable Logic Devices) such as the FPGA. Microcontrollers are cheap but have limited processing power and are generally not suited to building demanding smart cameras. ASICs are powerful and power-efficient, but their design cost and risk are high, so they are viable solutions only when production volume is high and the time to market allows for a long design cycle. DSPs are relatively cheap and powerful at image and video processing, but demanding applications usually need more than one DSP. DSP-based solutions can be cost-effective for medium-volume production. Recently a new class of DSP processors, called media processors, has entered the vision market. Media processors try to provide a good trade-off between flexibility and cost-effectiveness. They typically have a high-end DSP core employing SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) architectures, combined on-chip with typical multimedia peripherals such as video ports, networking support and other fast data ports [58]. Examples of media processors are the Philips TriMedia, TI's DM64x and ADI's (Analog Devices, Inc.) Blackfin.

    The FPGA has recently emerged as a very good hardware platform candidate for embedded vision systems such as smart cameras. One of the most important advantages of the FPGA is its ability to exploit the inherently parallel nature of many vision algorithms. FPGAs used to be employed mainly as glue logic between processors and peripherals, but the introduction of on-chip hardware multipliers and dual-port memory has made them excellent options for DSP applications. The integration of microprocessors into FPGA chips (such as the Xilinx Virtex-II Pro and Virtex-4) has made them true system-on-a-chip solutions. These features, together with continuous improvements in cost and in the maturity of design tools, have made FPGAs very competitive against DSPs and media processors for many types of embedded vision system designs. In fact, an increasing number of publications on smart cameras as embedded systems have employed FPGAs either as the sole processor or as a data-intensive processor ahead of a DSP or media processor in a powerful heterogeneous multi-processor architecture [59]. Sen et al. [56] have recently proposed a design methodology for effectively and efficiently implementing computer vision algorithms on FPGAs to build smart cameras. A study comparing the relative performance of various image processing routines on a DSP, a PowerPC, an Intel Pentium 4 and an FPGA was published on Alacron's web site [60]; the FPGA solution was found to have a distinct advantage. However, a more standardized performance evaluation mechanism to help processor selection is much needed.


    How should one choose between DSPs, media processors, ASICs and FPGAs? Kisacanin [58] proposed a practical way to help processor selection based on intended production volume, cost and development flexibility. He argued that ASICs may be suitable for high volumes of over 1 000 000 units and DSPs or media processors for medium volumes of between 10 000 and 100 000 units, while for low volumes of under 10 000 units FPGAs can be a good candidate.
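    The volume thresholds quoted above can be written down directly as a lookup. This sketch encodes only the production-volume criterion, not cost or development flexibility, and the function name is hypothetical. The text gives no recommendation for volumes between 100 000 and 1 000 000 units, so the function reports that band explicitly instead of guessing.

```python
def suggest_processor(volume_units: int) -> str:
    """Processor family suggested by the production-volume thresholds
    attributed to Kisacanin [58] in the text (volume criterion only)."""
    if volume_units < 10_000:
        return "FPGA"
    if volume_units <= 100_000:
        return "DSP or media processor"
    if volume_units > 1_000_000:
        return "ASIC"
    # 100 000 < volume <= 1 000 000: the thresholds as quoted do not cover this band.
    return "no rule: weigh cost and flexibility case by case"


if __name__ == "__main__":
    for volume in (5_000, 50_000, 500_000, 2_000_000):
        print(volume, "->", suggest_processor(volume))
```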

    5.1.4 Algorithms Development and Conversion

    Algorithm development for embedded systems is quite different from that for PC-based platforms. It can be much more demanding and challenging, especially if FPGAs or ASICs are targeted. When designing applications for an ASIC or FPGA, one usually has to understand the chip architecture so that algorithms can be executed efficiently and effectively. Behavioral or algorithmic synthesizers now exist that let designers abstract away the device architecture and focus on functionality, but they come at the cost of efficiency in terms of chip area (gate count) and power consumption. Therefore, it is always important to gain an intimate knowledge of the architecture of whichever ASIC, FPGA or DSP is targeted. This knowledge also helps in designing parallel and pipelined processing, which can be a very important and effective video processing technique. Converting floating-point arithmetic to fixed-point and eliminating divisions as much as possible (for example, by using hardware multipliers and look-up tables) are further considerations for algorithm conversion.
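    The two conversion techniques mentioned above can be illustrated on small, hypothetical examples that are not taken from the paper: grayscale conversion with the ITU-R BT.601 luma weights quantized to 8 fractional bits, and a division by 9 (as in a 3x3 mean filter) replaced by a multiplication with a precomputed reciprocal followed by a shift. The function and constant names are illustrative. The sketch is in Python for readability; on a DSP or FPGA the same integer operations map onto hardware multipliers and shifters.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize a real coefficient to a fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


# BT.601 luma weights (0.299, 0.587, 0.114) quantized to 8 fractional bits.
# They become 77, 150 and 29, which sum to exactly 256 (i.e. 1.0).
R_Q8, G_Q8, B_Q8 = (to_fixed(c, 8) for c in (0.299, 0.587, 0.114))


def luma_fixed(r: int, g: int, b: int) -> int:
    """8-bit grayscale value using only integer multiplies, adds and a shift."""
    return (R_Q8 * r + G_Q8 * g + B_Q8 * b) >> 8


# Division by the constant 9 replaced by a multiply with a precomputed
# reciprocal and a right shift: 7282 = round(2**16 / 9).
RECIP_9_Q16 = round((1 << 16) / 9)


def div9(s: int) -> int:
    """Equals s // 9 for 0 <= s <= 9 * 255 (any 3x3 sum of 8-bit pixels)."""
    return (s * RECIP_9_Q16) >> 16


if __name__ == "__main__":
    print(R_Q8, G_Q8, B_Q8)
    print(luma_fixed(200, 100, 50), 0.299 * 200 + 0.587 * 100 + 0.114 * 50)
    print(div9(2295), 2295 // 9)
```

    The quantized luma differs from the floating-point result by less than 1.5 gray levels, and for every possible 3x3 sum of 8-bit pixels the multiply-and-shift gives exactly the same result as integer division.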

    5.1.5 Other Factors

    Memory System - Smart cameras need flexible memory models to meet requirements such as scalable frame buffers that cope with increasing image sensor resolutions. As a smart camera may integrate different types of processors, the memory system should support potentially complex processing pipelines and parallelism in order to meet the application's real-time requirements. For single-chip smart cameras, care needs to be taken at the design stage to conserve memory [54].
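    One common way to organize frame buffers between a capture stage and a processing stage is a fixed pool of preallocated buffers used as a ring, so that memory is sized once from the sensor resolution and capture can run ahead of processing by a bounded number of frames. The sketch below is a minimal, single-threaded illustration; the class name FrameRing and its methods are hypothetical, and a real camera would add DMA transfers and synchronization between processors.

```python
class FrameRing:
    """Fixed pool of preallocated frame buffers used as a ring between a
    capture stage (producer) and a processing stage (consumer)."""

    def __init__(self, width: int, height: int, bytes_per_pixel: int = 1, n_buffers: int = 3):
        self.frame_bytes = width * height * bytes_per_pixel
        # All frame memory is allocated once, up front, from the sensor resolution.
        self._buffers = [bytearray(self.frame_bytes) for _ in range(n_buffers)]
        self._head = 0   # next slot the capture stage writes
        self._tail = 0   # oldest filled slot the processing stage reads
        self._count = 0  # number of filled slots

    @property
    def total_bytes(self) -> int:
        return self.frame_bytes * len(self._buffers)

    def push(self, frame: bytes) -> bool:
        """Copy a captured frame into the next free slot; return False (frame dropped) if all slots are full."""
        if len(frame) != self.frame_bytes:
            raise ValueError("frame size does not match the configured resolution")
        if self._count == len(self._buffers):
            return False
        self._buffers[self._head][:] = frame
        self._head = (self._head + 1) % len(self._buffers)
        self._count += 1
        return True

    def pop(self) -> bytes:
        """Return a copy of the oldest captured frame and release its slot."""
        if self._count == 0:
            raise IndexError("no frame available")
        frame = bytes(self._buffers[self._tail])
        self._tail = (self._tail + 1) % len(self._buffers)
        self._count -= 1
        return frame


if __name__ == "__main__":
    # Triple buffering for a hypothetical 8-bit SXGA sensor.
    print(FrameRing(1280, 1024, 1, 3).total_bytes)
```

    Raising the sensor resolution or the number of buffers only changes the constructor arguments, which is one simple reading of a "scalable" frame buffer.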

    Communication Protocols - There are currently too many data output protocols for cameras, such as Firewire, CameraLink, GigE and USB. Firewire is maturing, but CameraLink remains the bandwidth leader and is very popular with machine vision users. Unfortunately, the variety of digital interfaces increases confusion in the market and puts pressure on camera vendors to support multiple versions of their cameras with different interfaces.
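    Whether an interface can carry a camera's output can be checked by comparing the raw video data rate with the link rate. The sketch below uses the nominal signaling rates of three of the interfaces named above (USB 2.0 high speed at 480 Mbit/s, IEEE 1394a at 400 Mbit/s and Gigabit Ethernet at 1000 Mbit/s); CameraLink is omitted because its bandwidth depends on the configuration used. The 0.7 derating factor for protocol overhead and the function names are hypothetical assumptions, not measured values.

```python
from typing import Dict, List

# Nominal signaling rates in Mbit/s. Usable payload bandwidth is lower
# because of protocol overhead, so these are upper bounds only.
NOMINAL_MBPS: Dict[str, float] = {
    "USB 2.0 (high speed)": 480.0,
    "IEEE 1394a (Firewire 400)": 400.0,
    "Gigabit Ethernet": 1000.0,
}


def raw_video_mbps(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed video data rate in Mbit/s."""
    return width * height * bits_per_pixel * fps / 1e6


def interfaces_that_fit(rate_mbps: float, headroom: float = 0.7) -> List[str]:
    """Interfaces whose nominal rate, derated by a hypothetical headroom factor, covers the stream."""
    return [name for name, mbps in NOMINAL_MBPS.items() if rate_mbps <= mbps * headroom]


if __name__ == "__main__":
    for fps in (15, 60):
        rate = raw_video_mbps(1280, 1024, 8, fps)
        print(f"SXGA 8-bit @ {fps} fps: {rate:.1f} Mbit/s ->", interfaces_that_fit(rate))
```

    For example, an 8-bit SXGA (1280x1024) stream at 15 frames per second needs about 157 Mbit/s and fits all three links under these assumptions, whereas the same sensor at 60 frames per second (about 629 Mbit/s) fits only Gigabit Ethernet.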


    5.1.6 Smart Camera Development Platforms

    A number of programmable smart camera platforms are commercially available for developers to design and prototype smart cameras for applications such as machine vision, biometrics, HCI and surveillance. Philips has introduced the INCA (INtelligent CAmera) series of programmable cameras [61], which integrate CMOS image sensors of various resolutions with a highly flexible dual-core processing unit: a Xetal processor for computation-intensive signal processing such as feature extraction, and a high-performance TriMedia DSP core for math-intensive tasks such as pattern recognition. The camera comes with an application development kit that allows fast prototyping. One application has been designed for face recognition [62], in which the Xetal is used for face detection and the TriMedia for face recognition. Sony has recently released the XCI-SX1 smart camera development system, which integrates an SXGA CCD image sensor (15 frames per second at full resolution, 34 fps at 640x480) and an AMD Geode GX533 400 MHz processor running the MontaVista Linux operating system [63]. The platform is designed to provide OEMs, systems integrators and vision tool manufacturers with a rugged, robust component that combines the imager, intelligence and interface in a single plug-in module that is simple to set up and easy to integrate. The IQeye3 IP camera from IQinvision Inc., powered by a 250 MIPS PowerPC CPU, is a platform for smart IP network camera development [64].

    Some signal processing tool development companies provide multi-processor development systems that can serve as excellent development platforms for smart cameras. For example, Hunt Engineering [65] provides HERON, a development platform based on a Xilinx FPGA and a TI (Texas Instruments) DSP, with expansion capabilities to integrate video capture, IP cores, and additional DSPs and/or FPGAs for creating scalable smart camera architectures. Lyrtech provides similar development systems in its SignalMaster series of products [66]. These systems generally provide flexible communication ports and drivers.

    5.2 Key Issues or Challenges

    System Design - The proprietary nature of smart cameras can limit the choice of hardware such as imagers, I/O, lighting, lenses and communication formats. This may lead to a lack of the expandability and flexibility offered by PC-based systems. In addition, smart cameras do not have as many software applications and libraries as already exist for PC/frame grabber-based systems. In terms of design methodology, easy integration of intellectual property (IP) into the design tools and flow can help foster product differentiation. Other important system-level issues include operating systems and development tools for smart cameras.

    CMOS Image Sensors - Dynamic range is still one of the key aspects in which CMOS image sensors lag behind CCDs. Improvement in this area could lead to more low-cost smart cameras using CMOS image sensors for machine vision and surveillance applications.
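    Dynamic range is commonly expressed as the ratio between the largest signal a pixel can record without saturating and its noise floor, converted to decibels as DR = 20 log10(S_max / N_floor). A one-line sketch follows, with a hypothetical function name and illustrative numbers (a 20 000-electron saturation level and a 20-electron noise floor) rather than data for any real sensor.

```python
import math


def dynamic_range_db(max_signal: float, noise_floor: float) -> float:
    """Dynamic range in decibels: largest non-saturating signal over the noise
    floor, both in the same units (e.g. electrons)."""
    return 20.0 * math.log10(max_signal / noise_floor)


if __name__ == "__main__":
    print(f"{dynamic_range_db(20_000, 20):.1f} dB")  # hypothetical sensor values
```

    A ratio of 1000:1 corresponds to 60 dB; every further factor of 10 in the ratio adds 20 dB.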

    Algorithm Development - Many intelligent pattern recognition algorithms work well in laboratory conditions but fail when deployed in real-world conditions (occlusion, changes in lighting, unfavourable weather) and in embedded system environments (scarce resources, low power, low cost). Robustness and low complexity are among the key issues facing researchers developing algorithms for smart cameras in surveillance, ITS and automotive applications.
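    As one illustration of the low-complexity end of this trade-off, a running-average background model (a widely used textbook technique, not one proposed in this paper) adapts to gradual illumination changes with one multiply-add per pixel. The class name, the learning rate alpha and the threshold below are hypothetical choices; the sketch works on flat lists of 8-bit gray values and does not handle occlusion, shadows or sudden lighting changes.

```python
from typing import List, Sequence


class RunningAverageBackground:
    """Per-pixel exponential running average used as a background model."""

    def __init__(self, first_frame: Sequence[int], alpha: float = 0.05, threshold: float = 25.0):
        self.background = [float(p) for p in first_frame]
        self.alpha = alpha          # learning rate: higher adapts faster to lighting drift
        self.threshold = threshold  # gray-level difference treated as foreground

    def apply(self, frame: Sequence[int]) -> List[bool]:
        """Return a foreground mask and update the background where no foreground is seen."""
        mask = []
        for i, pixel in enumerate(frame):
            diff = pixel - self.background[i]
            is_foreground = abs(diff) > self.threshold
            mask.append(is_foreground)
            if not is_foreground:
                # Selective update: track slow illumination drift without
                # absorbing detected objects into the background.
                self.background[i] += self.alpha * diff
        return mask


if __name__ == "__main__":
    model = RunningAverageBackground([100, 100, 100])
    print(model.apply([100, 200, 105]))  # only the middle pixel differs strongly
```

    With alpha = 0.05, a brightness ramp of one gray level per frame makes the model lag by at most 1/alpha = 20 gray levels, which stays below the threshold of 25, while an object that differs from the background by 100 gray levels is flagged immediately.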

    Performance Evaluation - Performance evaluation is a major challenge for smart surveillance systems. Evaluating the performance of video analysis systems requires large amounts of annotated data, and annotation is typically an expensive and tedious process that can itself introduce significant errors. All of these issues make performance evaluation a significant challenge [16].
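    Once annotations exist, a common way (not specific to [16]) to score a detector against them is to match detections to annotated boxes by intersection over union (IoU) and report precision and recall. The function names, the greedy one-to-one matching and the 0.5 IoU threshold below are illustrative assumptions; published benchmark protocols differ in these details.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(detections: Sequence[Box], ground_truth: Sequence[Box],
                     min_iou: float = 0.5) -> Tuple[float, float]:
    """Greedily match each detection to the unmatched annotated box with the highest IoU."""
    unmatched: List[int] = list(range(len(ground_truth)))
    true_pos = 0
    for det in detections:
        best, best_iou = None, min_iou
        for j in unmatched:
            overlap = iou(det, ground_truth[j])
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            unmatched.remove(best)
            true_pos += 1
    # Convention used here: report 0.0 when a denominator is zero.
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

    For instance, one detection overlapping an annotated person with an IoU of 0.81 and one spurious detection elsewhere, scored against two annotated people, give a precision and a recall of 0.5 each.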

    Standards Development - There is a need to develop smart camera standards. The European Machine Vision Association (EMVA [67]) has recently launched an initiative (the EMVA 1288 Standard) to define a unified method to measure, compute and present specification parameters for smart cameras and image sensors used in machine vision applications. More needs to be done in this respect.

    Single-Chip Smart Cameras - Single-chip smart cameras are an attractive concept, but their manufacturing cost can be high because the feature size used for digital processors and memory often differs from that used for image sensors, which may require relatively large pixels to collect light efficiently. Therefore, for applications where physical space and power consumption are not extremely restricted, it probably still makes sense to design the smart camera as a multi-chip system with a separate image sensor chip. Separating the sensor and the processor also makes sense at the architectural level, given the well-understood and simple interface between the sensor and the computation engine [54].


    5.3 Future Directions

    The demand for smart cameras will steadily increase in traditional industries such as surveillance and industrial machine vision, and may also come from new industries and market segments such as healthcare, entertainment and education. Research interest and economic and social factors will drive continuous technological and product development. Based on the discussions above, we can discern the following future directions for smart camera systems and technologies.

    - At the system design level, continuous effort will be made to develop research strategies and design methodologies for smart cameras as embedded systems, as well as libraries and tools that facilitate algorithm implementation on DSPs and FPGAs. Research will be undertaken on general and optimal architectures for smart cameras and on real-time operating systems for smart cameras, and the issue of too many digital interfaces (Firewire, CameraLink, etc.) for cameras will be addressed.

    - At the algorithm development level, in order to improve the performance and robustness of existing techniques, research should address issues such as occlusion handling, fusion of 2D and 3D tracking, anomaly detection and behavior prediction, the combination of video surveillance with biometric personal identification, and multi-sensory data fusion [26].

    - Multi-modal, multi-sensory augmented video surveillance systems have the potential to provide improved performance and robustness. Such systems should be adaptable enough to adjust automatically to changes in the environment, such as lighting, scene geometry or scene activity.

    - Work on distributed (or networked) IVSS should not be limited to the territory of computer vision laboratories, but should involve telecommunication companies and network service providers and take system engineering issues into account.

    - In the machine vision arena, smart cameras will offer more and more functionality. The trend of distributing machine vision across the entire production line, at points before value is added, will continue. Neural network techniques seem to have become a key paradigm in machine vision, used either to segment an image correctly under a wide variety of operating conditions or to classify the detected object. Stereo and 3D vision applications are also increasingly widespread. Another trend is to use machine vision in the non-visible spectrum.

    - New product developments will introduce smart camera-based digital imaging systems into existing consumer and industrial products, increasing their value and creating new products.

    - Standards development. One area that may need standardization is the metadata format that facilitates integration and communication between different cameras, sensors and modules in a distributed and augmented video surveillance system. New communication protocols may also be needed for better communication between different smart camera products.
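    To make the metadata idea concrete, the sketch below shows what a shared event record might look like when one smart camera publishes a detection to other cameras or modules as JSON. The schema (class name, field names, units and the schema_version field) is entirely hypothetical and does not correspond to any existing standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class DetectionEvent:
    """Hypothetical metadata record a smart camera could publish for each detected event."""
    camera_id: str
    timestamp_ms: int                  # milliseconds since the Unix epoch
    event_type: str                    # e.g. "intrusion", "abandoned_object"
    bbox: Tuple[int, int, int, int]    # (x1, y1, x2, y2) in image pixels
    confidence: float                  # detector score in [0, 1]
    schema_version: int = 1

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DetectionEvent":
        fields = json.loads(text)
        fields["bbox"] = tuple(fields["bbox"])  # JSON arrays decode as lists
        return cls(**fields)


if __name__ == "__main__":
    event = DetectionEvent("cam-01", 1_700_000_000_000, "intrusion", (10, 20, 110, 220), 0.87)
    print(event.to_json())
```

    Carrying an explicit version number in every record is one simple way to let cameras from different vendors evolve the format without breaking older receivers.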

    Acknowledgements

    The authors would like to thank Dr. Xing Zhang from ST Microelectronics and Dr. Julien Epps of National ICT Australia for their many valuable comments and corrections of parts of this paper.

    References

    [1] S. Shigematsu, H. Morimura: A Single-Chip Fingerprint Sensor and Identifier. IEEE Journal of Solid-State Circuits, Vol. 34, No. 12, December 1999, pp. 1852-1859.

    [2] M. LaPedus: CMOS Image Sensors Market Consolidates. http://www.eet.com/news/semi/showArticle.jhtml?articleID=177102846.

    [3] Intel Open Source Computer Vision Library. http://www.intel.com/technology/computing/opencv/index.htm.

    [4] Chicago Pairing Surveillance Cameras with Gunshot Recognition Systems. http://www.securityinfowatch.com/online/CCTV--and--Surveillance/Chicago-Pairing-Surveillance-Cameras-with-Gunshot-Recognition-Systems/4628SIW427.

    [5] Marketresearch.com: Global digital video surveillance markets. http://www.marketresearch.com/product/display.asp?productid=1032291&xs=r.

    [6] Frost & Sullivan: Video Surveillance Software Emerges as Key Weapon in Fight Against Terrorism. http://www.prnewswire.co.uk/cgi/news/release?id=151696.

    [7] Smart Products Can See The Future. http://www.imsresearch.com/members/pr.asp?X=103.

    [8] Smart Cameras Drive Machine Vision Growth. Advanced Imaging Journal, October 2005, p. 8.

    [9] Machine Vision Online: JAI PULNiX Forms New Smart Camera Business Unit. http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=1990.


    [10] W. Hardin: Smart Cameras: The Last Step in Machine Vision Evolution? http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=389.

    [11] H. Broers, R. Kleihorst, M. Reuvers and B. Krose: Face Detection and Recognition On A Smart Camera. Proceedings of ACIVS 2004, Brussels, Belgium, Aug. 31 - Sept. 3, 2004.

    [12] Pixim Digital Pixel System Technology Backgrounder. http://www.pixim.com/html/tech_about.htm.

    [13] L. Albani, P. Chiesa, D. Covi, G. Pedegani, A. Sartori, M. Vatteroni: VISoc: A Smart Camera SoC. Proceedings of the 28th European Solid-State Circuits Conference, Firenze, Italy, September 2002, pp. 367-370.

    [14] T.W.J. Moorhead, T.D. Binnie: Smart CMOS Camera For Machine Vision Applications. Image Processing and Its Applications, Conference Publication No. 465, IEE, 1999, pp. 865-869.

    [15] M.S. Lee, R. Kleihorst, A. Abbo, E. Cohen-Solal: Real-time Skin-tone Detection with A Single-chip Digital Camera. Proceedings of the 2001 International Conference on Image Processing, Vol. 3, 7-10 Oct. 2001, pp. 306-309.

    [16] A. Hampapur, L. Brown, J. Connel, S. Pankanti, A. Senior, Y. Tian: Smart Surveillance: Applications, Technologies and Implications. 4th IEEE Pacific-Rim Conference On Multimedia, Singapore, 15-18 December 2003.

    [17] SmartCam - Design and Implementation of an Embedded Smart Camera. http://www.iti.tu-graz.ac.at/de/research/smartcam/smartcam.html.

    [18] W. Wolf, B. Ozer, T. Lu: Smart Cameras As Embedded Systems. IEEE Computer, 35(9):48-53, Sep. 2002.

    [19] The First IEEE Workshop on Embedded Computer Vision. http://www.scr.siemens.com/ecv05/.

    [20] SmartCam: Devices for Embedded Intelligent Cameras. http://www.stw.nl/projecten/E/ees5411.html.

    [21] Advanced Imaging. http://www.advancedimagingpro.com/.

    [22] Machine Vision Resources. http://www.eeng.dcu.ie/~whelanp/resources/r_references.html.

    [23] Machine Vision Online. http://www.machinevisiononline.org/.

    [24] USPTO. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=(smart+AND+camera).TTL.&OS=ttl/(smart+and+camera)&RS=TTL/(smart+AND+camera).

    [25] K. R. Castleman: Digital Image Processing. 1st edition, Prentice Hall, New Jersey, 1996.

    [26] W. Hu, T. Tan, L. Wang and S. Maybank: A Survey on Visual Surveillance of Object Motion and Behaviors. IEEE Transactions on Systems, Man and Cybernetics, Vol. 34, No. 3, August 2004, pp. 334-352.

    [27] M. Valera and S.A. Velastin: Intelligent distributed surveillance systems: A review. IEE Proc.-Vis. Image Signal Process., Vol. 152, No. 2, April 2005, pp. 192-204.

    [28] Y. Wu, T.S. Huang: Vision-Based Gesture Recognition: A Review. Lecture Notes in Computer Science, Vol. 1739, 1999, pp. 103-114.


    [29] I. Haritaoglu, D. Harwood, and L. S. Davis: Real-time surveillance of people and their activities. IEEE Trans. Pattern Anal. Machine Intell., Vol. 22, pp. 809-830, Aug. 2000.

    [30] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Machine Intell., Vol. 19, pp. 780-785, July 1997.

    [31] T. Olson and F. Brill: Moving object detection and event recognition algorithms for s