
Distributed Smart Cameras for Hard Real-Time Control

Herwig Guggi
Institute of Networked and Embedded Systems
Pervasive Computing Group
Klagenfurt University
[email protected]

Bernhard Rinner
Institute of Networked and Embedded Systems
Pervasive Computing Group
Klagenfurt University
[email protected]

ABSTRACT
This paper describes an approach for integrating cognition, real-time communication and control. Our test setup is a crane system whose area of operation is observed by distributed smart cameras analyzing the crane's environment. The position and motion information of all detected objects is transferred to the crane controller via a real-time network. With this information the controller can optimize the trajectory of the load. We present preliminary results of a single-camera setting and give an outlook on the multi-camera data fusion process.

Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems

Keywords
smart camera, real-time control

1. INTRODUCTION AND MOTIVATION
Control systems can be found in many real-world applications such as industrial automation, automotive and transportation. The controller determines the behavior of the controlled object based on the observed internal state captured by some sensors. In this work we use multiple cameras to observe the environment of the controlled object and provide information about the environment's current state to the controller, which is then able to optimize its behavior. Information about the environment, such as the position and velocity of obstacles, can be very helpful for control optimization; however, to be useful this information has to be delivered within guaranteed time bounds, i.e., in hard real-time. Thus, our ultimate objective is to integrate real-time image analysis, adaptive motion control, and synchronous communication between the imaging and control subsystems.

In this work in progress, we deploy distributed smart cameras [6] to monitor the environment of a model crane.


The smart cameras are connected to the crane's control system over a time-triggered network and deliver the position and velocity of all obstacles at the predefined frame rate of 20 fps. Knowing the position of the obstacles, the controller can adapt the trajectory of the payload appropriately and increase the safety and efficiency of the crane system.

Our contribution comprises the vision system. This includes the image analysis, the scene reconstruction, the object motion prediction and the communication with the control system.

This paper reports on the current state of this research. The remainder is organized as follows. Section 2 sketches related work. Section 3 briefly describes the system overview and the high-level data flow. In Section 4 we present preliminary results for a single-camera setting, and in Section 5 we discuss different methods for data fusion in the distributed smart camera network. Section 6 concludes the paper.

2. RELATED WORK
Collision avoidance [2, 9] is a very active field in robotics. Examples include work on ground-based robots [7, 4] and, more recently, on flying robots [1, 10]. However, we only describe closely related work on using multiple stationary cameras for collision avoidance.

Henrich et al. [3] presented a closely related system in which multiple cameras are used to guide a robot in a region with moving obstacles. Obstacle detection is performed with four calibrated cameras connected to a single computer, where the position and structure of the obstacles are calculated. After planning, each configuration (movement step of the robot) is tested for collision with obstacles. If a collision is detected, the trajectory is re-planned. The paper provides no information about the update rate of the scene or whether images are captured while the robot is moving.

Work by Ladikos et al. [5] used a similar setup in which four cameras are connected to a PC. This PC calculates the 3D model with the help of a GPU (graphics processing unit). Potential collisions are checked based on bounding boxes around the detected obstacles. To prevent collisions, these bounding boxes are extended by a safety distance. The cameras operate at a resolution of 1024x768 and a frame rate of 30 fps.

3. SYSTEM OVERVIEW
Figure 1 depicts the system overview as well as the high-level data flow. The main components include the multi-camera subsystem, the control subsystem (including the crane with its sensors and actuators) and the real-time network connecting these components.


Figure 1: System architecture and high-level data flow

The distributed smart cameras analyse the crane environment. In particular, each camera performs object detection individually, and this local information is fused within the camera network to determine the objects' positions and velocities in 3D space. This fused information, i.e., object size, position and velocity, is transferred to the control subsystem.

The model crane has a height of approx. 1.3 m, and its jib has a length of approx. 1 m. This results in an operation region of the crane in the form of a cylinder with a radius and height of about 1 m. The ground plane of the observation area is 2 m × 2.25 m, which we consider as the environment space. The crane is operated indoors under stable ambient light. Processing time and position accuracy are the two major requirements for the image analysis on the smart cameras. The control subsystem requires a frame rate of 20 fps, which in turn limits the maximum time for processing and network transfer to 50 ms. The objects' positions should be determined with an accuracy of 1 cm.
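To make these workspace figures concrete, the following sketch (our illustration, not part of the paper's implementation) encodes the dimensions above and checks whether a detected position lies inside the crane's roughly cylindrical operation region; placing the crane base at the centre of the ground plane is an assumption.

```python
import math

# Workspace figures from the text (metres); constants are illustrative only.
GROUND_PLANE = (2.0, 2.25)     # observed environment space (x, y)
CRANE_RADIUS = 1.0             # approximate reach of the jib
CRANE_HEIGHT = 1.0             # approximate height of the operation cylinder
FRAME_PERIOD_MS = 1000.0 / 20  # 20 fps -> 50 ms budget for processing and transfer

def in_operation_region(x, y, z, crane_base=(1.0, 1.125)):
    """Return True if a point (in metres) lies inside the cylinder swept by the
    jib; the crane base position is an assumption for this sketch."""
    dx, dy = x - crane_base[0], y - crane_base[1]
    return math.hypot(dx, dy) <= CRANE_RADIUS and 0.0 <= z <= CRANE_HEIGHT

print(in_operation_region(1.4, 1.3, 0.2))  # True: inside the jib's reach
print(in_operation_region(0.1, 0.1, 0.2))  # False: outside the operation cylinder
```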

We use custom-built smart cameras from SLR Engineering, which are equipped with an Intel Atom processor running at 1.6 GHz and a 100 MBit Ethernet interface (see Figure 2). The cameras have a CCD image sensor with a native resolution of 1360 × 1024 pixels. The control subsystem is based on a real-time workstation equipped with an Intel Core 2 Duo CPU at 2.13 GHz and 1024 MB RAM. The smart cameras and the control subsystem are connected via TTEthernet [8], which extends classical Ethernet with services based on time-triggered protocols to meet time-critical, deterministic or safety-critical requirements.

3.1 Image Analysis
The image analysis can basically be split into two processes. Functions which have to be performed on the entire image, e.g., image capturing, background subtraction and thresholding, are assigned to the first process. These functions are also referred to as "object-independent processing" (cf. Section 4).

The second part of the image analysis starts with the list of segments that have been detected in a single image.

Figure 2: The smart camera from SLR Engineering used in our setup

The following steps depend on the observed scene and include functions such as the transformation to the real-world coordinate system, the fusion with corresponding segments from other cameras, object reconstruction and speed determination. All these functions have to be executed for each object individually and are also referred to as "object-dependent processing".
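As an illustration of this split, a minimal single-camera pipeline could look as follows. This is our sketch (assuming OpenCV 4), not the authors' implementation; the calibration function `pixel_to_world` is a hypothetical placeholder for the camera's ground-plane mapping.

```python
import cv2

# Object-independent stage: runs once per frame on the entire image.
bg_subtractor = cv2.createBackgroundSubtractorMOG2()  # illustrative background model

def object_independent(frame):
    """Background subtraction, thresholding and contour (segment) detection."""
    fg_mask = bg_subtractor.apply(frame)
    _, binary = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours

# Object-dependent stage: runs once per detected segment.
def object_dependent(contour, pixel_to_world, prev_position=None, dt=0.05):
    """Map one segment to world coordinates and estimate its speed.
    `pixel_to_world` is a hypothetical calibration mapping (e.g. a ground-plane
    homography); dt = 50 ms corresponds to the 20 fps frame rate."""
    x, y, w, h = cv2.boundingRect(contour)
    foot_px = (x + w / 2.0, y + h)           # central point on the ground plane
    position = pixel_to_world(foot_px)
    velocity = None
    if prev_position is not None:
        velocity = tuple((p - q) / dt for p, q in zip(position, prev_position))
    return {"position": position, "bbox": (x, y, w, h), "velocity": velocity}
```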

4. SINGLE CAMERA RESULTS
We conducted first experiments with a single-camera setting in which the camera is mounted at a height of 1.5 meters and at a distance of 2 meters from the crane. In this setting we assume that all objects are placed on the ground plane. All processing is performed on the smart camera. For every detected object, the position of its central point on the ground plane and the size of its bounding box are transferred to the control subsystem for every frame.

We evaluated the performance of the single-camera setting with respect to the achieved processing speed, position accuracy and communication performance.

4.1 Processing Speed
In our image processing pipeline the total processing time varies with the resolution and the number of detected objects in the field of view. Some functions of the image pipeline are independent of the number of objects while others are not. Object-independent execution includes background subtraction, thresholding and contour detection. The function for transferring the scene description is also independent of the number of objects because all position data can be packed into a single Ethernet frame.

Object-dependent execution time includes mapping to real-world coordinates, position determination and motion detection, to mention only some functions. Table 1 lists the measured execution times for different resolutions (first column) and numbers of detected objects (second column). The following two columns give the accumulated execution times for the object-independent and the object-dependent functions (median of 120 measurements). The final column presents the computed execution time for 50 objects at the corresponding resolution. This time is calculated with Equation (1), where $t_{oi}$ and $t_{od}$ represent the object-independent and object-dependent times, $\sigma_{oi}$ and $\sigma_{od}$ are the corresponding standard deviations as listed in Table 2, and $n$ is the number of objects (second column of Table 1).

$$ t = t_{oi} + 2\sigma_{oi} + \frac{t_{od} + 2\sigma_{od}}{n} \cdot 50 \qquad (1) $$
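As a quick sanity check of Equation (1) (our own arithmetic, not part of the paper), the following snippet reproduces the values discussed below from the 1024 × 768 measurements in Tables 1 and 2:

```python
# Values measured for 1024 x 768 (Tables 1 and 2); 17 objects were in the test scene.
t_oi, sigma_oi = 19.77, 3.35  # object-independent time and its std. dev. (ms)
t_od, sigma_od = 4.03, 0.06   # object-dependent time and its std. dev. (ms)
n = 17                        # number of detected objects during the measurement

def time_for_50_objects(k):
    """Equation (1), generalised to k standard deviations."""
    return t_oi + k * sigma_oi + (t_od + k * sigma_od) / n * 50

print(time_for_50_objects(2))  # ~38.7 ms, the value listed in Table 1 (97.7% of scenes)
print(time_for_50_objects(3))  # ~42.2 ms, the 3-sigma bound quoted below (99.9% of scenes)
```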



Table 1: Processing times for different resolutions (median of 120 runs) and calculated time for 50 objects

Resolution | Number of objects | Object-independent processing time | Object-dependent processing time | Time for 50 objects (calculated)
640x480    | 12 | 7.86 ms  | 2.32 ms | 19.59 ms
720x576    | 13 | 10.31 ms | 2.67 ms | 23.54 ms
800x600    | 15 | 11.97 ms | 3.07 ms | 30.67 ms
1024x768   | 17 | 19.77 ms | 4.03 ms | 38.67 ms
1360x1024  | 19 | 35.83 ms | 4.55 ms | 65.3 ms

Table 2: Standard deviation of the processing times (in ms) for different resolutions

Resolution | σ object-independent | σ object-dependent
640x480    | 0.61 | 0.1
720x576    | 1.34 | 0.04
800x600    | 3.97 | 0.08
1024x768   | 3.35 | 0.06
1360x1024  | 8.51 | 0.09

The implementation of the controller is currently able to process the sensory data at a frame rate of 20 fps. Table 1 shows that 97.7% of the scenes (mean plus two standard deviations) with a resolution of 1024 × 768 can be processed within 38.67 ms. If Equation (1) is modified to use 3σ, 99.9% of the scenes at a resolution of 1024 × 768 can be processed within 42.2 ms. Thus it is very unlikely that the processing cannot keep pace with the frame rate.¹

4.2 Position Accuracy
The position accuracy depends on the geometric setting as well as on the resolution of the image. Table 3 lists the minimum and maximum accuracy for each resolution. The accuracy of a pixel is calculated in two steps. In the first step, the real-world coordinates of the selected pixel and of all of its neighbors are calculated. In the second step, the absolute distance between the real-world coordinates of each neighboring pixel and the pixel in the center is calculated. The average of these values is the final accuracy as listed in Table 3.
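A sketch of this two-step procedure is given below (our illustration; `pixel_to_world` again stands for a camera's calibrated ground-plane mapping, which is not specified in the paper):

```python
import numpy as np

def pixel_accuracy(u, v, pixel_to_world):
    """Average real-world distance (e.g. in mm) between pixel (u, v) and its
    eight neighbours; `pixel_to_world` is the camera's calibrated mapping."""
    centre = np.asarray(pixel_to_world((u, v)), dtype=float)
    distances = [
        np.linalg.norm(np.asarray(pixel_to_world((u + du, v + dv)), dtype=float) - centre)
        for du in (-1, 0, 1) for dv in (-1, 0, 1) if (du, dv) != (0, 0)
    ]
    return float(np.mean(distances))
```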

To decide on a specific setting, the combination of processing speed and position accuracy has to be considered. For example, based on our measurements we can achieve a total frame rate of 20 fps with a position accuracy of 5 mm per pixel along each axis.

4.3 Protocol Performance
The communication protocol is designed to provide the abstracted object information to the control system. The chosen structure consists of a small header (4 bytes) and a simple representation of each object. The objects are represented either as cylinders or as rectangular boxes. Each object is defined by its form, its color, its position, speed and orientation, all in 3D space. All in all, 28 bytes are sufficient to represent each object. Thus, descriptions of more than 50 objects can be packed into a single Ethernet frame of 1500 bytes.
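The exact byte layout is not specified beyond the sizes quoted above; the following sketch therefore only illustrates, with an assumed field layout, how a 4-byte header plus 28-byte object records fit into a single 1500-byte Ethernet payload.

```python
import struct

# Assumed field layout: the paper only fixes the sizes (4-byte header, 28 bytes/object).
HEADER_FMT = "<HH"       # frame counter, object count            -> 4 bytes
OBJECT_FMT = "<BB3f3fH"  # form, colour, position (3x float32),
                         # speed (3x float32), orientation         -> 28 bytes
assert struct.calcsize(HEADER_FMT) == 4 and struct.calcsize(OBJECT_FMT) == 28

def pack_scene(frame_no, objects):
    """Pack a scene description; `objects` holds
    (form, colour, (x, y, z), (vx, vy, vz), orientation) tuples."""
    payload = struct.pack(HEADER_FMT, frame_no, len(objects))
    for form, colour, pos, vel, ori in objects:
        payload += struct.pack(OBJECT_FMT, form, colour, *pos, *vel, ori)
    return payload

# 4 + 53 * 28 = 1488 bytes: more than 50 objects fit into one 1500-byte Ethernet payload.
print(len(pack_scene(1, [(0, 1, (0.5, 0.5, 0.0), (0.0, 0.0, 0.0), 0)] * 53)))
```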

¹ In the rare case that the deadline is missed, the objects' position information from the previous sampling time is transferred to the controller, which can then decide whether to discard this data.

Table 3: Position accuracy for different resolutions, in mm per pixel per axis

Resolution | Minimum accuracy | Maximum accuracy
640x480    | 7.22 | 2.29
720x576    | 6.45 | 1.96
800x600    | 6.14 | 1.83
1024x768   | 4.72 | 1.43
1360x1024  | 3.59 | 1.00


5. DISTRIBUTED CAMERA PROCESSING
In our work in progress, we use multiple cameras for object detection and position determination. On the one hand, multiple cameras with overlapping fields of view allow us to drop some of our initial system restrictions, e.g., the placement of objects on the ground plane or dedicated object shapes. On the other hand, the complexity of the system increases. All components, i.e., the smart cameras, have to be configured to send their data in the correct format at the correct time to the correct destination.

In a multi-camera system, each camera captures the raw frames and performs the preprocessing. This process includes all "object-independent" steps (cf. Section 4). A major challenge is the fusion of the results from the individual cameras. One option is to perform the fusion on an additional hardware component; this approach is only necessary if it turns out that the camera platforms do not have enough computing resources to perform the fusion process.

For the current use case we use three smart cameras. There are a number of possible methods for fusing the individual results from each camera. The simplest one is that all cameras perform the preprocessing; after this step, two cameras forward their abstracted data to the third one, which then fuses all data and forwards the result to the control system. Currently no timing measurements for the fusion process are available, so the extra delay introduced by this structure cannot be estimated yet.
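Since the fusion step itself is still open, the following is purely a placeholder sketch of how the designated fusion camera might merge the abstracted object lists; matching detections by ground-plane proximity is our assumption, not the paper's method.

```python
import numpy as np

def fuse_detections(camera_reports, match_radius=0.05):
    """Greedy merge of abstracted object lists on the fusion camera: detections
    whose ground-plane positions lie within `match_radius` metres of each other
    are treated as the same object and their positions are averaged."""
    fused = []
    for report in camera_reports:          # local list plus the two forwarded lists
        for det in report:
            pos = np.asarray(det["position"], dtype=float)
            for obj in fused:
                if np.linalg.norm(obj["position"] - pos) <= match_radius:
                    obj["position"] = (obj["position"] * obj["count"] + pos) / (obj["count"] + 1)
                    obj["count"] += 1
                    break
            else:
                fused.append({"position": pos, "count": 1})
    return fused
```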

Pipelining is another possible fusion approach. This method starts with each camera capturing the current frame and performing the object-independent processing. After that, the first camera sends its results to the second one, which fuses the information from its own frame with the data from the first camera. The result is forwarded to the last camera, which is then responsible for the final fusion and for forwarding the information to the control system.


This fusion mechanism has the benefit that the delay between frame capturing and information transmission to the control system is relatively small. A drawback is that the delay between two frame captures increases with the number of cameras included in the system.

Figure 3 shows the idea of an alternative pipelined fusion mechanism. Compared to the previous approach, this method has an increased delay from image capturing to information transmission. The benefit of this system is that the frame rate is increased.

Figure 3: Pipeline fusion approach with increased frame rate

During the project it might turn out that it makes sense to keep the fusion process on a single device. In that case a variation of loop scheduling can be applied (see Figure 4): the first scene is evaluated by the first camera, the second scene by the second camera and the third scene by the third camera. With this structure, each camera can focus its fusion process on a single scene, and all data of a scene are evaluated on the same device at the same time.
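A minimal sketch of this loop assignment follows (our reading of Figure 4, assuming three cameras and frame numbering starting at zero):

```python
NUM_CAMERAS = 3

def fusion_camera_for_frame(frame_no):
    """Loop scheduling: frame k is fused by camera k mod 3, so each camera
    handles every third scene in its entirety."""
    return frame_no % NUM_CAMERAS

print([fusion_camera_for_frame(k) for k in range(6)])  # [0, 1, 2, 0, 1, 2]
```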

Figure 4: Fusion in a loop

6. CONCLUSION
Smart cameras with integrated image analysis are fast enough to detect objects and abstract the corresponding information. The information about the detected objects is transferred to the control unit. The control system uses this object information, which would otherwise not be available, to plan the trajectory of the crane. The detailed, real-time view of the objects in the operating range allows planning the trajectory with respect to collision avoidance and optimisation criteria such as maximum speed or minimum energy consumption.

Acknowledgment
This work was supported by the Austrian Science Promotion Fund (FFG) under grant 819482.

7. REFERENCES
[1] J. Byrne and C. J. Taylor. Expansion segmentation for visual collision detection and estimation. In ICRA '09: Proceedings of the 2009 IEEE International Conference on Robotics and Automation, pages 1938–1945, Piscataway, NJ, USA, 2009. IEEE Press.
[2] S. J. Guy, J. Chhugani, C. Kim, N. Satish, M. Lin, D. Manocha, and P. Dubey. ClearPath: highly parallel collision avoidance for multi-agent simulation. In SCA '09: Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 177–187, New York, NY, USA, 2009. ACM.
[3] D. Henrich and T. Gecks. Multi-camera collision detection between known and unknown objects. In 2nd ACM/IEEE International Conference on Distributed Smart Cameras – ICDSC 2008, volume 2, pages 1–10, Stanford, USA, Sept. 7–11, 2008. ACM/IEEE.
[4] K.-D. Kuhnert and M. Stommel. Fusion of Stereo-Camera and PMD-Camera Data for Real-Time Suited Precise 3D Environment Reconstruction. In IROS, pages 4780–4785, 2006.
[5] A. Ladikos, S. Benhimane, and N. Navab. Real-Time 3D Reconstruction for Collision Avoidance in Interventional Environments. In MICCAI '08: Proceedings of the 11th International Conference on Medical Image Computing and Computer-Assisted Intervention, Part II, pages 526–534, Berlin, Heidelberg, 2008. Springer-Verlag.
[6] B. Rinner and W. Wolf. A Bright Future for Distributed Smart Cameras. Proceedings of the IEEE, 96(10):1562–1564, October 2008.
[7] S. Vacek, T. Schamm, J. Schroder, J. M. Zollner, and R. Dillmann. Collision avoidance for cognitive automobiles using a 3D PMD camera. In Intelligent Autonomous Vehicles 2007, Toulouse, France, 2007.
[8] www.tttech.com.
[9] S.-E. Yoon, B. Salomon, M. Lin, and D. Manocha. Fast collision detection between massive models using dynamic simplification. In SGP '04: Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 136–146, New York, NY, USA, 2004. ACM.
[10] J.-C. Zufferey, A. Beyeler, and D. Floreano. Autonomous flight at low altitude with vision-based collision avoidance and GPS-based path following. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2010.