Scene Detection for Flexible Production Robot
TRANSCRIPT
7/27/2019 Scene Detection for Flexible Production Robot
1/81
Scene detection for flexible production robot
Scene detektion for fleksibel produktionsrobot
This report was prepared by:
Mikkel Viager (s072103)
Advisors:
Jens Christian Andersen, Professor at DTU, Department of Electrical Engineering
Ole Ravn, Head of Group, Department of Electrical Engineering
Anders B. Beck, Project Leader, Danish Technological Institute
DTU Electrical Engineering, Automation and Control
Technical University of Denmark
Elektrovej, Building 326
2800 Kgs. Lyngby
Denmark
Tel: +45 4525 3576
Project period: February 2013 - June 2013
ECTS: 30
Education: MSc
Field: Electrical Engineering
Class: Public
Remarks: This report is submitted as partial fulfilment of the requirements for
graduation in the above education at the Technical University of Denmark.
Copyrights: Mikkel Viager, 2013
Abstract
This report documents the development, integration and verification of a scene camera solution
for the Robot Co-Worker prototype at the Danish Technological Institute.
An analysis of the requirements for the implementation is conducted, and it is determined that
no existing solution can fulfil them sufficiently. Based on two structured light sensors, a complete
solution is developed to match a set of requested functionalities.
The result is a ROS package capable of delivering detailed RGBD point cloud segmentations
for each object in the scene. Furthermore, bounding box geometries are estimated and made
available for use in motion planning and in an included service that returns the position of boxes
matching provided dimensions.
Calibration of the solution is done by automatic estimation of sensor poses in 6D, allowing
alignment of 3D data from the sensors into a single combined point cloud. A method for
calibrating the distance estimates of structured light sensors has also been created, as this was
shown to be necessary.
The implementation is verified through tests and through inclusion in demonstrations of industrial
assembly cases as an integrated part of the Robot Co-Worker, fulfilling the requested capabilities.
Preface
This project was carried out at the Technical University of Denmark (DTU) in collaboration
with the Danish Technological Institute (DTI). The project was completed in the timeframe from
February 2013 to June 2013, and covers a workload of 30 ECTS credits.
While completing this thesis I have worked with several people whom I would like to thank
for their support: my supervisors Jens Christian Andersen and Ole Ravn for great sparring,
my external supervisor Anders B. Beck and the entire Robot Co-Worker team at DTI for their
helpfulness and interest in my work, and my fellow student Jakob Mahler Hansen for his support
and constructive input.
The work of this thesis has been partially funded by the European Commission through the
FP7 project PRACE, grant no. 285380, which is greatly appreciated.
1.2 Problem formulation
The project goal is to develop, integrate and verify a scene camera solution for the Robot
Co-Worker, making the system capable of perceiving its work area in 3D. It is imperative that the
implemented solution is both accurate and robust enough to function reliably in connection with other modules, making it a useful feature for the Robot Co-Worker, as well as a viable choice for
inclusion in other projects with similar needs. The scene camera should be able to detect objects in
the scene and provide details on their size and position.
Desired functionality includes:
Simple and fast calibration procedure
Data precision with sufficient accuracy for initial object position estimation
Creation and publication of the scene as a dense 3D pointcloud of RGBD data
Segmentation of individual objects in the scene into separate point clouds
Generation of simple bounding geometries for obstacle avoidance in motion planning
Functionality to return the position of a box of specified size in the scene
The scene camera solution should be able to operate continuously, even during movement of
the robot arm inside the work area.
As part of the verification process it is desired to have the scene camera feature showcased as a
fully functional and essential part of the Robot Co-Worker during a scheduled public demonstra-
tion in early May 2013.
Furthermore, the project should evaluate the capabilities of the scene camera with respect to
options for further development of extended functionality for future tasks, while preparing the
software structure to allow such expansions.
In summary:
Development and implementation of a scene camera solution with easy calibration.
Integration with the Robot Co-Worker and inclusion in showcase demonstration.
Test and verification of the solution.
Evaluation of options for future use and expandability.
Finally, the solution should also be usable with the computer systems at Automation and
Control, DTU, which is to be verified through testing.
and inherit from parent launch files, allowing any of the inherited parameter values, or none, to
be overwritten.
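The parent/child launch-file pattern described here can be sketched as a pair of roslaunch files; the file names, argument names and default values below are illustrative, not taken from the thesis sources:

```xml
<!-- sensor_parent.launch: shared defaults for both sensor drivers -->
<launch>
  <arg name="camera" default="camera1"/>
  <include file="$(find openni_launch)/launch/openni.launch">
    <arg name="camera" value="$(arg camera)"/>
  </include>
</launch>

<!-- xtion2.launch: includes the parent and overrides only what differs -->
<launch>
  <include file="$(find my_scene_camera)/launch/sensor_parent.launch">
    <arg name="camera" value="camera2"/>
  </include>
</launch>
```

Each child file thus carries only the parameters that distinguish one sensor from the other, while everything shared lives in the single parent.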
2.4.1 Software versions
To match the ROS version for which the existing Robot Co-Worker modules were created, ROS
Fuerte is used. The operating system is the popular Linux distribution Ubuntu, version 12.04 LTS.
A full desktop installation of ROS with the addition of OpenNI sensor driver packages
openni_camera and openni_launch allows direct use of the Kinect sensors.
In order to use the Xtion sensors, the Sensor-Bin-Linux driver must be rolled back to version
5.1.0.41; otherwise the driver will not register the sensors as connected. This is done simply by
running the driver install script with administrator rights, overwriting the existing driver files. To
revert this process, the newest driver version can be used to overwrite the driver files the same
way.
2.5 Sensors
A key aspect of the solution is to place suitable sensors in optimal positions. The sensor type
has already been decided, so this choice is outside the scope of the project. Two candidate sensors
of this type qualify for use in this case, so a comparison is needed. This section also contains a
brief description of what to expect from structured light technology, as well as thoughts on sensor
positioning.
2.5.1 Structured light
Figure 2.4: A section of the structured
IR light projected by a Kinect. Less than
10% of the entire projection is shown.
Structured light sensors function by projecting a predefined pattern onto a surface and
analysing the deformations in the projection, from the viewpoint of a camera with a known
relative position [10]. The precision of the sensor depends on the resolution of the projected
pattern, as well as on the resolution of the camera. The camera must have a resolution high
enough to distinguish the individual parts of the projected pattern from the background, and the
resulting 3D precision depends on the density of the pattern features. An example of the pattern
projected by a Kinect sensor is shown in figure 2.4. Because of the recent mass production of the
Microsoft Kinect device, structured light sensor technology is currently very affordable and
available to everyone.
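The depth itself follows from standard triangulation between the projector and the IR camera. As a sketch (the symbols below are generic stereo quantities, not values from the thesis): a pattern feature observed with disparity d pixels, given focal length f in pixels and projector-camera baseline b, lies at depth z = f·b/d.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Triangulated depth (m) of a pattern feature seen with the given
    pixel disparity between the projector and IR camera viewpoints."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px
```

This relation also illustrates why precision degrades with distance: a fixed one-pixel disparity error corresponds to an ever larger depth interval as the depth grows.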
The research leading to the choice of structured light sensors for this project shows that
appropriate precision for the desired level of object detection is achievable [12], even though the
technology is not ideal for all surface and texture properties. As with other light-based sensors,
problems arise when surfaces are either very reflective or absorb the projected light instead of
reflecting it. This downside has been considered, but it is determined that a structured light
based scene camera will still be of good use in the case at hand.
Figure 2.8: Sensor FOV overlapping.
The entire table surface is covered by
both sensors, but tall objects close to the sides will only be covered by one.
Moving the cameras to S2 and S4 at the sides of
the cell allows the field of view for both cameras to be
aligned with the work area, as well as provide viewing
angles complementing each other well. Depending on
the anticipated height of objects in the scene, it may be
advantageous to extend the cell and move the cameras
further apart.
Figure 2.8 shows a conceptual graphic of the FOV seen from the front of the cell, making it
clear why the detectable object height is limited. Small objects are entirely covered by both
cameras, but as soon as a larger object is placed closer to the sides, it is not guaranteed to be
covered in its full height. This could lead
to critical detection errors in rare situations where the
robot arm is blocking the camera above a large object
and the full height of the object is not within the FOV of
the opposite camera. For the cases and demonstrations
used in this project, the viewing angles allow sufficient object height, as the detectable objects
are to be placed within predefined boundaries with enough distance from the work space edges.
These boundaries are decided by the limited reach of the robot with a gripping tool mounted. In
other applications it may be advantageous to move the cameras further apart, but this would
increase the distance to the workspace and thereby decrease sensor precision.
Figure 2.9 shows the FOV of both sensors in the chosen positions, confirming that the entire
work space is covered.
(a) 640x480 RGB image from Xtion1. (b) 640x480 RGB image from Xtion2.
Figure 2.9: Images showing the FOV for both sensors. It should be noted that this is the RGB
image, which has a slightly bigger FOV than what is covered in images for depth estimation.
2.7 Calibration
The precision of 3D data provided by the scene camera setup relies on accurate calibration of
the intrinsic camera properties as well as the poses of sensors in the scene.
In order to make the calibration task both fast and reliable, it is desired to automate and
streamline the process. This will also make it viable to do re-calibration more often, making the
entire solution more resistant to changes in the scene and setup.
2.7.1 Intrinsic camera calibration
Figure 2.10: Openni_camera calibration
screen. Intrinsic camera parameters
are calculated based on several pictures
with a checkerboard of given size, in
varying positions and orientations.
As with all vision-based systems it is important to
have a good calibration of the intrinsic camera parame-
ters. This calibration can be done with the ROS package
openni_camera, by following the official tutorial [8], mak-
ing it possible to calibrate the intrinsic camera parameters
as well as the impact of any lens distortion. Because of
the type of lens used in these sensors, there is not much distortion of the image, but the best result is achieved
by also calibrating for the slight distortion. A screenshot
from the calibration procedure is shown in figure 2.10.
The same calibration routine can be used to calibrate
the IR sensor used for the depth data, as this is simply
another CMOS sensor with an IR filter in front of it.
2.7.2 Distance calculation
Both the ASUS Xtion and Microsoft Kinect devices come with preset calibrations for internal
calculation of metric distances from raw data values. It is uncertain whether this calibration has
been done individually for each device, but it is clear that the quality of the converted metric
values varies considerably between devices.
In order to achieve the desired precision, it is necessary to recalibrate the parameters used for
conversion between raw sensor values and metric values. As no method is available for doing this,
a new functionality needs to be developed. This is described in section 3.2.
2.7.3 External position calibration
In order to have the two sensors contribute to a single well-defined collective point cloud, it is
necessary to align the two separate point clouds in correct relation to each other.
2.7.3.1 3D Feature registration
A common method of combining several datasets into a single global consistent model is the
technique called registration, which iteratively matches and aligns identified feature points from
separate data sets until a given alignment error threshold is reached. However, this method works
best for data sets containing several well defined 3D features for matching, which also requires
relatively high sensor precision. Even if these requirements are met by the structured light sensors,
the computational load caused by the registration algorithm would take up a lot of the available
processing power.
For calibration purposes it could be viable to use registration, as the computational require-
ments would not have very high impact if only used in a one-time offline calculation. However, a
simpler external calibration has proven to be sufficiently effective in this case, as described in the
following section.
2.7.3.2 Transforms
Figure 2.11: Sensor poses relative to
common reference, visualized in rviz.
Knowledge about relative positions al-
lows merging of sensor data.
Instead of analysing and realigning the point clouds
after acquisition, a pre-alignment of the relative sensor
poses has proven to be sufficiently accurate. By estimating the poses of the two sensors relative to a common
world frame, it is possible to have the two resulting point
clouds aligned and positioned with enough precision to
seamlessly make up a common larger and more dense
total point cloud. This can be done by keeping track of
the transforms between the coordinate systems of the
two sensors and the chosen common world reference
(camworld), as illustrated in figure 2.11.
One way to estimate the sensor poses would be to
manually measure them in camworld coordinates. Not only would this be a tedious and
time-consuming calibration method, but it would also be unlikely to meet the precision
requirements, as it is difficult
to do precise manual measurements of the sensor orientation angles.
A vision based method, using tag tracking, has previously been proven to provide very good
positional estimates with similar sensors [11]. However, as this solution is required to use ROS, it
is not very easy to re-use the mobotware2 plug-in from the previous project. Similar functionality
can be achieved through use of the ROS package ar-track-alvar, serving as a wrapper for Alvar
which is an open source Augmented Reality (AR) tag tracking library.
2.7.3.3 AR-track-alvar
Figure 2.12: An AR tag generated with ar-track-alvar, encoded with the id 19.
The ar-track-alvar [5] ROS package provides functionality
to detect, track and generate AR tags (see figure 2.12) in a
live image stream. Other similar packages have been evaluated, namely ar_pose [4] and
camera_pose_calibration [13], but neither was found to be as reliable as ar-track-alvar, nor to
provide the pose data in a similarly directly usable structure. Based on these findings, it is
decided that ar-track-alvar is the best currently existing package for relative camera pose
estimation.
Provided with the intrinsic camera calibration parameters and
the physical size of the AR tag, the package returns the 6D pose
and encoded tag number. The pose is then used to determine the transform between sensor and
AR tag, resulting in a calibration with sufficient precision, as shown in section 4.2.
2Mobotware is a plug-in based software platform developed and used at Automation and Control, DTU
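The transform bookkeeping above can be sketched as plain 4x4 rigid-transform algebra: given the tag's pose in the camworld frame and the tag's pose as reported in the sensor's camera frame, the sensor pose follows by composing one transform with the inverse of the other. The helper names below are illustrative, not from the thesis code:

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid 4x4 transform: rotation -> R^T, translation -> -R^T t."""
    r = [[t[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    p = [-sum(r[i][j] * t[j][3] for j in range(3)) for i in range(3)]
    return [r[0] + [p[0]], r[1] + [p[1]], r[2] + [p[2]], [0, 0, 0, 1]]

def sensor_pose_in_world(t_world_tag, t_cam_tag):
    """T_world_cam = T_world_tag * (T_cam_tag)^-1, i.e. the sensor pose in
    the common camworld frame from one tag detection."""
    return mat_mul(t_world_tag, invert_rigid(t_cam_tag))
```

In the actual system this transform is what the tf_publisher node keeps broadcasting, so the driver-frame point clouds land correctly in the shared frame.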
Chapter 3
Developed Elements
As concluded in the analysis, a combination of existing software modules alone does not
provide all of the required functionality to implement the solution. It has been necessary to
develop additions to fill these gaps, as explained in this chapter.
3.1 Software structure
In order to make further development possible in the future, it is imperative that the software
structure of the solution is prepared for this. Implementing all features in a single node would be
possible, but it would also make it very troublesome to maintain and expand the source code.
The implemented types of nodes can be divided into four main categories:
Sensor driver
Core functionality of the sensors, taking care of receiving data and configuration of the
camera parameters and settings.
Calibration
Used only when it is decided that a calibration is needed. Stores the calibration data in a file
which will then be used until a new calibration is done.
Data providers
Handles data streamed from the driver and read from the calibration file, preparing these for
use by correcting depth estimates and keeping the information available.
Scene camera features
Main processing, filtering and segmentation of the acquired and prepared data, in the form
of point clouds. Provides most public topics and services, and expandable with creation of
add-ons.
An illustration of the categories, containing their respective nodes, can be seen in figure 3.1.
This also provides an overview of the included elements, of which the nodes openni_camera and
ar-track-alvar are used directly from existing ROS packages.
There is no strict launch order for the nodes, but no output will be generated if some of the
lower level nodes are not running. The openni_camera sensor driver does not depend on input from
other nodes, and is the obvious choice as the first node to be started. A separate driver node is
needed for each sensor, which is handled with two launch files inheriting most of their contents
from the same parent. Each node then publishes topics with RGB images, point clouds and depth
images.
Figure 3.1: Overview chart of software structure in the solution. Division into separate nodes, rather than one single combined node, allows extraction and use of individual components in other ROS solutions, as well as easy maintenance and the option of utilizing distributed processing.
It should be noted that it is not possible to subscribe to point clouds from both sensors at
the same time, if they are configured to 30 Hz operation, unless they are connected to separate
computers or USB busses. Even then, it would also require a computer with more than average
processing power to receive and handle the data. This is explained further in section 4.4.
The distance precision of point clouds available directly from the driver varies a lot with each
sensor unit, and is not adjustable through calibration in the driver. Because of this uncertainty it
is deemed necessary to include a depth_image_corrector node for generation of more precise point
clouds. This is to be based on calibration of each individual sensor, as explained in detail in section
3.2. The corrected point clouds are then published for further analysis in the main scene camera node, as well as for calibration purposes.
With corrected depth values, it is now possible to run a calibration of the sensor positions.
The cam_tf_generator is started from another launch file, and is configured to launch and call
ar-track-alvar as necessary. The requested sensor unit id is provided in the launch file and passed
on through a call to ar-track-alvar internally. With every new set of RGBD information a detection
of the AR tag is attempted, and once enough data samples have been collected the average relative
pose transform is calculated and saved to a file. This makes it possible to save calibrations through
system restarts.
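The sample-collection step of cam_tf_generator can be sketched as follows; the class name and sample threshold are hypothetical, and rotation averaging (which needs care with angle representations) is omitted here, with only the translations averaged:

```python
class PoseCalibrator:
    """Collects tag-detection samples and reports the mean translation once
    enough have arrived. Rotation averaging is omitted in this sketch."""

    def __init__(self, needed=50):
        self.needed = needed      # samples required before calibration is done
        self.samples = []

    def add_sample(self, xyz):
        """Record one detected tag translation; True when enough collected."""
        self.samples.append(xyz)
        return len(self.samples) >= self.needed

    def mean_translation(self):
        """Average of all collected (x, y, z) samples."""
        n = len(self.samples)
        return tuple(sum(s[i] for s in self.samples) / n for i in range(3))
```

Once the averaged pose is computed, writing it to a file is what lets the calibration survive system restarts, as described above.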
Returning to the data providers, the tf_publisher node reads the pose calibration file and ensures
that the transformation is continuously published for use in the point cloud alignment. Here,
too, one node is needed for each sensor. The calibration file is occasionally checked for new
parameters, to allow recalibration during operation.
With all of the required data, it is now possible to start the actual analysis in the scenecamera
node. This contains the core methods for initial data treatment, providing segmented and down
sampled point clouds for further analysis in following nodes. Some data parts may already be
usable in external modules at this point, but the main goal is to prepare and organize the data for
final analysis in smaller add-ons with more specific tasks.
clear that both fits confirm that the estimates deviate significantly from a 1:1 relation (the dashed
line).
(a) Full plot of all measurements. (b) Zoom on the area of interest.
Figure 3.2: The distance estimates for Xtion2 plotted in relation to measured distances. From the
offset to the cyan colored dashed guideline with correct y = x relation, it is clear that the sensor
estimates are not correct. With a polynomial fitting the estimation data it is possible to calculate the
correct distance from the estimates. The difference between a first and second order fit is hardly
noticeable on the plots, but the second order polynomial does fit the samples best.
It may be possible to enhance the existing structured light sensor driver in ROS by including
calibration parameters for the depth calculation, but this would also require manual inclusion of
the feature in all future versions of the official driver. Instead, an individual node is developed to complete this task as a part of the scene camera features.
Internally, the node is named depth_image_corrector, as this is what it does. It handles generation
of an RGBD point cloud, similar to what is done in the driver, but corrects the depth values according
to equation parameters supplied in its launch file. This equation is based on the polynomial line
fit to a set of experimentally obtained measurements, and only needs to be estimated once for
each sensor. The correction equation for the Xtion2 sensor is best estimated as the second order
polynomial: d_real = 0.00008·d_est² + 0.963·d_est + 24.33, with all distances in mm.
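Applied per pixel, the correction amounts to evaluating this polynomial for every depth value. A minimal sketch, using the Xtion2 coefficients quoted above (the function names are illustrative, not from the thesis code):

```python
def correct_depth(d_est_mm):
    """Calibrated distance (mm) from the driver's depth estimate (mm),
    using the second order fit found for the Xtion2 sensor."""
    return 0.00008 * d_est_mm ** 2 + 0.963 * d_est_mm + 24.33

def correct_depth_image(depth_image_mm):
    """Apply the correction to every pixel of a depth image (rows of mm values)."""
    return [[correct_depth(d) for d in row] for row in depth_image_mm]
```

Each sensor unit would have its own coefficient triple, estimated once from its measurement series.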
In the driver there is a step between the gathered data and creation of the point cloud, which
is a depth image where each pixel is encoded with its corresponding depth value. Already at
this point the distances are given in mm, which is the most raw data value available from the
sensor. With the raw sensor values unavailable, the only option is to do a recalibration of these
provided metric values. This is not ideal, but it will result in valid corrections as long as the initial
conversion does not change.
Iterating through the depth image to recalculate each value provides a corrected depth image
from which a point cloud is then generated. The point cloud is published and made available
alongside the one without recalibration, allowing the user to choose either. This continuous
processing of images and parallel creation of two point clouds is in principle not desirable, as it
is a waste of processing resources. However, because of the way ROS topics work, the original
point cloud is not generated unless a node is an active subscriber to it. The addition of the depth
image correction node will therefore not require additional processing power, as long as there are
no subscribers to the original point cloud.
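The subscriber-dependent behaviour can be illustrated with a small stand-in for the lazy-publishing pattern; this is a simplified sketch, not the rospy API:

```python
class LazyPublisher:
    """Runs its (possibly expensive) producer only while subscribers exist,
    mirroring how the uncorrected cloud costs nothing when unused."""

    def __init__(self, producer):
        self.producer = producer      # expensive message-building function
        self.subscribers = []         # callbacks of currently connected nodes
        self.compute_count = 0        # how often the producer actually ran

    def publish(self, raw_input):
        if not self.subscribers:
            return None               # nobody listening: skip the work entirely
        self.compute_count += 1
        msg = self.producer(raw_input)
        for callback in self.subscribers:
            callback(msg)
        return msg
```

With no subscribers registered, publish() returns immediately, which is exactly why the parallel uncorrected point cloud is free as long as nobody subscribes to it.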
3.5.1 Detection of rotation
A problem for the box finder add-on, in its current state, is that it relies on sizes of axis-aligned
bounding boxes. In situations where the searched boxes are not aligned with the camworld coor-
dinate system, the bounding boxes will represent a larger size than the actual box within. Small rotations of only a few degrees may be handled by setting the size tolerance a bit higher, but the
true box pose is still not found.
There is no existing PCL library providing this object-aligned bounding box estimation directly,
and development of such a method is on the boundary of the scope of this project. Some
considerations on the subject were made, but an implementation was not prioritized, as it could
be assumed that the box in the immediate demonstration would be axis-aligned.
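A minimal sketch of the axis-aligned matching the add-on relies on (function names and the tolerance value are illustrative): the bounding box is the per-axis min/max of a segment's points, and a candidate matches when its sorted extents agree with the requested dimensions within a tolerance.

```python
def axis_aligned_bbox(points):
    """Axis-aligned bounding box of a point cloud: (min_xyz, max_xyz)."""
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs

def matches_box_size(points, size, tol=0.02):
    """True if the bounding-box extents match `size` within `tol`.
    Sorting the extents makes the check independent of axis ordering,
    but, as noted above, a rotated box still inflates its extents."""
    mins, maxs = axis_aligned_bbox(points)
    extents = sorted(maxs[i] - mins[i] for i in range(3))
    return all(abs(e - s) <= tol for e, s in zip(extents, sorted(size)))
```

This also makes the failure mode concrete: rotating a box in the horizontal plane grows the min/max extents, so the sorted-extent comparison starts rejecting a box that is actually present.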
Two approaches to object alignment estimation have been investigated in this project. As
summarized in the following, one of these was found to be potentially useful.
Principal component analysis (PCA) was considered as a candidate for determining the rota-
tional alignment of objects, but was discarded after experimental results made it clear that this
was not ideal for hollow box shapes. Even if all sides of the box were visible, the estimated major
axis of the object would most likely run from one corner to the adjacent one, rather than parallel
with the box sides. This led to the conclusion that PCA is not very well suited for this task. If the
boxes had been solid, containing interior points as well, it might have been more plausible.
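For reference, the PCA idea discussed above boils down to taking the dominant eigenvector of the point covariance. A minimal 2-D sketch (illustrative only; this is not the thesis's experimental code):

```python
import math

def principal_axis_2d(points):
    """Major-axis direction (unit vector) of 2-D points, i.e. the largest
    eigenvector of the covariance matrix, via the closed-form 2x2 angle."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the dominant eigenvector of [[cxx, cxy], [cxy, cyy]]:
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)
```

Run on the perimeter points of a hollow rectangle, this axis need not align with the sides, which is the failure mode observed above.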
Figure 3.6: Individual visualization of the box
from figure 3.3 with normal vectors in key areas.
As only one box side is partly detected, it is hard
to use normal vector information from only this
to determine the orientation of the box.
Normal vector analysis was then considered, with the idea that normal vector histograms could
be used to estimate rotation from the angles of normal vectors on box sides.
An example of a box point cloud visualized
with its normal vectors can be seen in figure
3.6. The number of available normal vectors in the shown example makes it clear that the
current sensor positioning makes it unlikely to have many of the box sides detected at once,
which could prove to be a problem. Without sides to analyse for normal vector alignment,
estimation of orientation is not easy. Analysis
of the rectangular shape of the visible top layer
would probably provide better results in such
worst case scenarios.
The principle of normal vector analysis is
considered viable for use in orientation estima-
tion of boxes, as well as other types of feature
recognition.
3.6 Conclusion
Several elements have had to be developed to make the scene camera solution fulfil the given
requirements.
Focus has been on the first part of the perception sequence, allowing robust and correct data acquisition and segmentation of objects into separate point clouds, ready for further analysis. The
second perception part with object recognition or classification has been introduced in the form
of a box finder add-on, but actual object recognition is considered to be outside the scope of this
project.
Calibration and data provider nodes have been developed to create an easy calibration proce-
dure, partly consisting of methods allowing inclusion of the existing ROS package ar-track-alvar.
A method for calibration of the depth estimates of the structured light sensors has been shown
to be necessary, making it possible to achieve higher precision. Such a functionality has been
integrated, in the form of a depth_image_corrector node which can also be used directly in other
ROS applications using similar sensors.
A core scenecamera node has been developed to merge, down sample, filter and segment the
acquired point clouds, using algorithms from the PCL. The output from this is ready for use
in add-ons and is published in several forms, alongside simple bounding box geometries
allowing obstacle avoidance through motion planning.
Because of specific needs in a demonstration for estimating the location of a box in the scene, a
box finder service add-on has been developed. This can also serve as a near minimal example of
scene camera add-on construction, for use as a reference to new developers.
In order to verify the robustness, reliability and precision of the individual elements, as well as
the collective solution as a whole, experimental tests are documented in the next chapter.
4.1 Impact of noise sources
The Robot Co-Worker concept is to be functional in existing industrial environments in produc-
tion lines, and must be able to function even though conditions are not necessarily optimal. It may
be possible to adapt the work environment slightly to shield the scene camera from most externalinfluence, but some interference between the sensors of the cell itself should also be expected.
4.1.1 Ambient light
Light conditions in the scene are always important for vision based systems. Where many
cameras can be manually adjusted to have white balance and focus distance changed, the Kinect
and Xtion sensors handle such adjustments automatically, with no option for manual adjustment.
In most cases this is sufficient, but it makes the implementation dependent on external light
sources, with no option to adapt.
Figure 4.1: Example of how ambient sun light can
affect the acquisition of 3D data sets with the IR
based structured light sensors. The area of the
table with highest concentration of IR light from
both sensors and the sun is undetectable under
these conditions.
A critical issue arises when saturation occurs for the camera with IR filter, which is used
for 3D data generation. Artificial light sources have not been seen to cause such issues, unlike
sunlight. Direct sunlight has the most significant impact, but diffuse reflections from surfaces
or through windows have proven to be an issue as well.
The image in figure 4.1 shows the impact of diffuse sunlight in the work area. The entire
center of the table surface is not registered, because the saturation of IR light in that area
makes it impossible for the sensor to distinguish the projected points.
It is important that the entire work space is shielded from strong sunlight, in order to
ensure reliable use of the structured light cameras. A curtain had to be added to one side of the
Robot Co-Worker demonstration cell which is oriented towards several windows. To further
control the lighting conditions, a roof with LED lights has also been added. This is not required
by the scene camera solution, but it is necessary for the tool camera to have as close to static
light conditions as possible.
4.1.2 Overlapping structured light patterns
Some impact is expected from using two structured light sensors to monitor the same
area, as both emit IR dot patterns for depth estimation. As long as alignment of all
nine calibration points [10] is avoided, only limited conflict between the two is noticeable.
On most surfaces in the scene each sensor has no problem recognizing its own pattern
projection and ignoring any extra dots. However, surfaces with reflective properties can sometimes
be oriented so that very little returned light is visible to the IR camera, normally causing a small
hole of missing data in the resulting point cloud. In these cases it can be critical if one or more of
the dots projected by the other sensor are at an angle where the light is better reflected, causing
the wrongful assumption that this is the missing dot. Any shift in position of such dots is assumed
by the sensor to be caused by an object in the scene, resulting in a spike in the point cloud. An
example of this is shown in figure 4.2, where the black area of an AR tag causes a significant spike.
Figure 4.2: At the upper right corner of the black square AR tag, a significant spike in the depth
reading of a small area can be seen. This primarily occurs with two sensors pointed at the same
reflective surface.
Of all objects used in the demonstrations through this project, very few have caused such
significant problems. The AR tag, which is only in the scene during calibration, as well as the
black tape marking the box zone, are the primary ones. In both cases the issue is not critical,
as these spikes only consist of a few points, which are not enough to pass through the
segmentation process and be considered as objects.
The worst case is if a real object is positioned very close to a spike, having the spike
included in the segmentation of that object because of the small distance between them.
However, this is unlikely, as the object will usually block the projected pattern from one of
the two sensors.
It should be kept in mind that these spikes can occur in the 3D data stream, but simple averaging
with various filtering techniques will make them negligible.
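A minimal sketch of why averaging suppresses such spikes is a per-pixel median over a few consecutive frames. This is an illustration of the principle, not the specific filter used in the implementation:

```python
import numpy as np

def temporal_median(depth_frames):
    """Per-pixel median over a stack of depth images. A spike that appears
    in only a minority of the frames is rejected, while stable surfaces
    keep their depth reading."""
    return np.median(np.stack(depth_frames), axis=0)
```

Because the interference spikes are transient and only cover a few points, even a short window of frames is enough to remove them from the merged cloud.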
4.2 Precision of sensor position calibration
In the analysis it was concluded that the ar-track-alvar package can provide sufficiently precise
6D pose estimates of the two sensors relative to a common reference. It was also seen that the
RGB and RGBD modes gave different calibration results which deviated on either position or
orientation. To further investigate this, pose estimation data from a calibration sequence has been
plotted for analysis as shown in figure 4.3.
The two available modes for pose estimation were run on the same data sequence of approximately
20 seconds length, which had been recorded and played back using the rosbag functionality.
An AR tag with quadratic sides of 44.7cm has been used for calibration purposes. A smaller variant
of 20cm was also tested, but the best accuracy was achieved with the larger one. For unknown
reasons the two modes register the AR tag with a π/2 rotation offset, which has been normalized
in the plotted data.
The 100 2D estimations deviate very little because of lower sensor noise than with 3D, meaning
that fewer than 100 measurements would be sufficient to determine a mean value. Noise on the 3D
data readings is much more significant, making clear the need to average over many measurements.
The estimates from RGB images in 2D and from RGBD in both 2D and 3D give different
results on several parameters, of which the translation along the y-axis and the roll and pitch
rotations are the most obvious.
(a) Estimates of x, y & z coordinates. (b) Estimates of roll, pitch & yaw.
Figure 4.3: Calibration data showing the estimate differences between the 2D only and combined 2D
and 3D methods for 6D pose estimation. The difference in position is around 5cm, and the offset in
orientation is around 0.1 radians in both roll and pitch estimates.
The data plots in figure 4.3 confirm the estimation offsets seen back in figures 2.13 and 2.14.
From these visualizations of point cloud alignment with both calibrations it is concluded that the
best position estimate is achieved with the 2D method, suggesting that the combined method gives
a position error of around 5cm for this sensor. However, the 2D method is lacking precision in its
orientation estimate, which is around 0.1 radians off compared to the more accurate result from
the combined method. If run on a 64-bit system, supporting the combined 2D and 3D method, it
could be advantageous to use RGBD for position estimation and RGB for orientation estimation.
On 32-bit systems the RGB method is the only one available, so a small error in orientation offset
should be expected. This can either be accepted as it is, or adjusted manually after the calibration.
In any case, it is obvious that a more precise pose estimation method could improve precision,
preferably through development of a new package with both 32-bit and 64-bit compatibility.
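A combined calibration along these lines could average many samples per mode and then take the translation from the RGBD estimates and the orientation from the RGB estimates. The following sketch assumes this split (the function is hypothetical, not part of ar-track-alvar), and uses a circular mean so angles near ±π average correctly:

```python
import numpy as np

def fused_pose(rgbd_positions, rgb_rpy):
    """Average N noisy pose samples per mode and combine them:
    translation from the RGBD (2D+3D) estimates, orientation from the
    RGB-only estimates, matching the accuracy observed for each mode."""
    position = np.mean(rgbd_positions, axis=0)
    # Circular mean per angle, so samples near +/- pi do not cancel out.
    angles = np.asarray(rgb_rpy)
    orientation = np.arctan2(np.mean(np.sin(angles), axis=0),
                             np.mean(np.cos(angles), axis=0))
    return position, orientation
```

With the roughly 100 samples recorded during a calibration sequence, the noise on the mean shrinks well below the per-sample deviations discussed above.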
4.3 Distinguishability of 3D features
It is desirable that the 3D data collected by the scene camera solution is of sufficient resolution
to allow object recognition, which could be very useful in many scenarios. Depending on the
size of objects and their features, it is possible to adjust the voxel grid resolution to reflect
detail requirements. In these examples the highest possible precision of around 2mm resolution is used.
A key object in the main demonstration of the Robot Co-Worker is the cardboard box with
transformers placed in foam. Figure 4.4 shows visualizations of the point cloud segmentation
of such a box, with both RGB and a z-axis gradient coloring. The positioning of the box relative
to both cameras is as shown in figure 2.15, which also makes it clear that full vision of the box
is not obtained with both cameras. With only a single box side partly visible to one of the
sensors, it is hard to classify this as a box from the point cloud data alone. An object is still
detected through segmentation as shown in figure 3.4, but no true classification is done, as all
objects are fitted with axis-aligned bounding boxes, regardless of shape. A box classification
algorithm would most likely require more sides of the box to be visible.
(a) RGB colored transformer box point cloud. (b) Same point cloud with z-axis gradient color.
Figure 4.4: Example of a single object point cloud after segmentation. Further analysis of the object
could be conducted to determine how many transformers are left in the box, and where these are
placed; either from image analysis of the RGB data (left image), or the elevation level of distance
readings (right image).
Looking at the surface of the box, the data from the tops of transformers and inside empty slots is
very detailed. For each of the eight slots, both analysis of RGB values and analysis of depth look
to be viable approaches to determining which are occupied by a transformer. From the 3D data
this is possible because of the feature size, as each hole is around 7cm x 6cm x 6cm, which leads to
a considerable feature change when a transformer is placed in it. Mounting brackets are placed on
both sides of the transformer, but as these are not the same size, the transformer is not symmetrical.
While it would be beneficial to know the orientation of a transformer, this is hardly possible
with even the highest resolution of the scene camera solution. It may be possible under optimal
conditions, but in a dynamic work environment a very high success rate would not be expected.
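The depth-based variant of this slot check can be sketched as comparing the mean elevation inside each slot's footprint against the empty-slot floor. The function, slot bounds format and fill height threshold are illustrative assumptions, not the demonstration code:

```python
import numpy as np

def occupied_slots(cloud, slot_bounds, fill_height=0.03):
    """Decide per foam slot whether a transformer is present, by comparing
    the mean z of points inside the slot's (x, y) footprint against the
    empty-slot floor plus fill_height (metres).
    slot_bounds is a list of (xmin, xmax, ymin, ymax, z_floor) tuples."""
    result = []
    for xmin, xmax, ymin, ymax, z_floor in slot_bounds:
        mask = ((cloud[:, 0] >= xmin) & (cloud[:, 0] <= xmax) &
                (cloud[:, 1] >= ymin) & (cloud[:, 1] <= ymax))
        z_mean = cloud[mask, 2].mean() if mask.any() else z_floor
        result.append(z_mean > z_floor + fill_height)
    return result
```

Since each slot is around 7cm x 6cm x 6cm, the elevation change from a placed transformer is far larger than the roughly 2mm point cloud resolution, which is what makes this approach viable.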
Figure 4.5: Point cloud of metal heat sink with a
mounted transformer. Only some parts of major
features are visible.
Another example is the metal heat sink in which the transformers are to be placed. Even
though the metal surface is matte, surfaces at certain angles are still not detectable with
structured light. This makes it unlikely that points on all object features are always registered
and available for recognition. The point cloud result from object segmentation of the heat sink
with a transformer is shown in figure 4.5, which also corresponds to the RGB images in figure
2.15. Most of the key features are too small to be registered, and some of the detected features
are only partly included. It is hard to say if recognition would be possible, but it would
definitely not be reliable.
This is as expected from the beginning of the project, confirming the assumption that the inclusion
of a tool camera is necessary for reliable object recognition. For larger objects it may be possible to
successfully analyse major features, but reliable classification or recognition cannot be expected
with these sensors.
4.7 Test conclusion
Several critical aspects of the implementation have been investigated, and evaluations of these
have been used to determine the usability and dependability of the scene camera solution.
It has been pointed out that shielding from external IR light sources is crucial for proper data
collection with structured light sensors. In particular it is important to avoid sunlight in the work
area. Potential issues caused by interference between the two light patterns projected on the
same surfaces have been investigated, but are not deemed to have critical impact in most detection
scenarios.
The 6D pose estimates of the sensors have been shown to differ between the RGB and RGBD
methods. Neither provides a perfect pose estimate, but the RGB approach only has a small deviation
in orientation, making it the most suited candidate for calibration use. Ultimately, a combination
of both methods, or development of a new package, would provide more precise results.
With large objects it is possible to obtain 3D data of high enough detail to distinguish major
features, which could be used in the Robot Co-Worker demo to determine how many transformers
are left in a box, as well as their positions. Smaller features, especially on metallic objects, cannot
be expected to be distinguishable in the point clouds.
Multi-core processors are best suited to run the solution, as the separation into nodes makes it
possible to assign these to different processor cores for improved performance. Running at full
resolution, with no downsampling, the solution requires a powerful processor in order to
run at a decent rate. With downsampling to a 1cm voxel grid it is possible to achieve a full
processing cycle rate of 1Hz or faster, even on a laptop processor.
Multiple demonstrations have been carried out to present examples of intended use of the
scene camera solution in combination with the Robot Co-Worker. These demonstrations have
shown that the project solution requirements are met, and that the implemented features are both
useful, reliable, and portable to ROS environments on other systems as well.
Chapter 5 Future Work
The goal of this project has been to develop a scene camera solution to be generally useful in
integration with the DTI Robot Co-Worker prototype. A few specific uses have also been requested
for demonstrations, as examples of how the generated point clouds can be used. Targeting
both general usability and specific requests, suggestions for future work are divided into two
categories to reflect the difference between improvements and expansions.
Improvements include aspects of the implemented solution which are open to significant
quality enhancement. These are the suggestions for future work to improve the existing
implementation:
- The calibration procedure currently relies on the ar-track-alvar package, which has been
  shown to have room for improvement. Refinement of how the package is used, or replacement
  with another, could make calibrations more precise.
- Object segmentation with the kdtree method has proven to be computationally heavy,
  suggesting that this approach could be reconsidered, possibly resulting in a more efficient
  implementation.
Expansions are ways of adding new features and functionalities on top of the existing
implementation, most likely through development of add-ons. These are suggested areas considered
worth looking into, because they will add useful tools based on the existing core implementation:
- A functionality for monitoring and keeping track of all objects in the scene could be useful
  for identifying when changes occur in the scene or a new object is detected.
- An add-on capable of analysing the content of a container in the scene, in order to determine
  if it is empty, or in which region of the container there are still objects left to be picked.
- Object classification or recognition could be useful in scenarios where larger objects with
  distinguishable features are present. This could also help determine the orientation of
  recognizable objects.
As expected, and desired, the implemented scene camera solution has presented some useful
options for further development. Which of these are most relevant to prioritize comes down to
which demonstration tasks are to be shown in the future.
Chapter 6 Conclusion
Through completion of this project a scene camera solution for the DTI Robot Co-Worker
prototype has been developed, integrated and verified.
From analysis of the scene and investigation of existing modules it was decided to create a
completely new implementation, tailored to match requirements for use with the Robot Co-Worker.
The developed solution serves as a fully functional scene camera module, including necessary
core features for calibration and reliable generation of segmented RGBD point clouds, as well as
additional functionalities to allow pose estimation of objects.
Fulfilment of the initial project goals has been achieved by development of:
- An automatic calibration solution, estimating and saving 6D pose estimates of the structured
  light sensors upon placement of an AR tag in the scene and execution of a script. Calibration
  during operation is possible, and values are retained even after restart of the system.
- Functionality to allow easy configuration of point cloud density through adjustment of the
  voxel grid resolution used for downsampling. This makes it possible to adjust 3D data
  resolution to match varying requirements in multiple situations.
- A correction node for adjustment of estimated depth values, which has proven to be crucial
  in order to obtain reliable distance measurements from the sensors.
- The scene camera core, which processes obtained 3D data and publishes segmented point
  clouds of objects in the scene, as well as bounding box geometries for obstacle avoidance in
  motion planning.
- An add-on providing a ROS service to locate an axis-aligned bounding box of specific size in
  the scene and return its pose for use in external modules.
In cooperation with the development team of the Robot Co-Worker, the scene camera solution has
been successfully integrated with the current prototype. This has made it possible to perform tests
in a wide variety of realistic scenarios, through demonstrations of performance in example cases
suggested by industrial partners.
List of Appendices
A Appendix - Package installation instructions
A.1 How to do a fresh installation
A.2 How to launch the nodes
B Appendix - Wiki content
C Appendix - Depth estimation offset for sensors
D Appendix - CD contents
E Appendix - Code
B. Appendix - Wiki content
During development of the scene camera, an internal wiki webpage was created as a reference
for the members of the Robot Co-Worker team at DTI. The content has not been streamlined for
external use, as it is just meant to serve as a basic guideline for use of the scene camera solution.
Included in the following four pages is a snapshot of the wiki near the end of this project. It
should be noted that the wiki format is primarily created to be viewed in a browser, not for nice
printing layouts.
C. Appendix - Depth estimation offset for
sensors
Xtion 1 correction equation (converted to [mm]): y = 0.00001x² + 0.989x + 15.5
Xtion1 distance estimates
est. [m]  act. [m]  error [m]  error [%]
0.538 0.55 0.012 2.23
0.634 0.65 0.016 2.52
0.733 0.75 0.017 2.32
0.833 0.85 0.017 2.04
0.931 0.95 0.019 2.04
1.029 1.05 0.021 2.04
1.129 1.15 0.021 1.86
1.228 1.25 0.022 1.79
1.323 1.35 0.027 2.04
1.418 1.45 0.032 2.26
Table 1: Estimate results for the Xtion1
sensor with factory calibration.
Figure 1: Xtion1 estimates and measurements.
Xtion 2 correction equation (converted to [mm]): y = 0.00008x² + 0.963x + 24.33
Xtion2 distance estimates
est. [m] act. [m] error [m] error [%]
0.619 0.65 0.031 5.01
0.711 0.75 0.039 5.49
0.801 0.85 0.049 6.12
0.892 0.95 0.058 6.50
0.984 1.05 0.066 6.71
1.072 1.15 0.078 7.28
1.161 1.25 0.089 7.67
1.247 1.35 0.103 8.26
1.331 1.45 0.119 8.94
1.414 1.55 0.136 9.62
Table 2: Estimate results for the Xtion2
sensor with factory calibration.
Figure 2: Xtion2 estimates and measurements.
Kinect 1 correction equation (converted to [mm]): y = 0.000001x² + 0.984x + 7.99
Kinect1 distance estimates
est. [m] act. [m] error [m] error [%]
0.547 0.55 0.003 0.55
0.650 0.65 0.000 0.00
0.750 0.75 0.000 0.00
0.848 0.85 0.002 0.24
0.949 0.95 0.001 0.11
1.050 1.05 0.000 0.00
1.149 1.15 0.001 0.09
1.248 1.25 0.002 0.16
1.346 1.35 0.004 0.30
1.448 1.45 0.002 0.14
Table 3: Estimate results for the Kinect1
sensor with factory calibration.
Figure 3: Kinect1 estimates and measurements.
Kinect 2 correction equation (converted to [mm]): y = 0.000001x² + 1.014x - 43.53
Kinect2 distance estimates
est. [m] act. [m] error [m] error [%]
0.583 0.55 -0.033 -5.66
0.681 0.65 -0.031 -4.55
0.776 0.75 -0.026 -3.35
0.873 0.85 -0.023 -2.63
0.972 0.95 -0.022 -2.26
1.070 1.05 -0.020 -1.87
1.164 1.15 -0.014 -1.20
1.264 1.25 -0.014 -1.11
1.360 1.35 -0.010 -0.74
1.453 1.45 -0.003 -0.21
Table 4: Estimate results for the Kinect2
sensor with factory calibration.
Figure 4: Kinect2 estimates and measurements.
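The correction equations above can be applied directly to raw depth estimates. The following sketch (hypothetical helper name) reproduces the Xtion1 fit and checks it against rows of Table 1: the first table row, est. 538mm for an actual 550mm, corrects to roughly 550.5mm:

```python
def correct_depth(est_mm, a, b, c):
    """Apply a per-sensor quadratic depth correction y = a*x^2 + b*x + c,
    with both the raw estimate x and the corrected value y in millimetres."""
    return a * est_mm ** 2 + b * est_mm + c

# Coefficients from the Xtion 1 correction equation above.
XTION1 = (0.00001, 0.989, 15.5)
```

The same form with the other coefficient sets applies to the Xtion 2 and the two Kinect sensors; only the fitted constants differ.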
D. Appendix - CD contents
Along with this report comes a CD with files related to the project.
These are the folders on the CD, and their content:
Images Full resolution versions of images included in the report.
Packages Source files in the form of ROS packages ready for compilation.
References Pdf files of all resources from the resource list of the report.
Report Pdf version of this report itself.
Videos Video files of the demonstrations mentioned in the report.
References
[1] DIRA. Robot Co-Worker - Information og demonstration. URL: http://www.dira.dk/nyheder/?id=519 (visited on 30/05/2013).
[2] Willow Garage. PR2 Tabletop Manipulation Apps. URL: http://ros.org/wiki/pr2_tabletop_manipulation_apps (visited on 22/06/2013).
[3] The Danish Technological Institute. DTI Robot Co-Worker for Assembly. URL: http://www.teknologisk.dk/ydelser/dti-robot-co-worker-for-assembly/32940 (visited on 21/06/2013).
[4] Ivan Dryanovski, William Morris, Gautier Dumonteil et al. Augmented Reality Marker Pose Estimation using ARToolkit. URL: http://www.ros.org/wiki/ar_pose (visited on 20/06/2013).
[5] Scott Niekum. ar-track-alvar Package Summary. URL: http://www.ros.org/wiki/ar_track_alvar (visited on 31/05/2013).
[6] PCL. Euclidean Cluster Extraction - Documentation. URL: http://www.pointclouds.org/documentation/tutorials/cluster_extraction.php (visited on 09/06/2013).
[7] PCL. How to use a KdTree to search. URL: http://pointclouds.org/documentation/tutorials/kdtree_search.php (visited on 24/06/2013).
[8] ROS. Intrinsic calibration of the Kinect cameras. URL: http://www.ros.org/wiki/openni_launch/Tutorials/IntrinsicCalibration (visited on 31/05/2013).
[9] SMErobotics. The SMErobotics Project. URL: http://www.smerobotics.org/project.html (visited on 21/06/2013).
[10] Mikkel Viager. Analysis of Kinect for Mobile Robots. Individual course report. Technical University of Denmark DTU, Mar. 2011.
[11] Mikkel Viager. Flexible Mission Execution for Mobile Robots. Individual course report. Technical University of Denmark DTU, July 2012.
[12] Mikkel Viager. Scene analysis for robotics using 3D camera. Individual course report. Technical University of Denmark DTU, 2013.