Scene Detection for Flexible Production Robot
TRANSCRIPT
7/27/2019 Scene Detection for Flexible Production Robot
1/81
Scene detection for flexible production robot
Scene detektion for fleksibel produktionsrobot
This report was prepared by:
Mikkel Viager (s072103)
Advisors:
Jens Christian Andersen, Professor at DTU, Department of Electrical Engineering
Ole Ravn, Head of Group, Department of Electrical Engineering
Anders B. Beck, Project Leader, Danish Technological Institute
DTU Electrical Engineering, Automation and Control
Technical University of Denmark
Elektrovej, Building 326
2800 Kgs. Lyngby
Denmark
Tel: +45 4525 3576
Project period: February 2013 - June 2013
ECTS: 30
Education: MSc
Field: Electrical Engineering
Class: Public
Remarks: This report is submitted as partial fulfilment of the requirements for
graduation in the above education at the Technical University of Denmark.
Copyrights: Mikkel Viager, 2013
Abstract
This report documents the development, integration and verification of a scene camera solution
for the Robot Co-Worker prototype at the Danish Technological Institute.
An analysis of the requirements for the implementation is conducted, and it is determined that
no existing solution can fulfil them sufficiently. Based on two structured light sensors, a complete
solution is developed to match a set of requested functionalities.
The result is a ROS package capable of delivering detailed RGBD point cloud segmentations
for each object in the scene. Furthermore, bounding box geometries are estimated and made
available for use in motion planning and in an included service that returns the position of boxes
matching provided dimensions.
Calibration of the solution is done by automatic estimation of sensor poses in 6D, allowing
alignment of 3D data from the sensors into a single combined point cloud. A method for
calibrating the distance estimates of structured light sensors has also been created, as this was
shown to be necessary.
The implementation is verified through tests and through inclusion in demonstrations of industrial
assembly cases as an integrated part of the Robot Co-Worker, fulfilling the requested capabilities.
Preface
This project was carried out at the Technical University of Denmark (DTU) in collaboration
with the Danish Technological Institute (DTI). The project was completed in the timeframe from
February 2013 to June 2013, and covers a workload of 30 ECTS credits.
While completing this thesis I have worked with several people whom I would like to thank
for their support: my supervisors Jens Christian Andersen and Ole Ravn for great sparring,
my external supervisor Anders B. Beck and the entire Robot Co-Worker team at DTI for their
helpfulness and interest in my work, and my fellow student Jakob Mahler Hansen for his support
and constructive input.
The work of this thesis has been partially funded by the European Commission through the
FP7 project PRACE, grant no. 285380, which is greatly appreciated.
1.2 Problem formulation
The project goal is to develop, integrate and verify a scene camera solution for the Robot
Co-Worker, making the system capable of perceiving its work area in 3D. It is imperative that the
implemented solution is both accurate and robust enough to function reliably in connection with other modules, making it a useful feature for the Robot Co-Worker, as well as a viable choice for
inclusion in other projects with similar needs. The scene camera should be able to detect objects in
the scene and provide details on their size and position.
Desired functionality includes:
Simple and fast calibration procedure
Data precision with sufficient accuracy for initial object position estimation
Creation and publication of the scene as a dense 3D pointcloud of RGBD data
Segmentation of individual objects in the scene into separate point clouds
Generation of simple bounding geometries for obstacle avoidance in motion planning
Functionality to return the position of a box of specified size in the scene
The scene camera solution should be able to operate continuously, even during movement of
the robot arm inside the work area.
As part of the verification process it is desired to have the scene camera feature showcased as a
fully functional and essential part of the Robot Co-Worker during a scheduled public demonstra-
tion in early May 2013.
Furthermore, the project should evaluate the capabilities of the scene camera with respect to
options for further development of extended functionality for future tasks, while preparing the
software structure to allow such expansions.
In summary:
Development and implementation of a scene camera solution with easy calibration.
Integration with the Robot Co-Worker and inclusion in showcase demonstration.
Test and verification of the solution.
Evaluation of options for future use and expandability.
Finally, the solution should also be usable with the computer systems at Automation and
Control, DTU, which is to be verified through testing.
and inherit from parent launch files, allowing any of the inherited parameter values, or none, to
be overwritten.
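The parent/child launch-file pattern described here can be sketched as a pair of roslaunch files; the file names, argument names and default values below are illustrative, not taken from the thesis sources:

```xml
<!-- sensor_parent.launch: shared defaults for both sensor drivers -->
<launch>
  <arg name="camera" default="camera1"/>
  <include file="$(find openni_launch)/launch/openni.launch">
    <arg name="camera" value="$(arg camera)"/>
  </include>
</launch>

<!-- xtion2.launch: includes the parent and overrides only what differs -->
<launch>
  <include file="$(find my_scene_camera)/launch/sensor_parent.launch">
    <arg name="camera" value="camera2"/>
  </include>
</launch>
```

Each child file thus carries only the parameters that distinguish one sensor from the other, while everything shared lives in the single parent.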
2.4.1 Software versions
To match the ROS version for which the existing Robot Co-Worker modules were created, ROS
Fuerte is used. The operating system is the popular Linux distribution Ubuntu, version 12.04 LTS.
A full desktop installation of ROS with the addition of OpenNI sensor driver packages
openni_camera and openni_launch allows direct use of the Kinect sensors.
In order to use the Xtion sensors, the Sensor-Bin-Linux driver must be rolled back to version
5.1.0.41; otherwise the driver will not register the sensors as connected. This is done simply by
running the driver install script with administrator rights, overwriting the existing driver files. To
revert this process, the newest driver version can be used to overwrite the driver files the same
way.
2.5 Sensors
A key aspect of the solution is to place suitable sensors in optimal positions. The sensor type
has already been decided, so this choice is outside the scope of the project. Two candidate sensors
of this type qualify for use in this case, so a comparison is needed. This section also contains a
brief description of what to expect from structured light technology, as well as thoughts on sensor
positioning.
2.5.1 Structured light
Figure 2.4: A section of the structured
IR light projected by a Kinect. Less than
10% of the entire projection is shown.
Structured light sensors function by projecting a predefined pattern onto a surface and
analysing the deformations in the projection, from the viewpoint of a camera with a known
relative position [10]. The precision of the sensor depends on the resolution of the projected
pattern, as well as on the resolution of the camera. The camera must have a resolution high
enough to distinguish the individual parts of the projected pattern from the background, and the
resulting 3D precision depends on the density of the pattern features. An example of the pattern
projected by a Kinect sensor is shown in figure 2.4. Because of the recent mass production of the
Microsoft Kinect device, structured light sensor technology is currently very affordable and
available to everyone.
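The depth itself follows from standard triangulation between the projector and the IR camera. As a sketch (the symbols below are generic stereo quantities, not values from the thesis): a pattern feature observed with disparity d pixels, given focal length f in pixels and projector-camera baseline b, lies at depth z = f·b/d.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Triangulated depth (m) of a pattern feature seen with the given
    pixel disparity between the projector and IR camera viewpoints."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px
```

This relation also illustrates why precision degrades with distance: a fixed one-pixel disparity error corresponds to an ever larger depth interval as the depth grows.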
The research leading to the choice of structured light sensors for this project shows that
appropriate precision for the desired level of object detection is achievable [12], even though the
technology is not ideal for all surface and texture properties. As with other light-based sensors,
problems arise when surfaces are either very reflective or absorb the projected light instead of
reflecting it. This downside has been considered, but it is determined that a structured light
based scene camera will still be of good use in the case at hand.
Figure 2.8: Sensor FOV overlapping.
The entire table surface is covered by
both sensors, but tall objects close to the sides will only be covered by one.
Moving the cameras to S2 and S4 at the sides of
the cell allows the field of view for both cameras to be
aligned with the work area, as well as provide viewing
angles complementing each other well. Depending on
the anticipated height of objects in the scene, it may be
advantageous to extend the cell and move the cameras
further apart.
Figure 2.8 shows a conceptual graphic of the FOV seen from the front of the cell, making it
clear why the detectable object height is limited. Small objects are entirely covered by both
cameras, but as soon as a larger object is placed closer to the sides, it is not guaranteed to be
covered in its full height. This could lead
to critical detection errors in rare situations where the
robot arm is blocking the camera above a large object
and the full height of the object is not within the FOV of
the opposite camera. For the cases and demonstrations
used in this project, the viewing angles allow sufficient object height, as the detectable objects
are to be placed within predefined boundaries with enough distance from the work space edges.
These boundaries are decided by the limited reach of the robot with a gripping tool mounted. In
other applications it may be advantageous to move the cameras further apart, but this would
increase the distance to the workspace and thereby decrease sensor precision.
Figure 2.9 shows the FOV of both sensors in the chosen positions, confirming that the entire
work space is covered.
(a) 640x480 RGB image from Xtion1. (b) 640x480 RGB image from Xtion2.
Figure 2.9: Images showing the FOV for both sensors. It should be noted that this is the RGB
image, which has a slightly bigger FOV than what is covered in images for depth estimation.
2.7 Calibration
The precision of 3D data provided by the scene camera setup relies on accurate calibration of
the intrinsic camera properties as well as the poses of sensors in the scene.
In order to make the calibration task both fast and reliable, it is desired to automate and
streamline the process. This will also make it viable to do re-calibration more often, making the
entire solution more resistant to changes in the scene and setup.
2.7.1 Intrinsic camera calibration
Figure 2.10: Openni_camera calibration
screen. Intrinsic camera parameters
are calculated based on several pictures
with a checkerboard of given size, in
varying positions and orientations.
As with all vision-based systems it is important to
have a good calibration of the intrinsic camera parame-
ters. This calibration can be done with the ROS package
openni_camera, by following the official tutorial [8], mak-
ing it possible to calibrate the intrinsic camera parameters
as well as the impact of any lens distortion. Because of
the type of lens used in these sensors, there is not much distortion of the image, but the best result is achieved
by also calibrating for the slight distortion. A screenshot
from the calibration procedure is shown in figure 2.10.
The same calibration routine can be used to calibrate
the IR sensor used for the depth data, as this is simply
another CMOS sensor with an IR filter in front of it.
2.7.2 Distance calculation
Both the ASUS Xtion and Microsoft Kinect devices come with preset calibrations for internal
calculation of metric distances from raw data values. It is uncertain whether this calibration has
been done individually for each device, but it is clear that the quality of the converted metric
values varies considerably between devices.
In order to achieve the desired precision, it is necessary to recalibrate the parameters used for
conversion between raw sensor values and metric values. As no method is available for doing this,
a new functionality needs to be developed. This is described in section 3.2.
2.7.3 External position calibration
In order to have the two sensors contribute to a single well-defined collective point cloud, it is
necessary to align the two separate point clouds in correct relation to each other.
2.7.3.1 3D Feature registration
A common method of combining several datasets into a single global consistent model is the
technique called registration, which iteratively matches and aligns identified feature points from
separate data sets until a given alignment error threshold is reached. However, this method works
best for data sets containing several well defined 3D features for matching, which also requires
relatively high sensor precision. Even if these requirements are met by the structured light sensors,
the computational load caused by the registration algorithm would take up a lot of the available
processing power.
For calibration purposes it could be viable to use registration, as the computational require-
ments would not have very high impact if only used in a one-time offline calculation. However, a
simpler external calibration has proven to be sufficiently effective in this case, as described in the
following section.
2.7.3.2 Transforms
Figure 2.11: Sensor poses relative to
common reference, visualized in rviz.
Knowledge about relative positions al-
lows merging of sensor data.
Instead of analysing and realigning the point clouds
after acquisition, a pre-alignment of the relative sensor
poses has proven to be sufficiently accurate. By estimating the poses of the two sensors relative to a common
world frame, it is possible to have the two resulting point
clouds aligned and positioned with enough precision to
seamlessly make up a common larger and more dense
total point cloud. This can be done by keeping track of
the transforms between the coordinate systems of the
two sensors and the chosen common world reference
(camworld), as illustrated in figure 2.11.
One way to estimate the sensor poses would be to
manually measure them in camworld coordinates. Not only would this be a tedious and
time-consuming calibration method, but it would also be unlikely to meet the precision
requirements, as it is difficult
to do precise manual measurements of the sensor orientation angles.
A vision based method, using tag tracking, has previously been proven to provide very good
positional estimates with similar sensors [11]. However, as this solution is required to use ROS, it
is not very easy to re-use the mobotware2 plug-in from the previous project. Similar functionality
can be achieved through use of the ROS package ar-track-alvar, serving as a wrapper for Alvar
which is an open source Augmented Reality (AR) tag tracking library.
2.7.3.3 AR-track-alvar
Figure 2.12: An AR tag generated with ar-track-alvar, encoded with the id 19.
The ar-track-alvar [5] ROS package provides functionality
to detect, track and generate AR tags (see figure 2.12) in a
live image stream. Other similar packages have been evaluated, namely ar_pose [4] and
camera_pose_calibration [13], but neither was found to be as reliable as ar-track-alvar, nor to
provide the pose data in a similarly directly usable structure. Based on these findings, it is
decided that ar-track-alvar is the best currently existing package for relative camera pose
estimation.
Provided with the intrinsic camera calibration parameters and
the physical size of the AR tag, the package returns the 6D pose
and encoded tag number. The pose is then used to determine the transform between sensor and
AR tag, resulting in a calibration with sufficient precision, as shown in section 4.2.
2Mobotware is a plug-in based software platform developed and used at Automation and Control, DTU
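The transform bookkeeping above can be sketched as plain 4x4 rigid-transform algebra: given the tag's pose in the camworld frame and the tag's pose as reported in the sensor's camera frame, the sensor pose follows by composing one transform with the inverse of the other. The helper names below are illustrative, not from the thesis code:

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid 4x4 transform: rotation -> R^T, translation -> -R^T t."""
    r = [[t[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    p = [-sum(r[i][j] * t[j][3] for j in range(3)) for i in range(3)]
    return [r[0] + [p[0]], r[1] + [p[1]], r[2] + [p[2]], [0, 0, 0, 1]]

def sensor_pose_in_world(t_world_tag, t_cam_tag):
    """T_world_cam = T_world_tag * (T_cam_tag)^-1, i.e. the sensor pose in
    the common camworld frame from one tag detection."""
    return mat_mul(t_world_tag, invert_rigid(t_cam_tag))
```

In the actual system this transform is what the tf_publisher node keeps broadcasting, so the driver-frame point clouds land correctly in the shared frame.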
Chapter 3
Developed Elements
As concluded in the analysis, a combination of existing software modules alone does not
provide all of the required functionality to implement the solution. It has been necessary to
develop additions to fill these gaps, as explained in this chapter.
3.1 Software structure
In order to make further development possible in the future, it is imperative that the software
structure of the solution is prepared for this. Implementing all features in a single node would be
possible, but it would also make it very troublesome to maintain and expand the source code.
The implemented types of nodes can be divided into four main categories:
Sensor driver
Core functionality of the sensors, taking care of receiving data and configuration of the
camera parameters and settings.
Calibration
Used only when it is decided that a calibration is needed. Stores the calibration data in a file
which will then be used until a new calibration is done.
Data providers
Handles data streamed from the driver and read from the calibration file, preparing these for
use by correcting depth estimates and keeping the information available.
Scene camera features
Main processing, filtering and segmentation of the acquired and prepared data, in the form
of point clouds. Provides most public topics and services, and expandable with creation of
add-ons.
An illustration of the categories, containing their respective nodes, can be seen in figure 3.1.
This also provides an overview of the included elements, of which the nodes openni_camera and
ar-track-alvar are used directly from existing ROS packages.
There is no strict launch order for the nodes, but no output will be generated if some of the
lower level nodes are not running. The openni_camera sensor driver does not depend on input from
other nodes, and is the obvious choice as the first node to be started. A separate driver node is
needed for each sensor, which is handled with two launch files inheriting most of their contents
from the same parent. Each node then publishes topics with RGB images, point clouds and depth
images.
Figure 3.1: Overview chart of software structure in the solution. Division into separate nodes, rather than one single combined node, allows extraction and use of individual components in other ROS solutions, as well as easy maintenance and the option of utilizing distributed processing.
It should be noted that it is not possible to subscribe to point clouds from both sensors at
the same time, if they are configured to 30 Hz operation, unless they are connected to separate
computers or USB busses. Even then, it would also require a computer with more than average
processing power to receive and handle the data. This is explained further in section 4.4.
The distance precision of point clouds available directly from the driver varies a lot with each
sensor unit, and is not adjustable through calibration in the driver. Because of this uncertainty it
is deemed necessary to include a depth_image_corrector node for generation of more precise point
clouds. This is to be based on calibration of each individual sensor, as explained in detail in section
3.2. The corrected point clouds are then published for further analysis in the main scene camera node, as well as for calibration purposes.
With corrected depth values, it is now possible to run a calibration of the sensor positions.
The cam_tf_generator is started from another launch file, and is configured to launch and call
ar-track-alvar as necessary. The requested sensor unit id is provided in the launch file and passed
on through a call to ar-track-alvar internally. With every new set of RGBD information a detection
of the AR tag is attempted, and once enough data samples have been collected the average relative
pose transform is calculated and saved to a file. This makes it possible to save calibrations through
system restarts.
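The sample-collection step of cam_tf_generator can be sketched as follows; the class name and sample threshold are hypothetical, and rotation averaging (which needs care with angle representations) is omitted here, with only the translations averaged:

```python
class PoseCalibrator:
    """Collects tag-detection samples and reports the mean translation once
    enough have arrived. Rotation averaging is omitted in this sketch."""

    def __init__(self, needed=50):
        self.needed = needed      # samples required before calibration is done
        self.samples = []

    def add_sample(self, xyz):
        """Record one detected tag translation; True when enough collected."""
        self.samples.append(xyz)
        return len(self.samples) >= self.needed

    def mean_translation(self):
        """Average of all collected (x, y, z) samples."""
        n = len(self.samples)
        return tuple(sum(s[i] for s in self.samples) / n for i in range(3))
```

Once the averaged pose is computed, writing it to a file is what lets the calibration survive system restarts, as described above.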
Returning to the data providers, the tf_publisher node reads the pose calibration file and ensures
that the transformation is continuously published for use in the point cloud alignment. Here,
too, one node is needed for each sensor. The calibration file is occasionally checked for new
parameters, to allow recalibration during operation.
With all of the required data, it is now possible to start the actual analysis in the scenecamera
node. This contains the core methods for initial data treatment, providing segmented and down
sampled point clouds for further analysis in following nodes. Some data parts may already be
usable in external modules at this point, but the main goal is to prepare and organize the data for
final analysis in smaller add-ons with more specific tasks.
clear that both fits confirm that the estimates deviate significantly from a 1:1 relation (the dashed
line).
(a) Full plot of all measurements. (b) Zoom on the area of interest.
Figure 3.2: The distance estimates for Xtion2 plotted in relation to measured distances. From the
offset to the cyan colored dashed guideline with correct y = x relation, it is clear that the sensor
estimates are not correct. With a polynomial fitting the estimation data it is possible to calculate the
correct distance from the estimates. The difference between a first and second order fit is hardly
noticeable on the plots, but the second order polynomial does fit the samples best.
It may be possible to enhance the existing structured light sensor driver in ROS by including
calibration parameters for the depth calculation, but this would also require manual inclusion of
the feature in all future versions of the official driver. Instead, an individual node is developed to complete this task as a part of the scene camera features.
Internally, the node is named depth_image_corrector, as this is what it does. It handles generation
of an RGBD point cloud, similar to what is done in the driver, but corrects the depth values according
to equation parameters supplied in its launch file. This equation is based on the polynomial line
fit to a set of experimentally obtained measurements, and only needs to be estimated once for
each sensor. The correction equation for the Xtion2 sensor is best estimated as the second order
polynomial: d_real = 0.00008·d_est² + 0.963·d_est + 24.33, with all distances in mm.
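Applied per pixel, the correction amounts to evaluating this polynomial for every depth value. A minimal sketch, using the Xtion2 coefficients quoted above (the function names are illustrative, not from the thesis code):

```python
def correct_depth(d_est_mm):
    """Calibrated distance (mm) from the driver's depth estimate (mm),
    using the second order fit found for the Xtion2 sensor."""
    return 0.00008 * d_est_mm ** 2 + 0.963 * d_est_mm + 24.33

def correct_depth_image(depth_image_mm):
    """Apply the correction to every pixel of a depth image (rows of mm values)."""
    return [[correct_depth(d) for d in row] for row in depth_image_mm]
```

Each sensor unit would have its own coefficient triple, estimated once from its measurement series.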
In the driver there is a step between the gathered data and creation of the point cloud, which
is a depth image where each pixel is encoded with its corresponding depth value. Already at
this point the distances are given in mm, which is the most raw data value available from the
sensor. With the raw sensor values unavailable, the only option is to do a recalibration of these
provided metric values. This is not ideal, but it will result in valid corrections as long as the initial
conversion does not change.
Iterating through the depth image to recalculate each value provides a corrected depth image
from which a point cloud is then generated. The point cloud is published and made available
alongside the one without recalibration, allowing the user to choose either. This continuous
processing of images and parallel creation of two point clouds is in principle not desirable, as it
is a waste of processing resources. However, because of the way ROS topics work, the original
point cloud is not generated unless a node is an active subscriber to it. The addition of the depth
image correction node will therefore not require additional processing power, as long as there are
no subscribers to the original point cloud.
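The subscriber-dependent behaviour can be illustrated with a small stand-in for the lazy-publishing pattern; this is a simplified sketch, not the rospy API:

```python
class LazyPublisher:
    """Runs its (possibly expensive) producer only while subscribers exist,
    mirroring how the uncorrected cloud costs nothing when unused."""

    def __init__(self, producer):
        self.producer = producer      # expensive message-building function
        self.subscribers = []         # callbacks of currently connected nodes
        self.compute_count = 0        # how often the producer actually ran

    def publish(self, raw_input):
        if not self.subscribers:
            return None               # nobody listening: skip the work entirely
        self.compute_count += 1
        msg = self.producer(raw_input)
        for callback in self.subscribers:
            callback(msg)
        return msg
```

With no subscribers registered, publish() returns immediately, which is exactly why the parallel uncorrected point cloud is free as long as nobody subscribes to it.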
3.5.1 Detection of rotation
A problem for the box finder add-on, in its current state, is that it relies on sizes of axis-aligned
bounding boxes. In situations where the searched boxes are not aligned with the camworld coor-
dinate system, the bounding boxes will represent a larger size than the actual box within. Small rotations of only a few degrees may be handled by setting the size tolerance a bit higher, but the
true box pose is still not found.
There is no existing PCL library providing this object-aligned bounding box estimation directly,
and development of such a method is on the boundary of the scope of this project. Some
considerations on the subject were made, but an implementation was not prioritized, as it could
be assumed that the box in the immediate demonstration would be axis-aligned.
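A minimal sketch of the axis-aligned matching the add-on relies on (function names and the tolerance value are illustrative): the bounding box is the per-axis min/max of a segment's points, and a candidate matches when its sorted extents agree with the requested dimensions within a tolerance.

```python
def axis_aligned_bbox(points):
    """Axis-aligned bounding box of a point cloud: (min_xyz, max_xyz)."""
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs

def matches_box_size(points, size, tol=0.02):
    """True if the bounding-box extents match `size` within `tol`.
    Sorting the extents makes the check independent of axis ordering,
    but, as noted above, a rotated box still inflates its extents."""
    mins, maxs = axis_aligned_bbox(points)
    extents = sorted(maxs[i] - mins[i] for i in range(3))
    return all(abs(e - s) <= tol for e, s in zip(extents, sorted(size)))
```

This also makes the failure mode concrete: rotating a box in the horizontal plane grows the min/max extents, so the sorted-extent comparison starts rejecting a box that is actually present.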
Two approaches to object alignment estimation have been investigated in this project. As
summarized in the following, one of these was found to be potentially useful.
Principal component analysis (PCA) was considered as a candidate for determining the rota-
tional alignment of objects, but was discarded after experimental results made it clear that this
was not ideal for hollow box shapes. Even if all sides of the box were visible, the estimated major
axis of the object would most likely run from one corner to the adjacent one, rather than parallel
with the box sides. This led to the conclusion that PCA is not very well suited for this task. If the
boxes had been solid, containing interior points as well, it might have been more plausible.
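For reference, the PCA idea discussed above boils down to taking the dominant eigenvector of the point covariance. A minimal 2-D sketch (illustrative only; this is not the thesis's experimental code):

```python
import math

def principal_axis_2d(points):
    """Major-axis direction (unit vector) of 2-D points, i.e. the largest
    eigenvector of the covariance matrix, via the closed-form 2x2 angle."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the dominant eigenvector of [[cxx, cxy], [cxy, cyy]]:
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)
```

Run on the perimeter points of a hollow rectangle, this axis need not align with the sides, which is the failure mode observed above.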
Figure 3.6: Individual visualization of the box
from figure 3.3 with normal vectors in key areas.
As only one box side is partly detected, it is hard
to use normal vector information from only this
to determine the orientation of the box.
Normal vector analysis was then considered, with the idea that normal vector histograms could
be used to estimate rotation from the angles of normal vectors on box sides.
An example of a box point cloud visualized
with its normal vectors can be seen in figure
3.6. The number of available normal vectors in the shown example makes it clear that the
current sensor positioning makes it unlikely to have many of the box sides detected at once,
which could prove to be a problem. Without sides to analyse for normal vector alignment,
estimation of orientation is not easy. Analysis
of the rectangular shape of the visible top layer
would probably provide better results in such
worst case scenarios.
The principle of normal vector analysis is
considered viable for use in orientation estima-
tion of boxes, as well as other types of feature
recognition.
3.6 Conclusion
Several elements have had to be developed to make the scene camera solution fulfil the given
requirements.
Focus has been on the first part of the perception sequence, allowing robust and correct data acquisition and segmentation of objects into separate point clouds, ready for further analysis. The
second perception part with object recognition or classification has been introduced in the form
of a box finder add-on, but actual object recognition is considered to be outside the scope of this
project.
Calibration and data provider nodes have been developed to create an easy calibration proce-
dure, partly consisting of methods allowing inclusion of the existing ROS package ar-track-alvar.
A method for calibration of the depth estimates of the structured light sensors has been shown
to be necessary, making it possible to achieve higher precision. Such a functionality has been
integrated, in the form of a depth_image_corrector node which can also be used directly in other
ROS applications using similar sensors.
A core scenecamera node has been developed to merge, down sample, filter and segment the
acquired point clouds, using algorithms from the PCL. The output from this is ready for use
in add-ons and is published in several forms, alongside simple bounding box geometries
allowing obstacle avoidance through motion planning.
Because of specific needs in a demonstration for estimating the location of a box in the scene, a
box finder service add-on has been developed. This can also serve as a near minimal example of
scene camera add-on construction, for use as a reference to new developers.
In order to verify the robustness, reliability and precision of the individual elements, as well as
the collective solution as a whole, experimental tests are documented in the next chapter.
4.1 Impact of noise sources
The Robot Co-Worker concept is to be functional in existing industrial environments in produc-
tion lines, and must be able to function even though conditions are not necessarily optimal. It may
be possible to adapt the work environment slightly to shield the scene camera from most externalinfluence, but some interference between the sensors of the cell itself should also be expected.
4.1.1 Ambient light
Light conditions in the scene are always important for vision based systems. Where many
cameras can be manually adjusted to have white balance and focus distance changed, the Kinect
and Xtion sensors handle such adjustments automatically, with no option for manual adjustment.
In most cases this is sufficient, but it makes the implementation dependent on external light
sources, with no option to adapt.
Figure 4.1: Example of how ambient sun light can
affect the acquisition of 3D data sets with the IR
based structured light sensors. The area of the
table with highest concentration of IR light from
both sensors and the sun is undetectable under
these conditions.
A critical issue arises when saturation occurs for the camera with IR filter, which is used
for 3D data generation. Artificial light sources have not been seen to cause such issues, unlike
sunlight. Direct sunlight has the most significant impact, but diffuse reflections from surfaces
or through windows have proven to be an issue as well.
The image in figure 4.1 shows the impact of diffuse sunlight in the work area. The entire
center of the table surface is not registered, because the saturation of IR light in that area
makes it impossible for the sensor to distinguish the projected points.
It is important that the entire work space is shielded from strong sunlight, in order to
ensure reliable use of the structured light cameras. A curtain had to be added to one side of the
Robot Co-Worker demonstration cell which is oriented towards several windows. To further
control the lighting conditions, a roof with LED lights has also been added. This is not required
by the scene camera solution, but it is necessary for the tool camera to have as close to static
light conditions as possible.
4.1.2 Overlapping structured light patterns
Some impact is expected from using two structured light sensors to monitor the same
area, as both emit IR dot patterns for depth estimation. As long as alignment of all
nine calibration points [10] is avoided, only limited conflict between the two is noticeable.
On most surfaces in the scene each sensor has no problem recognizing its own pattern
projection and ignoring any extra dots. However, surfaces with reflective properties can sometimes
be oriented so that very little returned light is visible to the IR camera, normally causing a small
hole of missing data in the resulting point cloud. In these cases it can be critical if one or more of
the dots projected by the other sensor are at an angle where the light is better reflected, causing
the wrongful assumption that this is the missing dot. Any shift in position of such dots is assumed
by the sensor to be caused by an object in the scene, resulting in a spike in the point cloud. An
example of this is shown in figure 4.2, where the black area of an AR tag causes a significant spike.
Figure 4.2: At the upper right corner of the black square AR tag, a significant spike in the depth
reading of a small area can be seen. This primarily occurs with two sensors pointed at the same
reflective surface.
Of all objects used in the demonstrations through this project, very few have caused such
significant problems. The AR tag, which is only in the scene during calibration, as well as the
black tape marking the box zone, are the primary ones. In both cases the issue is not critical,
as these spikes only consist of a few points, which are not enough to pass through the
segmentation process and be considered as objects.
The worst case is if a real object is positioned very close to a spike, having the spike
included in the segmentation of that object because of the small distance between them.
However, this is unlikely, as the object will usually block the projected pattern from one of
the two sensors.
It should be kept in mind that these spikes can occur in the 3D data stream, but simple averaging
with various filtering techniques will make them negligible.
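A minimal sketch of why averaging suppresses such spikes is a per-pixel median over a few consecutive frames. This is an illustration of the principle, not the specific filter used in the implementation:

```python
import numpy as np

def temporal_median(depth_frames):
    """Per-pixel median over a stack of depth images. A spike that appears
    in only a minority of the frames is rejected, while stable surfaces
    keep their depth reading."""
    return np.median(np.stack(depth_frames), axis=0)
```

Because the interference spikes are transient and only cover a few points, even a short window of frames is enough to remove them from the merged cloud.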
4.2 Precision of sensor position calibration
In the analysis it was concluded that the ar-track-alvar package can provide sufficiently precise
6D pose estimates of the two sensors relative to a common reference. It was also seen that the
RGB and RGBD modes gave different calibration results which deviated on either position or
orientation. To further investigate this, pose estimation data from a calibration sequence has been
plotted for analysis as shown in figure 4.3.
The two available modes for pose estimation were run on the same data sequence of approximately
20 seconds length, which had been recorded and played back using the rosbag functionality.
An AR tag with quadratic sides of 44.7cm has been used for calibration purposes. A smaller variant
of 20cm was also tested, but the best accuracy was achieved with the larger one. For unknown
reasons the two modes register the AR tag with a π/2 rotation offset, which has been normalized
in the plotted data.
The 100 2D estimations deviate very little because of lower sensor noise than with 3D, meaning
that fewer than 100 measurements would be sufficient to determine a mean value. Noise on the 3D
data readings is much more significant, making clear the need to average over many measurements.
The estimates from RGB images in 2D and from RGBD in both 2D and 3D give different
results on several parameters, of which the translation along the y-axis and the roll and pitch
rotations are the most obvious.
(a) Estimates of x, y & z coordinates. (b) Estimates of roll, pitch & yaw.
Figure 4.3: Calibration data showing the estimate differences between the 2D only and combined 2D
and 3D methods for 6D pose estimation. The difference in position is around 5cm, and the offset in
orientation is around 0.1 radians in both roll and pitch estimates.
The data plots in figure 4.3 confirm the estimation offsets seen back in figures 2.13 and 2.14.
From these visualizations of point cloud alignment with both calibrations it is concluded that the
best position estimate is achieved with the 2D method, suggesting that the combined method gives
a position error of around 5cm for this sensor. However, the 2D method is lacking precision in its
orientation estimate, which is around 0.1 radians off compared to the more accurate result from
the combined method. If run on a 64-bit system, supporting the combined 2D and 3D method, it
could be advantageous to use RGBD for position estimation and RGB for orientation estimation.
On 32-bit systems the RGB method is the only one available, so a small error in orientation offset
should be expected. This can either be accepted as it is, or adjusted manually after the calibration.
In any case, it is obvious that a more precise pose estimation method could improve precision,
preferably through development of a new package with both 32-bit and 64-bit compatibility.
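A combined calibration along these lines could average many samples per mode and then take the translation from the RGBD estimates and the orientation from the RGB estimates. The following sketch assumes this split (the function is hypothetical, not part of ar-track-alvar), and uses a circular mean so angles near ±π average correctly:

```python
import numpy as np

def fused_pose(rgbd_positions, rgb_rpy):
    """Average N noisy pose samples per mode and combine them:
    translation from the RGBD (2D+3D) estimates, orientation from the
    RGB-only estimates, matching the accuracy observed for each mode."""
    position = np.mean(rgbd_positions, axis=0)
    # Circular mean per angle, so samples near +/- pi do not cancel out.
    angles = np.asarray(rgb_rpy)
    orientation = np.arctan2(np.mean(np.sin(angles), axis=0),
                             np.mean(np.cos(angles), axis=0))
    return position, orientation
```

With the roughly 100 samples recorded during a calibration sequence, the noise on the mean shrinks well below the per-sample deviations discussed above.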
4.3 Distinguishability of 3D features
It is desirable that the 3D data collected by the scene camera solution is of sufficient resolution
to allow object recognition, which could be very useful in many scenarios. Depending on the
size of objects and their features, it is possible to adjust the voxel grid resolution to reflect
detail requirements. In these examples the highest possible precision of around 2mm resolution is used.
A key object in the main demonstration of the Robot Co-Worker is the cardboard box with
transformers placed in foam. Figure 4.4 shows visualizations of the point cloud segmentation
of such a box, with both RGB and a z-axis gradient coloring. The positioning of the box relative
to both cameras is as shown in figure 2.15, which also makes it clear that full vision of the box
is not obtained with both cameras. With only a single box side partly visible to one of the
sensors, it is hard to classify this as a box from the point cloud data alone. An object is still
detected through segmentation as shown in figure 3.4, but no true classification is done, as all
objects are fitted with axis-aligned bounding boxes, regardless of shape. A box classification
algorithm would most likely require more sides of the box to be visible.
(a) RGB colored transformer box point cloud. (b) Same point cloud with z-axis gradient color.
Figure 4.4: Example of a single object point cloud after segmentation. Further analysis of the object
could be conducted to determine how many transformers are left in the box, and where these are
placed; either from image analysis of the RGB data (left image), or the elevation level of distance
readings (right image).
Looking at the surface of the box, the data from the tops of transformers and inside empty slots is
very detailed. For each of the eight slots, both analysis of RGB values and analysis of depth look
to be viable approaches to determining which are occupied by a transformer. From the 3D data
this is possible because of the feature size, as each hole is around 7cm x 6cm x 6cm, which leads to
a considerable feature change when a transformer is placed in it. Mounting brackets are placed on
both sides of the transformer, but as these are not the same size, the transformer is not symmetrical.
While it would be beneficial to know the orientation of a transformer, this is hardly possible
with even the highest resolution of the scene camera solution. It may be possible under optimal
conditions, but in a dynamic work environment a very high success rate would not be expected.
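The depth-based variant of this slot check can be sketched as comparing the mean elevation inside each slot's footprint against the empty-slot floor. The function, slot bounds format and fill height threshold are illustrative assumptions, not the demonstration code:

```python
import numpy as np

def occupied_slots(cloud, slot_bounds, fill_height=0.03):
    """Decide per foam slot whether a transformer is present, by comparing
    the mean z of points inside the slot's (x, y) footprint against the
    empty-slot floor plus fill_height (metres).
    slot_bounds is a list of (xmin, xmax, ymin, ymax, z_floor) tuples."""
    result = []
    for xmin, xmax, ymin, ymax, z_floor in slot_bounds:
        mask = ((cloud[:, 0] >= xmin) & (cloud[:, 0] <= xmax) &
                (cloud[:, 1] >= ymin) & (cloud[:, 1] <= ymax))
        z_mean = cloud[mask, 2].mean() if mask.any() else z_floor
        result.append(z_mean > z_floor + fill_height)
    return result
```

Since each slot is around 7cm x 6cm x 6cm, the elevation change from a placed transformer is far larger than the roughly 2mm point cloud resolution, which is what makes this approach viable.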
Figure 4.5: Point cloud of metal heat sink with a
mounted transformer. Only some parts of major
features are visible.
Another example is the metal heat sink in which the transformers are to be placed. Even
though the metal surface is matte, surfaces at certain angles are still not detectable with
structured light. This makes it unlikely that points on all object features are always registered
and available for recognition. The point cloud result from object segmentation of the heat sink
with a transformer is shown in figure 4.5, which also corresponds to the RGB images in figure
2.15. Most of the key features are too small to be registered, and some of the detected features
are only partly included. It is hard to say if recognition would be possible, but it would
definitely not be reliable.
This is as expected from the beginning of the project, confirming the assumption that the inclusion
of a tool camera is necessary for reliable object recognition. For larger objects it may be possible to
successfully analyse major features, but reliable classification or recognition cannot be expected
with these sensors.
4.7 Test conclusion
Several critical aspects of the implementation have been investigated, and evaluations of these
have been used to determine the usability and dependability of the scene camera solution.
It has been pointed out that shielding from external IR light sources is crucial for proper data
collection with structured light sensors. In particular it is important to avoid sunlight in the work
area. Potential issues caused by interference between the two light patterns projected on the
same surfaces have been investigated, but are not deemed to have critical impact in most detection
scenarios.
The 6D pose estimates of the sensors have been shown to differ between the RGB and RGBD
methods. Neither provides a perfect pose estimate, but the RGB approach only has a small deviation
in orientation, making it the most suited candidate for calibration use. Ultimately, a combination
of both methods, or development of a new package, would provide more precise results.
With large objects it is possible to obtain 3D data of high enough detail to distinguish major
features, which could be used in the Robot Co-Worker demo to determine how many transformers
are left in a box, as well as their positions. Smaller features, especially on metallic objects, cannot
be expected to be distinguishable in the point clouds.
Multi-core processors are best suited to run the solution, as the separation into nodes makes it
possible to assign these to different processor cores for improved performance. Running at full
resolution, with no downsampling, the solution requires a powerful processor in order to
run at a decent rate. With downsampling to a 1cm voxel grid it is possible to achieve a full
processing cycle rate of 1Hz or faster, even on a laptop processor.
Multiple demonstrations have been carried out to present examples of intended use of the
scene camera solution in combination with the Robot Co-Worker. These demonstrations have
shown that the project solution requirements are met, and that the implemented features are both
useful, reliable, and portable to ROS environments on other systems as well.
Chapter 5 Future Work
The goal of this project has been to develop a scene camera solution to be generally useful in
integration with the DTI Robot Co-Worker prototype. A few specific uses have also been requested
for demonstrations, as examples of how the generated point clouds can be used. Targeting
both general usability and specific requests, suggestions for future work are divided into two
categories to reflect the difference between improvements and expansions.
Improvements include aspects of the implemented solution which are open to significant
quality enhancement. These are the suggestions for future work to improve the existing
implementation:
- The calibration procedure currently relies on the ar-track-alvar package, which has been
  shown to have room for improvement. Refinement of how the package is used, or replacement
  with another, could make calibrations more precise.
- Object segmentation with the kdtree method has proven to be computationally heavy,
  suggesting that this approach could be reconsidered, possibly resulting in a more efficient
  implementation.
Expansions are ways of adding new features and functionalities on top of the existing
implementation, most likely through development of add-ons. These are suggested areas considered
worth looking into, because they will add useful tools based on the existing core implementation:
- A functionality for monitoring and keeping track of all objects in the scene could be useful
  for identifying when changes occur in the scene or a new object is detected.
- An add-on capable of analysing the content of a container in the scene, in order to determine
  if it is empty, or in which region of the container there are still objects left to be picked.
- Object classification or recognition could be useful in scenarios where larger objects with
  distinguishable features are present. This could also help determine the orientation of
  recognizable objects.
As expected, and desired, the implemented scene camera solution has presented some useful
options for further development. Which of these are most relevant to prioritize comes down to
which demonstration tasks are to be shown in the future.
Chapter 6 Conclusion
Through completion of this project a scene camera solution for the DTI Robot Co-Worker
prototype has been developed, integrated and verified.
From analysis of the scene and investigation of existing modules it was decided to create a
completely new implementation, tailored to match requirements for use with the Robot Co-Worker.
The developed solution serves as a fully functional scene camera module, including necessary
core features for calibration and reliable generation of segmented RGBD point clouds, as well as
additional functionalities to allow pose estimation of objects.
Fulfilment of the initial project goals has been achieved by development of:
- An automatic calibration solution, estimating and saving 6D pose estimates of the structured
  light sensors upon placement of an AR tag in the scene and execution of a script. Calibration
  during operation is possible, and values are retained even after restart of the system.
- Functionality to allow easy configuration of point cloud density through adjustment of the
  voxel grid resolution used for downsampling. This makes it possible to adjust 3D data
  resolution to match varying requirements in multiple situations.
- A correction node for adjustment of estimated depth values, which has proven to be crucial
  in order to obtain reliable distance measurements from the sensors.
- The scene camera core, which processes obtained 3D data and publishes segmented point
  clouds of objects in the scene, as well as bounding box geometries for obstacle avoidance in
  motion planning.
- An add-on providing a ROS service to locate an axis-aligned bounding box of specific size in
  the scene and return its pose for use in external modules.
In cooperation with the development team of the Robot Co-Worker, the scene camera solution has
been successfully integrated with the current prototype. This has made it possible to perform tests
in a wide variety of realistic scenarios, through demonstrations of performance in example cases
suggested by industrial partners.
List of Appendices
A Appendix - Package installation instructions
A.1 How to do a fresh installation
A.2 How to launch the nodes
B Appendix - Wiki content
C Appendix - Depth estimation offset for sensors
D Appendix - CD contents
E Appendix - Code
B. Appendix - Wiki content
During development of the scene camera, an internal wiki webpage was created as a reference
for the members of the Robot Co-Worker team at DTI. The content has not been streamlined for
external use, as it is just meant to serve as a basic guideline for use of the scene camera solution.
Included in the following four pages is a snapshot of the wiki near the end of this project. It
should be noted that the wiki format is primarily created to be viewed in a browser, not for nice
printing layouts.
C. Appendix - Depth estimation offset for
sensors
Xtion 1 correction equation (converted to [mm]): y = 0.00001x² + 0.989x + 15.5
Xtion1 distance estimates
est. [m]  act. [m]  error [m]  error [%]
0.538 0.55 0.012 2.23
0.634 0.65 0.016 2.52
0.733 0.75 0.017 2.32
0.833 0.85 0.017 2.04
0.931 0.95 0.019 2.04
1.029 1.05 0.021 2.04
1.129 1.15 0.021 1.86
1.228 1.25 0.022 1.79
1.323 1.35 0.027 2.04
1.418 1.45 0.032 2.26
Table 1: Estimate results for the Xtion1
sensor with factory calibration.
Figure 1: Xtion1 estimates and measurements.
Xtion 2 correction equation (converted to [mm]): y = 0.00008x² + 0.963x + 24.33
Xtion2 distance estimates
est. [m] act. [m] error [m] error [%]
0.619 0.65 0.031 5.01
0.711 0.75 0.039 5.49
0.801 0.85 0.049 6.12
0.892 0.95 0.058 6.50
0.984 1.05 0.066 6.71
1.072 1.15 0.078 7.28
1.161 1.25 0.089 7.67
1.247 1.35 0.103 8.26
1.331 1.45 0.119 8.94
1.414 1.55 0.136 9.62
Table 2: Estimate results for the Xtion2
sensor with factory calibration.
Figure 2: Xtion2 estimates and measurements.
Kinect 1 correction equation (converted to [mm]): y = 0.000001x² + 0.984x + 7.99
Kinect1 distance estimates
est. [m] act. [m] error [m] error [%]
0.547 0.55 0.003 0.55
0.650 0.65 0.000 0.00
0.750 0.75 0.000 0.00
0.848 0.85 0.002 0.24
0.949 0.95 0.001 0.11
1.050 1.05 0.000 0.00
1.149 1.15 0.001 0.09
1.248 1.25 0.002 0.16
1.346 1.35 0.004 0.30
1.448 1.45 0.002 0.14
Table 3: Estimate results for the Kinect1
sensor with factory calibration.
Figure 3: Kinect1 estimates and measurements.
Kinect 2 correction equation (converted to [mm]): y = 0.000001x² + 1.014x - 43.53
Kinect2 distance estimates
est. [m] act. [m] error [m] error [%]
0.583 0.55 -0.033 -5.66
0.681 0.65 -0.031 -4.55
0.776 0.75 -0.026 -3.35
0.873 0.85 -0.023 -2.63
0.972 0.95 -0.022 -2.26
1.070 1.05 -0.020 -1.87
1.164 1.15 -0.014 -1.20
1.264 1.25 -0.014 -1.11
1.360 1.35 -0.010 -0.74
1.453 1.45 -0.003 -0.21
Table 4: Estimate results for the Kinect2
sensor with factory calibration.
Figure 4: Kinect2 estimates and measurements.
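The correction equations above can be applied directly to raw depth estimates. The following sketch (hypothetical helper name) reproduces the Xtion1 fit and checks it against rows of Table 1: the first table row, est. 538mm for an actual 550mm, corrects to roughly 550.5mm:

```python
def correct_depth(est_mm, a, b, c):
    """Apply a per-sensor quadratic depth correction y = a*x^2 + b*x + c,
    with both the raw estimate x and the corrected value y in millimetres."""
    return a * est_mm ** 2 + b * est_mm + c

# Coefficients from the Xtion 1 correction equation above.
XTION1 = (0.00001, 0.989, 15.5)
```

The same form with the other coefficient sets applies to the Xtion 2 and the two Kinect sensors; only the fitted constants differ.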
D. Appendix - CD contents
Along with this report comes a CD with files related to the project.
These are the folders on the CD, and their content:
Images Full resolution versions of images included in the report.
Packages Source files in the form of ROS packages ready for compilation.
References Pdf files of all resources from the resource list of the report.
Report Pdf version of this report itself.
Videos Video files of the demonstrations mentioned in the report.
References
[1] DIRA. Robot Co-Worker - Information og demonstration. URL: http://www.dira.dk/nyheder/?id=519 (visited on 30/05/2013).
[2] Willow Garage. PR2 Tabletop Manipulation Apps. URL: http://ros.org/wiki/pr2_tabletop_manipulation_apps (visited on 22/06/2013).
[3] The Danish Technological Institute. DTI Robot Co-Worker for Assembly. URL: http://www.teknologisk.dk/ydelser/dti-robot-co-worker-for-assembly/32940 (visited on 21/06/2013).
[4] Ivan Dryanovski, William Morris, Gautier Dumonteil et al. Augmented Reality Marker Pose Estimation using ARToolkit. URL: http://www.ros.org/wiki/ar_pose (visited on 20/06/2013).
[5] Scott Niekum. ar-track-alvar Package Summary. URL: http://www.ros.org/wiki/ar_track_alvar (visited on 31/05/2013).
[6] PCL. Euclidean Cluster Extraction - Documentation. URL: http://www.pointclouds.org/documentation/tutorials/cluster_extraction.php (visited on 09/06/2013).
[7] PCL. How to use a KdTree to search. URL: http://pointclouds.org/documentation/tutorials/kdtree_search.php (visited on 24/06/2013).
[8] ROS. Intrinsic calibration of the Kinect cameras. URL: http://www.ros.org/wiki/openni_launch/Tutorials/IntrinsicCalibration (visited on 31/05/2013).
[9] SMErobotics. The SMErobotics Project. URL: http://www.smerobotics.org/project.html (visited on 21/06/2013).
[10] Mikkel Viager. Analysis of Kinect for Mobile Robots. Individual course report. Technical University of Denmark DTU, Mar. 2011.
[11] Mikkel Viager. Flexible Mission Execution for Mobile Robots. Individual course report. Technical University of Denmark DTU, July 2012.
[12] Mikkel Viager. Scene analysis for robotics using 3D camera. Individual course report. Technical University of Denmark DTU, 2013.