3D body measurements with
a single Kinect sensor
Nikolai Bickel
Supervisor: Dr. Bogdan Matuszewski
A project report submitted in partial fulfilment of the Degree of BSc (Hons.) Computing
26/04/2012
Double Project CO3808
University of Central Lancashire
Honours Degree Project Nikolai Bickel Page 2 of 69 26/04/2012
I. Abstract
This report describes the project that has been undertaken as part of the final Degree of
Computing BSc (Hons) at the University of Central Lancashire.
The Microsoft Kinect holds the Guinness World Record as the "fastest selling consumer
electronics device": 8 million units were sold in its first 60 days, and that was just the
beginning. By January 2012, 18 million Kinects had been sold worldwide. Most of these
devices were used with the Xbox gaming console as an input device in place of the normal
game controller, allowing the console and selected games to be controlled with gestures and
voice commands.
Very soon after this commercial success, a few enterprising engineers found that it was
possible to connect the Kinect not just to the gaming console but also to a standard
personal computer. Soon after a connection was established, they were also able to
access the data from the Kinect's sensors.
The Kinect as a 3D input device for computer vision applications, and much more, was born.
Its good accuracy, relative to its price, makes it a great 3D measurement device.
Because working with the Kinect is very interesting, this project analyses whether it is
possible to use the Kinect as an input device for body measurements such as height or chest
circumference, with the aim of using this data to communicate clothing sizes. That initial
idea was later narrowed to producing the 3D body model and measuring body height.
In this project the different capabilities of the Kinect and the ways to access and analyse its
data were explored. The next step was to run experiments with real people to assess how
accurate the results are compared to real-world measurements.
This report describes the research around the Kinect project, its capabilities and the
results of the body scanning experiments and accuracy analysis.
II. Acknowledgements
I would like to thank
- my supervisor Dr. Bogdan Matuszewski for his help and guidance over the whole
project lifecycle
- my family for supporting my study in Preston
III. Table of Contents
I. Abstract ........................................................................................................................... 2
II. Acknowledgements ......................................................................................................... 2
III. Table of Contents ............................................................................................................ 3
1. Introduction ..................................................................................................................... 5
1.1. Context......................................................................................................................... 5
1.2. Overview ..................................................................................................................... 5
2. Analysis and Investigation .............................................................................................. 6
2.1. Chapter Summary ........................................................................................................ 6
2.2. 3D human body scanning ............................................................................................ 6
2.2.1. What is 3D? .......................................................................................................... 6
2.2.2. What can be scanned? .......................................................................................... 6
2.2.3. Problems ............................................................................................................... 7
2.2.4. Use-cases for 3D body models .............................................................. 7
2.3. Project management .................................................................................................... 8
2.3.1. Project Proposal .................................................................................................... 8
2.3.2. Technical Plan ...................................................................................................... 8
2.3.3. Literature Review ................................................................................................. 8
2.3.4. Supervisor Meetings ............................................................................................. 8
2.3.5. Risk Analysis ........................................................................................................ 9
2.4. Microsoft Kinect ........................................................................................................ 10
2.4.1. What is a Microsoft Kinect? ............................................................................... 10
2.4.2. Hardware details ................................................................................................. 11
2.4.3. Principles of Kinect ............................................................................................ 12
3. Design Issues ................................................................................................................. 14
3.1. Chapter Summary ...................................................................................................... 14
3.2. Kinect interfaces ........................................................................................................ 14
3.2.1. OpenNI framework ............................................................................................ 14
3.2.2. OpenKinect ......................................................................................................... 16
3.2.3. Microsoft Kinect SDK ....................................................................................... 18
3.3. RGB-Demo/Nestk (data processing) ......................................................................... 20
3.3.1. Current features of the RGBDemo ..................................................................... 20
3.3.2. Installation and Compilation .............................................................................. 21
3.3.3. RGBDemo installation ....................................................................................... 22
3.4. Supporting tools ......................................................................................................... 23
3.4.1. CMake ................................................................................................................ 23
3.4.2. Git ....................................................................................................................... 25
3.4.3. Meshlab .............................................................................................................. 25
3.4.4. Microsoft Visual Studio ..................................................................................... 25
4. Implementation .............................................................................................................. 26
4.1. Chapter Summary ...................................................................................................... 26
4.2. Kinect Calibration...................................................................................................... 26
4.3. Collect data ................................................................................................................ 28
4.3.1. How to save the Kinect data ............................................................................... 28
4.3.2. What data are saved per frame? ......................................................................... 30
4.4. Accuracy measurements ............................................................................................ 31
4.4.1. Random error ...................................................................................................... 31
4.4.2. Systematic error .................................................................................................. 31
4.4.3. The setup ............................................................................................................ 32
4.4.4. Colour Segmentation .......................................................................................... 32
4.4.5. Histograms ......................................................................................................... 33
4.4.6. Standard deviation .............................................................................................. 34
4.4.7. Error map ............................................................................................................ 34
4.4.8. Problems ............................................................................................................. 34
4.4.9. Results ................................................................................................................ 34
4.5. 3D reconstruction ...................................................................................................... 35
4.5.1. PCL Kinect Fusion ............................................................................................. 35
4.5.2. RGBDemo Reconstructor .................................................................................. 35
4.5.3. Implementation ................................................................................................... 39
4.5.4. Problems/Solutions............................................................................................. 41
4.6. Point cloud processing ............................................................................................... 43
4.7. Meshing ..................................................................................................................... 44
4.8. Measurements ............................................................................................................ 45
5. Critical Evaluation ......................................................................................................... 46
5.1. Chapter Summary ...................................................................................................... 46
5.2. Project Management Evaluation ................................................................................ 46
5.3. Design and Implementation ....................................................................................... 46
5.4. Possible Improvements .............................................................................................. 47
5.5. Learning Outcomes .................................................................................................... 48
6. Conclusion ..................................................................................................................... 48
7. References ..................................................................................................................... 49
8. List of Figures ............................................................................................................... 52
9. Appendix A ................................................................................................................... 53
10. Appendix B ................................................................................................................... 56
11. Appendix C ................................................................................................................... 61
12. Appendix D ................................................................................................................... 62
1. Introduction
1.1. Context
There are a few problems when buying clothes online, the most common being that the
purchased clothes do not fit. This is exacerbated by the fact that many users do not know their
own size or the size of those they are purchasing for (such as parents who purchase garments
for their children). Many people deal with this problem by ordering several sizes of the same
clothes and sending back the excess.
For those who buy several sizes it can be a nuisance to return the excess clothes. Additionally,
if a customer wishes to purchase clothes for a special occasion, they might be reluctant to
order online as they might be unsure of the fit of the clothes ordered. The online stores
also bear the costs associated with this problem, as they usually pay the shipping costs for the
returned items, and also deal with several logistical issues along with the costs associated with
the resale of the items. The project idea was to obtain the measurements with the help of a
Microsoft Kinect, a natural target because the device is a cheap depth scanner.
A customer should be able to upload their measurement data to a website so that the online
shop can check whether the clothes the person wants to buy would fit. However, this was
not part of this project. The main work was the collection, analysis and combination of the
data from the Kinect with the help of several different tools explained in this report.
In addition to the 3D reconstruction, a setup and a program to analyse the accuracy and noise
of the Kinect data were made.
The first thought for the usage of the project outcome was online shops, but a 3D body
model could be useful in various other applications, for example to generate a virtual game
character (avatar) for computer games, or for medical analysis.
1.2. Overview
Chapter 2 (Analysis and Investigation) contains general information about 3D, especially in
the context of human bodies. It also contains project-relevant information, for example about
project meetings, additional project documents and an evaluation of the risk analysis.
Chapter 3 (Design Issues) contains information about the interfaces to connect the Kinect to
the computer. It also contains information about the program RGBDemo which was used in
this project. The next part introduces a few other supporting programs.
Chapter 4 (Implementation) contains more information about how the 3D body model was
produced and anything else that is important to produce a 3D body model. The chapter also
contains information about the Kinect error and the accuracy experiment that has been made
during this project.
Chapter 5 (Critical Evaluation) contains evaluations and thoughts about the different aspects
of the project. It also includes possible improvements and learning outcomes.
2. Analysis and Investigation
2.1. Chapter Summary
This chapter contains general information about 3D, especially in the context of human
bodies. It also contains project-relevant information, for example about project meetings,
additional project documents and an evaluation of the risk analysis.
2.2. 3D human body scanning
2.2.1. What is 3D?
Figure 1 2D, 2.5D and 3D (D’Apuzzo, 2009)
3D stands for three-dimensional. Objects in our world can be described by three parameters.
These three dimensions are commonly called length, width and depth. That is also the reason
why the Kinect is sometimes called a "depth sensor": the depth sensor of the Kinect can
capture the scene in front of it in these three dimensions.
2.2.2. What can be scanned?
It is possible to scan the whole body or just parts of it, for example the chest, the back, the
face or the legs. In this project a full body scan of a person is used. Scanning only a part of
the body is advantageous when there is a specific interest in that part, for example in medical
applications.
2.2.3. Problems
When working with a human body as a scanning object there are some problems, which are
summarized in the presentation of D'Apuzzo (D'Apuzzo, 2009):
- Practical problems
  o Movements
  o Breathing
  o Hair
- Physical limits
  o Stature
  o Size
  o Weight
- Private data
Especially the movements during scanning were a big problem when reconstructing the
body.
2.2.4. Use-cases for 3D body models
Animation – When a 3D body model of a person is available it is possible to animate the
body with computer graphics techniques. Such animation could be useful in computer games.
Ergonomics – For example, a company could produce a chair tailored to a specific person's
body.
Fitness / Sport – A series of scans over time could show the progress of weight loss
(motivation).
Medicine – A 3D model of a face could be useful for plastic surgery.
2.3. Project management
2.3.1. Project Proposal
The purpose of the project proposal (see Appendix A) was to describe the problem to be
tackled in the project and, in general terms, how it should be solved. The proposal includes
relevant literature found during an initial search, and an initial idea for tackling the
problem.
2.3.2. Technical Plan
In the technical plan (see Appendix B) the project is specified in more detail. It also contains
project-management material such as project deadlines. At this stage of the project a risk
analysis was made and potential ethical or legal issues were discussed. The technical plan
additionally contains a small commercial analysis of the costs to be expected.
An important part was also scheduling the project. This was realized with a Gantt chart (see
Appendix C), which illustrates the different project stages in the form of a bar chart.
2.3.3. Literature Review
The literature review contains a discussion of the published work around the project topic.
Because the Kinect is relatively new, it was not easy to find literature specific to it. The title
of the literature review is "Build a 3D body model with a single Kinect sensor". The work on
the literature review gave an overview of what is important in this project.
2.3.4. Supervisor Meetings
There were several meetings and conversations with the supervisor, Dr. Bogdan Matuszewski,
during the course of the project from September until the end of April.
From the start of the academic year in September until the Christmas vacation we met nearly
every week. Those meetings were shared with electronics bachelor students who also had
Dr. Matuszewski as a supervisor.
These meetings were organized so that every week one person had to present his project to
the other students, starting with his thoughts about the project. For this, a PowerPoint
presentation was made to illustrate the problems. The purpose of the presentation was to
learn how to present project progress, while the students listening learned to think about and
critique other projects. Sometimes the other projects introduced thoughts and ideas which
could then be adapted to this project; for example, the colour segmentation (see chapter
4.4.4) was suggested in one of these meetings.
In the second semester there were only individual meetings with Dr. Matuszewski, so he
could go into more detail when supervising. These meetings happened approximately every
two to three weeks. In these meetings feedback was also given on the literature review, the
pre-Christmas progress review, the post-Christmas progress review and the acceptance check.
2.3.5. Risk Analysis
Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best out of the data obtained from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK, OpenNI) | High | Medium | Try to configure the tools the right way so that the interfaces give the best results possible
Robustness – the data of 2 measurements do not match | Medium | High | Try to reduce the error rate as much as possible
The Kinect sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Try the best, otherwise reduce the measurement points
Scheduling failure, not enough time to complete project | High | Medium | Make a Gantt chart
Table 1 Risk analysis
This is the risk analysis made for the project as part of the technical plan (see Appendix B) at
the beginning of the project. This chapter discusses these risks and how they affected the
project.
The first risk was noisy data from the Kinect. Every measurement device has some noise, and
because the Kinect was designed as a controller for a gaming console, perfect accuracy was
never required of it. But it still works, as the results at the end of the report show.
The data from the Kinect interfaces on the PC were quite good. There was no problem with
the Kinect interfaces (see chapter 3.2) giving inaccurate values.
The Kinect sensor did not break, but the hard drive did. The likelihood of this issue was rated
"Low". However, a backup of all project-relevant data was available, so the impact on the
project was not very big. It was just a small problem, although a project meeting with the
supervisor had to be rescheduled because the replacement hard drive was not delivered on
time.
2.4. Microsoft Kinect
2.4.1. What is a Microsoft Kinect?
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011)
The way the Microsoft Kinect is used in this project is not its usual purpose. Microsoft
announced the Kinect as a gaming input device. It interacts with the Xbox, a video game
console manufactured by Microsoft, as an optional peripheral device. The Kinect is
connected by wire to the Xbox and enables advanced gesture recognition, facial recognition
and voice recognition. The Kinect for Xbox 360 was released in November 2010, and
18 million units of the sensor had been shipped worldwide by January 2012
(Takahashi, 2012). In this project the Kinect for Xbox 360 was used for research and
development. The sensor has a practical ranging limit of 1.2 – 3.5 m when used with the
Xbox software.
The Kinect is connected to the Xbox through a USB port. Because the USB port cannot
supply enough power for the Kinect, the device makes use of a proprietary connector
combining USB communication with additional power. Newer versions of the Xbox do
not need this special power supply cable.
Shortly after the Kinect was announced, hackers figured out how to use the Kinect as a non-
gaming device on PCs. People used the Kinect to control robots or to play normal computer
games with gestures.
In this project the Kinect was not used with an Xbox. Instead it was connected to a
computer by a standard USB 2.0 port.
The Kinect has an RGB camera, a depth sensor and a multi-array microphone. More is
written about these features in chapter 2.4.2.
There are two different versions of the Microsoft Kinect. As Microsoft discovered that many
people wanted to use the Kinect beyond gaming, they started thinking about supporting these
researchers and companies too. That is why they announced an SDK, which is described in
chapter 3.2.3. In the middle of this project Microsoft announced a new Kinect hardware
version, the "Kinect for Windows".
The Microsoft Kinect for Xbox 360 is the original Kinect sold for the Xbox. With this Kinect
version people are not allowed to make commercial products using the Kinect, but the device
drivers for the computer still work with it. The Microsoft Kinect for Windows will be
available in May 2012. The hardware of the device did not change, but the device now
supports a "Near Mode" which enables the depth camera to see objects that are closer to the
camera. With the "Kinect for Windows", developers are also able to use the Kinect
commercially.
The Kinect is sold bundled with the game "Kinect Adventures". At the moment the Kinect
sensor with Kinect Adventures costs £99.00 (02/04/2012). For a device with a depth sensor
this is very cheap. The price of the Kinect for Xbox is subsidized by Microsoft in the hope of
"consumers buying a number of Kinect games, subscribing to Xbox Live, and making other
transactions associated with the Xbox 360 ecosystem" (Kinect for Windows Team, 2012).
This is also the reason why the Kinect for Windows will cost approximately £100 more than
the Kinect for Xbox version.
Within its first 60 days the Kinect sold more than 8 million units, a Guinness World Record
for the "fastest selling consumer electronics device". In January 2012 Microsoft announced
that 18 million Kinect motion-sensing systems had been sold. Certainly most Kinects are
used for gaming and not for developing applications on the computer.
2.4.2. Hardware details
2.4.2.1. RGB camera
The Kinect has a traditional colour video camera in it, similar to webcams and mobile phone
cameras. Microsoft calls this an RGB camera, referring to the red, green and blue colours it
detects. The camera has a resolution of 640x480 pixels at 30 frames per second. The RGB
camera has a slightly larger angle of view than the depth sensor.
2.4.2.2. 3D Depth Sensor
The 3D depth sensor is the heart of the Kinect's unique capabilities. The sensor provides a
640x480 pixel depth map with 11-bit depth (2048 levels of sensitivity) at 30 frames per
second. An advantage of a depth sensor is that it is colour and texture invariant.
2.4.2.3. Multi-array microphone
The Kinect includes an array of four built-in microphones, each providing 16-bit audio at a
sampling rate of 16 kHz. They are used by the Xbox to capture voice commands from the
user; having more than one microphone helps isolate the commands from the noise in the
room.
2.4.2.4. Tilt motor for sensor adjustment
The motorized pivot is capable of tilting the sensor up and down. With this motor the Kinect
is able to extend its vertical field of view.
2.4.3. Principles of Kinect
There are different methods of measuring the depth information of a scene. Those which
work with light waves, like the Kinect, include laser scanning and time-of-flight techniques.
The Kinect uses a structured-light technique.
Structured light is the process of projecting a known pattern of pixels onto a scene and then
analysing the way the pattern deforms when it hits a surface. This structured light may or
may not be visible to the eye. The Kinect uses invisible (or imperceptible) structured light,
because the light pattern is near-infrared light, which the human eye cannot see. The
structured light projected by the Kinect has a pseudo-random pattern.
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007)
The Light Source Unit (IR light source) projects the light pattern onto the scene. The Light
Detector (CMOS IR camera) observes the scene and the control system calculates the depth
information.
The calibration between the projector and the detector has to be known; it was carried out at
the time of manufacture, when a set of reference images was taken at different distances and
stored in the device's memory.
Figure 4 Structured light (Freedman, et al., 2010)
The speckle pattern (structured light) produced by the IR light source varies along the z-axis.
Figure 4 shows a human hand (object) with different speckle patterns at different distances.
The Kinect uses three different sizes of speckles for three different regions of distance.
Because the speckles have a distance-dependent property, each position has its specific
spacing and shape. The control system of the Kinect estimates the depth by correlating each
window with the reference data (speckle pattern). The reference pattern is stored at a known
depth in the Kinect's memory.
“The best match with the stored pattern gives an offset from the known depth, in terms of
pixels: this is called disparity. The Kinect device performs a further interpolation of the
best match to get sub-pixel accuracy of 1/8 pixel. Given the known depth of the memorized
plane, and the disparity, an estimated depth for each pixel can be calculated by
triangulation.” - (ROS.org, 2010)
3. Design Issues
3.1. Chapter Summary
This chapter contains information about the interfaces to connect the Kinect to the computer.
It also contains information about the program RGBDemo which was used in this project. The
next part introduces a few other supporting programs.
3.2. Kinect interfaces
At the beginning of the project a decision had to be made on how to get the information from
the Kinect. There were three possible interfaces for accessing it. Often these interfaces had
to be "built from source code"; in the field of computer software this term means the process
of converting source code files into software that can be run on a computer.
3.2.1. OpenNI framework
The OpenNI framework is published by the OpenNI organization, one of whose goals is to
accelerate natural user interfaces. The founding members of the OpenNI organization are
PrimeSense, Willow Garage, Side-Kick, ASUS and AppSide. PrimeSense is an Israeli
company that provides the 3D sensing technology used in the Kinect, and ASUS is a
multinational computer hardware and electronics company. The OpenNI organization
provides an API that covers communication with both low-level devices (e.g. vision and
audio sensors) and high-level middleware solutions (e.g. visual tracking using computer
vision).
Figure 5 OpenNI framework architecture (OpenNI, 2012)
The OpenNI framework is not just for the Microsoft Kinect. It also supports other hardware,
for example the ASUS Xtion PRO.
3.2.1.1. Installation
OpenNI is designed to be very modular. To install and access the API it is necessary to
install three different components, plus the Kinect driver, to work with the Kinect. It is
important to install these components in the right order.
The download for the components can be found on the homepage of the OpenNI organization:
http://www.openni.org/Downloads/OpenNIModules.aspx
All components are available as executables in 32 and 64-bit versions for Windows and
Ubuntu. There are also stable and unstable releases available. To use the Kinect mod (step 4),
the unstable releases are currently required, as explained in the last step of the installation
process.
The first step is to install the OpenNI Binaries.
The second step is to install the NITE module (download category: OpenNI Compliant
Middleware Binaries). Many old installation instructions on the internet say that a licence
key is necessary, but in all of the latest NITE installation packages the licence key is added
automatically.
The third step is to install the Primesensor Module (Download category: OpenNI Compliant
Middleware Binaries).
The last step is to install the SensorKinect driver, which can be downloaded from
https://github.com/avin2/SensorKinect. It is important to read the README file; the current
version says "You must use this kinect mod version with the unstable OpenNI release".
When all these packages are installed, a restart of the system is highly recommended.
If the installation was successful, the driver for the Kinect is installed when the Kinect is
connected to the USB port for the first time (shown in Figure 6).
Figure 6 Installed OpenNI driver at the Windows Device Manager
3.2.1.2. Interface
Data from the Kinect can be accessed via the OpenNI framework with Java, C++ and C#.
To use the interface in C/C++, add the include directory "$(OPEN_NI_INCLUDE)". This is an environment variable that points to the location of the OpenNI include directory; its standard location is C:\Program Files\OpenNI\Include.
Also add the library directory "$(OPEN_NI_LIB)", an environment variable that points by default to C:\Program Files\OpenNI\Lib.
The source code should include XnOpenNI.h if using the C interface or XnCppWrapper.h if
using the C++ interface.
3.2.1.3. License
OpenNI is written and distributed under the GNU Lesser General Public License (LGPL), which means that its source code is freely distributed and available to the general public.
3.2.1.4. Documentation
The OpenNI framework is well documented. There is a Programmer Guide which explains the OpenNI system architecture and programming object model; these explanations are illustrated with code snippet examples.
The OpenNI framework also provides example applications in C, C++ and Java, with their features thoroughly explained in the documentation.
As with any mature framework, OpenNI also supplies reference documentation of the interface, its classes and its members.
3.2.2. OpenKinect
OpenKinect is according to their homepage “an open community of people interested in
making use of the amazing Xbox Kinect hardware with our PCs and other devices. We are
working on free, open source libraries that will enable the Kinect to be used with Windows,
Linux, and Mac” (OpenKinect, 2012).
The OpenKinect project has an interesting history. In November 2010 the website adafruit.com announced a competition for hackers, offering $3,000 to the first person who could connect the Kinect to a PC and access its image and depth data; the source code had to be open source and/or public domain. A few days later Hector Martín succeeded in hacking the Kinect and won the competition.
When Microsoft finally announced their own driver for the Kinect, a Microsoft employee called Johnny Chung Lee shared his secret on his blog:
“Back in the late Summer of 2010, trying to argue for the most basic level of PC
support for Kinect from within Microsoft, to my immense disappointment, turned out
to be really grinding against the corporate grain at the time (for many reasons I won't
enumerate here). When my frustration peaked, I decided to approach AdaFruit to put
on the Open Kinect contest” - (Lee, 2011)
Johnny Chung Lee does not work at Microsoft anymore, but this statement means that the person who started the whole wave of Kinect hacks was one of the Kinect's own developers within Microsoft.
The heart of OpenKinect is “libfreenect”. Libfreenect includes all the code necessary to activate, initialize, and exchange data with the Kinect hardware, including drivers and a cross-platform API that works on Windows, Linux, and OS X. At the moment there is no access to the Kinect's audio stream.
The OpenKinect roadmap mentions an OpenKinect Analysis Library, which is intended to process the raw information into more useful abstractions such as hand tracking, skeleton tracking, point cloud generation and 3D reconstruction. However, the project also notes that it may take months or years to implement these functionalities.
3.2.2.1. Installation
To explore the OpenKinect project, the driver was built from the source code. To build the project for Windows you first need to download the source code from GitHub (OpenKinectSrc, 2012).
The next step is to install the dependencies. On Windows these are libusb-win32, pthreads-win32 and GLUT. Copy the .dll files from the pthreads and GLUT dependencies to /windows/system32.
There are two parts to libfreenect: the low-level libusb-based device driver and the libfreenect library that talks to that driver. The next step is to install the low-level device driver. This can be done in the Windows Device Manager: right-click on each of the Kinect devices (“Xbox NUI Motor”, "Xbox NUI Camera" and "Xbox NUI Audio") and select "Update Driver Software...". The drivers are located in the downloaded source code in the folder “/platform/windows/inf”.
After this step, use CMake to configure the compiler and create the makefiles (CMake is discussed in chapter 3.4.1).
The next step is to compile the source code with the compiler that was selected in CMake. To use the library, it should be copied to “/windows/system32” or to the folder of the program you want to run with the library.
More information can be found in the README files and the OpenKinect wiki.
3.2.2.2. Interface
OpenKinect is a low-level API. At the moment it supports only a few basic functions: it allows access to the camera, the depth map, the LED and the motor for tilting the sensor.
public class KinectDevice {
public KinectDevice()
Signature: ()V;
public setLEDStatus(LEDStatus)
Signature: (LLEDStatus;)V
public getLEDStatus()
Signature: ()LLEDStatus;
public setMotorPosition(float)
Signature: (F)V
public getMotorPosition()
Signature: ()F
public getRGBImage()
Signature: ()LI
public getDepthImage()
Signature: ()LI
}
These are the functions OpenKinect offers to the user. As you can see, it offers access to the RGB image, the depth image and the motor position.
The API is written in C, but there are wrappers for Python, C++, C#, Java and several other programming languages. A wrapper in this case is a bridge between the C API and the respective programming language.
3.2.2.3. License
This project is dual-licensed: recipients can choose under which terms they want to use or distribute the software. The two licenses are the Apache 2.0 license and the GPL v2 license. This means you can copy, modify and distribute the covered software in source and/or binary form under certain conditions; for example, all copies must be accompanied by a copy of the license.
3.2.2.4. Documentation
OpenKinect provides a wiki with relevant information about the interface and the Kinect itself. A wiki is a website whose users can add, modify or delete its content via a web browser, like the well-known site Wikipedia.
3.2.3. Microsoft Kinect SDK
The Microsoft Kinect SDK is the original programming interface to the Microsoft Kinect. It was announced by Microsoft in spring 2011 after they saw what impact the OpenKinect project had on the developer community.
During the project in March 2012 Microsoft announced a new version of the SDK called
“Kinect for Windows 1.5”.
At the start of the project, Microsoft called its software development kit the “Microsoft Kinect SDK”. At the beginning of 2012 the name of the Kinect project was changed to “Kinect for Windows”, and the name of the SDK changed to “Kinect for Windows SDK”.
In addition to publishing the Kinect SDK, Microsoft also started the 'Kinect Effect' marketing campaign. The campaign aims to show that a product designed for entertainment is having a big impact on people's lives; the intention is that consumers will view the product as a source of 'innovation and inspiration'. Microsoft created a video showing different use cases of the Kinect beyond the normal gaming purpose. (Microsoft, 2012)
Technically it is possible to use the Microsoft Kinect SDK with the “Kinect for Xbox”, but Microsoft recommends switching to the new version of the Kinect, called “Kinect for Windows”, which is designed especially for Windows (see chapter 2.4.1).
The Kinect for Windows is only available for the Windows operating system. This includes Windows 7 and Windows Embedded Standard 7; currently it can also be used with the Windows 8 Developer Preview.
3.2.3.1. Installation
The installation of the Kinect driver for Windows is the easiest of all the Kinect programming interfaces: you only have to download the Kinect SDK application, start the setup and follow it, which is pretty straightforward.
In addition to the Kinect for Windows SDK, Microsoft also provides a runtime version of the Kinect framework. The runtime enables an end user to install all components needed by a program that connects to the Kinect, without any content specific to the software development kit. The runtime version is aimed at customers and is smaller than the whole SDK, but it only works with the “Kinect for Windows” device.
3.2.3.2. Interface
With the SDK you can build applications in C++, C# and Visual Basic. It is possible to access
the image, depth and audio stream of the connected Kinect.
To use the framework provided by Microsoft, add a reference to the dynamic-link library Microsoft.Kinect.dll. In Visual Studio, right-click on “References” and select “Add Reference”. Select the “.NET” tab, search for the “Microsoft.Kinect” library, select it and click OK. The classes and functions can then be used in C# by adding the directive “using Microsoft.Kinect;”
3.2.3.3. License
The new Kinect for Windows SDK authorizes development and distribution of commercial
applications. The old SDK was a beta, and as a result was appropriate only for research,
testing and experimentation.
The license allows software developers to create and sell their applications to customers using Kinect for Windows hardware. That means you cannot sell applications to people who have the Kinect for Xbox hardware.
3.2.3.4. Documentation
Microsoft provides a lot of documentation and help for working with the Kinect for Windows SDK, including a discussion board, videos and code samples.
3.3. RGB-Demo/Nestk (data processing)
An internet search for tools to access and process the Kinect data led to RGBDemo, whose main creator is Nicolas Burrus from Spain. RGBDemo helps to access the data from the Kinect and provides a set of useful functions to process it. Mr. Burrus divided his Kinect work into two parts: the nestk library (Burrus, 2012) and the RGBDemo application (Burrus, 2012). Both projects are open source and available for download via the version control system Git (more in chapter 3.4.2) or from GitHub (Burrus, 2012).
Nestk is a C++ library for the Kinect which provides many of the functions and classes used in RGBDemo. The library is built on top of OpenCV, with Qt for the graphical parts; parts of it also depend on PCL. Chapter 3.3.2 deals with these dependencies.
RGBDemo uses much of the nestk library and implements many Kinect-related algorithms. RGBDemo is written in C++ and uses Qt for the graphical user interface.
3.3.1. Current features of the RGBDemo
The RGBDemo can grab Kinect images, visualize and replay them. This topic is discussed in
chapter 4.3.
It supports OpenKinect and OpenNI as backend frameworks. The OpenKinect library, libfreenect, is already integrated into nestk. With the OpenNI backend the program can also extract skeleton data and hand point positions.
For some months now there has also been a stable release of RGBDemo which supports multiple Kinects.
Results of the different demo programs can be exported to .ply files. More about this file format is written in chapter 4.5.2.1. These demo programs are:
- Demo of 3D scene reconstruction using a freehand Kinect (more in chapter 4.5.2)
- Demo of people detection and localization
- Demo of gesture recognition and skeleton tracking using OpenNI
- Demo of 3D model estimation of objects lying on a table (based on the PCL table top object detector)
- Demo of multiple Kinect calibration
A good point for RGBDemo is that it supports all common operating systems: Windows, Linux and Mac OS X.
RGBDemo is under the GNU Lesser General Public License (LGPL). “In short, it means you
can use it freely in most cases, including commercial software. The main limitation is that you
cannot modify it and keep your changes private: you have to share them under the same
license.” (Burrus, 2012)
3.3.2. Installation and Compilation
There is no installation routine for RGBDemo: the user simply downloads the .exe files and starts the program. This means there are no fancy installers or nice icons. RGBDemo offers Win32 binaries; to start one of the programs you simply click on “rgbd-viewer.exe” and the program starts. Naturally, the dependencies need to be installed and working.
Compiling RGBDemo yourself takes some time the first time, especially if you have never worked with Git, CMake or other open source projects.
The environment used to build RGBDemo was Microsoft Windows 64-bit with Visual Studio 2010, using the Visual Studio 2010 compiler.
It is nearly impossible to compile the RGBDemo Reconstructor as 64-bit because of the error “C2872: ‘flann’ : ambiguous symbol”. This error is caused by a conflict between the flann embedded in OpenCV and an external dependency of PCL on another copy of Flann. The ambition to build a 64-bit version was therefore given up and RGBDemo was built as a 32-bit version. This has no major disadvantage, and it also runs on a 64-bit computer.
First, the dependencies of RGBDemo were installed: OpenNI, Qt, OpenCV and PCL. PCL is an optional dependency, but it was still installed and used in this project because PCL implements good point cloud algorithms.
3.3.2.1. QT
The Qt framework is a toolkit for building graphical user interfaces. It is comparable with Windows' WinForms or WPF, but instead of supporting just Windows, Qt runs on many more platforms, for example Mac OS X, Linux and Symbian (a mobile phone operating system) in addition to Windows. Qt is supported and developed by Nokia's development division. Qt includes a GUI designer, which was used in the accuracy program (see chapter 4.4).
When the project started last year, no pre-compiled library for Visual Studio 2010 was available, but such builds are now offered on Nokia's web site, which makes installing Qt an easy task. The download is available on Nokia's download page (Nokia, 2012).
3.3.2.2. OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions for computer vision. Among other things it includes 2D and 3D feature toolkits, facial recognition, gesture recognition, structure from motion and motion tracking. The library was originally written in C, but version 2.0 includes the traditional C interface and additionally a C++ interface. OpenCV runs on any major operating system, including Android, the most common operating system for smartphones.
The pre-compiled OpenCV library is downloadable from the OpenCV project homepage (OpenCV, 2012). When installing OpenCV it is a good idea to choose an installation path without spaces, for example “C:\OpenCV2.2” instead of “C:\Program Files (x86)\OpenCV2.2”. This can prevent problems with the inclusion of the library.
3.3.2.3. PCL
PCL stands for Point Cloud Library; its main goal is 3D point cloud processing. The PCL framework contains numerous state-of-the-art algorithms, including surface reconstruction, feature estimation, segmentation and model fitting. PCL is an open source project.
PCL was installed with the normal All-In-One installer from its homepage, which includes all of the libraries and also PCL's dependencies. Part of PCL is also an experimental implementation of the KinectFusion algorithm; this algorithm is not included in the All-In-One installer. There is more about this reconstruction algorithm in chapter 4.5.1.
3.3.2.4. Kinect backend
Naturally, RGBDemo needs middleware to get access to the Kinect: the Kinect interfaces described in chapter 3.2. In this project RGBDemo was used with OpenNI.
3.3.3. RGBDemo installation
After the installation of the dependencies, the next step is to download the source code of RGBDemo from the internet. This download can be done with Git; more details about Git are given in chapter 3.4.2. The Git command for the download is:
git clone --recursive git://github.com/nburrus/rgbdemo.git
This Git command saves the source code to the hard drive. The next step is to configure the source code with the CMake GUI (see chapter 3.4.1). The source directory is set to the path where the source code was downloaded; for example, when the source is on C:\ the source directory is “C:\rgbdemo”. Then the build directory is determined, here set to “C:\rgbdemo\build“.
Experience during the project showed that using CMake's cache function is not a good idea: it caused more problems than it solved. The cache can be deleted via the File menu.
The next step is to start the configuration by clicking the “Configure” button. A compiler can then be selected; in this project the compiler from Visual Studio 2010 was used, which is a 32-bit compiler.
A list of grouped names and values should appear now. These are the configuration
parameters. Set the parameter OpenCV_DIR to the folder where the OpenCV binaries are (for
example C:\OpenCV2.2).
To use PCL, open the NESTK group and tick the checkbox named “NESTK_USE_PCL”.
Hit the Configure button again and you're almost done. The CMake log should not show any errors; the CMake project is now configured. You can generate the Visual Studio project files by clicking on the “Generate” button. CMake then writes the project files (a Visual Studio solution) to the build directory, for example “C:\rgbdemo\build”.
Then the Visual Studio solution named RGBDemo.sln can be opened. To finally get an executable file, right-click on the project and click “Build”.
There are two different build configurations available: a “Release” and a “Debug” configuration. In a debug build, complete symbolic debug information is emitted to support the Visual Studio debugging tools, and code optimization is disabled. In a release build there is no symbolic debug information and the code is optimized. As a result, RGBDemo runs approximately three times faster with the release configuration.
3.4. Supporting tools
3.4.1. CMake
A lot of the tools used in this project needed CMake. CMake is a cross-platform, open source program that helps automate the build and compilation process.
From the source code and the CMake configuration files, CMake generates project files for native build environments: for example makefiles on Linux, Visual Studio solutions on Windows and Xcode projects on Apple systems.
This solves a problem that many open source, cross-platform applications such as OpenNI, RGBDemo, PCL and OpenCV have: one build environment, for example makefiles, cannot build applications for Windows, and Visual Studio does not build applications for Linux. CMake separates writing the source code from compiling it for a particular platform.
Figure 7 CMake process
Figure 7 shows the process supported by CMake. The configuration files are named “CMakeLists.txt”. The native build system is, for example, a Visual Studio solution with projects, and the native build tool in this case is Visual Studio. With the CMake-generated solution you can compile the source code into an executable file or a library.
CMake differentiates between two folder trees. One is the source tree, which contains the CMake configuration files, the source code and the header files; the other is the binary tree, which contains the native build system files and the compilation output, for example executable files or libraries. The source and binary trees could be in the same directory, but it is better to separate them. This has the advantage that the developer can delete the binary tree without affecting any of the source code. These two folder trees are configured in the CMake GUI in the fields labelled “Where is the source code” and “Where to build the binaries”.
CMake has a cache where it saves the configured values. This cache is located in the build tree and stores key-value pairs in a text file called “CMakeCache.txt”. Each cache entry is simply a variable name and a value for that variable. For the RGBDemo, such configuration variables are for example OpenCV_DIR, with the location of the OpenCV folder as value, or NESTK_USE_PCL, with the information whether PCL should be used (where 1 is true and 0 is false).
The configuration files have a simple syntax to govern the configuration process.
IF (NOT WIN32)
SET(NESTK_USE_FREENECT 1)
ENDIF()
This example from the RGBDemo configuration file sets the configuration value NESTK_USE_FREENECT to true if the operating system is not Windows. That makes OpenKinect (freenect) the default Kinect interface on Linux and Apple systems.
But just setting the variable is not enough: the source code has to check whether the CMake variable is set and then use OpenKinect or OpenNI accordingly.
#ifdef NESTK_USE_FREENECT
# include <ntk/camera/freenect_grabber.h>
#endif
In the source code of the RGBDemo, the OpenKinect (freenect) header files are only included when the variable is set to true. #ifdef is a pre-processor directive for C and C++.
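The link between the CMake variable and the C++ pre-processor symbol can be sketched roughly as follows. This is an illustrative fragment, not the actual nestk CMake code, which may differ in detail:

```cmake
# Expose the choice as a user-configurable cache option
OPTION(NESTK_USE_FREENECT "Use the libfreenect backend" OFF)

# Default to freenect on non-Windows platforms
IF (NOT WIN32)
  SET(NESTK_USE_FREENECT 1)
ENDIF()

# Turn the CMake variable into a compile definition,
# so that '#ifdef NESTK_USE_FREENECT' works in the C++ source
IF (NESTK_USE_FREENECT)
  ADD_DEFINITIONS(-DNESTK_USE_FREENECT)
ENDIF()
```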
CMake is very scalable: KDE, a desktop environment with approximately six million lines of code, uses it for its build process.
3.4.2. Git
Git is a distributed version control and source code management system. It was initially designed and developed by Linus Torvalds for Linux kernel development. Git does not need any network access or a central server once the code is downloaded.
Source code management systems keep track of a program's source code when more than one person is working on it. Many open source projects and companies use Git to manage their code.
Git has a lot of commands and functionalities; the functions used in this project are only the tip of the iceberg, and the commands below are the only ones that were used. RGBDemo and OpenNI also use Git for version control.
Git is available for Windows, Mac OS X and Linux; the Windows version is integrated into the command line. To download the source code from a Git address, navigate to the folder where you want to store the code and execute:
git clone --recursive <git-url>
The Git address is published by the project you want the code from. The --recursive option also clones the submodules; for example, nestk is a submodule of RGBDemo.
git fetch
Once the code has been cloned, the latest source code can be downloaded with the fetch command.
3.4.3. Meshlab
MeshLab is an open source program to process meshes and point clouds. It is oriented towards the management and processing of large unstructured meshes and provides a set of tools for editing, cleaning, healing, inspecting, rendering and converting these kinds of meshes. MeshLab was chosen because it is free software and has been used in various academic and research projects.
Meshlab was used for the point cloud processing and meshing (see chapter 0).
3.4.4. Microsoft Visual Studio
Visual Studio is a well-known integrated development environment from Microsoft. A source code editor and a debugger are just a few of the functionalities Visual Studio provides.
In this project Visual Studio was used to write the accuracy analysis program and inspect and
modify the source code of the RGBDemo.
4. Implementation
4.1. Chapter Summary
This chapter contains more information about how the 3D body model was produced and about other aspects that are important for creating a 3D body model. The chapter also contains information about the Kinect error and the accuracy experiment carried out during this project.
4.2. Kinect Calibration
Camera calibration is a way of analysing an image to derive what the camera configuration was at the time the image was captured. Camera calibration is also known by the term “camera resectioning”. The camera parameters are represented as a camera matrix, a 3 × 4 matrix. In the pinhole camera model, the camera matrix denotes a projective mapping from world coordinates to pixel coordinates using a perspective transformation.
“OpenNI comes with a predefined calibration stored in the firmware that can directly output
aligned depth and colour images with a virtual constant focal length. Most applications will be
happy with this calibration and do not require any additional step. However, some computer
vision applications such as robotics might need a more accurate calibration.” (MANCTL,
2012)
Because of the above statement from MANCTL, a calibration was performed in this project to determine the specific intrinsic parameters of the Kinect. RGBDemo contains an algorithm for this calibration.
Nicolas Burrus, the programmer of RGBDemo, writes in the program's discussion board that he used the calibration routine from OpenCV; it is basically a pinhole model with distortions.
For the calibration, the application “calibrate-openni-intrinsics.exe” from RGBDemo was used. After compilation, the program can be started from the command line (cmd.exe):
calibrate-openni-intrinsics --pattern-size 0.0325 calibration calibration.yml
These are the parameters of the calibration program:
pattern-size     The square size of the chessboard, in metres
calibration      The folder with a set of images of a checkerboard
calibration.yml  The initial calibration file
The folder with the set of checkerboard images was created with rgbd-viewer.exe. A few examples are shown in Figure 8.
Figure 8 Kinect calibration RGB images
The calibration.yml file was exported from rgbd-viewer.exe via the menu File -> Save calibration file. This is the calibration file generated by OpenNI with the default parameters for the Kinect.
The result of this process is a new calibration file, “openni_calibration.yml”, which stores the intrinsic camera parameters of the Kinect video camera and the depth camera.
The camera matrix, or matrix of intrinsic parameters, has the form

    | fx  0  cx |
    |  0  fy cy |
    |  0  0   1 |

where (cx, cy) is the principal point (usually at the image centre) and fx and fy are the focal lengths.
Result of calibration (RGB-Intrinsic):
The depth camera had the same intrinsic matrices. It looked as if a separate depth camera calibration is not available for the OpenNI backend. However, since the calibration files worked with the other applications, for example the Reconstructor program, no further research was carried out in this area.
The generated calibration file was used whenever offline operations with RGBDemo were performed.
The alignment between the depth data and the RGB data is done internally by the OpenNI framework. This means the depth at point [1, 1] corresponds to the colour in the RGB image at point [1, 1].
4.3. Collect data
Once the connection with the Kinect was available, data could be recorded. RGBDemo was used to collect the depth and image data.
In this project the data was not processed in real time: the data was first recorded to the hard drive and then reconstructed from there in a second step. This had a few advantages.
One advantage is that while RGBDemo can reconstruct only about 1 FPS (frame per second), it can save 6 FPS to the hard drive. The barrier to saving more frames per second is not the computing time of the reconstruction but the write speed of the hard drive. To record the Kinect data in this project, SATA hard drives at 7200 and 5400 rpm were used; the program could probably save many more frames per second with an SSD (solid-state drive).
Another advantage of recording the data to disk is that it is not lost after one reconstruction cycle: saved data can be reused to run the reconstruction or the accuracy analysis several times with different parameters or different program code.
One disadvantage is that a fully reconstructed model is not available immediately after a scan; analysing the saved data takes some time. In a commercial setting this might be a critical factor.
4.3.1. How to save the Kinect data
The program to save the data is the “rgbd-viewer.exe” (see chapter 3.3).
All of the necessary recording functions are encapsulated in a class called “RGBDFrameRecorder”. Its functions and properties can be seen in the following UML diagram:
Figure 9 RGBDFrameRecorder UML class
This class stores a series of RGB-D images in “view directories”.
The first step is to select the folder where the program should save all the frames. This is
implemented through a QT text field (see chapter 3.3.2.1).
The program that handles storing the information is, as mentioned above, “rgbd-viewer.exe”. This viewer has a function called “onRGBDDataUpdated()”, whose name says pretty much what it does: it is called every time a new frame arrives from the Kinect. Among other things this function contains the command:
m_frame_recorder->saveCurrentFrame(m_last_image);
m_frame_recorder is of type “RGBDFrameRecorder” and m_last_image is of type “RGBDImage”.
The function “saveCurrentFrame” generates the full directory path where the data should be stored and calls the function “writeFrame()” of the class RGBDFrameRecorder. In this function all of the information from the frame is actually stored to the hard drive.
The folder structure after two saved frames looks like this:

GRAP1
The name of the configured folder where the data is saved.

a00366901966103a
The serial number of the Kinect. This folder exists so that multiple Kinects can be recorded; in that case there is a new folder for every Kinect.

viewXXXX
For every frame there is a new viewXXXX folder, where XXXX is a consecutive number starting from 0.

raw
In this folder all the raw frame data are stored.
4.3.2. What data are saved per frame?
The term “Frame” has a lot of different definitions. In this project a frame is a collection of
the following three files.
4.3.2.1. color.png
This file contains the image data of the frame, compressed in the PNG format. PNG stands for “Portable Network Graphics”; it is a bitmapped image format with lossless data compression, which allows the exact original data to be reconstructed from the compressed data. An example of such an image is shown in Figure 10. The image is saved with colour information and has a resolution of 640x480 pixels.
4.3.2.2. depth.raw
This is a file format for RGBDemo. The nestk library has functions to read and write in that
format. These functions are located in the opencv_utils.cpp and their names are
“imwrite_Mat1f_raw” and “imread_Mat1f_raw”.
qint32 rows = m.rows, cols = m.cols;
f.write((char*)&rows, sizeof(qint32));
f.write((char*)&cols, sizeof(qint32));
f.write((char*)m.data, m.rows*m.cols*sizeof(float));
In this code snippet you can see how the raw information is saved: first the program writes
two 32-bit integers containing the row and column counts, and then it saves rows × cols
32-bit float values.
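For illustration, reading such a file back can be sketched with the C++ standard library instead of Qt. This is a hypothetical stand-in for nestk's imread_Mat1f_raw: a plain std::vector replaces the cv::Mat1f used by the library, and the exact nestk code differs.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads the depth.raw layout described above: two 32-bit integers
// (rows, cols) followed by rows*cols 32-bit floats.
// Sketch only; nestk's imread_Mat1f_raw fills a cv::Mat1f instead.
bool readMat1fRaw(const std::string& path,
                  std::int32_t& rows, std::int32_t& cols,
                  std::vector<float>& data)
{
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    f.read(reinterpret_cast<char*>(&rows), sizeof(std::int32_t));
    f.read(reinterpret_cast<char*>(&cols), sizeof(std::int32_t));
    data.resize(static_cast<std::size_t>(rows) * cols);
    f.read(reinterpret_cast<char*>(data.data()), data.size() * sizeof(float));
    return static_cast<bool>(f);
}
```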
Because of that layout, every depth.raw file has a size of 1.17 MB (1,228,808 bytes):
2 × 4 bytes for the two integers plus 640 × 480 × 4 bytes of float data.
640 × 480 pixels is the standard width and height of the depth image.
4.3.2.3. intensity.raw
This is the IR image normalised to grayscale and saved with the same method as depth.raw.
Because of that, the intensity.raw file also has a size of 1.17 MB.
Figure 10 color.png example picture
4.4. Accuracy measurements
During the project an application was programmed to find out the accuracy of the Kinect.
Measurement errors can be split into two components: random error and systematic error.
(Taylor, 1999)
Figure 11 Random and systematic error (Taylor, 1999)
4.4.1. Random error
Random errors are inherently unpredictable errors in measurements and have zero expected
value in the experiment. Every measurement is susceptible to random error; random errors
show up as different results for supposedly identical repeated measurements.
4.4.2. Systematic error
“Systematic error is caused by any factors that systematically affect measurement of the
variable across the sample. “ (Research methods, 2012)
“The correction of systematic errors is a prerequisite for the alignment of the depth and colour
data, and relies on the identification of the mathematical model of depth measurement and the
calibration parameters involved. The characterization of random errors is important and useful in
further processing of the depth data, for example in weighting the point pairs or planes in the
registration algorithm.” (Kourosh & Elberink, 2012)
In their paper, Kourosh & Elberink (2012) point out that it is important to know the
random error when processing the data.
Several sources are coming to the conclusion that “the random error of depth measurements
increases quadratically with increasing distance from the sensor and reaches 4 cm at the
maximum range” (Kourosh & Elberink, 2012) (ROS, 2011).
4.4.3. The setup
To check whether the results of these previous papers could be reproduced, a setup to test
the random error of the Kinect was built.
Figure 12 Kinect accuracy test setup
The Kinect points towards the sheet. The information from the Kinect is then collected with
the method described in chapter 4.3. As a result, several frames at different distances to the
Kinect were taken. This is one example frame:
Figure 13 Kinect accuracy test example frame
4.4.4. Colour Segmentation
Of course, the program should only use depth points of the white area on the sheet and not
depth data from elsewhere. The first approach was colour segmentation.
The algorithm searches for all pixels in the RGB image that are white, and because the depth
and the RGB image are aligned, every pixel of the depth image identified as white in the RGB
image was used. Certainly, as you can see in Figure 13, these pixels are never perfectly
white. Because of that, colours other than clean white were also accepted (for example grey
colour shades).
As you can also see in Figure 13, there is white in the background of the image, caused by
the light. These white points were not used for the accuracy analysis because the program
applied depth thresholds: for example, when the plane was located 100 cm from the Kinect,
the program only used depth values in the range from 90 cm to 110 cm for the calculation.
But there were still problems with that approach, because there were also white pixels on the
retainer of the sheet, as well as depth artefacts, whose values lay inside the depth threshold
and were therefore included.
Because of those problems the colour segmentation was not used, and the area of the sheet
was defined by hand: the upper-left and lower-right corner points were specified, and the
program only used the points inside this rectangle.
The highlighted rectangle in Figure 14 shows the area that is used for the analysis. All the
red pixels are included in the accuracy measurement for the program to analyse.
4.4.5. Histograms
A histogram is "a representation of a frequency distribution by means of rectangles whose
widths represent class intervals and whose areas are proportional to the corresponding
frequencies." (Dictionary, 2012)
In this project histograms were used to visualise the distribution of the depth values over
the observed rectangle on the sheet.
To generate a histogram you first have to define the range you want to observe. For example,
for the measurement with the sheet at a distance of 60 cm, the lowest and highest depth
values (58 and 62 cm) were used as the range.
This range was then divided into intervals, and every depth value was assigned to one of
these intervals. To calculate the histogram, functions of OpenCV were used (cvCreateHist,
cvCalcHist).
Figure 15 shows the histogram when the sheet is 60 cm away from the Kinect sensor. The gaps
between some of the bars occur because of the quantisation of the depth range inside the
Kinect.
Figure 15 Histogram
Figure 14 Accuracy measurement with highlighted area
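The binning step behind the histogram can be sketched without OpenCV. The equal-width intervals mirror the procedure described above, but this is a simplified illustration; the exact cvCreateHist/cvCalcHist configuration used in the project is not reproduced.

```cpp
#include <vector>

// Counts how many depth values fall into each of `bins` equal-width
// intervals covering [lo, hi). Values outside the range are ignored.
// Simplified stand-in for the OpenCV histogram calls named above.
std::vector<int> depthHistogram(const std::vector<float>& depths,
                                float lo, float hi, int bins)
{
    std::vector<int> hist(bins, 0);
    const float width = (hi - lo) / bins;
    for (float d : depths) {
        if (d < lo || d >= hi) continue;
        int bin = static_cast<int>((d - lo) / width);
        if (bin >= bins) bin = bins - 1;  // guard against float rounding
        ++hist[bin];
    }
    return hist;
}
```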
4.4.6. Standard deviation
Standard deviation measures the dispersion of a set of data around its mean: the more the
values are spread away from the mean, the higher the deviation. Standard deviation is a
well-known parameter in statistics and is calculated as the square root of the variance.
The standard deviation can be calculated with the following formula:
σ = √( (1/N) · Σᵢ₌₁ᴺ (xᵢ − x̄)² )
Where, in this experiment,
{x₁, …, x_N} are the depth values
x̄ is the mean of the depth values
In this project the standard deviation was calculated over a set of frames with the depth values
on the sheet. The results of these calculations are shown in 4.4.9.
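As a minimal sketch, the calculation looks like this (a population standard deviation, dividing by N; whether the project divided by N or N−1 is not stated in the report):

```cpp
#include <cmath>
#include <vector>

// Population standard deviation of a set of depth values:
// the square root of the mean squared deviation from the mean.
double stddev(const std::vector<double>& xs)
{
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();

    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    return std::sqrt(sq / xs.size());
}
```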
4.4.7. Error map
To visualise the errors, an error map was also made: the mean of all depth values was
subtracted from every depth value in the frame, the resulting values were converted into
colour information according to their distance from the mean, and the map was shown in the
accuracy program.
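The per-pixel mapping can be sketched as follows. The linear red/blue ramp is an assumption for illustration; the report does not state the exact colour scale used by the accuracy program.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Maps a signed deviation from the mean depth onto a colour:
// negative deviations shade towards blue, positive towards red,
// saturating at +/- maxDev. The ramp itself is an assumption.
Rgb deviationToColour(double dev, double maxDev)
{
    double t = std::clamp(dev / maxDev, -1.0, 1.0);  // -1 .. 1
    std::uint8_t red  = static_cast<std::uint8_t>(t > 0 ?  t * 255 : 0);
    std::uint8_t blue = static_cast<std::uint8_t>(t < 0 ? -t * 255 : 0);
    return Rgb{red, 0, blue};
}
```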
4.4.8. Problems
There is a problem when the Kinect is not facing the sheet straight on. This could be solved
by fitting a plane to the depth values with an SVD and then calculating the distance from all
points to this plane. That was not done in this project, because at the end of the project
the priority was the 3D body model.
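Once a plane had been fitted (for example via an SVD from a linear algebra library), the per-point distance would be simple to compute. This sketch assumes the plane is already given in the form ax + by + cz + d = 0; the fitting step itself is not shown.

```cpp
#include <cmath>

// Distance from point (x, y, z) to the plane ax + by + cz + d = 0.
// Dividing by the normal length keeps the result correct even when
// (a, b, c) is not a unit normal.
double pointPlaneDistance(double x, double y, double z,
                          double a, double b, double c, double d)
{
    return std::fabs(a * x + b * y + c * z + d)
         / std::sqrt(a * a + b * b + c * c);
}
```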
4.4.9. Results
Distance [cm]   Average distance [cm]   Average minimal value [cm]   Average maximal value [cm]   Standard deviation   Frames
90              89.9007                 88.7333                      89.9007                      0.496103             6
80              80.1377                 79.1571                      81.2036                      0.369619             56
70              70.5072                 69.6477                      71.6892                      0.386073             111
60              60.3991                 59.5136                      61.3                         0.39508              22
The standard deviation is really high; the error is higher than the results in the literature
(Kourosh & Elberink, 2012) (ROS, 2011). There are a lot of possible error sources, but an
important one is probably the one pointed out in chapter 4.4.8. Another negative effect could
come from a faulty calibration.
Figure 16 Error map of the first frame (60 cm)
4.5. 3D reconstruction
In computer vision and computer graphics, 3D reconstruction means capturing the shape of an
object (in our case the body). In this project, two different Kinect programs that provide
reconstruction functionality were evaluated: the RGBDemo Reconstructor and PCL Kinect
Fusion.
4.5.1. PCL Kinect Fusion
The PCL (point cloud library) project was already mentioned in chapter 3.3.2.3. They
implemented the Kinect Fusion algorithm into their library (PCL, 2011). The Kinect Fusion
project “investigates techniques to track the 6DOF position of handheld depth sensing
cameras, such as Kinect, as they move through space and perform high quality 3D surface
reconstructions for interaction” (Microsoft, 2012). They have published the two research
papers “KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth
Camera” (Izadi, et al., 2011) and “KinectFusion: Real-Time Dense Surface Mapping and
Tracking” (Newcombe, et al., 2011). The PCL open source community is in the process of
implementing the algorithm from these scholarly papers in the PCL source code.
This program is not part of the official release version, which means it is still in
development and can only be used experimentally. During the evaluation of reconstruction
programs this implementation was also tried; SVN and CMake (see chapter 3.4.1) were used to
build it. It worked, but not perfectly, probably because the algorithm was not yet fully
implemented. The algorithm supports real-time reconstruction, and the code relies heavily on
the NVIDIA CUDA development libraries for GPU optimisations. Compute Unified Device
Architecture (CUDA) is a parallel computing architecture developed by NVIDIA for graphics
processing (Nvidia, 2011).
In this project this approach was not used, because it was too difficult to predict in which
direction the PCL project would move: since the program is at an experimental stage, its
creators could easily change interfaces, which could affect a program built on top of it. At
the time the program was tested there was also no easy export function for the generated
model, although it is possible to extract the point cloud model somehow. A small disadvantage
is also that the results are without colour. Because of this, the RGBDemo Reconstructor was
used instead. In a few months, however, PCL KinectFusion is definitely an option to keep an
eye on.
4.5.2. RGBDemo Reconstructor
The RGBDemo Reconstructor “rgbd-reconstructor.exe” is part of the RGBDemo demo programs;
this tool was already introduced in chapter 3.3.
When we generate the Visual Studio solution, we have to build the rgbd-reconstructor
project. It is necessary to build it in “Release” mode so that the reconstruction runs
faster; the Debug mode has a lot of overhead from the debugging tools.
The official purpose of the RGBD Reconstructor is interactive scene reconstruction: in
practice, you walk through a room with a Kinect and the program progressively aggregates all
captured frames into a single 3D point cloud model.
Because the normal purpose of the Reconstructor is to scan a room and not a person's body,
the first thing was to check whether the reconstruction also works with a person inside the
scene (room). The big difference between a scene (room) and a person is that even when the
person tries to stand still, the body moves: for instance, breathing produces a small
movement of the chest, and any clothes the person is wearing also move a little.
This data is represented in a single reference frame using a surfel representation to avoid
duplicates and to even out the result. Surfel is an abbreviation of "surface element": an
object is represented by a dense set of points, and in 3D computer graphics surfels are an
alternative to polygonal models (meshes). The creator of the RGBDemo calls it a surfel
representation, but it is also possible to call it a point cloud.
It is possible to use the RGBD Reconstructor in real time: you can start the program with a
connected Kinect, and every incoming frame is analysed and integrated into the point cloud.
In this project this was not done, because of the following disadvantages.
First, it does not use every frame coming from the Kinect in real time, because the program
needs a lot of calculations to find the right spot where it can insert the data of a frame.
The program therefore processes just one frame per second on an i7-2720QM CPU (a quad-core
processor with 2.20 GHz per core). Of course, it would be possible to improve this part of
the algorithm or to use a faster computer.
The second disadvantage relates to the development, or rather the adaptation, of the
reconstruction program: when all the frames are saved on disk, the reconstruction can be
repeated several times with different parameters and other modifications, and the quality of
the resulting point cloud can be compared.
That is why the frames from the Kinect were recorded to the hard drive at the highest
achievable frame rate (about 4-10 FPS), and the reconstruction program was then run on this
data to get a point cloud. How and what data is recorded is explained in chapter 4.3.
The data was recorded while another person walked around the subject with the Kinect sensor
in hand, to capture information about the body from 360 degrees. The first idea was that the
person standing in front of the Kinect would rotate on their own axis, which has the
advantage that the scanning process does not need two people (one who scans and one who is
the subject). But there were problems with that approach. First, the RGBD Reconstructor is
not built for this task; this is not the main problem, because it would be possible to change
the algorithm. The major problem is that when a person rotates on their own axis, the body
deforms too much.
Figure 17 shows six frames from the recorded data of one scan. On the left side is the depth
image converted into a colour representation, and on the right side the corresponding RGB
image. For one reconstruction, approximately 1000 frames were used.
After the images were collected, the reconstruction process began. The reconstruction
program (rgbd-reconstructor.exe) can be started with the following command on the Windows
command line (cmd.exe).
C:\RGBDemo-0.7.0-Source\build\bin\Release>rgbd-reconstructor.exe --calibration openni_calibration.yml --directory C:\usi7 --sync true --icp true
The parameters for the Reconstructor are:
calibration The calibration file (yml) (see chapter 4.2)
directory The folder where the recorded data is located (see chapter 4.3)
sync The synchronization mode that should be used
icp Use ICP to refine pose estimation
The synchronization mode tells the program that it should use every frame to build the point
cloud.
After the reconstruction process, which takes about 10 minutes, the result is a point cloud
in the program’s memory. This point cloud can be exported in the .ply format.
Figure 17 six example input frames for the Reconstructor
4.5.2.1. Export file format - .PLY
PLY is a computer file format known as the Polygon File Format or the Stanford Triangle
Format. It is designed to store three-dimensional data and has a relatively simple structure;
there are two variants of the format, one in ASCII and one in binary. The RGBDemo exports its
3D models in the ASCII variant.
ply
format ascii 1.0
element vertex 3294565
property float x
property float y
property float z
property float nx
property float ny
property float nz
property uchar red
property uchar green
property uchar blue
end_header
0.00290329 0.00341359 0.429719 -0.203918 0.268669 -0.9414 254 254 254
-0.000625859 -0.00391612 0.432549 0.200947 -0.465571 -0.861896 254 254 254
-0.0013024 -0.00193452 0.438721 0.333459 -0.0203877 -0.942544 254 254 254
0.004828 0.000784338 0.443075 0.457977 0.101636 -0.883135 254 254 254
This is the header and a few points of an exported file. The first line, “ply”, identifies
the file as a PLY file. The second line indicates which variant of the PLY format is used.
The third line declares a particular data element and how many instances of it there are. The
following “property” lines describe how the element is represented: x, y, z are the
coordinates, nx, ny, nz are the normals, and red, green, blue are the RGB representation of
the colour of a point.
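A minimal reader for such a header could look like the following sketch. It only extracts the vertex count and is not part of RGBDemo or MeshLab; a real parser would also collect the property declarations.

```cpp
#include <sstream>
#include <string>

// Extracts the vertex count from an ASCII PLY header like the one
// shown above, i.e. the line "element vertex <count>".
// Returns -1 if no such line appears before "end_header".
long plyVertexCount(const std::string& header)
{
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        if (line == "end_header") break;
        std::istringstream ls(line);
        std::string w1, w2;
        long n;
        if (ls >> w1 >> w2 >> n && w1 == "element" && w2 == "vertex")
            return n;
    }
    return -1;
}
```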
Figure 18 Reconstructed point cloud
Figure 18 shows the result of the reconstruction process in MeshLab. The person is on the
left side, and as you can see, the algorithm also reconstructed the walls and a bit of the
floor. This unimportant information is deleted in the next step (see chapter 4.6).
4.5.3. Implementation
In the official forum the main developer of
the RGBDemo describes the algorithm of
the RGBD Reconstructor this way: “it
basically uses feature point matching,
following by RANSAC and optional ICP
pose refinement” (Burrus, 2012).
The program extracts SURF features from the camera image and localises them in 3D space.
Then it matches these features against those of previously acquired images and uses RANSAC
to robustly estimate the 3D transformation between them. Optionally it uses ICP to refine the
estimated camera position.
If the algorithm finds a pose and the error thresholds (of RANSAC and ICP) are not exceeded,
the program adds the points of this frame to the reference frame, that is, the point cloud
which is later exported from the program.
4.5.3.1. SURF
SURF stands for “Speeded Up Robust Feature” and was first presented by Herbert Bay et al.
in 2006 (Bay, et al., 2008). It is an image feature detector and descriptor that can be used
in computer vision. SURF is based on sums of 2D Haar wavelet responses and makes efficient
use of integral images. The standard version of SURF is partly inspired by SIFT, another,
better-known feature detection algorithm in computer vision; the standard version of SURF is
faster than the SIFT implementation.
In computer vision and image processing, feature detection is a concept for finding
information that describes an image in a way a computer can work with. The result of a
feature detector such as SURF is often a subset of points that describe the image
appropriately; the features are often extracted by analysing the pixels surrounding one
pixel.
There are different types of image features, and feature detection algorithms are often
specialised in one of them, for example edges, corners / interest points, blobs / regions of
interest, or ridges.
Figure 19 RGBD Reconstructor flowchart
After the SURF algorithm has found the interest points in the RGB image, this information is
combined with the depth data, because the interest points of the RGB image are only
two-dimensional. For example, if an interest point is at pixel [5, 5], the program accesses
the depth data at pixel [5, 5] and combines the two values into a point in three-dimensional
space. These 3D points are then matched with the 3D points of previous frames; the result is
a set of point-wise 3D correspondences between two frames. Based on these correspondences,
the RANSAC algorithm estimates the relative transformation between the frames.
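The lifting of a 2D interest point into 3D can be sketched with the standard pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) would come from the calibration file; the concrete values in the usage example below are placeholders, not the project's actual calibration.

```cpp
struct Point3 { double x, y, z; };

// Back-projects a pixel (u, v) with measured depth z into 3D camera
// coordinates using pinhole intrinsics: focal lengths fx, fy and
// principal point (cx, cy) taken from the calibration.
Point3 backProject(double u, double v, double z,
                   double fx, double fy, double cx, double cy)
{
    return Point3{ (u - cx) * z / fx,
                   (v - cy) * z / fy,
                   z };
}
```

A pixel at the principal point maps onto the optical axis, e.g. backProject(320, 240, 2.0, 525, 525, 320, 240) gives (0, 0, 2).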
4.5.3.2. RANSAC
RANSAC is an abbreviation of "RANdom SAmple Consensus". It is an algorithm for estimating
the parameters of a mathematical model from a set of observed data that typically contains
outliers.
In the RGBDemo, the input data for the relative pose estimation with RANSAC are the interest
point correspondences.
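The general RANSAC pattern can be illustrated with a deliberately simplified one-parameter model: a 1D translation between matched coordinates with outliers. The real Reconstructor estimates a full 6-DOF rigid transform, which this sketch does not attempt; only the hypothesise-and-count-inliers loop is the same.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal RANSAC for a 1D translation model: given matched pairs
// (src[i], dst[i]) where dst ~= src + t for inliers, draw random
// single-pair hypotheses and keep the translation with most inliers.
// A full pose estimator follows the same loop with a 6-DOF model.
double ransacTranslation(const std::vector<double>& src,
                         const std::vector<double>& dst,
                         int iterations, double threshold)
{
    double best = 0.0;
    int bestInliers = -1;
    for (int it = 0; it < iterations; ++it) {
        std::size_t i = std::rand() % src.size();   // random hypothesis
        double t = dst[i] - src[i];
        int inliers = 0;
        for (std::size_t j = 0; j < src.size(); ++j)
            if (std::fabs(dst[j] - (src[j] + t)) < threshold)
                ++inliers;
        if (inliers > bestInliers) { bestInliers = inliers; best = t; }
    }
    return best;
}
```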
4.5.3.3. ICP
The RGBDemo uses a variation of ICP (Iterative Closest Point) (ZHANG, 1992) to refine the
estimated transformation. This step is optional and costs computing time; if a very fast
reconstruction is important, it is better to turn this feature off, at the cost of
reconstruction quality. Because this project did not focus on the speed of the
reconstruction, the ICP refinement was used.
The ICP algorithm is not implemented in the RGBDemo source code itself; instead the program
uses the ICP method of the Point Cloud Library (see chapter 3.3.2.3).
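The iterate-match-refine idea behind ICP can be illustrated in one dimension with a translation-only variant. PCL's ICP does the same with 3D point clouds, a k-d tree for the matching, and a full rigid transform; this sketch keeps only the loop structure.

```cpp
#include <cmath>
#include <vector>

// Translation-only ICP in 1D: repeatedly match every source point to
// its nearest target point, average the residuals, and shift the
// source. Illustrates the iterate-match-refine loop only; the PCL
// version works on 3D clouds with a full rigid transform.
double icpTranslation1d(std::vector<double> src,
                        const std::vector<double>& tgt,
                        int iterations)
{
    double total = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double shift = 0.0;
        for (double s : src) {
            double best = tgt.front();
            for (double t : tgt)                       // nearest neighbour
                if (std::fabs(t - s) < std::fabs(best - s)) best = t;
            shift += best - s;
        }
        shift /= src.size();
        for (double& s : src) s += shift;              // apply estimate
        total += shift;
    }
    return total;  // accumulated translation
}
```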
4.5.4. Problems/Solutions
This section describes problems with the reconstruction and their solutions.
One problem was that the RGBDemo did not use all frames successively: it took the first
image, and while the Reconstructor tried to estimate the camera position, the incoming frames
continued. This means that after the first frame was computed, the program did not use the
second frame but whichever frame was active at that moment, for example the fifth. With this
behaviour, the reconstructed model lost information and the estimated camera position was
poor, because there was not enough reference information.
The RGBDemo had an option to compute the frames one by one, activated with the "--sync true"
parameter when starting the Reconstructor. In the early versions this option did not work,
but in version 0.7.0 that bug was fixed and the frames were computed one by one.
Another issue was that the reconstruction stopped at a certain point. This often happened
when the scan moved from the front of the body to the side of the person; somewhere in this
area the Reconstructor algorithm lost track. An example of this issue is shown in Figure 20:
as you can see in the side view, there are just a few points from the back.
Figure 20 Failed reconstruction shown from front and side view
The next attempt, with red markers in the background, also failed. The idea behind the red
and white paper stuck on the walls was that the algorithm might use this colour to find more
feature points, but as you can see in Figure 21 this did not work. The supervisor gave the
advice to try another environment for the scan, because although more markers had been added
to provide more feature points, they may still have been too small. After a change of
location, the reconstruction worked as expected.
Figure 21 Reconstruction with marker
This is a limitation of the algorithm: the background should not be monotonous, and the more
diverse its colours, the better. Another solution could be to change the threshold of the
feature points, which could affect the quality of the estimated position. It would also be
possible to estimate the camera position with the depth information alone, but this would be
a big change to the actual algorithm.
4.6. Point cloud processing
A point cloud is a set of points in a three-dimensional coordinate system. These points are
called vertices (the plural form of vertex); in computer graphics, a vertex is a data
structure describing a point. The result of the reconstruction is a point cloud whose points
are defined by X, Y and Z coordinates and a colour. To process this point cloud, the program
MeshLab was used.
“MeshLab is an open source, portable, and extensible system for the processing and editing of
unstructured 3D triangular meshes. The system is aimed to help the processing of the typical
not-so-small unstructured models arising in 3D scanning, providing a set of tools for editing,
cleaning, healing, inspecting, rendering and converting this kind of meshes.” (MeshLab,
2012)
Although "mesh" is in the name of the program, it also has a range of utilities for editing
point clouds, and it can open the .ply (Polygon File Format) files exported by the
reconstruction program. In this project, the cleaning of the point cloud was done by hand.
Cleaning a point cloud means deleting all points that do not belong to the object. It would
be possible to automate this process by recognising the walls and the ground and deleting
those points, but this was not done in this project.
The images below show the process of the cleaning on the left side and the result on the right
side.
Figure 22 After point cloud processing Figure 23 Point cloud cleaning
4.7. Meshing
The result of the reconstruction and the cleaning is a point cloud: a set of disconnected
points floating near each other in three-dimensional space. Looked at closely, the image
breaks down into distinct points with visible space between them. "If we wanted to convert
these points into a smooth continuous surface we’d need to figure out a way to connect them
with a large number of polygons to fill in the gaps. This is a process called "constructing a
mesh"" (Borenstein, 2011).
To build a mesh, the Poisson surface reconstruction (Kazhdan, et al., 2006) implemented in
MeshLab is used. The Poisson algorithm is designed to handle noisy point clouds like ours and
produces a triangle mesh as its result.
An alternative algorithm implemented in MeshLab is Ball Pivoting (Bernardini, et al., 1999).
This algorithm uses the points of the point cloud directly and links them together into
triangles. Because it uses the points of noisy data, the resulting mesh has a lot of holes,
and there are double surfaces where point clouds are not perfectly aligned.
The advantage of the Poisson algorithm is that it minimises the creation of holes even if
some parts of the surface are missing in the point cloud. This is because the algorithm wraps
around the points and does not use the points of the point cloud directly as vertices;
because of that property, the algorithm produces a smooth surface. For the Poisson algorithm
to work correctly, every point in the point cloud must have a normal assigned. These normals
can be calculated with MeshLab’s filter "Compute normal for point sets".
After the Poisson surface reconstruction, the colour information is lost in the mesh. To
colourise the mesh there is a MeshLab filter called "Vertex attribute transfer", which picks
the colour from the nearest point of the point cloud and applies it to the mesh.
Figure 24 Meshing process
4.8. Measurements
Not many measurements were taken in this project because of the lack of time. MeshLab has a
measuring tool to measure distances.
Figure 25 3D body model with measurement
As you can see in Figure 25, the measured height is 1.71985 m. The real height of the test
person is 1.72 m (depending on whether hair counts). This is quite accurate, but the test was
made with only one person and is therefore not representative.
The measurement tool calculates the distance between two points P₁ and P₂ in 3D space with
the following formula:
d(P₁, P₂) = √( (x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)² )
Where the first point is P₁ = (x₁, y₁, z₁) and the second point is P₂ = (x₂, y₂, z₂).
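Translated directly into code, the Euclidean distance formula reads:

```cpp
#include <cmath>

// Euclidean distance between two points in 3D space, i.e. the
// formula used by the measuring tool described above.
double distance3d(double x1, double y1, double z1,
                  double x2, double y2, double z2)
{
    double dx = x2 - x1, dy = y2 - y1, dz = z2 - z1;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```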
5. Critical Evaluation
5.1. Chapter Summary
This chapter mainly contains evaluations and thoughts about the different aspects of the
project. It also includes possible improvements and learning outcomes.
5.2. Project Management Evaluation
All of the requested documents were submitted on time, and all progress reviews were held on
schedule.
The task scheduling with a Gantt chart was done quite early in the project. As the project
progressed, the problems changed and the time schedule changed with them.
A problem at the end of the project was that the planning had not taken into account that at
the end of the academic year there is a lot of other work to do. In the planning of another
project, this factor should be taken into account.
Risk management was an important part of the beginning of the project. In this project it
proved especially valuable that the risk "Losing information about the project or related
information" was considered and a backup of relevant data was made, because you cannot trust
that a hard drive will last the whole project: in this project the hard drive broke in the
middle of the project. If no backup had been available, it could have had a major impact on
the project progress.
5.3. Design and Implementation
In a real-world project, you would normally also evaluate and consider other depth scanners
for taking body measurements, but due to the limited budget of this project the Kinect was
the only affordable device. That is one of the reasons why the Kinect was used in the first
place; other advantages are, for example, the big community around the device and the fact
that a lot of people already have one at home.
It is impossible to use the Kinect on a computer without a Kinect interface, so it is
necessary to choose one to get the relevant data. In this project it was a good idea to use
the RGBDemo: it already implemented a bunch of useful functions to process the information
from the Kinect and a reconstruction example to build on. But you have to trust that the
information from this additional middleware is correct; in this project there was never the
suspicion that something did not work correctly, except for the calibration of the depth
camera.
It is recommendable to store the collected data on the hard drive and then use it to
reconstruct the body. For this project that was the right decision, but in commercial use it
would be more practical to have the reconstruction in real time and without the need for
MeshLab to clean the relevant information out of the reconstructed scene; this might be
implemented with a depth threshold to exclude the walls in the scene. It was interesting to
build a setup to test the accuracy of the Kinect; if this test is repeated, it is advisable
to use a bigger sheet.
The reconstruction worked, but not perfectly; it depends a lot on the environment. There are
a lot of possible improvements in this area. In particular, the KinectFusion (Izadi, et al.,
2011) implementation in the experimental PCL version should not go unconsidered: once it is
fully implemented it could be better than the RGBDemo, but during this project it was too
early.
Through the lack of time, the actual body measurements were really basic ones. To take more
measurements of other body parts, it is necessary to improve the reconstruction process and
to evaluate and find other programmes to measure meshes.
It should be noted that scanned persons should extend their arms in new scans (shown in
Figure 26). This may also have an effect on the reconstruction and meshing quality.
5.4. Possible Improvements
To simplify the reconstruction process, it would be a very interesting experiment to use more
than one Kinect; three Kinects are still cheaper than most other depth sensors, and it is
possible to connect more than one Kinect to a computer. For example, you could use three
Kinects from three different angles, take just a few frames (or one), and reconstruct the
model from this. This has the advantage that the problem of body movements during the data
collection is minimised; and if the three Kinects are calibrated and the positions of the
cameras are known, no camera position estimation is necessary. A disadvantage is that no
normal consumer has three Kinects at hand. This improvement could be work for a whole new
project, but it could still use the knowledge and experience of this project.
Another possible improvement could be to use the Kinect vertically. This way you can move
closer to the scanned person and get a more accurate result, because the accuracy depends on
the distance to the object. The RGBDemo would need some modification to be used in this
orientation.
It could be useful to use the depth information to estimate the camera position instead of the
RGB image information. That has the advantage that the reconstruction is independent from
the surrounding environment.
Machine learning, a branch of artificial intelligence, is another interesting direction for working with the data generated in this project. Machine learning generates results by comparing new data with samples from a database of known models. Imagine a database of 3D body models, each annotated with its measurements: a new body model would be compared against this database, and the measurements of similar models would be used to produce a result. The difficulty is that the space of possible input models is too large to be covered by a set of observed examples (training data), so the training data must be generalised. There are many algorithms that address this problem, also known under the term “pattern recognition algorithms”.
Figure 26 Model with arms extended (The Districts, 2011)
5.5. Learning Outcomes
This project certainly taught a lot about the Kinect and how to handle the information from its sensors. Because the Kinect is a depth sensor, it also taught more about depth sensors in general and their capabilities, and about how to access information from the sensor and how to install and use the different interfaces.
The accuracy test showed how to design an experiment and then analyse the data, which error sources are possible, and whether they can be corrected. Analysing the generated data also exercised mathematical skills. The accuracy analysis was programmed in the programming language C++, and it was interesting to work with previously unfamiliar C++ interfaces such as OpenCV.
Working with a number of open source projects reveals a pattern that many of them share: most use some sort of code management tool such as SVN or Git, and many have discussion boards for questions.
Because the Kinect is not just a depth sensor but also has a built-in RGB camera, the project also taught how to process and analyse images, what feature points are and how they can help with the task at hand.
Reading code written by others shows how helpful comments in the code are; this was especially true when studying the reconstruction algorithm of RGBDemo. The project also demonstrated how difficult it is to reconstruct non-rigid objects, and made clear what point clouds and meshes are and what the difference between them is.
6. Conclusion
This report gave an overview of the important and interesting parts of this project: turning a human body into a digital 3D representation.
The project proves that there are many interesting applications for the Kinect beyond its initial purpose as a game console input device. It shows how the Kinect's sensors can be accessed and how the information can be processed. At the moment the system produces only a basic measurement. The end result of the project is a 3D body model, but it is still not perfectly accurate. The evaluation presents a few important ideas to build on this project, for example the use of multiple Kinects or machine learning.
The whole process is currently not automated, and it needs an expert to build the 3D body model. In this state it is therefore not ready for commercial use, because all of the steps performed by hand would have to be done by the computer program for the customer. It is definitely possible to build a program that automates this.
The technology built into the Kinect is still at the beginning of a very interesting future in this area.
7. References
Bay, H., Ess, A., Tuytelaars, T. & Gool, L. V., 2008. SURF: Speeded Up Robust Features.
Computer Vision and Image Understanding (CVIU), Volume 110, pp. 346-359.
Bernardini, F. et al., 1999. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE
Transactions on Visualization and Computer Graphics, 5(4), pp. 349-359.
Borenstein, G., 2011. Making Things See. s.l.:O'Reilly Media / Make.
Burrus, N., 2012. How rgbd-reconstructor.exe works?. [Online]
Available at: https://groups.google.com/d/msg/rgbdemo/fY1d950ZRxc/8QUALhLpv4wJ
[Accessed 8 April 2012].
Burrus, N., 2012. nestk. [Online]
Available at: https://github.com/nburrus/nestk
[Accessed 27 March 2012].
Burrus, N., 2012. rgbdemo. [Online]
Available at: https://github.com/nburrus/rgbdemo
[Accessed 27 March 2012].
Burrus, N., 2012. RGBDemo License. [Online]
Available at: http://labs.manctl.com/rgbdemo/index.php/Main/License
[Accessed 27 May 2012].
D’Apuzzo, N., 2009. Hometrica. [Online]
Available at: http://www.hometrica.ch/pres/2009_essilor_pres.pdf
[Accessed 19 November 2011].
Merriam-Webster Dictionary, 2012. Histogram - Definition. [Online]
Available at: http://www.merriam-webster.com/dictionary/histogram
[Accessed 22 April 2012].
Freedman, B., Shpunt, A. & Arieli, Y., 2010. Distance-Varying Illumination and Imaging
Techniques for Depth Mapping. s.l. Patent No. US2010/0290698.
Izadi, S. et al., 2011. KinectFusion: Real-time 3D Reconstruction and Interaction. Santa
Barbara, CA, USA., ACM Symposium on User Interface Software and Technology.
Kazhdan, M., Bolitho, M. & Hoppe, H., 2006. Poisson surface reconstruction. s.l.,
Proceedings of the fourth Eurographics symposium on Geometry processing, pp. 61-70.
Kinect for Windows Team, 2012. Starting February 1, 2012: Use the Power of Kinect for
Windows to Change the World. [Online]
Available at: http://blogs.msdn.com/b/kinectforwindows/archive/2012/01/09/kinect-for-windows-commercial-program-announced.aspx
[Accessed 2 April 2012].
Khoshelham, K. & Elberink, S. O., 2012. Accuracy and Resolution of Kinect Depth Data for
Indoor Mapping Applications. Sensors, 12(2), pp. 1437-1454.
Lee, J. C., 2011. Windows Drivers for Kinect, Finally!. [Online]
Available at: http://procrastineering.blogspot.co.uk/2011/02/windows-drivers-for-kinect.html
[Accessed 4 April 2012].
libfreenect, 2011. libfreenect. [Online]
Available at: https://github.com/OpenKinect/libfreenect
[Accessed 20 November 2011].
MANCTL, 2012. Calibrating your Kinect (OpenNI backend). [Online]
Available at: http://labs.manctl.com/rgbdemo/index.php/Documentation/Calibration
[Accessed 27 March 2012].
MeshLab, 2012. MeshLab. [Online]
Available at: http://meshlab.sourceforge.net/
[Accessed 18 April 2012].
Microsoft, 2012. KinectFusion Project Page. [Online]
Available at: http://research.microsoft.com/en-us/projects/surfacerecon/
[Accessed 11 April 2012].
Microsoft, 2012. The Kinect Effect. [Online]
Available at: http://www.xbox.com/en-GB/Kinect/Kinect-Effect
[Accessed 05 April 2012].
MicrosoftInt, 2011. Introduction to Kinect for Windows. [Online].
Newcombe, R. A. et al., 2011. KinectFusion: Real-Time Dense Surface Mapping and
Tracking. Basel, IEEE.
Nokia, 2012. Download Qt, the cross-platform application framework. [Online]
Available at: http://qt.nokia.com/downloads
[Accessed 19 April 2012].
Nvidia, 2011. NVIDIA GPU Computing Documentation. [Online]
Available at: http://developer.nvidia.com/nvidia-gpu-computing-documentation
[Accessed 14 April 2012].
OpenCV, 2012. OpenCV Download. [Online]
Available at: http://sourceforge.net/projects/opencvlibrary/files/opencv-win/
[Accessed 28 March 2012].
OpenKinect, 2011. OpenKinect. [Online]
Available at: http://openkinect.org/wiki/Main_Page
[Accessed 19 November 2011].
OpenKinect, 2012. OpenKinect. [Online]
Available at: http://openkinect.org/wiki/Main_Page
[Accessed 03 April 2012].
OpenKinectSrc, 2012. libfreenect. [Online]
Available at: https://github.com/OpenKinect/libfreenect
[Accessed 4 April 2012].
OpenNI, 2012. Abstract Layered View. [Online]
Available at: http://openni.org/Documentation/ProgrammerGuide.html
[Accessed 01 April 2012].
Pandya, H., 2011. Microsoft Kinect: Technical Introduction. [Online]
Available at: http://entreprene.us/2011/03/09/microsoft-kinect-technical-
introduction/kinect_hacks_introduction/
[Accessed 12 April 2012].
PCL, 2011. An open source implementation of KinectFusion. [Online]
Available at: http://pointclouds.org/news/kinectfusion-open-source.html
[Accessed 14 April 2012].
Research methods, 2012. Measurement Error. [Online]
Available at: http://www.socialresearchmethods.net/kb/measerr.php
[Accessed 23 April 2012].
ROS.org, 2010. Depth calculation. [Online]
Available at: http://www.ros.org/wiki/kinect_calibration/technical#Depth_calculation
[Accessed 03 April 2012].
ROS, 2011. openni_kinect/kinect_accuracy - ROS Wiki. [Online]
Available at: http://www.ros.org/wiki/openni_kinect/kinect_accuracy
[Accessed 11 April 2012].
Takahashi, D., 2012. Gamesbeat. [Online]
Available at: http://venturebeat.com/2012/01/09/xbox-360-surpassed-66m-sold-and-kinect-
has-sold-18m-units/
[Accessed 27 March 2012].
Taylor, J. R., 1999. An Introduction to Error Analysis: The Study of Uncertainties in Physical
Measurements. s.l.:University Science Books.
The Districts, 2011. The Districts. [Online]
Available at: http://thedistricts.wordpress.com/tag/film-terms/
[Accessed 22 April 2012].
Zalevsky, Z., Shpunt, A., Maizles, A. & Garcia, J., 2007. METHOD AND SYSTEM FOR
OBJECT RECONSTRUCTION. Israel, Patent No. WO2007/043036.
Zhang, Z., 1992. Iterative Point Matching for Registration of Free-form Curves. s.l.:s.n.
8. List of Figures
Figure 1 2D, 2.5D and 3D (D’Apuzzo, 2009) .......... 6
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011) .......... 10
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007) .......... 12
Figure 4 Structured light (Freedman, et al., 2010) .......... 13
Figure 5 OpenNI framework architecture (OpenNI, 2012) .......... 14
Figure 6 Installed OpenNI driver at the Windows Device Manager .......... 15
Figure 7 CMake process .......... 23
Figure 8 Kinect calibration RGB images .......... 27
Figure 9 RGBDFrameRecorder UML class .......... 28
Figure 10 color.png example picture .......... 30
Figure 11 Random and systematic error (Taylor, 1999) .......... 31
Figure 12 Kinect accuracy test setup .......... 32
Figure 13 Kinect accuracy test example frame .......... 32
Figure 14 Accuracy measurement with highlighted area .......... 33
Figure 15 Histogram .......... 33
Figure 16 Error map of the first frame (60 cm) .......... 34
Figure 17 Six example input frames for the Reconstructor .......... 37
Figure 18 Reconstructed point cloud .......... 38
Figure 19 RGBD Reconstructor flowchart .......... 39
Figure 20 Failed reconstruction shown from front and side view .......... 41
Figure 21 Reconstruction with marker .......... 42
Figure 22 After point cloud processing .......... 43
Figure 23 Point cloud cleaning .......... 43
Figure 24 Meshing process .......... 44
Figure 25 3D body model with measurement .......... 45
Figure 26 Model with arms extended (The Districts, 2011) .................................................... 47
9. Appendix A
Department of Computing Degree Project Proposal
Name: Nikolai Bickel Course: Computing Size: double
Discussed with (lecturer): Dr. Bogdan Matuszewski, Chris Casey Type: development
1 Previous and Current Modules
Object Oriented Methods in Computing (CO3402)
Enterprise Application Development (CO3409)
Database Driven Web Sites (CO3708)
Computer Vision (EL 3105)
2 Problem Context
There are a few problems when buying clothes online, the most common being that the
purchased clothes do not fit. This is exacerbated by the fact that many users don’t know their
own size or the size of those they are purchasing for (such as parents who purchase garments
for their children). Many people deal with this problem by ordering several sizes of the same
clothes and send back the excess.
3 The Problem
For those who buy several sizes it can be a nuisance to return the excess clothes. Additionally, if a customer wishes to purchase clothes for a special occasion, they might be reluctant to order online as they might be unsure of the fit of the clothes ordered.
The online stores also bear the costs associated with this problem, as they usually pay the
shipping costs for the returned items, and also deal with several logistical issues along with
the costs associated with the resale of the items.
4 Potential Ethical or Legal Issues
None.
5 Specific Objectives
Access information from the sensor (RGBD sensor -> Microsoft Kinect)
Isolate the important data
Try to get the sizes of body parts
Convert measurements into clothing size (S, M, L, XL)
Compare program results -> real data
6 The Approach
I want to capture the data of a person's body using an RGBD sensor, namely the Microsoft Kinect. The Microsoft Kinect takes an RGB picture of a person and also captures depth using an additional sensor. Depth sensors measure the third dimension, that is, the distance of the object from the camera.
The depth information will be very important data to work with. There are better sensors than the Kinect, but it is very cheap.
To get the data from the Kinect it must be connected to a computer over USB. There are several interfaces for obtaining the required “Kinect data”. The first approach will be to build a desktop application that can handle the “Kinect data” and derive the required clothing sizes.
7 Resources
Microsoft Kinect
Kinect Interfaces
o Kinect for Windows SDK from Microsoft Research (free for research)
o OpenKinect (free)
A PC to connect the Kinect over USB
8 Potential Commercial Considerations
8.1 Estimated costs and benefits
Whether the final product can be used commercially depends on how accurately the
measurements can be made. At this time I cannot say how accurate the measurements will be.
Not every person has an RGBD sensor at home, but perhaps in the coming years every webcam
will have a depth sensor to provide this information.
9 Literature Review
Is the data from the Microsoft Kinect good enough to get exact data of a person’s body?
10 References
Jamie, S. et al. 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images.
[ONLINE] Available at:
http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf. [Accessed 27
September 11].
Christian Plagemann, Varun Ganapathi, Daphne Koller, Sebastian Thrun. 2010. Real-time
Identification and Localization of Body Parts from Depth Images. [ONLINE] Available at:
http://www.stanford.edu/~plagem/bib/plagemann10icra.pdf. [Accessed 27 September 11].
eurogamer.net. 2010. Kinect visionary talks tech. [ONLINE] Available at:
http://www.eurogamer.net/articles/digitalfoundry-kinect-tech-interview. [Accessed 27
September 11].
Similar project for webcams (without depth information)
http://www.upcload.com/
http://www.seventeen.com/fashion/virtual-dressing-room
Kinect for Windows Software Development Kit (SDK) beta from Microsoft Research,
http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/
Free, open source libraries that will enable the Kinect to be used with Windows, Linux, and
Mac
http://openkinect.org/wiki/Main_Page
10. Appendix B
Department of Computing Final Year Project Technical Plan
Name: Bickel Nikolai Size: double Mode: ft Time: 1
Course: Computing Supervisor: Dr. Bogdan Matuszewski
1 Summary
I want to build a computer application which connects to an RGBD sensor (the Microsoft Kinect). The user of the program should be able to stand in front of the computer and see his body measurements and what clothing size would fit him. That means the user should also be able to see the body measurements that determine the process of finding the clothing sizes.
The challenge of the project is to find the best algorithm to get robust measurements: measuring twice should give nearly the same results. Once I have robust and accurate measurements it will not be difficult to find the right clothing size. To find the best algorithm I need to test different approaches. There may be accuracy problems due to the quality of the data provided by the Kinect.
2 Constraints
Because I am not working with an external partner, I only have the deadlines given by the university.
Project Deadlines:
Proposal: 27-09-2009
Technical Plan: 20-10-2009
Literature Report: 24-11-2009
Project Report: 26-04-2010
3 Key Problems
As mentioned before, one of the key problems is to ensure the measurements are robust. To ensure that the results are accurate, several components must work in tandem. When I am able to get the data via USB I will have a byte array, so I need to isolate the important data contained within it; some external resources will help me to find that data.
I am unsure of the data quality from a Microsoft Kinect, and there may be some use cases which would be too difficult to implement. I may need to write algorithms that compensate for the problem of noisy data. Another potential problem is that I may not be able to change some of the hardware restrictions of the Microsoft Kinect.
The results of the measurements should be presented in an understandable way, and the application should not be too difficult to use. Because I do not know the standards of the clothing industry, I need to invest some time in finding out what standards the textile and clothing industry use. Only with this knowledge can I convert my measurements into representative clothing sizes (e.g. S, M, L and XL).
4 Risk Analysis
Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best of the data I get from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK) | High | Medium | Try to configure the tools correctly so that they give the best possible results
Robustness – the data of two measurements don't match | Medium | High | Try to reduce the error rate as much as possible
The RGBD sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Do my best; otherwise reduce the measurement points to those that are possible
Scheduling failure, not enough time to complete project | High | Medium | Work within a timetable with the help of a Gantt chart
5 Options
Middleware
o Kinect for Windows SDK from Microsoft Research (Microsoft)
o OpenNI Framework (OpenNI organization)
o OpenKinect (Open Source)
You can see the architecture of the application in the section “System & Work Outline”.
6 Potential Ethical or Legal Issues
When I want to test my application I cannot test it with my body measurements alone. I need other subjects to test whether my application is robust, as people's body sizes vary, for example whether someone is heavier or thinner. I will ask some volunteers to test my application, but I will not publish any personal details (e.g. pictures) in my reports. I may need to publish some anonymised data for example purposes.
7 System & Work Outline
Sensor array:
Microsoft Kinect
Middleware:
The middleware provides the USB driver, and each framework has its own additional features; for example, they provide image, depth and audio streams or skeleton tracking.
(Architecture outline: Sensor array -> USB -> Middleware -> My Application)
As part of my preparation for the technical plan I will try to install all of the different middlewares and experiment with them. Each of them has pros and cons. At the moment I cannot say which of the products I want to use; I need more time to test them thoroughly. As I mention in my Gantt chart, I will work with all of them next and then choose one of them.
Kinect for Windows SDK from Microsoft Research (Microsoft)
OpenNI Framework (OpenNI organization)
OpenKinect (Open Source)
The middlewares are not compatible with each other.
My Application:
Which programming language I will use depends on which middleware I choose; I may need to search for a wrapper. All of the middleware options support Microsoft .NET programming languages. I think I will program the application in C# or C++. I can handle both of them, and I think the programming language is not one of the big problems in this project.
8 Commercial Analysis
8.1 Estimated costs and benefits
Factor name | Description | Cost or benefit | Estimated amount | Estimate of when paid
Kinect for Xbox 360 | RGBD sensor (image, depth and audio streams) | Cost | £100 | Before the project
Software | Microsoft Visual Studio, Netbeans, middleware | Benefit | £0 | MSDNAA software and free software
Miscellaneous | Measuring tape, pocket rule | Cost | £15 | Payable during project
Working time | Development and research | Cost | 300-400 working hours | During project
8.2 Commercial Conclusion
Whether the final product can be used commercially depends on how accurately the
measurements can be made. At this time I cannot say if the measurements will be accurate.
Currently not all people have an RGBD sensor at home, but perhaps in the coming years every
normal computer webcam will provide depth information.
At the moment the middleware “Kinect for Windows SDK from Microsoft Research” is licensed only for non-commercial use, but a licence for commercial use will be released. The beta SDK has been developed to support wide exploration and experimentation by academic and research communities.
11. Appendix C
12. Appendix D
Build a 3D body model with a single Kinect sensor
Nikolai Bickel,
BSc (Hons) Computing
Project: Body measurements with the Microsoft Kinect
Supervisor: Dr. Bogdan Matuszewski
Second Reader: -
25. November 2011
Abstract
Depth cameras are not conceptually new, but the Microsoft Kinect has made this kind of sensor popular among researchers and enthusiasts. A 3D body model is beneficial for applications in many different areas.
This paper gives an overview of how to build a 3D body model with a single Kinect sensor. It also gives some technical details about the specification and capabilities of the Microsoft Kinect system. Different algorithms for solving the problems of building a 3D body model with a Kinect will be discussed.
1 Introduction
1.1 Context
The Kinect is becoming an important 3D sensor, and not because it is the best sensor: it is because of its reliability and low cost. If you want to build a 3D body model with a Kinect it is important to keep in mind some of the problems that can appear. An important part of working with a technical device is knowing its basic behaviour; because of that, this paper also contains some interesting information about the device itself. The paper will show some possible ways to approach the problem of building a 3D body model with a single Kinect sensor.
1.2 Overview
Section 2 (Kinect sensor) describes the Microsoft Kinect sensor and its capabilities, with some additional information about its accuracy, and Section 3 (Calibration) covers the calibration process. Section 4 (Collect data from the Kinect) gives an overview of the different options for accessing the data from the Kinect. The following Section 5 (The object “human body”) discusses some error sources when dealing with a human body. Section 6 (Pre-processing Kinect data) explains how to process the collected data before it can be used in the different approaches pointed out in Section 7 (3D registration). To get a usable body shape there is a technique called meshing, which is discussed in the final section.
2 Kinect sensor
Borenstein (2011) explains in his book “Making Things See” what a Kinect does. The difference between a normal camera and a Kinect is that the Kinect additionally collects depth data: it measures the distance to the objects placed in front of the camera. For a person there is no big difference between a normal picture and depth data, but for the computer it is not so easy to “see” what it needs in order to distinguish between them. When a computer analyses a picture it has only the colour of each pixel, and it is difficult to separate different objects and people. In a depth image, on the other hand, the computer has depth information for each pixel, and it is easier to find the data it is looking for because it knows how far away the object is from the sensor. A further benefit of depth data is that a 3D model can be built of what the camera sees, which is important for building a full 3D model of an object.
Functionality
The Kinect sensor has an RGB camera, an IR camera and an IR projector. The IR projector projects an irregular pattern onto the objects in front of the Kinect, and the depth camera creates a depth image by recognising the distortion of this pattern. The inventors of the Kinect describe the depth measurement as a triangulation process (Freedman, et al., 2010).
Kinect Sensor Array Specifications
Sensor item | Specification range
Viewing angle | 43° vertical by 57° horizontal field of view
Mechanized tilt range (vertical) | ±28°
Frame rate (depth and color stream) | 30 frames per second (FPS)
Resolution, depth stream | QVGA (320 × 240)
Resolution, color stream | VGA (640 × 480)
(MicrosoftInt, 2011)
Accuracy
Khoshelham (2011) has analyzed the accuracy of the Microsoft Kinect in the paper “Accuracy
analysis of Kinect depth data” and came to the following statement: “The random error of
depth measurements increases quadratic with increasing distance to the sensor and reaches 4
cm at the maximum range”. Khoshelham (2011) also comes to the conclusion that for mapping purposes the data should be acquired within 1-3 m distance to the sensor. The ROS homepage (ROS, 2011) states: “Because the Kinect is essentially a stereo camera, the expected error on its depth measurements is proportional to the distance squared.”
3 Calibration
Many literature resources point out that it is important to have a calibrated Microsoft Kinect in order to get accurate data (Weiss, et al., 2011), (ROS, 2011), (Pajdla, et al., 2011). Which calibration methods are available depends on the middleware in use. There are explanations and technical descriptions of the calibration process on the ROS homepage (ROS CA, 2011) (ROS CT, 2011). For the OpenKinect project there is a calibration method in the OpenKinect Wiki (Burrus, 2011). The paper “Accurate and Practical Calibration of a Depth and Colour Camera Pair” (Herrera, et al., 2011) explains how to calibrate a depth and colour camera pair.
4 Collect data from the Kinect
The normal purpose of the Microsoft Kinect (MicrosoftKin, 2011) is playing games on the Microsoft Xbox console. However, there are some projects that allow the Microsoft Kinect sensor to be connected to a personal computer. These middleware products provide the USB driver and interfaces to access the data from the Kinect. The most popular are:
OpenKinect
OpenKinect is an open source community around the Kinect. It focuses on the libfreenect driver (libfreenect, 2011); most of the driver code is written in C. Libfreenect is open source (Apache licence) and available for Windows, Linux and Mac. The community also provides a wiki with information about the Kinect (OpenKinect, 2011).
OpenNI framework
The OpenNI framework is published by the OpenNI organization (OpenNI organization, 2011). Companies like PrimeSense, which provides the 3D sensing technology used in the Kinect, belong to this organization. All source code of the driver and the sample programmes is available, written in C#.
Kinect for Windows SDK
The Kinect for Windows SDK is published by Microsoft (MicrosoftSDK, 2011). It gives developers access to the Kinect data for building applications in C++, C# or Visual Basic. The source code is not published, and at the moment the SDK is licensed for non-commercial use only.
Optional: Matlab
There are ways to combine the OpenNI driver (mexkinect, 2011) and the Microsoft Kinect SDK driver (Dirk-Jan Kroon, 2011) with Matlab. The OpenNI library wrapper functions are almost bug-free and offer more functionality than the wrapper functions for the Microsoft SDK, but they require an older OpenNI driver rather than the latest one.
All of the middlewares provide the raw data needed to build a 3D body model; they differ in the simplicity of installation and in how easily they connect to Matlab.
5 The object “human body”
When working with a body as a scanning object, there are several problems, summarised in a presentation by D’Apuzzo (2009): practical problems (movement, breathing, hair and eyes) and physical limits (stature, size and weight). There are also problems in the scanning process itself, such as scanning time, nudity and the privacy of the collected data. Movement in particular can be a big problem when using the ICP algorithm (see Section 7). As Allen, Curless & Popović (2003) write in their paper: “The human body comes in all shapes and sizes, from ballet dancers to sumo wrestlers.” This means it is difficult to make general assumptions about the object “human body”.
6 Pre-processing Kinect data
When collecting data from the Kinect, all measurement points are given relative to the sensor, but only the points belonging to the person in front of the Kinect are needed. The points that do not belong to the human body therefore have to be removed. This process is called segmentation (Rabbani et al., 2011). In the paper “Home 3D Body Scans from Noisy Image and Range Data”, Weiss, Hirshberg & Black (2011) explain the segmentation process this way: “We segment the body from the surrounding environment using background subtraction on the depth map. Given a depth map Dbg taken without the subject present and a depth map Df associated with a frame f, we take the foreground to be Dbg − Df > ϵ, where ϵ is a few mm. We then apply a morphological opening operation to remove small isolated false positives.”
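The background-subtraction rule quoted above can be sketched in a few lines of Python. This is only an illustrative sketch using NumPy and SciPy, not the authors’ implementation; the 5 mm default threshold, the convention that a depth value of 0 marks a missing reading, and the 3×3 structuring element are assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_foreground(d_bg, d_f, eps=5.0):
    """Background subtraction on depth maps, following the rule of
    Weiss, Hirshberg & Black (2011): a pixel is foreground when the
    background depth exceeds the frame depth by more than eps
    (the person stands in front of the background).
    d_bg, d_f: depth maps in millimetres; 0 marks a hole (no reading)."""
    valid = (d_bg > 0) & (d_f > 0)           # ignore holes in either map
    mask = valid & ((d_bg - d_f) > eps)      # foreground test: Dbg - Df > eps
    # Morphological opening removes small isolated false positives.
    return ndimage.binary_opening(mask, structure=np.ones((3, 3)))
```

Applied to a synthetic scene, a large foreground blob survives the opening while a single noisy pixel is discarded.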
The floor is also not needed for the 3D body model. To find the floor and delete its points from the point cloud, the Kinect’s on-board accelerometer can be used; the OpenNI middleware also provides a function that returns the floor plane coordinates.
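Once the floor plane is known (for example from the middleware’s floor-detection function), deleting its points reduces to a point-to-plane distance test. A minimal sketch, assuming the plane is supplied as a point and a normal in millimetres; the 20 mm tolerance is an assumption:

```python
import numpy as np

def remove_floor(points, plane_point, plane_normal, threshold=20.0):
    """Drop all points lying within `threshold` mm of the floor plane.
    points: (N, 3) array of 3D points; plane_point / plane_normal describe
    the floor plane, e.g. as reported by the middleware."""
    n = plane_normal / np.linalg.norm(plane_normal)
    dist = np.abs((points - plane_point) @ n)   # perpendicular distance
    return points[dist > threshold]
```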
7 3D - Registration
When the data from the Kinect has been collected and converted to 3D world coordinates, there is still no full 3D body model. To obtain one, it is necessary to collect data of the person’s body from different angles and combine it, because the Kinect can only capture the side of an object facing the sensor. For instance, when a person stands facing the Kinect, the sensor cannot see any information about that person’s back. The data captured from the different angles therefore has to be matched together into one body model. This process is called registration; an explanation can be found in the survey by Brown (1992).
The problem of working with a Kinect as the sensor is pointed out in the paper “Home 3D Body Scans from Noisy Image and Range Data” by Weiss, Hirshberg & Black (2011): “To estimate body shape accurately, we must deal with data that is monocular, low resolution, and noisy”. They use a part of the SCAPE model, which was developed by Anguelov et al. (2005). Since SCAPE is designed for shape completion, they use only the part of the model that factors body shape and pose information. The SCAPE algorithm needs a training database of body shapes to work correctly.
Once we have point clouds of a person captured from different angles, we need to combine them. A possible algorithm is ICP (Iterative Closest Point); an explanation can be found in the report by Zhang (1992). There are many implementations in different programming languages available on the internet, as well as many derivatives of the ICP method. Problems can occur with the ICP algorithm when the data from the Kinect is too noisy or not correctly segmented.
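The basic point-to-point ICP loop can be sketched as follows. This is a minimal illustration only, combining nearest-neighbour matching with the standard SVD-based rigid alignment (the Kabsch method); it omits the outlier rejection and noise handling that real Kinect data would require:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=20):
    """Minimal point-to-point ICP sketch: align `source` onto `target`
    (both (N, 3) arrays). Returns the transformed points and the
    accumulated rotation R and translation t."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        # 1. Match every source point to its closest target point.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Best rigid transform for these matches via SVD (Kabsch).
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                 # D guards against reflections
        t = mu_t - R @ mu_s
        # 3. Apply the transform and iterate.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return src, R_total, t_total
```

For a cloud that is merely a translated copy of the target, the loop recovers the offset in a single iteration because every nearest-neighbour match is already correct.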
Izadi et al. (2011) note that “Depth measurements often fluctuate and depth images contain numerous ‘holes’ where no readings were obtained”. A Kinect image has holes wherever the Kinect IR camera cannot “see” because of lighting conditions, reflection, transparency, occlusion, objects being out of range, or objects that do not reflect infrared; the Kinect needs the infrared pattern to work correctly.
8 Meshing
The outcome of the registration process is a point cloud: a set of disconnected points floating near each other in three-dimensional space. Looked at closely, the image breaks down into distinct points with visible space between them. “If we wanted to convert these points into a smooth continuous surface we’d need to figure out a way to connect them with a large number of polygons to fill in the gaps. This is a process called "constructing a mesh"” (Borenstein, 2011). An explanation of how to generate a mesh in Matlab is available in the article “A Simple Mesh Generator in MATLAB” by Persson & Strang (2004).
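As a toy illustration of “connecting points with polygons”, the convex hull of a point cloud already yields a triangle mesh. This is my simplified stand-in, not the Persson & Strang generator: a convex hull is only adequate for convex objects, and a real body scan needs a proper surface-reconstruction method.

```python
import numpy as np
from scipy.spatial import ConvexHull

def mesh_convex(points):
    """Connect a point cloud into a triangle mesh via its convex hull.
    Returns (vertices, triangles), where each triangle is a row of three
    indices into the vertex array. Only suitable for convex shapes."""
    hull = ConvexHull(points)
    return points, hull.simplices
```

For example, the four corners of a tetrahedron produce exactly four triangular faces.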
9 Conclusion
This paper should serve as a help in building a 3D body model of a person with a single Kinect. It is restricted to the human body as the object and to a single Kinect, although the process of obtaining a 3D model is approximately the same for other objects. The result could be improved by using multiple Kinect sensors or a large amount of training data. Overall, the process of getting a 3D body model is not easy, and making it automatable and user-friendly remains a big task.
10 References
Allen, B., Curless, B. & Popović, Z. (2003)
The space of human body shapes: reconstruction and parameterization from range scans.
SIGGRAPH '03
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J. & Davis, J. (2005)
SCAPE: Shape Completion and Animation of People.
SIGGRAPH Conference
Borenstein, G. (2011)
Making Things See.
O'Reilly Media
Brown, L.G. (1992)
A Survey of Image Registration Techniques.
ACM Computing Surveys, vol 24, pp. 325-376.
Burrus, N. (2011)
Kinect Calibration OpenKinect.
http://nicolas.burrus.name/index.php/Research/KinectCalibration
(visited Nov. 2011)
D’Apuzzo, N. (2009)
Hometrica. http://www.hometrica.ch/pres/2009_essilor_pres.pdf
(visited Nov. 2011)
Herrera C., D., Kannala, J. & Heikkilä, J. (2011)
Accurate and Practical Calibration of a Depth and Color Camera Pair.
LNCS 6855, vol II, pp. 437-445.
Kroon, D.-J. (2011)
Kinect Microsoft SDK. http://www.mathworks.com/matlabcentral/fileexchange/33035
(visited Nov. 2011)
Freedman, B., Shpunt, A., Machline, M. & Arieli, Y. (2010)
Depth mapping using projected patterns.
United States, Patent No. US 2010/0118123
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A. & Fitzgibbon, A. (2011)
KinectFusion: Real-time 3D Reconstruction and Interaction.
http://research.microsoft.com/pubs/155416/kinectfusion-uist-comp.pdf
(visited Nov. 2011)
Khoshelham, K. (2011)
Accuracy analysis of Kinect depth data.
ISPRS
libfreenect (2011)
libfreenect. https://github.com/OpenKinect/libfreenect
(visited Nov. 2011)
mexkinect (2011)
kinectmex. http://sourceforge.net/projects/kinect-mex/
(visited Nov. 2011)
MicrosoftInt (2011)
Introduction to Kinect for Windows, Microsoft. http://www.xbox.com/en-US/kinect
(visited Nov. 2011)
MicrosoftSDK (2011)
Microsoft Kinect SDK. http://kinectforwindows.org/
(visited Nov. 2011)
OpenKinect (2011)
OpenKinect. http://openkinect.org/wiki/Main_Page
(visited Nov. 2011)
OpenNI organization (2011)
OpenNI. http://openni.org/
(visited Nov. 2011)
Persson, P-O & Strang, G. (2004)
A Simple Mesh Generator in MATLAB.
SIAM Review, vol 46, pp. 329-345.
Rabbani, T., van den Heuvel, F.A. & Vosselman, G. (2011)
Segmentation of point clouds using smoothness constraint.
ISPRS Commission V Symposium
ROS (2011)
ROS (Robot Operating System). http://www.ros.org/wiki/openni_kinect/kinect_accuracy
(visited Nov. 2011)
ROS CA (2011)
ROS. http://www.ros.org/wiki/openni_camera/calibration
(visited Nov. 2011)
ROS CT (2011)
ROS (Robot Operating System). http://www.ros.org/wiki/kinect_calibration/technical
(visited Nov. 2011)
Smisek, J., Jancosek, M. & Pajdla, T. (2011)
3D with Kinect.
ICCV
Weiss, A., Hirshberg, D. & Black, M. (2011)
Home 3D Body Scans from Noisy Image and Range Data.
ICCV 2011
Zhang, Z. (1992)
Iterative Point Matching for Registration of Free-form Curves.