3D body measurements with
a single Kinect sensor
Nikolai Bickel
Supervisor: Dr. Bogdan Matuszewski
A project report submitted in partial fulfilment of the Degree of BSc (Hons.) Computing
26/04/2012
Double Project CO3808
University of Central Lancashire
Honours Degree Project Nikolai Bickel Page 2 of 69 26/04/2012
I. Abstract
This report describes the project that has been undertaken as part of the final Degree of
Computing BSc (Hons) at the University of Central Lancashire.
The Microsoft Kinect holds the Guinness World Record as the "fastest selling consumer
electronics device": 8 million units were sold in its first 60 days, and that was just the
beginning. By January 2012, 18 million Kinects had been sold worldwide. Most of these
devices were used with the Xbox gaming console as an input device in place of the normal
game controller, allowing the console and selected games to be controlled with gestures and
voice commands.
Very soon after this commercial success, a few enterprising engineers found that it was
possible to connect the Kinect not just to the gaming console but also to a standard
personal computer. Soon after a connection was established, they were also able to
access the data from the Kinect's sensors.
The Kinect as a 3D input device for computer vision applications, and much more, was born.
Its good accuracy, relative to its price, makes it a great 3D measurement device.
Because working with the Kinect is very interesting, this project analyses whether it is
possible to use the Kinect as an input device for body measurements such as height or chest
circumference, with the aim of using this data to communicate clothing sizes. That initial
idea was later narrowed to producing the 3D body model and measuring body height.
In this project the different capabilities of the Kinect and the ways to access and analyse its
data were explored. The next step was to run experiments with real people to assess how
accurate the results are compared to real-world measurements.
This report describes the research around the Kinect project, its capabilities and the
results of the body scanning experiments and accuracy analysis.
II. Acknowledgements
I would like to thank
- my supervisor Dr. Bogdan Matuszewski for his help and guidance over the whole
project lifecycle
- my family for supporting my study in Preston
III. Table of Contents
I. Abstract ........................................................................................................................... 2
II. Acknowledgements ......................................................................................................... 2
III. Table of Contents ............................................................................................................ 3
1. Introduction ..................................................................................................................... 5
1.1. Context......................................................................................................................... 5
1.2. Overview ..................................................................................................................... 5
2. Analysis and Investigation .............................................................................................. 6
2.1. Chapter Summary ........................................................................................................ 6
2.2. 3D human body scanning ............................................................................................ 6
2.2.1. What is 3D? .......................................................................................................... 6
2.2.2. What can be scanned? .......................................................................................... 6
2.2.3. Problems ............................................................................................................... 7
2.2.4. Use-cases for 3D body models .............................................................. 7
2.3. Project management .................................................................................................... 8
2.3.1. Project Proposal .................................................................................................... 8
2.3.2. Technical Plan ...................................................................................................... 8
2.3.3. Literature Review ................................................................................................. 8
2.3.4. Supervisor Meetings ............................................................................................. 8
2.3.5. Risk Analysis ........................................................................................................ 9
2.4. Microsoft Kinect ........................................................................................................ 10
2.4.1. What is a Microsoft Kinect? ............................................................................... 10
2.4.2. Hardware details ................................................................................................. 11
2.4.3. Principles of Kinect ............................................................................................ 12
3. Design Issues ................................................................................................................. 14
3.1. Chapter Summary ...................................................................................................... 14
3.2. Kinect interfaces ........................................................................................................ 14
3.2.1. OpenNI framework ............................................................................................ 14
3.2.2. OpenKinect ......................................................................................................... 16
3.2.3. Microsoft Kinect SDK ....................................................................................... 18
3.3. RGB-Demo/Nestk (data processing) ......................................................................... 20
3.3.1. Current features of the RGBDemo ..................................................................... 20
3.3.2. Installation and Compilation .............................................................................. 21
3.3.3. RGBDemo installation ....................................................................................... 22
3.4. Supporting tools ......................................................................................................... 23
3.4.1. CMake ................................................................................................................ 23
3.4.2. Git ....................................................................................................................... 25
3.4.3. Meshlab .............................................................................................................. 25
3.4.4. Microsoft Visual Studio ..................................................................................... 25
4. Implementation .............................................................................................................. 26
4.1. Chapter Summary ...................................................................................................... 26
4.2. Kinect Calibration...................................................................................................... 26
4.3. Collect data ................................................................................................................ 28
4.3.1. How to save the Kinect data ............................................................................... 28
4.3.2. What data are saved per frame? ......................................................................... 30
4.4. Accuracy measurements ............................................................................................ 31
4.4.1. Random error ...................................................................................................... 31
4.4.2. Systematic error .................................................................................................. 31
4.4.3. The setup ............................................................................................................ 32
4.4.4. Colour Segmentation .......................................................................................... 32
4.4.5. Histograms ......................................................................................................... 33
4.4.6. Standard deviation .............................................................................................. 34
4.4.7. Error map ............................................................................................................ 34
4.4.8. Problems ............................................................................................................. 34
4.4.9. Results ................................................................................................................ 34
4.5. 3D reconstruction ...................................................................................................... 35
4.5.1. PCL Kinect Fusion ............................................................................................. 35
4.5.2. RGBDemo Reconstructor .................................................................................. 35
4.5.3. Implementation ................................................................................................... 39
4.5.4. Problems/Solutions............................................................................................. 41
4.6. Point cloud processing ............................................................................................... 43
4.7. Meshing ..................................................................................................................... 44
4.8. Measurements ............................................................................................................ 45
5. Critical Evaluation ......................................................................................................... 46
5.1. Chapter Summary ...................................................................................................... 46
5.2. Project Management Evaluation ................................................................................ 46
5.3. Design and Implementation ....................................................................................... 46
5.4. Possible Improvements .............................................................................................. 47
5.5. Learning Outcomes .................................................................................................... 48
6. Conclusion ..................................................................................................................... 48
7. References ..................................................................................................................... 49
8. List of Figures ............................................................................................................... 52
9. Appendix A ................................................................................................................... 53
10. Appendix B ................................................................................................................... 56
11. Appendix C ................................................................................................................... 61
12. Appendix D ................................................................................................................... 62
1. Introduction
1.1. Context
There are a few problems when buying clothes online, the most common being that the
purchased clothes do not fit. This is exacerbated by the fact that many users do not know their
own size or the size of those they are purchasing for (such as parents who purchase garments
for their children). Many people deal with this problem by ordering several sizes of the same
clothes and sending back the excess.
For those who buy several sizes it can be a nuisance to return the excess clothes. Additionally,
if a customer wishes to purchase clothes for a special occasion, they might be reluctant to
order online as they might be unsure of the fit of the clothes ordered. The online stores
also bear the costs associated with this problem, as they usually pay the shipping costs for the
returned items, and also deal with several logistical issues along with the costs associated with
the resale of the items. The project idea was to obtain the measurements with the help of a
Microsoft Kinect, a natural target because the device is a cheap depth scanner.
A customer should be able to upload their measurement data to a website so that the online
shop can check whether the clothes the person wants to buy would fit. However, this was
not part of this project. The main work was the collection, analysis and combination of the
data from the Kinect with the help of several different tools explained in this report.
In addition to the 3D reconstruction, a setup and a program to analyse the accuracy and noise
of the Kinect data were made.
The first thought for the usage of the project outcome was online shops, but a 3D body
model could be useful in various other applications, for example to generate a virtual game
character (avatar) for computer games, or for medical analysis.
1.2. Overview
Chapter 2 (Analysis and Investigation) contains general information about 3D, especially in
the context of human bodies. It also contains project-relevant information, for example about
project meetings, additional project documents and an evaluation of the risk analysis.
Chapter 3 (Design Issues) contains information about the interfaces to connect the Kinect to
the computer. It also contains information about the program RGBDemo which was used in
this project. The next part introduces a few other supporting programs.
Chapter 4 (Implementation) contains more information about how the 3D body model was
produced and anything else that is important to produce a 3D body model. The chapter also
contains information about the Kinect error and the accuracy experiment that has been made
during this project.
Chapter 5 (Critical Evaluation) contains evaluations and thoughts about the different aspects
of the project. It also includes possible improvements and learning outcomes.
2. Analysis and Investigation
2.1. Chapter Summary
This chapter contains general information about 3D, especially in the context of human
bodies. It also contains project-relevant information, for example about project meetings,
additional project documents and an evaluation of the risk analysis.
2.2. 3D human body scanning
2.2.1. What is 3D?
Figure 1 2D, 2.5D and 3D (D’Apuzzo, 2009)
3D stands for three-dimensional. Objects in our world can be described by three parameters.
These three dimensions are commonly called length, width and depth. That is also the reason
why the Kinect is sometimes called a "depth sensor": the depth sensor of the Kinect can
capture the scene in front of it in these three dimensions.
2.2.2. What can be scanned?
It is possible to scan the whole body or just parts of it, for example the chest, the back, the
face or the legs. In this project a full body scan of a person is used. Scanning only a part of
the body is advantageous when there is a specific interest in that part, for example in medical
applications.
2.2.3. Problems
When working with a human body as a scanning object there are some problems, which are
summarized in the presentation of D'Apuzzo (D'Apuzzo, 2009):
- Practical problems
  o Movements
  o Breathing
  o Hair
- Physical limits
  o Stature
  o Size
  o Weight
- Private data
Especially the movements during scanning were a big problem when reconstructing the
body.
2.2.4. Use-cases for 3D body models
Animation – When a 3D body model of a person is available it is possible to animate the
body with computer graphics techniques. Such animation could be useful in computer games.
Ergonomics – For example, a company could produce a chair tailored to a specific person's
body.
Fitness / Sport – A series of scans over time could show the progress of weight loss
(motivation).
Medicine – A 3D model of a face could be useful for plastic surgery.
2.3. Project management
2.3.1. Project Proposal
The purpose of the project proposal (see Appendix A) was to describe the problem to be
tackled in the project and, in general terms, how it should be solved. The proposal includes
relevant literature found during an initial search, and an initial idea for tackling the
problem.
2.3.2. Technical Plan
In the technical plan (see Appendix B) the project is specified in more detail. It also contains
project-management material such as project deadlines. At this stage of the project a risk
analysis was made and potential ethical or legal issues were discussed. The technical plan
additionally contains a small commercial analysis of the costs to be expected.
An important part was also scheduling the project. This was realized with a Gantt chart (see
Appendix C), which illustrates the different project stages in the form of a bar chart.
2.3.3. Literature Review
The literature review contains a discussion of the published work around the project topic.
Because the Kinect is relatively new, it was not easy to find literature specific to it. The title
of the literature review is "Build a 3D body model with a single Kinect sensor". The work on
the literature review gave an overview of what is important in this project.
2.3.4. Supervisor Meetings
There were several meetings and conversations with the supervisor, Dr. Bogdan Matuszewski,
during the course of the project from September until the end of April.
From the start of the academic year in September until the Christmas vacation we met nearly
every week. Those meetings were shared with electronics bachelor students who also had
Dr. Matuszewski as a supervisor.
These meetings were organized so that every week one person had to present his project to
the other students, starting with his thoughts about the project. For this, a PowerPoint
presentation was made to illustrate the problems. The purpose of the presentation was to
learn how to present project progress, while the students listening learned to think about and
critique other projects. Sometimes the other projects introduced thoughts and ideas which
could then be adapted to this project; for example, the colour segmentation (see chapter
4.4.4) was suggested in one of these meetings.
In the second semester there were only individual meetings with Dr. Matuszewski, so he
could go into more detail when supervising. These meetings happened approximately every
two to three weeks. In these meetings feedback was also given on the literature review, the
pre-Christmas progress review, the post-Christmas progress review and the acceptance check.
2.3.5. Risk Analysis
Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best out of the data obtained from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK, OpenNI) | High | Medium | Try to configure the tools the right way so that the interfaces give the best results possible
Robustness – the data of 2 measurements do not match | Medium | High | Try to reduce the error rate as much as possible
The Kinect sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Try the best, otherwise reduce the measurement points
Scheduling failure, not enough time to complete project | High | Medium | Make a Gantt chart
Table 1 Risk analysis
This is the risk analysis made for the project as part of the technical plan (see Appendix B) at
the beginning of the project. This chapter discusses these risks and how they affected the
project.
The first risk was noisy data from the Kinect. Every measurement device has some noise, and
because the Kinect was designed as a controller for a gaming console, perfect accuracy was
never required of it. But it still works, as the results at the end of the report show.
The data from the Kinect interfaces on the PC were quite good. There was no problem with
the Kinect interfaces (see chapter 3.2) giving inaccurate values.
The Kinect sensor did not break, but the hard drive did. The likelihood of this issue was rated
"Low". However, a backup of all project-relevant data was available, so the impact on the
project was not very big. It was just a small problem, although a project meeting with the
supervisor had to be rescheduled because the replacement hard drive was not delivered on
time.
2.4. Microsoft Kinect
2.4.1. What is a Microsoft Kinect?
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011)
The way the Microsoft Kinect is used in this project is not its usual purpose. Microsoft
announced the Kinect as a gaming input device. It interacts with the Xbox, a video game
console manufactured by Microsoft, as an optional peripheral device. The Kinect is
connected by wire to the Xbox and enables advanced gesture recognition, facial recognition
and voice recognition. The Kinect for Xbox 360 was released in November 2010, and
18 million units of the sensor had been shipped worldwide by January 2012
(Takahashi, 2012). In this project the Kinect for Xbox 360 was used for research and
development. The sensor has a practical ranging limit of 1.2 – 3.5 m when used with the
Xbox software.
The Kinect is connected to the Xbox through a USB port. Because the USB port cannot
supply enough power for the Kinect, the device makes use of a proprietary connector
combining USB communication with additional power. Newer versions of the Xbox do
not need this special power supply cable.
Shortly after the Kinect was announced, hackers figured out how to use the Kinect as a non-
gaming device on PCs. People used the Kinect to control robots or to play normal computer
games with gestures.
In this project the Kinect was not used with an Xbox. Instead it was connected to a
computer by a standard USB 2.0 port.
The Kinect has an RGB camera, a depth sensor and a multi-array microphone. More is
written about these features in chapter 2.4.2.
There are two different versions of the Microsoft Kinect. As Microsoft discovered that many
people wanted to use the Kinect beyond gaming, they started thinking about supporting these
researchers and companies too. That is why they announced an SDK, which is described in
chapter 3.2.3. In the middle of this project Microsoft announced a new Kinect hardware
version, the "Kinect for Windows".
The Microsoft Kinect for Xbox 360 is the original Kinect sold for the Xbox. With this Kinect
version people are not allowed to make commercial products using the Kinect, but the device
drivers for the computer still work with it. The Microsoft Kinect for Windows will be
available in May 2012. The hardware of the device did not change, but the device now
supports a "Near Mode" which enables the depth camera to see objects that are closer to the
camera. With the "Kinect for Windows", developers are also able to use the Kinect
commercially.
The Kinect is sold bundled with the game "Kinect Adventures". At the moment the Kinect
sensor with Kinect Adventures costs £99.00 (02/04/2012). For a device with a depth sensor
this is very cheap. The price of the Kinect for Xbox is subsidized by Microsoft in the hope of
"consumers buying a number of Kinect games, subscribing to Xbox Live, and making other
transactions associated with the Xbox 360 ecosystem" (Kinect for Windows Team, 2012).
This is also the reason why the Kinect for Windows will cost approximately £100 more than
the Kinect for Xbox version.
Within its first 60 days the Kinect sold more than 8 million units, a Guinness World Record
for the "fastest selling consumer electronics device". In January 2012 Microsoft announced
that 18 million Kinect motion-sensing systems had been sold. Certainly most Kinects are
used for gaming and not for developing applications on the computer.
2.4.2. Hardware details
2.4.2.1. RGB camera
The Kinect has a traditional colour video camera in it, similar to webcams and mobile phone
cameras. Microsoft calls this an RGB camera, referring to the red, green and blue colours it
detects. The camera has a resolution of 640x480 pixels at 30 frames per second. The RGB
camera has a slightly larger angle of view than the depth sensor.
2.4.2.2. 3D Depth Sensor
The 3D depth sensor is the heart of the Kinect's unique capabilities. The sensor provides a
640x480 pixel depth map with 11-bit depth (2048 levels of sensitivity) at 30 frames per
second. An advantage of a depth sensor is that it is colour and texture invariant.
2.4.2.3. Multi-array microphone
The Kinect includes an array of four built-in microphones, each providing 16-bit audio at a
sampling rate of 16 kHz. They are used by the Xbox to capture voice commands from the
user; having more than one microphone helps isolate the commands from the noise in the
room.
2.4.2.4. Tilt motor for sensor adjustment
The motorized pivot is capable of tilting the sensor up and down. With this motor the Kinect
is able to extend its vertical field of view.
2.4.3. Principles of Kinect
There are different methods of measuring the depth information of a scene. Those which
work with light waves, like the Kinect, include laser scanning and time-of-flight techniques.
The Kinect uses a structured-light technique.
Structured light is the process of projecting a known pattern of pixels onto a scene and then
analysing the way the pattern deforms when it hits a surface. This structured light may or
may not be visible to the eye. The Kinect uses invisible (or imperceptible) structured light,
because the light pattern is near-infrared light, which the human eye cannot see. The
structured light projected by the Kinect has a pseudo-random pattern.
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007)
The Light Source Unit (IR light source) projects the light pattern onto the scene. The Light
Detector (CMOS IR camera) observes the scene and the control system calculates the depth
information.
The calibration between the projector and the detector has to be known; it was carried out at
the time of manufacture, when a set of reference images was taken at different distances and
stored in the device's memory.
Figure 4 Structured light (Freedman, et al., 2010)
The speckle pattern (structured light) produced by the IR light source varies along the z-axis.
Figure 4 shows a human hand (object) with different speckle patterns at different distances.
The Kinect uses three different sizes of speckles for three different regions of distance.
Because the speckles have a distance-dependent property, each position has its specific
spacing and shape. The control system of the Kinect estimates the depth by correlating each
window with the reference data (speckle pattern). The reference pattern is stored at a known
depth in the Kinect's memory.
“The best match with the stored pattern gives an offset from the known depth, in terms of
pixels: this is called disparity. The Kinect device performs a further interpolation of the
best match to get sub-pixel accuracy of 1/8 pixel. Given the known depth of the memorized
plane, and the disparity, an estimated depth for each pixel can be calculated by
triangulation.” - (ROS.org, 2010)
3. Design Issues
3.1. Chapter Summary
This chapter contains information about the interfaces to connect the Kinect to the computer.
It also contains information about the program RGBDemo which was used in this project. The
next part introduces a few other supporting programs.
3.2. Kinect interfaces
At the beginning of the project a decision had to be made on how to get the information from
the Kinect. There were three possible interfaces for accessing it. Often these interfaces had
to be "built from source code"; in the field of computer software this term means the process
of converting source code files into software that can be run on a computer.
3.2.1. OpenNI framework
The OpenNI framework is published by the OpenNI organization, one of whose goals is to
accelerate natural user interfaces. The founding members of the OpenNI organization are
PrimeSense, Willow Garage, Side-Kick, ASUS and AppSide. PrimeSense is an Israeli
company that provides the 3D sensing technology used in the Kinect, and ASUS is a
multinational computer hardware and electronics company. The OpenNI organization
provides an API that covers communication with both low-level devices (e.g. vision and
audio sensors) and high-level middleware solutions (e.g. visual tracking using computer
vision).
Figure 5 OpenNI framework architecture (OpenNI, 2012)
The OpenNI framework is not just for the Microsoft Kinect. It also supports other hardware,
for example the ASUS Xtion PRO.
3.2.1.1. Installation
OpenNI is designed to be very modular. To install and access the API it is necessary to
install three different components, plus the Kinect driver, to work with the Kinect. It is
important to install these components in the right order.
The download for the components can be found on the homepage of the OpenNI organization:
http://www.openni.org/Downloads/OpenNIModules.aspx
All components are available as executables in 32 and 64-bit versions for Windows and
Ubuntu. There are also stable and unstable releases available. To use the Kinect mod (step 4),
the unstable releases are currently required, as explained in the last step of the installation
process.
The first step is to install the OpenNI Binaries.
The second step is to install the NITE module (download category: OpenNI Compliant
Middleware Binaries). Many old installation instructions on the internet say that a licence
key is necessary, but in all of the latest NITE installation packages the licence key is added
automatically.
The third step is to install the Primesensor Module (Download category: OpenNI Compliant
Middleware Binaries).
The last step is to install the SensorKinect driver, which can be downloaded from
https://github.com/avin2/SensorKinect. It is important to read the README file; the current
version says "You must use this kinect mod version with the unstable OpenNI release".
When all these packages are installed, a restart of the system is highly recommended.
If the installation was successful, the driver for the Kinect is installed when the Kinect is
connected to the USB port for the first time (shown in Figure 6).
Figure 6 Installed OpenNI driver at the Windows Device Manager
3.2.1.2. Interface
Data from the Kinect can be accessed via the OpenNI framework with Java, C++ and C#.
To use the interface in C/C++, add the include directory "$(OPEN_NI_INCLUDE)". This is an environment variable that points to the location of the OpenNI include directory; its standard location is C:\Program Files\OpenNI\Include.
Also add the library directory "$(OPEN_NI_LIB)", an environment variable that points by default to C:\Program Files\OpenNI\Lib.
The source code should include XnOpenNI.h if using the C interface or XnCppWrapper.h if
using the C++ interface.
3.2.1.3. License
OpenNI is written and distributed under the GNU Lesser General Public License (LGPL), which means that its source code is freely distributed and available to the general public.
3.2.1.4. Documentation
The OpenNI framework is well documented. There is a Programmer Guide which explains the OpenNI system architecture and programming object model; these explanations are illustrated with code snippet examples.
The OpenNI framework also provides example applications in C, C++ and Java, with their features thoroughly explained in the documentation.
As with any mature framework, OpenNI also supplies reference documentation of the interface, its classes and its members.
3.2.2. OpenKinect
OpenKinect is according to their homepage “an open community of people interested in
making use of the amazing Xbox Kinect hardware with our PCs and other devices. We are
working on free, open source libraries that will enable the Kinect to be used with Windows,
Linux, and Mac” (OpenKinect, 2012).
The OpenKinect project has an interesting history. In November 2010 the website adafruit.com announced a competition for hackers, offering $3,000 to the first person who could connect the Kinect to a PC and access its image and depth data; the source code had to be open source and/or public domain. A few days later Hector Martín succeeded in hacking the Kinect and won the competition.
When Microsoft finally announced their own driver for the Kinect, a Microsoft employee called Johnny Chung Lee shared his secret on his blog:
“Back in the late Summer of 2010, trying to argue for the most basic level of PC
support for Kinect from within Microsoft, to my immense disappointment, turned out
to be really grinding against the corporate grain at the time (for many reasons I won't
enumerate here). When my frustration peaked, I decided to approach AdaFruit to put
on the Open Kinect contest” - (Lee, 2011)
Johnny Chung Lee does not work at Microsoft anymore, but this statement means that the person who started the whole wave of Kinect hacks was one of the Kinect's own developers within Microsoft.
The heart of OpenKinect is “libfreenect”. Libfreenect includes all the code necessary to activate, initialize, and exchange data with the Kinect hardware, including drivers and a cross-platform API that works on Windows, Linux, and OS X. At the moment there is no access to the Kinect's audio stream.
The OpenKinect roadmap mentions an OpenKinect Analysis Library, which is intended to process the raw information into more useful abstractions such as hand tracking, skeleton tracking, point cloud generation and 3D reconstruction. However, the project also notes that it may take months or years to implement these functionalities.
3.2.2.1. Installation
To explore the OpenKinect project, the driver was built from the source code. To build the project for Windows you first need to download the source code from GitHub (OpenKinectSrc, 2012).
The next step is to install the dependencies. On Windows these are libusb-win32, pthreads-win32 and GLUT. Copy the .dll files from the pthreads and GLUT dependencies to /windows/system32.
There are two parts to libfreenect: the low-level libusb-based device driver and the libfreenect library that talks to that driver. The next step is to install the low-level device driver. This can be done in the Windows Device Manager: right-click on each of the Kinect devices (“Xbox NUI Motor”, "Xbox NUI Camera" and "Xbox NUI Audio") and select "Update Driver Software...". The drivers are located in the downloaded source code in the folder “/platform/windows/inf”.
After this step, use CMake to configure the compiler and create the makefiles (CMake is discussed in chapter 3.4.1).
The next step is to compile the source code with the compiler that was selected in CMake. To use the library, it should be copied to “/windows/system32” or to the folder of the program you want to run with the library.
More information can be found in the README files and the OpenKinect wiki.
3.2.2.2. Interface
OpenKinect is a low-level API. At the moment it supports only a few basic functions: it allows access to the camera, the depth map, the LED and the motor for tilting the sensor.
public class KinectDevice {
public KinectDevice()
Signature: ()V;
public setLEDStatus(LEDStatus)
Signature: (LLEDStatus;)V
public getLEDStatus()
Signature: ()LLEDStatus;
public setMotorPosition(float)
Signature: (F)V
public getMotorPosition()
Signature: ()F
public getRGBImage()
Signature: ()LI
public getDepthImage()
Signature: ()LI
}
These are the functions OpenKinect offers to the user. As you can see, it offers access to the RGB image, the depth image and the motor position.
The API is written in C, but there are wrappers for Python, C++, C#, Java and several other programming languages. A wrapper in this case is a bridge between the C API and the respective programming language.
3.2.2.3. License
This project is dual-licensed: recipients can choose under which terms they want to use or distribute the software. The two licenses are the Apache 2.0 license and the GPL v2 license. This means you can copy, modify and distribute the covered software in source and/or binary form under certain conditions; for example, all copies must be accompanied by a copy of the license.
3.2.2.4. Documentation
OpenKinect provides a wiki with relevant information about the interface and the Kinect itself. A wiki is a website whose users can add, modify or delete its content via a web browser, like the well-known site Wikipedia.
3.2.3. Microsoft Kinect SDK
The Microsoft Kinect SDK is the original programming interface to the Microsoft Kinect. It was announced by Microsoft in spring 2011 after they saw what impact the OpenKinect project had on the developer community.
During the project in March 2012 Microsoft announced a new version of the SDK called
“Kinect for Windows 1.5”.
At the start of the project, Microsoft called its software development kit the “Microsoft Kinect SDK”. At the beginning of 2012 the name of the Kinect project was changed to “Kinect for Windows”, and the name of the SDK changed to “Kinect for Windows SDK”.
In addition to publishing the Kinect SDK, Microsoft also started the 'Kinect Effect' marketing campaign. The campaign aims to show that a product designed for entertainment is having a big impact on people's lives; the intention is that consumers will view the product as a source of 'innovation and inspiration'. Microsoft created a video showing different use cases of the Kinect beyond the normal gaming purpose. (Microsoft, 2012)
Technically it is possible to use the Microsoft Kinect SDK with the “Kinect for Xbox”, but Microsoft recommends switching to the new version of the Kinect, called “Kinect for Windows”, which is designed especially for Windows (see chapter 2.4.1).
The Kinect for Windows is only available for the Windows operating system. This includes Windows 7 and Windows Embedded Standard 7; currently it can also be used with the Windows 8 Developer Preview.
3.2.3.1. Installation
The installation of the Kinect driver for Windows is the easiest of all the Kinect programming interfaces: you only have to download the Kinect SDK application, start the setup and follow it, which is pretty straightforward.
In addition to the Kinect for Windows SDK, Microsoft also provides a runtime version of the Kinect framework. The runtime enables an end user to install all components needed by a program that connects to the Kinect, without any content specific to the software development kit. The runtime version is aimed at customers and is smaller than the whole SDK, but it only works with the “Kinect for Windows” device.
3.2.3.2. Interface
With the SDK you can build applications in C++, C# and Visual Basic. It is possible to access
the image, depth and audio stream of the connected Kinect.
To use the framework provided by Microsoft, add a reference to the dynamic-link library Microsoft.Kinect.dll. In Visual Studio, right-click on “References” and select “Add Reference”. Select the “.NET” tab, search for the “Microsoft.Kinect” library, select it and click OK. The classes and functions can then be used in C# by adding the directive “using Microsoft.Kinect;”
3.2.3.3. License
The new Kinect for Windows SDK authorizes development and distribution of commercial
applications. The old SDK was a beta, and as a result was appropriate only for research,
testing and experimentation.
The license allows software developers to create and sell their applications to customers using Kinect for Windows hardware. That means you cannot sell applications to people who have the Kinect for Xbox hardware.
3.2.3.4. Documentation
Microsoft provides a lot of documentation and help for working with the Kinect for Windows SDK, including a discussion board, videos and code samples.
3.3. RGB-Demo/Nestk (data processing)
An internet search for tools to access and process the Kinect data led to RGBDemo, whose main creator is Nicolas Burrus from Spain. RGBDemo helps to access the data from the Kinect and provides a set of useful functions to process it. Mr. Burrus divided his Kinect work into two parts: the nestk library (Burrus, 2012) and the RGBDemo application (Burrus, 2012). Both projects are open source and available for download via the version control system Git (more in chapter 3.4.2) or from GitHub (Burrus, 2012).
Nestk is a C++ library for the Kinect which provides many of the functions and classes used in RGBDemo. The library is built on top of OpenCV, with Qt for the graphical parts; parts of it also depend on PCL. Chapter 3.3.2 deals with these dependencies.
RGBDemo uses much of the nestk library and implements many Kinect-related algorithms. RGBDemo is written in C++ and uses Qt for the graphical user interface.
3.3.1. Current features of the RGBDemo
The RGBDemo can grab Kinect images, visualize and replay them. This topic is discussed in
chapter 4.3.
It supports OpenKinect and OpenNI as backend frameworks. The OpenKinect library, libfreenect, is already integrated into nestk. With the OpenNI backend the program can also extract skeleton data and hand point positions.
For some months now there has also been a stable release of RGBDemo which supports multiple Kinects.
Results of the different demo programs can be exported to .ply files. More about this file format is written in chapter 4.5.2.1. These demo programs are:
- Demo of 3D scene reconstruction using a freehand Kinect (more in chapter 4.5.2)
- Demo of people detection and localization
- Demo of gesture recognition and skeleton tracking using OpenNI
- Demo of 3D model estimation of objects lying on a table (based on the PCL table top object detector)
- Demo of multiple Kinect calibration
A good point for RGBDemo is that it supports all common operating systems: Windows, Linux and Mac OS X.
RGBDemo is under the GNU Lesser General Public License (LGPL). “In short, it means you
can use it freely in most cases, including commercial software. The main limitation is that you
cannot modify it and keep your changes private: you have to share them under the same
license.” (Burrus, 2012)
3.3.2. Installation and Compilation
There is no installation routine for RGBDemo: the user simply downloads the .exe files and starts the program. This means there are no fancy installers or nice icons. RGBDemo offers Win32 binaries; to start one of the programs you simply click on “rgbd-viewer.exe” and the program starts. Naturally, the dependencies need to be installed and working.
Compiling RGBDemo yourself takes some time the first time, especially if you have never worked with Git, CMake or other open source projects.
The environment used to build RGBDemo was Microsoft Windows 64-bit with Visual Studio 2010, using the Visual Studio 2010 compiler.
It is nearly impossible to compile the RGBDemo Reconstructor as 64-bit because of the error “C2872: ‘flann’ : ambiguous symbol”. This error is caused by a conflict between the flann embedded in OpenCV and an external dependency of PCL on another copy of Flann. The ambition to build a 64-bit version was therefore given up and RGBDemo was built as a 32-bit version. This has no major disadvantage, and it also runs on a 64-bit computer.
First, the dependencies of RGBDemo were installed: OpenNI, Qt, OpenCV and PCL. PCL is an optional dependency, but it was still installed and used in this project because PCL implements good point cloud algorithms.
3.3.2.1. QT
The Qt framework is a toolkit for building graphical user interfaces. It is comparable with Windows' WinForms or WPF, but instead of supporting just Windows, Qt runs on many more platforms, for example Mac OS X, Linux and Symbian (a mobile phone operating system) in addition to Windows. Qt is supported and developed by Nokia's development division. Qt includes a GUI designer, which was used in the accuracy program (see chapter 4.4).
When the project started last year, no pre-compiled library for Visual Studio 2010 was available, but such builds are now offered on Nokia's web site, which makes installing Qt an easy task. The download is available on Nokia's download page (Nokia, 2012).
3.3.2.2. OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions for computer vision. Among other things it includes 2D and 3D feature toolkits, facial recognition, gesture recognition, structure from motion and motion tracking. The library was originally written in C, but version 2.0 includes the traditional C interface and additionally a C++ interface. OpenCV runs on any major operating system, including Android, the most common operating system for smartphones.
The pre-compiled OpenCV library is downloadable from the OpenCV project homepage (OpenCV, 2012). When installing OpenCV it is a good idea to choose an installation path without spaces, for example “C:\OpenCV2.2” instead of “C:\Program Files (x86)\OpenCV2.2”. This can prevent problems with the inclusion of the library.
3.3.2.3. PCL
PCL stands for Point Cloud Library; its main goal is 3D point cloud processing. The PCL framework contains numerous state-of-the-art algorithms, including surface reconstruction, feature estimation, segmentation and model fitting. PCL is an open source project.
PCL was installed with the normal All-In-One installer from its homepage, which includes all of the libraries and also PCL's dependencies. Part of PCL is also an experimental implementation of the KinectFusion algorithm; this algorithm is not included in the All-In-One installer. There is more about this reconstruction algorithm in chapter 4.5.1.
3.3.2.4. Kinect backend
Naturally, RGBDemo needs middleware to get access to the Kinect: the Kinect interfaces described in chapter 3.2. In this project RGBDemo was used with OpenNI.
3.3.3. RGBDemo installation
After the installation of the dependencies, the next step is to download the source code of RGBDemo from the internet. This download can be done with Git; more details about Git are given in chapter 3.4.2. The Git command for the download is:
git clone --recursive git://github.com/nburrus/rgbdemo.git
This Git command saves the source code to the hard drive. The next step is to configure the source code with the CMake GUI (see chapter 3.4.1). The source directory is set to the path where the source code was downloaded; for example, when the source is on C:\ the source directory is “C:\rgbdemo”. Then the build directory is determined, here set to “C:\rgbdemo\build“.
Experience during the project showed that using CMake's cache function is not a good idea: it caused more problems than it solved. The cache can be deleted via the File menu.
The next step is to start the configuration by clicking the “Configure” button. A compiler can then be selected; in this project the compiler from Visual Studio 2010 was used, which is a 32-bit compiler.
A list of grouped names and values should appear now. These are the configuration
parameters. Set the parameter OpenCV_DIR to the folder where the OpenCV binaries are (for
example C:\OpenCV2.2).
To use PCL, open the NESTK group and tick the checkbox named “NESTK_USE_PCL”.
Hit the Configure button again and you're almost done. The CMake log should not show any errors; the CMake project is now configured. You can generate the Visual Studio project files by clicking on the “Generate” button. CMake then writes the project files (a Visual Studio solution) to the build directory, for example “C:\rgbdemo\build”.
Then the Visual Studio solution named RGBDemo.sln can be opened. To finally get an executable file, right-click on the project and click “Build”.
There are two different build configurations available: a “Release” and a “Debug” configuration. In a debug build, complete symbolic debug information is emitted to support the Visual Studio debugging tools, and code optimization is disabled. In a release build there is no symbolic debug information and the code is optimized. As a result, RGBDemo runs approximately three times faster with the release configuration.
3.4. Supporting tools
3.4.1. CMake
A lot of the tools used in this project needed CMake. CMake is a cross-platform, open source program that helps automate the build and compilation process.
From the source code and the CMake configuration files, CMake generates project files for native build environments: for example makefiles on Linux, Visual Studio solutions on Windows and Xcode projects on Apple systems.
This solves a problem that many open source, cross-platform applications such as OpenNI, RGBDemo, PCL and OpenCV have: one build environment, for example makefiles, cannot build applications for Windows, and Visual Studio does not build applications for Linux. CMake separates writing the source code from compiling it for a particular platform.
Figure 7 CMake process
Figure 7 shows the process supported by CMake. The configuration files are named “CMakeLists.txt”. The native build system is, for example, a Visual Studio solution with projects, and the native build tool in this case is Visual Studio. With the CMake-generated solution you can compile the source code into an executable file or a library.
CMake differentiates between two folder trees. One is the source tree, which contains the CMake configuration files, the source code and the header files; the other is the binary tree, which contains the native build system files and the compilation output, for example executable files or libraries. The source and binary trees could be in the same directory, but it is better to separate them. This has the advantage that the developer can delete the binary tree without affecting any of the source code. These two folder trees are configured in the CMake GUI in the fields labelled “Where is the source code” and “Where to build the binaries”.
CMake has a cache where it saves the configured values. This cache is located in the build tree and stores key-value pairs in a text file called “CMakeCache.txt”. Each cache entry is simply a variable name and a value for that variable. For the RGBDemo, such configuration variables are for example OpenCV_DIR, with the location of the OpenCV folder as value, or NESTK_USE_PCL, with the information whether PCL should be used (where 1 is true and 0 is false).
The configuration files have a simple syntax to govern the configuration process.
IF (NOT WIN32)
SET(NESTK_USE_FREENECT 1)
ENDIF()
This example from the RGBDemo configuration file sets the configuration value NESTK_USE_FREENECT to true if the operating system is not Windows. That makes OpenKinect (freenect) the default Kinect interface on Linux and Apple systems.
But just setting the variable is not enough: the source code has to check whether the CMake variable is set and then use OpenKinect or OpenNI accordingly.
#ifdef NESTK_USE_FREENECT
# include <ntk/camera/freenect_grabber.h>
#endif
In the source code of the RGBDemo, the OpenKinect (freenect) header files are only included when the variable is set to true. #ifdef is a pre-processor directive for C and C++.
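The link between the CMake variable and the C++ pre-processor symbol can be sketched roughly as follows. This is an illustrative fragment, not the actual nestk CMake code, which may differ in detail:

```cmake
# Expose the choice as a user-configurable cache option
OPTION(NESTK_USE_FREENECT "Use the libfreenect backend" OFF)

# Default to freenect on non-Windows platforms
IF (NOT WIN32)
  SET(NESTK_USE_FREENECT 1)
ENDIF()

# Turn the CMake variable into a compile definition,
# so that '#ifdef NESTK_USE_FREENECT' works in the C++ source
IF (NESTK_USE_FREENECT)
  ADD_DEFINITIONS(-DNESTK_USE_FREENECT)
ENDIF()
```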
CMake is very scalable: KDE, a desktop environment with approximately six million lines of code, uses it for its build process.
3.4.2. Git
Git is a distributed version control and source code management system. It was initially designed and developed by Linus Torvalds for Linux kernel development. Git does not need any network access or a central server once the code is downloaded.
Source code management systems keep track of a program's source code when more than one person is working on it. Many open source projects and companies use Git to manage their code.
Git has a lot of commands and functionalities; the functions used in this project are only the tip of the iceberg, and the commands below are the only ones that were used. RGBDemo and OpenNI also use Git for version control.
Git is available for Windows, Mac OS X and Linux; the Windows version is integrated into the command line. To download the source code from a Git address, navigate to the folder where you want to store the code and execute:
git clone --recursive <git-url>
The Git address is published by the project you want the code from. The --recursive option also clones the submodules; for example, nestk is a submodule of RGBDemo.
git fetch
Once the code has been cloned, the latest source code can be downloaded with the fetch command.
3.4.3. Meshlab
MeshLab is an open source program to process meshes and point clouds. It is oriented towards the management and processing of large unstructured meshes and provides a set of tools for editing, cleaning, healing, inspecting, rendering and converting these kinds of meshes. MeshLab was chosen because it is free software and has been used in various academic and research projects.
Meshlab was used for the point cloud processing and meshing (see chapter 0).
3.4.4. Microsoft Visual Studio
Visual Studio is a well-known integrated development environment from Microsoft. A source code editor and a debugger are just a few of the functionalities Visual Studio provides.
In this project Visual Studio was used to write the accuracy analysis program and inspect and
modify the source code of the RGBDemo.
4. Implementation
4.1. Chapter Summary
This chapter contains more information about how the 3D body model was produced and about other aspects that are important for creating a 3D body model. The chapter also contains information about the Kinect error and the accuracy experiment carried out during this project.
4.2. Kinect Calibration
Camera calibration is a way of analysing an image to derive what the camera configuration was at the time the image was captured. Camera calibration is also known by the term “camera resectioning”. The camera parameters are represented as a camera matrix, a 3 × 4 matrix. In the pinhole camera model, the camera matrix denotes a projective mapping from world coordinates to pixel coordinates using a perspective transformation.
“OpenNI comes with a predefined calibration stored in the firmware that can directly output
aligned depth and colour images with a virtual constant focal length. Most applications will be
happy with this calibration and do not require any additional step. However, some computer
vision applications such as robotics might need a more accurate calibration.” (MANCTL,
2012)
Because of the above statement from MANCTL, a calibration was performed in this project to determine the specific intrinsic parameters of the Kinect. RGBDemo contains an algorithm for this calibration.
Nicolas Burrus, the programmer of RGBDemo, writes in the program's discussion board that he used the calibration routine from OpenCV; it is basically a pinhole model with distortions.
For the calibration, the application “calibrate-openni-intrinsics.exe” from RGBDemo was used. After compilation, the program can be started from the command line (cmd.exe):
calibrate-openni-intrinsics --pattern-size 0.0325 calibration calibration.yml
These are the parameters of the calibration program:
pattern-size     The square size of the chessboard, in metres
calibration      The folder with a set of images of a checkerboard
calibration.yml  The initial calibration file
The folder with the set of checkerboard images was created with rgbd-viewer.exe. A few examples are shown in Figure 8.
Figure 8 Kinect calibration RGB images
The calibration.yml file was exported from rgbd-viewer.exe via the menu File -> Save calibration file. This is the calibration file generated by OpenNI with the default parameters for the Kinect.
The result of this process is a new calibration file, “openni_calibration.yml”, which stores the intrinsic camera parameters of the Kinect video camera and the depth camera.
The camera matrix, or matrix of intrinsic parameters, has the form

    | fx  0  cx |
    |  0  fy cy |
    |  0  0   1 |

where (cx, cy) is the principal point (usually at the image centre) and fx and fy are the focal lengths.
Result of calibration (RGB-Intrinsic):
The depth camera had the same intrinsic matrices. It looked as if a separate depth camera calibration is not available for the OpenNI backend. However, since the calibration files worked with the other applications, for example the Reconstructor program, no further research was carried out in this area.
The generated calibration file was used whenever offline operations with RGBDemo were performed.
The alignment between the depth data and the RGB data is done internally by the OpenNI framework. This means the depth at point [1, 1] corresponds to the colour in the RGB image at point [1, 1].
4.3. Collect data
Once the connection with the Kinect was available, data could be recorded. RGBDemo was used to collect the depth and image data.
In this project the data was not processed in real time: the data was first recorded to the hard drive and then reconstructed from there in a second step. This had a few advantages.
One advantage is that while RGBDemo can reconstruct only about 1 FPS (frame per second), it can save 6 FPS to the hard drive. The barrier to saving more frames per second is not the computing time of the reconstruction but the write speed of the hard drive. To record the Kinect data in this project, SATA hard drives at 7200 and 5400 rpm were used; the program could probably save many more frames per second with an SSD (solid-state drive).
Another advantage of recording the data to disk is that it is not lost after one reconstruction cycle: saved data can be reused to run the reconstruction or the accuracy analysis several times with different parameters or different program code.
One disadvantage is that a fully reconstructed model is not available immediately after a scan; analysing the saved data takes some time. In a commercial setting this might be a critical factor.
4.3.1. How to save the Kinect data
The program to save the data is the “rgbd-viewer.exe” (see chapter 3.3).
All of the necessary recording functions are encapsulated in a class called “RGBDFrameRecorder”. Its functions and properties can be seen in the following UML diagram:
Figure 9 RGBDFrameRecorder UML class
This class stores a series of RGB-D images in “view directories”.
The first step is to select the folder where the program should save all the frames. This is
implemented through a QT text field (see chapter 3.3.2.1).
The program that handles storing the information is, as mentioned above, “rgbd-viewer.exe”. This viewer has a function called “onRGBDDataUpdated()”, whose name says pretty much what it does: it is called every time a new frame arrives from the Kinect. Among other things this function contains the command:
m_frame_recorder->saveCurrentFrame(m_last_image);
m_frame_recorder is of type “RGBDFrameRecorder” and m_last_image is of type “RGBDImage”.
The function “saveCurrentFrame” generates the full directory path where the data should be stored and calls the function “writeFrame()” of the class RGBDFrameRecorder. In this function all of the information from the frame is actually stored to the hard drive.
The folder structure after two saved frames looks like this:

GRAP1
The name of the configured folder where the data is saved.

a00366901966103a
The serial number of the Kinect. This folder exists so that multiple Kinects can be recorded; in that case there is a new folder for every Kinect.

viewXXXX
For every frame there is a new viewXXXX folder, where XXXX is a consecutive number starting from 0.

raw
In this folder all the raw frame data are stored.
4.3.2. What data are saved per frame?
The term “Frame” has a lot of different definitions. In this project a frame is a collection of
the following three files.
4.3.2.1. color.png
This file contains the image data of the frame, compressed in the PNG format. PNG stands for “Portable Network Graphics”; it is a bitmapped image format with lossless data compression, which allows the exact original data to be reconstructed from the compressed data. An example of such an image is shown in Figure 10. The image is saved with colour information and has a resolution of 640x480 pixels.
4.3.2.2. depth.raw
This is a file format for RGBDemo. The nestk library has functions to read and write in that
format. These functions are located in the opencv_utils.cpp and their names are
“imwrite_Mat1f_raw” and “imread_Mat1f_raw”.
qint32 rows = m.rows, cols = m.cols;
f.write((char*)&rows, sizeof(qint32));
f.write((char*)&cols, sizeof(qint32));
f.write((char*)m.data, m.rows*m.cols*sizeof(float));
In this code snippet you can see how the raw information is saved: first the program writes
two 32-bit integers containing the row and column counts, and then it saves rows × cols
32-bit float values.
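For illustration, reading such a file back can be sketched with the C++ standard library instead of Qt. This is a hypothetical stand-in for nestk's imread_Mat1f_raw: a plain std::vector replaces the cv::Mat1f used by the library, and the exact nestk code differs.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads the depth.raw layout described above: two 32-bit integers
// (rows, cols) followed by rows*cols 32-bit floats.
// Sketch only; nestk's imread_Mat1f_raw fills a cv::Mat1f instead.
bool readMat1fRaw(const std::string& path,
                  std::int32_t& rows, std::int32_t& cols,
                  std::vector<float>& data)
{
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    f.read(reinterpret_cast<char*>(&rows), sizeof(std::int32_t));
    f.read(reinterpret_cast<char*>(&cols), sizeof(std::int32_t));
    data.resize(static_cast<std::size_t>(rows) * cols);
    f.read(reinterpret_cast<char*>(data.data()), data.size() * sizeof(float));
    return static_cast<bool>(f);
}
```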
Because of that layout, every depth.raw file has a size of 1.17 MB (1,228,808 bytes):
2 × 4 bytes for the two integers plus 640 × 480 × 4 bytes of float data.
640 × 480 pixels is the standard width and height of the depth image.
4.3.2.3. intensity.raw
This is the IR image normalised to grayscale and saved with the same method as depth.raw.
Because of that, the intensity.raw file also has a size of 1.17 MB.
Figure 10 color.png example picture
4.4. Accuracy measurements
During the project an application was programmed to find out the accuracy of the Kinect.
Measurement errors can be split into two components: random error and systematic error.
(Taylor, 1999)
Figure 11 Random and systematic error (Taylor, 1999)
4.4.1. Random error
Random errors are inherently unpredictable errors in measurements and have zero expected
value in the experiment. Every measurement is susceptible to random error; random errors
show up as different results for supposedly identical repeated measurements.
4.4.2. Systematic error
“Systematic error is caused by any factors that systematically affect measurement of the
variable across the sample. “ (Research methods, 2012)
“The correction of systematic errors is a prerequisite for the alignment of the depth and colour
data, and relies on the identification of the mathematical model of depth measurement and the
calibration parameters involved. The characterization of random errors is important and useful in
further processing of the depth data, for example in weighting the point pairs or planes in the
registration algorithm.” (Kourosh & Elberink, 2012)
In their paper, Kourosh & Elberink (2012) point out that it is important to know the
random error when processing the data.
Several sources are coming to the conclusion that “the random error of depth measurements
increases quadratically with increasing distance from the sensor and reaches 4 cm at the
maximum range” (Kourosh & Elberink, 2012) (ROS, 2011).
4.4.3. The setup
To check whether the results of these previous papers could be reproduced, a setup to test
the random error of the Kinect was built.
Figure 12 Kinect accuracy test setup
The Kinect points towards the sheet. The information from the Kinect is then collected with
the method described in chapter 4.3. As a result, several frames at different distances to the
Kinect were taken. This is one example frame:
Figure 13 Kinect accuracy test example frame
4.4.4. Colour Segmentation
Of course, the program should only use depth points of the white area on the sheet and not
depth data from elsewhere. The first approach was colour segmentation.
The algorithm searches for all pixels in the RGB image that are white, and because the depth
and the RGB image are aligned, every pixel of the depth image identified as white in the RGB
image was used. Certainly, as you can see in Figure 13, these pixels are never perfectly
white. Because of that, colours other than clean white were also accepted (for example grey
colour shades).
As you can also see in Figure 13, there is white in the background of the image, caused by
the light. These white points were not used for the accuracy analysis because the program
applied depth thresholds: for example, when the plane was located 100 cm from the Kinect,
the program only used depth values in the range from 90 cm to 110 cm for the calculation.
But there were still problems with that approach, because there were also white pixels on the
retainer of the sheet, as well as depth artefacts, whose values lay inside the depth threshold
and were therefore included.
Because of those problems the colour segmentation was not used, and the area of the sheet
was defined by hand: the upper-left and lower-right corner points were specified, and the
program only used the points inside this rectangle.
The highlighted rectangle in Figure 14 shows the area that is used for the analysis. All the
red pixels are included in the accuracy measurement for the program to analyse.
4.4.5. Histograms
A histogram is "a representation of a frequency distribution by means of rectangles whose
widths represent class intervals and whose areas are proportional to the corresponding
frequencies." (Dictionary, 2012)
In this project histograms were used to visualise the distribution of the depth values over
the observed rectangle on the sheet.
To generate a histogram you first have to define the range you want to observe. For example,
for the measurement with the sheet at a distance of 60 cm, the lowest and highest depth
values (58 and 62 cm) were used as the range.
This range was then divided into intervals, and every depth value was assigned to one of
these intervals. To calculate the histogram, functions of OpenCV were used (cvCreateHist,
cvCalcHist).
Figure 15 shows the histogram when the sheet is 60 cm away from the Kinect sensor. The gaps
between some of the bars occur because of the quantisation of the depth range inside the
Kinect.
Figure 15 Histogram
Figure 14 Accuracy measurement with highlighted area
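The binning step behind the histogram can be sketched without OpenCV. The equal-width intervals mirror the procedure described above, but this is a simplified illustration; the exact cvCreateHist/cvCalcHist configuration used in the project is not reproduced.

```cpp
#include <vector>

// Counts how many depth values fall into each of `bins` equal-width
// intervals covering [lo, hi). Values outside the range are ignored.
// Simplified stand-in for the OpenCV histogram calls named above.
std::vector<int> depthHistogram(const std::vector<float>& depths,
                                float lo, float hi, int bins)
{
    std::vector<int> hist(bins, 0);
    const float width = (hi - lo) / bins;
    for (float d : depths) {
        if (d < lo || d >= hi) continue;
        int bin = static_cast<int>((d - lo) / width);
        if (bin >= bins) bin = bins - 1;  // guard against float rounding
        ++hist[bin];
    }
    return hist;
}
```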
4.4.6. Standard deviation
Standard deviation measures the dispersion of a set of data around its mean: the more the
values are spread away from the mean, the higher the deviation. Standard deviation is a
well-known parameter in statistics and is calculated as the square root of the variance.
The standard deviation can be calculated with the following formula:
σ = √( (1/N) · Σᵢ₌₁ᴺ (xᵢ − x̄)² )
Where, in this experiment,
{x₁, …, x_N} are the depth values
x̄ is the mean of the depth values
In this project the standard deviation was calculated over a set of frames with the depth values
on the sheet. The results of these calculations are shown in 4.4.9.
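As a minimal sketch, the calculation looks like this (a population standard deviation, dividing by N; whether the project divided by N or N−1 is not stated in the report):

```cpp
#include <cmath>
#include <vector>

// Population standard deviation of a set of depth values:
// the square root of the mean squared deviation from the mean.
double stddev(const std::vector<double>& xs)
{
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();

    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    return std::sqrt(sq / xs.size());
}
```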
4.4.7. Error map
To visualise the errors, an error map was also made: the mean of all depth values was
subtracted from every depth value in the frame, the resulting values were converted into
colour information according to their distance from the mean, and the map was shown in the
accuracy program.
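The per-pixel mapping can be sketched as follows. The linear red/blue ramp is an assumption for illustration; the report does not state the exact colour scale used by the accuracy program.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Maps a signed deviation from the mean depth onto a colour:
// negative deviations shade towards blue, positive towards red,
// saturating at +/- maxDev. The ramp itself is an assumption.
Rgb deviationToColour(double dev, double maxDev)
{
    double t = std::clamp(dev / maxDev, -1.0, 1.0);  // -1 .. 1
    std::uint8_t red  = static_cast<std::uint8_t>(t > 0 ?  t * 255 : 0);
    std::uint8_t blue = static_cast<std::uint8_t>(t < 0 ? -t * 255 : 0);
    return Rgb{red, 0, blue};
}
```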
4.4.8. Problems
There is a problem when the Kinect is not facing the sheet straight on. This could be solved
by fitting a plane to the depth values with an SVD and then calculating the distance from all
points to this plane. That was not done in this project, because at the end of the project
the priority was the 3D body model.
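Once a plane had been fitted (for example via an SVD from a linear algebra library), the per-point distance would be simple to compute. This sketch assumes the plane is already given in the form ax + by + cz + d = 0; the fitting step itself is not shown.

```cpp
#include <cmath>

// Distance from point (x, y, z) to the plane ax + by + cz + d = 0.
// Dividing by the normal length keeps the result correct even when
// (a, b, c) is not a unit normal.
double pointPlaneDistance(double x, double y, double z,
                          double a, double b, double c, double d)
{
    return std::fabs(a * x + b * y + c * z + d)
         / std::sqrt(a * a + b * b + c * c);
}
```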
4.4.9. Results
Distance [cm]   Average distance [cm]   Average minimal value [cm]   Average maximal value [cm]   Standard deviation   Frames
90              89.9007                 88.7333                      89.9007                      0.496103             6
80              80.1377                 79.1571                      81.2036                      0.369619             56
70              70.5072                 69.6477                      71.6892                      0.386073             111
60              60.3991                 59.5136                      61.3                         0.39508              22
The standard deviation is really high; the error is higher than the results in the literature
(Kourosh & Elberink, 2012) (ROS, 2011). There are a lot of possible error sources, but an
important one is probably the one pointed out in chapter 4.4.8. Another negative effect could
come from a faulty calibration.
Figure 16 Error map of the first frame (60 cm)
4.5. 3D reconstruction
In computer vision and computer graphics, 3D reconstruction means capturing the shape of an
object (in our case the body). In this project, two different Kinect programs that provide
reconstruction functionality were evaluated: the RGBDemo Reconstructor and PCL Kinect
Fusion.
4.5.1. PCL Kinect Fusion
The PCL (point cloud library) project was already mentioned in chapter 3.3.2.3. They
implemented the Kinect Fusion algorithm into their library (PCL, 2011). The Kinect Fusion
project “investigates techniques to track the 6DOF position of handheld depth sensing
cameras, such as Kinect, as they move through space and perform high quality 3D surface
reconstructions for interaction” (Microsoft, 2012). They have published the two research
papers “KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth
Camera” (Izadi, et al., 2011) and “KinectFusion: Real-Time Dense Surface Mapping and
Tracking” (Newcombe, et al., 2011). The PCL open source community is in the process of
implementing the algorithm from these scholarly papers in the PCL source code.
This program is not part of the official release version, which means it is still in
development and can only be used experimentally. During the evaluation of reconstruction
programs this implementation was also tried; SVN and CMake (see chapter 3.4.1) were used to
build it. It worked, but not perfectly, probably because the algorithm was not yet fully
implemented. The algorithm supports real-time reconstruction, and the code relies heavily on
the NVIDIA CUDA development libraries for GPU optimisations. Compute Unified Device
Architecture (CUDA) is a parallel computing architecture developed by NVIDIA for graphics
processing (Nvidia, 2011).
In this project this approach was not used, because it was too difficult to predict in which
direction the PCL project would move: since the program is at an experimental stage, its
creators could easily change interfaces, which could affect a program built on top of it. At
the time the program was tested there was also no easy export function for the generated
model, although it is possible to extract the point cloud model somehow. A small disadvantage
is also that the results are without colour. Because of this, the RGBDemo Reconstructor was
used instead. In a few months, however, PCL KinectFusion is definitely an option to keep an
eye on.
4.5.2. RGBDemo Reconstructor
The RGBDemo Reconstructor “rgbd-reconstructor.exe” is part of the RGBDemo demo programs;
this tool was already introduced in chapter 3.3.
When we generate the Visual Studio solution, we have to build the rgbd-reconstructor
project. It is necessary to build it in “Release” mode so that the reconstruction runs
faster; the Debug mode has a lot of overhead from the debugging tools.
The official purpose of the RGBD Reconstructor is interactive scene reconstruction: in
practice, you walk through a room with a Kinect and the program progressively aggregates all
captured frames into a single 3D point cloud model.
Because the normal purpose of the Reconstructor is to scan a room and not a person's body,
the first thing was to check whether the reconstruction also works with a person inside the
scene (room). The big difference between a scene (room) and a person is that even when the
person tries to stand still, the body moves: for instance, breathing produces a small
movement of the chest, and any clothes the person is wearing also move a little.
This data is represented in a single reference frame using a surfel representation to avoid
duplicates and to even out the result. Surfel is an abbreviation of "surface element": an
object is represented by a dense set of points, and in 3D computer graphics surfels are an
alternative to polygonal models (meshes). The creator of the RGBDemo calls it a surfel
representation, but it is also possible to call it a point cloud.
It is possible to use the RGBD Reconstructor in real time: you can start the program with a
connected Kinect, and every incoming frame is analysed and integrated into the point cloud.
In this project this was not done, because of the following disadvantages.
First, it does not use every frame coming from the Kinect in real time, because the program
needs a lot of calculations to find the right spot where it can insert the data of a frame.
The program therefore processes just one frame per second on an i7-2720QM CPU (a quad-core
processor with 2.20 GHz per core). Of course, it would be possible to improve this part of
the algorithm or to use a faster computer.
The second disadvantage relates to the development, or rather the adaptation, of the
reconstruction program: when all the frames are saved on disk, the reconstruction can be
repeated several times with different parameters and other modifications, and the quality of
the resulting point cloud can be compared.
That is why the frames from the Kinect were recorded to the hard drive at the highest
achievable frame rate (about 4-10 FPS), and the reconstruction program was then run on this
data to get a point cloud. How and what data is recorded is explained in chapter 4.3.
The data was recorded while another person walked around the subject with the Kinect sensor
in hand, to capture information about the body from 360 degrees. The first idea was that the
person standing in front of the Kinect would rotate on their own axis, which has the
advantage that the scanning process does not need two people (one who scans and one who is
the subject). But there were problems with that approach. First, the RGBD Reconstructor is
not built for this task; this is not the main problem, because it would be possible to change
the algorithm. The major problem is that when a person rotates on their own axis, the body
deforms too much.
Figure 17 shows six frames from the recorded data of one scan. On the left side is the depth
image converted into a colour representation, and on the right side the corresponding RGB
image. For one reconstruction, approximately 1000 frames were used.
After the images were collected, the reconstruction process began. The reconstruction
program (rgbd-reconstructor.exe) can be started with the following command on the Windows
command line (cmd.exe).
C:\RGBDemo-0.7.0-Source\build\bin\Release>rgbd-reconstructor.exe --calibration openni_calibration.yml --directory C:\usi7 --sync true --icp true
The parameters for the Reconstructor are:
calibration The calibration file (yml) (see chapter 4.2)
directory The folder where the recorded data is located (see chapter 4.3)
sync The synchronization mode that should be used
icp Use ICP to refine pose estimation
The synchronization mode tells the program that it should use every frame to build the point
cloud.
After the reconstruction process, which takes about 10 minutes, the result is a point cloud
in the program’s memory. This point cloud can be exported in the .ply format.
Figure 17 six example input frames for the Reconstructor
4.5.2.1. Export file format - .PLY
PLY is a computer file format known as the Polygon File Format or the Stanford Triangle
Format. It is designed to store three-dimensional data and has a relatively simple structure;
there are two variants of the format, one in ASCII and one in binary. The RGBDemo exports its
3D models in the ASCII variant.
ply
format ascii 1.0
element vertex 3294565
property float x
property float y
property float z
property float nx
property float ny
property float nz
property uchar red
property uchar green
property uchar blue
end_header
0.00290329 0.00341359 0.429719 -0.203918 0.268669 -0.9414 254 254 254
-0.000625859 -0.00391612 0.432549 0.200947 -0.465571 -0.861896 254 254 254
-0.0013024 -0.00193452 0.438721 0.333459 -0.0203877 -0.942544 254 254 254
0.004828 0.000784338 0.443075 0.457977 0.101636 -0.883135 254 254 254
This is the header and a few points of an exported file. The first line, “ply”, identifies
the file as a PLY file. The second line indicates which variant of the PLY format is used.
The third line declares a particular data element and how many instances of it there are. The
following “property” lines describe how the element is represented: x, y, z are the
coordinates, nx, ny, nz are the normals, and red, green, blue are the RGB representation of
the colour of a point.
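A minimal reader for such a header could look like the following sketch. It only extracts the vertex count and is not part of RGBDemo or MeshLab; a real parser would also collect the property declarations.

```cpp
#include <sstream>
#include <string>

// Extracts the vertex count from an ASCII PLY header like the one
// shown above, i.e. the line "element vertex <count>".
// Returns -1 if no such line appears before "end_header".
long plyVertexCount(const std::string& header)
{
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        if (line == "end_header") break;
        std::istringstream ls(line);
        std::string w1, w2;
        long n;
        if (ls >> w1 >> w2 >> n && w1 == "element" && w2 == "vertex")
            return n;
    }
    return -1;
}
```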
Figure 18 Reconstructed point cloud
Figure 18 shows the result of the reconstruction process in MeshLab. The person is on the
left side, and as you can see, the algorithm also reconstructed the walls and a bit of the
floor. This unimportant information is deleted in the next step (see chapter 4.6).
4.5.3. Implementation
In the official forum the main developer of
the RGBDemo describes the algorithm of
the RGBD Reconstructor this way: “it
basically uses feature point matching,
following by RANSAC and optional ICP
pose refinement” (Burrus, 2012).
The program extracts SURF features from the camera image and localises them in 3D space.
Then it matches these features against those of previously acquired images and uses RANSAC
to robustly estimate the 3D transformation between them. Optionally it uses ICP to refine the
estimated camera position.
If the algorithm finds a pose and the error thresholds (of RANSAC and ICP) are not exceeded,
the program adds the points of this frame to the reference frame, that is, the point cloud
which is later exported from the program.
4.5.3.1. SURF
SURF stands for “Speeded Up Robust Feature” and was first presented by Herbert Bay et al.
in 2006 (Bay, et al., 2008). It is an image feature detector and descriptor that can be used
in computer vision. SURF is based on sums of 2D Haar wavelet responses and makes efficient
use of integral images. The standard version of SURF is partly inspired by SIFT, another,
better-known feature detection algorithm in computer vision; the standard version of SURF is
faster than the SIFT implementation.
In computer vision and image processing, feature detection is a concept for finding
information that describes an image in a way a computer can work with. The result of a
feature detector such as SURF is often a subset of points that describe the image
appropriately; the features are often extracted by analysing the pixels surrounding one
pixel.
There are different types of image features, and feature detection algorithms are often
specialised in one of them, for example edges, corners / interest points, blobs / regions of
interest, or ridges.
Figure 19 RGBD Reconstructor flowchart
After the SURF algorithm has found the interest points in the RGB image, this information is
combined with the depth data, because the interest points of the RGB image are only
two-dimensional. For example, if an interest point is at pixel [5, 5], the program accesses
the depth data at pixel [5, 5] and combines the two values into a point in three-dimensional
space. These 3D points are then matched with the 3D points of previous frames; the result is
a set of point-wise 3D correspondences between two frames. Based on these correspondences,
the RANSAC algorithm estimates the relative transformation between the frames.
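The lifting of a 2D interest point into 3D can be sketched with the standard pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) would come from the calibration file; the concrete values in the usage example below are placeholders, not the project's actual calibration.

```cpp
struct Point3 { double x, y, z; };

// Back-projects a pixel (u, v) with measured depth z into 3D camera
// coordinates using pinhole intrinsics: focal lengths fx, fy and
// principal point (cx, cy) taken from the calibration.
Point3 backProject(double u, double v, double z,
                   double fx, double fy, double cx, double cy)
{
    return Point3{ (u - cx) * z / fx,
                   (v - cy) * z / fy,
                   z };
}
```

A pixel at the principal point maps onto the optical axis, e.g. backProject(320, 240, 2.0, 525, 525, 320, 240) gives (0, 0, 2).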
4.5.3.2. RANSAC
RANSAC is an abbreviation of "RANdom SAmple Consensus". It is an algorithm for estimating
the parameters of a mathematical model from a set of observed data that typically contains
outliers.
In the RGBDemo, the input data for the relative pose estimation with RANSAC are the interest
point correspondences.
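The general RANSAC pattern can be illustrated with a deliberately simplified one-parameter model: a 1D translation between matched coordinates with outliers. The real Reconstructor estimates a full 6-DOF rigid transform, which this sketch does not attempt; only the hypothesise-and-count-inliers loop is the same.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal RANSAC for a 1D translation model: given matched pairs
// (src[i], dst[i]) where dst ~= src + t for inliers, draw random
// single-pair hypotheses and keep the translation with most inliers.
// A full pose estimator follows the same loop with a 6-DOF model.
double ransacTranslation(const std::vector<double>& src,
                         const std::vector<double>& dst,
                         int iterations, double threshold)
{
    double best = 0.0;
    int bestInliers = -1;
    for (int it = 0; it < iterations; ++it) {
        std::size_t i = std::rand() % src.size();   // random hypothesis
        double t = dst[i] - src[i];
        int inliers = 0;
        for (std::size_t j = 0; j < src.size(); ++j)
            if (std::fabs(dst[j] - (src[j] + t)) < threshold)
                ++inliers;
        if (inliers > bestInliers) { bestInliers = inliers; best = t; }
    }
    return best;
}
```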
4.5.3.3. ICP
The RGBDemo uses a variation of ICP (Iterative Closest Point) (ZHANG, 1992) to refine the
estimated transformation. This step is optional and costs computing time; if a very fast
reconstruction is important, it is better to turn this feature off, at the cost of
reconstruction quality. Because this project did not focus on the speed of the
reconstruction, the ICP refinement was used.
The ICP algorithm is not implemented in the RGBDemo source code itself; instead the program
uses the ICP method of the Point Cloud Library (see chapter 3.3.2.3).
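The iterate-match-refine idea behind ICP can be illustrated in one dimension with a translation-only variant. PCL's ICP does the same with 3D point clouds, a k-d tree for the matching, and a full rigid transform; this sketch keeps only the loop structure.

```cpp
#include <cmath>
#include <vector>

// Translation-only ICP in 1D: repeatedly match every source point to
// its nearest target point, average the residuals, and shift the
// source. Illustrates the iterate-match-refine loop only; the PCL
// version works on 3D clouds with a full rigid transform.
double icpTranslation1d(std::vector<double> src,
                        const std::vector<double>& tgt,
                        int iterations)
{
    double total = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double shift = 0.0;
        for (double s : src) {
            double best = tgt.front();
            for (double t : tgt)                       // nearest neighbour
                if (std::fabs(t - s) < std::fabs(best - s)) best = t;
            shift += best - s;
        }
        shift /= src.size();
        for (double& s : src) s += shift;              // apply estimate
        total += shift;
    }
    return total;  // accumulated translation
}
```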
4.5.4. Problems/Solutions
This section describes problems with the reconstruction and their solutions.
One problem was that the RGBDemo did not use all frames successively: it took the first
image, and while the Reconstructor tried to estimate the camera position, the incoming frames
continued. This means that after the first frame was computed, the program did not use the
second frame but whichever frame was active at that moment, for example the fifth. With this
behaviour, the reconstructed model lost information and the estimated camera position was
poor, because there was not enough reference information.
The RGBDemo had an option to compute the frames one by one, activated with the "--sync true"
parameter when starting the Reconstructor. In the early versions this option did not work,
but in version 0.7.0 that bug was fixed and the frames were computed one by one.
Another issue was that the reconstruction stopped at a certain point. This often happened
when the scan moved from the front of the body to the side of the person; somewhere in this
area the Reconstructor algorithm lost track. An example of this issue is shown in Figure 20:
as you can see in the side view, there are just a few points from the back.
Figure 20 Failed reconstruction shown from front and side view
The next attempt, with red markers in the background, also failed. The idea behind the red
and white paper stuck on the walls was that the algorithm might use this colour to find more
feature points, but as you can see in Figure 21 this did not work. The supervisor gave the
advice to try another environment for the scan, because although more markers had been added
to provide more feature points, they may still have been too small. After a change of
location, the reconstruction worked as expected.
Figure 21 Reconstruction with marker
This is a limitation of the algorithm: the background should not be monotonous, and the more
diverse its colours, the better. Another solution could be to change the threshold of the
feature points, which could affect the quality of the estimated position. It would also be
possible to estimate the camera position with the depth information alone, but this would be
a big change to the actual algorithm.
4.6. Point cloud processing
A point cloud is a set of points in a three-dimensional coordinate system. These points are
called vertices (the plural form of vertex); in computer graphics, a vertex is a data
structure describing a point. The result of the reconstruction is a point cloud whose points
are defined by X, Y and Z coordinates and a colour. To process this point cloud, the program
MeshLab was used.
“MeshLab is an open source, portable, and extensible system for the processing and editing of
unstructured 3D triangular meshes. The system is aimed to help the processing of the typical
not-so-small unstructured models arising in 3D scanning, providing a set of tools for editing,
cleaning, healing, inspecting, rendering and converting this kind of meshes.” (MeshLab,
2012)
Although "mesh" is in the name of the program, it also has a range of utilities for editing
point clouds, and it can open the .ply (Polygon File Format) files exported by the
reconstruction program. In this project, the cleaning of the point cloud was done by hand.
Cleaning a point cloud means deleting all points that do not belong to the object. It would
be possible to automate this process by recognising the walls and the ground and deleting
those points, but this was not done in this project.
The images below show the process of the cleaning on the left side and the result on the right
side.
Figure 22 After point cloud processing Figure 23 Point cloud cleaning
4.7. Meshing
The result of the reconstruction and the cleaning is a point cloud: a set of disconnected
points floating near each other in three-dimensional space. Looked at closely, the image
breaks down into distinct points with visible space between them. "If we wanted to convert
these points into a smooth continuous surface we’d need to figure out a way to connect them
with a large number of polygons to fill in the gaps. This is a process called "constructing a
mesh"" (Borenstein, 2011).
To build a mesh, the Poisson surface reconstruction (Kazhdan, et al., 2006) implemented in
MeshLab is used. The Poisson algorithm is designed to handle noisy point clouds like ours and
produces a triangle mesh as its result.
An alternative algorithm implemented in MeshLab is Ball Pivoting (Bernardini, et al., 1999).
This algorithm uses the points of the point cloud directly and links them together into
triangles. Because it uses the points of noisy data, the resulting mesh has a lot of holes,
and there are double surfaces where point clouds are not perfectly aligned.
The advantage of the Poisson algorithm is that it minimises the creation of holes even if
some parts of the surface are missing in the point cloud. This is because the algorithm wraps
around the points and does not use the points of the point cloud directly as vertices;
because of that property, the algorithm produces a smooth surface. For the Poisson algorithm
to work correctly, every point in the point cloud must have a normal assigned. These normals
can be calculated with MeshLab’s filter "Compute normal for point sets".
After the Poisson surface reconstruction, the colour information is lost in the mesh. To
colourise the mesh there is a MeshLab filter called "Vertex attribute transfer", which picks
the colour from the nearest point of the point cloud and applies it to the mesh.
Figure 24 Meshing process
4.8. Measurements
Not many measurements were taken in this project because of the lack of time. MeshLab has a
measuring tool to measure distances.
Figure 25 3D body model with measurement
As you can see in Figure 25, the measured height is 1.71985 m. The real height of the test
person is 1.72 m (depending on whether hair counts). This is quite accurate, but the test was
made with only one person and is therefore not representative.
The measurement tool calculates the distance between two points P₁ and P₂ in 3D space with
the following formula:
d(P₁, P₂) = √( (x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)² )
Where the first point is P₁ = (x₁, y₁, z₁) and the second point is P₂ = (x₂, y₂, z₂).
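Translated directly into code, the Euclidean distance formula reads:

```cpp
#include <cmath>

// Euclidean distance between two points in 3D space, i.e. the
// formula used by the measuring tool described above.
double distance3d(double x1, double y1, double z1,
                  double x2, double y2, double z2)
{
    double dx = x2 - x1, dy = y2 - y1, dz = z2 - z1;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```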
5. Critical Evaluation
5.1. Chapter Summary
This chapter mainly contains evaluations and thoughts about the different aspects of the
project. It also includes possible improvements and learning outcomes.
5.2. Project Management Evaluation
All of the requested documents were submitted on time, and all progress reviews were held on
schedule.
The task scheduling with a Gantt chart was done quite early in the project. As the project
progressed, the problems changed and the time schedule changed with them.
A problem at the end of the project was that the planning had not taken into account that at
the end of the academic year there is a lot of other work to do. In the planning of another
project, this factor should be taken into account.
Risk management was an important part of the beginning of the project. In this project it
proved especially valuable that the risk "Losing information about the project or related
information" was considered and a backup of relevant data was made, because you cannot trust
that a hard drive will last the whole project: in this project the hard drive broke in the
middle of the project. If no backup had been available, it could have had a major impact on
the project progress.
5.3. Design and Implementation
In a real-world project, you would normally also evaluate and consider other depth scanners
for taking body measurements, but due to the limited budget of this project the Kinect was
the only affordable device. That is one of the reasons why the Kinect was used in the first
place; other advantages are, for example, the big community around the device and the fact
that a lot of people already have one at home.
It is impossible to use the Kinect on a computer without a Kinect interface, so it is
necessary to choose one to get the relevant data. In this project it was a good idea to use
the RGBDemo: it already implemented a bunch of useful functions to process the information
from the Kinect and a reconstruction example to build on. But you have to trust that the
information from this additional middleware is correct; in this project there was never the
suspicion that something did not work correctly, except for the calibration of the depth
camera.
It is recommendable to store the collected data on the hard drive and then use it to
reconstruct the body. For this project that was the right decision, but in commercial use it
would be more practical to have the reconstruction in real time and without the need for
MeshLab to clean the relevant information out of the reconstructed scene; this might be
implemented with a depth threshold to exclude the walls in the scene. It was interesting to
build a setup to test the accuracy of the Kinect; if this test is repeated, it is advisable
to use a bigger sheet.
The reconstruction worked, but not perfectly; it depends a lot on the environment. There are
a lot of possible improvements in this area. In particular, the KinectFusion (Izadi, et al.,
2011) implementation in the experimental PCL version should not go unconsidered: once it is
fully implemented it could be better than the RGBDemo, but during this project it was too
early.
Through the lack of time, the actual body measurements were really basic ones. To take more
measurements of other body parts, it is necessary to improve the reconstruction process and
to evaluate and find other programmes to measure meshes.
It should be noted that scanned persons should extend their arms in new scans (shown in
Figure 26). This may also have an effect on the reconstruction and meshing quality.
5.4. Possible Improvements
To simplify the reconstruction process, it would be a very interesting experiment to use more
than one Kinect; three Kinects are still cheaper than most other depth sensors, and it is
possible to connect more than one Kinect to a computer. For example, you could use three
Kinects from three different angles, take just a few frames (or one), and reconstruct the
model from this. This has the advantage that the problem of body movements during the data
collection is minimised; and if the three Kinects are calibrated and the positions of the
cameras are known, no camera position estimation is necessary. A disadvantage is that no
normal consumer has three Kinects at hand. This improvement could be work for a whole new
project, but it could still use the knowledge and experience of this project.
Another possible improvement could be to use the Kinect vertically. This way you can move
closer to the scanned person and get a more accurate result, because the accuracy depends on
the distance to the object. The RGBDemo would need some modification to be used in this
orientation.
It could be useful to use the depth information to estimate the camera position instead of the
RGB image information. That has the advantage that the reconstruction is independent from
the surrounding environment.
Machine learning, a branch of artificial intelligence, is another interesting direction for working with the data generated in this project. Machine learning generates results by comparing new data with samples from a database of known models. Imagine a database of 3D body models, each annotated with its measurements: a new body model would be compared against this database, and the measurements of similar models would be used to produce a result. The difficulty is that the space of possible input models is too large to be covered by a set of observed examples (training data), so the training data must be generalised. There are many algorithms that address this problem, also known under the term “pattern recognition algorithms”.
Figure 26 Model with arms extended (The Districts, 2011)
5.5. Learning Outcomes
This project certainly taught a lot about the Kinect and how to handle the information from its sensors. Because the Kinect is a depth sensor, it also taught more about depth sensors in general and their capabilities, and about how to access information from the sensor and how to install and use the different interfaces.
The accuracy test showed how to design an experiment and then analyse the data, which error sources are possible, and whether they can be corrected. Analysing the generated data also exercised mathematical skills. The accuracy analysis was programmed in the programming language C++, and it was interesting to work with previously unfamiliar C++ interfaces such as OpenCV.
Working with a number of open source projects reveals a pattern that many of them share: most use some sort of code management tool such as SVN or Git, and many have discussion boards for questions.
Because the Kinect is not just a depth sensor but also has a built-in RGB camera, the project also taught how to process and analyse images, what feature points are and how they can help with the task at hand.
Reading code written by others shows how helpful comments in the code are; this was especially true when studying the reconstruction algorithm of RGBDemo. The project also demonstrated how difficult it is to reconstruct non-rigid objects, and made clear what point clouds and meshes are and what the difference between them is.
6. Conclusion
This report gave an overview of the important and interesting parts of this project: turning a human body into a digital 3D representation.
The project proves that there are many interesting applications for the Kinect beyond its initial purpose as a game console input device. It shows how the Kinect's sensors can be accessed and how the information can be processed. At the moment the system produces only a basic measurement. The end result of the project is a 3D body model, but it is still not perfectly accurate. The evaluation presents a few important ideas to build on this project, for example the use of multiple Kinects or machine learning.
The whole process is currently not automated, and it needs an expert to build the 3D body model. In this state it is therefore not ready for commercial use, because all of the steps performed by hand would have to be done by the computer program for the customer. It is definitely possible to build a program that automates this.
The technology built into the Kinect is still at the beginning of a very interesting future in this area.
7. References
Bay, H., Ess, A., Tuytelaars, T. & Gool, L. V., 2008. SURF: Speeded Up Robust Features.
Computer Vision and Image Understanding (CVIU), Volume 110, pp. 346-359.
Bernardini, F. et al., 1999. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE
Transactions on Visualization and Computer Graphics, 5(4), pp. 349-359.
Borenstein, G., 2011. Making Things See. s.l.:O'Reilly Media / Make.
Burrus, N., 2012. How rgbd-reconstructor.exe works?. [Online]
Available at: https://groups.google.com/d/msg/rgbdemo/fY1d950ZRxc/8QUALhLpv4wJ
[Accessed 8 April 2012].
Burrus, N., 2012. nestk. [Online]
Available at: https://github.com/nburrus/nestk
[Accessed 27 March 2012].
Burrus, N., 2012. rgbdemo. [Online]
Available at: https://github.com/nburrus/rgbdemo
[Accessed 27 March 2012].
Burrus, N., 2012. RGBDemo License. [Online]
Available at: http://labs.manctl.com/rgbdemo/index.php/Main/License
[Accessed 27 May 2012].
D’Apuzzo, N., 2009. Hometrica. [Online]
Available at: http://www.hometrica.ch/pres/2009_essilor_pres.pdf
[Accessed 19 November 2011].
Merriam-Webster Dictionary, 2012. Histogram - Definition. [Online]
Available at: http://www.merriam-webster.com/dictionary/histogram
[Accessed 22 April 2012].
Freedman, B., Shpunt, A. & Arieli, Y., 2010. Distance-Varying Illumination and Imaging
Techniques for Depth Mapping. s.l. Patent No. US2010/0290698.
Izadi, S. et al., 2011. KinectFusion: Real-time 3D Reconstruction and Interaction. Santa
Barbara, CA, USA., ACM Symposium on User Interface Software and Technology.
Kazhdan, M., Bolitho, M. & Hoppe, H., 2006. Poisson surface reconstruction. s.l.,
Proceedings of the fourth Eurographics symposium on Geometry processing, pp. 61-70.
Kinect for Windows Team, 2012. Starting February 1, 2012: Use the Power of Kinect for
Windows to Change the World. [Online]
Available at: http://blogs.msdn.com/b/kinectforwindows/archive/2012/01/09/kinect-for-windows-commercial-program-announced.aspx
[Accessed 2 April 2012].
Khoshelham, K. & Elberink, S. O., 2012. Accuracy and Resolution of Kinect Depth Data for
Indoor Mapping Applications. Sensors, 12(2), pp. 1437-1454.
Lee, J. C., 2011. Windows Drivers for Kinect, Finally!. [Online]
Available at: http://procrastineering.blogspot.co.uk/2011/02/windows-drivers-for-kinect.html
[Accessed 4 April 2012].
libfreenect, 2011. libfreenect. [Online]
Available at: https://github.com/OpenKinect/libfreenect
[Accessed 20 November 2011].
MANCTL, 2012. Calibrating your Kinect (OpenNI backend). [Online]
Available at: http://labs.manctl.com/rgbdemo/index.php/Documentation/Calibration
[Accessed 27 March 2012].
MeshLab, 2012. MeshLab. [Online]
Available at: http://meshlab.sourceforge.net/
[Accessed 18 April 2012].
Microsoft, 2012. KinectFusion Project Page. [Online]
Available at: http://research.microsoft.com/en-us/projects/surfacerecon/
[Accessed 11 April 2012].
Microsoft, 2012. The Kinect Effect. [Online]
Available at: http://www.xbox.com/en-GB/Kinect/Kinect-Effect
[Accessed 05 April 2012].
MicrosoftInt, 2011. Introduction to Kinect for Windows. [Online].
Newcombe, R. A. et al., 2011. KinectFusion: Real-Time Dense Surface Mapping and
Tracking. Basel, IEEE.
Nokia, 2012. Download Qt, the cross-platform application framework. [Online]
Available at: http://qt.nokia.com/downloads
[Accessed 19 April 2012].
Nvidia, 2011. NVIDIA GPU Computing Documentation. [Online]
Available at: http://developer.nvidia.com/nvidia-gpu-computing-documentation
[Accessed 14 April 2012].
OpenCV, 2012. OpenCV Download. [Online]
Available at: http://sourceforge.net/projects/opencvlibrary/files/opencv-win/
[Accessed 28 March 2012].
OpenKinect, 2011. OpenKinect. [Online]
Available at: http://openkinect.org/wiki/Main_Page
[Accessed 19 November 2011].
OpenKinect, 2012. OpenKinect. [Online]
Available at: http://openkinect.org/wiki/Main_Page
[Accessed 03 April 2012].
OpenKinectSrc, 2012. libfreenect. [Online]
Available at: https://github.com/OpenKinect/libfreenect
[Accessed 4 April 2012].
OpenNI, 2012. Abstract Layered View. [Online]
Available at: http://openni.org/Documentation/ProgrammerGuide.html
[Accessed 01 April 2012].
Pandya, H., 2011. Microsoft Kinect: Technical Introduction. [Online]
Available at: http://entreprene.us/2011/03/09/microsoft-kinect-technical-
introduction/kinect_hacks_introduction/
[Accessed 12 April 2012].
PCL, 2011. An open source implementation of KinectFusion. [Online]
Available at: http://pointclouds.org/news/kinectfusion-open-source.html
[Accessed 14 April 2012].
Research methods, 2012. Measurement Error. [Online]
Available at: http://www.socialresearchmethods.net/kb/measerr.php
[Accessed 23 April 2012].
ROS.org, 2010. Depth calculation. [Online]
Available at: http://www.ros.org/wiki/kinect_calibration/technical#Depth_calculation
[Accessed 03 April 2012].
ROS, 2011. openni_kinect/kinect_accuracy - ROS Wiki. [Online]
Available at: http://www.ros.org/wiki/openni_kinect/kinect_accuracy
[Accessed 11 April 2012].
Takahashi, D., 2012. Gamesbeat. [Online]
Available at: http://venturebeat.com/2012/01/09/xbox-360-surpassed-66m-sold-and-kinect-
has-sold-18m-units/
[Accessed 27 March 2012].
Taylor, J. R., 1999. An Introduction to Error Analysis: The Study of Uncertainties in Physical
Measurements. s.l.:University Science Books.
The Districts, 2011. The Districts. [Online]
Available at: http://thedistricts.wordpress.com/tag/film-terms/
[Accessed 22 April 2012].
Zalevsky, Z., Shpunt, A., Maizles, A. & Garcia, J., 2007. METHOD AND SYSTEM FOR
OBJECT RECONSTRUCTION. Israel, Patent No. WO2007/043036.
Zhang, Z., 1992. Iterative Point Matching for Registration of Free-form Curves. s.l.:s.n.
8. List of Figures
Figure 1 2D, 2.5D and 3D (D’Apuzzo, 2009) .......... 6
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011) .......... 10
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007) .......... 12
Figure 4 Structured light (Freedman, et al., 2010) .......... 13
Figure 5 OpenNI framework architecture (OpenNI, 2012) .......... 14
Figure 6 Installed OpenNI driver at the Windows Device Manager .......... 15
Figure 7 CMake process .......... 23
Figure 8 Kinect calibration RGB images .......... 27
Figure 9 RGBDFrameRecorder UML class .......... 28
Figure 10 color.png example picture .......... 30
Figure 11 Random and systematic error (Taylor, 1999) .......... 31
Figure 12 Kinect accuracy test setup .......... 32
Figure 13 Kinect accuracy test example frame .......... 32
Figure 14 Accuracy measurement with highlighted area .......... 33
Figure 15 Histogram .......... 33
Figure 16 Error map of the first frame (60 cm) .......... 34
Figure 17 Six example input frames for the Reconstructor .......... 37
Figure 18 Reconstructed point cloud .......... 38
Figure 19 RGBD Reconstructor flowchart .......... 39
Figure 20 Failed reconstruction shown from front and side view .......... 41
Figure 21 Reconstruction with marker .......... 42
Figure 22 After point cloud processing .......... 43
Figure 23 Point cloud cleaning .......... 43
Figure 24 Meshing process .......... 44
Figure 25 3D body model with measurement .......... 45
Figure 26 Model with arms extended (The Districts, 2011) .................................................... 47
9. Appendix A
Department of Computing Degree Project Proposal
Name: Nikolai Bickel Course: Computing Size: double
Discussed with (lecturer): Dr. Bogdan Matuszewski, Chris Casey Type: development
1 Previous and Current Modules
Object Oriented Methods in Computing (CO3402)
Enterprise Application Development (CO3409)
Database Driven Web Sites (CO3708)
Computer Vision (EL 3105)
2 Problem Context
There are a few problems when buying clothes online, the most common being that the
purchased clothes do not fit. This is exacerbated by the fact that many users don’t know their
own size or the size of those they are purchasing for (such as parents who purchase garments
for their children). Many people deal with this problem by ordering several sizes of the same
clothes and send back the excess.
3 The Problem
For those who buy several sizes it can be a nuisance to return the excess clothes. Additionally, if a customer wishes to purchase clothes for a special occasion, they might be reluctant to order online as they might be unsure of the fit of the clothes ordered.
The online stores also bear the costs associated with this problem, as they usually pay the
shipping costs for the returned items, and also deal with several logistical issues along with
the costs associated with the resale of the items.
4 Potential Ethical or Legal Issues
None.
5 Specific Objectives
Access information from the sensor (RGBD sensor -> Microsoft Kinect)
Isolate the important data
Try to get the sizes of body parts
Convert measurements into clothing size (S, M, L, XL)
Compare program results -> real data
6 The Approach
I want to capture the data of a person's body using an RGBD sensor, namely the Microsoft Kinect. The Microsoft Kinect takes an RGB picture of a person and also captures depth using an additional sensor. Depth sensors measure the third dimension, that is, the distance of the object from the camera.
The depth information will be very important data to work with. There are better sensors than the Kinect, but it is very cheap.
To get the data from the Kinect it must be connected to a computer over USB. There are several interfaces for obtaining the required “Kinect data”. The first approach will be to build a desktop application that can handle the “Kinect data” and derive the required clothing sizes.
7 Resources
Microsoft Kinect
Kinect Interfaces
o Kinect for Windows SDK from Microsoft Research (free for research)
o OpenKinect (free)
A PC to connect the Kinect over USB
8 Potential Commercial Considerations
8.1 Estimated costs and benefits
Whether the final product can be used commercially depends on how accurately the
measurements can be made. At this time I cannot say how accurate the measurements will be.
Not every person has an RGBD sensor at home, but perhaps in the coming years every webcam
will have a depth sensor to provide this information.
9 Literature Review
Is the data from the Microsoft Kinect good enough to get exact data of a person’s body?
10 References
Jamie, S. et al. 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images.
[ONLINE] Available at:
http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf. [Accessed 27
September 11].
Christian Plagemann, Varun Ganapathi, Daphne Koller, Sebastian Thrun. 2010. Real-time
Identification and Localization of Body Parts from Depth Images. [ONLINE] Available at:
http://www.stanford.edu/~plagem/bib/plagemann10icra.pdf. [Accessed 27 September 11].
eurogamer.net. 2010. Kinect visionary talks tech. [ONLINE] Available at:
http://www.eurogamer.net/articles/digitalfoundry-kinect-tech-interview. [Accessed 27
September 11].
Similar project for webcams (without depth information)
http://www.upcload.com/
http://www.seventeen.com/fashion/virtual-dressing-room
Kinect for Windows Software Development Kit (SDK) beta from Microsoft Research,
http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/
Free, open source libraries that will enable the Kinect to be used with Windows, Linux, and
Mac
http://openkinect.org/wiki/Main_Page
10. Appendix B
Department of Computing Final Year Project Technical Plan
Name: Bickel Nikolai Size: double Mode: ft Time: 1
Course: Computing Supervisor: Dr. Bogdan Matuszewski
1 Summary
I want to build a computer application which connects to an RGBD sensor (the Microsoft Kinect). The user of the program should be able to stand in front of the computer and see his body measurements and what clothing size would fit him. That means the user should also be able to see the body measurements that determine the process of finding the clothing sizes.
The challenge of the project is to find the best algorithm to get robust measurements: measuring twice should give nearly the same results. Once I have robust and accurate measurements it will not be difficult to find the right clothing size. To find the best algorithm I need to test different approaches. There may be accuracy problems due to the quality of the data provided by the Kinect.
2 Constraints
Because I am not working with an external partner, I only have the deadlines given by the university.
Project Deadlines:
Proposal: 27-09-2009
Technical Plan: 20-10-2009
Literature Report: 24-11-2009
Project Report: 26-04-2010
3 Key Problems
As mentioned before, one of the key problems is to ensure the measurements are robust. To ensure that the results are accurate, several components must work in tandem. When I am able to get the data via USB I will have a byte array, so I need to isolate the important data contained within it; some external resources will help me to find that data.
I am unsure of the data quality from a Microsoft Kinect, and there may be some use cases which would be too difficult to implement. I may need to write algorithms that compensate for the problem of noisy data. Another potential problem is that I may not be able to change some of the hardware restrictions of the Microsoft Kinect.
The results of the measurements should be presented in an understandable way, and the application should not be too difficult to use. Because I do not know the standards of the clothing industry, I need to invest some time in finding out what standards the textile and clothing industry use. Only with this knowledge can I convert my measurements into representative clothing sizes (e.g. S, M, L and XL).
4 Risk Analysis
Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best of the data I get from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK) | High | Medium | Try to configure the tools correctly so that they give the best possible results
Robustness – the data of two measurements don't match | Medium | High | Try to reduce the error rate as much as possible
The RGBD sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Do my best; otherwise reduce the measurement points to those that are possible
Scheduling failure, not enough time to complete project | High | Medium | Work within a timetable with the help of a Gantt chart
5 Options
Middleware
o Kinect for Windows SDK from Microsoft Research (Microsoft)
o OpenNI Framework (OpenNI organization)
o OpenKinect (Open Source)
You can see the architecture of the application in the section “System & Work Outline”.
6 Potential Ethical or Legal Issues
When I want to test my application I cannot test it with my body measurements alone. I need other subjects to test whether my application is robust, as people's body sizes vary, for example whether someone is heavier or thinner. I will ask some volunteers to test my application, but I will not publish any personal details (e.g. pictures) in my reports. I may need to publish some anonymised data for example purposes.
7 System & Work Outline
Sensor array:
Microsoft Kinect
Middleware:
The middleware provides the USB driver, and each framework has its own additional features; for example, they provide image, depth and audio streams or skeleton tracking.
(Architecture outline: Sensor array -> USB -> Middleware -> My Application)
As part of my preparation for the technical plan I will try to install all of the different middlewares and experiment with them. Each of them has pros and cons. At the moment I cannot say which of the products I want to use; I need more time to test them thoroughly. As I mention in my Gantt chart, I will work with all of them next and then choose one of them.
Kinect for Windows SDK from Microsoft Research (Microsoft)
OpenNI Framework (OpenNI organization)
OpenKinect (Open Source)
The middlewares are not compatible with each other.
My Application:
Which programming language I will use depends on which middleware I choose; I may need to search for a wrapper. All of the middleware options support Microsoft .NET programming languages. I think I will program the application in C# or C++. I can handle both of them, and I think the programming language is not one of the big problems in this project.
8 Commercial Analysis
8.1 Estimated costs and benefits
Factor name | Description | Cost or benefit | Estimated amount | Estimate of when paid
Kinect for Xbox 360 | RGBD sensor (image, depth and audio streams) | Cost | £100 | Before the project
Software | Microsoft Visual Studio, Netbeans, middleware | Benefit | £0 | MSDNAA software and free software
Miscellaneous | Measuring tape, pocket rule | Cost | £15 | Payable during project
Working time | Development and research | Cost | 300-400 working hours | During project
8.2 Commercial Conclusion
Whether the final product can be used commercially depends on how accurately the
measurements can be made. At this time I cannot say if the measurements will be accurate.
Currently not all people have an RGBD sensor at home, but perhaps in the coming years every
normal computer webcam will provide depth information.
At the moment the middleware “Kinect for Windows SDK from Microsoft Research” is licensed only for non-commercial use, but a licence for commercial use will be released. The beta SDK has been developed to support wide exploration and experimentation by academic and research communities.
11. Appendix C
12. Appendix D
Build a 3D body model with a single Kinect sensor
Nikolai Bickel,
BSc (Hons) Computing
Project: Body measurements with the Microsoft Kinect
Supervisor: Dr. Bogdan Matuszewski
Second Reader: -
25. November 2011
Abstract
Depth cameras are not conceptually new, but the Microsoft Kinect has made this kind of sensor popular among researchers and enthusiasts. A 3D body model is beneficial for applications in many different areas.
This paper gives an overview of how to build a 3D body model with a single Kinect sensor. It also gives some technical details about the specification and capabilities of the Microsoft Kinect system. Different algorithms for solving the problems of building a 3D body model with a Kinect will be discussed.
1 Introduction
1.1 Context
The Kinect is becoming an important 3D sensor, and not because it is the best sensor: it is because of its reliability and low cost. If you want to build a 3D body model with a Kinect it is important to keep in mind some of the problems that can appear. An important part of working with a technical device is knowing its basic behaviour; because of that, this paper also contains some interesting information about the device itself. The paper will show some possible ways to approach the problem of building a 3D body model with a single Kinect sensor.
1.2 Overview
Section 2 (Kinect sensor) describes the Microsoft Kinect sensor and its capabilities, with some additional information about its accuracy, and Section 3 (Calibration) covers the calibration process. Section 4 (Collect data from the Kinect) gives an overview of the different options for accessing the data from the Kinect. The following Section 5 (The object “human body”) discusses some error sources when dealing with a human body. Section 6 (Pre-processing Kinect data) explains how to process the collected data before it can be used in the different approaches pointed out in Section 7 (3D registration). To get a usable body shape there is a technique called meshing, which is discussed in the final section.
2 Kinect sensor
Borenstein (2011) explains in his book “Making Things See” what a Kinect does. The difference between a normal camera and a Kinect is that the Kinect additionally collects depth data: it measures the distance to the objects placed in front of the camera. For a person there is no big difference between a normal picture and depth data, but for the computer it is not so easy to “see” what it needs in order to distinguish between them. When a computer analyses a picture it has only the colour of each pixel, and it is difficult to separate different objects and people. In a depth image, on the other hand, the computer has depth information for each pixel, and it is easier to find the data it is looking for because it knows how far away the object is from the sensor. A further benefit of depth data is that a 3D model can be built of what the camera sees, which is important for building a full 3D model of an object.
Functionality
The Kinect sensor has an RGB camera, an IR camera and an IR projector. The IR projector projects an irregular pattern onto the objects in front of the Kinect, and the depth camera creates a depth image by recognising the distortion of this pattern. The inventors of the Kinect describe the depth measurement as a triangulation process (Freedman, et al., 2010).
Kinect Sensor Array Specifications
Sensor item | Specification range
Viewing angle | 43° vertical by 57° horizontal field of view
Mechanized tilt range (vertical) | ±28°
Frame rate (depth and color stream) | 30 frames per second (FPS)
Resolution, depth stream | QVGA (320 × 240)
Resolution, color stream | VGA (640 × 480)
(MicrosoftInt, 2011)
Accuracy
Khoshelham (2011) has analyzed the accuracy of the Microsoft Kinect in the paper “Accuracy
analysis of Kinect depth data” and came to the following statement: “The random error of
depth measurements increases quadratic with increasing distance to the sensor and reaches 4
cm at the maximum range”. Khoshelham (2011) also comes to the conclusion that for mapping purposes the data should be acquired within 1-3 m distance to the sensor. The ROS homepage (ROS, 2011) states: “Because the Kinect is essentially a stereo camera, the expected error on its depth measurements is proportional to the distance squared.”
3 Calibration
Many literature resources point out that it is important to have a calibrated Microsoft Kinect in order to get accurate data (Weiss, et al., 2011), (ROS, 2011), (Pajdla, et al., 2011). Which calibration methods are available depends on the middleware in use. There are explanations and technical descriptions of the calibration process on the ROS homepage (ROS CA, 2011) (ROS CT, 2011). For the OpenKinect project there is a calibration method in the OpenKinect Wiki (Burrus, 2011). The paper “Accurate and Practical Calibration of a Depth and Colour Camera Pair” (Herrera, et al., 2011) explains how to calibrate a depth and colour camera pair.
4 Collect data from the Kinect
The normal purpose of the Microsoft Kinect (MicrosoftKin, 2011) is playing games on the Microsoft Xbox console. However, there are some projects that allow the Microsoft Kinect sensor to be connected to a personal computer. These middleware products provide the USB driver and interfaces to access the data from the Kinect. The most popular are:
OpenKinect
OpenKinect is an open source community around the Kinect. It focuses on the libfreenect driver (libfreenect, 2011); most of the driver code is written in C. Libfreenect is open source (Apache licence) and available for Windows, Linux and Mac. The community also provides a wiki with information about the Kinect (OpenKinect, 2011).
OpenNI framework
The OpenNI framework is published by the OpenNI organization (OpenNI organization, 2011). Companies like PrimeSense, which provides the 3D sensing technology used in the Kinect, belong to this organization. All source code of the driver and the sample programmes is available, written in C#.
Kinect for Windows SDK
The Kinect for Windows SDK is published by Microsoft (MicrosoftSDK, 2011). It gives developers access to the Kinect data for building applications in C++, C# or Visual Basic. The source code is not published, and at the moment the SDK is licensed for non-commercial use only.
Optional: Matlab
There are ways to combine the OpenNI driver (mexkinect, 2011) and the Microsoft Kinect SDK driver (Dirk-Jan Kroon, 2011) with Matlab. The OpenNI library wrapper functions are almost bug-free and offer more functionality than the wrapper functions for the Microsoft SDK, but they require an older OpenNI driver rather than the latest one.
All of the middlewares provide the raw data needed to build a 3D body model; they differ in the simplicity of installation and in how easily they connect to Matlab.
5 The object “human body”
When working with a body as a scanning object, there are several problems, summarised in a presentation by D’Apuzzo (2009): practical problems (movement, breathing, hair and eyes) and physical limits (stature, size and weight). There are also problems in the scanning process itself, such as scanning time, nudity and the privacy of the collected data. Movement in particular can be a big problem when using the ICP algorithm (see Section 7). As Allen, Curless & Popović (2003) write in their paper: “The human body comes in all shapes and sizes, from ballet dancers to sumo wrestlers.” This means it is difficult to make general assumptions about the object “human body”.
6 Pre-processing Kinect data
When collecting data from the Kinect, all measurement points are given relative to the sensor, but only the points belonging to the person in front of the Kinect are needed. The points that do not belong to the human body therefore have to be removed. This process is called segmentation (Rabbani et al., 2011). In the paper “Home 3D Body Scans from Noisy Image and Range Data”, Weiss, Hirshberg & Black (2011) explain the segmentation process this way: “We segment the body from the surrounding environment using background subtraction on the depth map. Given a depth map Dbg taken without the subject present and a depth map Df associated with a frame f, we take the foreground to be Dbg − Df > ϵ, where ϵ is a few mm. We then apply a morphological opening operation to remove small isolated false positives.”
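The background-subtraction rule quoted above can be sketched in a few lines of Python. This is only an illustrative sketch using NumPy and SciPy, not the authors’ implementation; the 5 mm default threshold, the convention that a depth value of 0 marks a missing reading, and the 3×3 structuring element are assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_foreground(d_bg, d_f, eps=5.0):
    """Background subtraction on depth maps, following the rule of
    Weiss, Hirshberg & Black (2011): a pixel is foreground when the
    background depth exceeds the frame depth by more than eps
    (the person stands in front of the background).
    d_bg, d_f: depth maps in millimetres; 0 marks a hole (no reading)."""
    valid = (d_bg > 0) & (d_f > 0)           # ignore holes in either map
    mask = valid & ((d_bg - d_f) > eps)      # foreground test: Dbg - Df > eps
    # Morphological opening removes small isolated false positives.
    return ndimage.binary_opening(mask, structure=np.ones((3, 3)))
```

Applied to a synthetic scene, a large foreground blob survives the opening while a single noisy pixel is discarded.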
The floor is also not needed for the 3D body model. To find the floor and delete its points from the point cloud, the Kinect’s on-board accelerometer can be used; the OpenNI middleware also provides a function that returns the floor plane coordinates.
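Once the floor plane is known (for example from the middleware’s floor-detection function), deleting its points reduces to a point-to-plane distance test. A minimal sketch, assuming the plane is supplied as a point and a normal in millimetres; the 20 mm tolerance is an assumption:

```python
import numpy as np

def remove_floor(points, plane_point, plane_normal, threshold=20.0):
    """Drop all points lying within `threshold` mm of the floor plane.
    points: (N, 3) array of 3D points; plane_point / plane_normal describe
    the floor plane, e.g. as reported by the middleware."""
    n = plane_normal / np.linalg.norm(plane_normal)
    dist = np.abs((points - plane_point) @ n)   # perpendicular distance
    return points[dist > threshold]
```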
7 3D - Registration
When the data from the Kinect has been collected and converted to 3D world coordinates, there is still no full 3D body model. To obtain one, it is necessary to collect data of the person’s body from different angles and combine it, because the Kinect can only capture the side of an object facing the sensor. For instance, when a person stands facing the Kinect, the sensor cannot see any information about that person’s back. The data captured from the different angles therefore has to be matched together into one body model. This process is called registration; an explanation can be found in the survey by Brown (1992).
The problem of working with a Kinect as the sensor is pointed out in the paper “Home 3D Body Scans from Noisy Image and Range Data” by Weiss, Hirshberg & Black (2011): “To estimate body shape accurately, we must deal with data that is monocular, low resolution, and noisy”. They use a part of the SCAPE model, which was developed by Anguelov et al. (2005). Since SCAPE is designed for shape completion, they use only the part of the model that factors body shape and pose information. The SCAPE algorithm needs a training database of body shapes to work correctly.
Once we have point clouds of a person captured from different angles, we need to combine them. A possible algorithm is ICP (Iterative Closest Point); an explanation can be found in the report by Zhang (1992). There are many implementations in different programming languages available on the internet, as well as many derivatives of the ICP method. Problems can occur with the ICP algorithm when the data from the Kinect is too noisy or not correctly segmented.
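The basic point-to-point ICP loop can be sketched as follows. This is a minimal illustration only, combining nearest-neighbour matching with the standard SVD-based rigid alignment (the Kabsch method); it omits the outlier rejection and noise handling that real Kinect data would require:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=20):
    """Minimal point-to-point ICP sketch: align `source` onto `target`
    (both (N, 3) arrays). Returns the transformed points and the
    accumulated rotation R and translation t."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        # 1. Match every source point to its closest target point.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Best rigid transform for these matches via SVD (Kabsch).
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                 # D guards against reflections
        t = mu_t - R @ mu_s
        # 3. Apply the transform and iterate.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return src, R_total, t_total
```

For a cloud that is merely a translated copy of the target, the loop recovers the offset in a single iteration because every nearest-neighbour match is already correct.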
Izadi et al. (2011) note that “Depth measurements often fluctuate and depth images contain numerous ‘holes’ where no readings were obtained”. A Kinect image has holes wherever the Kinect IR camera cannot “see” because of lighting conditions, reflection, transparency, occlusion, objects being out of range, or objects that do not reflect infrared; the Kinect needs the infrared pattern to work correctly.
8 Meshing
The outcome of the registration process is a point cloud: a set of disconnected points floating near each other in three-dimensional space. Looked at closely, the image breaks down into distinct points with visible space between them. “If we wanted to convert these points into a smooth continuous surface we’d need to figure out a way to connect them with a large number of polygons to fill in the gaps. This is a process called "constructing a mesh"” (Borenstein, 2011). An explanation of how to generate a mesh in Matlab is available in the article “A Simple Mesh Generator in MATLAB” by Persson & Strang (2004).
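As a toy illustration of “connecting points with polygons”, the convex hull of a point cloud already yields a triangle mesh. This is my simplified stand-in, not the Persson & Strang generator: a convex hull is only adequate for convex objects, and a real body scan needs a proper surface-reconstruction method.

```python
import numpy as np
from scipy.spatial import ConvexHull

def mesh_convex(points):
    """Connect a point cloud into a triangle mesh via its convex hull.
    Returns (vertices, triangles), where each triangle is a row of three
    indices into the vertex array. Only suitable for convex shapes."""
    hull = ConvexHull(points)
    return points, hull.simplices
```

For example, the four corners of a tetrahedron produce exactly four triangular faces.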
9 Conclusion
This paper should serve as a help in building a 3D body model of a person with a single Kinect. It is restricted to the human body as the object and to a single Kinect, although the process of obtaining a 3D model is approximately the same for other objects. The result could be improved by using multiple Kinect sensors or a large amount of training data. Overall, the process of getting a 3D body model is not easy, and making it automatable and user-friendly remains a big task.
10 References
Allen, B., Curless, B. & Popović, Z. (2003)
The space of human body shapes: reconstruction and parameterization from range scans.
SIGGRAPH '03
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J. & Davis, J. (2005)
SCAPE: Shape Completion and Animation of People.
SIGGRAPH Conference
Borenstein, G. (2011)
Making Things See.
O'Reilly Media
Brown, L.G. (1992)
A Survey of Image Registration Techniques.
ACM Computing Surveys, vol 24, pp. 325-376.
Burrus, N. (2011)
Kinect Calibration OpenKinect.
http://nicolas.burrus.name/index.php/Research/KinectCalibration
(visited Nov. 2011)
D’Apuzzo, N. (2009)
Hometrica. http://www.hometrica.ch/pres/2009_essilor_pres.pdf
(visited Nov. 2011)
Herrera C., D., Kannala, J. & Heikkilä, J. (2011)
Accurate and Practical Calibration of a Depth and Color Camera Pair.
LNCS 6855, vol II, pp. 437-445.
Kroon, D.-J. (2011)
Kinect Microsoft SDK. http://www.mathworks.com/matlabcentral/fileexchange/33035
(visited Nov. 2011)
Freedman, B., Shpunt, A., Machline, M. & Arieli, Y. (2010)
Depth mapping using projected patterns.
United States, Patent No. US 2010/0118123
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A. & Fitzgibbon, A. (2011)
KinectFusion: Real-time 3D Reconstruction and Interaction.
http://research.microsoft.com/pubs/155416/kinectfusion-uist-comp.pdf
(visited Nov. 2011)
Khoshelham, K. (2011)
Accuracy analysis of Kinect depth data.
ISPRS
libfreenect (2011)
libfreenect. https://github.com/OpenKinect/libfreenect
(visited Nov. 2011)
mexkinect (2011)
kinectmex. http://sourceforge.net/projects/kinect-mex/
(visited Nov. 2011)
MicrosoftInt (2011)
Introduction to Kinect for Windows, Microsoft. http://www.xbox.com/en-US/kinect
(visited Nov. 2011)
MicrosoftSDK (2011)
Microsoft Kinect SDK. http://kinectforwindows.org/
(visited Nov. 2011)
OpenKinect (2011)
OpenKinect. http://openkinect.org/wiki/Main_Page
(visited Nov. 2011)
OpenNI organization (2011)
OpenNI. http://openni.org/
(visited Nov. 2011)
Persson, P-O & Strang, G. (2004)
A Simple Mesh Generator in MATLAB.
SIAM Review, vol 46, pp. 329-345.
Rabbani, T., van den Heuvel, F.A. & Vosselman, G. (2011)
Segmentation of point clouds using smoothness constraint.
ISPRS Commission V Symposium
ROS (2011)
ROS (Robot Operating System). http://www.ros.org/wiki/openni_kinect/kinect_accuracy
(visited Nov. 2011)
ROS CA (2011)
ROS. http://www.ros.org/wiki/openni_camera/calibration
(visited Nov. 2011)
ROS CT (2011)
ROS (Robot Operating System). http://www.ros.org/wiki/kinect_calibration/technical
(visited Nov. 2011)
Smisek, J., Jancosek, M. & Pajdla, T. (2011)
3D with Kinect.
ICCV
Weiss, A., Hirshberg, D. & Black, M. (2011)
Home 3D Body Scans from Noisy Image and Range Data.
ICCV 2011
Zhang, Z. (1992)
Iterative Point Matching for Registration of Free-form Curves.