
COMPARISON BETWEEN SINGLE AND MULTI-CAMERA VIEW VIDEOGRAMMETRY FOR ESTIMATING 6DOF OF A RIGID BODY

Erica Nocerino, Fabio Menna, Fabio Remondino

3D Optical Metrology Unit, FBK Trento, Italy

Email: <nocerino><fmenna><remondino>@fbk.eu, Web: http://3dom.fbk.eu

ABSTRACT

Motion capture (MOCAP) systems are used in many fields of application (e.g., machine vision, navigation, industrial

measurements, medicine) for tracking and measuring the 6DOF (Degrees-Of-Freedom) of bodies. A variety of systems

has been developed in the commercial, as well as research domain, exploiting different sensors and techniques, among

which optical methods, based on multi-epoch photogrammetry, are the most common. The authors have developed an

off-line low-cost MOCAP system made up of three consumer-grade video cameras, i.e. a multi-view camera system. The

system was employed in two different case studies for measuring the motion of personnel working onboard a fishing boat

and of a ship model in a towing tank (or model basin) subjected to different sea conditions. In this contribution, the same

three single cameras are separately processed to evaluate the performances of a sequential space resection method for

estimating the 6DOF of a rigid body (a ship model during high frequency tests in a model basin). The results from each

video camera are compared with the motion estimated using the multi-view approach, with the aim of providing a

quantitative assessment of the performances obtainable.

Keywords: motion capture systems, photogrammetry, 6DOF, accuracy, multi-view, resection, tracking, synchronization

1. INTRODUCTION

The expression motion capture (MOCAP) designates the technique of recording or capturing movements, i.e. changes in

position of features of interest (ranging from single or multiple points to small particles or complex subjects) with respect

to time or through sequential epochs. When only qualitative information is to be derived, as in the case of

basic applications in video-surveillance1, the mere recording task is sufficient; on the contrary, if quantitative information

about the motion is required, a processing procedure is to be performed on the recorded sequential data. When dealing

with single or multiple points (e.g. particle image velocimetry – PIV2), motion is typically described in terms of

displacements, trajectories, speed, acceleration. On the other hand, in many fields of application (machine vision,

navigation, industrial measurements, medicine, etc.) the dynamics of rigid, also interconnected bodies (e.g., human body)

is of primary interest and the task to be accomplished is the estimation of their pose, i.e. position and orientation (also

called six degrees of freedom - 6DOF), through time or multiple epochs.

MOCAP systems rely on different sensors that fall in two main categories: (i) optical and (ii) non-optical methods. Non-

optical methods make use of inertial sensors (inertial measurement units - IMUs)3, GNSS receivers, mechanical devices,

magnetic and acoustic sensors, often combined together to form inertial navigation systems (INS)4,5. On the other hand,

the most widespread MOCAP systems are based on optical methods, which can be further classified according to the usual

taxonomy of passive/image-based and active/range-based methods. Laser tracking systems, such as the Leica systems

(Absolute Tracker T-Mac, T-Scan, T-Probe)6

or the Automated Precision Inc. systems7, belong to the category of active

methods and are mainly suited for industrial, highly accurate applications. Passive optical methods employ

photogrammetry iteratively over time, an approach also referred to as multi-epoch photogrammetry or videogrammetry.

Such systems represent a very flexible solution, adopted for a variety of applications, in different environments and able

to provide high accuracy8. On the market and in the literature there are plenty of photogrammetry-based motion capture

systems that make use of industrial, high frame rate video cameras9, DSLRs10 and low-cost video cameras11,12.

Figure 1. Example of commercial markers: (a) flat circular retroreflective markers by Geodetic Systems13, (b) spherical

retroreflective markers by zFlo14, (c) active spherical markers by Ar-tracking15.

Hereinafter, the terms video frames and epochs are used as synonyms; they are equivalent to single pictures acquired by

cameras at different times in standard photogrammetry. The expression video stream is used to denote an entire sequence

of frames recorded by the passive optical sensor.

When optical-based systems are used, in order to describe and measure the 6DOF of a rigid body, the position and

motion of at least three points firmly fixed on it must be measured and tracked. Usually, more points are used for

redundancy and robustness, particularly in case of obstructions. These points can be “signalized” through well

recognizable active or passive targets (a mandatory requirement for highly precise tracking tasks) or natural features

automatically marked in the frames. MOCAP systems, especially those employed in biomechanics and augmented reality

systems, usually feature both active and passive markers: (i) active markers are mostly made up of LEDs, emitting

either in the visible or infrared spectrum; (ii) passive markers are retroreflective (i.e. they reflect the incoming radiation back to the source

of light) and they are usually coupled with ring flash-lights. On the market (Figure 1), both flat and spherical active and

passive markers are available and multiple markers can be also combined together to directly provide the 6DOF of the

rigid body or component of interest.

When it comes to the employed sensors and techniques, three main approaches are normally used to track a rigid body:

(i) Multi-camera view photogrammetry: it is the most accurate and reliable technique which takes advantage of

redundant observations and multiple intersections of optical rays but, at the same time, is affected by the critical issue of

system synchronization. If not properly solved, a temporal bias among the video streams can result and errors, whose

magnitude increases with high dynamics, are to be expected. Both hardware and software solutions can be adopted

to solve or, at least, reduce this crucial problem. Electronic hardware solutions are the most used by commercial

systems for online (real time) measurements and for high speed motion industrial applications, where high frequency

cameras (e.g., 1000 Hz) are employed and even temporal biases of tenths of a millisecond must be avoided. Software solutions

use analytical approaches by adding a time dependent unknown parameter in the functional model to measure the

delay among video streams16-18. Corrections to the image observation coordinates are applied in post-processing to

take into account the delay. Some software solutions use an external device positioned in the common field of view

of the camcorders, introducing into the video streams a well recognizable common event. The video streams are then

aligned on the basis of this common event. The simplest and oldest solution is switching a LED on and off. The

attainable accuracy of video synchronization with this method is at most the duration of one frame of the video

sequence. Other solutions consist of including a LED chronometer device that is used to directly read the time, and hence

the synchronization bias. Other software solutions are based on measuring the delay with the audio channel of consumer-grade video cameras19,20. Using the audio stream, accuracies of a few tenths of a millisecond are reported in measuring the delays.

(ii) Multi-view from mirror systems: these approaches use an optical beam splitter and deflection mirrors to split the

optical rays coming from the object as if it were acquired from different points of view. The disadvantage of this

method is that the single sensor area is partitioned to acquire the different images produced by the beam splitter, leading

to a decrease of the optical resolution21-24. An alternative device that preserves the original resolution of the camera

but halves the image acquisition frame rate has been proposed for consumer-grade video cameras25. Maas et al.2

propose a virtual four camera system, realized with a single camera and a four-fold mirror system.

(iii) Single or sequential space resection with one camera: these methods use one camera to determine the 6DOF of a

rigid body with sequential space resections26-28. Sub-millimeter accuracy for position and sub-degree accuracy for

angle estimation has been proven feasible in simulation for a typical video camera with the object at up to 1.5 m distance26. Several factors influence the accuracy of the method, such as the accuracy of the reference points on the rigid

body and on the fixed reference system, as well as the camera-to-object distance and the image measurement precision.

Methods (ii) and (iii) are not affected by synchronization issues; therefore, they are especially suited for high speed motion

analysis applications where the cost of high frequency cameras is not negligible (e.g., AICON wheel watch29, Figure 2a).

Method (iii) is also used when multi-view systems are not effective because of obstructions or system flexibility

restrictions, as in the case of the AICON ProCam29 portable touch-probe coordinate measurement machine (CMM) for pre- and post-crash test measurements (Figure 2b), medical applications (AXIOS 3D® SingleCamC830) or augmented reality applications (Cyclope tracker31, Figure 2c).

Depending on the environmental conditions (e.g., temperature, humidity, vibrations, underwater operation), the synchronization and

wiring of video cameras can be a very limiting constraint. Wireless cameras such as the Qualisys Oqus series32 represent an important innovation that noticeably improves system flexibility. Most commercial

videogrammetric systems are designed to work in real time. For high speed/frequency and high complexity scenes, where

many objects must be tracked simultaneously, a MOCAP system can display the motion in real time at a reduced

frequency to allow the user to supervise the acquisition while the full data review can be done after the complete

recording. For augmented reality applications the time delay between actual motion and the one reconstructed by the

MOCAP system, known as latency, must be very short to avoid simulator/VR sickness in users. From an internet search

done in March 2015, a typical value for latency is below 10 ms for most known commercial systems (Qualisys32, Vicon33, OptiTrack34).

Figure 2. Examples of commercial single-camera, marker-based motion capture systems: (a) AICON wheel watch29, (b) AICON ProCam29, (c) Cyclope tracker31.

Table 1. Technical specifications of the consumer-grade video camera employed for the developed videogrammetric system.

Camera model: Sony HDR CX106E
Sensor type: 1/5-type Exmor™ CMOS sensor
Resolution: 1920×1080 pixels
Pixel size: 1.5 μm
Lens: Carl Zeiss Vario-Tessar
Focal length (nominal): 3.2-32 mm
Horizontal field of view: 48-5 degrees
Vertical field of view: 28-3 degrees
Frame rate: 50i (interlaced)
Shutter speed: 1/2-1/1000 sec

1.1 Development of an off-line, low-cost, flexible motion capture system

Within the OptiMMA (Optical Metrology for Maritime Applications) project11,12, the authors developed an off-line, low-cost, flexible and portable videogrammetric system, made up of three consumer-grade, stand-alone video cameras (three

full HD interlaced Sony HDR CX106E, Table 1). One of the strong points of the proposed low-cost motion capture

system is the lack of wiring, which imposes virtually no constraints on camera position and orientation. Two case studies are reported afterwards, showing the system performance in two completely different and demanding environments: (i) onboard a fishing boat, as a human motion capture system to track and measure the movements of personnel working on the ship deck12; (ii) in a towing tank for tracking and measuring the 6DOF of ship models11. In both cases, the multi-camera view photogrammetry approach was exploited; therefore, the synchronization of the video streams was necessary. A

two-step synchronization procedure was developed:

(i) A coarse synchronization is achieved through the use of a LED, switched on (or off) within the Field of View

(FOV) of the system: the first frame where the LED changes its status (from off to on or vice-versa) is

taken as the synchronization event. With this method, the different image sequences are synchronized up to

the duration of an entire frame (1/50th of a second), but a residual misalignment error can still exist and can

reach a maximum value of one frame.

(ii) A finer video stream alignment is then performed to obtain sub-frame synchronization exploiting the audio

signals recorded by the three cameras. A special device, simultaneously emitting sounds at a known frequency and flashing a LED, is used to introduce a common event for an automatic a-posteriori synchronization

of video sequences up to 1 msec. The video sequences are synchronized using matching procedures based

on the cross-correlation between audio signals recorded by camcorders. The measured synchronization

error is finally used to correct the image trajectories of tracked points by linear interpolation.

The synchronization method based on audio signal correlation led to a five-fold improvement in the relative accuracy of the 3D coordinate measurement with respect to the relative accuracy achievable with the simple synchronization approach

based on LED lighting.
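The audio-based step (ii) can be sketched as follows. This is only a minimal illustration of the principle, assuming the audio tracks have already been extracted to WAV files with a common sampling rate; the file names, the helper function and the use of SciPy are illustrative assumptions, not the actual implementation used in the system.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

def estimate_delay(master_wav, slave_wav):
    """Estimate the time offset (in seconds) of a slave audio track with
    respect to the master track from the peak of their cross-correlation."""
    fs_m, master = wavfile.read(master_wav)
    fs_s, slave = wavfile.read(slave_wav)
    assert fs_m == fs_s, "audio tracks must share the same sampling rate"
    # keep one channel and remove the mean to avoid a bias in the correlation
    master = (master[:, 0] if master.ndim > 1 else master).astype(float)
    slave = (slave[:, 0] if slave.ndim > 1 else slave).astype(float)
    master -= master.mean()
    slave -= slave.mean()
    xcorr = correlate(slave, master, mode="full")
    lag = np.argmax(xcorr) - (len(master) - 1)   # lag in audio samples
    return lag / fs_m                            # delay in seconds

# With 48 kHz audio, one sample corresponds to ~0.02 ms, well below the
# one-frame (20 ms) ambiguity left by the LED-based coarse alignment.
# delay_B = estimate_delay("camA_audio.wav", "camB_audio.wav")
```

The delay estimated in this way can then be used, as described above, to correct the image trajectories of the tracked points by linear interpolation.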

In this paper, starting from the same low-cost videogrammetric system11, the single camera method is investigated. Each

of the three cameras is used to measure separately the 6DOF of the same ship model hosted in the towing tank. The

results achieved with the multi-camera view photogrammetry approach are considered “ground truth” for accuracy

evaluation of the single camera method. The reasoning behind the proposed study is to provide a quantitative evaluation

of the single camera approach against the multi-view camera method, in order to understand whether the former, simpler system can substitute for the latter, more complex approach. The use of a single camera would have several

advantages, from both theoretical and practical points of view: (i) it does not need a synchronization procedure,

completely eliminating an important source of error of multi-view videogrammetry; (ii) it is simpler to set up, more

portable and cheaper; (iii) it requires lower processing time (only one video stream vs three).

Figure 3. Space resection (after Al Khalil and Grussenmeyer36).


2. SINGLE CAMERA APPROACH: SEQUENTIAL SPACE RESECTION

Sequential space resection is the mathematical procedure that constitutes the theoretical basis for the single camera approach

in videogrammetry.

Space resection is one of the standard orientation tasks of photogrammetry and computer vision and it was historically

the first orientation procedure to be applied, in use since the middle of the 19th century35. It is defined as a 2D-3D procedure, as it

combines information from 2D images with 3D object space. Given a set of corresponding points pi and Pi from an

image and 3D object, respectively (Figure 3), the process of image orientation is defined as the computation of its

exterior orientation parameters, i.e. the position vector X0, Y0, Z0 of its center of perspective and the rotation of the image

reference frame, represented by appropriate rotation angles ω, φ, κ or by the quaternion elements a, b, c, d.

In the literature, a number of different approaches have been published to solve the spatial resection problem based on

projective and perspective camera models and on a combination of the two35-37. Closed-form solutions are derived from

the projective camera model, but they are more prone to multiple or unstable solutions than the perspective camera

model. The camera model based on perspective collineation is expressed by the well-known nonlinear relationship:

pᵢ = λ · R · Pᵢ + t ,   i = 1, …      (1)

where

- pᵢ = [xᵢ − x₀, yᵢ − y₀, −c]ᵀ is the 3×1 vector of Cartesian coordinates of a point in image space, with (x₀, y₀) the image coordinates of the principal point and c the camera constant (i.e., the camera interior orientation parameters);
- λ is the scale factor;
- R(ω, φ, κ) or R(a, b, c, d) is the nonlinear 3×3 rotation matrix, which can be expressed in terms of rotation angles as well as of quaternions;
- Pᵢ = (Xᵢ, Yᵢ, Zᵢ)ᵀ is the 3×1 vector of Cartesian coordinates of a point in object space;
- t = (X₀, Y₀, Z₀)ᵀ is the 3×1 vector containing the object space coordinates of the center of perspective.

Assuming that the image coordinates are corrected for lens distortions and the camera interior parameters are known and

kept fixed, according to equation (1) the minimum solution for the space resection problem can be obtained with three non-collinear points known in object space. However, with the minimum number of three reference points, up to four real solutions can result35. To remove the ambiguity, either approximate values of the camera exterior orientation

parameters or a fourth reference point should be known. If one assumes that the reference points are fixed on a rigid body

whose 6DOF are to be determined along different epochs, for each epoch a space resection problem can be solved to

compute sequentially the position and orientation of the camera relative to the observed body. By computing the inverse

transformation, the pose of the body with respect to the camera can be derived epoch by epoch, thus obtaining the motion

trough the video stream. If also the camera undergoes unknown movements, a reference field can be used to define an

absolute datum with respect to which we can express the movements of the camera and the body, thus deriving the

relative orientation between both object systems26

.
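A single sequential resection step can be sketched as below, using OpenCV's solvePnP as an illustrative stand-in for the resection solver (the processing described later in Section 4.1 was actually carried out with Australis); the function name, the use of OpenCV and the array layouts are assumptions made for this example only.

```python
import cv2
import numpy as np

def body_pose_from_frame(object_pts, image_pts, K, dist):
    """One space-resection epoch: estimate the camera pose with respect to the
    rigid body from target correspondences, then invert it to express the body
    pose relative to the (actually fixed) camera."""
    # object_pts: (N, 3) target coordinates in the body reference system (N >= 4)
    # image_pts:  (N, 2) measured image coordinates of the same targets
    ok, rvec, tvec = cv2.solvePnP(object_pts.astype(np.float32),
                                  image_pts.astype(np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)           # convention: X_cam = R @ X_body + tvec
    # Exterior orientation of the camera in the body frame (space resection result):
    #   rotation R.T, perspective centre -R.T @ tvec.
    # Inverting that transformation (camera held still) gives the body pose
    # relative to the camera, i.e. directly R and tvec:
    return R, tvec

# Repeating this for every de-interlaced frame yields a 6DOF time series;
# a further, fixed similarity transformation then maps it to a convenient datum.
```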

Theoretical precision of space resection can be anticipated by numerical simulation26. Nevertheless, simulations are often

too optimistic, as instability of interior orientation parameters and other systematic effects cannot be easily accounted for.

The accuracy of space resection depends on many parameters and factors that affect the estimation of the object's pose: the distance between the camera and the observed object, the distribution of the object points (i.e., coverage of the measurement volume, non-planar configuration) and the number of reference points. From simulated tests26, it was found that increasing the number of reference points beyond nine does not further improve the solution; however, some commercial systems

suggest a minimum of twelve well-distributed reference points13. In favorable conditions, precisions better than 1:10,000 of the maximum dimension of the measuring volume for the three translations and better than 0.05° for the angular orientation can be achieved26 (for example, over a measuring volume with a maximum dimension of 3 m, 1:10,000 corresponds to 0.3 mm).
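A minimal sketch of such a numerical simulation is given below, again relying on OpenCV's PnP solver as an illustrative stand-in: the reference points are projected with a nominal ("true") pose, the image observations are perturbed with Gaussian noise, and the spread of the re-estimated poses approximates the theoretical precision. The noise level, geometry and number of trials are assumptions, not values from the cited studies.

```python
import cv2
import numpy as np

def resection_precision(object_pts, K, R_true, t_true,
                        sigma_px=0.1, n_trials=500, seed=0):
    """Monte Carlo estimate of space-resection precision: perturb the image
    observations with Gaussian noise (sigma_px, in pixels) and return the
    standard deviations of the recovered pose parameters."""
    rng = np.random.default_rng(seed)
    obj = object_pts.astype(np.float32)
    dist = np.zeros(5)                                   # distortion-free observations
    rvec_true, _ = cv2.Rodrigues(R_true.astype(np.float64))
    img_pts, _ = cv2.projectPoints(obj, rvec_true, t_true, K, dist)
    img_pts = img_pts.reshape(-1, 2)
    poses = []
    for _ in range(n_trials):
        noisy = img_pts + rng.normal(0.0, sigma_px, img_pts.shape)
        ok, rvec, tvec = cv2.solvePnP(obj, noisy.astype(np.float32),
                                      K, dist, flags=cv2.SOLVEPNP_EPNP)
        if ok:
            poses.append(np.concatenate([rvec.ravel(), tvec.ravel()]))
    return np.std(np.array(poses), axis=0)   # std of 3 rotation + 3 translation params
```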

3 MOTION CAPTURE TESTS IN A SHIP MODEL BASIN

3.1 Test environment and system set up

The MOCAP system introduced in Section 1.1 was used for tracking and measuring the 6DOF (named heave, surge,

sway, roll, pitch and yaw in naval terminology) of ship models in the hydrodynamic laboratory of the

Federico II University of Naples, Italy (Figure 4). A ship model basin (or towing tank) is a pool where tests on scaled

ship models are carried out to assess the performances of the ship in both calm and rough sea conditions. The

measurement of the movements experienced by the model under different sea conditions is required for estimating the

behavior of full-scale vessels in the real environment.

Figure 4. a) Sketch of the towing tank facility of the Federico II University of Naples. b) System set-up with two of the three

video cameras secured to the towing carriage and tested ship model (designed by Dr. C. Bertorello). c) Configuration of the

MOCAP system with the three video cameras (A, B, C) looking at the ship model.

In the experiments originally designed for the multi-view camera system11, a 2 m long ship model (designed by Dr. C. Bertorello, Federico II University of Naples) was secured to the tank sides (Figure 4b) and several sea states were

generated. This specific testing procedure, called zero-speed tests in beam waves, involved the model at zero forward

speed, constrained with its longitudinal axis along the transversal section of the tank in order to have the waves striking its

side. The tests, conducted within the research project “Safety and Comfort Onboard Fast Ships” headed by Dr. E.

Begovic38, aimed to measure the heave, roll and pitch motions of the model. The three video cameras (Figure 4c) were

fixed with special photographic clamps to the main carriage of the towing tank, with one camera, namely the master A, a

bit farther (3 m distance) from the ship model than the slave cameras B and C (2.8 m distance). The distinction between

master and slave cameras means that camera A was selected as reference for the synchronization procedure based on

cross-correlation of the recorded audio signals11. Because of the de-interlacing procedure applied to the acquired frames,

the average ground sampling distance (GSD) was 1.5 mm only along the width direction of the image format. In the other direction, the actual spatial resolution in object space was about 3 mm. Figure 5 shows three simultaneous frames extracted from the video streams of the multi-view camera system.
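As a rough plausibility check of the quoted GSD (assuming the lenses were used near the 3.2 mm wide end of the zoom range, which is not stated explicitly in the text), the ground sampling distance follows from the pixel size in Table 1, the camera-to-object distance and the focal length:

```python
pixel_size = 1.5e-6   # m (Table 1)
distance   = 3.0      # m (approximate camera-to-model distance)
focal      = 3.2e-3   # m, assumed wide end of the 3.2-32 mm zoom
gsd = pixel_size * distance / focal
print(f"GSD ≈ {gsd * 1e3:.1f} mm")   # ~1.4 mm, consistent with the 1.5 mm quoted above
```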

3.2 Motion capture system requirements

The end-users of the project specified the required accuracy in the measurement of the 6DOF of the model: (i) 0.75 mm

for linear movements (heave motion, in particular), i.e. 5% of the minimum wave amplitude (equal to 17 mm) generated

for the seakeeping tests and (ii) sub-degree for angular movements (especially roll and pitch). The highest generated

wave frequency was 1 Hz, which corresponded to the fastest movements of the targets fixed to the ship model. It was expected that the critical dynamics would not exceed a speed of 0.5 m/s.

Figure 5. Scaled ship model recorded by the three camcorders during one of the executed tests.

Figure 6. a) Scaled ship model with both passive and active targets in static condition during the preliminary

photogrammetric survey. b) Photogrammetric network and 3D reconstruction of the surveyed ship model.

3.3 Photogrammetric measurement of the targets on the rigid body

Before the execution of the seakeeping tests in the tank, both passive and active (retro-illuminated by LEDs) targets were

located on the scaled ship model (Figure 6a). The passive targets were used to identify the design coordinate reference

system on the model, while the five active targets provided the model motions during the tests. The positions of the

targets with respect to model reference system were measured with a standard close-range photogrammetric survey using

a DSLR camera and a robust image network (Figure 6b). The σXYZ theoretical precision of 3D coordinates from the

bundle adjustment was better than 0.1 mm.

4 DATA PROCESSING AND COMPARISON

4.1 Processing of single camera video streams through space resection

The video cameras were calibrated using a self-calibrating bundle adjustment and a 3D calibration testfield located

at approximately the same distance as in the subsequent tests run in the towing tank.

Knowing the interior parameters of the three video cameras, the previously measured 3D coordinates of the targets on the

ship model (Figure 6a - Section 3.3) were used as reference to compute the sequential resections through the epochs.

Here the assumption that the body is rigid is fundamental since it means that the relative positions of reference points

(active targets) do not change over time. At each epoch, the 2D image coordinates of the five active targets were given as input to the spatial resection, which was computed using Australis (Photometrix39).

The exterior orientation parameters (6DOF) of each video camera through the epochs represent its relative motion with

respect to the ship model, considered still (Section 2). Obviously, the actual motion was that of the ship model,

whereas the video cameras were fixed to the towing tank carriage. By inverting each spatial similarity transformation

built with the 6DOF obtained by spatial resection, each video camera is considered still in its reference frame and the

motion is applied to 3D coordinates of the scaled ship model, known from the photogrammetric survey. As a result of the

sequential spatial resections, for each epoch, the 6DOF of the ship model with respect to each video camera were known.

An additional similarity transformation was needed to derive the motion in a more convenient reference frame. Generally

this task is accomplished by an initialization procedure at the beginning of each seakeeping trial: in the first epoch, the

model is in a quasi-static condition with its main axis properly aligned orthogonally to the main direction of the incoming

waves. That position is considered as reference, with all 6DOF equal to zero. Due to obstructions and water reflections,

target tracking was subject to gross errors and, consequently, to wrong 6DOF estimations. Because of the low redundancy (five points versus the minimum of three with known approximations), the RMS of the 2D image residuals of each resection was used as a first filtering criterion to reject the entire epoch. The average RMS of the 2D image coordinates was below 1.5 microns, while a threshold of 15 microns was used as the rejection criterion. Furthermore, the continuity of the trajectory was assumed and additional filtering and interpolation were applied. Finally, as the position of the Centre of Gravity (CoG) of the ship

model was known in the design coordinate reference system, its coordinates were propagated through the successive

epochs.
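A minimal sketch of this filtering step is given below, assuming the per-epoch residual RMS and the 6DOF series are already available as NumPy arrays; the 15-micron threshold is the one quoted above, while the linear interpolation over rejected epochs is an illustrative choice for enforcing trajectory continuity, not necessarily the exact filter used.

```python
import numpy as np

def filter_and_interpolate(t, dof6, rms, rms_threshold=15e-6):
    """Reject epochs whose image-residual RMS exceeds the threshold and
    re-fill the 6DOF values by linear interpolation over time."""
    # t: (N,) epoch times [s]; dof6: (N, 6) pose parameters; rms: (N,) residual RMS [m]
    good = rms <= rms_threshold
    filled = dof6.copy()
    for k in range(6):
        filled[~good, k] = np.interp(t[~good], t[good], dof6[good, k])
    return filled, good
```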

4.2 Comparison between multi-view and single camera approaches

To quantitatively evaluate the performances of the single camera space resections, the CoG displacement (Figure 7) and

angular movements of the ship model obtained from the three single video cameras are compared with the same motion

derived from the multi-view camera configuration. The results are shown afterwards for the most demanding test,

characterized by the highest wave frequency (1 Hz) and the smallest wave amplitude (17 mm), which corresponds to the

highest speed (0.35 m/s) and the most severe requirement in terms of accuracy in linear motion measurement.

Figure 7a shows the trajectories of the CoG derived from the multi-view approach as well as from the three single video cameras. As can be more easily seen in the close-up reported in Figure 7b, there is a fair agreement among the different patterns, but the distances increase with the intensification of the motion dynamics. This outcome is confirmed by the differences computed between the reference trajectory from the multi-view method and those from the single cameras. Figure 8 shows the displacement of the CoG along the vertical axis measured with the multi-view method. Figures 9, 10 and 11 represent the differences in the motion estimation derived from the three single cameras, where the two straight lines indicate the required accuracy of 0.75 mm. The farthest video camera (A) shows the highest deviations, while one of the two closest views displays an RMS of 0.7 mm, which is within the accuracy requirement. The comparisons for the roll angle are reported too. The roll

angle variation through the test, determined with the multi-view approach, is shown in Figure 12. The differences with respect to the single cameras are displayed in Figures 13, 14 and 15, where the required measurement accuracy of 0.5° is also indicated as straight lines. Also in this case, camera C, one of the closest to the ship model, provides the lowest RMS, but the

differences between the single cameras are not very significant. It should be highlighted that in the initial part of the test,

when the motion dynamics is slow, all the three video cameras provide sub-millimeter and sub-degree accuracy, while

the differences increase when the dynamics becomes faster.
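The differences and RMS values reported in Figures 9-11 and 13-15 could be computed along the lines of the sketch below, assuming the multi-view series is taken as the reference and each single camera series is resampled onto the reference epochs; variable names are illustrative.

```python
import numpy as np

def rms_difference(t_ref, ref, t_cam, cam):
    """RMS of the difference between a single camera estimate and the
    multi-view reference, after resampling onto the reference epochs."""
    cam_on_ref = np.interp(t_ref, t_cam, cam)
    diff = cam_on_ref - ref
    return np.sqrt(np.mean(diff ** 2)), diff

# e.g. rms_heave_B, diff_heave_B = rms_difference(t_mv, heave_mv, t_B, heave_B)
```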

Figure 7. a) Center of Gravity (CoG) trajectories derived from the multi-view approach and the three single video cameras. b) Close-up of the trajectories when the motion dynamics increases.

Figure 8. Displacement of the CoG along the vertical axis measured with the multi-view method.

Figure 9. Difference between multi-view and the most distant single camera (A) in measuring the CoG displacement in the vertical direction.

Figure 10. Difference between multi-view and close camera B in measuring the CoG displacement in the vertical direction.

Figure 11. Difference between multi-view and the close camera C in measuring the CoG displacement in the vertical direction.

(RMS values of the differences in Figures 9, 10 and 11: 1.8 mm, 1.5 mm and 0.7 mm, respectively.)

Figure 12. Roll angle measured with the multi-view method.

Figure 13. Difference between multi-view and the most distant single camera in measuring roll angle.

Figure 14. Difference between multi-view and the close camera B in measuring roll angle.

(RMS values of the differences in Figures 13, 14 and 15: 0.71°, 0.66° and 0.61°, respectively.)

Figure 15. Difference between multi-view and the close camera C in measuring roll angle.

5 FINAL COMMENTS AND CONCLUSIONS

The paper reported some results of 6DOF motion analysis based on single camera sequential spatial resections. A ship model in a naval basin was used as the test object. The motion of the ship model was recorded simultaneously by three identical

consumer-grade full HD 50 Hz video cameras placed in three different positions at an average distance of about 3 m. The

system of three video cameras was developed by the authors and was already presented as a low-cost, off-line videogrammetric

system in a previous work. The system provided sub-millimeter accuracy results and was used in this work as reference

for evaluating the accuracy of 6DOF estimation by single camera resections. The comparisons between the 6DOF

estimated by multi-view and single camera methods show that the second approach is undoubtedly less precise, in

particular for fast motion dynamics. In summary, the main outcomes that can be highlighted are:

(i) When the motion dynamics is slow (<0.1m/s), all the three video cameras (and a sequential spatial resection

method) provide sub-millimeter and sub-degree accuracy.

(ii) One of the closest cameras shows the best results in the measurement of linear motions, within the required

accuracy, even when the motion dynamics increases.

(iii) The other closest camera displays a negative trend in the measurement of linear motions that can probably be explained as a residual synchronization error not properly compensated. Indeed, that camera

received the biggest correction (interpolated image trajectories) after the synchronization procedure.

(iv) The angle estimation differences of the three cameras are not significantly different, despite their

different distances from the ship model.

(v) All the differences in the linear and angular motion estimation provided by the three video cameras show a

sinusoidal behavior, with a frequency comparable to the one of the ship model dynamics.

The issues described in (iv) and (v) are still under investigation.

Besides the above comments, the results are promising if the low-cost characteristics of the video cameras used are taken

into account: the cameras allowed neither setting the exposure time (a crucial factor to avoid motion blur effects on the targets) nor fixing the focus distance, which can produce an unpredictable variation of the camera calibration parameters during the acquisition. These effects are probably the reason for the different results between the two closest cameras B and C, placed at the same distance from the object but featuring different brightness due to different exposure values (see

Figure 5b and Figure 5c). It is worth noting that an exposure time of 1/100 s at the maximum linear speed of 0.350 m/s may cause a motion blur of 3.5 mm in object space. Last but not least, the employed video cameras delivered interlaced frames, resulting in low resolution images compared to current video cameras. Nevertheless, the sequential spatial resection procedure based on single camera acquisition is a promising low-cost, off-line videogrammetric solution for estimating the 6DOF of a rigid body.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Ermina Begovic and Dr. Carlo Bertorello for the ship models, and Dr. Andrea Bove, head of the Hydrodynamic Laboratory of the Federico II University of Naples (Italy), for the opportunity to realize the study

presented in this article. A special thanks goes also to Manfrotto, a Vitec Group company, for providing the photographic

materials (clamps, tripods, etc.) used in the experiments and Prof. Salvatore Troisi (Parthenope University, Italy) for the

useful discussions and support.

REFERENCES

[1] Hampapur, A., Brown, L., Connell, J., Ekin, A., Haas, N., Lu, M., Merkl, H., Pankanti, S., Senior, A., Shu, C. F.,

and Tian, Y. L., “Smart video surveillance - exploring the concept of multiscale spatiotemporal tracking,” IEEE

Signal Processing Magazine, 38-51 (2005).

[2] Maas, H.-G., Putze, T. and Westfeld, P., “Recent developments in 3d-ptv and tomo-piv”, In W. Nitsche and C.

Dobriloff, editors, Imaging Measurement Methods for Flow Analysis, Volume 106/2009 of Notes on Numerical

Fluid Mechanics and Multidisciplinary Design, 53-62 (2009). Springer Berlin / Heidelberg.

[3] xsens, https://www.xsens.com/

[4] Roetenberg, D., "Inertial and Magnetic Sensing of Human Motion," Ph.D. Thesis, University of Twente, Enschede,

The Netherland, (2006).

[5] Brodie, M. A. D., "Development of Fusion Motion Capture for Optimisation of Performance in Alpine Ski Racing,"

Ph.D. Thesis, Massey University, Wellington, New Zealand, (2009).

[6] Leica Geosystems, http://metrology.leica-geosystems.com/en/Laser-Tracker-Systems_69045.htm Accessed April

2015.

[7] Automated Precision Inc. (API), http://www.apisensor.com/ Accessed April 2015.

[8] Gruen, A., “Fundamentals of videogrammetry - a review,” In Human Movement Science, Vol. 16, Nos. 2-3, pp.

155-187 (1997) http://www.idb.arch.ethz.ch/files/videogrammetry.pdf.

[9] Luhmann, T., “Precision potential of photogrammetric 6DOF pose estimation with a single camera,” ISPRS Journal

of Photogrammetry and Remote Sensing 64(3), 275-284 (2009).

[10] Rupnik, E., and Jansa, J., “Off-the-shelf videogrammetry – a success story”, ISPRS Archives of the

Photogrammetry, Remote Sensing and Spatial Information Sciences, XL-3/W1, pp. 99-105 (2014).

[11] Nocerino, E., Menna, F., Troisi, S., “High accuracy low-cost videogrammetric system: an application to 6DOF

estimation of ship models”, Proc. of Videometrics, Range Imaging and Applications XII, SPIE Optical Metrology,

Vol. 8791, doi: 10.1117/12.2020922 (2013)

[12] Nocerino, E., Ackermann, S., Del Pizzo, S., Menna, F. and Troisi, S., “Low-cost human motion capture system

for postural analysis onboard ships,” Videometrics, Range Imaging, and Applications XI, Proc. SPIE,

Remondino/Shortis (Eds), Vol. 8085, 80850L, pp. 1-15 (2011).

[13] Geodetic Systems, http://www.geodetic.com/ Accessed April 2015.

[14] zFlo, http://www.zflomotion.com/blog/bid/341006/zFlo-Is-Now-Providing-Reflective-Markers Accessed April

2015.

[15] Ar-tracking, http://www.ar-tracking.com/technology/markers/ Accessed April 2015.

[16] Pourcelot, P., Audigié, F., Degueurce, C., Geiger, D., and Denoix, J. M., “A method to synchronise cameras using

the direct linear transformation technique,” Journal of Biomechanics 33(12), pp. 1751-1754 (2000).

[17] Whitehead, A., Laganiere, R., and Bose, P., “Temporal synchronization of video sequences in theory and in practice,

” In Application of Computer Vision, WACV/MOTIONS'05 Volume 1, Seventh IEEE Workshops on, vol. 2, pp.

132-137, IEEE (2005).

[18] Raguse, K., and Heipke, C., “Synchronization of image sequences: a photogrammetric method,” Photogrammetric

Engineering & Remote Sensing 75(4), pp. 535-546 (2009).

[19] de Barros, R. M. L., Russomanno, T. G., Brenzikofer, R and, Figueroa, P. J., “A method to synchronise video

cameras using the audio band,” Journal of Biomechanics 39(4), pp. 776–780 (2006).

[20] Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., and Seidel, H. P., “Markerless motion capture with

unsynchronized moving cameras,” In Computer Vision and Pattern Recognition, CVPR 2009 IEEE Conference, pp.

224-231 (2009).

[21] Willneff, J., [A spatio-temporal matching algorithm for 3D particle tracking velocimetry], Diss., Technische

Wissenschaften ETH Zurich, Nr. 15276, 2003, IGP Mitteilung N. 82 (2003)

http://www.photogrammetry.ethz.ch/research/diss/DissETH15276.pdf

[22] Luhmann, T. and Raguse. K., “Synchronous 3-D high-speed camera with stereo-beam splitting,” SENSOR 2005,

12th International Conference, AMA Service, pp. 443-448 (2005).

[23] Godding, R., Luhmann, T., and Wendt, A., “4D Surface matching for high-speed stereo sequences, ” Int. Archives

of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 36(5), ISPRS Commission V

Symposium, Dresden (Germany) (2006).

[24] Putze, T., Raguse, K., and Maas, H. G., “Configuration of multi mirror systems for single high-speed camera based

3D motion analysis, ” In Electronic Imaging, International Society for Optics and Photonics, pp. 64910L-64910L

(2007).

[25] Chong, A. K., “An inexpensive stereo-image capture tool for motion study,” The Photogrammetric Record 22(119),

pp. 226–237 (2007).

[26] Luhmann, T., “Precision potential of photogrammetric 6DOF pose estimation with a single camera,” ISPRS Journal

of Photogrammetry and Remote Sensing 64(3), 275-284 (2009).

[27] Armstrong, B., Verron, T., Heppe, L., Reynolds, J., and Schmidt, K., “RGR-3D: simple, cheap detection of 6-DOF

pose for teleoperation and robot programming and calibration,” Robotics and Automation Proceedings, ICRA'02,

IEEE International Conference Vol. 3, pp. 2938-2943 (2002).

[28] Bethmann, F. and Luhmann, T., “Monte-Carlo simulation for accuracy assessment of a single camera navigation

system, ” Int. Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol.39(B5), pp. 9-14

(2012).

[29] AICON, http://aicon3d.com Accessed April 2015.

[30] AXIOS 3D®, http://axios3d.com Accessed April 2015.

[31] Cyclope, http://sed.bordeaux.inria.fr/people/hervemathieu/6doftracker/ Accessed April 2015.

[32] Qualisys, http://www.qualisys.com/products/hardware/oqus/ Accessed April 2015.

[33] Vicon, http://www.vicon.com/ Accessed April 2015.

[34] OptiTrack, http://www.optitrack.com/ Accessed April 2015.

[35] Wrobel, B., P., “Minimum solutions for orientation”, in Gruen, A., & Huang, T. S. (Eds.). Calibration and

orientation of cameras in computer vision (Vol. 34). Springer Science & Business Media (2001).

[36] Al Khalil, O. and Grussenmeyer, P., “Solutions for exterior orientation in photogrammetry, a review,” The Photogrammetric Record, pp. 615-634 (2002).

[37] Luhmann, T., Robson, S., Kyle, S., and Harley, I., [Close range photogrammetry: Principles, methods and

applications], Whittles (2011).

[38] Begovic, E., Bertorello, C. and Orsic, J. P., “Roll damping coefficients assessment and comparison for round bilge

and hard chine hullforms,” Proceedings of the ASME 2013 32nd International Conference on Ocean, Offshore and

Arctic Engineering (2013).

[39] Photometrix http://www.photometrix.com.au/ Accessed April 2015.