TRANSCRIPT
Source: perso.limsi.fr/vezien/ra_orleans/intro_ra_vezien.pdf
“Imagerie Opérationnelle”
Polytech’Orleans 2012
Augmented Reality: an introduction
Jean-Marc Vezien
Plan of the lecture
1. Augmented reality: what and why (with examples!)
2. User tracking
3. Real world 3D
4. Graphics
5. Augmentation
Wikipedia says…
Augmented reality (AR) is a term for a live direct or indirect view of a physical, real-world environment whose elements are augmented by computer-generated sound, video, graphics, haptics or GPS data.
Augmentation is conventionally in real time and in semantic context with environmental elements, such as sports scores on TV during a match.
With the help of advanced AR technology (computer vision and object recognition), the information about the surrounding world becomes interactive: artificial information about the environment can be overlaid on it.
The term "augmented reality" was coined by Thomas Caudell, working at Boeing, in 1990.
(Image: ARToolkit, Kato & Billinghurst, 2001)
Real/virtual Continuum (Milgram 1994)
AR: examples
Not a new idea…
• User position and gaze provide context, sometimes with help (markers)
• Audio is nice (non-obtrusive), can be stopped anytime
• No computer involved
• No sensing involved: limited interaction
Now associated with head-tracking…
Started in 1957 (Roosevelt home)
(Image: Principio system, 2007)
AR: examples
See-through augmented vision: "classic AR"
Tourism: Archeoguide (2002)
AR: examples
Assembly / Maintenance / Repair
Maintenance: help technician with
contextualized content
BMW, 2010
Matris project, 2007
(Figure callouts: fiducial, text, action graphics, icons)
AR: examples
Way-finding
iPhone app: NY Nearest Subway (many cities and airports have their own apps)
Augmented Car Finder
Note:
• Data is collected off-line
• Database access is native on mobile phones (HTTP protocol)
War "games"
Track target w.r.t. weapon
Coupled with (off-line) Geographical Information
system
… and strategic realtime info (GPS, detectors)
gun-mounted display
MR: examples
Games and promotional contents
EyePet for PSP (2009)
Topps 3D baseball cards (2009)
Note: often Webcam + computer (not wearable)
PSVita (2012)
Other examples
Medical: Surgery planning, nurse and student training…
University of North Carolina
Augmented virtuality (presented on screen)
Track hand-held device or body
Coupled with (off-line) anatomic data
• Military: "future soldier", BARS
• Medical: surgical assistance
• Tourism
• Customization (fashion)
Virtual try-ons and customization
Augmented virtuality or AR
Track the user's anatomy and motion (better still)
Coupled with external data on-line
(Images: Webcam Social Shopper by Zugara; Augment for iPad, 2012)
MR: the SACARI example
Mixed virtuality: the user does not move with the camera; indirect view of the real world
Tele-immersion: provide a sense of presence in a remote environment
Internship available! PhD position!
Blurred boundaries: augmented movies
Is this real? Augmented? Virtual?
On-line? Off-line?
Blurred boundaries
Is this still AR?
AR will be everywhere the moment efficient, cheap see-through displays become available.
Google motto: The World's Information in Context.
Street View is close to Augmented Virtuality.
Sergey Brin (2012)
The Ingredients for AR
Capture real world
+
Capture virtual world (Computer Graphics)
+
Present to user (Augmentation)
Real 3D world
Two ingredients necessary for successful AR:
• Real-time 3D tracking of the user viewpoint w.r.t. the world
• 3D scene analysis, for realistic augmentation
(Images: WorldViz; Shadow Zone)
Real-Time tracking
Many means to compute positioning information:
• Geolocalization
• Electro-magnetic
• Acoustic
• Inertial
• Vision-based (active or passive)
GPS triangulation
Global Positioning System = array of synchronized satellites: triangulate position from radio signal travel times (strictly, trilateration).
The receiver has poor synchronization: this affects precision.
Differential GPS provides extra accuracy.
Typical: 50 cm accuracy at best.
Provides position only: couple with a gyroscope or compass for orientation (see below).
Electro-magnetic tracking
Three mutually orthogonal coils; each transmitter coil is activated serially, and the current induced in the receiver coils is measured.
The induced current varies with:
• the distance from the transmitter (cubically), and
• the orientation relative to the transmitter (cosine of the angle between the coil axis and the local magnetic field direction).
Three measurements apiece (three receiver coils) give a nine-element measurement for the 6D pose.
Sensitive to magnetic interference, and wired!
(Source: SIGGRAPH 2001 Course 11, slides by Allen, Bishop, Welch)
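For intuition, a minimal coupling sketch (assuming an ideal magnetic-dipole source, which is what the cubic falloff above describes): the signal induced in one receiver coil behaves as

$$ s \;\propto\; \frac{1}{r^{3}}\,\cos\theta, $$

where $r$ is the transmitter-receiver distance and $\theta$ the angle between the receiver-coil axis and the local field direction; the 3 × 3 transmitter/receiver coil combinations yield the nine measurements used to solve for the 6D pose.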
Acoustic tracking: triangulation of sound sources (strictly, trilateration from time-of-flight ranges; a least-squares sketch follows below).
• The intersection of two spheres is a circle.
• The intersection of three spheres is two points.
• One of the two points can easily be eliminated.
Ultrasonic: 40 kHz typical.
Good precision, but still wired.
From [1]
Intersense IS-600 Mark 2
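The sphere-intersection argument translates directly into code. A minimal NumPy sketch of trilateration, assuming four or more non-coplanar beacons with known positions and measured ranges (with only three beacons the two-point ambiguity of the slide remains):

```python
import numpy as np

def trilaterate(beacons, dists):
    """Least-squares position from beacon ranges.

    beacons: (N, 3) known emitter positions, N >= 4 and non-coplanar.
    dists:   (N,) measured ranges to the unknown point.
    """
    beacons = np.asarray(beacons, float)
    dists = np.asarray(dists, float)
    p0, d0 = beacons[0], dists[0]
    # Subtracting the first sphere equation from the others linearizes
    # |x - p_i|^2 = d_i^2 into A x = b.
    A = 2.0 * (beacons[1:] - p0)
    b = (d0**2 - dists[1:]**2
         + np.sum(beacons[1:]**2, axis=1) - p0 @ p0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```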
Image-based 3D tracking
• A special case of adaptive rendering (for HMDs)
• Environment control is a plus
• Markers are still needed (often)
• A special case of structure/motion estimation, known as motion estimation / match moving
• AR meets the movie industry: the director is the user!
Inside-out vs. Outside-in (figures from [3])
• Inside-out: the user wears the camera. Outside-in: a camera observes the user.
• Inside-out is better at estimating relative rotation.
• Outside-in is much more convenient: no camera to wear. Special case: console gaming.
• Inside-out is easier for wearable AR: mobility, ego-centered reference frame. Needed for future nomadic apps.
• Cameras are becoming small, commodity items.
The Maths
Projection equation: m = P · M (motion + structure), relating an image point m = (u, v) to a 3D world point M = (X, Y, Z):

$$ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} \alpha_u & 0 & c_u \\ 0 & \alpha_v & c_v \\ 0 & 0 & 1 \end{pmatrix}}_{\text{sensor image (pixels)}} \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{3\text{D} \rightarrow 2\text{D projection}} \underbrace{\begin{pmatrix} \mathbf{i} & T_x \\ \mathbf{j} & T_y \\ \mathbf{k} & T_z \\ \mathbf{0} & 1 \end{pmatrix}}_{\text{world} \rightarrow \text{camera (camera position)}} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} $$

$(\mathbf{i}, \mathbf{j}, \mathbf{k}, T_x, T_y, T_z)$ can be computed if m and M are known.
Not linear in the motion parameters!
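As a numerical companion, a minimal sketch of m = P · M in NumPy; all intrinsic and extrinsic values below are made-up placeholders, not taken from the slides:

```python
import numpy as np

# Intrinsics (hypothetical values): focal lengths in pixels, principal point.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: rows of R are the camera axes (i, j, k); T = (Tx, Ty, Tz).
R = np.eye(3)
T = np.array([0.0, 0.0, 2.0])

P = K @ np.hstack([R, T[:, None]])   # 3x4 projection matrix P = K [R | T]

M = np.array([0.1, 0.2, 0.5, 1.0])   # homogeneous 3D world point
m = P @ M
u, v = m[:2] / m[2]                  # the division by s is what makes the
                                     # problem non-linear in the motion parameters
print(u, v)
```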
Dementhon-Davis (1992)
Several methods to recover calibration exist (Tsai, Lowe…). "Model-Based Object Pose in 25 Lines of Code" is a simple yet elegant method to obtain the pose rapidly if a rigid 3D model is provided.

Write each model point $P_i$ relative to a reference point whose translation $T = (T_x, T_y, T_z)$ projects to $(x_0, y_0)$. With focal length $f$, perspective projection gives

$$ x_i = \frac{f\,(P_i \cdot \mathbf{i} + T_x)}{P_i \cdot \mathbf{k} + T_z}, \qquad y_i = \frac{f\,(P_i \cdot \mathbf{j} + T_y)}{P_i \cdot \mathbf{k} + T_z}. $$

The relative depth around $Z_0 = T_z$ is captured by $\varepsilon_i = \dfrac{P_i \cdot \mathbf{k}}{T_z}$, so that $Z_i = T_z(1 + \varepsilon_i)$.
The idea consists in linearizing $x_i$ and $y_i$ to compute $\mathbf{I}$, $\mathbf{J}$ and $\varepsilon_i$.
Geometrical interpretation: linear approximation of perspective.
[Figure: camera center O and image plane; a model point P and the reference plane $Z_0 = T_z$ through $P_0$; P projects perspectively through O, while its orthographic projection onto the reference plane gives the scaled-orthographic approximation.]

$$ x_i^{o} = x_0 + P_i \cdot \mathbf{I}, \qquad y_i^{o} = y_0 + P_i \cdot \mathbf{J}, \qquad x_i = \frac{x_i^{o}}{1 + \varepsilon_i}, \quad y_i = \frac{y_i^{o}}{1 + \varepsilon_i}. $$
POSIT algorithm
To summarize:

$$ x_i(1 + \varepsilon_i) - x_0 = P_i \cdot \mathbf{I} \quad \text{with } \mathbf{I} = \frac{f}{T_z}\,\mathbf{i}, \qquad y_i(1 + \varepsilon_i) - y_0 = P_i \cdot \mathbf{J} \quad \text{with } \mathbf{J} = \frac{f}{T_z}\,\mathbf{j}, \qquad \varepsilon_i = \frac{P_i \cdot \mathbf{k}}{T_z}. $$

• If $\varepsilon_i$ is known, then $\mathbf{I}, \mathbf{J}, T_z, X_0 = x_0 T_z, Y_0 = y_0 T_z$ can be computed.
• If $\mathbf{I}, \mathbf{J}, T_z$ are known, then $\mathbf{k}$ is known, hence $\varepsilon_i$.
• Iterate, starting with $\varepsilon_i = 0$. Converges rapidly.
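A compact NumPy sketch of the iteration above. It assumes at least four non-coplanar model points expressed relative to the reference point, a known focal length in pixels, and noise-free correspondences; it illustrates the idea rather than reproducing DeMenthon's exact 25-line implementation:

```python
import numpy as np

def posit(model_pts, image_pts, f, n_iter=10):
    """POSIT-style pose iteration (sketch).

    model_pts: (N, 3) object points relative to the reference point P0,
               so model_pts[0] == [0, 0, 0]; N >= 4, non-coplanar.
    image_pts: (N, 2) pixel coordinates; image_pts[0] is (x0, y0), the image of P0.
    f:         focal length in pixels.
    """
    P = np.asarray(model_pts, float)
    image_pts = np.asarray(image_pts, float)
    x, y = image_pts[:, 0], image_pts[:, 1]
    x0, y0 = image_pts[0]
    eps = np.zeros(len(P))                 # start from the SOP assumption eps_i = 0
    P_pinv = np.linalg.pinv(P)
    for _ in range(n_iter):
        I = P_pinv @ (x * (1 + eps) - x0)  # solve the linearized equations for I, J
        J = P_pinv @ (y * (1 + eps) - y0)
        s = 0.5 * (np.linalg.norm(I) + np.linalg.norm(J))   # s = f / Tz
        i, j = I / np.linalg.norm(I), J / np.linalg.norm(J)
        k = np.cross(i, j)
        Tz = f / s
        eps = (P @ k) / Tz                 # refined relative depths
    R = np.vstack([i, j, k])
    T = np.array([x0 * Tz / f, y0 * Tz / f, Tz])
    return R, T
```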
3D tracking: off-the-shelf solution
• Kato & Billinghurst (HIT Lab, University of Washington) introduced in 1999 a tool for teleconferencing: "Marker Tracking and HMD Calibration for a video-based Augmented Reality Conferencing System."
• It soon became ARToolkit.
ARToolkit: main characteristics
• Fast and cheap 6D marker tracking.
• Distributed with complete source code: open source under the GPL license for noncommercial usage.
• Multiplatform (Linux, MacOS and Windows).
• Multiple input sources (USB, Firewire) and multiple formats (RGB, YUV) supported.
• Multiple-camera tracking supported.
• GUI initialization interface.
• Easy calibration routine.
• Fast rendering based on OpenGL.
• 3D VRML support.
• Simple and modular API (in C and C++).
• Complete set of samples and utilities.
• Supports both video and optical see-through AR.
ARToolkit: main processing loop
Since v4, pose estimation is performed using the Iterative Closest Point (ICP) algorithm (Besl, 1992). A marker-tracking loop in the same spirit is sketched below.
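ARToolkit's own API is C; as a rough modern stand-in, the same detect-then-pose loop can be sketched with OpenCV's ArUco module (the pre-4.7 contrib API). The camera intrinsics and marker size below are hypothetical placeholders, not values from the slides:

```python
import cv2
import numpy as np

camera_matrix = np.array([[800.0, 0, 320],      # hypothetical calibration
                          [0, 800.0, 240],
                          [0, 0, 1.0]])
dist_coeffs = np.zeros(5)
marker_len = 0.05                               # marker side in metres (placeholder)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)
    if ids is not None:
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, marker_len, camera_matrix, dist_coeffs)
        # rvecs/tvecs give each marker's pose in the camera frame:
        # this is where the CG overlay would be rendered.
    if cv2.waitKey(1) == 27:                    # Esc to quit
        break
```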
Important details
Virtual objects appear only when complete markers are visible. Detection depends on:
• Size
• Movement
• Orientation
• Lighting conditions

| Pattern size (cm) | Usable range (cm) |
|---|---|
| 6.98 | 40.64 |
| 8.89 | 63.5 |
| 10.79 | 86.36 |

In practice: close range only!
3D real world analysis
Two components: light and geometry.
(Figure labels: lighting coherency, motion coherency, geometric coherency, light probe)
Light Probe
Aim: capture the light coming from light sources in the real world, from all directions at a given point.
10,000:1 is the (static) human eye's ratio between the brightest and darkest shades: it cannot be represented on 8 bits.
Stored in High Dynamic Range images (the .hdr image format).
Light Probe (2)
Angular map format: a direction vector in the world $(D_x, D_y, D_z)$ maps to image coordinates $u = r\,D_x$ and $v = r\,D_y$, with

$$ r = \frac{1}{\pi}\,\frac{\cos^{-1}(D_z)}{\sqrt{D_x^2 + D_y^2}}. $$

Conversely, for $(u, v) \in [-1, 1]^2$, the unit vector pointing in the direction $(u, v)$ is obtained by rotating $(0, 0, -1)$ by:
1) $\pi \sqrt{u^2 + v^2}$ around the y (up) axis,
2) $\tan^{-1}(v/u)$ around the -z (forward) axis.

Note: advanced graphics engines can use HDR images (Half-Life 2, normal vs. HDR rendering; Unreal Engine).
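A minimal sketch of the forward mapping above (world direction to angular-map coordinates), assuming a unit-length direction vector:

```python
import numpy as np

def direction_to_angular_map(Dx, Dy, Dz):
    """Map a unit direction to light-probe angular-map coordinates (u, v)."""
    denom = np.hypot(Dx, Dy)
    if denom == 0.0:
        return 0.0, 0.0        # straight along the optical axis maps to the center
    r = np.arccos(Dz) / (np.pi * denom)
    return r * Dx, r * Dy
```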
World content
Understand the 3D environment in terms of:
• Geometry: 3D reconstruction (what is where?)
• Semantics: augmentation of context (object recognition, interpretation of content)
(Image: http://www.truevisionsys.com)
Geometry: 3D reconstruction
Markers cannot always be present!
If the structure is known: a priori model tracking (see before); an object is a (collection of) markers.
Remember: projection combines structure and motion (m = P.M).
Idea: recover both simultaneously!
+ gives tracking and reconstruction at the same time
- highly non-linear: requires iterations + data filtering (removing outliers)
The factorization method was introduced by Tomasi and Kanade (1992).
Factorization method
Each point i observed in frame j obeys $m_{ij} = P_j \cdot M_i$ (under orthography, a 2x1 vector = a 2x3 matrix times a 3x1 vector).
Consider P points projecting in F frames. Compact representation, stacking all the images:

$$ \underbrace{\begin{pmatrix} x_{11} & \cdots & x_{1P} \\ y_{11} & \cdots & y_{1P} \\ \vdots & & \vdots \\ x_{F1} & \cdots & x_{FP} \\ y_{F1} & \cdots & y_{FP} \end{pmatrix}}_{m\;(2F \times P)} = \underbrace{\begin{pmatrix} P_1 \\ \vdots \\ P_F \end{pmatrix}}_{\text{motion}\;(2F \times 3)} \underbrace{\begin{pmatrix} X_1 & \cdots & X_P \\ Y_1 & \cdots & Y_P \\ Z_1 & \cdots & Z_P \end{pmatrix}}_{\text{structure}\;(3 \times P)} $$

m is big, but of rank 3: m = P.M.
There is an infinity of possible (P, M) pairs! Solved via SVD decomposition (but not uniquely; see the sketch below).

Carlo Tomasi and Takeo Kanade (November 1992), "Shape and motion from image streams under orthography: a factorization method," International Journal of Computer Vision, 9(2): 137-154.
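A minimal NumPy sketch of this rank-3 factorization via SVD. Registration (subtracting each row's centroid) is assumed already done, and the metric upgrade that resolves the remaining ambiguity is omitted:

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade style rank-3 factorization (sketch).

    W: (2F, P) registered measurement matrix (two rows per frame, x then y,
       each row with its centroid subtracted).
    Returns motion (2F, 3) and structure (3, P). They are defined only up to
    an invertible 3x3 matrix A, since (motion @ A, inv(A) @ structure) fits
    equally well; metric constraints on the camera rows would fix A.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sq = np.sqrt(s[:3])
    motion = U[:, :3] * sq          # split the singular values between factors
    structure = sq[:, None] * Vt[:3]
    return motion, structure
```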
Factorization: results
(Results: Hanno Ackermann, University of Hannover, 2008)
S. Christy and R. Horaud, "Euclidean Shape and Motion from Multiple Perspective Views by Affine Iterations," IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(11): 1098-1104, November 1996.
Image-based 3D reconstruction
Computer vision is good at locating features:
• points
• regions
• textures
• contours
… at different scales! (see the sketch below)
(Examples: Harris corners, 1988; SURF detector (OpenCV), Bay, 2006; 2D region detection by region-growing)
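For illustration, the detectors named above are available in OpenCV; SURF sits in the patented opencv-contrib module, so this sketch uses Harris plus ORB as a freely available stand-in. The input filename is a placeholder:

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder input image

# Harris corner response map (Harris 1988).
harris = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)
corners = harris > 0.01 * harris.max()                # thresholded corner mask

# Multi-scale keypoints + descriptors; ORB stands in for SURF (Bay 2006),
# which requires the non-free opencv-contrib build.
orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)
```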
Markerless tracking
Problems:
• robustness of detection
• matching (in space/time)
Will become prevalent eventually. For now: limited use, with constraints (model-based).
(Example: Wang & Popovic, MIT, 2010)
Commercial solutions
Boujou: http://vicon.com/boujou/
Pricey (~$10,000)
Image-based 3D tracking
New real-time sensors: depth cameras (Zcam) infer depth for ALL pixels.
Early systems: laser, stereo (Perceptron LIDAR, 1995; Devernay, 1994).
Image-based 3D tracking
Now: structured lighting, a Zcam for 300 € (Kinect, 2010)!
Solves the "where" but not the "what": objects must still be identified (segmented).
Calibration, then segment by depth (a minimal sketch follows).
(Video: Augmented Reality Magic Mirror using the Kinect, Tobias Blum)
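A minimal sketch of "segment by depth" on a Kinect-style depth image; the band limits are made-up example values:

```python
import numpy as np

def segment_by_depth(depth_mm, near=500, far=1200):
    """Mask of pixels inside a depth band of interest (millimetres).

    0 encodes "no measurement" on Kinect-class sensors, so it is excluded.
    Identifying *what* the resulting blob is still requires recognition.
    """
    valid = depth_mm > 0
    return valid & (depth_mm >= near) & (depth_mm <= far)
```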
Stereo for AR: augmentation
3D reconstruction is necessary for realistic CG blending:
• object location (table…)
• object occlusion
• shadows and light interactions
In real time!!
Will soon happen in the movie industry ($$$).
(X3D consortium, web)
Augmentation: CG blending
3D analysis of real images (camera + reconstruction, match move)
+ CG graphics
+ compositing masks (shadows, occlusions…)
= augmented image
Example of augmentation pipeline (1): image analysis
Matching constraints:
• geometry (epipolar)
• photometry
(Example: region matching, 1992)
H. Jin, P. Favaro, and S. Soatto, "A Semi-direct Approach to Structure From Motion," The Visual Computer, 19(6): 377-394, October 2003.
Example of augmentation pipeline (2): 3D reconstruction
3D reconstruction of regions based on planar equations. Hypothesis: the world is piece-wise planar.
Example of augmentation pipeline (3): 3D reconstruction
Prior information:
• explicit 3D model
• motion constraints
3D registration by ICP = Iterative Closest Point (Besl 92):
• always converges (to a local minimum)
• model points do not coincide with reconstruction points
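A compact NumPy/SciPy sketch of the ICP loop (Besl 92): alternate closest-point matching with the best rigid motion for those matches. Real pipelines add outlier rejection; this shows the skeleton only:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, n_iter=30):
    """Rigidly align point set src to dst (both (N, 3) arrays). Returns R, t."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(n_iter):
        cur = src @ R.T + t
        _, idx = tree.query(cur)            # closest-point correspondences
        d = dst[idx]
        # Best rigid motion for these matches (Procrustes / SVD step).
        mu_s, mu_d = cur.mean(0), d.mean(0)
        U, _, Vt = np.linalg.svd((cur - mu_s).T @ (d - mu_d))
        Rk = Vt.T @ U.T
        if np.linalg.det(Rk) < 0:           # avoid a reflection solution
            Vt[-1] *= -1
            Rk = Vt.T @ U.T
        tk = mu_d - Rk @ mu_s
        R, t = Rk @ R, Rk @ t + tk          # compose with the running transform
    return R, t
```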
Example of augmentation pipeline (4): 3D virtual content
Two steps:
• Geometric modeling (sometimes based on 3D scans): Maya, Blender, 3DSMAX, Sketchup, etc.
• Photo-realistic rendering: textures, lights, reflectance, shadows, etc.
Note: light probes enable environment mapping.
Light interaction: a must
Geometric coherency is good, but… light is essential!
Shadows in real time = geometry + light.
Standard technique for CG renderers: attenuate each real pixel by the ratio of the irradiance it receives with and without the virtual occluders,

$$ c' = c\,\frac{E'}{E}, \qquad E = \int_{S} L_d\,(\mathbf{n} \cdot \mathbf{d})\,d\omega, $$

where the integral runs over the light sources S and E' is the same integral restricted to unblocked directions d.
(Figure: virtual PSP casting a shadow on the real ground.)
Shadow projection techniques
Many different techniques!

| Technique | Speed | Detail | Self-shadowing | Shadow receivers |
|---|---|---|---|---|
| Plane Projected Shadows | Quick, not much calculation | High detail | No | No |
| Projected Shadows | Quick, almost no calculation | Depends on texture | No | Yes |
| Depth Shadow Mapping | Very quick, not much calculation | Depends on texture | Yes | Yes |
| Vertex Projection | Slow with high-res meshes | High detail | No | No |
| Shadow Volumes | Slow, lots of calculations | High detail | Yes | Yes |
Shadow mapping
Shadow volume with stencil drawing:
M. Haller, S. Drab, and W. Hartmann, "A real-time shadow approach for an augmented reality application using shadow volumes," in VRST '03: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, New York, NY, USA, 2003, pp. 56-65.
Example of augmentation pipeline (5): animations
Two steps:
• Virtual motions must be coherent with the real ones: match-moving. Motion is camera-centric.
• The virtual camera must be identical to the real one (off-line calibration is necessary). Accommodate zoom!
Example of augmentation pipeline (6): step-by-step rendering
• Occlusion mask
• Virtual object rendering (alone)
Example of augmentation pipeline (7): step-by-step rendering
• Real objects: reference white (albedo)
• "Black" virtual objects: shadow computation
Example of augmentation pipeline (8): step-by-step rendering
Final composition:

$$ \text{final} = \text{real image} \times (1 - \text{mask}) \times \text{attenuation} \;+\; \text{virtual objects} \times \text{mask} \;+\; \text{reflections} \times (1 - \text{mask}) $$
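The composition formula translates directly into array arithmetic; a minimal sketch assuming float images in [0, 1] with matching shapes:

```python
import numpy as np

def composite(real, virtual, mask, attenuation, reflections):
    """Blend per the pipeline formula above.

    mask = 1 where virtual geometry covers the pixel; `attenuation`
    darkens real pixels that fall inside virtual shadows.
    """
    return (real * (1.0 - mask) * attenuation
            + virtual * mask
            + reflections * (1.0 - mask))
```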
Other example (1)
Deformable objects + real/virtual occlusions
Other example (2)
Reference white
Shadows of virtual
objects on the real
scene
Conclusion
o Augmented reality is a reality.
o It is a complicated process that requires a lot of expertise, hardware and software.
o In-depth coherency of geometry and photometry is needed.
o Real-time challenges:
  - Elaborate rendering (GPU)
  - Match moving (markerless)
  - 3D reconstruction (occlusions)
  - User interaction: hand and body tracking
Human vision is good at 3D but…
Mantis shrimp: the best eyes in the world.
The animal with the most sophisticated vision is thought to be the mantis shrimp. Humans have only three kinds of light receptors, while mantis shrimp have ten, allowing them to see not only visible light but infrared and ultraviolet light as well. They are the only invertebrates that visually recognize members of their own species. While humans see three pigments (red, green and blue), mantis shrimp see up to sixteen. Mantis shrimp also have polarized filters, and some can even produce signals detectable only with a polarized filter. Mantis shrimp are also able to see in stereo with each eye individually, which means that if they lose one eye, they can still see just as well.