D1 - 21/04/23
This document contains information that is the property of France Télécom. By accepting this document, the recipient acknowledges the confidential nature of its contents and undertakes not to reproduce it, transmit it to third parties, disclose it, or use it commercially without the prior written consent of France Télécom Recherche & Développement.
France Télécom Recherche & Développement
Raphaèle Balter
Construction of a scalable and evolving 3D model for video coding
27/05/2005
Context
complementary background related to the subject
Context: evolution of the digital video world
- New sources/terminals (DV, more powerful computers and terminals, IP...)
- Various networks (Internet, switched telephone network (RTC), GSM...)
- New functionalities (interactivity, 3D games, DVD, broadcasting...)
=> New orientations in video coding: compact transmission with added functionalities, and content-adapted coding
[Figure: transmission chain with coding/decoding]
Context: Representations from images
Rendering with no geometry | Rendering with implicit geometry | Rendering with explicit geometry
[Figure: continuum from less geometry (video) to more geometry (metric 3D model); examples: Lightfield, Lumigraph, Mosaics, View interpolation, View morphing, LDIs, Texture-mapped 3D model]
• Less geometry: + photorealistic rendering, - dataset volume, - acquisition system
• More geometry: + compact representation, + acquisition system, - rendering
Context: 3D model based coding
• Principle:
[Figure: real world → capture (video camera) → digitization → original sequence → analysis → 3D model → display → reconstructed sequence]
Context: 3D model based representations
• Goal:
  - 3D extraction
  - Camera parameters computation
• Assisted modeling:
  - Human intervention [debevec96] [debevec00]
  - Specific acquisition system:
    - Turntable [niem94] [debevec96] [gibson98]
    - Robot [mellor00] [zisserman01]
  - Knowledge of the scene contents:
    - Faces [preteux00] [girod02]
    - Architectural scenes [faugeras95] [hartley00] [dick00] [bazin01] [werner02] [sturm02]
Context: 3D model based representations
• Non-assisted modeling:
  - 3D model stream [galpin02]
    [Figure: original sequence I0, I3, I8, I20, ..., In represented by successive 3D models M0, M1, ...]
  - Single model [fitzgibbon99] [roning99] [pollefeys00] [yao02] [Nis03] [yu04]
    [Figure: original sequence I0, I5, ..., In represented by a single 3D model]
  - Limits: only for static scenes, without reflections and without purely rotational camera motion
Objectives
• 3D representation suited for coding
• Envisioned applications:
  - Video coding for remote real-time visualization on heterogeneous terminals
  [Figure: service providers]
• Constraints:
  - Non-assisted modeling
  - No assumptions on camera parameters or on scene content
  - No assumptions on video length
  - Scalability
Problems
• Representations:
  - Single realistic model
    + Realistic, consistent representation
    - Incompatible with the video coding constraint on video length
  - 3D model stream
    + No assumption on video length
    + Suited to streaming
    - Inconsistent representation
    - Transitions between models
=> Solution: a tradeoff between a single model and a stream
Problems
• Scalability:
  - Representing a signal with several levels of information
  - Allowing the signal to adapt to:
    - network capabilities
    - transmission losses
    - terminal capabilities
[Figure: a video bitstream split into a base stream and refinement layers, adapted to network bandwidth, transmission losses, and terminal computational and rendering capabilities]
=> Need for a multi-resolution representation
Proposed scheme:
[Figure: video → 3D reconstruction [galpin02] → evolving model construction (morphing) → hierarchical coding → compression → bitstream]
• Evolving model construction (morphing): a posteriori morphing and a priori morphing
  ! Automatic algorithm
• Hierarchical coding: wavelet analysis
  ! Evolving structure
Overview
• 3D information extraction
• Evolving model construction
• Evolving model coding
• Evolving model compression
• Conclusions/Perspectives
[Figure: video → 3D reconstruction [galpin02] → evolving model construction → coding → compression → bitstream]
3D extraction: principle of Galpin's algorithm
• 3D model stream:
  - Classical structure-from-motion algorithm [faugeras93] [horaud93] [hartley-zisserman2004]
  - Model valid for a portion of the original sequence: a GOF (Group of Frames)
  - GOF delimited by keyframes, which are used as texture images
  - Keyframe selection based on several criteria:
    - Global motion
    - 3D validity: epipolar residual
    - Ratio of outgoing points
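[Figure: a 3D point M(X,Y,Z) is projected to m1 and m2 in the images of cameras C1 and C2]
[Figure: the original sequence (I0, I3, I8, I20, ..., In) is split into GOF 1 and GOF 2 by keyframes; each GOF gives a 3D model (M0, M1) with its texture images and camera positions, from which the reconstructed sequence is rendered]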
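The two-view principle on this slide (a point M(X,Y,Z) seen as m1 from camera C1 and as m2 from camera C2) can be illustrated with a minimal midpoint-triangulation sketch. It assumes that the camera centers and the viewing-ray directions through m1 and m2 are already known; in practice they come from the pose estimation step. Midpoint triangulation is a simple textbook method chosen here for illustration. It is not necessarily the estimator used in [galpin02].

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate M from the rays c1 + s*d1 (through m1) and c2 + t*d2 (through m2).

    Returns the midpoint of the shortest segment joining the two rays.
    """
    baseline = _sub(c1, c2)
    if _dot(baseline, baseline) < 1e-12:
        # Same optical center (e.g. pure camera rotation): depth is unobservable.
        raise ValueError("zero baseline: M cannot be triangulated")
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, baseline), _dot(d2, baseline)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("parallel rays: M cannot be triangulated")
    s = (b * e - c * d) / denom  # ray parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # ray parameter of the closest point on ray 2
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)


# Two cameras one unit apart both observe M = (1, 2, 5).
print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (1, 0, 0), (0, 2, 5)))  # -> (1.0, 2.0, 5.0)
```

The zero-baseline check mirrors the limit stated earlier for non-assisted modeling: with a purely rotational camera motion, the two rays share the same origin and give no depth information.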
3D extraction: global scheme [galpin]
[Figure: global scheme of the 3D extraction, from the input images to the coder]
- Motion estimation of pixels [marquant00] → dense motion field
- Extraction and tracking of interest points [harris88] → interest point motion
- Keyframe selection → keyframes, which are saved
- Estimation of intra-GOF camera poses [dementhon95] → camera positions
- Estimation of the depth image [huang84]
- 3D mesh computation from a triangulation of the keyframe associated with the GOF → mesh and depth
- Estimation of the textured 3D model → textured 3D models → coder
3D extraction: limits of Galpin 3D model stream
• Stream of independent 3D models:
  - Uniform regular meshes
  - Different fields of view
=> Abrupt transitions between models
Geometric jump
3D extraction: limits of Galpin 3D model stream
Texture jump
[Figure: texture image k vs. texture image k+1]
3D extraction: limits of Galpin 3D model stream
Connectivity jump
3D extraction: limits of Galpin 3D model stream
Overview
• 3D information extraction
• Evolving model construction
• Evolving model coding
• Evolving model compression
• Conclusions/Perspectives
[Figure: video → 3D reconstruction [galpin02] → evolving model construction → coding → compression → bitstream]
Construction: a posteriori morphing
• Evolving model:
  - Tradeoff between a single model and a 3D model stream
  - Model stream with 3D morphing to link the models together
• Morphing [hong88] [parent92] [lazarus98] [alexa02]:
  - Two-step process:
    - Vertex mapping
    - Interpolation between corresponding vertices
  - Efficient methods are semi-automatic [bethel89] [kent92] [delingette93] [decarlo96] [lee99] [zockler00] [kanai00] [michikawa01]
    => not compatible with our scheme
• Non-detailed contributions:
  - A posteriori morphing of meshed depth maps [balter03]
  - A posteriori 3D model morphing [leguen04]
Construction: a priori morphing
• Principle of the new encoding scheme:
  - No more uniform grid
  - Corresponding vertices: vertices of successive models are the same physical 3D points of the scene
  - Implicit morphing based on those corresponding vertices = simple linear interpolation:
$$M_c^n(t) = (1-\alpha)\,M_n + \alpha\,M_{n+1}, \quad \text{with } \alpha = \frac{t - t_n}{t_{n+1} - t_n}$$
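The implicit morphing takes only a few lines of Python. This sketch assumes that each model is stored as a list of vertices in which index i refers to the same physical 3D point in M_n and M_{n+1}, which is the correspondence property stated above. The function name and data layout are illustrative.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def implicit_morph(model_n: Sequence[Vertex], model_n1: Sequence[Vertex],
                   t: float, t_n: float, t_n1: float) -> List[Vertex]:
    """Linear interpolation between corresponding vertices of M_n and M_{n+1}.

    Vertex i of model_n and vertex i of model_n1 are assumed to be the same
    physical 3D point, so no explicit vertex-mapping step is needed.
    """
    if len(model_n) != len(model_n1):
        raise ValueError("models must have the same corresponding vertices")
    if not t_n <= t <= t_n1 or t_n1 == t_n:
        raise ValueError("t must lie in [t_n, t_n1] with t_n < t_n1")
    alpha = (t - t_n) / (t_n1 - t_n)
    return [tuple((1 - alpha) * a + alpha * b for a, b in zip(p, q))
            for p, q in zip(model_n, model_n1)]


# Halfway between keyframes at t = 0 and t = 10:
print(implicit_morph([(0, 0, 0), (2, 2, 2)], [(2, 0, 0), (2, 4, 2)], 5, 0, 10))
# -> [(1.0, 0.0, 0.0), (2.0, 3.0, 2.0)]
```

Because the correspondence is built into the vertex indexing, the mapping step of classical morphing disappears. Only the interpolation step remains.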
Construction: inputs
[Figure: images → 3D extraction [galpin02] → camera positions, texture images, depth maps, dense motion field]
Construction: proposed algorithm
• Fixed connectivity and time-evolving geometry:
  1. Initialisation with a uniform regular mesh covering the whole image surface
  2. Tracking and update of the vertices still visible from the next point of view, to get the corresponding mesh (CMn)
  3. Integration of the new parts appearing in the next model, to get the new-vertices mesh (NVMn)
  4. Merge
  5. Reinitialisation of the model for long sequences to avoid drift => GGOF (group of GOFs)
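Step 2 can be sketched as follows. The sketch assumes that the dense motion field and the validity map are per-pixel arrays in image coordinates, and that vertices are 2D pixel positions in the current view. All names are hypothetical. Vertices that leave the view or fall on invalid pixels are dropped, which is why step 3 is needed to cover the newly visible parts.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def track_vertices(vertices: Dict[int, Point],
                   motion: List[List[Tuple[float, float]]],
                   valid: List[List[bool]],
                   width: int, height: int) -> Dict[int, Point]:
    """Move each vertex by the dense motion field and keep it only if it is
    still visible and valid in the next point of view.

    motion[y][x] is the (dx, dy) displacement of pixel (x, y); valid[y][x] is
    the validity map of the next view. Surviving vertices keep their ids, so
    the connectivity between them is unchanged.
    """
    tracked: Dict[int, Point] = {}
    for vid, (x, y) in vertices.items():
        dx, dy = motion[int(round(y))][int(round(x))]
        nx, ny = x + dx, y + dy
        if not (0 <= nx <= width - 1 and 0 <= ny <= height - 1):
            continue  # the vertex left the field of view
        if not valid[int(round(ny))][int(round(nx))]:
            continue  # unreliable motion or depth: do not keep the match
        tracked[vid] = (nx, ny)
    return tracked


motion = [[(1, 0)] * 3 for _ in range(3)]   # uniform shift of one pixel to the right
valid = [[True] * 3 for _ in range(3)]
valid[1][2] = False
print(track_vertices({0: (0, 0), 1: (1, 1), 2: (2, 2)}, motion, valid, 3, 3))  # -> {0: (1, 0)}
```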
Construction: additional constraints
• The merge of CMn and NVMn must:
  - not call into question the existing connectivity
  - not create a non-manifold mesh
• Vertices must be valid:
  - Validity map
Construction: constrained merge
• Manifold merge:
  - New vertices are triangulated under the CMn envelope constraint
  - CMn envelope vertices are included in the Delaunay triangulation
  - Some faces overlap the CMn mask
[Figure legend: CMn envelope, CMn face, CMn vertex, NVMn face, superimposed NVMn face, NVMn vertices, CMn mask]
Construction: constrained merge
• Proposed solution for a 2-manifold merge:
  - Elimination of all the faces containing only vertices of the CMn envelope
  - Recovery of the eliminated faces that do not overlap the CMn mask:
    - Convex areas of the envelope
    - Detection of holes in the mesh (Euler formula: $V - E + F = 2(1 - g)$, with V vertices, E edges, F faces and g the genus)
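The slide's formula is the closed-surface form of Euler's relation. For an open mesh, such as a merged depth mesh, the boundary loops must also be counted. For a connected orientable surface with b boundary loops, V - E + F = 2 - 2g - b.

A minimal hole-counting sketch follows. It assumes that the merged mesh is connected and planar (g = 0) and is stored as vertex-index triples. It is an illustration, not the thesis implementation.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Face = Tuple[int, int, int]


def euler_characteristic(faces: Iterable[Face]) -> int:
    """Return V - E + F for a triangle mesh given as vertex-index triples."""
    faces = list(faces)
    vertices: Set[int] = {v for f in faces for v in f}
    edges: Set[FrozenSet[int]] = {
        frozenset(e) for a, b, c in faces for e in ((a, b), (b, c), (c, a))
    }
    return len(vertices) - len(edges) + len(faces)


def count_holes(faces: Iterable[Face]) -> int:
    """Number of holes of a connected planar (genus 0) mesh with one outer border.

    With g = 0, V - E + F = 2 - b, so the number of inner holes is
    b - 1 = 1 - (V - E + F).
    """
    return 1 - euler_characteristic(faces)


# A square ring: outer square 0-3, inner square 4-7, 8 triangles between them.
ring = [(0, 1, 4), (1, 5, 4), (1, 2, 5), (2, 6, 5),
        (2, 3, 6), (3, 7, 6), (3, 0, 7), (0, 4, 7)]
print(count_holes(ring), count_holes([(0, 1, 2)]))  # -> 1 0
```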
Construction: matching information
• How is the matching information transmitted?
  - No additional information to transmit
  - Known at the encoding stage from the motion field
  - Retrieved at the decoding stage by:
    - reprojecting the model onto the following point of view
    - identifying the vertices having the same 2D coordinates
[Figure: cameras Cn and Cn+1]
Construction: validity map
• Validity map: to ensure matching consistency
  - Uncertainty on the motion and on the decoded models, due to errors in 3D estimation
[Figure: cameras Cn and Cn+1]
Construction: results
• Stair sequence: lateral translation
  - Green: current mesh
  - Yellow: next mesh
  - Red: morphing source (subset of the current mesh)
  - Blue: morphing target (subset of the next mesh)
Construction: results
• Stair sequence: virtual navigation
Tradeoff between a single model and a model stream => evolving model = consistent 3D model stream
Overview
• 3D information extraction
• Evolving model construction
• Evolving model coding
• Evolving model compression
• Conclusions/Perspectives
[Figure: video → 3D reconstruction [galpin02] → evolving model construction → coding → compression → bitstream]
Coding: wavelet analysis
• Goal: scalable multi-resolution representation
• Classical, efficient signal processing tool: wavelets [mallat89] [derose96]
  - Interest:
    - hierarchical representation of a signal => provides multiresolution
    - good compression
  - Principle:
    - a low-frequency representation refined by well-localized high frequencies (details)
    - successive filterings
  - Example: image case [jpeg00]
  - Surfaces case
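The principle is a low-frequency approximation refined by localized details, obtained by successive filterings. It can be sketched on a 1D signal with the Haar wavelet written in lifting form. This is the simplest classical example and only illustrates the idea. It is not the surface decomposition used later in the deck.

```python
from typing import List, Sequence, Tuple


def haar_analysis(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar wavelet written with lifting steps."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = list(signal[0::2]), list(signal[1::2])
    detail = [o - e for e, o in zip(even, odd)]        # predict odd samples from even ones
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: approx = pairwise mean
    return approx, detail


def haar_synthesis(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert the lifting steps in reverse order."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out: List[float] = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def decompose(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Successive filterings: coarse approximation + details (finest level first)."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return approx, details


def reconstruct(approx: Sequence[float], details: Sequence[Sequence[float]]) -> List[float]:
    """Refine the coarse approximation with the details, coarsest level first."""
    out = list(approx)
    for d in reversed(details):
        out = haar_synthesis(out, d)
    return out


signal = [4, 6, 10, 12, 8, 8, 0, 2]
coarse, details = decompose(signal, 3)
print(coarse, details)   # -> [6.25] [[2, 2, 0, 2], [6.0, -7.0], [-3.5]]
print(reconstruct(coarse, details) == signal)  # -> True
```

Stopping the reconstruction before the finest details are added yields a coarser version of the signal. This is the multiresolution property exploited for scalability.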
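Coding: 2nd generation wavelet analysis
• 2nd generation wavelets [loop87] [dyn90] [schröder95] [lounsbery97] [sweldens98]:
  - For non-regular surfaces
  - Coarse base mesh + refinements (wavelet coefficients)
[Figure: surface and its base mesh; a vertex C inserted between A and B is predicted by $\frac{A+B}{2}$, and the wavelet coefficient is $W = C - \frac{A+B}{2}$]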
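The A, B, C, W relation on this slide, combined with the 1-to-4 face quadrisection, can be sketched as follows. The sketch uses plain midpoint prediction, i.e. the "lazy" choice of an average as low-pass filter and the identity as high-pass filter. The `fine` positions stand in for samples of the real surface. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]
Edge = Tuple[int, int]


def edges_of(faces: Sequence[Face]) -> List[Edge]:
    """Sorted list of undirected edges (i, j) with i < j."""
    return sorted({tuple(sorted(e)) for a, b, c in faces for e in ((a, b), (b, c), (c, a))})


def midpoint(p: Vec3, q: Vec3) -> Vec3:
    return tuple((x + y) / 2 for x, y in zip(p, q))


def analyze(vertices: Sequence[Vec3], faces: Sequence[Face],
            fine: Dict[Edge, Vec3]) -> Dict[Edge, Vec3]:
    """Wavelet coefficient W = C - (A + B) / 2 for the vertex C inserted on edge AB.

    `fine` maps each edge to the position C sampled on the surface (assumed given).
    """
    return {e: tuple(c - m for c, m in zip(fine[e], midpoint(vertices[e[0]], vertices[e[1]])))
            for e in edges_of(faces)}


def synthesize(vertices: Sequence[Vec3], faces: Sequence[Face],
               coeffs: Dict[Edge, Vec3]) -> Tuple[List[Vec3], List[Face]]:
    """One 1-to-4 quadrisection: midpoint prediction + coefficient (0 if not received)."""
    new_vertices = list(vertices)
    index: Dict[Edge, int] = {}
    for e in edges_of(faces):
        m = midpoint(vertices[e[0]], vertices[e[1]])
        w = coeffs.get(e, (0.0, 0.0, 0.0))
        index[e] = len(new_vertices)
        new_vertices.append(tuple(x + y for x, y in zip(m, w)))
    new_faces: List[Face] = []
    for a, b, c in faces:
        ab = index[tuple(sorted((a, b)))]
        bc = index[tuple(sorted((b, c)))]
        ca = index[tuple(sorted((c, a)))]
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_vertices, new_faces


base = [(0, 0, 0), (2, 0, 0), (0, 2, 0)]
fine = {(0, 1): (1, 0, 0.5), (0, 2): (0, 1, -0.5), (1, 2): (1, 1, 0)}
w = analyze(base, [(0, 1, 2)], fine)
print(w[(0, 1)], w[(1, 2)])    # -> (0.0, 0.0, 0.5) (0.0, 0.0, 0.0)
verts, faces = synthesize(base, [(0, 1, 2)], w)
print(len(verts), len(faces))  # -> 6 4
```

Omitting a coefficient leaves the new vertex at the edge midpoint. This is what makes the refinement progressive.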
Coding: 2nd generation wavelet analysis
• Filters:
  - Generated by the "lifting scheme"
  - Can have various sizes according to the properties wanted for the wavelets
    - Compression requires a minimal filter size
  - Examples of reconstruction high-pass filters
Envisioned applications: real-time, adaptive reconstruction
=> need for a fast algorithm => tradeoff between compression and speed
[Figure: stencils of example reconstruction high-pass filters: Butterfly, lifted midpoint, non-lifted midpoint]
Coding: Independent analysis
=> Not satisfactory
Coding: proposed representation
• Proposed representation:
  - Decompositions based on the same support
  - Transformation of each dense depth map into consistent hierarchical triangular meshes
  - Support dissociated from the geometry: the single connectivity mesh (SCM)
Coding: base meshes construction
• Base mesh = coarse mesh:
  - Evolving model construction
  - Large faces: need to represent the scene accurately despite the face sizes => content-based vertices instead of regular vertices
  - Time evolution: management of growing and stretched faces
Coding: base mesh construction
• New evolving model generation
[Figure: Init: Harris corner detector and Canny edge detector → Delaunay triangulation; Update: Canny edge detector, validity map computation]
Coding: wavelet decomposition
• Decomposition scheme:
  - Base model MBn → hierarchical 3D mesh construction: canonical facet quadrisection defines the scale and wavelet spaces
  - Information computation: depth difference p with respect to the dense model MDn
  - Filtering → wavelet coefficient computation
[Figure: a new vertex Mmi between Mni and Mni+1 is moved along the camera view line to the dense model MDn; the depth difference p is the detail information]
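One possible reading of the information computation step measures the depth difference p along the camera view line through the midpoint of an edge (Mni, Mni+1). The sketch below rests on three assumptions:
- vertices are expressed in the camera frame;
- the camera is a pinhole camera (f, cx, cy);
- a dense depth map stands in for the dense model MDn.

It is a hypothetical illustration, not the exact computation of the thesis.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def depth_detail(m_a: Vec3, m_b: Vec3, depth_map: Sequence[Sequence[float]],
                 f: float, cx: float, cy: float) -> Tuple[float, Vec3]:
    """Depth difference p for the vertex inserted between m_a and m_b.

    Points are in the camera frame (center at the origin, optical axis +z);
    depth_map[v][u] holds the z value of the dense model at pixel (u, v).
    Returns p and the refined vertex, i.e. the point of the view line through
    the edge midpoint that lies at the dense depth.
    """
    x, y, z = ((a + b) / 2 for a, b in zip(m_a, m_b))
    if z <= 0:
        raise ValueError("midpoint is behind the camera")
    u, v = int(round(f * x / z + cx)), int(round(f * y / z + cy))
    dense_z = depth_map[v][u]
    p = dense_z - z          # detail information carried by this vertex
    s = dense_z / z          # the view line is {s * midpoint, s > 0}
    return p, (x * s, y * s, dense_z)


depth = [[8.0] * 3 for _ in range(3)]
print(depth_detail((1, 0, 4), (3, 0, 4), depth, f=2, cx=1, cy=1))  # -> (4.0, (4.0, 0.0, 8.0))
```

Storing a single scalar p per new vertex, rather than a 3D offset, is possible here because the vertex is constrained to move along its view line.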
Coding: Consistent wavelet decomposition
• Single Connectivity Mesh (SCM):
  - Common connectivity support for the decompositions:
    - sufficient, since wavelet coefficients are added on edges by face quadrisection
  - Purpose:
    - to gather the connectivity information
    - easy to construct thanks to the evolving model structure with consistent connectivity
=> correspondences/implicit morphing at each level
Coding: Consistent wavelet decomposition
[Figure: face and vertex global indices of the SCM, before and after quadrisection]
Unique global index, built from the global face index and the maximum resolution level k
Coding: Results
• Street sequence: travelling shot
  - Green: current mesh
  - Yellow: next mesh
  - Red: morphing source (subset of the current mesh)
  - Blue: morphing target (subset of the next mesh)
[Figure: base mesh vs. base mesh + refinements]
Overview
• 3D information extraction
• Evolving model construction
• Evolving model coding
• Evolving model compression
• Conclusions/Perspectives
[Figure: video → 3D reconstruction [galpin02] → evolving model construction → coding → compression → bitstream]
Compression: media interrelations
• Redundancies => exploiting the interrelations between media
[Figure: a 3D encoder (mesh geometry and connectivity), a 2D encoder (EBCOT, texture) and a 1D encoder (camera positions) feed a multiplexer that produces the bitstream; arrows (1) and (2) mark the interrelations below]
(1) Texture image prediction using the previous texture image + 3D model + camera position
(2) Texture coordinates of the vertices retrieved using the camera position
  => by reprojection of the model onto the camera position
  => 3 coordinates, (x, y, z) or (u, v, p), instead of 5, (x, y, z) + (u, v)
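Interrelation (2) can be illustrated with a pinhole-camera sketch. The decoder recomputes a vertex's texture coordinates (u, v) by projecting it with the decoded camera position. Conversely, (x, y, z) can be rebuilt from (u, v, p), where p is the depth along the optical axis. The intrinsics (f, cx, cy) and the convention X_cam = R X + T are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def texture_coords(point: Vec3, R: Mat3, T: Vec3, f: float, cx: float, cy: float
                   ) -> Tuple[float, float, float]:
    """Project a world vertex onto the keyframe camera: returns (u, v, p), p = depth."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + T[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("vertex is behind the camera")
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy, xc[2])


def from_uvp(u: float, v: float, p: float, R: Mat3, T: Vec3,
             f: float, cx: float, cy: float) -> Vec3:
    """Back-project (u, v, p) to world coordinates: X = R^T (X_cam - T)."""
    xc = ((u - cx) * p / f, (v - cy) * p / f, p)
    d = [xc[i] - T[i] for i in range(3)]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))


R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # 90 degree rotation about z
T = (0, 0, 5)
uvp = texture_coords((1, 2, 3), R, T, f=100, cx=160, cy=120)
print(uvp)                                   # -> (135.0, 132.5, 8)
print(from_uvp(*uvp, R, T, 100, 160, 120))   # -> (1.0, 2.0, 3.0)
```

Since (u, v) follow from (x, y, z) and the camera position, only three values per vertex need to be transmitted.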
Compression: camera position compression
• Camera position compression [galpin02]
  - Intra:
    - All camera positions are encoded in intra mode
  - Inter/predictive scheme:
    - The first camera is encoded in intra mode
    - Key cameras are encoded incrementally with respect to the previous key position
    - Other cameras are encoded incrementally with respect to a linear prediction
Intermediate positions ($t_n < t < t_{n+1}$):
$$R_C(t) = R(t) - \left[R(t_n) + \frac{t - t_n}{t_{n+1} - t_n}\,\bigl(R(t_{n+1}) - R(t_n)\bigr)\right]$$
$$T_C(t) = T(t) - \left[T(t_n) + \frac{t - t_n}{t_{n+1} - t_n}\,\bigl(T(t_{n+1}) - T(t_n)\bigr)\right]$$
Key positions:
$$R_C(t_n) = R(t_n) - R(t_{n-1}), \qquad T_C(t_n) = T(t_n) - T(t_{n-1})$$
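The inter/predictive scheme can be sketched as follows. The sketch makes three assumptions:
- each camera is described by a parameter vector (for instance rotation angles followed by the translation);
- the key cameras are given by their frame indices;
- residuals are not quantized.

Linear prediction on rotation parameters is only meaningful for smooth, moderate rotations. A real coder would also quantize the residuals and then predict from decoded values.

```python
from typing import List, Sequence

Params = List[float]


def _lerp(a: Sequence[float], b: Sequence[float], alpha: float) -> Params:
    return [x + alpha * (y - x) for x, y in zip(a, b)]


def encode_cameras(params: Sequence[Sequence[float]], keys: Sequence[int]) -> List[Params]:
    """Residuals of the predictive scheme.

    params[t] is the parameter vector of camera t; keys are the sorted
    key-camera indices, with keys[0] == 0 and keys[-1] == len(params) - 1.
    """
    res: List[Params] = [[] for _ in params]
    res[keys[0]] = list(params[keys[0]])                       # first camera: intra
    for prev, nxt in zip(keys, keys[1:]):
        res[nxt] = [a - b for a, b in zip(params[nxt], params[prev])]  # key vs previous key
        for t in range(prev + 1, nxt):                         # others vs linear prediction
            pred = _lerp(params[prev], params[nxt], (t - prev) / (nxt - prev))
            res[t] = [a - b for a, b in zip(params[t], pred)]
    return res


def decode_cameras(res: Sequence[Sequence[float]], keys: Sequence[int]) -> List[Params]:
    """Keys are rebuilt first, since intermediate cameras need both surrounding keys."""
    params: List[Params] = [[] for _ in res]
    params[keys[0]] = list(res[keys[0]])
    for prev, nxt in zip(keys, keys[1:]):
        params[nxt] = [a + b for a, b in zip(params[prev], res[nxt])]
    for prev, nxt in zip(keys, keys[1:]):
        for t in range(prev + 1, nxt):
            pred = _lerp(params[prev], params[nxt], (t - prev) / (nxt - prev))
            params[t] = [a + b for a, b in zip(pred, res[t])]
    return params


cams = [[0, 0], [1, 1], [2, 2.5], [3, 3], [4, 4]]
res = encode_cameras(cams, [0, 2, 4])
print(res)  # -> [[0, 0], [0.0, -0.25], [2, 2.5], [0.0, -0.25], [2, 1.5]]
print(decode_cameras(res, [0, 2, 4]) == cams)  # -> True
```

For a smooth trajectory, the intermediate residuals stay close to zero, which is where the bitrate saving comes from.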
Compression: geometry compression
• Geometry compression:
  - Base mesh: Topological Surgery (TS) encoder, also known as MPEG4-3DMC, for (x, y, z) encoding
  - Wavelets: adaptation [koda00] of the SPIHT algorithm (Set Partitioning In Hierarchical Trees) [said96]:
    - bitplane scalability is added to spatial and temporal scalability
    - based on a clever partitioning of the coefficient hierarchy
    - the hierarchy is not obtained by face subdivision but through an edge-based hierarchy
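Bitplane scalability can be sketched independently of SPIHT's set partitioning. Integer coefficient magnitudes are sent from the most significant bitplane down, so any prefix of the stream decodes to a coarser approximation. The sketch assumes integer (already quantized) coefficients with signs sent separately. It deliberately omits the significance-tree coding that makes SPIHT efficient.

```python
from typing import Iterator, List, Sequence, Tuple


def bitplanes(coeffs: Sequence[int], num_planes: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (plane, bits) from the most significant magnitude plane downwards."""
    for plane in range(num_planes - 1, -1, -1):
        yield plane, [(abs(c) >> plane) & 1 for c in coeffs]


def decode_planes(signs: Sequence[int], planes: Sequence[Tuple[int, List[int]]]) -> List[int]:
    """Rebuild coefficients from the received planes only (truncated stream)."""
    mags = [0] * len(signs)
    for plane, bits in planes:
        for i, b in enumerate(bits):
            mags[i] |= b << plane
    return [s * m for s, m in zip(signs, mags)]


coeffs = [13, -6, 2, 0]
signs = [-1 if c < 0 else 1 for c in coeffs]
stream = list(bitplanes(coeffs, 4))
print(decode_planes(signs, stream[:2]))  # -> [12, -4, 0, 0]
print(decode_planes(signs, stream))      # -> [13, -6, 2, 0]
```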
Compression: texture compression
• Predictive scheme [galpin02]
[Figure: the previous texture image is mapped onto the 3D model and projected (texturing, projection) to give a predicted image of the next texture image]
Compression: texture compression
[Figure: closed-loop prediction of the texture images Tn+1, Tn+2: the predicted image is subtracted from the original image; the difference image (En+1, En+2) is sent over the network; the decoded difference (E'n+1, E'n+2) is added to the predicted image to give the reconstructed image (T'n+1, T'n+2), which is used to predict the next texture image]
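The texture prediction loop on this slide is a closed-loop (DPCM-style) predictive coder. The minimal sketch below makes three simplifications, all hypothetical:
- images are flattened to lists of pixel values;
- uniform quantization of the difference image stands in for the actual texture coder;
- a placeholder `predict` function stands in for projecting the textured 3D model onto the next camera position.

The sketch shows why the encoder predicts from reconstructed images rather than from originals.

```python
from typing import Callable, List, Sequence

Image = List[float]


def quantize(x: float, step: float) -> float:
    """Uniform scalar quantizer (stand-in for the real difference-image coder)."""
    return step * round(x / step)


def encode(frames: Sequence[Sequence[float]], step: float,
           predict: Callable[[Image], Image]) -> List[Image]:
    """Closed loop: predict from the image the decoder will reconstruct,
    so that quantization errors do not accumulate from frame to frame."""
    sent: List[Image] = []
    recon: Image = []
    for frame in frames:
        pred = predict(recon) if recon else [0.0] * len(frame)  # first image: intra
        diff = [quantize(x - p, step) for x, p in zip(frame, pred)]  # difference image
        sent.append(diff)                                             # network transmission
        recon = [p + d for p, d in zip(pred, diff)]                   # reconstructed image
    return sent


def decode(sent: Sequence[Sequence[float]], predict: Callable[[Image], Image]) -> List[Image]:
    """Mirror of the encoder loop: prediction + received difference image."""
    out: List[Image] = []
    recon: Image = []
    for diff in sent:
        pred = predict(recon) if recon else [0.0] * len(diff)
        recon = [p + d for p, d in zip(pred, diff)]
        out.append(recon)
    return out


def same_view(image: Image) -> Image:
    """Hypothetical predictor: the real one would texture the 3D model with the
    previous texture image and project it onto the next camera position."""
    return list(image)


frames = [[10, 20], [13, 21], [15, 19]]
decoded = decode(encode(frames, 2.0, same_view), same_view)
# Every reconstructed pixel stays within step / 2 = 1.0 of the original.
print(max(abs(a - b) for f, d in zip(frames, decoded) for a, b in zip(f, d)))
```

If the encoder predicted from the original images instead, the decoder's predictions would differ from the encoder's, and the error would drift over the sequence.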
Compression: texture compression
• Base layers
[Figure: texture images split into a base layer and higher-level layers; each layer is coded as a difference, and the prediction uses the base layer]
• The base layer ensures no error on the predicted image
Compression: bitrate repartition
• Non-detailed contributions:
  - Optimal bitrate repartition between base mesh accuracy and the necessary wavelet decomposition level
    => favour base mesh accuracy over the wavelet decomposition level or additional bitplanes
  - Optimal bitrate repartition between geometry and texture
    => favour texture over geometry (once a minimal accuracy has been obtained)
Compression: low bitrate results
• Stairs sequence at 125 kb/s, CIF images at 25 Hz
[Figure: PSNR (dB) per image: image and texture PSNR for Galpin's method and for our method, and image PSNR for H.264; visual comparison of original, ours, Galpin and H.264]
Compression: very low bitrate results
• Street sequence at 16 kb/s, CIF images at 25 Hz
[Figure: PSNR (dB) per image: image and texture PSNR for Galpin's method and for our method; visual comparison of original, Galpin and ours]
Compression: virtual reality results
• Street sequence at 82 kb/s
Compression: augmented reality results
• Street sequence at 82 kb/s
• Stairs sequence at 125 kb/s
Compression: conclusions
• Good compression results / better virtual reality results
  - Better than the state-of-the-art generic coder H.264
  - Comparable to the state of the art for 3D-model-based coding
  - But with a consistent representation and scalability, which are essential for the envisioned applications
Conclusions
• Contributions
  - Evolving model construction:
    - a posteriori morphing
    - consistent 3D model stream
    - fixed connectivity and evolving geometry
  - Scalable encoding of the representation to obtain a multi-resolution evolving model:
    - use of second-generation wavelets
    - consistent decompositions of the models of the stream based on a common topological support, the SCM
    - introduction of a global indexing system
  - Efficient and scalable compression of the representation:
    - efficient codecs
    - exploitation of the interrelations between media
    - bitrate repartition optimisation
Perspectives
• Short-term perspectives
  - Occlusion zone management:
    - better detection of features characteristic of depth discontinuities
    - use of an adapted motion estimator => discontinuous disparity maps [cammas03]
  - Wavelet filter optimisation
  - Smoothing of the models to avoid noise peaks
  - Adaptive correction of the texture image depending on the size of the 3D face
• Longer-term perspectives
  - Use of this representation in a dynamic coder to transmit the static background of the scene
  - Mixing this representation with synthetic scenes to obtain more genericity (e.g. town representation)
Communications and patents
• 7 international conference papers
• 5 national conference papers
• 1 patent
• 2 contributions to MPEG4 standardization
• 1 submitted journal paper (review in process)
Coding: Consistent wavelet decomposition
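[Figure: a vertex on the border shared by faces 1 and 2 has coordinates (1,1,0,1) in face 1 and (2,1,1,0) in face 2]
• 2 global indices per vertex located on a face border: those coordinates do not give the same index
• Unique global index, built from the global face index and the maximum resolution level k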
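The slide's example coordinates (1,1,0,1) and (2,1,1,0) can be read as (face index, a, b, c), where a, b, c are integer barycentric coordinates with a + b + c = 2^k. This reading is an assumption.

Under that reading, one hypothetical way to obtain a unique index for border vertices is to express each vertex by its weights on the base-mesh vertices. Those weights do not depend on which face is used. The sketch below is not the thesis indexing scheme.

```python
from typing import Dict, Tuple

BaseFace = Tuple[int, int, int]   # base-mesh vertex ids of a face


def global_vertex_key(face: int, a: int, b: int, c: int,
                      base_faces: Dict[int, BaseFace], k: int) -> Tuple[Tuple[int, int], ...]:
    """Face-independent key of the vertex (face, a, b, c) at resolution level k.

    (a, b, c) are integer barycentric coordinates in the base face, with
    a + b + c == 2**k. The key lists the base vertices with nonzero weight,
    so a border vertex gets the same key from either neighbouring face.
    """
    if min(a, b, c) < 0 or a + b + c != 2 ** k:
        raise ValueError("coordinates must be non-negative and sum to 2**k")
    weights = zip(base_faces[face], (a, b, c))
    return tuple(sorted((vid, w) for vid, w in weights if w > 0))


# Faces 1 and 2 share the base edge (10, 12); at k = 1 its midpoint is
# (1, 1, 0, 1) in face 1 and (2, 1, 1, 0) in face 2.
faces = {1: (10, 11, 12), 2: (12, 10, 13)}
print(global_vertex_key(1, 1, 0, 1, faces, 1))  # -> ((10, 1), (12, 1))
print(global_vertex_key(2, 1, 1, 0, faces, 1))  # -> ((10, 1), (12, 1))
```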
Coding: wavelet filters choice
• Envisioned applications:
  - Network between client and server
  - Real-time applications
=> need for a reconstruction that is:
  - scalable and locally adaptive
  - very fast
• Our choice: the simplest filters, the lazy wavelet
  - Filters:
    - low-pass filter = average filter
    - high-pass filter = identity
  - Best compromise between filter simplicity and compression
Still image coding: non-lifted Butterfly wavelet (synthesis)
[Figure: high-pass and low-pass filter stencils of the non-lifted Butterfly wavelet]