point cloud compression, mpeg and beyond
TRANSCRIPT
© 2016 BlackBerry. All Rights Reserved. 1
Point Cloud Compression, MPEG and beyond
Sébastien Lasserre, David Flynn
Senior Standards Managers
ERC-CLIM Workshop, Inria, Rennes, May 28th, 2019
And it all started from the need for something…
MPEG requirements
Requirements
▪ practical applications
▪ need for compression
  ▪ storage of long sequences or large data
  ▪ transmission (= reduced bandwidth)
▪ need for a standard
  ▪ interoperability
Why PCC for AR/VR?
▪ how to handle this content?
  ▪ only point clouds are credible
  ▪ easy to process on the production side
▪the dream of 6 Degrees of Freedom
DENSE AR/VR point clouds
Multiple-camera capture
Computer Generated Imaging
▪ provided by 8i
Why PCC for automotive?
▪ provided by Mitsubishi
Fused over time to obtain 3D maps
PCC for other use-cases
▪ cultural heritage
▪ industrial applications
▪buildings
▪ topography
Organizing the work in MPEG toward an International Standard
MPEG
▪ the obvious standards body for a compression activity
  ▪ a unique expertise in compression
  ▪ historically audio and video
▪ the new standard MPEG-I
  ▪ a set of specifications (“parts”)
  ▪ “I” for Immersive
▪ why Point Cloud Compression in MPEG?
  ▪ because MPEG is THE standards body for multimedia compression
  ▪ existence of the 3D graphics sub-group
    ▪ worked on and issued a specification on mesh compression some 5-10 years ago
What are we working on? Point clouds!
A point cloud is given by
▪ geometry = 3D positions (X,Y,Z) of a set of points
▪ attributes = attribute values associated with each point
Attributes may be
▪ 3-component colours (RGB or YUV), reflectance, time stamp
▪ scene segmentation deduced from scene analysis
Static or dynamic
▪ static point clouds
▪ dynamic point clouds
  ▪ several “frames”
  ▪ VR/AR moving objects, Lidar on a moving vehicle
We must agree on a data format
ply
format ascii 1.0
comment Version 2, Copyright 2017, 8i Labs, Inc.
comment frame_to_world_scale 0.179523
comment frame_to_world_translation -45.2095 7.18301 -54.3561
comment width 1023
element vertex 800259
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
167 62 246 177 159 147
167 62 247 171 153 142
167 63 247 180 162 152
165 63 249 159 143 131
165 63 250 155 140 129
165 63 251 154 139 129
167 61 248 157 142 132
167 61 249 151 137 128
167 61 250 150 136 128
167 61 251 147 136 128
166 62 248 161 145 135
166 62 249 155 140 129
166 63 248 173 157 147
166 63 249 158 142 131
….
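To make the data format concrete, here is a minimal sketch of a reader for the ASCII PLY layout of this slide. It is not part of the talk: the function name `parse_ascii_ply` is hypothetical, and the sketch assumes a single `vertex` element with scalar properties, which is all the example header declares.

```python
def parse_ascii_ply(text):
    """Parse an ASCII PLY string with one 'vertex' element (assumption).

    Returns a list of dicts, one per point, keyed by property name.
    """
    lines = iter(text.splitlines())
    if next(lines).strip() != "ply":
        raise ValueError("not a PLY file")

    properties = []   # (name, type) in declaration order
    num_vertices = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "element" and tokens[1] == "vertex":
            num_vertices = int(tokens[2])
        elif tokens[0] == "property":
            properties.append((tokens[2], tokens[1]))
        elif tokens[0] == "end_header":
            break
        # 'format' and 'comment' lines are skipped in this sketch

    points = []
    for _ in range(num_vertices):
        values = next(lines).split()
        point = {}
        for (name, kind), value in zip(properties, values):
            # geometry is declared as float, colours as uchar
            point[name] = float(value) if kind in ("float", "double") else int(value)
        points.append(point)
    return points


SAMPLE = """ply
format ascii 1.0
comment two points taken from the slide
element vertex 2
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
167 62 246 177 159 147
167 62 247 171 153 142
"""

cloud = parse_ascii_ply(SAMPLE)
```

Each parsed point carries its geometry (x, y, z) and its attributes (red, green, blue), matching the geometry/attributes split of the previous slide.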
Need for evidence and preparation of the Call for Proposals
▪ Call for Evidence
  ▪ can we hope to meet the requirements
  ▪ using a reasonable technology?
▪ generating anchors
  ▪ to compare with technologies to be proposed as responses to the Call for Proposals
▪ evaluation method
  ▪ objective metrics
  ▪ framework for subjective tests
The initial plan (2016)
two Point Cloud specifications?
v1 then v2 ?
We’ll see…
“Better have a bad plan than no plan at all”
However…
▪ Call for Proposals
  ▪ responses in October 2017
  ▪ three categories
    ▪ cat1: static various unstructured point clouds
    ▪ cat2: dense dynamic voxelized AR/VR
    ▪ cat3: sparse automotive
▪ three Test Models, then two
  ▪ three categories have led to three Test Models TM1, TM2 and TM3
  ▪ TM1 and TM3 merged to become TM13
▪ time to market, dense vs sparse => two tracks
  ▪ AR/VR TM2 pushing for a fast track leveraging existing hardware (video codecs) => Video-PCC
  ▪ robust to any content and disruptive TM13 on a longer track => Geometry-PCC
Timeline
[Figure: timeline of MPEG meetings from May 2016 to Jun 2020 (May 2016, Oct 2016, Jan 2017, Apr 2017, Jul 2017, Oct 2017, Jan 2018, Apr 2018, Jul 2018, Oct 2018, Jan 2019, Mar 2019, Jul 2019, Oct 2019, Jan 2020, Apr 2020, Jun 2020). Milestones shown: requirements, draft CfP, CfP v1, CfP v2, final CfP, CfP responses, WD, CD, DIS, FDIS, and, for G-PCC v2 (including inter), CD, DIS, FDIS]
The current plan
“Better have a good plan than a bad plan”
Geometry-based Point Cloud compression in MPEG
▪ deliverables
  ▪ a specification
    ▪ on bit-stream structure and decoding process
    ▪ the encoding process is informative only, not normative
  ▪ a reference software, aka Test Model
    ▪ in C++
    ▪ encoder and decoder
▪ main participants
  ▪ 8i
  ▪ Apple
  ▪ BlackBerry
  ▪ Hanyang University
  ▪ InterDigital
  ▪ Huawei
  ▪ LG
  ▪ Mitsubishi Electric
  ▪ Panasonic
  ▪ Peking University (PKU)
  ▪ Sony
  ▪ Tencent
  ▪ TNO
  ▪ etc.
Don’t forget: this must be deployed and make the user happy
▪ controlled complexity
  ▪ memory
  ▪ computation
  ▪ algorithms linear in the data size (or close to it)
▪ robustness
  ▪ should not break badly when given “bad” content as input
  ▪ this must work ALL the time; the worst-case scenario becomes THE scenario
  ▪ the end user will never accept something that works “only” 99% of the time
▪ acceptable implementation
  ▪ fixed-point C++
  ▪ no crazy functions: only +, -, >>, LUTs, fixed-point *, bitwise tests
▪ must have addressed the requirements
MPEG meeting cycle
1 meeting cycle = typically 3 months (or 12-13 weeks)
[Figure: one meeting cycle from week W01 to week W13, running from one MPEG meeting (one week) to the next. Labels: Ad hoc Groups, presentation of contributions, upload input contributions, crosschecks, output documents, release Test Model software, release new draft spec, integration of adopted new tools]
▪ one cycle is a sprint
▪ the repetition of cycles is a marathon
MPEG is a very efficient machine
▪ has delivered the best-in-class compression technologies over decades
▪ many participants working together more often than not
▪ you, as an individual, cannot compete
▪ better to embrace it and participate
Geometry-based Point Cloud Compression (GPCC)
MPEG-I part 9
Octree representing the geometry
Representing the geometry: octrees
Pop a cube to be coded from a FIFO
• split into 8 sub-cubes
• deduce an occupancy pattern b in [1,255]
Entropy code the pattern
• using an arithmetic coder
• whose probabilities are updated during the coding
Push sub-cubes into the FIFO
• if occupied by at least one point and the sub-cube size is bigger than 1
Pop the next cube from the FIFO until the FIFO is empty
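The FIFO loop of this slide can be sketched as follows. This is an illustration only, not the Test Model: the function name `octree_occupancy_patterns` and the bit order of the pattern (bit i set when sub-cube i is occupied, with i built from the x, y, z halves) are assumptions, and the arithmetic coder is left out, so the function simply returns the patterns in the order they would be entropy coded.

```python
from collections import deque


def octree_occupancy_patterns(points, depth):
    """Breadth-first octree scan of integer points in [0, 2**depth)^3.

    Returns the list of 8-bit occupancy patterns b, one per coded cube.
    """
    patterns = []
    # each FIFO entry: (cube origin, cube size, points inside the cube)
    fifo = deque([((0, 0, 0), 1 << depth, sorted(set(points)))])
    while fifo:
        (ox, oy, oz), size, inside = fifo.popleft()   # pop a cube
        half = size >> 1                              # sub-cube size
        children = [[] for _ in range(8)]             # split into 8 sub-cubes
        for (x, y, z) in inside:
            i = ((x - ox >= half) << 2) | ((y - oy >= half) << 1) | (z - oz >= half)
            children[i].append((x, y, z))
        b = sum(1 << i for i in range(8) if children[i])
        patterns.append(b)                            # b is in [1, 255]
        if half > 1:                                  # sub-cube size bigger than 1
            for i in range(8):
                if children[i]:                       # occupied by at least one point
                    origin = (ox + half * ((i >> 2) & 1),
                              oy + half * ((i >> 1) & 1),
                              oz + half * (i & 1))
                    fifo.append((origin, half, children[i]))
    return patterns


patterns = octree_occupancy_patterns([(0, 0, 0), (3, 3, 3)], depth=2)
```

With two points at opposite corners of a 4x4x4 cube, the root pattern has two bits set and each occupied sub-cube then yields a single-bit pattern.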
Extension to better trees?
Octree in GPCC
▪ very practical!
▪ makes it easy to introduce prediction
KD-tree
▪ local split decision performed by the encoder
  ▪ based on RDO
  ▪ signaled to the decoder
A mix?
More general types of trees?
Breadth-first or depth-first?
[Figure: kd-tree]
Lossless compression of the octree: prediction from a neighbourhood
Occupancy binarization
For each current node
▪ eight child cubes CCi that may or may not be occupied by a point of the point cloud
▪ occupancy bit bi = occupied (bi=1) or non-occupied (bi=0)
Binary coders
▪ to code the eight bits bi
▪ have more potential and flexibility toward the introduction of occupancy prediction tools
Binarization of the 8-bit occupancy pattern
▪ based on the conditional entropy formula
$$H(b) = H(b_0) + H(b_1 \mid b_0) + \dots + H(b_7 \mid b_0 \dots b_6)$$
▪ to profit from local geometry correlation, the bits bi are coded dependently on the preceding bits
Prediction states of bit bj
$$\mathcal{D}_j = \{b_0, \dots, b_{j-1}\}$$
Neighbouring configurations
Six neighbours
▪ six cubes of the same depth sharing a face with the current node
Neighbouring configuration
▪ the neighbouring configuration NC is determined by summing the weights (1, 2, 4, etc.) associated with occupied neighbouring cubes
Reduction of configurations
▪ 64 neighbouring configurations NC reduced to 10 invariant configurations NC10
▪ NC in [0,63] is mapped onto NC10 in [0,9]
$$\mathcal{D}_j = \{b_0, \dots, b_{j-1}, \mathrm{NC10}\}$$
[Figure: example configuration NC=15 and its reduced configuration NC10]
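The sketch below computes NC from the six face neighbours and builds one possible 64-to-10 reduction table. The slides do not say which invariance is used or how the ten classes are numbered, so two things here are assumptions: the classes are taken to be orbits under the 24 rotations of the cube, and the class indices (ranked by smallest member) are arbitrary. The face order in `FACES` is also a hypothetical choice.

```python
# Face neighbours as unit offsets; neighbour k carries weight 2**k (1, 2, 4, ...).
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbour_configuration(occupied):
    """Sum the weights of the occupied face neighbours: NC in [0, 63]."""
    return sum(1 << k for k, occ in enumerate(occupied) if occ)


def _rot_z(v):
    x, y, z = v
    return (-y, x, z)      # 90-degree rotation about the Z axis


def _rot_x(v):
    x, y, z = v
    return (x, -z, y)      # 90-degree rotation about the X axis


def face_permutations():
    """All permutations of the six faces induced by cube rotations."""
    generators = [tuple(FACES.index(rot(f)) for f in FACES) for rot in (_rot_z, _rot_x)]
    identity = tuple(range(6))
    seen, frontier = {identity}, [identity]
    while frontier:
        p = frontier.pop()
        for g in generators:
            q = tuple(g[i] for i in p)   # apply p, then g
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return sorted(seen)


def _apply(perm, nc):
    """Move bit k of nc to position perm[k]."""
    return sum(1 << perm[k] for k in range(6) if (nc >> k) & 1)


def build_nc10_table():
    """Map each NC in [0, 63] to a class index in [0, 9] (arbitrary numbering)."""
    perms = face_permutations()
    canonical = [min(_apply(p, nc) for p in perms) for nc in range(64)]
    representatives = sorted(set(canonical))
    return [representatives.index(c) for c in canonical]


NC10_TABLE = build_nc10_table()
nc = neighbour_configuration([1, 1, 1, 1, 0, 0])   # four occupied neighbours, NC = 15
```

Under this assumption the 64 configurations do fall into exactly ten classes, which agrees with the count given on the slide; the table is a LUT, in line with the implementation constraints stated earlier.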
On using neighbours’ children
▪ Breadth-first scanning order
  ▪ ensures that 3 (among 6) neighbours are already coded
▪ Child nodes of already coded neighbours
  ▪ the child node structure of occupied, already coded neighbours is used to predict the occupancy bits bi of the current node
  ▪ a value NT[j] indicates the number of occupied child nodes of the neighbours that touch child node j
$$\mathcal{D}_j = \{b_0, \dots, b_{j-1}, \mathrm{NC10}, NT[j]\}$$
Intra prediction
Computing a score
▪ using the 26 neighbouring nodes of the current node
▪ score as a sum of the 26 weights depending on the occupancy of the neighbouring nodes
$$\mathrm{score}_m = \frac{1}{26} \sum_{k=1}^{26} w_{k,m}(\delta_k)$$
Hard thresholding
▪ transform the score into a ternary information Pred[j]
▪ by using two thresholds th0 and th1
Pred[j] ∈ {predicted non-occupied, predicted occupied, not predicted}
$$\mathcal{D}_j = \{b_0, \dots, b_{j-1}, \mathrm{NC10}, NT[j], \mathrm{Pred}[j]\}$$
[Figure: probability of occupancy and cumulative distribution as functions of the score, with three regions: pred non-occupied, not pred, pred occupied]
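A minimal sketch of the score and of the hard thresholding follows. The weight table layout (`weights[k][delta]`, one weight per neighbour and per occupancy value), the weight values, and the strictness of the comparisons against th0 and th1 are assumptions; the slides give none of these details.

```python
from enum import Enum


class Pred(Enum):
    NON_OCCUPIED = "predicted non-occupied"
    OCCUPIED = "predicted occupied"
    NOT_PREDICTED = "not predicted"


def intra_score(neighbour_occupancy, weights):
    """Average of 26 weights, each selected by the occupancy of one neighbour.

    neighbour_occupancy: 26 values in {0, 1}
    weights: 26 pairs (weight if non-occupied, weight if occupied), hypothetical
    """
    if len(neighbour_occupancy) != 26 or len(weights) != 26:
        raise ValueError("expected 26 neighbours")
    return sum(weights[k][d] for k, d in enumerate(neighbour_occupancy)) / 26


def hard_threshold(score, th0, th1):
    """Turn the score into the ternary information Pred[j]."""
    if score < th0:
        return Pred.NON_OCCUPIED
    if score > th1:
        return Pred.OCCUPIED
    return Pred.NOT_PREDICTED


# Toy weights: an occupied neighbour votes 1.0, an empty one votes 0.0.
toy_weights = [(0.0, 1.0)] * 26
dense = hard_threshold(intra_score([1] * 26, toy_weights), th0=0.25, th1=0.75)
empty = hard_threshold(intra_score([0] * 26, toy_weights), th0=0.25, th1=0.75)
mixed = hard_threshold(intra_score([1] * 13 + [0] * 13, toy_weights), th0=0.25, th1=0.75)
```

The "not predicted" outcome covers the middle of the score range, where the neighbourhood gives no reliable hint about the occupancy.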
Optimal Binary Coder with Update on the Fly (OBUF)
$$\mathcal{D}_j = \{b_0, \dots, b_{j-1}, \mathrm{NC10}, NT[j], \mathrm{Pred}[j]\}$$
Small number of binary entropy coders
▪ a fixed number N of binary arithmetic coders Ci with evolving internal probability (like CABAC)
Selection of the coder
▪ selecting an “optimal” coder Ci for an occupancy bit bj depending on the state set $\mathcal{D}_j$
▪ selection by a coder mapping (= a big LUT)
▪ mapping updated after the coding of each bit
▪ update based on a simple channel model
Preparing the future: just plug inter prediction here!
Overview of the geometry coding engine
[Figure: block diagram of the geometry coding engine. The point cloud feeds a FIFO of nodes. For each node, the child nodes computation gives the occupancy pattern b, the neighbour computation gives the neighbour configuration N, and a predictor gives the prediction P. For each bit b0, b1, b2, … of the pattern, the states (N, P and the preceding bits in the occupancy pattern b) go through a state reduction (state reduction 0 on N, P; state reduction 1 on N, P, b0; state reduction 2 on N, P, b0, b1; …). The reduced states enter an intermediate coder mapping (with mapping update) that gives an intermediate coder index; the true coder correspondence turns it into a true coder index, which selects one of the binary coders 0 to N-1 writing the bitstream]
Geometry coding in G-PCC: results (lossless)
▪ Strong gains on dense point clouds (-60%)
▪ Smaller gains on sparse point clouds (-5% to -25%)
Compression of attributes (colour and reflectance)
Attribute coding in G-PCC: RAHT transforms
▪ A two-point transform applied iteratively
  ▪ depth by depth, node by node, direction by direction
▪ AC coefficients coded, DC pushed to preceding depth
[Figure: sub-nodes of the current node at depth d, transformed along three directions (2-point transform or no transform for each pair along a direction); AC coefficients are coded, DC coefficients go to depth d-1]
RAHT along directions X, Y and Z
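[Figure: the transform is applied along the first direction, giving DC and AC coefficients; the DC coefficients are transformed along the second direction, then along the third direction; all AC coefficients are quantized and coded; the final DC coefficients are used for depth d-1]
$$\mathrm{RAHT}(w_1, w_2) = \frac{1}{\sqrt{w_1 + w_2}} \begin{pmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{pmatrix}$$
$$\begin{pmatrix} DC \\ AC \end{pmatrix} = \mathrm{RAHT}(w_1, w_2) \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$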
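As an illustration of the two-point transform, the sketch below applies RAHT(w1, w2) to a pair of attribute values and inverts it. It uses floating point for readability, whereas a deployable codec would use fixed-point arithmetic; the function names are hypothetical, and the square roots follow the usual definition of RAHT.

```python
import math


def raht_forward(c1, c2, w1, w2):
    """Two-point RAHT: returns (DC, AC) for attribute values c1, c2 with weights w1, w2."""
    norm = math.sqrt(w1 + w2)
    a = math.sqrt(w1) / norm
    b = math.sqrt(w2) / norm
    dc = a * c1 + b * c2
    ac = -b * c1 + a * c2
    return dc, ac


def raht_inverse(dc, ac, w1, w2):
    """Inverse transform; the matrix is orthonormal, so the inverse is its transpose."""
    norm = math.sqrt(w1 + w2)
    a = math.sqrt(w1) / norm
    b = math.sqrt(w2) / norm
    c1 = a * dc - b * ac
    c2 = b * dc + a * ac
    return c1, c2


# Two equal attribute values with equal weights: all the energy goes to DC.
dc, ac = raht_forward(10.0, 10.0, 1, 1)

# Round trip with unequal weights.
dc2, ac2 = raht_forward(100.0, 60.0, 3, 1)
c1, c2 = raht_inverse(dc2, ac2, 3, 1)
```

Because the matrix is orthonormal, quantization error on DC and AC maps back to the attributes without amplification, which is what makes the transform convenient for lossy coding.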
Attribute coding in G-PCC: the pred/lift scheme
Splitting into Levels of Detail
A standard lifting scheme
[Figure: levels of detail, from finer details to coarser details to even coarser details]
The pred/lift scheme (continued)
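Prediction
Update
$$\mathrm{Update}(P) = \frac{\sum_{Q \in \Delta(P)} \alpha(P,Q) \times w(Q) \times D(Q)}{\sum_{Q \in \Delta(P)} \alpha(P,Q) \times w(Q)}$$
▪ Q ∈ Δ(P) is a point for which P has been used for prediction
▪ D(Q) is the residual coded for Q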
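The update formula is a weighted average of residuals and can be sketched in a few lines. The function name `lifting_update`, the triple layout and the numeric values are hypothetical, and the zero returned for an empty Δ(P) is an assumption the slides do not cover.

```python
def lifting_update(contributions):
    """Update(P) from the points Q predicted from P.

    contributions: list of (alpha, w, d) triples, one per Q in Delta(P), where
    alpha = alpha(P, Q), w = w(Q) and d = D(Q), the residual coded for Q.
    """
    numerator = sum(alpha * w * d for alpha, w, d in contributions)
    denominator = sum(alpha * w for alpha, w, _ in contributions)
    if denominator == 0:
        return 0.0   # assumption: no update when P predicted nothing
    return numerator / denominator


# P was used to predict two points Q (made-up values).
update = lifting_update([(0.5, 2.0, 4.0), (0.5, 1.0, -2.0)])
```

Here the numerator is 0.5×2×4 + 0.5×1×(−2) = 3 and the denominator is 0.5×2 + 0.5×1 = 1.5, so the update is 2.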
Mixing prediction and transforms
Transforms
▪ 2-point RAHT
  ▪ small transform, energy compaction is not really good
▪ Graph transforms
  ▪ complexity issues in O(n²) for n nodes; based on the diagonalization of an n×n matrix
▪ Weighted Graph Transforms
  ▪ generalizes RAHT, but has the same complexity issue
▪ Others?
Types of prediction
▪ intra/neighbour prediction, as in pred/lift for example using k nearest neighbours
▪ inter-frame prediction (see later section)
▪ inter-depth prediction (similar to inter-layer prediction in scalable video coding) = up-sampling between depths
Inter prediction, toward a GPCC v2
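Geometry coding in G-PCC: 3D motion and inter prediction
[Figure: Frame F, with Frame F+1 in yellow]
Profit from high temporal correlation using motion compensation
▪ 3D motion search
▪ optimal motion based on a Lagrange cost optimisation
Prediction Units
▪ 3D motion vectors coded in PUs
▪ motion compensated points used as predictors
$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} = \begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + \begin{pmatrix} V_x \\ V_y \\ V_z \end{pmatrix}$$
[Figure: motion field]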
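The motion model above is a 3x3 matrix plus a translation vector applied to each point. A minimal sketch, with a hypothetical function name and made-up values, and in floating point rather than the fixed-point arithmetic a deployed codec would use:

```python
def motion_compensate(points, matrix, vector):
    """Apply (X', Y', Z') = M (X, Y, Z) + V to every point of a prediction unit."""
    compensated = []
    for p in points:
        compensated.append(tuple(
            sum(matrix[row][col] * p[col] for col in range(3)) + vector[row]
            for row in range(3)
        ))
    return compensated


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # quarter turn about the Z axis

# Pure translation: the matrix is the identity and V is the 3D motion vector.
translated = motion_compensate([(10, 20, 30)], IDENTITY, (1, 2, 3))

# Rotation followed by a translation along Z.
rotated = motion_compensate([(1, 0, 0)], ROT_Z_90, (0, 0, 5))
```

The compensated points are then used as predictors for the points of the current frame.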
Geometry coding in G-PCC: global motion
[Figure: translation + yaw + roll]
The vehicle moves in the Earth referential system
▪ some objects appear to be moving
  ▪ infrastructure, buildings, landscape
▪ but are not moving in the Earth referential system
▪ idea: referential-dependent motion
[Figure: frame F’ and frame F in vehicle coordinates, with point sets {Yj} and {Xi} and the global motion MY+V]
▪ Use a global motion between referential systems
  ▪ mainly vehicle relative to Earth
  ▪ non-solid 3D motion
  ▪ other referential systems possible, e.g. another vehicle
Lossy geometry compression
Geometry coding in G-PCC: intra lossy coding
▪ Adding or removing a point in a 2x2x2 node
  ▪ increases distortion but lowers the bit-rate
  ▪ find the best trade-off using a Lagrange cost
  ▪ by testing many configurations on the encoder side
[Figure: geometry quality (PSNR) versus bits per point, showing the effect of removing a point and of adding a point]
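The encoder-side search can be sketched with the usual Lagrange cost J = D + λR, where D is the distortion, R the rate and λ the Lagrange multiplier. The candidate list and all numbers below are made up for illustration; the slides do not give the distortion metric or the value of λ.

```python
def best_configuration(candidates, lam):
    """Pick the candidate minimising J = D + lam * R.

    candidates: list of (label, distortion, rate_in_bits), hypothetical values
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Three ways of coding one 2x2x2 node (made-up distortions and rates).
candidates = [
    ("unchanged", 0.0, 10.0),
    ("remove a point", 1.0, 7.0),
    ("add a point", 0.5, 9.5),
]

low_rate_choice = best_configuration(candidates, lam=0.5)
high_quality_choice = best_configuration(candidates, lam=0.1)
```

A large λ favours the cheaper, more distorted configuration, while a small λ keeps the node unchanged; sweeping λ traces the rate/quality curve.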
Geometry coding in G-PCC: inter lossy coding
▪ Predictor Copy Coding Mode (PCCM)
  ▪ add a PCCM flag signalling whether PCCM is active
  ▪ if active, replace the original set of points by a copy of the predictive set of points
  ▪ set of points obtained by 3D motion compensation
[Figure: flowchart for the current node, given the predictor (= set of prediction points). If the PCCM flag is yes: copy the predictor points into the current node, which becomes a leaf node (early termination). If no: code the occupancy of the current node, with occupancy coding depending on the predictor; occupied child nodes are iteratively coded (put in the FIFO)]
▪ Very promising results
  ▪ will G-PCC outperform V-PCC on dense point clouds?
  ▪ when?
Do you meet the requirements for the geometry?
▪ A bit of anticipation on the expected lossy compression performance
  ▪ for high quality compression
  ▪ 0.2 bpp for intra
  ▪ 0.05 bpp for inter
  ▪ <0.1 bpp on average in a GOP
▪ A 3D object
  ▪ 1 million points
  ▪ 30 fps
  ▪ bitrate < 3 Mb/s
▪ A High Definition 3D object
  ▪ 4 million points
  ▪ 60 fps
  ▪ the bpp is a bit lower
  ▪ bitrate < 20 Mb/s
Proof of evidence for a v2 with inter is there…
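The bitrates quoted on this slide follow from bitrate = points per frame × frames per second × bits per point. A short check of the arithmetic, with hypothetical function names:

```python
def geometry_bitrate(num_points, fps, bpp):
    """Geometry bitrate in bits per second."""
    return num_points * fps * bpp


def max_bpp(bitrate, num_points, fps):
    """Largest bits-per-point budget that stays within a target bitrate."""
    return bitrate / (num_points * fps)


# 3D object: 1 million points at 30 fps and 0.1 bpp gives 3 Mb/s.
object_bitrate = geometry_bitrate(1_000_000, 30, 0.1)

# High Definition 3D object: 4 million points at 60 fps within 20 Mb/s.
hd_bpp = max_bpp(20e6, 4_000_000, 60)
```

For the High Definition case, staying under 20 Mb/s requires about 0.083 bpp, which matches the remark that the bpp is a bit lower than 0.1.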