TRANSCRIPT
The Long March (長征) to 3D Video
Leonardo Chiariglione
Speech at 3D Systems and Applications
Seoul – 2014/05/28
It has already been a long march
Analogue:
Printing
Photography
Telegraphy
Telephony
Audio recording
Radio
Television
Video recording

Digital:
Video conference
Video telephony
Video
Interactive television
(3D TV)
The dimensions of future media
Time/space resolution
Screen content
Colour
Brightness
Scalability
3D Video
3D Audio
Metadata
File format
Sensors/actuators
Human interaction
Fusion of real & virtual
Detection/analysis
Linking
Energy saving
User profile
There has been progress in resolution
QSIF
SIF
Standard Definition (interlace)
High Definition (Interlaced/progressive)
4k (Progressive)
8k (Progressive)
The cost of being digital
Video:
             "VHS"   SD      HD      4k      8k
#lines       288     576     1080    2160    4320
#pixels      360     720     1920    3840    7680
Frame freq.  25      25      25      50      50
Mbit/s       41      166     829     6636    26542

Audio:
                Speech  CD      Stereo  5.1     22.2
Sampling freq.  8       44.1    48      48      48
bits/sample     8       16      16      16      16
#channels       1       2       2       5.33    22.66
Mbit/s          0.064   1.411   1.536   4.093   17.403
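The uncompressed bitrates in these tables follow from simple products: lines × pixels per line × frame frequency × bits per pixel for video, and sampling frequency × bits per sample × channels for audio. The video figures are consistent with 16 bits/pixel, i.e. 8-bit 4:2:2 chroma sampling (an assumption – the slide does not state it). A minimal sketch:

```python
def video_mbit_s(lines, pixels_per_line, frame_freq, bits_per_pixel=16):
    """Uncompressed video bitrate in Mbit/s.

    16 bits/pixel matches the table's numbers and corresponds to
    8-bit 4:2:2 chroma sampling (an assumption, not stated on the slide).
    """
    return lines * pixels_per_line * frame_freq * bits_per_pixel / 1e6


def audio_mbit_s(sampling_khz, bits_per_sample, channels):
    """Uncompressed audio bitrate in Mbit/s."""
    return sampling_khz * 1e3 * bits_per_sample * channels / 1e6


print(round(video_mbit_s(1080, 1920, 25)))   # HD -> 829
print(round(video_mbit_s(4320, 7680, 50)))   # 8k -> 26542
print(round(audio_mbit_s(44.1, 16, 2), 3))   # CD -> 1.411
```

Running the helpers against the other columns reproduces the remaining table entries.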
Compression is making progress affordable
        Base      Scalable  Stereo  Depth  Selectable viewpoint  yr
MP1     ~VHS      -         -       -      -                     92
MP2     2 Mbit/s  -10%      -15%    -      -                     94
MP4-V   -25%      -10%      -15%    -      -                     98
AVC     -30%      -25%      -25%    -20%   5/10%                 03
HEVC    -60%      -25%      -25%    -20%   5/10%                 13
?       ?         ?         ?       ?      ?                     ?
Are there limits to compression?
Input bandwidth to humans:
Eyes: 2 channels of 430–790 THz
Ears: 2 channels of 20 Hz – 20 kHz
A nerve fiber connecting the senses to the brain can transmit a new impulse every ~6 ms = 167 spikes/s (1 bit ≈ 16 spikes)
Eye: 1.2 M fibers transmitting 10 bit/s each – an eye sends ~12 Mbit/s to the brain
Ear: 30 k fibers in the cochlear nerve – an ear sends ~300 kbit/s to the brain
Sensors-to-brain bitrates
[Figure: eye – 430–790 THz input, 1.2 M nerve fibers, ~12 Mbit/s; ear – 0.020–20 kHz input, 30 k nerve fibers, ~0.3 Mbit/s]
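The eye and ear bitrates quoted above can be rechecked from the per-fiber figures (one impulse every ~6 ms, ~16 spikes per bit – the speaker's approximation, not a hard physiological constant):

```python
# Figures taken from the slide; all are order-of-magnitude approximations.
spikes_per_s = 1 / 0.006             # one impulse every ~6 ms -> ~167 spikes/s
bits_per_fiber = spikes_per_s / 16   # ~16 spikes per bit -> ~10 bit/s per fiber

eye_mbit_s = 1.2e6 * 10 / 1e6        # 1.2 M optic-nerve fibers -> 12.0 Mbit/s
ear_kbit_s = 30e3 * 10 / 1e3         # 30 k cochlear-nerve fibers -> 300.0 kbit/s
print(round(bits_per_fiber, 1), eye_mbit_s, ear_kbit_s)
```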
High Dynamic Range and Wider Color Gamut
Higher dynamic range and a wider color gamut can give users a better sense of “being there”, with a viewing experience closer to real life.
Light bulb: > 10,000 nits
Surface lit by sunlight: > 100,000 nits
Night sky: < 0.005 nits
Question: if dynamic range and the volume of the color gamut increase significantly, can existing MPEG video coding standards efficiently support future needs?
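One way to quantify the gap: the slide's extremes span a dynamic range of 100,000 / 0.005 = 2×10⁷, i.e. roughly 24 photographic stops – well beyond what conventional SDR video was designed to represent. A quick check:

```python
import math

# Luminance extremes quoted on the slide, in nits (cd/m²)
brightest = 100_000   # surface lit by sunlight
darkest = 0.005       # night sky

# Dynamic range expressed in photographic stops (doublings of luminance)
stops = math.log2(brightest / darkest)
print(round(stops, 1))  # 24.3
```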
Wider Color Gamut
ITU-R BT.709 vs. ITU-R BT.2020
Dynamic Range – Examples
Bright areas can have > 10,000 cd/m² luminance
Dark areas can have < 0.01 cd/m² luminance
Screen Content applications
Wireless display
Companion screen
Control rooms with high resolution display wall
Digital operating room (DiOR)
Virtual desktop infrastructure (VDI)
Screen/desktop sharing and collaboration
Cloud computing and gaming
Factory automation display
Supervisory control and data acquisition (SCADA) display
Automotive/navigation display
PC over IP (PCoIP)
Ultra-thin client
Remote sensing
Use case #1: Hi-res display wall
Use case #2: collaboration
Use case #3: DiOR
Where we are
January 2014: Joint Call for Proposals for Coding of Screen Content
April 2014: proposal evaluation. Conclusion: evidence that significantly improved coding efficiency can be obtained by exploiting screen content characteristics with novel dedicated coding tools
April 2014: standardization plan and tentative timeline
First Test Model: Apr. 2014
PDAM: Oct. 2014
DAM: Feb. 2015
FDAM: Oct. 2015
Test sequence #1 (text and graphics with motion)
Test sequence #2 (text and graphics with motion)
Test sequence #3 (mixed content)
Test sequence #4 (animation)
MPEG standards for coding multiple cameras
A long history, starting from MPEG-2 (mid 1990s)
MPEG standards (existing and under development):
Multiview coding – can only display views captured at the source
Depth-based coding – can also display a limited number of additional views
Camera arrangement: cameras are assumed to be linearly arranged
Free viewpoint television (FTV)/1
Free viewpoint television (FTV): a hypothetical 3D transmission system that enables a viewer to select arbitrary viewpoints, inside and outside a scene
FTV requires many technologies, not just from MPEG
A 3D video format supporting the generation of views not already included in the bitstream generated by the encoder would be a major enabler for FTV.
Purpose of the MPEG FTV exploration: to develop the know-how that will enable MPEG to specify such a 3D video format
Free viewpoint television (FTV)/2
Areas considered in the MPEG FTV exploration
Compare and evaluate the depth quality attainable with general camera arrangements
Evaluate view synthesis algorithms and improve their performance
Investigate the coding efficiency of the most promising coding technologies currently available
Investigate the influence of mis-registration on view synthesis performance
Investigate the representation capability of BIFS to clarify the elements that need to be standardized
FTV Seminar
A Viewing Revolution in the Making
Date: 8 July 2014, 14:00–18:00
Venue: Main Hall B, Sapporo Convention Center
Sapporo, Japan
Exhibition of FTV demos
Room 101, 10:00-17:00, July 1 to 4.
3D Audio – NHK Loudspeaker Array Frame
Parallel worlds
For centuries humans have been building two different types of worlds: Physical and Informational
Books
Music
Films
Knowledge
Immersion
A definition of immersion: a state in which a human’s connections with the Physical world are severed and connections with the Informational world are activated
How far is immersion progressing?
Fairly… …or too far?
Can we reconnect the two worlds?
Smartphones:
Enable universal access to the Informational world while also sensing the Physical world
Enhance the history and meaning of the real world with powerful digital elements
Let’s create two-way bridges:
Extend reality to the virtual
Add reality to the virtual
Physical & Informational
Functions of an Augmented Reality browser:
Retrieve the scenario from the internet
Start video acquisition and track objects
Recognise objects and recover camera pose
Get streamed 3D graphics and compose new scenes
Get input from various sensors
Access interaction possibilities and objects from a remote server
Adapt to offer an optimal AR experience
The AR technology chain
[Diagram: the AR technology chain – Authoring Tools produce an MPEG ARAF (Augmented Reality Application Format) scene; the ARAF Browser connects the User to Media Servers, Service Servers, and Local/Remote Sensors & Actuators, bridging the Local and Remote Real World Environments]
Augmented Reality Application Format
A set of MPEG-4 scene graph nodes:
Audio, image, video, graphics, programming, communication, user interactivity, animation
Map, MapMarker, Overlay, ReferenceSignal, ReferenceSignalLocation, CameraCalibration, AugmentedRegion
Connection to sensors defined in MPEG-V:
Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude, Local camera(s)
Compressed media:
Image, (3D) sound, (3D) video, 2D/3D graphics
The whole used to be the message
Classic Books: the value is in the content as a whole
Today the link adds value to the message
On-line knowledge: the value is in the link
The video used to be the message
Classic video content: the value is in the content as a whole
Next the link will add value to the video message
New video content: the value is in the link
From EU FP7 BRIDGET project
An unequal fight
Many new services – all more demanding in bandwidth
Compression improves, but cannot cope with all the demands by itself: UHD has 4 times the uncompressed bitrate of HD, but HEVC compresses “only” twice as well as AVC. And we have HDR, WCG, SCC, FTV…
At prime time, 30% of USA internet traffic is taken by Netflix
We need more tools to solve the problem
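The arithmetic behind that claim can be made explicit: if each resolution step multiplies the uncompressed bitrate by 4 while the new codec only halves the compressed bitrate, delivered bitrates still double at every step. A back-of-envelope sketch (the factors are the slide's round numbers, not measurements):

```python
# Slide figures, treated as rough round numbers
raw_growth = 4    # UHD carries ~4x the uncompressed bitrate of HD
codec_gain = 2    # HEVC roughly halves the bitrate relative to AVC

# Net growth in delivered bitrate per resolution generation
net_growth = raw_growth / codec_gain
print(net_growth)  # 2.0: UHD-over-HEVC still needs ~2x the bandwidth of HD-over-AVC
```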
The mobile industry perspective
10 × more spectrum
× 10 × better spectrum utilisation
× 10 × more base stations
= 1000 × more capacity
Making the network smarter
Video has the lion’s share of internet traffic – more so as we add more dimensions to the user experience
We need to cope with (human and vehicle) mobility: more and more of human life happens on the move
We need new, smarter approaches instead of just throwing more network capacity at the problem, beyond:
Digital video recording (on premises or networked)
Peer-to-Peer (P2P) overlays
Content Distribution Networks (CDNs)
Video and Information Centric Networking
Information Centric Network over the IP network: the same content is available at different network locations
Migration path from today’s IP infrastructure to pub/sub support for ICN
Client - content - network mobility under energy consumption constraints
From FP7/NICT EU-JAPAN GreenICN project
Green MPEG
[Diagram: Green Metadata architecture – on the sender side, Green Metadata Generators attached to the Media Pre-processor and Media Encoder produce Green Metadata carried alongside the encoded media; on the receiver side, a Green Metadata Extractor feeds Power Optimization Modules that drive Power Control in the Media Decoder and Presentation Subsystem, with Green Feedback returned to the sender]
http://mpeg.chiariglione.org/