TRANSCRIPT
The Long March (長征) to 3D Video
Leonardo Chiariglione
Speech at 3D Systems and Applications
Seoul – 2014/05/28
It has already been a long march
Analogue:
Printing
Photography
Telegraphy
Telephony
Audio recording
Radio
Television
Video recording

Digital:
Video conference
Video telephony
Video
Interactive television
(3D TV)
The dimensions of future media
Time/space resolution
Screen content
Colour
Brightness
Scalability
3D Video
3D Audio
Metadata
File format
Sensors/actuators
Human interaction
Fusion of real & virtual
Detection/analysis
Linking
Energy saving
User profile
There has been progress in resolution
QSIF
SIF
Standard Definition (interlace)
High Definition (Interlaced/progressive)
4k (Progressive)
8k (Progressive)
The cost of being digital
Video:
             "VHS"   SD      HD      4k      8k
#lines       288     576     1080    2160    4320
#pixels      360     720     1920    3840    7680
Frame freq.  25      25      25      50      50
Mbit/s       41      166     829     6636    26542

Audio:
                Speech  CD      Stereo  5.1     22.2
Sampling freq.  8       44.1    48      48      48
bits/sample     8       16      16      16      16
#channels       1       2       2       5.33    22.66
Mbit/s          0.064   1.411   1.536   4.093   17.403
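The uncompressed bitrates in these tables follow from simple products: lines × pixels per line × frame frequency × bits per pixel for video, and sampling frequency × bits per sample × channels for audio. The video figures are consistent with 16 bits/pixel, i.e. 8-bit 4:2:2 chroma sampling (an assumption – the slide does not state it). A minimal sketch:

```python
def video_mbit_s(lines, pixels_per_line, frame_freq, bits_per_pixel=16):
    """Uncompressed video bitrate in Mbit/s.

    16 bits/pixel matches the table's numbers and corresponds to
    8-bit 4:2:2 chroma sampling (an assumption, not stated on the slide).
    """
    return lines * pixels_per_line * frame_freq * bits_per_pixel / 1e6


def audio_mbit_s(sampling_khz, bits_per_sample, channels):
    """Uncompressed audio bitrate in Mbit/s."""
    return sampling_khz * 1e3 * bits_per_sample * channels / 1e6


print(round(video_mbit_s(1080, 1920, 25)))   # HD -> 829
print(round(video_mbit_s(4320, 7680, 50)))   # 8k -> 26542
print(round(audio_mbit_s(44.1, 16, 2), 3))   # CD -> 1.411
```

Running the helpers against the other columns reproduces the remaining table entries.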
Compression is making progress affordable
        Base      Scalable  Stereo  Depth  Selectable viewpoint  yr
MP1     ~VHS      -         -       -      -                     92
MP2     2 Mbit/s  -10%      -15%    -      -                     94
MP4-V   -25%      -10%      -15%    -      -                     98
AVC     -30%      -25%      -25%    -20%   5/10%                 03
HEVC    -60%      -25%      -25%    -20%   5/10%                 13
?       ?         ?         ?       ?      ?                     ?
Are there limits to compression?
Input bandwidth to humans:
Eyes: 2 channels of 430–790 THz
Ears: 2 channels of 20 Hz – 20 kHz
A nerve fiber connecting the senses to the brain can transmit a new impulse every ~6 ms = 167 spikes/s (1 bit ≈ 16 spikes)
Eye: 1.2 M fibers transmitting 10 bit/s each – an eye sends ~12 Mbit/s to the brain
Ear: 30 k fibers in the cochlear nerve – an ear sends ~300 kbit/s to the brain
Sensors-to-brain bitrates
[Figure: eye – 430–790 THz input, 1.2 M nerve fibers, ~12 Mbit/s; ear – 0.020–20 kHz input, 30 k nerve fibers, ~0.3 Mbit/s]
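The eye and ear bitrates quoted above can be rechecked from the per-fiber figures (one impulse every ~6 ms, ~16 spikes per bit – the speaker's approximation, not a hard physiological constant):

```python
# Figures taken from the slide; all are order-of-magnitude approximations.
spikes_per_s = 1 / 0.006             # one impulse every ~6 ms -> ~167 spikes/s
bits_per_fiber = spikes_per_s / 16   # ~16 spikes per bit -> ~10 bit/s per fiber

eye_mbit_s = 1.2e6 * 10 / 1e6        # 1.2 M optic-nerve fibers -> 12.0 Mbit/s
ear_kbit_s = 30e3 * 10 / 1e3         # 30 k cochlear-nerve fibers -> 300.0 kbit/s
print(round(bits_per_fiber, 1), eye_mbit_s, ear_kbit_s)
```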
High Dynamic Range and Wider Color Gamut
Higher dynamic range and a wider color gamut can give users a better sense of “being there”, with a viewing experience closer to real life.
Light bulb: > 10,000 nits
Surface lit by sunlight: > 100,000 nits
Night sky: < 0.005 nits
Question: if dynamic range and the volume of the color gamut increase significantly, can existing MPEG video coding standards efficiently support future needs?
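One way to quantify the gap: the slide's extremes span a dynamic range of 100,000 / 0.005 = 2×10⁷, i.e. roughly 24 photographic stops – well beyond what conventional SDR video was designed to represent. A quick check:

```python
import math

# Luminance extremes quoted on the slide, in nits (cd/m²)
brightest = 100_000   # surface lit by sunlight
darkest = 0.005       # night sky

# Dynamic range expressed in photographic stops (doublings of luminance)
stops = math.log2(brightest / darkest)
print(round(stops, 1))  # 24.3
```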
Wider Color Gamut
ITU-R BT.709 vs. ITU-R BT.2020
Dynamic Range – Examples
Bright areas can have > 10,000 cd/m² luminance
Dark areas can have < 0.01 cd/m² luminance
Screen Content applications
Wireless display
Companion screen
Control rooms with high resolution display wall
Digital operating room (DiOR)
Virtual desktop infrastructure (VDI)
Screen/desktop sharing and collaboration
Cloud computing and gaming
Factory automation display
Supervisory control and data acquisition (SCADA) display
Automotive/navigation display
PC over IP (PCoIP)
Ultra-thin client
Remote sensing
Use case #1: Hi-res display wall
Use case #2: collaboration
Use case #3: DiOR
Where we are
January 2014: Joint Call for Proposals for Coding of Screen Content
April 2014: proposal evaluation. Conclusion: evidence that significantly improved coding efficiency can be obtained by exploiting screen content characteristics with novel dedicated coding tools
April 2014: standardization plan and tentative timeline
First Test Model: Apr. 2014
PDAM: Oct. 2014
DAM: Feb. 2015
FDAM: Oct. 2015
Test sequence #1 (text and graphics with motion)
Test sequence #2 (text and graphics with motion)
Test sequence #3 (mixed content)
Test sequence #4 (animation)
MPEG standards for coding multiple cameras
A long history, starting from MPEG-2 (mid 1990s)
MPEG standards (existing and under development):
Multiview coding – can only display views captured at the source
Depth-based coding – can also display a limited number of additional views
Camera arrangement: cameras are assumed to be linearly arranged
Free viewpoint television (FTV)/1
Free viewpoint television (FTV): a hypothetical 3D transmission system that enables a viewer to select arbitrary viewpoints, inside and outside a scene
FTV requires many technologies, not just from MPEG
A 3D video format supporting the generation of views not already included in the bitstream generated by the encoder would be a major enabler for FTV.
Purpose of the MPEG FTV exploration: to develop the know-how that will enable MPEG to specify such a 3D video format
Free viewpoint television (FTV)/2
Areas considered in the MPEG FTV exploration
Compare and evaluate the depth quality attainable with general camera arrangements
Evaluate view synthesis algorithms and improve their performance
Investigate the coding efficiency of the most promising coding technologies currently available
Investigate the influence of mis-registration on view synthesis performance
Investigate the representation capability of BIFS to clarify the elements that need to be standardized
FTV Seminar
A Viewing Revolution in the Making
Date: 8 July 2014, 14:00–18:00
Venue: Main Hall B, Sapporo Convention Center
Sapporo, Japan
Exhibition of FTV demos
Room 101, 10:00-17:00, July 1 to 4.
3D Audio – NHK Loudspeaker Array Frame
Parallel worlds
For centuries humans have been building two different types of worlds: Physical and Informational
Books
Music
Films
Knowledge
Immersion
A definition of immersion: a state in which a human’s connections with the Physical world are severed and connections with the Informational world are activated
How far is immersion progressing?
Fairly… …or too far?
Can we reconnect the two worlds?
Smartphones:
Enable universal access to the Informational world while also sensing the Physical world
Enhance the history and meaning of the real world with powerful digital elements
Let’s create two-way bridges:
Extend reality to the virtual
Add reality to the virtual
Physical & Informational
Functions of an Augmented Reality browser:
Retrieve the scenario from the internet
Start video acquisition and track objects
Recognise objects and recover camera pose
Get streamed 3D graphics and compose new scenes
Get input from various sensors
Access interaction possibilities and objects from a remote server
Adapt to offer an optimal AR experience
The AR technology chain
[Diagram: the AR technology chain – Authoring Tools produce an MPEG ARAF (Augmented Reality Application Format) scene; the ARAF Browser connects the User to Media Servers, Service Servers, and Local/Remote Sensors & Actuators, bridging the Local and Remote Real World Environments]
Augmented Reality Application Format
A set of MPEG-4 scene graph nodes:
Audio, image, video, graphics, programming, communication, user interactivity, animation
Map, MapMarker, Overlay, ReferenceSignal, ReferenceSignalLocation, CameraCalibration, AugmentedRegion
Connection to sensors defined in MPEG-V:
Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude, Local camera(s)
Compressed media:
Image, (3D) sound, (3D) video, 2D/3D graphics
The whole used to be the message
Classic Books: the value is in the content as a whole
Today the link adds value to the message
On-line knowledge: the value is in the link
The video used to be the message
Classic video content: the value is in the content as a whole
Next the link will add value to the video message
New video content: the value is in the link
From EU FP7 BRIDGET project
An unequal fight
Many new services – all more demanding in bandwidth
Compression improves, but cannot cope with all the demands by itself: UHD has 4 times the uncompressed bitrate of HD, but HEVC compresses “only” twice as well as AVC. And we have HDR, WCG, SCC, FTV…
At prime time, 30% of USA internet traffic is taken by Netflix
We need more tools to solve the problem
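The arithmetic behind that claim can be made explicit: if each resolution step multiplies the uncompressed bitrate by 4 while the new codec only halves the compressed bitrate, delivered bitrates still double at every step. A back-of-envelope sketch (the factors are the slide's round numbers, not measurements):

```python
# Slide figures, treated as rough round numbers
raw_growth = 4    # UHD carries ~4x the uncompressed bitrate of HD
codec_gain = 2    # HEVC roughly halves the bitrate relative to AVC

# Net growth in delivered bitrate per resolution generation
net_growth = raw_growth / codec_gain
print(net_growth)  # 2.0: UHD-over-HEVC still needs ~2x the bandwidth of HD-over-AVC
```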
The mobile industry perspective
10 × more spectrum
× 10 × better spectrum utilisation
× 10 × more base stations
= 1000 × more capacity
Making the network smarter
Video has the lion’s share of internet traffic – more so as we add more dimensions to the user experience
We need to cope with (human and vehicle) mobility: more and more of human life happens on the move
We need new, smarter approaches instead of just throwing more network capacity at the problem, beyond:
Digital video recording (on premises or networked)
Peer-to-Peer (P2P) overlays
Content Distribution Networks (CDNs)
Video and Information Centric Networking
Information Centric Network over the IP network: the same content is available at different network locations
Migration path from today’s IP infrastructure to pub/sub support for ICN
Client - content - network mobility under energy consumption constraints
From FP7/NICT EU-JAPAN GreenICN project
Green MPEG
[Diagram: Green Metadata architecture – on the sender side, Green Metadata Generators attached to the Media Pre-processor and Media Encoder produce Green Metadata carried alongside the encoded media; on the receiver side, a Green Metadata Extractor feeds Power Optimization Modules that drive Power Control in the Media Decoder and Presentation Subsystem, with Green Feedback returned to the sender]
http://mpeg.chiariglione.org/