Recent advances in quality of experience in multimedia communication

IMTC 20th Anniversary Forum – Porto, Portugal
Touradj Ebrahimi, [email protected]
International Multimedia Telecommunications Consortium


DESCRIPTION

Presentation covers various aspects of defining and measuring Quality of Experience in IP multimedia communications, with emphasis on video. Presented at the IMTC 20th Anniversary Forum.

TRANSCRIPT

Page 1: Recent advances in quality of experience in multimedia communication

IMTC 20th Anniversary Forum – Porto, Portugal

Recent advances in quality of experience in multimedia communication

Touradj Ebrahimi, [email protected]

International Multimedia Telecommunications Consortium

Page 2: Recent advances in quality of experience in multimedia communication

Today we will talk about…

• What is "quality"?
• Origins of "quality" evaluation in telecommunications
• How is "quality" measured in multimedia communications today?
• What are the trends on "quality" in multimedia communication?
• What are the challenges ahead?

Page 3: Recent advances in quality of experience in multimedia communication

Quality: a simple yet difficult concept

• Like many human sensations, quality is easy to understand but difficult to define
• Quality according to Wikipedia:
  – A quality (from Latin qualitas) is an attribute or a property
  – Some philosophers assert that a quality cannot be defined

Page 4: Recent advances in quality of experience in multimedia communication

A fundamental, ancient, but largely under-investigated concept

Aristotle classified every object of human apprehension into 10 categories:
– Substance
– Quantity
– Quality
– Relation
– Place
– Time
– Position
– State
– Action
– Affection

Page 5: Recent advances in quality of experience in multimedia communication

Qualinet white paper

• White Paper produced by COST Action IC1003 (Qualinet):
  http://www.qualinet.eu/images/stories/whitepaper_v1.1_dagstuhl_output_corrected.pdf
• Several definitions of quality in multimedia systems and services and other related concepts

Page 6: Recent advances in quality of experience in multimedia communication


Quality is like an elephant …

The blind men and the elephant: Poem by John Godfrey Saxe

Page 7: Recent advances in quality of experience in multimedia communication


Quality in telecommunications

Page 8: Recent advances in quality of experience in multimedia communication

Quality in telecommunications

• Network Quality: Capacity, Coverage, Handoff
• Link Quality: Bitrate, Frame/Bit/Packet loss, Delay
• User Quality: Speech fidelity, Audio fidelity, Image fidelity, Video fidelity

Page 9: Recent advances in quality of experience in multimedia communication


Quality in telecommunications

Objective: input = output

Page 10: Recent advances in quality of experience in multimedia communication

QoS in telecommunications

• Quality of Service (QoS)
  – Resource reservation control mechanisms
  – Ability to provide different priority to different applications or data flows
  – Guarantee a certain level of performance (quality) to a data flow
• Requires measuring or estimating quality
• (Service) Provider-centric concept
  – Tightly related to the concept of Mean Opinion Score (MOS)

Page 11: Recent advances in quality of experience in multimedia communication

Mean Opinion Score (MOS)?

• Widely used in many fields:
  – Politics/Elections
  – Marketing/Advertisement
  – Food industry
  – Multimedia
  – …
• The likely level of satisfaction with a service or product as appreciated by an average user in a typical use
• Should be performed such that it generates reliable and reproducible results
  – Subjective evaluation methodology
  – More complex and difficult than it seems
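Producing a MOS is, at minimum, averaging subjective ratings and attaching an uncertainty estimate. A minimal sketch with hypothetical ratings and the standard normal-approximation confidence interval (not part of the slides):

```python
import math

def mos_with_ci(scores, z=1.96):
    """Mean Opinion Score with a normal-approximation 95% confidence interval."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance of the individual opinions
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    ci = z * math.sqrt(var / n)  # half-width of the confidence interval
    return mean, ci

# Hypothetical ratings from 10 subjects on a 5-point scale
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # MOS = 4.00 +/- 0.41
```

The confidence interval is what makes the "reliable and reproducible" requirement concrete: with few subjects or a high spread of opinions, the interval widens and the MOS alone says little.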

Page 12: Recent advances in quality of experience in multimedia communication


What is behind a MOS?

Page 13: Recent advances in quality of experience in multimedia communication

What is wrong with MOS?

• Ground truth
  – It is assumed to be the optimal solution (ground truth)
• Defies user preference
  – It is assumed to be independent of specific subjects
• Ignores key parameters
  – It is often assumed to be independent of content and context
• Misuse as a comparison approach
  – It is often used to compare performance between alternative tools for a same task/operation
• Impractical or expensive
• Statistical assumptions not always realistic

Page 14: Recent advances in quality of experience in multimedia communication


Ground truth

Page 15: Recent advances in quality of experience in multimedia communication


Defies user preference

Page 16: Recent advances in quality of experience in multimedia communication


Content

(Figure: quality scores of processed versions p01, p06 and p10 for three contents: bike, cafe, woman)

Page 17: Recent advances in quality of experience in multimedia communication


Context

Page 18: Recent advances in quality of experience in multimedia communication

Objective quality metrics

• Subjective tests are time consuming, expensive, difficult to design, and cannot be used in real-time systems
• Objective algorithms, i.e. metrics, estimating MOS with a high level of correlation are desired
  – Full reference metrics
  – No reference metrics
  – Reduced reference metrics

Page 19: Recent advances in quality of experience in multimedia communication

Objective quality metrics based on fidelity

• Full Reference scenario
• Metrics which look at the fidelity of the signal when compared to an explicit reference:

  processed signal = perfect quality reference signal + error signal
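With this decomposition, the fidelity metrics listed on the next slide follow directly from the error signal. For a signal of $N$ samples with peak value $\mathit{MAX}$ (255 for 8-bit images), the standard formulation is:

```latex
e[i] = \hat{x}[i] - x[i], \qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} e[i]^2, \qquad
\mathrm{PSNR} = 10 \log_{10} \frac{\mathit{MAX}^2}{\mathrm{MSE}}
```

where $x$ is the perfect quality reference signal and $\hat{x}$ the processed signal.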

Page 20: Recent advances in quality of experience in multimedia communication

Objective quality metrics based on fidelity

• Examples of fidelity metrics for image quality assessment:
  – Mean Square Error (MSE)
  – Peak Signal to Noise Ratio (PSNR)
  – Maximum Pixel Deviation (Linf)
  – Weighted PSNR
  – Masked PSNR
  – Structural SIMilarity (SSIM)
  – Multiscale Structural Similarity (MSSIM)
  – Visual Information Fidelity (VIF)
  – etc.
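As a rough illustration, the first three metrics in this list can be sketched in a few lines of pure Python (grayscale signals as flat pixel lists; the values used below are illustrative):

```python
import math

def mse(ref, proc):
    """Mean Square Error between reference and processed pixel lists."""
    return sum((r - p) ** 2 for r, p in zip(ref, proc)) / len(ref)

def psnr(ref, proc, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical signals."""
    m = mse(ref, proc)
    return float("inf") if m == 0 else 10.0 * math.log10(peak ** 2 / m)

def linf(ref, proc):
    """Maximum pixel deviation (L-infinity norm of the error signal)."""
    return max(abs(r - p) for r, p in zip(ref, proc))

reference = [10, 20, 30, 40]
processed = [12, 20, 28, 40]  # error signal: [2, 0, -2, 0]
print(mse(reference, processed))   # 2.0
print(round(psnr(reference, processed), 2))
print(linf(reference, processed))  # 2
```

All three operate purely on the error signal, which is exactly why they inherit the weaknesses discussed on the next slide: they know nothing about content, context, or the viewer.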

Page 21: Recent advances in quality of experience in multimedia communication

What is wrong with objective quality metrics?

• Poor performance in predicting MOS
  – Especially in real life situations
• How to deal with color components?
  – Even in PSNR!
• How to deal with video?
  – Time versus spatial dimension
• How to take into account context?
  – Display size/type
  – Environmental illumination
  – …
• How to deal with user preference/expectation?
• …

Page 22: Recent advances in quality of experience in multimedia communication

MVC assessment using PSNR as metric

Vetro: Update on MPEG 3D Video Coding Activity

MVC Coding Performance

(Figure: two rate-distortion plots, PSNR (dB) versus bitrate (kb/s), for the Ballroom and Race1 sequences, each comparing Simulcast against MVC)

Sample comparison of simulcast vs inter-view prediction: 8 views (640×480), considering the rate for all views; ~25% bit rate savings over all views.

Page 23: Recent advances in quality of experience in multimedia communication

MVC assessment by subjective evaluation

Vetro: Update on MPEG 3D Video Coding Activity

MVC: Subjective Performance
• Main finding: MVC achieves comparable quality to simulcast with as little as 25% rate for the dependent view

(Figure: Mean Opinion Scores for Original, Simulcast (AVC+AVC), and test points 12L_50Pct, 12L_35Pct, 12L_25Pct, 12L_20Pct, 12L_15Pct, 12L_10Pct, 12L_5Pct; base view fixed at 12 Mbps, dependent view at a varying percentage of the base view rate)

Page 24: Recent advances in quality of experience in multimedia communication


Evolving Quality Paradigms

Page 25: Recent advances in quality of experience in multimedia communication

User centered evaluation

• It is becoming increasingly important to evaluate quality from a user perspective rather than from a provider/system perspective
• Constrains the tests to potential users and stimuli and evaluates the acceptability with respect to the task and the context
• Identify underlying quality factors beside the overall quality

Page 26: Recent advances in quality of experience in multimedia communication

What people mean when they say QoE?

• “The degree of fulfillment of an intended experience on a given user”

as defined by Touradj Ebrahimi, 2001

• “The overall acceptability of an application or service … by the end user”

as defined by the ITU-T

• “The degree of delight ... of the user of an application or service ... user’s personality and current state.”

as defined in white paper by Qualinet

The term ‘experience’ promises individual engagement …

Page 27: Recent advances in quality of experience in multimedia communication

Quality of Service vs Quality of Experience

• Quality of Service: Value of the average user's experience estimated by a service/product/content provider
• Quality of Experience: Value (estimated or actually measured) of a specific user's richness of experience
• Quality of Experience is the dual (and extended) view of the QoS problem

QoS = provider-centric
QoE = user-centric

Page 28: Recent advances in quality of experience in multimedia communication

A simple model for QoE

• User attributes
  – individual attributes: expectation, age, sex, personality, background…
  – sensorial attributes, including limitations and deficiencies
  – perceptual attributes
  – emotional attributes
• System attributes
  – technical attributes (as in QoS)
• Contextual attributes
  – environmental attributes
  – device attributes
  – service attributes
  – content attributes

(Diagram: QoE as the overlap of user, QoS, and context)
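One way to make the three attribute groups concrete is as a plain data structure. The sketch below is a hypothetical illustration; the field names are invented examples, not taken from the slide:

```python
from dataclasses import dataclass, field

@dataclass
class UserAttributes:
    # Individual and sensorial attributes (illustrative fields)
    age: int = 0
    expectation: str = ""
    sensorial_limitations: list = field(default_factory=list)

@dataclass
class SystemAttributes:
    # Technical attributes, as in QoS (illustrative fields)
    bitrate_kbps: float = 0.0
    packet_loss: float = 0.0
    delay_ms: float = 0.0

@dataclass
class ContextAttributes:
    # Environmental, device, service, and content attributes
    device: str = ""
    environment: str = ""
    content_type: str = ""

@dataclass
class QoEInput:
    """QoE is modelled as a function of user, system (QoS) and context."""
    user: UserAttributes
    system: SystemAttributes
    context: ContextAttributes

q = QoEInput(UserAttributes(age=30),
             SystemAttributes(bitrate_kbps=512.0),
             ContextAttributes(device="phone"))
```

The point of the structure is the slide's point: the QoS-style system attributes are only one of three inputs, and a QoE estimate that ignores the other two is really a QoS estimate.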

Page 29: Recent advances in quality of experience in multimedia communication

Personas and scenarios [Strohmeier2009]

• Personas (user preference)
  – Archetypical user representing the needs, behaviors and goals of a particular group of users
  – Not real people but fictional characters derived from user requirements
• Scenarios (context)
  – Realistic usage environment

Page 30: Recent advances in quality of experience in multimedia communication

Quality of Experience in Mobile Multimedia

• The influence of context in mobile multimedia user experience and QoE
  – Various probes to monitor user behavior
  – Specific scenario to be followed by subjects

The test equipment selected for the experiment needed to satisfy two main constraints. Firstly, given the complexity of the measurement task, a range of different devices was necessary to capture the video consumption behaviour of the users during the experiment. The devices were required to work and interact with each other in real time. Secondly, as the experiment was conducted in a non-laboratory environment and the users needed to carry the equipment around, the devices were constrained to be lightweight and robust enough to work under moving conditions.

The test equipment consisted of a video playback system and a recording system selected with respect to the above constraints. The former was used to show the content to the user and capture information regarding the video player status and the user location. The video playback system consisted of the following items:

• an HTC Tattoo mobile phone used to display the video content and an external headset to listen to the audio
• a standard button-based video player graphical user interface, as illustrated in Figure 2, with the following functionalities:
  – pause/play
  – next channel
  – previous channel
  – menu to change channel
  – standard volume controls
• logging software to keep track of the controls activated at any time by the user
• a GPS integrated in the mobile phone used to determine the position and the mobility scenario of the user.

The recording system facilitated the capture and storage of video information recorded to determine the user's visual focus of attention. The recording system consisted of the following items:

• a head mounted Logitech Webcam Pro 9000 (cam1) to record the scene in front of the viewer
• a mobile device mounted Logitech Webcam Pro 9000 (cam2) to record the scene in front of the display of the mobile device
• a backpack with a Lenovo ThinkPad T400s laptop with a solid state disk to store simultaneously and in real time the data recorded by the two cameras.

The videos were recorded at 15 frames per second (fps). The internal clocks of the laptop and the mobile phone were synchronized before each test session. In order to allow for an automated analysis of the user's focus of attention during post-processing of the acquired data, a marker (black text 'MMSPG' on white background) was placed on each camera. A picture of the test equipment installed on a user is shown in Figure 1.

Figure 1: Test equipment system installed on a user.

Figure 2: Video GUI for the following modes: (a) play, (b) pause, (c) menu, and (d) volume control.

4.3 Data processing

The data acquired for each subject during the experiment has been processed offline. The user attention has been determined by applying an object detection algorithm [25] to detect the markers in the two sequences. In particular, the markers were placed on cam1 and cam2 in such a way that, when the user is watching the mobile phone screen, the scene recorded by cam1 includes the marker on top of the mobile phone (Figure 3(a)). Vice versa, the scene recorded by cam2 includes the user's frontal face as well as the marker on the user's head (Figure 3(b)). Additionally, the marker on cam1 was mounted on a concave support which occludes it from the field of view of cam2 when the face is perpendicular to the mobile phone screen (Figure 4(b)). This avoided the detection of translated versions of the marker, which may be present in the field of view of cam2 when the user is not looking at the mobile phone but holds it in the same position as when video consumption takes place. Thus, none of the markers is in the field of view of the cameras when the user is not looking at the phone, as shown in the sample frames in Figure 4. As a result, if the markers are detected in the frames captured at the same instant by the two cameras, then the user was identified as looking at the mobile phone screen. Vice versa, if no markers are detected, then the user was not looking at the phone at that instant.

Figure 3: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is looking at the phone screen.

Figure 4: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is not looking at the phone screen.

It is worth mentioning that we also tested a face detection algorithm [26] to detect the face of the user in the video recorded by cam2. Unfortunately, too many false positive and false negative face detections were present in the results. In particular, false positives occurred when the face was not frontal, even though the frontal face cascade classifier was used. Some false negative frontal face detections can be explained by the bad quality of the video frames, due to the user's motion and abrupt scene illumination changes. Such frame quality degradations were found to create problems for the marker detection algorithm as well; however, they were considerably fewer compared to the face detection. Furthermore, the marker detection did not show any false positives. The false negative marker detections were removed by applying a correction algorithm based on the assumption that it is impossible for the user to look away from the phone for a very short instance in time. To be precise, if the marker was not detected for a succession of k frames but it was detected in its m previous and n following frames, with m, n > k, then a correction was applied to the detection results for the k frames. We found that robust results could be achieved for k = 6, which corresponds to approximately 400 ms at 15 fps. For these reasons, the marker detection algorithm was used to process the video recorded by cam2, instead of the face detection algorithm.

Finally, bi-directional marker detection was used instead of mono-directional detection, in order to avoid any false interpretations of the user's focus of attention. For example, only the head mounted camera could have been used, detecting the marker placed on top of the mobile phone screen. Unfortunately, in this case the marker might fall in the field of view of cam1 even if the user is not looking exactly at the phone screen. The bi-directional marker detection therefore provided a more robust solution for collecting information regarding the focus of attention of the users.

Figure 5: Path of the experiment reconstructed using the GPS data from one test session.

The information regarding the detection of the marker in the frames captured at each time instance by the two cameras, together with the information regarding the video player status at the same time instance, makes it possible to identify which of the three consumption states (see Section 3.1) the user is in during the entire duration of the experiment. The GPS data additionally allows distinguishing between the mobility states (static or dynamic) based on a speed estimation of the user. The path followed during the experiment, reconstructed on the Earth map using the acquired GPS data of one user, is shown in Figure 5.

For further processing and analysis, we mapped the acquired data onto binary values as shown in Table 2. These binary values are then used to define a set of logical rules, as presented in Table 3, which subsequently serve to identify the states within the consumption and mobility scenarios and thus to parameterize the Markov model defined in Section 3.

5. RESULTS

The major result of the data acquisition test consists of providing the transition probabilities of the Markov model. Furthermore, the average state sojourn time is estimated. The outcome is used to compute potential bandwidth savings and to analyze two specific parameters leading to QoE
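The false-negative correction rule described in Section 4.3 above (fill a gap of k missed frames, k up to 6 at 15 fps, when the detected runs immediately before and after are both longer than the gap) can be sketched as follows. This is a reconstruction from the text, not the authors' code:

```python
def correct_dropouts(detections, k_max=6):
    """Fill short false-negative gaps in a binary marker-detection sequence.

    A run of k <= k_max missed frames (0s) is set to detected (1) when
    the detected runs immediately before (m frames) and after (n frames)
    are both strictly longer than the gap, i.e. m, n > k.
    """
    out = list(detections)
    # Run-length encode the sequence as (value, start, length) triples
    runs = []
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        runs.append((out[i], i, j - i))
        i = j
    # Fill qualifying gaps (interior runs of zeros only)
    for idx in range(1, len(runs) - 1):
        val, start, k = runs[idx]
        m, n = runs[idx - 1][2], runs[idx + 1][2]
        if val == 0 and k <= k_max and m > k and n > k:
            for p in range(start, start + k):
                out[p] = 1
    return out

seq = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(correct_dropouts(seq))  # gap of 2 is filled; gap of 7 is kept
```

At 15 fps the default k_max of 6 matches the paper's 400 ms threshold: shorter dropouts are treated as detector noise, longer ones as genuine looks away from the screen.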

Page 31: Recent advances in quality of experience in multimedia communication

3D QoE in mobile multimedia

• Comparison of different methods to produce 3D on mobile phones:
  – 2D
  – Anaglyph
  – Motion Parallax (Wiggle)
  – Autostereoscopic

(a) Overall quality (b) Perceived depth

Fig. 9. MOS/CI of the individual test conditions.

methods are preferred with lower quality scores in the following order: Sw, An and Aw. In other words, the wide multi-view is rated as the best restitution method in 30% of cases, Mn and 2D in 25%, Sn in 19%, etc. For the depth preference probability and quality scores, the situation is slightly different. The scores for all the 3D restitution techniques are much better when compared to the 2D restitution. Auto-stereoscopic (Sn, Sw) methods achieve the highest score, with Sn at MOS 100 rated approximately twice as good as Sw at MOS 55 and three times as good as Mw at MOS 30.

4. CONCLUSION

Alternative 3D restitution techniques for displays of commercially available portable devices have been studied within this work. Conventional 2D together with different 3D restitution techniques, including anaglyph, multi-view motion parallax and the auto-stereoscopic method, have been compared to one another in terms of overall quality and depth perception.

The subjective quality tests show that the additional depth cues provide a better depth perception when compared to simple 2D restitution. While the motion parallax method is preferred in terms of overall quality, the depth quality of 3D is clearly on the side of the auto-stereoscopic method. The well-known anaglyph method is losing in both overall and depth quality, most likely due to the inaccurate color rendering and the crosstalk of anaglyph stereoscopy.

As future work, the significant influence of camera baseline, content and size of the multiscopic image set on the overall quality and perceived depth for all restitution methods will be studied in more detail.

5. REFERENCES

[1] A. Gotchev, S. Jumisko-Pyykko, A. Boev, and D. Strohmeier, "Mobile 3DTV system: Quality and user perspective," Proc. EUMOB, 2008.

[2] Cha Zhang, D. Florencio, and Zhengyou Zhang, "Improving immersive experiences in telecommunication with motion parallax [applications corner]," IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 139-144, Jan. 2011.

[3] M. Rerabek, L. Goldmann, J. S. Lee, and T. Ebrahimi, "Motion parallax based restitution of 3D images on legacy consumer mobile devices," in Proc. IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), 2011, pp. 1-5.

[4] P. Benzie, J. Watson, P. Surman, I. Rakkolainen, K. Hopf, H. Urey, V. Sainov, and C. von Kopylow, "A survey of 3DTV displays: Techniques and technologies," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1647-1658, Nov. 2007.

[5] LG Electronics Inc., "Introduction to Real3D," Tech. Rep., LG Electronics Inc., April 2011.

[6] A. J. Woods and C. R. Harris, "Comparing levels of crosstalk with red/cyan, blue/yellow, and green/magenta anaglyph 3D glasses (proceedings paper)," 2010.

[7] David F. McAllister, Ya Zhou, and Sophia Sullivan, "Methods for computing color anaglyphs," in Stereoscopic Displays and Applications XXI.

[8] G. Peters, "Theories of three-dimensional object perception: a survey," Recent Research Developments in Pattern Recognition, 2000.

[9] Christoph Fehn, "Depth image based rendering (DIBR), compression and transmission for a new approach on 3D-TV," 2004.

[10] MPEG, "Draft report on experimental framework for 3D video coding," Tech. Rep. N11478, ISO/IEC JTC1/SC29/WG11, July 2010.

[11] MPEG video group, "Description of exploration experiments in 3D video coding," ISO/IEC JTC1/SC29/WG11, N11630, Oct. 2010.

[12] ITU-R, "Subjective assessment of stereoscopic television pictures," Tech. Rep. BT.1438, ITU-R, 2000.

[13] M. E. Glickman, "Parameter estimation in large dynamic paired comparison experiments," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 48, no. 3, pp. 377-394, 1999.

Fig. 2. Motion parallax based 3D restitution.

2.2.1. Multi-view motion parallax

The idea of a portable device restitution method based on motion parallax employing the multi-view image data-set is illustrated in Fig. 2. By rotating the portable device, the viewer controls the view which is rendered on the display, and so he/she can observe the scene from different positions. This way the viewer imitates a process similar to what he/she does when observing the real world.

When the number of images in a stereoscopic or multiscopic image set is too small to achieve a smooth restitution, intermediate images are generated using depth image based rendering [9]. The depth estimation and view synthesis tools of the 3D video coding (3DV) framework [10] developed by MPEG are used for the content generation process within our experiments. The depth estimation reference software (DERS) uses three camera views (left, center, right) together with the intrinsic and extrinsic camera parameters to estimate the depth map of the center view. Of the three operation modes (automatic, segmentation, semi-automatic) of DERS, the automatic mode has been used. Once the depth maps were obtained, the view synthesis reference software (VSRS) was used to synthesize intermediate views. A virtual view is generated based on two reference views with the corresponding depth maps as well as the intrinsic and extrinsic camera parameters. This is achieved by depth and texture mapping and hole filling for each of the reference views, followed by image blending and inpainting.

Given the multiscopic image set, the goal of the multi-view restitution is to display the appropriate view according to the relative orientation between the device and the viewer, as shown in Fig. 3.

Measuring the orientation of the device with respect to the viewer can be achieved through a built-in accelerometer. Of the three possible rotations (pan, tilt, roll) of the mobile device, only the rotation around the vertical axis (pan) is used. Given the initial pan angle and the predefined pan range, the current view can be computed based on the current pan angle. It is displayed until the pan angle reaches an angle that corresponds to another view. Therefore, the display duration of a view depends solely on the speed with which the viewer moves the display.

Fig. 3. Content restitution for the motion parallax based display.
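The view selection just described amounts to quantizing the pan angle into one of the available views. A hypothetical sketch; the pan range (in degrees) and the view count are example values, not taken from the paper:

```python
def view_for_pan(pan_deg, pan_min=-20.0, pan_max=20.0, num_views=21):
    """Map the device pan angle to a view index of the multiscopic set.

    The predefined pan range is divided evenly among the views, so a view
    stays on screen until the pan angle crosses into the next bin; display
    duration therefore depends only on how fast the viewer turns the device.
    """
    # Clamp to the predefined pan range
    pan = max(pan_min, min(pan_max, pan_deg))
    fraction = (pan - pan_min) / (pan_max - pan_min)
    return min(int(fraction * num_views), num_views - 1)

print(view_for_pan(-20.0))  # 0  (leftmost view)
print(view_for_pan(0.0))    # 10 (center view)
print(view_for_pan(20.0))   # 20 (rightmost view)
```

With 21 views, as in the wide-baseline Mw subset, each view occupies roughly 2 degrees of pan under these example values.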

3. QUALITY EVALUATION

3.1. Image data preparation

A subset of the multi-view sequences from the MPEG1 3DVideo (3DV) data-set [11] has been used for the experiments.More specifically 4 sequences (Lovebird1, Ballons, Kendo,Mobile) from the class C set have been considered. The firstone was used for training and the latter three for testing.

Suitable frames have been selected and extracted fromeach video. Using the latest version of the depth estimationreference software (DERS 5.1) [10] and the view synthesisreference software (VSRS 3.5) [10], the required number ofviews were synthesized. Since the display application on themobile phone does not support the YUV format used by theMPEG tools, the resulting multi-view image sets were fur-ther converted to high quality JPEG images. Finally the spa-tial resolution of the multiscopic image set was adapted tomatch that of the portable device. For our experiments anAndroid based smart phone LG Optimus 3D with an auto-stereoscopic display and with screen resolution of 800x480pixels was used.

Given the resulting multi-view data-set, 7 subsets are cre-ated that simulate the different restitution methods (2D, auto-stereoscopic, anaglyph, motion parallax) for narrow 10cmand wide 20cm camera baseline leading to the following testconditions. 2D - reference 2D image, chosen to be the cen-ter view of the multi-view data-set. Sn - stereoscopic imageset2 with narrow camera baseline resulting in a smaller depth

1http://mpeg.chiariglione.org

2Stereoscopic image set consists of left and right image. The resulting in-terlaced image is rendered automatically using the LG Real3D technology [?]

Fig. 4. Creation of the individual test conditions from multi-scopic data-set.

range. Sw - stereoscopic image set with a wide camera base-line resulting in a larger depth range. Mn - multiscopic imageset of 11 images with a narrow camera baseline resulting insmaller motion parallax. Mw - multiscopic image of 21 im-ages with a wide camera baseline resulting in larger motionparallax. An - anaglyph image with narrow camera baselineresulting in a smaller depth range. Aw - anaglyph image withwide camera baseline resulting in a larger depth range.

The image data-set creation process is illustrated in Fig. 4 for a multi-view image set which consists of 5 original views (OV) and 4 intermediate synthesized views (SV) between each of them. As usual, the camera baseline corresponds to the distance between the left and right view. Since the camera distance directly affects the perceived depth but also the visual comfort, two different camera distances were considered in order to identify the optimal one for each of the restitution methods. Subsets with the narrower camera baseline (Sn, Mn and An) utilize the original views OV 10 and OV 30 as the left and right views, respectively. On the other hand, the original views OV 00 and OV 40 are used as left and right views within the wider camera baseline subsets (Sw, Mw and Aw). Since the viewing angle range of the mobile device is fixed, the motion smoothness and the depth range are larger for the wider camera baseline.
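The mapping from views to test conditions can be sketched as follows. The concrete view indices (OV 00..OV 40 at positions 0, 10, 20, 30, 40 of a 41-view set, with the multiscopic subsets sampling every other view) are assumptions inferred from the counts reported above, not taken from the paper:

```python
# Hypothetical sketch: which view indices feed each test condition.
# Assumption (not stated in the paper): the 5 original views OV00..OV40
# sit at indices 0, 10, 20, 30, 40 of the full 41-view set, and the
# multiscopic subsets sample every other view, which matches the
# reported counts (11 views for Mn, 21 for Mw).

CENTER = 20          # center view, used as the 2D reference
NARROW = (10, 30)    # OV10/OV30: narrow (10 cm) baseline
WIDE = (0, 40)       # OV00/OV40: wide (20 cm) baseline

def condition_views(cond):
    """Return the list of view indices used by a test condition."""
    table = {
        "2D": [CENTER],
        "Sn": list(NARROW),                              # stereo pair, narrow
        "Sw": list(WIDE),                                # stereo pair, wide
        "Mn": list(range(NARROW[0], NARROW[1] + 1, 2)),  # 11 views
        "Mw": list(range(WIDE[0], WIDE[1] + 1, 2)),      # 21 views
        "An": list(NARROW),   # left/right fused into one anaglyph image
        "Aw": list(WIDE),
    }
    return table[cond]
```

Under these assumptions, Mn and Mw reproduce the 11- and 21-view counts given in the condition definitions.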

3.2. Test methodology

Since judging the quality of the different 2D and 3D restitution techniques individually may be quite difficult, the stimulus comparison (SC) quality evaluation method seems to be the most suitable. The subjective tests were performed as follows. A special Android application has been developed in order to display a pair of test stimuli sequentially on a testing device. After comparing the two test stimuli, the subject was asked to choose his/her preference ("first" or "second") in terms of overall quality and depth quality. The option "same" was also included to avoid random preference selections. For each of

Fig. 5. Preference and tie probabilities of the individual pairs considering the overall quality: (a) Balloons, (b) Kendo, (c) Mobile, (d) Overall.

the 3 test video sequences (Balloons, Kendo, Mobile) all the possible combinations of the 7 test conditions (2D, Sw, Sn, Mw, Mn, Aw, An) were considered. This led to a test-set with $3 \times \binom{7}{2} = 63$ paired comparisons. Since the IPD (Inter-Pupillary Distance) of the participating subjects has not been measured, the two camera baselines, used to generate the narrow and wide subsets, are compared against each other.
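The size of the test-set follows directly from counting unordered pairs of conditions; a quick sanity check using only the standard library:

```python
import math

# Each of the 3 test sequences is evaluated over every unordered pair
# of the 7 test conditions (2D, Sw, Sn, Mw, Mn, Aw, An).
conditions = ["2D", "Sw", "Sn", "Mw", "Mn", "Aw", "An"]
pairs_per_sequence = math.comb(len(conditions), 2)  # C(7, 2) = 21
total_comparisons = 3 * pairs_per_sequence          # 3 * 21 = 63
```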

Fifteen subjects (11 male and 4 female) with an average age of 29.3 participated in the subjective test experiments. They reported normal or corrected-to-normal vision according to [12]. All of them were non-expert viewers with marginal experience in 3D image and video viewing.

3.3. Results and discussion

Statistical tools have been applied in order to analyze the individual users' preference ratings for the different scenes and test conditions.

The simplest way to analyze a set of paired comparisons is to compute the distribution of the votes over the different categorical levels (first, same, second) and to normalize it by the number of subjects. This can be done individually for each video sequence or jointly over all of them. Fig. 5 and Fig. 6 show the resulting probabilities for the overall quality and the perceived depth, respectively.
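As a minimal sketch of the normalization described above (the function name and the vote list are illustrative assumptions, not the authors' code):

```python
from collections import Counter

def vote_probabilities(votes):
    """Normalize raw votes for one compared pair into probabilities.

    votes: one entry per subject, each being 'first', 'same' or 'second'.
    """
    counts = Counter(votes)
    n = len(votes)
    # Divide each category count by the number of subjects.
    return {level: counts[level] / n for level in ("first", "same", "second")}

# Example: hypothetical votes of 15 subjects for one pair of conditions.
votes = ["first"] * 10 + ["same"] * 2 + ["second"] * 3
probs = vote_probabilities(votes)  # e.g. probs["first"] == 10 / 15
```

Repeating this per video sequence, or over the pooled votes of all sequences, yields the preference and tie probabilities plotted in Fig. 5 and Fig. 6.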

With respect to the overall quality, the results across the different scenes are quite similar. Multi-view (Mn, Mw) is rated similarly to 2D and, in general, both are perceived as much better (preference probability between 60% and 90%) when compared to anaglyph (An, Aw). The comparison of multi-view and 2D with auto-stereoscopic (Sn, Sw) brings quite competitive results, with a slight advantage on the side of multi-view


Trends in QoE

• Digital world has (re-)discovered the notion of quality
  – Lower quality content is increasingly less tolerated by end-users
  – Digital technology has surpassed barriers of attainable quality, while remaining cost effective
• Increasing interest in QoE
  – Extending from device-centric and system-centric quality optimization to end-to-end and especially user-centric optimization


Trends in QoE community building

• Increased interest in workshops and conferences around the notion of quality assessment and metrics
  – QoMEX: International Workshop on Quality of Multimedia Experience (http://www.qomex.org)
  – VPQM: International Workshop on Video Processing and Quality Metrics for Consumer Electronics (http://www.vpqm.org)
  – …
• QoE is one of the issues referred to in research programs funded by the EC
  – Various calls in recent years include QoE issues
  – QoE is one of the top topics in the future framework program
  – Already ongoing initiatives such as COST Action IC1003 (Qualinet)


Trends in standardization

• Standardization efforts in quality assessment and metrics
  – Video Quality Experts Group (VQEG)
  – ITU-T SG 12 (Performance, QoS and QoE)
  – JPEG (Advanced Image Coding - AIC, HDR image compression)
  – MPEG (HEVC, 3DVC)
  – …


Challenges ahead

• Some key issues in QoE:
  – Content-dependent quality assessment methods and metrics
  – Context-dependent quality assessment methods and metrics
  – Quality assessment methods and metrics beyond AV (haptics, …)
  – Multi-modal quality assessment methods and metrics (AV, …)
  – 3D quality assessment methods and metrics (3D sound, 3D video, …)
  – HDR content quality assessment methods and metrics
  – Interaction quality metrics (closely related to usability)
  – Presence/immersion quality metrics
  – …
• Need for Quality Certification Mechanisms of multimedia services and products
  – Similar in idea to ISO 9000 series


What does this all mean to you?

• Era of user-centric multimedia has already started
  – It is no longer sufficient to merely add new features and functionalities to multimedia systems
  – The true added value of such features and functions, in terms of impact on user experience, should be demonstrated
  – Quality of Experience plays a central role in this new game

• Opportunities in research, technology, business, art and entertainment


Thank you for your attention. Questions?