Recent advances in quality of experience in multimedia communication
DESCRIPTION
The presentation covers various aspects of defining and measuring the Quality of Experience in IP multimedia communications, with an emphasis on video. Presented at the IMTC 20th Anniversary Forum.

TRANSCRIPT
IMTC 20th Anniversary Forum – Porto, Portugal
Recent advances in quality of experience in
multimedia communication
Touradj Ebrahimi ([email protected])
International Multimedia Telecommunications Consortium
Today we will talk about…
• What is “quality”?
• Origins of “quality” evaluation in telecommunications
• How is “quality” measured in multimedia communications today?
• What are the trends on “quality” in multimedia communication?
• What are the challenges ahead?
Quality: a simple yet difficult concept
• Like many human sensations, quality is easy to understand but difficult to define
• Quality according to Wikipedia:
  – A quality (from Latin qualitas) is an attribute or a property
  – Some philosophers assert that a quality cannot be defined
A fundamental, ancient, but largely under-investigated concept
Aristotle classified every object of human apprehension into 10 categories:
– Substance
– Quantity
– Quality
– Relation
– Place
– Time
– Position
– State
– Action
– Affection
Qualinet white paper
• White Paper produced by COST Action IC1003 (Qualinet):
  – http://www.qualinet.eu/images/stories/whitepaper_v1.1_dagstuhl_output_corrected.pdf
• Several definitions of quality in multimedia systems and services and other related concepts
Quality is like an elephant …
The blind men and the elephant: Poem by John Godfrey Saxe
Quality in telecommunications
Quality in telecommunications
• Network Quality: capacity, coverage, handoff
• Link Quality: bitrate, frame/bit/packet loss, delay
• User Quality: speech fidelity, audio fidelity, image fidelity, video fidelity
Quality in telecommunications
Objective: input = output
QoS in telecommunications
• Quality of Service (QoS)
  – Resource reservation control mechanisms
  – Ability to provide different priority to different applications or data flows
  – Guarantee a certain level of performance (quality) to a data flow
• Requires measuring or estimating quality
• (Service) Provider-centric concept
  – Tightly related to the concept of Mean Opinion Score (MOS)
Mean Opinion Score (MOS)?
• Widely used in many fields:
  – Politics/Elections
  – Marketing/Advertisement
  – Food industry
  – Multimedia
  – …
• The likely level of satisfaction with a service or product as appreciated by an average user in typical use
• Should be performed such that it generates reliable and reproducible results
  – Subjective evaluation methodology
  – More complex and difficult than it seems
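As an illustration of how a MOS and its reliability are typically reported, a minimal sketch (not part of the original slides; the ratings and function name are hypothetical) computes the mean score and a normal-approximation 95% confidence interval from individual subject ratings:

```python
import statistics

def mos_with_ci(scores, z=1.96):
    """Mean Opinion Score and half-width of a ~95% confidence interval
    for one test condition, from individual subject ratings on the usual
    5-point ACR scale (1 = bad ... 5 = excellent)."""
    n = len(scores)
    mos = statistics.mean(scores)
    # standard error of the mean; z = 1.96 gives the ~95% normal interval
    half_width = z * statistics.stdev(scores) / n ** 0.5
    return mos, half_width

# hypothetical ratings from ten subjects for one stimulus
mos, ci = mos_with_ci([4, 5, 3, 4, 4, 5, 4, 3, 4, 4])
```

The confidence interval is what makes the score reproducible in the sense of the bullet above: a MOS reported without one says nothing about how much subjects disagreed.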
What is behind a MOS?
What is wrong with MOS?
• Ground truth
  – It is assumed to be the optimal solution (ground truth)
• Defies user preference
  – It is assumed to be independent of specific subjects
• Ignores key parameters
  – It is often assumed to be independent of content and context
• Misuse as a comparison approach
  – It is often used to compare performance between alternative tools for the same task/operation
• Impractical or expensive
• Statistical assumptions not always realistic
Ground truth
Defies user preference
Content
[Chart: per-subject quality ratings (p01, p06, p10) for three contents: bike, cafe, woman]
Context
Objective quality metrics
• Subjective tests are time consuming, expensive, difficult to design, and cannot be used in real-time systems
• Objective algorithms, i.e. metrics, estimating MOS with a high level of correlation are desired
  • Full reference metrics
  • No reference metrics
  • Reduced reference metrics
Objective quality metrics based on fidelity
• Full Reference scenario
• Metrics which look at the fidelity of the signal when compared to an explicit reference:

  processed signal = perfect quality reference signal + error signal
Objective quality metrics based on fidelity
• Examples of fidelity metrics for image quality assessment
  – Mean Square Error (MSE)
  – Peak Signal to Noise Ratio (PSNR)
  – Maximum Pixel Deviation (Linf)
  – Weighted PSNR
  – Masked PSNR
  – Structural SIMilarity (SSIM)
  – Multiscale Structural Similarity (MSSIM)
  – Visual Information Fidelity (VIF)
  – etc.
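As a sketch of the two simplest fidelity metrics listed above, MSE and PSNR can be computed as follows (illustrative only; images are flattened to lists of pixel values, and the helper names are ours):

```python
import math

def mse(ref, dist):
    """Mean Square Error between a reference and a distorted image,
    given here as equally sized flat lists of pixel values."""
    return sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)

def psnr(ref, dist, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; `peak` is the maximum pixel
    value (255 for 8-bit images). Infinite for identical images."""
    e = mse(ref, dist)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)

reference = [100, 120, 130, 140]
distorted = [101, 118, 130, 141]
# squared errors are 1, 4, 0, 1, so MSE = 1.5
```

Both metrics operate purely on pixel differences, which is exactly why the next slide's criticisms apply: they know nothing about content, context, or the viewer.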
What is wrong with objective quality metrics?
• Poor performance in predicting MOS
  – Especially in real-life situations
• How to deal with color components?
  – Even in PSNR!
• How to deal with video?
  – Time versus spatial dimension
• How to take into account context?
  – Display size/type
  – Environmental illumination
  – …
• How to deal with user preference/expectation?
• …
MVC assessment using PSNR as metric
Vetro: Update on MPEG 3D Video Coding Activity
MVC Coding Performance
[Charts: PSNR (dB) versus bitrate (kb/s) for the Ballroom and Race1 sequences, comparing Simulcast and MVC]
Sample comparison of simulcast vs inter-view prediction: 8 views (640×480); considering the rate for all views, ~25% bit rate savings over all views.
MVC assessment by subjective evaluation
MVC: Subjective Performance
• Main finding: MVC achieves comparable quality to simulcast with as little as 25% rate for the dependent view
[Chart: Mean Opinion Score for the conditions Original, Simulcast (AVC+AVC), and 12L_50Pct through 12L_5Pct; base view fixed at 12 Mbps, dependent view at a varying percentage of the base view rate]
Evolving Quality Paradigms
User centered evaluation
• It is becoming increasingly important to evaluate quality from a user perspective rather than from a provider/system perspective
• Constrains the tests to potential users and stimuli and evaluates the acceptability with respect to the task and the context
• Identifies underlying quality factors besides the overall quality
What do people mean when they say QoE?
• “The degree of fulfillment of an intended experience on a given user”
as defined by Touradj Ebrahimi, 2001
• “The overall acceptability of an application or service … by the end user”
as defined by the ITU-T
• “The degree of delight ... of the user of an application or service ... user’s personality and current state.”
as defined in white paper by Qualinet
The term ‘experience’ promises individual engagement …
Quality of Service vs Quality of Experience
• Quality of Service: value of the average user’s experience estimated by a service/product/content provider
• Quality of Experience: value (estimated or actually measured) of a specific user’s richness of experience
• Quality of Experience is the dual (and extended) view of the QoS problem
QoS = provider-centric
QoE = user-centric
A simple model for QoE
• User attributes
  – individual attributes – expectation, age, sex, personality, background…
  – sensorial attributes – including limitations and deficiencies
  – perceptual attributes
  – emotional attributes
• System attributes
  – technical attributes (as in QoS)
• Contextual attributes
  – environmental attributes
  – device attributes
  – service attributes
  – content attributes
[Diagram: QoE as the overlap of user, QoS (system), and context]
Personas and scenarios [Strohmeier2009]
• Personas (user preference)
  – Archetypical users representing the needs, behaviors and goals of a particular group of users
  – Not real people but fictional characters derived from user requirements
• Scenarios (context)
  – Realistic usage environment
Quality of Experience in Mobile Multimedia
• The influence of context in mobile multimedia user experience and QoE
  – Various probes to monitor user behavior
  – Specific scenario to be followed by subjects
The test equipment selected for the experiment needed to satisfy two main constraints. Firstly, given the complexity of the measurement task, a range of different devices was necessary to capture the video consumption behaviour of the users during the experiment. The devices were required to work and interact with each other in real time. Secondly, as the experiment was conducted in a non-laboratory environment and the users needed to carry the equipment around, the devices were constrained to be lightweight and robust enough to work under moving conditions.

The test equipment consisted of a video playback system and a recording system selected with respect to the above constraints. The former was used to show the content to the user and capture information regarding the video player status and the user location. The video playback system consisted of the following items:

• an HTC Tattoo mobile phone used to display the video content and an external headset to listen to the audio
• a standard button-based video player graphical user interface, as illustrated in Figure 2, with the following functionalities:
  – pause/play
  – next channel
  – previous channel
  – menu to change channel
  – standard volume controls
• logging software to keep track of the controls activated at any time by the user
• a GPS integrated in the mobile phone used to determine the position and the mobility scenario of the user.

The recording system facilitated the capture and storage of video information recorded to determine the user’s visual focus of attention. The recording system consisted of the following items:

• a head-mounted Logitech Webcam Pro 9000 (cam1) to record the scene in front of the viewer
• a mobile-device-mounted Logitech Webcam Pro 9000 (cam2) to record the scene in front of the display of the mobile device
• a backpack with a Lenovo ThinkPad T400s laptop with a solid state disk to store simultaneously and in real time the data recorded by the two cameras.

The videos were recorded at 15 frames per second (fps). The internal clocks of the laptop and the mobile phone were synchronized before each test session. In order to allow for an automated analysis of the user’s focus of attention during post-processing of the acquired data, a marker (black text ’MMSPG’ on a white background) was placed on each camera. A picture of the test equipment installed on a user is shown in Figure 1.

Figure 1: Test equipment system installed on a user.

Figure 2: Video GUI for the following modes: (a) play, (b) pause, (c) menu, and (d) volume control.

4.3 Data processing
The data acquired for each subject during the experiment has been processed offline. The user attention has been determined by applying an object detection algorithm [25] to detect the markers in the two sequences. In particular, the markers were placed on cam1 and cam2 in such a way that, when the user is looking at the mobile phone screen, the scene recorded from cam1 will include the marker on top of the mobile phone (Figure 3(a)). Vice versa, the scene recorded from cam2 will include the user’s frontal face as well as the marker on the user’s head (Figure 3(b)). Additionally, the marker on cam1 was mounted on a concave support which occludes it from the field of view of cam2 when the face is perpendicular to the mobile phone screen (Figure 4(b)). This avoided the detection of translated versions of the marker, which may be present in the field of view of cam2 when the user is not looking at the mobile phone but holds it in the same position as when the video consumption takes place. Thus, none of the markers is in the field of view of the cameras when the user is not looking at the phone, as shown in the sample frames in Figure 4. As a result, if the markers are detected in the frames captured at the same instant from the two cameras, then the user was identified to be looking at the mobile phone screen. Vice versa, if no markers are detected, then the user was not looking at the
Figure 3: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is looking at the phone screen.

Figure 4: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is not looking at the phone screen.
phone at that instant.

It is worth mentioning that we also tested a face detection algorithm [26] to detect the face of the user in the video recorded by cam2. Unfortunately, too many false positive and false negative face detections were present in the results. In particular, false positives occurred when the face was not frontal, even though the frontal face cascade classifier had been used. Some false negative frontal face detections can be explained by the bad quality of the video frames, due to the user’s motion and abrupt scene illumination changes.

Such frame quality degradations were found to create problems for the marker detection algorithm as well; however, they were considerably fewer compared to the face detection. Furthermore, the marker detection did not show any false positives. The false negative marker detections were removed by applying a correction algorithm that is based on the assumption that it is impossible for the user to look away from the phone for a very short instance in time. To be precise, if the marker was not detected for a succession of k frames but was detected in its m previous and n following frames, with m, n > k, then a correction was applied to the detection results for the k frames. We found that robust results could be achieved for k = 6, which corresponds to approximately 400 ms at 15 fps. For these reasons, the marker detection algorithm was used to process the video recorded by cam2, instead of the face detection algorithm.

Finally, bi-directional marker detection was used instead of mono-directional detection, in order to avoid any false interpretations of the user’s focus of attention. For example, only the head-mounted camera could have been used, detecting the marker placed on top of the mobile phone screen. Unfortunately, in this case it might happen that the marker falls in the field of view of cam1 even if the user is not looking exactly at the phone screen. The bi-directional marker detection therefore provided a more robust solution for collecting information regarding the focus of attention of the users.

Figure 5: Path of the experiment reconstructed using the GPS data from one test session.
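The false-negative correction described above (fill a short gap of missed detections when the flanking detected runs are longer than the gap) can be sketched as follows; the function name and the exact flanking condition are our illustrative reading of the description, not code from the paper:

```python
def fill_detection_gaps(detections, k=6):
    """Fill short false-negative gaps in a per-frame marker-detection
    sequence (a list of booleans, one per video frame).

    A run of up to `k` undetected frames is treated as a false negative
    and flipped to detected when the detected runs before and after it
    are both longer than the gap (the m, n > k condition in the text).
    At 15 fps, k = 6 corresponds to roughly 400 ms.
    """
    out = list(detections)
    total = len(out)
    i = 0
    while i < total:
        if out[i]:
            i += 1
            continue
        # measure the undetected gap [i, j)
        j = i
        while j < total and not out[j]:
            j += 1
        gap = j - i
        # length m of the detected run just before the gap
        m, p = 0, i - 1
        while p >= 0 and detections[p]:
            m, p = m + 1, p - 1
        # length n of the detected run just after the gap
        n, q = 0, j
        while q < total and detections[q]:
            n, q = n + 1, q + 1
        # correct short gaps flanked by longer detected runs
        if gap <= k and m > gap and n > gap:
            for t in range(i, j):
                out[t] = True
        i = j
    return out
```

A three-frame dropout inside two long detected runs gets filled, while a ten-frame gap (a genuine glance away) is left untouched.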
The information regarding the detection of the marker in the frames captured at each time instant from the two cameras, together with the information regarding the video player status at the same time instant, makes it possible to identify in which of the three consumption states (see Section 3.1) the user is during the entire duration of the experiment. The GPS data additionally allows distinguishing between the mobility states (static or dynamic) based on a speed estimation of the user. The path followed during the experiment, reconstructed on the Earth map using the acquired GPS data of one user, is shown in Figure 5.

For further processing and analysis, we mapped the acquired data onto binary values as shown in Table 2. These binary values are then used to define a set of logical rules as presented in Table 3, which subsequently serve to identify the states within the consumption and mobility scenarios and thus, to parameterize the Markov model defined in Section 3.
5. RESULTS
The major result of the data acquisition test consists of providing the transition probabilities of the Markov model. Furthermore, the average state sojourn time is estimated. The outcome is used to compute potential bandwidth savings and to analyze two specific parameters leading to QoE
3D QoE in mobile multimedia
• Comparison of different methods to produce 3D on mobile phones
  - 2D
  - Anaglyph
  - Motion Parallax - Wiggle
  - Autostereoscopic
(a) Overall quality (b) Perceived depth
Fig. 9. MOS/CI of the individual test conditions.
methods are preferred with lower quality scores in the following order: Sw, An and Aw. In other words, the wide multi-view is rated as the best restitution method in 30% of cases, Mn and 2D in 25%, Sn in 19%, etc. For the depth preference probability and quality scores, the situation is slightly different. The scores for all the 3D restitution techniques are much better when compared to the 2D restitution. Auto-stereoscopic (Sn, Sw) methods achieve the highest score, where Sn with MOS 100 is rated approximately twice as high as Sw with MOS 55 and three times as high as Mw with MOS 30.
4. CONCLUSION
Alternative 3D restitution techniques for displays of the commercially available portable devices have been studied within this work. Conventional 2D together with different 3D restitution techniques, including anaglyph, multi-view motion parallax and the auto-stereoscopic method, have been compared to one another in terms of overall quality and depth perception.

The subjective quality tests show that the additional depth cues provide better depth perception when compared to simple 2D restitution. While the motion parallax method is preferred in terms of overall quality, the depth quality of 3D is clearly on the side of the auto-stereoscopic method. The well-known anaglyph method loses in both overall and depth quality, most likely due to the inaccurate color rendering and the crosstalk of anaglyph stereoscopy.

As future work, the significant influence of camera baseline, content and the size of the multiscopic image set on the overall quality and perceived depth for all restitution methods will be studied in more detail.
5. REFERENCES
[1] A. Gotchev, S. Jumisko-Pyykko, A. Boev, and D. Strohmeier, “Mobile 3DTV system: Quality and user perspective,” Procs of EUMOB, 2008.
[2] Cha Zhang, D. Florencio, and Zhengyou Zhang, “Improving immersive experiences in telecommunication with motion parallax [applications corner],” IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 139–144, Jan. 2011.
[3] M. Rerabek, L. Goldmann, J.S. Lee, and T. Ebrahimi, “Motion parallax based restitution of 3D images on legacy consumer mobile devices,” in Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on, IEEE, 2011, pp. 1–5.
[4] P. Benzie, J. Watson, P. Surman, I. Rakkolainen, K. Hopf, H. Urey, V. Sainov, and C. von Kopylow, “A survey of 3DTV displays: Techniques and technologies,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1647–1658, Nov. 2007.
[5] LG Electronics Inc., “Introduction to Real3D,” Tech. Rep., LG Electronics Inc., April 2011.
[6] A.J. Woods and C.R. Harris, “Comparing levels of crosstalk with red/cyan, blue/yellow, and green/magenta anaglyph 3D glasses (proceedings paper),” 2010.
[7] David F. McAllister, Ya Zhou, and Sophia Sullivan, “Methods for computing color anaglyphs,” in Stereoscopic Displays and Applications XXI.
[8] G. Peters, “Theories of three-dimensional object perception - a survey,” Recent Research Developments in Pattern Recognition, 2000.
[9] Christoph Fehn, “Depth image based rendering (DIBR), compression and transmission for a new approach on 3D-TV,” 2004.
[10] MPEG, “Draft report on experimental framework for 3D video coding,” Tech. Rep. N11478, ISO/IEC JTC1/SC29/WG11, July 2010.
[11] MPEG video group, “Description of exploration experiments in 3D video coding,” ISO/IEC JTC1/SC29/WG11, no. N11630, Oct. 2010.
[12] ITU-R, “Subjective assessment of stereoscopic television pictures,” Tech. Rep. BT.1438, ITU-R, 2000.
[13] M.E. Glickman, “Parameter estimation in large dynamic paired comparison experiments,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 48, no. 3, pp. 377–394, 1999.
Fig. 2. Motion parallax based 3D restitution.
2.2.1. Multi-view motion parallax
The idea of the portable device restitution method based on motion parallax employing the multi-view image data-set is illustrated in Fig. 2. By rotating the portable device, the viewer controls the view which is rendered on the display and so he/she can observe the scene from different positions. In this way the viewer imitates a process similar to what he/she does when observing the real world.

When the number of images in a stereoscopic or multiscopic image set is too small to achieve a smooth restitution, intermediate images are generated using depth image based rendering [9]. The depth estimation and view synthesis tools of the 3D video coding (3DV) framework [10] developed by MPEG are used for the content generation process within our experiments. The depth estimation reference software (DERS) uses three camera views (left, center, right) together with the intrinsic and extrinsic camera parameters to estimate the depth map of the center view. Of the three operation modes (automatic, segmentation, semi-automatic) of DERS, the automatic mode has been used. Once the depth maps were obtained, the view synthesis reference software (VSRS) was used to synthesize intermediate views. A virtual view is generated based on two reference views with the corresponding depth maps as well as the intrinsic and extrinsic camera parameters. This is achieved by depth and texture mapping and hole filling for each of the reference views, followed by image blending and inpainting.

Given the multiscopic image set, the goal of the multi-view restitution is to display the appropriate view according to the relative orientation between the device and the viewer, as shown in Fig. 3.

Measuring the orientation of the device with respect to the viewer can be achieved through a built-in accelerometer. Of the three possible rotations (pan, tilt, roll) of the mobile device, only the rotation around the vertical axis (pan) is used. Given the initial pan angle and the predefined pan range, the current view can be computed based on the current pan angle. It is displayed until the pan angle reaches an angle that corresponds to another view. Therefore, the display duration of a view depends solely on the speed with which the viewer moves the display.

Fig. 3. Content restitution for the motion parallax based display.
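The pan-angle-to-view mapping described above can be sketched as follows, assuming (as an illustration, not from the paper) that the predefined pan range is divided evenly among the views and centered on the initial pan angle:

```python
def view_for_pan(pan_deg, pan_initial_deg, pan_range_deg, num_views):
    """Map the device's current pan angle to a view index in the
    multiscopic image set.

    The pan range (e.g. 30 degrees) is divided evenly among `num_views`
    views, so rotating the device sweeps through the set and imitates
    motion parallax. All parameter names and the even division are
    illustrative assumptions.
    """
    # normalized position in [0, 1] within the predefined pan range,
    # with the initial angle mapped to the middle view
    rel = (pan_deg - pan_initial_deg) / pan_range_deg + 0.5
    rel = min(max(rel, 0.0), 1.0)  # clamp at the ends of the range
    return min(int(rel * num_views), num_views - 1)
```

With a 30-degree range and the 21-view wide set, holding the device at the initial angle shows the center view, and rotating to either end of the range shows the first or last view; how long a view stays on screen depends only on how fast the viewer rotates the device, matching the text.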
3. QUALITY EVALUATION
3.1. Image data preparation
A subset of the multi-view sequences from the MPEG¹ 3D Video (3DV) data-set [11] has been used for the experiments. More specifically, 4 sequences (Lovebird1, Balloons, Kendo, Mobile) from the class C set have been considered. The first one was used for training and the latter three for testing.

Suitable frames have been selected and extracted from each video. Using the latest versions of the depth estimation reference software (DERS 5.1) [10] and the view synthesis reference software (VSRS 3.5) [10], the required number of views were synthesized. Since the display application on the mobile phone does not support the YUV format used by the MPEG tools, the resulting multi-view image sets were further converted to high quality JPEG images. Finally, the spatial resolution of the multiscopic image set was adapted to match that of the portable device. For our experiments, an Android-based smart phone, the LG Optimus 3D, with an auto-stereoscopic display and a screen resolution of 800x480 pixels was used.

Given the resulting multi-view data-set, 7 subsets are created that simulate the different restitution methods (2D, auto-stereoscopic, anaglyph, motion parallax) for narrow (10 cm) and wide (20 cm) camera baselines, leading to the following test conditions. 2D - reference 2D image, chosen to be the center view of the multi-view data-set. Sn - stereoscopic image set² with a narrow camera baseline resulting in a smaller depth range. Sw - stereoscopic image set with a wide camera baseline resulting in a larger depth range. Mn - multiscopic image set of 11 images with a narrow camera baseline resulting in smaller motion parallax. Mw - multiscopic image set of 21 images with a wide camera baseline resulting in larger motion parallax. An - anaglyph image with a narrow camera baseline resulting in a smaller depth range. Aw - anaglyph image with a wide camera baseline resulting in a larger depth range.

¹http://mpeg.chiariglione.org
²A stereoscopic image set consists of a left and a right image. The resulting interlaced image is rendered automatically using the LG Real3D technology [?]

Fig. 4. Creation of the individual test conditions from the multiscopic data-set.

The idea of the image data-set creation process is illustrated in Fig. 4 for a multi-view image set which consists of 5 original views (OV) and 4 intermediate synthesized views (SV) between each of them. As usual, the camera baseline corresponds to the distance between the left and right views. Since the camera distance directly affects the perceived depth but also the visual comfort, two different camera distances were considered to identify the optimal one for each of the restitution methods. The subsets with the narrower camera baseline (Sn, Mn and An) utilize the original views OV 10 and OV 30 as the left and right views, respectively. On the other hand, the original views OV 00 and OV 40 are used as the left and right views within the wider camera baseline subsets (Sw, Mw and Aw). Since the viewing angle range of the mobile device is fixed, the motion smoothness and the depth range are larger for the wider camera baseline.
3.2. Test methodology
Since judging the quality of different 2D and 3D restitution techniques individually may be quite difficult, the stimulus comparison (SC) quality evaluation method seems the most suitable. The subjective tests were performed as follows. A special Android application has been developed in order to display sequentially a pair of test stimuli on a testing device. After comparing the two test stimuli, the subject was asked to choose his/her preference (“first” or “second”) in terms of overall quality and depth quality. The option “same” was also included to avoid random preference selections. For each of
(a) Balloons (b) Kendo
(c) Mobile (d) Overall

Fig. 5. Preference and tie probabilities of the individual pairs considering the overall quality.
the 3 test video sequences (Balloons, Kendo, Mobile) all the possible combinations of the 7 test conditions (2D, Sw, Sn, Mw, Mn, Aw, An) were considered. This led to a test-set with 3 × (7 choose 2) = 63 paired comparisons. Since the IPD (Inter-Pupillary Distance) of the participating subjects has not been measured, the two camera baselines, used to generate the narrow and wide subsets, are compared against each other.
Fifteen subjects (11 male and 4 female) with an average age of 29.3 participated in the subjective test experiments. They reported normal or corrected-to-normal vision according to [12]. All of them were non-expert viewers with marginal experience of 3D image and video viewing.
3.3. Results and discussion
Statistical tools have been applied in order to analyze the individual users’ preference ratings for the different scenes and test conditions.

The simplest way to analyze a set of paired comparisons is to compute the distribution of the votes over the different categorical levels (first, same, second), normalizing them by the number of subjects. This can be done individually for each video sequence or jointly over all of them. Fig. 5 and Fig. 6 show the resulting probabilities for the overall quality and the perceived depth, respectively.
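The vote-distribution computation described above can be sketched as follows (a hypothetical illustration; the vote labels follow the paper’s “first”/“same”/“second” categories):

```python
from collections import Counter

def preference_probabilities(votes):
    """Distribution of paired-comparison votes over the categorical
    levels 'first', 'same', 'second', normalized by the number of
    subjects, for one pair of test conditions."""
    counts = Counter(votes)
    n = len(votes)
    return {c: counts.get(c, 0) / n for c in ("first", "same", "second")}

# hypothetical votes from 15 subjects comparing one pair of conditions
probs = preference_probabilities(["first"] * 9 + ["same"] * 2 + ["second"] * 4)
```

Computing this per sequence or pooled over all sequences reproduces the two views of the data that the figures present.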
With respect to the overall quality, the results across the different scenes are quite similar. Multi-view (Mn, Mw) is rated similarly to 2D, and both are in general perceived much better (preference probability between 60% and 90%) when compared to anaglyph (An, Aw). The comparison of multi-view and 2D with auto-stereoscopic (Sn, Sw) brings quite competitive results with a slight advantage on the side of multi-view
Trends in QoE
• The digital world has (re-)discovered the notion of quality
  – Lower quality content is increasingly less tolerated by end-users
  – Digital technology has surpassed barriers of attainable quality, while remaining cost effective
• Increasing interest in QoE
  – Extending from device-centric and system-centric quality optimization to end-to-end and especially user-centric optimization
Trends in QoE community building
• Increased interest in workshops and conferences around the notion of quality assessment and metrics
  – QoMEX: International Workshop on Quality of Multimedia Experience (http://www.qomex.org)
  – VPQM: International Workshop on Video Processing and Quality Metrics for Consumer Electronics (http://www.vpqm.org)
  – …
• QoE is one of the issues referred to in research programs funded by the EC
  – Various calls in recent years include QoE issues
  – QoE is one of the top topics in the future framework program
  – Already ongoing initiatives such as COST Action IC1003 (Qualinet)
Trends in standardization
• Standardization efforts in quality assessment and metrics
  – Video Quality Experts Group (VQEG)
  – ITU-T SG 12 (Performance, QoS and QoE)
  – JPEG (Advanced Image Coding - AIC, HDR image compression)
  – MPEG (HEVC, 3DVC)
  – …
Challenges ahead
• Some key issues in QoE:
  – Content-dependent quality assessment methods and metrics
  – Context-dependent quality assessment methods and metrics
  – Quality assessment methods and metrics beyond AV (haptics, …)
  – Multi-modal quality assessment methods and metrics (AV, …)
  – 3D quality assessment methods and metrics (3D sound, 3D video, …)
  – HDR content quality assessment methods and metrics
  – Interaction quality metrics (closely related to usability)
  – Presence/immersion quality metrics
  – …
• Need for Quality Certification Mechanisms for multimedia services and products
  – Similar in idea to the ISO 9000 series
What does this all mean to you?
• The era of user-centric multimedia has already started
  – It is no longer sufficient to merely add new features and functionalities to multimedia systems
  – The true added value of such features and functions, in terms of impact on user experience, should be demonstrated
  – Quality of Experience plays a central role in this new game
• Opportunities in research, technology, business, art and entertainment
Thank you for your attention
Questions?