Recent advances in quality of experience in multimedia communication
DESCRIPTION
The presentation covers various aspects of defining and measuring the Quality of Experience in IP multimedia communications, with an emphasis on video. Presented at the IMTC 20th Anniversary Forum.

TRANSCRIPT
IMTC 20th Anniversary Forum – Porto, Portugal
Recent advances in quality of experience in
multimedia communication
Touradj Ebrahimi ([email protected])
International Multimedia Telecommunications Consortium
Today we will talk about…
• What is “quality”?
• Origins of “quality” evaluation in telecommunications
• How is “quality” measured in multimedia communications today?
• What are the trends on “quality” in multimedia communication?
• What are the challenges ahead?
Quality: a simple yet difficult concept
• Like many human sensations, quality is easy to understand but difficult to define
• Quality according to Wikipedia:
  – A quality (from Latin qualitas) is an attribute or a property
  – Some philosophers assert that a quality cannot be defined
A fundamental, ancient, but largely under-investigated concept
Aristotle classified every object of human apprehension into 10 categories:
– Substance
– Quantity
– Quality
– Relation
– Place
– Time
– Position
– State
– Action
– Affection
Qualinet white paper
• White Paper produced by COST Action IC1003 (Qualinet):
  – http://www.qualinet.eu/images/stories/whitepaper_v1.1_dagstuhl_output_corrected.pdf
• Several definitions of quality in multimedia systems and services and other related concepts
Quality is like an elephant …
The blind men and the elephant: Poem by John Godfrey Saxe
Quality in telecommunications
Quality in telecommunications
• Network Quality: capacity, coverage, handoff
• Link Quality: bitrate, frame/bit/packet loss, delay
• User Quality: speech fidelity, audio fidelity, image fidelity, video fidelity
Quality in telecommunications
Objective: input = output
QoS in telecommunications
• Quality of Service (QoS)
  – Resource reservation control mechanisms
  – Ability to provide different priority to different applications or data flows
  – Guarantee a certain level of performance (quality) to a data flow
• Requires measuring or estimating quality
• (Service) Provider-centric concept
  – Tightly related to the concept of Mean Opinion Score (MOS)
Mean Opinion Score (MOS)?
• Widely used in many fields:
  – Politics/Elections
  – Marketing/Advertisement
  – Food industry
  – Multimedia
  – …
• The likely level of satisfaction with a service or product as appreciated by an average user in typical use
• Should be performed such that it generates reliable and reproducible results
  – Subjective evaluation methodology
  – More complex and difficult than it seems
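As an illustration of how a MOS and its reliability are typically reported, a minimal sketch (not part of the original slides; the ratings and function name are hypothetical) computes the mean score and a normal-approximation 95% confidence interval from individual subject ratings:

```python
import statistics

def mos_with_ci(scores, z=1.96):
    """Mean Opinion Score and half-width of a ~95% confidence interval
    for one test condition, from individual subject ratings on the usual
    5-point ACR scale (1 = bad ... 5 = excellent)."""
    n = len(scores)
    mos = statistics.mean(scores)
    # standard error of the mean; z = 1.96 gives the ~95% normal interval
    half_width = z * statistics.stdev(scores) / n ** 0.5
    return mos, half_width

# hypothetical ratings from ten subjects for one stimulus
mos, ci = mos_with_ci([4, 5, 3, 4, 4, 5, 4, 3, 4, 4])
```

The confidence interval is what makes the score reproducible in the sense of the bullet above: a MOS reported without one says nothing about how much subjects disagreed.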
What is behind a MOS?
What is wrong with MOS?
• Ground truth
  – It is assumed to be the optimal solution (ground truth)
• Defies user preference
  – It is assumed to be independent of specific subjects
• Ignores key parameters
  – It is often assumed to be independent of content and context
• Misuse as a comparison approach
  – It is often used to compare performance between alternative tools for the same task/operation
• Impractical or expensive
• Statistical assumptions not always realistic
Ground truth
Defies user preference
Content
[Chart: per-subject quality ratings (p01, p06, p10) for three contents: bike, cafe, woman]
Context
Objective quality metrics
• Subjective tests are time consuming, expensive, difficult to design, and cannot be used in real-time systems
• Objective algorithms, i.e. metrics, estimating MOS with a high level of correlation are desired
  • Full reference metrics
  • No reference metrics
  • Reduced reference metrics
Objective quality metrics based on fidelity
• Full Reference scenario
• Metrics which look at the fidelity of the signal when compared to an explicit reference:

  processed signal = perfect quality reference signal + error signal
Objective quality metrics based on fidelity
• Examples of fidelity metrics for image quality assessment
  – Mean Square Error (MSE)
  – Peak Signal to Noise Ratio (PSNR)
  – Maximum Pixel Deviation (Linf)
  – Weighted PSNR
  – Masked PSNR
  – Structural SIMilarity (SSIM)
  – Multiscale Structural Similarity (MSSIM)
  – Visual Information Fidelity (VIF)
  – etc.
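As a sketch of the two simplest fidelity metrics listed above, MSE and PSNR can be computed as follows (illustrative only; images are flattened to lists of pixel values, and the helper names are ours):

```python
import math

def mse(ref, dist):
    """Mean Square Error between a reference and a distorted image,
    given here as equally sized flat lists of pixel values."""
    return sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)

def psnr(ref, dist, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; `peak` is the maximum pixel
    value (255 for 8-bit images). Infinite for identical images."""
    e = mse(ref, dist)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)

reference = [100, 120, 130, 140]
distorted = [101, 118, 130, 141]
# squared errors are 1, 4, 0, 1, so MSE = 1.5
```

Both metrics operate purely on pixel differences, which is exactly why the next slide's criticisms apply: they know nothing about content, context, or the viewer.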
What is wrong with objective quality metrics?
• Poor performance in predicting MOS
  – Especially in real-life situations
• How to deal with color components?
  – Even in PSNR!
• How to deal with video?
  – Time versus spatial dimension
• How to take into account context?
  – Display size/type
  – Environmental illumination
  – …
• How to deal with user preference/expectation?
• …
MVC assessment using PSNR as metric
Vetro: Update on MPEG 3D Video Coding Activity
MVC Coding Performance
[Charts: PSNR (dB) versus bitrate (kb/s) for the Ballroom and Race1 sequences, comparing Simulcast and MVC]
Sample comparison of simulcast vs inter-view prediction: 8 views (640×480); considering the rate for all views, ~25% bit rate savings over all views.
MVC assessment by subjective evaluation
MVC: Subjective Performance
• Main finding: MVC achieves comparable quality to simulcast with as little as 25% rate for the dependent view
[Chart: Mean Opinion Score for the conditions Original, Simulcast (AVC+AVC), and 12L_50Pct through 12L_5Pct; base view fixed at 12 Mbps, dependent view at a varying percentage of the base view rate]
Evolving Quality Paradigms
User centered evaluation
• It is becoming increasingly important to evaluate quality from a user perspective rather than from a provider/system perspective
• Constrains the tests to potential users and stimuli and evaluates the acceptability with respect to the task and the context
• Identifies underlying quality factors besides the overall quality
What do people mean when they say QoE?
• “The degree of fulfillment of an intended experience on a given user”
as defined by Touradj Ebrahimi, 2001
• “The overall acceptability of an application or service … by the end user”
as defined by the ITU-T
• “The degree of delight ... of the user of an application or service ... user’s personality and current state.”
as defined in white paper by Qualinet
The term ‘experience’ promises individual engagement …
Quality of Service vs Quality of Experience
• Quality of Service: value of the average user’s experience estimated by a service/product/content provider
• Quality of Experience: value (estimated or actually measured) of a specific user’s richness of experience
• Quality of Experience is the dual (and extended) view of the QoS problem
QoS = provider-centric
QoE = user-centric
A simple model for QoE
• User attributes
  – individual attributes – expectation, age, sex, personality, background…
  – sensorial attributes – including limitations and deficiencies
  – perceptual attributes
  – emotional attributes
• System attributes
  – technical attributes (as in QoS)
• Contextual attributes
  – environmental attributes
  – device attributes
  – service attributes
  – content attributes
[Diagram: QoE as the overlap of user, QoS (system), and context]
Personas and scenarios [Strohmeier2009]
• Personas (user preference)
  – Archetypical users representing the needs, behaviors and goals of a particular group of users
  – Not real people but fictional characters derived from user requirements
• Scenarios (context)
  – Realistic usage environment
Quality of Experience in Mobile Multimedia
• The influence of context in mobile multimedia user experience and QoE
  – Various probes to monitor user behavior
  – Specific scenario to be followed by subjects
The test equipment selected for the experiment needed to satisfy two main constraints. Firstly, given the complexity of the measurement task, a range of different devices was necessary to capture the video consumption behaviour of the users during the experiment. The devices were required to work and interact with each other in real time. Secondly, as the experiment was conducted in a non-laboratory environment and the users needed to carry the equipment around, the devices were constrained to be lightweight and robust enough to work under moving conditions.

The test equipment consisted of a video playback system and a recording system selected with respect to the above constraints. The former was used to show the content to the user and capture information regarding the video player status and the user location. The video playback system consisted of the following items:

• an HTC Tattoo mobile phone used to display the video content and an external headset to listen to the audio
• a standard button-based video player graphical user interface, as illustrated in Figure 2, with the following functionalities:
  – pause/play
  – next channel
  – previous channel
  – menu to change channel
  – standard volume controls
• logging software to keep track of the controls activated at any time by the user
• a GPS integrated in the mobile phone used to determine the position and the mobility scenario of the user.

The recording system facilitated the capture and storage of video information recorded to determine the user’s visual focus of attention. The recording system consisted of the following items:

• a head-mounted Logitech Webcam Pro 9000 (cam1) to record the scene in front of the viewer
• a mobile-device-mounted Logitech Webcam Pro 9000 (cam2) to record the scene in front of the display of the mobile device
• a backpack with a Lenovo ThinkPad T400s laptop with a solid state disk to store simultaneously and in real time the data recorded by the two cameras.

The videos were recorded at 15 frames per second (fps). The internal clocks of the laptop and the mobile phone were synchronized before each test session. In order to allow for an automated analysis of the user’s focus of attention during post-processing of the acquired data, a marker (black text ’MMSPG’ on a white background) was placed on each camera. A picture of the test equipment installed on a user is shown in Figure 1.

Figure 1: Test equipment system installed on a user.

Figure 2: Video GUI for the following modes: (a) play, (b) pause, (c) menu, and (d) volume control.

4.3 Data processing
The data acquired for each subject during the experiment has been processed offline. The user attention has been determined by applying an object detection algorithm [25] to detect the markers in the two sequences. In particular, the markers were placed on cam1 and cam2 in such a way that, when the user is looking at the mobile phone screen, the scene recorded from cam1 will include the marker on top of the mobile phone (Figure 3(a)). Vice versa, the scene recorded from cam2 will include the user’s frontal face as well as the marker on the user’s head (Figure 3(b)). Additionally, the marker on cam1 was mounted on a concave support which occludes it from the field of view of cam2 when the face is perpendicular to the mobile phone screen (Figure 4(b)). This avoided the detection of translated versions of the marker, which may be present in the field of view of cam2 when the user is not looking at the mobile phone but holds it in the same position as when the video consumption takes place. Thus, none of the markers is in the field of view of the cameras when the user is not looking at the phone, as shown in the sample frames in Figure 4. As a result, if the markers are detected in the frames captured at the same instant from the two cameras, then the user was identified to be looking at the mobile phone screen. Vice versa, if no markers are detected, then the user was not looking at the
Figure 3: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is looking at the phone screen.

Figure 4: Sample frames of the video sequences recorded by (a) cam1 and (b) cam2, when the user is not looking at the phone screen.
phone at that instant.

It is worth mentioning that we also tested a face detection algorithm [26] to detect the face of the user in the video recorded by cam2. Unfortunately, too many false positive and false negative face detections were present in the results. In particular, false positives occurred when the face was not frontal, even though the frontal face cascade classifier had been used. Some false negative frontal face detections can be explained by the bad quality of the video frames, due to the user’s motion and abrupt scene illumination changes.

Such frame quality degradations were found to create problems for the marker detection algorithm as well; however, they were considerably fewer compared to the face detection. Furthermore, the marker detection did not show any false positives. The false negative marker detections were removed by applying a correction algorithm that is based on the assumption that it is impossible for the user to look away from the phone for a very short instance in time. To be precise, if the marker was not detected for a succession of k frames but was detected in its m previous and n following frames, with m, n > k, then a correction was applied to the detection results for the k frames. We found that robust results could be achieved for k = 6, which corresponds to approximately 400 ms at 15 fps. For these reasons, the marker detection algorithm was used to process the video recorded by cam2, instead of the face detection algorithm.

Finally, bi-directional marker detection was used instead of mono-directional detection, in order to avoid any false interpretations of the user’s focus of attention. For example, only the head-mounted camera could have been used, detecting the marker placed on top of the mobile phone screen. Unfortunately, in this case it might happen that the marker falls in the field of view of cam1 even if the user is not looking exactly at the phone screen. The bi-directional marker detection therefore provided a more robust solution for collecting information regarding the focus of attention of the users.

Figure 5: Path of the experiment reconstructed using the GPS data from one test session.
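The false-negative correction described above (fill a short gap of missed detections when the flanking detected runs are longer than the gap) can be sketched as follows; the function name and the exact flanking condition are our illustrative reading of the description, not code from the paper:

```python
def fill_detection_gaps(detections, k=6):
    """Fill short false-negative gaps in a per-frame marker-detection
    sequence (a list of booleans, one per video frame).

    A run of up to `k` undetected frames is treated as a false negative
    and flipped to detected when the detected runs before and after it
    are both longer than the gap (the m, n > k condition in the text).
    At 15 fps, k = 6 corresponds to roughly 400 ms.
    """
    out = list(detections)
    total = len(out)
    i = 0
    while i < total:
        if out[i]:
            i += 1
            continue
        # measure the undetected gap [i, j)
        j = i
        while j < total and not out[j]:
            j += 1
        gap = j - i
        # length m of the detected run just before the gap
        m, p = 0, i - 1
        while p >= 0 and detections[p]:
            m, p = m + 1, p - 1
        # length n of the detected run just after the gap
        n, q = 0, j
        while q < total and detections[q]:
            n, q = n + 1, q + 1
        # correct short gaps flanked by longer detected runs
        if gap <= k and m > gap and n > gap:
            for t in range(i, j):
                out[t] = True
        i = j
    return out
```

A three-frame dropout inside two long detected runs gets filled, while a ten-frame gap (a genuine glance away) is left untouched.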
The information regarding the detection of the marker in the frames captured at each time instant from the two cameras, together with the information regarding the video player status at the same time instant, makes it possible to identify in which of the three consumption states (see Section 3.1) the user is during the entire duration of the experiment. The GPS data additionally allows distinguishing between the mobility states (static or dynamic) based on a speed estimation of the user. The path followed during the experiment, reconstructed on the Earth map using the acquired GPS data of one user, is shown in Figure 5.

For further processing and analysis, we mapped the acquired data onto binary values as shown in Table 2. These binary values are then used to define a set of logical rules as presented in Table 3, which subsequently serve to identify the states within the consumption and mobility scenarios and thus, to parameterize the Markov model defined in Section 3.
5. RESULTS
The major result of the data acquisition test consists of providing the transition probabilities of the Markov model. Furthermore, the average state sojourn time is estimated. The outcome is used to compute potential bandwidth savings and to analyze two specific parameters leading to QoE
3D QoE in mobile multimedia
• Comparison of different methods to produce 3D on mobile phones
  - 2D
  - Anaglyph
  - Motion Parallax - Wiggle
  - Autostereoscopic
(a) Overall quality (b) Perceived depth
Fig. 9. MOS/CI of the individual test conditions.
methods are preferred with lower quality scores in the following order: Sw, An and Aw. In other words, the wide multi-view is rated as the best restitution method in 30% of cases, Mn and 2D in 25%, Sn in 19%, etc. For the depth preference probability and quality scores, the situation is slightly different. The scores for all the 3D restitution techniques are much better when compared to the 2D restitution. Auto-stereoscopic (Sn, Sw) methods achieve the highest score, where Sn with MOS 100 is rated approximately twice as high as Sw with MOS 55 and three times as high as Mw with MOS 30.
4. CONCLUSION
Alternative 3D restitution techniques for displays of the commercially available portable devices have been studied within this work. Conventional 2D together with different 3D restitution techniques, including anaglyph, multi-view motion parallax and the auto-stereoscopic method, have been compared to one another in terms of overall quality and depth perception.

The subjective quality tests show that the additional depth cues provide better depth perception when compared to simple 2D restitution. While the motion parallax method is preferred in terms of overall quality, the depth quality of 3D is clearly on the side of the auto-stereoscopic method. The well-known anaglyph method loses in both overall and depth quality, most likely due to the inaccurate color rendering and the crosstalk of anaglyph stereoscopy.

As future work, the significant influence of camera baseline, content and the size of the multiscopic image set on the overall quality and perceived depth for all restitution methods will be studied in more detail.
5. REFERENCES
[1] A. Gotchev, S. Jumisko-Pyykko, A. Boev, and D. Strohmeier, “Mobile 3DTV system: Quality and user perspective,” Procs of EUMOB, 2008.
[2] Cha Zhang, D. Florencio, and Zhengyou Zhang, “Improving immersive experiences in telecommunication with motion parallax [applications corner],” IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 139–144, Jan. 2011.
[3] M. Rerabek, L. Goldmann, J.S. Lee, and T. Ebrahimi, “Motion parallax based restitution of 3D images on legacy consumer mobile devices,” in Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on, IEEE, 2011, pp. 1–5.
[4] P. Benzie, J. Watson, P. Surman, I. Rakkolainen, K. Hopf, H. Urey, V. Sainov, and C. von Kopylow, “A survey of 3DTV displays: Techniques and technologies,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1647–1658, Nov. 2007.
[5] LG Electronics Inc., “Introduction to Real3D,” Tech. Rep., LG Electronics Inc., April 2011.
[6] A.J. Woods and C.R. Harris, “Comparing levels of crosstalk with red/cyan, blue/yellow, and green/magenta anaglyph 3D glasses (proceedings paper),” 2010.
[7] David F. McAllister, Ya Zhou, and Sophia Sullivan, “Methods for computing color anaglyphs,” in Stereoscopic Displays and Applications XXI.
[8] G. Peters, “Theories of three-dimensional object perception - a survey,” Recent Research Developments in Pattern Recognition, 2000.
[9] Christoph Fehn, “Depth image based rendering (DIBR), compression and transmission for a new approach on 3D-TV,” 2004.
[10] MPEG, “Draft report on experimental framework for 3D video coding,” Tech. Rep. N11478, ISO/IEC JTC1/SC29/WG11, July 2010.
[11] MPEG video group, “Description of exploration experiments in 3D video coding,” ISO/IEC JTC1/SC29/WG11, no. N11630, Oct. 2010.
[12] ITU-R, “Subjective assessment of stereoscopic television pictures,” Tech. Rep. BT.1438, ITU-R, 2000.
[13] M.E. Glickman, “Parameter estimation in large dynamic paired comparison experiments,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 48, no. 3, pp. 377–394, 1999.
Fig. 2. Motion parallax based 3D restitution.
2.2.1. Multi-view motion parallax
The idea of the portable device restitution method based on motion parallax employing the multi-view image data-set is illustrated in Fig. 2. By rotating the portable device, the viewer controls the view which is rendered on the display and so he/she can observe the scene from different positions. In this way the viewer imitates a process similar to what he/she does when observing the real world.

When the number of images in a stereoscopic or multiscopic image set is too small to achieve a smooth restitution, intermediate images are generated using depth image based rendering [9]. The depth estimation and view synthesis tools of the 3D video coding (3DV) framework [10] developed by MPEG are used for the content generation process within our experiments. The depth estimation reference software (DERS) uses three camera views (left, center, right) together with the intrinsic and extrinsic camera parameters to estimate the depth map of the center view. Of the three operation modes (automatic, segmentation, semi-automatic) of DERS, the automatic mode has been used. Once the depth maps were obtained, the view synthesis reference software (VSRS) was used to synthesize intermediate views. A virtual view is generated based on two reference views with the corresponding depth maps as well as the intrinsic and extrinsic camera parameters. This is achieved by depth and texture mapping and hole filling for each of the reference views, followed by image blending and inpainting.

Given the multiscopic image set, the goal of the multi-view restitution is to display the appropriate view according to the relative orientation between the device and the viewer, as shown in Fig. 3.

Measuring the orientation of the device with respect to the viewer can be achieved through a built-in accelerometer. Of the three possible rotations (pan, tilt, roll) of the mobile device, only the rotation around the vertical axis (pan) is used. Given the initial pan angle and the predefined pan range, the current view can be computed based on the current pan angle. It is displayed until the pan angle reaches an angle that corresponds to another view. Therefore, the display duration of a view depends solely on the speed with which the viewer moves the display.

Fig. 3. Content restitution for the motion parallax based display.
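The pan-angle-to-view mapping described above can be sketched as follows, assuming (as an illustration, not from the paper) that the predefined pan range is divided evenly among the views and centered on the initial pan angle:

```python
def view_for_pan(pan_deg, pan_initial_deg, pan_range_deg, num_views):
    """Map the device's current pan angle to a view index in the
    multiscopic image set.

    The pan range (e.g. 30 degrees) is divided evenly among `num_views`
    views, so rotating the device sweeps through the set and imitates
    motion parallax. All parameter names and the even division are
    illustrative assumptions.
    """
    # normalized position in [0, 1] within the predefined pan range,
    # with the initial angle mapped to the middle view
    rel = (pan_deg - pan_initial_deg) / pan_range_deg + 0.5
    rel = min(max(rel, 0.0), 1.0)  # clamp at the ends of the range
    return min(int(rel * num_views), num_views - 1)
```

With a 30-degree range and the 21-view wide set, holding the device at the initial angle shows the center view, and rotating to either end of the range shows the first or last view; how long a view stays on screen depends only on how fast the viewer rotates the device, matching the text.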
3. QUALITY EVALUATION
3.1. Image data preparation
A subset of the multi-view sequences from the MPEG¹ 3D Video (3DV) data-set [11] has been used for the experiments. More specifically, 4 sequences (Lovebird1, Balloons, Kendo, Mobile) from the class C set have been considered. The first one was used for training and the latter three for testing.

Suitable frames have been selected and extracted from each video. Using the latest versions of the depth estimation reference software (DERS 5.1) [10] and the view synthesis reference software (VSRS 3.5) [10], the required number of views were synthesized. Since the display application on the mobile phone does not support the YUV format used by the MPEG tools, the resulting multi-view image sets were further converted to high quality JPEG images. Finally, the spatial resolution of the multiscopic image set was adapted to match that of the portable device. For our experiments, an Android-based smart phone, the LG Optimus 3D, with an auto-stereoscopic display and a screen resolution of 800x480 pixels was used.

Given the resulting multi-view data-set, 7 subsets are created that simulate the different restitution methods (2D, auto-stereoscopic, anaglyph, motion parallax) for narrow (10 cm) and wide (20 cm) camera baselines, leading to the following test conditions. 2D - reference 2D image, chosen to be the center view of the multi-view data-set. Sn - stereoscopic image set² with a narrow camera baseline resulting in a smaller depth range. Sw - stereoscopic image set with a wide camera baseline resulting in a larger depth range. Mn - multiscopic image set of 11 images with a narrow camera baseline resulting in smaller motion parallax. Mw - multiscopic image set of 21 images with a wide camera baseline resulting in larger motion parallax. An - anaglyph image with a narrow camera baseline resulting in a smaller depth range. Aw - anaglyph image with a wide camera baseline resulting in a larger depth range.

¹http://mpeg.chiariglione.org
²A stereoscopic image set consists of a left and a right image. The resulting interlaced image is rendered automatically using the LG Real3D technology [?]

Fig. 4. Creation of the individual test conditions from the multiscopic data-set.

The idea of the image data-set creation process is illustrated in Fig. 4 for a multi-view image set which consists of 5 original views (OV) and 4 intermediate synthesized views (SV) between each of them. As usual, the camera baseline corresponds to the distance between the left and right views. Since the camera distance directly affects the perceived depth but also the visual comfort, two different camera distances were considered to identify the optimal one for each of the restitution methods. The subsets with the narrower camera baseline (Sn, Mn and An) utilize the original views OV 10 and OV 30 as the left and right views, respectively. On the other hand, the original views OV 00 and OV 40 are used as the left and right views within the wider camera baseline subsets (Sw, Mw and Aw). Since the viewing angle range of the mobile device is fixed, the motion smoothness and the depth range are larger for the wider camera baseline.
3.2. Test methodology
Since judging the quality of different 2D and 3D restitution techniques individually may be quite difficult, the stimulus comparison (SC) quality evaluation method seems the most suitable. The subjective tests were performed as follows. A special Android application has been developed in order to display sequentially a pair of test stimuli on a testing device. After comparing the two test stimuli, the subject was asked to choose his/her preference (“first” or “second”) in terms of overall quality and depth quality. The option “same” was also included to avoid random preference selections. For each of
(a) Balloons (b) Kendo
(c) Mobile (d) Overall

Fig. 5. Preference and tie probabilities of the individual pairs considering the overall quality.
the 3 test video sequences (Balloons, Kendo, Mobile) all the possible combinations of the 7 test conditions (2D, Sw, Sn, Mw, Mn, Aw, An) were considered. This led to a test-set with 3 × (7 choose 2) = 63 paired comparisons. Since the IPD (Inter-Pupillary Distance) of the participating subjects has not been measured, the two camera baselines, used to generate the narrow and wide subsets, are compared against each other.
Fifteen subjects (11 male and 4 female) with an average age of 29.3 participated in the subjective test experiments. They reported normal or corrected-to-normal vision according to [12]. All of them were non-expert viewers with marginal experience of 3D image and video viewing.
3.3. Results and discussion
Statistical tools have been applied in order to analyze the individual users’ preference ratings for the different scenes and test conditions.

The simplest way to analyze a set of paired comparisons is to compute the distribution of the votes over the different categorical levels (first, same, second), normalizing them by the number of subjects. This can be done individually for each video sequence or jointly over all of them. Fig. 5 and Fig. 6 show the resulting probabilities for the overall quality and the perceived depth, respectively.
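The vote-distribution computation described above can be sketched as follows (a hypothetical illustration; the vote labels follow the paper’s “first”/“same”/“second” categories):

```python
from collections import Counter

def preference_probabilities(votes):
    """Distribution of paired-comparison votes over the categorical
    levels 'first', 'same', 'second', normalized by the number of
    subjects, for one pair of test conditions."""
    counts = Counter(votes)
    n = len(votes)
    return {c: counts.get(c, 0) / n for c in ("first", "same", "second")}

# hypothetical votes from 15 subjects comparing one pair of conditions
probs = preference_probabilities(["first"] * 9 + ["same"] * 2 + ["second"] * 4)
```

Computing this per sequence or pooled over all sequences reproduces the two views of the data that the figures present.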
With respect to the overall quality, the results across the different scenes are quite similar. Multi-view (Mn, Mw) is rated similarly to 2D, and both are in general perceived much better (preference probability between 60% and 90%) when compared to anaglyph (An, Aw). The comparison of multi-view and 2D with auto-stereoscopic (Sn, Sw) brings quite competitive results with a slight advantage on the side of multi-view
Trends in QoE
• The digital world has (re-)discovered the notion of quality
  – Lower quality content is increasingly less tolerated by end-users
  – Digital technology has surpassed barriers of attainable quality, while remaining cost effective
• Increasing interest in QoE
  – Extending from device-centric and system-centric quality optimization to end-to-end and especially user-centric optimization
Trends in QoE community building
• Increased interest in workshops and conferences around the notion of quality assessment and metrics
  – QoMEX: International Workshop on Quality of Multimedia Experience (http://www.qomex.org)
  – VPQM: International Workshop on Video Processing and Quality Metrics for Consumer Electronics (http://www.vpqm.org)
  – …
• QoE is one of the issues referred to in research programs funded by the EC
  – Various calls in recent years include QoE issues
  – QoE is one of the top topics in the future framework program
  – Already ongoing initiatives such as COST Action IC1003 (Qualinet)
Trends in standardization
• Standardization efforts in quality assessment and metrics
  – Video Quality Experts Group (VQEG)
  – ITU-T SG 12 (Performance, QoS and QoE)
  – JPEG (Advanced Image Coding - AIC, HDR image compression)
  – MPEG (HEVC, 3DVC)
  – …
Challenges ahead
• Some key issues in QoE:
  – Content-dependent quality assessment methods and metrics
  – Context-dependent quality assessment methods and metrics
  – Quality assessment methods and metrics beyond AV (haptics, …)
  – Multi-modal quality assessment methods and metrics (AV, …)
  – 3D quality assessment methods and metrics (3D sound, 3D video, …)
  – HDR content quality assessment methods and metrics
  – Interaction quality metrics (closely related to usability)
  – Presence/immersion quality metrics
  – …
• Need for Quality Certification Mechanisms for multimedia services and products
  – Similar in idea to the ISO 9000 series
What does this all mean to you?
• The era of user-centric multimedia has already started
  – It is no longer sufficient to merely add new features and functionalities to multimedia systems
  – The true added value of such features and functions, in terms of impact on user experience, should be demonstrated
  – Quality of Experience plays a central role in this new game
• Opportunities in research, technology, business, art and entertainment
Thank you for your attention
Questions?