ACM Multimedia 2009

A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Kuan-Ta Chen (Academia Sinica), Chen-Chi Wu (National Taiwan University), Yu-Chun Chang (National Taiwan University), Chin-Laung Lei (National Taiwan University)


DESCRIPTION

Until recently, QoE (Quality of Experience) experiments had to be conducted in academic laboratories; however, with the advent of ubiquitous Internet access, it is now possible to ask an Internet crowd to conduct experiments on their personal computers. Since such a crowd can be quite large, crowdsourcing enables researchers to conduct experiments with a more diverse set of participants at a lower economic cost than would be possible under laboratory conditions. However, because participants carry out experiments without supervision, they may give erroneous feedback perfunctorily, carelessly, or dishonestly, even if they receive a reward for each experiment. In this paper, we propose a crowdsourceable framework to quantify the QoE of multimedia content. The advantages of our framework over traditional MOS ratings are: 1) it enables crowdsourcing because it supports systematic verification of participants’ inputs; 2) the rating procedure is simpler than that of MOS, so there is less burden on participants; and 3) it derives interval-scale scores that enable subsequent quantitative analysis and QoE provisioning. We conducted four case studies, which demonstrated that, with our framework, researchers can outsource their QoE evaluation experiments to an Internet crowd without risking the quality of the results; and at the same time, obtain a higher level of participant diversity at a lower monetary cost.

TRANSCRIPT

Page 1: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

ACM Multimedia 2009

A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Kuan‐Ta Chen, Academia Sinica
Chen‐Chi Wu, National Taiwan University
Yu‐Chun Chang, National Taiwan University
Chin‐Laung Lei, National Taiwan University

Page 2: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


What is QoE?

Quality of Experience = 

Users’ satisfaction with a service

(e.g., multimedia content)

Page 3: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Quality of Experience

[Example images: Poor (underexposed) vs. Good (exposure OK)]

Page 4: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Challenges

How to quantify the QoE of multimedia content efficiently and reliably?


Page 5: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Mean Opinion Score (MOS)
Idea: Single Stimulus Method (SSM) + Absolute Category Rating (ACR)

The participant votes for one category: Excellent? Good? Fair? Poor? Bad? (e.g., "Fair")

MOS  Quality    Impairment
5    Excellent  Imperceptible
4    Good       Perceptible but not annoying
3    Fair       Slightly annoying
2    Poor       Annoying
1    Bad        Very annoying

(A small illustration of how a MOS is computed follows below.)
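For contrast with the paired-comparison approach proposed later, here is a minimal sketch (my own illustration, not code from the slides) of how a MOS is obtained: each participant's ACR vote is mapped to the 1..5 scale above and the votes are averaged.

```python
# Minimal illustration (not from the slides): a MOS is the arithmetic mean of
# the participants' 1-5 Absolute Category Rating votes for one stimulus.

ACR_SCALE = {"Bad": 1, "Poor": 2, "Fair": 3, "Good": 4, "Excellent": 5}

def mean_opinion_score(votes):
    """votes: list of category names chosen by the participants."""
    scores = [ACR_SCALE[v] for v in votes]
    return sum(scores) / len(scores)

# Hypothetical votes for one clip.
print(mean_opinion_score(["Fair", "Good", "Fair", "Excellent", "Poor"]))  # 3.4
```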

Page 6: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Drawbacks of MOS‐based Evaluations

ACR‐based rating:
Concepts of the scales cannot be concretely defined
Dissimilar interpretations of the scale among users
Only an ordinal scale, not an interval scale
Difficult to verify users’ scores

Subjective experiments in the laboratory:
Monetary cost (reward, transportation)
Labor cost (supervision)
Physical space/time/hardware constraints

Goal: solve all of these drawbacks

Page 7: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Drawbacks of MOS‐based Evaluations (as listed above)

Proposed remedies: Paired Comparison (for the ACR‐based drawbacks) and Crowdsourcing (for the laboratory constraints)

Page 8: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Contribution

Page 9: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Talk Progress

Overview

Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design

Case Study & Evaluation
Acoustic QoE
Optical QoE

Conclusion

Page 10: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Current Approach: MOS Rating

Vote: Excellent? Good? Fair? Poor? Bad?

Page 11: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Our Proposal: Paired Comparison

Which one is better, A or B? (vote: B)

Page 12: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Properties of Paired Comparison

Generalizable across different content types and applications

Simple comparative judgment: a dichotomous decision is easier than a 5‐category rating

Interval‐scale QoE scores can be inferred

The users’ inputs can be verified

Page 13: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Choice Frequency Matrix

10 experiments, each containing C(4,2) = 6 paired comparisons. Each entry gives the number of times the row content was chosen over the column content (the row and column entries for each pair total 10).

     A   B   C   D
A    0   9  10   9
B    1   0   7   8
C    0   3   0   6
D    1   2   4   0

(A small sketch of how such a matrix is accumulated follows below.)
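A minimal sketch (my own illustration, not code from the paper) of how a choice frequency matrix like the one above can be accumulated from raw paired-comparison votes; the function and variable names are hypothetical.

```python
# Each vote is a (winner, loser) pair; entry [i][j] counts how often
# content i was preferred over content j.

from itertools import combinations

contents = ["A", "B", "C", "D"]
index = {name: k for k, name in enumerate(contents)}

def choice_frequency_matrix(votes):
    """votes: iterable of (winner, loser) content-name pairs."""
    n = len(contents)
    matrix = [[0] * n for _ in range(n)]
    for winner, loser in votes:
        matrix[index[winner]][index[loser]] += 1
    return matrix

# Hypothetical example: one experiment covers all C(4,2) = 6 pairs; this
# simulated participant always prefers the alphabetically earlier clip.
one_experiment = [(a, b) for a, b in combinations(contents, 2)]
print(choice_frequency_matrix(one_experiment))
```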

Page 14: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Inference of QoE Scores

Bradley‐Terry‐Luce (BTL) model
Input: choice frequency matrix
Output: an interval‐scale score for each content (based on maximum likelihood estimation)

$$P_{ij} = \frac{\pi_i}{\pi_i + \pi_j} = \frac{e^{u(T_i)}}{e^{u(T_i)} + e^{u(T_j)}} = \frac{1}{1 + e^{u(T_j) - u(T_i)}}$$

n content: T1,…, Tn

Pij : the probability of choosing Ti over Tj

u(Ti) is the estimated QoE score of the quality level Ti

Basic Idea: $P_{12} = P_{23} \Leftrightarrow u(T_1) - u(T_2) = u(T_2) - u(T_3)$, i.e., equal choice probabilities correspond to equal score differences, which is what makes the scores interval‐scale. (A sketch of the MLE computation follows below.)
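A minimal sketch (not the authors' implementation) of BTL score estimation from a choice frequency matrix, using the standard minorization-maximization (Zermelo) iteration for the maximum-likelihood estimate; the function name, the convergence settings, and the choice to shift the scores so the worst content is 0 are my own assumptions.

```python
import math

def btl_scores(wins, iterations=1000, tol=1e-9):
    """wins[i][j]: number of times content i was chosen over content j.
    Returns interval-scale scores u(T_i) = log(pi_i), shifted so the minimum is 0."""
    n = len(wins)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        new_pi = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new_pi.append(total_wins / denom if denom > 0 else pi[i])
        s = sum(new_pi)
        new_pi = [p / s for p in new_pi]
        if max(abs(a - b) for a, b in zip(pi, new_pi)) < tol:
            pi = new_pi
            break
        pi = new_pi
    u = [math.log(p) for p in pi]
    shift = min(u)
    return [x - shift for x in u]   # worst content anchored at score 0

# Choice frequency matrix from the earlier slide (contents A, B, C, D).
matrix = [[0, 9, 10, 9],
          [1, 0, 7, 8],
          [0, 3, 0, 6],
          [1, 2, 4, 0]]
print(btl_scores(matrix))
```

The scores on the next slide appear to be rescaled to the [0, 1] range; dividing the shifted scores by their maximum would give that form.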

Page 15: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Inferred QoE Scores

[Bar chart: inferred interval‐scale QoE scores of the four contents, normalized to 0, 0.63, 0.91, and 1]

Page 16: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Talk Progress

Overview

Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design

Case Study & Evaluation
Acoustic QoE
Optical QoE

Conclusion

Page 17: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Crowdsourcing

= Crowd + Outsourcing

“soliciting solutions via open calls to large‐scale communities”

Page 18: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Image Understanding

Reward: 0.04 USD

Questions asked about each image: main theme? key objects? unique attributes?

Page 19: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Linguistic Annotations

Word similarity (Snow et al. 2008)

USD 0.2 for labeling 30 word pairs

Page 20: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


More Examples

Document relevance evaluation: Alonso et al. (2008)

Document rating collection: Kittur et al. (2008)

Noun compound paraphrasing: Nakov (2008)

Person name resolution: Su et al. (2007)

Page 21: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


The Risk

Users may give erroneous feedback perfunctorily, carelessly, or dishonestly

Dishonest users have even stronger incentives to perform tasks, since they can collect rewards with minimal effort

Not every Internet user is trustworthy!

Need to have an ONLINE algorithm to detect problematic inputs!

Page 22: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Verification of Users’ Inputs (1)

Transitivity property: if A > B and B > C, then A should be > C

Transitivity Satisfaction Rate (TSR):

$$\mathrm{TSR} = \frac{\#\ \text{of triples that satisfy the transitivity rule}}{\#\ \text{of triples the transitivity rule may apply to}}$$

(A small sketch of the computation follows below.)
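A minimal sketch (my own reading of the definition above, not the authors' code) of how the TSR can be computed from one participant's judgments; the function name and the counting over ordered triples are assumptions.

```python
from itertools import permutations

def transitivity_satisfaction_rate(prefers):
    """prefers: set of (x, y) pairs meaning the user judged x better than y."""
    items = {x for pair in prefers for x in pair}
    applicable = satisfied = 0
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers:
            applicable += 1                 # the transitivity rule applies to this triple
            if (a, c) in prefers:
                satisfied += 1              # ...and is satisfied
    return satisfied / applicable if applicable else 1.0

# Hypothetical example: a circular preference (A > B, B > C, C > A) violates
# transitivity, so the TSR drops to 0.
print(transitivity_satisfaction_rate({("A", "B"), ("B", "C"), ("C", "A")}))  # 0.0
```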

Page 23: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Verification of Users’ Inputs (2)

Detect inconsistent judgments from problematic users

TSR = 1 → perfect consistency

TSR ≥ 0.8 → generally consistent

TSR < 0.8 → judgments are inconsistent

TSR‐based reward / punishment (e.g., only pay a reward if TSR > 0.8)

Page 24: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Experiment Design

For n algorithms (e.g., n speech encoding settings):

1. Take a source content as the evaluation target.
2. Apply the n algorithms to generate n versions with different quality levels.
3. Ask a user to perform C(n, 2) paired comparisons.
4. Compute the TSR after the experiment.

Reward a user ONLY if his inputs are self‐consistent (i.e., the TSR is higher than a certain threshold). A sketch of this flow is given below.
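A minimal end-to-end sketch of this procedure (my own assumptions: the function names, the simulated participant, and the 0.8 threshold suggested two slides earlier); it generates the C(n, 2) comparisons, collects one participant's judgments, and pays only if the TSR clears the threshold.

```python
from itertools import combinations, permutations
import random

def run_experiment(versions, judge, tsr_threshold=0.8):
    """versions: identifiers of the n versions produced by the n algorithms.
    judge(a, b): returns whichever of a, b the participant says is better."""
    pairs = list(combinations(versions, 2))      # the C(n, 2) paired comparisons
    random.shuffle(pairs)                        # randomize presentation order
    beats = set()
    for a, b in pairs:
        winner = judge(a, b)
        beats.add((winner, b if winner == a else a))

    # Transitivity Satisfaction Rate over all ordered triples the rule applies to.
    applicable = [(x, y, z) for x, y, z in permutations(versions, 3)
                  if (x, y) in beats and (y, z) in beats]
    satisfied = [t for t in applicable if (t[0], t[2]) in beats]
    tsr = len(satisfied) / len(applicable) if applicable else 1.0

    return beats, tsr, tsr >= tsr_threshold      # reward only if self-consistent

# Hypothetical usage: a simulated participant who always prefers the higher bit rate.
bitrates = [32, 48, 64, 80, 96, 128]             # e.g., the MP3 CBR levels (Kbps)
judgments, tsr, rewarded = run_experiment(bitrates, judge=max)
print(tsr, rewarded)                             # fully consistent -> TSR 1.0, rewarded
```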

Page 25: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Concept Flow in Each Round

Page 26: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Audio QoE Evaluation

Which one is better? (The participant toggles between the two clips with the SPACE key: released plays one clip, pressed plays the other.)

Page 27: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Video QoE Evaluation

Which one is better? (The participant toggles between the two clips with the SPACE key: released plays one clip, pressed plays the other.)

Page 28: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Talk Progress

Overview

Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design

Case Study & Evaluation
Acoustic QoE
Optical QoE

Conclusion

Page 29: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Audio QoE Evaluation

Case study 1: MP3 compression level
Source clips: one fast‐paced and one slow‐paced song
MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and 128 Kbps
127 participants and 3,660 paired comparisons

Case study 2: Effect of packet loss rate on VoIP
Two speech codecs: G.722.1 and G.728
Packet loss rates: 0%, 4%, and 8%
62 participants and 1,545 paired comparisons

Page 30: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Inferred QoE Scores

[Charts: inferred QoE scores versus MP3 compression level (left) and VoIP packet loss rate (right)]

Page 31: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Video QoE Evaluation

Case study 3: Video codec
Source clips: one fast‐paced and one slow‐paced video clip
Three codecs: H.264, WMV3, and XVID
Two bit rates: 400 and 800 Kbps
121 participants and 3,345 paired comparisons

Case study 4: Loss concealment scheme
Source clips: one fast‐paced and one slow‐paced video clip
Two schemes: frame copy (FC) and FC with frame skip (FCFS)
Packet loss rates: 1%, 5%, and 8%
91 participants and 2,745 paired comparisons

Page 32: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Inferred QoE Scores

[Charts: inferred QoE scores by video codec (left) and loss concealment scheme (right)]

Page 33: A Crowdsourceable QoE Evaluation Framework for Multimedia Content


Participant Source

Laboratory: recruit part‐time workers at an hourly rate of 8 USD

MTurk: post experiments on the Mechanical Turk web site; pay the participant 0.15 USD for each qualified experiment

Community: seek participants on the website of an Internet community with 1.5 million members; pay the participant an amount of virtual currency equivalent to one US cent for each qualified experiment

Page 34: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Participant Source Evaluation

With crowdsourcing…

lower monetary cost

wider participant diversity

while maintaining the evaluation results’ quality

Crowdsourcing seems a good strategy for multimedia QoE assessment!

Page 35: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

http://mmnet.iis.sinica.edu.tw/link/qoe

Page 36: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

Conclusion

Crowdsourcing is not without limitations:
no physical contact with participants
no control over the test environment
constraints on the media that can be delivered

With paired comparison and user input verification, crowdsourced QoE evaluation achieves:
lower monetary cost
wider participant diversity
shorter experiment cycle
evaluation quality maintained

Page 37: A Crowdsourceable QoE Evaluation Framework for Multimedia Content

ACM Multimedia 2009

Kuan‐Ta Chen, Academia Sinica

May the crowd Force be with you!

Thank You!