TRANSCRIPT
Institute of Computer Science, Chair of Communication Networks
Prof. Dr.-Ing. P. Tran-Gia
Modeling YouTube QoE based on Crowdsourcing and Laboratory User Studies
Tobias Hoßfeld, Raimund Schatz
STSM 15.8.-30.9.2011
http://www3.informatik.uni-wuerzburg.de/research/fia
http://www3.informatik.uni-wuerzburg.de/staff/hossfeld
QoE Issue: Waiting, Waiting, Waiting…
Stalling
Waiting Time Perception
Research Activities Related to STSM
Application-Level Measurements
• bottleneck scenario with constant bandwidth
• video characteristics
• realistic stalling patterns

Monitoring and Stalling Detector
• heuristics fit QoS
• information extraction approach leads to exact QoE results

Optimization and Dimensioning
• initial delay (GI/GI/1): T0/D < 5%
• bandwidth provisioning: 120% of video bit rate V (see the sketch below)
• TCP performs better than UDP in a bottleneck

QoE Modeling
• only stalling relevant, not content, demographics, etc.
• users "accept" almost no or only short stalling
• crowdsourcing supports i:Lab

Slide annotations:
• video player parameter: initial buffer 2 sec
• variable video bit rate V; high stalling frequency for V = B
• QoE management
• stalling lengths used in tests: 1-6 sec
• mapping between QoS (e.g. bandwidth B) and QoE
• stalling as key influence factor
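To make the "bandwidth provisioning: 120% of V" rule of thumb concrete, here is a minimal Python sketch assuming a simple fluid-buffer view in which stalling becomes likely once the bottleneck bandwidth B falls below 1.2 times the video bit rate V. The function name, the fluid-buffer simplification, and the 20% headroom default are illustrative assumptions, not part of the study.

# Illustrative sketch (not from the slides): the "120% of V" provisioning rule
# interpreted under a simple fluid-buffer assumption.

def stalling_expected(bandwidth_b: float, video_bitrate_v: float,
                      headroom: float = 1.2) -> bool:
    """Return True if stalling is likely, i.e. the bottleneck bandwidth B does not
    provide the assumed 20% headroom over the video bit rate V."""
    return bandwidth_b < headroom * video_bitrate_v

# Example: a 2.5 Mbps video over a 2.8 Mbps bottleneck violates the 120% rule.
print(stalling_expected(bandwidth_b=2.8, video_bitrate_v=2.5))  # True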
Executive Summary of STSM
Developed Test Design → Conducted Crowdsourcing Tests → Derived QoE Model

Conducted Crowdsourcing Tests
• Remote users
• 'Reliability' questions
• App./user monitoring
• Preloading of data

Data analysis
• Identification of reliable users
• Key influence factors via machine learning
• Fitting with fundamental relationships (see the fitting sketch below)

Derived QoE Model
• Mapping function: stalling and QoE
• Acceptance vs. perception
• Comparison of crowdsourcing with laboratory results

Application Measurements
• Realistic parameters for temporal stimuli

Laboratory Study
• Reliable users
• Different demographics
• Different test setting, e.g. longer user tests
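The "fitting with fundamental relationships" step can be illustrated with a short Python sketch, assuming an exponential relationship between the number of stalling events and MOS. The functional form, parameter names, and the sample ratings below are assumptions for illustration, not the study's data.

# Minimal fitting sketch (illustrative only): fit an exponential relationship between
# the number of stalling events and MOS.
import numpy as np
from scipy.optimize import curve_fit

def mos_model(n_stalls, alpha, beta, gamma):
    # Exponential decay from a maximum rating towards a lower bound.
    return alpha * np.exp(-beta * n_stalls) + gamma

# Hypothetical aggregated ratings: MOS per number of stalling events (0..6).
n = np.arange(0, 7)
mos = np.array([4.4, 3.4, 2.8, 2.4, 2.1, 1.9, 1.8])

params, _ = curve_fit(mos_model, n, mos, p0=(3.5, 0.3, 1.5))
print("fitted (alpha, beta, gamma):", params)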
Crowdsourcing Workflow
Challenge: identify unreliable QoE results
Countermeasures:
• proper test design (gold standard data, consistency questions, content questions, application monitoring)
• filtering data and analyzing QoE results
[Workflow diagram, steps 1-5: the employer submits a task to the crowdsourcing platform, a worker pulls the task, completes it, and receives remuneration]
Methods also applicable to e.g. field trials!
Crowdsourcing: Unreliable workers
LEVEL 1: 'reliability' questions
- wrong answers to content questions
- different answers to the same questions
- always selected the same option
- consistency questions: specified the wrong country/continent

LEVEL 2: 'QoE' question
- did not notice stalling
- perceived non-existent stalling

LEVEL 3: 'application/user' monitoring
- did not watch all videos completely
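A minimal Python sketch of how workers could be sorted into these filter levels, assuming the interpretation that a worker is assigned the highest level whose checks are passed. All field names and the decision order are assumptions made for illustration, not the actual filtering implementation.

# Illustrative classification of a worker into reliability levels 0-3 as named above.
from dataclasses import dataclass

@dataclass
class WorkerResult:
    content_answers_correct: bool    # content questions (Level 1 checks)
    consistent_answers: bool         # consistency questions (Level 1 checks)
    varied_options: bool             # did not always select the same option (Level 1)
    noticed_real_stalling: bool      # 'QoE' question checks (Level 2)
    reported_fake_stalling: bool     # perceived non-existent stalling (Level 2)
    watched_all_videos: bool         # application/user monitoring (Level 3)

def reliability_level(w: WorkerResult) -> int:
    """Return the highest filter level whose checks the worker passes (0 = none)."""
    if not (w.content_answers_correct and w.consistent_answers and w.varied_options):
        return 0
    if not w.noticed_real_stalling or w.reported_fake_stalling:
        return 1
    if not w.watched_all_videos:
        return 2
    return 3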
[Figure: percentage of workers per test (Mw1-Mw7, Facebook) assigned to reliability levels 0-3]
[Figure: SOS parameter a vs. ratio of fake users for stalling lengths L = 1 s and L = 3 s; filter levels 1-3 marked; campaigns C1-C7 and Facebook]
• SOS hypothesis indicates an unreliable test
• Many user ratings rejected: further improvements required
• User warnings ("Test not done carefully"): rejection rate decreased by about 50%
• Filtering may be too strict: application-layer monitoring not reliable
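The SOS hypothesis referenced above relates the standard deviation of opinion scores (SOS) to the MOS on a 5-point scale via a single parameter a, i.e. SOS(x)^2 = a(-x^2 + 6x - 5). Below is a short Python sketch of how a could be estimated per test campaign; the ratings are hypothetical and the least-squares estimator is one possible choice, not necessarily the one used in the study.

# Sketch: estimate the SOS parameter a from per-condition ratings (hypothetical data).
# SOS hypothesis: SOS(x)^2 = a * (-x^2 + 6x - 5) on a 5-point scale.
import numpy as np

# Hypothetical ratings; each inner list holds the scores for one stalling condition.
ratings = [
    [5, 4, 5, 4, 5],
    [4, 3, 4, 2, 4],
    [2, 3, 2, 1, 2],
]

mos = np.array([np.mean(r) for r in ratings])
sos2 = np.array([np.var(r, ddof=1) for r in ratings])   # squared SOS per condition
g = -mos**2 + 6 * mos - 5                                # maximum possible variance

# Least-squares estimate of a; an implausible a hints at an unreliable test.
a_hat = np.sum(sos2 * g) / np.sum(g**2)
print("estimated SOS parameter a:", a_hat)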
Crowdsourcing vs. Laboratory Studies
[Figure: MOS vs. number of stalling events (0-6) for crowdsourcing and laboratory tests, each stalling event lasting 4 seconds]
Key influence factors on YouTube QoE: stalling frequency and stalling duration determine the user-perceived quality
Lab studies within ACE 2.0 at FTW's i:Lab: similar shapes of the curves in the laboratory and crowdsourcing studies
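The curves compared above suggest a mapping function from the number of stalling events N and the stalling length L to MOS. The Python sketch below shows one such exponential mapping; the coefficients are placeholders chosen for illustration and are not the fitted values reported in the publications.

# Illustrative mapping function MOS = f(number of stallings N, stalling length L).
# Coefficients are placeholders, not the study's fitted values.
import numpy as np

def mos_from_stalling(n_stalls, stall_len_s,
                      alpha=3.5, beta_per_s=0.15, beta_0=0.19, gamma=1.5):
    """MOS decays exponentially with the number of stalling events; longer stalling
    events (stall_len_s, in seconds) make the decay steeper."""
    return alpha * np.exp(-(beta_per_s * stall_len_s + beta_0) * n_stalls) + gamma

n = np.arange(0, 7)
print(mos_from_stalling(n, stall_len_s=4.0))  # e.g. the "4 seconds of stalling" condition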
Conclusions
Most relevant stimuli of Internet applications are of a temporal nature
QoE models have to be extended in the temporal dimension: stalling, waiting times, service interruptions
Gap between user perception and user acceptance; differences between lab and crowdsourcing results (WG3)
'Failed' subjective studies for the analysis of reliability (WG4); standards to detect unreliable subjects (WG5)
Crowdsourcing appears promising:
• tests are conducted fast and at low cost
• possibility to access different user groups (in terms of expectations/social background)
• but new challenges are imposed
Related working groups: WG1 "Web and cloud apps", WG2 "Crowdsourcing"
Outcome of STSM
"Quantification of YouTube QoE via Crowdsourcing" by Tobias Hoßfeld, Raimund Schatz, Michael Seufert, Matthias Hirth, Thomas Zinner, Phuoc Tran-Gia. IEEE International Workshop on Multimedia Quality of Experience - Modeling, Evaluation, and Directions (MQoE 2011), Dana Point, CA, USA, December 2011.
"FoG and Clouds: On Optimizing QoE for YouTube" by Tobias Hoßfeld, Florian Liers, Thomas Volkert, Raimund Schatz. Accepted at the 5th KuVS GI/ITG Workshop "NG Service Delivery Platforms", DOCOMO Euro-Labs, Munich, Germany.
"Quality of Experience of YouTube Video Streaming for Current Internet Transport Protocols" by Tobias Hoßfeld and Raimund Schatz. Currently under submission at ACM Computer Communication Review; a technical report of the University of Würzburg containing the numerical results is available as Technical Report No. 482, "Transport Protocol Influences on YouTube QoE", July 2011.
"'Time is Bandwidth'? Narrowing the Gap between Subjective Time Perception and Quality of Experience" by Sebastian Egger, Peter Reichl, Tobias Hoßfeld, Raimund Schatz. Submitted to IEEE ICC 2012 - Communication QoS, Reliability and Modeling Symposium.
"Challenges of QoE Management for Cloud Applications" by Tobias Hoßfeld, Raimund Schatz, Martin Varela, Christian Timmerer. Submitted to IEEE Communications Magazine, Special Issue on QoE management in emerging multimedia services.
"Recommendations and Comparison of Subjective User Tests via Crowdsourcing and Laboratories for Online Video Streaming", intended for submission.
"Impact of Fake User Ratings on QoE", intended for journal submission.