TRANSCRIPT
Task Force
“Crowdsourcing”
Lisboa Meeting @ QoMEX 2016
9th June 2016, 9.30-10.00
Tobias Hossfeld, Babak Naderi
WG1 Research
Active Crowd
Agenda: Crowdsourcing TF
• 1. Overview on current status of crowdsourcing TF (Tobias, 15 min)
• 2. Interests & definition of core topics (all, 15 min)
– Bias in CS (Tobias Hossfeld)
– Adaptive CS (Michael Seufert)
– Scalability, Accuracy Improvement, Bias Removal (Dietmar Saupe)
– Enterprise Crowdsourcing (Matthias Hirth)
– The crowd community, lab vs. crowd (Judith Redi)
– P.CROWD (Babak Naderi)
Joint Activities and Major Outcome
• Collected in Google Doc
• Active members: core qualinet group and new members
• Continuation of research topics, but different flavor (visualization
community, saliency coding, etc.)
– Crowdsourcing for QoE Experiments: Sebastian Egger, Judith Redi,
Sebastian Möller, Tobias Hossfeld, Matthias Hirth, Christian Keimel,
and Babak Naderi
– Crowdsourcing Versus the Laboratory: Towards Human-centered
Experiments Using the Crowd: Ujwal Gadiraju, Sebastian Möller, Martin
Nöllenburg, Dietmar Saupe, Sebastian Egger, Daniel W. Archambault,
and Brian Fischer
– Crowdsourcing for QoE Experiments: Saliency Coding - Dietmar Saupe
(Uni Konstanz)
Joint Activities and Major Outcome (cont.)
• New topics started
– Adaptive Crowdsourcing: Tobias Hoßfeld, Michael Seufert, Dietmar Saupe
– Understanding The Crowd: ethical and practical matters in the academic use
of crowdsourcing: Sheelagh Carpendale, Neha Gupta, Tobias Hoßfeld, David
Martin, Sebastian Möller, Babak Naderi, Judith Redi, Ina Wechsung
– Crowdsourced speech QoE (Sebastian Möller, Tobias Hoßfeld, Babak Naderi)
• National project proposal “Analysis of influence factors and definition of subjective methods for evaluating the quality of speech services using crowdsourcing”
• Special session "Advanced Crowdsourcing for Speech and Beyond" at Interspeech 2015
• Progress in standardization
– ITU Standardization on Crowdsourcing (P.Crowd): Sebastian Möller (Editor)
– ITU-T P.912 (“Subjective video quality assessment methods for recognition tasks”)
• with an appendix focused on crowdsourcing, based on the Qualinet crowdsourcing white paper “Best Practices and Recommendations for Crowdsourced QoE – Lessons Learned from the Qualinet Task Force Crowdsourcing”
Joint Activities and Major Outcome
• Special Issue on “Crowdsourcing” in Computer Networks Journal
(Elsevier) 2015 published. Guest editors: Tobias Hoßfeld (University
of Duisburg-Essen), Phuoc Tran-Gia (University of Würzburg), Dr.
Maja Vukovic (IBM Thomas J. Watson Research Center Yorktown
Heights)
• Dagstuhl Report “Evaluation in the Crowd: Crowdsourcing and
Human-Centred Experiments”.
http://dx.doi.org/10.4230/DagRep.5.11.103
Joint Events
• Dagstuhl seminar on Crowdsourcing: http://dagstuhl.de/15481/
• PQS Special Session on Crowdsourcing: Judith Redi (TU Delft), Matthias Hirth (University of Würzburg), Tim Polzehl (TU Berlin)
• ACM CrowdMM 2015, Brisbane, AU: co-organizer Judith Redi (TU Delft)
• Special session on "Advanced Crowdsourcing for Speech and Beyond" at Interspeech 2015, Dresden, Germany, Sep 2015. Main organizer from Qualinet: Tim Polzehl (TU Berlin)
• Tutorial “Adaptive Media Streaming and Quality of Experience Evaluations using Crowdsourcing” by Christian Timmerer, Tobias Hoßfeld at 27th International Teletraffic Congress (ITC 27), September 8th-10th, 2015, Ghent, Belgium
(Joint) Publications
• Michael Seufert, Ondrej Zach, Tobias Hoßfeld, Martin Slanina, Phuoc Tran-Gia. Impact of Test Condition Selection in Adaptive Crowdsourcing Studies on Subjective Quality. QoMEX 2016, Lisbon, Portugal, June 2016.
• Lebreton, P., Hupont, I., Mäki, T., Skodras, E., & Hirth, M. (2015, October). Eye Tracker in the Wild: Studying the delta between what is said and measured in a crowdsourcing experiment. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 3-8). ACM.
• Gardlo, B., Egger, S., & Hossfeld, T. (2015, October). Do Scale-Design and Training Matter for Video QoE Assessments through Crowdsourcing?. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 15-20). ACM.
• Redi, J., Siahaan, E., Korshunov, P., Habigt, J., & Hossfeld, T. (2015, October). When the Crowd Challenges the Lab: Lessons Learnt from Subjective Studies on Image Aesthetic Appeal. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 33-38). ACM.
• Korshunov, P., Bernardo, M. V., Pinheiro, A. M., & Ebrahimi, T. (2015). Impact of Tone-mapping Algorithms on Subjective and Objective Face Recognition in HDR Images. In International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM) (No. EPFL-CONF-210823).
• Naderi, B., Möller, S., & Hoßfeld, T. (2016). ITU-T contribution to P.Crowd (Q.7)
• Dietmar Saupe, Franz Hahn, Vlad Hosu, Igor Zingman, Masud Rana, Shujun Li: Crowd workers proven valuable in comparative study of subjective video quality assessment. QoMEX 2016 Short Paper, Lisbon, June 2016.
• Vlad Hosu, Franz Hahn, Igor Zingman, Dietmar Saupe: Reported Attention as a Promising Alternative to Gaze in IQA Tasks. Submitted to PQS, Berlin, 2016.
• “One Shot Crowdtesting: Approaching the Extremes of Crowdsourced Subjective Quality Testing” by Michael Seufert, Tobias Hoßfeld, PQS 2016
• “Size does matter. Comparing the results of a lab and a crowdsourcing file download QoE study” by Andreas Sackl, Bruno Gardlo, Raimund Schatz, QoMEX 2016
List is incomplete and needs to be updated
Active TF!
Agenda: Crowdsourcing TF
• 1. Overview on current status of crowdsourcing TF (Tobias, 15 min)
• 2. Interests & definition of core topics (all, 15 min)
– Bias in CS (Tobias Hossfeld)
– Adaptive CS (Michael Seufert)
– Scalability, Accuracy Improvement, Bias Removal (Dietmar Saupe)
– Enterprise Crowdsourcing (Matthias Hirth)
– The crowd community, lab vs. crowd (Judith Redi)
– P.CROWD (Babak Naderi)
Mini-Dagstuhl Seminar at PQS 2016
• Idea: Mini-Dagstuhl Seminar on Crowdsourcing
– Intensive discussions on selected core topics
– Topics identified in online meeting before
– There is interest of the TF members for Mini-Dagstuhl-like seminar on
crowdsourcing.
• Co-located with PQS 2016, 5th ISCA/DEGA Workshop on
Perceptual Quality of Systems
– http://pqs.qu.tu-berlin.de/
– Sep 1, 2016: 13:00-17:30;
Sep 2, 2016: 9:00-12:00
– Concrete schedule: please fill in the Doodle poll
https://uaruhr.doodle.com/poll/y22qwhxm8e5mi6cw
Key Topics
• Bias in CS: Methodology
• Lab vs. CS: Systematic Approach and Standardization
• Adaptive CS: Sampling Strategies, Parameter Selection
• Extreme Design Cases: Single CS Ratings vs. Long CS Tasks
• Utilizing the power of CS: influence factors like user level
• Reliability Metrics and CS: effects of CS
• Crowdsourcing in enterprise environments for
– QoE experiments
– Knowledge discovery/organization
• Broadening the scope
– Crowdsensing
– The People
(Some) Proposals: Tobias
• Adaptive crowdsourcing and automatic selection
– Task questions are generated online, depending on the previously given answers, in a crowdsourcing experiment on video quality (Uni Konstanz)
– Parameter selection in crowdsourcing tests: continuous as well as discrete parameters
– Multi-parameter selection in crowdsourcing tests
– How to select parameters dynamically (based on history of users’ ratings, based on parameters under investigation, based on data quality, reliability)?
• Bias in CS
– Many factors lead to a bias in CS experiments and determine who participates (type of task, motivation, time zones, etc.)
– Each CS experiment is one random sample of the crowd population
– Example: demographics of workers
– What is the bias of CS results? How can the results be debiased?
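The demographic-bias question above could, for instance, be tackled by post-stratification: reweight each rating so that demographic groups count according to their share in the target population rather than in the crowd sample. A minimal Python sketch; the group names and population shares are hypothetical, not from the TF's experiments:

```python
from collections import Counter

def poststratified_mos(ratings, target_share):
    """MOS with post-stratification weights.

    ratings: list of (group, score) tuples from the crowd
    target_share: dict mapping group -> share in the target population
    Each rating is weighted by target share / sample share of its group.
    """
    n = len(ratings)
    sample_count = Counter(group for group, _ in ratings)
    weighted_sum = 0.0
    total_weight = 0.0
    for group, score in ratings:
        w = target_share[group] / (sample_count[group] / n)
        weighted_sum += w * score
        total_weight += w
    return weighted_sum / total_weight

# Hypothetical sample: young workers are over-represented in the crowd.
ratings = [("young", 4), ("young", 5), ("young", 4), ("old", 2)]
print(poststratified_mos(ratings, {"young": 0.5, "old": 0.5}))
```

The naive mean here would be 3.75; the reweighted MOS is lower because the under-sampled "old" group rated the stimulus worse.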
Michael Seufert
Qualinet Crowdsourcing Task Force
Adaptive Crowdsourcing
Problem: How to obtain an accurate QoE model for a parameter
range with a fixed, small rating budget?
Typical approach: split rating budget equally among discrete
conditions
But: higher certainty at the edges of the rating scale than for
“interesting” medium quality conditions
Proposed alternative: adaptive crowdsourcing
Statistical test condition selection based on confidence interval widths
Results:
Statistical adaptation is able to reduce confidence interval width and
gives accurate MOS values
Enough ratings have to be allocated at the cold start
QoE models from the continuous test design outperform the discrete case
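The statistical test condition selection described above can be sketched as a simple loop: always assign the next rating to the condition whose current MOS confidence interval is widest. A minimal Python sketch, assuming hypothetical condition names and rating data:

```python
import math
import statistics

def ci_halfwidth(scores, z=1.96):
    """95% normal-approximation confidence interval half-width of the mean."""
    if len(scores) < 2:
        return float("inf")  # cold start: force initial ratings first
    return z * statistics.stdev(scores) / math.sqrt(len(scores))

def next_condition(ratings):
    """Pick the test condition whose MOS estimate is least certain."""
    return max(ratings, key=lambda cond: ci_halfwidth(ratings[cond]))

# Hypothetical ratings collected so far for three bitrate conditions.
ratings = {"500kbps": [2, 3, 2, 3],
           "2Mbps":   [3, 4, 5, 2, 4, 3],
           "8Mbps":   [5, 5, 5, 5, 4, 5]}
print(next_condition(ratings))  # -> 2Mbps
```

The edge conditions (very low and very high quality) already have narrow intervals, so the next rating goes to the "interesting" medium condition; the `inf` return also reproduces the cold-start need noted above.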
Michael Seufert
Qualinet Crowdsourcing Task Force
One Shot Crowdsourcing
Crowdsourcing studies often prove to be very sensitive, e.g., to
test instructions, design, and filtering of unreliable participants
The exposure of several test conditions to single workers
potentially leads to an implicit training and anchoring of ratings
Research question:
Is it feasible to conduct a crowdsourced QoE study by
presenting only a single test condition to each participant?
First results:
Lower MOS values for the set of first ratings
are clearly visible, although the confidence
intervals overlap
One shot test design results in a QoE
model similar to the traditional approach
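The comparison behind these first results can be sketched as: compute the MOS once from only each worker's first rating (the one-shot view) and once from all ratings (the traditional multi-condition view). The session data below is hypothetical, chosen so that later ratings drift upwards, as implicit training/anchoring would suggest:

```python
import statistics

def first_vs_all_mos(sessions):
    """Return (one-shot MOS, traditional MOS).

    sessions: list of per-worker rating lists, in presentation order.
    One-shot MOS uses only each worker's first rating; traditional
    MOS uses every rating from every worker.
    """
    first = [s[0] for s in sessions if s]
    everything = [r for s in sessions for r in s]
    return statistics.mean(first), statistics.mean(everything)

# Hypothetical worker sessions: ratings rise after the first stimulus.
sessions = [[3, 4, 4], [2, 3, 3], [4, 4, 5]]
one_shot, traditional = first_vs_all_mos(sessions)
print(one_shot, traditional)
```

With this toy data the one-shot MOS (3.0) is lower than the traditional MOS (about 3.56), mirroring the "lower MOS values for the set of first ratings" observation above.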
Workgroup Multimedia Signal Processing
University of Konstanz, Germany
Dietmar Saupe, Vlad Hosu, Franz Hahn, Shujun Li, Tamas Sziranyi
IQA/VQA
• Scalability
– Economics of crowdsourcing: improving worker engagement. Trade-offs between qualification and price. Short-term vs. long-term incentives. Where does one best do the filtering?
– We will be using thousands of stimuli.
• Accuracy improvement
– Reconstruction of MOS from paired comparison ratings. Problems: how to best replace absolute category ratings by more sensitive paired comparisons (with ratings of magnitudes of differences). Mathematical models, numerical solution, and validation.
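One simple instance of such a mathematical model: if workers rate the magnitude of the quality difference between every pair of stimuli, the least-squares scale values (up to an additive constant, fixed here to zero mean) are just the row means of the antisymmetric difference matrix. A sketch with a hypothetical 3-stimulus example, not data from the Konstanz study:

```python
def scale_from_differences(diff):
    """Least-squares scale values from a complete matrix of rated
    quality differences, diff[i][j] ~ s_i - s_j (antisymmetric).

    Minimizing sum_ij (diff[i][j] - (s_i - s_j))^2 with zero-mean
    scores gives the closed form s_i = mean over j of diff[i][j].
    """
    n = len(diff)
    return [sum(row) / n for row in diff]

# Hypothetical judgments for 3 stimuli whose true scores are 1, 2, 4.
diff = [
    [0, -1, -3],
    [1,  0, -2],
    [3,  2,  0],
]
print(scale_from_differences(diff))  # true scores shifted to zero mean
```

Real data would be incomplete and noisy, so in practice this becomes a numerical least-squares (or Thurstone/Bradley-Terry) fit rather than a closed form; the sketch only shows the underlying model.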
• How to properly define quality in IQA/VQA
– How to deal with the different dimensions of quality: information content, usability, aesthetics, or pleasantness.
• Bias removal
– Grounded mean opinion scores: how to introduce objective anchors and use the crowd as a predictor for the rest of the data.
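The objective-anchor idea could work as follows: include anchor stimuli with known reference scores in the crowd test, fit a mapping from crowd MOS to the reference scale, and apply it to the remaining stimuli. A linear-calibration sketch; the anchor values below are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (anchor calibration)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def ground_scores(anchor_crowd, anchor_true, crowd_scores):
    """Map raw crowd MOS values onto the objective anchor scale."""
    a, b = fit_line(anchor_crowd, anchor_true)
    return [a * x + b for x in crowd_scores]

# Hypothetical anchors: crowd MOS 2 and 4 correspond to reference 1 and 5.
print(ground_scores([2, 4], [1, 5], [3]))  # -> [3.0]
```

A linear map is only the simplest choice; the same scheme works with any monotone regression between the crowd and reference scales.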
Matthias Hirth
Enterprise Crowdsourcing
Enterprise Crowdsourcing
Idea: Integrating Crowdsourcing techniques in enterprise
environments
Challenges
Appropriate incentive mechanisms
Management of confidential data
Access to enterprise employees for real-world studies
…
Possible use cases
Information collection/Knowledge management
Enabling flexible working time models
QoE studies in enterprise settings
…
Matthias Hirth
Enterprise Crowdsourcing
Crowdsourced Enterprise QoE Study
Is it possible to quantify the impact of delays in enterprise
applications (e.g. SAP) on the satisfaction of employees?
Crowdsourcing methodology
Collection of subjective ratings at a very large scale
Simple interfaces for subjective ratings
Incentives for employees to increase participation (open)
Collected data
Six measurement periods, each 1-2 weeks
(Different) SAP performance measurements (~17 million data points)
Subjective employee ratings (~55,800 ratings)
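One way to quantify the delay/satisfaction relationship from such data is to correlate the per-period average response delay with the per-period average rating. A minimal Pearson-correlation sketch; the numbers below are hypothetical, not the study's SAP measurements:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-period averages: response delay (s) vs. mean rating.
delays  = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
ratings = [4.6, 4.3, 3.9, 3.1, 2.6, 1.8]
print(pearson(delays, ratings))
```

A strongly negative coefficient here would support the hypothesis that longer delays reduce employee satisfaction; with ~17 million measurement points, aggregation per period (as sketched) keeps the series aligned with the rating collection windows.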
Judith – Current CS & QoE Activities
• Crowd community
– Who are crowdworkers? Why do they do crowdwork? Crowdworkforce demographics – with Tobias, Babak, and Ina
– Is there a social infrastructure supporting crowdwork? Communities in online crowdsourcing fora – with Alessandro Bozzon (TUD) and Tobias
• Lab and crowd
– Reliability and repeatability of CS-based QoE testing
– Recent TMM publication – with Ernestasia Siahaan
– JOC – with Tobias and Pavel
• Facebook and volunteer crowds
– Social-media-based QoE testing, continuation of CrowdMM ’14 work – with Yi Zhu and Jessica Alecci
• QoE Crowdtesting
– Book chapter in the works
QUL TU-Berlin & Crowdsourcing
Standardization of speech quality assessment in crowdsourcing (ITU-T contribution on P.CROWD)
First steps (Lab vs. CS):
Effect of (re-)training on quality of data
Motivation message in trapping question
Questionnaire for measuring motivation in crowdsourcing based on Self-Determination Theory
Prediction of task choice strategies of workers (based on workload,…)
Gamification in Crowdsourcing
Crowdee: mobile crowdsourcing platform provided by QUL
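Trapping questions of the kind mentioned above are typically used to filter unreliable workers before analysis. A minimal sketch of such a filter, assuming hypothetical question IDs and answers; the strict default keeps only workers who answer every gold question correctly:

```python
def reliable_workers(answers, correct, threshold=1.0):
    """Keep workers whose trapping-question accuracy meets the threshold.

    answers: dict worker -> dict of trapping question -> given answer
    correct: dict of trapping question -> expected answer
    threshold: minimum fraction of correct answers (1.0 = all correct)
    """
    keep = []
    for worker, given in answers.items():
        hits = sum(1 for q, a in given.items() if correct.get(q) == a)
        if hits / len(given) >= threshold:
            keep.append(worker)
    return keep

# Hypothetical responses to two trapping questions.
correct = {"t1": "blue", "t2": "7"}
answers = {
    "w1": {"t1": "blue", "t2": "7"},
    "w2": {"t1": "red",  "t2": "7"},
}
print(reliable_workers(answers, correct))  # -> ['w1']
```

A motivation message embedded in the trapping question, as studied above, changes what workers see but not this filtering logic; relaxing `threshold` trades data quantity against reliability.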
Thank you
WG1 Crowdsourcing TF: Contact Information
• TF leader
– Tobias Hoßfeld (University of Duisburg-Essen), [email protected]
– Babak Naderi (TU Berlin) [email protected]
• Wiki page
– https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd
– Access to the wiki: contact Tobias and Babak
• Mailing list
– Qualinet Mail-Reflector for “Crowdsourcing”: [email protected]
– To subscribe to this list, simply send an (empty) email to [email protected] and follow the steps in the email you receive. The instructions can also be found at http://listes.epfl.ch/doc.cgi?liste=cs.wg2.qualinet.