TRANSCRIPT
Task Force
“Crowdsourcing”
Lisboa Meeting @ QoMEX 2016
9th June 2016, 9.30-10.00
Tobias Hossfeld, Babak Naderi
WG1 Research
Active Crowd
Agenda: Crowdsourcing TF
• 1. Overview on current status of crowdsourcing TF (Tobias, 15 min)
• 2. Interests & definition of core topics (all, 15 min)
– Bias in CS (Tobias Hossfeld)
– Adaptive CS (Michael Seufert)
– Scalability, Accuracy Improvement, Bias Removal (Dietmar Saupe)
– Enterprise Crowdsourcing (Matthias Hirth)
– The crowd community, lab vs. crowd (Judith Redi)
– P.CROWD (Babak Naderi)
Joint Activities and Major Outcome
• Collected in Google Doc
• Active members: core qualinet group and new members
• Continuation of research topics, but different flavor (visualization
community, saliency coding, etc.)
– Crowdsourcing for QoE Experiments: Sebastian Egger, Judith Redi,
Sebastian Möller, Tobias Hossfeld, Matthias Hirth, Christian Keimel,
and Babak Naderi
– Crowdsourcing Versus the Laboratory: Towards Human-centered
Experiments Using the Crowd: Ujwal Gadiraju, Sebastian Möller, Martin
Nöllenburg, Dietmar Saupe, Sebastian Egger, Daniel W. Archambault,
and Brian Fischer
– Crowdsourcing for QoE Experiments: Saliency Coding - Dietmar Saupe
(Uni Konstanz)
Joint Activities and Major Outcome (cont.)
• New topics started
– Adaptive Crowdsourcing: Tobias Hoßfeld, Michael Seufert, Dietmar Saupe
– Understanding The Crowd: ethical and practical matters in the academic use
of crowdsourcing: Sheelagh Carpendale, Neha Gupta, Tobias Hoßfeld, David
Martin, Sebastian Möller, Babak Naderi, Judith Redi, Ina Wechsung
– Crowdsourced speech QoE (Sebastian Möller, Tobias Hoßfeld, Babak Naderi)
• National project proposal “Analysis of influence factors and definition of subjective methods for evaluating the quality of speech services using crowdsourcing”
• Special session "Advanced Crowdsourcing for Speech and Beyond" at Interspeech 2015
• Progress in standardization
– ITU Standardization on Crowdsourcing (P.Crowd): Sebastian Möller (Editor)
– ITU-T P.912 (“Subjective video quality assessment methods for recognition tasks”)
• with an appendix focused on crowdsourcing, based on the Qualinet crowdsourcing white paper “Best Practices and Recommendations for Crowdsourced QoE – Lessons Learned from the Qualinet Task Force Crowdsourcing”
Joint Activities and Major Outcome
• Special Issue on “Crowdsourcing” in Computer Networks Journal
(Elsevier) 2015 published. Guest editors: Tobias Hoßfeld (University
of Duisburg-Essen), Phuoc Tran-Gia (University of Würzburg), Dr.
Maja Vukovic (IBM Thomas J. Watson Research Center Yorktown
Heights)
• Dagstuhl Report “Evaluation in the Crowd: Crowdsourcing and
Human-Centred Experiments”.
http://dx.doi.org/10.4230/DagRep.5.11.103
Joint Events
• Dagstuhl seminar on Crowdsourcing: http://dagstuhl.de/15481/
• PQS Special Session on Crowdsourcing: Judith Redi (TU Delft), Matthias Hirth (University of Würzburg), Tim Polzehl (TU Berlin)
• ACM CrowdMM 2015, Brisbane, AU: co-organizer Judith Redi (TU Delft)
• Special session on "Advanced Crowdsourcing for Speech and Beyond" at Interspeech 2015, Dresden, Germany, Sep 2015. Main organizer from Qualinet: Tim Polzehl (TU Berlin)
• Tutorial “Adaptive Media Streaming and Quality of Experience Evaluations using Crowdsourcing” by Christian Timmerer, Tobias Hoßfeld at 27th International Teletraffic Congress (ITC 27), September 8th-10th, 2015, Ghent, Belgium
(Joint) Publications
• Michael Seufert, Ondrej Zach, Tobias Hoßfeld, Martin Slanina, Phuoc Tran-Gia. Impact of Test Condition Selection in Adaptive Crowdsourcing Studies on Subjective Quality. QoMEX 2016, Lisbon, Portugal, June 2016.
• Lebreton, P., Hupont, I., Mäki, T., Skodras, E., & Hirth, M. (2015, October). Eye Tracker in the Wild: Studying the delta between what is said and measured in a crowdsourcing experiment. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 3-8). ACM.
• Gardlo, B., Egger, S., & Hossfeld, T. (2015, October). Do Scale-Design and Training Matter for Video QoE Assessments through Crowdsourcing?. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 15-20). ACM.
• Redi, J., Siahaan, E., Korshunov, P., Habigt, J., & Hossfeld, T. (2015, October). When the Crowd Challenges the Lab: Lessons Learnt from Subjective Studies on Image Aesthetic Appeal. In Proceedings of the Fourth International Workshop on Crowdsourcing for Multimedia (pp. 33-38). ACM.
• Korshunov, P., Bernardo, M. V., Pinheiro, A. M., & Ebrahimi, T. (2015). Impact of Tone-mapping Algorithms on Subjective and Objective Face Recognition in HDR Images. In International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM) (No. EPFL-CONF-210823).
• Naderi, B., Möller, S., & Hoßfeld, T. (2016). ITU-T contribution to P.Crowd (Q.7)
• Dietmar Saupe, Franz Hahn, Vlad Hosu, Igor Zingman, Masud Rana, Shujun Li: Crowd workers proven valuable in comparative study of subjective video quality assessment. QoMEX 2016 Short Paper, Lisbon, June 2016.
• Vlad Hosu, Franz Hahn, Igor Zingman, Dietmar Saupe: Reported Attention as a Promising Alternative to Gaze in IQA Tasks. Submitted to PQS, Berlin, 2016.
• “One Shot Crowdtesting: Approaching the Extremes of Crowdsourced Subjective Quality Testing” by Michael Seufert, Tobias Hoßfeld, PQS 2016
• “Size does matter. Comparing the results of a lab and a crowdsourcing file download QoE study” by Andreas Sackl, Bruno Gardlo, Raimund Schatz, QoMEX 2016
List is incomplete and needs to be updated
Active TF!
Agenda: Crowdsourcing TF
• 1. Overview on current status of crowdsourcing TF (Tobias, 15 min)
• 2. Interests & definition of core topics (all, 15 min)
– Bias in CS (Tobias Hossfeld)
– Adaptive CS (Michael Seufert)
– Scalability, Accuracy Improvement, Bias Removal (Dietmar Saupe)
– Enterprise Crowdsourcing (Matthias Hirth)
– The crowd community, lab vs. crowd (Judith Redi)
– P.CROWD (Babak Naderi)
Mini-Dagstuhl Seminar at PQS 2016
• Idea: Mini-Dagstuhl Seminar on Crowdsourcing
– Intensive discussions on selected core topics
– Topics identified in online meeting before
– There is interest of the TF members for Mini-Dagstuhl-like seminar on
crowdsourcing.
• Co-located with PQS 2016, 5th ISCA/DEGA Workshop on
Perceptual Quality of Systems
– http://pqs.qu.tu-berlin.de/
– Sep 1, 2016: 13:00-17:30;
Sep 2, 2016: 9:00-12:00
– Concrete schedule: please fill in the Doodle poll
https://uaruhr.doodle.com/poll/y22qwhxm8e5mi6cw
Key Topics
• Bias in CS: Methodology
• Lab vs. CS: Systematic Approach and Standardization
• Adaptive CS: Sampling Strategies, Parameter Selection
• Extreme Design Cases: Single CS Ratings vs. Long CS Tasks
• Utilizing the power of CS: influence factors like user level
• Reliability Metrics and CS: effects of CS
• Crowdsourcing in enterprise environments for
– QoE experiments
– Knowledge discovery/organization
• Broadening the scope
– Crowdsensing
– The People
(Some) Proposals: Tobias
• Adaptive crowdsourcing and automatic selection
– Task questions are generated online, depending on the previously given answers, in a crowdsourcing experiment on video quality (Uni Konstanz)
– Parameter selection in crowdsourcing tests: continuous as well as discrete parameters
– Multi-parameter selection in crowdsourcing tests
– How to select parameters dynamically (based on history of users’ ratings, based on parameters under investigation, based on data quality, reliability)?
• Bias in CS
– Many factors lead to a bias in CS experiments and determine who participates (type of task, motivation, time zones, etc.)
– Each CS experiment is one random sample of the crowd population
– Example: demographics of workers
– What is the bias of CS results? How can the results be debiased?
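The demographic-bias question above could, for instance, be tackled by post-stratification: reweight each rating so that demographic groups count according to their share in the target population rather than in the crowd sample. A minimal Python sketch; the group names and population shares are hypothetical, not from the TF's experiments:

```python
from collections import Counter

def poststratified_mos(ratings, target_share):
    """MOS with post-stratification weights.

    ratings: list of (group, score) tuples from the crowd
    target_share: dict mapping group -> share in the target population
    Each rating is weighted by target share / sample share of its group.
    """
    n = len(ratings)
    sample_count = Counter(group for group, _ in ratings)
    weighted_sum = 0.0
    total_weight = 0.0
    for group, score in ratings:
        w = target_share[group] / (sample_count[group] / n)
        weighted_sum += w * score
        total_weight += w
    return weighted_sum / total_weight

# Hypothetical sample: young workers are over-represented in the crowd.
ratings = [("young", 4), ("young", 5), ("young", 4), ("old", 2)]
print(poststratified_mos(ratings, {"young": 0.5, "old": 0.5}))
```

The naive mean here would be 3.75; the reweighted MOS is lower because the under-sampled "old" group rated the stimulus worse.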
Michael Seufert
Qualinet Crowdsourcing Task Force
Adaptive Crowdsourcing
Problem: How to obtain an accurate QoE model for a parameter
range with a fixed, small rating budget?
Typical approach: split rating budget equally among discrete
conditions
But: higher certainty at the edges of the rating scale than for
“interesting” medium quality conditions
Proposed alternative: adaptive crowdsourcing
Statistical test condition selection based on confidence interval widths
Results:
Statistical adaptation is able to reduce confidence interval width and
gives accurate MOS values
Enough ratings have to be allocated at the cold start
QoE models from the continuous test design outperform the discrete case
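The statistical test condition selection described above can be sketched as a simple loop: always assign the next rating to the condition whose current MOS confidence interval is widest. A minimal Python sketch, assuming hypothetical condition names and rating data:

```python
import math
import statistics

def ci_halfwidth(scores, z=1.96):
    """95% normal-approximation confidence interval half-width of the mean."""
    if len(scores) < 2:
        return float("inf")  # cold start: force initial ratings first
    return z * statistics.stdev(scores) / math.sqrt(len(scores))

def next_condition(ratings):
    """Pick the test condition whose MOS estimate is least certain."""
    return max(ratings, key=lambda cond: ci_halfwidth(ratings[cond]))

# Hypothetical ratings collected so far for three bitrate conditions.
ratings = {"500kbps": [2, 3, 2, 3],
           "2Mbps":   [3, 4, 5, 2, 4, 3],
           "8Mbps":   [5, 5, 5, 5, 4, 5]}
print(next_condition(ratings))  # -> 2Mbps
```

The edge conditions (very low and very high quality) already have narrow intervals, so the next rating goes to the "interesting" medium condition; the `inf` return also reproduces the cold-start need noted above.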
Michael Seufert
Qualinet Crowdsourcing Task Force
One Shot Crowdsourcing
Crowdsourcing studies often prove to be very sensitive, e.g., to
test instructions, design, and filtering of unreliable participants
The exposure of several test conditions to single workers
potentially leads to an implicit training and anchoring of ratings
Research question:
Is it feasible to conduct a crowdsourced QoE study by
presenting only a single test condition to each participant?
First results:
Lower MOS values for the set of first ratings
are clearly visible, although the confidence
intervals overlap
One shot test design results in a QoE
model similar to the traditional approach
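The comparison behind these first results can be sketched as: compute the MOS once from only each worker's first rating (the one-shot view) and once from all ratings (the traditional multi-condition view). The session data below is hypothetical, chosen so that later ratings drift upwards, as implicit training/anchoring would suggest:

```python
import statistics

def first_vs_all_mos(sessions):
    """Return (one-shot MOS, traditional MOS).

    sessions: list of per-worker rating lists, in presentation order.
    One-shot MOS uses only each worker's first rating; traditional
    MOS uses every rating from every worker.
    """
    first = [s[0] for s in sessions if s]
    everything = [r for s in sessions for r in s]
    return statistics.mean(first), statistics.mean(everything)

# Hypothetical worker sessions: ratings rise after the first stimulus.
sessions = [[3, 4, 4], [2, 3, 3], [4, 4, 5]]
one_shot, traditional = first_vs_all_mos(sessions)
print(one_shot, traditional)
```

With this toy data the one-shot MOS (3.0) is lower than the traditional MOS (about 3.56), mirroring the "lower MOS values for the set of first ratings" observation above.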
Workgroup Multimedia Signal Processing
University of Konstanz, Germany
Dietmar Saupe, Vlad Hosu, Franz Hahn, Shujun Li, Tamas Sziranyi
IQA/VQA
• Scalability
– Economics of crowdsourcing: improving worker engagement. Trade-offs between qualification and price. Short-term vs. long-term incentives. Where does one best do the filtering?
– We will be using thousands of stimuli.
• Accuracy improvement
– Reconstruction of MOS from paired comparison ratings. Problems: how to best replace absolute category ratings by more sensitive paired comparisons (with ratings of magnitudes of differences). Mathematical models, numerical solution, and validation.
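One simple instance of such a mathematical model: if workers rate the magnitude of the quality difference between every pair of stimuli, the least-squares scale values (up to an additive constant, fixed here to zero mean) are just the row means of the antisymmetric difference matrix. A sketch with a hypothetical 3-stimulus example, not data from the Konstanz study:

```python
def scale_from_differences(diff):
    """Least-squares scale values from a complete matrix of rated
    quality differences, diff[i][j] ~ s_i - s_j (antisymmetric).

    Minimizing sum_ij (diff[i][j] - (s_i - s_j))^2 with zero-mean
    scores gives the closed form s_i = mean over j of diff[i][j].
    """
    n = len(diff)
    return [sum(row) / n for row in diff]

# Hypothetical judgments for 3 stimuli whose true scores are 1, 2, 4.
diff = [
    [0, -1, -3],
    [1,  0, -2],
    [3,  2,  0],
]
print(scale_from_differences(diff))  # true scores shifted to zero mean
```

Real data would be incomplete and noisy, so in practice this becomes a numerical least-squares (or Thurstone/Bradley-Terry) fit rather than a closed form; the sketch only shows the underlying model.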
• How to properly define quality in IQA/VQA
– How to deal with the different dimensions of quality: information content, usability, aesthetics, or pleasantness.
• Bias removal
– Grounded mean opinion scores: how to introduce objective anchors and use the crowd as a predictor for the rest of the data.
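The objective-anchor idea could work as follows: include anchor stimuli with known reference scores in the crowd test, fit a mapping from crowd MOS to the reference scale, and apply it to the remaining stimuli. A linear-calibration sketch; the anchor values below are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (anchor calibration)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def ground_scores(anchor_crowd, anchor_true, crowd_scores):
    """Map raw crowd MOS values onto the objective anchor scale."""
    a, b = fit_line(anchor_crowd, anchor_true)
    return [a * x + b for x in crowd_scores]

# Hypothetical anchors: crowd MOS 2 and 4 correspond to reference 1 and 5.
print(ground_scores([2, 4], [1, 5], [3]))  # -> [3.0]
```

A linear map is only the simplest choice; the same scheme works with any monotone regression between the crowd and reference scales.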
Matthias Hirth
Enterprise Crowdsourcing
Enterprise Crowdsourcing
Idea: Integrating Crowdsourcing techniques in enterprise
environments
Challenges
Appropriate incentive mechanisms
Management of confidential data
Access to enterprise employees for real-world studies
…
Possible use cases
Information collection/Knowledge management
Enabling flexible working time models
QoE studies in enterprise settings
…
Matthias Hirth
Enterprise Crowdsourcing
Crowdsourced Enterprise QoE Study
Is it possible to quantify the impact of delays in enterprise
applications (e.g. SAP) on the satisfaction of employees?
Crowdsourcing methodology
Collection of subjective ratings at a very large scale
Simple interfaces for subjective ratings
Incentives for employees to increase participation (open)
Collected data
Six measurement periods, each 1-2 weeks
(Different) SAP performance measurements (~17 million data points)
Subjective employee ratings (~55,800 ratings)
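One way to quantify the delay/satisfaction relationship from such data is to correlate the per-period average response delay with the per-period average rating. A minimal Pearson-correlation sketch; the numbers below are hypothetical, not the study's SAP measurements:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-period averages: response delay (s) vs. mean rating.
delays  = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
ratings = [4.6, 4.3, 3.9, 3.1, 2.6, 1.8]
print(pearson(delays, ratings))
```

A strongly negative coefficient here would support the hypothesis that longer delays reduce employee satisfaction; with ~17 million measurement points, aggregation per period (as sketched) keeps the series aligned with the rating collection windows.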
Judith – Current CS & QoE Activities
• Crowd community
– Who are crowdworkers? Why do they do crowdwork? Crowdworkforce demographics – with Tobias, Babak, and Ina
– Is there a social infrastructure supporting crowdwork? Communities in online crowdsourcing fora – with Alessandro Bozzon (TUD) and Tobias
• Lab and crowd
– Reliability and repeatability of CS-based QoE testing
– Recent TMM publication – with Ernestasia Siahaan
– JOC – with Tobias and Pavel
• Facebook and volunteer crowds
– Social-media-based QoE testing, continuation of CrowdMM ’14 work – with Yi Zhu and Jessica Alecci
• QoE Crowdtesting
– Book chapter in the works
QUL TU-Berlin & Crowdsourcing
Standardization of speech quality assessment in crowdsourcing (ITU-T contribution on P.CROWD)
First steps (Lab vs. CS):
Effect of (re-)training on quality of data
Motivation message in trapping question
Questionnaire for measuring motivation in crowdsourcing based on Self-Determination Theory
Prediction of task choice strategies of workers (based on workload,…)
Gamification in Crowdsourcing
Crowdee: mobile crowdsourcing platform provided by QUL
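Trapping questions of the kind mentioned above are typically used to filter unreliable workers before analysis. A minimal sketch of such a filter, assuming hypothetical question IDs and answers; the strict default keeps only workers who answer every gold question correctly:

```python
def reliable_workers(answers, correct, threshold=1.0):
    """Keep workers whose trapping-question accuracy meets the threshold.

    answers: dict worker -> dict of trapping question -> given answer
    correct: dict of trapping question -> expected answer
    threshold: minimum fraction of correct answers (1.0 = all correct)
    """
    keep = []
    for worker, given in answers.items():
        hits = sum(1 for q, a in given.items() if correct.get(q) == a)
        if hits / len(given) >= threshold:
            keep.append(worker)
    return keep

# Hypothetical responses to two trapping questions.
correct = {"t1": "blue", "t2": "7"}
answers = {
    "w1": {"t1": "blue", "t2": "7"},
    "w2": {"t1": "red",  "t2": "7"},
}
print(reliable_workers(answers, correct))  # -> ['w1']
```

A motivation message embedded in the trapping question, as studied above, changes what workers see but not this filtering logic; relaxing `threshold` trades data quantity against reliability.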
Thank you
WG1 Crowdsourcing TF: Contact Information
• TF leader
– Tobias Hoßfeld (University of Duisburg-Essen), [email protected]
– Babak Naderi (TU Berlin) [email protected]
• Wiki page
– https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd
– Access to the wiki: contact Tobias and Babak
• Mailing list
– Qualinet Mail-Reflector for “Crowdsourcing”: [email protected]
– To subscribe to this list, simply send an (empty) email to [email protected] and follow the steps in the email you receive. The instructions can also be found at http://listes.epfl.ch/doc.cgi?liste=cs.wg2.qualinet.