The aggregated decisions of a group are often better than those of any single member
Requirements:
Diversity
Independence
Decentralization
Aggregation
[Surowiecki, 2004]
Classic demonstration: Sir Francis Galton's ox-weighing contest, where the crowd's aggregate estimate beat the individual experts
The crowd: an undefined group of people
Typically ‘large’
Diverse skills and abilities
Typically no special skills assumed
[Estelles-Arolas, 2012]
Computational power: distributed computing
Content: web searches, social media updates, blogs
Observations: online surveys, personal data
Cognitive power: visual reasoning, language processing
Creative effort: resource creation, algorithm development
Funding: $$$
[Good & Su, 2013]
Crowd data: content, search logs
Crowdsourcing: observations, cognitive power, creative effort
Not a focus in this tutorial: distributed computing, crowdfunding
Access: to the data; to the crowd
▪ 1 in 5 people worldwide have a smartphone
Engagement: getting contributors' attention
Incentive
Quality control
Information reflects health
Disease status
Disease associations
Health-related behaviors
Information also drives health
Knowledge and beliefs regarding prevention and treatment
Quality monitoring of health information available to the public
“Infodemiology”
[Eysenbach, 2006]
Key challenge: text
Variability: "tired", "wiped", "pooped" all map to somnolence
Ambiguity: "numb" (sensory or cognition?)
Two levels
Keyword: locate specific terms + synonyms
Concept: attempt to normalize mentions to specific entities
Measurement
Disproportionality analysis (a worked sketch follows)
Separating signal from noise
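Disproportionality analysis compares how often an event is reported with a drug versus without it. A minimal sketch using the proportional reporting ratio (PRR), one common measure (the slides do not specify which measure; all counts are invented placeholders):

```python
# Minimal sketch of disproportionality analysis via the proportional
# reporting ratio (PRR). All counts below are invented placeholders.

def prr(a, b, c, d):
    """PRR for a 2x2 contingency table of reports.

    a: reports with the drug AND the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports with neither
    """
    rate_drug = a / (a + b)   # event rate among reports mentioning the drug
    rate_rest = c / (c + d)   # event rate among all other reports
    return rate_drug / rate_rest

# Example: PRR > 1 (typically with a count threshold) suggests a signal.
print(prr(a=300, b=700, c=500, d=8500))  # -> 5.4
```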
Objective: predict flu outbreaks from internet search trends
Access to search data via direct access to logs or via ad clicks
High correlation between clicks one week and cases the next (a correlation sketch follows below)
Caveats!
Many potential confounders
[Eysenbach, 2006]
[Eysenbach, 2009]
[Ginsberg et al., 2009]
[Figure: weekly flu-related search volume tracking reported cases, 2004-2007]
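The one-week lead of searches over cases can be expressed as a lagged correlation. A minimal sketch with invented weekly series, not the authors' data or method:

```python
# Minimal sketch: correlate search volume with flu cases one week later.
# Both weekly series are invented placeholders, not surveillance data.
import numpy as np

searches = np.array([120, 150, 200, 320, 410, 380, 250, 160], dtype=float)
cases    = np.array([ 10,  14,  18,  30,  48,  55,  40,  22], dtype=float)

lag = 1  # compare searches in week t with cases in week t + lag
r = np.corrcoef(searches[:-lag], cases[lag:])[0, 1]
print(f"lag-{lag} correlation: {r:.2f}")
```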
Objective: mine social media forums for adverse drug reaction (ADR) reports
Lexicon based on the UMLS Metathesaurus, SIDER, MedEffect, and a set of colloquial phrases ("zonked", misspellings); a matching sketch follows the table below
Demonstrated the viability of text mining (73.9% F-measure)
Revealed known ADRs and putatively novel ADRs
Olanzapine ADR          Known incidence   Corpus frequency
Weight gain             65%               30.0%
Fatigue                 26%               15.9%
Increased cholesterol   22%               -
Increased appetite      -                 4.9%
Depression              -                 3.1%
Tremor                  -                 2.7%
Diabetes                2%                2.6%
Anxiety                 -                 1.4%
[Leaman et al., 2010]
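A minimal sketch of keyword-level lexicon matching over posts; the toy lexicon below only gestures at the UMLS/SIDER/MedEffect resources combined in the paper, and all names are illustrative:

```python
# Minimal sketch of keyword-level ADR matching with a toy lexicon.
# A real system would normalize many more variants to concepts.
import re

LEXICON = {                      # surface form -> normalized ADR concept
    "weight gain": "Weight gain",
    "tired": "Fatigue",
    "wiped": "Fatigue",
    "zonked": "Somnolence",
    "shaky hands": "Tremor",
}

def find_adrs(post):
    """Return the set of normalized ADR concepts mentioned in a post."""
    text = post.lower()
    return {concept for term, concept in LEXICON.items()
            if re.search(r"\b" + re.escape(term) + r"\b", text)}

print(find_adrs("A month on olanzapine: major weight gain and always tired."))
# -> {'Weight gain', 'Fatigue'}
```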
Objective: identify drug-drug interactions (DDIs) from internet search logs
DDI reports are difficult to find
Focused on a DDI unknown at the time the data were collected
▪ Paroxetine + pravastatin → hyperglycemia
Method: synonyms → web searches → disproportionality analysis
Results
Significant association
Classifying 31 true-positive & 31 true-negative pairs: AUC = 0.82 (scoring sketch below)
[White et al., 2013]
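The pair-classification step can be scored with a standard ROC AUC. A minimal sketch with invented scores and labels; only the evaluation form mirrors the study:

```python
# Minimal sketch: score how well disproportionality values separate
# known-interacting (1) from non-interacting (0) drug pairs with AUC.
# Scores and labels are invented placeholders.
from sklearn.metrics import roc_auc_score

scores = [2.1, 0.4, 1.7, 0.9, 3.0, 0.2]  # one signal score per drug pair
labels = [1,   0,   1,   0,   1,   0]    # 1 = known interaction

print(f"AUC = {roc_auc_score(labels, scores):.2f}")
```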
Outsourcing
Tasks normally performed in-house
To a large, diverse, external group
Via an open call
[Estelles-Arolas, 2012]
EXPERT LABOR                  CROWD LABOR
Must be found                 Readily available
Expensive                     Inexpensive
Often slow                    Fast
High quality                  Quality variable
Ambiguity OK                  Instructions must be clear
Hard to use for experiments   Easy prototyping and experimentation
Must be retained              Retention less important
Humans (even unskilled) are simply better than computers at some tasks
Allows workflows to include an "HPU" (human processing unit)
Highly scalable
Rapid turn-around
High throughput
Diverse solutions
Low risk
Low cost
[Quinn & Bederson, 2011]
Microtask: low difficulty, large in number
Observations or data processing
Surveying, text or image annotation
Validation: redundancy and aggregation
Megatask: high difficulty, low in number
Problem solving, creative effort
Validation: manual, using metrics or a rubric
[Good & Su, 2013]
MICROTASK
Microtask market, citizen science, workflow sequestration, casual game, educational
MEGATASK
Innovation contest, hard game, collaborative content creation
[Good & Su, 2013]
EXAMPLE: MICROTASK MARKET
[Diagram: a requester submits tasks to Amazon Mechanical Turk; workers complete the tasks; an aggregation function combines their responses]
http://www.thesheepmarket.com/
Automatically tag all genes (NCBI's gene tagger) and all mutations (UMBC's EMU)
Highlight candidate gene-mutation pairs in context
Frame the task as simple yes/no questions (see the sketch below)
Slide courtesy: L. Hirschman [Burger et al., 2012]
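A minimal sketch of how candidate pairs might be templated into yes/no microtasks; fields and wording are illustrative, not the published task design:

```python
# Minimal sketch: template candidate gene-mutation pairs as yes/no tasks.
# Fields and wording are illustrative, not the published task design.

def make_task(sentence, gene, mutation):
    return {
        "context": sentence,
        "question": (f"Does this sentence state that mutation {mutation} "
                     f"occurs in gene {gene}? (yes/no)"),
    }

candidates = [("The V600E substitution in BRAF drives melanoma.",
               "BRAF", "V600E")]
for sent, gene, mut in candidates:
    print(make_task(sent, gene, mut)["question"])
```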
Baseline: majority vote
Can we do better?
Separate annotator bias and error
Model annotator quality
▪ Measure with labeled data or reputation
Model difficulty of each task
Sometimes disagreement is informative
[Ipeirotis et al., 2010] [Raykar et al., 2010] [Aroyo & Welty, 2013]
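A minimal sketch of the majority-vote baseline and a simple accuracy-weighted variant; the weights stand in for quality estimates from labeled data or reputation, and all names are illustrative:

```python
# Minimal sketch: aggregate redundant worker labels for one item.
from collections import Counter, defaultdict

def majority_vote(labels):
    """labels: list of (worker_id, label) pairs for a single item."""
    return Counter(lbl for _, lbl in labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Weight each worker's label by an estimated accuracy in [0, 1]."""
    score = defaultdict(float)
    for worker, lbl in labels:
        score[lbl] += weights.get(worker, 0.5)  # unknown workers get 0.5
    return max(score, key=score.get)

item_labels = [("w1", "yes"), ("w2", "no"), ("w3", "yes")]
weights = {"w1": 0.9, "w2": 0.95, "w3": 0.55}  # e.g. estimated on gold tasks

print(majority_vote(item_labels))           # 'yes' (2 votes of 3)
print(weighted_vote(item_labels, weights))  # 'yes' (1.45 vs 0.95)
```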
EXAMPLE: CITIZEN SCIENCE
Volunteers label images of cell biopsies from cancer patients
Estimate the presence and number of cancer cells
Incentive: altruism, sense of mastery
Quality: training, redundancy
Analyzed 2.4 million images as of 11/2014
[cellslider.net]
EXAMPLE: WORKFLOW SEQUESTRATION
EXAMPLE: RECAPTCHA
Workflow: logging into a website
Sequestration: performing optical character recognition
EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION
Workflow: prescribing medication
Sequestration: entering the reason for the prescription into the ordering system
[McCoy, 2012]
EXAMPLE: EDUCATIONAL
Bioinformatics students simultaneously learn and perform metagenome annotation
Incentive: educational
Quality: aggregation, instructor evaluation
[Hingamp et al., 2008]
EXAMPLE: INNOVATION CONTEST
OPEN PROFESSIONAL PLATFORMS ($$$)
Innocentive, TopCoder, Kaggle
ACADEMIC (PUBLICATIONS)
DREAM (see invited opening talk at crowdsourcing session), CASP
EXAMPLE: HARD GAME
Players manipulate proteins to find the 3D shape with the lowest calculated free energy
Competitive and collaborative
Incentive: altruism, fun, community
Quality: automated scoring
High performance: solved a difficult key retroviral protease structure
[Khatib et al., 2011]
EXAMPLE: COLLABORATIVE CONTENT CREATION
Aims to provide a Wikipedia page for every notable human gene
Repository of functional knowledge
10K distinct genes; 50M views & 15K edits per year
[Huss et al., 2008] [Good et al., 2011]
Crowdsourcing means many different things
Fundamental points:
Humans (even unskilled) are simply better than computers at some tasks
There are a lot of humans available
There are many approaches for accessing their talents
INTRINSIC
Altruism
Fun
Education
Sense of mastery
Resource creation
EXTRINSIC
Money
Recognition
Community
Define the problem & goal
Decide on a platform
Decompose the problem into tasks
Separate: expert, crowdsourced & automatable
Refine crowdsourced tasks: simple, clear, self-contained, engaging
Design: instructions and user interface
[Hetmank, 2013][Alonso & Lease, 2011][Eickhoff & de Vries, 2011]
Iterate
Test internally
Calibrate with a small crowdsourced sample
Verify understanding, timing, pricing & quality
Incorporate feedback
Run production
Scale on data before workers
Validate results
[Hetmank, 2013][Alonso & Lease, 2011][Eickhoff & de Vries, 2011]
Automatic evaluation, if possible
Direct quality assessment (expensive)
▪ Microtask: include tasks with known answers (see the sketch below)
▪ Megatask: evaluate tasks after completion (rubric)
Aggregate redundant responses
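A minimal sketch of screening workers against tasks with known answers; the 0.8 threshold is an illustrative choice, not a recommendation from the slides:

```python
# Minimal sketch: estimate per-worker accuracy on tasks with known
# answers and flag workers below a chosen threshold.

def worker_accuracy(gold, answers):
    """gold: {task: answer}; answers: one worker's {task: answer}."""
    scored = [t for t in answers if t in gold]
    if not scored:
        return None  # worker has not seen any gold task yet
    return sum(answers[t] == gold[t] for t in scored) / len(scored)

gold = {"t1": "yes", "t2": "no", "t3": "yes"}
submissions = {
    "w1": {"t1": "yes", "t2": "no", "t3": "yes"},  # accuracy 1.00
    "w2": {"t1": "no",  "t2": "no", "t3": "no"},   # accuracy 0.33
}

THRESHOLD = 0.8  # illustrative cutoff, not a recommendation
for worker, answers in submissions.items():
    acc = worker_accuracy(gold, answers)
    ok = acc is not None and acc >= THRESHOLD
    print(worker, f"{acc:.2f}", "ok" if ok else "review/exclude")
```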
PRO
Reduced cost → more data
Fast turn-around time
High throughput
"Real world" environment
Public participation & awareness

CON
Potentially poor quality
Spammers
Potentially low retention
Privacy concerns for sensitive data
Lax protections for workers
Potentially poor quality: discussed previously
Low retention
Complicates quality estimation due to sparsity
Do workers build task-specific expertise?
Privacy
Sensitive data requires trusted workers
Protection for workers
Low pay; no protections, benefits, or career path
Potential to cause harm
▪ E.g. exposure to anti-vaccine information
Is IRB approval needed?
Can be addressed
Responsibility of the researcher
▪ "[opportunity to] deliberately value ethics above cost savings"
[Graber & Graber, 2013] [Fort, Adda & Cohen, 2011]
Demographics:
Shift from mostly US to US/India mix
Average pay is <$2.00/hour
Over 30% rely on MTurk for basic income
Workers are not anonymous
However:
Tools can be used ethically or unethically
Crowdsourcing ≠ AMT
[Ross et al., 2009]
[Lease et al., 2013]
Improved predictability: pricing, quality, retention
Improved infrastructure: data analysis, validation & aggregation
Improved trust mechanisms
Matching workers and tasks: relevant characteristics for each
Increased mobility
Crowdsourcing and learning from crowd data offer distinct advantages
Scalability
Rapid turn-around
Throughput
Low cost
Must be carefully planned and managed
Wide variety of approaches and platforms available
Resources section lists several
Many questions still open
Science using crowdsourcing
Science of crowdsourcing
Thanks to the members of the crowd who make this methodology possible
Questions: [email protected], [email protected], [email protected]
Support:
Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of the National Library of Medicine, NIH
Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820 and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH: UL1TR001114
Distributed computing: BOINC
Microtask markets: Amazon Mechanical Turk, Clickworker, SamaSource, many others
Meta services: Crowdflower, Crowdsource
Educational: annotathon.org
Innovation contests: Innocentive, TopCoder
Crowdfunding: Rockethub, Petridish
Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI 2011; Vancouver, BC, Canada.
Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM SIGIR 2011.
Aroyo L, Welty C: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci 2013; ACM; 2013.
Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348. Springer Berlin Heidelberg; 2012: 83-91.
Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(189).
Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business. Crown Business; 2009.
Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW 2011.
Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington, DC, USA. 2010.
Khatib F, DiMaio F, Foldit Contenders Group, Foldit Void Crushers Group, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I, et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
Lease M, Hullman J, Bigham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. Social Science Research Network; 2013.
Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
Nielsen J: Usability Engineering. Academic Press; 1993.
Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning. O'Reilly Media; 2012.
Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI 2011; Vancouver, BC, Canada.
Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing: harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Department of Informatics, UC Irvine; 2009.
Surowiecki J: The Wisdom of Crowds. Doubleday; 2004.
Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.