Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB 2015)

Robert Leaman, Benjamin Good, Zhiyong Lu, Andrew Su
http://slideshare.net/andrewsu


TRANSCRIPT


The aggregated decisions of a group are often better than those of any single member

Requirements:

Diversity

Independence

Decentralization

Aggregation

[Surowiecki, 2004]

Classic example: Sir Francis Galton's 1907 ox-weighing contest, where the combined estimate of the crowd came remarkably close to the true weight
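To make the aggregation claim concrete, here is a minimal simulation sketch (illustrative only, not Galton's actual data): many independent, noisy guesses of a hidden quantity are averaged, and the error of the aggregate is compared with the typical individual error.

```python
import numpy as np

rng = np.random.default_rng(42)
true_weight = 1198            # hypothetical quantity to be estimated
n_guessers = 800              # size of the simulated crowd

# Independent, unbiased but individually noisy guesses (sd = 10% of truth)
guesses = rng.normal(loc=true_weight, scale=0.10 * true_weight, size=n_guessers)

crowd_estimate = guesses.mean()
individual_error = np.abs(guesses - true_weight).mean()
crowd_error = abs(crowd_estimate - true_weight)

print(f"typical individual error: {individual_error:.1f}")
print(f"error of the aggregated estimate: {crowd_error:.1f}")
```

With independent, unbiased errors the aggregate error shrinks roughly as 1/sqrt(N), which is why the diversity and independence requirements matter: correlated errors do not cancel.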

An undefined group of people

Typically ‘large’

Diverse skills and abilities

Typically no special skills assumed


[Estelles-Arolas, 2012]

Computational power: distributed computing
Content: web searches, social media updates, blogs
Observations: online surveys, personal data
Cognitive power: visual reasoning, language processing
Creative effort: resource creation, algorithm development
Funding: $$$

[Good & Su, 2013]

Crowd data: content, search logs
Crowdsourcing: observations, cognitive power, creative effort
Not a focus in this tutorial: distributed computing, crowdfunding

Access: to the data; to the crowd
▪ 1 in 5 people worldwide have a smartphone
Engagement: getting contributors’ attention
Incentive
Quality control

Information reflects health

Disease status

Disease associations

Health-related behaviors

Information also drives health

Knowledge and beliefs regarding prevention and treatment

Quality monitoring of health information available to public


“Infodemiology”

[Eysenbach, 2006]

Key challenge: text
Variability: “tired”, “wiped”, “pooped” can all indicate somnolence
Ambiguity: does “numb” refer to sensation or cognition?
Two levels of processing:
Keyword: locate specific terms and their synonyms
Concept: attempt to normalize mentions to specific entities
Measurement: disproportionality analysis, separating signal from noise

Objective: predict flu outbreaks from internet search trends

Access to search data: directly from query logs, or indirectly via ad clicks

High correlation between clicks one week and cases the next

Caveats!

Many potential confounders


[Eysenbach, 2006]

[Eysenbach, 2009]

[Ginsberg et al., 2009]

[Chart: weekly search volume vs. reported influenza cases, 2004-2007]
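The “high correlation between clicks one week and cases the next” is an ordinary lagged correlation; a minimal sketch with made-up weekly counts (not the Google Flu Trends data):

```python
import numpy as np

# Hypothetical weekly series (the real studies used years of query logs and surveillance data)
search_volume = np.array([120, 135, 160, 210, 340, 520, 480, 390, 260, 180], float)
flu_cases     = np.array([ 80,  95, 110, 150, 240, 410, 520, 470, 350, 230], float)

# Correlate this week's searches with next week's cases (lag of one week)
lag = 1
r = np.corrcoef(search_volume[:-lag], flu_cases[lag:])[0, 1]
print(f"Pearson r between searches and cases one week later: {r:.2f}")
```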

Objective: mine social media forums for adverse drug reaction (ADR) reports

Lexicon based on UMLS Metathesaurus, SIDER, MedEffect, and a set of colloquial phrases (“zonked”, misspellings)

Demonstrated viability of text mining (73.9% f-measure)

Revealed known ADRs and putatively novel ADRs

Olanzapine ADR          Known incidence   Corpus frequency
Weight gain             65%               30.0%
Fatigue                 26%               15.9%
Increased cholesterol   22%               -
Increased appetite      -                 4.9%
Depression              -                 3.1%
Tremor                  -                 2.7%
Diabetes                2%                2.6%
Anxiety                 -                 1.4%

[Leaman et al., 2010]
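The lexicon-matching step can be sketched as a small normalization dictionary that maps colloquial or misspelled surface forms to ADR concepts and counts matches across posts. This is a toy illustration, not the Leaman et al. system, which drew its lexicon from the UMLS Metathesaurus, SIDER, and MedEffect.

```python
from collections import Counter
import re

# Toy lexicon: surface forms (including colloquialisms) -> ADR concept
LEXICON = {
    "zonked": "somnolence",
    "tired": "fatigue",
    "wiped": "fatigue",
    "weight gain": "weight gain",
    "shaky": "tremor",
}

def extract_adrs(post: str) -> list[str]:
    """Return ADR concepts whose surface forms appear in a user post."""
    text = post.lower()
    return [concept for term, concept in LEXICON.items()
            if re.search(r"\b" + re.escape(term) + r"\b", text)]

posts = [
    "Been totally zonked since starting this med, and the weight gain is real.",
    "Feeling shaky and tired all day.",
]
counts = Counter(c for p in posts for c in extract_adrs(p))
print(counts)   # frequency of each ADR concept across the corpus
```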

Objective: identify drug-drug interactions (DDIs) from internet search logs
DDI reports are difficult to find
Focused on a DDI unknown at the time the data were collected
▪ Paroxetine + pravastatin: hyperglycemia
Method: synonyms, web searches, disproportionality analysis
Results:
Significant association
Classifying 31 true-positive & 31 true-negative pairs
▪ AUC = 0.82

[White et al., 2013]
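Disproportionality analysis asks whether the adverse event co-occurs with the exposure of interest more often than a background rate would predict; one common statistic is the proportional reporting ratio (PRR) from a 2x2 table. The counts below are invented for illustration and are not from White et al.:

```python
# 2x2 table of (hypothetical) search-log counts:
#                       symptom searched   symptom not searched
# both drugs searched          a                    b
# other searches               c                    d
a, b = 120, 9880          # users who searched for both paroxetine and pravastatin
c, d = 4000, 996000       # all other users

prr = (a / (a + b)) / (c / (c + d))   # proportional reporting ratio
print(f"PRR = {prr:.1f}")             # values well above 1 suggest a signal
```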

Outsourcing

Tasks normally performed in-house

To a large, diverse, external group

Via an open call


[Estelles-Arolas, 2012]

EXPERT LABOR
Must be found
Expensive
Often slow
High quality
Ambiguity OK
Hard to use for experiments
Must be retained

CROWD LABOR
Readily available
Inexpensive
Fast
Quality variable
Instructions must be clear
Easy prototyping and experimentation
Retention less important

Humans (even unskilled) are simply better than computers at some tasks
Allows workflows to include an “HPU” (human processing unit)
Highly scalable
Rapid turn-around
High throughput
Diverse solutions
Low risk
Low cost

[Quinn & Bederson, 2011]

Microtask: low difficulty, large in number

Observations or data processing

Surveying, text or image annotation

Validation: redundancy and aggregation

Megatask: high difficulty, few in number
Problem solving, creative effort
Validation: manual, using metrics or a rubric

[Good & Su, 2013]

MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational

MEGATASK
Innovation contest
Hard game
Collaborative content creation

[Good & Su, 2013]

[Diagram: a requester posts tasks to Amazon Mechanical Turk, which distributes them to many workers; the workers’ responses are combined by an aggregation function]

Example: The Sheep Market, http://www.thesheepmarket.com/
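For a sense of how a requester drives such a market programmatically, here is a minimal sketch using boto3's Mechanical Turk client. The question XML file, reward, and sandbox endpoint are assumptions to adapt from the AWS documentation; the tutorial itself does not prescribe any particular tooling.

```python
import boto3

# Sandbox endpoint so experiments cost nothing; drop it to post real HITs.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# question.xml is an assumed file following MTurk's question schema (see AWS docs).
with open("question.xml") as f:
    question_xml = f.read()

hit = mturk.create_hit(
    Title="Is this gene-mutation pair described in the highlighted sentence?",
    Description="Answer a simple yes/no question about a MEDLINE sentence.",
    Keywords="annotation, biology, yes/no",
    Reward="0.05",                      # USD per assignment
    MaxAssignments=5,                   # redundancy for later aggregation
    LifetimeInSeconds=3 * 24 * 3600,
    AssignmentDurationInSeconds=300,
    Question=question_xml,
)
print("HIT id:", hit["HIT"]["HITId"])

# Later: collect submitted work for the aggregation step.
results = mturk.list_assignments_for_hit(HITId=hit["HIT"]["HITId"],
                                         AssignmentStatuses=["Submitted"])
```

Posting against the sandbox endpoint lets a task be piloted end to end before any real workers or money are involved.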

Automatically tag all genes (NCBI’s gene tagger), all mutations (UMBC’s EMU)

Highlight candidate gene-mutation pairs in context
Frame the task as simple yes/no questions

Slide courtesy: L. Hirschman [Burger et al., 2012]


[Mea 2014]

Tagging cells for breast cancer based on stain


Back to the market diagram: how should the aggregation function combine redundant worker responses?

Baseline: majority vote

Can we do better?

Separate annotator bias and error

Model annotator quality

▪ Measure with labeled data or reputation

Model difficulty of each task

Sometimes disagreement is informative

[Ipeirotis et al., 2010] [Raykar et al., 2010] [Aroyo & Welty, 2013]
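A minimal sketch of going beyond majority vote: estimate each worker's accuracy from items with known answers (or from reputation), then combine 0/1 votes with log-odds weights. This illustrates the general idea behind the cited approaches rather than reproducing any specific model; the data structures are hypothetical.

```python
from collections import defaultdict
from math import log

# labels[item][worker] = 0/1 answer; gold = subset of items with known answers
labels = {
    "q1": {"w1": 1, "w2": 1, "w3": 0},
    "q2": {"w1": 0, "w2": 1, "w3": 0},
}
gold = {"q1": 1}

def worker_accuracy(labels, gold, smoothing=1.0):
    """Estimate per-worker accuracy from the gold-labeled items (with smoothing)."""
    correct, total = defaultdict(float), defaultdict(float)
    for item, answer in gold.items():
        for worker, vote in labels.get(item, {}).items():
            total[worker] += 1
            correct[worker] += (vote == answer)
    return {w: (correct[w] + smoothing) / (total[w] + 2 * smoothing) for w in total}

def weighted_vote(votes, accuracy, default=0.6):
    """Combine 0/1 votes, weighting each worker by the log-odds of their accuracy."""
    score = 0.0
    for worker, vote in votes.items():
        p = min(max(accuracy.get(worker, default), 1e-3), 1 - 1e-3)
        w = log(p / (1 - p))
        score += w if vote == 1 else -w
    return 1 if score > 0 else 0

acc = worker_accuracy(labels, gold)
print({item: weighted_vote(votes, acc) for item, votes in labels.items()})
```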

Approach: citizen science [Good & Su, 2013]

Cell Slider: volunteers label images of cell biopsies from cancer patients, estimating the presence and number of cancer cells
Incentive: altruism, sense of mastery
Quality: training, redundancy
2.4 million images analyzed as of 11/2014

[cellslider.net]

Approach: workflow sequestration [Good & Su, 2013]

EXAMPLE: reCAPTCHA
Workflow: logging into a website
Sequestration: performing optical character recognition

EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION
Workflow: prescribing medication
Sequestration: entering the reason for the prescription into the ordering system

[McCoy 2012]

Approach: casual games [Good & Su, 2013]

Casual games in which players tag malaria parasites in blood-smear images:

MalariaSpot: Luengo-Oroz 2012

MOLT: Mavandadi 2012

Approach: educational [Good & Su, 2013]

Bioinformatics students simultaneously learn and perform metagenome annotation

Incentive: educational
Quality: aggregation, instructor evaluation

[Hingamp et al., 2008]

Approach: innovation contests [Good & Su, 2013]

OPEN PROFESSIONAL PLATFORMS ($$$)
Innocentive, TopCoder, Kaggle

ACADEMIC (publications)
DREAM (see the invited opening talk in the crowdsourcing session), CASP

Approach: hard games [Good & Su, 2013]

Players manipulate proteins to find the 3D shape with the lowest calculated free energy

Competitive and collaborative

Incentive: altruism, fun, community
Quality: automated scoring
High performance: players found the structure of a difficult, key retroviral protease

[Khatib et al., 2011]

Approach: collaborative content creation

Gene Wiki: aims to provide a Wikipedia page for every notable human gene
A repository of functional knowledge
~10K distinct genes; 50M views & 15K edits per year

[Huss et al., 2008] [Good et al., 2011]

Crowdsourcing means many different things. Fundamental points:
Humans (even unskilled) are simply better than computers at some tasks
There are a lot of humans available
There are many approaches for accessing their talents

INTRINSIC

Altruism

Fun

Education

Sense of mastery

Resource creation

EXTRINSIC

Money

Recognition

Community


Define the problem & goal
Choose a platform
Decompose the problem into tasks; separate expert, crowdsourced & automatable work
Refine crowdsourced tasks: simple, clear, self-contained, engaging
Design the instructions and user interface

[Hetmank, 2013] [Alonso & Lease, 2011] [Eickhoff & de Vries, 2011]

Iterate:
Test internally
Calibrate with a small crowdsourced sample
Verify understanding, timing, pricing & quality
Incorporate feedback

Run production:
Scale on data before scaling on workers
Validate results

[Hetmank, 2013] [Alonso & Lease, 2011] [Eickhoff & de Vries, 2011]
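The calibration pilot yields per-assignment timing and pricing numbers that feed a simple cost and throughput estimate before scaling up. A back-of-the-envelope sketch with invented numbers (the 20% platform fee is an assumption; check the chosen platform's current pricing):

```python
n_items = 10_000          # tasks in the production run
redundancy = 5            # workers per task, for aggregation
reward = 0.05             # USD per assignment, from the pilot
fee_rate = 0.20           # assumed platform fee; verify for your platform
secs_per_task = 40        # median completion time observed in the pilot

assignments = n_items * redundancy
cost = assignments * reward * (1 + fee_rate)
worker_hours = assignments * secs_per_task / 3600

print(f"{assignments} assignments, ~${cost:,.0f}, ~{worker_hours:,.0f} worker-hours")
```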

Automatic evaluation, if possible
Direct quality assessment (expensive):
▪ Microtask: include tasks with known answers
▪ Megatask: evaluate tasks after completion (rubric)
Aggregate redundant responses
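One way to operationalize “include tasks with known answers”: intersperse gold questions among the real ones, score each worker against them, and exclude workers who fall below a threshold before aggregating. A minimal sketch with a hypothetical response format:

```python
def screen_workers(responses, gold, min_accuracy=0.8, min_gold_seen=2):
    """Flag workers whose accuracy on embedded gold questions is too low.

    responses: list of (worker_id, item_id, answer) tuples
    gold: dict item_id -> known answer
    """
    seen, correct = {}, {}
    for worker, item, answer in responses:
        if item in gold:
            seen[worker] = seen.get(worker, 0) + 1
            correct[worker] = correct.get(worker, 0) + (answer == gold[item])
    return {w for w, n in seen.items()
            if n >= min_gold_seen and correct[w] / n < min_accuracy}

gold = {"g1": 1, "g2": 0}
responses = [("w1", "g1", 1), ("w1", "g2", 0), ("w2", "g1", 0), ("w2", "g2", 1)]
# Responses from the flagged workers would be excluded before aggregation.
print(screen_workers(responses, gold))   # {'w2'}
```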

PRO
Reduced cost, more data
Fast turn-around time
High throughput
“Real world” environment
Public participation & awareness

CON
Potentially poor quality
Spammers
Potentially low retention
Privacy concerns for sensitive data
Lax protections for workers

Potentially poor quality: discussed previously

Low retention

Complicates quality estimation due to sparsity

Do workers build task-specific expertise?

Privacy

Sensitive data requires trusted workers


Protection for workers: low pay; no protections, benefits, or career path
Potential to cause harm
▪ E.g. exposure to anti-vaccine information
Is IRB approval needed?
Can be addressed; it is the responsibility of the researcher
▪ An “[opportunity to] deliberately value ethics above cost savings”

[Graber & Graber, 2013] [Fort, Adda and Cohen, 2011]

Demographics:

Shift from mostly US to US/India mix

Average pay is <$2.00/hour
Over 30% rely on MTurk for basic income
Workers are not anonymous

However:

Tools can be used ethically or unethically

Crowdsourcing ≠ AMT

[Ross et al., 2009]

[Lease et al., 2013]

Improved predictability: pricing, quality, retention
Improved infrastructure: data analysis, validation & aggregation
Improved trust mechanisms
Matching workers and tasks: identifying the relevant characteristics of each
Increased mobility

Crowdsourcing and learning from crowd data offer distinct advantages

Scalability

Rapid turn-around

Throughput

Low cost

Must be carefully planned and managed


Wide variety of approaches and platforms available

Resources section lists several

Many questions still open

Science using crowdsourcing

Science of crowdsourcing


Thanks to the members of the crowd who make this methodology possible

Questions: [email protected], [email protected], [email protected]

Support:
Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of the National Library of Medicine, NIH
Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820 and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH: UL1TR001114

Distributed computing: BOINC
Microtask markets: Amazon Mechanical Turk, Clickworker, SamaSource, many others
Meta services: Crowdflower, Crowdsource
Educational: annotathon.org
Innovation contests: Innocentive, TopCoder
Crowdfunding: Rockethub, Petridish

Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI 2011; Vancouver, BC, Canada. Citeseer.
Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM-SIGIR 2011.
Aroyo L, Welty C: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci 2013, ACM; 2013.
Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348: Springer Berlin Heidelberg; 2012: 83-91.
Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(189).
Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business: Crown Business; 2009.
Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW 2011.
Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
Khatib F, DiMaio F, Foldit Contenders Group, Foldit Void Crushers Group, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. Social Science Research Network; 2013.
Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
Nielsen J: Usability Engineering: Academic Press; 1993.
Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning: O'Reilly Media; 2012.
Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing--harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Department of Informatics, UC Irvine, USA; 2009.
Surowiecki J: The Wisdom of Crowds: Doubleday; 2004.
Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.