CROWDSOURCING Massimo Poesio Part 4: Dealing with crowdsourced data


TRANSCRIPT

Page 1

CROWDSOURCING

Massimo Poesio
Part 4: Dealing with crowdsourced data

Page 2

THE DATA

• The result of crowdsourcing, in whatever form, is a mass of often inconsistent judgments

• We need techniques for identifying reliable annotations and reliable annotators
– In the Phrase Detectives context, to discriminate between genuine ambiguity and disagreement due to error

Page 3

THE ANDROCLES EXAMPLE

Page 4

SOME APPROACHES

• Majority voting (a minimal sketch follows this list)
– But: it ignores the substantial differences in behavior between annotators
• Alternatives:
– Removing bad annotators, e.g. using clustering
– Weighting annotators
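A minimal sketch of majority voting over an annotation table of (item, coder, label) triples, as on Page 13; the names are illustrative, and ties are broken arbitrarily:

from collections import Counter, defaultdict

def majority_vote(annotations):
    """Pick the most frequent label per item; annotations are (item, coder, label)."""
    votes = defaultdict(Counter)
    for item, _coder, label in annotations:
        votes[item][label] += 1
    # most_common(1) breaks ties arbitrarily, one symptom of the problems noted above
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}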

Page 5

SNOW ET AL

Page 6

SNOW ET AL: WEIGHTING ANNOTATORS
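The transcript does not reproduce this slide's content, so Snow et al.'s exact weighting scheme is not shown here. One common variant, offered only as an illustrative sketch, estimates each coder's accuracy on a small gold-labeled subset and weights votes by smoothed log-odds:

import math
from collections import defaultdict

def weighted_vote(annotations, gold):
    """annotations: (item, coder, label) triples; gold: dict item -> true label."""
    seen, correct = defaultdict(int), defaultdict(int)
    for item, coder, label in annotations:
        if item in gold:
            seen[coder] += 1
            correct[coder] += (label == gold[item])
    # Laplace-smoothed accuracy turned into a log-odds weight per coder;
    # coders never observed on gold items get weight 0 in this sketch
    weight = {c: math.log((correct[c] + 1.0) / (seen[c] - correct[c] + 1.0))
              for c in seen}
    scores = defaultdict(lambda: defaultdict(float))
    for item, coder, label in annotations:
        scores[item][label] += weight.get(coder, 0.0)
    return {item: max(s, key=s.get) for item, s in scores.items()}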

Page 7

LATENT MODELS OF ANNOTATION QUALITY

• The problem of reaching a conclusion on the basis of judgments by separate experts that may often be in disagreement is a longstanding one in epidemiology
• A number of techniques have been developed, including
– Dawid and Skene 1979 (also used by Passonneau & Carpenter)
– the latent annotation model (Uebersax 1994)
– Raykar et al 2010
• Recently, Carpenter (2008) has developed an explicit hierarchical Bayesian model

Page 8

DAWID AND SKENE 1979

• The model consists of a likelihood for
1. annotations (labels from annotators)
2. categories (true labels) for items, given
3. annotator accuracies and biases
4. prevalence of labels
• Frequentists estimate 2–4 given 1 (see the EM sketch below)
• Optional regularization of the estimates (for 3 and 4)
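A compact EM sketch of the Dawid & Skene estimator, assuming numpy, items indexed 0..I-1, coders 0..J-1, labels 0..K-1, and at least one annotation per item; the smoothing constant plays the role of the optional regularization:

import numpy as np

def dawid_skene(annotations, I, J, K, n_iter=50, smooth=0.01):
    """EM for the Dawid & Skene model over (item, coder, label) triples."""
    # t[i, k]: current posterior that item i has true category k,
    # initialized from raw vote proportions (assumes every item is annotated)
    t = np.zeros((I, K))
    for i, j, y in annotations:
        t[i, y] += 1.0
    t /= t.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: prevalence pi and annotator response matrices theta[j, k, k']
        pi = (t.sum(axis=0) + smooth) / (I + K * smooth)
        theta = np.full((J, K, K), smooth)      # additive smoothing regularizes theta
        for i, j, y in annotations:
            theta[j, :, y] += t[i]
        theta /= theta.sum(axis=2, keepdims=True)

        # E-step: t[i, k] proportional to pi_k * prod_j theta[j, k, y_{i,j}]
        log_t = np.tile(np.log(pi), (I, 1))
        for i, j, y in annotations:
            log_t[i] += np.log(theta[j, :, y])
        t = np.exp(log_t - log_t.max(axis=1, keepdims=True))
        t /= t.sum(axis=1, keepdims=True)
    return t, pi, theta

t.argmax(axis=1) then gives the estimated true categories, pi the estimated prevalence, and theta[j] annotator j's estimated response matrix.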

Page 9

A GENERATIVE MODEL OF THE ANNOTATION TASK

• What all of these models do is provide an EXPLICIT PROBABILISTIC MODEL of the observations in terms of annotators, labels, and items

Page 10

THE DATA

• K: the number of possible labels
• J: the number of annotators
• I: the number of items
• N: the total number of annotations of the I items produced by the J annotators
• y_{i,j}: the label produced for item i by coder j

Page 11

THE DATA: BY ITEM

ITEM   CODER 1   CODER 2   …   CODER J
1      y_{1,1}   y_{1,2}   …
2      y_{2,1}   …
3
4                y_{4,2}       y_{4,J}
…
I      y_{I,1}   …

Page 12

THE DATA: BY ANNOTATIONS

ANNOTATION   LABEL
1            y_{1,1}
2            y_{1,2}
3            y_{2,3}
4            …
…
N

Page 13

THE ANNOTATION TABLE

ANNOTATION   ii_n   jj_n   y_{ii_n,jj_n}
1            1      1      A
2            1      2      A
3            2      3      B
4            …
…
N

Page 14

A GENERATIVE MODEL OF THE ANNOTATION TASK

• The probabilistic model specifies the probability of a particular label on the basis of PARAMETERS specifying the behavior of the annotators, the prevalence of the labels, etc.

• In Bayesian models, these parameters are specified in terms of PROBABILITY DISTRIBUTIONS

Page 15

THE PARAMETERS OF THE MODEL

• z_i: the ACTUAL category of item i
• Θ_{j,k,k'}: ANNOTATOR RESPONSE
– the probability that annotator j labels an item as k' when it belongs to category k
• π_k: PREVALENCE
– the probability that an item belongs to category k

Page 16

DISTRIBUTIONS

• Each of the parameters is characterized in terms of a PROBABILITY DISTRIBUTION
• When we have some information about the data, these distributions can be used to encode it
– E.g., the annotators may all be equally good, or there may be a skew
• Otherwise, defaults are used

Page 17

DISTRIBUTIONS

• Prevalence of labels (PRIOR):
– π ~ Dir(α)
• Annotator j's response to items of category k (PRIOR):
– Θ_{j,k} ~ Dir(β_k)
• True category of item i (LIKELIHOOD):
– z_i ~ Categorical(π)
• Label from j for item i (LIKELIHOOD):
– y_{i,j} ~ Categorical(Θ_{j,z_i})

(a small simulation of this generative process is sketched below)
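A small simulation of the generative process above, assuming numpy; the sizes and hyperparameters are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
I, J, K = 1000, 5, 3                        # items, annotators, labels

alpha = np.ones(K)                          # uniform prior on prevalence
beta = np.eye(K) * 5 + 1                    # rows of beta favor accurate coders

pi = rng.dirichlet(alpha)                   # pi ~ Dir(alpha)
theta = np.array([[rng.dirichlet(beta[k])   # Theta_{j,k} ~ Dir(beta_k)
                   for k in range(K)] for _ in range(J)])

z = rng.choice(K, size=I, p=pi)             # z_i ~ Categorical(pi)
y = np.array([[rng.choice(K, p=theta[j, z[i]])  # y_{i,j} ~ Categorical(Theta_{j,z_i})
               for j in range(J)] for i in range(I)])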

Page 18

TYPES OF ANNOTATORS: SPAMMY

(RESPONSE TO ALL ITEMS THE SAME)

Page 19

TYPES OF ANNOTATORS: BIASED

(HAS SKEW IN RESPONSE – COMMON IN LOW PREVALENCE DATA)
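The figures for these two slides are not reproduced in the transcript; as an illustration (with invented numbers), the corresponding response matrices Θ_j for a binary task would look like:

import numpy as np

# rows = true category k, columns = produced label k'
theta_spammy = np.array([[0.8, 0.2],   # same response distribution...
                         [0.8, 0.2]])  # ...whatever the true category is

theta_biased = np.array([[0.9, 0.1],   # skewed toward label 0: misses most
                         [0.6, 0.4]])  # positives, common in low-prevalence data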

Page 20

QUICK INTRO TO DIRICHLET

• The Dirichlet is often seen in Bayesian models (e.g., Latent Dirichlet Allocation, LDA) because it is a CONJUGATE PRIOR of the MULTINOMIAL distribution

Page 21

BINOMIAL AND MULTINOMIAL
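The body of this slide is not in the transcript; for reference, the standard definitions it presumably displayed are

P(X = x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x}   (Binomial)

P(x_1, \ldots, x_K \mid n, \pi) = \frac{n!}{x_1! \cdots x_K!} \prod_{k=1}^{K} \pi_k^{x_k}   (Multinomial)

with the Bernoulli and categorical distributions (Page 24) as the n = 1 special cases.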

Page 22

CONJUGATE PRIOR

• In Bayesian inference the objective is to compute a POSTERIOR on the basis of a LIKELIHOOD and a PRIOR:

P(B|A) = P(A|B) P(B) / P(A)

• A CONJUGATE PRIOR of a distribution D is a distribution such that, if it is used for the prior, then the posterior also has that shape
– E.g., 'Dirichlet is a conjugate prior of the multinomial' means that if the likelihood is a multinomial and the prior is Dirichlet, then the posterior is also Dirichlet.
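A worked instance of this conjugacy: if π ~ Dir(α) and we observe multinomial label counts x = (x_1, …, x_K), the posterior is obtained simply by adding the counts to the prior parameters,

\pi \mid x \sim \mathrm{Dir}(\alpha_1 + x_1, \ldots, \alpha_K + x_K)

e.g. with K = 2, prior Dir(1, 1), and counts (7, 3), the posterior is Dir(8, 4).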

Page 23

DIRICHLET DISTRIBUTION
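The figure for this slide is not reproduced in the transcript; the density being illustrated, defined over probability vectors π (π_k ≥ 0, Σ_k π_k = 1), is

p(\pi \mid \alpha) = \frac{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}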

Page 24

CATEGORICAL

• The categorical distribution is a generalization, to K possible outcomes, of the Bernoulli distribution, which specifies the probability of a given outcome for a binary trial
– E.g., the probability of getting a head in a coin toss
– Cf. the BINOMIAL distribution, which specifies the probability of getting N heads

Page 25

A GRAPHICAL VIEW OF THE MODEL

Page 26

THE PROBABILISTIC MODEL OF A GIVEN LABEL
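The slide's formula is not in the transcript; assembling the distributions listed on Page 17, the joint model presumably factorizes as

p(\pi, \Theta, z, y) = p(\pi \mid \alpha) \prod_{j=1}^{J} \prod_{k=1}^{K} p(\Theta_{j,k} \mid \beta_k) \prod_{i=1}^{I} p(z_i \mid \pi) \prod_{(i,j)} p(y_{i,j} \mid \Theta_{j, z_i})

with the last product running over the observed annotations only.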

Page 27

AN EXAMPLE

Page 28

PROBABILISTIC INFERENCE

• Probabilistic inference techniques are used to INFER the parameters, and hence the probabilities of interest, from the data
– Often: Expectation Maximization (EM)
• The EM implementation in R used by Carpenter & Passonneau to estimate the parameters is available from
– https://github.com/bob-carpenter/anno

Page 29

APPLICATION TO WORD SENSE DISTRIBUTION (CARPENTER & PASSONNEAU, 2013, 2014)

• Carpenter and Passonneau used the Dawid and Skene model to compare trained manual annotators with turkers on the word sense annotation of the MASC corpus

Page 30

THE MASC CORPUS

• Manually Annotated SubCorpus (MASC)
– 500K word subset of the Open American National Corpus (OANC)
• Multiple genres: technical manuals, poetry, news, dialogue, etc.
• 16 types of annotation (not all manual)
– part of speech, phrases, word sense, named entities, ...
• 100 item word-sense corpus
– balanced by genre and part of speech (noun, verb, adjective)

Page 31

MASC WORDSENSE

• 100 words balanced between adjectives, nouns, and verbs
• 1000 sentences for each word
• Annotated using WordNet senses for these words
• ~1M tokens

Page 32

MASC Wordsense: annotation using trained annotators

• pre-training on 50 items
• independent labeling of 1000 items
• 100 items labeled by 3 or 4 annotators
• agreement on these 100 items reported
• only a single round of annotation; most items singly annotated

Page 33

Annotation using trained annotators

• College students from Vassar, Barnard, Columbia
• 2–3 years of work on the project
• General training plus per-word training
• Supervised by
– Becky Passonneau
– Nancy Ide (maintainer of MASC)
– Christiane Fellbaum (maintainer of WordNet)

Page 34

Annotation using crowdsourcing

• 45 randomly selected words, balanced across nouns, verbs, and adjectives, were reannotated using crowdsourcing
• 1000 instances per word
• 25+ annotators per instance
• the high number of annotators makes it possible to
– estimate difficulty
– reject independence of labels

Page 35

Differences from the trained situation

• Annotators not trained
• Not told to look at WordNet
• Each HIT:
– 10 sentences for the same word
– WordNet senses listed under the word

Page 36

METHODS

• Passonneau & Carpenter used their model to
– evaluate the prevalence of labels in different ways
– evaluate annotator response

Page 37

PREVALENCE ESTIMATION

Page 38

ASSESSMENT OF QUALITY

Page 39

ANNOTATOR RESPONSE

Page 40

AGREEMENT RATES

Page 41

OTHER MODELS

• Raykar et al, 2010
• Carpenter, 2008

Page 42

RAYKAR ET AL 2010

• Simultaneously ESTIMATES THE GROUND TRUTH from noisy labels, produces an ASSESSMENT OF THE ANNOTATORS, and LEARNS A CLASSIFIER
– based on logistic regression
• Bayesian (includes priors on the annotators)

Page 43

ANNOTATORS

• Annotator j is characterized by her/his
– SENSITIVITY: the ability to recognize positive cases
• α_j = P(y_j = 1 | y = 1)
– SPECIFICITY: the ability to recognize negative cases
• β_j = P(y_j = 0 | y = 0)

Page 44

RAYKAR ET AL

Raykar et al propose a version of the EM algorithm that can be used to estimate P(O|θ) as well as the sensitivity and specificity of each annotator (a compact sketch follows below):

P(O | θ) = ∏_{i=1}^{N} P(y_i^1, …, y_i^R | x_i, θ)

P(O | θ) = ∏_{i=1}^{N} [ a_i p_i + b_i (1 − p_i) ]

where p_i is the classifier's probability that item i is positive, and a_i and b_i collect the annotators' sensitivities and specificities for the labels observed on item i.

Carpenter developed a fully Bayesian version of the approach based on gradient descent.
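A compact sketch of the maximum-likelihood EM variant for the binary case (without the priors on annotators mentioned on Page 42), assuming numpy and scikit-learn; Y holds one binary label per annotator and item:

import numpy as np
from sklearn.linear_model import LogisticRegression

def raykar_em(X, Y, n_iter=30):
    """X: (N, D) features; Y: (N, R) binary labels from R annotators."""
    N, R = Y.shape
    mu = Y.mean(axis=1)                 # soft ground truth, initialized by vote
    clf = LogisticRegression()
    for _ in range(n_iter):
        # M-step: per-annotator sensitivity/specificity, then the classifier
        alpha = (mu[:, None] * Y).sum(axis=0) / mu.sum()
        beta = ((1 - mu)[:, None] * (1 - Y)).sum(axis=0) / (1 - mu).sum()
        # fit p_i = P(y_i = 1 | x_i) on soft labels via duplicated weighted rows
        clf.fit(np.vstack([X, X]),
                np.r_[np.ones(N), np.zeros(N)],
                sample_weight=np.r_[mu, 1 - mu])
        p = clf.predict_proba(X)[:, 1]
        # E-step: posterior that each item is truly positive (the a_i, b_i above)
        a = np.prod(alpha ** Y * (1 - alpha) ** (1 - Y), axis=1)
        b = np.prod(beta ** (1 - Y) * (1 - beta) ** Y, axis=1)
        mu = a * p / (a * p + b * (1 - p))
    return mu, alpha, beta, clf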

Page 45

CARPENTER

Page 46

DISAGREEMENT IN INTERPRETATION

Page 47

AMBIGUITY: REFERENT

15.12 M: we're gonna take the engine E3
15.13 : and shove it over to Corning
15.14 : hook [it] up to [the tanker car]
15.15 : _and_
15.16 : send it back to Elmira

(from the TRAINS-91 dialogues collected at the University of Rochester)

Page 48

AMBIGUITY: REFERENT

About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s. Areas of the factory were particularly dusty where the crocidolite was used. Workers dumped large burlap sacks of the imported material into a huge bin, poured in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters. Workers described "clouds of blue dust" that hung over parts of the factory, even though exhaust fans ventilated the area.

Page 49

AMBIGUITY: EXPLETIVES

'I beg your pardon!' said the Mouse, frowning, but very politely: 'Did you speak?'

'Not I!' said the Lory hastily.

'I thought you did,' said the Mouse. '--I proceed. "Edwin and Morcar, the earls of Mercia and Northumbria, declared for him: and even Stigand, the patriotic archbishop of Canterbury, found it advisable--"'

'Found WHAT?' said the Duck.

'Found IT,' the Mouse replied rather crossly: 'of course you know what "it" means.'

Page 50

OTHER DATA: WORDSENSE DISAMBIGUATION (Passonneau et al 2010)

And our ideas of what constitutes a FAIR wage or a FAIR return on capital are historically contingent … {sense1, sense1, sense1, sense2, sense2, sense2}

… the federal government … is wrangling for its FAIR share of the dividend … {sense1, sense1, sense2, sense2, sense8, sense8}

Page 51

OTHER DATA: POS (Plank, Hovy & Søgaard 2014)

Noam goes OUT tonight {ADP, PRT}

Noam likes SOCIAL media {ADJ, NOUN}

Page 52

REFERENCES

• Passonneau, R. J. & Carpenter, B. (2014). The benefits of a model of annotation. Transactions of the Association for Computational Linguistics. To appear.

• Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. Journal of Machine Learning Research, 11.