Efficiently Learning the Accuracy of
Labeling Sources for Selective Sampling
by Pinar Donmez, Jaime Carbonell, Jeff Schneider
School of Computer Science, Carnegie Mellon University
KDD ’09
June 30th 2009
Paris, France
Problem Illustration
[Figure: instances and oracles, annotated with the values 0.74, 0.55, 0.8, 0.9, 0.67, 0.83, 0.58, 0.69]
Interval Estimate Threshold (IEThresh)
Goal: find the labeler(s) with the highest expected accuracy
Our work builds upon Interval Estimation [L. P. Kaelbling]
1. Estimate the reward of each labeler (more on next slide)
2. Compute the upper confidence interval for each labeler
3. Select the labelers whose upper interval is higher than a threshold
4. Observe the output of the chosen oracles to estimate their reward
5. Return to step 1
Filter out unreliable labelers
Reduce labeling cost
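Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not say how the interval or the threshold is computed, so the sketch uses a normal critical value for the interval and sets the threshold to a fraction `epsilon` of the largest upper interval. The names `upper_interval`, `select_labelers`, `epsilon`, `alpha`, and the reward histories are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_interval(rewards, alpha=0.05):
    """Upper end of a two-sided (1 - alpha) confidence interval for the mean reward."""
    n = len(rewards)  # needs at least two observed rewards for stdev()
    # Assumption: a normal critical value stands in for an exact Student-t value.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean(rewards) + z * stdev(rewards) / math.sqrt(n)


def select_labelers(reward_history, epsilon=0.8, alpha=0.05):
    """Return the labelers whose upper interval reaches the threshold."""
    upper = {name: upper_interval(r, alpha) for name, r in reward_history.items()}
    # Assumption: the threshold is a fraction epsilon of the best upper interval.
    cutoff = epsilon * max(upper.values())
    return sorted(name for name, u in upper.items() if u >= cutoff)


# Hypothetical reward histories (1 = agreed with the majority label, 0 = did not).
history = {
    "A": [1, 1, 1, 0, 1, 1],
    "B": [0, 0, 1, 0, 0, 0],
    "C": [1, 0, 1, 1, 1, 1],
}
chosen = select_labelers(history)
print(chosen)
```

A labeler with few observations gets a wide interval and so tends to stay above the threshold, which is how this kind of rule keeps exploring labelers it knows little about.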
Reward of the labelers
The reward of each labeler is unknown => it needs to be estimated
A labeler is rewarded for eliciting the true label
The true label is also unknown => it is estimated by the majority vote
We propose the reward function below:
reward = 1 if the labeler agrees with the majority label
reward = 0 otherwise
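The reward function above is small enough to write out directly. The following is a minimal sketch, assuming hashable labels and an arbitrary tie-break (the first label seen wins), which the slides do not address; `majority_label`, `rewards`, and the example votes are hypothetical names and data.

```python
from collections import Counter


def majority_label(labels):
    """Most common label among the queried labelers (ties: first label seen wins)."""
    return Counter(labels).most_common(1)[0][0]


def rewards(labels_by_labeler):
    """Reward 1 if a labeler agrees with the majority label, 0 otherwise."""
    majority = majority_label(list(labels_by_labeler.values()))
    return {name: int(label == majority) for name, label in labels_by_labeler.items()}


# Hypothetical votes from three labelers on one instance.
votes = {"A": 1, "B": 0, "C": 1}
r = rewards(votes)
print(r)
```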
IEThresh at the Beginning
[Figure: expected reward (increasing upward) for each oracle]
IEThresh Oracle Selection
[Figure: expected reward (increasing upward) for oracles 1 to 5, with the Threshold marked]
IE Learning Snapshot II
[Figure: expected reward (increasing upward) for oracles 1 to 5, with the Threshold marked]
IEThresh Instance Selection
[Figure: diagram with elements numbered 1 to 5]
Uniform Expert Accuracy ∈ (0.5, 1]
Repeated Labeling [Sheng et al., 2008]: queries all experts for every label
[Figure: classification error]
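The Repeated Labeling baseline in this setting can be simulated as follows. This is a toy sketch under stated assumptions, not the experimental setup of the talk: labels are binary, each expert errs independently, expert accuracies are drawn uniformly from (0.5, 1], and the pool size, instance count, and seed are hypothetical.

```python
import random


def repeated_labeling(true_label, accuracies, rng):
    """Query every expert for a binary label and return the majority vote."""
    votes = [true_label if rng.random() < acc else 1 - true_label for acc in accuracies]
    return int(sum(votes) * 2 > len(votes))


rng = random.Random(0)
# Hypothetical pool of 5 experts; 1 - 0.5 * U[0, 1) lies in (0.5, 1].
accuracies = [1.0 - 0.5 * rng.random() for _ in range(5)]

n_instances = 1000
true_labels = [rng.randint(0, 1) for _ in range(n_instances)]
predicted = [repeated_labeling(y, accuracies, rng) for y in true_labels]

error_rate = sum(p != y for p, y in zip(predicted, true_labels)) / n_instances
total_queries = n_instances * len(accuracies)  # every expert is asked every time
print(error_rate, total_queries)
```

The query count grows with the size of the expert pool regardless of how reliable each expert is, which is the cost that selecting a subset of labelers aims to reduce.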
Number of Oracle Queries vs. Accuracy
[Figure legend: first 10 iterations; next 40 iterations; next 100 iterations]
Number of Oracle Queries to Reach a Target Accuracy
[Figure: annotations "skew increases" and "better"]
Results on AMT Data with Human Annotators
IEThresh reaches the best performance with effort similar to Repeated Labeling
The Repeated Labeling baseline needs 840 queries in total to reach 0.95 accuracy
Dataset at http://nlpannotations.googlepages.com/ made available by [Snow et al., 2008]
[Figure panels: 5 annotators; 6 annotators]
Conclusions and Future Work
Conclusions
IEThresh is effective in balancing the exploration vs. exploitation tradeoff
Early filtering of unreliable labelers boosts performance
Utilizing labeler accuracy estimates is more effective than asking all labelers or asking randomly chosen ones
Future Work
From consistent to time-variant labeler quality
Label noise conditioned on the data instance
Correlated labeling errors
THANK YOU!