AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
TRANSCRIPT
![Page 1: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Peter W. Hallinan, Ph.D.
A9.com
November 30, 2016
MAC201
Getting to Ground Truth with
Amazon Mechanical Turk
![Page 2: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/2.jpg)
What to expect from the session
• How to use Mechanical Turk to build ML datasets
• Best practices at smaller and larger scales
• Lessons learned building datasets for an AWS service
What not to expect
• Detailed tutorial on how to use Mechanical Turk
![Page 3: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/3.jpg)
Machine learning requires large scale data
production pipelines
• Readily available: algorithms and compute clusters
• Not readily available: large scale, high quality datasets
Amazon Mechanical Turk can help you build your dataset
• Training data is the key differentiator
• Success depends on the quality, scale, and throughput of your dataset production pipelines
![Page 4: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/4.jpg)
www.mturk.com
![Page 5: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/5.jpg)
What is Amazon Mechanical Turk (MTurk)?
• A marketplace for getting simple tasks done in parallel by humans
• Launched in 2005, one of the first AWS services
• Basic unit of work is a HIT, a single, self-contained task
• Example: “How many wolves are in this photo?”
• Requesters use website or APIs to publish HITs to workers and
consume results
• One or more workers per HIT
• Rapid response times
• Simple workflow: HTML template, csv in, csv out
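The "HTML template, csv in, csv out" workflow can be pictured with a small sketch. It assumes the template marks variables as `${name}` and that each row of the input CSV produces one HIT. The column name `image_url`, the template text, and the `render_hits` helper are hypothetical, chosen only for illustration; this is not MTurk's own rendering code.

```python
import csv
import io
from string import Template

# Hypothetical HIT template; ${image_url} is filled from the CSV column of the same name.
HIT_TEMPLATE = Template(
    "<p>How many wolves are in this photo?</p>\n"
    '<img src="${image_url}" alt="photo to label">\n'
    '<input type="number" name="wolf_count" min="0">'
)

# Illustrative input only; the URLs are placeholders.
INPUT_CSV = "image_url\nhttps://example.com/a.jpg\nhttps://example.com/b.jpg\n"


def render_hits(csv_text):
    """Return one rendered HTML body per row of the input CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [HIT_TEMPLATE.substitute(row) for row in reader]


if __name__ == "__main__":
    for html in render_hits(INPUT_CSV):
        print(html, end="\n\n")
```

Each worker answer would come back as one row of an output CSV, keyed by the same input columns plus the form field names (here `wolf_count`).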
![Page 6: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/6.jpg)
What datasets can you build with MTurk?
Almost any domain
• Vision, NLP, psych, etc.
All key task types
• Open-ended questions
• Structured questions
• Binary verifications
ImageNet
• Prof. Fei-Fei Li, Stanford AI Lab
• 21841 WordNet categories
• 14.1 MM total images
• 1 MM localized examples
• ImageNet Challenge 2010-2016
Search Google Scholar for “Mechanical Turk” + “machine learning”
Result: 6000+ citations
L. Fei-Fei, ImageNet: crowdsourcing, benchmarking and other
cool things, CMU VASC Seminar, March, 2010.
![Page 7: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/7.jpg)
What kind of quality can you expect?
1. How much do your
categories intrinsically
overlap?
2. How representative is
your “golden set”?
3. How well can workers
solve your specific HIT?
[Figure: "Wolves" and "Dogs?" regions overlapping in a Feature A vs. Feature B plot, marked with callout 1]
![Page 8: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/8.jpg)
What kind of quality can you expect?
1. How much do your
categories intrinsically
overlap?
2. How representative is
your “golden set”?
3. How well can Workers
solve your specific HIT?
[Figure: "True wolves", "Golden wolves", and "Worker wolves" regions in a Feature A vs. Feature B plot, marked with callouts 2 and 3]
![Page 9: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/9.jpg)
Rapid prototyping:
Smaller scale datasets
![Page 10: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/10.jpg)
Dataset construction is highly iterative...
… so MTurk supports
rapid iterations
Source data
Define HIT & golden set
Evaluate HIT results
Augment data
Train & test ML algorithm
Define objectives
MTurk
![Page 11: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/11.jpg)
Example: Build a wristwatch classifier
• ML objective: Label wristwatch “shape”
• Dataset objective: ~2000 training examples
• Data source: Amazon Catalog
![Page 12: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/12.jpg)
Experiment A: 1 shape feature, 3 categories
Rectangular
Circular
Other
![Page 13: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/13.jpg)
Experiment A: HIT design
![Page 14: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/14.jpg)
Experiment A: Accuracy is 97.29%
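Rows: MTurk label. Columns: golden set label.

| MTurk \ Golden set | Circle | Rect | Other | Total | Precision | Recall |
|---|---|---|---|---|---|---|
| Circle | 504 | 3 | 10 | 517 | 97% | 99% |
| Rect | 0 | 64 | 2 | 66 | 97% | 94% |
| Other | 3 | 1 | 113 | 117 | 97% | 90% |
| Total | 507 | 68 | 125 | 700 | | |
| Prevalence | 72% | 10% | 18% | | | |

Correct (diagonal): 681 of 700. Accuracy: 97.29%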
Can we do better?
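The accuracy, precision, and recall figures for Experiment A follow from the counts on this slide when rows are read as MTurk labels and columns as golden set labels. The sketch below recomputes them; the `summarize` helper is a hypothetical name, not part of any MTurk tooling.

```python
# Rows: MTurk label; columns: golden set label (order: Circle, Rect, Other).
LABELS = ["Circle", "Rect", "Other"]
CONFUSION = [
    [504, 3, 10],
    [0, 64, 2],
    [3, 1, 113],
]


def summarize(matrix):
    """Return overall accuracy plus per-class precision and recall."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Precision divides by the MTurk (row) total, recall by the golden (column) total.
    precision = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    recall = [matrix[i][i] / col_totals[i] for i in range(n)]
    return correct / total, precision, recall


if __name__ == "__main__":
    accuracy, precision, recall = summarize(CONFUSION)
    print(f"accuracy = {accuracy:.2%}")
    for label, p, r in zip(LABELS, precision, recall):
        print(f"{label}: precision = {p:.0%}, recall = {r:.0%}")
```

This version assumes every class appears at least once in both the MTurk and golden labels; the sparser 16-category matrix of Experiment B would need a guard against empty rows and columns.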
![Page 15: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/15.jpg)
Experiment B: 2 shape features, 16 categories
Dial face
shape
Casing shape
Other
Rectangle
Circle/oval
Circle/oval
Rectangle
Other
Tonneau
Tonneau
![Page 16: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/16.jpg)
Experiment B: HIT design
![Page 17: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/17.jpg)
Experiment B: Accuracy drops 4.72% to 92.57%
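Rows: MTurk label (Case/Dial). "Correct" is the diagonal cell; "Off-diagonal" lists the remaining counts in that MTurk row; "Golden total" and "Prevalence" are the golden set column totals.

| Case/Dial | Correct | Off-diagonal | MTurk total | Golden total | Prevalence | Precision | Recall |
|---|---|---|---|---|---|---|---|
| C/C | 453 | 3, 3, 7, 8 | 474 | 469 | 67% | 96% | 97% |
| T/C | 19 | 11, 1, 2, 1 | 34 | 22 | 3% | 56% | 86% |
| R/C | 1 | 1 | 2 | 5 | 1% | 50% | 20% |
| O/C | 4 | 5 | 9 | 13 | 2% | 44% | 31% |
| C/T | 0 | | 0 | 0 | 0% | - | - |
| T/T | 3 | 1 | 4 | 3 | 0% | 75% | 100% |
| R/T | 0 | 1 | 1 | 0 | 0% | 0% | - |
| O/T | 0 | | 0 | 0 | 0% | - | - |
| C/R | 0 | | 0 | 0 | 0% | - | - |
| T/R | 0 | | 0 | 1 | 0% | - | 0% |
| R/R | 53 | 2, 2 | 57 | 57 | 8% | 93% | 93% |
| O/R | 2 | | 2 | 5 | 1% | 100% | 40% |
| C/O | 0 | 1 | 1 | 0 | 0% | 0% | - |
| T/O | 0 | | 0 | 0 | 0% | - | - |
| R/O | 0 | | 0 | 0 | 0% | - | - |
| O/O | 113 | 1, 2 | 116 | 125 | 18% | 97% | 90% |
| Total | 648 | | 700 | 700 | | | |

Accuracy: 92.57%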
A second “fuzzy” feature creates more opportunity for disagreement
C = Circle/Oval; T = Tonneau; R = Rectangular; O = Other
![Page 18: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/18.jpg)
Experiment B: Filtering workers adds 0.14%
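Rows: MTurk label (Case/Dial). "Correct" is the diagonal cell; "Off-diagonal" lists the remaining counts in that MTurk row; "Golden total" and "Prevalence" are the golden set column totals.

| Case/Dial | Correct | Off-diagonal | MTurk total | Golden total | Prevalence | Precision | Recall |
|---|---|---|---|---|---|---|---|
| C/C | 455 | 3, 3, 7, 8 | 476 | 469 | 67% | 96% | 97% |
| T/C | 19 | 10, 1, 2, 1 | 33 | 22 | 3% | 58% | 86% |
| R/C | 1 | 2 | 3 | 5 | 1% | 33% | 20% |
| O/C | 4 | 4 | 8 | 13 | 2% | 50% | 31% |
| C/T | 0 | | 0 | 0 | 0% | - | - |
| T/T | 3 | 1 | 4 | 3 | 0% | 75% | 100% |
| R/T | 0 | 1 | 1 | 0 | 0% | 0% | - |
| O/T | 0 | | 0 | 0 | 0% | - | - |
| C/R | 0 | | 0 | 0 | 0% | - | - |
| T/R | 0 | | 0 | 1 | 0% | - | 0% |
| R/R | 52 | 2, 2 | 56 | 57 | 8% | 93% | 91% |
| O/R | 2 | | 2 | 5 | 1% | 100% | 40% |
| C/O | 0 | 1 | 1 | 0 | 0% | 0% | - |
| T/O | 0 | | 0 | 0 | 0% | - | - |
| R/O | 0 | | 0 | 0 | 0% | - | - |
| O/O | 113 | 1, 2 | 116 | 125 | 18% | 97% | 90% |
| Total | 649 | | 700 | 700 | | | |

Accuracy: 92.71%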
Only 5/124 workers are in the minority for a majority of their votes
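One way to read "majority of filtered workers" is sketched below: first find workers who land in the minority on more than half of their votes, then re-run the majority vote without them. The slides do not spell out the exact filter, so the thresholds, the `VOTES` sample data, and every function name here are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample: hit_id -> {worker_id: label}.
VOTES = {
    "h1": {"w1": "C/C", "w2": "C/C", "w3": "T/C"},
    "h2": {"w1": "R/R", "w2": "R/R", "w3": "O/R"},
    "h3": {"w1": "C/C", "w3": "T/C", "w4": "T/C"},
}


def majority(labels):
    """Return the strict-majority label, or None when no label exceeds half."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def unreliable_workers(votes):
    """Workers who disagree with the per-HIT majority on most of their votes."""
    minority = defaultdict(int)
    total = defaultdict(int)
    for by_worker in votes.values():
        winner = majority(list(by_worker.values()))
        for worker, label in by_worker.items():
            total[worker] += 1
            if winner is not None and label != winner:
                minority[worker] += 1
    return {w for w in total if minority[w] * 2 > total[w]}


def filtered_majority(votes):
    """Majority vote per HIT after dropping votes from unreliable workers."""
    dropped = unreliable_workers(votes)
    return {
        hit: majority([l for w, l in by_worker.items() if w not in dropped])
        for hit, by_worker in votes.items()
    }


if __name__ == "__main__":
    print(unreliable_workers(VOTES))
    print(filtered_majority(VOTES))
```

In the sample data, dropping one worker turns a 2-to-1 decision into a 1-to-1 split. That side effect is worth watching when only three workers answer each HIT.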
![Page 19: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/19.jpg)
Many possible experiments; we’ve reported three
| Quality lever | Lever settings / experiments |
|---|---|
| ML objectives | 1 feature / 3 categories; 2 features / 16 categories |
| Data sources and segments | Held constant |
| Golden set | # examples: 100, 700; # annotators: 1, 3 |
| HIT design / instructions | Picture only; text only; picture and text |
| Worker selection | Prequalified |
| Workers per HIT | 3, 5 |
| HIT aggregation rules | Majority vote vs. majority of filtered workers |
| Worker feedback | None |
![Page 20: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/20.jpg)
Myth: Dataset quality is an intrinsic property of the MTurk marketplace
MTurk provides you with control levers to optimize dataset quality and throughput
Quality control levers
• ML objectives
• Data sources and segments
• Golden sets
• HIT design / instructions
• Worker selection
• Workers per HIT
• HIT aggregation rules
• Worker feedback
Throughput control levers
• HIT price
• HIT publication rate
![Page 21: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/21.jpg)
Best practices: HIT design
• Simplicity of question vs. clarity of answers
  • Prefer questions with limited option sets to open-ended questions
  • Prefer mutually exclusive, collectively exhaustive option sets
  • Prefer smaller option sets to larger ones
• Ease of learning vs. time to complete HIT
  • Prefer more questions and simpler instructions to fewer questions and more complex instructions
  • Prefer that each possible answer set costs the same time to provide
• Workers optimize their behaviors to your design; don’t “tweak” too much
![Page 22: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/22.jpg)
Best practices: Selecting workers
Using MTurk criteria
• Geography
• Worker approval rating
• Total # HITs approved
• Masters status
• Mobile device user
• Political affiliation
• High school graduate
• Bachelor’s degree
• Marital status
• Parenthood status
• Voted in 2012 presidential election
• Smoker
• Car owner
• Handedness
Using your own criteria
• Past performance on your HITs
• Custom tests of domain specific knowledge
• Custom tests of decision-making ability
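Selecting on "past performance on your HITs" can be as simple as scoring each worker against golden answers and keeping those above a cutoff. The sketch below assumes answers arrive as `(worker_id, hit_id, label)` tuples; the 90% accuracy cutoff, the minimum of two scored HITs, and the sample data are hypothetical values, not recommendations from the talk.

```python
# Hypothetical golden answers and worker submissions.
GOLDEN = {"h1": "Circle", "h2": "Rect"}
ANSWERS = [
    ("w1", "h1", "Circle"),
    ("w1", "h2", "Rect"),
    ("w1", "h3", "Other"),   # h3 has no golden answer, so it is not scored
    ("w2", "h1", "Circle"),
    ("w2", "h2", "Other"),
    ("w3", "h1", "Circle"),
]


def score_workers(answers, golden):
    """Map worker_id -> (correct, scored) over HITs that have a golden answer."""
    scores = {}
    for worker, hit, label in answers:
        if hit in golden:
            correct, scored = scores.get(worker, (0, 0))
            scores[worker] = (correct + (label == golden[hit]), scored + 1)
    return scores


def select_workers(answers, golden, min_accuracy=0.9, min_scored=2):
    """Return workers with enough scored HITs and accuracy at or above the cutoff."""
    return sorted(
        worker
        for worker, (correct, scored) in score_workers(answers, golden).items()
        if scored >= min_scored and correct / scored >= min_accuracy
    )


if __name__ == "__main__":
    print(score_workers(ANSWERS, GOLDEN))
    print(select_workers(ANSWERS, GOLDEN))
```

The resulting list could then back a custom qualification that gates access to later HITs.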
![Page 23: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/23.jpg)
Best practices: Assessing results
Aggregating results
• When using multiple workers per HIT, aggregate results with a voting scheme
• Align voting with prevalence
  • Moderate prevalence => Majority voting
  • Low prevalence => Any yes vote
• Either drop split decisions, or force them into a category that can be split later
Worker feedback
• Approve and reject HITs carefully
• Automatic rejections require ironclad reasons
• Adjust selection criteria
• Monitor emails and forums
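The prevalence-aligned voting rules can be sketched for yes/no HITs as follows. The 5% boundary between "low" and "moderate" prevalence is an assumed value, since the slide gives no number, and the function names are hypothetical.

```python
def choose_rule(prevalence, low_threshold=0.05):
    """Pick a voting rule from the expected share of 'yes' items (threshold assumed)."""
    return "any_yes" if prevalence < low_threshold else "majority"


def aggregate(votes, rule="majority"):
    """Combine boolean yes/no votes from several workers on one HIT.

    Returns True or False, or None for a split decision under majority voting.
    """
    if rule == "any_yes":
        return any(votes)
    if rule != "majority":
        raise ValueError(f"unknown rule: {rule}")
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return None  # split decision: drop it or route it to a holding category
    return yes > no


if __name__ == "__main__":
    votes = [True, False, False]
    print(aggregate(votes, choose_rule(0.30)))  # moderate prevalence
    print(aggregate(votes, choose_rule(0.01)))  # low prevalence
```

With rare positives, a single "yes" is treated as enough to keep the item for a later verification pass, which trades precision for recall.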
![Page 24: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/24.jpg)
Best practices: Boosting quality
• Separate your categories
• Scrutinize false positives and negatives
• Simplify and clarify instructions
• Optimize worker quals
• Experiment, experiment, experiment!
Dataset accuracy puts an upper bound on system performance
![Page 25: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/25.jpg)
Scaling Up:
Larger scale datasets
![Page 26: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/26.jpg)
Challenge: Measuring quality over time
Example
• 1 MM HITs @ 100K / week
• 99% confidence level
• Confidence interval (CI)
varies with sample size
Three strategies
• Scrutinize @ fixed CI
• Scrutinize @ decreasing CI
• Scrutinize and trust
Potential tactic
• Partition workers and
interleave golden sets
[Figure: Credible Region Width vs. # Answers Checked. x-axis: number of answers checked (0 to 6000); y-axis: credible region width (0.0% to 9.0%)]
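How the interval narrows as more answers are checked can be approximated with a few lines of code. The sketch below uses a normal approximation to a binomial proportion, which is not necessarily the credible-region calculation behind the slide's chart, and it assumes a true accuracy of 95% purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def interval_width(n_checked, accuracy=0.95, confidence=0.99):
    """Full width of a two-sided normal-approximation interval for accuracy."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sqrt(accuracy * (1 - accuracy) / n_checked)


def answers_needed(target_width, accuracy=0.95, confidence=0.99):
    """Smallest number of checked answers whose interval is no wider than the target."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((2 * z) ** 2 * accuracy * (1 - accuracy) / target_width ** 2)


if __name__ == "__main__":
    total_hits, hits_per_week = 1_000_000, 100_000
    print(f"weeks of production: {total_hits // hits_per_week}")
    for n in (100, 500, 1000, 3000, 6000):
        print(f"{n:>5} answers checked -> width {interval_width(n):.2%}")
    print(f"answers needed for a 2% width: {answers_needed(0.02)}")
```

Because the width shrinks with the square root of the sample size, halving it costs roughly four times as many checked answers. That cost is the trade-off between the fixed-CI and decreasing-CI strategies.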
![Page 27: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/27.jpg)
Myth: Workers always fatigue/satisfice with time
Worker accuracy is
stable and predictable
Check forums for
Worker discussions of
your HITs
Kenji Hata, Ranjay Krishna, Li Fei-Fei, and Michael S. Bernstein. “A Glimpse Far into the Future:
Understanding Long-term Crowd Worker Accuracy.” arXiv preprint arXiv:1609.04855 (2016).
![Page 28: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/28.jpg)
Challenge: Minimizing cost per training example
ML goal: recognize 20K categories; Dataset goal: 1K examples / category.
Top down (20 questions)
• Build classifier with 100 root categories, not 20K leaf categories
• HIT 1 labels candidate root examples (only 10-20K examples required)
• HIT 2 verifies root members
• Split root categories
• Repeat
Bottom up (data mining)
• Mine your source data for clusters
• HIT 1 assigns labels to clusters
• HIT 2 verifies members of clusters
• Delete cluster members from source data
• Repeat
Divide and conquer to maximize validation rates
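The link between validation rate and cost per training example can be made concrete with a simplified model: every candidate sent to a verification HIT costs the same, but only the validated fraction becomes training data. The prices and rates below are hypothetical, and the model ignores the cost of the labeling HIT and marketplace fees.

```python
CATEGORIES = 20_000
EXAMPLES_PER_CATEGORY = 1_000
TARGET_EXAMPLES = CATEGORIES * EXAMPLES_PER_CATEGORY


def cost_per_example(price_per_hit, workers_per_hit, validation_rate):
    """Expected verification spend for each example that passes verification."""
    if not 0 < validation_rate <= 1:
        raise ValueError("validation_rate must be in (0, 1]")
    return price_per_hit * workers_per_hit / validation_rate


if __name__ == "__main__":
    print(f"target dataset size: {TARGET_EXAMPLES:,} examples")
    for rate in (0.1, 0.5, 0.9):  # hypothetical validation rates
        unit = cost_per_example(0.01, 3, rate)  # hypothetical $0.01 per HIT, 3 workers
        print(f"validation rate {rate:.0%}: ${unit:.3f} per validated example")
```

Under this model the unit cost is inversely proportional to the validation rate, which is why both loops first narrow the candidates before asking workers to verify them.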
![Page 29: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/29.jpg)
Challenge: Success
• Eventually, user input will
differ from the data you
trained on
• Assess your actual
recognition rates
• Use errors to guide
expansion of your test and
training datasets
![Page 30: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/30.jpg)
Amazon Rekognition:
Lessons Learned
Ranju Das
Amazon Rekognition
![Page 31: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/31.jpg)
Amazon Rekognition
A deep learning-based image recognition service
Search, verify, and organize millions of images
• Object and scene detection
• Facial analysis
• Face comparison
• Facial recognition
![Page 32: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/32.jpg)
What do people see?
• People see a lot more than what is imaged on the retina
  • Vision involves a process called “unconscious inference” in neuroscience
  • The largely unconscious nature of the inferences is confirmed by the study of optical illusions
• In order for a human observer to recognize an image, two neuronal processes come together:
  • Sensory activation from the eyes (referential system)
  • Information from past experience that is stored in distributed regions across the brain (inferential system)
![Page 33: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/33.jpg)
What do you see in the yellow bounding box (“region proposal”)?
“a hat”?
![Page 34: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/34.jpg)
What do you see in the yellow bounding box (“region proposal”)?
“a baby”!
We “know” from correlation with other image crops and past experience that it’s a baby
People don’t classify “region proposals” in isolation
![Page 35: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/35.jpg)
Examples: Common inferences
Adding “must be” invisible objects
baby
![Page 36: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/36.jpg)
Examples: Common inferences
Adding “must be” invisible objects
fish
![Page 37: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/37.jpg)
Examples: Common inferences
Adding “must be” invisible objects
ring
![Page 38: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/38.jpg)
Examples: Common inferences
Betting whole from parts
baby
![Page 39: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/39.jpg)
Examples: Common inferences
Betting whole from parts
family
![Page 40: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/40.jpg)
Examples: Common inferences
Betting whole from parts
balloon
![Page 41: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/41.jpg)
Examples: Common inferences
Reading (and trusting) text hints
farm
![Page 42: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/42.jpg)
Examples: Common inferences
Reading (and trusting) text hints
chocolate
![Page 43: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/43.jpg)
Examples: Common inferences
Reading (and trusting) text hints
pizza
![Page 44: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/44.jpg)
Examples: Common inferences
Reading (and trusting) stereotypes and symbols
beer
![Page 45: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/45.jpg)
Examples: Common inferences
Reading (and trusting) stereotypes and symbols
4th of July
![Page 46: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/46.jpg)
Examples: Common inferences
Reading (and trusting) stereotypes and symbols
party
![Page 47: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/47.jpg)
Examples: Common inferences
Gambling on the past and future
swimming
![Page 48: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/48.jpg)
Examples: Common inferences
Gambling on the past and future
wedding
![Page 49: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/49.jpg)
Examples: Common inferences
Gambling on the past and future
camping
![Page 50: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/50.jpg)
Verification example
![Page 51: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/51.jpg)
Bounding box example
![Page 52: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/52.jpg)
Group image verification example
[Screenshot of HIT interface: question "Does each image contain a cat?", Yes and No buttons, sample images and descriptions, a progress counter (6/200), and a Back button]
![Page 53: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/53.jpg)
How to improve quality and consistency
Make the HITs “bite”-sized
Create clear and concise instructions
Ask multiple people and build consensus
Include control images to measure performance of workers
Use qualifications and white/black lists to control workforce
![Page 54: AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)](https://reader031.vdocuments.us/reader031/viewer/2022030305/58714f0d1a28ab55588b762b/html5/thumbnails/54.jpg)
Thank you!