![Page 1: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/1.jpg)
Introduction to Mechanized Labor Marketplaces: Mechanical Turk
Uichin Lee, KAIST KSE
![Page 2: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/2.jpg)
Mechanical Turk
• Begin with a project
– Define the goals and key components of your project. For example, your goal might be to clean your business listing database so that you have accurate information for consumers.
• Break it into tasks and design your HIT
– Break the project into individual tasks; e.g., if you have 1,000 listings to verify, each listing would be an individual task.
– Next, design your Human Intelligence Tasks (HITs) by writing crisp and clear instructions, identifying the specific outputs/inputs desired and how much you will pay to have work completed.
• Publish HITs to the marketplace
– You can load millions of HITs into the marketplace. Each HIT can have multiple assignments so that different Workers can provide answers to the same set of questions and you can compare the results to form an agreed-upon answer.
https://requester.mturk.com/tour/how_it_works
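Comparing several assignments of the same HIT to form an agreed-upon answer can be sketched as a simple majority vote. This is a minimal illustration, not an MTurk API call: the function `majority_answer`, the `min_agreement` threshold, and the sample listing data are all hypothetical.

```python
from collections import Counter

def majority_answer(answers, min_agreement=0.5):
    """Return the most common answer if its share exceeds min_agreement, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_agreement else None

# Hypothetical results: three Workers verified each business listing.
assignments = {
    "listing-001": ["open", "open", "closed"],
    "listing-002": ["closed", "open", "moved"],
}

# HITs without a clear majority map to None and could be re-posted or reviewed by hand.
agreed = {hit: majority_answer(answers) for hit, answers in assignments.items()}
print(agreed)
```

Requiring a strict majority is only one possible design choice; a requester could instead weight Workers by past approval rate.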
![Page 3: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/3.jpg)
Mechanical Turk
• Workers accept assignments
– If Workers need special skills to complete your tasks, you can require that they pass a Qualification test before they are allowed to work on your HITs.
– You can also require other Qualifications such as the location of a Worker or that they have completed a minimum number of HITs.
• Workers submit assignments for review
– When a Worker completes your HIT, he or she submits an assignment for you to review.
• Approve or reject assignments
– When your work items have been completed, you can review the results and approve or reject them. You pay only for approved work.
• Complete your project
– Congratulations! Your project has been completed and your Workers have been paid.
https://requester.mturk.com/tour/how_it_works
![Page 4: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/4.jpg)
Screenshot
![Page 5: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/5.jpg)
Screenshot
![Page 6: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/6.jpg)
AMT Questions
• Who are the workers that complete these tasks?
• What type of tasks can be completed in the marketplace?
• How much does it cost?
• How fast can I get results back?
• How big is the AMT marketplace?
Analyzing the Amazon Mechanical Turk marketplace, P. G. Ipeirotis, XRDS: Crossroads, 2010
Demographics of Mechanical Turk, P. G. Ipeirotis, NYU TR, 2010
![Page 7: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/7.jpg)
Gender and Age
• Countries: US 46.8%, India 34%, other 19.2% (from 66 different countries)
http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html
![Page 8: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/8.jpg)
Educational Level
• Many of the workers are younger than the overall population, and this leads to higher educational levels
![Page 9: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/9.jpg)
Income Level
• Indian Turkers have relatively low income levels
![Page 10: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/10.jpg)
Marital Status and Household Size
• Lots of single workers
• Indian workers tend to belong to larger households
![Page 11: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/11.jpg)
Level of Engagement on M-Turk
• Most workers spend less than a day per week, completing 20-100 HITs and earning less than $20 per week.
![Page 12: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/12.jpg)
Motivation
• Why do you complete tasks in Mechanical Turk? Please check any of the following that applies (multiple items possible):
– [1] Fruitful way to spend free time and get some cash (e.g., instead of watching TV)
– [2] I find the tasks to be fun
– [3] To kill time
– [4] For "primary" income purposes (e.g., gas, bills, groceries, credit cards)
– [5] For "secondary" income purposes, pocket change (for hobbies, gadgets, going out)
– [6] I am currently unemployed, or have only a part time job
Charts shown for items [1], [2], and [3]
![Page 13: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/13.jpg)
Motivation
• Why do you complete tasks in Mechanical Turk? Please check any of the following that applies:
– [1] Fruitful way to spend free time and get some cash (e.g., instead of watching TV)
– [2] I find the tasks to be fun
– [3] To kill time
– [4] For "primary" income purposes (e.g., gas, bills, groceries, credit cards)
– [5] For "secondary" income purposes, pocket change (for hobbies, gadgets, going out)
– [6] I am currently unemployed, or have only a part time job
Charts shown for items [4], [5], and [6]
![Page 14: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/14.jpg)
Summary
• Significant fraction of Turkers are from US and India (> 80% vs. 20% from misc)
• Turkers are younger (more than 50% from 21-35)
• More females (US) and more males (India)
• Turkers have relatively lower incomes
• Turkers have relatively smaller families
• Geographic distribution of Turkers and Internet users is similar
![Page 15: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/15.jpg)
Type of Tasks in M-Turk
![Page 16: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/16.jpg)
Requester vs. Total Rewards
• Long tail nature of participation
![Page 17: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/17.jpg)
Keywords vs. Ranks
![Page 18: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/18.jpg)
Price Distribution
• HITgroups with a large number of HITs tend to have a low price (e.g., $0.10)
• 90% of HITs pay less than $0.10
![Page 19: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/19.jpg)
Posting and Serving Process
• Cumulative distribution plots:
– For each day, the value of the tasks posted by AMT requesters and the value of the tasks completed that day
• Median posting/completion rate: $1,040 vs. $1,155 per day
• M/M/1 queueing assumption: a task worth $1 has an average completion time of 12.5 minutes
Plot x-axis: value of posted/completed HITs in USD ($)
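The 12.5-minute figure can be reproduced with the standard M/M/1 result that the mean time in the system is 1 / (service rate - arrival rate). The sketch below assumes each $1 of task value is treated as one "customer", with the median posting rate as the arrival rate and the median completion rate as the service rate; the function and variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

def mm1_mean_time_in_system(arrival_rate, service_rate):
    """Mean time a customer spends in an M/M/1 queue (waiting plus service)."""
    if service_rate <= arrival_rate:
        raise ValueError("queue is unstable: service rate must exceed arrival rate")
    return 1.0 / (service_rate - arrival_rate)

posting_rate = 1040.0     # dollars of tasks posted per day (median, from the slide)
completion_rate = 1155.0  # dollars of tasks completed per day (median, from the slide)

days = mm1_mean_time_in_system(posting_rate, completion_rate)
minutes = days * MINUTES_PER_DAY
print(f"average completion time for a $1 task: {minutes:.1f} minutes")
```

Because the two rates are close, the result is sensitive to small changes in either median, so it is best read as a rough estimate.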
![Page 20: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/20.jpg)
Posting and Serving Process
• Less posting over the weekends (requesters)
• Less work done on Monday due to less posting over the weekends
Plot panels: Posting, Completion
![Page 21: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/21.jpg)
Completion Time Distribution
Power law distribution with alpha = -1.48 (both plots are annotated at 10 days)
![Page 22: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/22.jpg)
Running Experiments with Amazon Mechanical-Turk
Gabriele Paolacci, Jesse Chandler, Panagiotis G. Ipeirotis
Judgment and Decision Making, Vol. 5, No. 5, August 2010
KSE 801: Human Computation and Crowdsourcing
![Page 23: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/23.jpg)
Practical Advantages of M-Turk
• Supportive infrastructure:
– Fast recruiting
– Convenient to run experiments
– External site could be used (e.g., validation code)
• Subject identifiability and prescreening:
– M-Turk workers can be required to earn “qualifications” (or answer prescreening questions) prior to completing a HIT
• Subject identifiability and longitudinal studies:
– Worker IDs can be used to explicitly re-contact former subjects, or code can be written that restricts the availability of a HIT to a predetermined list of workers
• Cultural diversity:
– Cross-cultural comparisons feasible (e.g., country, language, currency)
• Subject anonymity (not easy though)
– Ensuring worker’s anonymity (if external site is used)
– M-Turk studies can be exempted from review by IRBs (Institutional Review Boards) if anonymity is guaranteed
![Page 24: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/24.jpg)
Tradeoffs of Different Recruiting Methods
![Page 25: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/25.jpg)
A Comparative Study
• Tested various Judgment and Decision Making (JDM) findings
– Subject pools: M-Turk, a traditional subject pool at a large Midwestern US university, and visitors of online discussion boards
– Conducted from April to May 2010
• Survey:
– Asian disease problem
– Linda problem
– Physician problem
![Page 26: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/26.jpg)
Survey (Asian Disease Problem)
• Asian disease problem (demonstrates framing; Tversky and Kahneman, 1981)
• Subjects read one of two hypothetical scenarios
– Imagine that the United States is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows:
– Problem 1: If Program A is adopted, 200 people will be saved. If Program B is adopted, there is 1/3 probability that 600 people will be saved and 2/3 probability that no people will be saved. Which of the two programs would you favor?
– Problem 2: If Program A is adopted, 400 people will die. If Program B is adopted, there is 1/3 probability that nobody will die, and 2/3 probability that 600 people will die.
• The two scenarios are numerically identical, but the subjects responded very differently
• In the scenario framed in terms of gains, subjects were risk-averse (72% chose Program A); in the scenario framed in terms of losses, 78% of subjects preferred Program B (Tversky and Kahneman, 1981)
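The claim that the two framings are numerically identical can be checked by converting every outcome into "people saved" out of the 600 at risk and comparing expected values. The following is a small sketch; the helper name `expected_saved` and the list-of-pairs representation are illustrative choices.

```python
from fractions import Fraction

AT_RISK = 600  # people expected to die if nothing is done

def expected_saved(outcomes):
    """Expected number of people saved, given (probability, people_saved) pairs."""
    return sum(p * saved for p, saved in outcomes)

# Problem 1 (gain frame): outcomes are stated as people saved.
gain_a = [(Fraction(1), 200)]
gain_b = [(Fraction(1, 3), 600), (Fraction(2, 3), 0)]

# Problem 2 (loss frame): outcomes are stated as deaths, converted here to people saved.
loss_a = [(Fraction(1), AT_RISK - 400)]
loss_b = [(Fraction(1, 3), AT_RISK - 0), (Fraction(2, 3), AT_RISK - 600)]

for name, program in [("1A", gain_a), ("1B", gain_b), ("2A", loss_a), ("2B", loss_b)]:
    print(name, expected_saved(program))
```

All four programs have the same expected value; within each problem, Program A is the certain option and Program B the risky one, so only the wording differs between the two problems.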
![Page 27: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/27.jpg)
Survey (Linda Problem)
• Example: “Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.”
• Which is more probable?
– Linda is a bank teller
– Linda is a bank teller and is active in the feminist movement
• Linda problem (Tversky & Kahneman, 1983)
– Demonstrates the conjunction fallacy
– People often fail to regard a combination of events as less probable than a single event in the combination
– Probability of two events occurring together (in “conjunction”) is always less than or equal to the probability of either one occurring alone
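The conjunction rule follows directly from the definition of conditional probability. As a short derivation, let $A$ denote "Linda is a bank teller" and $B$ denote "Linda is active in the feminist movement" (the symbols are introduced here for illustration):

$$
P(A \cap B) = P(A)\,P(B \mid A) \le P(A), \qquad
P(A \cap B) = P(B)\,P(A \mid B) \le P(B),
$$

since every conditional probability is at most 1. Hence $P(A \cap B) \le \min\{P(A), P(B)\}$, so the second option can never be more probable than the first, however well the description seems to fit it.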
![Page 28: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/28.jpg)
Survey (Physician Problem)
• Physician problem demonstrates the outcome bias; the scenario is a surgeon deciding whether or not to do a risky surgery on a patient.
– The surgery had a known probability of success (e.g., 92%)
– Subjects were presented with either a good or bad outcome (in this case living or dying), and asked to rate the quality of the surgeon's pre-operation decision.
• Judgment of quality of a decision is often dependent on the valence of the outcome (Baron and Hershey, 1988)
• Subjects rated the quality of a physician’s decision to perform an operation on a patient (on a 7-point scale)
– 1: incorrect and inexcusable; 7: clearly correct, and the opposite decision would be inexcusable
– Those presented with bad outcomes rated the decision worse than those who had good outcomes.
![Page 29: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/29.jpg)
After Survey
• After the survey, subjects completed the Subjective Numeracy Scale (SNS, 2007), which yields an SNS score
– An eight-item self-report measure of perceived ability to perform various mathematical tasks and preference for the use of numerical vs. prose information
– Used as a parsimonious measurement of an individual’s quantitative abilities
• Additional “catch trial” question: to test whether subjects were attending to the questions (by requiring precise and obvious answers)
– E.g., “while watching the television, have you ever had a fatal heart attack?” (w/ six-point scale anchored on “Never” and “Often”)
![Page 30: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/30.jpg)
Configuration
• M-Turk:
– Pay: $0.10 (N=318 participated)
– Title: “Answer a short decision survey”
– Description: “Make some choices and judgments in this 5-minute survey”
– Estimated completion time is included to provide workers with a rough assessment of the reward/effort ratio (e.g., $1.71/hour)
• Lab subject pool:
– N=141 students from an introductory subject pool at a large university
• Internet discussion board:
– Posted a link to the survey to several online discussion boards that host online experiments in psychology
– Online for 2 weeks; and N=137 visitors took part in the survey
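The reward/effort ratio mentioned for the M-Turk configuration is simply the reward divided by the time spent, scaled to an hour. The sketch below uses hypothetical helper names. It also shows that the example rate of $1.71/hour at a $0.10 reward corresponds to roughly 3.5 minutes of work rather than the advertised 5 minutes; reading this as the actual average completion time is an assumption, not something the slide states.

```python
def hourly_rate(reward_usd, minutes):
    """Effective hourly wage for a task paying reward_usd that takes `minutes`."""
    return reward_usd * 60.0 / minutes

def implied_minutes(reward_usd, rate_usd_per_hour):
    """Completion time (in minutes) implied by a reward and an hourly rate."""
    return reward_usd * 60.0 / rate_usd_per_hour

reward = 0.10  # dollars, from the slide

advertised = hourly_rate(reward, 5)      # using the advertised 5-minute estimate
implied = implied_minutes(reward, 1.71)  # time implied by the slide's $1.71/hour example

print(f"rate at 5 minutes: ${advertised:.2f}/hour")
print(f"minutes implied by $1.71/hour: {implied:.1f}")
```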
![Page 31: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/31.jpg)
Subject Pools: Characteristics
• Subjects recruited from online discussion forums were significantly less likely to complete the survey than subjects on M-Turk (69.3% vs. 91.6%, χ² = 20.915, p < .001)
• The number of respondents who failed the catch trial was low and did not differ significantly across subject pools (χ²(2, 301) = 0.187, p = 0.91)
• Subjects in the three subject pools did not differ significantly in SNS score: F(2, 299) = 1.193, p = 0.30
![Page 32: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/32.jpg)
Results on Experimental Tasks
• M-Turk is a reliable source of experimental data in JDM
![Page 33: Introduction to Mechanized Labor Marketplaces: Mechanical Turk Uichin Lee KAIST KSE](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649e2c5503460f94b1c0d8/html5/thumbnails/33.jpg)
Labor Supply
• Economic theory predicts that increasing the price paid for labor will increase the supply of labor in most cases
• M-Turk experiment: after completing the demographic survey and the first task (transcription), subjects were randomly assigned to one of the four treatment groups and offered the chance to perform another transcription for p cents: 1, 5, 15, or 25
• Workers receiving high offers were more likely to accept
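The random assignment step of such an experiment could look like the sketch below. It is a generic illustration rather than the original study's code: the worker IDs, the `assign_offers` and `acceptance_rate_by_offer` helpers, and the seed are hypothetical. Only the four offer levels come from the slide.

```python
import random
from collections import defaultdict

OFFERS_CENTS = (1, 5, 15, 25)  # the four treatment groups from the slide

def assign_offers(worker_ids, seed=None):
    """Independently assign each worker to one of the four offer levels at random."""
    rng = random.Random(seed)
    return {worker: rng.choice(OFFERS_CENTS) for worker in worker_ids}

def acceptance_rate_by_offer(offers, accepted):
    """Share of workers in each treatment group who accepted the second task."""
    totals, yes = defaultdict(int), defaultdict(int)
    for worker, offer in offers.items():
        totals[offer] += 1
        yes[offer] += 1 if worker in accepted else 0
    return {offer: yes[offer] / totals[offer] for offer in totals}

# Hypothetical worker IDs; a real study would use the IDs of workers who finished task 1.
workers = [f"W{i:03d}" for i in range(200)]
offers = assign_offers(workers, seed=42)
print({offer: sum(1 for o in offers.values() if o == offer) for offer in OFFERS_CENTS})
```

Comparing the acceptance rates across the four groups is then what tests the labor supply prediction.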