Crowdsourcing the Assembly of Concept Hierarchies
DESCRIPTION
How to create a taxonomy using a paid workforce from Amazon Mechanical Turk, with an evaluative comparison to an existing community of motivated students and domain experts. Presentation held at JCDL 2010, Brisbane, Australia (http://www.jcdl2010.org).
TRANSCRIPT
![Page 1: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/1.jpg)
Crowdsourcing the Assembly of Concept Hierarchies
Kai Eckert¹, Mathias Niepert¹, Christof Niemann¹, Cameron Buckner², Colin Allen², Heiner Stuckenschmidt¹
Joint Conference on Digital Libraries (JCDL), Brisbane, Australia, 2010
Presentation: Kai Eckert, Wednesday, June 23, 2010
¹ University of Mannheim, Germany ² Indiana University, USA
![Page 2: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/2.jpg)
Motivation
● Various types of Concept Hierarchies:
● Thesauri
● Taxonomies
● Classifications
● Ontologies
● ...
● Manual creation is expensive.
● Automatic creation lacks quality.
![Page 3: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/3.jpg)
Could the users do the work?
● Divide the work between a lot of users.
● Motivate them to be part of a community.
● Achieve quality control by means of redundancy.
● Can a concept hierarchy be created the way Wikipedia is, for example?
![Page 4: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/4.jpg)
● The Indiana Philosophy Ontology Project.
● A browsable taxonomy of philosophical ideas.
● Ideas are extracted from the Stanford Encyclopedia of Philosophy (SEP).
● Intuitive access to the SEP via the InPhO taxonomy.
● Entry point for other philosophical resources on the web.
![Page 5: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/5.jpg)
From the SEP to InPhO
Extraction of new ideas and relationships
Gathering community feedback about ideas and relationships
Process feedback and infer positions in the classification tree
Start with a hand-built formal ontology describing major topics and sub-topics.
![Page 6: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/6.jpg)
Gathering community feedback
![Page 7: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/7.jpg)
Gathering community feedback
Relatedness
![Page 8: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/8.jpg)
Gathering community feedback
Relatedness
Relative Generality
is more specific than
![Page 9: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/9.jpg)
![Page 10: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/10.jpg)
Great stuff, but...
● What if you do not have a motivated community of expert users?
● Well, ...
● Like almost everything, you can buy it at Amazon...
● Amazon Mechanical Turk
![Page 11: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/11.jpg)
Amazon Mechanical Turk (AMT)
● Platform for posting and taking on Human Intelligence Tasks (HITs).
● 100,000 – 400,000 HITs available.
● Number of workers: ??? (100,000 in 100 countries, 2007, New York Times).
![Page 12: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/12.jpg)
HIT Definition
Time allotted per assignment: Maximum time a worker can work on a single task.
Worker restrictions: Approval Rate, Location
Reward per assignment: How much do you pay for each HIT?
Number of assignments per HIT: How many unique workers do you want to work on each HIT?
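The four settings above can be pictured as one small record. The following is only a sketch: the class and field names are hypothetical, it does not use any Amazon API, and every value except the five assignments per HIT (used later in the talk) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitDefinition:
    """Hypothetical container for the settings named on the slide."""
    time_allotted_seconds: int      # maximum time a worker may spend on one assignment
    min_approval_rate_percent: int  # worker restriction: approval rate
    allowed_locations: tuple        # worker restriction: location
    reward_usd: float               # reward per assignment
    assignments_per_hit: int        # number of unique workers per HIT

    def cost_per_hit(self) -> float:
        """Reward paid if every assignment of one HIT is completed (platform fees not included)."""
        return self.reward_usd * self.assignments_per_hit


example = HitDefinition(
    time_allotted_seconds=3600,      # illustrative value, not from the talk
    min_approval_rate_percent=95,    # illustrative value, not from the talk
    allowed_locations=("US",),       # illustrative value, not from the talk
    reward_usd=0.25,                 # illustrative value, not from the talk
    assignments_per_hit=5,           # the talk uses 5 workers per pair
)
print(example.cost_per_hit())
```

The reward and the number of assignments multiply, so redundancy directly scales the price of each HIT.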
![Page 13: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/13.jpg)
HIT Result
Answer of each worker for each HIT
Accept Time, Submit Time, Work Time In Seconds
Worker ID
![Page 14: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/14.jpg)
Our questions
Can we replace the InPhO community by means of Amazon Mechanical Turk?
How much does it cost and what is the resulting quality?
![Page 15: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/15.jpg)
Experimental Setup

| Minimum overlap i | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Number of pairs | 3,237 | 1,154 | 370 | 187 | 92 |

● We wanted some overlap among the experts, so we chose the 1,154 pairs (minimum overlap of 2).
● Each pair was evaluated by 5 different workers.
● Each worker evaluated at least 12 pairs (1 HIT).
● 87 distinct workers.
● The HITs were completed in 20 hours.
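One way to read the "minimum overlap" row is as the number of pairs that were evaluated by at least i distinct experts. The sketch below assumes that reading. The function name and the toy records are hypothetical and are not data from the study.

```python
def pairs_with_min_overlap(evaluations, max_i=5):
    """Count pairs evaluated by at least i distinct experts, for i = 1..max_i.

    evaluations: iterable of (expert_id, pair) records.
    """
    experts_per_pair = {}
    for expert, pair in evaluations:
        experts_per_pair.setdefault(pair, set()).add(expert)
    return {
        i: sum(1 for experts in experts_per_pair.values() if len(experts) >= i)
        for i in range(1, max_i + 1)
    }


# Toy records (made up): three experts, three concept pairs.
toy = [
    ("e1", ("A", "B")), ("e2", ("A", "B")), ("e3", ("A", "B")),
    ("e1", ("A", "C")),
    ("e2", ("B", "C")), ("e3", ("B", "C")),
]
counts = pairs_with_min_overlap(toy, max_i=3)
print(counts)
```

Raising i shrinks the set of usable pairs, which is the trade-off behind settling on a minimum overlap of 2.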
![Page 16: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/16.jpg)
Measuring Agreement
● Calculation of the distance between two answers:
● Relatedness: Absolute value of the difference
● Relative Generality: Match: 0, otherwise: 1
● The evaluation deviation is the mean distance of a user to the users in a reference group.
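The two distances and the evaluation deviation can be sketched as follows. This assumes that the mean is taken over every comparison in which the user and a reference user rated the same pair. The slide does not say how the averaging is grouped, and the function names, the answer labels and the toy ratings are illustrative.

```python
from statistics import mean


def relatedness_distance(a: int, b: int) -> int:
    """Relatedness: absolute value of the difference between two ratings."""
    return abs(a - b)


def generality_distance(a: str, b: str) -> int:
    """Relative generality: 0 if the answers match, otherwise 1."""
    return 0 if a == b else 1


def evaluation_deviation(user_answers, reference_answers, distance):
    """Mean distance of one user to the users in a reference group.

    user_answers: dict mapping pair -> answer.
    reference_answers: list of such dicts, one per reference user.
    Returns None if the user shares no pair with the reference group.
    """
    distances = [
        distance(answer, ref[pair])
        for pair, answer in user_answers.items()
        for ref in reference_answers
        if pair in ref
    ]
    return mean(distances) if distances else None


# Toy data (made up): one worker compared with two experts.
worker = {"p1": 4, "p2": 1}
experts = [{"p1": 3, "p2": 1}, {"p1": 4}]
rel_dev = evaluation_deviation(worker, experts, relatedness_distance)

worker_gen = {"p1": "more specific", "p2": "more general"}
experts_gen = [{"p1": "more specific", "p2": "more specific"}]
gen_dev = evaluation_deviation(worker_gen, experts_gen, generality_distance)
```

Because the generality distance is 0 or 1, that deviation always lies between 0 and 1, while the relatedness deviation depends on the width of the rating scale.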
![Page 17: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/17.jpg)
Comparison with Experts (Relative Generality)
[Figure: histogram of the fraction of users in % (y-axis, up to 30) over a scale from 0 to 1.0 (x-axis) running from "Follow Experts" to "Own Opinion"; series: InPhO Users, AMT Users.]
![Page 18: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/18.jpg)
Comparison with Experts (Relative Generality)
[Figure: histogram of the fraction of users in % (y-axis, up to 30) over a scale from 0 to 1.0 (x-axis) running from "Follow Experts" to "Own Opinion"; series: InPhO Users, AMT Users; annotation: "Random Clicker".]
![Page 19: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/19.jpg)
![Page 20: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/20.jpg)
Comparison with Experts (Relative Generality)
[Figure: histogram of the fraction of users in % (y-axis, up to 30) over a scale from 0 to 1.0 (x-axis) running from "Follow Experts" to "Own Opinion"; series: InPhO Users, AMT Users.]
InPhO Users are quite consistent.
![Page 21: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/21.jpg)
Comparison with Experts (Relative Generality)
[Figure: histogram of the fraction of users in % (y-axis, up to 30) over a scale from 0 to 1.0 (x-axis) running from "Follow Experts" to "Own Opinion"; series: InPhO Users, AMT Users.]
InPhO Users are quite consistent.
AMT Users are not consistent. → Are there good ones?
![Page 22: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/22.jpg)
Comparison with Experts (Relative Generality)
[Figure: histogram of the fraction of users in % (y-axis, up to 30) over a scale from 0 to 1.0 (x-axis) running from "Follow Experts" to "Own Opinion"; series: InPhO Users, AMT Users.]
InPhO Users are quite consistent.
AMT Users are not consistent. → Are there good ones?
Yes, there are! → But which ones?
![Page 23: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/23.jpg)
![Page 24: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/24.jpg)
Mixed Results...
Can we just use the good ones?
![Page 25: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/25.jpg)
Telling the good from the bad
● First approach: Filtering by working time
● Hypothesis 1: Workers who think for some time before they answer give better answers.
● Hypothesis 2: There are probably workers who give quick, random responses.
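Filtering by working time can be sketched as a sweep over thresholds: keep only the workers whose average time per HIT exceeds the threshold, then look at how many remain and how far they deviate from the experts. The function name and the toy numbers are hypothetical, and taking the mean of per-worker deviations is an assumption about how the deviation is aggregated.

```python
from statistics import mean


def filter_by_working_time(workers, thresholds):
    """For each threshold, report (threshold, workers kept, mean deviation of kept workers).

    workers: list of (average_seconds_per_hit, deviation_from_experts).
    The mean deviation is None when no worker exceeds the threshold.
    """
    rows = []
    for threshold in thresholds:
        kept = [dev for seconds, dev in workers if seconds > threshold]
        rows.append((threshold, len(kept), mean(kept) if kept else None))
    return rows


# Toy workers (made up): (average seconds per HIT, deviation from experts).
toy_workers = [(60, 1.5), (150, 1.0), (300, 1.5), (900, 0.5)]
rows = filter_by_working_time(toy_workers, [80, 200, 800])
for threshold, kept, deviation in rows:
    print(f">{threshold}s: {kept} workers, mean deviation {deviation}")
```

If Hypothesis 1 held strongly, the mean deviation would fall steadily as the threshold rises; a flat or noisy curve suggests that working time alone is a weak filter.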
![Page 26: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/26.jpg)
Filtering by working time
[Figure: bar chart of the number of users (y-axis, 0 to 100) by average working time for one HIT (12 pairs) above a threshold (x-axis, >80s to >800s). Bar labels in order: 84, 75, 68, 57, 44, 36, 29, 22, 17, 13, 9, 9, 8, 7, 5, 4, 4, 3.]
![Page 27: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/27.jpg)
Filtering by working time
[Figure: bar chart of the number of users (right y-axis, 0 to 100) with a line for the deviation from experts (left y-axis, 0 to 1.5), both by average working time for one HIT (12 pairs) above a threshold (x-axis, >80s to >800s). Number of users in order: 84, 75, 68, 57, 44, 36, 29, 22, 17, 13, 9, 9, 8, 7, 5, 4, 4, 3. Deviation in order: 1.38, 1.41, 1.37, 1.36, 1.39, 1.48, 1.42, 1.47, 1.27, 1.10, 1.06, 1.35, 1.31, 1.21, 0.64.]
![Page 28: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/28.jpg)
Telling the good from the bad
● Second approach: Filtering by comparison with a hidden gold standard.
● Test pairs:
● Social Epistemology – Epistemology (P1)
● Computer Ethics – Ethics (P2)
● Chinese Room Argument – Chinese Philosophy (P3)
● Dualism – Philosophy of Mind (P4)
![Page 29: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/29.jpg)
Applying filters
● Test pairs:
● Social Epistemology – Epistemology (P1)
● Computer Ethics – Ethics (P2)
● Chinese Room Argument – Chinese Philosophy (P3)
● Dualism – Philosophy of Mind (P4)
● Filters:
1) P1 and P2 are correct (Common Sense)
2) Like 1), additionally P4 is correct (+Background)
3) Like 1), additionally P3 is correct (+Lexical)
4) All have to be correct (All)
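The four filters are subsets of the test pairs, so checking a worker reduces to a subset test. In this sketch the expected answer for each test pair is deliberately left out, since the slides do not state it; the input is simply the set of test pairs a worker answered correctly. The function name is hypothetical.

```python
# Test pairs each filter requires, as listed on the slide.
FILTERS = {
    "Common Sense (1)": {"P1", "P2"},
    "+Background (2)": {"P1", "P2", "P4"},
    "+Lexical (3)": {"P1", "P2", "P3"},
    "All (4)": {"P1", "P2", "P3", "P4"},
}


def passed_filters(correct_pairs):
    """Return the names of all filters whose required test pairs were answered correctly."""
    correct = set(correct_pairs)
    return [name for name, required in FILTERS.items() if required <= correct]


# A worker who got P1, P2 and P4 right but fell for the lexical trap in P3.
print(passed_filters({"P1", "P2", "P4"}))
```

Filters 2 and 3 are not nested in each other: both extend filter 1, and only filter 4 requires all four pairs.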
![Page 30: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/30.jpg)
Filter results for relatedness
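
| Filter | Users | Deviation | Max. deviation |
|---|---|---|---|
| All (4) | 7 | 0.60 | 1.00 |
| +Lexical (3) | 10 | 0.87 | 1.78 |
| +Background (2) | 23 | 0.84 | 1.41 |
| Common Sense (1) | 40 | 1.11 | 1.96 |
| All AMT | 87 | 1.39 | 2.96 |
| All InPhO | 25 | 0.77 | 1.75 |
| Random | --- | 1.8 | --- |
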
![Page 31: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/31.jpg)
Filter results for relative generality

| Filter | Users | Deviation | Max. deviation |
|---|---|---|---|
| All (4) | 7 (5) | 0.12 | 0.22 |
| +Lexical (3) | 10 (8) | 0.14 | 0.27 |
| +Background (2) | 23 (20) | 0.15 | 0.45 |
| Common Sense (1) | 40 (35) | 0.21 | 0.59 |
| All AMT | 87 (78) | 0.45 | 1.00 |
| All InPhO | 25 | 0.23 | 0.47 |
| Random | --- | 0.75 | --- |

![Page 32: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/32.jpg)
Financial considerations

| Filter | Pairs | Evaluations | Cost per pair | Cost per evaluation |
|---|---|---|---|---|
| --- | 1,138 | 5,690 | US$ 0.111 | US$ 0.022 |
| Common Sense (1) | 1,074 | 1,909 | US$ 0.117 | US$ 0.066 |
| +Background (2) | 1,018 | 1,558 | US$ 0.124 | US$ 0.081 |
| +Lexical (3) | 215 | 215 | US$ 0.586 | US$ 0.586 |
| All (4) | 183 | 183 | US$ 0.689 | US$ 0.689 |

● Overall payments: 126 US$
● Estimate for all pairs with filter "All (4)": 784 US$
● Estimate for all pairs with redundancy (5x): 3,920 US$
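The figures above are consistent with dividing the overall payment by the number of pairs or evaluations that survive each filter. That reading is inferred from the numbers rather than stated on the slide, and the variable names below are illustrative. A short check of the arithmetic:

```python
TOTAL_PAYMENT_USD = 126

# (pairs covered, evaluations retained) per filter, copied from the table.
TABLE = {
    "---": (1138, 5690),
    "Common Sense (1)": (1074, 1909),
    "+Background (2)": (1018, 1558),
    "+Lexical (3)": (215, 215),
    "All (4)": (183, 183),
}


def unit_costs(pairs, evaluations, total=TOTAL_PAYMENT_USD):
    """Return (cost per pair, cost per evaluation), rounded to three decimals."""
    return round(total / pairs, 3), round(total / evaluations, 3)


costs = {name: unit_costs(p, e) for name, (p, e) in TABLE.items()}

all_pairs = TABLE["---"][0]
# Extrapolate the strictest filter's cost per pair to all pairs, then add 5x redundancy.
estimate_all_filter = round(costs["All (4)"][0] * all_pairs)
estimate_with_redundancy = 5 * estimate_all_filter
# Share of pairs still covered under the "+Background (2)" filter, in percent.
background_coverage_percent = round(100 * TABLE["+Background (2)"][0] / all_pairs)

for name, (per_pair, per_evaluation) in costs.items():
    print(f"{name}: US$ {per_pair} per pair, US$ {per_evaluation} per evaluation")
```

Stricter filters discard more paid answers, so the same 126 US$ is spread over fewer usable evaluations and the unit cost rises.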
![Page 33: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/33.jpg)
Conclusion
AMT answers are of varying quality. But this is true for many communities, too.
With moderate filtering ("+Background"), we achieved a quality comparable to the InPhO community.
With 5 evaluations per pair, we still covered 89% of all pairs with this filter.
The resulting InPhO taxonomy is online: http://inpho.cogs.indiana.edu/amt_taxonomy
No need for existing data, gold standards or training data (besides the filter pairs).
No need for a community?
![Page 34: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/34.jpg)
"Computer ethics doesn't exist. Blue is black and red is blood on the internet. Nobody cares, because they are lonely."
Anonymous Mechanical Turk Worker
Thank you
Questions?
Kai Eckert
[email protected]
http://www.slideshare.net/kaiec
![Page 35: Crowdsourcing the Assembly of Concept Hierarchies](https://reader033.vdocuments.us/reader033/viewer/2022051514/5495fb9aac7959222e8b4fb6/html5/thumbnails/35.jpg)
Photo Credits
● Michal Zacharzewski (Title Crowd), http://www.sxc.hu/profile/mzacha
● Peter Suneson (Crowd silhouette), http://www.sxc.hu/profile/CMSeter
● Alaa Hamed (Egyptian Coins), http://www.sxc.hu/profile/alaasafei
● Piotr Lewandowski (Money), http://www.sxc.hu/profile/LeWy2005
● Asif Akbar (Clock), http://www.sxc.hu/profile/asifthebes
● Zern Liew (Traffic Cone), http://www.sxc.hu/profile/eidesign
● Peter Gustafson (Counting Fingers), http://www.sxc.hu/profile/liaj
● Kostya Kisleyko (Yes No), http://www.sxc.hu/profile/dlnny
● Sergio Roberto Bichara (Barcode), http://www.sxc.hu/profile/srbichara
● Maggie Molloy (Icons), http://www.sxc.hu/profile/agthabrown
● Sanja Gjenero (World with Crowd), http://www.sxc.hu/profile/lusi
● Wikimedia Commons (The Turk), http://en.wikipedia.org/wiki/File:Kempelen_chess1.jpg