Crowdsourcing the Assembly of Concept Hierarchies

Kai Eckert¹, Mathias Niepert¹, Christof Niemann¹

Joint Conference on Digital Libraries (JCDL), Brisbane, Australia, 2010

Cameron Buckner², Colin Allen², Heiner Stuckenschmidt¹

Presentation: Kai Eckert, Wednesday, June 23, 2010

¹ University of Mannheim, Germany ² Indiana University, USA

Motivation

● Various types of Concept Hierarchies:

● Thesauri

● Taxonomies

● Classifications

● Ontologies

● ...

● Manual creation is expensive.

● Automatic creation lacks quality.

Could the users do the work?

● Divide the work between a lot of users.

● Motivate them to be part of a community.

● Achieve quality control by means of redundancy.

● Can a concept hierarchy be created the way Wikipedia is, for example?

● The Indiana Philosophy Ontology Project.

● A browsable taxonomy of philosophical ideas.

● Ideas are extracted from the Stanford Encyclopedia of Philosophy (SEP).

● Intuitive access to the SEP via the InPhO taxonomy.

● Entry point for other philosophical resources on the web.

From the SEP to InPhO

Extraction of new ideas and relationships

Gathering community feedback about ideas and relationships

Process feedback and infer positions in the classification tree

Start with a hand-built formal ontology describing major topics and sub-topics.

Gathering community feedback


Relatedness

Relative Generality

is more specific than

Great stuff, but...

● What if you do not have a motivated community of expert users?

● Well, ...

● Like almost everything, you can buy it at Amazon...

● Amazon Mechanical Turk

Amazon Mechanical Turk (AMT)

● Platform for posting and completing Human Intelligence Tasks (HITs).

● 100,000 – 400,000 HITs available.

● Number of workers: unknown (100,000 in 100 countries as of 2007, according to the New York Times).

HIT Definition

Time allotted per assignment: Maximum time a worker can work on a single task.

Worker restrictions: Approval Rate, Location

Reward per assignment: How much do you pay for each HIT?

Number of assignments per HIT: How many unique workers do you want to work on each HIT?

HIT Result

Answer of each worker for each HIT

Accept Time, Submit Time, Work Time In Seconds

Worker ID
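The HIT definition and HIT result fields listed above can be pictured as two small records. This is only a sketch: the class names, field names, and example values are hypothetical and do not mirror the actual AMT interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HitDefinition:
    """Settings a requester chooses per HIT (field names are illustrative)."""
    time_allotted_s: int                 # maximum time a worker may spend on one assignment
    min_approval_rate: float             # worker restriction: approval rate in percent
    allowed_locations: tuple[str, ...]   # worker restriction: location; empty means anywhere
    reward_usd: float                    # reward per assignment
    assignments_per_hit: int             # number of unique workers per HIT


@dataclass(frozen=True)
class AssignmentResult:
    """What comes back for one worker on one HIT (field names are illustrative)."""
    worker_id: str
    accept_time: str         # timestamp when the worker accepted the HIT
    submit_time: str         # timestamp when the worker submitted the answers
    work_time_s: int         # work time in seconds
    answers: dict[str, str]  # answer per question in the HIT


# Hypothetical values, apart from the 5 workers per HIT mentioned in the slides.
example_hit = HitDefinition(
    time_allotted_s=600,
    min_approval_rate=95.0,
    allowed_locations=(),
    reward_usd=0.10,
    assignments_per_hit=5,
)
example_result = AssignmentResult(
    worker_id="W1",
    accept_time="2010-01-01T10:00:00",
    submit_time="2010-01-01T10:04:00",
    work_time_s=240,
    answers={"pair-1": "3"},
)
```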

Our questions

Can we replace the InPhO community by means of Amazon Mechanical Turk?

How much does it cost and what is the resulting quality?

Experimental Setup

Minimum overlap i    1      2      3    4    5
Number of pairs      3,237  1,154  370  187  92

● We wanted some overlap among the experts, so we chose the 1,154 pairs with a minimum overlap of 2.

● Each pair was evaluated by 5 different workers.

● Each worker evaluated at least 12 pairs (1 HIT).

● 87 distinct workers.

● The HITs were completed in 20 hours.

Measuring Agreement

● Calculation of the distance between two answers:

● Relatedness: Absolute value of the difference

● Relative Generality: Match: 0, otherwise: 1

● The evaluation deviation is the mean distance of a user to the users in a reference group.
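The two distance measures and the evaluation deviation described above can be sketched as follows. The data layout (dictionaries keyed by pair) and the decision to average over every pair a user shares with any reference user are assumptions; the slides do not say exactly how the mean is taken.

```python
from statistics import mean


def relatedness_distance(a, b):
    """Relatedness: absolute value of the difference between two ratings."""
    return abs(a - b)


def generality_distance(a, b):
    """Relative generality: 0 if the two answers match, otherwise 1."""
    return 0 if a == b else 1


def evaluation_deviation(user_answers, reference_answers, distance):
    """Mean distance of one user to the users in a reference group.

    user_answers:      {pair: answer}
    reference_answers: {reference_user: {pair: answer}}
    Only pairs answered by both the user and a reference user contribute.
    Returns None if there is no shared pair at all.
    """
    distances = [
        distance(answer, ref[pair])
        for ref in reference_answers.values()
        for pair, answer in user_answers.items()
        if pair in ref
    ]
    return mean(distances) if distances else None


# Hypothetical ratings, for illustration only.
experts = {"e1": {"p1": 3, "p2": 1}, "e2": {"p1": 2}}
worker = {"p1": 4, "p2": 1}
worker_deviation = evaluation_deviation(worker, experts, relatedness_distance)
```

Passing `generality_distance` instead of `relatedness_distance` gives the deviation for the relative generality question with the same function.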

Comparison with Experts (Relative Generality)

[Figure: distribution of users over their deviation from the experts. x-axis: deviation, 0.1 to 1.0, ranging from "Follow Experts" to "Own Opinion"; y-axis: fraction of users in %. Series: InPhO Users, AMT Users. The deviation of a random clicker is marked.]

InPhO Users are quite consistent.

AMT Users are not consistent. → Are there good ones?

Yes, there are! → But which ones?

Mixed Results...

Can we just use the good ones?

Telling the good from the bad

● First approach: Filtering by working time

● Hypothesis 1: Workers who take some time to think before they answer give better answers.

● Hypothesis 2: Some workers probably give quick, random responses.

Filtering by working time

[Figure: number of users and their deviation from the experts, filtered by working time. x-axis: average working time for one HIT (12 pairs), thresholds from >80s to >800s; y-axes: number of users (0 to 100) and deviation from experts (0 to 1.5).]

Number of users, by increasing threshold: 84, 75, 68, 57, 44, 36, 29, 22, 17, 13, 9, 9, 8, 7, 5, 4, 4, 3

Deviation from experts, by increasing threshold: 1.38, 1.41, 1.37, 1.36, 1.39, 1.48, 1.42, 1.47, 1.27, 1.10, 1.06, 1.35, 1.31, 1.21, 0.64
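A minimal sketch of the working-time filter: keep only the workers whose average time per HIT exceeds a threshold, then look at the mean deviation of those who remain. The function names, the data layout, and all numbers in the example are hypothetical.

```python
from statistics import mean


def filter_by_working_time(work_times, min_avg_seconds):
    """Return the workers whose average working time per HIT exceeds the threshold.

    work_times: {worker_id: [seconds spent on each HIT, ...]}
    """
    return {
        worker
        for worker, times in work_times.items()
        if times and mean(times) > min_avg_seconds
    }


def mean_deviation(deviations, workers):
    """Mean deviation from the experts over the given workers, or None if none remain.

    deviations: {worker_id: evaluation deviation of that worker}
    """
    kept = [deviations[w] for w in workers if w in deviations]
    return mean(kept) if kept else None


# Hypothetical working times (seconds per HIT) and deviations.
work_times = {"A": [60, 90], "B": [300, 340], "C": [820]}
deviations = {"A": 1.5, "B": 1.0, "C": 0.5}

slow_enough = filter_by_working_time(work_times, 80)
remaining_deviation = mean_deviation(deviations, slow_enough)
```

Raising the threshold shrinks the set of workers quickly, so the mean deviation at high thresholds rests on very few people.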

Telling the good from the bad

● Second approach: Filtering by comparison with a hidden gold standard.

● Test pairs:

● Social Epistemology – Epistemology (P1)

● Computer Ethics – Ethics (P2)

● Chinese Room Argument – Chinese Philosophy (P3)

● Dualism – Philosophy of Mind (P4)

Applying filters

● Test pairs:

● Social Epistemology – Epistemology (P1)

● Computer Ethics – Ethics (P2)

● Chinese Room Argument – Chinese Philosophy (P3)

● Dualism – Philosophy of Mind (P4)

● Filters:

1) P1 and P2 are correct (Common Sense)

2) Like 1), additionally P4 is correct (+Background)

3) Like 1), additionally P3 is correct (+Lexical)

4) All have to be correct (All)
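The four filters above can be expressed as sets of test pairs that a worker must have answered correctly. This sketch takes the set of correctly answered test pairs as input; how "correct" is decided for each pair is not spelled out in the slides and is left to the caller. The names `FILTERS` and `passed_filters` are illustrative.

```python
# Test pairs each filter requires, following the numbered list above.
FILTERS = {
    "Common Sense (1)": {"P1", "P2"},
    "+Background (2)": {"P1", "P2", "P4"},
    "+Lexical (3)": {"P1", "P2", "P3"},
    "All (4)": {"P1", "P2", "P3", "P4"},
}


def passed_filters(correct_pairs):
    """Return the names of all filters a worker passes.

    correct_pairs: iterable of test pair labels ("P1" to "P4") that the
    worker answered correctly.
    """
    correct = set(correct_pairs)
    return [name for name, required in FILTERS.items() if required <= correct]


# A worker who got P1, P2, and P4 right passes filters 1 and 2 only.
example = passed_filters({"P1", "P2", "P4"})
```

Because filters 2 and 3 each extend filter 1, and filter 4 requires everything, a worker who passes "All (4)" passes every other filter as well.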

Filter results for relatedness

Filter             Users   Deviation   Max. Dev.
All (4)            7       0.60        1.00
+Lexical (3)       10      0.87        1.78
+Background (2)    23      0.84        1.41
Common Sense (1)   40      1.11        1.96
All AMT            87      1.39        2.96
All InPhO          25      0.77        1.75
Random             ---     1.8         ---

Filter results for relative generality

Filter             Users     Deviation   Max. Dev.
All (4)            7 (5)     0.12        0.22
+Lexical (3)       10 (8)    0.14        0.27
+Background (2)    23 (20)   0.15        0.45
Common Sense (1)   40 (35)   0.21        0.59
All AMT            87 (78)   0.45        1.00
All InPhO          25        0.23        0.47
Random             ---       0.75        ---

Financial considerations

Filter             Pairs   Evaluations   Cost per Pair   Cost per Evaluation
---                1,138   5,690         US$ 0.111       US$ 0.022
Common Sense (1)   1,074   1,909         US$ 0.117       US$ 0.066
+Background (2)    1,018   1,558         US$ 0.124       US$ 0.081
+Lexical (3)       215     215           US$ 0.586       US$ 0.586
All (4)            183     183           US$ 0.689       US$ 0.689

● Overall payments: US$ 126

● Estimate for all pairs with filter "All (4)": US$ 784

● Estimate for all pairs with redundancy (5x): US$ 3,920
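The cost figures above can be reproduced with simple arithmetic. This sketch assumes that each cost is the overall payment of US$ 126 divided by the pairs or evaluations that survive a filter, and that the estimates scale the rounded "All (4)" cost per pair to all 1,138 pairs. Those assumptions match the reported numbers, but the slides do not state the calculation explicitly.

```python
TOTAL_PAYMENTS_USD = 126
ALL_PAIRS = 1138

# (pairs, evaluations) remaining after each filter, taken from the table above.
REMAINING = {
    "---": (1138, 5690),
    "Common Sense (1)": (1074, 1909),
    "+Background (2)": (1018, 1558),
    "+Lexical (3)": (215, 215),
    "All (4)": (183, 183),
}


def cost_per_pair(filter_name):
    """Overall payments divided by the pairs left after the filter, in US$."""
    pairs, _ = REMAINING[filter_name]
    return round(TOTAL_PAYMENTS_USD / pairs, 3)


def cost_per_evaluation(filter_name):
    """Overall payments divided by the evaluations left after the filter, in US$."""
    _, evaluations = REMAINING[filter_name]
    return round(TOTAL_PAYMENTS_USD / evaluations, 3)


def coverage_percent(filter_name):
    """Share of all pairs that still have at least one evaluation after the filter."""
    pairs, _ = REMAINING[filter_name]
    return round(100 * pairs / ALL_PAIRS)


# Estimated cost of covering every pair with workers who pass "All (4)",
# once and with five evaluations per pair.
estimate_all = round(cost_per_pair("All (4)") * ALL_PAIRS)
estimate_all_redundant = 5 * estimate_all
```

Under these assumptions the stricter filters are expensive mainly because most of the paid evaluations are discarded, not because the individual HITs cost more.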

Conclusion

AMT answers are of varying quality, but this is true of many communities, too.

With moderate filtering ("+Background"), we achieved a quality comparable to that of the InPhO community.

With 5 evaluations per pair, we still covered 89% of all pairs with this filter.

The resulting InPhO taxonomy is online: http://inpho.cogs.indiana.edu/amt_taxonomy

No need for existing data, gold standards, or training data (besides the filter pairs).

No need for a community?


"Computer ethics doesn't exist. Blue is black and red is blood on the internet. Nobody cares, because they are lonely."

Anonymous Mechanical Turk Worker

Thank you

Questions?

Kai Eckert

[email protected]

http://www.slideshare.net/kaiec

Photo Credits

● Michal Zacharzewski (Title Crowd), http://www.sxc.hu/profile/mzacha

● Peter Suneson (Crowd silhouette), http://www.sxc.hu/profile/CMSeter

● Alaa Hamed (Egyptian Coins), http://www.sxc.hu/profile/alaasafei

● Piotr Lewandowski (Money), http://www.sxc.hu/profile/LeWy2005

● Asif Akbar (Clock), http://www.sxc.hu/profile/asifthebes

● Zern Liew (Traffic Cone), http://www.sxc.hu/profile/eidesign

● Peter Gustafson (Counting Fingers), http://www.sxc.hu/profile/liaj

● Kostya Kisleyko (Yes No), http://www.sxc.hu/profile/dlnny

● Sergio Roberto Bichara (Barcode), http://www.sxc.hu/profile/srbichara

● Maggie Molloy (Icons), http://www.sxc.hu/profile/agthabrown

● Sanja Gjenero (World with Crowd), http://www.sxc.hu/profile/lusi

● Wikimedia Commons (The Turk), http://en.wikipedia.org/wiki/File:Kempelen_chess1.jpg

