Crowdsourcing Satellite Imagery (talk at GIScience 2012)

Posted on 24-Jan-2015


DESCRIPTION

Talk given at the GIScience 2012 conference (http://www.giscience.org/). More info about this work on my blog: http://goo.gl/giouF

TRANSCRIPT

Crowdsourcing satellite imagery: study of iterative vs. parallel models

Nicolas Maisonneuve, Bastien Chopard

Twitter: nmaisonneuve


Damage assessment after a humanitarian crisis


Port-au-Prince: 300K buildings assessed in 3 months by 8 UNOSAT experts



Organizational challenge: how to organize non-trained volunteers, especially to enforce quality?

Investigated scope:

• Qualitative + quantitative study of 2 collaborative models inspired by computer science: iterative vs. parallel information processing

• Controlled experiment to isolate quality = F(organization), removing other parameters, e.g. training and task difficulty

• This research != studying real-world collaborative practices; rather, it studies extreme/symbolic cases to guide collaborative system designers


Tested collaborative models (1/2): the iterative model

e.g. Wikipedia, OpenStreetMap, assembly lines

Tested collaborative models (2/2): the parallel model (independent contributions, then aggregation)

e.g. voting systems in society, distributed computing

An older incarnation of the parallel model (17th to mid-20th century): when "computers" were humans, often women (Mathematical Tables Project, 1938-1948)


Qualitative comparison: iterative vs. parallel

• Problem divisibility: no need to divide a complex problem (iterative) vs. the complex problem must be divided into easier pieces (parallel)

• Optimization tradeoff: copying emphasizes exploitation (iterative) vs. isolation emphasizes exploration (parallel)

• Quality mechanism: sequential improvement (iterative) vs. redundancy + diversity of opinions (parallel)

• Side effects: path-dependency effect + sensitivity to vandalism (iterative) vs. useless redundancy for obvious decisions + the problem of aggregation (parallel)

Controlled Experiment: web platform

Interface and instructions for the parallel model

on 3 maps with different topologies (annotated by 1 UNITAR expert)


Participants used for the experiments: Mechanical Turk as a simulator

Data Quality Metrics

Quality of the collective output:

• Type I errors = p(wrong annotation)

• Type II errors = p(missing a building)

• Consistency

Analogy with the information retrieval field:

• Precision = p(an annotation is a building)

• Recall = p(a building is annotated)

• F-measure = score mixing recall + precision

• (metrics adjusted with a tolerance distance)
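To make these metrics concrete, here is a minimal sketch of how precision, recall, and F-measure could be computed for point annotations against the gold standard, with a tolerance distance. This is not the implementation used in the study; the greedy nearest-match strategy, the coordinate format, and the default `tolerance` value are assumptions for illustration.

```python
from math import hypot

def match_annotations(annotations, gold, tolerance=5.0):
    """Count annotation points matching a gold-standard building
    within the tolerance distance (greedy nearest-match, an assumption)."""
    unmatched_gold = list(gold)
    true_positives = 0
    for (ax, ay) in annotations:
        # closest still-unmatched gold point, if any
        best = min(unmatched_gold,
                   key=lambda g: hypot(ax - g[0], ay - g[1]),
                   default=None)
        if best is not None and hypot(ax - best[0], ay - best[1]) <= tolerance:
            true_positives += 1
            unmatched_gold.remove(best)
    return true_positives

def quality(annotations, gold, tolerance=5.0):
    """Precision, recall and F-measure of a set of point annotations."""
    tp = match_annotations(annotations, gold, tolerance)
    precision = tp / len(annotations) if annotations else 0.0  # p(an annotation is a building)
    recall = tp / len(gold) if gold else 0.0                   # p(a building is annotated)
    f_measure = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_measure
```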

Methodology for the parallel model

Step 1: collecting independent contributions: N = (121, 120, 113) for (map 1, map 2, map 3)

Step 2: for each map, generating the sets of groups of m participants, for every m from 1 to N (m = 1, m = 2, m = 3, ...)
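A minimal sketch of this group-generation step, assuming groups are drawn as random samples of m distinct participants; sampling a fixed number of groups per size rather than enumerating all combinations is an assumption, and `n_groups` and `seed` are illustrative parameters.

```python
import random

def generate_groups(participant_ids, m, n_groups=100, seed=0):
    """Draw n_groups random groups of m distinct participants."""
    rng = random.Random(seed)
    return [rng.sample(participant_ids, m) for _ in range(n_groups)]

# For one map with N participants, build groups for every size m:
# groups_by_size = {m: generate_groups(range(N), m) for m in range(1, N + 1)}
```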

Step 3: for each group, aggregating the contributions (spatial clustering of points + quorum) and computing data quality against the gold standard (precision, recall, F-measure)
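A minimal sketch of such an aggregation, assuming a simple grid-based spatial clustering and a majority quorum; the clustering scheme, `cell_size`, and `quorum` threshold are assumptions, not the exact algorithm used in the study.

```python
from collections import defaultdict

def aggregate(group_annotations, cell_size=10.0, quorum=0.5):
    """Aggregate point annotations from one group of participants.

    group_annotations: one list of (x, y) points per participant.
    A grid cell counts as a building if at least `quorum` of the group
    placed a point in it; the kept annotation is the cell's centroid.
    """
    voters = defaultdict(set)   # cell -> participants who annotated it
    points = defaultdict(list)  # cell -> points falling in the cell
    for pid, annotations in enumerate(group_annotations):
        for (x, y) in annotations:
            cell = (int(x // cell_size), int(y // cell_size))
            voters[cell].add(pid)
            points[cell].append((x, y))
    aggregated = []
    for cell, who in voters.items():
        if len(who) >= quorum * len(group_annotations):
            xs, ys = zip(*points[cell])
            aggregated.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return aggregated
```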

The more = the better? (parallel model)

Yes, but only up to a point:

• adding more people won't change the consensus of the panel

• limitation of Linus' law (compared to the iterative model, e.g. OpenStreetMap)

• wisdom != skill: we can't replace training with more people

(plot: average F-measure as a function of group size)


Methodology for the iterative model

A sample of an iterative process for map 3.

Collected data: 13, 21, and 25 instances of about 10 iterations for map 1, map 2, and map 3 respectively (n instances of about m iterations).

Step 2: for each iteration, we compute the precision, recall, and F-measure over all the instances.
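Continuing the earlier sketches, per-iteration quality for the iterative model could be averaged over instances roughly as follows; the data layout (one list of annotation snapshots per instance) and the `quality_fn` hook are assumptions for illustration.

```python
def quality_per_iteration(instances, gold, quality_fn, n_iterations=10):
    """Average (precision, recall, F-measure) at each iteration over all instances.

    instances: list of instances; each instance is a list of annotation
    snapshots, one list of (x, y) points per iteration.
    quality_fn: a metric function like the hypothetical quality() sketched
    earlier, returning (precision, recall, f_measure) vs. the gold standard.
    """
    averages = []
    for i in range(n_iterations):
        scores = [quality_fn(instance[i], gold) for instance in instances]
        averages.append(tuple(sum(s[k] for s in scores) / len(scores) for k in range(3)))
    return averages  # one (precision, recall, f_measure) triple per iteration
```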

Interpretation of results: comparison on data quality

• Accuracy (wrong annotations): consensual results (*) (parallel) vs. error propagation (iterative)

• Accuracy (missing buildings): useless redundancy on obvious buildings (parallel) vs. accumulation of knowledge driving attention to uncovered areas (iterative)

• Consistency: redundancy (parallel) vs. a naive "last = best" (iterative)

(*) but parallel < iterative in difficult cases (map 2), due to a lack of consensus


Side objective: measuring how the crowd spatially agrees

Method: randomly pick 2 participants, measure their spatial inter-agreement (e.g. the ratio of matching points), and repeat the process N times.

This is a way to measure the intrinsic difficulty of a task (map 1 = easy, map 2 = quite hard).
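A minimal sketch of this pairwise inter-agreement estimate, assuming a symmetric matching ratio between two participants' point sets; `match_fn` stands for a hypothetical point-matching helper like the one sketched earlier, and `n_pairs` is an illustrative parameter.

```python
import random

def crowd_agreement(all_annotations, match_fn, n_pairs=1000, seed=0):
    """Estimate spatial inter-agreement: repeatedly sample 2 participants
    and measure the ratio of their points that match.

    all_annotations: one list of (x, y) points per participant.
    match_fn: counts matching points between two point sets, e.g. a
    tolerance-based matcher like the hypothetical match_annotations().
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        a, b = rng.sample(all_annotations, 2)
        if a or b:
            matches = match_fn(a, b)
            ratios.append(2 * matches / (len(a) + len(b)))  # symmetric matching ratio
    return sum(ratios) / len(ratios) if ratios else 0.0
```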

Future tracks

Impact of the organization beyond data quality:

• energy / footprint needed to collectively solve a problem

• participation sustainability

• individual behavior (skill learning & enjoyment)

Skill complementarity: is the best group of 3 people made of the 3 best individuals? The data says no!

Other symbolic organizations / mechanisms:

• human cellular automata (cell = 1 person, resubmitting a task at time t because they are influenced by peers' results generated at time t-1)

• integration of game design / gamification
