Crowdsourcing satellite imagery (talk at GIScience 2012)
DESCRIPTION
Talk given at the GIScience 2012 conference (http://www.giscience.org/). More info about this work on my blog: http://goo.gl/giouF
TRANSCRIPT
Crowdsourcing satellite imagery: study of iterative vs. parallel models
Nicolas Maisonneuve, Bastien Chopard
Twitter: nmaisonneuve
Damage assessment after a humanitarian crisis
Port-au-Prince: 300K buildings assessed in 3 months by 8 UNOSAT experts
Organizational challenge: how to organize untrained volunteers, and especially how to ensure quality?
Investigated scope:
• Qualitative and quantitative study of two collaborative models inspired by computer science: iterative vs. parallel information processing
• Controlled experiment to isolate quality = F(organization), removing other parameters, e.g. training and task difficulty
• This research does not study real-world collaborative practices; rather, it looks at extreme/symbolic cases to guide the designers of collaborative systems
Tested Collaborative Models (1/2): the iterative model
e.g. Wikipedia, OpenStreetMap, assembly lines
Tested Collaborative Models (2/2): the parallel model
aggregation
e.g. voting systems in society, distributed computing
Tested Collaborative Models (2/2): the parallel model
An old version (17th to mid-20th century): when computers were humans, often women (the Mathematical Tables Project, 1938-1948)
Qualitative comparison: Iterative vs. Parallel

Problem divisibility:
• Iterative: no need to divide a complex problem
• Parallel: a complex problem needs to be divided into easier pieces

Optimization tradeoff:
• Iterative: copying, emphasizing exploitation
• Parallel: isolation, emphasizing exploration

Quality mechanism:
• Iterative: sequential improvement
• Parallel: redundancy + diversity of opinions

Side effects:
• Iterative: path-dependency effect + sensitivity to vandalism
• Parallel: useless redundancy for obvious decisions + the problem of aggregation
Controlled Experiment: web platform
Interface/instructions for the parallel model
on 3 maps with different topologies (annotated by 1 UNITAR expert)
Participants used for the experiments: Mechanical Turk as a simulator
Data Quality Metrics
Quality of the collective output:
• Type I errors = p(wrong annotation)
• Type II errors = p(missing a building)
• Consistency
Analogy with the information retrieval field:
• Precision = p(an annotation is a building)
• Recall = p(a building is annotated)
• F-measure = a score mixing recall + precision
• (metrics adjusted with a tolerance distance; see the sketch below)
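The talk does not spell out how annotations are matched to the gold standard, so here is a minimal sketch of these metrics, assuming annotations and ground-truth buildings are 2D points and a greedy one-to-one matching within the tolerance distance; match_annotations() and quality() are illustrative names, not from the talk.

```python
import math

def match_annotations(annotations, ground_truth, tolerance=10.0):
    """Greedily match annotated points to ground-truth buildings
    lying within `tolerance` (in map units); each building can be
    matched at most once. Returns the number of matches."""
    unmatched = list(ground_truth)
    matches = 0
    for ax, ay in annotations:
        best, best_d = None, tolerance
        for i, (gx, gy) in enumerate(unmatched):
            d = math.hypot(ax - gx, ay - gy)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            unmatched.pop(best)
            matches += 1
    return matches

def quality(annotations, ground_truth, tolerance=10.0):
    """Precision = p(an annotation is a building),
    recall = p(a building is annotated),
    F-measure = harmonic mean of the two."""
    m = match_annotations(annotations, ground_truth, tolerance)
    precision = m / len(annotations) if annotations else 0.0
    recall = m / len(ground_truth) if ground_truth else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```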
Methodology for the parallel model
Step 1 - collecting independent contributions: N for (map1, map2, map3) = (121, 120, 113)
Methodology for the parallel model
Step 2 - for each map, generating the sets of groups of m = 1 to N participants (a sampling sketch follows)
[Diagram: example groups for m = 1, m = 2, m = 3]
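Since enumerating all C(N, m) groups is intractable for N ≈ 120, a plausible reading of this step is to sample groups at random; sample_groups() below is an illustrative sketch under that assumption.

```python
import random

def sample_groups(participants, m, n_groups=100, seed=42):
    """Randomly sample n_groups groups of m distinct participants,
    as a tractable stand-in for enumerating all C(N, m) groups."""
    rng = random.Random(seed)
    return [rng.sample(participants, m) for _ in range(n_groups)]

# e.g. for map1 with N = 121 participants:
# groups_of_2 = sample_groups(list(range(121)), m=2)
```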
Methodology for the parallel model
Step 3 - for each group: aggregating contributions + computing quality (a sketch follows)
[Diagram: groups of m = 2 → spatial clustering of points + quorum → compute data quality against the gold standard (precision, recall, F-measure)]
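The slide names the aggregation only as "spatial clustering of points + quorum". Here is a minimal sketch under one plausible reading: single-linkage clustering of all group members' points with a distance threshold, keeping a cluster's centroid when a quorum of the group contributed to it. The radius and quorum values are assumptions.

```python
import math

def aggregate(contributions, radius=10.0, quorum=0.5):
    """contributions: one list of (x, y) points per participant.
    Clusters points within `radius` of each other (single-linkage
    flood-fill) and keeps a cluster's centroid if at least a
    `quorum` fraction of the group contributed a point to it."""
    points = [(x, y, pid) for pid, pts in enumerate(contributions)
              for (x, y) in pts]
    seen, kept = set(), []
    for i in range(len(points)):
        if i in seen:
            continue
        cluster, stack = [], [i]
        seen.add(i)
        while stack:                      # flood-fill one cluster
            j = stack.pop()
            cluster.append(points[j])
            for k in range(len(points)):
                if k not in seen and math.hypot(
                        points[j][0] - points[k][0],
                        points[j][1] - points[k][1]) <= radius:
                    seen.add(k)
                    stack.append(k)
        voters = {pid for _, _, pid in cluster}
        if len(voters) >= quorum * len(contributions):
            kept.append((sum(x for x, _, _ in cluster) / len(cluster),
                         sum(y for _, y, _ in cluster) / len(cluster)))
    return kept
```

The aggregated points can then be scored against the expert gold standard with the quality() helper sketched earlier.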
The more, the better? (parallel model)
Yes, but only up to a point:
• Adding more people won't change the consensus
• Limitation of Linus' law (compared to the iterative model, e.g. OpenStreetMap)
• Wisdom != skill: we can't replace training with more people
[Plot: average F-measure vs. group size]
Methodology for the iterative model
A sample iterative process for map3
Methodology for the iterative model
Collected data: for (map1, map2, map3) = (13, 21, 25) instances of about 10 iterations each
[Diagram: n instances of about m iterations]
Methodology for the iterative model
Step 2 - for each iteration, we compute the precision, recall, and F-measure over all instances (a sketch follows)
[Diagram: precision, recall, F-measure per iteration]
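A sketch of this step, assuming each collected instance is the sequence of annotation sets produced after each successive iteration, and reusing the hypothetical quality() helper from the metrics sketch above.

```python
def quality_per_iteration(instances, ground_truth, tolerance=10.0):
    """instances: list of iterative runs, each run being the list
    of annotation sets after each iteration. Returns the mean
    (precision, recall, F-measure) at every iteration index,
    averaged over the runs that reached that iteration."""
    max_iter = max(len(run) for run in instances)
    curves = []
    for t in range(max_iter):
        scores = [quality(run[t], ground_truth, tolerance)
                  for run in instances if t < len(run)]
        curves.append(tuple(sum(s[i] for s in scores) / len(scores)
                            for i in range(3)))
    return curves
```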
Interpretation of results / comparison on data quality

Accuracy - wrong annotations:
• Parallel: consensual results (*)
• Iterative: error propagation

Accuracy - missing buildings:
• Parallel: useless redundancy on obvious buildings
• Iterative: accumulation of knowledge, driving attention to uncovered areas

Consistency:
• Parallel: redundancy
• Iterative: naive "last = best"

(*) but parallel < iterative in difficult cases (map 2), due to a lack of consensus
Side objective: measuring how the crowd spatially agrees
Method: randomly take 2 participants, measure their spatial inter-agreement (e.g. the ratio of matching points), and repeat the process N times (a sketch follows below)
A way to measure the intrinsic difficulty of a task (map 1 = easy, map 2 = quite hard)
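A sketch of this procedure, reusing the hypothetical match_annotations() helper from the metrics sketch; normalizing by the larger of the two point sets is an assumption, as the slide does not define the ratio precisely.

```python
import random

def inter_agreement(contributions, n_pairs=1000, tolerance=10.0,
                    seed=42):
    """Draw n_pairs random pairs of participants and average their
    spatial agreement, i.e. matched points / size of the larger
    point set. Low agreement suggests an intrinsically hard map."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        a, b = rng.sample(contributions, 2)
        if not a and not b:
            continue
        ratios.append(match_annotations(a, b, tolerance)
                      / max(len(a), len(b)))
    return sum(ratios) / len(ratios)
```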
Future tracks
Impact of the organization beyond data quality:
• Energy/footprint to collectively solve a problem
• Participation sustainability
• Impact on individual behavior (skill learning & enjoyment)
Skill complementarity: is the best group of 3 people made of the 3 best individuals? The data says no! (see the sketch below)
Other symbolic organizations/mechanisms:
• Human cellular automata (cell = 1 person, resubmitting a task at time t because they are influenced by peers' results generated at time t-1)
• Integration of game design/gamification
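A sketch of how the skill-complementarity claim could be tested, reusing the hypothetical aggregate() and quality() helpers from the earlier sketches; the exhaustive search over all groups of k is an illustrative choice, feasible only for modest N.

```python
from itertools import combinations

def best_group_vs_top_individuals(contributions, ground_truth,
                                  k=3, tolerance=10.0):
    """Compare the best-scoring group of k participants with the
    group formed by the k individually best participants; if the
    former wins, skills are complementary."""
    def f_measure(points):
        return quality(points, ground_truth, tolerance)[2]

    def group_score(indices):
        return f_measure(aggregate([contributions[i] for i in indices]))

    top_k = sorted(range(len(contributions)),
                   key=lambda i: f_measure(contributions[i]),
                   reverse=True)[:k]
    best = max(combinations(range(len(contributions)), k),
               key=group_score)
    return group_score(best), group_score(top_k)
```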