GLOBE Analytics for Assessing Global Representativeness

Matthew D Schmill, Lindsey Gordon, Erle Ellis, Nicholas Magliocca, Tim Oates

University of Maryland, Baltimore County

Post on 18-Dec-2015

Page 2

GLOBE: Enhancing Scientific Workflows

The goal: accelerate and improve scientific workflows for land change science

Joint work with Wayne Lutters, Erle Ellis, Tim Oates, and Penny Rheingans at University of Maryland, Baltimore County (IS, CSEE, GES)

Supported by NSF's Cyber-Enabled Discovery & Innovation program; fourth and final year of the program

Centerpiece is the GLOBE system, enabling better science through:

Real-time statistical assessments, interactive geovisualization tools

Scientific collaboration platform

Page 3

Land Change Science

Study of the interaction between human systems, ecosystems, the atmosphere, and other Earth systems as mediated through human use of land

Cross-cuts many disciplines of social and natural science

Typified by this challenge: how to integrate and synthesize local studies into "globalized" results

Though GLOBE is targeted at land change scientists, the concept of representativeness is a very general concern

The GLOBE system is appropriate to any discipline engaged in synthesizing local studies into global results

Page 4

Representativeness

The degree to which a sample represents a global pattern

A converse to bias: a well-represented sample is not biased, and a biased sample is not representative

Sampling bias: a typical criticism anywhere that samples are used to make inferences

A land change science example: are you representing only accessible sites?

Accessibility as a measure of travel time to a city (Nelson, 2008)

A measure of representativeness should be:

Intuitive, understandable

Statistically sound

Page 5

Measures of Representativeness

Pearson's Chi Square: requires the variable space be discrete; unreliable with small sample sizes

Kolmogorov-Smirnov Goodness-of-Fit Test: does not require discrete space; scaling and visualizing beyond 1D is hard

f-Divergence (Hellinger, Jensen-Shannon): requires discrete variable space
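The three measures named on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the GLOBE implementation: the function names, the binning, and the example numbers are hypothetical, and Hellinger distance stands in for the f-divergence family.

```python
import math
from bisect import bisect_right

def chi_square_statistic(sample_counts, population_probs):
    """Pearson's chi-square statistic: observed bin counts vs. expected proportions."""
    n = sum(sample_counts)
    return sum((obs - n * p) ** 2 / (n * p)
               for obs, p in zip(sample_counts, population_probs) if p > 0)

def ks_distance(sample, population):
    """Largest gap between the empirical CDFs of sample and population."""
    s, p = sorted(sample), sorted(population)
    # The maximum gap between two step functions occurs at one of the data points.
    return max(abs(bisect_right(s, x) / len(s) - bisect_right(p, x) / len(p))
               for x in s + p)

def hellinger_distance(sample_probs, population_probs):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(sample_probs, population_probs)))

# Hypothetical example: four equally likely bins in the population,
# and a sample of 40 cases that over-draws from the first bin.
population_probs = [0.25, 0.25, 0.25, 0.25]
sample_counts = [20, 10, 5, 5]
sample_probs = [c / sum(sample_counts) for c in sample_counts]

print(chi_square_statistic(sample_counts, population_probs))
print(hellinger_distance(sample_probs, population_probs))
print(ks_distance([1, 2, 3], [1, 2, 3, 4, 5, 6]))
```

Note that the chi-square and Hellinger functions take binned inputs, which is the "requires discrete space" constraint from the slide, while the KS distance works directly on raw values.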

Page 6

Measures of Representativeness

Probability Estimates

Chi Square: simple

Monte Carlo methods for the rest
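A minimal sketch of a Monte Carlo probability estimate for the KS distance, assuming the null hypothesis is "the sample was drawn at random from the population". The function names, trial count, and example data are hypothetical; the slides do not specify how GLOBE performs this step.

```python
import random
from bisect import bisect_right

def ks_distance(sample, sorted_population):
    """Largest gap between the empirical CDFs; the population must be pre-sorted."""
    s = sorted(sample)
    n_s, n_p = len(s), len(sorted_population)
    return max(abs(bisect_right(s, x) / n_s - bisect_right(sorted_population, x) / n_p)
               for x in s + sorted_population)

def monte_carlo_p_value(sample, population, trials=1000, seed=0):
    """Estimate how often a random sample of the same size is at least as far
    from the population as the observed sample."""
    rng = random.Random(seed)
    sorted_population = sorted(population)
    observed = ks_distance(sample, sorted_population)
    hits = 0
    for _ in range(trials):
        resample = rng.sample(sorted_population, len(sample))
        if ks_distance(resample, sorted_population) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)

# Hypothetical data: a population of 500 values and two samples of 30.
population = list(range(500))
biased_sample = population[:30]        # only the smallest values
spread_sample = population[::17]       # evenly spread across the range

print(monte_carlo_p_value(biased_sample, population, trials=200))
print(monte_carlo_p_value(spread_sample, population, trials=200))
```

A small estimate suggests the sample is unlikely to be a random draw from the population; the same loop works for any statistic, including an f-divergence on binned data.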

Page 7

Representativeness

Gives you:

Quick metric for judging level of bias

Basis for comparing samples/sampling methods

A way to compute the probability of incorrectly concluding a sample is biased

Does not give you:

Any guidance on where to look to address sampling bias

Any way to view this geographically

Page 8

Representedness

The degree to which a location or member of the population is represented by the collection

The complement of representativeness

Useful for visualization and analysis

Heat maps that show geographically where gaps lie

Can be used as a basis for case study search to fill study gaps

Page 9

Computing Representedness

Get datum for land unit (precipitation): 1573 mm/yr

Locate datum in global distribution

Compute representativeness for that value

Chi Square: p-value of χ² times sign of the difference between sample and population

KS Distance: difference in ECDF for population versus sample at unit datum
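The KS-style step above (difference in ECDF at the unit's datum) can be sketched as follows. The sign convention (sample minus population) and the precipitation values other than the 1573 mm/yr datum are assumptions made for illustration; the chi-square variant is not shown because it needs a chi-square p-value routine.

```python
from bisect import bisect_right

def ecdf(values):
    """Return the empirical CDF of the values as a function of x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def representedness(datum, sample, population):
    """Signed ECDF difference at the datum (assumed convention: sample minus population).
    Positive: the sample holds a larger share of values at or below the datum."""
    return ecdf(sample)(datum) - ecdf(population)(datum)

# Hypothetical precipitation values in mm/yr; only the datum comes from the slide.
population = [150, 300, 450, 700, 900, 1200, 1573, 1900, 2400, 3100]
sample = [700, 900, 1200, 1573]

print(representedness(1573, sample, population))
```

Evaluating this for every land unit gives one signed score per location, which is what a heat map needs.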

Page 10

Computing Representedness

Get datum for land unit: 49.2 m

Locate datum in global distribution (discrete or continuous)

Compute representativeness for that value

Discrete: p-value of χ² times sign of the difference between sample and population

Continuous: difference in ECDF for population versus sample at unit datum

Compute RGB (heat map)
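The final step, turning a signed score into a heat-map color, might look like the sketch below. The diverging blue/white/red scheme and the function name are assumptions; the slides do not say which color ramp GLOBE uses.

```python
def score_to_rgb(score):
    """Map a signed representedness score in [-1, 1] to an (R, G, B) tuple.
    Assumed scheme: blue = under-represented, white = neutral, red = over-represented."""
    s = max(-1.0, min(1.0, score))          # clamp out-of-range scores
    fade = round(255 * (1 - abs(s)))        # weaker scores fade toward white
    if s < 0:
        return (fade, fade, 255)
    return (255, fade, fade)

for score in (-1.0, 0.0, 1.0):
    print(score, score_to_rgb(score))
```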

Page 11

Addressing Bias

Study Gap Search

Identify areas where density in the population is significantly higher than in the sample

Search the case database using that criterion

Additional criteria available (full-text search, metadata)

Case Weighting

Addresses biases in statistical analysis by:

Over-weighting (> 1.0) cases in under-represented areas

Under-weighting (< 1.0) cases in over-represented areas

Computed using representedness
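One simple way to illustrate both ideas on binned data is a ratio of population share to sample share per bin. This is a post-stratification-style stand-in, not the representedness-based formula GLOBE uses; the bin labels, the `ratio` threshold (a crude proxy for "significantly higher"), and the function names are all hypothetical.

```python
from collections import Counter

def bin_shares(bin_labels):
    """Fraction of items falling in each bin."""
    counts = Counter(bin_labels)
    total = len(bin_labels)
    return {b: c / total for b, c in counts.items()}

def study_gaps(sample_bins, population_bins, ratio=2.0):
    """Bins whose population share exceeds `ratio` times their sample share."""
    sample_share = bin_shares(sample_bins)
    population_share = bin_shares(population_bins)
    return sorted(b for b, p in population_share.items()
                  if p > ratio * sample_share.get(b, 0.0))

def case_weights(sample_bins, population_bins):
    """One weight per case: above 1.0 in under-represented bins, below 1.0 in over-represented ones."""
    sample_share = bin_shares(sample_bins)
    population_share = bin_shares(population_bins)
    return [population_share.get(b, 0.0) / sample_share[b] for b in sample_bins]

# Hypothetical land units binned by climate class.
population_bins = ["dry"] * 50 + ["moist"] * 30 + ["wet"] * 20
sample_bins = ["dry"] * 2 + ["moist"] * 6 + ["wet"] * 2

print(study_gaps(sample_bins, population_bins))
print(case_weights(sample_bins, population_bins))
```

With these weights the sample's weighted bin shares match the population's, and the weights sum to the number of cases.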

Page 12

The GLOBE Application

Our platform for better land change science: by improving workflows, and as a social/collaborative platform

Formally introduced to GLP OSM in March 2014

Features

Allows researchers to create and manage case studies and their geometry

Integrates global data layers to augment user cases

Provides real-time analytics and visual tools: similarity search, representativeness analysis

Page 13

Global Data

Organized into a Discrete Global Grid [Sahr, White, and Kimerling, 2003]

ISEA Aperture 3, Hexagonal

1.5M equal-area hexagons of 96 km² each at resolution 12 (native GLOBE resolution)

Downsampled grid at resolution 10 (863.8 km²) for approximate calculations

Currently 75 global variables; variables can be processed and submitted to GLOBE

Variable categories: human, remote sensing, biological, surface, climate
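The two cell areas quoted on this slide can be checked with a little arithmetic. The sketch assumes the usual cell-count formula for icosahedral aperture-3 grids, 10 * 3^r + 2 cells at resolution r, and an approximate Earth surface area of 510,072,000 km²; neither figure appears on the slide.

```python
EARTH_SURFACE_KM2 = 510_072_000  # assumed approximate total surface area of Earth

def isea3h_cell_count(resolution):
    """Global cell count for an aperture-3 icosahedral grid (assumed formula)."""
    return 10 * 3 ** resolution + 2

def mean_cell_area_km2(resolution):
    """Average cell area when the whole globe is tiled with equal-area cells."""
    return EARTH_SURFACE_KM2 / isea3h_cell_count(resolution)

for resolution in (10, 12):
    print(resolution, isea3h_cell_count(resolution), round(mean_cell_area_km2(resolution), 1))
```

Each resolution step in an aperture-3 grid divides cell area by three, so dropping from resolution 12 to resolution 10 makes cells about nine times larger, consistent with 96 km² versus 863.8 km².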

Page 14

GLOBE Cases

Page 15

GLOBE Cases

GLOBE GES team has georeferenced and entered 630 cases

Currently a total of 927 georeferenced, completed cases

Page 16

Similarity Assessment

Page 17

Representativeness Analysis – Monte Carlo

Page 18

Representativeness Analysis – χ²

Page 19

Representativeness Analysis – Gap Search

Page 20

In Summary

Representativeness is an issue anywhere inferences are made from samples

Representedness is a companion piece that enables geovisualization and gap search

Can be implemented many ways:

Classical hypothesis test (χ²)

Monte Carlo methods: f-divergence, KS distance

The GLOBE application enables a representativeness workflow for land change science:

Real-time assessment and visualizations

Gap search and case weighting

Page 21

In the Pipeline

Multidimensional analysis

Quantifying the impact of data scarcity (small sample size)

Heuristic tools for guiding the user

Improved visual tools

Dimensionality reduction: identifying if and when it is possible

Automated exploratory analysis: helping the user to identify what analysis they should be running

Page 22

Thanks!

Visit us at http://globe.umbc.edu

Page 23

Representativeness Analysis – KS

Page 24

Conceptual Overview

Global Data: discrete global grid

GLOBE Cases: geography + data

GLOBE GCE: analytical & computational engine

GLOBE Web App: visual & interactive tools