GLOBE Analytics for Assessing Global Representativeness
Matthew D Schmill, Lindsey Gordon, Erle Ellis, Nicholas Magliocca, Tim Oates
University of Maryland, Baltimore County
GLOBE: Enhancing Scientific Workflows
The goal: accelerate and improve scientific workflows for land change science
Joint work with Wayne Lutters, Erle Ellis, Tim Oates, and Penny Rheingans at the University of Maryland, Baltimore County (IS, CSEE, GES)
Supported by NSF’s Cyber-Enabled Discovery & Innovation program
Fourth and final year of the program
The centerpiece is the GLOBE system, enabling better science through:
Real-time statistical assessments and interactive geovisualization tools
A scientific collaboration platform
Land Change Science
The study of interactions between human systems, ecosystems, the atmosphere, and other Earth systems as mediated through human use of land
Cuts across many disciplines of the social and natural sciences
Typified by this challenge: how to integrate and synthesize local studies into “globalized” results
Though GLOBE is targeted at land change scientists, the concept of representativeness is a very general concern
The GLOBE system is appropriate for any discipline engaged in synthesizing local studies into global results
Representativeness
The degree to which a sample represents a global pattern
The converse of bias: a representative sample is not biased, and a biased sample is not representative
Sampling bias is a typical criticism anywhere samples are used to make inferences
A land change science example: are you representing only accessible sites?
Accessibility measured as travel time to a city (Nelson, 2008)
A measure of representativeness should be:
Intuitive and understandable
Statistically sound
Measures of Representativeness
Pearson’s chi-square test
Requires the variable space to be discrete
Unreliable with small sample sizes
Kolmogorov-Smirnov goodness-of-fit test
Does not require a discrete space
Scaling and visualizing beyond one dimension is hard
f-divergence (Hellinger, Jensen-Shannon)
Requires a discrete variable space
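Both f-divergences named above compare two distributions binned over the same discrete categories. A minimal sketch, assuming the population (all land cells) and the sample (cells holding case studies) have already been assigned to the same bins; the function names and bin labels are hypothetical, not GLOBE's code. It computes only the distance; turning it into a probability is the separate Monte Carlo step. Squared Hellinger distance is the f-divergence; its square root is returned so the result lies in [0, 1].

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw counts per bin into a probability distribution over those bins."""
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between discrete distributions: 0 = identical, 1 = disjoint."""
    bins = set(p) | set(q)
    s = sum((math.sqrt(p.get(b, 0.0)) - math.sqrt(q.get(b, 0.0))) ** 2 for b in bins)
    return math.sqrt(s / 2.0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint."""
    bins = set(p) | set(q)
    m = {b: 0.5 * (p.get(b, 0.0) + q.get(b, 0.0)) for b in bins}

    def kl_to_m(a):
        # KL(a || m); bins where a is zero contribute nothing.
        return sum(a[b] * math.log2(a[b] / m[b]) for b in bins if a.get(b, 0.0) > 0.0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical bin labels: every land cell (population) vs. cells holding cases (sample).
population = normalize(Counter(["low"] * 50 + ["mid"] * 30 + ["high"] * 20))
sample = normalize(Counter(["low"] * 8 + ["mid"] * 2))
print(hellinger(population, sample), jensen_shannon(population, sample))
```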
Probability estimates
Chi-square: simple (p-value from the χ² distribution)
Monte Carlo methods for the rest
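For the chi-square test the p-value comes straight from the χ² distribution, so no simulation is needed. A minimal sketch, assuming sample and population counts over the same discrete bins; the function names are hypothetical. The upper regularized incomplete gamma function is computed with the standard series/continued-fraction approach (as in Numerical Recipes) so the example needs only the standard library. Where SciPy is available, `scipy.stats.chisquare` covers the same ground.

```python
import math


def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x), for a > 0 and x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series converges quickly here; it yields P(a, x), and Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_test(sample_counts, population_counts):
    """Pearson goodness-of-fit of sample counts against population proportions.

    Both arguments list counts over the same discrete bins, in the same order.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    n = sum(sample_counts)
    total = sum(population_counts)
    if n == 0:
        raise ValueError("the sample is empty")
    stat, used_bins = 0.0, 0
    for observed, pop in zip(sample_counts, population_counts):
        if pop == 0:
            continue  # an empty population bin cannot hold sample cases either
        expected = n * pop / total
        stat += (observed - expected) ** 2 / expected
        used_bins += 1
    dof = used_bins - 1
    if dof < 1:
        raise ValueError("need at least two populated bins")
    # P(chi-square with dof degrees of freedom >= stat) = Q(dof / 2, stat / 2)
    return stat, dof, upper_gamma_q(dof / 2.0, stat / 2.0)


# Hypothetical counts per precipitation bin: case-study cells vs. all land cells.
stat, dof, p = chi_square_test([12, 3, 1], [500, 300, 200])
print(f"chi2={stat:.2f}, dof={dof}, p={p:.4f}")
```

A small p-value says a random sample would rarely deviate this much from the population, which is the evidence for bias; the small-sample caveat on the slide still applies because the χ² approximation needs reasonably large expected counts.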
Representativeness
Gives you:
A quick metric for judging the level of bias
A basis for comparing samples and sampling methods
A way to compute the probability of incorrectly concluding that a sample is biased
Does not give you:
Any guidance on where to look to address sampling bias
Any way to view this geographically
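The probability of incorrectly concluding that a sample is biased is a p-value: how often a sample drawn at random from the population would look at least as unrepresentative as the real one. A minimal Monte Carlo sketch using the KS distance, assuming one continuous variable per land unit and random draws without replacement; the function names and the `trials` and `seed` parameters are hypothetical.

```python
import random


def ks_distance(a_sorted, b_sorted):
    """Two-sample Kolmogorov-Smirnov distance: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        # Step past every copy of x in both lists so tied values are counted together.
        while i < n and a_sorted[i] == x:
            i += 1
        while j < m and b_sorted[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def monte_carlo_p_value(sample, population, trials=1000, seed=0):
    """Estimate how often a random sample looks at least as biased as the real one.

    Returns (observed KS distance, estimated p-value). Random samples are drawn
    without replacement and have the same size as the real sample.
    """
    rng = random.Random(seed)
    population_sorted = sorted(population)
    observed = ks_distance(sorted(sample), population_sorted)
    at_least_as_far = 0
    for _ in range(trials):
        draw = sorted(rng.sample(population_sorted, len(sample)))
        if ks_distance(draw, population_sorted) >= observed:
            at_least_as_far += 1
    # The +1 terms count the real sample itself, so the estimate is never exactly 0.
    return observed, (at_least_as_far + 1) / (trials + 1)


# Hypothetical example: land-unit values 0..999, and a sample clustered at the low end.
population = list(range(1000))
print(monte_carlo_p_value(list(range(100)), population, trials=200))
```

Swapping `ks_distance` for a binned f-divergence gives the Monte Carlo estimate for the discrete measures in the same way.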
Representedness
The degree to which a location or member of the population is represented by the collection
The complement of representativeness
Useful for visualization and analysis: heat maps that show geographically where gaps lie
Can be used as a basis for case study search to fill study gaps
Computing Representedness
Get the datum for a land unit (e.g., precipitation: 1573 mm/yr)
Locate the datum in the global distribution
Compute representativeness for that value:
Chi-square (discrete variables): p-value of χ² times the sign of the difference between sample and population
KS distance (continuous variables): difference in the ECDF for population versus sample at the unit's datum
Compute RGB (heat map)
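The continuous-variable steps above can be sketched per land unit: evaluate both ECDFs at the unit's datum, take the difference, and map it to a color. A minimal sketch; the sign convention (positive when the sample has more mass at or below the datum than the population) and the red-white-blue color ramp are assumptions, not GLOBE's palette, and every value except 1573 mm/yr is made up.

```python
import bisect


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def representedness(datum, sample_sorted, population_sorted):
    """Signed ECDF difference at one land unit's datum, in [-1, 1].

    Assumed sign convention: positive when the sample has more of its mass at or
    below this value than the population does, negative when it has less.
    """
    return ecdf(sample_sorted, datum) - ecdf(population_sorted, datum)


def to_rgb(r):
    """Hypothetical diverging color ramp: red for r < 0, blue for r > 0, white at 0."""
    r = max(-1.0, min(1.0, r))
    fade = int(round(255 * (1.0 - abs(r))))  # stronger deviation -> more saturated color
    return (255, fade, fade) if r < 0 else (fade, fade, 255)


population = sorted([410, 650, 980, 1200, 1573, 1800, 2400, 3100])  # hypothetical mm/yr
sample = sorted([410, 650, 980])
r = representedness(1573, sample, population)
print(r, to_rgb(r))  # 0.375 (159, 159, 255)
```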
Addressing Bias
Study Gap Search
Identify areas where density in the population is significantly higher than in the sample
Search the case database using that criterion
Additional criteria available (full-text search, metadata)
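Study gap search can be sketched as a per-bin comparison: flag bins whose sample count falls well below what the population share predicts, then filter candidate cases by those bins. A minimal sketch; the normal-approximation threshold (`z=1.96`), the case record fields (`name`, `bin`, `description`), and the substring match standing in for full-text search are all hypothetical.

```python
import math


def gap_bins(sample_counts, population_counts, z=1.96):
    """Bins where the sample has significantly fewer cases than the population share predicts.

    Uses a normal approximation to the binomial count in each bin; `z` sets how far
    below the expected count a bin must fall before it is flagged as a gap.
    """
    n = sum(sample_counts.values())
    total = sum(population_counts.values())
    gaps = []
    for b, pop in population_counts.items():
        share = pop / total
        expected = n * share
        sd = math.sqrt(n * share * (1.0 - share))
        if sample_counts.get(b, 0) < expected - z * sd:
            gaps.append(b)
    return gaps


def search_cases(cases, gaps, text=None):
    """Names of candidate cases that fall in a gap bin, optionally matching a keyword."""
    hits = []
    for case in cases:
        if case["bin"] not in gaps:
            continue
        if text and text.lower() not in case["description"].lower():
            continue  # crude substring match standing in for full-text search
        hits.append(case["name"])
    return hits


# Hypothetical data: half the land is "wet", but none of the 20 existing cases are.
gaps = gap_bins({"dry": 20, "wet": 0}, {"dry": 50, "wet": 50})
candidates = [
    {"name": "A", "bin": "wet", "description": "Rice paddies"},
    {"name": "B", "bin": "dry", "description": "Rangeland"},
    {"name": "C", "bin": "wet", "description": "Swidden farming"},
]
print(gaps, search_cases(candidates, gaps), search_cases(candidates, gaps, text="rice"))
```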
Case Weighting
Addresses biases in statistical analysis by:
Over-weighting (> 1.0) cases in under-represented areas
Under-weighting (< 1.0) cases in over-represented areas
Computed using representedness
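The exact weighting formula is not given here. One common choice with the stated property is the ratio of a bin's population share to its sample share: it exceeds 1.0 in under-represented bins and falls below 1.0 in over-represented ones. A minimal sketch under that assumption; `case_weights` is a hypothetical name, not GLOBE's function.

```python
from collections import Counter


def case_weights(case_bins, population_counts):
    """Weight for each case: population share of its bin / sample share of its bin.

    Weights exceed 1.0 where the bin is under-represented in the sample and fall
    below 1.0 where it is over-represented.
    """
    n = len(case_bins)
    total = sum(population_counts.values())
    sample_counts = Counter(case_bins)
    return [
        (population_counts[b] / total) / (sample_counts[b] / n)
        for b in case_bins
    ]


# Hypothetical: three "dry" cases and one "wet" case, on land that is half dry, half wet.
print(case_weights(["dry", "dry", "dry", "wet"], {"dry": 50, "wet": 50}))
```

Here each dry case gets weight 2/3 and the wet case gets 2.0, so the weighted sample matches the population's 50/50 split.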
The GLOBE Application
Our platform for better land change science:
By improving workflows
As a social/collaborative platform
Formally introduced at the GLP OSM in March 2014
Features
Allows researchers to create and manage case studies and their geometry
Integrates global data layers to augment user cases
Provides real-time analytics and visual tools:
Similarity search
Representativeness analysis
Global Data
Organized into a discrete global grid [Sahr, White, and Kimerling, 2003]
ISEA aperture 3, hexagonal
1.5M equal-area hexagons of 96 km² at resolution 12 (native GLOBE resolution)
Downsampled grid at resolution 10 (863.8 km² cells) for approximate calculations
Currently 75 global variables (human, remote sensing, biological, surface, climate); new variables can be processed and submitted to GLOBE
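The grid figures above follow from the geometry of an aperture 3 hexagonal grid on the icosahedron: the globe is covered by 10·3^r + 2 cells at resolution r (12 of them pentagons), so each step in resolution divides the cell area by 3. A quick check, assuming an authalic Earth radius of 6371.0072 km and treating every cell as the same size, which makes it an average. The 1.5M count at resolution 12 is consistent with covering only the land surface (roughly 149 million km²).

```python
import math

AUTHALIC_RADIUS_KM = 6371.0072  # assumed: WGS84 authalic (equal-area) sphere radius
EARTH_AREA_KM2 = 4.0 * math.pi * AUTHALIC_RADIUS_KM ** 2


def isea3h_cell_count(resolution):
    """Cells covering the whole globe at a given ISEA aperture 3 hexagonal resolution."""
    return 10 * 3 ** resolution + 2


def isea3h_cell_area_km2(resolution):
    """Average cell area; the 12 pentagonal cells make this slightly approximate."""
    return EARTH_AREA_KM2 / isea3h_cell_count(resolution)


for res in (10, 12):
    print(res, isea3h_cell_count(res), round(isea3h_cell_area_km2(res), 1))
# 10 590492 863.8
# 12 5314412 96.0
```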
GLOBE Cases
The GLOBE GES team has georeferenced and entered 630 cases
Currently a total of 927 georeferenced, completed cases
Similarity Assessment
Representativeness Analysis – Monte Carlo
Representativeness Analysis – χ²
Representativeness Analysis – Gap Search
In Summary
Representativeness is an issue anywhere inferences are made from samples
Representedness is a companion measure that enables geovisualization and gap search
Both can be implemented in many ways:
Classical hypothesis test (χ²)
Monte Carlo methods: f-divergence, KS distance
The GLOBE application enables a representativeness workflow for land change science:
Real-time assessment and visualizations
Gap search and case weighting
In the Pipeline
Multidimensional analysis
Quantifying the impact of data scarcity (small sample size)
Heuristic tools for guiding the user
Improved visual tools
Dimensionality reduction: identifying if and when it is possible
Automated exploratory analysis: helping the user identify which analyses to run
Thanks! Visit us at http://globe.umbc.edu
Representativeness Analysis – KS
Conceptual Overview
Global Data: discrete global grid
GLOBE Cases: geography + data
GLOBE GCE: analytical & computational engine
GLOBE Web App: visual & interactive tools