Terra Populus: Integrated Data on Population and Environment
TerraPop Goals
• Lower barriers to conducting interdisciplinary human-environment interactions research by making data with different formats from different scientific domains easily interoperable
• Provide an organizational and technical framework to preserve, integrate, disseminate, and analyze global-scale spatiotemporal data describing population and the environment
TerraPop in Context: Collaborating Organizations
• Data integration expertise
• Large census and survey data collections & expertise
• Institutional foundation
• Human-environment interactions research expertise
• Environmentally-oriented data collections & expertise
• Data preservation and sustainability expertise
• Social science data collections & expertise
• Major producers and distributors of data on both humans and their environment
• Major producers of tools for integrating and transforming data across formats
• Leaders in preservation and sustainability
Background
Sustainable Digital Data Access and Preservation Network (DataNet)
• Provide reliable digital preservation, access, integration, and analysis
• Anticipate and adapt to technological change and user needs
• Engage with frontiers of computer/information science and CI
• Serve as component elements of interoperable data preservation and access network
established in 2009
established in 2011
TerraPop in Context: DataNet Cyberinfrastructure
Curated population and environment data collection
Exposed through DataONE, SEAD
Extracts exportable to DFC
Integration services
Potentially available through DFC, SEAD
Open source components and API
• TWO DOMAINS: POPULATION & ENVIRONMENT
• THREE DATA STRUCTURES
• Microdata
• Area-level data
• Rasters
Source Data
Making disparate data formats interoperable
Microdata: Characteristics of individuals and households
Area-level data: Characteristics of places defined by boundaries
Raster data: Values tied to spatial coordinates
Location-Based Integration
Microdata
Area-level data
Rasters
Mix and match variables originating in any of the data structures
Obtain output in the data structure most useful to you
Location-Based Integration
Individuals and households with their environmental and social context
Microdata
Area-level data
Rasters
Age  Sex  Landcover
36   M    Forest
34   F    Forest
11   M    Forest
8    M    Forest
42   M    Grassland
39   F    Grassland
15   F    Grassland
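A minimal sketch of how contextual values could be attached to microdata records, assuming each record carries a place code and each place already has a land cover value. The place codes "A" and "B", the field names, and the function `attach_context` are hypothetical and are not TerraPop's actual schema or API.

```python
# Hypothetical illustration: place codes "A"/"B" and all field names are
# assumptions made for this sketch, not part of TerraPop.
people = [
    {"age": 36, "sex": "M", "place": "A"},
    {"age": 34, "sex": "F", "place": "A"},
    {"age": 11, "sex": "M", "place": "A"},
    {"age": 8, "sex": "M", "place": "A"},
    {"age": 42, "sex": "M", "place": "B"},
    {"age": 39, "sex": "F", "place": "B"},
    {"age": 15, "sex": "F", "place": "B"},
]

# Land cover per place, e.g. previously derived from a raster.
place_landcover = {"A": "Forest", "B": "Grassland"}


def attach_context(records, context, key="place", name="landcover"):
    """Return copies of the records with the contextual value of their place.

    Records whose place is missing from `context` get None.
    """
    return [{**record, name: context.get(record[key])} for record in records]


enriched = attach_context(people, place_landcover)
for row in enriched:
    print(row["age"], row["sex"], row["landcover"])
```

The original records are left unchanged; each output row is a new dictionary, so the same microdata can be combined with several contextual variables in turn.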
Location-Based Integration
Summarized environmental and population characteristics for administrative districts
Microdata
Area-level data
Rasters
County ID
G01001
G01003
G01005
G01007
County ID  Mean Ann. Precip.  Median HH Income
G01001     768                50,500
G01003     589                48,500
G01005     867                51,000
G01007     701                50,750
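One way such an area-level table could be assembled is sketched below, assuming raster cell values and household incomes have already been grouped by county ID. The input numbers are made up for illustration, and `summarize_by_county` is a hypothetical helper, not a TerraPop function.

```python
from statistics import mean, median

# Hypothetical inputs, already grouped by county ID.
# All values are invented for this sketch.
precip_cells = {
    "G01001": [760, 768, 776],
    "G01003": [580, 589, 598],
}
household_incomes = {
    "G01001": [42000, 50500, 61000],
    "G01003": [39000, 48500, 55000],
}


def summarize_by_county(cells_by_county, incomes_by_county):
    """Build one area-level row per county present in both inputs."""
    rows = []
    shared = cells_by_county.keys() & incomes_by_county.keys()
    for county in sorted(shared):
        rows.append({
            "county_id": county,
            "mean_ann_precip": mean(cells_by_county[county]),
            "median_hh_income": median(incomes_by_county[county]),
        })
    return rows


area_table = summarize_by_county(precip_cells, household_incomes)
for row in area_table:
    print(row["county_id"], row["mean_ann_precip"], row["median_hh_income"])
```

Counties that appear in only one input are skipped here; a real workflow would have to decide whether to keep them with missing values instead.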
Location-Based Integration
Rasters of population and environment data
Microdata
Area-level data
Rasters
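Going the other direction, area-level values can be written into a raster. The sketch below assumes a small grid whose cells hold the ID of the area they fall in; the grid, the income values, and the `rasterize` helper are hypothetical.

```python
# Hypothetical zone grid: each cell holds the ID of the area covering it.
zone_grid = [
    ["G01001", "G01001", "G01003"],
    ["G01001", "G01003", "G01003"],
]

# Hypothetical area-level variable keyed by area ID.
median_income = {"G01001": 50500, "G01003": 48500}


def rasterize(zones, values, nodata=None):
    """Give every cell the value of its area; unknown areas get `nodata`."""
    return [[values.get(zone, nodata) for zone in row] for row in zones]


income_raster = rasterize(zone_grid, median_income)
for row in income_raster:
    print(row)
```

Every cell in an area receives the same value, so the output has the raster's layout but no more spatial detail than the area-level source.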
Why TerraPop?
Data Access
Preservation
Documentation
Creation
Transformations
Improved Data Access
Preservation
Data producers have no preservation plan
Example: GLI crops data
Previous versions of data difficult or impossible to find
Example: MODIS Land Cover Collection 4 superseded by Collection 5, but Collection 4 is unavailable
Documentation
Data lacks sufficient (or any) metadata
http://gli.environment.umn.edu/
Documentation
GLI crops data originally provided through an anonymous FTP site
No metadata provided with the data files
So, we wrote it!
http://www.earthstat.org/
http://www.earthstat.org/wp-content/uploads/METADATA_HarvestedAreaYield175Crops.pdf
Abaca – Harvested Area GeoTIFF Metadata: http://www.earthstat.org
Abaca – Harvested Area GeoTIFF Metadata: http://www.terrapop.org
Creation
Historical subnational GIS data
• Matched to census data
• Aligned with most recent GIS data available for a given country
Photographed Countries
Census Bureau Library, Library of Congress, Harvard
Area-level data
• Tabulated from census microdata
• Obtained from census agencies as digital files, PDFs, or HTML tables
Transformations
Continuous      Binary        Categorical
Min             Percent area  Mode
Max             Total area    Number of Classes
Mean
Count
Percent area*
Total area*
* Available for some continuous agricultural rasters
Area-Level Summary of Raster Data
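The summary operations in the table can be sketched as follows. This is an illustration under simplifying assumptions, not TerraPop's implementation: all cells are treated as equal in area (cell area varies with latitude in rasters stored in geographic coordinates), the 0.25 cell area is a hypothetical value, and every function name is invented for the example.

```python
from collections import Counter


def zonal_cells(zones, raster, zone_id):
    """Collect the non-missing raster values of cells inside one zone."""
    return [
        value
        for zone_row, value_row in zip(zones, raster)
        for zone, value in zip(zone_row, value_row)
        if zone == zone_id and value is not None
    ]


def summarize_continuous(cells):
    """Min, max, mean, and count for a continuous raster."""
    return {
        "min": min(cells),
        "max": max(cells),
        "mean": sum(cells) / len(cells),
        "count": len(cells),
    }


def summarize_binary(cells, cell_area):
    """Percent area and total area where a binary raster is 1.

    Assumes every cell covers the same area (`cell_area`).
    """
    present = sum(1 for value in cells if value)
    return {
        "percent_area": 100.0 * present / len(cells),
        "total_area": present * cell_area,
    }


def summarize_categorical(cells):
    """Most common class and number of distinct classes."""
    counts = Counter(cells)
    return {"mode": counts.most_common(1)[0][0], "num_classes": len(counts)}


# Hypothetical 2 x 3 grids; zone "A" covers three cells.
zones = [["A", "A", "B"], ["A", "B", "B"]]
precip = [[700.0, 800.0, 500.0], [900.0, 600.0, 700.0]]
forest = [[1, 0, 1], [1, 1, 0]]
landcover = [["Forest", "Forest", "Grassland"],
             ["Cropland", "Grassland", "Grassland"]]

continuous = summarize_continuous(zonal_cells(zones, precip, "A"))
binary = summarize_binary(zonal_cells(zones, forest, "A"), cell_area=0.25)
categorical = summarize_categorical(zonal_cells(zones, landcover, "A"))
print(continuous, binary, categorical)
```

The summary functions expect at least one cell in the zone; a zone with no valid cells would need explicit handling before calling them.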
Data in TerraPop
Completed GIS Boundary Files
GIS Boundary Files In Progress
Beta Raster data
Global Landscapes Initiative (GLI): Yield and harvested area for 175 crops
Global Land Cover 2000 (GLC2000): Land cover data, circa 2000, derived from the VEGETATION instrument on the SPOT 4 satellite
WorldClim: Climate data describing temperature, precipitation, and bioclimatic variables, created from weather station data collected from approximately 1950-2000
New Raster Data
MODIS Land Cover Type (MCD12Q1)
• Yearly land cover data derived from the MODIS instruments on the Terra and Aqua satellites, available for 2001 - 2012
• 500 meter spatial resolution
• Available in five land cover classifications:
  • IGBP
  • University of Maryland
  • LAI/fPAR
  • Net Primary Productivity
  • Plant Functional Type
• Now available on our staging site
Project Status
Currently in project year 4
Prepping a rollout of new data, but you can preview it at http://beta2.terrapop.org
Prepping a new UI for summer 2015
Always creating new data!