Post on 03-Jan-2016
Building Integrated Data Streams for Large-Scale Paleoclimatology & Biogeography
CDSCO
Neotoma DB (www.neotomadb.org)
C4P
Jack Williams, Simon Goring (UW-Madison)
Many Big Questions require assembly of individual records into larger networks
Do global temperatures lead or lag CO2 during deglaciations?
[Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]
How far and fast can species migrate when climates change?
[Figure: Spruce pollen distributions from the last glacial maximum to present (21,000, 15,000, 11,000 and 7,000 years ago, and modern), with ice-covered and no-data areas marked. Williams et al. (2004) Ecological Monographs]
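Assembling individual site records into a network, as in the spruce time-slice maps, means picking from each site the sample that best represents a given moment. The sketch below shows one simple way to do that, assuming each sample carries a site name, coordinates, an age in calibrated years BP and a spruce percentage; the `Sample` structure, the `time_slice` helper and its ±500-year window are hypothetical choices for illustration, and the values are made up.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One dated pollen sample from one site (hypothetical structure)."""
    site: str
    lat: float
    lon: float
    age_bp: float      # assumed: calibrated years before present
    spruce_pct: float  # spruce pollen as % of the pollen sum


def time_slice(samples, target_age, window=500.0):
    """Keep, for each site, the sample closest to target_age, provided it
    lies within +/- window years. Sites with no such sample are omitted,
    i.e. they would appear as 'No Data' on a time-slice map."""
    best = {}
    for s in samples:
        gap = abs(s.age_bp - target_age)
        if gap > window:
            continue
        current = best.get(s.site)
        if current is None or gap < abs(current.age_bp - target_age):
            best[s.site] = s
    return best


# Made-up values for two hypothetical sites, for illustration only.
samples = [
    Sample("A", 45.0, -90.0, 21200, 30.0),
    Sample("A", 45.0, -90.0, 11100, 5.0),
    Sample("B", 40.0, -85.0, 20800, 55.0),
    Sample("B", 40.0, -85.0, 15000, 40.0),
]

for age in (21000, 15000, 11000):
    snapshot = time_slice(samples, age)
    print(age, {site: s.spruce_pct for site, s in sorted(snapshot.items())})
```

Running it prints one line per slice; for 15,000 BP only site B appears, because site A has no sample within 500 years of that age. Doing this across thousands of sites only works if ages and taxa are comparable between records, which is the motivation for the shared repositories discussed next.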
Paleoecological Data: Key characteristics
• ‘Long Tail’: Collected in the field by small scientific teams. Workers vary in data-management expertise, capacity, and interest
• Highly valuable – specimens & samples collected decades ago are still analyzed
• Scientific expertise distributed by proxy type, region, time period, and/or taxonomic group
Community Data Repositories have emerged to tackle these bigger questions
Key Characteristics
• Open Data
• Curated by Community
• Standardized Taxonomy
• Time: Age Controls and Age Models
Paleobiology DB (paleobiodb.org)
Paleobiological Data Consortium (community, geodata, open-source, biodata)
Paleobiology DB, NOW DB, Continental Scientific Drilling Office (CDSCO), Digimorph, NOAA Paleoclimatology, DarwinCore, iDigPaleo, MorphoBank, Neotoma DB, VertNet, Early Career Members-at-Large, rOpenSci, GBIF/BISON, STEPPE, Open Geospatial Consortium, Integrated Earth Data Alliance, iDigBio
C4P
• Share best practices & protocols
• Build compatibility between geo- & bioinformatics
Neotoma Paleoecology Database: Design Concepts
• Spatiotemporal database: species occurrences & abundances in space and time
• Age controls and age models stored
• Centralized IT and distributed scientific governance: Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)
• Open data, accessible via the Explorer, APIs, and the neotoma R package
• Broad user community: paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
• Time: Late Neogene (~last 5 million years); most records span 10⁴–10⁵ yrs
• Space: North America to global
• Datasets: plants & pollen, vertebrates, ostracodes, diatoms, insects, testate amoebae, physical sedimentology
[Figure: Temporal Domains of Paleoecological Databases, with the Neotoma domain marked. Brewer et al. 2012 TREE]
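The design concept of storing age controls alongside age models can be illustrated with the simplest classical age-depth model: linear interpolation between dated depths. This is a minimal sketch, assuming controls are given as (depth in cm, age in calibrated years BP) pairs; `linear_age_model` is a hypothetical helper, not a Neotoma function, and the control values are made up.

```python
from bisect import bisect_left


def linear_age_model(controls):
    """Return age_at(depth) built from (depth_cm, age_bp) age controls by
    linear interpolation between adjacent controls. Depths beyond the
    first or last control are extrapolated from the outermost pair."""
    pts = sorted(controls)
    depths = [d for d, _ in pts]
    if len(pts) < 2:
        raise ValueError("need at least two age controls")
    if len(set(depths)) != len(depths):
        raise ValueError("age controls must have distinct depths")

    def age_at(depth):
        # Index of the upper control of the bracketing pair, clamped so
        # out-of-range depths reuse the first or last pair of controls.
        i = min(max(bisect_left(depths, depth), 1), len(pts) - 1)
        (d0, a0), (d1, a1) = pts[i - 1], pts[i]
        return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)

    return age_at


# Hypothetical age controls: core top plus two dated levels (made-up values).
controls = [(0, 0), (100, 5000), (300, 15000)]
age_at = linear_age_model(controls)
for depth in (50, 200, 400):
    print(depth, "cm ->", age_at(depth), "yr BP")
```

Keeping the controls separate from the model means the model can be rebuilt when the controls are revised (for example, after a new radiocarbon calibration) without touching the proxy data; in practice, dedicated age-depth software with uncertainty estimates would replace the linear interpolation used here.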
Neotoma as Boundary Organization
[Figure: Neotoma DB linking paleoecologists (pollen, vertebrates, insects, diatoms, ostracodes, amoebae, packrat middens); data users (paleoecologists, paleoclimatologists, ecosystem modelers, biogeographers); and informatics & computer scientists (IEDA, GeoWS, Open Core). Flows: data, best practices, shared protocols, new questions.]
Paleodata Workflows: State of the Field
1. Cores Collected
2. Cores Split, Sampled, Logged
3. Proxies Measured by PIs
4. Papers Written
5. Data & Metadata Assembled
6. Data Deposited (Journals, NOAA-Paleo, Neotoma, etc.)
7. Data Synthesized into Regional-Global Studies
8. Metadata Gaps Discovered
9. New Analyses
Consequences:
• Variably documented data
• Challenging project management
• Multiple inefficiencies and sources of data friction
• Synthetic research is hard at anything beyond the site scale
Key Need: Integrated Data Workflows
1. Cores Collected, Tagged with IGSNs (International Geo Sample Numbers), Metadata Logged in Field
2. Cores Split, Sampled, Logged; Samples Tagged with IGSNs; Data Stored in Common Data Structures (Open Core Data)
3. Proxies Measured by PIs; Data Stored in Common Data Formats
4. Papers Written; Embargoed Data Passed to Community Data Repositories (e.g. Neotoma)
5. Data & Metadata Assembled
6. Paper Published; Embargo Lifted from Repository
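The six steps above can be read as an ordered sequence of states that a core's dataset passes through. The sketch below is a hypothetical model (the names `Stage`, `CoreDataset`, `advance`, `add_measurement` and `is_public` are illustrative, not part of Neotoma, Open Core Data or the IGSN system), assuming that each measurement is keyed by the IGSN of the sample it came from and that a repository exposes data only once the embargo is lifted. The identifier strings are made-up placeholders, not real IGSNs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Steps 1-6 of the integrated workflow, in order."""
    COLLECTED = 1   # core collected, IGSN assigned, metadata logged in field
    SAMPLED = 2     # core split/sampled/logged, samples get their own IGSNs
    MEASURED = 3    # proxies measured, stored in a common format
    EMBARGOED = 4   # deposited in a community repository under embargo
    ASSEMBLED = 5   # data & metadata assembled
    PUBLISHED = 6   # paper published, embargo lifted


@dataclass
class CoreDataset:
    core_igsn: str
    field_metadata: dict
    sample_igsns: list = field(default_factory=list)
    measurements: dict = field(default_factory=dict)  # sample IGSN -> values
    stage: Stage = Stage.COLLECTED

    def advance(self, to: Stage) -> None:
        """Move to the next stage; skipping steps is rejected."""
        if to.value != self.stage.value + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        self.stage = to

    def add_measurement(self, sample_igsn: str, values: dict) -> None:
        """Attach proxy values to a sample already registered by IGSN."""
        if sample_igsn not in self.sample_igsns:
            raise KeyError(f"unknown sample {sample_igsn}")
        self.measurements[sample_igsn] = values

    @property
    def is_public(self) -> bool:
        """The repository exposes the data only once the embargo is lifted."""
        return self.stage is Stage.PUBLISHED


# Hypothetical identifiers; not real IGSNs.
ds = CoreDataset("CORE-0001", {"site": "Example Lake"})
ds.advance(Stage.SAMPLED)
ds.sample_igsns += ["SAMP-0001", "SAMP-0002"]
ds.advance(Stage.MEASURED)
ds.add_measurement("SAMP-0001", {"spruce_pct": 35.0})
ds.advance(Stage.EMBARGOED)
print(ds.is_public)   # False: deposited, but still under embargo
ds.advance(Stage.ASSEMBLED)
ds.advance(Stage.PUBLISHED)
print(ds.is_public)   # True: paper published, embargo lifted
```

Because every measurement is keyed by a sample identifier assigned at collection time, data handed to a repository at step 4 stay linked to the physical sample and core, which is the kind of link whose absence surfaces later as "metadata gaps discovered" in the current workflow.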
Current & Future Neotoma Activities
1. Data Uploads
2. Partnership with LacCore/CDSCO et al. to establish common standards & linked data flows
3. neotoma R – establishing data models, integration with R packages
4. User-driven API development
5. New tools for data visualization & exploration
This talk represents the work of many
Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams
C4P Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olsewski, Dena Smith, Mark Uhen, Jack Williams
Neotoma DB, NSF-Geoinformatics, NSF-EarthCube
Eric Grimm