Implementing a National Data Infrastructure: Opportunities for the BIO Community Peter McCartney Program Director Division of Biological Infrastructure CASC - 2015

Post on 21-Jan-2016


Page 1:

Implementing a National Data Infrastructure: Opportunities for the BIO Community

Peter McCartney, Program Director

Division of Biological Infrastructure

CASC - 2015

Page 2:

National Data Infrastructure

Acquisition & Generation

Storage & Curation

Analysis, Modeling & Visualization

Data Policy

Education & Workforce

Foundational Research in Cyber-technologies

Collaboration, Partnerships & Grand Challenges

Page 3:

NITRD Big Data R&D Strategies

Strategy I: Create next generation capabilities by leveraging emerging Big Data foundations, technologies, processes, and policies (Foundational Research)

Strategy II: In addition to the generation of knowledge from data, also emphasize using trustworthy data and resulting knowledge to make decisions and take confident action (Grand Challenges)

Strategy III: Ensure the long term sustainability, access, and development of high value data sets and data resources (NDI)

Strategy IV: Improve the national landscape for Big Data education and training to fulfill increasing demand for both deep analytical talent and analytical capacity for the broader workforce (Ed & Workforce)

Strategy V?: (Data Policy)

Page 4:

Biology as an Information Science

Life exists because of the ability to encode, exchange, and interpret information.

Bioinformatics programs in BIO support:

Development of methods to represent and manipulate biological information, rules, and processes in digital form.

Development of tools and resources to support biological research using computational methods.

Page 5:

[Figure: chart over BIO program areas. The numeric values are not recoverable from this transcript. Labels shown:]

Populations & Community Ecology

Ecosystem Science

Evolutionary Processes

Molecular Biophysics

Research Resources

Genetic Mechanisms

Systematic Biology & Biodiversity

Neural Systems

Cellular Dynamics and Function

Synthetic and Systems Biology

Plant Genome Research Program

Developmental Systems

Physiological and Structural Systems

Page 6:

BIO Grand Challenges

Understanding the Brain

Understanding Biological Diversity

Interactions of the Earth, Climate, and Biosphere

Phenomics: Genotype to Phenotype

Synthetic Biology

Page 7:

CI for Life Sciences Portfolio Balance

[Figure: portfolio balance along three axes]

Life cycle: innovative to sustaining

Scope: general to BIO-specific

Scale: large to small

Page 8:

Implementing a National Data Infrastructure: Acquisition and Generation

Instrumentation: Observing & experimental infrastructure (NEON), new molecular technologies (Cryo EM)

Digitization: Imaging technologies & feature extraction (Bisque, ADBC)

Data Mining: Annotation, knowledgebases (Phenoscape)

Computational approaches: Protein structure prediction (Bio XFEL)

Crowd sourcing: Citizen science networks (eBird)

Page 9:

Implementing a National Data Infrastructure: Curation & Storage

Curation (Science communities)

Standards (metadata, formats, APIs, QA/QC, etc.)

Portals (DataONE, Arabidopsis Information Portal, Biodiversity portals)

Data repositories (PDB, TAIR, Gramene, REDfly)

Storage Infrastructure (Shared infrastructure)

Tools (data management technologies, cyber security, identity management, DOIs, etc.)

Storage capacity (XSEDE partners, campuses, clouds)

Page 10:

Implementing a National Data Infrastructure: Modeling and Analysis

Modeling and Analytic environments: Tools organized around bio research communities (bioKepler, Galaxy, Predictive Ecosystem Analyzer)

Computational gateways: Connecting users to shared infrastructure (iPlant, CIPRES, Neuro Science Gateway)

Page 11:

Advances in Biological Informatics

Innovation Awards – smaller, shorter projects with an emphasis on innovative, high-risk research to develop new approaches.

Development Awards – larger efforts focused on delivery of a database, software tool, or informatics resource.

Sustaining Awards – limited funds for operations and maintenance of critical infrastructure.

Page 12:

Mapping ABI Tracks across NSF

BIO – PDB, NEON, iDigBio, iPlant, GoLife, PGRP, Centers

MPS – Math BIO, CDS&E

ENG – Bioengineering, Synthetic Bio

CISE – IIA, BigData

GEO – Earthcube, GeoInformatics, BCO DMO

Crosscutting – SI2, DIBBS, BioMAPS, CDS&E

International – BBSRC