Implementing a National Data Infrastructure: Opportunities for the BIO Community
Peter McCartney
Program Director
Division of Biological Infrastructure
CASC - 2015
National Data Infrastructure
Acquisition & Generation
Storage & Curation
Analysis, Modeling & Visualization
Data Policy
Education & Workforce
Foundational Research in Cyber-technologies
Collaboration, Partnerships & Grand Challenges
NITRD Big Data R&D Strategies
Strategy I: Create next generation capabilities by leveraging emerging Big Data foundations, technologies, processes, and policies (Foundational Research)
Strategy II: In addition to the generation of knowledge from data, also emphasize using trustworthy data and resulting knowledge to make decisions and take confident action (Grand Challenges)
Strategy III: Ensure the long term sustainability, access, and development of high value data sets and data resources (NDI)
Strategy IV: Improve the national landscape for Big Data education and training to fulfill increasing demand for both deep analytical talent and analytical capacity for the broader workforce (Ed & Workforce)
Strategy V?: (Data Policy)
Biology as an Information Science
Life exists because of the ability to encode, exchange, and interpret information.
Bioinformatics programs in BIO support:
Development of methods to represent and manipulate biological information, rules, and processes in digital form.
Development of tools and resources to support biological research using computational methods.
[Chart: BIO programs. Categories shown: Populations & Community Ecology; Ecosystem Science; Evolutionary Processes; Molecular Biophysics; Research Resources; Genetic Mechanisms; Systematic Biology & Biodiversity; Neural Systems; Cellular Dynamics and Function; Synthetic and Systems Biology; Plant Genome Research Program; Developmental Systems; Physiological and Structural Systems]
BIO Grand Challenges
Understanding the Brain
Understanding Biological Diversity
Interactions of the Earth, Climate, and Biosphere
Phenomics: Genotype to Phenotype.
Synthetic Biology
CI for Life Sciences Portfolio Balance
Life cycle: innovative to sustaining
Scope: general to BIO-specific
Scale: large to small
Implementing a National Data Infrastructure: Acquisition and Generation
Instrumentation: Observing & experimental infrastructure (NEON), new molecular technologies (Cryo EM)
Digitization: Imaging technologies & feature extraction (Bisque, ADBC)
Data Mining: Annotation, knowledgebases (Phenoscape)
Computational approaches: Protein structure prediction (Bio XFEL)
Crowd sourcing: Citizen science networks (eBird)
Implementing a National Data Infrastructure: Curation & Storage
Curation (Science communities)
Standards (metadata, formats, APIs, QAQC, etc.)
Portals (DataONE, Arabidopsis Information Portal, Biodiversity portals)
Data repositories (PDB, TAIR, Gramene, REDfly)
Storage Infrastructure (Shared infrastructure)
Tools (data management technologies, cyber security, identity management, DOIs, etc.)
Storage capacity (XSEDE partners, campuses, clouds)
Implementing a National Data Infrastructure: Modeling and Analysis
Modeling and Analytic environments: Tools organized around bio research communities (bioKepler, Galaxy, Predictive Ecosystem Analyzer)
Computational gateways: Connecting users to shared infrastructure (iPlant, CIPRES, Neuroscience Gateway)
Advances in Biological Informatics
Innovation Awards – smaller, shorter projects with an emphasis on innovative, high-risk research to develop new approaches.
Development Awards – larger efforts focused on delivery of a database, software tool, or informatics resource.
Sustaining Awards – limited funds for operations and maintenance of critical infrastructure.
Mapping ABI Tracks across NSF
BIO – PDB, NEON, iDigBio, iPlant, GoLife, PGRP, Centers
MPS – Math BIO, CDS&E
ENG – Bioengineering, Synthetic Bio
CISE – IIA, BigData
GEO – Earthcube, GeoInformatics, BCO DMO
Crosscutting – SI2, DIBBS, BioMAPS, CDS&E
International – BBSRC