the data commons · 2016-12-19 · recommendation #4: a national cancer data ecosystem for sharing...
![Page 1: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/1.jpg)
The Data Commons An introduction & Overview
BD2K AHM, November 29, 2016
Vivien Bonazzi (ADDS)
![Page 2: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/2.jpg)
Outline
What’s driving the need for a Data Commons?
Development of the Data Commons at NIH
Current Data Commons Pilots
• Next steps
Considerations & Concluding Thoughts
![Page 3: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/3.jpg)
What’s driving the need for a Data Commons?
![Page 4: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/4.jpg)
Convergence of factors
Mountains of Data
Increasing need and support for Data sharing
Availability of digital technologies and infrastructures that support Data at scale
![Page 5: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/5.jpg)
![Page 6: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/6.jpg)
![Page 7: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/7.jpg)
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
![Page 8: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/8.jpg)
Recommendation #4: A national cancer data ecosystem for sharing and analysis.
Create a National Cancer Data Ecosystem to collect, share, and interconnect a broad array of large datasets so that researchers, clinicians, and patients will be able to both contribute and analyze data, facilitating discovery that will ultimately improve patient care and outcomes.
![Page 9: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/9.jpg)
![Page 10: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/10.jpg)
![Page 11: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/11.jpg)
Challenges with Biomedical Data
The Journal Article is the end goal
Data is a means to an end (low value)
Data is not FAIR: Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
![Page 12: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/12.jpg)
What’s Changing?
Digital ecosystems
![Page 13: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/13.jpg)
Development of the NIH Data Commons
![Page 14: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/14.jpg)
Changing the conversation around Data sharing and access
NIH Data Commons
How do we find data, software, standards?
How can we make (large) data, annotations, software, metadata accessible?
How do we reuse data, tools and standards?
How do we make more data machine readable?
How do we leverage existing digital technologies, systems, and infrastructures?
How do we collaborate?
How do we enable a digital ecosystem?
![Page 15: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/15.jpg)
Data Commons enabling data driven science
Enable investigators to leverage all possible data and tools in the effort to accelerate biomedical discoveries, therapies and cures by driving the development of data infrastructure and data science capabilities through collaborative research and robust engineering
Matthew Trunnell, FHC
![Page 16: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/16.jpg)
Data Commons’s
![Page 17: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/17.jpg)
Developing a Data Commons
Treats products of research – data, methods, papers etc. as digital objects
These digital objects exist in a shared virtual space
• Find, Deposit, Manage, Share, and Reuse data, software, metadata and workflows
Digital object compliance through FAIR principles:
• Findable
• Accessible (and usable)
• Interoperable
• Reusable
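The idea of a digital object checked against the four FAIR principles can be sketched in Python. This is only an illustration under stated assumptions: the `DigitalObject` fields, the `fair_gaps` helper, and the one-field-per-principle mapping are hypothetical and are not a Commons compliance metric defined in these slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DigitalObject:
    """A research product (data, software, metadata, workflow) as a digital object.

    All field names are hypothetical, chosen only to mirror the FAIR principles.
    """
    kind: str = "data"
    identifier: str = ""    # Findable: a persistent, unique identifier
    access_url: str = ""    # Accessible: where the object can be retrieved
    media_format: str = ""  # Interoperable: a declared, standard format
    license: str = ""       # Reusable: terms under which it may be reused


def fair_gaps(obj: DigitalObject) -> List[str]:
    """Return the FAIR principles for which the object carries no metadata."""
    checks = {
        "Findable": obj.identifier,
        "Accessible": obj.access_url,
        "Interoperable": obj.media_format,
        "Reusable": obj.license,
    }
    return [principle for principle, value in checks.items() if not value.strip()]


# Example object with an identifier and a location, but no format or license.
example = DigitalObject(
    identifier="doi:10.0000/example",
    access_url="https://example.org/dataset.csv",
)
gaps = fair_gaps(example)
```

A real compliance check would need agreed metrics for each principle; the sketch only shows where such checks would attach to an object's metadata.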
![Page 18: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/18.jpg)
The Data Commons is a framework that supports FAIR data access and sharing and fosters the development of a digital ecosystem
https://datascience.nih.gov/commons
![Page 19: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/19.jpg)
The Data Commons Framework
Software: Services & Tools
SaaS
App store/User Interface
PaaS
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
IaaS
Compute Platform: Cloud
https://datascience.nih.gov/commons
![Page 20: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/20.jpg)
Mapping BD2K Activities and Commons Pilots to the Commons Framework
Digital Object Compliance
Software: Services & Tools
App store/User Interface
BD2K Centers, MODs, HMP & Interoperability Supplements
bioCADDIE/Other Indexing
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
NIH + Community defined data sets
Data
“Reference” Data Sets
User defined data
Cloud credits model (CCM)
Compute Platform: Cloud or HPC
NCI & NIAID Cloud Pilots + GDC
![Page 21: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/21.jpg)
Current Data Commons Pilots
![Page 22: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/22.jpg)
Current Data Commons Pilots
Making large and/or high impact NIH funded data sets and tools accessible in the cloud
Explore feasibility of the Commons Framework
Facilitate collaboration and interoperability
Developing Data and Software indexing methods
Leveraging BD2K Efforts: bioCADDIE and others.
Collaborating with external groups
Provide access to cloud (IaaS) and PaaS/SaaS via credits
Connecting credits to the grants system
![Page 23: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/23.jpg)
Reference Data Sets Pilot
Large, High-Impact Datasets in the Cloud
Vivien Bonazzi
![Page 24: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/24.jpg)
Mapping to the Commons Framework
Large, High-Impact Datasets in the Cloud - Populating the Commons
Software: Services & Tools
App store/User Interface
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
Large, High-Impact Data Sets in the Cloud
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
Compute Platform: Cloud or HPC
![Page 25: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/25.jpg)
Overview: Large, High-Impact Datasets in the Cloud - Populating the Commons
Make large, high impact, NIH funded data sets available in the cloud/commons
Co-locate large datasets and compute power, to improve access, use, re-use, and sharing of data and tools
Kick-start the Commons with Commons-compliant data and tools
Data must adhere to Commons compliance/FAIR principles
Provide indexable test data sets for bioCADDIE (and other indexing efforts)
![Page 26: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/26.jpg)
What will we learn: Large, High-Impact Datasets in the Cloud - Populating the Commons
This pilot project will inform NIH on:
• Which Clouds are most functional, practical, and cost effective?
• What is involved in moving data resources to the Cloud?
• What will it cost?
• How to manage challenges associated with both open access and controlled access data?
• How do we find data and resources across clouds?
• How do we compute across clouds?
![Page 27: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/27.jpg)
Proposed Components: Large, High-Impact Datasets in the Cloud
Biomedical data resources and tools
• Support to migrate large, high-impact datasets and associated tools into multiple cloud providers
• Data and tool sets must be FAIR
Cloud Infrastructure
• Support for cloud storage and architectural engineering to support data and tools
Coordination
• Facilitate activities across the biomedical data resources and cloud providers
• Development of market place/app store approaches
• Auth: Authorization & Access controls
• Tracking metrics (cost, usage etc.) and impact of the overall project
![Page 28: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/28.jpg)
Reference Data Sets – Next Steps
NIH Data Task Force
• Chaired by Francis Collins
• Involves many NIH ICs
• Developing some shorter term preliminary pilots for larger NIH funded data sets in the cloud
• Expect to see some announcements in Jan/Feb 2017
RFI – engage in dialogue with the community
• Planned Winter 2017
FOAs – Supporting large high impact data sets in the cloud
• Spring 2017
![Page 29: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/29.jpg)
Commons Framework Pilots
Exploring feasibility of the Commons Framework: Software and Services layer
Valentina Di Francesco
![Page 30: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/30.jpg)
Commons Framework Pilots (CFPs)
Exploring feasibility of the Commons Framework
Facilitating connectivity, interoperability and access to digital objects
Providing digital research objects to populate the Commons
![Page 31: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/31.jpg)
Commons Framework Pilots

| PI | Parent grant’s IC | Project description |
| --- | --- | --- |
| Toga | NIBIB | • Cloud-hosted data publication system • Allows the automatic creation and publication of data in a personalized data repository |
| Musen | NIAID | • Smart APIs – improved handling for metadata within APIs • Ontological support for metadata within an API • Improving smart API discoverability: a registry of APIs |
| Han | NIGMS | • Docker container hub for BD2K community • Docker containers for genomic analysis applications and pipelines • Benchmark, Evaluation & best practices |
| Cooper/Kohane | NHGRI | • Cloud based authenticated API access and exchange of causal modeling data, tools + genomic and phenomic data (PIC) • Docker containers for CCD tools available in AWS |
| Haussler | NHGRI | • Secure sharing of germline genetic variations for a targeted panel of breast cancer susceptibility genes and variations • (GA4GH) API: being able to query this data and metadata |
| Ohno-Machado | NHLBI | • Development of an ecosystem for repeatable science • Easy reuse of data AND software; tracking of provenance • Use of container technologies for software and data reuse |
| White | NHGRI | • The entire HMP1 data set made accessible on AWS • Analysis tools for microbiome data in AWS |
| Ma’ayan | NHLBI | • A Cloud-Based Microscopy Imaging Commons Portal with microscopy data and metadata |
| Sternberg | NHGRI | • Development of a cloud-based literature curation system for specific curation tasks of the collaborating sites • An API to provide programmatic access to the relevant papers in PMC |
| MODs PIs | NHGRI | • Development of a common data model for the MODs • Development of APIs accessing data across the MODs |
![Page 32: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/32.jpg)
Commons Framework Pilots
• APIs
• Containerization: Docker containers, guidelines, registry store
• Workbenches, Connectors
• Indexing
• Market Place/App Store
![Page 33: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/33.jpg)
Mapping the Commons Framework PILOTS to the Commons Framework
Software: Services & Tools
App store/User Interface
White - HMP
Cooper
Musen
Toga
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
Ohno-Machado
Han
Haussler
Sternberg
MODs
Data
“Reference” Data Sets
User defined data
Ma’ayan
Compute Platform: Cloud or HPC
![Page 34: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/34.jpg)
Commons Framework Pilots: Updates
Sept. 2015 – First set of CFPs awarded
Nov. 2015 - CFPs participated in the AHM and the Commons breakout session
Feb. 2016 - Established Commons Framework Working Group (CFWG)
• CFWG members: Pilots’ PIs and/or technical leads; a few PIs of the BD2K interoperability projects
• Meeting in person on March 1, 2016
![Page 35: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/35.jpg)
Commons Framework Pilots: Updates
March 2016 – CFPs meeting in person
• To develop an initial plan for the implementation of the Commons Framework
• Meeting presentations here
• A manuscript describing the outcomes of the meeting was submitted
• Established the Commons Framework Working Group (CFWG) and sub-WGs on the following topics:
  • FAIRness Metrics (Neil McKenna & Michel Dumontier)
  • Data-object registry (Lucila Ohno-Machado, Michel Dumontier, Wei Wang)
  • Interoperability of APIs (Michel Dumontier)
  • Workflow sharing and docker registry (Umberto Ravaioli & Brian O’Connor)
  • Commons Framework Publications (Owen White)
Nov 28, 2016 – Held a CFWG meeting in person
These groups will present a report of their activities at the Commons Session tomorrow at 10:30am
![Page 36: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/36.jpg)
Commons Framework WG - Next Steps
GET INVOLVED: See Valentina Di Francesco or WG leads for details
A broad announcement to the BD2K research community went out in late summer – we are seeking more participants
Contribute to the implementation of the Commons Framework
Suggest other scientific areas of interest that need coordination
Generate guidelines that all of our peers will use as we begin to jumpstart the NIH Commons
Participate in meetings of the CFWG and hear the latest news
![Page 37: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/37.jpg)
Commons Framework – Next Steps
FOA: Support investigator-initiated projects to further develop the Data Commons Framework
• Could leverage and expand upon resources developed with the Reference data sets
• Planned Fall 2017
FOA: Making existing data and tools Commons Compliant/FAIR
• Competitive Supplements to existing NIH Awards.
• Provide support to existing projects to make current digital resources FAIR & Commons Compliant
• Digital resources could include: data, analytical software, or workflows
• Planned Fall 2017
![Page 38: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/38.jpg)
Resource Search & Indexing
Discoverability of data and software
Ian Fore, Ron Margolis, Alison Yao, Claire Schulkey, Dawei Lin
![Page 39: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/39.jpg)
Mapping to the Commons Framework
Large, High-Impact Datasets in the Cloud - Populating the Commons
Software: Services & Tools
App store/User Interface
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
Indexing
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
Compute Platform: Cloud or HPC
![Page 40: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/40.jpg)
An Indexing Ecosystem for the Commons:
a virtual environment for ‘FIND’
Enable biomedical research by providing scientists with the ability to FIND digital resources
Establish a mature resource discovery tool(s) that can be sustained as long as the need for it exists
Focus on characteristics of the tool as infrastructure
• Maintains a defined level of service
• Contribute to a Commons that is reliable, available, easy to use, and adaptable
![Page 41: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/41.jpg)
Current Activities
Identify indexing activities in and outside NIH
• BD2K: bioCADDIE, Centers of Excellence
• ICs: NLM, NCI, NHGRI, other
• Non-BD2K: Elixir (EBI), Publishers (Elsevier), Repositories, schema.org
Compare ongoing activities and identify needs
• Benchmarking
• Identify gaps in strategy
• Dimensions to consider: Content, Metadata, Platform/Technology
Coordinate with other BD2K PMWGs
• Standards
• Specific Center WGs
![Page 42: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/42.jpg)
Cloud Credits Model
George Komatsoulis
![Page 43: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/43.jpg)
Mapping to the Commons Framework
Large, High-Impact Datasets in the Cloud - Populating the Commons
Software: Services & Tools
App store/User Interface
Services: APIs, Containers, Indexing, scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
Compute Platform: Cloud or HPC
Cloud Credits Pilot
![Page 44: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/44.jpg)
![Page 45: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/45.jpg)
How do credits work from the point of view of an investigator?
Investigators receive credits worth a certain amount (in dollars) that can be used at the conformant provider(s) of their choice
Credits are pre-purchased and applied to the account of the investigator with the relevant provider(s)
As the investigator uses services with a conformant provider, the provider debits the value of the investigator’s usage against the pre-loaded credits
INVESTIGATORS ARE NOT BILLED BY PROVIDERS AS LONG AS THEY DO NOT EXCEED THEIR CREDIT ALLOCATION.
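The debit mechanism described on this slide can be sketched in Python. This is a minimal illustration under stated assumptions: `CreditAccount`, its fields, and the use of integer cents are hypothetical, and the slides do not say how a provider handles usage beyond the allocation, so the sketch only reports the uncovered amount.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CreditAccount:
    """Pre-purchased credits held with one conformant provider (hypothetical model)."""
    investigator: str
    provider: str
    balance_cents: int  # remaining credit value, in US cents
    ledger: List[Tuple[str, int]] = field(default_factory=list)

    def debit(self, description: str, cost_cents: int) -> int:
        """Debit usage against the credits and return the part NOT covered by them."""
        if cost_cents < 0:
            raise ValueError("usage cost cannot be negative")
        covered = min(cost_cents, self.balance_cents)
        self.balance_cents -= covered
        self.ledger.append((description, cost_cents))
        # Zero while the investigator stays within the credit allocation.
        return cost_cents - covered


# An investigator with $5,000.00 of credits at a single provider.
account = CreditAccount("investigator-1", "provider-a", balance_cents=500_000)
overage_first = account.debit("compute hours", 120_000)  # within the allocation
overage_second = account.debit("storage", 400_000)       # more than what is left
```

Amounts are kept as integer cents so that repeated debits do not accumulate floating-point rounding errors.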
![Page 46: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/46.jpg)
Commons Credits Model Pilot
3 year pilot to test this business model to facilitate researcher use of cloud resources (enhance data sharing and potentially reduce costs).
Contract with the CMS Alliance to Modernize Healthcare (CAMH) Federally Funded Research and Development Center (FFRDC) managed by the MITRE Corporation
• FFRDCs are special purpose, government-owned but contractor-managed entities that meet R&D needs that can’t be well managed by traditional grants and contracts
• Examples: National Labs and organizations like RAND
Pilot will not directly interact with the existing grant system.
• Instead is modeled on the mechanisms being used to gain access to NSF and DOE national resources (HPC, light sources, etc.)
The only required qualification for applying for credits will be that the investigator must have an existing NIH grant
![Page 47: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/47.jpg)
Commons Credits Model Pilot
Current List of Approved Vendors
• DLT = Amazon Web Services Reseller
• IBM
• Onix = Google Reseller
• Broad and ISB NCI Cloud Pilots accessible via Google
• Two more approved but negotiating participation agreement
First batch of credits issued Sep 29, 2016
• 8 Investigators (cohort 1) that are part of an ‘alpha test’
• Only IBM/AWS at the time
• 93% AWS, 7% IBM
• First credits have been used, usage information coming
First “production” credit request period opening this month
![Page 48: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/48.jpg)
Considerations and Concluding Thoughts
![Page 49: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/49.jpg)
Considerations
Communication
Metrics – Understanding and accounting of data usage patterns
Cost
• Cloud Storage
• Pay for use cloud compute (NIH credits pilot)
• Indirect costs for cloud
Hybrid Clouds – Institution (private) and commercial (public) clouds
Managing Open vs Controlled access data
• Auth: single sign on - dreams/nightmares?
Archive vs Working Copies of data
Interoperability with other Commons (clouds)
![Page 50: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/50.jpg)
Standards – Metadata, UIDs, APIs
Discoverability – Finding digital objects across clouds
Interfaces – For users with different needs and capabilities
Consent – Reconsenting data, Dynamic consents?
Policies
• Data sharing policies that are useful and effective
• Keep pace with use of technology (e.g. dbGaP data in the Cloud)
Incentives
• Access to, and shareability of FAIR Data as part of NIH grant review criteria
Governance – Community involvement in governance models
Sustainability – Long term support
![Page 51: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/51.jpg)
Summary
We need an unprecedented level of convergence and collaboration to drive biomedical science to the next level.
Supporting this model of data-intensive collaborative science requires a shift in academic research culture and new investments in data infrastructure and capabilities.
Matthew Trunnell, FHC
![Page 52: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/52.jpg)
Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina Di Francesco
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Matthew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
![Page 53: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/53.jpg)
Acknowledgements - CFPs
• NIH CFPs WG: Valentina Di Francesco, Sam Moore, Vivien Bonazzi, Allen Dearry, Maria Giovanni, Susan Gregurick, Weiniu Gan, James Luo, Stacia Friedman-Hill, Ajay Pillai, Leslie Derr, Debbie Sinmao, Eric Choi, Claire Schulkey, George Komatsoulis
• CFWG: Owen White, Neil McKenna, Michel Dumontier, Umberto Ravaioli, Brian O’Connor, Lucila Ohno-Machado, Wei Wang, all the other members
![Page 54: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/54.jpg)
Acknowledgements - Credits Model
• ADDS Office: Vivien Bonazzi, Phil Bourne, Jennie Larkin, Mark Guyer
• MITRE: Ari Abrams-Kudan, Wenling (Eileen) Chang, Peter Gutgarts, Lynette Hirschman, William Kim, Eldred Rubeiro, Bruce Shirk, David Tanenbaum, Lisa Tutterow
• Grant Thornton: Katie Beringer, Mike Clifford, Tamara Reynolds
• NIH: Tanja Davidsen (NCI), Valentina Di Francesco (NHGRI), Susan Gregurick (NIGMS), David Lipman (NCBI), Vivek Navale (CIT), Jim Ostell (NCBI), Debbie Sinmao (CIT), Nick Weber (NIAID)
• NITRD: Peter Lyster
![Page 55: The Data Commons · 2016-12-19 · Recommendation #4: A national cancer data ecosystem for sharing and analysis. Create a National Cancer Data Ecosystem to collect, share, and interconnect](https://reader034.vdocuments.us/reader034/viewer/2022042301/5ecba77b704f51380256f095/html5/thumbnails/55.jpg)
Stay in Touch
QR Business Card
@Vivien.Bonazzi
Slideshare
Blog (Coming soon!)