CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
Matthew Vaughn @mattdotvaughn
Director, Life Sciences Computing, TACC
Co-PI, CyVerse, Araport, Jetstream Cloud
9/8/2016
OVERVIEW
• WHAT IS CYVERSE?
• HOW IS IT TRANSFORMATIONAL FOR LIFE SCIENCES RESEARCH?
• HOW DOES IT FIT INTO THE BIGGER SCHEME?
• WHAT DIRECTIONS AND CHALLENGES ARE IN ITS FUTURE?
CYVERSE IS A CYBERINFRASTRUCTURE
Vision: Transforming science through data-driven discovery
Mission: To design, develop, deploy, and expand a national cyberinfrastructure for life science research, and to train scientists in its use
SUPPORTED BY THE NSF BIO DIRECTORATE
• Division of Biological Infrastructure
• $100 Million, 10-year investment
• CyVerse resources are
– Freely available to the community
– Intended to spur national and international collaboration for research and education
iPlant (2008): Empowering a New Plant Biology
iPlant (2013): Cyberinfrastructure for Life Science
CyVerse (2016): Transforming Science Through Data-Driven Discovery
DBI-0735191, DBI-1265383
RESEARCH TEAMS NEED THIS
Store, organize, share primary data
Do basic analysis
Store, organize, share data products
Generate and explore hypotheses
Share analysis code with the scientific public
Integrate results from new experiments
Publish data alongside plots, visualizations and analytical tools
BUT END UP DEALING WITH THIS
Data lifecycle management
Fine-grained permission management
Discoverability
Version control
Taming promising new analysis codes (usually based on immature technology)
Paying for storage, cycles, and consulting
Making their science reproducible
CYVERSE PRODUCT MATRIX
Atmosphere: User-provisioned, highly configurable cloud computing environment tailored for the sciences
Discovery Environment: Web-accessible analysis workbench and gateway to national HPC infrastructure (XSEDE)
Bisque: Software for managing, analyzing, and visualizing high-throughput imaging data
Data Store: Scalable data storage for managing and sharing data across CyVerse’s CI and external data resources
Science APIs: Automation interfaces that connect data and computation for rapid integration of external resources. Also used as a graduate teaching platform.
DNA Subway: Classroom-friendly bioinformatics teaching platform
Powered by CyVerse: Third-party applications built on CyVerse’s foundational services and
Welch et al. 2013
Bioinformatics Specialist
Computing Professional
Bench Scientist
EMPOWER USERS AT ALL LEVELS
Help them avoid data and operations silos
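[Figure: CyVerse layered architecture. Layers, top to bottom: science applications; domain-specific services; established software and CI; physical resources. Foundational services: federated storage, national CI, virtualization, job scheduling, single sign-on. Axes: ease of use, ease of re-use.]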
IMPACTS
• 500+ publications
• >2 PB of user data stored
• 40k+ registered users
• Millions of compute hours annually
• Hundreds of trainees
CYVERSE IS A HUB IN A RICH & COLLABORATIVE ECOSYSTEM
• Using
• Collaborating
• Contributing
• Supporting
• Inventing
CURRENT INITIATIVES
Enabling Data-Driven Discovery. Providing Advanced Training to Researchers. Removing Barriers to Reproducible Science.
CyVerse Data Commons
Portable Science Lab
Intensive Engagement
CYVERSE DATA COMMONS
Make research data discoverable and reusable. Ensure it ends up stored in its natural repository.
• CyVerse Data Store: publish in place simply by sharing
• Staging Area: curate, format, and describe metadata
• Data Commons Portal: published snapshot with DOI and open access
• Natural Repositories: facilitated deposit to NCBI SRA, GenBank, and more
PORTABLE SCIENCE LAB
Continue adoption of technologies to describe, encapsulate, and share research code and data.
Virtual machines, Linux containers, Web Service APIs, Workflow Standards
Integrated via Interactive, Narrative Notebooks
INTENSIVE ENGAGEMENT
Extended Collaborative Support
Consultation and Support Forums
Hands-on Training and Tutorials
Enhanced Support Tooling
Empower Researchers to Embrace and Extend CyVerse
SUMMARY
• CyVerse is a reference model for cyberinfrastructure that is already being extended to other disciplines
• CyVerse provides a vertically integrated, scalable data-to-discovery cyberinfrastructure that leverages existing federal and state investments to transform life science research
• CyVerse is driving technological and operational innovation via a web of interactions and collaborations with other projects, platforms, and infrastructures.
KEY CHALLENGE - CYVERSE VALUE PROPOSITION
“Are you still going to be around in 3 years?”
“Why did my analysis fail? Don’t you have big computers?”
“Shouldn’t we just go to Amazon Web Services?”
“I don’t want my students spending time learning computing.”
“Why aren’t you working on X?”