Keiichiro Ono, UCSD Trey Ideker Lab, Cytoscape Core Team
Lab Meeting Aug 4, 2015
Building Reproducible Network Data Analysis / Visualization Workflows
Problems We are Trying to Solve
- Complex software stack for data analysis
  - Setting up an environment for data analysis is not trivial, and it is time-consuming
  - Python 3.x or 2.x / NumPy / SciPy / Cython modules
  - R / Bioconductor / packages
  - OS version, etc.
- Automation
  - Point-and-click operations are not reproducible
  - Applying different layouts to 100 networks by hand is possible, but ridiculous
  - Sharing recipes (= common workflows) is hard
- Integration with external computing resources
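The "100 networks by hand" problem can be scripted. The following is a minimal sketch, assuming Cytoscape is running with cyREST on its default local address (http://localhost:1234/v1) and that layouts are applied through the `apply/layouts/{algorithm}/{network SUID}` resource; verify both against your installed cyREST version. The helper names and the example SUIDs are hypothetical.

```python
import urllib.request

# Assumption: cyREST is listening on its default port on this machine.
BASE_URL = "http://localhost:1234/v1"


def layout_url(base_url, algorithm, network_suid):
    """Build the URL that asks cyREST to apply one layout to one network."""
    return f"{base_url}/apply/layouts/{algorithm}/{network_suid}"


def plan_layout_calls(base_url, assignments):
    """Turn a {network SUID: layout name} mapping into a list of URLs.

    Keeping the plan as plain data means it can be reviewed, saved in a
    notebook, and re-run later.
    """
    return [layout_url(base_url, alg, suid) for suid, alg in assignments.items()]


def run_layout_calls(urls):
    """Send each planned request (requires a running Cytoscape + cyREST)."""
    for url in urls:
        with urllib.request.urlopen(url) as response:
            response.read()
```

Usage, with hypothetical SUIDs: `run_layout_calls(plan_layout_calls(BASE_URL, {52: "circular", 80: "force-directed"}))`. The same two lines work for 2 networks or 100.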
Goal: Reproducible, Scalable Dry Experiments
- Docker - Data analysis environment in a portable container
- GitHub - Source code sharing
- Jupyter Notebook - Your electronic lab notebook
- cyREST - RESTful API module for Cytoscape
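To make the cyREST item concrete, here is a rough sketch of sending a network from a notebook to Cytoscape. It assumes cyREST accepts a Cytoscape.js-style JSON document via POST to the `networks` resource at the default address http://localhost:1234/v1. The function names are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request


def to_cyjs(name, edges):
    """Convert an edge list of (source, target) pairs to Cytoscape.js JSON."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n}} for n in node_ids],
            "edges": [{"data": {"source": s, "target": t}} for s, t in edges],
        },
    }


def build_create_request(network, base_url="http://localhost:1234/v1"):
    """Prepare (but do not send) the POST request that creates the network."""
    body = json.dumps(network).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/networks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With Cytoscape running, `urllib.request.urlopen(build_create_request(to_cyjs("demo", [("a", "b")])))` sends the request. Changing `base_url` is all that is needed to point the same notebook at a Cytoscape instance on another host.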
Goal: Reproducible, Scalable Dry Experiments
Workflow stages: Data Preparation, Analysis, Visualization
Scenario 1: Everything on your Workstation
Notebook Server
Your Jupyter Notebook
Scenario 2: Workstation + Cloud
Notebook Server
Your Jupyter Notebook
Example: Community-Detection + Edge-Weighted Layout
Source Code: bit.ly/1P4LUFU
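The linked source is the authoritative version of this example. As a rough illustration of the idea only, the sketch below assumes community memberships have already been computed by some detection step (not shown) and turns them into edge weights that an edge-weighted layout could use. The function name, the weight values, and the assumption that larger weights pull nodes closer together are all hypothetical; how a layout interprets weights depends on its settings.

```python
def weight_edges(edges, membership, intra=1.0, inter=0.1):
    """Attach a weight to each edge based on community membership.

    edges:      list of (source, target) pairs
    membership: dict mapping node id -> community label (assumed given)
    intra:      weight for edges inside one community
    inter:      weight for edges that bridge two communities
    """
    weighted = []
    for source, target in edges:
        same = membership[source] == membership[target]
        weighted.append(
            {
                "data": {
                    "source": source,
                    "target": target,
                    "weight": intra if same else inter,
                }
            }
        )
    return weighted
```

The result is a list of Cytoscape.js-style edge entries, so the `weight` column travels with the network when it is sent to Cytoscape and can then be selected as the edge attribute for the layout.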
Demo
TODO
- Integration with Cyberinfrastructure (CI)
- R Wrapper - https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape
- More realistic workflows / pipelines
Resources
- cyREST - http://apps.cytoscape.org/apps/cyrest
- py2cytoscape - https://pypi.python.org/pypi/py2cytoscape
- RCy3 - https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape