Exploring Architecture Options for a Federated, Cloud-based Systems Biology Knowledgebase

Ian Gorton, Jenny Liu, Jian Yin


Page 1: Exploring Architecture Options for a Federated, Cloud ...salsahpc.indiana.edu/CloudCom2010/slides/PDF/Exploring Architectu… · Sharing data, software tools, and computing resources

Exploring Architecture Options for a Federated, Cloud-based Systems Biology Knowledgebase

Ian Gorton, Jenny Liu, Jian Yin


Systems Biology

Systems biology:
- Integrated study of organisms as a whole
- Obtains, integrates, and analyzes complex data from multiple experimental sources using interdisciplinary tools

Requirements:
- Large amounts of data
- Different types of tools
- Large amounts of computing resources


Systems Biology Knowledgebase

Drawbacks of the current approach:
- The barrier to entry can be high
- Little reuse or sharing of data and tools, which leads to wasteful, repetitive effort to develop similar software tools
- Results are hard to replicate

Seamless sharing and integration of data and software tools across multiple institutions is attractive.

The goal of the Systems Biology Knowledgebase is to exploit cloud computing technologies to enable sharing of data and software tools.


Why Cloud Computing

- Enables sharing of data and software tools
- Dynamic allocation of computing resources
- Many software tools can be converted to run on top of cloud computing services such as Hadoop


Outline

- Introduction
- System architecture
- Prototype of selected components
- Case study: Hadoop-based systems biology tools
- Conclusion


Centralized versus Federated

Advantages of the centralized approach:
- Ease of integration
- More efficient allocation of computing resources

However, many institutions may want to retain control of their data and tools.

Federated approach:
- Leverages specialized computing resources across organizations


Architecture Overview

[Figure: layered architecture diagram. Labels recovered from the figure:]
- Layers: User Access Layer, Semantic Access Interface Layer, Federation Layer, Infrastructure Layer
- Clients: workflow tools, web portals, desktop apps; cURL scripts; PHP, Java, Python
- Kbase core; RESTful API layer; middleware and workflow utilities; database adaptors; data and resource directories
- Kbase Interface Layer (for flexible federation of Kbase data and compute resources)
- Example federated resources: cloud storage (e.g. S3), cloud computations (e.g. EC2), HPC-based computations (e.g. clusters), cloud-based data APIs


Components

- Location-independent components
- Uniform interfaces
- Easy composition
- Execution can be monitored with jBPM
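As a rough illustration of what uniform interfaces and easy composition could look like in code, here is a minimal Java sketch. Everything in it is hypothetical (the `Component` interface, the `of` and `run` helpers, and the log list that stands in for execution monitoring); it is not taken from Kbase or jBPM. The point is only that a caller composing components through one interface does not need to know whether a given component runs locally or wraps a remote service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: every name here is illustrative, not part of Kbase.
class ComponentPipelineSketch {

    /** Uniform interface: each component takes a payload and returns a payload. */
    interface Component {
        String name();
        String process(String input);
    }

    /** Wraps a plain function as a named component (could equally wrap a remote call). */
    static Component of(String name, UnaryOperator<String> fn) {
        return new Component() {
            public String name() { return name; }
            public String process(String input) { return fn.apply(input); }
        };
    }

    /** Runs components in order; the log is a stand-in for execution monitoring. */
    static String run(List<Component> pipeline, String input, List<String> log) {
        String current = input;
        for (Component c : pipeline) {
            current = c.process(current);
            log.add(c.name());
        }
        return current;
    }

    public static void main(String[] args) {
        List<Component> pipeline = Arrays.asList(
            of("trim", String::trim),
            of("upper", String::toUpperCase));
        List<String> log = new ArrayList<>();
        System.out.println(run(pipeline, "  atggcc ", log));
        System.out.println(log);
    }
}
```

Because the pipeline only depends on the interface, swapping a local step for a remote one would not change the composition code.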


Secure Communication

Security must be ensured for communication across institutions:
- Only SSL traffic is allowed through the firewall
- Requiring all components to use SSL could be difficult
- Use SOCKS to minimize code changes to components


Example

Original code:

    URL url = new URL(urlname);

Modified code (all classes are from java.net):

    // Route the connection through a SOCKS proxy listening on localhost:8182
    SocketAddress addr = new InetSocketAddress("localhost", 8182);
    Proxy proxy = new Proxy(Proxy.Type.SOCKS, addr);
    URL url = new URL(urlname); // Create the URL
    URLConnection uc = url.openConnection(proxy);
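Where even the per-connection change above is unwanted, one possible alternative is to set the standard Java networking properties `socksProxyHost` and `socksProxyPort` for the whole JVM. The sketch below assumes the same proxy endpoint as the example (localhost, port 8182); the class and method names are illustrative, and whether a JVM-wide setting is appropriate depends on what else the component connects to.

```java
// Sketch: JVM-wide SOCKS settings via the standard java.net system properties.
// The class and method names are illustrative only.
class SocksDefaults {

    /** Points the JVM's default socket handling at the given SOCKS proxy. */
    static void useSocksProxy(String host, int port) {
        System.setProperty("socksProxyHost", host);
        System.setProperty("socksProxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        useSocksProxy("localhost", 8182); // same endpoint as the example above
        System.out.println(System.getProperty("socksProxyHost")
            + ":" + System.getProperty("socksProxyPort"));
    }
}
```

The same properties can be passed on the command line with `-DsocksProxyHost=localhost -DsocksProxyPort=8182`, which leaves the component's source untouched.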


Prototype

[Figure: prototype workflow diagram. Labels recovered from the figure:]
- Data: GenBank; proteomics data (.dta files); parameter file; peptide file; protein FASTA file (.faa file)
- Processing: script that translates DNA (.fna file) in six frames; Polygraph; post-process script
- Visualization: VESPA (advanced visualizations); visualization tool; visualization at a user’s local workstation
- Data-movement steps: query & copy the .fna file; query & copy the .faa file; query & copy the .fna file and the .gbk file; query & copy the .dta files
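The prototype includes a script that translates DNA in six frames. The script itself is not shown in the slides, so the following is only a minimal Java sketch of the general technique, assuming the standard genetic code: three reading frames on the forward strand and three on the reverse complement. The class and method names are hypothetical, and FASTA parsing of the .fna and .faa files is left out.

```java
// Sketch of six-frame translation with the standard genetic code.
// Not the prototype's actual script; names are illustrative.
class SixFrameTranslator {
    // Codons are indexed with bases ordered T, C, A, G at each position.
    private static final String BASES = "TCAG";
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Reverse complement; characters other than A, C, G, T become N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Translates from offset 0, 1, or 2; stop codons become *, unknown codons X. */
    static String translate(String dna, int offset) {
        String s = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = offset; i + 3 <= s.length(); i += 3) {
            int a = BASES.indexOf(s.charAt(i));
            int b = BASES.indexOf(s.charAt(i + 1));
            int c = BASES.indexOf(s.charAt(i + 2));
            protein.append(a < 0 || b < 0 || c < 0 ? 'X' : AMINO.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    /** Frames 0-2 use the forward strand; frames 3-5 use the reverse complement. */
    static String[] sixFrames(String dna) {
        String rc = reverseComplement(dna);
        String[] frames = new String[6];
        for (int f = 0; f < 3; f++) {
            frames[f] = translate(dna, f);
            frames[f + 3] = translate(rc, f);
        }
        return frames;
    }

    public static void main(String[] args) {
        for (String frame : sixFrames("ATGGCC")) {
            System.out.println(frame);
        }
    }
}
```

For the input `ATGGCC`, frame 0 reads ATG GCC and yields `MA`; the reverse complement is `GGCCAT`, so frame 3 reads GGC CAT and yields `GH`.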


Hadoop Based Polygraph

- Polygraph is a proteomics application that identifies peptides from mass spectrometry (MS) data
- Initially implemented with MPI
- Loosely coupled, and therefore suitable for Hadoop
- Only a small amount of effort was needed to adapt it to run on top of Hadoop
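To show why a loosely coupled workload maps naturally onto Hadoop, here is a toy Java sketch of the map/reduce shape. It deliberately avoids the Hadoop API and does not reproduce Polygraph's scoring: the "score" is just the absolute difference between two made-up mass values, and all names and numbers are hypothetical. The map step handles each spectrum independently of the others, and the reduce step keeps the best candidate per spectrum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the map/reduce shape only; NOT Polygraph's scoring or the Hadoop API.
class PeptideMapReduceSketch {

    /** One (spectrum, candidate) pair emitted by the map step. */
    static final class Match {
        final String spectrumId;
        final String peptide;
        final double error;

        Match(String spectrumId, String peptide, double error) {
            this.spectrumId = spectrumId;
            this.peptide = peptide;
            this.error = error;
        }
    }

    /** Map: score one spectrum against every candidate, independently of other spectra. */
    static List<Match> map(String spectrumId, double observedMass, Map<String, Double> candidates) {
        List<Match> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : candidates.entrySet()) {
            out.add(new Match(spectrumId, e.getKey(), Math.abs(observedMass - e.getValue())));
        }
        return out;
    }

    /** Reduce: for each spectrum id, keep the candidate with the smallest error. */
    static Map<String, String> reduce(List<Match> matches) {
        Map<String, Match> best = new LinkedHashMap<>();
        for (Match m : matches) {
            Match current = best.get(m.spectrumId);
            if (current == null || m.error < current.error) {
                best.put(m.spectrumId, m);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Match m : best.values()) {
            result.put(m.spectrumId, m.peptide);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> candidates = new LinkedHashMap<>();
        candidates.put("pepA", 800.0); // made-up values for illustration
        candidates.put("pepB", 900.0);

        List<Match> emitted = new ArrayList<>();
        emitted.addAll(map("s1", 801.5, candidates));
        emitted.addAll(map("s2", 899.0, candidates));
        System.out.println(reduce(emitted));
    }
}
```

Since no map call depends on another, the calls could run on one machine or be spread across many without changing the code, which is the property the slides rely on.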


Running Polygraph


Experimental Results


Comparison

- The MPI-based implementation is highly tuned and thus more efficient
- The Hadoop-based approach is more flexible
  - Most cloud computing providers offer a Hadoop service
  - Varying amounts of computing resources can be used without changing code
    - Can produce results even with one machine
    - More machines can speed up the computation
- Many systems biology applications can be adapted to the MapReduce paradigm


Conclusion

- Sharing data, software tools, and computing resources is essential for systems biology
- Cloud computing can provide an ideal platform
  - Many applications are loosely coupled and can be adapted to run in cloud computing environments
- A federated approach provides more flexibility
- Uniform interfaces enable easy integration
