Page 1: Is there an app for that?

Is there an app for that? Challenges in scalable analysis for Life sciences

Nirav Merchant
UA BioComputing + iPlant
Arizona Research Laboratories
University of Arizona
http://bcf.arl.arizona.edu/

Page 2

Topic Coverage
• Formula for success (and failure)
• Flavors of Bio-information
• What is iPlant?
• Typical Non-NGS workflow
• Data life cycle issues (some)
• Application life cycle issues (some)
• Why “app”?

Page 3

[Figure: “+ =” formula graphic]

Simple Formula

Page 4

The Reality

+ Perl, Python, Java, Ruby, Fortran, C, C#, C++, R, Matlab, etc.
+ Amazon, Azure, Rackspace, Campus HPC, XSEDE, etc.
+ lots of glue…

Page 5

[Figure: “+ =” formula graphic]

Simple Formula

Page 6

Life science: Going across scales

Page 7

Putting it all to work

Wayne Stayskal, The Tampa Tribune

Page 8

The iPlant Collaborative
Cyberinfrastructure for the Plant Sciences

• The iPlant CI is designed as infrastructure.
• This means it is a platform upon which other projects can build.
• Use of the iPlant infrastructure can take one of several forms:
  – Storage
  – Computation
  – Hosting
  – Web Services
  – Scalability

Page 9

For a challenge as broad as “plant science,” focusing on specific applications/tools means chasing a moving target, and is never enough.

It is most important to build a platform that can support diverse and constantly evolving needs. “Cyberinfrastructure” is, in fact, infrastructure. The platform can lift all the apps, rather than select winners and losers.

“The useful lifetime of our analysis toolchains is now 6 months”

-Matthew Trunnell, Broad Institute

Page 10

End Users

Computational Users

TeraGrid / XSEDE

Page 11

BioInformation :: Data Flavors
• Sequences
• Structures
• Images
• Video
• Audio
• Pathways (graphs)
• Text (Publications)
• Traces
• Combination (e.g., Video & Traces)
• And much more …

Page 12

Life scientist :: Data Wrestler

• Volume of data is increasing
• Resolution of data is increasing
• Number of data repositories is increasing
• Ever-increasing analysis options
• Demands to share data and collaborate (team science)
• Do you know where your data is? (And your collaborators’ data!)

Page 13

Systems Biology
• Genomics
• Functional Genomics
• Metabolomics
• Proteomics
• Pharmacogenomics
• Modeling
• Clinical
• Pathways

Page 14

X Prize for sequencing

Note: the 2012 guidelines are different; this graphic is dated.

Page 15

X Prize for analyzing it?

Page 16

The Lifecycle

• Data Acquisition and Modeling
• Collaboration and Visualization
• Analysis and Data Mining
• Dissemination and Sharing
• Archive and Presentation

The Fourth Paradigm: Data-Intensive Scientific Discovery

Page 17

Page 18

Page 19

Why is this hard when we have …
• Pegasus
• Taverna
• Kepler
• Condor (DAGMan)
• Gearman
• Makeflow
• myExperiment
• Science pipes
• We have X (take your pick)
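Several of the tools named above, such as Condor DAGMan, Pegasus, and Makeflow, describe a pipeline as a directed acyclic graph (DAG) of tasks and schedule each task once its inputs exist. A minimal sketch of that core computation, using Python's standard `graphlib` module (Python 3.9+); the step names are hypothetical, and none of these tools' own file formats or APIs is shown:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "trim_reads": set(),
    "align": {"trim_reads"},
    "qc_report": {"trim_reads"},
    "call_variants": {"align"},
}


def run_order(dag):
    """Return one valid serial execution order (dependencies always come first)."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag):
    """Group steps into waves; every step in a wave has all of its inputs ready."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(run_order(WORKFLOW))
    print(parallel_batches(WORKFLOW))  # [['trim_reads'], ['align', 'qc_report'], ['call_variants']]
```

A wave with more than one step is where a workflow engine can fan out; a strictly linear pipeline yields exactly one step per wave.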

Page 20

What did the scientists do ?

• Used the “parametric launcher”
• Essentially it’s a very functional “submit” script!
• Why use it?
  – A directory full of files and one executable
  – Simple linear flow (no branching)
  – Needed results “yesterday” for a conference/working group
  – Needs to be run ONCE every year
• Not sexy but functional
• Serial runs are important
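The “directory full of files and one executable” pattern is simple enough to sketch directly. This is not the parametric launcher itself: it is a minimal sketch, assuming one independent command per input file and a plain one-command-per-line job file (an assumed format). `build_commands`, `write_jobfile`, and `run_commands` are hypothetical names.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(input_dir, executable, pattern="*"):
    """One argument list per matching input file, in sorted (reproducible) order."""
    files = sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
    return [[executable, str(p)] for p in files]


def write_jobfile(commands, path):
    """Write one shell-quoted command per line (hypothetical job-file format)."""
    with open(path, "w") as fh:
        for cmd in commands:
            fh.write(shlex.join(cmd) + "\n")


def run_commands(commands, workers=4):
    """Run independent commands concurrently; exit codes come back in input order.

    workers=1 gives a plain serial run.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

Because every command is independent (no branching), the only scheduling decision is how many to run at once; setting `workers=1` keeps the serial behavior the slide calls important.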

Page 21

Python in HPC: OMG

Page 22

Data issues

Page 23

DLM (Data Lifecycle Management): Issues
• Most “pipelines/analysis” are data intensive
• Sadly, data originates from slow desktops, external hard drives, and file servers using FTP, HTTP, etc. (and ends up there)
• Hard to stage data to begin computation!
• No place to bring things together (quickly)
• Data needs substantial pre- and post-processing
• Metadata is usually not adequate
• RDBMS are part of workflows
• Do you need better indexing of flat files?

It does not have to be this way!
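One answer to “better indexing of flat files” is a byte-offset index: scan the file once, remember where each record starts, and afterwards `seek` straight to it. A minimal sketch for FASTA, similar in spirit to the `.fai` index that `samtools faidx` builds but not that format; the function names are hypothetical and the parser assumes `>` header lines.

```python
def build_fasta_index(path):
    """Map record ID -> (byte offset of its '>' header, byte length of the whole record)."""
    index = {}
    current, start, offset = None, 0, 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                if current is not None:
                    index[current] = (start, offset - start)
                fields = line[1:].split()
                # The ID is the first word of the header; an empty header gets "".
                current = fields[0].decode() if fields else ""
                start = offset
            offset += len(line)
    if current is not None:
        index[current] = (start, offset - start)
    return index


def fetch_record(path, index, record_id):
    """Return one record's text by seeking to its offset, without rescanning the file."""
    start, length = index[record_id]
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(length).decode()
```

The index is built in one sequential pass; each later lookup costs one `seek` plus one read, regardless of how large the file is.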

Page 24

Page 25

Data Lifecycle: Our effort

Page 26

What can users do?

Page 27

Page 28

But I don’t get throughput

Networking is a huge BLACK BOX, and there is too much finger pointing

Page 29

Compute Issues: Cloud

Page 30

What is cloud computing?

http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html

Page 31

The application lifecycle

Page 32

• A rich web client
• Provides a consistent interface to a range of bioinformatics tools
• Provides a portal to users not wishing to interact with lower-level infrastructure
• An integrated, extensible system of applications and services
• Provides additional intelligence above low-level APIs: provenance, collaboration, etc.

The iPlant Collaborative
iPlant Discovery Environment
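The provenance that the Discovery Environment layers over low-level APIs can be illustrated with a small wrapper: run a tool, then record what ran, when, and checksums of what went in and came out. This is a minimal sketch, not iPlant's implementation; the record's field names are a hypothetical schema.

```python
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_with_provenance(cmd, inputs, outputs):
    """Run cmd and return a provenance record (hypothetical schema)."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": [str(c) for c in cmd],
        "started": started,
        "finished": datetime.now(timezone.utc).isoformat(),
        "exit_code": result.returncode,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        # Outputs the tool failed to create are simply left out of the record.
        "outputs": {str(p): sha256_of(p) for p in outputs if Path(p).exists()},
    }
```

Checksums let a collaborator confirm later that a result was produced from exactly the same input bytes, even after files have been copied between storage systems.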

Page 33

• API-compatible implementation of the Amazon EC2/S3 interfaces
• Virtualizes the execution environment for applications and services
• Get up to 12-core / 48 GB instances
• Access to Cloud Storage + EBS
• 1008 users
• 167 users launched 657 instances (May 2012)
  – 227 were terminated outside of Atmosphere due to idleness (per user’s request)
  – The remaining 430 instances ran for an average of 1 day, 16 hours, and 13 minutes; the longest ran for 30 days
• Run servers, CloudBurst, and desktop use cases. Big data and the desktop are co-located again!
• >60 hosted applications in Atmosphere today, including users from USDA, Forest Service, data providers, etc.
• 30+ private images for postdocs and grad students for training classes

The iPlant Collaborative
Project Atmosphere™: Custom Cloud Computing
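The May 2012 usage figures above are internally consistent, and a quick calculation makes their scale concrete. The instance-hours line is a derived estimate, not a figure from the slide: it simply multiplies the reported average runtime by the 430 instances it describes.

```python
from datetime import timedelta

launched = 657          # instances launched (May 2012, from the slide)
idle_terminated = 227   # terminated outside Atmosphere due to idleness
remaining = launched - idle_terminated

avg_runtime = timedelta(days=1, hours=16, minutes=13)
avg_hours = avg_runtime.total_seconds() / 3600

# Derived estimate only: reported average runtime times the number of instances.
total_instance_hours = remaining * avg_hours

print(remaining)                    # 430
print(round(avg_hours, 2))          # 40.22
print(round(total_instance_hours))  # 17293
```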

Page 34

Atmosphere: Collaboration

iPlant Data Store

Page 35

Lifecycle

Page 36

How to Connect

Page 37

Different Ways to Log in to VMs

Page 38

Steps to get started!

Page 39

My wish list for CCL (Parrot)
• Improved performance for iRODS transfers (parallel transfers?)
• File permission calls (iRODS ACL)*
• Ability to provide throughput/transfer stats
• Thanks for updating iRODS support to 3.1
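Transfer statistics of the kind wished for here are mostly bookkeeping around the copy loop. A minimal sketch on local files, assuming nothing about Parrot or iRODS internals (no iRODS API is used); `copy_with_stats` is a hypothetical helper, and the same counters could wrap any chunked transfer loop.

```python
import time


def copy_with_stats(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks; return bytes moved, seconds, and MB/s."""
    total = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero timer reading on very small copies.
    mb_per_s = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return {"bytes": total, "seconds": elapsed, "mb_per_s": mb_per_s}
```

Reporting bytes and elapsed time separately, not just the rate, makes it easier to tell a slow link from a small file when arguing about where throughput is lost.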

Page 40

My wish list for CCL (Makeflow)
• *Bundle dependencies along with script and binaries, e.g. CDE: Automatically create portable Linux applications (http://www.pgbovine.net/cde.html)
• Progress reporting and performance profiling, e.g. the equivalent of a progress bar

*Not a Makeflow issue but a good feature
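Progress reporting of the kind wished for here can be sketched independently of Makeflow: run a list of tasks, redraw a text bar after each one, and keep per-task wall-clock timings as a crude profile. This is a hypothetical helper that runs tasks serially, not a Makeflow feature or API.

```python
import sys
import time


def run_with_progress(tasks, out=sys.stderr, width=30):
    """Run callables in order, redrawing a text progress bar; return per-task seconds."""
    timings = []
    n = len(tasks)
    for i, task in enumerate(tasks, start=1):
        t0 = time.perf_counter()
        task()
        timings.append(time.perf_counter() - t0)
        filled = width * i // n
        bar = "#" * filled + "-" * (width - filled)
        # "\r" returns to the start of the line so the bar redraws in place.
        out.write(f"\r[{bar}] {i}/{n} ({100 * i // n}%)")
        out.flush()
    out.write("\n")
    return timings
```

The returned timings double as a simple profile: sorting them shows which tasks dominate the run.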

Page 41

Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Hanlon, Anthony Heath, Barbara Heath, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Titus Purdin, J.A. Raygoza Garay, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Fusheng Wei, Jason Williams, John Wregglesworth, Weijia Xu, Jill Yarmchuk

Metadata, Data, Tools, Workflows, Viz

Executive Team: Steve Goff, Dan Stanzione

Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Ya-Di Chen, John Donoghue, Steven Gregory, Yekatarina Khartianova, Monica Lent, Amgad Madkour, Aniruddha Marathe, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang

Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett

The iPlant Collaborative