

J Grid Computing
DOI 10.1007/s10723-012-9213-8

Language and Runtime Support for Automatic Configuration and Deployment of Scientific Computing Software over Cloud Fabrics

Chris Bunch · Brian Drawert · Navraj Chohan · Chandra Krintz · Linda Petzold · Khawaja Shams

Received: 16 August 2011 / Accepted: 8 March 2012
© Springer Science+Business Media B.V. 2012

Abstract In this paper, we present the design and implementation of Neptune, a simple, domain-specific language based on the Ruby programming language. Neptune automates the configuration and deployment of scientific software frameworks over disparate cloud computing systems. Neptune integrates support for MPI, MapReduce, UPC, X10, StochKit, and others. We implement Neptune as a software overlay for the AppScale cloud platform and extend AppScale with support for elasticity and hybrid execution for scientific computing applications. Neptune imposes no overhead on application execution, yet significantly simplifies the application deployment process, enables portability across cloud systems, and promotes lock-in avoidance by specific cloud vendors.

Keywords Cloud platform · Service placement · Domain specific language

C. Bunch (B) · B. Drawert · N. Chohan · C. Krintz · L. Petzold
Computer Science Department, University of California, Santa Barbara, CA, USA
e-mail: [email protected]

K. Shams
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

Mathematics Subject Classifications (2010) D.3.2 · C.2.4

1 Introduction

Cloud computing is a service-oriented methodology that simplifies distributed computing through dynamic resource (compute, storage, database, software) acquisition and management. Cloud computing differs from Grid computing in that resources are both shared and opaque. Specifically, users do not know about or control the geographic location, low-level organization, capability, sharing, or configuration of the physical resources they use. Moreover, these resources can grow and shrink dynamically according to service-level agreements, application behavior, and resource availability. Despite making vast resources procurable at very low cost and providing a scalable, effective execution model for a wide range of application domains, cloud computing has yet to achieve widespread use for scientific computing.

Beyond the differences between clouds and Grids, there are three barriers to the adoption of cloud computing for the execution of distributed, cluster-based, scientific applications. First, cloud systems currently in use have been designed for the execution of applications from the web services domain. As a result, developers must implement additional services and frameworks to support applications from other domains. Such infrastructure (tools, services, packages, libraries) presents challenges to efficient reuse, and requires non-trivial installation, configuration, and deployment efforts to be repeatable. Second, cloud systems today differ vastly from one another, and code written for one system is not easily portable to other cloud systems, despite using common services and APIs provided by the cloud system. Differing interfaces can impose large learning curves and lead to lock-in – the inability to easily move from one cloud system to another. Third, the self-service nature of cloud infrastructures requires significant user expertise to manipulate, control, and customize virtual machines (the execution unit of cloud infrastructures), making them inaccessible to all but expert users [15].

The goal of our work is to reduce the real-world impact of these barriers to entry and to facilitate greater use of cloud fabrics by the scientific computing community. This is part of a broader effort to provide a cost-effective computational alternative to the cluster that remains viable for large-scale scientific problems. Toward this end, we present and evaluate Neptune, a domain-specific language for automatically configuring and deploying disparate cloud-based services and applications. Neptune is a high-level language that is a superset of Ruby [31], a dynamic, open source programming language that is easy to learn and facilitates programmer productivity. Neptune adds to Ruby a series of keywords and constructs that developers use to describe a computational job at a very high level. Neptune executes any Ruby code using the Ruby interpreter and uses this job description along with a set of API calls to build, configure, and deploy the services, libraries, and virtual machines necessary for the distributed execution of complex scientific applications and systems over cloud platforms. Neptune abstracts away all of the low-level details of the underlying cloud platforms (and by extension, cloud infrastructures) and provides a single, simple interface with which developers can deploy their applications. Neptune thus enables application portability across clouds and precludes lock-in to any single cloud vendor. Moreover, developers can use Neptune to employ multiple clouds concurrently (hybrid cloud computing), without application modification.

To enable this, Neptune interfaces to the AppScale [8, 9, 24] cloud platform. AppScale is a distributed, scalable software system that exposes a set of popular cloud service APIs (based on those of Google App Engine), and executes over the Amazon Web Services and Eucalyptus [27] clouds.

In this paper we present the design and implementation of Neptune, as well as a set of AppScale extensions that enable automatic configuration and deployment of scientific applications. These extensions include dynamic instantiation of virtual machines, placement of application and cloud service components within virtual machines for elasticity, and hybrid cloud computing. We extend AppScale with a set of popular software systems that are employed by a wide range of scientific application domains, such as MPI, UPC, and MapReduce for general-purpose HPC, as well as more science-specific toolkits such as StochKit [33] for stochastic biochemical simulation, DFSP [12] for spatial stochastic biochemical simulation, and dwSSA [10] for the estimation of rare event probabilities. Moreover, Neptune's design makes it straightforward for users to add additional frameworks, libraries, and toolkits.

In the sections that follow, we describe the design and implementation of Neptune, and our extensions to the AppScale cloud platform. We then empirically evaluate Neptune using distributed HPC frameworks, stochastic simulation applications, and different placement strategies. We then present related work and conclude.

2 Neptune

Neptune is a domain-specific language that gives cloud application developers the ability to easily configure and deploy computational science software over cloud systems. Configuration refers to writing the configuration files that HPC software requires to execute in a distributed fashion, while deployment refers to starting HPC services in the correct order, to enable user code to be executed. Neptune operates at the cloud platform layer (runtime system level) so that it can control infrastructure-level entities (virtual machines) as well as application components and cloud services.

2.1 Syntax and Semantics

The Neptune language is a metaprogramming extension of the Ruby programming language. As such, it is high-level and familiar, and can leverage a large set of Ruby libraries to interact with cloud infrastructures and platforms. Moreover, any legal Ruby code is also legal within Neptune programs, enabling users to use Ruby's scripting capabilities to quickly construct functioning programs. The reverse is also true: Neptune can be used within Ruby programs, to which it appears to users as a library that can be utilized in the same fashion as other Ruby libraries.

Neptune uses a reserved keyword (denoted throughout this work via the neptune keyword) to identify services within a cloud platform. Legal Neptune code follows the syntax:

neptune :type => :service_name,
        :option1 => 'setting1',
        :option2 => 'setting2'

The semantics of the Neptune language are as follows: each valid Neptune program consists of one or more neptune invocations, each of which indicates a job to run in a cloud. The :service_name marker indicates the name of the job (e.g., MPI, X10), and is associated with a set of parameters that are necessary for the given invocation. This design choice is intentional: not all jobs are created equal, and while some jobs require little information to be passed, other job types can benefit greatly from increased information. As a further step, we leverage Ruby's dynamic typing to enable the types of parameters to be constrained by the developer. If the user specifies a Neptune job but fails to provide the necessary parameters, Neptune informs them which parameters are required and aborts execution.

The value that the invocation returns is also extensible, but by default, a Ruby hash is returned, whose items are job specific. In most cases, this hash contains a key named :success whose Boolean value corresponds to whether or not the request succeeded. Other scenarios allow for additional parameters to be included. For example, in the scenario where the invocation asks for the access policy for a particular piece of data stored in the underlying cloud platform, there is an additional key named :acl whose value is the current data access policy.

Finally, when the user wishes to retrieve data via a Neptune job, the invocation returns the location on the user's filesystem where the output can be found. Work is in progress to expand the number of failure messages to give users more information about why particular operations failed (e.g., if the data storage mechanism was unavailable or had failed, or if the cloud platform itself was unreachable in a reasonable amount of time), to enable Neptune programs written by users to become robust, and to adequately deal with failures at the cloud level. The typical format of a user's Neptune code is thus of the following form:

result = neptune :type => :mpi,
                 :code => '/code/powermethod',
                 :nodes_to_use => 4

if result[:success]
  puts 'Your MPI job is now in progress.'
else
  puts 'Your MPI job failed to start.'
end

2.2 Design Choices

It is important to contrast the decision to design Neptune as a domain-specific language with other configuration options that use XML or other markup languages [23]. These languages work well for configuration but, since they are not Turing-complete programming languages, they are bound to their particular execution model. In contrast, Neptune's strong binding to the Ruby programming language enables users to leverage Neptune and its HPC capabilities and to easily incorporate them into their own codes. For example, Ruby is well known for its Rails web programming framework [32], and Neptune's interoperability enables Rails users to easily spawn instances of scientific software without explicit knowledge of how Neptune or the scientific computing software operates.

Markup and workflow languages are powerful in the types of computation that they enable. Similarly, Neptune allows arbitrary computations to be connected and chained to one another. The following example shows how the output of a MapReduce job can be used as the input to an X10 job. Here, the MapReduce job produces a graph representing links between web pages, while the X10 code takes this graph and performs a shortest-path algorithm from all nodes to one another. As Neptune does not automatically resolve data dependencies between jobs, we manually delay the execution of the X10 job until after the MapReduce job has completed and generated its output.

neptune :type => :mapreduce,
        :input => '/rawdata/webdata',
        :output => '/output/mrgraph',
        :mapreducejar => '/code/graph.jar',
        :main => 'main',
        :nodes_to_use => 64

# wait for the mapreduce job to finish
loop {
  result = neptune :type => 'get-output',
                   :output => '/output/mrgraph'
  if result[:success]
    break
  end
  sleep(60)
}

neptune :type => :mpi,
        :input => '/output/mrgraph',
        :output => '/output/shortestpath',
        :code => '/code/ShortestPath',
        :nodes_to_use => 64

To enable code reuse, we allow operations to be reused across multiple Neptune job types. For example, retrieving data and setting ACLs on data are two operations that occur throughout all the job types that Neptune supports. Thus, the Neptune runtime enables these operations to share a single code base for the implementation of these functions. This feature is optional: not all software packages may support ACLs and a unified model for data output, so Neptune gives developers the option to implement support for only the features they require, and the ability to leverage existing support as needed.
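To make this concrete, the same data-oriented calls described later in Section 3.2 apply uniformly to the output of any job type; the following sketch reuses placeholder paths from the examples in this paper.

# The same 'get-output' and 'set-acl' operations work regardless of whether
# the data was produced by an MPI job or a MapReduce job.
mpi_result = neptune :type => 'get-output',
                     :output => '/mpi/nqueens'

mr_result = neptune :type => 'get-output',
                    :output => '/output/mrgraph'

neptune :type => 'set-acl',
        :output => '/output/mrgraph',
        :acl => 'public'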

3 Implementation

To enable the deployment of Neptune jobs, the cloud platform must support a number of primitive operations. These operations are similar to those found in computational Grid and cluster utilities, such as the Portable Batch System [28]. The cloud platform must be able to receive Neptune jobs, acquire computational resources to execute jobs on, run these jobs asynchronously, and place the output of these jobs in a way that enables users to retrieve them later or share them with other users. For this work, we employ the AppScale cloud platform to add these capabilities.

AppScale is an open-source cloud platform that implements the Google App Engine APIs. Users deploy applications using AppScale via either a set of command-line tools or a web interface. An AppScale cloud consists of one or more distributed database components, web servers, and a monitoring daemon (the AppController) that coordinates services across nodes in the AppScale cloud. AppScale implements a wide range of datastores for its database interface via popular open source technologies. As of its most recent release (AppScale 1.5), it includes support for HBase, Hypertable, MySQL Cluster, Cassandra, Voldemort, MongoDB, MemcacheDB, Scalaris, and Amazon SimpleDB. AppScale runs over virtualized and un-virtualized cluster resources as well as over the Amazon EC2 and Eucalyptus cloud infrastructures. The full details of AppScale are described in [4, 9].

The execution of a Neptune job follows the pattern shown in Fig. 1. The user invokes the neptune executable on a Neptune script they have written, which results in a SOAP message being sent to the Neptune runtime (a separate thread in AppScale's AppController service). In the case of a compute job, the Neptune runtime acquires nodes to run the code over, configures them for use, and executes the code, storing the output for later retrieval. In the case of a data input or output job, the Neptune runtime stores or retrieves the data via the datastore.

Fig. 1 AppScale cloud platform with Neptune configuration and deployment support

In this section, we give an overview of the AppScale components that are impacted by our extensions enabling customized placement, automatic scaling, and Neptune support: the AppScale command-line tools and the AppController.

3.1 Cloud Support

Our extensions to AppScale facilitate interoperation with Neptune. In particular, we modify AppScale to acquire and release machines used for computation, and to enable static and dynamic service placement. To do so, we modify two components within AppScale: the AppScale Tools and the AppController.

3.1.1 AppScale Tools

The AppScale Tools are a set of command-line tools that developers and administrators can use to manage AppScale deployments and applications. In a typical deployment, the user writes a configuration file specifying which node in the system is the "master" node and which nodes are the "slave" nodes. Prior to this work, this meant that the master node always deployed a Database Master (or Database Peer for peer-to-peer databases) and AppLoadBalancer to handle and route incoming user requests, while slave nodes always deployed a Database Slave (or Database Peer) and AppServers hosting the user's application.

We extend this configuration model to enable users to provide a configuration file that identifies which nodes in the system should run each service (e.g., Database Master, Database Slave, AppLoadBalancer, AppServer). For example, users can specify that they want to run each service on a dedicated machine by itself. Alternatively, users could specify that they want their database nodes running on the same machines as their AppServers, and have all other components running on another machine. We also allow users to designate certain nodes in the system as "open", which tells the AppController that this node is free to use for Neptune jobs (a hot spare).
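As an illustration only (we do not reproduce the exact AppScale Tools file format here), such a placement specification amounts to a mapping from service roles to nodes; the IP addresses below are placeholders.

# Hypothetical role-to-node placement; one role may map to several nodes,
# and a node may appear under several roles (e.g., database and AppServer).
placement = {
  :load_balancer   => ['192.168.1.2'],
  :database_master => ['192.168.1.3'],
  :database_slave  => ['192.168.1.4', '192.168.1.5'],
  :appserver       => ['192.168.1.4', '192.168.1.5'],
  :open            => ['192.168.1.6']   # hot spare, free for Neptune jobs
}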

We extend this support to enable hybrid cloud deployment of AppScale, in which nodes are not limited to a single cloud infrastructure. Here, users specify which nodes belong to each cloud infrastructure, and then export environment variables that correspond to the credentials needed for each cloud. This mirrors the style used by Amazon EC2 and Eucalyptus. One potential use case of this hybrid cloud support is for users who have a small, dedicated Eucalyptus deployment and access to Amazon EC2: these users could use their Eucalyptus deployment to test and optimize their code, and deploy to Amazon EC2 when more nodes are needed. Similarly, Neptune users can use hybrid cloud support to run jobs in multiple availability zones simultaneously, providing them with the ability to run computation as close as possible to their data. For scenarios where the application to be deployed is not compute-intensive (e.g., web applications), it may be beneficial to ensure that instances of the application are served in as many availability zones as possible, so that users always have access to a nearby instance. This deployment strategy gives users some degree of fault tolerance in the rare cases when an entire availability zone is down or temporarily inaccessible [18].
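For example, before starting a hybrid deployment, a user might export one set of credentials per cloud; the variable names below are illustrative placeholders rather than the exact names AppScale expects.

# Illustrative only: per-cloud credentials supplied through the environment.
ENV['CLOUD1_EC2_ACCESS_KEY'] = 'eucalyptus-access-key'    # placeholder value
ENV['CLOUD1_EC2_SECRET_KEY'] = 'eucalyptus-secret-key'    # placeholder value
ENV['CLOUD2_EC2_ACCESS_KEY'] = 'amazon-ec2-access-key'    # placeholder value
ENV['CLOUD2_EC2_SECRET_KEY'] = 'amazon-ec2-secret-key'    # placeholder value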

3.1.2 AppController

The AppController is a monitoring service that runs on every node in an AppScale deployment. It configures and instantiates all necessary services, which typically involves starting databases and running Google App Engine applications. AppControllers also monitor the status of each service they run, and periodically send heartbeat messages to other AppControllers to aggregate this information. The AppController currently queries each node to learn its CPU, memory, and hard drive usage, although it is extensible to collecting other metrics.

Our extensions enable the AppController to receive and understand RPC (via SOAP) messages from Neptune and to coordinate Neptune activities across other nodes in an AppScale deployment. Computational jobs and requests for output data run asynchronously within AppScale, and do not block the user's Neptune code. All Neptune requests are authenticated with a secret established when starting AppScale, and are performed over SSL to prevent request sniffing.

If running in a hybrid cloud deployment, AppScale spawns machines in each cloud in which the user has requested machines, with the credentials that the user has provided. Any hot spares (machines designated as "open") are acquired before new nodes are spawned. The AppController records which cloud each machine runs in, so that Neptune jobs can ask for nodes within a specific cloud or more than one cloud. Additionally, as cloud infrastructures currently meter on a per-hour basis, we have modified the AppController to be cognizant of this and to reuse virtual machines between Neptune jobs. Within AppScale, any virtual machine that is not running a Neptune job at the 55 minute mark is terminated; all other machines are renewed for another hour.
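A minimal sketch of this reuse policy, assuming hypothetical helper methods on a virtual machine object (not AppScale's actual API), is:

# Sketch only: near each hour boundary, decide whether to keep a machine.
# minutes_since_acquired, running_neptune_job?, renew_for_another_hour, and
# terminate are assumed helpers for illustration.
def reap_or_renew(vms)
  vms.each do |vm|
    next unless vm.minutes_since_acquired % 60 >= 55   # act only near the boundary
    if vm.running_neptune_job?
      vm.renew_for_another_hour
    else
      vm.terminate
    end
  end
end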

Administrators query AppScale via either the AppScale Tools or the web interface provided by the AppLoadBalancer. These interfaces inform administrators about the jobs in progress and, in hybrid cloud deployments, which clouds are running which jobs. These interfaces do not actually run Neptune jobs or interact with them, but simply describe their status as reported to them by the AppController.

A benefit of offering this service at the cloud platform layer is that the platform can profile the usage patterns of the underlying system and act accordingly (since a well-specified set of APIs is offered to users). We provide customizable scheduling mechanisms for scenarios where the user is unsure how many nodes are required to achieve optimal performance. This use case is unlikely to occur for highly tuned codes, but more likely to occur within HTC and MTC applications, where the code may not be as well tuned for high performance. Users need only specify how many nodes the application can run over; this parameter is required because Neptune does not perform static analysis of the user's code, and oftentimes specific numbers of nodes are required (e.g., powers of two). Neptune then employs a hill-climbing algorithm to determine how many machines to acquire: given an initial guess, Neptune acquires that many machines and runs the user's job, recording the total execution time for later use. On subsequent job requests, Neptune tries the next highest number of nodes, and follows this strategy until the execution time fails to improve. Our initial release of Neptune provides scheduling based on total execution time, total cost incurred (e.g., acquire more nodes only if it costs less to do so), or a weighted average of the two. This behavior is customizable, and is open to experimentation via alternative schedulers.
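A minimal sketch of such a hill-climbing policy is shown below; run_job is a hypothetical callable that executes the job on a given number of nodes and returns its total execution time, and the doubling step is an assumption (powers of two are a common requirement).

# Sketch of the hill-climbing policy: keep increasing the node count until
# total execution time stops improving, then settle on the best count seen.
def pick_node_count(initial_guess, run_job)
  best_nodes = initial_guess
  best_time  = run_job.call(best_nodes)
  loop do
    candidate = best_nodes * 2
    time = run_job.call(candidate)
    break if time >= best_time          # no improvement: stop climbing
    best_nodes, best_time = candidate, time
  end
  best_nodes
end

The same skeleton accommodates the cost-based and weighted variants by changing the quantity that run_job returns.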

More relevant to scientists using cloud technologies is the ability to automatically choose the type of instance acquired for computation. Cloud infrastructure providers offer a wide variety of machines, referred to as "instance types", that differ in terms of cost and performance. Inexpensive instance types offer less compute power and memory, while more expensive instance types offer more compute power and memory. If the user does not specify an instance type to use, Neptune will automatically acquire a compute-intensive instance. A benefit of this strategy is that, since these machines are among the more expensive machines available, the virtual machine reuse techniques we employ amortize their cost across multiple users for jobs that do not run in 60 minute increments (the billing quantum used in Amazon EC2).

3.1.3 AppServer

The AppServer is a modified version of the Google App Engine SDK that runs a user's App Engine application. Applications can be written in Python, Java, or Go, and can utilize APIs that provide a variety of features, including storage capabilities (via the Datastore and Blobstore) and communication capabilities (via Mail and XMPP).

For this work, we modify the AppServer to add an additional API: the Neptune API. This API allows users to initiate Neptune jobs from within App Engine applications hosted on AppScale, and thus provides a mechanism by which web applications can execute high performance computation. This also opens up HPC to greater audiences of users, including those who want to run their codes from different types of platforms (e.g., via their smartphone or tablet computer).

3.2 Job Data

Clouds that run Neptune jobs must allow for data stored remotely to be imported and used as job inputs. Jobs can consume zero or more files as inputs, but always produce exactly one piece of output: a string containing the standard output generated by the executed code. Neptune refers to data as a three-tuple: a string containing the job's identification number, a string containing the output of the job, and a composite type indicating its access policy. The access policy used within Neptune is similar to the access policy used by Amazon's Simple Storage Service [1]: a particular piece of data can be tagged as either private (only visible to the user that uploaded it) or public (visible to anyone). Data is private by default, but this can be changed by the user via a Neptune job. Similarly, data is referenced as though it were on a file-system: paths must begin with a forward slash ('/') and can be compartmentalized into folders in the familiar manner. The data itself is accessed via a Google App Engine application that is automatically started when AppScale starts, and can be stored internally via AppScale or externally via Amazon S3. This allows jobs to automatically save their outputs in any datastore that AppScale supports, or any service that is API-compatible with Amazon S3 (e.g., Google Storage, Eucalyptus Walrus). The Neptune program to set the ACL of a particular piece of data to public is:

neptune :type => 'set-acl',
        :output => '/mydata/nqueens-output',
        :acl => 'public'

Just as a Neptune job can be used to set the ACL for a piece of data, a Neptune job can also be used to get the ACL for a piece of data:

acl_data = neptune :type => 'get-acl',
                   :output => '/mydata/nqueens-output'

puts 'The current ACL is: ' + acl_data[:acl]

Retrieving the output of a given job is also done via a Neptune job. By default, it returns a string containing the results of the job. As many jobs return data that is far too large to be efficiently used in this manner, a special parameter can be used to instead indicate that the output should be copied to the local machine. The following Neptune code illustrates both use cases (note that the # character is Ruby's comment character):

# for a job with small output
result = neptune :type => 'get-output',
                 :output => '/mydata/boo'
puts 'Output is: ' + result[:output]

# for a job with much larger output
result = neptune :type => 'get-output',
                 :output => '/mydata/boo-large',
                 :save_to_local => '/shared/boo-large.txt'
if result[:success]
  puts 'Output copied successfully.'
end

3.3 Employing Neptune for HPC Frameworks

To support HPC applications within cloud platforms, we service-ize them for use via Neptune. Specifically, Neptune provides support for MPI, X10, MapReduce, UPC, and Erlang, to enable users to run arbitrary codes for different computational models. While these general purpose languages and frameworks are useful for the scientific community as a whole, Neptune also seeks to engender support from the biochemical simulation community. These groups perform simulations via kinetic Monte Carlo methods (specifically, the Stochastic Simulation Algorithm), and often need to run a large number of these simulations (on a minimum order of 10^5) to gain statistical accuracy. Neptune supports the use of StochKit, a general purpose SSA implementation, as well as DFSP and dwSSA, two specialized SSA implementations.

As users may not have these libraries and runtimes installed locally, Neptune also provides the ability to remotely compile their code (required for the non-SSA computational models), and is extensible to support non-compute intensive application domains, such as web services.

3.3.1 MPI

The Message Passing Interface (MPI) [16] is a popular, general purpose computational framework for distributed scientific computing. The most popular implementation is written in a combination of C, C++, and assembly. Implementations exist for many other programming languages, such as Fortran, Java, and Python. AppScale employs the C/C++ version, enabling developers to write code in either of these languages to access MPI bindings within AppScale. The developer uses Neptune to specify the location of the compiled application binary and output data, and this information is sent from Neptune to the AppController.

Following the MPI execution model, one compute node is designated as the master node, and all other nodes are referred to as slave nodes. The master node starts up NFS on all nodes, mounts a shared filesystem on all slave nodes, runs mpdboot on its own node, and executes the user's code on its node via mpiexec, piping the output of the job to a file on its local filesystem. Once the job has completed, the master node runs mpdallexit and stores the standard output and standard error of the job (the results) in the database that the user has requested, for later retrieval. An example of how a user would run an MPI job is as follows:

neptune :type => :mpi,
        :code => '/code/powermethod',
        :nodes_to_use => 4,
        :output => '/output/powermethod.txt'

In this example, we specify the location where the compiled code to execute is located (stored via a previous Neptune job). The user also indicates how many machines are required to run their MPI code and where the output of the job should be placed. Note that this program does not use any inputs, nor does it need to write to any files on disk as part of its output; Neptune can be extended to do so if necessary. We can also designate which shared file system to use when running MPI. Currently, we support NFS and are working on support for the Lustre Distributed File System [25].
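Returning to the execution sequence described at the start of this section, the master node's work amounts to shelling out to the standard MPD tooling; the sketch below is illustrative only (paths, hostnames, and the NFS start command are placeholders, not the AppController's actual code).

# Illustrative outline of the MPI master-node steps.
slave_ips = ['10.0.0.2', '10.0.0.3', '10.0.0.4']       # placeholder addresses
num_procs = slave_ips.length + 1                       # slaves plus the master

system('service nfs-kernel-server start')              # export the shared dir
slave_ips.each do |ip|
  system("ssh #{ip} 'mount master:/mirrored /mirrored'")
end

system("mpdboot -n #{num_procs}")                      # bring up the MPD ring
system("mpiexec -n #{num_procs} /mirrored/powermethod > /tmp/mpi-output.txt")
system('mpdallexit')                                   # tear the ring down
# /tmp/mpi-output.txt is then stored in the datastore for later retrieval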

We also note that many HPC applications require a high performance, low latency interconnect. If running over Amazon EC2, users can acquire this via the Cluster Compute Instances that Amazon provides, and in Eucalyptus, a cloud can be physically constructed with the required network hardware. If the user does not have access to this type of hardware, and their program requires it, their program may suffer from degraded performance, or may not run at all.

3.3.2 X10

While MPI is suitable for many types of application domains, one demand in computing has been to enable programmers to write fast, scalable code using a high-level programming language. In addition, as many years of research have gone into optimizing virtual machine technologies, it is also desirable for a new technology to be able to leverage this work. In this spirit, IBM introduced the X10 programming language [7], which uses a Java-like syntax, and can execute transparently over either a non-distributed Java backend or a distributed MPI backend. The Java backend enables developers to develop and test their code quickly, and utilize Java libraries, while the MPI backend allows the code to be run over as many machines as the user can acquire.

As X10 code can compile to executables for use by MPI, X10 jobs are reducible to MPI jobs. Thus the following Neptune code deploys an X10 executable that has been compiled for use with MPI:

neptune :type => :mpi,
        :code => '/code/NQueensDist',
        :nodes_to_use => 2,
        :output => '/output/nqueensx10.txt'

With the combination of MPI and X10 within Neptune, users can trivially write algorithms in both frameworks and (provided a common output format exists) compare the results of a particular algorithm to ensure correctness across implementations. One example used in this paper is the n-queens algorithm [30], which, given a chess board of size n × n, determines how many ways n queens can be placed on the board without threatening one another. The following Neptune code illustrates how to verify the results produced by an MPI implementation against those of an X10 implementation (assuming both codes are already stored remotely):

# run mpi version
neptune :type => :mpi,
        :code => '/code/MpiNQueens',
        :nodes_to_use => 4,
        :output => '/mpi/nqueens'

# run x10 version
neptune :type => :mpi,
        :code => '/code/X10NQueens',
        :nodes_to_use => 4,
        :output => '/x10/nqueens'

# wait for mpi version to finish
loop {
  mpi_data = neptune :type => 'output',
                     :output => '/mpi/nqueens'
  if mpi_data[:success]
    break
  end
  sleep(60)
}

# wait for x10 version to finish
loop {
  x10_data = neptune :type => 'output',
                     :output => '/x10/nqueens'
  if x10_data[:success]
    break
  end
  sleep(60)
}

if mpi_data[:output] == x10_data[:output]
  puts 'Output matched!'
else
  puts 'Output did not match.'
end

Output jobs return a hash containing a :success parameter, indicating whether or not the output exists. We leverage this to determine when the compute job that generates this output has finished. The :output parameter in an output job contains a string corresponding to the standard output of the job itself, and we use Ruby's string comparison operator (==) to compare the outputs for equality.

3.3.3 MapReduce

Popularized by Google in 2004 for its internal data processing [11], the map-reduce programming paradigm (MapReduce) has experienced a resurgence and renewed interest. In contrast to the general-purpose message passing paradigm embodied in MPI, MapReduce targets embarrassingly parallel problems. Users provide input, which is split across multiple instances of a user-defined Map function. The output of this function is then sorted based on a key provided by the Map function, and all outputs with the same key are given to a user-defined Reduce function, which typically aggregates the data. As no communication can be done by the user in the Map and Reduce phases, these programs are highly amenable to parallelization.

Hadoop provides an open-source implementation of MapReduce that runs over the Hadoop Distributed File System (HDFS) [17]. The standard implementation requires users to write their code in the Java programming language, while the Hadoop Streaming implementation facilitates writing code in any programming language. Neptune has support for both implementations. Users provide a Java archive file (JAR) for the standard implementation, or Map and Reduce applications for the Streaming implementation.
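For illustration, a Hadoop Streaming job only requires that the Map and Reduce programs read records from standard input and write tab-separated key/value pairs to standard output; a minimal Ruby word-count pair (file names arbitrary) might look as follows.

# mapper.rb: emit each word with a count of one
STDIN.each_line do |line|
  line.split.each { |word| puts "#{word}\t1" }
end

# reducer.rb: Streaming delivers mapper output sorted by key, so summing
# counts per word into a hash yields the final totals
counts = Hash.new(0)
STDIN.each_line do |line|
  word, count = line.chomp.split("\t")
  counts[word] += count.to_i
end
counts.each { |word, total| puts "#{word}\t#{total}" }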

AppScale retrieves the user's files from the desired data storage location, and runs the job on the Neptune-specified nodes in the system. In particular, the AppController contacts the Hadoop JobTracker node with this information, and polls Hadoop until the job completes (indicated by the output location having data written to it). When this occurs, Neptune copies the data back to a user-specified location. From the user's perspective, the necessary Neptune code to run code written with the standard MapReduce implementation is:

neptune :type => :mapreduce,
        :input => '/input/input-text.txt',
        :output => '/output/mr-output.txt',
        :mapreducejar => '/code/example.jar',
        :main => 'wordcount',
        :nodes_to_use => 4

As was the case with MPI jobs, the user specifies where the input to the MapReduce job is located, where to write the output to, and where the code to execute is located. Users also specify how many nodes they want to run their code over. AppScale normally stores inputs and outputs in a datastore it supports or in Amazon S3, but for MapReduce jobs, it also supports the Hadoop Distributed File System (HDFS). This can result in Neptune copying data to HDFS from S3 (and vice-versa), but an extra parameter can be used to indicate that the input already exists in HDFS, to skip this extra copy operation.

3.3.4 Unified Parallel C

Unified Parallel C [13] is a superset of the C programming language that aims to simplify HPC applications via the Partitioned Global Address Space (PGAS) programming model. UPC allows developers to write applications that use shared memory in lieu of the message passing model that other programming languages offer (e.g., MPI). UPC can also be deployed over a number of runtimes; some of these backends include specialized support for shared memory machines as well as optimized performance when specialized networking equipment is available. UPC programs deployed via Neptune can use any backend supported by the underlying cloud platform, and as we use AppScale in this work, three backends are available: the SMP backend, optimized for single node deployments; the UDP backend, for distributed deployments; and the MPI backend, which leverages the mature MPI runtime.

UPC code can be deployed in Neptune in a manner analogous to that of other programming languages. If a UPC backend is not specified in a Makefile with the user's code, the MPI backend is automatically selected. As we have compiled our code with the MPI backend, the Neptune code needed is identical to that used in MPI deployments:

result = neptune :type => :mpi,
                 :code => '~/ring-compiled/Ring',
                 :nodes_to_use => 4,
                 :procs_to_use => 4,
                 :output => '/upc/ring-output'

# inspect is Ruby's method to print a hash
puts result.inspect

As shown here, users need only specify the location of the executable, how many nodes to use, and where the output should be placed. We extend the MPI support that Neptune offers to enable users to specify how many processes should be spawned. This allows for deployments where the number of processes is greater than the number of available nodes (which are thus over-provisioned), and can take advantage of scenarios where the instance types requested have more than a single core present.


3.3.5 Erlang

Erlang [2] is a concurrent programming language developed by Ericsson that uses a message passing interface similar to that of MPI. While other HPC offerings try to engender a larger user community by basing their language's syntax, semantics, or runtime on that of C or Java (e.g., MPI, UPC, and X10), Erlang does not. The stated reason for this is that Erlang seeks to optimize the user's code via the single assignment model, which enables a higher degree of compile-time optimization than the model used by C-style languages.

While Erlang's concurrent programming constructs extend to distributed computing, Erlang does not provide a parallel job launcher analogous to those provided by MPI (via mpiexec), Hadoop MapReduce, X10, and UPC. These job launchers do not require the user to hardcode IP addresses in their code, as is required by Erlang programs.

Due to this limitation, we support only the concurrent programming model that Erlang offers. We are currently investigating ways to automate the process for the distributed version. Users write Erlang code with a main method, as is standard Erlang programming practice, and this method is then invoked by the AppScale cloud platform on a machine allocated for use with Erlang.

The Neptune code needed to deploy a piece of Erlang code is similar to that of the other supported languages:

neptune :type => :erlang,
        :code => '~/ring-compiled/ring.beam',
        :output => '/erlang-output.txt',
        :nodes_to_use => 1

In this example, we specify that we wish to use a single node, the path on the local filesystem where the compiled code can be found, and where the output of the execution should be placed.

3.3.6 Compilation Support

Before MPI, X10, MapReduce, UPC, or Erlang jobs can be run, they require the user's code to be compiled. Although the target architecture (the machines that AppScale runs over) may be the same as the architecture on which the scientist compiled their code, it is not guaranteed to be so. It is therefore necessary to offer remote compilation support, so that no matter what platform the user runs, whether it be a 32-bit laptop, a 64-bit server, or even a tablet computer that has a text editor and an internet connection, code can be compiled and run. The Neptune code required to compile a given piece of source code is:

result = neptune :type => :compile,
                 :code => '~/ring',
                 :main => 'Ring.x10',
                 :output => '/output/ring',
                 :copyto => '~/ring-compiled'

puts result.inspect

This Neptune code requires the user to indicate only where their code is located and which file is the main executable (as opposed to being a library or other ancillary code). Scientists may provide makefiles if they like. If they do not, Neptune attempts to generate one for them based on the file's extension or its contents. Neptune cannot generate makefiles for all scenarios, but can do so for many scenarios where the user may not be comfortable with writing a makefile.
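A minimal sketch of this kind of extension-based inference is shown below; the mapping is illustrative, not Neptune's actual table.

# Illustrative mapping from the main file's extension to a build command.
COMPILE_COMMANDS = {
  '.c'   => lambda { |src, out| "mpicc #{src} -o #{out}" },
  '.x10' => lambda { |src, out| "x10c++ #{src} -o #{out}" },
  '.upc' => lambda { |src, out| "upcc #{src} -o #{out}" }
}

def guess_compile_command(main_file, output_path)
  builder = COMPILE_COMMANDS[File.extname(main_file)]
  raise "cannot generate a makefile for #{main_file}" if builder.nil?
  builder.call(main_file, output_path)
end

puts guess_compile_command('Ring.x10', 'Ring')   # => "x10c++ Ring.x10 -o Ring"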

3.3.7 StochKit

To enable general purpose SSA programming support for scientists, Neptune provides support for StochKit, an open source biochemical simulation software package. StochKit provides stochastic solvers for several variants of the Stochastic Simulation Algorithm (SSA), and provides the mechanisms for the stochastic simulation of arbitrary models. Scientists describe their models by specifying them in the Systems Biology Markup Language (SBML) [20]. In this work, we simulate a model included with StochKit, known as heat-shock-10x. This model is a ten-fold expansion of the system that models the heat shock response in Escherichia coli [14]. Figure 2 shows results from a statistical analysis on an ensemble of simulated trajectories from this model.

Fig. 2 Plots showing statistical results from StochKit stochastic simulations of the heat shock model. (Left) Comparison of probability density histograms from two independent ensembles of trajectories, as well as the histogram distance between them. The histogram self-distance, a measure of the difference between independent ensembles from the same model, is used to determine the confidence for a given ensemble size. (Right) Time-series plots of the mean (solid lines) and standard-deviation bounds (dashed lines) for two biochemical species.

Typically, scientists utilizing the SSA run a large number of simulations to ensure enough statistical accuracy in their results. As the number of simulations to run may not be known a priori, scientists often have to run a batch of simulations, check whether the requested confidence level has been achieved, and, if it has not, repeat the process. The Neptune code required to do this is trivial:

confidence_needed = 0.95
i = 0
loop {
  neptune :type => :ssa,
          :nodes_to_use => 4,
          :tar => '/code/ssa.tar.gz',
          :simulations => 100_000,
          :output => "/mydata/run-#{i}"

  # wait for ssa job to finish
  loop {
    ssa_data = neptune :type => 'get-output',
                       :output => "/mydata/run-#{i}"
    if ssa_data[:success]
      break
    end
    sleep(60)
  }

  confidence_achieved = ssa_data[:output]
  if confidence_achieved > confidence_needed
    break
  else
    puts 'Sufficient confidence not reached.'
  end

  i += 1
}

To enable StochKit support within Neptune, we automatically install StochKit within newly created AppScale images by fetching it from a local repository. It is placed in a predetermined location on the image and made available to user-specified scripts via its standard executables. It would be possible to require users to run a Neptune :compile job that installs StochKit in an on-demand fashion, but we elect to preinstall it, to reduce the number of steps required to run a StochKit job. Additionally, while forcing a compilation step is possible, the user's StochKit code often consists of biochemical models and a bash script, which do not need to be compiled to execute and thus do not fall under the domain of a :compile job.

As StochKit does not run in a distributed fashion, the AppController coordinates the machines that the user requests to run their SSA computation. For the example above, in which four nodes are to be used to run 100,000 simulations, Neptune instructs each node to run 25,000 simulations.
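The division of work itself is simple arithmetic; a sketch (variable names are ours) is:

# Split the requested number of simulations evenly across the acquired nodes.
simulations = 100_000
nodes       = 4
per_node    = simulations / nodes        # => 25_000 simulations per node
leftover    = simulations % nodes        # any remainder goes to one node
work        = Array.new(nodes, per_node)
work[0]    += leftover
# work => [25000, 25000, 25000, 25000]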

3.3.8 DFSP

One specialized SSA implementation supported by Neptune is the Diffusive Finite State Projection algorithm (DFSP) [12], a high-performance method for simulating spatially inhomogeneous stochastic biochemical systems, such as those found inside living cells. The example system that we examine here is a biological model of yeast polarization, known as the G-protein cycle example, shown in [12]. Yeast cells break their spatial symmetry and polarize in response to an extra-cellular gradient of mating pheromones. The dynamics of the system are modeled using the stochastic reaction-diffusion master equation. Figure 3 shows visualizations from stochastic simulations of this model.

The code for the DFSP implementation is a tarball containing C language source and an accompanying makefile. The executable produces a single trajectory for each instance that is run. As this simulation is of a stochastic system, an ensemble of independent trajectories is required for statistical analysis; 10,000 trajectories are needed to minimize error to acceptable levels. The Neptune code needed to run this is:

status = neptune :type => :ssa,
                 :nodes_to_use => 64,
                 :tar => '/code/dfsp.tar.gz',
                 :output => '/outputs/ssa-output',
                 :storage => 's3',
                 :EC2_ACCESS_KEY => ENV['S3_ACCESS_KEY'],
                 :EC2_SECRET_KEY => ENV['S3_SECRET_KEY'],
                 :S3_URL => ENV['S3_URL'],
                 :trajectories => 10_000

puts status.inspect

In this example, the scientist has indicated that they wish to run their DFSP code, stored remotely at /code/dfsp.tar.gz, over 64 machines. The scientist here has also specified that their code should be retrieved from Amazon S3 with the provided credentials, and that the output should be saved back to Amazon S3. Finally, the scientist has indicated that 10,000 simulations should be run. The storage-specific parameters used here are not specific to DFSP or SSA jobs, and can be used with any type of computation.

Fig. 3 Two plots of the DFSP example model of yeast polarization. (Left) Temporal-spatial profile of activated G-protein. Stochastic simulation reproduces the noise in the protein population that is inherent to this system. (Right) Overlay of three biochemical species populations across the yeast cell membrane: the extra-cellular pheromone ligand, the ligand bound with membrane receptor, and the G-protein activated by a bound receptor.

To enable DFSP support within Neptune, we automatically install support for the GNU Scientific Library (GSL) when we generate a new AppScale image. The user's DFSP code can then utilize it within its computations in the same fashion as if it were installed on their local computer. Neptune does not currently provide a general model for determining library dependencies, as versioning of libraries can make this problem difficult to handle in an automated fashion. However, Neptune does allow an expert user to manually install the required libraries a single time and enable the community at large to benefit.

3.3.9 dwSSA

Another specialized SSA implementation we support within Neptune is the dwSSA, the doubly weighted SSA coupled with the cross-entropy method. The dwSSA is a method for accurate estimation of rare event probabilities in stochastic biochemical systems. Rare events, events with probabilities no larger than 10^-9, often have significant consequences for biological systems, yet estimating them tends to be computationally infeasible. The dwSSA accelerates the estimation of these rare events by significantly reducing the number of trajectories required. This is accomplished using importance sampling, which effectively biases the system toward the desired rare event, and reduces the number of trajectories simulated by several orders of magnitude.

The system we examine in this work is the birth-death process shown in [10]. The rare event that this model attempts to determine is the probability that the stochastic fluctuations of this system will double the population of the chemical species. The model requires the simulation of 1,000,000 trajectories to accurately characterize the rare event probability. The code for this example is a coupled set of source files written in R (for model definition and rare event calculations) and C (for efficient generation of stochastic trajectories). The Neptune code needed to run the dwSSA implementation is identical to that of DFSP and StochKit: users simply supply their own tarball with the dwSSA code in place of a different SSA implementation. As each dwSSA simulation takes a trivial amount of time to run, we customize the code to take, as an input, the number of simulations to run; this minimizes the amount of time wasted setting up and tearing down the R environment.

To enable dwSSA support within Neptune, we automatically install support for the R programming language when we generate a new AppScale image. We also place the R executables in a predetermined location for use by AppScale and Neptune, and use R's batch facilities to instruct R to never save the user's workspace (environment) between R executions, which it would otherwise do by default.

3.4 Employing Neptune for Cloud Scaling and Enabling Hybrid Clouds

Our goal with Neptune is to simplify configuration and deployment of HPC applications. However, Neptune is flexible enough to be used with other application domains. Specifically, Neptune can be used to control the scaling and placement of services within the underlying cloud platform. Furthermore, if the platform supports hybrid cloud placement strategies, Neptune can control how services are placed. This allows Neptune to be used for both high throughput computing (HTC) and many task computing (MTC). In the former case, resources can be claimed from multiple cloud infrastructures to serve user jobs. In the latter case, Neptune can be used to serve both compute-intensive jobs as well as web service programs.

To demonstrate this, we use Neptune to enable users to manually scale up a running AppScale deployment. Users need only specify which component they wish to scale up (e.g., the load balancer, application server, or database server) and how many of them they require. This reduces the typically difficult problem of scaling up a cloud to the following Neptune code:

neptune :type => :appscale,
        :nodes_to_use => { :cloud1 => 3,
                           :cloud2 => 6 },
        :add_component => 'appengine',
        :time_needed_for => 3600

In this example, the user has specified that they wish to add nine application servers to their AppScale deployment, and that these machines are needed for one hour. Furthermore, three of the servers should be placed in the first cloud that the platform is running over, while six servers should be placed in the second cloud. Defining which cloud is the "first cloud" and which cloud is the "second cloud" is done by the cloud administrator, via the AppScale Tools (see Section 4.1.1). This type of scaling is useful when the amount of load in both clouds is known: here, it is useful if both clouds are over-provisioned, but the second is either expecting greater traffic in the near future or is sustaining more load than the first cloud.

Scaling and automation are only possible to the degree that the underlying services allow. For example, while the Cassandra database allows nodes to be added to the system dynamically, users cannot add more nodes to the system than already exist (e.g., in a system with N nodes, no more than N − 1 nodes can be added at a time) [6]. Therefore, if more nodes than this limit are needed, either multiple Neptune jobs must be submitted or the cloud platform must absorb this complexity into its scaling mechanisms.
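As an illustration of the first option, the sketch below submits several Neptune jobs in sequence so that each request stays within the N − 1 growth limit. The component name, current cluster size, and target size are hypothetical values chosen for the example, not part of Neptune's documented interface.

# Hypothetical sketch: growing a database tier in batches that respect a
# "no more than N - 1 nodes added at a time" constraint, where N is the
# current cluster size. All values and the component name are illustrative.
current_size = 4
desired_size = 16
while current_size < desired_size
  batch = [current_size - 1, desired_size - current_size].min
  neptune :type => :appscale,
          :nodes_to_use => { :cloud1 => batch },
          :add_component => 'database',
          :time_needed_for => 3600
  current_size += batch
end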

3.5 Limitations

Neptune enables automatic configuration and deployment of software by a cloud platform to the extent that the underlying software allows. It is thus important to make explicit the scenarios in which Neptune encounters difficulties, as they are the same scenarios in which the supported software packages are not amenable to being placed in a cloud platform. From the end-users we have designed Neptune to aid, we have encountered three common problems that are not specific to Neptune or to distributed systems (e.g., clouds, Grids) in general:

• Codes that require a unique identifier, whether it be an IP address or process name, to be used to locate each machine in the computation (e.g., multi-node Erlang computations). This is distinct from the case where the framework requires IP addresses to be hardcoded, as these frameworks (like MPI) do not require the end-user's code to be modified in any way or be aware of a node's IP address.

• Programs that have highly specialized libraries for end-users but are not free / open-source, and thus are currently difficult to dynamically acquire and release licenses for.

• Algorithms that require a high-speed interconnect but run in a cloud infrastructure that does not offer one. These algorithms may suffer from degraded performance or may not work correctly at all. The impact of this can be mitigated by choosing a cloud infrastructure that does provide such an offering (e.g., Cluster Compute Instances for Amazon EC2, or a Eucalyptus cloud with similar network hardware).

We are investigating how to mitigate these limitations as part of our future work. For unique identifiers, it is possible to have Neptune take a parameter containing a list of process identifiers to use within the computation. For licensing issues, we can have the cloud fabric make licenses available on a per-use basis. AppScale can then guide developers to clouds that have the appropriate licenses for their application.
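To make the first mitigation concrete, a future Neptune job might look like the sketch below. The :process_names parameter does not exist in Neptune today and is shown purely as a hypothetical extension; the job type, paths, and identifiers follow the style of the examples in this paper but are illustrative values.

# Hypothetical sketch of a possible future parameter (:process_names) that
# would hand a per-machine identifier to codes that need one, such as
# multi-node Erlang computations. All values shown are illustrative.
neptune :type => :erlang,
        :code => '/code/ring',
        :nodes_to_use => 4,
        :process_names => ['ring1@node1', 'ring2@node2',
                           'ring3@node3', 'ring4@node4'],
        :output => '/erlang/ring-output'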

3.6 Extensibility

Neptune is designed to be extensible, both in the types of job supported and the infrastructures that it can harness. Developers who wish to add support for a given software framework within Neptune need to modify the Neptune language component as well as the Neptune runtime within the cloud platform that receives Neptune job requests. In the Neptune language component, the developer needs to indicate which parameters users need to specify in their Neptune code (e.g., how input and output should be handled), and whether any framework-specific parameters should be exposed to the user. At the cloud platform layer, the developer needs to add functionality that can understand the particulars of their Neptune job. This often translates into performing special requests based on the parameters present (or absent) in a Neptune job request. For example, MapReduce users can specify that the input be copied from the local file system to the Hadoop Distributed File System. Our implementation within AppScale skips this step if the user indicates that the input is already present within HDFS. Once a single, expert developer has added support for a job type within Neptune and AppScale, it can then be automatically configured and deployed by the community at large, without requiring them to become expert users.
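The sketch below illustrates what such a framework-specific parameter might look like from the user's side for the MapReduce job type. The parameter and path names are illustrative rather than Neptune's exact identifiers; the point is only that the user can signal that the input already resides in HDFS, so the platform can skip the local-to-HDFS copy.

# Illustrative sketch of a MapReduce Neptune job whose input already lives
# in HDFS. Parameter and path names here are illustrative placeholders.
neptune :type => :mapreduce,
        :input => '/wordcount/input',        # already present in HDFS
        :output => '/wordcount/output',
        :mapreducejar => '/code/wordcount.jar',
        :main => 'wordcount',
        :nodes_to_use => 8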

4 Evaluation

We next use Neptune to empirically evaluate how effectively the supported services execute within AppScale. We begin by presenting our experimental methodology and then discuss our results.

4.1 Methodology

To evaluate the software packages supported by Neptune, we use benchmarks and sample applications provided by each. We also measure the cost of running Neptune jobs with and without VM reuse.

To evaluate our support for MPI, we use a Power Method implementation that, at its core, multiplies a matrix by a vector (the standard MatVec operation) to find the absolute value of the largest eigenvalue of the matrix. We choose this code over more standard codes such as the Intel MPI Benchmarks because it tests a number of the MPI primitives working in tandem, producing a code that should scale with respect to the number of nodes in the system. By contrast, the Intel MPI Benchmarks largely measure inter-process communication time or the time taken for a single primitive operation, which is likely to scale negatively as the number of nodes increases (e.g., barrier operations are likely to take longer when more nodes participate). We use a 6400x6400 matrix and a 6400x1 vector to ensure that the size of the matrices evenly divides the number of nodes in the computation.
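For reference, a Power Method run of this kind could be submitted through Neptune along the lines of the sketch below, following the job pattern used elsewhere in this paper; the code path, output location, and node count are illustrative and not taken from our actual experiment scripts.

# Illustrative sketch: submitting one Power Method run as a Neptune MPI job.
# The code path, output location, and node count are illustrative values.
neptune :type => :mpi,
        :code => '/code/powermethod',
        :nodes_to_use => 16,
        :output => '/powermethod/output-16nodes'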

For X10, our evaluation uses an NQueens implementation publicly available from the X10 team that is optimized for multiple machines. To ensure a sufficient amount of computation is available, we set n = 16, thus creating a 16x16 chessboard and placing 16 queens on the board. For comparison purposes with MPI, we also include an optimized MPI version made publicly available by the authors of [30]. It is also set to use a 16x16 chessboard, using a single node to distribute work across machines and the others to perform the actual work involved.

To evaluate our support for MapReduce, we use the publicly available Java WordCount benchmark, which takes an input data set and finds the number of occurrences of each word in that set. Each Map task is assigned a series of lines from the input text, and for every word it finds, it reports this with an associated count of one. Each Reduce task then sums the counts for each word and saves the result to the output file. Our input file consists of the works of William Shakespeare appended to itself 500 times, producing an input file roughly 2.5GB in size.

We evaluate UPC and Erlang by means of a Thread Ring benchmark, and compare them to reference implementations in MPI and X10. Each code implements the same functionality: a fixed number of processes are spawned over a given number of nodes, and each thread is assigned a unique identifier. The first thread passes a message to the next thread, which then continues doing so until the last thread receives the message. The final thread sends the message to the first thread, connecting the threads in a ring-like fashion. This is repeated a given number of times to complete the program's execution.
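The single-process Ruby sketch below mirrors the structure of this benchmark (a fixed number of threads passing a token around a ring a given number of times). It is only an illustration of the algorithm, not one of the MPI, X10, UPC, or Erlang codes evaluated here, and the thread and round counts are arbitrary.

# Illustrative single-process thread ring: NUM_THREADS threads pass a token
# around a ring NUM_ROUNDS times, then all threads shut down.
require 'thread'

NUM_THREADS = 8
NUM_ROUNDS  = 100

queues = Array.new(NUM_THREADS) { Queue.new }

threads = (0...NUM_THREADS).map do |id|
  Thread.new do
    loop do
      token = queues[id].pop                 # wait for the message
      break if token == :done
      if id == NUM_THREADS - 1 && token == NUM_ROUNDS
        queues.each { |q| q << :done }       # final round done: stop everyone
        break
      end
      next_token = (id == NUM_THREADS - 1) ? token + 1 : token
      queues[(id + 1) % NUM_THREADS] << next_token
    end
  end
end

queues[0] << 1                               # start round 1 at thread 0
threads.each(&:join)
puts "Completed #{NUM_ROUNDS} trips around a ring of #{NUM_THREADS} threads"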

In our first Thread Ring experiment, we fix the number of messages to be sent to 100 and fix the number of threads to spawn to 64. We vary the number of nodes to use between 1, 2, 4, 8, 16, 32, and 64 nodes, to determine the performance improvement that can be achieved by increasing the amount of available computation power.

In our second Thread Ring experiment, we fix the number of messages to be sent to 100, and fix the number of nodes to use at 8 nodes. We then vary the number of threads to spawn between 2, 4, 8, 16, 32, and 64 threads, to determine the impact of increasing the number of threads that must be scheduled on a fixed number of machines.

Our third Thread Ring experiment fixes the number of nodes to use at 8 nodes, and fixes the number of threads to use at 64 threads. We then vary the number of messages to send between 1, 10, 100, 1000, and 10000, to determine the performance costs of increasing the number of messages that must be sent around the distributed thread ring in each implementation.

For our SSA codes, DFSP and dwSSA, we run 10,000 and 1,000,000 simulations, respectively, and measure the total execution time. As mentioned earlier, previous work in each of these papers indicates that these numbers of simulations are the minimum that scientists typically must run to achieve reasonable accuracy.

We execute the non-Thread Ring experiments over different dynamic AppScale cloud deployments of 1, 4, 8, 16, 32, and 64 nodes. In all cases, each node is a Xen guest VM that executes with 1 virtual processor, 10GB of disk (maximum), and 1GB of memory. The physical machines to which we deploy VMs have 8 processors, 1TB of disk, and 16GB of memory. We employ a placement strategy provided by AppScale where one node deploys an AppLoadBalancer (ALB) and Database Peer (DBP), and the other nodes are designated as "open" (that is, they can be claimed for any role by the AppController as needed). Since no Google App Engine applications are deployed, no AppServers run in the system. All values reported here represent the average of five runs.

For these experiments, Neptune employs AppScale 1.5, MPICH2 1.2.1p1, X10 2.1.0, Hadoop MapReduce 0.20.0, UPC 2.12.1, Erlang R13B01, the DFSP implementation graciously made available by the authors of the DFSP paper [12], the dwSSA implementation graciously made available by the authors of the dwSSA paper [10], and the StochKit implementation publicly made available on the project's web site [35].

4.2 Experimental Results

Fig. 4 Average running time for the Power Method code utilizing MPI over varying numbers of nodes. These timings include running time as reported by MPI_Wtime and do not include NFS and MPI startup and shutdown times

We begin by discussing the performance of the MPI and X10 Power Method codes within Neptune. We time only the computation and any necessary communication required for the computation; thus, we exclude the time to start NFS, to write MPI configuration files, and to start prerequisite MPI services. Figure 4 presents these results. Table 1 presents the parallel efficiency, given by the standard formula:

E = T_1 / (p T_p)    (1)

where E denotes the parallel efficiency, T_1 denotes the running time of the algorithm running on a single node, p denotes the number of processors used in the computation, and T_p denotes the running time of the algorithm running on p processors.
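As a hypothetical illustration of (1): if a code takes T_1 = 280 s on a single node and T_16 = 35 s on 16 nodes, its parallel efficiency at 16 nodes is E = 280 / (16 × 35) = 0.5. These numbers are invented for illustration and do not correspond to any measurement in this paper.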

Both Fig. 4 and Table 1 show clear trends: speedups are initially achieved as nodes are added to the system, but the decreasing parallel efficiencies show that this scalability does not extend up through 64 nodes. Furthermore, the running time of the Power Method code increases after using 16 nodes. Analysis using VAMPIR [26], a standard tool for MPI program visualization, shows that the collective broadcast calls used are the bottleneck, becoming increasingly so as the number of nodes in the system increases. This is an important point to reiterate: since Neptune simply runs supported codes on varying numbers of nodes, the original code's bottlenecks remain present and are not optimized away.

Table 1 Parallel efficiency for the Power Method code utilizing MPI over varying numbers of nodes

# of nodes    MPI parallel efficiency
4             0.9285
8             0.4776
16            0.3358
32            0.0488
64            0.0176

The MPI and X10 n-queens codes encounter a different type of scaling compared to our Power Method code. Figure 5 shows these trends: the MPI code's performance is optimal at 4 nodes, while the X10 code's performance is optimal at 16 nodes. The X10 n-queens code suffers substantially at lower numbers of nodes compared to its MPI counterpart; this is likely due to its relatively new work-stealing algorithm, and is believed to be improved in subsequent versions of X10. This is also the rationale for the larger standard deviation encountered in the X10 test. We omit parallel efficiencies for this code because the MPI code dedicates the first node to coordinating the computation, which precludes us from computing the time needed to run this code on a single node (a required value).

Fig. 5 Average running time for the n-queens code utilizing MPI and X10 over varying numbers of nodes. These timings include running time as reported by MPI_Wtime and System.nanoTime, respectively. These times do not include NFS and MPI startup and shutdown times

MapReduce WordCount experiences a superior scale-up compared to our MPI and X10 codes. This is largely because this MapReduce code is optimized by Hadoop and does not communicate between nodes, except between the Map and Reduce phases. Figure 6 and Table 2 show the running times of WordCount via Neptune. As with MPI, we measure computation time and not the time incurred starting and stopping Hadoop.

Fig. 6 Average running time for WordCount utilizing MapReduce over varying numbers of nodes. These timings include Hadoop MapReduce run times and do not include Hadoop startup or shutdown times

Figure 6 and Table 2 show opposing trends compared to the MPI results. With our MapReduce code, we see consistent speedups as more nodes are added to the system, although with a diminishing impact as more nodes are added. This is clear from the decreasing parallel efficiencies, and as stated before, these speedups are not related to MapReduce or MPI specifically, but are due to the programs evaluated here. WordCount sees a superior speedup compared to the Power Method code due to its reduced amount of communication and larger amount of computation. We also see smaller standard deviations when compared with the Power Method MPI code, as the communication is strictly dictated and optimized by the runtime itself.

Table 2 Parallel efficiency for WordCount using MapReduce over varying numbers of nodes

# of nodes    Parallel efficiency
4             0.8455
8             0.5978
16            0.5313
32            0.3591
64            0.3000

Fig. 7 Average running time for the Thread Ring code utilizing MPI, X10, and UPC over varying numbers of nodes. These timings only include execution times reported by each language's timing constructs

In our first Thread Ring experiment, we measure the time taken to send 100 messages through a ring of 64 threads. We vary the number of nodes used between 1, 2, 4, 8, 16, 32, and 64. Figure 7 shows the amount of time taken for implementations written in X10, MPI, and UPC, while Table 3 shows the parallel speedup achieved. Both the MPI and X10 codes improve in execution time as nodes are added. While the X10 code achieves a better parallel efficiency than the MPI code, it is on average one to three orders of magnitude slower. The reason behind this has been explained by the X10 team: the X10 runtime currently is not optimized to handle scenarios where the system is overprovisioned (e.g., when the number of processes exceeds the number of nodes). This is confirmed by the scenario in which 64 nodes are used: here, the system is not overprovisioned and runs in an equivalent amount of time as the MPI code. The UPC code exhibits a very different scaling pattern compared to the MPI and X10 codes: as it relies on synchronization via barrier statements, it runs quickly when the number of nodes is small, and becomes slower as the number of nodes increases.

Table 3 Parallel efficiency for the Thread Ring code utilizing MPI, X10, and UPC over varying numbers of nodes

# of nodes    MPI       X10        UPC
1             1.0000    1.0000     1.0000
2             0.5000    0.5000     0.5000
4             0.4518    1.1251     0.0014
8             0.3068    1.2663     5.6904e-04
16            0.1955    1.6518     2.0528e-04
32            0.1189    1.9190     7.0025e-05
64            0.0642    25.1618    9.8221e-06

Our second Thread Ring experiment fixes the number of nodes at the median value (8 nodes), and measures the amount of time needed to send 100 messages through thread rings of varying sizes. Here, we vary the sizes between 8, 16, 32, 64, and 128 threads. The results of this experiment for the X10, MPI, and UPC codes are shown in Fig. 8. As expected, all codes become slower as the size of the thread ring grows. The overall execution time is fastest for the MPI code, followed by that of the UPC and X10 codes. The reason for these differences is identical to that given previously: the UPC code relies on barriers, and as the number of threads increases, it becomes more expensive to perform these barrier operations. The X10 code is also overprovisioned in most cases, so it slows down in these scenarios as well. In the scenario where it is not overprovisioned (e.g., when there are 8 threads and 8 nodes), the X10 code performs on par with the MPI code.

Fig. 8 Average running time for the Thread Ring code utilizing MPI, X10, and UPC over varying numbers of threads. These timings only include execution times as reported by each language's timing constructs



Our third Thread Ring experiment fixes the number of nodes at the median value (8 nodes) once again and measures the amount of time needed to send a varying number of messages through the thread ring. Specifically, we vary the number of messages to be sent between 1, 10, 100, 1000, and 10000 messages for the X10, MPI, and UPC codes. Figure 9 shows the results of this experiment: for all codes, excluding the single message scenario, the time to send additional messages increases linearly. Unlike the other benchmarks, the X10 and UPC codes perform within an order of magnitude of the MPI code. For the X10 code, this is because all machines are well-provisioned (specifically because we run 8 threads over 8 nodes), avoiding the performance degradation that the other experiments revealed. The UPC code also maintains relatively close performance to the MPI and X10 codes due to the low number of nodes: the barriers, which are the bottleneck of the UPC code, are inexpensive when a relatively small number of nodes are used.

Fig. 9 Average running time for the Thread Ring code utilizing MPI, X10, and UPC over varying numbers of messages. These timings only include execution times as reported by each language's timing constructs

To evaluate the performance of our Erlang code, we compare our Erlang Thread Ring implementation with that of our MPI code deployed over a single node. We fix the number of messages to send at 1000 and vary the number of threads that make up the ring between 4, 8, 16, 32, and 64. The results of this experiment are shown in Fig. 10. The Erlang code scales linearly, and performs two to three orders of magnitude faster than the MPI code for all numbers of threads tested. This is likely due to Erlang's long history as a concurrent language and its lightweight threading model, which make it highly amenable to this experiment. By contrast, MPI is designed for distributed execution and, as we have seen in the other experiments, suffers performance degradation when overprovisioned. We are looking into support for Threaded MPI (TMPI) [34], which provides optimizations for the concurrent use case shown here.

We next analyze the performance of the specialized SSA packages supported by Neptune. This includes the specialized implementations present in DFSP and dwSSA, which here focus on the yeast polarization and birth-death models discussed previously.

Fig. 10 Average running time for the single node Thread Ring code utilizing MPI and Erlang over varying numbers of threads. These timings only include execution times as reported by each language's timing constructs

Like the MapReduce code analyzed earlier, DFSP also benefits from parallelization and support via Neptune. This is because the DFSP implementation used has no internode communication during its computation, and is embarrassingly parallel. In the DFSP code, once each node knows how many simulations to run, it works with no communication from other nodes. Figure 11 and Table 4 show the running times for 10,000 simulations via Neptune. Unlike MapReduce and MPI, which provide distributed runtimes, our DFSP code does not, so we time all interactions from the moment AppScale receives the message to begin computation from Neptune until the results have been merged on the master node.

Fig. 11 Average running time for the DFSP code over varying numbers of nodes. As the code used here does not have a distributed runtime, timings here include the time that AppScale takes to distribute work to each node and merge the individual results

Figure 11 and Table 4 show similar trends for the DFSP code as seen in the MapReduce WordCount code. This code also sees a consistent reduction in runtime as the number of nodes increases, but retains a much higher parallel efficiency compared to the MapReduce code. This is due to the lack of communication within the computation, as the framework needs only to collect results once the computation is complete, and does not need to sort or shuffle data, as is needed in the MapReduce framework. As less communication is used here compared to the WordCount and Power Method MPI codes, the DFSP code exhibits a smaller standard deviation, and one that tends to decrease with respect to the number of nodes in the system.

Table 4 Parallel efficiency for the DFSP code over varying numbers of nodes

# of nodes    Parallel efficiency
4             0.9929
8             0.9834
16            0.9650
32            0.9216
64            0.8325

Fig. 12 Average running time for the dwSSA code over varying numbers of nodes. As the code used here does not have a distributed runtime, timings here include the time that AppScale takes to distribute work to each node and merge the individual results

Another example that follows trends similar to the DFSP code is the other SSA code, dwSSA, shown in Fig. 12 and Table 5. This code achieves a reduction in runtime with respect to the number of nodes in the system, but does not do so at the same rate as the DFSP code, as can be seen through the lower parallel efficiencies. This is because the execution time for a single dwSSA trajectory is much smaller than that of a single DFSP trajectory, which results in wasted time setting up and tearing down the R environment.

Table 5 Parallel efficiency for the dwSSA code over varying numbers of nodes

# of nodes    Parallel efficiency
4             0.7906
8             0.4739
16            0.3946
32            0.2951
64            0.1468

4.3 VM Reuse Analysis

Next, we perform a brief examination of the costs of the experiments in the previous section if run over Amazon EC2, with and without the VM reuse techniques described previously. The VMs are configured with 1 virtual CPU, 1 GB of memory, and a 64-bit platform. This is similar to the Amazon EC2 "Small" machine type (1 virtual CPU, 1.7 GB of memory, and a 32-bit platform), which costs $0.085 per hour.

Each PowerMethod, MapReduce, DFSP, and dwSSA experiment is run five times at 1, 4, 8, 16, 32, and 64 nodes to produce the data shown earlier, while each NQueens experiment is run five times at 2, 4, 8, 16, 32, and 64 nodes. We compute the cost of running these experiments without VM reuse (that is, by acquiring the needed number of machines, running the experiments, and then powering them off) compared to the cost with VM reuse (that is, by acquiring the needed number of machines, performing the experiment for all numbers of nodes, and not powering them off until all runs complete). Note that in the reuse case, we do not perform reuse between experiments. For example, the Neptune code used to run the experiments for the X10 NQueens code is:

[2, 4, 8, 16, 32, 64].each { |i|
  5.times { |j|
    neptune :type => :x10,
            :code => '/code/NQueensDist',
            :nodes_to_use => i,
            :output => "/nqueensx10/#{i}/#{j}"
  }
}

Table 6 shows the expected cost of running these experiments with and without VM reuse. In all experiments, employing VM reuse greatly reduces the cost. This is largely due to the inefficient use of nodes without reuse, as many scenarios employ large numbers of nodes to run experiments that only run for a fraction of an hour (AWS charges for VMs by the hour). All of the experiments except for DFSP also cost roughly the same, because they use similar numbers of CPU-hours of computation within AWS and thus are similarly priced. We would see much greater variation in cost under a per-minute or per-second pricing model instead of the per-hour pricing model.

Table 6 Cost to run experiments for each type of Neptune job, with and without reusing virtual machines

Type of job      Cost with VM reuse    Cost without VM reuse
PowerMethod      $12.84                $64.18
NQueens (MPI)    $12.92                $64.60
NQueens (X10)    $13.01                $64.60
MapReduce        $13.01                $64.18
DFSP             $35.70                $78.63
dwSSA            $12.84                $64.18
Total            $100.32               $400.37
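To make the effect of hourly billing concrete with invented numbers: a 64-node run that finishes in 10 minutes without reuse is still billed for 64 full instance-hours, or 64 × $0.085 = $5.44, whereas with reuse those same instance-hours can also absorb the smaller runs performed during the remainder of that hour.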

5 Related Work

An early version of this work was presented at the Workshop on Scientific Cloud Computing (ScienceCloud) and was entitled "Neptune: A Domain Specific Language for Deploying HPC Software on Cloud Platforms" [5]. The work developed by others that is most similar to Neptune is cloudinit.d from Nimbus [22]. cloudinit.d provides an API that users employ to automatically launch, configure, and deploy nodes in a cloud infrastructure. In contrast to Neptune, cloudinit.d's programming model places the onus of configuration and deployment on the user who writes cloudinit.d scripts. Neptune takes an alternate approach, hiding the complexity of correct configuration and deployment.

Other works exist that provide either language support for cloud infrastructures or automated configuration or deployment, but not both. In the former category exist projects like SAGA [21], the RightScale Gems [29], and boto [3]. SAGA enables users to write programs in C++, Python, or Java that interact with Grid resources, with the recent addition of support for cloud infrastructure interaction. A key difference between SAGA and Neptune is that SAGA is conceptually designed to work with Grid resources, and thus the locus of control remains with the user. The programming paradigm embodied here serves use cases that favor a static number of nodes and an unchanging environment. Conversely, Neptune is designed to work over cloud resources, and can elastically add or remove resources based on the environment. The RightScale Gems and boto are similar to SAGA but only provide interaction with cloud infrastructures (e.g., Amazon EC2 and Eucalyptus).



In the latter category exist projects such as the Nimbus Context Broker [22] and Mesos [19]. The Nimbus Context Broker automates configuration and deployment of otherwise complex software packages in a manner similar to that of Neptune. It acquires a set of virtual machines from a supported cloud infrastructure and runs a given series of commands to unify them as the user's software requires. Conceptually, this is similar to what Neptune offers. However, it does not offer a language by which it can be operated, as Neptune and SAGA do. Furthermore, the Nimbus Context Broker, like SAGA, does not make decisions dynamically based on the underlying environment. A set of machines could not be acquired, tested to ensure that a low latency exists, and released within a script running on the Nimbus Context Broker. Furthermore, it does not employ virtual machine reuse techniques such as those seen within Neptune. This would require a closer coupling with supported cloud infrastructures or the use of a middleware layer to coordinate VM scheduling, which would effectively constitute a cloud platform.

Like the Nimbus Context Broker, Mesos also automates configuration and deployment of complex software packages, but aims to do so for only a very specific set of packages (MapReduce, MPI, Torque, and Spark). It requires supported packages to be modified, and once they are "Mesos-aware", they can be utilized towards goals of better resource utilization for the cluster as a whole and better performance for individual jobs. Mesos also positions itself in the cluster computing space, in which jobs can dynamically scale up and down in the number of nodes that they use, but where the cluster as a whole must be statically partitioned. Cluster administrators can manually add or remove nodes, but the size of the cluster as a whole tends to remain static. This is in contrast to the cloud model employed by Neptune, where the number of nodes is in flux and is controllable by Neptune itself.

6 Conclusions

We contribute Neptune, a Domain Specific Language (DSL) that abstracts away the complexities of deploying and using high performance computing services within cloud platforms. We integrate support for Neptune into AppScale, an open-source cloud platform, and add cloud software support for MPI, X10, MapReduce, UPC, Erlang, and the SSA packages StochKit, DFSP, and dwSSA. Neptune allows users to deploy supported software packages over varying numbers of nodes with minimal effort, simply, uniformly, and scalably.

We also contribute techniques for placement support of components within cloud platforms, while ensuring that running cloud software does not negatively impact other services. This entails hybrid cloud placement techniques, facilitating application deployment across cloud infrastructures without modification. We implement these techniques within AppScale and provide sharing support that allows users to share the results of Neptune jobs, and to publish data to the scientific community. The system is flexible enough to allow users to reuse Neptune job outputs as inputs to other Neptune jobs. Neptune is open-source and can be downloaded from http://neptune-lang.org. Users with Ruby installed can also install Neptune directly via Ruby's integrated software repository by running gem install neptune. Our modifications to AppScale have been committed back to the AppScale project and can be found at http://appscale.cs.ucsb.edu.

Acknowledgements We thank the anonymous reviewers for their insightful comments. This work was funded in part by Google, IBM, and NSF grants CNS-CAREER-0546737 and CNS-0905237.

References

1. Amazon Simple Storage Service (Amazon S3): http://aws.amazon.com/s3/. Last accessed 31 December 2011

2. Armstrong, J., Virding, R., Wikström, C., Williams, M.: Concurrent Programming in ERLANG (1993)

3. Boto: http://code.google.com/p/boto/. Last accessed 31 December 2011

4. Bunch, C., Chohan, N., Krintz, C., Chohan, J., Kupferman, J., Lakhina, P., Li, Y., Nomura, Y.: An evaluation of distributed datastores using the AppScale Cloud Platform. In: IEEE International Conference on Cloud Computing (2010)

5. Bunch, C., Chohan, N., Krintz, C., Shams, K.: Neptune: a domain specific language for deploying HPC software on cloud platforms. In: ACM Workshop on Scientific Cloud Computing (2011)

6. Cassandra Operations: http://wiki.apache.org/cassandra/Operations

7. Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. SIGPLAN Not. 40, 519–538 (2005)

8. Chohan, N., Bunch, C., Krintz, C., Nomura, Y.: Database-agnostic transaction support for cloud infrastructures. In: IEEE International Conference on Cloud Computing (2011)

9. Chohan, N., Bunch, C., Pang, S., Krintz, C., Mostafa, N., Soman, S., Wolski, R.: AppScale: scalable and open AppEngine application development and deployment. In: ICST International Conference on Cloud Computing (2009)

10. Daigle, B.J., Roh, M.K., Gillespie, D.T., Petzold, L.R.: Automated estimation of rare event probabilities in biochemical systems. J. Phys. Chem. 134, 044110 (2011)

11. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of 6th Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004)

12. Drawert, B., Lawson, M.J., Petzold, L., Khammash, M.: The diffusive finite state projection algorithm for efficient simulation of the stochastic reaction-diffusion master equation. J. Phys. Chem. 132(7), 074101-1 (2010)

13. El-Ghazawi, T., Cantonnet, F.: UPC performance and potential: a NPB experimental study. In: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, Supercomputing '02, pp. 1–26. IEEE Computer Society Press, Los Alamitos, CA, USA (2002)

14. El-Samad, H., Kurata, H., Doyle, J.C., Gross, C.A., Khammash, M.: Surviving heat shock: control strategies for robustness and performance. Proc. Natl. Acad. Sci. USA 102(8), 2736–2741 (2005)

15. Engaging the Missing Middle: http://www.hpcinthecloud.com/features/Engaging-the-Missing-Middle-in-HPC-95750644.html. Last accessed 31 December 2011

16. Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 22(6), 789–828 (1996)

17. Hadoop Distributed File System: http://hadoop.apache.org. Last accessed 31 December 2011

18. Heroku Learns from Amazon EC2 Outage: http://searchcloudcomputing.techtarget.com/news/1378426/Heroku-learns-from-Amazon-EC2-outage. Last accessed 31 December 2011

19. Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A., Katz, R., Shenker, S., Stoica, I.: Mesos: a platform for fine-grained resource sharing in the data center. In: Networked Systems Design and Implementation (2011)

20. Hucka, M., Finney, A., Sauro, H.M., Bolouri, H., Doyle, J.C., Kitano, H., the rest of the SBML Forum, Arkin, A.P., Bornstein, B.J., Bray, D., Cornish-Bowden, A., Cuellar, A.A., Dronov, S., Gilles, E.D., Ginkel, M., Gor, V., Goryanin, I.I., Hedley, W.J., Hodgman, T.C., Hofmeyr, J.-H., Hunter, P.J., Juty, N.S., Kasberger, J.L., Kremling, A., Kummer, U., Le Novère, N., Loew, L.M., Lucio, D., Mendes, P., Minch, E., Mjolsness, E.D., Nakayama, Y., Nelson, M.R., Nielsen, P.F., Sakurada, T., Schaff, J.C., Shapiro, B.E., Shimizu, T.S., Spence, H.D., Stelling, J., Takahashi, K., Tomita, M., Wagner, J., Wang, J.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4), 524–531 (2003)

21. Kaiser, H., Merzky, A., Hirmer, S., Allen, G., Seidel, E.: The SAGA C++ reference implementation: a milestone toward new high-level Grid applications. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06. ACM, New York, NY, USA (2006)

22. Keahey, K., Freeman, T.: Nimbus or an open source cloud platform or the best open source EC2 no money can buy. In: Supercomputing (2008)

23. Koslovski, G., Huu, T.T., Montagnat, J., Vicat-Blanc, P.: Executing distributed applications on virtualized infrastructures specified with the VXDL language and managed by the HIPerNET framework. In: ICST International Conference on Cloud Computing (2009)

24. Krintz, C., Bunch, C., Chohan, N.: AppScale: Open-Source Platform-As-A-Service. Technical Report 2011-01, University of California, Santa Barbara (2011)

25. Lustre: http://www.lustre.org/. Last accessed 31 December 2011

26. Nagel, W.E., Arnold, A., Weber, M., Hoppe, H.-Ch., Solchenbach, K.: VAMPIR: visualization and analysis of MPI resources. Supercomputer 12, 69–80 (1996)

27. Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., Zagorodnov, D.: The Eucalyptus open-source cloud-computing system. In: IEEE International Symposium on Cluster Computing and the Grid. http://open.eucalyptus.com/documents/ccgrid2009.pdf (2009). Last accessed 31 December 2011

28. Pbspro home page: http://www.altair.com/software/pbspro.htm. Last accessed 31 December 2011

29. RightScale: RightScale Gems http://rightaws.rubyforge.org/. Last accessed 31 December 2011

30. Rolfe, T.J.: A specimen MPI application: N-Queens in parallel. Inroads (bulletin of the ACM SIG on Computer Science Education), vol. 40(4) (2008)

31. Ruby language: http://www.ruby-lang.org. Last accessed 31 December 2011

32. Ruby on Rails: http://www.rubyonrails.org. Last accessed 31 December 2011

33. Sanft, K.R., Wu, S., Roh, M., Fu, J., Lim, R.K., Petzold, L.R.: StochKit2: software for discrete stochastic simulation of biochemical systems with events. Bioinformatics (2011)

34. Shen, K., Tang, H., Yang, T.: Adaptive two-level thread management for fast MPI execution on shared memory machines. In: Proceedings of ACM/IEEE SuperComputing '99 (1999)

35. StochKit: http://www.cs.ucsb.edu/cse/StochKit/. Last accessed 31 December 2011