Opportunities and Challenges for running Scientific Workflows on the Cloud

CSC 8710-001 – Presentation I
Mohammed Shahnawaz Ali

Executive Summary

Cloud computing has been discussed in recent years in relation to services and infrastructure resources that can be contracted over a network, endorsing the idea of renting infrastructure instead of buying it. Cloud computing infrastructures enable companies to cut costs by offloading computations on demand, and have therefore gained tremendous momentum in both academia and industry. The application of cloud computing, however, has mostly focused on Web and business applications; its use to support large-scale workflows, especially data-intensive scientific workflows, is still largely overlooked.

1/29/2014

Executive Summary (cont’d)

The paper coins the term “Cloud Workflow” to refer to the specification, execution, and provenance tracking of large-scale scientific workflows, as well as the management of data and computing resources to enable the execution of scientific workflows on the Cloud. The paper analyzes:

1. Why there has been such a gap between the two technologies.
2. What it means to bring Cloud and workflow together.
3. What the key challenges are in running Cloud workflows.
4. What research opportunities exist in realizing workflows on the Cloud.

Introduction. Cloud – The Origin

The term “cloud” has its origins in network diagrams that represented the Internet, or various parts of it, as schematic clouds. The term “cloud computing” was coined to describe what happens when applications and services are moved into the Internet “cloud.” Cloud computing did not appear overnight; in some form it may trace back to a time when computer systems remotely time-shared computing resources and applications. Today, cloud computing refers to the many different types of services and applications delivered in the Internet cloud, and to the fact that, in many cases, the devices used to access these services and applications do not require any special applications.

Introduction. Cloud – The Representation

Cloud computing represents:

1. A different way to architect and remotely manage computing resources.
2. Network-based services that appear to be provided by real server hardware but are in fact served up by virtual hardware, simulated by software running on one or more real machines.

Cloud computing offerings today are suitable to:

3. Host enterprise architectures and provide clear benefit to corporations by providing capabilities complementary to what they have.
4. Help elastically scale enterprise architectures.

Introduction. Cloud – The Metaphor

In common usage, the term "the cloud" is essentially a metaphor for the Internet. Marketers have further popularized the phrase "in the cloud" to refer to software, platforms, and infrastructure that are sold as a service, i.e. remotely through the Internet. Typically, the seller runs the actual energy-consuming servers that host products and services at a remote location, so end users don't have to; they can simply log on to the network without installing anything. Depending on their field of interest, software, service, or infrastructure providers highlight different aspects.

Introduction. Cloud – The Offerings

Some notable companies delivering services from the cloud include:

1. Google: has a private cloud that it uses to deliver many different services to its users, including email access, document applications, text translation, maps, web analytics, and much more.
2. Microsoft: has the Microsoft SharePoint online service, which allows content and business intelligence tools to be moved into the cloud, and currently makes its office applications available in a cloud.
3. Salesforce.com: runs its application set for its customers in a cloud, and its Force.com and VMforce.com products provide developers with platforms to build customized cloud services.

History

• 2006: August 24, 2006 can arguably be regarded as the birthday of cloud computing, as it was on this day that Amazon made the test version of its Elastic Compute Cloud (EC2) public. This offer of flexible IT resources (computing capacity) marked a definitive milestone in dynamic business relations between IT users and providers.
• 2007: The term first became popular in 2007, as attested by the first entry in the English Wikipedia, dated March 3, 2007, which, significantly, contained a reference to utility computing.
• 2008: There was a glut of active parties in the increasingly popular field of cloud computing.
• Today, “Cloud Computing” generates over 10.3 million matches on Google.


Cloud Computing

Cloud Computing – The Characteristics

Cloud computing has a variety of characteristics, the main ones being:

1. Shared Infrastructure: uses a virtualized software model, enabling the sharing of physical services, storage, and networking capabilities. The cloud infrastructure, regardless of deployment model, seeks to make the most of the available infrastructure across a number of users.
2. Dynamic Provisioning: allows services to be provisioned based on current demand. This is done automatically in software, enabling service capability to expand and contract as needed. This dynamic scaling must be done while maintaining high levels of reliability and security.

Cloud Computing – The Characteristics (cont’d)

3. Network Access: services need to be accessible across the Internet from a broad range of devices such as PCs, laptops, and mobile devices, using standards-based APIs (for example, ones based on HTTP). Deployments of services in the cloud range from business applications to the latest application on the newest smartphones.
4. Managed Metering: uses metering to manage and optimize the service and to provide reporting and billing information. In this way, consumers are billed according to how much they have actually used during the billing period.
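The managed-metering characteristic can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the resource names, the hourly rates, and the `bill` function are hypothetical and do not correspond to any real provider's pricing or API.

```python
from collections import defaultdict

# Hypothetical hourly rates per resource type (illustrative values only).
RATES_PER_HOUR = {"small_vm": 0.10, "large_vm": 0.40}

def bill(usage_records, rates=RATES_PER_HOUR):
    """Sum metered usage per consumer and price it at the given rates."""
    totals = defaultdict(float)
    for consumer, resource, hours in usage_records:
        totals[consumer] += hours * rates[resource]
    return {consumer: round(amount, 2) for consumer, amount in totals.items()}

# Each record: (consumer, resource type, hours used in the billing period).
records = [
    ("lab_a", "small_vm", 10),
    ("lab_a", "large_vm", 2),
    ("lab_b", "small_vm", 5),
]
invoice = bill(records)
```

Each consumer pays only for the hours recorded by the meter, which is the pay-per-use idea the slide describes.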

Cloud Computing – The Service Models

Once a cloud is established, how its services are deployed in terms of business models can differ depending on requirements. The primary service models are commonly known as:

1. Software as a Service (SaaS): consumers purchase the ability to access and use an application or service that is hosted in the cloud. A benchmark example is Salesforce.com, as discussed previously, where the information needed for the interaction between the consumer and the service is hosted as part of the service in the cloud.
2. Platform as a Service (PaaS): consumers purchase access to platforms, enabling them to deploy their own software and applications in the cloud. The operating systems and network access are not managed by the consumer, and there might be constraints on which applications can be deployed.

Cloud Computing – The Service Models (cont’d)

3. Infrastructure as a Service (IaaS): consumers control and manage the systems in terms of operating systems, applications, storage, and network connectivity, but do not themselves control the cloud infrastructure.
4. Communications as a Service (CaaS): a model used to describe hosted IP telephony services. Along with the move to CaaS comes a shift to more IP-centric communications and more SIP trunking deployments. With IP and SIP in place, it can be as easy to have the PBX in the cloud as on the premises. In this context, CaaS could be seen as a subset of SaaS.

Cloud Computing – The Deployment Models

Deploying cloud computing can differ depending on requirements. The following four deployment models have been identified, each with specific characteristics that support the needs of the services and users of the clouds in particular ways:

1. Private Cloud: the cloud infrastructure has been deployed, and is maintained and operated, for a specific organization. The operation may be in-house or with a third party on the premises.
2. Community Cloud: the cloud infrastructure is shared among a number of organizations with similar interests and requirements. This may help limit the capital expenditure for its establishment, as the costs are shared among the organizations. The operation may be in-house or with a third party on the premises.

Cloud Computing – The Deployment Models (cont’d)

3. Public Cloud: the cloud infrastructure is made available to the public on a commercial basis by a cloud service provider. This enables a consumer to develop and deploy a service in the cloud with very little financial outlay compared to the capital expenditure normally associated with other deployment options.
4. Hybrid Cloud: the cloud infrastructure consists of a number of clouds of any type, whose interfaces allow data and/or applications to be moved from one cloud to another. This can be a combination of private and public clouds that supports both the requirement to retain some data within an organization and the need to offer services in the cloud.

Cloud Computing – The Benefits

The following are some of the possible benefits for those who offer cloud computing-based services and applications:

1. Cost Savings: companies can reduce their capital expenditures and use operational expenditures to increase their computing capabilities. This lowers the barrier to entry and also requires fewer in-house IT resources to provide system support.
2. Scalability/Flexibility: companies can start with a small deployment, grow to a large deployment fairly rapidly, and then scale back if necessary. The flexibility of cloud computing also allows companies to use extra resources at peak times, enabling them to satisfy consumer demands.

Cloud Computing – The Benefits (cont’d)

3. Reliability: services using multiple redundant sites can support business continuity and disaster recovery.
4. Maintenance: cloud service providers do the system maintenance, and access is through APIs that do not require application installations on PCs, further reducing maintenance requirements.
5. Mobile Accessible: mobile workers are more productive because systems are accessible in an infrastructure available from anywhere.


Scientific Workflows

• Scientific workflows combine scientific problem solving with traditional workflow techniques.
• They form a class of workflows, distinct from business workflows, that emerges in sophisticated scientific problem-solving environments and applications such as climate modeling, structural biology and chemistry, medical surgery, or disaster recovery simulation.
• Compared with business workflows, scientific workflows have special features such as computation, data, or transaction intensity, less human interaction, and a large number of activities.

Scientific Workflow Management System

The reference architecture for a scientific workflow management system (SWFMS) consists of four logical layers, seven major functional subsystems, and six interfaces:

1. Operational Layer: consists of a wide range of heterogeneous and distributed data sources, software tools, services, and their operational environments, including high-end computing environments.
2. Task Management Layer: consists of three subsystems: Data Product Management, Provenance Management, and Task Management.
3. Workflow Management Layer: consists of the Workflow Engine and Workflow Monitoring.
4. Presentation Layer: consists of the Workflow Design subsystem and the Presentation and Visualization subsystem.

Scientific Workflows – On The Cloud Today

Cloud computing has been widely accepted and applied to Web and business applications. However, cloud capabilities have not been successfully extended to execute and manage workflow applications, especially data-intensive scientific workflows.

The current state of workflow organization on the Cloud has been either:

1. static, predefined pipelines based on batch-style scripts or graphs based on the MapReduce programming model, or
2. ad hoc mash-ups that are connected together with, again, scripts that parse the output of one web application and feed it into another.

Scientific Workflows – On The Cloud Today (cont’d)

Several scientific workflow management systems (SWFMSs) have been successfully applied over a number of execution environments, namely local hosts, clusters/grids, and supercomputers. Cloud computing, however, provides a paradigm-shifting, utility-oriented computing model: its datacenter-level resource pools are of unprecedented size and its resources are provisioned on demand, which enables scientific workflow solutions capable of addressing peta-scale scientific problems.

Cloud Workflow – Bringing them together

The term “Cloud Workflow” is constructed to bring the terms Cloud and Scientific Workflows together.

It refers to the following aspects of scientific workflows:

1. specification,
2. execution,
3. provenance tracking,

along with the management of data and computing resources to enable the running of scientific workflows on the Cloud.


Opportunities

Cloud Workflow – The Opportunities

Clouds provide a multiplicity of opportunities that are largely technological in nature. These opportunities stem primarily from the extensive use of service-oriented architectures and virtualization in clouds.

Scalability:

• The scale of scientific problems:
  • that can be addressed by scientific workflows is now greatly increased; it was previously limited by the size of a dedicated resource pool.
  • is reflected not only in the data sizes that scientific applications need to handle, but also in the complexity of the applications themselves.
• Cloud platforms can offer vast amounts of storage space and computing resources for applications across multiple disciplines, including physics, earth science, and medicine, allowing scientific discoveries to be carried out at an unprecedented scale.

Cloud Workflow – The Opportunities

Dynamic resource allocation:

• Allocating resources only when they are needed offers several advantages, including:
  • Optimum resource utilization.
  • Improved end-user experience.
  • Collaborative, batch-based scientific workflows.

Cloud Workflow – The Opportunities

Relinquish allocated resources:

• The Cloud allows users to return resources on demand.
• This enables workflow systems to easily grow and shrink the available resource pool as the needs of the workflow change over time.
• Acquiring or releasing resources lets the pool closely match the needs of the application for optimal usage.
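The grow-and-shrink behaviour described above can be sketched as a simple sizing rule. This is a minimal illustration, assuming a hypothetical policy in which each node handles a fixed number of queued tasks; the function names and the node limits are not taken from the paper or from any workflow system.

```python
import math

def target_pool_size(pending_tasks, tasks_per_node, min_nodes=0, max_nodes=100):
    """Nodes needed for the queued tasks, clamped to [min_nodes, max_nodes]."""
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))

def scaling_action(current_nodes, pending_tasks, tasks_per_node):
    """Positive result: acquire that many nodes; negative: release; zero: no change."""
    return target_pool_size(pending_tasks, tasks_per_node) - current_nodes

# A burst of 250 queued tasks on a 4-node pool asks for 28 more nodes;
# once only 10 tasks remain, a 32-node pool can release 30 of them.
grow = scaling_action(current_nodes=4, pending_tasks=250, tasks_per_node=8)
shrink = scaling_action(current_nodes=32, pending_tasks=10, tasks_per_node=8)
```

A real system would also account for node start-up delay and billing granularity, which this sketch deliberately ignores.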

Cloud Workflow – The Opportunities

Performance/Cost Trade-off:

• Cloud computing provides much more room for trading off performance against cost.
• The spectrum of resource investment now ranges from:
  • dedicated private resources,
  • a hybrid resource pool combining local resources and remote clouds,
  • full outsourcing of computing and storage to public Clouds.
• Cloud computing not only provides the potential to solve larger-scale scientific problems, but also brings the opportunity to improve the performance/cost ratio.
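One way to reason about the performance/cost trade-off is a back-of-the-envelope estimate such as the Python sketch below. It rests on loudly stated assumptions: perfect parallel speedup, billing per started node-hour, and a made-up rate. None of these come from the paper.

```python
import math

def estimate(total_cpu_hours, nodes, rate_per_node_hour):
    """Return (wall-clock hours, cost).

    Assumes perfect speedup and that every started hour on every node is billed.
    """
    wall_hours = total_cpu_hours / nodes
    billed_hours = math.ceil(wall_hours)
    cost = nodes * billed_hours * rate_per_node_hour
    return wall_hours, cost

# 100 CPU-hours of work at a hypothetical rate of 0.5 per node-hour.
few_nodes = estimate(100, 10, 0.5)    # 10.0 hours, cost 50.0
many_nodes = estimate(100, 40, 0.5)   # 2.5 hours, cost 60.0 (3 billed hours per node)
```

Under these assumptions, quadrupling the pool cuts the wall-clock time to a quarter while raising the cost only through hour rounding, which is the kind of trade-off the slide refers to.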

Cloud Workflow – The Opportunities

Heterogeneous Applications Support:

• Clouds, through their use of virtualization technology, make it much easier to run heterogeneous applications together.
• Virtualization enables the environment to be customized to suit the application.
• An environment, with its operating system, applications, and their configurations, can be bundled as a virtual machine image and redeployed on a cloud to run the workflow.

Cloud Workflow – The Opportunities

Resource Provisioning:

• Instead of delegating allocation to the resource manager, the user directly provisions the resources required and schedules computations using a user-controlled scheduler.
• The provisioning model is ideal for workflows and other loosely coupled applications because it enables the application to allocate a resource once and use it to execute many tasks.
• This reduces the total scheduling overhead, which, in turn, can dramatically improve workflow performance.
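The overhead argument above can be made concrete with simple arithmetic. The sketch below compares two hypothetical cost models for running tasks one after another on a single resource; the wait, provisioning, and dispatch times are invented for illustration.

```python
def time_with_per_task_allocation(n_tasks, task_seconds, queue_wait_seconds):
    """Every task is submitted to the resource manager and waits in its queue."""
    return n_tasks * (queue_wait_seconds + task_seconds)

def time_with_provision_once(n_tasks, task_seconds, provision_seconds, dispatch_seconds):
    """The resource is provisioned once; each task then pays only a small dispatch cost."""
    return provision_seconds + n_tasks * (dispatch_seconds + task_seconds)

# 1000 short tasks of 10 s each (all numbers are illustrative assumptions).
per_task = time_with_per_task_allocation(1000, 10, queue_wait_seconds=30)   # 40000 s
provisioned = time_with_provision_once(1000, 10, provision_seconds=120,
                                       dispatch_seconds=1)                  # 11120 s
```

The shorter the tasks, the more the per-task queue wait dominates, which is why the provisioning model suits workflows made of many small tasks.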

Cloud Workflow – The Opportunities

Provenance and Re-Imaging:

• Virtualization allows one to capture the exact environment that was used to perform a computation, including all of the software and configuration used in that environment.
• The virtual machine image can be stored along with the provenance of the workflow.
• The virtual machine image can later be redeployed to create exactly the same environment that was used to run the original experiment.
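Storing the image alongside the provenance could look like the following Python sketch. The record layout, the field names, and the image identifier format are hypothetical; they only illustrate the idea of linking each task's provenance to the environment it ran in.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProvenanceRecord:
    workflow_id: str
    task: str
    inputs: list
    outputs: list
    vm_image_id: str  # identifier of the stored VM image (hypothetical naming)

def to_json(record):
    """Serialize a record so it can be stored next to the workflow results."""
    return json.dumps(asdict(record), sort_keys=True)

def images_for_rerun(records, workflow_id):
    """VM images that must be redeployed to re-create the original environment."""
    return {r.vm_image_id for r in records if r.workflow_id == workflow_id}

records = [
    ProvenanceRecord("wf-1", "align", ["reads.fq"], ["aligned.bam"], "image-001"),
    ProvenanceRecord("wf-1", "plot", ["aligned.bam"], ["coverage.png"], "image-002"),
    ProvenanceRecord("wf-2", "align", ["other.fq"], ["other.bam"], "image-001"),
]
needed = images_for_rerun(records, "wf-1")
```

Rerunning workflow `wf-1` would then start by redeploying the images in `needed` before replaying the recorded tasks.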


Challenges

Cloud Workflow – The Challenges

Despite the advantages and opportunities that Cloud computing offers for scientific workflows, there are many major obstacles to adapting and running scientific workflows on the Cloud.

Architectural Challenge:

The following seven are key architectural requirements for an SWFMS:

1. User interface customizability and user interaction support.
2. Reproducibility support.
3. Heterogeneous and distributed services and software tools integration.
4. Heterogeneous and distributed data product management.
5. High-end computing support.
6. Workflow monitoring and failure handling.
7. Interoperability.

Cloud Workflow – The Challenges

Architectural Challenge (cont’d):

There are four possible solutions for deploying the reference architecture in a Cloud computing environment:

1. Operational-Layer-in-the-Cloud: only the Operational Layer is deployed in the Cloud, with the SWFMS running outside the Cloud.
2. Task-Management-Layer-in-the-Cloud: both the Operational Layer and the Task Management Layer are deployed in the Cloud.
3. Workflow-Management-Layer-in-the-Cloud: the Operational Layer, the Task Management Layer, and the Workflow Management Layer are deployed in the Cloud, with the Presentation Layer deployed on a client machine.
4. All-in-the-Cloud: the whole SWFMS is deployed inside the Cloud and is accessible via a Web browser.
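The four deployment options differ only in how many layers, counted from the bottom of the reference architecture, move into the Cloud. A small Python sketch of that mapping follows; the dictionary and the `placement` function are illustrative names, not part of any SWFMS.

```python
# Layers of the reference architecture, from bottom to top.
LAYERS = ["Operational", "Task Management", "Workflow Management", "Presentation"]

# Number of layers (counting from the bottom) hosted in the Cloud per option.
OPTIONS = {
    "Operational-Layer-in-the-Cloud": 1,
    "Task-Management-Layer-in-the-Cloud": 2,
    "Workflow-Management-Layer-in-the-Cloud": 3,
    "All-in-the-Cloud": 4,
}

def placement(option):
    """Map each layer to 'cloud' or 'outside' for the given deployment option."""
    in_cloud = OPTIONS[option]
    return {
        layer: ("cloud" if index < in_cloud else "outside")
        for index, layer in enumerate(LAYERS)
    }

plan = placement("Workflow-Management-Layer-in-the-Cloud")
```

Here `plan` places every layer except the Presentation Layer in the Cloud, matching option 3 above.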

Cloud Workflow – The Challenges

Integration Challenge:

The integration problem includes the following:

1. In the Operational-Layer-in-the-Cloud approach, applications, services, and tools hosted in the Cloud are treated as task units in a workflow. The scheduling and management of the workflow remain mostly outside the Cloud, and the task units are invoked as they are scheduled to execute.
2. Once task dispatching and scheduling move into the Cloud, resource provisioning becomes the next issue.
3. The uncapped resources requested by a workflow come at a cost.
4. Debugging, monitoring, and provenance tracking for a workflow can be even more difficult in the Cloud. Because resources are usually assigned dynamically and are based on virtual machine instances, the environment that a task executes on could be destroyed right after the task finishes and assigned to a completely different user and task.
5. Porting an SWFMS into the Cloud is also a concern; it usually involves wrapping the SWFMS as a Cloud service.

Cloud Workflow – The Challenges

Language Challenge:

Languages adopted for cloud computing include:

1. MapReduce is the “only” widely adopted computing model, and there are a number of language variations based on this model for task specification in the Cloud.
2. White-box approach: MapReduce and its variations require application logic to be rewritten to follow the map-reduce-merge programming model. Users therefore need to fully understand and port their applications before they can leverage the parallel computing infrastructure.
3. Black-box approach: SwiftScript serves as a general-purpose coordination language in which existing applications can be invoked without modification.
4. Mash-ups and ad hoc scripts (JavaScript, PHP, Python, etc.) have become key technologies for developing Web applications that dynamically integrate multiple data or service sources.
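To show what "rewriting application logic to follow the model" means in the white-box approach, here is a word count expressed as map, shuffle, and reduce steps in plain Python. It is a single-process sketch of the programming model only; it does not use Hadoop or any real MapReduce framework, and the function names are chosen for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["cloud workflow", "Cloud cloud"])
```

Even this trivial task had to be restructured into map and reduce functions, which is the porting effort the white-box approach demands; a black-box coordination language would instead invoke an existing word-count program unchanged.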

Cloud Workflow – The Challenges

Language Challenge (cont’d):

The language challenges include the following:

1. Handle the mapping of input and output data into logical structures to facilitate data integration and logical operations on data.
2. Support large-scale parallelism via either implicit parallelism or explicit declarations such as Parallel Foreach.
3. Support data partitioning and task partitioning.
4. Require a scalable, reliable, and efficient runtime system that can support Cloud-scale task scheduling and dispatching, provide error recovery and fault tolerance under all kinds of hardware and service failures, and utilize a large pool of Cloud resources efficiently.
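Data partitioning combined with an explicit parallel foreach can be sketched with Python's standard `concurrent.futures` module. This is a single-machine illustration using threads; the `partition` and `parallel_foreach` helpers are hypothetical names, and a Cloud-scale runtime would dispatch the chunks to remote workers instead.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def parallel_foreach(func, chunks, max_workers=4):
    """Apply func to every chunk concurrently; results keep the chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, chunks))

chunks = partition(list(range(10)), 3)       # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
partial_sums = parallel_foreach(sum, chunks)  # [6, 15, 24]
total = sum(partial_sums)                     # 45
```

The runtime concerns listed in item 4 (fault tolerance, scheduling at scale) are exactly what this toy version leaves out.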

Cloud Workflow – The Challenges

Computing Challenge:

The computing challenges include the following:

1. Managing computing resources at large scale.
2. Workflow systems may not be able to talk to Cloud resources directly; they may still need to go through middleware services, such as Nimbus and Falkon, that handle resource provisioning and task dispatching.
3. Workflow resource requirements, data dependencies, Cloud virtualization, etc. make things even more complicated.
4. Additional measures are needed to support large workflows and components.

Cloud Workflow – The Challenges

Data Management Challenge:

The data management challenges include the following:

1. Analyzing, visualizing, and disseminating large data sets.
2. Managing data resources and the dataflow between storage and compute resources in data-intensive applications.

The following aspects of data management within a Cloud are important from a workflow perspective:

3. Data Locality:
   a. The location of the data relative to the available computational resources. Moving data repeatedly to distant CPUs is expensive and inefficient.
   b. Data need to be distributed over many computers to achieve good scalability.
4. Combining compute and data resource management.
5. The scalability of Clouds requires scalable provenance systems that can handle the storage and querying of potentially millions of tasks.
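The data-locality point can be illustrated with a simple placement heuristic: run a task on the node that already holds most of its input bytes. The sketch below is an assumption-laden illustration; the file names, sizes, and the `pick_node` function are hypothetical and not taken from any scheduler.

```python
def pick_node(task_inputs, file_sizes, node_files):
    """Choose the node that already holds the most input bytes for the task."""
    def local_bytes(node):
        return sum(file_sizes[f] for f in task_inputs if f in node_files[node])
    # Sorting the node names makes ties resolve deterministically.
    return max(sorted(node_files), key=local_bytes)

def bytes_to_transfer(task_inputs, file_sizes, node_files, node):
    """Input bytes that would still have to be moved to the chosen node."""
    return sum(file_sizes[f] for f in task_inputs if f not in node_files[node])

file_sizes = {"a": 100, "b": 50, "c": 10}
node_files = {"n1": {"a"}, "n2": {"b", "c"}}
inputs = ["a", "b", "c"]

best = pick_node(inputs, file_sizes, node_files)
moved = bytes_to_transfer(inputs, file_sizes, node_files, best)
```

Node `n1` holds 100 of the 160 input bytes, so only 60 bytes need to move; choosing `n2` would move 100.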

Cloud Workflow – The Challenges

Service Management Challenge:

The service management challenges (orchestrating and invoking services via an SWFMS) include the following:

1. Service description, discovery, and composition.
2. Managing the large number of service instances.
3. Data movement across service instances involving large data volumes.
4. For a workflow to invoke publicly available services, the SWFMS also needs to handle security, interoperability, and data transformation issues.

Cloud Workflow – The Challenges

Storage Challenge:

The storage challenges include the following:

1. Commercial clouds often deploy structured or object-based storage services that can be utilized by workflow applications.
2. In the absence of standard file system interfaces, the application code must either be modified to interface with the storage services or be wrapped with additional workflow components that do the translation.
3. Deploying a temporary shared file system in the cloud as part of a virtual cluster is complex and potentially costly, and requires an additional step to ensure that the desired outputs are transferred to permanent storage.
4. Storage security.
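The "wrap the application" option in item 2 can be sketched as a stage-in, run, stage-out wrapper. The example below uses an in-memory stand-in for the object store, since no specific storage service is named in the slides; `InMemoryObjectStore`, `run_with_staging`, and `uppercase_app` are all hypothetical.

```python
import os
import tempfile

class InMemoryObjectStore:
    """Stand-in for an object storage service; not a real cloud API."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

def run_with_staging(store, app, input_key, output_key):
    """Stage input to a local file, run the file-based app, stage the output back."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input.dat")
        out_path = os.path.join(workdir, "output.dat")
        with open(in_path, "wb") as f:
            f.write(store.get(input_key))
        app(in_path, out_path)  # the unmodified application only sees files
        with open(out_path, "rb") as f:
            store.put(output_key, f.read())

def uppercase_app(in_path, out_path):
    """Toy file-based application used to exercise the wrapper."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(src.read().upper())

store = InMemoryObjectStore()
store.put("in", b"abc")
run_with_staging(store, uppercase_app, "in", "out")
```

The application code stays untouched; the translation between objects and files lives entirely in the wrapper, at the price of an extra copy in each direction.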

Cloud Workflow – The Challenges

Network and Tools Challenge:

The network and tools challenges include the following:

1. Data-intensive workflows depend on high-performance networks to achieve good performance.
2. They require faster networks with high throughput, though not necessarily low latency.
3. Setting up an environment to run workflows in the cloud.
4. There is some work on virtual appliances, but those are typically designed for single nodes and not for clusters of nodes.


Research Directions

Cloud Workflow – The Research Directions

The key areas for research efforts in cloud-based workflows:

Architecture:

1. Implement the key components in the different layers of the SWFMS architecture, with interoperability and reusability. This would help us leverage existing Cloud technologies, such as monitoring data management, resource provisioning, etc.
2. Leverage middleware technologies that bridge existing workflow systems with the Cloud to be more cost effective.

Scripting:

3. Scripting has the advantage of being concise and flexible, yet powerful when combined with parallel semantics and logical operations.
4. Expect to see scripting languages that have a mixture of these semantics, combining the coordination of applications and services.

Cost:

5. Analyze the cost of computation and resource utilization to estimate and optimize the ROI.

Cloud Workflow – The Research Directions

Provenance:

1. Provenance can adopt the SOA model, making it less tightly coupled with an SWFMS than it currently is.

Security:

2. Security is the first major service that a Cloud provider needs to provide.
   a. Access Control: because of the dynamic nature of the Cloud and its large-scale sharing of data, metadata, and services, access control is a challenging but important research problem.
   b. Information Control Flow: since a scientific workflow might orchestrate a large number of distributed services, data, and applications, particularly in a large-scale Cloud environment, a mechanism that prevents mission-critical information and intellectual property from being propagated to unauthorized users is important.
   c. Secure electronic transaction protocol: to prevent the abuse of Cloud accounts and double or wrong charges by a Cloud provider, further research might be needed to ensure the security of Cloud-based transaction protocols.

Closing Notes

• The benefit of cloud computing for science is not necessarily in its utility computing and economic aspects, which are not new to academic computing. The benefit of clouds lies rather in the technological features that stem from service-oriented architecture and virtualization.
• Much work is needed to bring cloud platforms up to the performance level of the grid. This includes developing cloud storage systems that are appropriate for workflows and other science applications, as well as tools to help scientists and workflow engineers deploy their applications in the cloud.
• As more customers and applications migrate into the Cloud, the need for workflow systems that manage ever more complex task dependencies and handle issues such as large parameter space exploration, smart reruns, and provenance tracking will become more urgent.
• The Cloud needs more structured and mature workflow technologies, and vice versa: the Cloud offers unprecedented scalability to workflow systems and could potentially change the way we perceive and conduct scientific experiments.


Thank You
