
iSDF: an Integrated software-defined Computing Framework for Scientific Experiments

Julim Ahn, Yoonhee Kim

Dept. of Computer Science

Sookmyung Women’s University

Seoul, Korea


Distributed & Cloud Computing Lab.

Introduction

Rapid technological advances in cloud computing:

• Scientific application executions have been migrating to data centers.

• Applications are becoming more diverse and their demands are increasing.

• These dynamic demands make it non-trivial to support and run applications on clouds.

• It is essential to design a dedicated software-defined computing framework for them.

Design of Software-defined computing framework

➢We have designed a framework for software-defined computing that enables management to be decoupled from, and independent of, the underlying infrastructures

Application-aware Anti-Interference scheduling

➢It offers dynamic, fine-grained resource scheduling

Workflow supporting

➢It supports workflow execution and scheduling to ensure its performance

Ease-of-use environment

➢Users can control this framework through a graphical user interface (GUI) or an application programming interface (API)

Introduction - an Integrated software-defined Computing Framework for Scientific Experiments, iSDF

High Throughput Computing-as-a-Service (HTCaaS)

An integrated multi-level scheduling & job submission system

It enables scientists to carry out large-scale and complex scientific computations by integrating computing resources over heterogeneous infrastructures.

HTCaaS – related technology

Main drawback

It focuses on large-scale task execution rather than on allocating fine-grained resources.

It requires developing an adaptor in order to connect to and expand onto new infrastructures.

Related technology - Job scheduling and Resource allocation

Wang [1]

A controlling system which allocates appropriate resources through monitoring and analyzing current workloads of applications.

A virtual server is operated on a group of physical machines, and each server is responsible for a particular application.

➢It mainly focuses on proper resource management and utilization, with less consideration of task performance.

Lee [2]

A scheduling scheme considering deadline as well as power consumption in a cloud computing environment.

➢It is not applicable to scientific applications such as workflows, because it targets Bag-of-Tasks applications, which have no dependencies between tasks.

Bittencourt [3]

A scheduling scheme for interdependent workflow applications without limiting the scheduling object to the Bag of tasks application.

➢It is limited to grid environments and does not address cloud environments.

[1] X. Wang, et al., "Appliance-based autonomic provisioning framework for virtualized outsourcing data center," Fourth International Conference on Autonomic Computing (ICAC '07), IEEE, 2007.

[2] J. Lee, H. Kang, Y. Kim, "Energy-Efficient Provisioning of Virtual Machines for Workflow Applications," KNOM Review, vol. 16, no. 1, July 2013, pp. 35-42.

[3] L. F. Bittencourt, E. R. Madeira, "A Performance-oriented Adaptive Scheduler for Dependent Tasks on Grids," Concurrency and Computation: Practice and Experience, vol. 20, pp. 1029-1049, 2008.

Architecture

➢The architecture is categorized into three groups: modules related to jobs, modules related to resources, and a third related to profiling.

An Integrated SDC Framework, iSDF

The First group

• Job Submission

• Job Management

• Job Dispatch module

The Second group

• Resource Management

• Monitoring module

The Third group

• Profiling Management

➢ Job Submission

• It manages and parses job description files (JSDL) specified by the user(s).

• A JSDL file includes the job name, file staging, commands to execute, and arguments with their ranges.

➢ Job Management

• It manages task scheduling and queue management.

• In the workflow case, it checks task dependencies, which is supported by Chronos.

• Diverse scheduling policies can be applied to the framework by the system administrator.

iSDF - The First group
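The Job Submission module's JSDL handling can be sketched as below. This is an illustrative parse of a minimal JSDL-like description; real JSDL documents use XML namespaces and a much richer schema, and the element names here are a simplified assumption, not iSDF's actual format.

```python
import xml.etree.ElementTree as ET

# A minimal JSDL-like job description (illustrative only).
JOB_XML = """
<JobDefinition>
  <JobDescription>
    <JobIdentification><JobName>autodock-run</JobName></JobIdentification>
    <Application>
      <Executable>/usr/bin/autodock3</Executable>
      <Argument>ligand-1.pdbqt</Argument>
      <Argument>ligand-2.pdbqt</Argument>
    </Application>
    <DataStaging><Source>input/ligands.tar</Source></DataStaging>
  </JobDescription>
</JobDefinition>
"""

def parse_job(xml_text):
    """Extract the job name, executable, arguments, and staged files."""
    desc = ET.fromstring(xml_text).find("JobDescription")
    return {
        "name": desc.findtext("JobIdentification/JobName"),
        "executable": desc.findtext("Application/Executable"),
        "arguments": [a.text for a in desc.findall("Application/Argument")],
        "staging": [s.text for s in desc.findall("DataStaging/Source")],
    }

job = parse_job(JOB_XML)
print(job["name"])       # autodock-run
print(job["arguments"])  # ['ligand-1.pdbqt', 'ligand-2.pdbqt']
```

The extracted dictionary is what the later modules would consume: the job name for tracking, the staging list for Job Dispatch, and the argument ranges for task expansion.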

➢ Job Dispatch

• It examines the input data and the execution file, and launches the tasks onto the allocated resources.

➢ Resource Management

• It takes charge of allocating multiple types of resources and managing virtual machines.

• It is based on Mesos, which helps the system control multiple computers as a single computer and allocates resources dynamically.

➢ Monitoring

• It collects the status of CPU, memory, disk, virtual resources, and jobs.

• It shares this information with the other modules.

iSDF - The Second group

Profiling Management

It manages the simulation history based on the submitted jobs.

Meaningful information is extracted in the form of a profile schema.

The extracted profile information is then stored.

iSDF - The Third group
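A minimal sketch of what a profile schema could look like is given below. The slides do not specify the schema, so the field names and the summary fields here are assumptions chosen for illustration; the utilization figures in the usage example come from the experiment slides.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical profile schema: iSDF's actual schema is not specified
# in the slides, so all field names here are illustrative.
@dataclass
class JobProfile:
    app_name: str
    cpu_samples: list = field(default_factory=list)
    mem_samples: list = field(default_factory=list)

    def record(self, cpu_pct, mem_pct):
        # append one monitoring sample to the simulation history
        self.cpu_samples.append(cpu_pct)
        self.mem_samples.append(mem_pct)

    def summary(self):
        """Reduce the raw monitoring history to the stored profile."""
        avg_cpu = mean(self.cpu_samples)
        avg_mem = mean(self.mem_samples)
        return {
            "app": self.app_name,
            "avg_cpu": avg_cpu,
            "avg_mem": avg_mem,
            "dominant": "cpu" if avg_cpu >= avg_mem else "memory",
        }

p = JobProfile("Autodock3")
p.record(99.2, 1.1)  # utilization figures from the experiment slides
print(p.summary()["dominant"])  # cpu
```

The "dominant" field is the piece the anti-interference scheduler would later consult when deciding which tasks to co-locate.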

1. A JSDL file is submitted to the iSDF through the front end, either an API (Application Programming Interface) or a GUI (Graphical User Interface).

2. The Job Submission module parses the JSDL file and extracts its information.

3. The Job Management module queries the Profile Management module for the job characteristics.


iSDF – Execution Scenario

<Execution Scenario of the integrated framework>

4. The Job Management module decides a scheduling plan for the tasks based on the scheduling policy.

5. The Resource Management module launches the virtual machines.

6. The Job Dispatch module prepares to launch the tasks by checking the input and execution files.

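The six steps above can be sketched as a pipeline of module calls. Each function below is a trivial stand-in for the corresponding iSDF module; the function names, the semicolon-separated "JSDL" string, and the return values are all illustrative assumptions, not iSDF's actual API.

```python
# Illustrative stubs: each function stands in for an iSDF module.
def job_submission_parse(jsdl_text):
    # step 2: parse the job description and extract information
    name, cmd = jsdl_text.split(";")
    return {"name": name, "command": cmd}

def profile_management_lookup(job):
    # step 3: look up job characteristics in the profiling history
    return {"dominant": "cpu"}

def job_management_schedule(job, profile):
    # step 4: decide a scheduling plan based on the policy
    return {"job": job["name"], "placement": f"{profile['dominant']}-light node"}

def resource_management_launch(plan):
    # step 5: launch virtual machines for the plan
    return [f"vm-for-{plan['job']}"]

def job_dispatch(job, vms):
    # step 6: check inputs and launch the tasks on allocated resources
    return {"job": job["name"], "running_on": vms}

def run_scenario(jsdl_text):
    # step 1: a job description arrives through the API or GUI front end
    job = job_submission_parse(jsdl_text)
    profile = profile_management_lookup(job)
    plan = job_management_schedule(job, profile)
    vms = resource_management_launch(plan)
    return job_dispatch(job, vms)

result = run_scenario("montage;mProjExec")
print(result["running_on"])  # ['vm-for-montage']
```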

Many computational jobs vary in character, being CPU-intensive, memory-intensive, or I/O-intensive.

Allocating tasks with similar resource characteristics to the same resources may cause resource contention, degrading overall performance and lowering throughput for all executing jobs.

➢ The scheduler offers sophisticated and balanced scheduling, so that all applications can run in an anti-interference way.

1. Fine-grained resource allocation

2. 'Anti-interference'-based scheduling: well-balanced resource utilization that keeps tasks oblivious to interference

3. Workflow-supportive scheduling: resource reservation using the Maximum Estimated Resource requirements (MER)

Application-aware Anti-interference scheduler


If the current job is a workflow, the scheduler allocates resources using its MER value.

If a task has a dominant resource, the scheduler schedules it to maximize resource consumption and minimize the remaining resources, by co-locating tasks that have opposing dominant resource requirements.

Application-aware Anti-interference scheduler - flowchart
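The two scheduling rules can be sketched as below, under assumptions the slides do not spell out: each task is a dict with "cpu" (cores), "mem" (GB), and "mer" (treated here as a CPU reservation, or None for non-workflow tasks). All names are illustrative, not iSDF's API, and a fuller version would also pair CPU-heavy tasks with memory-heavy ones.

```python
# Sketch of the anti-interference scheduling decision (illustrative).
def schedule(tasks, cap):
    """Return task names in placement order against capacity `cap`."""
    placed = []

    def fits(need_cpu, need_mem):
        return need_cpu <= cap["cpu"] and need_mem <= cap["mem"]

    # Rule 1: workflow tasks carry an MER and reserve resources first.
    for t in (t for t in tasks if t["mer"] is not None):
        need = max(t["cpu"], t["mer"])
        if fits(need, t["mem"]):
            cap["cpu"] -= need
            cap["mem"] -= t["mem"]
            placed.append(t["name"])

    # Rule 2: remaining tasks in ascending order of total demand; a
    # fuller version would also co-locate tasks whose dominant
    # resources oppose each other (CPU-heavy next to memory-heavy).
    for t in sorted((t for t in tasks if t["mer"] is None),
                    key=lambda t: t["cpu"] + t["mem"]):
        if fits(t["cpu"], t["mem"]):
            cap["cpu"] -= t["cpu"]
            cap["mem"] -= t["mem"]
            placed.append(t["name"])
    return placed

# The Job1/Job2 example from the following slides: Job2's task is
# placed first because it has an MER, reserving 3 CPUs instead of 1.
tasks = [{"name": "task1", "cpu": 2, "mem": 2, "mer": None},
         {"name": "task2", "cpu": 1, "mem": 5, "mer": 3}]
order = schedule(tasks, {"cpu": 6, "mem": 10})
print(order)  # ['task2', 'task1']
```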

(Step 1)

The Resource Management module is informed of the total amount of available resources.

(Step 2)

The Monitoring module updates the resource capacities, and the Job Management module acquires the resource availability from the Monitoring module.

(Step 3)

Job1: {2 CPU, 2 GB, MER = null}, Job2: {1 CPU, 5 GB, MER = 3}

Application-aware Anti-interference scheduler - example

(Step 4)

The Resource Management module passes this information to the Job Management module.

(Step 5)

The Job Management module gives priority to task 2, since it (belonging to Job2) has an MER value.

Task 1 and task 3 are then scheduled on the remaining resources, in ascending order of their resource needs.

Application-aware Anti-interference scheduler - example

• The drug discovery field: molecular modeling simulation software (Autodock3). One thousand ligands; each task uses 1 CPU core and 0.1 MB of memory.

Autodock3 shows 1.1% memory utilization while maintaining 99.2% CPU utilization.


Applications used in experiments

• The astronomy field: an astronomical image mosaic engine (Montage GALFA). 16 tasks; each task needs 1 CPU core and approximately 1 GB of memory.

Montage shows high memory utilization while its CPU utilization is relatively low: 90.8% and 9.1%, respectively.


[Experiment 1] Scheduling without application profiling and without considering overall utilization

The jobs of each application are given 50% of the total resources (from R/n = 100/2 %).

R: the ratio of total available resources; n: the number of applications to run.

➢ This leads to low resource utilization, since the jobs of each application cannot obtain as many resources as they prefer.

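A rough arithmetic check shows why the even R/n split wastes resources: each application is granted half of both CPU and memory, but Autodock3 barely uses memory and Montage barely uses CPU. The utilization figures below come from the application slides; the exact accounting is a simplifying assumption for illustration.

```python
# Why an even R/n split underutilizes the cluster (illustrative).
def naive_share(total_pct, n_apps):
    # each application is granted R/n of every resource type
    return total_pct / n_apps

share = naive_share(100, 2)  # 50.0% of CPU and 50.0% of memory per app

# Fraction of the cluster actually kept busy, using each application's
# measured utilization of its half-allocation:
cpu_used = 0.992 * share + 0.091 * share   # ~54% of total CPU busy
mem_used = 0.011 * share + 0.908 * share   # ~46% of total memory busy
```

Roughly half of each resource type sits idle, which is the gap the profiling-based scheduling in the next experiments recovers.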

[Experiment 2] Scheduling using application profiling but without anti-interference control

Each task receives the proper virtual resource that furnishes its desired resources, using the profiling module.

Resource starvation for the workflow application (Montage) occurs while the jobs are running.

➢ This results in long latency and makespan time.

• Resource pursuit becomes concentrated on the specific resource each application prefers (e.g., CPU for Autodock3).


Compared with Experiment 1, the utilization is improved by 23.8%.

[Experiment 3] Scheduling with application profiling and anti-interference control

This run uses both profiling and anti-interference control, where the scheduler reserves resources up to the MER when a workflow exists.

➢ The arrow indicates the reservation.

➢ It prevents resource starvation as well as performance degradation.

✓ It improves utilization across the whole set of servers and contributes to a shorter makespan time.


Compared with Experiment 2, the utilization is improved by 13.4% and the execution time is reduced by 35%.

Many investigations have sought to configure new cloud computing frameworks tailored to diverse applications.

The disruptive properties of scientific applications make it challenging to adapt them to virtualized computing environments.

We proposed an integrated Software-Defined computing Framework for scientific experiments (iSDF).

It provides sophisticated allocation over diverse types of resources by considering each application's properties.

• It achieves a 13.4% increase in resource utilization and improves the makespan time by 35%.

Future work

Complementing this framework with an additional module to take care of data management.

Performing additional experiments in hybrid cloud environments.


Conclusion & Future work


Thank you!

[email protected]
