
Exploring the Design Space for Optimizations with Apache Aurora and Mesos

Renan DelValle, Gourav Rattihalli, Angel Beltre, Madhusudhan Govindaraju, and Michael J. Lewis

Department of Computer Science, State University of New York (SUNY) at Binghamton
{rdelval1, grattih1, abeltre1, mgovinda, mlewis}@binghamton.edu

Abstract—Cloud infrastructures increasingly include a heterogeneous mix of components in terms of performance, power, and energy usage. As the size of cloud infrastructures grows, power consumption becomes a significant constraint. We use Apache Mesos and Apache Aurora, which provide massive scalability to web-scale applications, to demonstrate how a policy-driven approach that bin-packs workloads according to their power profiles, instead of relying on the default allocation by Mesos and Aurora, can effectively reduce peak-power and energy usage while improving node utilization when workloads are co-scheduled. Our experimental results show reductions of 11% in peak power and 86% in total energy usage, and increases in utilization of 148% for memory and 8% for CPU, across the different policies.

I. INTRODUCTION

Large-scale datacenters (DCs) execute thousands of diverse applications each day. Conflicts between co-located workloads and the difficulty of matching applications to appropriate nodes can degrade performance and violate workload Quality of Service (QoS) requirements [1]. Apache Mesos [2] enables dynamic partitioning and removes the need to isolate frameworks on separate resources. Apache Aurora [3] works in concert with Mesos as a service scheduler. Many big companies with large-scale applications, such as Twitter, Apple, and Bloomberg, use these technologies to provide scalability and stability for their cloud infrastructures.

Mesos and Aurora interact to allocate resources (memory, cores, and storage) to tasks and to cache these resource allocation decisions (described in Section II). The caching mechanism is effective when each task is allocated in isolation, but can have negative consequences when many fine-grained jobs arrive and resource offers are too large. For example, if 10 tasks each requesting one CPU and four GB of memory arrive at Aurora, and Mesos makes a combined resource offer of ten CPUs and 40 GB of memory, Aurora accepts the offer. However, it accepts one CPU and four GB for each task, one at a time, repeating the resource request and acceptance process for all tasks. While this negotiation phase provides fairness and enables co-scheduling of tasks, it may introduce a delay before the start of each task.

As cloud workloads run, they draw power on their host machine(s). The power usage can vary depending on the characteristics of each host and on the Cloud's shared software infrastructure. Optimally co-scheduling applications to minimize peak-power usage can be reduced to a multi-dimensional bin packing problem, and is therefore NP-Hard. We therefore set out to find heuristics that reduce peak power in a cluster that uses Mesos and Aurora.

Approach. We address the peak-power collision problem using a policy-driven heuristic approach to the multi-dimensional bin packing problem. We use the DaCapo benchmarks [4] as workloads. DaCapo is a set of open source, real-world applications that exercise the various resources within a compute node. Our approach characterizes the power use of each benchmark on each node using fine-grained power profiles provided by Intel's Running Average Power Limit (RAPL) [5] counters via the Linux Powercapping framework [6]. We take the power profiling data for a given benchmark and node, and use it to engineer the job arrival time by potentially delaying the job by up to 3 seconds. This delay ensures that the power surges of two benchmarks do not happen at the same instant, and also influences how Mesos and Aurora allocate resources for each benchmark. We show the effect of two different bin packing policies, one that takes into account local power profile information and one that takes into consideration global power profile data. We evaluate how staging tasks to avoid peak power collisions also influences resource usage and energy consumption.

We make the following contributions in this paper:

• We demonstrate how bin-packing a set of tasks can effectively reduce peak-power usage and total energy usage while increasing node utilization, when workloads are co-scheduled to be run using Mesos and Aurora.

• We show how our experimental framework can inform application developers about how their applications respond to peak power usage in a heterogeneous cloud environment.

• We demonstrate how Apache Mesos and Apache Aurora should be used so that application developers can express what they need from a cluster in terms of peak power use, and not just memory, disk, and CPU specifications.

II. BACKGROUND: MESOS AND AURORA

Mesos provides scalability and fault-tolerance to massive-scale applications. Examples of its use include Apple's Siri, Bloomberg's data analytics, Paypal's continuous integration system, and Verizon Labs [7].

How Apache Mesos Works: Apache Mesos provides a layer of abstraction above the compute resources in data centers and large clusters. Mesos combines cluster resources (CPU, memory, and storage) into a shared pool, and efficiently allocates pool components to competing applications based on their fine-grained resource needs [2]. Mesos applications are themselves frameworks that view the Mesos layer as a cluster-wide, highly-available, fault-tolerant, distributed operating system.

Fig. 1. Architecture of Mesos and Aurora using the Thermos Executor. Aurora submits a job, negotiates resources with the Mesos master, finds a suitable offer, and sends the executor and task configuration to a Mesos agent; each cluster node runs a Mesos agent hosting a Thermos executor and its benchmark task.

A framework executor enables Mesos to deploy tasks onto nodes. An executor responds to an API defined by Mesos to establish and maintain registration with Mesos agents, launch tasks onto resources that are in control of the executor's framework, clean up after tasks that fail, and terminate all tasks on a resource that is being rescinded.

Mesos works as follows:

• A worker daemon analyzes the machine on which it runs, discovers available resources, and advertises them to the cluster-wide Mesos master.

• The Mesos master then partitions the pool of resources and makes them available to registered frameworks by sending Resource Offers. A framework may refuse an offer if it does not suit its needs (a code sketch of this offer flow follows the list). Resources are added back into the Mesos pool when (a) the framework refuses an offer, (b) a task completes or fails, thereby freeing up host resources, or (c) Mesos rescinds an offer.

• Mesos allocation strategies, including Dominant Resource Fairness, encourage fairness between frameworks, striving to allocate sufficient resources to each [8].
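To make the offer flow described in the list above concrete, the following is a minimal sketch of a framework scheduler reacting to resource offers, written against the Python bindings that shipped with Mesos in the 0.25 era. It is illustrative only: in this paper Aurora plays the framework role, and the task command and resource amounts below are placeholders, not the paper's configuration.

import itertools
from mesos.interface import Scheduler, mesos_pb2

class SketchScheduler(Scheduler):
    """Accepts an offer only if it covers one pending task (1 CPU, 4 GB)."""

    def __init__(self, pending_cmdlines):
        self.pending = list(pending_cmdlines)   # commands still to be launched
        self.counter = itertools.count()

    def resourceOffers(self, driver, offers):
        for offer in offers:
            cpus = sum(r.scalar.value for r in offer.resources if r.name == "cpus")
            mem = sum(r.scalar.value for r in offer.resources if r.name == "mem")
            if self.pending and cpus >= 1 and mem >= 4096:
                task = self._make_task(offer, self.pending.pop(0))
                driver.launchTasks(offer.id, [task])   # accept (part of) the offer
            else:
                driver.declineOffer(offer.id)          # resources return to the pool

    def _make_task(self, offer, cmdline):
        task = mesos_pb2.TaskInfo()
        task.task_id.value = "task-%d" % next(self.counter)
        task.slave_id.value = offer.slave_id.value
        task.name = "benchmark"
        task.command.value = cmdline               # hypothetical benchmark command
        cpus = task.resources.add()
        cpus.name, cpus.type, cpus.scalar.value = "cpus", mesos_pb2.Value.SCALAR, 1
        mem = task.resources.add()
        mem.name, mem.type, mem.scalar.value = "mem", mesos_pb2.Value.SCALAR, 4096
        return task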

A. Apache Aurora

Apache Aurora is a framework that runs on Apache Mesos, as shown in Figure 1. It provides the capability to schedule an application on a Mesos cluster. Aurora has its own Domain Specific Language that allows users to configure a Job. A Job consists of a collection of Tasks, each of which is comprised of Processes, which are understood and managed by the Thermos executor bundled with Apache Aurora. The executor process runs on worker nodes and launches and monitors tasks.
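As an illustration of this Job/Task/Process structure, the following is a minimal sketch of the shape of an .aurora configuration file (the DSL is embedded in Python and the Process, Task, Resources, Job, and GB names are provided by the Aurora client). The cluster, role, command line, and host names are placeholders, not the paper's actual configuration.

# One Process: the command that actually runs on the worker node.
run_benchmark = Process(
    name='run_benchmark',
    cmdline='java -jar dacapo.jar h2'            # hypothetical benchmark command
)

# One Task: a group of Processes plus the resources they are granted.
benchmark_task = Task(
    name='benchmark_task',
    processes=[run_benchmark],
    resources=Resources(cpu=1, ram=4 * GB, disk=1 * GB)
)

# One Job: the schedulable unit submitted to Aurora.
jobs = [
    Job(
        cluster='devcluster',                    # placeholder cluster name
        role='benchmarks',                       # placeholder role
        environment='devel',
        name='dacapo_h2',
        task=benchmark_task,
        instances=1,
        # Constraints can pin a task to a particular node, as the experiments
        # in this paper do; the host value here is hypothetical.
        constraints={'host': 'node-baseline-1'}
    )
]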

Job Life-Cycle with Aurora: A job configuration is first submitted to Aurora. The configuration contains resource requirements and the number of instances to run, and may contain constraints (e.g., that a task can only be run on a specific node). Depending on the configuration, one or multiple tasks begin their life-cycle in the Pending state. In this state, each task seeks a resource offer made by Mesos that appropriately matches the resource requirements and constraints defined by the job configuration. When an adequate resource offer is found, the task moves into the Assigned state. A Remote Procedure Call (RPC) containing the task's configuration is made to a Mesos Agent. A Thermos executor is launched and an acknowledgment is sent back to the Aurora scheduler. Upon receiving the acknowledgment, the scheduler moves the task into the Starting state. During the Starting state, a sandbox is created on the worker node for the task to run in. Upon the successful creation of the sandbox, the task enters the Running state, where it remains until it enters the Finished state if it runs to completion, or the Failed state if not.
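The life-cycle just described can be restated compactly as a transition table; this is only a summary of the states named above, not Aurora's internal representation.

# States and transitions of an Aurora task, as described in the text above.
LIFECYCLE = {
    "PENDING":  ["ASSIGNED"],            # waiting for a matching Mesos offer
    "ASSIGNED": ["STARTING"],            # config sent to an agent via RPC; Thermos launched and acknowledged
    "STARTING": ["RUNNING"],             # sandbox created on the worker node
    "RUNNING":  ["FINISHED", "FAILED"],  # terminal states
}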

B. Dominant Resource Fairness

The default mapping of jobs to resources is based on Dominant Resource Fairness (DRF) [8], which is a generalization of weighted max-min fairness for multiple resources. The intuitive basis for DRF is that the allocation for a given task should be based on its dominant share, the maximum share that the task has been allocated for any single resource. This creates fairness across multiple resources. A key benefit of this approach is that it can support several resource allocation policies, such as priority, reservation, and deadline-based allocation.
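The dominant-share idea can be shown with a small worked example; the cluster capacities and framework demands below are illustrative numbers, not measurements from this paper.

# A minimal sketch of DRF: each framework's dominant share is its largest
# fractional share of any single resource, and the allocator favors the
# framework with the smallest dominant share next.
CLUSTER = {"cpus": 36.0, "mem_gb": 256.0}

def dominant_share(allocation):
    """allocation: resources currently held by one framework."""
    return max(allocation[r] / CLUSTER[r] for r in CLUSTER)

allocations = {
    "framework_a": {"cpus": 9.0, "mem_gb": 16.0},   # dominant resource: CPU (9/36 = 0.25)
    "framework_b": {"cpus": 2.0, "mem_gb": 96.0},   # dominant resource: memory (96/256 = 0.375)
}

# Framework A has the smaller dominant share, so DRF would offer to it next.
next_framework = min(allocations, key=lambda f: dominant_share(allocations[f]))
print(next_framework)  # -> framework_a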

III. EXPERIMENTS

Our experiments were conducted on the Binghamton University Cloud and Big Data Computing Laboratory's research cluster, which comprises the following components:

• 4 Baseline CPU/RAM nodes: two 6-core, 12-thread Intel Xeon E5-2620 v3 @ 2.40 GHz and 64 GB RAM

• 2 Faster CPU/Baseline RAM nodes: two 8-core, 16-thread Intel Xeon E5-2640 v3 @ 2.60 GHz and 64 GB RAM

• 2 Fastest CPU/More RAM nodes: two 10-core, 20-thread Intel Xeon E5-2650 v3 @ 2.30 GHz and 128 GB RAM

The workloads with which we assess performance were derived from the DaCapo Benchmark suite. Benchmarks were run inside Docker containers on the OpenJDK 6 JVM. Each workload possesses distinct characteristics [4], as listed in Table I. Each node runs 64-bit Linux 4.2.0-18 and shares an NFS server. Apache Mesos 0.25.0 is deployed as the cluster manager, Apache Aurora 0.11.0 as the scheduler, and Docker 1.9.1 as the container technology. Performance Co-Pilot collects metrics for all nodes in the cluster. These metrics include energy measurements from RAPL counters and various statistics about CPU and memory usage from the worker nodes. An Ansible playbook limits power consumption by the nodes in the cluster using the Linux Powercapping framework [6].

The DaCapo benchmarks tradebeans, tradesoap, and tomcat simulate cloud workloads with memory- and network-intensive components. The remaining benchmarks simulate diverse types of workloads, varying from highly parallel workloads to highly serial tasks.


TABLE I
LIST OF BENCHMARKS IN THE DACAPO SUITE

avrora: multithreaded AVR microcontroller simulator
batik: Scalable Vector Graphics generator, limited concurrency
eclipse: Eclipse IDE performance tests, mixed concurrency
fop: multithreaded PDF generator from XSL-FO
h2: multithreaded, in-memory benchmarks
jython: Python benchmark, limited concurrency
luindex: document indexer, limited concurrency
lusearch: multithreaded keyword finder
pmd: multithreaded Java source code analysis
sunflow: multithreaded raytracer
tomcat: multithreaded server
tradebeans: multithreaded daytrader benchmark; uses Java Beans, an in-memory database, and GERONIMO
tradesoap: multithreaded daytrader benchmark; uses SOAP, an in-memory database, and GERONIMO
xalan: multithreaded XML-to-HTML converter

Power Throttling (Power Capping): To control the power usage by CPUs in our cluster, we used the Linux Powercapping framework [6]. The Powercapping framework takes advantage of Intel's RAPL. RAPL creates a power estimate based on a highly accurate software model [9]. The power estimate value determines the P-state at which the processor must operate to make the best effort to meet a power budget. For this set of experiments, we set the power cap at 50% of the TDP for each node. This percentage was determined to be the most appropriate through experimentation.
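The paper applies this cap cluster-wide with an Ansible playbook; the sketch below shows only the underlying per-node write through the standard RAPL powercap sysfs interface documented in the kernel text cited as [6]. The zone path and the TDP value are assumptions for one particular node, and the write requires root privileges.

# A minimal sketch, assuming the standard intel-rapl powercap zone layout.
RAPL_ZONE = "/sys/class/powercap/intel-rapl:0"   # package 0 power zone
NODE_TDP_WATTS = 85.0                            # hypothetical TDP for one node

def set_power_cap(fraction=0.5, zone=RAPL_ZONE, tdp_watts=NODE_TDP_WATTS):
    """Cap the package power limit at a fraction of TDP (sysfs units: microwatts)."""
    limit_uw = int(tdp_watts * fraction * 1_000_000)
    with open(f"{zone}/constraint_0_power_limit_uw", "w") as f:
        f.write(str(limit_uw))

# The experiments cap each node at 50% of its TDP:
# set_power_cap(0.5)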

Aurora Job creation: To orchestrate the experiments on the Aurora-Mesos cluster, we generated several Aurora job configurations, including several versions of the same benchmark constrained to only run on a specified node. Our job launcher submits these jobs through Aurora's client-side application, which converts the Domain Specific Language job configuration into an Apache Thrift message and sends it to the Aurora Scheduler. The job then progresses through the previously detailed life-cycle of an Aurora job.

Experimental workflow: The Bin-Packer generates a set of bins, each containing a set of tasks. This information is passed to a job launcher. The job launcher generates the appropriate Aurora job configuration that results in a specific task being launched. The set of tasks is submitted in groups determined by the start and end of each bin, with a fixed delay between bins. We set the configurable delay value at three seconds because most of the tasks took approximately three seconds to experience a power spike and subsequently drop back down to their previous state.
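A minimal sketch of this staged submission is shown below, driving the Aurora client from Python. The job keys, configuration paths, and the structure of the `bins` argument are assumptions for illustration; the paper's launcher may differ, but the `aurora job create CLUSTER/ROLE/ENV/NAME config.aurora` invocation is the documented client syntax.

import subprocess
import time

BIN_DELAY_SECONDS = 3  # roughly the time for a task's power spike to settle

def submit_bins(bins):
    """bins: list of bins, each a list of (job_key, config_path) tuples."""
    for bin_tasks in bins:
        for job_key, config_path in bin_tasks:
            # e.g. job_key = "devcluster/benchmarks/devel/dacapo_h2" (placeholder)
            subprocess.run(["aurora", "job", "create", job_key, config_path],
                           check=True)
        time.sleep(BIN_DELAY_SECONDS)   # fixed delay between bins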

Profiling: We run the benchmarks several times on each node and use the information from each run to determine valuable characteristics about each benchmark, including when it experienced a power surge, its CPU utilization, and more. These characteristics are utilized by the bin-packing policies used in our experiments, as detailed in Section III-A.
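One way to derive such a power profile from RAPL, sketched below under the assumption of the standard powercap sysfs counter, is to difference successive readings of the cumulative energy counter. The exact profiling pipeline used in the paper is not specified, so treat this as an illustration; the sketch also ignores counter wrap-around for brevity.

import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0 energy counter

def sample_power(duration_s, interval_s=1.0):
    """Return per-interval power samples (watts) for one benchmark run."""
    samples = []
    with open(RAPL_ENERGY) as f:
        prev = int(f.read())
    for _ in range(int(duration_s / interval_s)):
        time.sleep(interval_s)
        with open(RAPL_ENERGY) as f:
            cur = int(f.read())
        samples.append((cur - prev) / 1e6 / interval_s)  # microjoules -> watts
        prev = cur
    return samples

# PeakPower for a benchmark run, and the second at which it occurs:
# samples = sample_power(duration_s=120)
# peak_watts, peak_second = max(samples), samples.index(max(samples))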

A. Policy Driven Peak Power Reduction

We demonstrate the efficacy of two different policies, one with a local view of peak power usage and one with a global view, to explore the design space of application-specific policies with Mesos and Aurora.

1) Max Peak First Bin Packing (MPF-BP): MPF-BP is an adaptation of the commonly known bin-packing algorithm. A sorted list of all the peaks from the profiling stage is created. The size of each bin is determined by the thermal design power (TDP) of a given node. MPF-BP fits peaks into a bin such that the resulting set of tasks in each bin satisfies $\sum_{i=1}^{n} \mathrm{PeakPower}(\mathrm{Task}_i) \leq \mathrm{TDP}$, where $n$ is the number of tasks in the bin. When a bin reaches its maximum capacity, or there are no more tasks in the queue that fit in the bin, a new bin is created. These steps are repeated until all the workloads have been placed inside a bin.
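The following is a minimal sketch of MPF-BP as just described: sort tasks by profiled peak power (highest first) and fill bins so the sum of peaks never exceeds the node's TDP. The function name, the guard for an oversized task, and the example numbers are illustrative, not the authors' implementation.

def mpf_bp(peaks, tdp_watts):
    """peaks: dict of task name -> profiled PeakPower (watts). Returns a list of bins."""
    queue = sorted(peaks, key=peaks.get, reverse=True)   # highest peak first
    bins = []
    while queue:
        current, used, remaining = [], 0.0, []
        for task in queue:
            if used + peaks[task] <= tdp_watts:          # sum of peaks must stay <= TDP
                current.append(task)
                used += peaks[task]
            else:
                remaining.append(task)
        if not current:                                  # a task whose peak alone exceeds TDP gets its own bin
            current, remaining = [queue[0]], queue[1:]
        bins.append(current)
        queue = remaining
    return bins

# Illustrative numbers only:
# mpf_bp({"h2": 60, "sunflow": 55, "xalan": 40, "batik": 20}, tdp_watts=85)
# -> [["h2", "batik"], ["sunflow"], ["xalan"]]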

Power Usage on 3 different nodes with MPF-BP: We conducted experiments to study the peak power usage, energy consumption, and memory utilization when MPF-BP is applied to three different nodes: baseline, faster, and fastest. The results for the baseline node are presented in Figure 2 and described below. The figures for the faster and fastest nodes displayed similar characteristics and so are not shown, but the results are summarized below in this subsection.

Baseline node power characteristics: As shown in Figure 2a, the max peak power used by the node is about the same when compared to the default configuration.

Figure 2b shows the memory usage for the three cases. Compared to the default run, the MPF-BP optimized and uncapped run achieved a 48% increase in total memory usage, while the power-capped-bin-packed run achieved a 31% increase in total memory use compared to the default run.

Figure 2c shows that when the node's power is throttled by 50%, the energy expenditure is reduced by 64% compared to the baseline run, and is about 4% more efficient than the bin-packed optimized case.

Fig. 2. Power usage over time, memory utilization over time, and energy usage for the baseline node under the MPF-BP policy: (a) power, (b) memory utilization, (c) energy. Panel (a) outlines the effect of power capping the baseline node: the max peak power is reduced using a power cap of 50% of the node's total TDP. Panel (b) shows that, compared to the default run, the MPF-BP optimized and uncapped run achieved a 48% increase in total memory utilization, while the power-capped-bin-packed run achieved a 31% increase in total memory use.

Faster node power characteristics. On the faster node (figure not shown), peak power is not substantially different across the runs with bin-packing and power-capping. However, the frequency with which peaks are reached is reduced, from four large power spikes in the default run to three large spikes in the bin-packed run. Memory utilization data shows that, compared to the default configuration, the MPF-BP optimized and uncapped run achieved a 131% increase in total memory usage, while the power-capped-bin-packed approach achieved a 155% increase in total memory use compared to the default run.

Fastest node power characteristics: On the fastest node (figure not shown), the peak power of the MPF-BP optimized run is 4% less than that of the default run. For memory utilization, compared to the default configuration, the MPF-BP optimized and uncapped run achieved a 77% increase in total memory utilization, while the power-capped-bin-packed run achieved a 78% increase compared to the default run. The total energy expenditure improved in the bin-pack only run by 28% compared to the bin-packed, power-capped run, and by 42% as compared to the default run. An improvement in both of these areas is only experienced when MPF-BP is used with a power cap of 50%: the max peak power is decreased by 27% compared to the bin-pack only run and by 23% compared to the default run, and the energy expenditure sees an improvement of 22% over the bin-pack only run and 54% over the default run.

2) Max Average Peak Bin Packing (MAP-BP): MAP-BP generates a sorted list of the mean power consumed by each workload. The median is then calculated for the set containing the mean power used by each benchmark. A bin is packed with tasks that have a higher peak than the median of the set. The steps are repeated once a bin reaches capacity, until all tasks have been placed in a bin. Once again, the aggregate sum of the max power of the set of tasks is not allowed to exceed the TDP of the node on which they will be scheduled.
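The description above leaves a few details open (for instance, how below-median tasks are ordered), so the following is only one possible reading of MAP-BP expressed as code, not the authors' implementation: rank tasks by mean power, give above-median tasks priority, and keep the same TDP constraint on the sum of peak powers.

import statistics

def map_bp(means, peaks, tdp_watts):
    """means/peaks: dicts of task name -> mean / peak power (watts). Returns bins."""
    median_mean = statistics.median(means.values())
    ordered = sorted(means, key=means.get, reverse=True)        # highest mean first
    queue = ([t for t in ordered if means[t] > median_mean] +   # above-median tasks first
             [t for t in ordered if means[t] <= median_mean])
    bins = []
    while queue:
        current, used, remaining = [], 0.0, []
        for task in queue:
            if used + peaks[task] <= tdp_watts:                 # same TDP constraint as MPF-BP
                current.append(task)
                used += peaks[task]
            else:
                remaining.append(task)
        if not current:
            current, remaining = [queue[0]], queue[1:]
        bins.append(current)
        queue = remaining
    return bins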

Power Usage on 3 different nodes with MAP-BP: We conducted experiments to study the peak power usage, energy consumption, and memory utilization when MAP-BP is applied to three different nodes: baseline, faster, and fastest. Again, the results for just the baseline node are presented in Figure 3. The results for all three nodes are described below.

Baseline node power characteristics. The bin-packed only run in Figure 3a shows similar power peaks to those observed in Figure 2a. The bin-packed only run for MPF-BP experiences a similar max peak as the bin-packed only MAP-BP run. The max peak of the power-capped MPF-BP policy reaches above 100 watts, while the power-capped MAP-BP policy stays below the same threshold. The bin-packed only run reaches a max peak of around 110 watts, while the power-capped-bin-packed max peak only reaches about 90 watts.

Faster node power characteristics. In terms of peak power, the bin-packed-only run has a peak of 140 watts, which is 20% higher than any peak in Figure 2a. Furthermore, its peaks are 24% higher than the baseline. The power-capped-bin-packed case, on the other hand, sees a similar improvement to MPF-BP, with only one peak crossing the 100-watt threshold. Energy usage for the bin-packed-only run sees similar results, with the MAP-BP-based run incurring 6% more energy consumption. The power-throttled-bin-packed run sees a 15% jump in energy consumption compared to its MPF-BP counterpart. Memory utilization shows that, compared to the default run, the MPF-BP optimized and uncapped run achieved a 148% increase in total memory utilization, while the power-capped-bin-packed approach achieved a 127% increase in total memory use compared to the default run.

Fastest node power characteristics. The peak power consumption achieved by the bin-packed only run is similar to that exhibited in MPF-BP, with no change in max peak. The power-capped-bin-packed run, however, reaches a max peak that is 11% less than its MPF-BP counterpart. The bin-packed-only version shows a decrease of 11% compared to the MPF-BP bin-packed only run. Total memory usage shows that, compared to the default run, the MPF-BP optimized and uncapped run achieved a 70% increase in total memory utilization, while the power-capped-bin-packed run achieved a 48% increase in total memory use compared to the default run.

Fig. 3. Results for the baseline node under the MAP-BP policy: (a) power, (b) memory utilization, (c) energy. The bin-packed only run shows power peaks similar to those observed in Figure 2a; the most noticeable differences are the decrease in consumption around 100 seconds and a late spike at around 150 seconds. The power-capped-bin-packed run displays similar characteristics, with peaks lower by 90% compared to the baseline and 1% compared to the optimized run. The energy utilization compared to MPF-BP is 14% higher for the bin-packed-only run and 17% higher for the power-capped-bin-packed run. Memory utilization, shown in Figure 3b, shows that compared to the default run, the MPF-BP optimized and uncapped run achieved an 85% increase in total memory utilization, while the power-capped-bin-packed run achieved a 67% increase in total memory use.

B. Cluster Wide Peak Power Optimization

MPF-BP for the Entire Cluster. Figure 4 shows the MPF-BP policy applied to the entire cluster. Each node was responsible for bin-packing its own set of tasks. Taking into account the power usage for the entire cluster, the results show a similar trend to the one seen in Figure 2a. The max peak for the power-capped-bin-packed run shows an improvement over the other two runs: 8% against the default version and 11% against the bin-packed only version. Memory utilization, at its max, was 147% higher than the bin-packed only run and 157% higher than the power-capped-bin-packed run. Finally, similar to the results shown in Figure 4c, the default run expends 77% more energy than the bin-packed only run, and 86% more energy than the power-capped-bin-packed run.

Fig. 4. The MPF-BP policy applied to the entire cluster: (a) power, (b) memory utilization, (c) energy. The power usage for the entire cluster shows a similar trend to the one seen in Figure 2a. The max power peak for the default run, however, is 3% smaller than the max peak of the cluster running MPF-BP. The max peak for the power-capped-bin-packed run shows an improvement over the other two runs: 8% against the default run and 11% against the bin-packed only run. Memory utilization is, at its highest, 147% higher than the bin-packed only run and 157% higher than the power-capped-bin-packed run. The default run expends 77% more energy than the bin-packed only run and 86% more energy than the power-capped-bin-packed run.

MAP-BP for the Entire Cluster. The power trends for the bin-packed only and power-capped-bin-packed runs follow similar patterns, with an 11% difference in peaks. The peak power for the bin-packed only approach improves upon the cluster-wide MPF-BP bin-packed only max peak, shown in Figure 4a, by 10%, while the power-capped-bin-packed version also improves upon its MPF-BP cluster-wide counterpart in Figure 4a by 10%. The energy utilization compared to the MPF-BP algorithm is similar for the bin-packed only version. The power-capped-bin-packed version has a 23% increase compared to the power-capped-bin-packed run in Figure 4c.

Fig. 5. Results of the MAP-BP policy on the entire cluster: (a) power, (b) energy. The power trends for the bin-packed only and power-capped-bin-packed runs follow similar patterns, with an 11% difference in peaks. The peak power for the bin-packed only run improves upon the bin-packed only max peak in Figure 4a by 10%, while the power-capped-bin-packed run improves upon the power-capped-bin-packed max peak in Figure 4a by 10%. The energy utilization is similar to the MPF-BP policy.

C. Effect of Policy Driven Bin Packing on CPU Utilization

We captured several metrics provided by Performance Co-Pilot, including kernel.all.cpu.user and kernel.all.cpu.sys. Each of these metrics is sampled every second to match our RAPL sampling.
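The exact collection pipeline is not spelled out in the paper; one way to sample the same metrics at a one-second interval with Performance Co-Pilot's pmrep tool is sketched below, with the sample count chosen arbitrarily for illustration.

import subprocess

def sample_cpu_metrics(samples=300, interval_s=1):
    """Collect kernel.all.cpu.user and kernel.all.cpu.sys once per second via pmrep."""
    cmd = ["pmrep", "-t", str(interval_s), "-s", str(samples),
           "kernel.all.cpu.user", "kernel.all.cpu.sys"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout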

CPU Utilization with MPF-BP Policy. Figure 6 represents the total time spent in both the CPU USER and CPU SYS states. The default run on the baseline node sees CPU activity drawn out over a period of time, with sporadic spikes of activity reaching a non-idle CPU time of 23529 ms. By comparison, the spikes in CPU usage for the MPF-BP runs are highly compressed. The max spikes happen at different times, with the max spike for the bin-packed only run reaching 23770 ms and the max spike for the power-capped-bin-packed run reaching 23989 ms. The difference between the two CPU utilization peaks is 1%. This indicates that a lot of time was spent by the default run on non-CPU-bound work (i.e., I/O) or in the CPU IDLE state. The faster node experiences similar trends. The default run experiences a max peak of non-idle CPU time of 31288 ms. The bin-packed only run of the policy has a max peak of non-idle CPU time of 31247 ms, while the power-capped-bin-packed run has a max peak of 30461 ms. For the fastest node, the trend remains similar. However, in Figure 6c, the peaks for the bin-packed only and power-capped-bin-packed runs are further apart. The bin-packed only run has a max peak of non-idle CPU time of 37225 ms, while the power-capped-bin-packed run has a max peak of non-idle CPU time of 38001 ms.

Fig. 6. CPU utilization for the (a) baseline, (b) faster, and (c) fastest nodes. Each plot point represents the total time spent in the CPU USER and CPU SYS states per second. The default run of the baseline node sees CPU activity drawn out over a period of time; by comparison, the spikes in CPU usage for the MPF-BP runs are highly compressed. The faster and fastest nodes experience similar trends. In Figure 6c, the peaks for the bin-packed only and power-capped-bin-packed runs are further apart.

The CPU utilization for the MAP-BP policy showed similar trends. The performance data is not shown due to space constraints.

Cluster CPU Utilization with MPF-BP and MAP-BP:

Figure 7a shows the CPU usage for the MPF-BP policy. There are a few short bursts of CPU utilization for the default run, and CPU activity is limited from about 160 seconds onward. In contrast, the bin-packed only run contains a more compressed set of peaks, indicating that the CPU is busy throughout the run. The same observation can be made for the power-capped-bin-packed run, which has the highest peak of CPU utilization out of all the runs. In comparison to the MPF-BP policy, MAP-BP decreases the utilization of the CPU. Out of the three configurations, only the default configuration passes the 120-second mark. In the MAP-BP optimized configuration, the CPU utilization peak is reached early in the run and gradually decreases around the 100th second. A similar trend is observed for the power-capped-bin-packed run.

Fig. 7. Cluster CPU usage under (a) the MPF-BP policy and (b) the MAP-BP policy. The MPF-BP default run shows few, short bursts of CPU utilization, and CPU activity is very limited from about 160 seconds onward. In contrast, the bin-packed only run contains a more compressed set of peaks, indicating that the CPU is busy throughout the run. The same observation can be made about the power-capped-bin-packed run, which has the highest peak of CPU utilization out of all runs.

D. Analysis of our Policies on Aurora and Mesos

The improvements for both policies can be attributed to the default task scheduling mechanism in Mesos and Aurora, which does not take into account the power profiles of the nodes for a given benchmark. Our policies ensure that tasks are scheduled such that multiple tasks do not experience peak power at the same time. Mesos recycles non-reserved resources from an offer as a new, smaller resource offer [2]. The improvements experienced by our bin-packing policies can be further attributed to Apache Aurora's offer caching mechanism and Mesos' resource recycling procedure reacting positively to the configurable 3-second delay we inserted between tasks. Aurora's first-match scheduling is now forced to follow the scheduling order our policies enforce, instead of the generic policy it uses by default, which is based on the jobs in the queue and on which resource offers have been received and cached. Mesos and Aurora are top-level Apache projects that are subject to active contributions by the open source community. We developed a framework to evaluate and analyze different policies for using these tools off the shelf, without making any changes to Mesos 0.25.0 or Aurora 0.11.0.

IV. RELATED WORK

To our knowledge, there is no other work that addresses peak power management in Apache Mesos and Apache Aurora. Leverich and Kozyrakis [10] approach conserving power in Hadoop clusters by utilizing Hadoop's replication strategy to produce a Covering Subset (CS) of the cluster that contains at least one replica of each data block. This allows nodes not in the CS to be disabled to conserve power. Lang and Patel [11] re-interpret this same Covering Subset problem, but instead of leaving the cluster online at all times with some nodes sleeping, they consider what would happen if the cluster slept until a job was queued. Both [10] and [11] report considerable power savings. However, this approach relies on HDFS, or another similar file system, so it is not general enough to always provide energy savings in other types of distributed environments. Additionally, these approaches run counter to the claim by Meisner et al. [12] that, in practice, powering servers down does not produce substantial power gains without loss of responsiveness. Li et al. [13] use CPU temperature for scheduling jobs on a MapReduce cluster, but do not consider I/O-intensive loads.

In [14], Hartog et al. quantify the relationship between CPU temperature and energy consumption. They show that nodes in a cluster with higher CPU temperature consume more energy. With the goal of reducing the overall energy consumption of a cluster, they adapted the MARLA MapReduce framework [15] and used the CPU temperature of each node to dynamically schedule work to the nodes in a heterogeneous cluster.

V. CONCLUSION

Our policy-driven approach is highly effective for use with Apache Mesos and Aurora:

• The order in which tasks are co-scheduled in Mesos and Aurora has a significant impact on energy usage, resource utilization, and peak power usage.

• The MPF-BP policy achieves an 8% reduction in peak power usage compared to the default run of Mesos and Aurora in a cluster. Additionally, it provides a gain of 86% in energy savings.

• The power-capped-bin-packed MAP-BP policy achieves a 10% reduction in peak power usage compared to the power-capped-bin-packed MPF-BP policy. However, it suffers a 95% increase in energy consumption in comparison to the power-capped-bin-packed MPF-BP policy.

• Inserting delays between sets of job submissions mitigates penalties created by large resource offers and small tasks.

• Based on our results, prioritizing the tasks with the highest power peaks (MPF-BP) results in the most favorable scenarios compared to MAP-BP and the default. It must be noted, however, that MPF-BP has the potential to starve tasks with the lowest peak power. Additionally, these approaches require a power profile, which may not always be available, and the study is limited in scope to only a few applications.

REFERENCES

[1] C. Delimitrou and C. Kozyrakis, "Paragon: QoS-aware scheduling for heterogeneous datacenters," ACM SIGARCH Computer Architecture News, vol. 41, no. 1, pp. 77–77, May 2013.

[2] B. Hindman, A. Konwinski, and M. Zaharia, "Mesos: A platform for fine-grained resource sharing in the data center," in NSDI, 2011.

[3] "Apache Aurora." [Online]. Available: http://aurora.apache.org/

[4] S. M. Blackburn, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. Eliot, B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, R. Garner, D. von Dincklage, B. Wiedermann, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, and D. Frampton, "The DaCapo benchmarks," ACM SIGPLAN Notices, vol. 41, no. 10, p. 169, Oct. 2006.

[5] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le, "RAPL: Memory power estimation and capping," in Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED '10). New York, NY, USA: ACM Press, 2010, p. 189.

[6] "Power Capping Framework." [Online]. Available: https://www.kernel.org/doc/Documentation/power/powercap/powercap.txt

[7] "Scaling Mesos at Apple, Bloomberg, Netflix and more - Mesosphere." [Online]. Available: https://mesosphere.com/blog/2015/08/25/scaling-mesos-at-apple-bloomberg-netflix-and-more/

[8] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, "Dominant resource fairness: Fair allocation of multiple resource types," in NSDI, 2011.

[9] E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, Mar. 2012.

[10] J. Leverich and C. Kozyrakis, "On the energy (in)efficiency of Hadoop clusters," ACM SIGOPS Operating Systems Review, vol. 44, no. 1, p. 61, Mar. 2010.

[11] W. Lang and J. M. Patel, "Energy management for MapReduce clusters," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 129–139, Sep. 2010.

[12] D. Meisner, C. M. Sadler, L. A. Barroso, W.-D. Weber, and T. F. Wenisch, "Power management of online data-intensive services," ACM SIGARCH Computer Architecture News, vol. 39, no. 3, p. 319, Jul. 2011.

[13] S. Li, T. Abdelzaher, and M. Yuan, "TAPA: Temperature aware power allocation in data center with Map-Reduce," in 2011 International Green Computing Conference and Workshops. IEEE, Jul. 2011, pp. 1–8.

[14] J. Hartog, Z. Fadika, E. Dede, and M. Govindaraju, "Configuring a MapReduce framework for dynamic and efficient energy adaptation," in 2012 IEEE Fifth International Conference on Cloud Computing. IEEE, Jun. 2012, pp. 914–921.

[15] Z. Fadika, E. Dede, J. Hartog, and M. Govindaraju, "MARLA: MapReduce for heterogeneous clusters," in 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012). IEEE, May 2012, pp. 49–56.