1
Support for dynamic adaptation of power-aware server clusters
Vinicius Petrucci, Orlando Loques
Fluminense Federal University, Brazil
Daniel Mossé
University of Pittsburgh, USA
March, 2009
2
Research context
• Dynamic computing environments
  – varying workloads
  – resource variability (including component failures)
  – changing user needs
• Applications have to cope with changes
  – adaptive behavior requirement
• Support for dynamic adaptations
  – reusable infrastructure
  – adaptation language
3
Application cases
• Server clusters
  – power optimization and QoS control
• Wireless sensor networks
  – bandwidth availability, data reliability (accuracy), power optimization
• Overlay networks
  – topology reconfiguration
• Grids
  – shared (heterogeneous) resources with varying quality and availability
• Pervasive / ubiquitous computing
4
Wireless sensor networks
Distributed autonomous devices that rely on sensors to cooperatively monitor physical or environmental conditions
* Wikipedia.org
Example: energy optimization can be achieved by turning some sensors on/off
5
Videoconferencing system
Set of servers (called reflectors) that route the audio/video streams to the participating clients.
Example: monitoring and control of the reflector configuration to meet the QoS
[Figure: reflectors refUFF, refLMPD, and refUERJ, and their clients]
6
Server clusters
Example: power optimization while meeting performance / QoS requirements
Clients have a single view of the cluster through the ServerCluster component (load balancer); requests are processed by the back-end servers
7
Problem
• Adaptive policies for applications
  – implementation may be complex in itself
  – most are implemented in an ad hoc fashion
• Code for adaptation policies is
  – mixed with the application code
  – costly and difficult to modify and maintain in a real operational environment
8
Approach
• Generic solution to support adaptations
  – external reusable infrastructure to monitor and adapt running applications
  – contract-based adaptation language for representing high-level policies
• Software architecture abstractions
  – representation of application configurations
  – stored as meta-level data (object model)
9
Related work
• Rainbow (CMU)
  – adaptation language + supporting infrastructure
• Autonomic managers (IBM)
  – provides a generic view of autonomic computing
• Jade (INRIA)
  – lack of adaptation knowledge representation
• CASA (Univ. of Zurich)
  – contract-based language using XML
• We propose a lightweight approach based on scripting/dynamic language facilities
10
Autonomic computing (IBM)
Knowledge: adaptation models, data, and scripts
General feedback control loop
13
Adaptation language
• Profiles
  – conditions for triggering adaptations
• Adaptations
  – steps to move an application away from an undesirable condition
• Negotiation clauses
  – particular order to deploy the adaptations
• Constructs: adapt_period, settling_time
  – cater for timing issues of adaptation
18
Scripting languages
• Scripting/dynamic language (Python)
  – high-level abstractions for expressing dynamic adaptation policies
  – built-in functions simplify infrastructure development (e.g., compile, exec)
• Abstract adaptation operators
  – mapped to application-level operations at run-time
  – may rely on APIs provided by the app support level (e.g., Apache modules API)
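As a rough illustration of how `compile` and `exec` can keep the infrastructure small, the sketch below compiles the body of an adaptation once and then executes it against a namespace in which the abstract operators are bound to a concrete object. `RecordingCluster`, its methods, and `run_adaptation` are hypothetical stand-ins chosen for this example; they are not the framework's actual API.

```python
# Minimal sketch: adaptation code is plain Python source that is compiled once
# and executed against a namespace supplied by the infrastructure.
# RecordingCluster and run_adaptation are hypothetical names for illustration.


class RecordingCluster:
    """Stand-in for the application support level; it only records calls."""

    def __init__(self):
        self.calls = []

    def turnOn(self, server):
        self.calls.append(("turnOn", server))

    def adjustFreq(self, server, freq):
        self.calls.append(("adjustFreq", server, freq))


ADAPTATION_SOURCE = """
for (s, f) in changeConf:
    webcluster.turnOn(s)
    webcluster.adjustFreq(s, f)
"""

# compile() turns the source text into a reusable code object.
ADAPTATION_CODE = compile(ADAPTATION_SOURCE, "<adaptation>", "exec")


def run_adaptation(code, cluster, change_conf):
    """Execute compiled adaptation code with the operators bound at run time."""
    namespace = {"webcluster": cluster, "changeConf": change_conf}
    exec(code, namespace)
    return cluster.calls


if __name__ == "__main__":
    print(run_adaptation(ADAPTATION_CODE, RecordingCluster(), [("srv1", 2)]))
```

Binding `webcluster` through the namespace is what lets the same adaptation text be mapped to different application-level operations without changing the policy itself.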
19
Multiple adaptation contracts
• Support for multiple domains of adaptation
  – each contract has one thread of control
• Simple concurrency model
  – global locking mechanism
  – first-come, first-served approach

while contract.running:
    for a in contract.adaptations:
        if a.profile is True:
            execute adaptation code of "a"
            sleep for "settling_time" interval
    sleep for "adapt_period" interval
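The loop above could be rendered in runnable Python roughly as follows. This is a sketch under stated assumptions: the `Adaptation` and `Contract` classes, the decision to hold the lock during the settling interval, and the use of seconds for the timing values are all choices made for this example. Note also that `threading.Lock` does not guarantee strict arrival-order hand-off, so it only approximates a first-come, first-served policy.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical data structures; the field names mirror the pseudocode.
@dataclass
class Adaptation:
    profile: Callable[[], bool]   # condition that triggers the adaptation
    action: Callable[[], None]    # adaptation code
    settling_time: float          # seconds to wait after acting


@dataclass
class Contract:
    adaptations: List[Adaptation]
    adapt_period: float           # seconds between evaluation rounds
    running: bool = True


# One lock shared by all contracts: only one adaptation runs at a time.
GLOBAL_LOCK = threading.Lock()


def run_contract(contract: Contract) -> None:
    """Control loop for one contract; intended to run in its own thread."""
    while contract.running:
        for a in contract.adaptations:
            if a.profile():
                with GLOBAL_LOCK:
                    a.action()
                    # Assumption: the lock is kept while the system settles.
                    time.sleep(a.settling_time)
        time.sleep(contract.adapt_period)


def start(contract: Contract) -> threading.Thread:
    """Give the contract its own thread of control."""
    thread = threading.Thread(target=run_contract, args=(contract,), daemon=True)
    thread.start()
    return thread
```

Setting `contract.running = False` lets the loop finish its current round and exit.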
20
The case of server clusters
• Server utilization remains very low
  – average about 6%
• Energy consumption is high and growing
  – about 9% per year
• Carbon emissions are set to quadruple by 2012
  – projected to surpass the airline industry
• Great opportunity for dynamic adaptations
Source: Uptime Institute (McKinsey & Co. report, http://uptimeinstitute.org)
21
Dynamic adaptations
• Dynamic adaptation capabilities
  – CPU DVFS (dynamic voltage/frequency scaling)
  – server on/off mechanisms (e.g., suspend-to-RAM + wake-on-LAN)
• Power and performance trade-off
  – servers' capacity management to reduce energy consumption
  – guarantee of QoS requirements (e.g., utilization or response time)
22
Configuration problem
• N = number of servers
• F_i = number of frequencies of server i
• p_busy, p_idle = power cost
• perf = servers' performance
• X_ij = decision variable
• demand = incoming workload
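The formulation itself does not survive in the transcript, so the following is only a sketch of one way such a configuration problem could be solved, not the authors' model. It assumes that each server is either off or running at exactly one frequency, that the load is spread so all active servers share the same utilization, and that power grows linearly from p_idle to p_busy with utilization. The function name `best_config` and the tuple layout are hypothetical.

```python
from itertools import product


def best_config(servers, demand):
    """Exhaustively pick one frequency index per server (None = server off).

    servers: one list per server, holding a (perf, p_idle, p_busy) tuple
             for each available frequency.
    Returns (total_power, choice), or None if the demand cannot be met.
    """
    best = None
    options = [[None] + list(range(len(freqs))) for freqs in servers]
    for choice in product(*options):
        active = [servers[i][j] for i, j in enumerate(choice) if j is not None]
        capacity = sum(perf for perf, _, _ in active)
        if capacity < demand:
            continue  # this configuration cannot serve the workload
        # Assumed load model: every active server runs at the same utilization.
        u = demand / capacity if capacity > 0 else 0.0
        power = sum(p_idle + u * (p_busy - p_idle) for _, p_idle, p_busy in active)
        if best is None or power < best[0]:
            best = (power, choice)
    return best


if __name__ == "__main__":
    # Two identical servers, each with a low and a high frequency (made-up numbers).
    freqs = [(50, 60, 80), (100, 70, 120)]
    print(best_config([freqs, freqs], demand=90))
```

The search is exponential in N, so it is only practical for very small clusters; a solver-based or heuristic approach would be needed beyond that.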
27
Adaptation example
• Thresholds for cluster utilization– e.g., T_LOW = 0.70 and T_HIGH = 0.85
profile { webcluster.load / webcluster.maxLoad() < T_LOW } util_low;
profile { webcluster.load / webcluster.maxLoad() > T_HIGH } util_high;
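To make the two thresholds concrete, here is a small Python sketch that evaluates both profiles for a given load. The function `active_profiles` is a hypothetical helper written for this example; only the threshold values and the profile names come from the slide.

```python
T_LOW = 0.70
T_HIGH = 0.85


def active_profiles(load, max_load):
    """Return the names of the profiles that hold for the given load."""
    utilization = load / max_load
    profiles = []
    if utilization < T_LOW:
        profiles.append("util_low")
    if utilization > T_HIGH:
        profiles.append("util_high")
    return profiles


if __name__ == "__main__":
    for load in (600, 800, 900):
        print(load, active_profiles(load, max_load=1000))
```

Utilization values between the two thresholds trigger neither profile, which leaves a band in which the cluster configuration is left alone.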
28
Adaptation example
contract {
    adaptation {
        demand = webcluster.load / T_HIGH
        changeConf = webcluster.bestConfig(demand)
        for (s, f) in changeConf:
            if f == 0:
                webcluster.turnOff(s)
            else:
                if s.status == 0:
                    webcluster.turnOn(s)
                webcluster.adjustFreq(s, f)
    } adjustCluster with util_low or util_high \
        settling_time 6000/*ms*/;
} decision1 adapt_period 5000/*ms*/;
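The adaptation body can be exercised outside the framework with a few stand-in objects. In the sketch below, `FakeServer`, `FakeCluster`, `target_demand`, and `adjust_cluster` are hypothetical names, and the encoding of `status` (0 for off, 1 for on) is an assumption. One reading of dividing the load by T_HIGH is that it yields the capacity at which the current load would sit exactly at the upper threshold.

```python
T_HIGH = 0.85  # upper utilization threshold from the example


class FakeServer:
    """Hypothetical server handle; status 0 means off, 1 means on (assumed)."""

    def __init__(self, name, status):
        self.name = name
        self.status = status


class FakeCluster:
    """Hypothetical stand-in for webcluster that records each operator call."""

    def __init__(self):
        self.log = []

    def turnOn(self, s):
        s.status = 1
        self.log.append(("turnOn", s.name))

    def turnOff(self, s):
        s.status = 0
        self.log.append(("turnOff", s.name))

    def adjustFreq(self, s, f):
        self.log.append(("adjustFreq", s.name, f))


def target_demand(load, t_high=T_HIGH):
    """Capacity at which the current load would sit exactly at t_high."""
    return load / t_high


def adjust_cluster(cluster, change_conf):
    """Python rendering of the adaptation body; f == 0 means 'switch off'."""
    for (s, f) in change_conf:
        if f == 0:
            cluster.turnOff(s)
        else:
            if s.status == 0:
                cluster.turnOn(s)
            cluster.adjustFreq(s, f)
    return cluster.log
```

For instance, a load of 850 req/s gives a target demand of about 1000 req/s, and a server that is already on only receives an `adjustFreq` call.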
30
Adaptation example
• Common monitoring support
  – e.g., variable access: webcluster.load
• Reusable adaptation operators
  – e.g., webcluster.turnOn(), webcluster.turnOff()
• Some policy-specific operators can also be defined
  – e.g., webcluster.bestConfig()
• Different adaptation policies can be used
31
Application-specific layer
• Apache built-in load balancer module
  – mod_proxy_balancer
• New Apache module in C (mod_frontend)
  – Exposes an API (XML-RPC) for
    • monitoring system properties
    • controlling the front-end web server
  – Example
    • sensors -> load (req/s), request response time
    • actuators -> DVS, on/off
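From the adaptation infrastructure's side, a sensor read and an actuator call over XML-RPC might look like the sketch below. The method names `get_load` and `set_frequency` are invented for illustration, since the slides do not list the mod_frontend API, and a local stub server from Python's standard library stands in for the Apache module so the example can run on its own.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_fake_frontend():
    """Start a local stub that plays the role of the front-end module."""
    state = {"load": 120.0, "freq": {}}
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)

    def get_load():
        # Hypothetical sensor: current load in requests per second.
        return state["load"]

    def set_frequency(server_name, freq_mhz):
        # Hypothetical actuator: record the requested frequency.
        state["freq"][server_name] = freq_mhz
        return True

    server.register_function(get_load, "get_load")
    server.register_function(set_frequency, "set_frequency")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, state


def demo():
    """Read one sensor and drive one actuator through the XML-RPC proxy."""
    server, state = start_fake_frontend()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}") as frontend:
            load = frontend.get_load()
            ok = frontend.set_frequency("srv1", 1800)
    finally:
        server.shutdown()
        server.server_close()
    return load, ok, state["freq"]


if __name__ == "__main__":
    print(demo())
```

Keeping the sensors and actuators behind a remote API is what allows the adaptation code to stay outside the web server process.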
36
Supporting multiple contracts
* Running concurrent adaptation contracts: power management and fault tolerance
37
Different adaptation policies
Disruption: the number of turn-on (and turn-off) adaptations, each of which may involve a switching cost.
What is the best way to minimize both disruption and energy consumption?
Future study: anticipatory adaptation model, risk-aware controller ...
38
Adaptation time overhead
• Worst case measured (overall adaptation phase): 13,045.78 ms
• Operations: 1 on, 1 off, and 2 adj. freq. => 12,012 ms + 1,005 ms + 7 ms + 7 ms = 13,013 ms
• Framework overhead = 32.78 ms
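The overhead figure is the measured total minus the time spent in the operations themselves. The short check below reproduces that subtraction with exact decimal arithmetic, using only the numbers on the slide. It also sums the four listed operation times, which come to 13,031 ms rather than the stated 13,013 ms; the stated overhead of 32.78 ms is consistent with the 13,013 ms total, so one of the per-operation figures may have been transcribed incorrectly.

```python
from decimal import Decimal

measured_total = Decimal("13045.78")   # worst-case adaptation phase (ms)
stated_operations = Decimal("13013")   # operations total as stated (ms)
listed_times = [Decimal("12012"), Decimal("1005"), Decimal("7"), Decimal("7")]

# Framework overhead = measured total - time spent in the operations.
overhead = measured_total - stated_operations

# Sum of the individually listed operation times, for comparison.
listed_sum = sum(listed_times)

if __name__ == "__main__":
    print("overhead (ms):", overhead)
    print("sum of listed operation times (ms):", listed_sum)
```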
39
Conclusion
• Framework-based approach to support dynamic adaptations
  – power and performance management for server clusters
• Reusability of the adaptation infrastructure
  – simplifies both evaluation and management of different adaptation policies / requirements
  – helps to reduce the development cost of adaptive applications
40
Future work
• Improvements in the framework
  – forecasting for adaptation decisions
• Other power-aware adaptations
  – multi-core architecture / memory systems
• Optimization algorithms for adaptation
  – processor allocation among multiple services / applications
• Experimental evaluation
  – virtualization -> consolidation, live migration
  – more realistic/real workloads
42
Fault tolerance contract
contract {
    adaptation {
        srv = webcluster.getFailedServer()
        newsrv = webcluster.allocNewServer()
        if newsrv:
            webcluster.replaceServer(srv, newsrv)
        else:
            webcluster.log("could not allocate server")
    } repair with server_fail settling_time 4000/*ms*/;
} fault_tolerance adapt_period 1000/*ms*/;
profile { webcluster.failure > 0 } server_fail;