xs oracle 2009 just run it

18
JustRunIt: Experiment-Based Management with Xen Wei Zheng 1 Ricardo Bianchini 1 Yoshio Turner 2 J. Renato Santos 2 G. Janakiraman 3 1 Rutgers University 2 HP Labs 3 Skytap Xen Summit 2009

Upload: xen-project

Post on 05-Dec-2014

765 views

Category:

Technology


0 download

DESCRIPTION

Yoshio Turner: Just RunIt

TRANSCRIPT

Page 1: XS Oracle 2009 Just Run It

JustRunIt: Experiment-BasedManagement with Xen

Wei Zheng1 Ricardo Bianchini1 Yoshio Turner2

J. Renato Santos2 G. Janakiraman3

1Rutgers University 2HP Labs3Skytap

Xen Summit 2009

Page 2: XS Oracle 2009 Just Run It

Data Center Management

• Challenging– Variety of tasks: Resource allocation, software/hardware

upgrades, application and OS configuration…– Complex: Affect performance, availability, energy consumption– Getting worse: larger scale data centers hosting more diverse

and dynamic workloads

• Automation might save us– Using analytical models: insight into system behavior,

fast exploration of large parameter spaces

• Modeling has some important drawbacks– Expensive to develop, often relies on simplifying assumptions,

may require re-calibration and re-validation as systems evolve

Page 3: XS Oracle 2009 Just Run It

• Experiments may be a better approach than modeling for many management tasks– Low cost: time and energy consumed by a few machines

– No simplifying assumptions

– No need for calibration/validation

• Use of VMs in production facilitates experimentation

JustRunIt – Infrastructure for experiment-based management of virtualized data centers hosting multiple services

Our Approach:Experiment-Based Management

Page 4: XS Oracle 2009 Just Run It

JustRunIt Architecture

X X

X X X

X I I X

I I I I

X X I X

T T

T

Driver

Checker

Heuristics

Coordinate Range

Time Limit

Interpolator

Management Entity

Coordinates

Experiment resultsCoord

inate

s

Experim

ent

resu

lts

Experimenter

Page 5: XS Oracle 2009 Just Run It

Experimenter• Step 1: Clone subset of

production system to a sandbox

VMVM

Page 6: XS Oracle 2009 Just Run It

Experimenter• Step 1: Clone subset of

production system to a sandbox– VM cloning: Modify Xen live

migration to resume original VM instead of destroying it

– Storage cloning: LVM copy-on-write snapshot for sandbox VM

VMVM

VM

Page 7: XS Oracle 2009 Just Run It

Experimenter• Step 1: Clone subset of

production system to a sandbox– VM cloning: Modify Xen live

migration to resume original VM instead of destroying it

– Storage cloning: LVM copy-on-write snapshot for sandbox VM

– L2/L3 network address translation: implemented in driver domain netback driver to prevent network address conflict

VMVM

VM

Page 8: XS Oracle 2009 Just Run It

Experimenter• Step 1: Clone subset of

production system to a sandbox

– VM cloning: Modify Xen live migration to resume original VM instead of destroying it

– Storage cloning: LVM copy-on-write snapshot for sandbox VM

– L2/L3 network address translation: implemented in driver domain netback driver to prevent network address conflict

– Apply configuration changes : e.g., resource allocation

– Resume execution

VMVM

VM

Page 9: XS Oracle 2009 Just Run It

A1

A2

A1

A2

A2

A3

A2

A3

A1

A2

A3

W1

W2

W1

W2

W2

W3

W1

W2

W3

D1

D2

D1

D3

D2

D3

D1

D2

D3

S_A2

Requests Replies

Assess performance and energy for different configurations

S_D2

S_W2

S_W2 S_A2 S_D2

Page 10: XS Oracle 2009 Just Run It

• Proxies filter requests/replies from the sandbox VM

• Emulates the timing and functional behavior of preceding and following service tiers– Application protocol level requests/replies (e.g. HTTP)

• In-proxy replays with fixed delay– Delay needed if sandbox is faster than production system

Experimenter

Tier-N VMIn-Proxy Out-Proxy

Sandbox VM

• Step 2: Workload replay using network proxies

Page 11: XS Oracle 2009 Just Run It

Req0 t0 Resp0 t0’ Req0 t00 Resp0 t00’Req1Req2Req3

t01t02t03

Reqn t0n

Resp1Resp2Resp3

t01’t02’t03’

Respn t0n’

Resp0 ts0’Req1Req2Req3

t1t2t3

Reqn tn

Resp1Resp2Resp3

t1’t2’t3’

Respn tn’

Resp1 ts1’Resp2 ts2’Resp3 ts3’

Respn tsn’

Throughput

Mean response time

Online & Sandbox

Online & Sandbox

Proxies

Tier-N VMIn-Proxy Out-Proxy

Sandbox VM

Page 12: XS Oracle 2009 Just Run It

Experiment Driver

• Fill in results matrix within a time limit

• Corners

• Midpoints (recursive)

• Heuristics

−Cancel experiments if gain for a resource addition falls below a threshold

−Cancel experiments for tiers that do not produce the largest gains from a resource addition

X X

XX

X X

X

X

X

CPU allocation

CPUFreq

(min,min)

Page 13: XS Oracle 2009 Just Run It

Evaluation Overview

• Implementation– Xen (< 50 lines python, <250 lines of C in netback)– Proxies for HTTP, mod_jk, MySQL (800-1500 LoC each)

• Based on Tinyproxy codebase• Two services (auction and bookstore)

• Case studies– Resource management: CPU– Evaluation of hardware upgrade

• Proxy and Replay Overhead– Throughput: No impact– Response time: < 5 ms impact

Page 14: XS Oracle 2009 Just Run It

Case Study 1: Resource Management

• Goal: satisfy < 50ms response time using minimum resource

• JustRunIt determines performance as function of resource allocation

• Management entity assigns resources using a bin packing algorithm (simulated annealing)– Minimize resource usage and VM migrations

• Compare experiment-based with a highly accurate model

Page 15: XS Oracle 2009 Just Run It

Case Study 1 Results

0

10

20

30

40

50

60

1 2 3 4 5 6 7 8 9

Time (minutes)

Resp

on

se T

ime (

ms)

Service 0 RT

Service 1 RT

Service 2 RT

Service 3 RT

0

10

20

30

40

50

60

1 2 3 4 5 6 7 8 9

Time (minutes)

Resp

on

se T

ime (

ms)

Service 0 RT

Service 1 RT

Service 2 RT

Service 3 RT

JustRunIt

Highly accurate modeling

Page 16: XS Oracle 2009 Just Run It

Case Study 2: Hardware Upgrades

• Sandbox on upgraded hardware

• Bin packing finds the number of new machines to accommodate the workload

• Example scenario: two services, each app server uses 90% of one CPU core on old hardware– New machine requires 72%

– Example too small to show any consolidation benefits of upgrading

Page 17: XS Oracle 2009 Just Run It

Conclusions

• JustRunIt infrastructure can support multiple (automated) management tasks

– Answers “what-if” questions realistically and transparently using actual experiments

• Possible directions include:

– Tier interactions

– Hypothetical workload mix

– Validate administrator actions

– Tradeoff between accuracy and experiment cost (resource usage and experiment time required)

Page 18: XS Oracle 2009 Just Run It

Related Work

• Modeling, simulation of data centers– Development and validation cost, simulation slow-down

• Scaled down emulation– DieCast [NSDI08] uses VMs and time dilation for emulation (JustRunIt

targets subset of mgmt tasks, native execution, and minimal additional hardware)

• Sandboxing for managing data centers– Prior work from Rutgers (validating operator actions OSDI04

USENIX06, entire data center experiments EUROSYS07) and file server verification (Tan et al USENIX 05)

– Very different infrastructure for JustRunIt (extrapolation, verification, application-transparency using VM cloning – practicality)