observe-analyze-act paradigm for storage system resource arbitration

23
Observe-Analyze-Act Paradigm for Storage System Resource Arbitration Li Yin 1 Email: [email protected] Joint work with: Sandeep Uttamchandani 2 Guillermo Alvarez 2 John Palmer 2 Randy Katz 1 1:University of California, Berkeley 2: IBM Almaden Research Center

Upload: elsie

Post on 13-Feb-2016

34 views

Category:

Documents


1 download

DESCRIPTION

Observe-Analyze-Act Paradigm for Storage System Resource Arbitration. Li Yin 1 Email: [email protected] Joint work with: Sandeep Uttamchandani 2 Guillermo Alvarez 2 John Palmer 2 Randy Katz 1 1:University of California, Berkeley 2: IBM Almaden Research Center. Outline. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Li Yin1

Email: [email protected]

Joint work with: Sandeep Uttamchandani2Guillermo Alvarez2

John Palmer2

Randy Katz1

1:University of California, Berkeley 2: IBM Almaden Research Center

Page 2: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Outline

Observe-analyze-act in storage system: CHAMELEON Motivation System model and architecture Design details Experimental results

Observe-analyze-act in other scenarios Example: network applications

Future challenges

Page 3: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Need for Run-time System Management

Static resource allocation is not enough Incomplete information of the access characteristics: workload

variations; change of goals Exception scenarios: hardware failures; load surges.

E-mail

Application Data Warehousing

Web-server

Application

Storage Virtualization(Mapping Application-data to Storage Resources)

SLA Goals

Page 4: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Approaches for Run-time Storage System Management Today: Administrator observe-analyze-act Automate the observe-analyze-act:

Rule-based system Complexity Brittleness

Pure feedback-based system Infeasible for real-world multi-parameter tuning

Model-based approaches Challenges:

• How to represent system details as models?• How to create/evolve models?• How to use models for decision making?

Page 5: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

System Model for Resource Arbitration

Input: SLAs for workloads Current system status (performance)

Output: Resource reallocation action (Throttling decisions)

Host 1 Host 2 Host n

InterconnectionFabric Switch 1

controller

QoS DecisionModule

Throttling Points

InterconnectionFabric Switch m

controller controller

Page 6: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Our Solution: CHAMELEONObserve

Throttling Value

Incremental ThrottlingStep Size

Feedback

Managed SystemCurrent

States

Current States

Retrigger Designer-defined PoliciesComponent

Models

WorkloadModels

ActionModels

KnowledgeBase

Piece-wiseLinear

Programming

ReasoningEngine

ThrottlingExecutor

Analyze Act

Models

Page 7: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Knowledge Base: Component Model

Objective: Predict service time for a given load at a component (For example: storage controller).

Service_timecontroller = L( request size, read write ratio, random sequential ratio, request rate)

An example of component model FAStT900, 30 disks, RAID0 Request Size 10KB, Read/Write Ratio 0.8, Random Access

Controller Model

0

5

10

15

20

25

0 500 1000 1500 2000 2500 3000 3500

Requests/Second

Aver

age

Resp

once

Tim

e (m

s)

Page 8: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Component Model (cont.)

Quadratic Fit S = 3.284, r = 0.838

Linear Fit S = 3.8268, r = 0.739

Non-saturated case: Linear Fit S = 0.0509, r = 0.989

Page 9: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Knowledge Base: Workload Model Objective: Predict the load on component i as a function

of the request rate j

Example: Workload with 20KB request size,

0.642 read/write ratio and 0.026 sequential access ratio

Component_loadi,j= Wi,j( workload j request rate)

Page 10: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Knowledge Base: Action Model Objective: Predict the effect of corrective actions on

workload requirements

Example:

Workload J request Rate = Aj(Token Issue Rate for Workload J)

Page 11: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Analyze Module: Reasoning EngineFormulated as a constraint solving problem

Part 1: Predict Action Behavior: For each candidate throttling decision, predict its

performance result based on knowledge base

Part 2: Constraint Solving: Use linear programming technique to scan all feasible

solutions and choose the optimal one

Page 12: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Reasoning Engine: Predict Result Chain all models together to predict action result Input: Token issue rate for each workloads Output: Expected latency

latency

load

Wor

kloa

dR

eque

st R

ate

Token Issue Rate

Action Model

Com

pone

ntLo

adWorkload

Request RateWorkload Model

Workload 1

Workload nComponent Model

Page 13: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Reasoning Engine: Constraint Solving

Formulated using Linear Programming Formulation:

Variable: Token issue rate for each workload

Objective Function: Minimize number of workloads violating their

SLA goals Workloads are as close to their SLA IO rate

as possible Example:

Constraints: Workloads should meet their SLA latency

goals

FAILED EXCEED

LUCKYMEET

0 1

1

IOps(%)

Late

ncy(

%)

where pai= Workload priority pbi = Quadrant prioritySLAi

Minimize ∑paipbi[ SLAi – T(current_throughputi, ti)]

Page 14: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Act Module: Throttling Executor Hybrid of feedback

and prediction Ability to switch to

rule-based (policies) when confidence value is low

Ability to re-trigger reasoning engine

ReasoningEngine Invoked

Confidence Value <

Threshold

Analyze System States

ContinueThrottling

Re-triggerReasoningEngine

Execute DesignerPolicies

Yes

Execute Reasoning Engine Output with Step-wise Throttling Proportional toConfidence Value

No

Page 15: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Experimental Results

Test-bed configuration: IBM x-series 440 server (2.4GHz 4-way with 4GB memory, redh

at server 2.1 kernel) FAStT 900 controller 24 drives (RAID0) 2Gbps FibreChannel Link

Tests consist of: Synthetic workloads Real-world trace replay (HP traces and SPC traces)

Page 16: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Experimental Results: Synthetic Workloads Effect of priority values on the output of constraint solver

Effect of model errors on output of the constraint solver

(a) Equal priority (b) Workload priorities (c ) Quadrant priorities

(a) Without feedback (b) with feedback

Page 17: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Experiment Result: Real-world Trace Replay

Real-world block-level traces from HP (cello96 trace) and SPC (web server)

A phased synthetic workload acts as the third flow Test goals:

Do they converge to SLAs? How reactive the system is? How does CHAMELEON handle unpredictable variations?

Page 18: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Real-world Trace Replay Without CHAMELEON: With throttling

Page 19: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Real-world Trace Replay With periodic un-throttling Handling system changes

Page 20: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Other system management scenarios? Automate the observe-analyze-act loop for other self-

management scenarios Example: CHAMELEON for network applications

Example: A proxy in front of server farm

1: monitor network/server behaviors2: establish/

maintain modelsfor system behaviors3: Analyze and

make actiondecision

4: Execute orDistribute actiondecision to execution points

Page 21: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Future Work

Better methods to improve model accuracy More general constraint solver Combining with other actions CHAMELEON in other scenarios CHAMELEON for reliability and failure

Page 22: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

References

L. Yin, S. Uttamchandani, J. Palmer, R. Katz, G. Agha, “AUTOLOOP: Automated Action Selection in the ``Observe-Analyze-Act’’ Loop for Storage Systems”, submitted for publication, 2005

S. Uttamchandani, L. Yin, G. Alvarez, J. Palmer, G. Agha, “CHAMELEON: a self-evovling, fully-adaptive resource arbitrator for storage systems”, to appear in USENIX Annual Technical Conference (USENIX’05), 2005

S. Uttamchandani, K. Voruganti, S. Srinivasan, J. Palmer , D. Pease, “Polus: Growing Storage QoS Management beyond a 4-year old kid”, 3rd USENIX Conference on File and Storage Technologies (FAST’04), 2004

Page 23: Observe-Analyze-Act Paradigm for Storage System Resource Arbitration

Questions?

Email: [email protected]