Latency as a Performability Metric for Internet Services
Pete Broadwell, [email protected]



Outline
1. Performability background/review
2. Latency-related concepts
3. Project status
  • Initial test results
  • Current issues


Motivation

• A goal of the ROC (Recovery-Oriented Computing) project: develop metrics to evaluate new recovery techniques
• Problem: the basic concept of availability assumes a system is either “up” or “down” at a given time
• “Nines” (e.g., 99.999%) only describe the fraction of uptime over a certain interval
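To make the “nines” concrete, the minimal sketch below converts an availability figure into the total downtime it permits over an interval. It assumes a 365-day year, and the helper names are illustrative. Note that it yields only a downtime budget; it says nothing about how that downtime is split into individual outages.

```python
# Assumption: a 365-day year with no leap seconds.
SECONDS_PER_YEAR = 365 * 24 * 3600

def nines_to_availability(nines: int) -> float:
    """Convert a count of nines (e.g. 5) into an availability fraction (0.99999)."""
    return 1.0 - 10.0 ** (-nines)

def allowed_downtime_seconds(availability: float, interval_s: float = SECONDS_PER_YEAR) -> float:
    """Total downtime permitted over the interval at the given availability."""
    return (1.0 - availability) * interval_s

for n in range(2, 6):
    minutes = allowed_downtime_seconds(nines_to_availability(n)) / 60
    print(f"{n} nines: {minutes:.1f} min/year")
```

Five nines works out to roughly 5.3 minutes of downtime per year.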


Why Is Availability Insufficient?

• Availability doesn’t describe the durations or frequencies of individual outages
  – Both can strongly influence user perception of the service, as well as revenue
• Availability doesn’t capture a system’s capacity to support degraded service
  – degraded performance during failures
  – reduced data quality during high load (Web)
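The first point can be illustrated with two hypothetical outage logs over the same one-day window. One log contains a single 60-minute outage and the other contains twelve 5-minute outages. Both produce exactly the same availability figure, yet they differ in outage count and longest outage. The logs and the `outage_summary` helper are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    start_s: float  # seconds from the start of the observation window
    end_s: float

def outage_summary(outages, window_s):
    """Return (availability, number of outages, longest outage in seconds)."""
    downtime = sum(o.end_s - o.start_s for o in outages)
    longest = max((o.end_s - o.start_s for o in outages), default=0.0)
    return 1.0 - downtime / window_s, len(outages), longest

DAY = 24 * 3600
one_long = [Outage(3600, 3600 + 60 * 60)]                          # one 60-minute outage
many_short = [Outage(h * 7200, h * 7200 + 300) for h in range(12)]  # twelve 5-minute outages

print(outage_summary(one_long, DAY))
print(outage_summary(many_short, DAY))
```

Availability treats both logs as equivalent, while a user would likely experience them very differently.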


What is “performability”?

• Combination of performance and dependability measures
• Classical definition: a probabilistic (model-based) measure of a system’s “ability to perform” in the presence of faults [1]
  – Concept from the traditional fault-tolerant systems community, ca. 1978
  – Has since been applied to other areas, but is still not in widespread use

[1] J. F. Meyer, Performability Evaluation: Where It Is and What Lies Ahead, 1994


Performability Example

Discrete-time Markov chain (DTMC) model of a RAID-5 disk array [1]:

• State 0, p0(t): normal operation
• State 1, p1(t): 1 disk failed, repair necessary
• State 2, p2(t): failure, data loss
• Transitions: 0 → 1 at rate (D+1)λ; 1 → 0 at rate μ; 1 → 2 at rate Dλ

where
• pi(t) = probability that the system is in state i at time t
• λ = failure rate of a single disk drive
• μ = disk repair rate
• D = number of data disks
• wi(t) = reward (disk I/O operations/sec) in state i, i.e., w0(t), w1(t), w2(t)

[1] Hannu H. Kari, Ph.D. Thesis, Helsinki University of Technology, 1997
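A common way to reduce such a model to a single performability figure is the expected reward at time t, W(t) = Σ_i pi(t) · wi(t). The sketch below steps the three-state chain forward and evaluates that sum at each step. It rests on several assumptions: λ and μ are treated as per-step probabilities (small enough to keep every row of the transition matrix valid), data loss is absorbing, and the rewards are held constant over time. The parameter values in the usage line are hypothetical and are not taken from the cited thesis.

```python
# Hypothetical sketch: lam and mu are per-step probabilities; rewards w are constant.

def raid5_transition_matrix(D, lam, mu):
    """Rows/columns: 0 = normal, 1 = one disk failed, 2 = data loss."""
    p01 = (D + 1) * lam  # any of the D+1 disks fails
    p12 = D * lam        # one of the D surviving disks fails before repair
    p10 = mu             # the failed disk is repaired
    if min(lam, mu) < 0 or p01 > 1 or p10 + p12 > 1:
        raise ValueError("parameters do not give a valid transition matrix")
    return [
        [1 - p01, p01, 0.0],
        [p10, 1 - p10 - p12, p12],
        [0.0, 0.0, 1.0],  # data loss is absorbing in this sketch
    ]

def step(p, P):
    """One DTMC step: p'_j = sum_i p_i * P[i][j]."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def expected_reward(p, w):
    """Performability at one instant: W = sum_i p_i * w_i."""
    return sum(pi * wi for pi, wi in zip(p, w))

def performability_curve(D, lam, mu, w, steps):
    P = raid5_transition_matrix(D, lam, mu)
    p = [1.0, 0.0, 0.0]  # start in normal operation
    curve = []
    for _ in range(steps):
        curve.append(expected_reward(p, w))
        p = step(p, P)
    return curve, p

# Hypothetical parameters: 4 data disks, rewards in I/O operations/sec.
curve, final = performability_curve(D=4, lam=1e-4, mu=0.05, w=[400.0, 250.0, 0.0], steps=1000)
print(curve[0], curve[-1], final)
```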


Visualizing Performability

[Figure: throughput (I/O operations/sec) vs. time, showing normal, degraded, and average throughput, with FAILURE, DETECT, RECOVER, and REPAIR marked along the time axis]
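The “average throughput” level in such a plot is a time-weighted mean taken across the failure and recovery phases. Assuming throughput stays roughly constant within each phase, it can be computed as shown below. The phase durations and rates are invented numbers used only for illustration.

```python
def time_weighted_average(segments):
    """segments: list of (duration_s, throughput) pairs, one per phase.
    Returns throughput averaged over the total elapsed time."""
    total_time = sum(d for d, _ in segments)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(d * x for d, x in segments) / total_time

# Hypothetical timeline (durations in s, throughput in I/O operations/sec).
phases = [
    (600, 1000.0),  # normal operation
    (30, 0.0),      # failure, before detection
    (120, 600.0),   # recovery: degraded throughput
    (300, 800.0),   # repair in progress
    (600, 1000.0),  # back to normal
]
print(time_weighted_average(phases))
```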


Metrics for Web Services

• Throughput: requests/sec
• Latency: render time, time to first byte
• Data quality [1]
  – harvest (response completeness)
  – yield (% of queries answered)

[1] E. Brewer, Lessons from Giant-Scale Internet Services, 2001
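Following Brewer’s definitions, yield is the fraction of offered queries that are answered, and harvest is the fraction of the complete data reflected in an answer. The minimal sketch below computes both over a hypothetical query log, where each entry records whether the query was answered and how much of the data it covered. Averaging harvest over answered queries only is one choice among several.

```python
from typing import NamedTuple

class QueryResult(NamedTuple):
    answered: bool
    fraction_of_data: float  # 0.0 to 1.0, share of the full data set reflected in the answer

def yield_metric(log):
    """Fraction of offered queries that were answered."""
    if not log:
        raise ValueError("empty log")
    return sum(q.answered for q in log) / len(log)

def mean_harvest(log):
    """Average completeness over the queries that were answered."""
    answered = [q.fraction_of_data for q in log if q.answered]
    return sum(answered) / len(answered) if answered else 0.0

# Hypothetical log.
log = [
    QueryResult(True, 1.0),
    QueryResult(True, 0.75),  # e.g. one of four data partitions unavailable
    QueryResult(False, 0.0),  # query dropped
    QueryResult(True, 1.0),
]
print(yield_metric(log), mean_harvest(log))
```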



Applications of Metrics

• Modeling the expected failure-related performance of a system, prior to deployment
• Benchmarking the performance of an existing system during various recovery phases
• Comparing the reliability gains offered by different recovery strategies


Related Projects

• HP: Automating Data Dependability
  – uses “time to data access” as one objective for storage systems
• Rutgers: PRESS/Mendosus
  – evaluated throughput of the PRESS server during injected failures
• IBM: Autonomic Storage
• Numerous ROC projects


Arguments for Using Latency as a Metric

• Originally, performability metrics were meant to capture end-user experience [1]
• Latency better describes the experience of an end user of a web site
  – response time > 8 sec → site abandonment → lost income ($$) [2]
• Throughput describes the raw processing ability of a service
  – best used to quantify expenses

[1] J. F. Meyer, Performability Evaluation: Where It Is and What Lies Ahead, 1994
[2] Zona Research and Keynote Systems, The Need for Speed II, 2001


Current Progress

• Using the Mendosus fault-injection system on a 4-node PRESS web server (both from Rutgers)
• Running latency-based performability tests on the cluster
  – Inject faults during a load test
  – Record page-load times before, during, and after faults
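Recording page-load times before, during, and after faults amounts to tagging each sample with the phase it falls into. The sketch below assumes each sample is a (timestamp, load time) pair and that the fault-injection and repair timestamps are known from the test harness. The function and parameter names are hypothetical and are not part of Mendosus or PRESS.

```python
from statistics import mean

def split_by_phase(samples, fault_at, repaired_at):
    """samples: iterable of (timestamp_s, load_time_s).
    Returns a dict mapping phase name -> list of page-load times."""
    if repaired_at < fault_at:
        raise ValueError("repair must not precede the fault")
    phases = {"before": [], "during": [], "after": []}
    for ts, load in samples:
        if ts < fault_at:
            phases["before"].append(load)
        elif ts < repaired_at:
            phases["during"].append(load)
        else:
            phases["after"].append(load)
    return phases

# Hypothetical samples: fault injected at t = 10 s, repaired at t = 20 s.
samples = [(1, 0.12), (5, 0.15), (12, 0.90), (18, 1.40), (25, 0.20), (30, 0.14)]
by_phase = split_by_phase(samples, fault_at=10, repaired_at=20)
for phase, loads in by_phase.items():
    print(phase, len(loads), round(mean(loads), 3) if loads else None)
```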


Test Setup

• Normal version: cooperative caching
• HA version: cooperative caching + heartbeat monitoring

[Figure: test clients connected through an emulated switch to the PRESS web server + Mendosus nodes; labeled messages: request, caching info, page, response]


Effect of Component Failure on Performability Metrics

[Figure: performability metric vs. time for throughput and latency, with FAILURE and REPAIR marked]


Observations

• Below saturation, throughput depends more on load than latency does
• Above saturation, latency depends more on load than throughput does

[Figure: timeline over intervals 1 to 5, annotated with example values: Thru = 6/s, Lat = .14s; Thru = 3/s, Lat = .14s; Thru = 7/s, Lat = .4s]
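Both figures for an interval can be derived from a request log: throughput is completions per second in the window, and latency is the mean response time of those completions. In the sketch below, the log entries are hypothetical and were chosen to reproduce the two below-saturation points above (6/s and 3/s, both at 0.14 s). Throughput follows the offered load while latency stays flat.

```python
def window_stats(log, start_s, end_s):
    """log: list of (completion_time_s, latency_s).
    Returns (throughput in requests/s, mean latency in s) for completions in [start_s, end_s)."""
    if end_s <= start_s:
        raise ValueError("window must have positive length")
    in_window = [lat for t, lat in log if start_s <= t < end_s]
    throughput = len(in_window) / (end_s - start_s)
    mean_latency = sum(in_window) / len(in_window) if in_window else float("nan")
    return throughput, mean_latency

# Hypothetical log: six requests complete in the first second, three in the next.
log = [(0.1 * k, 0.14) for k in range(1, 7)] + [(1.2, 0.14), (1.5, 0.14), (1.8, 0.14)]
print(window_stats(log, 0.0, 1.0))
print(window_stats(log, 1.0, 2.0))
```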


How to Represent Latency?

• Average response time over a given time period
  – Make a distinction between “render time” and “time to first byte”?
• Deviation from baseline latency
  – Impose a greater penalty for deviations toward longer wait times?
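One way to express “deviation from baseline latency” with a heavier penalty for slower responses is an asymmetric per-request cost, shown in the sketch below. The baseline and the two weights are hypothetical tuning parameters, since the slide leaves their values open. A plain average is included for comparison.

```python
def mean_latency(latencies):
    """Average response time over the period."""
    return sum(latencies) / len(latencies)

def latency_penalty(latencies, baseline_s, slow_weight=2.0, fast_weight=0.0):
    """Sum of weighted deviations from a baseline latency (weights are assumptions).
    Slower-than-baseline responses cost slow_weight per second of excess;
    faster ones cost fast_weight per second (0 means not penalized)."""
    total = 0.0
    for lat in latencies:
        dev = lat - baseline_s
        total += slow_weight * dev if dev > 0 else fast_weight * (-dev)
    return total

window = [0.14, 0.14, 0.40, 1.20]  # hypothetical response times in seconds
print(mean_latency(window), latency_penalty(window, baseline_s=0.14))
```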


Response Time with Load Shedding Policy

[Figure: response time (sec) vs. time between FAILURE and REPAIR, with an 8 s abandonment threshold and a load-shedding threshold; once the load-shedding threshold is reached, X users get a “server too busy” msg]
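The policy in the figure can be sketched as a per-request classification. If a request’s expected response time exceeds the load-shedding threshold, it is rejected with “server too busy”. Otherwise it is served, and a served response slower than 8 s counts as abandoned. Only the 8 s abandonment threshold comes from the slides; the 5 s shedding threshold is a hypothetical value. The sketch also ignores the feedback effect, in which shedding load would itself shorten the remaining response times.

```python
from collections import Counter

ABANDON_S = 8.0  # abandonment threshold from the slides
SHED_S = 5.0     # hypothetical load-shedding threshold

def classify(expected_response_s, shed_s=SHED_S, abandon_s=ABANDON_S):
    """Outcome of one request under a simple threshold-based shedding policy."""
    if expected_response_s > shed_s:
        return "shed"       # user sees "server too busy"
    if expected_response_s > abandon_s:
        return "abandoned"  # reachable only when shed_s > abandon_s
    return "served"

def outcome_counts(response_times, **thresholds):
    return Counter(classify(t, **thresholds) for t in response_times)

during_failure = [0.2, 1.5, 6.0, 9.5, 12.0, 3.0]  # hypothetical response times (s)
print(outcome_counts(during_failure))                       # with shedding
print(outcome_counts(during_failure, shed_s=float("inf")))  # shedding disabled
```

In this example, shedding converts the two abandonments into explicit rejections and also rejects one request that would have been served in 6 s.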


Load Shedding Issues

• Load shedding means returning 0% data quality: a different kind of performability metric
• To combine load shedding and latency, define a “demerit” system:
  – “Server too busy” msg: 3 demerits
  – 8 sec response time: 1 demerit/sec
• Such systems quickly lose generality, however
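The demerit scheme can be written as a scoring function. Here, “8 sec response time: 1 demerit/sec” is read as one demerit per second of response time beyond 8 s. That reading, and the choice to charge nothing for faster responses, are assumptions. The outcome labels and the request list are hypothetical.

```python
ABANDON_S = 8.0
TOO_BUSY_DEMERITS = 3.0      # "server too busy" message
DEMERITS_PER_SLOW_SEC = 1.0  # assumed: charged per second beyond ABANDON_S

def demerits(outcome, response_s=0.0):
    """outcome: 'too_busy' or 'served' (hypothetical labels)."""
    if outcome == "too_busy":
        return TOO_BUSY_DEMERITS
    if outcome == "served":
        return DEMERITS_PER_SLOW_SEC * max(0.0, response_s - ABANDON_S)
    raise ValueError(f"unknown outcome: {outcome!r}")

requests = [("served", 0.3), ("served", 9.5), ("too_busy", 0.0), ("served", 11.0)]
print(sum(demerits(o, t) for o, t in requests))  # 7.5
```

Changing either weight changes how a rejection is ranked against a slow response, which is one way the generality concern above shows up in practice.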


Further Work

• Collect more experimental results!
• Compare throughput- and latency-based results of the normal and high-availability versions of PRESS
• Evaluate the usefulness of “demerit” systems to describe the user experience (latency and data quality)
