pcap: performance-aware power capping for … types of datacenters 3 luiz a. barrosoet al. [the...

25
PCAP: Performance-Aware Power Capping for the Disk Drive in the Cloud 2/24/16 1 Mohammed G. Khatib & Zvonimir Bandic WDC Research © 2016 Western Digital Corporation All rights reserved

Upload: hanhi

Post on 27-Aug-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

PCAP: Performance-Aware Power Capping for the Disk Drive in the Cloud

2/24/16 1

Mohammed G. Khatib & Zvonimir BandicWDC Research

© 2016 Western Digital Corporation All rights reserved

HDD’s power impact on its cost

74%

13%

10%3%

HDDs Power Distribution & Cooling

Power Other Infrastructure

2

3-yr server & 10-yr infrastructure amortization

Based on Hamilton’s DC cost model [http://perspectives.mvdirona.com/2010/09/overall-data-center-costs/]

$35.27 $44.13 $42.69 $51.55

$0

$20

$40

$60

$80

$100

$120

$140

$160

$180

(1.07,$9) (1.80,$9) (1.07,$12) (1.80,$12)TO

TAL

CO

ST

($)

DISK MODEL

Acquisition cost Cost overhead (Power related)

© 2016 Western Digital Corporation All rights reserved

Peak-power & PUE dependent

Two types of datacenters

3

Luiz A. Barroso et al. [The Datacenter as a Computer, 2013]

The average activity distribution of a sample of 2 Google clusters, each containing over 20,000 servers, over a period of 3 months 2013

Focus  of  this  talk

© 2016 Western Digital Corporation All rights reserved

Power over-subscription

• For maximum cost effectiveness, use provisioned power fully – If a facility operates at 50% of its peak power capacity, the effective provisioning cost per Watt used is doubled!

Luiz A. Barroso [The Datacenter as a Computer, 2013]

• How many servers fit within a given budget? Hard question!–Specs are very conservative à Dell & HP offer online power calculators–Actual power consumption varies significantly with load–Hard to predict the peak power consumption of a group of servers• while any particular server might temporarily run at 100% utilization, the maximum

utilization of a group of servers probably isn’t 100%.

• Problem: using any power numbers but the specs runs the risk of à facility power over-subscriptionà Power capping becomes necessary as a saftey mechanism

4© 2016 Western Digital Corporation All rights reserved

Power capping

• What is power capping?–Preventing datacenter’s total power usage from violating (crossing) a predefined limit, the power cap (i.e., prevent power overshooting)

• Techniques:–Software techniques such as workload re-scheduling–Duty cycle adaptation–…

• This work:–Focuses on the 3.5’’ enterprise HDD–Explores techniques inherently related to the underlying hardware– Investigates using the queue size to cap the HDD’s power consumption

5© 2016 Western Digital Corporation All rights reserved

© 2016 Western Digital Corporation All rights reserved 6

Key contributions

• Investigate throttling HDD’s throughput to cap power–No strict positive correlation–HDD is underutilized

• Investigate resizing HDD’s queues and its impact on power–Higher HDD utilization–Performance differentiation: throughput & tail-latency–Limitations under low concurrency and workload

• PCAP system based on queue resizing–Make it stable, agile and performance-aware–Compare it to throttling–Study it for different workloads & settings

© 2016 Western Digital Corporation All rights reserved 7

PCAP in the works

© 2016 Western Digital Corporation All rights reserved 8

How we do it?

Setup• A JBOD with 16x 4TB HDDs

• Exercising a single HDD only

• Workload generators:–FIO–YCSB & MongoDB

• Design space exploration:–Reads/writes/mixed–4kB – 2MB–Threads: 10-256–Varying queue depth (QD)• HDD: 1-32• IO stack: 4-128–Deadline scheduler (default) is used

9

20

20

20

110

20

0

20

40

60

80

100

120

140

160

180

200

Po

we

r [W

]

PS 1 PS2 Fans Disks Rest

© 2016 Western Digital Corporation All rights reserved

HDD’s dynamic power

0

5

10

HDD-A HDD-B HDD-C HDD-D

Pow

er [W

]

Static Dynamic

10© 2016 Western Digital Corporation All rights reserved

Existing techniques

11

1. Power off disks

2. Throttling throughput – adapting duty cycle

à negatively impact throughput & latency (later)

à no strict positive correlation between power & throughput

© 2016 Western Digital Corporation All rights reserved

à

# outstanding requests matters – queue size

12© 2016 Western Digital Corporation All rights reserved

Spinningdirection

Small queue Large queue

à

Proposal: Resizing queues

• We propose to resize HDD queues

• Queues:–OS I/O scheduler (IOQ)–HDD internal queue (NCQ)

• Dynamically resizing as the power cap changes

• Allows to control power

• Minding that queue size influences–Throughput–Tail-latency

13© 2016 Western Digital Corporation All rights reserved

•Queue size influences the seek distance between requests

•A small queue results in long seek distance due to limited scheduling

• Long distances require acceleration & deceleration

•Acceleration and deceleration takes relatively large power

•And vice versa

© 2016 Western Digital Corporation All rights reserved 14

Queue-size & power - causality

Power vs. Queue size

15

Power vs. IOQ Power vs. NCQ

© 2016 Western Digital Corporation All rights reserved

Power vs. NCQ

16

Throughput  saturates  àStable power

Low  throughput  àLess power

Throughput  increases  àPower increases(diminishing  returns)

Long  seeks  àMore power

© 2016 Western Digital Corporation All rights reserved

PCAP design

17

–Reduce HDD’s power quickly to bring it below the power cap–Max out HDD’s performance when more power is available–But cautiously so that power cap is not violated

à Different scaling factors of the queue sizes αUP and αDN

–Reduces oscillations around the target power

à Hysteresis with margins [-ε,+ε]–Periodically adapts queues to ensure cap power

à Tunable period (T)

© 2016 Western Digital Corporation All rights reserved

18

PCAP: basic

Outside the controllable

range

© 2016 Western Digital Corporation All rights reserved

19

PCAP: Agile (bounded queues)

© 2016 Western Digital Corporation All rights reserved

20

PCAP: Agile w/ improved throughput

© 2016 Western Digital Corporation All rights reserved

PCAP: Dual-mode

21

Throughput Tail-latency

© 2016 Western Digital Corporation All rights reserved

Summary of performance

Avg.throughp

ut[IOPS]

% requests< 100ms

Max. latency[ms]

Throttling 117 10% 2.5PCAP -

Throughput154

(32%) 20% 2.7

PCAP – tail latency 102

(-15%) 60% 1.3

22

0

20

40

60

80

100

10 100 1000

% R

EQUES

TS

RESPONSE TIME [MS]

Tail-latency Throughput Throttling

© 2016 Western Digital Corporation All rights reserved

PCAP limitations Effective queue size matters

23

7

8

9

10

11

1 4 16 64 256

Ave

rge

po

we

r [W

]

# threads (log scale)

Cap=8W Cap=9W

Maximized load (100%) Varied load

7

8

9

10

11

0% 50% 100%A

verg

e p

ow

er

[W]

HDD utilization [%]

PCAP=8W PCAP=9W

© 2016 Western Digital Corporation All rights reserved

• Throttling underutilizes HDD’s performance– Useful under low concurrency and sequential throughput

• PCAP: resizing queues– Improves HDD’s utilization– 32% more throughput– 50% more requests < 100ms– WC latency reduce by 2x

• PCAP performance is limited under– Low concurrency– Light workloads

• PCAP works for multiple HDDs

• Please see the paper for more observations and results

24

Conclusions

© 2016 Western Digital Corporation All rights reserved

Thanks for your attention!

Q & A

Interested in internship in WDC research?

Apply at: http://bit.do/FAST16

Email: [email protected]

25© 2016 Western Digital Corporation All rights reserved