Dynamic Workload Characterization for Power Efficient Scheduling on CMP Systems

International Symposium on Low Power Electronics and Design (ISLPED 2010), Austin

Gaurav Dhiman (1), Vasileios Kontorinis (1), Dean Tullsen (1), Tajana Rosing (1), Eric Saxe (2), Jonathan Chew (2)

(1) UC San Diego, (2) Oracle Corp.



Page 1: Dynamic Workload Characterization for Power Efficient Scheduling on CMP Systems

International Symposium on Low Power Electronics and Design

Gaurav Dhiman (1), Vasileios Kontorinis (1), Dean Tullsen (1), Tajana Rosing (1), Eric Saxe (2), Jonathan Chew (2)

ISLPED 2010, Austin

(1) UC San Diego, (2) Oracle Corp.

Page 2: Introduction

• Chip multiprocessors (multicore architectures) are pervasive in modern systems
• Cores share resources asymmetrically, in a hierarchy:
  – Memory bandwidth
  – Last-level cache
  – Pipeline
• Threads scheduled across these cores share these resources, so two things matter:
  – Resource requirements
  – Relative placement
• Together these determine overall performance and power efficiency

Page 3: Motivation

• Modern OSes capture the resource-sharing asymmetries
• However, they balance load based on thread count alone:
  – What about resource usage?
  – What about resource requirements?

Page 4: Motivation

• The difference between the best and worst schedules is as high as 70% on a ‘balanced’ system!
• Which threads share the last-level cache makes a big difference
  – High contention degrades performance and power efficiency

[Figure residue: workload labels gzip, art, art, art]

Page 5: Motivation

• The default scheduler exhibits high variance:
  – It ping-pongs between the best and worst schedules
  – High-priority ‘transient threads’ cause frequent preemptions
• This is not an OS-specific problem:
  – The OS scheduler lacks the information it needs

Page 6: Contributions

• Highlight the inability of modern OSes to extract full power efficiency from parallel architectures
  – The scheduler lacks access to resource-utilization knowledge
• Identify characteristics of threads that affect resource-sharing efficiency, and metrics to capture them
• Uncover and provide a solution to ‘transient threads’
  – Short-running kernel threads that impede stable scheduling
• Extend the scheduler to incorporate this logic into the load-balancing fabric
• Implement a prototype “Workload Characteristics Aware (WCA) Scheduler”

Page 7: Transient Threads

• High-priority, short-running threads
  – Run on the order of microseconds (µs)
  – Examples: java, fsflush, nscd, etc.
• Have little impact on the runtimes of long-running threads
• Mislead the OS load balancer

[Figure annotations: “Artificial load”, “Almost idle”]

Page 8: Transient Threads

• Identification:
  – Transient threads spend most of their time blocked rather than running
  – Maintain this ratio in the thread data structure
  – Flag a thread as transient if the ratio < 1%
• Resolution:
  – Load-balance only non-transient threads

[Figure annotations: “Artificial load”, “Almost idle”]
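The identification rule above can be sketched in a few lines of Python. This is only an illustration, not the OpenSolaris implementation: the `ThreadStats` record, its field names, and the choice to define the ratio as running time over total observed time are all assumptions. The slide gives only the 1% cutoff.

```python
from dataclasses import dataclass

TRANSIENT_THRESHOLD = 0.01  # the 1% cutoff stated on the slide


@dataclass
class ThreadStats:
    """Hypothetical per-thread accounting record (names are illustrative)."""
    name: str
    run_time_us: float = 0.0      # accumulated time spent running
    blocked_time_us: float = 0.0  # accumulated time spent blocked

    def run_ratio(self) -> float:
        """Fraction of observed time spent running (assumed definition)."""
        total = self.run_time_us + self.blocked_time_us
        # With no history yet, treat the thread as non-transient.
        return self.run_time_us / total if total > 0 else 1.0

    def is_transient(self) -> bool:
        return self.run_ratio() < TRANSIENT_THRESHOLD


def balanceable(threads):
    """Return only the threads the load balancer should count."""
    return [t for t in threads if not t.is_transient()]


# Illustrative numbers only: a long-running benchmark and a kernel daemon.
threads = [
    ThreadStats("art", run_time_us=9_800_000, blocked_time_us=200_000),
    ThreadStats("fsflush", run_time_us=50, blocked_time_us=999_950),
]
counted = [t.name for t in balanceable(threads)]
```

Here only `art` remains visible to the balancer, so a core running just `fsflush` is treated as idle rather than loaded.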

Page 9: Cache Sensitivity

• Two requirements:
  – Identify cache-sensitive threads
  – Reconstruct the load balancer to balance them
• Identification metrics:
  – LLCRPI
    • Cache weight = 2
    • Highest degree of sensitivity
  – IPC
    • Cache weight = 1
    • Medium degree of sensitivity
  – Non-sensitive
    • Cache weight = 0
    • No sensitivity
• Maintain the cache weight of each thread dynamically
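A minimal sketch of how a per-thread cache weight could be derived from hardware counter samples follows. Several things here are assumptions rather than facts from the slides: LLCRPI is read as last-level-cache references per instruction, both threshold values are invented placeholders, and the direction of the IPC test (higher IPC means medium sensitivity) is a guess. Only the three weight levels (2, 1, 0) come from the slide.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the slides name the metrics but give no values.
LLCRPI_THRESHOLD = 0.01
IPC_THRESHOLD = 1.0


@dataclass
class CounterSample:
    """Counter deltas for one thread over one sampling interval."""
    instructions: int
    cycles: int
    llc_references: int


def cache_weight(sample, llcrpi_threshold=LLCRPI_THRESHOLD,
                 ipc_threshold=IPC_THRESHOLD):
    """Map a counter sample to a cache weight of 2, 1, or 0."""
    if sample.instructions == 0 or sample.cycles == 0:
        return 0  # no activity observed in this interval
    llcrpi = sample.llc_references / sample.instructions
    ipc = sample.instructions / sample.cycles
    if llcrpi >= llcrpi_threshold:
        return 2  # highest degree of sensitivity
    if ipc >= ipc_threshold:  # assumed direction of the IPC test
        return 1  # medium degree of sensitivity
    return 0      # non-sensitive


heavy_llc = cache_weight(CounterSample(1_000_000, 2_000_000, 50_000))
high_ipc = cache_weight(CounterSample(1_000_000, 500_000, 100))
neither = cache_weight(CounterSample(1_000_000, 4_000_000, 100))
```

Recomputing the weight from fresh deltas every interval is one way to keep it dynamic, as the last bullet asks.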

Page 10: Cache Sensitivity

• Enhance the load representation structure
• Balance the number of threads, cache weight (CW), and IPC
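One way to picture an enhanced load representation is a per-cache-group tuple of thread count, total cache weight, and total IPC. The sketch below is an assumption-laden illustration: the `CacheGroup` structure, the lexicographic ordering of the three quantities, and the IPC values are hypothetical, and the slides do not say how the real balancer combines them.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    cache_weight: int  # 0, 1, or 2
    ipc: float


@dataclass
class CacheGroup:
    """Cores sharing one last-level cache (hypothetical structure)."""
    name: str
    threads: list = field(default_factory=list)

    def load(self):
        """Load tuple: (thread count, total cache weight, total IPC)."""
        return (
            len(self.threads),
            sum(t.cache_weight for t in self.threads),
            sum(t.ipc for t in self.threads),
        )


def place(thread, groups):
    """Put the thread on the group with the smallest load tuple.

    Tuples compare lexicographically, so thread count is balanced first,
    then cache weight, then IPC (an assumed priority order).
    """
    target = min(groups, key=lambda g: g.load())
    target.threads.append(thread)
    return target


groups = [CacheGroup("A"), CacheGroup("B")]
arrivals = [
    Thread("art#1", 2, 0.3),
    Thread("gzip", 0, 1.2),
    Thread("art#2", 2, 0.3),
    Thread("mesa", 0, 1.0),
]
for t in arrivals:
    place(t, groups)
layout = {g.name: [t.name for t in g.threads] for g in groups}
```

With these illustrative inputs the two cache-heavy `art` threads end up on different caches. A balancer that looked at thread count alone and broke the tie by group order would have put `art#2` on the same cache as `art#1`.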

Page 11: Methodology

• Implemented the system on OpenSolaris:
  – Transient thread characterization
  – Cache sensitivity characterization
  – Load balancing algorithm
• Tested on an Intel Xeon E5430 based machine
• Workloads built from 12 SPEC 2K benchmarks
  – 3 thread combinations
  – Present the toughest case for the scheduler
• Compare results against the default scheduler:
  – Average weighted Perf/Watt
    • Captures both system-level power consumption and performance
  – Number of thread migrations in the system and execution-time stability
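The slides do not define average weighted Perf/Watt. Purely as an illustration, the sketch below assumes one plausible reading: each thread's performance is weighted by its standalone runtime (solo time divided by time in the shared workload), the weighted values are averaged, and the result is divided by average system power. The function names, the formula, and the sample numbers are all assumptions.

```python
def avg_weighted_perf_per_watt(solo_times_s, shared_times_s, avg_system_power_w):
    """Assumed metric: mean(solo / shared runtime) per watt of system power."""
    if not solo_times_s or len(solo_times_s) != len(shared_times_s):
        raise ValueError("need one solo and one shared runtime per thread")
    weighted = [solo / shared
                for solo, shared in zip(solo_times_s, shared_times_s)]
    return (sum(weighted) / len(weighted)) / avg_system_power_w


def improvement_pct(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Illustrative runtimes (seconds) and power (watts), not measured data.
baseline = avg_weighted_perf_per_watt([100, 100], [125, 200], 200.0)
improved = avg_weighted_perf_per_watt([100, 100], [125, 125], 200.0)
gain = improvement_pct(improved, baseline)
```

Under this reading, a scheduler that shortens the slowest thread's runtime at the same power raises the metric, which matches the intent of capturing both performance and power.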

Page 12: Overall Results

• ~14% average improvement in Perf/Watt

Page 13: Power Efficiency

• Significant speedup at roughly the same power budget
• Better opportunities for idle power savings

Page 14: Stability Analysis

• 91% reduction in the thread migration rate
• 89% reduction in execution-time standard deviation
• Stable and predictable schedules and run-times
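For readers who want to reproduce this kind of stability figure on their own runs, a percentage reduction in standard deviation can be computed as below. The runtime lists are invented for illustration and are not the measurements behind the slide. The use of the population standard deviation is also an assumption.

```python
import statistics


def pct_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def runtime_stddev(runtimes_s):
    """Population standard deviation of repeated execution times."""
    return statistics.pstdev(runtimes_s)


# Hypothetical repeated runs of one workload under each scheduler.
default_runs = [100.0, 120.0, 140.0]
wca_runs = [118.0, 120.0, 122.0]
stddev_cut = pct_reduction(runtime_stddev(default_runs),
                           runtime_stddev(wca_runs))
```

The same `pct_reduction` helper applies to migration counts per unit time.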

Page 15: Conclusions

• Identify the limitations of modern OSes in extracting full power efficiency from modern CMP architectures
• Highlight characteristics of threads that affect cache-sharing efficiency, and metrics to capture them
• Identify ‘transient threads’ as an impediment to stable scheduling
• Extend the scheduler to incorporate cache and transient-thread management into the load-balancing fabric
• The prototype scheduler implementation improves Perf/Watt by up to 30%