asymmetry-aware work-stealing runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · motivation...

28
Asymmetry-Aware Work-Stealing Runtimes Christopher Torng, Moyang Wang, and Christopher Batten School of Electrical and Computer Engineering Cornell University Cornell IAP Workshop October 2015

Upload: others

Post on 18-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Asymmetry-Aware Work-Stealing Runtimes

Christopher Torng, Moyang Wang, andChristopher Batten

School of Electrical and Computer EngineeringCornell University

Cornell IAP WorkshopOctober 2015

Page 2: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Asymmetry-Aware Work-Stealing Runtimes

Work-StealingRuntimes

Single-ISAHeterogeneousArchitectures

Dynamic Voltageand Frequency

Scaling

DynamicAsymmetry

StaticAsymmetry

Cornell University Christopher Batten 2 / 18

Page 3: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task A

Cornell University Christopher Batten 3 / 18

Page 4: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task B

Task A

Spawn Task B

Cornell University Christopher Batten 3 / 18

Page 5: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task B

Dequeue Task B

Cornell University Christopher Batten 3 / 18

Page 6: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task C

Task B

Spawn Task C

Cornell University Christopher Batten 3 / 18

Page 7: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task B

Spawn Task D

Task D

Task C

Cornell University Christopher Batten 3 / 18

Page 8: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task CTask D

Steal Task D Steal Task C

Task B

Cornell University Christopher Batten 3 / 18

Page 9: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task CTask D

Task E Task F

Spawn Task FSpawn Task E

Cornell University Christopher Batten 3 / 18

Page 10: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-Stealing Runtimes

Work inProgress

TaskQueues

Core 0 Core 1 Core 2 Core 3

Task CTask DTask E Task F

Steal Task E Steal Task F

I Work stealing has good performance, space requirements, andcommunication overheads in both theory and practice

I Supported in many popular concurrency plaforms including:Intel’s Cilk Plus, Intel’s C++ TBB, Microsoft’s .NET Task ParallelLibrary, Java’s Fork/Join Framework, and OpenMP

Cornell University Christopher Batten 3 / 18

Page 11: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Static Asymmetry: Heterogeneous Multicore Systems

Performance (Millions of Instructions per Second)

En

erg

y (P

ico-J

oule

pe

r In

stru

ctio

n)

200 400 600 800 1000 1200 1400 1600 1800 2000

50

100

150

200

250

500

Incr

easing P

ower

in-order, 1-issue

Based on analytical models of90nm technology with joint optimization ofmicroarchitectural and circuit parameters

in-order, 2-issuein-order, 3-issueout-of-order, 1-issue

out-of-order, 3-issueout-of-order, 2-issue

IO/3

OOO/2

OOO/3

IO/1

Adpated from O. Azizi et al. “Energy-Performance Tradeoffs inProcessor Architecture and Circuit Design: A Marginal Cost Approach,” ISCA, 2010.

BigARM Cores

LittleARM Cores

A7 A7 A15 A15

L2$ L2$

Samsung Exynos OctaMobile Processor

Cornell University Christopher Batten 4 / 18

Page 12: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Dynamic Asymmetry: Voltage/Frequency ScalingC

ore

Ener

gy C

onsu

mpt

ion

Performance

Fmin @ Vmin

Fmax @ Vmax

Fnom @ Vnom

VLLC

GraphicsEngine

Rin

g In

terc

onne

ct

SystemAgent

+PCIe

+DMI

VDRAM Cntl

VCCIN

FIV

R

CPUs+

Cache

VCPU0

VRING

VGPU

VSA

VIOA

VCPU1……

Intel Haswell integrates thevoltage control loop circuitry on-die

with inductors in-package.

Cornell University Christopher Batten 5 / 18

Page 13: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-StealingRuntimes

DynamicAsymmetry

StaticAsymmetry

Bender et al."Online Scheduling

of Parallel Programs onHeterogeneous Sys ..."

SPAA 2002

Ribic et al."Energy-Efficient

Work-StealingLanguage Runtimes"

ASPLOS 2014

Azizi et al."Energy-performance Tradeoffs in Processor Architecture and Circuit Design:

A Marginal Cost Analysis" ISCA 2010

How can we use asymmetry awareness to improve theperformance and energy efficiency of a work-stealing runtime?

Cornell University Christopher Batten 6 / 18

Page 14: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation • First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes Evaluation

Work-StealingRuntimes

DynamicAsymmetry

StaticAsymmetry

Talk Outline

Motivation

First-Order Modeling

Asymmetry-AwareWork-Stealing Runtimes

Evaluation

Cornell University Christopher Batten 7 / 18

Page 15: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation • First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes Evaluation

0.6 0.8 1.0 1.2 1.4 1.6

Normalized Performance

Nor

mal

ized

Ene

rgy

Effi

cien

cy

0.8

0.9

1.0

1.1

1.2

VL = 1.0VVB = 1.0V

1.3

isop

ower

Higher PerformanceLower Energy Efficiency

Lower PerformanceHigher Energy Efficiency

VL = 1.3VVB = 1.0V

VL = 1.3VVB = 0.9V

Higher PerformanceHigher Energy Efficiency

Systematic wayto adjust voltageand frequency

to move into upperright quadrant?

Use average powerat Vnom as anoptimizationconstraint

Cornell University Christopher Batten 8 / 18

Page 16: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation • First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes Evaluation

L3

L2

L1

L0

B3

B2

B1

B0

3

21

Opportunity #1

Balance performance/poweracross cores when all cores are busy

System with four active big cores and four active little cores

BLNo

rmal

ized

Powe

r

Normalized IPS0.0 0.5 1.0 1.5 2.0 2.5 3.001234567

VLVB

B LMar

gina

l IPS

/W

0.70 1.00 1.30 1.60 1.901.04 1.00 0.92 0.76 0.24

AggregateThroughput

Aggr

egat

e Sy

stem

IPS

0.00.20.40.60.81.01.2

0.70 1.00 1.30 1.60 1.901.04 1.00 0.92 0.76 0.24

VLVB

Technique #1: Work-Pacing

Cornell University Christopher Batten 9 / 18

Page 17: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation • First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes Evaluation

L3

L2

L1

L0

B3

B2

B1

B0

3

21

Opportunity #2

Rest waiting cores thenbalance performance and power

acrossy busy cores

4B4L system with two active big cores and two active little cores

BL

Norm

aliz

ed P

ower

Normalized IPS0.0 0.5 1.0 1.5 2.0 2.5 3.001234567

2.200.39

VLVB

Mar

gina

l IPS

/W

B L

0.20 0.60 1.00 1.40 1.801.26 1.24 1.211.14 0.97

Aggr

egat

e Sy

stem

IPS

VLVB0.20 0.60 1.00 1.40 1.801.26 1.24 1.211.14 0.97

2.200.39

0.00.20.40.60.81.01.21.41.6

Technique #2: Work-Sprinting

Cornell University Christopher Batten 10 / 18

Page 18: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation • First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes Evaluation

L3

L2

L1

L0

B3

B2

B1

B0

3

21

Opportunity #3

Move work from slow little coresto fast big cores

4B4L system with either one active big core OR one active little core

B

L

Nor

mal

ized

Pow

er

Normalized IPS0.0 0.5 1.0 1.5 2.0 2.5 3.001234567

Optimal VL = 2.59V

On Little CoreOptimal VB = 1.51V

On Big Core

Feasible VL = 1.3V Feasible VB = 1.3V

Speedup of 1.6x Speedup of 3.3x

Technique #3: Work-Mugging

Cornell University Christopher Batten 11 / 18

Page 19: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes • Evaluation

Work-StealingRuntimes

DynamicAsymmetry

StaticAsymmetry

Talk Outline

Motivation

First-Order Modeling

Asymmetry-AwareWork-Stealing Runtimes

Evaluation

Cornell University Christopher Batten 12 / 18

Page 20: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes • Evaluation

AAWS Runtime Techniques

On-Chip Interconnect

L1$ L1$

B0 L0

VoltageRegulators

DRAM Memory Controller

L1$ L1$

B1 L1

Shared L2$ Cache Banks

Work-PacingWhen all cores are busy, tune V&Fof little and big cores (based on offlinemarginal utility analysis) to increaseperformance and energy efficiency

DVFSController

HintsDecision

Vdd

Cornell University Christopher Batten 13 / 18

Page 21: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes • Evaluation

AAWS Runtime Techniques

On-Chip Interconnect

L1$ L1$

B0 L0

VoltageRegulators

DRAM Memory Controller

L1$ L1$

B1 L1

Shared L2$ Cache Banks

Work-PacingWhen all cores are busy, tune V&Fof little and big cores (based on offlinemarginal utility analysis) to increaseperformance and energy efficiency

DVFSController

HintsDecision

Vdd

Work-SprintingRest waiting cores to generate powerslack, tune V&F of busy cores (basedon offline marginal utility analysis)

Cornell University Christopher Batten 13 / 18

Page 22: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling • Asymmetry-Aware Work-Stealing Runtimes • Evaluation

AAWS Runtime Techniques

On-Chip Interconnect

L1$ L1$

B0 L0

VoltageRegulators

DRAM Memory Controller

L1$ L1$

B1 L1

Shared L2$ Cache Banks

Work-PacingWhen all cores are busy, tune V&Fof little and big cores (based on offlinemarginal utility analysis) to increaseperformance and energy efficiency

DVFSController

Work-SprintingRest waiting cores to generate powerslack, tune V&F of busy cores (basedon offline marginal utility analysis)

Work-MuggingMove work from little to big cores byallowing big cores to preemptively mugtasks from little cores

User-Level Interrupt Network

VoltageRegulators

DVFSController

MugInstruction

MugInterrupt

Thread Context Swap

Cornell University Christopher Batten 13 / 18

Page 23: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Work-StealingRuntimes

DynamicAsymmetry

StaticAsymmetry

Talk Outline

Motivation

First-Order Modeling

Asymmetry-AwareWork-Stealing Runtimes

Evaluation

Cornell University Christopher Batten 14 / 18

Page 24: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Evaluation Methodology: Modeling

Work-Stealing RuntimeI State-of-the-art Intel TBB-inspired work-stealing schedulerI Chase-Lev task queues with occupancy-based victim selectionI Automatic recursive decomposition of parallel loop task chunkingI Instrumented with activity hints

Cycle-Level ModelingI Heterogeneous system modeled in gem5 cycle-approximate simulatorI Support for scaling per-core frequencies + central DVFS ControllerI Accounting for intercore interrupt latency during mugging

Energy ModelingI Event-based energy modeling based on McPAT models and detailed

RTL/gate-level sims (Synopsys ASIC toolflow, TSMC LP, 65 nm 1.0 V)I Per-inst energy benchmarks to isolate event energy (e.g., rfile reads)

Cornell University Christopher Batten 15 / 18

Page 25: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Performance of Convex Hull ApplicationNo

rmal

ized

Exe

cutio

n Ti

me

1.00

High

Par

alle

l

Baseline

1.00 V 0.91 V0.70 V 1.06 V 1.33 VBusy Waiting

Big

Littl

e

Low

Par

alle

l

0.92

Work-Pacing

8% reduction

0.75

Work-Pacing, Sprinting

25%reduction

Big

Littl

e0.71

Work-Pacing, Sprinting, Mugging

29%reduction

Cornell University Christopher Batten 16 / 18

Page 26: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Energy-Efficiency and Performance Results

0.0 0.2 0.4 0.6 0.80.0

0.2

0.4

0.6

0.8 baseline

Nor

mal

ized

Ene

rgy

Effic

ienc

y

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

isopo

wer

Performance1.0 1.21.1 1.3 1.4 1.5 1.6

baseline+work-pacing

baseline+work-pacing+work-sprinting

baseline+work-pacing+work-sprinting+work-mugging

Cornell University Christopher Batten 17 / 18

Page 27: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Energy-Efficiency and Performance Results

0.0 0.2 0.4 0.6 0.80.0

0.2

0.4

0.6

0.8 baseline

Nor

mal

ized

Ene

rgy

Effic

ienc

y

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

isopo

wer

Performance1.0 1.21.1 1.3 1.4 1.5 1.6

baseline+work-pacing

baseline+work-pacing+work-sprinting

baseline+work-pacing+work-sprinting+work-mugging

Radix SortSpeedup 1.26xEnergy Efficiency 1.53x

Convex HullSpeedup 1.41xEnergy Efficiency 1.24x

Cornell University Christopher Batten 17 / 18

Page 28: Asymmetry-Aware Work-Stealing Runtimescbatten/pdfs/batten-awsteal-iap2015.pdf · Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation • Evaluation

Motivation First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes • Evaluation •

Work-StealingRuntimes

DynamicAsymmetry

StaticAsymmetry

Take-Away Point

Holistically combining

• work-stealing runtimes• static asymmetry• dynamic asymmetry

through the use of

• work-pacing• work-sprinting• work-mugging

can improve both performance andenergy efficiency in future multicoresystems

Cornell University Christopher Batten 18 / 18