Understanding Performance, Power and Energy Behavior in Asymmetric Processors Nagesh B Lakshminarayana Hyesoon Kim School of Computer Science Georgia Institute of Technology

Post on 11-Jan-2016

Page 1: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

Understanding Performance, Power and Energy Behavior in Asymmetric Processors

Nagesh B Lakshminarayana

Hyesoon Kim

School of Computer Science

Georgia Institute of Technology

Page 2: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

2

Outline

• Background and Motivation

• Thread Interactions

• Dynamic Scheduling

• Asymmetry Aware Scheduling

• Conclusion and Future Work

Page 3: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

3

Heterogeneous Architectures

• A particularly interesting class of parallel machines is Heterogeneous Architectures

– Multiple types of Processing Elements (PEs) available on the same machine

[Figure: one PE of type A (PEA) and four PEs of type B (PEB) connected by an interconnect]

Page 4: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

4

Heterogeneous Architectures

• Heterogeneous architectures are becoming very common

[Figure: examples of heterogeneous architectures, labeled "IBM Cell processor", "Special Accelerator", "Fast core" and "Slow core"]

• Focus of this talk: Asymmetric Processors

Page 5: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

5

Machine configurations

• All-slow (SMP): all processors running at their lowest frequency

• Half-half (AMP): half of the processors running at their highest frequency, the rest running at their lower frequency

• All-fast (SMP): all processors running at their highest frequency

• Machine-I: 2-socket 1.87 GHz quad-core Intel Xeon; 4 MB L2 cache, 8 GB RAM, 40 GB HDD, RHEL 5

• Machine-II: 4-socket 2 GHz quad-core AMD Opteron 8350; 2 MB L3 cache, 32 GB RAM, 1 TB HDD, RHEL 4

• M-I experiments have 8 threads, M-II experiments have 16 threads

• AMPs emulated using SpeedStep/PowerNow
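The three configurations can be written down as per-core frequency lists. The following is a minimal sketch, not the authors' setup script: the function name `core_frequencies` is hypothetical, and which physical cores receive the high frequency in half-half is an assumption, since the slide does not say. The 1 GHz and 2 GHz values are the lowest and highest Machine-II frequencies that appear later in the deck.

```python
def core_frequencies(config, n_cores, f_low_ghz, f_high_ghz):
    """Return a list of per-core frequencies (GHz) for a named configuration.

    Assumption: in "half-half" the first half of the cores is fast.
    The slides do not state which cores are chosen.
    """
    if config == "all-slow":
        return [f_low_ghz] * n_cores
    if config == "all-fast":
        return [f_high_ghz] * n_cores
    if config == "half-half":
        n_fast = n_cores // 2
        return [f_high_ghz] * n_fast + [f_low_ghz] * (n_cores - n_fast)
    raise ValueError(f"unknown configuration: {config!r}")


# Machine-II has 4 sockets x 4 cores = 16 cores.
for name in ("all-slow", "half-half", "all-fast"):
    print(name, core_frequencies(name, 16, 1.0, 2.0))
```

The half-half list corresponds to the "8 @ 1 GHz, 8 @ 2 GHz (AMP)" configuration used in the later experiments.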

Page 6: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

6

Power Measurement

• Using Extech 380801 Power Analyzer

• Total system power consumption

[Figure: measurement setup, labeled "Experiment Machine", "Windows Machine", "Power Cable", "Serial Cable" and "Power Socket"]
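Energy is obtained from power readings by integrating over time. The sketch below assumes the analyzer log has already been parsed into timestamps (seconds) and total system power (watts); the log format and sampling rate are not given on the slide, and the sample readings are hypothetical.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def average_power_watts(timestamps_s, power_w):
    """Average power over the whole trace: total energy divided by duration."""
    duration = timestamps_s[-1] - timestamps_s[0]
    return energy_joules(timestamps_s, power_w) / duration


# Hypothetical readings, one per second.
t = [0.0, 1.0, 2.0, 3.0]
p = [200.0, 210.0, 210.0, 200.0]
print(energy_joules(t, p), average_power_watts(t, p))
```

Because energy is power integrated over the run, a configuration with lower average power can still consume more energy if it runs long enough.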

Page 7: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

7

PARSEC Benchmark Suite

• Desktop-oriented multithreaded benchmark suite

– Multithreaded

– Animation, Data Mining, Financial Analysis

– Pthreads, OpenMP

Page 8: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

8

Performance of PARSEC benchmarks

• On average, performance of half-half is between that of all-slow and all-fast

[Figure: Execution Time. Execution time (sec), y-axis 0 to 350, of the PARSEC benchmarks under All-fast, Half-half and All-slow; benchmarks are grouped as slow-limited, middle-perf and unstable]

Page 9: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

9

Classification of Benchmarks

[Figure: three panels, each showing threads and a barrier: (a) slow-limited, (b) middle-perf, (c) unstable]

Page 10: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

10

Energy Consumption of PARSEC

• In half-half/all-slow, total energy consumption is higher even though average power consumed might be lower

[Figure: Energy consumption. Energy (kJ), y-axis 0 to 200, of the PARSEC benchmarks under All-fast, Half-half and All-slow; benchmarks are grouped as slow-limited and middle-perf]

Page 11: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

11

Behavior of PARSEC Benchmarks

• Observations

– Different applications behave differently on AMPs

– Usually SMP with fast processors saves energy

Page 12: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

12

Why do different applications behave differently on AMPs?

Page 13: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

13

Outline

• Background and Motivation

• Thread Interactions

• Dynamic Scheduling

• Asymmetry Aware Scheduling

• Conclusion and Future Work

Page 14: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

14

Thread Interactions

Sources of thread interactions

• Critical Sections

• Barriers

Page 15: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

15

Critical Sections (CS)

• Waiting to enter CSs

[Figure: thread timelines for Case (a) and Case (b); legend: critical section, useful work, waiting]
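Why waiting for a critical section limits performance can be illustrated with a simple lower bound. This is a sketch under stated assumptions, not the model used in the talk: one thread per core, a single global lock, execution speed proportional to clock frequency, and work measured in seconds at 1 GHz. The function name and the example fractions are hypothetical.

```python
def completion_lower_bound(core_speeds_ghz, work_per_thread, cs_fraction):
    """Lower bound on finish time for one thread per core sharing one lock.

    work_per_thread: work of each thread, in seconds at 1 GHz (assumption).
    cs_fraction: fraction of that work executed inside the critical section.
    """
    # Bound 1: no thread can finish before it has executed all of its own work.
    per_thread = max(work_per_thread / s for s in core_speeds_ghz)
    # Bound 2: critical sections cannot overlap, so their durations add up.
    serialized = sum(cs_fraction * work_per_thread / s for s in core_speeds_ghz)
    return max(per_thread, serialized)


all_slow = [1.0] * 16
half_half = [2.0] * 8 + [1.0] * 8
all_fast = [2.0] * 16

for cs in (0.01, 0.10):
    bounds = [completion_lower_bound(cfg, 1.0, cs)
              for cfg in (all_slow, half_half, all_fast)]
    print(f"CS fraction {cs:.2f}: all-slow, half-half, all-fast =", bounds)
```

In this toy model, a short critical section leaves half-half bounded by its slow cores, exactly like all-slow, while a longer one makes the serialized term dominate and places half-half between the two SMP configurations. Real runs also include lock hand-off and cache effects that the bound ignores.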

Page 16: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

16

Barriers

• Waiting for other threads to finish

[Figure: barrier diagram with two barriers marked]
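The cost of a barrier can be made concrete by computing how long each thread waits. This is a minimal sketch assuming one thread per core, speed proportional to clock frequency and work in seconds at 1 GHz; the function names and the four-core example are hypothetical.

```python
def barrier_phase_time(work, speeds_ghz):
    """A phase ends when the last thread reaches the barrier."""
    return max(w / s for w, s in zip(work, speeds_ghz))


def wait_times(work, speeds_ghz):
    """Time each thread spends idle at the barrier."""
    end = barrier_phase_time(work, speeds_ghz)
    return [end - w / s for w, s in zip(work, speeds_ghz)]


# Equal work on two fast (2 GHz) and two slow (1 GHz) cores.
work = [1.0, 1.0, 1.0, 1.0]
speeds = [2.0, 2.0, 1.0, 1.0]
print(barrier_phase_time(work, speeds))
print(wait_times(work, speeds))
```

With equal work per thread, the threads on fast cores finish early and wait for the slow ones, so the phase takes as long as it would on all-slow cores.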

Page 17: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

17

Effect of Critical Section length

• CS limited application

• As critical section length increases, the average power consumed decreases

[Figure: Normalized Power Consumption (y-axis 0.8 to 1) versus critical-section length (10%, 15%, 20%, 50%, 75% CS) for 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz and 16 @ 2 GHz]

Page 18: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

18

Effect of Critical Section length

• CS limited application

[Figure: Normalized Execution Time (y-axis 0 to 1) versus critical-section length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz and 16 @ 2 GHz]

Page 19: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

19

Effect of Critical Section length

• Performance of AMPs sensitive to CS length

• CS limited application

[Figure: Normalized Execution Time (y-axis 0 to 1) versus critical-section length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz and 16 @ 2 GHz, and the AMP configurations 8 @ 1 GHz, 8 @ 1.2 GHz, 8 @ 1.4 GHz and 8 @ 1.7 GHz, each combined with 8 @ 2 GHz]

Page 20: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

20

Effect of Critical Section length

• Energy consumption shows the same trend

• CS limited application

[Figure: Normalized Energy Consumption (y-axis 0 to 1) versus critical-section length (10%, 15%, 20%, 50%, 75%) for the SMP configurations 16 @ 1 GHz, 16 @ 1.2 GHz, 16 @ 1.4 GHz, 16 @ 1.7 GHz and 16 @ 2 GHz, and the AMP configurations 8 @ 1 GHz, 8 @ 1.2 GHz, 8 @ 1.4 GHz and 8 @ 1.7 GHz, each combined with 8 @ 2 GHz]

Page 21: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

21

Effect of Critical Section frequency

• Both length and frequency of CS affect performance and energy consumption

• As CS frequency increases, the performance difference between half-half and all-fast shrinks

• If the majority of the execution time is spent waiting for locks, it is OK to have a few slow processors

• Results available in the paper

Page 22: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

22

Effect of Barriers

• With few barriers, half-half performs similarly to all-slow

• With a large number of barriers, half-half performs similarly to all-fast

• Results available in the paper

Page 23: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

23

Outline

• Background and Motivation

• Thread Interactions

• Dynamic Scheduling

• Asymmetry Aware Scheduling

• Conclusion and Future Work

Page 24: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

24

Dynamic Scheduling

• Motivation: better run-time adaptivity

• Each thread requests more work after completing its assigned work

• OpenMP, Intel Threading Building Blocks
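The difference between handing out work up front and letting threads request more work can be sketched with a small simulation. This is an idealized model, not the OpenMP or TBB runtime: it assumes equal-sized chunks, speed proportional to clock frequency, and no scheduling overhead. The function names and the chunk count are hypothetical; work is normalized so that 16 cores at 1 GHz finish in 1.0.

```python
import heapq


def static_time(n_chunks, chunk_work, speeds_ghz):
    """Chunks are split evenly up front; the slowest core sets the finish time."""
    n = len(speeds_ghz)
    per_core = [n_chunks // n + (1 if i < n_chunks % n else 0) for i in range(n)]
    return max(c * chunk_work / s for c, s in zip(per_core, speeds_ghz))


def dynamic_time(n_chunks, chunk_work, speeds_ghz):
    """Each core takes the next chunk from a shared pool as soon as it is free."""
    # Heap of (time at which the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(speeds_ghz))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_chunks):
        t, i = heapq.heappop(free_at)
        done = t + chunk_work / speeds_ghz[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


amp = [2.0] * 8 + [1.0] * 8           # 8 @ 2 GHz, 8 @ 1 GHz
n_chunks, chunk_work = 1600, 0.01     # 16 units of work in total
print("static :", static_time(n_chunks, chunk_work, amp))
print("dynamic:", dynamic_time(n_chunks, chunk_work, amp))
```

In this model the static split finishes when the slow cores do, while the dynamic pool approaches total work divided by aggregate speed (16 / 24, about 0.67). That is close to the 1.00/0.67 Static/Dynamic execution times reported for the 8 @ 1 GHz, 8 @ 2 GHz configuration on the following slide, although the simulation is only an illustration and not the measured experiment.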

Page 25: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

25

Dynamic Scheduling

• Parallel-for application

• Can help improve performance and reduce energy consumption in AMPs

• Should be preferred to static and guided policies

Machine configuration            Normalized Execution Time    Normalized Energy Consumption
                                 (Static/Dynamic)             (Static/Dynamic)
16 @ 1 GHz (SMP)                 1.0                          1.0
16 @ 1.2 GHz (SMP)               0.83                         0.87
16 @ 1.4 GHz (SMP)               0.71                         0.78
16 @ 1.7 GHz (SMP)               0.59                         0.68
16 @ 2 GHz (SMP)                 0.50                         0.61
8 @ 1 GHz, 8 @ 2 GHz (AMP)       1.00/0.67                    1.05/0.73
8 @ 1.2 GHz, 8 @ 2 GHz (AMP)     0.83/0.63                    0.90/0.70
8 @ 1.4 GHz, 8 @ 2 GHz (AMP)     0.71/0.59                    0.80/0.67
8 @ 1.7 GHz, 8 @ 2 GHz (AMP)     0.59/0.54                    0.69/0.63
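Since average power is energy divided by time, the normalized average power of each configuration can be derived from the two columns above. The sketch below uses a few rows copied from the slide and assumes both columns are normalized to the 16 @ 1 GHz row; the dictionary layout and function name are hypothetical.

```python
# (normalized execution time, normalized energy) taken from the slide.
rows = {
    "16 @ 1 GHz (SMP)": (1.00, 1.00),
    "16 @ 2 GHz (SMP)": (0.50, 0.61),
    "8 @ 1 GHz, 8 @ 2 GHz (AMP), static": (1.00, 1.05),
    "8 @ 1 GHz, 8 @ 2 GHz (AMP), dynamic": (0.67, 0.73),
}


def normalized_power(norm_time, norm_energy):
    """Average power relative to the baseline: P = E / t."""
    return norm_energy / norm_time


power = {name: normalized_power(t, e) for name, (t, e) in rows.items()}
for name, p in power.items():
    print(f"{name}: {p:.2f}")
```

For the rows shown, the 16 @ 2 GHz configuration draws more average power than the 1 GHz baseline yet uses less energy because it finishes in half the time, and dynamic scheduling on the AMP raises average power slightly while cutting both time and energy.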

Page 26: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

26

Outline

• Background and Motivation

• Thread Interactions

• Dynamic Scheduling

• Asymmetry Aware Scheduling

• Conclusion and Future Work

Page 27: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

27

Scheduling in AMPs

• Longest Job to a Fast Processor First (LJFPF) [Lakshminarayana’08]

[Figure: two fast cores and two slow cores, with a barrier marked]

Page 28: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

28

How Does the Scheduler Know

• Length of work?

• Current mechanism: application sends task length information

• On-going work: Prediction mechanism
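One way to read "Longest Job to a Fast Processor First" when task lengths are known is sketched below. This is an illustrative interpretation, not the authors' implementation: it assumes at most one job per core, task lengths supplied by the application, and speed proportional to clock frequency. The function names, job lengths and core speeds are hypothetical.

```python
def ljfpf_assign(job_lengths, core_speeds_ghz):
    """Pair the longest jobs with the fastest cores; returns {job index: core index}."""
    if len(job_lengths) > len(core_speeds_ghz):
        raise ValueError("this sketch assumes at most one job per core")
    jobs = sorted(range(len(job_lengths)),
                  key=lambda j: job_lengths[j], reverse=True)
    cores = sorted(range(len(core_speeds_ghz)),
                   key=lambda c: core_speeds_ghz[c], reverse=True)
    return dict(zip(jobs, cores))


def makespan(assignment, job_lengths, core_speeds_ghz):
    """Time at which the last job finishes, e.g. when all threads reach a barrier."""
    return max(job_lengths[j] / core_speeds_ghz[c] for j, c in assignment.items())


lengths = [4.0, 1.0, 3.0, 2.0]     # hypothetical task lengths
speeds = [1.0, 2.0, 2.0, 1.0]      # two slow and two fast cores

in_order = {j: j for j in range(len(lengths))}   # job j on core j
aware = ljfpf_assign(lengths, speeds)
print("in-order makespan:", makespan(in_order, lengths, speeds))
print("LJFPF makespan   :", makespan(aware, lengths, speeds))
```

In this example the in-order mapping leaves the longest job on a slow core, while the asymmetry-aware mapping halves the time to reach the barrier.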

Page 29: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

29

LJFPF

• ITK: Medical image processing applications (open source)

• MultiRegistration (Registration method)

– Kernel with 50 iterations

– 50 iterations divided among 8 threads

[Figures: Normalized Execution Time; Normalized Energy Consumption]

Page 30: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

30

Outline

• Background and Motivation

• Thread Interactions

• Dynamic Scheduling

• Asymmetry Aware Scheduling

• Conclusion and Future Work

Page 31: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

31

Conclusion & Future Work

Conclusion

• Evaluated the performance/energy consumption behavior of multithreaded applications in AMPs

• For symmetric workloads

– With little thread interaction: SMP with fast processors

– With a lot of thread interaction: AMP could be better

• For asymmetric threads

– AMP could provide lowest energy consumption

Future Work

• Predict application characteristics and use predicted information for thread scheduling on AMPs

Page 32: Understanding Performance, Power and Energy Behavior in Asymmetric Processors

32

Thank you!