Asymmetry Aware Scheduling Algorithms for Asymmetric Processors

Posted on 06-Jan-2016



Nagesh Lakshminarayana, Sushma Rao, Hyesoon Kim

Computer Science, Georgia Institute of Technology

Outline

• Background and Problem

• Application characteristics on AMP/SMP

• LJFPF Policy

• CJFPF Policy

• Conclusion

Heterogeneous Architectures

• A particularly interesting class of parallel machines is heterogeneous architectures:
  – Multiple types of processing elements (PEs) available on the same machine

[Figure: one PE A and four PE B processing elements connected by an interconnect]

Heterogeneous Architectures

• Heterogeneous architectures are becoming very common:
  – Multicore CPU + GPU
  – IBM Cell processor
  – Special accelerators
  – Asymmetric processors (focus of this talk)

[Figure: an asymmetric processor with one fast core and four slow cores]

Scheduling Problem: Multiple applications

[Figure: scalable and non-scalable applications to be scheduled on one fast core and four slow cores]

Scheduling Problem: Multi-threaded application

[Figure: a multi-threaded application to be scheduled on one fast core and four slow cores]

Problem

How to schedule multi-threaded applications on Asymmetric Multiprocessors (AMP)?


Experimental Methodology

• Use a 1.87 GHz two-socket quad-core machine (8 cores in total) to measure performance

• Use SpeedStep technology to lower the clock frequency of selected cores and so emulate an AMP

Configuration   Type   Cores at 1.87 GHz   Cores at 1.6 GHz
All-slow        SMP    0                   8
One-fast        AMP    1                   7
Half-half       AMP    4                   4
All-fast        SMP    8                   0
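The four configurations can be captured as data. The following is a minimal Python sketch, assuming an idealized CPU-bound workload whose run time scales exactly with 1/frequency (real benchmarks also stall on memory, which this ignores). The names `CONFIGS`, `core_speeds` and `ideal_normalized_time` are hypothetical and only illustrate the setup.

```python
# Hypothetical model of the emulated configurations on an 8-core machine.
FAST_GHZ = 1.87
SLOW_GHZ = 1.6

# configuration -> (cores at 1.87 GHz, cores at 1.6 GHz)
CONFIGS = {
    "All-slow": (0, 8),
    "One-fast": (1, 7),
    "Half-half": (4, 4),
    "All-fast": (8, 0),
}


def core_speeds(name):
    """Return the clock frequency (GHz) of every core in a configuration."""
    fast, slow = CONFIGS[name]
    return [FAST_GHZ] * fast + [SLOW_GHZ] * slow


def ideal_normalized_time(speed_ghz):
    """Run time of CPU-bound work at speed_ghz, normalized to a 1.6 GHz core.

    Assumes run time scales exactly with 1/frequency (memory stalls ignored).
    """
    return SLOW_GHZ / speed_ghz


for name in CONFIGS:
    print(name, core_speeds(name))

print(round(ideal_normalized_time(FAST_GHZ), 3))  # 0.856
```

Under this assumption, work that runs entirely on 1.87 GHz cores takes about 0.856 of the time it needs on 1.6 GHz cores, so frequency alone can buy at most roughly a 14% reduction in execution time.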

Performance Results on AMP/SMP

[Chart: normalized execution time (y-axis, 0.8 to 1.05) under the All-slow, One-fast, Half-half and All-fast configurations]

Slow-Limited Applications

[Figure: threads on one fast core and four slow cores synchronizing at a barrier]
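A minimal sketch of why a barrier can make an application slow-limited, assuming (as a simplification, not the authors' model) one thread per core, equal work per thread, and run time proportional to 1/frequency. `barrier_time` is a hypothetical helper.

```python
def barrier_time(works, speeds):
    """Time until the last thread reaches the barrier.

    Thread i runs on core i; its time is work / clock speed (GHz).
    """
    return max(w / s for w, s in zip(works, speeds))


work = [1.0] * 8                      # symmetric: every thread has equal work
all_slow = [1.6] * 8
one_fast = [1.87] + [1.6] * 7
all_fast = [1.87] * 8

# The single fast thread arrives early and then waits, so One-fast
# finishes exactly when All-slow does.
print(barrier_time(work, all_slow), barrier_time(work, one_fast),
      barrier_time(work, all_fast))
```

With equal work, the barrier releases only when the slowest thread arrives, so adding one fast core does not shorten the run at all.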

Middle-perf Benchmarks

[Figure: threads synchronizing at a barrier]

• Similar to a slow-limited benchmark, but the sequential section is much longer
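The middle-perf case can be sketched with the same kind of toy model, now adding a sequential section. The sketch assumes that the sequential section runs on the fastest core and that the parallel section is split evenly over all cores and joined at a barrier; `run_time` is a hypothetical helper, and run time is again taken as work divided by clock speed.

```python
def run_time(seq_work, par_work, speeds):
    """Sequential section on the fastest core, then par_work split evenly
    over all cores and joined at a barrier (so the slowest core decides)."""
    per_thread = par_work / len(speeds)
    return seq_work / max(speeds) + per_thread / min(speeds)


all_slow = [1.6] * 8
one_fast = [1.87] + [1.6] * 7
all_fast = [1.87] * 8

for seq in (0.0, 4.0):                # no sequential section vs. a long one
    base = run_time(seq, 8.0, all_slow)
    ratios = [run_time(seq, 8.0, s) / base for s in (all_slow, one_fast, all_fast)]
    print(seq, [round(r, 3) for r in ratios])
```

With no sequential section, One-fast matches All-slow (the slow-limited case); with a long sequential section, One-fast lands between All-slow and All-fast.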

Unstable Benchmarks

[Figure: threads synchronizing at two barriers]

• Lots of barriers

• Asymmetric workloads
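One plausible reading of why benchmarks with many barriers and asymmetric work are unstable (our interpretation; the slide does not spell it out) is that the barrier time depends on which threads the OS happens to place on the fast cores. The sketch below enumerates every placement of a hypothetical uneven workload on the Half-half configuration, again assuming run time is work divided by clock speed.

```python
from itertools import permutations


def barrier_time(works, speeds):
    """Time until the last thread reaches the barrier (thread i on core i)."""
    return max(w / s for w, s in zip(works, speeds))


works = (1.2, 1.2, 1.0, 1.0, 1.0, 1.0, 0.8, 0.8)   # hypothetical uneven work
half_half = [1.87] * 4 + [1.6] * 4

# Barrier time for every distinct way of placing the threads on the cores.
times = {barrier_time(p, half_half) for p in set(permutations(works))}
print(round(min(times), 3), round(max(times), 3))
```

The spread between the best and the worst placement (about 0.642 versus 0.75 time units in this toy example) is paid again at every barrier, so small scheduling differences can add up across runs.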

PARSEC Benchmarks

Application     Locks        Barriers   Cond. Variables   AMP performance category
blackscholes    39           8          0.000             slow-limited
bodytrack       6824702      111160     0.003             unstable
canneal         34           0          0.003             middle-perf
dedup           10002625     0          0.009             unstable
ferret          1422579      0          0.014             slow-limited
facesim         7384488      0          0.03              middle-perf
fluidanimate    1153407308   31998      0.02              unstable
freqmine        39           0          0.12              middle-perf
streamcluster   1379         633174     0.013             middle-perf
swaptions       9            0          0.00              slow-limited
vips            11           0          0.0049            unstable
x264            207692       0          13793             middle-perf


LJFPF Policy

• Longest Job to a Fast Processor First

[Figure: threads of different lengths before a barrier, scheduled on two fast cores and two slow cores]
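A minimal sketch of the LJFPF idea, assuming the scheduler already knows each thread's work length, there is one thread per core, and run time scales with 1/frequency. `ljfpf`, `round_robin` and `barrier_time` are hypothetical names; `round_robin` is simplified here to "thread i runs on core i", which is only a stand-in for the RR baseline.

```python
def ljfpf(works, speeds):
    """Hypothetical LJFPF sketch: pair the longest jobs with the fastest cores.

    Returns placement[c] = work assigned to core c (one job per core).
    """
    jobs = sorted(works, reverse=True)                            # longest first
    cores = sorted(range(len(speeds)), key=lambda c: -speeds[c])  # fastest first
    placement = [0.0] * len(speeds)
    for core, job in zip(cores, jobs):
        placement[core] = job
    return placement


def round_robin(works, speeds):
    """Simplified baseline: thread i lands on core i, ignoring job length."""
    return list(works)


def barrier_time(placement, speeds):
    """Time until the last core finishes its job (work / clock speed)."""
    return max(w / s for w, s in zip(placement, speeds))


speeds = [1.6, 1.87, 1.6, 1.87]      # two fast and two slow cores
works = [1.2, 1.0, 1.2, 1.0]         # hypothetical work before the barrier

for policy in (round_robin, ljfpf):
    print(policy.__name__, round(barrier_time(policy(works, speeds), speeds), 3))
```

Sorting jobs by length and cores by speed pairs the longest job with the fastest core, which shortens the wait at the barrier (0.642 versus 0.75 time units in this example).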

How Does the Scheduler Know the Length of Work?

• Current mechanism: the application sends the information to the scheduler

• On-going work: a prediction mechanism
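The slides leave the prediction mechanism as on-going work. Purely as a hypothetical illustration (not the authors' design), a predictor could estimate a thread's next work length from the lengths measured in earlier barrier intervals, for example with an exponential moving average; the estimates could then stand in for application-provided lengths when ranking threads for LJFPF. `EmaPredictor` and its `alpha` parameter are assumptions.

```python
class EmaPredictor:
    """Hypothetical work-length predictor (illustration only).

    Estimates each thread's next work length from the lengths measured in
    earlier barrier intervals, using an exponential moving average.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha        # weight given to the newest measurement
        self.estimates = {}       # thread id -> predicted work length

    def observe(self, tid, measured):
        old = self.estimates.get(tid)
        if old is None:
            self.estimates[tid] = measured
        else:
            self.estimates[tid] = self.alpha * measured + (1 - self.alpha) * old

    def predict(self, tid, default=1.0):
        return self.estimates.get(tid, default)


p = EmaPredictor()
for measured in (10.0, 12.0, 14.0):   # thread 0 keeps getting longer
    p.observe(0, measured)
p.observe(1, 5.0)

# Rank threads for an LJFPF-style scheduler by predicted length.
print(sorted([0, 1, 2], key=p.predict, reverse=True))
```

A larger `alpha` reacts faster to phase changes; a smaller one smooths out noisy measurements.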

Evaluation

• Matrix multiplication
  – Sequential version
  – Parallel version, symmetric workload
  – Parallel version, asymmetric workload
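To make the symmetric/asymmetric distinction concrete, here is a small Python sketch of a row-partitioned parallel matrix multiplication in which each thread receives a configurable number of rows. Equal row counts model a symmetric workload and unequal counts an asymmetric one; this partitioning scheme is an assumption for illustration, not necessarily how the evaluated benchmark creates asymmetry. CPython threads do not execute Python bytecode in parallel because of the global interpreter lock, so this shows only the work split, not real speedup.

```python
import threading


def matmul_rows(a, b, c, start, stop):
    """Compute rows [start, stop) of c = a x b using plain Python lists."""
    n, m = len(b), len(b[0])
    for i in range(start, stop):
        c[i] = [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(m)]


def parallel_matmul(a, b, chunk_rows):
    """Give thread t chunk_rows[t] consecutive rows of the result.

    Equal chunk sizes model a symmetric workload; unequal sizes model an
    asymmetric one (hypothetical partitioning, for illustration only).
    """
    assert sum(chunk_rows) == len(a)
    c = [None] * len(a)
    threads, start = [], 0
    for rows in chunk_rows:
        t = threading.Thread(target=matmul_rows, args=(a, b, c, start, start + rows))
        threads.append(t)
        t.start()
        start += rows
    for t in threads:
        t.join()          # acts as the barrier at the end of the parallel phase
    return c


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]      # identity, so the result should equal a
print(parallel_matmul(a, b, [2, 2]) == a, parallel_matmul(a, b, [3, 1]) == a)
```

Each thread writes only its own rows of `c`, so no lock is needed; the final `join` loop plays the role of the barrier that makes the slowest thread decide the run time.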

Asymmetric Workload (Matrix Multiplication)

[Chart: normalized execution time (y-axis, 0.9 to 1.2) for the workload splits 300-300, 310-290, 320-280, 330-270, 340-260, 350-250 and 360-240 under All-fast, Half-half (LJFPF), Half-half (RR) and All-slow]

Real Application

• ITK (medical image processing toolkit)
  – Open source, but a real application

Evaluation: MultiRegistration

• The kernel loop has 50 iterations

• 50 mod 8 ≠ 0, so the iterations cannot be divided evenly among 8 threads

• Divide the 50 iterations into chunks of 7, 7, 7, 7, 6, 6, 5 and 5
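The 7, 7, 7, 7, 6, 6, 5, 5 split pairs naturally with the Half-half configuration: four long chunks for the four fast cores. A minimal sketch, assuming every iteration costs the same and run time scales with 1/frequency (both simplifications), compares placing the 7-iteration chunks on the fast cores with an unlucky placement that leaves them on slow cores. `barrier_time` is a hypothetical helper.

```python
chunks = [7, 7, 7, 7, 6, 6, 5, 5]          # iterations per thread (from the slide)
assert sum(chunks) == 50


def barrier_time(chunks, speeds):
    """Time of the slowest thread: iterations / clock speed, thread i on core i."""
    return max(n / s for n, s in zip(chunks, speeds))


fast, slow = 1.87, 1.6
half_half_ljfpf = [fast] * 4 + [slow] * 4   # the four 7-iteration chunks on fast cores
half_half_worst = [slow] * 4 + [fast] * 4   # the 7-iteration chunks left on slow cores

print(round(barrier_time(chunks, half_half_ljfpf), 3),
      round(barrier_time(chunks, half_half_worst), 3))
```

In this idealized model the LJFPF-style placement finishes at 3.75 time units versus 4.375 for the unlucky one. The model ignores memory effects and the rest of the program, so it shows only the direction of the effect, not its size.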

Results: ITK Benchmark

[Chart: normalized execution time (y-axis, 0.92 to 1.04) under All-fast, Half-half (LJFPF), Half-half (RR) and All-slow; annotated with 2.3%]


Critical Section

[Figure: a critical section guarded by a lock]

Critical Section Limited Workloads

[Figure: thread timelines for Case (a) and Case (b), showing critical section, useful work and waiting]
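Why critical sections limit scaling can be seen from a simple bound (a standard argument, not taken from the slides): if each of N threads spends a fraction f of its work W inside the same lock, the critical sections serialize, so the parallel run takes at least max(W, N·f·W) while the sequential run takes N·W. The hypothetical helper `cs_speedup_bound` evaluates N / max(1, N·f) for 8 threads and the 10%, 15% and 20% critical-section ratios shown in the Critical Section Effects chart.

```python
def cs_speedup_bound(threads, cs_fraction):
    """Upper bound on speedup when every thread spends cs_fraction of its
    work W inside the same lock-protected critical section.

    Sequential time: threads * W. Parallel time is at least W (one thread's
    own work) and at least threads * cs_fraction * W (serialized critical
    sections). W cancels, so it is set to 1.
    """
    return threads / max(1.0, threads * cs_fraction)


for f in (0.10, 0.15, 0.20):
    print(f"{f:.0%} CS: speedup <= {cs_speedup_bound(8, f):.2f}")
```

Under this bound, 8 threads can still reach a speedup of 8 at 10% critical-section time but at most about 6.67 at 15% and 5 at 20%.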

Critical Section Effects

[Chart: speedup (y-axis, 0 to 9) for 10% CS, 15% CS and 20% CS under All-fast, Half-half and All-slow]

• Half-half performs similarly to All-fast

CJFPF Policy

• Critical Job to a Fast Processor First Policy

[Figure: one fast core and three slow cores]
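A minimal sketch of the CJFPF idea under strong simplifying assumptions: every thread does the same useful work, then one critical section protected by a single FIFO lock; under RR each critical section runs at its own core's speed, while under CJFPF it is assumed to run at the fast core's speed (migration cost and competition for the fast core are ignored). `finish_time` is a hypothetical helper, not the authors' implementation.

```python
def finish_time(useful, cs, speeds, cs_speed=None):
    """Hypothetical model: each thread does `useful` work and then one
    critical section of length `cs` guarded by a single FIFO lock.

    cs_speed=None: each critical section runs on its own core (RR).
    cs_speed=x:    critical sections run at speed x, e.g. on the fast core as
                   in CJFPF (migration cost and fast-core contention ignored).
    """
    # When each thread requests the lock, paired with its core's speed.
    arrivals = sorted((useful / s, s) for s in speeds)
    lock_free = 0.0
    for arrive, s in arrivals:
        start = max(arrive, lock_free)        # wait while the lock is held
        speed = s if cs_speed is None else cs_speed
        lock_free = start + cs / speed
    return lock_free                          # end of the last critical section


speeds = [1.87] + [1.6] * 3                   # one fast and three slow cores
rr = finish_time(1.0, 0.5, speeds)
cjfpf = finish_time(1.0, 0.5, speeds, cs_speed=max(speeds))
print(round(rr, 3), round(cjfpf, 3))
```

Because the critical sections serialize, speeding up each one on the fast core shortens the whole chain (about 1.604 versus 1.74 time units here).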

CJFPF Results

[Chart: speedup (y-axis, 0 to 7) of CJFPF and RR for the x-axis groups 8-12, 16-24 and 40-60]

• With longer critical sections, the benefit of the CJFPF policy decreases

Conclusion

• We evaluated the characteristics of multi-threaded applications on AMPs.

• Barriers and critical sections are important factors.

• We propose two new scheduling policies: longest job to a fast processor first (LJFPF) and critical job to a fast processor first (CJFPF).
  – These scheduling policies improve performance for asymmetric workloads.

• Future work
  – Develop a prediction mechanism
  – Evaluate symmetric workloads on AMPs
  – Study other kinds of heterogeneous architectures

Thank you!
