Asymmetry Aware Scheduling Algorithms for Asymmetric Processors

Nagesh Lakshminarayana, Sushma Rao, Hyesoon Kim

Computer Science, Georgia Institute of Technology

Outline

• Background and Problem

• Application characteristics on AMP/SMP

• LJFPF Policy

• CJFPF Policy

• Conclusion

Heterogeneous Architectures

• A particularly interesting class of parallel machines is the heterogeneous architecture:

– Multiple types of Processing Elements (PEs) available on the same machine

Heterogeneous Architectures

• Heterogeneous architectures are becoming very common:

– Multicore CPU + GPU

– IBM Cell processor

– Special accelerator

– Asymmetric processors (fast cores and slow cores): the focus of this talk

Scheduling Problem: Multiple applications

[Figure: scalable and non-scalable applications mapped onto fast and slow cores]

Scheduling Problem: Multi-threaded application

[Figure: threads of one multi-threaded application mapped onto fast and slow cores]

Problem

How to schedule multi-threaded applications on Asymmetric Multiprocessors (AMP)?

Outline

• LJFPF Policy

• CJFPF Policy

• Conclusion

Experimental Methodology

• Use a 1.87 GHz two-socket quad-core machine to measure performance

• Use SpeedStep technology to emulate an AMP

– All-slow (SMP): all 8 processors run at 1.6 GHz

– One-fast (AMP): 1 processor runs at 1.87 GHz; 7 processors run at 1.6 GHz

– Half-half (AMP): 4 processors run at 1.87 GHz; 4 processors run at 1.6 GHz

– All-fast (SMP): all 8 processors run at 1.87 GHz
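The four configurations can be written down as per-core frequency lists. The following is a minimal sketch: the names `make_config` and `CONFIGS` are hypothetical, and the aggregate clock frequency it prints is only a crude indicator, not a measured result from the talk.

```python
FAST_GHZ = 1.87  # frequency of a "fast" core in the emulated AMP
SLOW_GHZ = 1.6   # frequency of a "slow" core in the emulated AMP


def make_config(n_fast, n_total=8):
    """Return per-core frequencies in GHz, fast cores listed first."""
    return [FAST_GHZ] * n_fast + [SLOW_GHZ] * (n_total - n_fast)


CONFIGS = {
    "all-slow": make_config(0),
    "one-fast": make_config(1),
    "half-half": make_config(4),
    "all-fast": make_config(8),
}

for name, freqs in CONFIGS.items():
    print(f"{name:9s} aggregate = {sum(freqs):.2f} GHz")
print(f"fast/slow frequency ratio = {FAST_GHZ / SLOW_GHZ:.3f}")
```

The fast and slow frequencies differ by less than 20%, so any effect a scheduling policy shows on this setup comes from a fairly mild asymmetry.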

Performance Results on AMP/SMP

[Figure: performance under the All-slow, One-fast, Half-half, and All-fast configurations]

Slow-Limited Applications

barrier

Middle-perf Benchmarks

barrier

Similar to a slow-limited benchmark, but the sequential section is much longer

Unstable Benchmarks

barrier

• Lots of barriers

• Asymmetric workloads
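One way to read the slow-limited category is with a simple barrier model: every thread must reach the barrier before any can continue, so a phase takes as long as its slowest thread. The sketch below makes that assumption explicit; `phase_time` and the work values are illustrative, not taken from the talk.

```python
def phase_time(work_per_thread, core_speeds):
    """Time of one barrier-delimited phase: all threads wait for the slowest."""
    return max(w / s for w, s in zip(work_per_thread, core_speeds))


equal_work = [100.0] * 8            # hypothetical symmetric workload
all_slow = [1.6] * 8
half_half = [1.87] * 4 + [1.6] * 4
all_fast = [1.87] * 8

print("all-slow :", phase_time(equal_work, all_slow))
print("half-half:", phase_time(equal_work, half_half))
print("all-fast :", phase_time(equal_work, all_fast))
```

Under this model, a symmetric workload on the half-half configuration finishes no sooner than on all-slow, because the threads on slow cores set the barrier time.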

PARSEC Benchmarks

Application    Locks       Barriers  Cond. Variables  AMP performance category
blackscholes   39          8         0.000            slow-limited
bodytrack      6824702     111160    0.003            unstable
canneal        34          0         0.003            middle-perf
dedup          10002625    0         0.009            unstable
ferret         1422579     0         0.014            slow-limited
facesim        7384488     0         0.03             middle-perf
fluidanimate   1153407308  31998     0.02             unstable
freqmine       39          0         0.12             middle-perf
streamcluster  1379        633174    0.013            middle-perf
swaptions      9           0         0.00             slow-limited
vips           11          0         0.0049           unstable
x264           207692      0         13793            middle-perf

Outline

• Applications on AMP/SMP

• LJFPF Policy

• CJFPF Policy

• Conclusion

LJFPF Policy

• Longest Job to a Fast Processor First

[Figure: jobs of different lengths assigned to fast and slow cores ahead of a barrier]
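The LJFPF idea can be sketched as a greedy mapping: sort jobs by length, sort cores by speed, and pair them off. This is a minimal illustration that assumes one job per core and job lengths known in advance; the function names and the round-robin baseline are hypothetical, not the authors' implementation.

```python
def ljfpf_assign(job_lengths, core_speeds):
    """Map job index -> core index, giving the longest jobs the fastest cores."""
    if len(job_lengths) > len(core_speeds):
        raise ValueError("this sketch assumes at most one job per core")
    jobs = sorted(range(len(job_lengths)),
                  key=lambda j: job_lengths[j], reverse=True)
    cores = sorted(range(len(core_speeds)),
                   key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(jobs, cores))


def barrier_time(job_lengths, core_speeds, assignment):
    """Time until every job reaches the barrier under a given assignment."""
    return max(job_lengths[j] / core_speeds[c] for j, c in assignment.items())


lengths = [360, 240, 360, 240]        # asymmetric split, in arbitrary work units
speeds = [1.87, 1.87, 1.6, 1.6]       # half-half style: two fast, two slow cores

ljfpf = ljfpf_assign(lengths, speeds)
naive = {j: j for j in range(len(lengths))}   # job i simply runs on core i

print("LJFPF:", ljfpf, round(barrier_time(lengths, speeds, ljfpf), 1))
print("naive:", naive, round(barrier_time(lengths, speeds, naive), 1))
```

In this example the naive mapping puts a long job on a slow core, which then determines the barrier time; LJFPF avoids that by construction.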

How Does the Scheduler Know

• Length of work?

• Current mechanism: application sends the information

• On-going work: Prediction mechanism

Evaluation

• Matrix Multiplication

– Sequential version

– Parallel version: symmetric workload

– Parallel version: asymmetric workload

Asymmetric Workload (Matrix Multiplication)

[Figure: results for the work splits 300-300, 310-290, 320-280, 330-270, 340-260, 350-250, and 360-240 under All-fast, Half-half (LJFPF), Half-half (RR), and All-slow]

Real Application

• ITK (medical image processing toolkit)

– Open source, but a real application

Evaluation: MultiRegistration

• Kernel loop has 50 iterations

– 50 % 8 ≠ 0, so the iterations cannot be divided evenly among 8 threads

• Divide 50 iterations into 7, 7, 7, 7, 6, 6, 5, 5
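The arithmetic behind this split can be checked directly. The sketch below uses the chunk sizes from the slide; `even_split` is a hypothetical helper that computes the most balanced division for comparison.

```python
ITERATIONS = 50
THREADS = 8

chunks = [7, 7, 7, 7, 6, 6, 5, 5]   # split given on the slide


def even_split(total, parts):
    """Most balanced split: the first (total % parts) chunks get one extra."""
    base, extra = divmod(total, parts)
    return [base + 1] * extra + [base] * (parts - extra)


remainder = ITERATIONS % THREADS
balanced = even_split(ITERATIONS, THREADS)

print("remainder:", remainder)
print("slide split:", chunks, "sum =", sum(chunks))
print("balanced split:", balanced, "sum =", sum(balanced))
```

The slide's split is more uneven than the balanced one, which gives an asymmetric workload: four long chunks that can go to fast cores and four shorter ones for slow cores.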

Results: ITK Benchmark

Outline

• LJFPF Policy

• CJFPF Policy

• Conclusion

Critical Section

Critical Section Limited Workloads

[Figure: thread timelines showing critical sections, useful work, and waiting; Case (a) and Case (b)]

Critical Section Effects

[Figure: results for 10% CS, 15% CS, and 20% CS under All-fast, Half-half, and All-slow]

Half-half performs similarly to all-fast

CJFPF Policy

• Critical Job to a Fast Processor First Policy

[Figure: critical job placed on a fast core rather than a slow core; data labels 8-12, 16-24, 40-60]
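The CJFPF idea can be sketched in the same style as a greedy mapping: threads that are currently critical are placed on the fastest cores first. The slides do not say how the scheduler identifies a critical job, so the `in_critical_section` flag, the `Thread` class, and `cjfpf_assign` below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    in_critical_section: bool   # hypothetical flag; detection is not specified


def cjfpf_assign(threads, core_speeds):
    """Map thread id -> core index, giving critical threads the fastest cores."""
    if len(threads) > len(core_speeds):
        raise ValueError("this sketch assumes at most one thread per core")
    # False sorts before True, so critical threads come first (stable sort).
    ordered = sorted(threads, key=lambda t: not t.in_critical_section)
    cores = sorted(range(len(core_speeds)),
                   key=lambda c: core_speeds[c], reverse=True)
    return {t.tid: c for t, c in zip(ordered, cores)}


threads = [Thread(0, False), Thread(1, True)]
speeds = [1.6, 1.87]            # core 0 is slow, core 1 is fast

assignment = cjfpf_assign(threads, speeds)
print(assignment)
```

Here thread 1 holds the critical section, so it is mapped to core 1, the fast core, and thread 0 takes the slow core.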

CJFPF Results

• With longer critical sections, the benefit of the CJFPF policy decreases

Conclusion

• We evaluated the characteristics of multi-threaded applications on AMPs.

• Barriers and critical sections are important factors.

• We propose two new scheduling policies: longest job to a fast processor first (LJFPF) and critical job to a fast processor first (CJFPF)

– The scheduling policies improve performance for asymmetric workloads.

• Future work

– Develop a prediction mechanism

– Evaluate symmetric workloads on AMPs

– Other kinds of heterogeneous architectures

Thank you!
