Techniques for Multicore Thermal Management, by Field Cady, Bin Fu and Kai Ren
Post on 17-Dec-2015
Techniques for Multicore Thermal Management
• Overview and comparison of techniques
• Plus determining the critical thread
•DVFS details
•Thread movement
Taxonomy
• Stop & Go vs DVFS
  – Stop & Go: suspend core operation for 30 milliseconds when the temperature is above a threshold
  – DVFS: dynamic voltage and frequency scaling, with settings chosen using control theory
• Distributed vs Global
  – Apply the above to all cores or individually
  – Performance asymmetry: different demands on different cores
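
The two axes of the taxonomy can be sketched in a few lines of Python. This is a minimal illustration, not the controller from any of the papers discussed: the `Core` fields, the proportional gain, the minimum frequency scale and the "use the hottest core's temperature" rule for global control are all assumptions made for the example. Only the 30 ms stall comes from the slides.

```python
from dataclasses import dataclass

STALL_MS = 30.0  # stop-go stall length quoted in the slides


@dataclass
class Core:
    temp_c: float             # latest sensor reading (hypothetical field)
    freq_scale: float = 1.0   # fraction of maximum frequency
    stalled_ms: float = 0.0   # remaining stall time


def stop_go_step(core, temp_c, threshold_c):
    """Stop-Go: stall the core for STALL_MS when temp_c exceeds the threshold."""
    if core.stalled_ms <= 0.0 and temp_c > threshold_c:
        core.stalled_ms = STALL_MS


def dvfs_step(core, temp_c, threshold_c, gain=0.05, min_scale=0.2):
    """DVFS: simple proportional control (an assumed simplification).

    Lowers the frequency scale when hot, raises it when cool,
    clamped to the range [min_scale, 1.0].
    """
    error = threshold_c - temp_c
    core.freq_scale = min(1.0, max(min_scale, core.freq_scale + gain * error))


def apply_distributed(cores, step, threshold_c):
    """Distributed: each core reacts to its own temperature."""
    for core in cores:
        step(core, core.temp_c, threshold_c)


def apply_global(cores, step, threshold_c):
    """Global: every core reacts to the hottest core's temperature."""
    hottest = max(core.temp_c for core in cores)
    for core in cores:
        step(core, hottest, threshold_c)
```

With one hot core and one cool core, the distributed variants penalize only the hot core, while the global variants slow or stall both. That is the performance-asymmetry argument in miniature.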
Taxonomy (cont.)
• Migration
  – Moving threads between cores
  – Timescale on the order of a millisecond, much slower than DVFS
  – Migration is an “outer loop” of control, riding on top of DVFS or Stop-Go
• Migrate the “critical” thread
  – Measure criticality with a heat sensor
  – Or with cache misses as a proxy
Aside : Criticality
• In a separate paper, Abhishek et al. define “critical” as the slowest thread
• If we know which thread is critical:
  – Task stealing from the critical thread
  – Guide DVFS to prefer the critical thread
• Explored proxies
• 13-32% performance boost in task stealing on 32-core machine
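
As a rough illustration of criticality-guided task stealing, the sketch below treats the thread with the most cache misses as the predicted critical (slowest) thread and lets an idle thread take one task from its queue. The function names, the per-thread queue layout and the choice to steal from the tail are assumptions made for the example, not details taken from the paper.

```python
from collections import deque


def predict_critical(cache_misses):
    """Return the thread id with the most cache misses (proxy for the slowest thread)."""
    return max(cache_misses, key=cache_misses.get)


def steal_task(queues, cache_misses, idle_thread):
    """Move one task from the predicted-critical thread to an idle thread.

    Returns the stolen task, or None if there is nothing to steal.
    """
    victim = predict_critical(cache_misses)
    if victim == idle_thread or not queues[victim]:
        return None
    task = queues[victim].pop()        # steal from the tail of the victim's queue
    queues[idle_thread].append(task)
    return task
```

For example, with `queues = {0: deque(["a", "b"]), 1: deque()}` and `cache_misses = {0: 900, 1: 100}`, calling `steal_task(queues, cache_misses, 1)` moves task `"b"` from thread 0 to thread 1.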
Donald and Martonosi : comparison of techniques
• Goal : maximize performance subject to temperature constraint
• Measure performance in BIPS and “duty cycle”, i.e. % useful time, scaled for DVFS frequency
• Run on SPEC benchmarks
• Simulated 4-core processor
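
One plausible reading of the frequency-scaled duty cycle is a time-weighted average of the frequency scale, counting stalled time as zero. The sketch below assumes that reading. The exact definition used by Donald and Martonosi may differ, and the `(duration_ms, freq_scale)` interval format is invented for the example.

```python
def duty_cycle(intervals):
    """Frequency-scaled duty cycle over a run.

    intervals: list of (duration_ms, freq_scale) pairs, where freq_scale is
    the fraction of maximum frequency (0.0 while the core is stalled).
    """
    total = sum(duration for duration, _ in intervals)
    if total <= 0:
        raise ValueError("total duration must be positive")
    useful = sum(duration * scale for duration, scale in intervals)
    return useful / total
```

Under this definition, a Stop-Go core that stalls for 30 ms out of every 100 ms and a DVFS core running continuously at 70% of maximum frequency have the same duty cycle, 0.7, so the two schemes can be compared on one scale.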
Stop-Go was terrible!
  – Why didn’t they try with lower frequency?
  – Was 30 milliseconds the right time to stop?
They subsequently focus solely on DVFS, even though the hardware is trickier
Summary & Conclusion
• DVFS far superior to Stop-Go
• Distributed control helps, esp. for Stop-Go
• Migration helps for Stop-Go
• Counter and Sensor-based migration comparable
DVFS
• Dynamic voltage and frequency scaling (per core).
• Dynamic voltage scaling is a power management technique in computer architecture, where the voltage used in a component is increased or decreased.
• Dynamic frequency scaling (also known as CPU throttling) is a technique in computer architecture where a processor is run at a less-than-maximum frequency in order to conserve power.
Challenge
• Multiple cores may need to be manipulated simultaneously to control both power and temperature for a CMP chip. This requires multi-input, multi-output (MIMO) control.
• Application software is typically designed for single-core processors. Power shifting is needed.
• Heterogeneous cores
• Workload of a CMP processor is unpredictable at design time and may vary significantly at runtime
• Limitations of DVFS
  – Coarse grained
    • Initiated by the OS on a millisecond timescale
    • Voltage transition delay ~ 10 microseconds
    • Too slow to respond to fine variations in program behavior (cache miss ~ nanoseconds)
  – Per-core DVFS with multiple VF settings
    • High cost of off-chip regulators
    • Bad scalability with a large number of cores
Motivation
• Idea of Thread Motion
  – Moving threads between cores with two VF domains
  – Threads experience virtually continuous voltage
Thread Motion
• TM Manager
  – A separate embedded microcontroller running the TM algorithm
• Effective IPC
  – Maintain a table of IPC for each application
  – High IPC: compute-intensive
  – Low IPC: cache misses, memory access latency
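
A per-application IPC table can be sketched as a small class that is refreshed from instruction and cycle counts. The class name, the counter inputs and the threshold of 1.0 used to label an application compute-intensive are assumptions for illustration. The slides do not say how the TM manager stores or thresholds IPC.

```python
class IpcTable:
    """Tracks the most recent IPC (instructions per cycle) per application."""

    def __init__(self):
        self._ipc = {}

    def update(self, app, instructions, cycles):
        """Record IPC for the last sampling window of `app`."""
        if cycles <= 0:
            raise ValueError("cycles must be positive")
        self._ipc[app] = instructions / cycles

    def ipc(self, app):
        return self._ipc[app]

    def is_compute_intensive(self, app, threshold=1.0):
        """High IPC suggests a compute-bound phase; low IPC suggests memory stalls."""
        return self._ipc[app] >= threshold
```

An application that retires 400 instructions in 200 cycles gets an IPC of 2.0 and is labeled compute-intensive under this threshold. One that retires 50 instructions in the same window gets 0.25 and is treated as memory-bound.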
Thread Motion (cont.)
• Movement Policy
  – Assign a thread in a compute-intensive phase to a high VF core
  – Intra-cluster movement considered first
• Trigger point:
  – TM-interval: fixed intervals ~ 200 cycles
  – Miss-driven: move a thread when it misses in the cache
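
The movement policy can be illustrated with a small planner that would run at each trigger point. It looks for a thread on a low VF core whose IPC is higher than that of a thread on a high VF core, and it prefers a swap within the same cluster. This is a sketch under assumptions, not the published Thread Motion algorithm: `CoreSlot`, `plan_swap` and the "largest IPC gain" tie-break are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreSlot:
    core_id: int
    cluster: int
    high_vf: bool   # True if the core sits in the high voltage/frequency domain
    thread: str     # name of the thread currently running on the core


def plan_swap(cores, ipc):
    """Return (low_vf_core_id, high_vf_core_id) to swap, or None.

    Candidates are ranked first by whether both cores share a cluster
    (intra-cluster moves are considered first), then by IPC gain.
    """
    best = None
    for low in cores:
        if low.high_vf:
            continue
        for high in cores:
            if not high.high_vf:
                continue
            gain = ipc[low.thread] - ipc[high.thread]
            if gain <= 0:
                continue  # the high VF core already holds the higher-IPC thread
            key = (low.cluster == high.cluster, gain)
            if best is None or key > best[0]:
                best = (key, low.core_id, high.core_id)
    return None if best is None else (best[1], best[2])
```

In this sketch a same-cluster swap wins even when a cross-cluster swap would give a larger IPC gain, which mirrors the "intra-cluster movement considered first" rule.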
Thread Motion: Algorithm