adding pep to multicore - mit computer science and...
TRANSCRIPT
Adding PEP to MulticoreEric Lau
Jason Miller, Inseok Choi, Omer Khan, Jim Holt, Anantha Chandrakasan, Donald Yeung, Saman Amarasinghe, Anant Agarwal
Feb 16, 2011
Outline• Motivation• PEP Concept• PEP Core Architecture• Graphite Simulation• Applications• Conclusion
3/16/2011 Angstrom Lunch Seminar 2
Motivation PEP Concept Architecture Graphite Applications Conclusion
Angstrom
3/16/2011 Angstrom Lunch Seminar 3
Decrease programming effort
Increase performance
Increase resiliency
Increase energy efficiency
Motivation PEP Concept Architecture Graphite Applications Conclusion
Positive Energy Partnerships• Non-application resources that perform tasks for a
net gain in energy.
• Hardware Partnerships• shared resources• hard-wired events
• Software Partnerships• expose to h/w ISA• algorithms• event servicing
3/16/2011 Angstrom Lunch Seminar 4
Motivation PEP Concept Architecture Graphite Applications Conclusion
Tile
Positive Energy Partnerships• PEP Cores
3/16/2011 Angstrom Lunch Seminar 5
Router
ComputeCore
Cache
PEP
Motivation PEP Concept Architecture Graphite Applications Conclusion
PEP Core Architecture
3/16/2011 Angstrom Lunch Seminar 6
Motivation PEP Concept Architecture Graphite Applications Conclusion
PEP Core Architecture• Area Considerations
• μController: 16-bit, RISC, in-order datapath, unified cache• RAW: 32-bit, RISC, 8-stage, in-order datapath, split cache• Core 2: 64-bit, x86, 14-stage, out-of-order datapath, split cache
3/16/2011 Angstrom Lunch Seminar 7
Core Node (nm) SRAM Size (mm2) Scaled Size (mm2)
μController 65 128K 1.5 0.03
RAW 180 128K 16 0.25
Intel Core 2 65 2M/4M 80 9.0
Motivation PEP Concept Architecture Graphite Applications Conclusion
PEP Core Architecture• Power Considerations
• μController: 16-bit, RISC, 1 MHz• TilePro64: 64-bit, VLIW, 700-866 MHz• Core 2: 64-bit, x86, 1-2.4GHz
3/16/2011 Angstrom Lunch Seminar 8
Core Energy/Cycle
μController 27.2pJ
TilePro64 286pJ
Intel Core 2 101 000pJ
Motivation PEP Concept Architecture Graphite Applications Conclusion
Graphite
3/16/2011 Angstrom Lunch Seminar 9
Target Architecture
Graphite
Host Threads
Application
Host Process
Host Process
Host Process
Target Core Target Core Target Core
Target Core Target Core Target Core
Target Core Target Core Target CoreApplication Threads
Motivation PEP Concept Architecture Graphite Applications Conclusion
Graphite
3/16/2011 Angstrom Lunch Seminar 10
Target Architecture
Graphite
Host Threads
Application Threads
Application
Host Process
Host Process
Host Process
C P C P C P
C PC PC P
C P C P C P
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Case Study
– Memory Prefetching
• Other Applications– Security– Reliability– Event Probing
3/16/2011 Angstrom Lunch Seminar 11
Motivation PEP Concept Architecture Graphite Applications Conclusion
Case Study: Memory Prefetching• Pre-Execution
3/16/2011 Angstrom Lunch Seminar 12
• PEP cores are free to run with minimal energy:– Low clock frequency– Low power hardware– Tight coupling
• Helper thread extracted from main thread.– Static compiler– Dynamic extraction
Motivation PEP Concept Architecture Graphite Applications Conclusion
Case Study: Helper Threads• EM3D benchmark (Execution Time)
3/16/2011 Angstrom Lunch Seminar 13
1.081.17 1.27
1.38
2.07
2.36
0.0
0.5
1.0
1.5
2.0
2.5
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1
Nor
mal
ized
Exec
utio
n Ti
me
PEP Frequency/Main Frequency
Motivation PEP Concept Architecture Graphite Applications Conclusion
Case Study: Helper Threads• EM3D benchmark (Energy Efficiency)
3/16/2011 Angstrom Lunch Seminar 14
1.07931.1623
1.2436 1.2592
1.5797
1.1944
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1
Nor
mal
ized
Ener
gy E
ffici
ency
PEP Frequency/Main Frequency
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Security
– PEP core does not run application code• Immunity to application level attacks!
– Dynamic Information Flow Tracking (DIFT)• Taint pointers using PEP core• Run DIFT instructions in PEP
3/16/2011 Angstrom Lunch Seminar 15
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Security
– Helper thread runs ahead only on DIFT instructions– Tight coupling allows access to register values
3/16/2011 Angstrom Lunch Seminar 16
PEP PipelineMain Pipeline
main helper
taint register
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Reliability
– With 1000 cores on a single chip, and ultra low voltage logic, transient faults are a terrible issue.
– Redundant Multithreading (RMT)• Create a leading thread and a trailing thread.• Perform exact same execution on both threads.• Commit results only if both threads yield same results.
3/16/2011 Angstrom Lunch Seminar 17
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Reliability
– Implementation• SMT: Low hardware overhead; incurs switching overhead.• CMP: Simple to implement; repeats mis-speculations.• PEP cores gain advantages of both!
3/16/2011 Angstrom Lunch Seminar 18
trailing
leading
Memory/RF/ $
Input Replication
Output Comparison
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Self-Aware Computing
– Requires a way to monitor itself:• SMT• Extra thread• Extra core
– PEP core possesses detached execution context.• Main core keeps running!
3/16/2011 Angstrom Lunch Seminar 19
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Event Probing
3/16/2011 Angstrom Lunch Seminar 20
Motivation PEP Concept Architecture Graphite Applications Conclusion
Potential Applications• Event Probing
3/16/2011 Angstrom Lunch Seminar 21
Main Pipeline
Cache PEP Core
On-chip Network Switch
Low
High
PEP Network Switch
Motivation PEP Concept Architecture Graphite Applications Conclusion
Conclusion• PEP cores allow for several optimizations with a
low energy/area cost (<10%).
• Next steps:– Evaluate more applications with more benchmarks.– Implement a comprehensive scheme for all these
applications.– Determine proper communication protocol between
PEP cores and main cores.
3/16/2011 Angstrom Lunch Seminar 22
HELPER
Case Study: Memory Prefetching
3/16/2011 Angstrom Lunch Seminar 23
CarbonSpawnHelperThread(helper);...
CarbonCondBroadcast(&resume_cond);cur_node = node_vec;
for (n = 0; n < N; ++n, ++cur_node){CarbonMutexLock(&counter_lock);shared_counter++;CarbonMutexunLock(&counter_lock);
cur_value = val(cur_node);values = cur_node->values;coeffs = cur_node->coeffs;
for (i = 0; i < stop; i++) cur_value -= values[i]*coeffs[i];
val(cur_node) = cur_value;}
MAIN
void * helper(){
while(1) {
CarbonCondWait(&resume_cond);
// Synchronization Code...local_counter = shared_counter++;...
values = cur_node->values;coeffs = cur_node->coeffs;
for (i = 0; i < stop; i++) {coeff_dummy = coeffs[i];value_dummy = values[i];
}}
}
HELPER
Case Study: Memory Prefetching
3/16/2011 Angstrom Lunch Seminar 24
CarbonSpawnHelperThread(helper);...
CarbonCondBroadcast(&resume_cond);cur_node = node_vec;
for (n = 0; n < N; ++n, ++cur_node){CarbonMutexLock(&counter_lock);shared_counter++;CarbonMutexunLock(&counter_lock);
cur_value = val(cur_node);values = cur_node->values;coeffs = cur_node->coeffs;
for (i = 0; i < stop; i++) cur_value -= values[i]*coeffs[i];
val(cur_node) = cur_value;}
MAIN
while (l_counter < n_nodes){while(1) {
CarbonMutexLock(&counter_lock);c_counter = shared_counter;CarbonMutexUnlock(&counter_lock);
offset = l_counter - c_counter;
if (offset >= MIN && offset < MAX)break;
else if ( offset <= MIN_OFFSET)l_counter = c_counter + OFFSET;break;
}
while(prev_l_counter < l_counter){cur_node++;prev_l_counter++;
}}
Backup Slides
3/16/2011 Angstrom Lunch Seminar 25
Motivation PEP Concept Architecture Graphite Applications Conclusion
Case Study: Memory Prefetching• Simulation Results on Graphite
– Graphite instantiates two cores per tile.– Each core runs a separate context (stack, registers, etc.)– PEP and main cores contain private L1 and shared L2.– Synchronization through finite barrier across all threads.
• EM3D benchmark (Olden Suite)
3/16/2011 Angstrom Lunch Seminar 26