Multiprocessing
TRANSCRIPT
Going Multi-core Helps Energy Efficiency
• Power of a typical integrated circuit ≈ C × V² × f
– C = capacitance, i.e., how well it "stores" a charge
– V = voltage
– f = frequency, i.e., how fast the clock runs (e.g., 3 GHz)
William Holt, HOT Chips 2005
Adapted from UC Berkeley "The Beauty and Joy of Computing"
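The power relation above can be sketched numerically. This is a toy illustration, assuming made-up capacitance, voltage, and frequency values rather than real chip parameters; it only shows why two slower, lower-voltage cores can draw less total power than one fast core.

```python
# Dynamic power of a CMOS circuit: P ≈ C * V^2 * f
# (all numbers below are made-up illustration values, not real chip figures)
def dynamic_power(c_farads, v_volts, f_hz):
    """Return dynamic power in watts from capacitance, voltage, and frequency."""
    return c_farads * (v_volts ** 2) * f_hz

# One fast core vs. two cores at half the frequency and a reduced voltage:
one_core = dynamic_power(1e-9, 1.2, 3e9)         # one core at 3 GHz, 1.2 V
two_cores = 2 * dynamic_power(1e-9, 0.9, 1.5e9)  # two cores at 1.5 GHz, 0.9 V
print(one_core, two_cores)
```

Because voltage enters squared, dropping V along with f pays off twice: here the two-core configuration delivers the same aggregate clock cycles at roughly half the power.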
Processor Parallelism
• Processor Parallelism : the ability to run multiple instruction streams simultaneously
Flynn's Taxonomy
• Categorization of architectures based on
– Number of simultaneous instructions
– Number of simultaneous data items
SISD
• SISD : Single Instruction – Single Data
– One instruction sent to one processing unit to work on one piece of data
– May be pipelined or superscalar
Modern SIMD
• x86 Processors
– SSE Units : Streaming SIMD Extensions
– Operate on special 128-bit registers
• 4 × 32-bit chunks
• 2 × 64-bit chunks
• 16 × 8-bit chunks
• …
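The lane layout above can be modeled in plain Python. This is a conceptual sketch of my own, not real SSE code: it treats a 16-byte string as a 128-bit register split into 4 × 32-bit lanes and applies one logical "instruction" to every lane at once.

```python
# Conceptual SIMD sketch: one operation applied to all lanes of a
# 128-bit "register", modeled here as 4 x 32-bit unsigned integers.
import struct

def simd_add_4x32(reg_a: bytes, reg_b: bytes) -> bytes:
    """Lane-wise add of two 16-byte 'registers' (4 x 32-bit lanes, wrap-around)."""
    a = struct.unpack("<4I", reg_a)
    b = struct.unpack("<4I", reg_b)
    # One logical instruction, applied to all four lanes at once
    return struct.pack("<4I", *(((x + y) & 0xFFFFFFFF) for x, y in zip(a, b)))

a = struct.pack("<4I", 1, 2, 3, 4)
b = struct.pack("<4I", 10, 20, 30, 40)
print(struct.unpack("<4I", simd_add_4x32(a, b)))  # (11, 22, 33, 44)
```

Real SSE hardware does the same thing in a single clocked instruction (e.g., `PADDD`), which is where the speedup comes from.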
Modern SIMD
• Graphics Cards – http://www.nvidia.com/object/fermi-architecture.html
• Becoming less and less "S"
Co-Processors
• Graphics Processing : specialized for floating point
– i7 ~ 100 gigaflops
– Kepler GPU ~ 1300 gigaflops
CUDA
• Compute Unified Device Architecture
– Programming model for general-purpose work on GPU hardware
– Streaming Multiprocessors, each with 16-48 CUDA cores
CUDA
• Designed for 1000's of threads
– Broken into "warps" of 32 threads
– Entire warp runs on an SM in lock step
– Branch divergence cuts speed
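Why branch divergence costs speed can be shown with a toy cost model. The model below is my own simplification, not NVIDIA's scheduler: because a warp runs in lock step, each distinct branch path taken by any thread costs one full pass over the warp, so a warp whose threads split across a branch runs both sides serially.

```python
# Toy model of lock-step warp execution (a simplification, not real hardware):
# each distinct branch outcome inside a warp costs one serialized pass.
WARP_SIZE = 32

def warp_passes(branch_taken):
    """Cost in lock-step passes for one branch: 1 if all 32 threads agree, 2 if they diverge."""
    assert len(branch_taken) == WARP_SIZE
    distinct_paths = set(branch_taken)
    return len(distinct_paths)  # every distinct path is executed by the whole warp

uniform  = warp_passes([True] * WARP_SIZE)                      # all threads agree
diverged = warp_passes([i % 2 == 0 for i in range(WARP_SIZE)])  # threads split
print(uniform, diverged)  # 1 2
```

Here the divergent warp takes twice as long even though each thread only executes one side of the branch, which is why CUDA kernels try to keep branches uniform within a warp.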
MISD
• MISD : Multiple Instruction – Single Data
– Different instructions run on the same data
– Rare
– Space shuttle : five processors handle fly-by-wire input and vote
MIMD
• MIMD : Multiple Instruction – Multiple Data
– Different instructions, working on different data in different processing units
– Most common parallel architecture
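MIMD maps naturally onto ordinary multi-core machines. A minimal sketch, assuming Python's standard `multiprocessing` module: each worker process runs a different function (instruction stream) on different data, which is exactly the MIMD pattern.

```python
# MIMD sketch with Python's multiprocessing: different instruction streams
# (functions) applied to different data on separate worker processes.
from multiprocessing import Pool

def square(x):   # one instruction stream
    return x * x

def negate(x):   # a different instruction stream
    return -x

def run_mimd(tasks):
    """Run each (function, argument) pair on its own worker process."""
    with Pool(processes=len(tasks)) as pool:
        pending = [pool.apply_async(f, (arg,)) for f, arg in tasks]
        return [p.get() for p in pending]

if __name__ == "__main__":
    print(run_mimd([(square, 7), (negate, 3)]))  # [49, -3]
```

Contrast with the SIMD example earlier: there one instruction touched many data lanes; here each processor follows its own control flow independently.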
Other Coprocessors
• CPUs used to have floating-point coprocessors
– Intel 80386 & 80387
• Audio cards
• PhysX
• Crypto – SSL encryption for servers
Multiprocessing
• Multiprocessing : many processors, shared memory
– May have local cache/special memory
Heterogeneous Multicore
• Different cores for different jobs
– Specialized media processing in mobile devices
• Examples
– Tegra
– PS3 Cell
UMA
• Uniform Memory Access
– Every processor sees all memory using the same addresses
– Same access time for any CPU to any memory word
NUMA
• Non-Uniform Memory Access
– Single memory address space visible to all CPUs
– Some memory local
• Fast
– Some memory remote
• Accessed in the same way, but slower
Connections
• Crossbar switched
– Segmented memory
– Any processor can directly link to any memory
– N² switches
BlueGene
• Major supercomputer player
• http://s.top500.org/static/lists/2012/11/TOP500_201211_Poster.png
BG/P Compute Cards
• 4 processors per card
• Fully coherent caches
• Connected in a double torus to neighbors
Titan
• The king : descendant of Red Storm
– http://www.olcf.ornl.gov/titan/
Grid Computing
• Grid Computing
– Multi-computing at Internet scale
– Resources owned by multiple parties
• http://folding.stanford.edu/
• SETI@home
• Applications can almost never be completely parallelized; some serial code remains
• Speedup always limited by serial part of program
Speedup Issues : Amdahl’s Law
[Figure: execution time vs. number of cores (1 to 5); the parallel portion shrinks as cores are added, while the serial portion stays constant]
• Amdahl’s law :
– s is the serial fraction of the program, P is the number of processors
– Speedup(P) = 1 / (s + (1 − s)/P)
– As P grows, speedup approaches 1/s
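The formula is easy to evaluate directly. A short sketch (the 10% serial fraction below is an arbitrary example value) showing how the serial part caps the speedup no matter how many cores are added:

```python
# Amdahl's law: speedup(P) = 1 / (s + (1 - s) / P),
# where s is the serial fraction and P the processor count.
def amdahl_speedup(s, p):
    """Overall speedup on p processors for a program with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / p)

# With 10% serial code, even unlimited cores cap the speedup at 1/s = 10x:
print(amdahl_speedup(0.10, 4))     # ~3.08 on 4 cores
print(amdahl_speedup(0.10, 1000))  # ~9.91 on 1000 cores, approaching 10
```

This is why the figure's serial portion matters so much: shrinking s helps more than adding cores once P is moderately large.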