Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors



Frank Casilio, Computer Engineering

May 15, 1997

Multithreaded Processors


Problems With Multiprocessors

• Memory Latency

• Context Switching Time

• Communication/Synchronization Latency

• Cache Coherence
  • Writes To Memory

• Poor Programming Model


Motivation

• Reduce/Tolerate Memory Latency

• General Purpose Machine

• Scalability

• Shared Memory

• Simpler Programming Model


Typical Ways To Reduce Latency

• On-Chip Cache

• Shortens Round Trip To Memory

• Fast Buses & Networks

• Hardware Synchronization

• Prefetching


Multi-Threading: The Concept

• Support For Multiple Concurrent Hardware Contexts

• Tolerates Latency Instead of Reducing It

• Swap Contexts During Latencies

• Experimental Systems Have Existed Since The 1950s

• Only 2 Commercial Systems Ever Produced
  • HEP
  • Tera MTA


Parameters That Affect Efficiency

• Number Of Contexts Supported

• Switching Overhead

• Run Length (Granularity)

• Average Latency To Be Hidden
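
The four parameters above are often combined in a simple deterministic model. The sketch below is one such model, not something taken from the slides: it assumes every context runs for a fixed run length, then waits for a fixed latency, and that each switch costs a fixed number of cycles. The function name `utilization` and its parameter names are illustrative.

```python
def utilization(n_contexts, run_length, switch_cost, latency):
    """Fraction of cycles doing useful work under a simple deterministic model.

    Assumptions (not from the slides): fixed run length, fixed latency,
    fixed switch cost, and contexts serviced in round-robin order.
    """
    if n_contexts < 1 or run_length <= 0:
        raise ValueError("need at least one context and a positive run length")
    # Linear region: too few contexts, so the processor still idles.
    linear = n_contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden, only switch cost is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(utilization(n, run_length=10, switch_cost=0, latency=90), 3))
```

In this model, adding contexts helps only until the latency is fully covered; beyond that point the switching overhead alone sets the ceiling.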


Switching Theory

• Determines How Often Contexts Switch

• Two Different Types
  • Fine Grained
  • Coarse Grained

• Directly Related to Cost


Fine Grained Switching

• Switches Contexts Every Cycle

• Many Long-Latency Operations Tolerated

• Requires More Contexts
  • Workload Requirements

• Can Simplify Overall Processor Complexity
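
The every-cycle switching described above can be sketched as a small simulation. It is a toy model under stated assumptions: one instruction issues per cycle, contexts are tried in round-robin order, and a context that has issued cannot issue again until a fixed `latency` has passed. The function name and parameters are hypothetical.

```python
def fine_grained_utilization(n_contexts, latency, cycles=1000):
    """Simulate a processor that picks a different ready context every cycle.

    Toy model (an assumption, not a description of any real machine):
    after issuing, a context is blocked for `latency` cycles.
    """
    ready_at = [0] * n_contexts  # first cycle at which each context may issue
    issued = 0
    next_ctx = 0
    for cycle in range(cycles):
        # Try contexts in round-robin order, starting after the last issuer.
        for offset in range(n_contexts):
            ctx = (next_ctx + offset) % n_contexts
            if ready_at[ctx] <= cycle:
                ready_at[ctx] = cycle + latency
                issued += 1
                next_ctx = (ctx + 1) % n_contexts
                break
        # If no context was ready, the cycle is wasted.
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, fine_grained_utilization(n, latency=8))
```

With at least as many contexts as cycles of latency, the pipeline never idles in this model, which is why fine-grained designs need a large number of contexts and a workload that can fill them.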


Coarse Grained Switching

• Switches Contexts After A Couple Of Cycles
  • Has Problems With Sporadic Latencies

• Requires Fewer Contexts

• Requires More Complex Processors
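
A rough way to see the trade-off is to simulate a processor that keeps running one context until it hits a long-latency operation and then pays a fixed cost to switch. This is a sketch under assumptions the slides do not state: a fixed run length between long-latency operations, a fixed latency, and a switch cost that is paid after every run. All names are illustrative.

```python
def coarse_grained_utilization(n_contexts, run_length, latency,
                               switch_cost, cycles=1200):
    """Simulate switching contexts only when a long-latency operation starts.

    Toy model: each context runs `run_length` cycles, then waits `latency`
    cycles; every switch costs `switch_cost` cycles of lost work.
    """
    ready_at = [0] * n_contexts  # cycle at which each context can run again
    useful = 0
    cycle = 0
    while cycle < cycles:
        # Pick the context that has been ready the longest.
        ctx = min(range(n_contexts), key=lambda c: ready_at[c])
        if ready_at[ctx] > cycle:
            cycle = ready_at[ctx]  # nothing is ready: idle until one returns
            continue
        useful += min(run_length, cycles - cycle)
        cycle += run_length
        ready_at[ctx] = cycle + latency  # the long-latency operation starts
        cycle += switch_cost             # overhead of switching contexts
    return useful / cycles


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(coarse_grained_utilization(n, 10, 90, 2), 3))
```

Fewer contexts are needed than in the every-cycle scheme because each one covers a longer stretch of time, but the switch cost is paid on every run and caps the achievable utilization.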


The TERA MTA

• First Commercial Multithreaded Machine Since 1978

• Uniform Shared Memory

• Scalable

• Direct Relationship Between PEs And Throughput

• Fine Grained Architecture


The Tera MTA Cont’d

• Toroidal Interconnection

• 12 Million Dollar Base System

• 16-256 Processor Versions


Processor Characteristics

• Support For 128 Threads

• 16 Protection Domains

• 333 MHz Nominal Speed

• 0 Context Switching Overhead!!!

• 1 GFLOP Peak Performance
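
The figures above can be related with a little arithmetic. The sketch assumes, since the slide does not say so, that the peak rate corresponds to three floating-point operations per clock cycle, and that with one instruction issued per cycle in round-robin order a thread waits for all the other threads between its own instructions. The names below are illustrative.

```python
CLOCK_HZ = 333e6        # nominal clock rate from the slide
THREADS = 128           # hardware threads per processor, from the slide
FLOPS_PER_CYCLE = 3     # assumption: three floating-point results per cycle


def peak_gflops(clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak floating-point rate in GFLOPS under the stated assumption."""
    return clock_hz * flops_per_cycle / 1e9


def latency_window_ns(threads=THREADS, clock_hz=CLOCK_HZ):
    """Time one thread waits if every other thread issues once in between."""
    return (threads - 1) / clock_hz * 1e9


if __name__ == "__main__":
    print(f"peak: {peak_gflops():.3f} GFLOPS")
    print(f"latency hidden per thread: {latency_window_ns():.1f} ns")
```

Under these assumptions the clock rate and the quoted 1 GFLOP peak agree to within rounding.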


Processor Characteristics Cont’d

• Load-Store Architecture
  • 3 Addressing Modes

• 31 64-bit GPRs

• 3 Operations Per Instruction
  • 1 Memory Reference
  • 1 Arithmetic Operation
  • 1 Control (e.g., Branch)

• 6 kW Of Power Dissipation Per Processor


Interconnection Network

• 3-D Torus Contains p^(3/2) Nodes

• Packet Switching

• 3 Cycles of Latency Per Node

• Messages Are Assigned Random Priorities

• 164-Bit Packets
  • 64 Bits Are Data

• 2.67 GB/s Bandwidth In Each Direction

• 2 HIPPI Channels / Processor For Net Connection
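
The per-node latency and packet format above can be turned into a small estimate. The sketch assumes a cubic torus with `side` nodes along each dimension, shortest-path routing, and one node traversal per hop; the slides give only the 3-cycle figure and the packet sizes, so the routing model and all names are assumptions.

```python
CYCLES_PER_NODE = 3   # from the slide
PACKET_BITS = 164     # from the slide
DATA_BITS = 64        # from the slide


def torus_hops(a, b, side):
    """Minimum hop count between nodes a and b, given as (x, y, z) tuples,
    on a side x side x side torus with wraparound links."""
    return sum(min((x - y) % side, (y - x) % side) for x, y in zip(a, b))


def network_latency_cycles(a, b, side):
    """One-way latency in cycles, assuming shortest-path routing and
    CYCLES_PER_NODE cycles per hop (a simplification)."""
    return CYCLES_PER_NODE * torus_hops(a, b, side)


def payload_fraction():
    """Share of each packet that carries data rather than overhead."""
    return DATA_BITS / PACKET_BITS


if __name__ == "__main__":
    print(network_latency_cycles((0, 0, 0), (8, 8, 8), side=16))
    print(round(payload_fraction(), 3))
```

The wraparound links matter: two nodes at opposite edges of a dimension are one hop apart, which halves the worst-case distance compared with a plain mesh.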


Memory

• 8, 16, 32 and 64 Bit Addressable

• 4 Bits per Word Of Access State For Synchronization

• Memory Units Equipped With Error Correcting Code

• Memory Usage Is Random Across All Banks

• Either 2p or 4p Units, Interleaved 64 Ways

• 16 MB DRAM Chips
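
Per-word access state allows synchronization without separate lock variables. The slide does not say what the four bits are; the sketch below assumes one of them is a full/empty bit and models only that bit. The class and method names are hypothetical, and a real machine would make the requesting thread retry or wait rather than return a failure flag.

```python
class SyncWord:
    """A memory word with a full/empty bit (an assumed access-state bit)."""

    def __init__(self):
        self.value = 0
        self.full = False  # the word starts out empty

    def write_when_empty(self, value):
        """Store only if the word is empty, then mark it full.

        Returns True on success, False if the writer would have to wait.
        """
        if self.full:
            return False
        self.value = value
        self.full = True
        return True

    def read_when_full(self):
        """Load only if the word is full, then mark it empty.

        Returns (True, value) on success, (False, None) if the reader
        would have to wait.
        """
        if not self.full:
            return (False, None)
        self.full = False
        return (True, self.value)


if __name__ == "__main__":
    word = SyncWord()
    print(word.read_when_full())       # consumer arrives first and must wait
    print(word.write_when_empty(42))   # producer fills the word
    print(word.read_when_full())       # consumer now gets the value
```

Used this way, a single word acts as a one-element producer/consumer buffer.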


Input / Output

• Maximum Strategy Gen5 XL RAID

• Sustained Bandwidth of 130 MB/s

• At Least p/16 Disk Arrays Are Required

• System Capacity of 300p GB

• 20p MB/s In Each Direction
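
The I/O figures above scale with the processor count p. A minimal sketch that evaluates them for a given configuration follows; the function and key names are illustrative, and rounding the disk-array count up to a whole number is an assumption.

```python
import math


def io_configuration(p):
    """Evaluate the slide's I/O scaling rules for a p-processor system."""
    if p < 1:
        raise ValueError("p must be a positive processor count")
    return {
        # "At least p/16 disk arrays", rounded up to a whole array.
        "min_disk_arrays": math.ceil(p / 16),
        # "System capacity of 300p GB".
        "capacity_gb": 300 * p,
        # "20p MB/s in each direction".
        "bandwidth_mb_s_each_way": 20 * p,
    }


if __name__ == "__main__":
    for p in (16, 64, 256):
        print(p, io_configuration(p))
```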


Operating System

• Distributed Parallel Version Of Unix
  • Highly Concurrent Version Of Berkeley Unix

• Allows Systems To Run p Tasks Truly In Parallel

• Streams Are Dynamically Created Without OS Intervention

• Processes Are Broken Up Into Tasks By OS

• Two-Tier Scheduler Provides Better Resource Allocation
  • PL Scheduler
  • PB Scheduler


Software / Languages

• Implicit And Explicit Parallelism Are Allowed

• Automatic Parallelization Of C, C++ & Fortran By The Compiler

• High Degree of Cray Compatibility

• Easy To Program Because Of Architecture


System Performance

• 3.84-12.8 Times Performance Of Cray T90/32

• 1K x 1K Matrix Multiply in 50 ms

• Integer Sort of 100M Keys in 36 ms
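
The timings above can be converted into sustained rates. The sketch assumes that "1K" means 1024 and that a dense n x n matrix multiply costs about 2n^3 floating-point operations; the slide states neither, nor the system size used for the measurements.

```python
def matmul_gflops(n, seconds):
    """Sustained GFLOPS for an n x n matrix multiply, assuming the
    conventional count of 2 * n**3 floating-point operations."""
    return 2 * n ** 3 / seconds / 1e9


def sort_rate_keys_per_s(keys, seconds):
    """Keys sorted per second."""
    return keys / seconds


if __name__ == "__main__":
    print(f"matrix multiply: {matmul_gflops(1024, 50e-3):.1f} GFLOPS")
    print(f"integer sort: {sort_rate_keys_per_s(100e6, 36e-3):.3g} keys/s")
```

Under these assumptions the matrix multiply runs at roughly 43 GFLOPS, which at a 1 GFLOP per-processor peak implies a system of more than 40 processors.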


Conclusion

• Proven Effectiveness

• Logical Step For Multiprocessor Computers

• Still Very Pricey

• Allow General Purpose Workload

• Scalable

• Shared Memory


Questions?


Instruction Pipeline


Breakdown Of A Task

[Figure: tree with one Task at the top, four Teams below it, and eight VPs at the bottom]


Deciding The Number Of Contexts
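
The content of this slide did not survive in the transcript. As a hedged illustration of the question the title poses, the sketch below uses a common rule of thumb that is not taken from the slides: while one context waits out a latency, the others must supply enough run-plus-switch intervals to cover it. The function and parameter names are hypothetical.

```python
import math


def contexts_to_hide_latency(run_length, switch_cost, latency):
    """Smallest number of contexts that fully covers `latency` cycles.

    Rule of thumb (an assumption): one running context plus enough others
    that their combined run and switch time spans the latency.
    """
    if run_length <= 0 or switch_cost < 0 or latency < 0:
        raise ValueError("run_length must be positive; costs non-negative")
    return 1 + math.ceil(latency / (run_length + switch_cost))


if __name__ == "__main__":
    # Switching every cycle with no overhead: one context per cycle of latency.
    print(contexts_to_hide_latency(run_length=1, switch_cost=0, latency=127))
    # Longer runs between switches need far fewer contexts.
    print(contexts_to_hide_latency(run_length=10, switch_cost=2, latency=36))
```

Supporting more contexts than this adds hardware cost without raising utilization in this model.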