edge architecture
7/30/2019 EDGE Architecture
TRIPS: An EDGE Instruction Set Architecture
Chirag Shah
April 24, 2008
-
What is an Instruction Set Architecture (ISA)?
Attributes of a computer as seen by a machine language programmer
Native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O
Native machine language commands (opcodes)
CISC (60s and 70s)
RISC (80s, 90s, and early 00s)
-
CISC vs RISC
CISC (Complex Instruction Set Computer) | RISC (Reduced Instruction Set Computer)
Emphasis on hardware | Emphasis on software
Multi-clock, complex instructions | Single-clock, reduced instructions
LOAD and STORE incorporated in instructions | LOAD and STORE are independent instructions
Small code sizes, high cycles per second | Large code sizes, low cycles per second
Transistors used for storing complex instructions | Spends more transistors on memory registers
-
Generic Computer
Data resides in main memory
Execution unit carries out computations
Can only operate on data loaded into registers
-
Multiply Two Numbers
One number A stored in 2:3
Other number B stored in 5:2
Store product in 2:3
-
CISC Approach
Complex instructions built into hardware (e.g., MULT)
Entire task in one line of assembly
MULT 2:3, 5:2
High-level language: A = A * B
Compiler translates high-level language into assembly
Smaller program size and fewer calls to memory -> savings on cost of memory and storage
-
RISC Approach
Only simple instructions: 4 lines of assembly
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
Fewer transistors of hardware space
All instructions execute in uniform time (one clock cycle), which enables pipelining
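The two approaches to the multiply example can be sketched as a toy machine. The instruction names (MULT, LOAD, PROD, STORE) and the locations 2:3 and 5:2 come from the slides; the dictionary-based memory, the register file, and the sample values 6 and 7 are assumptions made only for illustration.

```python
# Toy model: memory cells are keyed by the slides' "row:column" labels.
# The dict-based machine is a hypothetical stand-in, not a real ISA simulator.

def run_cisc(memory):
    """MULT 2:3, 5:2: one complex instruction loads, multiplies, and stores."""
    memory["2:3"] = memory["2:3"] * memory["5:2"]
    return 1  # number of instructions issued


def run_risc(memory):
    """LOAD, LOAD, PROD, STORE: four simple instructions working on registers."""
    regs = {}
    program = [
        ("LOAD", "A", "2:3"),
        ("LOAD", "B", "5:2"),
        ("PROD", "A", "B"),
        ("STORE", "2:3", "A"),
    ]
    for op, dst, src in program:
        if op == "LOAD":       # memory -> register
            regs[dst] = memory[src]
        elif op == "PROD":     # register * register -> register
            regs[dst] = regs[dst] * regs[src]
        elif op == "STORE":    # register -> memory
            memory[dst] = regs[src]
    return len(program)


cisc_mem = {"2:3": 6, "5:2": 7}
risc_mem = {"2:3": 6, "5:2": 7}
cisc_count = run_cisc(cisc_mem)
risc_count = run_risc(risc_mem)
print(cisc_mem["2:3"], cisc_count)
print(risc_mem["2:3"], risc_count)
```

Both versions leave the same product in 2:3; the difference is one complex instruction versus four uniform ones.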
-
What is Pipelining?
Before Pipelining
-
After Pipelining
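The timing difference between the before and after cases can be sketched numerically. This is a minimal model that assumes every stage takes one clock cycle and that there are no stalls; the five stage names are the classic textbook ones and are an assumption, not taken from the slides.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """The first instruction fills the pipeline; each later one completes one cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_chart(n_instructions, stages=("IF", "ID", "EX", "MEM", "WB")):
    """One row per instruction; instruction i enters the pipeline at cycle i."""
    return [["--"] * i + list(stages) for i in range(n_instructions)]


before = cycles_unpipelined(4, 5)
after = cycles_pipelined(4, 5)
chart = pipeline_chart(3)
for row in chart:
    print(" ".join(f"{cell:>3}" for cell in row))
print(before, after)
```

Under these assumptions, four instructions on a five-stage machine take 20 cycles without pipelining and 8 with it.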
-
Why do we need a new ISA?
For 20 years, RISC CPU performance gains came from deeper pipelines
Pipelines suffer from data dependencies
Worse for longer pipelines
Pipeline scaling nearly exhausted
Need to move beyond a pipeline-centric ISA
-
Steve Keckler and Doug Burger
Associate professors, University of Texas at Austin
2000: predicted beginning of the end for conventional microprocessor architectures
Remarkable leaps in speed over last decade tailing off
Higher performance -> greater complexity
Designs consumed too much power and produced too much heat
Industry at inflection point: old ways have stopped working
Industry shifting to multicore to buy time, not a long-range solution
-
EDGE Architecture
EDGE (Explicit Data Graph Execution)
Conventional architectures process one instruction at a time; EDGE processes blocks of instructions all at once and more efficiently
Current multicore technologies increase speed by adding more processors
Shifts burden to software programmers, who must rewrite their code
EDGE technology: alternative approach when race to multicore runs out of steam
-
EDGE Architecture (contd.)
Provides richer interface between compiler and microarchitecture: directly expresses dataflow graph that compiler generates
CISC and RISC require hardware to rediscover data dependences dynamically at runtime
Therefore CISC and RISC require many power-hungry structures and EDGE does not
-
TRIPS
Tera-op, Reliable, Intelligently adaptive Processing System
First EDGE processor prototype
Funded by the Defense Advanced Research Projects Agency: $15.4 million
Goal of one trillion instructions per second by 2012
-
Technology Characteristics
for Future Architectures
1. New concurrency mechanisms
2. Power-efficient performance
3. On-chip communication-dominated execution
4. Polymorphism: use its execution and memory units in different ways to run diverse applications
-
TRIPS Addresses Four Technology Characteristics
1. Increased concurrency: array of concurrently executing arithmetic logic units (ALUs)
2. Power-efficient performance: spreads out overheads of sequential, von Neumann semantics over 128-instruction blocks
3. Compile-time instruction placement to mitigate communication delays
4. Increased flexibility: dataflow execution model does not presuppose a given application computation pattern
-
Two Key Features
Block-atomic execution: Compiler sends executable code to hardware in blocks of up to 128 instructions. Processor sees and executes a block all at once, as if a single instruction; greatly decreases overhead associated with instruction handling and scheduling.
Direct instruction communication: Hardware delivers a producer instruction's output directly as an input to a consumer instruction, rather than writing to the register file. Instructions execute in dataflow fashion; each instruction executes as soon as its inputs arrive.
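The dataflow firing rule described above can be sketched in a few lines. This is a minimal model, not the TRIPS microarchitecture: the block layout (a dict of instructions with an operator, an operand count, and a list of targets), the function name run_block, and the sample expression (a + b) * (c - d) are all hypothetical.

```python
import operator
from collections import deque


def run_block(block, inputs):
    """Fire each instruction as soon as all of its operand slots are filled.

    block:  name -> {"op": callable, "arity": int, "targets": [(consumer, slot)]}
    inputs: (consumer, slot) -> value entering the block
    """
    operands = {name: {} for name in block}   # per-instruction operand buffers
    ready = deque()
    fired = []
    outputs = {}

    def deliver(target, slot, value):
        # Direct instruction communication: the value goes straight to the consumer.
        operands[target][slot] = value
        if len(operands[target]) == block[target]["arity"]:
            ready.append(target)

    for (target, slot), value in inputs.items():
        deliver(target, slot, value)

    while ready:
        name = ready.popleft()
        inst = block[name]
        args = [operands[name][i] for i in range(inst["arity"])]
        result = inst["op"](*args)
        fired.append(name)
        if not inst["targets"]:
            outputs[name] = result            # no consumer: treated as a block output
        for target, slot in inst["targets"]:
            deliver(target, slot, result)

    # Block-atomic view: the caller sees the outputs only once the whole block is done.
    return outputs, fired


# (a + b) * (c - d), with "mul" deliberately listed first to show that
# listing order does not decide execution order.
block = {
    "mul": {"op": operator.mul, "arity": 2, "targets": []},
    "add": {"op": operator.add, "arity": 2, "targets": [("mul", 0)]},
    "sub": {"op": operator.sub, "arity": 2, "targets": [("mul", 1)]},
}
inputs = {("add", 0): 2, ("add", 1): 3, ("sub", 0): 10, ("sub", 1): 4}
outputs, fired = run_block(block, inputs)
print(outputs, fired)
```

The multiply fires last because its operands arrive last, not because of where it sits in the block.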
-
Code Example: Vector Addition
Add and accumulate for fixed-size vectors
Initial control flow graph
-
Loop is unrolled
Reduces the overhead per loop iteration
Reduces the number of conditional branches that must be executed
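The effect of unrolling on branch count can be sketched as follows. The exact kernel on the slides is not recoverable from the transcript, so the add-and-accumulate loop below, the unroll factor of 4, and the explicit branch counter are assumptions for illustration.

```python
def accumulate_rolled(a, b):
    """One loop-exit test per element, plus the final failing test."""
    total, branches, i = 0, 0, 0
    while True:
        branches += 1                # conditional branch: loop-exit test
        if i >= len(a):
            break
        total += a[i] + b[i]
        i += 1
    return total, branches


def accumulate_unrolled_by_4(a, b):
    """Same result, but one loop-exit test per four elements."""
    if len(a) % 4 != 0:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    total, branches, i = 0, 0, 0
    while True:
        branches += 1                # conditional branch: loop-exit test
        if i >= len(a):
            break
        total += a[i] + b[i]
        total += a[i + 1] + b[i + 1]
        total += a[i + 2] + b[i + 2]
        total += a[i + 3] + b[i + 3]
        i += 4
    return total, branches


a = list(range(1, 9))
b = [10] * 8
rolled = accumulate_rolled(a, b)
unrolled = accumulate_unrolled_by_4(a, b)
print(rolled, unrolled)
```

For eight elements the rolled loop evaluates its exit test nine times and the unrolled loop three times, with the same sum. Larger straight-line loop bodies also give the compiler more instructions to pack into a single block.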
-
Compiler produces TRIPS Intermediate Language (TIL) files
Syntax of (name, target, sources)
-
Block Dataflow Graph
-
Scheduler analyzes each block dataflow graph
Places instructions within the block
Produces assembly language files
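The placement step can be illustrated with a deliberately simple greedy heuristic. This is not the TRIPS scheduler's algorithm: the 4x4 grid, the Manhattan-distance cost, the function name place_block, and the one-slot-per-node setting used in the example are all assumptions chosen to show the idea of placing consumers near their producers.

```python
from itertools import product


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def place_block(producers, grid=(4, 4), slots_per_node=64):
    """Greedy placement: put each instruction on the free node closest to its producers.

    producers: instruction -> list of instructions it consumes from,
               listed so that producers appear before their consumers.
    """
    free = {node: slots_per_node
            for node in product(range(grid[0]), range(grid[1]))}
    placement = {}
    for inst, srcs in producers.items():
        candidates = [node for node, slots in free.items() if slots > 0]
        if not candidates:
            raise ValueError("block does not fit on the grid")
        # Cost = total hop distance from every producer; ties go to the lowest coordinates.
        best = min(
            candidates,
            key=lambda node: (sum(manhattan(node, placement[s]) for s in srcs), node),
        )
        placement[inst] = best
        free[best] -= 1
    return placement


# (a + b) * (c - d); one slot per node forces the instructions to spread out.
graph = {"add": [], "sub": [], "mul": ["add", "sub"]}
placement = place_block(graph, slots_per_node=1)
mul_cost = sum(manhattan(placement["mul"], placement[s]) for s in graph["mul"])
print(placement, mul_cost)
```

A production scheduler would also have to balance load across ALUs so that independent instructions can issue in the same cycle; this sketch only minimizes communication distance.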
-
Block-level execution, up to 8 blocks concurrently
-
TRIPS prototype chip: 130-nm ASIC process; 500 MHz
Two processing cores; each can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously
Current high-performance processors: maximum execution rate of 4 operations per cycle
2 MB L2 cache in 32 banks
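The figures quoted on the slides fit together arithmetically, as the short check below shows. All inputs are numbers stated in the deck; the peak rate assumes every issue slot is filled on every cycle, which real workloads would not achieve.

```python
BLOCK_SIZE = 128                  # instructions per block
BLOCKS_IN_FLIGHT = 8              # blocks executing concurrently per core
CORES = 2
OPS_PER_CYCLE_PER_CORE = 16
CLOCK_HZ = 500_000_000            # 500 MHz
CONVENTIONAL_OPS_PER_CYCLE = 4
TERA_OP_GOAL = 1_000_000_000_000  # one trillion operations per second

# 8 blocks of 128 instructions account for the 1,024 instructions in flight.
instructions_in_flight = BLOCK_SIZE * BLOCKS_IN_FLIGHT

# Theoretical peak of the two-core prototype, assuming full issue every cycle.
peak_ops_per_second = CORES * OPS_PER_CYCLE_PER_CORE * CLOCK_HZ

# Issue width relative to the conventional processors cited on the slide.
issue_width_ratio = OPS_PER_CYCLE_PER_CORE // CONVENTIONAL_OPS_PER_CYCLE

# How far the prototype's peak is from the stated tera-op goal.
gap_to_goal = TERA_OP_GOAL / peak_ops_per_second

print(instructions_in_flight, peak_ops_per_second, issue_width_ratio, gap_to_goal)
```

Under these assumptions the prototype peaks at 16 billion operations per second, so the tera-op goal implies scaling well beyond the prototype's two cores and 500 MHz clock.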
-
Execution node: fully functional ALU and 64 instruction buffers
Dataflow techniques work well with the three kinds of concurrency found in software: instruction-level, thread-level, and data-level parallelism
-
Architecture Generations Driven by Technology