edge architecture

Upload: engr-ayaz-khan

Post on 14-Apr-2018

  • 7/30/2019 EDGE Architecture

    1/35

    TRIPS: An EDGE Instruction Set Architecture

    Chirag Shah

    April 24, 2008

  • 7/30/2019 EDGE Architecture

    2/35

    What is an Instruction Set Architecture (ISA)?

    Attributes of a computer as seen by a machine-language programmer

    Native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O

    Native machine-language commands (opcodes)

    CISC (60s and 70s)

    RISC (80s, 90s, and early 00s)

  • 7/30/2019 EDGE Architecture

    3/35

    CISC vs RISC

    CISC (Complex Instruction Set Computer)            RISC (Reduced Instruction Set Computer)

    Emphasis on hardware                               Emphasis on software

    Multi-clock, complex instructions                  Single-clock, reduced instructions

    LOAD and STORE incorporated in instructions        LOAD and STORE are independent instructions

    Small code sizes, high cycles per second           Large code sizes, low cycles per second

    Transistors used for storing complex instructions  Spends more transistors on memory registers

  • 7/30/2019 EDGE Architecture

    4/35

    Generic Computer

    Data resides in main memory

    Execution unit carries out computations

    Can only operate on data loaded into registers

  • 7/30/2019 EDGE Architecture

    5/35

    Multiply Two Numbers

    One number, A, stored in memory location 2:3

    Other number, B, stored in memory location 5:2

    Store product in 2:3

  • 7/30/2019 EDGE Architecture

    6/35

    CISC Approach

    Complex instructions built into hardware (e.g., MULT)

    Entire task in one line of assembly

    MULT 2:3, 5:2

    High-level language: A = A * B

    Compiler translates high-level language into assembly

    Smaller program size and fewer calls to memory -> savings on cost of memory and storage

  • 7/30/2019 EDGE Architecture

    7/35

    RISC Approach

    Only simple instructions

    4 lines of assembly

    LOAD A, 2:3

    LOAD B, 5:2

    PROD A, B

    STORE 2:3, A

    Fewer transistors of hardware space

    All instructions execute in uniform time (one clock cycle), which allows pipelining
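
    The two approaches to the multiply task can be compared with a toy model. The sketch below is only illustrative: it assumes memory is a dictionary keyed by (row, column), so 2:3 becomes (2, 3), and the function names and the sample values 6 and 7 are made up for the example. It models what each instruction does, not how long it takes.

```python
# Toy model of the generic computer from the slides: memory cells addressed
# as row:column, plus a small register set. Names and values are illustrative.

def run_cisc(memory):
    """One complex instruction: MULT 2:3, 5:2 loads, multiplies, and stores."""
    memory[(2, 3)] = memory[(2, 3)] * memory[(5, 2)]
    return 1  # lines of assembly used


def run_risc(memory):
    """Four simple instructions, each doing exactly one thing."""
    regs = {}
    program = [
        ("LOAD", "A", (2, 3)),   # register A <- memory 2:3
        ("LOAD", "B", (5, 2)),   # register B <- memory 5:2
        ("PROD", "A", "B"),      # register A <- A * B
        ("STORE", (2, 3), "A"),  # memory 2:3 <- register A
    ]
    for op, dst, src in program:
        if op == "LOAD":
            regs[dst] = memory[src]
        elif op == "PROD":
            regs[dst] = regs[dst] * regs[src]
        elif op == "STORE":
            memory[dst] = regs[src]
    return len(program)  # lines of assembly used


mem_cisc = {(2, 3): 6, (5, 2): 7}
mem_risc = dict(mem_cisc)
cisc_lines = run_cisc(mem_cisc)
risc_lines = run_risc(mem_risc)
print(cisc_lines, mem_cisc[(2, 3)])
print(risc_lines, mem_risc[(2, 3)])
```

    Both versions leave the same product in 2:3; the difference is whether the load, multiply, and store steps are hidden inside one instruction or exposed as separate ones.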

  • 7/30/2019 EDGE Architecture

    8/35

    What is Pipelining?

    Before Pipelining

  • 7/30/2019 EDGE Architecture

    9/35

    After Pipelining
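
    The before/after comparison can be put in numbers with an idealized model. The sketch below assumes every stage takes exactly one cycle and that no instruction ever stalls; the function names and the 5-stage, 4-instruction example are chosen here for illustration and are not taken from the slides.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """The first instruction needs n_stages cycles; after that, one
    instruction completes per cycle (ideal case, no stalls)."""
    return n_stages + n_instructions - 1


before = cycles_unpipelined(4, 5)
after = cycles_pipelined(4, 5)
print(before, after)
```

    For long instruction streams the ideal speedup approaches the number of stages, which is why deeper pipelines were attractive as long as stalls stayed rare.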

  • 7/30/2019 EDGE Architecture

    10/35

    Why do we need a new ISA?

    20 years of RISC CPU performance gains came from deeper pipelines

    Pipelines suffer from data dependencies

    Worse for longer pipelines

    Pipeline scaling nearly exhausted

    Need to move beyond pipeline-centric ISAs

  • 7/30/2019 EDGE Architecture

    12/35

    Steve Keckler and Doug Burger

    Associate professors - University of Texas at Austin

    2000 - predicted beginning of the end for conventional microprocessor architectures

    Remarkable leaps in speed over last decade tailing off

    Higher performance -> greater complexity

    Designs consumed too much power and produced too much heat

    Industry at inflection point - old ways have stopped working

    Industry shifting to multicore to buy time, not a long-range solution

  • 7/30/2019 EDGE Architecture

    13/35

    EDGE Architecture

    EDGE (Explicit Data Graph Execution)

    Conventional architectures process one instruction at a time; EDGE processes blocks of instructions all at once and more efficiently

    Current multicore technologies increase speed by adding more processors

    Shifts burden to software programmers, who must rewrite their code

    EDGE technology - alternative approach when race to multicore runs out of steam

  • 7/30/2019 EDGE Architecture

    14/35

    EDGE Architecture (contd.)

    Provides richer interface between compiler and microarchitecture: directly expresses dataflow graph that compiler generates

    CISC and RISC require hardware to rediscover data dependences dynamically at runtime

    Therefore CISC and RISC require many power-hungry structures and EDGE does not

  • 7/30/2019 EDGE Architecture

    15/35

    TRIPS

    Tera-op Reliable Intelligently Adaptive Processing System

    First EDGE processor prototype

    Funded by the Defense Advanced Research Projects Agency - $15.4 million

    Goal of one trillion instructions per second by 2012

  • 7/30/2019 EDGE Architecture

    16/35

    Technology Characteristics for Future Architectures

    1. New concurrency mechanisms

    2. Power-efficient performance

    3. On-chip communication-dominated execution

    4. Polymorphism: use its execution and memory units in different ways to run diverse applications

  • 7/30/2019 EDGE Architecture

    17/35

    TRIPS Addresses Four Technology Characteristics

    1. Increased concurrency: array of concurrently executing arithmetic logic units (ALUs)

    2. Power-efficient performance: spreads out overheads of sequential, von Neumann semantics over 128-instruction blocks

    3. Compile-time instruction placement to mitigate communication delays

    4. Increased flexibility: dataflow execution model does not presuppose a given application computation pattern

  • 7/30/2019 EDGE Architecture

    18/35

    Two Key Features

    Block-atomic execution: Compiler sends executable code to hardware in blocks of 128 instructions. Processor sees and executes a block all at once, as if a single instruction; greatly decreases overhead associated with instruction handling and scheduling.

    Direct instruction communication: Hardware delivers a producer instruction's output directly as an input to a consumer instruction, rather than writing to the register file. Instructions execute in dataflow fashion; each instruction executes as soon as its inputs arrive.
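
    Direct instruction communication can be sketched as a small dataflow interpreter. The encoding below (a dictionary mapping each instruction name to its operation, operand count, and list of consumer slots) is a hypothetical format invented for this example, not the TRIPS instruction encoding. Each instruction fires as soon as all of its operands have arrived and forwards its result straight to its consumers, with no register file in between.

```python
import operator
from collections import deque

# Hypothetical block encoding: name -> (operation, operand count, targets),
# where each target is (consumer name, operand slot). Computes (1+2)*(3+4).
BLOCK = {
    "add1": (operator.add, 2, [("mul", 0)]),
    "add2": (operator.add, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, []),
}


def run_block(block, block_inputs):
    """Execute one block in dataflow order.

    block_inputs is a list of (consumer, slot, value) operands that are
    delivered when the block starts.
    """
    operands = {name: {} for name in block}
    results = {}
    firing_order = []
    ready = deque()

    def deliver(consumer, slot, value):
        # An operand arrives; the consumer becomes ready once it has them all.
        operands[consumer][slot] = value
        if len(operands[consumer]) == block[consumer][1]:
            ready.append(consumer)

    for consumer, slot, value in block_inputs:
        deliver(consumer, slot, value)

    while ready:
        name = ready.popleft()
        fn, arity, targets = block[name]
        value = fn(*(operands[name][i] for i in range(arity)))
        results[name] = value
        firing_order.append(name)
        for consumer, slot in targets:
            deliver(consumer, slot, value)  # producer -> consumer directly
    return results, firing_order


inputs = [("add1", 0, 1), ("add1", 1, 2), ("add2", 0, 3), ("add2", 1, 4)]
results, order = run_block(BLOCK, inputs)
print(results["mul"], order)
```

    Nothing in the block states an execution order; the order falls out of operand arrival, which is the property the slide describes. This sequential interpreter fires one instruction at a time, whereas hardware could fire every ready instruction in parallel.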

  • 7/30/2019 EDGE Architecture

    20/35

    Code Example: Vector Addition

    Add and accumulate for fixed-size vectors

    Initial control flow graph

  • 7/30/2019 EDGE Architecture

    21/35

    Loop is unrolled

    Reduces the overhead per loop iteration

    Reduces the number of conditional branches that must be executed
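
    The effect of unrolling on branch count can be shown with a short sketch. The slides' source code is not reproduced in this transcript, so the example assumes that "add and accumulate" means summing a[i] + b[i] over both vectors, and it picks an unroll factor of 4; both are assumptions. The branch counter is an illustrative stand-in for the loop-back test a compiled loop would execute.

```python
def add_accumulate(a, b):
    """Rolled loop: one loop-back branch per element."""
    total = 0
    branches = 0
    i = 0
    while i < len(a):
        total += a[i] + b[i]
        i += 1
        branches += 1
    return total, branches


def add_accumulate_unrolled4(a, b):
    """Unrolled by 4: one loop-back branch per four elements.

    Assumes the fixed vector length is a multiple of 4.
    """
    total = 0
    branches = 0
    i = 0
    while i < len(a):
        total += a[i] + b[i]
        total += a[i + 1] + b[i + 1]
        total += a[i + 2] + b[i + 2]
        total += a[i + 3] + b[i + 3]
        i += 4
        branches += 1
    return total, branches


a = list(range(8))
b = [10] * 8
rolled = add_accumulate(a, b)
unrolled = add_accumulate_unrolled4(a, b)
print(rolled, unrolled)
```

    Both versions compute the same sum, but the unrolled loop executes a quarter of the branches and gives the compiler a larger straight-line body to pack into a block.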

  • 7/30/2019 EDGE Architecture

    22/35

    Compiler produces TRIPS Intermediate Language (TIL) files

    Syntax of (name, target, sources)

  • 7/30/2019 EDGE Architecture

    23/35

    Block Dataflow Graph

  • 7/30/2019 EDGE Architecture

    24/35

    Scheduler analyzes each block dataflow graph

    Places instructions within the block

    Produces assembly language files
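
    The placement step can be illustrated with a simple greedy heuristic. This is not the TRIPS scheduler's actual algorithm, which the slides do not describe. The sketch assumes a 4x4 grid of ALUs, at most one instruction per ALU, a Manhattan-distance cost for operand routing, and a made-up (name, sources) tuple format listed in dependency order.

```python
from itertools import product

GRID = 4  # assumed 4x4 array of ALUs


def manhattan(p, q):
    """Hop count between two grid positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def place(instructions):
    """Greedy placement: put each instruction on the free ALU closest to
    the ALUs holding its producers.

    instructions is a list of (name, sources) in dependency order.
    Returns a dict mapping name -> (row, col).
    """
    free = list(product(range(GRID), range(GRID)))
    placement = {}
    for name, sources in instructions:
        producers = [placement[s] for s in sources]
        best = min(free, key=lambda slot: sum(manhattan(slot, p) for p in producers))
        placement[name] = best
        free.remove(best)
    return placement


# Hypothetical four-instruction block: two loads feed an add, which feeds
# an accumulate.
instructions = [
    ("ld_a", []),
    ("ld_b", []),
    ("add", ["ld_a", "ld_b"]),
    ("acc", ["add"]),
]
placement = place(instructions)
print(placement)
```

    Keeping consumers next to their producers is one way to realize the compile-time placement goal of mitigating communication delays; a real scheduler would also have to balance load across ALUs and account for register and memory access points.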

  • 7/30/2019 EDGE Architecture

    26/35

    Block-level execution, up to 8 blocks concurrently

  • 7/30/2019 EDGE Architecture

    27/35

    TRIPS prototype chip - 130-nm ASIC process; 500 MHz

    Two processing cores; each can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously

    Current high-performance processors - maximum execution rate of 4 operations per cycle

    2 MB L2 cache - 32 banks
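
    The prototype figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the slides (500 MHz, 16 operations per cycle, two cores, 128-instruction blocks, up to 8 concurrent blocks, 4 operations per cycle for conventional processors); the variable names are chosen here. The results are peak issue rates, not measured performance.

```python
CLOCK_HZ = 500_000_000          # 500 MHz prototype clock
OPS_PER_CYCLE = 16              # per core
CORES = 2
CONVENTIONAL_OPS_PER_CYCLE = 4  # figure quoted for current processors
BLOCK_SIZE = 128                # instructions per block
MAX_BLOCKS = 8                  # blocks executing concurrently

peak_ops_per_core = CLOCK_HZ * OPS_PER_CYCLE
peak_ops_chip = peak_ops_per_core * CORES
issue_width_ratio = OPS_PER_CYCLE // CONVENTIONAL_OPS_PER_CYCLE
instructions_in_flight = BLOCK_SIZE * MAX_BLOCKS

print(peak_ops_per_core)       # peak operations per second, one core
print(peak_ops_chip)           # peak operations per second, both cores
print(issue_width_ratio)       # issue width relative to a 4-wide processor
print(instructions_in_flight)  # matches the 1,024 quoted on the slide
```

    At 500 MHz the peak is 8 billion operations per second per core, so the one-trillion-operations goal is a target for later generations rather than a property of this prototype.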

  • 7/30/2019 EDGE Architecture

    32/35

    Execution node - fully functional ALU and 64 instruction buffers

    Dataflow techniques work well with the three kinds of concurrency found in software: instruction-level, thread-level, and data-level parallelism

  • 7/30/2019 EDGE Architecture

    35/35

    Architecture Generations Driven by Technology