the thunder sparc processor - lightner · 2003. 6. 18. · metaflow 13 thunder unusual features l...

21
METAFLOW 1 The Thunder SPARC Processor HOT Chips VI August 15-16, 1994 Bruce D. Lightner Vice President, Development Metaflow Technologies, Inc. 4250 Executive Square, Suite 300 La Jolla, California 92037 (619) 452-6608 Internet: [email protected]

Upload: others

Post on 01-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • METAFLOW1

    The Thunder SPARC Processor

    HOT Chips VI

    August 15-16, 1994

    Bruce D. LightnerVice President, Development

    Metaflow Technologies, Inc.4250 Executive Square, Suite 300

    La Jolla, California 92037(619) 452-6608

    Internet: [email protected]

  • METAFLOW2

    Metaflow Backgroundl Company founded in 1987l Pioneers in out-of-order/speculative execution

    microprocessor designl First major design contract was LSI Logic's

    “Lightning” SPARC chipset (completed 1991)l “Thunder” SPARC 3-chip set is improved

    version of “Lightning” 4-chip setl Project 100% funded by Hyundai Electronicsl Thunder chipset first silicon July 1993l Patented out-of-order architecture

  • METAFLOW3

    Thunder SPARC Mbus Module

    IntegerUnit(IU)

    XcacheRAM

    (1 Mbyte)

    FloatingPoint Unit

    (FPU)

    Cache Control/MMU/Bus Intf

    (CMB)17

    36

    64

    Mbus (multiprocessor)85

    FPU Instrs

    Bypass

    Address

    Data

    322.7M

    2.5M

    0.8M

  • METAFLOW4

    Thunder SPARC Chipset

    l Superscalar fetch, issue, and execution

    l Dynamic scheduling (dataflow)

    l Out-of-order execution

    l Speculative execution

    l Above “hidden” from the programmer

    l Factor of 2-3 performance advantage fromarchitecture

  • METAFLOW5

    Microprocessor HardwareTrends

    l Superscalar instruction issue

    l Super-pipelined execution units

    l Super-fast processor clocks (>100 MHz)

    l Out-of-order execution

    l Speculative execution

  • METAFLOW6

    Out-of-Order Execution

    PROGRAM ORDER COMPLETION ORDER

    float A, X, XX, X1int n...X1 = 1 / A;XX = A * A;n = n + 1;X = X + A;...

    .

    .

    .n = n + 1;X = X + A;XX = A * A;X1 = 1 / A;...

    OperationFP divideFP multiplyFP addInteger add

    Clocks10531

  • METAFLOW7

    Speculative Execution

    C EXAMPLE

    for ( i = 0; i < MAX; ++i ) { if (X[i] < 0) error_exit(); sum = sum + sqrt(X[i]);}

    DO 100 i = 1, max IF (X(i) .LT. 0) CALL error_exit100 SUM = SUM + SQRT(X(i))

    FORTRAN EXAMPLE

    Speculative executionopportunity

    Speculative executionopportunity

    Xii

    n

    =∑

    0

  • METAFLOW8

    Conventional PipelineInstruction

    Cache

    RegisterFile

    FETCH

    ISSUE

    EXECUTE

    RETIRE

  • METAFLOW9

    Metaflow Pipeline (DRIS)

    DRISINSTRUCTION n

    INSTRUCTION 2

    INSTRUCTION 1

    ...Deferred-scheduling, Register-renaming

    Instruction Shelf

    RETIRE

    RegisterFile

    ISSUE

    SCHEDULE

    UPDATE

    ...

    ...

    ...

    ...

    ...

    ...

    ...

    InstructionCache

    ...

    ...

    ...

  • METAFLOW10

    Metaflow DRIS Entry

    Locked RegNum ID

    Source Operand 1

    Locked RegNum ID

    Source Operand 2

    Destination

    Latest RegNum Instruction’s Results

    Yes/No

    Dispatched

    Class Number

    Functional Unit

    Yes/No

    Executed

    Content

    Program Counter

  • METAFLOW11

    Thunder IU

    DRISINSTRUCTION n

    INSTRUCTION 2

    INSTRUCTION 1

    ...•Out-of-order instruction results•Speculative instruction results•Automatic “register renaming”

    RETIRE

    RegisterFile

    ISSUE

    SCHEDULE

    UPDATE

    InstructionCache Branch

    Unit

    BranchShelf

    Predecoder

    MemoryShelf

    DataBus

    BypassBus

    AddressBus+ + * /Addr

    ConditionCodes

    StoreData

  • METAFLOW12

    Thunder ProcessorDistinguishing Features

    l 4 instructions issued per clock (3 integer, 2floating point instructions and one branch)

    l 8 parallel functional units (3 integer ALUs, 2FP ALUs, 1 branch unit, 2 memory/bypass)

    l Dataflow-based out-of-order instructionissue/execution (memory/ALU operations), in-order completion, precise traps and interrupts

    l Speculative execution beyond unresolvedconditional branches (multiple basic blockswith instant state repair on error)

    l Above mechanisms transparent to executingprogram (strict SPARC V8 compatibility)

  • METAFLOW13

    Thunder Unusual Featuresl Eager evaluation (“folds” conditional branches)l Multiple instances of memory locations

    (“memory renaming”)l Out-of-order memory references

    – Multiple simultaneous cache miss processing(“non-blocking cache”)

    – Split address and data memory transactions(memory response re-ordering allowed)

    l Speculative memory reads with respect toshared memory coherence restrictions(emulates “strong consistency” model)

    l Uninterrupted transitions between user andsupervisor modes (traps/interrupts)

  • METAFLOW14

    Thunder Unusual Features(continued)

    Floating Point Unit (FPU)l Variable latency FPU (early termination for

    divide/square root)l Non-blocking divide and square root

    (concurrent add/multiply)

    Branch Predictionl Dynamic branch predictionl Generalized loop count predictionl Return-from-subroutine (RET) “folding”l Jump-through-register prediction (JMPL cache)

  • METAFLOW15

    Thunder Status

    First SiliconFoundryCMOS technologyMetal layersSupply voltageClock rate (processor) (Mbus)Die Size (mm square)PackageMethodology (data path/RAM) (control logic) (test)External cache RAM (type) (size)Total transistors (3 chips)

    July/Nov 1993VLSI Technologies

    0.6/0.8 µm3

    5V50 MHz40 MHz14.5/17.4

    IPGA (391 pins)Full custom

    Standard cellJTAG/full scan

    “Viking”Eight 128K x 9

    ~6 million

    Thunder 1.0

    Q4 1994

    0.5 µm4

    3.6V80 MHz60 MHz9.1/11.8

    TBGA (600+ pins)Full custom

    Standard cellJTAG/full scan

    “Pentium”Eight 64K x 18

    ~6 million

    Thunder 1.5

  • METAFLOW16

    Thunder Status

    First SiliconFoundryCMOS technologyMetal layersSupply voltageClock rate (processor) (Mbus)Die Size (mm square)PackageMethodology (data path/RAM) (control logic) (test)External cache RAM (type) (size)Total transistors (3 chips)

    July/Nov 1993VLSI Technologies

    0.6/0.8 µm3

    5V50 MHz40 MHz14.5/17.4

    IPGA (391 pins)Full custom

    Standard cellJTAG/full scan

    “Viking”Eight 128K x 9

    ~6 million

    Thunder 1.0

    Q4 1994IBM

    0.5 µm4

    3.6V80 MHz60 MHz9.1/11.8

    TBGA (600+ pins)Full custom

    Standard cellJTAG/full scan

    “Pentium”Eight 64K x 18

    ~6 million

    Thunder 1.5

  • METAFLOW17

    0Q1’94 Q4’94 Q2’95

    400

    Thunder Performance Goals

    300

    200

    10010050 MHz

    0.8/0.6 µm80 MHz0.5 µm

    120 MHz0.5 µm

    SPECint92

    SPECfp92

    500

  • METAFLOW18

    Instruction Shelf

    RegisterFile ALUs

    ResultShelf

    RetirementDataTLBLoad

    Buffer

    RegisterRenaming

    Load/StoreLogic

    LoadShelf

    Store Shelf

    InstrTLB

    PhysicalTags

    InstructionCache

    (20 Kbytes)

    InstrDecode

    Load/Store

    DependCheck

    InstrPre-Dec

    VirtualTags

    InstrIssueLogic

    PCGenerator

    Ret AddrCache

    BranchShelf

    BranchPredict

    BackupState

    Thunder 1.0 IU Floor Plan

  • METAFLOW19

    Instruction Shelf

    RegisterFile ALUs

    ResultShelf

    RetirementDataTLBLoad

    Buffer

    RegisterRenaming

    Load/StoreLogic

    LoadShelf

    Store Shelf

    InstrTLB

    PhysicalTags

    InstructionCache

    (20 Kbytes)

    InstrDecode

    Load/Store

    DependCheck

    InstrPre-Dec

    VirtualTags

    InstrIssueLogic

    PCGenerator

    Ret AddrCache

    BranchShelf

    BranchPredict

    BackupState

    Thunder 1.0 IU Floor Plan