the thunder sparc processor - lightner · 2003. 6. 18. · metaflow 13 thunder unusual features l...
TRANSCRIPT
-
METAFLOW1
The Thunder SPARC Processor
HOT Chips VI
August 15-16, 1994
Bruce D. LightnerVice President, Development
Metaflow Technologies, Inc.4250 Executive Square, Suite 300
La Jolla, California 92037(619) 452-6608
Internet: [email protected]
-
METAFLOW2
Metaflow Backgroundl Company founded in 1987l Pioneers in out-of-order/speculative execution
microprocessor designl First major design contract was LSI Logic's
“Lightning” SPARC chipset (completed 1991)l “Thunder” SPARC 3-chip set is improved
version of “Lightning” 4-chip setl Project 100% funded by Hyundai Electronicsl Thunder chipset first silicon July 1993l Patented out-of-order architecture
-
METAFLOW3
Thunder SPARC Mbus Module
IntegerUnit(IU)
XcacheRAM
(1 Mbyte)
FloatingPoint Unit
(FPU)
Cache Control/MMU/Bus Intf
(CMB)17
36
64
Mbus (multiprocessor)85
FPU Instrs
Bypass
Address
Data
322.7M
2.5M
0.8M
-
METAFLOW4
Thunder SPARC Chipset
l Superscalar fetch, issue, and execution
l Dynamic scheduling (dataflow)
l Out-of-order execution
l Speculative execution
l Above “hidden” from the programmer
l Factor of 2-3 performance advantage fromarchitecture
-
METAFLOW5
Microprocessor HardwareTrends
l Superscalar instruction issue
l Super-pipelined execution units
l Super-fast processor clocks (>100 MHz)
l Out-of-order execution
l Speculative execution
-
METAFLOW6
Out-of-Order Execution
PROGRAM ORDER COMPLETION ORDER
float A, X, XX, X1int n...X1 = 1 / A;XX = A * A;n = n + 1;X = X + A;...
.
.
.n = n + 1;X = X + A;XX = A * A;X1 = 1 / A;...
OperationFP divideFP multiplyFP addInteger add
Clocks10531
-
METAFLOW7
Speculative Execution
C EXAMPLE
for ( i = 0; i < MAX; ++i ) { if (X[i] < 0) error_exit(); sum = sum + sqrt(X[i]);}
DO 100 i = 1, max IF (X(i) .LT. 0) CALL error_exit100 SUM = SUM + SQRT(X(i))
FORTRAN EXAMPLE
Speculative executionopportunity
Speculative executionopportunity
Xii
n
=∑
0
-
METAFLOW8
Conventional PipelineInstruction
Cache
RegisterFile
FETCH
ISSUE
EXECUTE
RETIRE
-
METAFLOW9
Metaflow Pipeline (DRIS)
DRISINSTRUCTION n
INSTRUCTION 2
INSTRUCTION 1
...Deferred-scheduling, Register-renaming
Instruction Shelf
RETIRE
RegisterFile
ISSUE
SCHEDULE
UPDATE
...
...
...
...
...
...
...
InstructionCache
...
...
...
-
METAFLOW10
Metaflow DRIS Entry
Locked RegNum ID
Source Operand 1
Locked RegNum ID
Source Operand 2
Destination
Latest RegNum Instruction’s Results
Yes/No
Dispatched
Class Number
Functional Unit
Yes/No
Executed
Content
Program Counter
-
METAFLOW11
Thunder IU
DRISINSTRUCTION n
INSTRUCTION 2
INSTRUCTION 1
...•Out-of-order instruction results•Speculative instruction results•Automatic “register renaming”
RETIRE
RegisterFile
ISSUE
SCHEDULE
UPDATE
InstructionCache Branch
Unit
BranchShelf
Predecoder
MemoryShelf
DataBus
BypassBus
AddressBus+ + * /Addr
ConditionCodes
StoreData
-
METAFLOW12
Thunder ProcessorDistinguishing Features
l 4 instructions issued per clock (3 integer, 2floating point instructions and one branch)
l 8 parallel functional units (3 integer ALUs, 2FP ALUs, 1 branch unit, 2 memory/bypass)
l Dataflow-based out-of-order instructionissue/execution (memory/ALU operations), in-order completion, precise traps and interrupts
l Speculative execution beyond unresolvedconditional branches (multiple basic blockswith instant state repair on error)
l Above mechanisms transparent to executingprogram (strict SPARC V8 compatibility)
-
METAFLOW13
Thunder Unusual Featuresl Eager evaluation (“folds” conditional branches)l Multiple instances of memory locations
(“memory renaming”)l Out-of-order memory references
– Multiple simultaneous cache miss processing(“non-blocking cache”)
– Split address and data memory transactions(memory response re-ordering allowed)
l Speculative memory reads with respect toshared memory coherence restrictions(emulates “strong consistency” model)
l Uninterrupted transitions between user andsupervisor modes (traps/interrupts)
-
METAFLOW14
Thunder Unusual Features(continued)
Floating Point Unit (FPU)l Variable latency FPU (early termination for
divide/square root)l Non-blocking divide and square root
(concurrent add/multiply)
Branch Predictionl Dynamic branch predictionl Generalized loop count predictionl Return-from-subroutine (RET) “folding”l Jump-through-register prediction (JMPL cache)
-
METAFLOW15
Thunder Status
First SiliconFoundryCMOS technologyMetal layersSupply voltageClock rate (processor) (Mbus)Die Size (mm square)PackageMethodology (data path/RAM) (control logic) (test)External cache RAM (type) (size)Total transistors (3 chips)
July/Nov 1993VLSI Technologies
0.6/0.8 µm3
5V50 MHz40 MHz14.5/17.4
IPGA (391 pins)Full custom
Standard cellJTAG/full scan
“Viking”Eight 128K x 9
~6 million
Thunder 1.0
Q4 1994
0.5 µm4
3.6V80 MHz60 MHz9.1/11.8
TBGA (600+ pins)Full custom
Standard cellJTAG/full scan
“Pentium”Eight 64K x 18
~6 million
Thunder 1.5
≤
-
METAFLOW16
Thunder Status
First SiliconFoundryCMOS technologyMetal layersSupply voltageClock rate (processor) (Mbus)Die Size (mm square)PackageMethodology (data path/RAM) (control logic) (test)External cache RAM (type) (size)Total transistors (3 chips)
July/Nov 1993VLSI Technologies
0.6/0.8 µm3
5V50 MHz40 MHz14.5/17.4
IPGA (391 pins)Full custom
Standard cellJTAG/full scan
“Viking”Eight 128K x 9
~6 million
Thunder 1.0
Q4 1994IBM
0.5 µm4
3.6V80 MHz60 MHz9.1/11.8
TBGA (600+ pins)Full custom
Standard cellJTAG/full scan
“Pentium”Eight 64K x 18
~6 million
Thunder 1.5
≤
-
METAFLOW17
0Q1’94 Q4’94 Q2’95
400
Thunder Performance Goals
300
200
10010050 MHz
0.8/0.6 µm80 MHz0.5 µm
120 MHz0.5 µm
SPECint92
SPECfp92
500
-
METAFLOW18
Instruction Shelf
RegisterFile ALUs
ResultShelf
RetirementDataTLBLoad
Buffer
RegisterRenaming
Load/StoreLogic
LoadShelf
Store Shelf
InstrTLB
PhysicalTags
InstructionCache
(20 Kbytes)
InstrDecode
Load/Store
DependCheck
InstrPre-Dec
VirtualTags
InstrIssueLogic
PCGenerator
Ret AddrCache
BranchShelf
BranchPredict
BackupState
Thunder 1.0 IU Floor Plan
-
METAFLOW19
Instruction Shelf
RegisterFile ALUs
ResultShelf
RetirementDataTLBLoad
Buffer
RegisterRenaming
Load/StoreLogic
LoadShelf
Store Shelf
InstrTLB
PhysicalTags
InstructionCache
(20 Kbytes)
InstrDecode
Load/Store
DependCheck
InstrPre-Dec
VirtualTags
InstrIssueLogic
PCGenerator
Ret AddrCache
BranchShelf
BranchPredict
BackupState
Thunder 1.0 IU Floor Plan