Digital Semiconductor
Design Objectives of the 0.35µm Alpha 21164 Microprocessor
(A 500MHz Quad Issue RISC Microprocessor)
Gregg Bouchard
Digital Semiconductor
Digital Equipment Corporation
Hudson, MA
Outline
• 0.35µm Alpha 21164 Overview
• Design Goals
• Technology Issues
• 21164 Internal Architecture Review
• Architectural Enhancements
• System Performance Enhancements
• Alpha Processor RoadMap
• Summary
0.35µm ALPHA 21164
• Technology shrink of 0.5µm design
+ Architectural Enhancements
+ System Performance Enhancements

                        0.5 µm process        0.35 µm process
Transistor Count        9.3 Million           9.66 Million
Die Size                16.5 mm x 18.1 mm     14.4 mm x 14.5 mm
Power Supply            3.3V external         3.3V external
                        3.3V internal         2.5V internal
WC Power Dissipation    50W @ 300 MHz         37W @ 433 MHz
Target Cycle Time       300 MHz               433 MHz
Design Goals
• Reduced Cost
– Die size reduction
– Remove processing steps
• Higher Performance
– 433 MHz design target
– Architectural enhancements
• Reduced Power
– Lower Core Operating Voltage (2.0V-2.5V)
• Time to Market
– Significant leverage from previous design
Die Size Analysis
Original Die: 16.5mm x 18.1mm, 298 mm²
Ideal 30% shrink: 11.5mm x 12.7mm, 146 mm²
Actual shrink: 14.4mm x 14.5mm, 209 mm²

               X-dim    Y-dim    Norm A
Original       16.5     18.1     1.00
25% shrink     - 4.1    - 4.5
               12.4     13.6     0.56
Conversion     +0.0     +0.8
               12.6     14.5     0.62
LI Removal     +1.8     +0.0
               14.4     14.5     0.70
Actual         14.4     14.5     0.70
Ideal 30%      11.5     12.7     0.49
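The areas and normalized areas in this analysis follow from the die dimensions. A minimal Python sketch of that arithmetic is given here; the function and constant names are illustrative and not from the slides, and the slide's own figures appear to be rounded or truncated to whole mm².

```python
# Die dimensions in mm, taken from the Die Size Analysis slide.
ORIGINAL = (16.5, 18.1)
ACTUAL = (14.4, 14.5)


def area(dims):
    """Return die area in mm^2 for an (x, y) pair given in mm."""
    x, y = dims
    return x * y


def linear_shrink(dims, fraction):
    """Scale both dimensions down by a linear shrink fraction (0.30 = 30%)."""
    x, y = dims
    return (x * (1.0 - fraction), y * (1.0 - fraction))


def normalized_area(dims, reference=ORIGINAL):
    """Area relative to the reference die (the 'Norm A' column)."""
    return area(dims) / area(reference)


ideal_30 = linear_shrink(ORIGINAL, 0.30)
shrink_25 = linear_shrink(ORIGINAL, 0.25)

for label, dims in [("25% shrink", shrink_25),
                    ("actual", ACTUAL),
                    ("ideal 30%", ideal_30)]:
    print(f"{label}: {area(dims):.1f} mm^2, normalized {normalized_area(dims):.2f}")
```

A linear shrink of s scales area by (1 - s)², which is why the ideal 30% shrink lands at 0.49 of the original area while the actual die ends up near 0.70.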
Layout Conversion Strategy
• Full 30% linear shrink was not possible
• Solution:
– 25% linear shrink
– Semi-automated conversion of design rules
– Polysilicon mask layer pushed to full 30% shrink dimensions for performance
• Redesign of caches
– Local Interconnect removed
Layout Conversion Example
[Figure: one layout cell shown at three stages: Original, After Script, Final. DRC errors are marked. Layer legend: Well, Diffusion, Well Plug, Poly, M1C, Metal 1, M2C, Metal 2.]
Speed
• Single-wire, two phase clocking scheme
• 14 gates per cycle including latches
• Single global clock grid
– Global clock skew < 90 ps
– Local clock skew < 25 ps
• Clock statistics (0.5 µm design)
– Clock load = 3.75 nF
– Size of final clock inverter = 58 cm
– Edge rate = 0.5 ns
– Clocking consumes 40% of chip power
– di/dt = 50 A
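The "14 gates per cycle" figure can be turned into a rough per-gate delay budget at a given clock frequency. The sketch below assumes the cycle is split evenly across the 14 gate delays, which is a simplification, and treats the 90 ps global skew bound as a fixed number at every frequency; the function names are illustrative.

```python
GATES_PER_CYCLE = 14  # from the slide, latches included


def cycle_time_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def gate_budget_ps(freq_mhz, gates=GATES_PER_CYCLE):
    """Average delay available per gate, assuming an even split of the cycle."""
    return cycle_time_ps(freq_mhz) / gates


def skew_fraction(skew_ps, freq_mhz):
    """Clock skew expressed as a fraction of the cycle time."""
    return skew_ps / cycle_time_ps(freq_mhz)


for f in (300, 433, 500):
    print(f"{f} MHz: period {cycle_time_ps(f):.0f} ps, "
          f"{gate_budget_ps(f):.0f} ps per gate, "
          f"90 ps skew = {skew_fraction(90, f):.1%} of the cycle")
```

The same skew bound takes a larger share of the cycle as frequency rises, which is one reason skew control matters more at the higher clock targets.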
Speed Verification
• Test circuits were used to determine the speed scaling of different circuit configurations
– Predicted average process speed up
– Identified “slow” circuit types
• New circuits were evaluated in SPICE
• Chip sections with major modifications were completely re-verified
Power
• Significant Power Reduction
• Vddi = 2.2v (to 2.5v)
• 3.3V only interface to external devices
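[Figure: stacked bar chart of power in Watts (axis 0 to 50), split into I/O, Clock, and Core components, comparing the 0.5µm design at 3.3V with the 0.35µm design at 2.5V. Frequency axis: 300, 366, 433, 500 MHz.]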
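A first-order way to see why the lower core voltage matters is the usual dynamic power relation, P proportional to V² · f. The sketch below applies it to the 0.5µm reference point from the overview slide (50 W at 300 MHz, 3.3 V internal). It assumes the switched capacitance stays the same and that the whole chip runs at the core voltage; neither holds exactly for a shrink with a 3.3V I/O interface, so this is only an estimate.

```python
def scaled_power(p_ref_w, v_ref, f_ref_mhz, v_new, f_new_mhz):
    """First-order dynamic power scaling: P is proportional to V^2 * f.

    Assumes unchanged switched capacitance, which a process shrink
    does not guarantee.
    """
    return p_ref_w * (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


# Reference point from the slides: 50 W at 300 MHz with a 3.3 V core.
estimate = scaled_power(50.0, 3.3, 300.0, 2.5, 433.0)
print(f"estimated power at 2.5 V, 433 MHz: {estimate:.1f} W")
```

The estimate comes out a few watts above the 37 W quoted for the 0.35µm part at 433 MHz. That gap would be consistent with the shrink also lowering capacitance, though the slides do not break the reduction down.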
[Figure: die floorplan. Labeled regions: Pad Ring (I/O), C-Box, F-Box, E-Box, I-Box, M-Box, L1 D Cache, L1 I Cache, L2 Tags, L2 Cache (two arrays), Main CLK Drivers, Pre-CLK Drivers.]
Alpha 21164 Features
Key Attributes
• 4-way issue superscalar
– Up to 2 Integer AND 2 Floating Point instructions issued per CPU cycle.
• Large on-chip L2 cache
– 96KB, writeback, 3-way set associative
• Fully Pipelined
– 7-stage integer pipeline
– 9-stage floating point pipeline
• Emphasis on low latency at high clock rate
• High-throughput memory subsystem
Alpha 21164 Block Diagram
[Figure: block diagram divided into Instruction Unit, Execution Units, and Memory Unit, joined by a 128-bit internal data bus.
Blocks shown: Instruction Cache (8KB), Four-way Issue Unit, two Integer Units, FP Adder, FP Mult., Merge Logic, Write-Through Data Cache (8KB), Write-back L2 Cache (96KB), Bus Interface Unit.
External connections: 40b Address and 128b Data to the L3 Cache.
ITB: 48 entry. DTB: 64 entry.]
Instruction Issue Pipeline Review
[Figure: instruction issue pipeline, stages S0 to S3. Blocks shown: Instruction Cache (8KB), Prefetch Buffer, 2k x 2 branch prediction, Next PC, Instruction Buffer, Instruction Slot, Issue Conflict Checker, Instruction Issue. Outputs go to the floating point multiply pipeline, the floating point add pipeline, integer pipeline 0, and integer pipeline 1.]
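The "2k x 2 branch prediction" label suggests a 2048-entry table of 2-bit entries. One common way to use 2-bit entries is as saturating counters, and the sketch below shows that generic scheme. The indexing by low PC bits, the initial counter value, and the update policy are assumptions made for illustration; the slides do not describe how the 21164 organizes or updates its prediction state.

```python
class TwoBitPredictor:
    """Generic 2-bit saturating-counter branch predictor (illustrative only)."""

    def __init__(self, entries=2048):
        self.entries = entries
        # Counter values 0..3; start at 1 (weakly not-taken), an arbitrary choice.
        self.table = [1] * entries

    def _index(self, pc):
        # Alpha instructions are 4 bytes, so the low 2 PC bits carry no information.
        # Using the next bits as the table index is an assumption.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Predict taken when the counter is in one of the two upper states."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Step the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

With two bits of state, a single mispredicted iteration (for example a loop exit) does not immediately flip the prediction for a branch that is usually taken.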
Execution Pipeline Review
[Figure: execution pipelines across stages S3 to S8, fed from the Int Regs and FP Regs register files.]
Integer Pipeline 0: arith, logical, ld/st, shift
Integer Pipeline 1: arith, logical, ld, br/jmp
Int mul
FP Pipeline 0: add, subtract, compare, FP br
FP Pipeline 1: multiply
FP div
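The pipeline capabilities listed above constrain which instructions can issue together. The sketch below is a simplified model, not the 21164's actual slotting logic: it assumes strict in-order issue and finds the longest leading run of up to four instructions that can each be placed in a distinct pipeline. The pipeline names (E0, E1, FA, FM) and the instruction class labels are shorthand introduced here, and integer multiply and FP divide are left out because the slide does not say how they share the pipelines.

```python
# Operation classes each pipeline accepts, following the slide's lists.
PIPELINES = {
    "E0": {"arith", "logical", "load", "store", "shift"},
    "E1": {"arith", "logical", "load", "branch"},
    "FA": {"fadd", "fsub", "fcmp", "fbranch"},
    "FM": {"fmul"},
}


def assign(instrs, free=None):
    """Return one pipeline name per instruction, or None if no placement exists."""
    if free is None:
        free = list(PIPELINES)
    if not instrs:
        return []
    first, rest = instrs[0], instrs[1:]
    for pipe in free:
        if first in PIPELINES[pipe]:
            remaining = [p for p in free if p != pipe]
            tail = assign(rest, remaining)
            if tail is not None:
                return [pipe] + tail
    return None


def issue_group(instrs):
    """Longest in-order prefix (at most 4) that fits into distinct pipelines."""
    for n in range(min(4, len(instrs)), 0, -1):
        placement = assign(instrs[:n])
        if placement is not None:
            return list(zip(instrs[:n], placement))
    return []


print(issue_group(["load", "store", "fadd", "fmul"]))
print(issue_group(["store", "store"]))
```

In this model a load followed by a store can issue together only if the load goes to pipeline 1, since stores are listed for pipeline 0 alone; two stores can never pair.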
On-chip Cache Resources Review
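[Figure: 96K, 3 Set, Writeback L2 Cache and its request sources across stages S6 to S9.
Request sources: 6 Data Reads, 6 Data Writes, 3 Instr. Reads, 2 System Commands.
Address buffers: Miss Addr 0, Miss Addr 1, Victim Addr 0, Victim Addr 1.
Data paths: System data, Victim 0 data, Victim 1 data.
Widths: 128 bits/cycle data, 40 bit PA.]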
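Two numbers fall out of the L2 parameters directly: the capacity per set and the peak data rate of a 128 bits/cycle path. The sketch below works them out; it assumes one transfer every CPU cycle, which is an upper bound rather than a measured rate, and the function names are illustrative.

```python
L2_SIZE_KB = 96           # from the slide
L2_SETS = 3               # "3 Set", i.e. 3-way set associative
BUS_BITS_PER_CYCLE = 128  # from the slide


def kb_per_set(size_kb=L2_SIZE_KB, sets=L2_SETS):
    """Capacity of each of the three sets in KB."""
    return size_kb / sets


def peak_bandwidth_gb_s(freq_mhz, bits_per_cycle=BUS_BITS_PER_CYCLE):
    """Peak rate in GB/s (10^9 bytes/s), assuming a transfer on every cycle."""
    bytes_per_cycle = bits_per_cycle / 8
    return bytes_per_cycle * freq_mhz * 1e6 / 1e9


print(f"{kb_per_set():.0f} KB per set")
for f in (300, 433, 500):
    print(f"{f} MHz: up to {peak_bandwidth_gb_s(f):.2f} GB/s")
```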
L3 Cache (off-chip)
• L3 cache is a direct-mapped writeback superset of on-chip L2 cache
• Up to 2 reads (or outstanding read commands) in L3 cache
• Programmable wave pipelining for L3 cache
• Support for Synchronous Flow-Thru SRAMs
• L3 cache is optional
[Figure: 21164 connected to the off-chip L3 cache. Labels: Bus Address File, Index, Command/Addr, L3 Tag, L3 Data, 128-bit data bus.]
Off-Chip L3 Cache Options
Selectable via on-chip programmable registers
• Cache Size - 1 to 64 MByte
• Cache Read/Write Speed - 4 to 15 cpu cycles
• Read to Write Spacing - 1 to 7 cpu cycles
• Write Pulse (Bit Mask) - Up to 9 cpu cycles
• Wave pipelining - 0 to 3 cpu cycles
• Support for Synchronous SRAMs
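The ranges above can be captured as a small table and checked before a configuration is applied. The sketch below does only that. The field names are hypothetical and do not correspond to the actual register names or bit layout, the lower bound for the write pulse is assumed to be 0 because the slide only says "up to 9", and any further restrictions the hardware may impose (such as allowed size steps) are not modeled.

```python
# Allowed ranges from the slide, keyed by hypothetical field names.
L3_LIMITS = {
    "size_mb": (1, 64),
    "rw_speed_cycles": (4, 15),
    "read_to_write_spacing": (1, 7),
    "write_pulse_cycles": (0, 9),  # lower bound is an assumption
    "wave_pipelining": (0, 3),
}


def validate_l3_config(config):
    """Return a list of problems; an empty list means every field is in range."""
    errors = []
    for name, (lo, hi) in L3_LIMITS.items():
        if name not in config:
            errors.append(f"{name} missing")
            continue
        value = config[name]
        if not lo <= value <= hi:
            errors.append(f"{name}={value} outside {lo}..{hi}")
    return errors


example = {
    "size_mb": 4,
    "rw_speed_cycles": 6,
    "read_to_write_spacing": 1,
    "write_pulse_cycles": 3,
    "wave_pipelining": 0,
}
print(validate_l3_config(example))
```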
Architectural Enhancements
New Instructions
• Scalar support for Byte and Short data types
– LDBU, LDWU load an unsigned byte or short
– STB, STW store a byte or short
– SEXTB, SEXTW sign extend a byte or short
• Eases porting of device drivers to Alpha
• Improves emulation of Intel code on Alpha
• Implemented in this and all future Alpha microprocessors
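The behavior of the new byte and short instructions can be modeled in a few lines. The Python sketch below is an illustration of their effect on a 64-bit register value, not a description of the hardware: memory is a plain `bytearray`, byte order is assumed little-endian, and alignment checks and faults are not modeled.

```python
MASK64 = (1 << 64) - 1


def ldbu(memory, addr):
    """Load one byte and zero-extend it to 64 bits."""
    return memory[addr]


def ldwu(memory, addr):
    """Load a 16-bit little-endian short and zero-extend it to 64 bits."""
    return memory[addr] | (memory[addr + 1] << 8)


def stb(memory, addr, value):
    """Store the low byte of a register value."""
    memory[addr] = value & 0xFF


def stw(memory, addr, value):
    """Store the low 16 bits of a register value, little-endian."""
    memory[addr] = value & 0xFF
    memory[addr + 1] = (value >> 8) & 0xFF


def sextb(value):
    """Sign-extend the low 8 bits to a 64-bit register image."""
    low = value & 0xFF
    return (low - 0x100 if low & 0x80 else low) & MASK64


def sextw(value):
    """Sign-extend the low 16 bits to a 64-bit register image."""
    low = value & 0xFFFF
    return (low - 0x10000 if low & 0x8000 else low) & MASK64


mem = bytearray(8)
stw(mem, 2, 0xFFFE)
print(hex(ldwu(mem, 2)), hex(sextw(ldwu(mem, 2))))
```

A signed short load is then the unsigned load followed by SEXTW, which is why the slide lists only unsigned loads alongside the sign-extend instructions.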
Architectural Enhancements
New Instructions (continued)
• IMPLVER
– returns a small integer indicating the core design
– used for code scheduling decisions
– implemented in all Alpha microprocessors
• AMASK
– clears bits to indicate which features are present
– implemented in all Alpha microprocessors
Example:
AMASK #1, R0 ;byte/word present
BNE R0, emulate ;if not emulate
...
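The AMASK test in the example works because the instruction clears each requested bit whose feature is implemented, so a zero result means everything asked for is present. A minimal Python model of that check follows; the `implemented` argument stands in for the processor's feature mask and, like the function names, is introduced here for illustration.

```python
BWX = 1 << 0  # bit 0: byte/word instructions, matching "AMASK #1" in the example


def amask(requested, implemented):
    """Clear the bits of `requested` that correspond to implemented features.

    A zero result means every requested feature is present.
    """
    return requested & ~implemented


def has_byte_word(implemented):
    """Mirror the AMASK/BNE test: a nonzero result means fall back to emulation."""
    return amask(BWX, implemented) == 0


print(has_byte_word(BWX))  # processor with byte/word support
print(has_byte_word(0))    # processor without it
```

Because absent features leave their bits set, code written this way also takes the emulation path on processors that implement none of the extensions.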
System Performance Enhancements
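[Figure: 21164 to Bcache interface (Tag Store and Data Store) with numbered callouts 1 to 5. Signals shown: index, tag_oe,we, data_oe,we, tag_data, data (128), st_clk1,2, big_drv_en, oe_we_active_low, clk_mode<2>.]
Callouts recoverable from the figure:
• 50% more drive (eliminates off-chip buffering)
• Additional pin
• 1x clock mode (clk_mode<2>)
• Cache Coherence (state diagram with states M, E, S, O, I)
• Bus utilization improvements (Auto Dack): timing diagram of cmd, addr, data, and dack for Write A0 (D00, D01, D02, D03) followed by Write A1 (D10, D11)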
Pre-Silicon Logic Verification
• Used extensive test suite developed for original chip as testing baseline
• Random and focused testing
• Coverage analysis to ensure excellent test coverage
• Three simulation systems used:
– RTL
– Transistor-level
– Gate-level
Alpha Microprocessor Road Map
[Figure: SpecInt95 (axis 0 to 25) versus year, 1992 to 1998. Parts plotted: 21064-150MHz, 21064-200MHz, 21064A-275MHz, 21164-300MHz, 21164-333MHz, 21164-366MHz, 21164-400MHz, 21164-433MHz, 21164-500MHz.]
Summary
• FIRST PASS SILICON - October 1995
• Booted first operating system in 2 days!
• Continued Performance Leadership (4+ years)
• Look for next generation 30+ SpecInt95 by Q3’97
Acknowledgments
Peter J. Bannon, Michael S. Bertone, Randel P. Blake-Campos, William J. Bowhill, David A. Carlson, Ruben W. Castelino, Dale R. Donchin, Richard M. Fromm, Mary K. Gowan, Paul E. Gronowski, Anil K. Jain, Bruce J. Loughlin, Shekhar Mehta, Jeanne E. Meyer, Robert O. Mueller, Andy Olesin, Tung N. Pham, Ronald P. Preston, Paul I. Rubinfeld