design objectives of the 0.35µm alpha 21164 …...l3 cache 128-bit internal data bus instruction...

25
Digital Semiconductor 1 Gregg Bouchard Digital Semiconductor Digital Equipment Corporation Hudson, MA Design Objectives of the 0.35μm Alpha 21164 Microprocessor (A 500MHz Quad Issue RISC Microprocessor)

Upload: others

Post on 04-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

1

Gregg BouchardDigital Semiconductor

Digital Equipment CorporationHudson, MA

Design Objectives of the0.35µm Alpha 21164

Microprocessor(A 500MHz Quad Issue RISC Microprocessor)

Page 2: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

2

• 0.35µm Alpha 21164 Overview

• Design Goals

• Technology Issues

• 21164 Internal Architecture Review

• Architectural Enhancements

• System Performance Enhancements

• Alpha Processor RoadMap

• Summary

Outline

Page 3: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

3

0.35µm ALPHA 21164

Transistor CountDie SizePower Supply

WC Power DissipationTarget Cycle Time

• Technology shrink of 0.5µm design+ Architectural Enhancements+ System Performance Enhancements

0.5 µm process 0.35 µm process9.3 Million 9.66 Million16.5 mm x 18.1 mm 14.4 mm x 14.5 mm3.3V external 3.3V external3.3V internal 2.5V internal50W @ 300 MHz 37W @ 433 MHz300 MHz 433 MHz

Page 4: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

4

Design Goals

• Reduced Cost– Die size reduction– Remove processing steps

• Higher Performance– 433 MHz design target– Architectural enhancements

• Reduced Power– Lower Core Operating Voltage (2.0v-2.5v)

• Time to Market– Significant leverage from previous design

Page 5: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

5

Die Size Analysis

Original Die16.5mm x 18.1mm

298 mm2

Ideal 30% shrink11.5mm x 12.7mm

146 mm2

Actual shrink14.4mm x 14.5mm

209 mm2

X-dim Y-dim NormAOriginal 16.5 18.1 1.0025% shrink - 4.1 - 4.5

12.4 13.6 0.56Conversion +0.0 +0.8

12.6 14.5 0.62LI Removal +1.8 +0.0

14.4 14.5 0.70

Actual 14.4 14.5 0.70

Ideal 30% 11.5 12.7 0.49

Page 6: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

6

Layout Conversion Strategy

• Full 30% linear shrink was not possible• Solution:

– 25% linear shrink

– Semi-automated conversion of design rules

– Polysilicon mask layer pushed to full 30% shrinkdimensions for performance

• Redesign of caches– Local Interconnect removed

Page 7: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

7

Layout Conversion Example

Original After Script Final

DRCErrors

WellDiffusion

Well PlugPoly

M1CMetal 1

M2CMetal 2

Page 8: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

8

Speed

• Single-wire, two phase clocking scheme• 14 gates per cycle including latches• Single global clock grid

– Global clock skew<90ps– Local clock skew<25ps

• Clock statistics (0.5 µµm design)– Clock load = 3.75 nF– Size of final clock inverter = 58 cm– Edge rate = 0.5 ns– Clocking consumes 40% of chip power– di/dt = 50 A

Page 9: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

9

• Test circuits were used to determine the speedscaling of different circuit configurations

– Predicted average process speed up

– Identified “slow” circuit types

• New circuits were evaluated in SPICE

• Chip sections with major modifications werecompletely re-verified

Speed Verification

Page 10: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

10

Power

• Significant Power Reduction

• Vddi = 2.2v (to 2.5v)

• 3.3V only interface to external devices

0.5µm 3.3v 0.35µm 2.5v

0

10

20

30

40

50

300 366 433 500MHz

Watts

I/O

Clock

Core

Page 11: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Pad Ring

(I/O)

C-Box

F-Box

E-Box

L1 D Cache

L1 I Cache

L2 Tag

s

I-Box

M-Box

L2Cache

L2Cache

MainCLK

Drivers

Pre-CLK

Drivers

Page 12: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

12

Alpha 21164 Features

• 4-way issue superscalar– Up to 2 Integer AND 2 Floating Point instructions

issued per CPU cycle.• Large on-chip L2 cache

– 96KB, writeback, 3-way set associative• Fully Pipelined

– 7-stage integer pipeline– 9-stage floating point pipeline

• Emphasis on low latency at high clock rate• High-throughput memory subsystem

Key Attributes

Page 13: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

13

Alpha 21164 Block Diagram

InstructionCache(8KB)

IntegerUnit

Merge Logic

Write-Through

DataCache(8KB)

Write-back

L2Cache(96KB)

BusInterface

Unit

IntegerUnit

FP Adder

FP Mult.

Four-way

IssueUnit

40bAddress

128bData

L3 Cache

128-bit internal data bus

Instruction Unit Execution Units Memory Unit

ITB: 48 entry DTB: 64 entry

Page 14: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

14

Instruction Issue Pipeline Review

InstructionCache(8KB)

InstructionBuffer

InstructionSlot

InstructionIssue

to floating pointmultiply pipeline

to floating pointadd pipeline

to integerpipeline 0

to integerpipeline 1

IssueConflictChecker

S0 S1 S2 S3

X

0

1

PrefetchBuffer

2kx2branchprediction

Next

PC

Page 15: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

15

Execution Pipeline Review

IntRegs

Integer Pipeline 0: arith, logical,ld/st, shift

Integer Pipeline 1: arith, logical,ld, br/jmp

Int mul

S3 S4 S5 S6 S7 S8

FPRegs

FP Pipeline 0: add, subtract,compare, FP br

FP Pipeline 1: multiply

FP div

Page 16: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

16

On-chip Cache Resources Review

96K, 3 Set, Writeback L2 Cache

6 DataReads

6 DataWrites

3 Instr.Reads

2 SystemCommands

Miss Addr 0

Miss Addr 1

Victim Addr 0

Victim Addr 1

System data

Victim 0 data

Victim 1 data

S6 S7 S8 S9

128 bits/cycle

40 bit PA

Page 17: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

17

L3 Cache (off-chip)

• L3 cache is a direct-mappedwriteback superset of on-chip L2 cache

• Up to 2 reads (oroutstanding readcommands) in L3 cache

• Programmable wavepipelining for L3 cache

• Support for SynchronousFlow-Thru SRAMs

• L3 cache is optional

21164

BusAddress

File

Index

Command/Addr

L3 Tag

L3Data

128-bit data bus

Page 18: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

18

Off-Chip L3 Cache Options

Selectable via on-chip programmable registers

u Cache Size - 1 to 64M Byte

u Cache Read/Write Speed - 4 to 15 cpu cycles

u Read to Write Spacing - 1 to 7 cpu cycles

u Write Pulse (Bit Mask) - Up to 9 cpu cycles

u Wave pipelining - 0 to 3 cpu cycles

u Support for Synchronous SRAM’s

Page 19: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

19

Architectural Enhancements

New Instructions• Scalar support for Byte and Short data types

– LDBU, LDWU load an unsigned byte or short

– STB, STW, store an byte or short

– SEXTB, SEXTW sign extend a byte or short

• Eases porting of device drivers to Alpha

• Improves emulation of Intel code on Alpha

• Implemented in this and all future Alpha

microprocessors

Page 20: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

20

New Instructions (continued)• IMPLVER

– returns a small integer indicating the core design

– used for code scheduling decisions

– implemented in all Alpha microprocessors

• AMASK– clears bits to indicate which features are present

– implemented in all Alpha microprocessors

Example:

AMASK #1, R0 ;byte/word present

BNE R0, emulate ;if not emulate

...

Architectural Enhancements

Page 21: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

21

System Performance Enhancements

index

tag_oe,we

big_drv_enoe_we_active_low

st_clk1,2

0

1L

H

Tag Store

Data Store

tag_datadata 128

Bcache

21164

data_oe,we

(50% more drive)(eliminates off-chip buffering)

1

2 (additional pin)

ME

S O

2

5

4

3clk_mode<2>(1x clock mode)

cmdaddrdatadack

Write

A0

D00 D01 D02 D03

Write

A1

D10 D11

I

Cache Coherence

Bus utilization improvements(Auto Dack)

Page 22: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

22

Pre-Silicon Logic Verification

• Used extensive test suite developed for originalchip as testing baseline

• Random and focused testing

• Coverage analysis to ensure excellent testcoverage

• Three simulation systems used:– RTL

– Transistor-level

– Gate-level

Page 23: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

23

Alpha Microprocessor Road Map

0

5

10

15

20

25

1992 1993 1994 1995 1996 1997 1998

Sp

ecIn

t95

21064 -150MHz

21064 -200MHz21064A -275MHz

21164 -300MHz

21164 -333MHz

21164 -433MHz

21164 -500MHz

21164 -366MHz

21164 -400MHz

Page 24: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

24

• FIRST PASS SILICON - October 1995

• Booted first operating system in 2 days!

• Continued Performance Leadership (4+ years)

• Look for next generation 30+ SpecInt95 by Q3’97

Summary

Page 25: Design Objectives of the 0.35µm Alpha 21164 …...L3 Cache 128-bit internal data bus Instruction Unit Execution Units Memory Unit ITB: 48 entry DTB: 64 entry Digital Semiconductor

Digital Semiconductor

25

Acknowledgments

Peter J. Bannon, Michael S. Bertone, Randel P. BlakeCampos, William J. Bowhill, David A. Carlson,

Ruben W. Castelino, Dale R. Donchin, Richard M.Fromm, Mary K. Gowan, Paul E. Gronowski, Anil K.

Jain, Bruce J. Loughlin, Shekhar Mehta, Jeanne E.Meyer, Robert O. Mueller, Andy Olesin, Tung N.

Pham, Ronald P. Preston, Paul I. Rubinfeld