UltraSparc IV

UltraSparc IV Tolga TOLGAY


Post on 23-Jan-2016


Page 1: UltraSparc IV

UltraSparc IV
Tolga TOLGAY

Page 2: UltraSparc IV

OUTLINE

Introduction
History
What is new?
Chip Multithreading
Pipeline
Cache
Branch Prediction
Conclusion

Page 3: UltraSparc IV

INTRODUCTION

Sparc = Scalable Processor Architecture

Open processor architecture
Sun UltraSparc (SPARC V9 architecture):
  RISC architecture
  64-bit address and data
  Superscalar

Page 4: UltraSparc IV

HISTORY

Begin developing Sparc – 1984
First Sparc Processor – 1986
SuperSparc – 1992
UltraSparc I – 1995
UltraSparc II – 1997
UltraSparc III – 2001
UltraSparc IV – 2004
UltraSparc IV+ – 2005
UltraSparc T1 – 2005

Page 5: UltraSparc IV

WHAT IS NEW?

What is new in UltraSparc IV:
  CMT (Chip Multithreading)
  New registers added for the CMT enhancement
  MCU registers and Sun Fireplane Interconnect registers are shared.
  Enhancements to the Floating Point Unit
  16 MB L2 cache with 128-byte line size, shared by two processors.
  L2 cache uses an LRU replacement strategy
  New write-cache index-hashing feature

Page 6: UltraSparc IV

Chip Multithreading (CMT)

Two UltraSparc III cores on one die.

The two mirrored cores share:
  System bus
  DRAM controller
  Off-die L2 cache
  Fireplane registers.

Also called Chip Multiprocessing

Page 7: UltraSparc IV

Chip Multithreading

Page 8: UltraSparc IV

Chip Multithreading

The aim is to increase performance without increasing clock speed.

Mirroring the cores causes a hot spot at the floating point units.

How to avoid the hot spot: heat towers in the copper interconnect

Page 9: UltraSparc IV

Chip Multithreading

Page 10: UltraSparc IV

Core

More core improvements:
  Improved instruction fetch and store bandwidth.
  Improved data prefetching
  FPU can handle more unexpected and underflow cases, reducing exceptions.

On-die cache enhanced with a hashed index to better handle multiple writes.

Page 11: UltraSparc IV

Pipeline

Because UltraSparc IV contains two UltraSparc III cores, it uses the same pipeline.

4-way superscalar architecture.
14-stage pipeline.
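
As a rough illustration of what these two figures imply, the sketch below computes the ideal cycle count for a stream of instructions on a 4-wide, 14-stage in-order pipeline. It assumes every group is full and that there are no stalls, cache misses, or mispredictions, so it is a best-case bound and not a measurement. The function name `ideal_cycles` is hypothetical.

```python
import math

ISSUE_WIDTH = 4       # 4-way superscalar (from the slide)
PIPELINE_DEPTH = 14   # 14-stage pipeline (from the slide)


def ideal_cycles(n_instructions: int) -> int:
    """Best-case cycles to complete n instructions (full groups, no stalls)."""
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / ISSUE_WIDTH)
    # The first group needs PIPELINE_DEPTH cycles to pass through all stages;
    # each later group completes one cycle after the previous one.
    return PIPELINE_DEPTH + groups - 1


if __name__ == "__main__":
    for n in (1, 4, 400):
        print(n, "instructions ->", ideal_cycles(n), "cycles")
```

For long instruction streams the pipeline depth is amortized and the bound approaches one group of four instructions per cycle.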

Page 12: UltraSparc IV

Pipeline Stages

Page 13: UltraSparc IV

Pipeline Stages

Stage  Definition
A      Address Generation
P      Preliminary Fetch
F      Fetch Instructions from I-Cache
B      Branch Target Computation
I      Instruction Group Formation
J      Grouping
R      Register Access
E      Execute
C      Cache
M      Miss Detect
W      Write
X      Extend
T      Trap
D      Done
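
The stage order in the table can be made concrete with a small sketch that prints which stage each instruction group occupies on each cycle. It assumes one stage per cycle and no stalls, which the slides do not state; `stage_at` and `timeline` are hypothetical helper names.

```python
# Stage letters and definitions, in the order given in the table above.
STAGES = [
    ("A", "Address Generation"),
    ("P", "Preliminary Fetch"),
    ("F", "Fetch Instructions from I-Cache"),
    ("B", "Branch Target Computation"),
    ("I", "Instruction Group Formation"),
    ("J", "Grouping"),
    ("R", "Register Access"),
    ("E", "Execute"),
    ("C", "Cache"),
    ("M", "Miss Detect"),
    ("W", "Write"),
    ("X", "Extend"),
    ("T", "Trap"),
    ("D", "Done"),
]


def stage_at(start_cycle, cycle):
    """Stage letter occupied at `cycle` by a group that entered A at `start_cycle`."""
    idx = cycle - start_cycle
    if 0 <= idx < len(STAGES):
        return STAGES[idx][0]
    return None  # not in the pipeline on this cycle


def timeline(n_groups):
    """One row per group; '.' marks cycles where the group is not in the pipeline."""
    total_cycles = len(STAGES) + n_groups - 1
    return [
        "".join(stage_at(g, c) or "." for c in range(total_cycles))
        for g in range(n_groups)
    ]


if __name__ == "__main__":
    for row in timeline(3):
        print(row)
```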

Page 14: UltraSparc IV

Pipeline Stages

Page 15: UltraSparc IV

Pipeline Stages

Stage A: Address Generation
  Generates and selects the fetch address
  Address can be selected from several sources
Stage P: Preliminary Fetch
  Starts fetching from the I-Cache
  Accesses the Branch Predictor
Stage F: Fetch
  Second half of the I-Cache access
  At the end of the stage, 4 instructions may be latched
Stage B: Branch Target Computation
  Analyzes the instructions
  Calculates the branch target address

Page 16: UltraSparc IV

Pipeline Stages

Stage I: Instruction Group Formation
  Instructions are grouped into the instruction queue.
Stage J: Instruction Group Staging
  A group of instructions is dequeued and sent to the R-stage
Stage R: Dispatch and Register Access
  Dependency calculation
  Dependency resolution

Page 17: UltraSparc IV

Pipeline Stages

Stage E: Integer Instruction Execution
  First stage of the execution pipelines
  Integer instructions -> A0 and A1 pipelines
  Branch instructions -> Branch pipeline
  Other instructions -> MS pipeline
Stage C: Cache
  Integer pipelines write results back
  SIU results are produced
  First stage for floating point instructions

Page 18: UltraSparc IV

Pipeline Stages

Stage M: Miss
  Data cache misses are determined
  Second step for FP instructions
Stage W: Write
  MS pipeline results are written
  Third step for FP instructions
  D-cache miss requests are sent to the L2 cache
Stage X: Extend
  Final step for floating point instructions
  Results from FP instructions are ready for bypass

Page 19: UltraSparc IV

Pipeline Stages

Stage T: Trap
  Traps are signalled
  After a trap, instructions invalidate their results
Stage D: Done
  Integer results are written into the architectural register file
  Floating point results are written to the floating point register file.
  Results become visible to any traps generated from younger instructions.

Page 20: UltraSparc IV

Pipeline Rules

Grouping rules:
  Group: a collection of instructions that do not prevent each other from executing in parallel
  Groups are formed before the R-stage
  Needed because:
    The execution order is maintained
    Each pipeline runs a subset of instructions
    Instructions may require helpers

Execution order: in-order execution
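
The grouping idea can be sketched as a greedy, in-order pass over the instruction stream. This is a simplified model, not the processor's actual rule set: it assumes a group holds at most four instructions, uses only the pipelines named on the Stage E slide (two integer, one branch, one MS), and treats a register written earlier in the group as a conflict. Helpers are ignored. The names `Instr`, `fits`, and `form_groups` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Pipelines available per instruction class (assumed from the Stage E slide):
# integer -> A0 and A1, branch -> Branch pipeline, other -> MS pipeline.
SLOTS = {"int": 2, "branch": 1, "other": 1}
MAX_GROUP = 4  # assumed from the 4-way superscalar design


@dataclass(frozen=True)
class Instr:
    name: str
    kind: str                      # "int", "branch" or "other"
    dest: Optional[str] = None     # register written, if any
    srcs: Tuple[str, ...] = ()     # registers read


def fits(group: List[Instr], ins: Instr) -> bool:
    """True if `ins` can join `group` without limiting parallel execution."""
    if len(group) >= MAX_GROUP:
        return False
    # Each pipeline runs only a subset of instructions.
    if sum(1 for g in group if g.kind == ins.kind) >= SLOTS[ins.kind]:
        return False
    written = {g.dest for g in group if g.dest is not None}
    if written & set(ins.srcs):          # reads a value produced in this group
        return False
    if ins.dest is not None and ins.dest in written:  # writes the same register
        return False
    return True


def form_groups(program: List[Instr]) -> List[List[Instr]]:
    """Split the program into groups while keeping the original order."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in program:
        if not fits(current, ins):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        Instr("add1", "int", "r1", ("r2", "r3")),
        Instr("sub", "int", "r4", ("r1", "r5")),   # needs r1 from add1
        Instr("ld", "other", "r6", ("r7",)),
        Instr("add2", "int", "r8", ("r9", "r10")),
        Instr("bne", "branch"),
    ]
    for group in form_groups(program):
        print([ins.name for ins in group])
```

Because the pass never reorders instructions, a dependent instruction simply starts the next group, which matches the in-order execution noted above.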

Page 21: UltraSparc IV

Cache Organization

Doubled cache size because of dual core.
  Data Cache: 64 KB x 2
  Instruction Cache: 32 KB x 2
  L2 Cache: 16 MB, off-chip, shared
  No L3 Cache

Page 22: UltraSparc IV

Cache Organization

Page 23: UltraSparc IV

Cache Organization

Data Cache
  64 KB Level 1 cache per core

Instruction Cache
  32 KB Level 1 cache per core
  4-way associative

Page 24: UltraSparc IV

Cache Organization

Prefetch Cache
  One of the L1 caches
  2 KB SRAM: 32 x 64 bytes
  Uses LRU replacement algorithm
  Aim is to fetch data before it is needed
  Reduces main memory access latency
  2 ports read 8 bytes, 1 port writes 16 bytes per cycle.
  Hardware prefetch
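
A minimal sketch of a prefetch cache with the dimensions above (32 lines of 64 bytes, LRU replacement) is shown below. It assumes a fully associative organization for simplicity; the slide does not give the associativity, and the class and method names are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64   # from the slide: 32 x 64 bytes
NUM_LINES = 32    # 32 * 64 = 2048 bytes = 2 KB


class PrefetchCache:
    """Toy LRU prefetch cache; assumed fully associative."""

    def __init__(self):
        self.lines = OrderedDict()  # line number -> True, least recent first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _line(addr):
        return addr // LINE_BYTES

    def prefetch(self, addr):
        """Bring the line containing `addr` into the cache before it is used."""
        line = self._line(addr)
        if line in self.lines:
            self.lines.move_to_end(line)       # refresh recency
            return
        if len(self.lines) >= NUM_LINES:
            self.lines.popitem(last=False)     # evict least recently used line
        self.lines[line] = True

    def load(self, addr):
        """Return True on a hit, False on a miss (no fill on miss in this sketch)."""
        line = self._line(addr)
        if line in self.lines:
            self.lines.move_to_end(line)
            self.hits += 1
            return True
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = PrefetchCache()
    for i in range(33):                 # one line more than the cache holds
        cache.prefetch(i * LINE_BYTES)
    print(cache.load(0))                # line 0 was the LRU victim
    print(cache.load(32 * LINE_BYTES))  # most recently prefetched line
```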

Page 25: UltraSparc IV

Cache Organization

Write Cache
  Reduces the bandwidth consumed by store traffic
  2 KB cache
  Handles multiprocessor and on-chip cache consistency
  Improves error recovery
  Optionally uses a hashed index
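
To show why a hashed index can help with multiple writes, the sketch below compares a plain index with an XOR-folded one. The slides do not specify the hash function or the line size, so both are assumptions here: 64-byte lines (giving 32 entries in 2 KB) and an XOR of the low index bits with the next-higher address bits. The function names are hypothetical.

```python
LINE_BYTES = 64                        # assumed; not given on the slide
NUM_ENTRIES = 2048 // LINE_BYTES       # 2 KB cache -> 32 entries under that assumption
INDEX_BITS = NUM_ENTRIES.bit_length() - 1  # 5 bits select one of 32 entries


def plain_index(addr):
    """Index taken directly from the low bits of the line number."""
    return (addr // LINE_BYTES) % NUM_ENTRIES


def hashed_index(addr):
    """Illustrative hash: XOR the low index bits with the next-higher bits."""
    line = addr // LINE_BYTES
    low = line % NUM_ENTRIES
    high = (line >> INDEX_BITS) % NUM_ENTRIES
    return low ^ high


if __name__ == "__main__":
    # Stores with a 2 KB stride all map to the same entry with plain indexing.
    addrs = [k * 2048 for k in range(8)]
    print(sorted({plain_index(a) for a in addrs}))
    print(sorted({hashed_index(a) for a in addrs}))
```

With the plain index, every store in the strided pattern competes for one entry; the hashed variant spreads them over eight entries in this model.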

Page 26: UltraSparc IV

Cache Organization

L2 Cache
  16 MB SRAM shared by two processors
  Separate L2 cache tags
  Two-way set associative
  LRU replacement policy
  128-byte line size

UltraSparc IV+ has an on-die Level 2 cache with an off-die Level 3 cache
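
The L2 figures above determine how an address splits into tag, set index, and line offset. The sketch below does that arithmetic for a 16 MB, two-way, 128-byte-line cache. It treats the L2 as a single cache and says nothing about how the two cores actually divide it; `cache_geometry` and `split_address` are hypothetical helper names.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Derive line count, set count, and address field widths."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr, line_bytes, sets):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


if __name__ == "__main__":
    geo = cache_geometry(16 * 1024 * 1024, 128, 2)
    print(geo)
    tag, index, offset = split_address(0x12345678, 128, geo["sets"])
    print(hex(tag), hex(index), hex(offset))
```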

Page 27: UltraSparc IV

Branch Prediction

Branch Predictor:
  Small SRAM accessed in a single cycle
  Output is connected to the P-stage

Branch determination is made in the B-stage
On a misprediction, fetch returns to the A-stage.
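
The slide only says the predictor is a small single-cycle SRAM, so the sketch below is a generic illustration and not the UltraSparc IV design: a table of 2-bit saturating counters indexed by program counter bits. The table size, the indexing, and the names `TwoBitPredictor` and `count_mispredictions` are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0-1 predict not taken, 2-3 taken)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [1] * entries  # start at "weakly not taken"

    def _index(self, pc):
        # SPARC instructions are 4 bytes, so the low two PC bits carry no information.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Feed actual branch outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1  # in the pipeline this would redirect fetch to the A-stage
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # A loop branch taken nine times, then falling through once.
    loop = [True] * 9 + [False]
    print(count_mispredictions(TwoBitPredictor(), 0x1000, loop))
```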

Page 28: UltraSparc IV

Conclusion

UltraSparc IV is a milestone, as it is the first dual-core chip of the UltraSparc family

Sun continues to develop UltraSparc:
  UltraSparc IV+
  UltraSparc T1

Page 31: UltraSparc IV

Questions...