UltraSPARC T2 Processor - Politecnico di Milano

UltraSPARC T2 Processor

NIAGARA 2 The first “system-on-a-chip”

Maurizio Primo Mauri - matr. 720145
Sandro Gattuso - matr. 711522



Instruction Level Parallelism (ILP)

• Common techniques to improve performance:

➡ Deep pipelines and Multiple instruction issue

➡ Speculation and Out-of-order execution

• Consequences:

➡ Complex processor design

➡ Poor pipeline efficiencies and high power consumption


Niagara: a different approach

• Multithreading: dividing the instruction stream into several smaller sub-streams, known as threads (or strands).

• Thread Level Parallelism (TLP):

➡ Power envelope

➡ Good performance

➡ Lower frequency
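A rough way to see why many simple threads can keep a pipeline busy, even at a lower clock frequency, is a back-of-the-envelope utilization model. The sketch below assumes each thread alternates between `compute` cycles of useful work and `stall` cycles waiting on memory, and that switching threads costs nothing. The function name and the example cycle counts are illustrative assumptions, not figures from the slides.

```python
def pipeline_utilization(threads: int, compute: int, stall: int) -> float:
    """Fraction of cycles in which some thread has work to issue.

    Idealized model (an assumption, not a measured result): every thread
    does `compute` cycles of work, then waits `stall` cycles, and thread
    switches are free.
    """
    if threads < 1 or compute < 1 or stall < 0:
        raise ValueError("threads and compute must be >= 1, stall >= 0")
    return min(1.0, threads * compute / (compute + stall))


if __name__ == "__main__":
    # Hypothetical workload: 10 cycles of work per 30-cycle memory stall.
    for n in (1, 2, 4, 8):
        print(n, pipeline_utilization(n, compute=10, stall=30))
```

In this model a single thread leaves the pipeline idle three quarters of the time, while four threads are enough to cover the stalls completely; adding more threads beyond that point does not raise utilization further.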


Niagara 1

• Each core supports 4 threads with 16KB I-cache and 8KB D-cache

• 1 Shared FPU

• 4 L2 Shared Cache Banks

• 4 integrated DDR2 memory controllers


Disadvantages of Niagara 1

• FPU is an obvious weakness and potential bottleneck;

• Software should be heavily threaded and none of the threads should be speed-critical;

• Inefficient when threads become scarce;

• Optimized to run unmodified SPARC V9 code


Niagara 2 improvements

• Remedies the poor floating-point performance of Niagara 1

• 8 cores with an upgraded core design

• 8 threads per core (two groups of four, one group per integer pipeline)

• Addition of a floating-point/graphics unit (FGU) to each core

• Significant upgrade of the in-core asynchronous cryptographic coprocessor


Niagara 2 Core

• Full HW support for execution of 8 independent threads, partitioned into 2 groups

• 16KB 8-way set-associative L1 I-cache

• 8KB 4-way set-associative L1 D-cache


Niagara 2 MicroArchitecture

• Threads are switched on a cycle-by-cycle basis, using a least-recently-issued priority scheme.
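The least-recently-issued policy can be sketched in software as follows. This is a minimal illustration, not the hardware implementation: the class name, the use of a queue, and the idea of passing in a set of "ready" thread IDs each cycle are all assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Optional


class LeastRecentlyIssuedPicker:
    """Pick one ready thread per cycle, favoring the one that issued longest ago."""

    def __init__(self, num_threads: int) -> None:
        # Front of the deque = least recently issued thread.
        self._order = deque(range(num_threads))

    def pick(self, ready: Iterable[int]) -> Optional[int]:
        """Return the ready thread that issued least recently, or None."""
        ready_set = set(ready)
        chosen = next((t for t in self._order if t in ready_set), None)
        if chosen is not None:
            # The chosen thread becomes the most recently issued.
            self._order.remove(chosen)
            self._order.append(chosen)
        return chosen


if __name__ == "__main__":
    picker = LeastRecentlyIssuedPicker(4)  # one hypothetical group of 4 threads
    # Threads 0 and 2 are stalled; 1 and 3 alternate.
    print([picker.pick({1, 3}) for _ in range(3)])
    # Once every thread is ready again, thread 0 has waited longest.
    print(picker.pick({0, 1, 2, 3}))
```

Because a thread moves to the back of the queue only when it actually issues, a stalled thread keeps its priority and is picked as soon as it becomes ready, so no thread is starved.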


Niagara 2 Pipeline

• Adds a new pipeline stage, “pick”

• 8-stage integer pipeline:

➡ Memory (data translation, access tag/data array)

➡ Bypass (late way select, data formatting, data forwarding)

• 12-stage floating-point pipeline:

➡ Longer pipeline for divide/sqrt


Niagara 2 Architecture

• 8 cores

• 8 banks of 16-way set-associative L2 cache, 4 MB in total

• 4 on-chip DRAM controllers
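The L2 figures above translate into a per-bank geometry with a little arithmetic. The sketch below assumes a 64-byte cache line and that consecutive lines are interleaved across banks; neither detail is stated in the slides, and the function names are hypothetical.

```python
def cache_sets(total_bytes: int, banks: int, ways: int, line_bytes: int) -> int:
    """Number of sets in each bank of a banked set-associative cache."""
    bytes_per_bank, rem = divmod(total_bytes, banks)
    if rem:
        raise ValueError("cache size must divide evenly across banks")
    sets, rem = divmod(bytes_per_bank, ways * line_bytes)
    if rem:
        raise ValueError("bank size must be a multiple of ways * line_bytes")
    return sets


def bank_of(addr: int, banks: int, line_bytes: int) -> int:
    """Bank index under the assumed line-granularity interleaving."""
    return (addr // line_bytes) % banks


if __name__ == "__main__":
    LINE = 64  # assumed line size in bytes, not taken from the slides
    print(cache_sets(4 * 1024 * 1024, banks=8, ways=16, line_bytes=LINE))
    print(bank_of(0x1C0, banks=8, line_bytes=LINE))
```

Under these assumptions each of the 8 banks holds 512 KB, organized as 512 sets of 16 ways. Spreading neighbouring lines over different banks lets several cores access the L2 in the same cycle without contending for one bank.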


Niagara 1 vs Niagara 2
