UltraSPARC T2 Processor
NIAGARA 2: The first “system-on-a-chip”
Maurizio Primo Mauri - matr. 720145
Sandro Gattuso - matr. 711522
Instruction Level Parallelism (ILP)
• Common techniques to improve performance:
➡ Deep pipelines and multiple instruction issue
➡ Speculation and out-of-order execution
• Consequences:
➡ Complex processor design
➡ Poor pipeline efficiencies and high power consumption
Niagara: a different approach
• Multithreading: the instruction stream is divided into several smaller sub-streams, known as threads (or strands).
• Thread Level Parallelism (TLP):
➡ Stays within a limited power envelope
➡ Good aggregate (throughput) performance
➡ Runs at a lower clock frequency
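A rough way to see why many simple threads can keep a core busy is an idealized utilization model. The sketch below assumes (hypothetically) that every thread alternates a fixed number of cycles of useful issue with a fixed stall, such as a cache miss, and that instructions from other threads fill those stalls perfectly; `core_utilization` and its parameters are illustrative names, not part of any Sun tool.

```python
def core_utilization(num_threads: int, run_cycles: float, stall_cycles: float) -> float:
    """Idealized fraction of issue slots a single-issue core keeps busy.

    Assumption: each thread alternates `run_cycles` of issuing with
    `stall_cycles` of waiting. Alone, a thread uses run / (run + stall) of
    the slots; with cycle-by-cycle switching, other threads fill its stalls
    until the pipeline saturates at 1.0.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread")
    single_thread = run_cycles / (run_cycles + stall_cycles)
    return min(1.0, num_threads * single_thread)


for n in (1, 2, 4, 8):
    print(n, core_utilization(n, run_cycles=10, stall_cycles=30))
```

With 10 run cycles and 30 stall cycles, one thread uses only a quarter of the issue slots, while four or more threads saturate the pipeline in this model. The same arithmetic shows why throughput falls when threads become scarce.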
Niagara 1
• Each core supports 4 threads with 16KB I-cache and 8KB D-cache
• A single FPU shared by all cores
• 4 shared L2 cache banks
• 4 integrated DDR2 memory controllers
Disadvantages of Niagara 1
• The single shared FPU is an obvious weakness and a potential bottleneck;
• Software should be heavily threaded and none of the threads should be speed-critical;
• Inefficient when threads become scarce;
• Optimized to run unmodified SPARC V9 code
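The FPU bottleneck can be illustrated with a simple demand/supply estimate. This is a hypothetical back-of-the-envelope model: it assumes every core issues at a fixed rate, that a fixed fraction of instructions are floating point, and that each FP unit accepts a fixed number of operations per cycle. The inputs in the example are made up for illustration, not measured on Niagara 1, and `fp_demand_served` is a hypothetical helper.

```python
def fp_demand_served(cores: int, issue_rate: float, fp_fraction: float,
                     fp_units: int, ops_per_unit: float = 1.0) -> float:
    """Fraction of requested FP operations the FP unit(s) can accept per cycle."""
    demand = cores * issue_rate * fp_fraction  # FP ops requested per cycle
    supply = fp_units * ops_per_unit           # FP ops accepted per cycle
    if demand == 0:
        return 1.0
    return min(1.0, supply / demand)


# Hypothetical workload: 8 cores, 1 instruction/cycle each, 25% FP instructions.
shared = fp_demand_served(cores=8, issue_rate=1.0, fp_fraction=0.25, fp_units=1)
per_core = fp_demand_served(cores=8, issue_rate=1.0, fp_fraction=0.25, fp_units=8)
print(f"one shared FPU: {shared:.0%} of FP demand served")
print(f"one FP unit per core: {per_core:.0%} of FP demand served")
```

With these made-up inputs, one shared unit serves only half of the requested FP work, while one unit per core (as Niagara 2 provides with its per-core FGU) covers all of it.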
Niagara 2 improvements
• Remedies the poor floating-point performance of Niagara 1
• 8 cores with an upgraded core design
• 8 threads per core (up from 4), for 64 threads per chip
• Addition of a floating-point/graphics unit (FGU) to each core
• Significant upgrade of the in-core asynchronous cryptographic coprocessor
Niagara 2 Core
• Full hardware support for 8 independent threads, partitioned into 2 groups of 4 (each group has its own integer pipeline)
• 16KB 8-way set-associative L1 I-cache
• 8KB 4-way set-associative L1 D-cache
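The number of sets in each L1 cache follows from size / (ways × line size). The line sizes used below (32 bytes for the I-cache, 16 bytes for the D-cache) are assumptions not stated on this slide, and `cache_geometry` is a hypothetical helper.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Return the set count and address-bit split of a set-associative cache."""
    for value in (size_bytes, ways, line_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes must be positive powers of two")
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }


# Line sizes are assumptions; sizes and associativity come from the slide.
icache = cache_geometry(16 * 1024, ways=8, line_bytes=32)
dcache = cache_geometry(8 * 1024, ways=4, line_bytes=16)
print("I-cache:", icache)
print("D-cache:", dcache)
```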
Niagara 2 Microarchitecture
• Threads are switched on a cycle-by-cycle basis using a least-recently-issued priority scheme.
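The least-recently-issued policy can be modeled per thread group as a queue ordered by last issue time. This is a behavioral sketch only, assuming each of the 2 thread groups picks at most one ready thread per cycle; real hardware implements the priority in logic rather than a list, and the class name is hypothetical.

```python
from typing import Iterable, Optional, Set


class LeastRecentlyIssuedPicker:
    """Picks one ready thread per cycle, favoring the one that issued least recently."""

    def __init__(self, thread_ids: Iterable[int]) -> None:
        self.order = list(thread_ids)  # front = least recently issued

    def pick(self, ready: Set[int]) -> Optional[int]:
        for tid in self.order:
            if tid in ready:
                self.order.remove(tid)
                self.order.append(tid)  # this thread is now the most recently issued
                return tid
        return None  # no ready thread in this group: the slot is lost


group0 = LeastRecentlyIssuedPicker([0, 1, 2, 3])
group1 = LeastRecentlyIssuedPicker([4, 5, 6, 7])
ready_per_cycle = [
    {0, 1, 2, 3, 4, 5, 6, 7},
    {0, 2, 3, 4, 5, 6, 7},  # thread 1 stalled, e.g. waiting on a miss
    {0, 1, 2, 3, 4, 5, 6, 7},
    {0, 1, 2, 3, 4, 5, 6, 7},
]
picks = [(group0.pick(r), group1.pick(r)) for r in ready_per_cycle]
print(picks)
```

When thread 1 stalls, its group simply issues from the next candidate, and thread 1 keeps its top priority for the following cycle because it has not issued recently.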
Niagara 2 Pipeline
• Adds a new pipeline stage, “pick”
• 8-stage integer pipeline, including:
➡ Memory (data translation, tag/data array access)
➡ Bypass (late way select, data formatting, data forwarding)
• 12-stage floating-point pipeline:
➡ Longer pipeline for divide/sqrt
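The ideal flow of instructions through the 8-stage integer pipeline can be drawn as a timing diagram. The slide names only the Pick, Memory and Bypass stages; the full stage order used below (Fetch, Cache, Pick, Decode, Execute, Memory, Bypass, Writeback) is assumed here, and the sketch ignores stalls and thread switching. `stage_at` and `diagram` are hypothetical helpers.

```python
from typing import List, Optional

# Assumed stage order; only Pick, Memory and Bypass appear on the slide.
STAGES = ["Fetch", "Cache", "Pick", "Decode", "Execute", "Memory", "Bypass", "Writeback"]


def stage_at(start_cycle: int, cycle: int) -> Optional[str]:
    """Stage occupied at `cycle` by an instruction that entered Fetch at `start_cycle`."""
    k = cycle - start_cycle
    return STAGES[k] if 0 <= k < len(STAGES) else None


def diagram(num_instructions: int) -> List[str]:
    """One row per instruction, one 2-letter cell per cycle (no stalls)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = [(stage_at(i, c) or "")[:2].ljust(2) for c in range(total_cycles)]
        rows.append(f"i{i}: " + " ".join(cells))
    return rows


for row in diagram(3):
    print(row)
```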
Niagara 2 Architecture
• 8 cores
• 4 MB L2 cache in 8 banks, 16-way set-associative
• 4 on-chip DRAM controllers
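With 8 banks, each bank holds 512 KB, and addresses must be spread across the banks. The sketch below assumes 64-byte lines and that consecutive lines are interleaved across banks using the address bits just above the line offset; this mapping is an assumption for illustration, and `l2_location` is a hypothetical helper.

```python
from typing import Tuple

CACHE_BYTES = 4 * 1024 * 1024  # 4 MB total (from the slide)
NUM_BANKS = 8                  # from the slide
WAYS = 16                      # from the slide
LINE_BYTES = 64                # assumption: L2 line size

SETS_PER_BANK = CACHE_BYTES // (NUM_BANKS * WAYS * LINE_BYTES)


def l2_location(paddr: int) -> Tuple[int, int]:
    """Map a physical address to (bank, set index), assuming line-interleaved banks."""
    line = paddr // LINE_BYTES                        # drop the byte-in-line offset
    bank = line % NUM_BANKS                           # low line-address bits pick the bank
    set_index = (line // NUM_BANKS) % SETS_PER_BANK   # next bits pick the set in that bank
    return bank, set_index


for addr in (0x0000, 0x0040, 0x0080, 0x0200, 0x0240):
    print(hex(addr), l2_location(addr))
```

Interleaving consecutive lines this way lets a streaming access pattern visit all 8 banks in turn instead of queuing on a single bank.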