Niagara: A 32-Way Multithreaded Sparc Processor

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Sun Microsystems

Charalampos S. Nikolaou
charnik@di.uoa.gr

Department of Informatics and Telecommunications

25 June 2008

Goal
    Architectural Goal
    Goal’s characteristics
    Sun’s Approach

Niagara
    Niagara Overview
    Sparc Pipeline
    Thread scheduling
    Integer Register File
    Memory Subsystem

Performance
    Power Consumption

Architectural Goal

Provide:

- high performance for commercial server applications
- low levels of power consumption

Goal’s characteristics

Commercial server applications tend to have:

- Low ILP
    - high cache miss rates (large working sets, poor locality)
    - many unpredictable branches
    - frequent, undetectable load-load dependencies
    - => memory access time limits performance
- High TLP
    - large numbers of parallel client requests
- High power consumption
    - 400 to 700 W per square foot for racked server clusters at Google

Sun’s Approach

- UltraSPARC T1 processor (Niagara 1)
- Avoids high-latency communication between multiprocessors (SMP)
- Multicore approach (cores aggregated on a single die)
- Fine-grain multithreading within each core
- Small L1 caches per core
- L2 cache shared by all cores

Niagara Overview

- 8 cores
    - 4 threads per core
    - 1 pipeline (Sparc pipeline) per core
    - 2 L1 caches (instruction/data) per core, shared by the 4 threads
    - thread scheduling per core
- 3-Mbyte L2 cache
    - 4-way banked and pipelined for high bandwidth
    - 12-way set-associative to minimize conflict misses
    - shared by all threads
- crossbar interconnect with up to 200 GB/s of bandwidth
    - connects the Sparc pipes with the L2 cache banks and other shared resources
    - provides a port for accessing the I/O subsystem
    - uses an age-based priority scheme
- 4 channels of DDR2 DRAM
    - maximum bandwidth up to 20 GB/s
    - capacity up to 128 GB
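The figures above can be combined with a little arithmetic. The sketch below is only illustrative: it assumes "3-Mbyte" means 3 x 1024 x 1024 bytes, and the even per-bank, per-channel and per-thread splits are a back-of-the-envelope assumption, not a statement about how the hardware partitions its resources.

```python
# Figures taken from the overview; the even splits below are assumptions.
CORES = 8
THREADS_PER_CORE = 4
L2_BYTES = 3 * 1024 * 1024   # assumes "3-Mbyte" is a binary megabyte count
L2_BANKS = 4
DRAM_CHANNELS = 4
DRAM_PEAK_GBPS = 20.0        # peak bandwidth across all channels

# 8 cores x 4 threads gives the "32-way" in the title.
total_threads = CORES * THREADS_PER_CORE

# Capacity of one L2 bank if the cache is split evenly across banks.
l2_kib_per_bank = L2_BYTES // L2_BANKS // 1024

# Nominal share of L2 per hardware thread (the cache is actually shared).
l2_kib_per_thread = L2_BYTES / total_threads / 1024

# Peak DRAM bandwidth per channel and nominal share per thread.
dram_gbps_per_channel = DRAM_PEAK_GBPS / DRAM_CHANNELS
dram_gbps_per_thread = DRAM_PEAK_GBPS / total_threads

print(total_threads, l2_kib_per_bank, l2_kib_per_thread)
print(dram_gbps_per_channel, dram_gbps_per_thread)
```

Under these assumptions the chip exposes 32 hardware threads, each L2 bank holds 768 KiB, and each thread's nominal share is 96 KiB of L2 and 0.625 GB/s of peak DRAM bandwidth.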

Niagara Processor

Sparc pipe

Single-issue pipeline with six stages: Fetch, Select, Decode, Execute, Memory, Write Back

Unique resources per thread:

- set of registers
- instruction buffer
- store buffer

Shared resources among threads:

- L1 caches
- translation lookaside buffers (ITLB, DTLB)
- ALU, divider, multiplier, shifter

Sparc pipe block diagram

Thread scheduling (1/2)

Policy based on:

- LRU status
- instruction type
- cache misses
- traps
- resource conflicts
- speculative loads

Figure: Thread selection: all threads available

Thread scheduling (2/2)

Figure: Thread selection: only two threads available (0, 1)
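The two thread-selection cases can be sketched with a toy model. This is a minimal sketch that models only the LRU part of the policy: the `ready` set stands in for every other condition on the list (instruction type, misses, traps, resource conflicts, speculative loads), and the class and function names are hypothetical.

```python
from collections import deque


class ThreadSelector:
    """Toy select stage: pick the least recently used thread that is ready."""

    def __init__(self, n_threads=4):
        # Front of the deque is the least recently selected thread.
        self.lru = deque(range(n_threads))

    def select(self, ready):
        """Return the chosen thread id, or None if no thread is ready."""
        for tid in list(self.lru):
            if tid in ready:
                # Move the chosen thread to the most-recently-used end.
                self.lru.remove(tid)
                self.lru.append(tid)
                return tid
        return None  # nothing to issue this cycle


def run(selector, ready_per_cycle):
    """Apply the selector once per cycle and collect the chosen thread ids."""
    return [selector.select(ready) for ready in ready_per_cycle]


all_ready = run(ThreadSelector(), [{0, 1, 2, 3}] * 8)
two_ready = run(ThreadSelector(), [{0, 1}] * 4)
print(all_ready)
print(two_ready)
```

With all four threads ready the model rotates through 0, 1, 2, 3 on successive cycles; with only threads 0 and 1 ready it alternates between those two.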

Integer Register File

- One register file per thread
- A register file consists of 8 windows, each consisting of 8 Ins, 8 Locals and 8 Outs registers
- A window corresponds to a procedure call
- The windows of a caller and its callee overlap: the Outs of the caller are the Ins of the callee
- Only one window is active at a time
- Reads/writes take a single cycle

Figure: Integer register file per thread
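The window overlap can be illustrated with a small model. This is a sketch under simplifying assumptions: global registers and window overflow/underflow handling are left out, the direction in which the window pointer moves on a call is chosen arbitrarily, and all names are hypothetical.

```python
N_WINDOWS = 8
GROUP = 8  # registers in each of the Ins, Locals and Outs groups


class WindowedRegisterFile:
    """Toy model of one thread's overlapping register windows."""

    def __init__(self):
        # Each shared bank is the Outs of one window and the Ins of the next.
        self.shared = [[0] * GROUP for _ in range(N_WINDOWS)]
        self.local_banks = [[0] * GROUP for _ in range(N_WINDOWS)]
        self.cwp = 0  # current window pointer: the single active window

    def ins(self):
        return self.shared[self.cwp]

    def outs(self):
        return self.shared[(self.cwp + 1) % N_WINDOWS]

    def local_regs(self):
        return self.local_banks[self.cwp]

    def call(self):
        # Procedure call: the caller's Outs become the callee's Ins.
        self.cwp = (self.cwp + 1) % N_WINDOWS

    def ret(self):
        # Procedure return: switch back to the caller's window.
        self.cwp = (self.cwp - 1) % N_WINDOWS


rf = WindowedRegisterFile()
rf.outs()[0] = 42        # caller places an argument in its Outs
rf.local_regs()[0] = 7   # caller's private Local
rf.call()
arg_seen_by_callee = rf.ins()[0]
callee_local = rf.local_regs()[0]
rf.ins()[1] = 99         # callee leaves a result in its Ins
rf.ret()
result_seen_by_caller = rf.outs()[1]
print(arg_seen_by_callee, callee_local, result_seen_by_caller)
```

Arguments and results pass between caller and callee without any copying, because both windows name the same physical registers, while each window's Locals stay private.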

Memory Subsystem

- 16 KB L1 instruction cache
    - 4-way set-associative with a line (block) size of 32 bytes
    - random replacement scheme for area savings
- 8 KB L1 data cache
    - 4-way set-associative with a line size of 16 bytes
    - write-through policy (allocate on loads, no-allocate on stores)

L2 cache:

- maintains a sharers list at L1-line granularity
- stores do not update the L1 caches until they have updated the L2 cache
- copy-back policy (writes back dirty lines, drops clean lines)

The L1 caches have a miss rate of about 10%. The four threads per core hide the latencies of L1 and L2 misses.
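How several threads hide miss latency can be seen with a simple idealized model, which is an assumption added here and not taken from the slides: each thread issues for `busy_cycles` and then stalls for `miss_latency` cycles, and the pipeline is busy whenever any thread can issue. The cycle counts in the example are invented for illustration.

```python
def pipeline_utilization(n_threads, busy_cycles, miss_latency):
    """Idealized fraction of cycles in which some thread can issue."""
    # A single thread is runnable for this fraction of the time.
    per_thread = busy_cycles / (busy_cycles + miss_latency)
    # Threads fill each other's stalls until the pipeline saturates.
    return min(1.0, n_threads * per_thread)


# Hypothetical workload: 10 cycles of work between misses, 30-cycle misses.
for n in (1, 2, 4):
    print(n, pipeline_utilization(n, busy_cycles=10, miss_latency=30))
```

In this model a single thread keeps the pipeline busy a quarter of the time, while four threads are enough to cover the stalls completely. Longer miss latencies would need more threads to reach the same utilization.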

Power Consumption

Niagara’s power dissipation ranges from 60 to 72 W, with a peak of 75 W.

Figure: Power consumption of various processors

References

P. Kongetira, K. Aingaran, K. Olukotun, Niagara: A 32-Way Multithreaded SPARC Processor, IEEE Micro, March-April 2005, pp. 21-29.

Wikipedia, Comparison of power consumption of some nearly modern CPUs, http://en.wikipedia.org/wiki/CPU_power_dissipation#Intel_processors, 2006.

The End

Thank you!
