![Page 1: Niagara: A 32-Way Multithreaded Sparc Processorcgi.di.uoa.gr/~charnik/files/niagara.pdf · 2008-07-04 · Niagara: A 32-Way Multithreaded Sparc Processor Poonacha Kongetira, Kathirgamar](https://reader034.vdocuments.us/reader034/viewer/2022042022/5e7a20e040d6da658e06f1dd/html5/thumbnails/1.jpg)
Niagara: A 32-Way Multithreaded Sparc Processor
Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Sun Microsystems
Charalampos S. [email protected]
Department of Informatics and Telecommunications
25 June 2008
- Goal
  - Architectural Goal
  - Goal's characteristics
  - Sun's Approach
- Niagara
  - Niagara Overview
  - Sparc Pipeline
  - Thread scheduling
  - Integer Register File
  - Memory Subsystem
- Performance
  - Power Consumption
Architectural Goal
Provide:
- high performance for commercial server applications
- low levels of power consumption
Goal’s characteristics
Commercial server applications tend to have:
- Low ILP
  - high cache miss rates (large working sets, poor locality)
  - many unpredictable branches
  - frequently undetectable load-load dependencies
  - => memory access time limits performance
- High TLP
  - large numbers of parallel client requests
- High power consumption
  - 400 to 700 W/ft² for racked server clusters at Google
Sun’s Approach
- UltraSPARC T1 processor (Niagara 1)
- Avoids high-latency communication between multiprocessors (SMP)
- Multicore approach (cores aggregated on a single die)
- Fine-grain multithreading within each core
- Small L1 cache per core
- L2 cache shared by all cores
Niagara Overview
- 8 cores
  - 4 threads per core
  - 1 pipeline (Sparc pipeline) per core
  - 2 L1 caches (instruction/data) per core, shared by the 4 threads
  - thread scheduling per core
- 3-Mbyte L2 cache
  - 4-way banked and pipelined for high bandwidth
  - 12-way set-associative to minimize conflict misses
  - shared by all threads
- crossbar interconnect with up to 200 GB/s of bandwidth
  - connects the Sparc pipes with the L2 cache banks and other shared resources
  - provides a port for accessing the I/O subsystem
  - uses an age-based priority scheme
- 4 channels of DDR2 DRAM
  - bandwidth of up to 20 GB/s
  - capacity of up to 128 GB
Niagara Processor
Sparc pipe
Single-issue pipeline with six stages: Fetch, Select, Decode, Execute, Memory, Write Back
Unique resources per thread:
- set of registers
- instruction buffer
- store buffer
Shared resources among threads:
- L1 cache
- translation look-aside buffers (TLBs: ITLB, DTLB)
- ALU, divider, multiplier, shifter
Sparc pipe block diagram
Thread scheduling (1/2)
Policy based on:
- LRU status
- instruction type
- cache misses
- traps
- resource conflicts
- speculative loads
Figure: Thread selection: all threads available
Thread scheduling (2/2)
Figure: Thread selection: only two threads available (0, 1)
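The selection policy above can be illustrated with a small model. This is a minimal sketch, not the actual select logic: it assumes that the least recently used available thread issues each cycle and that a thread issuing speculatively (for example, behind a load assumed to hit) only issues when no non-speculative thread is available. The class name `ThreadSelect` and its interface are hypothetical.

```python
from collections import deque


class ThreadSelect:
    """Toy model of per-core thread selection (hypothetical interface)."""

    def __init__(self, num_threads=4):
        # Front of the deque is the least recently used thread.
        self.lru = deque(range(num_threads))

    def select(self, available, speculative=frozenset()):
        """Return the thread id to issue this cycle, or None if none can issue.

        available:   set of thread ids that are not stalled this cycle
        speculative: subset of thread ids that could only issue speculatively
        """
        # Assumption: non-speculative threads are tried first,
        # speculative ones only as a fallback.
        for group in (available - speculative, available & speculative):
            tid = next((t for t in self.lru if t in group), None)
            if tid is not None:
                # Mark the chosen thread as most recently used.
                self.lru.remove(tid)
                self.lru.append(tid)
                return tid
        return None


if __name__ == "__main__":
    sel = ThreadSelect()
    print([sel.select({0, 1, 2, 3}) for _ in range(6)])  # all threads available
    sel = ThreadSelect()
    print([sel.select({0, 1}) for _ in range(4)])        # only threads 0 and 1
```

With all four threads available the model rotates through them in order; with only threads 0 and 1 available it alternates between the two, which matches the two cases shown in the figures.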
Integer Register File
- One register file per thread
- A register file consists of 8 windows, each of which consists of 8 ins, 8 outs and 8 locals registers
- A window corresponds to a procedure call
- The windows of two consecutive procedure calls overlap: the outs of the caller are the ins of the callee
- Only one window is active at a time
- Reads/writes take a single cycle
Figure: Integer register file per thread
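The window overlap can be made concrete by mapping a window number and an architectural register number to a slot in one thread's windowed register storage. This is a sketch under stated assumptions: the register numbering (outs r8-r15, locals r16-r23, ins r24-r31) follows the usual SPARC convention, while the slot layout, the choice that a call moves to the next higher window, and the names `window_offset` and `physical_index` are illustrative and do not describe Niagara's physical array. The global registers r0-r7 are not windowed and are left out.

```python
NWINDOWS = 8                     # windows per thread, as on the slide
WINDOWED_REGS = NWINDOWS * 16    # each window adds 8 locals + 8 non-shared regs


def window_offset(r):
    """Offset of architectural register r inside its window (assumed layout)."""
    if 24 <= r <= 31:            # ins occupy slots 0..7
        return r - 24
    if 16 <= r <= 23:            # locals occupy slots 8..15
        return r - 8
    if 8 <= r <= 15:             # outs occupy slots 16..23 (= ins of next window)
        return r + 8
    raise ValueError("r0-r7 are globals and not windowed; r must be in 8..31")


def physical_index(cwp, r):
    """Slot of register r when window cwp is the active one."""
    return (cwp * 16 + window_offset(r)) % WINDOWED_REGS


if __name__ == "__main__":
    caller, callee = 3, 4        # assumption: a call activates the next window
    # The caller's first out (r8) and the callee's first in (r24) share a slot.
    print(physical_index(caller, 8), physical_index(callee, 24))
    # Locals are private to each window.
    print(physical_index(caller, 16), physical_index(callee, 16))
```

Because only the locals and one set of shared registers are new in each window, the 8 windows need 128 slots in this model rather than 8 x 24.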
Memory Subsystem
- 16 KB L1 instruction cache
  - 4-way set-associative with a line (block) size of 32 bytes
  - random replacement scheme for area savings
- 8 KB L1 data cache
  - 4-way set-associative with a line size of 16 bytes
  - write-through policy (allocate on loads, no-allocate on stores)
L2 cache:
- maintains a sharers list at L1-line granularity
- stores do not update the L1 caches until they have updated the L2 cache
- copy-back policy (write back dirty lines, drop clean lines)
The L1 caches have a miss rate of about 10%. The multiple threads per core hide the latencies of L1 and L2 misses.
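The L1 parameters above determine the number of sets and the address bits used for the line offset and the set index. The following sketch does that arithmetic; the helper name `cache_geometry` is hypothetical, and it assumes power-of-two sizes and a conventional offset/index/tag address split, which the slides do not spell out.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity and line size yield power-of-two values.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    # L1 instruction cache: 16 KB, 4-way, 32-byte lines
    print(cache_geometry(16 * 1024, 4, 32))
    # L1 data cache: 8 KB, 4-way, 16-byte lines
    print(cache_geometry(8 * 1024, 4, 16))
```

Both caches come out at 128 sets: the instruction cache is twice as large but also uses lines twice as long.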
Power Consumption Performance
Niagara's power dissipation ranges from 60 to 72 W, with a peak of 75 W.
Figure: Power consumption of various processors
References
P. Kongetira, K. Aingaran, K. Olukotun, Niagara: A 32-Way Multithreaded SPARC Processor, IEEE Micro, March-April 2005, pp. 21-29.
Wikipedia, Comparison of power consumption of some nearly modern CPUs, http://en.wikipedia.org/wiki/CPU_power_dissipation#Intel_processors, 2006.
The End
Thank you!