A few issues on the design of future multicores
André Seznec
IRISA/INRIA
André Seznec, CAPS project-team, IRISA/INRIA
Single Chip Uniprocessor: the end of the road
(Very) wide-issue superscalar processors are not cost-effective:
More than quadratic complexity on many key components:
• Register file
• Bypass network
• Issue logic
Limited performance return
Failure of the Alpha EV8: the end of very wide-issue superscalar processors
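A rough way to see why complexity outruns performance as issue width grows. This is a sketch under common first-order assumptions that are not taken from the talk: an issue-width-W core needs about 2W read and W write register-file ports, register-file area grows with the square of the port count, and a full bypass network lets every result reach every source-operand input. The function names are illustrative.

```python
def register_file_ports(width: int) -> int:
    """Assumed port count: 2 read ports + 1 write port per instruction issued per cycle."""
    return 3 * width


def relative_rf_area(width: int, base_width: int = 1) -> float:
    """Register-file area relative to a `base_width` machine, assuming area grows
    with the square of the port count (each port adds a wordline and a bitline per cell)."""
    return (register_file_ports(width) / register_file_ports(base_width)) ** 2


def bypass_paths(width: int, bypass_stages: int = 1) -> int:
    """Full bypass network: each of `width` results from each bypass stage can
    feed each of the 2 * `width` source-operand inputs."""
    return width * (2 * width) * bypass_stages


for w in (2, 4, 8):
    print(f"issue width {w}: {register_file_ports(w)} RF ports, "
          f"{relative_rf_area(w):.0f}x RF area, {bypass_paths(w)} bypass paths")
```

Under these assumptions, going from 4-wide to 8-wide doubles peak issue but quadruples both register-file area and bypass paths. A wider machine also needs more physical registers to keep more instructions in flight, which pushes register-file growth beyond W², while the performance return stays limited.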
Hardware thread parallelism
High-end single-chip components: chip multiprocessors (CMPs):
• IBM POWER5, dual-core Intel Pentium 4, dual-core AMD Athlon 64
• Many CMP SoCs for embedded markets
• Cell
(Simultaneous) multithreading:
• Pentium 4, POWER5
• Multithreading
Thread parallelism
Expressed by the application developer:
• depends on the application itself
• depends on the programming language or paradigm
• depends on the programmer
Discovered by the compiler: Automatic (static) parallelization
Exploited by the runtime: Task scheduling
Dynamically discovered/exploited by hardware or software: Speculative hardware/software threading
Direction of (single-chip) architecture: betting on the success of parallelism
If (future) applications are intrinsically parallel: as many simple cores as possible
If (future) applications are only moderately parallel: a few complex, state-of-the-art superscalar cores
SSC: Sea of Simple Cores
FCC: Few Complex Cores
SSC: Sea of Simple Cores
FCC: Few Complex Cores
[Diagram: several 4-way out-of-order (O-O-O) superscalar cores sharing an L3 cache]
Common architectural design issues
Instruction Set Architecture
Single ISA? An extension of “conventional” multiprocessors
• Shared or distributed memory?
Heterogeneous ISAs:
• À la Cell? (master processor + slave processors) × N
• À la SoC? Specialized coprocessors
• A radically new architecture? Which one?
Hardware accelerators?
SIMD extensions: seem to be accepted; they shift the burden to application developers and compilers
Reconfigurable datapaths: popular when you have a well-defined, intrinsically parallel application
Vector extensions: might be the right move when targeting mainly scientific computing
On-chip memory/processors/memory bandwidth
The uniprocessor credo was:
“Use the remaining silicon for caches”
New issue: an extra processor or more cache?
Extra processing power = increased memory bandwidth demand, increased power consumption, more temperature hot spots
Extra cache = decreased (external) memory demand
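The core-versus-cache tradeoff above can be made concrete with a toy model. This sketch assumes a power-law miss rate (the "doubling the cache divides misses by √2" rule of thumb, exponent 0.5), a fixed die-area budget in which each core costs one unit and the leftover area becomes cache shared equally among cores, and equal access rates per core. All parameter values are hypothetical.

```python
def miss_rate(cache_mb: float, base_miss: float = 0.05, base_mb: float = 1.0,
              alpha: float = 0.5) -> float:
    """Power-law miss-rate model (assumed): m(C) = base_miss * (C / base_mb) ** -alpha.
    alpha = 0.5 is the rule of thumb 'doubling the cache divides misses by sqrt(2)'."""
    return base_miss * (cache_mb / base_mb) ** (-alpha)


def offchip_traffic(cores: int, area_budget: float = 16.0, core_area: float = 1.0,
                    mb_per_area: float = 1.0, accesses_per_core: float = 1.0) -> float:
    """Relative off-chip traffic when the area not used by cores becomes cache,
    split equally among the cores (all parameters are hypothetical)."""
    cache_area = area_budget - cores * core_area
    if cache_area <= 0:
        raise ValueError("no area left for cache")
    cache_per_core_mb = cache_area * mb_per_area / cores
    return cores * accesses_per_core * miss_rate(cache_per_core_mb)


for n in (4, 8, 12):
    print(f"{n:2d} cores: relative off-chip traffic {offchip_traffic(n):.3f}")
```

Each extra core both adds demand and shrinks every core's cache share, so in this model going from 4 to 8 cores multiplies off-chip traffic by about 3.5 rather than 2.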
Memory hierarchy organization?
Flat: sharing a big L2/L3 cache?
[Diagram: 12 cores (μP), each with a private cache ($), all sharing one L3 cache]
Flat: communication issues? Through the big cache
[Diagram: 12 cores (μP) with private caches ($) communicating through the shared L3 cache]
Flat: communication issues? Grid-like?
[Diagram: 12 cores (μP) with private caches ($) and a shared L3 cache, communicating over a grid-like network]
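For the grid-like option, communication cost depends on distance. The slides do not specify a routing scheme, so this sketch assumes dimension-ordered (XY) routing, a common choice for 2D meshes, on a hypothetical 4 × 3 grid matching the 12 cores in the figure, and counts how many links a message crosses.

```python
from itertools import product


def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: move along x first, then y.
    Returns the list of (x, y) nodes visited, including src and dst."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def average_hops(cols: int, rows: int) -> float:
    """Mean number of links crossed over all ordered pairs of distinct nodes."""
    nodes = list(product(range(cols), range(rows)))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(len(xy_route(s, d)) - 1 for s, d in pairs) / len(pairs)


print(xy_route((0, 0), (3, 2)))
print(f"average hops on a 4x3 mesh: {average_hops(4, 3):.2f}")
```

Under this assumption a message crosses about 2.33 links on average, and traffic between neighbours crosses only one, whereas communicating through the shared cache always involves the cache, however close the two cores are.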
Hierarchical organization?
[Diagram: four clusters of two cores (μP) with private caches ($); each cluster shares an L2 cache, and the four L2 caches share an L3 cache]
Hierarchical organization?
Arbitration at all levels
Coherency at all levels
Interleaving at all levels
Bandwidth dimensioning
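Two of the mechanisms listed above, interleaving and arbitration, must be replicated at every level of a hierarchy. This sketch shows one common choice for each, under stated assumptions: line-granularity address interleaving across cache banks (64-byte lines, an assumed value) and round-robin arbitration among the cores sharing a cache. The names are illustrative, not taken from the talk.

```python
from typing import List, Optional

LINE_SIZE = 64  # bytes per cache line (assumed)


def bank_of(address: int, num_banks: int, line_size: int = LINE_SIZE) -> int:
    """Line-interleaved mapping: consecutive cache lines go to consecutive banks."""
    return (address // line_size) % num_banks


class RoundRobinArbiter:
    """Grants at most one requester per cycle; the search starts just after the
    previous winner, so every active requester is eventually served."""

    def __init__(self, num_requesters: int):
        self.n = num_requesters
        self.last = num_requesters - 1  # requester 0 has priority on the first cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # nobody requested this cycle


# Two cores sharing one L2 cache, both requesting every cycle.
arbiter = RoundRobinArbiter(2)
print([arbiter.grant([True, True]) for _ in range(4)])  # [0, 1, 0, 1]
# Consecutive lines spread across 4 L2 banks.
print([bank_of(a, 4) for a in range(0, 5 * LINE_SIZE, LINE_SIZE)])  # [0, 1, 2, 3, 0]
```

Interleaving spreads consecutive lines over banks so that streaming accesses do not all queue on one bank, and round-robin keeps one busy core from starving its neighbour; the bandwidth of each level still has to be dimensioned for the number of requesters behind it.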
NoC structure
Very dependent on the memory hierarchy organization!
+ shared coprocessors/hardware accelerators
+ I/O buses (processors?)
+ memory interface
+ network interface
Example
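[Diagram: three clusters of two cores (μP) with private caches ($), each cluster sharing an L2 cache; the L2 caches share an L3 cache connected to the memory interface and I/O]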
Multithreading?
An extra level of thread parallelism!
Might be an interesting alternative to prefetching on massively parallel applications
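Why multithreading can stand in for prefetching: instead of fetching data early, the core runs other threads while a miss is outstanding. This sketch uses a classic analytical model of a multithreaded processor (an assumption, not from the talk): each thread runs `run_length` cycles between long-latency misses, a miss costs `latency` cycles, and switching threads costs `switch_cost` cycles.

```python
import math


def utilization(threads: int, run_length: float, latency: float,
                switch_cost: float = 0.0) -> float:
    """Fraction of cycles doing useful work when the core switches to another
    thread on every long-latency miss. Each thread runs `run_length` cycles
    between misses; each miss takes `latency` cycles to service."""
    linear = threads * run_length / (run_length + switch_cost + latency)
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


def threads_to_hide(run_length: float, latency: float, switch_cost: float = 0.0) -> int:
    """Smallest thread count that reaches the saturated utilization."""
    return math.ceil(1 + latency / (run_length + switch_cost))


# Hypothetical values: a miss every 50 cycles, 200-cycle memory latency.
for n in (1, 3, threads_to_hide(50, 200)):
    print(f"{n} thread(s): utilization {utilization(n, 50, 200):.2f}")
```

With these values a single thread keeps the core busy only 20% of the time, while five threads hide the latency entirely. This only works when the application supplies enough threads, which is why the idea fits massively parallel applications.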
Power and thermal issues
Voltage/frequency scaling to adapt to the workload?
Adapting the workload to the available power?
Adapting/dimensioning the architecture to the power budget
Activity migration for managing temperatures?
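Voltage/frequency scaling pays off because dynamic power depends on the square of the supply voltage. This sketch uses the standard CMOS switching-power formula P = a·C·V²·f and ignores leakage; it assumes, hypothetically, that voltage can be lowered in proportion to frequency, which holds only within a limited range on a given process.

```python
def dynamic_power(activity: float, capacitance: float, voltage: float,
                  frequency: float) -> float:
    """Classic CMOS switching-power model: P = a * C * V^2 * f (leakage ignored)."""
    return activity * capacitance * voltage ** 2 * frequency


def scaled_power(base_power: float, freq_ratio: float, volt_ratio: float) -> float:
    """Dynamic power after scaling frequency and voltage by the given ratios."""
    return base_power * volt_ratio ** 2 * freq_ratio


# One core at nominal voltage/frequency, normalised to 1.0.
one_fast = scaled_power(1.0, 1.0, 1.0)
# Two cores, each at 80% frequency, assuming voltage can drop to 80% as well.
two_slow = 2 * scaled_power(1.0, 0.8, 0.8)
print(f"power: {one_fast:.3f} vs {two_slow:.3f}; peak throughput: 1.0 vs {2 * 0.8:.1f}")
```

Under these assumptions, two slower cores deliver 1.6 times the peak throughput for roughly the same dynamic power, provided the workload has enough parallelism; this is one way of dimensioning the architecture to a power budget.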
General issues for software/compiler
Parallelism detection and partitioning: find the correct granularity
Mastering memory bandwidth
Non-uniform memory latency
Optimizing sequential code portions
SSC-specific design issues
Basic core granularity
RISC cores
VLIW cores
In-order superscalar cores
Homogeneous vs. heterogeneous ISAs
Core specialization:
• RISC + VLIW or DSP slaves?
• A master core + a set of special-purpose cores?
Sharing issue
Simple cores: a lot of duplication, and many resources unused at any given time
Adjacent cores can share:
• caches
• functional units: FP, mult/div, multimedia
• hardware accelerators
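Sharing works because each simple core uses a given unit only occasionally. This sketch estimates the cost and benefit of sharing a single-issue unit (such as an FP unit) among adjacent cores. It assumes each core requests the unit independently with probability p per cycle and ignores the fact that a losing request retries on the next cycle, so it is a first-order estimate only; the 10% request rate is hypothetical.

```python
def conflict_probability(p: float, sharers: int) -> float:
    """Probability that two or more of `sharers` cores want a shared unit in the
    same cycle, assuming each requests it independently with probability p."""
    none = (1 - p) ** sharers
    exactly_one = sharers * p * (1 - p) ** (sharers - 1)
    return max(0.0, 1 - none - exactly_one)  # guard against tiny negative rounding error


def unit_utilization(p: float, sharers: int) -> float:
    """Fraction of cycles in which the shared unit receives at least one request."""
    return 1 - (1 - p) ** sharers


# Hypothetical: each core issues an FP operation in 10% of its cycles.
for k in (1, 2, 4):
    print(f"{k} sharer(s): unit busy {unit_utilization(0.1, k):.2f}, "
          f"conflict {conflict_probability(0.1, k):.4f}")
```

With two sharers the unit is busy almost twice as often while conflicts occur in only about 1% of cycles, which is the kind of tradeoff that makes sharing between pairs of adjacent cores attractive.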
An example of sharing
[Diagram: two pairs of cores (μP); each pair shares an FP unit, a DL1 cache, an instruction fetch unit and an IL1 cache; both pairs share a hardware accelerator and an L2 cache]
Multithreading/prefetching
Multithreading: is the extra complexity worth it for simple cores?
Prefetching: is it worth it? Sharing prefetch engines?
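To make the "sharing prefetch engines?" question concrete, here is a minimal stride prefetcher in the style of a reference prediction table. It is a sketch, not the author's design: entries are keyed by a hypothetical (core_id, pc) pair so that one engine can serve several cores, the table is unbounded (real hardware would have a small, fixed number of entries), and the degree and confidence threshold are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class StridePrefetcher:
    """Minimal PC-indexed stride prefetcher (reference-prediction-table style).
    Entries are keyed by (core_id, pc), so one engine can serve several cores."""

    def __init__(self, degree: int = 1, threshold: int = 2):
        self.table: Dict[Tuple[int, int], StrideEntry] = {}
        self.degree = degree        # how many strides ahead to prefetch
        self.threshold = threshold  # stride confirmations needed before prefetching

    def access(self, core_id: int, pc: int, addr: int) -> List[int]:
        """Record a demand access and return the addresses to prefetch."""
        key = (core_id, pc)
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.threshold)
        else:  # new or broken stride: retrain
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + k * stride for k in range(1, self.degree + 1)]
        return []


pf = StridePrefetcher()
print([pf.access(0, 0x40, a) for a in (0, 64, 128, 192, 256)])
# [[], [], [], [256], [320]]
```

Whether sharing pays off then depends on how much table capacity the cores compete for, which this unbounded sketch deliberately leaves out.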
Vision of an SSC (my own vision)
SSC: the basic brick
[Diagram: four pairs of cores (μP), each pair sharing an FP unit, a data cache (D $) and an instruction cache (I $); the four pairs share an L2 cache]
[Diagram: the full SSC chip; four basic bricks (16 core pairs, 32 cores in total), each with its own L2 cache, share an L3 cache; the chip also has a memory interface, a network interface and a system interface]
FCC-specific design issues
Only limited thread parallelism available?
Focus on uniprocessor architecture:
• find the correct tradeoff between complexity and performance
• power and temperature issues
Vector extensions?
• Contiguous vectors (à la SSE)?
• Strided vectors in L2 caches (Tarantula-like)?
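The difference between the two vector options shows up in how many cache lines a single vector access touches. This sketch assumes 8-byte elements and 64-byte cache lines (both hypothetical values): a contiguous SSE-style vector fits in one or two lines, while a strided vector spreads over many, which is one motivation for serving strided vectors close to the L2 cache rather than through the L1.

```python
def lines_touched(base: int, stride_elems: int, length: int,
                  elem_size: int = 8, line_size: int = 64) -> int:
    """Distinct cache lines touched by one vector access of `length` elements,
    `stride_elems` elements apart, starting at byte address `base`."""
    return len({(base + i * stride_elems * elem_size) // line_size
                for i in range(length)})


for stride in (1, 2, 8):
    print(f"stride {stride}: {lines_touched(0, stride, 8)} line(s) for 8 elements")
```

An aligned contiguous 8-element vector touches one line, whereas a stride of one line per element touches eight, so the memory system must handle eight separate line accesses for a single vector operation.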
Performance enablers
SMT for parallel workloads?
Helper threads? Run-ahead threads
Speculative multithreading hardware support
Intermediate design?
SSCs: shine on massively parallel applications
Poor/limited performance on sequential sections
FCCs: Moderate performance on parallel applications
Good performance on sequential sections
Amdahl’s law
Mix of FCC and SSC
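Amdahl's law gives the speedup of N cores on a program whose fraction f is parallel: S = 1 / ((1 − f) + f/N). To see why a mix of FCC and SSC can win, this sketch borrows a later analytical model that is not part of the talk (Hill and Marty's "Amdahl's law in the multicore era"): a budget of n simple-core equivalents, a complex core built from r of them runs √r times faster (Pollack's rule, an assumption), sequential code runs on the fastest core, and parallel code runs on all cores. The 90% parallel fraction and the budget of 64 are hypothetical.

```python
import math


def perf(r: float) -> float:
    """Assumed Pollack's rule: a core built from r simple-core equivalents
    is sqrt(r) times faster than one simple core."""
    return math.sqrt(r)


def amdahl(f: float, n: int) -> float:
    """Amdahl's law: speedup on n identical cores when a fraction f is parallel."""
    return 1.0 / ((1.0 - f) + f / n)


def symmetric(f: float, n: int, r: int) -> float:
    """n / r identical cores, each built from r of the n budget units."""
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric(f: float, n: int, r: int) -> float:
    """One complex core using r budget units plus (n - r) simple cores: sequential
    code runs on the complex core, parallel code on all cores together."""
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


f, n = 0.9, 64  # hypothetical: 90% parallel code, budget of 64 simple cores
print(f"SSC, 64 simple cores:        {symmetric(f, n, 1):.2f}")
print(f"FCC, 4 complex cores:        {symmetric(f, n, 16):.2f}")
print(f"mix, 1 complex + 48 simple:  {asymmetric(f, n, 16):.2f}")
```

Under these assumptions the speedups are about 8.8 (SSC), 12.3 (FCC) and 23.6 (mix): the complex core accelerates the sequential sections that limit SSCs, while the simple cores supply the parallel throughput that FCCs lack. The exact numbers depend entirely on the assumed f and perf(r).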
The basic brick
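[Diagram: one “ultimate” out-of-order superscalar core and two pairs of simple cores (μP), each pair sharing an FP unit, a data cache (D $) and an instruction cache (I $); all of them share an L2 cache]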
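[Diagram: the full chip; four basic bricks, each combining an ultimate out-of-order core (Ult. O-O-O) and two pairs of simple cores with their data (D $) and instruction (I $) caches around an L2 cache, share an L3 cache; the chip also has a memory interface, a network interface and a system interface]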
Conclusion
The era of the uniprocessor has come to an end
No clear trend yet for what comes next
It might be time for more architectural diversity