design and management of 3d cmp’s using network-in-memory feihui li et.al. penn state university...
![Page 1: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/1.jpg)
Design and Management of 3D CMP’s using Network-in-Memory
Feihui Li et al., Penn State University
(ISCA – 2006)
![Page 2: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/2.jpg)
News..
![Page 3: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/3.jpg)
Moral of the story…
• 3D technology helps in reducing wire delays
– Exploit it in as many ways as you can!
– They chose L2 caches
• Also, 3D leads to on-chip hotspots.
– Arrange units intelligently, reduce localized hotspots.
![Page 4: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/4.jpg)
Major Results/Contributions
• First 3D CMP design space exploration
• Proposal of 3D NUCA L2 caches for CMP’s.
– Comparison with the existing 2D counterparts.
– 3D works better even without data migration
• Proposal of NoC’s as a method of communication between L2 banks.
– “Efficiently exploit fast vertical interconnects”
![Page 5: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/5.jpg)
Basics…
Typical Network-on-Chip architecture
Major types of integration
![Page 6: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/6.jpg)
Proposed: 3D Network-in-Memory
[Figure labels: Processing Element (L2 cache bank or CPU); NIC; Router (R); b-bit links; Pillar node; Single-Stage Router; Input Buffer; Output Buffer; NoC/Bus Interface; b-bit dTDMA Bus (Communication Pillar), orthogonal to slide]
dTDMA Bus (Dynamic Time-Division Multiple Access)
![Page 7: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/7.jpg)
The dTDMA Bus as the Communication Pillar
• Use dTDMA bus (VLSID 2006)
– Efficient/fast bus
– Small area/power overhead
• Do not use multi-hop for vertical communication
– Vertical distance is so small
[Figure labels: 1500 um; 10~100 um; layers; Router; dTDMA Bus Arbiter]
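A minimal sketch of the "dynamic" part of dTDMA, under the assumption that the arbiter hands out timeslots only to layers that currently have data to send, so no bus cycle is wasted on an idle layer. The function name `dtdma_schedule` and the flit labels are hypothetical; the slide itself gives no arbitration details.

```python
from collections import deque


def dtdma_schedule(queues):
    """Return (cycle, layer, flit) tuples for a shared vertical bus.

    `queues` maps a layer id to the flits it wants to send.
    Assumption: each frame holds one slot per layer that is active
    when the frame starts, so the frame shrinks as layers go idle.
    """
    pending = {layer: deque(flits) for layer, flits in queues.items()}
    schedule = []
    cycle = 0
    while any(pending.values()):
        # Only layers with queued flits get a slot in this frame.
        active = [layer for layer in sorted(pending) if pending[layer]]
        for layer in active:
            schedule.append((cycle, layer, pending[layer].popleft()))
            cycle += 1
    return schedule


if __name__ == "__main__":
    for slot in dtdma_schedule({0: ["a", "b"], 1: [], 2: ["c"]}):
        print(slot)
```

With three layers and one of them idle, the three flits go out in three consecutive cycles; a fixed three-slot frame would leave the idle layer's slot empty in each frame.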
![Page 8: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/8.jpg)
Proposals (1)
• Inter-die “communication pillars”
• Integration of dTDMA buses and NoC routers for a fast communication interface
– Typical NoC fails due to
• increased complexity
• contention issues
• increased power/area overhead
• multi-hop vertical comm.
![Page 9: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/9.jpg)
3D Benefit: Increased Locality
[Figure legend: CPU; nodes within 1 hop; nodes within 2 hops; nodes within 3 hops; dTDMA pillar; 2D vicinity; 3D vicinity]
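The locality benefit can be checked with a small count. This is a sketch under simplifying assumptions that are not from the slides: the CPU sits at the centre of layer 0 on a pillar node, in-layer distance is Manhattan distance on a mesh, and the pillar is a bus, so reaching any other layer costs exactly one hop. The function name `nodes_within` and the mesh sizes are illustrative.

```python
def nodes_within(hops, width, height, layers=1):
    """Count nodes reachable within `hops` of a CPU at the mesh centre.

    Assumptions: the CPU is on layer 0 at a pillar node, and the
    vertical bus reaches any other layer in a single hop.
    """
    cx, cy = width // 2, height // 2
    count = 0
    for z in range(layers):
        vertical = 0 if z == 0 else 1  # one bus hop to any other layer
        for x in range(width):
            for y in range(height):
                if abs(x - cx) + abs(y - cy) + vertical <= hops:
                    count += 1
    return count - 1  # do not count the CPU's own node


if __name__ == "__main__":
    # Same 256 nodes arranged as one 16x16 layer or four 8x8 layers.
    for h in (1, 2, 3):
        print(h, nodes_within(h, 16, 16), nodes_within(h, 8, 8, layers=4))
```

For the same 256 nodes, the four-layer arrangement puts 27 nodes within two hops of the CPU, against 12 in the single-layer mesh.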
![Page 10: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/10.jpg)
Proposals (2)
• Cannot increase # of pillars arbitrarily
– Depends on via density
– Router complexity
• So, CPU’s share pillars
– Stacking of CPU’s also has to be considered
• CPU placement algorithm
– Stack CPU’s across dies so as to
• Maintain decent access hop-count
• Manage thermal profile
![Page 11: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/11.jpg)
CPU placement example
Placing CPU’s this way, so that none sits directly on top of another, helps to solve the localized hotspot problem.
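A rough sketch of the constraint described here, not the paper's placement algorithm: CPU’s are spread over the layers, no two CPU’s share the same vertical column, and each new CPU goes to the free column farthest (in Manhattan distance) from those already placed. The function name `place_cpus`, the greedy rule, and the grid sizes are all assumptions made for illustration.

```python
from itertools import product


def place_cpus(num_cpus, width, height, layers):
    """Greedy placement returning a list of (x, y, layer) tuples.

    Assumptions: one CPU per vertical column at most, layers are
    assigned round-robin, and distance is Manhattan in the footprint.
    """
    columns = list(product(range(width), range(height)))
    if num_cpus > len(columns):
        raise ValueError("more CPUs than vertical columns")
    placed = []
    for i in range(num_cpus):
        used = {(x, y) for x, y, _ in placed}
        free = [c for c in columns if c not in used]

        def spread(c):
            # Distance from column c to the nearest CPU placed so far.
            return min((abs(c[0] - x) + abs(c[1] - y) for x, y, _ in placed),
                       default=0)

        best = max(free, key=spread)
        placed.append((best[0], best[1], i % layers))
    return placed


if __name__ == "__main__":
    for cpu in place_cpus(8, 4, 4, 2):
        print(cpu)
```

A real placement would also weigh the access hop-count to the pillars and a thermal model; this sketch only enforces the no-stacking rule and spreads the CPU’s apart.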
![Page 12: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/12.jpg)
![Page 13: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/13.jpg)
3D L2 Caches
• Clusters
– Cache banks + tag array
– Some clusters have CPU’s, others don’t.
Cache Management
• Search
• Placement & Replacement
• Cache Line Migration
![Page 14: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/14.jpg)
L2 Cache Management
![Page 15: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/15.jpg)
Simulation Environment
• Simics + in-house NoC simulator
• All CPU’s issue in-order
– 8 CPU’s, SPARC ISA
– Directory based protocol for coherence between L1’s and the L2
• HS3d for temperature modeling
• 64 MB and 32 MB L2 caches
![Page 16: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/16.jpg)
Performance
[Chart: IPC (axis 0 to 3.5) for ammp, apsi, art, equake, fma3d, galgel, mgrid, swim, wupwise; series: CMP-DNUCA, CMP-DNUCA-3D, CMP-SNUCA-3D]
![Page 17: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/17.jpg)
Important Results
![Page 18: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/18.jpg)
Important Results (2)
Impact of # of “pillars” on access latency
![Page 19: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/19.jpg)
Important Results (3)
![Page 20: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/20.jpg)
Final Word
• 3D is feasible & scalable… and has arrived.
• Localized hotspots can be solved by placing hotter units apart.
• Power savings + performance gain even without data migration
– No numbers to support the claim(!)
– Would that help the temperature issue as well?
![Page 21: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/21.jpg)
Potential HPCA Submission
• An evaluation of temperature and IPC for a single core 3D processor
• Leverage clustered architectures for “temperature aware” processor designs.
– Basic premise: Stacking cooler units (caches) on top of hotter units
• Better thermal profile of processor
![Page 22: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/22.jpg)
Proposals
[Figure: Arch 1, Arch 2, Arch 3; labels: Cache bank, Cluster]
![Page 23: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/23.jpg)
Proposals (2)
• Cache banks (both data and instruction) are
– 2-way word-interleaved, or,
– Replicated
• Present study done for 8-cluster architecture
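As a quick illustration of 2-way word interleaving: consecutive words alternate between the two cache banks, so the bank is chosen by the low bit of the word index. The 8-byte word size and the name `bank_for` are assumptions for this sketch; the slides do not state the word size.

```python
WORD_BYTES = 8  # assumed word size, not given in the slides


def bank_for(addr, num_banks=2, word_bytes=WORD_BYTES):
    """Return the bank holding the word that contains byte address `addr`."""
    return (addr // word_bytes) % num_banks


if __name__ == "__main__":
    for addr in (0, 8, 12, 16, 24):
        print(hex(addr), "-> bank", bank_for(addr))
```

In the replicated alternative, every bank holds a copy, so no such mapping is needed for reads.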
![Page 24: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/24.jpg)
Results (Performance)
2-way word interleaved caches
![Page 25: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/25.jpg)
Results (Performance)
Replicated caches
![Page 26: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/26.jpg)
Traffic Analysis
[Chart: Number of Accesses (axis 0 to 25,000,000) per benchmark, Arch 1. Benchmarks: ammp, applu, apsi, art, bzip2, crafty, eon, equake, fma3d, galgel, gap, gcc, gzip, lucas, mcf, mesa, mgrid, parser, swim, twolf, vortex, vpr, wupwise. Legend: RINGHOPCOUNT TOTALD2DHOPCOUNT INTERCLUSTER RINGHOP FOR CACHE]
![Page 27: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/27.jpg)
Traffic Analysis (2)
[Chart: Number of Accesses (axis 0 to 25,000,000) per benchmark, Arch 2. Benchmarks: ammp, applu, apsi, art, bzip2, crafty, eon, equake, fma3d, galgel, gap, gcc, gzip, lucas, mcf, mesa, mgrid, parser, swim, twolf, vortex, vpr, wupwise. Legend: RINGHOPCOUNT TOTALD2DHOPCOUNT INTERCLUSTER RINGHOP FOR CACHE]
![Page 28: Design and Management of 3D CMP’s using Network-in-Memory Feihui Li et.al. Penn State University (ISCA – 2006)](https://reader035.vdocuments.us/reader035/viewer/2022062717/56649e445503460f94b38035/html5/thumbnails/28.jpg)
Results (Thermal)
[Chart: Peak Temp of Hottest Unit (C), axis 0 to 400; series: BASE, ARCH 1, ARCH 2]