3&. ! 9mu
TRANSCRIPT
![Page 1: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/1.jpg)
A01: Inaba Group, Mary Inaba (稲葉真理)
Graduate School of Information Science and Technology, The University of Tokyo
![Page 2: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/2.jpg)
Co-investigators
Hiroshi Imai (今井浩), Kei Hiraki (平木敬), Reiji Suda (須田礼仁)
![Page 3: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/3.jpg)
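Fusion of Materials Science and Computer Science
• Materials science simulations combine diverse computational methods and qualitatively different computational tasks
• Materials science simulations call for large-scale computations whose cost is so high that current computers cannot actually carry them out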
![Page 4: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/4.jpg)
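Fusion of Materials Science and Computer Science
• [...] and the high-performance computing principles corresponding to each
• Implement diverse solution methods on [...]
• For massively parallel architectures [...]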
![Page 5: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/5.jpg)
Benchmarks for Scientific and Technical Computing
![Page 6: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/6.jpg)
• Dense matrix multiplication: compute bottleneck
• Sparse matrices: memory bottleneck
• FFT: bandwidth bottleneck
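The three bottleneck classes above can be illustrated with a rough arithmetic-intensity estimate (floating-point operations per byte moved). The sketch below is not from the slides. The operation and byte counts are textbook-style assumptions (8-byte reals, 4-byte indices, each operand moved once), the function names are hypothetical, and only memory traffic is counted, since the slide does not say which bandwidth it means for FFT.

```python
import math

BYTES_REAL = 8   # assumption: double precision
BYTES_INDEX = 4  # assumption: 32-bit column indices


def dense_matmul_intensity(n: int) -> float:
    """Flops per byte for an n x n dense matrix product C = A * B."""
    flops = 2 * n ** 3                    # one multiply and one add per inner step
    traffic = 3 * n ** 2 * BYTES_REAL     # read A and B, write C, each once
    return flops / traffic


def sparse_matvec_intensity(nnz: int, n: int) -> float:
    """Flops per byte for y = A * x with A in CSR form (nnz nonzeros, n rows)."""
    flops = 2 * nnz
    traffic = nnz * (BYTES_REAL + BYTES_INDEX)   # values and column indices
    traffic += (n + 1) * BYTES_INDEX             # row pointers
    traffic += 2 * n * BYTES_REAL                # read x, write y (ideal reuse)
    return flops / traffic


def fft_intensity(n: int) -> float:
    """Flops per byte for one complex FFT of length n (n a power of two)."""
    flops = 5 * n * math.log2(n)          # common radix-2 operation count
    traffic = 2 * n * 2 * BYTES_REAL      # read and write n complex values
    return flops / traffic


if __name__ == "__main__":
    n = 4096
    print("dense matmul :", dense_matmul_intensity(n))
    print("sparse matvec:", sparse_matvec_intensity(nnz=7 * n, n=n))
    print("FFT          :", fft_intensity(n))
```

Under these assumptions the intensity of the dense product grows with the matrix size, while the sparse matrix-vector product stays well below one flop per byte and the FFT grows only logarithmically, which is consistent with the classification on the slide.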
![Page 7: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/7.jpg)
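Toward EXAFLOPS
• [...]
  – Computation: compute accelerator
  – Communication: network accelerator
  – Memory operations: memory accelerator
• [...]
  – Optimizing compiler
  – Optimizing hardware compiler
  – Run-time optimization and circuit-selection software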
![Page 8: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/8.jpg)
Computation: Grape-DR accelerator
![Page 9: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/9.jpg)
GRAPE-DR at the Astronomical Observatory (天文台): No. 1 in the world on the Little Green500 list, with 815 MFlops/W on LINPACK
![Page 10: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/10.jpg)
Communication: network testbed
![Page 11: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/11.jpg)
Memory accelerator testbed: Convey HC-1
![Page 12: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/12.jpg)
Grape-DR accelerator
![Page 13: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/13.jpg)
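Goals
• Fine-grained parallelism across thousands of PEs
• Low communication cost
• Small working area
• Keep the pipeline filled
Approach
• Propose a graph that represents data dependences
• Derive constraints from the graph, reduce the problem to 0-1 integer programming, and solve for the optimal data placement and execution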
![Page 14: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/14.jpg)
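Hierarchical memory architecture
• Optimizing data placement and transfer is important
  – A hierarchical memory architecture built from scratchpad memories
• Trade-offs
  – Low bandwidth
  – Small memory capacity
• Software-controlled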
![Page 15: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/15.jpg)
Copy-candidate Graph
![Page 16: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/16.jpg)
Copy-candidate Graph
![Page 17: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/17.jpg)
Data Dependence Graph
• For software pipelining, the iteration distance (Δ) is added
![Page 18: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/18.jpg)
SDDG: Selective Data Dependence Graph [Nakamura, Inaba, Hiraki (2009)]
Combines the Copy-candidate Graph and the Data Dependence Graph
![Page 19: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/19.jpg)
MSCS
minimize the Initiation Interval
subject to
• Data dependency constraints
• Valid copy-candidate constraints
• Precedence constraints
• Resource constraints
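As a rough illustration of the kind of problem this formulation describes, the sketch below searches by brute force for the smallest initiation interval (II) of a tiny software-pipelined loop. It is not the MSCS 0-1 integer program from the slides: it covers only precedence constraints (with iteration distance Δ) and one modulo resource constraint, it omits copy-candidate selection, and it assumes every operation occupies its unit for a single cycle. All operation names, latencies, and unit counts are made up for illustration.

```python
from itertools import product

# Hypothetical loop body: operation -> (latency in cycles, functional unit).
OPS = {
    "load": (2, "mem"),
    "mul": (3, "alu"),
    "add": (1, "alu"),
    "store": (1, "mem"),
}

# Dependence edges (producer, consumer, iteration distance delta).
# delta = 0 stays inside one iteration; delta = 1 crosses to the next one.
EDGES = [
    ("load", "mul", 0),
    ("mul", "add", 0),
    ("add", "store", 0),
    ("add", "add", 1),  # loop-carried accumulation
]

UNITS = {"mem": 1, "alu": 1}  # assumed number of units of each kind


def feasible(start, ii):
    """Check precedence and resource constraints for start times and a given II."""
    for src, dst, delta in EDGES:
        latency = OPS[src][0]
        # The consumer in iteration i + delta starts ii * delta cycles later.
        if start[dst] + ii * delta < start[src] + latency:
            return False
    # Modulo resource constraint: operations sharing a unit kind must not
    # exceed the unit count in any issue slot modulo II.
    for unit, count in UNITS.items():
        for slot in range(ii):
            used = sum(1 for op, (_, u) in OPS.items()
                       if u == unit and start[op] % ii == slot)
            if used > count:
                return False
    return True


def min_initiation_interval(max_ii=8, horizon=12):
    """Return (II, start times) for the smallest feasible II, or None."""
    names = list(OPS)
    for ii in range(1, max_ii + 1):
        for times in product(range(horizon), repeat=len(names)):
            start = dict(zip(names, times))
            if feasible(start, ii):
                return ii, start
    return None


if __name__ == "__main__":
    print(min_initiation_interval())
```

With the assumed numbers the search reports II = 2, because the two ALU operations cannot share a single ALU at II = 1. An integer-programming solver would replace the exhaustive loop for realistic loop bodies.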
![Page 20: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/20.jpg)
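Research collaboration: A01, A03, A02
Takahashi Group: massively parallel computing
Zhang Group: algorithms
Inaba Group: architecture
We will ask about: main algorithms, computational cost and problem size, data structures, etc.
Joint research and development
Toward a computer architecture for first-principles materials design
Organize requirements, propose an architecture, build a prototype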
![Page 21: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/21.jpg)
[Timeline figure, 1991 to 2010. Labels: Double-shift QR, shift QR, FMM, GPU]
![Page 22: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/22.jpg)
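[Formula slide, garbled in extraction. Surviving fragments: a size parameter N; an expression in P with indices m, n, i, annotated O(N^3); an expression in P and g, annotated O(N^2 log N); a parameter M]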
![Page 23: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/23.jpg)
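[Formula slide, garbled in extraction. Surviving fragments: a size parameter N; an expression in P with indices m, n, i, annotated O(N^3); an expression in P and g, annotated O(N^2 log N); a parameter M; the quantity N – m + 1; FMM annotated O(M)]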
![Page 24: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/24.jpg)
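[Formula slide, garbled in extraction. Surviving fragments: a size parameter N; an expression in P with indices m, n, i, annotated O(N^3); an expression in P and g, annotated O(N^2 log N); a parameter M; the quantity N – m + 1; FMM annotated O(M) and O(N – m + 1)]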
![Page 25: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/25.jpg)
General Purpose GPU Computing
[Diagram of a graphic card attached through PCI Express x16]
• MPs run in SPMD
• SPs run in SIMD
• Register: per SP
• Shared memory: per MP
• Device memory: per GPU
![Page 26: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/26.jpg)
Matrix Multiply on GPU
[Figure: matrices C, A, B with dimensions labeled l, m, n. Plot of performance (Gflops, axis 140 to 300) against m (144, 288, 432, 576) for the base version; l = 2304, n = 40005, #blk = 81]
![Page 27: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/27.jpg)
Streaming Matrix Multiply on GPU
[Figure: matrices C, A, B with dimensions labeled l, m, n. Plot of performance (Gflops, axis 140 to 300) against m (144, 288, 432, 576) for the base and opt versions; l = 2304, n = 40005, #blk = 81. Annotations: r = 1.2, k = 11; r = 2.3, k = 4; r = 3.2, k = 4; r = 4.1, k = 3]
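The slides do not describe how the streaming variant works beyond the plot labels. One common reading, sketched here purely as an assumption, is that one operand is cut into panels that are streamed through the device while the other operand stays resident. The pure-Python code below shows only that panel decomposition. The function names and the panel size are hypothetical, and no GPU work is done.

```python
def matmul(a, b):
    """Plain triple-loop product of two row-major lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def stream_panels(a, panel_rows):
    """Yield consecutive row panels of a, each with at most panel_rows rows."""
    for start in range(0, len(a), panel_rows):
        yield a[start:start + panel_rows]


def streaming_matmul(a, b, panel_rows):
    """Compute a * b one row panel at a time; b is reused for every panel."""
    c = []
    for panel in stream_panels(a, panel_rows):
        # On a real accelerator this is where one panel would be transferred,
        # multiplied, and its result block copied back.
        c.extend(matmul(panel, b))
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[1, 2], [3, 4]]
    print(streaming_matmul(a, b, panel_rows=2))
```

The panel size trades transfer granularity against the memory needed on the device, so it would normally be tuned per machine.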
![Page 28: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/28.jpg)
![Page 29: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/29.jpg)
Strong Scaling
![Page 30: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/30.jpg)
Strong Scaling
![Page 31: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/31.jpg)
Strong Scaling Weak Scaling
![Page 32: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/32.jpg)
![Page 33: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/33.jpg)
![Page 34: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/34.jpg)
• [...] => GPU, GRAPE-DR
![Page 35: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/35.jpg)
GRAPE-DR project
![Page 36: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/36.jpg)
Development of Supercomputers
[Figure: chart of supercomputer systems by year, with the time axis marked 1970, 1980, 1990, 2000, in four columns: Vector, SIMD, Distributed Memory, Shared Memory. Legend: fastest system at one time; research/special system. Systems shown range from CDC6600, CDC7600 and IBM 360/67 through ILLIAC IV, Cray-1, Cray-XMP, CM-1, CM-5, ASCI RED, T3E, ES, GRAPE-6 and BG/L to GRAPE-DR, Roadrunner, XT6, BG/Q and GPGPU (G80, GTX280, Fermi). Vendors and institutions named include CDC, TI, Cray, Fujitsu, Hitachi, NEC, Burroughs, ICL, FPS, CMU, Denelcor, Goodyear, Thinking Machines, Intel, Ncube, IBM, Encore, Alliant, Sequent, Maspar, SUN, SGI, Tera/Cray, Sony/IBM and U of Tokyo.]
![Page 37: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/37.jpg)
GRAPE-DR project, University of Tokyo
![Page 38: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/38.jpg)
[Figure: performance trend plot. Axis label: flops/W, with ticks 1, 1K, 1M, 1G, 1T, 1P; years 1970 to 2040. Systems plotted: Cray-1, Cray-2, VPP5000, ASCI RED, SX-8, ASCI-Purple, HPC2500, Grape-6, BlueGene/L, GRAPE-DR]
![Page 39: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/39.jpg)
Photo of GRAPE-DR
![Page 40: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/40.jpg)
GRAPE-DR
• Reduction
• LSI
• 512
![Page 41: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/41.jpg)
Green 500 List
![Page 42: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/42.jpg)
GRAPE-DR (GRAPE-DR, GPGPU, ClearSpeed etc.)
N
CPU
IBM BlueGene/L, P
QCD, FFT
![Page 43: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/43.jpg)
• 1/3
![Page 44: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/44.jpg)
• GRAPE-DR 100
![Page 45: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/45.jpg)
• 10000
![Page 46: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/46.jpg)
• WSI
![Page 47: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/47.jpg)
![Page 48: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/48.jpg)
[Block diagram: PCIe bus I/F; PCI Express v3.0, 16 GB/s; DDR4 DRAM, 16 GB/s]
![Page 49: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/49.jpg)
[Block diagram: FPGA with multiple DDR4 DRAM blocks]
![Page 50: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/50.jpg)
![Page 51: 3&. ! 9MU](https://reader031.vdocuments.us/reader031/viewer/2022012422/61769d58bbc3fc0b9114145a/html5/thumbnails/51.jpg)