Post on 19-Jan-2020


Page 1:

Mikito Furuichi, Daisuke Nishiura

The University of Electro-Communications

Page 2:

Heterogeneous Many Core Project

Ryutaro Himeno (leader, Fluid dynamics)

Toshikazu Ebisuzaki (sub-leader, Astronomy)

Kobe Univ. (Junichiro Makino): Astrophysics, middleware

JAMSTEC (Hide Sakaguchi): Tsunami, Earthquake

UEC Tokyo (Tadashi Yamazaki): Neuroscience

KEK (Tadashi Ishikawa): Lattice QCD, multiple-precision computation

NIG (Ken Kurokawa): Genome analysis

RIKEN (Toshikazu Ebisuzaki): Neuroscience, MD, Plasma physics, Fluid dynamics

¥40 million/year for 5 years, funded by MEXT, Japan

Page 3:

Background

• RIKEN and PEZY Computing/ExaScaler joint research started in May 2015, and RIKEN installed “Shoubu” using PEZY-SC.
• “Shoubu” ranked No. 1 on the Green500 in June and November 2015.
• “Shoubu” and “Satsuki”, installed at Ebisuzaki Lab., RIKEN, ranked No. 1 and No. 2 on the Green500 in June 2016.
• PEZY developed the 2nd-generation system PEZY-SC2 and announced in March 2017 that it greatly improves performance and memory bandwidth and uses heterogeneous many-core and magnetic field coupling memory.
• JAMSTEC-RIKEN-PEZY-ExaScaler joint research started in May 2017.
• “Satsuki” and “Shoubu” will be retrofitted in 2018.
• Heterogeneous many-core processors are a worldwide trend.
– Anton2, MD-GRAPE4, NVIDIA Tegra, Sunway TaihuLight, etc.

Shoubu@RIKEN

Gyoko@JAMSTEC

Satsuki@RIKEN

[Table: PEZY-SC, PEZY-SC2, and PEZY-SC3 compared by year developed, architecture, number of cores, frequency, memory bandwidth, I/O bandwidth, FLOPS, energy consumption, and efficiency]

Page 4:

In Black: modify existing codes to fit heterogeneous many cores
In Red: new application
In Green: partly new application

Page 5:

PEZY-SC: conventional general-purpose processor + many-core processor

Using OpenCL: accelerator-type programming model

Largest issue is data transmission between memory and the many-core processor

PEZY-SC2: heterogeneous many-core processor

A general-purpose processor is included in the LSI, and the memory system is directly connected to it.

Data is stored in local memory

Largest issue is optimizing transmission between processors

[Figure: block diagram labeled interconnect, memory, general-purpose processor, and cores, repeated for three heterogeneous many-core processors; captions: "system with homogeneous many-core" and "system with heterogeneous many-core"]

Cannot get full performance from the hardware using existing codes

Develop powerful programming models after tuning a small number of codes, then develop middleware.
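The data-transmission issue named on this slide can be illustrated with a rough timing model. The sketch below is only an illustration: the function names and every number in it are hypothetical, not PEZY-SC or PEZY-SC2 specifications. It assumes one kernel launch whose run time is the copy time over the host link plus the compute time on the many-core device, with no overlap between the two.

```python
def offload_time(bytes_moved, flops, link_bandwidth, device_flops):
    """Return (transfer_s, compute_s) for one accelerator-style kernel launch.

    Assumes the copy and the computation do not overlap.
    """
    transfer_s = bytes_moved / link_bandwidth   # seconds spent copying data
    compute_s = flops / device_flops            # seconds spent computing
    return transfer_s, compute_s


def transfer_fraction(bytes_moved, flops, link_bandwidth, device_flops):
    """Fraction of the total run time spent moving data."""
    transfer_s, compute_s = offload_time(
        bytes_moved, flops, link_bandwidth, device_flops
    )
    return transfer_s / (transfer_s + compute_s)


# Hypothetical workload and hardware numbers, chosen only for illustration.
BYTES_MOVED = 8e9        # bytes copied between host memory and device
FLOPS = 1e12             # floating-point operations in the kernel
LINK_BANDWIDTH = 16e9    # bytes per second over the host link
DEVICE_FLOPS = 2e12      # sustained device throughput in FLOP/s

transfer_s, compute_s = offload_time(
    BYTES_MOVED, FLOPS, LINK_BANDWIDTH, DEVICE_FLOPS
)
fraction = transfer_fraction(BYTES_MOVED, FLOPS, LINK_BANDWIDTH, DEVICE_FLOPS)
print(f"transfer {transfer_s:.2f} s, compute {compute_s:.2f} s, "
      f"transfer share {fraction:.0%}")
```

With these made-up numbers, half of the run time goes to copying data. Under this model, a faster device alone cannot recover that time.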

Page 6:

Application developers use Middleware

Page 7:

Page 8:

Giant impact

Mantle convection

Droplet formation

Molecular motor

Crack propagation

Page 9:

3.1. Particle Method (2): Disaster simulation of Tsunami and Earthquake

• Massively parallel computing with Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM)
– Suitable for tsunami, powder, granular, and civil engineering problems
– Over a billion particles with efficient parallelization (see Poster P21)

• Practical applications:
– Direct comparison between numerical simulation and laboratory experiments on a centrifuge system
– Analysis of landslides, breakwaters, tsunami sedimentation, and liquefaction
– Efficient design of railway track structures

• Toward prediction of critical locations in a disaster event

Laboratory experiment

Numerical simulation

Replacing laboratory experiments with numerical simulation requires low energy cost => PEZY-SC2 could be the best solution

Quantitative agreement can be obtained with over a billion particles.

SPH-DEM coupled simulation of tsunami run-up with building structures
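As a rough illustration of the SPH part of the method named on this slide, the sketch below computes particle densities by the standard SPH summation with a 1D cubic spline kernel. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the 1D setting, the particle spacing, and the O(N²) loop are all chosen for brevity, and the DEM coupling is not shown. Production codes at the billion-particle scale use neighbor search and domain decomposition instead of an all-pairs loop.

```python
def cubic_spline_kernel_1d(r, h):
    """Standard 1D cubic spline SPH kernel with support radius 2h."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(positions, masses, h):
    """Density at each particle: rho_i = sum_j m_j * W(x_i - x_j, h)."""
    return [
        sum(m_j * cubic_spline_kernel_1d(x_i - x_j, h)
            for x_j, m_j in zip(positions, masses))
        for x_i in positions
    ]


# Hypothetical test setup: evenly spaced particles at a uniform rest density.
DX = 0.1          # particle spacing (illustrative)
RHO0 = 1000.0     # rest density (illustrative)
N = 21            # number of particles

positions = [i * DX for i in range(N)]
masses = [RHO0 * DX] * N
density = sph_density(positions, masses, h=DX)

print(f"interior density: {density[N // 2]:.3f}")
print(f"boundary density: {density[0]:.3f}")
```

Interior particles recover the rest density, while particles at the ends see fewer neighbors and report a lower value. This is the usual boundary deficiency that SPH codes have to correct.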

Page 10:

3.2 Grid Method: Incompressible Flow (New)

• Widely used in industrial design
• Recently utilized in medical fields such as blood flow

Challenging subjects
• Performance limited by memory bandwidth
• Strong scaling
• Multi-physics phenomena

Courtesy of Nissan Motor Co., Ltd.
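The memory-bandwidth limit listed under the challenging subjects can be made concrete with the roofline model. In that model, attainable performance is the smaller of peak compute and memory bandwidth times arithmetic intensity. The sketch below is illustrative only: the function names are made up here, and the hardware numbers and the flop-per-byte figure are hypothetical, not measurements of any PEZY system or flow solver.

```python
def attainable_flops(peak_flops, memory_bandwidth, arithmetic_intensity):
    """Roofline estimate: min(peak, bandwidth * flops-per-byte)."""
    return min(peak_flops, memory_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, memory_bandwidth):
    """Arithmetic intensity above which a kernel becomes compute bound."""
    return peak_flops / memory_bandwidth


# Hypothetical machine and kernel, for illustration only.
PEAK_FLOPS = 1e12          # FLOP/s
MEMORY_BANDWIDTH = 1e11    # bytes/s
STENCIL_INTENSITY = 0.25   # FLOP per byte, a low value typical of grid stencils

stencil_rate = attainable_flops(PEAK_FLOPS, MEMORY_BANDWIDTH, STENCIL_INTENSITY)
ridge = ridge_point(PEAK_FLOPS, MEMORY_BANDWIDTH)

print(f"attainable: {stencil_rate:.2e} FLOP/s "
      f"({stencil_rate / PEAK_FLOPS:.1%} of peak)")
print(f"ridge point: {ridge:.1f} FLOP/byte")
```

Any kernel whose intensity sits below the ridge point is limited by memory traffic, so adding cores without adding bandwidth does not speed it up.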

Page 11:

Metagenome analysis pipeline

Microbial GPS by hierarchical Bayesian model

*: Actual calculation time with 150 core NIG supercomputer or TSUBAME2.0

Page 12:

3.4. Other applications: Human-scale neural network simulation (cerebral cortex model is a new project)

Cerebellum

Cerebral Cortex

A. Cerebellar corticonuclear microcomplex model
B. Cerebro-cerebellar-basal ganglia loop model
C. Layered and sheeted structure of the cerebral cortex

Page 13:

3.4 Others: Large-scale simulations for precise experiments in elementary particle physics

• Search for new physics beyond the Standard Model
– Standard Model: three generations of matter (quarks and leptons), force carriers (gauge bosons), origin of mass (Higgs) → not the “Theory of Everything”
– Any inconsistency between theory and experiment signals new physics
– Precise experiments are underway at LHC (CERN), RHIC (BNL), and SuperKEKB (KEK), and are planned at the future ILC (International Linear Collider)
– Precise theoretical predictions require large-scale simulations

• Lattice QCD (Quantum Chromodynamics)
– QCD describes the strong interaction among “quarks” through “gluons”
– Analytical calculation of QCD is impossible at low energy
– Lattice QCD, QCD on a 4-dimensional space-time lattice, enables quantitative numerical calculations by the Monte Carlo method

• Precise calculation of perturbation theory
– QED (Quantum Electrodynamics) in muon g-2 experiments (Fermilab), Electroweak Theory in an electron-positron collider (ILC)
– Many Feynman diagrams must be evaluated for high-order corrections → automated computation
– Integration of functions with singular behavior requires “multi-precision”
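The need for multi-precision arithmetic can be seen in a small cancellation example. The sketch below is not the integration scheme used in the project. It only assumes that near-singular integrands involve differences of nearly equal large numbers, and it uses Python's standard `decimal` module as a stand-in for a multi-precision library. The function names and the test value are illustrative.

```python
import math
from decimal import Decimal, getcontext


def sqrt_difference_double(x):
    """sqrt(x + 1) - sqrt(x) in IEEE double precision."""
    return math.sqrt(x + 1.0) - math.sqrt(x)


def sqrt_difference_multiprecision(x, digits=50):
    """The same difference evaluated with `digits` significant decimal digits."""
    getcontext().prec = digits
    xd = Decimal(x)
    return (xd + 1).sqrt() - xd.sqrt()


X = 10 ** 16  # large enough that x + 1 is not representable as a double

double_result = sqrt_difference_double(float(X))
multi_result = sqrt_difference_multiprecision(X)

print("double precision:", double_result)   # 0.0: every significant digit is lost
print("50-digit decimal:", multi_result)    # close to 5e-9
```

In double precision the two square roots are identical, so the difference collapses to zero, while 50-digit arithmetic keeps the true value near 5e-9.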


Page 14:

Summary

• New processor architecture: heterogeneous many-core processor
– To answer how to write program code
– Develop middleware so that anybody can increase performance
– 10 high-performance applications

• Gravity multibody, Molecular Dynamics: largest computations of galaxies and molecules
• SPH & DEM methods: disaster simulation
• Neuroscience: human-brain-scale simulation
• Lattice QCD, fluid dynamics, magnetic fluid dynamics: enhance middleware for grid method
• Genomic analysis: 1st application in data science

In Black: modify existing codes to fit heterogeneous many cores
In Red: new application
In Green: partly new application