Our research is about snow accretion on trains. When a train travels over snow-covered tracks, snow accretes on the train bogies. When the accreted snow drops off from the bogies, it may damage the railway ground facilities along the tracks, the train's devices, and so on.
To establish countermeasures against this damage, we have developed a snow accretion simulator. By performing snow accretion analysis for various modified train shapes, we will find the train shape that reduces the amount of accreted snow.
Motivation and Objective
We have validated the results obtained from our snow accretion simulator by comparing them with results from experiments in a snowfall wind tunnel. Additionally, since the snow accretion simulator is implemented as a distributed-memory parallel program, it allows us to solve very large calculation models such as a whole train bogie.
Air flow analysis by “Airflow simulator”
Trajectory calculation of flying snow by “Equation of motion for gravity and drag”
Snow accretion analysis by “Particle simulator”
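These three steps are repeated every time step of the coupled analysis. The sketch below shows this cycle in C; all type and function names here are hypothetical placeholders standing in for the stages named above, not the actual interfaces of the Airflow simulator or the Particle simulator.

```c
/* Minimal sketch of the coupled analysis cycle described above.
 * Every type and function is a hypothetical placeholder. */
typedef struct AirGrid   AirGrid;    /* Cartesian air-flow grid            */
typedef struct SnowField SnowField;  /* flying and accreted snow particles */

void solve_airflow(AirGrid *g, double dt);                   /* step 1: air flow analysis          */
void move_flying_snow(SnowField *s, const AirGrid *g,
                      double dt);                            /* step 2: gravity + drag trajectories */
void judge_accretion(SnowField *s);                          /* step 3: snow accretion analysis     */
void update_boundary_shape(AirGrid *g, const SnowField *s);  /* feedback: accreted snow moves wall  */

void coupled_analysis(AirGrid *g, SnowField *s, double dt, long n_steps)
{
    for (long step = 0; step < n_steps; ++step) {
        solve_airflow(g, dt);
        move_flying_snow(s, g, dt);
        judge_accretion(s);
        update_boundary_shape(g, s);
    }
}
```

The boundary-shape update closes the loop: snow that has accreted changes the wall geometry seen by the next air flow analysis step.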
Distribution of velocity of air flow
Snow trajectory
If the space-filling ratio > β (≈ 0.65) and the collision angle < γ (≈ 60°), then the target flying snow particle becomes a snow accretion particle.
Snow accretion algorithm
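A minimal sketch of this first judging criterion in C follows. The struct layout, the function name first_judging_method, and the convention that the collision angle is measured between the particle velocity and the outward surface normal are assumptions for illustration; the thresholds β ≈ 0.65 and γ ≈ 60° are those given above.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical particle state; not the Particle simulator's actual layout. */
typedef struct {
    double vel[3];   /* velocity of the target flying snow particle */
} SnowParticle;

/* First judging method:
 *   space-filling ratio > beta (~0.65) AND collision angle < gamma (~60 deg).
 * 'normal' is assumed to be the unit outward normal of the snow accretion
 * surface, so a particle flying toward the surface has vel . normal < 0. */
bool first_judging_method(const SnowParticle *p, const double normal[3],
                          double space_filling_ratio)
{
    const double beta      = 0.65;  /* space-filling ratio threshold           */
    const double cos_gamma = 0.5;   /* cos(60 deg): collision-angle threshold  */

    double speed = sqrt(p->vel[0]*p->vel[0] + p->vel[1]*p->vel[1] + p->vel[2]*p->vel[2]);
    if (speed == 0.0)
        return false;               /* no collision without motion */

    /* cosine of the angle between the incoming velocity and the normal */
    double cos_angle = -(p->vel[0]*normal[0] + p->vel[1]*normal[1] + p->vel[2]*normal[2]) / speed;

    /* angle < gamma  <=>  cos(angle) > cos(gamma) */
    return (space_filling_ratio > beta) && (cos_angle > cos_gamma);
}
```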
Our aim
Numerical Coupled Analysis Method by Air Flow Analysis and Snow Accretion Analysis
Particle types: wall particle, snow accretion particle, flying snow particle, target flying snow particle
Kohei Murotani, Koji Nakade, Yasushi Kamata, Daisuke Takahashi (Railway Technical Research Institute)
Development of Numerical Coupled Analysis Method by Air Flow Analysis and Snow Accretion Analysis
Equation of motion of a flying snow particle (gravity and drag):
\[
\frac{d\mathbf{U}_{\mathrm{snow}}}{dt}
  = \frac{3}{4}\, C_d\, \frac{\rho_{\mathrm{air}}}{\rho_{\mathrm{snow}}}\, \frac{1}{d_{\mathrm{snow}}}\, \lvert \mathbf{U}_r \rvert\, \mathbf{U}_r + \mathbf{g}
\]
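A minimal sketch of one trajectory update using this equation is shown below, with U_r taken as the local air velocity minus the particle velocity. The explicit Euler step, the coordinate convention for gravity, and all variable names are assumptions for illustration, not necessarily the integration scheme used in the simulator.

```c
#include <math.h>

/* One explicit Euler step of the flying-snow equation of motion:
 *   dU_snow/dt = (3/4) * Cd * (rho_air/rho_snow) * (1/d_snow) * |U_r| * U_r + g
 * where U_r = U_air - U_snow.  Scheme and parameter handling are illustrative. */
void advance_snow_velocity(double u_snow[3], const double u_air[3],
                           double cd, double rho_air, double rho_snow,
                           double d_snow, double dt)
{
    const double g[3] = {0.0, 0.0, -9.81};   /* gravity; z-axis up is an assumption */

    double ur[3], ur_norm = 0.0;
    for (int i = 0; i < 3; ++i) {
        ur[i] = u_air[i] - u_snow[i];        /* relative velocity U_r */
        ur_norm += ur[i] * ur[i];
    }
    ur_norm = sqrt(ur_norm);

    double coef = 0.75 * cd * (rho_air / rho_snow) / d_snow;
    for (int i = 0; i < 3; ++i)
        u_snow[i] += dt * (coef * ur_norm * ur[i] + g[i]);
}
```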
Collision angle (defined by the velocity of the target flying snow particle and the normal vector of the snow accretion surface)
Second judging method
Observation
Space-filling ratio
First judging method
Analysis
Snow accretion analysis: analysis area 5 m × 1 m × 1 m, about 60 million particles, particle diameter 1 mm, 1,000,000 steps, 264 nodes of Cray XC50, total wall-clock time 12 hours.
Air flow analysis: analysis area 5 m × 1 m × 1 m, 188 million grids (1710 × 440 × 250), minimum grid spacing 1 mm, about 1,000,000 steps, 264 nodes of Cray XC50, total wall-clock time 5 hours.
if max div v < α
Experiment
Snow accretion analysis and air flow analysis: transparent particles are flying snow particles, opaque particles are snow accretion particles, and colors show the velocity magnitude of the air flow.
Snow accretion analysis: analysis area 4 m × 1 m × 1 m, about 10 million particles, particle diameter 1 mm, 1,000,000 steps, 264 nodes of Cray XC50, total wall-clock time 3 hours.
Air flow analysis: analysis area 4 m × 1 m × 1 m, 257 million grids (1100 × 500 × 500), minimum grid spacing 1 mm, about 1,000,000 steps, 264 nodes of Cray XC50, total wall-clock time 7 hours.
Experiment
Analysis
If two new snow accretion layers are generated.
Experiment of snow accretion in the snowfall wind tunnel at the Snow and Ice Research Center, National Research Institute for Earth Science and Disaster Resilience
The experiment team reproduces the observed actual phenomenon for a simple, small model using the snowfall wind tunnel. The analysis team develops the snow accretion analysis method using a Cartesian grid method and a particle method. The developed snow accretion simulator is validated through iterative comparison with the experiments on these simple, small models.
Modification of boundary shape
Snow accretion analysis
Cubic shape
Airflow simulator
Particle Simulator
A halo area is generated. Domain decomposition of the buckets is performed with the ParMETIS library.
The particles are partitioned so that each domain contains an approximately equal number of particles.
Embedding particles in buckets
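A minimal sketch of this bucket-embedding step follows, using a uniform bucket grid and a counting-sort style fill. The array layout and function name are illustrative assumptions, not the Particle simulator's actual data structures.

```c
#include <stdlib.h>

/* Embed particles into uniform cubic buckets of edge length h.
 * After the call, bucket_start[b] .. bucket_start[b+1]-1 index into
 * 'sorted', listing the particles contained in bucket b.
 * Assumes every particle lies inside the box
 * [xmin, xmin+nx*h) x [ymin, ymin+ny*h) x [zmin, zmin+nz*h). */
void embed_in_buckets(int n, const double *x, const double *y, const double *z,
                      double xmin, double ymin, double zmin, double h,
                      int nx, int ny, int nz,
                      int *bucket_start /* size nx*ny*nz + 1 */,
                      int *sorted       /* size n */)
{
    int nb = nx * ny * nz;
    int *cell = malloc(n * sizeof(int));

    for (int b = 0; b <= nb; ++b) bucket_start[b] = 0;

    /* 1) compute each particle's bucket index and count bucket occupancy */
    for (int i = 0; i < n; ++i) {
        int ix = (int)((x[i] - xmin) / h);
        int iy = (int)((y[i] - ymin) / h);
        int iz = (int)((z[i] - zmin) / h);
        cell[i] = (iz * ny + iy) * nx + ix;
        bucket_start[cell[i] + 1]++;
    }
    /* 2) prefix sum gives the start offset of every bucket */
    for (int b = 0; b < nb; ++b) bucket_start[b + 1] += bucket_start[b];
    /* 3) scatter particle indices into their buckets */
    int *fill = calloc(nb, sizeof(int));
    for (int i = 0; i < n; ++i)
        sorted[bucket_start[cell[i]] + fill[cell[i]]++] = i;

    free(fill);
    free(cell);
}
```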
[Plot: number of particles (about 4,000 to 5,500) held by each process PE0 to PE3 versus time under dynamic load balancing; measured times 0.1976 s and 0.1984 s.]
[Diagram: halo exchange between neighboring domains PE 0 to PE 3.]
Navier‐Stokes equation for Incompressible fluid flow
Finite difference method for nonuniform grid
Fractional step method
2nd-order central difference
3rd-order Adams-Bashforth method
Poisson equation solver by Jacobi method
LES model by coherent structure Smagorinsky model
Orthogonal domain decomposition
Target problem size : 10 million - 100 billion grids
Compiler : Fortran 90
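Within the fractional step method, the pressure Poisson equation is solved with the Jacobi method. A minimal sketch of one Jacobi sweep follows, written in C rather than the simulator's Fortran 90; the uniform grid spacing, serial loops, and fixed boundary treatment are simplifying assumptions, since the actual simulator works on a nonuniform, domain-decomposed Cartesian grid.

```c
#include <math.h>

/* One Jacobi sweep for the pressure Poisson equation  lap(p) = rhs  on a
 * uniform nx x ny x nz grid with spacing h.  Boundary values are left
 * untouched (copy them into p_new before calling).  Illustrative sketch only. */
double jacobi_sweep(const double *p, double *p_new, const double *rhs,
                    int nx, int ny, int nz, double h)
{
    double max_change = 0.0;
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i) {
                size_t c  = ((size_t)k * ny + j) * nx + i;     /* center cell   */
                size_t sx = 1, sy = nx, sz = (size_t)nx * ny;  /* index strides */
                double nb = p[c - sx] + p[c + sx]
                          + p[c - sy] + p[c + sy]
                          + p[c - sz] + p[c + sz];
                double v  = (nb - h * h * rhs[c]) / 6.0;       /* Jacobi update */
                double d  = fabs(v - p[c]);
                if (d > max_change) max_change = d;
                p_new[c] = v;
            }
    return max_change;   /* simple convergence indicator */
}
```

In use, p and p_new are swapped after every sweep and the iteration stops once max_change falls below a tolerance.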
Particle simulator by particle interaction
Moving Particle Simulation (MPS) method
Collision detection among particles using buckets
Dynamic load balancing by ParMETIS
Target problem size : 1 million -10 billion particles
Compiler : C
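Neighbor candidates for the MPS interactions and collision detection are found by scanning only the 3 × 3 × 3 block of buckets around each particle. The sketch below assumes the bucket edge length equals the interaction radius re and reuses the bucket_start/sorted arrays from the embedding sketch above; these choices and all names are illustrative, not the simulator's actual implementation.

```c
/* Visit all neighbors of particle ip within radius re by scanning the
 * 3x3x3 block of buckets around its own bucket.  Assumes the bucket edge
 * equals re and that bucket_start/sorted were built as in the embedding
 * sketch above.  Illustrative only. */
void for_each_neighbor(int ip, const double *x, const double *y, const double *z,
                       double xmin, double ymin, double zmin, double re,
                       int nx, int ny, int nz,
                       const int *bucket_start, const int *sorted,
                       void (*visit)(int ip, int jp, double r2))
{
    int ix = (int)((x[ip] - xmin) / re);
    int iy = (int)((y[ip] - ymin) / re);
    int iz = (int)((z[ip] - zmin) / re);

    for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx) {
        int jx = ix + dx, jy = iy + dy, jz = iz + dz;
        if (jx < 0 || jx >= nx || jy < 0 || jy >= ny || jz < 0 || jz >= nz)
            continue;                                  /* skip buckets outside the box */
        int b = (jz * ny + jy) * nx + jx;
        for (int s = bucket_start[b]; s < bucket_start[b + 1]; ++s) {
            int jp = sorted[s];
            if (jp == ip) continue;
            double rx = x[jp] - x[ip], ry = y[jp] - y[ip], rz = z[jp] - z[ip];
            double r2 = rx*rx + ry*ry + rz*rz;
            if (r2 < re * re)
                visit(ip, jp, r2);   /* e.g. accumulate an MPS interaction weight */
        }
    }
}
```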
Overview
Domain decomposition, halo exchange, and dynamic load balancing
Strong scaling of 10 billion particles by K computer
System | Site | Processor (peak FLOPS) | Nodes
K computer | RIKEN | 2.0 GHz SPARC64 VIIIfx, 8-core (128 GFLOPS) | 88,128 (1 CPU/node)
FX100 | Nagoya Univ. | 2.2 GHz SPARC64 XIfx, 32-core (1,126 GFLOPS) | 2,880 (1 CPU/node)
XC50 | Railway Technical Research Institute | 2.7 GHz Intel Xeon Gold 6150, 18-core (1,555 GFLOPS) | 264 (2 CPUs/node)
Supercomputers used in this research
ACKNOWLEDGMENTS
This research used computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science and the FX100 provided by Nagoya University through the HPCI System Research Project (Project ID: hp170067, hp180014). This work was supported by JSPS KAKENHI Grant Number 17K05152.
Communication method between airflow simulator and particle simulator
[Diagram: correspondence between calculation areas 0 to 3 of the particle simulator and calculation areas 0 to 3 of the airflow simulator, shown for each direction of communication.]
Communication of velocity distribution from airflow simulator to particle simulator
Communication of boundary shape from particle simulator to airflow simulator
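A minimal sketch of this two-way exchange with plain MPI point-to-point calls follows, assuming each airflow-simulator rank is paired with one particle-simulator rank inside a common communicator; the pairing, message tags, and buffer layouts are assumptions for illustration, not the simulators' actual coupling protocol.

```c
#include <mpi.h>

/* One coupling exchange between a paired airflow rank and particle rank:
 *   airflow  -> particle : air-velocity field (3 doubles per cell)
 *   particle -> airflow  : updated boundary-shape flags (1 int per cell)
 * Pairing, tags, and buffer layouts are illustrative assumptions. */
enum { TAG_VELOCITY = 100, TAG_BOUNDARY = 101 };

void couple_exchange(int is_airflow_rank, int partner_rank,
                     double *velocity, int n_cells_vel,
                     int *boundary, int n_cells_bnd,
                     MPI_Comm comm)
{
    if (is_airflow_rank) {
        /* send the velocity distribution, then receive the new boundary shape */
        MPI_Send(velocity, 3 * n_cells_vel, MPI_DOUBLE, partner_rank,
                 TAG_VELOCITY, comm);
        MPI_Recv(boundary, n_cells_bnd, MPI_INT, partner_rank,
                 TAG_BOUNDARY, comm, MPI_STATUS_IGNORE);
    } else {
        /* particle simulator side: receive velocity, send boundary shape */
        MPI_Recv(velocity, 3 * n_cells_vel, MPI_DOUBLE, partner_rank,
                 TAG_VELOCITY, comm, MPI_STATUS_IGNORE);
        MPI_Send(boundary, n_cells_bnd, MPI_INT, partner_rank,
                 TAG_BOUNDARY, comm);
    }
}
```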
[Plot: wall-clock time and parallel efficiency versus the number of nodes (6,144 to 82,944); wall-clock times 4.01, 2.28, 1.23, 0.80, 0.55, 0.49 s; parallel efficiencies 1.00, 0.88, 0.81, 0.84, 0.70, 0.61.]
[Plot: wall-clock time and parallel efficiency versus the number of nodes (768 to 73,728); wall-clock times 76.66, 38.69, 19.38, 9.79, 5.02, 2.48, 1.73, 1.37, 1.19 s; parallel efficiencies 1.00, 0.99, 0.99, 0.98, 0.95, 0.97, 0.92, 0.88, 0.67.]
Strong scaling of 100 billion grids by K computer
Kernel | Elapsed (s) | MFLOPS/PEAK (%) | Memory access throughput/PEAK (%) | SIMD (%) | L1 cache miss (%) | L2 cache miss (%) | TLB miss (%)
LES model | 0.94 | 5.08 | 32.04 | 71.55 | 4.32 | 3.22 | 0.0285
Viscosity and advection | 0.69 | 4.74 | 35.95 | 80.19 | 4.94 | 3.24 | 0.0240
Poisson's equation | 2.24 | 3.71 | 27.90 | 15.97 | 2.89 | 1.79 | 0.0007
One loop | 4.44 | 3.85 | 29.96 | 29.75 | 3.41 | 2.33 | 0.0077
Kernel | Elapsed (s) | MFLOPS/PEAK (%) | Memory access throughput/PEAK (%) | SIMD (%) | L1 cache miss (%) | L2 cache miss (%) | TLB miss (%)
One loop | 0.73 | 25.8 | 0.30 | 49.2 | 5.09 | 0.02 | 0.00
Computational performance of 100 billion grids on 6,144 nodes
Computational performance of 10 billion particles on 768 nodes
[Bar charts comparing K, FX100, and XC50 on 12 nodes:
Calculation time of one loop of 42M grids on 12 nodes: 0.89 s, 0.42 s, 0.21 s (K, FX100, XC50).
FLOPS/PEAK of one loop of 42M grids on 12 nodes: 3.85 % and 0.92 % (one system unmeasured).
Calculation time of one loop of 16M particles on 12 nodes: 6.91 s, 0.94 s, 0.38 s (K, FX100, XC50).
FLOPS/PEAK of one loop of 16M particles on 12 nodes: 25.82 % and 21.66 % (one system unmeasured).]