Airbreathing Propulsion System Analysis Using
Multithreaded Parallel Processing
Richard Gregory Schunk _ and T. J. Chung 2
1NASA/MSFC, Huntsville, Alabama
2Department of Mechanical & Aerospace Engineering
The University of Alabama in Huntsville
SUMMARY
In this paper, parallel processing is used to analyze the mixing and combustion behavior
of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a
slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed
[Moon and Chung, 1996]. Our aim is to extend this work to three-dimensional domain
using multithreaded domain decomposition parallel processing based on the flowfield-
dependent variation theory [Schunk, Canabal, Heard, and Chung, 1999; Schunk and
Chung, 1999].
Numerical simulations of chemically reacting flows are difficult because of the strong
interactions between the turbulent hydrodynamic and chemical processes. The algorithm
must provide an accurate representation of the flowfield, since unphysical flowfield
calculations will lead to the faulty loss or creation of species mass fraction, or even
premature ignition, which in turn alters the flowfield information. Another difficulty
arises from the disparity in time scales between the flowfield and chemical reactions,
which may require the use of finite rate chemistry. The situations are more complex
when there is a disparity in length scales involved in turbulence. In order to cope with
these complicated physical phenomena, it is our plan to utilize the flowfield-dependent
variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the
proposed computation requires the most sophisticated computational strategies. The
multithreaded domain decomposition parallel processing will be necessary in order to
reduce both computational time and storage. Without special treatments involved in
computer engineering, our attempt to analyze the airbreathing combustion appears to be
difficult, if not impossible. We describe in detail the parallel processing strategy below.
Multi-threaded programming is utilized to take advantage of multiple computational
elements on the host computer. Typically, a multi-threaded process will spawn multiple
threads which are allocated by the operating system to the available computational
elements (or processors) within the system. If more than one processor is available, the
threads may execute in parallel resulting in a significant reduction in excution time. If
more threads are spawned than available processors, the threads appear to execute
concurrently as the operating system decides which threads execute while the others wait.
One unique advantage of multi-threaded programming on shared memory multiprocessor
systems is the ability to share global memory. This alleviates the need for data exchange
or message passing between threads as all global memory allocated by the parent process
is available to each thread. However, precautions must be taken to prevent deadlock or
https://ntrs.nasa.gov/search.jsp?R=20000093950 2020-04-18T22:09:34+00:00Z
raceconditionsresultingfrom multiple threadstrying to simultaneouslywrite to the samedata.
Threadsareimplementedby linking anapplicationto asharedlibrary andmakingcalls tothe routines within that library. Two popular implementationsare widely used: thePthreadslibrary (and its derivatives)that areavailableon most Unix operatingsystemsand the NTthreadslibrary that is availableunderWindows NT. There aredifferencesbetweenthe two implementations,but applicationscanbe ported from one to the otherwith moderateeaseand many of the basic functions are similar albeit with differentnamesandsyntax.
Domain decomposition methods can be used in conjunction with multi-threadedprogrammingto createanefficient parallel application. The sub-domainsresulting fromthe decompositionprovide a convenientdivision of labor for the processingelementswithin the host computer. In this application, an Additive Schwarz domaindecompositionmethodis utilized. The methodis illustratedbelow (Figure 1) for a twodimensional square mesh that is decomposedinto four sub-domains. The nodesbelonging to eachof the four sub-domainsare denotedwith geometricsymbols whileboundarynodesare identified with bold crosses. The desire is to solve for each nodeimplicitly within a singlesub-domain, For nodeson theedgeof eachsub-domainthis isaccomplishedby treatingtheadjacentnodein theneighboringsub-domainasa boundary.The overlappingof neighboringnodesbetweensub-domainsis illustrated in Figure 2.Higher degreesof overlapping, which may improve convergenceat the expenseofcomputationtime, arealsoused.
In a parallel application, load balancing betweenprocessorsis critical to achievingoptimum performance. Ideally, if a domaincould bedecomposedinto regionsrequiringan identical amountof computation,it would be a simple matter to divide the problembetweenprocessingelementsasshownin Figure3 for four threadsexecutingonanequalnumberof processors.
Unfortunately,in a "real world" applicationthedomainmay notbedecomposedsuchthatthe computation for each processor is balanced,resulting in lost efficiency. If theexecutiontime requiredfor eachsub-domainis not identical,the CPU's will becomeidlefor portionsof time asshownin Figure4.
Oneapproachto loadbalancing,as implementedin this application, is to decomposethedomaininto moresub-domainsthanavailableprocessorsand usethreadsto perform thecomputationswithin eachblock. The finer granularitypermitsa moreevendistributionof work amongsttheavailableprocessingelementsasshownin Figure 5.
In this approach,the numberof threadsspawnedis equal to the number of availableprocessorswith each thread marching through the available sub-domains (whichpreferablynumberat leasttwo timesthe numberof processors),solving one at a time inan "assembly-line"fashion. A stackis employedwhereeachthreadpops the next sub-domain to besolvedoff of the top of the stack.Mutual exclusion locks areemployedtoprotectthestackpointer in theeventtwo or morethreadsaccessthestacksimultaneously.Eachthreadremainsbusyuntil thenumberof sub-domainsis exhausted.If thenumberof
sub-domains is large enough, the degree of parallelism will be high althoughdecomposingaprobleminto too manysub-domainsmayadverselyaffect convergence.
The above approach will be utilized for the analysis of hypersonic airbreathingcombustionwith hydrogenfuel. Without usingtheparallel processing,the analysiswithMach 4 free-streamvelocity for the finite ratechemistrywith 18specieshasbeencarriedout asshown in Figs. 6 through8 [Moon andChung, 1996]. In this study,the standardK -e model wasused. The proposedpaperwill utilize the largeeddy simulationwithhighMach numbersandhighReynoldsnumbers. It is our planto report on the maximumrangesof MachnumberandReynoldsnumbertheproposedalgorithmcanaccommodate.Our eventualgoal in this paper is to determine the relationship betweenmixing andcombustion efficiency. Length scales and time scales involved in turbulence andcombustionwill be thoroughlyinvestigated.
References
Moon, S.Y. and Chung, T. J. [1996]: Numerical simulation of heat transfer in chemically
reacting shock wave turbulent boundary layer interactions, Journal of Numerical Heat
Transfer, Part A, 30, 1,55-72.
Schunk, R. G, Canabal, F., Heard, G.A., and Chung, T.J. [1999]: Unified CFD methods
via flowfield-dependent variation theory. AIAA paper 99-3715.
Schunk, R. G. and Chung, T. J. [1999]: Parallelization of the flowfield-dependent
variation schemes for solving the triple shock/boundary layer interaction problem,
TFAWS 99 NASA/MSFC, Septemer 13-15, 1999.
Sub-domain 1
Sub_omain 2
S ub-.clo main 3
Sub-domain 4
Bc_ ndary Node
Figure 1: Multiple Subdomains
Nodes shaded in J J J I I
white are solved
implicitly within in
each sub-domain.
Nodes from neighboringsub-dorr_ins (shaded)
are treated as boundary
nodes and alowed to lag
J one time_ep.
Figure 2: Domain Decomposition
5
Timestep #1 Timestep #2 Timestep N
./_ End of Timestep
Synchronization
CPU #1
CPU #2
CPU #3
CPU #4
Thread #1 Thread #2
Th read #2 Th read #4
Th read #3 Thread #1
Th read #4 Th read #3
Execution Time
000
Thread #3
Thread #2
Thread #1
Thread #4
Figure 3: Ideal Load Balancing
/_ End of TimestepSynchronization
CPU Idle
CPU #1
CPU #2
CPU #3
CPU #4
Timestep #1 Timestep #2
Thread #1 Thread #2
Thread #2 Thread #4_
Thread #3 [_ Thread #1
Thread #4 Thread #3
Execution Time
Figure 4: "Real World" Load Balancing
000
Timestep N
Thread #3
Thread #2
Thread #1
Thread #4_
CPU #1
CPU #2
CPU #3
CPU #4
CPU #1
CPU #2
CPU #3
CPU #4
#1
z
1,11.2
I '3 I-
.5 I .6_
,7 I ,sL_
Execution Time
4 Sub-domains
8 Sub-domains
&
@
End of TimestepSynchronization
CPU Idle
Figure 5: Domain Decomposition Improves Parallelism
c-ml
"_ c"1
0
o0E "-0..0
1
m09 ,-
0 O..
5 •"0
0
0_ 2:
.7
0°_
if)
c--
7-r-
E
1
"3 ._.1
r- _.)
o_._>
c-_D
E7
_oj
c-O
im
0
0
c-m
i
___.
,t-O
00..0..
<E0
im
im
E
t'--
0...(1)
"Om
im
!
0i
LL
0
o
(/)00
E
!
Vl "-- _ I c,i
u.. _' - o
"E i_ _ + =
o--
C_W
om
o--
c0
0
C
O.
xEW'._
+
II
z
o--
q)r7
r_
,.c:
c
n--
O
C
Em
Iii
Clm
ii
E
O
I,-O
t-O
on
I',,,,,I.m
a
' !t
/
! /
z
z"
x
8
-4-
I
t_ +
II t[;_
II
! I
I
I 1
4-
m
0
c-O
.m
Z3
E0
0
:,Z:C
0
"0>,
m
0m
0
0
!--
T"--
o
,,¢v
o E
\
cO oO0 eJ _ ¢q
0 0 0 0,.- 6 6 c5
0•1- 0 z -i-
>- >- >- >-
t-
c-
-¢j0
t-O
.QEoe, iOc0_- 0'_
<,a-
I
0_i
•_- 13_
0"o
e-
d__,-r
o_rrl_
wn
L_m
c-O
c_
EOL)
i
c-
ECD
W
t_c-O
X
<
V
O
c-
(/3
C3Cv3
>
00c3
Y
O _ IIII II 0c_ t_l C_J
x
-r"
"0m
(1)wm
_===I
0m
LL.
0"_c-
O
I
c-O
_=.=
0
c"0
0(I)
_==
(1)
EQ)
c
>,,im
0_c-(1)n
li]
t i\"
\
o \.4.=1
I_
(!)
EQ)
I--
¢/)
(1)IZl
\.
_r/ /
i i, _i_,,_'_
\\
'\
\
\\
i r
." f:
/t
I./
/
/
/
/r
Jr
EO
O
O
O!
Oim
I,,,-
tOd
d
d
Cxl
O
O
O
tO
O
C,I
d
tr_
d
O
t_O
c5
O
_1 1 I
ta'3 v_" CO C,d
o o c5 c5
O
t I _. 1___£_ miR O
-r i [,,.,.
7-- o7_
v
EO
¢O
U_O
o
O
d
r,-
d
t,o c-o .O
m
X0
"1"
d
0
0
¢ E5.. _ ._-
F= _-_ _
o'$ .=.--
(-
__ _ -_
-_ _ ° _ _ _o8_-. o_o _
._ =_ o 8
_b oE_ _E ,,,_. o f:,, oEo¢ .,..,_ "_
>,_ _ =
,5"
t-wm
EO
-O!
..Q
03
t'-
oo=
5=_ O
a N
OO9(1)>
mm
"O"O<_
._ ,__ .-- .- Z
E E E6 O O O "O"9, "9, "9 "9, =..Q .Q ._ .Q '-I
-I Z_ _ OCO 09 _0 0") ft_
4% ,ll% •
)(--_ >--<>----<
) (-_< >----<>---<
) (---< >--< >----<
) (----< _F---_:I_<
) (-----<__----<
t _lP •)(: :, ,,, ,
0m
"IDm
t-
O
t-wm
0¢-
m
r'n"0
0.._1
ira,,..
00000
00
13_
Z
E oo_
I--
0,1
0..
EI-
Eo_
I--
¢_1 ,'-
0 0o o
nn
8
oo
I
O_ O. O_ O_0 0 0 0
<1
<] E
Q. ¢-0
._..__-p"--5o
LU09
<_
-0m
0
i
n"
c".4,..a
Cim
Cin
0c-
m
m
0._J
0
00
13_
ZO..
g_(D
E
O4
Et-
O-_J
f,D
EI--
co
oo
r'n
"1.'.""lit
o.I
oom
o !rn I
ii
15.0
fitfiiJJJfJJ
o4
om
./i,.../11
fJ'l
fj'jfj,j,f.fJ
o4
oo
m
Q.0
T--
O_0
7_
_i
8¢,n
orn
JIJ
0
<1
<1
Q.e-_ 0
.E __-oo
0
XLLI
a..o
c-im
Oe-
ra
rn
t_O.J
>O
CL
Em
00e-
ll
EO
"OI
.Q
rj)'.6--
O
(D.Q
E
Z(Dc-
O')c-
um
00
(D
Oc'-
m
i
f
O_O
c-
EO"o_500
.,A.
o_
C_l::1:1:
ELO
i
co
O3
O_O
7
I
:)15.O
f
(M
"r--
O
00(-
(tlEO
I
_c}
09co
.A.
co
o,I
15.O
(D
LO
cO
O
(_. c-(1) o
._EE-_-o
WOO
0o
CLO
CLO
C0
im
m
Q_
E
c-"
EC_1,m
C_0
D..-0
"O
!
m
_J
,!
C_
c_
c_
!
c_
c_
c_
c_
c_
CO
c_
Em
O_
E
C_C:
Ec_
C_O
O_
c_
_D
.C
!im
m
Z_
Q.o
O_
C
oCL
C_
c_
C_
c_
Ec-Oe-
rnm
c-O
EL
E0
0
H
oo
0_
..oE
Z
(I)
-60o
Or)
.N
ft..0
f-
LU
t-
O
.=_o
oooa
'o
"o
..EF-
w I
._._ ZZZZ_ I
i
o_ o__
o
_ _00_O_ __
_" _'_X _x x x
_ __XX XXXX_
XX XMMX_
"0
0
8
i
•__ _-___ _o
_ o
E
"_ _ o_ _ o
g g
_ _ _ _ ._ .o_-_ _ .-_ o_
0
0
_:o• _.__0 _-
"-- oE_ o 0_ I e'- o
.0 _ ._
0 _
._ ¢ "0¢-
>,"0 e_ 0
w_ o o
_'_ 0 "_
¢ z
0 0 _
0 ¢- ¢D O_0 ¢- ._