
ARMY RESEARCH LABORATORY

ARL-TR-589                                                                October 1994

Evaluation of a Multiple Instruction/Multiple Data (MIMD) Parallel Computer for CFD Applications

Stephen J. Schraml

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION IS UNLIMITED.

NOTICES

Destroy this report when it is no longer needed. DO NOT return it to the originator.

Additional copies of this report may be obtained from the National Technical Information Service, U.S. Department of Commerce, 5285 Port Royal Road, Springfield, VA 22161.

The findings of this report are not to be construed as an official Department of the Army position, unless so designated by other authorized documents.

The use of trade names or manufacturers' names in this report does not constitute indorsement of any commercial product.

REPORT DOCUMENTATION PAGE

Report Date: October 1994
Report Type and Dates Covered: Final, April 1991 - November 1992
Title and Subtitle: Evaluation of a Multiple Instruction/Multiple Data (MIMD) Parallel Computer for CFD Applications
Author: Stephen J. Schraml
Performing Organization: U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005-5066
Distribution/Availability Statement: Approved for public release; distribution is unlimited.

Abstract: In an attempt to evaluate the merits of massively parallel processing computers for the numerical simulation of blast phenomena, the U.S. Army Research Laboratory (ARL) has adapted one of its blast modeling tools to several unique parallel architectures. This report describes the adaptation of the BRL-Q1D code, a quasi-one-dimensional, finite difference Euler solver, to the Intel iPSC/860 parallel supercomputer. The code was reconfigured for the iPSC/860 using FORTRAN 77 and the Intel iPSC/860 message-passing library. The performance of the code was measured on the iPSC/860 for a variety of problem sizes and processor configurations. The performance was found to be highly dependent on the size of the problem. This problem size dependency was most noticeable when fewer processors were employed. Results of scalability tests indicate that the code performance scales in a roughly linear fashion about extrapolated lines of ideal performance.

Subject Terms: nuclear explosion simulation, shock tubes, computer programming
Number of Pages: 26


Acknowledgments

The BRL-Q1D code adaptation and performance trials were conducted with the resources provided by the U.S. Army Research Laboratory (ARL), the State University of New York at Stony Brook, and the California Institute of Technology. The author acknowledges the assistance provided by the support staffs of these organizations.

The author also acknowledges Mr. Monte W. Coleman, ARL, for assistance provided in executing the calculations described in this report, and Mr. Thomas M. Kendall, also of ARL, for reviewing this report.



Table of Contents

Acknowledgments
List of Figures
1. Background
2. The iPSC/860
3. Blast Modeling Application
4. Results
5. Conclusions
References


List of Figures

Figure 1. Measured Performance of BRL-Q1D Code
Figure 2. Scalability Using 512 Grid Points / Node
Figure 3. Scalability Using 1024 Grid Points / Node
Figure 4. Scalability Using 2048 Grid Points / Node
Figure 5. Scalability Using 4096 Grid Points / Node
Figure 6. Scalability Using 8192 Grid Points / Node


1. Background

Due to the rising costs of large-scale experimentation and the uncertainty of scaling effects in experiments, the U.S. Army is becoming increasingly dependent on computer simulations to assess the vulnerability of military systems to nuclear blast. Present vector supercomputer technology can provide detailed fluid dynamic simulations in two dimensions in a production environment (less than 10 CPU hours). Three-dimensional simulations employing limited spatial resolution, which are only sufficient for modeling relatively simple geometries, still require 100 or more CPU hours on a vector supercomputer.

Obviously, these low resolution simulations with simple geometries do not provide sufficient information about the vulnerability of specific military systems to the overturning or crushing effects of blast produced by tactical nuclear weapons. To provide accurate assessments of system vulnerability, highly detailed three-dimensional simulations with coupled fluid-structure interaction are required. To make this type of numerical simulation possible, increases in supercomputer performance of two or more orders of magnitude must be realized.

The rapidly maturing field of massively parallel processing (MPP) has the potential to offer the compute performance required for detailed three-dimensional fluid dynamic simulations. There are many different types of MPP machines available on the computer market today. However, generally speaking, most current MPP computers have the following two basic characteristics:

1. They combine the resources of a large number of processors to simultaneously solve different parts of a large problem.

2. Each processor has its own bank of local memory.

Particular MPP computers differ primarily in the way the processors access data in their own memory and data in the memory of other processors. These data access methods typically define the programming methods which are required to extract maximum performance from all of the resources that the machine has to offer.

As a means of evaluating MPP technology, the U.S. Army Research Laboratory (ARL) continuously adapts one of its blast modeling tools to emerging MPP computer platforms [1]. Through continuous evaluation of MPP computers, the ARL can configure its software tools to exploit this technology, thus making detailed three-dimensional fluid dynamic simulations available in a production computing environment.

2. The iPSC/860

The Intel iPSC/860 was the parallel computer chosen for the adaptation and evaluation described in this report. The iPSC/860 is a Multiple Instruction / Multiple Data (MIMD) parallel computer. This implies that the processors of the computer can perform a number of different operations simultaneously, if requested to do so. This is quite different from a Single Instruction / Multiple Data (SIMD) computer, in which all of the computer's processors perform identical operations, simultaneously, on different sections of data.

The heart of the iPSC/860 is a set of Intel i860 microprocessors. Each of these processors has a fixed amount of local memory. Systems configured this way are often referred to as "distributed memory multiprocessors." Most current MPP machines are distributed memory architectures. The individual processors of the iPSC/860 are linked together by an interconnect network which allows high bandwidth data transfer between processors. The earlier model of the iPSC/860, known as the Gamma, employs a hypercube topology as the interconnect network [2]. A hypercube can be envisioned as a large number of nested cubes, with each point of a cube representing a node, or processor, in the system. The later model iPSC/860, called the Delta, employs a two-dimensional mesh topology as the processor interconnect network. Each type of interconnect network is designed to have a maximum number of connections between available processors, while at the same time minimizing the distance that data must travel when moving from its originating processor to its destination processor.
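As an aside, the hypercube connectivity can be made concrete with a few lines of code: two nodes are directly linked when their binary labels differ in exactly one bit, so a node's neighbors are found by flipping each bit of its label in turn. The routine below is not part of BRL-Q1D; the name hcnbrs is invented for this sketch, and ieor is the bitwise exclusive OR intrinsic available as a MIL-STD-1753 extension in most Fortran 77 compilers.

c     illustrative sketch only: print the direct neighbors of node
c     inode in a hypercube of dimension ndim (2**ndim nodes).
c     two nodes are linked when their labels differ in one bit.
      subroutine hcnbrs (inode, ndim)
      integer inode, ndim, k, nbr
      do 10 k = 0, ndim-1
         nbr = ieor(inode, 2**k)
         print *, 'node', inode, ' is linked to node', nbr
   10 continue
      return
      end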

Like most MPP machines, the iPSC/860 is designed to be a scalable system. As such, it should be possible to linearly increase the compute performance of an application by increasing the number of processors allocated to the application. Scalability is thus the motivation in the development of MPP computers. In a truly scalable system, a desired level of performance can be obtained by simply acquiring the necessary quantity of processors. This is potentially more cost-effective than obtaining performance gains through advances in single processor design.

Unfortunately, building a scalable architecture is only half of the solution to obtaining increased performance. To optimally employ all of the resources provided by MPP computers, the application's algorithm must be designed to be scalable as well. Optimum algorithm design for the iPSC/860, and most other MIMD machines, is accomplished through a style of programming known as message-passing. When a parallel application is run on the iPSC/860, the data is distributed among the available memory of the processors being used. If a particular processor needs access to a piece of data stored in another processor's memory to perform a calculation, then that data must be transferred from the originating processor to the processor which needs the data. To accomplish this data transmission, the application must be written to explicitly pass the data from the original processor to the target processor [3].

One potential bottleneck in distributed memory parallel computers is this transfer of data between processors. Even though the processors are connected by a high-speed network, the time required to move data between processors is typically much greater than the time spent by the receiving processor executing a floating point instruction using that data. Consequently, the ultimate goal in developing an algorithm which employs message-passing techniques is to minimize the time spent transferring data, thereby maximizing the time spent in computing the solution. With this in mind, the algorithm designer must be sure to transfer only that data which is necessary for the calculations to be performed correctly.

3. Blast Modeling Application

The BRL-Q1D code was selected as the blast modeling application which is used by the ARL to evaluate the programming environment and the computational performance of massively parallel computers. BRL-Q1D is a quasi-one-dimensional, finite difference, single material, polytropic gas fluid dynamics code and is primarily used to simulate flow in shock tubes. This code was chosen for its relative simplicity and its algorithmic similarities to the two- and three-dimensional codes that are currently used for blast modeling applications. Therefore, adaptation of this code is the most cost-effective means of evaluating massively parallel computer architectures. The similarities in the solution algorithms of BRL-Q1D and more complex multidimensional codes can provide insight into the potential performance of these more complex codes on MPP computers.

The BRL-Q1D code incorporates two computational techniques, an implicit finite difference technique [4] and an explicit finite difference scheme [5]. Only one of these algorithms may be used in a particular BRL-Q1D calculation. The solution scheme which is employed is determined by a set of user-defined input options. The solution algorithms are applied to the quasi-one-dimensional Euler equations in their weak conservative form [6].
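For reference, a hedged sketch of the quasi-one-dimensional Euler equations in weak conservative form is given below; the exact formulation used in BRL-Q1D is documented by Opalka and Mark [6], so the notation here (density rho, velocity u, pressure p, total energy per unit volume E, ratio of specific heats gamma, and duct cross-sectional area A(x)) is assumed rather than quoted:

    \frac{\partial}{\partial t}\begin{pmatrix}\rho A\\ \rho u A\\ E A\end{pmatrix}
    + \frac{\partial}{\partial x}\begin{pmatrix}\rho u A\\ (\rho u^{2}+p)\,A\\ (E+p)\,u A\end{pmatrix}
    = \begin{pmatrix}0\\ p\,dA/dx\\ 0\end{pmatrix},
    \qquad
    E = \frac{p}{\gamma-1} + \frac{1}{2}\rho u^{2}.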

One multidimensional fluid dynamics code used extensively at ARL for the numerical simulation of blast effects is the SHARC code. SHARC is an explicit, finite difference Euler solver which is second-order accurate in space and time [7]. Because of its algorithmic similarities to the SHARC code, only the explicit algorithm of the BRL-Q1D code was adapted to the iPSC/860.

The MacCormack explicit scheme employed in BRL-Q1D is a second-order, non-centered, predictor-corrector technique that alternately uses forward and backward differences for the two steps. The first step predicts the value of the state variables at a grid point based on the values at the grid point and its neighboring downstream grid point. The second step then corrects these state variables based on the values at the grid point and its neighboring upstream grid point.
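Written out in the standard MacCormack form (a sketch consistent with the description above rather than a quotation of the BRL-Q1D source; U denotes the vector of conserved variables and F its flux), the two steps are

    U_{j}^{*} = U_{j}^{n} - \frac{\Delta t}{\Delta x}\left(F_{j+1}^{n} - F_{j}^{n}\right),
    \qquad
    U_{j}^{n+1} = \frac{1}{2}\left[U_{j}^{n} + U_{j}^{*} - \frac{\Delta t}{\Delta x}\left(F_{j}^{*} - F_{j-1}^{*}\right)\right].

The forward difference in the predictor is what requires data from the downstream neighbor (the f(j+1,1) term in the code segments that follow), and the backward difference in the corrector requires data from the upstream neighbor.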

Prior to its adaptation to the iPSC/860, the BRL-Q1D code existed in standard Fortran 77 form, and the explicit, finite difference algorithm had been optimized for maximum performance on vector supercomputers. So that it could obtain maximum performance on the iPSC/860, the code was modified to evenly distribute the arrays among the available processors, and then calls to the iPSC message-passing library were inserted where necessary for the transmission of data between processors.

The even distribution of arrays among available processors is a technique which is often referred to as "domain decomposition." In the case of the one-dimensional code, domain decomposition is nothing more than evenly dividing the number of grid points being used by the calculation by the number of available processors, then placing that number of adjacent grid points on successive processors. For example, to distribute an 800 grid point calculation among 8 processors, grid points 1 to 100 would be placed on processor 0, 101 to 200 on processor 1, etc. The BRL-Q1D code was modified in such a way that each time the code was run, it would automatically determine the number of processors that were available and dynamically allocate the arrays based on the grid size and this returned number of processors. Of course, domain decomposition in two or three spatial dimensions can be much more complex than this simple one-dimensional case. This is especially true when complex geometries are being modeled.
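As an illustration only (the actual decomposition routine used in BRL-Q1D is not listed in this report; the routine name decomp and the variables nprocs and nlocal are invented for this sketch, while ibeg, iend, and jmax are the names used in the code segments later in this report), the index vectors could be filled in essentially this way, giving each node a contiguous block of grid points and letting the last node absorb any remainder:

c     illustrative sketch only: fill ibeg(i)/iend(i) with the first
c     and last global grid point owned by node i-1, for jmax grid
c     points spread over nprocs nodes.
      subroutine decomp (jmax, nprocs, ibeg, iend)
      integer jmax, nprocs, ibeg(nprocs), iend(nprocs)
      integer i, nlocal
      nlocal = jmax/nprocs
      do 10 i = 1, nprocs
         ibeg(i) = (i-1)*nlocal + 1
         iend(i) = i*nlocal
   10 continue
c     give any leftover grid points to the last node
      iend(nprocs) = jmax
      return
      end

With jmax = 800 and nprocs = 8, this reproduces the example above: node 0 owns points 1 to 100, node 1 owns 101 to 200, and so on.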

After the domain decomposition scheme had been developed, it was then imposed on the solution algorithm, so that the computational load would be evenly distributed among the processors. In the original Fortran 77 implementation of the BRL-Q1D code, the successive prediction and correction of state variables was accomplished through the use of DO loops which proceed through the one-dimensional grid, from beginning to end, in one grid point increments. For the message-passing implementation, the range of DO loop operation on a particular processor was limited to the grid points which were allocated to that processor. If a calculation requires data from a grid point which does not reside on the processor doing the calculation, that data is passed to the processor prior to the calculation. The following examples of original Fortran 77 code and message-passing code illustrate this logic.

Segment of Original Fortran 77 Code

      do 20 j=2,jmax-1
         s(j,1) = q(j,1) - dt*(f(j+1,1)-f(j,1))
   20 continue

Segment of Equivalent Message Passing Code

      istart = ibeg(mynode()+1)
      istop  = iend(mynode()+1)
      if (istart .eq. 1)    istart = 2
      if (istop  .eq. jmax) istop  = jmax-1
      if (numnodes() .ne. 1) call passleft(f,1)
      do 20 j=istart,istop
         s(j,1) = q(j,1) - dt*(f(j+1,1)-f(j,1))
   20 continue

These code segments are examples of a predictor step in the explicit algorithm. In these examples, the array s is calculated from the arrays q and f and a constant, dt, for all grid points between the second and the next to the last, inclusive. The calculations for the first and last grid points are performed in a separate boundary condition subroutine. The functionality of the Fortran 77 code is obvious from the example given above. In the message-passing example, the vectors ibeg and iend hold the beginning and ending array indices assigned to each processor in the domain decomposition subroutine. On the first processor, the loop start index taken from ibeg is reset from a value of 1 to 2 in order to conform to the limits on the DO loop in the Fortran 77 example. In a similar fashion, on the last processor the loop end index is reset from jmax to jmax-1.

In the message-passing example, when the calculation of the state variable s reaches the last iteration in the loop on a particular processor, a value from the f array which resides on a neighboring processor is required. For this calculation to be performed properly, all processors except the first pass the first element of f on the processor to the adjacent processor on the left. This operation is initiated by the call to the subroutine passleft in the example. The first argument in the call to the passleft subroutine is the name of the array to be passed. The second argument is the number of elements to be passed between processors. If the calculation of s had used an f(j+2,1) term, then all processors except the first would pass two elements of f to the left neighboring processor. For the corrector step in the explicit algorithm, an analogous passright subroutine is used, in which all processors except the last pass data from the specified array to the neighboring processor on the right.

To further illustrate the techniques employed in message-passing programming, the passleft subroutine is listed below. This subroutine listing shows the process by which processors pass the first n sub-array elements in their memory to their respective left neighboring processor. The receiving processors then store this data as the last n sub-array elements in memory.

In studying the function of this subroutine, it is important to remember that the subroutine runs simultaneously on all of the assigned processors. When the passleft subroutine is called, data from a particular flow parameter array are stored in a dummy array a. This dummy array is a two-dimensional array with the first dimension assigned to be the number of grid points, jmax, and the second dimension assigned as the number of variables per grid point for the array, in this case three (energy, density, and momentum, for example). The passleft subroutine is designed to transfer all three of these variables for a given grid point for any particular call to the subroutine. The listing of the passleft subroutine shows the following steps which are taken to transfer the data:

1. The send and receive indices for each processor are defined. These indices become counters in a DO loop if data from multiple grid points are to be transferred between processors.

2. All of the processors are synchronized in time so that the communication takes place simultaneously on the processors.

3. On all processors except the first, the three variables for a given grid point of the dummy array a are written into a temporary, three element array b. The particular grid point is defined by the send index.

4. All of the processors except the first send the contents of array b to the left neighboring processor.

5. All of the processors except the last receive the data sent from the right neighboring processor and store the data in the temporary, three element array b.

6. On all of the processors except the last, the three variables stored in the b array are written into the dummy array a for the proper grid point. The particular grid point is defined here by the receive index.

7. This process is repeated if data from more than one grid point is being transferred.

Listing of Passleft Subroutine

      subroutine passleft (a,n)
c
c     this subroutine passes the first n elements of sub-array
c     a from a processor to its left neighbor processor.
c     the receiving processor stores the data as the last n
c     elements of the sub-array a.
c
      include 'param.h'
      include 'mimd.h'
c
      dimension a(jmax,3),b(3)
c
c     is = send index for local node
c     ir = receive index for local node
c
      is = ibeg(mynode()+1)-1
      ir = iend(mynode()+1)
c
      do 10 k=1,n
         is = is+1
         ir = ir+1
         call gsync()
         if (mynode().ne.0) then
            do 20 j=1,3
               b(j) = a(is,j)
   20       continue
            call csend (0,b,3*4,mynode()-1,mypid())
         endif
         if (mynode().ne.numnodes()-1) then
            call crecv (0,b,3*4)
            do 30 j=1,3
               a(ir,j) = b(j)
   30       continue
         endif
   10 continue
c
      return
      end
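The passright subroutine used by the corrector step is not listed in this report. The sketch below is an assumed mirror image of passleft, in which all processors except the last send their last n sub-array elements to the right neighbor, and the receiving processors store them in the elements just ahead of their own first grid point. The iPSC calls (gsync, csend, crecv, mynode, numnodes, mypid) are the same ones used in passleft; the index arithmetic is an assumption, not a quotation of the original source.

c     illustrative sketch only: pass the last n elements of
c     sub-array a from a processor to its right neighbor.
c     the receiving processor stores the data in the n elements
c     just before its own first element of the sub-array.
      subroutine passright (a,n)
c
      include 'param.h'
      include 'mimd.h'
c
      dimension a(jmax,3),b(3)
c
c     is = send index for local node
c     ir = receive index for local node
c
      is = iend(mynode()+1)+1
      ir = ibeg(mynode()+1)
c
      do 10 k=1,n
         is = is-1
         ir = ir-1
         call gsync()
         if (mynode().ne.numnodes()-1) then
            do 20 j=1,3
               b(j) = a(is,j)
   20       continue
            call csend (0,b,3*4,mynode()+1,mypid())
         endif
         if (mynode().ne.0) then
            call crecv (0,b,3*4)
            do 30 j=1,3
               a(ir,j) = b(j)
   30       continue
         endif
   10 continue
c
      return
      end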

4. Results

When the adaptation of the BRL-Q1D code was completed, the performance of the code was tested on both the Gamma and Delta models of the iPSC/860 architecture. Due to the nature of the hypercube topology of the iPSC/860 Gamma model, the number of available processors is always an exact power of two. A Gamma model with 16 processors was employed for these tests. The iPSC/860 Delta model, with its mesh topology, is not constrained to the power of two processor requirement of the Gamma. The particular Delta machine used for the tests has 532 processors. However, only 256 processors were used in the maximum Delta configuration tests. So that Delta results could be directly compared with Gamma results, all Delta trials employed power of two processor configurations and problem sizes.

In all cases, the measured performance of the BRL-Q1D code was represented as the "whiz factor." This is a measure of the average CPU time required for the code to compute a solution, divided by the product of the number of grid points and the number of cycles in the calculation (µs/grid point/cycle). This is a convenient method of measuring the code's performance because it normalizes the run time against the problem size and the number of time steps in the calculation. Thus, using the whiz factor as a benchmark, results of different calculations can be compared directly. For a particular processor configuration and problem size, the reported performance is the minimum whiz factor (i.e., best performance) out of a set of several identical trials.
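Written as a formula, the whiz factor W for a run that consumes CPU time t_CPU on a grid of N_grid points for N_cyc cycles is

    W = \frac{t_{CPU}}{N_{grid}\,N_{cyc}} .

As a purely hypothetical example chosen to illustrate the units (these numbers are not measurements from this report), a run that used 80 s of CPU time on a 16,384 point grid for 1,000 cycles would give W = 80 s / (16384 x 1000), or approximately 4.9 µs/grid point/cycle.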

The first set of tests was performed to determine the influence of varying problem size on code performance. These tests were performed only on the Gamma model. The results of these tests are illustrated in Figure 1. This figure shows several curves illustrating the relationship between whiz factor and problem size for different processor configurations. The figure shows that, for small problems, the performance of the iPSC/860 is highly dependent on the size of the problem. As the problem size is increased, all of the processor configurations approach an upper limit on performance (i.e., a minimum whiz factor). All of the curves in Figure 1 have a similar shape: an initially sharp drop in whiz factor as the problem size is increased, followed by a bump in the middle of the curve, and ending with a leveling off as the maximum performance for that processor configuration is reached. The bump in the middle of each curve is a result of the increasing size of the problem filling the memory systems of each processor. The curve representing the trials with eight processors is slightly different from the other curves at the data points corresponding to problem sizes of 2^10 and 2^11. For this processor configuration, the performance increased very little from a problem size of 2^9 to 2^10. Then, from 2^10 to 2^11, the performance increased significantly. In fact, the measured performance of the code on eight processors is exactly the same as the sixteen processor result for the same problem size of 2^11. Several additional trials were performed here to verify the result, and the results were consistent. Thus this appears to be merely an interesting characteristic of the Gamma model, most likely resulting from a fortuitously optimum layout of the data in memory for the particular configuration of eight processors running a problem size of 2^11.

As previously discussed, the Delta model allows the user to access arbitrary numbers of available processors. The Delta also allows users to define the two-dimensional layout of the processors within the mesh topology. With this in mind, a series of tests was performed to determine the influence of processor layout on code performance. In this series, a problem size of 16384 was run on 16 processors of the Delta. Five tests were performed in which processor layouts of 1x16, 2x8, 4x4, 8x2, and 16x1 were employed. The results of these tests are provided in Table 1. Also included in this table is the result of the identical calculation on the Gamma. These results illustrate that the performance of the BRL-Q1D code on the Delta is independent of processor configuration. Thus it can be assumed that communication of data between processors is not heavily influenced by the proximity of any two communicating processors.

Table 1. BRL-Q1D Performance on Delta as a Function of Processor Layout

    Processor Layout    Whiz Factor (µs/grid point/cycle)
    1x16                7.75
    2x8                 7.79
    4x4                 8.42
    8x2                 7.75
    16x1                7.81
    Gamma               9.00

As previously discussed, true scalability of both the architecture and the algorithm is essential to successful exploitation of MPP technology. Thus, to determine the scalability of the BRL-Q1D code on the iPSC/860, a final series of tests was performed in which successive tests employed increasing numbers of processors. The problem size was increased accordingly with the increase in processors, so that lines of constant problem size per processor could be determined. These tests were performed on both the Gamma and the Delta models and are illustrated in Figures 2 through 6.

Figures 2 through 6 illustrate the scalability tests of the Gamma and Delta using 512, 1024, 2048, 4096, and 8192 grid points per processor, respectively. In these figures, the measured performance data points for the Gamma are represented by the solid dots, while the data for the Delta are represented by the star symbols. If the adapted BRL-Q1D algorithm were perfectly scalable on the Gamma and Delta architectures, then doubling the number of processors used would result in a factor of two decrease in the whiz factor (i.e., doubling the number of processors would double the performance). The solid line in Figures 2 through 6 represents this ideal scalability, which is extrapolated from the measured whiz factor for one processor of the Gamma. Accordingly, the dashed line represents the same ideal scalability relationship for the Delta.
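Restating the extrapolation just described as a formula, with W(1) denoting the measured single-processor whiz factor and P the number of processors, the ideal scalability lines obey

    W_{ideal}(P) = \frac{W(1)}{P} ,

so each doubling of P halves the ideal whiz factor, consistent with the factor-of-two statement above.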

Due to the logarithmic formulation of the scalability relationship, the ideal scalability curves result in straight lines when plotted against log-log axes. Figures 2 through 6 show that the lines of ideal scalability for both Gamma and Delta pass through the scatter of the measured data, indicating perfect scalability. These figures also illustrate that the performance of the Delta is slightly better than that of the Gamma for all cases. This improvement can be attributed to advances in compiler technology and system software on the Delta, the later of the two architectures.

5. Conclusions

This report has outlined the successful implementation of the explicit, finite difference BRL-Q1D algorithm on the Intel iPSC/860 parallel computer. A division of labor among processors, along with a coordinated use of message-passing between processors, was used to evenly distribute the algorithm across the resources of the architecture. The results presented in the figures lead us to conclude that this message-passing implementation of the BRL-Q1D code is indeed perfectly scalable on both hypercube and mesh processor topology MIMD computers.

The likelihood of a successful implementation on parallel architectures is dependent on the level of inherent parallelism in the algorithm. The explicit BRL-Q1D algorithm is inherently data parallel. Its inner DO loops typically span the entire computational mesh. As a result, distribution of the algorithm across processors is easily accomplished.

As stated earlier, the adaptation of the BRL-Q1D code to the iPSC/860 was part of an attempt to evaluate MPP technology for blast modeling applications. The success of this and other implementations of the code on MPP computers is an indication that significant performance improvements can be obtained from the adaptation of large, multidimensional fluid dynamics codes to MPP platforms.

Other types of codes, however, may not be good candidates for adaptation to parallel computers. When this is the case, it may be necessary to completely restructure the basic algorithm in order to increase the level of inherent parallelism. Once this is done, the algorithm can be adapted to parallel computers with greater likelihood of a successful implementation.

Figure 1. Measured Performance of BRL-Q1D Code. (Plot of whiz factor, in µs/grid point/cycle, versus problem size on the Intel iPSC/860 Gamma, with one curve per processor configuration from 1 to 16 processors.)

Figure 2. Scalability Using 512 Grid Points / Node. (Measured and ideal whiz factor versus number of processors for the Gamma and Delta.)

Figure 3. Scalability Using 1024 Grid Points / Node. (Measured and ideal whiz factor versus number of processors for the Gamma and Delta.)

Figure 4. Scalability Using 2048 Grid Points / Node. (Measured and ideal whiz factor versus number of processors for the Gamma and Delta.)

Figure 5. Scalability Using 4096 Grid Points / Node. (Measured and ideal whiz factor versus number of processors for the Gamma and Delta.)

Figure 6. Scalability Using 8192 Grid Points / Node. (Measured and ideal whiz factor versus number of processors for the Gamma and Delta.)

References

1. Schraml, S. J. "A Data Parallel Implementation of the BRL-Q1D Code." Technical Report BRL-TR-3389, U.S. Army Ballistic Research Laboratory, Aberdeen Proving Ground, Maryland, September 1992.

2. Intel Scientific Computers. "iPSC/860 User's Guide." Order Number 311532-001, Beaverton, Oregon, June 1990.

3. Intel Scientific Computers. "Parallel Programming Primer." Order Number 311914-001, Beaverton, Oregon, March 1990.

4. Beam, R. M., and R. F. Warming. "An Implicit Factored Scheme for the Compressible Navier-Stokes Equations." AIAA Journal, Vol. 16, No. 4, April 1978, pp. 393-402.

5. MacCormack, R. W. "The Effect of Viscosity in Hypervelocity Impact Cratering." AIAA Paper No. 69-354, 1969.

6. Opalka, K. O., and A. Mark. "The BRL-Q1D Code: A Tool for the Numerical Simulation of Flows in Shock Tubes with Variable Cross-Sectional Areas." Technical Report BRL-TR-2763, U.S. Army Ballistic Research Laboratory, Aberdeen Proving Ground, Maryland, October 1986.

7. Hikida, S., R. Bell, and C. Needham. "The SHARC Codes: Documentation and Sample Problems." SSS-R-89-9878, S-CUBED, September 1988.



