functional methods to set up and analyze hydrodynamical ... · 1.4 programming jargon due to the...

Functional Methods to Set Up and Analyze HydrodynamicalSimulations of Star-Forming Molecular Clouds

Funktionale Methoden fur Setup und Analyse hydrodynamischerSimulationen von sternbildenden Molekulwolken

MasterarbeitGeorg Michna

September 30, 2013

Supervisor:Prof. Dr. Andreas Burkert

Contents

1 Introduction 41.1 SPH, GPGPU, and Espresso . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2 Stability of isothermal filaments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Evolving views on programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.4 Programming jargon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.5 Weighing optimization against simplicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Functional setup and analysis of SPH simulations 82.1 Program structure and intended workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Random number generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3 Setup of homogeneous shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Glasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.4 Placement of arbitrary distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Binary probability tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Random placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Quality test and benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13“Noisy” placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Possible extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.5 EBT data format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17SPH-specific format definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.6 Integration of plotting and rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.7 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.8 Test: isothermal spherical collapse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.9 Test: adiabatic energy conservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3 Radial simulation of isothermal self-gravitating filaments 263.1 Star-forming filaments in observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.2 The Ostriker density profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.3 Espresso runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Stability from softening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.4 Simulator implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31Piecewise linear interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31Hydrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32Gravity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33Pressurized Surroundings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.5 Artificial dissipation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34Oscillation suppression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2

Contents

3.6 Fixed-border Ostriker simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35Simulations close to critical mass per length . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.7 Variable-border Ostriker simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Assessment of F# in computational physics 424.1 Introducing F# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.2 The functional paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Immutability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.3 The Common Language Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

CLI runtimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.5 Units of measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.6 Typing and type safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.7 IDE options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.8 Evaluation summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Distinguishing between high-level and high-performance code . . . . . . . . . . . . . . . . 494.9 Closing remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

A Appendix 50A.1 EBT file format specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50A.2 EBT index specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50A.3 Implemented Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

IOFlags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51A.4 Unused feature: 3D Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Bibliography 54

3

1 Introduction

This thesis examines a set of modern programmingtechniques, applying them to simulations of star-forming molecular clouds. It is composed of threemain parts, each handling a distinct topic:

• Methods to efficiently set up, export, start,and analyze runs of a Smoothed Particle Hy-drodynamics (SPH) simulation. Example im-plementations to use and test the Espressosimulator (introduced in the next section) arepresented.

• Simulations of the radial density distributionof isothermal star-forming filaments using acustom one-dimensional grid simulator withcylindrical symmetry.

• Assessment of the F# programming language,Common Language Infrastructure (CLI) andthe functional programming paradigm in gen-eral for use in physical applications.

The motive for a focus on programming practicesstems from the increasing performance of simula-tors – and computational systems in general. Whenan expert group sets up and runs a simulation overmonths, it may not be significant how much timewas spent on realizing particle distributions, man-aging simulations or transforming data. However,when a single student on a typical machine is ableto run a simulation, such overhead may dominatethe overall time spent.

The following analyses and experiments aim tomanage and run simulations quickly, flexibly andwith a lower error rate. The methodology is givendeliberate emphasis compared to the algorithmsand results, since its importance rises as simula-tions themselves become more efficient. I wouldlike readers to see this shift as the overall themethat connects the only loosely related topics of thisthesis.

1.1 SPH, GPGPU, and Espresso

Smoothed Particle Hydrodynamics, introduced byGingold and Monaghan in 1977 [18], uses the po-sitions and properties of particles to fully describethe state of a simulated fluid. Spatial grids, thecentral data structure to many other types of sim-ulators, are not part of the representation; if theyare used at all, they only serve as a performance op-timization. This can necessitate additional steps torelate SPH states with analytical or abstract mod-els; it can also complicate tasks such as visualiza-tion or the placement of particles. Chapter 2 han-dles this topic in detail.

The Espresso simulator is a not yet publicly re-leased SPH simulator supporting general purposegraphics processing unit (GPGPU) execution thatis currently under development by Martin Zintl[34]. GPUs are able to massively outperform CPU-based options, allowing simulations featuring mil-lions of particles to run with reasonable speed onworkstation computers. Much of the material pre-sented in this thesis, especially in Chapter 2, is re-lated to the testing and development of Espresso.

There is no official release of Espresso at the timeof this writing. This thesis’ code version controlcurrently hosts Espresso; contact Martin Zintl [34]for information on a public release or me for ac-cess. Since the combined repository may containunpublished material, permission to use code fromthis thesis by people outside the CAST group of theUSM Munich requires Martin Zintl’s consent. I canseparately provide the code relevant to this thesis,which runs independently but is unable to run SPHsimulations without Espresso.

1.2 Stability of isothermal filaments

The exact processes leading to star formation inmolecular clouds are subject to ongoing debate.Observations of star-forming filaments in molecu-lar clouds, conducted in a 2011 study by Hacar etal. [13], largely motivated Chapter 3. Studying

4

1.3. Evolving views on programming

Figure 1.1: Shows a star-forming filamentary structure in the Taurus molecularcloud, as observed by the APEX telescope [2], in a visible-light image super-imposed with submillimetre-wavelength observations (orange glow). Individualfilaments of this cloud are the subject of studies by A. Hacar et al. [13] whichin turn motivated simulations of such filaments’ radial density profile detailed inChapter 3. Credit: ESO/APEX (MPIfR/ESO/OSO)/A. Hacar et al./DigitizedSky Survey 2. Acknowledgment: Davide De Martin.

the L1517 dark cloud in Taurus, pictured in Figure1.1, it has found filaments to be surprisingly quies-cent, featuring coherent velocity distributions verydifferent from their turbulent environment. Thisgives new insight into the star-forming processes insuch molecular clouds, as dense cores that collapseinto stars may be formed by fragmentation of suchvelocity-coherent filaments.

It also allows for more specific approaches inthe theoretical modeling of filaments’ structure,strengthening the view that isothermal cylindermodels apply. Investigating the validity of thistheory and applying it in numerical simulations ofgravitationally supported filaments’ radial profilesis the main topic of Chapter 3.

The model for comparison is an analytical equi-

librium solution of the infinite, self-gravitatingcylinder by J. Ostriker [24]. It has finite mass perlength, as its density scales with r−4 for large r.Section 3.2 discusses its statements for the isother-mal case and some of its implications for pressure-supported filaments. The results of these analyti-cal considerations are then reproduced in dynam-ical simulations, under slightly dampened motionand sufficient run-time to relax into a stationarydistribution.

1.3 Evolving views on programming

Recent years have seen a shift shift in program-ming paradigms, the thought models that gov-

5

1. Introduction

ern programming and the design of its languages.Classical languages like FORTRAN have becomemore object-oriented, while object-oriented lan-guages such as C# have become more functional.The evolution from C to the more object-orientedC++ to the introduction of some functional fea-tures in C++11 is an example of this development.

This thesis uses F# for most of its code andplaces an emphasis on functional programmingstyle, which is briefly introduced in Section 4.2.

Hydrodynamics may be an especially interestingcontext for the application of F# and its empha-sis on the functional programming paradigm. Thebook Expert F# 2.0 [7], a learning resource andreference used for this thesis, begins its introduc-tion to the concept of imperative programming –the classical competitor to functional programming– by naming the simulation of fluid dynamics asa typical application. Other applications in the-oretical astrophysics, such as the various n-bodysimulations ranging from asteroid orbits to glob-ular clusters and cosmology, show a similar dom-inance. While classical, imperative programmingin the style of FORTRAN and its descendants hascertainly been an effective tool, astronomy – and,in my personal experience, physics in general – isso firmly in its hands that concepts from object-oriented and functional programming tend to beforgotten.

A special feature of F#, which is rarely seen inthis quality even in other functional languages, isits support of units of measure in code. Implemen-tations created for this thesis make heavy use ofthem. Section 4.5 provides some additional infor-mation about this.

1.4 Programming jargon

Due to the nature of this thesis, it includes pro-gramming jargon that may not be in common useamongst astrophysicists. I suggest that readerswho are not at all familiar with the concept of im-mutability or functional programming read Section4.2 early. It is only a rough introduction; trying toexplain the entire field of functional programmingwithin this thesis is not feasible. Still, it may helpexplain some of the atypical design decisions in thisproject.

Jargon can be especially confusing when it has aconventional meaning that differs from the intendedmeaning, such as the word “functional” when re-ferring to the functional paradigm. Other terms tolook out for are:

• Method : in object-oriented programming, amethod is a function tied to an object instance.In the CLI, static methods still bear this name,even though they are tied only to a type, notan instance. F# supports and distinguishesbetween functions and methods, so the textuses either term depending on the implemen-tation.

• Record : an F# record is a data structure ag-gregating named (and typed) values. Recordsare often immutable. F# provides useful syn-tax for creating a modified version of a record.Many data types in this thesis are imple-mented as records.

• Module: In F#, modules are a tool for struc-turing. They group code and, unlike othertypes, allow to expose functions (as opposedto methods).

• CLI and CIL: The swapped letters may seemlike mistakes, but these are different – thoughrelated – concepts. The CLI is the CommonLanguage Infrastructure; it forms the platformF# is built on. Section 4.3 introduces it. TheCLI outputs a byte-code after its first compi-lation step that is called CIL, the Common In-termediate Language. Whether or not this pairof abbreviations is sensible, it is consideredproper terminology and will be used through-out this thesis.

1.5 Weighing optimization againstsimplicity

When designing a project such as this one, relia-bility and simplicity sometimes conflict with per-formance. In most of these cases, I did not favorperformance. For instance, all floating point calcu-lations are done in double precision and there is nooption to change this. Including such a “precisionswitch” would, in my opinion, cost users more timethan the faster program gains them. The addi-tional time-consuming steps users need to take dueto increased complexity can be subtle and tricky topredict. Sticking with the example of a precisionswitch: much of the code would be affected and re-quire additional documentation. Users would haveto keep precision in mind as an error source. Type-safe implementation would become more compli-cated. The program run-times were just not longenough to justify such optimization. In a similarvein, random number generation uses an entropy

6

1.5. Weighing optimization against simplicity

source of cryptographic quality by default. Ex-tended versions of the library functions allow theuser to change this, but in most cases, that shouldnot be necessary. The cryptographic generator isreasonably fast, yet a single case of artifacts dueto bad randomness can cost a lot of time. (SeeSection 2.2 for a description of the used randomnumber generation.)

Generally, physicists’ time is a rather expensiveresource compared to CPU time. It should be in-cluded in speed considerations. Setup and analysisscripts usually take by far more time to write thanto run. When facing the option of improving a pro-gram’s speed at the cost of adding complexity, it iseasy to underestimate the cost of the complexity.As Donald Knuth famously said, “premature opti-mization is the root of all evil.” Even though thisis a demonstration code without real users, it triesto balance between performance optimization andsimplicity of the interface that users see – to respectthe constraints a project with multiple active userswould have. (And not be evil.)

7

2 Functional setup and analysis of SPHsimulations

Modern programming methods have the potentialto improve the efficiency of writing code in compu-tational astrophysics. The SPH setup and analysisF# library that was written for this thesis, calledSPHS library in the following, is intended as a con-crete example of how this can be achieved. It pro-vides tools and data structures that can be calledfrom any F# program, or any CLI program whenabandoning units of measure. It aims to allow ef-ficient combination of these tools, both with eachother and external programs.

One of the SPHS library’s tasks is to allow theuser to set new initial conditions as quickly as pos-sible. The most intuitive way of defining initialconditions is often by analytical functions, such asdensity distributions, particle distributions, or ve-locity fields. Unlike grid-based simulators, SPHsimulators cannot directly take such distributionsas input. This leaves it to the user to createparticles that approximate the desired distribu-tion. To speed up the creation of initial condi-tions, the SPHS.Make module and a space parti-tioner are introduced. They provide functionalityto easily specify grids, glasses, and arbitrary distri-bution functions. The outputs are 3D coordinatesequences which can easily and flexibly be trans-formed into particles.

2.1 Program structure andintended workflow

Many tools, such as the common plotting programsGnuplot [11] and Octave [22], are written in onelanguage and used in another. Their front-end lan-guages are often oversimplified; they demote theuser to a second-class citizen who does not have ac-cess to the full capabilities of the program’s code.The user ends up writing analysis code in a differ-ent language, to then export the results into thetool’s inferior language. The language is only used

because it exclusively offers certain features, notbecause of its quality. The SPHS library attemptsto avoid this. By using a well-designed general-purpose programming language as the front-end,it allows the user to introduce new features inthe same way the library functions were written.Users can integrate external tools by writing wrap-per functions, in the same way the library func-tion Simulation.RunLocal can compile and runthe Espresso simulator. Formally, there is no no-table difference between using the library and ex-panding the library. Both simply add new defini-tions. Only the last step differs, in which a desiredcomputation is fully defined and executed.

The library loads the following definitions in or-der:

• the EBT data format (See Section 2.5)

• CGS base units and astrophysical unit values(parsec, M, G, . . . )

• geometry types for 3D vectors, cubes andcuboids

• a random number generator class (see Section2.2)

• an SPH particle type and thread-safe ID gen-erator

• tools and data types to run Espresso

• the Make module and partitioner (See Sections2.3 and 2.4)

• 3D point cloud rendering and 2D function plot-ting (see Section 2.6)

• a 1D grid simulator featured in Chapter 3

Figure 2.1 shows an overview of the library’s setupand analysis features. Even though it shows addi-tional programs only as tools to process output, all

8

2.2. Random number generation

Figure 2.1: The data pipeline of the SPHSlibrary in its intended standard use case.Arrows show information flow; grey ar-rows pass behind boxes. User-written partsare green, the SPHS library is blue withthe EBT part violet, and the native/GPUEspresso simulator red. Usage of noisyplacement by Make.glass is omitted asan irrelevant implementation detail. TheRules record has two colors as its structureis predefined, but its content set by the user.

user code can call external libraries if necessary –both CLI and native code (e.g. using F#’s NativeInterop features).

The graphic shows how all functionality is ar-ranged around the main program that is writtenby the user. Assumptions on this program havebeen avoided wherever possible; it could run mul-tiple simulations or only use a subset of the li-brary, using custom code in place of the remain-der. Though its features rely on some commondefinitions such as the geometry types, the libraryconsists of many parts that are independent in us-age. Five such parts are visible upon closer in-spection of Figure 2.1: continuous placement, ho-mogeneous placement, simulation interaction, func-tion plotting, and 3D rendering. In addition, the

data types, random number features, and EBT fileformat could be used independently. (Though theother library features could not be used indepen-dently of these.)

Settings for Espresso are specified using the Rulesrecord. (Appendix A.3 has a list of implementedrules.)

2.2 Random number generation

Many of the algorithms presented in the followingrely on randomness. For convenient use of ran-dom numbers, a helper class is provided. It offersthe creation of double-precision floating-point num-bers uniformly distributed in [0, 1] as well as three-dimensional vectors with unit-of-measure support,uniformly distributed in cubes or cuboids definedby using the library’s vector, cube, or cuboid types.

The class is not thread-safe, but can be instanti-ated multiple times for parallel usage. This is donevia one of three factory methods, which differ intheir entropy sources:

• Standard (): a cache-enhanced wrapper overRNGCryptoServiceProvider, a class of theCLI Runtime that creates very high qualityrandom sequences.

• Well512 seed: a deterministic pseudo-random entropy source that returns the samesequence for the same value of seed, which is a64-bit integer. This is useful to create exactlyrepeatable processes

• Well512 (): similar to the seeded version, butuses the standard entropy source to obtain theseed.

Performance between the entropy sources is hardto judge as it varies, depending on compilation op-timizations, CLI Runtime, operating system, andhardware.

Caching code that greatly improves performanceand the Well512 pseudo-random generators havebeen supplied by Christian Winnerlein, whom Ithank for his support. I also thank him for warningme of serious flaws in the .NET library’s standardrandom number generator, which caused me to re-move it from the selection.

9

2. Functional setup and analysis of SPH simulations

2.3 Setup of homogeneous shapes

Grids

The library’s simplest shortcut function,Make.grid, creates any type of grid that canbe expressed by a cubic primitive cell. The param-eters it takes are a grid type, a bounding box, abinary distribution and the desired output density.The grid type is provided either by picking fromthree common grids – simple cubic, face-centeredcubic and body-centered cubic – or via a list ofoccupied unit cell positions.

The algorithm scales the unit cell to have thedesired particle density and repeats it periodicallyin all directions of space. It then fills the resultinto the given bounding box, yielding all positionswhere the binary distribution function returns true.This allows it to cut shapes from the grid. The co-ordinate system of the grid unit cell aligns withthat of the output. Make.grid does not attemptto approximate the distribution function, callingit for every possibly valid grid position inside thebounding box. Therefore, if large voids exist in thebounding box and run-time should become a prob-lem, performance can be increased by using smallerbounding boxes, calling Make.grid multiple timesfor separate objects if needed.

Glasses

It is commonly required to fill shapes with a glass,which is a fairly relaxed particle distribution thatshows no distinct grid axes. Glass is useful sincea simulation using it neither involves a sudden re-laxation in the beginning nor shows anisotropic be-havior due to the small-scale particle distribution.

Similar to Make.grid cutting from a grid, cut-ting from a glass distribution is possible withMake.glass. The function’s signature is analogousto its grid-based counterpart, except that it – nat-urally – takes no grid type. Instead, it needs thesimulator settings – provided via a Rules record –and a boolean determining whether to generate anew glass or use a cached one.

Despite being similar to the grid equivalent inusage, internally glass creation is a comparativelylengthy process. First a cube is randomly filledwith particles, then the result is exported andshifted around by Espresso in glass creation mode,so that the glass corresponds to the physical set-tings of the simulation. The result is then importedback to be used as an infinite periodic glass. Fromthis glass, the desired shape is cut. A typical next

step would be to generate particles from these glasspositions and export them again to finally start thephysical simulation.

To initially fill the cube, the “noisy” semi-random placement algorithm for arbitrary distribu-tions (described later in this chapter) is called witha homogeneous distribution. This creates a fixedamount of positions, currently set to 10,000, at eachof which a generic SPH particle is made. To ma-nipulate these particles in Espresso, simulator rulesare prepared by copying the rules given as a param-eter, with the following rule changes: the simulatoroutput is reduced to positions, boundaries are setto the cube in which the particles were placed, glasssteps become 1,000 and physical steps 0, periodicboundary conditions are set on all dimensions, andgravity is disabled. Note that this implementationmay require to be edited if further simulator featuresare introduced. Any member of the rules record willbe copied into the settings of the glass creation run,so the implementation should disable all availablesettings that can interfere with glass creation.

The new rules are exported, together with thegeneric particles, into an EBT formatted file (seeSection 2.5), which is passed to Espresso. After thesimulator finishes compiling and running – Espressomay need to recompile itself depending on the set-tings – its output positions are re-imported. Thesimulator output is also kept on disk as a cache. Ifa cache file is present and the glass generation func-tion is called with the parameter for enforced glassgeneration set to false, all steps up to the importfrom file are omitted and the cached glass is used.

Now that we have a periodic cube of glass, thenext steps are identical to Make.grid: the glassis infinitely repeated and positions accepted if theymatch the distribution. In fact, Make.glass shouldcall Make.grid at this point. It does not merelyfor historical reasons. This restructuring step wasomitted due to time constraints concerning testing.The current implementation is equivalent though;the change would only remove a repetition of code.

Make.glass is intended for 3D use but does haveminimal handling of 2D simulations. It does cor-rectly recognize 2D rules and disables z-repetitionsaccordingly. However, due to the unit-of-measuresystem being fixed at compile time, density unitsand their interpretation remain three-dimensional.To comfortably use this feature in 2D, the functioncould be split into a 2D and 3D variant with differ-ent unit-of-measure signatures.

10

2.4. Placement of arbitrary distributions

2.4 Placement of arbitrarydistributions

The general form of a distribution is a density func-tion over space. Most such distributions cannot bereasonably approximated by homogeneous shapesas done in the previous section, so a more generalmethod is required.

A simple but rather slow random placement algo-rithm works as follows: it generates pairs of randompositions and numbers between 0 and the maxi-mum of the density distribution. For each suchpair, it looks at the density distribution at the ran-dom position and compares it to the random num-ber. If the random number is below the distribu-tion, this position is accepted into the output, oth-erwise it is discarded. This process is repeated untilenough positions have been found. This is feasi-ble for roughly homogeneous distributions, but be-comes very inefficient if empty space or high peaksin the distribution are involved. Since the algo-rithm must account for the possibility of hitting apeak in the distribution, the probability of plac-ing a particle after randomly selecting a locationelsewhere becomes very low. This can make thismethod of placement unacceptably slow. Anotherinconvenience is that the maximum of the distribu-tion is required from the start to run the algorithm.

The SPHS library provides quick random place-ment following an almost arbitrary density distri-bution. It is a two-step process: first, the distri-bution function is examined to create an objectroughly describable as a partition of space, orga-nized for particle placement. In the second step,this partition object is used by a placement algo-rithm, of which two are introduced, to get the finalpositions. For a given use case, it is fairly easyto wrap the process into one function that takes aphysical density and outputs particles. The gen-eral implementation presented here is agnostic toits physical use case to improve flexibility; it couldin principle be used for other distributions thanparticle density.

This is actually the third random placement algo-rithm implemented for this thesis. The first was asingle-step system that had similar abilities but wasreplaced due to its excessive parameter complexity,while the second was a different formulation of theversion presented in the following. Neither is ofspecial interest as their entire functionality wouldbe redundant with the library functions introducedin this chapter.

As an example, let us define a disk with expo-

nentially declining density in both its radius anddistance from the disk plane. This is roughly whatone might find in a spiral galaxy’s disk. Our goal isto obtain a particle distribution that approximatesit, as shown in Figure 2.2. A density distributioncan be expressed as an F# function that maps a3D position to a dimensionless positive value:

let disk (p : Vector3<cm>) =

let r = (p.WithY 0.0<_>).Length()

let h = abs p.Y

exp (-h / hScale - r / rScale)

This function takes a 3D position vector and mapsit to the resulting density. In this case, we decideto use CGS units for the disk, hence positions aredenoted in centimeters. The disk lies in the xz-plane, with hScale and rScale defining the scaleheights in cylindrical axial and radial direction re-spectively. Note that this function makes no state-ment about the absolute density, but only aboutits relative distribution in space, so multiplying thelast expression with a constant factor would notyield a different distribution.

Figure 2.2: Output positions of the parti-tioner using a distribution function expo-nential in xz-radius and y-direction. SeeFigure 2.3 for the space partitioning used.See Figure 2.7 for a radial quality check ofthe distribution.

Partition

To raise the efficiency of particle placement, it isuseful to distinguish interesting volumes of space,such as an irregular surface, from volumes of spacewith a simple distribution, such as empty areas.This is the purpose of the partitioner. It splitsspace into cubes similar to refinement in an oc-tree, examining each cube to determine whether it

11


is interesting, meaning difficult to use for randomplacement. It refines such cubes into smaller cubesuntil either they are no longer interesting or theirdensity integral is estimated to be small comparedto the total distribution’s integral.

Cubes are examined by sampling the distribu-tion at random positions. The algorithm tracksthe amount of samples, the maximum value found,and the aggregate of all found densities. The aver-age density divided by the maximum density is theefficiency for random placement: it is equal to theprobability that a generated position is discarded.In overview, a refinement step:

• obtains an estimate of the global density inte-gral from the previous step

• distributes unfinished cubes to worker threadsfor parallel handling

• splits unfinished cubes into 8 smaller cubes

• samples the new cubes

• discards cubes that are empty for all samples

• regards cubes as finished if their expected effi-ciency is above the target value

• regards cubes as finished if they hold little con-tent compared to the global distribution

• queues remaining cubes as unfinished work forthe next step

This means that the partition is not an actual oc-tree, but only the last layer of it with empty boxesremoved.

Figure 2.3: Partitioner output of the exam-ple disk. The outer cubes do not get deletedas they are not perfectly empty; a particlemay appear in them with very low proba-bility.

Figure 2.3 shows the cubes created by the par-titioner by visualizing six edges per cube. Refine-ment parameters are set to: an efficiency targetof 40%, a density integral limit of 10−5 of the to-tal estimate (this defines “holding little content”in the previous list) and 8M minimum samples.These settings work for many distributions; theyare used when calling the shorthand version ofGeneratorPartition.Create.

Binary probability tree

The partition’s final data structure should be suit-able to select boxes randomly in accordance withthe density distribution. To achieve this, each cubeis assigned a unique interval within [0, 1] and placedin a binary tree sorted by these intervals. Any value∈ [0, 1] maps to exactly one cube and the size ofeach cube’s interval scales linearly with the proba-bility of random position ending up in it if placedaccording to the given distribution. The binarysearch tree is constructed on top of the arbitrar-ily ordered cubes. Nodes contain two sub-trees andthe boundary value separating the intervals of each.This allows navigating to the cube corresponding toa value in O(log n), where n is the number of cubes.

Leaves of the binary tree hold two pieces of infor-mation: the cube’s location and the maximum sam-pled density. Other statistics are discarded. Thepartitioner outputs only the binary tree, which isnot ordered with respect to space.

Random placement

Given the binary tree introduced above, using it forrandom placement is simple. First, the algorithmchooses a random number in [0, 1] and navigates tothe corresponding box. Then it tries to randomlyplace a particle until it succeeds: it chooses a ran-dom location in the cube and a random numberbetween 0 and the distribution maximum of thecube. If the number lies below the density at thetested position, the position was found.

The implementation has some additional tweaks.It is trivially parallel for multiple particles: athread that is generating positions – using its ownrandom number generator – does not interfere withother threads that use the same partition. There-fore, random placement runs multithreaded by de-fault. The program also limits placement attemptsafter a cube has been selected, eventually choosinga new cube if it does not succeed. This way theprogram terminates in reasonable time if the par-titioner, by chance, hits a very small peak in an

12


Figure 2.4: Partition and randomly filled positions of a hard ball distribution.Higher refinement is only necessary for volumes containing the surface: cubesfurther outside are empty and get deleted, those further inside are efficient forplacement anyway.

otherwise empty box and the placement algorithmfails to find it.

Using random placement on the disk defined atthe beginning of this section, which is done by usingMake.random on the partition of Figure 2.3, cre-ates a distribution such as Figure 2.2. The com-plete process to obtain randomly placed particlesfor SPH would be:

1. Define the distribution function

2. Construct a GeneratorPartition using one ofits Create methods

3. Call Make.random (or Make.random ex for ad-ditional options) with the partition and desiredposition count. (An alternative algorithm willbe introduced later in this section)

4. Map the positions to particles using a cus-tom mapper function that determines particleproperties.

Quality test and benchmark

To test the algorithm under extreme conditions andcompare implementations, a hard-to-select bench-mark distribution was created. It is a combinationof two shapes: the outer shape is a fractal that cutslengths into thirds in each step and dimension, set-ting the density to zero in some of the 27 result-ing cubes. The inner shape consists of sharp-edged

fragments of a thin spherical shell. Since steps ofthe partitioning algorithm cut lengths into halves,forming eight cubes each time similar to an octree,the partition does not align favorably with the frac-tal – and not at all with the shell fragments.

Figure 2.6: Benchmark output distribution

Figure 2.5 shows a partition based on the com-bined distribution. Thin edges and tips are hardto select with this algorithm, so it is reasonable toassume that most physical applications will showbetter results than this benchmark. The symmetryand clear geometry of the shape makes it easy to

13


Figure 2.5: Partition of an extreme shape used to test and benchmark methodsfor particle placement. The fractal has 5 iterations of which the first has the shapeinverted. The shell pieces have a width of 0.18% of the box’s edge. Partitionersamples per refinement layer have been raised to 40M (5 times the current default)to reliably reproduce this structure.

see flaws in particle placement when displaying arotating image of it. This was useful in algorithmdevelopment.

The algorithm produced the output shown inFigures 2.5 and 2.6 in 45 seconds in cryptographicrandom number quality using the Microsoft .NET4 runtime on an AMD FX-8150 CPU. Performancewas limited by two factors: First, due to a compilerbug in Visual Studio 12 that appeared late in thewriting of this thesis, enabling tail call optimizationslowed down a frequently run part of the algorithminstead of accelerating it as it should. Therefore,the optimization was disabled. Tail calls are a ma-jor feature of functional compilers; their unavail-

ability may impact performance. This problem islikely to disappear in future compiler versions. Sec-ond, an optimization shortcut – skipping samplingif refinement becomes very likely – was omitted inthe implementation. Fast placement of overly com-plex distributions was not a priority for this work,because after the optimization introduced by thepartitioner, the execution time of the presented al-gorithm was no longer the bottleneck in a typicalsimulation.

Reproduction quality of the partition in terms ofthe position distribution appeared flawless in tests.Figure 2.7 shows its precision for the radial direc-tion of our example disk.

14


Figure 2.7: Quality check of the disk partition in Figure 2.3, measuring the radialdensity distribution, spanning about seven orders of magnitude. The orange line isthe function Ce−r/rScale, where rScale = 2.5kpc is from the input disk definitionand C scales it with particle count. The partitioner was set to its standardprecision of 8M samples (and polled for 20M positions for the plot). Randomnoise at low particle density is expected in random placement, so this result canbe considered a perfect reproduction of the input radial density function.

“Noisy” placement

Randomness is not always desirable on all scales.The random placement algorithm may leave boxesempty even though it estimated from earlier sam-pling that they would typically yield multiple po-sitions for particle placement. This is the correctbehavior for true random placement, where ran-domness is equally present on all scales. For casesin which this is unwanted, the noisy placement al-gorithm presents an alternative. It is similar torandom placement in usage, but reduces the ran-domness between cubes of the partition.

Let N be the number of particles to be placed.We want to give each cube at least its fair share ofparticles:

nmin(cube) =

⌊N

´cube

ρ(~x)dV´global

ρ(~x)dV

⌋

≈⌊N · boxdint

gdint

⌋

Here, gdint is again the partitioner’s global densityintegral and boxdint the corresponding estimate

for only the relevant cube. Given sufficient sam-ples, the last expression approximates the actual in-tegrals, except for extreme distributions which arenot relevant here. For each cube, the noisy place-ment algorithm approximates nmin in this way andgenerates the according amount of positions. Thenit places the remaining positions randomly.

This implementation is somewhat minimalistic;it does not remove small-scale noise. This becomesmore sensible when put into context: Espresso al-lows specifying a number of glass steps to be ex-ecuted before simulating. These glass steps moveparticles a small distance into a more relaxed statewithout affecting their velocities. As random place-ment sometimes positions particles very close toeach other, small-scale relaxation from this unphys-ical initial positioning occurs when the simulationstarts. This problem can be significantly reducedby only a few steps of glass-forming movement.Such local, low-grade glass formation is useful inconjunction with the noisy placement algorithm,as it smoothes out any small-scale noise. Combin-ing the two keeps the distortions to the distributioncaused by glass steps small and retains randomness.Note though that the current partitioner considers

15


boxes with high density “not interesting” if theyare efficient and ceases to refine them. This maystop noise suppression on medium scales of sim-ple distributions. Should this library be used in thefuture and this feature become important, edit theinteresting (PGenBox<’u> -> bool) function ofthe partitioner to consider boxes holding a largefraction of gdint (the global density integral es-timate) interesting. The threshold fraction couldbecome another parameter or be influenced by thesample count.

There are experiments to allow the passing ofsamples of the density function to Espresso. Thisdata could be used during glass steps to further re-duce the distortions they cause in continuous distri-butions. This has not been pursued further for thisthesis, as the increased precision was not necessary.

Possible extensions

In principle, many further placement algorithmsusing the partitioner are conceivable. If the pre-viously mentioned modification to the partitioneris applied, so that the total density integral percube can be capped, the resulting partition couldbe used to create especially homogeneous distribu-tions instead of random ones, for example by simplysampling down to a number of cubes on the orderof the number of particles. Though this particularmethod would not offer very good performance.

An internal setup program of Espresso influencedby the placement algorithms of this thesis uses aPeano space-filling curve and makes moves a dis-tance along it according to the density found at theprevious sampling point. This creates smooth dis-tributions but has a bias to “jump” too deep intosudden density spikes. Using the partitioner andthen applying this algorithm to individual cubes,configured to match the locally desired particlecount, may allow to automatically fill arbitrary dis-tributions both precisely and without small-scaletension that needs to be smoothed out.

2.5 EBT data format

To allow the export and import of simulator statesbetween the user’s program and the simulator, aswell as storage of unprocessed results, the libraryuses a custom binary file format named Easy BlockTree (EBT). The decision to use a new format isa trade-off: an existing, extensive format such asHDF5 [14] could perform the same task while pro-viding tools and compatibility, but would also in-troduce much additional complexity and provide

features of little importance to the given use case,such as the ability to efficiently alter files alreadyon disk. In contrast, text-based formats such asCSV [5] waste performance and disk space if un-compressed. Fixed blocks of binary data (e.g. theGADGET2 format [10]) are not easily extensible,i.e. may break compatibility on minor changes.During development done for this thesis, the im-plementations of the setup scripts, analysis toolsand simulator were often changed independentlyand in parallel, so a certain level of compatibilityafter changes was useful.

The main features desired for the format are:

• Simple: a small program should be able toparse files to find and extract a piece of data.Code complexity should be minimal.

• Compact: large data blocks should be effi-ciently readable and induce minimal overhead

• Extensible: it should be possible to include ad-ditional types of data without breaking com-patibility with existing parsers

• Transparent: the format must allow any datato be stored without modification

• Indexing: means for quick navigation of largefiles should be provided

Structure

The idea of EBT is to only structure data and oth-erwise be as simple as possible. High-level meaningof data depends on context and need not be under-stood by programs working outside this context.

Concretely this means that EBT is recursive:data is stored in a sequence of blocks, which canagain be a sequence of blocks, and so forth. Thiseffectively creates a tree-structure, hence the nameblock tree. Each block is marked with a block typeID which signifies the meaning of the data within.This is where context dependence comes into play:if a block contains an EBT sequence, the meaningof IDs inside it depends on the container block’sID and consequently on any blocks further up thehierarchy. Further specification details, such aswhether block order matters or whether IDs areunique, are deliberately undefined in the generalcase to allow exploiting context.

The blocks come in two sizes, one with a 10 byteheader and one with a 3 byte header but a lowlimit of content length. Figure 2.8 shows their ex-act composition. See appendix A.1 for the specifi-cation.

16

2.5. EBT data format

Figure 2.8: Composition of EBT blocks. Each block has an ID and indicates thelength of its contents. The format comprises three types of blocks: a small onelimited to 255B length of data, one that allows for large amounts of data, and onethat contains a sequence of smaller EBT blocks. The flags to distinguish betweenthem are encoded in the sign bit of the ID and long size respectively. The smallestpossible blocks are 3 Bytes in length, end in 9 zero-bits and hold no informationexcept for their ID and existence.

Index

Fast seeking is provided as an optional extension.It introduces a complete or partial index of EBTblocks in a file. A minimal summary of the indexdesign targets would be: quick, easy to read, flexibleto write. More specifically, it should:

• accelerate navigation to a known address inan EBT file, where the address is a sequenceof block type IDs and integers that specify theamount of block occurrences to skip. Such anaddress would read similar to “The third planetin the fifth system in the first galaxy in thesixth cluster of the file”. Here, planet, system,etc. would be denoted by block type IDs.

• allow the index to be written such that it in-duces minimal hard disk overhead (can be read

in one chunk).

• allow flexible writing of the index so that itneed not be completely rewritten on changesand is not fixed to a position in the file, whichcan be inconvenient for certain writing algo-rithms.

• allow partial indexes; indexing of all blocks isnot required when the index is used.

• not complicate the code that reads the indexunnecessarily.

The indexing feature defines three types of blocks:an index address block, an index container block,and individual index blocks. Index blocks can listblock sequences in order, holding each child block’sID, position in the file, and the address of its index

17


block if present. A top-level container block is in-troduced that can be used to hold the entire index.An index container can be written at any positionin the file and pointed to from the first block in thefile.

Enabling a parser to use the index is fairly sim-ple. It must check whether the first block indicatesan index. If so, it navigates the index instead of theactual blocks until it either finds what it is lookingfor or hits a block that is not indexed. In the lattercase, it enters the lowest block it found by indexand then parses the child blocks normally.

The design idea behind the index is to allow ac-celerating the most time-consuming seeking opera-tions without unnecessary interference with the restof the format. Allowing to store the entire index atone location in the file works towards this goal intwo ways: first, it improves performance, as theentire index can be quickly read from disk in onechunk. Second, if all sorts of blocks held indexes,IDs would have to be reserved in all blocks for thatpurpose, which is not the case with a central in-dex. This should typically mean that the index iswritten at the end of a file.

The index definition is not confined to this typeof usage though. Since index blocks reference eachother by file address, the sub-index need not be inthe same container. This would make it possibleto pack lots of indexed EBT files into one file andwrite a “super index” that jumps to the individ-ual indexes after the first level, removing the needto reorganize the individual contained files’ indexesor to write a second index. All addresses of theindividual indexes would have to be shifted by theoriginal file’s position in the new container though.

This detail may still be suboptimal, but it is notrelevant for our use case. I consider continuing thedevelopment of the EBT index specification, for awider range of uses, after the completion of thisthesis. Generic EBT blocks themselves are alreadyagnostic to their position in a file, memory, or anarbitrary stream. It may be useful to modify the in-dex format to fulfill this condition as well, such thatlower-level index blocks remain valid, independentof changes in higher-level block positions.

The technical definitions of the index are omittedhere, as it is merely a performance optimization andlonger than the main format definition. Please referto appendix A.2 for the specification.

SPH-specific format definitions

Using the abstract format defined above requiresconcrete block meanings. A simulation file for

Espresso can be built using the following top-levelblocks:

• a rules block with simulator settings and globalproperties.

• snapshot blocks that hold particle data at agiven time within the simulation. Initial con-ditions of particles are also exported this way.

• the previously explained index container blockand the index address block to point to theindex.

• internal blocks used for Espresso development,such as a benchmark and a debug block.

Each of these are again EBT sequences; the indexcapability was used for navigation. This structureis used for both input and output of simulations,with input including the rules block and one snap-shot block that represents the initial conditions.

Figure 2.9 shows an example structure of anEBT file containing simulator output. The graphicshows many different types of information stored,but depending on the context repetitions of thesame block type can be allowed. As indicated bythe #1 for the snapshot, more snapshots can follow– at any later position in the file. Importantly, theEBT format fulfills its purpose of extensibility here.A hitherto unknown block type could be insertedanywhere and, unless the local block definition ex-plicitly forbids it, it would cause no problems in us-ing the remaining file normally. To my knowledge,no block currently has restrictions in this respect;the SPHS library and Espresso should skip overany additional information they do not understandwithout it interfering with their functionality.

A detailed listing of block contents is omittedhere, as it would be excessive and quickly outdated.While this thesis was written, Espresso was still un-der development and its format changed frequently.Its implementation of EBT – and thus also the in-and output of the setup library – did not yet matchthe final draft of the EBT format as seen in Fig-ure 2.8 exactly: blocks with EBT content outputby Espresso were not yet flagged as EBT formattedand such flags may cause errors. Also, Espressowas experimenting with a shortened version of theindex and a variation of the format introducing 2bit type information. Anyone interested in usingthe Espresso simulator is recommended to contactMartin Zintl [34] for up-to-date information.

18

2.7. Usage

Figure 2.9: Shows the rough layout of anEBT-formatted output for Espresso. Muchof the content is optional. Espresso can beset to not output unnecessary information.When creating input, a file with just therules and snapshot top-level blocks is valid,as is skipping most rules to use defaults.

2.6 Integration of plotting andrendering

A common method to plot data resulting from thesimulation is to output data points to a file in aprimitive format, such as a sequence of coordinatesin a binary block or in comma-separated text repre-sentation. Then, a separate plotting program (e.g.Octave [22], Gnuplot [11], IDL [16]) is executed,manually or by a separate scripting language. Suchan approach is viable in our use case, but subopti-mal, especially when coming from a high-level con-text: the extra conversion step complicates inter-action between the main program and the plottingroutines.

For this thesis, a simple wrapper over the CLIlibrary ZedGraph [33] to create 2D-plots was writ-ten. In addition, a program to interactively previewsimulation output as a 3D pixel cloud was imple-

mented via OpenTK [23] (a thin wrapper over theOpenGL cross-platform graphics API).

2.7 Usage

Combining the modules of the SPHS library isachieved by writing a program that uses them inorder. Remembering the slightly convoluted Fig-ure 2.1 might make this appear more complicatedthan it is. In usage, most interactions are imple-mentation details that happen in the backgroundand need not be explicitly mentioned in the call-ing code. A basic simulation can be set up in thefollowing steps:

• Write basic definitions to be used throughoutthe program, such as scaling factors and ana-lytically known distribution functions.

• Create simulator rules, usually by referring toa common set of rules and applying changeswhere necessary.

• Obtain the particle positions by using one ofthe placement algorithms.

• Map the positions to particles with the desiredproperties.

• Execute the simulation with the finished rulesand particles. The program waits for it to com-plete.

• Import results and run analysis code on them,optionally rendering the distribution or creat-ing plots.

Figure 2.10 shows a concrete usage example of thelibrary. It shows an F# function called demo thatdefines and executes a simulation. This functioncould be declared like this in any F# program, pro-vided that it has loaded the main SPHS featuresand its CGS-based standard units, via open SPHS

and open SPHS.Units respectively. Of course, thelibrary itself has to be present and referenced so thecompiler can find it.

The example simulation is very similar to thespherical collapse simulations in the upcoming sec-tions 2.8 and 2.9. A glass with a particle density of0.1 per cubic parsec is used to fill a sphere. Eachof the particles has a mass of 103M, resulting in aphysical density of 100M/pc3. When defining therules, the program copies Rules.DefaultSPH, a setof rules defined in the library that contains typicalrules for SPH. So any rules that are not specified are

19


01: let demo () =

02: let name = "example"

03: let box = Cuboid.CenteredCube (100. * parsec)

04:

05: let rules =

06:

07: Rules.DefaultSPH with

08: bounds = box

09: dt = 1e4 * year

10: snapshots = 201L

11: enableGravity = true

12:

13:

14: let ball (pos : Vector3<cm>) = pos.Length() < 40.0 * parsec

15:

16: Make.glass(true, rules, box, ball, (0.1/pc3))

17: |> Seq.map (fun pos ->

18: Particle.Zero with Position = pos; m = 1e3 * MSun; e = 1e-4 * erg/g)

19: |> Simulation.runLocal rules name

20:

21: Plot.Interactive box name 0 200 0.0 1.0

Figure 2.10: A program that uses the SPHS library to simulate a homogeneousgas ball. Lines 7–11 define how the simulator settings called rules differ from thedefault, where dt is the time between output snapshots. Line 14 defines a booleandistribution: a ball at the center with 40 parsec radius. Line 16 uses the glassgenerator to obtain individual positions, which lines 17–18 map to a sequenceof particles with fixed properties. Note that the pipeline operator |> is used topass the previous result to the next expression. Line 19 exports the particles andrules and runs the simulation, waiting until it finishes and stores the results ondisk (under “example”). Line 21 imports the results and launches an interactiveprogram to examine snapshots 0–200. The last two parameters influence color inrendering.

filled with default settings. For example, the equa-tion of state in the simulation is adiabatic. For thesimulations in Section 2.8, the lines isIsotherm =

true and constEnergy = Some(e) were added tothe rules, where e is the constant specific energyfor the simulation.

To readers not used to functional programming,the most alien part of the example might be lines16–17. (These were originally written as one linethat was split due to width restrictions in print, buteither way is valid.) They take the sequence of po-sitions from the glass generator call above and mapthe positions to particles, which are needed for thesimulation in the line below. Map here refers to thehigher-order function, which is a standard tool infunctional programming. It applies another func-tion – which it is given as an argument – to eachelement of a sequence, returning the sequence of

results. Seq.map is the most general (least restric-tively typed) version of map in F#.

The particle sequence is then passed toSimulation.runLocal, along with the rules anda name for output files. This function conductsthe interaction with Espresso. Internally, it firstre-writes selected source files of Espresso and ex-ports rules and particles into an EBT-formattedfile, then runs a batch file that causes Espresso tocheck whether its compilation state matches the de-sired one – if not, Espresso is recompiled – and runsthe simulation. The function blocks its thread untilEspresso terminates to allow waiting for the results.

The last line imports the results and launchesan interactive colored pixel cloud renderer that canbe used to watch the simulation progress. This in-teractive renderer is not given much attention inthis thesis because Martin Zintl [34] is working on a

20

2.8. Test: isothermal spherical collapse

high-performance volumetric visualization programfor Windows that can read EBT simulation output.It utilizes DirectX 11 for GPU acceleration and willprovide interactive visualization for upcoming ver-sions of Espresso.

Line 21 in Figure 2.10 can be seen as a place-holder for any kind of analysis code. Usually, aprogram to calculate properties of the results wouldbe in its place, such as a comparison between re-sults and analytical expectations. This is where,in the creation of this thesis, the plotting featuresfrom Section 2.6 were typically used.

2.8 Test: isothermal sphericalcollapse

When the setup library was first used in con-junction with Espresso, the latter’s gravity imple-mentation had not been active during recent codechanges. It thus needed testing to ensure that thechanges would not impede correct behavior.

A simple first test of gravity is to simulate thecollapse of a homogeneous sphere with no or neg-ligible pressure support. This case is especiallysuited because the acceleration of particles towardthe sphere’s center increases linearly with the par-ticles’ distance to the center, keeping the cloud’sdensity profile homogeneous during the entire col-lapse. This is quite straightforward to show, as-suming Newton’s formulation of gravity and non-expanding Euclidean space – which is reasonablein simulation of this type and scale. First, we notethat the gravitational potential outside a spheri-cally symmetric body is equal to the potential ofa point of the same mass at the body’s center (seeGauß’ gravity or Newton’s Shell Theorem). Then,we calculate the acceleration and note that it islinear in radius as long as the density distributionremains homogeneous:

a(r) = −GMr2

= −4Gr3πρ

3r2= −4

3Gρr

=⇒ a(r) ∼ −r

Here, ρ is the gas density in the sphere and thusM = (4/3)r3πρ; G is the gravitational constant.At the beginning of the simulation, the sphere is atrest; it is neither contracting nor expanding. Thismeans velocity is a linear function of radius, albeitwith a slope of zero. From then on, at any pointin time until the sphere has collapsed, velocity re-mains a linear function of radius: applying acceler-ation to the velocity means that we add two func-tions which are linear in the same parameter, which

is r in this case. Such an operation always yieldsa function again linear in this parameter. The ve-locity remains linear in r, so the sphere remainshomogeneous, which in turn means that the accel-eration remains linear in r as well.

This implies that a very simple one-dimensionalsolver can already predict its behavior in collapse,providing expected results for comparison. Know-ing that the sphere remains homogeneous, trackingonly its outer shell is sufficient to calculate its prop-erties throughout the collapse. An iterative solvercan do this by updating only the radius and radialvelocity of the outermost end. The gravitationalacceleration at the outer shell of the sphere is just:

aouter = r = −GMr2

where M is the sphere’s total mass. This allows tointegrate approximately:

• Update r using ∆t and r

• Update r to r + r∆t

Given sufficiently small time-steps, this providesthe expected radii and densities of the sphere dur-ing the collapse.

There is an analytical solution as well [25]. Theanalytical expression for the free-fall time tff =√

3π/(32Gρ) is used in the following to calculatethe expected final collapse time.

The corresponding simulation in Espresso is asphere of 268,000 particles at negligible tempera-ture, arranged in an FCC grid by the grid creatorintroduced in Section 2.3. Figure 2.11 shows a com-parison of the density over time between the iter-ative solver and the Espresso simulation, spanningover 92% of the analytical time of collapse, showingnearly identical results.

As a consistency check for the multiple unit con-versions between the SPHS library and Espresso,the collapse program calculated densities with dif-ferent methods. One counted particles in a shellfrom 10 to 12 parsec from the center and multipliedthe count with the constant particle mass of the ini-tial setup, then divided by the sampled volume; theother set Espresso to export the physical densitiesused for SPH in its EBT output, then filtered par-ticles from the same area the previous algorithmused. Both methods agreed very well with eachother (data not shown as it looks near-identical).

What either of these two method did not dowas to check the spatial homogeneity of the den-sity. The sampling in a shell to plot density over

21


Figure 2.11: Collapse of an isothermal sphere at very low temperature in Espressoand the iterative solver, showing very good agreement. The sphere is far enoughabove the Jeans mass at its temperature to collapse freely.

time (for Figure 2.11) would have shown most ma-jor inhomogeneities as oscillations around the ex-pected density over time. But it could miss fluc-tuations forming inside the shell and, though un-likely, in theory a strange inhomogeneity could off-set another error and yield the correct result for thewrong reasons. To verify homogeneity, Figure 2.12shows the radial density distributions at differenttimes of the collapse, including the density-over-radius function produced by the iterative solver.With the exception of the outer edge, which is dif-ficult to sample due to the particle-based simula-tion, the distribution looks very homogeneous anddevelops as expected.

The simulation was repeated multiple times withdifferent parameters. The plot shown in Figure 2.12comes from a simulation with a particle count of2.68M, tenfold that of the original run. The dif-ference in smoothness to the lower-resolution plotis minor (data not shown); 3D renderings of thedistribution suggest that even this minimal differ-ence is only caused by the sampling noise for theplot; the algorithm for its creation simply countedparticles without weighting them.

Runs with deactivated hydrodynamics producea similar result as those with a very small tem-perature at the beginning, but allow the simula-tion to continue into the final moments of the col-lapse. Analysis using 3D-visualization over time

showed that the collapse time matches expectationsto within 1% or better; it was exact within the pre-cision of snapshot sampling used in that simulation.

Overall, the simulations reproduced the isother-mal spherical collapse very well. Deviations fromthe expected values were small, yet still dominatedby resolution and sampling.

2.9 Test: adiabatic energyconservation

The more or less purely gravitational collapses ofthe previous section are very specific tests of theimplementation of gravity; with the exception ofthe late phases of some of the simulations, they ig-nore hydrodynamics and thereby the essence of anSPH simulation. For a wider test, another collapsesetup was created: the heating and re-expansion ofa contracting sphere with an adiabatic equation ofstate.

The purpose of the simulation is again to testEspresso, this time using gravity and hydrodynam-ics in conjunction and running analysis code thatfocuses on the conservation of energy.

In an adiabatic process, by definition no heat isexchanged between the considered system and itsenvironment. Thus, in any simulation declared as

22

2.9. Test: adiabatic energy conservation

Figure 2.12: Radial density profile plots of different times in an isothermal col-lapse. For slightly better sampling, it was created from the 2.68M particles run.Of the 81 snapshots during the free-fall time, those dividable by 10 up to 70 areshown, each of which shows good homogeneity.

adiabatic, the total energy of an isolated gas cloudshould remain constant over time.

For the purposes of the simulation, there arethree variable energy components of the system:

• the thermal energy of the gas

• the kinetic energy of the particles

• the gravitational potential energy of the parti-cle distribution

Since we are conducting SPH, the first two aresomewhat intertwined compared to the physical re-ality. SPH particles substitute for a large quan-tity of actual gas particles, so the distinction be-tween thermal and kinetic energy is an arbitrarychoice defined by the size of simulated particles. Inother words, physical processes that are no longerresolved become temperature. This is a generalproperty of the concept of temperature and not anissue for the test at hand, as we only sum over bothquantities, while the SPH implementation must en-sure that the particles’ properties correctly mimicthe microscopic systems they represent. Thoughits meaning may be a little muddy, the total ki-netic energy of the SPH particles, as measured fromtheir mass and velocities, gives a good impressionof the large-scale gas movement, so it is includedseparately in plots.

Obtaining the first two quantities – thermal andkinetic energy – from the simulator is very simple.Espresso is set to output the specific energies andvelocities of all particles, then the analysis programconverts these to energies and sums over all parti-cles. The gravitational potential energy is not sosimple. It is encoded in the positions of all parti-cles, or, more precisely, all pairs of particles:

Epot =1

2

∑i 6=j

− Gmimj

|~xi − ~xj |

Calculating this sum (or half of it, by skipping theduplicate pairs that only differ in swapped indices)is called the direct summation algorithm for gravity.Such an algorithm is evidently O(n2) and thereforenot suitable for the execution of simulations withlarge particle numbers. Espresso therefore uses anoctree to approximate the gravitational forces onparticles.

For the purposes of measuring energy within asingle snapshot, direct summation can still be used:since a sum over all particles must yield the cor-rect result, sampling a subset of particles returnsan estimate of the exact result of direct summa-tion. Given a sufficiently large, randomly selectedsample, this can be used to approximate the grav-

23


Figure 2.13: Energy measurements over time of an adiabatic spherical collapse.The simulated object was a 40 parsec gas sphere in vacuum surroundings at adensity of 1 particle per cubic parsec weighing 1000M each, set up using theglass-cut generator described in Section 2.3 and a very small initial temperatureof 10−7 · EJeans. The noise in the gravitational and total energy is dominated byinsufficient sampling of distances to determine gravitational energy for the plot;precision is also dependent on the CFL criterion parameter passed to Espresso.Both sources of error can be reduced further at the expense of performance.

itational potential energy.

Epot ≈1

2

(n

nsample

)2∑i 6=j

− Gmimj

|~xi − ~xj |,

where i, j ∈ sample. This is rather imprecisefor small samples and slow otherwise, but has theadvantage of not using the same representation asEspresso, which would somewhat defeat the pur-pose of testing its validity.

Figure 2.13 shows the energy measurements overtime in the simulation. Due to the adiabatic equa-tion of state, the collapsing gas heats up and stopsthe contraction. The simulation setup is again a40 parsec radius sphere, though set up using theglass-cut placement algorithm. (See the figure de-scription for more simulation details.) The glassgenerator was preferred over a grid because the dis-tinct grid axes can cause large-scale anisotropic be-havior. Glass is agnostic to direction and thereforeconsidered to better represent problems such as thisone.

The outcome of this simulation was not stronglydependent on particle count (multiple similar plots

to Figure 2.13 not shown), but sensitive to time-resolution, which is determined by the Courantnumber: the maximum factor between time-stepsand the Courant-Friedrichs-Lewy (CFL) [3] stabil-ity limit. With increases in the average time-stepsfor the hydrodynamical simulation, energy conser-vation was increasingly violated. Espresso’s im-plementation and default settings were undergoingchanges at the time of this writing, including al-gorithmic changes that may affect the influence oftime resolution on energy conservation; please re-fer to Martin Zintl [34] for a current specificationof simulator settings.

As a simple additional test, the same sphere wasset up again with a homogeneous temperature thatplaces it exactly at its Jeans mass. Figure 2.14shows the outcome: the cloud only slightly rear-ranged itself to form a stable density profile, butotherwise showed very little change in the energydistribution. As in the previous run, time is scaledto the free-fall time that would be expected in theabsence of pressure support, as expected for thissetup.

24

2.9. Test: adiabatic energy conservation

Figure 2.14: Energy measurements over time of an adiabatic simulation set upsimilar to the one presented in Figure 2.13, but with the initial gas temperaturecalculated such that the structure is exactly at Jeans mass. The run shows onlyvery little change in the distribution of energy between the potential, kinetic andthermal fraction.

25

3 Radial simulation of isothermalself-gravitating filaments

Chapter 2 dealt with code employed around SPHsimulations – but not the simulation itself. Thischapter will attempt to make up for this, thoughthe code used for the most part is not an SPH code,but a grid code, which is much better suited to theproblem at hand. To examine the radial profileof isothermal filaments and their development overtime, a one-dimensional simulation in a cylindricalspace topology is implemented.

Analytically known stationary density profiles ofinfinitely extended cylinders are introduced andextended to pressure-supported and finite-radiuscases. Simulations are set up to attempt their re-production and study their dynamics and stability.

Star-forming filaments are elongated structuresthat to some extent resemble cylinders. Before weget to the simulation details, let us begin with someobservational context and theoretical backgroundon these.

3.1 Star-forming filaments inobservations

Star-forming molecular clouds feature prominentfilamentary structures quite different from the dif-fuse host cloud surrounding them. Dense cores ob-served in their vicinity are an important source oflow-mass stars. Understanding the processes thatconnect these two structures – and their environ-ment – is integral to the understanding of star-formation.

Hacar and Tafalla 2011 [13] studied filaments ofthe L1517 molecular cloud in detail and their workprovides the main motivation for the content of thischapter. The filaments consist of approximatelyisothermal molecular gas at temperatures of about10K. Multiple filaments can form long chains, suchas seen in Figure 1.1 of the introduction; individ-ual filaments are thread-like shapes typically 0.5parsec in length. In contradiction to theories that

proposed turbulence as a dominant factor in thestabilization and behavior of filaments, these ob-servations found them to be very quiescent objectsof sub-sonic inner motion and a strongly coherentvelocity profile.

This new image of filaments makes them very dis-tinct from the molecular cloud’s components theyare embedded in, which is turbulent and dominatedby supersonic motion (see again [13] for many de-tails and sources). The observed filaments appearmore similar to the dense cores that form individ-ual stars, which are supported by internal thermalpressure [12], than to their environment. The sur-rounding gas can then act as pressure support forthe filament.

An isothermal cylinder model can be applied toapproximate the filaments’ behavior. Theoreticalanalysis of radial stability in infinite isothermalcylinders yields profiles known as Ostriker distribu-tions (Discussed in Section 3.2 in detail). Of fourdensity profiles of filaments studied by Hacar andTafalla [13], three displayed a remarkable resem-blance to the corresponding Ostriker distributionfunction. The mass per length found for the fila-ments was also reasonably close to a critical valuerelevant in the same model.

These findings may imply a model on star for-mation in molecular clouds:

1. Filament formation occurs. There are multi-ple suggested mechanisms for this, for exam-ple perturbations caused by uniform magneticfields, as proposed by Nagai et al. 1998 [21].

2. The individual filaments relax into a quiescentstate maintained by self-gravitation and pres-sure support from the environment, while sup-ported against collapse by their own isother-mal gas pressure. Radial density profiles atthis time may resemble the cylinders of Os-triker’s model.

26

3.2. The Ostriker density profile

3. The filaments fragment into dense cores thatsurpass their Jeans mass and collapse, untilthey become opaque and leave the isother-mal regime, to ultimately form stars. See e.g.Schmalzl et al. 2010 [29].

Simulations of filaments face multiple problems.For one, including a turbulence-dominated environ-ment is not easy. Another issue is the filament’simmediate instability when simulated in isolation.From perturbation analysis, fragmentation alongthe axis with a wavelength on the order of foureffective radii is expected. However, a naıve imple-mentation to simulate a typical filament will notlive to see this. The structure’s gravitational pullquickly collapses the outer ends, pulling them intothe center and forming more of a blob than a threadas seen in reality. Multiple points may contributeto this discrepancy; magnetic fields and the natureof the pressure support may play a role and, im-portantly, the presence of other dense filaments ina chain as seen in L1517 provides gravitational pullon the filament’s axis.

We focus on the radial profile and formulate thefollowing questions to the proposed theory:

• How does the Ostriker model, which is in-finitely extended in both the radial and axialdirection, relate to the finite physical filamentsobserved in the ISM?

• The Ostriker model defines a specific mass perlength for which it is valid. What is its mean-ing in a real-world context that does not matchan exact mass value?

• Can we numerically support a model that an-swers these questions and show its stability, inagreement with observations?

To answer these questions, we first discuss the ana-lytical implications of Ostriker profiles. We thenproceed to create a simulation that mimics theradial dynamics of isothermal, cylindrically sym-metric distributions – first attempting to use a re-stricted version of Espresso, then implementing acustom grid code.

3.2 The Ostriker density profile

J. Ostriker 1964 [24] derived the analytically sta-ble – actually metastable – solution to the radialdensity profile of infinitely extended self-gravitating

cylinders. His proofs yield a family of stable distri-bution functions for the isothermal case:

ρ(r) = ρ01(

1 + 18ξ

2)2 , where ξ =

√4πGρ0

c2sr

An important property of this result is that ρ0

is a free parameter. Theoretically, a solution ex-ists for any central density. This is also why thisinfinite-radius case is not stable in the sense of self-stabilization; it could shift between energeticallyequivalent solutions.

This density distribution can be reformulated abit more conveniently by defining H = r

√8/ξ:

ρ(r) =ρ0

(1 + (r/H)2)2 ; H =

√2c2sπGρ0

This formulation has the advantage that H is easyto relate to as a characteristic scale height: themass per length of a homogeneous cylinder withdensity ρ0 and radius H equals that of its Ostrikerequivalent. Now, we integrate the distribution overradius:

ˆ R

0

ˆ 2π

0

ρ(r)dϕdr =

ˆ R

0

2πrρ(r)dr =

= 2πρ0

ˆ R

0

r

(1 + (r/H)2)2 dr = πρ0

H2R2

H2 +R2

This integration for finite radii will be useful in abit, but for now, let us use it to determine the totalmass per length in the infinitely extended case:(

M

L

)crit

=

ˆ ∞r=0

ˆ 2π

ϕ=0

ρ(r)r dϕdr

= ρ0H2π =

2c2sG

This shows that there exists an exactly definedmass per length for all infinite Ostriker distribu-tions, the critical mass per length.

How to apply the Ostriker solution to physical fil-aments is not immediately clear. It is unlikely thatan actual gas cloud has the correct mass per length;even if it were to match exactly, the free parame-ter ρ0 would remain unaccounted for, as well asthe pressurized environment of the molecular cloudthat contains the filament.

Yet, these matters together can be used to fullydefine the Ostriker distribution for a pressure-supported case. Note first that ρ0 is the only free

27

3. Radial simulation of isothermal self-gravitating filaments

parameter of the general Ostriker distribution. Hdepends on it in an unambiguous manner and all re-maining parameters have been assumed to be con-stants.

Therefore, for finite-radius filaments, we havefour parameters which we want to constrain withrespect to each other:

• The central density ρ0.

• The outer radius after relaxation R.

• The outside pressure support Pouter.

• The mass per length (M/L) < (M/L)crit.

(We expect that any setup at or above critical(M/L) that begins contracting contracts indefi-nitely, restricting ourselves to lower masses.)

For the purposes of obtaining quiescent fila-ments, we consider an external influx of mass notrelevant, as we are interested in a stable solution.Thus we assume that mass is conserved. Thismakes (M/L) a fixed, constant property of thestudied system. Mass influx was not examined forthis thesis. Should this become of interest, the con-ducted simulations could easily be modified to in-clude it.

Now, we have three remaining parameters to re-late: ρ0, R and ρ(R). But we have already donethis. The integration of the Ostriker distributionfamily to a finite radius must yield our fixed (M/L).

M

L= πρ0

H2R2

H2 +R2

As we noted, H and ρ0 are essentially the sameparameter. So we have successfully related R withρ0 – and if either is given, the Ostriker distribu-tion function ρ(r) is fully determined. Since thefilaments are isothermal and pressure should notjump at the outer edge for a stationary solution,the pressure support is then simply:

Pouter = ρ(R)c2s

An interesting interpretation of this is that we arecapping infinite Ostriker filaments at finite radii.Gravitationally, there is no inward flow of informa-tion. This will be explained for the gravity detailsof Section 3.4. Hydrodynamical forces are local,so when in pressure equilibrium, the inner part ofthe filament is not influenced by its outer part; theforces acting inside the filament are identical be-tween the infinite and pressure-supported cases.

With this, we have derived a well-defined stablesolution for every (M/L) < (M/L)crit at a fixedfilament radius. In addition, we have shown thatgiven (M/L), specifying a distribution’s radius isequivalent to providing the central density or theexternal pressure.

3.3 Espresso runs

A first series of simulations of isothermal filamentswas conducted by running the Espresso simulatorusing tools from Chapter 2. An isothermal, cylin-drical filament was set up in “noisy” random place-ment mode with a cylindrical distribution of parti-cles of identical mass following an Ostriker profile.To check its properties without immediately col-lapsing it along its long axis, Espresso was modifiedto restrict movement along this axis. All velocitiesalong it were reduced to almost zero, to still allowsome relaxation on the scales of particle distances,while preventing any significant movement duringthe simulated time.

Analyzing processes in the radial direction of fil-aments this way proved difficult and ultimately ledto the custom simulator that is the main topic ofthis chapter. The main issues were:

• A lack of a suitable 2D gravity setting. In areal two-dimensional simulation, the gravita-tional force of a circularly symmetric objectscales with r−1, not r−2 as for a sphericallysymmetric object in three dimensions. Whenthe simulations were conducted, Espresso didnot have operational support for this.

• In a similar vein, the simulator does not allow1D simulations, nor would it allow applyinga non-Euclidean space topology. Espresso isnot intended for this type of problem, so a full3D simulation had to be performed, costingperformance.

• The placement algorithms were not intendedfor a three-dimensional simulation with re-stricted movement along one axis; they maycreate layers with more particles than others,creating spurious tension in the gas that mayneed further analysis to not impair results. Al-lowing minimal movement along the cylinder’saxis removed these concerns, but in turn lim-ited total simulation time as this caused anunphysical type of collapse at the outer endswhen enough time passed for particles to movesignificantly.

28

3.3. Espresso runs

• Periodic boundary conditions were not sup-ported in Espresso’s gravity implementation.In combination with the lack of 2D gravitythis necessitates simulation of a long filamentto correctly represent the situation, renderingresults further to the ends of the filament in-applicable to the simple radial problem.

• The simulation boundaries were fixed tocuboids and streaming boundary conditionsonly supported for a constant flux. Thus, thesimulation used hard boundary conditions of anon-cylindrical shape, which interacted eitherwith the outer end of the filament or outer hotparticles placed to simulate external pressure.Any waves from initial relaxation reflected onthese boundaries, causing anisotropic waves.

• SPH surface tension created disturbanceswhen particles intended to provide outsidepressure interacted with filament particles. Se-tups partially mixing them initially resultedin each type locked between the others; hotparticles did not escape from the cold fila-ment as one might expect. Espresso was un-dergoing changes to this at the time of thesesimulations, the situation regarding this mayhave changed now, depending on the settingsEspresso runs with.

• Gravitational softening became a major influ-ence on results (this will be detailed next) andEspresso did not support adaptive softening atthe time of this writing (which would changesoftening with respect to the local conditions,see e.g. Iannuzzi and Dolag 2011 [15]).

There were additional minor problems, such as ran-dom placement or minimal random relative mo-tion between the locked “layers” displacing the fil-ament’s center, often differently for different sec-tions, complicating automated analysis of the re-sults.

Stability from softening

For the calculation of the gravitational force, simu-lated particles in Espresso are not represented as apoint mass. They use a potential that is flattenedout at the center, which is represented by splinesin the default settings. This technique, which isknown as gravitational softening, is common inSPH and used mainly for two reasons:

First, while approximating the gravitational po-tential of spherically symmetric bodies – like SPH

particles – by point masses does yield the correct re-sult at higher distances, it obviously does not whensampled inside that body.

Second, even more importantly, the implementa-tion of movement and acceleration in Espresso, likein many particle-based simulators, relies on the as-sumption that time-steps are small with respect toany acceleration of their particles. Changes in ve-locity, such as the effect of the gravitational force,should not be major within a single time-step.

In typical implementations, gravitational soften-ing reduces the overall effect of gravity. The effectof this reduction is not large in typical simulations,as gravity is equally important on all scales, butonly the smallest experience a significant reductionand, as mentioned earlier, this may even be physi-cally sensible in the case of overlapping particles.

Yet the effects of softening are not intuitive andcan be outright surprising. One might expect thatthe scale at which the potential is flattened – thesoftening parameter – sets the scale for distortionsto the physical reality and that in cases of highmass concentrations effects from softening get out-weighed anywhere outside this scale. A 1996 paperby Sommer-Larsen et al. [30] shows that this isnot at all the case. To quote one of their findings:“We demonstrate the perhaps somewhat surprisingresult that even in the complete absence of rota-tional support it is possible, for any finite choice ofsoftening length ε and temperature T , to deposit anarbitrarily large mass of gas in pressure equilibriumand with a non-singular density distribution insideof r0 for any r0 > 0.”

Simulations in Espresso aimed at reproducingthe radial density profile of isothermal filamentsand their relaxation suffered from the same typeof problem. Ostriker filaments at rest were setup, only to relax into unexpected configurations.Figure 3.1 shows three variants of a simulation at160% of the critical mass per length, differing onlyin their softening parameters. They yield distribu-tions that are much larger than the largest soften-ing parameter used.

A similar set of simulations experimented withvarying initial velocities – and yielded differing fi-nal results! This should not have a physical expla-nation to my knowledge, as Ostriker filaments’ ra-dial stability should only depend on their mass perlength.

Analysis of the effect and fine-tuning of the soft-ening, or implementation of adaptive softening,may solve this issue. However, the combination ofproblems shown in this section made it preferableto put this type of analysis on hold and instead

29


Figure 3.1: Radial profiles of theoretically unstable cylinders – above critical M/L– after relaxation in Espresso, patched to restrict movement along the cylinderaxis, with varying softening parameter. The upper plot shows the radial densitydistribution as measured from the center of the initial setup. The lower plotshows ∂m/∂r from the same measurement. The gravitational softening plays alarger role than one would expect, creating shapes much larger than the softeningscale. Impulse due to random noise allowed the most collapsed cylinder to driftfrom the center. The mass distribution of the high-softening run (blue line) isso expanded that its mass distribution plot is capped at the outer end, while itshows no notable peak in its density distribution.

30

3.4. Simulator implementation

concentrate on a kind of simulation specific to thecase of interest.

3.4 Simulator implementation

With the problems associated with the use of three-dimensional SPH simulations for the radial analy-sis and the model not yet numerically reproduced,a specialized solution to the specific problem of re-laxation in radial density profiles becomes an op-tion. A one-dimensional approach to the problemhas the advantage that it is inexpensive in termsof computations and allows to achieve much higherresolution. A CPU-based solution written directlyin F#, without extensive optimization, suffices tocreate a simulation of good speed and resolution.

The algorithm used is a custom grid-code forisothermal hydrodynamics, with second-order ac-curacy in space and slope limitation for the den-sity. It can be switched between a simple “pipe”topology and the radial direction of an infinite,perfectly symmetric cylinder. It could in princi-ple be switched to other 1D topologies describedby a function yielding cell interface widths alongthe simulated axis. The time-step used is constantin space but adaptive in time.

Grid

The simulation’s data structure is designed for sim-ple grid advection. Space is divided into cells ofequal length along the simulated direction. Eachcell holds the density and velocity of the containedgas; it is represented as a structure that holds twodouble-precision floating point numbers as its onlydata fields:

[<Struct>]

type Cell =

val rho : float<g/cm^3>

val u : float<cm/s>

(...)

We fix CGS units as the numerical representation,which is unproblematic despite the somewhat largenumbers this produces: double-precision floating-point values retain their full fraction’s precision inan interval a little larger than [10−307, 10308]. Evena cubic kiloparsec, 1 kpc3, only equals about 3 ·1064cm3.

(The Struct attribute is specific to the CILtype system; it indicates a value type that allowsthe type to be used without allocating a separate,

reference-managed object. See F#/CLI references[7] [20] for details.)

The topology of the 1D-space to simulate is givenby a record holding, among other fields, a surfacefunction that maps positions along the simulatedaxis (e.g. radii) to cell’s cross-sections, which arealso the surfaces between the cells. This allows cellsto vary in size.

In terms of the unit-of-measure system, a cell sur-face is a length and the content of a cell is a massper length. In other words, the simulator always as-sumes that it is simulating a two-dimensional cutthat is infinitely repeated along the third dimen-sion. This is useful for the cylindrical topology andsufficiently flexible to test and debug in a simple“pipe” topology. The latter can be achieved by pro-viding a constant surface function. For the cylin-drical simulation, the surface function is a circle:S/L = 2rπ, where S is a surface between cells, L isthe length in the (not simulated) z direction, andr is the position along the simulated axis.

The ability to switch topology could in principlebe used for other simulations, such as a pipe ofvarying width or a pressure wave from one pointon a planet to the opposite side. It seems unlikelythat this feature will be used, but there was littlereason to restrict it, so it remains available.

Piecewise linear interpolation

The simulator only stores average densities of cells,yet does use a piecewise linear density function incalculations. It was created with the help of a lec-ture script about grid codes by Springel and Dulle-mond [31] which also introduces linear interpola-tion schemes and slope limiters. This subsection,as well as the following one on application of hy-drodynamics, was written with repeated use of thelecture as a reference.

Inside a single cell, both its width – the amountof space per length – and density are a linear func-tion of position. Thus, the topology gets reducedto a piecewise linear function too. (This makes nodifference for cylinders.) The algorithm only sam-ples cell size at their centers and borders. If onewere to input a non-linear topology, it would onlybe approximated by the cells.

Interpolating the size of space is straightforward.The slope of the density distribution inside a cell,however, is not fully determined by the simulation’sstate. As previously mentioned, cells only storetheir gas velocity and mean density. Any infor-mation on the sub-cell distribution is lost between

31


time-steps. Therefore, the slope must be deter-mined anew before each advection step.

For each cell, the algorithm estimates the slopetoward its adjacent cells in both directions. It as-sumes that each cell has its mean density at thecenter. Note that due to the varying cell shapes,this is only an approximation. Next, it chooses theslope such that it interpolates between these centralpoints:

s(i, right) =ρi+1 − ρi

∆r

s(i, left) =ρi − ρi−1

∆r

Here i is a cell index, s a slope on one side whendoing simple linear interpolation, ρ the density and∆r the distance between two cell centers. Now, ina typical case, something like the average of thesetwo slopes might be a sensible choice. The troubleis that they can be vastly different. What wouldseem like the proper interpolation on one side thenwould create an overshoot – or undershoot – of theinterpolation at the cell interface on the other side.

The choice of a slope-limiting scheme determineshow steep slopes can become in questionable cases.For this one-dimensional simulator, spatial preci-sion was already high without interpolation, soavoiding numerical artifacts was the priority. Hencea conservative option is used: picking the less steepslope if their signs match and zero – no interpola-tion at all – otherwise. (Which happens at peaksand bottoms of the distribution.) This method isknown as the minmod slope limiter, named afterthe minmod function it uses:

minmod(a, b) =

0 if ab ≤ 0

a if ab > 0 ∧ |a| ≤ |b|b if ab > 0 ∧ |a| ≥ |b|

This formulation emphasizes symmetry but maylook strange. With ab > 0 and |a| = |b| both thesecond and third cases are true. This is consistent;it implies that a = b. Therefore, both the secondand third cases are indeed true.

Using both slope estimates and the minmodfunction, a cell’s final density slope is:

slope(i) = minmod (s(i, left), s(i, right))

= minmod

(ρi − ρi−1

∆r,ρi+1 − ρi

∆r

).

Hydrodynamics

The algorithm for fluid dynamics follows themethods detailed in Chapter 5 of the Springel-Dullemond lecture notes [31] and adapts them tothe case at hand. The main modification is the in-clusion of the interpolation scheme for cell size anddensity introduced in the previous subsection.

With simulations limited to the isothermal case,there is no need for an energy equation. We mustadhere to Euler’s continuity and momentum equa-tions though:

∂tρ+∇ · (ρ~u) = 0

∂t(ρ~u) +∇ · (~u⊗ ρ~u) +∇P + ρ∇Φ︸︷︷︸gravity

= ~0

Here, ~u indicates gas velocity and P its pressure.We ignore the gravitational term; a Gauß solverdetailed in a later part of this section will handleit. Since we only use one dimension of movement,these equations become:

∂tρ+ ∂r(ρu) = 0

∂t(ρu) + ∂r(ρu2) + ∂rP = 0

Since the actual simulations are in a cylinder’s ra-dial axis, this axis is denoted with r. With theexception of the pressure term in the momentumequation, these equations can be applied by usingan advection algorithm: a program that lets massand momentum flow between cells, in accordancewith the local gas velocity. The pressure term canthen be added by approximating the pressure gra-dient for each cell and applying the correspondingacceleration. We begin by dividing the algorithminto two steps.(

ρnew

(ρu)new

)= pressurize

(advect

(ρρu

))The advection equations follow from the previous1D Euler equations when the pressure term is ne-glected. For a time-step of ∆t, we obtain new val-ues for ρ and ρu:

ρnew = ρ−∆t∂r(uρ)

(ρu)mid = ρu−∆t∂r(uρu)

The subscript “mid” signifies an intermediate re-sult, “new” the resulting values after the time-stepwas applied. The final density of one time-step fol-lows directly from advection, while the resulting

32

3.4. Simulator implementation

impulse – and thus velocity – is still missing thepressure term.

The two terms of the form ∆t∂r(u · quantity)are nothing more than the integrated fluxes of thequantities to adjacent cells. The algorithm acquiresthese by calculating what fraction of one cell’s massmoves into the other in ∆t at the local velocity. Itsubtracts the resulting amount of mass and impulsefrom the cell with outgoing movement and adds itto the other. The velocity at the boundary is esti-mated by averaging over the gas velocities of bothcells. These calculations heed the linear interpola-tion schemes for cell size and density.

Note that this is an upwind scheme that limitsthe flow of gas strictly to the direction of the cur-rent velocity at the cell boundary. No informationflows opposite to the velocity’s direction in the ad-vection step.

Time-steps throughout the simulation were lim-ited such that in any of them, advection does notmove gas further than 0.39 cell-lengths. Such a fac-tor connecting cell size to the gas movement duringone step of time-integration is known as a Courantnumber, named after the Courant-Friedichs-Lewy(CFL) condition introduced in the thee authors’1928 paper [3], which states the somewhat sensi-ble rule that algorithms with a one-cell interactionrange should not execute movements in a singlestep that exceed one cell’s size.

We are now left with only with the pressure term,which is added in a second step:

(ρu)new = (ρu)mid −∆t∂rP

In the isothermal case, pressure is related to thespeed of sound by P = c2sρ. This allows us to ap-proximate ∂rP :

∂rP = ∂rc2sρ

= c2sρright − ρleft

2 ·∆rcell

With this, we can calculate the acceleration. Wehave now obtained both ρnew and (ρu)new. Divid-ing the two yields the new velocity and concludesthe time-step.

Gravity

The simulator has a gravity feature, which can beenabled with a boolean called EnableGravity inthe topology record. It is implemented using theGaussian formulation of gravity:

∇~g = −4πGρ

This means that density is the source term for thegravitational field. The law can also be written inintegral form:

‹~gd ~A = −4πGM

where M is the total enclosed mass. This meansthat the total flux through any closed surface scaleslinearly with the mass contained within the surface.

This formulation makes it evident that the to-tal gravitational flux through a closed surface onlydepends on the mass inside the surface, not on itsdistribution, nor on any mass outside of it. This isexploited by the algorithm. It is assumed that thefirst simulated cell is the innermost area and thatgravitational field lines can only “escape” througheach cell’s outer surface. The algorithm iteratesoutward from that innermost cell, applying thegravitational acceleration on each cell. A rough de-scription of the function applied to each cell wouldbe:

• Take the total mass of all cells further insideas a parameter.

• Calculate the size of a closed surface at thecenter of the cell, e.g. a ring around the centerfor the cylinder (this calls the space topologyfunction).

• Calculate the total mass up to the center ofthe cell (the mass in the inner half of the cellplus the mass of all cells further inside).

• Use the obtained surface and contained massto calculate the gravitational flux at the cell’scenter. Apply the corresponding accelerationto the cell.

• Recursively call this function on the next cell,adding this cell’s mass to the running total.

Note that this method is O(n) in the numberof cells. However, as in pure hydrodynamics,time-steps effectively scale with (1/n), limiting thecomputational order of simulating a fixed time toO(n2).

Pressurized Surroundings

Star-forming filaments are usually embedded in amolecular cloud, a comparably hotter, more turbu-lent environment that provides pressure support.To allow simulation of such situations, a constantexternal pressure can be provided as an optionalvalue of the topology record:

33


/// External pressure at the higher end

/// of the simulation

ExtPressure : float<g/(cm * s^2)> option

If provided, this changes the behavior of the simu-lator. The amount of simulated cells becomes vari-able, allowing cells at the outer end of the simula-tion to be disabled. If the external pressure sufficesto push the contents of the outermost simulated cellinto the second-outermost, the cells are collapsedand the then-empty outermost cell gets disabled.Conversely, if the outermost active cell is filled at adensity sufficient to surmount the external pressureand is currently moving outward, normal advectionto the next cell in the outward direction is enabled,adding the cell to the active simulation.

The current implementation does not re-allocatecells, so an absolute maximum size of simulationspace must be provided, above which the numberof active cells cannot grow. This could be triv-ially changed if ever needed – it might not be usefulthough, since growth of the simulation beyond ex-pected sizes could indicate a problem and indefiniteexpansion of simulated space might cause extremeslowdown of the simulator, delaying termination ofa faulty run.

3.5 Artificial dissipation

The simulator implementation presented so far onlydissipates waves as a numerical artifact. It doesnot conserve energy – by definition, as it uses anisothermal equation of state. Still, this formulationis not inherently dissipative.

For a very simple example that disregards nu-merical dissipation, take a system of only two adja-cent cells at different densities, contained in reflec-tive boundary conditions to the outside, initiallyat rest. The difference in pressure will cause gasto start flowing from the more densely filled cell tothe other. But when they close in on equal density,the gas velocity has been significantly accelerated.To slow down the flow again, the inverse process ofthe acceleration has to occur, leaving the initiallymore thinly filled cell with the excess gas – and thecycle repeats. This would be a typical example ofa harmonic oscillator. If left to run for a very longtime, it may dissipate due to time-step discretiza-tion errors, but the time this takes would dependstrongly on numerical simulator settings, which isnot desired to play a major role in a simulation’soutcome.

The analog effect for multiple cells enables themto reflect density waves off the simulation bound-

aries. The limited resolution of the cells smears outsuch waves as they propagate, but left to its owndevices, the simulator would tend to show numeri-cally generated oscillations long before relaxing intoa steady state.

We are primarily interested in the relaxed distri-bution, so a way to dissipate movement from thesystem is needed. There is no requirement for aphysically exact relaxation process, so it is viable tosimply pick a time-scale and modify the algorithmaccordingly, such that it removes impulse from thesystem. This can be expressed by a parameter Ωindicating velocity dissipation over time:

v(t+ ∆t) = e−Ω∆t v(t) ≈ (1− Ω∆t) v(t)

Wherein the approximation is simply the first-orderterm of the Taylor series. Applying the last expres-sion reduces the overall velocities – provided that Ωis chosen such that Ω∆t < 1, which it should be. Ifnot, the algorithm falls back to setting all velocitiesto zero.

Oscillation suppression

The methods described to this point are not en-tirely sufficient to ascertain that simulations reli-ably relax into a steady state. We have not provenour simulation stable in combination with both theasymmetric, varying cell topology and the impulsedissipation that was just introduced. By using al-gorithms based on physical laws, we have providedreasonable interactions on short lengths and time-scales, but remain vulnerable to systematic errorsof the representation. A simple example of such aneffect is odd-even decoupling.

A cell’s change in impulse due to pressure is cal-culated using values from exactly the two neighbor-ing cells. This has the silly consequence that in theabsence of other forces, an equilibrium distributioncan have completely different density profiles forodd and even cells! Because of the diffusive natureof the grid on a moving fluid, this is usually not aproblem. But in general, it may produce artifacts.

Using only the previously introduced featurescan create such resolution-dependent artifacts.They are most problematic at the center of simu-lated space, where the geometry of cells and thedifference to their neighbors is most prominent.For the cylindrical simulation, this can cause wave-generation at the innermost end of the grid. Amulti-cell variant of odd-even decoupling creates aresolution-dependent preferred frequency of waves,

34

3.6. Fixed-border Ostriker simulations

which can interact with the velocity dissipation al-gorithm in such a way that it drives the wavescontinuously. It may sound strange that an ex-ponential dampening of velocity does not stop thisprocess, but the interaction between a simple pres-sure approximation, advection between differently-sized cells (note that the second-innermost cell isthree times the size of its inner neighbor), andodd-even decoupling no longer reflects physical na-ture reasonably. Indeed, the process was not self-strengthening until a steep density slope close tothe center triggered it.

The cause of such unphysical behavior lies inthe algorithm’s inability to correct density oscilla-tions close to its resolution limit without larger-scale movement. Therefore, a correction to thedissipation algorithm was included; it specifies aslow diffusion velocity, typically of 1% of the soundspeed, at which material diffuses away from localpeaks. The scale of its effect is small compared tothe overall movement in the simulation, so it doesnot reduce precision significantly.

3.6 Fixed-border Ostrikersimulations

Let us begin by setting up the first run with reflec-tive boundary conditions at the outer end of thesimulation, so that we effectively simulate a hard-walled cylinder. This allows to set a fixed radiusfor the distribution and observe its relaxation, finalshape, and stability.

As we know from Section 3.2, obtaining Ostrikersolutions for comparison requires an extra stepwhen dealing with “capped” distributions that donot extend infinitely as in the original theory. Weequate the finite integral over the general Ostrikerdistribution with the fixed (M/L) in the simulationsetup:(M

L

)setup

= πρ0H2R2

H2 +R2

†=

(M

L

)crit

R2

2c2sπGρ0

+R2

At †, we used:

H2 =2c2sπGρ0

;

(M

L

)crit

=2c2sG

We can rearrange to obtain:

ρ0 =(M/L)crit

R2π(

(M/L)crit(M/L)setup

− 1)

This expression for ρ0 is used in the Ostriker pro-file function to obtain the expected distributions,represented by orange lines in upcoming plots.

The first simulation’s space features 2, 000 cells;they are limited by a reflective wall at 0.6 parsecradius. 25% of the critical mass per unit lengthis chosen to fill the volume, distributed homoge-neously. To this end, we divide 0.25(M/L)crit bythe cylinder cross section r2π and initialize all cellswith the resulting density. The initial velocity is setto zero and the constant speed of sound to 0.2km

s .With this, we run the simulation. Figure 3.2

shows how it progresses: it begins with the ho-mogeneous distribution contracting to the center.Then, it reaches a central density at which isother-mal pressure provides support against further col-lapse. Compressed by the remaining inward ve-locity, The gas is accelerated outward again. Theprocess repeats, but with reduced strength due tothe simulator’s dissipation. In this manner, the dis-tribution oscillates around the stable Ostriker pro-file and ultimately settles down to exactly the pre-dicted curve.

The detail simulations of this section are avail-able as videos. Feel free to contact me about them.

The central density over time in this simulationcan be seen in Figure 3.3. It features a period ofapproximately 5 · 106 years. This value should bedominated by the physical dynamics of the system,though the artificial dissipation is expected to havesome impact on it, as it systematically slows downall movement. How representative the value is forphysical filaments is difficult to say, as this is still avery theoretical setup. Though we are running onphysical units, the initial conditions were chosento demonstrate the method’s validity rather thanmatch any real filament. Also, the hard outer lim-its, serving for consistency testing with respect toOstriker filaments, create unnaturally strong resis-tance to filament expansion. Section 3.7 will fea-ture a simulation that removes these.

One may now ask whether fixed-radius simula-tions of filaments reliably relax to their respectiveOstriker solutions for other radii. Since we havethe entire simulation and analysis process availablefrom our program, it is easy to automate the execu-tion of multiple simulations. We pick the interval[0.5pc, 1.5pc] for the radii we are interested in andrun the simulation 130 times.

To assess whether a simulation returned the ex-pected result, The central density ρ0 after relax-ation is a simple indicator, for which the analyticalformula for comparison was provided in the previ-ous simulation. The remaining settings are chosen

35


Figure 3.2: Shows the density distribution at different times during a simulationwith hard boundaries. The blue line shows the simulator cells’ densities, whilethe orange line shows the analytically stable solution for the given parameters asderived by Ostriker. The last two snapshots are taken at significantly later timesto illustrate the final and perfect relaxation to equilibrium. The simulation wasconducted with physical units, i.e. the radii of the images are in parsec and thedensities in M/pc3. Note that the scale changes slightly between some snapshots.

36

3.6. Fixed-border Ostriker simulations

Figure 3.3: Central density ρ0 over time of an initially homogeneous distributionweighing 25% of the critical mass per length, confined to a hard-walled cylinder of0.6 parsec radius. The simulation was run on 2,000 cells at an isothermal soundspeed of 0.2km

s . The dampening time-scale is caused by arbitrary algorithmicdissipation (see Section 3.5), but this should not falsify the measured period ofapproximately 5 · 106 years strongly.

similar to the previous simulation, including themass per unit length of (M/L) = 1

4 (M/L)crit, sothe returned value at an outer radius of 0.6 parsecrepresents the simulation we have seen in detail.

In Figure 3.4, the central densities of the simula-tion series is plotted over the outer radius of simu-lated space. Evidently, the relaxed result is stableand consistent with the analytical prediction overmultiple runs and different radii.

Setting up simulations exactly at expected Os-triker profiles kept them stationary in very goodapproximation. Changing the densities of a validsolution by only ±0.1% caused visible contractionor expansion when compared to the initial distribu-tion, indicating that the simulation is very preciseand sensitive to minor changes.

Simulations close to critical mass perlength

Multiple additional simulations were conducted us-ing different sub-critical values for M/L; their datais not shown because they provided no qualitativelynew results. Simulations with a low amount of massfeatured shallow density profiles, while those us-ing high amounts of mass revealed a larger portion

of the infinitely extended Ostriker distributions, astheory predicts.

What did cause different behavior was approach-ing the critical mass per length, especially forsimulations beginning with a homogeneous profileor other distributions involving high outer densi-ties. The initial contraction causes a spike in cen-tral density, which was already prominent in Fig-ure 3.3. This phase becomes increasingly diffi-cult for the simulator to handle as the containedmass increases. Simulations conducted significantlyabove (M/L)crit consistently collapse and must bestopped due to excessive densities and accelera-tions. Choosing exactly (M/L)crit and involvingany initial contraction (or relevant outer limit thatcauses contraction) also quickly results in an indef-inite collapse that terminates the simulation. Anexample of this process be seen in Figure 3.5.

This result is consistent with expectations: anOstriker filament at critical mass requires an in-finitely extended distribution to remain stable.Even provided this, or a sufficiently large simulatedspace to approximate it, the system lacks a mech-anism to stop collapses once they have begun.

Possibly, the variable simulation limit introducedin Section 3.4 and used in the upcoming Section 3.7

37


Figure 3.4: Density at the inner end of the distribution after relaxation over 130simulations with varying outer boundary and mass per length set to 25% of thecritical value.

could be utilized with a very low external pressurevalue to produce a gravitationally dominated fullcollapse to a singular distribution, but this wouldbe meaningless, as the assumptions made for thesimulation – such as the non-opacity of gas thatjustifies isothermal calculations – would no longerhold.

Setups slightly below the critical mass per lengthbehaved similar at first glance. This turns out tobe a numerical rather than physical property of thesimulation. Simply retaining earlier settings whileincreasing M/L to over 99% of the critical valueresults in the termination of the simulation, verysimilar to what we saw in Figure 3.5. However,closer inspection reveals that the extreme setupdoes not cause an indefinite collapse the way thecritical setup does. It only blurs the distinctionbetween runaway processes that should be stoppedand extreme conditions that are time-consuming tohandle, causing the simulator to terminate prema-turely as it hits limits such as a minimal time-step.As the restrictions that terminate simulations areloosened, it is possible to approach the critical valuemore closely, yielding increasingly extreme ampli-tudes of the oscillations in central density.

3.7 Variable-border Ostrikersimulations

Real filaments are not contained in hard walls,which is why the option for pressurized surround-ings of Section 3.4 was introduced. Figure 3.6shows a relaxation process where, instead of set-ting radius directly, the constant outside pressurewas configured to match the stable profile’s pres-sure at a radius of 0.5 parsec. The initial massdistribution is again homogeneous; it is spread outover 0.7 parsec radius, but quickly contracts afterthe start of the simulation. The maximum simula-tion space spans one parsec and 1500 cells, but atno time in the simulation are all cells in use.

The simulation evolves similarly to its fixed-radius predecessors.

Approaching (M/L)crit in this type of setupwould likely not give physically sensible results,as the external pressure accelerates the initial col-lapse until the distribution is very dense. The den-sity peak after this initial contraction would createextreme conditions. This would clearly leave thevalid domain of the algorithm, as a very stronglycompressed filament may accrete additional gasand become critical, or become opaque and non-isothermal.

38

3.8. Conclusions

Figure 3.5: Simulation at critical (M/L) starting with a homogeneous initialdistribution. To approximately mimic a stable Ostriker solution at this cylindersize – it is not possible to perfectly reach it with a finite radius – mass wouldhave to be much more concentrated at the center. Thus, the distribution quicklybegins to collapse. Given the full critical mass per length to maintain an arbitrarilyhigh central density, the pressure support is insufficient to halt the distribution’scollapse. Central density continues to rise until the simulator halts (last snapshot)as it is unable to handle the excessive outer velocities in fixed-radius simulationmode. Note the density scale change to 103M/pc3 in the last snapshot.

3.8 Conclusions

The pressure-supported variable-border simulationconcludes the simulation series to reproduce stableOstriker filaments of finite size. We have shownhow the infinite Ostriker model can be utilized torelate finite physical filaments’ radii, pressure sup-port, and radial stability.

Results align with the theoretical solution that,

under the assumption of an isotropic cylinder with-out movement along its axis, complete radial col-lapse is possible if and only if a critical mass perlength of (M/L)crit ≥ 2c2s/G is present. Unlikeinfinitely extended isothermal cylinders, the finitevariant can be stable below the critical mass perlength. In such pressure-supported filaments, sig-nificant oscillations can occur without the filamentbecoming unstable.

39


Figure 3.6: Plots of the density distribution at different time-steps of a radialsimulation with external pressure, which was calculated to match the pressure ofthe corresponding Ostriker filament at 0.5 parsec. The blue line shows the simu-lated density distribution, the orange line is the analytically calculated Ostrikerdistribution for the given mass and external pressure. The maximal simulationspace ranges up to 1.0 parsec and allocated 1,500 cells, of which any outside thedistributions outer cut-off are disabled. Density is denoted in M/pc3. Note thatsnapshot 50 was skipped to make room for the almost fully relaxed distributionat Snapshot 60.

The results’ insensitivity to the exact type of sim-ulation and parameters used make it plausible thatthe model of stable Ostriker density profiles appliesto filaments in pressurized environments, as sug-gested by Hacar et al. 2011 [13].

It may be of interest to conduct simulations ofthis type using real-world settings and observa-tional data. Also, studying the lengthwise stabilityof filaments and their interaction with the environ-

ment, such as a possible influx of gas, might allowfurther interpretation of the stability limits of fila-ment contraction, which ultimately define the bor-der to star formation.

40

4 Assessment of F# in computational physics

This chapter can be seen as a distinct addition tothe numerical and physical topics discussed in theprevious two chapters. It focuses entirely on theprogramming concepts used in this thesis and thereasoning behind them.

As this project was largely independent from ex-isting software, its programming language could bechosen freely. A selection of relevant choices can beseen in Figure 4.1, which lists some of their prop-erties. The suitability of functional style and typesafety, which will be discussed in the upcoming sec-tions, made the “sharp” (#) languages of the listingpreferred choices.

The project implementation began in C#, theCLI’s most prominent language, which is similar toF# but has less functional emphasis. After evaluat-ing some of the points described later in this chap-ter, new parts of the code were written in F# in-stead, linking to the existing C# code easily thanksto good interoperability between the two.

F# showed multiple advantages and ultimately,almost all the code was moved into the project’s F#part. After further evaluation, multiple parts werere-implemented again in a more functional style.This chapter attempts to give an impression onwhat motivated this change.

4.1 Introducing F#

F# (pronounced “F Sharp”) was designed and in-troduced at Microsoft Research, by a team led byDon Syme [8]. It follows in the footsteps of func-tional languages such as OCaml, to which it is sim-ilar enough to allow a “verbose syntax” mode thatcan compile a subset of OCaml programs as F#programs. Its main influence outside the functionalrealm is C#, the dominant language of the Com-mon Language Infrastructure, with which it sharesmost of its type system.

Despite its name, in which the F reminds of thefunctional programming paradigm, F# is not apure functional language. It is, on the contrary,a multi-paradigm language that combines object-

oriented and imperative programming with func-tional programming. In addition, it introduces var-ious language features that provide improved com-patibility with other programming languages.

This can make it appear as a heavyweight lan-guage, a property that is generally not seen as de-sirable. Yet, when used correctly, only a subsetof F# is relevant for any specific case. For ex-ample, multiple features are designed to improvelanguage compatibility; they are only intended forinterfacing with another language. For its inter-nal operations, a program properly formulated inF# need not rely on such language features, eventhough some of them are seen as essential or usefulin other languages. The fact that an F# program-mer can get by without them is due to the way thelanguage combines features from different program-ming paradigms.

Many features, especially amongst those of non-functional origin, are widely used in languages thatinfluenced the design of F# but are discouragedor even unsupported in F#. This is usually thecase when they are redundant to another featureof different origin which the language’s designersdeemed preferable.

This way that F# cherry-picks or neglects fea-tures is not always intuitive, especially when oneis not used to the paradigms the chosen featureoriginates from. For example, in C#, the conceptof nullable types, where null is a non-value, are aprominent feature that enables programmers to de-clare variables that might either have their desig-nated type or not exist at all. F# allows nullabletypes only for compatibility. They are replacedby the almost identical options, which can explic-itly be some object of a predefined type or none.The reason behind this seemingly absurd change isthat options are not a core language feature, butimplemented in just three lines of code using dis-criminated unions, which are a general solution toproblems involving heterogeneous data. (See F#learning resources [20] [7] for details.)

42

4.2. The functional paradigm

Language Preferred style Typing OO Functional Interactive Execution Speed

FORTRAN imperative static † no no* assembly fastC imperative static‡ no no no* assembly fastC++ imperative, OO static‡ yes no no* assembly fastC++11 imperative, OO static‡ yes partially no* assembly fastJava OO, imperative static yes no no* JIT (JRE) med.C# OO, imperative static yes partially in Mono JIT (CIL) med.F# functional, OO static yes yes yes JIT (CIL) med.Python imperative, OO dynamic yes partially yes interpreted** slow

† Depends on version. FORTRAN 2003 introduced multiple object-oriented features.

‡ My personal experience raises concerns about type safety in C and its direct descendants.

F# uses inferred typing but applies very strict rules to type casts and introduces units of measure.It uses structural comparison in some of its constructs, but the cases are not considered dangerousto type-safety.

* External tools provide interactivity of varying quality.

** Python uses byte code that can be partially JIT-compiled. IronPython JIT-compiles to CIL.

Figure 4.1: A comparison of languages considered for this thesis, listing for eachits preferred style(s) (the paradigms endorsed by structure and syntax), its pre-ferred/primary type system, whether it can formulate object-oriented programs,whether it can formulate functional programs, whether it has a native interac-tive mode, its method of compilation or interpretation for execution, and a roughindication of the typical performance of programs in execution. For the latter, lan-guages are divided into three classes: languages that are used in the fastest CPU-run programs, languages that compromise between features and performance, andthe interpreted language Python that allows “duck typing” at the expense of typesafety and performance. The procedural imperative paradigm is omitted in thelisting as all languages support it.

4.2 The functional paradigm

Functional programming is a declarative style ofprogramming derived from lambda calculus. Un-like the common imperative style, which performscomputations via a program state and routines tomutate this state, it approaches problems throughvariable binding and the evaluation of functions.Where an imperative programmer uses a for orwhile loop, a functional programmer might usea recursive definition; where an imperative pro-grammer changes a position variable, a functionalprogrammer defines the shifted position with re-spect to the original one. Altering the original po-sition is not a possible (or meaningful) operationin pure functional style, similar to mathematics,where changing a variable’s meaning is not a sen-sible operation. (However, redefining a symbol ismeaningful in mathematics and valid in functionalprogramming.)

The function as a first-class value is a coreprinciple of the functional programming paradigm.Defining a function within another and returningit, or passing it to yet another function, is a com-mon sight. Higher-order functions that take otherfunctions as input parameters are frequently usedin high-level control code; this thesis has alreadyseen their usage multiple times, for example in thepartitioner of Section 2.4 (that takes a distributionfunction as input) or in the usage of Seq.map inFigure 2.10 on page 20.

Such operations do not require convoluted spe-cial syntax, additional explicitly created objects, orother such detours, as necessary in many impera-tive languages. Functions can be declared anony-mously – without naming them with an identifier– directly where they are used, just like integerscan be written as numbers directly where they areused.

An impression of a functional programmer’s view

43

4. Assessment of F# in computational physics

on the paradigm is given in the foreword of ExpertF# 2.0 [7], a book on the language authored by itsdesigners:

“Functional programming today is a close-keptsecret amongst researchers, hackers, and elite pro-grammers at banks and financial institutions, chipdesigners, graphic artists, and architects. As thegrandchildren of Lisp, functional programming lan-guages allow developers to write concise programsthat are extremely close to the mathematical modelsthey develop to understand the universe (...) How-ever, to the uninformed developer, functional pro-gramming seems a cruel and unnatural act, effetemumbo jumbo.”

Imperative and functional programs can be for-mulated to be equivalent and, with some draw-backs, be used in combination.

Immutability

Mutable types, as opposed to immutable types, arethose data structures that can partially or com-pletely change their data. In principle, this conceptcan be used outside of functional style and pro-grams can mix both kinds of types as required. F#supports both mutable and immutable data types,but is designed such that immutable types are asshort and comfortable in usage as possible, whilemutable types and mutation operations use specialsyntax.

At first glance, it may seem absurd that func-tional languages deliberately prefer, or even en-force, an inability to change.

Removing mutability can be preferable becauseit reduces spurious interactions. This step is a log-ical continuation of the shift away from global statemachines, which rely on a program-wide mutablestate that is read and written to from many dif-ferent routines. These can be hellishly error-prone,as programmers who write the individual routinesmake assumptions on the state that do not hold.A sufficiently complex program involving state hasthe same fundamental problem. It can suffer fromall the typical issues of a global state machine ifformulated unfavorably.

When a variable’s value changes, its meaningchanges. After such a change, running any oper-ation that relies on the former meaning will fail.This creates code that must adhere to strict limitson its order of execution. Any change to this or-der may impede the program’s validity, often undernontrivial conditions. Reformulating an algorithmto use immutable types removes this entire class ofproblems:

• The reordering of code based on immutabletypes is unproblematic. Undefined inputs willcause a compiler error, notifying the program-mer about the inconsistency, while valid coderemains valid independent of its new position.Reordering code based on mutable data canchange its meaning in an arbitrary fashion,without any warning to the programmer.

• Immutable types are trivial to use as sharedobjects in parallel programs. Once con-structed, they can be safely accessed by anynumber of threads without locks, atomics, oranalysis of cache coherence. Mutable types,however, are extremely dangerous when usedin parallel, possibly creating an exponentiallylarge state space on a single error in synchro-nization.

• Analyzing the meaning of code based on im-mutable types does not require context beyondthe declared meaning of the values it uses. Formutable types, analyzing the possible programstates can become necessary, which can be avery complex and unpredictable problem.

Figure 2.10 on page 20 in Chapter 2 shows im-mutability in action. No value defined in the pro-gram ever changes; the code is merely a sequenceof definitions that build on top of one another.

4.3 The Common LanguageInfrastructure

F# is part of a large effort to create a hardware-independent programming platform called theCommon Language Infrastructure (CLI). It wasoriginally developed by Microsoft and is currentlystandardized via ISO/IEC and ECMA [4]. Amongits purposes are allowing the re-use of compiler fea-tures between languages as well as improving theinteroperability between them. Recently, the CLIhas gained additional interest from cross-platformdevelopment, since it allows for the re-use of exe-cutables on different operating systems and proces-sor architectures.

When compiling a CLI program from sourcecode, a language-specific compiler is called as usual.However, it is typically not compiled into hard-ware instructions, but instead into the Common In-termediate Language (confusingly abbreviating toCIL). The CLI normally compiles the final programfrom there. This final compilation step happens

44

4.4. Performance

Figure 4.2: Languages using the CLI cre-ate similar and compatible compiler outputthat is also independent of the runtime andplatform later used for execution. Publicdomain image by Jarkko Piiroinen [26]

only shortly before usage, a practice known as Just-In-Time compilation, allowing the usage of identi-cal programs on different runtimes and platforms.

The CLI defines a large set of standard con-ventions for its languages. A common type sys-tem specifies not only basic types, but, in conjunc-tion with a common language specification, enablescompatibility of custom types between programs,even for those of different languages. This is as ab-stract and incredible as it sounds: complex, high-level data structures defined in one programminglanguage can be used in another, directly, includ-ing metadata such as documentation snippets fordevelopment, without most of the quirks one mightexpect when doing such a thing. To use anotherprogram, it is simply loaded as a library file; anycompatible features become available to the hostprogram.

This is a huge step forward in my opinion. Thisthesis shows how plotting and rendering were in-cluded into programs. One might think that wastedious and error-prone. Actually, including bothlibraries and getting them to run was done in a mat-ter of minutes. The difficulty was the lack of ready-

made high-level functions for the renderer. Almostall of the programming time for visualization wasspent interfacing the high-level code with the out-dated OpenGL API of OpenTK – in part because ofits reliance on the global state machine paradigm,which complicated integration.

CLI runtimes

There exist multiple implementations of the CLI,but the vast majority of current Runtimes is basedon one of two projects:

• The Microsoft .NET Framework, which is thefirst implementation of the CLI, and its vari-ants. See the MSDN [20] for information onthe main version and the Compact Framework.

• The cross-platform Mono Project [19], ofwhich the main code-base is licensed under thefree software GPL license and maintained byXamarin [32].

Frameworks of the .NET family offer good perfor-mance, but target only platforms of interest for Mi-crosoft. Mono-based solutions have a very wide va-riety of platforms, from its main, free Runtime inLinux, Windows, and OSX, over Xamarin’s propri-etary Android version and cross-compiler to AppleiOS, to the Quantalea GPU compiler [27] privatelyoffered to financial clients.

4.4 Performance

The speed F# programs execute with is currentlynotably dependent on the type of problem, lan-guage features used in the program and the Run-time it is executed on. F# is a statically typedlanguage that is suited for compiler optimization,but the usage of functional and high-level languageconstructs tends to impede optimization by currentcompilers.

Generally, CLI compilers still yield assembliesof lower performance than that of the heavily op-timized and specialized solutions like the Intel Ccompiler or Visual C++ compiler.

An anonymous employee of Trapeze Software haspublished a benchmark series in 2011 [28] that com-pares the Visual C++ compiler with the .NET andMono CLI Runtimes for different settings and prob-lems. In summary, it shows that the .NET Run-time is slightly slower than C++ VC9 and VC10in comparable cases and about a factor 2 slowerin cases that are problematic to optimize in C#,

45


which was the language the CLI programs for com-parison used. Using Mono or x86-64 compilation in.NET results in a further performance penalty.

In the programming for this thesis, programs runon the Mono Runtime in SuSE Linux tended to runslower than on .NET in Windows, more so thanthe difference in the CPUs, RAM between the usedmachines should justify. The most significant casewas in a C# implementation of 3D Fast FourierTransformation (as-of-yet unused library feature,see Section A.4), where the slowdown depended onmulti-threading settings and may have been domi-nated by an inefficiency in the dispatching of workerthreads for parallel tasks in Mono.

The performance of F# on Mono when used forn-body simulations, one of the relevant cases forastrophysics, has been analyzed and compared toa multitude of languages and implementations in aseries of benchmarks by Brent Fulgham [9]. Thecomparison is frequently updated; at the time ofthis writing, F# on Mono executes this benchmarkat 2.8 times the time of the fastest implementation.

From a high-performance computing standpoint,these are not very good results. However, this mustbe put into the perspective of intended use cases.The most prominent project similar to the pro-grams drafted for this thesis is the AstrophysicalMultipurpose Software Environment (AMUSE) [1],which uses Python as its front-end. In the above-mentioned Fulgham n-body benchmark, Python 3had an execution time about 36 times longer thanF# had even on Mono, 100 times slower than thewinner. An exact comparison would depend on thePython implementation and problem, but given itsdesign as a dynamically typed scripting language,it seems unlikely that Python can compete withF# on either CLI Runtime in performance. (ACLI implementation of Python called IronPythonexists, but does not change this situation.)

Outlook

F# and the CLI are not in principle constrainedin terms of performance. Rather, their compil-ers and tools are much newer than those of Cor C++ and thus have undergone less optimiza-tion. The performance of Mono especially may im-prove. Mono is a younger re-implementation of the.NET Runtime, which included many of its featuresonly in recent years, and subject of ongoing de-velopment. Mono is transitioning to compilationvia LLVM [17], which is designed to provide re-usable optimization algorithms for compilers. InLLVM-enabled Mono, code is compiled in three

steps: from its original language first to CIL, thento LLVM Intermediate Form, and finally to assem-bly for execution. As LLVM is used in many othercompilers, there is a high level of interest in its fur-ther optimization, from which the Mono compilerwould benefit.

Projects exist to compile F# for GPU, most no-tably a commercial framework by Quantalea [27]called Alea.cuBase that seamlessly integrates GPUprogramming with code executed on the normalCLI Runtime. It is also utilizing LLVM, but inthis case to interface with NVidia CUDA [6], thecompiler that also handles Espresso’s performance-intensive SPH routines. Due to the concepts ofcomputation expressions and code quotations in F#(see the MSDN [20] or Expert F# [7] for details),the language itself requires no modification to workwith code blocks that are intended for special useinstead of normal compilation. Alea.cuBase is acommercial product intended for financial analysis,with a restrictive license and case-by-case pricing,therefore it was not evaluated for this thesis. Asfar as I could tell, other projects did not yet seemviable for deployment.

4.5 Units of measure

The F# language allows values to be denoted inphysical units of measure, such as the units definedin the SI and CGS systems. Computations involv-ing values tagged this way will be checked for incon-sistencies either when the program is compiled oreven live by the IDE. The CIL byte-code output bythe compiler holds information regarding units ofmeasure only as metadata, so that the performanceof the running program is not affected. Other pro-gramming languages can call library functions writ-ten in F# without a need to heed any special rules.Programs in F#, however, will fail to compile ifunits are specified incorrectly. This includes uniterrors in calls to library functions such as the onesintroduced in Chapter 2.

This feature deserves special mention as it provedtremendously useful in two ways. First, workingwith the wide array of units used in astrophysics,such as denoting lengths in cm, m, AU, light-years,and parsec, is neither an issue nor does it drawmuch attention away from the physical problemat hand. And second, even more helpful, is thesystem’s ability to highlight inconsistencies due toincompatible units, caused by mistakes such as amissing factor in a calculation.

In the programming for this thesis, the IDE and

46

4.7. IDE options

unit of measure system caught the majority of nu-merical mistakes within seconds of the faulty codebeing written.

4.6 Typing and type safety

Language support for units of measure can be seenas one specific tool concerning the much more gen-eral topic of type safety. In modern programming,expressions that implicitly convert between ways tointerpret a value are considered not type-safe. Forexample, in C++, the following statement is valid:

if (myInt - 15) Start();

This line of code executes Start() if myInt 6=15, because in C++, the number zero is also theboolean value true.

Such expressions can shorten code, but from anabstract standpoint, the line is a nonsensical state-ment. Proponents of type safety argue that fea-tures of this kind are dangerous because they cancause compilers to output an absurd interpretationof a program where they should output an errormessage.

Most typing can be categorized using the follow-ing categories:

• Strong versus weak and duck typing: in strongtyping, objects are either explicitly compatibleor not. Weak typing, or its extreme case of“duck” typing, “make ends meet” if they can,even if this involves conversions.

• Static versus dynamic typing: dynamic typescan change at run-time, static types cannot.

• Explicit versus implicit, inferred typing: in ex-plicit typing, the programmer must state typesexplicitly. In implicit typing, only the min-imal necessary information need be providedfor code to be valid.

It is important to distinguish between inferredstatic typing and dynamic typing, which can ap-pear very similar at first sight. Inferred typing,sometimes called implicit typing, does not alterthe compiled program or any other behavior of thecompiler, it only allows omitting annotations wherethey are obvious from context.

F# picks strong static inferred typing and is veryconsequent about type safety. Wherever possible,F# predicts the types of user-defined values, so pro-grams only require minimal type annotations. Buteven something as simple as the lossless conversion

from a 32-bit integer to a 64-bit integer must beexplicitly declared, or else the program will fail tocompile. There is the exception of structural com-parison, but discussing this topic is omitted here asit would go beyond the scope of this thesis.

I believe that this choice of typing discipline isthe correct choice for the majority of applications inphysics – and also in other disciplines that involvehigh complexity and a low tolerance for errors. Dy-namic typing is slow, cannot be reliably analyzedby IDEs, and is inherently unsafe, creating unnec-essary room for errors for little benefit. Weak typ-ing removes safety exactly where it is needed most:good code tends to execute numerical conversionsat few, well-defined locations in the program. Thismeans that weak typing saves its user a few localannotations that probably then become commentsinstead. The cost of this is the uncertainty that anypart of the program may accidentally cause conver-sions, inducing a wide range of possible errors, withlittle indication as to where this might happen.

The case for implicit typing is, in a way, the op-posite of weak safety in conversions. The benefit ofbeing allowed to omit redundant type annotationsis big, as these can significantly inflate code, espe-cially when using descriptive type names. The pro-grammer is free to write redundant annotations if itimproves clarity. Should an invalid definition occur,the programmer can insert additional annotationsand the development environment (or interactivecompiler) can highlight the contradiction with in-creasing precision. There is virtually no downsideto this approach, on the contrary: automatic gen-eralization allows code to be used in its minimallyconstrained meaning, enabling it to remain type-safe while being valid for multiple different typesat once, without additional programming effort.

4.7 IDE options

The integrated development environment (IDE) isa major component of modern programming. It as-sists the user in writing, debugging, analyzing, edit-ing, visualizing, organizing, compiling, and test-ing programs, among many other possible features.The preferences in IDE choice vary wildly betweenuse cases and users, with some using simple texteditors, sometimes without even the highlighting ofsyntax in color. Others use specialized editors forgraphical applications, as offered for mobile phonesby Xamarin [32].

Figure 4.3 shows an experiment that combines aplotting window with a graphical user interface and

47


Figure 4.3: A forms-based cross-platform plotting window with a built-in F#compiler. This program was one of the plotting methods created for the SPHSlibrary, but was abandoned since it offers little advantage over either a full IDEor the F# interactive console, while not benefiting from the features of either.It still serves to demonstrate the versatility of the library-focused programmingmethods that both created it and made it obsolete.

an F# compiler. It worked flawlessly, but was stilldiscarded, because its user interface is redundantwith the versatility F# programming offers anyway– and a plotting window can be launched from pro-grams written in any F# IDE, including the F# in-teractive console, which is a simple console windowoperated by typing line-by-line F# code. CallingChapter 2’s SPHS library features from the interac-tive console worked immediately on both Windows.NET and Linux Mono. With a variety of alterna-tives, the plotting program was redundant.

4.8 Evaluation summary

Before we draw conclusions for the case of numer-ical physics, let us summarize the traits of F# ingeneral. Amongst its prominent disadvantages are:

• Interacting with native libraries can cost per-formance and involve some additional effort.

• Its compilers and thereby run-time perfor-mance is not on par with the high-performancecompetition. (As seen in Section 4.4.)

48

4.9. Closing remarks

• For very small and simple tasks, it can beslightly more verbose than dynamic languages,which can become the limiting factor in suchuse cases.

• F# feels alien to many and may be more diffi-cult to learn than other languages.

General advantages of F# are:

• It is an expressive language that is well suitedto represent mathematical models.

• Its design is directed at reducing the error ratewhen programming.

• The language is suited for live analysis andfeedback from IDEs.

• Compiled programs, especially libraries, areautomatically cross-platform usable, unlessthey rely on native code.

• It features very good interoperability to CLIlibraries, including those written in other lan-guages.

• It has a very lightweight syntax when com-pared to other strongly typed solutions to com-plex problems.

Distinguishing between high-level andhigh-performance code

Physical applications tend to face a slightly differ-ent situation compared to typical general-purposeprogramming projects. Some prominent differencesare:

• Applications tend to be performance-intensive.

• Physicists spend much time programming andare willing to learn new concepts.

• Complicated, mathematically formulatedproblems are common.

• Units of measure are frequently used.

Of these four points, three arguably support theusage of F# and one, performance, does not. Thissuggests to distinguish between programming tech-niques for projects that are primarily limited bycomputer performance and those that are primar-ily limited by the writing of programs.

It may be a sensible decision to support a dividebetween programming languages that solve eitherproblem, to then use two languages in combination.

In most programs, the performance-intensive codeis only a fraction of the program and increasinglyexecuted on GPUs. Writing exactly these parts ina language that allows highly optimized routines,while using a safe and expressive high-level lan-guage for the remainder of the program, might be avery strong combination. Such a divide could havethe consequence that languages forming a “middleground” between the two could lose importance, asa case-by-case combination of the extremes may behard to compete with.

The SPHS Library shows this approach whenit interacts with Espresso: the SPH simulationin Espresso is executed using CUDA, which offersextremely high performance by running on paral-lel GPU multiprocessor units, but cannot competewith F# in other aspects.

4.9 Closing remarks

This thesis has demonstrated a modern program-ming style with functional emphasis by using it inmultiple applications relevant to physical research.We have seen the abstraction of an inherently im-perative high-performance GPGPU simulation offluid dynamics for inclusion in an entirely separateanalysis software that uses a strongly functionalformalism. Example code demonstrates that thedefinition of a simulation can be of minimal length,retain type safety, and still be very clear and con-cise. We have also seen the implementation of agrid-based simulator in F# and the integration ofvarious tools into the setup and analysis process.

The usefulness of many of the shown methodsmay depend on the scope of the projects they areused for. Strong typing, functional programming,library interoperability, cross-platform usage, thecombination of multiple languages and compilers –implementing such concepts requires some extentof effort even if they seamlessly integrate into aproject. The programming style used here is in-tended for large projects with high complexity; itlikely would show greater effect in a project involv-ing more than one to two people.

But this is where great potential may lie hid-den. Many projects do involve a lot of people, highcomplexity, and an almost desperate demand forimprovements in usability. There is much work tobe done to improve workflow itself – and quite pos-sibly, the effort would amortize.

49

A Appendix

A.1 EBT file format specification

EBT (Easy Block Tree) is a recursive format forstructuring serialized data.

• The EBT format is a sequence of blocks.

• Integers defined in this section are serialized inlittle-endian format.

A block always consists of three parts:

• 2B (signed Int16) Block type ID; sign indicateswhether the length field is 1B or 8B

• Length of content and possibly isEBT flag, for-matted depending on the sign of the ID:

– If the ID is positive, the content lengthfield is an unsigned Int8 (1B) indicatingcontent length in bytes.

– If the ID is negative, the content lengthfield is an Int64 (8B).

∗ A negative value is used if the blockcontent is again in EBT format.

∗ The absolute value indicates the con-tent length in bytes.

• Content

A.2 EBT index specification

To allow quick access, an index may be provided.Indexing of individual block contents is optional.

File addresses

File positions are stored as little-endian Int64 val-ues (8B size). The beginning of the file and firstvalid position is at 0.

Index Address Block

• Is a top level block

• ID = +32512 = +7F00hex

• Content length 8B

• Must be the first block in the file

• Contains the address of the Block Index Blockthat holds the top level index

Index Container Block

• Is a top level block

• ID = −32512 = −7F00hex

• IsEBT: contents have EBT format

• Contains only “Block Index Blocks”

Block Index Block

• ID (local) = -1

• Holds the addresses of all elements in a block.

• Is to be used as index for its own file if andonly if it can be navigated to from the IndexAddress Block

• Contains a sequence of indexer structures

Indexer

• 2B target block ID

• 8B target block address; points to the very be-ginning of the block (first byte of ID)

• 8B target index address option. If not 0, itcontains the address of the index block for thetarget block. If 0, it is meaningless.

50

A.3. Implemented Rules

Notes

For performance and easy writing, it is often suit-able to write one Index Container with all desiredindices at the end of the file.

A.3 Implemented Rules

The following lists implemented settings the SPHSlibrary could pass to the development versionof Espresso. They are written in the formatidentifier : type. The IOFlags type will be ex-plained later in this section.

Main simulation definition

• bounds : Cuboid<cm>: Boundaries of the sim-ulation

• d3 : bool: Use three-dimensional simulation,false gives two-dimensional

• dt : float<s>: Time between two snapshots

• enableGravity : bool

• enableHydro : bool

• isIsotherm : bool: Set to true for isothermalcomputation

• iterationsPerSnapshot : int64 option

• snapshots : int64: Number of snapshots

• stepmax : int64 option: log2 of maximumtime-step size in units of dtmin

Numerical constants

• cfl scale : float option: Courant number.Typical value is 0.15.

• constEnergy : float<erg/g> option: If set,all particles will have this specific energythroughout the entire simulation.

• constMass : float<g> option: If set, all par-ticles will have this mass throughout the entiresimulation.

• gamma : float option: physical γ value

• gravConstant : float<cm3/(g s2)>

option: Gravitational constant to use ifgravity has been enabled

• gravSoftening : float<cm> option: Scalefor the gravitational softening parameter

Other settings

• boundX : byte: Boundary behavior for X-direction: 0uy = Open; 1uy = Periodic; 2uy= Reflecting; 3uy = Streaming in/out; Exten-sions may introduce more

• boundY : byte: see above

• boundZ : byte: see above

• dataFileOutput : bool: Specifies whether anoutput in the standard data format is created.

• export : IOFlags: Which particle fields to ex-port when writing to file

• glassPass : int64: How many passes of glassgeneration should be performed? Default iszero.

• isscale : bool: Toggles Espresso’s internalunit scaling.

• simOutput : IOFlags: What per particle datashould the simulator output?

• waitOnEnd : bool: If set, Espresso halts afterthe simulation instead of terminating.

Internal and temporary features

• cRhoMin : float option: Lower density limitfor CIT-SPH output in data format units

• cRhoMax : float option: Upper density limitfor CIT-SPH output in data format units

• cEMin : float option: Lower energy limitfor CIT-SPH output in data format units

• cEMax : float option: Upper energy limitfor CIT-SPH output in data format units

• EType : byte: experimental Espresso setting

• HType : byte: experimental Espresso setting

• MType : byte: experimental Espresso setting

• VType : byte: experimental Espresso setting

IOFlags

This type allows to specify which data should betransferred to or from Espresso via bit-flags. It isbest described by its definition, which is in binaryform:

51

A. Appendix

[<Flags>]

type IOFlags =

| Nothing = 0b0000000000

| All = 0b1111111111

| id = 0b0000000001

| x = 0b0000000010

| y = 0b0000000100

| z = 0b0000001000

| vx = 0b0000010000

| vy = 0b0000100000

| vz = 0b0001000000

| m = 0b0010000000

| rho = 0b0100000000

| e = 0b1000000000

| Position = 0b0000001110

| Velocity = 0b0001110000

A.4 Unused feature: 3D FastFourier Transform

A three-dimensional multithreaded Fast FourierTransform has been implemented in C# for thisthesis. Due to a shift in the project’s direction ithas not been used.

The intended use case for the feature was thesetup of turbulent velocity fields, which can bemore easily defined by their representation inFourier space. A backwards Fourier transformationwould then yield the velocity field for the particles.

For such a part of the program, C# may bepreferable due to its availability of unsafe codeblocks, which allow manual code optimization us-ing pointer manipulation, as common in C++ pro-grams. Such operations are normally forbidden inC# and F# due to their breaches of security fea-tures and bad compatibility with garbage collec-tion. With the current compilers, the usage of un-safe code still improves program speed, so an unsafeC# code block was used for the inner operations ofthe Fourier Transformation.

This feature appeared operational; referring to itfrom the SPHS library and writing a function thatapplies the velocity field should enable the abilityto set up turbulent velocity fields.

52

Bibliography

[1] F.I. Pelupessy, A. van Elteren, N. de Vries, S.L.W. McMillan, N. Drost, S.F. Portegies Zwart. TheAstrophysical Multipurpose Software Environment http://amusecode.org/

[2] Atacama Pathfinder Experiment (APEX Telescope). Funded by: Max Planck Institut fur Radioas-tronomie (MPIfR) at 50%, Onsala Space Observatory (OSO) at 23%, and the European SouthernObservatory (ESO) at 27%. http://www.apex-telescope.org/

[3] Courant, R.; Friedrichs, K.; Lewy, H., 1928: Uber die partiellen Differenzengleichungen der mathe-matischen Physik”. Mathematische Annalen 100 (1): 32–74.

[4] ECMA standard 335; ISO/IEC 23271:2012: Common Language Infrastructure http://www.

ecma-international.org/publications/standards/Ecma-335.htm http://www.iso.org/iso/

home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=58046

[5] Comma-seperated values (non-standardized convention) commonly: RFC4810 http://tools.

ietf.org/html/rfc4180

[6] Compute Unified Device Architecture (CUDA). Parallel computing platform. Devloped by NvidiaCorporation. 2007–2013 http://www.nvidia.com/object/cuda_home_new.html

[7] Don Syme, Adam Granicz, Antonio Cisternino: Expert F# 2.0. ISBN-13 (pbk): 978-1-4302-2431-0.ISBN-13 (electronic): 978-1-4302-2432-7.

[8] Microsoft Research. The F# programming language project. Lead developer: Don Syme. Under de-velopment since 2005. http://research.microsoft.com/en-us/projects/fsharp/ F# SoftwareFoundation: http://fsharp.org/

[9] Brent Fulgham: The Computer Language Benchmark Game. Performing n-body simulations onUbuntu in multiple languages. http://benchmarksgame.alioth.debian.org/u64q/performance.php?test=nbody

[10] Volker Springel: The cosmological simulation code GADGET-2. Mon.Not.Roy.Astron.Soc. 364(2005) 1105-1134

[11] Thomas Williams and Colin Kelley and many others. Gnuplot 4.4: an interactive plotting program.March 2010 http://gnuplot.sourceforge.net/

[12] Goodman, A. A., Barranco, J. A., Wilner, D. J., & Heyer, M. H. 1998: Coherence in Dense Cores.ApJ, 504, 223

[13] A. Hacar, M. Tafalla, 2011: Dense core formation by fragmentation of velocity-coherent filamentsin L1517

[14] The HDF Group. Hierarchical data format version 5, 2000-2010 http://www.hdfgroup.org/HDF5

[15] Francesca Iannuzzi, Klaus Dolag: Adaptive gravitational softening in GADGET.Mon.Not.Roy.Astron.Soc. 07/2011

54

Bibliography

[16] IDL (programming language and visualization toolkit). Original design: David Stern. http://www.exelisvis.com/ProductsServices/IDL.aspx

[17] LLVM Developer Group: LLVM Compiler Infrastructure. (Formerly: Low Level Virtual Machine.)http://llvm.org

[18] R.A. Gingold and J.J. Monaghan: Smoothed particle hydrodynamics: theory and application tonon-spherical stars, Mon. Not. R. Astron. Soc., Vol 181, pp. 375–89, 1977.

[19] Mono Project: A free and open source implementation of Microsoft’s .NET Framework based on theECMA standards for C# and the Common Language Runtime. http://www.mono-project.com

[20] The Microsoft Developer Network. Online documentation for F# and the .NET Framework. http://msdn.microsoft.com/

[21] Nagai, T., Inutsuka, S., & Miyama, S. M. 1998: An Origin of Filamentary Structure in MolecularClouds. ApJ, 506, 306

[22] Octave community: GNU/Octave. 2012. http://www.gnu.org/software/octave/

[23] The Open Toolkit http://www.opentk.com

[24] J. Ostriker, 1964: The Equilibrium of Polytropic and Isothermal Cylinders

[25] S. Oxley: Free-fall derivation of a uniform density spherical distribution of matter initially at resthttp://www.droxley.freeserve.co.uk/node15_ct.html

[26] Jarkko Piiroinen User of Wikimedia Commons http://commons.wikimedia.org/wiki/User:

Jarkko_Piiroinen

[27] Quantalea GmbH. Wasserfuristrasse 42 CH-8542 Wiesendangen. https://www.quantalea.net/

[28] Anonymous Codeproject user “Qwertie” of Trapeze Software Inc. Head-to-Headbenchmark: C++ vs .NET http://www.codeproject.com/Articles/212856/

Head-to-head-benchmark-Csharp-vs-NET

[29] Markus Schmalzl, Jouni Kainulainen, Ralf Launhardt, Thomas Henning. Star formation in theTaurus filament L1495: From Dense Cores to Stars The Astrophysical Journal, Volume 725, Issue1, pp. 1327-1336 (2010).

[30] Sommer-Larsen et al, 1996: The structure of isothermal, self-gravitating, stationary gas spheres forsoftened gravity

[31] V. Springel and C.P. Dullemond: Numerical fluid dynamics (lecture script 2012-2013) UniversitatHeidelberg

[32] Xamarin (company). Maintainer of the GPL-licensed CLI runtime Mono. http://xamarin.com

[33] John Champion and the ZedGraph development team: ZedGraph class library http://

sourceforge.net/projects/zedgraph/

[34] Martin Zintl. CAST group. USM (Universitatssternwarte LMU Munchen). [email protected]

55

Danksagung

Ich mochte mich bei Prof. Dr. Andreas Burkert bedanken, der diese Arbeit betreut hat und mir vielFreiheit, gute Ratschlage, vielfaltige Aufgaben und eine interessante Zeit beschert hat.

Ich bedanke mich bei Martin Zintl und Christian Alig, mit denen ich viel Spaß bei der Zusammenarbeithatte, sowie bei der gesamten CAST-Gruppe fur das großartige Arbeitsklima. Weiterer Dank gehtan Christian Winnerlein fur den Code zur effizienten Verwaltung der Zufallszahlengeneration und dasabsolut notwendige Mir-auf-die-Fuße-Treten, damit ich mich rechtzeitig mit der Universitatsburokratieauseinandersetze.

Ein extra Dankeschon geht an unsere Systemadministratoren Tadziu Hoffmann und Keith Butler, mitderen Hilfe ich allerlei Dinge auf Uni-Arbeitscomputern testen konnte und die ihn jederzeit wieder zumLaufen gebracht haben, wenn er einmal den Geist aufgegeben hat.

Zwischendurch entstanden:Idee fur ein neues CAST-Logo. . .

. . . mit einem Bild von Christian Alig. :)(Falls ihr es nicht braucht, macht auch nichts.)

56

Selbstandigkeitserklarung

Hiermit erklare ich, die vorliegende Arbeit selbstandig verfasst zu haben und keineanderen als die in der Arbeit angegebenen Quellen und Hilfsmittel benutzt zuhaben.

Unter den Hilfsmitteln befanden sich selbstverstandlich Computer und ihre ubliche Arbeitsumgebung,Software und sonstige gangige Werkzeuge. Einige bisher nicht explizit erwahnte solche Werkzeuge sind:

Additional tools used

Corel PaintShop Pro X4 (personal license, for creating infographics), the Microsoft Developer Network(major documentation resource for F# as well as the .NET CLI implementation and library), WolframAlpha (for calculations, integrations, and other mathematical tools), Git and Gitolite version control(to allow parallel updates on Espresso and convenient merging thereof), CIT-SPH (An earlier Espressovisualization and format of Martin Zintl used for preliminary analysis and consistency checks whileimplementing features for Espresso), Acadoid Developer’s Lecture Notes on a stylus-enabled Galaxy Note10.1 tablet PC (for drafting, sketching, and writing calculations)

And, last but not least, the LaTeX2e and the Memoir package, edited in WinShell and SumatraPDF,for the thesis’s layout. (The interoperability between SumatraPDF and WinShell is neat.)

Georg Michna, September 30, 2013

57

functional methods to set up and analyze hydrodynamical ... · 1.4 programming jargon due to the...

Documents