
(Biological Process Visualization)
CS4970 Sr Capstone WS05
William Moore, wlmd23, 832138
Charlie Huggard, ach343, 872978
Periclies Kariotis, psk1db, 882445
Mentor: Dr. William L. Harrison
Technical Report

Real-Time Visualization Tool for Haskell-Embedded Cellular Interaction Domain Specific Languages

(Using the Rhodobacter Sphaeroides DSL)

Table of Contents

Executive Summary
1 Problem Statement
    1.1 Introduction
    1.2 Technical Background
    1.3 Literature Survey
    1.4 Goals and Objectives
    1.5 Overall Approach
2 Requirements Analysis
    2.1 Introduction
    2.2 Overall Description
    2.3 System Requirements and Constraints
        2.3.1 Operating environment (external constraints)
        2.3.2 Market users and characteristics
        2.3.3 Environmental constraints
        2.3.4 System components
        2.3.5 Software interfaces and libraries
        2.3.6 Communication interfaces
        2.3.7 Hardware interfaces
        2.3.8 System maintenance
    2.4 Alternative Solutions
    2.5 Performance Requirements
    2.6 Resource Requirements
    2.7 Evaluation Metrics
3 Design Specifications
    3.1 Introduction
    3.2 System Design Overview
    3.3 Data Requirements
    3.4 Software Design
    3.5 Hardware Design
    3.6 Testing Methods: Evaluate the following
    3.7 Scheduling Diagram & Task Assignments
    3.8 Implementation Costs
4 System Implementation
5 System Performance, Testing and Evaluation
6 Conclusions and Discussion
7 Future Work
8 References
Appendices

Executive Summary

This document details the software engineering process followed in the creation of

the system for Dr. William Harrison. Section 1, the problem definition, describes the

background information, state of the industry, and the need for development. Section 2

more clearly defines what is expected of the system in terms of various requirements.

Section 3 specifically details the design specifications for the project, and section 4

includes some notes on the final implementation, including installation instructions.

Evaluation metrics and testing results are described in section 5, and section 6 contains

project conclusions. Section 7 elaborates on potential future development related to this

project. See section 8 for a detailed list of references.

1 Problem Definition

1.1 Introduction

Motivation, market demand, and other solutions

Domain-specific languages (DSLs) are a convenient and powerful solution to a common
problem in interdisciplinary studies: the expert knowledge a project requires is rarely
found within a single scientist [5]. Defining the CellSys DSL requires knowledge of cell
biology, and while CellSys itself is fairly simple, DSLs for much more complex experiments
and simulations could be produced. Implementing the simulation language, however,
requires knowledge of programming and language design. A domain-specific language
allows experts in one field to define a simulation in terms of their own domain without
worrying about the details of the computer implementation. Computer scientists can then
produce the programmatic structure and scaffolding for the simulation without an intimate
knowledge of the theory behind the DSL’s definition, so long as the definition is clear.

Motivation for Visualization Tool

The visualization tool serves to extend the usefulness of Haskell-embedded DSLs

by providing a simple, re-usable architecture for the rapid construction of visual simulation

results. According to [5], visualization is extremely valuable for checking the faithfulness of

biological models, and this system furthers and simplifies that process.

Market demand and other solutions

This system will expedite the visualization process for the CellSys language, and

can be reasonably extended to work with any other DSL as well. Dr. Harrison will be able

to use this project for R. Sphaeroides and other implementations. No known products

provide visualization services for Haskell-embedded DSL simulations or biological models.

1.2 Technical Background

Overview of the existing system

One of Dr. Harrison’s current projects [5] is developing domain-specific languages to
enable biologists to easily simulate complex processes. This approach enables biological
researchers to quickly develop and easily readapt computational models of complex
biological processes without needing a strong C/C++ background [5]. At the highest level,
the purpose of this project is to replace an existing, predominantly manual system for
converting text-based frame data into an encoded video file or some other animated
visual representation. The video content is an estimated model of the behavior of the
Rhodobacter Sphaeroides bacterium as it moves around in three-dimensional space and
reproduces based on light concentration. While the implementation and behavior of R.
Sphaeroides are slightly beyond the scope of this project, information on these items can
be found in [5].

The current process involves four phases. First, the user runs the Haskell program
CellSysSemantics (see [4] for a Haskell interpreter), which writes text-based frame data
to a POV directory, one file per frame, in the Persistence of Vision (POV) file format [2].
The POV format can be used by POV-Ray [1], a free imaging tool, and by other
applications. Second, the user renders the text frame data into BMP frame images. This is
accomplished with the POV-Ray application and can be done in batch mode via a script at
the command line interface (by using pvengine /EXIT +V <filename>). Third, the user
must manually convert all the BMP images into another, more readily usable image
format, such as PNG or JPG. In the final phase, the user invokes mencoder, a free
application distributed with mplayer [3], to encode the PNG or JPG images into an MPEG-2
video file, which depicts the actions and movements of the R. Sphaeroides bacteria.
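The rendering and encoding phases can be sketched as a small driver script. The sketch below only builds the command lines; it does not run them. The pvengine /EXIT +V form is the one quoted in the text. The file names, the frame rate, and the mencoder options are assumptions based on common mencoder usage and should be checked against the installed version. The tool used for the BMP conversion is not named in the report, so only the resulting file name is modeled.

```python
from pathlib import PurePath


def povray_command(pov_file):
    """Phase 2: batch-render one POV frame (command form quoted in the text)."""
    return ["pvengine", "/EXIT", "+V", str(pov_file)]


def frame_image_name(pov_file, extension):
    """Phase 3: name of the converted image for a frame (tool not specified)."""
    return PurePath(pov_file).with_suffix("." + extension).name


def mencoder_command(extension, fps, output):
    """Phase 4: encode all converted frames (options are assumptions)."""
    return [
        "mencoder", f"mf://*.{extension}",
        "-mf", f"fps={fps}:type={extension}",
        "-ovc", "lavc", "-lavcopts", "vcodec=mpeg2video",
        "-of", "mpeg", "-o", output,
    ]


def plan(pov_files, extension="png", fps=25, output="simulation.mpg"):
    """Return the ordered commands: one render per frame, then one encode."""
    commands = [povray_command(f) for f in sorted(pov_files)]
    commands.append(mencoder_command(extension, fps, output))
    return commands
```

Keeping command construction separate from execution makes it possible to inspect or log the whole plan before any external tool is started.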

Screen shots and system diagram

[Figure: Haskell interface for the generation of text frame data]

[Figure: Image generation with the POV-Ray utility (initiated from the command line)]

[Figure: Current system diagram]

1.3 Literature Survey

The focus of the following literature survey is to establish an “industry state” and to look

into prospective implementation strategies. As such, the topics include functional reactive

animation in Haskell, a Haskell plug-in architecture, Haskell's foreign function interface,

interface toolkits, and rendering with OpenGL and C++.

An embedded modeling language approach to interactive 3D and multimedia

animation. Elliott argues the concept of what an animation means is frequently lost in

how to present it. He introduces a language which is suitable for defining animations in

terms of what they are by synthesizing “an existing declarative 'host language,' Haskell,

with an embedded domain-specific vocabulary.” The language he introduces is called

Fran (functional reactive animation), and it attempts to separate graphic content from

graphic presentation [6]. Fran treats 3D geometry as a collection of primitive shapes, and

also supports spatial transformation (translation, scaling), decoration (colors, texturing),

aggregation (combining multiple models into one conceptual model), lighting, and even

sound.

Another key concept in Fran is that the same level of synthetic and hierarchical

construction is given to 2D images as 3D models. This includes rendering 2D geometry

(lines, polygons, circles), rendering text (using any number of fonts and display modifiers),

2D spatial transformation, image overlay, partial transparency, and cropping. Images are

said to have infinite extent and resolution, until they are actually displayed. This is in

contrast with the typical conception of images as bitmaps or other rasterized formats,

which have finite resolution and size.

Fran treats all animation, 2D or 3D, as behaviors, which are simply “time-varying

values.” Time in Fran is treated as a continuous value. Similar to images, time is not

turned into a discrete sequence until the user experiences it in the form of a discrete

number of frames.
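The idea of a behavior as a time-varying value can be illustrated outside Fran. The following is a minimal sketch in Python, not Fran's API: a behavior is modeled as an ordinary function of continuous time, and it is only turned into discrete frames when sampled. The names circle_position, lift, and sample are hypothetical.

```python
import math


def circle_position(t):
    """A behavior: a 2D position as a function of continuous time t."""
    return (math.cos(t), math.sin(t))


def lift(f, behavior):
    """Apply an ordinary function pointwise to a behavior."""
    return lambda t: f(behavior(t))


def sample(behavior, fps, duration):
    """Discretize a behavior only at display time, one value per frame."""
    frames = int(round(fps * duration))
    return [behavior(i / fps) for i in range(frames)]


# A derived behavior: the same motion scaled to twice the radius.
scaled = lift(lambda p: (2 * p[0], 2 * p[1]), circle_position)
frames = sample(scaled, fps=4, duration=1.0)
```

Because the frame rate appears only in sample, the same behavior can be rendered at any rate without changing its definition.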

Fran presents an easier-to-define method for animation than many conventional
graphics libraries such as OpenGL. Like CellSys itself, it is a DSL embedded in Haskell.
For this reason, it would be straightforward to combine Fran and the real-time
simulation in one comprehensive Haskell module.

The Yampa Arcade. This article describes the implementation of the classic Space

Invaders game using Functional Reactive Programming, or FRP. Yampa is a

Haskell-embedded incarnation of FRP, which is used throughout the article. Yampa strives to add

features which are necessary for the domain of animation, which earlier reactive animation

systems such as Fran and FAL lacked. Two concepts which are core to Yampa are

signals and signal functions. Signals are essentially functions from time to a value, and

the type of the value represents the type of data carried by the signal. Yampa signals are

similar to Fran behaviors. For example, if a Point represents a 2-dimensional point, then

the position of the mouse, as it changes in time, might be represented as a value of type

Signal Point. Events, such as a key press or mouse button click, can be easily

represented as an (Event T) type, where the T type contains information about the

event. Signals of type Signal (Event T) can then be used to model discrete events.

Signal functions accept signals of one type and produce a signal of another type. This is

very relevant to CellSys, not only because it is embedded in Haskell, but because it may

provide a convenient, extendable, and modular method for managing the timing and

animation of a real-time simulation. Combined with a graphic toolkit, such as HGL, Yampa

could form one possible solution for half of the visualization tool. By generating images

directly from Haskell and invoking mencoder, it is possible the entire animation suite could

be embedded along with the DSL to form a single, cohesive system.
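The relationship between signals and signal functions can be sketched as follows. This is an illustration in Python, not Yampa code: Yampa itself exposes only signal functions, composed with arrow combinators, whereas here a signal is simply a function of time and a signal function is a function from signals to signals. All names are hypothetical.

```python
def mouse_position(t):
    """A signal carrying a 2D point: the pointer drifts right over time."""
    return (10.0 * t, 5.0)


def click_events(t):
    """A signal of events: None when nothing happens, a payload otherwise."""
    return "left-click" if abs(t - 1.0) < 1e-9 else None


def map_signal(f):
    """Build a signal function that applies f to every sample of its input."""
    return lambda signal: (lambda t: f(signal(t)))


def compose(first, second):
    """Chain two signal functions, feeding the output of first into second."""
    return lambda signal: second(first(signal))


x_coordinate = map_signal(lambda point: point[0])
to_pixels = map_signal(lambda x: int(round(x)))
pixel_x = compose(x_coordinate, to_pixels)(mouse_position)
```

Small signal functions composed in this way are what make the approach modular: timing and animation logic can be assembled from reusable pieces.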

FranTk – A Declarative GUI Language for Haskell. FranTk is a GUI

toolkit, which is built on Fran. It adds support for hierarchical interactive displays, which

allows access to input from individual components rather than one monolithic window.

FranTk lifts Fran's behaviors and events to widgets, which is key to declarative

programming, and provides a more efficient implementation of core Fran combinators,

which improves Fran's data-driven model.

The fundamental concept in FranTk for interaction objects, such as labels, sliders,

or buttons, is the Component. A component is an action which produces a widget. The

action is performed when the widget must be displayed. Components come in three types:

the standard viewable widget (Component), graphical components from Fran such as

lines or circles (CComponent), and the top-level window component (WComponent).

These components can be composed to easily create much more complex hierarchical

interfaces.

FranTk also provides access to state with monads. Monads are a functional

programming concept which allows the programmer to extend the return value of functions

with additional information, such as various side effects to some “global state,” error

messages, exceptions, and re-formatting, among other things [7]. Listeners are used to

update state, and FranTk also provides Wires which connect the output of various widgets

to the inputs of listeners. For example, when a button (a widget, generated from a

component) is pressed, a value is sampled, and passed (via a wire) to a listener, which will

then update the application state.
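The button, wire, and listener example above can be sketched as three small classes. This is a Python illustration of the data flow only; it is not FranTk's API, and the class and method names (Listener.tell, Wire.connect, Button.press) are hypothetical.

```python
class Listener:
    """Receives values and folds them into application state."""

    def __init__(self, state, update):
        self.state = state
        self._update = update

    def tell(self, value):
        self.state = self._update(self.state, value)


class Wire:
    """Carries values from a widget's output to every connected listener."""

    def __init__(self):
        self._listeners = []

    def connect(self, listener):
        self._listeners.append(listener)

    def send(self, value):
        for listener in self._listeners:
            listener.tell(value)


class Button:
    """A stand-in widget: pressing it samples a value and sends it on."""

    def __init__(self, wire, sample):
        self._wire = wire
        self._sample = sample

    def press(self):
        self._wire.send(self._sample())


counter = Listener(state=0, update=lambda total, step: total + step)
wire = Wire()
wire.connect(counter)
button = Button(wire, sample=lambda: 1)
button.press()
button.press()
```

After two presses the listener's state is 2; the widget never touches the state directly, which is the separation the wire provides.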

FranTk, combined with Fran or even Yampa, could provide the necessary interface

capabilities to implement the visualization system itself in Haskell. It is part of a larger

functional suite of Haskell-embedded domain specific languages.

Plugging Haskell in. Andre Pang details the use of Haskell as a statically typed

extension language for both Haskell and foreign-language applications using the foreign

function interface. Extension languages enable users to add additional functionality to a

program without understanding the underlying program itself or re-compiling (if applicable)

the program code. Functionality written in such a fashion can usually be loaded at run-

time, and these modules are often called plug-ins. A plug-in interface has two sets of

symbol names, one for the values the plug-in can access from the host application and

one for the values the host application can access from the plug-in. A Haskell plug-in

infrastructure can be accessed from any language directly supported from Haskell's

foreign function interface, such as C, or even from a language which inter-operates with C,

like C++, Objective-C, or C#.

By using Haskell as a plug-in extension language and the FFI, it would be possible

to use the embedded domain-specific language (EDSL) CellSys as a module from some

other application. This means the simulation interface and programming could be written

in C/C++ and OpenGL or some other efficient, well-established language and any

simulation EDSLs can be called as modules or plug-ins from their native embedded

language, in this case, Haskell. This model would be a fully featured stand-alone

simulation and encoding engine, with a sort of extension language that could be used to

power different types of simulations.

3D Rendering with C++ and OpenGL in Undergraduate Projects. This paper

presents an object-oriented approach to rendering 3D objects with C++ and OpenGL. All

objects are basically rendered as a set of vertices (ordered triplets representing points in

3D space) and polygons (3 or more vertices). Since shapes are made of multiple

polygons, and the polygons often have vertices which overlap, it is common for the

vertices to be stored in an array, and then each polygon has a set of pointers to its

vertices. This avoids storing redundant data. Lighting is achieved by the relationship of a

polygon's normal vector (an ordered triplet representing some direction in space, in this

case it has length one and is perpendicular to the polygon's surface) to the direction of a

light source. Maintaining accurate normal vectors is essential for accurate lighting effects.

With flat shading, only one normal vector per polygon is needed, and the entire polygon

face is shaded the same color. A more realistic shading model, Gouraud shading, uses

one normal vector per vertex, and the intensities computed at the vertices are interpolated

across the face of the polygon. Textures can be applied to a polygon, which is

similar to putting an image, such as wood, stone, etc, on the face of the polygon.
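The shared vertex array and the role of the normal vector in flat shading can be sketched as follows. This is a minimal Python illustration rather than the paper's C++ and OpenGL code; it assumes triangles with counter-clockwise winding and a simple diffuse model (Lambert's cosine law), neither of which is specified in the text.

```python
import math


def subtract(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


# Shared vertex array: each point is stored once.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# Polygons refer to vertices by index, so shared corners are not duplicated.
triangles = [(0, 1, 2), (0, 2, 3)]


def face_normal(triangle):
    """Unit vector perpendicular to a triangle (counter-clockwise winding)."""
    a, b, c = (vertices[i] for i in triangle)
    return normalize(cross(subtract(b, a), subtract(c, a)))


def flat_shade(triangle, to_light):
    """One diffuse intensity in [0, 1] for the whole face."""
    return max(0.0, dot(face_normal(triangle), normalize(to_light)))
```

Gouraud shading would instead evaluate the intensity at each vertex normal and interpolate the results across the face.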

The article applies careful thought to the object hierarchy which is used to store the

render data. Since one vertex or normal vector may be used by more than one polygon, it

is convention to store all the primitive data in separate linked lists or other such dynamic

data structures, and then use pointers to get a handle on component primitives.

A full-blown rendering engine, with support for vertices and arbitrary polygons, may

not be necessary for this visualization engine. Certainly, the CellSys language only

requires support for spheres, but some extensibility is expected of an abstract rendering

engine. This illustrates the merit of the POV rendering format, at least for the “pre-render”

simulation path, because the managing application itself need not be aware of any of the

details of rendering. If the POV format is used, then the POV-Ray application can render

any images necessary for the video clip, which can be directly passed to the video

encoding process.

The system under consideration is based on a paper by Dr. William Harrison and
Dr. Robert Harrison called “Domain Specific Languages for Cellular Interactions” [5].
This work explores the use of domain-specific programming languages (DSLs) in the area
of biological system simulation. DSLs are small programming languages that include
language abstractions for only one particular area of expertise [5]. As such, programs
written in DSLs are easily constructed and understood by domain experts [5]. The primary
advantage of DSLs in computer modeling and simulation of biological systems is that they
free the biologist from detailed attention to the underlying computer processes upon
which their programs are built.

Consider the task of writing a computer program to simulate cell activities. A typical

C++ program to accomplish this would be quite large, with pertinent biological data spread

across multiple language constructs [5]. The biologist, therefore, would need to have a

high degree of C++ knowledge and spend a large amount of time focusing on low-level,

computer-related details. Modification and reuse of such a program would also be

problematic, given its complexity. On the other hand, this task could be split into two

separate activities: rapid prototyping of a simple programming language to create

biological simulations and the analysis of the biological models upon which the language is

based. The first problem is in the domain of the computer scientist, while the second is in the

domain of the biologist. In such a model of execution, the computer scientist could focus

on the details and adjustment of the language design, and the biologist could focus on the

accuracy of the simulations.

In this paper, the authors demonstrate the CellSys language, a DSL created to
simulate the motion of the simple bacterium Rhodobacter Sphaeroides. R. Sphaeroides is
a photosynthetic bacterium that actively swims towards light in an effort to obtain optimal
conditions to grow and divide. At any point in time, R. Sphaeroides will swim, tumble
(reorient its direction), grow, divide, or die according to probabilities that depend on the
bacterium’s present environment and history [5]. CellSys, embedded in the functional
programming language Haskell, presents to the biologist the abstraction of a Markov chain
for specifying the behavior of R. Sphaeroides.

A Markov chain is a representation of a set of states and the probabilities of moving from one state to another at a given instant. As such, Markov chains are a representation of the behavior of a non-deterministic finite state machine. [5]

Thus, CellSys models R. Sphaeroides by allowing the biologist to easily specify the

likelihood of each action that the bacteria will take when in a given state. The runtime

system of CellSys allows for concurrent operation of multiple bacteria and makes

adjustments to the global environment in which they exist.
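A Markov chain over the five actions named above can be sketched in a few lines. This is an illustration in Python, not CellSys code. The transition probabilities are invented for the example, and they are fixed, whereas in the actual model they depend on the bacterium's environment and history.

```python
import random

# Hypothetical transition probabilities; each row sums to 1.
TRANSITIONS = {
    "swim":   {"swim": 0.70, "tumble": 0.20, "grow": 0.08, "die": 0.02},
    "tumble": {"swim": 0.85, "tumble": 0.10, "grow": 0.05},
    "grow":   {"swim": 0.50, "grow": 0.30, "divide": 0.20},
    "divide": {"swim": 1.00},
    "die":    {"die": 1.00},
}


def step(state, rng):
    """Pick the next state at random, weighted by the current state's row."""
    choices = TRANSITIONS[state]
    return rng.choices(list(choices), weights=list(choices.values()), k=1)[0]


def run(start, steps, seed=0):
    """Simulate one bacterium for a fixed number of steps."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        history.append(step(history[-1], rng))
    return history
```

Calling run("swim", 10) returns a list of eleven states beginning with "swim"; seeding the generator makes a run reproducible, which helps when comparing visualizations of the same simulation.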

The simulation outputs a series of POV files, specifying the successive locations of

each bacterium being modeled in the simulation. POV files are simple text files that

contain a description of each scene in the simulation. These files can then be interpreted

by POV-Ray, a free ray-tracer available on the internet, to create a series of images of the

simulation [1]. The series of images can then be combined to create an animation of the

conducted simulation [5].
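To make the file format concrete, the sketch below writes the text of one scene with a sphere per bacterium, using POV-Ray's camera, light_source, and sphere statements. The camera and light positions, the sphere radius and color, and the file naming scheme are assumptions for illustration; the exact scene text emitted by CellSys is not reproduced in this report.

```python
def pov_scene(bacteria, radius=0.5):
    """Return the text of one POV-Ray scene with a sphere per bacterium."""
    lines = [
        "camera { location <0, 0, -20> look_at <0, 0, 0> }",
        "light_source { <10, 20, -30> color rgb <1, 1, 1> }",
    ]
    for x, y, z in bacteria:
        lines.append(
            f"sphere {{ <{x}, {y}, {z}>, {radius} "
            "pigment { color rgb <0.2, 0.8, 0.2> } }"
        )
    return "\n".join(lines) + "\n"


def frame_filename(index):
    """Zero-padded names keep frames in order when sorted alphabetically."""
    return f"frame{index:04d}.pov"


scene = pov_scene([(0.0, 0.0, 0.0), (1.5, -2.0, 3.0)])
```

One such file per simulation step, named with frame_filename, gives the renderer a sequence it can process in order.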

A very similar project in biological system simulation is described in the article

“Individual-based simulation of the clustering behavior of epidermal growth factor

receptors” [16]. This article describes the implementation of a simulation and

visualization of epidermal growth factor receptors on cell surfaces. Epidermal receptors

are biological structures in the surfaces of cells. Free external molecules, known as

ligands, are able to bind themselves to these receptors. When this occurs, other receptors

on the cell surface are attracted to the newly formed bound receptor. The clustering

activity of these receptors then signals chemical processes within the cell to stimulate

growth. Although this process can be viewed under a microscope, little information

actually exists about its detailed mechanics. [16]

To gain further insight, a computer simulation was designed to model the clustering

behavior of receptors on the cell surface. The simulation was constructed using an object-

oriented approach. This is in contrast to the purely functional programming paradigm used

in our project’s simulation. In the object-oriented system, receptors are represented by

objects of the class Molecule. These Molecules have the ability to bind to ligands and

move within the two-dimensional surface of the cell, represented by an object of the class

CellSurface. [16]

When two Molecules come within close range of each other, the CellSurface object

references an AffinityTable containing clustering probabilities. (This is not unlike the

Markov chain mechanism used in the simulation of R. Sphaeroides.) The Molecules in

close proximity then have the possibility of forming a cluster structure represented by

objects of the Multimer class. The Multimer class is a subclass of the Molecule class

because clusters of receptors behave in much the same manner as receptors themselves.

Both have similar movement and clustering abilities. The Multimer class, however,

additionally allows the dissociation of its constituent Molecules. [16]
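The class relationships described in [16] can be sketched as follows. The class names Molecule, Multimer, and CellSurface and the idea of an affinity table come from the text. The method names, the contact range, and the representation of the table as a dictionary keyed by cluster sizes are assumptions made for this illustration.

```python
import random


class Molecule:
    """A receptor on the two-dimensional cell surface."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.bound = False  # whether a ligand is attached

    def size(self):
        return 1

    def distance_to(self, other):
        return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5


class Multimer(Molecule):
    """A cluster: behaves like a Molecule but can also dissociate."""

    def __init__(self, members):
        x = sum(m.x for m in members) / len(members)
        y = sum(m.y for m in members) / len(members)
        super().__init__(x, y)
        self.members = list(members)

    def size(self):
        return sum(m.size() for m in self.members)

    def dissociate(self):
        return list(self.members)


class CellSurface:
    def __init__(self, molecules, affinity, contact_range=1.0, seed=0):
        self.molecules = list(molecules)
        self.affinity = affinity  # maps (size, size) to a probability
        self.contact_range = contact_range
        self.rng = random.Random(seed)

    def try_cluster(self, a, b):
        """Merge a and b with the probability given by the affinity table."""
        if a.distance_to(b) > self.contact_range:
            return None
        if self.rng.random() >= self.affinity.get((a.size(), b.size()), 0.0):
            return None
        cluster = Multimer([a, b])
        self.molecules = [m for m in self.molecules
                          if m is not a and m is not b]
        self.molecules.append(cluster)
        return cluster
```

Making Multimer a subclass of Molecule lets a cluster take part in further clustering without special cases, which mirrors the design rationale given in the article.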

The receptors and clusters (represented by the Molecule and Multimer classes,

respectively) can be displayed to visualize the system’s global status (Figure 2). Both

receptors and clusters are represented as circles on a two dimensional plane (the cellular

surface). The diameters of these circles are proportional to the number of receptors within

the cluster. Free and bound ligands can be interactively introduced into the simulation

using the mouse. [16]

Figure 2. A scene from a receptor animation sequence.

Biological simulation techniques provide interesting comparisons to the larger

simulation system in which our visualization component will exist. However, it is primarily

important to examine systems that provide advanced biological visualization. A

state-of-the-art example of such a system is VisBio, an open-source, Java-implemented biological

visualization system developed by Curtis Rueden at the University of Wisconsin –

Madison. Described in a recent column in VisFiles by Bill Hibbard [13], VisBio aims to

develop innovative techniques for visualizing vast quantities of 2D, 3D, and higher-

dimensional data obtained through biological microscopy. These living cell visualizations

allow scientists to gain a better understanding of the dynamic properties of cells and their

internal structures. [13]

New algorithms for the tracking and visualization of such dynamic structures are

constantly being developed, and VisBio serves as a platform for the free exchange of this

evolving art. VisBio was thus constructed with both extensibility and portability in mind.

Developed in Java, VisBio is built upon the data visualization API, VisAD. The use of

VisAD allows great flexibility in the types of data that can be visualized. Therefore, VisBio

provides a flexible, portable environment for visualizing many different types of biological

structures. [13]

The primary data to be visualized is obtained by “varying a microscope’s focal plane

across multiple heights within a specimen.” [13] The result of this technique is a “stack” of

two-dimensional image slices that give an approximation of the biological specimen’s

three-dimensional structure (Figure 3) [13]. Often these images are captured over a span

of time, yielding a time-varying (or four-dimensional) view of the specimen. Furthermore,

new techniques in microscopy allow additional specimen data to be captured, creating up

to six-dimensional data to be visualized [13]. However, current versions of VisBio do not

support the visualization of such data [13].

Figure 3. Cross-sectional capabilities of VisBio.

Figure 4. Menu options and 3-D measurements in VisBio.

The VisBio interface is an example of the vast possibilities that exist when

visualizing biological data. First, the interface offers flexible color schemes and color

manipulation for the images to be viewed [13]. Besides offering a number of predefined

color schemes, the user is allowed to customize the color ranges of pixels in the images

[13]. Second, images are available for viewing in both low- and high-resolution formats.

This feature allows effective memory management and fast animation [13]. Third,

although the image visualization primarily takes the form of the image “stacks” described

previously, VisBio is capable of constructing 3-D, semi-transparent volumes of the

specimens for viewing [13]. Non-horizontal slicing planes through the specimen can be

interpolated, and precise measurements can be tracked and made across image slices

(Figure 4) [13]. Future developments planned for VisBio include advanced visualization

features of 6-D specimen data and more advanced memory management techniques for

visualizing the massive amounts of data [13].

VisBio is an excellent example project for our visualization system. Although much
more complex, it demonstrates the creativity that can be introduced into the visualization
process. The user interface of VisBio is very intuitive and is worth emulating in our
visualization system. As can be seen from Figures 3 and 4, each component of the
visualization stands in a separate, floating window. This allows hierarchical access to the
many visualization features that may be incorporated (viewing area, color schemes, etc.).

Another example of biological visualization is described in the journal article “FPV:

Fast Protein Visualization Using Java 3D” [14]. As the name suggests, this program is

used for protein visualization. Visualization of proteins is important for biologists because

the 3-D structure of proteins determines their ability to interact with other molecules [14].

The system was designed using the Java programming language and the Java 3D API.

Although performance of the Java 3D API can sometimes be low [14], this article suggests

techniques for fast visualization of the 3-D protein data.

This article is especially applicable to our current project. FPV accesses Protein

Data Bank (PDB) files to discern the atomic structure of the proteins to be visualized [14].

Scene construction then occurs via a Graphics Module, and the final, rendered scene is

output to the display unit [14]. Our project of cellular visualization can be thought of as the

repeated application of this process. Moreover, in many of the protein views described,

the internal protein atomic structure is represented by a group of spheres [14]. For our

system, it is likely that the bacteria will also be represented as spheres. Thus,

optimization techniques used in FPV may also be considered for our project.

The particular optimizations described in the article were as follows. First, the

designers realized that because the atomic structure of the protein was rigid, essentially

only one geometric transformation needed to be completed, as long as the protein

structure could be drawn directly [14]. This required the creators to design a new

Sphere3D class that allowed the direct drawing of a sphere in any location [14]. Instead of

storing a transformation along with a sphere’s coordinates to be transformed, the sphere’s

transformed coordinates were stored directly [14]. This increased the time it took to load

the model, but greatly increased visualization speed [14].

The second optimization had to do with the recording of shape information within

the model. Each shape object contained information about both its appearance and its

geometry. Instead of redundantly recording information for shapes with the same

appearance, only objects of unique appearance were stored, each with an attached array

of geometric information for the various instances. [14]
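Both optimizations can be illustrated with plain data structures. The sketch below is written in Python rather than Java 3D and does not reproduce FPV's Sphere3D class; the atom records, the translate helper, and the appearance labels are hypothetical.

```python
def translate(offset):
    """Return a transformation that shifts a point by a fixed offset."""
    return lambda p: (p[0] + offset[0], p[1] + offset[1], p[2] + offset[2])


# Before: every sphere carries its own transformation.
atoms = [
    {"center": (0.0, 0.0, 0.0), "transform": translate((1.0, 0.0, 0.0)),
     "appearance": "carbon"},
    {"center": (0.0, 0.0, 0.0), "transform": translate((0.0, 2.0, 0.0)),
     "appearance": "oxygen"},
    {"center": (0.0, 0.0, 0.0), "transform": translate((0.0, 0.0, 3.0)),
     "appearance": "carbon"},
]


def bake(atoms):
    """Optimization 1: apply each transformation once, at load time."""
    return [(a["appearance"], a["transform"](a["center"])) for a in atoms]


def group_by_appearance(baked):
    """Optimization 2: one entry per unique appearance, with its positions."""
    groups = {}
    for appearance, position in baked:
        groups.setdefault(appearance, []).append(position)
    return groups


groups = group_by_appearance(bake(atoms))
```

The result holds two appearance entries for three spheres. The extra work in bake corresponds to the longer load time reported in [14], traded for less work per rendered frame.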

Both optimizations significantly boosted performance. Larger protein models could
be loaded with higher frame rates [14]. When compared to other Java 3D protein
visualization programs, FPV outperformed them in almost all areas [14]. The primary
importance of this article lies in its concrete suggestions for improving the rendering
performance of simple geometry. Both optimizations will be explored further in our project.

The article “Problems of Visualization of Technological Processes” [15]

provides both an introduction and two advanced case studies in the field of dynamic

scientific visualization. In this article, the importance of dynamic visualization as a medium

for understanding is first discussed. Applications for the understanding of flows of people

and water, traffic accidents, forest fires, and global warming are mentioned. The role of

dynamic visualization is advocated to better understand hidden relationships and patterns

in dynamic systems. Moreover, rather than focus strictly on the precise accuracy of

dynamic simulations, techniques for rapid simulation and visualization are given that

maintain the accuracy necessary to discover general patterns. This approach is illustrated

with two examples of dynamic visualization in the field of power engineering.

The first simulation was that of a gas filtration device. In this device, gas is passed

through a filter containing granules that absorb dust and other gas pollutants. As the

granules absorb the pollutants, their absorption capacity decreases, and they grow heavy

and fall through the filter [15]. This material is then cleaned and reused in

the filtration process.

In the simulation developed, researchers modeled the physical behavior of the

filtration granules and the gas passing through them. Although the movements of the

granules and gas were simulated using highly complex techniques, the main innovation in

the simulation was the division of the filter into volume units of equal size [15]. Properties

of these volume elements were then tracked such as the amount and locations of the

granules within them [15]. Purification of the gas was then measured by the rates of

change of pollutants in the volume elements [15]. Relative accuracy of the simulation

results can be seen in Figures 3 and 4.
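The division into equal volume units can be illustrated with a short C++ sketch. It assumes a box-shaped filter and tracks only the granule count per volume element; the Grid, Point, and countGranules names are hypothetical, and the simulation in [15] tracks more properties than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a box-shaped filter split into nx*ny*nz equal cells.
struct Grid {
    double sizeX, sizeY, sizeZ;   // physical extent of the filter
    std::size_t nx, ny, nz;       // number of volume elements per axis
};

struct Point { double x, y, z; };

// Index of the volume element along one axis, clamped to the last element
// so that a point exactly on the upper boundary is still counted.
std::size_t axisIndex(double coord, double size, std::size_t n) {
    assert(coord >= 0.0 && coord <= size && n > 0);
    std::size_t i =
        static_cast<std::size_t>(coord / size * static_cast<double>(n));
    return i < n ? i : n - 1;
}

// Count the granules in every volume element.
std::vector<std::size_t> countGranules(const Grid& g,
                                       const std::vector<Point>& granules) {
    std::vector<std::size_t> counts(g.nx * g.ny * g.nz, 0);
    for (const Point& p : granules) {
        std::size_t ix = axisIndex(p.x, g.sizeX, g.nx);
        std::size_t iy = axisIndex(p.y, g.sizeY, g.ny);
        std::size_t iz = axisIndex(p.z, g.sizeZ, g.nz);
        ++counts[(iz * g.ny + iy) * g.nx + ix];
    }
    return counts;
}
```

Rates of change per volume element could then be estimated by comparing the counts from two successive time steps.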

The next system modeled was that of the combustion process of a power plant

boiler. Unfortunately, precise simulation of combustion processes requires very time-

consuming calculations. Programs like FLUENT offer such simulations, but do not provide

simulation feedback for many hours [15]. Thus, it is very difficult to provide simulation and

visualizations in reasonable amounts of time without many simplifying assumptions.

Particularly, simulations of fluid dynamics, coal combustion, and heat radiation were

simplified by using finite volume elements, upon which calculations were made [15].

Moreover, in many cases, pre-calculated data sets are used to assist with simulation.

Such simplifications allow for real-time simulation and visualization of the combustion

process.

The visualization component of the simulation system provides an interface for

zooming, time-stepping, and obtaining volume element statistics. Use of pre-calculated

data sets also allows visualization playback at various speeds with relative ease. When

compared to the precision-oriented combustion simulation program, “about 60% - 80% of

volume elements have their attribute values (temperature, pressure, etc.) different from

values obtained by FLUENT by less than 30%.” [15] Such differences are viewed as

acceptable for process illustration purposes. Moreover, the speed advantages of the real-

time system are promising for future simulations.

The case studies mentioned above represent a wide range of biological simulation

and visualization. Simulations can be modeled in a variety of programming languages

using a variety of computer science technologies. The visualization systems mentioned

vary according to the types of data they display and the techniques for visualization.

There is, however, a unifying goal that all of the described programs have: they all seek to

further knowledge of Biology using techniques of Computer Science. By completing this

project, we hope also to make a small contribution to this pursuit.

Summary

The literature reviewed primarily presents two implementation methods for the

visualization tool. One approach is to write all the functionality in Haskell: the EDSL

CellSys, the interface using FranTk, the real-time simulation using either Fran or Yampa,

rendering the images directly from Haskell, and using the FFI to call some kind of video

encoding process. The other possibility is to write a stand-alone, modular, visualization

client using C++, OpenGL, and an interface toolkit such as GLUT or Qt. Plug-ins could be

created to support various EDSLs in Haskell, or some other inter-language interfacing

method could be used.

1.4 Goals and Objectives

System constraints, environment, and interface requirements

The only hard system constraint is that the R. Sphaeroides DSL, CellSys, is

implemented in the Haskell language. Soft constraints include the POV and MPEG-2 file

formats, pvengine, and mencoder. As long as an acceptable end-product (a playable

video file) is produced, the POV, or even MPEG-2, formats and associated utilities are not

necessary.

The current system is Windows-based, although all the utilities involved are open

source or available for multiple platforms, so the new system should also be relatively

platform independent, and run in both Windows and Unix environments.

The CellSys program currently interfaces with the pre-rendering process via file

I/O, but that output routine shall be re-written and replaced with a more direct

communication link between CellSys and the Input Module. The Input Module shall be

the new system's “public interface,” so to speak, although API may be a more appropriate

term. The interfaces between the input module, the real-time simulation, and the pre-

rendered simulation should be identical. The system shall also provide an interface to the

user, which allows him or her to enable or disable the real-time and pre-rendered

simulations, and to initiate the CellSys simulation. The user interface should also allow

the user to choose or configure the simulation DSL to some extent.

Objectives and tasks of work

First and foremost, a method for generating a movie file will be implemented. This

will most likely involve scripting mencoder to run with the proper arguments, and possibly

generating image files procedurally from Haskell or C/C++. The real-time simulation and

user interface shall be implemented in OpenGL, either in C/C++ or using Fran and

Haskell. The interface shall be created using either GLUT or Qt. An input module shall

be created and used to pass data from the CellSys DSL to the simulation interface. The

Haskell program can invoke both the interface and the input module to run as two threads,

and then pass data as it generates it to the input module. The input module will prepare

and send the data to the interface as it is needed, to support the real-time simulation and

animation playback.
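One way the hand-off between the two threads might look is sketched below in C++. The sketch assumes standard C++ threading primitives, which may differ from whatever threading library is finally chosen; the Frame and FrameQueue names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical frame record: one value per cell is enough for the sketch.
struct Frame { std::vector<double> cellData; };

// Thread-safe FIFO between the input module (producer) and the interface
// (consumer).
class FrameQueue {
public:
    // Called by the producer whenever the simulation emits a frame.
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push_back(std::move(f));
        }
        ready_.notify_one();
    }

    // Called by the consumer; blocks until a frame is available.
    Frame pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty(); });
        assert(!frames_.empty());
        Frame f = std::move(frames_.front());
        frames_.pop_front();
        return f;
    }

    // Number of frames currently buffered.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Frame> frames_;
};
```

The input module thread would call push as data arrives, while the interface thread would call pop once per displayed frame; frames come out in the order they went in.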

Prototype expectations

The prototype shall provide an interface which allows the user to choose either real-

time simulation or pre-rendered video, and a button to start the chosen mode of operation.

Configuration options for either path should also be present.

Performance experiments and expectations

The real-time simulation should be able to handle up to a few thousand primitive

objects and render at least 25 frames per second.

1.5 Overall Approach

Overview of the proposed system

The focus of this project is to replace and extend the mechanism for generating

animation from the output of the CellSys Haskell program. This involves two primary

components, a video encoding process, and a real-time simulation process. The “real-

time path” can be used for immediate feedback from the CellSys simulation, while the

encoding process, or “pre-render path,” can be used for simulations which would be too

computationally intensive to render in real-time. The new system shall, from the starting

point of data generation, feature a fully automatic MPEG rendering process, as opposed to

the four-step manual process currently in use. The system as a whole shall provide an

interface to control the operation of the real-time rendering engine and the pre-render engine,

and should grant the user control of the simulation parameters, such as the number of cells and

frames to render. The system shall be designed in a modular fashion in order to facilitate

the extension of the visualization tool to domain specific languages (DSLs) other

than the language which models R. Sphaeroides in CellSys. Part of this design

philosophy will be the creation of an input module, which will act as an interface from

CellSys to the two rendering paths.

Work to be carried out

Re-write the CellSys output function. The old CellSys output function uses file

I/O to communicate with the other simulation phases. The function shall be re-written or

extended so that it can more directly communicate with the Input Module.

Construct an Input Module. The input module is responsible for taking data from

the embedded DSL and formatting it for either the real-time simulation or the pre-rendered

simulation.
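Since both rendering paths are meant to receive data through the same interface, the input module could be written against a single abstract type. The following C++ sketch shows that idea only; the FrameSink, RealTimeSink, PreRenderSink, and dispatch names are hypothetical, and the two sinks are placeholders for the real rendering and encoding code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell record passed out of the input module.
struct Cell { double x, y, z, radius; };

// Common interface: the input module only ever talks to a FrameSink, so the
// real-time and pre-render paths are interchangeable.
class FrameSink {
public:
    virtual ~FrameSink() {}
    virtual void consume(const std::vector<Cell>& frame) = 0;
};

// Stand-in for the real-time path: would hand the frame to the renderer.
class RealTimeSink : public FrameSink {
public:
    void consume(const std::vector<Cell>& frame) override {
        lastCellCount = frame.size();   // placeholder for an OpenGL draw
    }
    std::size_t lastCellCount = 0;
};

// Stand-in for the pre-render path: would write an image for the encoder.
class PreRenderSink : public FrameSink {
public:
    void consume(const std::vector<Cell>&) override {
        ++framesQueued;                 // placeholder for writing a frame image
    }
    std::size_t framesQueued = 0;
};

// The input module forwards each frame to every sink the user enabled.
void dispatch(const std::vector<Cell>& frame,
              const std::vector<FrameSink*>& sinks) {
    for (FrameSink* s : sinks) {
        assert(s != nullptr);
        s->consume(frame);
    }
}
```

Enabling or disabling a path from the user interface then amounts to adding or removing a sink from the list.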

Build a real-time simulation engine. This engine shall render the results of the

simulation in real-time, and should also provide a number of features for the analysis of the

simulation.

Completely automate the pre-rendered simulation path. Once the user indicates the

generation of an encoded video file, no further action should be necessary until the video

file is generated.

Build a user interface. The user interface will allow the user to control the

visualization. See section 1.4 for more details.

Advantages and disadvantages

The system will be simple, easy-to-use and extendable. Following the rapid-

prototyping nature behind the domain specific languages used for the simulation, this tool

will allow for the rapid production of visual results. It should run independently of platform.

On the other hand, since this tool will potentially use many different technologies in order

to accomplish its ends (C/C++, OpenGL, Qt, Haskell, mencoder, pvengine, etc.),

installation may be non-trivial.

Detailed approach and risk analysis

There are several possible approaches to system implementation. First, the entire

project could be implemented in Haskell. This solution would allow direct communication

with CellSys programs in the form of function calls. There are also several OpenGL

interfaces for Haskell, allowing for the visualization. However, the use of Haskell would

require each of the project group members to learn a completely new programming

language. Furthermore, Haskell, as a purely functional language, is unlike many

conventional imperative programming languages. Thus, the learning curve would be quite

steep and both technical risk and time commitment would be excessively high.

The other general alternative comes in several forms but centers around the use of

a GUI toolkit. Toolkits such as Qt allow easy GUI programming and an interface to

OpenGL. The primary benefit of a GUI toolkit would be its comprehensive set of features.

Qt and C++ offer all the file I/O and process management features necessary for each

alternative solution. Furthermore, the project members are more familiar with the C++

language than with the Haskell language.

The first alternative solution is by far the simplest: run CellSys programs as they

exist now and initially parse all the POV files to access the pertinent data. Parsing of the

files could be done automatically with the use of a parser generator such as Bison. The

parsed data would then be in a suitable format to be read and transformed by both the

Visualization Module and Movie Encoder. Although this approach is simple, it may be the

case that load times are too long due to the complexity of parsing such large amounts of

data. Thus, the operational risk involved for this alternative is too high.

The second alternative is very similar to the first. However, instead of parsing all of

the input data at once, only a portion of it is parsed initially. The rest of the data will be

parsed as the animation is running or the movie is being encoded. This is a more viable

alternative than that previously suggested. However, the complexity of the system is

increased due to the calculation of an appropriate buffered amount of parsed data before

animation may proceed. Moreover, the continued extensive use of the hard drive and

unfinished parsing also makes this alternative somewhat operationally risky.
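The buffer calculation mentioned for this alternative can be estimated with a simple model, sketched here in C++. The model assumes constant parsing and playback rates, which real disk access will not deliver, so the result is only a lower bound; the minPrebufferFrames name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Smallest number of frames to parse before playback starts so that playback
// never overtakes the parser. Assumes constant parse and playback rates.
std::size_t minPrebufferFrames(std::size_t totalFrames,
                               double parseFps, double playbackFps) {
    assert(parseFps > 0.0 && playbackFps > 0.0);
    if (totalFrames == 0) {
        return 0;
    }
    // Frame k (0-based) is shown at time k / playbackFps; by then the parser
    // has added floor(parseFps * k / playbackFps) frames to the pre-buffer.
    // The last frame (k = N - 1) is the tightest case when parsing is slower.
    double parsedByLastFrame = std::floor(
        parseFps * static_cast<double>(totalFrames - 1) / playbackFps);
    double needed = static_cast<double>(totalFrames) - parsedByLastFrame;
    // At least the first frame must be ready before anything can be shown.
    return needed > 1.0 ? static_cast<std::size_t>(needed) : 1;
}
```

For example, with 1000 frames parsed at 20 frames per second and played at 25 frames per second, the model asks for 201 frames to be parsed before playback begins.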

The third alternative is the most likely candidate for implementation. In this solution,

the POV files will be discarded completely. In their place will be Haskell Foreign Function

Interface calls to C++ functions of our specification. These functions will populate the

scene data structures used by the Visualization Module and Movie Encoder. In this way,

no parsing of data files will need to be conducted and the data structure for the scene can

be populated alongside the visualization with fewer performance concerns. The

disadvantage of this approach is that the original Haskell code for CellSys simulations will

have to be altered to make calls to foreign functions. It is also unknown whether functions

used in Qt’s runtime system will allow internal functions to be called. It might be necessary

to provide some sort of “hack” against such a restriction by forking a process that acts as

an intermediary between the Qt application and the CellSys simulation.
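To make the third alternative more concrete, the C++ side of such an interface might look like the sketch below. The function names (cellsys_begin_frame, cellsys_add_cell, cellsys_cell_count), the Cell fields, and the global scene store are all hypothetical; the actual specification is still to be decided.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scene storage shared by the visualization and encoder code.
struct Cell { double x, y, z, radius; };
static std::vector<std::vector<Cell>> g_frames;

// C linkage gives the functions unmangled names, which is what a Haskell
// foreign import declaration would refer to.
extern "C" {

// Start a new, empty frame; returns its 0-based index.
int cellsys_begin_frame() {
    g_frames.push_back(std::vector<Cell>());
    return static_cast<int>(g_frames.size()) - 1;
}

// Append one cell to the most recently started frame.
void cellsys_add_cell(double x, double y, double z, double radius) {
    assert(!g_frames.empty());
    g_frames.back().push_back(Cell{x, y, z, radius});
}

// Number of cells stored in a frame, or -1 for an invalid index.
int cellsys_cell_count(int frame) {
    if (frame < 0 || static_cast<std::size_t>(frame) >= g_frames.size()) {
        return -1;
    }
    return static_cast<int>(g_frames[static_cast<std::size_t>(frame)].size());
}

}  // extern "C"
```

On the Haskell side, a declaration along the lines of foreign import ccall "cellsys_add_cell" addCell :: CDouble -> CDouble -> CDouble -> CDouble -> IO () would be one way to bind the second function. Only plain numeric arguments cross the language boundary in this sketch, which keeps the marshalling simple.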

Cost Analysis

All the components used are either free or open-source, so costs are virtually non-

existent. This is slightly counter-balanced by the integration time for the system, since

many different technological aspects will be pulled together (User interface, Haskell API,

OpenGL simulation, video encoding). The largest cost is expected to be the time

investment, which is estimated to be approximately 10 – 20 hours per week for the

remainder of the semester.

Development strategies

Phase 1

• Semi-automate current processes

o Create sentinel or invoked C/C++ program to take as input current snapshot POV data and render/display the image to the screen.

Advantages:

• Minimal impact on current Haskell program design

• Fastest to develop and implement

Disadvantages:

• Rendering images with POV is still rather slow

Phase 2

• Fully automate animation rendering

o Using either HOpenGL, or Haskell’s FFI (as discussed above) to directly render simulation data in OpenGL

Advantages:

• Not a separate process for rendering

• Least amount of disk I/O, therefore much faster than Phase 1 solution

Disadvantages:

• Longer development time

• Implementing this interface requires migration of CellSys to Glasgow Haskell Compiler instead of the Hugs interpreter (longer time to adapt existing model required)

• FFI requires additional data specifications

Phase 3

• Extra functionality

o Graphical User Interface and simulation parameter controlling tools

o Ability to change camera viewpoint

Advantages:

• High level, abstract user control

• Limit number of times code must be recompiled to reevaluate the simulation

• Extra “wow” factor for audiences of Dr. Harrison’s talks

Disadvantages:

• Longest development time required for easy “pluggability”

• Could require extensive modification of CellSys Modules

2 Requirements Analysis

2.1 Introduction

During the past year, Dr. William Harrison has been conducting research on the

application of computer programming languages to biological simulation. In particular, he

has created the domain-specific programming language (DSL) CellSys that can be used to

create simulations of the movement, growth, and reproduction of the bacteria Rhodobacter

Sphaeroides. One drawback at present is the lack of an adequate visualization system for

the data generated by CellSys simulations. Currently, a series of shell scripts must be

manually executed to produce a video encoded MPEG after the simulation has been run.

This project will be to create a system that allows for a more seamless, automated movie-

making process of CellSys simulations, as well as the additional capability of interactive

simulation visualization. The following requirements analysis is a first attempt to clarify the

needs and construction of such a system.

2.2 Overall Description

This document enumerates a number of preliminary requirements and resulting

design decisions to aid in the development of both initial system prototypes and

appropriate evaluation metrics. These decisions are necessary for the rapid prototyping

development methodology that will be employed for the project. This methodology is a

particularly necessary requirement for interactive visualization development because it

allows continuous user feedback on both user interface and visualization requirements.

However, to make rapid prototyping a technically feasible option for a project of limited

time scope, a development environment familiar to project team members is required. In

particular, the team members for this project have selected the C++ programming

language in which to implement the visualization system.

This design decision, however, has a number of consequences. First, the CellSys

DSL (the simulation system) is embedded in the Haskell programming language.

Communication of simulation data between the simulation and visualization systems must,

therefore, occur between modules written in entirely different languages (i.e., Haskell and

C++). This data communication problem would not exist if the visualization system were

also implemented entirely in Haskell. To solve this problem, a separate data input module

for the visualization system must be developed. This input module will act as an

intermediary between the simulation and visualization systems. Iterative prototypes of the

input module will be constructed in an attempt to optimize the data-passing capabilities of

the input module (e.g., through files on the hard drive, through Haskell’s Foreign Function

Interface, through shared memory, etc.).

The primary performance requirements and evaluation metrics for this project will

relate to minimizing the delayed response of the interactive visualization after the start of a

simulation, maximizing the number of frames per second (FPS) that is supported for

various simulation sizes, and minimizing MPEG video construction time. What follows is a

more detailed description of the project requirements and relevant preliminary

implementation decisions mentioned in this section.

2.3 System Requirements and Constraints

2.3.1 Operating environment (external constraints)

Portability is one of the chief concerns of this project. Dr. William Harrison has

requested that the final program operate in a Windows environment, but his associate, Dr.

Robert Harrison, works primarily in Linux. Thus, the visualization system should ideally

operate in both environments. As a first approximation, therefore, the requirements

analysis should be constrained by the desire for portability.

The CellSys DSL is embedded in the Haskell programming language. To run

CellSys simulations, therefore, a Haskell interpreter is required. Any Haskell interpreter

that complies with the Haskell language definition in the Haskell 98 Report and Haskell

Foreign Function Interface Addendum to the Report is adequate. However, because Dr.

Harrison has been working with the Hugs 98 interpreter, its use will be assumed. Hugs 98

is freely downloadable for the Windows, Linux, and Mac OS X platforms and, therefore,

adheres to any platform portability constraints.

To maintain application portability, as well as maintain a familiar programming

environment as described above, the cross-platform C++ GUI toolkit Qt has been

preliminarily chosen for use throughout the project. In addition to providing a cross-

platform GUI programming environment, Qt has the advantage of providing an OpenGL

rendering context in which the interactive visualization will be constructed. Thus, the final

project will be portable to any platform that includes the appropriate Qt and OpenGL

libraries.

As a final operating environment constraint, the program should allow the

construction of a video encoded MPEG of the simulation. This MPEG video will allow

retrospective visualization playback even on systems not equipped with the appropriate

libraries and Haskell interpreter.

2.3.2 Market users and characteristics

The immediate market users for this project are Dr. William Harrison and any

associated researchers. While Dr. Harrison's primary interest is the application of DSLs,

this project is potentially relevant to any scientists who possess specialized biological

knowledge yet lack a background in computer science. By extending the DSL framework

with a visualization engine, this project provides a simple but solid environment in which to

build, model, and analyze various applications of domain-specific languages. There are

no known existing projects of similar scope and execution.

Because this project will utilize freely available technologies, the only economic

investment is time. While this is not a negligible consideration, it does not make the

project unfeasible from an economic standpoint. The visualization engine is general enough

that it is not subject to any medical regulatory constraints. However, it will likely use a

number of open source components, and therefore must be constrained according to the

applicable open source licenses.

The primary customer requirement is an interface connected to the CellSys

DSL embedded in Haskell that provides some form of visualization with minimal interaction

and setup and with reasonable delay. This will be achieved through two (mutually independent)

methods: a buffered, interactive OpenGL simulation and a rendered movie file.

2.3.3 Environmental constraints

The only human factors of concern are the Qt interface and the method for

initializing the interface, the input module, and the DSL. The user can reasonably be

expected to issue 2-4 commands from the embedded DSL to quickly configure and start

the interface and simulation. The interface itself must provide the necessary controls for

manipulating the simulation, and initiating the process to create a video encoded MPEG.

The application domain of this project is non-critical. The interface and simulation,

although interactive, are not intended for real-time systems and should not be used for

applications with real-time constraints. The product will uphold a standard of quality

acceptable for non-critical use, but cannot be used for any kind of simulation which

requires a high degree of safety, reliability, or authenticity (such as any kind of patient

treatment, etc.).

2.3.4 System components

The system has four main components: the Qt interface and visualization window,

the process for generating a rendered video, the input module, and the CellSys DSL. The

DSL is responsible for generating simulation data; the input module accepts data from the

DSL and formats and prepares it for the interface, which displays the data as an interactive

simulation. The user will most likely make a call from the DSL which starts the input

module and the interface, and then make another call from the DSL to send the generated

data down the application pipeline. Playback controls for the simulation and encoding

process will be provided in the visualization interface.

2.3.5 Software interfaces and libraries

Notable software components include OpenGL (an industry-grade 3D graphics

library), Qt (a GUI toolkit), the C++ STL (Standard Template Library), Haskell (a

powerful functional programming language especially suited for embedded domain specific

languages), and possibly the Haskell Foreign Function Interface (used for calling functions

written in other languages) and/or a threading library to run the input module and the interface

in separate threads. An alternative is QProcess, a class provided by the Qt toolkit,

which may allow less complicated inter-process communication. A host of command-line

utilities will also be used for encoding into video format, such as ImageMagick and

MEncoder.

2.3.6 Communication interfaces

The DSL will provide data to the input module first by writing data files to a directory

while the input module scans for updated files. Later iterations will use either Haskell’s FFI

or some kind of threading mechanism.
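The first, file-based iteration needs a way to tell which data files are new since the last scan. A minimal C++ sketch of that comparison follows; it assumes the directory listing (file name and modification time) has already been obtained by some platform facility, and the Listing and updatedFiles names are hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>
#include <vector>

// File name -> last modification time, as reported by a directory listing.
typedef std::map<std::string, std::time_t> Listing;

// Compare the current listing with what was seen on the previous scan and
// return the names that are new or have a newer timestamp. `seen` is updated
// so the next call reports only later changes.
std::vector<std::string> updatedFiles(const Listing& current, Listing& seen) {
    std::vector<std::string> changed;
    for (Listing::const_iterator it = current.begin();
         it != current.end(); ++it) {
        Listing::iterator old = seen.find(it->first);
        if (old == seen.end() || old->second < it->second) {
            changed.push_back(it->first);
            seen[it->first] = it->second;
        }
    }
    // Every file in the current listing has now been recorded.
    assert(seen.size() >= current.size());
    return changed;
}
```

The input module would call this on a timer and parse only the returned files. A file that is still being written when it is scanned would need extra handling, which this sketch does not attempt.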

2.3.7 Hardware interfaces

This project does not include any hardware interfaces beyond those found in a

standard PC. The most relevant are keyboard and mouse input and display output.

2.3.8 System maintenance

This project is a software entity, and should easily run on or port to any platform which

supports the component tools, including Linux, Unix, BSD, Windows, Mac, etc. The

project will target the Windows platform. The only hardware required is a computer

running one of the applicable operating systems.

The life cycle of this project can include use with multiple DSLs and due to its

modular nature should support upgrades to specific subsystems fairly well (e.g., the DSL,

the visualization interface, etc.). While no consideration is given to disposability, the

software arose with a starting point of a single, stand-alone DSL, so it should be fairly easy

to replace or transplant, if the need arises.

2.4 Alternative Solutions

• Using the Fran DSL & FranTk

The Fran Domain Specific Language and Toolkit extend Haskell functionality

to include 3D graphics rendering, easing the passing of data from the DSL simulation

to the visualization engine. While this would simplify the data communication, the

Fran DSL and FranTk rely on DirectX technologies and are Microsoft Windows

dependent. Thus, application portability would be lost. This solution also suffers from

the constraint of a Haskell implementation, as described previously.

• Using the HOpenGL DSL

Like the Fran solution, HOpenGL is a 3D graphics API for use in Haskell,

allowing great simplification of the data communication problem. As noted before,

however, the members of the project team are not familiar with Haskell programming,

making such a solution technically infeasible. Furthermore, Haskell is a functional

language and is not optimized for the mathematical calculations required for 3D

rendering and animation. Thus, a severe performance penalty could possibly be paid

with this solution.

• Shell Scripts

The current system of visualization is implemented using a series of shell

scripts that are executed after the simulation has finished running. These scripts make

use of third party programs (POV-Ray and MEncoder) to generate an AVI movie for

later playback. If this approach is used, the primary focus would be aggregating the

current shell scripts and making them more efficient. This solution would be easier to

implement than any of the others but would have the longest delay from simulation to

visualization. Furthermore, no interactive visualization component would be added.

2.5 Performance Requirements

One of the primary performance requirements is minimal visualization response

time. As Dr. Harrison would like to ultimately use this system to create a real-time

visualization of a simulation, the system must be able to receive and display the current

simulated environment with minimal delay. Various simulation parameters, including the

number of simulated bacteria and simulation length, will have varying impacts on this

response time. The primary aim of the input module optimization in successive iterations

will be to reduce this response time as much as possible.

Another performance requirement of great importance is the number of frames per

second (FPS) that can be displayed in the interactive visualization. Currently, the

simulation produces data for real-time playback at 25 FPS. The rendering ability of the

interactive visualization system should ideally be able to accommodate this playback

speed for simulations of modest size. Modest size, in this sense, is a negotiated metric

determined from Dr. Harrison’s feedback and the prototype evaluation results. For larger

simulations, MPEG video playback may be used for visualization, or a decrease in FPS

might be feasible. Furthermore, as the complexity of the interactive visualization

increases, the realized FPS will also be forced to drop. The user may be forced to limit

interactivity of the visualization to achieve a desired FPS.

It should be noted that these performance requirements are not entirely separable.

A decrease in the visualization response time could very well limit the FPS of the

interactive visualization. Thus, the relationship between these two performance

characteristics will have to be mapped for simulations of various sizes and lengths. A

primary goal of the prototyping methodology is the ability to consult with Dr. Harrison as to

the relative importance of each of the system’s performance aspects.

A final performance requirement is to minimize the time of construction of the video

encoded MPEG for simulations of varying sizes. To a first approximation, the construction

should be completed relatively autonomously and more quickly than is presently available

with current shell scripts in use.

2.6 Resource Requirements

For fully featured construction & development, this project will require the following:

• 3 full-time developers, for 3 months each, totaling up to 1800 man-hours

• Desktop/Laptop PC running at 1 GHz with 256 MB of RAM

• Software:

o ANSI/ISO Standard C++ Compiler

o Qt supported Operating System (Windows, Linux, Mac OS X, UNIX variants)

o Qt 3 libraries

o OpenGL libraries

o Hugs98 (the Open-Source generally accepted standard interpreter for

Haskell 98 used for the running of the actual simulation)

o MEncoder, an open source utility distributed with MPlayer, used for encoding

with the Microsoft MPEG4V2 codec

o ImageMagick, a suite of command-line utilities, used for converting image

formats

2.7 Evaluation Metrics

Evaluation of the success of the project will be conducted with several metrics, each

following rather immediately from the performance requirements described in section 2.5.

Measurements of the supportable number of frames per second under varying simulation

conditions by the interactive visualization will be recorded, but a frame rate no lower than

25 FPS shall be targeted. In addition, the delay from the simulation start to visualization

start will be observed for various simulation sizes. The order of magnitude for these

metrics should be on the scale of a few seconds per thousand frames. Last, the MPEG

construction times will be recorded for varying simulation sizes. To more precisely target

performance bottlenecks, individual evaluation metrics of the input module’s performance

and individual frame construction times will also be observed. From this data, general

averages and averages for various simulation sizes will be calculated for each of the

performance aspects.
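The frame-rate metric could be computed from recorded per-frame render times as in the following C++ sketch. How the times are measured is left open; the averageFps and meetsTarget names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Average frames per second over a run, from per-frame render times in
// milliseconds.
double averageFps(const std::vector<double>& frameTimesMs) {
    assert(!frameTimesMs.empty());
    double totalMs = 0.0;
    for (double t : frameTimesMs) {
        totalMs += t;
    }
    assert(totalMs > 0.0);
    return 1000.0 * static_cast<double>(frameTimesMs.size()) / totalMs;
}

// A 25 FPS target leaves 1000 / 25 = 40 ms per frame on average.
bool meetsTarget(const std::vector<double>& frameTimesMs, double targetFps) {
    return averageFps(frameTimesMs) >= targetFps;
}
```

Running this separately for each simulation size would give the per-size averages described above.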

Several qualitative ratings, such as the aesthetic value of the animations and the

intuitiveness of the user interface will also be evaluated. Unfortunately, quantitative

metrics are more difficult to use for items such as these. Thus, continuous qualitative

feedback from Dr. Harrison will be required to maintain an acceptable level of work.

3 Design Specification

3.1 Introduction

Herein lies the specification of our design methodologies.

3.2 System Design Overview

This project is primarily integrated as a single interface, built in C/C++ with the Qt

toolkit. The interface shall use menus to allow the user to load a simulation into memory

from either a collection of POV files contained in a directory, or from a custom binary

format, or encode an AVI from a collection of POV files contained in a directory. Mouse

input shall dictate camera movement, and a number of buttons shall be provided on the

face of the interface for controlling the simulation. Context menus will facilitate other

functions.

3.3 Data Requirements

Data collection and storage

Data is collected from the CellSys language. The data is read from POV

files and stored in any of the following formats.

Relevant file formats include:

• Custom binary format (*.csm)

This format can be used to save previously loaded simulations. Loading simulations

from this format instead of raw POV files is much more efficient.

• POV-Ray file format (*.pov)

A format which is compatible with POV-Ray and other ray tracing programs. These

files are used to load simulation data and to encode videos.

• POV-Ray generated Windows bitmap format (*.bmp)

As the first step of encoding a video file, these images are produced by the POV-Ray

rendering engine, and are later converted to the PNG format.

• Portable Network Graphics format (*.png)

During the video encoding process, the BMP images are converted to PNG format.

This is accomplished with ImageMagick's “convert” command-line utility.

• MSMPEG4v2 encoded AVI file format (*.avi)

MEncoder is used to encode a series of PNG images into a standard AVI.
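
The encoding chain described above (POV to BMP, BMP to PNG, PNG sequence to AVI) can be sketched as three command lines. This is only an illustration: the helper names, file names, image size, and frame rate are hypothetical, and the report does not list the options the project actually passes to each tool. The options shown are standard ones for POV-Ray, ImageMagick, and MEncoder.

```cpp
#include <string>

// Step 1: POV-Ray renders one scene file to one image.
// +I = input file, +O = output file, +W/+H = width/height in pixels.
// (Assumed option set; the project's real invocation is not given.)
std::string renderCommand(const std::string& povFile, const std::string& bmpFile,
                          int width, int height) {
    return "pvengine +I" + povFile + " +O" + bmpFile +
           " +W" + std::to_string(width) + " +H" + std::to_string(height);
}

// Step 2: ImageMagick picks the output format from the file extension.
// `tool` is a parameter because the utility's name can differ per installation.
std::string convertCommand(const std::string& tool, const std::string& bmpFile,
                           const std::string& pngFile) {
    return tool + " " + bmpFile + " " + pngFile;
}

// Step 3: MEncoder reads the PNG sequence through its mf:// input and
// encodes it with libavcodec's msmpeg4v2 codec into an AVI container.
std::string encodeCommand(int fps, const std::string& aviFile) {
    return "mencoder mf://*.png -mf fps=" + std::to_string(fps) +
           ":type=png -ovc lavc -lavcopts vcodec=msmpeg4v2 -o " + aviFile;
}
```

Each string would be handed to whatever process-launching facility the interface uses; building the commands separately from launching them keeps the three stages easy to test in isolation.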

I/O Requirements

The interface shall receive POV files as input, and shall produce a simulation.

These simulations can be saved, as CSM files, for viewing in the future. The interface

shall invoke the CellSys language to produce POV files, and when desired, given a set of

POV files, the interface is required to output a video file. User input for simulation control

is entirely mouse-driven.

Digital Archives and Databases

Beyond permanent storage of POV, CSM, and AVI files on a hard drive, there are

no applicable digital archives or data stores relevant to this project.

Data formats

The CSM format has a field containing the number of frames in the simulation.

Following this is the number of cells in the first frame, and then the cell information for the

first cell, second cell, etc. Data for frames 2 through the number of frames in the

simulation follow in the same format. A diagram is provided below.

| N frames |

| M1 cells | cell 1 | cell 2 | …| cell M1 |

| M2 cells | cell 1 | cell 2 | …| cell M2 |

. . .

| MN cells | cell 1 | cell 2 | …| cell MN |

3.4 Software Design

Class/Object Diagram (Somewhat simplified)

Block Diagram

Data Flow Diagram

Interface Diagram

Alternative Evaluation and Selection

Alternatives to the C/C++ Qt/OpenGL interface included writing something native in

Haskell, but this approach was abandoned due to the complexity of embedding such a

system in a functional language. There are tools available for this purpose, such as Fran

and FranTk (Functional Reactive Animation and the associated interface toolkit, functional

“equivalents” of OpenGL and Qt); however, they are not as established as the long-existing

C/C++ foundations, and documentation and appropriate learning material are more difficult

to find.

The interface uses the QProcess class to communicate between different

applications; however, the Haskell Foreign Function Interface also provides similar

functionality. Implementing the Haskell FFI provided too little additional benefit to warrant

the additional research and development time.

OpenGL can render images from the simulation straight to the hard disk, thus

bypassing the need to render POV files using POV-Ray; however, existing functionality in

the form of POV-Ray was again chosen over radically new development. The decision

was made for simplicity, but rendering frame images using OpenGL was a viable goal of

this project, which is unfortunately as of yet unfinished.

The alternatives in use by this project were selected for technical feasibility.

Major difficulties

The frame data would ideally be contained within a C++ STL vector; however,

there appeared to be an error within the STL implementation or the C++ compiler. A

doubly-linked list was used instead. Qt signals, which are used for inter-functional

communication, are implemented as macros, and it can be difficult to trace errors with

them.

The QProcess class, used to invoke CellSys, sometimes behaves differently than

expected when producing the “standard output is ready” signal. Due to buffering, multiple

lines of output would be produced at once, instead of one at a time.
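
One way to cope with output arriving in arbitrary chunks is to accumulate the bytes and release only complete lines. The following is a minimal sketch in standard C++, independent of Qt; `LineBuffer` and `feed` are hypothetical names and are not part of the project's code.

```cpp
#include <string>
#include <vector>

// Accumulates arbitrary chunks of process output and hands back only
// complete lines; a trailing partial line is kept until its newline arrives.
class LineBuffer {
public:
    // Appends a chunk and returns every line that the chunk completes.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::string::size_type start = 0;
        std::string::size_type nl = 0;
        while ((nl = pending_.find('\n', start)) != std::string::npos) {
            std::string line = pending_.substr(start, nl - start);
            // Drop the carriage return of a Windows-style line ending.
            if (!line.empty() && line.back() == '\r') line.pop_back();
            lines.push_back(line);
            start = nl + 1;
        }
        pending_.erase(0, start);  // keep only the unfinished tail
        return lines;
    }

private:
    std::string pending_;
};
```

The handler for the output-ready notification would pass whatever bytes are available to `feed` and process each returned line, so the number of lines delivered per notification no longer matters.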

Seamlessly integrating different applications such as the Haskell evaluator, POV-

Ray, and mencoder proved to be somewhat non-trivial. Also, establishing appropriate

compilation environments was somewhat of an issue, due to the large array of utilities in

use. Additionally, some of the free and/or open source tools, specifically mencoder,

possess large and adequate, but labyrinthine, documentation.

3.5 Hardware Design

The solution described in this report is a software system, and requires no hardware

beyond a standard PC with a suitable graphics card. The standard computer system, in

this sense, includes a keyboard, mouse, monitor, motherboard, processor, hard drive,

memory, etc.

3.6 Testing Methods: Evaluate the following

• Is the program very responsive with minimal waiting delays (i.e. loading a

simulation)?

• Does the program have a light footprint (relatively low memory usage) with respect

to the number of frames processed?

• Does the program have intuitive and easy to use playback controls?

• Is the user able to easily reposition the viewing angle?

• Is the user able to easily run a new simulation?

• Is a power user able to easily understand and extend the code?

3.7 Scheduling Diagram & Task Assignments

Timeframe: March - May 2005

| Task | Assigned To |

| Project documentation | Billy and Charlie |

| Core Development | Perry |

| UI Development | Perry |

| Simulation Execution Development | Charlie |

| Movie Generation Development | Billy |

| Presentation Development | Charlie and Billy |

3.8 Implementation Costs

| Description | Qty | Cost Per | Total |

| Developer Hours | 40 | 20 | 800 |

| Documentation Hours | 80 | 20 | 1600 |

| Windows Licenses | 3 | 120 | 360 |

| Learn QT book | 1 | 40 | 40 |

| Learn Haskell book | 1 | 27 | 27 |

| TOTAL | | | 2827 |

4 System Implementation

Implementation Summary

The system was implemented as described in the Design Specification section.

The user interface was created with Qt, using an OpenGL widget for the cell display. The

QProcess class is used for communication between the interface, CellSys, POV-Ray, and

mencoder. All code was written in C/C++.

Installation Instructions

Establish the operating environment

• Install ImageMagick, and make sure the “convert” utility included in the ImageMagick

suite is within the system path. Rename the “convert” utility to “zconvert,” to avoid a

name conflict with the Windows filesystem “convert” utility.

• Install “grep” and “gawk” for Windows. The path to these executables can be specified

in the visualizer application at runtime.

• Install MPlayer for Windows, and ensure the “mencoder” utility is within the system

path.

• Install POV-Ray for Windows, and ensure the “pvengine” utility is within the system

path.

• Install a Haskell interpreter for Windows (such as Hugs98). The path to this executable

can be specified in the visualizer application at runtime.

Establish the Compilation Environment

Install Qt for Windows, and optionally the Borland 5.5 command-line compiler

(included with the Non-Commercial Qt Distribution on the project CD). See the readme file

in QtCD directory for additional instructions on installing the Borland compiler, which is not

fully installed by the CD installation process. Make sure the appropriate “make” command

will be found, and that the Borland “make” doesn’t conflict with an existing installation.

Finally, compile the project with the “pmake.bat” batch file.

5 System Performance, Testing and Evaluation

Overview

Forty-one test cases were run in an attempt to gauge the interactive visualization

performance under various simulation loads. In addition, simulation load times were

compared for identical simulations saved both as a directory of *.pov files and as a single

binary *.csm file. Finally, a less intensive performance evaluation of the *.avi movie

rendering process was performed. This consisted of a qualitative inspection of the

automated *.avi rendering process.

Interactive Visualization Performance, Testing, and Evaluation

A battery of forty-one tests was conducted to measure the interactive visualization

performance in response to various visualization components. Results of these tests can

be found in Table 5.1. The primary measure of the visualization performance was the

comparison between the frame rate as specified by the user (nominal frame rate) and the

actual frame rate achieved by the system (actual frame rate). In particular, frame rate

comparisons were made for simulations involving increasing numbers of cells and the

presence or absence of a clipping box, bounding box, viewpoint manipulation, and cell

color codes. The test cases were performed on a 2.8 GHz Pentium IV processor-based

computer with 512 MB of RAM and an ATI Radeon 9200 video card.

The test results were concordant with preliminary performance expectations based

on a basic understanding of OpenGL frame rendering. In particular, it was found that

system performance decreased as the cell count increased. Comparisons of the actual

frame rates achieved and the user-specified nominal frame rates with no visualization

features can be seen in Figure 5.1. Running a simulation with only one cell and no

visualization features selected, a nominal frame rate of 100 frames per second (fps)

yielded an actual frame rate of only 66.67 fps. This discrepancy is most likely due to

inherent limitations of processor speed, rendering capability, and processing overhead of

common GUI events. Thus, 66.67 fps is the maximum achievable frame rate for any

simulation viewed in the interactive visualization on the test machine.

As more cells were included in simulations, the maximal achieved frame rate

dropped as can be seen in Figure 5.1. These results show the natural bottleneck that

occurs when each frame includes more and more cells (represented as spheres in a three-dimensional

viewing space). For example, when rendering 1000 spheres, each consisting

of 400 polygons, 400,000 polygons must be drawn for each frame. At a nominal frame

rate of 100 frames per second, the system must be capable of rendering 40 million

polygons per second. This level of performance is clearly not achievable on a personal

computer.

The presence of a bounding box for the visualization also degraded system

performance. The bounding box functionality served to allow the user to selectively isolate

portions of the visualization. This functionality was achieved with the additional OpenGL

clipping planes. Increases in calculation for the projection transformation were thus

required by OpenGL for each frame representation. This additional calculation complexity

reduced performance as can be seen in Figure 5.1.

The visualization option that produced the most dramatic performance decrease

was viewpoint manipulation. The interactive simulation allows the user to change

viewpoints using the mouse. Each time the mouse is moved only one pixel, the system

attempts to redraw the frame at the new viewpoint, effectively greatly increasing the

nominal frame rate. For cell counts of more than 100, this increase in frame rate

essentially freezes the interactive visualization during the period of viewpoint manipulation,

as can be seen in Figure 5.1. However, as soon as the viewpoint is no longer being

manipulated, the previous actual frame rate is again achieved.

Examination of the test cases also revealed that the addition of color codes and a

clipping box yielded negligible performance variations. This was entirely expected

because changes in cell color involve relatively few OpenGL rendering changes. Also, the

clipping box (so named because it defines the region which the user can selectively view)

involved rendering at most six additional polygons. Thus, its addition to the frame

rendering was negligible when compared to the polygonal counts involved in the cell

(sphere) rendering.

The overall performance of the interactive visualization system is quite acceptable.

The achievable frame rate decreases to below 25 fps only for simulations that involve

more than 400 cells. Thus, for simulations that involve fewer than 400 cells, a fluid

visualization can be produced. Moreover, as general trends in the clustering of cells are

all that is required of the visualization system, frame rates far below 25 fps are

acceptable. In short, it is not required that the visualization achieve an amount of fluidity

conducive to the human eye, but that general trends in cell clustering and movement can

be seen.

Performance data is tabulated below.

Cell Count  C. Box*  B. Box*  V.P. Manip*  Colors  Nominal FPS  Actual FPS

1000 No No No No 100 12.05

1000 No No No No 40 12.05

1000 No No No No 25 12.05

1000 No No No No 10 9.09

1000 No No No No 5 4.93

1000 No No Yes No 100 1.59

1000 No No Yes No 5 1.92

1000 No No No Yes 100 12.05

1000 No Yes No Yes 100 9.90

1000 No Yes No No 100 10.00

1000 Yes No No No 5 4.93

1000 Yes No No No 10 9.09

1000 Yes No No No 25 11.90

500 No No No No 100 23.26

500 No No No No 40 23.26

500 No No No No 25 21.28

500 No No No No 10 9.17

500 No No No No 5 4.93

500 No No No Yes 25 21.28

500 Yes No No No 25 21.28

500 No Yes No No 100 18.52

500 No Yes No No 40 18.52

500 No Yes No No 25 18.52

500 No Yes No No 10 9.17

500 No Yes No No 5 4.93

500 No No Yes No 100 2.85

500 No No Yes No 5 2.80

300 No No No No 100 37.04

300 No Yes No No 40 32.26

200 No No No No 100 52.63

200 No Yes No No 100 45.45

100 No No No No 100 66.67

100 No No Yes No 100 5.92

100 No Yes No No 100 66.67

1 No No No No 100 66.67

1 No No No No 40 32.26

1 No No No No 25 21.74

1 No No No No 10 9.17

1 No No No No 5 4.93

1 No No Yes No 100 66.67

1 No Yes No No 100 66.67

* C. Box = Clipping Box Present; B. Box = Bounding Box Present; V.P. Manip = Viewpoint Manipulation Occurring

Table 5.1. Interactive Visualization Load Performance on 2.8 GHz processor with 512 MB RAM and ATI Radeon 9200 video card.

[Plot: “Actual FPS for Nominal 100 FPS”. Horizontal axis: Number of Cells (0 to 1200). Vertical axis: Actual Frames Per Second (0 to 80). Series: No Options, Bounding Box, Viewpoint Manipulation. Labeled data points: 66.67, 52.63, 37.04, 23.26, 12.05.]

Figure 5.1. Comparison of Actual Frame Rates against Nominal Frame Rates under the presence of various visualization parameters.

Simulation Loading Performance, Testing, and Evaluation

The system is capable of loading two distinct representations of simulations. The

first representation is the set of *.pov files that are written directly to the hard drive by CellSys

simulation programs. Loading the data contained in the *.pov files involves continual

opening and closing of files on the hard drive and parsing the frame data contained

therein. Numerical representations of the locations of cells must then be converted from

string format to a binary numerical representation that can be utilized by the display

module.
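
The string-to-binary conversion step can be illustrated with a small parser for a POV-Ray vector literal of the form `<x, y, z>`. This is a sketch only: `parsePovVector` is a hypothetical name, and the exact text that CellSys writes around each vector is not given in this report.

```cpp
#include <cstdio>

// Parses a POV-Ray style vector literal such as "<1.5, -2, 3.25>" into
// three doubles. Returns true only if all three components were read.
// The closing '>' is not verified, since sscanf's return value counts
// converted fields only.
bool parsePovVector(const char* text, double& x, double& y, double& z) {
    return std::sscanf(text, " < %lf , %lf , %lf", &x, &y, &z) == 3;
}
```

Every coordinate of every cell in every frame has to pass through a conversion like this when loading *.pov files, on top of the cost of opening and closing one file per frame.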

In contrast, once a simulation has been loaded, the user has the option of saving

the simulation as a *.csm file, a custom binary format created to store the simulation frame

data. The format of the *.csm files can be found above, in Section 3.3 under the “Data

Formats” heading. The *.csm files consist only of the binary numerical representations of

the frame data. Therefore, very little parsing and conversion is necessary to extract the

frame data. Moreover, all frame data for a simulation is contained in a single file as

opposed to a directory of *.pov files. The load times for *.csm files are, therefore,

significantly shorter than the load time for directories of *.pov files, as can be seen in Table

5.2 and Figure 5.2. In general, for simulations of moderate size (200 to 1000 cells), *.csm files

can be expected to load between roughly 44 and 59 times faster than the equivalent *.pov directories.

| Cell Count | *.pov Load Time (sec) | *.csm Load Time (sec) | Ratio (POV/CSM) |

| 1 | 0.188 | 0.031 | 6.06 |

| 200 | 2.782 | 0.063 | 44.16 |

| 300 | 4.047 | 0.078 | 51.88 |

| 500 | 6.859 | 0.125 | 54.87 |

| 1000 | 12.922 | 0.219 | 59.00 |

Table 5.2. Simulation load times for *.pov and *.csm files.

[Plot: “100 Step Simulation Loading”. Horizontal axis: Number of Cells (0 to 1000). Vertical axis: Load Time (sec), 0 to 14. Series: POV Files, CSM File.]

Figure 5.2. Simulation load times for *.pov and *.csm files.

AVI Rendering Performance, Testing, and Evaluation

No formal tests were conducted to measure the speed of the *.avi movie rendering

process. Qualitative observations indicate that each rendered frame takes approximately

two seconds on a 2.8 GHz processor-based computer with 512 MB of RAM. Thus,

rendering an *.avi file takes significantly longer than loading either a *.pov directory or

*.csm file for the interactive visualization. However, the interactive visualization requires

that all frame data be present in main memory before the simulation can be viewed. Thus,

the size of simulations that can be interactively viewed is limited by the amount of memory

in the computer on which the visualization system is being run. The *.avi movie rendering

process is not limited by this memory constraint and is better suited to simulations

of very large sizes. To even run such large simulations, a large amount of delay is

expected, and thus, the delay in the *.avi encoding process is not a significant bottleneck.

Additional methods of *.avi encoding that would greatly speed up the encoding process

are given in the Future Work section below.

6 Conclusions and Discussion

The current system is the result of only two cycles in the spiral software

development model. As such, it is currently only a prototype for the eventual system and

includes only an approximation of the desired functionality. A summary of the successes

and failures related to the current prototype, however, is appropriate at this point in

order to gauge the current functionality and to direct further work that may be done on the

project.

First, the decision to use C++ and Qt to complete the prototype turned out to be the

correct one. The rapid development model embraced for the completion of this project

required the use of a familiar programming language and programming language

paradigm. Thus, the use of Haskell for the completion of the project would have been an

unviable solution.

Moreover, as our group developed more familiarity with Qt and its process-forking

abilities, the initial motivation for the use of Haskell disappeared. In particular, the use

of Haskell would have made data communication between CellSys programs and our

visualization system very fast (it could be completed in main memory). As an alternative

solution, use of the Haskell Foreign Function Interface to communicate with the

visualization system would have provided the same performance. The complexity of inter-

process communication that would result from use of the Haskell FFI made this solution

technically infeasible for implementation in an early prototype. After using the QProcess

class for implementation of the *.avi rendering process and the running of a CellSys

simulation from the visualization GUI, it could be seen that the same process could be

used to capture simulation frame data from CellSys programs, as long as they wrote the

*.pov frame data to Standard Output instead of the hard drive. This would involve very few

changes to both the simulation program and the visualization program but would greatly

increase the speed of running and loading a new simulation. Writing *.pov files to the hard

disk could then be an option only used to render *.avi files.

In general, this project has yielded very positive results. The previous scripting

functionality for producing a *.avi file was adjusted to function more correctly and

automatically. This feature has been integrated into the application’s main window and

can be seamlessly accessed. The user can also easily run a new simulation from the

main window of the application. Previously, this process involved running the Haskell

interpreter, loading the CellSysSemantics file and then running the simulation.

Furthermore, the user had to manually ensure that a directory named “POV” existed in the

current working directory or the simulation would fail. All of this functionality has been

automated in our current prototype.

The current functionality of the interactive visualization is adequate for a prototype,

but would need to be augmented for a final release. In particular, the user should be able

to display multiple viewports that show selected portions of the simulation. Due to time

constraints, this functionality was not implemented. However, to add this functionality, the

interface between the player controls and the current display would not need to be

changed. All that would need to be implemented would be a container class for multiple

viewports, and then the current display could be easily swapped with this new class.

Finally, the performance analysis of the current system (as described above, in the

section, “System Performance, Testing and Evaluation”) is very promising. Simulations of

moderate to large size can be played at very reasonable frame rates. Further data

structure and cell representation optimizations could also further improve performance.

7 Future Work

Additional Features

It would be beneficial for reasons such as verification and a greater degree of

control of simulation parameters to be able to encode videos from not only POV files, but

from the frames rendered from OpenGL as well. This would allow for a small degree of

“interactivity” in the encoded video file as well as give the user more control of what is

displayed and how. Furthermore, the OpenGL simulation only supports the sphere

primitive, and it would be useful to include support for a wider range of primitive objects

found within various kinds of input data, such as cylinders, cubes, etc.

Another possible improvement to the interface would be tabbed simulations. This

would allow the user to open multiple simulations at once, and switch between them using

tabs, similar to tabbed browsing.

More cell-tracking features should be implemented, such as showing instantaneous

velocity vectors for cells that are moving. The implementation currently supports an array

of colors for displaying the cells, but the user has little control over this feature. Part of the

problem is that the input data produced by CellSys contains no uniquely identifying

information for each cell, so it is difficult to determine where a specific cell is from frame to

frame. Some system for identifying the cells should first be implemented.

Optimizations

A more efficient method of data storage could be implemented for the cell

information, such as an STL vector or comparable class or structure. Also, the entire

simulation is loaded into memory before playback begins. It should be possible to create a

buffering system where the user can access a limited set of features to play the simulation

as it is being loaded. This may require multi-threading, forking, or shared memory of some

nature. Alternatively, a “sliding window” type algorithm could be used to keep only the n

frames above or below the current frame in memory. The algorithm would load the other

frames as the current frame is moved around and discard frames from memory which are

too far away from the current frame. OpenGL display lists could potentially bolster the

performance of the simulation, and the number of polygons per sphere is probably larger

than necessary.
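
The “sliding window” idea can be sketched as a small cache that keeps only the frames within n of the current frame. This is an illustration of the proposal, not existing project code: `FrameWindow`, `seek`, and the caller-supplied `loadFrame` function are hypothetical, and the four-float `Cell` record is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

struct Cell { float x, y, z, radius; };  // assumed per-cell fields
using Frame = std::vector<Cell>;

// Keeps only frames within n of the current frame resident in memory.
// Precondition: frameCount > 0 and every seek target is < frameCount.
class FrameWindow {
public:
    // loadFrame(i) is supplied by the caller and reads frame i from disk.
    FrameWindow(std::size_t frameCount, std::size_t n,
                std::function<Frame(std::size_t)> loadFrame)
        : frameCount_(frameCount), n_(n), loadFrame_(std::move(loadFrame)) {}

    // Moves the window to `current`: discards frames that are now too far
    // away, loads the missing nearby ones, and returns the current frame.
    const Frame& seek(std::size_t current) {
        const std::size_t lo = current > n_ ? current - n_ : 0;
        const std::size_t hi = std::min(frameCount_ - 1, current + n_);
        for (auto it = resident_.begin(); it != resident_.end();) {
            if (it->first < lo || it->first > hi) it = resident_.erase(it);
            else ++it;
        }
        for (std::size_t i = lo; i <= hi; ++i) {
            if (resident_.find(i) == resident_.end())
                resident_.emplace(i, loadFrame_(i));
        }
        return resident_.at(current);
    }

    std::size_t residentCount() const { return resident_.size(); }

private:
    std::size_t frameCount_;
    std::size_t n_;
    std::function<Frame(std::size_t)> loadFrame_;
    std::map<std::size_t, Frame> resident_;
};
```

During ordinary playback each step loads one new frame and discards one old frame, so memory use is bounded by 2n + 1 frames regardless of simulation length. Loading here is synchronous; overlapping it with playback would need the multi-threading mentioned above.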

The CellSys language outputs a new frame every time a single cell is updated, so

the frame rate is effectively reduced by a factor equivalent to the number of cells in the

simulation. To compensate for this, a “fast forward” feature was added, which works by

skipping ahead by a number of frames, but this still does not solve the problem of over-

sampling. To fully correct this issue, the CellSys language must be modified, which was

beyond the original scope of this project.
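
The effect of the “fast forward” feature can be sketched as stepping through the frame sequence with a fixed stride. The function name is hypothetical, and choosing the stride equal to the number of cells is only an approximation of one full update per displayed frame, since it assumes that cells are updated in turn.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the frames that are displayed when playback
// advances `stride` frames at a time. A stride of 0 displays nothing.
std::vector<std::size_t> fastForwardIndices(std::size_t frameCount,
                                            std::size_t stride) {
    std::vector<std::size_t> shown;
    if (stride == 0) return shown;
    for (std::size_t i = 0; i < frameCount; i += stride) shown.push_back(i);
    return shown;
}
```

The skipped frames are still generated, stored, and loaded, which is why skipping at playback time does not remove the over-sampling itself.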

Display more information

Some representation or indication of the light concentration and placement used

when generating the cell data should be implemented. Also, some effect could be used to

signal when and where a cell reproduces. Finally, some indication of a cell’s current

behavior (moving, lazing, reproducing, etc) could be provided.

Other Improvements

The automatic encoding process is not as seamless as might be desired. The calls

to POV-Ray produce the POV-Ray splash page for every frame in the simulation. If this

could be somehow suppressed, it would produce a better user experience. Rendering

frames from the OpenGL frame buffer, instead of POV-Ray, would also solve this problem.

8 References

[1] POV-Ray, 2005, [Cited 05 Mar 2005], Available at: http://www.povray.org/

[2] POV-Ray File Format, 2005, [Cited 05 Mar 2005], Available at: http://www.povray.org/documentation/view/3.6.1/224/

[3] MPlayer, 2005, [Cited 05 Mar 2005], Available at: http://www.mplayerhq.hu

[4] Hugs Online, 2005, [Cited 05 Mar 2005], Available at: http://cvs.haskell.org/Hugs/

[5] W. Harrison, R. Harrison, “Domain Specific Languages for Cellular Interactions,” 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Volume 4, pp 3019-3022, 2004.

[6] C. Elliott and P. Hudak, “Functional reactive animation,” In Proceedings of ICFP'97: International Conference on Functional Programming, pages 163-173, June 1997.

[7] P. Wadler, “The essence of functional programming,” Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 1-13, 1992.

[8] C. Elliott, “An embedded modeling language approach to interactive 3D and multimedia animation,” IEEE Transactions on Software Engineering, Volume 25, Issue 3, pages 291-308, 1999.

[9] A. Pang, D. Stewart, S. Seefried, M. Chakravarty, “Plugging Haskell in,” Proceedings of the ACM SIGPLAN workshop on Haskell, pages 10-21, 2004.

[10] M. Sage, “FranTk - a declarative GUI language for Haskell,” Proceedings of the fifth ACM SIGPLAN international conference on Functional programming, pages 106-117, 2000.

[11] B. Mechtly, E. Rooker, K. Mast, “3D rendering with C++ and openGL in undergraduate projects,” Journal of Computing Sciences in Colleges, Volume 17, Issue 1, pages 168-177, 2001.

[12] A. Courtney, H. Nilsson, J. Peterson, “The Yampa arcade,” Proceedings of the ACM SIGPLAN workshop on Haskell, pages 7-18, 2003.

[13] Hibbard, Bill, “VisBio: a biological tool for visualization and analysis,” ACM SIGGRAPH Computer Graphics, Volume 37, No 2, pp 5-7, 2003.

[14] Can, Tolga; Wang, Yujun; Wang, Yuan-Fang; Su, Jianwen, “FPV: fast protein visualization using Java 3D™,” Proceedings of the 2003 ACM symposium on Applied computing, 2003.

[15] Slavik, Pavel; Gayer, Marek; Hrdlicka, Frantisek; Kubelka, Ondrej, “Visualization for modeling and simulation: problems of visualization of technological processes,” Proceedings of the 35th conference on Winter simulation: driving innovation, Session: Modeling methodology, pp 746-754, 2003.

[16] Goldman, Jacki; Gullick, William; Bray, Dennis; Johnson, Colin, “Individual-based simulation of the clustering behaviour of epidermal growth factor receptors,” Proceedings of the 2002 ACM symposium on Applied computing, Session: Applications of spatial simulation of discrete entities, pp 127-131, 2002.

Appendices

Not applicable.
