(Biological Process Visualization)
CS4970 Sr Capstone WS05
William Moore, wlmd23, 832138
Charlie Huggard, ach343, 872978
Periclies Kariotis, psk1db, 882445
Mentor: Dr William L Harrison
Technical Report
Real-Time Visualization Tool for Haskell-Embedded Cellular Interaction Domain-Specific Languages
(Using the Rhodobacter Sphaeroides DSL)
Table of Contents
Executive Summary
1 Problem Statement
  1.1 Introduction
  1.2 Technical Background
  1.3 Literature Survey
  1.4 Goals and Objectives
  1.5 Overall Approach
2 Requirements Analysis
  2.1 Introduction
  2.2 Overall Description
  2.3 System Requirements and Constraints
    2.3.1 Operating environment (external constraints)
    2.3.2 Market users and characteristics
    2.3.3 Environmental constraints
    2.3.4 System components
    2.3.5 Software interfaces and libraries
    2.3.6 Communication interfaces
    2.3.7 Hardware interfaces
    2.3.8 System maintenance
  2.4 Alternative Solutions
  2.5 Performance Requirements
  2.6 Resource Requirements
  2.7 Evaluation Metrics
3 Design Specifications
  3.1 Introduction
  3.2 System Design Overview
  3.3 Data Requirements
  3.4 Software Design
  3.5 Hardware Design
  3.6 Testing Methods: Evaluate the following
  3.7 Scheduling Diagram & Task Assignments
  3.8 Implementation Costs
4 System Implementation
5 System Performance, Testing and Evaluation
6 Conclusions and Discussion
7 Future Work
8 References
Appendices
Executive Summary
This document details the software engineering process followed in the creation of
the system for Dr. William Harrison. Section 1, the problem definition, describes the
background information, state of the industry, and the need for development. Section 2
more clearly defines what is expected of the system in terms of various requirements.
Section 3 specifically details the design specifications for the project, and section 4
includes some notes on the final implementation, including installation instructions.
Evaluation metrics and testing results are described in section 5, and section 6 contains
project conclusions. Section 7 elaborates on potential future development related to this
project. See section 8 for a detailed list of references.
1 Problem Definition
1.1 Introduction
Motivation, market demand, and other solutions
Domain-specific languages (DSLs) are a convenient and powerful solution to a common
problem in interdisciplinary studies: the expert knowledge a project requires
rarely resides in a single scientist [5]. The definition of the CellSys DSL requires
knowledge of cell biology, and while CellSys itself is fairly simple, DSLs for much more
complex experiments and simulations could be produced. However, the language
implementation of the simulation requires knowledge of programming, language design,
etc. A domain specific language allows experts of one type to define a simulation in terms
of their specialty domain without worrying about the detailed specifics of the computer
implementation itself. The computer scientists can then produce the programmatic
structure and scaffolding for the simulation without an intimate knowledge of the theories
behind why the DSL is defined the way it is, so long as the definition is clear.
Motivation for Visualization Tool
The visualization tool serves to extend the usefulness of Haskell-embedded DSLs
by providing a simple, re-usable architecture for the rapid construction of visual simulation
results. According to [5], visualization is extremely valuable for checking the faithfulness of
biological models, and this system furthers and simplifies that process.
Market demand and other solutions
This system will expedite the visualization process for the CellSys language, and
can be reasonably extended to work with any other DSL as well. Dr Harrison will be able
to use this project for R. Sphaeroides and other implementations. No known products
provide visualization services for Haskell-embedded DSL simulations or biological models.
1.2 Technical Background
Overview of the existing system
One of Dr. Harrison’s current projects [5]
is developing domain-specific languages to
enable biologists to easily simulate complex
processes. This approach enables biological
researchers to quickly develop and easily
readapt computational models of complex
biological processes without needing a strong C/C++ background [5]. At the highest level,
the purpose of this project is to replace an existing, predominantly manual system for
converting text-based frame data into an encoded video file, or some other animated
visual representation. The video content is an estimated model of the behavior of the
Rhodobacter Sphaeroides bacterium, as it moves around in three-dimensional space and
reproduces based on light concentration. While the implementation and behavior of R.
Sphaeroides are slightly beyond the scope of this project, information on these items can
be found at [5]. The current process involves four phases. First, the user runs the
Haskell program CellSysSemantics (see [4] for a Haskell interpreter), which generates
text-based frame data files in the POV directory, one for each frame, using the Persistence of
Vision (POV) file format [2]. The POV format can be used by POV-Ray [1], which is a free
imaging tool, and other applications. Second, the user must use the text frame data to
render BMP frame images. This is accomplished with the POV-Ray application, and can
be done in batch mode via a script at the command line interface (by using pvengine
/EXIT +V <filename>). Third, the user must manually convert all the BMP
images into another, more readily usable image format, such as PNG or JPG. In the final
phase, the user invokes mencoder, a free application distributed with mplayer [3], to
encode the PNG or JPG images into an MPEG-2 format video file, which depicts the
actions and movements of the R. Sphaeroides bacteria.
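The last three phases above can be sketched as a small script that only assembles the command lines, without running them. This is a minimal sketch in Python, not the project's implementation. The POV-Ray invocation is the one quoted in the text. The use of ImageMagick's convert for the BMP-to-PNG step, the mencoder options, the frame rate, the assumption that rendered BMP files sit next to their POV files, and all file names are assumptions made for illustration. Phase one (running CellSysSemantics) is omitted.

```python
from pathlib import PurePosixPath


def render_command(pov_file):
    """Phase 2: batch-render one POV file, using the invocation quoted in the text."""
    return ["pvengine", "/EXIT", "+V", str(pov_file)]


def convert_command(bmp_file):
    """Phase 3: BMP to PNG. ImageMagick's `convert` is an assumed tool choice."""
    png_file = PurePosixPath(bmp_file).with_suffix(".png")
    return ["convert", str(bmp_file), str(png_file)]


def encode_command(frame_dir, output, fps=25):
    """Phase 4: encode all PNG frames into an MPEG-2 file (assumed options)."""
    return [
        "mencoder", f"mf://{frame_dir}/*.png",
        "-mf", f"fps={fps}:type=png",
        "-ovc", "lavc", "-lavcopts", "vcodec=mpeg2video",
        "-o", str(output),
    ]


def plan_pipeline(pov_files, output="simulation.mpg"):
    """Return the ordered command lines for phases 2 through 4."""
    pov_paths = [PurePosixPath(p) for p in pov_files]
    commands = [render_command(p) for p in pov_paths]
    commands += [convert_command(p.with_suffix(".bmp")) for p in pov_paths]
    commands.append(encode_command(pov_paths[0].parent, output))
    return commands


if __name__ == "__main__":
    # Hypothetical frame names; print the plan instead of executing it.
    for command in plan_pipeline(["POV/frame0001.pov", "POV/frame0002.pov"]):
        print(" ".join(command))
```

Listing the steps this way makes the manual hand-offs visible: every frame passes through two separate tools before encoding can begin, which is the overhead the visualization tool is meant to remove.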
Screen shots and system diagram
[Figure: Haskell interface for the generation of text frame data]
[Figure: Image generation with the POV-Ray utility (initiated from the command line)]
[Figure: Current System Diagram]
1.3 Literature Survey
The focus of the following literature survey is to establish an “industry state” and to look
into prospective implementation strategies. As such, the topics include functional reactive
animation in Haskell, a Haskell plug-in architecture, Haskell's foreign function interface,
interface toolkits, and rendering with OpenGL and C++.
An embedded modeling language approach to interactive 3D and multimedia
animation. Elliott argues that what an animation is frequently gets lost in the details of
how to present it. He introduces a language which is suitable for defining animations in
terms of what they are by synthesizing “an existing declarative 'host language,' Haskell,
with an embedded domain-specific vocabulary.” The language he introduces is called
Fran (functional reactive animation), and it attempts to separate graphic content from
graphic presentation [6]. Fran treats 3D geometry as a collection of primitive shapes, and
also supports spatial transformation (translation, scaling), decoration (colors, texturing),
aggregation (combining multiple models into one conceptual model), lighting, and even
sound.
Another key concept in Fran is that the same level of synthetic and hierarchical
construction is given to 2D images as 3D models. This includes rendering 2D geometry
(lines, polygons, circles), rendering text (using any number of fonts and display modifiers),
2D spatial transformation, image overlay, partial transparency, and cropping. Images are
said to have infinite extent and resolution, until they are actually displayed. This is in
contrast with the typical conception of images as bitmaps or other rasterized formats,
which have finite resolution and size.
Fran treats all animation, 2D or 3D, as behaviors, which are simply “time-varying
values.” Time in Fran is treated as a continuous value. Similar to images, time is not
turned into a discrete sequence until the user experiences it in the form of a discrete
number of frames.
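The idea of a behavior as a time-varying value that is discretized only at display time can be illustrated outside Haskell. The following is a minimal Python sketch, not Fran's API: a behavior is modeled as a plain function from continuous time to a value, and the names constant, lift, orbit, and sample_frames are hypothetical.

```python
import math


def constant(value):
    """A behavior that has the same value at every time."""
    return lambda t: value


def lift(fn, *behaviors):
    """Apply an ordinary function pointwise to one or more behaviors."""
    return lambda t: fn(*(b(t) for b in behaviors))


def time(t):
    """The identity behavior: its value at time t is t itself."""
    return t


# A point circling the origin, defined for every real-valued time t.
orbit = lift(lambda t: (math.cos(t), math.sin(t)), time)


def sample_frames(behavior, duration, fps):
    """Discretize a continuous behavior into frames only at display time."""
    frame_count = int(duration * fps)
    return [behavior(i / fps) for i in range(frame_count)]


# Two seconds of animation at four frames per second gives eight samples.
frames = sample_frames(orbit, duration=2.0, fps=4)
```

The definition of orbit says nothing about frame rate. The same behavior could be sampled at any rate without changing it, which is the separation of content from presentation that Fran aims for.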
Fran presents an easier-to-define method for animation than many conventional
graphics libraries such as OpenGL. It is essentially a DSL embedded in Haskell, much
like CellSys itself. For this reason, it would be easy to combine Fran and the real-time
simulation into one comprehensive Haskell module.
The Yampa Arcade. This article describes the implementation of the classic Space
Invaders game using Functional Reactive Programming, or FRP. Yampa is a Haskell-embedded
incarnation of FRP, which is used throughout the article. Yampa strives to add
features which are necessary for the domain of animation, which earlier reactive animation
systems such as Fran and FAL lacked. Two concepts which are core to Yampa are
signals and signal functions. Signals are essentially functions from time to a value, and
the type of the value represents the type of data carried by the signal. Yampa signals are
similar to Fran behaviors. For example, if a Point represents a 2-dimensional point, then
the position of the mouse, as it changes in time, might be represented as a value of type
Signal Point. Events, such as a key press or mouse button click, can be easily
represented as an (Event T) type, where the T type contains information about the
event. Signals of type Signal (Event T) can then be used to model discrete events.
Signal functions accept a signal of one type and produce a signal of another type. This is
very relevant to CellSys, not only because it is embedded in Haskell, but because it may
provide a convenient, extendable, and modular method for managing the timing and
animation of a real-time simulation. Combined with a graphic toolkit, such as HGL, Yampa
could form one possible solution for half of the visualization tool. By generating images
directly from Haskell and invoking mencoder, it is possible the entire animation suite could
be embedded along with the DSL to form a single, cohesive system.
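The relationship between signals, event signals, and signal functions described above can be made concrete with a short sketch. It is written in Python rather than Haskell and is not Yampa's API. Signals are modeled as functions of time, an event signal returns None when no event occurs, and mouse_position, button_press, and sample_on are hypothetical names. Only arr and composition correspond to concepts Yampa actually exposes (arr and >>>).

```python
def mouse_position(t):
    """A hypothetical Signal Point: the pointer drifts right at 10 units/s."""
    return (10.0 * t, 0.0)


def button_press(t):
    """A hypothetical Signal (Event str): an event occurs only at t == 1.0."""
    return "click" if t == 1.0 else None


def arr(fn):
    """Lift a pure function into a signal function (signal in, signal out)."""
    return lambda signal: (lambda t: fn(signal(t)))


def compose(first, second):
    """Feed the output signal of one signal function into another."""
    return lambda signal: second(first(signal))


def sample_on(event_signal, value_signal):
    """Event signal carrying value_signal's value whenever event_signal fires."""
    def sampled(t):
        return value_signal(t) if event_signal(t) is not None else None
    return sampled


x_coordinate = arr(lambda point: point[0])
doubled_x = compose(x_coordinate, arr(lambda x: 2 * x))(mouse_position)
clicks = sample_on(button_press, mouse_position)
```

Here doubled_x is a continuous signal, while clicks carries a value only at the instant of the button press. A real-time simulation loop would need both kinds of signal.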
FranTk – A Declarative GUI Language for Haskell. FranTk is a GUI
toolkit built on Fran. It adds support for hierarchical interactive displays, which
allows access to input from individual components rather than one monolithic window.
FranTk lifts Fran's behaviors and events to widgets, which is key to declarative
programming, and provides a more efficient implementation of core Fran combinators,
which improves Fran's data-driven model.
The fundamental concept in FranTk for interaction objects, such as labels, sliders,
or buttons, is the Component. A component is an action which produces a widget. The
action is performed when the widget must be displayed. Components have three types,
the standard viewable widget (Component), graphical components from Fran such as
lines or circles (CComponent), and the top-level window component (WComponent).
These components can be composed to easily create much more complex hierarchical
interfaces.
FranTk also provides access to state with monads. Monads are a functional
programming concept which allows the programmer to extend the return value of functions
with additional information, such as various side effects to some “global state,” error
messages, exceptions, and re-formatting, among other things [7]. Listeners are used to
update state, and FranTk also provides Wires which connect the output of various widgets
to the inputs of listeners. For example, when a button (a widget, generated from a
component) is pressed, a value is sampled, and passed (via a wire) to a listener, which will
then update the application state.
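The button example above can be sketched as follows. This is an illustrative Python model of the component, wire, and listener roles, not FranTk's Haskell API; every class and method name here is hypothetical.

```python
class Listener:
    """Consumes values and uses them to update shared application state."""

    def __init__(self, state, key):
        self.state = state
        self.key = key

    def tell(self, value):
        self.state[self.key] = value


class Wire:
    """Connects the output of widgets to the inputs of listeners."""

    def __init__(self):
        self.listeners = []

    def connect(self, listener):
        self.listeners.append(listener)

    def send(self, value):
        for listener in self.listeners:
            listener.tell(value)


class Button:
    """A stand-in widget: pressing it samples a value and sends it down a wire."""

    def __init__(self, wire, sample):
        self.wire = wire
        self.sample = sample

    def press(self):
        self.wire.send(self.sample())


def make_button(wire, sample):
    """A component in the sense used above: an action that produces a widget."""
    return lambda: Button(wire, sample)


state = {"paused": False}
wire = Wire()
wire.connect(Listener(state, "paused"))
component = make_button(wire, sample=lambda: True)
button = component()   # the widget exists only once the action is performed
button.press()         # value sampled, passed via the wire, state updated
```

The widget never touches the state directly. It only knows about the wire, so the same button component could drive a different listener without modification.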
FranTk, combined with Fran or even Yampa, could provide the necessary interface
capabilities to implement the visualization system itself in Haskell. It is part of a larger
functional suite of Haskell-embedded domain specific languages.
Plugging Haskell in. Andre Pang details the use of Haskell as a statically typed
extension language for both Haskell and foreign-language applications using the foreign
function interface. Extension languages enable users to add additional functionality to a
program without understanding the underlying program itself or re-compiling (if applicable)
the program code. Functionality written in such a fashion can usually be loaded at run-
time, and these modules are often called plug-ins. A plug-in interface has two sets of
symbol names, one for the values the plug-in can access from the host application and
one for the values the host application can access from the plug-in. A Haskell plug-in
infrastructure can be accessed from any language directly supported by Haskell's
foreign function interface, such as C, or even from a language which inter-operates with C,
like C++, Objective-C, or C#.
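The two sets of symbol names that make up a plug-in interface can be illustrated with a small sketch. It is written in Python and does not use the plug-in library from the paper. Python is dynamically typed, so the sketch checks only that the required names exist at load time, whereas the Haskell approach checks types statically. All names (HOST_API, REQUIRED_EXPORTS, cellsys_plugin) are hypothetical.

```python
log_messages = []

# Symbols the plug-in may access from the host application.
HOST_API = {
    "log": log_messages.append,
    "frame_rate": 25,
}

# Symbols the host application expects to access from the plug-in.
REQUIRED_EXPORTS = ("name", "next_frame")


def load_plugin(factory, host_api):
    """Instantiate a plug-in and verify it exports every required symbol."""
    plugin = factory(host_api)
    missing = [symbol for symbol in REQUIRED_EXPORTS if symbol not in plugin]
    if missing:
        raise ValueError(f"plug-in is missing exports: {missing}")
    return plugin


def cellsys_plugin(host):
    """A stand-in simulation plug-in that produces empty frames."""
    def next_frame(index):
        host["log"](f"frame {index}")
        return {"time": index / host["frame_rate"], "cells": []}

    return {"name": "CellSys", "next_frame": next_frame}


plugin = load_plugin(cellsys_plugin, HOST_API)
frame = plugin["next_frame"](50)
```

The host never needs to know how the plug-in computes a frame, and the plug-in sees only the host symbols it was handed.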
By using Haskell as a plug-in extension language and the FFI, it would be possible
to use the embedded domain-specific language (EDSL) CellSys as a module from some
other application. This means the simulation interface and programming could be written
in C/C++ and OpenGL or some other efficient, well-established language and any
simulation EDSLs can be called as modules or plug-ins from their native embedded
language, in this case, Haskell. This model would be a fully featured stand-alone
simulation and encoding engine, with a sort of extension language that could be used to
power different types of simulations.
3D Rendering with C++ and OpenGL in Undergraduate Projects. This paper
presents an object-oriented approach to rendering 3D objects with C++ and OpenGL. All
objects are basically rendered as a set of vertices (ordered triplets representing points in
3D space) and polygons (3 or more vertices). Since shapes are made of multiple
polygons, and the polygons often have vertices which overlap, it is common for the
vertices to be stored in an array, and then each polygon has a set of pointers to its
vertices. This avoids storing redundant data. Lighting is achieved by the relationship of a
polygon's normal vector (an ordered triplet representing some direction in space, in this
case it has length one and is perpendicular to the polygon's surface) to the direction of a
light source. Maintaining accurate normal vectors is essential for accurate lighting effects.
With flat shading, only one normal vector per polygon is needed, and the entire polygon
face is shaded the same color. A more realistic shading model, Gouraud shading, uses
one normal vector per vertex, and the light intensities computed at the vertices are
interpolated across the face of the polygon. Textures can be applied to a polygon, which is
similar to putting an image, such as wood or stone, on the face of the polygon.
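The shared vertex array and the two shading models can be sketched in a few lines. This is a Python illustration rather than the paper's C++ and OpenGL code. The mesh data is made up, and the diffuse term assumes the standard Lambertian model, in which intensity is the cosine of the angle between the surface normal and the direction to the light.

```python
import math

# Each vertex is stored once; polygons refer to vertices by index.
VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
TRIANGLES = [(0, 1, 2), (0, 2, 3)]


def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def face_normal(triangle, vertices):
    """Unit vector perpendicular to the triangle's surface."""
    a, b, c = (vertices[i] for i in triangle)
    return normalize(cross(subtract(b, a), subtract(c, a)))


def diffuse_intensity(normal, to_light):
    """Lambertian term (an assumed lighting model), clamped at zero."""
    return max(0.0, dot(normal, normalize(to_light)))


def flat_shade(triangle, vertices, to_light):
    """Flat shading: one normal, one intensity for the whole face."""
    return diffuse_intensity(face_normal(triangle, vertices), to_light)


def gouraud_shade(vertex_intensities, weights):
    """Gouraud shading: blend per-vertex intensities by barycentric weights."""
    return sum(i * w for i, w in zip(vertex_intensities, weights))
```

The two triangles share vertices 0 and 2, so those coordinates are stored only once. Flat shading calls face_normal once per polygon, while Gouraud shading needs an intensity at each vertex before interpolating.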
The article applies careful thought to the object hierarchy which is used to store the
render data. Since one vertex or normal vector may be used by more than one polygon, it
is convention to store all the primitive data in separate linked lists or other such dynamic
data structures, and then use pointers to get a handle on component primitives.
A full-blown rendering engine, with support for vertices and arbitrary polygons, may
not be necessary for this visualization engine. Certainly, the CellSys language only
requires support for spheres, but some extensibility is expected for an abstract rendering
engine. This illustrates the merit of the POV rendering format, at least for the “pre-render”
simulation path, because the managing application itself need not be aware of any of the
details of rendering. If the POV format is used, then the POV-Ray application can render
any images necessary for the video clip, which can be directly passed to the video
encoding process.
The system under consideration is based on a paper by Dr. William Harrison and
Dr. Robert Harrison called “Domain Specific Languages for Cellular Interactions” [5].
In this work, the potential of domain-specific programming languages (DSLs) in
the area of biological system simulation is explored. DSLs are small programming
languages that include language abstractions for only one particular area of expertise [5].
As such, programs written in DSLs are easily constructed and understood by domain
experts [5]. The primary advantage of DSLs in computer modeling and simulation of
biological systems is that they free the biologist from detailed attention to the underlying
computer processes upon which their programs are built.
Consider the task of writing a computer program to simulate cell activities. A typical
C++ program to accomplish this would be quite large, with pertinent biological data spread
across multiple language constructs [5]. The biologist, therefore, would need to have a
high degree of C++ knowledge and spend a large amount of time focusing on low-level,
computer-related details. Modification and reuse of such a program would also be
problematic, given its complexity. On the other hand, this task could be split into two
separate activities: rapid prototyping of a simple programming language to create
biological simulations and the analysis of the biological models upon which the language is
based. The first problem is in the domain of the computer scientist, while the second is in the
domain of the biologist. In such a model of execution, the computer scientist could focus
on the details and adjustment of the language design, and the biologist could focus on the
accuracy of the simulations.
In his paper, Harrison demonstrates the CellSys language, a DSL created to
simulate the motion of the simple bacterium Rhodobacter Sphaeroides. R. Sphaeroides is
a photosynthetic bacterium that actively swims towards light in an effort to obtain optimal
conditions to grow and divide. At any point in time, R. Sphaeroides will swim, tumble
(reorient its direction), grow, divide, or die according to probabilities that depend on the bacterium’s
present environment and history [5]. CellSys, embedded in the functional programming
language Haskell, presents to the biologist the abstraction of a Markov chain for specifying
the behavior of R. Sphaeroides.
A Markov chain is a representation of a set of states and the probabilities of moving from one state to another at a given instant. As such, Markov chains are a representation of the behavior of a non-deterministic finite state machine. [5]
Thus, CellSys models R. Sphaeroides by allowing the biologist to easily specify the
likelihood of each action that the bacteria will take when in a given state. The runtime
system of CellSys allows for concurrent operation of multiple bacteria and makes
adjustments to the global environment in which they exist.
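A Markov chain of this kind can be sketched as a table of transition probabilities plus a sampling step. The following Python sketch is not CellSys and does not reproduce its semantics. The probabilities are invented for illustration, and they are fixed, whereas in CellSys they depend on the bacterium's environment and history.

```python
import random

# Hypothetical transition probabilities, for illustration only.
TRANSITIONS = {
    "swim":   {"swim": 0.70, "tumble": 0.20, "grow": 0.08, "die": 0.02},
    "tumble": {"swim": 0.85, "tumble": 0.15},
    "grow":   {"swim": 0.50, "grow": 0.30, "divide": 0.20},
    "divide": {"swim": 1.00},
    "die":    {"die": 1.00},   # absorbing state
}


def validate(transitions):
    """Each row must sum to 1 and mention only known states."""
    for state, row in transitions.items():
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"probabilities for {state!r} do not sum to 1")
        unknown = set(row) - set(transitions)
        if unknown:
            raise ValueError(f"{state!r} refers to unknown states {unknown}")


def step(state, rng):
    """Pick the next state according to the current state's probabilities."""
    row = TRANSITIONS[state]
    return rng.choices(list(row), weights=list(row.values()))[0]


def run(start, steps, seed=0):
    """Simulate one bacterium for a number of steps with a seeded generator."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        history.append(step(history[-1], rng))
    return history


validate(TRANSITIONS)
history = run("swim", steps=20)
```

The table is the only part a biologist would need to edit, which mirrors the division of labor the DSL is meant to support.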
The simulation outputs a series of POV files, specifying the successive locations of
each bacterium being modeled in the simulation. POV files are simple text files that
contain a description of each scene in the simulation. These files can then be interpreted
by POV-Ray, a free ray-tracer available on the internet, to create a series of images of the
simulation [1]. The series of images can then be combined to create an animation of the
conducted simulation [5].
A very similar project in biological system simulation is described in the article
“Individual-based simulation of the clustering behavior of epidermal growth factor
receptors” [16]. This article describes the implementation of a simulation and
visualization of epidermal growth factor receptors on cell surfaces. Epidermal receptors
are biological structures in the surfaces of cells. Free external molecules, known as
ligands, are able to bind themselves to these receptors. When this occurs, other receptors
on the cell surface are attracted to the newly formed bound receptor. The clustering
activity of these receptors then signals chemical processes within the cell to stimulate
growth. Although this process can be viewed under a microscope, little information
actually exists about its detailed mechanics. [16]
To gain further insight, a computer simulation was designed to model the clustering
behavior of receptors on the cell surface. The simulation was constructed using an object-
oriented approach. This is in contrast to the purely functional programming paradigm used
in our project’s simulation. In the object-oriented system, receptors are represented by
objects of the class Molecule. These Molecules have the ability to bind to ligands and
move within the two-dimensional surface of the cell, represented by an object of the class
CellSurface. [16]
When two Molecules come within close range of each other, the CellSurface object
references an AffinityTable containing clustering probabilities. (This is not unlike the
Markov chain mechanism used in the simulation of R. Sphaeroides.) The Molecules in
close proximity then have the possibility of forming a cluster structure represented by
objects of the Multimer class. The Multimer class is a subclass of the Molecule class
because clusters of receptors behave in much the same manner as receptors themselves.
Both have similar movement and clustering abilities. The Multimer class, however,
additionally allows the dissociation of its constituent Molecules. [16]
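The class relationships described above can be sketched as follows. The class names come from the article as summarized here. Everything else (method names, keying the affinity table by cluster sizes, the contact range, and the averaging of member positions) is an assumption made for illustration, and the sketch is in Python rather than the article's implementation language.

```python
import math
import random


class AffinityTable:
    """Clustering probabilities, keyed here by the sizes of the two partners."""

    def __init__(self, probabilities, default=0.0):
        self.probabilities = probabilities
        self.default = default

    def probability(self, a, b):
        key = tuple(sorted((a.receptor_count(), b.receptor_count())))
        return self.probabilities.get(key, self.default)


class Molecule:
    """A single receptor on the two-dimensional cell surface."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.bound = False

    def bind_ligand(self):
        self.bound = True

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def receptor_count(self):
        return 1

    def distance_to(self, other):
        return math.hypot(self.x - other.x, self.y - other.y)


class Multimer(Molecule):
    """A cluster: moves and clusters like a Molecule, but can also dissociate."""

    def __init__(self, members):
        xs = [m.x for m in members]
        ys = [m.y for m in members]
        super().__init__(sum(xs) / len(xs), sum(ys) / len(ys))
        self.members = list(members)

    def receptor_count(self):
        return sum(m.receptor_count() for m in self.members)

    def dissociate(self):
        members, self.members = self.members, []
        return members


class CellSurface:
    """Holds the molecules and decides when nearby ones form a cluster."""

    def __init__(self, affinity, contact_range, rng=None):
        self.affinity = affinity
        self.contact_range = contact_range
        self.rng = rng or random.Random(0)
        self.molecules = []

    def try_cluster(self, a, b):
        """Consult the affinity table when two molecules are within range."""
        if a.distance_to(b) > self.contact_range:
            return None
        if self.rng.random() >= self.affinity.probability(a, b):
            return None
        cluster = Multimer([a, b])
        self.molecules = [m for m in self.molecules if m is not a and m is not b]
        self.molecules.append(cluster)
        return cluster
```

Because Multimer inherits from Molecule, try_cluster accepts clusters and single receptors interchangeably, which is the reason the article gives for the subclass relationship.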
The receptors and clusters (represented by the Molecule and Multimer classes,
respectively) can be displayed to visualize the system’s global status (Figure 2). Both
receptors and clusters are represented as circles on a two dimensional plane (the cellular
surface). The diameters of these circles are proportional to the number of receptors within
the cluster. Free and bound ligands can be interactively introduced into the simulation
using the mouse. [16]
Figure 2. A scene from a receptor animation sequence.
Biological simulation techniques provide interesting comparisons to the larger
simulation system in which our visualization component will exist. However, it is primarily
important to examine systems that provide advanced biological visualization. A
state-of-the-art example of such a system is VisBio, an open-source, Java-implemented biological
visualization system developed by Curtis Rueden at the University of Wisconsin –
Madison. Described in a recent column in VisFiles by Bill Hibbard [13], VisBio aims to
develop innovative techniques for visualizing vast quantities of 2D, 3D, and higher-
dimensional data obtained through biological microscopy. These living cell visualizations
allow scientists to gain a better understanding of the dynamic properties of cells and their
internal structures. [13]
New algorithms for the tracking and visualization of such dynamic structures are
constantly being developed, and VisBio serves as a platform for the free exchange of this
evolving art. VisBio was thus constructed with both extensibility and portability in mind.
Developed in Java, VisBio is built upon the data visualization API, VisAD. The use of
VisAD allows great flexibility in the types of data that can be visualized. Therefore, VisBio
provides a flexible, portable environment for visualizing many different types of biological
structures. [13]
The primary data to be visualized is obtained by “varying a microscope’s focal plane
across multiple heights within a specimen.” [13] The result of this technique is a “stack” of
two-dimensional image slices that give an approximation of the biological specimen’s
three-dimensional structure (Figure 3) [13]. Often these images are captured over a span
of time, yielding a time-varying (or four-dimensional) view of the specimen. Furthermore,
new techniques in microscopy allow additional specimen data to be captured, creating up
to six-dimensional data to be visualized [13]. However, current versions of VisBio do not
support the visualization of such data [13].
Figure 3. Cross sectional capabilities of VisBio.
Figure 4. Menu options and 3-D measurements in VisBio.
The VisBio interface is an example of the vast possibilities that exist when
visualizing biological data. First, the interface offers flexible color schemes and color
manipulation for the images to be viewed [13]. Besides offering a number of predefined
color schemes, the user is allowed to customize the color ranges of pixels in the images
[13]. Second, images are available for viewing both in low- and high-resolution formats.
This feature allows effective memory management and fast animation [13]. Third,
although the image visualization primarily takes the form of the image “stacks” described
previously, VisBio is capable of constructing 3-D, semi-transparent volumes of the
specimens for viewing [13]. Non-horizontal slicing planes through the specimen can be
interpolated, and precise measurements can be tracked and made across image slices
(Figure 4) [13]. Future developments planned for VisBio include advanced visualization
features of 6-D specimen data and more advanced memory management techniques for
visualizing the massive amounts of data [13].
VisBio is an excellent example project for our visualization system. Although much
more complex, it demonstrates the creativity that can be introduced into the visualization
process. The user interface of VisBio is very intuitive and is worth emulating in
our visualization system. As can be seen from Figures 3 and 4,
each component of the visualization stands in a separate, floating window. This allows
hierarchical access to the many visualization features that may be incorporated (viewing
area, color schemes, etc.).
Another example of biological visualization is described in the journal article “FPV:
Fast Protein Visualization Using Java 3D” [14]. As the name suggests, this program is
used for protein visualization. Visualization of proteins is important for biologists because
the 3-D structure of proteins determines their ability to interact with other molecules [14].
The system was designed using the Java programming language and the Java 3D API.
Although performance of the Java 3D API can sometimes be low [14], this article suggests
techniques for fast visualization of the 3-D protein data.
This article is especially applicable to our current project. FPV accesses Protein
Data Bank (PDB) files to discern the atomic structure of the proteins to be visualized [14].
Scene construction then occurs via a Graphics Module, and the final, rendered scene is
output to the display unit [14]. Our project of cellular visualization can be thought of as the
repeated application of this process. Moreover, in many of the protein views described,
the internal protein atomic structure is represented by a group of spheres [14]. For our
system, it is likely that the cellular bacteria will also be represented as spheres. Thus,
optimization techniques used in FPV may also be considered for our project.
The particular optimizations described in the article were as follows. First, the
designers realized that because the atomic structure of the protein was rigid, essentially
only one geometric transformation needed to be completed, as long as the protein
structure could be drawn directly [14]. This required the creators to design a new
Sphere3D class that allowed the direct drawing of a sphere in any location [14]. Instead of
storing a transformation along with a sphere’s coordinates to be transformed, the sphere’s
transformed coordinates were stored directly [14]. This increased the time it took to load
the model, but greatly increased visualization speed [14].
The second optimization had to do with the recording of shape information within
the model. Each shape object contained information about both its appearance and its
geometry. Instead of redundantly recording information for shapes with the same
appearance, only objects of unique appearance were stored, each with an attached array
of geometric information for the various instances. [14]
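Both optimizations can be illustrated with a short sketch. It is written in Python rather than Java 3D and is not FPV's code. The atom records are invented, the transformation is reduced to a simple translation, and the function names are hypothetical.

```python
from collections import defaultdict


def apply_translation(offset, point):
    """Stand-in for a general geometric transformation."""
    return tuple(o + p for o, p in zip(offset, point))


def bake_positions(atoms):
    """Optimization 1: transform each sphere once at load time, keep the result."""
    return [
        {"appearance": atom["appearance"],
         "center": apply_translation(atom["offset"], atom["local"])}
        for atom in atoms
    ]


def group_by_appearance(baked_atoms):
    """Optimization 2: one record per unique appearance, with a geometry array."""
    groups = defaultdict(list)
    for atom in baked_atoms:
        groups[atom["appearance"]].append(atom["center"])
    return dict(groups)


# Appearance is modeled as a (color, radius) pair; all values are made up.
ATOMS = [
    {"appearance": ("gray", 1.7), "offset": (1.0, 0.0, 0.0), "local": (0.0, 0.0, 0.0)},
    {"appearance": ("red", 1.5), "offset": (0.0, 2.0, 0.0), "local": (1.0, 0.0, 0.0)},
    {"appearance": ("gray", 1.7), "offset": (0.0, 0.0, 3.0), "local": (0.0, 1.0, 0.0)},
]

scene = group_by_appearance(bake_positions(ATOMS))
```

The cost of bake_positions is paid once while loading, which matches the trade-off reported in the article: longer load time in exchange for less work per frame.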
Both optimizations significantly boosted performance. Larger protein models could
be loaded with higher frame rates [14]. When compared to other Java 3D protein
visualization programs, FPV outperformed them in almost all areas [14]. The primary
importance of this article was the concrete suggestions made for simple geometric
rendering performance. Both optimizations will be explored further in our project.
The article “Problems of Visualization of Technological Processes” [15]
provides both an introduction and two advanced case studies in the field of dynamic
scientific visualization. In this article, the importance of dynamic visualization as a medium
for understanding is first discussed. Applications for the understanding of flows of people
and water, traffic accidents, forest fires, and global warming are mentioned. The role of
dynamic visualization is advocated to better understand hidden relationships and patterns
in dynamic systems. Moreover, rather than focus strictly on the precise accuracy of
dynamic simulations, techniques for rapid simulation and visualization are given that
maintain the accuracy necessary to discover general patterns. This approach is illustrated
with two examples of dynamic visualization in the field of power engineering.
The first simulation was that of a gas filtration device. In this device, gas is passed
through a filter containing granules that absorb dust and other gas pollutants. As the
granules absorb the pollutants, their absorption capacity decreases, and they grow heavy
and fall through the filter [15]. This material is then cleaned and reused in
the filtration process.
In the simulation developed, researchers modeled the physical behavior of the
filtration granules and the gas passing through them. Although the movements of the
granules and gas were simulated using highly complex techniques, the main innovation in
the simulation was the division of the filter into volume units of equal size [15]. Properties
of these volume elements, such as the number and locations of the granules within them,
were then tracked [15]. Purification of the gas was then measured by the rates of
change of pollutants in the volume elements [15]. Relative accuracy of the simulation
results can be seen in Figures 3 and 4.
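The idea of dividing a region into equal-sized volume elements and tracking the granules in each can be sketched as follows. This is a simplified illustration in Python, not the method used in [15]; the cubic element shape, the function names, and the sample positions are assumptions.

```python
from collections import Counter

def volume_element_index(position, cell_size):
    """Map a 3D position to the integer index of the cubic volume element containing it."""
    return tuple(int(c // cell_size) for c in position)

def count_granules(positions, cell_size):
    """Count how many granules fall inside each volume element."""
    return Counter(volume_element_index(p, cell_size) for p in positions)

# Hypothetical granule positions inside a filter, with elements of edge length 1.0
granules = [(0.1, 0.2, 0.3), (0.9, 0.9, 0.9), (1.5, 0.0, 0.0)]
counts = count_granules(granules, 1.0)
```

Tracking a per-element quantity over successive time steps then gives the rates of change from which a measure such as purification could be derived.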
The next system modeled was that of the combustion process of a power plant
boiler. Unfortunately, precise simulation of combustion processes requires very time-
consuming calculations. Programs like FLUENT offer such simulations, but do not provide
simulation feedback for many hours [15]. Thus, it is very difficult to provide simulation and
visualizations in reasonable amounts of time without many simplifying assumptions.
Particularly, simulations of fluid dynamics, coal combustion, and heat radiation were
simplified by using finite volume elements upon which calculations were made [15].
Moreover, in many cases, pre-calculated data sets are used to assist with simulation.
Such simplifications allow for real-time simulation and visualization of the combustion
process.
The visualization component of the simulation system provides an interface for
zooming, time-stepping, and obtaining volume element statistics. Use of pre-calculated
data sets also allows visualization playback at various speeds with relative ease. When
compared to the precision-oriented combustion simulation program, “about 60% - 80% of
volume elements have their attribute values (temperature, pressure, etc.) different from
values obtained by FLUENT by less than 30%.” [15] Such differences are viewed as
acceptable for process illustration purposes. Moreover, the speed advantages of the real-
time system are promising for future simulations.
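The accuracy figure quoted above is a fraction of volume elements whose values fall within a relative tolerance of a reference result. A minimal sketch of such a comparison is given below; it is our own illustration in Python, and the function name and sample values are hypothetical rather than data from [15].

```python
def fraction_within(reference, approximate, tolerance=0.30):
    """Fraction of elements whose relative difference from the reference is below tolerance."""
    if len(reference) != len(approximate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    within = sum(
        1
        for ref, approx in zip(reference, approximate)
        # Elements with a zero reference value are counted as outside the tolerance
        if ref != 0 and abs(approx - ref) / abs(ref) < tolerance
    )
    return within / len(reference)

# Hypothetical temperatures for four volume elements
precise = [100.0, 200.0, 300.0, 400.0]
fast = [110.0, 150.0, 500.0, 390.0]
share = fraction_within(precise, fast)
```

In this made-up example three of the four elements differ by less than 30%, so the fast simulation would score 0.75 on this metric.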
The case studies mentioned above represent a wide range of biological simulation
and visualization. Simulations can be modeled in a variety of programming languages
using a variety of computer science technologies. The visualization systems mentioned
vary according to the types of data they display and the techniques for visualization.
There is, however, a unifying goal that all of the described programs have: they all seek to
further knowledge of Biology using techniques of Computer Science. By completing this
project, we hope also to make a small contribution to this pursuit.
Summary
The literature reviewed primarily presents two implementation methods for the
visualization tool. One approach is to write all the functionality in Haskell: the EDSL
CellSys, the interface using FranTk, the real-time simulation using either Fran or Yampa,
image rendering directly from Haskell, and an FFI call to some kind of video
encoding process. The other possibility is to write a stand-alone, modular visualization
client using C++, OpenGL, and an interface toolkit such as GLUT or Qt. Plug-ins could be
created to support various EDSLs in Haskell, or some other inter-language interfacing
method could be used.
1.4 Goals and Objectives
System constraints, environment, and interface requirements
The only hard system constraint is that the R. sphaeroides DSL, CellSys, is
implemented in the Haskell language. Soft constraints include the POV and MPEG-2 file
formats, pvengine, and mencoder. As long as an acceptable end-product (a playable
video file) is produced, the POV, or even MPEG-2, formats and associated utilities are not
necessary.
The current system is Windows-based, although all the utilities involved are open
source or available for multiple platforms, so the new system should also be relatively
platform independent, and run in both Windows and Unix environments.
The CellSys program currently interfaces with the pre-rendering process via file
I/O, but that output routine shall be re-written and replaced with a more direct
communication link between CellSys and the Input Module. The Input Module shall be
the new system's “public interface,” so to speak, although API may be a more appropriate
term. The interfaces between the input module, the real-time simulation, and the pre-
rendered simulation should be identical. The system shall also provide an interface to the
user, which allows him or her to enable or disable the real-time and pre-rendered
simulations, and to initiate the CellSys simulation. The user interface should also allow
the user to choose or configure the simulation DSL to some extent.
Objectives and tasks of work
First and foremost, a method for generating a movie file will be implemented. This
will most likely involve scripting mencoder to run with the proper arguments, and possibly
generating image files procedurally from Haskell or C/C++. The real-time simulation and
user interface shall be implemented in OpenGL, either in C/C++ or using Fran and
Haskell. The interface shall be created using either GLUT or Qt. An input module shall
be created and used to pass data from the CellSys DSL to the simulation interface. The
Haskell program can invoke both the interface and the input module to run as two threads,
and then pass data as it generates it to the input module. The input module will prepare
and send the data to the interface as it is needed, to support the real-time simulation and
animation playback.
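The two-thread arrangement described above is a producer/consumer pipeline. The following sketch shows the pattern in Python using a thread and two queues; it is an illustration under our own assumptions (the names, the sentinel value, and the placeholder "preparation" step are hypothetical) and not the project's implementation.

```python
import queue
import threading

SENTINEL = None  # marks the end of the simulation data

def input_module(inbox, outbox):
    """Receive raw frames, prepare them, and forward them to the interface."""
    while True:
        frame = inbox.get()
        if frame is SENTINEL:
            outbox.put(SENTINEL)
            break
        # Placeholder preparation step: sort the cells of the frame
        outbox.put(sorted(frame))

def run_pipeline(frames):
    """Feed frames through the input module thread and collect what the interface receives."""
    raw, prepared = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=input_module, args=(raw, prepared))
    worker.start()
    for frame in frames:  # the simulation passes data as it generates it
        raw.put(frame)
    raw.put(SENTINEL)
    received = []
    while True:
        item = prepared.get()
        if item is SENTINEL:
            break
        received.append(item)
    worker.join()
    return received
```

Because the queues decouple the two sides, the simulation can run ahead of the display while the interface consumes frames at its own rate.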
Prototype expectations
The prototype shall provide an interface which allows the user to choose either real-
time simulation or pre-rendered video, and a button to start the chosen mode of operation.
Configuration options for either path should also be present.
Performance experiments and expectations
The real-time simulation should be able to handle up to a few thousand primitive
objects and render at least 25 frames per second.
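To make this target concrete, the time budget it implies can be worked out directly. The short Python sketch below does the arithmetic; the object count of 4000 is a hypothetical stand-in for "a few thousand".

```python
def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps

def per_object_budget_us(fps, objects):
    """Average time available per primitive object, in microseconds."""
    return frame_budget_ms(fps) * 1000.0 / objects

budget = frame_budget_ms(25)                  # 40.0 ms per frame
per_object = per_object_budget_us(25, 4000)   # 10.0 microseconds per object
```

At 25 frames per second each frame must be completed in 40 ms, which for 4000 objects leaves roughly 10 microseconds per object for all processing and drawing.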
1.5 Overall Approach
Overview of the proposed system
The focus of this project is to replace and extend the mechanism for generating
animation from the output of the CellSys Haskell program. This involves two primary
components, a video encoding process, and a real-time simulation process. The “real-
time path” can be used for immediate feedback from the CellSys simulation, while the
encoding process, or “pre-render path,” can be used for simulations which would be too
computationally intensive to render in real-time. The new system shall, from the starting
point of data generation, feature a fully automatic MPEG rendering process, as opposed to
the four-step manual process currently in use. The system as a whole shall provide an
interface to control the operation of the real-time rendering engine, the pre-render engine,
and should grant the user control of the simulation parameters, such as the number of cells and
frames to render. The system shall be designed in a modular fashion in order to facilitate
the extension of the visualization tool to domain specific languages (DSLs) other
than the language which models R. sphaeroides in CellSys. Part of this design
philosophy will be the creation of an input module, which will act as an interface from
CellSys to the two rendering paths.
Work to be carried out
Re-write the CellSys output function. The old CellSys output function uses file
I/O to communicate with the other simulation phases. The function shall be re-written or
extended so that it can more directly communicate with the Input Module.
Construct an Input Module. The input module is responsible for taking data from
the embedded DSL and formatting it for either the real-time simulation or the pre-rendered
simulation.
Build a real-time simulation engine. This engine shall render the results of the
simulation in real-time, and should also provide a number of features for the analysis of the
simulation.
Completely automate the pre-rendered simulation path. Once the user indicates the
generation of an encoded video file, no further action should be necessary until the video
file is generated.
Build a user interface. The user interface will allow the user to control the
visualization. See section 1.4 for more details.
Advantages and disadvantages
The system will be simple, easy to use, and extendable. Following the
rapid-prototyping nature behind the domain specific languages used for the simulation, this tool
will allow for the rapid production of visual results. It should run independent of platform.
On the other hand, since this tool will potentially use many different technologies in order
to accomplish its ends (C/C++, OpenGL, Qt, Haskell, mencoder, pvengine, etc.),
installation may be non-trivial.
Detailed approach and risk analysis
There are several possible approaches to system implementation. First, the entire
project could be implemented in Haskell. This solution would allow direct communication
with CellSys programs in the form of function calls. There are also several OpenGL
interfaces for Haskell, allowing for the visualization. However, the use of Haskell would
require each of the project group members to learn a completely new programming
language. Furthermore, Haskell, as a purely functional language, is unlike many
conventional imperative programming languages. Thus, the learning curve would be quite
steep and both technical risk as well as time commitment would be excessively high.
The other general alternative comes in several forms but centers around the use of
a GUI toolkit. Toolkits such as Qt allow easy GUI programming and an interface to
OpenGL. The primary benefit of a GUI toolkit would be its comprehensive set of features.
Qt and C++ offer all of the file I/O and process management features necessary for each
alternative solution. Furthermore, the project members are more familiar with the C++
language than with the Haskell language.
The first alternative solution is by far the simplest: run CellSys programs as they
exist now and initially parse all the POV files to access the pertinent data. Parsing of the
files could be done automatically with the use of a parser generator such as Bison. The
parsed data would then be in a suitable format to be read and transformed by both the
Visualization Module and Movie Encoder. Although this approach is simple, it may be the
case that load times are too long due to the complexity of parsing such large amounts of
data. Thus, the operational risk involved for this alternative is too high.
The second alternative is very similar to the first. However, instead of parsing all of
the input data at once, only a portion of it is parsed initially. The rest of the data will be
parsed as the animation is running or the movie getting encoded. This is a more viable
alternative than that previously suggested. However, the complexity of the system is
increased due to the calculation of an appropriate buffered amount of parsed data before
animation may proceed. Moreover, the continued extensive use of the hard drive and
unfinished parsing also makes this alternative somewhat operationally risky.
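The buffering scheme of the second alternative can be sketched as a generator that parses an initial portion of the input and then parses one further frame for each frame consumed. The Python below is an illustration only; the function name, the `prefill` parameter, and the use of `int` as a stand-in parser are assumptions.

```python
from collections import deque

def buffered_frames(sources, parse, prefill):
    """Yield parsed frames while keeping up to `prefill` frames parsed ahead of playback."""
    pending = iter(sources)
    buffer = deque()
    # Parse an initial portion before playback starts (at least one frame)
    for source in pending:
        buffer.append(parse(source))
        if len(buffer) >= prefill:
            break
    # Hand out one frame at a time, parsing one more to replace it
    while buffer:
        yield buffer.popleft()
        for source in pending:
            buffer.append(parse(source))
            break

# Hypothetical usage: strings stand in for POV files, int stands in for the parser
frames = list(buffered_frames(["1", "2", "3", "4"], int, prefill=2))
```

Choosing `prefill` is the difficulty noted above: too small and playback stalls on parsing, too large and the start-up delay approaches that of the first alternative.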
The third alternative is the most likely candidate for implementation. In this solution,
the POV files will be discarded completely. In their place will be Haskell Foreign Function
Interface calls to C++ functions of our specification. These functions will populate the
scene data structures used by the Visualization Module and Movie Encoder. In this way,
no parsing of data files will need to be conducted and the data structure for the scene can
be populated alongside the visualization with fewer performance concerns. The
disadvantage of this approach is that the original Haskell code for CellSys simulations will
have to be altered to make calls to foreign functions. It is also unknown whether functions
used in Qt’s runtime system will allow internal functions to be called. It might be necessary
to provide some sort of “hack” against such a restriction by forking a process that acts as
an intermediary between the Qt application and the CellSys simulation.
Cost Analysis
All the components used are either free or open-source, so costs are virtually non-
existent. This is slightly counter-balanced by the integration time for the system, since
many different technological aspects will be pulled together (User interface, Haskell API,
OpenGL simulation, video encoding). The largest cost is expected to be the time
investment, which is estimated to be approximately 10 – 20 hours per week for the
remainder of the semester.
Development strategies
Phase 1: Semi-automate current processes
o Create a sentinel or invoked C/C++ program that takes the current snapshot POV data as input and renders/displays the image to the screen.
Advantages:
• Minimal impact on current Haskell program design
• Fastest to develop and implement
Disadvantages:
• Rendering images with POV is still rather slow
Phase 2: Fully automate animation rendering
o Use either HOpenGL or Haskell’s FFI (as discussed above) to render simulation data directly in OpenGL
Advantages:
• Not a separate process for rendering
• Least amount of disk I/O, and therefore much faster than the Phase 1 solution
Disadvantages:
• Longer development time
• Implementing this interface requires migration of CellSys to the Glasgow Haskell Compiler instead of the Hugs interpreter (longer time to adapt the existing model required)
• FFI requires additional data specifications
Phase 3: Extra functionality
o Graphical user interface and simulation parameter controlling tools
o Ability to change camera viewpoint
Advantages:
• High-level, abstract user control
• Limits the number of times code must be recompiled to reevaluate the simulation
• Extra “wow” factor for audiences of Dr. Harrison’s talks
Disadvantages:
• Longest development time required for easy “pluggability”
• Could require extensive modification of CellSys modules
2 Requirements Analysis
2.1 Introduction
During the past year, Dr. William Harrison has been conducting research on the
application of computer programming languages to biological simulation. In particular, he
has created the domain-specific programming language (DSL) CellSys that can be used to
create simulations of the movement, growth, and reproduction of the bacterium Rhodobacter
sphaeroides. One drawback at present is the lack of an adequate visualization system for
the data generated by CellSys simulations. Currently, a series of shell scripts must be
manually executed to produce a video encoded MPEG after the simulation has been run.
The goal of this project is to create a system that allows for a more seamless, automated
movie-making process for CellSys simulations, as well as the additional capability of interactive
simulation visualization. The following requirements analysis is a first attempt to clarify the
needs and construction of such a system.
2.2 Overall Description
This document enumerates a number of preliminary requirements and resulting
design decisions to aid in the development of both initial system prototypes and
appropriate evaluation metrics. These decisions are necessary for the rapid prototyping
development methodology that will be employed for the project. This methodology is a
particularly necessary requirement for interactive visualization development because it
allows continuous user feedback on both user interface and visualization requirements.
However, to make rapid prototyping a technically feasible option for a project of limited
time scope, a development environment familiar to project team members is required. In
particular, the team members for this project have selected the C++ programming
language in which to implement the visualization system.
This design decision, however, has a number of consequences. First, the CellSys
DSL (the simulation system) is embedded in the Haskell programming language.
Communication of simulation data between the simulation and visualization systems must,
therefore, occur between modules written in entirely different languages (i.e., Haskell and
C++). This data communication problem would not exist if the visualization system were
also implemented entirely in Haskell. To solve this problem, a separate data input module
for the visualization system must be developed. This input module will act as an
intermediary between the simulation and visualization systems. Iterative prototypes of the
input module will be constructed in an attempt to optimize the data-passing capabilities of
the input module (e.g., through files on the hard drive, through Haskell’s Foreign Function
Interface, through shared memory, etc.).
The primary performance requirements and evaluation metrics for this project will
relate to minimizing the delayed response of the interactive visualization after the start of a
simulation, maximizing the number of frames per second (FPS) that is supported for
various simulation sizes, and minimizing MPEG video construction time. What follows is a
more detailed description of the project requirements and relevant preliminary
implementation decisions mentioned in this section.
2.3 System Requirements and Constraints
2.3.1 Operating environment (external constraints)
Portability is one of the chief concerns of this project. Dr. William Harrison has
requested that the final program operate in a Windows environment, but his associate, Dr.
Robert Harrison, works primarily in Linux. Thus, the visualization system should ideally
operate in both environments. As a first approximation, therefore, the requirements
analysis should be constrained by the desire for portability.
The CellSys DSL is embedded in the Haskell programming language. To run
CellSys simulations, therefore, a Haskell interpreter is required. Any Haskell interpreter
that complies with the Haskell language definition in the Haskell 98 Report and Haskell
Foreign Function Interface Addendum to the Report is adequate. However, because Dr.
Harrison has been working with the Hugs 98 interpreter, its use will be assumed. Hugs 98
is freely downloadable for the Windows, Linux, and Mac OS X platforms and, therefore,
adheres to any platform portability constraints.
To maintain application portability, as well as maintain a familiar programming
environment as described above, the cross-platform C++ GUI toolkit Qt has been
preliminarily chosen for use throughout the project. In addition to providing a cross-
platform GUI programming environment, Qt has the advantage of providing an OpenGL
rendering context in which the interactive visualization will be constructed. Thus, the final
project will be portable to any platform that includes the appropriate Qt and OpenGL
libraries.
As a final operating environment constraint, the program should allow the
construction of a video encoded MPEG of the simulation. This MPEG video will allow
retrospective visualization playback even on systems not equipped with the appropriate
libraries and Haskell interpreter.
2.3.2 Market users and characteristics
The immediate market users for this project are Dr. William Harrison and any
associated researchers. While Dr. Harrison's primary interest is the application of DSLs,
this project is potentially relevant to any scientists who possess specialized biological
knowledge yet lack a background in computer science. By extending the DSL framework
with a visualization engine, this project provides a simple but solid environment in which to
build, model, and analyze various applications of domain-specific languages. There are
no known existing projects of similar scope and execution.
Because this project will utilize freely available technologies, the only economic
investment is time. While this is not a negligible consideration, it does not make the
project unfeasible from an economic standpoint. The visualization engine is general enough
that it is not subject to any medical regulatory constraints. However, it will likely use a
number of open source components, and therefore must comply with the
applicable open source licenses.
The primary customer requirement is an interface connected to the CellSys
DSL embedded in Haskell that provides some form of visualization with minimal interaction
and setup and with reasonable delay. This will be achieved through two (mutually independent)
methods: a buffered, interactive OpenGL simulation and a rendered movie file.
2.3.3 Environmental constraints
The only human factors of concern are the Qt interface and the method for
initializing the interface, the input module, and the DSL. The user can reasonably be
expected to issue 2-4 commands from the embedded DSL to quickly configure and start
the interface and simulation. The interface itself must provide the necessary controls for
manipulating the simulation, and initiating the process to create a video encoded MPEG.
The application domain of this project is non-critical. The interface and simulation,
although interactive, are not intended for real-time systems and should not be used for
applications with real-time constraints. The product will uphold a standard of quality
acceptable for non-critical use, but cannot be used for any kind of simulation which
requires a high degree of safety, reliability, or authenticity (such as any kind of patient
treatment).
2.3.4 System components
The system has four main components: the Qt interface and visualization window,
the process for generating a rendered video, the input module, and the CellSys DSL. The
DSL is responsible for generating simulation data; the input module accepts data from the
DSL, then formats and prepares it for the interface, which displays the data as an interactive
simulation. The user will most likely make a call from the DSL which starts the input
module and the interface, and then make another call from the DSL to send the generated
data down the application pipeline. Playback controls for the simulation and encoding
process will be provided in the visualization interface.
2.3.5 Software interfaces and libraries
Notable software components include OpenGL (an industry-grade 3D graphics
library), Qt (a GUI toolkit), the C++ STL (Standard Template Library), Haskell (a
powerful functional programming language especially suited for embedded domain specific
languages), and possibly the Haskell Foreign Function Interface (used for calling functions
from other languages) and/or a threading library to run the input module and the interface
in separate threads. An alternative is QProcess, a class provided by the Qt toolkit,
which may allow less complicated inter-process communication. A host of command-line
utilities will also be used for encoding into video format, such as ImageMagick and
MEncoder.
2.3.6 Communication interfaces
The DSL will provide data to the input module first by writing data files to a directory
while the input module scans for updated files. Later iterations will use either Haskell’s FFI
or some kind of threading mechanism.
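The first, file-based iteration amounts to polling a directory for new or modified files. A minimal sketch of one scan is given below in Python; the function name, the `.pov` suffix filter, and the use of modification times to detect updates are assumptions made for illustration.

```python
import os

def changed_files(directory, seen, suffix=".pov"):
    """Return paths of files that are new or modified since the previous scan.

    `seen` maps path -> last observed modification time and is updated in place.
    """
    changed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith(suffix):
                continue
            mtime = entry.stat().st_mtime_ns
            if seen.get(entry.path) != mtime:
                seen[entry.path] = mtime
                changed.append(entry.path)
    return sorted(changed)
```

The input module would call such a function repeatedly with the same `seen` dictionary, loading only the files reported by each scan. The disk traffic and polling delay involved are the reasons later iterations move to a more direct mechanism.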
2.3.7 Hardware interfaces
This project does not include any hardware interfaces beyond those found in a
standard PC. The most relevant are keyboard and mouse input and display output.
2.3.8 System maintenance
This project is a software entity, and should easily run or port to any platform which
supports the component tools, including Linux, Unix, BSD, Windows, Mac, etc. The
project will target the Windows platform. The only hardware required is a computer
running one of the applicable operating systems.
The life cycle of this project can include use with multiple DSLs and, due to its
modular nature, should support upgrades to specific subsystems fairly well (e.g., the DSL
or the visualization interface). While no consideration is given to disposability, the
software arose with a starting point of a single, stand-alone DSL, so it should be fairly easy
to replace or transplant, if the need arises.
2.4 Alternative Solutions
• Using the Fran DSL & FranTK
The Fran Domain Specific Language and Toolkit extend Haskell functionality
to include 3D graphics rendering, easing the passing of data from the DSL simulation
to the visualization engine. While this would simplify the data communication, the
Fran DSL and FranTK rely on DirectX technologies and are Microsoft Windows
dependent. Thus, application portability would be lost. This solution also suffers from
the constraint of a Haskell implementation, as described previously.
• Using the HOpenGL DSL
Like the Fran solution, HOpenGL is a 3D graphics API for use in Haskell,
allowing great simplification of the data communication problem. As noted before,
however, the members of the project team are not familiar with Haskell programming,
making such a solution technically infeasible. Furthermore, Haskell is a functional
language and is not optimized for the mathematical calculations required for 3D
rendering and animation. Thus, a severe performance penalty could possibly be paid
with this solution.
• Shell Scripts
The current system of visualization is implemented using a series of shell
scripts that are executed after the simulation has finished running. These scripts make
use of third party programs (POV-Ray and MEncoder) to generate an AVI movie for
later playback. If this approach is used, the primary focus would be aggregating the
current shell scripts and making them more efficient. This solution would be easier to
implement than any of the others but would have the longest delay from simulation to
visualization. Furthermore, no interactive visualization component would be added.
2.5 Performance Requirements
One of the primary performance requirements is minimal visualization response
time. As Dr. Harrison would like to ultimately use this system to create a real-time
visualization of a simulation, the system must be able to receive and display the current
simulated environment with minimal delay. Various simulation parameters, including the
number of simulated bacteria and simulation length, will have varying impacts on this
response time. The primary aim of the input module optimization in successive iterations
will be to reduce this response time as much as possible.
Another performance requirement of great importance is the number of frames per
second (FPS) that can be displayed in the interactive visualization. Currently, the
simulation produces data for real-time playback at 25 FPS. The rendering ability of the
interactive visualization system should ideally be able to accommodate this playback
speed for simulations of modest size. Modest size, in this sense, is a negotiated metric
determined from Dr. Harrison’s feedback and the prototype evaluation results. For larger
simulations, MPEG video playback may be used for visualization, or a decrease in FPS
might be feasible. Furthermore, as the complexity of the interactive visualization
increases, the realized FPS will also be forced to drop. The user may be forced to limit
interactivity of the visualization to achieve a desired FPS.
It should be noted that these performance requirements are not entirely separable.
A decrease in the visualization response time could very well limit the FPS of the
interactive visualization. Thus, the relationship between these two performance
characteristics will have to be mapped for simulations of various sizes and lengths. A
primary goal of the prototyping methodology is the ability to consult with Dr. Harrison as to
the relative importance of each of the system’s performance aspects.
A final performance requirement is to minimize the time of construction of the video
encoded MPEG for simulations of varying sizes. To a first approximation, the construction
should be completed relatively autonomously and more quickly than is presently available
with current shell scripts in use.
2.6 Resource Requirements
For fully featured construction & development, this project will require the following:
• 3 full-time developers, for 3 months each, totaling up to 1800 man-hours
• Desktop/Laptop PC running at 1 GHz with 256 MB of RAM
• Software:
o ANSI/ISO Standard C++ Compiler
o Qt supported Operating System (Windows, Linux, Mac OS X, UNIX variants)
o Qt 3 libraries
o OpenGL libraries
o Hugs98 (the Open-Source generally accepted standard interpreter for
Haskell 98 used for the running of the actual simulation)
o MEncoder, an open source utility distributed with MPlayer, used for encoding
with the Microsoft MPEG4V2 codec
o ImageMagick, a suite of command-line utilities, used for converting image
formats
2.7 Evaluation Metrics
Evaluation of the success of the project will be conducted with several metrics, each
following rather immediately from the performance requirements described in section 2.5.
Measurements of the supportable number of frames per second under varying simulation
conditions by the interactive visualization will be recorded, but a frame rate no lower than
25 FPS shall be targeted. In addition, the delay from the simulation start to visualization
start will be observed for various simulation sizes. The order of magnitude for these
metrics should be on the scale of a few seconds per thousand frames. Last, the MPEG
construction times will be recorded for varying simulation sizes. To more precisely target
performance bottlenecks, individual evaluation metrics of the input module’s performance
and individual frame construction times will also be observed. From this data, general
averages and averages for various simulation sizes will be calculated for each of the
performance aspects.
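Two of the quantitative metrics above reduce to simple calculations over recorded timings. The following Python sketch shows one way they might be computed; the function names and the sample timings are hypothetical.

```python
def average_fps(timestamps):
    """Average frames per second from frame presentation times given in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two frame timestamps")
    elapsed = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / elapsed

def seconds_per_thousand_frames(delay_seconds, frame_count):
    """Normalize a start-up delay by simulation length."""
    return delay_seconds * 1000.0 / frame_count

# Hypothetical measurements
fps = average_fps([0.0, 0.5, 1.0, 1.5, 2.0])
delay = seconds_per_thousand_frames(6.0, 2000)
```

Averages for a given simulation size would then be taken over repeated runs of these measurements.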
Several qualitative ratings, such as the aesthetic value of the animations and the
intuitiveness of the user interface will also be evaluated. Unfortunately, quantitative
metrics are more difficult to use for items such as these. Thus, continuous qualitative
feedback from Dr. Harrison will be required to maintain an acceptable level of work.
3 Design Specification
3.1 Introduction
Herein lies the specification of our design methodologies.
3.2 System Design Overview
This project is primarily integrated as a single interface, built in C/C++ with the Qt
toolkit. The interface shall use menus to allow the user to load a simulation into memory
from either a collection of POV files contained in a directory, or from a custom binary
format, or encode an AVI from a collection of POV files contained in a directory. Mouse
input shall dictate camera movement, and a number of buttons shall be provided on the
face of the interface for controlling the simulation. Context menus will facilitate other
functions.
3.3 Data Requirements
Data collection and storage
Data is collected from the CellSys language. The data is collected in POV
format and stored in any of the following formats.
Relevant file formats include:
• Custom binary format (*.csm)
This format can be used to save previously loaded simulations. Loading simulations
from this format instead of raw POV files is much more efficient.
• POV-Ray file format (*.pov)
A format which is compatible with POV-Ray and other ray tracing programs. These
files are used to load simulation data and to encode videos.
• POV-Ray generated Windows bitmap format (*.bmp)
As the first step of encoding a video file, these images are produced by the POV-Ray
rendering engine, and are later converted to the PNG format.
• Portable Network Graphics format (*.png)
During the video encoding process, the BMP images are converted to PNG format.
This is accomplished with ImageMagick's “convert” command-line utility.
• MSMPEG4v2 encoded AVI file format (*.avi)
MEncoder is used to encode a series of PNG images into a standard AVI.
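The last two stages of this chain (BMP to PNG, then PNG to AVI) can be driven by building command lines for the two utilities. The Python sketch below only constructs the argument lists. The exact options used by the project's scripts are not given in this report, so the MEncoder options shown (an `mf://` image sequence encoded with the lavc `msmpeg4v2` codec), the frame rate, and the file names are assumptions.

```python
def convert_command(bmp_path, png_path):
    """ImageMagick command line converting one BMP frame to PNG."""
    return ["convert", bmp_path, png_path]

def mencoder_command(png_pattern, output_avi, fps=25):
    """MEncoder command line encoding a PNG sequence into an MSMPEG4v2 AVI (assumed options)."""
    return [
        "mencoder", f"mf://{png_pattern}",
        "-mf", f"fps={fps}:type=png",
        "-ovc", "lavc",
        "-lavcopts", "vcodec=msmpeg4v2",
        "-o", output_avi,
    ]

# Hypothetical file names
step1 = convert_command("frame0001.bmp", "frame0001.png")
step2 = mencoder_command("*.png", "simulation.avi")
```

Each list could be passed to `subprocess.run(..., check=True)` so that a failure in one stage stops the chain before the next stage starts.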
I/O Requirements
The interface shall receive POV files as input, and shall produce a simulation.
These simulations can be saved, as CSM files, for viewing in the future. The interface
shall invoke the CellSys language to produce POV files, and when desired, given a set of
POV files, the interface is required to output a video file. User input for simulation control
is entirely mouse-driven.
Digital Archives and Databases
Beyond permanent storage of POV, CSM, and AVI files on a hard drive, there are
no applicable digital archives or data stores relevant to this project.
Data formats
The CSM format has a field containing the number of frames in the simulation.
Following this is the number of cells in the first frame, and then the cell information for the
first cell, second cell, etc. Data for frames 2 through the number of frames in the
simulation follow in the same format. A diagram is provided below.
| N frames |
| M1 cells | cell 1 | cell 2 | …| cell M1 |
| M2 cells | cell 1 | cell 2 | …| cell M2 |
. . .
| MN cells | cell 1 | cell 2 | …| cell MN |
3.4 Software Design
Class/Object Diagram (Somewhat simplified)
Block Diagram
Data Flow Diagram
Interface Diagram
Alternative Evaluation and Selection
Alternatives to the C/C++ Qt/OpenGL interface included writing something native in
Haskell, but this approach was abandoned due to the complexity of embedding such a
system in a functional language. There are tools available for this purpose, such as Fran
and FranTk (Functional Reactive Animation and the associated interface toolkit, functional
“equivalents” of OpenGL and Qt). However, they are not as established as the long-existing
C/C++ foundations, and documentation and appropriate learning material are more difficult
to find.
The interface uses the QProcess class to communicate between different
applications. The Haskell Foreign Function Interface provides similar
functionality, but using the Haskell FFI offered too little additional benefit to warrant
the additional research and development time.
OpenGL can render images from the simulation straight to the hard disk, thus
bypassing the need to render POV files using POV-Ray. However, existing functionality in
the form of POV-Ray was again chosen over radically new development. The decision
was made for simplicity. Rendering frame images using OpenGL was a viable goal of
this project, but it remains unfinished.
The alternatives in use by this project were selected for technical feasibility.
Major difficulties
The frame data would ideally be contained within a C++ STL vector; however,
there appeared to be an error within the STL implementation or the C++ compiler, so a
doubly-linked list was used instead. Qt signals, which are used for communication
between components, rely on macros, and it can be difficult to trace errors with
them.
The QProcess class, used to invoke CellSys, sometimes behaves differently than
expected when producing the “standard output is ready” signal. Due to buffering, multiple
lines of output would be delivered at once, instead of one at a time.
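One common way to cope with this kind of buffering is to accumulate whatever text arrives and hand back only complete lines. The sketch below illustrates the idea in plain C++; the LineBuffer class and its feed method are hypothetical names introduced here, not part of the project or of Qt. It assumes the slot handling the “standard output is ready” signal passes each received chunk to feed.

```cpp
#include <string>
#include <vector>

// Accumulates raw chunks of process output and returns only complete lines.
// A partial line at the end of a chunk is kept until the rest of it arrives.
class LineBuffer {
public:
    // Appends a chunk and returns every line that the chunk completes.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::string::size_type start = 0;
        for (;;) {
            const std::string::size_type end = pending_.find('\n', start);
            if (end == std::string::npos) {
                break;  // no further complete line in the buffer
            }
            std::string line = pending_.substr(start, end - start);
            if (!line.empty() && line.back() == '\r') {
                line.pop_back();  // tolerate Windows-style CRLF line endings
            }
            lines.push_back(line);
            start = end + 1;
        }
        pending_.erase(0, start);  // keep only the unfinished tail
        return lines;
    }

private:
    std::string pending_;
};
```

With this approach it no longer matters whether one signal delivers a single line, several lines, or a fragment of a line.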
Seamlessly integrating different applications such as the Haskell evaluator, POV-
Ray, and mencoder proved to be somewhat non-trivial. Also, establishing appropriate
compilation environments was somewhat of an issue, due to the large array of utilities in
use. Additionally, some of the free and/or open source tools, specifically mencoder,
possess large and adequate, but labyrinthine, documentation.
3.5 Hardware Design
The solution described in this report is a software system, and requires no hardware
beyond a standard PC with a suitable graphics card. The standard computer system, in
this sense, includes a keyboard, mouse, monitor, motherboard, processor, hard drive,
memory, etc.
3.6 Testing Methods: Evaluate the following
• Is the program very responsive with minimal waiting delays (i.e. loading a
simulation)?
• Does the program have a light footprint (relatively low memory usage) with respect
to the number of frames processed?
• Does the program have intuitive and easy to use playback controls?
• Is the user able to easily reposition the viewing angle?
• Is the user able to easily run a new simulation?
• Is a power user able to easily understand and extend the code?
3.7 Scheduling Diagram & Task Assignments
Timeframe: March - May 2005
| Task | Assigned To |
| Project documentation | Billy and Charlie |
| Core Development | Perry |
| UI Development | Perry |
| Simulation Execution Development | Charlie |
| Movie Generation Development | Billy |
| Presentation Development | Charlie and Billy |
3.8 Implementation Costs
| Description | Qty | Cost Per | Total |
| Developer Hours | 40 | 20 | 800 |
| Documentation Hours | 80 | 20 | 1600 |
| Windows Licenses | 3 | 120 | 360 |
| Learn QT book | 1 | 40 | 40 |
| Learn Haskell book | 1 | 27 | 27 |
| TOTAL | | | 2827 |
4 System Implementation
Implementation Summary
The system was implemented as described in the Design Specification section.
The user interface was created with Qt, using an OpenGL widget for the cell display. The
QProcess class is used for communication between the interface, CellSys, POV-Ray, and
mencoder. All programming was written in C/C++.
Installation Instructions
Establish the operating environment
• Install ImageMagick, and make sure the “convert” utility included in the ImageMagick
suite is within the system path. Rename the “convert” utility to “zconvert,” to avoid a
namespace conflict with the Windows filesystem “convert” utility.
• Install “grep” and “gawk” for Windows. The path to these executables can be specified
in the visualizer application at runtime.
• Install MPlayer for Windows, and ensure the “mencoder” utility is within the system
path.
• Install POV-Ray for Windows, and ensure the “pvengine” utility is within the system
path.
• Install a Haskell interpreter for Windows (such as Hugs98). The path to this executable
can be specified in the visualizer application at runtime.
Establish the Compilation Environment
Install Qt for Windows, and optionally the Borland 5.5 command-line compiler
(included with the Non-Commercial Qt Distribution on the project CD). See the readme file
in QtCD directory for additional instructions on installing the Borland compiler, which is not
fully installed by the CD installation process. Make sure the appropriate “make” command
will be found, and that the Borland “make” doesn’t conflict with an existing installation.
Finally, compile the project with the “pmake.bat” batch file.
5 System Performance, Testing and Evaluation
Overview
Forty-one test cases were run in an attempt to gauge the interactive visualization
performance under various simulation loads. In addition, simulation load times were
compared for identical simulations saved both as a directory of *.pov files and as a single
binary *.csm file. Finally, a less intensive performance evaluation of the *.avi movie
rendering process was performed. This consisted of a qualitative inspection of the
automated *.avi rendering process.
Interactive Visualization Performance, Testing, and Evaluation
A battery of forty-one tests was conducted to measure the interactive visualization
performance in response to various visualization components. Results of these tests can
be found in Table 5.1. The primary measure of the visualization performance was the
comparison between the frame rate as specified by the user (nominal frame rate) and the
actual frame rate achieved by the system (actual frame rate). In particular, frame rate
comparisons were made for simulations involving increasing numbers of cells and the
presence or absence of a clipping box, bounding box, viewpoint manipulation, and cell
color codes. The test cases were performed on a 2.8 GHz Pentium IV processor-based
computer with 512 MB of RAM and an ATI Radeon 9200 video card.
The test results were concordant with preliminary performance expectations based
on a basic understanding of OpenGL frame rendering. In particular, it was found that
system performance decreased as the cell count increased. Comparisons of the actual
frame rates achieved and the user-specified nominal frame rates with no visualization
features can be seen in Figure 5.1. Running a simulation with only one cell and no
visualization features selected, a nominal frame rate of 100 frames per second (fps)
yielded an actual frame rate of 66.67 fps. This discrepancy is most likely due to
inherent limitations of processor speed, rendering capability, and processing overhead of
common GUI events. Thus, 66.67 fps is the maximum achievable frame rate for any
simulation viewed in the interactive visualization on the test system.
As more cells were included in simulations, the maximal achieved frame rate
dropped, as can be seen in Figure 5.1. These results show the natural bottleneck that
occurs when each frame includes more and more cells (represented as spheres in a
three-dimensional viewing space). For example, when rendering 1000 spheres, each consisting
of 400 polygons, 400,000 polygons must be drawn for each frame. At a nominal frame
rate of 100 frames per second, the system must be capable of rendering 40 million
polygons per second. This level of performance is clearly not achievable on a personal
computer.
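The arithmetic in this example can be written out as a small helper. The function name is hypothetical, and the figure of 400 polygons per sphere is the example value used in the text, not a measured property of the renderer.

```cpp
// Polygons that must be drawn each second to sustain a given frame rate:
// cells per frame * polygons per sphere * frames per second.
constexpr double polygonsPerSecond(double cells,
                                   double polygonsPerSphere,
                                   double framesPerSecond) {
    return cells * polygonsPerSphere * framesPerSecond;
}
```

For 1000 cells at 400 polygons each and a nominal 100 fps, this gives 40 million polygons per second. Under the same 400-polygon assumption, the 12.05 fps measured for 1000 cells in Table 5.1 corresponds to roughly 4.8 million polygons per second.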
The presence of a bounding box for the visualization also degraded system
performance. The bounding box allows the user to selectively isolate
portions of the visualization and is implemented with additional OpenGL
clipping planes. OpenGL must therefore perform additional calculations for the
projection transformation of each frame. This additional calculation complexity
reduced performance, as can be seen in Figure 5.1.
The visualization option that produced the most dramatic performance decrease
was viewpoint manipulation. The interactive simulation allows the user to change
viewpoints using the mouse. Each time the mouse is moved only one pixel, the system
attempts to redraw the frame at the new viewpoint, effectively greatly increasing the
nominal frame rate. For cell counts of more than 100, this increase in frame rate
essentially freezes the interactive visualization during the period of viewpoint manipulation,
as can be seen in Figure 5.1. However, as soon as the viewpoint is no longer being
manipulated, the previous actual frame rate is again achieved.
Examination of the test cases also revealed that the addition of color codes and a
clipping box yielded negligible performance variations. This was entirely expected
because changes in cell color involve relatively few OpenGL rendering changes. Also, the
clipping box (so named because it defines the region which the user can selectively view)
involved rendering at most six additional polygons. Thus, its addition to the frame
rendering was negligible when compared to the polygonal counts involved in the cell
(sphere) rendering.
The overall performance of the interactive visualization system is quite acceptable.
The achievable frame rate drops below 25 fps only for simulations that involve
more than 400 cells. Thus, for simulations that involve fewer than 400 cells, a fluid
visualization can be produced. Moreover, because the visualization system only needs
to show general trends in the clustering of cells, frame rates far lower than 25 fps are
acceptable. In short, the visualization is not required to appear perfectly fluid
to the human eye, only to make general trends in cell clustering and movement
visible.
Performance data is tabulated on the following page.
Cell Count  C. Box*  B. Box*  V.P. Manip*  Colors  Nominal FPS  Actual FPS
1000 No No No No 100 12.05
1000 No No No No 40 12.05
1000 No No No No 25 12.05
1000 No No No No 10 9.09
1000 No No No No 5 4.93
1000 No No Yes No 100 1.59
1000 No No Yes No 5 1.92
1000 No No No Yes 100 12.05
1000 No Yes No Yes 100 9.90
1000 No Yes No No 100 10.00
1000 Yes No No No 5 4.93
1000 Yes No No No 10 9.09
1000 Yes No No No 25 11.90
500 No No No No 100 23.26
500 No No No No 40 23.26
500 No No No No 25 21.28
500 No No No No 10 9.17
500 No No No No 5 4.93
500 No No No Yes 25 21.28
500 Yes No No No 25 21.28
500 No Yes No No 100 18.52
500 No Yes No No 40 18.52
500 No Yes No No 25 18.52
500 No Yes No No 10 9.17
500 No Yes No No 5 4.93
500 No No Yes No 100 2.85
500 No No Yes No 5 2.80
300 No No No No 100 37.04
300 No Yes No No 40 32.26
200 No No No No 100 52.63
200 No Yes No No 100 45.45
100 No No No No 100 66.67
100 No No Yes No 100 5.92
100 No Yes No No 100 66.67
1 No No No No 100 66.67
1 No No No No 40 32.26
1 No No No No 25 21.74
1 No No No No 10 9.17
1 No No No No 5 4.93
1 No No Yes No 100 66.67
1 No Yes No No 100 66.67
* C. Box = Clipping Box Present
* B. Box = Bounding Box Present
* V.P. Manip = Viewpoint Manipulation Occurring
Table 5.1. Interactive Visualization Load Performance on 2.8 GHz processor with 512 MB RAM and ATI Radeon 9200 video card.
[Figure 5.1: line plot titled “Actual FPS for Nominal 100 FPS”. X axis: Number of Cells (0 to 1200). Y axis: Actual Frames Per Second (0 to 80). Series: No Options, Bounding Box, Viewpoint Manipulation. Labeled values: 66.67, 52.63, 37.04, 23.26, 12.05.]
Figure 5.1. Comparison of Actual Frame Rates against Nominal Frame Rates under the presence of various visualization parameters.
Simulation Loading Performance, Testing, and Evaluation
The system is capable of loading two distinct representations of simulations. The
first representation is the set of *.pov files that are directly written to the hard drive by CellSys
simulation programs. Loading the data contained in the *.pov files involves continual
opening and closing of files on the hard drive and parsing the frame data contained
therein. Numerical representations of the locations of cells must then be converted from
string format to a binary numerical representation that can be utilized by the display
module.
In contrast, once a simulation has been loaded, the user has the option of saving
the simulation as a *.csm file, a custom binary format created to store the simulation frame
data. The format of the *.csm files can be found above, in Section 3.3 under the “Data
Formats” heading. The *.csm files consist only of the binary numerical representations of
the frame data. Therefore, very little parsing and conversion is necessary to extract the
frame data. Moreover, all frame data for a simulation is contained in a single file as
opposed to a directory of *.pov files. The load times for *.csm files are, therefore,
significantly shorter than the load time for directories of *.pov files, as can be seen in Table
5.2 and Figure 5.2. In the tests, simulations of 200 to 1000 cells loaded roughly 44 to 59
times faster from *.csm files than from *.pov files.
| Cell Count | *.pov Load Time (sec) | *.csm Load Time (sec) | Ratio (POV/CSM) |
| 1 | 0.188 | 0.031 | 6.06 |
| 200 | 2.782 | 0.063 | 44.16 |
| 300 | 4.047 | 0.078 | 51.88 |
| 500 | 6.859 | 0.125 | 54.87 |
| 1000 | 12.922 | 0.219 | 59.00 |
Table 5.2. Simulation load times for *.pov and *.csm files.
[Figure 5.2: line plot titled “100 Step Simulation Loading”. X axis: Number of Cells (0 to 1000). Y axis: Load Time (sec) (0 to 14). Series: POV Files, CSM File.]
Figure 5.2. Simulation load times for *.pov and *.csm files.
AVI Rendering Performance, Testing, and Evaluation
No formal tests were conducted to measure the speed of the *.avi movie rendering
process. Qualitative observations indicate that each rendered frame takes approximately
two seconds on a 2.8 GHz processor-based computer with 512 MB of RAM. Thus,
rendering an *.avi file takes significantly longer than loading either a *.pov directory or
*.csm file for the interactive visualization. However, the interactive visualization requires
that all frame data be present in main memory before the simulation can be viewed. Thus,
the size of simulations that can be interactively viewed is limited by the amount of memory
in the computer on which the visualization system is being run. The *.avi movie rendering
process is not limited by this memory constraint and is better suited for simulations
of very large sizes. Running such large simulations already involves a long delay,
and thus the delay in the *.avi encoding process is not a significant bottleneck.
Additional methods of *.avi encoding that would greatly speed up the encoding process
are also given in the Future Work section below.
6 Conclusions and Discussion
The current system is the result of only two cycles in the spiral software
development model. As such, it is currently only a prototype for the eventual system and
includes only an approximation of the desired functionality. A summary of the successes
and failures related to the current prototype, however, is appropriate at this point in
order to gauge the current functionality and to direct further work that may be done on the
project.
First, the decision to use C++ and Qt to complete the prototype turned out to be the
correct one. The rapid development model embraced for the completion of this project
required the use of a familiar programming language and programming language
paradigm. Thus, the use of Haskell for the completion of the project would have been an
unviable solution.
Moreover, as our group became more familiar with Qt and its process-forking
abilities, the initial motivation for the use of Haskell disappeared. In particular, the use
of Haskell would have made data communication between CellSys programs and our
visualization system very fast (it could be completed in main memory). As an alternative
solution, use of the Haskell Foreign Function Interface to communicate with the
visualization system would have provided the same performance. The complexity of inter-
process communication that would result from use of the Haskell FFI made this solution
technically infeasible for implementation in an early prototype. After using the QProcess
class for implementation of the *.avi rendering process and the running of a CellSys
simulation from the visualization GUI, it could be seen that the same process could be
used to capture simulation frame data from CellSys programs, as long as they wrote the
*.pov frame data to Standard Output instead of the hard drive. This would involve very few
changes to both the simulation program and the visualization program but would greatly
increase the speed of running and loading a new simulation. Writing *.pov files to the hard
disk could then be an option only used to render *.avi files.
In general, this project has yielded very positive results. The previous scripting
functionality for producing an *.avi file was adjusted to function more correctly and
automatically. This feature has been integrated into the application’s main window and
can be seamlessly accessed. The user can also easily run a new simulation from the
main window of the application. Previously, this process involved running the Haskell
interpreter, loading the CellSysSemantics file and then running the simulation.
Furthermore, the user had to manually ensure that a directory named “POV” existed in the
current working directory or the simulation would fail. All of this functionality has been
automated in our current prototype.
The current functionality of the interactive visualization is adequate for a prototype,
but would need to be augmented for a final release. In particular, the user should be able
to display multiple viewports that show selected portions of the simulation. Due to time
constraints, this functionality was not implemented. However, to add this functionality, the
interface between the player controls and the current display would not need to be
changed. All that would need to be implemented would be a container class for multiple
viewports, and then the current display could be easily swapped with this new class.
Finally, the performance analysis of the current system (as described above, in the
section, “System Performance, Testing and Evaluation”) is very promising. Simulations of
moderate to large size can be played at very reasonable frame rates. Further data
structure and cell representation optimizations could also further improve performance.
7 Future Work
Additional Features
It would be beneficial for reasons such as verification and a greater degree of
control of simulation parameters to be able to encode videos from not only POV files, but
from the frames rendered from OpenGL as well. This would allow for a small degree of
“interactivity” in the encoded video file as well as give the user more control of what is
displayed and how. Furthermore, the OpenGL simulation only supports the sphere
primitive, and it would be useful to include support for a wider range of primitive objects
found within various kinds of input data, such as cylinders, cubes, etc.
Another possible improvement to the interface would be tabbed simulations. This
would allow the user to open multiple simulations at once, and switch between them using
tabs, similar to tabbed browsing.
More cell-tracking features should be implemented, such as showing instantaneous
velocity vectors for cells that are moving. The implementation currently supports an array
of colors for displaying the cells, but the user has little control over this feature. Part of the
problem is that the input data produced by CellSys contains no uniquely identifying
information for each cell, so it is difficult to determine where a specific cell is from frame to
frame. Some system for identifying the cells should first be implemented.
Optimizations
A more efficient method of data storage could be implemented for the cell
information, such as an STL vector or a comparable class or structure. Also, the entire
simulation is loaded into memory before playback begins. It should be possible to create a
buffering system where the user can access a limited set of features to play the simulation
as it is being loaded. This may require multi-threading, forking, or shared memory of some
nature. Alternatively, a “sliding window” type algorithm could be used to keep only the n
frames above or below the current frame in memory. The algorithm would load the other
frames as the current frame is moved around and discard frames from memory which are
too far away from the current frame. OpenGL display lists could potentially bolster the
performance of the simulation, and the number of polygons per sphere is probably larger
than necessary.
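The “sliding window” idea can be sketched as a small cache that keeps only the frames within n positions of the current frame. This is a minimal illustration, not a design taken from the project: the FrameWindow class, its member names, and the loader callback (which stands in for reading one frame from disk) are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Keeps frames [current - radius, current + radius] in memory and
// discards everything else. 'Loader' is a stand-in for reading one
// frame from a *.pov or *.csm source.
template <typename Frame>
class FrameWindow {
public:
    using Loader = std::function<Frame(std::size_t)>;

    FrameWindow(std::size_t frameCount, std::size_t radius, Loader loader)
        : frameCount_(frameCount), radius_(radius), loader_(std::move(loader)) {}

    // Moves the window to 'current', evicts frames that fell outside it,
    // loads the frames that are missing, and returns the current frame.
    const Frame& seek(std::size_t current) {
        if (current >= frameCount_) {
            throw std::out_of_range("frame index outside simulation");
        }
        const std::size_t lo = current > radius_ ? current - radius_ : 0;
        const std::size_t hi = std::min(frameCount_ - 1, current + radius_);

        for (auto it = cache_.begin(); it != cache_.end();) {
            if (it->first < lo || it->first > hi) {
                it = cache_.erase(it);  // too far from the current frame
            } else {
                ++it;
            }
        }
        for (std::size_t i = lo; i <= hi; ++i) {
            if (cache_.find(i) == cache_.end()) {
                cache_.emplace(i, loader_(i));
            }
        }
        return cache_.at(current);
    }

    std::size_t cachedCount() const { return cache_.size(); }

private:
    std::size_t frameCount_;
    std::size_t radius_;
    Loader loader_;
    std::map<std::size_t, Frame> cache_;
};
```

Stepping forward by one frame then costs a single load and a single eviction, while a large jump reloads the whole window. Memory use is bounded by 2n + 1 frames regardless of simulation length.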
The CellSys language outputs a new frame every time a single cell is updated, so
the frame rate is effectively reduced by a factor equivalent to the number of cells in the
simulation. To compensate for this, a “fast forward” feature was added, which works by
skipping ahead by a number of frames, but this still does not solve the problem of over-
sampling. To fully correct this issue, the CellSys language must be modified, which was
beyond the original scope of this project.
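The frame-skipping approach behind the “fast forward” feature can be illustrated by selecting every k-th frame for display. The function below is a hypothetical sketch, not the project's code. Choosing a stride equal to the cell count would approximate one displayed frame per full round of cell updates, but only if the cell count stays constant, which reproduction in the simulation may violate.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the frames to display when showing only
// every 'stride'-th frame. A stride of 0 is treated as 1.
std::vector<std::size_t> strideIndices(std::size_t frameCount, std::size_t stride) {
    if (stride == 0) {
        stride = 1;
    }
    std::vector<std::size_t> indices;
    std::size_t i = 0;
    while (i < frameCount) {
        indices.push_back(i);
        if (frameCount - i <= stride) {
            break;  // the next step would pass the last frame
        }
        i += stride;
    }
    return indices;
}
```

As the text notes, this only hides the over-sampling at playback time. The redundant frames are still generated, stored, and loaded.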
Display more information
Some representation or indication of the light concentration and placement used
when generating the cell data should be implemented. Also, some effect could be used to
signal when and where a cell reproduces. Finally, some indication of a cell’s current
behavior (moving, lazing, reproducing, etc) could be provided.
Other Improvements
The automatic encoding process is not as seamless as might be desired. The calls
to POV-Ray produce the POV-Ray splash page for every frame in the simulation. If this
could be somehow suppressed, it would produce a better user experience. Rendering
frames from the OpenGL frame buffer, instead of POV-Ray, would also solve this problem.
8 References
[1] POV-Ray, 2005, [Cited 05 Mar 2005], Available at: http://www.povray.org/ [2] POV-Ray File Format, 2005, [Cited 05 Mar 2005], Available at: http://www.povray.org/documentation/view/3.6.1/224/ [3] MPlayer, 2005, [Cited 05 Mar 2005], Available at: http://www.mplayerhq.hu [4] Hugs Online, 2005, [Cited 05 Mar 2005], Available at: http://cvs.haskell.org/Hugs/ [5] W. Harrison, R. Harrison, “Domain Specific Languages for Cellular Interactions,” 26th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society, Volume 4, pp 3019-2022, 2004.
[6] C. Elliott and P. Hudak, “Functional reactive animation.” In Proceedings of ICFP'97: International Conference on Functional Programming, pages 163-173, June 1997. [7] P. Wadler, “The essence of functional programming,” Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 1-13, 1992. [8] C. Elliott, "An embedded modeling language approach to interactive 3D and multimedia animation," IEEE Transactions on Software Engineering, Volume 25, Issue 3, pages 291-308, 1999 [9] A. Pang, D. Stewart, S. Seefried, M.Chakravarty, "Plugging Haskell in," Proceedings of the ACM SIGPLAN workshop on Haskell, pages 10-21, 2004 [10] M. Sage, "FranTk - a declarative GUI language for Haskell," Proceedings of the fifth ACM SIGPLAN international conference on Functional programming, pages 106-117, 2000 [11] B. Mechtly, E. Rooker, K. Mast, “3D rendering with C++ and openGL in undergraduate projects,” Journal of Computing Sciences in Colleges, Volume 17, Issue 1, pages 168-177, 2001 [12] A. Courtnet, H. Nilsson, J. Peterson, "The Yampa arcade," Proceedings of the ACM SIGPLAN workshop on Haskell, pages 7-18, 2003 [13] Hibbard, Bill, “VisBio: a biological tool for visualization and analysis,” ACM SIGGRAPH Computer Graphics, Volume 37, No 2, pp 5 – 7, 2003. [14] Can, Tolga; Wang, Yujun; Wang, Yuan-Fang; Su, Jianwen, “FPV: fast protein visualization
using Java 3D™,” Proceedings of the 2003 ACM symposium on Applied computing, 2003. [15] Slavik, Pavel; Gayer, Marek; Hrdlicka, Frantisek; Kubelka, Ondrej, “Visualization for modeling and simulation: problems of visualization of technological processes,”
Proceedings of the 35th conference on Winter simulation: driving innovation, Session: Modeling methodology, pp 746 – 754, 2003.
[16] Goldman, Jacki; Gullick, William; Bray, Dennis; Johnson, Colin, “Individual-based
simulation of the clustering behaviour of epidermal growth factor receptors,” Proceedings of the 2002 ACM symposium on Applied computing, Session: Applications of spatial simulation of discrete entities, pp 127 – 131, 2002.
Appendices
Not applicable.