


SIMULATION FOR DESIGN


SIMULATION OF BIOLOGICAL SYSTEMS

It is now possible, with DNA arrays, to measure the expression of genes in a living organism. Reconstructing from these data, by way of modeling, the genetic network thus expressed would make it possible to simulate it and, for example, predict the behavior of a cell or the phenotype of an individual, in other words the expression of his genes. Numerical simulation can be seen to be equally crucial in the technological development of the arrays themselves.

(1) See Box 1, in Modeling biological macromolecules.
(2) The set of characters that are observable in an individual. For a cell, this will be the set of manifest characters resulting from the expression of its genes (cell morphology, state of health of the cell, etc.).

Micam® multiplexed arrays(128-spot format), on siliconwafer (right foreground) andin individual assembly (left).

CEA/REA/Allard

Recently, DNA(1) arrays (see Box 1) have allowed simultaneous measurement of the expression of thousands of genes in a cell or a tissue (Figure 1). For instance, in bakers' yeast, it is now possible to analyze the expression of the entire genome (about 6,300 genes); in humans, simultaneous analysis is achieved for 20,000 genes, i.e. some two thirds of the genome. This mass of information could enable a better understanding of the architecture of biochemical or genetic networks within a cell (Figure 2) and the logic of their regulating mechanisms, which ultimately specify a cell's behavior or the production of a phenotype.(2) It is thus possible to measure the differential expression of a major proportion of the genes and to compare, for instance, cancerous cells and healthy cells. If it is possible to deduce the underlying gene network from these data, through a modeling stage, this network can be simulated and the cell's state (cancerous or healthy) may be predicted according to the degree of expression of the main regulating gene.

While it is still important to study genes and proteins(1) individually, it is now becoming possible, and useful, to investigate the structure and dynamics of biological systems by way of global approaches. Since a genetic network is more than an assembly of genes and proteins, merely investigating the architecture of connections will not be sufficient for an understanding of all its properties. Evolution of the system over time is just as important, and the components' interaction dynamics must also be examined.


Figure 1. Expression profile for the complete genome of bakers' yeast obtained with a DNA array. Each spot corresponds to one gene. Induced genes (whose expression is activated in a given biological condition) show up in red, repressed genes (whose expression is reduced) in green, and invariant genes in yellow.
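By way of illustration of how such two-color measurements are usually turned into the red/green/yellow classification of the caption above, the short Python sketch below computes a log2 ratio for each spot and applies a simple two-fold threshold. It is only a generic sketch: the intensity values, the gene names and the threshold are invented for the example and are not taken from the study described here.

    import math

    def classify_spot(test_intensity, reference_intensity, threshold=1.0):
        """Classify one spot from its two channel intensities.

        A log2 ratio above +threshold is called 'induced' (red),
        below -threshold 'repressed' (green), otherwise 'invariant' (yellow).
        The two-fold cutoff (threshold=1.0 in log2 units) is only illustrative.
        """
        ratio = math.log2(test_intensity / reference_intensity)
        if ratio > threshold:
            return "induced"
        if ratio < -threshold:
            return "repressed"
        return "invariant"

    # Hypothetical fluorescence intensities for three spots (arbitrary units).
    spots = {"geneA": (5200.0, 1250.0), "geneB": (480.0, 2100.0), "geneC": (980.0, 1010.0)}
    for gene, (test, ref) in spots.items():
        print(gene, classify_spot(test, ref))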

Box 1. What is a DNA chip?

A DNA chip, or DNA array, or biochip (or GeneChip) is a device allowing in principle to detect the presence of a strand of DNA – the molecule serving as support for the genetic information in all living beings – by pairing this strand with its complementary, the so-called probe, fixed on the chip. Its principle, described at the end of the 1980s, is based on the DNA's property of hybridization, in other words the ability of the bases in one strand of this DNA (thymine, adenine, cytosine and guanine) to recognize spontaneously complementary bases (thymine and adenine on the one hand, cytosine and guanine on the other) and pair together like the two parts of a zip fastener, thus forming a double helix. The chip can thus identify a given sequence of nucleotides, i.e. the chaining of the bases in a DNA fragment, by bringing it into the presence of other strands whose sequence is known. The technique can be used in many applications, from diagnosis and screening of new medicines and their action sites, to detection of contaminants and pollutants, through genomics research (the study of mutations, in particular). A DNA chip allows very fast visualization of differences in expression among genes, even at the scale of a complete genome. DNA chips nowadays have the ability to identify in a single operation as many as tens of thousands, or even hundreds of thousands, of DNA samples.

While the principle is simple, practical implementation involves a combination of advanced technologies in the field of microelectronics, nucleic acid chemistry, image analysis and bio-informatics. To fabricate the chip, the DNA fragments are amplified by polymerization through the polymerase chain reaction (PCR) technique and are then bound (by means of electrostatic interactions) on a glass, polymer, silicon or metal substrate to create hybridization sites. During fabrication of the probe, the sample DNA is labeled with a fluorophore. Once hybridization is effected, each spot is excited by laser, the emitted fluorescence (the tell-tale sign of hybridization) being detected through a fluorescence microscope. Data analysis actually consists in analysis of an image, allowing not only location, but also quantification of the fluorescent signal emitted by each DNA fragment.

CEA, for its part, has developed, together with the CIS bio International company, the so-called Micam® (Multiplexed Integrated Chip for Advanced Microanalysis) electrochemical addressing technique,(2) in which the chip comprises a silicon substrate overlaid with a gold microelectrode at each spot. This technology, intended for the high-performance chip market, is to be marketed by the Apibio company (set up in 2001 by CEA and bioMérieux), which has exclusive rights to it.

(1) See Box 1, in Modeling biological macromolecules.
(2) See note 3 in the main text.


Two basic principles

Understanding of a genetic network is based on two principles.

The first concerns the genetic information flow, which consists in specifying the relations existing from sequence space to function space. The genome contains information that may serve to construct "objects" as complex as… a human. In terms of information, the complexity of a fully developed organism is held in the complexity of its genome. But what are the codes that translate a sequence into structure and function? These codes must be represented in an understandable form if they are to be implemented for the construction of models. Biologists are therefore seeking methods making it possible to find them, when starting from the gene sequence and activity data gained by means of DNA arrays.
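A very simple way of beginning such a search, given only expression profiles measured under several conditions, is to score how strongly pairs of genes co-vary. The sketch below is a deliberately naive illustration of that idea, not a method attributed to the authors; the expression profiles, gene names and correlation cutoff are invented.

    def pearson(x, y):
        """Pearson correlation coefficient of two equal-length profiles."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (vx * vy)

    # Hypothetical expression profiles measured in four conditions.
    profiles = {
        "geneA": [1.0, 2.1, 3.9, 8.2],
        "geneB": [0.9, 2.0, 4.2, 7.8],   # co-varies with geneA
        "geneC": [5.0, 4.8, 5.1, 4.9],   # roughly constant
    }

    CUTOFF = 0.9  # illustrative threshold
    genes = list(profiles)
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if abs(r) > CUTOFF:
                print(f"candidate link: {g1} -- {g2} (r = {r:.2f})")

Real reconstruction methods go much further (mutual information, Bayesian networks, perturbation experiments), but scoring pairwise dependencies of this kind is a common starting point.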




Figure 2. Still highly incomplete, this "subway map"-like diagram brings together the entire set of genes so far known to be involved in cancerization of a cell, together with their interactions (cell cycle, replication phase, apoptosis, DNA damage, growth factors, nutritive sensors, maintenance of telomeres, oncogenesis, tumor suppressors, activation and inhibition steps, etc.). It is probable that genomics will enable discovery of many others, whose function is as yet unknown. One of the goals at stake in the simulation of genetic networks is to place these genes on this network and determine the nature of their interactions with the rest of the network (map established by William C. Hahn, Robert A. Weinberg and Claudia Bentley, and reproduced with the authorization of Nature Reviews Cancer, Vol. 2, No. 5, May 2002; © 2002 Macmillan Magazines Ltd).

(3) Genetic network based on Boolean logic, allowing considerable simplification of the interactions between genes. Each gene is considered as a binary variable (1 for an expressed gene, 0 otherwise) regulated by other genes according to Boolean functions.

The second principle is investigation of complex dynamic systems. Once the architecture of a genetic network is understood, it becomes possible to investigate its dynamics, in other words the system's behavior over time under a variety of conditions. This problem is approached by measuring the variations over time of the genes' expression. The aim is to have the ability to predict the attractor of a genetic network, i.e. the stable phenotype of a cell, so that this network may be directed, possibly some day, towards a chosen attractor: from cancerous cell to benign cell, from aging cell to juvenile cell.
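The notion of attractor can be made concrete on a toy example. The sketch below defines a hypothetical two-gene Boolean network in which each gene represses the other, then follows every possible initial state until a state repeats, which identifies the fixed points and cycles that play the role of stable phenotypes. The update rules are invented for the illustration.

    from itertools import product

    def step(state):
        """Synchronous update of a hypothetical two-gene toggle switch:
        each gene represses the other. Purely illustrative rules."""
        a, b = state
        return (not b, not a)

    def attractor(start):
        """Iterate from 'start' until a state repeats; return the cycle reached."""
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        return tuple(seen[seen.index(state):])

    for s in product((False, True), repeat=2):
        cyc = attractor(s)
        kind = "fixed point" if len(cyc) == 1 else f"cycle of length {len(cyc)}"
        print(f"start {s} -> {kind} {cyc}")

In this toy case there are two fixed points (one gene on, the other off) and one oscillating cycle; directing a real network towards a chosen attractor would mean finding perturbations that move the system into the basin of the desired one.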

Five parameters to take on board

The dynamic analysis of a genetic network requires models to be created. The choice of model is often determined by what question the experimenter seeks to answer and the degree of abstraction he can accept. Five parameters must be taken into account.

Level of biochemical detail

Models may be highly abstract, such as Boolean networks,(3) or conversely very concrete, such as complete biochemical interaction models using kinetic parameters. The former approach enables the analysis of very large systems. The latter corresponds more closely to biochemical reality; however, being more complex, it remains restricted to the study of small systems. The requirement thus arises for methods having the capacity to handle a very large amount of data globally, while offering an acceptable level of detail, without going as far as the exact reaction.



(4) Inhibition of a metabolic synthesis pathway by its final product.
(5) Stabilization, in a living organism, of the various physiological constants, in terms of chemical composition, temperature and volume.


Boolean or continuous?

Boolean models are based on the principle that gene activation or repression response is quite abrupt. Now, gene expression tends to be on a continuum rather than binary. Moreover, essential concepts for gene regulation mechanisms cannot be represented by Boolean variables. For example, retroinhibition(4) stabilizes a network by allowing control of a protein's or a gene's homeostasis,(5) and reduces sensitivity to external variations. In a Boolean circuit, retroinhibition will cause oscillation, rather than increased stability.
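The contrast drawn in this paragraph can be reproduced in a few lines. In the sketch below, with invented rate constants, a gene whose product represses its own expression oscillates for ever when treated as a Boolean variable, whereas a continuous description of the same feedback settles to a steady level, i.e. homeostasis.

    # Boolean self-repression: the gene switches off when its product is present.
    g = True
    boolean_trace = []
    for _ in range(6):
        boolean_trace.append(g)
        g = not g                      # next state = NOT(current state)
    print("Boolean :", boolean_trace)  # alternates True/False for ever

    # Continuous self-repression: dx/dt = k / (1 + (x/K)**n) - d * x
    # (Hill-type repression; k, K, n, d are invented constants.)
    k, K, n, d = 1.0, 0.5, 4, 1.0
    x, dt = 0.0, 0.01
    for _ in range(5000):
        dxdt = k / (1.0 + (x / K) ** n) - d * x
        x += dt * dxdt                 # explicit Euler integration
    print("Continuous steady state ~", round(x, 3))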

Deterministic or stochastic?

An implicit assumption of continuous models is that the fluctuations of a single molecule may be ignored. However, there are many examples in genetic networks showing that the presence of a single copy of messenger RNA (mRNA)(1) can play an essential role in certain biological processes that cannot then be modeled by means of purely deterministic models. Analysis of mRNA, and protein, abundance and degradation could enable identification of the genes for which stochastic modeling would be necessary.
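Such stochastic descriptions are commonly handled with Gillespie's stochastic simulation algorithm, of which the sketch below is a textbook-style illustration (the transcription and degradation rates are invented, not parameters from the article). It follows the copy number of a single mRNA species whose average is only a few molecules, exactly the regime in which deterministic equations become questionable.

    import random

    def gillespie_mrna(k_tx=0.5, k_deg=0.1, t_end=200.0, seed=1):
        """Exact stochastic simulation of transcription (0 -> mRNA, rate k_tx)
        and degradation (mRNA -> 0, rate k_deg * n). Returns the trajectory."""
        rng = random.Random(seed)
        t, n = 0.0, 0
        trajectory = [(t, n)]
        while t < t_end:
            a_tx, a_deg = k_tx, k_deg * n
            a_total = a_tx + a_deg
            t += rng.expovariate(a_total)          # time to next reaction
            if rng.random() * a_total < a_tx:      # choose which reaction fires
                n += 1
            else:
                n -= 1
            trajectory.append((t, n))
        return trajectory

    traj = gillespie_mrna()
    counts = [n for _, n in traj]
    print("final copy number:", counts[-1])
    print("mean over trajectory:", sum(counts) / len(counts))  # roughly k_tx / k_deg = 5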

The spatial dimension

The spatial dimension can play a major role as regards intracellular compartmentalization (nucleus, cytoplasm, mitochondria, membrane), but equally for interactions between cells. Most biological processes in multicellular organisms (during development, for instance) require interactions between various types of cells. The space concept adds a considerable degree of complexity to models. Some information may be extracted from "non-spatial" models, but it will eventually be necessary to develop models that include this dimension.

Data availability

An exhaustive biological model should take into account RNA concentration, but equally protein concentration, its location, etc., since each molecular variable carries some unique information on cell functioning. The technological limitations of measurement mean obtaining this information is a complicated affair, even if constraints and redundancy in biological networks do suggest a possible way of understanding the operation of a biological system without modeling all of its parameters. Eventually, it will be important to develop innovative high-throughput measurement technology for the molecular parameters mentioned above.

Of course, there is no point in simulating genetic networks unless this allows predictions to be made as to biological processes and the evolution of diseases, and ultimately the development of effective therapies for the benefit of patients. ●

Xavier Gidrol
Évry Génopole (Essonne département)

Photonic modeling of DNA chips

In the design of DNA chips (or DNA arrays, as they are also called by biologists), the complexity of the sensors' optical and chemical architecture imposes a major numerical simulation phase, centered on the fluorescent labeling technique.

On DNA chips (Box 1), recognition of the hybridization (Box 1) of two complementary strands is effected by fluorescent labeling(1) (Box 2). This highly sensitive detection method is widely used in biology and analytical chemistry. In the context of DNA chip technological development, the reproducibility and quantity of the signals emitted by the spots is a factor to be taken on board during the design phase, and in chip readout. This concern is of paramount importance, in that analysis of the amount of fluorescence light emitted by each spot on the biochip must be related to the number of hybridized DNA strands, in order to determine the expression rate of the genes to be analyzed. Although an analytical model allows an understanding of the physics of the problem, the complexity of the optical and chemical architecture of the sensors to be designed means a major numerical simulation phase is necessary (see Box A, What is a numerical simulation?). This derives its information from measurement campaigns carried out with specific tools, to ensure availability of reliable, sensitive products that can be made industrially.

Box 2. Fluorescence

The principle of fluorescence is simple: a molecule (a fluorophore) absorbs a photon with a given energy and then re-emits, usually very quickly, a photon of lower energy. This energy loss is accounted for by induced thermal agitation or by alteration to the structure of the molecule. Consequently, the color of the emission light is different from that of the excitation: this enables labeling, by separation of these colors by means of an appropriate optical device (the simplest example being a prism). Fluorescence detection is highly sensitive, since, thanks to advances made in recent years with detectors and image capture devices, it is possible to make out a single molecule. Unfortunately, fluorescence is extremely sensitive to parameters such as the fluorophores' chemical environment, which can extinguish fluorescence (this phenomenon is generally known as "quenching"), or excessive exposure to excitation light, which may destroy the tags (the "photobleaching" phenomenon).

(1) Addition of a chemical group or a radioactive atom to a molecule in order to monitor it and locate it through this tag.
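The energy bookkeeping described in Box 2 amounts to the relation E = hc/λ between a photon's energy and its wavelength. The small sketch below evaluates it for one illustrative pair of excitation and emission wavelengths (the 532 nm and 570 nm values are assumptions chosen for the example, not data from the article).

    # Photon energy E = h*c/lambda; the emitted photon carries less energy
    # than the absorbed one (Stokes shift). Wavelengths below are illustrative.
    h = 6.626e-34   # Planck constant, J*s
    c = 2.998e8     # speed of light, m/s
    e = 1.602e-19   # J per electronvolt

    excitation_nm, emission_nm = 532.0, 570.0
    for name, lam_nm in (("excitation", excitation_nm), ("emission", emission_nm)):
        energy_ev = h * c / (lam_nm * 1e-9) / e
        print(f"{name}: {lam_nm:.0f} nm -> {energy_ev:.2f} eV")
    # The difference (~0.15 eV here) is dissipated as heat or structural change.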

An analytical model to understand trends

The aim is to model, in the simplest and most relevant fashion, the behavior of fluorescence on plane surfaces composed of a substrate (microscope slide, silicon wafer(2)) that may be overlaid with thin mineral (e.g. silica) or organic coatings. Such coatings have the purpose, in particular, of ensuring good DNA binding onto the surface of the "biochip." The goal is to identify the parameters involved, in the first order, in the optical behavior of biochips. For that purpose, fluorophores (Box 1 and Box 2) are assimilated to small light sources with the property of having a particular orientation: dipoles. Light exciting the fluorescent tag induces oscillation in the electron cloud, and the direction of this oscillation is imposed by the molecule's atomic structure. Coupling between source and surface is processed analytically through a formalism based on an electromagnetic approach to light radiation: this dipole may be assimilated to an antenna emitting close to the ground, as the German physicist Arnold Sommerfeld did, some… 90 years ago! This preliminary study enabled researchers to decouple light emission phenomena related to the biochip's optical and geometric characteristics from effects related to its chemical and biological environment. Observation of the signals emitted by biochips fabricated on glass or silicon was able to corroborate this.
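A very reduced version of such an analytical model is sketched below: the fluorophore is treated as a point source whose directly emitted field interferes with the field reflected by the substrate, so that the detected intensity oscillates with the thickness of the coating. The reflection coefficient, refractive index, wavelength and thicknesses are invented illustrative values; a genuine Sommerfeld-type treatment would handle dipole orientation, polarization and absorption far more carefully.

    import cmath, math

    def emitted_intensity(spacer_nm, wavelength_nm=600.0, n_layer=1.46, r=-0.7):
        """Relative intensity emitted normal to the surface by a fluorophore
        sitting on top of a spacer layer over a reflective substrate.

        The direct wave and the wave reflected by the substrate (amplitude
        reflection coefficient r, assumed real and constant here) add
        coherently after an extra optical path of twice the spacer thickness.
        """
        phase = 4.0 * math.pi * n_layer * spacer_nm / wavelength_nm
        field = 1.0 + r * cmath.exp(1j * phase)
        return abs(field) ** 2

    # Intensity versus silica spacer thickness: maxima and minima alternate,
    # which is one reason signal levels depend on the biochip's layer stack.
    for d in range(0, 401, 50):
        print(f"{d:3d} nm spacer -> relative intensity {emitted_intensity(d):.2f}")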

Predicting signals on complex systems

The technological evolution of biochips is tending more and more markedly towards complex microsystems. Ever-increasing demand for miniaturization, and integration of such functions as spot addressing by CMOS(3) transistors (Micam® technology), is pushing this through. Understanding and predicting fluorescence signals on such components becomes a task significantly more difficult than for conventional chips. Investigation of fluorescence characteristics in the vicinity of many materials on structured surfaces, owing to the complexity of the problem, calls for a numerical approach. To meet this requirement, CEA researchers have produced a simulation tool allowing the photonic balance sheet of a biochip to be drawn up: they resolve, with no approximations, the electromagnetism equations describing the interaction of the fluorophore, still assimilated to a dipole, with any environment.


Figure 1. Analysis of fluorescence emitted on a Lightscan® track. The figure at left shows the intensity of fluorescent light (logarithmic scale) in the structure with the geometry represented at right; both maps are plotted against x and y coordinates graduated in micrometers.

(2) Thin slice of material (usually silicon) on which integrated circuits and other microelectronic components and devices are fabricated before they are cut out.
(3) On Micam®-type biochips, to each contact stud is associated a CMOS transistor (high-density integrated circuit using both N- and P-type transistors, allowing them to use practically no energy when not switching). It is then possible to activate them and, through a phenomenon similar to an electrodeposition reaction, to position specifically on these studs the required DNA strands.



Figure 2. Comparison between theory and experiment for the intensity of light emitted by DNA biochips (fabricated on a silicon substrate coated with a thin layer of silica a few hundred nanometers thick) as a function of measurement angle. The four panels (Ipp, Iss, Ips, Isp, in arbitrary units) are plotted against the detector angle θ from 0° to 90°, with experiment and simulation overlaid. The curves correspond to an analysis of fluorescent light polarization along two orthogonal directions ("s" and "p") after illuminating the biochip under light exhibiting rectilinear polarization along these two directions.

Application to a concrete technological case

One major application of this tool was modeling the behavior of fluorescence on Lightscan® biochips jointly developed by the bioMérieux company and CEA's Electronics and Information Technology Laboratory (Leti). This system boils down to producing a biochip scanner inspired… by a compact disk player, a solution which allows cost reductions by a factor of 5–10 compared to a conventional device. Like the compact disk, the biochips must present tracks, so as to allow very exact positioning of the pickup head as the chip is traversed. Scientists have thus analyzed the behavior of fluorescence on the tracks of these biochips, which thus feature a structuring of the glass (a few tens of nanometers deep over a width of several microns) on which a thin layer, a few tens of nanometers thick, is deposited (Figure 1). Photonic simulation shows that light emitted by the fluorophore is trapped in the thin layer in the form of guided waves (as in optical fibers) and partly released at the level of the structures.

(4) A goniofluorimeter (neologism) serves to determine fluorescent light intensity for a given direction in space. The device developed at CEA–Leti also allows analysis of the polarization state of fluorescent light.
(5) The theory of polarization allows a description of the orientation (in terms of amplitude and variation over time) of the electric field vector as it describes a light vibration. In the case of fluorescence, a study of light polarization provides a means of analyzing phenomena such as the molecule's change in structure between excitation and re-emission of light.

An indispensable experimental approach

Modeling alone cannot guarantee mastery of technological processes. It is indispensable to characterize their outcomes with a dedicated metrological tool. Development of a goniofluorimeter(4) (Figure 2) allowed, in this case, validation of the simulation work. Critical examination of the radiation patterns measured under fluorescence, taking into account the polarization effects of light,(5) led to a reconsideration of the analytical approach and improved modeling. Researchers now have an optical characterization platform enabling them to consider a quantification approach and to suggest fluorescence standards. There still remains for them to take on board energy-coupling phenomena between the fluorophore and incident radiation, and environmental phenomena such as "photobleaching" or "quenching" (Box 2). ●

Stéphane Gétin
Technological Research Division
CEA Grenoble Center


Box A. What is a numerical simulation?

Numerical simulation consists in reproducing, through computation, a system's operation, described at a prior stage by an ensemble of models. It relies on specific mathematical and computational methods. The main stages involved in carrying out an investigation by means of numerical simulation are practices common to many sectors of research and industry, in particular nuclear engineering, aerospace or automotive.

At every point of the "object" considered, a number of physical quantities (velocity, temperature…) describe the state and evolution of the system being investigated. These are not independent, being linked and governed by equations, generally partial differential equations. These equations are the expression in mathematical terms of the physical laws modeling the object's behavior. Simulating the latter's state is to determine – at every point, ideally – the numerical values for its parameters. As there is an infinite number of points, and thus an infinite number of values to be calculated, this goal is unattainable (except in some very special cases, where the initial equations may be solved by analytical formulae). A natural approximation hence consists in considering only a finite number of points. The parameter values to be computed are thus finite in number, and the operations required become manageable, thanks to the computer. The actual number of points processed will depend, of course, on computational power: the greater the number, the better the object's description will ultimately be. The basis of parameter computation, as of numerical simulation, is thus the reduction of the infinite to the finite: discretization.

How exactly does one operate, starting from the model's mathematical equations? Two methods are very commonly used, being representative, respectively, of deterministic computation methods, resolving the equations governing the processes investigated after discretization of the variables, and methods of statistical or probabilistic calculus.

The principle of the former, known as the finite-volume method, dates from before the time of computer utilization. Each of the object's points is simply assimilated to a small elementary volume (a cube, for instance), hence the finite-volume tag. The object – a plasma, for instance – is thus considered as a set, or lattice, of contiguous volumes which, by analogy with the makeup of netting, will be referred to as a mesh. The parameters for the object's state are now defined in each mesh cell. For each one of these, by reformulating the model's mathematical equations in terms of volume averages, it will then be possible to build up algebraic relations between the parameters for one cell and those of its neighbors. In total, there will be as many relations as there are unknown parameters, and it will be up to the computer to resolve the system of relations obtained. For that purpose, it will be necessary to turn to the techniques of numerical analysis, and to program specific algorithms.

The rising power of computers has allowed an increasing fineness of discretization, making it possible to go from a few tens of cells in the 1960s to several tens of thousands in the 1980s, through to millions in the 1990s, and up to some ten billion cells nowadays (Tera machine at CEA's Military Applications Division), a figure that should increase tenfold by the end of the decade.
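As a miniature of the finite-volume idea, the sketch below divides a one-dimensional bar into a small number of cells and updates the average temperature of each cell from the exchanges with its two neighbors. It is a generic textbook discretization of the heat equation with invented values, not an excerpt from any CEA code.

    def heat_1d(n_cells=20, alpha=1.0, dx=0.05, dt=0.001, steps=200):
        """Explicit finite-volume update of dT/dt = alpha * d2T/dx2 on a bar
        whose ends are held at 1.0 and 0.0 (illustrative boundary values)."""
        T = [0.0] * n_cells
        for _ in range(steps):
            new = T[:]
            for i in range(n_cells):
                left = T[i - 1] if i > 0 else 1.0          # boundary value on the left
                right = T[i + 1] if i < n_cells - 1 else 0.0
                # net flux balance over the cell, divided by the cell size
                new[i] = T[i] + alpha * dt / dx**2 * (left - 2.0 * T[i] + right)
            T = new
        return T

    profile = heat_1d()
    print(" ".join(f"{t:.2f}" for t in profile))  # approaches a linear profile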

Example of an image from a 2D simulation of instabilities, carried out with CEA's Tera supercomputer. Computation involved adaptive meshing, featuring finer resolution in the areas where processes are at their most complex. (CEA)


A refinement of meshing, adaptive remeshing, consists in adjusting cell size according to conditions, for example by making them smaller and more densely packed at the interfaces between two environments, where physical processes are most complex, or where variations are greatest. The finite-volume method can be applied to highly diverse physical and mathematical situations. It allows any shape of mesh cell (cube, hexahedron, tetrahedron…), and the mesh may be altered in the course of computation, according to geometric or physical criteria. Finally, it is easy to implement in the context of parallel computers (see Box B, Computational resources for high-performance numerical computation), as the mesh may be subjected to partitioning for the purposes of computation on this type of machine (example: Figure B). Also included in this same group are the finite-difference method, a special case of the finite-volume method where cell walls are orthogonal, and the finite-element method, where a variety of cell types may be juxtaposed.

The second major method, the so-called Monte Carlo method, is particularly suited to the simulation of particle transport, for example of neutrons or photons in a plasma (see Simulations in particle physics). This kind of transport is in fact characterized by a succession of stages, where each particle may be subject to a variety of events (diffusion, absorption, emission…) that are possible a priori. Elementary probabilities for each of these events are known individually, for each particle. It is then a natural move to assimilate a point in the plasma to a particle. A set of particles, finite in number, will form a representative sample of the infinity of particles in the plasma, as for a statistical survey. From one stage to the next, the sample's evolution will be determined by random draws (hence the method's name). The effectiveness of the method, implemented at Los Alamos as early as the 1940s, is of course dependent on the statistical quality of the random draws. There are, for just this purpose, random-number methods available, well suited to computer processing.
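The sketch below gives the flavor of such a Monte Carlo transport calculation: each simulated particle travels an exponentially distributed distance between collisions and is either absorbed or scattered according to fixed elementary probabilities. The slab thickness, mean free path and absorption probability are invented illustrative numbers.

    import random

    def transmitted_fraction(n_particles=100_000, slab_thickness=3.0,
                             mean_free_path=1.0, absorption_prob=0.3, seed=42):
        """Estimate the fraction of particles crossing a 1D slab.

        Each particle advances by an exponentially distributed step; at every
        collision it is absorbed with probability absorption_prob, otherwise it
        is scattered and continues forward or backward with equal probability.
        """
        rng = random.Random(seed)
        transmitted = 0
        for _ in range(n_particles):
            x, direction = 0.0, +1
            while True:
                x += direction * rng.expovariate(1.0 / mean_free_path)
                if x >= slab_thickness:       # escaped through the far side
                    transmitted += 1
                    break
                if x <= 0.0:                  # leaked back out of the slab
                    break
                if rng.random() < absorption_prob:
                    break                     # absorbed inside the slab
                direction = rng.choice((-1, +1))
        return transmitted / n_particles

    print(f"transmitted fraction ~ {transmitted_fraction():.3f}")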

Finite-volume and Monte Carlo methods have been, and still are, the occasion for many mathematical investigations. These studies are devoted, in particular, to narrowing down these methods' convergence, i.e. the manner in which approximation precision varies with cell or particle number. This issue arises naturally when confronting results from numerical simulation with experimental findings.

3D simulation carried out with the Tera supercomputer, set up at the end of 2001 at CEA's DAM–Île de France Center, at Bruyères-le-Châtel (Essonne département). (CEA)

How does a numerical simulation proceed?

Reference is often made to numerical experiments, to emphasize the analogy between performing a numerical simulation and carrying out a physical experiment. In short, the latter makes use of an experimental setup, configured in accordance with initial conditions (for temperature, pressure…) and control parameters (duration of the experiment, of measurements…). In the course of the experiment, the setup yields measurement points, which are recorded. These records are then analyzed and interpreted.

In a numerical simulation, the experimental setup consists in an ensemble of computational programs, run on computers. The computation codes, or software programs, are the expression, via numerical algorithms, of the mathematical formulations of the physical models being investigated. Prior to computation, and subsequent to it, environment software programs manage a number of complex operations for the preparation of computations and analysis of the results. The initial data for the simulation will comprise, first of all, the delineation of the computation domain – on the basis of an approximate representation of the geometric shapes (produced by means of drafting and CAD [computer-assisted design] software) – followed by discretization of this computation domain over a mesh, as well as the values for the physical parameters over that mesh, and the control parameters to ensure proper running of the programs… All these data (produced and managed by the environment software programs) will be taken up and verified by the codes. The actual results from the computations, i.e. the numerical values for the physical parameters, will be saved on the fly. In fact, a specific protocol will structure the computer-generated information, to form it into a numerical database.

A complete protocol organizes the electronic exchange of required information (dimensions, in particular) in accordance with predefined formats: modeler,(1) mesher,(2) mesh partitioner, computation codes, visualization and analysis software programs. Sensitivity studies regarding the results (sensitivity to meshes and models) form part of the numerical "experiments." On completion of computation (numerical resolution of the equations describing the physical processes occurring in each cell), analysis of the results by specialists will rely on use of the numerical database. This will involve a number of stages: selective extraction of data (according to the physical parameter of interest) and visualization, and data extraction and transfer for the purposes of computing and visualizing diagnostics. This parallel between performing a computation case for a numerical experiment and carrying out a physical experiment does not end there: the numerical results will be compared to the experimental findings. This comparative analysis, carried out on the basis of standardized quantitative criteria, will make demands on both the experience and skill of engineers, physicists, and mathematicians. It will result in further improvements to physical models and simulation software programs.

(1) The modeler is a tool enabling the generation and manipulation of points, curves and surfaces, for the purposes, for example, of mesh generation.
(2) The geometric shapes of a mesh are described by sets of points connected by curves and surfaces (Bézier curves and surfaces, for instance), representing its boundaries.


The example of a thermalhydraulics computation

Implementation of a numerical simulation protocol may be illustrated by the work carried out by the team developing the thermalhydraulics computation software Trio U. This work was carried out in the context of a study conducted in collaboration with the French Radiological Protection and Nuclear Safety Institute (IRSN: Institut de radioprotection et de sûreté nucléaire). The aim was to obtain very accurate data to provide engineers with wall heat-stress values for the components of a pressurized-water reactor in case of a major accident involving turbulent natural circulation of hot gases. This investigation requires simultaneous modeling of large-scale "system" effects and of small-scale turbulent processes (see Box F, Modeling and simulation of turbulent flows).

This begins with specification of the overall computation model (Figure A), followed by production of the CAD model and corresponding mesh with commercial software programs (Figure B). Meshes of over five million cells require use of powerful graphics stations. In this example, the mesh for a steam generator (Figures C and D) has been partitioned to parcel out computation over eight processors on one of CEA's parallel computers: each color stands for a zone assigned to a specific processor. The computations, whose boundary conditions are provided by way of a "system" computation (Icare–Cathare), yield results which it is up to the specialists to interpret. In this case, visualization on graphics stations of the instantaneous values of the velocity field shows the impact of a hot plume on the steam generator's tube-plate (section of the velocity field, at left on Figure E), and instantaneous temperature in the water box (at right).

Figure A. Overall computation domain, including part of the reactor vessel (shown in red), the outlet pipe (hot leg, in light blue), steam generator (dark blue), and pressurizer (green).


Bruno Scheurer
Military Applications Division
CEA DAM–Île de France Center

Frédéric Ducros and Ulrich Bieder
Nuclear Energy Division
CEA Grenoble Center


Figures C and D. Mesh for the steam generator, partitioned to parcel out computation over eight processors; each color stands for the zone assigned to one processor.

Figure E. Section of the instantaneous velocity field (left) and instantaneous temperature in the water box (right); the color scale runs from about 1.10 × 10³ to 1.20 × 10³.

Figure B. CAD model of the hot leg of the reactor vessel outlet (left) and unstructured mesh for it (right).


Box B. Computational resources for high-performance numerical simulation

Carrying out more accurate numerical simulations requires the use of more complex physical and numerical models applied to more detailed descriptions of the simulated objects (see Box A, What is a numerical simulation?). All this requires advances in the area of simulation software, but also a considerable increase in the capacity of the computer systems on which the software runs.

Scalar and vector processors

The key element of the computer is the processor, which is the basic unit that executes a program to carry out a computation. There are two main types of processors: scalar processors and vector processors. The former type carries out operations on elementary (scalar) numbers, for instance the addition of two numbers. The second type carries out operations on arrays of numbers (vectors), for example adding elementwise the numbers belonging to two sets of 500 elements. For this reason, vector processors are particularly well suited to numerical simulation: when executing an operation of this type, a vector processor can operate at a rate close to its maximum (peak) performance. The same operation with a scalar processor requires many independent operations (operating on one vector element at a time), executed at a rate well below its peak rate. The main advantage of scalar processors is their price: these are general-purpose microprocessors whose design and production costs can be written down across broad markets.
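The difference between the two ways of working can be felt even on an ordinary PC with NumPy, whose whole-array operations stand in here for vector instructions. The array size is arbitrary and the timings will of course depend on the machine; this is an illustration of the principle, not a benchmark.

    import time
    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # "Scalar" style: one element at a time, as a general-purpose loop would do.
    t0 = time.perf_counter()
    c_loop = [a[i] + b[i] for i in range(len(a))]
    t_loop = time.perf_counter() - t0

    # "Vector" style: one operation applied to whole arrays at once.
    t0 = time.perf_counter()
    c_vec = a + b
    t_vec = time.perf_counter() - t0

    assert np.allclose(c_loop, c_vec)   # same result, very different cost
    print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.4f} s")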

Strengths and constraints of parallelism

Recent computers achieve high performance partly by using a higher operating frequency, partly by trying to carry out several operations simultaneously: this is a first level of parallelism. The speeding up in frequency is bounded by developments in microelectronics technology, whereas interdependency between the instructions to be carried out by the processor limits the amount of parallelism that is possible. Simultaneous use of several processors is a second level of parallelism allowing better performance, provided programs able to take advantage of this are available. Whereas parallelism at processor level is automatic, parallelism between processors in a parallel computer must be taken into account by the programmer, who has to split his program into independent parts and make provisions for the necessary communication between them. Often, this is done by partitioning the domain on which the computation is done. Each processor simulates the behavior of one domain, and regular communications between processors ensure consistency for the overall computation. To achieve an efficient parallel program, a balanced share of the workload must be ensured among the individual processors and efforts must be made to limit communication costs.
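The sketch below mimics, in plain serial Python, the bookkeeping such a parallel program has to do: a one-dimensional domain is split into two sub-domains, and before every update each "processor" copies the edge value of its neighbor into a ghost cell. In a real code the two updates would run on different processors and the ghost exchange would be a message; the splitting into exactly two parts and all numerical values are illustrative.

    def exchange_ghosts(left, right):
        """Each sub-domain receives its neighbor's edge value into a ghost cell.
        In a real parallel code this would be a message between processors."""
        left_ghost_right = right[0]
        right_ghost_left = left[-1]
        return left_ghost_right, right_ghost_left

    def update(cells, ghost_left, ghost_right, coeff=0.4):
        """One explicit diffusion step on one sub-domain, using ghost values."""
        padded = [ghost_left] + cells + [ghost_right]
        return [padded[i] + coeff * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
                for i in range(1, len(padded) - 1)]

    # Two sub-domains of a bar whose left end is held hot (boundary value 1.0).
    left, right = [0.0] * 10, [0.0] * 10
    for _ in range(200):
        g_lr, g_rl = exchange_ghosts(left, right)
        left = update(left, 1.0, g_lr)     # physical boundary on the far left
        right = update(right, g_rl, 0.0)   # physical boundary on the far right
    print(["%.2f" % v for v in left + right])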

The various architectures

A variety of equipment types are used for numerical simulation. From their desktop computer, where they prepare computations and analyze the results, users access shared computation, storage and visualization resources far more powerful than their own. All of these machines are connected by networks, enabling information to circulate between them at rates compatible with the volume of data produced, which can be as much as 1 terabyte (1 TB = 10¹² bytes) of data for one single simulation.

The most powerful computers are generally referred to as supercomputers. They currently attain capabilities counted in teraflops (1 Tflops = 10¹² floating-point operations per second). Currently, there are three main types of supercomputers: vector supercomputers, clusters of mini-computers with shared memory, and clusters of PCs (standard home computers). The choice between these architectures largely depends on the intended applications and uses. Vector supercomputers have very-high-performance processors, but it is difficult to increase their computing performance by adding processors. PC clusters are inexpensive but poorly suited to environments where many users perform numerous large-scale computations (in terms of memory and input/output).

It is mainly for these reasons that CEA's Military Applications Division (DAM) has chosen for its Simulation Program (see The Simulation Program: weapons assurance without nuclear testing) architectures of the shared-memory mini-computer cluster type, also known as clusters of SMPs (symmetric multiprocessing). Such a system uses as a basic building block a mini-computer featuring several microprocessors sharing a common memory (see Figure). As these mini-computers are in widespread use in a variety of fields, ranging from banks to web servers through design offices, they offer an excellent performance/price ratio. These basic "blocks" (also known as nodes) are connected by a high-performance network.

Installed at CEA (DAM–Île de France Center) in December 2001, the Tera machine designed by Compaq (now HP) has for its basic element a mini-computer with 4 × 1-GHz processors sharing 4 GB of memory and giving a total performance of 8 Gflops. These basic elements are interconnected through a fast network designed by Quadrics Ltd. A synchronization operation across all 2,560 processors is completed in under 25 microseconds. The overall file system offers 50 terabytes of storage space for input/output with an aggregate bandwidth of 7.5 GB/s.
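The aggregate figures follow directly from the numbers quoted in this caption; the short calculation below merely rearranges them.

    processors_total = 2560
    processors_per_node = 4
    gflops_per_node = 8.0          # 4 processors at 1 GHz, i.e. 2 operations per cycle each
    nodes = processors_total // processors_per_node
    peak_tflops = nodes * gflops_per_node / 1000.0
    print(nodes, "nodes, peak ~", peak_tflops, "Tflops")   # 640 nodes, ~5.1 Tflops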


Figure. Architecture of an "SMP cluster" type machine. At left, the general architecture: computation nodes and I/O nodes (I/O = input/output) linked by an internal network, with disks shared by all the nodes. At right, the architecture of one node, with four Alpha EV68 processors clocked at 1 GHz, sharing 4 GB of memory and the input/output connections (networks, local drives, etc.).

Parallel computers are well suited to numerical methods based on meshing (see Box A, What is a numerical simulation?), but equally to processing ab initio calculations such as this molecular-dynamics simulation of impact damage to two copper plates moving at 1 km/s (see Simulation of materials). The system under consideration includes 100,000 atoms of copper representing a square-section (0.02 µm square) parallelepiped of normal density. The atoms interact in accordance with an embedded-atom potential over approximately 4–6 picoseconds. The calculation, performed on 18 processors of the Tera supercomputer at Bruyères-le-Châtel using the CEA-developed Stamp software, accounted for some ten minutes of "user" time (calculation carried out by B. Magne). Tests involving up to 64 million atoms have been carried out, requiring 256 processors over some one hundred hours.

The cumulated power of several hundreds of these "blocks" can reach several Tflops: one then speaks of a massively parallel computer. Such power can be made available for one single parallel application using all the supercomputer's resources, but also for many independent applications, whether parallel or not, each using part of the resources.

While the characteristic emphasized to describe a supercomputer is usually its computational power, the input/output aspect should not be ignored. These machines, capable of running large-scale simulations, must have storage systems with suitable capacities and performance. In clusters of SMPs, each mini-computer has a local disk space. However, it is not advisable to use this space for the user files, because it would require the user to move his data explicitly between each distinct stage of his calculation. For this reason, it is important to have disk space accessible by all of the mini-computers making up the supercomputer. This space generally consists in sets of disk drives connected to nodes whose main function is to manage them. Just as for computation, parallelism of input/output allows high performance to be obtained. For such purposes, parallel overall file systems must be implemented, enabling rapid and unrestricted access to the shared disk space.

While they offer considerable computational power, clusters of SMPs nevertheless pose a number of challenges. Among the most important, in addition to programming simulation software capable of using a large number of processors efficiently, is the development of operating systems and associated software tools that are compatible with such configurations and fault-tolerant.

François Robin

Military Applications Division
CEA DAM–Île de France Center



model as regards a particular industrial problem. With the Trio software program developed at CEA, the best turbulence models are systematically tested on exchanger geometries comprising plates and fins whose characteristics are determined experimentally. Two-dimensional and three-dimensional thermalhydraulic simulations of the flow around an isolated strip fin have been carried out, using two simulation tools. First, Trio enabled non-stationary simulations to be carried out, with various LES models. Then, the Fluent commercial computation tool was used to carry out stationary simulations with conventional turbulence models (of the Reynolds-averaged Navier–Stokes [RANS] equation type). These models yield the representative quantities for flow temporal averages.

The simulation results obtained are systematically compared with the literature or with laboratory experimental results: wall pressure, friction and heat-transfer coefficient profiles, global pressure coefficient, and frequency of downstream vortex release. The meshes used are of the order of 100,000 cells for 2D simulations, 1 million for 3D simulations. It is apparent that the numerical results are related to the turbulence models used, but also to their implementation. The results obtained in LES (Figure 1) allow identification of the mechanisms responsible for the increased heat transfer, by geometry-induced turbulence generation. The sequence of images enables visualization of the evolution over time of local mechanisms such as vortex growth, detachment and entrainment.

Industrialization of these methods cannot occur without customer support of the consultancy type, the more so since such methods require ever-increasing specialization. For this purpose, GRETh has developed the software platform concept, which involves making the thermalhydraulic software programs and advanced models available to its industrialists' club, and working in close collaboration with the industrial partner, by giving access to its own resources and associating him as much as possible in tool use and implementation.(1)


Box F. Modeling and simulation of turbulent flows

Turbulence, or disturbance in so-called turbulent flow, develops in most of the flows that condition our immediate environment (rivers, ocean, atmosphere). It also turns out to be one, if not the, dimensioning parameter in a large number of industrial flows (related to energy generation or conversion, aerodynamics, etc.). Thus, it is not surprising that a drive is being launched to achieve prediction for the process – albeit in approximate fashion as yet – especially when it combines with complicating processes (stratification, combustion, presence of several phases, etc.). This is because, paradoxically, even though it is possible to predict the turbulent nature of a flow and even, from a theoretical standpoint, to highlight certain common – and apparently universal – characteristics of turbulent flows,(1) their prediction, in specific cases, remains tricky. Indeed, it must take into account the considerable range of space and time scales(2) involved in any flow of this type.

Researchers, however, are not without resources nowadays when approaching this problem. First, the equations governing the evolution of turbulent flows over space and time (Navier–Stokes equations(3)) are known. Their complete solution, in highly favorable cases, has led to predictive descriptions. However, systematic use of this method of resolution comes up against two major difficulties: on the one hand, it would require complete, simultaneous knowledge of all variables attached to the flow, and of the forced-flow conditions imposed on it,(4) and, on the other hand, it would mobilize computational resources that will remain unrealistic for decades yet.

Figure. Instantaneous (top) and averaged (bottom) temperature field in a mixing situation. The curve shows the history of temperature at one point, fluctuating instantaneous value in blue and mean in red, with temperature in kelvins plotted against time in seconds (according to Alexandre Chatelain, doctoral dissertation) (DEN/DTP/SMTH/LDTA).




The sole option, based on the fluctuating character of the flow due to turbulent agitation, must thus be to define and use average values. One of the most widely adopted approaches consists in looking at the problem from a statistical angle. The mean overall values for velocity, pressure, temperature… whose distribution characterizes the turbulent flow are defined as the principal variables of the flow one then seeks to qualify relative to those mean values. This leads to a decomposition of the motion (the so-called Reynolds decomposition) into mean and fluctuating fields, the latter being the measure of the instantaneous local difference between each actual quantity and its mean (Figure). These fluctuations represent the turbulence and cover a major part of the Kolmogorov spectrum.(1)
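The decomposition can be illustrated on any fluctuating record, such as the temperature history shown in the Figure: subtracting the mean leaves a fluctuating part whose own mean is zero by construction. The synthetic signal below is invented purely for the illustration.

    import math, random

    random.seed(0)
    # Synthetic "measured" temperature: a mean value plus turbulent-like fluctuations.
    T = [500.0 + 25.0 * math.sin(0.3 * i) + random.gauss(0.0, 10.0) for i in range(2000)]

    T_mean = sum(T) / len(T)                    # the Reynolds-averaged part <T>
    T_fluct = [t - T_mean for t in T]           # the fluctuating part T' = T - <T>

    print(f"<T>  = {T_mean:.1f} K")
    print(f"<T'> = {sum(T_fluct) / len(T_fluct):.2e} K   (zero by construction)")
    print(f"rms(T') = {(sum(f * f for f in T_fluct) / len(T_fluct)) ** 0.5:.1f} K")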

This operation considerably lowers the number of degrees of freedom of the problem, making it amenable to computational treatment. It does also involve many difficulties: first, it should be noted that, precisely due to the non-linearity of the equations of motion, any averaging process leads to new, unknown terms that must be estimated. By closing the door on a complete, deterministic description of the phenomenon, we open one to modeling, i.e. to the representation of the effects of turbulence on mean variables.

Many advances have been made since the early models (Prandtl, 1925). Modeling schemas have moved unabated towards greater complexity, grounded on the generally verified fact that any new extension allows the previously gained properties to be preserved. It should also be noted that, even if many new developments are emphasizing anew the need to treat flows by respecting their non-stationary character, the most popular modeling techniques were developed in the context of stationary flows, for which, consequently, only a representation of the flow's temporal mean can be achieved: in the final mathematical model, the effects of turbulence thus stem wholly from the modeling process.

It is equally remarkable that, despite extensive work, no modeling has yet been capable of accounting for all of the processes influencing turbulence or influenced by it (transition, non-stationarity, stratification, compression, etc.). This, for the time being, would seem to preclude statistical modeling from entertaining any ambitions of universality.

Despite these limitations, most of the common statistical modeling techniques are now available in commercial codes and industrial tools. One cannot claim that they enable predictive computations in every situation. They are of varying accuracy, yielding useful results for the engineer in controlled, favorable situations (prediction of drag to an accuracy of 5–10%, sometimes better, for some profiles), but sometimes inaccurate in situations that subsequently turn out to lie outside the model's domain of validity. Any controlled use of modeling is based, therefore, on a qualification specific to the type of flow to be processed. Alternative modeling techniques, meeting the requirement for greater accuracy across broader ranges of space and time scales, and therefore based on a "mean" operator of a different nature, are currently being developed and represent new ways forward.

The landscape of turbulence modeling today is highly complex, and the unification of viewpoints and of the various modeling concepts remains a challenge. The tempting goal of modeling with universal validity thus remains out of reach. Actual implementation proceeds, in most cases, from compromises, guided as a rule by the engineer's know-how.

Frédéric Ducros

Nuclear Energy Division
CEA Grenoble Center

(1) One may mention the spectral distribution of turbulent kinetic energy known as the "Kolmogorov spectrum," which illustrates very simply the hierarchy of scales, from large, energy-carrying scales to ever smaller, less energetic scales.
(2) This range results from the non-linearities of the equations of motion, giving rise to a broad range of spatial and temporal scales. This range is an increasing function of the Reynolds number, Re, which is a measure of the inertial force to viscous force ratio.
(3) The hypothesis that complete resolution of the Navier–Stokes equations allows simulation of turbulence is generally accepted to be true, at any rate for the range of shock-free flows.
(4) This is a problem governed by initial and boundary conditions.


Figure 2. Trajectories followed by the fluid inside a corrugated-plate heat exchanger with an angle of 60° to the flow direction. Obtained with the Fluent software, these results illustrate the work on local modeling of flows in exchangers of this type.


matics – or by employing the data to model the structure and function of biological systems using the laws of physics and chemistry. This paper discusses the latter approach and, in particular, the modeling of biological macromolecules at an atomic scale.

Modeling a system

Modeling requires a theoretical framework that defines how models of a system are built and what rules they obey. In the case of molecular modeling (see Box C, Molecular modeling), this framework is specified by atomic theory and the laws of classical, quantum and statistical mechanics. Precise application of these theories depends, of course, on the nature of the system. For a biological macromolecule, a typical procedure involves five steps:

• defining the composition of the system. This step identifies the composition, number and type of macromolecules to be investigated, the nature of the environment – e.g. water or some other solvent – and the physical properties of the system – e.g. its temperature and pressure;

• defining the laws that the model of the system obeys. For molecular modeling, two elements are required. First, a method must be available to calculate the system's potential energy, which is the sum of the interaction energies between all of the atoms in the system. This quantity is critical because it determines the most probable, or the most stable, configurations of the system. Second, an algorithm must be chosen that uses potential energy to explore the various configurations that are accessible to the system. This algorithm is often a classical dynamical one, which allows the motion of atoms to be followed as a function of time by solving the equations of Newtonian mechanics (a minimal sketch of such a step is given below);

• selecting initial conditions, such as the position and velocity of each atom. Given the structural complexity of biological macromolecules, it is usual to start from structures determined experimentally by X-ray crystallography or nuclear magnetic resonance (NMR);
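As announced above, here is a minimal sketch of such a dynamics step, for a single pair of atoms interacting through a Lennard-Jones potential and integrated with the velocity Verlet scheme. Reduced units and all numerical values are illustrative; production simulations add thousands of atoms, solvent, thermostats, long-range electrostatics and much else.

    def lj_force(r, epsilon=1.0, sigma=1.0):
        """Force between two atoms from the Lennard-Jones potential
        V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6), in reduced units."""
        sr6 = (sigma / r) ** 6
        return 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r

    # Velocity Verlet integration of two atoms on a line (reduced units).
    m, dt = 1.0, 0.005
    x1, x2 = 0.0, 1.5          # initial positions: a slightly stretched pair
    v1, v2 = 0.0, 0.0
    for _ in range(2000):
        f = lj_force(x2 - x1)               # force on atom 2 (+), on atom 1 (-)
        a1, a2 = -f / m, +f / m
        x1 += v1 * dt + 0.5 * a1 * dt ** 2  # position update
        x2 += v2 * dt + 0.5 * a2 * dt ** 2
        f_new = lj_force(x2 - x1)
        v1 += 0.5 * (a1 + (-f_new / m)) * dt   # velocity update with the new force
        v2 += 0.5 * (a2 + (+f_new / m)) * dt
    print(f"separation after {2000 * dt:.0f} time units: {x2 - x1:.3f}")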


Box 1. Biological macromolecules

Cells are the fundamental entities in all living organisms. Apart from water, biological macromolecules are the major constituents of the cell, where they perform a multiplicity of functions. A biological macromolecule comprises sub-units of low molecular weight, added one to the other to form a long, chain-shaped polymer. Usually, each chain is formed of only one family of sub-units, and the precise sequence of sub-units is essential to the function of the macromolecule. There are four major categories of macromolecules.

Proteins are probably the most important macromolecules, since they play a predominant role in most biological processes. For instance, enzymes are proteins that catalyze the majority of chemical reactions in the cell. Other classes of protein have more of a structural role or are involved in signaling,(1) regulation of metabolism or the immune system. Proteins are amino acid polymers – about twenty different types of amino acids are commonly found – and a protein may comprise several chains, each containing a few hundred amino acids. Proteins are often associated with other molecules, which assist in their biological tasks. The three-dimensional structure of proteins is very complex but critical to their function.

Nucleic acids – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) – are nucleotide polymers. DNA, in its double-strand form (two nucleotide chains arranged in a double helix), is the genetic material which, amongst other things, codes instructions for the amino-acid sequences of all the proteins synthesized by the cell. RNA, usually in a single-strand form (one nucleotide chain), is essential for protein synthesis.

Lipids, fundamental constituents of cell membranes, also play an important part in metabolism and as energy reserves. The lipids include a number of classes, such as phospholipids, triglycerides and steroids.

Polysaccharides are polymers of simple sugars, such as fructose and glucose. They play a structural role, particularly in plants (cellulose is a polysaccharide), are involved in molecular recognition and can serve as energy reserves.

(1) Transmission of signals allowing cells to communicate amongst themselves.

Box 2. The genome projects

The human genome project and similar projects for other organisms, such as yeast and the mouse, consist in the sequencing and analysis of all the DNA that forms the inherited genetic makeup of an organism. The genome projects have spawned related approaches which also aim to investigate an aspect of the functioning of cells and organisms at a global level. Thus, for example: structural genomics tries to determine the three-dimensional structures of all the proteins (Box 1) in an organism, directly by resolving their structure, or indirectly by comparing them to other proteins with homologous sequences and known structures; proteomics is dedicated to characterizing all interactions between proteins within a cell; and metabolomics seeks to identify the composition and distribution of metabolites in the cell and how these change throughout the cell's life and under various external conditions.

• simulating the system, which involves solving the equations describing the atoms' motion. Numerical simulation (Box A, What is a numerical simulation?) is indispensable in the case of biological macromolecules, since an analytical solution for such a large, complicated system of equations is not possible. The computation time required for a simulation varies depending on system size, the precision of the methods employed and the problem investigated. A realistic simulation of a system of about fifty thousand atoms could easily occupy the most powerful of today's computers for several weeks;

• analyzing the results. This analysis, which is principally statistical, relates the