
1

ECECS 819: lecture 1—Introduction

Computational aspects of biological systems

http://images.google.com/imgres?imgurl=http://www.biologie.uni-hamburg.de/b-online/fo02/bl51.jpg&imgrefurl=http://www.biologie.uni-hamburg.de/b-online/e02/02c.htm&h=234&w=320&sz=19&tbnid=1a6eexZ0Vu4J:&tbnh=82&tbnw=112&prev=/images%3Fq%3Dleaves%26hl%3Den%26lr%3D&oi=imagesr&start=2


2

Biology—Macro and Micro Elements

E. coli chromosome

E. coli

protein

An amino acid (alanine)

DNA

http://images.google.com/imgres?imgurl=http://www.lbl.gov/Publications/Currents/Archive/view-assets/Mar-05-2004/E-coli.jpg&imgrefurl=http://www.lbl.gov/Publications/Currents/Archive/Mar-05-2004.html&h=248&w=300&sz=61&tbnid=fn2A5zdtBwoJ:&tbnh=91&tbnw=110&prev=/images%3Fq%3De%2Bcoli%26hl%3Den%26lr%3D&oi=imagesr&start=1

http://www.cs.dartmouth.edu/~brd/www/bio/images/1hdd-2.gif

http://images.google.com/imgres?imgurl=http://7art-screensavers.com/screenshots/butterfly/orange-butterfly.jpg&imgrefurl=http://7art-screensavers.com/screenshots/butterfly/&h=600&w=800&sz=45&tbnid=ywjWtCsuWRoJ:&tbnh=106&tbnw=142&prev=/images%3Fq%3Dbutterfly%26hl%3Den%26lr%3D&oi=imagesr&start=1

http://images.google.com/imgres?imgurl=http://darwin.bc.asu.edu/blog/wp-content/panda.jpg&imgrefurl=http://darwin.bc.asu.edu/blog/&h=338&w=220&sz=16&tbnid=fVDRdmgqqjkJ:&tbnh=115&tbnw=74&hl=en&prev=/images%3Fq%3Dpanda%26hl%3Den%26lr%3D&oi=imagesr&start=3

http://images.google.com/imgres?imgurl=http://www.photo-templates.com/images/Flower%252001.jpg&imgrefurl=http://www.photo-templates.com/Free_Downloads.htm&h=1536&w=2304&sz=430&tbnid=q__qtvE1NCAJ:&tbnh=100&tbnw=150&prev=/images%3Fq%3Dflower%26hl%3Den%26lr%3D&oi=imagesr&start=3

3

Biosystem: an “information processing system”

•“sensor” / “processor” / “actuator”

•Self-repairing

•Stores information

•Can interact with other systems (e.g., use of nerve signals to activate devices)

•May be a “community” (e.g., coral, fungus)


4

Goal 1: Use “micro” elements as information processing / storage devices (“biomolecular computers”)

E. coli chromosome

E. coli

protein


DNA



http://www1.us.dell.com/content/products/category.aspx/desktops?c=us&cs=19&l=en&s=dhs

http://www1.us.dell.com/content/products/productdetails.aspx/axim_x51v?c=us&cs=19&l=en&s=dhs

http://www1.us.dell.com/content/products/features.aspx/odg_notebooks?c=us&cs=19&l=en&s=dhs

5

Goal 2: Use computation to understand biomolecular systems

E. coli chromosome

E. coli

protein


DNA



http://www1.us.dell.com/content/products/category.aspx/desktops?c=us&cs=19&l=en&s=dhs

http://www1.us.dell.com/content/products/productdetails.aspx/axim_x51v?c=us&cs=19&l=en&s=dhs

http://www1.us.dell.com/content/products/features.aspx/odg_notebooks?c=us&cs=19&l=en&s=dhs

6

[Figure labels: 3λ, 1.5λ, 0.5λ]

Why Do We Need to Learn About Biomolecular Computing?

Reason 1: “the disappearing transistor”

•By 2020, a transistor “gate” is projected to be only one atom in size [Keyes, IBM]

• Candidate “new” technologies:

+quantum computing

+biomolecular computing

7

Relative sizes (in meters):

10^-18: electron

10^-15: proton, neutron

10^-14: atomic nucleus

10^-10: water molecule (angstrom)

10^-9: one DNA “twist” (nanometer, nm)

10^-8: wavelength of UV light

10^-7: thickness of cell membrane

10^-6: diameter of typical bacterium (micron, μm)

10^-5: diameter of typical cell

10^-4: width of human hair

10^-3: diameter of sand grain (millimeter, mm)

10^-2: diameter of nickel (centimeter, cm)

10^0: 1 meter

For comparison:

35 mm: one side of Pentium 4 chip

2-10 μm: typical MEMS feature size

0.18 or 0.13 μm: Pentium 4 wire width

“Nanotechnology”: molecules, atoms

8

Why Do We Need to Learn About Biomolecular Computing?

Reason 2: a host of potential applications

•medical: diagnosis / treatment delivery / prosthetics

•lab diagnostics: health care / forensics / drug development

http://www.aperfectworld.org/clipart/healthcare/needle.png

http://www.aperfectworld.org/clipart/healthcare/thermometerhand.png

9

Why is biomolecular computing attractive?

•Size:

--typical bacterium has a diameter on the order of 10^-6 m (1 micron);

--one twist of the DNA double helix is on the order of 10^-9 m (nanometer scale)

•Power requirements should be low

•Massive parallel computation is theoretically possible

•I/O can be two-dimensional

•Instabilities of quantum systems are much less of a problem here

10

What are the disadvantages?

•Speed--typical reaction can take hours or days

•Error rates--may be unacceptably high; may be introduced by mechanical steps in processing data

•I/O--we do not yet have efficient mechanisms for doing input/output with these systems

•“Herd” property--we can affect a mixture of data items; we cannot in general pick out one specific item; biomolecular computing is inherently parallel

•Exponential growth in size of computation--the speed barrier in traditional computing may be replaced by a size barrier in biomolecular computing: a reasonably sized problem may need too much biological material for the “computation” to be feasible
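A rough back-of-the-envelope sketch of the size barrier, under loudly stated assumptions: a brute-force search over orderings of n graph vertices uses one DNA strand per ordering (n! strands), each strand has 20 nucleotides per vertex, and a nucleotide has an average mass of about 330 g/mol. The function name and constants are illustrative and are not taken from the lecture.

```python
import math

AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # assumed average mass of one nucleotide (g/mol)
EARTH_MASS_G = 5.97e27       # mass of the Earth in grams, for scale


def dna_mass_grams(n_vertices, oligo_len=20):
    """Mass of n! strands, each n_vertices * oligo_len nucleotides long."""
    strands = math.factorial(n_vertices)
    nucleotides = strands * n_vertices * oligo_len
    return nucleotides * NT_MASS_G_PER_MOL / AVOGADRO


for n in (7, 20, 70):
    print(n, "vertices:", dna_mass_grams(n), "grams")
```

Under these assumptions the estimate for 7 vertices is negligible, while the estimate for 70 vertices already exceeds the mass of the Earth.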

11

Major drawback: typical engineers “don’t know much about biology….”

•Biology is traditionally descriptive, rather than computational (HUGE vocabulary)

•Biomolecular processes are incredibly complex and many are not well understood

•Field is changing rapidly

•There are multiple paradigms for computing available

12

Also, there are many different subfields:

bioinformatics: the application of computer technology to the management of biological information

biomolecular computing: the use of biological and chemical processes to perform computations

bio-inspired computing: the use of biological paradigms (e.g., neural nets, genetic algorithms) in the design of computational algorithms. Algorithms may be implemented in any appropriate technology

neurocomputing: direct I/O from a biological system; interfacing directly with the nervous system; currently uses traditional analog computing

13

And many computing paradigms:

DNA computing--uses physical structure of DNA

in vivo computing--uses biological processes, e.g., protein synthesis, to perform computations

in silico computing--“traditional” computing; often used to refer to programs that attempt to simulate living organisms; sometimes referred to as “bioSpice”

14

So how can we get started?

Some important basic terms (good reference: Brown, Genomes, Wiley-Liss, 1999):

15

•genome: biological information in an organism

•DNA: deoxyribonucleic acid, carries genome of cellular lifeforms

•RNA: ribonucleic acid, carries genome of some viruses, carries messages within the cell

•bases: the four bases found in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T); in a “double helix” of DNA, bonds are always A--T or C--G; thus a single strand of DNA carries the information about the strand it would bond to
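The pairing rule (A with T, C with G) can be sketched as a lookup table. The function names are illustrative. The second function adds one fact not stated on the slide: the two strands of a double helix run in opposite directions, so the partner strand is conventionally written reversed.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base complement (A<->T, C<->G), same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand):
    """Partner strand written in its own reading direction (reversed)."""
    return complement(strand)[::-1]


print(complement("ATTG"))          # TAAC
print(reverse_complement("ATTG"))  # CAAT
```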

16

DNA—the “double helix”

17

•polynucleotide: a single DNA strand

•oligonucleotide: short, single-stranded DNA molecule, usually less than 50 nucleotides in length

In DNA computing, specific oligonucleotides are constructed to represent data items.

•nucleotide: phosphate group + sugar + one of the 4 bases (A,C,G,T); in a strand, the end carrying the free phosphate is labeled 5’ and the opposite end (free hydroxyl on the sugar) is labeled 3’

Example: in Adleman’s seminal 1994 paper, oligonucleotides of length 20 were built to represent vertices and edges in a given graph:

[Figure: oligonucleotides labeled Vertex V1, Edge V1-V2, and Vertex V2; sequence fragments shown: A T T G, C A A G, AC A T]
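A minimal sketch of this style of encoding, assuming random 20-base vertex oligonucleotides and an edge oligonucleotide built from the second half of the source vertex followed by the first half of the target vertex. The complement of each vertex oligonucleotide then acts as a “splint” that holds two consecutive edges together. Function names, the seed, and the random sequences are illustrative only.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Complementary strand, written in its own reading direction."""
    return "".join(PAIR[b] for b in reversed(strand))


def random_oligo(length, rng):
    """Random sequence over A, C, G, T (a stand-in for a designed oligo)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def encode_graph(vertices, edges, length=20, seed=0):
    rng = random.Random(seed)
    half = length // 2
    vertex_oligo = {v: random_oligo(length, rng) for v in vertices}
    # Edge u->v: second half of u's oligo followed by first half of v's oligo.
    edge_oligo = {
        (u, v): vertex_oligo[u][half:] + vertex_oligo[v][:half]
        for u, v in edges
    }
    # Splints: complements of vertex oligos bridge two consecutive edges.
    splint = {v: reverse_complement(s) for v, s in vertex_oligo.items()}
    return vertex_oligo, edge_oligo, splint


vertex_oligo, edge_oligo, splint = encode_graph(["V1", "V2"], [("V1", "V2")])
print(vertex_oligo["V1"], edge_oligo[("V1", "V2")], vertex_oligo["V2"])
```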

18

What interesting projects can build on our knowledge of traditional computer engineering?

• “structural” designs—DNA computing

• “chemical” designs—using proteins as signals

19

DNA computing (“structural”, “digital”)

Possible operations on DNA:

•building up custom oligonucleotide sequences to represent parts of your data

•splitting--can be done by heating, e.g.

•recombining--can be done by cooling

•cutting strand at a particular site

•“sticking” two fragments together (at their ends)

•sorting by some string property (including length)
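The operations above can be mimicked with a toy string model, where a strand is a Python string over A, C, G, T. This is only a sketch of the logic, not of the chemistry; all function names are hypothetical, and the length sort stands in for a physical separation step such as gel electrophoresis.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize(bases):
    """Build a custom oligonucleotide; reject anything outside A, C, G, T."""
    if set(bases) - set(PAIR):
        raise ValueError("only A, C, G, T are allowed")
    return bases


def melt(duplex):
    """Splitting: heating separates a duplex into its two single strands."""
    top, bottom = duplex
    return [top, bottom]


def anneal(a, b):
    """Recombining: on cooling, fully complementary strands form a duplex."""
    is_match = len(a) == len(b) and all(
        PAIR[x] == y for x, y in zip(a, reversed(b))
    )
    return (a, b) if is_match else None


def cut(strand, site):
    """Cutting: split a strand just after the first occurrence of `site`."""
    i = strand.find(site)
    if i < 0:
        return [strand]
    return [strand[: i + len(site)], strand[i + len(site):]]


def ligate(left, right):
    """Sticking: join two fragments end to end."""
    return left + right


def sort_by_length(strands):
    """Sorting: order strands by length, shortest first."""
    return sorted(strands, key=len)


print(anneal("ATTG", "CAAT"))
print(cut("ATTGCAAG", "TTG"))
```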

20

In summary, DNA computing:

•uses structure of the DNA

•relies on mechanical operations

•answers “self-assemble”

•basic steps:

--encode the problem

--make a “solution” of problem fragments

--cool the solution so fragments will form longer strands

--filter out the answers you want

21

Example: solving graph problems

[Figure: example sequence fragments C A A G, A T T G, C A A T]

•Encode vertices and edges—use DNA properties to encode graph “structure”

•Mix up a solution of your fragments

•Cool down, get resulting “paths”, “spanning trees”, etc.
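The encode / mix / cool / filter recipe can be imitated in software for the Hamiltonian path problem (visit every vertex exactly once between a given start and end). In this sketch, random walks along the edges stand in for strands that self-assemble on cooling, and list filters stand in for the laboratory separation steps. The stopping probability, strand count, and function names are assumptions made for illustration.

```python
import random


def random_path(edges, rng, max_len):
    """One 'strand': a random walk along directed edges."""
    path = list(rng.choice(edges))
    while len(path) < max_len:
        nxt = [v for u, v in edges if u == path[-1]]
        if not nxt or rng.random() < 0.2:   # the strand stops growing
            break
        path.append(rng.choice(nxt))
    return tuple(path)


def hamiltonian_paths(vertices, edges, start, end, n_strands=2000, seed=0):
    rng = random.Random(seed)
    # "Mix and cool": many strands form at once, in parallel.
    soup = {random_path(edges, rng, len(vertices)) for _ in range(n_strands)}
    # "Filter": keep strands with the right endpoints, length, and content.
    keep = [p for p in soup if p[0] == start and p[-1] == end]
    keep = [p for p in keep if len(p) == len(vertices)]
    keep = [p for p in keep if set(p) == set(vertices)]
    return sorted(keep)


edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]
print(hamiltonian_paths([0, 1, 2, 3], edges, start=0, end=3))
```

The design mirrors the “herd” property: no single strand is ever selected individually; the whole mixture is generated and then filtered.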

22

“Standard cell architectures, FPGAs”

Basic idea (after Prof. Tom Knight, MIT):

•“gates” are functional units

•Ends of gates are standard “join” DNA sequences—reserved for this purpose

•So we can build computational chains easily

23

Other applications of DNA computing:

•general computing using “sticker” language

•study of relationship between traditional architectures and DNA configurations:

--FSMs: linear DNA

--stack machines: branching DNA

--“Turing machines” (general purpose computers): sheet DNA

24

Other applications of DNA computing (continued):

•3-D self-assembled structures

•“walking and rolling DNA”

•structures for nanotube assembly (recently reported in Science)

25

in vivo computing (“chemical” / “analog”):

uses processes within the cell (e.g., E. coli) as signals

model is closer to traditional computing, with electrical signals replaced by chemical signals

many processes we would like to use are not well understood

requires in silico computing to generate simulations of biomolecular processes, similar to SPICE simulations in traditional electrical circuits

this is a new and rapidly growing field with many potential practical applications

26

“central dogma”:

DNA ----> RNA-----> protein

we can use the presence or absence of the protein to indicate “1” or “0”
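A minimal sketch of the two steps of the central dogma as string operations. The codon table below is a small excerpt of the standard genetic code, just enough for the example; the function names and the example sequence are illustrative.

```python
# Excerpt of the standard genetic code; only the codons used below.
CODON = {
    "AUG": "Met", "GCU": "Ala", "GCC": "Ala", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_dna):
    """DNA -> RNA: the RNA copy of the coding strand has U in place of T."""
    return coding_dna.replace("T", "U")


def translate(rna):
    """RNA -> protein: read three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = CODON[rna[i:i + 3]]
        if amino == "Stop":
            break
        protein.append(amino)
    return protein


def output_bit(protein_present):
    """Presence of the protein is read as 1, absence as 0."""
    return 1 if protein_present else 0


rna = transcribe("ATGGCTTGGTAA")
print(rna)             # AUGGCUUGGUAA
print(translate(rna))  # ['Met', 'Ala', 'Trp']
```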

27

•Protein: like DNA, a protein is a linear polymer. It is made of units which are amino acids. Proteins are very complex and not completely understood. Proteins have four levels of structure:

•primary: the sequence of amino acids bonded together

•secondary: typically either an “alpha-helix” or a “beta-sheet”

•tertiary: formed from folding of the secondary structure into a three-dimensional configuration

•quaternary: formed by the assembly of several folded (tertiary-structure) units into the complete protein

28

Some proteins:

http://www.biochem.szote.u-szeged.hu/astrojan/protein2.htm

29

•Central Dogma:

Before the discovery of retroviruses and prions, this was believed to be the basic mechanism of inheritance in all living things

30

•Plasmid: a “loop” of DNA used to introduce new genetic material into a cell

•used for “genetic engineering”

•typically the plasmid will also have a section which gives the cell resistance to a particular antibiotic; after insertion into the cell, this provides a marker to show that the new DNA really has been inserted

31

One possible simple mechanism:

DNA: promoter, gene

[Figure: the input RNA is translated into Protein B, which inhibits the promoter. Without the input, the gene is transcribed and the RNA output is translated into Protein A, the output (detected by fluorescence).]

Summary:

• 0 input --> output protein A (1);

• 1 input (RNA) --> 0 output
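The mechanism behaves like a logical NOT. A minimal sketch follows: the first function is the truth table from the summary; the second is an assumed Hill-type repression curve, a common textbook form that is not taken from the lecture, with hypothetical parameters `max_output`, `k`, and `n`.

```python
def bio_inverter(input_present):
    """Logical view: input present -> promoter repressed -> no protein A."""
    return 0 if input_present else 1


def steady_state_output(repressor, max_output=1.0, k=1.0, n=2):
    """Assumed Hill-type repression: output falls as repressor rises."""
    return max_output / (1.0 + (repressor / k) ** n)


for r in (0.0, 1.0, 10.0):
    print(r, steady_state_output(r))
```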

32

Analogy to Electrical Inverter

33

Bio-Inverter Model [Weiss 1999]

34

Deterministic vs. Stochastic Model

• Deterministic model: inverter modeled using a set of differential equations with deterministic variables. No random components. Fixed order for reactions.

• Stochastic model: accounts for the random noise components. Simulations under different environmental conditions and other random noise variables. Random order for reactions.
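A minimal deterministic sketch, assuming a single differential equation as a stand-in for the full set of reaction equations: dP/dt = k_syn / (1 + (R/K)^n) - k_deg * P, where P is the output protein and R the repressor (input) level. The equation, the parameter names, and the default values are assumptions chosen for illustration; forward Euler is used for simplicity.

```python
def simulate_inverter(repressor, t_end=200.0, dt=0.01,
                      k_syn=1.0, k_deg=0.1, K=1.0, n=2):
    """Forward-Euler integration of dP/dt = production - k_deg * P."""
    p = 0.0
    production = k_syn / (1.0 + (repressor / K) ** n)
    for _ in range(int(t_end / dt)):
        p += dt * (production - k_deg * p)
    return p


print("no input:  ", simulate_inverter(0.0))
print("high input:", simulate_inverter(10.0))
```

In this one-equation model the steady state is production / k_deg, so doubling `k_syn` doubles the steady-state output.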

35

Deterministic Simulation

36

Deterministic Simulation Transient Characteristics (Matlab)

37

Deterministic Simulation (6) Transient Characteristics (VHDL-AMS)

Deterministic Simulation—Example (5) Transient Characteristics

38

Deterministic Simulation Modified Transient Characteristics

• The transient characteristics of the inverter are computed using the modified reaction rates.

• The steady state output value has doubled since the transcription rate is doubled (k7*2).

• The rise time of the output has decreased to about 30 seconds, and the rise and fall times are equal.

• The reduction of repression rate and the dissociation rate increase are the reasons for the decrease of the rise time.

39

Deterministic Simulation Modified Transient Characteristics (Matlab)

40

Stochastic Simulation

• Stochastic simulation is based on the Gillespie algorithm [Gillespie 1977].

• Two random variables (time and the type of reaction) were introduced.

• In a cell, reactions occur at random intervals of time.

• The order in which the reactions occur is also random.

• Temperature fluctuations, decay rates and other parameters also result in random noise.
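A minimal sketch of the Gillespie direct method on the simplest possible system, assumed here for illustration: a protein P is produced at a constant rate `k_syn` and each molecule decays at rate `k_deg`. The two random variables are the waiting time to the next reaction and the choice of which reaction fires. The system, rates, and names are not taken from the lecture's inverter model.

```python
import math
import random


def gillespie_birth_death(k_syn=1.0, k_deg=0.1, t_end=500.0, seed=0):
    """Direct method for: 0 -> P (rate k_syn), P -> 0 (rate k_deg * P)."""
    rng = random.Random(seed)
    t, p = 0.0, 0
    times, counts = [t], [p]
    while True:
        rates = [k_syn, k_deg * p]
        total = sum(rates)
        # Random variable 1: exponential waiting time to the next reaction.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Random variable 2: which reaction fires, in proportion to its rate.
        if rng.random() * total < rates[0]:
            p += 1
        else:
            p -= 1
        times.append(t)
        counts.append(p)
    return times, counts


times, counts = gillespie_birth_death()
print("reactions simulated:", len(times) - 1)
print("final molecule count:", counts[-1])
```

Changing `seed` gives a different trajectory for the same rates, which is the noise that a deterministic model leaves out.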

41

Stochastic Simulation

42

Some areas to explore:

• Stochastic simulation: design space exploration

– Similar to CAD tool development for digital and analog circuits

– Currently trying simulated annealing, genetic algorithms

– Many other strategies can be explored

– Will also have applications in medical research

• Agent-based modeling and visualization

– 3D modeling and dynamic simulations using object-oriented programming

• Engineering design process for biomolecular computing applications

– Will modify traditional design flows for software, digital, and analog circuits

– Will provide support to circuit designers and biomedical researchers

• Development of DNA “standard cells”
