Post on 22-Dec-2015
1
ECECS 819: lecture 1—Introduction
Computational aspects of biological systems
2
Biology—Macro and Micro Elements
[Figure labels: E. coli; E. coli chromosome; DNA; protein; an amino acid (alanine)]
3
Biosystem: an “information processing system”
•“sensor” / “processor” / “actuator”
•Self-repairing
•Stores information
•Can interact with other systems (e.g., use of nerve signals to activate devices)
•May be a “community” (e.g., coral, fungus)
4
Goal 1: Use “micro” elements as information processing / storage devices: “biomolecular computers”
5
Goal 2: Use computation to understand biomolecular systems
6
[Figure labels: 3λ, 1.5λ, 0.5λ]
Why Do We Need to Learn About Biomolecular Computing?
Reason 1: “the disappearing transistor”
•By 2020, “gate” will be only one atom large [Keyes, IBM]
• Candidate “new” technologies:
+quantum computing
+biomolecular computing
7
Relative sizes (in meters):
10^-18: electron
10^-15: proton, neutron
10^-14: atomic nucleus
10^-10: water molecule (angstrom)
10^-9: one DNA “twist” (nanometer, nm)
10^-8: wavelength of UV light
10^-7: thickness of cell membrane
10^-6: diameter of typical bacterium (micron, μm)
10^-5: diameter of typical cell
10^-4: width of human hair
10^-3: diameter of sand grain (millimeter, mm)
10^-2: diameter of nickel (centimeter, cm)
10^0: 1 meter
35 mm: one side of Pentium 4 chip
2-10 μm: typical MEMS feature size
0.18 or 0.13 μm: Pentium 4 wire width
“nanotechnology”: molecules, atoms
8
Why Do We Need to Learn About Biomolecular Computing?
Reason 2: a host of potential applications
•medical: diagnosis / treatment delivery / prosthetics
•lab diagnostics: health care / forensics / drug development
9
Why is biomolecular computing attractive?
•Size:
--a typical bacterium has a diameter on the order of 10^-6 m (1 micron);
--one twist of the DNA double helix is on the order of 10^-9 m (nanometer scale)
•Power requirements should be low
•Massive parallel computation is theoretically possible
•I/O can be two-dimensional
•Instabilities of quantum systems are much less of a problem here
10
What are the disadvantages?
•Speed: a typical reaction can take hours or days
•Error rates: may be unacceptably high; errors may be introduced by mechanical steps in processing data
•I/O: we do not yet have efficient mechanisms for input/output with these systems
•“Herd” property: we can affect a mixture of data items, but we cannot in general pick out one specific item; biomolecular computing is inherently parallel
•Exponential growth in size of computation: the speed barrier in traditional computing may be replaced by a size barrier in biomolecular computing; a reasonably sized problem may need too much biological material for the “computation” to be feasible
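The size barrier can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only and rests on assumptions not stated on the slide: a brute-force encoding in which every ordering of n vertices is represented by at least one strand (n! strands), 20 nucleotides per vertex, and an average mass of about 330 g/mol per nucleotide of single-stranded DNA.

```python
import math

AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # assumed average mass of one ssDNA nucleotide
NT_PER_VERTEX = 20           # assumed oligonucleotide length per vertex


def dna_mass_grams(n_vertices: int) -> float:
    """Mass of DNA needed to hold one strand per vertex ordering (n! strands)."""
    strands = math.factorial(n_vertices)
    nt_per_strand = NT_PER_VERTEX * n_vertices
    return strands * nt_per_strand * NT_MASS_G_PER_MOL / AVOGADRO


for n in (7, 20, 30, 40):
    print(f"{n:>3} vertices: {dna_mass_grams(n):.3e} g")
```

Under these assumptions the required mass is negligible for a handful of vertices but grows factorially; by 40 vertices it is larger than the mass of the Earth (roughly 6e27 g).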
11
Major drawback: typical engineers “don’t know much about biology….”
•Biology is traditionally descriptive, rather than computational (HUGE vocabulary)
•Biomolecular processes are incredibly complex and many are not well understood
•Field is changing rapidly
•There are multiple paradigms for computing available
12
Also, there are many different subfields:
bioinformatics: the application of computer technology to the management of biological information
biomolecular computing: the use of biological and chemical processes to perform computations
bio-inspired computing: the use of biological paradigms (e.g., neural nets, genetic algorithms) in the design of computational algorithms. Algorithms may be implemented in any appropriate technology
neurocomputing: direct I/O from biological system; interfacing directly with nervous system; currently using traditional analog computing
13
And many computing paradigms:
DNA computing--uses physical structure of DNA
in vivo computing--uses biological processes, e.g., protein synthesis, to perform computations
in silico computing--“traditional” computing; often used to refer to programs that attempt to simulate living organisms; sometimes referred to as “bioSpice”
14
So how can we get started?
Some important basic terms (good reference: Brown, Genomes, Wiley-Liss, 1999):
15
•genome: the biological information in an organism
•DNA: deoxyribonucleic acid; carries the genome of cellular lifeforms
•RNA: ribonucleic acid; carries the genome of some viruses and carries messages within the cell
•bases: the four bases found in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T); in a “double helix” of DNA, bonds are always A--T or C--G; thus a single strand of DNA carries the information about the strand it would bond to
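Because bonds are always A--T or C--G, the partner strand can be computed from a single strand. A minimal sketch follows; the function names are illustrative, and the reversal reflects the fact that the two strands of a double helix run in opposite directions.

```python
# Watson-Crick pairing: A bonds with T, C bonds with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of a strand, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5' to 3' direction (antiparallel)."""
    return complement(strand)[::-1]


print(complement("ATTG"))
print(reverse_complement("ATTG"))
```

Applying `reverse_complement` twice returns the original strand, which is one way to see that a single strand carries all the information about its partner.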
16
DNA—the “double helix”
17
•polynucleotide: a single DNA strand
•oligonucleotide: short, single-stranded DNA molecule, usually less than 50 nucleotides in length
In DNA computing, specific oligonucleotides are constructed to represent data items.
•nucleotide: phosphate group + sugar + one of the 4 bases (A, C, G, T); the phosphate end of a strand is labeled 5’, the sugar (hydroxyl) end, 3’
Example: in Adleman’s seminal 1994 paper, oligonucleotides of length 20 were built to represent vertices and edges in a given graph:
[Figure: oligonucleotides labeled Vertex V1, Edge V1-V2, and Vertex V2; sequence fragments shown: A T T G, C A A G, AC A T]
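The encoding can be sketched in a few lines. This follows the scheme usually described for Adleman's experiment, stated here as an assumption rather than taken from the slide: each vertex gets a random 20-mer, the oligonucleotide for edge u->v is the second half of u's 20-mer followed by the first half of v's, and the complement of each vertex 20-mer acts as a "splint" that holds two adjoining edge strands together. All function names are hypothetical.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def random_oligo(rng: random.Random, length: int = 20) -> str:
    """A random single-stranded sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def encode_graph(vertices, edges, seed=0, length=20):
    """Return (vertex oligos, edge oligos, splint oligos) for a directed graph."""
    rng = random.Random(seed)
    half = length // 2
    vertex_oligo = {v: random_oligo(rng, length) for v in vertices}
    # Edge u->v: last half of u's oligo + first half of v's oligo.
    edge_oligo = {
        (u, v): vertex_oligo[u][half:] + vertex_oligo[v][:half]
        for (u, v) in edges
    }
    # Splints are complementary to vertex oligos and bridge adjacent edges.
    splint = {v: reverse_complement(o) for v, o in vertex_oligo.items()}
    return vertex_oligo, edge_oligo, splint


vertex_oligo, edge_oligo, splint = encode_graph(["V1", "V2"], [("V1", "V2")])
print(vertex_oligo["V1"], edge_oligo[("V1", "V2")], vertex_oligo["V2"])
```

Real encodings also have to avoid accidental similarity between unrelated oligonucleotides; this sketch does not check for that.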
18
What interesting projects can build on our knowledge of traditional computer engineering?
• “structural” designs—DNA computing
• “chemical” designs—using proteins as signals
19
DNA computing (“structural”, “digital”)
Possible operations on DNA:
•building up custom oligonucleotide sequences to represent parts of your data
•splitting--can be done by heating, e.g.
•recombining--can be done by cooling
•cutting strand at a particular site
•“sticking” two fragments together (at their ends)
•sorting by some string property (including length)
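For readers coming from software, the operations listed above can be mimicked on plain strings. This is a deliberately simplified sketch with hypothetical function names: real cutting enzymes cut at enzyme-specific positions within or near their recognition site, and real sorting is done physically (for example by length), not by calling a sort routine.

```python
def cut(strand: str, site: str) -> list:
    """Cut a strand just after the first occurrence of a recognition site."""
    i = strand.find(site)
    if i < 0:
        return [strand]              # site absent: strand is left intact
    j = i + len(site)
    return [part for part in (strand[:j], strand[j:]) if part]


def stick(left: str, right: str) -> str:
    """Join two fragments end to end."""
    return left + right


def sort_by_length(strands: list) -> list:
    """Order strands from shortest to longest."""
    return sorted(strands, key=len)


def select_containing(strands: list, pattern: str) -> list:
    """Keep only strands that contain a given subsequence."""
    return [s for s in strands if pattern in s]


fragments = cut("AAGAATTCGG", "GAATTC")
print(fragments, stick(*fragments), sort_by_length(["AAA", "A", "AA"]))
```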
20
So, DNA computing:
•uses structure of the DNA
•relies on mechanical operations
•answers “self-assemble”
•basic steps:
--encode the problem
--make a “solution” of problem fragments
--cool the solution so fragments will form longer strands
--filter out the answers you want
21
Example: solving graph problems
[Figure: graph fragments with sequence labels C A A G, A T T G, C A A T]
•Encode vertices and edges—use DNA properties to encode graph “structure”
•Mix up a solution of your fragments
•Cool down, get resulting “paths”, “spanning trees”, etc.
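The mix, cool, and filter steps can be imitated in software for a Hamiltonian-path problem. The sketch below is an in silico stand-in under stated assumptions: the graph is hypothetical, self-assembly is modeled as random walks along edges, and the filtering steps are ordinary predicates rather than lab procedures.

```python
import random

# Hypothetical directed graph: vertex -> list of successor vertices.
GRAPH = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: []}


def random_path(graph: dict, rng: random.Random) -> tuple:
    """One 'self-assembled strand': a random walk of at most len(graph) vertices."""
    n = len(graph)
    path = [rng.choice(sorted(graph))]
    while len(path) < n and graph[path[-1]]:
        path.append(rng.choice(graph[path[-1]]))
    return tuple(path)


def hamiltonian_paths(graph, start, end, samples=2000, seed=0):
    """Generate many random paths, then filter for Hamiltonian paths."""
    rng = random.Random(seed)
    soup = {random_path(graph, rng) for _ in range(samples)}
    n = len(graph)
    return {
        p for p in soup
        if p[0] == start and p[-1] == end   # correct endpoints
        and len(p) == n                     # correct length
        and len(set(p)) == n                # every vertex exactly once
    }


print(sorted(hamiltonian_paths(GRAPH, start=0, end=3)))
```

The work is done by generating a large, unordered population and discarding what does not fit, which mirrors the "herd" property: individual candidates are never addressed one at a time.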
22
“Standard cell architectures, FPGAs”
Basic idea (after Prof. Tom Knight, MIT):
•“gates” are functional units
•Ends of gates are standard “join” DNA sequences—reserved for this purpose
•So we can build computational chains easily
23
Other applications of DNA computing:
•general computing using “sticker” language
•study of relationship between traditional architectures and DNA configurations:
---FSMs--linear DNA
---stack machines--branching DNA
---“Turing machines” (general purpose computers)--sheet DNA
24
Other applications of DNA computing (continued):
•3-D self-assembled structures
•“walking and rolling DNA”
•structures for nanotube assembly (recently reported in Science)
25
in vivo computing (“chemical” / “analog”):
uses processes within the cell (e.g., E. coli) as signals
model is closer to traditional computing, with electrical signals replaced by chemical signals
many processes we would like to use are not well understood
requires in silico computing to generate simulations of biomolecular processes, similar to SPICE simulations in traditional electrical circuits
this is a new and rapidly growing field with many potential practical applications
26
“central dogma”:
DNA ----> RNA-----> protein
we can use the presence or absence of the protein to indicate “1” or “0”
27
•Protein: like DNA, a protein is a linear polymer. It is made of units which are amino acids. Proteins are very complex and not completely understood. Proteins have four levels of structure:
•primary: the sequence of amino acids bonded together
•secondary: typically either an “alpha-helix” or a “beta-sheet”
•tertiary: formed from folding of the secondary structure into a three-dimensional configuration
•quaternary: formed by the association of multiple units, each folded into its own tertiary structure
28
Some proteins:
http://www.biochem.szote.u-szeged.hu/astrojan/protein2.htm
29
•Central Dogma:
Before the discovery of retroviruses and prions, this was believed to be the basic mechanism of inheritance in all living things
30
•Plasmid: a “loop” of DNA used to introduce new genetic material into a cell
•used for “genetic engineering”
•typically a plasmid will also have a section which ensures it will have resistance to a particular antibiotic; after insertion into the cell, this provides a marker to show that the new DNA really has been inserted
31
One possible simple mechanism:
[Figure labels: DNA (promoter, gene); input RNA, translate, Protein B input, inhibits; transcript, RNA output, translate, Protein A output (detect by fluorescence)]
Summary:
• 0 input --> output protein A (1);
• 1 input (RNA) ---> 0 output
32
Analogy to Electrical Inverter
33
Bio-Inverter Model [Weiss 1999]
34
Deterministic vs. Stochastic Model
• Deterministic model: inverter modeled using a set of differential equations with deterministic variables. No random components. Fixed order for reactions.
• Stochastic model: accounts for random noise components. Simulations run under different environmental conditions and other random noise variables. Random order for reactions.
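To make the deterministic approach concrete, here is a minimal sketch of an inverter as a single differential equation integrated with forward Euler. It is not the reaction set from [Weiss 1999]; it assumes a generic Hill-type repression term, and every parameter name and value (`k_tx`, `k_deg`, `K`, `n`) is hypothetical.

```python
def simulate_inverter(repressor, t_end=200.0, dt=0.01,
                      k_tx=1.0, k_deg=0.1, K=1.0, n=2.0):
    """Integrate dP/dt = k_tx / (1 + (R/K)^n) - k_deg * P from P(0) = 0.

    repressor: constant input level R (the "input protein").
    Returns the output protein level P at time t_end.
    """
    p = 0.0
    steps = round(t_end / dt)
    for _ in range(steps):
        production = k_tx / (1.0 + (repressor / K) ** n)  # repressed synthesis
        p += dt * (production - k_deg * p)                # forward Euler step
    return p


print("input low  ->", simulate_inverter(0.0))
print("input high ->", simulate_inverter(10.0))
```

With no repressor the output settles near `k_tx / k_deg`, and with a high repressor level it settles near zero, which is the inverter behavior. In this simplified model the steady-state output scales linearly with `k_tx`.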
35
Deterministic Simulation
36
Deterministic Simulation Transient Characteristics (Matlab)
37
Deterministic Simulation (6) Transient Characteristics (VHDL-AMS)
Deterministic Simulation—Example (5) Transient Characteristics
38
Deterministic Simulation Modified Transient Characteristics
• The transient characteristics of the inverter are computed using the modified reaction rates.
• The steady state output value has doubled since the transcription rate is doubled (k7*2).
• The rise time of the output has decreased to about 30 seconds, and the rise and fall times are equal.
• The reduction of repression rate and the dissociation rate increase are the reasons for the decrease of the rise time.
39
Deterministic Simulation Modified Transient Characteristics (Matlab)
40
Stochastic Simulation
• Stochastic simulation based on Gillespie algorithm [Gillespie 1977].
• Two random variables (time and the type of reaction) were introduced.
• In biology, the cell reaction occurs at random intervals of time.
• The reactions do not occur in order and are random.
• Temperature fluctuations, decay rates and other parameters also result in random noise.
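The two random variables can be seen directly in a small implementation of Gillespie's direct method. This sketch assumes a toy system with only two reactions, production of a protein at a constant rate and degradation proportional to the current count; the rate values and function name are hypothetical, and it is not the inverter model from the slides.

```python
import random


def gillespie_birth_death(k_prod=1.0, k_deg=0.1, t_end=500.0, seed=1):
    """Simulate protein count over time; returns (times, counts)."""
    rng = random.Random(seed)
    t, count = 0.0, 0
    times, counts = [t], [count]
    while True:
        a_prod = k_prod              # propensity of production
        a_deg = k_deg * count        # propensity of degradation
        a_total = a_prod + a_deg
        # Random variable 1: waiting time until the next reaction.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Random variable 2: which reaction fires.
        if rng.random() * a_total < a_prod:
            count += 1
        else:
            count -= 1
        times.append(t)
        counts.append(count)
    return times, counts


times, counts = gillespie_birth_death()
print(len(times), "events; final count:", counts[-1])
```

The count fluctuates around `k_prod / k_deg` instead of settling on it, and a different seed gives a different trajectory.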
41
Stochastic Simulation
42
Some areas to explore:
• Stochastic simulation: design space exploration
– Similar to CAD tool development for digital and analog circuits
– Currently trying simulated annealing, genetic algorithms
– Many other strategies can be explored
– Will also have applications in medical research
• Agent-based modeling and visualization
– 3D modeling and dynamic simulations using object-oriented programming
• Engineering design process for biomolecular computing applications
– Will modify traditional design flows for software, digital, and analog circuits
– Will provide support to circuit designers and biomedical researchers
• Development of DNA “standard cells”