Aims And Objectives 1
CHAPTER 1
AIMS AND OBJECTIVES
1.1 INTRODUCTION
Living organisms can be regarded as complex and stable computing systems that
maintain life. A biological system stores and interprets information, and computes
on that information in order to survive in its environment. Many researchers have
therefore tried to imitate biological systems, or to use them, for specific
computational purposes. Wiener proposed cybernetics, which studies artificial systems and
biomimetic machines (Wiener, 1948), and von Neumann compared the digital computer with
biological neural networks in his book (von Neumann, 1958). In the 1970s, Fogel,
Rechenberg, and Holland independently suggested evolutionary computation
models based on Darwin's natural selection and molecular biology (Black, 1996).
Langdon started artificial life research, which simulates life in digital
computers (Langdon, 1988). These approaches applied biological laws such as
evolution and selection to computer algorithm design. A different line of
thinking uses biological systems directly for computing. Feynman
suggested the possibility of computing at the molecular level in the late 1950s
(Feynman, 1960). His idea was that biological molecules can carry enormous amounts
of information in an exceedingly small space, so they have inherent computing power.
Finally, in 1994, Adleman realized a DNA computer by solving an instance of the
Hamiltonian path problem (Adleman, 1994).
Ever since ancient Greek times, man has suspected that the features of one generation
are passed on to the next. It was not until Mendel's work on garden peas was
recognised (see [38, 75]) that scientists accepted that both parents contribute material
that determines the characteristics of their offspring. In the early 20th century, it was
discovered that chromosomes make up this material. Chemical analysis of
chromosomes revealed that they are composed of both protein and deoxyribonucleic
acid, or DNA. The question was, which substance carries the genetic information? For
many years, scientists favoured protein, because of its greater complexity relative to
that of DNA. Nobody believed that a molecule as simple as DNA, composed of only
four subunits (compared to 20 for protein), could carry complex genetic information. It
was not until the early 1950s that most biologists accepted the evidence showing that
it is in fact DNA that carries the genetic code. However, the physical structure of the
molecule and the hereditary mechanism were still far from clear. In 1951, the biologist
James Watson moved to Cambridge to work with a physicist, Francis Crick. Using
data collected by Rosalind Franklin and Maurice Wilkins at King's College, London,
they began to decipher the structure of DNA. They worked with models made out of
wire and sheet metal in an attempt to construct something that fitted the available data.
Once satisfied with their model, they published the paper [78] (also see [77]) that
would eventually earn them (and Wilkins) the Nobel Prize for Physiology or
Medicine in 1962.
In Adleman's experiment, DNA molecules were used as information storage media, and the
techniques of molecular biology, such as hybridization, ligation, polymerase chain
reaction, and gel electrophoresis, were used as computational operators for extracting,
combining, copying, and sorting the information in the DNA molecules, respectively. Since
Adleman's pioneering work, DNA computing has attracted researchers seeking
to overcome the limitations of sequential silicon-based computing.
They have been drawn to its high storage density, massive parallelism, and
biocompatibility (Maley, 1998; Garzon and Deaton, 1999). To show its
computing power, DNA computing has been applied to various computational
problems (Adleman, 1994; Ouyang et al., 1997), logical problems (Liu et al., 2000;
Mao et al., 2000), Boolean circuit development (Owenson et al., 2001), computational
models (Mills Jr., 2002), medical problems (Benenson et al., 2004), nanostructures
(Winfree et al., 1998), and associative memory construction (Baum, 1995).
DNA (deoxyribonucleic acid) computing, also known as molecular computing, is
a new approach to massively parallel computation based on groundbreaking work by
Adleman. DNA computing was proposed as a means of solving a class of intractable
computational problems in which the computing time can grow exponentially with
problem size (the 'NP-complete' or non-deterministic polynomial time complete
problems). A DNA computer is basically a collection of specially selected DNA
strands whose combinations will result in the solution to the problem at hand.
Technology is currently available both to select the initial strands
and to filter the final solution. DNA computing is a new computational paradigm that
employs (bio)molecular manipulation to solve computational problems, at the same
time exploring natural processes as computational models. In 1994, Leonard Adleman
at the Laboratory of Molecular Science, Department of Computer Science, University
of Southern California surprised the scientific community by using the tools of
molecular biology to solve a difficult computational problem. The main idea was the
encoding of data in DNA strands and the use of tools from molecular biology to
execute computational operations. Besides the novelty of this approach, molecular
computing has the potential to outperform electronic computers. For example, DNA
computations may use a billion times less energy than an electronic computer while
storing data in a trillion times less space. Moreover, computing with DNA is highly
parallel: in principle there could be billions upon trillions of DNA molecules
undergoing chemical reactions, that is, performing computations, simultaneously.
L. M. Adleman launched the field of DNA computing with a demonstration in 1994
that strands of DNA could be used to solve the Hamiltonian path problem for a simple
graph. He also identified three broad categories of open questions for the field. First,
is DNA capable of universal computation? Second, what kinds of algorithms can
DNA implement? Third, can the error rates in the manipulations of the DNA be
controlled enough to allow for useful computation? In the two years that have
followed, theoretical work has shown that DNA is in fact capable of universal
computation. Furthermore, algorithms for solving interesting questions, like breaking
the Data Encryption Standard, have been described using currently available
technology and methods. Finally, a few algorithms have been proposed to handle
some of the apparently crippling error rates in a few of the common processes used to
manipulate DNA. It is thus unlikely that DNA computation is doomed to be only a
passing curiosity. However, much work remains to be done on the containment and
correction of errors. It is far from clear if the problems in the error rates can be solved
sufficiently to ever allow for general-purpose computation that will challenge the
more popular substrates for computation. Unfortunately, biological demonstrations of
the theoretical results have been sadly lacking. To date, only the simplest of
computations have been carried out in DNA. To make significant progress, the field
will require both the assessment of the practicality of the different manipulations of
DNA and the implementation of algorithms for realistic problems. Theoreticians, in
collaboration with experimentalists, can contribute to this research program by
settling on a small set of practical and efficient models for DNA computation.
DNA also has a role in security. The security of traditional cryptography
usually rests on hard mathematical problems for which no fast algorithm is
currently known. A well-known example is Rivest-Shamir-Adleman (RSA) encryption,
whose security rests on the difficulty of factoring a large number into its two
prime factors. If fast methods for such mathematical problems were found, these
schemes might no longer be secure. DNA computing provides parallel processing at
the molecular level, introducing a brand-new data structure and method of calculation.
It can attack different parts of a computing problem simultaneously, which
challenges traditional information security technology. A number of proposals have
been made for breaking conventional cryptosystems by DNA computing, which
suggests that public-key cryptosystems may be insecure. DNA
computing is a new computational paradigm that harnesses the potential massive
parallelism, high information density, and low power consumption of biomolecules,
and it brings both challenges and opportunities to traditional cryptography. It
computes by means of the biomolecular structure of DNA and the techniques of
molecular biology, and it is a novel interdisciplinary field with potential for
growth. In a pioneering study, Adleman demonstrated the
first DNA computation, which marked the beginning of a new stage in the era of
information. This approach was extended by Lipton to solve another
NP-complete problem, the satisfiability problem. These elegant studies
demonstrated how problems corresponding to Boolean formulas can be solved by a
massively parallel processing procedure. DNA computing has been proposed to solve
difficult combinatorial search problems such as the Hamiltonian path problem (HPP),
using the vast parallelism to search among a large number of
possible solutions represented by DNA strands. In 2002, Braich et al. obtained the
solution of a 20-variable 3-SAT problem on a DNA computer. However, DNA
computing has many further exciting applications besides pure combinatorial
search. Because it can attack different parts of a computing problem simultaneously,
it brings both challenges and opportunities to traditional information security technology.
For example, in 1995, Boneh et al. described an approach to breaking the Data
Encryption Standard (DES) by using DNA computing methods. In 1999, Clelland et
al. achieved a form of steganography by hiding secret messages encoded as
DNA strands among a multitude of random DNA. DNA and RNA are appealing
media for data storage because very large amounts of data can be stored in
a compact volume. They vastly exceed the storage capacities of conventional electronic,
magnetic, and optical media. A gram of DNA contains about 10^21 DNA bases, or about
10^8 terabytes. Hence, a few grams of DNA may have the potential of storing all the
data stored in the world. Recent research has considered DNA as a medium for
ultra-scale computation and for ultra-compact information storage. DNA cryptography is a
newborn cryptographic field that emerged with research on DNA computing, in which
DNA is used as the information carrier and modern biological technology is used as
the implementation tool. The vast parallelism and extraordinary information density
inherent in DNA molecules are explored for cryptographic purposes such as
encryption, authentication, signature, and so on. DNA cryptography is
far from mature both in theory and in realization, which may be the reason why only
a few examples of DNA cryptography have been proposed.
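The per-gram storage figure quoted in the paragraph above (about 10^21 bases, or about 10^8 terabytes) can be checked with a short calculation. The sketch below assumes 2 bits per base (four bases, so log2(4) = 2) and decimal terabytes of 10^12 bytes. Neither assumption is stated in the text, and the function name is hypothetical.

```python
import math

BASES_PER_GRAM = 10**21        # figure quoted in the text
BITS_PER_BASE = math.log2(4)   # assumption: each of the 4 bases carries 2 bits
BYTES_PER_TERABYTE = 10**12    # assumption: decimal terabyte


def terabytes_per_gram(bases_per_gram: int = BASES_PER_GRAM) -> float:
    """Convert a number of bases per gram into terabytes of raw capacity."""
    bits = bases_per_gram * BITS_PER_BASE
    return bits / 8 / BYTES_PER_TERABYTE


if __name__ == "__main__":
    print(f"{terabytes_per_gram():.1e} TB per gram")
```

Under these assumptions the result is 2.5 x 10^8 terabytes per gram, which is the same order of magnitude as the figure given in the text.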
Although DNA computing sets a precedent for molecular computing and broadens
our understanding of natural computing phenomena, it remains at a
theoretical stage. Several problems with DNA computing are still
unresolved:
1. Its computing models mostly use molecular techniques to solve one particular
problem. The variety of problems leads to a variety of computing
schemes, and there is still no uniform computing and coding model.
2. DNA computing only converts time complexity into space complexity.
3. DNA computing is subject to errors, which arise randomly with some
probability and can be gradually amplified as the number of experimental
steps increases.
4. DNA solutions deteriorate easily during reactions, and even
adsorption onto the test tube wall may result in fatal errors.
5. Most proposals implement the computation by performing a series
of biochemical reactions on a set of DNA molecules, which requires human
intervention at each step.
Thus, the difficulty with such methods for DNA computing is that the number
of laboratory procedures and the time they consume grow with the size of the
problem. DNA computing in its current form is therefore not well suited to solving
real problems, and it does not yet pose a real threat to the security of
cryptography. At the same time, encryption schemes based on DNA computing keep
appearing, offering protection for DNA molecule
banks and for the information held in DNA molecules. However, the security, generality,
validity and key management of these encryption mechanisms have not yet been analysed
systematically in theory. DNA cryptography therefore still needs dedicated study and
broad discussion, and its prospects remain uncertain until DNA computing becomes
really mature. Nevertheless, a DNA cipher is a beneficial supplement to existing
mathematical ciphers, and it is a good choice especially for encryption systems with
low real-time demands. Relatively speaking, DNA computing has brighter
prospects in steganography and authentication, which offer an extra layer of protection
compared with encryption alone. With the rapid development of modern biotechnology,
once-costly biological experiments have become routine. If the molecular world can be
controlled at will, it may be possible to achieve vastly better performance in
information storage and information security.
Thus, DNA computing can be described as a method of solving computational
problems with the help of chemical and biological operations on DNA strands. It was
introduced by Adleman in 1994 [Adleman, 1994], who showed how to solve the
Hamiltonian path problem by manipulating DNA strands in test tubes. Since then,
more and more researchers have been motivated by the promising future of this area
and have started working on it. The basic idea of DNA computing arises from a mapping
between the physical processes in an electronic computer and the chemical processes
in DNA reactions. In an electronic computer, everything is encoded in binary (0, 1)
strings, while every DNA strand is encoded in four nucleotides: A, T, G and C. In
an electronic computer, the basic operations can be treated as manipulations of binary
strings, while a number of biological operations are available on DNA strands, e.g.
ligation (concatenation), amplification (copying) and substitution, which can be
performed in a controlled manner by modern biological technologies.
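The mapping between binary strings and DNA strands described above can be illustrated in software. The sketch below is only a string-level model: the 2-bit-per-base code table is an assumption chosen for illustration (the chapter does not prescribe one), and the function names `encode_bits`, `decode_strand`, `ligate`, `amplify` and `substitute` are hypothetical stand-ins for the laboratory operations.

```python
# Hypothetical 2-bit-per-base code; the chapter does not prescribe a mapping.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bits(bits: str) -> str:
    """Encode an even-length binary string as a strand over A, C, G, T."""
    if len(bits) % 2 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected an even-length string of 0s and 1s")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_strand(strand: str) -> str:
    """Recover the binary string from a strand over A, C, G, T."""
    return "".join(BASE_TO_BITS[base] for base in strand)


def ligate(left: str, right: str) -> str:
    """Ligation modelled as concatenation of two strands."""
    return left + right


def amplify(strand: str, copies: int) -> list[str]:
    """Amplification modelled as producing identical copies of a strand."""
    return [strand] * copies


def substitute(strand: str, old: str, new: str) -> str:
    """Substitution modelled as replacing every occurrence of a subsequence."""
    return strand.replace(old, new)


if __name__ == "__main__":
    strand = encode_bits("00011011")
    print(strand, decode_strand(strand))
```

Real strands carry chemical constraints that plain strings do not, so this should be read as a way to fix ideas rather than as a model of the chemistry.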
1.2 MOTIVATIONS OF THE RESEARCH
"Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs
30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps
weigh 1 1/2 tons." So said Popular Mechanics in 1949 [8]. Today, in the age of smart
cards and wearable PCs, this statement is striking because it falls so short of reality.
Fifty years from now, who would be prepared to predict how close we will have come
to the levels of molecular miniaturisation described in Feynman's visionary paper
[26]?
Huge advances in miniaturization have been made since the days of room-sized
computers, yet the underlying computational model (the Von Neumann architecture)
has remained the same. Today's supercomputers still employ the kind of sequential
logic used by the mechanical dinosaurs of the 1930s. Some researchers are now
looking beyond these boundaries and are investigating entirely new media and
computational models. These include quantum, optical and DNA-based computers. It
is the last development that this thesis concentrates on. Despite the popular image of
silicon-based computers for computation, an embryonic field of molecular
computation is emerging, where molecules in solution perform computational
operations. DNA, which is known to store biological information, is being used as a
substrate for molecular computation.
The idea that living cells and molecular complexes can be viewed as potential
machine components dates back to the late 1950s, when Richard Feynman delivered
his famous paper describing "sub-microscopic" computers. More recently, several
papers [2, 7, 52] (also [5, 36, 64]) have advocated the realisation of massively parallel
computation using the techniques and chemistry of molecular biology. The
development of existing silicon-based computers was only made possible by the
invention of the transistor, which facilitated for the first time electronic manipulation
of silicon. We may draw an interesting parallel between this historical precedent and
the development of molecular-scale computers. Although the concept dates back to
the late 1950s, only now do we have at our disposal the tools and techniques of
molecular biology required to construct the prototype molecular computers. In [2],
Adleman described how a computationally intractable problem, known as the directed
Hamiltonian Path Problem (HPP) might be solved using molecular methods. Recall
that the HPP involves finding a path through a graph that visits each vertex exactly
once. Adleman's method employs a simple, massively parallel random search. The
algorithm is not executed on a traditional, silicon-based computer, but instead
employs the "test-tube" technology of genetic engineering. By representing
information as sequences of bases in DNA molecules, Adleman shows how existing
DNA-manipulation techniques may be used to quickly detect and amplify desirable
solutions to a given problem.
How can we combine a flask of DNA with biological tools to solve a hard
mathematical problem? Adleman's experiment proceeds as follows. The first stage
created a flask of DNA molecules, each molecule encoding a potential solution to the
problem. With reference to the HPP, for example, each strand encoded a path (not
necessarily Hamiltonian) through the graph. If the flask holds every DNA molecule that
encodes a path of length n, for a graph with n vertices, we can be sure that every possible
solution is present, some legal but most illegal. Once the entire solution space was
present in a flask, the DNA computer really came into its own. Adleman used a small
set of biological tools to sift out DNA that encoded illegal solutions, that is,
paths that do not visit every vertex or paths that visit a particular vertex more than
once. At the end of the sifting process, he was left only with strands that encoded
legal solutions.
DNA computing is an interdisciplinary research area that has grown quickly since DNA
molecules were first used in a computational process. One of the main objectives of
this research area is to produce, in the near future, a biologically inspired computer
based on DNA molecules to replace, or at least beneficially complement, silicon-based
computers. R. Feynman suggested constructing a computer from molecules in
1964 [1]. It took about three decades until Adleman, in 1994, carried out a
proof-of-principle study showing that DNA molecules can solve an NP problem, the
Hamiltonian Path Problem (HPP), through biochemical procedures [2].
DNA is the basic storage medium of all living cells. Its main function has been to
store and transmit the data of life for billions of years. Roughly 10 trillion
DNA molecules could fit into a space the size of a marble. Since all these
molecules can process data simultaneously, we could in theory perform 10 trillion
calculations at the same time in a small space. DNA computing is more generally
known as molecular computing. It is an interdisciplinary field that combines
biology, chemistry, mathematics and computer science. Computing with DNA
offers a completely new paradigm for computation. The main idea is to
encode data in DNA strands and to use laboratory techniques of molecular
biology, called bio-operations, to manipulate the DNA strands in a test
tube in order to simulate arithmetical and logical operations. It is estimated that a mix
of 10^18 DNA strands could operate 10^4 times faster than today's
advanced supercomputers [3]. Since then, DNA computing has been an area of exciting
multidisciplinary research. Rozenberg et al. in 1999 distinguished two major lines
of research in DNA computing: (i) the theoretical line, concerned with models,
algorithms and paradigms for DNA computing, and (ii) the experimental line,
concerned with the design of laboratory experiments to test biochemical feasibility
[4]. Even though there is still a long way to go before DNA algorithms can be applied
to real-life problems, researchers are interested in modelling and testing solutions in
case studies in order to probe the limitations of DNA itself. Today, many groups of
active researchers in this field develop models and carry out laboratory experiments,
especially to address the challenges of biochemical feasibility. Other groups aim
to develop a real DNA computer and to build DNA algorithms for engineering
or application problems.
Of course, for DNA computers, each individual operation, for example, extracting
DNA strands, can take minutes or even hours to perform. This cost of a computational
step, when compared to that of supercomputers capable of executing a trillion
operations a second, looks unimpressive. However, the real power of DNA computers
lies in their inherent parallelism: each operation is performed not on one single DNA
strand, but on every strand in the flask simultaneously. The fastest supercomputers in
existence today are capable of executing around a trillion operations a second. DNA
computers have the potential to execute more than a thousand trillion operations per
second, as well as being a billion times more energy-efficient and requiring a trillionth
of the space needed by existing storage media. Nature has information compression
down to a fine art: over forty 1 Mb floppy discs are required to store the genome of a
single fruit fly [74].
By the nature of the DNA molecule, Watson-Crick complementarity plays the most
important role in DNA computing. DNA consists of four nucleic acid bases: Adenine (A),
Guanine (G), Cytosine (C), and Thymine (T). Adenine pairs only with
Thymine, and Cytosine pairs only with Guanine.
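The pairing rule above (A with T, C with G) is easy to express as a lookup table. The sketch below is a minimal illustration with hypothetical function names. It also reverses the complement when testing whether two strands can pair, because the two strands of a double helix run in opposite directions. That detail is standard molecular biology, but it is not discussed in this chapter.

```python
# Watson-Crick pairing rule from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base Watson-Crick complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]


def can_hybridize(first: str, second: str) -> bool:
    """True if the two strands are exact reverse complements of each other."""
    return second == reverse_complement(first)


if __name__ == "__main__":
    print(complement("ATGC"), reverse_complement("ATGC"))
```

Exact matching is a simplification. In the laboratory, partially matching strands can also bind, which is one source of the errors mentioned elsewhere in this chapter.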
Adleman's experiment addressed the Hamiltonian path problem. A Hamiltonian path is a
sequence of edges in a graph that touches every vertex exactly once. The Hamiltonian
path problem is to decide whether a graph has a Hamiltonian path or not. Let G be a graph
with n vertices, two of which are designated v_in and v_out. G is said to have a
Hamiltonian path from v_in to v_out if there is a path of edges starting at v_in and
ending at v_out that contains every vertex of G exactly once. An instance of the directed
Hamiltonian path problem is a triple (G, v_in, v_out), and the question is whether G has
a Hamiltonian path from v_in to v_out. Adleman used the following nondeterministic
algorithm to solve the directed Hamiltonian path problem for an input (G, v_in, v_out):
1. Generate a set of random paths in G.
2. Extract all paths beginning with v_in and ending with v_out.
3. Extract all paths with length exactly n - 1.
4. Extract all paths that contain every vertex at most once.
5. Accept that there is a Hamiltonian path if there are any paths left; otherwise,
reject.
The above steps are realized as molecular computation phases. Vertices and edges of
G are encoded as DNA polymers. In step 1, ligation builds DNA strands that represent
random paths in G. In step 2, the Watson-Crick complements of the encodings of v_in
and v_out are used to extract the strands with the correct start and end. In step 3,
the DNA strands are separated by length in order to obtain encodings of length n - 1.
Next, the DNA is denatured. In step 4, each vertex is checked, by means of the
Watson-Crick complement of its encoding, to confirm that it is present in a path only
once. In step 5, gel electrophoresis is used to test whether any strand is left. Between
the steps, polymerase chain reaction (PCR) is used to amplify the intermediate results
[34].
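The five steps listed above can be mimicked in software to make the filtering logic concrete. The sketch below is an in-silico analogy, not Adleman's laboratory protocol: random walks stand in for ligated strands, set comprehensions stand in for the extraction steps, and the example graph, the sample count, the seed and the function names are all assumptions made for illustration.

```python
import random


def random_path(vertices, edges, rng, max_vertices):
    """Step 1 helper: one random walk, standing in for one ligated strand."""
    path = [rng.choice(vertices)]
    target = rng.randint(1, max_vertices)
    while len(path) < target:
        successors = [v for (u, v) in edges if u == path[-1]]
        if not successors:
            break  # dead end: the strand cannot be extended
        path.append(rng.choice(successors))
    return tuple(path)


def adleman_filter(vertices, edges, v_in, v_out, samples=5000, seed=0):
    """Return the set of sampled paths that survive steps 2 to 4."""
    rng = random.Random(seed)
    n = len(vertices)
    # Step 1: generate a set of random paths in G.
    tube = {random_path(vertices, edges, rng, 2 * n) for _ in range(samples)}
    # Step 2: keep paths that begin with v_in and end with v_out.
    tube = {p for p in tube if p[0] == v_in and p[-1] == v_out}
    # Step 3: keep paths of length exactly n - 1 edges (n vertices).
    tube = {p for p in tube if len(p) - 1 == n - 1}
    # Step 4: keep paths that contain every vertex at most once.
    tube = {p for p in tube if len(set(p)) == len(p)}
    # Step 5: the caller accepts if anything is left in the tube.
    return tube


if __name__ == "__main__":
    # Hypothetical 4-vertex directed graph, not the graph from the experiment.
    vertices = [0, 1, 2, 3]
    edges = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3)]
    survivors = adleman_filter(vertices, edges, v_in=0, v_out=3)
    print("accept" if survivors else "reject", sorted(survivors))
```

Like the test-tube version, the sketch is probabilistic: an empty result means only that no Hamiltonian path was sampled. The laboratory procedure relies on the sheer number of strands to make a missed path unlikely, whereas here the sample count plays that role.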
Silicon microprocessors have been the heart of the computing
world for more than 40 years. In that time, manufacturers have crammed more and
more electronic devices onto their microprocessors. In accordance with Moore's Law,
the number of electronic devices put on a microprocessor has doubled every 18
months. Moore's Law is named after Intel co-founder Gordon Moore, who predicted in
1965 that microprocessors would double in complexity every two years. Many have
predicted that Moore's Law will soon reach its end because of the physical speed and
miniaturization limitations of silicon microprocessors.
When silicon computers are compared with DNA computers in terms of speed,
two factors must be considered: the speed of each operation and the degree of
parallelism. Instructions in an electronic computer are much faster than laboratory
experiments (millions per second versus one per hour or even per day). But a DNA
computer has vastly more parallelism than an electronic computer, so its
high parallelism can outweigh the slowness of biological experiments. Furthermore,
laboratory experiments can be sped up once the manipulation of DNA strands can be
done automatically by machine. As for energy efficiency, the
energy cost of a DNA operation (on one strand) is about 10^10 times less than
the energy cost of an instruction in an electronic computer. As for data storage,
DNA is economical: one tube can store billions of DNA
strands.
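The trade-off between slow individual steps and massive parallelism can be made concrete with a rough calculation. The sketch below is illustrative only. It combines two figures quoted in this chapter (a mix of 10^18 strands, and a supercomputer executing a trillion operations a second) with the assumption that one laboratory step takes one hour. It also counts one strand-level event as one operation, which is a simplification, and the function name is hypothetical.

```python
DNA_STRANDS = 1e18                   # strand count quoted in this chapter
SECONDS_PER_LAB_STEP = 3600.0        # assumption: one laboratory step per hour
SUPERCOMPUTER_OPS_PER_SECOND = 1e12  # "a trillion operations a second"


def strand_operations_per_second(strands: float, seconds_per_step: float) -> float:
    """Aggregate rate when one step acts on every strand at once."""
    return strands / seconds_per_step


if __name__ == "__main__":
    dna_rate = strand_operations_per_second(DNA_STRANDS, SECONDS_PER_LAB_STEP)
    ratio = dna_rate / SUPERCOMPUTER_OPS_PER_SECOND
    print(f"DNA: {dna_rate:.2e} strand operations per second")
    print(f"Ratio to the electronic machine: {ratio:.0f}x")
```

Under these assumptions the aggregate rate is roughly 2.8 x 10^14 strand operations per second, a few hundred times the electronic figure. The result is very sensitive to the assumed step time and strand count.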
DNA computers have the potential to take computing to new levels, picking up where
Moore's Law leaves off. There are several advantages of using DNA instead of silicon:
(i). As long as there are cellular organisms, there will always be a supply of DNA.
(ii). The large supply of DNA makes it a cheap resource.
(iii). Unlike the toxic materials used to make traditional microprocessors, DNA
biochips can be made cleanly.
(iv). DNA computers are many times smaller than today's computers.
The main issue in applying DNA computing technologies to real
applications is how to represent numerical values in DNA strands, especially when a
number of numerical values are related. Various studies have investigated, and
continue to investigate, this problem. Researchers have proposed
several techniques, as discussed before, to solve it. However, so far all
proposed solutions are suitable only for a limited number of numerical values and
have not been tested on larger sets of values. The problem therefore remains open.
Solving the problem of representing numerical values in DNA strands would make
DNA computing more practical for many engineering and real application
problems. Developing robust wet-lab methods for engineering
problems is critically important in DNA computing. One important step in a wet-lab
experiment is reading out the end result. At present, gel
electrophoresis, in which the strands are sorted into bands, is the most popular
technique for this step. However, gel electrophoresis has its own limitations, which
are a disadvantage for DNA techniques. One drawback of
gel electrophoresis is that the gel
image cannot be analysed for all bands at one time when many base pairs are involved,
because some bands, especially the earlier ones, might not yet be present in the buffer
reader. Only a certain part of the bands can therefore be read at one time, which
makes it difficult to carry out the analysis properly. Another important wet-lab
technique in DNA computing is PCR. PCR is used to amplify the number of
copies of a specific region of DNA, in order to produce enough DNA to be adequately
tested. This technique can be used to identify, with a very high probability,
disease-causing viruses and/or bacteria, a deceased person, or a criminal suspect. However,
traditional PCR has several limitations that may affect results in DNA
computing, and several researchers have focused on enhancing the technique to overcome
them. As a result, real-time PCR is employed in several wet-lab experiments in
order to enhance the readability of the end results.
Even though current difficulties found in translating theoretical DNA computing
models into real life have not yet been sufficiently overcome, there is still potential
in other areas of development. DNA computing offers a new approach to solving
combinatorial problems such as NP-hard problems in parallel. This advantage offers the
potential to solve problems that a traditional machine faces when processing a large
number of tasks. Researchers may thus be able to solve problems that
involve a large number of calculations, such as clustering optimization, scheduling
problems and so on. In addition, DNA can store a great deal of information in
a small space compared with digital storage. Today
we deal with huge volumes of information, not only in
text documents but also in images, video and other formats that
require a great deal of storage. DNA therefore seems to offer a good solution to
today's storage problem. As early research dealing with large-scale storage, Tsaftaris et
al. [15] proposed employing DNA in the signal processing field, and Tsuboi
et al. did so in image processing [17].
According to researchers, there are three main reasons why DNA computation is
practical. First, a computer built for a specific task will be easier to design and
implement, with less need for functional complexity and flexibility. Second, DNA
computing may prove entirely inefficient for a wide range of problems, and directing
efforts towards universal models may be diverting energy away from its true calling.
Third, the types of hard computational problems that DNA-based computers may be able
to solve effectively are of sufficient economic importance that a dedicated processor
would be financially reasonable. With so many possible advantages over conventional
techniques, DNA computing has bright prospects for practical use.
Future work in this field should begin to incorporate cost-benefit analysis so that
comparisons with existing techniques can be more appropriate.
"Computers in the future may weigh no more than 1.5 tons." So said Popular
Mechanics in 1949. Most of us today, in the age of smart cards and wearable PCs,
would find that statement laughable. We have made huge advances in miniaturization
since the days of room-sized computers, yet the underlying computational framework
has remained the same. Today's supercomputers still employ the kind of sequential
logic used by the mechanical dinosaurs of the 1930s. Some researchers are now
looking beyond these boundaries and are investigating entirely new media and
computational models. These include quantum, optical and DNA-based computers.
Current silicon technology has the following limitations:
(i). Circuit integration dimensions
(ii). Clock frequency
(iii). Power consumption
(iv). Heat dissipation
The complexity of the problems that modern processors can handle keeps growing, but
some great challenges require computational capabilities that not even the most
powerful distributed systems can reach.
The idea that living cells and molecular complexes can be viewed as potential
machine components dates back to the late 1950s, when Richard Feynman delivered
his famous paper describing "sub-microscopic" computers. More recently, several
people have advocated the realization of massively parallel computation using the
techniques and chemistry of molecular biology. DNA computing was grounded in
reality at the end of 1994, when Leonard Adleman announced that he had solved a
small instance of a computationally intractable problem using a small vial of DNA.
By representing information as sequences of bases in DNA molecules, Adleman
showed how to use existing DNA-manipulation techniques to implement a simple,
massively parallel random search. The problem he solved was the directed "Hamiltonian
path" problem, a close relative of the travelling salesman problem, as stated above.
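The generate-and-filter idea behind this massively parallel random search can be sketched in software. The following is only an illustrative simulation, not Adleman's laboratory protocol: the function names, the example graph `EDGES` and the sample count are assumptions made for the example, and every walk is started at the input vertex for simplicity, whereas in the test tube strands form from arbitrary vertices and the start is checked by a filtering step.

```python
import random

def random_walk(edges, start, max_len, rng):
    """Grow one random walk along directed edges, loosely like one strand forming."""
    path = [start]
    while len(path) < max_len:
        successors = [v for (u, v) in edges if u == path[-1]]
        if not successors:          # dead end: the walk stops early
            break
        path.append(rng.choice(successors))
    return tuple(path)

def adleman_style_search(n, edges, v_in, v_out, samples=2000, seed=0):
    """Generate many random walks, then filter them step by step."""
    rng = random.Random(seed)
    # Step 1: generate random paths (sequentially here, in parallel in a test tube).
    tube = {random_walk(edges, v_in, n, rng) for _ in range(samples)}
    # Step 2: keep only paths that end at v_out (they start at v_in by construction).
    tube = {p for p in tube if p[-1] == v_out}
    # Step 3: keep only paths with exactly n vertices.
    tube = {p for p in tube if len(p) == n}
    # Step 4: keep only paths that visit every vertex.
    tube = {p for p in tube if set(p) == set(range(n))}
    return tube

# Hypothetical 5-vertex directed graph used only for illustration.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3), (2, 1), (3, 1)]

if __name__ == "__main__":
    print(sorted(adleman_style_search(5, EDGES, 0, 4)))
```

Whatever survives all four filters is a Hamiltonian path from the input vertex to the output vertex. The software version pays for every sample in time, while the DNA version forms all candidate strands at once.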
There are two reasons for using molecular biology to solve computational problems.
(i) The information density of DNA is much greater than that of silicon: 1 bit can
be stored in approximately one cubic nanometer. Other storage media, such as
videotapes, need about 1,000,000,000,000 cubic nanometers to store 1 bit.
(ii) Operations on DNA are massively parallel: a test tube of DNA can contain
trillions of strands. Each operation on a test tube of DNA is carried out on all
strands in the tube in parallel.
Other researchers have reported liquid-phase systems for DNA computing. In an article
in the 15 February 2000 issue of Proceedings of the National Academy of Sciences,
Laura Landweber and her Princeton University colleagues used a hybrid DNA-RNA
computing system to compute solutions to a variant of a well-known chess problem.
The "Knight problem" asks what configurations of knights can be placed on a chess
board with n squares on a side such that none of the knights is attacking another. RNA
strands with 10 bits (each bit represented by 15 nucleotides separated by 5-nucleotide
spacers) were randomly synthesized.
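The Knight problem as stated above can be checked by brute force for small boards. The sketch below is a conventional software enumeration, not the RNA-based procedure of the Princeton group; the function names are assumptions made for illustration. For a 3x3 board it examines all 2^9 placements (the empty board included) and keeps those in which no knight attacks another, which yields the 94 solutions quoted in this section.

```python
from itertools import product

def knight_attacks(n):
    """Return the set of unordered pairs of squares a knight's move apart."""
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    pairs = set()
    for r, c in product(range(n), repeat=2):
        for dr, dc in jumps:
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n and 0 <= c2 < n:
                # Squares are numbered row by row: index = row * n + column.
                pairs.add(frozenset({r * n + c, r2 * n + c2}))
    return pairs

def legal_knight_boards(n):
    """List every 0/1 board (1 = knight) in which no two knights attack each other."""
    attacks = [tuple(pair) for pair in knight_attacks(n)]
    boards = []
    for bits in product((0, 1), repeat=n * n):
        if all(not (bits[a] and bits[b]) for a, b in attacks):
            boards.append(bits)
    return boards

if __name__ == "__main__":
    print(len(legal_knight_boards(3)))
```

Each attacking pair acts as one constraint of the form "not both squares occupied", which is the kind of bitwise condition that the strand-destruction steps enforce chemically.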
In this approach, an enzyme, RNase H, breaks down those RNA chains that do not
meet the problem's constraints by attacking the RNA strand of DNA-RNA hybrid
molecules. The Princeton researchers write in their PNAS article, "A value of 0 or 1 is
assigned to a specific bit position by destroying all strands in the RNA library which
do not have the value at this position." The researchers repeated this process for the
other constraints, and DNA molecules complementary to the remaining RNA strands
were generated using the polymerase chain reaction. The researchers then read out the
results by sequencing the DNA complementary to 43 distinct resulting RNA molecules.
Of the 94 possible solutions to a 3x3 Knight problem, 42 proper solutions appeared
among these randomly selected strands; one strand, involving an illegal
placement of a knight, was incorrect, giving a 97.7 percent success rate in finding
correct solution strands.
Originally, the goal of DNA computing, as envisioned by
Adleman and others, was to solve numerical problems. "You pose some hard
optimization problems, convert it into DNA and use DNA's massive parallelism to
search a large number of possibilities, with a chemical reaction to filter out the ones
you don't want," Winfree says. In contrast, much of the research underway in DNA
computation is about self-assembly of structures, "gaining a different kind of control
on molecular processes." Researchers in this field are "essentially using the idea of an
algorithm in information processing, a specific set of rules that can be iterated to
accomplish some task" and applying it to chemical tasks. "The task is now one in
which you have chemical input (the components for the structure you want to
build) and chemical output (the final structure)," Winfree says. Because an
algorithm is involved, the assembly can be viewed as a form of computation. "It's
another case where the tools and concepts of computer science are brought to bear on
how to solve the problem."
In particular, Seeman and collaborator John Reif of Duke
University in Durham, North Carolina, have used DNA structures to form
self-assembling tilings, analogous to Wang tiles, in which multicoloured tiles
self-assemble to form a mosaic with the same colour flanking each shared edge.
Instead of colours, the "tiles" are multi-armed hybrid DNA molecules with four arms,
each with a single-stranded sticky end, says Seeman. The sticky ends are very
important, giving predictable affinity and structure to the molecules. "About a year
and a half ago, we reported a cumulative XOR [exclusive OR] computation" by
self-assembly of these DNA tiles, Seeman says. "The nice thing about such systems is that
it appears that they will scale nicely."
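For readers unfamiliar with the term, a cumulative XOR produces a running output in which each bit is the XOR of the previous output bit and the current input bit. The sketch below shows only this logical function. The lookup table `TILES` is a hypothetical stand-in for the idea that each tile type encodes one row of the XOR truth table, and it does not model the actual DNA tile design.

```python
# Hypothetical tile table: (previous output bit, current input bit) -> new output bit.
# In a self-assembly setting, one tile type would correspond to each entry.
TILES = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}

def cumulative_xor(inputs):
    """Return y where y[0] = x[0] and y[i] = y[i-1] XOR x[i]."""
    outputs = []
    previous = 0  # starting value, so the first output equals the first input
    for x in inputs:
        previous = TILES[(previous, x)]  # only the matching "tile" fits here
        outputs.append(previous)
    return outputs

if __name__ == "__main__":
    print(cumulative_xor([1, 0, 1, 1]))
```

Because each step depends only on its two neighbours, the same four tile types suffice for inputs of any length, which is one way to read the remark that such systems scale nicely.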
A current limitation to this branch of DNA computing is the use of natural enzymes,
which only recognize and act on certain nucleotide sequences. The development of
designer enzymes that can identify additional sequences might take decades, Shapiro
notes. "In the medium term, we can envision biotechnology applications for such
automata, such as to analyze DNA without first sequencing it," Shapiro says. In the
longer term, he foresees construction of artificial "cells" with synthetic-DNA
programming. "A lot of processes in the living cell resemble computing in
fundamental ways."
1.3 OBJECTIVE OF THE RESEARCH
The objective of our research is to study and analyze various specific models of DNA
computing in order to obtain a Generalized model. Various DNA computing models have
been developed. Some of them are problem specific, such as [Adleman, 1994]. Compared
with electronic computers, these models show a potential advantage in solving hard
problems. Due to the highly parallel characteristics of DNA operations, the
corresponding DNA algorithms scale well with the size of the problem. The various DNA
computing models have been studied and analyzed, and the drawbacks and
limitations of one model relative to another have been taken into account. In addition,
the Generalized model has been developed to compute NP-complete
problems, viz. the Hamiltonian Path problem, the Maximum Clique problem, Sub Graph
Isomorphism, Maximum Independent Set and the 3-Vertex colouring problem. In this
model, each of the above-mentioned algorithms takes the result of a permutation as
its input. The permuted output is then processed according to the algorithm.
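The idea of feeding one permuted input to several problem-specific algorithms can be illustrated in ordinary software. The sketch below is not the Generalized model developed in this thesis; it is a conventional simulation under the assumption that vertices are numbered 0 to n-1, and all function names are hypothetical. Every permutation of the vertices is generated once, and each problem then applies its own filter or processing step to that shared input.

```python
from itertools import permutations

def hamiltonian_paths(n, directed_edges):
    """Keep the permutations whose consecutive vertices are all joined by an arc."""
    arcs = set(directed_edges)
    return [p for p in permutations(range(n))
            if all((a, b) in arcs for a, b in zip(p, p[1:]))]

def greedy_independent_set(order, edges):
    """Scan vertices in the given order, keeping each one not adjacent to those kept."""
    chosen = []
    for v in order:
        if all(frozenset({v, u}) not in edges for u in chosen):
            chosen.append(v)
    return set(chosen)

def max_independent_set(n, edge_list):
    """Process every permutation greedily and return the largest set found."""
    edges = {frozenset(e) for e in edge_list}
    best = set()
    for order in permutations(range(n)):
        candidate = greedy_independent_set(order, edges)
        if len(candidate) > len(best):
            best = candidate
    return best

if __name__ == "__main__":
    print(hamiltonian_paths(3, [(0, 1), (1, 2), (2, 0)]))
    print(max_independent_set(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))
```

The greedy scan is exact here because some permutation lists the vertices of an optimal set first. The price is the n! candidates, which is the kind of exhaustive work that a DNA implementation would aim to carry out in parallel.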
As we know, this field has its roots in the late 1950s, when the Nobel laureate
Richard Feynman first introduced the concept of computing at a molecular level.
Feynman's visionary idea was only realised in 1994, when Leonard Adleman
performed the first ever truly molecular-level computation using DNA combined with
the tools and techniques of molecular biology.
The technology for the DNA computer is under development. However, it is clear that
molecular computers have many attractive properties. While modern supercomputers
perform 10^12 operations per second, Adleman estimates 10^20 operations per second to
be realistic for molecular manipulations. Similarly impressive figures concern the
consumption of energy and the capacity of memory: a supercomputer needs one joule
for 10^9 operations, whereas the same energy is sufficient to perform 2 x 10^19 ligation
operations. On a videotape, every bit needs 10^12 cubic nanometers of storage; DNA
stores information with a density of one bit per cubic nanometer [34]. Although
individual DNA molecular reactions are relatively slow compared with the operations of
conventional computers, the total performance of DNA computers can outshine that of
conventional electronic computers.
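The size of these advantages is easier to appreciate as ratios. The short calculation below takes the figures quoted above at face value, reading them as powers of ten (10^12 and 10^20 operations per second, 10^9 operations and 2 x 10^19 ligations per joule, 10^12 and 1 cubic nanometers per bit). The constant and function names are assumptions made for illustration.

```python
# Figures as quoted in the text, read as powers of ten.
SUPERCOMPUTER_OPS_PER_SECOND = 10**12
DNA_OPS_PER_SECOND = 10**20
SUPERCOMPUTER_OPS_PER_JOULE = 10**9
DNA_LIGATIONS_PER_JOULE = 2 * 10**19
VIDEOTAPE_NM3_PER_BIT = 10**12
DNA_NM3_PER_BIT = 1

def advantage_ratios():
    """Return how many times DNA outperforms the conventional figure in each respect."""
    return {
        "speed": DNA_OPS_PER_SECOND // SUPERCOMPUTER_OPS_PER_SECOND,
        "energy": DNA_LIGATIONS_PER_JOULE // SUPERCOMPUTER_OPS_PER_JOULE,
        "density": VIDEOTAPE_NM3_PER_BIT // DNA_NM3_PER_BIT,
    }

if __name__ == "__main__":
    for name, ratio in advantage_ratios().items():
        print(f"{name}: {ratio:.0e} times")
```

On these figures the estimated gain is a factor of 10^8 in operations per second, 2 x 10^10 in operations per joule, and 10^12 in storage density. These are estimates of aggregate throughput and say nothing about the speed of a single reaction.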
Since Adleman reported the results of his seminal experiment, there has been a flurry
of interest in the idea of using DNA to perform computations. The potential benefits
of using this particular molecule are enormous: by harnessing the massive inherent
parallelism of performing concurrent operations on trillions of strands, we may one
day be able to compress the power of today's supercomputer into a single test tube.
However, if we compare the development of DNA-based computers to that of their
silicon counterparts, it is clear that molecular computers are still in their infancy.
Current work in this area is concerned mainly with abstract models of computation
and simple proof-of-principle experiments.
In the years since DNA computing was invented by Adleman [2], researchers have
reported many achievements in both the theoretical and the practical parts of the
field. Today, most researchers in this field concentrate on
developing methods for testing biochemical feasibility with wet experiments; some
groups concentrate on developing a DNA computer itself and on developing
algorithms to solve engineering or application problems. Even though a
real DNA computer is still a long way off, developing
algorithms for today's application problems is an important tool for testing
and simulating the stability and reliability of DNA computing algorithms. In
solving today's application problems, researchers have faced some limitations of
manipulation in this field, especially in some routine steps of DNA computing techniques.
These achievements are categorised under two heads: achievements in biochemical
feasibility and achievements in solving engineering or application problems.
Although those problem-specific models show the potential power of DNA
computing, a general model is still needed. In particular, people are interested in
two questions:
1. Is DNA computing complete? That is, can a DNA computer compute all
Turing-computable (recursively enumerable) functions?
2. Is it possible to build a programmable DNA computer? In other words, does there
exist a universal DNA computer in the same sense as a universal Turing machine:
given a computable function, can it simulate the action of that function for any
argument?
1.4 ORGANIZATION OF THE THESIS
The goal of this thesis is to present the contribution to the field, placing it in the
context of the existing body of work. It covers the basics of DNA, its
structure and manipulation, and DNA computing. Following a study and analysis of the
various models of DNA computing, the new results concern a general model of DNA
computation and an assessment of the complexity and viability of DNA
computations.
The thesis illustrates the current state of the art of DNA computing achievements,
especially new approaches or methods that contribute to solving either theoretical or
application problems. Starting with the NP-complete problem that Adleman solved by
means of a wet DNA experiment in 1994, DNA has become one of the appropriate
alternatives for overcoming the limitations of silicon computers. Today, many
researchers all over the world concentrate either on improving the available methods
used in DNA computing or on suggesting new ways to solve engineering or application
problems with a DNA computing approach. The thesis gives an overview of research
achievements in DNA computing and touches on the achievements of improved methods
employed in DNA computing as well as in solving application problems. Several
challenges that DNA computing faces in society have also been addressed.
The thesis is composed of nine chapters in total. The organisation of the thesis is
as follows:
Chapter 1 explains the motivation behind the research, its objective and the
organisation of the research work.
Chapter 2 deals with the basics of DNA computing, its beginning and the concepts
behind it. It also describes the characteristics and nature of DNA computing and
why DNA computing is needed. Along with the advantages and disadvantages of DNA
computing, a comparison is presented between DNA computing and
conventional electronic computers. This chapter also highlights the
difficulties of DNA computing.
Chapter 3 describes the structure of the DNA molecule and a variety of
laboratory techniques for its manipulation that have been studied, such as
denaturing, annealing and ligation, gel electrophoresis, and PCR.
Chapter 4 presents the literature review of the related papers that have been
published. This chapter is where we have gathered the information and
facts related to DNA computing, DNA computing models, various laboratory
experiments, and the materials and process for carrying out the biological
experiments in the laboratory. This review has been a great help in the progress
of the work accomplished.
Chapter 5 is the contribution of the research work, where we have designed a
Generalized Model. This Generalized Model computes a few NP-complete
problems by taking the permuted input. The NP-complete problems considered are
the 3-Vertex Coloring problem, the Hamiltonian
Path Problem, Sub graph Isomorphism, the Maximum Clique Problem and the Maximum
Independent Set problem.
Chapter 6 describes the Observations, Results and Findings of our research. The
analysis of DNA Computing Models has been done and the same has been
described. Then, the analysis of Generalized Model is presented. The Fragile and
the Tough models are also discussed along with the Complexity analysis of
various algorithms.
Chapter 7 summarises this thesis, gives some concluding remarks and suggests
several open problems in the field of DNA computation.
Chapter 8 lists all the references and bibliography that have been studied and used
in the thesis.
Chapter 9 lists the various research papers that have been published in the national
and international journals. It also lists the various papers that have been presented
in the national and international conferences.
DNA computing is a new way of thinking about computation altogether. Maybe this
is how nature does mathematics: not by adding and subtracting, but by cutting and
pasting, by insertions and deletions. Perhaps the primitive functions we currently use
for computation are just as dependent on the history of humankind, as the fact that we
use base 10 for counting is dependent on our having ten fingers. In the same way
humans moved on to counting in other bases, maybe it is time we realized that there
are other ways to compute besides the ones we are familiar with. The fact that
phenomena happening inside living organisms (copying, cutting and pasting of DNA
strands) could be computations in disguise suggests that life itself may consist of a
series of complex computations. As life is one of the most complex natural
phenomena, we could generalize by conjecturing the whole cosmos to consist of
computations. The differences between the diverse forms of matter would then only
reflect various degrees of computational complexity, with the qualitative differences
pointing to huge computational speed-ups. From chaos to inorganic matter, from
inorganic to organic, and from that to consciousness and mind, perhaps the entire
evolution of the universe is a history of the ever–increasing complexity of
computations. Just imagine. Perhaps all there was in the beginning was a universal
cocktail of particles. They combined randomly for millions of years, until, by chance,
some patterns of beautiful mathematical symmetry started to emerge: the inorganic
matter. They continued to mix and intermingle until some formations started to self-
replicate (see fractals and iterated functions) and then to do computations: life
appeared. The more complex the computations grew, the more complex the life forms
became, until there was again a sudden leap and consciousness and mind appeared,
apparently out of thin air, but in reality an inevitable corollary to complexity. Who
knows what the next step could be in this infinite spiral of mathematical evolution?
Of course, the above is only a hypothesis, and the enigma whether modern man is
"homo sapiens" or "homo computants" still awaits solving. But this is what makes
DNA computing so captivating. Not only may it help compute faster and more
efficiently, but it stirs the imagination and opens deeper philosophical issues. What
can be more mesmerizing than something that makes you dream? To a
mathematician, DNA computing tells that perhaps mathematics is the foundation of
all there is. Indeed, mathematics has already proven to be an intrinsic part of sciences
like physics and chemistry, of music, visual arts and linguistics, to name just a few.
The discovery of DNA computing, indicating that mathematics also lies at the root of
biology, makes one wonder whether mathematics isn't in fact the core of all known
and (with non-Euclidean geometry in mind) possible reality.
DNA is the major information storage molecule in living cells, and billions of years of
evolution have tested and refined both this wonderful informational molecule and
highly specific enzymes that can either duplicate the information in DNA molecules
or transmit this information to other DNA molecules. Instead of using electrical
impulses to represent bits of information, the DNA computer uses the chemical
properties of these molecules by examining the patterns of combination or growth of
the molecules or strings. DNA can do this through the manufacture of enzymes,
which are biological catalysts that could be called the 'software', used to execute the
desired calculation.
A single strand of DNA is similar to a string consisting of a combination of four
different symbols: A, G, C and T. Mathematically, this means we have at our disposal a
four-letter alphabet, Σ = {A, G, C, T}, to encode information, which is more than enough
considering that an electronic computer needs only two digits, 0 and 1, for the same
purpose. In a DNA computer, computation takes place in test tubes. The input and
output are both strands of DNA, whose genetic sequences encode certain information.
A program on a DNA computer is executed as a series of biochemical operations,
which have the effect of synthesizing, extracting, modifying and cloning the DNA
strands.
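Since a four-letter alphabet carries two bits per symbol, binary data can be written as a DNA string and read back. The sketch below shows one way to do this; the particular assignment of bit pairs to bases in `BITS_TO_BASE` is an arbitrary assumption made for illustration, not an encoding taken from this thesis. The `reverse_complement` helper uses the standard Watson-Crick pairing (A with T, C with G) to give the strand that would anneal to a given strand.

```python
# Hypothetical assignment of bit pairs to bases; any one-to-one mapping would do.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def encode(bits):
    """Encode a bit string of even length as a DNA string, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand):
    """Recover the bit string from a DNA string produced by encode()."""
    return "".join(BASE_TO_BITS[base] for base in strand)

def reverse_complement(strand):
    """Return the complementary strand, read in the conventional direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

if __name__ == "__main__":
    strand = encode("00011011")
    print(strand, decode(strand), reverse_complement("AACG"))
```

Practical encodings are usually more constrained than this, because sequences have to be chosen so that unintended strands do not anneal to one another.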
Research in DNA computing is at an early stage. The high information density of
DNA molecules and the massive parallelism involved in DNA reactions make DNA
computing a powerful tool. Tackling problems with DNA computing is most
appropriate when the problems are computationally intractable in nature, because
DNA computing, owing to its high degree of parallelism, may overcome the difficulties
that make a problem intractable on silicon computers. However, using DNA
computing principles to solve simple problems may not be advisable. Many research
results indicate that any procedure that can be
programmed on a silicon computer can be realized as a DNA computing procedure.
Owing to its promising applications in cryptography, research in DNA computing is
gaining pace, and there is wide scope for researchers to make use of this
powerful computing tool.
The potential advantages of DNA computing versus electronic computing are clear in
the case of problems like the Directed Hamiltonian Path Problem, the Satisfiability
Problem, and breaking DES. On the other hand, these are only particular problems
solved by means of molecular biology. They are one–time experiments to derive a
combinatorial solution to a particular sort of problem. This immediately leads to two
fundamental questions, posed in Adleman's article and in [20] and [28]:
(1). What kind of problems can be solved by DNA computing?
(2). Is it possible, at least in principle, to design a programmable DNA computer?
More precisely, one can reformulate the problems above as:
(1). Is the DNA model of computation computationally complete in the sense that
the action of any computable function (or, equivalently, the computation of any
Turing machine) can be carried out by DNA manipulation?
(2). Does there exist a universal DNA system, i.e., a system that, given the encoding
of a computable function as an input, can simulate the action of that function for
any argument? (Here, the notion of function corresponds to the notion of a
program in which an argument w is the input of the program and the value f(w)
is the output of the program. The existence of a universal DNA system amounts
thus to the existence of a DNA computer capable of running programs.)
Opinions differ as to whether the answer to these questions has practical relevance.
One can argue, as in [8], that from a practical point of view it may not be that
important to simulate a Turing machine by a DNA computing device. Indeed, one
should not aim to fit the DNA model into the Procrustean bed of classical models of
computation, but try to completely rethink the notion of computation. On the other
hand, finding out whether the class of DNA algorithms is computationally complete
has many important implications. If the answer were unknown, then the practical
efforts to solve a particular problem might be proven futile at any time: a
Gödel-minded person could suddenly announce that it belongs to a class of problems
that are impossible to solve by DNA manipulation. The same holds for the theoretical
proof of the existence of a DNA computer. As long as it is not proved that such a thing
theoretically exists, the danger that the practical efforts will be in vain is always
lurking in the shadows. One more indication of the relevance of the questions
concerning computational completeness and universality of DNA-based devices is
that they have been addressed for most models of DNA computation that have so far
been proposed.
The existing models of DNA computation are based on various combinations of a few
primitive biological operations:
Synthesis of a desired polynomial length strand
Separation of the strands by length
Merging: pour two test tubes into one
Extraction: extract those strands containing a given pattern as a substring
Melting/Annealing: break apart/bond together two single DNA strands with complementary sequences
Amplifying: make copies of DNA strands by using the Polymerase Chain Reaction
Cutting: cut DNA strands by using restriction enzymes
Ligation: paste DNA strands with complementary sticky ends by using ligases
Detection: given a tube, say "yes" if it contains at least one DNA strand, and "no" otherwise
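Several of the operations listed above have a direct software analogue if a test tube is modelled as a multiset of strings. The sketch below is such a toy model, with the tube represented by a `collections.Counter` that maps each strand to its number of copies. All function names are assumptions made for illustration, and the chemistry (errors, yields, reaction times) is ignored entirely.

```python
from collections import Counter

def merge(tube_a, tube_b):
    """Merging: pour two tubes into one (copy counts add up)."""
    return tube_a + tube_b

def extract(tube, pattern):
    """Extraction: keep the strands that contain the pattern as a substring."""
    return Counter({s: k for s, k in tube.items() if pattern in s})

def separate_by_length(tube, length):
    """Separation: keep the strands of exactly the given length."""
    return Counter({s: k for s, k in tube.items() if len(s) == length})

def amplify(tube, factor=2):
    """Amplifying: multiply the number of copies of every strand."""
    return Counter({s: k * factor for s, k in tube.items()})

def detect(tube):
    """Detection: True ("yes") if the tube holds at least one strand."""
    return sum(tube.values()) > 0

if __name__ == "__main__":
    tube = merge(Counter({"ACGT": 1, "GGA": 2}), Counter({"ACGT": 1}))
    print(tube)
    print(detect(extract(tube, "CG")), detect(extract(tube, "TTT")))
```

A "program" in this model is a sequence of such calls that starts from an input tube and ends with either a `detect` answer or a set of output tubes.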
These operations are then used to write "programs" which receive a tube containing
DNA strands as input and return as output either "yes" or "no" or a set of tubes. A
computation consists of a sequence of tubes containing DNA strands. There are pros
and cons for each model (combination of operations). The ones using operations
similar to Adleman's have the obvious advantage that they could already be
successfully implemented in the lab. The obstacle preventing large-scale
automation of the process is that most bio-operations rely mainly on manual
handling of tubes. In contrast, the model introduced by Tom Head in [21] aims to be
a "one-pot" tube with all the operations carried out in principle by enzymes.
Moreover, it has the theoretical advantage of being a mathematical model with all the
claims backed up by mathematical proofs. Its disadvantage is that the current state of
the art in molecular biology has not yet allowed practical implementation. Overall, the
existence of different models with complementing features shows the versatility of
DNA computing and increases the likelihood of practically constructing a device
based on DNA computing.