
CHAPTER 1

AIMS AND OBJECTIVES

1.1 INTRODUCTION

Living organisms can be regarded as complex yet stable computing systems whose purpose is to sustain life. A biological system stores and interprets information, and computes on it in order to survive in its environment. Many researchers have therefore tried to imitate biological systems, or to use them directly, for specific computational purposes. Wiener proposed cybernetics, which studies artificial systems and biomimetic machines (Wiener, 1948), and von Neumann compared the digital computer with biological neural networks in his book (von Neumann, 1958). In the 1970s, Fogel, Rechenberg, and Holland independently proposed evolutionary computation models based on Darwin's natural selection and molecular biology (Black, 1996). Langdon started artificial life research, which simulates life in digital computers (Langdon, 1988). These approaches applied biological laws such as evolution and selection to the design of computer algorithms. A different line of thinking uses biological systems directly for computing. Feynman suggested the possibility of computing at the molecular level in the late 1950s (Feynman, 1960). His idea was that biological molecules can carry enormous amounts of information in an exceedingly small space, and so have inherent computing power. Finally, in 1994, Adleman realized a DNA computer by solving an instance of the Hamiltonian path problem (Adleman, 1994).

Ever since ancient Greek times, people have suspected that the features of one generation are passed on to the next. It was not until Mendel's work on garden peas was recognised (see [38, 75]) that scientists accepted that both parents contribute material that determines the characteristics of their offspring. In the early 20th century, it was discovered that chromosomes make up this material. Chemical analysis of chromosomes revealed that they are composed of both protein and deoxyribonucleic acid, or DNA. The question was which substance carries the genetic information. For many years, scientists favoured protein because of its greater complexity relative to DNA: few believed that a molecule as simple as DNA, composed of only four subunits (compared to 20 for protein), could carry complex genetic information. It was not until the early 1950s that most biologists accepted the evidence showing that it is in fact DNA that carries the genetic code. However, the physical structure of the molecule and the hereditary mechanism were still far from clear. In 1951, the biologist James Watson moved to Cambridge to work with a physicist, Francis Crick. Using data collected by Rosalind Franklin and Maurice Wilkins at King's College, London, they began to decipher the structure of DNA. They worked with models made out of wire and sheet metal in an attempt to construct something that fitted the available data. Once satisfied with their model, they published the paper [78] (also see [77]) that would eventually earn them (and Wilkins) the Nobel Prize for Physiology or Medicine in 1962.

In Adleman's experiment, DNA molecules were used as the information storage medium, and the techniques of molecular biology, namely hybridization, ligation, polymerase chain reaction, and gel electrophoresis, were used as computational operators for extracting, combining, copying, and sorting the information in the DNA molecules, respectively. Since Adleman's pioneering work, DNA computing has attracted the attention of researchers seeking to overcome the limitations of sequential silicon-based computing. They have been drawn to its high storage density, massive parallelism, and biocompatibility (Maley, 1998; Garzon and Deaton, 1999). To demonstrate its computing power, DNA computing has been applied to various computational problems (Adleman, 1994; Ouyang et al., 1997), logical problems (Liu et al., 2000; Mao et al., 2000), Boolean circuit development (Owenson et al., 2001), computational models (Mills Jr., 2002), medical problems (Benenson et al., 2004), nanostructures (Winfree et al., 1998), and associative memory construction (Baum, 1995).

DNA (deoxyribonucleic acid) computing, also known as molecular computing, is a new approach to massively parallel computation based on the groundbreaking work of Adleman. It was proposed as a means of solving a class of intractable computational problems in which the computing time can grow exponentially with problem size (the NP-complete, or non-deterministic polynomial time complete, problems). A DNA computer is basically a collection of specially selected DNA strands whose combinations yield the solution to the problem at hand. Technology is currently available both to select the initial strands and to filter out the final solution. DNA computing is a computational paradigm that employs (bio)molecular manipulation to solve computational problems, while at the same time exploring natural processes as computational models. In 1994, Leonard Adleman, at the Laboratory of Molecular Science, Department of Computer Science, University of Southern California, surprised the scientific community by using the tools of molecular biology to solve a difficult computational problem. The main idea was to encode data in DNA strands and to use tools from molecular biology to execute computational operations. Besides the novelty of this approach, molecular computing has the potential to outperform electronic computers. For example, DNA computations may use a billion times less energy than an electronic computer while storing data in a trillion times less space. Moreover, computing with DNA is highly parallel: in principle, billions upon trillions of DNA molecules could undergo chemical reactions, that is, perform computations, simultaneously.

L. M. Adleman launched the field of DNA computing with a demonstration in 1994 that strands of DNA could be used to solve the Hamiltonian path problem for a simple graph. He also identified three broad categories of open questions for the field. First, is DNA capable of universal computation? Second, what kinds of algorithms can DNA implement? Third, can the error rates in the manipulation of DNA be controlled well enough to allow useful computation? In the two years that followed, theoretical work showed that DNA is in fact capable of universal computation. Furthermore, algorithms for solving interesting problems, such as breaking the Data Encryption Standard, have been described using currently available technology and methods. Finally, a few algorithms have been proposed to handle some of the apparently crippling error rates in a few of the common processes used to manipulate DNA. It is thus unlikely that DNA computation is doomed to be only a passing curiosity. However, much work remains to be done on the containment and correction of errors. It is far from clear whether the error rates can be reduced sufficiently to ever allow general-purpose computation that will challenge the more popular substrates for computation. Unfortunately, biological demonstrations of the theoretical results have been sadly lacking. To date, only the simplest of computations have been carried out in DNA. To make significant progress, the field will require both an assessment of the practicality of the different manipulations of DNA and the implementation of algorithms for realistic problems. Theoreticians, in collaboration with experimentalists, can contribute to this research program by settling on a small set of practical and efficient models for DNA computation.

http://www.usc.edu/dept/molecular-science/fm-adleman.htm
http://www.usc.edu/dept/molecular-science/

DNA computing also has a bearing on security. The security of traditional cryptography usually rests on hard mathematical problems for which no fast algorithm is currently known. A well-known example is Rivest-Shamir-Adleman (RSA) encryption, whose security rests on the difficulty of factoring a large number into its two prime factors. If fast methods for these mathematical problems were found, such schemes might no longer be secure. DNA computing provides parallel processing at the molecular level and introduces a brand-new data structure and method of calculation. It can attack different parts of a computing problem simultaneously, which poses a challenge to traditional information security technology. A number of proposals have been made for breaking conventional cryptosystems by DNA computing, suggesting that public-key cryptosystems might be insecure. By harnessing the massive parallelism, high information density, and low power consumption of biomolecules, DNA computing brings both challenges and opportunities to traditional cryptography. It computes by means of the biomolecular structure of DNA and the techniques of molecular biology, and it is a novel interdisciplinary field with considerable potential for growth.

In a pioneering study, Adleman demonstrated the first DNA computation, which marked the beginning of a new stage in the information era. Lipton extended this approach to another NP-complete problem, the satisfiability (SAT) problem. These elegant studies demonstrated how problems corresponding to Boolean formulas can be solved by a massively parallel procedure. DNA computing has been proposed for difficult combinatorial search problems such as the Hamiltonian path problem (HPP), using its vast parallelism to search among a large number of candidate solutions represented by DNA strands. In 2002, Braich et al. obtained the solution of a 20-variable 3-SAT problem on a DNA computer. However, DNA computing has many further exciting applications beyond pure combinatorial search, and its ability to attack different parts of a problem simultaneously presents both challenges and opportunities for information security. For example, in 1995, Boneh et al. described an approach to breaking the Data Encryption Standard (DES) using DNA computing methods. In 1999, Clelland et al. demonstrated an approach to steganography by hiding secret messages encoded as DNA strands among a multitude of random DNA.

DNA and RNA are appealing media for data storage because very large amounts of data can be stored in a compact volume, vastly exceeding the capacity of conventional electronic, magnetic, and optical media. A gram of DNA contains about 10^21 DNA bases, or about 10^8 terabytes. Hence, a few grams of DNA may have the potential to store all the data stored in the world. Recent research has considered DNA as a medium for ultra-scale computation and for ultra-compact information storage. DNA cryptography is a newly born cryptographic field that emerged from research on DNA computing, in which DNA is used as the information carrier and modern biological technology as the implementation tool. The vast parallelism and extraordinary information density inherent in DNA molecules are explored for cryptographic purposes such as encryption, authentication, and signature. DNA cryptography is still far from mature in both theory and realization, which may be why only a few examples of it have been proposed.
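The per-gram storage figure quoted above can be checked with a short calculation. The sketch below takes the figure as 10^21 bases per gram and assumes that each base carries 2 bits (one of four symbols) and that a terabyte is 10^12 bytes. Both assumptions are for illustration only; practical encodings add redundancy and would store less.

```python
import math

BASES_PER_GRAM = 1e21            # figure quoted in the text
BITS_PER_BASE = math.log2(4)     # four possible bases -> 2 bits (idealised assumption)
BYTES_PER_TERABYTE = 1e12        # decimal terabyte (assumption)


def terabytes_per_gram(bases_per_gram=BASES_PER_GRAM):
    """Idealised raw capacity of one gram of DNA, in terabytes."""
    bits = bases_per_gram * BITS_PER_BASE
    return bits / 8 / BYTES_PER_TERABYTE


print(f"{terabytes_per_gram():.2e} TB per gram")
```

Under these assumptions the result is of the order of 10^8 terabytes per gram, which agrees with the order of magnitude given in the text.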

Although DNA computing has set a precedent for molecular computing and broadened our understanding of natural computing phenomena, it remains largely at a theoretical stage. Several problems with DNA computing are still unresolved:

1. Most computing models simply apply molecular techniques to one particular problem. Because the problems vary, so do the computing schemes, and there is still no uniform computing and coding model.

2. DNA computing only converts time complexity into space complexity.

3. DNA computing is also subject to errors, which arise randomly with some probability and can be gradually amplified as the number of experimental steps increases.

4. DNA in solution deteriorates very easily during reactions, and even adsorption onto the test tube wall may cause a fatal error.

5. Most proposals implement the computation by performing a series of biochemical reactions on a set of DNA molecules, which requires human intervention at each step.
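The second point in the list, that time complexity is traded for space complexity, can be made concrete with a rough estimate of how much DNA a brute-force search would need. The sketch below is an illustration under stated assumptions, not a figure from the literature: it assumes one strand per candidate solution, 2^n candidates for n Boolean variables, 20 nucleotides per variable, and an average mass of about 330 g/mol per nucleotide. The function name and parameters are hypothetical.

```python
AVOGADRO = 6.022e23              # molecules per mole
GRAMS_PER_MOLE_PER_NT = 330.0    # approximate average mass of one nucleotide (assumption)


def library_mass_grams(n_variables, nt_per_variable=20, copies=1):
    """Mass of a DNA library holding every assignment of n Boolean variables."""
    strands = 2 ** n_variables * copies                 # one strand per candidate
    nucleotides_per_strand = n_variables * nt_per_variable
    grams_per_strand = nucleotides_per_strand * GRAMS_PER_MOLE_PER_NT / AVOGADRO
    return strands * grams_per_strand


for n in (20, 50, 100):
    print(n, f"{library_mass_grams(n):.3g} g")
```

The number of laboratory steps may stay modest, but the mass of DNA more than doubles with every added variable, so the exponential cost reappears as material rather than as time.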

Thus, the difficulty with such methods is that the number of laboratory procedures, and the time they consume, grows with the size of the problem. In its currently available form, DNA computing is therefore not well suited to solving real problems, and it does not yet pose a real threat to the security of cryptography. At the same time, encryption schemes based on DNA computing continue to appear, offering protection for DNA molecule banks and for information carried in DNA molecules. However, the security, generality, validity, and key management of these encryption mechanisms have not yet been analysed systematically. DNA cryptography therefore still needs dedicated study and broad discussion, and its prospects remain uncertain until DNA computing becomes truly mature. Nevertheless, DNA ciphers are a useful supplement to existing mathematical ciphers, and they are a good choice particularly for encryption systems with low real-time demands. Relatively speaking, DNA computing has brighter prospects in steganography and authentication, which offer an additional layer of protection over encryption alone. With the rapid development of modern biotechnology, once-costly biological experiments have become routine. If the molecular world can be controlled at will, it may be possible to achieve vastly better performance in information storage and information security.

Thus, DNA computing can be described as a method of solving computational problems with the help of chemical and biological operations on DNA strands. It was introduced by Adleman in 1994 [Adleman, 1994], who showed how to solve the Hamiltonian path problem by manipulating DNA strands in test tubes. Since then, more and more researchers have been motivated by the promising future of this area and have started working on it. The basic idea of DNA computing arises from a mapping between the physical processes in an electronic computer and the chemical processes in DNA reactions. In an electronic computer, everything is encoded in binary (0, 1) strings, while every DNA strand is encoded in four nucleotides: A, T, G and C. In electronic computers, the basic operations can be treated as manipulations of binary strings, while for DNA strands there is a range of biological operations, e.g. ligation (concatenation), amplification (copying) and substitution, which can be performed in a controlled manner by modern biological technologies.
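The correspondence between binary strings and four-letter DNA strings can be illustrated with a small encoder. This is a minimal sketch: the mapping of bit pairs to bases below is an arbitrary assumption chosen for the example, not a scheme taken from the cited work, and it ignores biochemical constraints such as avoiding long runs of a single base.

```python
# Arbitrary illustrative mapping: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Inverse of encode: turn a DNA string back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"DNA")
print(strand, decode(strand))
```

Each byte becomes four bases, so a strand of length n can represent 2n bits in this idealised scheme.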

1.2 MOTIVATIONS OF THE RESEARCH

"Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1 1/2 tons." So said Popular Mechanics in 1949 [8]. Today, in the age of smart cards and wearable PCs, this statement is striking because it falls so short of reality. Who would be prepared to predict how close we will have come, fifty years from now, to the levels of molecular miniaturisation described in Feynman's visionary paper [26]?

Huge advances in miniaturization have been made since the days of room sized

computers, yet the underlying computational model (the Von Neumann architecture)

has remained the same. Today's supercomputers still employ the kind of sequential

logic used by the mechanical dinosaurs of the 1930s. Some researchers are now

looking beyond these boundaries and are investigating entirely new media and

computational models. These include quantum, optical and DNA-based computers. It

is the last of these that this thesis concentrates on. Despite the popular image of

silicon-based computers for computation, an embryonic field of molecular

computation is emerging, where molecules in solution perform computational

operations. DNA, which is known to store biological information, is being used as a

substrate for molecular computation.

The idea that living cells and molecular complexes can be viewed as potential

mechanical components dates back to the late 1950s, when Richard Feynman delivered

his famous paper describing "sub-microscopic" computers. More recently, several

papers [2, 7, 52] (also [5, 36, 64]) have advocated the realisation of massively parallel

computation using the techniques and chemistry of molecular biology. The


development of existing silicon-based computers was only made possible by the

invention of the transistor, which facilitated for the first time electronic manipulation

of silicon. We may draw an interesting parallel between this historical precedent and

the development of molecular-scale computers. Although the concept dates back to

the late 1950s, only now do we have at our disposal the tools and techniques of

molecular biology required to construct the prototype molecular computers. In [2],

Adleman described how a computationally intractable problem, known as the directed

Hamiltonian Path Problem (HPP) might be solved using molecular methods. Recall

that the HPP involves finding a path through a graph that visits each vertex exactly

once. Adleman's method employs a simple, massively parallel random search. The

algorithm is not executed on a traditional, silicon-based computer, but instead

employs the "test-tube" technology of genetic engineering. By representing

information as sequences of bases in DNA molecules, Adleman shows how existing

DNA-manipulation techniques may be used to quickly detect and amplify desirable

solutions to a given problem.

How can we combine a flask of DNA with biological tools to solve a hard

mathematical problem? Adleman's experiment proceeds as follows. The first stage

created a flask of DNA molecules, each molecule encoding a potential solution to the

problem. With reference to the HPP, for example, each strand encoded a path (not

necessarily Hamiltonian) through the graph. Given every DNA molecule that encodes

a path of length n, for a graph with n vertices, we can be sure that every possible

solution is present, some legal, but most illegal. Once the entire solution space was

present in a flask the DNA computer really came into its own. Adleman used a small

set of biological tools to sift out DNA that encoded illegal solutions. These are those

paths that do not visit every vertex, or paths that visit a particular vertex more than

once. At the end of the sifting process, he was left only with strands that encoded

legal solutions.

DNA computing is an interdisciplinary research area that has grown rapidly since DNA molecules were first used in a computational process. One of its main objectives is to produce, in the near future, a biologically inspired computer based on DNA molecules that can replace, or at least usefully complement, silicon-based computers. R. Feynman suggested constructing a computer from molecules in 1964 [1]. It then took 30 years until Adleman, in 1994, gave a proof-of-principle study showing that DNA molecules can solve an NP problem, the Hamiltonian Path Problem (HPP), through a biochemical procedure [2].

DNA is the basic storage medium of all living cells. Its main function is to store and transmit the data of life, as it has done for billions of years. Roughly 10 trillion DNA molecules could fit into a space the size of a marble. Since all these molecules can process data simultaneously, we can in theory perform 10 trillion calculations at the same time in a small space. DNA computing is more generally known as molecular computing. It is an interdisciplinary field that combines biology, chemistry, mathematics and computer science. Computing with DNA offers a completely new paradigm for computation. The main idea is to encode data in the form of DNA strands and to manipulate those strands in a test tube using laboratory techniques of molecular biology, called bio-operations, in order to simulate arithmetical and logical operations. It is estimated that a mix of 10^18 DNA strands could operate 10^4 times faster than today's advanced supercomputers [3]. DNA computing has since become an area of exciting multidisciplinary research. Rozenberg et al. in 1999 distinguished two major lines of research in DNA computing: (i) the theoretical line, concerned with models, algorithms and paradigms for DNA computing, and (ii) the experimental line, concerned with the design of laboratory experiments to test biochemical feasibility [4]. Even though there is still a long way to go before DNA algorithms can be applied to real-life problems, researchers are interested in modelling and testing solutions in case studies in order to probe the limitations of DNA itself. Today, many active research groups in this field develop models and carry out laboratory experiments, especially on the challenges of biochemical feasibility. Other groups are concerned with developing a real DNA computer and building DNA algorithms to solve engineering or application problems.

Of course, for DNA computers, each individual operation, for example extracting DNA strands, can take minutes or even hours to perform. This cost of a computational step looks unimpressive when compared with the fastest supercomputers in existence today, which are capable of executing around a trillion operations a second. However, the real power of DNA computers lies in their inherent parallelism: each operation is performed not on one single DNA strand, but on every strand in the flask simultaneously. DNA computers have the potential to execute more than a thousand trillion operations per second, as well as being a billion times more energy-efficient and requiring a trillionth of the space needed by existing storage media. Nature has information compression down to a fine art: over forty 1 Mb floppy discs are required to store the genome of a single fruit fly [74].

By the nature of the DNA molecule, Watson-Crick complementarity plays the most important role in DNA computing. DNA is built from four nucleotide bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Adenine pairs only with Thymine, and Cytosine pairs only with Guanine.
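The pairing rule can be written down directly as a lookup table. The sketch below is illustrative only; the function names are hypothetical. It also uses one fact not stated above: the two strands of a double helix run in opposite directions, so the partner of a strand is its reverse complement when both are read in the same direction.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same order as the input."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Sequence of the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def can_hybridize(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are perfect Watson-Crick partners."""
    return strand_b == reverse_complement(strand_a)


print(complement("ATGC"), reverse_complement("ATGC"))
```

In DNA computing, this is the mechanism used to pull out strands that contain a given sub-sequence: a probe carrying the complementary sequence binds to them.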

Adleman's experiment addressed the following problem. A Hamiltonian path is a sequence of edges in a graph that touches every vertex exactly once. The Hamiltonian path problem is to decide whether a graph has a Hamiltonian path or not. Let G be a graph with n vertices in which two vertices, v_in and v_out, are marked. G is said to have a Hamiltonian path from v_in to v_out if there is a path of edges starting at v_in and ending at v_out that contains every vertex of G exactly once. An instance of the directed Hamiltonian path problem is a triple (G, v_in, v_out), and the task is to decide whether G has a Hamiltonian path from v_in to v_out. Adleman used the following nondeterministic algorithm to solve the directed Hamiltonian path problem for an input (G, v_in, v_out):

1. Generate a set of random paths in G.

2. Extract all paths beginning with v_in and ending with v_out.

3. Extract all paths with length exactly n - 1.

4. Extract all paths that contain every vertex at most once.

5. Accept that there is a Hamiltonian path if there are any paths left; otherwise, reject.

The above steps are realized as phases of a molecular computation. Vertices and edges of G are encoded as DNA polymers. In step 1, ligation builds DNA strands that represent random paths in G. In step 2, the Watson-Crick complements of the encodings of v_in and v_out are used to extract the strands with the correct start and end. In step 3, the DNA strands are separated by length in order to keep the encodings of paths of length n - 1. Next, the DNA is denatured. In step 4, each vertex is checked, using the Watson-Crick complement of its encoding, to confirm that it appears in the path exactly once. In step 5, gel electrophoresis is used to test whether any strand is left. Between the steps, polymerase chain reaction (PCR) is used to amplify the intermediate results [34].
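The five steps above can be mimicked in software to show the generate-and-filter structure of the algorithm. The sketch below is a simulation under stated assumptions, not Adleman's laboratory protocol: random walks stand in for ligation, list filters stand in for the extraction steps, vertices are assumed to be numbered 0 to n - 1, and the start and end vertices are called v_in and v_out. The function names, the sample count, and the example graph are hypothetical.

```python
import random


def random_paths(edges, n_vertices, samples, rng):
    """Step 1: build a 'tube' of random paths by following random edges."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    tube = []
    for _ in range(samples):
        path = [rng.randrange(n_vertices)]            # random start vertex
        for _ in range(rng.randint(0, n_vertices)):   # random number of edges
            options = successors.get(path[-1])
            if not options:                           # dead end: stop extending
                break
            path.append(rng.choice(options))
        tube.append(tuple(path))
    return tube


def hamiltonian_paths(edges, n_vertices, v_in, v_out, samples=5000, seed=0):
    rng = random.Random(seed)
    tube = random_paths(edges, n_vertices, samples, rng)            # step 1
    tube = [p for p in tube if p[0] == v_in and p[-1] == v_out]     # step 2
    tube = [p for p in tube if len(p) - 1 == n_vertices - 1]        # step 3
    tube = [p for p in tube if len(set(p)) == len(p)]               # step 4
    return set(tube)                                                # step 5: accept if non-empty


example_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 1), (2, 3)]
print(hamiltonian_paths(example_edges, 4, v_in=0, v_out=3))
```

Like the laboratory procedure, the simulation is probabilistic: a Hamiltonian path is found only if it happens to be generated in step 1, which is why a large number of samples (or, in the test tube, a large number of molecules) is needed.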

Silicon microprocessors have been at the heart of the computing world for more than 40 years. In that time, manufacturers have crammed more and more electronic devices onto their microprocessors. In its commonly quoted form, Moore's Law states that the number of electronic devices on a microprocessor doubles every 18 months. Moore's Law is named after Intel co-founder Gordon Moore, who predicted in 1965 that integrated circuits would double in complexity every year, a rate he later revised to every two years. Many have predicted that Moore's Law will soon reach its end because of the physical speed and miniaturization limits of silicon microprocessors.

In comparing silicon computers with DNA computers in terms of speed, there are two factors to consider: the speed of each operation and the degree of parallelism. Instructions in an electronic computer are much faster than laboratory operations (millions per second versus one per hour, or even per day). However, a DNA computer has vastly more parallelism than an electronic computer, so its high parallelism can compensate for the slowness of the biological operations. Furthermore, the laboratory operations can be sped up once the manipulation of DNA strands is automated. In terms of energy efficiency, the energy cost of a DNA operation (on one strand) is about 10^10 times lower than the energy cost of an instruction in an electronic computer. In terms of data storage, DNA is economical, as one tube can store billions of DNA strands.
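The trade-off between slow steps and massive parallelism can be put into rough numbers. The sketch below uses only the orders of magnitude mentioned above (about a million electronic instructions per second, one laboratory operation per hour) and counts the reaction of each strand as one operation. That accounting is an assumption made for illustration, and it ignores errors, setup time and readout.

```python
ELECTRONIC_OPS_PER_SECOND = 1e6   # "millions per second" (order of magnitude from the text)
DNA_STEP_SECONDS = 3600.0         # one laboratory operation per hour (from the text)


def dna_ops_per_second(strands, step_seconds=DNA_STEP_SECONDS):
    """Effective throughput if every strand reacts once per laboratory step."""
    return strands / step_seconds


def break_even_strands(electronic_ops_per_second=ELECTRONIC_OPS_PER_SECOND,
                       step_seconds=DNA_STEP_SECONDS):
    """Number of strands at which DNA throughput matches the electronic rate."""
    return electronic_ops_per_second * step_seconds


print(f"break-even: {break_even_strands():.1e} strands")
print(f"1e12 strands: {dna_ops_per_second(1e12):.1e} operations per second")
```

Under these assumptions a few billion strands are enough to match the electronic rate, and a tube holding trillions of strands exceeds it by orders of magnitude.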

DNA computers have the potential to take computing to new levels, picking up where Moore's Law leaves off. There are several advantages to using DNA instead of silicon:

As long as there are cellular organisms, there will always be a supply of DNA.

The large supply of DNA makes it a cheap resource.

Unlike the toxic materials used to make traditional microprocessors, DNA biochips can be made cleanly.

DNA computers are many times smaller than today's computers.

The main issue in applying DNA computing to real applications is how to represent numerical values, especially a number of related numerical values, in the form of DNA strands. Considerable research has been done, and is still under way, to solve this problem, and researchers have proposed several techniques. However, all the solutions proposed so far are suitable only for a limited number of numerical values and have not been tested on larger sets of values, so the problem remains open. Solving it would make DNA computing more practical for many engineering and real-world application problems.

Developing robust wet-lab methods is also critically important. One important step in a wet-lab experiment is reading out the end result. At present, gel electrophoresis, in which strands are sorted into bands, is the most popular technique for this step. However, gel electrophoresis has limitations that are a disadvantage for DNA computing. One drawback is that the gel image cannot be analysed for all bands at one time when strands of many different lengths (numbers of base pairs) are involved: some bands, especially the earlier ones, might not yet exist in the buffer reader, so only a certain part of the bands can be read at one time, which makes it difficult to carry out the analysis properly. Another important wet-lab technique in DNA computing is PCR, which is used to amplify the number of copies of a specific region of DNA in order to produce enough DNA to be adequately tested. This technique can be used to identify, with very high probability, disease-causing viruses and/or bacteria, a deceased person, or a criminal suspect. However, traditional PCR has several limitations that may affect results in DNA computing, and several researchers have focused on enhancing the technique to overcome them. As a result, real-time PCR is employed in several wet-lab experiments to improve the readability of the end results.

Even though the difficulties of translating theoretical DNA computing models into real life have not yet been sufficiently overcome, there is still potential for development in other areas. DNA computing offers a new approach to solving combinatorial problems, such as NP-hard problems, in parallel. This gives it the potential to solve problems that a traditional machine struggles with when processing a large number of tasks, especially problems involving many calculations, such as clustering optimization and scheduling. In addition, DNA can store a great deal of information in a small space compared with digital storage. Today we deal with huge volumes of information, not only text documents but also images and video, and these formats require a great deal of storage. DNA therefore appears to be a promising answer to today's storage problem. As early research in this direction, Tsaftaris et al. [15] proposed employing DNA in the signal processing field, and Tsuboi et al. in image processing [17].

According to researchers, there are three main reasons why special-purpose DNA computation is practical. First, a special-purpose computer is easier to design and implement, with less need for functional complexity and flexibility. Second, DNA computing may prove entirely inefficient for a wide range of problems, so directing effort towards universal models may divert energy away from its true calling. Third, the types of hard computational problems that DNA-based computers may be able to solve effectively are of sufficient economic importance that a dedicated processor would be financially reasonable. With so many possible advantages over conventional techniques, DNA computing has bright prospects for practical use. Future work in this field should begin to incorporate cost-benefit analysis so that more appropriate comparisons can be made with existing techniques.

"Computers in the future may weigh no more than 1.5 tons." So said Popular

Mechanics in 1949. Most of us today, in the age of smart cards and wearable PCs

would find that statement laughable. We have made huge advances in miniaturization

since the days of room-sized computers, yet the underlying computational framework

has remained the same. Today's supercomputers still employ the kind of sequential

logic used by the mechanical dinosaurs of the 1930s. Some researchers are now


looking beyond these boundaries and are investigating entirely new media and

computational models. These include quantum, optical and DNA-based computers.

The current Silicon technology has following limitations:

Circuit integration dimensions

Clock frequency

Power consumption

Heat dissipation.

The complexity of the problems that modern processors can handle keeps growing, but

some grand challenges require computational capabilities that not even the most

powerful distributed systems can reach.

The idea that living cells and molecular complexes can be viewed as potential

machine components dates back to the late 1950s, when Richard Feynman delivered

his famous paper describing "sub-microscopic" computers. More recently, several

people have advocated the realization of massively parallel computation using the

techniques and chemistry of molecular biology. DNA computing was grounded in

reality at the end of 1994, when Leonard Adleman announced that he had solved a

small instance of a computationally intractable problem using a small vial of DNA.

By representing information as sequences of bases in DNA molecules, Adleman

showed how to use existing DNA-manipulation techniques to implement a simple,

massively parallel random search. He solved an instance of the directed Hamiltonian

path problem, a close relative of the travelling salesman problem, as stated above.
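The "massively parallel random search" described above can be imitated in software. The following is a minimal sketch, not Adleman's laboratory protocol: the function names, the example graph `EDGES`, the sample count and the 10% stopping probability are all illustrative assumptions. Random walks stand in for randomly ligated strands, and each filtering step stands in for one laboratory separation.

```python
import random

def random_path(n, edges, rng, max_len):
    """Grow one random directed walk, standing in for random ligation of edge strands."""
    path = [rng.randrange(n)]
    while len(path) < max_len:
        successors = [v for (u, v) in edges if u == path[-1]]
        # Stop at a dead end, or at random so that walks of many lengths occur.
        if not successors or rng.random() < 0.1:
            break
        path.append(rng.choice(successors))
    return tuple(path)

def adleman_search(n, edges, start, end, samples=20000, seed=0):
    """Generate random paths, then filter them down to Hamiltonian paths."""
    rng = random.Random(seed)
    # Step 1: generate a large pool of random paths (the "test tube").
    tube = {random_path(n, edges, rng, 2 * n) for _ in range(samples)}
    # Step 2: keep only paths that begin at `start` and finish at `end`.
    tube = {p for p in tube if p[0] == start and p[-1] == end}
    # Step 3: keep only paths with exactly n vertices.
    tube = {p for p in tube if len(p) == n}
    # Step 4: keep only paths that visit every vertex.
    tube = {p for p in tube if set(p) == set(range(n))}
    # Step 5: any path still in the tube is a Hamiltonian path.
    return tube

# Hypothetical 5-vertex directed graph, chosen only for illustration.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (2, 1), (3, 2)]
print(adleman_search(5, EDGES, start=0, end=4))
```

In the laboratory all candidate paths are formed at once, whereas this sketch draws them one after another; the filtering logic is the part the two have in common.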

There are two reasons for using molecular biology to solve computational problems.

(i). The information density of DNA is much greater than that of silicon: 1 bit can

be stored in approximately one cubic nanometer. Other storage media, such as

videotapes, store 1 bit in 1,000,000,000,000 cubic nanometers.

(ii). Operations on DNA are massively parallel: a test tube of DNA can contain

trillions of strands. Each operation on a test tube of DNA is carried out on all

strands in the tube in parallel.

Other researchers have reported liquid-phase systems for DNA computing. In an article

in the 15 February 2000 issue of Proceedings of the National Academy of Sciences, Laura

Landweber and her Princeton University colleagues used a hybrid DNA-RNA computing

system to compute solutions to a variant of a well-known chess problem. The "Knight

problem" asks what configurations of knights can be placed on a chess board with n

squares on a side such that none of the knights is attacking another. RNA strands with

10 bits (each bit represented by 15 nucleotides separated by 5-nucleotide spacers) were

randomly synthesized.
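The Knight problem stated above is small enough to check by brute force. The sketch below is a software analogue only, under the assumption that each square of the board is one bit (1 = knight present) and that every bit string plays the role of one candidate strand; the function names are hypothetical and nothing here models the RNA chemistry.

```python
from itertools import product

def knight_attacks(n):
    """Unordered pairs of squares (row-major indices) that are a knight's move apart."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    pairs = set()
    for r in range(n):
        for c in range(n):
            for dr, dc in moves:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    pairs.add(frozenset((r * n + c, rr * n + cc)))
    return pairs

def knight_solutions(n):
    """All placements in which no two knights attack each other."""
    attacks = knight_attacks(n)
    library = product((0, 1), repeat=n * n)  # every bit string is one candidate
    # Discard any candidate that has knights on both squares of an attacking pair.
    return [bits for bits in library
            if not any(all(bits[s] for s in pair) for pair in attacks)]

print(len(knight_solutions(3)))  # number of valid 3x3 placements, empty board included
```

For a 3x3 board the sketch examines all 512 bit strings one by one; the molecular approach instead applies each constraint to the whole library at once.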

In the Princeton approach, an enzyme, RNase H, breaks down those RNA chains that do not

meet the problem's constraints by attacking the RNA strand of DNA-RNA hybrid

molecules. The Princeton researchers write in their PNAS article, "A value of 0 or 1 is

assigned to a specific bit position by destroying all strands in the RNA library which

do not have the value at this position." The researchers repeated this process for the

other constraints, and DNA molecules complementary to the remaining RNA strands

were generated using the polymerase chain reaction. The researchers then read the

results by sequencing the complementary DNA of 43 distinct RNA molecules. Of the 94

possible solutions to the 3x3 Knight problem, the randomly selected strands included

42 proper solutions; one strand, involving an illegal placement of a knight, was

incorrect, giving a 97.7 percent success rate in finding correct solution strands.

Originally, the goal of DNA computing, as envisioned by

Adleman and others, was to solve numerical problems. "You pose some hard

optimization problem, convert it into DNA and use DNA's massive parallelism to

search a large number of possibilities, with a chemical reaction to filter out the ones

you don't want," Winfree says. In contrast, much of the research underway in DNA

computation is about self-assembly of structures, "gaining a different kind of control

on molecular processes." Researchers in this field are "essentially using the idea of an

algorithm in information processing, a specific set of rules that can be iterated to

accomplish some task" and applying it to chemical tasks. "The task is now one in

which you have chemical input (the components for the structure you want to

build) and chemical output (the final structure)," Winfree says. Because an

algorithm is involved, the assembly can be viewed as a form of computation. "It's

another case where the tools and concepts of computer science are brought to bear on

how to solve the problem." In particular, Seeman and collaborator John Reif of Duke

University in Durham, North Carolina, have used DNA structures to form self-assembling

tilings, analogous to Wang tiles, in which the multi-coloured tiles self-assemble to

form a mosaic with the same colour flanking every edge of the mosaic. Instead of

colours, the "tiles" are multi-armed hybrid DNA molecules with four arms, each with a

single-stranded sticky end, says Seeman. The sticky ends are very important, giving

predictable affinity and structure to the molecules. "About a year and a half ago, we

reported a cumulative XOR [exclusive OR] computation" by self-assembly of these DNA

tiles, Seeman says. "The nice thing about such systems is that it appears that they

will scale nicely."
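The cumulative XOR mentioned above can be pictured as a row of tiles, each of which reads the running result from its left neighbour and one input bit from below. The sketch below is an abstract illustration only: the tile table `XOR_TILES` and the function name are assumptions made for this example and do not describe the actual molecular design.

```python
# Each tile is keyed by (running result from the left, input bit from below)
# and carries the XOR of the two as its output.
XOR_TILES = {(c, x): c ^ x for c in (0, 1) for x in (0, 1)}

def cumulative_xor(bits):
    """Attach one tile per input bit, left to right, passing the result along."""
    out, carry = [], 0
    for x in bits:
        carry = XOR_TILES[(carry, x)]  # only the matching tile can attach here
        out.append(carry)
    return out

print(cumulative_xor([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Only four tile types are needed whatever the input length, which is one way to read the remark that such systems appear to scale well.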

A current limitation to this branch of DNA computing is the use of natural enzymes,

which only recognize and act on certain nucleotide sequences. The development of

designer enzymes that can identify additional sequences might take decades, Shapiro

notes. "In the medium term, we can envision biotechnology applications for such

automata, such as to analyze DNA without first sequencing it," Shapiro says. In the

longer term, he foresees construction of artificial "cells" with synthetic-DNA

programming. "A lot of processes in the living cell resemble computing in

fundamental ways."

1.3 OBJECTIVE OF THE RESEARCH

The objective of our research is to study and analyze various specific models of DNA

computing in order to obtain a Generalized model. Various DNA computing models have

been developed. Some of them are problem specific, such as [Adleman, 1994]; compared

with electronic computers, these models show a potential advantage in solving hard

problems. Due to the highly parallel character of DNA operations, the corresponding

DNA algorithms scale well with the size of the problem. The various DNA computing

models have been studied and analysed, and the drawbacks and limitations of each model

relative to the others have been taken into account. In addition, a Generalized model

has been developed to compute NP-complete problems, viz. the Hamiltonian Path problem,

the Maximum Clique problem, Sub Graph Isomorphism, Maximum Independent Set and the

3-Vertex colouring problem. In this model, each of the above-mentioned algorithms takes

the result of a permutation step as its input; the permuted output is then processed

according to the algorithm.
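One way to picture a permutation-driven model of this kind is sketched below. It is an illustrative assumption, not the Generalized model itself: the function names and the way each problem filters the permutations are hypothetical, and only two of the five problems are shown. Every permutation of the vertices is generated once, and each problem applies its own test to that shared input.

```python
from itertools import permutations

def hamiltonian_paths(n, edges):
    """Permutations whose consecutive vertices are all joined by a directed edge."""
    edge_set = set(edges)
    return [p for p in permutations(range(n))
            if all((p[i], p[i + 1]) in edge_set for i in range(n - 1))]

def maximum_clique(n, edges):
    """Longest prefix of any permutation whose vertices are pairwise adjacent."""
    adjacent = {frozenset(e) for e in edges}  # edges treated as undirected here
    best = ()
    for p in permutations(range(n)):
        k = 1
        while k < n and all(frozenset((p[k], p[j])) in adjacent for j in range(k)):
            k += 1
        if k > len(best):
            best = p[:k]
    return set(best)

# Hypothetical 4-vertex graph used only to exercise the sketch.
GRAPH = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(hamiltonian_paths(4, GRAPH))
print(maximum_clique(4, GRAPH))
```

Enumerating all n! permutations is only feasible for very small n on a conventional machine; the point of the sketch is the shared "permute, then filter" structure.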


As noted above, this field has its roots in the late 1950s, when the Nobel laureate

Richard Feynman first introduced the concept of computing at a molecular level.

Feynman's visionary idea was only realised in 1994, when Leonard Adleman

performed the first ever truly molecular-level computation using DNA combined with

the tools and techniques of molecular biology.

The technology for the DNA computer is still under development. However, it is clear

that molecular computers have many attractive properties. While modern supercomputers

perform 10^12 operations per second, Adleman estimates 10^20 operations per second to

be realistic for molecular manipulations. Similarly impressive figures concern the

consumption of energy and the capacity of memory: a supercomputer needs one joule

for 10^9 operations, whereas the same energy is sufficient to perform 2 x 10^19 ligation

operations. On a videotape, every bit needs 10^12 cubic nanometers of storage; DNA

stores information with a density of one bit per cubic nanometer [34]. Although

individual DNA molecular reactions are slower than the operations of conventional

computers, the total performance of DNA computers can outshine that of conventional

electronic computers.
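The figures quoted above can be turned into simple ratios. The short calculation below uses only the numbers given in the text; the variable names are chosen for this example.

```python
# Figures as quoted in the text (operations per second, operations per joule,
# cubic nanometers needed per stored bit).
supercomputer_ops_per_sec = 10**12
dna_ops_per_sec = 10**20
supercomputer_ops_per_joule = 10**9
dna_ops_per_joule = 2 * 10**19
videotape_nm3_per_bit = 10**12
dna_nm3_per_bit = 1

speed_ratio = dna_ops_per_sec // supercomputer_ops_per_sec        # 10^8
energy_ratio = dna_ops_per_joule // supercomputer_ops_per_joule   # 2 x 10^10
density_ratio = videotape_nm3_per_bit // dna_nm3_per_bit          # 10^12

print(f"throughput: {speed_ratio:.0e} times more operations per second")
print(f"energy:     {energy_ratio:.0e} times more operations per joule")
print(f"storage:    {density_ratio:.0e} times denser than videotape")
```

On these estimates the advantage is roughly eight orders of magnitude in throughput, ten in energy efficiency and twelve in storage density.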

Since Adleman reported the results of his seminal experiment, there has been a flurry

of interest in the idea of using DNA to perform computations. The potential benefits

of using this particular molecule are enormous: by harnessing the massive inherent

parallelism of performing concurrent operations on trillions of strands, we may one

day be able to compress the power of today's supercomputer into a single test tube.

However, if we compare the development of DNA-based computers to that of their

silicon counterparts, it is clear that molecular computers are still in their infancy.

Current work in this area is concerned mainly with abstract models of computation

and simple proof-of-principle experiments.

In the years since DNA computing was invented by Adleman [2], researchers have

reported many achievements in both the theoretical and the practical parts of the

field. Today, most researchers concentrate on developing methods for testing

biochemical feasibility with wet experiments; some groups concentrate on developing

a DNA computer itself and on developing algorithms to solve engineering or application

problems. Even though a real DNA computer is still a long way off, developing

algorithms for today's application problems is an important tool for testing and

simulating the stability and reliability of DNA computing algorithms. In solving

today's application problems, researchers have faced some limitations of manipulation

in this field, especially in some routine steps of DNA computing techniques. These

achievements fall under two heads: achievements in biochemical feasibility

and achievements in solving engineering or application problems.

Although those problem-specific models show the potential power of DNA

computing, we still need a general model. In particular, people are interested in two

questions:

1. Is DNA computing complete? That is, can a DNA computer compute all

Turing-computable (partial recursive) functions?

2. Is it possible to build a programmable DNA computer? In other words, does there

exist a universal DNA computer in the same sense as a universal Turing machine:

given a computable function, it can simulate the action of that function for any

argument?

1.4 ORGANIZATION OF THE THESIS

The goal of this thesis is to present the contribution to the field, placing it in the

context of the existing body of work. It covers the basics of DNA, its structure and

manipulation, and DNA computing, followed by a study and analysis of the various

models of DNA computing. The new results concern a general model of DNA

computation, and an assessment of the complexity and viability of DNA

computations.

The thesis illustrates the current state of the art of DNA computing achievements,

especially new approaches or methods that contribute to solving either theoretical or

application problems. Since Adleman solved an NP-complete problem by means of a

wet DNA experiment in 1994, DNA has become one of the candidate alternatives for

overcoming the limitations of the silicon computer. Today, many researchers all over

the world concentrate either on improving available methods used in DNA computing

or on suggesting new ways to solve engineering or application problems with a DNA

computing approach. The thesis gives an overview of research achievements in DNA

computing and touches on improved methods employed in DNA computing as well as on

solving application problems. Several challenges that DNA computing faces in society

have also been addressed.

The thesis is organised into nine chapters, as follows:

Chapter 1 explains the motivation behind the research, its objective and the

organisation of the research work.

Chapter 2 deals with the basics of DNA computing, its beginning and the concepts

behind it. It also describes the characteristics and nature of DNA computing and

why DNA computing is needed. Along with the advantages and disadvantages of DNA

computing, a comparison is presented between DNA computing and conventional

electronic computers. This chapter also highlights the difficulties of DNA

computing.

Chapter 3 describes the structure of the DNA molecule and a variety of laboratory

techniques for its manipulation that have been studied, such as denaturing, annealing

and ligation, gel electrophoresis, and PCR.

Chapter 4 presents the literature review of the related papers that have been

published. This chapter is where we have gathered the information and

facts related to DNA Computing, DNA Computing Models, various laboratory

experiments, and the materials and process for carrying out the biological

experiments in the laboratory. This review has been a great help in the progress

of the work accomplished.

Chapter 5 is the contribution of the research work, where we have designed a

Generalized Model. This Generalized Model computes a few NP-complete problems by

taking the permuted input. The NP-complete problems considered are the 3-Vertex

Coloring problem, the Hamiltonian Path problem, Sub Graph Isomorphism, the Maximum

Clique problem and the Maximum Independent Set problem.


Chapter 6 describes the Observations, Results and Findings of our research. The

analysis of DNA Computing Models has been done and the same has been

described. Then, the analysis of Generalized Model is presented. The Fragile and

the Tough models are also discussed along with the Complexity analysis of

various algorithms.

Chapter 7 summarises this thesis, gives some concluding remarks and suggests

several open problems in the field of DNA computation.

Chapter 8 lists all the references and bibliography that have been studied and used

in the thesis.

Chapter 9 lists the various research papers that have been published in the national

and international journals. It also lists the various papers that have been presented

in the national and international conferences.

DNA computing is a new way of thinking about computation altogether. Maybe this

is how nature does mathematics: not by adding and subtracting, but by cutting and

pasting, by insertions and deletions. Perhaps the primitive functions we currently use

for computation are just as dependent on the history of humankind, as the fact that we

use base 10 for counting is dependent on our having ten fingers. In the same way

humans moved on to counting in other bases, maybe it is time we realized that there

are other ways to compute besides the ones we are familiar with. The fact that

phenomena happening inside living organisms (copying, cutting and pasting of DNA

strands) could be computations in disguise suggests that life itself may consist of a

series of complex computations. As life is one of the most complex natural

phenomena, we could generalize by conjecturing the whole cosmos to consist of

computations. The differences between the diverse forms of matter would then only

reflect various degrees of computational complexity, with the qualitative differences

pointing to huge computational speed-ups. From chaos to inorganic matter, from

inorganic to organic, and from that to consciousness and mind, perhaps the entire

evolution of the universe is a history of the ever–increasing complexity of

computations. Just imagine. Perhaps all there was in the beginning was a universal

cocktail of particles. They combined randomly for millions of years, until, by chance,

some patterns of beautiful mathematical symmetry started to emerge: the inorganic

matter. They continued to mix and intermingle until some formations started to self-


replicate (see fractals and iterated functions) and then to do computations: life

appeared. The more complex the computations grew, the more complex the life forms

became, until there was again a sudden leap and consciousness and mind appeared,

apparently out of thin air, but in reality an inevitable corollary to complexity. Who

knows what the next step could be in this infinite spiral of mathematical evolution?

Of course, the above is only a hypothesis, and the enigma whether modern man is

"homo sapiens" or "homo computans" still awaits solving. But this is what makes

DNA computing so captivating. Not only may it help compute faster and more

efficiently, but it stirs the imagination and opens deeper philosophical issues. What

can be more mesmerizing than something that makes you dream? To a

mathematician, DNA computing tells that perhaps mathematics is the foundation of

all there is. Indeed, mathematics has already proven to be an intrinsic part of sciences

like physics and chemistry, of music, visual arts and linguistics, to name just a few.

The discovery of DNA computing, indicating that mathematics also lies at the root of

biology, makes one wonder whether mathematics isn't in fact the core of all known

and (with non-Euclidean geometry in mind) possible reality.

DNA is the major information storage molecule in living cells, and billions of years of

evolution have tested and refined both this wonderful informational molecule and

highly specific enzymes that can either duplicate the information in DNA molecules

or transmit this information to other DNA molecules. Instead of using electrical

impulses to represent bits of information, the DNA computer uses the chemical

properties of these molecules by examining the patterns of combination or growth of

the molecules or strings. DNA can do this through the manufacture of enzymes,

which are biological catalysts that could be called the "software", used to execute the

desired calculation.

A single strand of DNA is similar to a string consisting of a combination of four

different symbols: A, G, C and T. Mathematically, this means we have at our disposal a

four-letter alphabet, Σ = {A, G, C, T}, to encode information, which is more than enough

considering that an electronic computer needs only two digits, 0 and 1, for the same

purpose. In a DNA computer, computation takes place in test tubes. The input and

output are both strands of DNA, whose genetic sequences encode certain information.


A program on a DNA computer is executed as a series of biochemical operations,

which have the effect of synthesizing, extracting, modifying and cloning the DNA

strands.
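Because the alphabet has four letters, each base can carry two binary digits. The sketch below shows one possible encoding of ordinary bytes as a strand over {A, C, G, T}. The particular assignment (A=00, C=01, G=10, T=11) and the function names are assumptions made for illustration; real encoding schemes add further constraints that are ignored here.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def encode(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data
                   for shift in (6, 4, 2, 0))

def decode(strand: str) -> bytes:
    """Inverse of encode: read the strand four bases at a time."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

strand = encode(b"DNA")
print(strand, decode(strand))
```

A strand produced this way is four bases long per byte, so the string length is half the number of bits it represents.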

Research in DNA computing is at an early stage. The high information density of

DNA molecules and the massive parallelism of DNA reactions make DNA

computing a powerful tool. Tackling problems with DNA computing is most

appropriate when the problems are computationally intractable in nature, because

DNA computing, with its high degree of parallelism, can overcome the difficulties

that make such problems intractable on silicon computers. However, using DNA

computing principles for solving simple problems may not be advisable. Many research

results have shown that any procedure that can be programmed on a silicon computer

can be realized as a DNA computing procedure. Owing to its promising applications in

cryptography, research in DNA computing is gaining pace, and there is wide scope for

researchers to make use of this powerful computing tool.

The potential advantages of DNA computing versus electronic computing are clear in

the case of problems like the Directed Hamiltonian Path Problem, the Satisfiability

Problem, and breaking DES. On the other hand, these are only particular problems

solved by means of molecular biology. They are one–time experiments to derive a

combinatorial solution to a particular sort of problem. This immediately leads to two

fundamental questions, posed in Adleman's article and in [20] and [28]:

(1). What kind of problems can be solved by DNA computing?

(2). Is it possible, at least in principle, to design a programmable DNA computer?

More precisely, one can reformulate the problems above as:

(1). Is the DNA model of computation computationally complete in the sense that

the action of any computable function (or, equivalently, the computation of any

Turing machine) can be carried out by DNA manipulation?

(2). Does there exist a universal DNA system, i.e., a system that, given the encoding

of a computable function as an input, can simulate the action of that function for

any argument? (Here, the notion of function corresponds to the notion of a


program in which an argument w is the input of the program and the value f(w)

is the output of the program. The existence of a universal DNA system amounts

thus to the existence of a DNA computer capable of running programs.)

Opinions differ as to whether the answer to these questions has practical relevance.

One can argue as in [8] that from a practical point of view it may not be that

important to simulate a Turing machine by a DNA computing device. Indeed, one

should not aim to fit the DNA model into the Procrustean bed of classical models of

computation, but try to completely rethink the notion of computation. On the other

hand, finding out whether the class of DNA algorithms is computationally complete

has many important implications. If the answer to it were unknown, then the practical

efforts for solving a particular problem might be proven futile at any time: a Gödel

minded person could suddenly announce that it belongs to a class of problems that are

impossible to solve by DNA manipulation. The same holds for the theoretical proof of

the existence of a DNA computer. As long as it is not proved that such a thing

theoretically exists, the danger that the practical efforts will be in vain is always

lurking in the shadow. One more indication of the relevance of the questions

concerning computational completeness and universality of DNA–based devices is

that they have been addressed for most models of DNA computation that have so far

been proposed.

The existing models of DNA computation are based on various combinations of a few

primitive biological operations:

Synthesis of a desired polynomial length strand

Separation of the strands by length

Merging: pour two test tubes into one

Extraction: extract those strands containing a given pattern as a substring

Melting/Annealing: break apart/bond together two single DNA strands with

complementary sequences

Amplifying: make copies of DNA strands by using the Polymerase Chain

Reaction

Cutting: cut DNA strands by using restriction enzymes

Ligation: paste DNA strands with complementary sticky ends by using ligases


Detection: given a tube, say "yes" if it contains at least one DNA strand, and "no"

otherwise
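Several of the operations listed above have simple string analogues. The sketch below models a tube as a multiset of strings (a `collections.Counter`) and implements a subset of the operations. It is an abstraction for illustration only: the function names are hypothetical, and laboratory error rates, partial matches and reaction times are ignored.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")  # Watson-Crick base pairing

def merge(tube_a, tube_b):
    """Merging: pour two tubes into one."""
    return tube_a + tube_b

def extract(tube, pattern):
    """Extraction: keep the strands that contain `pattern` as a substring."""
    return Counter({s: k for s, k in tube.items() if pattern in s})

def separate_by_length(tube, length):
    """Separation: keep the strands of exactly the given length."""
    return Counter({s: k for s, k in tube.items() if len(s) == length})

def amplify(tube, factor=2):
    """Amplifying: multiply the number of copies of every strand."""
    return Counter({s: k * factor for s, k in tube.items()})

def anneals(strand_a, strand_b):
    """Annealing test: are the two strands fully complementary (antiparallel)?"""
    return strand_a == strand_b.translate(COMPLEMENT)[::-1]

def detect(tube):
    """Detection: True ("yes") if the tube holds at least one strand."""
    return sum(tube.values()) > 0

tube = merge(Counter({"ACGT": 1, "TTGACC": 2}), Counter({"GGACGTAA": 1}))
hits = extract(tube, "ACGT")
print(detect(separate_by_length(hits, 8)), anneals("AACG", "CGTT"))
```

A "program" in this setting is simply a sequence of such calls that ends in `detect` or returns one or more tubes.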

These operations are then used to write "programs" which receive a tube containing

DNA strands as input and return as output either "yes" or "no" or a set of tubes. A

computation consists of a sequence of tubes containing DNA strands. There are pros

and cons for each model (combination of operations). The ones using operations

similar to Adleman's have the obvious advantage that they could already be

successfully implemented in the lab. The obstacle preventing large-scale

automation of the process is that most bio-operations rely mainly on manual

handling of tubes. In contrast, the model introduced by Tom Head in [21] aims to be

a "one-pot" tube with all the operations carried out in principle by enzymes.

Moreover, it has the theoretical advantage of being a mathematical model with all the

claims backed up by mathematical proofs. Its disadvantage is that the current state of

the art in molecular biology has not yet allowed practical implementation. Overall, the

existence of different models with complementary features shows the versatility of

DNA computing and increases the likelihood of practically constructing a

DNA-computing-based device.
