
Biological Computing – DNA solution

Presented by Wooyoung Kim

4/8/09

CSc 8530 Parallel Algorithms, Spring 2009

Dr. Sushil K. Prasad


Outline

NP and NP-complete

Biological computation

Hamiltonian path problem (HPP)

Satisfaction problem

Generalized SAT

Discussion

NP and NP-complete

NP vs. NP-complete

NP problems: decision problems solvable in nondeterministic polynomial time.

NP-complete: a problem in NP to which all NP problems can be reduced; if it has an efficient solution, then so do all NP problems.

No efficient general solution is known for any NP-complete problem.

Biological computation – Adv.

The speed of any computer is determined by:

1. How many parallel processes it has.

2. How many steps each can perform per unit time.

Biological computations could potentially have vastly more parallelism.

Ex: 3 g of water contains approx. $10^{22}$ molecules.

The second factor favors conventional computers, since a biological machine is limited to a small fraction of a biological experiment per unit time.

However, the advantage in parallelism is so large that the slower individual steps are not a problem.

Biological computation – Disadv.

Even with parallelism, a brute-force approach is not always feasible; it is too inefficient.

The biological computer can solve any HPP of 70 or fewer edges.

In practice, though, there is no great need to solve problems of that size.

Hamiltonian Path Problem

L.M. Adleman. "Molecular Computation of Solutions to Combinatorial Problems," Science, vol. 266, 1994, pp. 1021-1024.

Using DNA, solve the Hamiltonian Path Problem efficiently.

Hamiltonian Path Problem

[Figure: directed graph on vertices 0 through 6]

Hamiltonian path: 0 → 1 → 2 → 3 → 4 → 5 → 6

Algorithm for HPP

1. Generate random paths through the graph.

2. Keep only those paths that begin with $v_{in}$ and end with $v_{out}$.

3. If the graph has n vertices, then keep only those paths that enter exactly n vertices.

4. Keep only those paths that enter all of the vertices of the graph at least once.

5. If any paths remain, say “Yes”; otherwise say “No”.
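
The five steps above can be sketched in software. This is only an in-silico illustration, not the lab protocol: a random walk stands in for the ligation reaction, and the function names (`random_paths`, `filter_paths`, `has_hamiltonian_path`) and the 0.2 stop probability are assumptions made for the example.

```python
import random

def random_paths(edges, vertices, count, max_len, rng):
    """Step 1: random walks along directed edges (a stand-in for ligation)."""
    paths = []
    for _ in range(count):
        path = [rng.choice(vertices)]
        while len(path) < max_len:
            successors = [v for (u, v) in edges if u == path[-1]]
            # Stop at a dead end, or at random so that paths vary in length.
            if not successors or rng.random() < 0.2:
                break
            path.append(rng.choice(successors))
        paths.append(path)
    return paths

def filter_paths(paths, vertices, v_in, v_out):
    """Steps 2 to 4: successive filters on the pool of candidate paths."""
    n = len(vertices)
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]  # step 2
    paths = [p for p in paths if len(p) == n]                      # step 3
    paths = [p for p in paths if all(v in p for v in vertices)]    # step 4
    return paths

def has_hamiltonian_path(edges, vertices, v_in, v_out, count=10000, seed=0):
    """Step 5: answer Yes (True) if any path survives the filters."""
    rng = random.Random(seed)
    pool = random_paths(edges, vertices, count, 2 * len(vertices), rng)
    return len(filter_paths(pool, vertices, v_in, v_out)) > 0
```

Because the pool is sampled at random, a "No" from this sketch only means that no Hamiltonian path was found among the sampled paths; the DNA experiment relies on the enormous number of molecules to make such a miss unlikely.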

Implementing Step 1

1. Generate random paths through the graph.

Ligation reaction (annealing)

Each vertex i is encoded by a random 20 bp sequence ($O_i$).

Approximately $3 \times 10^{13}$ copies of each associated oligonucleotide (a short nucleic acid polymer) were added.

Vertex 2 ($O_2$): TATCGGATCG GTATATCCGA

Vertex 3 ($O_3$): GCTATTCGAG CTTAAAGCTA

Edge 2→3: GTATATCCGA GCTATTCGAG (second half of $O_2$ followed by first half of $O_3$)
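
The edge encoding shown on this slide can be reproduced with a few lines of code. This is a minimal sketch; the helper name `edge_oligo` is made up for the illustration, and the sequences are the ones from the slide.

```python
def edge_oligo(o_from, o_to):
    """Edge i->j: second half of the source oligo, then first half of the target."""
    half = len(o_from) // 2
    return o_from[half:] + o_to[:half]

O2 = "TATCGGATCGGTATATCCGA"  # vertex 2
O3 = "GCTATTCGAGCTTAAAGCTA"  # vertex 3

print(edge_oligo(O2, O3))  # GTATATCCGAGCTATTCGAG
```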

Implementing Step 2

2. Keep only those paths that begin with $v_{in}$ ($O_0$) and end with $v_{out}$ ($O_6$).

The product of step 1 was amplified by PCR (polymerase chain reaction) using $O_0$ (starting point) and $O_6$ (ending point) as primers.

Thus only those molecules encoding paths that begin with $v_{in}$ and end with $v_{out}$ were amplified.

Implementing Step 3

3. If the graph has n vertices, then keep only those paths that enter exactly n vertices.

The product of step 2 was run on an agarose gel.

The 140 bp band (corresponding to double-stranded (ds) DNA encoding paths entering exactly seven vertices, 7 × 20 bp) was excised and soaked in ddH2O to extract the DNA.

Implementing Step 4

4. Keep only those paths that enter all of the vertices of the graph at least once.

The product of step 3 was affinity-purified with a biotin-avidin magnetic bead system by:

First generating single-stranded (ss) DNA from the dsDNA of step 3.

Then incubating the ssDNA with $O_1$ conjugated to magnetic beads.

Only those ssDNA molecules containing $O_1$ annealed to the bound $O_1$ and were retained.

The process was repeated with $O_2$ through $O_5$.

Implementing Step 5

5. If any paths remain, say “Yes”; otherwise say “No”.

The product of step 4 was amplified by PCR and run on a gel.

Drawbacks

7 days of lab work.

Step 4 (magnetic bead separation) is the most labor-intensive work.

Possibility of errors:

Pseudo-paths

Inexact reactions

Hairpin loops

Advantages

The number of different oligonucleotides required should grow linearly with the number of edges: O(n).

The fastest supercomputer vs. DNA computer:

$10^{6}$ op/sec vs. $10^{14}$ op/sec

$10^{9}$ op/J vs. $10^{19}$ op/J (in the ligation step)

1 bit per $10^{12}$ nm$^3$ vs. 1 bit per 1 nm$^3$ (video tape vs. molecules)

Satisfaction problem

SAT consists of a Boolean formula of the form $C_1 \land C_2 \land \dots \land C_m$, where each $C_l$ is a clause of the form $v_1 \lor v_2 \lor \dots \lor v_k$. Each $v_i$ is a variable or its negation.

Ex. $F = (x \lor y) \land (\lnot x \lor \lnot y)$

Problem: find values of the variables so that the formula is 1.

If we have n variables, then there are $2^n$ choices to search.
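
To make the size of the search space concrete, the exhaustive search over all $2^n$ assignments can be written as a short sketch. The clause representation used here (a literal is a pair of variable index and wanted value) is an assumption made for the example.

```python
from itertools import product

def satisfying_assignments(clauses, n):
    """Try all 2**n assignments; a literal is (variable index, wanted value)."""
    return [bits for bits in product((0, 1), repeat=n)
            if all(any(bits[i] == want for i, want in clause)
                   for clause in clauses)]

# F = (x or y) and (not x or not y), with x at index 0 and y at index 1
F = [[(0, 1), (1, 1)], [(0, 0), (1, 0)]]
print(satisfying_assignments(F, 2))  # [(0, 1), (1, 0)]
```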

Satisfaction problem

Graph formulation

[Figure: the graph $G_n$ encoding two-bit numbers]

• Suppose we have n variables in the formula; the vertices $a_1, \dots, a_{n+1}$ separate the choices for successive variables.

• The graph is constructed so that every path from $a_1$ to $a_{n+1}$ encodes an n-bit binary number.

• At each stage, a path has exactly two choices: unprimed (1) or primed (0).

Ex. The path $a_1 \to x' \to a_2 \to y \to a_3$ encodes 01, that is, x is 0 and y is 1.
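
The correspondence between paths and n-bit numbers can be enumerated directly. This is a minimal sketch; the function name `gn_paths` and the string labels for the vertices are chosen for the illustration only.

```python
from itertools import product

def gn_paths(names):
    """Yield (path, bits) for every path from a1 to a(n+1) in G_n."""
    n = len(names)
    for bits in product((0, 1), repeat=n):
        path = []
        for i, (name, bit) in enumerate(zip(names, bits), start=1):
            path.append(f"a{i}")
            # Unprimed vertex encodes 1, primed vertex encodes 0.
            path.append(name if bit == 1 else name + "'")
        path.append(f"a{n + 1}")
        yield path, "".join(map(str, bits))

paths = {bits: path for path, bits in gn_paths(["x", "y"])}
print(paths["01"])  # ['a1', "x'", 'a2', 'y', 'a3']
```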

Satisfaction problem

Example

$F = (x \lor y) \land (\lnot x \lor \lnot y)$

Number of variables: n = 2 (x and y)

Number of clauses: m = 2

Construct a graph with $(n+1) + 2n$ nodes and connect them as follows:

[Figure: the resulting graph]

Satisfaction problem

Graph paths and SAT problem

• If we obtain a path from $a_1$ to $a_{n+1}$, each variable is assigned 0 or 1 along it and the formula is satisfied.

• If there is no path from start to end, the formula has no solution (it is not satisfiable).

• Using the properties of DNA annealing (Watson-Crick complementary binding), we can construct a graph representing the variables, and using test tubes, we either obtain paths (satisfiable) or no paths at all (not satisfiable).

Satisfaction problem

• Assign a random DNA string to each vertex (ex. length 8).

• Then derive the DNA string of each edge.

Vertices $a_1$, $a_2$, $a_3$: ATTCGGAA, TTACGGGT, GGATTCCA

Variable vertices between $a_1$ and $a_2$: TATCCCGA, GCTAAGCT

Variable vertices between $a_2$ and $a_3$: GGCTCGTT, CCCAATTA

Edges (complement of the last half of the source vertex followed by the first half of the target vertex):

CCTTATAG, CCTTCGAT (leaving $a_1$)

GGCTAATG, TCGAAATG (entering $a_2$)

CCCACCGA, CCCAGGGT (leaving $a_2$)

GCAACCTA, TAATCCTA (entering $a_3$)
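
The edge strings on this slide follow from the vertex strings: each edge is the complement of the last half of its source vertex followed by the first half of its target vertex. The sketch below checks this for two of the edges. The helper name `edge_strand` is hypothetical, and the complement is taken position by position, as drawn on the slide (real strands pair in antiparallel orientation).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def edge_strand(src, dst):
    """Complement of (last half of src + first half of dst)."""
    half = len(src) // 2
    return (src[half:] + dst[:half]).translate(COMPLEMENT)

a1, a2 = "ATTCGGAA", "TTACGGGT"
v = "TATCCCGA"  # one of the two variable vertices between a1 and a2

print(edge_strand(a1, v))  # CCTTATAG
print(edge_strand(v, a2))  # GGCTAATG
```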

Satisfaction problem

• In an initial test tube t0, put many copies of the DNA strings corresponding to the vertices and the edges (many copies of each vertex and each edge).

• Add the complement of the first half of $a_1$ and the complement of the last half of $a_3$ to mark the start and end of each path.

Start: TAAG (complement of ATTC, the first half of $a_1$ = ATTCGGAA)

End: AGGT (complement of TCCA, the last half of $a_3$ = GGATTCCA)

Satisfaction problem

$F = (x \lor y) \land (\lnot x \lor \lnot y)$

1. Let t0 be an initial test tube containing all the DNA strings of vertices and edges.

2. Since the first clause is $(x \lor y)$, apply E(t0, 1, 1), which selects the strands whose first variable x is 1. Extract the matching patterns (10, 11) and put them into t0-1.

3. Put the remainder (patterns 00, 01) into t’0-1 and apply E(t’0-1, 2, 1), which selects the strands whose second variable y is 1. Extract the matching patterns from t’0-1 and put them into t0-2.

4. Pour t0-1 and t0-2 together to form test tube t1.

5. Note that t1 now holds the patterns 01, 10, 11, which are exactly the assignments satisfying the first clause.

Satisfaction problem

6. Repeat the same process for the second clause, starting from t1.

7. Since the second clause is $(\lnot x \lor \lnot y)$, apply E(t1, 1, 0) and put the extracted strands into test tube t1-1.

8. Put the remainder into t’1-1 and make t1-2 by applying E(t’1-1, 2, 0).

9. Pour t1-1 and t1-2 together into test tube t2.

10. Check whether there is any DNA in the last tube.

11. The satisfying assignments are exactly those in this final test tube.

Satisfaction problem

Test tube   Operation                        Values
t0          initial                          00, 01, 10, 11
t0-1        E(t0, 1, 1)                      10, 11
t’0-1       Remainder of t0                  00, 01
t0-2        E(t’0-1, 2, 1)                   01
t1          Pour t0-1 and t0-2 together      01, 10, 11
t1-1        E(t1, 1, 0)                      01
t’1-1       Remainder of t1                  10, 11
t1-2        E(t’1-1, 2, 0)                   10
t2          Pour t1-1 and t1-2 together      01, 10
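
The sequence of tubes in the table can be reproduced with a small simulation in which a test tube is a set of bit strings. The names `extract` and `apply_clause` are hypothetical; `extract` plays the role of the operation E(t, i, a).

```python
from itertools import product

def extract(tube, i, a):
    """E(t, i, a): strands of tube t whose i-th variable (1-based) equals a."""
    return {s for s in tube if s[i - 1] == str(a)}

def apply_clause(tube, literals):
    """Process one clause: extract per literal, pour the extracts together."""
    kept = set()
    rest = set(tube)
    for i, a in literals:
        hit = extract(rest, i, a)
        kept |= hit   # pour the extracted strands into the result tube
        rest -= hit   # the next extraction works on the remainder
    return kept

t0 = {"".join(bits) for bits in product("01", repeat=2)}
t1 = apply_clause(t0, [(1, 1), (2, 1)])  # clause (x or y)
t2 = apply_clause(t1, [(1, 0), (2, 0)])  # clause (not x or not y)
print(sorted(t1), sorted(t2))  # ['01', '10', '11'] ['01', '10']
```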

Satisfaction problem

For a general formula with n variables and m clauses, we need only O(m) test tubes (a constant number of additional test tubes is constructed for each clause).

The last tube is checked to see whether any patterns (paths) from the start vertex to the end vertex are left.

Generalized SAT

Generalize this to consider problems that correspond to any Boolean formula.

Formulas are defined recursively:

1. Any variable x is a formula.

2. If F is a formula, then so is $\lnot F$.

3. If F and G are formulas, then so are $F \lor G$ and $F \land G$.

Generalized SAT

Size of the formula, S: the number of operations used to build the formula.

SAT problem: given a formula, find an assignment of Boolean values to the variables so that the formula is true. This problem is NP-complete.

Claim: O(S) DNA experiments suffice to solve this SAT problem.
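
The recursive definition and the size S can be illustrated with a small sketch. The representation is an assumption made for the example: a variable is a string, and a compound formula is a tuple whose first element is "not", "or" or "and".

```python
def size(f):
    """Number of operations (not/or/and) used to build formula f."""
    if isinstance(f, str):
        return 0
    return 1 + sum(size(g) for g in f[1:])

def evaluate(f, env):
    """Evaluate formula f under the assignment env (variable name -> bool)."""
    if isinstance(f, str):
        return env[f]
    op, *args = f
    if op == "not":
        return not evaluate(args[0], env)
    if op == "or":
        return evaluate(args[0], env) or evaluate(args[1], env)
    if op == "and":
        return evaluate(args[0], env) and evaluate(args[1], env)
    raise ValueError(f"unknown operation: {op}")

# F = (x or y) and (not x or not y)
F = ("and", ("or", "x", "y"), ("or", ("not", "x"), ("not", "y")))
print(size(F))  # 5
```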

Generalized SAT – step 1

1. Construct a contact network for a formula.

A contact network is a directed graph with a source s and a sink t.

Each edge is labeled x or $\lnot x$ for some variable x.

Given any assignment, an edge is connected if its label evaluates to 1.

[Figure: example contact network]

For example, the network in the figure evaluates to 1 only if w = 1 or x = y = z = 1.
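
Evaluating a contact network under a given assignment is a reachability check over the connected edges. The sketch below assumes one possible topology for the example (a direct edge w from s to t, in parallel with a chain x, y, z through two intermediate nodes named p and q); the original figure is not available, so this layout and the function name `connected` are assumptions.

```python
def connected(edges, assignment, source="s", sink="t"):
    """True if sink is reachable from source over edges whose label is 1."""
    live = {}
    for u, v, var, want in edges:
        # An edge labeled x has want=1; an edge labeled (not x) has want=0.
        if assignment[var] == want:
            live.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        for v in live.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return sink in seen

network = [("s", "t", "w", 1),
           ("s", "p", "x", 1), ("p", "q", "y", 1), ("q", "t", "z", 1)]

print(connected(network, {"w": 0, "x": 1, "y": 1, "z": 1}))  # True
print(connected(network, {"w": 0, "x": 1, "y": 1, "z": 0}))  # False
```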

Generalized SAT – step 2

2. Solve the SAT problem of a contact network by deciding:

Whether or not there is an assignment of values to the variables such that there is a directed connected path from s to t.

If two edges have the same label, they must receive the same value.

How many DNA experiments? – O(S)

Generalized SAT – claims

Note that the result follows from two claims:

Given any formula of size S, there is a contact network of size linear in S such that the formula is satisfiable exactly when the network is.

Given any contact network of size S, the SAT problem for the network can be solved in O(S) DNA experiments.

Generalized SAT – claim 1

Existence of contact network for given formula: simple formula

Any formula can be placed into a normal form with De Morgan’s laws:

$\lnot (F \lor G) = \lnot F \land \lnot G$

$\lnot (F \land G) = \lnot F \lor \lnot G$
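
Pushing negations down to the variables with De Morgan's laws can be sketched as a short recursive function. The tuple representation of formulas (strings for variables, tuples starting with "not", "or" or "and") and the function name `simplify` are assumptions made for this illustration.

```python
def simplify(f, negate=False):
    """Push negations down to the variables using De Morgan's laws."""
    if isinstance(f, str):
        return ("not", f) if negate else f
    op = f[0]
    if op == "not":
        # A negation flips the pending negation instead of staying in place.
        return simplify(f[1], not negate)
    if negate:
        # not (F or G) = not F and not G; not (F and G) = not F or not G
        op = "and" if op == "or" else "or"
    return (op, simplify(f[1], negate), simplify(f[2], negate))

print(simplify(("not", ("or", "x", ("not", "y")))))
# ('and', ('not', 'x'), 'y')
```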

Generalized SAT – claim 1

Existence of contact network for given formula: general formula

G is a network for E, H is a network for F.

(A) The network for $E \lor F$ (G and H in parallel)

(B) The network for $E \land F$ (G followed by H in series)

Generalized SAT – claim 2

Solve the SAT problem for any contact network using O(S) DNA experiments.

Associate a test tube $P_v$ with each node v in the contact network.

The test tube $P_t$ associated with the sink t is the “answer”.

Suppose that v → u is an edge with the label x and that $P_v$ is already constructed. Then construct $P_u$ by doing the extraction E($P_v$, x, 1).

If several edges leave a vertex v, use an amplify step to get multiple copies of test tube $P_v$.

If several edges enter a vertex v, pour the resulting test tubes together to form $P_v$.
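
The construction of the tubes $P_v$ can be simulated by treating each tube as a set of assignments. This is a sketch under stated assumptions: the nodes are supplied in topological order, an edge labeled $\lnot x$ is handled as the extraction E($P_v$, x, 0), and the example network (edge w in parallel with a chain x, y, z through hypothetical nodes p and q) is invented for the illustration.

```python
from itertools import product

def solve_network(nodes, edges, variables, source="s", sink="t"):
    """Return P_t, the set of assignments that connect source to sink.

    nodes must be listed in topological order (an assumption of this sketch).
    """
    index = {name: i for i, name in enumerate(variables)}
    tubes = {v: set() for v in nodes}
    # P_s starts with every possible assignment.
    tubes[source] = set(product((0, 1), repeat=len(variables)))
    for v in nodes:
        for a, b, var, want in edges:
            if a == v:
                # One extraction per edge; reusing tubes[v] models "amplify".
                extracted = {s for s in tubes[v] if s[index[var]] == want}
                tubes[b] |= extracted  # "pour" into P_b
    return tubes[sink]

edges = [("s", "t", "w", 1),
         ("s", "p", "x", 1), ("p", "q", "y", 1), ("q", "t", "z", 1)]
answer = solve_network(["s", "p", "q", "t"], edges, ["w", "x", "y", "z"])
print(len(answer))  # 9
```

Each edge costs one extraction, so the number of simulated experiments is linear in the size of the network, matching the O(S) claim.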

Discussion

Can we actually build DNA computers?

All the methods described here assume that every operation is performed perfectly, without error.

However, the operations are not perfect.

DNA-based computers are hoped to become a practical means of solving hard problems in the future.

References

R.J. Lipton. “DNA solution of hard computational problems,” Science, vol. 268, 1995, pp. 542-545.

L.M. Adleman. “Molecular Computation of Solutions to Combinatorial Problems,” Science, vol. 266, 1994, pp. 1021-1024.

R.J. Lipton. “Speeding Up Computations via Molecular Biology,” unpublished manuscript, available at www.cs.princeton.edu/~rjl/