Post on 13-Jul-2020

DNA Computing: A Research Snapshot

Lila Kari

A research snapshot

• Adleman’s 20-variable 3-SAT experiment

• DNA Benenson automata

• DNA memory

• Towards a programmable DNA computer

• DNA nanoscale shapes

• DNA nanomachines

• Impact on theoretical computer science

(1) Adleman’s 20-variable 3-SAT [Braich et al., Science, 2002]

• The first experiment that demonstrated that DNA Computing devices can exceed the computational power of an unaided human

• The answer to the problem was found after an exhaustive search of more than 1 million possible solution candidates

Input to 3-SAT and solution

Algorithm for 3SAT

• Input: A Boolean formula in 3CNF

• Step 1: Generate the set of all possible truth value assignments

• Step 2: Remove the set of all truth value assignments that make the first clause false

• Step 3: Repeat Step 2 for all clauses of the input formula

• Output: The remaining (if any) truth value assignments
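
The filtering algorithm above can be sketched in software. This is only an illustration of the generate-and-filter idea, not the laboratory protocol; the function name, the signed-integer clause encoding, and the small 3-variable formula are assumptions made for the example.

```python
from itertools import product

def solve_3sat_by_filtering(num_vars, clauses):
    """Brute-force filter. A literal k > 0 means x_k is true, k < 0 means x_k is false."""
    # Step 1: generate all 2^n truth value assignments.
    candidates = list(product([False, True], repeat=num_vars))
    # Steps 2-3: for every clause, remove the assignments that make it false.
    for clause in clauses:
        candidates = [
            a for a in candidates
            if any(a[abs(lit) - 1] == (lit > 0) for lit in clause)
        ]
    # Output: the remaining (if any) assignments.
    return candidates

# Hypothetical formula, NOT the 20-variable instance from the experiment:
# (x1 or x2 or not x3) and (not x1 or x2 or x3) and (x1 or not x2 or not x3)
example_clauses = [(1, 2, -3), (-1, 2, 3), (1, -2, -3)]
survivors = solve_3sat_by_filtering(3, example_clauses)
```

With 20 variables, Step 1 already produces 2^20 candidates, which is why the experiment counts as an exhaustive search over more than a million assignments.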

Encoding the input

• Every variable xk, k = 1,..., 20, was associated with two distinct 15-mer DNA single strands called ‘value sequences’, one representing true, and one representing false

Library of candidates

• Each of the 2^20 possible truth assignments was represented by a 300-mer ‘library strand’ consisting of the ordered catenation of one 15-mer value sequence for each variable, i.e., W1 W2 ... W20, where Wi is Xi^T or Xi^F

• To obtain these library strands, the 40 individual 15-mer sequences were assembled using a mix-and-match combinatorial technique
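
As a rough sketch of the arithmetic behind the library, the snippet below builds library strands from value sequences. The 15-mers here are random placeholders generated for illustration, not the published sequences, and the helper names are hypothetical.

```python
import random
from itertools import product

VALUE_LEN = 15  # length of each value sequence, as stated in the text

def make_value_sequences(num_vars, seed=0):
    """Return {k: (true_seq, false_seq)} with distinct random 15-mers (placeholders only)."""
    rng = random.Random(seed)
    used = set()

    def fresh():
        # Draw random 15-mers until one is found that has not been used yet.
        while True:
            s = "".join(rng.choice("ACGT") for _ in range(VALUE_LEN))
            if s not in used:
                used.add(s)
                return s

    return {k: (fresh(), fresh()) for k in range(1, num_vars + 1)}

def library_strand(assignment, seqs):
    """Concatenate W1 W2 ... Wn, where Wk is the true or false value sequence of x_k."""
    return "".join(
        seqs[k][0] if value else seqs[k][1]
        for k, value in enumerate(assignment, start=1)
    )

seqs20 = make_value_sequences(20)
one_strand = library_strand([True] * 20, seqs20)   # 20 x 15 = 300 nucleotides
library_size = 2 ** 20                             # number of distinct library strands

# Enumerating the full library is only practical here for a small number of variables.
seqs3 = make_value_sequences(3)
small_library = {library_strand(a, seqs3) for a in product([True, False], repeat=3)}
```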

3SAT wetware

• A glass ‘library module’ filled with a gel containing the library

• One glass ‘clause module’ for each of the 24 clauses of the formula

• Each clause module was filled with gel containing probes, i.e., 15-mer strands Watson-Crick complementary to the truth assignment that made that particular clause true

Check if the clause modules had catenated 3 strands or how?

Bioalgorithm for 3SAT

• The strands are moved between modules by gel electrophoresis

• The library passes through the first clause module, wherein library strands containing the 3 truth assignments satisfying the first clause are immobilized, while library strands that do not satisfy it go into a buffer reservoir

• The captured strands are released by raising the temperature, and used as input to the 2nd clause module, etc.

• At the end, only the strands representing the truth assignment satisfying all 24 clauses remain

Output to 3SAT

• The output was PCR amplified with primer pairs corresponding to all 4 possible true-false combinations of assignments for the first and last variable, x1 and x20

• None except the primer pair (X1F, WK(X20F)) showed any bands, indicating two truth values of the satisfying assignment, x1 = F and x20 =F

• The process was repeated for all variable pairs (x1, xk), k = 2,..., 19

(2) DNA Benenson Automata [Benenson et al., Nature 2001]

Construct a simple two-state automaton over a two-letter alphabet, using double-stranded DNA molecules and restriction enzymes

Automaton accepts those strings that have an even number of letters b

Main engine of Benenson automata

• FokI enzyme: an unusual restriction enzyme that recognizes a specific sequence and cuts non-specifically a short distance away

• Recognition site

5’-GGATG-3’
3’-CCTAC-5’

• Cleaves 9 bp away on the top strand and 13 bp away on the bottom strand

Encoding the input

• Encoding of the symbol a

• Encoding of the symbol b

• Encoding of the terminator t

Example of encoding the input

• The input strand ab is encoded as a DNA strand that contains the site for FokI, followed by the catenation of the encodings for abt

Encoding state/symbol pairs

The pair S0a is encoded as 5’-GGCT-3’ (the 4-mer suffix of a)

The pair S1a is encoded as 5’-CTGG-3’ (the 4-mer prefix of a)

Meaning: If the 4-mer suffix of the encoded symbol is detected then the symbol is interpreted as being read in state S0

If the 4-mer prefix is detected, then the symbol is interpreted as being read in state S1

S0a is the 4-mer suffix of a, S1a is the 4-mer prefix of a. This method permits a symbol to be interpreted in 2 ways.

Output detection molecules

• S0-D is a 161-mer DNA double strand with an overhang 3’-AGCG-5’ which ‘detects’ the last state of the computation as being S0

• S1-D is a 251-mer DNA double strand with an overhang 3’-ACAG-5’ which ‘detects’ the last state of the computation as being S1

8 possible transition molecules

Each transition molecule has a 4-mer overhang, for example T1 has 3’-CCGA-5’, that can selectively bind to the DNA encoding the current state/symbol pair, in this case S0a

Example computation on input ab

Computation on input ab

• FokI enzyme cuts the input encoding abt exposing the sticky end 5’-GGCT-3’, i.e., S0a

• The transition molecule T1: S0a -> S0 detects this state/symbol pair by binding to it and forming a double-stranded molecule (using ligase)

Note: The transition molecule T1, incorporated in the current molecule, contains a FokI restriction site. Moreover, the 3 bp spacer after the site ensures that the next cleaving will expose a suffix of the next symbol, which will be correctly interpreted as S0b

Computation on input ab, contd.

• The overhang is now 5’-CAGC-3’, i.e., S0b

• The sticky end fits the transition rule T4: S0b -> S1

• The combination of the current strand with T4 and ligase leads to another double strand

• A last use of FokI exposes the overhang 5’-TGTC-3’, which is a suffix of the terminator, interpreted as S1t

Outcome of computation on input ab

• The overhang is complementary to the sticky end 3’-ACAG-5’ of the detector molecule S1-D, corresponding to the last state of the computation being S1.

• The state S1 is not final, and thus the outcome of the computation is that the input ab is not accepted by the automaton

• Note that any two-state two-symbol automaton can be built using this method
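
The logical behaviour of the automaton traced above can be mirrored in a few lines of software. The transitions S0a -> S0 (T1) and S0b -> S1 (T4) come from the walkthrough; the two rows for state S1 are assumptions, chosen so that the automaton accepts exactly the strings with an even number of b's. The sketch models only the state logic, not the FokI/ligase chemistry.

```python
# Transition table of the two-state, two-symbol automaton.
TRANSITIONS = {
    ("S0", "a"): "S0",  # T1 in the text
    ("S0", "b"): "S1",  # T4 in the text
    ("S1", "a"): "S1",  # assumed: reading a does not change the parity of b's
    ("S1", "b"): "S0",  # assumed: a second b restores even parity
}
START = "S0"
FINAL = {"S0"}  # S1 is not final, as stated in the text

def run(word):
    """Return (accepted, trace), where trace lists the state after each symbol."""
    state = START
    trace = [state]
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
        trace.append(state)
    return state in FINAL, trace

accepted_ab, trace_ab = run("ab")
```

On input ab the trace ends in S1, matching the detector molecule S1-D in the wet-lab computation, so the input is rejected.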

Application of Benenson automata

• Medical diagnosis and treatment: smart drugs [Benenson et al., Nature 2004]

• Automaton to identify and analyze the mRNA of disease-related genes associated with lung and prostate cancer, and produce a single-stranded DNA molecule modelled after an anti-cancer drug

(3) DNA Memory

• Information-encoding density

• [Reif et al., DNA7, 2002] DNA has the potential of storing information on the order of 10^12 times more compactly than conventional storage technologies

• [Baum, Science, 1995]: content-addressable DNA memory vastly larger than the brain

Nested Primer Molecular Memory [Yamamoto et al., 2008]

• NPMM = pool of strands wherein each strand encodes both data information and address information

[CLi, BLj, ALk, DATA, ARq, BRr, CRs]

Here i, j, k, q, r, s are between 0 and 15 and each component, e.g., CL0 represents a 20-mer DNA sequence

How to retrieve data

• Use nested PCR consisting of 3 steps

• First PCR uses primer pair (CLi, WK(CRs)), where WK(s) is the Watson-Crick complement of s

• This results in amplification of all molecules starting with CLi and ending in CRs

• Second PCR uses primer pair (BLj, WK(BRr))

• Third PCR uses primer pair (ALk, WK(ARq))

• Sequencing will result in retrieval of the DATA
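
A minimal sketch of the nested retrieval, under the simplifying assumption that each PCR round acts as a perfect filter that keeps only strands matching both primers (real PCR amplifies matching strands rather than deleting the others). The `Strand` type and function names are hypothetical.

```python
from typing import NamedTuple

class Strand(NamedTuple):
    """One memory strand [CLi, BLj, ALk, DATA, ARq, BRr, CRs]; indices range over 0..15."""
    cl: int
    bl: int
    al: int
    data: str
    ar: int
    br: int
    cr: int

def pcr(pool, left_field, right_field, left_index, right_index):
    """Idealized PCR round: keep strands whose two primer sites match the primer pair."""
    return [
        s for s in pool
        if getattr(s, left_field) == left_index and getattr(s, right_field) == right_index
    ]

def retrieve(pool, address):
    """Three nested rounds, working from the outermost address blocks inwards."""
    i, j, k, q, r, s = address
    pool = pcr(pool, "cl", "cr", i, s)  # first PCR:  (CLi, WK(CRs))
    pool = pcr(pool, "bl", "br", j, r)  # second PCR: (BLj, WK(BRr))
    pool = pcr(pool, "al", "ar", k, q)  # third PCR:  (ALk, WK(ARq))
    return [strand.data for strand in pool]

# Six address blocks with 16 choices each give the address space size.
ADDRESS_SPACE = 16 ** 6
```

The value of `ADDRESS_SPACE` is 16,777,216, which is the figure rounded to 16.8 million addresses.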

Advantages of NPMM memory

• Enormous address space: 16.8 million addresses

• High specificity

• Proper selection of DNA sequences avoids mutation during PCR

Organic DNA memory

• [Wong, Wong, Foote, Comm. ACM, 2003]

• [Yachie et al., Biotechnology Progress, 2007]

• Memory technology using living organisms

• First paper proposes a candidate for a living host for DNA memory sequences that tolerates the addition of artificial gene sequences and survives extreme environmental conditions

Organic memory

• Use Escherichia coli and Deinococcus radiodurans (which can survive extreme conditions including cold, dehydration, vacuum, acid and radiation)

• Information encoding stage: an encoding scheme was chosen that assigned 3-mer sequences to various symbols. For example:

AAA = “0”, AAC = “1”, AGG = “A”
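
The triplet encoding can be illustrated with a small encoder and decoder. The codebook below contains only the three codewords quoted above; the full published table is not reproduced here, so any text using other symbols would need the complete table.

```python
# Only the three example codewords from the text; the rest of the table is omitted.
CODEBOOK = {"0": "AAA", "1": "AAC", "A": "AGG"}
DECODEBOOK = {triplet: symbol for symbol, triplet in CODEBOOK.items()}

def encode(text):
    """Map each symbol to its 3-mer and concatenate the results."""
    return "".join(CODEBOOK[ch] for ch in text)

def decode(dna):
    """Split the sequence into consecutive 3-mers and map each back to its symbol."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(DECODEBOOK[dna[i:i + 3]] for i in range(0, len(dna), 3))

encoded = encode("A10")
```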

Information encoding

• Each of the encoding 3-mers contained only 3 of the 4 DNA nucleotides

• Using this encoding, any English text could be codified as a DNA sequence

• The text chosen for this experiment was “And the oceans are wide”

Several additional sequences were chosen to act as sentinels and tag the beginning and end of messages

Choosing sentinel sequences

• Identify a set of twenty-five 20-mer sequences that do not exist in either genome, yet satisfy all the genomic constraints and restrictions

• All sequences contained multiple stop codons (TAA, TGA, TAG) as subsequences, to prevent the memory strands from being misinterpreted and translated into artificial proteins that could kill the bacteria

Inserting the message

• A 46 bp DNA sequence was created, consisting of two different 20 bp sentinels connected by a 6 bp recognition site of an enzyme

• The embedded DNA was then inserted into cloning vectors and transferred into E. coli, allowing the vector to multiply

• The vector and encoded DNA were then incorporated into the genome of Deinococcus for permanent storage and retrieval

Organic DNA memory

Advantages of organic memory

• Message can be retrieved using prior knowledge of sequences at both borders, by PCR, read-out and decoding

• 1 ml of liquid can contain up to 10^9 bacteria

• Potential disadvantages are random mutations, but these are unlikely given the natural cellular mechanisms for detecting and correcting errors

(4) Towards a programmable DNA computer

• [Sakamoto et al., Science, 2000]

• [Hagiya et al., DNA3, 1997]

• A self-acting DNA molecule containing, on the same strand, the input, the program, and the working memory

• Whiplash PCR

Whiplash PCR

• The 5’ end of the DNA single strand contains state transitions A -> B, encoded as DNA rule blocks

WK(B) – WK(A) – stopper sequence

• The 3’ end of the strand contains the encoding of the “current state”, say A

Whiplash PCR transition A -> B

• Step (i): Cooling the solution will lead the 3’ end of the DNA strand, A, to attach to its corresponding rule block, namely WK(A)

• Step (ii): PCR is used to extend the now-attached end A by the encoding of the new state B, and the process is stopped by the stopper sequence

• Step (iii): By raising the temperature, the new current state B is detached, and the new transition cycle can begin
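
The three-step cycle can be abstracted as repeated lookups in a rule table. This sketch works at the level of state names only; it ignores hybridization kinetics and assumes at most one rule block per state, which the slides do not require. The function names are hypothetical.

```python
def whiplash_step(rules, history):
    """One thermal cycle. `history` models the growing 3' end; its last entry is the current state."""
    current = history[-1]
    # Step (i): on cooling, the 3' end anneals to the rule block WK(current), if there is one.
    if current not in rules:
        return False
    # Step (ii): polymerase extends the 3' end by the new state; the stopper halts extension.
    history.append(rules[current])
    # Step (iii): heating detaches the 3' end, which now encodes the new current state.
    return True

def run_whiplash(rules, start, max_cycles=10):
    """Run thermal cycles until no rule applies or max_cycles is reached."""
    history = [start]
    for _ in range(max_cycles):
        if not whiplash_step(rules, history):
            break
    return history

# Rule blocks A -> B and B -> C on the same strand.
states_visited = run_whiplash({"A": "B", "B": "C"}, "A")
```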

Whiplash PCR

(5) DNA nanoscale shapes [Rothemund, Nature, 2006]

• ‘Scaffolded DNA origami’ for fabrication of any 2D shape of roughly 100 nm diameter

• Technique: DNA strands form complex structures by their design, which makes it possible for some single DNA strands to participate in two double helices: they wind along one helix, then switch to another

DNA origami design process

• (1) Build an approximate geometric model of the desired shape; the shape is approximated by cylinders that are models of DNA double helices

• (2) Fill the shape by folding a single long ‘scaffold strand’ back and forth in a raster pattern such that at each point the scaffold strand represents either the main strand or the complement strand of a double helix

DNA origami design process

• (3) Use a computer program to generate a set of ‘staple strands’ that provide Watson-Crick complements of the scaffold

• The staple strands are designed to bind to portions of the scaffold strand, thus holding it together in the desired shape

• The staple strands are fine-tuned to minimize strain and optimize binding specificity and binding energy

Testing DNA origami

• Scaffold = circular genomic DNA, 7,249 nt long, from the virus M13mp18

• Use 250 short staple strands and mix them with the scaffold, in 10-fold excess to it

• The strands annealed in less than two hours and AFM (Atomic Force Microscopy) imaging showed that the desired shape was realized

• Results: Assembly of squares, triangles, five-pointed stars, smiley faces

DNA origami

(6) DNA nanomachines

• Dynamic DNA structures with potential use to nanofabrication, engineering and computation

• DNA-based nanodevices can convert static DNA structures into machines that can move or change conformation

• Examples: tweezers, walkers that can be moved along a track, autonomous molecular motors

Molecular tweezers [Yurke et al., Nature, 2000]

• Two partially double-stranded DNA arms connected by a short single DNA strand acting as a flexible hinge

• The resulting structure is in the shape of a pair of open tweezers

• A ‘set strand’ is designed in such a way as to be complementary to both single-stranded ‘tails’ at the end of the arms

Molecular tweezers

• Adding the ‘set strand’ results in its annealing to both tails of the arms, thus bringing the arms of the tweezers together in a ‘closed’ configuration

• A short region of the set strand remains single-stranded and is used as a toehold that allows a new ‘reset strand’ to strip the set strand from the arms by itself hybridizing with the set strand; the tweezers are thereby returned to the ‘open’ configuration

Molecular tweezers

Molecular walker [Shin, Pierce, JACS, 2004]

• DNA device with two distinguishable feet that walks directionally on a linear DNA track with single strands periodically protruding from it and acting as anchors

• The walker is double-stranded and has two single-stranded extensions acting as ‘legs’

• Specific attachments bind the legs to the single-stranded anchors placed periodically along the double-stranded track

Molecular walker step

• A step requires the sequential addition of two strands: the first lifts the back foot from the track by strand displacement, a process by which an invading DNA single strand can displace one of the constituent strands of a double strand by replacing it with itself, provided the new structure is more stable

• The second strand places the released foot ahead of the stationary foot

Molecular walker

• Molecular walker step

Other molecular walkers

• [Sherman, Seeman, Nanoletters, 2004] – walking devices based on pattern of inchworms – the front foot steps forward and the back foot catches up

• [Sekiguchi et al., DNA13, 2008] Autonomous three-legged walker (no need for fuel strands) that can walk in 2D or 3D on a designed route. It uses an enzyme as a source of power and a track of DNA equipped with many DNA anchors arranged in a specific pattern

(7) DNA Computing: Impact on Theoretical Computer Science

• The genetic code

• Splicing systems

• Optimal encodings for DNA Computing

• Sticker systems

• Watson-Crick automata

• Combinatorics on DNA words

• Cellular computing

• DNA computation by self-assembly

1953: Watson and Crick discover DNA structure

The RNA Tie Club

• 1954 “Solve the riddle of the RNA structure and to understand how it builds proteins” (clockwise from upper left: Francis Crick, L. Orgel, James Watson, Al. Rich)

• There are 20 amino acids that build up proteins

The Diamond Code

• G. Gamow: double-stranded DNA acts as a template for protein synthesis; various combinations of bases could form distinctively shaped cavities into which the side chains of amino acids might fit

Comma-Free Codes (the prettiest wrong idea in 20th-century science)

• The RNA piglet model

The prettiest wrong idea in all of 20th-century science

• Suckling-pig model of protein synthesis

• Construct a code in which, when two sense codons (triplets) are catenated, the subword codons are nonsense codons

• If CGU and AAG are sense codons, then GUA and UAA must be nonsense because they appear in CGUAAG

Comma-free codes (Crick 1957)

• How many words can a comma-free code include?

• For n=4 and k=3 the size of a maximal comma-free code is the magic number 20

• For an alphabet of n letters grouped into k-letter words, if k is prime, the number of words in a comma-free code is at most (n^k - n)/k

• For n=4 and k=3 this bound equals 20, and there are 408 maximal comma-free codes of that size
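
Both the comma-free property and the counting bound are easy to check mechanically. The sketch below assumes all codewords have the same length k; the function names are hypothetical.

```python
from itertools import product

def is_comma_free(code):
    """True if no codeword occurs out of frame inside a concatenation of two codewords."""
    code = set(code)
    if not code:
        return True
    k = len(next(iter(code)))  # common word length (assumed equal for all words)
    for u, v in product(code, repeat=2):
        uv = u + v
        # Offsets 1..k-1 are the out-of-frame reading positions inside uv.
        if any(uv[i:i + k] in code for i in range(1, k)):
            return False
    return True

def comma_free_bound(n, k):
    """Upper bound (n^k - n)/k on the number of codewords, for prime k."""
    return (n ** k - n) // k

bound_for_codons = comma_free_bound(4, 3)
```

The bound arises because a comma-free code can contain at most one word from each class of cyclic shifts and no word that equals one of its own shifts; for prime k there are (n^k - n)/k such classes.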

Reality Intrudes

• News from the lab bench: [Nirenberg, Matthaei ’61] synthesize RNA, namely poly-U, coding for phenylalanine

• By 1965 the genetic code was solved

• The code resembled none of the theoretical notions

• The “extra” codons are merely redundant

The Genetic Code

Splicing Systems (Head 1987)

5’ CCCCCTCGACCCCC 3’
3’ GGGGGAGCTGGGGG 5’
+
5’ AAAAAGCGCAAAAA 3’
3’ TTTTTCGCGTTTTT 5’
+
Enzyme 1: 5’ TCGA 3’ / 3’ AGCT 5’
Enzyme 2: 5’ GCGC 3’ / 3’ CGCG 5’

Splicing Systems

5’ CCCCCT CGACCCCC 3’
3’ GGGGGAGC TGGGGG 5’
+
5’ AAAAAG CGCAAAAA 3’
3’ TTTTTCGC GTTTTT 5’

DNA strands with compatible sticky ends recombine to produce two new strands
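
At the level of strings, the recombination shown above corresponds to a splicing rule (u1, u2; u3, u4): from x·u1·u2·y and z·u3·u4·w it produces x·u1·u4·w and z·u3·u2·y. The sketch below works on the top (5’ to 3’) strands only and cuts at the first occurrence of each site; both simplifications, and the function name, are assumptions for illustration.

```python
def splice(first, second, rule):
    """Apply the splicing rule (u1, u2, u3, u4) to two words, or return None if a site is missing."""
    u1, u2, u3, u4 = rule
    i = first.find(u1 + u2)
    j = second.find(u3 + u4)
    if i < 0 or j < 0:
        return None
    cut_first = i + len(u1)    # cut between u1 and u2
    cut_second = j + len(u3)   # cut between u3 and u4
    # Swap the right-hand parts of the two cut words.
    return (
        first[:cut_first] + second[cut_second:],
        second[:cut_second] + first[cut_first:],
    )

# Enzyme 1 cuts T^CGA and Enzyme 2 cuts G^CGC, both leaving a CG overhang.
rule = ("T", "CGA", "G", "CGC")
new_strands = splice("CCCCCTCGACCCCC", "AAAAAGCGCAAAAA", rule)
```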

Splicing operation

Splicing system sample results

Theorem (Paun ’95; Freund, Kari, Paun ’99) Every type-0 language can be generated by a splicing system with finitely many axioms and finitely many rules.

Theorem (Freund, Kari, Paun ’99) For every given alphabet T there exists a splicing system, with finitely many axioms and finitely many rules, that is universal for the class of systems with terminal alphabet T.

From DNA to TCS

• The genetic code

• Splicing systems

• Optimal encodings for DNA Computing

• Sticker systems

• Watson-Crick automata

• Combinatorics on DNA words

• Cellular computing

• DNA computation by self-assembly

Encoding Information for DNA Computing

• DNA strands should form desired bonds

• DNA strands should be free of undesirable intra-molecular bonds

• DNA strands should be free of undesirable inter-molecular bonds

Intramolecular Bonds

[Figure: a DNA strand folding back on itself through intramolecular bonds]

Intra- and inter-molecular bonds

DNA-complementarity model (Kari,Kitto,Thierrin’02)

[Figure: panels (a) to (d) showing bonded configurations of DNA strands, with 5’ and 3’ ends marked]

Bond-free languages

Bonds between DNA strands

Sample Results (Hussini/Kari/Konstantinidis/Losseva/Sosik ‘03)

Sticker Systems (Freund, Paun, Rozenberg, Salomaa ’98; Kari, Paun, Rozenberg, Salomaa, Yu ’98; Hoogeboom, van Vugt ’00; Kuske, Weigel ’04; Paun, Rozenberg ’98)

Given a complementarity relation, define an alphabet of double-stranded columns

Sticking operation

Complex Sticker Systems

• Sakakibara,Kobayashi ‘01: Sticker systems based on hairpins

• Alhazov,Cavaliere ’05: Observable sticker systems

Watson-Crick Automata (Freund, Paun, Rozenberg, Salomaa ’99; Paun, Rozenberg ’98; Martin-Vide, Paun, Rozenberg, Salomaa ’98; Czeizler, Czeizler ’06; Paun, Paun ’99; Czeizler, Czeizler, Kari, Salomaa ’08)

Combinatorics on DNA Words

• IDEA: Consider the word w and its WK-complement, WK(w), as equivalent

• The word ACTG CAGT CAGT can be considered repetitive (periodic) because it can be written as ACTG WK(ACTG)^2

• Generalize classical notions such as power of a word, border, primitive word, palindrome, conjugacy, commutativity

Identity => Antimorphic involution f

Pseudo-palindrome (de Luca,De Luca’06, Kari,Mahalingam’09) u = f(u)

Pseudo-commutativity(Kari,Mahalingam’08) u v = f(v) u

Pseudo-bordered word (Kari, Mahalingam ’07) w = v x = y f(v)

Pseudoknot-bordered word (Kari,Seki’09) w = u v x = y f(u) f(v)

Pseudo-conjugacy of u, v (Kari,Mahalingam’08) u x = f(x) v
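
The Watson-Crick complement is the motivating example of an antimorphic involution f: it satisfies f(f(u)) = u and f(uv) = f(v)f(u). A small sketch of the pseudo-palindrome and pseudo-power notions with f = WK follows; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wk(word):
    """Watson-Crick complement: complement every base and reverse the word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))

def is_pseudo_palindrome(u, f=wk):
    """u is a pseudo-palindrome when u = f(u)."""
    return u == f(u)

def is_pseudo_power_of(word, u, f=wk):
    """True if word splits into blocks of length |u|, each equal to u or f(u)."""
    n = len(u)
    if n == 0 or len(word) % n != 0:
        return False
    blocks = [word[i:i + n] for i in range(0, len(word), n)]
    return all(block in (u, f(u)) for block in blocks)

periodic_example = "ACTG" + wk("ACTG") * 2
```

Here `periodic_example` is the word ACTG CAGT CAGT used earlier to illustrate periodicity up to Watson-Crick complementarity.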

Fine and Wilf Theorem

Extended Fine and Wilf Theorem

Extended Fine and Wilf Theorem

Lyndon-Schutzenberger Equation

Lyndon & Schutzenberger 1962, Lothaire 1983, Harju Nowotka 2004

Extended Lyndon-Schutzenberger

Extended Lyndon-Schutzenberger

DNA Computing: A research snapshot

• Adleman’s 20-variable 3-SAT experiment

• DNA Benenson automata

• DNA memory

• Towards a programmable DNA computer

• DNA nanoscale shapes

• DNA nanomachines

• Impact on theoretical computer science

Our Challenge

• Discover a new, broader notion of computation

• Understand the world around us in terms of information processing

• “Biology and Computer Science – life and computation – are related. I am confident that at their interface great discoveries await those who seek them.” (Adleman ’98)
