From the Ising Model to Biological Sequence Analysis

Ralf Bundschuh

Ohio State University

May 6, 2008

Ralf Bundschuh (Ohio State University) Biological sequence analysis May 6, 2008


Biological sequences Sequence data

DNA sequences

A piece of human chromosome 21

...TCTACATGTAAAAATATGTATTTTTAAAAATTGGATGTCATGGGCTGGGTGTGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCTCCTGAGGTCGGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCCGTCTCTACTAAAAACACAAAATTATCCAGGCATGGTGGCACATGACTGTAATCCCAGCTACTAGGGAGGCTGAGGCAGGAGAAACACTTGAACCTGGGAGGCGGAGGTTGAGGTGAGCCGAGATCGCGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCGGTCTCAAAAAAAAAAAATTTGATGTTATGGAAGTAGGGAGACAAAAAATGCTCTACAACTATTAACTGATGCTTTTCTGGTTTTGTTCTCCAGACACCATTCGCTTTTCACCCAAGATGATTTGATGTCTTATAAAACTCTGATGAACCATGATGGCTACACAGACATTAAGTATAGACAGCTATCAAGATGGGCAACAGGTGAGCTTGAACTTGATTCTGCATTCTAATTACAAATCAACCTGGCACTCAAGCATGAACATTGCTTTGTATACTTGCAATTCAATTGCCATGAGGTTGCATGCTCAGTGTTAGTGTATTATGCATTTATTGTACATTCGTGTTCAGAAAAAAAGCCATAGAATAATACTATTTCGTTAACTGATACCAAGATTGCCAGGAATCTTGACTTCCCTAAGTCATATGACAGTTTCTTGGGAATTTACCTTTTTAATGTCAGTGTTAATTAGCACTGTTACTTTGAAAGAAAACCCGGTTGATTTTCATGATGACAGATTCCCATGTTGACTGGTGGCTCTTCTGAGTGTCTAACTGGATCAGCTTTTGAATGGGAATCTTGTAGCCTCGTCTCCCCAGTTGTAGGCATGAGAGGGGCTGTCCCAGTAATGAATTTGCAGGGGCCCCAGTGCTCTATCTTTGTACCTTGCTCGTGCTTGGATGGTTGTGCCATACACGGGCAGCTCTCCATTGCCCTCCCACCATAGATGAGACTTTGTTCTCCTGGAAGCTGTGGTGTTTTGTGCTTTTGAGTATCTGAGTGTTTTGTGTTCTGTGACCTGAATGAATTGAGGAGCAGGTGGATCGAGACTTGGCTGAGGCCCTTGTGGTCTTTCTTGGCTTGCGATCTTGTTAAACACGGTGTTCTGAACCCACTGGCATTTGGCTCATCATCCCACTGACTCTGGAGCCAGTGAAGGGATTTGGCCCTGCCCTTTACTTTCCTGCCCAGCAGGCAGGGGCAGTGCAGTACACCCCCCTCGGCTCTCCTCCCCACCTCGAGGACTTCGGTGGCAAGGATCAGGCTCCGGAAAACTCACTGGAGCCATGCTGGTGAGGTCTGAGGAGGGGTTAGGAGCTGAGGCGCTGGGTCCCCCTTTCCCTGGTGGTTAGTTTTACCAACCAGTCCTTGTTCAGTTCCTGTGGCAGAGATTTTTGTTGTTGGTGGTGGTGGTGTTAGTGTTTTTTTTTCTCTGTATAGCAATTAAAGGAGGGAGATTCTGTGATGTAGTCAGCCTGCTTCCTTAGCCTAGAAGTCCTTAGTCCTTTGGTATTTCCAATTGACTTTTTTTTTTTTTTCTAAAATGCAAATCTAATAATGTCCCGCCTGAGCTCTCCAGTGGCTCCCTGTGGATTCCTGTGGGTTTCATGCAAAGACTGAGCTGCTCTGTGGCCCGAATCGTCTGGCCCCTCTGGACCCCAGGACGCCCCCAACATCTCTGCCTGGCATATCTTGGGCACCTCTCTGCCTGCCCTGGAGCACCGGCCTCACTGTTCCCATCACTCTTCTCCCTCCTGCCTGCCAGGTCTTTGCTCCGACCCCACTGCTGCCTCCTGTGCACCAAGGCACGGTGACCACCTCCAACACAGCCTGGTTGCTACCAGCCACCTCCTCCCAGGCAGCTGTGCCAGGTGCAGATGACACCTGGAGCACTGCCCTT
TTCATACCCGAGTGTTTCCAAGGGCCTCGGAAGTGTTTAATCAGCATTATTTTAAATAAACATTGAAATATATCTACAGCGTAGACCTATCATAATTATTTTGCCATTTTTCCAAGGTTGAACATTTAGGTTTCTCTCTTTTCACAATCATTTTTTTTTCAAATACTGAAATGAATCTTTTAAGGCTTCTTATTTTTTATTATTTATTTATTTACTTATTTATTTATTTTGAGACAGAGTCTTGCTCTGTTGCCCCGGCTGGAGTGCAGTGGTGTGATCTCAGCTCACTGCAACCTCCGTCTCCCAGGTTCAAGCAATTCTCCTGTCTCAGCCTCCTGAGTACCTGGGATTACAGGTGTGTGCCACCACGCCCAGCTAATTTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGATCTTGAACCCCTGACCTCAGGTGATCCGCCCACCTTGGCCTCCCAAAGTGCTGGGATCACAGGCATAAGCCACTGTGCCTGGCCTTTTTAAGGCTTTTTATACATGCTGGCAGATTGCCTTACACAAATACTGTGTCCACTTAGGCTTTATTGCTTTTATTTTTTTTTTTTAAGAGAAACATAAACAGTTTTCCTAATATGTTGTACCATTTAAAGGCAGCAGAATAGAAGTCATCTTATTGCAAAAACAAGACATTGGAGGGAAGAGAGCACAGGGCTGGAGGATGTGAGAGGCGTCCTGTGCGGGTGGGCGTTCATGGCTGGCCCCCAGTCTGTCTGGACAGTGGGGATGGCCCCGCTCCCATGAGGTCTCCCCGCCCCCGCTGCCCCAAGCTGCTTCCTCAAGGGGCAGAAGCATGGCCAAATCCACCGCGGGAGAAATGGCCCGTCCTGGTCCTGAGGAAGCTGAGGTCAGGACAGTCTAATCTGCTGCTCATGGATAACTAGAAGTTTACTTTCACGAAATTTTGTTTTTGTAAACTGATTTTTTTTAACGATTTAAATGTTTTTTACCTAAATGACAAAGGCATTGCTTGTTTAAAGCAGTTTAAATGATAGTATCTTTTAAGGCTTTAAGTAAACACAGCTGGCCTTTTCCTTTCTGAATGCAGTGACATTTTTATGGCTATGTATTGCTGAGGTTTGAGGGTAGATATGGGAGAAGTTCAACCTTGTCCCAAATATGTAGCGTATGGGTTAGGTTGTGTCTGTGACATGGTAAGAAGACCTTGGACTATTT...


Amount of data

GenBank

Central sequence repository

Exponential growth

Currently 200 billion bases

Three human genomes (soon 1000)

Cow, dog, cat, mouse, rat, guinea pig, gorilla, chimpanzee, macaque, ...

678 complete microbial genomes

What does it all mean? ⇒ Biological Sequence Analysis


Biological sequences From sequence to function

Central dogma

Nature builds organisms from those sequences

Problem

One-dimensional information has to encode three-dimensional structure

Solution — the central dogma

DNA → RNA → protein → structure → function

The first step: DNA → RNA (transcription)

[Figure: transcription and splicing. Labels: promoter, start, stop, RNA polymerase, pre-mRNA (exons and introns), splicing, mRNA.]


Translation

The second step: RNA → protein (translation)

mRNA: polymer with 4 different monomers A, C, G, U

protein: polymer with 20 different monomers

Three bases code for one amino acid (genetic code)

UUU F   UCU S   UAU Y   UGU C   AUU I   ACU T   AAU N   AGU S
UUC F   UCC S   UAC Y   UGC C   AUC I   ACC T   AAC N   AGC S
UUA L   UCA S   UAA *   UGA *   AUA I   ACA T   AAA K   AGA R
UUG L   UCG S   UAG *   UGG W   AUG M   ACG T   AAG K   AGG R
CUU L   CCU P   CAU H   CGU R   GUU V   GCU A   GAU D   GGU G
CUC L   CCC P   CAC H   CGC R   GUC V   GCC A   GAC D   GGC G
CUA L   CCA P   CAA Q   CGA R   GUA V   GCA A   GAA E   GGA G
CUG L   CCG P   CAG Q   CGG R   GUG V   GCG A   GAG E   GGG G

Example: myoglobin

AUGGGGCUCAGCGACGGGGAAUGGCAGCUGGUGCUGAACGUCUGGGGGAAGGUGGAGGCUGAUGUCGCAGGCCAUGGGCAGGAGGUCCUCAUCAGCUCUUUAAGGGUCACCCCGAGACCCUGGAGAAAUUUGACAAGUUUAAGCACCUGAAGUCAGAGGAUGAGAUGAAGGCCUCUGAGGACCUGAAGAAGCACGGCAACACGGUGCUGACUG...

→ MGLSDGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
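The codon lookup on this slide can be applied mechanically. The following is a minimal Python sketch, not part of the original slides: the names `CODON_TABLE` and `translate` are chosen here for illustration, and the table is built from a compact string that encodes the same standard genetic code as the slide's table (bases ordered U, C, A, G).

```python
# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid ("*" = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, ending at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons of the mRNA shown on the slide.
print(translate("AUGGGGCUCAGCGACGGGGAAUGGCAGCUG"))  # MGLSDGEWQL
```

Building the table from a 64-character string keeps the sketch short; a literal dictionary with one entry per codon would work equally well.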


Folding

The third step: protein → structure (folding)

Different amino acids have different physical and chemical properties

Name           Abbr.   Charge  Hydrophob.
Alanine        Ala A   o       +
Arginine       Arg R   +       -
Asparagine     Asn N   o       -
Aspartic acid  Asp D   -       -
Cysteine       Cys C   -       +
Glutamine      Gln Q   o       -
Glutamic acid  Glu E   -       -
Glycine        Gly G   o       +
Histidine      His H   +       -
Isoleucine     Ile I   o       +
Leucine        Leu L   o       +
Lysine         Lys K   +       -
Methionine     Met M   o       +
Phenylalanine  Phe F   o       +
Proline        Pro P   o       +
Serine         Ser S   o       -
Threonine      Thr T   o       -
Tryptophan     Trp W   o       +
Tyrosine       Tyr Y   o       +
Valine         Val V   o       +

MGLSDGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

Interactions among amino acids force folding into some structure


Function

The fourth step: structure → function

Proteins

bind other proteins

bind to small molecules

perform mechanical functions

. . .

Summary

DNA → RNA → protein → structure → function


RNA editing What is RNA editing?

RNA editing

Central dogma

RNA is an exact copy of the genomic DNA

RNA editing

RNA gets edited before it is translated:

substitution

insertion

deletion

of bases


Physarum polycephalum

Example

Mitochondrion of Physarum polycephalum

most prevalent editing event: C insertion

e.g., a piece of nad7:

DNA      ...CAGAATTGCGATCCACATAT GGGCTTCTACAT GAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCT  TCCTTATTTTGAT...
mRNA     ...CAGAAUUGCGAUCCACAUAUCGGGCUUCUACAUCGAGGUACUGAAAAACUUAUAGAACAUAAGAAUUUCUUACAAUCUCUUCCUUAUUUUGAU...
protein  ... Q  N  C  D  P  H  I  G  L  L  H  R  G  T  E  K  L  I  E  H  K  N  F  L  Q  S  L  P  Y  F  D ...

other editing events: U insertion, dinucleotide insertions, C→U conversion

Editing is frequent: one insertion per 25 bases on average

Editing is reliable: every site is always edited


Questions

How does it work?

Where does it edit?

How does it know where to edit?

What machinery performs the editing?

Why does it edit?


RNA editing Situation in Physarum polycephalum

Background

Situation in Physarum polycephalum

Mitochondrial genome fully sequenced (≈ 63,000 bases; Takano et al., 2001)

Six protein-coding genes with experimentally determined editing sites in GenBank

Handful of genes identified but editing sites not known

Several unidentified open reading frames

Four typical mitochondrial genes apparently missing

Compare to Dictyostelium discoideum: 44 genes known


Motivation

Problem

Experimental determination of editing sites laborious

Glimmer of hope

Experimental verification of editing sites relatively easy

Solution

computational prediction ⇒ PIE = Predictor of Insertional Editing


RNA editing Computational approach

Idea behind PIE

DNA sequence

Know genomic sequence (without editing sites)

...CAGAATTGCGATCCACATATGGGCTTCTACATGAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCTTCCTTATTTTGATGTCTTGAT...

Protein sequences

Know many protein sequences from related organisms

Neisseria meningitidis    ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster   ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.         ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola       ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus  ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli          ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum     ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...

Approach

Compare the two


Protein information

Protein sequence preprocessing

Pick a gene whose editing sites are to be predicted, e.g., nad7

Pick the protein for this gene from another species, e.g., Neisseria meningitidis

Use PSI-BLAST to pull all related protein sequences out of GenBank → 510 sequences for nad7

Create multiple alignment

Neisseria meningitidis    ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster   ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.         ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola       ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus  ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli          ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum     ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...
...


Protein family model

Probability model

Extract probabilities $p_i(a)$ to find amino acid $a$ at position $i$

Neisseria meningitidis    ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster   ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.         ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola       ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus  ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli          ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum     ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...
...
                                    |           |
                                    42          54

i \ a  A      R      N      D      C      Q      E      G      H      I      L      K      M      F      P      S      T      W      Y      V
...
42     0.05   0.01   0.02   0.02   0.005  0.01   0.02   0.68   0.007  0.009  0.02   0.02   0.006  0.008  0.02   0.04   0.02   0.004  0.007  0.01
...
54     0.07   0.09   0.14   0.05   0.005  0.04   0.04   0.04   0.07   0.02   0.03   0.03   0.009  0.03   0.02   0.09   0.05   0.007  0.15   0.02
...

⇒ Probabilistic model of the whole protein family.
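The slides do not say how the probabilities are estimated from the alignment. As one plausible reading, the sketch below uses column-wise relative frequencies with a uniform pseudocount and ignores gap characters; the pseudocount, the function name `column_probabilities`, and the 0-based column indexing are assumptions made here for illustration. With only the seven sequences shown (instead of the 510 used in the talk), the numbers will not match the table.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def column_probabilities(alignment, pseudocount=1.0):
    """Return one dict per alignment column: amino acid -> estimated probability.

    Assumption: relative frequencies with a uniform pseudocount; gaps ('-') are skipped.
    """
    length = len(alignment[0])
    assert all(len(row) == length for row in alignment), "rows must be aligned"
    profile = []
    for i in range(length):
        counts = Counter(row[i] for row in alignment if row[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return profile


# The seven aligned fragments shown on the slide.
ALIGNMENT = [
    "VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD",
    "MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD",
    "VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD",
    "VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE",
    "VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD",
    "IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC",
    "IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC",
]

profile = column_probabilities(ALIGNMENT)
# Column 7 of the fragment: G in six of the seven sequences.
print(round(profile[7]["G"], 3))
```

The pseudocount keeps every probability strictly positive, so a single unseen amino acid cannot drive the product over positions to zero.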


Prediction method

Editing site prediction

Start with genomic sequence

...CAGAATTGCGATCCACATATGGGCTTCTACATGAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCTTCCTTATTTTGATG...

Arbitrarily insert C’s and translate

...CAGAATTGCGACTCCACATATGGGCTTCTACATGACGGTACTGAAAAACTTATCAGAACATACAGAATTTCTCTACAATCTTCCTTATTTTGCATG...

Q N C D S T Y G L L H D G T E K L I R T Y R I S L Q S S L F C M

Calculate probability

$p(\ldots\mathrm{QNCDSTYG}\ldots) = \ldots\, p_{35}(\mathrm{Q})\, p_{36}(\mathrm{N})\, p_{37}(\mathrm{C})\, p_{38}(\mathrm{D})\, p_{39}(\mathrm{S})\, p_{40}(\mathrm{T})\, p_{41}(\mathrm{Y})\, p_{42}(\mathrm{G}) \ldots$

Redo for all possibilities of inserting C’s

Pick insertion pattern with highest probability → prediction of editing sites
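The procedure on this slide can be tried by brute force on a very short sequence. The sketch below is an illustration under stated assumptions, not the PIE implementation: the four-position `TOY_PROFILE` is hypothetical, amino acids missing from it get a small floor probability, and only insertion patterns whose edited length matches the profile are compared. The DNA fragment `CATATGGGCTT` is taken from the nad7 piece shown earlier.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def insert_cs(dna, pattern):
    """Insert a C after base i wherever pattern[i] is 1."""
    return "".join(base + ("C" if flag else "") for base, flag in zip(dna, pattern))


def translate(dna):
    """Translate complete codons; stop codons appear as '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def log_probability(protein, profile, floor=1e-3):
    """Sum of log p_i(a); amino acids absent from the toy profile get `floor`."""
    return sum(math.log(profile[i].get(a, floor)) for i, a in enumerate(protein))


def predict_editing_sites(dna, profile):
    """Try every insertion pattern and return the sites of the best-scoring one."""
    best = None
    for pattern in product((0, 1), repeat=len(dna)):
        if len(dna) + sum(pattern) != 3 * len(profile):
            continue  # assumption: only compare proteins of the profile's length
        score = log_probability(translate(insert_cs(dna, pattern)), profile)
        if best is None or score > best[0]:
            best = (score, pattern)
    return [i for i, flag in enumerate(best[1]) if flag]


TOY_PROFILE = [{"H": 0.8}, {"I": 0.7}, {"G": 0.68}, {"L": 0.6}]  # hypothetical values
sites = predict_editing_sites("CATATGGGCTT", TOY_PROFILE)
print(sites)  # [4]: a C is inserted after the fifth base, giving CAT ATC GGG CTT
```

The loop visits all 2^11 = 2048 patterns for this 11-base fragment, which is exactly the exponential cost that makes the approach infeasible for whole genes.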


Computational challenge

Challenge

After each base of a sequence of length $N$ a C can be inserted or not ⇒ Need to find the highest probability among $2^N$ possible patterns.

Typical gene: $N \approx 1000$ ⇒ $10^{300}$ patterns
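The order of magnitude quoted here can be checked directly, since Python integers have arbitrary precision. This is only a quick arithmetic check; the variable names are chosen for illustration.

```python
import math

N = 1000                 # typical gene length quoted on the slide
patterns = 2 ** N        # one insert-or-not choice after every base

print(len(str(patterns)))           # 302 decimal digits
print(round(N * math.log10(2), 2))  # 301.03, i.e. 2**1000 is about 10**301
```

So the exact count is about 10^301, consistent with the slide's rounded 10^300.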

Solution

Statistical Physics methods


Ising model Ordered Ising model

Ising model

Spins

N spins

Two states each: ↑ (up), ↓ (down)

Described by variables $s_i$ with $s_i = +1$ for up and $s_i = -1$ for down

$2^N$ total states

Interactions

Spins want to align:

Energy $-J s_i s_j$ with $J > 0$

1D Ising model

All spins on a one-dimensional lattice: ↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↓ ↓ ↓ ↑ ↑ . . .

Only nearest neighbor interaction: $E = -J \sum_{i=2}^{N} s_{i-1} s_i$
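As a small illustration of the nearest-neighbor energy, the following sketch evaluates it for an open chain of +1/-1 spins. The function name `ising_energy` and the seven-spin examples are chosen here; they are not from the slides.

```python
def ising_energy(spins, J=1.0):
    """E = -J * sum over neighboring pairs s_{i-1} * s_i for an open chain."""
    return -J * sum(a * b for a, b in zip(spins, spins[1:]))


up, down = +1, -1
print(ising_energy([up] * 7))               # -6.0: all six bonds satisfied
print(ising_energy([up, down] * 3 + [up]))  # 6.0: all six bonds violated for J > 0
```

Flipping every spin leaves each product unchanged, which is why states always come in pairs of equal energy.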


Ground state

↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓ ↑ ↑

$E = -J \sum_{i=2}^{N} s_{i-1} s_i$

Ferromagnetism

At zero temperature system finds ground state (lowest energy state)

Two ground states: ↑ ↑ ↑ ↑ ↑ ↑ ↑ and ↓ ↓ ↓ ↓ ↓ ↓ ↓

Model of ferromagnetism

Antiferromagnetism

What happens if J < 0?

Still two ground states: ↑ ↓ ↑ ↓ ↑ ↓ ↑ and ↓ ↑ ↓ ↑ ↓ ↑ ↓


Ising model Disordered Ising model

Disorder

↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓ ↑ ↑

Disordered Ising model

What happens if J depends on position i?

$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i$

$J_i$ random variables (say Gaussian with mean zero)

Some $J_i > 0$ (·), some $J_i < 0$ (·) (dot colors on the slide distinguish the two signs)

Disorder: ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · . . .

Ground states:

↑ · ↑ · ↑ · ↓ · ↓ · ↑ · ↓ · ↑ · ↑ · ↓ · ↓ · ↓ · ↓ · ↑ · . . .

and

↓ · ↓ · ↓ · ↑ · ↑ · ↓ · ↑ · ↓ · ↓ · ↑ · ↑ · ↑ · ↑ · ↓ · . . .
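With nearest-neighbor couplings only, every bond on an open chain can be satisfied independently: fix the first spin and set each following spin equal to its predecessor if $J_i > 0$ and opposite if $J_i < 0$, which gives the energy $-\sum_i |J_i|$. The sketch below illustrates this; the function names and the coupling values are hypothetical, and couplings are assumed to be nonzero.

```python
def chain_ground_state(couplings, first_spin=+1):
    """Ground state of E = -sum_i J_i s_{i-1} s_i on an open chain.

    couplings[k] is the bond between spin k and spin k+1 (0-based).
    Assumes no coupling is exactly zero.
    """
    spins = [first_spin]
    for J in couplings:
        # J > 0: copy the previous spin; J < 0: flip it.
        spins.append(spins[-1] if J > 0 else -spins[-1])
    return spins


def energy(spins, couplings):
    """Energy of a spin configuration for the given bond strengths."""
    return -sum(J * a * b for J, a, b in zip(couplings, spins, spins[1:]))


J = [0.7, 1.2, -0.4, 0.9, -1.5]  # hypothetical bond strengths
s = chain_ground_state(J)
print(s)  # [1, 1, 1, -1, -1, 1]
print(energy(s, J))
```

Starting from `first_spin=-1` yields the second ground state, the global flip of the first, matching the pair shown on the slide.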


Ising model Next-nearest neighbor interactions

Next-nearest neighbor interactions

More interactions

$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i - \sum_{i=3}^{N} K_i s_{i-2} s_i$

⇒ frustration

⇒ ground state depends on actual values of $J_i$ and $K_i$

Question

How to find the ground state and its energy?

Answer

Transfer matrix approach


Transfer matrix

$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i - \sum_{i=3}^{N} K_i s_{i-2} s_i$

Definition

$E_n(s', s)$: ground state energy of $s_1 \ldots s_n$ with $s_{n-1} = s'$ and $s_n = s$

Recursion

$s_{n-2}$ is either $+1$ or $-1$

⇒ $E_n(s', s) = \min \left\{ \begin{array}{l} -J_n s' s - K_n (+1) s + E_{n-1}(+1, s') \\ -J_n s' s - K_n (-1) s + E_{n-1}(-1, s') \end{array} \right.$

Boundary condition

$E_2(s', s) = -J_2 s' s$

⇒ can calculate ground state energy in $N$ steps
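The recursion translates almost line by line into a short dynamic program. The sketch below is an illustration, not code from the talk: the function names are chosen here, the couplings are hypothetical, and `J` and `K` are passed as dictionaries keyed by the 1-based site index so that they match the formulas. A brute-force enumeration is included only to check the result on a small chain.

```python
from itertools import product

SPINS = (+1, -1)


def ground_state_energy(J, K):
    """Minimum of E = -sum_{i=2}^N J[i] s[i-1] s[i] - sum_{i=3}^N K[i] s[i-2] s[i].

    J has keys 2..N and K has keys 3..N (1-based site indices).
    """
    N = max(J)
    # Boundary condition: E_2(s', s) = -J_2 s' s
    E = {(sp, s): -J[2] * sp * s for sp in SPINS for s in SPINS}
    for n in range(3, N + 1):
        # Recursion: minimize over the spin s'' at site n-2.
        E = {
            (sp, s): min(-J[n] * sp * s - K[n] * spp * s + E[(spp, sp)] for spp in SPINS)
            for sp in SPINS
            for s in SPINS
        }
    return min(E.values())


def brute_force_energy(J, K):
    """Reference: enumerate all 2^N configurations (feasible only for small N)."""
    N = max(J)
    best = float("inf")
    for config in product(SPINS, repeat=N):
        s = dict(enumerate(config, start=1))
        e = -sum(J[i] * s[i - 1] * s[i] for i in range(2, N + 1))
        e -= sum(K[i] * s[i - 2] * s[i] for i in range(3, N + 1))
        best = min(best, e)
    return best


J = {2: 1.0, 3: -0.5, 4: 0.8, 5: -1.2, 6: 0.3}  # hypothetical couplings
K = {3: -0.7, 4: 0.4, 5: 0.9, 6: -0.6}
print(ground_state_energy(J, K), brute_force_energy(J, K))
```

Each step updates only four numbers, one per pair $(s', s)$, so the cost grows linearly with $N$ instead of as $2^N$. Recovering the ground state itself, not just its energy, would additionally require storing which choice of $s_{n-2}$ achieved each minimum.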


Computational prediction of RNA editing

Outline


Computational prediction of RNA editing Computational method

Analogy to RNA editing

Ising model                   RNA editing

spin                          presence or absence of editing site

$2^N$ states                  $2^N$ states

energy                        -log(probability)

sum of local contributions    sum of local contributions
(neighbor interactions)       (amino acid probabilities)

ground state                  most plausible editing sites
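The correspondence between energy and -log(probability) can be spelled out as follows, assuming the probability of a configuration factorizes into local terms $p_k$ (a generic symbol introduced here for illustration):

```latex
P(\text{configuration}) = \prod_k p_k
\quad\Longrightarrow\quad
E \equiv -\log P = \sum_k \left(-\log p_k\right)
```

Because $-\log$ is monotonically decreasing, the configuration with minimal $E$ is the one with maximal $P$, which is why the ground state corresponds to the most plausible set of editing sites.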


Computational prediction of RNA editing Computational method

PIE recursion

Setup

Genomic sequence $b_1 \ldots b_N$; protein model: $p_i(a)$ for $i = 1, \ldots, M$

Auxiliary quantity

$E_{i,j}$ is the negative logarithm of the probability of the most probable editing configuration ending at model position $i$ and genomic position $j$

Without editing

$$E_{i,j} = -\log p_i(\mathrm{aa}[b_{j-2}, b_{j-1}, b_j]) + E_{i-1,j-3}$$

With editing

$$E_{i,j} = \min\left\{\begin{array}{l} -\log p_i(\mathrm{aa}[b_{j-2}, b_{j-1}, b_j]) + E_{i-1,j-3} \\ -\log p_i(\mathrm{aa}[C, b_{j-1}, b_j]) + E_{i-1,j-2} \\ -\log p_i(\mathrm{aa}[b_{j-1}, C, b_j]) + E_{i-1,j-2} \\ -\log p_i(\mathrm{aa}[b_{j-1}, b_j, C]) + E_{i-1,j-2} \end{array}\right.$$
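A minimal Python sketch of the recursion with editing follows. Several choices are assumptions not stated on the slide: the standard genetic code is used for `aa[...]` (the organism's mitochondrial code may differ), the protein model is a list of dictionaries mapping one-letter amino acids to probabilities, the alignment must start at the first base and end at the last, and the optional `edit_penalty` (default 0) is added per inserted C. All names are hypothetical.

```python
import math

# Standard genetic code in TCAG order (an assumption; see the lead-in).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def neg_log(p):
    """-log p, with probability zero mapped to infinite energy."""
    return -math.log(p) if p > 0 else math.inf

def pie_energy(genome, model, edit_penalty=0.0):
    """E_{M,N}: energy of the best editing configuration for the whole piece.

    genome: string over ACGT (b_1..b_N); model: list of dicts, model[i-1][a] = p_i(a).
    """
    N, M = len(genome), len(model)
    E = [[math.inf] * (N + 1) for _ in range(M + 1)]
    E[0][0] = 0.0  # nothing consumed yet
    for i in range(1, M + 1):
        p = model[i - 1]
        for j in range(2, N + 1):
            b1, b0 = genome[j - 2], genome[j - 1]  # b_{j-1}, b_j in slide notation
            candidates = []
            if j >= 3:  # unedited codon uses three genomic bases
                codon = genome[j - 3] + b1 + b0
                candidates.append(neg_log(p.get(CODON_TABLE[codon], 0.0)) + E[i - 1][j - 3])
            # One inserted C at each of the three codon positions, two genomic bases used.
            for codon in ("C" + b1 + b0, b1 + "C" + b0, b1 + b0 + "C"):
                candidates.append(
                    neg_log(p.get(CODON_TABLE[codon], 0.0)) + edit_penalty + E[i - 1][j - 2]
                )
            E[i][j] = min(candidates)
    return E[M][N]

# "ATG" codes for M; the remaining "GA" needs one inserted C (G C A) to code for A.
print(pie_energy("ATGGA", [{"M": 1.0}, {"A": 1.0}]))
```

As with the Ising chain, the table is filled once, so the cost is proportional to M times N rather than to the number of possible editing configurations.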


Computational prediction of RNA editing Computational method

Bells and whistles

Refinements

Avoid too many editing sites by penalizing editing sites

Use biological information: editing sites often after purine-pyrimidine
→ lower editing penalty after purine-pyrimidine pattern

Allow arbitrary starting point in protein sequence

Allow insertions and deletions in protein sequence


Computational prediction of RNA editing Results on known genes

Performance on known genes

Assessment method

Use one of the six known genes to optimize parameters

Test on other five genes

Repeat for all six genes (“leave one out testing”)
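The assessment loop described above can be sketched as follows. The callables `optimize` and `evaluate` are hypothetical placeholders for the parameter fitting and scoring steps, which the slides do not specify.

```python
def train_on_one_test_on_rest(genes, optimize, evaluate):
    """For each gene: fit parameters on it alone, then score every other gene.

    optimize(gene) -> params and evaluate(params, gene) -> score are
    user-supplied, hypothetical callables.
    """
    results = {}
    for train_gene in genes:
        params = optimize(train_gene)
        results[train_gene] = {
            gene: evaluate(params, gene) for gene in genes if gene != train_gene
        }
    return results
```

The point of the design is that no gene is ever scored with parameters that were tuned on that same gene.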

Questions

How many of the amino acids are predicted correctly?
How many of the C insertions are predicted correctly?
How far off are incorrect predictions of C insertions?


Computational prediction of RNA editing Results on known genes

Prediction quality

Results

gene     amino acids    C insertions       off by 1    2     3     ≥ 4

nad7     92%            116/171 = 68%      9           12    7     28
cox1     93%            112/159 = 70%      8           15    8     27
cox3     81%            134/181 = 74%      9           14    9     55
cytb     93%            118/172 = 68%      11          11    6     15
atp      93%            106/152 = 70%      7           8     4     15
pL       93%            144/199 = 72%      10          18    9     38

total    92%            122/173 = 71%      12          9     8     22


Computational prediction of RNA editing Finding new genes

Gene finding

Real test — finding new genes

Search for missing genes nad2, nad4L, nad6, and atp8

These genes could not be found by traditional gene finding

Step 1 — find location

Pick a gene from the list

Build PIE model from protein sequences of other organisms

Cut genome into short overlapping pieces (length 1200 bases)

Apply PIE to every piece of the genome

PIE predicts best way to insert C’s in each piece plus “ground state energy”

Identify position of gene in genome by maximum in ground state energy

[Figure: ground state energy (vertical axis, -150 to 0) versus genome position (horizontal axis, 0 to 60000), for the forward strand and the backward strand]
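The step of cutting the genome into overlapping pieces on both strands can be sketched as below. The piece length of 1200 bases is from the slide; the step of 600 bases between piece starts is an assumption (the slide does not give the overlap), and the function names are hypothetical.

```python
def overlapping_pieces(genome, length=1200, step=600):
    """Return (start, piece) pairs covering the genome with overlapping windows.

    The step size is an illustrative assumption; a final window is added
    so that the end of the genome is always covered.
    """
    if step <= 0 or step > length:
        raise ValueError("step must satisfy 0 < step <= length")
    last_start = max(len(genome) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, genome[s:s + length]) for s in starts]

def reverse_complement(seq):
    """Backward strand of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Example: every piece of both strands would then be scored with PIE.
genome = "ACGT" * 750  # 3000 bases of placeholder sequence
forward = overlapping_pieces(genome)
backward = overlapping_pieces(reverse_complement(genome))
print(len(forward), len(backward))
```

Scoring each piece and picking the extremal ground state energy per strand is left out, since it depends on the PIE model built for the gene in question.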


Computational prediction of RNA editing Finding new genes

Results on new genes

Approach (continued)

Step 2: prediction of editing sites
Step 3: verification by experimental sequencing of mRNA

Results

Location of all four genes found

All four genes confirmed by sequencing of mRNA

Surprise: new type of editing (deletional editing) found in one of the genes

Total increase of the known number of editing sites by 50%

Still no significant sequence pattern found


Computational prediction of RNA editing Finding new genes

Additional predictions

Systematically search for all known mitochondrial genes

Find 11 genes beyond the four experimentally verified ones

Find 8 more candidates with lower statistical significance

In total increased number of predicted genes from 11 to 26–34

Still have to be verified experimentally


Conclusions and outlook

Outline


Conclusions and outlook Summary

Summary

Conclusions

Biological sequences are plentiful and challenging to interpret

Statistical Physics provides useful methods for Biological sequence analysis

Insertional editing sites in Physarum polycephalum can be predicted with high precision

Outlook

Find remaining genes

Combine genomic sequences from several organisms

Identify editing signals and mechanisms

Substitutional editing


Conclusions and outlook Acknowledgements

Acknowledgements

Ohio State University

Tsunglin Liu → UCSB

Ha Youn Lee → University of Rochester

Christina Beargie → COSI

Case Western Reserve University

Jonatha Gott

Neeta Parimi

$$$

National Science Foundation (RB)

National Institutes of Health (JG)
