From the Ising Model to Biological Sequence Analysis
Ralf Bundschuh
Ohio State University
May 6, 2008
Ralf Bundschuh (Ohio State University) Biologial sequence analysis May 6, 2008 1 / 1
Biological sequences
Biological sequences Sequence data
DNA sequences
A piece of human chromosome 21
...TCTACATGTAAAAATATGTATTTTTAAAAATTGGATGTCATGGGCTGGGTGTGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCTCCTGAGGTCGGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCCGTCTCTACTAAAAACACAAAATTATCCAGGCATGGTGGCACATGACTGTAATCCCAGCTACTAGGGAGGCTGAGGCAGGAGAAACACTTGAACCTGGGAGGCGGAGGTTGAGGTGAGCCGAGATCGCGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCGGTCTCAAAAAAAAAAAATTTGATGTTATGGAAGTAGGGAGACAAAAAATGCTCTACAACTATTAACTGATGCTTTTCTGGTTTTGTTCTCCAGACACCATTCGCTTTTCACCCAAGATGATTTGATGTCTTATAAAACTCTGATGAACCATGATGGCTACACAGACATTAAGTATAGACAGCTATCAAGATGGGCAACAGGTGAGCTTGAACTTGATTCTGCATTCTAATTACAAATCAACCTGGCACTCAAGCATGAACATTGCTTTGTATACTTGCAATTCAATTGCCATGAGGTTGCATGCTCAGTGTTAGTGTATTATGCATTTATTGTACATTCGTGTTCAGAAAAAAAGCCATAGAATAATACTATTTCGTTAACTGATACCAAGATTGCCAGGAATCTTGACTTCCCTAAGTCATATGACAGTTTCTTGGGAATTTACCTTTTTAATGTCAGTGTTAATTAGCACTGTTACTTTGAAAGAAAACCCGGTTGATTTTCATGATGACAGATTCCCATGTTGACTGGTGGCTCTTCTGAGTGTCTAACTGGATCAGCTTTTGAATGGGAATCTTGTAGCCTCGTCTCCCCAGTTGTAGGCATGAGAGGGGCTGTCCCAGTAATGAATTTGCAGGGGCCCCAGTGCTCTATCTTTGTACCTTGCTCGTGCTTGGATGGTTGTGCCATACACGGGCAGCTCTCCATTGCCCTCCCACCATAGATGAGACTTTGTTCTCCTGGAAGCTGTGGTGTTTTGTGCTTTTGAGTATCTGAGTGTTTTGTGTTCTGTGACCTGAATGAATTGAGGAGCAGGTGGATCGAGACTTGGCTGAGGCCCTTGTGGTCTTTCTTGGCTTGCGATCTTGTTAAACACGGTGTTCTGAACCCACTGGCATTTGGCTCATCATCCCACTGACTCTGGAGCCAGTGAAGGGATTTGGCCCTGCCCTTTACTTTCCTGCCCAGCAGGCAGGGGCAGTGCAGTACACCCCCCTCGGCTCTCCTCCCCACCTCGAGGACTTCGGTGGCAAGGATCAGGCTCCGGAAAACTCACTGGAGCCATGCTGGTGAGGTCTGAGGAGGGGTTAGGAGCTGAGGCGCTGGGTCCCCCTTTCCCTGGTGGTTAGTTTTACCAACCAGTCCTTGTTCAGTTCCTGTGGCAGAGATTTTTGTTGTTGGTGGTGGTGGTGTTAGTGTTTTTTTTTCTCTGTATAGCAATTAAAGGAGGGAGATTCTGTGATGTAGTCAGCCTGCTTCCTTAGCCTAGAAGTCCTTAGTCCTTTGGTATTTCCAATTGACTTTTTTTTTTTTTTCTAAAATGCAAATCTAATAATGTCCCGCCTGAGCTCTCCAGTGGCTCCCTGTGGATTCCTGTGGGTTTCATGCAAAGACTGAGCTGCTCTGTGGCCCGAATCGTCTGGCCCCTCTGGACCCCAGGACGCCCCCAACATCTCTGCCTGGCATATCTTGGGCACCTCTCTGCCTGCCCTGGAGCACCGGCCTCACTGTTCCCATCACTCTTCTCCCTCCTGCCTGCCAGGTCTTTGCTCCGACCCCACTGCTGCCTCCTGTGCACCAAGGCACGGTGACCACCTCCAACACAGCCTGGTTGCTACCAGCCACCTCCTCCCAGGCAGCTGTGCCAGGTGCAGATGACACCTGGAGCACTGCCCTT
TTCATACCCGAGTGTTTCCAAGGGCCTCGGAAGTGTTTAATCAGCATTATTTTAAATAAACATTGAAATATATCTACAGCGTAGACCTATCATAATTATTTTGCCATTTTTCCAAGGTTGAACATTTAGGTTTCTCTCTTTTCACAATCATTTTTTTTTCAAATACTGAAATGAATCTTTTAAGGCTTCTTATTTTTTATTATTTATTTATTTACTTATTTATTTATTTTGAGACAGAGTCTTGCTCTGTTGCCCCGGCTGGAGTGCAGTGGTGTGATCTCAGCTCACTGCAACCTCCGTCTCCCAGGTTCAAGCAATTCTCCTGTCTCAGCCTCCTGAGTACCTGGGATTACAGGTGTGTGCCACCACGCCCAGCTAATTTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGATCTTGAACCCCTGACCTCAGGTGATCCGCCCACCTTGGCCTCCCAAAGTGCTGGGATCACAGGCATAAGCCACTGTGCCTGGCCTTTTTAAGGCTTTTTATACATGCTGGCAGATTGCCTTACACAAATACTGTGTCCACTTAGGCTTTATTGCTTTTATTTTTTTTTTTTAAGAGAAACATAAACAGTTTTCCTAATATGTTGTACCATTTAAAGGCAGCAGAATAGAAGTCATCTTATTGCAAAAACAAGACATTGGAGGGAAGAGAGCACAGGGCTGGAGGATGTGAGAGGCGTCCTGTGCGGGTGGGCGTTCATGGCTGGCCCCCAGTCTGTCTGGACAGTGGGGATGGCCCCGCTCCCATGAGGTCTCCCCGCCCCCGCTGCCCCAAGCTGCTTCCTCAAGGGGCAGAAGCATGGCCAAATCCACCGCGGGAGAAATGGCCCGTCCTGGTCCTGAGGAAGCTGAGGTCAGGACAGTCTAATCTGCTGCTCATGGATAACTAGAAGTTTACTTTCACGAAATTTTGTTTTTGTAAACTGATTTTTTTTAACGATTTAAATGTTTTTTACCTAAATGACAAAGGCATTGCTTGTTTAAAGCAGTTTAAATGATAGTATCTTTTAAGGCTTTAAGTAAACACAGCTGGCCTTTTCCTTTCTGAATGCAGTGACATTTTTATGGCTATGTATTGCTGAGGTTTGAGGGTAGATATGGGAGAAGTTCAACCTTGTCCCAAATATGTAGCGTATGGGTTAGGTTGTGTCTGTGACATGGTAAGAAGACCTTGGACTATTT...
Amount of data
GenBank
Central sequence repository
Exponential growth
Currently 200 billion bases
Three human genomes (soon 1000)
Cow, dog, cat, mouse, rat, guinea pig, gorilla, chimpanzee, macaque, . . .
678 complete microbial genomes
What does it all mean? ⇒ Biological Sequence Analysis
Biological sequences From sequence to function
Central dogma
Nature builds organisms from those sequences
Problem
One-dimensional information has to encode three-dimensional structure
Solution — the central dogma
DNA → RNA → protein → structure → function
The first step: DNA → RNA (transcription)
[Figure: transcription schematic. Labels: promoter, start, stop, RNA polymerase, pre-mRNA (exons, introns), splicing, mRNA]
Translation
The second step: RNA → protein (translation)
mRNA: polymer with 4 different monomers A, C, G, U
protein: polymer with 20 different monomers
Three bases code for one amino acid (genetic code)
UUU F   UCU S   UAU Y   UGU C   AUU I   ACU T   AAU N   AGU S
UUC F   UCC S   UAC Y   UGC C   AUC I   ACC T   AAC N   AGC S
UUA L   UCA S   UAA *   UGA *   AUA I   ACA T   AAA K   AGA R
UUG L   UCG S   UAG *   UGG W   AUG M   ACG T   AAG K   AGG R
CUU L   CCU P   CAU H   CGU R   GUU V   GCU A   GAU D   GGU G
CUC L   CCC P   CAC H   CGC R   GUC V   GCC A   GAC D   GGC G
CUA L   CCA P   CAA Q   CGA R   GUA V   GCA A   GAA E   GGA G
CUG L   CCG P   CAG Q   CGG R   GUG V   GCG A   GAG E   GGG G
Example: myoglobin
AUGGGGCUCAGCGACGGGGAAUGGCAGCUGGUGCUGAACGUCUGGGGGAAGGUGGAGGCUGAUGUCGCAGGCCAUGGGCAGGAGGUCCUCAUCAGCUCUUUAAGGGUCACCCCGAGACCCUGGAGAAAUUUGACAAGUUUAAGCACCUGAAGUCAGAGGAUGAGAUGAAGGCCUCUGAGGACCUGAAGAAGCACGGCAACACGGUGCUGACUG. . .
→ MGLSDGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
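The translation step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code shown in the table on this slide; the names `CODE` and `translate` are made up for the example and do not come from the talk.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for k in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[k:k + 3]]
        if amino_acid == "*":  # UAA, UAG and UGA terminate translation
            break
        protein.append(amino_acid)
    return "".join(protein)

# First ten codons of the myoglobin mRNA from the slide.
print(translate("AUGGGGCUCAGCGACGGGGAAUGGCAGCUG"))  # MGLSDGEWQL
```

Reading in a fixed frame of three bases is what makes insertional editing so consequential later in the talk: a single inserted base shifts every codon downstream of it.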
Folding
The third step: protein → structure (folding)
Different amino acids have different physical and chemical properties
Name            Abbr.    Charge  Hydrophob.
Alanine         Ala  A   o       +
Arginine        Arg  R   +       -
Asparagine      Asn  N   o       -
Aspartic acid   Asp  D   -       -
Cysteine        Cys  C   -       +
Glutamine       Gln  Q   o       -
Glutamic acid   Glu  E   -       -
Glycine         Gly  G   o       +
Histidine       His  H   +       -
Isoleucine      Ile  I   o       +
Leucine         Leu  L   o       +
Lysine          Lys  K   +       -
Methionine      Met  M   o       +
Phenylalanine   Phe  F   o       +
Proline         Pro  P   o       +
Serine          Ser  S   o       -
Threonine       Thr  T   o       -
Tryptophan      Trp  W   o       +
Tyrosine        Tyr  Y   o       +
Valine          Val  V   o       +
MGLSDGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Interactions among amino acids force folding into some structure
Function
The fourth step: structure → function
Proteins
bind other proteins
bind to small molecules
perform mechanical functions
. . .
Summary
DNA → RNA → protein → structure → function
RNA editing
RNA editing What is RNA editing?
RNA editing
Central dogma
RNA is an exact copy of the genomic DNA
RNA editing
RNA gets edited before it is translated: substitution, insertion, or deletion of bases
Physarum polycephalum
Example
Mitochondrion of Physarum polycephalum
most prevalent editing event: C insertion
e.g., a piece of nad7 :
DNA      ...CAGAATTGCGATCCACATAT GGGCTTCTACAT GAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCT TCCTTATTTTGAT...
mRNA     ...CAGAAUUGCGAUCCACAUAUCGGGCUUCUACAUCGAGGUACUGAAAAACUUAUAGAACAUAAGAAUUUCUUACAAUCUCUUCCUUAUUUUGAU...
protein  ... Q N C D P H I G L L H R G T E K L I E H K N F L Q S L P Y F D ...
other editing events: U insertion, dinucleotide insertions, C→U conversion
Editing is frequent: one insertion per 25 bases on average
Editing is reliable: every site is always edited
Questions
How does it work?
Where does it edit?
How does it know where to edit?
What machinery performs the editing?
Why does it edit?
RNA editing Situation in Physarum polycephalum
Background
Situation in Physarum polycephalum
Genome fully sequenced (≈ 63,000 bases; Takano et al., 2001)
Six protein-coding genes with experimentally determined editing sites in GenBank
Handful of genes identified but editing sites not known
Several unidentified open reading frames
Four typical mitochondrial genes apparently missing
Compare to Dictyostelium discoideum: 44 genes known
Motivation
Problem
Experimental determination of editing sites laborious
Glimmer of hope
Experimental verification of editing sites relatively easy
Solution
computational prediction ⇒ PIE = Predictor of Insertional Editing
RNA editing Computational approach
Idea behind PIE
DNA sequence
Know genomic sequence (without editing sites)
...CAGAATTGCGATCCACATATGGGCTTCTACATGAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCTTCCTTATTTTGATGTCTTGAT...
Protein sequences
Know many protein sequences from related organisms
Neisseria meningitidis     ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster    ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.          ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola        ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus   ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli           ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum      ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...
Approach
Compare the two
Protein information
Protein sequence preprocessing
Pick gene to predict editing sites of, e.g., nad7
Pick protein for this gene from another species, e.g., Neisseria meningitidis
Use PSI-BLAST to pull all related protein sequences out of GenBank → 510 sequences for nad7
Create multiple alignment
Neisseria meningitidis     ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster    ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.          ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola        ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus   ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli           ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum      ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...
...
Protein family model
Probability model
Extract probabilities $p_i(a)$ to find amino acid $a$ at position $i$
Neisseria meningitidis     ...VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD...
Drosophila melanogaster    ...MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD...
Synechococcus sp.          ...VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD...
Buchnera aphidicola        ...VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE...
Chloroflexus aurantiacus   ...VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD...
Escherichia coli           ...IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC...
Rhodospirillum rubrum      ...IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC...
...
(alignment columns 42 and 54 are marked on the slide)
i \ a  A     R     N     D     C      Q     E     G     H      I      L     K     M      F      P     S     T     W      Y      V
...
42     0.05  0.01  0.02  0.02  0.005  0.01  0.02  0.68  0.007  0.009  0.02  0.02  0.006  0.008  0.02  0.04  0.02  0.004  0.007  0.01
...
54     0.07  0.09  0.14  0.05  0.005  0.04  0.04  0.04  0.07   0.02   0.03  0.03  0.009  0.03   0.02  0.09  0.05  0.007  0.15   0.02
...
⇒ Probabilistic model of the whole protein family.
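How such position-specific probabilities might be extracted from an alignment can be sketched as follows. This is only an illustration: the talk does not say how the probabilities were estimated, so the add-one pseudocount and the skipping of gap characters are assumptions, as is the function name `column_probabilities`. The seven rows shown on the slide stand in for the full set of 510 sequences.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def column_probabilities(alignment, pseudocount=1.0):
    """Estimate p_i(a) for every alignment column i.

    Gap characters ('-') are ignored. The pseudocount (an assumption of this
    sketch) keeps unseen amino acids from getting probability zero.
    """
    model = []
    for column in zip(*alignment):
        counts = Counter(a for a in column if a in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        model.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return model

alignment = [
    "VRADPHIGLLHRGTEKLAETKT-YLQALPYMDRLD",
    "MRADPHIGLLHRGTEKLIEYKT-YTQALPYFDRLD",
    "VDCEPVIGYLHRGMEKIAENRT-NVMFVPYVSRMD",
    "VDCVPDIGYHHRGAEKMAERQS-WHSYIPYTDRIE",
    "VNVAPDVGYLHTGIEKTMESKT-YQKAVVLTDRMD",
    "IDADYRLFYVHRGMEKLAETRMGYNEVTFLSDRVC",
    "IRNAVSTGTMWRGIELILKGRD-PRDAWAFTQRIC",
]
model = column_probabilities(alignment)
# Eighth column of the snippet: six G and one F among the seven rows.
print(round(model[7]["G"], 3))  # 0.259
```

With only seven rows the pseudocount dominates, which is why the glycine probability here is far below the 0.68 quoted in the table; with hundreds of sequences its influence shrinks.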
Prediction method
Editing site prediction
Start with genomic sequence
...CAGAATTGCGATCCACATATGGGCTTCTACATGAGGTACTGAAAAACTTATAGAACATAAGAATTTCTTACAATCTTCCTTATTTTGATG...
Arbitrarily insert C’s and translate
...CAGAATTGCGACTCCACATATGGGCTTCTACATGACGGTACTGAAAAACTTATCAGAACATACAGAATTTCTCTACAATCTTCCTTATTTTGCATG...
Q N C D S T Y G L L H D G T E K L I R T Y R I S L Q S S L F C M
Calculate probability
$p(\ldots QNCDSTYG \ldots) = \cdots\, p_{35}(Q)\, p_{36}(N)\, p_{37}(C)\, p_{38}(D)\, p_{39}(S)\, p_{40}(T)\, p_{41}(Y)\, p_{42}(G) \cdots$
Redo for all possibilities of inserting C’s
Pick insertion pattern with highest probability → prediction of editing sites
Computational challenge
Challenge
After each base of a sequence of length $N$ a C can be inserted or not
⇒ Need to find the highest probability among $2^N$ possible patterns.
Typical gene: $N \approx 1000$ ⇒ $10^{300}$ patterns
Solution
Statistical Physics methods
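As a quick check of the order of magnitude quoted for a typical gene (a worked estimate added for illustration, not taken from the slides):

$$2^{N} = 10^{N \log_{10} 2} \approx 10^{0.301\,N}, \qquad N = 1000 \;\Rightarrow\; 2^{1000} \approx 10^{301}.$$

This agrees with the round figure on the slide. Even at a purely hypothetical rate of $10^{9}$ patterns per second, exhaustive enumeration would take on the order of $10^{292}$ seconds, which is why a smarter method is needed.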
Ising model
Ising model Ordered Ising model
Ising model
Spins
N spins
Two states each: ↑ (up), ↓ (down)
Described by variables $s_i$ with $s_i = +1$ for up and $s_i = -1$ for down
$2^N$ total states
Interactions
Spins want to align:
Energy $-J s_i s_j$ with $J > 0$
1D Ising model
All spins on a one-dimensional lattice: ↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↓ ↓ ↓ ↑ ↑ . . .
Only nearest neighbor interaction: $E = -J \sum_{i=2}^{N} s_{i-1} s_i$
Ground state
↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓ ↑ ↑
$E = -J \sum_{i=2}^{N} s_{i-1} s_i$
Ferromagnetism
At zero temperature system finds ground state (lowest energy state)
Two ground states: ↑ ↑ ↑ ↑ ↑ ↑ ↑ and ↓ ↓ ↓ ↓ ↓ ↓ ↓
Model of ferromagnetism
Antiferromagnetism
What happens if J < 0?
Still two ground states: ↑ ↓ ↑ ↓ ↑ ↓ ↑ and ↓ ↑ ↓ ↑ ↓ ↑ ↓
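For a short chain, both claims about the ground states can be checked by brute force. The sketch below assumes the nearest-neighbor energy $E = -J \sum_{i=2}^{N} s_{i-1} s_i$ from this slide; the function names are illustrative only.

```python
from itertools import product

def ising_energy(spins, J=1.0):
    """Energy of a 1D chain with uniform nearest-neighbor coupling J."""
    return -J * sum(a * b for a, b in zip(spins, spins[1:]))

def ground_states(n, J=1.0):
    """Enumerate all 2**n spin configurations and keep those of lowest energy."""
    states = list(product((+1, -1), repeat=n))
    lowest = min(ising_energy(s, J) for s in states)
    return [s for s in states if ising_energy(s, J) == lowest]

print(ground_states(5, J=+1.0))  # [(1, 1, 1, 1, 1), (-1, -1, -1, -1, -1)]
print(ground_states(5, J=-1.0))  # [(1, -1, 1, -1, 1), (-1, 1, -1, 1, -1)]
```

Enumeration like this costs $2^N$ evaluations, so it only serves as a check for small $N$.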
Ising model Disordered Ising model
Disorder
↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓ ↑ ↑
Disordered Ising model
What happens if J depends on position i?
$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i$
$J_i$ random variables (say Gaussian with mean zero)
Some $J_i > 0$ (·), some $J_i < 0$ (·); the two kinds of bond are distinguished by color on the slide
Disorder: ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · ↕ · . . .
Ground states:
↑ · ↑ · ↑ · ↓ · ↓ · ↑ · ↓ · ↑ · ↑ · ↓ · ↓ · ↓ · ↓ · ↑ · . . .
and
↓ · ↓ · ↓ · ↑ · ↑ · ↓ · ↑ · ↓ · ↓ · ↑ · ↑ · ↑ · ↑ · ↓ · . . .
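With nearest-neighbor bonds only, every bond can be satisfied independently, so a ground state can be built spin by spin: keep the previous orientation across a positive bond and flip it across a negative one. A minimal sketch under that reading of the slide; the 0-based list `J` (where `J[k]` couples spins `k` and `k+1`) and the function name are conventions chosen for this example.

```python
def disordered_ground_state(J, first=+1):
    """Ground state of a 1D chain with bond-dependent couplings.

    J[k] couples spin k and spin k+1 (0-based). Choosing first=-1 gives the
    second, globally flipped ground state.
    """
    spins = [first]
    for coupling in J:
        # Align across a positive bond, anti-align across a negative one.
        spins.append(spins[-1] if coupling > 0 else -spins[-1])
    energy = -sum(abs(coupling) for coupling in J)  # every bond is satisfied
    return spins, energy

spins, energy = disordered_ground_state([2, 1, -3, 1, -2])
print(spins, energy)  # [1, 1, 1, -1, -1, 1] -9
```

No frustration can arise here because each spin is constrained by only one bond to its left.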
Ising model Next-nearest neighbor interactions
Next-nearest neighbor interactions
More interactions
$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i - \sum_{i=3}^{N} K_i s_{i-2} s_i$
⇒ frustration
⇒ ground state depends on actual values of $J_i$ and $K_i$
Question
How to find the ground state and its energy?
Answer
Transfer matrix approach
Transfer matrix
$E = -\sum_{i=2}^{N} J_i s_{i-1} s_i - \sum_{i=3}^{N} K_i s_{i-2} s_i$
Definition
$E_n(s', s)$: ground state energy of $s_1 \ldots s_n$ with $s_{n-1} = s'$ and $s_n = s$
Recursion
$s_{n-2}$ is either $+1$ or $-1$
⇒ $E_n(s', s) = \min\{\, -J_n s' s - K_n (+1) s + E_{n-1}(+1, s'),\;\; -J_n s' s - K_n (-1) s + E_{n-1}(-1, s') \,\}$
Boundary condition
$E_2(s', s) = -J_2 s' s$
⇒ can calculate ground state energy in $N$ steps
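The recursion can be written down almost literally. The following sketch assumes the energy function and boundary condition given on this slide; the 1-based padding of the coupling lists and the brute-force cross-check are conventions added for the example.

```python
from itertools import product

SPINS = (+1, -1)

def ground_state_energy(J, K):
    """Transfer-matrix recursion for the chain with next-nearest neighbors.

    J and K are lists padded so that J[i] and K[i] match the slide's indices
    (J[2..N] and K[3..N] are used; lower entries are ignored).
    """
    n_spins = len(J) - 1
    # Boundary condition: E_2(s', s) = -J_2 s' s
    E = {(sp, s): -J[2] * sp * s for sp in SPINS for s in SPINS}
    for n in range(3, n_spins + 1):
        # Minimize over the two possible values of s_{n-2}.
        E = {(sp, s): min(-J[n] * sp * s - K[n] * spp * s + E[(spp, sp)]
                          for spp in SPINS)
             for sp in SPINS for s in SPINS}
    return min(E.values())

def brute_force_energy(J, K):
    """Reference answer by enumerating all 2**N configurations."""
    n_spins = len(J) - 1
    best = float("inf")
    for config in product(SPINS, repeat=n_spins):
        s = (0,) + config  # shift to 1-based indexing
        energy = (-sum(J[i] * s[i - 1] * s[i] for i in range(2, n_spins + 1))
                  - sum(K[i] * s[i - 2] * s[i] for i in range(3, n_spins + 1)))
        best = min(best, energy)
    return best

# Frustrated three-spin example: J_2 = J_3 = 1 favor alignment, K_3 = -2 opposes it.
print(ground_state_energy([0, 0, 1, 1], [0, 0, 0, -2]))  # -2
```

Each step touches only four table entries, so the work grows linearly in $N$ instead of exponentially. Recovering the ground state itself, rather than just its energy, would additionally require remembering which value of $s_{n-2}$ achieved each minimum.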
Computational prediction of RNA editing
Computational prediction of RNA editing Computational method
Analogy to RNA editing
Ising model                   RNA editing
spin                          presence or absence of editing site
$2^N$ states                  $2^N$ states
energy                        −log(probability)
sum of local contributions    sum of local contributions
(neighbor interactions)       (amino acid probabilities)
ground state                  most plausible editing sites
PIE recursion
Setup
Genomic sequence $b_1 \ldots b_N$; protein model: $p_i(a)$ for $i = 1, \ldots, M$
Auxiliary quantity
$E_{i,j}$ is the negative logarithm of the probability of the most probable editing configuration ending at model position $i$ and genomic position $j$
Without editing
$E_{i,j} = -\log p_i(\mathrm{aa}[b_{j-2}, b_{j-1}, b_j]) + E_{i-1,j-3}$
With editing
$E_{i,j} = \min\{\, -\log p_i(\mathrm{aa}[b_{j-2}, b_{j-1}, b_j]) + E_{i-1,j-3},$
$\qquad\quad -\log p_i(\mathrm{aa}[C, b_{j-1}, b_j]) + E_{i-1,j-2},$
$\qquad\quad -\log p_i(\mathrm{aa}[b_{j-1}, C, b_j]) + E_{i-1,j-2},$
$\qquad\quad -\log p_i(\mathrm{aa}[b_{j-1}, b_j, C]) + E_{i-1,j-2} \,\}$
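The recursion with editing can be turned into a small dynamic program. The sketch below is a simplified illustration, not the PIE implementation: it assumes the whole genomic piece must be explained by the whole model, allows at most one inserted C per codon exactly as in the recursion, and omits the editing penalty and protein insertions and deletions. The `floor` probability and the toy model built from a target peptide are hypothetical.

```python
import math

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def pie_sketch(dna, model, floor=1e-6):
    """Return (energy, edited sequence) of the most probable C-insertion pattern.

    dna   : genomic bases b_1..b_N as a string over ACGT
    model : list of dicts with model[i - 1][a] = p_i(a)
    floor : hypothetical probability for symbols absent from the model (e.g. stop)
    """
    n, m = len(dna), len(model)
    energy = [[math.inf] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    energy[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(2, n + 1):
            options = []
            if j >= 3:  # unedited codon consumes three genomic bases
                options.append((3, dna[j - 3:j]))
            x, y = dna[j - 2], dna[j - 1]  # edited codon: two bases plus one C
            options += [(2, "C" + x + y), (2, x + "C" + y), (2, x + y + "C")]
            for used, codon in options:
                previous = energy[i - 1][j - used]
                if previous == math.inf:
                    continue
                cost = previous - math.log(model[i - 1].get(CODON[codon], floor))
                if cost < energy[i][j]:
                    energy[i][j], back[i][j] = cost, (used, codon)
    if energy[m][n] == math.inf:
        raise ValueError("sequence length is incompatible with model length")
    # Trace the optimal choices back from the final cell.
    codons, i, j = [], m, n
    while i > 0:
        used, codon = back[i][j]
        codons.append(codon)
        i, j = i - 1, j - used
    return energy[m][n], "".join(reversed(codons))

# Toy model: each position strongly prefers one residue of the peptide QNCDPHIG.
target = "QNCDPHIG"
model = [{a: (0.81 if a == t else 0.01) for a in "ARNDCQEGHILKMFPSTWYV"}
         for t in target]
best_energy, edited = pie_sketch("CAGAATTGCGATCCACATATGGG", model)
print(edited)  # CAGAATTGCGATCCACATATCGGG
```

In this toy case the program recovers the C insertion after `...CATAT` shown in the nad7 example earlier in the talk. The table has M × N entries, so the cost is proportional to the product of the two lengths rather than to $2^N$.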
Bells and whistles
Refinements
Avoid too many editing sites by penalizing editing sites
Use biological information: editing sites often after purine-pyrimidine → lower editing penalty after purine-pyrimidine pattern
Allow arbitrary starting point in protein sequence
Allow insertions and deletions in protein sequence
Computational prediction of RNA editing Results on known genes
Performance on known genes
Assessment method
Use one of the six known genes to optimize parameters
Test on other five genes
Repeat for all six genes (“leave one out testing”)
Questions
How many of the amino acids are predicted correctly?
How many of the C insertions are predicted correctly?
How far off are incorrect predictions of C insertions?
Prediction quality
Results
gene    amino acids   C insertions     off by 1   off by 2   off by 3   off by ≥ 4
nad7    92%           116/171 = 68%    9          12         7          28
cox1    93%           112/159 = 70%    8          15         8          27
cox3    81%           134/181 = 74%    9          14         9          55
cytb    93%           118/172 = 68%    11         11         6          15
atp     93%           106/152 = 70%    7          8          4          15
pL      93%           144/199 = 72%    10         18         9          38
total   92%           122/173 = 71%    12         9          8          22
Computational prediction of RNA editing Finding new genes
Gene finding
Real test — finding new genes
Search for missing genes nad2, nad4L, nad6, and atp8
These genes could not be found by traditional gene finding
Step 1 — find location
Pick a gene from the list
Build PIE model from protein sequences of other organisms
Cut genome into short overlapping pieces (length 1200 bases)
Apply PIE to every piece of the genome
PIE predicts best way to insert C’s in each piece plus “ground state energy”
Identify position of gene in genome by maximum in ground state energy
[Figure: ground state energy (axis from −150 to 0) versus genome position (0 to 60000), plotted for the forward strand and the backward strand]
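Cutting the genome into overlapping pieces on both strands might look like the sketch below. The piece length of 1200 bases comes from the slide; the step of 300 bases between successive pieces is an assumption, since the talk does not state the overlap, and both function names are illustrative.

```python
def reverse_complement(dna):
    """Sequence of the opposite (backward) strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def overlapping_pieces(genome, length=1200, step=300):
    """Yield (start, piece) for windows of the given length.

    step is a hypothetical choice. A final stretch shorter than one step
    past the last window start is not covered by this simple version.
    """
    for start in range(0, max(len(genome) - length, 0) + 1, step):
        yield start, genome[start:start + length]

genome = "ACGT" * 750  # placeholder sequence of 3000 bases
forward = list(overlapping_pieces(genome))
backward = list(overlapping_pieces(reverse_complement(genome)))
print(len(forward), len(backward))  # 7 7
```

Each piece would then be scored with the protein family model, and the score plotted against the start position as in the figure.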
Results on new genes
Approach (continued)
Step 2: prediction of editing sites
Step 3: verification by experimental sequencing of mRNA
Results
Location of all four genes found
All four genes confirmed by sequencing of mRNA
Surprise: new type of editing (deletional editing) found in one of the genes
Total increase of the known number of editing sites by 50%
Still no significant sequence pattern found
Additional predictions
Systematically search for all known mitochondrial genes
Find 11 genes beyond the four experimentally verified ones
Find 8 more candidates with lower statistical significance
In total increased number of predicted genes from 11 to 26–34
Still have to be verified experimentally
Conclusions and outlook
Conclusions and outlook Summary
Summary
Conclusions
Biological sequences are plentiful and challenging to interpret
Statistical Physics provides useful methods for Biological sequence analysis
Insertional editing sites in Physarum polycephalum can be predicted with high precision
Outlook
Find remaining genes
Combine genomic sequences from several organisms
Identify editing signals and mechanisms
Substitutional editing
Conclusions and outlook Acknowledgements
Acknowledgements
Ohio State University
Tsunglin Liu → UCSB
Ha Youn Lee → University of Rochester
Christina Beargie → COSI
Case Western Reserve University
Jonatha Gott
Neeta Parimi
$$$
National Science Foundation (RB)
National Institutes of Health (JG)