prediction of subcellular localization of proteins ~ past, present, and future ~ human genome...
TRANSCRIPT
![Page 1: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/1.jpg)
Prediction of Subcellular Localization of Proteins
~ Past, Present, and Future ~
Human Genome Center, Inst. Med. Sci.,
University of Tokyo
Kenta Nakai
Swiss-Prot 20 Years
![Page 2: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/2.jpg)
20 Years Ago..
• I became a graduate student in Prof. Minoru Kanehisa’s lab
• I wanted to write a program that interprets the information encoded in DNA sequences
• But biology is full of exceptions
![Page 3: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/3.jpg)
Diagnosis System of Bacterial Infections (MYCIN 1974)
Enter Information about the patient. (Name, Age, Sex, and Race)
Are there any positive cultures obtained from SALLY? …
Has SALLY recently had symptoms of persistent headache or other abnormal neurologic symptoms (dizziness, lethargy, etc.)? …
INFECTION-1 is MENINGITIS
+ <ITEM-1> MYCOBACTERIUM-TB [from clinical evidence only]
+ …
[REC-1] My preferred therapy recommendation is as follows:
1) ETHAMBUTAL
Dose: 1.28g (13.0 100mg-tablets) q24h PO for 60 days [calculated
on basis of 25 mg/kg] then 770 mg (7.5 100mg-tablets) q24h PO ..
![Page 4: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/4.jpg)
Knowledge Base for Automatic Reasoning
• Knowledge is represented as a collection of “if-then”
rules, which are chained to make the system solve a
realistic problem
Rule 123
If: the gram stain of the organism is negative
and: the aerobicity of the organism is anaerobic
and: the morphology of the organism is rod
then: the genus of the organism is bacteroides
with a certainty factor of 0.6
Working Memory
Name: Sally
Age: 42 years
Sex: Female
Race: …
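The idea of if-then rules firing against a working memory can be sketched in a few lines of Python. This is only an illustration, not MYCIN's actual implementation: the `Rule` class, the `forward_chain` function, and the attribute names are hypothetical, and certainty factors are simply carried along rather than combined the way a real expert system would combine them.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One if-then rule (hypothetical representation, not MYCIN's own)."""
    name: str
    conditions: dict   # attribute -> value that must be present in the facts
    conclusion: tuple  # (attribute, value) asserted when all conditions hold
    certainty: float   # certainty factor attached to the conclusion


def forward_chain(rules, facts):
    """Fire rules repeatedly until no rule adds a new fact.

    Returns only the derived facts as {attribute: (value, certainty)}.
    Certainty factors are recorded but not propagated or combined.
    """
    known = dict(facts)
    derived = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            attribute, value = rule.conclusion
            if attribute in known:
                continue  # already known or already derived
            if all(known.get(a) == v for a, v in rule.conditions.items()):
                known[attribute] = value
                derived[attribute] = (value, rule.certainty)
                changed = True
    return derived


# Rule 123 from the slide, encoded with illustrative attribute names.
RULE_123 = Rule(
    name="Rule 123",
    conditions={"gram stain": "negative",
                "aerobicity": "anaerobic",
                "morphology": "rod"},
    conclusion=("genus", "bacteroides"),
    certainty=0.6,
)

organism_facts = {"gram stain": "negative",
                  "aerobicity": "anaerobic",
                  "morphology": "rod"}
print(forward_chain([RULE_123], organism_facts))
```

Because a derived fact is added to the known facts, a second rule whose conditions mention `genus` could fire on the next pass, which is the chaining the slide refers to.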
![Page 5: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/5.jpg)
Expert Systems
Knowledge Base
Inference Engine
![Page 6: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/6.jpg)
Sample Problem
![Page 7: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/7.jpg)
Prediction of Subcellular Localization
![Page 8: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/8.jpg)
Typical Sorting Signals

| Signal Function | Example |
| --- | --- |
| Import into nucleus | -P-P-K-K-K-R-K-V- |
| Export from nucleus | -L-A-L-K-L-A-G-L-D-I- |
| Import into mitochondria | <-MLSLRQSIRFFKPATRTLCSSRYLL- |
| Import into plastid | <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG- |
| Import into peroxisomes | -S-K-L-> |
| Import into ER | <-MMSFVSLLLVGILFWATEAEQLTKCEVFN- |
| Return to ER | -K-D-E-L-> |
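As a rough illustration of how such signals might be checked in a sequence, the sketch below looks for the two C-terminal examples (KDEL, SKL) at the end of a sequence and for the nuclear import example anywhere in it. The function name and data layout are hypothetical, and exact string matching is an assumption made for brevity: real sorting signals are far more variable than these single examples.

```python
# Example signals taken from the slide; exact matching is a simplification.
C_TERMINAL_SIGNALS = {
    "KDEL": "return to ER",
    "SKL": "import into peroxisomes",
}
NUCLEAR_IMPORT_EXAMPLE = "PPKKKRKV"


def find_sorting_signals(sequence):
    """Return (motif, function) pairs found in a one-letter protein sequence."""
    seq = sequence.upper()
    hits = []
    # C-terminal signals must sit at the very end of the sequence.
    for motif, function in C_TERMINAL_SIGNALS.items():
        if seq.endswith(motif):
            hits.append((motif, function))
    # The nuclear import example may occur anywhere in the sequence.
    if NUCLEAR_IMPORT_EXAMPLE in seq:
        hits.append((NUCLEAR_IMPORT_EXAMPLE, "import into nucleus"))
    return hits


print(find_sorting_signals("MAAAGGGKDEL"))
```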
![Page 9: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/9.jpg)
Amino Acid Composition
• Another good clue for prediction
• Suited for machine learning
Outer membrane proteins and periplasmic proteins of Gram-negative bacteria
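Amino acid composition is easy to turn into a fixed-length feature vector, which is one reason it suits machine learning. A minimal sketch, assuming a one-letter sequence and the 20 standard residues (the function name and the choice to ignore non-standard letters are illustrative assumptions):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue, in AMINO_ACIDS order."""
    seq = sequence.upper()
    # Assumption: letters outside the standard 20 (e.g. X) are ignored.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


composition = amino_acid_composition("AAKK")
print(dict(zip(AMINO_ACIDS, composition)))
```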
![Page 10: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/10.jpg)
PSORT (I)
• Nakai & Kanehisa, 1991, 1992
• Expert system using about 100 “If-then” rules
[Figure: PSORT decision tree. Node labels: signal peptide, signal cleavage site, TMS, TMS in Mature Part, Topology, Apolar, (Specific Signals) KDEL, GPI, SKL, MTS, NLS, KK, GY motif. Localization site labels: ERM, PM, LSM, ERL, LSL, OT, MT, NC, PX, GG, CP, OM, IT, MX, IM.]
![Page 11: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/11.jpg)
Papers and the web server
• Nakai & Kanehisa, Proteins 1991
– cited 295 times
• Nakai & Kanehisa, Genomics 1992
– cited 961 times
– 34 in 2006
• Web server since 1993
![Page 12: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/12.jpg)
Limitations of PSORT
• Relatively low accuracy possibly because of the
complexity of the sorting mechanisms
• It is difficult to optimize the certainty parameters
assigned to each rule
• It is tedious to update the knowledge base as the
training data grow
![Page 13: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/13.jpg)
PSORT II
• Nakai & Horton, 1997, 1999 (cited 638 times)
• Machine learning
• kNN (k-nearest neighbor) method
[Figure: kNN classification of a query point Q with k = 3]
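The k-nearest neighbor idea can be sketched as follows: a query protein is assigned the majority label among its k closest training examples. The feature vectors, labels, and Euclidean distance below are illustrative assumptions; the slide does not say which features or distance PSORT II uses.

```python
import math
from collections import Counter


def knn_predict(query, training_set, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    `training_set` is a list of (feature_vector, label) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    neighbors = sorted(training_set,
                       key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy two-dimensional features with made-up values.
training = [
    ((0.0, 0.0), "cytoplasmic"),
    ((0.1, 0.0), "cytoplasmic"),
    ((1.0, 1.0), "nuclear"),
    ((0.9, 1.0), "nuclear"),
    ((1.0, 0.9), "nuclear"),
]
print(knn_predict((0.2, 0.1), training, k=3))
```

Unlike the hand-tuned certainty factors of a rule base, the only parameter here is k, and adding training data requires no change to the method.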
![Page 14: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/14.jpg)
iPSORT: Bannai et al. 2002
Rule 1
A protein has an SP if the sum of hydropathy index values within [6,25] exceeds 18.3
Rule 2
A protein has either an mTP or a cTP if it contains fewer than 3 D/E residues within [1,30] and if it contains a motif similar to 11212111, where 2=(I,R), 3=(D,E,H,K,N), 1=otherwise
Rule 3
A protein has an mTP if it satisfies Rule 2, if the sum of isoelectric point values within [1,15] exceeds 93, and if it contains a motif similar to 12211221, where 2=(K,R),3=(I,P),1=otherwise
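Rules of this form translate almost directly into code. The sketch below implements Rule 1 and the D/E count condition of Rule 2; the approximate motif matching of Rules 2 and 3 is omitted. The slide does not name the hydropathy index, so the Kyte-Doolittle scale is used here as an assumption, and the function names are hypothetical. Positions are 1-based and inclusive, as in the rules.

```python
# Kyte-Doolittle hydropathy values. Assumption: the slide does not state
# which hydropathy index iPSORT uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_signal_peptide(sequence, start=6, end=25, threshold=18.3):
    """Rule 1: hydropathy sum over positions [start, end] exceeds threshold."""
    window = sequence.upper()[start - 1:end]
    score = sum(KYTE_DOOLITTLE.get(residue, 0.0) for residue in window)
    return score > threshold


def has_few_acidic_residues(sequence, end=30, limit=3):
    """First condition of Rule 2: fewer than `limit` D/E residues in [1, end]."""
    window = sequence.upper()[:end]
    return sum(1 for residue in window if residue in "DE") < limit


candidate = "MKKKK" + "L" * 20 + "DDDD"
print(has_signal_peptide(candidate), has_few_acidic_residues(candidate))
```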
![Page 15: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/15.jpg)
PSORTb and PSORT.ORG
• Gardy et al. 2003, 2004
– Contribution from a Canadian group (Brinkman lab)
• Update for bacterial proteins
![Page 16: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/16.jpg)
WoLF-PSORT
• Horton et al. 2006
• Latest PSORT update for eukaryotic proteins
• WoLF: Women only Love Fools!?
![Page 17: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/17.jpg)
Current Dilemma
• More data are necessary to improve the training process
• The practical value of prediction methods decreases as experimental data grow
• Moreover, the more we investigate, the more exceptions we find
![Page 18: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/18.jpg)
It’s a General Problem
• Gene Finding
• Prediction of Protein Structure
• …
• We come to know the answer to a problem before we know how to solve it
Similarity search against the data of typical model organisms will be sufficient in many cases
![Page 19: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/19.jpg)
New Generation Predictors
• Should be useful to engineer proteins for their targeting sites
• Should complement errors of proteome analyses (i.e., isoforms with differential localization)
• Comprehensively example-based rather than statistical feature-based (such as amino acid composition)
![Page 20: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/20.jpg)
Biology is like Linguistics
• Both are naturally born and full of exceptions
• There may not exist “general principles”
![Page 21: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/21.jpg)
Future of Sequence Analysis
• It will become “DNA linguistics”
• Large dictionaries (databases) will contain both general cases and exceptions
• Such databases may be a sort of knowledge base that can be used to simulate the subcellular processes
![Page 22: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/22.jpg)
Past, Present, and Future
• Past
– Expert system-based predictions
• Present
– Machine learning-based predictions
• Future
– Combination of both?
– Revival of knowledge bases to simulate cellular processes?
![Page 23: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649e3f5503460f94b306b3/html5/thumbnails/23.jpg)
Acknowledgments
• Minoru Kanehisa
• Paul Horton
• Hideo Bannai, Satoru Miyano
• Jennifer Gardy, Fiona Brinkman
• And all the other people who contributed to the PSORT project!