protein modeling using machine learning...
TRANSCRIPT
![Page 1: Protein Modeling using Machine Learning Methodspages.cpsc.ucalgary.ca/~mrichter/ML/Older/midtermexamples...exposed on the outer surface Protein Structure (3/3) •Quaternary –Involves](https://reader034.vdocuments.us/reader034/viewer/2022042223/5ec9a663d600f2148a48dbc1/html5/thumbnails/1.jpg)
Machine Learning For
Protein Modeling
Presented By
Ellen Huynh
What are Proteins?
• Complex organic compounds of high molecular mass
• Consist of amino acids (aa’s) in a specific order, joined together by peptide bonds
• The order of the aa’s is determined by the base sequence of nucleotides in the gene that codes for the protein
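As a rough illustration of how a base sequence fixes the amino acid order, the sketch below reads a DNA coding sequence three bases at a time. The function name `translate` is hypothetical, and `CODON_TABLE` is only a small excerpt of the standard genetic code, chosen for this example.

```python
# Excerpt of the standard genetic code (DNA codons); "*" marks a stop codon.
# Only a handful of the 64 codons are listed, enough for the example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "AAA": "K",  # lysine
    "GTT": "V",  # valine
    "CTG": "L",  # leucine
    "TCT": "S",  # serine
    "GGC": "G",  # glycine
    "TAA": "*",  # stop
}


def translate(dna):
    """Translate a coding sequence into one-letter amino acid codes."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]  # KeyError for codons not in the excerpt
        if residue == "*":
            break  # translation ends at the stop codon
        protein.append(residue)
    return "".join(protein)


print(translate("ATGAAAGTTTAA"))  # MKV
```

Changing a single base changes the codon and can therefore change the residue at that position.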
Why Study Proteins?
• Proteins are required for the structure, function and regulation of the body’s cells, tissues and organs
• Each protein has a unique function, determined by its structure
• Examples of proteins include enzymes, hormones and antibodies
Protein Structure (1/3)
• Primary
– Amino acid sequence of polypeptide chain (linear)
– Determined by the gene that encodes it
• Secondary
– Three types: α-helix, β-sheets, coils
– Local ordered structure brought about via hydrogen bonding, mainly within the peptide backbone
– α-helix: backbone H-bonds link residues i and i+4
– β-sheets: H-bonds link two sequence segments
Protein Structure (2/3)
• Tertiary
– "Global" folding of a single polypeptide chain
– The driving force in determining the tertiary structure of globular proteins is the hydrophobic effect
– The chain folds so that the side chains of the nonpolar amino acids are "hidden" within the structure and the side chains of the polar residues are exposed on the outer surface
Protein Structure (3/3)
• Quaternary
– Involves two or more polypeptide chains forming a multi-subunit structure
Pre-Machine Learning Methods
• X-ray crystallography and NMR spectroscopy were used to determine the structure and function of proteins
• These methods are costly and time-consuming
Goal
• Increase the accuracy of protein structure prediction, mainly at the secondary level, in an effective manner, to help improve the understanding of protein functions
Machine Learning Methods (1/2)
• Neural Networks
– Trained pairwise neural networks
– Networks are initialized with random uniform weights and subsequently trained through backpropagation
• Hidden Markov Model
– Models stochastic sequences with a probabilistic finite state machine
– The character in position t depends only on the k preceding characters, where k is the order of the Markov chain
– Hidden process: secondary structure of protein
– Observed process: amino acid sequence
– Prediction achieved with forward/backward algorithm
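The hidden Markov model bullets above can be sketched as a toy first-order (k = 1) model. This is a minimal illustration under stated assumptions, not the method of any cited paper: the observations use a two-letter hydrophobic/hydrophilic alphabet, every probability in `START`, `TRANS` and `EMIT` is a made-up value, and the function names are hypothetical.

```python
# Hidden process: secondary structure (H = helix, B = sheet, C = coil).
# Observed process: residues coded as "o" (hydrophobic) or "i" (hydrophilic).
# All numbers are illustrative assumptions, not estimates from real data.
STATES = ["H", "B", "C"]
START = {"H": 0.3, "B": 0.2, "C": 0.5}
TRANS = {
    "H": {"H": 0.80, "B": 0.05, "C": 0.15},
    "B": {"H": 0.05, "B": 0.75, "C": 0.20},
    "C": {"H": 0.15, "B": 0.15, "C": 0.70},
}
EMIT = {
    "H": {"o": 0.6, "i": 0.4},
    "B": {"o": 0.7, "i": 0.3},
    "C": {"o": 0.3, "i": 0.7},
}


def forward(obs):
    """alpha[t][s] = P(obs[0..t], state at t is s)."""
    alpha = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: EMIT[s][x] * sum(prev[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })
    return alpha


def backward(obs):
    """beta[t][s] = P(obs[t+1..] | state at t is s)."""
    beta = [{s: 1.0 for s in STATES}]
    for x in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {
            s: sum(TRANS[s][r] * EMIT[r][x] * nxt[r] for r in STATES)
            for s in STATES
        })
    return beta


def posterior_decode(obs):
    """Label each position with its most probable hidden state."""
    alpha, beta = forward(obs), backward(obs)
    likelihood = sum(alpha[-1].values())  # P(obs) under the model
    posteriors = [
        {s: a[s] * b[s] / likelihood for s in STATES}
        for a, b in zip(alpha, beta)
    ]
    labels = "".join(max(STATES, key=p.get) for p in posteriors)
    return labels, posteriors


labels, posteriors = posterior_decode("ooiiioo")
print(labels)
```

Long chains would need scaling or log-space arithmetic to avoid underflow; the sketch omits that for brevity.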
Protein Alphabets
• Structural Alphabet: 20 amino acids
• Chemical Alphabet: acidic, aliphatic, amide, aromatic, basic, hydroxyl, etc.
• Functional Alphabet: acidic, basic, hydrophobic nonpolar, polar uncharged
• Charge Alphabet: acidic, basic, neutral
• Hydrophobic Alphabet: hydrophobic, hydrophilic
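One way to picture these alphabets is as lookup tables that recode a sequence of the 20 amino acids into a smaller symbol set. The grouping below follows a common textbook classification and is an assumption, since the slides do not list group members; glycine, cysteine and tyrosine in particular are placed differently by different sources. The names `invert` and `recode` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # structural alphabet, one-letter codes

# Assumed membership of the functional alphabet (one common textbook grouping).
FUNCTIONAL = {
    "acidic": "DE",
    "basic": "KRH",
    "hydrophobic nonpolar": "AVLIPFMWG",
    "polar uncharged": "STCYNQ",
}


def invert(groups):
    """Turn {group: members} into {amino acid: group}."""
    return {aa: name for name, members in groups.items() for aa in members}


FUNCTIONAL_OF = invert(FUNCTIONAL)

# The coarser alphabets are derived here from the functional one (an assumption).
CHARGE_OF = {
    aa: group if group in ("acidic", "basic") else "neutral"
    for aa, group in FUNCTIONAL_OF.items()
}
HYDROPHOBIC_OF = {
    aa: "hydrophobic" if group == "hydrophobic nonpolar" else "hydrophilic"
    for aa, group in FUNCTIONAL_OF.items()
}


def recode(sequence, table):
    """Rewrite a sequence in a reduced alphabet."""
    return [table[aa] for aa in sequence]


print(recode("DKAS", CHARGE_OF))
# ['acidic', 'basic', 'neutral', 'neutral']
```

A smaller alphabet means fewer parameters to train, at the cost of discarding distinctions between residues in the same group.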
Attributes
• Window size (W) that covers a relevant sequence
• Input: Protein sequence p = p1p2…pn
• Output: α-helix (H), β-sheets (B), coils (C)
• Data Set: www.pdb.org
• Trained weights: determined by the chosen alphabet and the data set
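The window attribute can be sketched as follows: each residue is described by the W symbols centred on it, and each window is turned into a numeric vector that a classifier could map to H, B or C. The padding symbol `PAD`, the one-hot encoding and the function names are assumptions made for this example; the slides do not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical symbol for window positions beyond the chain ends
SYMBOLS = AMINO_ACIDS + PAD


def windows(sequence, w):
    """Return one window of odd width w centred on each residue."""
    if w % 2 == 0:
        raise ValueError("window size must be odd so it has a centre residue")
    half = w // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + w] for i in range(len(sequence))]


def one_hot(window):
    """Encode a window as a flat 0/1 vector, len(SYMBOLS) entries per position."""
    vector = []
    for symbol in window:
        vector.extend(1.0 if symbol == s else 0.0 for s in SYMBOLS)
    return vector


print(windows("MKV", 3))  # ['-MK', 'MKV', 'KV-']
print(len(one_hot("-MK")))  # 63
```

Each encoded window would be paired with the H/B/C label of its centre residue to form one training example.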
Additional Information
• How far into Project?
– Have researched possible algorithms that can be implemented
• Risk?