Protein Structure Prediction Using Neural Networks

Martha Mercaldi, Kasia Wilamowska

Literature Review

December 16, 2003

The Protein Folding Problem

Evolution of Neural Networks

• Neural networks were originally designed to approximate connections between neurons in the brain

Why use Neural Nets for Protein Folding?

• Successful applications in:
– Secondary structure prediction
– Solvent accessibility

• No “inherent shortcoming” yet found

• Can incorporate evolutionary information via multiple alignments

• Detect previous misclassifications
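One common way evolutionary information from a multiple alignment reaches a network's input is a per-column residue frequency profile. The sketch below illustrates that idea only; the function name `alignment_profile` and the toy alignment are assumptions made for illustration and do not come from the reviewed papers.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def alignment_profile(aligned_seqs):
    """Return one 20-element frequency vector per alignment column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile

# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["ACD", "ACE", "A-D"]
profile = alignment_profile(toy_alignment)
```

Each column's vector can then replace a single-residue encoding, so the network sees which substitutions the protein family tolerates at that position.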

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• Purpose
– Using neural nets, effectively predict the secondary structure of proteins.

• The current best system for secondary structure prediction is SSpro8, with accuracy in the range of 62-63%

• Input to the system
– Can be either DNA or amino acid sequences
– SSpro8 uses amino acid sequences
– The authors’ system, UTMPred, uses DNA

• Output: the three basic forms (alpha helices, beta sheets, and loops) expanded to eight structure forms

Protein Secondary Structure Forms
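As a point of reference, the eight structure forms can be written as a lookup table. The sketch below assumes the eight forms are the standard DSSP states (consistent with the H, E, and I forms named later in these slides) and shows one common reduction back to three states; both the assumption and the particular reduction are illustrative, not taken from the paper.

```python
# DSSP one-letter codes for the eight secondary-structure states
# (assumed here to be the "eight structure forms" of the slides).
EIGHT_STATES = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "E": "extended strand (beta sheet)",
    "B": "isolated beta bridge",
    "T": "turn",
    "S": "bend",
    "C": "coil / other",
}

# One common reduction to three states; other conventions exist.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands
    "T": "C", "S": "C", "C": "C",   # loops
}

def reduce_to_three(states):
    """Map a string of eight-state labels to three-state labels."""
    return "".join(EIGHT_TO_THREE[s] for s in states)
```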

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

x – input to the network

y – number of input nodes

p – a prototype, representing the k previously trained inputs nearest to the input currently being tested

i – number of prototypes in the y-dimensional feature space

s – the activation function

M – number of output classes

u_jq – a weight representing the degree of membership of prototype j in output class q

m_jq – the BBA (basic belief assignment) mass, the product of weight u_jq and activation s_j

h_jq – the conjunctive combination of the BBAs
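The notation above can be tied together with a minimal forward-pass sketch. It assumes a distance-based exponential activation of the kind used in Denoeux's evidential network; the parameters `alpha` and `gamma`, the function name `dbnn_forward`, and the toy values are assumptions for illustration and are not given in the slides.

```python
import math

def dbnn_forward(x, prototypes, alpha, gamma, u):
    """Combine the evidence of all prototypes for input x.

    Returns M + 1 normalised masses: one per output class, plus the
    mass left on the whole frame (ignorance). Each row of u is assumed
    to sum to 1 and each alpha to lie strictly between 0 and 1.
    """
    n_classes = len(u[0])
    # Start from the vacuous belief: all mass on the whole frame.
    combined = [0.0] * n_classes + [1.0]
    for p_j, a_j, g_j, u_j in zip(prototypes, alpha, gamma, u):
        # Activation s_j decreases with squared distance to prototype j
        # (assumed exponential form).
        d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p_j))
        s_j = a_j * math.exp(-g_j * d2)
        # BBA of prototype j: m_jq = u_jq * s_j, remainder on the frame.
        m_j = [s_j * u_jq for u_jq in u_j] + [1.0 - s_j]
        # Conjunctive combination with the evidence gathered so far.
        nxt = [
            combined[q] * m_j[q] + combined[q] * m_j[-1] + combined[-1] * m_j[q]
            for q in range(n_classes)
        ]
        nxt.append(combined[-1] * m_j[-1])
        combined = nxt
    total = sum(combined)
    if total == 0.0:
        raise ValueError("total conflict between prototypes")
    return [v / total for v in combined]

# Toy example: two prototypes, two classes, input close to the first prototype.
masses = dbnn_forward(
    x=[0.1, 0.0],
    prototypes=[[0.0, 0.0], [1.0, 1.0]],
    alpha=[0.9, 0.9],
    gamma=[1.0, 1.0],
    u=[[1.0, 0.0], [0.0, 1.0]],
)
```

The predicted class is the one with the largest mass; the last element shows how much belief remains uncommitted.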

• The DBNN inputs are DNA sequences converted to binary format prior to use

• The sequences are:
– 88 Escherichia coli proteins
– 25 yeast (Saccharomyces cerevisiae) proteins
– 166 mammalian proteins (80 of which are human)

• The input window size for UTMPred is set to 7 codons, which results in 84 input nodes and 8 output nodes that represent the expanded structural forms.

• UTMPred used 200 prototypes. After training was completed, the system was able to predict the H and E forms with accuracy above 75%. At the same time, the system had difficulty predicting form I, due to the small amount of data for that form in the training samples.
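The figure of 84 input nodes is consistent with 7 codons × 3 nucleotides × 4 bits per nucleotide. The sketch below assumes a one-hot encoding of each nucleotide and a window that slides one codon at a time; the slides state only that sequences are converted to binary, so the exact scheme, the names `encode_window` and `sliding_windows`, and the step size are assumptions.

```python
# Assumed one-hot code: 4 bits per nucleotide.
NUCLEOTIDE_BITS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
WINDOW_CODONS = 7                # window size given in the slides
WINDOW_BASES = WINDOW_CODONS * 3

def encode_window(dna):
    """Encode a 21-nucleotide window as 84 binary inputs."""
    if len(dna) != WINDOW_BASES:
        raise ValueError(f"expected {WINDOW_BASES} nucleotides, got {len(dna)}")
    bits = []
    for base in dna.upper():
        bits.extend(NUCLEOTIDE_BITS[base])
    return bits

def sliding_windows(dna):
    """Yield encoded windows, moving one codon (3 bases) at a time."""
    for start in range(0, len(dna) - WINDOW_BASES + 1, 3):
        yield encode_window(dna[start:start + WINDOW_BASES])

# Made-up sequence, for illustration only.
inputs = encode_window("ATG" * 7)
```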

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• Purpose
– Using neural networks, efficiently predict protein function

• Using databases such as Prosite, Pfam, and Prints, either query the databases for motifs within a protein in question, or query for the absence or presence of arbitrary combinations of motifs.

• Given a training set, induce a classifier able to assign novel protein sequences to one of the protein families represented in the training set

• Once trained, the classifier will be able to assign novel proteins to specific functional families based on its knowledge of the training set

• Input data
– From the Prosite database, which contains over 1100 entries. Each entry describes a function shared by some proteins. In the experiment, one Prosite documentation entry corresponded to a protein class, and each protein class could, in turn, be characterized by one or more motif patterns/profiles. Only motifs considered significant matches by profileScan were chosen.
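One plausible way to turn motif matches into classifier input is a binary presence/absence vector over the set of motifs. The slides do not spell out the feature encoding, so the sketch below is an assumption; the protein names and motif labels are hypothetical placeholders, not real Prosite identifiers.

```python
def motif_vocabulary(proteins):
    """Collect every motif seen in the data set, in a fixed order."""
    return sorted({motif for hits in proteins.values() for motif in hits})

def encode_protein(hits, vocabulary):
    """Return 1 for each motif present in the protein, else 0."""
    return [1 if motif in hits else 0 for motif in vocabulary]

# Hypothetical motif hits per protein (illustration only).
proteins = {
    "protein_1": {"motif_A", "motif_C"},
    "protein_2": {"motif_B"},
    "protein_3": {"motif_A", "motif_B", "motif_C"},
}

vocabulary = motif_vocabulary(proteins)
features = {name: encode_protein(hits, vocabulary) for name, hits in proteins.items()}
```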

• DBNN was used as the classifier.

• 585 proteins, each belonging to one of ten classes, were used; subsets of varying size were picked at random to become the training set.

• Once the DBNN was trained, all 585 proteins were used as the test set to determine accuracy

With only 10% of the total training samples, a DBNN could be constructed to classify proteins with 95% accuracy.
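The evaluation protocol described above (train on a random subset, test on the full set) can be sketched as follows. A simple nearest-neighbour rule stands in for the DBNN, which is not reproduced here; the function names and the toy data are assumptions for illustration.

```python
import random

def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbour_predict(train, x):
    """Stand-in classifier (NOT the DBNN): label of the closest training vector."""
    return min(train, key=lambda item: hamming(item[0], x))[1]

def evaluate(dataset, fraction, seed=0):
    """Train on a random fraction of the data, then test on all of it."""
    rng = random.Random(seed)
    n_train = max(1, round(fraction * len(dataset)))
    train = rng.sample(dataset, n_train)
    correct = sum(nearest_neighbour_predict(train, x) == label for x, label in dataset)
    return correct / len(dataset)

# Toy data: (binary feature vector, class label), made up for illustration.
toy = [
    ((1, 0, 0, 0), "family_1"),
    ((1, 1, 0, 0), "family_1"),
    ((0, 0, 1, 0), "family_2"),
    ((0, 0, 1, 1), "family_2"),
]
full_accuracy = evaluate(toy, 1.0)
half_accuracy = evaluate(toy, 0.5)
```

Because the test set contains the training samples, accuracy measured this way rises toward 100% as the training fraction grows.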

• The number of false positives generated by the DBNN was significantly lower than the number resulting from a Prosite search.

• As the training set size approaches 100% of the data set, the number of false positives produced by the DBNN approaches zero.

Figure: The number of false positives produced by DBNNs trained on training sets of different sizes.

• A second data set of 73 protein sequences drawn from five classes was used to build a DBNN classifier

• With classifiers built from randomly sized training sets, accuracy exceeded 96% when the training set contained 22 or more sequences

• Once the training set contained more than 80% of the data set (58 or more sequences), all sequences were correctly predicted

Figure: Result of classifying proteins containing common motifs

Future Work

• The ultimate solution to “protein folding” will probably be a hybrid

• Neural networks are likely to be included due to their successful application to related problems
– Secondary structure
– Solvent accessibility
– Distance between residues in final structure
– Protein interface recognition

• In addition, neural nets can combine knowledge from multiple sources

Bibliography

• B. Rost. “Neural networks for protein structure prediction: hype or hit?” Artificial intelligence and heuristic methods for bioinformatics (2003): 34-50.

• S.N.V. Arjunan, S. Deris, R.M. Illias. “Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network.” ICAIET Proceedings (2002): 554-560.

• N.M. Zaki, S. Deris, S.N.V. Arjunan. “Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory.” Journal of Theoretics 5-1 (2003).

Background information

• S.N.V. Arjunan, S. Deris, R.M. Illias. “Prediction of Protein Secondary Structure.” Jurnal Teknologi 35(C) (2001): 81-90.

• T. Wessels, C.W. Omlin. “Refining Hidden Markov Models with Recurrent Neural Networks.”
