combining simulation and machine learning to recognize ...combining simulation and machine learning...

Post on 12-Jun-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Combining simulation andCombining simulation andmachine learning to recognizemachine learning to recognizefunction in 4Dfunction in 4D

Russ B. AltmanRuss B. Altman, MD, PhD, MD, PhD

Departments of Bioengineering,Departments of Bioengineering,Genetics, Medicine, Computer ScienceGenetics, Medicine, Computer ScienceStanford UniversityStanford University

Biological MotivationsBiological Motivations

Structural genomics and predictionStructural genomics and prediction(structures without functions)(structures without functions)

Need to label putative molecularNeed to label putative molecularfunctions of new structuresfunctions of new structures

Need to discover new molecularNeed to discover new molecularfunctionsfunctions

Increasing examples ofIncreasing examples of polyfunctionalpolyfunctionalproteins.proteins.

OutlineOutline

Introduction to FEATURE &Introduction to FEATURE &WebFEATUREWebFEATURE

UsingUsing molecular molecular simulation tosimulation toimprove FEATUREimprove FEATURE

Protein Function Prediction &Protein Function Prediction &AnnotationAnnotation

FunctionPrediction

MethodProtein

Structures

X-Ray

NMR

StructurePredictions

ATP binding site

Calcium binding site

Radial MicroenvironmentRadial Microenvironment

ASP

ASP

FEATURE TrainingFEATURE Training

SitesSites Non-SitesNon-Sites

SignificanceSignificance

ScoreScore

CorrespondingCorrespondingVolumesVolumes

Histogram ofHistogram ofObserved valuesObserved values

FEATURE vectorsFEATURE vectors

4444 vectors of physical/chemical featuresvectors of physical/chemical features Counted in 6 shellsCounted in 6 shells 44 x 6 = 264 feature-shell combinations44 x 6 = 264 feature-shell combinations

that summarize the abundance of athat summarize the abundance of afeature in a shell (e.g. # of feature in a shell (e.g. # of oxygens oxygens ininshell that is 2-3 Angstromsshell that is 2-3 Angstroms from center)from center)

Vector = 264 continuousVector = 264 continuous or discreteor discretevalues describing environment aroundvalues describing environment aroundsite center.site center.

Example: Calcium binding siteExample: Calcium binding site

FEATUREFEATURECalciumCalciumModelModel

Calcium Model

Property Name

Significant presenceSignificant absence

AtomName

ChemicalGroup

AtomicProperties

ResidueName

Abundance ofASP amino acid

Challenge:Challenge:Create a library of modelsCreate a library of modelsavailable to FEATUREavailable to FEATURE

Solution:Solution:Use 1D sequence motifs asUse 1D sequence motifs as““seedsseeds”” for 3D motifs for 3D motifs

Shirley Wu

SeqFEATURESeqFEATURE

Build from Sequence Motif DatabasesBuild from Sequence Motif Databases

AutomaticallyAutomatically creates 3D motifs from 1D creates 3D motifs from 1Dsequence motifssequence motifs

Hypothesis: Hypothesis: 3D motifs perform better3D motifs perform betterthan 1D motifs in identifying functionalthan 1D motifs in identifying functionalsitessites

Extracting 3D motif examplesExtracting 3D motif examplesfrom 1D motiffrom 1D motif

Extracting non-sites forExtracting non-sites forbackground statisticsbackground statistics

Random amino acid with similar atomRandom amino acid with similar atomdensity and outside the site areadensity and outside the site area

SeqFEATURESeqFEATURE

Training Set

Sitespdbid (x y z)pdbid (x y z)…

Non-Sitespdbid (x y z)pdbid (x y z)…

FEATURE training

ProteinStructure

Model

FunctionalSites

Hits(x y z) score(x y z) score…

FEATURE scan

SequenceMotifs

SeqFEATURE

Site Selection

Non-site Selection

[LIVM]AS..T..H.[AD]..[RK]

(seq)FEATURE performs better whensequence and structural similarity are low

PROSITE other

1Z84: Galt-like proteinCutoffz-scoreSiteSeqFEATURE Model

3.0712.795CYS 66ZINC_FINGER_C2H2_1.3.CYS.SG3.1774.713CYS 63ZINC_FINGER_C2H2_1.1.CYS.SG

C2H2 type Zinc fingerSeqFEATURENucleotidyltransferase3-D templates

PredictionMethod

ADP-glucose phosphorylaseSSMNonePfamNonePROSITE

2.7954.713

1Z84: Galt-like protein

2GLI(zinc-finger protein)

1Z84(prediction)

back

Beta-lactamase prediction (left)

Go to other slides

top related