TRANSCRIPT
Some contributions of machine learning to bioinformatics
Jean-Philippe Vert
[email protected]
Mines ParisTech / Institut Curie / Inserm
Université de Provence, Marseille, France, March 10, 2009.
Jean-Philippe Vert (ParisTech-Curie) Machine learning in bioinformatics 1 / 57
Where I come from
A joint lab about “Cancer computational genomics, bioinformatics, biostatistics and epidemiology”, located in the Institut Curie, a major hospital and cancer research institute in Europe
”Statistical machine learning for cancer informatics” team
Main topics
Towards better diagnosis, prognosis, and personalized medicine
Supervised classification of genomic, transcriptomic, proteomic data; heterogeneous data integration
Towards new drug targets
Systems biology, reconstruction of gene networks, pathway enrichment analysis, multidimensional phenotyping of cell populations.
Towards new drugs
Ligand-based virtual screening, in silico chemogenomics.
Towards personalized medicine: Diagnosis/prognosis from genome/transcriptome
From Golub et al., Science, 1999.
Towards new drug targets: Inference of biological networks
From Mordelet and Vert, Bioinformatics, 2008.
Towards new drugs: Ligand-Based Virtual Screening and QSAR
[Figure: six example molecules labeled active or inactive]
NCI AIDS screen results (from http://cactus.nci.nih.gov).
Pattern recognition, aka supervised classification
Challenges
High dimension
Few samples
Structured data
Prior knowledge
Fast and scalable implementations
Outline
1 Supervised classification of genomic data
2 Inference of biological networks
3 Virtual screening and chemogenomics
4 Conclusion
Motivation
Goal
Design a classifier to automatically assign a class to future samples from their expression profile
Interpret biologically the differences between the classes
Difficulty
Large dimension
Few samples
Linear classifiers
The modelEach sample is represented by a vector x = (x1, . . . , xp)
Goal: estimate a linear function:
fβ(x) = ∑_{i=1}^{p} βi xi + β0 .
Interpretability: the weight βi quantifies the influence of feature i (but...)
Linear classifiers
Training the model
fβ(x) = ∑_{i=1}^{p} βi xi + β0 .
Minimize an empirical risk on the training samples:
min_{β∈R^{p+1}} Remp(β) = (1/n) ∑_{i=1}^{n} l(fβ(xi), yi) ,
... subject to some constraint on β, e.g.:
Ω(β) ≤ C .
Example: Norm Constraints
The approach
A common method in statistics to learn with few samples in high dimension is to constrain the Euclidean norm of β
Ωridge(β) = ‖β‖₂² = ∑_{i=1}^{p} βi² ,
(ridge regression, support vector machines...)
Pros
Good performance in classification
Cons
Limited interpretation (small weights)
No prior biological knowledge
Example: Feature Selection
The approach
Constrain most weights to be 0, i.e., select a few genes (< 100) whose expression is sufficient for classification.
Greedy feature selection (T-tests, ...)
Constrain the norm of β: LASSO penalty (‖β‖₁ = ∑_{i=1}^{p} |βi|), elastic net penalty (‖β‖₁ + ‖β‖₂), ...
Pros
Good performance in classification
Biomarker selection
Interpretability
Cons
The gene selection process is usually not robust
No use of prior biological knowledge
Why LASSO leads to sparse solutions
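The sparsity mechanism can be seen in a small simulation. Below is a minimal sketch (not from the talk): a LASSO-penalized least-squares fit via proximal gradient descent (ISTA) with soft-thresholding, on made-up synthetic data where only 5 of 50 features carry signal; the fitted weight vector comes out sparse.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks each coordinate toward 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1 by proximal gradient."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
n, p = 100, 50
beta_true = np.zeros(p)
beta_true[:5] = 2.0                      # only 5 informative features
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=0.2)
print("non-zero weights:", np.sum(np.abs(beta_hat) > 1e-8))
```

With a Euclidean (ridge) constraint instead, every coordinate would typically stay non-zero: the soft-thresholding step is what sets weights exactly to zero.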
Incorporating prior knowledge
The idea
If we have specific prior knowledge about the “correct” weights, it can be included in Ω in the constraint:
Minimize Remp(β) subject to Ω(β) ≤ C .
If we design a convex function Ω, then the algorithm boils down to a convex optimization problem (usually easy to solve).
Similar to priors in Bayesian statistics
Example: CGH array classification
Motivation
Comparative genomic hybridization (CGH) data measure the DNA copy number along the genome
Very useful, in particular in cancer research
Can we classify CGH arrays for diagnosis or prognosis purposes?
Example: CGH array classification
Prior knowledge
Let x be a CGH profile
We focus on linear classifiers, i.e., the sign of:
f(x) = xᵀβ .
We expect β to be
sparse: only a few positions should be discriminative
piecewise constant: within a region, all probes should contribute equally
Example: CGH array classification
A solution (Rapaport et al., 2008)
Ωfusedlasso(β) = ∑_i |βi| + ∑_{i∼j} |βi − βj| .
Good performance on diagnosis for bladder cancer, and prognosis for melanoma.
More interpretable classifiers
[Two plots of classifier weight as a function of BAC position along the genome]
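To make the penalty concrete, here is a tiny illustration (not from Rapaport et al.): evaluating Ωfusedlasso on a piecewise-constant weight profile, of the kind expected on CGH data, and on an oscillating profile with the same L1 norm; the piecewise-constant one is penalized less.

```python
import numpy as np

def fused_lasso_penalty(beta):
    """Omega(beta) = sum_i |beta_i| + sum over adjacent probes |beta_i - beta_{i+1}|."""
    return np.sum(np.abs(beta)) + np.sum(np.abs(np.diff(beta)))

beta_piecewise = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])  # one amplified region
beta_spiky     = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # same L1 norm, oscillating

print(fused_lasso_penalty(beta_piecewise))  # 5.0
print(fused_lasso_penalty(beta_spiky))      # 8.0
```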
Example: finding discriminant modules in gene networks
The problem
Classification of gene expression: too many genes
A gene network is given (PPI, metabolic, regulatory, signaling, co-expression...)
We expect that “clusters of genes” (modules) in the network contribute similarly to the classification
Example: finding discriminant modules in gene networks
Prior hypothesis
Genes near each other on the graph should have similar weights.
Two solutions (Rapaport et al., 2007, 2008)
Ωspectral(β) = ∑_{i∼j} (βi − βj)² ,
Ωgraphfusion(β) = ∑_{i∼j} |βi − βj| + ∑_i |βi| .
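The quadratic penalty Ωspectral is exactly the graph-Laplacian quadratic form βᵀLβ, which gives a convenient way to compute it. A minimal sketch on a toy graph (the graph and weights below are made up for illustration):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]   # a small toy gene network
beta = np.array([1.0, 1.0, 0.0, 0.0])      # weights on the 4 genes

# Direct evaluation: sum over edges of (beta_i - beta_j)^2
omega_spectral = sum((beta[i] - beta[j]) ** 2 for i, j in edges)

# Same value through the graph Laplacian L = D - A
p = len(beta)
A = np.zeros((p, p))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
assert np.isclose(beta @ L @ beta, omega_spectral)

# The graph-fusion penalty adds an L1 sparsity term on top of the fusion term
omega_graphfusion = sum(abs(beta[i] - beta[j]) for i, j in edges) + np.abs(beta).sum()
print(omega_spectral, omega_graphfusion)
```

Note how the weight vector that is constant on the module {0, 1} and zero elsewhere pays only for the two edges that cross the module boundary.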
Example: finding discriminant modules in gene networks (Rapaport et al.)
[Figure: global KEGG network map with annotated modules, including glycan biosynthesis, protein kinases, DNA and RNA polymerase subunits, glycolysis/gluconeogenesis, sulfur metabolism, porphyrin and chlorophyll metabolism, riboflavin metabolism, folate biosynthesis, biosynthesis of steroids and ergosterol metabolism, lysine biosynthesis, phenylalanine/tyrosine/tryptophan biosynthesis, purine metabolism, oxidative phosphorylation and TCA cycle, and nitrogen/asparagine metabolism]
Fig. 4. Global connection map of KEGG with mapped coefficients of the decision function obtained by applying a customary linear SVM (left) and using high-frequency eigenvalue attenuation (80% of high-frequency eigenvalues have been removed) (right). Spectral filtering divided the whole network into modules having coordinated responses, with the activation of low-frequency eigen modes being determined by microarray data. Positive coefficients are marked in red, negative coefficients are in green, and the intensity of the colour reflects the absolute values of the coefficients. Rhombuses highlight proteins participating in the Glycolysis/Gluconeogenesis KEGG pathway. Some other parts of the network are annotated including big highly connected clusters corresponding to protein kinases and DNA and RNA polymerase subunits.
5 DISCUSSION
Our algorithm groups predictor variables according to highly connected "modules" of the global gene network. We assume that the genes within a tightly connected network module are likely to contribute similarly to the prediction function because of the interactions between the genes. This motivates the filtering of gene expression profiles to remove the noisy high-frequency modes of the network.
Such grouping of variables is a very useful feature of the resulting classification function because the function becomes meaningful for interpreting and suggesting biological factors that cause the class separation. This allows classifications based on functions, pathways and network modules rather than on individual genes. This can lead to a more robust behaviour of the classifier in independent tests and to equal if not better classification results. Our results on the dataset we analysed show only a slight improvement, although this may be due to its limited size. Therefore we are currently extending our work to larger data sets.
An important remark to bear in mind when analyzing pictures such as fig. 4 and 5 is that the colors represent the weights of the classifier, and not gene expression levels. There is of course a relationship between the classifier weights and the typical expression levels of genes in irradiated and non-irradiated samples: irradiated samples tend to have expression profiles positively correlated with the classifier, while non-irradiated samples tend to be negatively correlated. Roughly speaking, the classifier tries to find a smooth function that has this property. If more samples were available, a better non-smooth classifier might be learned by the algorithm, but constraining the smoothness of the classifier is a way to reduce the complexity of the learning problem when a limited number of samples are available. This means in particular that the pictures provide virtually no information regarding the over-
Example: finding discriminant modules in gene networks
Prior hypothesis
Genes near each other on the graph should have non-zero weights (i.e., the support of β should be made of a few connected components).
Two solutions?
Ωintersection(β) = ∑_{i∼j} √(βi² + βj²) ,
Ωunion(β) = sup_{α∈R^p : ∀i∼j, αi² + αj² ≤ 1} αᵀβ .
Example: finding discriminant modules in gene networks
Groups (1, 2) and (2, 3). Left: Ωintersection(β). Right: Ωunion(β). Vertical axis is β2.
Outline
1 Supervised classification of genomic data
2 Inference of biological networks
3 Virtual screening and chemogenomics
4 Conclusion
Biological networks
Our goal
Data
Gene expression,
Gene sequence,
Protein localization, ...
Graph
Protein-protein interactions,
Metabolic pathways,
Signaling pathways, ...
More precisely
“De novo” inference
Given data about individual genes and proteins
Infer the edges between genes and proteins
“Supervised” inference
Given data about individual genes and proteins
and given some known interactions
infer unknown interactions
Supervised inference by pattern recognition
Formulation and basic issue
A pair can be connected (1) or not connected (-1)
From the known subgraph we can extract examples of connected and non-connected pairs
However the genomic data characterize individual proteins; we need to work with pairs of proteins instead!
[Figure: a known graph over proteins 1–4 next to their genomic data; the known subgraph yields labeled pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)]
Tensor product SVM (Ben-Hur and Noble, 2006)
Intuition: a pair (A, B) is similar to a pair (C, D) if:
A is similar to C and B is similar to D, or...
A is similar to D and B is similar to C
Formally, define a similarity between pairs from a similarity between individuals by
KTPPK((a, b), (c, d)) = K(a, c)K(b, d) + K(a, d)K(b, c) .
If K is a positive definite kernel for individuals then KTPPK is a p.d. kernel for pairs which can be used by an SVM
This amounts to representing a pair (a, b) by the symmetrized tensor product:
(a, b) → (a ⊗ b) ⊕ (b ⊗ a) .
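A minimal sketch of the TPPK computation from a precomputed base kernel matrix (the random features are made up, and `tppk` is an illustrative name, not the authors' code):

```python
import numpy as np

def tppk(K, a, b, c, d):
    """Tensor product pairwise kernel: K(a,c)K(b,d) + K(a,d)K(b,c)."""
    return K[a, c] * K[b, d] + K[a, d] * K[b, c]

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))   # feature vectors for 6 proteins
K = X @ X.T                       # a base linear kernel on individuals

# Similarity between the pairs (0, 1) and (2, 3)
print(tppk(K, 0, 1, 2, 3))
```

By construction the value is unchanged when either pair is flipped, matching the unordered nature of interactions.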
Metric learning pairwise SVM (V. et al, 2007)
Intuition: a pair (A, B) is similar to a pair (C, D) if:
A − B is similar to C − D, or...
A − B is similar to D − C.
Formally, define a similarity between pairs from a similarity between individuals by
KMLPK((a, b), (c, d)) = (K(a, c) + K(b, d) − K(a, d) − K(b, c))² .
If K is a positive definite kernel for individuals then KMLPK is a p.d. kernel for pairs which can be used by an SVM
This amounts to representing a pair (a, b) by the symmetrized difference:
(a, b) → (a − b)⊗2 .
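A small numerical check (not the authors' code): for a linear base kernel, KMLPK((a, b), (c, d)) equals the squared inner product ⟨a − b, c − d⟩², consistent with the symmetrized-difference representation; the sketch verifies this on made-up features.

```python
import numpy as np

def mlpk(K, a, b, c, d):
    """Metric learning pairwise kernel: (K(a,c) + K(b,d) - K(a,d) - K(b,c))^2."""
    return (K[a, c] + K[b, d] - K[a, d] - K[b, c]) ** 2

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
K = X @ X.T   # linear base kernel

# For a linear kernel this equals the squared inner product of the differences
lhs = mlpk(K, 0, 1, 2, 3)
rhs = np.dot(X[0] - X[1], X[2] - X[3]) ** 2
print(lhs, rhs)
```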
Supervised inference with local models
The idea (Bleakley et al., 2007)
Motivation: define specific models for each target node to discriminate between its neighbors and the others
Treat each node independently from the others. Then combine predictions for ranking candidate edges.
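The local-model idea can be sketched on toy data. In the following illustration the features and graph are made up, and a simple difference-of-means discriminant stands in for the SVMs of Bleakley et al.; one local scorer is trained per node, and the two directional scores are combined to rank candidate edges.

```python
import numpy as np

# Toy genomic features for 6 genes forming two clusters
X = np.array([[1.0, 0.0], [1.1, 0.0], [0.9, 0.1],
              [0.0, 1.0], [0.1, 0.9], [0.0, 1.1]])
known_edges = {(0, 1), (1, 2), (3, 4), (4, 5)}

def neighbors(i):
    return {j for (a, b) in known_edges for j in (a, b)
            if i in (a, b) and j != i}

def local_score(i, j):
    """Score candidate neighbor j with the local model of node i
    (difference-of-means discriminant: positives = known neighbors of i)."""
    pos = list(neighbors(i))
    neg = [k for k in range(len(X)) if k != i and k not in pos]
    w = X[pos].mean(axis=0) - X[neg].mean(axis=0)
    return float(w @ X[j])

def edge_score(i, j):
    """Combine the two directional predictions."""
    return 0.5 * (local_score(i, j) + local_score(j, i))

# The within-cluster candidate (0, 2) should outrank the across-cluster one (0, 5)
print(edge_score(0, 2), edge_score(0, 5))
```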
Results: protein-protein interaction (yeast)
[ROC curves (ratio of true positives vs. ratio of false positives) for the Direct, kML, kCCA, em, local, and Pkernel methods]
(from Bleakley et al., 2007)
Results: metabolic gene network (yeast)
[ROC curves (ratio of true positives vs. ratio of false positives) for the Direct, kML, kCCA, em, local, and Pkernel methods]
(from Bleakley et al., 2007)
Results: regulatory network (E. coli)
[ROC and precision-recall curves for CLR, SIRENE, and SIRENE-Bias]

Method              Recall at 60%   Recall at 80%
SIRENE              44.5%           17.6%
CLR                 7.5%            5.5%
Relevance networks  4.7%            3.3%
ARACNe              1%              0%
Bayesian network    1%              0%
SIRENE = Supervised Inference of REgulatory NEtworks (Mordelet and V., 2008)
Results: predicted regulatory network (E. coli)
Prediction at 60% precision, restricted to transcription factors (from Mordelet and V., 2008).
Outline
1 Supervised classification of genomic data
2 Inference of biological networks
3 Virtual screening and chemogenomics
4 Conclusion
Virtual screening
Objective
Build models to predict biochemical properties of small molecules from their structures.
Structures
C15H14ClN3O3
Properties
binding to a therapeutic target,
pharmacokinetics (ADME),
toxicity...
Ligand-Based Virtual Screening and QSAR
[Figure: six example molecules labeled active or inactive]
NCI AIDS screen results (from http://cactus.nci.nih.gov).
Formalization
The problem
Given a set of training instances (x1, y1), . . . , (xn, yn), where the xi's are graphs and the yi's are continuous or discrete variables of interest,
Estimate a function
y = f(x)
where x is any graph to be labeled.
This is a classical regression or pattern recognition problem over the set of graphs.
Classical approaches
Two steps
1 Map each molecule to a vector of fixed dimension using molecular descriptors
Global properties of the molecules (mass, logP...)
2D and 3D descriptors (substructures, fragments, ...)
2 Apply an algorithm for regression or pattern recognition.
PLS, ANN, ...
Example: 2D structural keys
Which descriptors?
Difficulties
Many descriptors are needed to characterize various features (in particular for 2D and 3D descriptors)
But too many descriptors are harmful for memory storage, computation speed, and statistical estimation
Kernels
Definition
Let Φ(x) = (Φ1(x), . . . , Φp(x)) be a vector representation of the molecule x
The kernel between two molecules is defined by:
K(x, x′) = Φ(x)ᵀΦ(x′) = ∑_{i=1}^{p} Φi(x)Φi(x′) .
[Figure: map φ from the input space X to the feature space H]
The kernel trick
[Figure: map φ from the input space X to the feature space H]
K(x, x′) = Φ(x)ᵀΦ(x′)
The trick
Many linear algorithms for regression or pattern recognition can be expressed only in terms of inner products between vectors
Computing the kernel is often more efficient than computing Φ(x), especially in high or infinite dimensions!
Example: 2D fragment kernel
[Figure: a molecule and the linear fragments extracted from it]
φd(x) is the vector of counts of all fragments of length d:
φ1(x) = (#(C), #(O), #(N), ...)ᵀ
φ2(x) = (#(C-C), #(C=O), #(C-N), ...)ᵀ etc...
The 2D fragment kernel is defined, for λ < 1, by
Kfragment(x, x′) = ∑_{d=1}^{∞} λᵈ φd(x)ᵀφd(x′) .
Example: 2D fragment kernel
In practice
Kfragment can be computed efficiently (geometric kernel, random walk kernel...) although the feature space has infinite dimension.
Increasing the specificity of atom labels improves performance
Selecting only “non-tottering” fragments can be done efficiently and improves performance.
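A minimal sketch of the fragment (walk-count) kernel on toy labeled graphs, with the infinite sum truncated at a maximum depth. Real implementations compute this implicitly rather than enumerating walks, and the two tiny molecules below are made up for illustration.

```python
from collections import Counter

def walk_counts(labels, adj, d):
    """Count the label sequences of all walks visiting d vertices."""
    counts = Counter()
    def extend(path):
        if len(path) == d:
            counts[tuple(labels[v] for v in path)] += 1
            return
        for nxt in adj[path[-1]]:
            extend(path + [nxt])
    for v in range(len(labels)):
        extend([v])
    return counts

def fragment_kernel(mol_x, mol_y, lam=0.5, max_d=4):
    """K(x, x') = sum_d lam^d <phi_d(x), phi_d(x')>, truncated at max_d."""
    k = 0.0
    for d in range(1, max_d + 1):
        cx = walk_counts(*mol_x, d)
        cy = walk_counts(*mol_y, d)
        k += lam ** d * sum(cx[s] * cy[s] for s in cx)
    return k

# Toy molecules as (atom labels, adjacency lists)
mol_a = (["C", "C", "O"], [[1], [0, 2], [1]])   # C-C-O
mol_b = (["C", "O"], [[1], [0]])                # C-O

print(fragment_kernel(mol_a, mol_b))
```

This toy version allows tottering walks (immediate backtracking, e.g. C→O→C), which is exactly what the “non-tottering” refinement removes.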
Example: 2D subtree kernel
[Figure: subtree patterns of atoms (C, N, O) extracted from a molecular graph]
2D Subtree vs fragment kernels (Mahé and V, 2007)
[Bar plot: AUC (roughly 70–80) of walk vs. subtree kernels on each of the 60 NCI cancer cell line screens (HL-60, K-562, MOLT-4, MCF7, ...)]
Screening of inhibitors for 60 cancer cell lines (from Mahé and V., 2008)
Example: 3D pharmacophore kernel (Mahé et al., 2005)
[Figure: two molecules with pharmacophores, i.e., triplets of atoms with pairwise distances d1, d2, d3]
K(x, y) = ∑_{px∈P(x)} ∑_{py∈P(y)} exp(−γ d(px, py)) .
Results (accuracy)
Kernel             BZR   COX   DHFR  ER
2D (Tanimoto)      71.2  63.0  76.9  77.1
3D fingerprint     75.4  67.0  76.9  78.6
3D not discretized 76.4  69.8  81.9  79.8
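A simplified sketch of the pharmacophore kernel (not the paper's implementation): each pharmacophore is reduced to its three inter-point distances, feature types are ignored, and d(px, py) is taken to be the L1 distance between the distance triplets; the data are made up.

```python
import numpy as np

def pharmacophore_kernel(P_x, P_y, gamma=1.0):
    """K(x, y) = sum over pharmacophore pairs of exp(-gamma * d(p_x, p_y))."""
    return sum(np.exp(-gamma * np.abs(px - py).sum())
               for px in P_x for py in P_y)

# Each pharmacophore: the three pairwise distances (d1, d2, d3) of a point triplet
P_x = [np.array([2.0, 3.0, 4.0]), np.array([1.5, 2.5, 3.5])]
P_y = [np.array([2.0, 3.0, 4.1])]

print(pharmacophore_kernel(P_x, P_y))
```

Nearly matching triplets contribute close to 1, distant ones close to 0, so the kernel counts approximately shared pharmacophores.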
Chemogenomics
The problem
Similar targets bind similar ligands
Instead of focusing on each target individually, can we screen the biological space (target families) vs the chemical space (ligands)?
Mathematically, learn f(target, ligand) ∈ {bind, not bind}
Chemogenomics with SVM
Tensor product SVM
Take the kernel:
K((t, l), (t′, l′)) = Kt(t, t′) Kl(l, l′) .
Equivalently, represent a pair (t, l) by the vector φt(t) ⊗ φl(l)
Allows the use of any kernel for proteins Kt with any kernel for small molecules Kl
When Kt is the Dirac kernel, we recover the classical paradigm: each target is treated independently from the others.
Otherwise, information is shared across targets. The more similar the targets, the more they share information.
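The tensor product construction can be sketched directly: the Gram matrix over all (target, ligand) pairs is the Kronecker product of the target and ligand Gram matrices (toy random features; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 5))   # features for 3 targets
Lg = rng.standard_normal((4, 6))  # features for 4 ligands

Kt = T @ T.T                      # kernel between targets
Kl = Lg @ Lg.T                    # kernel between ligands

# Gram matrix over all 12 (target, ligand) pairs, ordered target-major
pairs = [(i, j) for i in range(3) for j in range(4)]
K = np.array([[Kt[i, i2] * Kl[j, j2] for (i2, j2) in pairs]
              for (i, j) in pairs])

# This is exactly the Kronecker product of the two Gram matrices
assert np.allclose(K, np.kron(Kt, Kl))

# With a Dirac kernel on targets, pairs from different targets become orthogonal
K_dirac = np.kron(np.eye(3), Kl)
print(K.shape, K_dirac.shape)
```

The block-diagonal structure of `K_dirac` is the "each target treated independently" regime; a non-trivial Kt fills in the off-diagonal blocks and shares information across targets.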
Example: MHC-I epitope prediction across different alleles
The approach (Jacob and V., 2007)
take a kernel to compare different MHC-I alleles (e.g., based on the amino acids in the peptide recognition pocket)
take a kernel to compare different epitopes (9-mer peptides)
Combine them to learn the f(allele, epitope) function
State-of-the-art performance
Available at http://cbio.ensmp.fr/kiss
Generalization: collaborative filtering with attributes
General problem: learn f(x, y) with a kernel Kx for x and a kernel Ky for y.
SVM with a tensor product kernel Kx ⊗ Ky is a particular case of something more general: estimating an operator with a spectral regularization.
Other spectral regularizations are possible (e.g., trace norm) and lead to efficient algorithms
More details in Abernethy et al. (2008).
Outline
1 Supervised classification of genomic data
2 Inference of biological networks
3 Virtual screening and chemogenomics
4 Conclusion
Conclusion
Modern machine learning methods for regression / classification lend themselves well to the integration of prior knowledge in the penalization / regularization function.
Inference of biological networks can be formulated in the framework of pattern recognition.
Kernel methods (e.g., SVM) make it possible to manipulate complex objects (e.g., molecules, biological sequences) as soon as kernels can be defined and computed.
People I need to thank
Including prior knowledge in penalization
Franck Rapaport, Emmanuel Barillot, Andrei Zynoviev, Christian Lajaunie, Yves Vandenbrouck, Nicolas Foveau...
Virtual screening, kernels etc.
Pierre Mahé, Laurent Jacob, Liva Ralaivola, Véronique Stoven, Brice Hoffman, Martial Hue, Francis Bach, Jacob Abernethy, Theos Evgeniou...
Network inference
Kevin Bleakley, Fantine Mordelet, Yoshihiro Yamanihi, Gérard Biau, Minoru Kanehisa, William Noble, Jian Qiu...