TRANSCRIPT
Protein Subcellular Localization Prediction Based on Internal Micro-similarities of Markov Chains
Markov Chains Latent Representation by Micro-similarities
25 July 2019
Asem Alaa¹, Ayman Eldeib¹, Ahmed A. Metwally¹,²
¹ Systems and Biomedical Engineering Department, Cairo University
² Department of Genetics, Stanford University
Background: Protein Localization
Protein Localization: the biological process of accumulation of a protein at a given site. Other names: Protein Targeting or Protein Sorting.
(Source: www.uniprot.org, June 2019)
Uncovering protein localization is essential for target identification in drug discovery.
Background: The current prediction methods
+158 million protein entries.
Strong demand for computational prediction methods.
+90 computational methods for predicting localization.
Methods by category
Rule-based systems (if-else).
Machine learning.
Common drawbacks
Poor classification performance.
Loss of generality.
Lack of interpretability.
What is interpretability? “Interpretability is the degree to which a human can understand the cause of a decision.” (Miller, 2018)
Why does interpretability matter? The goal is not only to predict which location a protein is transported to, but also why it is transported there.
Our objectives
Study Aims:
Enhance the classification performance of current methods.
Keep it interpretable!
Findings:
Generative models: learn representations.
Discriminative models: classify representations.
Markov models are interpretable generative models.
Idea: combine a generative model with a discriminative model.
Now we have the advantages of both interpretability and discrimination power.
Table of contents
1 Introduction
2 Methodology
3 Experiments & Results
4 Conclusion
Introduction
Background: The classification problem of protein localization
In ML notation:
Let 𝑥 be a protein sequence. Let 𝑦 be a subcellular site.
A classification tool estimates 𝑓(𝑥) = 𝑝(𝑦|𝑥) (ideally a probabilistic function). For example, for three candidate sites (shown as organelle icons on the original slide):
𝑓(𝑥) = 0.6 for 𝑦 = site 1, 0.3 for 𝑦 = site 2, 0.1 for 𝑦 = site 3.
Markov chains
𝑥 represents a given sequence of amino acids of length 𝐿: 𝑥 = 𝑎1𝑎2 … 𝑎𝑖 … 𝑎𝐿
𝑛 amino acids (typically 𝑛 = 20): 𝑎𝑖 ∈ 𝑆 and 𝑆 = {𝑠1, 𝑠2, …, 𝑠𝑖, …, 𝑠𝑛}
𝑚-order, time-homogeneous Markov chains (by default first-order, 𝑚 = 1):
𝑃(𝑎𝑖|𝑎1, 𝑎2, … , 𝑎𝑖−1) = 𝑃(𝑎𝑖|𝑎𝑖−1)
Markov chains are represented by a Transition Matrix. A transition matrix of first order (𝑚 = 1) with 20 states (𝑛 = 20), rows indexed by the previous state and columns by the next state:

𝑇 =
         𝑠1           𝑠2           …    𝑠20
𝑠1       𝑝(𝑠1|𝑠1)    𝑝(𝑠2|𝑠1)    …    𝑝(𝑠20|𝑠1)
𝑠2       𝑝(𝑠1|𝑠2)    𝑝(𝑠2|𝑠2)    …    𝑝(𝑠20|𝑠2)
⋮        ⋮            ⋮            ⋱    ⋮
𝑠20      𝑝(𝑠1|𝑠20)   𝑝(𝑠2|𝑠20)   …    𝑝(𝑠20|𝑠20)
Fitting 1𝑠𝑡-order Markov chains (𝑚 = 1)
Constructing the Transition Matrix
Let 𝒟 = {(𝑥𝑖, 𝑦𝑖) | 𝑖 = 1 ∶ 𝑁} be all training examples.
Let 𝒟𝑦 be the subset of training sequences labeled with location 𝑦 ∈ 𝒴, the set of subcellular locations.
The 20 × 20 parameters 𝑝(𝑠𝑖|𝑠𝑗, 𝑦) are computed by:
𝑝(𝑠𝑖|𝑠𝑗, 𝑦) = ∑𝑥∈𝒟𝑦 count(𝑥, 𝑠𝑗𝑠𝑖) / 𝑍
where 𝑍 is a normalization factor (so that each row sums to one). A counting sketch follows.
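A minimal Python/NumPy sketch of this counting step (illustrative only, not the repository implementation; the pseudocount smoothing is an added assumption, since the slides only mention the normalization factor 𝑍):

    import numpy as np

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                 # the 20 standard residues (n = 20)
    INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

    def fit_transition_matrix(sequences, pseudocount=1.0):
        """Return a 20x20 matrix T with T[j, i] estimating p(s_i | s_j, y)."""
        counts = np.full((20, 20), pseudocount)          # assumed smoothing: avoids empty rows
        for seq in sequences:
            for prev, curr in zip(seq, seq[1:]):         # count every bigram s_j s_i in x
                if prev in INDEX and curr in INDEX:
                    counts[INDEX[prev], INDEX[curr]] += 1
        return counts / counts.sum(axis=1, keepdims=True)   # Z: normalize each row to sum to 1

Fitting this separately on each 𝒟𝑦 yields one 20 × 20 profile per location.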
Traditional method of Markov chain inference
Given a Markov chain (20 × 20 parameters) trained on location 𝑦.
Given an unknown sequence 𝑥 = 𝑎1𝑎2 … 𝑎𝐿, the probability 𝑝(𝑥|𝑦) is:
𝑝(𝑥|𝑦) = 𝑝(𝑎2|𝑎1, 𝑦) 𝑝(𝑎3|𝑎2, 𝑦) … 𝑝(𝑎𝐿|𝑎𝐿−1, 𝑦)
Propensity Ω(𝑥|𝑦): to avoid underflow problems, we rely on log 𝑝 instead:
Ω(𝑥|𝑦) = ∑𝑖=2..𝐿 log 𝑝(𝑎𝑖|𝑎𝑖−1, 𝑦)
The 𝑦 that maximizes Ω(𝑥|𝑦) is reported as the prediction. (Yuan, 1999)
Nothing new here … yet.
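A sketch of this baseline decision rule, reusing INDEX and the per-location matrices from the previous sketch (the profiles dictionary is a hypothetical name):

    import numpy as np

    def propensity(seq, T):
        """Omega(x|y): sum of log transition probabilities of x under profile T."""
        idx = [INDEX[a] for a in seq if a in INDEX]
        return float(sum(np.log(T[j, i]) for j, i in zip(idx, idx[1:])))

    def predict_traditional(seq, profiles):
        """profiles: dict {location: 20x20 transition matrix}. Return the argmax location."""
        return max(profiles, key=lambda y: propensity(seq, profiles[y]))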
Methodology
Group Transition Matrix parameters into a set of probability distributions
Group the transition matrix rows as probability distributions:

𝑇 =
𝑝(𝑠1|𝑠1)     𝑝(𝑠2|𝑠1)     …   𝑝(𝑠20|𝑠1)      ← 𝑄1
𝑝(𝑠1|𝑠2)     𝑝(𝑠2|𝑠2)     …   𝑝(𝑠20|𝑠2)      ← 𝑄2
⋮            ⋮             ⋱   ⋮
𝑝(𝑠1|𝑠20)    𝑝(𝑠2|𝑠20)    …   𝑝(𝑠20|𝑠20)     ← 𝑄20

Alternative representation as 𝑛 probability distributions (𝑛 = 20):
𝐻 = {𝑄1, 𝑄2, … , 𝑄𝑛}
After fitting all training examples in the dataset across all classes, we get the Backbone Profiles ℋ, one profile per subcellular location:
ℋ = {𝐻1, … , 𝐻𝐶}
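For completeness, a one-function sketch of this restructuring; the rows of the fitted matrix already are the distributions 𝑄𝑗:

    def as_profile(T):
        """H = {Q_1, ..., Q_n}: each row of the transition matrix is one distribution."""
        return [T[j, :] for j in range(T.shape[0])]      # Q_j = p( . | s_j)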
Projecting a sequence 𝑥 against a backbone profile 𝐻𝑦
From 𝒟𝑦 ⊆ 𝒟, we already have a Transition Matrix 𝑇𝑦, with alternative structure 𝐻𝑦 = {𝑄𝑦1, … , 𝑄𝑦𝑛}.
From a given sequence 𝑥, we construct a Transition Matrix 𝑇𝑥, with alternative structure 𝐻𝑥 = {𝑄𝑥1, … , 𝑄𝑥𝑛}.
Let 𝜙 ∶ ℝ𝑛 × ℝ𝑛 ↦ ℝ measure the similarity between two probability distributions.
Compute the projection between 𝐻𝑥 and 𝐻𝑦 (here with 𝜙 = cos):
project(𝐻𝑥, 𝐻𝑦) = [cos(𝑄𝑥1, 𝑄𝑦1), … , cos(𝑄𝑥𝑛, 𝑄𝑦𝑛)]
Note: 𝑛 parameters for project(𝐻𝑥, 𝐻𝑦); a sketch is given below.
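A sketch of the projection with 𝜙 = cosine (illustrative; fit_transition_matrix is the earlier sketch, and the small epsilon is an added guard against an all-zero row):

    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two probability distributions (matrix rows)."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def project(H_x, H_y, phi=cosine):
        """Micro-similarities: compare corresponding rows of two 20x20 profiles."""
        return np.array([phi(q_x, q_y) for q_x, q_y in zip(H_x, H_y)])

    # Hypothetical usage for one query sequence x and the backbone profile of location y:
    # H_x = fit_transition_matrix([x])
    # f_xy = project(H_x, H_y)                           # vector of n = 20 values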
Similarity/dissimilarity function 𝜙(𝐴, 𝐵) selection
There is a myriad of similarity metrics in the literature.
Our experiments show stable performance for the cosine function across different datasets.
Cosine (similarity): (∑𝑖=1..𝑛 𝐴𝑖𝐵𝑖) / (√(∑𝑖=1..𝑛 𝐴𝑖²) · √(∑𝑖=1..𝑛 𝐵𝑖²))
Chi-squared (dissimilarity): ∑𝑖=1..𝑛 (𝐴𝑖 − 𝐵𝑖)² / 𝐴𝑖
Hellinger (dissimilarity): (1/√2) · √(∑𝑖=1..𝑛 (√𝐴𝑖 − √𝐵𝑖)²)
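Sketches of the two alternative dissimilarities named above; either can be passed as phi to the project function from the previous sketch (the epsilon in chi-squared is an added guard against division by zero):

    import numpy as np

    def chi_squared(a, b, eps=1e-12):
        """Chi-squared dissimilarity: sum_i (A_i - B_i)^2 / A_i."""
        return float(np.sum((a - b) ** 2 / (a + eps)))

    def hellinger(a, b):
        """Hellinger distance: (1/sqrt(2)) * sqrt(sum_i (sqrt(A_i) - sqrt(B_i))^2)."""
        return float(np.sqrt(np.sum((np.sqrt(a) - np.sqrt(b)) ** 2)) / np.sqrt(2))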
Projecting a sequence 𝑥 against backbone profiles ℋ
We apply project(𝐻𝑥, 𝐻𝑦) for every 𝑦 ∈ 𝒴.
Let 𝐶 = |𝒴| be the number of subcellular locations.
∴ a sequence 𝑥 has a latent representation:
𝔽𝑥 = [project(𝐻𝑥, 𝐻1) ∶ … ∶ project(𝐻𝑥, 𝐻𝐶)]
Note: 𝐶 × 𝑛 parameters for 𝔽𝑥.
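A sketch assembling 𝔽𝑥 from the earlier helpers; backbone is a hypothetical dict mapping each location to its profile 𝐻𝑦:

    import numpy as np

    def latent_representation(seq, backbone, phi=cosine):
        """Return F_x: projections against all C backbone profiles, concatenated."""
        H_x = fit_transition_matrix([seq])               # profile of the single query sequence
        return np.concatenate([project(H_x, H_y, phi) for H_y in backbone.values()])

With 𝐶 = 10 locations and 𝑛 = 20 states this yields a 200-dimensional vector.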
Second stage: Combine a discriminative classifier
What we are up to: transform 𝒟 into 𝒟̂.

𝒟:
𝑥 (sequence)     𝑦 (location)
𝑥1               𝑦1
𝑥2               𝑦2
⋮                ⋮
𝑥𝑁               𝑦𝑁

𝒟̂:
𝔽𝑥 (latent)      𝑦 (location)
𝔽𝑥1              𝑦1
𝔽𝑥2              𝑦2
⋮                ⋮
𝔽𝑥𝑁              𝑦𝑁

Finally, introduce 𝒟̂ to a secondary discriminative classification framework, e.g. KNN, Random Forest, LDA, SVM (sketched below).
To keep interpretability, the secondary classifier needs to be interpretable as well.
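A sketch of the second stage using scikit-learn; a plain multiclass random forest is an assumption here, whereas the experiments later use one-vs-all RF and SVM classifiers:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def second_stage(train_pairs, backbone, phi=cosine):
        """train_pairs: list of (sequence, location). Fit a secondary classifier on F_x."""
        X = np.vstack([latent_representation(x, backbone, phi) for x, _ in train_pairs])
        y = [loc for _, loc in train_pairs]
        clf = RandomForestClassifier(n_estimators=1000)  # 1000 trees, matching the slides
        return clf.fit(X, y)

    # Prediction for an unseen sequence x:
    # second_stage(train_pairs, backbone).predict([latent_representation(x, backbone)])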
Experiments & Results
Datasets: DeepLoc (Almagro Armenteros et al., 2017)
Eukaryotic proteins annotated with experimentally verified localization sites.
The DeepLoc dataset contains 13,858 proteins:

Location              Count     Location                   Count
Nucleus                4043     Cytoplasm                   2542
Extracellular          1973     Mitochondrion               1510
Cell membrane          1340     Endoplasmic reticulum        862
Plastid                 757     Golgi apparatus              356
Lysosome/Vacuole        321     Peroxisome                   154
Results: selection of the secondary classifier
10-class problem.
One-vs-all Random Forests (RF), with 1000 decision trees.
One-vs-all SVM (RBF kernel; the gamma parameter tuned by MaxLIPO+TR from the dlib toolkit).
Stratified 10-fold cross-validation.
Cosine distance used.
Evaluation metrics (from the multiclass confusion matrix 𝐶):
OA ≜ (𝑇𝑃 + 𝑇𝑁) / (𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁)
SN ≜ 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑁)
SP ≜ 𝑇𝑁 / (𝑇𝑁 + 𝐹𝑃)
MCC ≜ (∑𝑘,𝑙,𝑚=1..𝑁 𝐶𝑘𝑘𝐶𝑚𝑙 − 𝐶𝑙𝑘𝐶𝑘𝑚) / ( √(∑𝑘 (∑𝑙 𝐶𝑙𝑘)(∑𝑓,𝑔∶𝑓≠𝑘 𝐶𝑔𝑓)) · √(∑𝑘 (∑𝑙 𝐶𝑘𝑙)(∑𝑓,𝑔∶𝑓≠𝑘 𝐶𝑓𝑔)) )
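A sketch computing these metrics from a multiclass confusion matrix; scikit-learn's matthews_corrcoef implements the multiclass MCC shown above:

    import numpy as np
    from sklearn.metrics import confusion_matrix, matthews_corrcoef

    def per_class_metrics(y_true, y_pred, labels):
        """Return {label: {OA, SN, SP}} computed one-vs-rest, plus the multiclass MCC."""
        C = confusion_matrix(y_true, y_pred, labels=labels)
        total = C.sum()
        scores = {}
        for k, label in enumerate(labels):
            tp = C[k, k]
            fn = C[k, :].sum() - tp
            fp = C[:, k].sum() - tp
            tn = total - tp - fn - fp
            scores[label] = {"OA": (tp + tn) / total,
                             "SN": tp / (tp + fn) if tp + fn else 0.0,
                             "SP": tn / (tn + fp) if tn + fp else 0.0}
        return scores, matthews_corrcoef(y_true, y_pred)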
Conclusion
Conclusion, future work & code availability
A generative model combined with a discriminative one can improve accuracy.
This is a candidate supervised representation learning approach when interpretability is demanded.
Future endeavors
Optimize the integration of the generative and discriminative models.
Possible generative candidates: Markov Random Fields (MRF), Variational Autoencoders (VAE).
You can clone the code at: github.com/aametwally/MC_MicroSimilarities
A tutorial and datasets are available to reproduce the results in the paper.
Questions & references
Thank you for listening!
Questions?

Almagro Armenteros, José Juan et al. (2017). “DeepLoc: prediction of protein subcellular localization using deep learning”. Bioinformatics 33(21), pp. 3387–3395. DOI: 10.1093/bioinformatics/btx431. URL: https://academic.oup.com/bioinformatics/article/33/21/3387/3931857
Miller, Tim (2018). “Explanation in artificial intelligence: Insights from the social sciences”. Artificial Intelligence.
Yuan, Zheng (1999). “Prediction of protein subcellular locations using Markov chain models”. FEBS Letters 451(1), pp. 23–26. DOI: 10.1016/S0014-5793(99)00506-2. URL: https://www.sciencedirect.com/science/article/pii/S0014579399005062