statistical inference problems with applications to...

57
Statistical Inference Problems with Applications to Computational Structural Biology Parthan Kasarapu Supervisors: Arun Konagurthu & Maria Garcia de la Banda 6 Aug 2015 Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 1 / 33

Upload: others

Post on 28-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

  • Statistical Inference Problems with Applications toComputational Structural Biology

    Parthan KasarapuSupervisors:

    Arun Konagurthu & Maria Garcia de la Banda

    6 Aug 2015

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 1 / 33

  • Presentation Outline

    � Thesis Overview

    � Motivation� Research Summary

    I Statistical modellingI Applications to protein structural biology

    � Thesis contributions

    � Conclusion

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 2 / 33

  • Thesis Overview

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 3 / 33

  • Motivation: How many clusters?

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Prin

    cip

    al co

    mp

    on

    en

    t 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 4 / 33

  • Motivation: How many clusters?

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Prin

    cip

    al co

    mp

    on

    en

    t 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 4 / 33

  • Motivation: How many clusters?

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Prin

    cip

    al co

    mp

    on

    en

    t 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 4 / 33

  • Motivation: How many clusters?

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Princip

    al com

    ponent 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 4 / 33

  • Motivation: How many clusters?

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Princip

    al com

    ponent 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    (a) 2-cluster model

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Princip

    al com

    ponent 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    (b) 3-cluster model

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Prin

    cip

    al co

    mp

    on

    en

    t 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    (c) 4-cluster model

    Statistical model selection is important.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 5 / 33

  • Model selection and inference

    � Several candidate models: which one to choose?I A criterion to compare models ...I Based on the model’s complexity and the goodness-of-fit

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Princip

    al com

    ponent 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    complexity: 2 means + 2 covariance matrices + cluster weights

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 6 / 33

  • Model selection and inference

    � Several candidate models: which one to choose?I A criterion to compare models ...I Based on the model’s complexity and the goodness-of-fit

    Principal component 1-4 -3 -2 -1 0 1 2 3 4 5

    Princip

    al com

    ponent 2

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    setosa

    versicolor

    virginica

    complexity: 2 means + 2 covariance matrices + cluster weights

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 6 / 33

  • The typical model selection criteria ...

    � Various model selection criteria are commonly used ...I AIC, BIC, MDL, ...

    � An example of model selection ...

    (a) Normal model (b) Laplace model

    I Two parameters for each model (µ & σ)I Considered to have the same model complexity (limitation)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 7 / 33

  • The typical model selection criteria ...

    � Various model selection criteria are commonly used ...I AIC, BIC, MDL, ...

    � An example of model selection ...

    (a) Normal model (b) Laplace model

    I Two parameters for each model (µ & σ)I Considered to have the same model complexity (limitation)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 7 / 33

  • The typical model selection criteria ...

    � Various model selection criteria are commonly used ...I AIC, BIC, MDL, ...

    � An example of model selection ...

    (a) Normal model (b) Laplace model

    I Two parameters for each model (µ & σ)I Considered to have the same model complexity (limitation)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 7 / 33

  • Minimum Message Length (MML) Framework

    Encoding of model (hypothesis) H and data DI (H&D) = I (H)︸︷︷︸

    First part

    + I (D|H)︸ ︷︷ ︸Second part

    � Two-part message:I I (H): model complexityI I (D|H): goodness-of-fit

    � Total message length I (H&D) is used to compare models.

    Model with the least message length is optimal

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 8 / 33

  • Minimum Message Length (MML) Framework

    Encoding of model (hypothesis) H and data DI (H&D) = I (H)︸︷︷︸

    First part

    + I (D|H)︸ ︷︷ ︸Second part

    � Two-part message:I I (H): model complexityI I (D|H): goodness-of-fit

    � Total message length I (H&D) is used to compare models.

    Model with the least message length is optimal

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 8 / 33

  • Minimum Message Length (MML) Framework

    Encoding of model (hypothesis) H and data DI (H&D) = I (H)︸︷︷︸

    First part

    + I (D|H)︸ ︷︷ ︸Second part

    � Two-part message:I I (H): model complexityI I (D|H): goodness-of-fit

    � Total message length I (H&D) is used to compare models.

    Model with the least message length is optimal

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 8 / 33

  • Problem: Modelling the protein main chain

    (a) True structure (Cα − Cα data)

    X1

    X2

    X3

    φ

    Pi

    θ

    Pi+1

    Pi−1

    Pi−2

    (b) Cα orientations

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 9 / 33

  • Problem: Modelling the protein main chain

    (a) True structure (Cα − Cα data)

    X1

    X2

    X3

    φ

    Pi

    θ

    Pi+1

    Pi−1

    Pi−2

    (b) Cα orientations

    Empirical distribution of (θ, φ)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 9 / 33

  • Modelling of empirical distribution of directional data

    � Mixture modelling (Clustering)I Data is multi-modalI Ideal to find data clusters ...I Modelling using directional probability distributions

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 10 / 33

  • Mixture modelling (Clustering)

    Challenges:� Determination of the number of components

    I Proposed a search method

    � Ability to generalize to any probability distributionI No assumptions in terms of the nature of data or distribution

    P. Kasarapu, L. Allison, Minimum message length estimation of mixturesof multivariate Gaussian and von Mises–Fisher distributions, MachineLearning (2015) Vol. 100, No. 2-3, Pages 333-378

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 11 / 33

  • Mixture modelling (Clustering)

    Challenges:� Determination of the number of components

    I Proposed a search method

    � Ability to generalize to any probability distributionI No assumptions in terms of the nature of data or distribution

    P. Kasarapu, L. Allison, Minimum message length estimation of mixturesof multivariate Gaussian and von Mises–Fisher distributions, MachineLearning (2015) Vol. 100, No. 2-3, Pages 333-378

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 11 / 33

  • Proposed method to determine clusters of data

    Basic idea to determine number of clusters

    Perturb a K -component mixture through a series of operations so that themixture escapes a sub-optimal state to reach an improved state.

    � Operations include ...I SplitI DeleteI Merge

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 12 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    Original mixture with threecomponents.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 13 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    Original mixture with threecomponents.

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    Begin with a one-componentmixture.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 13 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (a) I = 22793 bitsX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (b) SplittingX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (c) I = 22673 bits

    Split operation

    A parent component is split to find locally optimal children leading to a(K + 1)-component mixture.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 14 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (a) Initial meansX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (b) I = 22691 bitsX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (c) I = 22460 bits

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 15 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (a) DeletingX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (b) I = 25599 bitsX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (c) I = 22793 bits

    Delete operation

    A component is deleted to find an optimal (K − 1)-component mixture.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 16 / 33

  • Illustrative example of the search method

    X

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (a) MergingX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (b) InitializationX

    -4 -2 0 2 4

    Y

    -4

    -3

    -2

    -1

    0

    1

    2

    3

    4

    (c) I = 22793 bits

    Merge operation

    A pair of close components are merged to find an optimal(K − 1)-component mixture.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 17 / 33

  • Evolution of the mixture model

    22.35

    22.40

    22.45

    22.50

    22.55

    22.60

    22.65

    22.70

    22.75

    22.80

    2 4 6 8 10 12 140.02

    0.04

    0.06

    0.08

    0.10

    0.12

    0.14

    0.16

    0.18

    0.20

    0.22

    Mes

    sag

    e le

    ng

    th (

    in t

    ho

    usa

    nd

    s o

    f b

    its)

    Number of components

    first partsecond part

    total

    Variation of the individual parts of the total message length with increasingnumber of components (clusters).

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 18 / 33

  • Models of protein data

    (a) Data (b) von Mises-Fisher (c) Kent

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 19 / 33

  • Models of protein data

    (a) vMF: β = 0 (b) Kent: β > 0

    Kent probability density function

    ∝ exp{ κγγγT1 x︸ ︷︷ ︸linear term

    +β(γγγT2 x)2 − β(γγγT3 x)2︸ ︷︷ ︸

    non-linear term

    }

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 20 / 33

  • Modelling using Kent distributions

    Challenges:� Complex mathematical form

    I Parameter estimation is a difficult task

    � Mixture modellingI Cluster data on the spherical surface

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 21 / 33

  • Modelling using Kent distributions

    Challenges:� Complex mathematical form

    I Parameter estimation is a difficult task

    � Mixture modellingI Cluster data on the spherical surface

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 21 / 33

  • vMF and Kent mixtures of protein directional data

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    15

    14

    2221

    13

    2019

    12

    18

    11

    17

    10

    16

    876

    543

    9

    21

    (a) Uncorrelated (35 vMF clusters)

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    11

    10

    13

    9

    8

    12

    7

    654

    32

    1

    (b) Correlated (23 Kent clusters)

    How are these models useful?� Discovery of frequently occuring patterns

    I Dedicated clusters for helices, strands, etc.� Clustering profile can be related to protein function

    I Structurally similar proteins will have similar clusters� Ab initio protein structure prediction

    I Random protein generation, homology modelling, template structures, etc.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 22 / 33

  • vMF and Kent mixtures of protein directional data

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    15

    14

    2221

    13

    2019

    12

    18

    11

    17

    10

    16

    876

    543

    9

    21

    (a) Uncorrelated (35 vMF clusters)

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    11

    10

    13

    9

    8

    12

    7

    654

    32

    1

    (b) Correlated (23 Kent clusters)

    How are these models useful?� Discovery of frequently occuring patterns

    I Dedicated clusters for helices, strands, etc.� Clustering profile can be related to protein function

    I Structurally similar proteins will have similar clusters� Ab initio protein structure prediction

    I Random protein generation, homology modelling, template structures, etc.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 22 / 33

  • vMF and Kent mixtures of protein directional data

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    15

    14

    2221

    13

    2019

    12

    18

    11

    17

    10

    16

    876

    543

    9

    21

    (a) Uncorrelated (35 vMF clusters)

    Longitude φ

    0 60 120 180 240 300 360

    Co

    -la

    titu

    de

    θ

    15

    35

    55

    75

    95

    115

    11

    10

    13

    9

    8

    12

    7

    654

    32

    1

    (b) Correlated (23 Kent clusters) -optimal!

    ModelTotal message length Bits per

    (millions of bits) residue

    Uniform 6.895 27.434vMF mixture 6.449 25.656Kent mixture 6.442 25.630

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 23 / 33

  • Problem: Modelling of protein dihedral angles

    � Modelling protein dihedral angles (φ, ψ)I φ, ψ ∈ [0, 2π) represents a point on the torusI Cannot be modelled using vMF or KentI Modelled using mixtures of bivariate von Mises (BVM) distributions

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 24 / 33

  • Problem: Modelling of protein dihedral angles

    � Modelling protein dihedral angles (φ, ψ)I φ, ψ ∈ [0, 2π)

    represents a point on the torusI Cannot be modelled using vMF or KentI Modelled using mixtures of bivariate von Mises (BVM) distributions

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 24 / 33

  • Problem: Modelling of protein dihedral angles

    � Modelling protein dihedral angles (φ, ψ)I φ, ψ ∈ [0, 2π) represents a point on the torus

    I Cannot be modelled using vMF or KentI Modelled using mixtures of bivariate von Mises (BVM) distributions

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 24 / 33

  • Problem: Modelling of protein dihedral angles

    � Modelling protein dihedral angles (φ, ψ)I φ, ψ ∈ [0, 2π) represents a point on the torusI Cannot be modelled using vMF or KentI Modelled using mixtures of bivariate von Mises (BVM) distributions

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 24 / 33

  • Distribution of protein dihedral angles

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 25 / 33

  • Distribution of protein dihedral angles

    Example BVM distributionsParthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 25 / 33

  • Bivariate von Mises (BVM) clusters of dihedral angle data

    φ

    -180 -120 -60 0 60 120 180

    ψ

    -180

    -120

    -60

    0

    60

    120

    180

    25

    22

    24

    23

    2120

    11

    4

    18

    17

    16

    15

    5

    14

    10

    6

    13

    19

    7

    9

    12

    3

    2

    8

    1

    No correlation (32 clusters)

    φ

    -180 -120 -60 0 60 120 180

    ψ

    -180

    -120

    -60

    0

    60

    120

    180

    16

    15

    14

    8

    4

    13

    5

    7

    12

    11

    10

    3

    9

    2

    6

    1

    BVM (21 clusters) - optimal!

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 26 / 33

  • Problem: Abstraction of protein folding patterns

    Motivation� Rapid protein structure comparison

    I Achieved by effective summarization of folding patterns

    � Determine functionally similar proteinsI Achieved by unique representations

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 27 / 33

  • A novel method to abstract protein folding patterns

    (a) True structure (b) Commonly used

    (c) Illustrative non-linear representations (which is optimal?)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 28 / 33

  • Optimal representation

    � MML balances the trade-off betweenI Maximize economy of description (compression)I Minimize loss of structural information (preservation of geometry)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 29 / 33

  • Optimal representation

    � MML balances the trade-off betweenI Maximize economy of description (compression)I Minimize loss of structural information (preservation of geometry)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 29 / 33

  • Merits of this abstraction

    � Does not rely on secondary structure assignment

    � Applications in protein structure comparisonI Database searchI Comparing the representations - fast

    P. Kasarapu, M. G. de la Banda, A. S. Konagurthu, On representingprotein folding patterns using non-linear parametric curves, IEEE/ACMTransactions on Computational Biology and Bioinformatics,11(6):1218-1228 (2014)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 30 / 33

  • Merits of this abstraction

    � Does not rely on secondary structure assignment� Applications in protein structure comparison

    I Database searchI Comparing the representations

    - fast

    P. Kasarapu, M. G. de la Banda, A. S. Konagurthu, On representingprotein folding patterns using non-linear parametric curves, IEEE/ACMTransactions on Computational Biology and Bioinformatics,11(6):1218-1228 (2014)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 30 / 33

  • Merits of this abstraction

    � Does not rely on secondary structure assignment� Applications in protein structure comparison

    I Database searchI Comparing the representations - fast

    P. Kasarapu, M. G. de la Banda, A. S. Konagurthu, On representingprotein folding patterns using non-linear parametric curves, IEEE/ACMTransactions on Computational Biology and Bioinformatics,11(6):1218-1228 (2014)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 30 / 33

  • Merits of this abstraction

    � Does not rely on secondary structure assignment� Applications in protein structure comparison

    I Database searchI Comparing the representations - fast

    P. Kasarapu, M. G. de la Banda, A. S. Konagurthu, On representingprotein folding patterns using non-linear parametric curves, IEEE/ACMTransactions on Computational Biology and Bioinformatics,11(6):1218-1228 (2014)

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 30 / 33

  • Main contributions of my thesis

    Theoretical:� MML-based statistical inference

    I Multivariate von Mises-Fisher (hypersphere)I Kent (3D-sphere)I Bivariate von Mises (3D-torus)

    � Mixture modelling (clustering)

    � Non-linear abstractions (regression)

    Applications:

    � Structural bioinformatics

    � High-dimensional text clustering using vMF mixtures

    � Analytical tools for biologists and statisticians

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 31 / 33

  • Main contributions of my thesis

    Theoretical:� MML-based statistical inference

    I Multivariate von Mises-Fisher (hypersphere)I Kent (3D-sphere)I Bivariate von Mises (3D-torus)

    � Mixture modelling (clustering)

    � Non-linear abstractions (regression)

    Applications:

    � Structural bioinformatics

    � High-dimensional text clustering using vMF mixtures

    � Analytical tools for biologists and statisticians

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 31 / 33

  • Conclusion

    � Data analysis and statistical modelling go hand-in-handI Rigorous models are useful

    � Scope for improving the existing methodologiesI Extend the current machine learning algorithms

    � My research has practical implications in data mining, structuralbiology, etc.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 32 / 33

  • Conclusion

    � Data analysis and statistical modelling go hand-in-handI Rigorous models are useful

    � Scope for improving the existing methodologiesI Extend the current machine learning algorithms

    � My research has practical implications in data mining, structuralbiology, etc.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 32 / 33

  • Conclusion

    � Data analysis and statistical modelling go hand-in-handI Rigorous models are useful

    � Scope for improving the existing methodologiesI Extend the current machine learning algorithms

    � My research has practical implications in data mining, structuralbiology, etc.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 32 / 33

  • Thank you.

    Parthan Kasarapu Statistical Modelling of Protein Structures 6 Aug 2015 33 / 33