Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications. Andrei Zinovyev, Institut des Hautes Études Scientifiques, France

Post on 01-Jan-2016

Page 1: Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications

Andrei Zinovyev

Institut des Hautes Études Scientifiques, France

Page 2: Plan of the talk

Object of study

Definition of principal manifold (PM)

Constructing PMs: elastic maps

Examples of biomedical applications

Page 3: Principal manifolds, elastic maps framework

[Diagram: non-linear data-mining methods]

SVM

Principal manifolds

Regression, approximation

Supervised classification

K-means

SOM

Clustering

Multidimensional scaling

Visualization

PCA

Factor analysis

LLE

ISOMAP

Page 4: Finite set of objects in $R^N$: $X_i$, $i = 1..m$

IRIS database

Sepal height   Sepal width   Petal height   Petal width   Species
4.9            3             1.4            0.2           Iris-setosa
4.7            3.2           1.3            0.3           Iris-setosa
4.6            3.1           1.5            0.2           Iris-setosa
7              3.2           4.7            1.4           Iris-versicolor
6.4            3.2           4.5            1.5           Iris-versicolor
6.9            3.1           4.9            1.5           Iris-versicolor
6.3            3.3           6              2.5           Iris-virginica
5.8            2.7           X              1.9           Iris-virginica
7.1            3             5.9            2.1           Iris-virginica
6.3            2.9           5.6            1.8           Iris-virginica

Page 5: Mean point

$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$$

$$\sum_{i=1}^{m} \left( X_i - \bar{X} \right)^2 \to \min$$

K-means clustering

$$\sum_{i=1}^{m} \left( X_i - Y_i^{\mathrm{closest}} \right)^2 \to \min$$
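As a minimal sketch of the two objectives on this slide (not the speaker's code), the NumPy snippet below checks that the mean point gives a smaller sum of squared distances than a shifted point, and then runs plain Lloyd iterations for K-means. The helper names `sum_sq_dist` and `kmeans`, the choice k = 3, and the random seed are assumptions made for illustration; the data are the nine complete rows of the Iris sample on Page 4.

```python
import numpy as np

def sum_sq_dist(X, point):
    """Sum of squared Euclidean distances from the rows of X to a single point."""
    return float(((X - point) ** 2).sum())

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations (illustrative helper, not taken from the talk)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)              # closest centre for every point
        for c in range(k):
            if np.any(labels == c):             # an empty cluster keeps its old centre
                centres[c] = X[labels == c].mean(axis=0)
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, d2.argmin(axis=1), float(d2.min(axis=1).sum())

# Complete rows of the Iris sample from Page 4 (the row with the missing value "X" is left out).
X = np.array([
    [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.3], [4.6, 3.1, 1.5, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5], [7.1, 3.0, 5.9, 2.1], [6.3, 2.9, 5.6, 1.8],
])

mean_point = X.mean(axis=0)
centres, labels, kmeans_objective = kmeans(X, k=3)
print("objective at the mean point:", sum_sq_dist(X, mean_point))
print("K-means objective (k = 3):  ", kmeans_objective)
```

With several centres the objective can only go down relative to a single mean point, which is the sense in which K-means generalizes the mean.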

Page 6: Principal “Object”

$$\sum_{i=1}^{m} \operatorname{dist}^2\left( X_i, \mathrm{Object} \right) \to \min$$

Page 7: Principal Component Analysis

[Figure: 1st principal axis along the direction of maximal dispersion; 2nd principal axis]
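The idea of the first principal axis as the direction of maximal dispersion can be illustrated with a short NumPy sketch. It uses the singular value decomposition of the centred data, which is one standard way to compute PCA; the function name `pca` and the synthetic point cloud are assumptions for illustration, not material from the talk.

```python
import numpy as np

def pca(X, n_components):
    """Principal axes of X (rows are objects) from the SVD of the centred data."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:n_components]                          # unit vectors, one axis per row
    variances = S[:n_components] ** 2 / (len(X) - 1)  # dispersion along each axis
    return mean, axes, variances

# Synthetic elongated cloud, rotated by 30 degrees (made-up data).
rng = np.random.default_rng(0)
angle = np.pi / 6
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
X = (rng.normal(size=(200, 2)) * [3.0, 0.5]) @ rotation.T

mean, axes, variances = pca(X, n_components=2)
projected = (X - mean) @ axes.T                       # coordinates in the principal axes
print("1st principal axis:", axes[0], "variance:", variances[0])
print("2nd principal axis:", axes[1], "variance:", variances[1])
```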

Page 8: Principal manifold

Page 9: What do we want?

Non-linear surface (1D, 2D, 3D …)

Smooth and not twisted

The data model is unknown

Speed (running time linear in Nm)

Uniqueness

Fast way to project data points

Page 10: Metaphor of elasticity

Data points

Graph nodes

$U^{(Y)}$, $U^{(E)}$, $U^{(R)}$

Page 11: Constructing elastic nets

[Figure labels: node $y$; edge with end nodes $E(0)$, $E(1)$; rib with nodes $R(1)$, $R(0)$, $R(2)$]

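Page 12: Definition of elastic energy

$$U = U^{(Y)} + U^{(E)} + U^{(R)}$$

$$U^{(Y)} = \frac{1}{N} \sum_{i=1}^{p} \sum_{X^{(j)} \in K^{(i)}} \left\| X^{(j)} - y^{(i)} \right\|^2$$

$$U^{(E)} = \sum_{i=1}^{s} \lambda_i \left\| E^{(i)}(1) - E^{(i)}(0) \right\|^2$$

$$U^{(R)} = \sum_{i=1}^{r} \mu_i \left\| R^{(i)}(1) + R^{(i)}(2) - 2R^{(i)}(0) \right\|^2$$

$$\lambda_i = \lambda_0, \quad \mu_i = \mu_0$$

[Figure labels: data point $X^{(j)}$; node $y$; edge end nodes $E(0)$, $E(1)$; rib nodes $R(1)$, $R(0)$, $R(2)$]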
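To make the three energy terms on this slide concrete, here is a minimal sketch for the simplest case, a 1D chain of nodes, where every pair of neighbouring nodes forms an edge and every triple of consecutive nodes forms a rib. The function name `elastic_energy`, the restriction to a chain, and the use of one constant elasticity per term are assumptions made for illustration; this is not the elmap implementation.

```python
import numpy as np

def elastic_energy(X, Y, lam, mu):
    """Elastic energy of a 1D chain of nodes Y (p x dim) for data X (N x dim).

    U_Y: mean squared distance from each data point to its closest node.
    U_E: stretching term, lam times the squared lengths of the edges.
    U_R: bending term, mu times the squared second differences over the ribs.
    """
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    U_Y = d2.min(axis=1).sum() / len(X)
    edges = Y[1:] - Y[:-1]                    # E(1) - E(0) for every edge
    U_E = lam * (edges ** 2).sum()
    ribs = Y[:-2] + Y[2:] - 2.0 * Y[1:-1]     # R(1) + R(2) - 2 R(0) for every rib
    U_R = mu * (ribs ** 2).sum()
    return U_Y + U_E + U_R, (U_Y, U_E, U_R)

# A straight, evenly spaced chain has no bending energy.
Y_line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
total_line, parts_line = elastic_energy(Y_line, Y_line, lam=2.0, mu=5.0)

# A chain with a right-angle bend is penalized by the rib term.
Y_bent = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
total_bent, parts_bent = elastic_energy(Y_bent, Y_bent, lam=2.0, mu=5.0)

print("straight chain:", parts_line)
print("bent chain:    ", parts_bent)
```

In both examples the data coincide with the nodes, so the approximation term vanishes and only the stretching and bending terms contribute.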

Page 13: Elastic manifold

Page 14: Global minimum and softening

$\lambda_0, \mu_0 \approx 10^{3}$

$\lambda_0, \mu_0 \approx 10^{2}$

$\lambda_0, \mu_0 \approx 10^{1}$

$\lambda_0, \mu_0 \approx 10^{-1}$
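The softening idea can be sketched as follows: fit a stiff net first, then reuse the result as the starting point for progressively softer nets. The code below is a hedged illustration for a 1D chain only. It assumes an alternating scheme (assign each data point to its closest node, then solve the linear system that minimizes the quadratic energy for that fixed assignment); the function names, the synthetic arc data, the number of nodes, and the initialization along the first principal axis are all assumptions, not details confirmed by the slides.

```python
import numpy as np

def approx_error(X, Y):
    """Mean squared distance from each data point to its closest node."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())

def fit_elastic_chain(X, Y, lam, mu, n_iter=100):
    """Alternate between partitioning the data and solving for the node positions."""
    N, p = len(X), len(Y)
    i = np.arange(p - 1)
    E = np.zeros((p - 1, p))                  # edge operator: y[k+1] - y[k]
    E[i, i] = -1.0
    E[i, i + 1] = 1.0
    j = np.arange(p - 2)
    R = np.zeros((p - 2, p))                  # rib operator: y[k] - 2 y[k+1] + y[k+2]
    R[j, j] = 1.0
    R[j, j + 1] = -2.0
    R[j, j + 2] = 1.0
    penalty = lam * E.T @ E + mu * R.T @ R
    labels = None
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                             # partition is stable, so Y is stable too
        labels = new_labels
        counts = np.bincount(labels, minlength=p)
        sums = np.zeros_like(Y)
        np.add.at(sums, labels, X)            # per-node sums of the assigned points
        # Zero gradient of the energy for this partition gives a linear system in Y.
        Y = np.linalg.solve(np.diag(counts / N) + penalty, sums / N)
    return Y

# Synthetic noisy half-circle (made-up data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(200, 2))

# Start from 10 nodes placed along the first principal axis.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
Y = mean + np.outer(np.linspace(-1.0, 1.0, 10), Vt[0])

stages = []
for elasticity in (1e3, 1e2, 1e1, 1e-1):      # stiff first, soft last
    Y = fit_elastic_chain(X, Y, lam=elasticity, mu=elasticity)
    stages.append((elasticity, approx_error(X, Y)))
    print(f"elasticity {elasticity:g}: approximation error {stages[-1][1]:.4f}")
```

The stiff stages keep the chain short and almost straight; the softer stages let it stretch and bend toward the data while starting from an already reasonable configuration.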

Page 15: Adaptive algorithms

Growing net

Adaptive net

Refining net:

Idea of scaling:

Page 16: Projection onto the manifold

Closest node of the net

Closest point of the manifold
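The difference between the two projection rules can be shown on a 1D chain, where the manifold is taken to be the piecewise-linear curve through the nodes. This is a sketch under that assumption; the function names and the internal coordinate "segment index plus position within the segment" are hypothetical choices for illustration, and the code assumes no two consecutive nodes coincide.

```python
import numpy as np

def closest_node(x, Y):
    """Index of the node closest to x (piecewise-constant projection)."""
    return int(((Y - x) ** 2).sum(axis=1).argmin())

def closest_point_on_chain(x, Y):
    """Closest point on the polyline through the rows of Y.

    Returns the projected point and an internal coordinate k + t, where k is
    the segment index and t in [0, 1] is the position within that segment.
    """
    a, b = Y[:-1], Y[1:]
    seg = b - a
    t = ((x - a) * seg).sum(axis=1) / (seg ** 2).sum(axis=1)
    t = np.clip(t, 0.0, 1.0)                  # stay inside each segment
    candidates = a + t[:, None] * seg         # best point on every segment
    k = int(((candidates - x) ** 2).sum(axis=1).argmin())
    return candidates[k], k + t[k]

Y = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
x = np.array([0.4, 0.3])

node = closest_node(x, Y)
point, coord = closest_point_on_chain(x, Y)
print("closest node:", node, "at", Y[node])
print("closest point on the chain:", point, "internal coordinate:", coord)
```

Projecting to the closest node is faster but maps many data points to the same place; projecting to the closest point of the interpolated curve gives a continuous coordinate.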

Page 17: Colorings: visualize any function

Page 18: Density visualization

Page 19: Example: different topologies

$R^N$

$R^2$

Page 20: VIDAExpert tool and elmap C++ package

Page 21: Regression and principal manifolds

Regression: $\sum_i \left\| x_i - F(x_i) \right\|^2 \to \min$

Principal component: $\sum_i \left\| x_i - P x_i \right\|^2 \to \min$

[Figure: axes $x$ and $F(x)$; legend: Data, Gen. curve, Grid]
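The contrast between regression and a principal component can be checked numerically in the linear case. The sketch below assumes the usual reading of the two criteria: ordinary least squares minimizes vertical residuals, while the first principal axis minimizes orthogonal distances to the line. The synthetic data and the helper names `vertical_sse` and `orthogonal_sse` are assumptions for illustration.

```python
import numpy as np

def vertical_sse(x, y, slope, intercept):
    """Sum of squared vertical residuals to the line y = slope * x + intercept."""
    return float(((y - slope * x - intercept) ** 2).sum())

def orthogonal_sse(x, y, slope, intercept):
    """Sum of squared orthogonal distances to the same line."""
    return vertical_sse(x, y, slope, intercept) / (1.0 + slope ** 2)

# Synthetic noisy linear relation (made-up data).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.5, size=300)

# Regression line: least squares on the vertical residuals.
slope_reg, intercept_reg = np.polyfit(x, y, 1)

# Principal component line: through the mean point, along the first principal axis.
data = np.column_stack([x, y])
mean = data.mean(axis=0)
_, _, Vt = np.linalg.svd(data - mean, full_matrices=False)
slope_pc = Vt[0, 1] / Vt[0, 0]
intercept_pc = mean[1] - slope_pc * mean[0]

print("regression slope:", slope_reg, "principal axis slope:", slope_pc)
print("vertical SSE:  ", vertical_sse(x, y, slope_reg, intercept_reg),
      vertical_sse(x, y, slope_pc, intercept_pc))
print("orthogonal SSE:", orthogonal_sse(x, y, slope_reg, intercept_reg),
      orthogonal_sse(x, y, slope_pc, intercept_pc))
```

Each line wins on its own criterion, which is why a principal curve or manifold is not the same object as a regression fit of one variable on the others.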

Page 22: Image skeletonization or clustering around curves

Page 23: Approximation of molecular surfaces

Page 24: Application: economic data

Gross output

Density

Profit

Growth rate

Page 25: Medical table, 1700 patients with myocardial infarction

Lethal cases

Patients map, density

Page 26: Medical table, 1700 patients with myocardial infarction

128 indicators

Age

Number of infarctions in anamnesis

Stenocardia functional class

Page 27: Codon usage in all genes of one genome

Escherichia coli Bacillus subtilis

Majority of genes

Highly expressed genes

“Foreign” genes

“Hydrophobic” genes

Page 28: Golub’s leukemia dataset

3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)

ALL sample

AML sample

Map of genes:

vote for ALL

vote for AML

used by T. Golub

used by W. Lie

Page 29: Golub’s leukemia dataset

Map of samples:

AML

ALL/B-cell

ALL/T-cell

density

Cystatin C

Retinoblastoma binding protein P48

CA2 Carbonic anhydrase II

X-linked Helicase II

Page 30: Thank you for your attention!

Questions?