L15:Microarray analysis (Classification)


Page 1:

L15:Microarray analysis (Classification)

Page 2:

The Biological Problem

• Two conditions that need to be differentiated (they have different treatments).

• Ex: ALL (Acute Lymphocytic Leukemia) and AML (Acute Myelogenous Leukemia).

• Possibly, the set of over-expressed genes is different in the two conditions.

Page 3:

Geometric formulation

• Each sample is a vector with dimension equal to the number of genes.

• We have two classes of vectors (AML, ALL), and would like to separate them, if possible, with a hyperplane.

Page 4:

Basic geometry

• What is ||x||²?

• What is x/||x||?

• Dot product?

For x = (x₁, x₂) at angle θx and y = (y₁, y₂) at angle θy:

xᵀy = x₁y₁ + x₂y₂ = ||x||·||y|| cos θx cos θy + ||x||·||y|| sin θx sin θy = ||x||·||y|| cos(θx − θy)
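A quick numerical check of this identity (a sketch in numpy; the lengths and angles below are arbitrary made-up values):

```python
import numpy as np

# Two 2-D vectors written in polar form so the angles are explicit.
theta_x, theta_y = 0.3, 1.1                               # arbitrary angles (radians)
x = 2.0 * np.array([np.cos(theta_x), np.sin(theta_x)])    # ||x|| = 2
y = 5.0 * np.array([np.cos(theta_y), np.sin(theta_y)])    # ||y|| = 5

lhs = x @ y                                               # x^T y = x1*y1 + x2*y2
rhs = np.linalg.norm(x) * np.linalg.norm(y) * np.cos(theta_x - theta_y)
print(lhs, rhs)                 # the two values agree up to floating-point error

# x / ||x|| is the unit vector pointing in the direction of x.
print(np.linalg.norm(x / np.linalg.norm(x)))              # 1.0
```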

Page 5:

Dot Product

• Let β be a unit vector: ||β|| = 1.

• Recall that βᵀx = ||x|| cos θ, where θ is the angle between β and x.

• What is βᵀx if x is orthogonal (perpendicular) to β?

[Figure: projection of x onto β, with βᵀx = ||x|| cos θ]

Page 6:

Hyperplane

• How can we define a hyperplane L?

• Find the unit vector β that is perpendicular (normal) to the hyperplane.

Page 7:

Points on the hyperplane

• Consider a hyperplane L defined by a unit vector β and a distance β₀ from the origin.

• Notes:
  – For all x ∈ L, xᵀβ must be the same: xᵀβ = β₀.
  – For any two points x₁, x₂ ∈ L, (x₁ − x₂)ᵀβ = 0.

• Therefore, given a vector β and an offset β₀, the hyperplane is the set of all points {x : xᵀβ = β₀}.

Page 8:

Hyperplane properties

• Given an arbitrary point x, what is the distance from x to the plane L?

• D(x,L) = βᵀx − β₀

• When are points x₁ and x₂ on different sides of the hyperplane?

Page 9:

Hyperplane properties

• Given an arbitrary point x, what is the distance from x to the plane L?
  – D(x,L) = βᵀx − β₀

• When are points x₁ and x₂ on different sides of the hyperplane?

• Ans: if D(x₁,L) · D(x₂,L) < 0.
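Both facts are easy to check numerically. A small numpy sketch, with an arbitrary made-up unit normal β, offset β₀, and test points:

```python
import numpy as np

beta = np.array([3.0, 4.0])
beta = beta / np.linalg.norm(beta)     # unit normal of the hyperplane L
beta0 = 1.0                            # offset: L = {x : beta^T x = beta0}

def D(x):
    """Signed distance from x to L (valid because beta is a unit vector)."""
    return beta @ x - beta0

x1 = np.array([2.0, 1.0])
x2 = np.array([-1.0, -0.5])
print(D(x1), D(x2))
# x1 and x2 lie on different sides of L exactly when the signed
# distances have opposite signs:
print("different sides:", D(x1) * D(x2) < 0)
```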

Page 10:

Separating by a hyperplane

• Input: a training set of +ve and -ve examples.

• Recall that a hyperplane is represented by
  – {x : −β₀ + β₁x₁ + β₂x₂ = 0}, or
  – (in higher dimensions) {x : βᵀx − β₀ = 0}.

• Goal: find a hyperplane that ‘separates’ the two classes.

• Classification: a new point x is +ve if it lies on the +ve side of the hyperplane (D(x,L) > 0), -ve otherwise (see the sketch below).

[Figure: +ve and -ve training points in the (x₁, x₂) plane separated by a hyperplane]
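A minimal sketch of the classification rule itself, with a hypothetical hyperplane and two labeled points chosen only for illustration:

```python
import numpy as np

beta = np.array([1.0, -1.0]) / np.sqrt(2)   # unit normal
beta0 = 0.0                                  # hyperplane {x : beta^T x = beta0}

def classify(x):
    # +ve if x is on the +ve side of the hyperplane, -ve otherwise
    return +1 if x @ beta - beta0 > 0 else -1

training = [(np.array([2.0, 0.5]), +1),   # a +ve example
            (np.array([0.5, 2.0]), -1)]   # a -ve example
for x, label in training:
    print(x, "predicted:", classify(x), "true:", label)
```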

Page 11:

Hyperplane separation

• What happens if we have many choices of a hyperplane?
  – We try to maximize the distance of the points from the hyperplane.

• What happens if the classes are not separable by a hyperplane?
  – We define a function based on the amount of misclassification, and try to minimize it.

Page 12:

Error in classification

• Sample function: the sum of distances of all misclassified points.
  – Let yᵢ = −1 for a +ve example i, yᵢ = +1 otherwise.

• The best hyperplane is the one that minimizes

  D(β, β₀) = Σ_{i∈M} yᵢ (xᵢᵀβ − β₀)

  where M is the set of misclassified points.

• Other definitions are also possible.

[Figure: +ve and -ve points in the (x₁, x₂) plane with a candidate hyperplane]
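A sketch of this error function in numpy, following the slide's label convention (yᵢ = −1 for +ve examples); the points and hyperplane are made-up values:

```python
import numpy as np

X = np.array([[2.0, 1.0],      # a +ve example
              [1.5, 0.5],      # a +ve example
              [1.0, 1.0]])     # a -ve example (on the wrong side of this hyperplane)
y = np.array([-1, -1, +1])     # slide convention: y_i = -1 for +ve examples

beta = np.array([1.0, 1.0]) / np.sqrt(2)    # unit normal
beta0 = 0.5

signed = X @ beta - beta0                   # D(x_i, L) for every point
predicted_positive = signed > 0             # points on the +ve side
actually_positive = (y == -1)               # +ve examples under this convention
M = predicted_positive != actually_positive # misclassified points

D = np.sum(y[M] * signed[M])                # D(beta, beta0): sum over i in M
print("misclassified indices:", np.where(M)[0], " D =", D)
```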

Page 13:

Restating Classification

• The (supervised) classification problem can now be reformulated as an optimization problem.

• Goal: find the hyperplane (β, β₀) that optimizes the objective D(β, β₀).

• No efficient algorithm is known for this problem, but a simple generic optimization can be applied.

• Start with a randomly chosen (β, β₀).

• Move to a neighboring (β′, β′₀) if D(β′, β′₀) < D(β, β₀).

Page 14:

Gradient Descent

• The function D(β) defines the error.

• We follow an iterative refinement: in each step, refine β so that the error is reduced.

• Gradient descent is an approach to such iterative refinement.

β ← β + ρ · D′(β)

[Figure: the error curve D(β) and its slope D′(β) at the current β]
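A minimal sketch of the idea on a toy one-dimensional error function (the function, step size, and starting point are invented for illustration); to make the error shrink, each step moves β in the direction that reduces D:

```python
# Toy gradient descent: minimize D(beta) = (beta - 3)^2, whose derivative
# is D'(beta) = 2*(beta - 3).  Each step moves beta downhill, so the error
# shrinks at every iteration.
def D(beta):
    return (beta - 3.0) ** 2

def D_prime(beta):
    return 2.0 * (beta - 3.0)

beta = 0.0      # arbitrary starting point
rho = 0.1       # step size
for step in range(50):
    beta -= rho * D_prime(beta)     # step against the slope to reduce D
print(beta, D(beta))                # beta is close to 3, D(beta) close to 0
```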

Page 15:

Rosenblatt’s perceptron learning algorithm

D(β, β₀) = Σ_{i∈M} yᵢ (xᵢᵀβ − β₀)

∂D(β, β₀)/∂β = Σ_{i∈M} yᵢ xᵢ

∂D(β, β₀)/∂β₀ = Σ_{i∈M} yᵢ

⇒ Update rule: (β, β₀) ← (β, β₀) + ρ (yᵢ xᵢ, yᵢ), applied for misclassified points i.
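A compact sketch of perceptron learning in numpy. It uses the standard form of the update, with labels yᵢ ∈ {+1, −1} and yᵢ = +1 for +ve examples, applying one update per misclassified point; the toy data, learning rate ρ, and epoch cap are made up. As a later slide notes, the loop only terminates on its own if the data are linearly separable.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linearly separable data: two clouds of points in 2-D.
X_pos = rng.normal(loc=[+2, +2], scale=0.5, size=(20, 2))
X_neg = rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 20 + [-1] * 20)     # +1 for +ve examples

beta = np.zeros(2)
beta0 = 0.0
rho = 0.1                                # learning rate

for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ beta - beta0) <= 0:          # xi is misclassified
            beta += rho * yi * xi                  # nudge the normal toward xi
            beta0 -= rho * yi                      # and adjust the offset
            mistakes += 1
    if mistakes == 0:                              # a separating hyperplane was found
        break

print("beta =", beta, "beta0 =", beta0, "epochs =", epoch + 1)
print("training accuracy:", np.mean(np.sign(X @ beta - beta0) == y))
```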

Page 16:

Classification based on perceptron learning

• Use Rosenblatt’s algorithm to compute the hyperplane L = (β, β₀).

• Assign x to class 1 if D(x,L) = βᵀx − β₀ ≥ 0, and to class 2 otherwise.

Page 17:

Perceptron learning

• If many solutions are possible, it does not choose between them.

• If the data are not linearly separable, the algorithm does not terminate, and this is hard to detect.

• The time to convergence is not well understood.

Page 18:

Linear Discriminant analysis

• Provides an alternative approach to classification with a linear function.

• Project all points, including the means, onto a vector β.

• We want to choose β such that:
  – the difference of the projected means is large;
  – the variance within each group is small.

[Figure: +ve and -ve points in the (x₁, x₂) plane with their means projected onto β]

Page 19:

Choosing the right β

[Figure: the same +ve and -ve points with two candidate directions, β₁ and β₂]

β₁ is a better choice than β₂, as the variance within a group is small and the difference of the projected means is large.

• How do we compute the best β?

Page 20:

Linear Discriminant analysis

• Fisher criterion:

  max_β  (difference of projected means) / (sum of projected variances)

Page 21:

LDA cont’d

• What is the projection of a point x onto β?
  – Ans: βᵀx

• What is the distance between the projected means?

(m̃₁ − m̃₂)² = (βᵀ(m₁ − m₂))², where m̃ᵢ = βᵀmᵢ is the projected mean of class i.

[Figure: +ve and -ve points projected onto β, with projected means m̃₁ and m̃₂]

Page 22:

LDA Cont’d

|m̃₁ − m̃₂|² = (βᵀ(m₁ − m₂))²
           = βᵀ(m₁ − m₂)(m₁ − m₂)ᵀβ
           = βᵀ S_B β

Scatter within sample: s̃₁² + s̃₂², where

s̃₁² = Σ_y (y − m̃₁)² = Σ_{x∈D₁} (βᵀ(x − m₁))² = βᵀ S₁ β

s̃₁² + s̃₂² = βᵀ(S₁ + S₂)β = βᵀ S_W β

Fisher criterion:  max_β  (βᵀ S_B β) / (βᵀ S_W β)

Here S_B = (m₁ − m₂)(m₁ − m₂)ᵀ is the between-class scatter matrix and S_W = S₁ + S₂ is the within-class scatter matrix.
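A sketch in numpy that builds S_B and S_W from two small made-up groups of points and evaluates the Fisher criterion for a few candidate directions β (chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
D1 = rng.normal(loc=[0, 0], scale=[1.0, 0.3], size=(50, 2))   # class 1 samples
D2 = rng.normal(loc=[3, 1], scale=[1.0, 0.3], size=(50, 2))   # class 2 samples

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
SB = np.outer(m1 - m2, m1 - m2)                   # between-class scatter
S1 = (D1 - m1).T @ (D1 - m1)                      # within-class scatter, class 1
S2 = (D2 - m2).T @ (D2 - m2)                      # within-class scatter, class 2
SW = S1 + S2

def fisher(beta):
    """Fisher criterion J(beta) = (beta^T S_B beta) / (beta^T S_W beta)."""
    return (beta @ SB @ beta) / (beta @ SW @ beta)

for beta in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), m1 - m2):
    print(beta, "J =", fisher(beta))
```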

Page 23:

LDA

Let max_β (βᵀ S_B β) / (βᵀ S_W β) = λ

Then (S_B β − λ S_W β) = 0
⇒ λ S_W β = S_B β
⇒ λ β = S_W⁻¹ S_B β
⇒ β = S_W⁻¹ (m₁ − m₂)

Therefore, a simple computation (a matrix inverse) is sufficient to compute the ‘best’ separating hyperplane.
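A sketch of this computation in numpy on two made-up groups of points. The midpoint-between-projected-means threshold used for classification is a simple illustrative choice, not something prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
D1 = rng.normal(loc=[0, 0], scale=[1.0, 0.3], size=(50, 2))   # class 1
D2 = rng.normal(loc=[3, 1], scale=[1.0, 0.3], size=(50, 2))   # class 2

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
SW = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)        # S_W = S1 + S2

# The LDA direction: a single linear solve (equivalently, a matrix inverse).
beta = np.linalg.solve(SW, m1 - m2)                           # beta = S_W^{-1}(m1 - m2)

# Classify by thresholding the projection halfway between the projected means.
threshold = beta @ (m1 + m2) / 2.0
labels = np.where(np.vstack([D1, D2]) @ beta > threshold, 1, 2)
print("beta =", beta)
print("training accuracy:",
      np.mean(labels == np.array([1] * len(D1) + [2] * len(D2))))
```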

Page 24:

Maximum Likelihood discrimination

• Consider the simple case of single dimensional data.

• Compute a distribution of the values in each class.

[Figure: the distribution Pr of the values in each class]

Page 25:

Maximum Likelihood discrimination

• Suppose we knew the distribution of points in each class ωᵢ.
  – We can compute Pr(x|ωᵢ) for all classes ωᵢ, and take the maximum.

• The true distribution is not known, so we usually assume that it is Gaussian.

Page 26:

ML discrimination

• Suppose all the points were in 1 dimension, and all classes were normally distributed.

Pr(ωᵢ | x) = Pr(x | ωᵢ) Pr(ωᵢ) / Σⱼ Pr(x | ωⱼ) Pr(ωⱼ)

gᵢ(x) = ln Pr(x | ωᵢ) + ln Pr(ωᵢ)
      ≅ −(x − μᵢ)² / (2σᵢ²) − ln σᵢ + ln Pr(ωᵢ)

Choose argminᵢ ( (x − μᵢ)² / (2σᵢ²) + ln σᵢ − ln Pr(ωᵢ) )
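A sketch of this rule for two one-dimensional Gaussian classes; the means, standard deviations, and priors are made-up values:

```python
import numpy as np

# Per-class parameters: mean, standard deviation, and prior Pr(w_i).
classes = [
    {"mu": 0.0, "sigma": 1.0, "prior": 0.6},   # class 1
    {"mu": 3.0, "sigma": 1.5, "prior": 0.4},   # class 2
]

def g(x, c):
    """Discriminant g_i(x) = -(x - mu)^2 / (2 sigma^2) - ln(sigma) + ln(prior)."""
    return (-(x - c["mu"]) ** 2 / (2 * c["sigma"] ** 2)
            - np.log(c["sigma"]) + np.log(c["prior"]))

def classify(x):
    # Pick the class with the largest discriminant (equivalently, the
    # smallest value of the argmin expression on the slide).
    return int(np.argmax([g(x, c) for c in classes])) + 1

for x in (-1.0, 1.2, 2.0, 4.0):
    print(x, "-> class", classify(x))
```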

Page 27:

ML discrimination (multi-dimensional case)

p(x | ωᵢ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp( −½ (x − m)ᵀ Σ⁻¹ (x − m) )

gᵢ(x) = ln p(x | ωᵢ) + ln P(ωᵢ)

Compute argmaxᵢ gᵢ(x).

Not part of the syllabus.

Page 28:

Supervised classification summary

• Most techniques for supervised classification are based on the notion of a separating hyperplane.

• The ‘optimal’ separation can be computed using various combinatorial (perceptron), algebraic (LDA), or statistical (ML) analyses.

Page 29:

Dimensionality reduction

• Many genes have highly correlated expression profiles.

• By discarding some of the genes, we can greatly reduce the dimensionality of the problem.

• There are other, more principled ways to do such dimensionality reduction.

Page 30:

Principal Components Analysis

• Consider the expression values of 2 genes over 6 samples.

• Clearly, the expression of the two genes is highly correlated.

• Projecting all the genes on a single line could explain most of the data.

• This is a generalization of “discarding the gene”.

Page 31:

PCA

• Suppose all of the data were to be reduced by projecting onto a single line through the mean m.

• How do we select the line?

Page 32:

PCA cont’d

• Let each point xₖ map to x′ₖ. We want to minimize the error Σₖ ||xₖ − x′ₖ||².

• Observation 1: each point xₖ maps to x′ₖ = m + ββᵀ(xₖ − m).

[Figure: a point xₖ and its projection x′ₖ onto the line through m]

Page 33:

PCA: motivating example

• Consider the expression values of 2 genes over 6 samples.

• Clearly, the expression of g1 is not informative, and it suffices to look at g2 values.

• Dimensionality can be reduced by discarding the gene g1

[Figure: expression values of g1 and g2 across the 6 samples]

Page 34:

PCA: Ex2

• Consider the expression values of 2 genes over 6 samples.

• Clearly, the expression of the two genes is highly correlated.

• Projecting all the genes on a single line could explain most of the data.

Page 35:

PCA

• Suppose all of the data were to be reduced by projecting onto a single line through the mean m.

• How do we select the line?

Page 36:

PCA cont’d

• Let each point xₖ map to x′ₖ = m + aₖβ. We want to minimize the error Σₖ ||xₖ − x′ₖ||².

• Observation 1: each point xₖ maps to x′ₖ = m + ββᵀ(xₖ − m), i.e. aₖ = βᵀ(xₖ − m).

[Figure: a point xₖ and its projection x′ₖ onto the line through m]

Page 37:

Proof of Observation 1

min_{aₖ} ||xₖ − x′ₖ||²
= min_{aₖ} ||xₖ − m + m − x′ₖ||²
= min_{aₖ} ( ||xₖ − m||² + ||m − x′ₖ||² − 2(x′ₖ − m)ᵀ(xₖ − m) )
= min_{aₖ} ( ||xₖ − m||² + aₖ² βᵀβ − 2aₖ βᵀ(xₖ − m) )
= min_{aₖ} ( ||xₖ − m||² + aₖ² − 2aₖ βᵀ(xₖ − m) )      (since βᵀβ = 1)

Differentiating w.r.t. aₖ:

2aₖ − 2βᵀ(xₖ − m) = 0
aₖ = βᵀ(xₖ − m)
⇒ aₖ² = aₖ βᵀ(xₖ − m)
⇒ ||xₖ − x′ₖ||² = ||xₖ − m||² − βᵀ(xₖ − m)(xₖ − m)ᵀ β

Page 38:

Minimizing PCA Error

Σₖ ||xₖ − x′ₖ||² = C − βᵀ ( Σₖ (xₖ − m)(xₖ − m)ᵀ ) β = C − βᵀ S β

• To minimize the error, we must maximize βᵀSβ (subject to ||β|| = 1).

• Maximizing βᵀSβ gives Sβ = λβ, i.e. λ is an eigenvalue of S and β the corresponding eigenvector.

• Therefore, we must choose the eigenvector corresponding to the largest eigenvalue.

Page 39:

PCA

• The single best dimension is given by the eigenvector corresponding to the largest eigenvalue of S.

• The best k dimensions can be obtained from the eigenvectors {β₁, β₂, …, βₖ} corresponding to the k largest eigenvalues.

• To obtain the k-dimensional representation, take BᵀM, where Bᵀ is the matrix with rows β₁ᵀ, …, βₖᵀ and M is the (mean-centered) data matrix.
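A sketch of these steps in numpy: build the scatter matrix S from the mean-centered points, keep the eigenvectors of the largest eigenvalues, and project. The 2-gene, 6-sample values are made-up numbers echoing the correlated-genes example:

```python
import numpy as np

# Rows = samples, columns = genes; g2 roughly tracks 2 * g1 (highly correlated).
X = np.array([[1.0, 2.1],
              [2.0, 3.9],
              [3.0, 6.2],
              [4.0, 8.1],
              [5.0, 9.8],
              [6.0, 12.2]])

m = X.mean(axis=0)                       # mean vector
C = X - m                                # mean-centered data
S = C.T @ C                              # scatter matrix: sum_k (x_k - m)(x_k - m)^T

eigvals, eigvecs = np.linalg.eigh(S)     # eigh: S is symmetric
order = np.argsort(eigvals)[::-1]        # sort eigenvalues, largest first
k = 1
B = eigvecs[:, order[:k]]                # top-k eigenvectors as columns

projected = C @ B                        # k-dimensional representation (here 1-D)
reconstructed = m + projected @ B.T      # x'_k = m + B B^T (x_k - m)
print("fraction of scatter explained:", eigvals[order[:k]].sum() / eigvals.sum())
print("max reconstruction error:", np.abs(X - reconstructed).max())
```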
