Bioinformatics (Lec 17), 3/15/05
A Dendrogram
Hierarchical Clustering [Johnson, SC, 1967]
• Given n points in R^d, compute the distance between every pair of points.
• While (not done):
  – Pick the closest pair of points s_i and s_j and make them part of the same cluster.
  – Replace the pair by their average, s_ij.
(A code sketch follows the applet link below.)
Try the applet at:http://www.cs.mcgill.ca/~papou/#applet
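A minimal sketch of this procedure, assuming Euclidean distance and centroid (average) merging; NumPy and the function name are illustrative, not from the slides.

```python
import numpy as np

def hierarchical_cluster(points):
    """Agglomerative clustering as above: repeatedly merge the closest pair
    of clusters and replace the pair by its average."""
    clusters = [np.asarray(p, dtype=float) for p in points]  # every point starts as its own cluster
    merge_distances = []                                     # distance at which each merge happened
    while len(clusters) > 1:
        # pick the closest pair of cluster representatives
        best_d, best_i, best_j = None, None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(clusters[i] - clusters[j])
                if best_d is None or d < best_d:
                    best_d, best_i, best_j = d, i, j
        merge_distances.append(best_d)
        merged = (clusters[best_i] + clusters[best_j]) / 2.0  # replace the pair by its average
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (best_i, best_j)] + [merged]
    return merge_distances
```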
Distance Metrics
• For clustering, define a distance function:
  – Euclidean distance metric
  – Pearson correlation coefficient
k = 2: Euclidean distance
    D(X, Y) = [ Σ_{i=1}^{d} (X_i − Y_i)^k ]^{1/k}

Pearson correlation coefficient:
    ρ_xy = (1/d) Σ_{i=1}^{d} ((X_i − X̄)/σ_x)((Y_i − Ȳ)/σ_y),    −1 ≤ ρ_xy ≤ 1
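Both measures as code, written directly from the formulas above; NumPy arrays and the function names are illustrative.

```python
import numpy as np

def euclidean(x, y):
    """D(X, Y) with k = 2: square root of the sum of squared coordinate differences."""
    return np.sqrt(np.sum((x - y) ** 2))

def pearson(x, y):
    """rho_xy = (1/d) * sum_i ((X_i - mean(X)) / sigma_x) * ((Y_i - mean(Y)) / sigma_y)."""
    d = len(x)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.dot(zx, zy)) / d
```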
[Figure: clustering of a data set, shown at Start and at End]
K-Means Clustering [MacQueen ’67]
Repeat
 – Start with randomly chosen cluster centers
 – Assign points to give the greatest increase in score
 – Recompute cluster centers
 – Reassign points
until (no changes)
(A code sketch follows the applet link below.)
Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html
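A minimal sketch of the loop above, assuming the usual score (each point is assigned to its nearest center); NumPy and all names are illustrative.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]  # randomly chosen cluster centers
    labels = None
    for _ in range(n_iter):
        # assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # no reassignments: done
        labels = new_labels
        for c in range(k):                         # recompute each center as the mean of its points
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return centers, labels
```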
Self-Organizing Maps [Kohonen]
• A kind of neural network.
• Clusters data and finds complex relationships between clusters.
• Helps reduce the dimensionality of the data.
• Produces a map of 1 or 2 dimensions.
• Unsupervised clustering.
• Like K-Means, except for visualization.
SOM Algorithm
• Select the SOM architecture, and initialize weight vectors and other parameters.
• While (stopping condition not satisfied) do
  for each input point x:
    – The winning node q has the weight vector closest to x.
    – Update the weight vector of q and its neighbors.
    – Reduce the neighborhood size and learning rate.
SOM Algorithm Details
• Distance between x and weight vector w_i:   ||x − w_i||
• Winning node:   q(x) = argmin_i ||x − w_i||
• Weight update function (for q and its neighbors):
    w_i(k+1) = w_i(k) + μ(k, x, i) [x(k) − w_i(k)]
• Learning rate (neighborhood function):
    μ(k, x, i) = η₀(k) exp( −||r_i − r_q(x)||² / σ² )
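A minimal sketch of these update rules for a 1-D map, where the node positions r_i are simply the node indices; η₀, σ, and their decay schedules are illustrative choices not given on the slide.

```python
import numpy as np

def train_som(data, n_nodes=10, n_epochs=20, eta0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    w = rng.random((n_nodes, data.shape[1]))   # weight vectors, one per map node
    r = np.arange(n_nodes, dtype=float)        # node positions on a 1-D map
    for epoch in range(n_epochs):
        eta = eta0 * (1.0 - epoch / n_epochs)              # reduce learning rate
        sigma = sigma0 * (1.0 - epoch / n_epochs) + 1e-3   # reduce neighborhood size
        for x in data:
            q = np.argmin(np.linalg.norm(x - w, axis=1))     # winning node: weight vector closest to x
            mu = eta * np.exp(-(r - r[q]) ** 2 / sigma ** 2)  # neighborhood function mu(k, x, i)
            w += mu[:, None] * (x - w)                        # w_i(k+1) = w_i(k) + mu * [x - w_i(k)]
    return w
```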
World Poverty SOM
World Poverty Map
Neural Networks
[Figure: a single neuron: inputs X weighted by synaptic weights W, summed (Σ) with bias θ, passed through activation ƒ(•) to produce output y]
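What the figure computes, as code: a weighted sum of the inputs plus the bias, passed through an activation ƒ; the sigmoid here is an illustrative choice, since the figure leaves ƒ(•) generic.

```python
import numpy as np

def neuron(x, w, theta):
    """y = f(w . x + theta), with a sigmoid chosen as the activation f."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + theta)))
```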
Learning NN
[Figure: learning in a neural network: inputs X (and a constant 1) multiplied by weights W and summed; the output is compared (+/−) with the desired response, and the resulting error drives an adaptive algorithm that updates W]
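The figure does not name the adaptive algorithm; below is a minimal LMS-style sketch of the error-driven update it depicts (learning rate and names are illustrative).

```python
import numpy as np

def adapt_step(w, x, desired, rate=0.01):
    """One adaptation step: form the output w . x, compare it with the
    desired response, and move the weights to reduce the error."""
    error = desired - np.dot(w, x)
    return w + rate * error * x, error
```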
Types of NNs
• Recurrent NN
• Feed-forward NN
• Layered
Other issues
• Hidden layers possible
• Different activation functions possible
Application: Secondary Structure Prediction
Support Vector Machines
• Supervised statistical learning method for:
  – Classification
  – Regression
• Simplest version:
  – Training: present a series of labeled examples (e.g., gene expression profiles of tumor vs. normal cells)
  – Prediction: predict the labels of new examples.
Learning Problems
[Figure: two classes of points, labeled A and B, scattered in feature space]
SVM – Binary Classification
• Partition the feature space with a surface.
• The surface is implied by a subset of the training points (vectors) near it. These vectors are referred to as Support Vectors.
• Efficient with high-dimensional data.
• Solid statistical theory.
• Subsumes several other methods.
Learning Problems
• Binary classification
• Multi-class classification
• Regression
SVM – General Principles
• SVMs perform binary classification by partitioning the feature space with a surface implied by a subset of the training points (vectors) near the separating surface. These vectors are referred to as Support Vectors.
• Efficient with high-dimensional data.
• Solid statistical theory.
• Subsumes several other methods.
SVM Example (Radial Basis Function)
SVM Ingredients
• Support Vectors
• Mapping from Input Space to Feature Space
• Dot Product – Kernel function
• Weights
Classification of 2-D (Separable) data
Classification of (Separable) 2-D data
Classification of (Separable) 2-D data
[Figure: a separating line with the two classes labeled +1 and −1]
• Margin of a point
• Margin of a point set
Classification using the Separator
[Figure: points x_i and x_j on opposite sides of the separator]
Separator: w•x + b = 0
w•x_i + b > 0
w•x_j + b < 0
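The decision rule in the figure, as code: classify a point by the sign of w•x + b (function name illustrative).

```python
import numpy as np

def classify(x, w, b):
    """+1 if w . x + b > 0, -1 if w . x + b < 0 (points on the separator are ambiguous)."""
    return 1 if np.dot(w, x) + b > 0 else -1
```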
Perceptron Algorithm (Primal)
Given a separable training set S and learning rate η > 0
w_0 = 0;  // weight
b_0 = 0;  // bias
k = 0;  R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (w_k • x_i + b_k) ≤ 0 then
      w_{k+1} = w_k + η y_i x_i
      b_{k+1} = b_k + η y_i R²
      k = k + 1
until no mistakes made within loop
Return k and (w_k, b_k), where k = # of mistakes
Rosenblatt, 1956
w = Σ_i a_i y_i x_i
Performance for Separable Data
Theorem: If the margin m of S is positive, then k ≤ (2R/m)²,
i.e., the algorithm will always converge, and will converge quickly.
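A direct transcription of the primal algorithm into Python (NumPy assumed); it returns (w, b) together with the mistake count k.

```python
import numpy as np

def perceptron_primal(X, y, eta=1.0):
    """X: N x d array of points, y: labels in {-1, +1}; assumes S is separable."""
    N, d = X.shape
    w = np.zeros(d)                                   # weight
    b = 0.0                                           # bias
    k = 0                                             # number of mistakes
    R = max(np.linalg.norm(x) for x in X)
    mistakes = True
    while mistakes:                                   # repeat until no mistakes made within loop
        mistakes = False
        for i in range(N):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:     # point i is misclassified
                w = w + eta * y[i] * X[i]
                b = b + eta * y[i] * R ** 2
                k += 1
                mistakes = True
    return w, b, k
```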
Perceptron Algorithm (Dual)
Given a separable training set S
a = 0;  b = 0;  R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j (x_i • x_j) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R²
    endif
until no mistakes made within loop
Return (a, b)
Non-linear Separators
Main idea: Map into feature space
Non-linear Separators
[Figure: mapping Φ from input space X to feature space F]
Useful URLs
• http://www.support-vector.net
Perceptron Algorithm (Dual)
Given a separable training set S
a = 0;  b = 0;  R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j k(x_i, x_j) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R²
until no mistakes made within loop
Return (a, b)

where k(x_i, x_j) = Φ(x_i) • Φ(x_j)
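A sketch of this kernelized dual form (NumPy assumed); `kernel` can be any function k(x_i, x_j), such as those listed on the next slide.

```python
import numpy as np

def perceptron_dual(X, y, kernel):
    """X: N x d points, y: labels in {-1, +1}; assumes S is separable in the
    feature space induced by the kernel."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(X)
    a = np.zeros(N)
    b = 0.0
    R = max(np.linalg.norm(x) for x in X)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])  # Gram matrix
    mistakes = True
    while mistakes:                                   # repeat until no mistakes made within loop
        mistakes = False
        for i in range(N):
            if y[i] * (np.sum(a * y * K[i]) + b) <= 0:
                a[i] += 1
                b += y[i] * R ** 2
                mistakes = True
    return a, b
```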
Different Kernel Functions
• Polynomial kernel:      κ(X, Y) = (X • Y)^d
• Radial Basis kernel:    κ(X, Y) = exp( −||X − Y||² / (2σ²) )
• Sigmoid kernel:         κ(X, Y) = tanh( ω (X • Y) + θ )
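The three kernels as Python functions, written directly from the formulas above (parameter names d, σ, ω, θ follow the slide). Any of them can be passed as the `kernel` argument of the dual perceptron sketch earlier.

```python
import numpy as np

def poly_kernel(x, y, d=2):
    return np.dot(x, y) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, omega=1.0, theta=0.0):
    return np.tanh(omega * np.dot(x, y) + theta)
```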
SVM Ingredients
• Support Vectors
• Mapping from Input Space to Feature Space
• Dot Product – Kernel function
Generalizations
• How to deal with more than 2 classes?
  Idea: Associate a weight and bias with each class (see the sketch below).
• How to deal with a non-linear separator?
  Idea: Support Vector Machines.
• How to deal with linear regression?
• How to deal with non-separable data?
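A minimal sketch of the first idea only (one weight vector and bias per class, predict the class with the largest score); this illustrates the idea and is not an algorithm given on the slides.

```python
import numpy as np

def classify_multiclass(x, W, b):
    """W: C x d matrix with one weight vector per class; b: length-C bias vector.
    Predicts the class c maximizing w_c . x + b_c."""
    return int(np.argmax(W @ x + b))
```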
Applications
• Text Categorization & Information Filtering
  – 12,902 Reuters stories, 118 categories (91%!)
• Image Recognition
  – Face detection, tumor anomalies, defective parts on assembly lines, etc.
• Gene Expression Analysis
• Protein Homology Detection
SVM Example (Radial Basis Function)