![Page 1: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/1.jpg)
Microarrays
• Technology behind microarrays
• Data analysis approaches
• Clustering microarray data
![Page 2: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/2.jpg)
Molecular biology overview
[Figure labels: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), protein. Graphics courtesy of the National Human Genome Research Institute.]
![Page 3: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/3.jpg)
Gene expression
• Cells are different because of differential gene expression.
• About 40% of human genes are expressed at any one time.
• A gene is expressed by transcribing DNA into single-stranded mRNA.
• mRNA is later translated into a protein.
• Microarrays measure the level of mRNA expression.
![Page 4: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/4.jpg)
Basic idea
• mRNA expression represents dynamic aspects of the cell
• mRNA expression can be measured with the latest technology
• mRNA is isolated and labeled using a fluorescent material
• mRNA is hybridized to the target; the level of hybridization corresponds to light emission, which is measured with a laser
• Higher concentration → more hybridization → more mRNA
![Page 5: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/5.jpg)
A demonstration
• DNA microarray animation by A. Malcolm Campbell.
http://www.bio.davidson.edu/Courses/genomics/chip/chip.html
![Page 6: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/6.jpg)
Experimental conditions
• Different tissues
• Different developmental stages
• Different disease states
• Different treatments
![Page 7: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/7.jpg)
Background papers
• Background paper 1
• Background paper 2
• Background paper 3
![Page 8: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/8.jpg)
Microarray types
The main types of gene expression microarrays:
• Short oligonucleotide arrays (Affymetrix)
• cDNA or spotted arrays (Brown lab)
• Long oligonucleotide arrays (Agilent Inkjet)
• Fiber-optic arrays
• ...
![Page 9: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/9.jpg)
Affymetrix chips
[Figure: Affymetrix chip and its raw image; labeled dimensions 1.28 cm and 18 µm.]
![Page 10: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/10.jpg)
Competitive hybridization
![Page 11: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/11.jpg)
Microarray image data
[Figure: mouse heart versus liver hybridization.]
![Page 12: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/12.jpg)
More images
[Figure labels: gene GTF4; reference cDNA; experimental cDNA; upregulated; downregulated.]
![Page 13: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/13.jpg)
Characteristics of microarray data
• Extremely high dimensionality
– Experiment = (gene1, gene2, …, geneN)
– Gene = (experiment1, experiment2, …, experimentM)
– N is often on the order of 10^4
– M is often on the order of 10^1
• Noisy data
– Normalization and thresholding are important
• Missing data
– For some experiments a given gene may have failed to hybridize
![Page 14: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/14.jpg)
Microarray data
GENE_NAME  alpha 0  alpha 7  alpha 14  alpha 21  alpha 28  alpha 35  alpha 42
YBR166C     0.33  -0.17   0.04  -0.07  -0.09  -0.12  -0.03
YOR357C    -0.64  -0.38  -0.32  -0.29  -0.22  -0.01  -0.32
YLR292C    -0.23   0.19  -0.36   0.14  -0.4    0.16  -0.09
YGL112C    -0.69  -0.89  -0.74  -0.56  -0.64  -0.18  -0.42
YIL118W     0.04   0.01  -0.81  -0.3    0.49   0.08  (six values; one column is empty)
YDL120W     0.11   0.32   0.03   0.32   0.03  -0.12   0.01
Missing value! (The YIL118W row has only six of the seven measurements; which time point is empty is not recoverable from this transcript.)
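A minimal sketch of one way to cope with a missing entry like the one above before clustering. Row-mean imputation is an assumption made here for illustration, not a method named in the slides, and `impute_row_mean` and the toy values are hypothetical.

```python
def impute_row_mean(matrix):
    """Replace None entries in each row (gene) with the mean of that row's observed values."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("row has no observed values to average")
        row_mean = sum(observed) / len(observed)
        filled.append([row_mean if v is None else v for v in row])
    return filled


# Toy expression values (not taken from the slide); None marks a failed hybridization.
toy = [
    [0.2, None, -0.4],
    [1.0, 0.5, 0.0],
]
filled = impute_row_mean(toy)
```

Rows without gaps are returned unchanged. Other strategies, such as dropping the gene or skipping the missing position when computing distances, are equally plausible choices.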
![Page 15: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/15.jpg)
Data mining challenges
• Too few experiments (samples), usually < 100
• Too many columns (genes), usually > 1,000
• Too many columns lead to false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed
• Model needs to be explainable to biologists
![Page 16: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/16.jpg)
Data processing
• Gridding
– Identifying spot locations
• Segmentation
– Identifying foreground and background
• Removal of outliers
• Absolute measurements
– cDNA microarray
  • Intensity level of red and green channels
![Page 17: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/17.jpg)
Data normalization
• Normalize data to correct for variances
– Dye bias
– Location bias
– Intensity bias
– Pin bias
– Slide bias
• Control vs. non-control spots
– Maintenance genes
![Page 18: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/18.jpg)
Data normalization
[Figure: two panels. Calibrated: red and green equally detected. Uncalibrated: red light under-detected.]
![Page 19: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/19.jpg)
Normalization
[Figure: scatter plot of Cy5 signal (log2) versus Cy3 signal (log2).]
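As a hedged illustration of correcting a dye bias, the sketch below computes log2(Cy5/Cy3) per spot and shifts the ratios so that their median is zero. Global median centering is an assumption chosen for simplicity; the slides do not say which normalization method was used. The function names and intensities are made up.

```python
import math
import statistics


def log_ratios(cy5, cy3):
    """log2(Cy5 / Cy3) for each spot; all intensities must be positive."""
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]


def median_center(values):
    """Shift the values so that their median becomes 0."""
    med = statistics.median(values)
    return [v - med for v in values]


# Toy intensities in which the red (Cy5) channel is under-detected overall.
cy5 = [50.0, 200.0, 800.0]
cy3 = [200.0, 400.0, 800.0]

raw = log_ratios(cy5, cy3)
normalized = median_center(raw)
```

After centering, a spot with a ratio of 0 is read as "no change" under the assumption that most genes are not differentially expressed.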
![Page 20: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/20.jpg)
Data analysis
• What kinds of questions do we want to ask?
– Clustering
  • What genes have similar function?
  • Can we subdivide experiments or genes into meaningful classes?
– Classification
  • Can we correctly classify an unknown experiment or gene into a known class?
  • Can we make better treatment decisions for a cancer patient based on gene expression profile?
![Page 21: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/21.jpg)
Clustering goals
• Find natural classes in the data
• Identify new classes / gene correlations
• Refine existing taxonomies
• Support biological analysis / discovery
• Different methods
– Hierarchical clustering, SOMs, k-means, etc.
![Page 22: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/22.jpg)
Clustering techniques
• Distance measures
– Euclidean: $\sqrt{\sum_i (x_i - y_i)^2}$
– Vector angle: cosine of angle = $\dfrac{x \cdot y}{\sqrt{x \cdot x}\,\sqrt{y \cdot y}}$
– Pearson correlation
  • Subtract mean values and then compute vector angle
  • $\dfrac{(x-\bar{x}) \cdot (y-\bar{y})}{\sqrt{(x-\bar{x}) \cdot (x-\bar{x})}\,\sqrt{(y-\bar{y}) \cdot (y-\bar{y})}}$
  • Pearson correlation treats the vectors as if they were the same (unit) length, therefore it is insensitive to the amplitude of changes that may be seen in the expression profiles.
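The three measures above can be sketched in plain Python as follows. The function names and the two example profiles are illustrative assumptions, and vectors with zero length or zero variance would raise a division error here.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine(x, y):
    """Cosine of the angle between x and y."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def pearson(x, y):
    """Pearson correlation: subtract each vector's mean, then take the cosine."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Two made-up profiles with the same shape but a tenfold difference in amplitude.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
```

For these two profiles the Euclidean distance is large while the Pearson correlation is essentially 1, which is the amplitude insensitivity described in the last bullet.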
![Page 23: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/23.jpg)
K-means clustering
• Randomly assign k points to k clusters
• Iterate
– Assign each point to its nearest cluster (use centroid of clusters to compute distance)
– After all points are assigned to clusters, compute new centroids of the clusters and re-assign all the points to the cluster of the closest centroid.
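A minimal sketch of the iteration described above, assuming Euclidean distance and choosing k data points at random as the initial centroids. The stopping rule (stop when assignments no longer change), the `seed` and `max_iter` parameters, and the toy points are assumptions added for illustration.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initialization: k distinct data points become the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels, centroids


# Two well-separated toy groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(points, k=2)
```

The result depends on the random initialization in general, so in practice the procedure is often run several times with different seeds.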
![Page 24: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/24.jpg)
K-means demo
• K-means applet
![Page 25: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/25.jpg)
Hierarchical clustering
• Techniques similar to construction of phylogenetic trees.
• A distance matrix for all genes is constructed based on distances between their expression profiles.
• Neighbor-joining or UPGMA can be applied on this matrix to get a hierarchical cluster.
• Single-linkage, complete-linkage, average-linkage clustering
![Page 26: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/26.jpg)
Hierarchical clustering
• Hierarchical clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. A hierarchical clustering is often represented as a dendrogram.
[Figure: a hierarchical clustering of the most frequently used English words.]
![Page 27: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/27.jpg)
Hierarchical clustering
• In complete-link (or complete linkage) hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter (or: the two clusters with the smallest maximum pairwise distance).
• In single-link (or single linkage) hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance (or: the two clusters with the smallest minimum pairwise distance).
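The two merge rules can be sketched with a naive agglomerative loop. This is an illustrative, cubic-time version under the assumption of Euclidean distance; the function names and the one-dimensional toy points are hypothetical.

```python
import math


def cluster_distance(a, b, points, linkage):
    """Distance between clusters a and b, given as lists of point indices."""
    pairwise = [math.dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":
        return min(pairwise)  # closest pair of members
    if linkage == "complete":
        return max(pairwise)  # farthest pair of members
    raise ValueError("unknown linkage: " + linkage)


def agglomerate(points, linkage="single"):
    """Merge clusters until one remains; return the list of merges performed."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j], points, linkage), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


points = [[0.0], [1.0], [3.0], [7.0]]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
```

Each entry of the returned list records which two clusters were joined and at what distance, which is the information a dendrogram draws. On these points the merge order is the same for both rules, but the merge distances differ.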
![Page 28: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/28.jpg)
Inter-group distances
[Figure: inter-group distance under single-linkage and under complete-linkage.]
![Page 29: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/29.jpg)
Average-linkage
• UPGMA and neighbor-joining consider all cluster members when updating the distance matrix
![Page 30: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/30.jpg)
Hierarchical Clustering
![Page 31: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/31.jpg)
Hierarchical Clustering
Perou, Charles M., et al. Nature, 406, 747-752, 2000.
![Page 32: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/32.jpg)
Self organizing maps (SOM)
• Self Organizing Maps (SOM) by Teuvo Kohonen is a data visualization technique which helps to understand high dimensional data by reducing the dimensions of data to a map.
• The problem that data visualization attempts to solve is that humans simply cannot visualize high dimensional data as is, so techniques are created to help us understand this high dimensional data.
• The way SOMs go about reducing dimensions is by producing a map of usually 1 or 2 dimensions which plots the similarities of the data by grouping similar data items together.
![Page 33: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/33.jpg)
Components of SOMs: sample data
• The sample data that we need to cluster (or analyze), represented by n-dimensional vectors
• Examples:
– colors. The vector representation is 3-dimensional: (r,g,b)
– people. We may want to characterize 400 students in CEng. Are there different groups of students, etc. Example representation: 100 dimensional vector = (age, gender, height, weight, hair color, eye color, CGPA, etc.)
![Page 34: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/34.jpg)
Components of SOMs: the map
• Each pixel on the map is associated with an n-dimensional vector, and a pixel location value (x,y). The number of pixels on the map may not be equal to the number of sample data you want to cluster. The n-dimensional vectors of the pixels may be initialized with random values.
![Page 35: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/35.jpg)
Components of SOMs: the map
• The pixels and the associated vectors on the map are sometimes called “weight vectors” or “neurons” because SOMs are closely related to neural networks.
![Page 36: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/36.jpg)
SOMs: the algorithm
• initialize the map
• for t from 0 to 1
– randomly select a sample
– get the best matching pixel to the selected sample
– update the values of the best pixel and its neighbors
– increase t a small amount
• end for
![Page 37: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/37.jpg)
Initializing the map
• Assume you are clustering the 400 students in CEng.
• You may initialize a map of size 500x500 (250K pixels) with completely random values (i.e. random people). Or if you have some information about groups of people a priori, you may use this to initialize the map.
![Page 38: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/38.jpg)
Finding the best matching pixel
• After selecting a random student (or color) from the set that you want to cluster, you find the best matching pixel to this sample.
• Euclidean distance may be used to compute the distance between n-dimensional vectors.
– I.e., you select the closest pixel using the following equation:
  • $\text{best\_pixel} = \arg\min_{p \,\in\, \text{map}} \sum_{i=1}^{n} \left(x_{p,i} - x_{\text{sample},i}\right)^2$
![Page 39: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/39.jpg)
Updating the pixel values
• The best matching pixel and its neighbors are allowed to update themselves to resemble the selected sample
– new vector of a pixel is computed as current_pixel_value*(t)+sample_value*(1-t)
– in other words, in early iterations when t is close to 0, the pixel directly copies the properties of the randomly selected sample, but in subsequent iterations the allowed amount of changes decreases.
– Similarly for the neighbors of the best pixel, as the distance of the neighbor increases, they are allowed to update themselves in a smaller amount.
![Page 40: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/40.jpg)
Updating the pixel values
• A Gaussian function can be used to determine the neighbors and the amount of update allowed in each iteration. The height of the peak of the Gaussian will decrease and the base of the peak will shrink as time (t) progresses.
![Page 41: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/41.jpg)
Why do similar objects end up in near-by locations on the map?
• Because a randomly selected sample, A, influences the neighboring pixels to become similar to itself at a certain level.
• In a following iteration, suppose another sample, B, is selected randomly and it is similar to A. We have a greater chance of obtaining B’s best pixel on the map closer to A’s best pixel: the pixels around A’s best pixel were updated to resemble A, so if B is similar to A, its best pixel may be found in the same neighborhood.
![Page 42: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/42.jpg)
How to visualize similarities between high-dimensional vectors?
• Colors are easy to visualize, but how do we visualize similarities between students?
• The SOM may show how similar a pixel is to its neighbors (dark color: not similar, light color: similar). White blobs in the map will represent groups of similar people. Their properties can be analyzed by inspecting the vectors at those pixels.
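One way to obtain such a picture is to compute, for every pixel, the average distance between its vector and the vectors of its direct neighbors. The sketch below assumes a 4-neighborhood and Euclidean distance; the function name and the tiny map are hypothetical.

```python
import math


def neighbor_distance_map(grid):
    """Average distance from each pixel's vector to its up/down/left/right neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(grid[r][c], grid[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            # A 1x1 map has no neighbors; report 0 in that case.
            out[r][c] = sum(dists) / len(dists) if dists else 0.0
    return out


# A 1x3 toy map: two identical pixels followed by a very different one.
toy_map = [[[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]]
similarity = neighbor_distance_map(toy_map)
```

Small values would be drawn light (similar to neighbors) and large values dark (not similar), matching the color convention described above.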
![Page 43: Technology behind microarrays • Data analysis approaches ...user.ceng.metu.edu.tr/~tcan/ceng465_s0809/Schedule/ceng465_week13.pdf · • Data analysis approaches • Clustering](https://reader033.vdocuments.us/reader033/viewer/2022050718/5e173acd2a623938990d4958/html5/thumbnails/43.jpg)
SOM demo
• SOM applet