Introduction to Bioinformatics - Tutorial no. 12
![Page 1: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/1.jpg)
Introduction to Bioinformatics - Tutorial no. 12
Expression Data Analysis:
- Clustering
- GEO
- EPClust
![Page 2: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/2.jpg)
Application of Microarrays
We only know the function of about 20% of the 30,000 genes in the human genome.
- Gene exploration
- Faster and better
Applications:
- Evolution
- Behavior
- Cancer research
![Page 3: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/3.jpg)
Microarray Analysis
Unsupervised Grouping: Clustering
Pattern discovery via grouping similarly expressed genes together
Three techniques most often used:
- k-Means Clustering
- Hierarchical Clustering
- Kohonen Self-Organizing Feature Maps
![Page 4: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/4.jpg)
Hierarchical Agglomerative Clustering
Michael Eisen, 1998
- Cluster (algorithm)
- TreeView (visualization)
Step 1: Compute a similarity score between all pairs of genes
- Pearson correlation
- Euclidean distance
Step 2: Find the two most similar genes and replace them with a node that contains their average
- Builds a tree of genes
Step 3: Repeat
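The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cluster program itself: the gene names and expression values are hypothetical toy data, Pearson correlation is chosen as the similarity score, and a merged node stores the unweighted average of the two profiles it replaces.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance; an alternative score (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(profiles):
    """Build a tree of genes as nested tuples from a dict name -> profile."""
    nodes = dict(profiles)  # label (gene name or nested tuple) -> profile
    while len(nodes) > 1:
        labels = list(nodes)
        # Steps 1 and 2: score all pairs and pick the most similar one.
        a, b = max(
            ((p, q) for i, p in enumerate(labels) for q in labels[i + 1:]),
            key=lambda pair: pearson(nodes[pair[0]], nodes[pair[1]]),
        )
        # Replace the pair with a node holding the average profile.
        merged = [(u + v) / 2 for u, v in zip(nodes[a], nodes[b])]
        del nodes[a], nodes[b]
        nodes[(a, b)] = merged
        # Step 3: repeat until a single root node remains.
    return next(iter(nodes))

# Hypothetical toy data: two rising and two falling profiles.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.5, 4.0, 2.0],
}
tree = agglomerate(genes)
print(tree)
```

With these toy values the rising genes are joined first, then the falling genes, and the two groups meet at the root.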
![Page 5: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/5.jpg)
Agglomerative Hierarchical Clustering
[Figure: five data points, labeled 1 to 5, merged step by step into a dendrogram]
Distance between joined clusters
Need to define the distance between the new cluster and the other clusters:
- Single Linkage: distance between the closest pair.
- Complete Linkage: distance between the farthest pair.
- Average Linkage: average distance between all pairs, or distance between cluster centers.
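The linkage rules listed above can be written as small functions. This is a minimal sketch assuming Euclidean distance between points; the function names and the two example clusters are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    """Distance between the closest pair of points, one from each cluster."""
    return min(euclidean(p, q) for p, q in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(euclidean(p, q) for p, q in product(c1, c2))

def average_linkage(c1, c2):
    """Average distance over all pairs of points, one from each cluster."""
    dists = [euclidean(p, q) for p, q in product(c1, c2)]
    return sum(dists) / len(dists)

def centroid_linkage(c1, c2):
    """Distance between the cluster centers (coordinate-wise means)."""
    def center(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]
    return euclidean(center(c1), center(c2))

# Hypothetical clusters in two dimensions.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
for rule in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(rule.__name__, round(rule(a, b), 3))
```

Single linkage never exceeds average linkage, which in turn never exceeds complete linkage, so the choice of rule affects which clusters are joined next.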
Dendrogram
The dendrogram induces a linear ordering of the data points
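One way to see the induced ordering is to read the leaves of the tree from left to right. The sketch below assumes the dendrogram is stored as nested tuples; the tree and its leaf labels are hypothetical. Swapping the two children of any internal node leaves the tree unchanged but gives a different, equally valid ordering.

```python
def leaf_order(tree):
    """Return the leaves of a nested-tuple dendrogram from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)

def flip(tree):
    """Swap the two children of the root node."""
    left, right = tree
    return (right, left)

# Hypothetical dendrogram over five genes.
tree = ((("g5", "g2"), ("g4", "g1")), "g3")
print(leaf_order(tree))        # ['g5', 'g2', 'g4', 'g1', 'g3']
print(leaf_order(flip(tree)))  # ['g3', 'g5', 'g2', 'g4', 'g1']
```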
![Page 6: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/6.jpg)
Results of Clustering Gene Expression
CLUSTER is simple and easy to use
De facto standard for microarray analysis
Limitations:
- Hierarchical clustering in general is not robust
- Genes may belong to more than one cluster
![Page 7: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/7.jpg)
K-Means Clustering Algorithm
- Randomly initialize k cluster means
- Iterate:
  - Assign each gene to the nearest cluster mean
  - Recompute cluster means
- Stop when clustering converges
Notes:
- Really fast
- Genes are partitioned into clusters
- How do we select k?
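The loop described above can be sketched as follows. This is a minimal illustration under assumptions not stated on the slide: the initial means are drawn from the data points themselves, distance is Euclidean, convergence means the assignment stops changing, and an empty cluster keeps its previous mean. The function name, parameters, and data points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (assignment, means)."""
    rng = random.Random(seed)
    # Randomly initialize k cluster means (here: k distinct data points).
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the nearest cluster mean.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        # Stop when the clustering no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each cluster mean from its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assignment, means = kmeans(points, k=2)
print(assignment)
print(means)
```

Each point ends up in exactly one cluster, which is the partitioning noted above. The value of k must be supplied by the caller; the sketch does not attempt to choose it.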
![Page 8: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/8.jpg)
K-Means Algorithm
Randomly Initialize Clusters
![Page 9: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/9.jpg)
K-Means Algorithm
Assign data points to nearest clusters
![Page 10: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/10.jpg)
K-Means Algorithm
Recalculate Clusters
![Page 11: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/11.jpg)
K-Means Algorithm
Recalculate Clusters
![Page 12: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/12.jpg)
K-Means Algorithm
Repeat
![Page 13: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/13.jpg)
K-Means Algorithm
Repeat
![Page 14: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/14.jpg)
K-Means Algorithm
Repeat … until convergence
![Page 15: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/15.jpg)
![Page 16: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/16.jpg)
EPClust Input (1)
Expression data matrix
Extra annotation for gene rows
Method of tabulation
Name for further analysis
![Page 17: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/17.jpg)
EPClust Input (2)
Method of measuring distance between gene rows
Cluster hierarchically
Number k of means
Cluster into k means
![Page 18: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/18.jpg)
GEO: Gene Expression Omnibus
NCBI database for gene expression data
Founded at end of 2000
![Page 19: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/19.jpg)
![Page 20: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/20.jpg)
Querying GEO
Browse records
Search for entries containing a gene
Search for experiments
Search with Entrez
![Page 21: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/21.jpg)
SGD – Expression database
http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl
![Page 22: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/22.jpg)
SGD – Expression database
![Page 23: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/23.jpg)
SGD – Expression database
![Page 24: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/24.jpg)
SGD – Expression database
![Page 25: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/25.jpg)
Two labs are running experiments on the APO1 gene. Suggest a method that would allow them to compare their results.
- Gene grouping
- Relative values
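One possible reading of "relative values" is to express each measurement relative to a reference measured in the same lab, for example as a log2 ratio against a control sample, so that differences in absolute scale between labs cancel out. The sketch below assumes this interpretation; the gene name GENE_X and all intensity values are hypothetical.

```python
import math

def log_ratios(sample, reference):
    """log2(sample / reference) per gene; positive means higher than the reference."""
    return {gene: math.log2(sample[gene] / reference[gene]) for gene in sample}

# Hypothetical raw intensities: lab A and lab B measure on very different scales.
lab_a_treated = {"APO1": 800.0, "GENE_X": 200.0}
lab_a_control = {"APO1": 200.0, "GENE_X": 200.0}
lab_b_treated = {"APO1": 60.0, "GENE_X": 15.0}
lab_b_control = {"APO1": 15.0, "GENE_X": 15.0}

ratios_a = log_ratios(lab_a_treated, lab_a_control)
ratios_b = log_ratios(lab_b_treated, lab_b_control)
print(ratios_a)  # {'APO1': 2.0, 'GENE_X': 0.0}
print(ratios_b)  # {'APO1': 2.0, 'GENE_X': 0.0}
```

The raw numbers differ by more than a factor of ten between the two hypothetical labs, but the relative values agree.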
![Page 26: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/26.jpg)
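Explain how microarrays can be used as a basis for diagnostics

| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
|---|---|---|---|---|---|
| Gen1 | + | - | - | + | + |
| Gen2 | + | + | - | + | - |
| Gen3 | - | + | + | + | - |
| Gen4 | + | + | + | - | - |
| Gen5 | - | - | + | - | + |
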
![Page 27: Introduction to Bioinformatics - Tutorial no. 12](https://reader036.vdocuments.us/reader036/viewer/2022062323/568157df550346895dc55f06/html5/thumbnails/27.jpg)
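Explain how microarrays can be used as a basis for diagnostics

| | Sample 1 | Sample 2 | Sample 4 | Sample 3 | Sample 5 |
|---|---|---|---|---|---|
| Gen1 | + | - | + | - | + |
| Gen2 | + | + | + | - | - |
| Gen3 | - | + | + | + | - |
| Gen4 | + | + | - | + | - |
| Gen5 | - | - | - | + | + |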