Clustering of DNA Microarray Data
Michael Slifker
CIS 526
DNA Microarrays
• Measure gene expression in a sample for thousands of genes simultaneously
• Used to compare gene expression among samples
– Between individuals or treatments
– Over time
– Between normal tissue and tumor
– Assess normal biological variation
Microarray Process
• Single-stranded DNA is printed onto a slide
• mRNA is extracted from cells
• The experimental mRNA sample and a reference sample are fluorescently labelled (Cy3-green, Cy5-red)
• The labelled RNA samples are hybridized onto the slide, where they bind to complementary DNA
• Laser scanning: the fluorescent labels allow the relative levels of bound mRNA to be measured
• Gridding, background correction, log-ratio transformation, normalization, analysis (finally!)
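The last preprocessing steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: spot intensities are already gridded and background-corrected, the log-ratio is taken as log2(experimental / reference), and normalization is a simple global median centering (one common choice; the slide does not say which method is used). The function name `log_ratio_normalize` and the `eps` pseudocount are hypothetical.

```python
import numpy as np

def log_ratio_normalize(experimental, reference, eps=1.0):
    """Return median-centred log2(experimental / reference) for each spot.

    experimental, reference: background-corrected spot intensities
    (one value per spot). eps is a hypothetical pseudocount that guards
    against log(0) for spots with no signal.
    """
    experimental = np.asarray(experimental, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Log-ratio transformation: 0 = equal expression, +1 = two-fold higher
    # in the experimental sample, -1 = two-fold lower.
    m = np.log2((experimental + eps) / (reference + eps))
    # Global median normalization: assume most genes are not differentially
    # expressed and shift the log-ratios so their median is 0. This removes
    # overall intensity differences between the two dyes/channels.
    return m - np.median(m)

exp_int = [100.0, 400.0, 50.0, 800.0]
ref_int = [200.0, 200.0, 100.0, 1600.0]
print(log_ratio_normalize(exp_int, ref_int, eps=0.0))
```

In this toy input the reference channel is uniformly about twice as bright (a dye or scanning bias), so every raw log-ratio is shifted down by 1. After median centering, the unchanged spots sit at 0 and the one spot that really differs ends up at +2, i.e. four-fold higher.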
![Page 4: Clustering of DNA Microarray Data Michael Slifker CIS 526](https://reader036.vdocuments.us/reader036/viewer/2022082820/56649e565503460f94b4ec24/html5/thumbnails/4.jpg)
• Red = low expression relative to reference
• Green = high expression relative to reference
• Yellow = similar expression in two samples
• Black = no expression in either sample
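The color legend can be read as a simple rule on the two channel intensities of a spot. The sketch below assumes, consistent with the legend and the labelling step listed earlier, that the experimental sample is Cy3 (green) and the reference is Cy5 (red). The function name `spot_color` and both thresholds (`fold`, `floor`) are hypothetical values chosen only for illustration.

```python
def spot_color(experimental, reference, fold=2.0, floor=50.0):
    """Classify one scanned spot with the red/green/yellow/black legend.

    Assumes the experimental sample is Cy3 (green) and the reference is
    Cy5 (red). `fold` (fold change needed to call a difference) and
    `floor` (intensity treated as "no signal") are hypothetical thresholds.
    """
    if experimental < floor and reference < floor:
        return "black"    # no expression in either sample
    # Guard against division by zero when one channel has no signal
    ratio = max(experimental, 1e-9) / max(reference, 1e-9)
    if ratio >= fold:
        return "green"    # high expression relative to reference
    if ratio <= 1.0 / fold:
        return "red"      # low expression relative to reference
    return "yellow"       # similar expression in both samples


for exp_i, ref_i in [(1200, 300), (300, 1200), (800, 700), (10, 20)]:
    print(exp_i, ref_i, spot_color(exp_i, ref_i))
```

Real scanners report continuous intensities, so the hard cutoffs here only mimic what the eye does when reading the false-color image.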
Example
• DLBCL – Diffuse large B-cell lymphoma (Alizadeh et al., 2000)
• ~18,000 genes x 96 samples of normal and malignant leukocytes
• Clinical evidence of great heterogeneity in patient survival
• Question: Are there subclasses of DLBCL that can be discovered by looking at gene expression profiles? (Answer: yes)
Why cluster?
• Very large numbers of genes and highly complex systems/pathways render clustering essential for interpretation and visualization
• Discover new tumor subclasses
• Describe common expression profiles (e.g., cell cycle)
What to cluster?
• Clustering genes:
– Look for groups of genes with similar expression profiles; such groups may identify genes involved in common biochemical pathways
• Clustering samples:
– Do clusters conform to known categories?
– Can new structure be discovered (e.g., new subclasses of tumor)?
• Clustering both genes and samples at once
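Whether genes or samples are clustered comes down to which axis of the expression matrix is treated as the set of objects. A minimal sketch, assuming a genes × samples matrix of log-ratios (random numbers stand in for real data here) and Pearson correlation as the similarity measure; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical expression matrix of log-ratios: rows = genes, columns = samples.
# (The DLBCL data set is roughly 18,000 x 96; this toy one is 200 x 12.)
X = rng.normal(size=(200, 12))

# Clustering genes: each gene (row) is an object described by its profile
# across samples, so similarities are computed between rows.
gene_corr = np.corrcoef(X)        # shape (200, 200)

# Clustering samples: each sample (column) is an object described by its
# profile across genes, so the same computation runs on the transpose.
sample_corr = np.corrcoef(X.T)    # shape (12, 12)

print(gene_corr.shape, sample_corr.shape)
```

The asymmetry matters at real scale: a full gene-by-gene matrix for 18,000 genes has 18,000² = 324 million entries (about 2.6 GB as 64-bit floats), while a 96 × 96 sample matrix has only 9,216. That is one practical reason to filter genes before clustering them. Clustering both genes and samples simply repeats the procedure on each axis and reorders the rows and columns of the matrix accordingly.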
Clustering methods
• Hierarchical (agglomerative) – most common
• K-means, PAM (partitioning around medoids)
• Self-organizing maps (SOMs)
• PCA clustering
• Ensemble methods
• “Fuzzy” methods – genes can belong to more than one cluster
• Model-based methods (e.g., mixtures of Gaussians)
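Hierarchical agglomerative clustering, the most common method on the list, can be sketched as below. This is a minimal, unoptimized illustration that assumes average linkage and 1 minus Pearson correlation as the distance (a popular choice for expression profiles, but only one of many combinations). The function names and the synthetic "up"/"down" gene profiles are hypothetical.

```python
import numpy as np

def correlation_distance(X):
    """Pairwise distance 1 - Pearson correlation between the rows of X."""
    return 1.0 - np.corrcoef(X)

def average_linkage(X, n_clusters):
    """Naive agglomerative clustering of the rows of X with average linkage.

    Starts with every row in its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance, stopping when
    n_clusters remain. Far too slow for 18,000 genes; a sketch only.
    """
    D = correlation_distance(X)
    clusters = [[i] for i in range(len(X))]
    merges = []  # (cluster_a, cluster_b, distance): the dendrogram, in order
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # new list; merges keeps the old ones
        del clusters[b]                          # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels, merges

# Ten hypothetical genes measured at ten time points: five follow one
# pattern, five follow the opposite pattern, each with added noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 10)
genes = np.vstack([np.sin(t) + rng.normal(0, 0.2, 10) for _ in range(5)] +
                  [-np.sin(t) + rng.normal(0, 0.2, 10) for _ in range(5)])
labels, merges = average_linkage(genes, n_clusters=2)
print(labels)
```

The `merges` list records each merge and its linkage distance, which is exactly what a dendrogram displays. Swapping the linkage rule (e.g. the minimum or maximum instead of the mean of cross-cluster distances) changes only the one line that computes `d`.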
Challenges
• Noisy data in a high-dimensional space
• Many choices of algorithm and algorithmic parameters
– What distance measure?
– What linkage?
– How many clusters?
• How can we assess quality/reliability?
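One common way to approach “How many clusters?” and “How can we assess quality?” is to run a partitioning method for several values of k and compare an internal validity index. The sketch below uses k-means (from the methods list) with k-means++ seeding and the mean silhouette width; these choices, the synthetic data, and the function names are assumptions for illustration, not the approach taken in these slides.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-SSE restart."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: later centers are drawn far from earlier ones
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            # Assignment step: nearest center by Euclidean distance
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: move each center to the mean of its points
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels

def silhouette(X, labels):
    """Mean silhouette width; values near 1 mean tight, well-separated clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        others = [c for c in np.unique(labels) if c != labels[i]]
        if not same.any() or not others:
            scores.append(0.0)  # convention: score 0 for singletons / a single cluster
            continue
        a = D[i, same].mean()                              # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in others)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical data: three well-separated groups of 20 profiles in 5 dimensions
rng = np.random.default_rng(42)
true_centers = np.array([[0, 0, 0, 0, 0], [10, 10, 0, 0, 0], [0, 0, 10, 10, 10]], dtype=float)
X = np.vstack([c + rng.normal(0, 0.5, size=(20, 5)) for c in true_centers])
scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 6)}
print(scores)
print("best k:", max(scores, key=scores.get))
```

On toy data this clean the silhouette peaks at the true number of groups; on noisy, high-dimensional expression data the curve is usually much flatter, which is part of why the choice of k remains a judgment call.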
![Page 10: Clustering of DNA Microarray Data Michael Slifker CIS 526](https://reader036.vdocuments.us/reader036/viewer/2022082820/56649e565503460f94b4ec24/html5/thumbnails/10.jpg)
Example (DLBCL)
![Page 11: Clustering of DNA Microarray Data Michael Slifker CIS 526](https://reader036.vdocuments.us/reader036/viewer/2022082820/56649e565503460f94b4ec24/html5/thumbnails/11.jpg)
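• Two main sample clusters can be seen
• The genes that separate them correspond to two different stages of B-cell differentiation (germinal center B cells and activated B cells)
• The clusters are associated with differences in survival beyond those predicted by traditional clinical indicators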
Conclusions
• To be useful, clustering of microarray data must ultimately be informed by biology
• The large number of genes and the complexity of pathways mean that clustering is an essential part of most microarray analyses
• There is no “best” method: the choices of distance measure, linkage, algorithm, and gene-filtering criteria all shape the result. Clustering is as much art as science