Post on 15-Jan-2016
Interactive Exploration of Hierarchical Clustering Results
HCE (Hierarchical Clustering Explorer)
Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland, College Park
Cluster Analysis of Microarray Experiment Data
• About 100 ~ 20,000 gene samples
• Under 2 ~ 80 experimental conditions
• Identify similar gene samples
– starting point for studying unknown genes
• Identify similar experimental conditions
– develop a better treatment for a special group
• Clustering algorithms
– Hierarchical, K-means, etc.
Dendrogram
[Figure: dendrogram display; scale labels -3.64 and 4.87]
Interactive Exploration Techniques
• Dynamic Query Controls
– Number of clusters, Level of detail
• Coordinated Display
– Bi-directional interaction with 2D scattergrams
• Overview of the entire dataset
– Coupled with detail view
• Visual Comparison of Different Results
– Different results by different methods
Demonstration
• 99 Yeast genes
• 7 variables (time points)
• Download HCE at
– www.cs.umd.edu/hcil/multi-cluster
• More demonstration
– A.V. Williams Bldg, 3174
– 3:30-5:00pm, May 31.
Dynamic Query Controls
Filter out less similar genes
By pulling down the minimum similarity bar
Show only the clusters that satisfy the minimum similarity threshold
Help users determine the proper number of clusters
Easy to find the most similar genes
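The effect of the minimum similarity bar can be sketched in a few lines. This is only an illustration, not HCE's implementation: it assumes the bar is expressed as a maximum distance (the mirror image of a minimum similarity), that the clustering is available as an ordered list of merges, and that genes joining no cluster at the chosen level are hidden. The function name `clusters_at_threshold`, the gene names, and the distances are all hypothetical.

```python
def clusters_at_threshold(items, merges, max_distance):
    """Return the clusters formed by merges at or below max_distance.

    items: list of item names.
    merges: list of (members_1, members_2, distance) in merge order.
    Items that join no cluster at this threshold are filtered out.
    """
    cluster_of = {item: frozenset([item]) for item in items}
    for left, right, distance in merges:
        if distance > max_distance:
            continue  # this join lies above the bar, so it is not applied
        merged = frozenset(left) | frozenset(right)
        for item in merged:
            cluster_of[item] = merged
    distinct = set(cluster_of.values())
    # Keep only real clusters (two or more members), in a stable order.
    return sorted(sorted(c) for c in distinct if len(c) > 1)

# Hypothetical merge history for five made-up genes (distances are invented).
genes = ["g1", "g2", "g3", "g4", "g5"]
history = [
    (("g1",), ("g2",), 0.1),
    (("g3",), ("g4",), 0.3),
    (("g1", "g2"), ("g3", "g4"), 0.6),
    (("g1", "g2", "g3", "g4"), ("g5",), 0.9),
]
for bar in (0.2, 0.4, 1.0):
    print(bar, clusters_at_threshold(genes, history, bar))
```

Moving the bar changes both the number of clusters and which genes remain visible, which is the behavior the slide describes.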
Dynamic Query Controls
Adjust level of detail
By dragging up the detail cutoff bar
Show the representative pattern of each cluster
Hide detail below the bar
Easy to view global structure
Coordinated Displays
• Two experimental conditions for the x and y axes
• Two-dimensional scattergrams
– limited to two variables at a time
– readily understood by most users
– users can concentrate on the data without distraction
• Bi-directional interactions between displays
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Compressed Overview: averaging adjacent leaves
• Easy to locate interesting spots
Melanoma Microarray Experiment (3614 x 38)
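A minimal sketch of the "averaging adjacent leaves" idea follows. The slides do not say how HCE groups leaves, so this assumes contiguous groups of nearly equal size, one group per available display column. The function name `compress_overview` is hypothetical.

```python
def compress_overview(rows, width):
    """Average adjacent rows so that len(rows) leaves fit into `width` columns.

    rows: list of equal-length lists, one per dendrogram leaf, in leaf order.
    width: number of display columns available.
    """
    n = len(rows)
    if n <= width:
        return [list(r) for r in rows]  # everything fits, nothing to average
    compressed = []
    for col in range(width):
        # Contiguous slice of leaves assigned to this display column.
        start = col * n // width
        end = (col + 1) * n // width
        group = rows[start:end]
        # Average each experimental condition across the leaves in the group.
        compressed.append([sum(vals) / len(group) for vals in zip(*group)])
    return compressed

# Four leaves with two conditions each, squeezed into two columns.
leaves = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(compress_overview(leaves, 2))
```

With 3,614 leaves and 1,600 columns, as in the melanoma example, each column under this scheme would average two or three adjacent leaves.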
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Alternative Overview: changing bar width (2~10)
• Show more detail, but need scrolling
Cluster Comparison
• There is no perfect clustering algorithm!
• Different Distance Measures
• Different Linkage Methods
• Two dendrograms at the same time
– Show the mapping of each gene between the two dendrograms
– Busy screen with crossing lines
– Easy to see anomalies
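The mapping between two dendrograms can be pictured as one line per gene, joining its leaf position in the first result to its position in the second. The sketch below is an illustration under that assumption, not HCE's code: the function names and the two leaf orders are hypothetical. It counts how many pairs of lines cross, which gives a rough measure of how "busy" the screen would be.

```python
def mapping_lines(order_a, order_b):
    """Pair each item's position in dendrogram A with its position in B."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    return [(i, pos_b[item]) for i, item in enumerate(order_a)]

def count_crossings(lines):
    """Count pairs of lines that cross (inversions between the two orders)."""
    crossings = 0
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (a1, b1), (a2, b2) = lines[i], lines[j]
            # Two lines cross when their endpoints are in opposite order.
            if (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings

# Hypothetical leaf orders produced by two different clustering runs.
first_order = ["A", "D", "C", "B"]
second_order = ["A", "C", "D", "B"]
lines = mapping_lines(first_order, second_order)
print(lines, count_crossings(lines))
```

Identical leaf orders give zero crossings, so a gene whose line cuts across many others is a candidate anomaly.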
Cluster Comparison
Conclusion
• Integrate four features to interactively explore clustering results to gain a stronger understanding of the significance of the clusters
– Overview, Dynamic Query, Coordination, Cluster Comparison
• Powerful algorithms + Interactive tools
• Bioinformatics Visualization
www.cs.umd.edu/hcil/multi-cluster
July 2002 IEEE Computer Special Issue on BioInformatics
Hierarchical Clustering
Single Linkage
Initial Data Items: A B C D
Current Clusters: A, B, C, D
Distance Matrix:
Dist  A   B   C   D
A     -   20  7   2
B     -   -   10  25
C     -   -   -   3
D     -   -   -   -
Hierarchical Clustering
Single Linkage
Closest pair: A and D (distance 2), merged into AD
Current Clusters: AD, B, C
Distance Matrix:
Dist  AD  B   C
AD    -   20  3
B     -   -   10
C     -   -   -
Hierarchical Clustering
Single Linkage
Closest pair: AD and C (distance 3), merged into ADC
Current Clusters: ADC, B
Distance Matrix:
Dist  ADC  B
ADC   -    10
B     -    -
Hierarchical Clustering
Single Linkage
Closest pair: ADC and B (distance 10), merged into ADCB
Final Result: ADCB (all items in one cluster)
Distance Matrix:
Dist   ADCB
ADCB   -
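The single-linkage walkthrough can be reproduced with a short script. This is a naive sketch for illustration, not how HCE computes its clusters: it rescans all pairs at every step, and the function name `single_linkage` and the dictionary layout are assumptions. The distances are the ones from the four-item example.

```python
from itertools import combinations

def single_linkage(labels, dist):
    """Agglomerative clustering with single linkage.

    labels: list of item names.
    dist: dict mapping frozenset({a, b}) -> distance between items a and b.
    Returns the merges in order as (cluster_1, cluster_2, distance) tuples,
    where each cluster is a sorted tuple of item names.
    """
    clusters = [(name,) for name in labels]
    merges = []

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the two current clusters that are closest to each other.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges

# Distances from the four-item example (A, B, C, D).
example = {
    frozenset("AB"): 20, frozenset("AC"): 7, frozenset("AD"): 2,
    frozenset("BC"): 10, frozenset("BD"): 25, frozenset("CD"): 3,
}
merges = single_linkage(["A", "B", "C", "D"], example)
for c1, c2, d in merges:
    print("merge", "".join(c1), "+", "".join(c2), "at", d)
```

The merges come out at distances 2, 3, and 10, matching the three steps shown on the slides.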