proximity data visualization with h-plots · if triangle inequality is not hold, although dij is...
TRANSCRIPT
![Page 1: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/1.jpg)
Irene Epifanio Dpt. Matemàtiques, Univ. Jaume I (SPAIN)[email protected]; http://www3.uji.es/~epifanio
The fifth international conference useR! 2009
Proximity data visualization with h-plots
![Page 2: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/2.jpg)
Outline
Motivating problemMethodologySmall-size examplesPoint patternsConclusions
![Page 3: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/3.jpg)
Motivating problem
In Ayala et al. 2006: to find groups corresponding with different morphologies of the corneal endothelia
Different dissimilarities (non-metric) between human corneal endothelia.
![Page 4: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/4.jpg)
Motivating problem
Corneal endothelia described by bivariate point patterns (centroids and triple points).
Different dissimilarities (triangle inequality is not hold) between point patterns.
![Page 5: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/5.jpg)
Methodology: h-plot
X data matrix, S covariance matrix: λ1, λ2 largesteigenvalues, q1, q2 unit eigenvectors:
Rows hj of matrix H2 have properties:
![Page 6: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/6.jpg)
Methodology: h-plot
We do not have a classical data matrix, but a dissimilarity matrix, D: dij represents the dissimilarity from the object i to object j. Asymmetric relationship (dij ≠ dji): we can consider the variable measuring dissimilarity from j to other objects (dj.) and the dissimilarity to j (d.j).With a symmetric dissimilarity (dj.= d.j): variable jrepresents dissimilarity with respect j.Euclidean distance between hj and hi in h-plot is sample standard deviation of difference between variables dj. and di.. If these variables are similar, their difference, and therefore, its standard deviation will be small.
![Page 7: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/7.jpg)
ComparisonClassical Metric Multidimensional (cmdscale) Isomap (Tenenbaum et al., 2000)Kruskal's Non-metric Multidimensional Scaling
(isoMDS) and Sammon's Non-Linear Mapping(sammon): Library MASS.
Congruence coefficient (0-1): similarity of two configurations X and Y.
1 is achieved if X and Y are perfectly similar geometrically (match by rigid motions and dilations).
![Page 8: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/8.jpg)
Example 1
If triangle inequality is not hold, although dij is small, variables dj.and di. can be very different, and the objects i and j should not be represented near.
![Page 9: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/9.jpg)
Example 2The observed values for variables dj.and di. coincide, but dij is not zero, therefore the observed difference between dj.and di. is zero for all the observed objects, except for the objects i and j.
![Page 10: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/10.jpg)
Example 3Asymmetric data: d is not a distance. Even when djj > 0.Dissimilarity formed by the variables giving the dissimilarity fromeach Morse code (i.e. di., where code i-th is first presented), and the variables giving the dissimilarity to each Morse code (i.e. d.i, where code i-th is second presented).
![Page 11: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/11.jpg)
Point patterns: simulation
Same experiments considered in Ayala et al. (Clustering of spatial point patterns. Computational Statistics & Data Analysis. 50 (4) 1016-1032, 2006):
Three experiments for simulated Strauss processes with different parameters.
In each experiment, the same experimental setup: three different groups, each of them composed of 100 point patterns. Therefore, 3 dissimilarity matrices of 300x300.
Considered dissimilarity (based on the log rank statistic applied to the nearest-neighbor distances, Ayala et al. 2006) between point patterns is not a metric: triangle inequality is not hold.
Libraries of R used: Splancs; Spatstat and Survival.
![Page 12: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/12.jpg)
Point patterns: simulation
Corsten and Gabriel (1976) goodness of fit for h-plotting in two dimensions:
![Page 13: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/13.jpg)
Point patterns: Experiment 1
One of the 100 point patterns generated for each group.
Note that we compute the dissimilarity between thesepoint patterns, not inside them.
![Page 14: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/14.jpg)
Point patterns: Experiment 1
a) Cmdscaleb) isoMDSc) Sammond) Isomap (25
neighbors)
![Page 15: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/15.jpg)
Point patterns: Experiment 1
Besides the original dissimilarities, the ranking of the dissimilarities have been also considered (Seber 1984: if we have in mind cluster and pattern detection, then an expansion or contraction of the configuration could be more useful).
![Page 16: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/16.jpg)
Point patterns: Endothelia
The dissimilarity matrix is made up of dissimilarities based on the log rank statistic applied to the nearest-neighbordistance between triple points (Ayala et al. 2006), for 153 individuals.
The unhealthy cases obtained in (Ayala et al. 2006) are represented by red triangles, while black circles are healthy cases.
![Page 17: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/17.jpg)
Point patterns: Endothelia
a) Cmdscaleb) isoMDSc) Sammond) Isomap (25
neighbors)
![Page 18: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/18.jpg)
Point patterns: Endothelia
(a) the original dissimilarities, and (b) the dissimilarity ranks.
![Page 19: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/19.jpg)
ConclusionsAlternative method for displaying dissimilarity matrices, based
on h-plots.Good behavior through several examples (dissimilarity was
not a metric).Non-iterative method, very simple to implement and
computationally efficient.The representation goodness can also be easily assessed. It can also handle naturally asymmetric data.More illustrative results at: http://www3.uji.es/~epifanio/RESEARCH/hplot.pdfFuture work: instead of second order differences between
variables that indicates dissimilarity with respect to an object: higher order differences. Although the simplicity could be lost.
![Page 20: Proximity data visualization with h-plots · If triangle inequality is not hold, although dij is small, variables d j. and d i. can be very different, and the objects i and j should](https://reader031.vdocuments.us/reader031/viewer/2022022805/5cae423388c99383228c776c/html5/thumbnails/20.jpg)
Thanks for your attention