Courtesy of Prof. Shixia Liu @Tsinghua University


Post on 24-Aug-2020


Page 1: Courtesy of Prof. Shixia Liu @Tsinghua Universityweb.cse.ohio-state.edu/~shen.94/5544/Slides/multi.pdf · "Visualizing High‐ Dimensional Structures by Dimension Ordering and Filtering


Page 2:

Outline

•  Introduction

•  Classification of Techniques

– Table

– Scatter Plot Matrices

– Projections

– Parallel Coordinates

•  Summary

Page 3:

Motivation

•  Real-world data contain multiple dimensions

Page 4:

Multivariate/Multidimensional Data Visualization

•  Multivariate data visualization is a specific type of information visualization that deals with multivariate/multidimensional data

•  The data to be visualized are high-dimensional, and the correlations among their many attributes are of interest

Page 5:

Dimensionality

•  Refers to the number of attributes present in the data

–  1: one-dimensional (1D) / univariate

–  2: two-dimensional (2D) / bivariate

–  3: three-dimensional (3D) / trivariate

–  >3: multidimensional / hypervariate / multivariate

•  The boundary between high and low dimensionality is not clear-cut; generally, high-dimensional data have more than 4 variables

Page 6:

Terminology

Dimensions vs. variables

•  Multidimensional: dimensionality of the independent dimensions

•  Multivariate: dimensionality of the dependent variables

Page 7:

Outline

•  Introduction

•  Classification of Techniques

– Projections

– Parallel Coordinates

– Table

– Scatter Plot Matrices

•  Summary

Page 8:

Classification of Techniques

•  Projection

•  Parallel Coordinates Plot

•  Table

•  Scatter Plot Matrix

Page 9:

•  What if we have too many dimensions?

•  An intuitive way is to project the data into a lower-dimensional space

•  Linear projections

•  Nonlinear projections

A projection (X -> Y) maps points {x1, x2, …, xm} in an n-dimensional space to points {y1, y2, …, ym} in a p-dimensional space (p << n) while preserving the distances between data items as well as possible.
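The slide does not name a distance-preservation measure. As a minimal sketch, assuming Euclidean distances and normalized stress (one common score, not necessarily the one used in this course), the code below compares two hypothetical linear projections Y = XW of the same 10-dimensional data down to p = 2. `linear_projection`, `pairwise_distances`, and `normalized_stress` are illustrative helpers, not library functions.

```python
import numpy as np


def linear_projection(X, W):
    """Project m points (rows of X, n-dimensional) to p dimensions via Y = X W (W is n x p)."""
    return X @ W


def pairwise_distances(Z):
    """Euclidean distance matrix between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def normalized_stress(X, Y):
    """Distance-preservation score (0 = perfect, assumed measure for illustration):
    sum_{i<j} (d_X(i,j) - d_Y(i,j))^2 / sum_{i<j} d_X(i,j)^2
    """
    DX, DY = pairwise_distances(X), pairwise_distances(Y)
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    return ((DX[iu] - DY[iu]) ** 2).sum() / (DX[iu] ** 2).sum()


rng = np.random.default_rng(0)
# 100 points in n = 10 dimensions; almost all variation lies in the first two coordinates
X = np.zeros((100, 10))
X[:, :2] = rng.normal(size=(100, 2)) * 5.0
X[:, 2:] = rng.normal(size=(100, 8)) * 0.01

W_good = np.eye(10)[:, :2]  # keep coordinates 0 and 1
W_bad = np.eye(10)[:, 8:]   # keep coordinates 8 and 9
print(normalized_stress(X, linear_projection(X, W_good)))
print(normalized_stress(X, linear_projection(X, W_bad)))
```

The first projection keeps the directions along which the data actually vary, so its stress is close to 0; the second keeps only near-constant coordinates and its stress is close to 1. Choosing W well is exactly what PCA automates.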

Page 10:

Classification

•  Linear projection

– Example: PCA (principal component analysis)

•  Non-linear projection

– Example: t-SNE (t-distributed stochastic neighbor embedding)

Page 11:

PCA

•  Seeks a lower-dimensional subspace (magenta) such that the orthogonal projection of the data points (red) onto this subspace maximizes the variance of the projected points (green)

Page 12:

Maximizes Variance

•  To begin with, consider the projection onto a one-dimensional space

•  The direction of this space

•  Variance

•  How to maximize this?

Trick: use a Lagrange multiplier to enforce the unit-length constraint on the direction

Page 13:

Maximizes Variance (cont’d)

Eigenvalue
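The formulas on the two variance slides did not survive extraction. Assuming they follow the standard one-dimensional derivation in Chapter 12 of Pattern Recognition and Machine Learning (the source the slides cite for the M-dimensional case), the argument can be sketched as follows for N data points x_n with mean \bar{x} and covariance matrix S:

```latex
% Direction of the one-dimensional space: a unit vector u_1
u_1^{\top} u_1 = 1

% Sample mean and covariance matrix of the data x_1, \dots, x_N
\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad
S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^{\top}

% Variance of the projected points u_1^T x_n
\frac{1}{N} \sum_{n=1}^{N} \left( u_1^{\top} x_n - u_1^{\top} \bar{x} \right)^2 = u_1^{\top} S u_1

% Trick: enforce the unit-length constraint with a Lagrange multiplier \lambda_1
\max_{u_1} \; u_1^{\top} S u_1 + \lambda_1 \left( 1 - u_1^{\top} u_1 \right)

% Setting the derivative with respect to u_1 to zero gives an eigenvalue problem
S u_1 = \lambda_1 u_1 \quad \Longrightarrow \quad u_1^{\top} S u_1 = \lambda_1
```

The projected variance therefore equals the eigenvalue λ1, so it is maximized by choosing u1 as the eigenvector of S with the largest eigenvalue (the first principal component).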

Page 14:

One Example

Page 15:

Extension to M Dimensions

•  Additional principal components are defined incrementally (for details, see Chapter 12 of Pattern Recognition and Machine Learning)

•  Conclusion for M dimensions:

•  The projection that maximizes the variance in M dimensions is defined by the M eigenvectors u1, ..., uM of the data covariance matrix S that correspond to the M largest eigenvalues λ1, ..., λM
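A minimal NumPy sketch of this M-dimensional result, assuming the data are centered first and the covariance matrix is normalized by 1/N; `pca` is a hypothetical helper, not a library function. Because S is symmetric, `np.linalg.eigh` is used; it returns eigenvalues in ascending order, so they are re-sorted.

```python
import numpy as np


def pca(X, M):
    """Project the rows of X (N x D) onto the M leading principal components.

    Returns (Y, U, eigvals): Y is N x M, U is D x M with orthonormal columns u_1..u_M,
    and eigvals holds the M largest eigenvalues of S in descending order.
    """
    Xc = X - X.mean(axis=0)                # center the data
    S = (Xc.T @ Xc) / X.shape[0]           # covariance matrix with 1/N normalization
    eigvals, eigvecs = np.linalg.eigh(S)   # symmetric eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1][:M]  # indices of the M largest eigenvalues
    U = eigvecs[:, order]
    return Xc @ U, U, eigvals[order]


rng = np.random.default_rng(1)
# correlated 3-D data: most of the variance lies along the direction (1, 2, 0)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t, 0.1 * rng.normal(size=500)])
Y, U, lam = pca(X, M=2)
print(lam)              # lambda_1 is much larger than lambda_2
print(Y.var(axis=0))    # variance of each projected coordinate
```

With the 1/N normalization, the variance of the k-th projected coordinate equals λk, which is the one-dimensional result applied to each component in turn.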

Page 16:

Covariance Matrix

Covariance

Page 17:

Fit an n-d Ellipsoid to the Data

Page 18:

t-SNE

Page 19:

t-SNE

•  Particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot

Page 20:

Major Goal

•  t-distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding.

Page 21:

Two Main Stages

•  First, t-SNE constructs a probability distribution over pairs of high-dimensional objects

– Similar objects have a high probability of being picked

– Dissimilar objects have an extremely small probability of being picked
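The slides do not say how the Gaussian bandwidths are chosen. In the original t-SNE paper, each σi is tuned so that the conditional distribution of point i reaches a user-set perplexity (2 raised to its entropy in bits). The sketch below assumes that approach, implemented as a simple binary search on β = 1/(2σi²); the tolerance, iteration limit, and helper names are hypothetical.

```python
import numpy as np


def conditional_p(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Gaussian conditional probabilities p_{j|i}, one bandwidth per point (row i sums to 1)."""
    n = X.shape[0]
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    P = np.zeros((n, n))
    target = np.log(perplexity)  # target entropy in nats (perplexity = e^H_nats = 2^H_bits)
    for i in range(n):
        d = np.delete(D[i], i)   # distances from point i to all other points
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shifting by the minimum avoids underflow
            p = w / w.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(H - target) < tol:
                break
            if H > target:  # distribution too flat: increase beta (smaller sigma)
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:           # distribution too peaked: decrease beta (larger sigma)
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def joint_p(X, perplexity=30.0):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n); they sum to 1."""
    Pc = conditional_p(X, perplexity)
    return (Pc + Pc.T) / (2 * X.shape[0])


# Usage: two well-separated clusters of 20 points each in 5-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(8.0, 1.0, (20, 5))])
P = joint_p(X, perplexity=5.0)
```

Almost all of the probability mass in P falls on pairs inside the same cluster, which is the "similar objects are likely to be picked" property described on the slide.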

Page 22:

Example – Step 1

Page 23:

Two Main Stages (cont’d)

•  Second, t-SNE defines a probability distribution over the points in the low-dimensional map

– Similar to the one in the high-dimensional space

– Minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map

Heavy-tailed Student t-distribution
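A minimal sketch of this second stage, assuming the standard t-SNE choices: map-point similarities from a Student t-kernel with one degree of freedom, normalized over all pairs, and the Kullback–Leibler divergence KL(P || Q) as the mismatch score. `joint_q` and `kl_divergence` are hypothetical helper names.

```python
import numpy as np


def joint_q(Y):
    """Low-dimensional affinities q_ij from a Student t-kernel with one degree of freedom.

    q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k != l} (1 + ||y_k - y_l||^2)^-1
    """
    D = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared map distances
    num = 1.0 / (1.0 + D)
    np.fill_diagonal(num, 0.0)  # a point is not its own neighbor: q_ii = 0
    return num / num.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij log(p_ij / q_ij); entries with p_ij = 0 contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Usage: affinities for 8 random map points in 2-D
Y = np.random.default_rng(0).normal(size=(8, 2))
Q = joint_q(Y)
print(Q.sum(), kl_divergence(Q, Q))  # sums to 1 (up to rounding); KL of Q with itself is 0
```

Optimizing t-SNE means moving the rows of Y so that `kl_divergence(P, joint_q(Y))` decreases.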

Page 24:

Example: Step Two

Page 25:

Example: Step Two

Before optimization

Page 26:

Example: Final Result

(Figure panels: Student t-distribution vs. Gaussian distribution)

Page 27:

The Student t-distribution

•  The volume of the N-dimensional ball of radius r scales as $r^N$

•  When N is large, if we pick random points uniformly in the ball, most points will be close to the surface, and very few will be near the center.
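The volume argument can be checked numerically. Since volume scales as r^N, the fraction of an N-ball lying within radius 0.9r is 0.9^N: about 0.81 for N = 2, about 0.349 for N = 10, and about 2.7e-5 for N = 100. The sketch below compares that formula with uniform samples, assuming the standard sampling recipe (a normalized Gaussian vector for the direction, radius U^(1/N)); both helper names are hypothetical.

```python
import numpy as np


def inner_fraction(N, ratio):
    """Fraction of an N-ball's volume within radius ratio * r (volume scales as r^N)."""
    return ratio ** N


def sample_ball(n_points, N, rng):
    """Uniform samples in the unit N-ball: random direction times radius U^(1/N)."""
    v = rng.normal(size=(n_points, N))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # uniform direction on the sphere
    r = rng.random(n_points) ** (1.0 / N)          # radius distributed like the volume
    return v * r[:, None]


for N in (2, 10, 100):
    pts = sample_ball(20000, N, np.random.default_rng(0))
    empirical = np.mean(np.linalg.norm(pts, axis=1) < 0.9)
    print(N, inner_fraction(N, 0.9), empirical)
```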

Page 28:

The Student t-distribution

•  If the same Gaussian distribution is used for the low-dimensional map points, there is not enough space in the low-dimensional map to accommodate the moderately distant points (the crowding problem)

•  Instead, use a Student t-distribution with one degree of freedom (i.e., a Cauchy distribution) for the map points

– It has a much heavier tail than the Gaussian distribution, which compensates for the original imbalance
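To make the tail difference concrete, the sketch compares the two unnormalized similarity kernels as a function of map distance d. The Gaussian bandwidth (σ = 1, i.e. exp(−d²/2)) is an assumption chosen for illustration.

```python
import math


def gaussian_kernel(d):
    """Unnormalized Gaussian similarity exp(-d^2 / 2) (sigma = 1, assumed for illustration)."""
    return math.exp(-d * d / 2.0)


def cauchy_kernel(d):
    """Unnormalized Student-t similarity with one degree of freedom (Cauchy): 1 / (1 + d^2)."""
    return 1.0 / (1.0 + d * d)


for d in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(f"d={d}: gaussian={gaussian_kernel(d):.2e}  cauchy={cauchy_kernel(d):.2e}")
```

Near d = 0 the two kernels are comparable, but at d = 5 the Cauchy kernel is roughly ten thousand times larger. A pair with a moderate similarity can therefore be placed much farther apart in the map, which relieves the crowding.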

Page 29:

Comparison

Page 30:

The Distribution Model

•  Probability model for high-dimensional data points

•  Probability model for low-dimensional map points

•  The difference between the two distributions
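The three formulas on this slide were lost in extraction. Assuming they match the standard definitions in van der Maaten and Hinton (2008), listed in the references, they read:

```latex
% High-dimensional data points: Gaussian conditional probabilities with a
% per-point bandwidth \sigma_i (set via the perplexity), then symmetrized over n points
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional map points: Student t-distribution with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Difference between the two distributions: Kullback-Leibler divergence
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```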

Page 31:

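The Solution

•  To minimize this cost, we perform gradient descent. The gradient can be computed analytically: $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$

•  Update y_i iteratively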
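A minimal sketch of the gradient and the iterative update, assuming a symmetric P with zero diagonal that sums to 1. It implements plain gradient descent as described on the slide; the published method additionally uses momentum and early exaggeration, which are left out here. The learning rate, iteration count, initialization scale, and helper names are hypothetical and would need tuning for real data.

```python
import numpy as np


def tsne_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    diff = Y[:, None, :] - Y[None, :, :]     # y_i - y_j, shape (n, n, dim)
    inv = 1.0 / (1.0 + (diff ** 2).sum(-1))  # Student-t kernel values
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                      # low-dimensional affinities q_ij
    W = (P - Q) * inv                        # (p_ij - q_ij)(1 + ||y_i - y_j||^2)^-1
    return 4.0 * (W[:, :, None] * diff).sum(axis=1)


def tsne_embed(P, dim=2, n_iter=500, lr=100.0, seed=0):
    """Plain gradient descent on the map points (no momentum, no early exaggeration)."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(P.shape[0], dim))  # small random initialization
    for _ in range(n_iter):
        Y -= lr * tsne_gradient(P, Y)                   # y_i <- y_i - lr * dC/dy_i
    return Y
```

Here P would come from the high-dimensional affinities of the first stage. For very small datasets the p_ij are comparatively large, so a smaller learning rate may be needed to keep the updates stable.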

Page 32:

One Example

Page 33:

Example: MNIST

•  Handwritten digits (0–9)

Page 34:

Package

•  Laurens van der Maaten: https://lvdmaaten.github.io/tsne/

–  L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

–  L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1):33-55, 2012.

–  L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AI-STATS), JMLR W&CP 5:384-391, 2009.

–  L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.

Page 35:

Comparison

•  PCA, MDS

– Linear techniques

– Focus on keeping the low-dimensional representations of dissimilar data points far apart

•  t-SNE

– Non-linear technique

– Captures much of the local structure of the high-dimensional data very well, while also revealing global structure such as the presence of clusters at multiple scales

Page 36:

Comparison

Page 37:

•  Inselberg, "Multidimensional detective" (parallel coordinates), 1997

Page 38:

Parallel Coordinates: Visual Design

•  Dimensions as parallel axes

•  Data items as polylines (connected line segments)

•  The intersections with the axes indicate the values of the corresponding attributes

(Figure: parallel axes dim1, dim2, dim3, …, dimn, each scaled from Min: 0 to Max: 1)
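A minimal sketch of this visual design, assuming per-attribute min-max normalization to [0, 1] and evenly spaced vertical axes. `polyline_vertices` is a hypothetical helper that returns, for each data item, the (x, y) vertices a plotting library would connect into that item's polyline.

```python
import numpy as np


def polyline_vertices(data, axis_spacing=1.0):
    """Vertices of each item's parallel-coordinates polyline.

    Axis k is a vertical line at x = k * axis_spacing; each attribute is min-max
    normalized to [0, 1], giving the y-coordinate where the polyline crosses that axis.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant attributes
    ys = (data - lo) / span                 # shape (items, dims)
    xs = np.arange(data.shape[1]) * axis_spacing
    return [list(zip(xs.tolist(), row.tolist())) for row in ys]


# Usage: three items with three attributes on very different scales
verts = polyline_vertices([[2.0, 10.0, 0.3], [4.0, 30.0, 0.1], [3.0, 20.0, 0.2]])
for item in verts:
    print(item)
```

Normalizing each axis independently is what lets attributes with very different units share the same vertical extent.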

Page 39:

Parallel Coordinates: Pros and Cons

Pro: Correlations among attributes can be studied by spotting the locations of the intersection points

Pro: Effective for revealing data distributions and functional dependencies

Con: Visual clutter due to the limited space available for each parallel axis

Con: Axes are packed very closely when the dimensionality is high
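One way to read correlations from the intersection points: between two adjacent axes, the segments of two items cross exactly when the items are ordered differently on the two attributes. Many crossings therefore suggest a negative correlation, and nearly parallel segments suggest a positive one. The sketch assumes both axes are oriented the same way (minimum at the bottom); `count_crossings` is a hypothetical helper.

```python
def count_crossings(a, b):
    """Number of item pairs whose segments cross between two adjacent axes.

    Segment i runs from a[i] on the left axis to b[i] on the right axis; segments
    i and j cross strictly between the axes exactly when (a[i] - a[j]) * (b[i] - b[j]) < 0.
    """
    n = len(a)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0
    )


x = [0.1, 0.4, 0.6, 0.9]
print(count_crossings(x, x))                   # 0: positive correlation, no crossings
print(count_crossings(x, [1 - v for v in x]))  # 6: negative correlation, every pair crosses
```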

Page 40:

•  Clustering and filtering approaches

•  Dimension reordering approaches

•  Visual enhancement approaches

Out5d dataset (5 dimensions, 16384 data items)

Page 41:

Star Coordinates

•  Scatterplots for higher dimensions: each attribute is an axis radiating from the center of a circle; each data item is a point

•  Changing the length of an axis → alters the contribution of that attribute

•  Changing the direction of an axis → the angles are no longer equal, which adjusts the correlations between attributes

Pro: Useful for gaining insight into hierarchically clustered datasets and for multi-factor analysis for decision-making
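A minimal sketch of the star-coordinates mapping, assuming attribute values already normalized to [0, 1] and, by default, unit-length axes evenly spaced around the circle. Each item's point is the sum of its attribute values times the corresponding axis vectors, so lengthening or rotating an axis changes how much (and in which direction) that attribute pulls the points. `star_coordinates` is a hypothetical helper name.

```python
import math


def star_coordinates(item, lengths=None, angles=None):
    """2-D position of one data item: sum over attributes of value * axis vector."""
    n = len(item)
    if lengths is None:
        lengths = [1.0] * n                              # default: unit-length axes
    if angles is None:
        angles = [2 * math.pi * j / n for j in range(n)]  # default: evenly spaced axes
    x = sum(v * l * math.cos(a) for v, l, a in zip(item, lengths, angles))
    y = sum(v * l * math.sin(a) for v, l, a in zip(item, lengths, angles))
    return x, y


# Usage: lengthening the first axis increases the contribution of attribute 0
print(star_coordinates([0.5, 0.5, 0.0, 0.0]))
print(star_coordinates([0.5, 0.5, 0.0, 0.0], lengths=[2.0, 1.0, 1.0, 1.0]))
```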

Page 42:

Table Lens

•  Represents rows as data items and columns as attributes

•  Each column is viewed as a histogram or plot

•  Information along rows or columns is interrelated

Pro: Uses the familiar concept “table”

The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information

Page 43:

Scatterplot Matrix

•  Scatterplot: 2 attributes projected along the x- and y-axis

•  A collection of scatterplots is organized in a matrix

Pro: Straightforward

Con: Important patterns in higher dimensions are barely recognizable

Con: Chaotic when the number of data items is too large

Page 44:

Outline

•  Introduction

•  Classification of Techniques

– Table

– Scatter Plot Matrices

– Projections

– Parallel Coordinates

– Pixel-Oriented Techniques

–  Iconography

•  Summary

Page 45:

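Visualizations: Advantages and Disadvantages

•  Projections

– Advantages: clear visual patterns

– Disadvantages: 1. obscured semantics; 2. loss of information; 3. visual clutter

•  Parallel coordinates

– Advantages: clear visual patterns

– Disadvantages: visual clutter

•  Table lens

– Advantages: uses the familiar concept “table”

– Disadvantages: supports a limited number of dimensions

•  Scatterplot matrix

– Advantages: simple

– Disadvantages: 1. visual clutter; 2. unclear patterns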

Page 46:

Further Reading

•  Survey – Dos Santos, Selan, and Ken Brodlie. "Gaining understanding of multivariate and multidimensional data through visualization." Computers & Graphics 28.3 (2004): 311-325.

•  Website – http://www.sci.utah.edu/~shusenl/highDimSurvey/website/

Page 47:

Further Reading

•  Evaluation – Rubio-Sánchez, Manuel, et al. "A comparative study between RadViz and Star Coordinates." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 619-628.

Page 48:

References

•  Rao, Ramana, and Stuart K. Card. "The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1994.

•  Gratzl, Samuel, et al. "LineUp: Visual analysis of multi-attribute rankings." IEEE Transactions on Visualization and Computer Graphics 19.12 (2013): 2277-2286.

•  van Wijk, Jarke J., and Robert van Liere. "HyperSlice: visualization of scalar functions of many variables." Proceedings of the 4th Conference on Visualization '93. IEEE Computer Society, 1993.

•  Kim, Hannah, et al. "InterAxis: Steering Scatterplot Axes via Observation-Level Interaction." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 131-140.

Page 49:

References

•  Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.

•  Zhou, Hong, et al. "Visual clustering in parallel coordinates." Computer Graphics Forum. Vol. 27. No. 3, 2008.

•  Ferdosi, Bilkis J., and Jos BTM Roerdink. "Visualizing High-Dimensional Structures by Dimension Ordering and Filtering using Subspace Analysis." Computer Graphics Forum. Vol. 30. No. 3, 2011.

•  Novotny, Matej, and Helwig Hauser. "Outlier-preserving focus+context visualization in parallel coordinates." IEEE Transactions on Visualization and Computer Graphics 12.5 (2006): 893-900.

Page 50:

References

•  Keim, Daniel A., and H-P. Kriegel. "Visualization techniques for mining large databases: A comparison." IEEE Transactions on Knowledge and Data Engineering 8.6 (1996): 923-938.