large scale information visualization · introduction class 1, part a. 4 motivation - data...
TRANSCRIPT
1
Large Scale Information Visualization
Jing YangFall 2007
2
Pre-requisite
Information Visualization Course slides: www.cs.uncc.edu/~jyang13Basic techniques will be covered in the first two weeksPlease learn other contents by yourself
3
Introduction
Class 1, Part A
4
Motivation - Data Explosion
Society is more complex: computers, internet, web, ...How much data?
Between 1 and 2 exabytes of unique info produced per year
1000000000000000000 bytes250 meg for every man, woman and childPrinted documents only .003% of total
Peter Lyman and Hal Varian, 2000Cal-Berkeley, Info Mgmt & Systems
www.sims.berkeley.edu/how-much-info
Slide courtesy of John Stasko
5
Data Overload
Confound: How to make use of the dataHow do we make sense of the data?How do we harness this data in decision-making processes?How do we avoid being overwhelmed?
Slide courtesy of John Stasko
6
Human Vision
Highest bandwidth senseFast, parallelPattern recognitionPre-attentiveExtends memory and cognitive capacityPeople think visually
Slides Slide courtesy of John Stasko
7
The Challenge
Transform the data into information (understanding, insight) thus making it useful to people
Slide courtesy of John Stasko
8
ExampleWhich state has the highest income? Is there a relationship between income and education?Are there any outliers?
Questions:
Example courtesy of Chris North
9
Visualize the Data
10
Illustrate Our Approach
Provide tools that present data in a way to help people understand and gain insight from itCliches“Seeing is believing”“A picture is worth a thousand words”
Slide courtesy of John Stasko
11
Don’t you believe it?
This graphic is worth at least 700 words
12
VisualizationVisualization - the use of computer-supported, interactive visual representations of data to amplify cognition
From [Card, Mackinlay Shneiderman ’98]
Often thought of as process of making a graphic or an imageReally is a cognitive process
Form a mental image of somethingInternalize an understanding
“The purpose of visualization is insight, not pictures”Insight: discovery, decision making, explanation
Slide courtesy of John Stasko
13
What is Information Visualization
Information Visualization is NOT Scientific Visualization
X
Scientific Visualization is primarily related to and represents something physical or geometric
Examples:Air flow over a wingStresses on a girderWeather over Pennsylvania
14
What is Information Visualization
It is about abstract data
X
15
InfoVis Is About Numerical Data
16
InfoVis Is About Ordinal Data
ThemeRiver: Visualizing Theme Changes Over Time [Havre et al. Infovis 00]
17
InfoVis Is About Nominal Data
Swiss mountain map, L. Matterhorn, 1983
18
InfoVis Is About Structured Data
19
InfoVis Is About Space
Statistics Bureau, Tokyo, 1985
20
InfoVis Is About Time
New York Times, January 1982
21
InfoVis Is About Change
22
InfoVis Is About Motion and Process
Illustration of magic turning a silver coin into a copper coin
23
InfoVis Is About All Kinds of Information
24
What is Information Visualization
It is about analyzing, communicating, and decision making
Of all method for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful - E. Tufte
25
What is Information Visualization
It is about scale and dimensionalityScale
Essential problem in reasoning is comparisonComparisons must be enforced within scope of eyespan
DimensionalityThe world is multi-dimensionalThe paper and computer is 2 dimensionalEscape from the flatland
26
What is Information Visualization
It is about interactive explorationWant to show multiple different perspectives on the dataA way to increase scalability and dimensionality
27
What is Information Visualization
It is about applicationsDocument, images, videos, multimediaFinancial/business dataInternet informationSoftware
28
The Need is There
In five years, 100 million people will be using an information-visualization tool on a near-daily basis. And products that have visualization as one of their top three features will earn $1 billion per year.
-- Ramana Rao, founder and chief technology
officer, Inxight Software Inc., Sunnyvale, Calif.
http://www.computerworld.com/databasetopics/data/story/0,10801,80243,00.html
Slide courtesy of John Stasko
29
Basic Techniques 1: Multi-dimensional Visualization
Class 1, Part B
30
Multi-dimensional (Multivariate) Dataset
Slide courtesy of John Stasko
31
Data Item (Object, Record, Case)
32
Dimension (Variable, Attribute)
33
Multidimensional DataExample: Iris Data
Scientists measured the sepal length, sepal width, petal length, petal width of many kinds of iris...
34
Multidimensional DataExample: Iris Data
sepallength
sepalwidth
petallength
petalwidth
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
... ... ... ...
5.9 3 5.1 1.8
35
ClassificationTable-based techniques
HeatMap, tableLensGeometric techniques
Scatterplot matrices, parallel coordinates, landscapes, Dust&Magnet …
Icon-based techniquesglyphs, shape-coding, *color icons, …
Hierarchical techniquesDimensional stacking, worlds-within-worlds, …
Pixel-oriented techniquesRecursive pattern, circle segments, spiral, axes techniques,…
36
Table-Based Techniques
Basic idea: Visualization that improves the existing spreadsheet table format
HeatMapTable Lens [RC 94]
37This figure is used by Dr. M. Ward’s permission
38This figure is used by Dr. M. Ward’s permission
39This figure is used by Dr. M. Ward’s permission
40This figure is used by Dr. M. Ward’s permission
41This figure is used by Dr. M. Ward’s permission
42This figure is used by Dr. M. Ward’s permission
43This figure is used by Dr. M. Ward’s permission
44This figure is used by Dr. M. Ward’s permission
45
Eureka (Table Lens)
Rao R., Card S. K.: ‘The Table Lens: Merging Graphical and Symbolic Representation in an Interactive Focus+Context Visualization for Tabular Information’, Proc. Human Factors in Computing Systems CHI ‘94Conf., Boston, MA, 1994, pp. 318-322.
46
Eureka (Table Lens)
47
Geometric TechniquesBasic idea: Visualization of geometric transformations and projections of data
Scatterplot-matrices [And 72, Cle 93]Parallel coordinates [Ins 85, ID 90]Parallel Glyphs [Fanea:05] Parallel Sets [Bendix:05]Star coordinates [Kan 2000]Landscapes [Wis 95]Dust & Magnet [Yi 2005]Projection Pursuit Techniques [Hub 85]Prosection Views [FB 94, STDS 95]Hyperslice [WL 93]
48
Recall 1-Dimensional Visualization
1 2
(1.6)
49
Parallel Coordinates
sepa llength
sepalw idth
peta llength
peta lw idth
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
... ... ... ...
5 .9 3 5.1 1.8
A. Inselberg. The Plane with Parallel Coordinates. Special Issue on Computational Geometry, The Visual Computer, 69-97
50
5.1
3.5
sepa lleng th
sepa lw id th
peta lleng th
peta lw id th
5 .1 3 .5 1 .4 0 .2
1.4 0.2
51
Cluster and Outlier
ClusterA group of data items that are similar in all dimensions.
Outlier A data item that is similar to FEW or No other data items.
52
53
54
Parallel Sets: Visual Analysis of Categorical Data. F. Bendix, R. Kosara, H. Hauser. Infovis 2005
Parallel Coordinates Parallel Sets
55
Scatterplot Matrix
56
Scatterplot Matrix
57
VaR Display
J. Yang et al. Value and Relation Display: Interactive Visual Exploration of Large Data Sets with Hundreds of Dimensions. IEEE Trans. Vis. Comput. Graph. 13(3): 494-507 (2007)
58
Projection Pursuit Techniques
Idea: For High-Dimensional Dataset1. locate projections to low-dimensional space that reveal most details about dataset structure 2. extract and analyze projection structures from projections
Two general approaches: manual and automatic.
59
Automatic Projection PursuitFriedman and Tukey
Friedman and Tukey’s method
1. characterize a given projection by a numerical index that indicates the amount of structure that is present. 2. heuristically search to locate the ``interesting'' projections using the index. 3. remove detected structures from the data. 4. Iterate until there is no remaining structure detectable within the data.
60
Landscapes
Hot topics of a news collectionL. Nowell, E. Hetzler, and T. Tanasse. “Change Blindness in Information Visualization: A Case Study”. Infovis 2001
61
Landscapes
How was the figure generated?Documents (data items)
Keywords (dimensions)
N-d vector for each documents
Projection from N-d space to 2-d space
Landscape view
Wise, J., Thomas, J., et al. “Visualizing theNon-Visual: Spatial Analysis and Interaction withInformation from Text Documents”, Infovis 95
62
Dust & MagnetDust & Magnet: Multivariate Information Visualization using a Magnet Metaphor, Yi et al. Information Visualization (2005)
Video
63
Hierarchical Techniques
Basic ideas: Visualization of the data using a hierarchical partitioning into subspaces
Dimensional Stacking [LWW90]Worlds-within-Worlds [FB 90a/b]Treemap [Shn 92, Joh 93]Cone Trees [RMC 91]InfoCube [RG93]
64
Dimensional Stacking
Imagine each data item (4 attributes) as a small block. We place all blocks on a table.
65
Dimensional Stacking
Add grids on the table. Place the blocks in the grids according to their values of attribute1.
According to values of attribute 1
66
Dimensional Stacking
Add grids in grids. Place the blocks in the grids according to their values of attribute2.
According to values of attribute 2
67
Dimensional Stacking
Add grids in grids. Place the blocks in the grids according to their values of attribute3.
According to values of attribute 3
68
Dimensional Stacking
Add grids in grids. Place the blocks in the grids according to their values of attribute4.
According to values of attribute 4
69
Dimensional Stacking
Fix one block!
Expand the block
70
Dimensional Stacking
Fix another block
71
Dimensional Stacking
Dimensional stacking!
72
Dimensional Stacking
visualization of oil mining data with longitude and latitude mapped to the outer x-, y- axes and ore grade and depth mapped to the inner x-, y- axes
M. Ward, Worcester Polytechnic Institute
73
Worlds within Worlds
Feiner S., Beshers C.: ‘World within World: Metaphors for Exploring n-dimensional Virtual Worlds’, Proc. UIST, 1990, pp. 76-83.
74
Icon-Based Techniques
Basic idea: visualization of the data values as features of icons
Chernoff-Faces [Che 73, Tuf 83]Stick Figures [Pic 70, PG88]Shape Coding [Bed 90]Color Icons [Lev 91, KK94]TileBars [Hea 95]
75
Glyphs
Profile GlyphsEach bar encodes a variable’s value
3 data items in the iris dataset
Min
Max
1 data item with 4 attributes
v1 v2 v3 v4
76
Chernoff faces (1973, Herman Chernoff)
77
Dr. Eugene Turner, 1979
78
Star Glyphs
Space out variables at equal angles around a circleEach arm encodes a variable’s value
4 data items in the iris dataset
v1
v2
v3
v4
1 data item with 4 attributes
79
Glyphs
80
Stick Figures
The mappingTwo attributes - display axes Others - angle and/or length of limbs
Texture patterns in the visualization show certain data characteristics
Stick figure icon
Pickett R. M., Grinstein G. G.: ‘Iconographic Displays for Visualizing Multidimensional Data’, Proc. IEEE Conf.on Systems, Man and Cybernetics, 1988, pp. 514-519.
81
Stick Figures
http://ivpr.cs.uml.edu/gallery/G. Grinstein, University of Massachusetts at Lowell
5-dim. imagedata from thegreat lake region
82
Stick Figures
Stick figure icon familyTry all different mappings
12X5! = 1440 picturesMovie
83
Stick Figures
The same dataset, different mapping
http://ivpr.cs.uml.edu/gallery/G. Grinstein, University of Massachusetts at Lowell
84
Stick Figures
Black background White background
http://ivpr.cs.uml.edu/gallery/G. Grinstein, University of Massachusetts at Lowell
85
More
http://ivpr.cs.uml.edu/gallery/G. Grinstein, University of Massachusetts at Lowell
86
Shape Coding
Data item - small array of fieldsEach field - one attribute value
Arrangement of attribute fields (e.g., 12-dimensional data):
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
87
Shape Coding
Beddow J.: ‘Shape Coding of Multidimensional Data on a Mircocomputer Display’, Visualization ‘90, 1990, pp. 238-246.
88
Color Icons [Lev 91, KK94]
Data item - color icon (arrays of color fields)Each field - one attribute valueArrangement is query-dependent (e.g., spiral)
6 dimensional dataset
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00.
89
Color Icons
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
90
Pixel-Oriented TechniquesBasic idea
Each value - one colored pixel (value ranges -> fixed colormap)
Values for each attribute are presented in separate subwindowsValues of the same data item are at the same postions of all subwindows
Keim’s tutorial notes in Infovis 00
91
Major Challenge
Patterns <-the texture of the subwindows
How to order the pixels to get informative textures?
92
Query-Independent Techniques
Recursive pattern arrangements
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
93
Query-Independent Techniques
Recursive pattern arrangements
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
94
Query-Dependent Techniques
Basic idea:data items (a1, a2, ..., am) & query (q1, q2, ..., qm)distances (d1, d2, ... dm)extend distances by overall distance (dm+1)determine data items with lowest overall distancesmap distances to color (for each attribute)visualize each distance value di by one colored pixel
The slide is taken from Dr. D. Keim’s tutorial notes in Infovis 00
95
Query-Dependent Techniques
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
Spiral technique
Arrangement The m+1 dimension (overall distance)
96
Query-Dependent Techniques
The figure is taken from Dr. D. Keim’s tutorial notes in Infovis 00
Spiral technique
97
Major References
Colin Ware. Information Visualization, 2004Daniel Keim. Tutorial note in InfoVis 2000John Stasko. Course slides, Fall 2005Edward Tufte:
The Visual Display of Quantitative InformationEnvisioning InformationVisual Explanation
98
Homework1. Read referred papers2. Download XMDV fromhttp://davis.wpi.edu/~xmdv/downloadxmdv.htmlBinary Release, Microsoft Windows, Xmdv 7.0 and play with it.3. Prepare your next class by reading papers
The paper list and course slides will be published on my webpage
www.cs.uncc.edu/~jyang13
99