Multivariate Data & Representations
CS 7450 - Information Visualization
Jan. 15, 2004
John Stasko
Spring 2004 CS 7450 2
Agenda
• Data forms and representations
• Basic representation techniques
• Multivariate (>3) techniques
Data Sets
• Data comes in many different forms
• Typically, not in the way you want it
• How is it stored (in the raw)?
Example
• Cars
− make
− model
− year
− miles per gallon
− cost
− number of cylinders
− weights
− ...
Example
• Web pages
Data Tables
• Often, we take raw data and transform it into a form that is more workable
• Main idea:
− Individual items are called cases
− Cases have variables (attributes)
Data Table Format
            Case1     Case2     Case3     ...
Variable1   Value11   Value21   Value31
Variable2   Value12   Value22   Value32
Variable3   Value13   Value23   Value33
...

Think of as a function: f(case1) = <Val11, Val12, …>
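The "table as a function" idea can be sketched in a few lines. This is only an illustration, assuming a plain Python dictionary as storage; the names `table`, `f`, and `value_of` are hypothetical and not from the lecture.

```python
# Hypothetical data table: each case maps to its tuple of variable values.
# The placeholder strings mirror the Value11, Value12, ... cells of the table.
VARIABLES = ("Variable1", "Variable2", "Variable3")

table = {
    "Case1": ("Value11", "Value12", "Value13"),
    "Case2": ("Value21", "Value22", "Value23"),
    "Case3": ("Value31", "Value32", "Value33"),
}


def f(case):
    """Return the tuple of values for one case: f(case1) = <Val11, Val12, ...>."""
    return table[case]


def value_of(case, variable):
    """Look up a single cell by case name and variable name."""
    return f(case)[VARIABLES.index(variable)]
```

For example, `f("Case1")` yields the whole column for Case1, while `value_of("Case2", "Variable3")` picks out one cell.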
Example
People in class

       Mary    Jim     Sally    Mitch   ...
SSN    145     294     563      823
Age    23      17      47       29
Hair   brown   black   blonde   red
GPA    2.9     3.7     3.4      2.1
...
Example
Baseball statistics
Variable Types
• Three main types of variables
− N-Nominal (equal or not equal to other values)
  Example: gender
− O-Ordinal (obeys < relation, ordered set)
  Example: fr, so, jr, sr
− Q-Quantitative (can do math on them)
  Example: age
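The distinction between the three types shows up in which operations make sense on the values. The sketch below is one possible illustration, not part of the lecture; the names `VarType`, `same`, `ordinal_less`, and `mean` are assumptions chosen for the example.

```python
from enum import Enum


class VarType(Enum):
    NOMINAL = "N"       # only equality is meaningful
    ORDINAL = "O"       # equality and ordering are meaningful
    QUANTITATIVE = "Q"  # equality, ordering, and arithmetic are meaningful


# The ordered set from the ordinal example: freshman < sophomore < junior < senior.
CLASS_YEAR_ORDER = ["fr", "so", "jr", "sr"]


def same(a, b):
    """Nominal comparison: values are either equal or not."""
    return a == b


def ordinal_less(a, b, order=CLASS_YEAR_ORDER):
    """Ordinal comparison: a < b according to position in the ordered set."""
    return order.index(a) < order.index(b)


def mean(values):
    """Quantitative operation: arithmetic on the values is allowed."""
    return sum(values) / len(values)
```

Applying `mean` to the ages from the "People in class" table (23, 17, 47, 29) is meaningful; applying it to hair color is not.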
Metadata
• Descriptive information about the data
− Might be something as simple as the type of a variable, or could be more complex
− For times when the table itself just isn’t enough
− Example: if variable1 is “l”, then variable3 can only be 3, 7 or 16
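A rule like the one in the example can be stored alongside the table and checked against each case. The following is a rough sketch under the assumption that a case is a dictionary keyed by variable name; `satisfies_constraint` and `ALLOWED_WHEN_L` are hypothetical names.

```python
# Metadata rule from the example: if variable1 is "l",
# then variable3 can only be 3, 7 or 16.
ALLOWED_WHEN_L = {3, 7, 16}


def satisfies_constraint(case):
    """Return True if the case (a dict of variable name -> value) obeys the rule."""
    if case["variable1"] == "l":
        return case["variable3"] in ALLOWED_WHEN_L
    # The rule says nothing about cases where variable1 is not "l".
    return True
```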
How Many Variables?
• Data sets of dimensions 1, 2, 3 are common
• Number of variables per class
− 1 - Univariate data
− 2 - Bivariate data
− 3 - Trivariate data
− >3 - Hypervariate data
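As a small illustration, the naming scheme can be written as a lookup on the number of variables. The function name `variate_class` is an assumption made for this sketch.

```python
def variate_class(num_variables):
    """Name a data set by its number of variables, following the list above."""
    if num_variables < 1:
        raise ValueError("a data set needs at least one variable")
    names = {1: "univariate", 2: "bivariate", 3: "trivariate"}
    # Anything with more than three variables is hypervariate.
    return names.get(num_variables, "hypervariate")
```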
Representation
• What’s a common way of visually representing multivariate data sets?
• Graphs!
Good Example
www.nationmaster.com
Basic Symbolic Displays
• Graphs
• Charts
• Maps
• Diagrams

From: S. Kosslyn, “Understanding charts and graphs”, Applied Cognitive Psychology, 1989.
1. Graph
Showing the relationships between variables’ values in a data table

[Figure: example graph; value axis 0-100; categories 1st Qtr through 4th Qtr; series East, West, North]
Properties
• Graph
− Visual display that illustrates one or more relationships among entities
− Shorthand way to present information
− Allows a trend, pattern or comparison to be easily comprehended
Issues
• Critical to remain task-centric
− Why do you need a graph?
− What questions are being answered?
− What data is needed to answer those questions?
− Who is the audience?

[Figure: example graph with axes labeled "time" and "money"]
Graph Components
• Framework
− Measurement types, scale
• Content
− Marks, lines, points
• Labels
− Title, axes, ticks
Other Symbolic Displays
• Chart
• Map
• Diagram
2. Chart
• Structure is important, relates entities to each other
• Primarily uses lines, enclosure, position to link entities
Examples: flowchart, family tree, org chart, ...
3. Map
• Representation of spatial relations
• Locations identified by labels
Choropleth Map
Areas are filled and colored differently to indicate some attribute of that region
Cartography
• Cartographers and map-makers have a wealth of knowledge about the design and creation of visual information artifacts
− Labeling, color, layout, …
• Information visualization researchers should learn from this older, existing area
4. Diagram
• Schematic picture of object or entity
• Parts are symbolic
Examples: figures, steps in a manual, illustrations,...
Details
• What are the constituent pieces of these four symbolic displays?
• What are the building blocks?
Visual Structures
• Composed of
− Spatial substrate
− Marks
− Graphical properties of marks
Space
• Visually dominant
• Often put axes on space to assist
• Use techniques of composition, alignment, folding, recursion, overloading to
  1) increase use of space
  2) do data encodings
Marks
• Things that occur in space
− Points
− Lines
− Areas
− Volumes
Graphical Properties
• Size, shape, color, orientation...
                        Spatial properties   Object properties
Expressing extent       Position, Size       Grayscale
Differentiating marks   Orientation          Color, Shape, Texture
Back to Data
• What were the different types of data sets?
• Number of variables per class
− 1 - Univariate data
− 2 - Bivariate data
− 3 - Trivariate data
− >3 - Hypervariate data
Univariate Data
• Representations
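[Figure: example univariate representations, with axis ticks 1-7 and 0-20 and the case label "Bill", plus a Tukey box plot marking low, high, mean, and the middle 50%]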
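The quantities a Tukey box plot draws can be computed directly from the values. The sketch below is illustrative only: it assumes "low" and "high" are simply the minimum and maximum (whisker conventions vary), uses the inclusive quartile method from Python's `statistics` module, and reports both the mean labeled on the slide and the median that box plots more commonly mark. The function name `box_summary` is hypothetical.

```python
import statistics


def box_summary(values):
    """Return the numbers needed to draw a simple box plot of one variable."""
    data = sorted(values)
    # Quartiles bound the "middle 50%" box.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "low": data[0],           # assumption: whisker ends at the minimum
        "q1": q1,
        "median": median,
        "mean": statistics.mean(data),
        "q3": q3,
        "high": data[-1],         # assumption: whisker ends at the maximum
    }
```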
What goes where
• In univariate representations, we often think of the data case as being shown along one dimension, and the value in another
Line graph
− Y-axis is quantitative variable
− See changes over consecutive values

Bar graph
− Y-axis is quantitative variable
− Compare relative point values
Alternative View
• We may think of a graph as representing independent (data case) and dependent (value) variables
• Guideline:
− Independent vs. dependent variables
  Put independent on x-axis
  See resultant dependent variables along y-axis
Bivariate Data
• Representations
Scatter plot is common

[Figure: scatter plot with axes price and mileage]

Two variables, want to see relationship
Is there a linear, curved or random pattern?
Each mark is now a data case
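One numeric companion to eyeballing a scatter plot is the Pearson correlation coefficient, which measures how close the marks lie to a straight line. This is offered as a sketch rather than something the lecture prescribes; `pearson_r` is a hypothetical helper, and a value near 0 does not by itself distinguish a curved pattern from a random one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (one pair per data case)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # +1 or -1 means perfectly linear; values near 0 mean no linear trend.
    return sxy / math.sqrt(sxx * syy)
```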
Trivariate Data
• Representations
3D scatter plot is possible

[Figure: 3D scatter plot with axes horsepower, mileage, and price]
Alternative Representation
Still use 2D but have mark property represent third variable
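As a rough sketch of this idea, the third variable can be mapped linearly onto a mark property such as size while the first two variables keep their x and y positions. The function names and the size range below are assumptions made for illustration.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2  # constant variable: use the middle size
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)


def marks_with_size(cases, min_size=2.0, max_size=12.0):
    """Turn (x, y, third) cases into 2D marks whose size encodes the third variable."""
    thirds = [third for _, _, third in cases]
    lo, hi = min(thirds), max(thirds)
    return [
        {"x": x, "y": y, "size": scale(third, lo, hi, min_size, max_size)}
        for x, y, third in cases
    ]
```

The same `scale` helper could drive grayscale or another mark property instead of size.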
Alternative Representation
Represent each variable in its own explicit way
Hypervariate Data
• Ahhh, the tough one
• A number of well-known visualization techniques exist for data sets of 1-3 dimensions
− line graphs, bar graphs, scatter plots OK
− We see a 3-D world (4-D with time)
• What about data sets with more than 3 variables?
− Often the interesting, challenging ones
Multiple Views
Give each variable its own display
    A  B  C  D  E
1   4  1  8  3  5
2   6  3  4  2  1
3   5  7  2  4  3
4   2  6  3  1  5

[Figure: one small display per variable A-E, each showing cases 1-4]
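Giving each variable its own display amounts to slicing the table by column. The sketch below uses the A-E table from this slide; the function name `per_variable_views` and the dictionary layout are assumptions made for the example.

```python
VARIABLES = ["A", "B", "C", "D", "E"]

# Cases 1-4 with their values for A-E, as in the table above.
ROWS = {
    1: [4, 1, 8, 3, 5],
    2: [6, 3, 4, 2, 1],
    3: [5, 7, 2, 4, 3],
    4: [2, 6, 3, 1, 5],
}


def per_variable_views(rows, variables):
    """Return one (case, value) series per variable, ready to plot separately."""
    cases = sorted(rows)
    return {
        var: [(case, rows[case][i]) for case in cases]
        for i, var in enumerate(variables)
    }
```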
Scatterplot Matrix
Represent each possible pair of variables in their own 2-D scatterplot

Useful for what?
Misses what?
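The set of panels in a scatterplot matrix is just every pair of variables. A minimal sketch, assuming unordered pairs (a full matrix often also shows the mirrored panels); `scatterplot_pairs` is a hypothetical name.

```python
from itertools import combinations


def scatterplot_pairs(variables):
    """List every unordered pair of variables; each pair gets one 2-D scatterplot."""
    return list(combinations(variables, 2))
```

For n variables this gives n(n-1)/2 distinct panels, so five variables need ten scatterplots.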
Chernoff Faces
Encode different variables’ values in characteristics of human face

Cute applets:
http://www.cs.uchicago.edu/~wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
Star Plots
[Figure: star plot with spokes Var 1 through Var 5; the distance along a spoke shows the value]

Space out the n variables at equal angles around a circle
Each “spoke” encodes a variable’s value
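The geometry of a star plot follows directly from that description. The sketch below assumes all variables share one scale from 0 to `max_value` and that the first spoke points along the positive x-axis; both choices, and the function name `star_plot_points`, are illustrative assumptions.

```python
import math


def star_plot_points(values, max_value, radius=1.0):
    """Return the (x, y) endpoint of each spoke for one data case."""
    n = len(values)
    if n == 0 or max_value <= 0:
        raise ValueError("need at least one value and a positive max_value")
    points = []
    for i, value in enumerate(values):
        angle = 2 * math.pi * i / n           # n spokes at equal angles
        length = radius * value / max_value   # spoke length encodes the value
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points
```

Connecting the returned points in order, and closing the loop, gives the familiar star outline for that case.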
Star Plot examples
http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
Star Coordinates
E. Kandogan, “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions”, InfoVis 2000 Late-Breaking Hot Topics, Oct. 2000

Demo
Parallel Coordinates
• What are they?
− Explain…
Parallel Coordinates
[Figure: five parallel vertical axes labeled V1, V2, V3, V4, V5]

Encode variables along a horizontal row
Vertical line specifies values
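In code, each data case becomes a polyline with one vertex per axis. The following is a minimal sketch, assuming axes are placed at x = 0, 1, 2, ... and each variable is normalized independently to the range 0 to 1 along its vertical axis; the function name is hypothetical.

```python
def parallel_coordinate_polylines(cases):
    """Map each case (a sequence of numbers) to a list of (axis_index, height) vertices."""
    columns = list(zip(*cases))          # one column of values per variable
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    polylines = []
    for case in cases:
        line = []
        for i, value in enumerate(case):
            span = highs[i] - lows[i]
            # Height along the vertical axis; a constant variable sits mid-axis.
            height = 0.5 if span == 0 else (value - lows[i]) / span
            line.append((i, height))
        polylines.append(line)
    return polylines
```

Drawing each returned list as a connected line across the axes produces the basic display; coloring lines by case or by value is a separate step.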
Parallel Coords Example
[Figure: three parallel coordinates displays: Basic, Grayscale, Color]
Application
• System that uses parallel coordinates for information analysis and discovery
• Interactive tool
− Can focus on certain data items
− Color

Taken from: A. Inselberg, “Multidimensional Detective”, InfoVis ‘97, 1997.
Discuss
• What was their domain?
• What was their problem?
• What were their data sets?
The Problem
• VLSI chip manufacture
• Want high quality chips (high speed) and a high yield batch (% of useful chips)
• Able to track defects
• Hypothesis: No defects gives desired chip types
• 473 batches of data
The Data
• 16 variables
− X1 - yield
− X2 - quality
− X3-X12 - # defects (inverted)
− X13-X16 - physical parameters
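The grouping of the 16 variables can be written down as a small lookup, which is handy when labeling axes. This is only a sketch of the layout listed above: the function names are hypothetical, and no actual batch values from the paper are reproduced.

```python
def variable_group(index):
    """Return the group for variable X<index>, using the 1-based numbering above."""
    if index == 1:
        return "yield"
    if index == 2:
        return "quality"
    if 3 <= index <= 12:
        return "defects (inverted)"
    if 13 <= index <= 16:
        return "physical parameters"
    raise ValueError("variables are numbered X1 through X16")


def split_batch(values):
    """Split one batch record (16 numbers, X1..X16 in order) into its groups."""
    if len(values) != 16:
        raise ValueError("expected 16 values per batch")
    return {
        "yield": values[0],
        "quality": values[1],
        "defects": list(values[2:12]),      # X3-X12
        "parameters": list(values[12:16]),  # X13-X16
    }
```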
Parallel Coordinate Display
Yikes! But not that bad

[Figure: parallel coordinate display; axis groups labeled yield & quality, defects, parameters]

Distributions
x1 - normal
x2 - bipolar
Top Yield & Quality
defects
Have some defects
split
Minimal Defects
Not the highest yields and quality
Best Yields
Appears that some defects are necessary to produce the best chips
Non-intuitive!
Xmdv
Tool suite created by Matthew Ward of WPI

Includes parallel coordinate views
Demo
Parallel Coordinate Tree
D. Brodbeck and L. Girardin, "Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree", InfoVis ’03.
Demo
Parallel Coordinates
• Technique
− Strengths?
− Weaknesses?
Sliding Rods
T. Lanning, K. Wittenburg, et al, "Multidimensional Information Visualization through Sliding Rods", Proceedings of AVI 2000
Administrivia
• Computer accounts
• HW 2 due Tuesday
Upcoming
• Multivariate vis tools
− Reading: Eick paper
• Visual perception
• Tufte (please be reading)
Sources Used
CMS book
Referenced articles
Marti Hearst SIMS 247 lectures
Kosslyn ‘89 article
A. Marcus, Graphic Design for Electronic Documents and User Interfaces
M. Monmonier, How to Lie with Maps
W. Cleveland, The Elements of Graphing Data
C. H. Yu, Visualization Techniques of Different Dimensions
http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
http://www.csc.ncsu.edu/faculty/healey/PP/PP.html