COMP 190-088: Systems Performance Analysis
http://www.cs.unc.edu/~jasleen/Courses/Spring05
Visualizing Quantitative Information
Jasleen Kaur, Department of Computer Science
The University of North Carolina at Chapel Hill
Spring 2005
Presenting Data Using Graphic Charts
Need to present results after analyzing data
  Convey them in a clear and simple manner
Graphic charts: an important medium for presenting results
  A single chart can convey several conclusions
  Presents information concisely and saves the reader’s time
  Can be used to interest the (lazy) reader
    Most readers find it easier to scan a figure to grasp the main points
    If it takes too much effort, many will ignore it
  Good medium for emphasizing or reinforcing a conclusion
    Pictures are easier to remember after the presentation
How to make the best use of graphic charts?
Data Visualization Using Graphic Charts
Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts
Types of Variables
Variables
  Qualitative: mutually exclusive and exhaustive sub-classes
    Ordered, e.g., computer type (supercomputers, minicomputers, microcomputers)
    Unordered, e.g., workload type (scientific, engineering, educational)
  Quantitative: levels expressed numerically
    Discrete, e.g., number of processors in a multiprocessor system; disk size in blocks
    Continuous, e.g., response time; weight of a laptop
Guidelines for Preparing Good Graphic Charts
Use appropriate chart type for variables
Require minimum effort from the reader
Minimize ink
Use commonly-accepted practices
Guidelines are not strict rules
Some guidelines may conflict with each other!
Use Appropriate Chart Type
Lines imply that intermediate values can be approximately interpolated
Do not use a line chart for unordered, categorical variables!
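One conservative way to apply this rule in a plotting script is a small helper that maps the kind of x-axis variable to a chart type. The `VarKind` enum and `suggest_chart` function are hypothetical names, and sending discrete quantitative variables to a bar chart is an assumption drawn from the point that lines imply interpolation.

```python
from enum import Enum


class VarKind(Enum):
    """Variable kinds from the 'Types of Variables' slide (hypothetical names)."""
    QUALITATIVE_ORDERED = "qualitative, ordered"          # e.g., computer type
    QUALITATIVE_UNORDERED = "qualitative, unordered"      # e.g., workload type
    QUANTITATIVE_DISCRETE = "quantitative, discrete"      # e.g., number of processors
    QUANTITATIVE_CONTINUOUS = "quantitative, continuous"  # e.g., response time


def suggest_chart(x_kind: VarKind) -> str:
    """Suggest a chart type for a given kind of x-axis variable.

    A line implies that intermediate x values can be interpolated, which is
    only meaningful for a continuous variable; every other kind gets a bar
    (column) chart. This is an assumed, conservative reading of the guideline.
    """
    if x_kind is VarKind.QUANTITATIVE_CONTINUOUS:
        return "line"
    return "bar"


print(suggest_chart(VarKind.QUALITATIVE_UNORDERED))    # workload type -> bar
print(suggest_chart(VarKind.QUANTITATIVE_CONTINUOUS))  # response time -> line
```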
Minimize Reader Effort
Avoid ambiguity
  Show coordinate axes, scale divisions, and origin
  Identify individual curves and bars
Make axis labels informative
  “Daily CPU Usage” is better than “CPU Usage”
  “CPU Time (in seconds)” is better than “CPU Time”
Other issues:
  Direct labeling vs. legends
  Number of curves/bars on a single graph
  Number of y-variables on a single chart
  Use of symbols vs. text
Direct Labeling vs. Legends
Direct labeling is preferred if the number of curves/bars is large
Number of Alternatives on a Single Graph
Too many curves/bars on a single graph clutter it
  Line chart: no more than 6 curves
  Bar chart: no more than 10 bars
Each bucket in a bar graph should have at least 5 data points
  Otherwise, the distribution cannot be visualized properly
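These numeric limits can be turned into a quick check before a chart is drawn. This sketch hard-codes the slide's thresholds (6 curves, 10 bars, 5 points per bucket); the function name `clutter_warnings` and its arguments are hypothetical.

```python
from typing import List, Optional, Sequence

MAX_CURVES = 6             # line chart: no more than 6 curves
MAX_BARS = 10              # bar chart: no more than 10 bars
MIN_POINTS_PER_BUCKET = 5  # each bucket should hold at least 5 data points


def clutter_warnings(chart_type: str, n_series: int,
                     bucket_counts: Optional[Sequence[int]] = None) -> List[str]:
    """Return human-readable warnings for a planned chart (empty list = OK)."""
    warnings: List[str] = []
    if chart_type == "line" and n_series > MAX_CURVES:
        warnings.append(f"{n_series} curves exceed the limit of {MAX_CURVES}")
    if chart_type == "bar" and n_series > MAX_BARS:
        warnings.append(f"{n_series} bars exceed the limit of {MAX_BARS}")
    # Flag every bucket that is too sparse to show a distribution.
    for i, count in enumerate(bucket_counts or []):
        if count < MIN_POINTS_PER_BUCKET:
            warnings.append(f"bucket {i} has only {count} data points")
    return warnings


print(clutter_warnings("line", 7))                          # ['7 curves exceed the limit of 6']
print(clutter_warnings("bar", 3, bucket_counts=[5, 2, 8]))  # ['bucket 1 has only 2 data points']
```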
Number of y-variables on a Single Chart
Difficult to associate scales with curves
Ineffective way of saving space
Using Symbols vs. Text
Don’t require the reader to flip through the report to decode symbols
Make the graph self-sufficient
Maximize information
Minimize Ink
Maximize the information-to-ink ratio
  Remove extraneous information from the graph
  E.g., don’t plot grid lines unless they are required to read values accurately
Use metrics that give more information for the same data
[Figure: the same data shown as Unavailability (axis 0 to 0.1) and as Availability (axis 0 to 1), each versus Day of the week (1 to 5)]
Use Commonly Accepted Practices
Independent variable (cause) should be plotted on the x-axis
Dependent variable (effect) on the y-axis
Scales
  Axes should be scaled linearly
  Scales should increase left-to-right, bottom-to-top
  All scale divisions should be equal
  Scale ranges should be based on minimum and maximum values
Origin should be at (0,0)
If departing from these guidelines, make sure to bring it to the reader’s attention
Recap: Guidelines for Preparing Graphic Charts
Use appropriate chart type for variables
Require minimum effort from the reader
Minimize ink
Use commonly-accepted practices
Data Visualization Using Graphic Charts
Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts
Using Non-zero Origin to Emphasize Difference
By moving the origin and scaling the graph, it is possible to magnify or reduce the perceived difference
Three-quarter-high rule: the height of the highest point should be ≥ 3/4 of the horizontal offset of the right-most point
Verify the zero origin and the three-quarter-high rule
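Both checks can be automated before a graph is drawn. This is a minimal sketch assuming the axis limits and the physical size of the plotting area are known; `check_scaling` and its parameters are hypothetical, and "horizontal offset" is taken to mean distance from the left edge of the plotting area.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def check_scaling(points: Sequence[Point],
                  x_range: Tuple[float, float], y_range: Tuple[float, float],
                  width: float, height: float) -> Dict[str, bool]:
    """Check the zero-origin and three-quarter-high rules for a planned plot.

    x_range/y_range are the axis limits; width/height are the physical size
    of the plotting area (any unit, e.g. inches).
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    # Physical height of the highest point above the bottom of the plot area.
    top = (max(y for _, y in points) - y_lo) / (y_hi - y_lo) * height
    # Physical horizontal offset of the right-most point from the left edge.
    right = (max(x for x, _ in points) - x_lo) / (x_hi - x_lo) * width
    return {
        "zero_origin": x_lo == 0 and y_lo == 0,
        "three_quarter_high": top >= 0.75 * right,
    }


data = [(1, 10.0), (5, 12.0)]  # hypothetical measurements
print(check_scaling(data, (0, 5), (0, 12), width=4, height=3))
print(check_scaling(data, (0, 5), (9, 12), width=4, height=3))
```

The second call shows why both checks matter: moving the y origin to 9 still passes the three-quarter-high test while visually exaggerating the gap between 10 and 12.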
Using Double-whammy Graphs for Dramatization
Two curves on the same graph can have twice as much impact as one!
  To an unguarded audience, the system loses on both metrics as the number of users increases!
Can be used to amplify a result by plotting metrics that are related
  The reader may not realize that the metrics can be computed from each other
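As an illustration of related metrics, suppose the two curves were mean response time and throughput of a closed interactive system (an assumed example). The interactive response time law, R = N/X - Z, ties them together, so once the number of users N and the think time Z are known, the second curve adds no new information. The function name, think time, and measurements below are hypothetical.

```python
def throughput_from_response_time(n_users: int, response_time: float,
                                  think_time: float) -> float:
    """Interactive response time law R = N/X - Z, solved for throughput X."""
    return n_users / (response_time + think_time)


THINK_TIME = 5.0  # seconds; hypothetical value
for n, r in [(10, 1.0), (20, 2.0), (40, 5.0)]:  # hypothetical (N, R) pairs
    x = throughput_from_response_time(n, r, THINK_TIME)
    print(f"N={n:2d}  R={r:.1f} s  ->  X={x:.2f} requests/s")
```

Whatever the shape of the response-time curve, the throughput curve is fully determined by it under these assumptions.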
Plotting Results Without Confidence Intervals
Most performance measurements are random quantities
When comparing two systems, mean performance alone is insufficient
Overlapping confidence intervals indicate statistical indifference (the difference may not be significant)
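A minimal sketch of the comparison, assuming independent samples and a t-based confidence interval for the mean. The t quantile is passed in rather than computed so that only the standard library is needed; the sample values and function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def mean_ci(samples: Sequence[float], t_crit: float) -> Interval:
    """Two-sided confidence interval for the mean: mean ± t * s / sqrt(n).

    t_crit is the Student-t quantile for n - 1 degrees of freedom,
    e.g. t[0.975; 4] = 2.776 for a 95% interval from 5 samples.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


def overlap(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


T_95_DF4 = 2.776  # 95% two-sided, 4 degrees of freedom (5 samples)
sys_a = [10.2, 9.8, 10.5, 10.1, 9.9]   # hypothetical response times (ms)
sys_b = [10.4, 10.0, 10.6, 10.3, 10.2]
sys_c = [12.0, 12.1, 11.9, 12.2, 11.8]

ci_a, ci_b, ci_c = (mean_ci(s, T_95_DF4) for s in (sys_a, sys_b, sys_c))
print("A vs B overlap:", overlap(ci_a, ci_b))  # True: means differ, CIs overlap
print("A vs C overlap:", overlap(ci_a, ci_c))  # False
```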
Pictograms Scaled By Height
One way to depict a relative difference in performance:
  Draw appropriately scaled pictures of the systems
  E.g., draw A twice as big as B if the performance of A is twice that of B
[Figure: pictograms of systems B and A, with A drawn twice as tall as B]
Total area of A is 4 times that of B!
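The factor of 4 follows from scaling both dimensions: a pictogram drawn k times as tall keeps its shape only if it is also k times as wide. Writing h and w (our notation) for the height and width of B's picture:

```latex
% h, w: height and width of B's pictogram; A is scaled by k in both dimensions.
\[
\frac{\text{area}(A)}{\text{area}(B)} = \frac{(k\,h)(k\,w)}{h\,w} = k^{2},
\qquad k = 2 \;\Rightarrow\; \frac{\text{area}(A)}{\text{area}(B)} = 4
\]
```

A reader who judges by area therefore perceives a 4x difference where the data show 2x.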
Using Inappropriate Cell Size in Histograms
Selecting the cell size in a histogram often requires more than one attempt
  If cells are too large, all data points fall in a few cells
  If cells are too small, the histogram lacks smoothness
Given enough points, it is possible to fit more than one distribution
[Figure: the same data as two histograms; one with cells of width 2 (labeled (0,2), (4,6), (8,10); frequency axis 0 to 12), one with cells (0,6) and (6,12) (frequency axis 0 to 18)]
Normal or Exponential Distribution?
Most statistical tests for distribution fit require at least 5 points per cell
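Trying several cell sizes is easy to script. The sketch below bins the same hypothetical data with two cell sizes and flags cells below the 5-points-per-cell minimum; `histogram` and `sparse_cells` are hypothetical helpers, and cells are assumed to be half-open intervals [lo, hi).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[float, float]


def histogram(data: Sequence[float], cell_size: float, lo: float = 0.0) -> Dict[Cell, int]:
    """Count points in half-open cells [lo + k*size, lo + (k+1)*size)."""
    counts = Counter(int((x - lo) // cell_size) for x in data)
    # Include empty cells between the first and last occupied ones.
    return {(lo + k * cell_size, lo + (k + 1) * cell_size): counts[k]
            for k in range(min(counts), max(counts) + 1)}


def sparse_cells(hist: Dict[Cell, int], min_points: int = 5) -> List[Cell]:
    """Cells holding fewer points than most distribution-fit tests require."""
    return [cell for cell, n in hist.items() if n < min_points]


data = [0.5, 1.2, 1.8, 2.5, 3.1, 3.3, 4.0, 4.4, 5.9, 7.2, 8.8, 11.5]  # hypothetical
for size in (2.0, 6.0):
    h = histogram(data, size)
    print(f"cell size {size}: {h}; sparse: {sparse_cells(h)}")
```

With cells of width 2 every cell is too sparse for a fit test; with width 6 only the last one is, but two cells say little about the shape, which is why more than one attempt is usually needed.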
Using Broken Scales in Column Charts
Similar effect as using non-zero origins
Allows one to amplify a negligible performance difference
Real Pictorial Games Played by the Press
Examples of:
  Non-zero origin
  Incomparable data
  Non-uniform scaling
  Leaving out context
  Combining visual and statistical tricks
*** All pictures taken from: E.R. Tufte, “The Visual Display of Quantitative Information”, Graphics Press, 2002.
Non-zero Origin
Bars begin at the bottom at about -$4.2 M
Incomparable Data
Comparing six months of payments in 1978 to a full year’s worth in 1976 and 1977
Lie repeated four times over!
Incomparable Data
In reality, the number of prizes won by the US increased!
Non-uniform Scaling
Increase in the standard from 1978 to 1985 = (27.5 - 18) × 100 / 18 = 53%
Increase in the length of the line segment = (5.3 - 0.6) × 100 / 0.6 = 783%
Lie factor = 783 / 53 ≈ 14.8!
Future not at the horizon; dates do not grow with perspective
Line widths and numbers on the right change due to two effects:
  Change in value
  Change in perspective
Not possible for the reader to separate the two effects
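The lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. A short sketch reproducing the slide's arithmetic (the function names are ours):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) * 100 / old


def lie_factor(shown_old: float, shown_new: float,
               data_old: float, data_new: float) -> float:
    """Size of effect shown in the graphic / size of effect in the data."""
    return percent_change(shown_old, shown_new) / percent_change(data_old, data_new)


# Line-segment lengths 0.6 and 5.3; fuel economy standards 18 (1978) and 27.5 (1985).
print(round(percent_change(0.6, 5.3)))           # 783
print(round(percent_change(18, 27.5)))           # 53
print(round(lie_factor(0.6, 5.3, 18, 27.5), 1))  # 14.8
```

Computing from the unrounded percentages gives about 14.84, which still rounds to the slide's 14.8.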
Replotting Fuel Economy Standards
Improvements were gradual at first, then their rate doubled from 1980 to 1983, and they flattened out after that
Leaving Out Context
Compared to what?
Combining Visual and Statistical Tricks
Indicates that the state budget increased substantially in the last 9 years
What’s Wrong?
Combination of visual and statistical tricks!
Visual Tricks
Leaving out the distortion yields a calmer view of the budget
Statistical Tricks
Ignoring population growth
  Population increased by 1.7 million (10%)
  Part of the budget growth simply paralleled population growth
Ignoring inflation
  Goods that cost the government $1 in 1967 cost $2.03 in 1977
  Mixes changes in the value of money with changes in the budget
Budget declined in 1977!
Compute expenditure in constant dollars per capita
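Converting to constant dollars per capita means deflating by a price index and then dividing by population. The sketch uses only the slide's two factors (prices up by a factor of 2.03 from 1967 to 1977; population up 10%), assumes both apply over the same period, and expresses budgets relative to the base year; the 2.2x nominal growth is hypothetical.

```python
def constant_dollars_per_capita(nominal: float, price_index: float,
                                base_price_index: float, population: float) -> float:
    """Express spending in base-year dollars, then divide by population."""
    return nominal * (base_price_index / price_index) / population


PRICE_1967, PRICE_1977 = 1.00, 2.03  # $1 of 1967 goods cost $2.03 in 1977
POP_1967, POP_1977 = 1.00, 1.10      # population up 10% (relative units)

base = constant_dollars_per_capita(1.0, PRICE_1967, PRICE_1967, POP_1967)
# Hypothetical: the nominal budget grew 2.2x over the same period.
later = constant_dollars_per_capita(2.2, PRICE_1977, PRICE_1967, POP_1977)
print(f"real per-capita change: {later / base - 1:+.1%}")

# Nominal growth needed just to stand still in real per-capita terms.
print(f"break-even nominal growth: x{PRICE_1977 * POP_1977 / (PRICE_1967 * POP_1967):.3f}")
```

Under these assumptions, a budget must grow by more than 2.03 × 1.10 ≈ 2.23 times in nominal terms before real spending per person increases at all.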
Data Visualization Using Graphic Charts
Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts
Gantt Charts
Can be used to show the relative duration of several Boolean conditions
  E.g., whether a resource is idle or busy
Properties
  Each condition is shown by a set of horizontal line segments
  The total length of the segments represents the relative duration of the condition
  The segments are positioned so that the overlap between different lines represents the overlap between the conditions
Example: utilization of CPU, I/O channel, and network link
  CPU utilization: 60%
  During 20% of the time, both CPU and I/O are busy
  During 30% of the time, CPU and network are used, but not I/O
  During 10% of the time, all 3 resources are busy
  Network alone is used for 15% of the time
Developing Gantt Charts
A  B  C  D  Time Used (%)
0  0  0  0    5
0  0  0  1    5
0  0  1  0    0
0  0  1  1    5
0  1  0  0   10
0  1  0  1    5
0  1  1  0   10
0  1  1  1    5
1  0  0  0   10
1  0  0  1    5
1  0  1  0    0
1  0  1  1    5
1  1  0  0   10
1  1  0  1   10
1  1  1  0    5
1  1  1  1   10
Total       100
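Before drawing the segments, it helps to compute how long each condition holds and how long pairs of conditions overlap. A minimal sketch over the "Developing Gantt Charts" table, assuming 1 means the condition holds; the helper `busy_time` is a hypothetical name, and the placement of the segments themselves is left out.

```python
from itertools import product
from typing import Dict, Tuple

Combo = Tuple[int, int, int, int]

# Time used (%) for each combination of conditions A, B, C, D, in the table's
# row order (0000, 0001, ..., 1111); product() yields exactly that order.
TIME_USED: Dict[Combo, int] = dict(zip(
    product((0, 1), repeat=4),
    [5, 5, 0, 5, 10, 5, 10, 5, 10, 5, 0, 5, 10, 10, 5, 10],
))


def busy_time(table: Dict[Combo, int], *conditions: int) -> int:
    """Percent of time during which all given conditions (0=A ... 3=D) hold."""
    return sum(t for combo, t in table.items() if all(combo[c] for c in conditions))


for idx, name in enumerate("ABCD"):
    print(f"{name}: {busy_time(TIME_USED, idx)}%")
print(f"A and B together: {busy_time(TIME_USED, 0, 1)}%")
```

The per-condition totals give the total segment length for each line, and the pairwise overlaps constrain how the segments must be aligned against each other.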
Kiviat Graphs
Help system managers quickly recognize performance problems
Circular graph with several different metrics plotted along radial lines
Only 4 independent metrics!
  CPU Busy = Problem State + Supervisor State
  CPU Wait = 100 - CPU Busy
  Channel Only = Any Channel - CPU/Channel Overlap
  CPU Only = CPU Busy - CPU/Channel Overlap
Typically, an even number of metrics is used
  Half are HB (higher is better) metrics, half are LB (lower is better) metrics
  HB and LB metrics are plotted alternately
Kiviat graph for an ideal system is a star!
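Because of these identities, four measured quantities are enough to fill in all eight metrics named in the equations. A sketch with hypothetical input percentages; the function name and dictionary keys are ours.

```python
from typing import Dict


def kiviat_metrics(problem_state: float, supervisor_state: float,
                   any_channel: float, cpu_channel_overlap: float) -> Dict[str, float]:
    """Derive all eight Kiviat metrics (percent of time) from four independent ones."""
    cpu_busy = problem_state + supervisor_state
    return {
        "CPU Busy": cpu_busy,
        "CPU Only": cpu_busy - cpu_channel_overlap,
        "CPU/Channel Overlap": cpu_channel_overlap,
        "Channel Only": any_channel - cpu_channel_overlap,
        "Any Channel": any_channel,
        "CPU Wait": 100 - cpu_busy,
        "Problem State": problem_state,
        "Supervisor State": supervisor_state,
    }


for name, value in kiviat_metrics(50, 20, 60, 40).items():  # hypothetical inputs
    print(f"{name:20s} {value:5.1f}%")
```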
Shapes of Kiviat Graphs
Help recognize common patterns of problems at a glance
CPU keelboat: computation-bound system
I/O wedge: I/O bound system
I/O arrow: considerable CPU and I/O usage
Quantifying Goodness of Kiviat Graphs
Merrill’s “figure of merit”:
Plot a “reversed” Kiviat graph
  LB metrics are plotted so that 0% is at the circumference and 100% is at the center
  HB metrics are plotted in the usual manner
Compute the goodness measure
  Square root of the area covered by the graph
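A sketch of the computation, assuming the metrics are percentages plotted on n equally spaced radial lines (n ≥ 3) with radius 100 at the circumference; the enclosed area is then a sum of triangles between adjacent axes. Function names are hypothetical, and any further normalization Merrill may apply is not covered here.

```python
import math
from typing import Sequence


def plotted_radius(value_pct: float, higher_is_better: bool) -> float:
    """Radius in the "reversed" Kiviat graph (circumference = 100).

    HB metrics are plotted as usual; LB metrics are flipped so that 0% sits
    at the circumference and 100% at the center.
    """
    return value_pct if higher_is_better else 100.0 - value_pct


def polygon_area(radii: Sequence[float]) -> float:
    """Area enclosed by points on n equally spaced radial lines (n >= 3)."""
    n = len(radii)
    # Each adjacent pair of radii spans a triangle with apex angle 2*pi/n.
    return 0.5 * math.sin(2 * math.pi / n) * sum(
        radii[i] * radii[(i + 1) % n] for i in range(n))


def figure_of_merit(values: Sequence[float], hb_flags: Sequence[bool]) -> float:
    """Square root of the area covered by the reversed Kiviat graph."""
    radii = [plotted_radius(v, hb) for v, hb in zip(values, hb_flags)]
    return math.sqrt(polygon_area(radii))


# Alternating HB/LB metrics; values are hypothetical.
hb = [True, False] * 4
print(figure_of_merit([100, 0] * 4, hb))  # ideal system: fills the whole octagon
print(figure_of_merit([70, 30, 40, 20, 60, 10, 50, 40], hb))
```

Reversing the LB axes turns the ideal system's star into the largest possible polygon, so a larger area means a better-balanced system.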
Course Outline
Selection of metrics
Performance Evaluation Methodologies
Workload selection
Measurement tools
Analysis and visualization of measured data
System Modeling
Simulations
Case studies
Distributed monitoring infrastructures
PA in the Research and Industrial communities