
COMP 190-088: Systems Performance Analysis
http://www.cs.unc.edu/~jasleen/Courses/Spring05

Visualizing Quantitative Information

Jasleen Kaur
Department of Computer Science
The University of North Carolina at Chapel Hill
Spring 2005


Presenting Data Using Graphic Charts

Need to present results after analyzing data
  Convey in a clear and simple manner

Graphic charts: important medium for presenting results
  Single chart can convey several conclusions
  Presents information concisely and saves the reader’s time

Can be used to interest the (lazy) reader
  Most readers find it easier to scan a figure to grasp main points
  If too much effort, many will ignore

Good medium for emphasizing or reinforcing a conclusion
  Pictures are easier to remember after the presentation

How to make best use of graphic charts?


Data Visualization Using Graphic Charts

Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts


Types of Variables

Variables
  Qualitative: mutually exclusive and exhaustive sub-classes
    Ordered, e.g., computer type: supercomputers, minicomputers, microcomputers
    Unordered, e.g., workload type: scientific, engineering, educational
  Quantitative: levels expressed numerically
    Discrete, e.g., # of processors in a multiprocessor system, disk size in blocks
    Continuous, e.g., response time, weight of a laptop


Guidelines for Preparing Good Graphic Charts

Use appropriate chart type for variables

Require minimum effort from the reader

Minimize ink

Use commonly-accepted practices

Guidelines are not strict rules

Some guidelines may be conflicting!


Use Appropriate Chart Type

Lines imply that intermediate values can be approximately interpolated

Do not use a line chart for unordered, categorical variables!
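The rule above can be sketched as a small lookup. This is only an illustration: the names `VariableKind` and `suggest_chart` are hypothetical, and treating ordered qualitative variables the same way as unordered ones is an assumption, since the slide only rules out lines for unordered categories.

```python
from enum import Enum


class VariableKind(Enum):
    """The four leaf classes from the 'Types of Variables' slide."""
    ORDERED = "qualitative, ordered"
    UNORDERED = "qualitative, unordered"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


def suggest_chart(x_kind: VariableKind) -> str:
    """Suggest a chart type from the kind of variable on the x-axis.

    Assumption: a line is acceptable only when values between two plotted
    points can be approximately interpolated, i.e. when x is quantitative.
    Qualitative x variables get a bar chart instead.
    """
    if x_kind in (VariableKind.DISCRETE, VariableKind.CONTINUOUS):
        return "line"
    return "bar"
```

For example, workload type (unordered) maps to a bar chart, while response time (continuous) may be drawn as a line.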


Minimize Reader Effort

Avoid ambiguity
  Show coordinate axes, scale divisions, and origin
  Identify individual curves and bars

Make axes labels informative
  “Daily CPU Usage” better than “CPU Usage”
  “CPU Time (in seconds)” better than “CPU Time”

Other issues:
  Direct labeling vs. legends
  Number of curves/bars on a single graph
  Number of y-variables on a single chart
  Use of symbols vs. text


Direct Labeling vs. Legends

Direct labeling preferred if number of curves/bars is large


Number of Alternatives on a Single Graph

Too many curves/bars on a single graph clutter it
  Line chart: no more than 6 curves
  Bar chart: no more than 10 bars

Each bucket in a bar graph should have at least 5 data points
  Otherwise, the distribution cannot be visualized properly
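These limits are easy to check before plotting. The sketch below is illustrative only; `clutter_warnings` and its parameters are hypothetical names, and the thresholds are the rules of thumb from this slide, not hard limits.

```python
MAX_CURVES = 6              # line chart limit from the slide
MAX_BARS = 10               # bar chart limit from the slide
MIN_POINTS_PER_BUCKET = 5   # minimum data points per bucket from the slide


def clutter_warnings(chart_type, n_series, bucket_counts=None):
    """Return a list of warnings; an empty list means no rule of thumb is violated.

    chart_type    -- "line" or "bar"
    n_series      -- number of curves (line) or bars (bar) on the graph
    bucket_counts -- optional number of data points in each bucket
    """
    warnings = []
    if chart_type == "line" and n_series > MAX_CURVES:
        warnings.append(f"{n_series} curves exceed the suggested limit of {MAX_CURVES}")
    if chart_type == "bar" and n_series > MAX_BARS:
        warnings.append(f"{n_series} bars exceed the suggested limit of {MAX_BARS}")
    for index, count in enumerate(bucket_counts or []):
        if count < MIN_POINTS_PER_BUCKET:
            warnings.append(f"bucket {index} has only {count} data points")
    return warnings
```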


Number of y-variables on a Single Chart

Difficult to associate scales with curves
Ineffective way of saving space


Using Symbols vs. Text

Don’t require reader to flip through report to decode symbols
  Make the graph self-sufficient

Maximize information


Minimize Ink

Maximize the information-to-ink ratio
  Remove extraneous information from the graph
  E.g., don’t plot grid lines unless required to accurately read values

Use metrics that give more information for the same data

[Figure: two charts of the same data by day of the week (1 to 5), one plotting unavailability on a 0 to 0.1 scale and one plotting availability on a 0 to 1 scale]
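A small sketch of why the choice of metric matters here. The availability values below are hypothetical, not read from the slide's figure, and `relative_spread` is an illustrative helper: it measures how much of a zero-based axis the day-to-day variation occupies.

```python
# Hypothetical availability for five weekdays (illustrative values only).
availability = [0.98, 0.95, 0.99, 0.93, 0.97]

# Unavailability carries the same data, expressed as 1 - availability.
unavailability = [1.0 - a for a in availability]


def relative_spread(values):
    """Fraction of a zero-based axis occupied by the variation: (max - min) / max."""
    return (max(values) - min(values)) / max(values)


# The same differences fill far more of the chart when plotted as unavailability.
spread_availability = relative_spread(availability)
spread_unavailability = relative_spread(unavailability)
```

With these numbers the variation covers only a few percent of the availability axis but most of the unavailability axis, so the second chart conveys more for the same ink.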


Use Commonly Accepted Practices

Independent variable (cause) should be plotted on the x-axis
  Dependent variable (effect) on y-axis

Scales
  Axes should be scaled linearly
  Scales should increase left-to-right, bottom-to-top
  All scale divisions should be equal
  Scale ranges should be based on minimum and maximum values

Origin should be at (0,0)

If departing from these guidelines, make sure to bring it to the reader’s attention


Recap: Guidelines for Preparing Graphic Charts

Use appropriate chart type for variables

Require minimum effort from the reader

Minimize ink

Use commonly-accepted practices


Data Visualization Using Graphic Charts

Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts


Using Non-zero Origin to Emphasize Difference

By moving origin and scaling graph, it is possible to magnify/reduce the perception of difference

Three-quarter-high rule: height of highest point ≥ 3/4th of the horizontal offset of right-most point

Verify zero-origin & three-quarter-high rule
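Both checks can be written down directly. This is a minimal sketch with hypothetical function names; it assumes both lengths are measured on the drawn chart, in the same physical unit, from the origin.

```python
def has_zero_origin(x_axis_min, y_axis_min):
    """True when both axes start at zero."""
    return x_axis_min == 0 and y_axis_min == 0


def satisfies_three_quarter_high_rule(highest_point_height, rightmost_point_offset):
    """True when the highest point is at least 3/4 as high as the right-most point is far.

    Assumption: both arguments are lengths on the drawn chart (for example
    centimetres), not values in the units of the data.
    """
    return highest_point_height >= 0.75 * rightmost_point_offset
```

A chart 10 cm wide whose tallest point is only 3 cm high fails the rule and flattens the apparent difference; stretching it to 7.5 cm or more satisfies it.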


Using Double-whammy Graphs for Dramatization

Two curves on the same graph can have twice as much impact as one!

Can be used to amplify a result by plotting metrics that are related
  Reader may not realize that the metrics can be computed from each other

For an unguarded audience, the system loses on both metrics as the number of users increases!
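The slide does not name the two metrics. As one possible illustration, assume they are response time and throughput in a closed system with zero think time, where throughput X = N / R for N users and response time R. Under that assumption the second curve adds no new information; the function name and the sample numbers below are hypothetical.

```python
def throughput_from_response_time(n_users, response_time_s):
    """Throughput X = N / R for a closed system with zero think time (an assumption)."""
    return n_users / response_time_s


# Hypothetical measurements: response time (seconds) at several user counts.
response_time_by_users = {10: 2.0, 20: 5.0, 40: 16.0}

# The "second metric" is fully determined by the first one.
throughput_by_users = {
    n: throughput_from_response_time(n, r)
    for n, r in response_time_by_users.items()
}
```

Plotting both curves shows response time rising and throughput falling, which looks like two independent losses even though one was computed from the other.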


Plotting Results Without Confidence Intervals

Most measurements yield random quantities, not exact values
When comparing two systems, mean performance alone is insufficient
Overlapping confidence intervals indicate that the two systems may be statistically indistinguishable
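A minimal sketch of the comparison, using only the Python standard library. It assumes a normal approximation for the interval; for small samples a Student-t quantile would give a wider and more appropriate interval. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Approximate confidence interval for the mean of the samples.

    Assumption: normal approximation (z quantile) rather than Student-t.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(len(samples))
    centre = mean(samples)
    return centre - half_width, centre + half_width


def intervals_overlap(first, second):
    """True when two (low, high) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]
```

If `intervals_overlap(confidence_interval(a), confidence_interval(b))` is true, a chart that shows only the two means overstates the difference between the systems.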


Pictograms Scaled By Height

One way to depict relative difference in performance:
  Draw appropriately scaled pictures of systems
  e.g., draw A twice as big as B if performance of A is twice that of B

[Figure: pictograms of systems B and A]

Total area of A is 4 times that of B!
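The factor of 4 follows from scaling a picture while keeping its aspect ratio: width grows with height, so area grows with the square of the ratio. A short sketch, with hypothetical function names:

```python
from math import sqrt


def area_ratio_when_scaling_height(performance_ratio):
    """Area ratio that results from scaling the height by the performance ratio.

    Assumes the picture keeps its aspect ratio, so width scales by the same factor.
    """
    return performance_ratio ** 2


def linear_scale_for_proportional_area(performance_ratio):
    """Height/width factor that makes the AREA proportional to performance."""
    return sqrt(performance_ratio)
```

For a performance ratio of 2, scaling by height gives an area ratio of 4, while scaling each dimension by about 1.41 keeps the area ratio at 2.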


Using Inappropriate Cell Size in Histograms

Selecting cell size in histogram often requires more than one attempt
  If cells are too large, all data points fall in a few cells
  If cells are too small, the histogram lacks smoothness

Most statistical tests for distribution-fit require at least 5 points/cell

[Figure: two histograms, one with narrow cells (labels (0,2), (4,6), (8,10); counts axis 0 to 12) and one with wide cells ((0,6), (6,12); counts axis 0 to 18). Normal or Exponential Distribution?]

Given enough points, it is possible to fit more than one distribution
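The effect of cell size can be tried out with a few lines of code. The data set below is hypothetical (it is not the data behind the slide's figure), and the helper names are illustrative; the point is only that the same values produce differently shaped histograms and different numbers of under-filled cells.

```python
def histogram(data, cell_width, start=0.0):
    """Count data points per cell [start + k*w, start + (k+1)*w).

    Assumes every value is >= start.
    """
    counts = {}
    for value in data:
        cell = int((value - start) // cell_width)
        counts[cell] = counts.get(cell, 0) + 1
    n_cells = max(counts) + 1 if counts else 0
    return [counts.get(cell, 0) for cell in range(n_cells)]


def underfilled_cells(counts, minimum=5):
    """Indices of cells holding fewer points than the minimum of 5 quoted above."""
    return [index for index, count in enumerate(counts) if count < minimum]


# Hypothetical sample, for illustration only.
data = [0.5, 1.2, 2.1, 2.8, 3.3, 3.9, 4.4, 4.6, 5.1, 5.5, 6.2, 6.8, 7.4, 8.9, 10.5]

narrow = histogram(data, cell_width=2)   # six cells, humped shape
wide = histogram(data, cell_width=6)     # two cells, decreasing shape
```

Here the narrow cells each hold fewer than 5 points, while the wide cells pass that check but hide the shape, which is why choosing a cell size usually takes more than one attempt.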


Using Broken Scales in Column Charts

Similar effect as using non-zero origins
Allows one to amplify negligible performance difference


Real Pictorial Games Played by the Press

Examples of:
  Non-zero origin
  Incomparable data
  Non-uniform scaling
  Leaving out context
  Combining visual and statistical tricks

*** All pictures taken from: E.R. Tufte, “The Visual Display of Quantitative Information”, Graphics Press, 2002.


Non-zero Origin

Bars begin at bottom at about -$4.2 M


Incomparable Data

Comparing six months of payments in 1978 to full year’s worth in 1976 and 1977

Lie repeated four times over!


Incomparable Data

In reality, number of prizes won by US increased!


Non-uniform Scaling

Increase in length of line segment = (5.3 - 0.6) * 100 / 0.6 = 783%
Increase in standard from 1978 to 1985 = (27.5 - 18) * 100 / 18 = 53%
Lie factor = 783 / 53 = 14.8!

Future not at horizon
Dates do not grow with perspective
Line widths and numbers on right changing due to two effects:
  Change in value
  Change in perspective
Not possible for reader to separate the two effects
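The arithmetic on this slide can be reproduced in a few lines. The function names are illustrative; the lie factor is computed here as the percent change shown in the graphic divided by the percent change in the data, which is how the slide's numbers combine.

```python
def percent_change(first, last):
    """Relative change from first to last, in percent."""
    return (last - first) * 100.0 / first


def lie_factor(graphic_first, graphic_last, data_first, data_last):
    """Percent change shown in the graphic divided by percent change in the data."""
    return percent_change(graphic_first, graphic_last) / percent_change(data_first, data_last)


# Values from the slide: line segment lengths 0.6 and 5.3, standards 18 and 27.5.
graphic_change = percent_change(0.6, 5.3)
data_change = percent_change(18, 27.5)
factor = lie_factor(0.6, 5.3, 18, 27.5)
```

A lie factor of 1 would mean the graphic changes exactly as much as the data; rounding the results above to the precision used on the slide gives 783%, 53%, and 14.8.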


Replotting Fuel Economy Standards

Improvements were gradual at first, then doubled the rate from 1980 to 1983, and flattened out after that


Leaving Out Context

Compared to what?


Combining Visual and Statistical Tricks

Indicates that the state budget increased substantially in the last 9 years


What’s Wrong?

Combination of visual and statistical tricks!


Visual Tricks

Leaving out distortion yields a calmer view of budget


Statistical Tricks

Ignoring population growth
  Population increased by 1.7 million (10%)
  Part of budget growth simply paralleled population growth

Ignoring inflation
  Goods that cost government $1 in 1967, cost $2.03 in 1977
  Mixes changes in the value of money with changes in budget

Compute expenditure in constant dollars per capita

Budget declined in 1977!
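A sketch of the correction. The price index of 2.03 is the slide's figure for 1977 relative to 1967, and the populations (17.0 and 18.7 million) are the values implied by "1.7 million (10%)". The budget amounts are hypothetical placeholders, since the slide text does not give them, and the function name is illustrative.

```python
def constant_dollars_per_capita(nominal_budget, price_index, population):
    """Budget per person, expressed in base-year dollars.

    price_index -- cost in that year of goods that cost $1 in the base year
    """
    return nominal_budget / price_index / population


# Hypothetical nominal budgets; only the price index and populations come from the slide.
per_capita_1967 = constant_dollars_per_capita(4.0e9, 1.00, 17.0e6)
per_capita_1977 = constant_dollars_per_capita(9.0e9, 2.03, 18.7e6)

# Growth in nominal dollars versus growth after both corrections.
nominal_growth = 9.0e9 / 4.0e9
real_per_capita_growth = per_capita_1977 / per_capita_1967
```

With these placeholder budgets a 2.25-fold nominal increase shrinks to roughly a 1.01-fold increase once inflation and population growth are removed, which is the kind of correction the slide calls for.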


Data Visualization Using Graphic Charts

Guidelines for preparing good graphic charts
Pictorial games
Gantt and Kiviat charts


Gantt Charts

Can be used to show the relative duration of several Boolean conditions
  e.g., whether resource is idle or busy
  e.g., utilization of CPU, I/O Channel, Network Link

Properties
  Each condition shown by a set of horizontal line segments
  Total length of segment represents relative duration of condition
  Position of various segments is arranged such that overlap between different lines represents the overlap between the conditions

Example (from the chart):
  CPU utilization: 60%
  During 20% of the time, both CPU and I/O busy
  During 30% of the time, CPU and network used, but not I/O
  During 10% of the time, all 3 resources busy
  Network alone used for 15% of the time


Developing Gantt Charts

A  B  C  D   Time Used (%)
0  0  0  0     5
0  0  0  1     5
0  0  1  0     0
0  0  1  1     5
0  1  0  0    10
0  1  0  1     5
0  1  1  0    10
0  1  1  1     5
1  0  0  0    10
1  0  0  1     5
1  0  1  0     0
1  0  1  1     5
1  1  0  0    10
1  1  0  1    10
1  1  1  0     5
1  1  1  1    10
Total        100
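The segment lengths for a Gantt chart come from summing rows of this table. The sketch below encodes the sixteen rows from the slide and computes the utilization of each resource and the overlap between resources; the function names are illustrative.

```python
# Rows from the slide: (A, B, C, D) busy flags -> percent of time in that state.
TIME_USED = {
    (0, 0, 0, 0): 5,  (0, 0, 0, 1): 5,  (0, 0, 1, 0): 0,  (0, 0, 1, 1): 5,
    (0, 1, 0, 0): 10, (0, 1, 0, 1): 5,  (0, 1, 1, 0): 10, (0, 1, 1, 1): 5,
    (1, 0, 0, 0): 10, (1, 0, 0, 1): 5,  (1, 0, 1, 0): 0,  (1, 0, 1, 1): 5,
    (1, 1, 0, 0): 10, (1, 1, 0, 1): 10, (1, 1, 1, 0): 5,  (1, 1, 1, 1): 10,
}
RESOURCES = "ABCD"


def utilization(resource):
    """Percent of time the given resource is busy (total length of its segments)."""
    position = RESOURCES.index(resource)
    return sum(t for state, t in TIME_USED.items() if state[position] == 1)


def overlap(*resources):
    """Percent of time during which all the given resources are busy together."""
    positions = [RESOURCES.index(r) for r in resources]
    return sum(t for state, t in TIME_USED.items() if all(state[p] for p in positions))
```

For this table the utilizations of A, B, C, and D come out to 55%, 65%, 40%, and 50%, and A and B are busy together 35% of the time; the chart is then drawn so that the segments overlap by exactly those amounts.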


Kiviat Graphs

Help system managers quickly recognize performance problems

Circular graph with several different metrics plotted along radial lines

Typically, even number of metrics used
  Half are HB (higher is better) metrics, half are LB (lower is better) metrics
  Plotted alternately

Kiviat graph for ideal system is a star!

Only 4 independent metrics!
  CPU Busy = Problem State + Supervisor State
  CPU Wait = 100 - CPU Busy
  Channel Only = Any Channel - CPU/Channel Overlap
  CPU Only = CPU Busy - CPU/Channel Overlap
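The four relations on this slide can be applied mechanically. The sketch below assumes that the four independent metrics are Problem State, Supervisor State, Any Channel, and CPU/Channel Overlap (the quantities on the right-hand sides), all in percent; the function name is illustrative.

```python
def derived_kiviat_metrics(problem_state, supervisor_state, any_channel, cpu_channel_overlap):
    """Compute the dependent metrics from the four assumed independent ones (all in percent)."""
    cpu_busy = problem_state + supervisor_state
    return {
        "CPU Busy": cpu_busy,
        "CPU Wait": 100 - cpu_busy,
        "Channel Only": any_channel - cpu_channel_overlap,
        "CPU Only": cpu_busy - cpu_channel_overlap,
    }
```

With hypothetical inputs of 40, 20, 50, and 30, the derived values are CPU Busy 60, CPU Wait 40, Channel Only 20, and CPU Only 30, so an eight-axis graph built from these metrics still carries only four independent numbers.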


Shapes of Kiviat Graphs

Help easily recognize patterns of problems

CPU keelboat: computation-bound system

I/O wedge: I/O bound system

I/O arrow: considerable CPU and I/O usage


Quantifying Goodness of Kiviat Graphs

Merrill’s “figure of merit”:

Plot “reversed” Kiviat graph
  LB metrics are plotted so that 0% is at circumference, and 100% is at center
  HB metrics plotted in the usual manner

Compute goodness measure
  Square root of the area covered by graph
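A sketch of the computation under stated assumptions. The slide only says "square root of the area"; here the area is divided by the area of the largest possible graph (every radius at 100%) so that the result is a percentage, and that normalization is an assumption of this example. The area of a polygon with radii r_1..r_n at equal angles is 0.5 * sin(2*pi/n) * sum(r_i * r_(i+1)), so the trigonometric factor cancels in the ratio.

```python
from math import sqrt


def figure_of_merit(hb_metrics, lb_metrics):
    """Normalized square root of the area of a reversed Kiviat graph, in percent.

    hb_metrics, lb_metrics -- equal-length lists of percentages (0 to 100),
    plotted alternately around the circle. Needs at least 3 axes in total.
    """
    if len(hb_metrics) != len(lb_metrics):
        raise ValueError("expected as many HB metrics as LB metrics")
    radii = []
    for hb, lb in zip(hb_metrics, lb_metrics):
        radii.append(float(hb))      # HB metric plotted in the usual manner
        radii.append(100.0 - lb)     # LB metric reversed: 0% at the circumference
    n = len(radii)
    # Sum of products of adjacent radii, wrapping around the circle.
    adjacent = sum(radii[i] * radii[(i + 1) % n] for i in range(n))
    # Ratio to the full graph (all radii 100), then square root, as a percentage.
    return 100.0 * sqrt(adjacent / (n * 100.0 * 100.0))
```

An ideal system (all HB metrics at 100%, all LB metrics at 0%) scores 100 under this normalization, and the worst case scores 0.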


Course Outline

Selection of metrics
Performance evaluation methodologies
Workload selection
Measurement tools
Analysis and visualization of measured data
System modeling
Simulations
Case studies
Distributed monitoring infrastructures
PA in the research and industrial communities