Just Because You Can Doesn't Mean You Should - Graphing Data Well


Post on 16-Apr-2017


Page 1: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Steve Figard, Ph.D., Abbott Laboratories, Abbott Park, IL

Just Because You Can, Doesn’t Mean You Should

The Elements of Graphing Data Well

Page 2: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Agenda/Objectives of Presentation

• Introduction
• Terminology
• The Ten Commandments of Good Graphics
• A Word about PowerPoint
• The “Best” & The “Worst”
• Concluding Statement

Page 3: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Introduction

The Problem: a plethora of options/features/capabilities

• Confusion of what can be done with what ought to be done
• Attributable to ignorance…

“Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.” - Richard Cook, science fiction author, The Wizardry Compiled

• Due to lack of training: not usually covered in university classes

Page 4: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Introduction

Two goals of good graphics:

• Clarity in revealing the story in the data
• Ease of visualizing the plotted data

“When a graph is made, quantitative and categorical information is encoded by a display method. Then the information is visually decoded. This visual perception is a vital link. No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding.” - William Cleveland

The “Grand Unification Philosophy” of good graphics: minimize the mental gymnastics that the viewer must go through to understand the graph.

Page 5: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Terminology

[Figure: annotated example graph; surviving labels: “aka, y axis” and “aka, x axis”]

Page 6: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Terminology

Page 7: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

Folded, spindled, and mutilated from:

Page 8: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The units of measure employed - alternate labels

How do these numbers relate to log2?

Now these numbers we understand!
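The slide's figure did not survive the transcript, so the following is only a minimal sketch of the idea it appears to make: when an axis is plotted in log2 units, the tick labels can be shown in the original units instead. The function name `log2_tick_labels` and the tick positions are assumptions chosen for illustration.

```python
def log2_tick_labels(tick_positions):
    """Pair each tick position on a log2 axis with a label in original units."""
    # A position p on a log2 axis corresponds to the value 2**p.
    return [(pos, f"{2 ** pos:g}") for pos in tick_positions]


# Hypothetical tick positions on a log2 axis.
labels = log2_tick_labels([0, 1, 2, 3, 4, 5])
for pos, text in labels:
    print(f"log2 value {pos} -> label {text}")
```

The data are still spaced logarithmically; only the labels change, so the viewer reads 1, 2, 4, 8 rather than decoding 0, 1, 2, 3.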

Page 9: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The units of measure employed - what to plot

The raw data: interesting, but not as informative as the actual difference between the lines.

Page 10: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The units of measure employed - plotting differences

The optical illusion: The lines are only one unit apart across the entire range.

Why? Because the eye is good at perceiving perpendicular distances between two curves, but not the difference in height.

Lesson to be learned: plotting the metric of interest may be more informative than the raw data.
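A small numerical sketch of why the illusion arises, under assumptions that are not in the slide: the two curves are y = x² and y = x² + 1, both axes use the same units, and each curve is treated as locally straight. The vertical gap is then always 1, while the perpendicular gap the eye tends to judge shrinks as the curves get steeper.

```python
import math


def perpendicular_gap(slope, vertical_offset):
    """Approximate perpendicular distance between a curve and a copy of it
    shifted up by `vertical_offset`, at a point where the local slope is
    `slope` (the curve is treated as a straight line near that point)."""
    return vertical_offset / math.sqrt(1.0 + slope ** 2)


# Hypothetical curves: y = x**2 and y = x**2 + 1, so the slope is 2*x
# and the vertical difference is 1 everywhere.
for x in [0.0, 1.0, 2.0, 4.0]:
    gap = perpendicular_gap(2 * x, 1.0)
    print(f"x = {x}: vertical gap = 1.0, perpendicular gap = {gap:.3f}")
```

Plotting the difference itself (a flat line at 1) removes the need for the viewer to make this judgment at all.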

Page 11: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The range of those units of measure

• Choose your range so that the data rectangle fills up as much of the scale-line rectangle as possible
• Do not insist that the zero always be included on a scale showing magnitude, but…

“Get your facts first, and then you can distort them as much as you please. (Facts are stubborn, but statistics are more pliable).” - Mark Twain
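As a rough sketch of the range advice, the snippet below compares how much of one axis the data occupy when the limits hug the data versus when zero is forced onto the scale. The helper names, the 5% padding, and the sample readings are all assumptions made for illustration; as the Twain quote warns, a tight range can also exaggerate small differences, so the choice still needs judgment.

```python
def padded_limits(values, pad_fraction=0.05):
    """Axis limits that hug the data, with a small margin on each side."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    return lo - pad, hi + pad


def fill_fraction(values, axis_min, axis_max):
    """Share of the axis length actually occupied by the data."""
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical readings clustered far from zero.
readings = [98.2, 99.1, 100.4, 101.3]
tight_min, tight_max = padded_limits(readings)

print(f"tight range: {fill_fraction(readings, tight_min, tight_max):.2f}")
print(f"zero forced: {fill_fraction(readings, 0.0, tight_max):.2f}")
```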

Page 12: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The number of tick marks shown

• Too many = clutter
• Too few = “guesstimation” difficulties
• 3-10 usually sufficient
• Beware abuse of time-scale tick marks by changing the interval shown…

[Figure labels: one year interval; five year interval]
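One common way to stay within the 3-10 guideline is to pick a "nice" step of 1, 2, or 5 times a power of ten. The sketch below is one such heuristic, not a method taken from the slides; `nice_ticks` and `has_uniform_spacing` are hypothetical helper names, and the second function simply flags a tick list whose interval changes partway along the scale.

```python
import math


def nice_ticks(lo, hi, max_ticks=10):
    """Evenly spaced ticks inside [lo, hi] using a 1, 2, or 5 x 10^k step."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Smallest step that keeps the tick count at or below max_ticks.
    raw_step = (hi - lo) / (max_ticks - 1)
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            break
    # Work with integer multiples of the step to avoid drift from repeated addition.
    first = math.ceil(lo / step - 1e-9)
    last = math.floor(hi / step + 1e-9)
    return [round(i * step, 10) for i in range(first, last + 1)]


def has_uniform_spacing(ticks, tolerance=1e-9):
    """True when every gap between neighbouring ticks is the same size."""
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)


print(nice_ticks(0, 100))
print(nice_ticks(1990, 2010))
print(has_uniform_spacing([1970, 1975, 1980, 1981, 1982]))
```

The last call uses a made-up time scale that switches from five-year to one-year intervals, the kind of change the slide warns about.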

Page 13: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The presence or absence of breaks in the axis

• Only when necessary…try log scale first
• If used, do a full scale break
• If used, do NOT connect numerical values across the break!
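To illustrate why a log scale is worth trying before an axis break, the sketch below computes where each value lands along the axis, as a fraction of its length, on a linear scale and on a log10 scale. The readings are made up, and `axis_position` is a hypothetical helper; it assumes all values are positive, which a log scale requires.

```python
import math


def axis_position(value, lo, hi, log_scale=False):
    """Fraction of the axis length (0 to 1) at which `value` is drawn."""
    if log_scale:
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)


# Hypothetical data: four small readings and one very large one.
readings = [2, 5, 9, 12, 4800]
lo, hi = min(readings), max(readings)
for value in readings:
    linear = axis_position(value, lo, hi)
    logged = axis_position(value, lo, hi, log_scale=True)
    print(f"{value:>5}: linear {linear:.3f}, log {logged:.3f}")
```

On the linear scale the four small readings are squeezed into well under 1% of the axis; on the log scale they spread over roughly a quarter of it, with no break needed.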

Page 14: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The presence or absence of breaks in the axis

• Pay close attention to ranges, especially when breaks are clearly present – they will impact the interpretation of the data and may alter the message conveyed…

Page 15: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

1. Thou shalt pay very close attention to thy axes, for therein lieth great opportunity to succeed or to fail.

The size or length of the axis on the page

• Sometimes the default square or rectangle may hide important features in the data

JMP is particularly good at allowing “on the fly” adjustment of axes.

Page 16: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

2. Thou shalt use color to categorize, not accessorize.

Only two uses of color transmit useful information to the viewer:

• Encoding a categorical variable

Page 17: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

2. Thou shalt use color to categorize, not accessorize.

Only two uses of color transmit useful information to the viewer:

• Encoding a quantitative variable: contour plots

The choice of color for a contour plot must achieve two goals:

• effortless perception of the order of the values (i.e., we do not want to be constantly referring to a key)
• clearly perceived boundaries between adjacent levels

Page 18: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

3. Thou shalt choose symbols that can be easily distinguished from one another.

This concerns plots in which the data overlap, so that discerning the different datasets being plotted becomes critical.

Page 19: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

3. Thou shalt choose symbols that can be easily distinguished from one another.

“Texture” based on micropatterns inherent in the symbol: boundaries

JMP provides several sets of markers that should be evaluated with this commandment in mind.

Page 20: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

4. Thou shalt not employ “chartjunk.”

Category 1: unintentional optical art and the moiré effect

Anyone recognize Excel bar chart options here?!

The moiré effect describes the phenomenon when the graphic design interacts with the physiological tremor of the eye to generate the distracting appearance of vibration and movement.

Page 21: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

4. Thou shalt not employ “chartjunk.”

Category 1: unintentional optical art and the moiré effect

Can you say “garish”?!

garish adj. 1 too bright or gaudy; showy; glaring 2 gaudily or showily dressed, decorated, written, etc. (Webster’s New World College Dictionary, 4th Ed.)

Page 22: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

4. Thou shalt not employ “chartjunk.”

Category 2: the dreaded grid (especially compared to symbol size)

Page 23: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

4. Thou shalt not employ “chartjunk.”

Category 3: the self-promoting graphical duck

“When a graphic is taken over by decorative forms or computer debris, when the data measures and structures become Design Elements, when the overall design purveys Graphical Style rather than quantitative information, then that graphic may be called a duck in honor of the duck-form store, ‘Big Duck.’” - Edward Tufte

Based on an architectural observation that is valid for graphics: “It is all right to decorate construction but never construct decoration.”

Fortunately, this is really hard to do in JMP…you have to work at it. Our Worst case selection will further clarify this rule of thumb.

Page 24: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

4. Thou shalt not employ “chartjunk.”

Category 3: the self-promoting graphical duck

The charts widely used in mass media and business publications, to wit, the pie chart, divided bar charts, and area charts, will, in most cases, violate this commandment when their use is attempted in science and technology.

Page 25: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

5. Thou shalt show variation in thy data, not in thy design.

Avoid confounding design variation with data variation

Five different vertical scales are used to show the price and two different horizontal scales to show the passage of time without one indication of these changes (not even a scale break)!

(FYI: This qualifies as a “self-promoting graphical duck.”)

Page 26: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

6. Thou shalt maximize the data-ink ratio in thy graphs.

The data-ink ratio = the amount of “ink” used to depict the actual data divided by the total “ink” used to print the graphic

The “Precision Marching Band of 63 Mosquitoes”: data-ink ratio < 0.6

Some data that doesn’t fit the pattern, an important observation, is actually obscured by the Marching Band…
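The ratio defined on this slide is simple arithmetic, sketched below. The ink amounts are made-up illustrative units, not measurements of the slide's figure, and the element names are assumptions about what a cluttered chart might contain.

```python
def data_ink_ratio(elements):
    """Ink spent on the data divided by all ink used in the graphic.

    `elements` maps an element name to (ink_amount, is_data)."""
    total_ink = sum(ink for ink, _ in elements.values())
    data_ink = sum(ink for ink, is_data in elements.values() if is_data)
    return data_ink / total_ink


# Hypothetical ink budget for a cluttered chart (arbitrary units).
cluttered = {
    "data points": (55, True),
    "grid lines": (30, False),
    "axis frame and ticks": (6, False),
    "decorative border": (9, False),
}
# The same chart after erasing the non-data elements that carry no information.
cleaned = {
    "data points": (55, True),
    "axis frame and ticks": (6, False),
}

print(f"cluttered: {data_ink_ratio(cluttered):.2f}")
print(f"cleaned:   {data_ink_ratio(cleaned):.2f}")
```

The data ink is unchanged between the two versions; the ratio rises only because non-data ink was removed.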

Page 27: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

6. Thou shalt maximize the data-ink ratio in thy graphs.

The “Precision Marching Band of 63 Mosquitoes”: remove the elements below

Page 28: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

6. Thou shalt maximize the data-ink ratio in thy graphs.

The “Precision Marching Band of 63 Mosquitoes”: add a few labels and rotate y axis labels and numbers for easier reading

• Data-ink ratio up to ~ 0.9
• All data clearly visible

Page 29: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

7. Thou shalt maximize thy data density and the size of thy data matrix (within reason).

The human eye has the ability to detect large amounts of information in small spaces: take advantage of this phenomenon

Low data density:

• data matrix contains only four entries: the names (2) and the numbers (2) for the two bars on the right
• bar on the left is the sum of the other two
• original graph covers 26.5 square inches
• dividing 4 by 26.5 = data density of 0.15 numbers per square inch

NOT GOOD!

Page 30: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

7. Thou shalt maximize thy data density and the size of thy data matrix (within reason).

Good data density: map of France

This map of France was originally 27 square inches (close to that of previous slide). It shows the location and boundaries of 30,000 French communes. To recreate the data of the map would require somewhere in the neighborhood of 240,000 numbers: 30,000 latitudes, 30,000 longitudes, and an average of six numbers describing the shape of each commune. The data density thus works out to be nearly 9,000 numbers per square inch.
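The two data-density figures quoted for the bar chart and the map of France can be checked with a few lines of arithmetic. The function name `data_density` is an assumption; the counts and areas are the ones given on the slides.

```python
def data_density(number_count, area_sq_in):
    """Numbers shown per square inch of graphic."""
    return number_count / area_sq_in


# Bar chart: 4 entries in the data matrix on 26.5 square inches.
bar_chart = data_density(4, 26.5)

# Map of France: a latitude, a longitude, and about six shape numbers
# for each of 30,000 communes, on 27 square inches.
communes = 30_000
map_numbers = communes + communes + 6 * communes
france_map = data_density(map_numbers, 27)

print(f"bar chart: {bar_chart:.2f} numbers per square inch")   # 0.15
print(f"map of France: {france_map:.0f} numbers per square inch")  # 8889
```

The map packs in roughly 59,000 times as many numbers per square inch as the bar chart, in almost the same area.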

Page 31: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

7. Thou shalt maximize thy data density and the size of thy data matrix (within reason).

Of course, “within reason” applies…

Page 32: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

8. Thou shalt draw the viewer’s eye to the data, not to other design elements.

• Use visually prominent graphical elements to show the data
• Don’t clutter the interior of the scale-line rectangle with legends, labels, and lines
• Tick marks should generally face outward
• Use reference lines only when an important value must be seen across the entire graph, and then use a color, weight and style of line that does not overpower the data symbols
• If data labels are used inside the scale-line rectangle, don’t allow them to interfere with the data or to clutter the graph
• Don’t put notes and keys inside the scale-line rectangle; notes should go in a caption or the accompanying text
• When datasets are superimposed, choose color, symbol, line weights and styles, and other such graphical elements so that the datasets can be readily visually distinguished

Page 33: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

9. Thou shalt do and redo thy graphs to determine which one telleth the story best.

Experiment: this process is complex and multivariate

• Not only efficiency, but complexity, structure, density, and even beauty have a role to play in the generation of the final product.
• JMP, particularly the Graph Builder, is uniquely strong in this ability to “play” with visualization options.

Simplest: height by weight…

Page 34: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

9. Thou shalt do and redo thy graphs to determine which one telleth the story best.

“Playing” with Graph Builder

Wrap by sex:

Page 35: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

9. Thou shalt do and redo thy graphs to determine which one telleth the story best.

“Playing” with Graph Builder

Adding overlay by age:

Eh, don’t like that? Hit undo button twice…

Page 36: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

9. Thou shalt do and redo thy graphs to determine which one telleth the story best.

“Playing” with Graph Builder

Redo by reversing the process: use wrap by age, overlay by sex…

And if you still don’t like it or see clearly the story in the data, UNDO…REDO!

Page 37: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

10. Thou shalt not create “unfriendly” but “friendly” data graphics.

Remember your audience

Page 38: Just Because You Can Doesn't Mean You Should - Graphing Data Well

The Ten Commandments…of good graphics

10. Thou shalt not create “unfriendly” but “friendly” data graphics.

Regarding typography: a quote of Tufte quoting someone else:

The concept that “the simpler the form of a letter the simpler its reading” was an obsession of beginning constructivism. It became something like a dogma, and is still followed by “modernistic” typographers…. Ophthalmology has disclosed that the more the letters are differentiated from each other, the easier is the reading. Without going into comparisons and details, it should be realized that words consisting of only capital letters present the most difficult reading – because of their equal height, equal volume, and with most, their equal width. When comparing serif letters with sans-serif, the latter provide an uneasy reading. The fashionable preference for sans-serif in text shows neither historical nor practical competence.

Page 39: Just Because You Can Doesn't Mean You Should - Graphing Data Well

A Word about PowerPoint

The charges:

• Cognitive style. Presenter-focused, not content or audience focused.
• Low resolution. Little info per slide - so more slides are needed. Data graphics are weak: average of 12 numbers per graphic.
• Bullets. Bullet lists can show only 3 logical flows: sequence; priority; or membership. Multivariate models with feedback and simultaneity can’t be listed. This encourages lazy thinking, generic ideas and ignores critical relationships and assumptions.

Guilty as charged?

Page 40: Just Because You Can Doesn't Mean You Should - Graphing Data Well

A Word about PowerPoint

Yes and no:

• Some validity to these charges, BUT…they all seem to ignore the fact that PowerPoint, or any other presentation software, is just a tool.
• To blame the tool for its misuse is to kill the messenger for his message.
• What is needed is not condemnation of the tool but proper instruction in the use of that tool.

Page 41: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Competitors for Best & Worst

The Best: Minard’s data map + time-series

Plots 6 (!) variables:

1. size of army
2/3. location on 2D surface
4. direction of movement
5. temperature
6. dates during retreat (time)

• Invasion starts with 422,000 men at Polish-Russian border
• A sacked and deserted Moscow reached with only 100,000 men
• Retreat in dead of winter depicted on lower darker band and linked to temp scale and dates on bottom

…defies “the pen of the historian by its brutal eloquence.”

Only 10,000 made it home!

Page 42: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Competitors for Best & Worst

The Worst:

• Only five pieces of data (not variables) in this “graphically preposterous” work of art.
• Not one but two axis breaks!
• 3D effect = chartjunk
• Since numbers all sum to 100%, plotting both is redundant.
• Colors signify nothing

…“delighted connoisseurs of the graphically preposterous.”

Page 43: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Conclusion (in Tufte’s words)

“Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper....

“What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult – that is, the revelation of the complex.”

Page 44: Just Because You Can Doesn't Mean You Should - Graphing Data Well

Handy References

Cleveland, William S. 1994. The Elements of Graphing Data. Revised Edition. Summit, New Jersey: Hobart Press.
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd Edition. Cheshire, Connecticut: Graphics Press.
Huff, Darrell. 1954. How to Lie with Statistics. New York, New York: W. W. Norton & Company.
Zumel, Nina. 2009. Good Graphs: Graphical Perception and Data Visualization. http://www.win-vector.com/, accessed 4 June 2010.
Pirrello, Chuck. 2010. Effective Visualization Techniques for Data Discovery and Analysis. Cary, North Carolina: SAS Institute, Inc.
Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Oakland, California: Analytics Press.
Cleveland, William S. 1993. Visualizing Data. Summit, New Jersey: Hobart Press.
Tufte, Edward R. 1990. Envisioning Information. Cheshire, Connecticut: Graphics Press.
Annesley, Thomas M. 2010. Put Your Best Figure Forward: Line Graphs and Scattergrams. Clinical Chemistry 56 (8): 1229-1233.
Bessler, LeRoy. 2004. Communication-Effective Use of Color for Web Pages, Graphs, Tables, Maps, Text, and Print. SUGI 29, Montreal, Canada. http://www2.sas.com/proceedings/sugi29/176-29.pdf, accessed 2 July 2010.