Introduction to Data Analytics
Part Four: Principles of Data Visualization
David Schuff
What makes a good chart?
Minard’s map of Napoleon’s campaign into Russia, 1869Reprinted in Tufte (2009), p. 41
What makes a good chart?
http://www.popvssoda.com/countystats/total-county.html
What makes a good chart?
Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International Conference on Human Factors in Computing Systems.
This is from an academic conference
paper.
What are the problems with
this chart?
Some basic principles (adapted from Tufte 2009)
• The chart should tell a story1• The chart should have graphical
integrity2• The chart should minimize
graphical complexity3Tufte’s fundamental principle:Above all else show the data
Principle 1: The chart should tell a story
Graphics should be clear on their own
The depictions should enable meaningful comparison
The chart should yield insight beyond the text
“If the statistics are boring, then you’ve got the wrong numbers.” (Tufte 2009)
Examples?http://www.evl.uic.edu/aej/491/week03.html
http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/
Telling a Story
http
://e
cono
mix
.blo
gs.n
ytim
es.c
om/2
009/
05/0
5/ob
esi
ty-
and
-the
-fa
stne
ss-o
f-fo
od/
http
://f
low
ingd
ata.
com
/201
1/0
1/1
9/s
tate
s-w
ith-t
he-m
ost-
and-
leas
t-fir
earm
s-m
urde
rs/
Principle 2: The chart should have graphical integrity
• Basically, it shouldn’t “lie” (mislead the reader)
• Tufte’s “Lie Factor”:
Should be ~ 1
< 1 = understated effect
> 1 = exaggerated effect
Examples of the “lie factor”
𝐿𝐹=5.3 /0.627.5 /18
=8.831.53
=5.77
𝐿𝐹=4280% ( h𝑐 𝑎𝑛𝑔𝑒𝑖𝑛𝑣𝑜𝑙𝑢𝑚𝑒)454% ( h𝑐 𝑎𝑛𝑔𝑒 𝑖𝑛𝑝𝑟𝑖𝑐𝑒)
=9.4Reprinted from Tufte (2009), p. 57 & p. 62
A more recent, basic example
http://20bits.com/articles/politics-and-tuftes-lie-factor/
The original graphic from Real Clear Politics, 2008.
(Look at the y-axis)The adjusted graphic.
Other tips to avoid “lying”
Adjust for inflation
Make sure the context is presented
80
90
100
110
120
130
140
2003 2004 2005 2006 2007 2008 2009 2010Year
Hypothetical Industries, Inc.
Revenue
Adjusted Revenue
350
360
370
380
390
400
410
2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime
25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010Th
efts
per 1
0000
0 ci
tizen
s
Hypothetical City Crime
vs.
Principle 3: The chart should minimize graphical complexity
Key concepts
Sometimes a table is
betterData-ink Chartjunk
Generally, the simpler the better…
When a table is better than a chart• For a few data points, a table can do just as well…
$0.00
$50,000.00
$100,000.00
$150,000.00
$200,000.00
$250,000.00
Total Sales by SalespersonSalesperson Total Sales
Peacock $225,763.68
Leverling $201,196.27
Davolio $182,500.09
Fuller $162,503.78
Callahan $123,032.67
King $116,962.99
Dodsworth $75,048.04
Suyama $72,527.63
Buchanan $68,792.25
The table carries more information in less space and is more precise.
The Ultimate Table: The Box Score
• Large amount of information in a very small space
• So why does this work?• Depends on the
reader’s knowledge of the data
The Business Box Score?
• Applying the same concept to our salesforce example.
• How does this help? How could it hurt?
Sales Performance – March 2011
Salesperson TS WD BD NC DOR
Peacock 225 3 40 20 28
Leverling 201 2 45 18 27
Davolio 182 5 38 22 28
Fuller 162 2 22 16 20
Callahan 123 1 15 14 15
King 116 0.5 20 12 18
Dodsworth 75 0.3 12 10 20
Suyama 72 0 8 10 8
Buchanan 68 0 8 8 12
Key:TS – total salesWD – worst dayBD – best dayNC – number of customersDOR – days on the road
Data Ink• The amount of “ink” devoted to data in a chart• Tufte’s Data-Ink ratio:
Should be ~ 1
< 1 = more non-data related ink in graphic
= 1 implies all ink devoted to data
Tufte’s principle:Erase ink whenever possible
Being conscious of data ink
25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime
25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime
200
270
320 330
370350
400370
2003 2004 2005 2006 2007 2008 2009 2010
Hypothetical City Crime
Lower data-ink ratio(worse)
Higher data-ink ratio(better)
What makes a good chart?
020000400006000080000
100000120000140000160000
2011 Total Sales
Order Date
Sum of Extended Price
020000400006000080000
100000120000140000160000
2011 Total Sales
Order Date
Sum of Extended Price
Sometimes it’s really a matter of
preference.
These both minimize data
ink.
Why isn’t a table better here?
3-D Charts
$0.00
$50,000.00
$100,000.00
$150,000.00
$200,000.00
$250,000.00
Total Sales by Salesperson
Evaluate this from a data-ink perspective.How does it affect the clarity of the chart?
Chartjunk: Data Ink “gone wild”
Unnecessary visual clutter that doesn’t provide additional insight
Distraction from the story the chart is supposed to convey
When the data-ink ratio is low, chartjunk is likely to be high
25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime
Example: Moiré effects (Tufte 2009)
Creates illusion of movement
Stands out, in a bad way
$0.00
$50,000.00
$100,000.00
$150,000.00
$200,000.00
$250,000.00
Total Sales by Salesperson
Example: The Grid
25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime25
75
125
175
225
275
325
375
425
2003 2004 2005 2006 2007 2008 2009 2010
Theft
s pe
r 100
000
citiz
ens
Hypothetical City Crime
Why are these examples of chartjunk?
What could you do to
remedy it?
Data Ink Working Against Us
Evaluate this chart in terms of Data Ink.
Are there better
visualizations?
Data Ink Working For Us
Evaluate this chart in terms of Data Ink.
Imagine this as a bar chart.
As a table!!
Stacked Bar Charts are Often Trouble
• Original chart from the BBC website
• Why is this so difficult to read?
• What would be a better way to visualize it?
http://j-walkblog.com/index.php?/weblog/posts/bad_charts/
Key Questions: Can you answer…
• What are three aspects of a good graphic?
• How can a chart “lie”?
• What is the Data Ink ratio and how does it relate to Chartjunk?
• When is a table better than a chart?