1
STAT 100 - Richards
Statistical Concepts and Reasoning
Lecture 5
2
Today’s Statistic
15
Researchers at the Cancer Genome Project have identified all genetic mutations occurring during the lifetime of a cancer patient. They found that a mutation occurs, on average, for every 15 cigarettes smoked by a typical lung-cancer patient.
Source: http://www.wellcome.ac.uk/News/Media-office/Press-releases/2009/WTX058047.htm
3
Chapter 9: Plots, Graphs, Pictures
How to create good and bad displays of statistical data
Darrell Huff, “How To Lie With Statistics,” 1954
Edward Tufte, “Envisioning Information,” 1990
4
Redwood Transit System (RTS) is the public bus system for Humboldt County, California.
5
The New York Times, March 1987
“Gotti is acquitted by a Federal jury in conspiracy case”
“The last piece of evidence requested by the jury for re-examination was a chart introduced by the defense that showed the criminal backgrounds of seven prosecution witnesses. It listed 69 crimes, including murder, drug possession and sales, and kidnapping.”
“It was a chart listing the lengthy criminal records of seven prosecution witnesses who had obtained promises of leniency and other favors from the Government in return for their testimony against Mr. Gotti . . .”
6
Source: Tufte, “Envisioning Information”
7
U.S. Population and Violent Crime*
Table 9.2, p. 176
Year             1982  1983  1985  1986  1987  1988  1989  1990  1991
U.S. population   231   234   239   241   243   246   248   249   252
Violent crime    1.32  1.26  1.33  1.49  1.48  1.57  1.65  1.82  1.91
*Figures for 1984 were unavailable . . .
Population and number of crimes in millions.
Let us plot the data from two perspectives
8
Conclusion: The crime rate is stable; life is great!
9
Conclusion: The crime rate is unstable; life is terrible!
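A minimal sketch of how the same table can support either reading. It uses only the figures in Table 9.2 and converts them to a rate per 100,000 residents. The helper names `crimes_per_100k` and `percent_change` are illustrative assumptions, not part of the lecture.

```python
# Data copied from Table 9.2; both series are in millions (1984 is missing).
years = [1982, 1983, 1985, 1986, 1987, 1988, 1989, 1990, 1991]
population = [231, 234, 239, 241, 243, 246, 248, 249, 252]
violent_crime = [1.32, 1.26, 1.33, 1.49, 1.48, 1.57, 1.65, 1.82, 1.91]


def crimes_per_100k(crimes_millions, population_millions):
    """Violent crimes per 100,000 residents (hypothetical helper)."""
    return crimes_millions / population_millions * 100_000


def percent_change(first, last):
    """Percentage change from the first value to the last (hypothetical helper)."""
    return (last - first) / first * 100


rates = [crimes_per_100k(c, p) for c, p in zip(violent_crime, population)]

for year, rate in zip(years, rates):
    print(f"{year}: {rate:.0f} violent crimes per 100,000 residents")

print(f"Population change, 1982-1991: {percent_change(population[0], population[-1]):.1f}%")
print(f"Crime count change, 1982-1991: {percent_change(violent_crime[0], violent_crime[-1]):.1f}%")
print(f"Crime rate change, 1982-1991: {percent_change(rates[0], rates[-1]):.1f}%")
```

On a vertical axis that runs from 0 to 250 million, the crime counts stay a nearly flat line hugging the bottom, which suggests stability. On an axis fitted tightly around the crime counts, the same numbers climb steeply. The rate per 100,000 residents, which rises from about 571 to about 758, is the quantity that does not depend on that choice.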
10
Another bad graph
Source: http://www.smashingmagazine.com/2010/05/10/imagine-a-pie-chart-stomping-on-an-infographic-forever/
11
What do “Binge alcohol” and “deaths” mean?
Which of the lines corresponds to “Binge alcohol” and which to “deaths”?
Why are states listed in alphabetical order? Why are some states’ names missing?
The vertical scale is the actual number of cases. California has the highest spike because it has the largest population of all states.
A proper scale is the percentage of the state’s population. Always use data that reflect each state’s population size.
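A short sketch of the difference between raw counts and percentages of population. The two "states" and all of their numbers are invented purely for illustration and are not real data. The helper `percent_of_population` is likewise an assumption.

```python
def percent_of_population(cases, population):
    """Cases expressed as a percentage of the population (hypothetical helper)."""
    return cases / population * 100


# Made-up numbers for illustration only; NOT real state data.
states = {
    "Large state": {"cases": 50_000, "population": 40_000_000},
    "Small state": {"cases": 5_000, "population": 1_000_000},
}

# Ranking by raw count rewards sheer population size.
highest_by_count = max(states, key=lambda name: states[name]["cases"])

# Ranking by percentage of population compares like with like.
highest_by_rate = max(states, key=lambda name: percent_of_population(**states[name]))

print("Highest raw count:", highest_by_count)
print("Highest percentage of population:", highest_by_rate)
```

With these invented figures the large state has ten times as many cases, yet the small state has the higher percentage. The ranking flips once population size is taken into account.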
12
A Congressman showed this graph to “prove” that life is getting easier for families in the 40-60th percentile of incomes. What is wrong with this graph?
13
Problems with the graph
The vertical axis does not start at zero; is something being hidden from us?
The grid-lines and data labels are not explained.
The horizontal scale is chosen so that the graph suggests an enormous fall in the middle fifth’s taxes.
Let’s examine the raw data
Source: http://junkcharts.typepad.com/junk_charts/2008/03/dont-believe-wh.html
14
15
“The Congressional Budget Office had provided data for all five quintiles. Why not plot them all?
“Surprise! The middle 20% are not special. All five quintiles, especially the middle three, followed the same trend over time.
“To single out the middle 20% is to obscure the context in which the data should be properly interpreted.”
Almost any graph in which the vertical scale does not start at zero is deceptive.
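A rough sketch of why a truncated vertical axis exaggerates change. It reuses the 1982 and 1991 violent-crime figures from Table 9.2, because the tax data behind the Congressman's graph are not given here. The function `apparent_ratio` and the baseline of 1.2 are illustrative assumptions.

```python
def apparent_ratio(first, last, baseline=0.0):
    """Ratio of drawn heights when the vertical axis starts at `baseline`.

    Hypothetical helper: a bar or point is drawn at (value - baseline)
    above the bottom of the plot, so that is the height the eye compares.
    """
    if baseline >= min(first, last):
        raise ValueError("baseline must lie below both values")
    return (last - baseline) / (first - baseline)


# Violent crimes in millions, 1982 and 1991 (Table 9.2).
first, last = 1.32, 1.91

honest = apparent_ratio(first, last)                     # axis starts at zero
truncated = apparent_ratio(first, last, baseline=1.2)    # axis starts at 1.2 (arbitrary choice)

print(f"Axis from zero: 1991 looks {honest:.2f} times as tall as 1982")
print(f"Axis from 1.2:  1991 looks {truncated:.2f} times as tall as 1982")
```

The true ratio is about 1.45, but with the axis starting at 1.2 the 1991 value is drawn almost six times as high as the 1982 value.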
16
A double graph; too much visual information
17
Number of jobs created during various periods
Source: http://www.smashingmagazine.com/2010/05/10/imagine-a-pie-chart-stomping-on-an-infographic-forever/
18
What is wrong with the jobs-creation graph?
The red and blue bars are misleading and irrelevant.
The graph is 4-dimensional: Year, number of jobs at year end, number of jobs created, and political affiliation of the Administration.
Even 3-dimensional graphs are often confusing.
The data ignore confounding factors, such as: women entering the workforce, population growth, shifts from part-time to full-time jobs, changes in wages, changes in GDP.
And, just who should get credit for job creation?
19
In general, pie charts and pictograms are bad; histograms are good.
20
Is the picture a triangle or pyramid?
The area of the bottom part of the triangle is more than 73.80% of the total area of the triangle.
The volume of the bottom part of the pyramid is far more than 73.8% of the total volume of the pyramid.
A simple bar chart would have provided accurate information.
Why 73.80%? Why not approximate to 74%?
73.80% gives a (false) sense of exactness.
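A small sketch of the geometry behind the triangle and pyramid remarks. It assumes the picture marks off the bottom 73.8% of the figure's height, which cannot be checked from the text alone. The function `bottom_share` is a hypothetical helper. The piece above the cut is a scaled-down copy of the whole figure, so its share shrinks with the square of its height for an area and with the cube for a volume.

```python
def bottom_share(height_fraction, dimensions):
    """Share of a triangle (2) or pyramid (3) lying below a horizontal cut.

    Hypothetical helper. `height_fraction` is the cut height as a fraction
    of the total height, measured from the base. The part above the cut is
    similar to the whole, scaled by (1 - height_fraction).
    """
    top_share = (1 - height_fraction) ** dimensions
    return 1 - top_share


h = 0.738  # assumed: the bottom band spans 73.8% of the height

print(f"Bar chart (1-D length): {bottom_share(h, 1):.1%}")
print(f"Triangle (2-D area):    {bottom_share(h, 2):.1%}")
print(f"Pyramid (3-D volume):   {bottom_share(h, 3):.1%}")
```

Under that assumption, only the bar chart shows 73.8% as 73.8%. The triangle's bottom band covers roughly 93% of the area, and the pyramid's bottom slab roughly 98% of the volume.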
21
The moral of the story: Read Chapter 9 carefully.
It is an easy chapter to read.
You will become a better citizen when you read Chapter 9 carefully.