STAT 100 - Richards Today’s Statistic

Post on 16-Apr-2022

1

STAT 100 - Richards

Statistical Concepts and Reasoning

Lecture 5

2

Today’s Statistic

15

Researchers at the Cancer Genome Project have identified all genetic mutations occurring during the lifetime of a cancer patient. They found that a mutation occurs, on average, for every 15 cigarettes smoked by a typical lung-cancer patient.

Source: http://www.wellcome.ac.uk/News/Media-office/Press-releases/2009/WTX058047.htm

3

Chapter 9: Plots, Graphs, Pictures

How to create good and bad displays of statistical data

Darrell Huff, “How To Lie With Statistics,” 1954

Edward Tufte, “Envisioning Information,” 1990

4

Redwood Transit System (RTS) is the public bus system for Humboldt County, California.

5

The New York Times, March 1987

“Gotti is acquitted by a Federal jury in conspiracy case”

“The last piece of evidence requested by the jury for re-examination was a chart introduced by the defense that showed the criminal backgrounds of seven prosecution witnesses. It listed 69 crimes, including murder, drug possession and sales, and kidnapping.”

“It was a chart listing the lengthy criminal records of seven prosecution witnesses who had obtained promises of leniency and other favors from the Government in return for their testimony against Mr. Gotti . . .”

6

Source: Tufte, “Envisioning Information”

7

U.S. Population and Violent Crime*

Table 9.2, p. 176

Year              1982  1983  1985  1986  1987  1988  1989  1990  1991
U.S. population    231   234   239   241   243   246   248   249   252
Violent crime     1.32  1.26  1.33  1.49  1.48  1.57  1.65  1.82  1.91

*Figures for 1984 were unavailable . . .

Population and number of crimes in millions.

Let us plot the data from two perspectives

8

Conclusion: The crime rate is stable; life is great!

9

Conclusion: The crime rate is unstable; life is terrible!
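The two plots themselves are not reproduced in this transcript, so the following is only a minimal sketch of how the Table 9.2 figures can support either conclusion. It assumes that "crime rate" means violent crimes per 1,000 residents; that unit and the helper name `percent_change` are choices made here for illustration, not taken from the slides.

```python
# Data from Table 9.2 (population and violent crimes, both in millions).
# 1984 is absent because the table reports those figures as unavailable.
years = [1982, 1983, 1985, 1986, 1987, 1988, 1989, 1990, 1991]
population = [231, 234, 239, 241, 243, 246, 248, 249, 252]
violent_crime = [1.32, 1.26, 1.33, 1.49, 1.48, 1.57, 1.65, 1.82, 1.91]

# Assumed unit: violent crimes per 1,000 residents.
rate_per_1000 = [1000 * c / p for c, p in zip(violent_crime, population)]


def percent_change(series):
    """Percent change from the first to the last value of a series."""
    return 100 * (series[-1] - series[0]) / series[0]


for year, rate in zip(years, rate_per_1000):
    print(f"{year}: {rate:.2f} violent crimes per 1,000 residents")

print(f"Population change:    {percent_change(population):.1f}%")
print(f"Violent crime change: {percent_change(violent_crime):.1f}%")
print(f"Crime rate change:    {percent_change(rate_per_1000):.1f}%")
```

The same rates look flat on a vertical axis that runs from zero to a large maximum and look steep on an axis cropped tightly around the data, which is one way a single table can yield both "stable" and "unstable".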

10

Another bad graph

Source: http://www.smashingmagazine.com/2010/05/10/imagine-a-pie-chart-stomping-on-an-infographic-forever/

11

What do “Binge alcohol” and “deaths” mean?

Which of the lines corresponds to “Binge alcohol” and which to “deaths”?

Why are states listed in alphabetical order? Why are some states’ names missing?

The vertical scale is the actual number of cases. California has the highest spike because it has the largest population of all states.

A proper scale is the percentage of the state’s population. Always use data that reflects the state’s population size.
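As a minimal sketch of the point about scale, the snippet below converts raw counts into a percentage of each state's population. The state names, case counts, and populations are hypothetical numbers invented for illustration; they are not read off the graph.

```python
# Hypothetical figures for illustration only; not taken from the graph.
states = {
    "State A": {"cases": 50_000, "population": 38_000_000},
    "State B": {"cases": 4_000, "population": 1_000_000},
}


def percent_of_population(cases, population):
    """Cases expressed as a percentage of the state's population."""
    return 100 * cases / population


percentages = {
    name: percent_of_population(d["cases"], d["population"])
    for name, d in states.items()
}

for name, pct in percentages.items():
    print(f"{name}: {states[name]['cases']:,} cases = {pct:.3f}% of population")
```

With these made-up numbers the larger state has far more cases in total, yet the smaller state has the higher percentage, so a plot of raw counts would point to the wrong state.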

12

A Congressman showed this graph to “prove” that life is getting easier for families in the 40-60th percentile of incomes. What is wrong with this graph?

13

Problems with the graph

The vertical axis does not start at zero; is something being hidden from us?

The grid-lines and data labels are not explained.

The horizontal scale is chosen so that the graph suggests an enormous fall in the middle fifth’s taxes.

Let’s examine the raw data

Source: http://junkcharts.typepad.com/junk_charts/2008/03/dont-believe-wh.html

14

15

“The Congressional Budget Office had provided data for all five quintiles. Why not plot them all?

“Surprise! The middle 20% are not special. All five quintiles, especially the middle three, followed the same trend over time.

“To single out the middle 20% is to obscure the context in which the data should be properly interpreted.”

Almost any graph in which the vertical scale does not start with zero is deceptive.
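A rough way to see why a truncated vertical axis misleads is to compare how tall two values are drawn with how large they actually are. The function `drawn_height_ratio` and the values 20, 18, and 17 below are hypothetical, chosen only to illustrate the arithmetic; they are not the tax figures from the graph.

```python
def drawn_height_ratio(first, last, axis_start=0.0):
    """Ratio of the drawn height of `last` to that of `first`
    when the vertical axis begins at `axis_start`."""
    if axis_start >= min(first, last):
        raise ValueError("axis_start must lie below both values")
    return (last - axis_start) / (first - axis_start)


# Hypothetical values: a quantity that falls from 20 to 18 (a 10% drop).
honest = drawn_height_ratio(20, 18)            # axis starts at zero
cropped = drawn_height_ratio(20, 18, axis_start=17)

print(f"Axis from 0:  second point drawn at {honest:.0%} of the first")
print(f"Axis from 17: second point drawn at {cropped:.0%} of the first")
```

With the axis starting at zero, the drawn heights keep the true proportion between the two values. With the axis cropped at 17, the same 10% drop is drawn as a fall to one third of the original height.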

16

A double-graph; too much visual information

17

Number of jobs created during various periods

Source: http://www.smashingmagazine.com/2010/05/10/imagine-a-pie-chart-stomping-on-an-infographic-forever/

18

What is wrong with the jobs-creation graph?

The red and blue bars are misleading and irrelevant.

The graph is 4-dimensional: Year, number of jobs at year end, number of jobs created, and political affiliation of the Administration.

Even 3-dimensional graphs are often confusing.

The data ignore confounding factors, such as: women entering the workforce, population growth, shifts from part-time to full-time jobs, changes in wages, changes in GDP.

And, just who should get credit for job creation?

19

In general, pie charts and pictograms are bad; histograms are good.

20

Is the picture a triangle or pyramid?

The area of the bottom part of the triangle is more than 73.80% of the total area of the triangle.

The volume of the bottom part of the pyramid is far more than 73.80% of the total volume of the pyramid.

A simple bar chart would have provided accurate information.

Why 73.80%? Why not approximate to 74%?

73.80% gives a (false) sense of exactness.
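The picture is not reproduced in this transcript, so the sketch below rests on an assumption: that the graphic gives the bottom part 73.8% of the figure's height. Under that assumption, the part above the cut is a scaled-down copy of the whole figure, so its share shrinks with the square of the scale for a triangle and with the cube for a pyramid. The function name `bottom_share` is hypothetical.

```python
def bottom_share(height_fraction, dimensions):
    """Share of a triangle's area (dimensions=2) or a pyramid's volume
    (dimensions=3) lying below a horizontal cut, when the bottom part
    takes up `height_fraction` of the total height."""
    if not 0 <= height_fraction <= 1:
        raise ValueError("height_fraction must be between 0 and 1")
    top_scale = 1 - height_fraction  # the top piece is a similar figure
    return 1 - top_scale ** dimensions


# Assumption: the bottom band spans 73.8% of the height.
h = 0.738
print(f"Triangle: bottom holds {bottom_share(h, 2):.1%} of the area")
print(f"Pyramid:  bottom holds {bottom_share(h, 3):.1%} of the volume")
```

Either reading makes the bottom part look much larger than 73.8%, which is why a bar chart, where height alone carries the value, would be the accurate display.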

21

The moral of the story: Read Chapter 9 carefully.

It is an easy chapter to read.

You will become a better citizen when you read Chapter 9 carefully.