STORIES AND STATISTICS
Prepared by Frank Swain
National Coordinator for Science Training for Journalists
Royal Statistical Society
[email protected] 7614 3947
Contents
Communicating numbers
Percentages & percentage points
Surveys
Averages
Uncertainty
Trends
Correlation versus causation
Probabilities: what makes a value unusual?
Absolute and relative risk
Imagery
Communicating numbers #1
Breaking down big numbers
Your numbers are characters in the story – give them some personality
“1.4 million photos are uploaded a second”
1.4m photos
x 86,400 seconds in a day
÷ 500 million users
= 240 photos per person per day
Realistic?
Numbers often need to be scaled to be meaningful, e.g. per person, per passenger mile, etc.
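The sanity check above can be sketched in a few lines of Python. This is only an illustration using the figures quoted on the slide; the variable names are made up for the example.

```python
# Sanity check of the "1.4 million photos are uploaded a second" claim,
# using only the figures quoted on the slide.
PHOTOS_PER_SECOND = 1_400_000
SECONDS_PER_DAY = 86_400          # 24 hours * 60 minutes * 60 seconds
USERS = 500_000_000

photos_per_day = PHOTOS_PER_SECOND * SECONDS_PER_DAY
photos_per_user_per_day = photos_per_day / USERS

# Scaling the headline number to "per person per day" makes it easy to judge.
print(f"{photos_per_user_per_day:.0f} photos per person per day")
```

The result is a little over 240 photos per person per day, which is the point at which to ask whether the original claim is realistic.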
Hospitals
Tourist info centres
Putting numbers in context
“The implant has been used by around 1.4 million women since it was introduced in 1999. In its 11 years of use, medicine regulators have recorded 584 pregnancies among users”
“…for every 1,000 women using it, less than one will get pregnant over a three-year period”
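As a rough illustration of why the second quote is easier to grasp, the raw counts in the first quote can be rescaled to a rate. This is a crude ratio of recorded pregnancies to all users over the whole period, which is an assumption made for the example; it is not the same calculation as the three-year figure quoted above.

```python
# Rescale the raw counts from the first quote into a rate.
# Assumption: a simple ratio of recorded pregnancies to total users,
# ignoring how long each woman used the implant.
USERS = 1_400_000
RECORDED_PREGNANCIES = 584

per_1000 = RECORDED_PREGNANCIES / USERS * 1000
one_in = USERS / RECORDED_PREGNANCIES

print(f"{per_1000:.2f} recorded pregnancies per 1,000 users")
print(f"roughly 1 in {one_in:,.0f} users")
```

Either form gives the reader a denominator, which the bare figure of 584 does not.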
Percentages
Percentages less than 1% are difficult to interpret. It is better to use “3 in every 10,000” than 0.03%.
Also be careful with percentages bigger than 100%: it can be better to say double, triple, etc.
Know the difference between a percentage and a percentage point.
VAT increased from 17.5% to 20% in January 2011.
This is a rise of 2.5 percentage points, not a rise of 2.5%.
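A minimal sketch of the distinction, assuming the previous standard VAT rate of 17.5%. The two helper functions are hypothetical names introduced for this example.

```python
def percentage_point_change(old_rate, new_rate):
    """Difference between two percentages, in percentage points."""
    return new_rate - old_rate


def percent_change(old_value, new_value):
    """Relative change from old_value to new_value, as a percentage."""
    return (new_value - old_value) / old_value * 100


# Assumption: VAT rose from 17.5% to 20%.
old_vat, new_vat = 17.5, 20.0

points = percentage_point_change(old_vat, new_vat)
relative = percent_change(old_vat, new_vat)

print(f"Rise of {points} percentage points")
print(f"Rise of {relative:.1f}% relative to the old rate")
```

The same change is 2.5 percentage points but roughly a 14% increase in the rate itself, so the wording matters.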
UK smoking rate
1948: 26m smokers (65%)
1970: 25m smokers (55%)
Key: each figure represents 1 million smokers or 1 million non-smokers
“The smoking population shrank by 4 per cent”
“The smoking rate has declined 10 percentage points”
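Both statements can be true at once because the population grew. The sketch below reworks the slide's figures; the implied population sizes are derived here on the assumption that the counts and the rates refer to the same population.

```python
# Figures from the slide: number of smokers (millions) and smoking rate (%).
smokers_1948, smokers_1970 = 26, 25
rate_1948, rate_1970 = 65, 55

# Change in the number of smokers, as a percentage of the 1948 count.
count_change_pct = (smokers_1970 - smokers_1948) / smokers_1948 * 100

# Change in the smoking rate, in percentage points.
rate_change_points = rate_1970 - rate_1948

# Implied population (millions), assuming counts and rates describe the same group.
population_1948 = smokers_1948 / (rate_1948 / 100)
population_1970 = smokers_1970 / (rate_1970 / 100)

print(f"Smokers: {count_change_pct:.1f}% change in the count")
print(f"Smoking rate: {rate_change_points} percentage points")
print(f"Implied population: {population_1948:.0f}m -> {population_1970:.0f}m")
```

The count barely moves while the rate drops sharply, so it is worth saying which of the two a story is about.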
Surveys #2
What’s been counted?
• How many…
ballot papers?
chairs?
hearts beating?
footprints?
…people?
Polls and surveys
• Polls are ways of finding out what a population thinks without asking everyone
• Sample size: a poll of 1,000 people has a 95% confidence interval of about ±3 percentage points just from sampling
• So be careful of small subgroups of the sample: 100 people gives about ±10 percentage points
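The ±3 and ±10 figures follow from the standard margin-of-error formula for a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case of a 50/50 split; real polls with weighting will differ.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error, in percentage points, for a proportion p
    estimated from a simple random sample of size n.
    Assumptions: 95% confidence (z = 1.96) and worst case p = 0.5 by default."""
    return z * math.sqrt(p * (1 - p) / n) * 100


for sample_size in (1000, 100):
    print(f"n = {sample_size}: about ±{margin_of_error(sample_size):.1f} points")
```

Because the margin shrinks with the square root of the sample size, a subgroup a tenth of the size has a margin roughly three times as wide.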
Survey example
“…couples now expect to blow an average of £20,273 tying the knot…”
• Which average?
• Whose wedding?
• Who’s asking?
Do you have the exact questions the pollster asked?
Are they precise and fair?
#3
Polls and surveys
Do the people surveyed reflect the wider population? (selection bias)
Were the questions asked in a fair way? (response bias)
Who commissioned the survey?
Statistical significance
• So how do we know if an event really is interesting or if it was just random variation?
• That’s what ‘statistical significance’ is about.
• For example, is a cluster of cancer cases in an area suspicious or likely to be just natural variation?
League tables
League tables are often meaningless because the natural variation is far bigger than the differences in the table
There are many different ways of calculating an average.
Which is the appropriate one to use?
#4
Variation and distributions
We often want to summarise a distribution of values with one number: an average.
But there are different types of average: mean, median and mode.
Average does not mean the same thing as typical.
Different averages tell different stories, so say which you are using.
Averages
Mode: £275
Median: £377
Mean: £463
Bottom line: Give an idea of the size and shape of the spread around the average.
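To see how the three averages pull apart on skewed data, here is a small sketch using Python's standard library. The values are hypothetical, invented for this example; they are not the data behind the figures above.

```python
from statistics import mean, median, mode

# Hypothetical weekly amounts in pounds, with one very large value.
# These numbers are made up for illustration only.
weekly_amounts = [250, 275, 275, 275, 300, 350, 400, 450, 600, 1500]

print(f"Mode:   £{mode(weekly_amounts)}")       # most common value
print(f"Median: £{median(weekly_amounts)}")     # middle value
print(f"Mean:   £{mean(weekly_amounts)}")       # total divided by count
```

A single large value drags the mean well above the median and the mode, which is why a story should say which average it is quoting.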
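Normal distribution
About 68.2% of values lie within one standard deviation of the mean.
About 95.4% of values lie within two standard deviations of the mean.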
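These shares can be checked with the error function from Python's math module. The helper name is hypothetical; the relationship used is the standard one for a normal distribution.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k standard deviations
    of the mean (hypothetical helper for this example)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"Within {k} standard deviation(s): {share_within(k) * 100:.2f}%")
```

The computed values agree with the slide's 68.2% and 95.4% to within rounding.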
A word on “average”
[Chart: Do countries win more Olympic medals at home? Bars compare medals won on average (away Games) with medals won at home Games for South Korea, Spain, United States, Australia, Greece, China and Great Britain. Vertical axis: 0 to 120 medals.]
How accurate are the figures? #5
“The number of people out of work rose by 38,000 to 2.49 million in the three months to June, official figures show.”
GOLDACRE: “The estimated change over the past quarter is 38,000, but the 95% confidence interval is ± 87,000, running from -49,000 to 125,000. That wide range clearly includes zero, no change at all.”
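The check described in the quote amounts to asking whether the confidence interval includes zero. A minimal sketch using the quoted figures:

```python
# Figures from the quote: estimated quarterly change and its 95% margin.
estimated_change = 38_000
margin = 87_000

lower = estimated_change - margin
upper = estimated_change + margin

# If zero lies inside the interval, "no change" is consistent with the data.
includes_zero = lower <= 0 <= upper

print(f"95% interval: {lower:,} to {upper:,}")
print("Consistent with no change" if includes_zero else "Change is distinguishable from zero")
```

When the interval spans zero, a headline saying the figure "rose" claims more than the data support.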
One change in the numbers does not make a trend.
Blips often happen.
#6
Trends
Beware spurious connections that don’t amount to ‘a causes b’.
#7
Correlation and causation
• A significant correlation between two variables does not imply one causes the other.
• Often there is a common cause for both variables, or it’s just a coincidence.
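A small simulation can show how a common cause produces a strong correlation between two variables that have no effect on each other. Everything here is hypothetical: the variables, the noise levels and the fixed random seed are chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the example is repeatable

# A hidden common cause drives both variables; a and b never influence each other.
common_cause = [rng.gauss(0, 2) for _ in range(5000)]
a = [c + rng.gauss(0, 1) for c in common_cause]
b = [c + rng.gauss(0, 1) for c in common_cause]

r = pearson(a, b)
print(f"Correlation between a and b: {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, neither variable causes the other.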
“Regression to the mean”
The most abused correlation in the world!
“One in a million”. #8
Probability and coincidences
• The chance of an event can be very small, but if it has lots of opportunities to happen, it can be near certain.
• Most weeks someone wins the lottery.
Probability
“the chances… an astonishing 48 million to one”
Actually it’s only 133,000 to one…
…and there are around 167,000 third children born in the UK each year.
Always think about how many opportunities there were for a coincidence to happen
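A sketch of that reasoning with the figures above. It treats "133,000 to one" as a chance of 1 in 133,000 per birth and assumes the opportunities are independent; both are simplifications made for this example.

```python
# Figures from the slides.
ODDS_AGAINST = 133_000      # treated as a 1-in-133,000 chance per opportunity
OPPORTUNITIES = 167_000     # third children born in the UK each year

p_single = 1 / ODDS_AGAINST

# Expected number of times the "coincidence" happens in a year.
expected_per_year = OPPORTUNITIES * p_single

# Chance it happens at least once, assuming independent opportunities.
p_at_least_once = 1 - (1 - p_single) ** OPPORTUNITIES

print(f"Expected occurrences per year: {expected_per_year:.2f}")
print(f"Chance of at least one per year: {p_at_least_once:.0%}")
```

On these assumptions the event is expected about once a year, so its happening somewhere is unremarkable.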
Extremes
You should know what the absolute and the relative risks are, and communicate both.
#10
Risk
Google tells me… diabetes, weight gain, cigarette smoke, HRT, solariums
…all “double” my risk of cancer
What, me worry?
But how bad is that?
Risk example
“Bacon increases risk of colorectal cancer by 20%”
About 5 out of 100 people develop colorectal cancer.
If all 100 ate 3 extra rashers every day, the number would rise to six.
So… “Bacon increases risk of colorectal cancer by 20%” is therefore the same as saying “About 1 extra case per 100 people”.
Risk
• Absolute risk increases from 5% to 6%
• Absolute risk increases by 1 percentage point
• Relative risk increases by 20%
• 100 people eating 50g of processed meat every day for the rest of their lives would lead to 1 extra case of colorectal cancer
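The bullets above can be reproduced from just two inputs: the baseline risk and the relative increase. A minimal sketch, with variable names chosen for this example:

```python
# Inputs from the slides.
baseline_risk = 0.05        # about 5 in 100 people develop colorectal cancer
relative_increase = 0.20    # "increases risk by 20%"

# Relative risk multiplies the baseline; it is not added to it.
new_risk = baseline_risk * (1 + relative_increase)

absolute_increase_points = (new_risk - baseline_risk) * 100
extra_cases_per_100 = 100 * (new_risk - baseline_risk)

print(f"Absolute risk: {baseline_risk:.0%} -> {new_risk:.0%}")
print(f"Increase: {absolute_increase_points:.0f} percentage point")
print(f"Extra cases per 100 people: {extra_cases_per_100:.0f}")
```

Reporting the baseline alongside the relative figure lets readers see that a 20% increase here means one extra case in 100 people.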
Apply the same rules to a graphic that you would to a story: strive for accuracy, clarity and a strong narrative.
#11
Visualising data
Resources
Royal Statistical Society
StraightStatistics.org
FullFact.org
STATS.org
UnderstandingUncertainty.org