
Introduction to Thinking About Data I:

The Importance of Being Earnest – Why Numbers Matter in Public Health

MMED

African Institute for the Mathematical Sciences

Muizenberg, South Africa

May, 2017

Brian G Williams, PhD

Stellenbosch University

Slide Set Citation: DOI: 10.6084/m9.figshare.5043136

The ICI3D Figshare Collection


Goals

Learn to

1. See patterns in data

2. Formulate hypotheses

3. Test theories

The purpose of models is not to fit the data but to sharpen the questions. (S. Karlin)


The supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.

Albert Einstein, 1933

On the Method of Theoretical Physics, the Herbert Spencer Lecture, Oxford (10 June 1933)


Three rules of good modelling

• Stay as close to the data as you can

• Put in as much biology as you can

• Keep it simple


Semmelweis


Ignaz Semmelweis: 1818–1865

Or why washing your hands matters

Junior doctor in Vienna General Hospital

Puerperal fever and maternal mortality


Ignaz Semmelweis

About one in fifteen mothers were dying of puerperal fever during childbirth.

[Figure: maternal mortality by year, 1830 to 1860; y-axis 0.00 to 0.15]


In 1840 maternal mortality in the red wards fell below that in the blue wards. Medical students, who were doing autopsies before delivering babies, were still working in the blue wards but had stopped delivering babies in the red wards. In 1847 his colleague Jakob Kolletschka was cut with a student's scalpel while performing a post-mortem and died with a pathology similar to that of the mothers.

[Figure: maternal mortality in the blue and red wards, 1830 to 1860, annotated "Only midwives in red wards", "Ignaz Semmelweis" and "Kolletschka dies of septicaemia"]


In 1848, aged 30, he made the medical students wash their hands in chlorinated lime before they went into the maternity wards. Mortality in the blue wards dropped to the same level as in the red wards.

[Figure: maternal mortality in the blue and red wards, 1830 to 1860, annotated "Ignaz Semmelweis" and "Semmelweis makes the medical students wash their hands"]

Problem → Pattern in the data → Think of an explanation → Do an intervention → See if it works


In 1849 he was fired for criticizing his superiors. In 1865 he died of septicaemia in an insane asylum, but the data suggest that the medical students must have gone on washing their hands.

[Figure: maternal mortality in the blue and red wards, 1830 to 1860, annotated "Ignaz Semmelweis" and "Semmelweis fired"]


We can now work out odds ratios and put confidence limits on the estimates.

[Figure: odds ratio for maternal mortality, blue/red wards, 1830 to 1860; y-axis 0 to 8; annotated "Only midwives in red wards", "Ignaz Semmelweis" and "Semmelweis makes the medical students wash their hands"]
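A minimal sketch of that calculation in Python, assuming the usual 2×2 table for a single year (deaths and survivors among births in the blue and red wards) and the Woolf log-scale approximation for a 95% confidence interval. The function name `odds_ratio_ci` and the counts used below are hypothetical, chosen only to show the arithmetic; they are not Semmelweis's data.

```python
import math

def odds_ratio_ci(deaths_blue, births_blue, deaths_red, births_red, z=1.96):
    """Odds ratio of maternal death (blue vs red wards) with a Woolf 95% confidence interval."""
    survivors_blue = births_blue - deaths_blue
    survivors_red = births_red - deaths_red
    # Odds of death in each ward, and the ratio of those odds
    odds_ratio = (deaths_blue / survivors_blue) / (deaths_red / survivors_red)
    # Standard error of log(odds ratio): square root of the sum of reciprocal cell counts
    se_log = math.sqrt(1 / deaths_blue + 1 / survivors_blue
                       + 1 / deaths_red + 1 / survivors_red)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical year: 600 deaths in 6000 births (blue) vs 200 deaths in 6000 births (red)
or_, lo, hi = odds_ratio_ci(600, 6000, 200, 6000)
print(f"odds ratio {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these made-up counts the odds ratio is about 3.2 and the interval stays well above 1, which is the kind of signal the plot shows before hand washing. A year with zero deaths in either ward would need a continuity correction, which this sketch leaves out.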


Snow


Cholera and its mode of transmission


William Farr: 1851

Miasma theory: cholera was the result of breathing polluted air.

‘The amount of organic matter … [and] its distribution will bear … resemblance to the law regulating the mortality from cholera at the various elevations’


John Snow: 1854

Cholera is a waterborne disease.


[Figure: Snow 1854 cases of cholera, mapped around Oxford Street and Regent Street, with the workhouse marked]


[Figure: Snow 1854 cases of cholera, mapped around Oxford Street and Regent Street, with the workhouse and the water pumps marked]
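The pattern on this map can be sketched as a nearest-pump count: assign each case to its closest pump and see whether one pump collects most of them. Everything below is hypothetical: the coordinates, the pump names and the `nearest_pump` helper are invented for illustration, and straight-line distance is a simplifying assumption (distance along the streets would be more realistic).

```python
import math
from collections import Counter

# Hypothetical pump and case coordinates in arbitrary map units; NOT Snow's data
PUMPS = {"pump A": (0.0, 0.0), "pump B": (5.0, 1.0), "pump C": (-3.0, 4.0)}
CASES = [(0.5, 0.2), (-0.4, 0.8), (1.0, -0.6), (0.2, 0.1), (4.6, 1.3), (-2.7, 3.5)]

def nearest_pump(point, pumps):
    """Return the name of the pump with the smallest straight-line distance to point."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))

# Count how many cases fall closest to each pump
counts = Counter(nearest_pump(case, PUMPS) for case in CASES)
print(counts.most_common())
```

In this toy layout four of the six cases lie closest to pump A, which is the kind of clustering that points to a single water source rather than to the air.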


Lancet 1858, obituary column:

DR JOHN SNOW. This well-known physician died at noon on the 16th instant, at his house in Sackville-street, from an attack of apoplexy. His researches on chloroform and other anaesthetics were appreciated by the profession.


In 1854 Filippo Pacini published ‘Microscopical observations and pathological deductions on cholera’, in which he described a bacillus that he called Vibrio and its relation to cholera. His discovery was only formally recognized in 1965.

In 1884 Robert Koch became famous for his identification of the cholera bacillus [among other things] and is the acknowledged discoverer of the cholera organism.

Vibrio cholerae


Lancet 2014 Retraction

The Lancet wishes to correct, after an unduly prolonged period of reflection, an impression that it failed to recognise Dr Snow’s remarkable achievements in the field of epidemiology and [in] deducing the mode of transmission of cholera. … Comments in 1855 such as ‘In riding his hobby very hard, he has fallen down through a gully-hole and has never since been able to get out again’ and ‘Has he any facts to show in proof? No!’ were perhaps … overly negative in tone.


Historical data


Infectious diseases in England and Wales, 1900 to 1990


Deaths at Baragwanath by age, sex and HIV status, 2006 to 2009

[Figure: panels for men and for women, each split into HIV-positive, HIV-negative and undecided]


Mendel


Mendel’s Peas

1822 to 1884


Question

• Peas have tall or short stems but never anything in between.

• We can breed them true, so that tall plants only produce tall plants and short plants only produce short plants.

• What happens when we cross them to get the first filial, or F1, generation?

• What happens when we cross the F1 plants to get the F2 generation?


Experimental design

Observation: Only two kinds of peas: tall or short.

Experiment:

• Parents: true-breeding Tall × true-breeding Short

• F1: Tall

• F2 (F1 × F1): Tall:Short


F2 Data

Character                      Dominants   Recessives   Ratio (dominants/recessives)

Round v. wrinkled seeds             5474         1850   2.96

Yellow v. green seeds               6022         2001   3.01

Purple v. white flowers              705          224   3.15

Smooth v. constricted pods           882          299   2.95

Axial v. terminal flowers            651          207   3.14

Green v. yellow unripe pods          428          152   2.82

Tall v. dwarf stems                  787          277   2.84

Total                              14949         5010   2.98
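One standard way to ask how close these F2 counts are to 3:1 is a Pearson chi-square test with one degree of freedom. The Python sketch below assumes that test (it is not something the slides state Mendel did) and uses the counts from the table; it recomputes each ratio and the chi-square statistic, where values below about 3.84, the 5% critical value for one degree of freedom, are consistent with a 3:1 ratio. The names `F2_DATA` and `chi_square_3_to_1` are illustrative.

```python
# F2 counts from the table: (character, dominants, recessives)
F2_DATA = [
    ("Round v. wrinkled seeds", 5474, 1850),
    ("Yellow v. green seeds", 6022, 2001),
    ("Purple v. white flowers", 705, 224),
    ("Smooth v. constricted pods", 882, 299),
    ("Axial v. terminal flowers", 651, 207),
    ("Green v. yellow unripe pods", 428, 152),
    ("Tall v. dwarf stems", 787, 277),
]

def chi_square_3_to_1(dominants, recessives):
    """Pearson chi-square statistic for observed counts against an expected 3:1 split."""
    total = dominants + recessives
    expected_dominants = 0.75 * total
    expected_recessives = 0.25 * total
    return ((dominants - expected_dominants) ** 2 / expected_dominants
            + (recessives - expected_recessives) ** 2 / expected_recessives)

for character, dominants, recessives in F2_DATA:
    ratio = dominants / recessives
    chi2 = chi_square_3_to_1(dominants, recessives)
    print(f"{character:28s} ratio {ratio:.2f}  chi-square {chi2:.3f}")

# Pool all seven characters, as in the Total row
total_dominants = sum(d for _, d, _ in F2_DATA)
total_recessives = sum(r for _, _, r in F2_DATA)
print(f"{'Total':28s} ratio {total_dominants / total_recessives:.2f}  "
      f"chi-square {chi_square_3_to_1(total_dominants, total_recessives):.3f}")
```

With these counts every row, and the pooled total, gives a statistic well below 3.84, so none of the characters departs noticeably from 3:1.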


Theory!

• The gene has two alleles: T or t

• Breed pure TT and tt lines.

• TT × tt → Tt

• F1: all are tall, so T is dominant.

• Tt × Tt → TT, Tt, tT or tt

• F2: 3 tall plants for each short plant.
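The two crosses in this theory can be enumerated directly. A minimal Python sketch, assuming one gene with two alleles, each parent passing on either of its alleles with equal probability, and T fully dominant; `cross` and `phenotypes` are illustrative helper names. Sorting each allele pair merges Tt and tT, since they are the same genotype.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes, taking one allele from each parent (all pairings equally likely)."""
    # sorted() puts 'T' before 't', so 'tT' and 'Tt' are counted together
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))

def phenotypes(genotype_counts, dominant="T"):
    """Return (tall, short) counts, assuming one dominant allele is enough to make a plant tall."""
    tall = sum(n for genotype, n in genotype_counts.items() if dominant in genotype)
    short = sum(genotype_counts.values()) - tall
    return tall, short

f1 = cross("TT", "tt")  # every offspring is Tt
f2 = cross("Tt", "Tt")  # 1 TT, 2 Tt, 1 tt
print(phenotypes(f1))   # all F1 plants are tall
print(phenotypes(f2))   # 3 tall for each short
```

The 3:1 prediction comes out of the counting alone; comparing it with the observed F2 ratios is where the data come in.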


Mendel did not understand that he had just discovered genetics.

Darwin must have read Mendel’s paper but even he did not understand that Mendel had just given him the mechanism underlying his theory of evolution.

Statistics helps us to define the question; the answer is always in the biology!


Advice to young epidemiologists

Never make a calculation until you know the answer.

Make an estimate before every calculation, try a simple biological argument (R0, generation time, selection, survival, control). Guess the answer to every puzzle. Courage: no one else needs to know what the guess is. Therefore, make it quickly, by instinct. A right guess reinforces this instinct. A wrong guess brings the refreshment of surprise. In either case, life as an epidemiologist, however long, is more fun.

Plagiarised from E. F. Taylor and J. A. Wheeler, Spacetime Physics (1963)
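As one illustration of the quick biological argument this advice has in mind, a back-of-envelope doubling time can be estimated from R0 and the generation time. This sketch assumes every generation has the same fixed length, so cases multiply by R0 once per generation and grow at rate ln(R0) divided by the generation time; the function name `doubling_time` and the parameter values are illustrative, not from the slides.

```python
import math

def doubling_time(r0, generation_time):
    """Epidemic doubling time, assuming a fixed generation time and R0 > 1."""
    if r0 <= 1:
        raise ValueError("cases only grow if R0 > 1")
    # Cases multiply by R0 every generation, so the exponential growth rate is ln(R0) / generation time
    growth_rate = math.log(r0) / generation_time
    return math.log(2) / growth_rate

# Illustrative guess: R0 = 4 and a 6-day generation time give a doubling time of about 3 days
print(doubling_time(4, 6))
```

Guessing the answer first (R0 = 4 means two doublings per generation, so about half a generation per doubling) and then checking it is exactly the habit the advice recommends.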


Thank you for listening


Summary

1. Stay as close to the data as you can

2. Look for interesting patterns

3. Put in as much biology as you can

4. Keep it simple

5. Always remember that the purpose of models is not to fit the data but to sharpen the question


This presentation is made available through a Creative Commons Attribution license. Details of the license and permitted uses are available at

http://creativecommons.org/licenses/by/3.0/

© 2010 International Clinics on Infectious Disease Dynamics and Data

Williams BG. “Introduction to Thinking About Data”. Clinic on the Meaningful Modeling of Epidemiological Data. DOI:10.6084/m9.figshare.5043136.

For further information or modifiable slides please contact [email protected].

See the entire ICI3D Figshare Collection. DOI: 10.6084/m9.figshare.c.3788224.
