intro to statistics part2 arier lee university of auckland

Intro to Statistics Part 2

Arier Lee, University of Auckland

Standard error

• Standard error: the standard deviation of the sampling distribution of a statistic

• The standard deviation of the sample means is called the standard error of the mean; it measures how precisely the population mean is estimated by the sample mean

• The standard error measures the precision of the estimated mean, whereas the standard deviation summarises the variability, or spread, of the observations

• The standard error of the mean is never larger than the standard deviation, because SE = SD/√n

• The larger the sample size, the smaller the standard error
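The relationship between the standard deviation and the standard error can be checked numerically. The Python sketch below is illustrative only: it assumes a normal population, and the function names, example measurements, population values and number of simulated samples are all hypothetical choices. It computes SE = SD/√n from one sample, then draws many samples and takes the standard deviation of their means, which should land close to SD/√n and shrink as n grows.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sd_of_sample_means(pop_mean, pop_sd, n, n_samples=5000, seed=1):
    """Empirical standard error: draw n_samples samples of size n from a
    normal population and return the standard deviation of their means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


# One sample: the SE is smaller than the SD whenever n > 1
sample = [4.9, 5.8, 6.1, 5.2, 5.5, 6.4]  # hypothetical measurements
print("SD:", round(statistics.stdev(sample), 3), "SE:", round(standard_error(sample), 3))

# Simulated sampling distribution: the SD of the means is close to pop_sd / sqrt(n)
for n in (4, 25, 100):
    print(n, round(sd_of_sample_means(0.0, 1.0, n), 3), round(1 / math.sqrt(n), 3))
```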

Confidence intervals

• A 95% confidence interval for a mean is calculated as

(mean - 1.96*SE, mean + 1.96*SE)

• An example: in a sample of 2000 pregnant women, serum cholesterol was measured; the sample mean was 5.62 and SE = 0.15. The 95% confidence interval is

(5.62 - 1.96*0.15, 5.62 + 1.96*0.15) = (5.33, 5.91)
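The calculation above can be reproduced with a short Python sketch. It assumes the normal approximation used on the slide; `ci_for_mean` is a hypothetical helper name, and taking the multiplier from `statistics.NormalDist` simply shows where 1.96 comes from (the 97.5th percentile of the standard normal distribution).

```python
from statistics import NormalDist


def ci_for_mean(mean, se, level=0.95):
    """Normal-approximation confidence interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level=0.95
    return mean - z * se, mean + z * se


low, high = ci_for_mean(5.62, 0.15)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # → 95% CI: (5.33, 5.91)
```

Passing `level=0.99` gives a wider interval from the same data, the usual trade-off between confidence and precision.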

Confidence intervals

• A 95% CI does not mean that there is a 95% chance that the true mean lies between 5.33 and 5.91

• If we repeated the study over and over again, calculating a 95% confidence interval each time, about 95 out of 100 such intervals would include the true mean

• We will never know whether the interval from our own study is one of them, but we can have some confidence that it is

• The CI is a measure of the precision of our estimate

• A wider confidence interval means less precision
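The repeated-study interpretation can be illustrated by simulation. This Python sketch assumes a normal population with a made-up true mean and SD (5.6 and 1.0, not taken from the cholesterol example), repeats a hypothetical study of 100 subjects 2000 times, and counts how often mean ± 1.96*SE contains the true mean. The proportion should come out close to 0.95, typically slightly below it because 1.96 ignores the small-sample t correction. The function names are hypothetical.

```python
import math
import random
import statistics


def ci95(sample):
    """mean +/- 1.96 * SE for one sample."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se


def coverage(true_mean=5.6, true_sd=1.0, n=100, n_studies=2000, seed=42):
    """Repeat a hypothetical study n_studies times and return the proportion
    of 95% CIs that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = ci95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / n_studies


print(coverage())  # close to 0.95; each individual interval either hits or misses
```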

Graphical presentation of the data

• Uses: exploratory data analysis and presentation of results

• Examples: bar charts, line graphs, scatter plots, box plots, Kaplan-Meier plots, etc.

• Graphs can only be as good as the data they display

• No amount of creativity can produce a good graph from dubious data

Bar chart

[Figure: example bar chart from the 2005 maternity report]

Line graph

[Figure: example line graph]

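Box plot

[Figure: annotated box plot. The box runs from Q1 to Q3 with a line at the median; the whiskers reach at most 1.5 x (Q3-Q1) beyond the box, the smallest observation within that range marks the end of the lower whisker, and observations beyond the end of a whisker are plotted individually]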
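The quantities labelled on the box plot can be computed directly. This is a Python sketch, assuming Tukey-style fences at 1.5 x (Q3-Q1) beyond the box. Quartile definitions differ between software packages, and `method="inclusive"` in `statistics.quantiles` is only one of several choices, so other tools may report slightly different Q1 and Q3. The helper name and example data are hypothetical.

```python
import statistics


def box_plot_stats(data):
    """Summary values drawn in a box plot, using 1.5 x IQR fences."""
    # Quartiles; the "inclusive" method is an assumption, not a standard
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # smallest obs within the fence
        "whisker_high": max(inside),  # largest obs within the fence
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

For this data the sketch gives Q1 = 3.25, median 5.5 and Q3 = 7.75, with whiskers ending at 1 and 9; the value 40 lies beyond the upper whisker and would be plotted as a separate point.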

Data to chart ratio

[Figure: mental health score by treatment groups, shown as a good and a bad example]

Inadequate chart type

[Figure: incidence rate ratios by age group (0-14, 15-24, 25-64 and 65+ years) for Māori, Pacific and Asian ethnicity, with Other ethnicity as the reference. Caption: Effect of ethnicity on road traffic injury deaths and hospitalisations, 2000-8, Auckland region, by age group, adjusted for gender and deprivation (using National Minimum Data Set and Mortality Collection data)]

Graphs of risk or rate ratios should be presented with:

• Points with error bars

• A log scale

[Figure: odds ratios presented on a logarithmic scale; outcome: blindness]
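One reason for the log scale: a ratio and its reciprocal (e.g. 2 and 0.5) represent effects of the same size in opposite directions, and only on a log scale do they sit the same distance from 1 (no effect). Confidence intervals for ratios are commonly computed on the log scale and back-transformed, so their error bars look symmetric on a log axis but lopsided on a linear one. The Python sketch below illustrates both points; the ratio of 1.5 and the standard error of 0.2 on the log scale are made-up values, and the helper names are hypothetical.

```python
import math


def log_distance_from_null(ratio):
    """Distance of a ratio from 1 (no effect) on the natural-log scale."""
    return abs(math.log(ratio))


def ratio_ci(ratio, se_log, z=1.96):
    """95% CI for a ratio computed on the log scale and back-transformed.
    se_log is the standard error of log(ratio), assumed known here."""
    log_r = math.log(ratio)
    return math.exp(log_r - z * se_log), math.exp(log_r + z * se_log)


# Doubling and halving are equally far from 1 on a log scale
print(log_distance_from_null(2.0), log_distance_from_null(0.5))

# The interval is asymmetric around 1.5 on a linear scale ...
low, high = ratio_ci(1.5, 0.2)
print(round(low, 2), round(high, 2))
# ... but symmetric around log(1.5) on a log scale
print(math.log(1.5) - math.log(low), math.log(high) - math.log(1.5))
```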

Unnecessary 3D effects

[Figure: 3D bar chart titled "How often do you read to your child", showing the percent (%) answering seldom or never, weekly, or daily, by ethnicity: New Zealand European (N=2909), Māori (N=571), Pacific Peoples (N=483), Asian (N=548), Middle Eastern/Latin American/African (N=91), Other (N=14)]

Inadequate labelling

[Figure: poorly labelled chart with categories Apple, Pear and Banana]

Graphical presentation of the data

• Use the appropriate graph type for the purpose, e.g. a line chart for trends

• Label all axes, tick marks and the title

• Use an appropriate scale

• Keep an adequate data to chart ratio

• Avoid unnecessary complexity, such as:
– Irrelevant decoration
– Too many colours
– 3D effects

• Keep it simple!

Research process

Research question

Primary and secondary endpoints

Study design

Sampling and/or randomisation scheme

Power and sample size calculation

Pre-define analysis methods

Analyse data

Interpret results

Disseminate

Sample size and power of a study

• Sample size is one of the key statistical, economic and ethical issues in the design of medical studies
– Statistical: ensure the study is large enough to detect an effect if it exists
– Economic: ensure the study does not enrol more patients than are needed
– Ethical: it is unethical to involve more people in a trial than are needed

• Larger samples give more precise estimates

• How large?

• The power of a test is the probability of detecting a true difference, i.e. of correctly rejecting the null hypothesis when a real difference exists

• The sample size needed depends on:
– Required power
– Detectable difference
– Variability in the population
– Level of significance (the probability of falsely rejecting the null hypothesis)
– Statistical test being used

• Calculating a meaningful sample size requires prior information, e.g. from a literature search
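The dependence of power on these factors can be explored with a small Python sketch for comparing two means. It assumes a two-sided z-test (normal approximation) with equal group sizes, standing in for "the statistical test being used"; a t-test or a different design would give somewhat different numbers. The function name and the illustrative inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means.

    Normal approximation with equal group sizes; the tiny probability of
    rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    se_diff = sd * sqrt(2 / n_per_group)        # SE of the difference in means
    return std_normal.cdf(diff / se_diff - z_crit)


# How each factor in the list moves the power (illustrative values)
print(power_two_means(diff=10, sd=15, n_per_group=30))
print(power_two_means(diff=10, sd=15, n_per_group=60))              # larger sample
print(power_two_means(diff=5, sd=15, n_per_group=60))               # smaller difference
print(power_two_means(diff=10, sd=30, n_per_group=60))              # more variability
print(power_two_means(diff=10, sd=15, n_per_group=60, alpha=0.01))  # stricter significance level
```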


Sample size and power of a study: an example

• A double-blind randomised controlled study of treatment for chronic hypertension during pregnancy

• Comparing two treatments:
– Standard treatment
– New treatment

Sample size and power of a study: an example

• Based on current evidence, assume:
– Detectable difference: 10 mmHg
– Standard deviation: 15 mmHg
– 90% power
– 5% significance level
– Two-sided test
– 1:1 allocation ratio

• Using PS (power and sample size calculation software): 48 subjects per group

• Allowing for a drop-out rate of, say, 10%, round up to, say, 60 subjects per group
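As a rough check on the PS result, the standard normal-approximation formula for comparing two means with 1:1 allocation can be worked by hand. This is an assumption about the method: PS may use a different (for example t-based) calculation, which can give a slightly larger number. Here δ = 10 mmHg is the detectable difference, σ = 15 mmHg the standard deviation, α = 0.05 (two-sided) the significance level, 1 - β = 0.90 the power, and z denotes standard normal quantiles.

```latex
\begin{aligned}
n &= \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}
   = \frac{2\,(1.960 + 1.282)^2 \times 15^2}{10^2} \approx 47.3
   \;\Rightarrow\; 48 \text{ per group} \\
n_{\text{drop-out}} &= \frac{48}{1 - 0.10} \approx 53.3
   \;\Rightarrow\; 54 \text{ per group}
\end{aligned}
```

The slide then rounds up further, to 60 per group, which leaves some extra margin.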

Sample size and power of a study: chronic hypertension during pregnancy example

• To detect a difference of 10 mmHg

• SD varies from 5 to 30 mmHg
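Because the assumed SD enters the sample size squared, the required number is very sensitive to it. Below is a Python sketch using the normal-approximation formula for two means (an assumption, as PS may calculate differently), with the slide's 10 mmHg difference, 90% power and 5% two-sided significance level; `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(diff, sd, power=0.90, alpha=0.05):
    """Subjects per group for a two-sided comparison of two means with
    1:1 allocation, using the normal-approximation formula."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # 1.28 for 90% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diff ** 2)


# Sensitivity to the assumed SD when the detectable difference is 10 mmHg
for sd in range(5, 35, 5):
    print(f"SD = {sd:2d} mmHg -> {n_per_group(10, sd)} per group")
```

Under these assumptions the loop gives 48 per group at SD = 15 mmHg, but only 6 at SD = 5 mmHg and 190 at SD = 30 mmHg, before any allowance for drop-out.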

Sample size calculation is an evidence-based best guess:

• It relies on assumptions

• It is not a precise number

• It does not guarantee a significant effect at the end of a study


Any Questions?
