TRANSCRIPT
AN INTRODUCTION TO STATISTICS
An introduction to everyday statistics—2
P Driscoll, F Lecky, M Crosby
Objectives
- Describe central tendency and variability
- Summarising datasets containing two variables

In covering these objectives we will deal with the following terms:
- Mean, median and mode
- Percentiles
- Interquartile range
- Standard deviation
- Standard error of the mean

In the first article of this series, we discussed graphical and tabular summaries of single datasets. This is a useful end point in its own right, but often in clinical practice we also wish to compare datasets. Carrying this out by simply visually identifying the differences between two graphs or data columns lacks precision. Often, therefore, the central tendency and variability are also calculated so that more accurate comparisons can be made.
Central tendency and variability
It is usually possible to add, to the tabular or graphical summary, additional information showing where most of the values are and their spread. The former is known as the central tendency and the latter the variability of the distribution. Generally these summary statistics should not be given to more than one extra decimal place over the raw data.
CENTRAL TENDENCY
There are a variety of methods for describing where most of the data are concentrated. The choice depends upon the type of data being analysed (table 1).
Mean
This commonly used term refers to the sum of all the values divided by the number of data points. To demonstrate this consider the following example. Dr Egbert Everard received much praise for his study on paediatric admissions on one day to the A&E Department of Deathstar General (article 1). Suitably encouraged, he reviews the waiting times for the 48 paediatric cases involved in the study (table 2).
Considering cases 1 to 12, the mean waiting time is:

(46 + 45 + 44 + 26 + 52 + 14 + 19 + 18 + 14 + 34 + 18 + 22)/12

which is:

352/12 = 29.3 minutes
Consequently the mean takes into account the exact value of every data point in the distribution. However, the mean value itself may not be possible in practice. An example of this is the often quoted mean of "2.4 children" per family. Another major advantage of using the mean is that combining values from several small groups enables a larger group mean to be estimated. This means Egbert could determine the mean of all 48 cases by repeating this procedure for cases 13 to 24, 25 to 36 and 37 to 48. The four separate means could then be added together and divided by 4 to give the overall mean waiting time.
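As a quick check of both points, here is the calculation in Python using the waiting times from table 2. Note that averaging the subgroup means only gives the overall mean because the four subgroups are equal in size (12 cases each); unequal groups would need a weighted average.

```python
from statistics import mean

# Waiting times (min) for Egbert's 48 cases (table 2), split into four
# equal-sized subgroups: cases 1-12, 13-24, 25-36 and 37-48.
groups = [
    [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22],
    [19, 20, 19, 14, 27, 26, 24, 17, 19, 28, 37, 12],
    [10, 17, 14, 15, 22, 27, 45, 18, 22, 10, 19, 15],
    [9, 31, 18, 7, 19, 22, 14, 11, 17, 26, 37, 41],
]

# Mean of cases 1 to 12: 352/12
print(round(mean(groups[0]), 1))       # 29.3

# Because the subgroups are equal in size, the overall mean is simply
# the mean of the four subgroup means.
subgroup_means = [mean(g) for g in groups]
print(round(mean(subgroup_means), 1))  # 22.9
```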
The problem in using the mean is that it is markedly affected by the distribution of the data and by outliers. The latter are the extreme values of the data distribution but are not true measures of the central tendency. The mean is therefore ideally reserved for normally distributed data with no outliers.
Table 1 Applicability of measures of central tendency

                                          Mean   Median   Mode
Useful with quantitative data              ✓       ✓       ✓
Useful with ordinal data                   ×       ✓       ✓
Useful with nominal data                   ×       ×       ✓
Affected by outliers                       ✓       ×       ×
Affected by lack of normal distribution    ✓       ×       ×
Key point
Central tendency and variability are common methods of summarising ordinal and quantitative data
Table 2 Waiting times for paediatric A&E admissions in one day to Deathstar General

Case  Waiting time (min)   Case  Waiting time (min)   Case  Waiting time (min)   Case  Waiting time (min)
1     46                   13    19                   25    10                   37    9
2     45                   14    20                   26    17                   38    31
3     44                   15    19                   27    14                   39    18
4     26                   16    14                   28    15                   40    7
5     52                   17    27                   29    22                   41    19
6     14                   18    26                   30    27                   42    22
7     19                   19    24                   31    45                   43    14
8     18                   20    17                   32    18                   44    11
9     14                   21    19                   33    22                   45    17
10    34                   22    28                   34    10                   46    26
11    18                   23    37                   35    19                   47    37
12    22                   24    12                   36    15                   48    41
Key point
Mean = Σx/n, where Σx = the sum of all the measurements and n = the number of measurements
J Accid Emerg Med 2000;17:274–281
Accident and Emergency Department, Hope Hospital, Salford M6 8HD

Correspondence to: Mr Driscoll, Consultant in A&E Medicine ([email protected])
J Accid Emerg Med: first published as 10.1136/emj.17.4.274 on 1 July 2000. Downloaded from http://emj.bmj.com/ on January 18, 2022 by guest. Protected by copyright.
Median
The mean cannot be used for ordinal data because it relies on the assumption that the gaps between the categories are the same. It is possible, however, to rank the data and determine the midpoint. When there is an even number of cases the mean of the two middle values is taken as the median. In Egbert's study, the median waiting time for cases 1 to 12 would therefore be calculated by:

Listing the data points in rank order:
14 14 18 18 19 22 26 34 44 45 46 52

Determining the midpoint, which lies half way between the 6th and 7th data points:
(22 + 26)/2 = 24 minutes
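Python's statistics.median applies exactly this rule, ranking the data and averaging the two middle values when the count is even:

```python
from statistics import median

# Waiting times (min) for cases 1 to 12 (table 2).
cases_1_to_12 = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

# With an even number of observations the median is the mean of the
# two middle ranked values: (22 + 26)/2.
print(median(cases_1_to_12))  # 24.0
```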
In contrast with the mean, the median is less affected by outliers, being responsive only to the number of scores above and below it rather than their actual values. It is therefore better at describing the central tendency in non-normally distributed data and when there are outliers. However, in a bimodal distribution, neither median nor mean will provide a good summary of the central tendency.
A common example of the use of medians is seen in the longevity of grafts, prostheses and patients. The median is used because loss to follow up, and the unknown end point of the trial, prevent a mean being used to determine the average survival duration.
Mode
This equals the commonest obtained value on a data scale. The mode is also used in nominal data to describe the most prevalent characteristic, particularly when there is a bimodal distribution. Consequently in Egbert's study, there would be two mode values for cases 1 to 12—that is, 14 and 18 minutes.
The mode can be demonstrated graphically. Another advantage is that it is relatively immune to outliers. However, the peaks can be obscured by random fluctuations of the data when the sample size is small. Furthermore, unlike the mean and median, it can change markedly with small alterations in the sample values.
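The bimodal example above can be checked with statistics.multimode, which returns every value tied for the highest frequency and so reports both modes here:

```python
from statistics import multimode

# Waiting times (min) for cases 1 to 12 (table 2).
cases_1_to_12 = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

# 14 and 18 minutes each occur twice, so the sample is bimodal.
print(multimode(cases_1_to_12))  # [14, 18]
```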
VARIABILITY
This is a measure of the distribution of the data. It can be described visually by plotting the frequency distribution curve. However, it is best to quantify this spread using methods dependent upon the type of data you are dealing with (table 3). Though there are measures of spread for nominal data, they are rarely used other than to provide a list of the categories present.1 We will therefore concentrate on methods used to describe the variability of ordinal and quantitative data.
Range
This is the interval between the maximum and minimum values in the dataset. It can be profoundly affected by the presence of outliers and does not give any information on where the majority of the data lie. As a consequence other measures of variability tend to be used.
Percentiles
It is possible to divide a distribution into 100 parts such that each is equal to 1 per cent of the area under the curve. These parts are known as percentiles. This process is used to identify data points that separate the whole distribution into areas representing particular percentages. For example, as described previously, the median divides the distribution into two equal areas. It therefore represents the 50th centile. Similarly the 10th centile would be the data point dividing the distribution into two areas, one representing 10% of the area and the other 90% (fig 2).
To determine the value of a particular percentile we use the same process required to work out the median. To demonstrate this,
Key points
- The mean reflects all the data points
- Small group means can be combined to estimate an overall mean
- The mean is commonly used for quantitative data
Key point
Median = the value of the (n + 1)/2 th observation, where n is the number of observations in ascending order
Key points
- Medians only reflect the middle data points
- You cannot combine medians to get an overall value
- The median is commonly used for ordinal data, and for quantitative data when the data are skewed or contain outliers
Key points
- The mode only reflects the most common data points
- Modes cannot be combined to get an overall value
- The mode is commonly used for nominal data
Key points
- The mean, median and mode will all be the same in normally distributed data but differ when the distribution is skewed (fig 1)
- When dealing with skewed data, the mean is positioned towards the long tail, the mode towards the short one and the median is between the two. The median value divides the curve into two equally sized areas (fig 1(B) and (C))
- The mode is a good descriptive statistic but of little other use
- The mean and median provide only one value for a given dataset. In contrast there can be more than one mode
consider how Egbert would determine the 10th centile in his waiting time data.
Firstly, he needs to sort the data into rank order (table 4). He then locates the position of the 10th percentile by:
10/100 × (n + 1), where n is the number of data points in the distribution.

The position of the 10th percentile is therefore 0.1 × 49 = 4.9
This means the 10th centile lies nine tenths of the way from the 4th (that is, 10 minutes) to the 5th (that is, 11 minutes) data point. Consequently the 10th percentile is 10.9 minutes. In other words, 10 per cent of all the waiting times in Egbert's study have a value of 10.9 minutes or less. An equally correct way of expressing this result would be to say that 90% of all the waiting times in Egbert's study have a value of 10.9 minutes or greater (fig 2).
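The position rule can be wrapped in a small helper. The function name is our own, and it assumes the requested position falls strictly inside the ranked list (true for the percentiles used here):

```python
def percentile(data, p):
    """Value of the p-th percentile using the p/100 x (n + 1) position rule."""
    ranked = sorted(data)
    position = p / 100 * (len(ranked) + 1)  # e.g. 10/100 x 49 = 4.9
    lower = int(position)                   # the 4th ranked value...
    fraction = position - lower             # ...plus 0.9 of the gap to the 5th
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

# Egbert's 48 waiting times (tables 2 and 4).
times = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22,
         19, 20, 19, 14, 27, 26, 24, 17, 19, 28, 37, 12,
         10, 17, 14, 15, 22, 27, 45, 18, 22, 10, 19, 15,
         9, 31, 18, 7, 19, 22, 14, 11, 17, 26, 37, 41]

print(round(percentile(times, 10), 1))  # 10.9
```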
The two most commonly used percentiles are the 25th and 75th. As they mark out the lower and upper quarters of the distribution they are known as the lower and upper quartile values. The data points between them represent the interquartile range.
Interquartile (IQ) range
As the lower and upper ends of the distribution are eliminated, the interquartile range shows the location of most of the data and is less affected by outliers. It represents a good method of summarising the variability when dealing with ordinal data or distributions that are not "normal".
The IQ range can also be used to provide a good graphical representation of the data. As the lower and upper quartile values represent the 25th and 75th percentiles, and the median the 50th, these can be used to draw a "box and whisker" plot (fig 3). The box represents the central 50% of the data and the line in the box marks the median. The whiskers extend to the maximum and minimum values. In the presence of outliers the whiskers are restricted to a length one and a half times the interquartile range, with outliers beyond this being shown separately (fig 3). Box and whisker plots allow the reader to quickly judge the centre, spread and symmetry of the distribution. In so doing they enable two distributions to be compared (see later). Nevertheless, from a mathematical point of view, the IQ range does not provide the versatility required for detailed comparisons of normally distributed quantitative data. For this we need the variance and standard deviation.
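As a sketch, Python's statistics.quantiles with method="exclusive" interpolates at rank k(n + 1)/4, the same (n + 1) position rule used for percentiles in the text, so it reproduces the quartiles and IQ range for Egbert's data:

```python
from statistics import quantiles

# Egbert's 48 waiting times (tables 2 and 4).
times = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22,
         19, 20, 19, 14, 27, 26, 24, 17, 19, 28, 37, 12,
         10, 17, 14, 15, 22, 27, 45, 18, 22, 10, 19, 15,
         9, 31, 18, 7, 19, 22, 14, 11, 17, 26, 37, 41]

# Lower quartile, median and upper quartile via the (n + 1) rule.
q1, q2, q3 = quantiles(times, n=4, method="exclusive")
print(q1, q2, q3, q3 - q1)  # 15.0 19.0 27.0 12.0
```

The lower and upper quartiles of 15 and 27 minutes are the values a box and whisker plot of these data would use for the box edges.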
Variance
An obvious way of looking at variability is to measure the difference of each value from the sample mean—that is, the calculated mean of
Figure 1 Location of mean, median and mode with different distributions. (A) Normal distribution. (B) Right (positive) skewed distribution. (C) Left (negative) skewed distribution.
Table 3 Methods of quantifying distribution

                                    Range   IQ range   SD   SEM
Descriptive of sample variability     ✓        ✓       ✓     ✓
Describes quantitative data           ✓        ✓       ✓     ✓
Describes ordinal data                ✓        ✓       ×     ×
Key point
Percentiles are values that identify a particular percentage of the whole distribution
Key points
- The value of the Xth percentile = the X(n + 1)/100 th observation, where n is the number of observations in ascending order
- Percentiles can only be calculated for quantitative and numeric ordinal data
Key points
- The IQ range is a good method of describing data but not the most efficient way of describing data variability when comparing groups
- The IQ range is commonly used to describe the variability of ordinal data and of quantitative data that are skewed or have outliers
the study group. However, in simply adding up these differences you would always arrive at zero, because the sums of the differences bigger and smaller than the mean are equal but opposite. However, by squaring all the differences before adding them together, a positive result is always achieved. The sum of these squares about the mean is usually known as the "sum of the squares". For example, in a study the following differences were found between the mean and each value:

−3, −2, −2, −1, 0, 0, 0, 1, 1, 1, 2, 3

Therefore the sum of the squares is:

9 + 4 + 4 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 4 + 9 = 34

The size of the sum of the squares will vary according to the number of observations as well as the spread of the data. To get an average, the sum of the squares is divided by the number of observations. The result is called the "variance".
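The worked example can be checked in a few lines. Note this follows the article's definition and divides by n; Python's built-in statistics.variance divides by n − 1 instead, so we compute the average directly:

```python
# Differences between each value and the mean, from the worked example.
diffs = [-3, -2, -2, -1, 0, 0, 0, 1, 1, 1, 2, 3]

# Squaring makes every term positive before summing.
sum_of_squares = sum(d ** 2 for d in diffs)
print(sum_of_squares)        # 34

# Variance as defined in the text: sum of squares / n.
variance = sum_of_squares / len(diffs)
print(round(variance, 2))    # 2.83
```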
Standard deviation
Variance has limited use in descriptive statistics because it does not have the same units as the original observations. To overcome this the square root of the variance can be calculated. This is known as the standard deviation (SD) and it has the same units as the variables measured originally. If the SD is small then the data points are close to one another, whereas a large SD indicates there is a lot of variation between individual values. In fact, for positive values, the mean is no longer an adequate measure of the central tendency when it is smaller than the SD.
When the data distribution is normal, or nearly so, it can be shown that 68.3% of the data lie within +/− 1 SD of the mean; 95.4% within +/− 2 SD; and 99.7% within +/− 3 SD (fig 4).
The "normal range" corresponds to the interval that includes 95% of the data. This is equal to 1.96 SD on either side of the mean, but is often approximated to the mean +/− 2 SD. If this criterion is used to diagnose health and disease, it follows that 5% of all results from normal individuals will be classified as pathological (fig 4).
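As a sketch, the population SD (the square root of the variance as defined above, dividing by n) and the resulting normal range for cases 1 to 12 can be computed with the statistics module:

```python
from statistics import mean, pstdev

# Waiting times (min) for cases 1 to 12 (table 2).
times = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

m = mean(times)
sd = pstdev(times)   # population SD: sqrt(sum of squares / n), as in the text
print(round(sd, 1))  # 13.5

# "Normal range": mean +/- 1.96 SD covers roughly the central 95% of
# normally distributed data.
lower, upper = m - 1.96 * sd, m + 1.96 * sd
print(round(lower, 1), round(upper, 1))
```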
STANDARD ERROR OF THE MEAN
It is useful at this stage to consider the standard error of the mean (SEM). Usually in clinical practice we do not know what the true overall mean is. All we have are the results from our study and the standard deviation. Fortunately it can be shown mathematically that, provided our study group size is over 10 and randomly selected, the overall mean lies within a normal distribution whose centre is our study group's mean (fig 5). The standard deviation of this distribution is called the SEM. This can be
Figure 2 Frequency distribution showing the 10th centile.
Table 4 Waiting times for paediatric A&E admissions in one day to Deathstar General—sorted by rank order

Rank order  Waiting time (min)   Rank order  Waiting time (min)   Rank order  Waiting time (min)   Rank order  Waiting time (min)
1           7                    13          15                   25          19                   37          27
2           9                    14          17                   26          19                   38          28
3           10                   15          17                   27          20                   39          31
4           10                   16          17                   28          22                   40          34
5           11                   17          18                   29          22                   41          37
6           12                   18          18                   30          22                   42          37
7           14                   19          18                   31          22                   43          41
8           14                   20          18                   32          24                   44          44
9           14                   21          19                   33          26                   45          45
10          14                   22          19                   34          26                   46          45
11          14                   23          19                   35          26                   47          46
12          15                   24          19                   36          27                   48          52
Figure 3 A box and whisker plot, labelled with the minimum value, 25th percentile, median (50th percentile), 75th percentile, maximum value and an outlier.
Key point
Variance = Σ(mean − x)²/n, where Σ(mean − x)² = the sum of the squares of all the differences from the mean
Key points
- The standard deviation is a measure of the average distance individual values are from the mean
- The bigger the variability, the bigger the standard deviation
- The standard deviation is used to measure the variability of quantitative data that are not markedly skewed and do not have outliers
calculated by dividing the sample's standard deviation by the square root of the number of observations in the sample.

As the distribution is normal, the same principles apply as found with the SD above. Consequently there is a 95% chance that the overall mean lies within +/− 1.96 SEM of the sample mean.
To demonstrate these points, consider what Watenpaugh and Gaffney are saying when they write, "One hour after infusion, the capillary reabsorption of fluids was −236 +/− SEM 102 ml/h".2

This means we can be 95% sure that the overall mean reabsorption rate lies somewhere between:

−236 +/− (1.96 × 102) ml/h = −435.9 ml/h to −36.1 ml/h
The size of the SEM tends to decrease as the study group size gets larger or the variation within the sample becomes less marked. Consequently samples greater than 30 with little variation will have a small SEM. As a result, the overall mean can be reliably pinpointed to lie within a narrow range.
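The Watenpaugh and Gaffney interval above can be reproduced directly, along with an illustration of SEM = SD/√n (the SD of 13.5 from 48 observations here is an illustrative pairing, not a figure from their paper):

```python
from math import sqrt

# Watenpaugh and Gaffney's reported figures: mean -236 ml/h, SEM 102 ml/h.
mean_reabsorption = -236
sem = 102

# 95% interval for the overall mean: sample mean +/- 1.96 SEM.
lower = mean_reabsorption - 1.96 * sem
upper = mean_reabsorption + 1.96 * sem
print(round(lower, 1), round(upper, 1))  # -435.9 -36.1

# SEM itself comes from the sample: SEM = SD / sqrt(n).
print(round(13.5 / sqrt(48), 2))  # 1.95
```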
So far in articles 1 and 2, we have concentrated on summarising single variable datasets. In practice this is often a preliminary step to combining and comparing different collections of data.
Summarising data from two variables
Again the method used for this depends upon the type of data being dealt with (table 5). In subsequent articles we will discuss how each of these various types of comparisons is carried out. For now, we need to consider the important points that have to be borne in mind when presenting the summary of two variables.
CROSS TABULATION
When comparing nominal datasets, a frequency table can be drawn showing the number and proportion (percentage) of cases with a combination of variables. If there are large numbers, and marked variations, it is often better to present the data as a percentage of either the row or column totals. This makes interpretation by the reader easier. However, when using percentages, you need to indicate the total number from which they were derived. If one or more of the datasets are ordinal or quantitative, then the groups must be arranged in the correct order (fig 6).
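A cross tabulation of this kind can be sketched with a simple tally. The records below are invented (category, waiting-time band) pairs for illustration, not Egbert's data; in a real study each pair would come from one patient record:

```python
from collections import Counter

# Hypothetical (triage category, waiting-time band) pairs.
records = [
    ("Yellow", "0-30"), ("Green", "0-30"), ("Green", "31-60"),
    ("Green", "31-60"), ("Blue", "90-120"), ("Not recorded", "61-90"),
]

table = Counter(records)          # tally of each category/band combination
bands = ["0-30", "31-60", "61-90", "90-120"]
n = len(records)

# Row counts, with percentages quoted against the stated denominator,
# as the text recommends.
for category in ["Yellow", "Green", "Blue", "Not recorded"]:
    row = [table[(category, b)] for b in bands]
    print(category, row, f"total {sum(row)} ({100 * sum(row) / n:.1f}% of {n})")
```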
Figure 4 A normal distribution curve divided up into different multiples of the standard deviation (SD), from −3 SD to +3 SD either side of the mean.
Figure 5 Distribution of the mean of samples of various sizes (n = 10, 25 and 100), compared with the original population.
Table 5 Methods used to compare different types of data

                         Nominal                       Ordinal                       Quantitative—discrete         Quantitative—continuous
Nominal                  Cross tabulation; bar chart   Cross tabulation; bar chart   Cross tabulation; graphical   Graphical
Ordinal                                                Cross tabulation; bar chart   Cross tabulation; graphical   Graphical
Quantitative—discrete                                                                Cross tabulation; graphical   Graphical
Quantitative—continuous                                                                                            Scatter plot with regression and/or correlation
Key point
SEM = SD/√n
Key points
- SEM is used when describing the overall mean, not the study group's mean
- SEM is a measure of the variance of the overall mean as estimated from the study group. It is not a measure of the distribution of the data.
BAR CHART
You can use bar charts when comparing nominal and ordinal datasets. In the latter case the groups must be arranged in the most appropriate order (fig 7). This comparison has to be done visually and is therefore prone to error.
GRAPHICAL REPRESENTATION
The dot plot provides a useful way of allowing ungrouped, discrete data to be visually compared. As with the bar chart, in the dot plot the values of the variable are listed on the vertical axis and the categories on the horizontal.
Figure 8 shows the values of an objective measure of muscle bulk in the hand according to a simple clinical classification of muscle wasting for 61 elderly subjects. With the dot plot each individual value is shown (fig 8(A)). It is therefore easy to see that there is a large amount of scatter and that the distributions are slightly skewed. As described previously, box and whisker plots can be used to summarise the data and demonstrate the difference between the groups (fig 8(B)). This shows that those with most wasting tend to have smaller cross sectional areas (CSA). Further investigation reveals that when the CSA values are log transformed the distribution is more symmetrical. We can therefore use means and SD to summarise the transformed data (fig 8(C)). These are shown against a logarithmic scale in their original units as this makes them easier to interpret.
Graphical representation of central tendency and variance allows data to be summarised in a readily accessible format (figs 8(B) and 9(A)). This is commonly done by marking the central tendency and providing a measure of the variability. However, if it is important to represent each reading, a dot plot can be used. This enables the corresponding points to be joined when considering a series of readings from the same subject (fig 9(B)).
SCATTER PLOT, CORRELATION AND REGRESSION
A scatter plot enables the relation between quantitative-continuous variables to be demonstrated (fig 10). By convention the variable that is "controlling" the other is known as the independent variable and its value is recorded on the horizontal axis. These independent
Figure 6 Cross tabulation using an example of quantitative data (waiting times) and ordinal data (triage category) from Egbert's study on paediatric A&E attendances.
                  Waiting time (min)
Triage category   0–30        31–60       61–90       90–120    Row total (%)
Yellow            1           5                                 6 (12.5)
Green             11          12          10                    33 (68.8)
Blue                                                  1         1 (2.1)
Not recorded                  3           5                     8 (16.7)
Column total (%)  12 (25.0)   20 (41.7)   15 (31.3)   1 (2.1)   48 (100)

Number of missing observations = 1
Figure 7 Frequency of male paediatric admissions in each triage category (yellow, green, blue, not recorded).
Figure 8 Three methods of showing the central tendency and variance using the same data. (A) Scatter plot. (B) Box and whisker plot. (C) Scatter plot using a logarithmic scale. Each panel plots cross sectional area, CSA (mm²), against degree of muscle wasting (none, mild, moderate).
variables are those that are varied by the experimenter, either directly (for example, treatment given) or indirectly through the selection of subjects (for example, age). The dependent variables are the outcome of this experimental manipulation and they are plotted on the vertical axis.
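Fitting the regression line through such a scatter plot amounts to least squares. The CSA and grip values below are made-up illustrative pairs, not data from the study, with CSA as the independent variable and grip as the dependent one:

```python
# Hypothetical (CSA, grip strength) pairs for illustration only.
csa = [100, 200, 300, 400, 500]    # independent variable (horizontal axis)
grip = [30, 55, 85, 110, 140]      # dependent variable (vertical axis)

n = len(csa)
mean_x = sum(csa) / n
mean_y = sum(grip) / n

# Least-squares slope: covariance of x and y over the variance of x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(csa, grip))
sxx = sum((x - mean_x) ** 2 for x in csa)
slope = sxy / sxx
intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)

print(slope, intercept)  # 0.275 1.5
```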
Summary
When confronted with a vast amount of information, first consider the type of data present. Then summarise appropriately (see article 1), noting the central tendency and variability. Once this is complete it is possible to combine the information from two or more variables, taking into account what types of data they are.
Quiz
(1) If data are skewed to the right, which is greater: the median or the mean?
(2) Describe two main differences between the mean and the median.
(3) Consider Egbert's waiting time data listed in table 4.
(a) What is the mean, median and mode?
(b) What is the 75th centile?
(c) What is the interquartile range?
(4) Figure 11 shows a box and whisker plot from Egbert's study.
(a) What do the boxes represent?
(b) What does the line in the boxes represent?
(c) How would you describe the box and whisker plot of triage category (green) and booking time?
(5) One for you to do on your own. Mullner et al carried out an experiment to look at myocardial function following successful resuscitation after a cardiac arrest.3 Part of their results are shown in table 6:
(a) Identify the different types of data
(b) Identify the central tendency and variability for each variable
(c) Present a tabular summary of the patients' ages
(d) What graphical summary would you use to compare the frequency of VF with the site of the myocardial infarction?
(e) What graphical summary would you use to show the comparison between the site of the myocardial infarction and the duration of the arrest?
(f) What method would you use to show the comparison between the systolic blood pressure and duration of arrest?
Answers
(1) The mean
(2) The mean reflects all the data points whereas the median uses only the middle ones. Secondly, unlike medians, small group means can be combined to give an overall mean.
(3) (a) Mean, median and mode:
Mean = 1100/48 = 22.9 minutes
Median = the (48 + 1)/2 = 24.5th data point, that is half way between the 24th and 25th data points = 19 minutes
Mode = 19 minutes
(b) The 75th centile lies at the 75 × 49/100 = 36.75th data point. This means it is 0.75 of the way from the 36th to the 37th data point. However, using table 4, both data points are 27 minutes. The 75th centile is therefore 27 minutes.
(c) The interquartile range is the range between the 25th and the 75th centiles.
Figure 9 Two methods of presenting the same data (EAT score in 1989 and 1991). (A) provides a measure of the central tendency and variance using a box and whisker plot. However, if the change in each subject is needed, individual values should be plotted and joined (B).
Figure 10 Grip strength and cross sectional area (CSA, mm²) of the index finger-thumb web space.
Figure 11 Box and whisker plot of booking time (h) by triage category (yellow, n = 6; green, n = 33; blue, n = 1; not recorded, n = 8).
Using the process described in 3(b), the 25th centile is 15 minutes. The interquartile range is therefore 15–27 minutes.
(4) (a) The middle 50% of the data
(b) The median
(c) A negatively skewed distribution (long tail on the side with the lowest numbers)
The authors would like to thank Sally Hollis and Alan Gibbs, who helped with an earlier version of this article, and Jim Wardrope, Paul Downs and Iram Butt for their invaluable suggestions.
1 Bowers D. Measures of spread of nominal variables. In: Statistics from scratch. Chichester: John Wiley, 1996:118–23.
2 Watenpaugh D, Gaffney F. Measurement of net whole-body transcapillary fluid transport and effective vascular compliance in humans. J Trauma 1998;45:1062–8.
3 Mullner M, Domanovitis H, Sterz F, et al. Measurement of myocardial contractility following successful resuscitation: quantitated left ventricular systolic function utilising non-invasive wall stress analysis. Resuscitation 1998;39:51–9.
Further reading
Altman D, Bland J. Quartiles, quintiles, centiles and other quantiles. BMJ 1994;309:996.
Bowers D. Measures of average. In: Statistics from scratch. Chichester: John Wiley, 1996:83–113.
Coogan D. Statistics in clinical practice. London: BMJ Publishing, 1995.
Gaddis G, Gaddis M. Introduction to biostatistics: part 2, descriptive statistics. Ann Emerg Med 1990;19:309–15.
Glaser A. Descriptive statistics. In: High-yield biostatistics. Philadelphia: Williams and Wilkins, 1995:1–18.

Funding: none.
Conflicts of interest: none.
Table 6

Patient  Age (y)  ECG       Systolic BP (mm Hg)  Duration of arrest (min)  Epinephrine (mg)  MI location
1        44       VF        98                   15                        2                 Anterior
2        58       Asystole  112                  2                         0                 None
3        57       VF        100                  12                        0                 None
4        62       VF        90                   27                        4                 Inferior
5        52       VF        110                  25                        10                Inferior
6        65       VF        117                  48                        6                 Anterior
7        42       VF        113                  13                        0                 Inferior
8        46       VF        105                  9                         0                 None
9        59       VF        180                  15                        1                 Inferior
10       56       VF        97                   47                        8                 Anterior
11       73       VF        114                  18                        1                 None
12       60       VF        165                  25                        2                 Subendocardial
13       80       PEA       115                  29                        2                 None
14       62       Asystole  85                   26                        7                 None
15       58       VF        104                  15                        6                 Inferior
16       50       VF        101                  20                        4                 None
17       72       VF        100                  32                        3                 Inferior
18       46       VF        120                  23                        2                 Subendocardial
19       72       VF        138                  22                        3                 Inferior
20       50       VF        102                  42                        3                 None
MI = myocardial infarction; VF = ventricular fibrillation; PEA = pulseless electrical activity.
AN INTRODUCTION TO STATISTICS
An introduction to statistical inference—3
P Driscoll, F Lecky, M Crosby
Objectives
- Discuss the principles of statistical inference
- Quantifying the probability of a particular outcome
- Discuss clinical versus statistical significance

In covering these objectives we will introduce the following terms:
- Population and sample
- Parameter and statistic
- Null hypothesis and alternative hypothesis
- Type I and II errors

The previous two articles discussed summarising data so that useful comparisons can be made. Another common problem encountered is estimating a value in a larger group based upon information collected from a small number of subjects. To see how statistics can be used to achieve this, it is helpful to begin by reviewing the meaning of five commonly used terms:
- Population and sample
- Parameter and statistic
- Element

The word "population" describes a large group that includes every possible case. In contrast, a "sample" is a smaller group of subjects who are part of the population. Therefore the population of UK emergency departments would include every emergency department in the UK, whereas those in the north west would represent a sample.
A value measured in a population is known as a "parameter". Consequently the trolley waiting time in UK emergency departments would be a parameter. The term "statistic" is used to denote the same variable when it is measured in a sample. Finally, each separate observation in either a population or sample is called an "element", and it is often labelled with the letter X. The number of elements in a population is given the letter N and in a sample, n.
It is often not possible to record all the elements of a population. For example, a study investigating the peak flow in asthmatic patients attending UK emergency departments cannot review every patient. However, it is feasible to record the peak flow in a sample of asthmatic emergency department attendances. From this statistic an estimation of the population's peak flow can be inferred. The name given to manipulating data in this way is therefore inferential statistics. It can also be used to make estimations about a sample based upon information from a population.
In carrying out inferential statistics it is important to ensure that samples are representative of the whole population and have been randomly selected. If this is not the case, bias will be introduced and a perverse answer could result. For example, inferential statistics could be used to make a national generalisation following a survey of the waiting times in 20 emergency departments. However, problems would arise if the sample did not represent the population: if the investigation looked only at district general hospital emergency departments in London, then it is unlikely to be an accurate reflection of all the emergency departments in the UK.
It is also important that each subject has an equal chance of being included in the sample. Consequently, if the trolley waits for elderly patients were being studied, all such times need to be recorded, not just those measured when there is an apparent delay. A possible way of achieving this goal is random sampling. This is a method of selecting subjects such that each member of the population has an equal and independent chance of being picked. A variety of techniques can be used, including flicking a coin, drawing numbers from a hat and reading from random number tables. However, by themselves, these techniques may not be adequate because populations can be made up of different types of groups of various sizes. This heterogeneity could have a bearing on the study. For example, the stage of a disease, and the age or sex of a patient, may change the response to a particular drug. As these are not necessarily evenly distributed in a population it is important they are adequately covered in the sampling process. This is achieved by stratified random sampling, in which the population is initially divided into homogeneous groups from which random samples are taken. In this way representative samples of the whole population can be obtained.
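The two sampling schemes just described can be sketched in a few lines of Python; the department names, strata and sampling fraction below are invented purely for illustration:

```python
import random

random.seed(1)  # fixed seed so the example is reproducible

# Simple random sampling: every element has an equal,
# independent chance of selection.
population = [f"dept_{i}" for i in range(1, 101)]  # 100 hypothetical departments
sample = random.sample(population, 20)             # pick 20 without replacement

# Stratified random sampling: divide the population into
# homogeneous groups first, then sample randomly within each.
strata = {
    "teaching":  [f"teach_{i}" for i in range(1, 31)],   # 30 units
    "district":  [f"dist_{i}" for i in range(1, 61)],    # 60 units
    "minor_inj": [f"minor_{i}" for i in range(1, 11)],   # 10 units
}
# Take 10% of each stratum so every group is represented
# in proportion to its size.
stratified_sample = []
for name, members in strata.items():
    k = max(1, round(0.10 * len(members)))
    stratified_sample.extend(random.sample(members, k))

print(len(sample))             # 20
print(len(stratified_sample))  # 3 + 6 + 1 = 10
```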
Allowing for uncertainty
Measurements on people vary because we are not all the same. Therefore the peak flow in 20
Key point
A population contains all the elements from X1 to XN and a sample has n of the N elements.
J Accid Emerg Med 2000;17:357–363 357
Accident and Emergency Department, Hope Hospital, Salford M6 8HD
P Driscoll
F Lecky
M Crosby
Correspondence to: Mr Driscoll, Consultant in Accident and Emergency Medicine ([email protected])
Accepted 22 June 2000
www.jnlaem.com
randomly selected adults from your waiting room would vary over a range of values. This would be the case even if all the subjects were perfectly healthy. We could reduce this range by being very selective about whom we measured, for example being male, 1.75 metres tall, good looking and from Yorkshire! However, even if you managed to find 20 such subjects, the peak flows would still vary.
A useful way of considering this variation is to plot it as a frequency distribution. When the values from a reasonably homogeneous group are shown in this way the curve takes on the shape of a "normal distribution".1 This is because many of the recordings are clustered around the mean value, with a symmetrical fall off in the frequency of recordings as you move away from the centre.
Inferential statistics enables us to take account of these variations when estimations about a parameter or statistic are being made. It does this by quantifying the probabilities of the possible outcomes.
Probability
In view of the variations discussed previously, we cannot predict with absolute certainty the outcome of an event before it happens. This uncertainty is often referred to as "a matter of chance". It is, however, possible to quantify this uncertainty by calculating the probability of an event occurring.
Probability (p) is invariably expressed as a decimal value between 0 and 1, where zero means that an outcome will never happen and 1 means it will always occur. Therefore, if 30% of patients survive after a cardiac arrest in hospital, the probability of survival would be written as 0.3. As a probability cannot be larger than 1, it follows that the probability of an event not occurring is 1−p. Therefore the probability of not surviving a cardiac arrest is 1−0.3 = 0.7.
Probabilities are often used to help guide management. For example, after a head injury, a patient with no skull fracture and no neurological signs has a 0.00017 probability of developing an operable intracranial haematoma. This is such a low number that we tend to discharge these patients under the care of a sensible adult.
When using probabilities in clinical decision making, one tends to err on the side of caution so that no cases are missed. This is seen with the Ottawa knee rule (box 1). A level is chosen where the probability of missing a fracture is effectively zero. Consequently radiographs are reserved for those patients with particular presenting features.
COMBINING PROBABILITIES
Though a single finding, or test, can help in clinical decision making, in practice we often rely on several results before making a diagnosis. This process entails combining the probabilities of each of the possible outcomes.
The chance of one of several mutually exclusive outcomes occurring is equal to the sum of their individual probabilities. For example, if the probability of a patient in your waiting room being blood group O is 0.46, the chance of either of two unrelated patients being blood group O is 0.46 + 0.46 = 0.92. This is known as the addition rule of probability and it can be written as:

Probability of outcome A or B = probability of outcome A + probability of outcome B

(Strictly, the two patients' blood groups are not mutually exclusive outcomes, as both patients could be group O; the modified equation given later for such combinations gives 0.46 + 0.46 − 0.21 = 0.71.)
To calculate the probability of a specific combination of independent outcomes occurring (for example, the probability of outcome A and B), the separate outcome probabilities need to be multiplied together. Therefore, the probability of both patients being blood group
Key points
- Population This is a complete group (that is, having every eligible person (or item) with a particular characteristic). Depending on the study, the actual size of a population varies from modest (for example, all minor injury units in a particular region) to huge (for example, all emergency treatment centres in Europe).
- Sample This is a subset of the population that has been collected for a study. As with the population, the size of a sample can vary.
- Parameter This is the value of a variable in a population.
- Statistic This is the value of a variable in a sample.
- Element This is a single observation.
- Statistical inference This enables statements to be made about a sample based upon a population's parameter. It also allows the converse to occur, but in this case it is dependent upon random, representative samples being taken.
Key point
Probability is the proportion of cases in a study that have a particular result.
Key points
- Probability values are expressed as a decimal from 0 to 1
- The probability of an event occurring cannot be negative
- The probability of an event not occurring is 1−p
Box 1 Ottawa knee rule2
Order radiography of the knee if any of the following are present:
- Patient older than 55 years
- Tenderness at the head of the fibula
- Isolated tenderness of the patella
- Inability to flex to 90 degrees
- Inability to transfer weight for four steps both immediately after injury and in the A&E department
O is 0.46 × 0.46 = 0.21. This is the multiplication rule of probability and it can be written as:

Probability of outcome A and B = probability of outcome A × probability of outcome B
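Both rules can be expressed as one-line functions; this sketch simply restates the blood group arithmetic above:

```python
def p_either_exclusive(p_a, p_b):
    """Addition rule: P(A or B) for mutually exclusive outcomes."""
    return p_a + p_b

def p_both_independent(p_a, p_b):
    """Multiplication rule: P(A and B) for independent outcomes."""
    return p_a * p_b

p_group_o = 0.46  # probability that one patient is blood group O

print(round(p_either_exclusive(p_group_o, p_group_o), 2))  # 0.92
print(round(p_both_independent(p_group_o, p_group_o), 2))  # 0.21
```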
However, the method used to calculate the chance of a particular combination varies with the independence of the outcomes. Independence in this context means that one particular result does not make another outcome more or less likely. In the example given above this obviously applies: a particular patient's blood group will not change the chances of an unrelated patient being of a certain blood type.
It also follows that if the outcomes are not independent then the multiplication rule will not apply. This can be used to detect factors that are related.3 To demonstrate this, assume the probability of losing a finger in your community is 0.01 and the probability of working at "Happy Crusher", the local steel works, is 0.1. If these were independent of one another, the probability of working at Happy Crusher and losing a finger should be:

0.1 × 0.01 = 0.001 (that is, one chance in a 1000)

If you find that in practice the figure is 0.01 you would suspect there is a connection and that the factors are not independent.
In clinical practice we are often dealing with outcomes that are not mutually exclusive.4 Consequently you usually need to take into account the probability of a combination occurring. This can be calculated by modifying the equation above to:

Probability of outcome A or B = probability of outcome A + probability of outcome B − probability of outcome A and B
The following problem helps to demonstrate these points. Let us say the probability of a person having an emergency admission to hospital at some stage in their life is 0.6. They also have a 0.3 probability of being asked to complete a telephone questionnaire in the same time span. If these events were completely independent of one another, the probability of having an:

Emergency admission and a telephone questionnaire = 0.6 × 0.3 = 0.18

Consequently the probability of having either an:

Emergency admission or a telephone questionnaire = 0.6 + 0.3 − 0.18 = 0.72
To illustrate what these figures mean it is helpful to use a 2×2 table after converting the probabilities into actual numbers. This is done by assuming we are dealing with 100 people (table 1).
From table 1 you can see that 18 people will have both an emergency admission and a telephone questionnaire at some time in their life. Seventy-two will have one or both. This number is the total of those having a telephone questionnaire only (12), plus those having an emergency admission only (42), plus those having both an emergency admission and a telephone questionnaire (18).
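The admission and questionnaire arithmetic, and its conversion into the counts of table 1, can be checked with a short sketch:

```python
p_admission = 0.6   # emergency admission at some stage in life
p_survey    = 0.3   # asked to complete a telephone questionnaire

p_both   = p_admission * p_survey           # multiplication rule (independent)
p_either = p_admission + p_survey - p_both  # outcomes not mutually exclusive

print(round(p_both, 2))    # 0.18
print(round(p_either, 2))  # 0.72

# Converting the probabilities into counts per 100 people
# reproduces the cells of table 1.
n = 100
both           = round(p_both * n)                        # 18
admission_only = round((p_admission - p_both) * n)        # 42
survey_only    = round((p_survey - p_both) * n)           # 12
neither        = n - both - admission_only - survey_only  # 28
print(both, admission_only, survey_only, neither)
```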
There is another method of calculating probabilities when dealing with data that have only two possible outcomes. Examples of such binomial data include live or die, boy or girl, success or failure. Consequently the outcomes are mutually exclusive. The probability of a specific combination of these outcomes can be determined by use of binomial probability tables.5 These list the chances of obtaining each of the possible outcomes.
As binomial distributions deal specifically with combinations of independent, mutually exclusive events, they are often not applicable to emergency medicine. In contrast, they are used in genetic counselling, where some inherited disorders, such as Tay-Sachs disease, follow a binomial distribution. Koosis provides a comprehensive account for those who wish to learn more about this topic.5
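Rather than looking the values up in tables, binomial probabilities can be computed directly. As an illustration only, this sketch assumes two carrier parents of a recessive disorder, so that each child independently has a 0.25 chance of being affected:

```python
from math import comb

def binomial_p(k, n, p):
    """Probability of exactly k 'successes' in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance that exactly k of 3 children are affected when each child
# has an independent 0.25 probability of being affected.
p_affected = 0.25
for k in range(4):
    print(k, round(binomial_p(k, 3, p_affected), 3))
# 0 0.422
# 1 0.422
# 2 0.141
# 3 0.016
```

The four probabilities sum to one, as they cover every possible outcome.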
Table 1

                        Telephone questionnaire
                        Yes    No    Total
Emergency admission
  Yes                   18     42    60
  No                    12     28    40
Total                   30     70    100
Figure 1 Weight of staff in the emergency department of Deathstar General. [Histogram of weight (kg) in 2 kg bands from 72–73.9 to 90–91.9, with frequency on the left axis and proportion on the right.]
An introduction to statistical inference—3
DISPLAYING PROBABILITIES
In article 1 we discussed how a frequency distribution could be used to show graphically the number of cases at each particular value of a variable. This is demonstrated in the following example. The specialist registrar at Deathstar General, Dr Egbert Everard, is concerned about the health of the 40 male staff in the emergency department. He therefore decides to weigh them and plot the results as a frequency distribution (fig 1). Probabilities can be demonstrated in a similar way. To do this, Egbert divides the number of cases in each weight category by the total number of cases in the whole study (that is, 40). This gives him the proportion of cases at each value. These values can then be joined up to produce a distribution curve (fig 1).
It is possible to use these distribution curves to calculate the probability of having a value equal to, or greater than, a particular number. For example, the proportion of staff with a weight greater than, or equal to, 80 kg is represented by the area under the curve to the right of the 80 kg mark (fig 2). Probability distributions have a further useful property in that the area under the whole curve is equal to one. This is because it represents the sum of all the possible probabilities. Consequently the proportion of staff with a weight less than 80 kg is represented by the area under the curve to the left of the 80 kg mark. This is equal to 1 − the shaded area.
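The relation between frequencies, proportions and areas can be shown numerically. The bin counts below are invented (the published figure's counts are not reproduced here); only the method matters:

```python
# Hypothetical frequencies for the 2 kg weight bands of figure 1
# (40 invented values, for illustration only).
bins = {  # lower band edge (kg) -> frequency
    72: 1, 74: 2, 76: 4, 78: 6, 80: 8, 82: 7,
    84: 5, 86: 4, 88: 2, 90: 1,
}
n = sum(bins.values())  # 40 staff in total

# Dividing each frequency by the total converts it into a
# proportion, which estimates a probability.
proportions = {edge: f / n for edge, f in bins.items()}

# P(weight >= 80 kg) is the area under the curve to the right of 80:
p_80_or_more = sum(p for edge, p in proportions.items() if edge >= 80)
print(round(p_80_or_more, 3))               # 0.675

# The whole curve sums to one, so P(weight < 80) is the complement.
print(round(sum(proportions.values()), 3))  # 1.0
print(round(1 - p_80_or_more, 3))           # 0.325
```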
Statistical inference in medical studies commonly uses probabilities in this way to test the null hypothesis.
Testing the null hypothesis
Consider what you would do if asked to make recommendations for your emergency department on a new drug for asthma care following a successful trial. Firstly, you would need to be sure the patients were representative and randomly chosen. Secondly, any difference in effect attributable to the new treatment would need to be judged in the light of the differences between patients simply attributable to chance variation.
We have seen that the probabilities of various outcomes can be quantified using statistical inference. However, it is not practical to test all of the infinite number of possible differences between the population and sample. Consequently only the possibility of there being no difference between the population and sample is tested. It is then feasible to determine the probability that a difference equal to, or greater than, that found in the study could be attributable to normal variation. This is known as testing the null hypothesis.
The probability of obtaining the observed result, or a more extreme one, when the null hypothesis is correct is called the p value, a frequently used term in medical journals. For example, in a study comparing the rehabilitation time after ankle sprains with new and standard treatment, it was found that the mean difference was four days (p = 0.01). Consequently the chance of a difference this big, or bigger, occurring when the null hypothesis is correct is 1 in 100. This means it is more likely that there is a real difference between the two treatments. This is called the alternative hypothesis.
Nowadays the p value is calculated by computer, but the statistical tests used to work it out depend upon the data in question and the type of study. The choice of test is therefore important so that meaningful results are obtained.
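One intuitive way of attaching a p value to a difference, shown here purely as an illustration (it is not the test used in the studies cited), is a permutation test: if the null hypothesis holds, the group labels are interchangeable, so shuffling them shows how often chance alone produces a difference at least as large as the one observed. The rehabilitation times below are invented:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Invented rehabilitation times (days) for two hypothetical groups.
new_treatment = [8, 9, 7, 10, 8, 9, 7, 8]
standard      = [12, 11, 13, 10, 12, 14, 11, 12]
n = len(new_treatment)
observed = sum(standard) / n - sum(new_treatment) / n  # 3.625 days

# Under the null hypothesis the labels are interchangeable: shuffle
# them many times and count how often chance alone produces a
# difference at least this large (a one sided permutation test).
pooled = new_treatment + standard
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[n:]) / n - sum(pooled[:n]) / n
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(observed)         # 3.625
print(p_value < 0.05)   # True: reject the null hypothesis
```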
Later in this series the tests commonly used in emergency medicine will be described so that you will be able to choose the correct one. At this point, however, it is useful to test our understanding of the role of the null hypothesis and p values by considering the results of a recent publication. Sunde et al compared the time from turning on the monitor to starting chest compression in different types of cardiac arrest.6 In cases of asystole, the median time delay was 29 seconds. This was significantly shorter than the time found in patients with electromechanical dissociation (EMD) (109 seconds, p<0.001). What does this mean?
The null hypothesis in this study is that the time delays before cardiopulmonary resuscitation in patients with asystole and EMD are the same. However, the p values indicate that the
Figure 2 Distribution curve of the proportion of staff with a weight greater than or equal to 80 kg. [The area under the curve to the right of the 80 kg mark is shaded; axes show weight (kg), frequency and proportion.]
Key point
You can express the data in a frequency distribution as a distribution of probability.
Key point
The null hypothesis states that the difference between the groups being tested is attributable to chance variation.
Key point
The actual p value should be provided to two decimal places.
Key point
The p value is derived from the raw data using statistical calculations and tables appropriate to the test carried out.
probability of a difference of 80 seconds being attributable to chance is less than one in a thousand. It is therefore more likely that the alternative hypothesis is correct and that there is a difference between these two groups.
STATISTICAL SIGNIFICANCE
Rejecting the null hypothesis means that a "significant difference" exists between the populations studied that cannot be explained by chance alone.
A 2×2 table can be constructed for the four possible outcomes of testing the null hypothesis (NH) (table 2).
TYPE I ERROR
These mistakes occur when statistical tests indicate that the null hypothesis is unlikely (that is, the p value is low) but in actual fact there is no difference between the study groups. By convention, "statistical significance" is accepted if the chance of making such an error is less than 0.05. When these arbitrary levels are given for a study they are often referred to as α. Consequently when α = 0.05 it is considered tolerable to make a type I mistake 1 in 20 times. However, the level considered significant is determined by you, the investigator. It may be that a lower value of α is required when testing the use of an expensive or potentially toxic treatment. In this way the chances of falsely rejecting the null hypothesis can be kept small. In these cases you may wish to use a value of 0.01 (that is, a 1 in 100 chance of falsely rejecting the null hypothesis) rather than the usual 0.05.
In considering the arbitrary level demarcating type I errors, it is also important to be aware that the value of p is markedly affected by both the sample size and the magnitude of any difference (that is, the point estimate). This is demonstrated in table 3. In all cases the p value is 0.05, but the difference and sample size vary. For very large samples the difference only has to be small to produce a statistically significant result. The converse applies when the sample is small. As will be discussed later in this series, the p value is also affected by the standard deviation of the distribution.
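The pattern in table 3 can be reproduced with a simple normal approximation. The common standard deviation of 7.2 units below is an assumption, chosen so that each row comes out at p = 0.05:

```python
from math import sqrt, erfc

def two_sided_p(diff, sd, n):
    """Two sided p value for a difference `diff` between the means of
    two groups of size n each, assuming a known common standard
    deviation `sd` (a simple normal approximation)."""
    se = sd * sqrt(2 / n)     # standard error of the difference
    z = diff / se
    return erfc(z / sqrt(2))  # = 2 * (1 - Phi(z)), the two sided tail

sd = 7.2  # assumed constant standard deviation (illustrative)
for n, diff in [(4, 10.0), (25, 4.0), (400, 1.0), (2500, 0.4), (10_000, 0.2)]:
    print(n, diff, round(two_sided_p(diff, sd, n), 2))  # p = 0.05 every time
```

As the sample size grows, an ever smaller difference yields the same p value.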
TYPE II ERROR
These represent mistakes in falsely accepting the null hypothesis and are denoted by β. If β is large there is a high chance of making a type II error. As this is the opposite of what we want, the complement of the term is often used instead. This is known as "power" and is equal to 1−β. Consequently a test with a high power has a low chance of making a type II error. Conventionally, a study is required to have a power of 0.8 to be acceptable. In other words, the study should have an 80% chance of detecting that the null hypothesis did not apply.
Four factors affect the probability of making a type II error (box 2).
When α increases you are less likely to accept the null hypothesis. Consequently the chances of making a type II mistake fall. A balance therefore has to be struck so that the chances of both type I and type II errors are kept as small as possible.
We have already discussed that inferential statistics are used to take account of variations in statistics when a parameter is being estimated. If the variations are large then possible values for the parameter will also cover a wide range. In such circumstances the chances of rejecting the null hypothesis are reduced. Conversely, factors that decrease variability will increase the power of the study. Consequently, as increasing the sample size leads to a fall in variability, it will also reduce the chances of making a type II error.
Table 2

                      Reality
Statistical test      True difference           No difference
NH rejected           Rejection correct         Rejection incorrect (type I error)
NH accepted           Acceptance incorrect      Acceptance correct
                      (type II error)
Table 3 The effect of sample size and difference on the p value when comparing two groups (assuming a constant standard deviation)*

Sample size of each group    Difference (in units)    p Value
4                            10.0                     0.05
25                           4.0                      0.05
400                          1.0                      0.05
2500                         0.4                      0.05
10 000                       0.2                      0.05

*Adapted from a theoretical study by Norman et al, where they compared the difference between the IQ in the normal population and those reading their statistics book.
Key points
- It is possible to reduce the chances of making a type I error by using a lower α level.
- The p value is affected by the size of the difference, the number of cases in the sample and the standard deviation.
- In a large study, you are very likely to get a "statistically significant" result.
- In a small study it is very hard to get a "statistically significant" result. Consequently a p value greater than 0.05 in this situation proves nothing.
Box 2 Factors affecting the power of a study
- Size of α
- Variability of the sample
- Size of the sample
- Point estimate
Statistical methods used to test the null hypothesis are termed "tests of significance".
The size of the difference between the study groups (for example, the control and the experimental groups) will directly affect power. An increase leads to greater power, as there is less chance of falsely accepting the null hypothesis.
To help understand these principles of null hypothesis testing, consider a follow up study carried out by Egbert. He was particularly concerned about how overweight the male personnel were in the emergency department. He therefore set up a study with the null hypothesis being that the mean weight of fit, healthy men and departmental men was the same. Having weighed all 40 of them, he found the mean weight was 87 kg. Figure 3 shows the normal probability distributions of the two populations. Population 1 all had the characteristic of being fit and healthy, whereas population 2 were unfit couch potatoes. Egbert's finding of a mean of 87 kg could therefore lie in either distribution. The chance of a weight this big, or heavier, being part of population 1 is shown by the darker shaded area. This represents α, that is, the probability of making a type I error and falsely rejecting the null hypothesis. Conversely, the chance of being part of population 2 and having a weight this big, or lighter, is shown by the lighter shaded area. This represents β, that is, the probability of making a type II error and falsely accepting the null hypothesis.
The four factors mentioned above are used in an equation to calculate the power of a study. However, if you are setting up a study you can set the power at a particular level (often 80%). Therefore, if the sizes of the other variables are known (that is, α, variability and point difference), the same equation can be used to determine how many subjects are required in the study.
CLINICAL VERSUS STATISTICAL SIGNIFICANCE
Consider a study comparing a new antihypertensive medication (A) with a standard one (B). The result of the trial shows that the blood pressures in patients receiving A were significantly lower than those on B (p = 0.0001). This means the probability of the difference found, or bigger, being attributable to chance is 1 in 10 000. In a well run study we would have no problem in accepting this as a statistically significant result. However, this does not mean it is clinically useful. In the same study the point estimate between the groups was 5 mm Hg. From a clinical point of view we may consider this to be too small to offset the difficulties, side effects and expense associated with drug A.
As stated before, accepting a p value of 0.05 to reject the null hypothesis may not be appropriate in some clinical settings. Clinical considerations also apply when accepting the null hypothesis if the p value is greater than 0.05. You need to take into account the type of study carried out, the number of subjects in each group and the weight of other published data. A further point that should be remembered is that in clinical practice we usually need to know the presence and size of any difference. p Values only inform you of the likelihood of a difference being attributable to chance (that is, normal variation).
In the majority of cases these limitations of the p value can be overcome by using confidence intervals. These will be discussed further in the following article.
Summary
Statistical inference is used to make comments about a population based upon data from a sample. In a similar manner it can be applied to a population to make an estimate about a sample. It is commonly seen in medical publications when the null hypothesis is being tested. This calculates the probability (p value) of a type I error, that is, that a particular finding is attributable to chance. It is also important to be aware of the chances of a type II error, that is, accepting the null hypothesis when it does not apply. Sample size, point estimate and variability are common factors that will affect the

Figure 3 Distribution curves of the weights of healthy and unhealthy men. [Two overlapping normal curves: healthy population mean 75 kg, unhealthy population mean 95 kg; the observed mean of 87 kg is marked, with the α error and β error shown as shaded areas.]
Key point
Increasing the sample size is one of the commonest ways of increasing the power of a study.
Key points
- The p value answers the question, "Is there a statistically significant difference between the study groups?"
- Clinical issues need to be considered along with the size of the p value
- The size of any difference needs to be known
- Statistical significance does not necessarily mean clinical significance
chances of making these two types of error. Interpreting results therefore needs to take these factors into account, as well as the clinical relevance of the findings. Statistical significance does not necessarily mean clinical significance.
Quiz
(1) Complete the following phrase: A parameter is to a –––– as a –––– is to a sample.
(2) You are told that the probability of a female patient having a fractured femur is 0.3 and green eyes is 0.4. Assuming these are independent of one another, what is the probability that she has:
(a) both a fractured femur and green eyes;
(b) either a fractured femur, or green eyes, or both?
(3) Name three factors that will affect the chances of making a type I and II error.
(4) A new thrombolytic, "Dyno-coronary", has been developed. Though very expensive and toxic, it is thought to produce coronary patency quicker than standard treatment. If you were to design a study to assess this, what α level would you choose: 0.05 or 0.01?
(5) One for you to do on your own. Formulate the null hypothesis for the study by Ireland et al that investigated whether supine oblique views provide better imaging of the cervicothoracic junction than a swimmer's view.7 Consider the conclusion drawn with respect to statistical and clinical relevance.
Answers
(1) A parameter is to a statistic as a population is to a sample.
(2) (a) 0.3 × 0.4 = 0.12; (b) 0.3 + 0.4 − 0.12 = 0.58
(3) Point estimate, sample variability, sample size
(4) You would be aiming to minimise the chances of making a type I error; consequently an α level of 0.01 would be preferable.
The authors would like to thank John Heyworth, Sally Hollis, Jim Wardrope and Iram Butt for their invaluable suggestions.
Funding: none.
Conflicts of interest: none.
1 Driscoll P, Lecky F, Crosby M. An introduction to everyday statistics—1. J Accid Emerg Med 2000;17:205–11.
2 Stiell I, Greenberg G, Wells G, et al. Prospective validation of a decision rule for the use of radiography in acute knee injuries. JAMA 1996;275:611–15.
3 Altman D. Theoretical distributions. In: Practical statistics for medical research. London: Chapman and Hall, 1997:48–73.
4 Coggon D. Probability. In: Statistics in clinical practice. London: BMJ Publishing Group, 1995:52–61.
5 Koosis D. Populations and samples. In: Statistics—a self teaching guide. New York: John Wiley, 1995:40–76.
6 Sunde K, Eftestol T, Askenberg C, et al. Quality assessment of defibrillation and advanced life support using data from the medical control module of the defibrillator. Resuscitation 1999;41:237–47.
7 Ireland A, Britton I, Forrester A. Do supine oblique views provide better imaging of the cervicothoracic junction than swimmer's views? J Accid Emerg Med 1998;15:151–4.
Further reading
Gaddis M, Gaddis G. Introduction to biostatistics: Part 1, basic concepts. Ann Emerg Med 1990;19:143–6.
Glaser A. Inferential statistics. In: High-yield biostatistics. Philadelphia: Williams and Wilkins, 1995:9–30.
Norman G, Streiner D. Inferential statistics. In: PDQ statistics. 2nd ed. London: Mosby, 1997:17–36.
Correction
We regret that two errors occurred in the statistics paper published in July (2000;17:274–81). On page 277, 2nd column, ± signs were omitted from 2SD in the penultimate paragraph. On page 278, 1st column, the equation was incorrect; it should have read: SEM = SD/√n