
AN INTRODUCTION TO STATISTICS

An introduction to everyday statistics—2

P Driscoll, F Lecky, M Crosby

Objectives
- Describe central tendency and variability
- Summarise datasets containing two variables

In covering these objectives we will deal with the following terms:
- Mean, median and mode
- Percentiles
- Interquartile range
- Standard deviation
- Standard error of the mean

In the first article of this series, we discussed graphical and tabular summaries of single datasets. This is a useful end point in its own right, but in clinical practice we often also wish to compare datasets. Carrying this out simply by visually identifying the differences between two graphs or data columns lacks precision. The central tendency and variability are therefore often calculated so that more accurate comparisons can be made.

Central tendency and variability
It is usually possible to add to the tabular or graphical summary additional information showing where most of the values are and their spread. The former is known as the central tendency and the latter the variability of the distribution. Generally these summary statistics should not be given to more than one extra decimal place over the raw data.

CENTRAL TENDENCY

There are a variety of methods for describing where most of the data are concentrated. The choice depends upon the type of data being analysed (table 1).

Mean
This commonly used term refers to the sum of all the values divided by the number of data points. To demonstrate this, consider the following example. Dr Egbert Everard received much praise for his study on paediatric admissions on one day to the A&E Department of Deathstar General (article 1). Suitably encouraged, he reviews the waiting times for the 48 paediatric cases involved in the study (table 2).

Considering cases 1 to 12, the mean waiting time is:

(46 + 45 + 44 + 26 + 52 + 14 + 19 + 18 + 14 + 34 + 18 + 22)/12

which is:

352/12 = 29.3 minutes

Consequently the mean takes into account the exact value of every data point in the distribution. However, the mean value itself may not be possible in practice. An example of this is the often quoted mean of “2.4 children” per family. Another major advantage of using the mean is that combining values from several small groups enables a larger group mean to be estimated. This means Egbert could determine the mean of all 48 cases by repeating this procedure for cases 13 to 24, 25 to 36 and 37 to 48. Because the four groups are the same size, the four separate means could then be added together and divided by 4 to give the overall mean waiting time. (With groups of unequal size, each group mean would need to be weighted by its group size.)
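This arithmetic can be sketched in Python (the waiting times are copied from table 2; the mean of the four group means equals the overall mean only because each group contains 12 cases):

```python
# Waiting times (min) for the 48 cases in Egbert's study (table 2)
times = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22,
         19, 20, 19, 14, 27, 26, 24, 17, 19, 28, 37, 12,
         10, 17, 14, 15, 22, 27, 45, 18, 22, 10, 19, 15,
         9, 31, 18, 7, 19, 22, 14, 11, 17, 26, 37, 41]

# Mean for cases 1 to 12: sum of the values divided by the number of points
group1 = times[:12]
print(sum(group1) / len(group1))          # 352/12 = 29.33...

# The mean of the four equal-sized group means equals the overall mean
group_means = [sum(times[i:i + 12]) / 12 for i in range(0, 48, 12)]
print(sum(group_means) / 4)               # 22.91...
print(sum(times) / 48)                    # 22.91..., the same value
```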

The problem in using the mean is that it is markedly affected by the distribution of the data and by outliers. The latter are extreme values in the data distribution that are not true measures of the central tendency. The mean is therefore ideally reserved for normally distributed data with no outliers.

Table 1 Applicability of measures of central tendency

                                          Mean  Median  Mode
Useful with quantitative data              ✔      ✔      ✔
Useful with ordinal data                   ✗      ✔      ✔
Useful with nominal data                   ✗      ✗      ✔
Affected by outliers                       ✔      ✗      ✗
Affected by lack of normal distribution    ✔      ✗      ✗

Key point
Central tendency and variability are common methods of summarising ordinal and quantitative data

Table 2 Waiting times for paediatric A&E admissions in one day to Deathstar General

Case  Waiting time (min)  Case  Waiting time (min)  Case  Waiting time (min)  Case  Waiting time (min)
  1          46            13          19            25          10            37           9
  2          45            14          20            26          17            38          31
  3          44            15          19            27          14            39          18
  4          26            16          14            28          15            40           7
  5          52            17          27            29          22            41          19
  6          14            18          26            30          27            42          22
  7          19            19          24            31          45            43          14
  8          18            20          17            32          18            44          11
  9          14            21          19            33          22            45          17
 10          34            22          28            34          10            46          26
 11          18            23          37            35          19            47          37
 12          22            24          12            36          15            48          41

Key point
Mean = Σx/n, where Σx = the sum of all the measurements and n = the number of measurements

J Accid Emerg Med 2000;17:274–281

Accident and Emergency Department, Hope Hospital, Salford M6 8HD

Correspondence to: Mr Driscoll, Consultant in A&E Medicine (pdriscoll@hope.srht.nwest.nhs.uk)

J Accid Emerg Med: first published as 10.1136/emj.17.4.274 on 1 July 2000. Downloaded from http://emj.bmj.com/ on January 18, 2022 by guest. Protected by copyright.



Median
The mean cannot be used for ordinal data because it relies on the assumption that the gaps between the categories are the same. It is possible however to rank the data and determine the midpoint. When there is an even number of cases, the mean of the two middle values is taken as the median. In Egbert’s study, the median waiting time for cases 1 to 12 would therefore be calculated by:

Listing the data points in rank order:

14 14 18 18 19 22 26 34 44 45 46 52

Determining the midpoint, which lies half way between the 6th and 7th data points:

(22 + 26)/2 = 24 minutes
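A minimal sketch of this ranking procedure:

```python
# Median waiting time for cases 1 to 12 (table 2)
group1 = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

ranked = sorted(group1)    # 14 14 18 18 19 22 26 34 44 45 46 52
n = len(ranked)

# With an even number of cases the median is the mean of the two middle values
if n % 2 == 0:
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2
else:
    median = ranked[n // 2]

print(median)  # (22 + 26)/2 = 24.0
```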

In contrast with the mean, the median is less affected by outliers, being responsive only to the number of scores above and below it rather than to their actual values. It is therefore better at describing the central tendency in non-normally distributed data and when there are outliers. However, in a bimodal distribution, neither the median nor the mean will provide a good summary of the central tendency.

A common example of the use of medians is seen in the longevity of grafts, prostheses and patients. It is used because the loss to follow up, and the unknown end point of the trial, prevent a mean being used to determine the average survival duration.

Mode
This equals the commonest obtained value on a data scale. The mode is also used with nominal data to describe the most prevalent characteristic, particularly when there is a bimodal distribution. Consequently in Egbert’s study there would be two mode values for cases 1 to 12, that is, 14 and 18 minutes.

The mode can be demonstrated graphically. Another advantage is that it is relatively immune to outliers. However, the peaks can be obscured by random fluctuations of the data when the sample size is small. Furthermore, unlike the mean and median, it can change markedly with small alterations in the sample values.
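Finding the mode(s) can be sketched with Python's Counter, which also handles the bimodal case above:

```python
from collections import Counter

# Waiting times for cases 1 to 12 (table 2)
group1 = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

counts = Counter(group1)
top = max(counts.values())                 # highest frequency observed (2)
modes = sorted(v for v, c in counts.items() if c == top)
print(modes)  # [14, 18] -- bimodal: both values occur twice
```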

VARIABILITY

This is a measure of the distribution of the data. It can be described visually by plotting the frequency distribution curve. However, it is best to quantify this spread using methods dependent upon the type of data you are dealing with (table 3). Though there are measures of spread for nominal data, they are rarely used other than to provide a list of the categories present.1 We will therefore concentrate on methods used to describe the variability of ordinal and quantitative data.

Range
This is the interval between the maximum and minimum values in the dataset. It can be profoundly affected by the presence of outliers and does not give any information on where the majority of the data lie. As a consequence other measures of variability tend to be used.

Percentiles
It is possible to divide a distribution into 100 parts such that each is equal to 1 per cent of the area under the curve. These parts are known as percentiles. This process is used to identify data points that separate the whole distribution into areas representing particular percentages. For example, as described previously, the median divides the distribution into two equal areas. It therefore represents the 50th centile. Similarly the 10th centile would be the data point dividing the distribution into two areas, one representing 10% of the area and the other 90% (fig 2).

To determine the value of a particular percentile we use the same process required to work out the median. To demonstrate this

Key points
- The mean reflects all the data points
- Small group means can be combined to estimate an overall mean
- The mean is commonly used for quantitative data

Key point
Median = value of the (n + 1)/2 th observation, where n is the number of observations in ascending order

Key points
- Medians only reflect the middle data points
- You cannot combine medians to get an overall value
- The median is commonly used for ordinal data, and for quantitative data when the data are skewed or contain outliers

Key points
- The mode only reflects the most common data points
- Modes cannot be combined to get an overall value
- The mode is commonly used for nominal data

Key points
- The mean, median and mode will all be the same in normally distributed data but differ when the distribution is skewed (fig 1)
- When dealing with skewed data, the mean is positioned towards the long tail, the mode towards the short one and the median between the two. The median value divides the curve into two equally sized areas (fig 1(B) and (C))
- The mode is a good descriptive statistic but of little other use
- The mean and median provide only one value for a given dataset. In contrast there can be more than one mode


consider how Egbert would determine the 10th centile in his waiting time data.

Firstly, he needs to sort the data into rank order (table 4). He then locates the position of the 10th percentile by:

10/100 × (n + 1), where n is the number of data points in the distribution.

The position of the 10th percentile is therefore 0.1 × 49 = 4.9.

This means the 10th centile lies nine tenths of the way from the 4th data point (that is, 10 minutes) to the 5th (that is, 11 minutes). Consequently the 10th percentile is 10.9 minutes. In other words, 10 per cent of all the waiting times in Egbert’s study have a value of 10.9 minutes or less. An equally correct way of expressing this result would be to say that 90% of all the waiting times in Egbert’s study have a value of 10.9 minutes or greater (fig 2).
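The (n + 1) position method can be sketched as a small helper; the interpolation step mirrors the "nine tenths of the way" calculation above:

```python
# Percentile by the (n + 1) position method used in the text
def percentile(ranked, p):
    """Value of the p-th percentile of data already sorted in rank order."""
    n = len(ranked)
    pos = p / 100 * (n + 1)      # e.g. 10/100 * 49 = 4.9 for the 10th centile
    lower = int(pos)             # 4 -> the 4th ranked value
    frac = pos - lower           # 0.9 of the way towards the 5th
    # Interpolate between the lower-th and (lower + 1)-th ranked values
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

# The 48 waiting times sorted into rank order (table 4)
ranked = [7, 9, 10, 10, 11, 12, 14, 14, 14, 14, 14, 15,
          15, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19,
          19, 19, 20, 22, 22, 22, 22, 24, 26, 26, 26, 27,
          27, 28, 31, 34, 37, 37, 41, 44, 45, 45, 46, 52]

print(percentile(ranked, 10))   # 10.9
```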

The two most commonly used percentiles are the 25th and 75th. As they mark out the lower and upper quarters of the distribution, they are known as the lower and upper quartile values. The data points between them represent the interquartile range.

Interquartile (IQ) range
As the lower and upper ends of the distribution are eliminated, the interquartile range shows the location of most of the data and is less affected by outliers. It represents a good method of summarising the variability when dealing with ordinal data or distributions that are not “normal”.

The IQ range can also be used to provide a good graphical representation of the data. As the lower and upper quartile values represent the 25th and 75th percentiles, and the median the 50th, these can be used to draw a “box and whisker” plot (fig 3). The box represents the central 50% of the data and the line in the box marks the median. The whiskers extend to the maximum and minimum values. When there are outliers, the whiskers are restricted to a length one and a half times the interquartile range, with outliers beyond this shown separately (fig 3). Box and whisker plots allow the reader to quickly judge the centre, spread and symmetry of the distribution. In so doing they enable two distributions to be compared (see later). Nevertheless, from a mathematical point of view, the IQ range does not provide the versatility required for detailed comparisons of normally distributed quantitative data. For this we need the variance and standard deviation.

Variance
An obvious way of looking at variability is to measure the difference of each value from the sample mean, that is, the calculated mean of

Figure 1 Location of mean, median and mode with different distributions. (A) Normal distribution: mean, median and mode coincide. (B) Right (positive) skewed distribution: mode, then median, then mean. (C) Left (negative) skewed distribution: mean, then median, then mode. Each panel plots frequency against the variable.

Table 3 Methods of quantifying distribution

                                   Range  IQ range  SD  SEM
Descriptive of sample variability    ✔       ✔      ✔    ✔
Describes quantitative data          ✔       ✔      ✔    ✔
Describes ordinal data               ✔       ✔      ✗    ✗

Key point
Percentiles are values that identify a particular percentage of the whole distribution

Key points
- Value of the Xth percentile = the X(n + 1)/100 th observation, where n is the number of observations in ascending order
- Percentiles can only be calculated for quantitative and numeric ordinal data

Key points
- The IQ range is a good method of describing data but not the most efficient way of describing data variability when comparing groups
- The IQ range is commonly used to describe the variability of ordinal data and of quantitative data that are skewed or have outliers


the study group. However, in simply adding up these differences you would always arrive at zero, because the sums of the differences bigger and smaller than the mean are equal but opposite. However, by squaring all the differences before adding them together, a positive result is always achieved. The sum of these squares about the mean is usually known as the “sum of the squares”. For example, in a study the following differences were found between the mean and each value:

−3, −2, −2, −1, 0, 0, 0, 1, 1, 1, 2, 3

Therefore the sum of the squares is:

9 + 4 + 4 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 4 + 9 = 34

The size of the sum of the squares will vary according to the number of observations as well as the spread of the data. To get an average, the sum of the squares is divided by the number of observations. The result is called the “variance”.

Standard deviation
Variance has limited use in descriptive statistics because it does not have the same units as the original observations. To overcome this, the square root of the variance can be calculated. This is known as the standard deviation (SD) and it has the same units as the variables measured originally. If the SD is small then the data points are close to one another, whereas a large SD indicates there is a lot of variation between individual values. In fact, for positive values the mean is no longer an adequate measure of the central tendency when it is smaller than the SD.

When the data distribution is normal, or nearly so, it can be shown that 68.3% of the data lie within +/− 1 SD of the mean; 95.4% within +/− 2 SD; and 99.7% within +/− 3 SD (fig 4).

The “normal range” corresponds to the interval that includes 95% of the data. This is equal to 1.96 SD on either side of the mean but is often approximated to the mean +/− 2 SD. If this criterion is used to diagnose health and disease, it follows that 5% of all results from normal individuals will be classified as pathological (fig 4).
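The sum of squares, variance and SD can be sketched from the differences quoted above. Note that, like the article, this sketch averages by n; many texts divide by (n − 1) when estimating a sample variance:

```python
import math

# Differences from the mean quoted in the text
diffs = [-3, -2, -2, -1, 0, 0, 0, 1, 1, 1, 2, 3]

sum_of_squares = sum(d ** 2 for d in diffs)
print(sum_of_squares)                 # 34

variance = sum_of_squares / len(diffs)   # averaged by n, as in the article
sd = math.sqrt(variance)                 # SD is in the original units
print(round(variance, 2))             # 2.83
print(round(sd, 2))                   # 1.68

# "Normal range" approximated as mean +/- 1.96 SD for roughly 95% coverage
```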

STANDARD ERROR OF THE MEAN

It is useful at this stage to consider the standard error of the mean (SEM). Usually in clinical practice we do not know what the true overall mean is. All we have are the results from our study and the standard deviation. Fortunately it can be shown mathematically that, provided our study group size is over 10 and randomly selected, the overall mean lies within a normal distribution whose centre is our study group’s mean (fig 5). The standard deviation of this distribution is called the SEM. This can be

Figure 2 Frequency distribution showing the 10th centile: 10% of the distribution lies below the 10th centile and 90% above it; the median is also marked.

Table 4 Waiting times for paediatric A&E admissions in one day to Deathstar General, sorted into rank order

Rank  Waiting time (min)  Rank  Waiting time (min)  Rank  Waiting time (min)  Rank  Waiting time (min)
  1           7            13          15            25          19            37          27
  2           9            14          17            26          19            38          28
  3          10            15          17            27          20            39          31
  4          10            16          17            28          22            40          34
  5          11            17          18            29          22            41          37
  6          12            18          18            30          22            42          37
  7          14            19          18            31          22            43          41
  8          14            20          18            32          24            44          44
  9          14            21          19            33          26            45          45
 10          14            22          19            34          26            46          45
 11          14            23          19            35          26            47          46
 12          15            24          19            36          27            48          52

Figure 3 A box and whisker plot, showing the maximum value, 75th percentile, median (50th percentile), 25th percentile, minimum value and an outlier.

Key point
Variance = Σ(mean − x)²/n, where Σ(mean − x)² = the sum of the squares of all the differences from the mean

Key points
- The standard deviation is a measure of the average distance individual values are from the mean
- The bigger the variability, the bigger the standard deviation
- The standard deviation is used to measure the variability of quantitative data that are not markedly skewed and do not have outliers


calculated by dividing the sample’s standard deviation by the square root of the number of observations in the sample.

As the distribution is normal, the same principles apply as found with the SD above. Consequently there is a 95% chance that the overall mean lies within +/− 1.96 SEM of the sample mean.

To demonstrate these points, consider what Watenpaugh and Gaffney are saying when they write, “One hour after infusion, the capillary reabsorption of fluids was −236 +/− SEM 102 ml/h”.2

This means we can be 95% sure that the overall mean reabsorption rate lies somewhere between:

−236 +/− (1.96 × 102) ml/h = −435.9 ml/h to −36.1 ml/h
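This worked example can be reproduced directly (the SD of 12 and n of 36 in the last line are hypothetical figures, added only to illustrate the SD/√n definition of the SEM):

```python
import math

# Worked example from Watenpaugh and Gaffney: mean -236 ml/h, SEM 102 ml/h
mean, sem = -236, 102

lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
print(round(lower, 1), round(upper, 1))   # -435.9 -36.1

# SEM itself is the sample SD divided by the square root of the sample size,
# e.g. for a hypothetical sample with SD 12 and n 36:
print(12 / math.sqrt(36))                 # 2.0
```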

The size of the SEM tends to decrease as the study group size gets larger or the variation within the sample becomes less marked. Consequently samples of more than 30 with little variation will have a small SEM. As a result, the overall mean can be reliably pinpointed to lie within a narrow range.

So far in articles 1 and 2, we have concentrated on summarising single variable datasets. In practice this is often a preliminary step to combining and comparing different collections of data.

Summarising data from two variables
Again the method used for this depends upon the type of data being dealt with (table 5). In subsequent articles we will discuss how each of these various types of comparison is carried out. For now, we need to consider the important points that have to be borne in mind when presenting the summary of two variables.

CROSS TABULATION

When comparing nominal datasets, a frequency table can be drawn showing the number and proportion (percentage) of cases with each combination of variables. If there are large numbers, and marked variations, it is often better to present the data as a percentage of either the row or column totals. This makes interpretation by the reader easier. However, when using percentages, you need to indicate the total number from which they were derived. If one or more of the datasets are ordinal or quantitative, then the groups must be arranged in the correct order (fig 6).
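A minimal sketch of building such a frequency table, using a few hypothetical (triage category, waiting-time band) pairs rather than Egbert's full dataset:

```python
from collections import Counter

# Hypothetical (triage category, waiting-time band) pairs, in the spirit of fig 6
cases = [("Yellow", "0-30"), ("Green", "0-30"), ("Green", "31-60"),
         ("Green", "61-90"), ("Blue", "90-120"), ("Not recorded", "31-60")]

table = Counter(cases)                       # counts each combination
rows = ["Yellow", "Green", "Blue", "Not recorded"]
cols = ["0-30", "31-60", "61-90", "90-120"]

total = len(cases)
for r in rows:
    counts = [table[(r, c)] for c in cols]
    row_total = sum(counts)
    # Row totals as a percentage of the grand total, as in fig 6;
    # always report the n from which the percentages were derived
    print(r, counts, f"{100 * row_total / total:.1f}%")
```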

Figure 4 A normal distribution curve divided up into different multiples of the standard deviation (SD), from −3 SD to +3 SD about the mean.

Figure 5 Distribution of the mean for samples of various sizes (n = 10, n = 25 and n = 100) compared with the original population.

Table 5 Methods used to compare different types of data

                          Nominal                      Ordinal                      Quantitative—discrete        Quantitative—continuous
Nominal                   Cross tabulation; bar chart  Cross tabulation; bar chart  Cross tabulation; graphical  Graphical
Ordinal                                                Cross tabulation; bar chart  Cross tabulation; graphical  Graphical
Quantitative—discrete                                                               Cross tabulation; graphical  Graphical
Quantitative—continuous                                                                                          Scatter plot with regression and/or correlation

Key point
SEM = SD/√n, where SD is the sample’s standard deviation and n the number of observations

Key points
- SEM is used when describing the overall mean, not the study group’s mean
- SEM is a measure of the variance of the overall mean as estimated from the study group. It is not a measure of the distribution of the data.


BAR CHART

You can use bar charts when comparing nominal and ordinal datasets. In the latter case the groups must be arranged in the most appropriate order (fig 7). This comparison has to be done visually and is therefore prone to error.

GRAPHICAL REPRESENTATION

The dot plot provides a useful way of allowing ungrouped, discrete data to be visually compared. As with the bar chart, in the dot plot the values of the variable are listed on the vertical axis and the categories on the horizontal.

Figure 8 shows the values of an objective measure of muscle bulk in the hand according to a simple clinical classification of muscle wasting for 61 elderly subjects. With the dot plot each individual value is shown (fig 8(A)). It is therefore easy to see that there is a large amount of scatter and that the distributions are slightly skewed. As described previously, box and whisker plots can be used to summarise the data and demonstrate the difference between the groups (fig 8(B)). This shows that those with most wasting tend to have smaller cross sectional areas (CSA). Further investigation reveals that when the CSA values are log transformed the distribution is more symmetrical. We can therefore use means and SD to summarise the transformed data (fig 8(C)). These are shown against a logarithmic scale in their original units as this makes them easier to interpret.

Graphical representations of central tendency and variance allow data to be summarised in a readily accessible format (figs 8(B) and 9(A)). This is commonly done by marking the central tendency and providing a measure of the variability. However, if it is important to represent each reading, a dot plot can be used. This enables the corresponding points to be joined when considering a series of readings from the same subject (fig 9(B)).

SCATTER PLOT, CORRELATION AND REGRESSION

A scatter plot enables the relation between two quantitative-continuous variables to be demonstrated (fig 10). By convention the variable that is “controlling” the other is known as the independent variable and its value is recorded on the horizontal axis. These independent

Figure 6 Cross tabulation using an example of quantitative data (waiting times) and ordinal data (triage category) from Egbert’s study on paediatric A&E attendances.

Triage category    Waiting time (min)
                   0–30       31–60      61–90      90–120     Row total (%)
Yellow               1          5                                6 (12.5)
Green               11         12         10                    33 (68.8)
Blue                                                  1          1 (2.1)
Not recorded                    3          5                     8 (16.7)
Column total (%)  12 (25.0)  20 (41.7)  15 (31.3)   1 (2.1)    48 (100)

Number of missing observations = 1

Figure 7 Frequency of male paediatric admissions in each triage category (yellow, green, blue, not recorded), with the number of cases on the vertical axis.

Figure 8 Three methods of showing the central tendency and variance using the same data on cross sectional area (CSA, mm²) by degree of wasting (none, mild, moderate). (A) Scatter plot. (B) Box and whisker plot. (C) Scatter plot using a logarithmic scale.


variables are those that are varied by the experimenter, either directly (for example, treatment given) or indirectly through the selection of subjects (for example, age). The dependent variables are the outcome of this experimental manipulation and they are plotted on the vertical axis.

Summary
When confronted with a vast amount of information, first consider the type of data present. Then summarise appropriately (see article 1), noting the central tendency and variability. Once this is complete it is possible to combine the information from two or more variables, taking into account what types of data they are.

Quiz
(1) If data are skewed to the right, which is greater, the median or the mean?
(2) Describe two main differences between the mean and the median.
(3) Consider Egbert’s waiting time data listed in table 4.
(a) What is the mean, median and mode?
(b) What is the 75th centile?
(c) What is the interquartile range?
(4) Figure 11 shows a box and whisker plot from Egbert’s study.
(a) What do the boxes represent?
(b) What does the line in the boxes represent?
(c) How would you describe the box and whisker plot of triage category (green) and booking time?
(5) One for you to do on your own. Mullner et al carried out an experiment to look at myocardial function following successful resuscitation after a cardiac arrest.3 Part of their results are shown in table 6:
(a) Identify the different types of data
(b) Identify the central tendency and variability for each variable
(c) Present a tabular summary of the patients’ ages
(d) What graphical summary would you use to compare the frequency of VF with the site of the myocardial infarction?
(e) What graphical summary would you use to show the comparison between the site of the myocardial infarction and the duration of the arrest?
(f) What method would you use to show the comparison between the systolic blood pressure and duration of arrest?

Answers
(1) The mean
(2) The mean reflects all the data points whereas the median uses only the middle ones. Secondly, unlike medians, small group means can be combined to give an overall mean.
(3) (a) Mean, median and mode:
Mean = 1100/48 = 22.9 minutes
Median = the 49/2 = 24.5th data point, that is, halfway between the 24th and 25th data points = 19 minutes
Mode = 19 minutes
(b) The 75th centile equals the 75 × 49/100th data point = 36.75th data point. This means it is 0.75 of the way from the 36th to the 37th data point. However, using table 4, both data points are 27 minutes. The 75th centile is therefore 27 minutes.
(c) The interquartile range is the range between the 25th and the 75th centiles.

Figure 9 Two methods of presenting the same data. (A) provides a measure of the central tendency and variance using a box and whisker plot. However, if the change in each subject is needed, individual values should be plotted and joined (B). [Both panels plot EAT score (0–50) against year, 1989 and 1991.]

Figure 10 Grip strength and cross sectional area (CSA) of index finger-thumb web space. [Grip (N), 0–180, plotted against CSA (mm²), 0–600.]

Figure 11 Box and whisker plot of booking time by triage category. [Booking time (h), 0–24, by triage category; n = 6 Yellow, 33 Green, 1 Blue, 8 not recorded.]



Using the process described in 3(b), the 25th centile is 15 minutes. The interquartile range is therefore 15–27 minutes.
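These answers are easy to check programmatically. A minimal sketch using only Python's standard library, with the 48 waiting times from Egbert's study; note that `statistics.quantiles` with its default "exclusive" method uses the same (n + 1)-based centile positions as the working above:

```python
import statistics

# The 48 paediatric waiting times (minutes) from Egbert's study
times = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22,
         19, 20, 19, 14, 27, 26, 24, 17, 19, 28, 37, 12,
         10, 17, 14, 15, 22, 27, 45, 18, 22, 10, 19, 15,
         9, 31, 18, 7, 19, 22, 14, 11, 17, 26, 37, 41]

mean = statistics.mean(times)        # 1100/48 = 22.9 minutes
median = statistics.median(times)    # halfway between 24th and 25th sorted values
mode = statistics.mode(times)        # the most frequent value
q1, _, q3 = statistics.quantiles(times, n=4)  # 25th and 75th centiles

print(round(mean, 1), median, mode, q1, q3)   # → 22.9 19.0 19 15.0 27.0
```

The interquartile range is then simply the interval from `q1` to `q3` (15–27 minutes), matching the hand calculation.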

(4) (a) The middle 50% of the data
(b) The median
(c) A negatively skewed distribution (long tail on the side with the lowest numbers)

The authors would like to thank Sally Hollis and Alan Gibbs who helped with an earlier version of this article and the invaluable suggestions from Jim Wardrope, Paul Downs and Iram Butt.

1 Bowlers D. Measures of spread of nominal variables. In: Statistics from scratch. Chichester: John Wiley, 1996:118–23.
2 Watenpaugh D, Gaffney F. Measurement of net whole-body transcapillary fluid transport and effective vascular compliance in humans. J Trauma 1998;45:1062–8.
3 Mullner M, Domanovitis H, Sterz F, et al. Measurement of myocardial contractility following successful resuscitation: quantitated left ventricular systolic function utilising non-invasive wall stress analysis. Resuscitation 1998;39:51–9.

Further reading
Altman D, Bland J. Quartiles, quintiles, centiles and other quantiles. BMJ 1994;309:996.
Bowlers D. Measure of average. In: Statistics from scratch. Chichester: John Wiley, 1996:83–113.
Coggon D. Statistics in clinical practice. London: BMJ Publishing Group, 1995.
Gaddis G, Gaddis M. Introduction to biostatistics: part 2, descriptive statistics. Ann Emerg Med 1990;19:309–15.
Glaser A. Descriptive statistics. In: High-yield biostatistics. Philadelphia: Williams and Wilkins, 1995:1–18.

Funding: none.

Conflicts of interest: none.

Table 6

Patient  Age (y)  ECG       Systolic BP (mm Hg)  Duration of arrest (min)  Epinephrine (mg)  MI location
1        44       VF        98                   15                        2                 Anterior
2        58       Asystole  112                  2                         0                 None
3        57       VF        100                  12                        0                 None
4        62       VF        90                   27                        4                 Inferior
5        52       VF        110                  25                        10                Inferior
6        65       VF        117                  48                        6                 Anterior
7        42       VF        113                  13                        0                 Inferior
8        46       VF        105                  9                         0                 None
9        59       VF        180                  15                        1                 Inferior
10       56       VF        97                   47                        8                 Anterior
11       73       VF        114                  18                        1                 None
12       60       VF        165                  25                        2                 Subendocardial
13       80       PEA       115                  29                        2                 None
14       62       Asystole  85                   26                        7                 None
15       58       VF        104                  15                        6                 Inferior
16       50       VF        101                  20                        4                 None
17       72       VF        100                  32                        3                 Inferior
18       46       VF        120                  23                        2                 Subendocardial
19       72       VF        138                  22                        3                 Inferior
20       50       VF        102                  42                        3                 None

MI = myocardial infarction; VF = ventricular fibrillation; PEA = pulseless electrical activity.



AN INTRODUCTION TO STATISTICS

An introduction to statistical inference—3

P Driscoll, F Lecky, M Crosby

Objectives
x Discuss the principles of statistical inference
x Quantify the probability of a particular outcome
x Discuss clinical versus statistical significance

In covering these objectives we will introduce the following terms:
x Population and sample
x Parameter and statistic
x Null hypothesis and alternative hypothesis
x Type I and II errors

The previous two articles discussed summarising data so that useful comparisons can be made. Another common problem encountered is estimating a value in a larger group based upon information collected from a small number of subjects. To see how statistics can be used to achieve this, it is helpful to begin by reviewing the meaning of five commonly used terms:
x Population and sample
x Parameter and statistic
x Element

The word "population" describes a large group that includes every possible case. In contrast, a "sample" is a smaller group of subjects who are part of the population. Therefore the population of UK emergency departments would include every emergency department in the UK, whereas those in the north west would represent a sample.

A value measured in a population is known as a "parameter". Consequently the trolley waiting time in UK emergency departments would be a parameter. The term "statistic" is used to denote the same variable when it is measured in a sample. Finally, each separate observation in either a population or sample is called an "element" and it is often labelled with the letter X. The number of elements in a population is given the letter N and in a sample, n.

It is often not possible to record all the elements of a population. For example, a study investigating the peak flow in asthmatic patients attending UK emergency departments cannot review every patient. However, it is feasible to record the peak flow in a sample of asthmatic emergency department attendances. From this statistic an estimation of the population's peak flow can be inferred. Manipulating data in this way is therefore called inferential statistics. It can also be used to make estimations about a sample based upon information from a population.

In carrying out inferential statistics it is important to ensure that samples are representative of the whole population and have been randomly selected. If this is not the case, bias will be introduced and a perverse answer could result. For example, inferential statistics could be used for making a national generalisation following a survey on the waiting times in 20 emergency departments. However, problems would arise if the sample did not represent the population. For example, if the investigation looked at district general hospital emergency departments in London then it is unlikely to be an accurate reflection of all the emergency departments in the UK.

It is also important that each subject has an equal chance of being included in the sample. Consequently, if the trolley waits for elderly patients were being studied, all such times need to be recorded, not just those measured when there is an apparent delay. A possible way of achieving this goal is random sampling. This is a method of selecting subjects such that each member of the population has an equal and independent chance of being picked. A variety of techniques can be used, including flicking a coin, drawing numbers from a hat and reading from random number tables. However, by themselves, these techniques may not be adequate because populations can be made up of different types of groups of various sizes. This heterogeneity could have a bearing on the study. For example, the stage of a disease, and the age or sex of a patient, may change the response to a particular drug. As these are not necessarily evenly distributed in a population it is important they are adequately covered in the sampling process. This is achieved by stratified random sampling, in which the population is initially divided into homogeneous groups from which random samples are taken. In this way representative samples of the whole population can be obtained.

Key point
A population contains all the elements from X1 to XN and a sample has n of the N elements.

J Accid Emerg Med 2000;17:357–363

Accident and Emergency Department, Hope Hospital, Salford M6 8HD
P Driscoll
F Lecky
M Crosby

Correspondence to: Mr Driscoll, Consultant in Accident and Emergency Medicine ([email protected])

Accepted 22 June 2000

www.jnlaem.com

Allowing for uncertainty
Measurements on people vary because we are not all the same. Therefore the peak flow in 20 randomly selected adults from your waiting room would vary over a range of values. This would be the case even if all the subjects were perfectly healthy. We could reduce this range by being very selective about who we measured, for example being male, 1.75 metres tall, good looking and from Yorkshire! However, even if you managed to find 20 such subjects, the peak flows would still vary.

A useful way of considering this variation is to plot it as a frequency distribution. When the values from a reasonably homogeneous group are shown in this way, the curve takes on the shape of a "normal distribution".1 This is because many of the recordings are clustered around the mean value, with a symmetrical fall off in the frequency of recordings as you move away from the centre.

Inferential statistics enables us to take account of these variations when estimations about a parameter or statistic are being made. It does this by quantifying the probabilities of the possible outcomes.

Probability
In view of the variations discussed previously, we cannot predict with absolute certainty the outcome of an event before it happens. This uncertainty is often referred to as "a matter of chance". It is, however, possible to quantify this uncertainty by calculating the probability of an event occurring.

Probability (p) is invariably expressed as a decimal value between 0 and 1, where zero means that an outcome will never happen and 1 means it will always occur. Therefore, if 30% of patients survive after a cardiac arrest in hospital, the probability of survival would be written as 0.3. As it cannot be larger than 1, it follows that the probability of an event not occurring is 1−p. Therefore the probability of not surviving a cardiac arrest is 1−0.3 = 0.7.

Probabilities are often used to help guide management. For example, after a head injury, a patient with no skull fracture and no neurological signs has a 0.00017 probability of developing an operable intracranial haematoma. This is such a low number that we tend to discharge these patients under the care of a sensible adult.

When using probabilities in clinical decision making, one tends to err on the side of caution so that no cases are missed. This is seen with the Ottawa knee protocol (box 1). A level is chosen where the probability of missing a fracture is zero. Consequently radiographs are reserved for those patients with particular presenting symptoms.

COMBINING PROBABILITIES
Though a single finding, or test, can help in clinical decision making, in practice we often rely on several results before making a diagnosis. This process entails combining the probabilities of each of the possible outcomes.

The chance of a particular outcome in any of the tests carried out is equal to the sum of all the probabilities. For example, if the probability of a patient in your waiting room being blood group O is 0.46, the chance of either of two unrelated patients being blood group O is 0.46 + 0.46 = 0.92. This is known as the addition rule of probability and it can be written as:

Probability of outcome A or B = probability of outcome A + probability of outcome B

(Strictly, this simple form of the addition rule applies only when the outcomes are mutually exclusive; the correction needed when outcomes can occur together is given below.)

To calculate the probability of a specific combination of independent outcomes occurring (for example, the probability of outcome A and B), the separate outcome probabilities need to be multiplied together. Therefore, the probability of both patients being blood group O is 0.46 × 0.46 = 0.21. This is the multiplication rule of probability and it can be written as:

Probability of outcome A and B = probability of outcome A × probability of outcome B

Key points
x Population This is a complete group (that is, having every eligible person (or item) with a particular characteristic). Depending on the study, the actual size of a population varies from modest (for example, all minor injury units in a particular region) to huge (for example, all emergency treatment centres in Europe).
x Sample This is a subset of the population that has been collected for a study. As with the population, the size of a sample can vary.
x Parameter This is the value of a variable in a population.
x Statistic This is the value of a variable in a sample.
x Element This is a single observation.
x Statistical inference This enables statements to be made about a sample based upon a population's parameter. It also allows the converse to occur, but in this case it is dependent upon random, representative samples being taken.

Key point
Probability is the proportion of cases in a study that have a particular result.

Key points
x Probability values are expressed as a decimal from 0 to 1
x The probability of an event occurring cannot be negative
x The probability of an event not occurring is 1−p.

Box 1 Ottawa knee rule2
Order radiography of the knee if any of the following are present:
x Patient older than 55 years
x Tenderness at the head of the fibula
x Isolated tenderness of the patella
x Inability to flex to 90 degrees
x Inability to transfer weight for four steps both immediately after injury and in the A&E department

However, the method used to calculate the chance of a particular combination varies with the independence of the outcomes. Independence in this context means the chances of a particular result will not make another outcome more or less likely. In the example given above this obviously applies—a particular patient's blood group will not change the chances of an unrelated patient being of a certain blood type.

It also follows that if the outcomes are not independent then the multiplication rule will not apply. This can be used to detect factors that are related.3 To demonstrate this, assume the probability of losing a finger in your community is 0.01 and the probability of working at "Happy Crusher", the local steel works, is 0.1. If these were independent of one another, the probability of working at Happy Crusher and losing a finger should be:

0.1 × 0.01 = 0.001 (that is, one chance in a 1000)

If you find out that in practice the figure is 0.01, you would suspect there is a connection and the factors are not independent.

In clinical practice we are often dealing with outcomes that are not mutually exclusive.4 Consequently you usually need to take into account the probability of a combination occurring. This can be calculated by modifying the equation above to:

Probability of outcome A or B = probability of outcome A + probability of outcome B − probability of outcome A and B

The following problem helps to demonstrate these points. Let us say the probability of a person having an emergency admission to hospital at some stage in their life is 0.6. They also have a 0.3 probability of being asked to complete a telephone questionnaire in the same time span. If these results were completely independent of one another, the probability of having both an emergency admission and a telephone questionnaire = 0.6 × 0.3 = 0.18.

Consequently the probability of having either an emergency admission or a telephone questionnaire = 0.6 + 0.3 − 0.18 = 0.72.

To illustrate what these figures mean it is helpful to use a 2×2 table after converting the probabilities into actual numbers. This is done by assuming we are dealing with 100 people (table 1).

From table 1 you can see that 18 people will have both an emergency admission and a telephone questionnaire some time in their life. Seventy-two will have one or both. This number is the total of those having a telephone questionnaire only (12) plus people having an emergency admission only (42) plus those having both an emergency admission and a telephone questionnaire (18).

There is another method of calculating probabilities when dealing with data that have only two possible outcomes. Examples of such binomial data include live or die, boy or girl, success or failure. Consequently the outcomes are mutually exclusive. The probability of a specific combination of these outcomes can be determined by use of the binomial probability tables.5 These list the chances of obtaining each of the possible outcomes.

As binomial distributions deal specifically with combinations of independent, mutually exclusive events, they are often not applicable to emergency medicine. In contrast, they are used in genetic counselling, where some inheritance disorders, such as Tay-Sachs disease, follow a binomial distribution. Koosis provides a comprehensive account for those who wish to learn more about this topic.5
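For readers without a set of tables, the binomial probabilities themselves are a one-line calculation. A sketch follows; the genetic example is our illustration rather than one from the article, and assumes each child of two Tay-Sachs carrier parents has a 0.25 chance of being affected:

```python
from math import comb

def binomial_prob(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent trials,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Illustrative: chance that exactly 1 of 3 children is affected (p = 0.25)
print(round(binomial_prob(1, 3, 0.25), 3))   # → 0.422
```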

Table 1

                           Telephone questionnaire
                           Yes    No    Total
Emergency admission  Yes    18    42     60
                     No     12    28     40
                     Total  30    70    100

Figure 1 Weight of staff in the emergency department of Deathstar General. [Frequency (left axis) and proportion (right axis) against weight bands from 72–73.9 kg to 90–91.9 kg.]


DISPLAYING PROBABILITIES
In article 1 we discussed how a frequency distribution could be used to show graphically the number of cases at each particular value of a variable. This is demonstrated in the following example. The specialist registrar at Deathstar General, Dr Egbert Everard, is concerned about the health of the 40 male staff in the emergency department. He therefore decides to weigh them and plot the results as a frequency distribution (fig 1). Probabilities can be demonstrated in a similar way. To do this Egbert divides the number of cases in each weight category by the total number of cases in the whole study (that is, 40). This gives him the proportion of cases at each value. These values can then be joined up to produce a distribution curve (fig 1).

It is possible to use these distribution curves to calculate the probability of having a value equal to, or greater than, a particular number. For example, the proportion of staff with a weight greater than, or equal to, 80 kg is represented by the area under the curve to the right of the 80 kg mark (fig 2). Probability distributions have a further useful property in that the area under the whole curve is equal to one. This is because it represents the sum of all the possible probabilities. Consequently the proportion of staff with a weight less than 80 kg is represented by the area under the curve to the left of the 80 kg mark. This is equal to [1 − shaded area].
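Egbert's conversion from frequencies to probabilities amounts to dividing each count by n. A sketch follows; the band counts are illustrative stand-ins, since only the axes of figure 1 survive in this copy:

```python
# Frequency distribution of staff weight converted to a probability
# distribution. Counts are illustrative, NOT the actual figure 1 data.
freq = {"72-73.9": 2, "74-75.9": 3, "76-77.9": 5, "78-79.9": 7,
        "80-81.9": 8, "82-83.9": 6, "84-85.9": 4, "86-87.9": 3,
        "88-89.9": 1, "90-91.9": 1}               # n = 40 staff in total

n = sum(freq.values())
prop = {band: count / n for band, count in freq.items()}

# The area under the whole distribution is 1 (all possible probabilities)
assert abs(sum(prop.values()) - 1.0) < 1e-9

# P(weight >= 80 kg): the area to the right of the 80 kg mark
p_80_plus = sum(p for band, p in prop.items() if float(band[:2]) >= 80)
p_under_80 = 1 - p_80_plus   # [1 - shaded area]
print(round(p_80_plus, 3))   # → 0.575
```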

Statistical inference in medical studies commonly uses probabilities in this way to test the null hypothesis.

Testing the null hypothesis
Consider what you would do if asked to make recommendations for your emergency department on a new drug for asthma care following a successful trial. Firstly, you would need to be sure the patients were representative and randomly chosen. Secondly, any difference in effect attributable to the new treatment would need to be judged in the light of the differences between patients simply attributable to chance variation.

We have seen that the probabilities of various outcomes can be quantified using statistical inference. However, it is not practical to test all of the infinite number of possible differences between the population and sample. Consequently only the possibility of there being no difference between the population and sample is tested. It is then feasible to determine the probability that a difference equal to, or greater than, that found in the study could be attributable to normal variation. This is known as testing the null hypothesis.

The probability of obtaining the study result, or a more extreme one, when the null hypothesis is correct is called the p value, a frequently used term in medical journals. For example, in a study comparing the rehabilitation time after ankle sprains with new and standard treatment, it was found that the mean difference was four days (p = 0.01). Consequently the chance of a difference this big, or bigger, occurring when the null hypothesis is correct is 1 in 100. This means that it is more likely that there is a difference between the two treatments. This is called the alternative hypothesis.

Nowadays the p value is calculated by computer, but the statistical tests used to work it out depend upon the data in question and the type of study. The choice of test is therefore important so that meaningful results are obtained.

Later in this series the tests commonly used in emergency medicine will be described so that you will be able to choose the correct one. At this point, however, it is useful to test our understanding of the role of the null hypothesis and p values by considering the results of a recent publication. Sunde et al compared the time from turning on the monitor to starting chest compression in different types of cardiac arrest.6 In cases of asystole, the median time delay was 29 seconds. This was significantly shorter than the time found in patients with electromechanical dissociation (EMD) (109 seconds, p <0.001). What does this mean?

The null hypothesis in this study is that the time delays before cardiopulmonary resuscitation in patients with asystole and EMD are the same. However, the p value indicates that the probability of a difference of 80 seconds being attributable to chance is less than one in a thousand. It is therefore more likely that the alternative hypothesis is correct and there is a difference between these two groups.

Figure 2 Distribution curve of the proportion of staff with a weight greater than or equal to 80 kg. [Frequency and proportion against weight (kg); the area to the right of the 80 kg mark is shaded.]

Key point
You can express the data in a frequency distribution as a distribution of probability.

Key point
The null hypothesis states that the difference between the groups being tested is attributable to chance variation.

Key point
The actual p value should be provided to two decimal places.

Key point
The p value is derived from the raw data using statistical calculations and tables appropriate to the test carried out.

STATISTICAL SIGNIFICANCE
Rejecting the null hypothesis means that a "significant difference" exists between the populations studied that cannot be explained by chance alone.

A 2×2 table can be constructed for the four possible outcomes when the null hypothesis (NH) is tested (table 2).

TYPE I ERROR
These mistakes occur when statistical tests indicate that the null hypothesis is unlikely (that is, the p value is low) but in actual fact there is no difference between the study groups. By convention, "statistical significance" is accepted if the chance of making such an error is less than 0.05. When these arbitrary levels are given for a study they are often referred to as α. Consequently when α = 0.05 it is considered tolerable to make a "type I" mistake 1 in 20 times. However, the level considered significant is determined by you, the investigator. It may be that a lower value of α is required when testing the use of an expensive or potentially toxic treatment. In this way the chances of falsely rejecting the null hypothesis can be kept small. In these cases you may wish to use a value of 0.01 (that is, a 1 in 100 chance of falsely rejecting the null hypothesis) rather than the usual 0.05.
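What a 1-in-20 error rate means can be shown by simulation. In this sketch (our construction, not from the article) two samples are repeatedly drawn from the same normal population, so the null hypothesis is true by design, yet a two-sided z test at α = 0.05 still "rejects" it in roughly 1 run in 20:

```python
import random

random.seed(42)  # reproducible runs

def z_test_rejects(n=30):
    """Draw two samples from the SAME normal population (so the null
    hypothesis is true) and ask whether a two-sided z test at
    alpha = 0.05 (critical value 1.96) wrongly rejects it."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Known sd = 1, so the standard error of the difference is sqrt(2/n)
    z = (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5
    return abs(z) > 1.96

trials = 2000
rate = sum(z_test_rejects() for _ in range(trials)) / trials
print(rate)   # close to 0.05: a type I error about 1 time in 20
```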

In considering the arbitrary level demarcating type I errors, it is also important to be aware that the value for p is markedly affected by both the sample sizes and the magnitude of any difference (that is, the point estimate). This is demonstrated in table 3. In all cases the p value is 0.05 but the difference and sample size vary. For very large samples the difference only has to be small to produce a statistically significant result. The converse applies when the sample is small. As will be discussed later in this series, the p value is also affected by the standard deviation of the distribution.

TYPE II ERROR
These represent mistakes in falsely accepting the null hypothesis and the chance of making one is represented by β. If β is large there is a high chance of making a type II error. As this is the opposite of what we want, the complement of the term is often used instead. This is known as "power" and is equal to 1−β. Consequently a test with a high power has a low chance of making a type II error. Conventionally, a study is required to have a power of 0.8 to be acceptable. In other words, the study should have an 80% chance of detecting that the null hypothesis did not apply.

Four factors affect the probability of making a type II error (box 2).

When α increases you are less likely to accept the null hypothesis. Consequently the chances of making a type II mistake fall. A balance therefore has to be struck so that the chances of both type I and II errors are kept as small as possible.

We have already discussed that inferential statistics are used to take account of variations in statistics when a parameter is being estimated. If the variations are large then possible values for the parameter will also cover a wide range. In such circumstances the chances of rejecting the null hypothesis are reduced. Conversely, factors that decrease variability will increase the power of the study. Consequently, as increasing the sample size leads to a fall in variability, it will also reduce the chances of making a type II error.

Table 2

                              Reality
Statistical test              True difference                        No difference
NH rejected                   Rejection correct                      Rejection incorrect (type I error)
NH accepted                   Acceptance incorrect (type II error)   Acceptance correct

Table 3 The effect of sample size and difference on the p value when comparing two groups (assuming a constant standard deviation)7*

Sample size of each group    Difference (in units)    p Value
4                            10.0                     0.05
25                           4.0                      0.05
400                          1.0                      0.05
2500                         0.4                      0.05
10 000                       0.2                      0.05

*Adapted from a theoretical study by Norman et al, where they compared the difference between the IQ in the normal population and those reading their statistics book.

Key points
x It is possible to reduce the chances of making a type I error by using a lower α level.
x The p value is affected by the size of the difference, the number of cases in the sample and the standard deviation.
x In a large study, you are very likely to get a "statistically significant" result.
x In a small study it is very hard to get a "statistically significant" result. Consequently a p value greater than 0.05 in this situation proves nothing.

Box 2 Factors affecting the power of a study
x Size of α
x Variability of the sample
x Size of the sample
x Point estimate

Statistical methods used to test the null hypothesis are termed "tests of significance".


The size of a difference between the study groups (for example, the control and the experimental groups) will directly affect power. An increase leads to greater power as there is less chance of falsely accepting the null hypothesis.

To help understand these principles of null hypothesis testing, consider a follow up study carried out by Egbert. He was particularly concerned about how overweight the male personnel were in the emergency department. He therefore set up a study with the null hypothesis being that the mean weight of fit, healthy men and departmental men was the same. Having weighed all 40 of them, he found the mean weight was 87 kg. Figure 3 shows the normal probability distributions of the two populations. Population 1 all had the characteristic of being fit and healthy, whereas population 2 were unfit couch potatoes. Egbert's finding of a mean of 87 kg could therefore lie in either distribution. The chance of a weight this big, or heavier, being part of population 1 is shown by the darker shaded area. This represents α—that is, the probability of making a type I error and falsely rejecting the null hypothesis. Conversely, the chance of being part of population 2 and having a weight this big, or lighter, is shown by the lighter shaded area. This represents β—that is, the probability of making a type II error and falsely accepting the null hypothesis.

The four factors mentioned above are used in an equation to calculate the power of a study. However, if you are setting up a study you can set the power at a particular level (often 80%). Therefore, if the sizes of the other variables are known (that is, α, variability and point difference), the same equation can be used to determine how many subjects are required in the study.
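One widely used version of such an equation is the normal-approximation formula for comparing two means, n per group = 2((z_α/2 + z_β) × sd / difference)². We cannot tell from the text which exact formula the authors had in mind, so the sketch below is a standard textbook form rather than theirs:

```python
import math
from statistics import NormalDist

def subjects_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a difference
    `diff` between two means, given standard deviation `sd`, a
    two-sided alpha and the desired power (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha 0.05
    z_b = NormalDist().inv_cdf(power)           # e.g. 0.84 for power 0.80
    return math.ceil(2 * ((z_a + z_b) * sd / diff) ** 2)

# Halving the detectable difference roughly quadruples the study size
print(subjects_per_group(diff=10, sd=10),
      subjects_per_group(diff=5, sd=10))   # → 16 63
```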

CLINICAL VERSUS STATISTICAL SIGNIFICANCE
Consider a study comparing a new antihypertensive medication (A) with a standard one (B). The result of the trial shows that the blood pressures in patients receiving A were significantly lower than those on B (p = 0.0001). This means the probability of the difference found, or a bigger one, being attributable to chance is 1 in 10 000. In a well run study we would have no problem in accepting this as a statistically significant result. However, this does not mean it is clinically useful. In the same study the point estimate between the groups was 5 mm Hg. From a clinical point of view we may consider this to be too small to offset the difficulties, side effects and expense associated with drug A.

As stated before, accepting a p value of 0.05 to reject the null hypothesis may not be appropriate in some clinical settings. Clinical considerations also have to be taken into account when accepting the null hypothesis if the p value is greater than 0.05. You need to take into account the type of study carried out, the number of subjects in each group and the weight of other published data. A further point that should be remembered is that in clinical practice we usually need to know the presence and size of any difference. p Values only inform you of the likelihood of a difference being attributable to chance (that is, normal variation).

In the majority of cases these limitationswith the p value can be overcome by usingconfidence intervals. This will be discussedfurther in the following article.

Summary
Statistical inference is used to make comments about a population based upon data from a sample. In a similar manner it can be applied to a population to make an estimate about a sample. It is commonly seen in medical publications when the null hypothesis is being tested. This calculates the probability (p value) of a type I error—that is, that a particular finding is attributable to chance. It is also important to be aware of the chances of a type II error—that is, accepting the null hypothesis when it does not apply. Sample size, point estimate and variability are common factors that will affect the chances of making these two types of errors. Interpreting results therefore needs to take these factors into account as well as the clinical relevance of the findings. Statistical significance does not necessarily mean clinical significance.

Figure 3 Distribution curves of the weights of healthy and unhealthy men. [Healthy population mean 75 kg; unhealthy population mean 95 kg; the sample mean of 87 kg is marked, with the α and β error areas shaded.]

Key point
Increasing the sample size is one of the commonest ways of increasing the power of a study.

Key points
x The p value answers the question, "Is there a statistically significant difference between the study groups?"
x Clinical issues need to be considered along with the size of the p value
x The size of any difference needs to be known
x Statistical significance does not necessarily mean clinical significance

Quiz
(1) Complete the following phrase: A parameter is to a –––– as a –––– is to a sample.
(2) You are told that the probability of a female patient having a fractured femur is 0.3 and green eyes is 0.4. Assuming these are independent of one another, what is the probability that she has:
(a) Both a fractured femur and green eyes
(b) Either a fractured femur, or green eyes, or both
(3) Name three factors that will affect the chances of making a type I and II error.
(4) A new thrombolytic "Dyno-coronary" has been developed. Though very expensive and toxic it is thought to produce coronary patency quicker than standard treatment. If you were to design a study to assess this, what α level would you choose: 0.05 or 0.01?
(5) One for you to do on your own. Formulate the null hypothesis for the study by Ireland et al that investigated whether supine oblique views provide better imaging of the cervicothoracic junction than a swimmer's view.7 Consider the conclusion drawn with respect to statistical and clinical relevance.

Answers
(1) A parameter is to a statistic as a population is to a sample.
(2) Both: 0.3 × 0.4 = 0.12. Either or both: 0.3 + 0.4 − 0.12 = 0.58.
(3) Point estimate, sample variability, sample size.
(4) You would be aiming to minimise the chances of making a type I error, consequently an α level of 0.01 would be preferable.
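The working in answer (2) combines the multiplication rule for independent events with the general addition rule. A minimal sketch in Python (the function names are illustrative, not from any library):

```python
def p_both(p_a: float, p_b: float) -> float:
    """P(A and B) for independent events: multiply the two probabilities."""
    return p_a * p_b

def p_either(p_a: float, p_b: float) -> float:
    """P(A or B, or both): add the probabilities, then subtract the
    overlap, which would otherwise be counted twice."""
    return p_a + p_b - p_a * p_b

p_fracture, p_green_eyes = 0.3, 0.4
print(round(p_both(p_fracture, p_green_eyes), 2))    # 0.12
print(round(p_either(p_fracture, p_green_eyes), 2))  # 0.58
```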

The authors would like to thank John Heyworth, Sally Hollis, Jim Wardrope and Iram Butt for their invaluable suggestions.

Funding: none.

Conflicts of interest: none.

1 Driscoll P, Lecky F, Crosby M. An introduction to everyday statistics—1. J Accid Emerg Med 2000;17:205–11.

2 Stiell I, Greenberg G, Wells G, et al. Prospective validation of a decision rule for the use of radiography in acute knee injuries. JAMA 1996;275:611–15.

3 Altman D. Theoretical distributions. In: Practical statistics for medical research. London: Chapman and Hall, 1997:48–73.

4 Coggon D. Probability. In: Statistics in clinical practice. London: BMJ Publishing Group, 1995:52–61.

5 Koosis D. Populations and samples. In: Statistics—a self teaching guide. New York: John Wiley, 1995:40–76.

6 Sunde K, Eftestol T, Askenberg C, et al. Quality assessment of defibrillation and advanced life support using data from the medical control module of the defibrillator. Resuscitation 1999;41:237–47.

7 Ireland A, Britton I, Forrester A. Do supine oblique views provide better imaging of the cervicothoracic junction than swimmer's views? J Accid Emerg Med 1998;15:151–4.

Further reading
Gaddis M, Gaddis G. Introduction to biostatistics: Part 1, basic concepts. Ann Emerg Med 1990;19:143–6.
Glaser A. Inferential statistics. In: High-yield statistics. Philadelphia: Williams and Wilkins, 1995:9–30.
Norman G, Streiner D. Inferential statistics. In: PDQ statistics. 2nd ed. London: Mosby, 1997:17–36.

Correction
We regret that two errors occurred in the statistics paper published in July (2000;17:274–81). On page 277, 2nd column, +/− signs were omitted from 2SD in the penultimate paragraph. On page 278, 1st column, the equation was incorrect; it should have read: SEM = SD/√n
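The corrected formula, SEM = SD/√n, can be checked in a few lines of Python, here using the waiting times for cases 1 to 12 from table 2 of article 2 as the sample. Note that `statistics.stdev` uses the sample (n − 1) denominator:

```python
import math
import statistics

# Waiting times (minutes) for cases 1 to 12, table 2 of article 2
waits = [46, 45, 44, 26, 52, 14, 19, 18, 14, 34, 18, 22]

sd = statistics.stdev(waits)       # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(waits))   # SEM = SD / sqrt(n)
print(round(sd, 1), round(sem, 1))
```

For these 12 cases the standard deviation is about 14.1 minutes, giving a standard error of the mean of roughly 4.1 minutes.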

An introduction to statistical inference—3