
    Rod Jackson Feb 14, 2015

    The GATE Notes: a Graphic Approach To Epidemiology

"Epidemiology with Pictures"

PREFACE

The "Graphic Approach To Epidemiology" (GATE) uses a triangle, circle, square and arrows to graphically represent all epidemiological studies, including randomised and non-randomised studies. Every epidemiological study can be hung on the GATE frame. GATE was developed to make epidemiological principles, study design, analysis, and appraisal easier to understand and remember. GATE illustrates how every epidemiological study shares the same basic structure and how every epidemiological study is designed with the same objective - to measure how much dis-ease occurs in different groups (or populations). Differences in the occurrence (e.g. frequencies, risks, proportions, rates) of disease in different groups of people provide insights into causes and predictors of dis-ease, and the effects of treatments on dis-ease. The GATE Notes are not a comprehensive epidemiology textbook but a visually aided guide to epidemiological principles, measures, analyses, errors and study design. The GATE approach is equally applicable in clinical, health services or public health practice. GATE started life as a "Graphic Appraisal Tool for Epidemiological studies" to help medical students develop critical appraisal skills, but the GATE approach is equally relevant to teaching epidemiological study design ("Graphic Architectural Tool for Epidemiology"), which is the "flip-side" of the critical appraisal of epidemiological studies.

GATE was inspired by the US epidemiologist Professor Ken Rothman (1), who so elegantly dissected epidemiological study design into its component parts, and by the Evidence Based Medicine Working Group (2,3), who developed structured guides to critiquing the clinical epidemiological literature. Further inspiration came from the late Professor Jerry Morris, the British epidemiologist who defined epidemiology as "numerator ÷ denominator" (4) (i.e. number of people with outcomes ÷ number of people in a population) and won me over to epidemiology's underlying simplicity and convinced me it was possible to make epidemiology universally accessible. GATE has been "work in progress" since about 1990. I thank the thousands of students and health professionals who have been exposed to versions of GATE. We have observed their reactions, assessed their understanding and continuously modified GATE. Finally, I thank my colleagues who have borne with me and helped make major improvements to GATE.

    THE GATE FRAME


CHAPTER 1: EPIDEMIOLOGICAL STUDY DESIGN, MEASUREMENT & ANALYSIS: triangles, circles, squares & arrows

1.1. WHAT IS EPIDEMIOLOGY?

Epidemiology is the study of how much dis-ease occurs in different groups/populations and of the factors that determine differences (variation) in the occurrence of dis-ease between these groups.

We use the term dis-ease here rather than disease, to encompass all health-related events (e.g. an injury or death) and health-related states (e.g. diabetes, a disability, a raised blood pressure, or a state of wellness). The hyphen between dis and ease emphasises that as well as investigating pathological conditions, epidemiology can be used to investigate any health-related factor that affects our state of health or ability to function well (i.e. being at ease). Moreover, while epidemiologists usually study negative events or states, like death and disease, they also study positive states of health such as wellbeing, survival, or remission of cancer.

Epidemiologists count dis-ease occurrences. An occurrence (of dis-ease) describes the transition from a non dis-eased state to a dis-eased state.

Epidemiologists measure and compare the occurrence of dis-ease in different groups of people or populations. We use the term group and population interchangeably. A population is any group of people who share a specified common factor. This factor could be a geographic characteristic (e.g. people living in northern or southern Europe); a demographic characteristic (e.g. an age group, gender, ethnicity or socio-economic category); a time period (e.g. 2001); a dis-ease (e.g. heart disease); a behaviour (e.g. smoking); a treatment (e.g. a blood pressure lowering drug); or a combination of several of these factors.

Measures of the dis-ease occurrence in a population can inform health service planners about the services required for that population. Or, by measuring dis-ease occurrence in different populations (e.g. smokers versus non smokers or Maori versus non Maori) it is possible to investigate possible and probable determinants (i.e. causes) or predictors of variations in dis-ease occurrence. This knowledge can inform health promotion, dis-ease prevention and treatment decisions.

All epidemiological studies are measurement exercises involving the collection of data that can be counted (i.e. quantified). Quantitative data can be classified as categorical (data that are grouped into categories, e.g. male/female, smokers/non-smokers, dead/alive) or numerical (data that take on numerical values, e.g. body weight, blood cholesterol levels, number of hospital visits, number of births).

For simplicity, most examples in the GATE Notes involve using just two categories to describe populations or dis-eases (e.g. smoking - yes/no and lung cancer - yes/no).

This chapter describes: measures of dis-ease occurrence in groups of people; ways of comparing differences in dis-ease occurrence between groups of people; & the shared design features of all epidemiological studies.


    1.2. EPIDEMIOLOGICAL THINKING: MEASURING DIS-EASE OCCURRENCE WITH NUMERATORS and DENOMINATORS (the hourglass)

ALL epidemiological measures of dis-ease occurrence begin with a Denominator - the number of people in the study population being investigated - and then a Numerator - the number of people from the denominator population in whom dis-ease has occurred. An hourglass illustrates these two essential components of all epidemiological measures of dis-ease occurrence. The number of people in the study population at the beginning of an epidemiological study is represented by the number of grains of sand in the top bulb of the hourglass, before any sand has flowed to the lower bulb. The number of people in whom dis-ease occurs is represented by the number of grains of sand that fall into the lower bulb.

The key requirement of an epidemiological study is that the dis-ease outcomes counted in the numerator must come from a defined denominator population, just as the sand in the bottom bulb must come from the top bulb. That's why all well-conducted epidemiological studies begin by defining the denominator population.

Most epidemiological studies measure dis-ease occurrence in several populations (or sub-populations or groups) and the study objective is to determine if the occurrence in each population differs and why it differs.

Epidemiology is the study of the occurrence of dis-ease in populations:

    Occurrence of dis-ease = Number of persons in whom dis-ease occurs (numerator) ÷ Number of persons in study population (denominator)

Note: see section 1.4 for details of different measures of occurrence related to the timing of measurements.

What differentiates epidemiology from other health-related sciences is its starting point - the population or denominator. While all health sciences study dis-ease, they all begin with a different focus; perhaps a diseased cell, tissue, organ, or person. As all epidemiological studies involve the calculation of the occurrence of dis-ease in populations, epidemiological thinking always involves the question: what's the denominator?


1.3. THE GATE FRAME: THE SHAPE OF ALL EPIDEMIOLOGICAL STUDIES

The GATE frame (Figure 1.1) illustrates the component parts of all epidemiological studies.

The triangle represents the Participant (or study) Population (P), e.g. 2000 men. While the GATE triangle is the overall study denominator, it is usually divided into two or more study-specific denominators.

The circle represents the study-specific Denominators (we call them groups). In its simplest version, the GATE circle is divided into one Exposure Group (EG) & a Comparison Group (CG), for example: EG = 400 men exposed to smoking & CG = 1600 unexposed to smoking (i.e. non-smokers). EG & CG are the actual denominators used in the calculations of dis-ease occurrence (e.g. in smokers & non-smokers). Some studies have multiple Exposure Groups.

The square represents the Numerators (or Dis-ease Outcomes (O)), e.g. lung cancer. The GATE square is divided into 4 cells: a are the people from EG in whom dis-ease occurs & c are those from EG who don't get dis-ease, while b are people from CG in whom dis-ease occurs & d are those from CG who don't get dis-ease, during the time over which the study is conducted.

The horizontal & vertical arrows represent the Time when or during which outcomes are measured; this is discussed in the next section.

Measures of dis-ease occurrence normally use a and b as the numerators - the people with dis-ease. We call the occurrence of dis-ease in the Exposure Group the Exposure Group Occurrence or EGO (EGO = a/EG) and the occurrence of dis-ease in the Comparison Group is called the Comparison Group Occurrence or CGO (CGO = b/CG). One could measure the occurrence of no dis-ease in EG (= c/EG) and CG (= d/CG) and this is done in some studies, particularly diagnostic test accuracy studies (discussed later).
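To make the arithmetic concrete, here is a minimal Python sketch of these calculations; the cell counts are invented for illustration and are not from any real study:

    # Hypothetical GATE frame counts for the smoking / lung cancer example (illustration only)
    EG = 400          # men exposed to smoking
    CG = 1600         # men not exposed to smoking (non-smokers)
    a = 36            # people from EG in whom dis-ease occurs (invented)
    b = 16            # people from CG in whom dis-ease occurs (invented)
    c = EG - a        # people from EG who don't get dis-ease
    d = CG - b        # people from CG who don't get dis-ease

    EGO = a / EG      # Exposure Group Occurrence
    CGO = b / CG      # Comparison Group Occurrence
    print(f"EGO = {EGO:.3f}, CGO = {CGO:.3f}")   # EGO = 0.090, CGO = 0.010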

Some exposures and outcomes don't naturally fit into yes/no categories like the smoking (yes/no) and lung cancer (yes/no) example described above. Instead, the study exposure might be something like the amount of salt consumed and the study outcome might be 'blood pressure levels'. In this example, both the exposure (salt consumption) and outcome (blood pressure level) are known as numerical measures rather than categorical measures (see section 1.1 for the definition of numerical compared to categorical data).

Figure 1.1. An epidemiological study with one Exposure Group (EG), a Comparison Group (CG) & a categorical (yes or no) Outcome. [Diagram: the GATE frame. The triangle is the Participant Population P (overall DENOMINATOR); the circle is divided into the Exposure & Comparison Groups, EG & CG (study DENOMINATORS); the square holds the Outcomes (NUMERATORS) in four cells - a & b (outcome: yes) and c & d (outcome: no); T marks the Time arrows.]


Some numerical measures can be changed into categorical measures. In the example above, salt consumption could be divided into two or more categories (e.g. high and medium or low intake), in which case the study exposure and comparison groups (i.e. the denominators) could be people with high salt intake (EG in Figure 1.1) and people with medium or low salt intake (CG). Similarly, blood pressure levels could be divided into high and medium or low blood pressure categories, in which case the study outcome groups (i.e. the numerators) would be people with high blood pressure (a and b in Figure 1.1) and people with medium or low blood pressure (c and d).

    When numerical data on salt intake and blood pressure are changed into categorical measures, it is possible to calculate the occurrence of high (or low) blood pressure in people with high salt intake (EGO) and the occurrence of high (or low) blood pressure in people with low salt intake (CGO).

An alternative to dividing numerical outcome measures into two categories in order to calculate dis-ease occurrence is to calculate the average level of the outcome in EG and CG. In the salt and blood pressure example, salt intake could still be categorically classified into high and medium or low intake categories, but the blood pressure level is classified numerically by calculating the average (or mean) blood pressure in each group. The calculated average blood pressure is also considered to be a measure of dis-ease occurrence and simply involves adding together (i.e. summing) the outcome measure (e.g. a blood pressure level) for every person in EG (e.g. people with a high salt intake), then dividing by the total number of people in EG to determine the average blood pressure level. The same calculation is done for CG (e.g. people with a medium or low salt intake). So EGO = a/EG and CGO = b/CG.
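A short Python sketch of this 'average outcome per group' calculation, using invented blood pressure readings rather than data from the Notes:

    # Systolic blood pressures (mmHg) in each salt-intake group - invented values for illustration
    high_salt_bp = [148, 152, 139, 160, 145]        # EG: high salt intake
    low_salt_bp = [128, 135, 122, 130, 141, 126]    # CG: medium or low salt intake

    EGO = sum(high_salt_bp) / len(high_salt_bp)     # mean blood pressure in EG
    CGO = sum(low_salt_bp) / len(low_salt_bp)       # mean blood pressure in CG
    print(f"Mean BP: EG = {EGO:.1f} mmHg, CG = {CGO:.1f} mmHg")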

    Many exposures and dis-ease outcomes can be classified into more than two categories such as high / moderate / low salt intake and high / moderate / low blood pressure, which would just involve adding additional vertical and horizontal dividers to the GATE frame circle and square and involve additional calculations of dis-ease occurrence. In this example there would be two exposure groups (EG1 = the group with high salt intake and EG2 = the group with moderate salt intake), while the comparison group would be CG = the group with low salt intake.

    In the ideal study using categorical data, everyone in EG and CG ultimately gets classified as either having a dis-ease outcome (a or b) or not having dis-ease (c or d). Therefore the number of people in EG should equal the number of people in a & c. Similarly the number of people in CG should equal the number of people in b & d.

Some studies involve both numerical exposure measures (e.g. salt intake classified numerically) and a numerical outcome measure (e.g. blood pressure levels). Associations between numerical exposures and numerical outcomes are described by calculating correlation coefficients. Notes in purple are not required reading for POPHLTH 111.
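For readers who want to see what a correlation coefficient calculation looks like, here is a minimal sketch with invented paired measurements (this goes beyond what the Notes require):

    # Paired numerical measurements on the same people - invented values for illustration
    salt_g_per_day = [4.1, 6.0, 7.5, 9.2, 10.8, 12.0]
    systolic_bp = [118, 124, 131, 137, 146, 151]

    n = len(salt_g_per_day)
    mean_x = sum(salt_g_per_day) / n
    mean_y = sum(systolic_bp) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(salt_g_per_day, systolic_bp))
    var_x = sum((x - mean_x) ** 2 for x in salt_g_per_day)
    var_y = sum((y - mean_y) ** 2 for y in systolic_bp)
    r = cov / (var_x * var_y) ** 0.5    # Pearson correlation coefficient
    print(f"r = {r:.2f}")               # close to +1 because the invented values rise together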


1.4. THE TWO EPIDEMIOLOGICAL MEASURES OF DIS-EASE OCCURRENCE: INCIDENCE & PREVALENCE

It is often possible, and useful, to observe when the transition from no dis-ease to dis-ease occurs (e.g. when a heart attack occurs) and epidemiologists often count the number of events that occur over a period of time. In other situations it is much easier and more useful just to determine if, not when, the transition has occurred, such as classifying someone as overweight or having diabetes. For example it would be very difficult to observe a person transitioning from a normal to an overweight state or from no diabetes to diabetes, but it is much easier to observe how many people are overweight, or have diabetes, at one point in time.

    To address these different situations, epidemiology uses two key measures of dis-ease occurrence - incidence and prevalence. They are differentiated by the timing of the measures. We illustrate this using the analogy of the Incidence rain dropping into the Prevalence pool.

Incidence is calculated by counting the number of onsets of dis-ease occurring during a period of time (the numerator - analogous to the number of raindrops falling into the pool, say, over one hour) and then dividing the numerator by the number of people in the study population (the denominator - analogous to the total number of possible raindrops in the Population cloud). The analogy of the raindrops is to illustrate that the dis-ease onsets (raindrops hitting the pool) can be observed when they occur and so it is possible to count the number of onsets that occur during a specified time period. Incidence is the most appropriate measure of dis-ease occurrence for dis-eases that have an observable and obvious onset (e.g. heart attacks occurring over 5 years among 1000 smokers compared to heart attacks occurring over 5 years in 1000 non-smokers; which could be reported as the number of heart attacks per 1000 people over 5 years). Of note, incidence measures require the dis-ease outcome to be a categorical (e.g. yes / no) variable.

We use the vertical arrow in the GATE frame (Figure 1.1) and the raindrops falling into the pool (see the raindrops diagram) to represent incidence measures of outcomes. Incidence is usually presented as the proportion (or percentage) of people from the study population (or more commonly from the exposure or comparison groups within the study population) in whom a dis-ease event occurs during a specified time period.

[Diagram: the Population Cloud sits above the Prevalence pool; Incidence raindrops fall from the cloud into the pool, the Cure cloud evaporates water out of the pool, and the Death drips drain water from it.]


Most epidemiological textbooks differentiate between two slightly different measures of incidence - Incidence Proportion and Incidence Rate. Incidence proportion (discussed above) is also known as cumulative incidence or more commonly simply as risk. Incidence Proportion counts everyone who started the study in the denominator and everyone who has a dis-ease onset during the study time period in the numerator. The Incidence Rate is a more exact measure of incidence as it only counts participants in the denominator for the time they remain in the study. To achieve this, participants are counted in units of person-time in the study. For example participants remaining in the study for 10 years contribute 10 person-years each to the denominator, while a person who dies 2 years into the study contributes 2 person-years and another who decides to leave the study after 5 years contributes 5 person-years. In practice, unless the study has a very long follow-up period or a high loss to follow-up or a very high event rate (all of which are uncommon), there will be little difference between the Incidence Rate and Incidence Proportion and the terms can be used interchangeably, as long as the time period over which events are counted is always specified. The incidence proportion is easier to calculate because the denominator is simply the number of persons who started in the study rather than person-time. For simplicity, the GATE Notes only use Incidence Proportion (usually known as Risk).
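The difference between the two measures can be sketched as follows; the follow-up times are invented simply to show the arithmetic:

    # Five participants followed for up to 10 years: (years in study, had a dis-ease onset) - invented
    participants = [(10, False), (10, True), (2, True), (5, False), (10, False)]

    n_events = sum(1 for _, event in participants if event)
    n_people = len(participants)
    person_years = sum(years for years, _ in participants)

    incidence_proportion = n_events / n_people    # everyone who started counts in the denominator
    incidence_rate = n_events / person_years      # denominator is person-time actually observed
    print(f"Incidence proportion = {incidence_proportion:.2f} over 10 years")    # 0.40
    print(f"Incidence rate = {incidence_rate:.3f} per person-year")              # 0.054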

Incidence is typically calculated separately in the exposure and comparison groups in the study population (i.e. in EG and in CG; see box below). Therefore EG and CG are considered to be separate denominators within the initial overall study population.

Incidence = Number of persons with dis-ease onsets (Numerator) ÷ Total number of persons (Denominator), during the study Time

Incidence in EG* (EGO**) = (number of 'yes' outcomes from EG ÷ number in EG) during time T, or = [a ÷ EG] = [a ÷ (a + c)] during time T

Incidence in CG* (CGO**) = (number of 'yes' outcomes from CG ÷ number in CG) during time T, or = [b ÷ CG] = [b ÷ (b + d)] during time T

* EG is the acronym for the Exposure Group - the people exposed to the factor being studied - and CG is the Comparison Group - the people not exposed. ** EGO is the acronym for Exposure Group Occurrence & CGO for Comparison Group Occurrence. In this example Incidence is the measure of Occurrence.


Prevalence is the alternative measure of dis-ease occurrence to incidence and is calculated by counting the number of people with dis-ease at one point of time (the numerator) and then dividing by the number of people in the study group at that point in time (the denominator). Prevalence is the measure of dis-ease occurrence that is used for dis-eases that develop so slowly that the actual time they occur cannot easily be observed or measured. Examples include diabetes, raised blood pressure level or being overweight - none of which have an easily observable (and therefore measurable) time of onset. This is illustrated in the drizzle diagram by replacing the Incidence raindrops - that can be counted - by Incidence drizzle - that cannot easily be counted. However the drizzle, like the raindrops, falls into the pool. So if you cannot count the raindrops as they fall (Incidence), you can count the amount of drizzle that has fallen into the pool, by measuring how much water is in the pool (Prevalence). For example a prevalence study of diabetes would involve identifying a study population, measuring their fasting blood glucose levels (at one point in time), and calculating the proportion of this population who have a glucose level high enough for them to be diagnosed with diabetes. This proportion with diabetes is known as the prevalence of diabetes at the point in time that the measurements were taken. You do not know when they developed diabetes, but you know that at a particular point in time a certain proportion of the study population have diabetes.
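A minimal sketch of that point-prevalence calculation, with invented glucose values and an assumed diagnostic cut-off:

    # Fasting blood glucose (mmol/L) measured once in a small study population - invented values
    glucose = [5.1, 7.8, 6.2, 9.4, 5.6, 7.2, 4.9, 8.1]
    diabetes_cutoff = 7.0    # assumed cut-off for this illustration

    with_diabetes = sum(1 for g in glucose if g >= diabetes_cutoff)
    prevalence = with_diabetes / len(glucose)
    print(f"Point prevalence of diabetes = {prevalence:.2f} ({with_diabetes}/{len(glucose)})")   # 0.50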

    We often measure the prevalence of a dis-ease at two points of time (analogous to measuring the amount of water in the pool at two time points) and calculate the change in prevalence. The difference in prevalence between the two time points is in fact a measure of the incidence of dis-ease over the time period between the two time points.

[Diagram: as above, but the countable Incidence raindrops are replaced by Incidence drizzle falling from the Population Cloud into the Prevalence pool, with the Cure cloud and Death drips unchanged.]

Some dis-eases that do have observable onsets are still best measured as prevalence if the signs and symptoms come and go frequently, such as asthma attacks. People with asthma often have multiple asthma attacks of different severity and different frequency, so measuring the incidence of asthma attacks is very difficult and not very useful information. For example in a population of 100 people, one person may have 10 attacks over 1 year, another may have 7 attacks and another may have 20 attacks, while 97 of the 100 people have no attacks. If you add up the attacks (10 + 7 + 20 = 37) and divide by the population (100) you would calculate the incidence of asthma attacks as 37/100/year, but this is not a very good summary of the dis-ease frequency in the population because only 3 of the 100 people had any asthma attacks. In this situation, it is more useful to first define asthma, for example, as a condition involving at least two attacks in a one-year period that are severe enough to limit usual activities, rather than measuring the exact number of asthma attacks. Therefore a diagnosis of asthma is made if a person has, for example, had two or more asthma attacks, severe enough to limit normal activities, in the previous one-year period. Once we have defined what asthma is, we then measure the prevalence of asthma as the proportion of a group of people who, at the time of asking, have had at least two severe asthma attacks in the previous one-year period. So this measure of prevalence is done in two steps: i. asthma is diagnosed in people with a minimum number of attacks over the previous one-year period, which has some aspects of a measure of incidence because it depends on events that happen over time; ii. the prevalence of asthma (as diagnosed in step i) is calculated at a point of time (i.e. the end of the one-year period). We do not use a count of the number of attacks in the prevalence calculation, as we would if we were measuring incidence. Instead we use a count of the number of people who had at least two attacks over the previous one-year period in the calculation.
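The arithmetic in this asthma example can be sketched as follows, using the attack counts from the text and assuming, for the sketch, that all the counted attacks were severe enough to qualify:

    # Asthma attacks over one year in a population of 100 people (counts from the example above)
    attacks = [10, 7, 20] + [0] * 97    # 3 people with attacks, 97 people with none

    incidence_of_attacks = sum(attacks) / len(attacks)                     # attacks per person per year
    period_prevalence = sum(1 for n in attacks if n >= 2) / len(attacks)   # proportion with >= 2 attacks

    print(f"Incidence of attacks = {incidence_of_attacks:.2f} per person per year")   # 0.37
    print(f"Period prevalence of asthma = {period_prevalence:.2f}")                   # 0.03, i.e. 3%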

    The prevalence pool analogy is used to illustrate that prevalence is a static measure of dis-ease status like the amount of water in a pool at a point in time. It is important to appreciate that the size of the Prevalence pool depends on both the rate that raindrops fall into the pool (Incidence) and on how much water is lost from the pool. People can leave the prevalence pool either by dying as illustrated by water dripping out (the death drips) or if they are cured as illustrated by water evaporating from the pool (the cure cloud). Therefore a population with a high incidence of dis-ease could have a low prevalence if the death rate or cure rate is also high. Alternatively a population with a low incidence of dis-ease could have a high prevalence of dis-ease, if almost no one dies of the dis-ease or is cured. Prevalence is a measure of the amount (i.e. the burden) of dis-ease in a population at a point in time and is very relevant to funders and planners of health services. However it is a less useful measure than incidence for investigating causes of dis-ease occurrence because as discussed, a high incidence of dis-ease could result in either a high or low prevalence depending on the death rate and cure rate.

As described above, prevalence can be calculated as a proportion for a categorical outcome such as diabetes (e.g. diabetes prevalence in 60-70 year old Maori women is 35%). For a numerical outcome such as blood cholesterol, an average or mean value is usually calculated (e.g. the mean cholesterol in patients waiting for heart surgery is 6.5 mmol/L). This is done by adding up the cholesterol values of all participants (i.e. the sum) and then dividing by the number of participants. Alternatively, numerical outcomes can be reclassified as categorical outcomes (e.g. high or low cholesterol levels) to calculate prevalence as a proportion. Using the example above, the prevalence of high blood cholesterol (say, > 6.5 mmol/L) in patients waiting for bypass surgery could be 500 per 1000 patients (50%).

A horizontal arrow is used in the GATE frame (Figure 1.1), and the prevalence pool in the raindrops diagram, to represent prevalence measures of outcomes at a point in time.

Prevalence = Number(1) or sum(2) of dis-ease states (Numerator) ÷ Total number of persons (Denominator), at a point in time

Prevalence in EG (EGO*) = number of 'yes' outcomes from EG ÷ number in EG, or = a ÷ EG
Prevalence in CG (CGO*) = number of 'yes' outcomes from CG ÷ number in CG, or = b ÷ CG


(1) If the numerator is a count of categorical (yes/no) dis-ease states (e.g. diabetes), then prevalence will be a proportion (often expressed as a percentage %).
(2) If the numerator is the sum of the scores for a numerical outcome measured on everyone (e.g. blood pressure), then prevalence will usually be expressed as a mean (or average).
* As with incidence, in the GATE Notes we use the generic term Exposure Group Occurrence (EGO) to describe the prevalence in the exposure group and Comparison Group Occurrence (CGO) for prevalence in the comparison group.

Most prevalence calculations are measures of point prevalence because the presence of dis-ease is measured at one point in time. To calculate point prevalence, the total number of persons with the dis-ease of interest (e.g. diabetes; defined by a blood test) at a point in time is divided by the number of people in the population being studied at the same point that the dis-ease was measured.

    Some prevalence calculations are measures of period prevalence rather than point prevalence, as in the asthma attack example discussed above. To calculate period prevalence, the total number of persons with dis-ease (e.g. asthma; defined as having more than 2 major asthma attacks in the previous one-year period) is divided by the number of people in the population being studied at the end of the one-year period over which asthma attacks were observed. Period prevalence calculations do not include the actual number of episodes in the actual calculation, so whether a person has three episodes or ten episodes, they are only counted once in the numerator.

Period prevalence is really a mix of incidence and prevalence because it initially involves defining the presence of dis-ease based on the number of onsets that have occurred over a period of time, and then converting this into a single measure of dis-ease at a point in time. Differentiating between incidence and period prevalence measures of occurrence can be difficult and confusing in some situations and it may also be possible to measure both. You should always ask yourself: which measure of dis-ease occurrence will give me the most meaningful answer to my specific question?


    1.5. COMPARING DIS-EASE OCCURRENCES IN DIFFERENT GROUPS & POPULATIONS: EPIDEMIOLOGY IS ALL ABOUT COMPARING EGO AND CGO

Most epidemiological studies are designed to investigate whether there are differences in dis-ease occurrence (or dis-ease risk) between exposure and comparison groups within a study population. Using the GATE terminology, most epidemiological studies are designed to compare the Exposure Group Occurrence (EGO) with the Comparison Group Occurrence (CGO). Differences between EGO and CGO (e.g. EGO = road traffic injury incidence among drinkers and CGO = road traffic injury incidence among non-drinkers) provide insights into the effect of the study exposure (e.g. drinking alcohol) on the dis-ease outcome (e.g. road traffic injury).

This difference in dis-ease occurrence between an exposed group and an unexposed (or comparison) group is commonly called an estimate of effect (or effect estimate).

    In most situations it is important to measure the occurrence of dis-ease both in the group of people who are exposed (EG) and in the group who are not exposed (CG) because the comparison group is seldom dis-ease-free. For example, not all road traffic injuries in drunk drivers will be caused by alcohol, because many non-drinkers also have road traffic injuries. It is the difference in the occurrence of road traffic injuries between drunk and sober drivers that better indicates the effect of alcohol consumption on road traffic injury rates. This important principle is often overlooked in the reporting of associations between exposures and dis-ease, and the reporting of the number of road traffic injuries caused by alcohol in the lay press is often wrongly assumed to be the same as the number of road traffic injuries among drunk drivers.

Another common example of this kind of misinterpretation of associations between exposures and dis-ease is the apparent link between drugs and common side effects. If a person gets muscle pain soon after starting a new drug, both the patient and their doctor may assume the drug caused the muscle pain. But as muscle pain is very common, the link may be spurious. The best way to determine if a drug causes muscle pain is to compare the occurrence of muscle pain in people (EGO) who take the drug with the occurrence of muscle pain among those who don't (CGO).

While comparisons of dis-ease occurrence are typically called estimates of effect of an exposure (e.g. alcohol) on a dis-ease outcome (e.g. injury), it is usually more appropriate to initially describe them as estimates of association between an exposure and an outcome. The term effect suggests a causal relationship between the exposure and outcome but there are several further steps required before one can be reasonably confident that an association is causal. The first step is to determine whether there are any important errors in the measurements (see Chapter 2 of these Notes). The second step is to consider where on the causal pathway or in the causal mix, the exposure is situated and then to consider other evidence supporting a causal link. For simplicity, in the GATE Notes, we use estimates of effect and estimates of association interchangeably.

There are two main ways to compare two dis-ease occurrences: i. the exposure group occurrence can be divided by the comparison group occurrence (EGO ÷ CGO) to produce a Ratio of Occurrences; or ii. the comparison group occurrence can be subtracted from the exposure group occurrence (EGO - CGO), or vice versa (CGO - EGO), to produce a Difference in Occurrences.

Comparisons of occurrences are commonly called comparisons of risks. So a ratio of occurrences (EGO ÷ CGO) may be called a risk ratio or more commonly a relative risk (RR), while a difference between occurrences (EGO - CGO) is commonly called a risk difference (RD). The risk difference is often referred to as an absolute risk to distinguish it from a relative risk, but it would be more appropriate to use the term absolute risk difference (shortened to risk difference) because all measures of risk or occurrence (i.e. incidence and prevalence) are absolute risk measures. It is only when you divide one risk by another that you produce a relative risk. So the ratio of two absolute risks is a relative estimate of the difference between two absolute risks while the risk difference is an absolute estimate of the difference between two absolute risks.

[Diagram: two pairs of columns - EGO = 20 units beside CGO = 10 units, and EGO = 8 units beside CGO = 4 units.]

The picture shows two pairs of columns to illustrate the differences between the two ways of comparing dis-ease occurrences. The heights of the columns represent the magnitudes of the occurrences (we shall call them risks here). The first pair of risks is EGO = 20 units and CGO = 10 units while the second pair is EGO = 8 units and CGO = 4 units. The ratio of the column heights (risks) is the same for both pairs (20 units ÷ 10 units & 8 units ÷ 4 units, both = 2) but the differences in the heights are not the same (20 units - 10 units = 10 units, 8 units - 4 units = 4 units). Note that estimates of relative risk have no units (because they are relative), but estimates of risk difference have the same units as EGO and CGO.
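A minimal sketch of these two comparisons, using the column heights from the diagram:

    def relative_risk(ego, cgo):
        return ego / cgo     # ratio of occurrences (no units)

    def risk_difference(ego, cgo):
        return ego - cgo     # difference in occurrences (same units as EGO and CGO)

    for ego, cgo in [(20, 10), (8, 4)]:
        print(f"EGO = {ego}, CGO = {cgo}: RR = {relative_risk(ego, cgo):.1f}, "
              f"RD = {risk_difference(ego, cgo)} units")
    # Both pairs give RR = 2.0, but the risk differences are 10 units and 4 units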

    RISK RATIO or RELATIVE RISK (RR = EGO CGO): By convention, in the GATE Notes the relative risk is calculated by dividing the dis-ease occurrence in the exposure group (EGO) by the occurrence in the comparison group (CGO) but it can be calculated by dividing CGO by EGO so always make sure you explain a relative risk by stating that the risk in one (specified) group is, say 2 times higher than in another (specified) group. A relative risk can be any number greater than zero. If there is no difference in the risk or occurrences of a dis-ease between the two groups being compared (i.e. EGO = CGO), then the relative risk = 1.0 (i.e. when the RR = 1, there is no difference in the effect of E and C on the study outcome; this is often known as the no-effect value).

Relative Risk* (RR) = Exposure Group Occurrence** [EGO] (or EG Risk) ÷ Comparison Group Occurrence** [CGO] (or CG Risk)

* The terms Risk Ratio and Relative Risk can be used interchangeably. ** If dis-ease occurrence measures are calculated as means or averages (e.g. mean quality of life scores in people taking different anti-depressant drugs), then the relative comparison of two mean scores would yield a relative mean (RM).

A relative risk that is less than 1.0 can also be expressed as a Relative Risk Reduction (RRR) because it is reduced below 1.0. The RRR is usually expressed as a percentage and is calculated by subtracting the relative risk from 1.0 and then multiplying by 100 (see box below).

For example if the risk of heart attacks in people taking a cholesterol-lowering drug relative to people not taking the drug is 7/10 (i.e. RR = 0.7), then the RRR = (1.0 - 0.7) x 100 = 30%. In other words the cholesterol-lowering drug takers have a 30% lower risk of heart attacks relative to the non-drug takers, suggesting that the drug lowers the risk of heart attacks.

Similarly if the relative risk is greater than 1.0, it can be expressed as a Relative Risk Increase (RRI). The RRI is usually expressed as a percentage increase, calculated by subtracting 1.0 from the relative risk and then multiplying by 100. For example if the risk of heart attacks in smokers relative to non-smokers is 2/1 or RR = 2.0, then the RRI = (2.0 - 1) x 100 = 100%. In other words smokers have a 100% higher risk of heart attacks relative to non-smokers, suggesting that smoking increases the risk of heart attacks.
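These two conversions can be sketched directly from the definitions:

    def rrr(rr):
        # Relative Risk Reduction (%) for a relative risk below 1.0
        return (1.0 - rr) * 100

    def rri(rr):
        # Relative Risk Increase (%) for a relative risk above 1.0
        return (rr - 1.0) * 100

    print(round(rrr(0.7), 1))   # 30.0  -> a 30% lower risk in the exposed (treated) group
    print(round(rri(2.0), 1))   # 100.0 -> a 100% higher risk in the exposed group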

Relative risk reduction (RRR) = (1 - RR) x 100%. e.g. if RR = 0.6, RRR = (1 - 0.6) x 100 = 40%
Relative risk increase (RRI) = (RR - 1) x 100%. e.g. if RR = 1.6, RRI = (1.6 - 1) x 100 = 60%

The odds ratio (OR) is another measure used to compare risks. It is the only estimate of effect that can be derived from case-control studies, although ORs can be calculated in any epidemiological study. The odds ratio is similar to the relative risk in most circumstances but as dis-ease risk becomes more common in the study population, the difference between these two estimates of effect increases. If less than about 15 - 20% of people in the study population develop the dis-ease during the study period, then the difference between the odds ratio and relative risk is of little relevance. So introductory level readers should think of the odds ratio as equivalent to a relative risk.

In the GATE frame shown in Figure 1.1, the Odds Ratio is calculated as (a ÷ b) ÷ (c ÷ d), which is the ratio of the odds of being exposed among people who have the study dis-ease outcome (i.e. those in a or b) to the odds of being exposed among people who don't have the study dis-ease outcome (i.e. those in c or d). Mathematically the Odds Ratio [ (a ÷ b) ÷ (c ÷ d) ] can be rewritten as (a ÷ c) ÷ (b ÷ d). This latter equation is very similar to the Relative Risk [ i.e. RR = {a ÷ (a + c)} ÷ {b ÷ (b + d)} ] if a is small relative to c and b is small relative to d (i.e. when the dis-ease outcome is uncommon).
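The sketch below uses invented 2x2 cell counts for an uncommon outcome (about 2-4% of each group) to show how close the odds ratio and relative risk are in that situation:

    # Invented 2x2 cell counts with an uncommon outcome
    a, b = 40, 20       # outcome 'yes' in EG and CG
    c, d = 960, 980     # outcome 'no' in EG and CG

    RR = (a / (a + c)) / (b / (b + d))    # relative risk
    OR = (a / c) / (b / d)                # odds ratio, equivalent to (a * d) / (b * c)
    print(f"RR = {RR:.2f}, OR = {OR:.2f}")    # RR = 2.00, OR = 2.04 - nearly identical here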


Risk Difference* (RD) = Exposure Group Occurrence** [EGO] (or EG Risk) MINUS Comparison Group Occurrence** [CGO] (or CG Risk)

* The RD is an Absolute Risk Reduction (ARR) if the risk is lower in the exposure group or an Absolute Risk Increase (ARI) if the risk is higher in the exposure group. ** If dis-ease occurrence measures are calculated as means (i.e. averages), the difference between two means is a mean difference (MD).

If the exposure in a study is a treatment (e.g. a drug) and there is a treatment benefit (i.e. the dis-ease occurrence [or risk] in the treatment group is less than in the comparison group: EGO < CGO), then the risk difference can also be expressed as the Number of people Needed To be Treated to prevent one event (abbreviated to Number Needed to Treat or NNT). For example, if the risk of death over 5 years in patients with breast cancer treated with surgery plus chemotherapy (EGO) is 20/100, compared to a risk of 25/100 in patients receiving surgery only (CGO), the RD is -5/100 (i.e. EGO - CGO = 20/100 - 25/100 = -5/100). This means that for every 100 people treated with surgery & chemotherapy compared with surgery alone, there will be 5 fewer deaths over 5 years - the same as stating that for every 20 people treated with surgery plus chemotherapy compared with surgery alone, there will be one less death over 5 years. The number 20 is called the number needed to treat to benefit one person. The NNT is the reciprocal of the risk difference (i.e. NNT = 1 ÷ RD) which in this example is 1 ÷ (5/100) = 100/5 = 20.
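The NNT arithmetic from the breast cancer example, as a short sketch:

    EGO = 20 / 100    # 5-year risk of death with surgery plus chemotherapy
    CGO = 25 / 100    # 5-year risk of death with surgery only

    RD = EGO - CGO         # -0.05, i.e. 5 fewer deaths per 100 people treated over 5 years
    NNT = 1 / abs(RD)      # number needed to treat to prevent one death over 5 years
    print(f"RD = {RD:.2f}, NNT = {NNT:.0f}")    # RD = -0.05, NNT = 20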

    If the risk difference indicates that the treatment is harmful then the NNT will be the Number of people Needed to Treat to cause harm to one person. This is often called the Number Needed to Harm or NNH but it would be more accurate to call it the NNT(H), while if there is a benefit it should be called NNT(B).

If the study exposure is a risk factor, like smoking, rather than a treatment, the equivalent measures are the Number Needed to Expose to Benefit or Harm one person, or NNE(B) and NNE(H). Similarly if the exposure is a screening test, the equivalent measure is the Number Needed To Screen (NNS) to correctly (or incorrectly) diagnose one person with dis-ease, or the Number Needed To Screen to prevent or cause one death.

The NNT (or NNE/NNS), like the RD, is very dependent on the Time period specified. For example the NNT to prevent one event in 1 year will be about 5 times the NNT to prevent one event in 5 years.

1.6. PECOT: THE 5 PARTS OF ALL EPIDEMIOLOGICAL STUDIES

The GATE frame shown in Figure 1.1 illustrates the key components of all epidemiological study designs. There are 5 parts to most epidemiological studies: 1. the Participants or study Population; 2. the Exposure Group and 3. the Comparison Group; 4. the dis-ease Outcomes; and 5. the Time when or during which dis-ease outcomes are measured.

    We use the acronym PECOT as a memory aid for describing these 5 parts of epidemiological studies.


The occasional study does not have a comparison group so it's just PEOT, but there is usually an implicit comparison based on other studies, or the Exposure Group can be subdivided by age, gender etc., which in effect creates a Comparison Group.

An alternative acronym to PECOT commonly seen in the clinical epidemiology literature is PICO, where the I stands for Intervention (a treatment) or Indicator (a risk factor or prognostic factor) and Time is not explicitly mentioned. The GATE Notes use the acronym PECOT because it is more generic to all epidemiological studies, whether clinically or public health focussed, experimental or non-experimental. Exposure is the generic epidemiological term for any factor that is used to allocate the study participants into groups. Also the T in PECOT is to remind study appraisers and designers of the importance of the Time point when, or the Time period during which, dis-ease outcomes are measured.

Dotted (not solid) horizontal and vertical lines are used within the GATE frame to indicate that there may be more than two exposure groups and more than two outcome groups. Alternatively either or both exposures and outcomes can be numerical measures.

The Outcomes square with a, b, c & d cells is commonly called a 2x2 table that can be derived from any epidemiological study with dichotomous exposure groups (i.e. exposure / comparison) and dichotomous outcomes (i.e. yes / no).

All types of epidemiological study will hang on the generic GATE Frame with its 5 PECOT components. The main study types are described in Chapter 3.

REFERENCES

1. Rothman KJ. Epidemiology: An Introduction. Oxford University Press; 2002.
2. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA 1992; 268: 2420-5.
3. Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-Based Medicine: How to Practice and Teach EBM. 3rd Edn. Elsevier Churchill Livingstone; 2005. pp 89-90.
4. Morris JN. Uses of Epidemiology. 3rd Edn. Livingstone; 1976.


CHAPTER 2: ERROR IN EPIDEMIOLOGICAL STUDIES (RAMboMAN and Confidence Intervals)

2.1 INTRODUCTION

Chapter 1 has described how to estimate the occurrence of dis-ease (EGO & CGO) and associations between exposures and dis-ease (RR = EGO ÷ CGO and RD = EGO - CGO). Before you accept these estimates as true, it is essential to check whether they are likely to have any important random and non-random errors (i.e. deviations from the truth). This chapter describes the main types of error found in epidemiological studies.

    Errors can occur either due to problems with the study recruitment, design and implementation or due to chance. Errors caused by chance are described as random errors. Errors caused by problems with how the study is designed or conducted are described here simply as non-random errors but are often called biases or systematic errors.

We use the acronym RAMboMAN (from the movie character) to demonstrate where non-random errors can occur in epidemiological studies. For estimates of EGO and CGO - and therefore estimates of RR and RD - to be valid (i.e. to have no important non-random errors), the right people must be included in the right parts of the GATE frame. We then use the 95% Confidence Interval to describe the amount of random error in the study results. The cartoon showing darts in the bull's-eye of a dartboard is used to illustrate error. If each dart represents the result of one of a series of identical studies conducted on different samples from the same population and the bull's-eye represents the truth, then any dart missing the bull's-eye has error - a deviation from the truth.

[Cartoon: darts scattered around the bull's-eye of a dartboard. Reproduced with permission of the copyright owner.]


Figure 2.1. Hanging a study on the GATE frame: PECOT and RAMboMAN (also see Glossary). The original figure is a three-column appraisal form; its content is set out below under the three column headings.

STUDY QUESTION & DESIGN: describe with PECOT | STUDY NUMBERS: hang on GATE frame | STUDY ERROR: assess using RAMboMAN

P = Participants: Setting. Describe: Setting; Eligibility criteria; Recruitment process; % of eligibles who participated.
Numbers: Eligibles n = ____; P n = ____
Recruitment: who was recruited? Setting/eligible population appropriate, given study goals? Participants representative of Eligibles? Participant risk/prognostic profile reported?

EG = Exposed Group [Intervention]; CG = Comparison Group [Control]. Allocated: randomly or by measurement. Describe E and C (how measured if not RCT).
Numbers: EG Allocated = ____; CG Allocated = ____; EG completed follow-up (f/u) = ____; CG completed f/u = ____; EG incomplete f/u = ____; CG incomplete f/u = ____
Allocation (± adjustment) to EG & CG: was it successful? Method of allocation. If allocated randomly: was the process concealed? Were EG & CG similar? If allocated by measurement: was it done accurately? Done before outcomes? Were differences between EG & CG documented?
Maintenance of EG & CG as allocated sufficient? Compliance high, Contamination low? Co-interventions similar in EG & CG? Completeness of follow-up high? Participants & Investigators blind to Exposure?

O = Outcome: Primary (& secondary; include adverse). T = Time when outcomes counted (at what point in time or over what time period). Describe Outcomes & how/when measured.
Numbers: a = ____; b = ____; c = ____; d = ____ (the 2x2 Outcomes square)
Blind and objective Measurements? Outcomes measured accurately?

Study ANalyses (Outcome & Time): EGO = a/EG; CGO = b/CG; RR = EGO/CGO (95% CI); RD = EGO - CGO (95% CI); NNT = 1/RD (95% CI)
ANalyses: Intention to treat (if RCT)? Adjusted if EG & CG different? 95% CI or p-values given?

Summary: Non-random error: amount & direction of bias (RAMboM)? ANalyses done well? Random error sufficiently low / Power (sample size) sufficiently high (if no statistically significant effect demonstrated)? Applicability of findings? Any important adverse effects? Size of effect sufficient to be meaningful (RR &/or RD)? Can it be applied in practice?


GLOSSARY to Figure 2.1

Use this form for questions about: interventions (RCTs & cohort studies), risk factors/causes (cohort & cross-sectional studies) or prognosis (cohort studies). Hang the study on the GATE Frame.

STUDY QUESTIONS/DESIGN: use PECOT & GATE Frame to define the study question & describe the study design.
Setting of study: timing & locations in which the eligible population was identified (e.g. country/urban/hospital).
Eligible Population: those from the study Setting who meet the eligibility (i.e. inclusion / exclusion) criteria.
Recruitment process: how was the eligible population identified from the study setting: what kind of list (sampling frame) was used to recruit potential participants (e.g. hospital admission list, electoral rolls, advertisements)? Who was recruited (e.g. a random sample, consecutive eligibles)?
P: Participants: recruited from eligibles & allocated to EG/CG. How allocated? By randomisation or by measurement?
EG: Exposed Group: participants allocated to the main exposure (or intervention or prognostic group) being studied. If there are multiple exposures, use a new GATE frame for each exposure.
CG: Comparison Group: participants allocated to the alternative (or no) exposure (i.e. control).
Outcome: specified study outcome(s) for analyses. If multiple outcomes, use additional GATE frames.
Time: when were outcomes measured; at one point in time (prevalence) or over a period of time (incidence)?

STUDY VALIDITY (non-random error): use RAMboMAN to identify possible non-random error.
Recruitment: was the setting / eligible population appropriate and well described? If relevant, were participants representative of eligibles? Could the results be applied to relevant populations? This should be able to be determined from the risk factor/prognostic profile of participants. In prognostic studies, were participants at a similar stage in the progression of their disease or condition?
Allocation: how well were participants allocated to E & C? If an RCT, was the randomisation process well described and valid? If randomised, was allocation concealed (i.e. was allocation to exposure/comparison determined by a process independent of study staff or participants)? Was randomisation successful (i.e. were EG & CG similar after randomisation - were baseline characteristics similar in each group)? If not randomised (observational study), were measurements of E & C accurate & done similarly for EG & CG? Were differences between EG & CG documented for later adjustment/interpretation?
Maintenance: did participants remain in the groups (EG or CG) they were initially allocated to? Compliance: % of participants allocated to EG (or CG) who remained exposed to E (or C) during the study? Contamination: % of participants allocated to CG who crossed over to EG (& vice versa if CG is also an exposure)? Co-intervention: other significant interventions received unequally by EG & CG during follow-up? Completeness of follow-up: was it high & similar in EG & CG? Blinding: were participants / investigators blind to whether participants were exposed to E or C?
blind and objective Measurement of outcomes: were outcome assessors blind to / unaware of whether participants were in EG or CG? And were outcomes measured objectively (e.g. biopsies, x-rays, validated questionnaires)?

ANalyses (calculation of occurrence [EGO & CGO] and effect [RR & RD] estimates)
Measures of Occurrence: EGO: Exposed Group Occurrence (either incidence or prevalence measures; also known as the Experimental Event Rate (EER) in RCTs). CGO: Comparison Group Occurrence (or Control Event Rate (CER) in RCTs). Most studies report cumulative incidence or prevalence measures of occurrence and EGO = a/EG & CGO = b/CG. Always document over what time period (cumulative incidence) or at what point in time (prevalence) EGO & CGO were measured.
Measures of Effect (for comparing EGO & CGO): Risk Ratio (RR) = EGO/CGO; more commonly known as Relative Risk. Odds Ratios & Hazard Ratios are similar to RR. Risk Difference (RD) = EGO - CGO; also known as absolute risk difference. NNT (or NNE) = 1/RD; the Number Needed to Treat (or Expose) to change the number of outcomes by one (in a specified time). NNT(B): if the exposure/intervention is BENEFICIAL. NNT(H) or NNH: if the exposure/intervention is HARMFUL.
Intention to treat (or expose) analyses: did the analyses (i.e. calculation of EGO & CGO) include all participants initially allocated to EG & CG, including anyone who dropped out during the study or did not complete follow-up?
Adjusted analyses (for confounders): were EG & CG similar at baseline? If not, were analytical methods used to adjust for any differences, e.g. stratified analyses, multiple regression?
Random error [= 95% Confidence Interval (CI)]: random error in estimates of EGO, CGO, RR, RD & NNT/E is usually assessed by the width of the 95% CI. A wide CI (i.e. a big gap between the upper & lower confidence limits (CL)) = more random error = less precision.
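The GATE Notes do not give a formula for the 95% CI, but as an illustration, one common textbook approximation for the CI around a relative risk (the log method) can be sketched as follows; the 2x2 cell counts are invented:

    import math

    # Invented 2x2 counts: a & c from EG, b & d from CG
    a, c = 30, 970
    b, d = 15, 985

    RR = (a / (a + c)) / (b / (b + d))
    # Standard error of log(RR) - a standard textbook approximation, not from the GATE Notes
    se_log_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(RR) - 1.96 * se_log_rr)
    upper = math.exp(math.log(RR) + 1.96 * se_log_rr)
    print(f"RR = {RR:.2f}, 95% CI {lower:.2f} to {upper:.2f}")    # roughly RR = 2.00, CI 1.08 to 3.69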

STUDY SUMMARY
Non-random error (bias): what was the likely amount & direction of bias in the measures of effect: is bias likely to substantially increase or decrease the observed difference between EGO & CGO (and therefore the effect sizes)?
Random error: would you make a different decision if the real effect was closer to the upper CL than the lower CL?
Power: if the effect sizes were not statistically significant, was the study just too small to show a real difference?
Applicability: if the sizes of the beneficial versus adverse effects are considered meaningful (i.e. sufficiently large benefits versus small harms) & the errors small, are the findings likely to be applicable in practice?

REFERENCE: Jackson et al. The GATE frame: critical appraisal with pictures. Evidence-Based Medicine 2006; 11: 35-38. Also in: Evidence-Based Nursing 2006; 9: 68-71, and in ACP Journal Club 2006; 144: A8-A11.


2.2. NON-RANDOM ERROR (or BIAS or SYSTEMATIC ERROR): RAMboMAN

Figure 2.1 is a tool for designing and critiquing a range of epidemiological studies including randomised controlled trials, cohort studies and cross-sectional studies. The acronym RAMboMAN, down the left hand side and below the GATE frame in the figure, illustrates where non-random error can occur in epidemiological studies. It stands for: Recruitment; Allocation; Maintenance; blind and objective Measurements of outcomes; and ANalyses. The components of RAMboMAN are described below.

R stands for Recruitment. The R question is: who was recruited into the study? AND is it possible to define a group of people or population that the participants represent and that the study findings can be applied to? There are two types of recruitment error that can occur in epidemiological studies.

One type makes it difficult to apply (or generalise) the findings to a wider (or external) population and is also known as an external validity error. This type of recruitment error occurs when the main objective of the study is to measure the characteristics of a specified eligible population (the Eligibles in Figure 2.1), but the participants (P) who are recruited are not representative of the Eligibles. For example, consider a study in which the objective is to measure the prevalence of participation in sport at school among all New Zealand school children (the Eligibles). In this type of study, the investigators must make sure that a representative sample of all New Zealand school children are recruited. The best way to do this is to obtain a list of all school children and choose a random sample of children from the list. If however the investigators only recruited participants from schools that require all children to participate in sport, then the prevalence of sport participation in the study participants will overestimate the true prevalence among all school children in New Zealand, because not all schools expect all children to participate in sport. We call this error a recruitment error (also known as a selection bias).

    Of note, even if the investigators recruit their participants correctly (e.g. a random sample from a list of all New Zealand school children), it is still possible to recruit a non-representative sample just by chance alone, particularly if the sample is small. This is known as a random sampling error and is discussed later in this chapter under Random Error.

In many studies, it is unnecessary to recruit participants who are representative of a specified external population. For example, it is possible to determine that a particular type of antibiotic can cure a particular type of infection in a group of people who are not representative of any particular population. Nevertheless, one should always ask: is sufficient information given about the recruitment process for me to determine if I could apply the results of this study to myself, my patients, or my population?

    The other type of error that is often described as selection (or recruitment) error in

textbooks occurs when many/all of the participants who are allocated to the Exposure Group are recruited from a different source than the participants allocated to the Comparison Group. This is equivalent to the GATE frame having two separate (or only partially overlapping) triangles. For example, in a study investigating whether heavy manual labour reduces the risk of heart disease, the Exposure Group (workers who undertake heavy manual labour) will have to be recruited from a population of labourers, whereas the Comparison Group (sedentary people) could be recruited from, say, office workers or the general population. As heavy manual labourers have to be fit and healthy enough to do manual labour (and many office workers may be too unfit/unhealthy to do heavy manual labour), any association found between manual labour and risk of heart disease may be more related to the characteristics of the people recruited than to the effect of manual labour on the risk of heart disease. This type of error is very different from the recruitment error


    described in the section above and we prefer to consider it as one of the allocation/adjustment errors described in the next section.

To remind readers to consider all the key recruitment issues, the triangle in the GATE frame is divided into 3 overlapping levels (Figure 2.1): i. the open top of the triangle represents the setting in which the eligible population was recruited, for example school children living in New Zealand in 2010; ii. the rest of the triangle, combining the two lower levels, represents the eligible population (i.e. those who meet the inclusion and exclusion criteria [i.e. the eligibility criteria]), for example including children aged 5-9 years but excluding those with significant disabilities; and iii. the tip of the triangle represents those from the eligible population who agree to take part (i.e. the study Participants). Often only a small proportion of the eligible population (e.g. the more healthy or more motivated) agree to participate in a study and, as discussed above, if the study objective is to measure the prevalence of a dis-ease in a community or population, it is important to determine if the participants are similar enough to all people who meet the eligibility criteria. If a substantial proportion of the eligible population do not agree to take part, these people are known as non-responders, and if the non-responders are different from the responders, this can cause a recruitment error (also known as a non-response bias or selection bias). There is no specific level of response (i.e. the response rate) that is considered unacceptable, but a response rate of less than about 70-75% of those invited could cause a significant recruitment error in prevalence studies. The other type of study that requires a well-defined participant population is a prognostic study (e.g. consider a study question like: what is the prognosis (probability of survival or of death) among patients aged 40 to 50 years diagnosed with advanced prostate cancer?).
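The following small worked example (with invented numbers) shows how a low response rate can distort a prevalence estimate when responders differ from non-responders: if children who play sport are more likely to take part in the survey, the observed prevalence overestimates the true prevalence.

    # Invented numbers: 10,000 eligible children, true prevalence of sport participation 60%,
    # but active children are more likely to respond than inactive children.
    eligible = 10_000
    true_prevalence = 0.60

    responders_active = 4_000     # of the 6,000 active children, 4,000 respond
    responders_inactive = 1_000   # of the 4,000 inactive children, only 1,000 respond
    responders = responders_active + responders_inactive

    response_rate = responders / eligible
    observed_prevalence = responders_active / responders

    print(f"response rate = {response_rate:.0%}")
    print(f"true prevalence = {true_prevalence:.0%}, observed prevalence = {observed_prevalence:.0%}")
    # Observed prevalence is 80% versus a true 60%: a non-response (recruitment) error of 20 percentage points.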

A stands for Allocation and Adjustment
The A question is: were the study participants successfully allocated (± adjustment) to the Exposure Group (EG) and the Comparison Group (CG)? In the ideal epidemiological study the participants allocated to the Exposure Group and the Comparison Group would be identical except for the factor (the exposure) being investigated. In many studies this is not possible, but the allocation and adjustment processes are designed to make EG and CG as similar as possible.

There are two common ways of allocating participants to the exposure and comparison groups in epidemiological studies. One way is to allocate participants by a random process to EG and CG. Studies using this random allocation process are called randomised controlled trials (RCTs) and involve the study investigators, in effect, tossing a coin for each participant. If it comes up heads, the participant is allocated to, say, EG and is offered the study exposure (e.g. a drug), and if it comes up tails, the participant is allocated to CG (control or comparison group) and receives a placebo (a non-active tablet that looks identical to the study drug) or perhaps an alternative treatment or nothing. The purpose of the randomisation process is to give all participants an equal chance of being allocated to EG or CG and so produce exposure and comparison groups that are very similar. RCTs are experiments because the study investigators actively experiment on participants by controlling the allocation process. Studies that randomly allocate participants to exposure and comparison groups should not be confused with studies that randomly sample (recruit) participants from a population (discussed under Recruitment above). Occasionally the investigator chooses which participants will receive the exposure, rather than using a random allocation process. Such non-randomised experiments are not recommended because the study investigators may choose to treat particular people they think will benefit most from the study treatment. While this may seem a very reasonable approach, it is almost certain that the people in EG will differ from those in CG, which makes it difficult to separate the effect of the treatment from the effects of other


    attributes of the people selected. This problem, known as confounding, is discussed further below.

    The main alternative approach to randomly allocating participants to EG and CG is to

allocate participants by measurement. Participants are measured to determine if they are exposed to the factor(s) being investigated in the study (e.g. they may be questioned about cigarette smoking or be asked to have a blood test). Participants are then allocated to EG (e.g. the smokers' group) or CG (the non-smokers' group) according to these measurements. In the smoking example, the measurement instrument is usually a questionnaire; however, if the exposure of interest in the study is a blood test result, then the measurement instrument would be a test done in a laboratory. Studies that allocate participants by measurement rather than by a random allocation method are usually called observational studies because participants are observed in order to determine if they are exposed or not exposed and are then allocated to the appropriate exposure/comparison group. In observational studies the exposure and comparison groups are frequently quite different from each other and we usually try to adjust for these differences in the analyses (see below). Therefore it is important to collect sufficient information about the differences between EG and CG that can be used for the adjustments.

    As discussed in the introduction to the Allocation section, in the ideal epidemiological

study, the only difference between participants in EG and in CG is the presence or absence of the exposure (E) being investigated, but this is seldom the case in observational (non-randomised) studies. For example, in a non-randomised study investigating the effect of vigorous leisure time physical activity (E) on the risk of heart attacks, participants who report taking vigorous leisure time activity, say, at least three times per week will be allocated to EG and participants who report less activity will be allocated to CG. In a study like this, it is very likely that the average age of participants in EG will be younger than in CG because more young people are physically active than older people. As heart attacks are also less common in younger than older people, there will be fewer heart attacks in EG than in CG simply because of the difference in average age between EG and CG. In addition, people who take regular leisure-time physical activity tend to smoke less and eat more healthily than non-regular exercisers. So while there may be a real beneficial effect of physical activity on heart attack risk, any differences between the occurrence of heart attacks (EGO) in the physically active group and the occurrence of heart attacks (CGO) in the less active group will be caused by a mix of the effect of physical activity on heart attacks and the effect of age and other factors on heart attacks. This problem of mixing two or more effects (e.g. younger age, physical activity and non-smoking) that are all related to the dis-ease outcome is called confounding. Some epidemiological textbooks state that confounding is caused by a selection bias because it is caused by the methods used to select (i.e. allocate) participants into EG and CG. However it is important not to confuse this type of selection bias with the recruitment error discussed in the Recruitment section above, which is due to the methods used to recruit (or select) participants for the study (the GATE triangle). For these reasons we prefer not to use the term selection bias at all, and instead use the term allocation error as the cause of confounding, because the error arises from how participants were allocated to EG and CG.

The best way to reduce the likelihood of allocation error (and therefore confounding) is to conduct an RCT in which participants are randomly allocated to EG and CG. Randomisation is a very effective allocation method for producing two groups (EG and CG) with similar characteristics. If the study is big enough, random allocation will result in similar numbers of older and younger people, men and women, smokers and non-smokers, etc., in EG and CG. However, in some RCTs, particularly small ones, randomly allocating participants may not produce groups with similar characteristics, just by chance


alone. Therefore it is always important to check for differences between EG and CG at the beginning of a study; this is called a baseline comparison and should be done whether the study has allocated participants by randomisation or by measurement.
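A simple simulation (with invented characteristics and sample sizes) illustrates why the baseline comparison matters most in small trials: with only a few dozen participants, coin-toss allocation can leave EG and CG with quite different proportions of, say, smokers, whereas a large trial usually produces closely balanced groups.

    import random

    # Simulated random allocation, checking baseline balance on one characteristic
    # (smoking, with an assumed prevalence of 30%); all numbers are invented.
    random.seed(42)

    def baseline_imbalance(n_participants, p_smoker=0.30):
        smokers_eg = smokers_cg = n_eg = n_cg = 0
        for _ in range(n_participants):
            smoker = random.random() < p_smoker      # the participant's characteristic
            if random.random() < 0.5:                # the "coin toss" allocation
                n_eg += 1
                smokers_eg += smoker
            else:
                n_cg += 1
                smokers_cg += smoker
        return abs(smokers_eg / n_eg - smokers_cg / n_cg)

    print(f"imbalance with 40 participants:   {baseline_imbalance(40):.1%}")
    print(f"imbalance with 4000 participants: {baseline_imbalance(4000):.1%}")
    # In a run like this the small trial typically shows a noticeably larger imbalance
    # than the large one, purely by chance.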

    Allocation error can still occur in a large RCT if the random allocation process is tampered

with. Consider the scenario of a surgeon who takes part as one of the investigators in an RCT comparing a surgical (E) and a medical (C) treatment to prevent a heart condition. The study protocol requires the surgeon to open a sealed envelope that has a randomly generated instruction to proceed either with surgical (E) or medical (C) treatment. The surgeon opens the envelope and finds an instruction to allocate the patient to the medical group (CG) and so is expected to proceed with medical treatment. However the surgeon may feel that this particular patient is more likely to benefit from surgical than medical treatment, possibly because the patient is young and healthy and will be able to cope with surgery much better than older patients. In this scenario, the surgeon may feel some pressure to reseal the envelope and choose another one, and keep doing this until there is an envelope with instructions to allocate the patient to the surgical group (EG). In this example the effect of the surgery on the outcome will be mixed with the effect of the patient's young age on the outcome, which will cause confounding. This tampering with the randomisation process is believed to have been surprisingly common in the past and simple methods have been developed to prevent it, or at least reduce the chance of it happening. The solution is known as concealment of allocation. There are a number of ways of doing this, but in effect it involves getting a completely independent person to open the envelope, write down the treatment group and the name of the participant, and then instruct the investigator implementing the treatment (e.g. the surgeon) which group the participant will be allocated to. This means that any subsequent tampering with the randomisation process will be obvious, because the independent person has documented the correct group for that participant. In practice, concealment of allocation is usually done by phone, fax or the internet, which further conceals the independent person from the participant and investigator. Another approach, which almost achieves concealment, is to number the envelopes and their contents and require them to be used consecutively. It is usually, but not always, possible to check if this has been done correctly. Studies comparing concealed with unconcealed allocation have shown that they produce quite different results, with unconcealed allocation tending to exaggerate the benefits of the desired treatment. Of note, if the study exposure (E) is a drug and the comparison group exposure (C) is an identical looking tablet, then concealment of allocation is unnecessary as long as no one involved directly with the participants or practitioners can tell the difference between E and C.

    While a large RCT with concealed random allocation is the best way to minimise

    allocation error, randomisation is only possible when the exposure being investigated is considered to be safe (e.g. cholesterol-lowering drugs). You cannot randomise people to smoking or non-smoking groups or to any drugs you believe may be harmful! Therefore in many situations only non-randomised studies are possible. However in studies that allocate participants to EG and CG by measurement rather than by random allocation, it is common to find important differences between the characteristics of participants in each group that may affect the study outcome and cause confounding. A range of methods are used to reduce this confounding, either in the design of the study or in the study analyses.

A also stands for Adjusted analyses
The additional A question is: if there were differences in the characteristics of participants in EG and CG that could affect the study dis-ease outcomes (i.e. confounders), were they adjusted for in the analyses? This question can also be considered as part of AN (ANalyses), the last two letters of RAMboMAN.


Allocation errors can be reduced in the study analyses by dividing participants into, say, older and younger age groups or strata (equivalent to dividing the study participants in the triangle into two triangles and then analysing the data as if there were two separate studies). The results of the analyses in the different strata can then be combined, if they give reasonably similar results. If they give very different results, they should be reported separately. This analytical approach is known as stratified analysis and, in the example given here, the analysis addresses (or adjusts for) allocation error (or confounding) caused by allocating more young people to the exposure group (e.g. frequent physical activity) than to the comparison group (infrequent physical activity). This is the equivalent of age-standardisation, which is commonly done when national populations with different age structures are compared. Stratified analyses are also called adjusted analyses. Other multivariate statistical methods can be used to reduce the amount of confounding by adjusting for multiple differences (e.g. age, smoking, gender, socio-economic status etc.) between EG and CG. These multivariate analyses simultaneously stratify EGO and CGO into multiple comparable strata and a detailed description is beyond the scope of these notes.
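As a rough illustration of stratified (adjusted) analysis, the Python sketch below uses invented counts in which younger participants are both more likely to be physically active and less likely to have heart attacks. The crude risk ratio therefore exaggerates the apparent benefit, while the stratum-specific (age-adjusted) risk ratios give the fairer comparison.

    # Invented counts for a stratified (age-adjusted) analysis.
    # Exposure = frequent physical activity; outcome = heart attack during follow-up.
    strata = {
        "younger": {"events_eg": 8, "n_eg": 800, "events_cg": 4,  "n_cg": 200},
        "older":   {"events_eg": 8, "n_eg": 200, "events_cg": 64, "n_cg": 800},
    }

    # Crude (unadjusted) risk ratio: pool everyone, ignoring age.
    events_eg = sum(s["events_eg"] for s in strata.values())
    events_cg = sum(s["events_cg"] for s in strata.values())
    n_eg = sum(s["n_eg"] for s in strata.values())
    n_cg = sum(s["n_cg"] for s in strata.values())
    print(f"crude RR = {(events_eg / n_eg) / (events_cg / n_cg):.2f}")

    # Stratum-specific risk ratios: analyse each age stratum as if it were a separate study.
    for name, s in strata.items():
        rr = (s["events_eg"] / s["n_eg"]) / (s["events_cg"] / s["n_cg"])
        print(f"{name}: RR = {rr:.2f}")
    # The crude RR is about 0.24, but within each age stratum the RR is 0.50;
    # the difference is the confounding by age that the stratified analysis removes.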

Unfortunately it is never possible to fully adjust for confounding factors, firstly because the confounders are seldom measured perfectly, but secondly, and more importantly, because it is obviously only possible to adjust for the confounding factors that are measured. There will often be some confounding factors, such as a positive, healthy attitude to life, which are very difficult to measure but may have a major effect on aspects of a participant's behaviour that also have an effect on the dis-ease outcome being studied. Therefore it is important not to over-interpret the results of a non-randomised study if you believe there could be important unmeasured or poorly measured confounding factors.

M stands for Maintenance
The M question is: were most of the participants maintained throughout the study in the groups (EG & CG) to which they were initially allocated? In the perfect epidemiological study, once allocated to EG or CG, participants should

remain in their allocated group and: i. maintain their exposure or comparison status throughout the study; ii. not be exposed to other factors that could influence the study outcome(s); and iii. not drop out of the study. If some participants' exposure status changes or some are lost to follow-up, this can introduce a maintenance error. In practice, participants are seldom perfectly maintained in their allocated groups, but as long as any maintenance errors are small and similar in both EG and CG, the error will underestimate the true effect of the exposure on the study outcome(s). This conservative error is usually considered preferable to not knowing whether the error will exaggerate or underestimate the true study effect measures. The best way to keep the degree of maintenance error similar in EG and CG is to keep the participants and study practitioners blind to whether the participant is receiving the study exposure (E) or the comparison exposure (C). This is easier to achieve with drugs (which can be prepared to look identical) than with other interventions like surgery or physiotherapy or diet.

    There are 4 main factors that can cause Maintenance error: compliance, contamination,

    co-intervention and loss-to-follow-up. In some RCTs, participants are required to take an intervention (e.g. a drug or diet) every

    day for a number of months or years. If they comply with these instructions most of the time (e.g. take 80% or more of the tablets prescribed) they are considered to have good Compliance. The level of compliance is often checked periodically in trials and it usually falls over time. If the comparison group is also actively exposed (e.g. receiving an alternative drug), then their compliance should also be assessed. Similarly, exposure


status should be checked periodically in long-term observational studies (e.g. light, moderate or heavy drinking), although this is seldom done and is an important shortcoming of many observational studies, particularly those with very long follow-up periods during which exposure status may change a lot. In studies where both EG and CG receive (different) treatments, compliance can be a problem for both EG and CG.

If participants in CG receive the study treatment that was only meant for EG, this is called

Contamination, because participants in CG are contaminated by the exposure that was meant only for EG. In studies where both EG and CG receive (different) treatments, contamination can go both ways: from EG to CG or from CG to EG.

    Maintenance error can also occur if the exposure and comparison groups are treated

differently during the study in any way (other than being exposed to E or C) that influences the dis-ease outcome. This is more common when the participants or the clinical staff involved are aware of which group (EG or CG) the participants are allocated to (i.e. are not blind to exposure status). For example, in a study comparing the effectiveness of a new blood pressure lowering drug (E) with an older drug (C), participants in the comparison group may have been allocated to a treatment that is usually perceived to be less effective than the new drug being studied. So these participants may ask for, or be more receptive to, advice about lifestyle changes to help reduce blood pressure or may be more likely to try additional interventions. Similarly the clinicians involved may be more likely to provide other interventions to participants allocated to the older therapy. This form of maintenance error is known as Co-Intervention. The best way to reduce co-intervention (and also contamination) is to keep both participants and practitioners blind to the allocated exposure. If neither the participants nor the practitioners are aware of which exposure (or comparison exposure) the participants are receiving (i.e. the exposure status), then the study is called a double-blind study. In studies where only the participants or only the study staff are unaware of participants' exposure status, the study is called a single-blind study.

    Participants who stop taking part or drop out of a study (i.e. are lost to follow-up)

sometime after being allocated to EG and CG, can also cause a maintenance error. The degree of error will be exaggerated if the numbers and characteristics of those who are lost to follow-up differ substantially between EG and CG. One way to address this problem (other than blinding) is to calculate EGO and CGO assuming that all the participants initially allocated to EG and CG are still in their allocated groups. This is called an Intention-To-Treat (ITT) or Intention-To-Expose (ITE) analysis and will reduce any differences between EGO and CGO and therefore underestimate the effects of the exposure being investigated. This conservative error is considered to be preferable to the alternative approach of only including participants who remain in EG and CG in the calculations of EGO and CGO, known as On-Treatment (or per-protocol) analyses. On-Treatment analyses tend to exaggerate differences between EGO and CGO and therefore over-estimate any true effect of interventions. Some studies calculate EGO & CGO using both intention-to-treat and on-treatment methods. If the differences between the two methods are small, then loss-to-follow-up is less likely to be an important cause of error.
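The contrast between intention-to-treat and on-treatment analyses can be shown with a few invented numbers: the sketch below assumes 1000 participants were randomised to each group, that 200 in EG stopped taking the study drug, and that outcomes were less common among those who kept taking it.

    # Invented numbers: intention-to-treat (ITT) versus on-treatment (per-protocol) analysis.
    n_eg_allocated, n_cg_allocated = 1000, 1000
    events_eg, events_cg = 40, 60               # outcomes among everyone initially allocated

    # Intention-to-treat: denominators include everyone initially allocated, drop-outs and all.
    ego_itt = events_eg / n_eg_allocated
    cgo = events_cg / n_cg_allocated

    # On-treatment: only participants who stayed on their allocated exposure (assumed numbers).
    n_eg_on_treatment = 800                     # 200 in EG stopped the drug
    events_eg_on_treatment = 24                 # fewer outcomes among those who kept taking it
    ego_ot = events_eg_on_treatment / n_eg_on_treatment

    print(f"ITT:          EGO={ego_itt:.3f}, CGO={cgo:.3f}, RR={ego_itt / cgo:.2f}")
    print(f"On-treatment: EGO={ego_ot:.3f}, CGO={cgo:.3f}, RR={ego_ot / cgo:.2f}")
    # ITT gives RR = 0.67; the on-treatment RR of 0.50 looks more impressive,
    # illustrating how on-treatment analyses tend to exaggerate any apparent benefit.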

boM stands for blind or objective Measurement of dis-ease outcomes
The boM question is: were the people who measured the dis-ease outcomes unaware of (i.e. blind to) the participants' exposure status, or were these measurements made objectively (using measurement instruments that were not influenced by subjective human factors)? Errors in the measurement of outcomes can result in study participants being classified in

the wrong dis-ease outcome category of the GATE frame (e.g. classified as mild disability


rather than moderate disability, which is often a difficult distinction). If these errors result from deficiencies with measurement methods or instruments (e.g. a questionnaire rather than a blood test for smoking, or a poorly designed questionnaire for diagnosing disability, or perhaps a faulty set of scales that overestimates weight), they will cause a measurement error. While boM as described here relates to errors in the measurement of outcomes, it is similar to errors in the measurement of exposure status in non-randomised studies, where measurement error could result in participants being allocated to the wrong exposure (or comparison) groups (e.g. non-smoking rather than smoking). We refer to this latter error as an allocation error, because it will result in people being allocated to the wrong exposure/comparison group.

Errors in the measurement of outcomes can be reduced in several ways. Knowledge of a

participant's exposure status can influence the participant's or the practitioner's perception or interpretation of signs and symptoms of the study outcome. For example, the results of RCTs of surgery (E) versus physiotherapy (C) for treating limitations in knee movement and pain due to damaged cartilage in the knee joint can be influenced just by the knowledge of which intervention was used. Participants receiving surgery may report greater improvements in movement and less pain than participants receiving physiotherapy, because they may assume that surgery to remove damaged cartilage should be more effective than physiotherapy. As the outcomes being investigated (i.e. range of movement of the knee and pain) are not simple, objective, clear-cut (yes/no) measures, they are more susceptible to influence from subjective factors that may be unrelated to the effectiveness of the treatment. The practitioner who measures the degree of movement in a participant's knee and asks about pain may also be influenced by knowledge of the type of treatment received. One way to reduce this problem is to blind the participants or investigators or both to knowledge of which intervention (exposure) participants received. While it is generally not possible to keep information about surgery from participants, there is a famous blinded study of surgery versus physiotherapy for the knee problem described above. To blind the participants, everyone in the study received a local anaesthetic and a small cut in the skin of the knee. However the actual surgery, which used a keyhole procedure through the small cut, was only undertaken on participants randomly allocated to the surgical group (EG). The surgeon just pretended to do the surgery on the comparison group (CG). As the procedure was done behind surgical drapes, participants were unable to tell if they had received surgery or not, so they were blind to the exposure. The practitioners measuring the study outcomes were also kept blind to whether participants were in EG or CG. This study showed no differences in pain or knee movement between the surgical and non-surgical groups, whereas most of the previous un-blinded studies had shown a benefit of surgery.

    The other main way of addressing measurement of outcome (and also of exposure)

errors is to use objective measurements wherever possible. Examples include well-validated standardised questionnaires about the study exposures and outcomes that are administered in exactly the same way to all participants. Alternatively, if a reliable blood test is available for measuring the dis-ease outcome, it would be considered more objective than self-reported information from the participant. For example, there is a blood test for checking whether someone has recently smoked a cigarette. Another example of a more objective measure than signs and symptoms is a chest x-ray to diagnose heart failure. However x-rays and other scans require interpretation, so radiologists reading the scans should also be blind to information about participants' exposure status. Clearly death is an objective measure of outcome.

AN stands for ANalyses
The AN question is: were the study Analyses done correctly?


    As discussed in the previous chapter, the goal of all epidemiological studies is to measure EGO and CGO, and to calculate the difference between EGO and CGO (i.e. RR, RD, NNT/E). The AN component of RAMboMAN is to remind you to check whether these were done correctly.

There are two key analytical issues in epidemiological studies, both of which have been mentioned above. The first relates to the denominator population used in the calculation of EGO and CGO. If everyone who was allocated to EG or to CG is included in the denominators in the analyses, then this is known as an intention-to-treat (or -expose) analysis, whereas if only those who remained on treatment are included, then an on-treatment (or on-exposure) analysis has been done. As discussed in the loss-to-follow-up section under Maintenance error, intention-to-treat/expose analyses are generally considered to be the preferable approach.

The second key analytical issue relates to adjustments for potential confounding, which has already been described under Allocation (and adjustment) error.

    2.3. RANDOM ERROR and 95% CONFIDENCE INTERVALS

In epidemiological studies, random errors are errors that occur due to chance, rather than due to the way studies are designed and conducted. Unlike the non-random errors described in Section 2.2, most random errors can be reduced by increasing study size or by increasing the number of times a factor is measured on each participant (e.g. blood pressure). Using the analogy of throwing a standard six-sided dice: every time you throw it, there is a one in six chance of it landing on any particular side. However if you throw it six times it is very unlikely to land once on each of the sides - you may even get the same side six times just by chance alone! Yet if you throw it 600 times it will land on each of the six sides approximately 100 times. The more throws you make, the less influence chance (random error) will have on the number of times the dice lands on a particular side rather than another. There are a number of causes of random error in epidemiological studies. The four main ones are described below.
Random sampling error:
In the study discussed in section 2.2 about the prevalence of regular participation in sport

    among New Zealand school children, let us assume that the study participants were a representative sample of all children and that they were recruited by taking a random sample from all New Zealand school rolls. Even if the recruitment process is done very well, the participants will never be a perfectly representative sample of all children on all the school rolls because you would literally have to include every school child in New Zealand to achieve this. Every representative sample of school children recruited will be slightly different from every other sample, just by chance. So the prevalence of sport participation in one sample of children will be different from the prevalence in other samples and they may all be different from the true prevalence among all school children, which is what the study is trying to determine. This error is known as a random sampling error.
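To see random sampling error in action, the short simulation below (invented population size and prevalence) draws repeated random samples of different sizes from the same population and prints the prevalence estimate from each sample.

    import random

    # Invented population: 500,000 children, of whom 40% play sport regularly.
    random.seed(0)
    population = [1] * 200_000 + [0] * 300_000   # 1 = plays sport, 0 = does not

    for sample_size in (100, 1000, 10_000):
        estimates = [sum(random.sample(population, sample_size)) / sample_size for _ in range(5)]
        print(sample_size, [f"{p:.1%}" for p in estimates])
    # Estimates from the small samples scatter more widely around the true 40% than estimates
    # from the large samples, i.e. bigger samples mean smaller random sampling error.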

    Random sampling error is inherent in every study because, as discussed above, every

    study population can only be a sample of the total population of interest. While repeated


    samples from the same total population will all be different, the bigger the sample chosen, the smaller the differences between the sample and the total population (i.e. the smaller the random sampling error).

Random measurement / assessment error:
The measurements of exposure (and comparison) status and of the dis-ease

outcomes are all subject to random measurement error. Our ability to measure biological factors in exactly the same way every time we measure them is often poor, particularly if the measurement instrument requires a human operator. For example, when measuring blood pressure with a standard sphygmomanometer, operators may record different results in repeated measurements of, say, a person's blood pressure level even if the actual level remained unchanged. This could be due to a variation in background noise or some other factor that influences the operator's ability to detect the blood pressure sounds accurately. The best way to reduce this random error is to take multiple measurements and average them, or to use an automatic, more objective, instrument.

The randomness inherent in biological phenomena:
A fundamental cause of random error is the inherent variability in all biological

phenomena and therefore inherent variability in all measurements of biological phenomena (i.e. measuring factors in living organisms that by definition are always changing). For example, if blood pressure is measured a number of times on the same person, using exactly the same automatic sphygmomanometer, each reading will be slightly different, even if the instrument is perfect (i.e. no random measurement error). The reason for these differences is that a person's blood pressure level changes from moment to moment. As with random measurement errors due to operator error, these differences between multiple measurements caused by biological variability can be reduced by taking multiple measurements and then averaging the results.
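A brief sketch (with assumed numbers) of the averaging idea: if a person's underlying systolic blood pressure is taken to be 130 mmHg and individual readings vary around it, the mean of several readings is usually closer to 130 than any single reading.

    import random
    import statistics

    # Assumed numbers: underlying systolic BP of 130 mmHg, moment-to-moment variation with an SD of 8 mmHg.
    random.seed(0)

    def one_reading():
        return random.gauss(130, 8)

    single_reading = one_reading()
    average_of_five = statistics.mean(one_reading() for _ in range(5))

    print(f"single reading:        {single_reading:.1f} mmHg")
    print(f"average of 5 readings: {average_of_five:.1f} mmHg")
    # Averaging reduces the random error: the spread of averages of 5 readings is roughly
    # 8 / sqrt(5), i.e. about 3.6 mmHg, compared with 8 mmHg for single readings.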

Random allocation error:
As discussed in section 2.2, the exposure and comparison groups in a randomised

controlled trial may differ by chance alone, particularly if the trial is small; this type of random allocation error can also be reduced by undertaking a larger study, so that large numbers are randomised to EG and CG.

2.4 ESTIMATING RANDOM ERROR WITH 95% CONFIDENCE INTERVALS
As there is random error in every measurement in every epidemiological study, all measures of EGO or CGO and calculations of RR or RD or NNT will have random error. Fortunately statisticians have developed methods for estimating the amount of some of the random errors described above, particularly random sampling error. There are two main ways of describing the amount of random sampling error in a measurement or calculation: confidence intervals and p-values, and they are different ways of expressing the same information. We will focus on confidence intervals because they are easier to understand and usually more informative than p-values. A small section on p-values is included as they are still in common use. It is important to appreciate that it is not possible to estimate the total amount of random error in a study measurement or calculation, so the confidence intervals and p-values described here only account for some of this error, mainly the random sampling error, which i