
Evidence-Based Medicine: Effective Use of the Medical Literature

Edward G. Hamaty Jr., D.O. FACCP, FACOI

Appraising Diagnosis Articles

Diagnosis

• A diagnosis study is a prospective study with an independent, blind comparison of the test under study against a reference standard.

• Diagnosis research design is different from the other types of research design discussed in this module and is not represented in the levels-of-evidence pyramid. It involves comparing two or more diagnostic tests, all applied to the same study population. One of these tests is the reference standard, or “gold” standard: the benchmark against which the sensitivity and specificity of the other test are measured. Sensitivity and specificity are two measures that describe the accuracy of a diagnostic test in comparison with the reference standard.

• Sensitivity is the proportion of people with the target disorder who have a positive test result.

• Specificity is the proportion of people without the target disorder who have a negative test result.

• To reduce bias in diagnosis research, the reference standard and the test in question are applied independently, and the researchers interpreting each result are blinded to the result of the other test.

• Diagnosis research design is also used to evaluate screening tools.

Diagnosis

Is the Study Valid?

• 1) Was there a clearly defined question?

• What question has the research been designed to answer? Was the question focused in terms of the population group studied, the target disorder and the test(s) considered?

Is the Study Valid?

• 2) Was the presence or absence of the target disorder confirmed with a validated test ('gold' or reference standard)?

• How did the investigators know whether or not a patient in the study really had the disease?

• To do this, they will have needed some reference standard test (or series of tests) which they know 'always' tells the truth. You need to consider whether the reference standard used is sufficiently accurate.

• Were the reference standard and the diagnostic test interpreted blind and independently of each other?

• If the study investigators know the result of the reference standard test, this might influence their interpretation of the diagnostic test and vice versa.

Is the Study Valid?

• 3) Was the test evaluated on an appropriate spectrum of patients?

• A test may perform differently depending upon the sort of patients on whom it is carried out. A test is going to perform better in terms of detecting people with disease if it is used on people in whom the disease is more severe or advanced.

• Similarly, the test will produce more false positive results if it is carried out on patients with other diseases that might mimic the disease that is being tested for.

• The issue to consider when appraising a paper is whether the test was evaluated on the typical sort of patients on whom the test would be carried out in real life.

Is the Study Valid?

• 4) Was the reference standard applied to all patients?

• Ideally, both the test being evaluated and the reference standard should be carried out on all patients in the study. In practice, if the test under investigation is positive, investigators may be tempted not to bother administering the reference standard test.

• Therefore, when reading the paper you need to find out whether the reference standard was applied to all patients. If it wasn't, look at what steps the investigators took to find out what the 'truth' was in patients who did not have the reference test.


• Is it clear how the test was carried out?

• To be able to apply the results of the study to your own clinical practice, you need to be confident that the test is performed in the same way in your setting as it was in the study.

Is the Study Valid?

• Is the test result reproducible?

• This is essentially asking whether you get the same result if different people carry out the test, or if the test is carried out at different times on the same person.

• Many studies assess this by having different observers perform the test and measuring the agreement between them with a kappa statistic. The kappa statistic takes into account the amount of agreement that you would expect by chance. If agreement between observers is poor, the test is of limited use.



Is the Study Valid?

• κ = 1 implies perfect agreement and κ = 0 suggests that the agreement is no better than that which would be obtained by chance.

• There are no objective criteria for judging intermediate values.

• However, kappa is often judged as providing agreement which is:
– Poor if κ ≤ 0.20
– Fair if 0.21 ≤ κ ≤ 0.40
– Moderate if 0.41 ≤ κ ≤ 0.60
– Substantial if 0.61 ≤ κ ≤ 0.80
– Good if κ > 0.80


• The extent to which the test result is reproducible may depend upon how explicit the guidance is for how the test should be carried out.

• It may also depend upon the experience and expertise of the observer.
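The kappa statistic described above can be made concrete for the simplest case: two observers each making a yes/no call on the same patients. This is a minimal Python sketch of Cohen's kappa, assuming a 2 x 2 agreement table; the function names and the counts are hypothetical, and treating each band's upper limit as inclusive is an assumption, since the verbal scale leaves small gaps (for example between 0.20 and 0.21).

```python
def cohens_kappa(both_yes: int, yes_no: int, no_yes: int, both_no: int) -> float:
    """Cohen's kappa for two observers rating the same patients yes/no.

    yes_no = observer 1 says yes, observer 2 says no; no_yes = the reverse.
    """
    n = both_yes + yes_no + no_yes + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from each observer's own "yes" rate.
    p1_yes = (both_yes + yes_no) / n
    p2_yes = (both_yes + no_yes) / n
    expected = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (observed - expected) / (1 - expected)


def describe_kappa(kappa: float) -> str:
    """Map kappa onto the verbal scale (upper limits treated as inclusive)."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Good"


# Hypothetical counts: two observers examining the same 100 patients.
k = cohens_kappa(both_yes=45, yes_no=12, no_yes=8, both_no=35)
print(f"kappa = {k:.3f} ({describe_kappa(k)})")  # kappa = 0.597 (Moderate)
```

Here the observers agree on 80% of patients, but about 50% agreement would be expected by chance alone, so kappa is only about 0.6.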

Appraising diagnostic tests

1. Are the results valid?

2. What are the results?

3. Will they help me look after my patients?

Basic design of diagnostic accuracy study

Series of patients → Index test → Reference (“gold”) standard → Blinded cross-classification

Validity of diagnostic studies

1. Was an appropriate spectrum of patients included?

2. Were all patients subjected to the gold standard?

3. Was there an independent, blind or objective comparison with the gold standard?

Diagram: Selected patients → Reference standard

1. Was an appropriate spectrum of patients included? Spectrum bias

• You want to find out how good chest X-rays are for diagnosing pneumonia in the Emergency Department

• Best = all patients presenting with difficulty breathing get a chest X-ray

• Spectrum bias = only those patients in whom you really suspect pneumonia get a chest X-ray



2. Were all patients subjected to the gold standard? Verification (work-up) bias

• You want to find out how good the exercise ECG (“treadmill test”) is for identifying patients with angina

• The gold standard is angiography

• Best = all patients get angiography

• Verification (work-up) bias = only patients who have a positive exercise ECG get angiography



Diagram: Reference standard, unblinded cross-classification

3. Was there an independent, blind or objective comparison with the gold standard? Observer bias

• You want to find out how good the exercise ECG (“treadmill test”) is for identifying patients with angina

• All patients get the gold standard (angiography)

• Observer bias = the cardiologist who does the angiography knows what the exercise ECG showed (not blinded)



Incorporation Bias

Diagram: the reference standard includes parts of the index test; unblinded cross-classification




Differential Reference Bias

Diagram: some patients are verified with reference standard A, others with reference standard B


DOR (Diagnostic Odds Ratio)

Another measure of the diagnostic accuracy of a test is the diagnostic odds ratio (DOR): the odds of a positive test result in diseased persons relative to the odds of a positive result in non-diseased persons.

The DOR is a single statistic summarizing the results in a 2 x 2 table, incorporating sensitivity as well as specificity. Expressed in terms of sensitivity and specificity, the formula is:

DOR = [Sensitivity/(1 – Sensitivity)]/[(1 – Specificity)/Specificity]
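As a check on the formula above, the DOR can be computed either from the four cells of a 2 x 2 table (a = true positives, b = false positives, c = false negatives, d = true negatives, as in the tables later in this section), where it reduces to (a/c)/(b/d) = ad/bc, or from sensitivity and specificity. A minimal Python sketch; the function names are illustrative, the counts are borrowed from the 99/10/1/90 example table used later, and neither function handles zero cells, which leave the ratio undefined.

```python
def dor_from_counts(a: int, b: int, c: int, d: int) -> float:
    """Diagnostic odds ratio from 2 x 2 counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    odds_pos_if_diseased = a / c      # odds of a positive test with disease
    odds_pos_if_healthy = b / d       # odds of a positive test without disease
    return odds_pos_if_diseased / odds_pos_if_healthy


def dor_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """Same quantity: [Sens/(1 - Sens)] / [(1 - Spec)/Spec]."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


print(round(dor_from_counts(99, 10, 1, 90)))   # 891
print(round(dor_from_sens_spec(0.99, 0.90)))   # 891
```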

Are the Results Important?

• What is meant by test accuracy?

• a) The test can correctly detect disease that is present (a true positive result).

• b) The test can incorrectly detect disease when it is really absent (a false positive result).

• c) The test can incorrectly identify someone as being free of a disease when it is present (a false negative result).

• d) The test can correctly identify that someone does not have a disease (a true negative result).

• Ideally, we would like a test which produces a high proportion of a and d and a low proportion of b and c.

Are the Results Important?

Sensitivity and specificity

• Sensitivity is the proportion of people with disease who have a positive test (the true positive rate).

• Specificity is the proportion of people free of a disease who have a negative test (the true negative rate).

Are the Results Important?

Sensitivity, specificity, positive & negative predictive values, likelihood ratios

…aaarrrggh!!

2 by 2 table

                 Disease +             Disease -
Test +      a  True positives     b  False positives
Test -      c  False negatives    d  True negatives

2 by 2 table: sensitivity

                 Disease +     Disease -
Test +           a
Test -           c

Sensitivity = a / (a + c)

Proportion of people with the disease who have a positive test result.

…a highly sensitive test will not miss many people

2 by 2 table: sensitivity

                 Disease +     Disease -
Test +           99
Test -           1

Sensitivity = a / (a + c) = 99/100 = 99%

2 by 2 table: specificity

                 Disease +     Disease -
Test +                         b
Test -                         d

Specificity = d / (b + d)

Proportion of people without the disease who have a negative test result.

…a highly specific test will not falsely identify people as having the disease.

Tip…

• Sensitivity is useful to me

• Specificity isn’t… I want to know about the false positives

…so use 1 - specificity, which is the false positive rate

2 by 2 table

                 Disease +     Disease -
Test +           a             b
Test -           c             d

Sensitivity = a / (a + c)     False positive rate = b / (b + d) (same as 1 - specificity)

2 by 2 table

                 Disease +     Disease -
Test +           99            10
Test -           1             90

Sensitivity = 99%     False positive rate = 10% (same as 1 - specificity)
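The definitions above translate directly into code. A minimal Python sketch using this slide's example counts; the TwoByTwo class is a hypothetical helper written for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index-test results cross-classified against the reference standard."""
    a: int  # true positives  (disease present, test positive)
    b: int  # false positives (disease absent, test positive)
    c: int  # false negatives (disease present, test negative)
    d: int  # true negatives  (disease absent, test negative)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def false_positive_rate(self) -> float:
        # b / (b + d), which equals 1 - specificity
        return self.b / (self.b + self.d)


table = TwoByTwo(a=99, b=10, c=1, d=90)
print(f"Sensitivity = {table.sensitivity():.0%}")                  # Sensitivity = 99%
print(f"Specificity = {table.specificity():.0%}")                  # Specificity = 90%
print(f"False positive rate = {table.false_positive_rate():.0%}")  # False positive rate = 10%
```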

Example

Your father went to his doctor and was told that his test for a disease was positive. He is really worried, and comes to ask you for help!

• After doing some reading, you find that for men of his age:

– The prevalence of the disease is 30%

– The test has sensitivity of 50% and specificity of 90%

• “Son/Daughter, tell me what’s the chance I have this disease?”

• 100% Always

• 50% maybe

• 0% Never

A disease with a prevalence of 30%.

The test has sensitivity of 50% and specificity of 90%.

Prevalence of 30%, Sensitivity of 50%, Specificity of 90%

100 people
– 30 disease +ve → 15 testing +ve (sensitivity = 50%)
– 70 disease -ve → 7 testing +ve (false positive rate = 10%, i.e. 1 - Sp)

22 people test positive…, of whom 15 have the disease

So, chance of disease is 15/22, about 70% (68%)
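The natural-frequency tree above can be reproduced in a few lines of Python. A minimal sketch: the function name is hypothetical, and it keeps fractional "people" rather than rounding, as the next example does with 9.6.

```python
def natural_frequencies(prevalence: float, sensitivity: float,
                        specificity: float, population: int = 100):
    """Split a notional population the way the tree above does."""
    diseased = population * prevalence
    not_diseased = population - diseased
    true_positives = diseased * sensitivity              # diseased and test +ve
    false_positives = not_diseased * (1 - specificity)   # not diseased but test +ve
    return true_positives, false_positives


tp, fp = natural_frequencies(prevalence=0.30, sensitivity=0.50, specificity=0.90)
print(f"{tp + fp:.0f} test positive, of whom {tp:.0f} have the disease")
print(f"Chance of disease after a positive test: {tp / (tp + fp):.0%}")
# 22 test positive, of whom 15 have the disease
# Chance of disease after a positive test: 68%
```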

A disease with a prevalence of 4% must be diagnosed.

The diagnostic test has a sensitivity of 50% and a specificity of 90%.

If the patient tests positive, what is the chance they have the disease?

Try it again

Prevalence of 4%, Sensitivity of 50%, Specificity of 90% (same positive test, lower prevalence)

100 people
– 4 disease +ve → 2 testing +ve (sensitivity = 50%)
– 96 disease -ve → 9.6 testing +ve (false positive rate = 10%)

11.6 people test positive…, of whom 2 have the disease

So, chance of disease is 2/11.6, about 17%

(vs about 70% in the prior example, where prevalence was 30%)
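The same answers come from Bayes' theorem without drawing the tree: PPV = (sens × prev) / [sens × prev + (1 - spec) × (1 - prev)]. A minimal Python sketch (function name hypothetical) applied to both prevalences from these two examples.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


for prevalence in (0.30, 0.04):
    ppv = positive_predictive_value(prevalence, sensitivity=0.50, specificity=0.90)
    print(f"Prevalence {prevalence:.0%}: chance of disease if positive = {ppv:.0%}")
# Prevalence 30%: chance of disease if positive = 68%
# Prevalence 4%: chance of disease if positive = 17%
```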


Doctors with an average of 14 yrs experience

….answers ranged from 1% to 99%

….half of them estimating the probability as 50%

Gigerenzer G BMJ 2003;327:741-744

High-Sensitivity D-Dimer

– High negative predictive value for PE (based on pulmonary angiography)
– For D-dimer < 500 ng/mL, negative predictive value (NPV) 91-99%
– For D-dimer > 500 ng/mL, sens = 93%, spec = 25%, and positive predictive value (PPV) = 30%
– PPV and NPV are affected by prevalence.
– Test is also useful for DVT rule-out (< 500 ng/mL): NPV 92%
– If pretest probability is intermediate (27.8%) you are supposed to image, but if you order a D-dimer, what do the results mean?

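Prevalence of 27.8% (Wells intermediate), Sensitivity of 93%, Specificity of 25%

100 people
– 28 disease +ve → 26 testing +ve (sensitivity = 93%)
– 72 disease -ve → 54 testing +ve (false positive rate = 75%, i.e. 1 - Sp)

80 people test positive…, of whom 26 have the disease

So, chance of disease is 26/80, about 32.5% (PPV quoted at 30%)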


Incidence of PE in the General Population

• 650,000 to 900,000/year

• Current US population = 307,085,301

Incidence ranges:
– From 650,000/307,085,301 = 0.0021, or 2/1000
– To 900,000/307,085,301 = 0.00293, or 3/1000
– Admittedly this includes newborns, children, etc., but is being used for illustrative purposes.

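Prevalence of 0.3%, Sensitivity of 93%, Specificity of 25%

1000 people
– 3 disease +ve → 3 testing +ve (sensitivity = 93%)
– 997 disease -ve → 748 testing +ve (false positive rate = 75%)

751 people test positive…, of whom 3 have the disease

So, chance of disease with a positive test is 3/751, or about 0.4%

(The lower the prevalence, the larger the share of positive results that are false positives)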
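To see how the D-dimer's predictive values move with prevalence, both can be computed with Bayes' theorem from the quoted sensitivity (93%) and specificity (25%). A minimal Python sketch; the function name is hypothetical, and the NPV figures are an addition not shown on the slides. The PPV at 27.8% comes out as 32.3% rather than 32.5% because the tree above rounds to whole people.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float):
    """Return (PPV, NPV) at a given prevalence via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# D-dimer > 500 ng/mL: sensitivity 93%, specificity 25% (figures quoted above)
for label, prevalence in (("Wells intermediate", 0.278), ("general population", 0.003)):
    ppv, npv = predictive_values(prevalence, sensitivity=0.93, specificity=0.25)
    print(f"{label} ({prevalence:.1%}): PPV = {ppv:.1%}, NPV = {npv:.1%}")
# Wells intermediate (27.8%): PPV = 32.3%, NPV = 90.3%
# general population (0.3%): PPV = 0.4%, NPV = 99.9%
```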


Sensitivity and specificity don’t vary with prevalence

• Test performance can vary in different settings/ patient groups, etc.

• Occasionally attributed to differences in disease prevalence, but more likely due to differences in the spectrum of diseased and non-diseased patients

2 x 2 table: positive predictive value

                 Disease +     Disease -
Test +           a             b
Test -           c             d

PPV = a / (a + b)

Proportion of people with a positive test who have the disease

2 x 2 table: negative predictive value

                 Disease +     Disease -
Test +           a             b
Test -           c             d

NPV = d / (c + d)

Proportion of people with a negative test who do not have the disease
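Applied to the earlier example table (a = 99, b = 10, c = 1, d = 90), the two formulas give the values below. A minimal Python sketch with a hypothetical function name; these predictive values hold only at that table's prevalence of 100/200 = 50%.

```python
def ppv_npv(a: int, b: int, c: int, d: int):
    """Predictive values from 2 x 2 counts (a=TP, b=FP, c=FN, d=TN)."""
    ppv = a / (a + b)   # share of positives who truly have the disease
    npv = d / (c + d)   # share of negatives who are truly disease-free
    return ppv, npv


ppv, npv = ppv_npv(a=99, b=10, c=1, d=90)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")   # PPV = 90.8%, NPV = 98.9%
```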

What’s wrong with PPV and NPV?

• They depend on both the accuracy of the test and the prevalence of the disease

Are the Results Important?

• Using sensitivity and specificity: SpPin and SnNout

• Sometimes it can be helpful just knowing the sensitivity and specificity of a test, if they are very high.

• If a test has high specificity, i.e. if a high proportion of patients without the disorder actually test negative, it is unlikely to produce false positive results. Therefore, if the test is positive it makes the diagnosis very likely.

• This can be remembered by the mnemonic SpPin: for a test with high specificity (Sp), if the test is positive, then it rules the diagnosis 'in'.

• Similarly, with high sensitivity a test is unlikely to produce false negative results. This can be remembered by the mnemonic SnNout: for a test with high sensitivity (Sn), if the test is negative, then it rules 'out' the diagnosis.


SnNOut SpPIn

Sen/Spec/PPV/LR of WBC Count>20


• These measures are combined into an overall measure of the efficiency of a diagnostic test called the likelihood ratio: the likelihood that a given test result would be expected in a patient with the target disorder compared with the likelihood that the same result would be expected in a patient without the disorder (with/without).

– These possible outcomes of a diagnostic test are illustrated with sample data from Andriole 1988.


• Positive Predictive Value = the proportion of people with a positive test who have disease.

• True+/(True+ plus False+)

• Negative Predictive Value = the proportion of people with a negative test who are free of disease.

• True-/(True- plus False-)

Likelihood ratios are extremely valuable!

Likelihood ratios

• Can be used in situations with more than 2 test outcomes

• Allow a direct link from pre-test probabilities to post-test probabilities

2 x 2 table: positive likelihood ratio

                 Disease +     Disease -
Test +           a             b
Test -           c             d

LR+ = [a/(a + c)] / [b/(b + d)]

or

LR+ = sens/(1 - spec)

How much more often a positive test occurs in people with the disease compared with those without it

2 x 2 table: negative likelihood ratio

                 Disease +     Disease -
Test +           a             b
Test -           c             d

LR- = [c/(a + c)] / [d/(b + d)]

or

LR- = (1 - sens)/spec

How much less likely a negative test result is in people with the disease compared with those without the disease

LR > 10 = strong positive test result

LR < 0.1 = strong negative test result

LR = 1: no diagnostic value
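A minimal Python sketch of the two formulas above, LR+ = sens/(1 - spec) and LR- = (1 - sens)/spec. The inputs are the 99%/90% example table and the D-dimer figures quoted earlier (93%/25%); the interpret helper is hypothetical and applies only the reference points on this slide.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def interpret(lr: float) -> str:
    """Rough reading using only the cut-offs on this slide."""
    if lr > 10:
        return "strong positive result"
    if lr < 0.1:
        return "strong negative result"
    return "neither strong positive nor strong negative"


for name, sens, spec in (("Example table", 0.99, 0.90), ("D-dimer > 500 ng/mL", 0.93, 0.25)):
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f} ({interpret(lr_pos)}), "
          f"LR- = {lr_neg:.2f} ({interpret(lr_neg)})")
```

The example table's LR+ of 9.9 falls just short of the > 10 mark despite 99% sensitivity, while the D-dimer's LR+ of about 1.24 shows why a positive D-dimer adds little.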

Likelihood Ratios

Diagram: Bayesian reasoning with the Fagan nomogram (McGee: Evidence-Based Physical Diagnosis, Saunders Elsevier)
? Appendicitis: pretest probability 5%; McBurney tenderness LR+ = 3.4; posttest probability about 15%

Are the Results Important?

• What Likelihood Ratios Were Associated With the Range of Possible Test Results?

• The starting point of any diagnostic process is the patient presenting with a constellation of symptoms and signs.

• Consider two patients with nonspecific chest pain and shortness of breath without findings suggesting diagnoses such as pneumonia, airflow obstruction, or heart failure, in whom the clinician suspects pulmonary embolism.

• One is a 78-year-old woman 10 days after surgery and the other is a 28-year-old man experiencing a high level of anxiety.

• Our clinical hunches about the probability of pulmonary embolism as the explanation for these two patients' complaints (that is, their pretest probabilities) are very different.

• In the older woman, the probability is high; in the young man, it is low. As a result, even if both patients have intermediate-probability ventilation-perfusion scans, subsequent management is likely to differ in each. One might well treat the elderly woman immediately with heparin but order additional investigations in the young man.


• Two conclusions emerge from this line of reasoning.

• First, whatever the result of the ventilation-perfusion scan, it does not tell us whether pulmonary embolism is present. What it does accomplish is to modify the pretest probability of that condition, yielding a new posttest probability.

• Second, the direction and magnitude of this change from pretest to posttest probability are determined by the test's properties, and the property of most value is the likelihood ratio.

As depicted in Table 1C-3, constructed from the results of the PIOPED study, there were 251 people with angiographically proven pulmonary embolism and 630 people whose angiograms or follow-up excluded that diagnosis. For all patients, ventilation-perfusion scans were classified into four levels: high probability, intermediate probability, low probability, and normal or near-normal. How likely is a high-probability scan among people who do have pulmonary embolism? Table 1C-3 shows that 102 of 251 (or 0.406) people with the condition had high-probability scans. How often is the same test result, a high-probability scan, found among people in whom pulmonary embolism was suspected but has been ruled out? The answer is 14 of 630 (or 0.022) of them. The ratio of these two likelihoods is called the likelihood ratio (LR); for a high-probability scan, it equals 0.406 ÷ 0.022 (or 18.3). In other words, a high-probability ventilation-perfusion scan is 18.3 times as likely to occur in a patient with, as opposed to without, a pulmonary embolism.

In a similar fashion, we can calculate the likelihood ratio for each level of the diagnostic test results. Each calculation involves answering two questions: First, how likely is it to obtain a given test result (say, a low-probability ventilation-perfusion scan) among people with the target disorder (pulmonary embolism)? Second, how likely is it to obtain the same test result (again, a low-probability scan) among people without the target disorder? For a low-probability ventilation-perfusion scan, these likelihoods are 39/251 (0.155) and 273/630 (0.433), respectively, and their ratio (the likelihood ratio for a low-probability scan) is 0.36. Table 1C-3 provides the results of the calculations for the other scan results.
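The same two-question calculation can be written out for the two PIOPED scan categories whose counts are given above. A minimal Python sketch; the dictionary layout is illustrative. Working from the unrounded fractions gives 18.29 for a high-probability scan, which is where 18.3 comes from (dividing the rounded 0.406 by 0.022 would give about 18.5).

```python
PE_TOTAL, NO_PE_TOTAL = 251, 630  # angiographically proven PE vs. PE excluded

# Scan result -> (count among patients with PE, count among patients without PE)
scan_counts = {
    "high probability": (102, 14),
    "low probability": (39, 273),
}

lrs = {}
for result, (with_pe, without_pe) in scan_counts.items():
    # Likelihood of this result with PE divided by its likelihood without PE
    lrs[result] = (with_pe / PE_TOTAL) / (without_pe / NO_PE_TOTAL)
    print(f"{result}: LR = {lrs[result]:.2f}")
# high probability: LR = 18.29
# low probability: LR = 0.36
```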

Likelihood Ratios

• What do all these numbers mean? Likelihood ratios indicate by how much a given diagnostic test result will raise or lower the pretest probability of the target disorder. A likelihood ratio of 1 means that the posttest probability is exactly the same as the pretest probability. Likelihood ratios >1.0 increase the probability that the target disorder is present, and the higher the likelihood ratio, the greater is this increase. Conversely, likelihood ratios <1.0 decrease the probability of the target disorder, and the smaller the likelihood ratio, the greater is the decrease in probability and the smaller is its final value.

• How big is a "big" likelihood ratio, and how small is a "small" one? Using likelihood ratios in your day-to-day practice will lead to your own sense of their interpretation, but consider the following a rough guide:

• Likelihood ratios of >10 or < 0.1 generate large and often conclusive changes from pre- to posttest probability;

• Likelihood ratios of 5-10 and 0.1-0.2 generate moderate shifts in pre- to posttest probability;

• Likelihood ratios of 2-5 and 0.2-0.5 generate small (but sometimes important) changes in probability; and

• Likelihood ratios of 1-2 and 0.5-1 alter probability to a small (and rarely important) degree.

Likelihood Ratios

• Having determined the magnitude and significance of the likelihood ratios, how do we use them to go from pretest to posttest probability? We cannot combine likelihoods directly, the way we can combine probabilities or percentages; their formal use requires converting pretest probability to odds, multiplying the result by the likelihood ratio, and converting the consequent posttest odds into a posttest probability. Although it is not too difficult, this calculation can be tedious and off-putting; fortunately, there is an easier way.

• A nomogram proposed by Fagan (Figure 1C-2) does all the conversions and allows an easy transition from pre- to posttest probability. The left-hand column of this nomogram represents the pretest probability, the middle column represents the likelihood ratio, and the right-hand column shows the posttest probability. You obtain the posttest probability by anchoring a ruler at the pretest probability and rotating it until it lines up with the likelihood ratio for the observed test result.

Recall the elderly woman mentioned earlier with suspected pulmonary embolism after abdominal surgery.

Most clinicians would agree that the probability of this patient having the condition is quite high, about 70%. This value then represents the pretest probability. Suppose that her ventilation-perfusion scan was reported as being within the realm of high probability. Figure 1C-2 shows how you can anchor a ruler at her pretest probability of 70% and align it with the likelihood ratio of 18.3 associated with a high-probability scan. The result: her posttest probability is >97%.

If, by contrast, her ventilation-perfusion scan result is reported as intermediate (likelihood ratio 1.2), the probability of pulmonary embolism hardly changes (it increases to 74%), whereas a near-normal result (likelihood ratio 0.1) yields a posttest probability of 19%.

Likelihood Ratios

• The pretest probability is an estimate.

• Clinicians can deal with residual uncertainty by examining the implications of a plausible range of pretest probabilities. Let us assume the pretest probability in this case is as low as 60%, or as high as 80%. The posttest probabilities that would follow from these different pretest probabilities appear in Table 1C-4.

Likelihood Ratios

• We can repeat this exercise for our second patient, the 28-year-old man. Let us consider that his presentation is compatible with a 20% probability of pulmonary embolism. Using our nomogram (see Figure 1C-2), the posttest probability with a high-probability scan result is 82%; with an intermediate-probability result, it is 23%; and with a near-normal result, it is 2%. The pretest probability (with a range of possible pretest probabilities from 10% to 30%), likelihood ratios, and posttest probabilities associated with each of the four possible scan results also appear in Table 1C-4.
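What the Fagan nomogram does graphically can be checked with the odds calculation described earlier: convert pretest probability to odds, multiply by the likelihood ratio, convert back. A minimal Python sketch using the pretest probabilities (70% and 20%) and likelihood ratios (18.3, 1.2, 0.1) from the text; the function and dictionary names are hypothetical.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Pretest probability -> odds, multiply by LR, convert back to probability."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


SCAN_LRS = {"high probability": 18.3, "intermediate": 1.2, "near-normal": 0.1}

for patient, pretest in (("78-year-old woman", 0.70), ("28-year-old man", 0.20)):
    results = ", ".join(
        f"{scan} {posttest_probability(pretest, lr):.0%}" for scan, lr in SCAN_LRS.items()
    )
    print(f"{patient} (pretest {pretest:.0%}): {results}")
# 78-year-old woman (pretest 70%): high probability 98%, intermediate 74%, near-normal 19%
# 28-year-old man (pretest 20%): high probability 82%, intermediate 23%, near-normal 2%
```

The same function can be rerun with pretest values of 60% and 80%, or 10% and 30%, to explore the ranges discussed for Table 1C-4.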

• The investigation of women with possible appendicitis showed that the CT scan was positive in all 32 in whom that diagnosis was ultimately confirmed. Of the 68 who did not have appendicitis, 66 had negative scan results. These data translate into a likelihood ratio of 0 associated with a negative test and a likelihood ratio of 34 for a positive test. These numbers effectively mean that the test is extremely powerful. A negative result excludes appendicitis, and a positive test makes appendicitis highly likely.
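The two likelihood ratios quoted for CT in suspected appendicitis follow directly from the counts. A minimal Python sketch using exact fractions so that 34 and 0 come out exactly; the variable names are illustrative. An LR- of exactly 0 reflects that none of these 32 patients had a false-negative scan.

```python
from fractions import Fraction

# Counts quoted above: CT positive in all 32 women with appendicitis,
# negative in 66 of the 68 without it.
with_app_pos, with_app_neg = 32, 0
without_app_pos, without_app_neg = 2, 66

sensitivity = Fraction(with_app_pos, with_app_pos + with_app_neg)                 # 32/32
false_pos_rate = Fraction(without_app_pos, without_app_pos + without_app_neg)    # 2/68
specificity = 1 - false_pos_rate                                                  # 66/68

lr_positive = sensitivity / false_pos_rate        # sens / (1 - spec)
lr_negative = (1 - sensitivity) / specificity     # (1 - sens) / spec
print(lr_positive, lr_negative)                   # 34 0
```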

Wells Clinical Prediction Rule for PE

Clinical Prediction Rule for PE

V/Q Scan AND Clinical Probability

Likelihood Ratios

• Having learned to use likelihood ratios, you may be curious about where to find easy access to the likelihood ratios of the tests you use regularly in your own practice. The Rational Clinical Examination is a series of systematic reviews of the diagnostic properties of the history and physical examination that have been published in JAMA. Black and colleagues have summarized much of the available information about diagnostic test properties in the form of a medical text.

• Black ER, Bordley DR, Tape TG, Panzer RJ. Diagnostic strategies for common medical problems. In: Black ER, et al, eds. Philadelphia: American College of Physicians; 1999.

Likelihood Ratios

• Effect of prevalence

• Positive predictive value is the percentage of patients who test positive who actually have the disease. Predictive values are affected by the prevalence of the disease: if a disease is rarer, the positive predictive value will be lower, while sensitivity and specificity are constant.

• Since we know that prevalence changes in different health care settings, predictive values are not generally very useful in characterizing the accuracy of tests.

• The measure of test accuracy that is most useful when it comes to interpreting test results for individual patients is the likelihood ratio (LR).

Likelihood Ratios

• There are two major advantages to using likelihood ratios.

• The first is that they enable us to take into account the exact value of a test result, rather than simply classifying it as positive or negative. For example, a patient with chest pain and a troponin I of 20 ng/ml will certainly get our attention more than one with a troponin of 0.6 ng/ml, despite the fact that both would be reported as positive.

• The second is that they can be used to calculate a posttest probability, even when multiple tests are used in sequence.


• Pre-test odds x LR (test result) = post-test odds

• To use this formula, one must convert between probability and odds using:
– Odds = Probability/(1 - Probability)
– Odds of rolling a 6 with one die: 1/5 = 0.2
– Probability = Odds/(1 + Odds)
– Probability of a 6 = 1/6 = 0.167
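The probability-odds conversions above, and the sequential use of likelihood ratios mentioned earlier, can be sketched in a few lines of Python. The function names are hypothetical, and the chained example uses made-up LRs of 2.0 and 3.0. Multiplying LRs from several tests in sequence assumes the tests are conditionally independent given disease status, which real tests often are not.

```python
from typing import Iterable


def to_odds(probability: float) -> float:
    return probability / (1 - probability)


def to_probability(odds: float) -> float:
    return odds / (1 + odds)


def apply_lrs(pretest_probability: float, lrs: Iterable[float]) -> float:
    """Pre-test odds x LR1 x LR2 x ... = post-test odds, then back to probability.

    Assumes the tests are conditionally independent given disease status.
    """
    odds = to_odds(pretest_probability)
    for lr in lrs:
        odds *= lr
    return to_probability(odds)


print(round(to_odds(1 / 6), 3))               # 0.2   (odds of rolling a 6)
print(round(to_probability(0.2), 3))          # 0.167 (probability of a 6)
print(round(apply_lrs(0.20, [2.0, 3.0]), 3))  # 0.6   (hypothetical LRs)
```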

Likelihood Ratios

• Assume you are faced with a patient with abdominal discomfort and are considering spontaneous bacterial peritonitis (SBP). After the exam you estimate the chance of SBP to be 20%. You perform a paracentesis and find 600 PMNs/µL. How does this affect the likelihood that she has SBP?

PMNs in ascitic fluid (per µL)     Likelihood ratio
> 1000                             22.3
501-1000                           2.78
251-500                            1.14
0-250                              0.08

Likelihood Ratios

• First convert pretest probability to odds:
– Pre-test odds = 0.2/(1 - 0.2) = 0.25
– Post-test odds = 0.25 x 2.78 = 0.695

• Convert post-test odds back to probabilities:
– Post-test probability = 0.695/(1 + 0.695) = 0.41

• Thus our new estimate is that she has about a 40% probability of SBP. Had the fluid shown >1000 PMNs, the post-test probability would have been 85%.
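The SBP calculation can be sketched as a lookup of the LR for the measured PMN count followed by the odds conversion. A minimal Python sketch; it assumes the top row of the table means more than 1000 PMNs/µL and that counts are whole numbers, and the names are hypothetical.

```python
# Likelihood ratios for ascitic fluid PMN counts (per µL), from the table above.
# Each entry: (lower bound, upper bound or None if open-ended, LR).
# Assumption: the top stratum means "> 1000" and counts are integers.
PMN_STRATA = [
    (0, 250, 0.08),
    (251, 500, 1.14),
    (501, 1000, 2.78),
    (1001, None, 22.3),
]


def lr_for_pmn(count: int) -> float:
    for low, high, lr in PMN_STRATA:
        if count >= low and (high is None or count <= high):
            return lr
    raise ValueError(f"PMN count out of range: {count}")


def posttest_probability(pretest: float, lr: float) -> float:
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


for pmn in (600, 1500):
    lr = lr_for_pmn(pmn)
    print(f"{pmn} PMNs/µL: LR {lr}, post-test probability {posttest_probability(0.20, lr):.0%}")
# 600 PMNs/µL: LR 2.78, post-test probability 41%
# 1500 PMNs/µL: LR 22.3, post-test probability 85%
```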

Do doctors use quantitative methods of test accuracy?

• Survey of 300 US physicians
– 8 used Bayesian methods, 3 used ROC curves, 2 used LRs
– Why? …indices unavailable… …lack of training… …not relevant to setting/population… …other factors more important…

(Reid et al. Academic calculations versus clinical judgements: practicing physicians’ use of quantitative measures of test accuracy. Am J Med 1998)

Will the test apply in my setting?

• Reproducibility of the test and interpretation in my setting

• Do results apply to the mix of patients I see?

• Will the results change my management?

• Impact on outcomes that are important to patients?

• Where does the test fit into the diagnostic strategy?

• Costs to patient/health service?

Reliability – how reproducible is the test?

• Kappa = measure of inter-observer (or intra-observer) agreement, corrected for chance

Test                          Kappa value
Tachypnoea                    0.25
Crackles on auscultation      0.41
Pleural rub                   0.52
CXR for cardiomegaly          0.48
MRI spine for disc            0.59

Value of Kappa     Strength of Agreement
≤ 0.20             Poor
0.21-0.40          Fair
0.41-0.60          Moderate
0.61-0.80          Good
0.81-1.00          Very Good

Will the result change management?

Probability of disease, from 0% to 100%:
– Below the testing threshold: no action
– Between the testing threshold and the action threshold: test
– Above the action threshold: action (e.g. treat)
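The threshold idea can be expressed as a small decision function. A minimal Python sketch; the slide gives no numeric thresholds, so the 5% testing threshold and 80% action threshold below are hypothetical placeholders, and the function name is illustrative.

```python
def next_step(probability: float, testing_threshold: float, action_threshold: float) -> str:
    """Place a probability of disease on the threshold scale."""
    if not 0 <= testing_threshold <= action_threshold <= 1:
        raise ValueError("need 0 <= testing threshold <= action threshold <= 1")
    if probability < testing_threshold:
        return "no action"
    if probability < action_threshold:
        return "test"
    return "action (e.g. treat)"


# Hypothetical thresholds, for illustration only.
for p in (0.02, 0.30, 0.90):
    print(f"{p:.0%}: {next_step(p, testing_threshold=0.05, action_threshold=0.80)}")
# 2%: no action
# 30%: test
# 90%: action (e.g. treat)
```

A test is worth ordering only if its possible results could move the probability across one of the thresholds, which is where the likelihood ratios above come in.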

Summary

• 1 Frame the clinical question.

• 2 Search for evidence concerning the accuracy of the test.

• 3 Assess the methods used to determine the accuracy of the test.

• 4 Find out the likelihood ratios for the test.

• 5 Estimate the pre-test probability of disease in your patient.

• 6 Apply the likelihood ratios to this pre-test probability using the nomogram to determine what the post-test probability would be for different possible test results.

• 7 Decide whether or not to perform the test on the basis of your assessment of whether it will influence the care of the patient, and the patient's attitude to different possible outcomes.

Appraising Articles on Harm/Etiology

Cohort Study

• A cohort study is an analytical study in which individuals with differing exposures to a suspected factor are identified and then observed for the occurrence of certain health effects over some period, commonly years rather than weeks or months.

• The occurrence rates of the disease of interest are measured and related to estimated exposure levels.

• Cohort studies can either be performed prospectively or retrospectively from historical records.

Case Control Study

• Patients who have developed a disorder are identified and their exposure to suspected causative factors is compared with that of controls who do not have the disorder.

• This permits estimation of odds ratios (but not of absolute risks).

• The advantages of case-control studies are that they are quick, cheap, and are the only way of studying very rare disorders or those with a long time lag between exposure and outcome.

• Disadvantages include the reliance on records to determine exposure, difficulty in selecting control groups, and difficulty in eliminating confounding variables.

Case Control Study

• A case-control study is an observational, retrospective study which "involves identifying patients who have the outcome of interest (cases) and control patients without the same outcome, and looking back to see if they had the exposure of interest."

Cohort Study

• Patients with and without the exposure of interest are identified and followed over time to see if they develop the outcome of interest, allowing comparison of risk.

• Cohort studies are cheaper and simpler than RCTs, can be more rigorous than case-control studies in eligibility and assessment, can establish the timing and sequence of events, and are ethically safe.

• However, they cannot exclude unknown confounders, blinding is difficult, and identifying a matched control group may also be difficult.

Case Control - Retrospective

• Retrospective case-control studies rely on people’s memories, making them prone to error. Also, it may be difficult to measure the exact amount of an exposure in the past. Among people with bladder cancer, how might researchers determine the amount of artificial sweeteners used? Researchers might ask patients to self-report their estimated consumption. This method is inexact at best.


Randomized Controlled Trial

• A randomized controlled trial is an experimental, prospective study in which "participants are randomly allocated into an experimental group or a control group and followed over time for the variables/outcomes of interest."

• Study participants are randomly assigned to ensure that each participant has an equal chance of being assigned to an experimental or control group, thereby reducing potential bias. Outcomes of interest may be death (mortality), a specific disease state (morbidity), or even a numerical measurement such as blood chemistry level.

• Now let’s look at a diagram of a typical RCT that represents the flow of participants from the start of the study through the study outcome. Notice the study start in each diagram: studies progressing from left to right represent prospective studies, “collecting data about a population whose outcome lies in the future.”

Randomized Controlled Trial

Randomized Controlled Trial

• Similar subjects are randomly assigned to a treatment group or a control group and followed to see if they develop the outcome of interest.

• RCTs are the most powerful method of eliminating (known and unknown) confounding variables and permit the most powerful statistical analysis (including subsequent meta-analysis).

• However, they are expensive, sometimes ethically problematic, and may still be subject to selection and observer biases.


Is the Study Valid?

• In assessing an intervention's potential for harm, we are usually looking at prospective cohort studies or retrospective case-control studies. This is because RCTs may have to be very large indeed to detect rare adverse reactions to treatment.
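To see why "very large" is no exaggeration: if each patient independently has probability p of the adverse reaction, the chance of observing at least one event among n patients is 1 - (1 - p)^n. Setting this to 95% gives n ≈ 3/p (the "rule of three"). The sketch below assumes independent patients and a constant event rate. It counts only the patients needed to see a single event; showing a statistically significant difference from a control group would require considerably more.

```python
import math


def patients_needed(event_rate: float, detection_prob: float = 0.95) -> int:
    """Smallest n with P(at least one event in n patients) >= detection_prob.

    Assumes every patient independently has the same event_rate.
    """
    # P(no events in n patients) = (1 - p)^n; solve (1 - p)^n <= 1 - detection_prob for n.
    return math.ceil(math.log(1.0 - detection_prob) / math.log(1.0 - event_rate))


if __name__ == "__main__":
    for rate in (1 / 1_000, 1 / 10_000):
        n = patients_needed(rate)
        print(f"event rate 1 in {round(1 / rate):,}: about {n:,} exposed patients (3/p = {3 / rate:,.0f})")
```

For a reaction occurring in 1 of 10,000 patients, roughly 30,000 exposed patients are needed just to have a 95% chance of seeing one case.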


• 1) Was there a clearly defined question?

• What question has the research been designed to answer? Was the question focused in terms of the population group studied, the exposure received, and the outcomes considered?

Is the Study Valid?

• 2) Were there clearly defined, similar groups of patients?

• Studies looking at harm must be able to demonstrate that the two groups of patients are clearly defined and sufficiently similar so as to be comparable.

• For example, in a cohort study, patients are either exposed to the treatment or not according to a clinical decision rather than by randomization. This might mean that sicker patients, who may be more likely to have adverse outcomes, are more likely to be offered (or to demand) potentially helpful treatment.

• There may be some statistical adjustment to the results to take these potential confounders into account.


• 3) Were treatment exposures and clinical outcomes measured the same ways in both groups?

• You would not want one group to be studied more exhaustively than the other, because this might lead to reporting a greater occurrence of exposure or outcome in the more intensively studied group.


• 4) Was the follow up complete and long enough?

• Follow up has to be long enough for the harmful effects to reveal themselves, and complete enough for the results to be trustworthy (lost patients may have very different outcomes from those who remain in the study).


• 5) Does the suggested causative link make sense?

• You can apply the following rationale to help decide if the results make sense.

• Is it clear the exposure preceded the onset of the outcome? It must be clear that the exposure wasn't just a 'marker' of another disease.

Is the Study Valid?

• Is there a dose-response gradient? If the exposure was causing the outcome, you might expect to see increased harmful effects as a result of increased exposure: a dose-response effect.

• Is there evidence from a 'dechallenge-rechallenge' study? Does the adverse effect decrease when the treatment is withdrawn ('dechallenge') and worsen or reappear when the treatment is restarted ('rechallenge')?

• Is the association consistent from study to study? Try finding other studies, or, ideally, a systematic review of the question.

• Does the association make biological sense? If it does, a causal association is more likely.


• Judging whether the results are important means looking at the risk or odds of the adverse effect with (as opposed to without) exposure to the treatment; the higher the relative risk or odds ratio, the stronger the association and the more we should be impressed by it.

• We can use a single 2×2 table to determine whether the valid results of the study are important.
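The table in question is the 2×2 table of exposure against outcome. From the same four cells, a cohort study yields a relative risk and a case-control study yields an odds ratio. A minimal Python sketch, using hypothetical counts (30 of 1,000 exposed and 10 of 1,000 unexposed patients with the adverse event); the class and method names are illustrative and do not come from any library:

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts in the conventional 2x2 layout:

                     adverse event   no adverse event
        exposed            a                b
        unexposed          c                d
    """
    a: int
    b: int
    c: int
    d: int

    def relative_risk(self) -> float:
        # Cohort studies: risk among exposed / risk among unexposed.
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        # Cross-product ratio (a*d)/(b*c). In a case-control study the investigators
        # choose how many cases and controls to enrol, so risks (and hence the RR)
        # cannot be estimated, but this ratio can.
        return (self.a * self.d) / (self.b * self.c)


if __name__ == "__main__":
    table = TwoByTwo(a=30, b=970, c=10, d=990)  # hypothetical counts
    print(f"RR = {table.relative_risk():.2f}, OR = {table.odds_ratio():.2f}")
```

Because the event is uncommon in this example (3% and 1%), the odds ratio (about 3.06) is close to the relative risk (3.00).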

Are the Results Important?

• A cohort study compares the risk of an adverse event amongst patients who received the exposure of interest with the risk in a similar group who did not receive it.

• Therefore, we are able to calculate a relative risk (or risk ratio). In case-control studies, we are presented with the outcomes, and work backwards looking at exposures. Here, we can only compare the two groups in terms of their relative odds (odds ratio).

• Statistical significance

• As with other measures of efficacy, we would be concerned if the 95% CI around the results, whether relative risk or odds ratio, crossed the value of 1, meaning that there may be no effect (or the opposite).
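One common way to obtain a 95% CI for an odds ratio is Woolf's logit method. It computes the standard error of ln(OR) as √(1/a + 1/b + 1/c + 1/d) and back-transforms ln(OR) ± 1.96 × SE. The sketch below applies it to two hypothetical tables with the same OR but a tenfold difference in size, showing how a small study can produce an interval that crosses 1. It assumes no zero cells, because the method breaks down in that case.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (logit) confidence interval; a, b = exposed with/without
    the outcome, c, d = unexposed with/without the outcome. All cells must be > 0."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical tables with identical proportions, one ten times larger.
    for label, cells in (("large study", (30, 970, 10, 990)), ("small study", (3, 97, 1, 99))):
        or_, lo, hi = odds_ratio_ci(*cells)
        verdict = "crosses 1" if lo <= 1.0 <= hi else "excludes 1"
        print(f"{label}: OR {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f} ({verdict})")
```

Both studies estimate the same OR of about 3.06, but only the larger one has an interval that excludes 1.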

Interpretation: Risk

• There are a number of ways of summarizing the outcome from binary data:

• Absolute Risk Reduction
• Relative Risk
• Relative Risk Reduction
• Odds Ratio
• Numbers Needed to Treat (or Harm)


Kennedy et al. report on a study of acetazolamide and furosemide versus standard therapy for the treatment of posthemorrhagic ventricular dilation (PHVD) in premature babies. The outcome was death or placement of a shunt by 1 year of age.

The standard method of summarizing binary outcomes is to use percentages or proportions. Thus 35 out of 76 children died or had a shunt under standard therapy.

This is expressed as 35/76, or 0.46, or as a percentage, 46%. For a prospective study such as this, the proportion can be thought of as the probability of an event happening, or a risk.


Thus, under standard therapy there was a risk of 35/76 = 0.46 (46%) of dying or getting a shunt by 1 year of age. In the drug plus standard therapy group the risk was 49/75 = 0.65 (65%).

In clinical trials, what we really want is to look at the contrast between differing therapies. We do this by looking at the difference in risks or, alternatively, the ratio of risks. The difference is usually expressed as the control risk minus the experimental risk and is known as the absolute risk reduction (ARR).

The difference in risks in this case is 0.46 - 0.65 = -0.19, or -19%. The negative sign indicates that the experimental treatment in this case appears to be doing harm.

One way of thinking about this is if 100 patients were treated under standard therapy and 100 treated under drug therapy, we would expect 46 to have died or have had a shunt in standard therapy and 65 in the experimental therapy.

Another way of looking at this is to ask: how many patients would need to be treated for one extra person to be harmed by the drug therapy? Treating 100 patients resulted in 19 (65 - 46) extra adverse events, so 100/19 = 5.26 patients would be treated for 1 extra adverse event. Thus, roughly, if 6 patients were treated with standard therapy and 6 with drug (experimental) therapy, we would expect 1 extra patient to die or require a shunt in the drug therapy group.

This is known as the NNH (number needed to harm) and is simply the reciprocal of the absolute risk reduction, with the sign ignored. When the treatment is beneficial, it is known as the NNT (number needed to treat). For screening studies it is known as the NNS (number needed to screen).

Absolute risk increase = 0.65 - 0.46 = 0.19; NNH = 1/0.19 = 5.26
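The same arithmetic as a short Python sketch. Using the exact counts (35/76 and 49/75) rather than the rounded risks gives an absolute risk difference of about -0.193 and an NNH of about 5.2. By convention an NNH is rounded up to the next whole patient, which gives the "roughly 6" above.

```python
import math

# PHVD trial counts quoted above: events / children randomized.
control_risk = 35 / 76        # standard therapy
experimental_risk = 49 / 75   # drug plus standard therapy

arr = control_risk - experimental_risk  # absolute risk reduction; negative means harm
nnh = 1 / abs(arr)                      # number needed to harm (sign ignored)

print(f"ARR = {arr:.3f}")
print(f"NNH = {nnh:.2f}, rounded up to {math.ceil(nnh)} patients")
```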

However, it is important to realize that comparison between NNTs can only be made if the baseline risks are similar.

Thus, suppose a new therapy managed to reduce 5-year mortality of Creutzfeldt-Jakob disease from 100% on standard therapy to 90% on the new treatment. This would be a major breakthrough and has an NNT of 1/(1.0 - 0.9) = 10.

In contrast, a drug that reduced mortality from 50% to 40% would also have an NNT of 10, but would have much less impact.
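A quick check of the two scenarios shows identical NNTs even though the baseline risks (and therefore the relative risks) are very different. This is why NNTs should only be compared across similar baseline risks. A Python sketch; the second drug is the hypothetical one from the text:

```python
# (control event rate, experimental event rate) for each scenario in the text.
scenarios = {
    "Creutzfeldt-Jakob disease": (1.00, 0.90),
    "drug for a 50% mortality condition": (0.50, 0.40),
}

for name, (cer, eer) in scenarios.items():
    arr = cer - eer   # absolute risk reduction
    nnt = 1 / arr     # number needed to treat
    rr = eer / cer    # relative risk
    print(f"{name}: baseline risk {cer:.0%}, NNT {nnt:.0f}, RR {rr:.2f}")
```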

We can also express the outcome as a risk ratio or relative risk (RR), which is the ratio of the two risks, experimental risk divided by control risk, namely 0.65/0.46 = 1.41. With a relative risk less than 1, the risk of an event is greater in the control group; here the RR is greater than 1, so the risk is greater in the experimental group. RR is often used in cohort studies.

It is important to consider the absolute risk. The risk of DVT in women on a new type of oral contraceptive is 30 per 100,000 women-years, compared to 15 per 100,000 on the old treatment. Thus the RR is 2 (the risk is doubled, a 100% relative increase), which suggests that the new type of contraceptive carries a much higher risk of DVT. However, a woman need not be unduly concerned, since her probability of getting a DVT in 1 year on the new drug is only 0.0003, which is much less than if she were pregnant!
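Putting the DVT figures on an absolute scale makes the point concrete. The absolute risk increase is 15 per 100,000 women-years, so about 6,700 women-years of use would be expected per extra DVT. A minimal Python sketch using only the rates quoted above:

```python
PER = 100_000  # rates below are per 100,000 women-years

risk_new = 30 / PER   # new oral contraceptive
risk_old = 15 / PER   # old oral contraceptive

rr = risk_new / risk_old     # relative risk: 2.0
ari = risk_new - risk_old    # absolute risk increase per woman-year
nnh_women_years = 1 / ari    # women-years of use per extra DVT

print(f"RR = {rr:.1f}")
print(f"Absolute risk on new pill = {risk_new:.4f} per year; increase = {ari:.5f}")
print(f"About {nnh_women_years:,.0f} women-years of use per additional DVT")
```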

We can also consider the relative risk reduction (RRR), which is (control risk - experimental risk)/control risk; this is easily shown to be 1 - RR, and is often expressed as a percentage. In the PHVD trial, RRR = (0.46 - 0.65)/0.46 = -0.41. The negative sign means a relative risk increase: patients in the drug arm have a 41% higher risk of an adverse event relative to patients on standard therapy.
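Relative risk and relative risk reduction for the PHVD trial, sketched in Python from the exact counts. The unrounded values (RR ≈ 1.42, RRR ≈ -0.42) differ slightly from the figures above, which were computed from risks already rounded to two decimals.

```python
control_risk = 35 / 76        # standard therapy
experimental_risk = 49 / 75   # drug plus standard therapy

rr = experimental_risk / control_risk                      # relative risk
rrr = (control_risk - experimental_risk) / control_risk    # relative risk reduction

assert abs(rrr - (1 - rr)) < 1e-12  # RRR = 1 - RR
print(f"RR = {rr:.2f}, RRR = {rrr:.2f}")  # a negative RRR is a relative risk increase
```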

When the data come from a case-control study or a cross-sectional study, we often use odds rather than risks. The odds of an event are the ratio of the probability that it happens to the probability that it does not.

Thus the probability of throwing a 6 on a die is 1/6 = 16.67%. (Notice that the denominator is the total of all outcomes: Happening + Not Happening.) The probability of throwing any other number is 5/6 = 83.33%. If P is the probability of an event, we have:

Odds(event) = P/(1 - P) = Happening : Not Happening = 1:5 = 0.1667/0.8333 = 0.2
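Converting a probability to odds is a one-line function. The Python sketch below uses exact fractions so the die example comes out as exactly 1/5. Purely as an illustration, it then applies the same conversion to the PHVD risks to form an odds ratio. Note how the OR (about 2.2) is much larger than the RR (about 1.4) when the outcome is common.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds = P / (1 - P): probability it happens over probability it does not."""
    return p / (1 - p)


die_odds = odds(Fraction(1, 6))  # throwing a 6: 1:5
phvd_or = odds(Fraction(49, 75)) / odds(Fraction(35, 76))  # drug arm vs standard therapy

print(f"odds of a 6 = {die_odds} = {float(die_odds):.1f}")
print(f"PHVD odds ratio = {phvd_or} = {float(phvd_or):.2f}")
```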


Calculate the absolute risk of a coronary event in the estrogen plus progestin (E+P) vs placebo groups:
• E+P = 164/8506 = 0.0193 = 1.93%; Placebo = 122/8102 = 0.0151 = 1.51%
• Absolute Risk INCREASE = 1.93% - 1.51% = 0.42% (0.0042)
• Risk Ratio = 0.0193/0.0151 = 1.28 (28% increase)
• Relative Risk Increase = (0.0193 - 0.0151)/0.0151 = 0.278 = 27.8% increase
• Odds Ratio = [0.0193/(1 - 0.0193)]/[0.0151/(1 - 0.0151)] = 1.29 (29% increase in odds)
• NNH (number needed to harm) = 1/|0.0151 - 0.0193| = 238

For PE (pulmonary embolism): E+P = 70/8506 = 0.0082; Placebo = 31/8102 = 0.0038
• Absolute Risk INCREASE = 0.0082 - 0.0038 = 0.0044 = 0.44%
• Risk Ratio = 0.0082/0.0038 = 2.16 (116% increase)
• Relative Risk Increase = (0.0082 - 0.0038)/0.0038 = 1.16 = 116% increase
• Odds Ratio = [0.0082/(1 - 0.0082)]/[0.0038/(1 - 0.0038)] = 2.16 (116% increase in odds)
• NNH = 1/|0.0082 - 0.0038| = 227

Adverse risk ratios (greater than 1) include those for CHD, stroke, breast cancer, and PE. Only the risk ratios for hip fracture and colorectal cancer are less than 1 and suggest benefit.

Global Index: E+P = 751/8506 = 0.0883; Placebo = 623/8102 = 0.0769
NNH = 1/|0.0883 - 0.0769| = 87.7
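All of these summary measures follow from the four counts for each outcome. The Python sketch below recomputes them from the raw counts quoted above; the helper name is illustrative. Because it does not round the risks first, some values differ slightly from the figures above. For example, it gives an NNH of about 237 rather than 238 for coronary events, and a risk ratio of about 2.15 rather than 2.16 for PE.

```python
def summarize(exp_events: int, exp_n: int, ctl_events: int, ctl_n: int) -> dict[str, float]:
    """Binary-outcome summary measures, comparing an exposed arm with a control arm."""
    r1 = exp_events / exp_n   # risk in the E+P arm
    r0 = ctl_events / ctl_n   # risk in the placebo arm
    return {
        "ARI": r1 - r0,                            # absolute risk increase
        "RR": r1 / r0,                             # risk ratio
        "RRI": (r1 - r0) / r0,                     # relative risk increase
        "OR": (r1 / (1 - r1)) / (r0 / (1 - r0)),   # odds ratio
        "NNH": 1 / abs(r1 - r0),                   # number needed to harm
    }


# (E+P events, E+P n, placebo events, placebo n) as quoted in the text.
outcomes = {
    "Coronary events": (164, 8506, 122, 8102),
    "Pulmonary embolism": (70, 8506, 31, 8102),
    "Global index": (751, 8506, 623, 8102),
}

for name, counts in outcomes.items():
    m = summarize(*counts)
    print(f"{name}: ARI {m['ARI']:.4f}, RR {m['RR']:.2f}, OR {m['OR']:.2f}, NNH {m['NNH']:.1f}")
```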

Clinical Scenario: Do SSRIs Cause Gastrointestinal Bleeding?

• You are a general practitioner considering the optimal choice of antidepressant medication. Your patient is a 55-year-old previously cheerful and well-adjusted individual who, during the past 2 months, has become sad and distressed for the first time in his life. He has developed difficulty concentrating and experiences early morning wakening, but lacks thoughts of self-harm. The patient has attended your practice for the past 20 years and you know him well. You believe he is suffering from a major depressive episode and that he might benefit from antidepressant medication.

During recent years, you have been administering a selective serotonin reuptake inhibitor (SSRI), paroxetine, as your first-line antidepressant agent. However, recent reviews suggesting that the SSRIs are no more effective and do not have lower discontinuation rates than tricyclic antidepressants (TCAs) have led you to revert to your previous first choice, nortriptyline, in some patients. Patients in your practice usually consider the adverse effects in some depth before agreeing to any treatment decisions and many choose SSRIs on the basis of a preferable side-effect profile.

However, for the past 5 years the patient you are seeing today has been taking ketoprofen (a nonsteroidal anti-inflammatory drug, or NSAID), 50 mg three times per day, which has controlled the pain from his hip osteoarthritis. Your mind jumps to a review article suggesting that SSRIs may be associated with an increased risk of bleeding, and you become concerned about the risk of gastrointestinal bleeding when you consider that the patient is also receiving an NSAID.
Unfortunately, an abstract from Evidence Based Mental Health, which you have used to obtain a summary of side effects of antidepressant medications, provides no information regarding this issue.

You remember the review article and locate a copy in your files, but at a glance you realize that it will not help answer your question for three reasons: it did not use explicit inclusion and exclusion criteria, it failed to conduct a systematic and comprehensive search, and it did not evaluate the methodologic quality of the original research it summarized. In addition, it did not cite any original studies specific to an association between SSRI treatment and gastrointestinal bleeding.

You consider that it is worth following up this issue before you make a final recommendation to the patient. You inform him that he will need antidepressant medication, but you explain your concern about the possible bleeding risk and your need to acquire more definitive information before making a final recommendation. You schedule a follow-up visit two days later and you commit to presenting a strategy at that time.


• You formulate the following focused question:

• Do adults suffering from depression and taking SSRI medications, compared to patients not taking antidepressants, suffer an increased risk of serious upper gastrointestinal bleeding?

• Later that day, you begin your search using prefiltered evidence-based medicine resources: the journal Evidence Based Mental Health, Best Evidence, Clinical Evidence, and the Cochrane Library.

• For each database, you enter the term "serotonin reuptake inhibitor."

• A search of Evidence Based Mental Health yields eight reviews in volumes 1 (1998) and 2 (1999). Four of these deal with adverse effects associated with SSRI use, but none addresses gastrointestinal bleeding.

• Searching Best Evidence yields 17 equally unhelpful articles. A Clinical Evidence search identifies only a review on treatment of depressive disorders in adults.

• The Cochrane Library search locates four complete reviews and two abstracts of systematic reviews, but none addresses the issue of gastrointestinal bleeding in SSRI users.


• You now turn to the PubMed version of MEDLINE and PreMEDLINE searching system (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi). For optimum search efficiency, you click on "Clinical queries" under "PubMed Services" to access systematically tested search strategies, or you go to "Search hedges," which will help you identify methodologically sound studies pertaining to your question on harm.

• You enter the following: "selective serotonin reuptake inhibitor" AND "bleeding" for the subject search term; and you click on "Etiology" for study category and "Specificity" for emphasis. Your MEDLINE search (from 1966 through 2000) identifies one citation, an epidemiologic study assessing the association between SSRIs and upper gastrointestinal bleeding. This study describes a threefold increased risk of upper gastrointestinal bleeding associated with the use of SSRIs.

• Thinking that this article may answer your question, you download the full text free of charge from the British Medical Journal (BMJ) Web site (http://www.bmj.com/) as a portable document format (PDF) file, an electronic version of a printed page or pages.

Clinical Scenario

• Are the Results Valid?

• Clinicians often encounter patients who are facing potentially harmful exposures, either to medical interventions or environmental agents, and important questions arise. Are pregnant women at increased risk of miscarriage if they work in front of video display terminals? Do vasectomies increase the risk of prostate cancer? Do hypertension management programs at work lead to increased absenteeism? When examining these questions, physicians must evaluate the validity of the data, the strength of the association between the assumed cause and the adverse outcome, and the relevance to patients in their practice.


• Using the Guide: Returning to our earlier discussion, the study that we retrieved investigating the association between SSRIs and risk of upper gastrointestinal bleeding used a case-control design. Data came from a general practitioner electronic medical record database in the United Kingdom, which included data from more than 3 million people, most of whom had been entered prospectively during a 5-year period. The investigators identified cases of upper gastrointestinal bleeding (n=1651) and ulcer perforation (n=248) among patients aged 40 to 79 years between 1993 and 1997. They then randomly selected 10,000 controls from the at-risk source population that gave rise to cases, choosing their sample so that age, sex, and the year patients were identified were similar among the cases and control groups.

• The analysis controlled for a number of possible prognostic factors: previous dyspepsia, gastritis, peptic ulcer and upper gastrointestinal bleeding or perforation, smoking status, and current use of NSAIDs, anticoagulants, corticosteroids, and aspirin. The database included prescription drugs only. The investigators examined the relative frequency of SSRI prescription use in the 30 days before the index date (that is, the date of the reported bleeding or perforation) in patients with and without bleeding and perforation after controlling for the prognostic variables. Control patients received a random date as their index date.

• Although the investigators controlled for a number of prognostic factors, there are other potentially important determinants of bleeding for which they did not control. For example, patients being treated for depression or anxiety are more likely than those without these conditions to suffer from painful medical conditions, and they may have been using over-the-counter NSAIDs for these problems. The database the investigators used does not capture self-medication with over-the-counter analgesics.

• Alcohol use is another potential confounder. Although the investigators excluded patients with known alcoholism, many persons afflicted with alcoholism remain unrevealed to their primary care physician, and alcoholism is associated with an increased prevalence of depression and anxiety that could lead to the prescription of SSRIs. Since alcoholism is associated with increased bleeding risk, this prognostic variable fulfils all the criteria for a confounding variable that could bias the results of the study. Finally, it is possible that patients returning for prescription of SSRIs would be more likely to have their bleeding diagnosed in comparison to patients under less intense surveillance (a state of affairs known as detection bias).

• These biases should apply to all three classes of antidepressants (ie, SSRIs, nonselective serotonin reuptake inhibitors, and a miscellaneous group of others) that the investigators considered. The results of the study, which we will discuss later in this section, showed an association only between gastrointestinal bleeding and SSRIs, rather than between gastrointestinal bleeding and other antidepressant medications. One would expect all these biases to influence the association between any antidepressant agent and bleeding. Thus, the fact that the investigators found the association only with SSRIs decreases our concern about the threats to validity from possible differences in prognostic factors between those receiving and those not receiving SSRIs.

• At the same time, most physicians make decisions regarding the prescription of SSRIs or tricyclic antidepressant agents based on particular patient characteristics. Thus, it remains possible that these characteristics include some that are associated with the incidence of gastrointestinal bleeding. This would be true, for instance, if clinicians differentially used SSRI rather than other antidepressant medications in patients in whom they suspected alcohol abuse.

• The major strength of the use of a large database for this study is that it eliminates the possibility of biased assessment of exposure (or recall bias) to SSRIs in the patients who suffered the outcomes as well as in those who did not. The outcomes and exposures were probably measured in the same way in both groups, as most clinicians are unaware that upper gastrointestinal (UGI) bleeding may be associated with SSRI use. We have no idea, however, about the number of patients lost to follow-up. Although the investigators included only those patients who stayed in the practices of the participating primary care physicians from the beginning to the end of the study, we do not know, for instance, how many people in the database began to receive SSRIs but subsequently left those practices.

• In summary, the study suffers from the limitation inherent in any observational study: exposed and unexposed patients may differ in prognosis at baseline. In this case, at least two unmeasured variables, over-the-counter NSAID use and alcohol consumption, might create a spurious association between SSRIs and gastrointestinal bleeding. The other major limitation of the study is the lack of information regarding completeness of follow-up. That said, although these limitations weaken any inferences we might make, we are likely to conclude that the study is strong enough to warrant a review of the results.

How Strong Is the Association Between Exposure and Outcome?

• In addition to showing a large magnitude of relative risk or odds ratio, a second finding will strengthen an inference that we are dealing with a true harmful effect.

• If, as the quantity or the duration of exposure to the putative harmful agent increases, the risk of the adverse outcome also increases (that is, the data suggest a dose-response gradient), we are more likely to be dealing with a causal relationship between exposure and outcome.

• The fact that the risk of dying from lung cancer in male physician smokers increases by 50%, 132%, and 220% for 1 to 14, 15 to 24, and 25 or more cigarettes smoked per day, respectively, strengthens our inference that cigarette smoking causes lung cancer.

How Precise Is the Estimate of the Risk?

• Using the Guide: Returning to our earlier discussion, the investigators calculated odds ratios (ORs) of the risk of bleeding in those exposed to SSRIs vs those not exposed, but they reported the results as relative risks (RR). Unfortunately, this practice is not unusual. Fortunately, when event rates are low, relative risks and odds ratios closely approximate one another.

• The investigators found an association between current use of SSRIs and upper gastrointestinal bleeding (adjusted odds ratio [OR], 3.0; 95% CI, 2.1-4.4). They noted a weak association with nonselective serotonin reuptake inhibitors (adjusted OR, 1.4; 95% CI, 1.1-1.9), but found no association with antidepressant medications that had no action on the serotonin reuptake mechanism.

• The investigators found that the association between NSAID use and bleeding (adjusted OR, 3.7; 95% CI, 3.2-4.4) was of similar magnitude to the association between bleeding and SSRIs. The current use of SSRIs with prescription NSAID drugs further increased the risk of upper gastrointestinal bleeding (adjusted OR, 15.6; 95% CI, 6.6-36.6). The dose and duration of SSRI use had little influence on the risk of this adverse outcome.
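The approximation of relative risks by odds ratios, mentioned above, holds only for uncommon outcomes. This can be shown by computing the OR that corresponds to the same relative risk (3.0) at two different baseline risks. A Python sketch with hypothetical risks:

```python
def odds_ratio_from_risks(control_risk: float, exposed_risk: float) -> float:
    """Odds ratio implied by two risks: odds(exposed) / odds(control)."""
    return (exposed_risk / (1 - exposed_risk)) / (control_risk / (1 - control_risk))


# Hypothetical pairs, each with a relative risk of 3.0.
for control_risk in (0.01, 0.30):
    exposed_risk = 3.0 * control_risk
    or_ = odds_ratio_from_risks(control_risk, exposed_risk)
    print(f"baseline risk {control_risk:.0%}: RR 3.00, OR {or_:.2f}")
```

At a 1% baseline risk the two measures nearly coincide (OR about 3.06); at 30% the OR is 21, seven times the RR.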

Clinical Resolution

• Turning to the results, you note the very strong association between the combined use of SSRIs and NSAIDs and upper gastrointestinal bleeding. Despite the methodologic limitations of this single study, you believe the association is too strong to ignore. You therefore proceed to the third step and consider the implications of the results for the patient before you.

• The primary care database from which the investigators drew their sample suggests that the results are readily applicable to the patient before you. You consider the magnitude of the risk to which you would be exposing this patient if you prescribed an SSRI and it actually did cause bleeding. Using the baseline risk reported by Carson et al in a similar population, you calculate that you would need to treat about 625 patients not using NSAIDs with an SSRI for a year to cause a single bleeding episode, and about 55 patients taking NSAIDs along with an SSRI for a year to cause a single bleeding episode.
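An odds ratio from a case-control study can be turned into an NNH once a baseline (unexposed) annual risk is assumed: convert the baseline risk to odds, multiply by the OR, convert back to a risk, and take the reciprocal of the difference. The text does not state the baseline risks taken from Carson et al. The 0.1% per year used below is therefore a placeholder, and the resulting NNHs will not match the 625 and 55 quoted above. The sketch also assumes that the adjusted ORs apply unchanged to this patient.

```python
def nnh_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Number needed to harm implied by an odds ratio at a given baseline (unexposed) risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    exposed_risk = exposed_odds / (1 + exposed_odds)
    return 1 / (exposed_risk - baseline_risk)


ASSUMED_BASELINE = 0.001  # hypothetical annual risk of UGI bleeding; not from the source

for label, or_ in (("SSRI alone", 3.0), ("SSRI + NSAID", 15.6)):
    nnh = nnh_from_odds_ratio(ASSUMED_BASELINE, or_)
    print(f"{label}: OR {or_}, NNH about {nnh:.0f} (one year of treatment)")
```

The same calculation can be rerun with whatever baseline risk is judged appropriate for the patient; a higher baseline risk gives a smaller NNH for the same OR.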

Clinical Resolution

• From previous experience with the patient before you, you know that he is risk averse. When he returns to your office, you note the equal effectiveness of the SSRIs and tricyclic antidepressants that you can offer him, and you describe the side-effect profile of the alternative agents. You note, among the other considerations, the possible increased risk of gastrointestinal bleeding with the SSRIs. The patient decides that, on balance, he would prefer a tricyclic antidepressant and leaves your office with a prescription for nortriptyline.


• Carson JL, Strom BL, Soper KA, West SL, Morse ML. The association of nonsteroidal anti-inflammatory drugs with upper gastrointestinal tract bleeding. Arch Intern Med 1987;147:85-8.

• To evaluate the risk of developing upper gastrointestinal (UGI) bleeding from nonsteroidal anti-inflammatory drugs (NSAIDs), a retrospective (historical) cohort study was performed, using a computerized data base including 1980 billing data from all Medicaid patients in the states of Michigan and Minnesota.

• Comparing 47,136 exposed patients to 44,634 unexposed patients, the unadjusted relative risk for developing UGI bleeding 30 days after exposure to an NSAID was 1.5 (95% confidence interval 1.2 to 2.0).

• Univariate analyses demonstrated associations between UGI bleeding and age, sex, state, alcohol-related diagnoses, preexisting abdominal conditions, and use of anticoagulants. This association between NSAIDs and UGI bleeding was unchanged after adjusting for these potential confounding variables using logistic regression. A linear dose-response relationship and a quadratic duration-response relationship were demonstrated. Non-steroidal anti-inflammatory drugs are associated with UGI bleeding, although the magnitude of the increased risk is reassuringly small.


Clinical Resolution

NNH for SSRI without NSAID:
• Odds Ratio = 3.0
• Relative Risk = 1.0024
• NNH = 624.5

NNH for SSRI with NSAID:
• Odds Ratio = 15.6
• Relative Risk = 1.019
• NNH = 56.17
