patrick breheny march 29 - college of arts &...
TRANSCRIPT
IntroductionHypothesis testing
Study designsMeasuring association
Two-sample inference: Categorical data
Patrick Breheny
March 29
Patrick Breheny STA 580: Biostatistics I 1/62
IntroductionHypothesis testing
Study designsMeasuring association
Separate vs. paired samples
Despite the fact that paired samples usually offer a morepowerful design, separate samples are much more common,for a variety of reasons:
It is often impossible to pair samples: for example, in the Salkvaccine trial, a child must either be vaccinated or unvaccinated– there is no way to receive bothEven if theoretically possible, it is often impractical: forexample, in studies of chronic diseases, we have to wait yearsto observe whether or not a person suffers a heart attack ordevelops breast cancerFurthermore, in observational studies, we have no control overwhich subjects end up in which group
Patrick Breheny STA 580: Biostatistics I 2/62
IntroductionHypothesis testing
Study designsMeasuring association
Lister’s experiment
In the 1860s, Joseph Lister conducted a landmark experimentto investigate the benefits of sterile technique in surgery
At the time, it was not customary for surgeons to wash theirhands or instruments prior to operating on patients
Lister developed a new operating procedure in which surgeonswere required to wash their hands, wear clean gloves, anddisinfect surgical instruments with carbolic acid
This new procedure was compared to the old, non-sterileprocedure and Lister recorded the number of patients in eachgroup that lived or died
Patrick Breheny STA 580: Biostatistics I 3/62
IntroductionHypothesis testing
Study designsMeasuring association
Contingency tables
When the outcome of a two-sample study is categorical, theresults can be summarized in a 2x2 table that lists the numberof subjects in each sample that fell into each category
Putting Lister’s results in this form, we have:
SurvivedYes No
Sterile 34 6Control 19 16
This kind of table is called a contingency table, or sometimesa cross-classification table
Patrick Breheny STA 580: Biostatistics I 4/62
IntroductionHypothesis testing
Study designsMeasuring association
Contingency tables (cont’d)
Customarily, the rows of a contingency table represent thetreatment/exposure groups, while the columns represent theoutcomes
All rows and columns must represent mutually exclusivecategories; thus, each subject is located in one and only onecell of the table
Patrick Breheny STA 580: Biostatistics I 5/62
IntroductionHypothesis testing
Study designsMeasuring association
Lister’s results
On the surface, Lister’s experiment seems encouraging: 46%of patients who received conventional treatment died,compared with only 15% of the patients who were operatedon using the new sterile technique
However, if we calculate (separate, exact) confidence intervalsfor the proportion who die from each type of surgery, theyoverlap:
Sterile: (6%,30%)Control: (29%,63%)
Patrick Breheny STA 580: Biostatistics I 6/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Differences between groups
It’s nice to know about the actual separate proportions thatdied for each type of surgery, but what we really want to knowin this experiment is whether there is a difference between thetwo treatments
So, rather than ask two separate questions, each using part ofthe data and addressing part of the question of interest, amore powerful approach is to focus all of the analysis on theone primary question of interest
Patrick Breheny STA 580: Biostatistics I 7/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Setting up a hypothesis test
Let’s think about what a p-value is: the probability of seeingresults as extreme or more extreme than what we saw if thenull hypothesis were true
The null hypothesis here is that sterilization has no impact onthe probability that a patient lives or dies – i.e., that itdoesn’t make any difference which type of surgery thepatients received
If that were true, then it is as if we only really had one group,and everyone lived or died with equal probability regardless ofthe sterility of their surgery
Patrick Breheny STA 580: Biostatistics I 8/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Setting up a hypothesis test
Thus, consider putting all the patients’ outcomes into a singleurn without considering the type of surgery they received (i.e.,this urn would contain 22 balls with “died” written on themand 53 balls with “survived” written on them)
Our sample of 40 patients who received sterile surgery wouldbe like randomly drawing 40 balls out of the urn
How often would we see something as extreme or moreextreme than only 6 out of 40 patients dying?
Patrick Breheny STA 580: Biostatistics I 9/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Performing the experiment
Making these draws 10,000 times, I got the following results:
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Fre
quen
cy
050
010
0015
0020
00
Patrick Breheny STA 580: Biostatistics I 10/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Calculating a p-value from the experiment
When I drew 40 balls from the combined urn, I only drew 6balls 39 times (out of 10,000)
The only results “as extreme or more extreme” than 6 were:
Number of “died” balls: 4 5 6 18 19Number of times drawn: 2 7 39 13 1
So I obtained a result as extreme or more extreme than theobserved value a total of 62 times out of 10,000
From this experiment, then, I would calculate a p-value of62/10000 = .0062
Patrick Breheny STA 580: Biostatistics I 11/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Fisher’s exact test
This approach to testing association in a 2x2 table is calledFisher’s exact test, after R.A. Fisher, probably the mostinfluential statistician of the 20th century
Although we did the experiment using a simulation, Fisherworked out the exact probabilities that would result from thisexperiment
Even before Fisher, however, another famous statistician (KarlPearson) invented an approximate version of this test
Pearson’s invention, the χ2-test, is one of the earliest (1900)and most widely used statistical tests
Patrick Breheny STA 580: Biostatistics I 12/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2 curve
The χ2 test involves a curve we haven’t seen yet called the χ2
curve
Before we get to the test, let’s take a quick look at where theχ2 curve comes from
Suppose that we generated a lot of random observations fromthe normal distribution; the histogram of these observationswould look like the normal curve
Now suppose that we took those observations, squared them,and then made a histogram
Patrick Breheny STA 580: Biostatistics I 13/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2 curve (with 1 degree of freedom)
z2
Den
sity
0.0 0.5 1.0 1.5 2.0 2.5 3.0
01
23
45
6
Patrick Breheny STA 580: Biostatistics I 14/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2 curve and hypothesis testing
What are the implications for hypothesis testing?
Well, suppose we were performing a z-test: normally, wewould calculate a test statistic z and find the area under thenormal curve outside ±zBut what if we squared z instead?
z2 =(x− µ0)
2
SE2
Patrick Breheny STA 580: Biostatistics I 15/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2 curve and hypothesis testing (cont’d)
Now, the area under the normal curve above +z will lie above+z2 – and so will the area under the normal curve below −zThe area to the right of z2 now contains both tails of theoriginal normal curve
To summarize: regardless of whether x is far above µ0 or farbelow µ0, z2 will be large and we will naturally get atwo-sided test
So, an alternative way to calculate p-values for z-tests is tofind the area to the right of z2 on the χ2 curve – and don’tdouble it
Patrick Breheny STA 580: Biostatistics I 16/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Graphical representation
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
0.5
z
Den
sity
0 1 2 3 4 5 60.
00.
10.
20.
30.
40.
5
z2
Den
sity
Patrick Breheny STA 580: Biostatistics I 17/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The motivation behind a χ2-test
In a sense, squaring the z-test statistic and comparing it tothe χ2 curve is a χ2 test
However, this not what people usually mean when they talkabout χ2 tests
Usually, the term χ2-test is reserved for an analysis ofcategorical outcomes in which you compare the observednumber of times each category occurs with the number oftimes it would be expected to occur under the null hypothesis
The more disagreement there is between observed andexpected results, the further we will be in the right-hand tailof the χ2 curve, and the lower our p-value will be
Patrick Breheny STA 580: Biostatistics I 18/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2-statistic
Specifically, this is done by calculating the χ2-statistic: lettingthe subscript i denote the possible categories,
χ2 =∑i
(Oi − Ei)2
Ei
where Oi and Ei are the observed and expected number oftimes category i occurs/should occur
The numerator should look familiar: it’s the differencebetween the observed value of something and its expectedvalue under the null
The denominator, on the other hand, looks weird
However, it just so happens that when you’re countingoccurrences of a category, its SD2 is approximately equal tothe expected value
Patrick Breheny STA 580: Biostatistics I 19/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2-test procedure
The procedure for performing a χ2-test is as follows:
#1 Create a table of expected counts based on the null hypothesis
#2 Calculate the χ2-statistic
#3 Determine the area to the right of χ2 on the χ2 curve
Patrick Breheny STA 580: Biostatistics I 20/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2-test: Lister’s experiment
Let’s use the χ2-test to determine how unlikely Lister’s resultswould have been if sterile technique had no impact on fatalcomplications from surgery
#1 Create a table of expected counts based on the null hypothesis
In the experiment, ignoring group affiliation, 22 out of 75patients died
Thus, under the null, we would expect 22/75 = 29.3% of thepatients in each group to die:
SurvivedYes No
Sterile 28.3 11.7Control 24.7 10.3
Patrick Breheny STA 580: Biostatistics I 21/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
The χ2-test: Lister’s experiment (cont’d)
#2 Calculate the χ2-statistic:
χ2 =(34 − 28.3)2
28.3+
(6 − 11.7)2
11.7
+(19 − 24.7)2
24.7+
(16 − 10.3)2
10.3= 8.50
#3 The area to the right of 8.50 is 1 − .996 = .004
There is only a 0.4% probability of seeing such a largeassociation by chance alone; this is compelling evidence thatsterile surgical technique saves lives
Patrick Breheny STA 580: Biostatistics I 22/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Fisher’s exact test and the χ2-test
So far, everything we’ve talked about is testing the same nullhypothesis with the same data, so one would expect similarresults
Indeed, the p-values are very similar:
Experiment: p = .006Fisher’s Exact Test: p = .005χ2-test: p = .004
Patrick Breheny STA 580: Biostatistics I 23/62
IntroductionHypothesis testing
Study designsMeasuring association
Fisher’s Exact TestThe χ2 testFisher’s exact test vs. the χ2 test
Fisher’s exact test vs. the χ2-test
How should you decide to use one versus the other?
The usual advice is that the χ2 approximation may beinaccurate if the expected count in any cell is under 5
Typically, the difference between the χ2 test and Fisher’sExact Test is small for 2x2 tables, but the two can beextremely different for larger tables
Because it often doesn’t matter, which test is performed isusually a matter of preference:
Some prefer the χ2-test for tradition’s sakeOther prefer Fisher’s exact test because, as the name implies,it is exact
Patrick Breheny STA 580: Biostatistics I 24/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Study designs that can be analyzed with χ2-tests
One reason that χ2-tests are so popular is that they can beused to analyze a wide variety of study designs
In addition to controlled experiments, they are widely used inepidemiology, where investigators must conduct observationalstudies
Observational studies in epidemiology fall into threecategories: prospective studies, retrospective studies, andcross-sectional studies
Patrick Breheny STA 580: Biostatistics I 25/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Study designs (cont’d)
χ2-tests and Fisher’s exact test can be used to analyze all ofthese studies
Best of all (some might say), you don’t even need to thinkabout your study design or where your data came from; if itcan be expressed as a 2x2 contingency table, the χ2-test isalways appropriate
One caveat: there are a lot of tables in this world that havetwo rows and two columns; that does not necessarily makethem contingency tables, so please don’t try to performχ2-tests on them!
Patrick Breheny STA 580: Biostatistics I 26/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Prospective studies
We have said that the double-blind, randomized controlledtrial is the gold standard of biomedical research
When this is not possible (or ethical), the prospective study(also called a cohort study) is the next best thing
In a prospective study, investigators collect a sample, classifyindividuals in some way, and then wait to see if the individualsdevelop a condition
The classification is usually based on exposure to a risk factorsuch as smoking or obesity
Patrick Breheny STA 580: Biostatistics I 27/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Risk factors for breast cancer
For example, the CDC tracked 6,168 women in the hopes offinding risk factors that led to breast cancer
One risk factor they looked at was the age at which thewoman gave birth to her first child:
CancerNo Yes
Before age 25 4475 6525 or older 1597 31
Patrick Breheny STA 580: Biostatistics I 28/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Risk factors for breast cancer (cont’d)
Performing a χ2-test on the data, we obtain p = .19
This study provides only rather weak evidence that the risk ofdeveloping breast cancer depends on the age at which awoman gives birth to her first child
Patrick Breheny STA 580: Biostatistics I 29/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Retrospective studies
Not all researchers have the resources to follow thousands ofpeople for decades to see if they develop a rare disease
Instead, they often try the more feasible approach ofcollecting a sample of people with the condition of interest, asecond sample of people without the condition of interest, andthen ask them if they were exposed to a risk factor in the past
For example, a much cheaper way to conduct the study ofbreast cancer risk factors would be to find 50 women withbreast cancer, 50 women without breast cancer, and ask themwhen they had their first child
This approach is called a retrospective, or case-control study
Patrick Breheny STA 580: Biostatistics I 30/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Fluoride poisining in Alaska
In 1992, an outbreak of illness occurred in an Alaskancommunity
The CDC suspected fluoride poisoning from one of the town’swater supplies
Case Control
Drank from supply 33 4Didn’t drink from supply 5 46
Patrick Breheny STA 580: Biostatistics I 31/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Fluoride poisining in Alaska
Testing whether this could be due to chance, the χ2-test givesus p = 6 × 10−13
The observed association was certainly not due to chance
But the association still may be due to factors besides fluoridepoisoning
Patrick Breheny STA 580: Biostatistics I 32/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Recall bias
For example, people who got sick may have been thinkingmuch harder about what they ate and drank than people whodidn’t
This is called recall bias, and it is an important source of biasin retrospective studies
Furthermore, because researchers must gather separatesamples of cases and controls, these studies are more prone tosampling biases than prospective studies
Patrick Breheny STA 580: Biostatistics I 33/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Electromagnetic field example
For example, retrospective studies have been performedinvestigating links between childhood leukemia and exposureto electromagnetic fields (EMF)
Families with low socioeconomic status are more likely to livenear electromagnetic fields
Families with low socioeconomic status are also less likely toparticipate in studies as controls
Socioeconomic status does not affect the participation ofcases, however (cases are usually eager to participate)
This results in an observed association between EMF andleukemia potentially arising entirely due to bias
Patrick Breheny STA 580: Biostatistics I 34/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Cross-sectional studies
The weakest type of observational study is the cross-sectionalstudy
In a cross-sectional study, the investigator simply gathers asingle sample and cross-classifies them depending on whetherthey have the risk factor or not and whether they have thedisease or not
Cross-sectional studies are the easiest to carry out, but aresubject to all sorts of hidden biases
Patrick Breheny STA 580: Biostatistics I 35/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Circulatory disease and respiratory disease
For example, one study surveyed 257 hospitalized individualsand determined whether each individual suffered from adisease of the respiratory system, a disease of the circulatorysystem, or both
Their results:
Respiratory DiseaseYes No
Circulatory Yes 7 29Disease No 13 208
Could this association be due to chance?
Not likely; χ2 = 7.9, so p = .005
Patrick Breheny STA 580: Biostatistics I 36/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Circulatory disease and respiratory disease (cont’d)
Okay, so it’s probably not due to chance
But does that mean that you are more likely to get arespiratory disease if you have a circulatory disease?
The same study surveyed nonhospitalized individuals as well:
Respiratory DiseaseYes No
Circulatory Yes 15 142Disease No 189 2181
Patrick Breheny STA 580: Biostatistics I 37/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Circulatory disease and respiratory disease (cont’d)
The evidence in favor of an association is now nonexistent:p = .48
What’s going on?
The issue isn’t a poor sampling design: both samples weregathered carefully and are representative of their respectivepopulations
Instead, the issue is that cross-sectional studies are verysusceptible to selection bias
Patrick Breheny STA 580: Biostatistics I 38/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Selection bias in cross-sectional studies
In this example, the bias was that if a patient has both acirculatory disease and a respiratory disease, then he or she ismuch more likely to be hospitalized and to be included in thecross-sectional study
There are many other examples:
Suppose we obtained a cross-sectional sample of factoryworkers to see if they had developed asthma at a higher ratethan non-factory workersWorkers who developed asthma from working in the factorymay be more likely to quit their job, and less likely to beincluded in our sampleSuppose we notice an association between milk drinking andpeptic ulcersIs it because milk drinking causes ulcers, or because ulcersufferers like to drink milk in order to relieve their symptoms?
Patrick Breheny STA 580: Biostatistics I 39/62
IntroductionHypothesis testing
Study designsMeasuring association
OverviewProspective studiesRetrospective studiesCross-sectional studies
Summary
In conclusion, prospective studies are the most trustworthyobservational study, but like any observational study, they aresubject to confounding
Retrospective studies are often much more feasible, butpotentially subject to recall bias and unrepresentative sampling
Cross sectional studies provide a quick snapshot of anassociation, but need to be interpreted with care
Patrick Breheny STA 580: Biostatistics I 40/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Hypothesis tests and confidence intervals
Fisher’s exact test and the χ2-test can be used to calculatep-value and assess evidence against the null
However, we would also like to be able to measure and placeconfidence intervals on effect sizes to determine practicalsignificance
The null hypothesis of a χ2-test can be loosely described assaying that there is “no association” betweentreatment/exposure and the outcome
What do we mean by “association”?
Suppose we want a confidence interval . . . what do we want aconfidence interval of?
Patrick Breheny STA 580: Biostatistics I 41/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Difference in proportions
One way of measuring the strength of an association forcategorical data is to look at the difference in proportions
For example, in Lister’s experiment, 46% of the patients whoreceived the conventional surgery died, but only 15% of thepatients who received the sterile surgery died
The difference in these percentages is 31%
Patrick Breheny STA 580: Biostatistics I 42/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Flaws with the difference in proportions
However, differences in proportions are not informative forrare events
For example, in a rather famous study that made front-pageheadlines in the New York Times, 0.9% of subjects takingaspirin suffered heart attacks, compared to 1.7% of placebosubjects
The difference in proportions, 0.8%, doesn’t soundfront-page-of-the-New–York–Times–worthy
Patrick Breheny STA 580: Biostatistics I 43/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The relative risk
Instead, for proportions, we often describe the strength of anassociation using ratios
When we said that the probability of suffering a heart attackwas twice as large (1.7/0.9 = 1.9) for the placebo group as forthe aspirin group, this is much more attention-grabbing
Similarly for Lister’s experiment: the risk of dying from surgeryis three times lower (46/15 = 3.1) if sterile technique is used
This ratio is called the relative risk, and it is usually moreinformative than the difference of proportions
Patrick Breheny STA 580: Biostatistics I 44/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Flaws with the relative risk
The relative risk is a good measure of the strength of anassociation, but it too has flaws
One is that it’s asymmetric
For example, the relative risk of dying is 46/15 = 3.1 timesgreater with the nonsterile surgery, but the relative risk ofliving is only 85/54 = 1.57 times greater with the sterilesurgery
Patrick Breheny STA 580: Biostatistics I 45/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Relative risks and retrospective studies
Another flaw is that it doesn’t work with retrospective studies
For example, consider the results of a classic case-controlstudy of the relationship between smoking and lung cancerpublished in 1950:
Cases Controls
Smoker 688 650Nonsmoker 21 59
Is the probability of developing lung cancer given that aperson smoked 688/(688 + 650) = 51%?
Patrick Breheny STA 580: Biostatistics I 46/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Relative risks and retrospective studies (cont’d)
Absolutely not; this isn’t even remotely accurate
By design, this study included 709 people with lung cancerand 709 without; the fact that about 50% of smokers hadlung cancer doesn’t mean anything
For retrospective and cross-sectional studies, then, we cannotcalculate a relative risk
This would require an estimate of the probability ofdeveloping a disease given that an individual was exposed to arisk factor, which we can only get from a prospective study
Instead, retrospective studies give us the probability of beingexposed to a risk factor given that you have developed thedisease
Patrick Breheny STA 580: Biostatistics I 47/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Odds
A slightly different measure of association, the odds ratio,gets around both of these flaws
Instead of taking the ratio of the probabilities, the odds ratiois a ratio of the odds of developing the disease given riskfactor exposure to the odds given a lack of exposure
The odds of an event is the ratio of the number of times theevent occurs to the number of times the event fails to occur
For example, if the probability of an event is 50%, then theodds are 1; in speech, people usually say that “the odds are 1to 1”If the probability of an event is 75%, then the odds are 3; “theodds are 3 to 1”
Patrick Breheny STA 580: Biostatistics I 48/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The symmetry of the odds ratio
As advertised, the odds ratio possesses the symmetry that therelative risk does not
For example, in Lister’s experiment the odds of dying were6/34 = .176 for the sterile group and 16/19 = .842 for thecontrol group
The relative odds of dying with the control surgery istherefore .842/.176 = 4.77
On the other hand, the odds of surviving were 34/6 = 5.67 forthe sterile group and 19/16 = 1.19 for the control group
The relative odds of surviving with the sterile surgery istherefore 5.67/1.19 = 4.77
Patrick Breheny STA 580: Biostatistics I 49/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
An easier formula for the odds ratio
Summarizing this reasoning into a formula, if our table lookslike
a bc d
then
OR =ad
bc
Because of this formula, the odds ratio was originally calledthe “cross-product ratio”
Patrick Breheny STA 580: Biostatistics I 50/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
There are two odds ratios
Keep in mind that there are two odds ratios, depending onhow we ordered the rows and columns of the table, and thatthey will be reciprocals of one another
When calculating and interpreting odds ratios, be sure youknow which group has the higher odds of developing thedisease
In Lister’s experiment, the odds ratio for surviving with thesterile surgery was 4.77, but the odds ratio for surviving withthe control surgery was 1/4.77 = 0.210
NOTE: When writing about an odds ratio less than 1, it iscustomary to write, for example, that “the sterile procedurereduced the odds of death by 79%”
Patrick Breheny STA 580: Biostatistics I 51/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Odds ratios and retrospective studies
The symmetry of the odds ratio works wonders when it comesto retrospective studiesSo, in our case-control study of lung cancer and smoking, theodds ratio for smoking given lung cancer is
OR =688 · 59
21 · 650= 2.97
However, this is also the odds ratio for lung cancer givensmokingThis is a minor miracle: we have managed to obtain aprospective measure of association from a retrospective study!Hence the popularity of the odds ratio: it can be used for anystudy design (prospective, retrospective, cross-sectional) thatresults in a 2x2 contingency table
Patrick Breheny STA 580: Biostatistics I 52/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Interpretation of odds ratios
One important caveat when dealing with odds ratios is thatthey tend to sound larger than they really are
For example, in the Nexium trial, the healing rates were 93%vs. 89%
The odds ratio, however, is 1.6
A 60% increase in the odds of healing sounds quite a bit moreclinically significant than it really is
Patrick Breheny STA 580: Biostatistics I 53/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The χ2-test and measures of association
Note that when the difference between two proportions equals0, the relative risk equals 1 and the odds ratio equals 1
Furthermore, when relative risk of disease given exposureequals 1, the relative risk of exposure given disease equals 1
Thus, any one of these may be thought of as the nullhypothesis of the χ2-test
This why the null hypothesis is often loosely described asthere being no “association” between exposure and disease
Patrick Breheny STA 580: Biostatistics I 54/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The odds ratio and the central limit theorem
So far in this class, we’ve calculated confidence intervals foraverages and percentages
For both statistics, the central limit theorem guarantees thatif the sample size is big enough, their sampling distributionwill look relatively normal
The odds ratio, however, has no such guarantee
Patrick Breheny STA 580: Biostatistics I 55/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Simulation: The sampling distribution of the odds ratio
Odds ratio
Fre
quen
cy
0 1 2 3 4
0e+
001e
+05
2e+
053e
+05
4e+
05
Patrick Breheny STA 580: Biostatistics I 56/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The log transform
However, look at the sampling distribution of the logarithm of theodds ratio:
Log odds ratio
Fre
quen
cy
−3 −2 −1 0 1 2 3
0e+
001e
+05
2e+
053e
+05
Patrick Breheny STA 580: Biostatistics I 57/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Confidence intervals for the log-odds
The log of the odds ratio (sometimes called the “log-odds”) isquite normal-looking and amenable to finding confidenceintervals for
Thus, the procedure that has been developed for constructingapproximate confidence intervals for the odds ratio actuallyconstructs confidence intervals for the log of the odds ratio
Getting a confidence interval for the odds ratio itself thenrequires an extra step of converting the confidence intervalback to the odds ratio scale
Patrick Breheny STA 580: Biostatistics I 58/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
The standard error of the log-odds
It can be shown that a good estimate of the standard error of thelog of the odds ratio is
SElOR =
√1
a+
1
b+
1
c+
1
d
where a, b, c, and d are the four entries in the contingency table
Patrick Breheny STA 580: Biostatistics I 59/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Confidence intervals for the odds ratio: procedure
The first three steps for constructing confidence intervals for theodds ratio should look familiar; the fourth will be new:
#1 Estimate the standard error of the log-odds,
SElOR =√
1a + 1
b + 1c + 1
d
#2 Determine the values that contain the middle x% of thenormal curve, ±zx%
#3 Calculate the confidence interval for the log-odds:
(L,U) =(
log(OR) − zx% · SElOR, log(OR) + zx%SElOR
)#4 Convert the confidence interval from step 3 back into the odds
ratio scale to obtain the confidence interval for the odds ratio:
(eL, eU )
NOTE: “log” above refers to the natural log (sometimes abbreviated
“ln”), not the base-10 logarithmPatrick Breheny STA 580: Biostatistics I 60/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Confidence intervals for the odds ratio: example
To see how this works in practice, let’s calculate a 95% confidenceinterval for the odds ratio of surviving with sterile surgery inLister’s experiment
#1 Estimate the standard error of the log-odds:
SElOR =
√1
34+
1
6+
1
19+
1
16
= 0.56
#2 As usual, 1.96 contains the middle 95% of the normaldistribution
#3 Recall that the sample odds ratio was 4.77, so the log of thesample odds ratio is 1.56
#3 The 95% confidence interval for the log odds ratio is therefore
(1.56 − 1.96(0.56), 1.56 + 1.96(0.56)) = (0.47, 2.66)
Patrick Breheny STA 580: Biostatistics I 61/62
IntroductionHypothesis testing
Study designsMeasuring association
Which association?Confidence intervals
Confidence intervals for the odds ratio: example
#4 The 95% confidence interval for the odds ratio is therefore(e0.47, e2.66
)= (1.60, 14.2)
Note that the confidence interval doesn’t include 1; thisagrees with our test of significance
Note also that this confidence interval is asymmetric (its righthalf is much longer than its left half) – this would beimpossible to achieve without the log transform
Now we have an idea of the possible clinical significance ofsterile technique: it may be lowering the odds of surgicaldeath by a factor of about 1.6, or by a factor of 14, with afactor of around 5 being the most likely
Patrick Breheny STA 580: Biostatistics I 62/62