case control & other study designs-i-dr.wah

Case-Control & Other Study Designs

Dr. Win Aye Hlaing

Lecturer

Department of Epidemiology

University of Public Health, Yangon

6/26/2017 1

• In the early 1940s, Alton Ochsner, a surgeon

• observed that patients with lung cancer gave a history of cigarette smoking

• This relationship is accepted and well recognized today,

• but it was relatively new and controversial at the time

• He hypothesized that cigarette smoking was linked to lung cancer


• In the 1940s, Sir Norman Gregg, an Australian ophthalmologist,

• observed a number of infants and young children who presented with an unusual form of cataract

• He noted that these children had been in utero during the time of a rubella (German measles) outbreak

• He suggested an association between prenatal rubella exposure and development of the unusual cataracts

• At that time there was no knowledge that a virus could be teratogenic

• He proposed his hypothesis solely on the basis of observational data


• Suppose Gregg had observed that 90% of these infants had been in utero during the rubella outbreak

• Would he have been justified in concluding that rubella was associated with the cataracts?

• Clearly, the answer is no

• Such an observation is difficult to interpret without data for a comparison group of children without cataracts

• It is possible that most mothers in the community, both those of children with the cataracts and those of children without cataracts, had been pregnant during the outbreak of rubella

• In that case, the exposure history would be no different for mothers of children with cataracts than for mothers of controls

• The question is therefore whether the prevalence of rubella exposure was greater in children with cataracts than in a group of children without cataracts


• To determine the significance of such observations,

• a comparison or control group is needed

• Without such a comparison, the observations constitute only a case series


DESIGN OF A CASE-CONTROL STUDY

• Identify a group of people with the disease (cases)

• and a group of people without the disease (controls)

• Determine what proportion of the cases were exposed and what proportion were not

• Determine what proportion of the controls were exposed and what proportion were not

• In a case-control study, if there is an association of an exposure with a disease,

• the prevalence of a history of exposure should be higher in persons who have the disease (cases) than in those who do not have the disease (controls)


                     Cases            Controls
                     [Disease (+)]    [Disease (-)]
Exposure (+)         a                b
Exposure (-)         c                d
Total                a + c            b + d
Proportion exposed   a/(a + c)        b/(b + d)

• A case-control study begins by selecting cases (with the disease) and controls (without the disease)

• Past exposure is then measured by interview and by review of medical or employee records or of results of chemical or biologic assays of blood, urine, or tissues


• If exposure is associated with disease,

a/(a + c) > b/(b + d)

• i.e. the proportion of the cases who were exposed is greater than the proportion of the controls who were exposed
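
The comparison of exposure proportions can be sketched in a few lines of Python. The counts are hypothetical, chosen only for illustration, and `proportion_exposed` is an assumed helper name rather than anything from the lecture.

```python
def proportion_exposed(exposed: int, unexposed: int) -> float:
    """Return the proportion exposed in one group: exposed / (exposed + unexposed)."""
    total = exposed + unexposed
    if total == 0:
        raise ValueError("group is empty")
    return exposed / total

# Hypothetical 2x2 counts (not from the lecture):
a, b = 60, 40    # exposed cases, exposed controls
c, d = 140, 360  # unexposed cases, unexposed controls

cases_exposed = proportion_exposed(a, c)     # a / (a + c)
controls_exposed = proportion_exposed(b, d)  # b / (b + d)

print(f"cases exposed:    {cases_exposed:.1%}")
print(f"controls exposed: {controls_exposed:.1%}")
print("consistent with an association:", cases_exposed > controls_exposed)
```

With these made-up counts, 30% of cases and 10% of controls were exposed, which is the pattern expected if the exposure is associated with the disease.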


• From the data of a case-control study,

• we cannot estimate the prevalence of the disease

• E.g. if we had 200 cases and 400 controls, this does not imply that the prevalence is 33%, or 200/(200 + 400)


• the investigator could have selected

• 200 cases and 200 controls (1 control per case), or

• 200 cases and 800 controls (4 controls per case)

• Because the proportion of the entire study population that consists of cases is determined by the ratio of controls per case, and this proportion is determined by the investigator, it clearly does not reflect the true prevalence
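
A short sketch of this arithmetic, using the 200 cases from the example; `case_fraction` is a hypothetical helper name introduced here for illustration.

```python
def case_fraction(n_cases: int, controls_per_case: int) -> float:
    """Share of the study population made up of cases for a chosen control ratio."""
    n_controls = n_cases * controls_per_case
    return n_cases / (n_cases + n_controls)

# Same 200 cases, three different design choices by the investigator.
for ratio in (1, 2, 4):
    share = case_fraction(200, ratio)
    print(f"{ratio} control(s) per case: {share:.1%} of subjects are cases")
```

The share of cases moves from 50% to about 33% to 20% purely because of the chosen ratio, so it says nothing about how common the disease is in the population.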


• A common but erroneous impression is that

– cohort studies go forward in time and

– case-control studies go backward in time

• Such a distinction is not correct

• The term "retrospective" has often been used for case-control studies

• But a retrospective cohort study also uses data obtained in the past

• Thus, calendar time is not the characteristic that distinguishes a case-control from a cohort study


• Why the unusual number of controls?

• The investigators probably planned for two controls per case (400 controls), and some of the controls did not participate


Potential Biases in Case-Control Studies

• Selection Bias

– Selection of case

– Selection of control

• Information Bias

– Limitations in recall

– Recall bias


Selection Bias – Selection of Case

• cases can be selected from a variety of sources, including hospital patients, patients in physicians’ practices, or clinic patients

• Several problems arise if cases are selected from a single hospital

• Results may not be generalizable to all patients with the disease

• If cases are drawn from a tertiary care facility, which selectively admits severely ill patients, any risk factors identified may be risk factors only in persons with severe forms of the disease


Using Incident or Prevalent Cases

• If we use prevalent cases (those already diagnosed):

– a larger number of cases is often available for study

– but if most people with the disease die soon after diagnosis,

– they will be underrepresented in the study

– and the prevalent cases will be a highly non-representative group, weighted toward long-term survivors


• If we use incident cases:

– The problem is that we must wait for new cases to be diagnosed

– Even then, patients who died before the diagnosis was made are still excluded


Selection of Controls

• In 1929, a study at Johns Hopkins University

• tested the hypothesis that tuberculosis protected against cancer

• From a total of 7,500 consecutive autopsies,

• 816 cases of cancer and a control group of 816 without cancer were selected


                     Cases            Controls
                     [Cancer (+)]     [Cancer (-)]
TB (+)               54               133
TB (-)               762              683
Total                816              816
Proportion with TB   6.6%             16.3%

• Of the 816 autopsies of patients with cancer

• 54 had tuberculosis (6.6%)

• whereas of the 816 controls with no cancer

• 133 had tuberculosis (16.3%)

• the prevalence of tuberculosis was higher in the control group than in the case group

• It was concluded that tuberculosis had an antagonistic or protective effect against cancer
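
As a quick check of the percentages, the autopsy counts can be recomputed as follows. The counts come from the table; the dictionary layout and the name `tb_prevalence` are illustrative choices made here.

```python
# Counts from the autopsy table: tuberculosis status by cancer status.
table = {
    "cases":    {"tb_pos": 54,  "tb_neg": 762},
    "controls": {"tb_pos": 133, "tb_neg": 683},
}

def tb_prevalence(group: dict) -> float:
    """Proportion of a group with tuberculosis found at autopsy."""
    return group["tb_pos"] / (group["tb_pos"] + group["tb_neg"])

for name, group in table.items():
    total = group["tb_pos"] + group["tb_neg"]
    print(f"{name}: {group['tb_pos']}/{total} = {tb_prevalence(group):.1%}")
```

The arithmetic reproduces 6.6% and 16.3%. Whether that difference supports a protective effect depends on how the controls were chosen.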


• At the time of the study, tuberculosis was one of the major reasons for hospitalization at Johns Hopkins Hospital

• In choosing the cancer-free control group, the investigator had inadvertently selected a group in which many patients had been diagnosed with and hospitalized for tuberculosis

• The controls came from a pool that was heavily weighted with tuberculosis patients and did not represent the general population

• Clearly, this conclusion was not justified on the basis of these data


• A fundamental conceptual issue relating to selection of controls is:

• the controls should be similar to the cases in all respects other than having the disease or

• they should be representative of all persons without the disease in the population from which the cases are selected


• Controls may be selected

• from non-hospitalized persons living in the community or

• from hospitalized patients admitted for diseases other than that for which the cases were admitted


Use of Hospitalized Patients as Controls

• Relatively more economical to recruit

• But they represent a sample of an ill-defined reference population that generally cannot be characterized

• There is a conceptual attractiveness to comparing hospitalized cases with hospitalized controls from the same institution, who presumably come from the same reference population


Use of Non-Hospitalized People as Controls

• Community sources include school rosters, selective service lists, and insurance company lists

• Controls may also be drawn from the neighborhood in which the case lives: neighborhood controls

• Identify the home of a case as a starting point,

• and from there walk past a specified number of houses in a specified direction and seek the first household that contains an eligible control

• Because of increasing problems of security in the US, many people will no longer open their doors to interviewers

• In developing countries, the door-to-door approach to obtaining controls may be ideal


• An alternative method for selecting such controls is random digit dialing

• Starting from a case’s seven-digit telephone number,

• the first three digits (the exchange) are kept constant,

• and the terminal four digits of the phone number are randomly selected
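
A minimal sketch of the idea in Python, assuming a seven-digit number written with or without a hyphen. The function name `random_digit_dial` and the sample number are hypothetical.

```python
import random

def random_digit_dial(case_number: str, rng: random.Random) -> str:
    """Keep the case's three-digit exchange and randomize the last four digits."""
    digits = "".join(ch for ch in case_number if ch.isdigit())
    if len(digits) != 7:
        raise ValueError("expected a seven-digit telephone number")
    exchange = digits[:3]
    suffix = f"{rng.randrange(10_000):04d}"  # 0000-9999, uniformly at random
    return f"{exchange}-{suffix}"

rng = random.Random(0)  # seeded only so the sketch is reproducible
candidate = random_digit_dial("555-0123", rng)  # hypothetical case number
print(candidate)
```

In practice a generated number may be unassigned, non-residential, or belong to a household with no eligible control, so many numbers usually have to be tried for each control obtained.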


• Another approach to control selection is to use a best friend control

• A case is asked for the name of a best friend who may be willing to participate in the study

• There are also disadvantages:

• A best friend control is likely to be similar to the case in age and in many other demographic and social characteristics

• As a result, the controls may be too similar to the cases in regard to many variables, including the variables that are being investigated in the study

• It may also be useful to select a spouse or sibling control

• A sibling may provide some control over genetic differences between cases and controls


Information Bias – Limitations in Recall

• Much of the data is collected from subjects by interview

• All human beings are limited to varying degrees in their ability to recall information

• Some of the cases or controls who were exposed will be erroneously classified as unexposed,

• and some who were not exposed will be erroneously classified as exposed

• This generally leads to an underestimate of the true risk of the disease associated with the exposure
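
The dilution caused by imperfect recall can be illustrated with a small calculation. It assumes the same recall error in cases and controls, and all the numbers (true exposure proportions, sensitivity, specificity) are hypothetical.

```python
def observed_proportion(true_prop: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as exposed when recall is imperfect."""
    true_positives = true_prop * sensitivity                # exposed, recalled as exposed
    false_positives = (1 - true_prop) * (1 - specificity)   # unexposed, recalled as exposed
    return true_positives + false_positives

# Hypothetical values; the recall error is identical in cases and controls.
true_cases, true_controls = 0.40, 0.20
sens, spec = 0.80, 0.90

obs_cases = observed_proportion(true_cases, sens, spec)
obs_controls = observed_proportion(true_controls, sens, spec)

print(f"true difference:     {true_cases - true_controls:.2f}")
print(f"observed difference: {obs_cases - obs_controls:.2f}")
```

Under these assumptions the gap between cases and controls shrinks from 0.20 to 0.14, so the association looks weaker than it really is.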


Recall Bias

• Example: the possible relationship of congenital malformations to prenatal infections

• Interview mothers of children with congenital malformations (cases) and mothers of children without malformations (controls)

• Each mother is questioned about infections during the pregnancy

• A mother who has had a child with a birth defect tries to identify some unusual event that occurred during her pregnancy

• She wants to know whether the abnormality was caused by something she did

• A mother of a child without a birth defect may not even have noticed a mild infection, or may have forgotten it entirely

• This type of bias is known as recall bias


Matching

• Matching is the process of selecting the controls so that they are similar to the cases in certain characteristics, such as age, race, sex, socioeconomic status, and occupation

• Matching may be of two types:

(1) group matching

(2) individual matching


Group Matching

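• Group matching (or frequency matching)

• The controls are selected so that the proportion with a certain characteristic is the same as the proportion of cases with that characteristic

• E.g. if a given percentage of the cases are married, the controls are selected so that the same percentage are married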
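
Group matching is usually understood as making the proportion of controls with a characteristic equal to that of the cases. A minimal sketch under that reading, with hypothetical people and a hypothetical helper named `frequency_match`:

```python
import random

def frequency_match(cases, pool, key, rng):
    """Pick one control per case so each stratum of `key` has as many controls as cases."""
    needed = {}
    for person in cases:
        needed[person[key]] = needed.get(person[key], 0) + 1
    controls = []
    for value, n_needed in needed.items():
        candidates = [p for p in pool if p[key] == value]
        # random.sample raises ValueError if the stratum has too few candidates.
        controls.extend(rng.sample(candidates, n_needed))
    return controls

def share_married(group):
    return sum(p["married"] for p in group) / len(group)

rng = random.Random(1)  # seeded only for reproducibility
cases = [{"id": i, "married": i < 7} for i in range(20)]              # 7 of 20 married
pool = [{"id": 100 + i, "married": i % 2 == 0} for i in range(200)]   # hypothetical control pool
controls = frequency_match(cases, pool, "married", rng)

print(f"married: cases {share_married(cases):.0%}, controls {share_married(controls):.0%}")
```

Both groups end up 35% married by construction, even though half of the candidate pool is married.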


Individual Matching

• Individual matching (or matched pairs)

• For each case, a control is selected who is similar to the case in the characteristics of concern

• E.g.

• if the first case enrolled in our study is a 45-year-old white woman,

• we will seek a 45-year-old white female control

• the second case is a 24-year-old black man,

• we will select a control who is also a 24-year-old black man
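
The pairing in this example can be sketched as a simple search through a pool of candidates. The pool, the identifiers, and the helper `find_match` are hypothetical, and exact agreement on every matching variable is assumed.

```python
def find_match(case, pool, keys):
    """Return the first unused control that agrees with the case on every key, else None."""
    for candidate in pool:
        if all(candidate[k] == case[k] for k in keys):
            pool.remove(candidate)  # each control is paired with one case only
            return candidate
    return None

keys = ("age", "race", "sex")
cases = [
    {"age": 45, "race": "white", "sex": "F"},
    {"age": 24, "race": "black", "sex": "M"},
]
pool = [  # hypothetical pool of disease-free people
    {"id": "c1", "age": 24, "race": "black", "sex": "M"},
    {"id": "c2", "age": 45, "race": "white", "sex": "F"},
    {"id": "c3", "age": 45, "race": "white", "sex": "M"},
]

pairs = [(case, find_match(case, pool, keys)) for case in cases]
for case, control in pairs:
    print(case, "->", control["id"] if control else "no match found")
```

Each additional matching variable shrinks the set of eligible candidates, so a match becomes harder to find as the list of keys grows.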


Practical Problems with Matching

• If we try to match according to too many characteristics,

• it may be difficult or impossible to identify an appropriate control

• E.g.

• If the case is a 48-year-old black woman who is married, has four children, lives in zip code 21209, and works in a photo-processing plant,

• it may prove difficult or impossible to find a control


Conceptual Problems with Matching

• A more important problem is conceptual

• If we match the cases (breast cancer) and the controls (no breast cancer) for marital status,

• we can no longer study whether or not marital status is a risk factor for breast cancer

• Why not? Because in matching according to marital status,

• we have artificially established an identical proportion in cases and controls

• If 35% of the cases are married,

• we create a control group in which 35% are also married


• Unplanned matching may inadvertently occur in case-control studies

• E.g.

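• If we use neighborhood controls, we are in effect matching for socioeconomic status as well as for cultural and other characteristics of a neighborhood

• If we use best friend controls, the case and the control are likely to share many lifestyle characteristics

• Matching on variables other than known risk factors for the disease, in either a planned or an inadvertent manner, is called overmatching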


Use of Multiple Controls

(1) Controls of the Same Type

(2) Controls of Different Types


Control of the Same Type

• Two, three, or more controls may be selected for each case

• Multiple controls are used to increase the power of the study

• A noticeable gain in power is obtained only up to a ratio of about 1 case to 4 controls

• Why use multiple controls for each case?

• There may be a limit to the number of potential cases available for study

• The number of cases cannot be increased without either extending the study in time to enroll more cases or developing a collaborative multicentered study


Multiple Controls of Different Types

• The exposure of hospital controls may not represent the rate of exposure in a population of non-diseased persons

• The controls may be a highly selected subset of non-diseased individuals and may have a different exposure experience

• E.g. hospitalized patients smoke more than people living in the community

• We do not know what the prevalence of smoking in hospitalized controls represents or how to interpret a comparison of this rate with that of the cases

• To address this problem, we may choose to use an additional control group, such as neighborhood controls


• Case-control studies are also valuable when the disease being investigated is rare

• It is often possible to identify cases for study from disease registries, hospital records, or other sources

• In a cohort study of a rare disease, an extremely large study population may be needed in order to observe a sufficient number of individuals who develop the disease

• A cohort design may also involve many years of follow-up, with logistical difficulty and expense


Case-Control Studies based in a Defined Cohort


Two Types

• Nested Case Control Study

• Case Cohort Study


Nested Case-Control Studies

• A nested case-control study is conducted within a defined cohort in which exposure data and population characteristics are available to some extent, often from the time of enrollment into the cohort


• Both cases and controls are from a known, defined population at risk of disease

• Exposure data are collected prior to the diagnosis of the disease

• Information collected has not been influenced by the knowledge of disease status (no interviewer/information bias)

• The design is less costly than a cohort study because fewer subjects and fewer tests and/or specimens are required


• In a study to determine whether Helicobacter pylori infection was associated with the development of gastric cancer, 189 patients with gastric cancer and 189 cancer-free individuals were identified from a cohort of 130,000 people followed since the 1960s

• Helicobacter pylori infection status was determined using stored serum collected during the 1960s


Example

• Figure - A shows the starting point as a defined cohort of individuals

• Some of them develop the disease in question but most do not

• In this hypothetical example, the cohort is observed over a 5-year period

• During this time, 5 cases develop—1 case after 1 year, 1 after 2 years, 2 after 4 years, and 1 after 5 years


• Figures - B–I show the time sequence in which the cases develop after the start of observations

• At the time each case or cases develop, the same number of controls is selected

• The solid arrows on the left side of the figure denote the appearance of cases of the disease,

• and the dotted arrows on the right side denote the selection of controls who are disease-free but who are at risk of developing the disease


• Figure - B shows case #1 developing after 1 year and Figure - C shows control #1 being selected at that time

• Figure - D shows case #2 developing after 2 years and Figure - E shows control #2 being selected at that time

• Figure - F shows cases #3 and #4 developing after 4 years and Figure - G shows controls #3 and #4 being selected at that time

• Finally, Figure - H shows the final case (#5) developing after 5 years and

• Figure - I shows control #5 being selected at this point


• Figure - I is also a summary of the design and the final study populations used in the nested case-control study

• At the end of 5 years, 5 cases have appeared and at the times the cases appeared a total of 5 controls were selected for study

• In this way, the cases and controls are matched on calendar time and length of follow-up

• Because a control is selected each time a case develops

• a control who is selected early in the study could later develop the disease and become a case in the same study
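
The selection scheme described above can be sketched as follows. The cohort of 50 people and their identifiers are hypothetical, the case times follow the 5-year example, and one control per case is drawn at random from those still disease-free at that time. As a simplification, the same person could be drawn as a control more than once.

```python
import random

def sample_nested_controls(onset_year, rng):
    """For each case, draw one control from people still disease-free at that time.

    onset_year maps person id -> year of diagnosis, or None if the person was
    never diagnosed during follow-up.
    """
    selections = []
    cases = sorted((year, pid) for pid, year in onset_year.items() if year is not None)
    for year, case_id in cases:
        # Risk set: everyone not yet diagnosed by this year (may include future cases).
        at_risk = [pid for pid, y in onset_year.items() if y is None or y > year]
        selections.append((year, case_id, rng.choice(at_risk)))
    return selections

rng = random.Random(42)  # seeded only for reproducibility
# Hypothetical cohort of 50: cases after 1, 2, 4, 4 and 5 years.
onset_year = {pid: None for pid in range(50)}
onset_year.update({3: 1, 11: 2, 20: 4, 27: 4, 35: 5})

selections = sample_nested_controls(onset_year, rng)
for year, case_id, control_id in selections:
    print(f"year {year}: case {case_id}, control {control_id}")
```

Because the risk set at each time includes people who are diagnosed later, a person drawn as a control in year 1 can still appear as a case in year 4 or 5.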


Advantages of Nested Case-Control Studies

• Possibility of recall bias is eliminated, since data on exposure are obtained before disease develops

• Exposure data are likely to represent the pre-illness state since they are obtained years before clinical illness is diagnosed

• Costs are reduced compared to those of a prospective study, since laboratory tests need to be done only on specimens from subjects who are later chosen as cases or as controls


Case-Cohort Studies


• The second type of cohort-based case-control study is the case-cohort design

• Cases develop at the same times that were seen in the nested case-control design,

• but the controls are randomly chosen from the defined cohort with which the study began

• This subset of the full cohort is called the subcohort


• An advantage of this design is that, because controls are not individually matched to each case, it is possible to study different diseases (different sets of cases) in the same case-cohort study, using the same subcohort for controls

• In this design, in contrast to the nested case-control design, cases and controls are not matched on calendar time and length of follow-up;

• instead, exposure is characterized for the subcohort

• This difference in study design needs to be taken into account in analyzing the study results
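
A minimal sketch of the sampling step in a case-cohort design, assuming a hypothetical cohort of 1,000 people, a subcohort of 100, and made-up case lists for two diseases. It illustrates only how the subcohort is drawn and reused, not how the results are analyzed.

```python
import random

def draw_subcohort(cohort_ids, size, rng):
    """Randomly sample the subcohort from the full cohort defined at the start."""
    return set(rng.sample(sorted(cohort_ids), size))

rng = random.Random(7)      # seeded only for reproducibility
cohort_ids = range(1000)    # hypothetical cohort
subcohort = draw_subcohort(cohort_ids, 100, rng)

# Hypothetical case lists for two different diseases arising in the same cohort.
cases_disease_a = {12, 250, 731}
cases_disease_b = {40, 250, 998}

for label, cases in (("A", cases_disease_a), ("B", cases_disease_b)):
    overlap = cases & subcohort  # a case may also happen to be in the subcohort
    print(f"disease {label}: {len(cases)} cases, subcohort of {len(subcohort)}, {len(overlap)} in both")
```

The same randomly drawn subcohort serves as the comparison group for both diseases, which is what allows several outcomes to be studied without selecting new controls each time.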


Ecologic Studies

• The first approach in determining whether an association exists might be to conduct studies of group characteristics, called ecologic studies

• E.g. the relationship between breast cancer incidence and average dietary fat consumption in each country

• The higher the average dietary fat consumption, the higher the breast cancer incidence

• We might be tempted to conclude that dietary fat may be a causal factor for breast cancer


• The problem is that we do not know whether the individuals who developed breast cancer actually had high dietary fat intake

• All we have are average values for each country

• It is conceivable that those who developed breast cancer ate very little dietary fat

• The country-level data do not reveal whether this might be true

• In effect, individuals in each country are characterized by the average figure for that country


• This problem is called the ecologic fallacy

• We may be ascribing to members of a group characteristics that they in fact do not possess as individuals

• We only have data for groups

• We do not have exposure and outcome data for each individual
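
A toy illustration of how group averages can mislead. The two countries and every individual value below are invented for this sketch and are not data from any study.

```python
from statistics import mean

# Hypothetical individuals: (dietary fat in g/day, developed breast cancer?)
countries = {
    "low-fat country":  [(40, False), (50, False), (60, False), (30, True)],
    "high-fat country": [(150, False), (160, False), (50, True), (60, True)],
}

summary = {}
for name, people in countries.items():
    avg_fat = mean(fat for fat, _ in people)                  # what an ecologic study sees
    incidence = mean(1 if case else 0 for _, case in people)  # what an ecologic study sees
    case_fat = mean(fat for fat, case in people if case)      # individual-level information
    summary[name] = (avg_fat, incidence, case_fat)
    print(f"{name}: mean fat {avg_fat:.0f}, incidence {incidence:.0%}, "
          f"mean fat among cases {case_fat:.0f}")
```

At the group level the country with the higher average fat intake has the higher incidence, yet in both invented countries the individuals who developed the disease ate less fat than their country's average. Only individual-level data can show this.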


Comparison of Cohort and Case-Control Studies

• Table 13-1

• Page - 241


THANK YOU!!
