Occurrence and timing of events depend on exposure to the risk of an event
Risk depends on exposure
ID  Age  Educ  MS  Exposure
1   13   1     0   1
1   14   1     0   1
1   15   1     0   1
1   16   1     0   1
1   17   1     0   1
1   18   1     0   1
1   19   2     0   1
1   20   2     0   1
1   21   2     0   1
1   22   2     0   1
1   23   2     0   0.5   Censoring
2   13   1     0   1
2   14   1     0   1
2   15   1     0   1
2   16   1     0   1
2   17   1     0   1
2   18   0     0   1
2   19   0     1   0.5   Event
3   13   1     0   1
3   14   1     0   1
3   15   1     0   1
3   16   1     0   1
3   17   1     0   1
3   18   0     0   1
3   19   0     0   1
3   20   2     0   1
3   21   2     1   0.5   Event
4   13   1     0   1
4   14   1     0   1
4   15   0     0   1
4   16   0     0   1
4   17   0     0   1
4   18   0     0   1
4   19   0     0   1
4   20   0     1   0.5   Event
Person-age record file; time-varying covariates
Age at first marriage and age at change in education: Person-years file
Educ: 0 = not in school full-time; 1 = secondary education; 2 = postsecondary education
Marriage [MS]: 0 = not married 1 = married
Source: Yamaguchi, 1991, p. 22
EDUC   Events (marriages)   Exposure (person-years)
1      0                    18
2      1                    6
0      2                    9
Total  3                    33
O/E rate: 3/33 = 0.0909 events per person-year
Events and exposures
All age periods prior to marriage and age at marriage are included.
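The event and exposure totals by education can be reproduced from the person-age file. A minimal Python sketch, assuming each person's education status is listed by single year of age starting at 13 and that the last listed age (marriage or censoring) contributes 0.5 person-years, as in the table; `histories`, `person_age_records` and `oe_rates` are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical encoding of the four histories: education code by single year of
# age from 13, plus whether the history ends in marriage (True) or censoring (False).
histories = {
    1: ([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2], False),  # ages 13-23, censored at 23
    2: ([1, 1, 1, 1, 1, 0, 0], True),               # ages 13-19, marries at 19
    3: ([1, 1, 1, 1, 1, 0, 0, 2, 2], True),         # ages 13-21, marries at 21
    4: ([1, 1, 0, 0, 0, 0, 0, 0], True),            # ages 13-20, marries at 20
}

def person_age_records(histories, start_age=13):
    """Expand histories into (id, age, educ, event, exposure) rows.
    Assumption: the final year at risk counts as half a person-year."""
    rows = []
    for pid, (educ_by_age, married) in histories.items():
        last = len(educ_by_age) - 1
        for i, educ in enumerate(educ_by_age):
            is_last = i == last
            event = 1 if (is_last and married) else 0
            exposure = 0.5 if is_last else 1.0
            rows.append((pid, start_age + i, educ, event, exposure))
    return rows

def oe_rates(rows):
    """Events, exposure and occurrence/exposure rate per education category."""
    events = defaultdict(int)
    exposure = defaultdict(float)
    for _, _, educ, event, exp_ in rows:
        events[educ] += event
        exposure[educ] += exp_
    return {k: (events[k], exposure[k], events[k] / exposure[k]) for k in sorted(events)}

rows = person_age_records(histories)
table = oe_rates(rows)
total_events = sum(e for e, _, _ in table.values())
total_exposure = sum(x for _, x, _ in table.values())
print(table)
print(total_events, total_exposure, round(total_events / total_exposure, 4))
```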
Exposure: examples
• To risk of conception
• To risk of infection (e.g. malaria, HIV)
• To marriage
• To risk of divorce
• To risk of dying
• Health risk
Exposure to risk
Whenever an event or act gives rise to gain or loss that cannot be predicted
Risk of the unexpected
Williams et al., 1995, Risk management and insurance, McGraw-Hill, New York, p. 16
Exposure analysis
• Being exposed or not
• If exposed, level of exposure (intensity)
• Factors affecting the level of exposure (e.g. age, contacts)
• Interventions may affect the level of exposure
– Contraceptives and sterilisation are used to prevent unwanted pregnancies
– Breastfeeding prolongs postpartum amenorrhoea (PPA)
– Immunisation prevents (reduces) the risk of infectious disease
– Lifestyle reduces/increases the risk of lung cancer
• Which mechanism(s) determine the level of exposure
– e.g. breastfeeding stimulates production of the hormone prolactin, which inhibits ovulation
Hobcraft and Little, ??
Risk levels and differentials
Risk measures
Prediction of risk levels
Determinants of differential risk levels
Risk = potential variation in outcome
• Count: number of events during a given period (observation window)
– Count data
• Probability: probability of an outcome, i.e. the proportion of the risk set experiencing a given outcome (event) at least once
– Basis: risk set
– Risk set: all persons at risk at a given point in time
• Rate: number of events per time unit of exposure (person-time)
– Basis: duration of exposure (duration at risk)
• Rate (general): change in one quantity per unit change in another quantity (usually time; other possible measures include space or miles travelled)
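The three measures can be contrasted on the four marriage histories of the person-age file. A minimal sketch, assuming the per-person exposures are summed from that table (10.5, 6.5, 8.5 and 7.5 person-years) and that all four persons form the risk set at age 13; note that the simple proportion ignores the fact that person 1 is censored, which is one reason to prefer the rate. The name `persons` is illustrative.

```python
# Per-person totals summed from the person-age file:
# (person-years of exposure, 1 if first marriage occurred else 0).
# The dict name `persons` is illustrative, not from the source.
persons = {1: (10.5, 0), 2: (6.5, 1), 3: (8.5, 1), 4: (7.5, 1)}

count = sum(event for _, event in persons.values())       # events in the observation window
risk_set = len(persons)                                    # all persons at risk at age 13
probability = count / risk_set                             # share of risk set with >= 1 event
person_time = sum(exp_ for exp_, _ in persons.values())    # total duration at risk
rate = count / person_time                                 # events per person-year

print(count, probability, person_time, round(rate, 4))
# 3 0.75 33.0 0.0909
```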
(Objective) risk measures
• Difference of probabilities: p1 - p2 (risk difference)
• Relative risk: ratio of probabilities (focus: risk factor)
– probability of event in presence of risk factor / probability of event in absence of risk factor (control group; reference category): p1 / p2
• Odds: odds on an outcome, i.e. the ratio of favourable to unfavourable outcomes; the chance of one outcome rather than another: p1 / (1 - p1)
The odds are what matter when placing a bet on a given outcome, i.e. when something is at stake. Odds reflect the degree of belief in a given outcome.
Relation odds and relative risk: Agresti, 1996, p. 25
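Applied to the leaving-home example (counts from the 1961 cohort table), the objective measures can be computed as in the short Python sketch below. Females are treated as the target group and males as the reference group purely for illustration; the variable names are hypothetical.

```python
# Leaving home early (before age 20), 1961 birth cohort, from the example table.
early = {"females": 135, "males": 74}
total = {"females": 278, "males": 252}

p1 = early["females"] / total["females"]   # target group (illustrative choice)
p2 = early["males"] / total["males"]       # reference (control) group

risk_difference = p1 - p2                  # p1 - p2
relative_risk = p1 / p2                    # p1 / p2
odds_females = p1 / (1 - p1)               # p1 / (1 - p1), equals 135/143

print(round(risk_difference, 3), round(relative_risk, 3), round(odds_females, 3))
```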
Risk measures
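• Odds: two categories (binary data)
$$\text{Odds} = \frac{p}{1-p} \qquad (\text{range of the odds scale: } [0, \infty))$$
$$p = \frac{\text{Odds}}{1+\text{Odds}}, \qquad 1-p = \frac{1}{1+\text{Odds}}$$
$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln(\text{Odds}) \qquad (\text{range: } (-\infty, +\infty))$$
$$p = \frac{\exp[\eta]}{1+\exp[\eta]} = \frac{1}{1+\exp[-\eta]}, \qquad \eta = \operatorname{logit}(p)$$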
Risk measures
Parameters of logistic regression: ln(odds) and ln(odds ratio)
In regression analysis, η is the linear predictor: $$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$
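The odds, logit and inverse-logit relations for binary data, together with the linear predictor, can be written as small Python functions. This is a sketch, not a regression fitter: the function names are hypothetical, and `inv_logit` uses the form 1/(1 + exp[-η]), which can overflow for extremely negative η.

```python
import math

def odds(p):
    """Odds on an outcome with probability p (0 <= p < 1); range [0, inf)."""
    return p / (1 - p)

def prob_from_odds(o):
    """Invert the odds: p = Odds / (1 + Odds)."""
    return o / (1 + o)

def logit(p):
    """ln(odds), defined for 0 < p < 1; range (-inf, +inf)."""
    return math.log(odds(p))

def inv_logit(eta):
    """p = exp(eta) / (1 + exp(eta)) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(beta, x):
    """eta = beta0 + beta1*x1 + beta2*x2 + ...; beta[0] is the intercept."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

p = 0.75
print(odds(p), prob_from_odds(odds(p)), round(logit(p), 4), round(inv_logit(logit(p)), 10))
```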
• Odds: multiple categories (polytomous data)
$$\text{Odds}_1 = \frac{p_1}{p_3}, \qquad \text{Odds}_2 = \frac{p_2}{p_3}$$
$$\ln(\text{Odds}_1) = \ln\frac{p_1}{p_3} = \eta_1, \qquad \ln(\text{Odds}_2) = \ln\frac{p_2}{p_3} = \eta_2$$
$$\ln p_1 - \ln p_3 = \eta_1 \;\Rightarrow\; p_1 = p_3 \exp[\eta_1], \qquad p_3 = 1 - p_1 - p_2$$
$$p_i = \frac{\exp[\eta_i]}{1 + \sum_{j=1}^{2} \exp[\eta_j]} \quad (i = 1, 2), \qquad p_3 = \frac{1}{1 + \sum_{j=1}^{2} \exp[\eta_j]}$$
Select category 3 as reference category
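With category 3 as reference, the three probabilities can be recovered from the two log-odds η1 and η2. A minimal Python sketch; `multinomial_probs` is a hypothetical helper, and the last entry it returns is taken to be the reference category.

```python
import math

def multinomial_probs(etas):
    """Category probabilities from log-odds against the reference category.

    etas[j] = ln(p_j / p_ref) for each non-reference category j;
    the reference category's probability is appended last.
    """
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference category (category 3 here)
    return probs

# Illustrative values: category 1 twice as likely as category 3, category 2 half as likely.
p1, p2, p3 = multinomial_probs([math.log(2.0), math.log(0.5)])
print(round(p1, 4), round(p2, 4), round(p3, 4))
```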
• Odds ratio: ratio of odds (focus: risk indicator, covariate)
– odds in target group / odds in control group [reference category], i.e. the odds of a favourable outcome in the target group divided by the odds in the control group. The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the odds are the same in both populations or conditions.
Target group: k=1; Control group: k=2
$$\theta_{12} = \frac{\text{Odds}_{k=1}}{\text{Odds}_{k=2}} = \frac{p_{1,k=1} / p_{2,k=1}}{p_{1,k=2} / p_{2,k=2}}$$
where $p_{1,k}$ and $p_{2,k}$ are the probabilities of outcomes 1 and 2 in group k.
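For a binary outcome, the ratio of the two groups' odds equals the cross-product ratio of the 2 x 2 table. The Python sketch below checks this on the leaving-home counts, assuming females as the target group (k=1), males as the control group (k=2), outcome 1 = leaving early and outcome 2 = leaving late; the helper names are hypothetical.

```python
def odds_ratio(p_target, p_control):
    """Odds in target group (k=1) over odds in control group (k=2), binary outcome."""
    odds_target = p_target / (1 - p_target)
    odds_control = p_control / (1 - p_control)
    return odds_target / odds_control

def cross_product_ratio(a, b, c, d):
    """Same odds ratio from counts: outcome 1/outcome 2 in target (a, b) and control (c, d)."""
    return (a * d) / (b * c)

# Leaving home early vs late: females (target) 135/143, males (control) 74/178.
or_probs = odds_ratio(135 / 278, 74 / 252)
or_counts = cross_product_ratio(135, 143, 74, 178)
print(round(or_probs, 4), round(or_counts, 4))
```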
Risk measures in epidemiology
• Prevalence: proportion (refers to status)
• Incidence rate: rate at which events (new cases) occur over a defined time period [events per person-time]. The incidence rate is also referred to as incidence density (e.g. Young, 1998, p. 25; Goldhaber and Fireman, 1991).
• Case-fatality ratio: proportion of sick people who die of a disease (a measure of the severity of the disease). It is not a rate! (Young, 1998, p. 27)
Confusion:
• Birth defect prevalence: proportion of live births having defects
• Birth defect incidence: rate of development of defects among all embryos over the period of gestation (Young, 1998, p. 48)
Risk measures in epidemiology
• Attributable risk (among the exposed): proportion of events (diseases) attributable to being exposed: (p1 - p2)/p1 (p2 is subtracted because the non-exposed can also develop the disease)
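The formula is short, but the sketch below makes the subtraction of the baseline risk explicit. The risks used are hypothetical values chosen only to illustrate the arithmetic, not data from the source, and `attributable_risk_exposed` is a hypothetical name.

```python
def attributable_risk_exposed(p_exposed, p_unexposed):
    """Share of events among the exposed attributable to exposure: (p1 - p2) / p1."""
    if p_exposed <= 0:
        raise ValueError("p_exposed must be positive")
    return (p_exposed - p_unexposed) / p_exposed

# Hypothetical risks (not from the source): 20% among exposed, 5% among unexposed.
ar = attributable_risk_exposed(0.20, 0.05)
print(round(ar, 2))
```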
• Subjective probability: degree of belief about the outcome of a trial or process, or about the future; the perception of the probability of an outcome or event. ‘It is highly dependent on judgment’ (Keynes, 1921, A treatise on probability, Macmillan, London). Keynes regarded probability as a subjective concept: our judgment (intuition, gut feeling) about the likelihood of the outcome.
– See also Value-expectancy theory: attractiveness of an alternative (option) depends on the subjective probability of an outcome and the value or utility of the outcome (Fishbein and Ajzen, 1975).
(Subjective) risk measures
In case of multiple categories, select a reference category
Reference category is coded 0
Various coding schemes!
Coding schemes
• Contrast coding: one category is the reference category (simple contrast coding; dummy coding). Model parameters are deviations from the reference category.
• Indicator variable coding: indicator (0,1) variables
• Cornered effect coding (Wrigley, 1985, pp. 132-136) [0,1]
• Effect coding: the mean is the reference. Model parameters are deviations from the mean.
• Centred effect coding (Wrigley, 1985, pp. 132-136) [-1,+1]
• Other types of coding: see e.g. SPSS Advanced Statistics, Appendix A
Vermunt, 1997, p. 10
Coding schemes
• Categories are coded:
– Binary: [0,1], [-1,+1], [1,2]
– Multiple: [0,1,2,3,...], [set of binary]
e.g. 3 categories:
1 0 0
0 1 0
0 0 0
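The difference between dummy (contrast) coding and effect coding shows up in how the reference category is coded. A minimal Python sketch, assuming category 3 is the reference and dropping the redundant all-zero column of the display above; `dummy_coding` and `effect_coding` are hypothetical names, and the effect coding shown uses the common convention of a row of -1 for the reference category, so parameters become deviations from the unweighted mean of the category effects.

```python
def dummy_coding(category, n_categories, reference):
    """Indicator columns for every non-reference category; reference row is all zeros."""
    return [1 if category == c else 0
            for c in range(1, n_categories + 1) if c != reference]

def effect_coding(category, n_categories, reference):
    """Like dummy coding, but the reference row is all -1 (deviations from the mean)."""
    if category == reference:
        return [-1] * (n_categories - 1)
    return dummy_coding(category, n_categories, reference)

for cat in (1, 2, 3):
    print(cat, dummy_coding(cat, 3, reference=3), effect_coding(cat, 3, reference=3))
```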
Example
                              Sex
Age                     Females   Males   Total
Early (LT 20)             135       74     209
Late (GE 20)              143      178     321
Total                     278      252     530
Censored at interview      13       40      53
TOTAL                     291      292     583
Number of young adults leaving home by age and sex, Netherlands, 1961 birth cohort
The survey (Sept. 1987 - Febr. 1988): sample of 583 young adults born in 1961; 530 left home before the survey, 53 censored cases
A. Counts
Age              Females   Males   Total
Early (LT 20)      135       74     209
Late (GE 20)       143      178     321
Total              278      252     530

B. Probabilities
Age              Females   Males   F+M
Early (LT 20)      0.49     0.29    0.39
Late (GE 20)       0.51     0.71    0.61
Total              1.00     1.00    1.00

C. Odds and logit
                   Females   Males   F+M
Odds: early/late    0.94     0.42    0.65
Logit: early/late  -0.058   -0.878  -0.429
Young adults leaving home by age and sex, Netherlands, 1961 birth cohort
Descriptive statistics
Reference categories: Late (GE 20), Males
Odds on leaving home early (rather than late)     Logit
- Males: 74/178 = 0.416                            -0.877
- Females: 135/143 = 0.944                         -0.058
Odds ratio (θ): 0.944/0.416 = 2.27                  0.820
(if we bet that a person leaves home early, we should bet on females; they are the ‘winners’, i.e. they leave home early)
Var(θ) = θ² [1/135 + 1/143 + 1/74 + 1/178] = 0.1725
ln θ = 0.820
Var(ln θ) = 1/135 + 1/143 + 1/74 + 1/178 = 0.0335
Selvin, 1991, p. 345
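The odds ratio, its log and the two variance formulas can be computed directly from the 2 x 2 table. A Python sketch, assuming the usual (Woolf) form Var(ln θ) = sum of 1/n over the four cells, the delta-method approximation Var(θ) ≈ θ² Var(ln θ), and a normal-approximation 95% interval on the log scale; the interval is an addition for illustration, not part of the slide. Computed without intermediate rounding, Var(θ) comes out at about 0.173.

```python
import math

# 2 x 2 table: leaving home early/late by sex (females = target, males = control).
a, b = 135, 143   # females: early, late
c, d = 74, 178    # males: early, late

theta = (a * d) / (b * c)                  # odds ratio
log_theta = math.log(theta)                # ln(theta)
var_log_theta = 1/a + 1/b + 1/c + 1/d      # variance of ln(theta)
var_theta = theta**2 * var_log_theta       # delta-method approximation of Var(theta)

# Approximate 95% confidence interval: built on the log scale, then back-transformed.
se = math.sqrt(var_log_theta)
ci = (math.exp(log_theta - 1.96 * se), math.exp(log_theta + 1.96 * se))
print(round(theta, 4), round(log_theta, 4), round(var_log_theta, 4), round(var_theta, 4), ci)
```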
Table: Number of young adults leaving home by age and sex
Age       Females   Males   Total
< 20        135       74     209
GE 20       143      178     321
Total       278      252     530
Dummy coding: reference category: (i) females; (ii) leaving home late
Logit model: $$\text{Logit}_i = \ln\frac{p_i}{1-p_i}$$ where p_i is the probability of leaving home early and i is sex (i = 1 for females and 2 for males)
ODDS
Females (reference): 135/143 = 0.9440
Males: 74/178 = 0.4157
ODDS RATIO
ODDS males / ODDS females = 0.4157/0.9440 = 0.4404
LOGIT p_i: ln(0.9440) = -0.05757 for females and ln(0.4157) = -0.8777 for males
Ln odds ratio = -0.8201
NOTE that -0.8777 = -0.05757 - 0.8201
Are males more likely to leave home early than females?
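The NOTE line is the dummy-coded logit model at work: the intercept is the logit of the reference category (females) and the coefficient for males is the ln odds ratio. A Python sketch under those assumptions; with one binary covariate the two-parameter model is saturated, so it reproduces the observed logits exactly. Variable names are illustrative.

```python
import math

# Counts of leaving home early/late by sex (females are the reference category).
early = {"females": 135, "males": 74}
late = {"females": 143, "males": 178}

logits = {sex: math.log(early[sex] / late[sex]) for sex in early}

# Saturated logit model with dummy coding: logit_i = b0 + b1 * male_i
b0 = logits["females"]                     # intercept = logit of the reference category
b1 = logits["males"] - logits["females"]   # = ln(odds ratio, males vs females)

print(round(b0, 5), round(b1, 4), round(math.exp(b1), 4))
```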
Leaving home
[Figure: Odds and probabilities. Relation between probabilities, odds and logit: odds (0 to 10) and logit (-2.5 to 2.5) plotted against probability (0.0 to 1.0).]
Risk analysis: models
Prediction of risk levels and differential risk levels
Probability models and regression models
– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model
– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model
(parameter p, the probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)
– Rates → occurrences/exposure → Poisson r.v. → log-rate model
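For rates, the occurrence/exposure estimate and the log-rate link can be illustrated on the totals of the marriage example (3 events in 33 person-years). A minimal Python sketch, assuming a single constant rate; `poisson_pmf` is a hypothetical helper, and ln(exposure) plays the role of the offset in a log-rate model.

```python
import math

def poisson_pmf(k, mu):
    """P(D = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Occurrence/exposure data from the marriage example: 3 events in 33 person-years.
events, exposure = 3, 33.0

rate = events / exposure          # maximum-likelihood estimate of a constant rate
eta = math.log(rate)              # log-rate: ln(mu) = ln(exposure) + eta
mu = exposure * math.exp(eta)     # expected count under the fitted rate
prob_observed = poisson_pmf(events, mu)
print(round(rate, 4), round(eta, 4), round(mu, 4), round(prob_observed, 4))
```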