Core Deposits Sensitivity and Survival Analysis
Laura Roberts, Hugh Blaxall, Brian Velligan
Sept. 13, 2010
Research question 1a
• Question 1a. How can we visually summarize account duration?
Research question 1b
• Question 1b. How can we predict the length of time a person will keep a core account open (account duration)? We cannot simply compute an average of account durations because we do not know how far into the future current accounts will “survive.” Simple means will produce a negatively biased estimate.
• Perhaps we can revise our question to read, “What is the probability a person will keep an account open for a specific period of time?” This new question allows us to use survival analysis, hazard probabilities, and risk functions to get a detailed picture of account duration.
Question 1b (continued)
• Can we create a model using time and other indicators (e.g., the interest rate or the change in the interest rate on the account) as predictors of account duration? This is a more sophisticated question for another time … food for thought for now …
Question 1c
• 1c – How can we summarize typical account duration with a single index? Remember that means and other simple averages will not do the trick, because we do not know how long the still-open accounts will stay open …
What is the best statistical tool for answering each question?
• Question 1a – to summarize duration visually, use a histogram of the frequency of each duration for censored and uncensored accounts. I’ll show you how to do this.
• Question 1b – to predict duration, use survival analysis.
• Question 1c – for a single index, we can use the median lifetime, the time at which the estimated survival probability falls to 0.5 … more on this …
Background for Study
• 1. Use a multi-cohort design, for example accounts opened between 1972 and 1977 and followed until 1984.
• 2. Measure the duration of each account.
• 3. Predict the length of time until a given event, in this case the closing of the account.
• 4. Some people will not close the account within the time period of observation. These people (accounts) are considered to be censored.
© Willett, Harvard University Graduate School of Education, 04/13/23
S052/II.2(b) – Slide 8
Dataset: acctdur.txt
Overview: Discrete-time, person-level dataset on the duration of accounts opened between 1972 and 1977 and followed without interruption until 1984.
Source: Bank records.
Sample size: 3941 accounts.
More info: Singer & Willett, 2003.
Let’s examine an example …
Note on the labeling of the discrete-time “bins”: we regarded the year in which an account was opened as its zeroth year. If the account was then closed sometime during the following year, it was classified as having a duration of one year and as having been closed in “year one.”
S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Introducing A Dataset On Account Duration
“Multiple Cohort” Sample Design
Be aware that multiple annual cohorts of accounts are pooled together into this single sample:
• Cohorts entered the sample sequentially between 1972 and 1977.*
• All cohorts were followed until the end of 1984.
Important Distinction You Must Keep In Mind
The two “modern” approaches to survival analysis differ in the way they require duration to be measured:
• In discrete-time survival analysis, time is measured in discrete units, such as semesters or years.
• In continuous-time survival analysis, time can be measured to any level of precision.
Research Question
Whether, and if so when, are accounts closed?
The dataset is straightforward, containing IDs and length of account, with one small hitch …
Structure of Dataset
Col# | Var Name | Variable Description | Variable Metric/Labels
1 | ID | Customer identification code. | Integer
2 | acctopen | Number of years that the account remained open or, for a censored account, the number of years until it was censored by the end of the study in 1984. | Integer
3 | CENSOR | Dummy variable indicating whether an account was censored by the end of data collection in 1984. | Dichotomous: 0 = not censored, 1 = censored
The Difficult Problem of Censoring!!!
There is a problem that is intrinsic to survival data, and it is illustrated in this dataset. The event of interest in the study is “closing an account,” but not every customer (account) actually experiences this event while being observed by the researchers. We say that these accounts are “censored” by the end of data collection.
And, of course, some of the censored accounts will eventually experience the event of interest, but not while the researchers are watching! Ignoring this can seriously bias estimates of time-to-event. And, given that time-to-event is the focus of our research question, we need to figure out how to deal with this!
One sensible thing you can do is display the frequency with which each account length occurs, in a vertical histogram that includes all the accounts in the sample, both censored and uncensored.
I created this vertical histogram by typing the frequencies of each account length into an Excel spreadsheet. You can create similar vertical histograms in SAS too, but they are not so pretty.
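The frequency tally behind such a histogram can be sketched in Python. This is only an illustrative sketch, not how the Excel chart was produced: it assumes person-level records of the form (ID, acctopen, CENSOR), coded as in the dataset structure, and it uses just the fifteen example accounts from the person-level excerpt in these slides. The function names `duration_frequencies` and `text_histogram` are hypothetical.

```python
from collections import Counter

# Person-level records (ID, acctopen, CENSOR) from the person-level excerpt in
# these slides; CENSOR = 1 means the account was still open at the end of 1984.
RECORDS = [
    (1, 1, 0), (2, 2, 0), (3, 1, 0), (4, 1, 0), (5, 12, 1),
    (6, 1, 0), (7, 12, 1), (8, 1, 0), (9, 2, 0), (10, 2, 0),
    (12, 7, 0), (13, 12, 1), (14, 1, 0), (15, 12, 1), (16, 12, 1),
]


def duration_frequencies(records):
    """Return {duration: (n_uncensored, n_censored)} for every observed duration."""
    uncensored = Counter(dur for _, dur, censor in records if censor == 0)
    censored = Counter(dur for _, dur, censor in records if censor == 1)
    durations = sorted(set(uncensored) | set(censored))
    return {d: (uncensored[d], censored[d]) for d in durations}


def text_histogram(freqs):
    """Crude text rendering: one row per duration, '#' = uncensored, 'o' = censored."""
    return [f"{d:>2} | {'#' * u}{'o' * c}" for d, (u, c) in freqs.items()]


if __name__ == "__main__":
    for row in text_histogram(duration_frequencies(RECORDS)):
        print(row)
```

With only the excerpt, the tally shows six uncensored accounts of duration 1, three of duration 2, one of duration 7, and five censored accounts at 12; a real run would use all 3941 accounts.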
Note the impact of the multi-cohort research design: no account could be censored before year 7, because even the last cohort (opened in 1977) was observed for 7 years by the end of 1984. Any account still open at the end of 1984 is a censored case, with a censored duration of 7 to 12 years depending on the year it was opened.
Exploring the Account Data: Answer to Research Question 1a
Exploring the Data
Here are two hopeless strategies for dealing with censoring when summarizing account duration …
If we set the duration of each censored account to its longest observed (censored) duration, the mean account duration for all accounts is 6.31 years. This is a negatively biased estimate of true duration if even one censored account remains open beyond its censored duration.
If you average the durations of only the uncensored accounts, the mean account duration is 3.73 years. This is also a negatively biased estimate of the average population account duration, because it discards the censored accounts, which tend to be the longest-lived.
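As a check, both naive means can be reproduced from the life-table counts reported later in these slides (risk-set sizes and numbers of closures for periods 1 to 12). This is an illustrative sketch with hypothetical names; it assumes each account's recorded duration (acctopen) equals the number of person-period rows it contributes, so the sum of all recorded durations is the total number of person-periods.

```python
# Life-table counts for periods 1..12 (from the EVENT-by-PERIOD table in these
# slides): accounts at risk at the start of each period, and closures in it.
AT_RISK = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
CLOSED = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def naive_means(at_risk, closed):
    """Return (mean over all accounts, mean over uncensored accounts only).

    Both treat the recorded duration as if it were the true duration, which is
    why both are negatively biased when some accounts are censored.
    """
    n_accounts = at_risk[0]
    # Each account appears once in every risk set it belongs to, so the sum of
    # the risk sets equals the sum of all recorded durations.
    total_recorded_years = sum(at_risk)
    mean_all = total_recorded_years / n_accounts
    # An uncensored account that closes in period t has duration exactly t.
    years_uncensored = sum(t * d for t, d in enumerate(closed, start=1))
    mean_uncensored = years_uncensored / sum(closed)
    return mean_all, mean_uncensored


if __name__ == "__main__":
    all_mean, unc_mean = naive_means(AT_RISK, CLOSED)
    print(f"all accounts (censored kept at observed duration): {all_mean:.2f}")
    print(f"uncensored accounts only: {unc_mean:.2f}")
```

This prints 6.31 and 3.73, matching the two figures above.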
Dataset: Acctdur_PP.txt
Overview: Person-period dataset containing the same information as the Acctdur.txt person-level dataset, on the duration of accounts opened between 1972 and 1977 and followed without interruption until 1984.
Source: Bank records.
Sample size: 24875 annual person-period records.
More info: Singer & Willett, 2003.
You can resolve these problems by working with your data in a different format: re-format the data into a person-period format. In a person-period dataset, you can estimate a different class of summary statistics that address the “whether” and “when” questions:
• Hazard probability.
• Survival probability.
• Median lifetime.
Resolving The Problem Of Censoring By Working In A Person-Period Dataset
Notice that the name of the dataset is different.
Here’s a clue to the difference between the person-level and the person-period datasets …
Person-Level Dataset
ID | acctopen | CENSOR
1 | 1 | Not censored
2 | 2 | Not censored
3 | 1 | Not censored
4 | 1 | Not censored
5 | 12 | Censored
6 | 1 | Not censored
7 | 12 | Censored
8 | 1 | Not censored
9 | 2 | Not censored
10 | 2 | Not censored
12 | 7 | Not censored
13 | 12 | Censored
14 | 1 | Not censored
15 | 12 | Censored
16 | 12 | Censored
Etc.
Person-Period Dataset
ID | PERIOD | EVENT
1 | 1 | 1
2 | 1 | 0
2 | 2 | 1
3 | 1 | 1
4 | 1 | 1
5 | 1 | 0
5 | 2 | 0
5 | 3 | 0
5 | 4 | 0
5 | 5 | 0
5 | 6 | 0
5 | 7 | 0
5 | 8 | 0
5 | 9 | 0
5 | 10 | 0
5 | 11 | 0
5 | 12 | 0
6 | 1 | 1
7 | 1 | 0
7 | 2 | 0
7 | 3 | 0
7 | 4 | 0
7 | 5 | 0
7 | 6 | 0
7 | 7 | 0
7 | 8 | 0
7 | 9 | 0
7 | 10 | 0
7 | 11 | 0
7 | 12 | 0
Etc.
In a person-period dataset:
• Each person (account) has one row of data for each time period.
• Their data record continues until the time period in which they either experience the event of interest or are censored.
Inspecting the Person-Period Dataset
Account #2 is not censored, so it experiences the event of interest (i.e., the account is closed) in the 2nd year.
Account #7 is censored: it never experiences the event of interest (i.e., it is never closed) during the 12 years in which accounts are observed.
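The expansion from the person-level file to the person-period file can be sketched in Python. This is a minimal illustration, not the procedure used to build the actual file: it assumes the coding in the dataset structure (acctopen = years observed, CENSOR = 1 for censored) and uses accounts 1 to 7 from the excerpt, so its output should reproduce the person-period listing above. The function name `to_person_period` is hypothetical.

```python
# Person-level rows (ID, acctopen, CENSOR) for accounts 1-7 from the excerpt;
# CENSOR = 1 means the account was still open at the end of 1984.
PERSON_LEVEL = [
    (1, 1, 0), (2, 2, 0), (3, 1, 0), (4, 1, 0),
    (5, 12, 1), (6, 1, 0), (7, 12, 1),
]


def to_person_period(person_level):
    """Expand each account into one (ID, PERIOD, EVENT) row per period observed.

    EVENT is 1 only in the final period of an uncensored account (the period in
    which it closed); every other row, including every row of a censored
    account, has EVENT = 0.
    """
    rows = []
    for acct_id, duration, censor in person_level:
        for period in range(1, duration + 1):
            closed_now = period == duration and censor == 0
            rows.append((acct_id, period, 1 if closed_now else 0))
    return rows


if __name__ == "__main__":
    for row in to_person_period(PERSON_LEVEL):
        print(*row)
```

Each account contributes exactly acctopen rows, which is why the full person-period file has 24875 records for 3941 accounts.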
EVENT (Did customer close account in this time period?) by PERIOD (current time period)
PERIOD | No close: Frequency (Col %) | Close: Frequency (Col %) | Total
1 | 3485 (88.43) | 456 (11.57) | 3941
2 | 3101 (88.98) | 384 (11.02) | 3485
3 | 2742 (88.42) | 359 (11.58) | 3101
4 | 2447 (89.24) | 295 (10.76) | 2742
5 | 2229 (91.09) | 218 (8.91) | 2447
6 | 2045 (91.75) | 184 (8.25) | 2229
7 | 1922 (93.99) | 123 (6.01) | 2045
8 | 1563 (95.19) | 79 (4.81) | 1642
9 | 1203 (95.78) | 53 (4.22) | 1256
10 | 913 (96.31) | 35 (3.69) | 948
11 | 632 (97.53) | 16 (2.47) | 648
12 | 386 (98.72) | 5 (1.28) | 391
Total | 22668 | 2207 | 24875
Here’s the Life Table – a Two-Way Contingency Table Analysis of EVENT by PERIOD …
Beginning Of The Life Table Analysis – Estimating The Sample Hazard Probability
We can use these frequencies to estimate a hazard probability that describes the “risk of closing” in each time-period.
Hazard probability is the (conditional) probability that an account will experience the event of importance (i.e., close) in a particular time-period, given that it has “survived” up until this period.
In discrete time period #1, for instance, there are 3941 accounts “at risk of closing.” Of this “risk set” of accounts, 456 were observed to close. Hence the probability that an account will close in this period, given that it entered it, is 456/3941, or 0.1157. So the sample hazard probability in discrete time period #1 is
$$\hat{h}(t_1) = \frac{456}{3941} = 0.1157$$
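Applying the same calculation to every period, ĥ(t_j) = (number closing in period j) / (number at risk in period j), gives the full set of sample hazard probabilities. A minimal Python sketch using the counts from the life table above; the names are illustrative only.

```python
# Counts from the EVENT-by-PERIOD life table for periods 1..12.
AT_RISK = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
CLOSED = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def sample_hazard(at_risk, closed):
    """Sample hazard in each period: closures divided by the size of the risk set."""
    return [d / n for n, d in zip(at_risk, closed)]


if __name__ == "__main__":
    for period, h in enumerate(sample_hazard(AT_RISK, CLOSED), start=1):
        print(f"period {period:>2}: h = {h:.4f}")
```

Rounded to four decimals, these match the “close” column percentages of the life table.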
And the sample hazard probabilities for discrete time periods #4, #5, #6 and #7 …
Continuing The Life Table Analysis – Estimating Sample Hazard Probability
Something different is happening here in the Life Table. What is it? Why is it occurring? Is it a problem?
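One way to see what changes from period 7 onward is to count the accounts that leave the risk set without closing. Since the risk set in period j+1 consists of the accounts that entered period j and neither closed nor were censored in it, the number censored in period j is n_j - d_j - n_{j+1}. A minimal Python sketch using the life-table counts; the names are illustrative.

```python
# Counts from the EVENT-by-PERIOD life table for periods 1..12.
AT_RISK = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
CLOSED = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def censored_per_period(at_risk, closed):
    """Accounts that drop out of the risk set in each period without closing.

    Observation ends after period 12, so the risk set after the last period is
    taken to be 0: every account still open after period 12 counts as censored.
    """
    next_risk = at_risk[1:] + [0]
    return [n - d - nxt for n, d, nxt in zip(at_risk, closed, next_risk)]


if __name__ == "__main__":
    for period, c in enumerate(censored_per_period(AT_RISK, CLOSED), start=1):
        print(f"period {period:>2}: censored = {c}")
```

With these counts the sketch finds no censoring in periods 1 to 6 and between 241 and 386 censored accounts in each of periods 7 to 12, consistent with the multi-cohort design.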
Conclusion: the hazard probability gives the risk of closing in each year after an account is opened.
Collect the sample hazard probabilities together and plot them as a sample hazard function …
Plotting the Sample Hazard Function
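As a rough stand-in for the plotted figure, a text plot of the sample hazard function can be sketched in Python. This is illustrative only (hypothetical names, no plotting library assumed) and uses the hazard values from the life table.

```python
# Sample hazard probabilities h(t) for periods 1..12, from the life table.
HAZARD = [0.1157, 0.1102, 0.1158, 0.1076, 0.0891, 0.0825,
          0.0601, 0.0481, 0.0422, 0.0369, 0.0247, 0.0128]


def text_plot(values, width=50):
    """One row per period; bar length is proportional to the value, scaled so the maximum fills `width`."""
    top = max(values)
    return [f"{t:>2} | {'*' * round(width * v / top)} {v:.4f}"
            for t, v in enumerate(values, start=1)]


if __name__ == "__main__":
    print("\n".join(text_plot(HAZARD)))
```

The bars stay roughly level for the first four years and then shrink steadily, the same shape a graphical hazard plot would show.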
Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t)
0 | | 1.0000
1 | 0.1157 | 0.8843
2 | 0.1102 | 0.7869
3 | 0.1158 | 0.6958
4 | 0.1076 | 0.6209
5 | 0.0891 | 0.5656
6 | 0.0825 | 0.5189
7 | 0.0601 | 0.4877
8 | 0.0481 | 0.4642
9 | 0.0422 | 0.4446
10 | 0.0369 | 0.4282
11 | 0.0247 | 0.4177
12 | 0.0128 | 0.4123
Once you have the sample hazard probabilities, you can cumulate them to get sample survival probabilities …
Sample Survival Probability
Survival probability in any time period is the probability of “surviving” beyond that period (i.e., the probability of not experiencing the event of interest until after that period).
Continuing The Life Table Analysis – Estimating Sample Survival Probability
Here, all accounts survived the 0th time period, so the estimated sample survival probability in the 0th period is 1.000.
The estimated hazard probability suggests that a proportion of 0.1157 of accounts in the 1st period risk set will “die” in the 1st period (i.e., close).
Because a proportion of 0.1157 of the risk set will “die” in the 1st period, we know that (1 - 0.1157), or 0.8843, of the 1st-period risk set will survive.
In other words, 0.8843 of the entering “1.0000” will remain “alive” beyond the 1st time period (and will therefore be potentially available to close at some later time).
The sample survival probability in the 1st time period is therefore 0.8843 × 1.0000, or:
$$\hat{S}(t_1) = 0.8843$$
And the estimated survival probability in discrete time period #2 …
Continuing The Life Table Analysis – Estimating Sample Survival Probability
Here, according to the estimated sample survival probability, a proportion of 0.8843 of the accounts survived the 1st time period.
The estimated hazard probability suggests that a proportion of 0.1102 of accounts in the 2nd period risk set will “die” in the 2nd period (i.e., close).
Because a proportion of 0.1102 of the risk set will “die” in the 2nd period, we know that (1 - 0.1102), or 0.8898, of the 2nd-period risk set will survive.
In other words, a proportion of 0.8898 of the entering “0.8843” will remain “alive” beyond the 2nd time period (and be potentially available to close later).
The sample survival probability in the 2nd time period is therefore 0.8898 × 0.8843, or:
$$\hat{S}(t_2) = 0.7869$$
And the estimated survival probability in discrete time period #3, etc. …
Continuing The Life Table Analysis – Estimating Sample Survival Probability
Here, according to the estimated sample survival probability, a proportion of 0.7869 of the accounts survived the 2nd time period.
The estimated hazard probability suggests that a proportion of 0.1158 of accounts in the 3rd period risk set will “die” in the 3rd period (i.e., close).
Because a proportion of 0.1158 of the risk set will “die” in the 3rd period, we know that (1 - 0.1158), or 0.8842, of the 3rd-period risk set will survive.
In other words, a proportion of 0.8842 of the entering “0.7869” will remain “alive” beyond the 3rd time period (and be potentially available to close later).
The sample survival probability in the 3rd time period is therefore 0.8842 × 0.7869, or:
$$\hat{S}(t_3) = 0.6958$$
[Life-table schematic: columns Time Period, Sample Hazard Probability h(t), Sample Survival Probability S(t); rows t_{j-1} and t_j, showing Ŝ(t_j) obtained from ĥ(t_j) and Ŝ(t_{j-1}).]
As a general principle, the estimated survival probability in any time period j can be found by substituting into a simple little rule …
Simple Rule For Estimating Sample Survival Probability
So, in general, in any time period j …
$$\hat{S}(t_j) = \left[1 - \hat{h}(t_j)\right]\hat{S}(t_{j-1})$$
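The rule can be applied recursively, starting from Ŝ(t_0) = 1. A minimal Python sketch; it computes each hazard directly from the life-table counts rather than from the rounded four-decimal values, so intermediate products carry full precision. The names are illustrative.

```python
# Counts from the EVENT-by-PERIOD life table for periods 1..12.
AT_RISK = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
CLOSED = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def sample_survival(at_risk, closed):
    """Return [S(t_0), S(t_1), ..., S(t_12)] using S(t_j) = [1 - h(t_j)] * S(t_{j-1})."""
    survival = [1.0]  # every account survives period 0
    for n, d in zip(at_risk, closed):
        hazard = d / n
        survival.append((1.0 - hazard) * survival[-1])
    return survival


if __name__ == "__main__":
    for period, s in enumerate(sample_survival(AT_RISK, CLOSED)):
        print(f"period {period:>2}: S = {s:.4f}")
```

Rounded to four decimals, the output reproduces the survival column of the life table, from 1.0000 down to 0.4123.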
Plotting the sample survival probabilities against time period provides the sample survivor function.
This is a typical monotonically decreasing survivor function … The estimated median lifetime is about 6.6 years, the point at which half of the accounts are “still alive” (open).
Plotting the Sample Survivor Function And Estimating Median Lifetime
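A common way to estimate the median lifetime is linear interpolation between the last period m whose survival probability is still at or above 0.5 and the next period: m + (Ŝ(t_m) - 0.5) / (Ŝ(t_m) - Ŝ(t_{m+1})). The slides do not say how the 6.6 figure was computed, so this Python sketch assumes that interpolation; it uses the tabled survival probabilities, and the function name is hypothetical.

```python
# Sample survival probabilities S(t_0)..S(t_12) from the life table.
SURVIVAL = [1.0000, 0.8843, 0.7869, 0.6958, 0.6209, 0.5656, 0.5189,
            0.4877, 0.4642, 0.4446, 0.4282, 0.4177, 0.4123]


def median_lifetime(survival):
    """Linearly interpolate the time at which S(t) crosses 0.5.

    survival[j] is S(t_j) for j = 0, 1, 2, ...; returns None if S(t) never
    drops below 0.5 within the observed periods.
    """
    for m in range(len(survival) - 1):
        s_m, s_next = survival[m], survival[m + 1]
        if s_m >= 0.5 > s_next:
            return m + (s_m - 0.5) / (s_m - s_next)
    return None


if __name__ == "__main__":
    print(f"estimated median lifetime: {median_lifetime(SURVIVAL):.2f} years")
```

Here the crossing falls between periods 6 (0.5189) and 7 (0.4877), giving about 6.61 years, in line with the 6.6 quoted above. Returning None when survival never falls below 0.5 reflects that, with heavier censoring, the median lifetime may not be estimable from the observed periods.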
Research Question 2 for Next Time …
• Question 2. How can we predict core deposit interest rates?
• A. From the prime interest rate?
• B. From market interest rates?
• 1. Can we predict the core deposit interest rate from 3-month LIBOR (one index of market interest rates)?
• 2. From lagged LIBOR indices?
• 3. Are there other market interest rate indices we want to include to predict the core deposit interest rate?