Developing predictive models for social care
Theo Georghiou, Geraint Lewis & Adam Steventon
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
Care Home Admissions
• Undesirable
• Costly
• Recorded in routine data
• Potentially avoidable
Upstream Interventions
• There is robust evidence that certain preventative interventions are effective at avoiding or delaying care home admission
• But they are only cost-effective if they are offered to people truly at high risk
Predictive Factors
• Many factors are known to be predictive of care home admission
• Several face-to-face tools have been built using these factors
Factors statistically predictive of institutionalisation
Predictive factors reported across studies:
• Age
• Dementia / Cognitive impairment
• ADL restriction
• Number of family members
• Use of day services
• Incontinence
• Co-morbidity
• Sickness
• Severe disability
• Malignancy
• Consulting doctors at general hospitals
• Temporary nursing home assistance
• Housing conditions
• Marital status
• Walking ability
• Night delirium
• Mental disorientation
• Age of primary caregiver
• Living alone
• Number of sub-caregivers
• Number of rooms in house
• Home ownership
• Use of home help
• Self-perceived health
Health Needs
• Diagnoses
• Prescriptions
• Record of Health Contacts
Social Care Needs
• Client group
• Disabilities
• Record of care history
Health Service Use
• GP visits
• Community care
• Hospital care
Social Care Use
• Residential care
• Intensive home care
• Direct payments
Predictive Model
PAST FUTURE
Predictions based on routine data
• Less labour intensive so they can stratify the population systematically and repeatedly
• Avoid “non-response bias”
• Can identify people with lower, emerging, risk
Potential Drawbacks
• Important issues of confidentiality and consent to consider
• Linking data sources at individual level across health and social care is problematic where there is no NHS number in social care
• The tools are never 100% accurate
• Data may be missing from routine databases on certain groups
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
Before predictive modelling can work, we need to reconcile the following:
1. Predictive modelling is believed to be very valuable in improving patient care
2. But at the same time we need to protect patient confidentiality and process data appropriately
Data protection
Is it possible to obtain consent from individuals prior to predictive modelling?
Not feasible given the numbers of patients involved
and:
“it has become clear that it is not appropriate to seek patient consent as not everyone whose data is analysed will be offered the new service.”
Source: Patient Information Advisory Group
Legal safeguards for health data
1. The principles of common law on informed consent and patient confidentiality
2. The Data Protection Act 1998, which requires appropriate data handling
3. The Human Rights Act 1998, which is concerned with the invasion of privacy
4. Also, the Caldicott principles in the NHS
Personal data
According to DPA 1998:
Personal data means data which relate to a living individual who can be identified –
(a) from those data, or
(b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller
Personal data relating to a person’s “physical or mental health or condition” is sensitive personal data.
DPA 1998 requirements for processing of sensitive personal data
At least one of the following:
1. Processing with explicit consent of the data subject
2. Processing necessary to protect the vital interests of the data subject or another person, where it is not possible to get consent
3. Processing necessary for the purpose of, or in connection with, legal proceedings (including prospective legal proceedings), etc.
4. The processing is necessary for medical purposes and is undertaken by a health professional or a person owing a duty of confidentiality equivalent to that owed by a health professional
Medical purposes is defined in the Act to include preventative medicine, medical diagnosis, medical research, the provision of care and treatment, and the management of healthcare services.
Alternatives (1): s60 powers
Section 60 of the Health and Social Care Act 2001 (later s251 of the National Health Service Act 2006):
Introduced to allow the regulated use of information by organisations wishing to obtain patient identifiable data [a similar concept to sensitive personal data], for medical purposes, where it was impracticable to obtain informed consent
Applies in England and Wales
Disclosure of information on the basis of an Order made under s60 cannot be legitimately accused of involving breaches of confidence (source: Information Commissioner)
PIAG (later ECC) set up to advise the Secretary of State on the use of powers provided by s60
Pseudonymisation in practice
[Figure: inpatient, outpatient, A&E and GP records, each carrying name, address and DOB, are reduced to a common identifier (e.g. 131178), which is encrypted (e.g. to J7KA42). The encrypted, linked data receive a risk score (e.g. J7KA42, 76.4); the data are then decrypted with the risk score attached (e.g. 131178, 76.4).]
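The flow in this slide can be sketched in code. This is only an illustration, not the encryption programme used in the project: it assumes a keyed hash (HMAC-SHA256) as the pseudonymisation function and a lookup table held by the key holder for re-identification. The key, the function names and the example records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the key holder, never by the analysts.
SECRET_KEY = b"example-key-held-by-data-controller"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12].upper()


def build_reverse_lookup(identifiers, key: bytes = SECRET_KEY) -> dict:
    """Key holder's table mapping pseudonyms back to real identifiers."""
    return {pseudonymise(i, key): i for i in identifiers}


# Records from two sources for the same person (illustrative values only).
inpatient = [{"id": "131178", "emergency_admissions": 2}]
gp = [{"id": "131178", "consultations": 14}]

# Each source is pseudonymised before it reaches the analysts.
inpatient_pseudo = [
    {"pid": pseudonymise(r["id"]), "emergency_admissions": r["emergency_admissions"]}
    for r in inpatient
]
gp_pseudo = [{"pid": pseudonymise(r["id"]), "consultations": r["consultations"]} for r in gp]

# Analysts link on the pseudonym and attach a risk score without seeing identities.
linked = {}
for rec in inpatient_pseudo + gp_pseudo:
    linked.setdefault(rec["pid"], {}).update({k: v for k, v in rec.items() if k != "pid"})
for pid in linked:
    linked[pid]["risk_score"] = 76.4  # placeholder score, as in the slide

# The key holder re-identifies the scored records for the clinical team.
lookup = build_reverse_lookup(["131178"])
reidentified = {lookup[pid]: data for pid, data in linked.items()}
```

Truncating the digest to 12 characters keeps the pseudonyms short for display; a real system would need to consider collision risk and key management.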
Is pseudonymised data “personal data”?
According to DPA 1998:
Personal data means data which relate to a living individual who can be identified –
(a) from those data, or
(b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller
Personal data relating to a person’s “physical or mental health or condition” is sensitive personal data.
Pseudonymisation and the Data Protection Act
“Retraceably pseudonymised data may be considered as information on individuals which are indirectly identifiable … In that case, although data protection rules apply, the risks at stake for the individuals with regard to the processing of such indirectly identifiable information will most often be low, so that the application of these rules will justifiably be more flexible than if information on directly identifiable individuals were processed.”
Source: Article 29 Working Party. Opinion 4/2007 on the concept of personal data, adopted on 20th June
Solution agreed …
The process to undertake the analysis will include with it an encryption programme
Programme will be run by people not directly involved in providing care and treatment – but these people will not access the identifiable data held within the data file
The output files will be sent encrypted to the practice or other clinicians already providing care and treatment to the patients concerned
The decryption keys will be held by the PCT and will be sent separately to the health professionals involved
“It is a clear principle of the Patient Advisory Group that the first point of contact with patients should be made through a clinical team known to the patient, such as their GP practice.”
Source: PIAG (2008)
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
Data collected
Dataset                   Years (up to)   No. records    No. people
GP register               5                 7,861,000     1,951,000
GP consultations          5 +             110,971,000       589,000
Inpatient (SUS)           5                 3,268,000       999,000
Outpatient (SUS)          5                12,815,000     1,532,000
A&E (SUS)                 5                 2,127,000       925,000
Social care clients       3 +                  81,000        81,000
Social care assessments   3 +                 194,000        72,000
Social care services      3 +                 326,000        79,000
Community 1,316,000
• From five sites (~ PCT/LA areas in England)
• Total nine organisations: 4 PCTs, 4 LAs, 1 Care Trust
• 1.8M population (range 100,000-700,000)
Data linkage - approach
First instance: NHS number (encrypted) from LA
In absence of NHS number:
– Central ‘batch tracing’ ??
– Shared PCT/LA databases ??
Ultimately:
– construction of ‘alternative IDs’
97% of individuals in one site (population ~400,000) were found to have unique ‘alternative ID’. Remaining 3% - attempt match by postcode
[Figure: in each of the two datasets being linked, forename, surname, sex (male / female) and DOB are combined into an ‘alternative ID’ of the form FSGDDMMYYYY.]
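A minimal sketch of how such an ‘alternative ID’ might be built and checked for uniqueness. Reading the pattern FSGDDMMYYYY as forename initial, surname initial, gender initial and date of birth is an assumption based on the fields listed in the slide; the function names and the example people are invented for illustration.

```python
from collections import Counter
from datetime import date


def alternative_id(forename: str, surname: str, sex: str, dob: date) -> str:
    """Build an ID of the assumed form FSGDDMMYYYY."""
    return (
        forename.strip()[0].upper()
        + surname.strip()[0].upper()
        + sex.strip()[0].upper()
        + dob.strftime("%d%m%Y")
    )


def share_unique(ids) -> float:
    """Proportion of individuals whose alternative ID is not shared with anyone else."""
    counts = Counter(ids)
    return sum(1 for i in ids if counts[i] == 1) / len(ids)


# Invented people, for illustration only.
people = [
    ("Ann", "Smith", "Female", date(1931, 4, 2)),
    ("Alan", "Shaw", "Male", date(1931, 4, 2)),
    ("Alice", "Stone", "Female", date(1931, 4, 2)),  # collides with Ann Smith
]
ids = [alternative_id(*p) for p in people]
```

People whose ID is shared with someone else are the ones who would need a further discriminator, such as the postcode match mentioned above. In practice the ID would be encrypted before leaving the source organisation.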
Data linkage - Summary
NHS number where available (encrypted)
‘Alternative ID’ (+ postcode) where not (both encrypted)
Linkage method
Site A: NHS number provided for all social care clients. Match takes place through encrypted NHS number.
Site B: NHS number provided for 89% of social care clients. Match via encrypted NHS number.
Site C: NHS numbers given for 86% of clients. Match occurs by NHS number in the first instance, and then through the ‘alternative ID’.
Sites D & E: No NHS numbers provided for social care clients. Match takes place via ‘alternative ID’.
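The site-by-site approach amounts to a matching cascade: encrypted NHS number first, then the encrypted ‘alternative ID’. The sketch below assumes records are plain dictionaries; the field names (`enc_nhs`, `enc_alt_id`, `client`, `patient`) and the tokens are hypothetical, and this is not the project's actual linkage code.

```python
def link_records(social_care, gp_register):
    """Match social care records to GP register records.

    Tries the encrypted NHS number first, then falls back to the
    encrypted 'alternative ID' when no NHS number match is found.
    """
    by_nhs = {r["enc_nhs"]: r for r in gp_register if r.get("enc_nhs")}
    by_alt = {r["enc_alt_id"]: r for r in gp_register if r.get("enc_alt_id")}

    matches = []
    for rec in social_care:
        gp = by_nhs.get(rec["enc_nhs"]) if rec.get("enc_nhs") else None
        method = "nhs_number" if gp else None
        if gp is None and rec.get("enc_alt_id"):
            gp = by_alt.get(rec["enc_alt_id"])
            method = "alternative_id" if gp else None
        matches.append(
            {"client": rec["client"], "gp": gp["patient"] if gp else None, "method": method}
        )
    return matches


# Invented tokens, for illustration only.
social_care = [
    {"client": "c1", "enc_nhs": "J7KA42", "enc_alt_id": "X1"},
    {"client": "c2", "enc_nhs": None, "enc_alt_id": "X2"},
    {"client": "c3", "enc_nhs": None, "enc_alt_id": "X9"},
]
gp_register = [
    {"patient": "p1", "enc_nhs": "J7KA42", "enc_alt_id": "X1"},
    {"patient": "p2", "enc_nhs": "YH8TPP", "enc_alt_id": "X2"},
]
result = link_records(social_care, gp_register)
```

The sketch assumes alternative IDs are unique within the GP register; non-unique IDs would have to be excluded or resolved (for example by postcode) before building the lookup.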
Data linkage – how good?
Group                         N over 55   N matched to GP register   % match
SITE A (100% NHS no)
  People assessed                36,166                     30,508       84%
  Service received               24,036                     19,250       80%
  ‘Significant new’ service       2,106                      2,034       97%
SITE D (100% ‘alt id’)
  People assessed                18,327                     11,512       63%
  Service received                7,593                      5,772       76%
  ‘Significant new’ service         273                        252       92%
Groups of people in social care data – how many are we able to match to GP register list (of ages 55+)?
Match rates vary, but are better for those with greater service use
Data linkage: social and hospital care overlap
Population of over 55s registered in one PCT
90% of those with a social care contact have also had secondary care contact(s) in three years
Data linkage: health and social care event timeline
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
[Figure: the data are randomised into two halves. One half forms the development sample, used to build the predictive model; the other half forms the validation sample.]
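A random half-and-half split of this kind can be sketched as follows. The function name, the seed and the use of pseudonymised IDs as the unit of randomisation are assumptions for illustration.

```python
import random


def split_development_validation(person_ids, seed=0):
    """Randomly assign half of the people to development and half to validation."""
    ids = list(person_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Pseudonymised IDs in the style of the slides (illustrative only).
all_ids = ["J7KA42", "YH8TPP", "G8HE9F", "3LWZ67", "2NX632", "LG5DSD"]
development, validation = split_development_validation(all_ids)
```

Splitting by person rather than by record keeps all of one individual's history in the same sample, so the validation half is not contaminated by people the model has already seen.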
Development Sample
[Figure: pseudonymised individuals (e.g. J7KA42, YH8TPP, G8HE9F, 3LWZ67, 2NX632, LG5DSD, 3V9D54R) with inpatient, outpatient, A&E and GP data shown across Year 1, Year 2 and Year 3.]
Validation Sample
[Figure: a separate set of pseudonymised individuals (e.g. A89KP5, 833TY6, I9QA44, 85H3D, 6445JX, 233UMB, RF02UH) with inpatient, outpatient, A&E and GP data across Year 1, Year 2 and Year 3. Each prediction is classed as true positive, false positive, false negative or true negative.]
Using the Model
[Figure: inpatient, outpatient, A&E and GP data shown against Last Year, This Year and Next Year.]
Modelling results
Predicting for over 75s:
– admission to care home / intensive home care
– marked increase in social care costs (+£5,000)
Site                 Number predicted   Of these, how many   PPV         No. people in area who do   Sensitivity
                     by model           are correct?                     experience the 'event'
Site A               267                105                  39%         2,204                       5%
Site B               180                85                   47%         497                         17%
Site C               47                 21                   45%         220                         10%
Site D               ~20-40 *                                ~70-30% *   256                         ~8-16% *
Site E               119                67                   56%         604                         11%
Pooled (all sites)   557                201                  36%         3,366                       6%
* stable model not found
Changing the Dependent Variable
Predicting for over 75s:
– admission to care home / intensive home care
– some increase in social care costs
                   Predict No                Predict Yes
                   Actual No    Actual Yes   Actual No    Actual Yes
Model              (TRUE NEG)   (FALSE NEG)  (FALSE POS)  (TRUE POS)   PPV   Sensitivity   Specificity
Pooled Model £5K   152,183      3,165        356          201          36%   6%            99.8%
Pooled £3K         151,245      3,660        564          436          44%   11%           99.6%
Pooled £1K         149,278      4,677        876          1,074        55%   19%           99.4%
Pooled £1 !        143,598      8,154        1,559        2,594        62%   24%           98.9%
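The PPV, sensitivity and specificity columns follow directly from the four counts in each row. A minimal sketch using the standard definitions and the pooled counts from the table; the function and variable names are illustrative.

```python
def classification_metrics(true_neg, false_neg, false_pos, true_pos):
    """Return PPV, sensitivity and specificity as percentages."""
    ppv = 100 * true_pos / (true_pos + false_pos)
    sensitivity = 100 * true_pos / (true_pos + false_neg)
    specificity = 100 * true_neg / (true_neg + false_pos)
    return ppv, sensitivity, specificity


# Counts (TN, FN, FP, TP) for the pooled models, keyed by cost threshold.
pooled = {
    "5K": (152_183, 3_165, 356, 201),
    "3K": (151_245, 3_660, 564, 436),
    "1K": (149_278, 4_677, 876, 1_074),
    "1": (143_598, 8_154, 1_559, 2_594),
}
results = {name: classification_metrics(*counts) for name, counts in pooled.items()}
for name, (ppv, sens, spec) in results.items():
    print(f"Threshold {name}: PPV {ppv:.0f}%, sensitivity {sens:.0f}%, specificity {spec:.1f}%")
```

Lowering the cost threshold makes the predicted event more common, which raises both PPV and sensitivity here while specificity falls only slightly.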
Variable                                                                  Beta coefficient   Probability
Intercept                                                                 -4.96              <.0001
Age & Sex
  Age band 8 (90+) (relative to 75-79)                                    1                  <.0001
  Age band 7 (85-89) (relative to 75-79)                                  0.87               <.0001
  Age band 6 (80-84) (relative to 75-79)                                  0.47               <.0001
  Sex = female                                                            0.36               <.0001
Social care prior use
  Any medium intensity home care in past year                             2.35               <.0001
  Social care data flag for health problem                                2.14               <.0001
  Any social care assessments recorded in past year                       1.43               <.0001
  Any low intensity home care in past year                                1.14               <.0001
  Any day care in period 2-1 years prior                                  1.09               <.0001
  Any social care assessments recorded in period 2-1 years prior          0.59               <.0001
  Any meals supplied in period 2-1 years prior                            0.33               0.02
  No. of social care assessments in last year                             -0.14              0.03
  Any medium intensity home care in period 2-1 years prior                -1.22              <.0001
Health care
  OP visit in past two years: specialty Old Age Psychiatry                0.4                0.01
  Any inpatient diagnosis: COPD (previous 2 years)                        0.39               0
  Any inpatient diagnosis: diabetes (previous 2 years)                    0.39               0
  No. of emergency admissions in past 90 days                             0.29               <.0001
  Any A&E visit arriving by ambulance in the past year                    0.25               <.0001
  Ratio of inpatient episodes to admissions in past year                  0.16               <.0001
  Number of different OP specialties seen in prior two years              0.07               <.0001
Important model variables?
Note the importance of prior social care variables
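To show how such coefficients turn into a risk score, the sketch below assumes they come from a logistic regression (the slides list an intercept and beta coefficients but do not name the model type). It uses a subset of the listed variables, with any variable not supplied treated as zero; the variable names and the example person are hypothetical.

```python
import math

# Subset of the coefficients listed in the table (assumed to be logistic regression betas).
INTERCEPT = -4.96
BETAS = {
    "age_90_plus": 1.00,
    "age_85_89": 0.87,
    "age_80_84": 0.47,
    "female": 0.36,
    "medium_intensity_home_care_past_year": 2.35,
    "social_care_assessment_past_year": 1.43,
    "emergency_admissions_past_90_days": 0.29,  # applied per admission
}


def predicted_risk(features: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    linear = INTERCEPT + sum(BETAS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical person: woman aged 90+ with medium intensity home care in the past year.
risk = predicted_risk(
    {"age_90_plus": 1, "female": 1, "medium_intensity_home_care_past_year": 1}
)
baseline = predicted_risk({})  # someone aged 75-79, male, with no other flags
```

For this example the linear predictor is -4.96 + 1 + 0.36 + 2.35 = -1.25, giving a predicted risk of roughly 22%. The prior social care term contributes more than age and sex combined.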
Impact of adding new datasets
                                Predict No                Predict Yes
                                Actual No    Actual Yes   Actual No    Actual Yes
Model                           (TRUE NEG)   (FALSE NEG)  (FALSE POS)  (TRUE POS)   PPV     Sensitivity   Specificity
Site D - £1K best               22,538       556          49           46           48.4%   7.6%          99.8%
+ IP and GP diagnostic vars     22,538       558          49           44           47.3%   7.3%          99.8%
+ GP vars                       22,539       561          48           41           46.1%   6.8%          99.8%
+ Community care                22,534       557          53           45           45.9%   7.5%          99.8%
+ Deprivation vars              22,539       562          48           40           45.5%   6.6%          99.8%
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
Model predicts: Cost
• Details: Model predicts which patients will become high-cost over next 6 or 12 months
• Examples: Low-cost patient this year will become high-cost next year
Model predicts: Event
• Details: Model predicts which patients will have an event that can be avoided
• Examples: Patient will be hospitalized; Patient will have diabetic ketoacidosis
Model predicts: Actionability
• Details: Model predicts which patients have features that can readily be changed
• Examples: Patient has angina but is not taking aspirin; Patient does not have pancreatic cancer (Ambulatory Care Sensitive)
Model predicts: Readiness to engage
• Details: Model predicts which patients are most likely to engage in upstream care
• Examples: Patient does not abuse alcohol; Patient has no mental illness; Patient previously compliant
Model predicts: Receptivity
• Details: Model predicts what mode and form of intervention will be most successful for each patient
• Examples: Patient prefers email rather than telephone; Patient prefers male voice rather than female; Readiness to change
Trend
Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
The problem of regression to the mean in service evaluation
[Figure: average number of emergency bed days (y-axis, 0 to 50) by year relative to the ‘intense year’ (x-axis, -5 to +4).]
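Regression to the mean can be reproduced with a small simulation in which nobody receives any intervention. The sketch below is illustrative only: the population size, the distributions and the top-5% selection rule are all assumptions, not taken from the study data.

```python
import random

rng = random.Random(42)

# Hypothetical population: each person has a stable underlying level of need,
# and each year's observed bed days scatter randomly around that level.
N_PEOPLE = 20_000
underlying = [rng.expovariate(1 / 3.0) for _ in range(N_PEOPLE)]  # mean of about 3


def observed_year(levels):
    """One year's observed bed days: underlying level plus noise, floored at zero."""
    return [max(0.0, mu + rng.gauss(0, 5)) for mu in levels]


year_before = observed_year(underlying)
intense_year = observed_year(underlying)
year_after = observed_year(underlying)

# Select the people with the highest use in the 'intense year' (top 5%),
# as case finding based on current high use would do.
cutoff = sorted(intense_year)[int(0.95 * N_PEOPLE)]
selected = [i for i, days in enumerate(intense_year) if days >= cutoff]


def mean_for(values):
    """Mean of a year's values over the selected high-use group."""
    return sum(values[i] for i in selected) / len(selected)


print(f"year before:  {mean_for(year_before):.1f}")
print(f"intense year: {mean_for(intense_year):.1f}")
print(f"year after:   {mean_for(year_after):.1f}")
```

The selected group's average is much lower both before and after the intense year even though nothing was done to them, so a simple before-and-after comparison would wrongly credit any service offered at that point with the fall.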
[Figure: data flow for the evaluation. Steps shown: sites collate patient lists; the Information Centre (IC) collates and adds (if required) NHS numbers using batch tracing; the IC derives extra identifiers. Key: patient identifiers (e.g. NHS number); trial information (e.g. start and end date); non-patient identifiable keys (e.g. HES ID, pseudonymised NHS number). Parties: participating sites; Information Centre; Nuffield Trust; owner of pseudonymisation password (DH).]
Overcoming regression to the mean using a control group (1)
[Figure: number of emergency hospital admissions per head per month (y-axis, 0.0 to 0.3) by month (x-axis, -12 to 12) for the intervention group, with the start of the intervention marked.]
Overcoming regression to the mean using a control group (2)
[Figure: number of emergency hospital admissions per head per month (y-axis, 0.0 to 0.3) by month (x-axis, -12 to 12) for the control and intervention groups, with the start of the intervention marked.]
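One common way to use a control group is to subtract the control group's change from the intervention group's change (a difference-in-differences). The slides do not say this is the calculation used in the evaluation, so the sketch below is an assumption, and the monthly rates are invented purely for illustration.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def difference_in_differences(intervention_pre, intervention_post, control_pre, control_post):
    """Change in the intervention group minus change in the control group."""
    intervention_change = mean(intervention_post) - mean(intervention_pre)
    control_change = mean(control_post) - mean(control_pre)
    return intervention_change - control_change


# Invented monthly emergency admission rates per head, for illustration only.
intervention_pre = [0.10, 0.14, 0.20, 0.28]
intervention_post = [0.16, 0.13, 0.12, 0.11]
control_pre = [0.10, 0.14, 0.19, 0.27]
control_post = [0.17, 0.14, 0.13, 0.12]

naive_change = mean(intervention_post) - mean(intervention_pre)
adjusted = difference_in_differences(
    intervention_pre, intervention_post, control_pre, control_post
)
```

With these invented figures the naive before-and-after change is -0.05 admissions per head per month, but most of that fall also appears in the control group, leaving an adjusted difference of about -0.015. The comparison is only as good as the match between the two groups before the intervention starts.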