validation of healthcare databases - · pdf filevalidation of healthcare databases aldana...
TRANSCRIPT
Validation of Healthcare
DatabasesALDANA ROSSO, PH.D
LUND UNIVERSITY AND SKÅNE UNIVERSITY HOSPITAL. 28 APRIL 2017
Example: Patients with Hearth Failure
• We want to study all-cause mortality in patients that
suffered hearth failure.
• How do we do it?
Definition of Population
All patients that have HF
ICD10 code for Hearth Failure
Definition of Population
Patients with ICD10 I50*
All patients that have HF
Definition of Population
Patients with ICD10 I50*
All patients that have HF
Patients fulfilling the HF
register definition
Definition of Population: Completeness
Patients with ICD10 I50*
All patients that have HF
Patients fulfilling the HF
register definitionRegistered
patients
Definition of Population
• We aware that your study population is different from your
population of interest. Generalizability of results??
Example: Swedish HF Register
Example: Swedish HF Register
Difference in All-Cause Mortality
Problem: Selection Bias
What you Need to Know about Your
Study Population
• How the population in the register is defined.
• How many patients are registered (completeness). This is
determined by comparison with other registers.
• Which clinics are reporting to the register and why?
• This is needed to ”estimate” selection bias in your study.
Example: Spanish National Acute Coronary
Syndrome register
Example: Spanish National Acute Coronary
Syndrome register
• Audit of the Spanish national acute coronary syndrome register [Ferreira-Gonzalez et al, Circulation: Cardiovascular Quality and Outcomes,2009; 2: 540-547].
• They compared enrolled patients with those that were not enrolled for some participating hospitals (17 of 50).
• Missed patients were of higher risk and received less recommended therapies than the included patients. In-hospital mortality was almost 3 times higher in the missed population.
http://www.thelovesensei.com/perfect-for-you-doesnt-necessarily-translate-to-perfect-human-being/
Research using Healthcare Databases
Implies…
1. Wrong data: misclassification
2. Missing data
• REMEMBER: you are using data for research that was
created for other purposes!
Example: Misclassification
Intensive Care Register
http://portal.icuregswe.org/Rapport.aspx
Example: Misclassification
Intensive Care Register
Diagnoser Antal
Z04.9Undersökning och observation av icke
specificerat skäl 3200
J96.9 Respiratorisk insufficiens, ospecificerad 2000
I46.9 Hjärtstillestånd, ospecificerad 1500
R57.2 Septisk chock 1500
R65.1Systemiskt inflammatoriskt svarssyndrom
[SIRS] av infektiöst ursprung med organsvikt 1500
T07.9 Icke specificerade multipla skador 1400
R56.8 Andra och icke specificerade kramper 1400
K92.2 Gastrointestinal blödning, ospecificerad 1200
J15.9 Bakteriell pneumoni, ospecificerad 1200
http://portal.icuregswe.org/Rapport.aspx
Implications of Misclassification
• It affects the statistical analyses, just accept it!
• How depends on how bad the misclassification is and the
reason for misclassfication => Sensitivity analysis.
Exempel: IVA-diagnoser
Diagnoser Antal
Z04.9 3200
J96.9 2000
I46.9 1500
R57.2 1500
R65.1 1500
T07.9 1400
R56.8 1400
K92.2 1200
J15.9 1200
http://portal.icuregswe.org/Rapport.aspx
Andel felaktiga
diagnoser som
blev ”Z04.9”
Andel korrekta
ranking
2 % 4 % ( 2 % - 10 %)
5 % 5 % ( 2 % - 12 %)
Implication of Misigness
• It depends on why the data is missing:
– missing at random: problematic from a power perspective
but it doesn’t bias the results.
– missing not at random: bias + power problems. That’s
why you need to know which clinics are reporting and
why!
Consequences of Statistical Analysis with
Missing Data
• Some calculations are more sensitive than others.
• Ranking is specially bad!
Example: Calculations with Missing
Data
• Monte Carlo simulation: a register with 20 000 primary
operations and 3 hospitals.
• 5% of those patients have a reoperation.
• How does the percentage of missing values affect the
percentage of reoperation?
Example: Calculations with Missing
data
• 20000 patients, 5 % have a reoperation.
• 5 % reoperations missing at random.
Hospital Proportion
Reoperation
Proportion
with missing
data
95 % CI
1 0.048 0.046 (0.045,0.047)
2 0.050 0.048 (0.047,0.049)
3 0.052 0.049 (0.048,0.050)
Example: Calculations with Missing Data:
Which clinic is reoperating more patients?
• 20000 patients, 5 % have a reoperation.
Percentage
missing data
Proportion
correct
ranking
5 % 95 %
7 % 87 %
Example: Swedish Knee Arthroplasty
register
Example: Neovascular Age Related
Macular Degeneration (AMD)
Leading cause of vision loss among
people age 50 and older.
Dry AMD: gradual breakdown of the
light-sensitive cells in the macula.
Neovascular (wet)AMD: abnormal
blood vessels grow underneath the
retina, which can leak fluid and
blood, which may lead to swelling
and damage of the macula.
https://nei.nih.gov/health/maculardegen/armd_facts
nAMDNormal fundus
Treatment: Anti VEGF Injections
Treatment: Anti-VEGF Injections
VEGF: Vascular endothelial growth factor
Aflibercept: Eylea, Bayer
Ranibizumab: Lucentis, Genentech
Bevacizumab: Avastin, Genentech
Visual Acuity
”Lowest acceptable” :
20/70: 60 ETDRS letters
The Swedish Macula Register
• A national register for treatment of neovascular AMD.
• 80 % coverage.
• National results for AMD treatment concerning age, sex, type of lesion,
treatment frequencies, and follow-up visits
• Medical outcome: distance visual acuity, near visual acuity and adverse
events.
• Analyze and compare different treatments and their outcome.
• Validation: Limited study gave information about errors in the database.
Treatment Compliance
• Known important factors to succeed with treatment:
– Age
– Good VA at baseline
• Patients are active in the register about 1.5 years.
• Why do patients not continue with treatment and/or
control visits?
Definition Treatment Termination
During Year 1
• Several possible definitions:
A. Patients that did not have a control visit at year 1 and
don’t have registered visits since the last visit for at least
4 months.
B. Patients terminated the treatment due to known reasons
+ patients without control visit.
C. Patients don’t have a 1 year control visit and the latest
registered VA was good.
Statistical Methods
• Objective: calculate risk of early treatment termination for all causes (outcome B).
• Model: Poisson regression to calculate risk. OBS: It is preferable to use Logistic Regression and then calculate the risk with a macro with Stata but not with multiple imputation.
• Model Validation:
– Comparison predicted/observed probabilities
– Deviance residuals
– Comparison with logistic regression
– Other ideas that work with clustered data?
Missing Data
• Missing values for VA. I used MI to calculate VA at
baseline for the treated (4 eyes) and fellow eyes (88 of
989 eyes, 9 %).
• Difficult to known whether there are missing visits, around
20% of missing patients compared to the National
Patients register.
• Errors in the database: VA has approximately 5 % wrong
registered values.
• MC simulations with 20 % missing visits and 5 %
changed VA values.
Take Home Message
• We aware that your study population is different from your
population of interest. Check:
– How the population in the register is defined.
– How many patients are registered (completeness).
This is determined by comparison with other registers.
– Which clinics are reporting to the register and why?
– Missing values and misclassification: How much? Is it
possible to compare with other registers?
Error Sources in National Quality Registries
1. Before registration
2. During registration
3. After registration
Error Sources in Healthcare Databases
1. Before registration:
– wrong data is registered in the journal.
– the patient is not enrolled in the register.
– It is very difficult to estimate the error proportion of this
type of error.
Error Sources in Healthcare Databases
2. During registration:
– misinterpretation and inaccurate typing: wrong value,
wrong calculation, wrong alternative.
– incomplete data.
• Misinterpretation and inaccurate typing: 5 %.*
• Missing data: 3 %.*
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork.
Arts D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
Healthcare Databases
3. After registration:
– programming errors.
– Communication problems between different databases.
• They are very difficult to find and have severe consequences.
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork.
Arts D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
Is automatic registration the solution?
Dutch National Intensive Care Evaluation
Register (NICE)
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork.
Arts D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
NICE contains data from
patients who have been
admitted to Dutch
intensive care units and
provides insight into the
effectiveness and
efficiency of Dutch
intensive care.
Validation of Healthcare Databases
• Contact a statistician before startning the project!
Validation Using Sampling Theory
Principle: we only take a sample of some patients and compare
their journals to the registered data. We extend this information
to all the patients in the register.
Validation of Healthcare Databases
References:
http://www.scb.se/Upload/NSM2016/theme4/C_3_Aldana_R
osso.pdf
Simple Random Sampling
1. Select randomly some
patients.
2. Estimate the proportion
of incorrect data.
3. Extrapolate to the
register.
Register
Simple Random Sampling
• Usually without replacement.
• It is easy to program.
• It may give a sample that doesn’t represent the register
very well.
• “It may require more patients than other sampling
techniques”.
• Expensive (transportation cost).
Stratified Random Sampling
1. Divide the register in strata (e.g. hospitals, regions, etc.).
2. Within each strata, select some patients.
3. Estimate the proportion of incorrect data.
4. Extrapolate to the register.
Register
Stratified Random Sampling
• It is possible to estimate the proportion of error for each
stratum (hospitals).
• It requires the participation in the validation of all the strata.
• The patients within the stratum can be selected in several
ways:
– a fixed amount of patients are selected randomly within
each stratum.
– selection of the same percentage as the stratum in the
frame.
Stratified Random Sampling
• At least as effective as SRS for the same sample size.
• If information about the error distribution is known, the
design can be improved.
• It gives a more representative sample.
• Expensive (transportation cost).
Stratified Random Sampling
register
1
2
3
4register
1
2
3
4
Efficient Stratification Non-efficient
Stratification
Cluster Sampling
1. Divide the register in
clusters (e.g. hospitals,
regions, ect.).
2. Select some clusters.
3. Within each cluster, select
some patients.
4. Estimate the proportion of
incorrect data.
5. Extrapolate to the register.
Register
Cluster Sampling
• Multistage with different sampling weights. For example:
– Level 1: cluster region.
– Level 2: cluster hospital.
– Level 3: sampling units patients.
• It can be used in combination with stratification.
Cluster Sampling
• Lower transportation cost.
• Only some hospitals are represented in the validation.
• It requires more patients than SRS to achieve the same
precision.
Cluster Sampling
register
1
2
3
4register
1
2
3
4
Non-efficient cluster Efficient cluster
Another Issue…
• The patient actually had the diagnostic, rigth???
How estimate the Proportion of
Correctly Diagnoses Patients
• It is done in a similar manner but with an expert group. It
is called ”Adjudication”.
• Usually reported as ”positive predicted value”. For
example in the HF article it was around 95 %.
What happens after Validation?
• Ethical and legal viewpoints: We should correct the data
(Sweden).
• Bias concerns?
• Information about the data quality and the missing data in
the publications?