Testing Data Completeness with DQe-c-v2
OHDSI Symposium 2019: Data Quality Workshop
09/17/19
Tim Bergquist, Graduate Research Assistant
Biomedical Informatics & Medical Education
University of Washington
WWAMI region Practice & Research Network
• 60+ primary care WWAMI clinics
• ~20 data-connected clinics
• CHCs and RHCs
• Underserved populations
• Many serving rural populations
• Collaboration with national network of practice-based research networks
• DataQUEST represents over 250,000 patients
https://dataquest.iths.org/
DataQUEST
• 20 data-connected clinics in the WPRN
• Represents over 250,000 patients
An electronic health data-sharing architecture across community-based primary care practices in the WPRN
Measuring Data Quality Framework
Completeness • Are the data present?
Conformance • Are the data standardized and formatted?
Plausibility • Are the data believable?
Kahn et al. (2016). A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. eGEMS, 4, 1244. https://www.ncbi.nlm.nih.gov/pubmed/27713905
Operationalizing the framework into: 5 conceptual tests and 17 discrete tests across:
Data Quality Tests
DQ Framework category    TEST
COMPLETENESS             Gender, Visit, Observation completeness (denominator and proportion with valid data)
COMPLETENESS             Key clinical status completeness (denominator and proportion with valid data): Smoking status, alcohol consumption
COMPLETENESS             Measurement completeness (denominator and proportion with valid data): Height, Weight, SBP, DBP
COMPLETENESS             Cross-reference tables that are present in current dataset to expected tables in standard OMOP CDM
COMPLETENESS             Looks for NULL and invalid variable values in each column and visualizes percent missingness
CONFORMANCE              Check that primary and foreign keys relate properly; High Priority: Person_ID, Visit_Occurrence_ID
CONFORMANCE              Checks that orphan keys don't exist (a foreign key is present in a table but no primary key exists in the reference table)
PLAUSIBILITY             Comparison of new load to old load (Number of observations, Number of unique patients, Number of tables with rows)
PLAUSIBILITY             Size of tables and rows across the OMOP CDM
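The load-comparison plausibility test in the table above could look roughly like the sketch below. This is not the tool's code: the function name `compare_loads`, the metric keys, and all counts are made up for illustration.

```python
def compare_loads(old, new):
    """Compare summary counts from a previous load against a new load."""
    changes = {}
    for metric in sorted(set(old) | set(new)):
        before = old.get(metric, 0)
        after = new.get(metric, 0)
        # Percent change is undefined when the old load had nothing to compare to.
        percent = None if before == 0 else 100.0 * (after - before) / before
        changes[metric] = {
            "old": before,
            "new": after,
            "delta": after - before,
            "percent": percent,
        }
    return changes


# Made-up counts for illustration only; not results from any real load.
old_load = {"observations": 1000, "unique_patients": 200, "tables_with_rows": 12}
new_load = {"observations": 1100, "unique_patients": 190, "tables_with_rows": 12}

for metric, change in compare_loads(old_load, new_load).items():
    print(metric, change)
```

A drop in unique patients between loads, as in this toy input, is the kind of change such a comparison would surface for review.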
Original DQe-c Tool
Modular tool developed in R for assessing completeness in EHR data repositories.
• Customization and configuration were difficult
• Hard to add new modules
• Difficult to add new CDMs (or new versions of CDMs)
DQe-c-v2 Tool
Modular tool developed in Python for assessing completeness in EHR data repositories.
Takes in the database credentials, CDM version, and configurations.
Simply enter your credentials and configurations into the config.json file.
Run: python DQe-c.py -c /path/to/config.json
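As a rough illustration of the config-driven start-up, a script could read a `-c` argument and load the JSON file as sketched below. The key names (`credentials`, `cdm_version`) and the helper names are assumptions made for this example; the real config.json may be organized differently.

```python
import argparse
import json
import tempfile
from pathlib import Path

# Hypothetical key names; the real config.json may use different ones.
REQUIRED_KEYS = ("credentials", "cdm_version")


def parse_args(argv=None):
    """Read the path to the configuration file from a -c argument."""
    parser = argparse.ArgumentParser(description="Run data quality checks")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to the JSON configuration file")
    return parser.parse_args(argv)


def load_config(path):
    """Load the JSON config and fail early if an expected key is absent."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")
    return config


# Demonstration with a throwaway file and placeholder values.
with tempfile.TemporaryDirectory() as folder:
    config_path = Path(folder) / "config.json"
    config_path.write_text(json.dumps({
        "credentials": {"user": "example_user", "password": "example_password"},
        "cdm_version": "5",
    }), encoding="utf-8")
    args = parse_args(["-c", str(config_path)])
    config = load_config(args.config)

print(config["cdm_version"])
```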
DQe-c-v2 Tool
Sets up the database connection, manages report output, and initiates the CDM files.
DQe-c-v2 Tool
Assesses conformance to a Common Data Model. Checks for missing tables and calculates size of tables.
Quickly check that the new data is growing as expected.
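A minimal sketch of the missing-table and table-size check, assuming an SQLite database stands in for the real data warehouse. `EXPECTED_TABLES` is a small illustrative subset of OMOP CDM table names, and `table_report` is a hypothetical helper, not a function from the tool.

```python
import sqlite3

# Small illustrative subset of OMOP CDM tables, not the full specification.
EXPECTED_TABLES = ("person", "visit_occurrence", "measurement", "observation")


def table_report(conn, expected=EXPECTED_TABLES):
    """Flag expected tables that are absent and count rows in those present."""
    present = {row[0].lower() for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    report = []
    for table in expected:
        found = table in present
        rows = None
        if found:
            rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        report.append({"table": table, "present": found, "rows": rows})
    return report


# Toy database: two tables exist, two expected tables are missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE visit_occurrence "
             "(visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER)")
conn.executemany("INSERT INTO person (person_id) VALUES (?)", [(1,), (2,)])

table_summary = table_report(conn)
for entry in table_summary:
    print(entry)
```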
DQe-c-v2 Tool
Assesses completeness of all columns in the available tables in the database. Checks for null and nonsense values.
Identify empty or useful columns in each of your OMOP tables.
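One way to picture the column-level completeness check is the sketch below, again using SQLite as a stand-in. The list of "nonsense" values and the function name `column_missingness` are assumptions for illustration; the tool's own definition of an invalid value may differ.

```python
import sqlite3

# Hypothetical placeholder strings treated as missing; adjust per site.
NONSENSE_VALUES = {"", "null", "n/a", "unknown", "?"}


def column_missingness(conn, table):
    """Return the percent of rows that are NULL or nonsense for each column."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    total = len(rows)
    report = {}
    for index, column in enumerate(columns):
        missing = sum(
            1 for row in rows
            if row[index] is None
            or str(row[index]).strip().lower() in NONSENSE_VALUES
        )
        report[column] = 100.0 * missing / total if total else 0.0
    return report


# Toy person table with one NULL and one placeholder gender value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person "
             "(person_id INTEGER, gender_source_value TEXT, year_of_birth INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "F", 1980),
    (2, None, 1975),
    (3, "unknown", None),
    (4, "M", 1990),
])

missingness = column_missingness(conn, "person")
print(missingness)
```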
DQe-c-v2 Tool
Checks for orphan keys (foreign keys not present in the primary table).
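An orphan-key check can be expressed as a left join that keeps only foreign-key values with no match in the referenced table. The sketch below assumes SQLite and a hypothetical helper named `orphan_keys`; it illustrates the idea rather than the tool's own query.

```python
import sqlite3


def orphan_keys(conn, child_table, foreign_key, parent_table, primary_key):
    """List foreign-key values in child_table that have no row in parent_table."""
    query = f'''
        SELECT DISTINCT c."{foreign_key}"
        FROM "{child_table}" AS c
        LEFT JOIN "{parent_table}" AS p
            ON c."{foreign_key}" = p."{primary_key}"
        WHERE c."{foreign_key}" IS NOT NULL
          AND p."{primary_key}" IS NULL
        ORDER BY c."{foreign_key}"
    '''
    return [row[0] for row in conn.execute(query)]


# Toy data: visits reference person 3, who is absent from the person table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE visit_occurrence "
             "(visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER)")
conn.executemany("INSERT INTO person VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO visit_occurrence VALUES (?, ?)", [
    (10, 1), (11, 2), (12, 3), (13, 3), (14, None),
])

orphans = orphan_keys(conn, "visit_occurrence", "person_id", "person", "person_id")
print(orphans)
```

NULL foreign keys are excluded here because they are a completeness problem rather than an orphan reference; that split is a design choice of this sketch.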
DQe-c-v2 Tool
Checks for missingness in clinical indicators. (What percent of patients have a heart rate measure, blood pressure measurement, etc.)
Adding a new indicator test is straightforward!
Completion as the presence of a concept. Calculates what percentage of patients have the identified concept(s).
Completion as the presence of a non-null value. Calculates what percentage of patients have a non-null value in the identified table-column.
We can add a new indicator test by just adding five new fields.
Adding testing for hemoglobin A1C. Calculates what percentage of patients have a hemoglobin A1C measurement.
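The two kinds of indicator test could be driven by a small declarative entry, as sketched below. The slides do not list the five fields, so the keys used here (`label`, `table`, `column`, `test`, `concepts`) are a guess made for illustration, and the concept ID 9001 is a placeholder rather than a real OMOP concept.

```python
import sqlite3

# Hypothetical indicator definitions; key names and concept IDs are placeholders.
INDICATORS = [
    {"label": "Hemoglobin A1C", "table": "measurement",
     "column": "measurement_concept_id", "test": "concept", "concepts": [9001]},
    {"label": "Smoking status", "table": "observation",
     "column": "value_as_string", "test": "non_null", "concepts": []},
]


def indicator_percent(conn, indicator):
    """Percent of patients in person with the concept or a non-null value."""
    total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    if total == 0:
        return 0.0
    table, column = indicator["table"], indicator["column"]
    if indicator["test"] == "concept":
        marks = ", ".join("?" for _ in indicator["concepts"])
        where = f'"{column}" IN ({marks})'
        params = list(indicator["concepts"])
    else:
        where = f'"{column}" IS NOT NULL'
        params = []
    query = f'SELECT COUNT(DISTINCT person_id) FROM "{table}" WHERE {where}'
    matched = conn.execute(query, params).fetchone()[0]
    return 100.0 * matched / total


# Toy data: four patients, two with the placeholder A1C concept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE measurement (measurement_id INTEGER, "
             "person_id INTEGER, measurement_concept_id INTEGER)")
conn.execute("CREATE TABLE observation (observation_id INTEGER, "
             "person_id INTEGER, value_as_string TEXT)")
conn.executemany("INSERT INTO person VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", [
    (1, 1, 9001), (2, 1, 9001), (3, 2, 9001), (4, 3, 5555),
])
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", [
    (1, 1, "never"), (2, 2, None), (3, 3, "former"), (4, 4, "current"),
])

results = {item["label"]: indicator_percent(conn, item) for item in INDICATORS}
print(results)
```

Counting distinct patients keeps repeated measurements for the same person from inflating the percentage.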
DQe-c-v2 Tool
All reports are combined into a visualization dashboard.
All these modules output CSV reports. The output folders are managed by Query.py to account for different test dates and organizations.
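To make the idea of organizing CSV reports by organization and test date concrete, here is one possible layout. The folder scheme `root/organization/date/module.csv` and both helper names are assumptions for this sketch and are not taken from Query.py.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def report_path(root, organization, test_date, module):
    """Build (and create) a per-organization, per-date folder for a report."""
    folder = Path(root) / organization / test_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{module}.csv"


def write_report(path, rows):
    """Write a list of dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Demonstration in a temporary directory with placeholder values.
with tempfile.TemporaryDirectory() as root:
    path = report_path(root, "example_clinic", date(2019, 9, 17), "missingness")
    write_report(path, [{"column": "year_of_birth", "percent_missing": 25.0}])
    relative = path.relative_to(root).as_posix()
    written = path.read_text(encoding="utf-8")

print(relative)
```

Keeping the date in the path means repeated test runs for one organization sit side by side instead of overwriting each other.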
DQe-c-v2 Network Aggregation Tool
DQe-c-v2 Tool
Reports are visualized into an HTML file. Easy to embed into a website.
Adding New Modules
Vocabulary Summary
Temporal Plausibility
Operationalizing use of DQe tools for data quality testing
* DataQUEST
* DARTNet Institute
* CD2H
https://github.com/WWAMI-DataQuest/DQe-c_OMOPv4/tree/master/docs
Questions?
• We are looking for collaborators and contributors!
• Contact me if you need help getting the tool up and running.
• We are always looking for feedback.
Thanks to Kari Stephens, Hossein Estiri, WPRN, ITHS, and CD2H!
Contact: [email protected]
https://dataquest.iths.org/
https://ctsa.ncats.nih.gov/cd2h/
https://github.com/data2health/DQe-c-v2
CD2H Data Quality Project
https://ctsa.ncats.nih.gov/cd2h/data-quality-methods-and-tools-to-support-ctsa-hub-data-sharing/