The process, standardization, and patterns in OMOP CDM ETL
CORE PLATFORMS
AUTHORS
Dave Barman, Mikhail Archakov, Natalia Karataeva, Gregory Klebanov
PRESENTERS
Dave Barman, Greg Klebanov
By using a standardized process and methodology with an emphasis on data quality, an OMOP CDM ETL can be completed, and its data refreshed, with minimal defects and effort.
INTRODUCTION
The OMOP CDM ETL process can be complex and time-consuming. By identifying the common challenges and standardizing the methodology and process with a focus on data quality, project execution can be streamlined.
METHODS
OHDSI ETL/quality tools: White Rabbit, Usagi, ACHILLES
Data experience: EHR, claims, lab, registry, chart review, CDISC SDTM, survey
1. Introduced a common process and methodology for every OMOP CDM ETL and refresh project.
2. Incorporated lessons learned from previous engagements into the process.
3. A reusable ETL code base standardizes and accelerates delivery.
4. A dedicated ETL team (OMOP Data Factory) has specialized skills in Big Data engineering, clinical data analysis, and medical domain knowledge, including the OMOP vocabularies.
5. Established a QA process that includes testing (unit, integration, and regression testing), statistics gathering, and analysis of the mapping rate.
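As an illustration of the mapping-rate analysis in item 5, the sketch below computes, per CDM table and overall, the share of records whose concept_id is not 0 (0 is the OMOP convention for "no matching concept"). It is a minimal sketch under assumptions, not the authors' QA code: the function name `mapping_rates` and the (table, concept_id) input shape are hypothetical, and the rate is defined at the record level.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def mapping_rates(records: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, float], float]:
    """Return (per-table rate, overall rate) of records with a non-zero concept_id.

    Each record is a hypothetical (cdm_table, concept_id) pair; concept_id 0
    means the source value could not be mapped to a standard concept.
    """
    total: Counter = Counter()
    mapped: Counter = Counter()
    for table, concept_id in records:
        total[table] += 1
        if concept_id != 0:
            mapped[table] += 1
    # Missing keys in a Counter read as 0, so fully unmapped tables get 0.0.
    per_table = {table: mapped[table] / total[table] for table in total}
    overall = sum(mapped.values()) / sum(total.values()) if total else 0.0
    return per_table, overall


# Rows taken from the poster's mapping examples (one record per CDM table).
rows = [
    ("condition_occurrence", 856984),
    ("procedure_occurrence", 0),
    ("observation", 0),
]
per_table, overall = mapping_rates(rows)
print(per_table)
print(round(overall, 3))
```

Computing the same ratio over distinct source codes instead of records is an alternative definition; tracking either one across refreshes can help flag mapping regressions.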
RESULTS
Developing and following a consistent ETL process and best practices, adhering to the OHDSI THEMIS business rules, and maintaining a mature QA/QC process all play a crucial role in OMOP CDM conversions.
An OMOP CDM ETL requires not only deep technical expertise and medical knowledge but also active participation in community discussions and continuous development of new approaches and best practices.
Study Design
Mapping Examples
CDM condition_occurrence table
condition_concept_id | condition_source_value | condition_source_concept_id
856984 | 786.2 | 12485
Concept table
concept_code | vocabulary_id | concept_id | domain_id
786.2 | ICD9CM | 12485 | Condition
2449 | ICD9CM | 378291 | Procedure
CDM procedure_occurrence table
procedure_concept_id | procedure_source_value | procedure_source_concept_id
0 | 2449 | 378291
CDM observation table
observation_concept_id | observation_source_value | observation_source_concept_id
0 | 65162036111 | 0
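Source table values: 786.2, 2449, 65162036111, ...
• 786.2: target concept found via the concept_relationship table, so the record goes to condition_occurrence.
• 2449: target concept not found, so the record goes to procedure_occurrence with procedure_concept_id = 0.
• 65162036111: not found in the concept table, so the record is mapped to the default domain (observation).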
Mapping Rules Overview
1. Find concept in OMOP vocabulary
• Look up the source code in the concept table. If it is not found, set concept_id = 0 and map the record to the default domain (2.1).
• If it is found, look up the target concept via concept_relationship or source_to_concept_map. If the target concept is found and standard, map the record to its domain; otherwise set concept_id = 0 and map the record to the domain of the source concept (2.2).
2. Map record to corresponding table
2.1 Map to default domain: observation table
2.2 Map to concept domain by domain_id: 'Condition' -> condition_occurrence table, 'Procedure' -> procedure_occurrence table, ...
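The mapping rules above can be sketched in Python as follows, replaying the poster's three example source values. This is a minimal illustration, not the authors' ETL code: in-memory dictionaries stand in for the concept table, `maps_to` is a hypothetical stand-in for 'Maps to' rows in concept_relationship or source_to_concept_map, and the standard_concept flags plus the target concept's code and vocabulary are assumptions, since the poster does not show them.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_code: str
    vocabulary_id: str
    domain_id: str
    standard_concept: Optional[str]  # "S" marks a standard concept in OMOP


# Step 2.2: route by domain_id (only the domains from the poster example are listed).
DOMAIN_TABLES = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
}
# Step 2.1: default domain for records that cannot be routed by domain.
DEFAULT_TABLE = "observation"


def map_source_code(
    code: str,
    vocabulary_id: str,
    by_code: Dict[Tuple[str, str], Concept],
    by_id: Dict[int, Concept],
    maps_to: Dict[int, int],
) -> Tuple[str, int, int]:
    """Return (target_table, concept_id, source_concept_id) for one source code."""
    source = by_code.get((code, vocabulary_id))
    if source is None:
        # Source code not in the concept table: concept_id = 0, default domain.
        return DEFAULT_TABLE, 0, 0
    target_id = maps_to.get(source.concept_id)
    target = by_id.get(target_id) if target_id is not None else None
    if target is not None and target.standard_concept == "S":
        # Target found and standard: route by the target concept's domain.
        return DOMAIN_TABLES.get(target.domain_id, DEFAULT_TABLE), target.concept_id, source.concept_id
    # Target missing or non-standard: concept_id = 0, route by the source concept's domain.
    return DOMAIN_TABLES.get(source.domain_id, DEFAULT_TABLE), 0, source.concept_id


# Example data from the poster's tables; the target concept's code, vocabulary,
# and all standard_concept values are assumptions for illustration.
concepts = [
    Concept(12485, "786.2", "ICD9CM", "Condition", None),
    Concept(378291, "2449", "ICD9CM", "Procedure", None),
    Concept(856984, "unknown", "SNOMED", "Condition", "S"),
]
by_code = {(c.concept_code, c.vocabulary_id): c for c in concepts}
by_id = {c.concept_id: c for c in concepts}
maps_to = {12485: 856984}  # hypothetical 'Maps to' row: 786.2 -> 856984

for code in ["786.2", "2449", "65162036111"]:
    print(code, map_source_code(code, "ICD9CM", by_code, by_id, maps_to))
```

Run as-is, the loop prints `786.2 ('condition_occurrence', 856984, 12485)`, `2449 ('procedure_occurrence', 0, 378291)` and `65162036111 ('observation', 0, 0)`, matching the three CDM table examples. Keeping source_concept_id even when concept_id is 0 preserves the source concept, so the record can be remapped in a later refresh.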
OMOP CDM ETL Workflow (figure)
Source data: INPATIENT, OUTPATIENT, DRUG, …
Stages: load source data and vocabularies; preprocess raw data; profiling analysis; events lookup; concept mapping; QA/QC; status reports; OMOP CDM release
Target tables: clinical data tables (PERSON, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, OBSERVATION, DRUG_EXPOSURE, OBSERVATION_PERIOD, …); health system data tables (LOCATION, CARE_SITE, PROVIDER); health economics data tables; derived elements
CDM Conversion with Hadoop - Architecture
• Perform source data profiling and analysis
• Create (Update) OMOP Standardized Vocabularies
• Create (Update) Custom Vocabulary Mappings
• Create (Update) ETL Specifications
• Develop / Update ETL Code
• Execute ETL process
• Perform full QC process
• Perform UAT and Release
Refresh cycle: Implement Updates -> Run on Sample -> Run on Full Set -> UAT and sign-off -> Evaluate New Data