

The process, standardization, and patterns in OMOP CDM ETL

CORE PLATFORMS

AUTHORS

Dave Barman, Mikhail Archakov, Natalia Karataeva, Gregory Klebanov

PRESENTERS

Dave Barman, Greg Klebanov

By using a standardized process and methodology with an emphasis on data quality, OMOP CDM ETL can be completed and data refreshed with minimal defects and effort.

INTRODUCTION

The OMOP CDM ETL process can be complex and time-consuming.

By identifying the common challenges and standardizing methodology and process with a focus on data quality, the execution of the project can be streamlined.

OHDSI ETL/Quality Tools: White Rabbit, Usagi, ACHILLES

Data experience: EHR, Claims, Lab, Registry, Chart Review, CDISC SDTM, Survey

METHODS

1. Introduced a common process and methodology for every OMOP CDM ETL and refresh project.

2. Incorporated lessons learned from previous engagements into the process.

3. Built a reusable ETL code base to standardize and accelerate delivery.

4. Formed a dedicated ETL team (OMOP Data Factory) with specialized skills in Big Data engineering, clinical data analysis, and medicine (domain knowledge and OMOP vocabularies).

5. Established a QA process that includes testing (unit, integration, and regression), statistics gathering, and analysis of mapping rate.
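The "analysis of mapping rate" step in the QA process can be illustrated with a small sketch. The poster does not define the metric, so this assumes a mapping rate equal to the share of records whose concept field is non-zero (OMOP uses concept_id = 0 for unmapped records). The function name `mapping_rate` and the sample rows are hypothetical.

```python
from typing import Iterable, Mapping


def mapping_rate(records: Iterable[Mapping[str, object]], concept_field: str) -> float:
    """Return the fraction of records with a non-zero value in concept_field.

    Assumption: a missing, None, or 0 concept id counts as unmapped.
    """
    total = 0
    mapped = 0
    for record in records:
        total += 1
        if (record.get(concept_field) or 0) != 0:
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical sample: two mapped rows, two unmapped rows.
sample = [
    {"condition_concept_id": 856984},
    {"condition_concept_id": 0},
    {"condition_concept_id": None},
    {"condition_concept_id": 12},
]
print(f"mapping rate: {mapping_rate(sample, 'condition_concept_id'):.0%}")
```

Tracking this figure per table and per refresh would be one way to spot regressions in vocabulary coverage between runs.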

RESULTS

Developing and following a consistent ETL process and best practices, following OHDSI THEMIS business rules, and maintaining a mature QA/QC process play a crucial role in OMOP CDM conversions.

OMOP CDM ETL requires not only deep technical expertise and medical knowledge but also active participation in community discussions and continuous development of new approaches and best practices.

Study Design

Mapping Examples

CDM condition_occurrence table

condition_concept_id    condition_source_value    condition_source_concept_id
856984                  786.2                     12485

Concept table

concept_code    vocabulary_id    concept_id    domain_id
786.2           ICD9CM           12485         Condition
2449            ICD9CM           378291        Procedure

CDM procedure_occurrence table

procedure_concept_id    procedure_source_value    procedure_source_concept_id
0                       2449                      378291

CDM observation table

observation_concept_id    observation_source_value    observation_source_concept_id
0                         65162036111                 0

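[Figure: mapping example flow from the source table, through the concept and concept_relationship tables, to the CDM tables]

Source table code    Outcome
786.2                Target concept found
2449                 Target concept not found
65162036111          Not found; map to default domain
...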

Mapping Rules Overview

1. Find concept in OMOP vocabulary

   source codes -> concept table: found?
      No:  map to default domain
      Yes: concept_relationship or source_to_concept_map -> target concept: found and standard?
         Yes: map to concept domain
         No:  concept_id = 0, map to concept domain

2. Map record to corresponding table

   2.1 Map to default domain: Observation table

   2.2 Map to concept domain: concept domain id?
      'Condition' -> condition_occurrence table
      'Procedure' -> procedure_occurrence table
      ...
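The mapping rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the in-memory dictionaries stand in for the concept and concept_relationship tables and hold only the poster's example rows, the domain of concept 856984 is assumed to be Condition, and the name `route_source_code` is hypothetical. The rule that a record with no standard target keeps the source concept's domain is inferred from the 2449 example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    domain_id: str
    standard: bool


# Toy stand-in for the concept table, keyed by source code (example rows only).
CONCEPT_BY_CODE: Dict[str, Concept] = {
    "786.2": Concept(12485, "Condition", standard=False),
    "2449": Concept(378291, "Procedure", standard=False),
}
# Toy stand-in for standard target concepts (domain of 856984 is assumed).
CONCEPT_BY_ID: Dict[int, Concept] = {
    856984: Concept(856984, "Condition", standard=True),
}
# Toy stand-in for concept_relationship: source concept_id -> target concept_id.
MAPS_TO: Dict[int, int] = {12485: 856984}

DOMAIN_TABLE = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
}
DEFAULT_TABLE = "observation"


def route_source_code(code: str) -> dict:
    """Apply step 1 (find concept) and step 2 (pick target table) to one code."""
    source = CONCEPT_BY_CODE.get(code)
    if source is None:
        # Source code not in the vocabulary: default domain, everything 0.
        return {"table": DEFAULT_TABLE, "concept_id": 0,
                "source_value": code, "source_concept_id": 0}

    target: Optional[Concept] = CONCEPT_BY_ID.get(MAPS_TO.get(source.concept_id, -1))
    if target is not None and target.standard:
        domain, concept_id = target.domain_id, target.concept_id
    else:
        # No standard target: keep the source concept's domain, concept_id = 0.
        domain, concept_id = source.domain_id, 0

    return {"table": DOMAIN_TABLE.get(domain, DEFAULT_TABLE),
            "concept_id": concept_id,
            "source_value": code,
            "source_concept_id": source.concept_id}


for example_code in ("786.2", "2449", "65162036111"):
    print(route_source_code(example_code))
```

In a real ETL the lookups would be joins against the vocabulary tables and would also key on vocabulary_id; the dictionaries here only keep the example short.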

OMOP CDM ETL Workflow

[Figure: workflow diagram. Recoverable labels are listed below.]

Raw data: INPATIENT, OUTPATIENT, DRUG, ...

Stages: RAW DATA PROFILING ANALYSIS, LOAD SOURCE DATA AND VOCABULARIES, Preprocess, EVENTS LOOKUP, CONCEPT MAPPING, QA/QC, STATUS REPORTS, OMOP CDM RELEASE

Clinical Data Tables: PERSON, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, OBSERVATION, DRUG_EXPOSURE, OBSERVATION_PERIOD, ...

Health System Data Tables: LOCATION, CARE_SITE, PROVIDER

Health Economics Data Tables

Derived Elements

CDM Conversion with Hadoop - Architecture

• Perform source data profiling and analysis

• Create (Update) OMOP Standardized Vocabularies

• Create (Update) Custom Vocabulary Mappings

• Create (Update) ETL Specifications

• Develop / Update ETL Code

• Execute ETL process

• Perform full QC process

• Perform UAT and Release

Diagram labels: Implement Updates, Run on Sample, Run on Full Set, UAT and sign off, Evaluate New Data