TRANSCRIPT
A Centralized De-Duplication Service
2003 Immunization Registry Conference
Paul Schaeffer, MPA, NYC DOHMH
Daryl Chertcoff, HLN Consulting
Co-Authors: Alexandra Ternier and Angel Aponte (DOHMH)
Objectives
To describe the NYC Department of Health and Mental Hygiene’s (DOHMH) centralized de-duplication service
Rationale – Centralized De-Duplication Service
Duplication of records – a department-wide database problem
Duplication Rates - DOHMH Databases
Program    Current Estimated Duplication Rate
CIR        30%
LQ         7%
CDSS       30%
Key Terms
Master Client Index (MCI) – database that stores information from different programs for matching
Core Services – implementation of Business Rules governing the MCI
De-Duplication Service – matches duplicate records
Background - MCI
The MCI integrates data from and provides a centralized de-duplication service to:
  Citywide Immunization Registry (CIR)
  Lead Quest Registry (LQ) from the Lead Poisoning and Prevention Program
  Vital birth records
  Communicable Disease (Spring 2004)
  Additional health databases (in the future)
Development of MCI
Developing Requirements & Specs
Selecting middleware technology
Building MCI Core Services
Configuring servers and platforms
Building MCI Administration Tools
Development of MCI (Continued)
Modifying CIR and LQ (first clients)
Training artificial intelligence de-duplication software
Data loads into MCI
Deployment
Master Client Index
[Architecture diagram. Components shown:]
  MCI Core Services and De-Duplication Service (Win 2000 Servers)
  MCI Administration Tools (VB Application)
  MCI Database (Oracle, Unix Server)
  CIR: CIR Client; CIR Database (Oracle, Unix Server); CIR Front End (PowerBuilder Application)
  LQ: LQ Client; LQ Database (Microsoft SQL, Win 2000 Server); LQ Front End (PowerBuilder Application)
  CDSS: CDSS Client; CDSS Database (Microsoft SQL, Win 2000 Server); CDSS Front End (JSP Web Application)
MCI – Core Services
MCI’s main function - to facilitate matching and be extensible to all DOHMH databases
Data model - designed with attributes common to all systems
Information specific to a particular system may also be stored in the MCI to improve matching
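One way to picture a data model with shared attributes plus optional program-specific information is sketched below. This is only an illustration: the class name `MciPerson`, every field name, and the sample values are assumptions, not taken from the presentation or the actual MCI schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MciPerson:
    """Hypothetical person-level record; all field names are illustrative."""
    mci_id: int
    first_name: str
    last_name: str
    date_of_birth: str          # ISO date string, e.g. "2001-05-17"
    sex: Optional[str] = None
    # Program-specific attributes, keyed by source system ("CIR", "LQ", ...).
    program_data: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_program_data(self, program: str, key: str, value: str) -> None:
        """Attach an attribute that only one source system supplies."""
        self.program_data.setdefault(program, {})[key] = value


# Made-up example values.
person = MciPerson(1, "Ana", "Diaz", "2001-05-17")
person.add_program_data("CIR", "provider_id", "P-100")
person.add_program_data("LQ", "lab_accession", "A-7")
```

Keeping the shared attributes as fixed fields and the program-specific ones in a keyed mapping lets a new source system be added without changing the common structure.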
MCI – Core Services (Continued)
"Person-centric" model
Artificial intelligence is “trained” by program-specific data
Matching based on probabilistic algorithm
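The slides do not say which probabilistic algorithm the service uses. As a generic illustration only, the sketch below scores a record pair in the style of classic probabilistic record linkage: each compared field adds a log-likelihood-ratio weight when it agrees and subtracts one when it disagrees. The field names and the m/u probabilities are invented for the example.

```python
import math

# Hypothetical per-field probabilities (not from the presentation):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for name, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


same = {"last_name": "DIAZ", "first_name": "ANA", "date_of_birth": "2001-05-17"}
typo = dict(same, first_name="ANNA")
other = {"last_name": "SMITH", "first_name": "JOHN", "date_of_birth": "1999-01-02"}
```

With these made-up numbers, an identical pair scores highest, a pair that differs only in first name scores lower but stays positive, and an unrelated pair scores negative.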
De-Duplication: Features
Potential duplicate pairs are reviewed by humans to train the model
"Artificial Intelligence" model created
Match thresholds are determined
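The presentation does not describe how the thresholds are chosen. One simple, conservative rule, shown here purely as an assumption, is to take the human-reviewed pairs and place the two cut-offs so that no reviewed non-match scores above the upper threshold and no reviewed match scores below the lower one. The function name `review_thresholds` and the sample scores are hypothetical.

```python
from typing import Iterable, Tuple


def review_thresholds(reviewed: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (lower, upper) cut-offs from (score, is_match) review decisions.

    Scores strictly above `upper` were always matches in the reviewed sample;
    scores strictly below `lower` were always non-matches. Everything in
    between is ambiguous and would still need a human decision.
    """
    pairs = list(reviewed)
    match_scores = [s for s, is_match in pairs if is_match]
    nonmatch_scores = [s for s, is_match in pairs if not is_match]
    if not match_scores or not nonmatch_scores:
        raise ValueError("need at least one reviewed match and one non-match")
    lowest_match = min(match_scores)
    highest_nonmatch = max(nonmatch_scores)
    lower = min(lowest_match, highest_nonmatch)
    upper = max(lowest_match, highest_nonmatch)
    return lower, upper


# Made-up review decisions: (match score, reviewer said "same person").
reviewed_pairs = [(12.0, True), (9.5, True), (4.0, False),
                  (6.0, True), (7.0, False), (-3.0, False)]
thresholds = review_thresholds(reviewed_pairs)
```

A production system would more likely tune the cut-offs against target error rates; this rule only shows how reviewed pairs can turn into two thresholds.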
De-Duplication: Process
Incoming records to MCI (not client systems)
De-Duplication happens in MCI and trickles down to client systems
Clients have access to each other’s data for human review process
De-Duplication Service – Some Numbers
Estimated 94% of new reports will be either merged or inserted
Remaining 6% - sent to hold queue for Human Review
99.7% accuracy of De-Duplication Service
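To make the three outcomes concrete, here is a small sketch that routes a scored report to merge, insert, or the hold queue, and estimates the review workload from the 6% figure above. The threshold values, the `route` function, and the sample scores are assumptions for illustration; only the 6% hold rate comes from the slide.

```python
from collections import Counter

# Hypothetical cut-offs on a match score; the real values are not given.
UPPER = 10.0   # above this: treat as the same person and merge
LOWER = 0.0    # below this: treat as a new person and insert

HOLD_PERCENT = 6  # share of new reports expected to need human review


def route(score: float) -> str:
    """Decide what happens to an incoming report given its best match score."""
    if score > UPPER:
        return "merge"
    if score < LOWER:
        return "insert"
    return "hold"


def expected_hold_queue(n_reports: int) -> int:
    """Estimated number of reports sent to human review."""
    return n_reports * HOLD_PERCENT // 100


scores = [14.2, -6.1, 3.3, 11.0, -2.5]   # made-up scores
counts = Counter(route(s) for s in scores)
```

Under the stated rate, a batch of 10,000 new reports would leave roughly 600 for human review, with the remaining 9,400 handled automatically.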
Benefits – Centralized De-Duplication Service
Cross-program leveraging of resources
Programs have access to other programs’ data
Fewer FTEs needed for human review – able to re-deploy staff
Challenges
Who will be responsible for cross program record review – individual programs, or an MCI team?
Ownership of data – CIR will now disseminate LQ data
Confidentiality Issues
  All Clients have access to VR information
  CIR has access to LQ data
Challenges (Continued)
Fiscal issues
Joint Project Activities – data dissemination
MCI System Operations & Maintenance – need to divide responsibilities between MIS, MCI, CIR and LQ staff
Future Plans
Environmental Health - Adult Heavy Metal Poisoning Database
Expanding the MCI to the rest of DOHMH