
The CDW Data Lifecycle: Internals, Data Flows, and Business Intelligence (Closet Skeletons Version)
Richard Pham, Enterprise Architect, OI&T Corporate Data Warehouse – Architecture

Upload: candace-small

Post on 25-Dec-2015


TRANSCRIPT

  • Slide 1
  • (Closet Skeletons Version) Richard Pham, Enterprise Architect, OI&T Corporate Data Warehouse – Architecture
  • Slide 2
  • Slide 3
  • CDW Informatics and Analytics Ecosystem (diagram): Region 1: RDW and RPC farm for VISNs 18–22. Region 2: RDW and RPC farm for VISNs 12, 15, 16, 17, 23. Region 3: RDW and RPC farm for VISNs 6–11. Region 4: RDW and RPC farm for VISNs 1–5. Central services: CDW, SAS Grid, VINCI, Ana Apps, ePM, GIS, RPC farm. CDW = Corporate Data Warehouse; RDW = Regional Data Warehouse. Hardware stats: 411 servers, 4 PB storage, 54 racks. BI farm: SQL Server Data Center build (Engine, SSIS, SSAS, SSRS), Excel Services, SharePoint/PerformancePoint Services, Team Foundation Services, SAS, Stata, TreeAge.
  • Slide 4
  • Some Things Never Change: VHA and OI&T have a tense/unhappy relationship; OI&T project management bureaucracy is onerous; the use and oversight of contractors is problematic; Pharmacy knows what they are doing (more so than OI&T).
  • Slide 5
  • In the beginning, there were files (early 70s)
  • Slide 6
  • There were problems: How do I maintain each file? If I change one file, what happens to the other files? How do I control the growth of the files?
  • Slide 7
  • Then came databases (late 70s)
  • Slide 8
  • And there were more problems: How can the databases share common elements like patient? What if some idiot changes one table structure and collapses everything else? Who remembers how this database was designed?
  • Slide 9
  • This is only two packages; think of the 100+ that are in VistA. Now, try extrapolating those trends in your head. Have a picture in your mind?
  • Slide 10
  • Did That Picture Look Like This?! (~7% of VistA as of 2010)
  • Slide 11
  • Slide 12
  • One more extension: let's try to analyze this data
  • Slide 13
  • This Is What Happens With Extracts (90s)
  • Slide 14
  • Even more problems: Is my data timely (what is the time lag between the extract and the production system)? Are the extracts one-time? Are they repeatable? Who manages all these extracts? No, seriously, this becomes a really ugly problem.
  • Slide 15
  • Why Am I Giving This Presentation? Quite simply, feedback such as: "I don't understand what you mean when you say File or Pointer." "Where does the data come from?" "How does the data get to CDW?" Also, while you are using the CDW to prepare your work, it really helps to know where the data originates.
  • Slide 16
  • DHCP/VistA/CPRS/HealthEVet. VistA: Veterans Health Information Systems and Technology Architecture, the 2nd-generation architecture; the name refers both to the architecture and to the database that the architecture supports. DHCP: Decentralized Hospital Computer Program, the text-based, DOS-like (Unix-like) system where many of VistA's non-clinical entries take place. CPRS: Computerized Patient Record System, a user-friendly GUI providing access to clinical order entry functions. HealthEVet: the 3rd generation of VA's EMR; planned inclusions are patient-facing applications, better alignment with coding standards, and MDS compliance.
  • Slide 17
  • The Health Care Process Is More Complicated Than We Think
  • Slide 18
  • Objectives: The main objective is to understand the data lifecycle of VA's VistA/CPRS and the user experience of VistA/CPRS. Get a high-level overview of VistA internals; learn about data structures and outputs in VistA; learn where data enters and travels throughout the VA; try to make sense of data resources within the VA and how they are accessed.
  • Slide 19
  • The VA Data Lifecycle
  • Slide 20
  • Slide 21
  • Core Patient Care Functionality: VistA is first and foremost an Electronic Medical Record. The architecture design supports veteran health care.
  • Slide 22
  • Core Patient Care Functionality: VistA Internals, DHCP, CPRS
  • Slide 23
  • VistA Internals 101: MUMPS; Server and Operating System; Kernel; Three Wise Men (Managers): TaskMan, MailMan, FileMan; Modules
  • Slide 24
  • Slide 25
  • Why Is Med Safety at Ann Arbor?
  • Slide 26
  • To Best Care Anywhere
  • Slide 27
  • Massachusetts General Hospital Utility Multi-Programming System (MUMPS or M). My definition in English: M is a programming language designed for hierarchical databases that is convenient for medical applications, or anything else where speed and data storage upkeep are a problem and programmer intelligence/organization is not. My technical definition: M is a Turing-complete, low- and high-level, imperative, machine-compiled (no longer interpreted) programming language utilizing a hierarchical global array file structure. Used commonly in healthcare and financial industry settings.
  • Slide 28
  • Structure of the Veterans Administration Data Efforts (Late 1970s): VHA ancestor: Department of Medicine and Surgery (DMAS). VHA-OI ancestor: Computer Assisted System Staff (CASS). OI&T ancestor: Office of Data Management & Telecommunications (ODM&T).
  • Slide 29
  • Comparing the Two Offices. CASS: decentralized design philosophy; rapid, agile development; SME-involved development. ODM&T: centralized design philosophy; bureaucratic, process-focused development; development without SMEs.
  • Slide 30
  • Slide 31
  • Highlights of ODM&T Development: It took 6 years to deploy APPLES Pharmacy at 10 sites. A 1980 paper detailing ODM&T's transactional patient treatment file (PTF) system promised an interactive national solution by 1990. Navigating the mandated 17 steps between system specification and deployment alone is said to have required at least 3 years.
  • Slide 32
  • Slide 33
  • Beginnings of DHCP: There were subject matter experts who believed they could put out useful applications faster than the ODM&T sloth. Development of the testing and principles was done unofficially throughout the early to late 1970s.
  • Slide 34
  • Original DHCP Design Principles: a commitment to rapid prototype development; all use ANSI MUMPS; modular design; an actively maintained data dictionary; code sharing/portability; involve the SMEs.
  • Slide 35
  • DHCP Kernel: Functions as both an operating system for VistA applications and an M virtual machine. Kernel shields DHCP modules from needing to know the hardware and OS configurations on the server. Isolates M to the ANSI standard (1995). Provides a toolbox of standard functions for most programmers.
  • Slide 36
  • Slide 37
  • MUMPS Classic Database: one data type, string (text); other types: cardinal numbers, float numbers, $H dates; one data storage type, the multidimensional array (aka globals); dynamic (duck) typing.
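To make the $H date type concrete: $H ($HOROLOG) is a "days,seconds" string, counting days from December 31, 1840 (day 0) and seconds from midnight. The Python sketch below converts it for analysis work; the function names are illustrative and are not part of any VA library.

```python
from datetime import datetime, timedelta

# $H is "days,seconds": days counted from 1840-12-31 (day 0),
# seconds counted from midnight of that day.
HOROLOG_EPOCH = datetime(1840, 12, 31)


def horolog_to_datetime(h: str) -> datetime:
    """Convert a $H string such as "47117,3661" to a Python datetime."""
    days_part, _, seconds_part = h.partition(",")
    seconds = int(seconds_part) if seconds_part else 0  # date-only values carry no seconds
    return HOROLOG_EPOCH + timedelta(days=int(days_part), seconds=seconds)


def datetime_to_horolog(dt: datetime) -> str:
    """Inverse conversion (microseconds are ignored)."""
    delta = dt - HOROLOG_EPOCH
    return f"{delta.days},{delta.seconds}"


print(horolog_to_datetime("47117,3661"))  # 1970-01-01 01:01:01
```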
  • Slide 38
  • VistA Data Organization: Namespace, File, Field, Record. Example: namespace 654 (VAMC Reno), File 120.5 (GMR Vitals), Field .01 (DATE/TIME VITALS TAKEN), record IEN 1: BP, 140/90. Most files have an entry at field .001, called the IEN (Internal Entry Number), which serves as an identity key marking the record as unique.
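A minimal Python sketch of how records sit in a global, under loudly stated assumptions: the VITALS dictionary is a hypothetical, simplified stand-in for a global (keys are subscript tuples such as (IEN, node), values are "^"-delimited strings), and its piece layout and values are invented, not the real File 120.5 layout. next_subscript roughly imitates M's $ORDER, including M's collation of numeric subscripts before strings, and piece imitates $PIECE.

```python
# Hypothetical, simplified vitals global: (IEN, node) -> "^"-delimited pieces.
# The layout and values are made up for illustration only.
VITALS = {
    (1, 0): "3100105.0930^BP^140/90",
    (2, 0): "3100105.1015^PULSE^72",
    (10, 0): "3100106.0800^BP^128/84",
}


def collate_key(sub):
    """M collation: numeric subscripts first, in numeric order, then strings."""
    if isinstance(sub, (int, float)):
        return (0, sub, "")
    return (1, 0, str(sub))


def next_subscript(glob, prefix, current=""):
    """Rough analogue of $ORDER: the next subscript after `current` one level
    below `prefix`, or "" when there are no more (as M returns)."""
    depth = len(prefix)
    children = {k[depth] for k in glob if len(k) > depth and k[:depth] == prefix}
    for sub in sorted(children, key=collate_key):
        if current == "" or collate_key(sub) > collate_key(current):
            return sub
    return ""


def piece(value, n, delim="^"):
    """Like $PIECE(value, delim, n): 1-based, "" if the piece does not exist."""
    parts = value.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


ien = next_subscript(VITALS, ())
while ien != "":
    node = VITALS[(ien, 0)]
    print(ien, piece(node, 2), piece(node, 3))  # IENs come out 1, 2, then 10
    ien = next_subscript(VITALS, (), ien)
```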
  • Slide 39
  • From the Beginning: Entry. An entry is a piece of data: Richard (First Name), Pham (Last Name), 05/03/1983 (Date of Birth).
  • Slide 40
  • Record (Row): a group of related data, e.g. Richard, Pham, M, 05/03/1983.
  • Slide 41
  • Field (Column): one kind of entry that every record in the file shares, e.g. First Name (Richard), Last Name (Pham), Date of Birth (05/03/1983).
  • Slide 42
  • File: a group of related fields and the records they hold. Example: File 200 (NEW PERSON), with the fields First Name (Richard), Last Name (Pham), and Date of Birth.
  • Slide 43
  • File Relationships: One-to-One (Pointer); One-to-Many (Subfile, Multiple); Self-Referential (Recursive): Reverse Recursive (Past Records) and Forward Recursive (Replace Records); Pointer with Logic; Multiple Pointer.
  • Slide 44
  • File Relationships: Pointers. When two files share a common field with each other, this is called a pointer. There are four major types. Pointer: one record in one file matches one record in another file. Self-Referential: one record in one file matches one record in the same file (in the past or the future). Multiple: one record in one file matches many records in one file (parent-child). Variable: one record, plus some logic, matches a record in one of several files.
  • Slide 45
  • Pointers: File 52 (PRESCRIPTION), Field 2 (Patient) → File 2 (PATIENT), all fields. One-to-one.
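Under the hood, a pointer field simply stores the IEN of the target record, much like a foreign key. The sketch below uses hypothetical miniature copies of File 52 and File 2 (field names, IENs, and values are made up) and reports dangling pointers rather than dropping them, since M does not enforce that the target record exists.

```python
# Hypothetical miniature files. File 2 (PATIENT) is keyed by IEN; the
# "patient" field of File 52 (PRESCRIPTION) stores a PATIENT IEN: the pointer.
PATIENT = {101: {"name": "DOE,JOHN"}, 102: {"name": "ROE,JANE"}}
PRESCRIPTION = {
    5001: {"rx_number": "123456", "patient": 101},
    5002: {"rx_number": "123457", "patient": 102},
    5003: {"rx_number": "123458", "patient": 999},  # points at an IEN that does not exist
}


def resolve_pointer(record, field, target_file):
    """Follow a pointer field to the target record; None if it dangles."""
    return target_file.get(record.get(field))


def join_rx_to_patient(rx_file, patient_file):
    """Relational-style join that reports dangling pointers instead of hiding them."""
    rows, dangling = [], []
    for rx_ien, rx in sorted(rx_file.items()):
        patient = resolve_pointer(rx, "patient", patient_file)
        if patient is None:
            dangling.append(rx_ien)
            continue
        rows.append((rx["rx_number"], patient["name"]))
    return rows, dangling


rows, dangling = join_rx_to_patient(PRESCRIPTION, PATIENT)
print(rows, dangling)
```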
  • Slide 46
  • Self-Referential Pointer: File 100 (OE/RR), Field 9 (Replaced Order) → File 100 (OE/RR) past order: present-to-past. File 100 (OE/RR), Field 9.1 (Replaced Order) → File 100 (OE/RR) future order: present-to-future. Warning: DO NOT $O ($ORDER) through these fields without programmer assistance! You will bring down DHCP this way!!!
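For analysts working from an extract (not the live system the warning above is about), following a present-to-past chain is a simple walk. This sketch assumes a hypothetical ORDERS dictionary in which "replaced_order" stands in for field 9; the cycle check and hop limit are defensive assumptions, because nothing in M prevents a bad pointer from looping back on itself.

```python
# Hypothetical extract of File 100: "replaced_order" stands in for the
# present-to-past pointer (field 9 on the slide). IENs are made up.
ORDERS = {
    300: {"replaced_order": 200},
    200: {"replaced_order": 100},
    100: {},  # the original order: nothing further back
}


def order_history(orders, start_ien, link="replaced_order", max_hops=1000):
    """Walk a self-referential pointer chain, refusing to loop forever."""
    chain, seen, current = [start_ien], {start_ien}, start_ien
    for _ in range(max_hops):
        nxt = orders.get(current, {}).get(link)
        if nxt is None:
            return chain  # reached the oldest order
        if nxt in seen:
            raise ValueError(f"cycle detected: order {nxt} already visited")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    raise ValueError(f"chain longer than {max_hops} hops")


print(order_history(ORDERS, 300))  # [300, 200, 100]
```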
  • Slide 47
  • Multiple Subfile: File 52 (PRESCRIPTION), Field 52 (Refill) → Subfile 52.1 (REFILL), all fields. One-to-many.
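A multiple is nested inside its parent record, so turning it into a relational child table means carrying the parent IEN down as part of the key. This is a hypothetical sketch: the "refill" key stands in for the REFILL multiple (subfile 52.1), and all values are invented.

```python
# Hypothetical File 52 extract with the REFILL multiple nested inside each
# prescription, keyed by subfile IEN. Values are made up.
PRESCRIPTION = {
    5001: {"rx_number": "123456",
           "refill": {1: {"refill_date": "2010-02-04"},
                      2: {"refill_date": "2010-03-06"}}},
    5002: {"rx_number": "123457", "refill": {}},
}


def flatten_multiple(parent_file, multiple):
    """Turn a nested multiple into child rows keyed by (parent IEN, sub IEN)."""
    rows = []
    for parent_ien, record in sorted(parent_file.items()):
        for sub_ien, sub_record in sorted(record.get(multiple, {}).items()):
            rows.append({"parent_ien": parent_ien, "sub_ien": sub_ien, **sub_record})
    return rows


for row in flatten_multiple(PRESCRIPTION, "refill"):
    print(row)
```

Prescription 5002 contributes no rows, which is the one-to-many relationship at work: a parent may have zero or more children.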
  • Slide 48
  • Multiple Subfile (one-to-many files): File 120.8 (PATIENT ALLERGIES), Field 1 (GMR Allergy) → File 50 (DRUG), File 50.6 (NATIONAL DRUG), File 120.2 (GMR ALLERGIES), File 50.416 (DRUG INGREDIENT), File 50.605 (DRUG CLASS).
  • Slide 49
  • Computed/MCode: a placeholder that does not contain any stored information; it is calculated ad hoc when you look up the value. Warning: for this reason, the value ALWAYS has the possibility of changing.
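A short illustration of why a computed field can change between looks: the sketch stores only a date of birth (the 05/03/1983 used earlier) and computes age on every read. PatientRecord and age are hypothetical names; this is an analogy, not FileMan's computed-field mechanism.

```python
from datetime import date
from typing import Optional


class PatientRecord:
    """Hypothetical record with one stored field (dob) and one computed field (age)."""

    def __init__(self, dob: date):
        self.dob = dob  # stored

    def age(self, as_of: Optional[date] = None) -> int:
        # Computed on every read, never stored: the answer depends on when you ask.
        as_of = as_of or date.today()
        years = as_of.year - self.dob.year
        if (as_of.month, as_of.day) < (self.dob.month, self.dob.day):
            years -= 1  # birthday not reached yet that year
        return years


rec = PatientRecord(date(1983, 5, 3))
print(rec.age(date(2015, 5, 2)), rec.age(date(2015, 5, 3)))  # 31 32
```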
  • Slide 50
  • How Complicated Is the Pharmacy Package? 440 files in the File 50 series, 3,175 fields, 527 pointers, 310 external references.
  • Slide 51
  • VistA to Relational Database Terminology (VistA term (example) → relational term (example)):
    Namespace (VHAFRE) → Database (VA Fresno)
    Package (not hardcoded) → Schema (RxOutpatient)
    File (50.68 VA PRODUCT) → Table (NationalDrug)
    Field (.01 NAME) → Column (DrugNameWithDose)
    Domain (cardinal/decimal, set of codes, free text/word processing) → Field Type (numeric, boolean, varchar)
    Internal Entry Number (IEN or .001) → ~Key (9722)
    Record → Tuple/Row (ISOSORBIDE MONONITRATE 120MG TAB,SA)
    Pointer (IEN) → Foreign Key (VAClassIEN)
    Multiple Pointer → no equivalent
    Computed/MCode Field → Trigger (Age Trigger)
  • Slide 52
  • Upside of Using Globals: Faster: no joins. Faster: all parameter pointers built in. Faster: direct and planned programmatic access to the database (look at SQL execution plans). Less data storage overhead and faster paging: if the data point does not exist in the array, there does not need to be a fixed point like in relational.
  • Slide 53
  • Downside of Using Globals: No intrinsic structure and no enforcement*: M believes whatever you put into the globals (most M programmers view this as an advantage, while relational programmers have an MI). ACID compliance not mandated. (Il)logical data structures guaranteed: there are many interesting* ways that the M programmers modeled the data that do not make sense to later viewers.
  • Slide 54
  • MUMPS Quirks: Whitespace (space) matters. Requires knowledge of the kernel and sometimes lower-level concepts. Programming without type or structure enforcement. VA programming standards and conventions.
  • Slide 55
  • Slide 56
  • The Three Wise Men (Managers): TaskMan, the man(ager) that schedules tasks to the kernel. MailMan, the man(ager) that handles messaging between the user, TaskMan, and any other two-way communication between packages. FileMan, the man(ager) that controls internal file (data structure) interactions.
  • Slide 57
  • TaskMan: TaskMan handles application processing: creating application processing tasks, scheduling these tasks, and monitoring the health/statistics of these tasks. If the kernel is the brain, then TaskMan is the body of the operation. If programming, NEVER EVER use the TaskMan global directly. This subverts TaskMan's scheduling queue and can cause a system memory leak. Use the calls instead.
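A toy analogy, not TaskMan's actual internals: the queue below stays correctly ordered only because every change goes through schedule(), which preserves the heap ordering. Appending to the underlying list directly, the equivalent of writing to the global yourself, silently breaks that ordering. ToyTaskQueue and the task names are hypothetical.

```python
import heapq
import itertools


class ToyTaskQueue:
    """A toy scheduler whose queue is only valid if changed through its API."""

    def __init__(self):
        self._heap = []                # entries: (run_at, sequence, task_name)
        self._seq = itertools.count()  # tie-breaker keeps equal times in FIFO order

    def schedule(self, run_at, name):
        heapq.heappush(self._heap, (run_at, next(self._seq), name))  # preserves heap order

    def run_due(self, now):
        """Pop and return every task due at or before `now`."""
        ran = []
        while self._heap and self._heap[0][0] <= now:
            ran.append(heapq.heappop(self._heap)[2])
        return ran


q = ToyTaskQueue()
for run_at, name in [(30, "nightly-extract"), (10, "print-labels"), (20, "rebuild-index")]:
    q.schedule(run_at, name)
q._heap.append((5, 99, "rogue-task"))  # bypassing the API: the heap invariant is now broken
ran = q.run_due(25)
print(ran)  # "rogue-task" was due first (at 5) but does not run first
```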
  • Slide 58
  • MailMan: VistA needs a way to pass data from the database to other areas and to receive data back. MailMan fulfilled this function in the pre-TCP/IP days. Electronic mail doesn't mean just email: practically any message between the database and anyone else (the end user, another site, another application, etc.) can be moved this way. It gives programmers methods to both receive and return data to the database. MailMan is its own protocol, but uses HL7 when communicating with non-DHCP programs.
  • Slide 59
  • FileMan: a higher-level method to access the VistA database without exposing a programmer interface. Mostly menu-driven; one can use limited programming. Serves as the model for all other modules that interact with the VistA database.
  • Slide 60
  • ODM&T Initial Action Plan for DHCP Development (1980): ordered that development stop, fired the developers, removed the hardware, and cut the DMAS budget so it would never happen again.
  • Slide 61
  • The official history
  • Slide 62
  • Development Goes Underground: Developers who survived the ODM&T purge continued their work as a black project in DMAS. During 1980 and 1981, the survivors (the Underground Railroad) continued developing modules for system integration.
  • Slide 63
  • Modules: Modules are programmed to interact with the VistA database. Most use FileMan as a model for programming.
  • Slide 64
  • Some of the Many Modules: Medicine, Surgery, Dentistry, Nursing, Pharmacy, Laboratory, Care Management, Patient Care Encounters, ADT, Mental Health, EDIS, Oncology, Nutrition and Food Service, Imaging/PACS, Prosthetics. It is not really in the scope of this presentation to cover each module. Try the VistA Documentation Library (http://www4.va.gov/vdl/) or VHA eHealth University (VeHU) (http://www.vehu.va.gov/).
  • Slide 65
  • Acceptance and DHCP 1.0: Once there was a critical mass of packages shown to be useful, the tide turned and the project was blessed. Initial testing/installation was done in 1980–83; the 1.0 installation was in 1985. Most of the underlying packages can still be recognized by the original programmers.
  • Slide 66
  • Computerized Patient Record System (CPRS): A Real-Time Order Checking System that alerts clinicians during the ordering session that a possible problem could exist if the order is processed. A Notification System that immediately alerts clinicians about clinically significant events. A Patient Posting System, displayed on every CPRS screen, that alerts clinicians to issues related specifically to the patient, including crisis notes, warnings, adverse reactions, and advance directives. The Clinical Reminder System, which allows caregivers to track and improve preventive health care for patients and ensure timely clinical interventions are initiated. Remote Data View functionality that allows clinicians to view a patient's medical history from other VA facilities, to ensure the clinician has access to all clinically relevant data available at VA facilities.
  • Slide 67
  • CPRS Internals: Written in Embarcadero Delphi (NOT in MUMPS). Connects from the graphical user interface to the VistA database using a Remote Procedure Call (RPC) Broker. This RPC Broker translates instruction sets from other languages into M.
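A rough sketch of the broker idea in Python, under stated assumptions: ToyBroker, the RPC name "ZZ GET PATIENT NAME", and the data are all hypothetical. The point is only the shape of the exchange: the client sends a procedure name plus parameters, and the server runs only procedures registered under that name (here Python functions standing in for M routines).

```python
from typing import Callable, Dict, List


class ToyBroker:
    """Toy stand-in for an RPC broker: clients call server-side procedures
    by name, and only names that were registered can be called."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[..., List[str]]] = {}

    def register(self, name: str):
        def decorator(fn: Callable[..., List[str]]):
            self._registry[name] = fn
            return fn
        return decorator

    def call(self, name: str, *params: str) -> List[str]:
        if name not in self._registry:
            raise KeyError(f"RPC not registered: {name}")
        return self._registry[name](*params)


broker = ToyBroker()
PATIENT_NAMES = {101: "DOE,JOHN"}  # hypothetical server-side data


@broker.register("ZZ GET PATIENT NAME")  # hypothetical RPC name
def get_patient_name(ien: str) -> List[str]:
    # Stands in for the M routine the broker would invoke on the server.
    return [PATIENT_NAMES.get(int(ien), "")]


print(broker.call("ZZ GET PATIENT NAME", "101"))  # ['DOE,JOHN']
```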
  • Slide 68
  • Slide 69
  • Present State of VistA: a large MUMPS database; 50+ main clinical packages; 10,000+ tables; each medical center holds somewhere between 2 and 4 TB of data accumulated over 30 years (mostly imaging); many processes, with 300+ MB of running executable at any given time; over 20,000 subroutines (VDL); many simultaneous users.
  • Slide 70
  • Analytic Coursework (\\r01scrdwh65.r01.med.va.gov\vadatalifecycle\sql): SQL: the T-SQL dialect is for VHA; the PL/SQL dialect is for VBA. SQL Server Reporting Services. SQL Server Analysis Services. Statistical analysis programs: SAS, Stata (preferred), TreeAge.
  • Slide 71
  • Slide 72
  • Slide 73
  • Next Class: SQL basic query, with an optional introductory lecture on basic computer science (algorithms, heaps, sorts, data structures): two 50-minute lectures for five weeks. Basic Reporting: two 50-minute lectures for five weeks. Advanced Programming: one 50-minute lecture every other week. Class is placed on the site. Current version has the DBZ Abridged Disclaimer.
  • Slide 74
  • The VA Data Lifecycle
  • Slide 75
  • National Analytic Systems: a list of systems that support policy, planning, and congressional needs. There are more extracts than this, but I have chosen the most common ones.
  • Slide 76
  • Systems to Support Planning: Decision Support System (DSS): supports accounting and costing for the OIG, GAO, CBO, and other auditing agencies. Allocation Resource Center (ARC): supports personnel and resource allocation at the medical center level; workload capture and resource allocation; basis for the VERA (VA's Fund Control Point) model.
  • Slide 77
  • Systems to Support Planning and Research: National Patient Care Database (NPCD): an integrated set of data that captures a patient's care encounters with the VA. Corporate Data Warehouse (CDW): a near real-time accumulation of much of the same data; the result of the Health Data Repository process.
  • Slide 78
  • 78
  • Slide 79
  • NPCD Processing: DSS data is extracted. Flat files are indexed and loaded into the database daily. Data is checked for duplicates bi-monthly. Data is extracted and filtered for reporting twice a month. Diagram components: NPCD on Oracle/UNIX (daily data loading); Master Extract File (MEF); SAS on the z900 mainframe; VSSC/KLF Menu on Windows.
  • Slide 80
  • NPCD Data Flow Diagram: (1) NPCD data is sent from the facilities' VistA systems to the AAC via MailMan messaging. (2) Once a message reaches the Austin MailMan server, it automatically moves to the Data Management Interface System (DMI), which receives data 24x7. (3) Acknowledgement messages are sent back to the facilities. (4) NPCD and other applications retrieve their respective data from DMI for use; HL7 data goes to the Oracle database. (5) Data is extracted and backed up nightly, Monday through Friday; NPCD data is extracted by application (z900).
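The receive, acknowledge, and retrieve pattern in the flow above can be sketched as a toy in-memory hub. Message, ToyInterfaceHub, and the station numbers are hypothetical; the real path uses MailMan, DMI, and HL7, none of which is modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Message:
    msg_id: str
    facility: str  # hypothetical sending station number
    app: str       # which application the payload is for, e.g. "NPCD"
    payload: str


class ToyInterfaceHub:
    """Receives facility messages, acknowledges each one, and holds them
    until the owning application retrieves them."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[Message]] = defaultdict(list)
        self.acks: List[Tuple[str, str]] = []  # (facility, msg_id) sent back

    def receive(self, msg: Message) -> None:
        self._queues[msg.app].append(msg)
        self.acks.append((msg.facility, msg.msg_id))  # acknowledge on receipt

    def retrieve(self, app: str) -> List[Message]:
        """Hand an application all of its pending messages and clear them."""
        pending, self._queues[app] = self._queues[app], []
        return pending


hub = ToyInterfaceHub()
hub.receive(Message("m1", "654", "NPCD", "encounter data"))
hub.receive(Message("m2", "612", "OTHER", "other data"))
print([m.msg_id for m in hub.retrieve("NPCD")], hub.acks)
```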
  • Slide 81
  • Secrets of the VA Data Universe: This was an extremely brief introduction to a complicated area. I have another presentation on the availability of databases in the VA and how to access them for operations and/or research.
  • Slide 82
  • The VA Data Lifecycle
  • Slide 83
  • Regional Remote Data Processing Center Shadow Systems: an offsite backup process to ensure continuity of operations for VistA patient care.
  • Slide 84
  • Regional Data Processing Centers (RDPCs): Started as backups. Read-only backup VistA systems are set up to take journaling files: when a record is written or altered in a local medical center's VistA, a journal file with that entry is prepared and sent to a Regional Data Processing Center. This maintains an active backup in case the local medical center's VistA goes down. Nowadays, even the production systems work from there: Regions I and IV fully (? on status); Regions I and III.
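Conceptually, a shadow system stays current by replaying the production journal in order. The sketch below is a simplified model, not Caché's journal format: JournalEntry and apply_journal are hypothetical, entries are SET or KILL operations on subscript tuples, and a sequence-number high-water mark makes re-sending the same journal harmless.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Node = Tuple[object, ...]  # global subscripts, e.g. (IEN, node)


@dataclass
class JournalEntry:
    seq: int         # position in the production system's journal
    op: str          # modeled operations: "SET" or "KILL"
    node: Node
    value: str = ""  # only meaningful for SET


def apply_journal(shadow: Dict[Node, str], entries: List[JournalEntry], last_applied: int) -> int:
    """Replay entries onto a read-only copy in sequence order and return the new
    high-water mark; entries at or below last_applied are skipped."""
    for e in sorted(entries, key=lambda entry: entry.seq):
        if e.seq <= last_applied:
            continue
        if e.op == "SET":
            shadow[e.node] = e.value
        elif e.op == "KILL":
            # Like M's KILL, remove the node and everything beneath it.
            for key in [k for k in shadow if k[:len(e.node)] == e.node]:
                del shadow[key]
        else:
            raise ValueError(f"unknown journal op {e.op!r}")
        last_applied = e.seq
    return last_applied


shadow: Dict[Node, str] = {}
journal = [
    JournalEntry(1, "SET", (1, 0), "DOE,JOHN"),
    JournalEntry(2, "SET", (1, 1), "note"),
    JournalEntry(3, "SET", (2, 0), "ROE,JANE"),
    JournalEntry(4, "KILL", (1,)),
]
mark = apply_journal(shadow, journal, last_applied=0)
print(mark, shadow)  # 4 {(2, 0): 'ROE,JANE'}
```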
  • Slide 85
  • Regions and RDPCs: Region I RDPC: Sacramento (SAC) and Denver (DEN). Region II RDPC: Little Rock (LIT). Region III RDPC: Durham (DUR) and Augusta. Region IV RDPC: Philadelphia (PHI) and Brooklyn.
  • Slide 86
  • RDPC Denver and Brooklyn
  • Slide 87
  • The VA Data Lifecycle
  • Slide 88
  • Business Intelligence
  • Slide 89
  • Business Intelligence in the VA: Making the Data Work for Us. VistA has a wealth of clinical and administrative data available. In the past, delivering a value-added, timely VistA dataset was hard: querying the active system with minimal impact; needing an interface between M and analyst languages (SAS, SQL, etc.); easy-to-read reports were hard to build.
  • Slide 90
  • BISL Informatics and Analytics Ecosystem (diagram): Region 1: RDW and RPC farm for VISNs 18–22. Region 2: RDW and RPC farm for VISNs 12, 15, 16, 17, 23. Region 3: RDW and RPC farm for VISNs 6–11. Region 4: RDW and RPC farm for VISNs 1–5. Central services: CDW, SAS Grid, VINCI, Ana Apps, ePM, GIS, RPC farm. CDW = Corporate Data Warehouse; RDW = Regional Data Warehouse. Hardware stats: 411 servers, 1.5 PB storage, 54 racks. BI SharePoint (MOSS) farm: PerformancePoint Services, Excel Services, Reporting Services, Analysis Services, SharePoint Services, Team Foundation Services.
  • Slide 91
  • Different Ways to Access DHCP Data: Direct methods: FileMan; individual methods (M routines: not favored, permanent moratorium in Region I); CPRS injection; MDWS (this is HI2's major method); Caché Direct; HDR Extractor (CDW method); VDEF (VistA Data Extraction Framework). Indirect methods: Journal Reader (CDW method).
  • Slide 92
  • MDR Extractor
  • Slide 93
  • Shadow Servers
  • Slide 94
  • Slide 95
  • Corporate/Regional Data Warehouse: Takes a copy of the journal file that goes into the backup shadow system. Translated from the M array to a relational database format using InterSystems Caché's class mapping program. Staged in a Feeder-Collector system for collection. Indexed, with value-added columns produced, and loaded to a VISN RDW server.
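The M-array-to-relational step can be pictured with a small Python sketch. It does not use Caché class mapping; it only shows the shape of the transformation under stated assumptions: SOURCE and COLUMN_MAP are hypothetical, node 0 holds "^"-delimited pieces, absent pieces become NULL (None), and Sta3n is added as an example of a value-added column.

```python
# Hypothetical node data from one site's copy of a file: (IEN, node) -> pieces.
SOURCE = {
    (1, 0): "DOE,JOHN^M^2830503",
    (1, 1): "free-text note",  # a different node, ignored by this mapping
    (2, 0): "ROE,JANE^F",      # third piece never stored (sparse)
}
# Hypothetical mapping: warehouse column name -> piece number in node 0.
COLUMN_MAP = {"PatientName": 1, "Gender": 2, "DateOfBirthRaw": 3}


def global_to_rows(glob, column_map, sta3n):
    """Flatten node-0 pieces into relational rows, adding the station number as
    a value-added column and turning absent or empty pieces into None (NULL)."""
    rows = []
    for (ien, node), value in sorted(glob.items()):
        if node != 0:
            continue
        pieces = value.split("^")
        row = {"Sta3n": sta3n, "IEN": ien}
        for column, n in column_map.items():
            raw = pieces[n - 1] if n <= len(pieces) else ""
            row[column] = raw or None
        rows.append(row)
    return rows


for row in global_to_rows(SOURCE, COLUMN_MAP, sta3n=654):
    print(row)
```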
  • Slide 96
  • CDW Governance: VHA Business Owners/SMEs; VHA-OI Data Quality; OI&T Corporate Data Warehouse; 10N, OIA, VBA. The CDW Governance Board communicates organizational priorities; organizes SMEs and data stewards; provides documentation and clarification of business logic; and sets and monitors domain and work priorities and timelines for completion.
  • Slide 97
  • CDW Governance Is in VHA's Hands: Ordered by VHA. Domain and work prioritization by the CDW Governance Board (Chair: KLF (OIA); Vice-Chair: Larry Mole (Public Health SHG)). Monitored by and accountable to VHA; project management provided by John Quinn (National Data Systems) and KLF (OIA). Supported by VHA OI Data Quality and business owners; the PBM data steward is Rob Silverman.
  • Slide 98
  • As the number of eyes goes up, the number of bugs goes down. Writing documentation about the business logic of the files and fields; answering end-user questions about the data; data validation (preferably before Inpatient Pharmacy and the ADR/Allergy Package).
  • Slide 99
  • 1st-category models are simple: V Health Factor source mapping (FMFile | FMField | ResolveFld | DWTableName | DWFieldName):
    V HEALTH FACTORS | HEALTH FACTOR | | HealthFactor | HealthFactorTypeIEN
    V HEALTH FACTORS | HEALTH FACTOR | 0.01 | HealthFactor | HealthFactorType
    V HEALTH FACTORS | PATIENT NAME | | HealthFactor | PatientIEN
    V HEALTH FACTORS | EVENT DATE AND TIME | | HealthFactor | EventDateTime
    V HEALTH FACTORS | VISIT | 0.01 | HealthFactor | VisitVistaDate
    V HEALTH FACTORS | VISIT | 0.01 | HealthFactor | VisitDateTime
    V HEALTH FACTORS | LEVEL/SEVERITY | | HealthFactor | LevelSeverity
    V HEALTH FACTORS | VISIT | | HealthFactor | VisitIEN
    V HEALTH FACTORS | ENCOUNTER PROVIDER | | HealthFactor | EncounterStaffIEN
    V HEALTH FACTORS | COMMENTS | | HealthFactor | Comments
  • Slide 100
  • 2nd-category models require transformation: Prescription. FileMan side: Prescription (prescription and 1st fill), Refill, Partial Fill. Data warehouse side: Prescription (prescription only), All Fills.
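A minimal sketch of the prescription transformation described above, assuming a hypothetical FileMan-shaped record in which the first fill lives on the prescription itself and refills and partial fills are separate multiples. The output is a prescription-only table plus one All Fills table with a FillType column; the column names are illustrative, not CDW's actual schema.

```python
# Hypothetical FileMan-shaped prescription: the original (1st) fill lives on
# the prescription; refills and partial fills are separate multiples.
RX = {
    5001: {"rx_number": "123456", "fill_date": "2010-01-05",
           "refill": {1: {"fill_date": "2010-02-04"}},
           "partial": {1: {"fill_date": "2010-01-20"}}},
}


def to_warehouse(rx_file):
    """Split into a prescription-only table and one 'all fills' table."""
    prescriptions, all_fills = [], []
    for ien, rx in sorted(rx_file.items()):
        prescriptions.append({"RxIEN": ien, "RxNumber": rx["rx_number"]})
        # The first fill is pulled off the prescription record into its own row.
        all_fills.append({"RxIEN": ien, "FillType": "ORIGINAL", "FillNumber": 0,
                          "FillDate": rx["fill_date"]})
        for fill_type, multiple in (("REFILL", "refill"), ("PARTIAL", "partial")):
            for n, fill in sorted(rx.get(multiple, {}).items()):
                all_fills.append({"RxIEN": ien, "FillType": fill_type, "FillNumber": n,
                                  "FillDate": fill["fill_date"]})
    return prescriptions, all_fills


prescriptions, all_fills = to_warehouse(RX)
for fill in all_fills:
    print(fill)
```

Keeping every fill in one table with a FillType column means "all dispensing events" becomes a single query instead of a union across three FileMan structures.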
  • Slide 101
  • 3rd-category models are not usable without transformation: PCMM
  • Slide 102
  • Levels of Data: National: Corporate Data Warehouse (CDW). Region: Regional Data Warehouse (RDW). VISN: VISN Data Warehouse (VDW). Medical Center: local data.
  • Slide 103
  • Entities Who Produce Business Intelligence Products: National: VSSC, PSSG, DMDC, HEC, ARC, DSS, BIPL, OQP, PCS, PBM. Region: regional BISL teams. VISN: VISN Data Warehouse, VISN PBM. Local: DSS. Bolded are the ones that have substantial resources in clinical business intelligence. PSSG handles much of the GIS and statistical demography for the VA.
  • Slide 104
  • Data Access: VISN and station level: contact your VISN database manager. Regional/corporate access: contact NDS for the 9957 permissions.
  • Slide 105
  • Operational Challenges of VistA: System resources: $8 billion investment over 20 years; new needs for new domains; MUMPS programmers must be internally trained (and many of them are retiring or dying). Communication with other systems: HIMSS compliance with data interchange; e-functions (billing, prescribing, verification). Interagency cooperation: DoD and NHIN. Business intelligence: closing the data lifecycle and bringing back clinical data for knowledge discovery.
  • Slide 106
  • Challenges CDW Faces: Finding personnel who are able and willing to help us define the data (e.g., PCMM). Giving analytic advice and documentation ("What date should I use?" "Where is this data?"). Building advanced Tier II products: multifact table cubes; syndromic surveillance monitoring models with high-dimensionality scoring.
  • Slide 107
  • Acknowledgments: Kernel: Jack Schram (Oakland OIFO). SQLI: Ellen Zufall (SF IRMS). FileMan/History of Production System: Chuck Cobalis. RPC Broker and MUMPS coding: Perry Richmond (VISN 18 BI). Regional Data Process: Vincent Bui and Ken Koenig (Region I SQL Back Office Team).
  • Slide 108
  • Acknowledgments: OI&T Business Intelligence Product Line (BISL): Jack Bates, Manager, OI&T BIPL; Stephen Anderson, Lead Data Architect; Mike Baker, Lead ETL Architect; Denver Griffith/Ken Fuchsel, Server Administrators; Dave Fackler; Ron Talmage; Dan Hardan, Jeff King, Jeff Price.
  • Slide 109
  • Questions
  • Slide 110
  • Further Information on the Background: for the VA Base M Training, see http://vaww.vistau.med.va.gov/VistaU/MTraining/Default.htm; for the VA Programming Standards and Conventions, see http://vista.med.va.gov/sacc/; for the VA Document Library, see http://vista.med.va.gov/vdl/
  • Slide 111
  • Resources for Further Information: VA Information Resource Center (ViREC): http://www.virec.research.va.gov; National Patient Care Database (internal): http://vaww.aac.va.gov/npcd/; National Data Systems (NDS) (internal): http://vaww4.va.gov/NDS/DataAccess.asp
  • Slide 112
  • Reading for Fun: Official history: Steven H. Brown, Michael J. Lincoln, Peter J. Groen, Robert M. Kolodner, "VistA: U.S. Department of Veterans Affairs national-scale HIS," International Journal of Medical Informatics 69 (2003) 135–156. VistA Document Library (VDL): www4.va.gov/vdl