university at buffalothe state university of new york the data warehouse schema of hiv/aids and drug...
TRANSCRIPT
![Page 1: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/1.jpg)
University at Buffalo The State University of New York
The Data Warehouse Schema of HIV/AIDS and Drug Use Project
Characteristic of Source Data
Our data is about the patients and their drug use situation information which includes 22 tables , which include various information about a patient.
In our case, the data warehouse is designed to integrate various biomedical datasets for studies of human diseases
![Page 2: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/2.jpg)
University at Buffalo The State University of New York
The Data Warehouse Schema of HIV/AIDS and Drug Use Project
The Problem of the Clinical Data
Incomplete and/or imprecise data very common Uncertain relationships between fact and dimension
objects The data structure is often informal Often many-to-many relationships between measures
and dimensions
![Page 3: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/3.jpg)
University at Buffalo The State University of New York
The Data Warehouse Schema of HIV/AIDS and Drug Use Project
The process of building up the data warehouse
Step 1: Split the tables which are in 1NF(First normal form)
Step 2:According to the situation, build up the measure tables
Step 3: Solving the “many to many ” relationships in each diagram
Step 4: Integration the measure tables to a fact table
![Page 4: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/4.jpg)
University at Buffalo The State University of New York
The construction of the data warehouse
Step 1: Split the tables which are in 1NF
The defects may cause by the 1NF Data Redundancy Hard to Manage Load Slowly
![Page 5: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/5.jpg)
University at Buffalo The State University of New York
Make the split base on the meaning
![Page 6: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/6.jpg)
University at Buffalo The State University of New York
![Page 7: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/7.jpg)
University at Buffalo The State University of New York
The construction of the data warehouse
Step 2: According to the situation, build up the measure tables
Since the tables can be basically classified into five categories in our data, then I have used five Measure tables :Personal Info, Medical History, Other Info, Labs and Tests, Medicines which can stand for these five categories and connect to the dimension table belong to it.
![Page 8: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/8.jpg)
University at Buffalo The State University of New York
The Info about the five categories
Personal: Including 6 tables of patients’ personal information: Household Info, HIV Info, Substance, Address ,Insurance and Other Genotype
Medical History: Including 6 tables of patients’ medical history: MEDPROB, Adverse Effect,Extra Social History,Coinfection, Medical Problem, and Extra Social history.
Labs and Tests: Including 4 tables of patients’ labs and tests info: Genotype, Labs info, Phenotype and Drug monitor.
![Page 9: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/9.jpg)
University at Buffalo The State University of New York
The Info about the five categories
Medicines : Including 6 tables of patients’ medical info: Prophyl , Allergies, HAART, ARV, Nutritional Supply and Other Medicines Info.
Consult and Service: Including 6 tables of patients’ Medical service and result:Services, Consults ,Consults Outcomes, Consults Assessment, Consults Recommendation and Program Affiliation.
![Page 10: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/10.jpg)
University at Buffalo The State University of New York
The ER Diagrams of the five categories
![Page 11: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/11.jpg)
University at Buffalo The State University of New York
The entity-relationship (ER) Diagrams of the five categories
![Page 12: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/12.jpg)
University at Buffalo The State University of New York
The ER Diagrams of the five categories
![Page 13: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/13.jpg)
University at Buffalo The State University of New York
The construction of the data warehouse
Step 3: Solving the “many to many ” relationships in each diagram
In order to solve the “many to many” relation between fact table and dimension table, We use the bridge table. Bridge Table is a kind of table exists between the fact table and dimension table whose relation is “many to many”.
![Page 14: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/14.jpg)
University at Buffalo The State University of New York
What I need is a bridge table
![Page 15: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/15.jpg)
University at Buffalo The State University of New York
The Flexibility of the bridge table
1. Solve the many-to-many relationship problem
2.Dimension table and its associated measure table can be populated independently
3. Avoid null values
![Page 16: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/16.jpg)
University at Buffalo The State University of New York
The construction of the data warehouse
Step 4: Integration the measure tables to a fact table
Now that we’ve completed the design of five measure tables, it is time to integrate them together with a fact table.
![Page 17: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/17.jpg)
University at Buffalo The State University of New York
The construction of the data warehouse
Between the dimension tables and the fact table , I use the Bio-Star Schema .What’s more, as we mentioned before, all the tables own a TC_ID, then I use the Patient as the fact table, and the schema is shown as following:
![Page 18: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/18.jpg)
University at Buffalo The State University of New York
![Page 19: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/19.jpg)
University at Buffalo The State University of New York
![Page 20: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/20.jpg)
University at Buffalo The State University of New York
Characteristic of the schema
Splitting the normal form 1NF help to decrease the redundancy of the data and easily management.
Try to maintain the structure of the original tables , which help the clients understand better.
Easily handle the many-to-many relationships.BioStar schemas are able to capture the complex data
structures and semantics.The model has the properties of great extensibility and flexibility to be widely applicable to biomedical data.
![Page 21: University at BuffaloThe State University of New York The Data Warehouse Schema of HIV/AIDS and Drug Use Project Characteristic of Source Data Our data](https://reader035.vdocuments.us/reader035/viewer/2022070306/5519b37255034667578b46c8/html5/thumbnails/21.jpg)
University at Buffalo The State University of New York
Thank You