![Page 1: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/1.jpg)
Database Design Techniques for Clinical Research
Melissa K. Carroll, M.S.
October 20, 2003
![Page 2: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/2.jpg)
Overview
• Relational database design
• Implementing a relational database in Microsoft Access
• Designing a database for a typical study by our group
• Resources
• Questions
![Page 3: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/3.jpg)
Relational Database Design
![Page 4: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/4.jpg)
What is a Database?
• Collection of data organized for efficient operations
• Everyone uses them multiple times a day, often without realizing it
• Examples– Airline reservations
– Online shopping
• Underlying design principles are largely universal
![Page 5: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/5.jpg)
Problems with “Flat” Files
• Data Redundancy– Leads to more work and inconsistencies
– Wreaks havoc on performing basic manipulations such as searching and sorting
• File Management– Multiple files
– Concurrent users
– Security
– Intermediate results
• Ad hoc programming (reinventing the wheel)
![Page 6: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/6.jpg)
Data Modeling: Entity-Relationship Model
• Models data as entities, with attributes, and relationships between entities
• Entity: person, place, or thing
• Instance: example of an entity
• Attribute: feature of an entity
• Relationship: describes association between (usually two) entities
![Page 7: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/7.jpg)
E-R Notation
RELATIONSHIP
ENTITY
UNIQUEATTRIBUTE
ATTRIBUTE ATTRIBUTE
ENTITY
JOINT UNIQUEATTRIBUTE
ATTRIBUTE
JOINT UNIQUEATTRIBUTE
ATTRIBUTE
Many-to-Many Many-to-One
![Page 8: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/8.jpg)
Designing an E-R Diagram
• Issues to Consider– What questions will the data be used to answer?– What are the entities and how do they relate to
each other?– What attributes uniquely identify entities?– What attributes need to be sub-divided?
• Goal: Eliminate Redundancy
• Process is called “normalizing” data
![Page 9: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/9.jpg)
Sample E-R Diagram: MP3 Files
Artist
Artist ID Artist Name
Album
Album ID Album Title
Release Year Label
Song
Song ID Song Title
Recording
Recording ID Quality
Time
Composer
Composer ID Composer Name
Recorded Appears On
Of A
Composed
![Page 10: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/10.jpg)
Relational Model
• Lower-level model used for actual database implementation
• Translating from E-R model– Entities become tables– Attributes become fields– Many-to-many relationships become tables
• Unique identifiers from involved tables as fields
– Unique identifiers from “one” sides are added as fields to corresponding “many” sides
![Page 11: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/11.jpg)
Relational Database Management Systems (RDBMSs)
• Database Management System (DBMS): software with purpose of helping user design and use a database
• Relational Database Management System (RDBMS): DBMS for databases based on relational model
• Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)
![Page 12: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/12.jpg)
SQL
• Need language to tell the DBMS– The design of the database– Actual data to be entered– What data to retrieve and in what format
• SQL = standardized language used by almost all major DBMSs
• Standard language provides interoperability and portability
![Page 13: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/13.jpg)
SQL Examples
• CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75))
• INSERT INTO artist (artistName) VALUES (“The Beatles”)
• UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”
![Page 14: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/14.jpg)
SQL Examples Continued
• SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID
• SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID
• SELECT albumTitle, albumAge AS releaseYear - Date() FROM album
![Page 15: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/15.jpg)
SQL Examples Continued
• SELECT Count(artistID) from artist
• SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”
![Page 16: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/16.jpg)
Relational Database Implementation in Microsoft
Access
![Page 17: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/17.jpg)
Clinical Research Database Design
![Page 18: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/18.jpg)
Typical Simple Study
• Baseline and fixed number of follow-ups
• Subject reaches each time point only once
• Different time points have different scale protocols
• Considerable overlap in scales between time points
• Isolated from other studies
![Page 19: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/19.jpg)
Four Database Design Approaches
• Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week– One table per time point, items as attributes
• Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton– One table per scale per time point, items as attributes
• Approach Three: scale administration as entity– One table per scale, items as attributes
• Approach Four: item as entity– One table (theoretically)
![Page 20: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/20.jpg)
Evaluation of Approach One
• May seem appropriate because common format for analysis is one record per subject
• Problems– Limited number of fields allowed in some
DBMSs– Will have many missing values– General redundancy issues (shares with
Approach Two, to follow)
![Page 21: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/21.jpg)
Pros and Cons of Approach Two Versus Approach Three
• Pros– “Horizontal” format
– Flexibility for handling inter-time point scale disparities
• Cons (for simple studies)– Data model complexity
– Table creation and modification time multiplied
– Space consumption
– More data locations (entry and retrieval complexity)
– Re-assigning to different time points
![Page 22: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/22.jpg)
Reassigning Scale Time Points Using Approaches Two and Three
Approach TwoTable: Hamilton Baseline Table: Hamilton Baseline Table: Hamilton 12 WeeksSubject ID aHam1 aHam2 aHam3 Subject ID aHam1 aHam2 aHam3 Subject ID bHam1 bHam2 bHam337 1 3 2 37 1 3 262 3 3 1 62 3 3 1 62 2 2 151 2 1 1 51 2 1 1 51 1 1 1
Approach ThreeTable: Hamilton Table: HamiltonSubject ID Ham1 Ham2 Ham3 Timepoint Subject ID Ham1 Ham2 Ham3 Timepoint37 1 3 2 1 37 1 3 2 2 change:62 3 3 1 1 62 3 3 1 151 2 1 1 1 51 2 1 1 162 2 2 1 2 62 2 2 1 251 1 1 1 2 51 1 1 1 2
![Page 23: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/23.jpg)
Reassigning Scales: Modified Approach Three
Table: Visit Table: VisitVisit ID Subject ID Timepoint Visit ID Subject ID Timepoint1 37 1 1 37 22 62 1 2 62 13 51 1 3 51 14 62 2 4 62 25 51 2 5 51 2
Table: Hamilton Table: HamiltonVisit ID Ham1 Ham2 Ham3 Visit ID Ham1 Ham2 Ham31 1 3 2 1 1 3 22 3 3 1 2 3 3 13 2 1 1 3 2 1 14 2 2 1 4 2 2 15 1 1 1 5 1 1 1
Table: MMSE Table: MMSEVisit ID MMSE1 MMSE2 MMSE3 Visit ID MMSE1 MMSE2 MMSE31 1 0 1 1 1 0 12 1 1 1 2 1 1 13 0 1 1 3 0 1 14 1 1 1 4 1 1 15 1 1 0 5 1 1 0
![Page 24: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/24.jpg)
Approach Two Cons for More Complex Studies
• Poor at handling an indefinite number of follow-up time points
• Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times– May happen due to progressing through the study
multiple times– May also happen due to e.g. being screened
multiple times
![Page 25: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/25.jpg)
Evaluation of Approach Four
• Pros– Could potentially handle changes more
elegantly– Perhaps more “normalized” theoretically
• Cons– Considerably harder to design entry interface– Harder to obtain data in formats usually
required
• Doesn’t fix non-database problems with data collection changes
![Page 26: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/26.jpg)
Databases and Datasets
• Database: Collection of data organized for efficient entry, updating, storage, and retrieval
• Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose
• Well-designed databases should facilitate creation of datasets in any desired format
• Datasets should be formatted for a particular purpose and used only for that purpose
![Page 27: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/27.jpg)
Normalizing Data Within Scales: Medication Data
Option 1MedScaleMedScale IDBrand Name1Generic Name1Drug Class1Dosage1Unit1Frequency1…Brand NameNGeneric NameNDrug ClassNDosageNUnitNFrequencyNEtc.
Option 2Medication MedScaleMed ID MedScale IDBrand Name Med ID1Generic Name Dosage1Class Unit1
Frequency1…Med IDNDosageNUnitNFrequencyNEtc.
Option 3Medication MedScale MedScaleMedicationMed ID MedScale ID MedScaleMedication IDBrand Name Etc. MedScale IDGeneric Name Med IDClass Dosage
UnitFrequency
![Page 28: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/28.jpg)
Normalizing Data Continued: Comparison of Medication Queries
![Page 29: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/29.jpg)
Multi-Study Issues: To Separate or Not To Separate
• If same data will count for multiple studies– Keeping design and data in sync– E.g. updating all copies when data changed– E.g. ensuring scale changes are reflected in all
tables and forms
• If handling multiple, possibly “isolated” studies– Keeping design in sync– Can still use views so actual storage is
transparent to user
![Page 30: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/30.jpg)
Summary
• Careful planning must go into designing a database
• First step in design is to model the data– E-R relational model is effective
• DBMSs, such as Access, offer tools for creating, using, and maintaining databases
• When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy
• Properly designed databases will supply data in any format desired
![Page 31: Database Design Techniques for Clinical Research](https://reader035.vdocuments.us/reader035/viewer/2022070408/568143b3550346895db03c1d/html5/thumbnails/31.jpg)
Resources
• Access Help (Help in top menu, Contents and Index, Contents tab)
• Access Database Wizard (in main menu upon opening)
• Oreilly Access Database Design & Programming, 3rd Edition– For database design theory: online chapter at
http://www.oreilly.com/catalog/accessdata3/chapter/ch04.html
• Access (97/2000/etc.) Bible – Available here; not 100% accurate
• Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)