database design techniques for clinical research

31
Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003

Upload: yardley

Post on 11-Jan-2016

46 views

Category:

Documents


0 download

DESCRIPTION

Database Design Techniques for Clinical Research. Melissa K. Carroll, M.S. October 20, 2003. Overview. Relational database design Implementing a relational database in Microsoft Access Designing a database for a typical study by our group Resources Questions. Relational Database Design. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Database Design Techniques for Clinical Research

Database Design Techniques for Clinical Research

Melissa K. Carroll, M.S.

October 20, 2003

Page 2: Database Design Techniques for Clinical Research

Overview

• Relational database design

• Implementing a relational database in Microsoft Access

• Designing a database for a typical study by our group

• Resources

• Questions

Page 3: Database Design Techniques for Clinical Research

Relational Database Design

Page 4: Database Design Techniques for Clinical Research

What is a Database?

• Collection of data organized for efficient operations

• Everyone uses them multiple times a day, often without realizing it

• Examples– Airline reservations

– Online shopping

• Underlying design principles are largely universal

Page 5: Database Design Techniques for Clinical Research

Problems with “Flat” Files

• Data Redundancy– Leads to more work and inconsistencies

– Wreaks havoc on performing basic manipulations such as searching and sorting

• File Management– Multiple files

– Concurrent users

– Security

– Intermediate results

• Ad hoc programming (reinventing the wheel)

Page 6: Database Design Techniques for Clinical Research

Data Modeling: Entity-Relationship Model

• Models data as entities, with attributes, and relationships between entities

• Entity: person, place, or thing

• Instance: example of an entity

• Attribute: feature of an entity

• Relationship: describes association between (usually two) entities

Page 7: Database Design Techniques for Clinical Research

E-R Notation

RELATIONSHIP

ENTITY

UNIQUEATTRIBUTE

ATTRIBUTE ATTRIBUTE

ENTITY

JOINT UNIQUEATTRIBUTE

ATTRIBUTE

JOINT UNIQUEATTRIBUTE

ATTRIBUTE

Many-to-Many Many-to-One

Page 8: Database Design Techniques for Clinical Research

Designing an E-R Diagram

• Issues to Consider– What questions will the data be used to answer?– What are the entities and how do they relate to

each other?– What attributes uniquely identify entities?– What attributes need to be sub-divided?

• Goal: Eliminate Redundancy

• Process is called “normalizing” data

Page 9: Database Design Techniques for Clinical Research

Sample E-R Diagram: MP3 Files

Artist

Artist ID Artist Name

Album

Album ID Album Title

Release Year Label

Song

Song ID Song Title

Recording

Recording ID Quality

Time

Composer

Composer ID Composer Name

Recorded Appears On

Of A

Composed

Page 10: Database Design Techniques for Clinical Research

Relational Model

• Lower-level model used for actual database implementation

• Translating from E-R model– Entities become tables– Attributes become fields– Many-to-many relationships become tables

• Unique identifiers from involved tables as fields

– Unique identifiers from “one” sides are added as fields to corresponding “many” sides

Page 11: Database Design Techniques for Clinical Research

Relational Database Management Systems (RDBMSs)

• Database Management System (DBMS): software with purpose of helping user design and use a database

• Relational Database Management System (RDBMS): DBMS for databases based on relational model

• Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)

Page 12: Database Design Techniques for Clinical Research

SQL

• Need language to tell the DBMS– The design of the database– Actual data to be entered– What data to retrieve and in what format

• SQL = standardized language used by almost all major DBMSs

• Standard language provides interoperability and portability

Page 13: Database Design Techniques for Clinical Research

SQL Examples

• CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75))

• INSERT INTO artist (artistName) VALUES (“The Beatles”)

• UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”

Page 14: Database Design Techniques for Clinical Research

SQL Examples Continued

• SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID

• SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID

• SELECT albumTitle, albumAge AS releaseYear - Date() FROM album

Page 15: Database Design Techniques for Clinical Research

SQL Examples Continued

• SELECT Count(artistID) from artist

• SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”

Page 16: Database Design Techniques for Clinical Research

Relational Database Implementation in Microsoft

Access

Page 17: Database Design Techniques for Clinical Research

Clinical Research Database Design

Page 18: Database Design Techniques for Clinical Research

Typical Simple Study

• Baseline and fixed number of follow-ups

• Subject reaches each time point only once

• Different time points have different scale protocols

• Considerable overlap in scales between time points

• Isolated from other studies

Page 19: Database Design Techniques for Clinical Research

Four Database Design Approaches

• Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week– One table per time point, items as attributes

• Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton– One table per scale per time point, items as attributes

• Approach Three: scale administration as entity– One table per scale, items as attributes

• Approach Four: item as entity– One table (theoretically)

Page 20: Database Design Techniques for Clinical Research

Evaluation of Approach One

• May seem appropriate because common format for analysis is one record per subject

• Problems– Limited number of fields allowed in some

DBMSs– Will have many missing values– General redundancy issues (shares with

Approach Two, to follow)

Page 21: Database Design Techniques for Clinical Research

Pros and Cons of Approach Two Versus Approach Three

• Pros– “Horizontal” format

– Flexibility for handling inter-time point scale disparities

• Cons (for simple studies)– Data model complexity

– Table creation and modification time multiplied

– Space consumption

– More data locations (entry and retrieval complexity)

– Re-assigning to different time points

Page 22: Database Design Techniques for Clinical Research

Reassigning Scale Time Points Using Approaches Two and Three

Approach TwoTable: Hamilton Baseline Table: Hamilton Baseline Table: Hamilton 12 WeeksSubject ID aHam1 aHam2 aHam3 Subject ID aHam1 aHam2 aHam3 Subject ID bHam1 bHam2 bHam337 1 3 2 37 1 3 262 3 3 1 62 3 3 1 62 2 2 151 2 1 1 51 2 1 1 51 1 1 1

Approach ThreeTable: Hamilton Table: HamiltonSubject ID Ham1 Ham2 Ham3 Timepoint Subject ID Ham1 Ham2 Ham3 Timepoint37 1 3 2 1 37 1 3 2 2 change:62 3 3 1 1 62 3 3 1 151 2 1 1 1 51 2 1 1 162 2 2 1 2 62 2 2 1 251 1 1 1 2 51 1 1 1 2

Page 23: Database Design Techniques for Clinical Research

Reassigning Scales: Modified Approach Three

Table: Visit Table: VisitVisit ID Subject ID Timepoint Visit ID Subject ID Timepoint1 37 1 1 37 22 62 1 2 62 13 51 1 3 51 14 62 2 4 62 25 51 2 5 51 2

Table: Hamilton Table: HamiltonVisit ID Ham1 Ham2 Ham3 Visit ID Ham1 Ham2 Ham31 1 3 2 1 1 3 22 3 3 1 2 3 3 13 2 1 1 3 2 1 14 2 2 1 4 2 2 15 1 1 1 5 1 1 1

Table: MMSE Table: MMSEVisit ID MMSE1 MMSE2 MMSE3 Visit ID MMSE1 MMSE2 MMSE31 1 0 1 1 1 0 12 1 1 1 2 1 1 13 0 1 1 3 0 1 14 1 1 1 4 1 1 15 1 1 0 5 1 1 0

Page 24: Database Design Techniques for Clinical Research

Approach Two Cons for More Complex Studies

• Poor at handling an indefinite number of follow-up time points

• Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times– May happen due to progressing through the study

multiple times– May also happen due to e.g. being screened

multiple times

Page 25: Database Design Techniques for Clinical Research

Evaluation of Approach Four

• Pros– Could potentially handle changes more

elegantly– Perhaps more “normalized” theoretically

• Cons– Considerably harder to design entry interface– Harder to obtain data in formats usually

required

• Doesn’t fix non-database problems with data collection changes

Page 26: Database Design Techniques for Clinical Research

Databases and Datasets

• Database: Collection of data organized for efficient entry, updating, storage, and retrieval

• Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose

• Well-designed databases should facilitate creation of datasets in any desired format

• Datasets should be formatted for a particular purpose and used only for that purpose

Page 27: Database Design Techniques for Clinical Research

Normalizing Data Within Scales: Medication Data

Option 1MedScaleMedScale IDBrand Name1Generic Name1Drug Class1Dosage1Unit1Frequency1…Brand NameNGeneric NameNDrug ClassNDosageNUnitNFrequencyNEtc.

Option 2Medication MedScaleMed ID MedScale IDBrand Name Med ID1Generic Name Dosage1Class Unit1

Frequency1…Med IDNDosageNUnitNFrequencyNEtc.

Option 3Medication MedScale MedScaleMedicationMed ID MedScale ID MedScaleMedication IDBrand Name Etc. MedScale IDGeneric Name Med IDClass Dosage

UnitFrequency

Page 28: Database Design Techniques for Clinical Research

Normalizing Data Continued: Comparison of Medication Queries

Page 29: Database Design Techniques for Clinical Research

Multi-Study Issues: To Separate or Not To Separate

• If same data will count for multiple studies– Keeping design and data in sync– E.g. updating all copies when data changed– E.g. ensuring scale changes are reflected in all

tables and forms

• If handling multiple, possibly “isolated” studies– Keeping design in sync– Can still use views so actual storage is

transparent to user

Page 30: Database Design Techniques for Clinical Research

Summary

• Careful planning must go into designing a database

• First step in design is to model the data– E-R relational model is effective

• DBMSs, such as Access, offer tools for creating, using, and maintaining databases

• When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy

• Properly designed databases will supply data in any format desired

Page 31: Database Design Techniques for Clinical Research

Resources

• Access Help (Help in top menu, Contents and Index, Contents tab)

• Access Database Wizard (in main menu upon opening)

• Oreilly Access Database Design & Programming, 3rd Edition– For database design theory: online chapter at

http://www.oreilly.com/catalog/accessdata3/chapter/ch04.html

• Access (97/2000/etc.) Bible – Available here; not 100% accurate

• Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)