Review of Assignment 3, Loose Ends, Security, Web-based Data Collection Michael A. Kohn, MD, MPP 2 February 2010


Page 1:

Review of Assignment 3, Loose Ends, Security, Web-based Data Collection

Michael A. Kohn, MD, MPP

2 February 2010

Page 2:

Outline

• Assignment 3 Review

• Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends

• HIPAA Privacy Rule and CFR 21 Part 11

• Web-based Data Entry

• Assignment 4

Page 3:

Housekeeping

• Database demos with advice for Assignment 4/Final Project: Tuesday 2/9

– Simon Knops

– Ben Breyer

– Mary Farrant

• Assignment 4/Final Project is now due 3/9 instead of 2/16

Page 4:

Final Project Due 3/9 (not 2/16)

More time for

• students who need to personally demo their databases to satisfy Part A of the assignment

• students who are using REDCap and QuesGen to set up their accounts

• students who need consulting, either from me or from the CTSI DMU.

(I will be out of town the week of 2/15 through 2/19.)

Page 5:

Assignment 3

Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744)

Lab 3: Exporting and Analyzing Data 1/26/2010

Determine if neonatal jaundice was associated with the 5-year IQ scores and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association.

Page 6:

Answer

Of the infants with neonatal jaundice, 149 had IQ tests at age 5, and of the infants without neonatal jaundice, 248 had IQ tests. The mean (±SD) IQ score was significantly higher in the jaundice group, 111.5 ± 21.1, than in the no-jaundice group, 101.4 ± 20.5, a difference of 10.1 (95% CI 5.9 – 14.4).

Page 7:

Table. Mean Five-Year IQ Scores for Infants With and Without Neonatal Jaundice

               N      Mean (SD)*
Jaundice       149    111.5 (21.1)
No Jaundice    248    101.4 (20.5)

*Difference in mean scores of 10.1 (95% CI 5.9-14.4)

Page 8:

Table. Mean Five-Year IQ Scores for Infants Without and With Neonatal Jaundice

             No Jaundice     Jaundice        Difference (95% CI)
N            248             149
Mean (SD)    101.4 (20.5)    111.5 (21.1)    10.1 (5.9-14.4)*

*p < 0.0001

Page 9:

Newman T et al. N Engl J Med 2006;354:1889-1900

Page 10:

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      No |     248    101.3925    1.303441    20.52661     98.8252    103.9597
     Yes |     149    111.5358    1.732576    21.14879     108.112    114.9596
---------+--------------------------------------------------------------------
combined |     397    105.1994     1.06956    21.31083    103.0967    107.3021
---------+--------------------------------------------------------------------
    diff |           -10.14332    2.152007               -14.37414   -5.912502
------------------------------------------------------------------------------
Degrees of freedom: 395

                 Ho: mean(No) - mean(Yes) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -4.7134                t =  -4.7134              t =  -4.7134
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000

Would you submit this for publication?
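
The key numbers in this output (difference, standard error, t statistic) can be recovered from the group summaries alone, which is a useful check before putting them in a manuscript. The following is a minimal Python sketch, not the course's Stata workflow. It assumes the equal-variance (pooled) t-test, and it uses the normal quantile in place of the exact t quantile for the confidence interval, which is a close approximation at 395 degrees of freedom. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic computed from group summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df


# Summary statistics copied from the output above (No = group 1, Yes = group 2).
diff, se, t, df = pooled_t_from_summary(248, 101.3925, 20.52661,
                                        149, 111.5358, 21.14879)

# Assumption: the normal quantile (about 1.96) stands in for the t quantile.
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"diff = {diff:.2f}, SE = {se:.3f}, t = {t:.2f}, df = {df}")
print(f"approximate 95% CI: {ci_low:.1f} to {ci_high:.1f}")
```

The sign of the difference depends only on which group is listed first, which is why the slide that follows stresses getting the direction of the effect right.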

Page 11:

Essential Elements

• Sample size (149 jaundiced, 248 non-jaundiced)

• Indication of effect size (report both means, or the difference between them)

• Get direction of effect right. (Jaundiced group did better!)

• Indication of variability (Sample SDs, SEs of means, CIs of means, or CI of difference between means.)

Page 12:

Browner on Figures

Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead.

Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

Page 13:

Relationship between Neonatal Jaundice and Neuopsychiatric Score at Age 5

[Figure: y-axis "Average Neuropsychiatric Score", 50 to 140; groups: No Jaundice, Jaundice; marked with *]

Cutoff at 50? Caption should be below figure. What are the error bars? “Neuopsychiatric”

Page 14:

Figure 1. Mean IQ scores (95%CI) at age 5 among non-jaundiced children and jaundiced children

[Figure: y-axis "IQ Scores", 60 to 120; x-axis "Neonatal Jaundice": No, Yes]

Cutoff at 60? Caption should be below figure.

Page 15:

Mean Five Year Neuropsychiatric Score of Infants With or Without Neonatal Jaundice

[Figure: y-axis "Mean Score", 96 to 112; groups: No Jaundice (101.4 ± 20.5), Jaundice (111.5 ± 21.1); p<0.001]

Page 16:

Browner on 3-D Figures

Three dimensional graphs usually are not helpful.

Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 97

Also, note that the 3-D is only an effect. The data are two dimensional (score by jaundice).

Page 17:

Figure 1. Neuropsychiatric scores of children who were jaundiced, and not jaundiced at birth

[Figure: axis from 0 to 120; groups: Jaundiced, Not Jaundiced]

Takes the prize for ugliest figure.

Page 18:

Box Plots of Neuropsychiatric Test Scores

0 = not jaundice, 1= jaundiced at birth

[Figure: box plots; y-axis "AvgOfExNPScor", 50 to 200; x-axis: 0, 1]

Caption not sufficiently explanatory. Sample size?

Page 19:

[Figure: y-axis "Mean IQ score at age 5", 50 to 200; x-axis: No, Yes]

Comparison of IQ score in Neonatal Jaundice

Figure 1: In 149 infants with neonatal jaundice, the average IQ scores were higher compared to the 248 non-jaundiced infants when evaluated at age 5 (p<0.0001).

Page 20:

Box Plot

• Median line

• Box extends from 25th to 75th percentile

• Whiskers extend to the upper and lower adjacent values

• Upper adjacent value = largest observation no greater than the 75th percentile + 1.5 x IQR (interquartile range); lower adjacent value = smallest observation no less than the 25th percentile - 1.5 x IQR

• Values outside the adjacent values are graphed individually

• Would be nice if area (or at least width) of box were proportional to sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
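
The box plot elements listed above can be computed directly, which makes the definitions concrete. This is a rough Python sketch on made-up scores, not output from the class database. The function name is hypothetical, and the quartile rule used here (the standard library's "inclusive" method) is an assumption; statistical packages use slightly different quartile conventions, so results may differ a little from Stata's.

```python
from statistics import median, quantiles


def box_plot_stats(values):
    """Median, quartiles, adjacent values, and outside values for a box plot."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_adjacent": inside[0],    # whisker ends at an observed value
        "upper_adjacent": inside[-1],
        "outside": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical IQ scores, chosen so that two values fall outside the whiskers.
scores = [52, 88, 95, 98, 101, 104, 107, 110, 115, 120, 168]
stats = box_plot_stats(scores)
print(stats)
```

Note that the whiskers stop at actual observations (88 and 120 here), not at the computed limits of 72.5 and 136.5.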

Page 21:

[Figure: histogram of IQ in 10-point bins from 40-49 to 160-169; y-axis "Frequency", 0.0% to 25.0%; series: Jaundice, No Jaundice]

Page 22:

Extra Credit

• Report within-subject SD (4.0) as a measure of reliability.

• Calculate repeatability (11.0)

• Bland-Altman plot with mean difference and 95% limits of agreement*

* Nobody did this.

Page 23:

We assessed inter-rater reliability of the IQ test by having different examiners re-test 198 of the children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)

Methods or Results?
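
The within-subject SD and repeatability quoted above follow the Bland and Altman approach: for subjects measured exactly twice, the within-subject variance is the sum of squared differences divided by 2n, and repeatability is √2 × 1.96 × the within-subject SD (about 2.77 times the SD). A minimal Python sketch follows; the paired scores are made up and the function names are hypothetical.

```python
import math


def within_subject_sd(pairs):
    """Within-subject SD when each subject has exactly two measurements."""
    sum_sq_diff = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq_diff / (2 * len(pairs)))


def repeatability(sw):
    """Difference that two measurements on one subject stay within 95% of the time."""
    return math.sqrt(2) * 1.96 * sw


# Hypothetical (first examiner, second examiner) scores for five children.
pairs = [(100, 104), (95, 91), (110, 116), (88, 88), (120, 114)]
sw = within_subject_sd(pairs)
print(f"within-subject SD = {sw:.2f}, repeatability = {repeatability(sw):.1f}")
```

With the rounded SD of 4.0 this formula gives about 11.1; the 11.0 on the slide presumably reflects an unrounded SD slightly below 4.0.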

Page 24:

Bland-Altman Plot

[Figure: x-axis "Average Score", 50 to 150; y-axis "Satcher Score - Richmond Score", -15 to 15]

N = 142 (children examined by both Satcher and Richmond)

Mean Difference = 0.49 (95% CI -0.41 – 1.38)

95% Limits of Agreement: -10.272 – 11.244
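
The quantities reported under the plot (mean difference, its confidence interval, and the 95% limits of agreement) come from the per-child differences between the two examiners. The sketch below shows the arithmetic in Python on made-up scores; the function name is hypothetical, and it uses 1.96 for both the limits of agreement and the confidence interval, which is an approximation for small samples.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Points and summary lines for a Bland-Altman plot of paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                       # mean difference
    sd = stdev(diffs)                        # SD of the differences
    half_ci = 1.96 * sd / math.sqrt(len(diffs))
    return {
        "points": [((a + b) / 2, a - b) for a, b in zip(first, second)],
        "bias": bias,
        "bias_ci": (bias - half_ci, bias + half_ci),
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical scores from two examiners on the same six children.
examiner_a = [100, 95, 110, 88, 120, 105]
examiner_b = [104, 91, 116, 88, 114, 103]
result = bland_altman(examiner_a, examiner_b)
print(result["bias"], result["limits"])
```

Each entry in "points" is (average score, difference), which are the x and y coordinates of the plot.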

Page 25:

Outline

• DONE Assignment 3 Review

• Loose Ends: Yes/No Fields, BLOBs, Field Names, Front Ends

• HIPAA Privacy Rule, CFR 21 Part 11

• Web-based Data Entry

• Assignment 4

Page 26:

Loose Ends

• Yes/No Fields

• BLOBs

• Field Names

• “Front End” vs. “Back End”

Page 27:

Yes/No fields

• Binary fields are not very useful, because you can’t distinguish “No” from blank (not valued).

• I create a combo box like we used for Race in Lab 1 with 0 for “No” and 1 for “Yes”. This allows blank.
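
The same idea can be shown outside Access. The sketch below uses Python's built-in sqlite3 module rather than the class database, and the table and field names are hypothetical. The field accepts 0, 1, or blank (NULL), so "No" and "not yet recorded" stay distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Infant (
        SubjectID INTEGER PRIMARY KEY,
        Jaundice  INTEGER CHECK (Jaundice IN (0, 1))  -- NULL means blank
    )
""")
conn.executemany(
    "INSERT INTO Infant (SubjectID, Jaundice) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, None), (4, 0)],
)

# Count each answer, keeping blank separate from "No".
labels = {1: "Yes", 0: "No", None: "Blank"}
counts = {}
for (value,) in conn.execute("SELECT Jaundice FROM Infant"):
    counts[labels[value]] = counts.get(labels[value], 0) + 1

# Any value other than 0, 1, or blank is refused by the CHECK constraint.
try:
    conn.execute("INSERT INTO Infant (SubjectID, Jaundice) VALUES (5, 2)")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True

print(counts, invalid_rejected)
```

A true binary field would have stored subject 3 as "No", silently turning a missing value into a finding.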

Page 28:

Demonstration (BLOB)

Memo fields in the Infant Jaundice Database

Word Document Fields on the “Class” form of the ATCR Student Database

Photograph fields in the ATCR Student Database

Jpegs in Simon Knops’s Syndesmosis Database

Field types are not limited to numbers, text, dates. You can put an “object”, such as a Word document or a photo, in a field

Page 29:

Field Names

Establish and follow naming conventions for columns and tables.  Short field names without spaces or underscores are convenient for programming, querying, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper case letters within the variable name) to distinguish words, e.g. “SubjectID”, “FName”, or “ExamDate”. Table names should be singular, e.g. “Subject” instead of “Subjects”, “Exam” instead of “Exams”.

Page 30:

“Front End” vs. “Back End”

“Back End” – Tables and Data

“Front End” – Forms and reports for entering and viewing the data

The Access database that you have been using combines “back end” (tables and relationships) with “front end” (forms and reports).*

*Even if both are in Access, you usually want to split the front end from the back end.

QuesGen uses MySQL for the back end.

Page 31:

Start with Data Tables or Data Collection Forms?

It doesn’t matter as long as the process is iterative.

Can start with the tables and then develop the forms, test the forms, find problems, and update the tables.

Can start with a word-processed form, create the tables, test, and update.*

*This seems to work better for most investigators

Page 32:

Sometimes it helps to start with the data collection forms, but remember, you do NOT need one table per data collection form. In the labs you learned that one form can combine data from several tables. And data from one table can appear on several forms.

Page 33:

Before seeking help with data management

Search the internet and ask other researchers for already developed data collection forms.

Draft your data collection form.

Test your data collection form with dummy subjects and, even better, with real (de-identified) study subjects.

Enter your test data into a data table with rows corresponding to subjects and columns corresponding to data elements. (Use Excel, Access, Stata, or even Word.)

Create or at least think about a data dictionary.

Decide who will collect the data, and when/how the data will be collected.

Page 34:

Common Sequence

• Develop data collection forms in Word

• Create Excel spreadsheets to store the data (one column per field/attribute, one row per record/entity)

• Move from Excel to Access because of need for one or more of:

– data entry forms (front end),

– multiple related tables,

– queries using the Access query design tool

• Move from Access to QuesGen or REDCap because of need for web-based data entry, hosting, auditing, richer user administration and security, but continue to use Access for querying of data extracts to filter, sort, format, and generate derived fields.

• Export to Stata for analysis.

Page 35:

What Have You Learned?

• The meaning and importance of the terms “normalization”, “primary key”, and “foreign key”.

• The difference between a flat-file database, and a normalized, multi-table relational database.

• A little bit of Microsoft Access

• Querying data

• Exporting data for analysis in a statistical package

• Field types

• “Front End” (forms) vs. “Back End” (tables)
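
As a compact reminder of the first two points, here is a small sketch of a normalized two-table design using Python's built-in sqlite3 module instead of Access. The table and field names are hypothetical. Each subject appears once in Subject (primary key SubjectID); each exam is a row in Exam that points back to its subject through a foreign key, so a subject can have any number of exams without repeating subject-level data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Subject (
        SubjectID INTEGER PRIMARY KEY,
        Jaundice  INTEGER
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Subject(SubjectID),
        ExamDate  TEXT,
        Score     REAL
    );
""")
conn.executemany("INSERT INTO Subject VALUES (?, ?)", [(1, 1), (2, 0)])
conn.executemany(
    "INSERT INTO Exam (SubjectID, ExamDate, Score) VALUES (?, ?, ?)",
    [(1, "2010-01-05", 110.0), (1, "2010-01-12", 114.0), (2, "2010-01-06", 98.0)],
)

# A query joins the tables back together: one row per subject with the mean score.
avg_scores = conn.execute("""
    SELECT s.SubjectID, s.Jaundice, AVG(e.Score)
    FROM Subject AS s JOIN Exam AS e ON e.SubjectID = s.SubjectID
    GROUP BY s.SubjectID
    ORDER BY s.SubjectID
""").fetchall()

# The foreign key refuses an exam for a subject who does not exist.
try:
    conn.execute("INSERT INTO Exam (SubjectID, Score) VALUES (99, 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(avg_scores, orphan_rejected)
```

A flat file would instead need either one row per exam with the jaundice value repeated, or one row per subject with a fixed number of score columns.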

Page 36:

HIPAA Privacy Rule

• Patient identifying information must be secure and available only to authorized personnel with auditing of all accesses

• Patient identifying data include dates such as date of visit, date of surgery, etc.

Page 37:

• Name

• Address (all geographic subdivisions smaller than state)

• All elements (except years) of dates related to an individual (birth date, admission date, date of death and exact age if over 89)

• Telephone numbers

• FAX number

• E-mail address

• Social Security number

• Medical record number

• Health plan beneficiary number

• Account number

• Certificate/license number

• Any vehicle or other device serial number

• Device identifiers or serial numbers

• Web URL

• Internet Protocol (IP) address numbers

• Finger or voice prints

• Photographic images
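
Two of the rules in this list lend themselves to a short illustration: dates are reduced to the year, and ages over 89 are collapsed into a single category. The Python sketch below applies those two rules and drops a few direct identifiers from a data extract. The field names are hypothetical, and the sketch is far from a complete de-identification procedure; for example, it does not catch identifiers typed into free-text fields or a birth year that reveals an age over 89.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real study would list its own.
DIRECT_IDENTIFIERS = {"Name", "Address", "Phone", "Email", "SSN", "MRN"}


def deidentify(record):
    """Return a copy of a record without direct identifiers or exact dates."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                                  # drop the field entirely
        if isinstance(value, date):
            cleaned[field + "Year"] = value.year      # keep only the year
        elif field == "Age" and value is not None and value > 89:
            cleaned[field] = "90+"                    # aggregate ages over 89
        else:
            cleaned[field] = value
    return cleaned


record = {
    "SubjectID": 17,
    "Name": "Robert",
    "MRN": "000123",
    "ExamDate": date(2010, 1, 26),
    "Age": 93,
    "Score": 104,
}
extract = deidentify(record)
print(extract)
```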

Page 38:

Wednesday, January 27, 2010 (SF Chronicle)

UCSF patient records possibly compromised

Victoria Colliver, Chronicle Staff Writer

(01-27) 16:01 PST SAN FRANCISCO -- Medical records for about 4,400 UCSF patients are at risk after thieves stole a laptop from a medical school employee in November, UCSF officials said today.

http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2010/01/27/BA1U1BOI6U.DTL

This problem can be avoided just by using a Remote Desktop as we do in this class.

Page 39:

CFR 21 Part 11

• Required for submission of electronic data to the FDA when applying for drug or device approval

• Audit trail of all data entries, updates, and deletions.
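
To make "audit trail" concrete, the sketch below logs every insert, update, and deletion on one table using database triggers. It uses Python's built-in sqlite3 module with hypothetical table and field names, and it is only an illustration of the idea: a real audit trail would also record who made each change, which SQLite by itself cannot know, and the sketch makes no claim to satisfy the regulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER,
        Score     REAL
    );
    CREATE TABLE AuditLog (
        AuditID   INTEGER PRIMARY KEY,
        ChangedAt TEXT DEFAULT CURRENT_TIMESTAMP,
        Action    TEXT,
        ExamID    INTEGER,
        OldScore  REAL,
        NewScore  REAL
    );
    CREATE TRIGGER ExamInsert AFTER INSERT ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, NewScore)
        VALUES ('INSERT', NEW.ExamID, NEW.Score);
    END;
    CREATE TRIGGER ExamUpdate AFTER UPDATE ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, OldScore, NewScore)
        VALUES ('UPDATE', OLD.ExamID, OLD.Score, NEW.Score);
    END;
    CREATE TRIGGER ExamDelete AFTER DELETE ON Exam BEGIN
        INSERT INTO AuditLog (Action, ExamID, OldScore)
        VALUES ('DELETE', OLD.ExamID, OLD.Score);
    END;
""")

# One entry, one correction, one deletion.
conn.execute("INSERT INTO Exam VALUES (1, 17, 104.0)")
conn.execute("UPDATE Exam SET Score = 101.0 WHERE ExamID = 1")
conn.execute("DELETE FROM Exam WHERE ExamID = 1")

audit_rows = conn.execute(
    "SELECT Action, ExamID, OldScore, NewScore FROM AuditLog ORDER BY AuditID"
).fetchall()
for row in audit_rows:
    print(row)
```

Even though the exam record is gone at the end, the log still shows what was entered, how it was changed, and that it was deleted.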

Page 40:

Three Types of Research Database

1. Combination of paper files, Excel spreadsheets, and direct keyboard entry into the statistical analysis package.

2. Desktop multi-table relational database.

-- Access

-- FileMaker Pro

3. Web-Enabled Research Platform.

-- QuesGen (private vendor)

-- REDCap (academic consortium)

-- SurveyMonkey (private)

Page 41:

Web-Enabled Research Platform

• Browser based entry from anyplace with an internet connection.

• Enterprise database back end

• Available as a hosted service

Page 42:

Web-based Data Collection Platforms

• Vendor Hosted

– QuesGen

– SurveyMonkey

– Medrio

• Institution Hosted

– REDCap

– Velos

– LabMatrix

– OpenClinica

• Not Discussed Here

– Phase Forward

– Oracle Clinical

Page 43:

Advantages of Being Web-Based

• Available anywhere with an internet connection

• No software requirement beyond a browser

• Easy to share data

• No PHI on laptops or USB drives

Page 44:

Disadvantages of Being Web-based

• Limited look-and-feel options on forms (In contrast, Access forms are highly customizable.)

• Limited data structures

• Requires an internet connection

Page 45:

Advantages of Being Hosted

• No need for servers, system administrators, etc.

Page 46:

QuesGen Demo

Enter Robert’s data. (Delete record first if necessary.)

Show populated database.

Select extract “AvgScore”

Use training.studydata.net, Jif51.

Run the NIH Report.


Page 48:

Advantages of QuesGen

• Multiple user roles (DB admin, team member, view-only, site-specific)

• PHI fields explicitly identified (masked from user without PHI privileges)

• UCSF IT reviewed

• New functionality for institutional review

• Templates for clinical research (medication, lab sample, etc) and systematic reviews (publication)

• Survey/Questionnaires with skip logic

• Extensive auditing

• Supports complex data structures

• Good user/client support

Page 49:

Disadvantages of QuesGen

Not Free.

Page 50:

REDCap Demo

• https://redcap.ucsfopenresearch.org

• Enter Helen’s record.

• Show the log.


Page 52:

Advantages of REDCap

• Multiple user roles

• PHI fields explicitly identified

• Provided by UCSF

• Templates for clinical research

• Survey/Questionnaires with skip logic

• Extensive auditing

• Free!

Page 53:

Disadvantages of REDCap*

• No subject or exam list

• Supports limited data structures (nearly flat file)

• Flawed data import tool

• User/client support?

*Based on my evaluation since 1/20/2010

Page 54:

REDCap

• http://ctsi.ucsf.edu/informatics

To obtain a REDCap username and password fill out a “REDCap User Account Request Form” and a “REDCap and MyResearch Attestation Form” and send to

[email protected]@ucsf.edu

Page 55:

SurveyMonkey Demo

• Enter Robert’s exam

• Show SF-36 (Time Permitting)

Page 56:

SurveyMonkey Advantages

• Nice looking forms

• Simple to create

• Hosted

• Inexpensive

• Great for surveys

Page 57:

SurveyMonkey Disadvantages

• Market-research oriented, not medical

• Flat file (very difficult to do multiple surveys on one subject)

• No audit trail

• Limited user roles, security

• Not designed for PHI/HIPAA compliance

• Limited skip logic

Page 58:

SurveyMonkey Disadvantages

Can’t upload data

– Cannot import Baby2007.xls file as in Lab 2

– Have to key data in

No subject or exam list

Have to browse through the surveys to find the one you want.

No calculations

e.g., BMI

Page 59:

SurveyMonkey

www.surveymonkey.com

Page 60:

Data Management Protocol

• General description of database

• Data collection and entry

• Error checking and data validation

• Analysis (e.g., export to Stata)

• Security/confidentiality

• Back up

Page 61:

General Description of Database

• DBMS, e.g. MS Access XP

• # of dynamic tables

• # of static “lookup” tables

• # of forms

• # of reports

An appendix could include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary). Print the relationships diagram using either “Print Relationships” or a screen shot.

Page 62:

Data Collection and Entry

• Import baseline data from existing systems

• Import lab results, scan results (e.g. DEXA), Holter monitor data, and other digital data.

• For each form, who will collect the data?

• Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?

Page 63:

Error Checking and Validation

• Database automatically checks data against the range of allowed values.

• Periodic outlier detection. (Outliers still within the range of allowed values.)

• Calculation checks

• Is double data entry really needed?
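
The first two checks differ in when they run: a range check rejects an impossible value at entry, while outlier detection is a periodic review of values that passed the range check but still look suspicious. A minimal Python sketch of both follows. The field names, allowed ranges, and the cutoff of 2.5 standard deviations are assumptions chosen for illustration, not values from the class database.

```python
from statistics import mean, stdev

# Hypothetical allowed ranges, as they might appear in a data dictionary.
ALLOWED_RANGE = {"Score": (40, 170), "WeightKg": (1.0, 6.0)}


def range_errors(record):
    """Names of fields whose values fall outside the allowed range."""
    errors = []
    for field, (low, high) in ALLOWED_RANGE.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(field)
    return errors


def outliers(values, z_cutoff=2.5):
    """Values more than z_cutoff sample SDs from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > z_cutoff * s]


# 150 is an allowed score, so only the periodic outlier query would flag it.
scores = [98, 101, 104, 99, 102, 100, 103, 97, 105, 101, 100, 150]
print(range_errors({"Score": 200, "WeightKg": 3.2}))
print(outliers(scores))
```

A flagged outlier is a prompt to go back to the source document, not proof of an error.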

Page 64:

Analysis

• How will you get the data out of the database?

Page 65:

Security/Confidentiality

• Keep identifying data (name, SSN, MRN) in a separate table.

• Link rest of DB to this table via a Subject ID that has no meaning external to the DB.

• Restrict access to identifying data.

• Password protect at both OS and application levels.

• Audit entries and updates.
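
The first two points describe a table layout, sketched below with Python's built-in sqlite3 module. All table names, field names, and values are hypothetical. Identifying data live only in SubjectIdentity; every other table carries just the arbitrary SubjectID, so an analysis extract never needs to touch names or record numbers. SQLite itself has no user permissions, so restricting access to the identity table is assumed to happen at the operating system or DBMS level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SubjectIdentity (
        SubjectID INTEGER PRIMARY KEY,  -- arbitrary; no meaning outside the DB
        FName     TEXT,
        LName     TEXT,
        MRN       TEXT
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES SubjectIdentity(SubjectID),
        Score     REAL
    );
""")
conn.execute(
    "INSERT INTO SubjectIdentity VALUES (1001, 'Robert', 'Example', '000123')"
)
conn.executemany(
    "INSERT INTO Exam (SubjectID, Score) VALUES (?, ?)",
    [(1001, 104.0), (1001, 108.0)],
)

# Analysis extract: uses only the non-identifying table.
analysis_extract = conn.execute(
    "SELECT SubjectID, Score FROM Exam ORDER BY ExamID"
).fetchall()

# Re-identification requires the restricted table as well.
linked = conn.execute("""
    SELECT i.LName, e.Score
    FROM Exam AS e JOIN SubjectIdentity AS i ON i.SubjectID = e.SubjectID
    ORDER BY e.ExamID
""").fetchall()

print(analysis_extract)
```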

Page 66:

Back ups

• Ask your system person to restore a file periodically. This tests both the back-up and restore systems.

Page 67:

Assignment 4, Part A

Page 68:

Assignment 4, Part B

Data Management Protocol

Write a one-page data management section for your research study protocol or a one-page description of your current research study database.

At the beginning of your assignment, for the readers, briefly describe your study, including design, predictors, outcomes, target population, and sample size. (1 or 2 sentences)

Include with your assignment a relationships diagram showing the structure of your study database.

Send assignment to [email protected] by 2/16/2009.

Page 69:

Assignment 4

Due 2/19/08, send to [email protected]

Write a one-page data management section for your research study protocol or a one-page description of your current research study database.

At the beginning of your assignment, for the readers, briefly describe your study, including design, predictors, outcomes, target population, and sample size. (1 or 2 sentences)

Optionally, include with your assignment a relationships diagram* showing the structure of your study database.

The elements of a data management protocol or database description were covered in the 2/5/08 lecture and include:

General description of database (possibly including a relationships diagram*)

Data collection and entry

Error checking and data validation

Analysis/Reporting (e.g., export to Stata)

Security/confidentiality

Administration/Back up

Extra Credit: Include a budget or cost estimate for data management.

*Relationships diagram is optional

Page 70:

Assignment 4

1) What is your study?  ("The [CUTE ACRONYM] study is a [DESIGN] study of the associations between [PREDICTOR] and [OUTCOME] in [STUDY POPULATION]").

2) What data points are you collecting?  (Helps to have an actual data collection form mocked up in Word or Access.)

3) Who will collect the data? You?  RAs?  MDs?  Maybe the study subjects will enter the data themselves.

Page 71:

Assignment 4 (cont’d)

4) How will the data be collected? Written onto a paper form and then transcribed into a computer file?  Entered directly into the computer?  (If it's going to be transcribed, will you be doing that? Will you hire somebody? Or will you enlist some med students?)

5) Will the above-mentioned computer file be an Excel file, Stata file, Access file, or something else? 

6) If it's a single table database (e.g., Excel or Stata), what will the rows represent, what will the columns be?  Try to provide a detailed data dictionary with the name, data type, description, and validation rules for each field (column) in the single table.

Page 72:

Assignment 4

7) If it's a multi-table database, even a hand-drawn relationships diagram would help but is not required.

8) How will you validate the data for correctness and monitor the data collection effort?  (Usually you have some range checks on individual variables and you periodically query for outliers that are nonetheless within the allowed range.)

9) You should periodically analyze the data, not only to look for problems, but also to see where the study is headed.  How will you do this?  Query in Access and export to Stata?

10) How will you protect your subjects' identifying data?

11) How will you ensure that you don't lose your data file in a computer crash or if a water pipe leaks?

Page 73:

Answering these questions is an essential part of doing a clinical research study.