![Page 1: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/1.jpg)
DATA MANAGEMENT
Using EpiData and SPSS
![Page 2: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/2.jpg)
References
Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials. A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf
EpiData Association website: http://www.epidata.dk/
Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm
![Page 3: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/3.jpg)
Data Management
• Planning data needs
• Data collection
• Data entry and control
• Validation and checking
• Data cleaning and variable transformation
• Data backup and storage
• System documentation
• Other
![Page 4: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/4.jpg)
Types of Data Base Management Systems (DBMSs)
• Spreadsheets (e.g., Excel, SPSS Data Editor)
  • Prone to error, data corruption, & mismanagement
  • Lack data controls, limited programmability
  • Suitable only for small and didactic projects
  • Also good for last step data cleaning
• Commercial DBMS programs (e.g., Oracle, Access)
  • Limited data control, good programmability
  • Slow & expensive
  • Powerful and widely available
• Public domain programs (e.g., EpiData, Epi Info)
  • Controlled data entry, good programmability
  • Suitable for research and field use
![Page 5: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/5.jpg)
We will use two platforms:
• EpiData
  • controlled data entry
  • data documentation
  • export (“write”) data
• SPSS
  • import (“read”) data
  • analysis
  • reporting
![Page 6: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/6.jpg)
What is EpiData?
• EpiData is a computer program (small in size: 1.2 MB) for simple or programmed data entry and data documentation
• It is highly reliable
• It runs on Windows computers
  • Runs on Macs and Linux with emulator software (only)
• Interface
  • pull-down menus
  • work bar
![Page 7: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/7.jpg)
History of EpiInfo & EpiData
• 1976–1995: EpiInfo (DOS program) created by CDC (in wake of swine flu epidemic)
  • Small, fast, reliable, 100,000+ users worldwide
• 1995–2000: DOS dies slow painful death
• 2000: CDC releases EpiInfo2000
  • Based on Microsoft Jet (Access) data engine
  • Large, slow, unreliable (resembled EpiInfo in name only)
• 2001: Loyal EpiInfo user group decides it needs real “EpiInfo for Windows”
  • Creates open source public domain program
  • Calls program “EpiData”
![Page 8: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/8.jpg)
Goal: Create & Maintain Error-Free Datasets
• Two types of data errors
  • Measurement error (i.e., information bias) – discussed last couple of weeks
  • Processing errors = errors that occur during data handling – discussed this week
• Examples of data processing errors
  • Transpositions (91 instead of 19)
  • Copying errors (O instead of 0)
  • Additional processing errors described on p. 18.2
![Page 9: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/9.jpg)
Avoiding Data Processing Errors
• Manual checks (e.g., handwriting legibility)
• Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
• Double entry and validation*
  • Operator 1 enters data
  • Operator 2 enters data in separate file
  • Check files for inconsistencies
• Screening during analysis (e.g., look for outliers)
* covered in lab
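The double entry check can be sketched in Python as follows. This is a minimal illustration of the idea, not EpiData's own validation routine: the function name `compare_entries`, the `ID` key field, and the sample records are all assumptions made for the example.

```python
def compare_entries(first, second, key="ID"):
    """Return a list of (key value, field, value 1, value 2) mismatches."""
    # Index the second operator's records by the key field.
    second_by_key = {rec[key]: rec for rec in second}
    mismatches = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Entered by operator 1 but missing from operator 2's file.
            mismatches.append((rec[key], None, "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                mismatches.append((rec[key], field, value, other.get(field)))
    return mismatches


# Hypothetical sample data: operator 2 transposed the digits of AGE for ID 2.
entry1 = [{"ID": 1, "AGE": 45, "SEX": 1}, {"ID": 2, "AGE": 19, "SEX": 2}]
entry2 = [{"ID": 1, "AGE": 45, "SEX": 1}, {"ID": 2, "AGE": 91, "SEX": 2}]

for problem in compare_entries(entry1, entry2):
    print(problem)
```

The sketch only walks the first operator's records, so a record that exists only in the second file would go unreported. A fuller check would compare in both directions.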
![Page 10: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/10.jpg)
Controlled Data Entry
• Criteria for accepting & rejecting data
• Types of data controls
  • Range checks (e.g., restrict AGE to reasonable range)
  • Value labels (e.g., SEX: 1 = male, 2 = female)
  • Jumps (e.g., if “male,” jump to Q8)
  • Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  • Must enters
  • etc.
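The logic behind these controls can be sketched in Python. In EpiData the controls are defined in the .CHK file, so this is only an illustration of the accept/reject rules, not EpiData syntax. The field names, the 0 to 120 age range, and the function `check_record` are assumptions, and jumps are left out because they affect the order of entry rather than the validity of a value.

```python
VALUE_LABELS = {"SEX": {1: "male", 2: "female"}}  # legal codes and their labels
AGE_RANGE = (0, 120)                              # assumed "reasonable range"
MUST_ENTER = ("ID", "SEX", "AGE")                 # assumed required fields


def check_record(record):
    """Return a list of error messages; an empty list means the record is accepted."""
    errors = []
    # Must-enter check: required fields may not be blank.
    for field in MUST_ENTER:
        if record.get(field) is None:
            errors.append(f"{field}: must enter")
    # Range check on AGE.
    age = record.get("AGE")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        errors.append(f"AGE: {age} outside {AGE_RANGE[0]}-{AGE_RANGE[1]}")
    # Value label check: SEX must be one of the labelled codes.
    sex = record.get("SEX")
    if sex is not None and sex not in VALUE_LABELS["SEX"]:
        errors.append(f"SEX: {sex} is not a legal code")
    # Consistency check: no hysterectomy for men.
    if VALUE_LABELS["SEX"].get(sex) == "male" and record.get("HYSTERECTOMY") == "yes":
        errors.append("HYSTERECTOMY: not allowed when SEX = male")
    return errors


print(check_record({"ID": 1, "SEX": 1, "AGE": 45}))
print(check_record({"ID": 2, "SEX": 1, "AGE": 150, "HYSTERECTOMY": "yes"}))
```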
![Page 11: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/11.jpg)
Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in .REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS
![Page 12: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/12.jpg)
Filenaming and File Management
• c:\path\filename.ext
  • A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
• Some systems are case sensitive (Unix)
  • Others are not (Windows)
• Always be aware of
  • Physical location (local, removable, network)
  • Path (folders and subfolders)
  • Filename (proper)
  • Extension
• Demo Windows Network Explorer: right-click Start Bar > Explore
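The parts of a filename can be pulled apart with Python's standard `pathlib` module, as a small illustration. The path `c:\data\study\demo.rec` is a made-up example. The last two lines show the case-sensitivity difference: `pathlib` compares Windows-style paths without regard to case and POSIX (Unix-style) paths with regard to case.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical file location used only for illustration.
p = PureWindowsPath(r"c:\data\study\demo.rec")

print(p.drive)   # physical location (drive letter): c:
print(p.parent)  # path (folders and subfolders): c:\data\study
print(p.stem)    # filename proper: demo
print(p.suffix)  # extension: .rec

# Windows-style names ignore case; Unix-style names do not.
same_on_windows = PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec")
same_on_unix = PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec")
print(same_on_windows, same_on_unix)
```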
![Page 13: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/13.jpg)
File extensions you should know

| Extension | Software program |
|---|---|
| .qes | EpiInfo/EpiData questionnaire |
| .rec | EpiInfo/EpiData records (data) |
| .chk | EpiInfo/EpiData check (controls & labels) |
| .not | EpiData notes (data documentation) |
| .sav | SPSS permanent data file |
| .sps | SPSS syntax file (program) |
| .txt | Generic (flat) text data |
| .htm | Web Browser |
| .doc | Microsoft Word |
| .xls | Microsoft Excel |
![Page 14: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/14.jpg)
Selected EpiData Variable Types

| Variable Type | Examples |
|---|---|
| Text | `_` `<A >` |
| Numeric | `#` `##.#` |
| Date | `<mm/dd/yyyy>` `<dd/mm/yyyy>` |
| Auto ID | `<IDNUM>` |
| Soundex (sanitized) | `<S >` |
![Page 15: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/15.jpg)
EpiData Variable Names
• Variable name is based on text that occurs before the variable type indicator code
• EpiData variable naming defaults vary depending on installation
• Create variable names exactly as specified
  • To be safe, denote variable names in {curly brackets}
• For example, to create a two-byte numeric variable called age, use the question:
What is your {age}? ##
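How a name and a field width can be read off such a question line is sketched below in Python. This is a simplified illustration, not EpiData's actual parser: it handles only numeric `#` fields, takes the name from the {curly bracket} text that precedes the field code, and the function name `parse_question` is an assumption.

```python
import re

FIELD = re.compile(r"#+(?:\.#+)?")   # numeric field code such as ## or ##.#
BRACED = re.compile(r"\{([^}]*)\}")  # text inside {curly brackets}


def parse_question(line):
    """Return (variable name, field width, decimals) for one numeric question line."""
    field = FIELD.search(line)
    if field is None:
        return None  # no numeric field code on this line
    # The name comes from the braced text that occurs before the field code.
    braced = BRACED.findall(line[:field.start()])
    name = "".join(braced).replace(" ", "")
    code = field.group()
    decimals = len(code.split(".")[1]) if "." in code else 0
    return name, len(code), decimals


print(parse_question("What is your {age}? ##"))
```

For the slide's example this yields the name `age` with a width of two characters and no decimals.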
![Page 16: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/16.jpg)
Demo / Work Along
• Create QES file [demo.qes]
• Convert QES to REC [demo.rec]
• Create CHK file [demo.chk]
• Create double entry file [demo2.rec]
• Enter data
• Validate data

| Fname | Lname | DOB | SEX | DEATHAGE |
|---|---|---|---|---|
| John | Snow | 3/15/1813 | 1 | 45 |
| George | Orwell | 6/25/1903 | 1 | 46 |
![Page 17: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/17.jpg)
We will stop here and pick up the second part of the lecture next week
“Stay tuned”
![Page 18: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/18.jpg)
Codebooks
• Contain info that helps users decipher data file content and structure
• Include:
  • Filename(s)
  • File location(s)
  • Variable names
  • Coding schemes
  • Units
  • Anything else you think might be useful
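As a rough illustration of what a codebook records, the Python sketch below stores the listed items in a dictionary and prints them as plain text. It is not the output format of EpiData's codebook generator. The location `c:\path` is a placeholder, the "years" unit for DEATHAGE is an assumption, and `format_codebook` is a made-up helper name.

```python
codebook = {
    "filename": "demo.rec",
    "location": r"c:\path",  # placeholder location
    "variables": [
        {"name": "SEX", "type": "numeric",
         "codes": {1: "male", 2: "female"}, "units": None},
        {"name": "DEATHAGE", "type": "numeric",
         "codes": None, "units": "years"},  # unit is an assumption
    ],
}


def format_codebook(cb):
    """Render the codebook dictionary as plain text, one variable per block."""
    lines = [f"File: {cb['filename']}", f"Location: {cb['location']}", ""]
    for var in cb["variables"]:
        lines.append(f"{var['name']} ({var['type']})")
        if var["units"]:
            lines.append(f"  units: {var['units']}")
        for code, label in (var["codes"] or {}).items():
            lines.append(f"  {code} = {label}")
    return "\n".join(lines)


print(format_codebook(codebook))
```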
![Page 19: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/19.jpg)
EpiData codebook generators
![Page 20: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/20.jpg)
File Structure Codebook
Full codebook contains descriptive statistics (demo)
![Page 21: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/21.jpg)
Full Codebook
Notice descriptive statistics
![Page 22: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/22.jpg)
Conversion of Data File
• Requires common intermediate file format
• Examples of common intermediate files
  • .TXT = plain text
  • .DBF = dBase program
  • .XLS = Excel
• Steps
  • Export .REC file to .TXT file
  • Import .TXT file into SPSS
  • Save permanent SAV file
![Page 23: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/23.jpg)
Current Export Formats Supported by EpiData
![Page 24: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/24.jpg)
Plain (“raw”) TXT data
• plain ASCII data format
• no column demarcations
• no variable names
• no labels
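Because a raw TXT file carries no column demarcations or variable names, whoever reads it has to supply the column layout, typically from the codebook. The Python sketch below shows the idea. The layout, the variable names, and the two sample lines are hypothetical.

```python
# Hypothetical layout: (variable name, start column, end column),
# 0-based with the end column exclusive.
LAYOUT = [("ID", 0, 3), ("SEX", 3, 4), ("AGE", 4, 6)]


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width line into named fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


raw = ["001145", "002219"]  # made-up raw data lines
records = [parse_line(line) for line in raw]
for rec in records:
    print(rec)
```

Without the layout, `001145` is just six digits. With it, the line reads as ID 001, SEX 1, AGE 45.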
![Page 25: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/25.jpg)
TXT file with codebook
tox-samp.txt
tox-samp.not
![Page 26: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/26.jpg)
SPSS Data Export / Import
Diagram labels: REC, TXT (raw data), SPS (syntax), SAV
![Page 27: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/27.jpg)
Top of tox-samp.sps
Lines beginning with * are comments (ignored by command interpreter)
Next set of commands shows file location and structure via SPSS command syntax
![Page 28: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/28.jpg)
Bottom part of tox-samp.sps file
Labels being imported into SPSS
Delete * if you want this command to run
![Page 29: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/29.jpg)
Opening the SPS (command) file
![Page 30: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/30.jpg)
Running the SPS file
![Page 31: DATA MANAGEMENT Using EpiData and SPSS References Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and](https://reader030.vdocuments.us/reader030/viewer/2022020800/551c57955503469d6a8b4f21/html5/thumbnails/31.jpg)
Ethics of Data Keeping
• Confidentiality (sanitized files – free of identifiers)
• Beneficence
• Equipoise
• Informed consent (To what extent?)
• Oversight (IRB)