CCPR Computing Services
More Efficient Programming
July 13, 2006

Outline
Thinking through a programming task
Ways of efficiently documenting and organizing your project
  Naming variables, programs, files
  Commenting code
  Including file header
  Implementing directory structure
Programming constructs
Raw data -> finished product: are your results replicable?

Before you start coding…
Think
Clearly define the problem in writing
Write down the solution/algorithm in English
  Modularity
Create test (if reasonable)
Translate one section to code
Test the section thoroughly
Translate/test next section, etc.
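
The steps above can be sketched in a few lines of Stata. This is only an illustration: it assumes the nlsw88 example dataset that ships with Stata is available through sysuse, and the variable name AgeLt40 is hypothetical.

```stata
* Problem (in words): flag respondents younger than 40.
* Plan:
*   1. Load the data.
*   2. Create AgeLt40 = 1 if age < 40, 0 otherwise, missing if age is missing.
*   3. Test: the number flagged must equal a direct count on age.
sysuse nlsw88, clear

* Translate one section to code...
generate AgeLt40 = (age < 40) if !missing(age)

* ...and test that section before moving on.
quietly count if age < 40
local expected = r(N)
quietly count if AgeLt40 == 1
assert r(N) == `expected'
```

If the assert fails, Stata stops with an error, so later sections never run on a variable that was built incorrectly.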

Documentation - File Header
Each do-file/program/file you create should include:
  Your name
  Project name
  Project location
  Date
  Software version
  Purpose of program
  Inputs, outputs
  Special notes
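
A header along these lines would cover the items listed above. Every value shown (author, project, paths, file names) is a hypothetical placeholder, and the layout is just one possible convention.

```stata
*----------------------------------------------------------------------
* Author:    Jane Doe                                (hypothetical)
* Project:   Union wage analysis                     (hypothetical)
* Location:  C:\projects\unionwage\pgms\MergeHH.do   (hypothetical)
* Date:      13 July 2006
* Software:  Stata 9
* Purpose:   Merge household records onto person records
* Inputs:    data\raw\household.dta, data\raw\person.dta
* Outputs:   data\clean\personHH.dta, logs\MergeHHsta.log
* Notes:     Run after the raw extracts are refreshed
*----------------------------------------------------------------------
* Declare the Stata version the code was written for.
version 9
```

Placing the header after the log using command means the same text is echoed into the log file.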

Naming Files, Variables, and Functions
Use language standard (if it exists)
Be aware of language-specific rules
  Max length, underscore, case, reserved words
Differentiating log files:
  Programs: MergeHH.sas, MergeHH.do
  Log files: MergeHHsas.log, MergeHHsta.log
Meaningful variable names:
  LogWt vs. var1
  AgeLt30 vs. x
Procedure that cleans missing values of Age: fixMissingAge
Matrix multiplication, X transpose times X: matXX
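
A small sketch of how names like fixMissingAge and matXX might look in use. The toy data and the idea that -99 codes a missing age are assumptions made for illustration; the slides do not say what the cleaning step actually does.

```stata
* Hypothetical toy data: -99 is assumed to be the code for "age not reported".
clear
input Age LogWt
25  4.9
-99 5.1
41  5.3
end

* The program name says what it does: recode the missing-value code in Age.
capture program drop fixMissingAge
program define fixMissingAge
    replace Age = . if Age == -99
end
fixMissingAge

* X'X for the columns Age and LogWt; the name says what the matrix holds.
* matrix accum adds a constant column unless noconstant is specified,
* and it skips observations with missing values.
matrix accum matXX = Age LogWt, noconstant
matrix list matXX
```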

Commenting Code
Good code is self-commenting
  Naming conventions, structure/formatting, header should explain 95%
Comments should explain
  Purpose of code, not every detail
  Tricks used
  Reasons for unusual coding
Comments do not
  Fix sloppy code
  Translate syntax
If it takes longer to read the comment than to read the code, don’t add a comment!

Commenting Code - Stata example
SAMPLE 1
    program def function1
    foreach v of varlist _all {
    local x = lower("`v'")
    if `"`v'"' != `"`x'"' {
    rename `v' `=lower("`v'")'
    }
    }
    end
SAMPLE 2
    *Convert names in dataset to lowercase.
    program def lowerVarNames
        foreach v of varlist _all {
            local LowName = lower("`v'")
            if `"`v'"' != `"`LowName'"' {
                rename `v' `=lower("`v'")'
            }
        }
    end
Compare formatting, comments, variable names, and function names
Directory Structure
A project consists of many different types of files
Use folders to separate files in a logical way
Be consistent across projects if possible
ATTIC folder for older versions
HOME
  PROJECT NAME
    DATA
    RESULTS
    LOG
    PROGRAMS
    ATTIC

Stata example: using directory structure
    ** Paths:
    global parentpath "C:\Documents and Settings\piersol\Summer06\prog\progtips"
    global pgmsloc "$parentpath\pgms"
    global logsloc "$parentpath\logs"
    global cleandataloc "$parentpath\data\clean"
    global rawdataloc "$parentpath\data\raw"

    capture log close
    log using "$logsloc\test200607", text replace
    **********************************************************************
    * INSERT FILE HEADER HERE...then it’s included in log file.
    **********************************************************************
    macro list

    webuse union, clear
    save "$rawdataloc\union.dta", replace

    *keep idcode year age grade
    save "$cleandataloc\unionLJP.dta", replace

    log close

Programming Constructs
Tools to simplify and clarify your coding
Available in virtually all languages
Constructs
  Loops - for, foreach, do, while
  If/elseif/else - if, then, else, case
  continue
  exit
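
As a minimal illustration of one of the loop constructs listed above, here is a while loop in Stata. The macro names i and total are arbitrary.

```stata
* Sum the integers 1 through 5 with a while loop.
local i = 1
local total = 0
while `i' <= 5 {
    local total = `total' + `i'
    local i = `i' + 1
}
display "Sum of 1 to 5 is `total'"
```

The loop body must change the condition (here by incrementing i), otherwise the loop never ends.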

Loop Example 1
Problem: Given 4 indicator variables (south, union, black, not_smsa) and 2 discrete variables (age, grade), generate 8 new indicator variables:
  south_age21 = south and age > 21, south_gr12 = south and grade > 12
  Similarly for union, black, not_smsa
Solution without loop
  8 lines of code similar to:
    generate south_age21 = (south==1 & age>21 & age<.)
    generate south_gr12 = (south==1 & grade>12 & grade<.)
Solution with loop
    foreach j in south union black not_smsa {
        gen `j'_age21 = (age>21 & age<. & `j'==1)
        gen `j'_gr12 = (grade>12 & grade<. & `j'==1)
    }

Loop Example 1, cont.
    *CHECK GENERATED VARIABLES AGAINST ORIGINAL VARIABLES
    foreach j in south union black not_smsa {
        qui count if `j'==1 & age>21 & age<.
        local origCount = r(N)
        qui count if `j'_age21==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_age21!"
        }
        else display "Counts match for `j'_age21."

        qui count if `j'==1 & grade>12 & grade<.
        local origCount = r(N)
        qui count if `j'_gr12==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_gr12!"
        }
        else display "Counts match for `j'_gr12."
    }

Loop Example 2
Given indicator variables white, black, other, and continuous variable educyrs, create interaction variables
Solution using loop:
    local allraces "white black other"
    foreach race of varlist `allraces' {
        generate `race'_educ=`race'*educyrs
    }

Loop Example 3
Problem:
  Dataset contains variables over multiple years (1970-1990)
  Need to perform a number of commands separately for 1970, 1975, 1980, 1985.
Solution without loop
    bysort year: command1 if year==70 | year==75 | year==80 | year==85
    bysort year: command2 if year==70 | year==75 | year==80 | year==85
Solution with loop
    foreach year in 70 75 80 85 {
        di as result "***Regression for year = `year':"
        regress ln_wage grade tenure ttl_exp if year==`year'
        di as result "***Summarize for year = `year':"
        summarize ln_wage if year==`year'
    }

Loop Example 4 – pulling from 2 lists
From Stata FAQ website
Code:
    local agrp "cat dog cow pig"
    local bgrp "meow woof moo oinkoink"
    local n : word count `agrp'
    forvalues i = 1/`n' {
        local a : word `i' of `agrp'
        local b : word `i' of `bgrp'
        di "`a' says `b'"
    }
Resulting output:
    cat says meow
    dog says woof
    cow says moo
    pig says oinkoink

Constructs - If/then/else
Execute section of code if condition is true:
    if condition then
        {execute this code if condition true}
    end
Execute one of two sections of code:
    if condition then
        {execute this code if condition true}
    else
        {execute this code if condition false}
    end

If/Else Example
Problem: need to execute commands on an operating system, but only if the OS is Unix. The commands will fail if the OS is anything else.
Solution:
    if "`c(os)'"~="Unix" {
        di as err "Sorry; this section requires Unix OS."
    }
    else {
        ** continue with unix commands…
    }

Constructs - Elseif/case
Elseif - Execute one of many sections of code:
    if condition1 then
        {execute this code if condition1 true}
    elseif condition2 then
        {execute this code if condition2 true}
    else
        {execute this code if condition1, condition2 are all false}
    end
Case - same idea, different name
    case condition1 then
        {execute this code if condition1 true}
    case condition2 then
        {execute this code if condition2 true}
    etc.

Elseif Example
Problem: Continue example from if…else, but execute different section of code for Unix, Windows, and Mac
Solution:
    if "`c(os)'"=="Unix" {
        di "This is a Unix environment"
    }
    else if "`c(os)'" == "Windows" {
        di "This is a Windows environment"
    }
    else if "`c(os)'" =="MacOSX" {
        di "This is a MacOS environment."
    }
    else {
        di as err "`c(os)' not recognized."
    }

Stata - If command vs. if qualifier
The if command was designed to be used with a single expression
Example:
  Given variable x with 5 observations: 1, 1, 2, 1, 3
  Compare the following three pieces of Stata code:
    if x==2 {
        replace x=99
    }

    if x==1 {
        replace x=99
    }

    replace x=99 if x==2
Stata- If command vs. if qualifier
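
The difference can be checked with a short sketch using x = 1, 1, 2, 1, 3. The copies a, b, and c are introduced here only so that each piece of code starts from the same values; they are not part of the original example.

```stata
clear
input x
1
1
2
1
3
end

* Work on three copies so each piece of code starts from the same values.
generate a = x
generate b = x
generate c = x

* if command: the expression is evaluated once, using the first
* observation. a[1] is 1, so a==2 is false and the block is skipped.
if a==2 {
    replace a = 99
}

* if command again: b[1]==1 is true, so replace runs with no qualifier
* and changes ALL five observations.
if b==1 {
    replace b = 99
}

* if qualifier: the condition is checked observation by observation,
* so only the third observation changes.
replace c = 99 if c==2

list x a b c
```

To change only the observations that meet a condition, the if qualifier is the tool to reach for; the if command decides whether a block of commands runs at all.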

Constructs - Continue
Example from Stata online help
Continue is used to exit current iteration of loop and continue with next iteration
The following two loops produce the same result:
    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
            continue
        }
        display "`x' is even"
    }

    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
        }
        else {
            display "`x' is even"
        }
    }

Constructs – Exit
Stop execution of program
Examples:
  Do-file contains a number of data checks followed by analysis commands. If data checks reveal something unacceptable, you can exit out of do-file before running analysis.
  Program requires user input. If user enters “bad” information, need to quit program.
  Debugging. If particular error occurs then break.
  Check denominator prior to dividing. If equals zero, exit.
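
Two of these cases, a data check and a denominator check, might look like this in a do-file. The sketch assumes the auto example dataset that ships with Stata; the specific checks and the price-per-pound ratio are invented for illustration. Return code 459 is the code Stata uses for "something that should be true of your data is not".

```stata
sysuse auto, clear

* Data check: stop before the analysis if the outcome has missing values.
quietly count if missing(price)
if r(N) > 0 {
    display as error "price has `r(N)' missing values; stopping before analysis."
    exit 459
}

* Check the denominator before dividing.
quietly summarize weight
local denom = r(sum)
if `denom' == 0 {
    display as error "Total weight is zero; cannot compute the ratio."
    exit 459
}
quietly summarize price
display "Price per pound (overall) = " r(sum)/`denom'

* The analysis runs only if every check above passed.
regress price mpg weight
```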
Raw data to finished product
Raw data
Analysis data
Runs/results
Finished product

Raw Data -> Analysis Data
Always have two distinct data files: the raw data and the analysis data
A program should completely re-create analysis data from raw data
NO interactive changes!! Final changes must go in a program!!

Raw Data -> Analysis Data
Document all of the following:
  Outliers? Errors? Missing data? Changes to the data?
Remember to check:
  Consistency across variables
  Duplicates
  Individual records, not just summary stats
  “Smell tests”
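
A few Stata commands that could serve as these checks. This is a sketch under assumptions: it uses the nlsw88 example dataset that ships with Stata, and the particular variables and rules (for example, that nobody is both married and never married) are illustrative choices rather than checks taken from the slides.

```stata
sysuse nlsw88, clear

* Duplicates: is idcode a unique identifier?
duplicates report idcode
capture isid idcode
if _rc != 0 {
    display as error "idcode does not uniquely identify observations."
}

* Consistency across variables: nobody should be both married and never married.
quietly count if married == 1 & never_married == 1
display "Records that are both married and never married: " r(N)

* Missing data: how many records lack union status?
quietly count if missing(union)
display "Records with missing union status: " r(N)

* Outliers: look at the tails of the wage distribution.
summarize wage, detail

* Individual records, not just summary stats.
list idcode age grade wage in 1/5
```

Keeping these commands in the program that builds the analysis data means the checks are rerun, and logged, every time the data are rebuilt.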

Analysis Data -> Results
All results should be produced by a program
Program should use analysis data (not raw)
Have a “translation” of raw variable names -> analysis variable names -> publication variable names
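
One way to keep that translation inside the program is to pair rename with variable labels. The sketch below assumes the nlsw88 example dataset; the analysis names ExpYears and EducYears and the label text are hypothetical.

```stata
sysuse nlsw88, clear

* Raw name -> analysis name (the names chosen here are illustrative).
rename ttl_exp ExpYears
rename grade   EducYears

* Analysis name -> publication name, stored as the variable label.
label variable ExpYears  "Total work experience (years)"
label variable EducYears "Completed schooling (years)"

* describe prints the analysis names next to their publication labels.
describe ExpYears EducYears
```

Because the renames and labels live in the do-file, the log records the mapping from raw names to published names.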

Analysis Data -> Results
Document:
  How were variances estimated? Why?
  What algorithms were used and why? Were results robust?
  What starting values were used? Was convergence sensitive?
  Did you perform diagnostics? Include in programs/documentation.

Log files
Your log file should tell a story to the reader.
As you print results to the log file, include words explaining the results
Include not only what your code is doing, but your reasoning and thought process
Don’t output everything to the log file: use quietly and noisily in a meaningful way.
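
A sketch of what this can look like, assuming the auto example dataset that ships with Stata. The question being asked and the wording of the messages are invented for illustration.

```stata
sysuse auto, clear

display "Question: do heavier cars get lower gas mileage?"

* Run the regression without printing the full table...
quietly regress mpg weight

* ...and print only the number the story needs, with words around it.
display "Each extra pound changes mpg by " %8.5f _b[weight]

* Inside a quietly block, noisily lets selected output through.
quietly {
    summarize mpg
    noisily display "Mean mpg across `r(N)' cars: " %5.2f r(mean)
}
```

The log then contains the question, the key estimate, and one summary line instead of two full output tables.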

Project Clean-up
Create a zip file that contains everything necessary for complete replication
Use a readme.txt file to describe zip contents
Delete/archive unused or old files
Include any referenced files in zip
When you have a final zip archive containing everything:
  Open it in its own directory and run the script
  Check that all the results match
Questions/Feedback