Writing functions in R: some handy advice for creating your own functions
A quick review of R
R is a statistical software package and an object-oriented programming language
Terms to remember: vectors, matrices, and dataframes; indices; functions
Warm up
Download the data for lab 3.
In RStudio, go to Workspace → Import Dataset → From Text File. Make sure to select the header option.
If you're not using RStudio, the code is:
data_lab_3 <- read.csv("~/documents/classes/Psych 1950/mood.csv")
where ~ is shorthand for your home directory; change the rest of the path to wherever you saved the file.
Warming up a little more
Use the help() function to read about the read.csv() function
How could we use it to read in a file with no header?
read.csv("filename", header=FALSE)
We can also use R to read in SPSS files, but for now we'll stick with read.csv().
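To see what header=FALSE does without needing the lab file, here is a minimal sketch. The temporary file and its three rows are made up for illustration; they are not part of the lab data.

```r
# Write a tiny header-less file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,4.5", "2,3.2", "3,5.1"), tmp)

# header = FALSE tells read.csv() that the first line is data, not column names
no_header <- read.csv(tmp, header = FALSE)

names(no_header)  # R fills in default names: "V1" "V2"
nrow(no_header)   # 3, because the first line was kept as a data row
```

If you read the same file with the default header=TRUE, the first line would be used up as column names and you would lose a row of data.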
Last page of warm-up (I promise!)
Find the standard deviation (sd()) of the second column (puDay2call1) of your dataframe.
Uh-oh! That output isn't helpful: sd() returns NA when the column contains missing values.
Add the following argument to the standard deviation function: na.rm=TRUE
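Here is a small sketch of the same steps on a made-up dataframe. The name toy and its values are assumptions for illustration; only the column name puDay2call1 comes from the lab data.

```r
# A hypothetical stand-in for the lab dataframe
toy <- data.frame(id = 1:4, puDay2call1 = c(2, 4, NA, 6))

# Two ways to pick out the second column: by index or by name
sd(toy[, 2])             # NA, because one value is missing
sd(toy$puDay2call1)      # NA again, same column

# na.rm = TRUE drops the missing value before computing
sd(toy[, 2], na.rm = TRUE)   # 2
```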
A slight modification
Suppose that we want to calculate the standard deviation using the population formula
Check the help file for sd(). Is there a way to do that?
Nope! We'll need to make our own....
Making a function
Let's start with something easier: we'll make our own mean() function.
What should it do?
We'll pass* it a vector of numbers as an argument*
It should return* the mean
*programming jargon
The function syntax
getMean <- function(arguments){
  # commands go here
}
The name of the function is getMean() (this is usually a verb).
The arguments are the values and instructions we give to the function.
The body is where the work happens.
Iteration 1
getMean <- function(x){
  return(sum(x)/length(x))
}
Try this on the second column.
How can we handle NAs in the function, assuming we ALWAYS want to remove them?
Iteration 2
getMean <- function(x){
  return(sum(x, na.rm=TRUE)/length(x))
}
Now try this one, and compare your results to R's built-in mean function (with na.rm=TRUE).
Why aren't the values the same?
Hint: what's the length of a vector that contains NAs?
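A quick way to explore the hint is to try it on a short made-up vector (the values below are hypothetical, not from the lab data):

```r
x <- c(10, 20, NA, 30)   # hypothetical data with one missing value

length(x)         # 4: the NA still counts toward the length
sum(!is.na(x))    # 3: the number of values that are actually there

# Dividing by length(x) uses the wrong denominator
wrong_mean <- sum(x, na.rm = TRUE) / length(x)   # 60 / 4 = 15
right_mean <- mean(x, na.rm = TRUE)              # 60 / 3 = 20
```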
Iteration 3
getMean <- function(x){
  return(sum(x, na.rm=TRUE)/length(na.omit(x)))
}
Another R function saves the day! Thanks, R!
Compare your results to the built-in function.
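One way to do the comparison is with all.equal(), which checks that two numbers match up to rounding error. This is only a sketch on a made-up vector; swap in your own column to check it against the lab data.

```r
# The NA-aware version of getMean(): drop NAs from both the sum and the count
getMean <- function(x){
  return(sum(x, na.rm = TRUE) / length(na.omit(x)))
}

x <- c(3, NA, 8, 1, NA)   # hypothetical data

getMean(x)                # 12 / 3 = 4
mean(x, na.rm = TRUE)     # 4

# TRUE when our function agrees with the built-in one
all.equal(getMean(x), mean(x, na.rm = TRUE))
```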
Another way to do it
We've been leaning heavily on the sum() function
Sometimes, though, we need to tell R to do a certain operation a number of times
To do that, we use an operation called a for loop
There are other loops as well, but we'll stick with a for loop
The anatomy of a for loop
getFactorial <- function(number){
  j <- 1
  for (index in seq_len(number)){   #seq_len(0) is empty, so getFactorial(0) returns 1
    j <- j*index
  }
  return(j)
}
What will this function do?
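To check your answer, you can run the function and compare it with R's built-in factorial(). The sketch below uses seq_len(number) for the loop range, since 1:0 counts down (1, 0) and would give the wrong answer for 0.

```r
getFactorial <- function(number){
  j <- 1
  # The loop multiplies j by 1, then 2, then 3, ... up to number
  for (index in seq_len(number)){
    j <- j * index
  }
  return(j)
}

getFactorial(5)   # 1*2*3*4*5 = 120
getFactorial(0)   # 1: the loop body never runs
factorial(5)      # built-in version, also 120
```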
One more concept
Sometimes, we need a function to make a decision
Here, we use conditionals:
if (condition){     #if the condition is true
  something         #do this
} else {            #if it's false (keep else on the same line as the closing brace)
  something else    #do this instead
}
For example
if (!is.na(x)){   #if x isn't an NA
  print(x)        #write x. If it is, nothing
}                 #will happen
if (x<=4){        #if x is less than or equal to 4
  print(x-1)
}
if (x==5){        #if x is exactly 5
  print("Five")
}
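The pieces above can be combined into one function that makes a decision. This is a sketch only: describeValue() and its labels are made up for illustration, and it assumes x is a single value, because if() expects exactly one TRUE or FALSE.

```r
describeValue <- function(x){
  if (is.na(x)){            # check for NA first; comparing NA to a number fails in if()
    return("missing")
  } else if (x <= 4){       # less than or equal to 4
    return("four or less")
  } else if (x == 5){       # exactly 5
    return("Five")
  } else {                  # everything else
    return("more than five")
  }
}

describeValue(NA)   # "missing"
describeValue(3)    # "four or less"
describeValue(5)    # "Five"
describeValue(9)    # "more than five"
```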
Looping to get the mean
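getMean_3 <- function(x){
  total <- 0                 #named total and count so we don't reuse
  count <- 0                 #the names of the sum() and length() functions
  for (i in seq_along(x)){
    if (!is.na(x[i])){       #exclude NAs
      total <- total+x[i]    #keep a running tally of the sum
      count <- count+1       #and the number of non-missing values
    }
  }
  return(total/count)        #this is the mean
}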
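If it is hard to picture what the loop is doing, it can help to print the running tally at each step. The sketch below runs the same loop outside a function on a made-up vector; the variable names total and count are just illustrative.

```r
x <- c(4, NA, 6)   # hypothetical data
total <- 0
count <- 0

for (i in seq_along(x)){
  if (!is.na(x[i])){          # skip the NA in position 2
    total <- total + x[i]
    count <- count + 1
  }
  # Show the state of the tally after each pass through the loop
  cat("i =", i, " total =", total, " count =", count, "\n")
}

total / count   # 10 / 2 = 5
```

On the second pass the tally does not change, because the condition is FALSE for the NA.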
Adding some complexity
It's your turn now: write two functions to compute the sum of squared deviations from the mean of a vector.
In one version, use the sum() function.
In the other, use a for loop.
Try to allow your function to work with a vector that includes some NAs.
Remember
The formula for the sum of squares of a set of numbers is $\sum_i (x_i - \bar{x})^2$, where $\bar{x}$ is mean(x).
Now make R do it for you!
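Try it yourself first! When you want something to compare against, here is one possible sketch, not the only right answer. The names getSS_sum(), getSS_loop(), and getPopSD() are made up for illustration. The last function assumes the population formula from the start of the lab means dividing the sum of squares by n rather than n - 1.

```r
# Version 1: let sum() do the adding
getSS_sum <- function(x){
  x <- x[!is.na(x)]                 # drop NAs up front
  return(sum((x - mean(x))^2))
}

# Version 2: add up the squared deviations one at a time
getSS_loop <- function(x){
  x <- x[!is.na(x)]                 # drop NAs up front
  m <- mean(x)
  ss <- 0
  for (value in x){
    ss <- ss + (value - m)^2        # running tally of squared deviations
  }
  return(ss)
}

# Population standard deviation: divide by n instead of n - 1
getPopSD <- function(x){
  n <- sum(!is.na(x))               # count of non-missing values
  return(sqrt(getSS_sum(x) / n))
}

scores <- c(2, 4, NA, 6)   # hypothetical data: mean 4, deviations -2, 0, 2
getSS_sum(scores)          # 4 + 0 + 4 = 8
getSS_loop(scores)         # 8
getPopSD(scores)           # sqrt(8 / 3), smaller than sd(scores, na.rm = TRUE)
```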