Moving Data to and from R
Part of an advanced analytics course.
Advanced Data Analytics: Moving Data Around
Jeffrey Stanton
School of Information Studies
Syracuse University
R and the File System
• R maintains a current working directory to simplify the process of reading and saving files
getwd()            # Shows the pathname of the current folder
setwd("pathname")  # Sets a new path
history()          # Shows the most recent commands

# Creates a CSV file using data from a dataframe
write.table(dataFr, sep=",", file="filename.csv")

# Reads a CSV file into a dataframe
targetFrame = read.table("filename.csv", sep=",")
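The round trip above can also be sketched with write.csv and read.csv, which are convenience wrappers around write.table and read.table. This is a minimal sketch, not the course's own code: the contents of dataFr and the temporary file path are assumptions made for illustration.

```r
# Hypothetical example data; the names dataFr and targetFrame follow the slide.
dataFr <- data.frame(Subject = 1:6, Code = c(1, 0, 1, 0, 1, 0))

# Write to a temporary file so the sketch does not touch the working directory.
csvPath <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the row-number column, so the header lines up
# with the data columns when the file is opened in Excel.
write.csv(dataFr, file = csvPath, row.names = FALSE)

# read.csv assumes a header row and comma separators by default.
targetFrame <- read.csv(csvPath)

print(targetFrame)
```

With write.table's defaults, row names are written as an extra unlabeled column. R reads such a file back correctly, but the header appears shifted by one column in Excel, which is why row.names = FALSE is used here.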
R and the Windows Clipboard
• For small chunks of data, it may be convenient to “cut and paste”
• Create a small rectangle of data in Excel and copy it to the clipboard
• Then, in R:
> read.DIF("clipboard",transpose=TRUE)
  V1 V2
1  1  1
2  2  0
3  3  1
4  4  0
5  5  1
6  6  0
Include Variable Names
• You can pull in the variable names (the column headings) as well
• Then, in R:
> read.DIF("clipboard",transpose=TRUE,header=TRUE)
  Subject Code
1       1    1
2       2    0
3       3    1
4       4    0
5       5    1
6       6    0
Best Option: Put Clipboard into Dataframe
> newDF = read.DIF("clipboard",transpose=TRUE,header=TRUE)
> newDF
  Subject Code
1       1    1
2       2    0
3       3    1
4       4    0
5       5    1
6       6    0
> class(newDF)
[1] "data.frame"
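Once the clipboard contents are stored in a data frame, the usual inspection functions apply. The sketch below builds a stand-in newDF by hand with the same values shown on the slide, since the clipboard connection is only available in an interactive Windows session; that stand-in is an assumption made for illustration.

```r
# Stand-in for the clipboard import: same values as shown on the slide.
newDF <- data.frame(Subject = 1:6, Code = c(1, 0, 1, 0, 1, 0))

str(newDF)          # column names and types
names(newDF)        # the variable names pulled in by header=TRUE
nrow(newDF)         # number of rows
newDF$Code          # one column, extracted as a vector
mean(newDF$Code)    # proportion of rows with Code equal to 1
```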
An Explanation of Data Frames
• Every single piece of data in R is a “vector”: A list of “scalar” values all of the same mode
– Scalar just means a single element or value, like the number 5
– R vectors can be lists with any number of elements, including just one element; so a scalar could be stored in a vector of length one
– The mode of a vector can be numeric, or character, or logical
• Just like Excel spreadsheets and other data programs like SPSS, vectors in R can be two dimensional, with a certain number of columns and a certain number of rows; a two dimensional vector is called a matrix
• But, being a vector, a matrix has to contain elements all of the same mode, so a matrix cannot always hold a typical spreadsheet or data set, because these often have different types in each column
• This is where the data frame comes in: A data frame is a list of vectors, all of the same length, each of which can be a different type
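The distinction between vectors, matrices, and data frames can be checked directly at the console. The following is a small sketch with made-up values; the variable names are hypothetical and chosen only for illustration.

```r
# A scalar is just a vector of length one.
x <- 5
length(x)            # 1
is.vector(x)         # TRUE

# Mixing modes in one vector coerces every element to a single mode.
mixed <- c(1, "a", TRUE)
mode(mixed)          # "character"

# A matrix is a two-dimensional vector, so it also has exactly one mode.
m <- matrix(1:6, nrow = 3, ncol = 2)
mode(m)              # "numeric"
dim(m)               # 3 2

# A data frame is a list of equal-length vectors, one per column,
# and each column keeps its own mode.
df <- data.frame(Subject = 1:3,
                 Name = c("a", "b", "c"),
                 Passed = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
sapply(df, mode)     # numeric, character, logical
is.list(df)          # TRUE
```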
read.DIF also works with files
> setwd("C:/DataMining/DataFiles")
> newDF = read.DIF("excelExport.dif", transpose=TRUE, header=TRUE)
> class(newDF)
[1] "data.frame"
> attach(newDF)
# Note that Excel, DIF, and R
# don’t always agree on data
# formats. For example, currency
# in Excel will not export to
# integer values in R, so remove
# as much formatting as possible.
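If formatting cannot be removed in Excel first, one possible cleanup on the R side is to strip the currency symbols and convert the result to numbers. This is a sketch under assumptions: the price vector is hypothetical, and the exact form in which currency values arrive depends on the export.

```r
# Hypothetical column as it might arrive from a currency-formatted export.
price <- c("$1,200.50", "$15.00", "$300")

# Strip dollar signs and thousands separators, then convert to numeric.
priceNum <- as.numeric(gsub("[$,]", "", price))

priceNum             # the cleaned values as numbers
class(priceNum)      # "numeric"
```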
Demonstrating Mastery
• Create or find data in an Excel spreadsheet and export as a CSV file
• Import data into R from a CSV or TXT file
• Export a data frame into a CSV file
• Read the CSV file into Excel
• Advanced: Use data interchange format (“DIF”) to exchange files between R and Excel
• Advanced: Use a data frame in R to store data obtained from a spreadsheet