Moving Data to and From R

Advanced Data Analytics: Moving Data Around Jeffrey Stanton School of Information Studies Syracuse University

Uploaded by syracuse-university on 24-Jan-2015


DESCRIPTION

Part of an advanced analytics course.

TRANSCRIPT

Page 1: Moving Data to and From R

Advanced Data Analytics: Moving Data Around

Jeffrey Stanton

School of Information Studies

Syracuse University

Page 2

R and the File System

• R maintains a current working directory to simplify the process of reading and saving files

getwd()             # shows the pathname of the current folder
setwd("pathname")   # sets a new working directory
history()           # shows the most recent commands

# Creates a CSV file using data from a data frame
write.table(dataFr, sep=",", file="filename.csv", row.names=FALSE)

# Reads a CSV file (with a header row) into a data frame
targetFrame = read.table("filename.csv", sep=",", header=TRUE)
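A minimal, self-contained sketch of this save-and-reload round trip. It assumes a small made-up data frame in place of dataFr and uses R's temporary folder (tempdir()) as the working directory so nothing is written elsewhere. row.names = FALSE stops write.table() from adding an unnamed row-name column, and header = TRUE makes read.table() use the first line as column names, so the data frame comes back with the same columns it went out with.

```r
# Hypothetical example data standing in for dataFr
dataFr <- data.frame(Subject = 1:6, Code = c(1L, 0L, 1L, 0L, 1L, 0L))

# setwd() returns the previous folder, so it can be restored afterwards
oldDir <- setwd(tempdir())
getwd()   # shows the temporary folder now in use

# row.names = FALSE stops write.table() from adding a row-name column
write.table(dataFr, sep = ",", file = "filename.csv", row.names = FALSE)

# header = TRUE tells read.table() that the first line holds column names
targetFrame <- read.table("filename.csv", sep = ",", header = TRUE)
targetFrame

setwd(oldDir)   # return to the original working directory
```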

Page 3

R and the Windows Clipboard

• For small chunks of data, it may be convenient to “cut and paste”

• Create a small rectangle of data in Excel and copy it to the clipboard

• Then, in R:
> read.DIF("clipboard", transpose=TRUE)
  V1 V2
1  1  1
2  2  0
3  3  1
4  4  0
5  5  1
6  6  0
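Reading from "clipboard" depends on the operating system and on whatever happens to be copied, so it cannot be shown reproducibly. As a stand-in, this sketch puts the copied cells in a string. It assumes the tab-separated text form that Excel also places on the clipboard when cells are copied, and parses it with read.delim() rather than read.DIF(); that is a swapped-in approach, not the one on the slide. With no headings copied, R again names the columns V1 and V2.

```r
# Stand-in for the clipboard: six rows of two copied cells, no headings,
# as tab-separated text (an assumption for this sketch)
clip <- "1\t1\n2\t0\n3\t1\n4\t0\n5\t1\n6\t0"

# header = FALSE: with no headings, R names the columns V1 and V2
smallDF <- read.delim(text = clip, header = FALSE)
smallDF
```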

Page 4

Include Variable Names


• You can pull in the variable names (the column headings) as well

• Then, in R:
> read.DIF("clipboard", transpose=TRUE, header=TRUE)
  Subject Code
1       1    1
2       2    0
3       3    1
4       4    0
5       5    1
6       6    0

Page 5

Best Option: Put the Clipboard Contents into a Data Frame

> newDF = read.DIF("clipboard", transpose=TRUE, header=TRUE)
> newDF
  Subject Code
1       1    1
2       2    0
3       3    1
4       4    0
5       5    1
6       6    0
> class(newDF)
[1] "data.frame"
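Once the pasted cells are stored in a data frame, they can be used like any other R data. This sketch assumes the copied cells arrive as tab-separated text with a heading row and, to stay reproducible, substitutes a string for the clipboard and read.delim() for read.DIF(). On Windows, read.delim("clipboard") should read the real clipboard text instead.

```r
# Stand-in for the clipboard: copied cells with a heading row,
# tab-separated (an assumption for this sketch)
clip <- "Subject\tCode\n1\t1\n2\t0\n3\t1\n4\t0\n5\t1\n6\t0"

# read.delim() treats the first line as column names by default
newDF <- read.delim(text = clip)
class(newDF)        # "data.frame"
names(newDF)        # "Subject" "Code"
mean(newDF$Code)    # 0.5
```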

Page 6

An Explanation of Data Frames

• The basic building block of data in R is the “vector”: an ordered set of “scalar” values that all have the same mode
  – A scalar is just a single element or value, such as the number 5
  – R vectors can have any number of elements, including just one; R has no separate scalar type, so a scalar is stored as a vector of length one
  – The mode of a vector can be numeric, character, or logical, among others

• As in Excel spreadsheets and other data programs such as SPSS, data in R can be arranged in two dimensions, with a certain number of rows and a certain number of columns; a vector given two dimensions is called a matrix

• Because it is still a vector, a matrix must contain elements that all have the same mode, so it cannot always hold a typical spreadsheet or data set, where different columns often hold different types

• This is where the data frame comes in: A data frame is a list of vectors, all of the same length, each of which can be a different type
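These points can be checked directly at the R console. This sketch uses small made-up values to show that a scalar is a length-one vector, that mixing modes in a vector or a matrix coerces everything to a single mode, and that a data frame is a list whose columns keep their own modes. stringsAsFactors = FALSE is set so the text column stays character in older versions of R as well.

```r
# A scalar is simply a vector of length one
x <- 5
length(x)          # 1
mode(x)            # "numeric"

# Values of different modes combined into one vector are coerced to one mode
v <- c(1, "a", TRUE)
mode(v)            # "character"

# A matrix is a vector with two dimensions (here 3 rows, 2 columns)
m <- matrix(1:6, nrow = 3, ncol = 2)
dim(m)             # 3 2

# Binding a numeric column and a text column forces the whole matrix to character
mixedM <- cbind(Subject = 1:3, Name = c("Ann", "Bo", "Cy"))
mode(mixedM)       # "character"

# A data frame is a list of equal-length vectors, each keeping its own mode
people <- data.frame(Subject = 1:3,
                     Name = c("Ann", "Bo", "Cy"),
                     Passed = c(TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)
is.list(people)        # TRUE
sapply(people, mode)   # Subject "numeric", Name "character", Passed "logical"
```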

Page 7

read.DIF also works with files

> setwd(“C:/DataMining/DataFiles")> newDF =

read.DIF(“excelExport.dif", transpose=TRUE,header=TRUE)

> class(newDF)[1] "data.frame"> attach(newDF)

# Note that Excel, DIF, and R don't always agree on data
# formats. For example, currency in Excel will not export to
# integer values in R, so remove as much formatting as possible.
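One way to act on this advice after the fact, sketched under an assumption: that a currency-formatted column reaches R as text such as "$1,200.00" rather than as numbers. The column name Amount and its values are hypothetical. The dollar signs and thousands separators are stripped with gsub(), and the result is converted with as.numeric().

```r
# Hypothetical column as it might look if Excel currency formatting
# survived the export: the values are text, not numbers
sales <- data.frame(Amount = c("$1,200.00", "$85.50", "$3,000.00"),
                    stringsAsFactors = FALSE)
class(sales$Amount)   # "character"

# Remove the dollar sign and thousands separators, then convert to numbers
sales$Amount <- as.numeric(gsub("[$,]", "", sales$Amount))
sum(sales$Amount)     # 4285.5
```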

Page 8

Demonstrating Mastery

• Create or find data in an Excel spreadsheet and export it as a CSV file

• Import data into R from a CSV or TXT file

• Export a data frame into a CSV file

• Read the CSV file into Excel

• Advanced: Use data interchange format (“DIF”) to exchange files between R and Excel

• Advanced: Use a data frame in R to store data obtained from a spreadsheet
