Merge multiple CSV files into a single data frame using R

Post on 09-Jan-2017


Merge Multiple Files into a Single Data Frame Using R

Yogesh Khandelwal

Problem Description

• The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor and the ID number for each monitor is contained in the file name. For example, data for monitor 200 is contained in the file "200.csv".

• Data Source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip

Variable Name

Variables in file

• Date: the date of observation in YYYY-MM-DD format (year-month-day); data type: factor

• sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter); data type: num

• nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter); data type: num

• ID: the location (monitor) ID; data type: int

Before we start we should know

• Functions in R

• How to merge data files

Functions in R

Functions are created using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class “function”.

f <- function(<arguments>) {
    ## Do something interesting
}

• Functions in R are “first class objects”, which means that they can be treated much like any other R object. Importantly:
• Functions can be passed as arguments to other functions.
• Functions can be nested, so that you can define a function inside of another function.
• The return value of a function is the last expression in the function body to be evaluated.
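The points above can be illustrated with a minimal sketch. The names apply_twice, add_one, make_power and cube are hypothetical and chosen only for this example.

```r
# Passing a function as an argument: apply_twice() calls f on x, then on the result
apply_twice <- function(f, x) {
    f(f(x))
}
add_one <- function(x) x + 1
apply_twice(add_one, 1)   # 3

# Nesting: make_power() defines an inner function and returns it
make_power <- function(n) {
    pow <- function(x) {
        x^n   # last expression evaluated, so this is pow()'s return value
    }
    pow       # last expression evaluated, so make_power() returns the function
}
cube <- make_power(3)
cube(2)       # 8
class(cube)   # "function"
```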

Functions (contd.)

• For example, the slide shows a function annotated with its three parts: function name, function definition, and function call.
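The three parts named above might look like the following. This is a stand-in example, not the one pictured on the slide, and above10 is a hypothetical name.

```r
# Function name: above10
# Function definition: keep only the elements of x that are greater than 10
above10 <- function(x) {
    x[x > 10]
}

# Function call
above10(c(5, 12, 20, 3))   # 12 20
```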

Our objective

• How can we merge a number of files into a single data frame?

• How can we apply the same function to different files efficiently?

How to merge two different files?

• A number of options are available, such as:

1. Use the merge() function
2. Use rbind(), cbind(), etc.
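As a rough illustration of how these options differ, the sketch below uses two small made-up data frames. The values are invented; only the column names follow the dataset described earlier.

```r
a <- data.frame(ID = c(1, 2, 3), sulfate = c(3.1, 4.2, 5.0))
b <- data.frame(ID = c(2, 3, 4), nitrate = c(0.5, 0.9, 1.4))

# merge(): join on the shared ID column (keeps only matching IDs by default)
joined <- merge(a, b, by = "ID")                          # rows for ID 2 and 3

# rbind(): stack rows; both data frames need the same columns
stacked <- rbind(a, data.frame(ID = 4, sulfate = 2.7))    # 4 rows

# cbind(): place columns side by side; both need the same number of rows
side <- cbind(a, flag = c(TRUE, FALSE, TRUE))             # 3 columns
```

Since the 332 monitor files all share the same columns, stacking rows with rbind() is the natural fit for this problem.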

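How to merge a number of files into a single data frame

• Approach 1

# List every file in the "specdata" folder, with its path
files <- list.files("specdata", full.names = TRUE)
dat <- NULL
# Read each of the 332 files and append its rows to dat
for (i in seq_along(files)) {
    dat <- rbind(dat, read.csv(files[i]))
}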

• We can then run various commands on the merged data frame as needed, for example:
1. str(dat)
2. head(dat)
3. tail(dat), etc.

Notes: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE, the file names (rather than paths) are returned.
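One possible alternative to growing a data frame inside a loop, which is not taken from the slides, is to read all files into a list with lapply() and bind them in a single rbind() call. The sketch below builds a tiny stand-in folder with made-up values so that it runs without the real dataset; demo_dir and the zero-padded file names are assumptions for illustration.

```r
# Build a small stand-in for the "specdata" folder (made-up values)
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
for (i in 1:3) {
    write.csv(
        data.frame(Date = "2010-01-01", sulfate = i, nitrate = i / 10, ID = i),
        file.path(demo_dir, sprintf("%03d.csv", i)),
        row.names = FALSE
    )
}

files <- list.files(demo_dir, full.names = TRUE)
# Apply read.csv to every file, then bind the resulting list of data frames once
dat <- do.call(rbind, lapply(files, read.csv))
str(dat)
```

This applies the same function to every file without an explicit counter, and it avoids copying the growing data frame on each iteration.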

How to handle missing values in R?

• In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense).
• Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there's an underlying pattern or cause for their missingness.

• For example:
> X <- c(1, 2, NA, 4)
> Y <- c(NA, 2, 3, 1)
> X + Y
[1] NA  4 NA  5

• Multiple options are available in R to handle NA values, such as:
• is.na()
• Setting na.rm = TRUE as a function argument

> mean(X)
[1] NA
> mean(X, na.rm = TRUE)
[1] 2.333333
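The is.na() option can be sketched in the same way. The data frame df below is made up for illustration, and complete.cases() is a base R helper that the slides do not mention.

```r
X <- c(1, 2, NA, 4)

is.na(X)             # FALSE FALSE  TRUE FALSE
sum(is.na(X))        # 1 missing value
X[!is.na(X)]         # 1 2 4
mean(X[!is.na(X)])   # same result as mean(X, na.rm = TRUE)

# For a data frame, complete.cases() flags rows with no NA in any column
df <- data.frame(sulfate = c(3.1, NA, 5.0), nitrate = c(0.5, 0.9, NA))
df[complete.cases(df), ]   # keeps only the first row
```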

Apply what we learn to our dataset

Function definition (not captured in this transcript)

Function call

pollutantmean('specdata', 'nitrate', 1:10)
[1] 0.7976266
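Since the definition itself is not in the transcript, here is one way such a function might be written, combining the file-merging and NA-handling ideas above. This is a sketch, not the author's code. The argument names, the default id = 1:332, and the zero-padded file naming ("001.csv", "002.csv", ...) are assumptions, and the demo folder contains made-up values rather than the real data.

```r
# Hypothetical sketch of pollutantmean(); not the definition shown on the slide
pollutantmean <- function(directory, pollutant, id = 1:332) {
    # Assumption: each file is named by its zero-padded monitor ID, e.g. "001.csv"
    files <- file.path(directory, sprintf("%03d.csv", id))
    # Read the requested files and stack them into one data frame
    dat <- do.call(rbind, lapply(files, read.csv))
    # Average the chosen pollutant column, ignoring NA values
    mean(dat[[pollutant]], na.rm = TRUE)
}

# Demo on a tiny made-up folder so the sketch runs without the real dataset
demo_dir <- tempfile("pollutant_demo")
dir.create(demo_dir)
write.csv(data.frame(Date = c("2010-01-01", "2010-01-02"),
                     sulfate = c(1, NA), nitrate = c(2, 4), ID = 1L),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2010-01-01", "2010-01-02"),
                     sulfate = c(3, 5), nitrate = c(NA, 6), ID = 2L),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutantmean(demo_dir, "nitrate", 1:2)   # (2 + 4 + 6) / 3 = 4
```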

Thank You!!
