R basics Sagun Baijal Monday, October 05, 2015

Upload: sagun-baijal

Post on 12-Apr-2017

Page 1: R basics

R basics

Sagun Baijal

Monday, October 05, 2015

Page 2: R basics

Overview

- What is R?
- R’s correspondence with S
- R features
- Useful URLs
- Installing R, RStudio
- R and Statistics
- Using R - Getting Started

Page 3: R basics

What is R?

- R is a language and environment for Statistical Computing and Graphics.
- It is based on S - a language earlier developed at Bell Labs.
- R features:
  - Cross-platform
  - Free/Open Source Software
  - Package-based, rich repository of all sorts of packages
  - Strong graphic capabilities
  - Strong user, developer communities, active development
- Useful URLs:
  - http://cran.r-project.org
  - http://www.r-project.org/doc/bib/R-books.html
  - http://www.r-bloggers.com
  - http://cookbook-r.com
  - http://stats.stackexchange.com/
  - http://www.statmethods.net/
  - https://www.rstudio.com/

Page 4: R basics

Contd. . .

- Useful R books:
  - R in Action by Robert I. Kabacoff. Pub.: Manning Publications
  - Statistical Analysis with R by John M. Quick. Pub.: PACKT Publishing
  - Many more R e-books available through Books24X7 (available to CDAC through MCIT consortium).

Page 5: R basics

Contd. . .

- Installing R:
  - R can be downloaded from the Comprehensive R Archive Network (CRAN); the URL is listed under Useful URLs.
  - Latest release at the time of writing is 3.2.2.
  - Releases are available for GNU/Linux, Windows and Mac.
  - For GNU/Linux:
    - Debian, Ubuntu like: follow the instructions given on https://cran.r-project.org/bin/linux/debian/ or https://cran.r-project.org/bin/linux/ubuntu/, then run

          sudo apt-get install r-base r-base-dev

    - RHEL like: follow the instructions given on https://cran.r-project.org/bin/linux/redhat/, then run

          sudo yum install R

  - For Windows: follow the instructions given on https://cran.r-project.org/bin/windows/base/; download the exe for the base package and RTools.
- Installing RStudio: RStudio is an IDE for R, available for GNU/Linux, Windows and Mac. It can be downloaded for each platform from the RStudio URL listed under Useful URLs.

Page 6: R basics

Contd. . .

- R and statistics:
  - A comprehensive statistical platform providing all sorts of data analytics techniques.
  - Strong graphics capabilities to visualize complex data.
  - Designed to support interactive data analysis and exploration.
  - Capable of reading data from a variety of sources.
  - Facility to program new statistical methods and packages.
- Some disadvantages too. . .
  - Objects are stored in primary memory, which may impose performance bottlenecks with large datasets.
  - No built-in dynamic or 3D graphics, but external packages such as plot3D and scatterplot3d are available.
  - Similarly, no built-in support for web-based processing; it can be done through third-party packages.
  - Functionality is scattered among packages.

Page 7: R basics

Using R - Getting started

- Launch R Interface/RStudio depending on your platform.
- Utility commands/functions:
  - setwd() - sets working directory.

      setwd("C:/RDemo")

  - getwd() - gets current working directory.

      getwd()

      ## [1] "C:/RDemo"

  - dir() - lists the contents of current working directory.

      dir()

      ##  [1] "fdata.csv"               "Introduction-to-R.html"
      ##  [3] "Introduction-to-R.pdf"   "Introduction-to-R.Rmd"
      ##  [5] "Introduction-to-R_files" "R-basics.html"
      ##  [7] "R-basics.pdf"            "R-basics.Rmd"
      ##  [9] "R-introduction-1.pdf"    "R-introduction-2.pdf"
      ## [11] "R-introduction-3.pdf"    "R-introduction-4.pdf"
      ## [13] "R-introduction.html"     "R-introduction.pdf"
      ## [15] "R-introduction.Rmd"      "test.R"
      ## [17] "test1.R"

  - ls() - lists names of objects in R environment

      ls()

      ##  [1] "µ" "age" "airquality" "allTables" "alpha"
      ##  [6] "bad" "cells" "ci" "cnames" "data"
      ## [11] "dataframe" "dataFrame" "datamatrix" "dataMatrix" "diabetes"
      ## [16] "distxy" "g" "good" "hcluster" "hg19"
      ## [21] "HInvData" "iris" "km" "kmeansObj" "kmeansObj2"
      ## [26] "m" "m1" "m2" "mat" "metadata"
      ## [31] "mtcars" "mu0" "mu1" "mu2" "n"
      ## [36] "n_new" "n_old" "n1" "n2" "new.y"
      ## [41] "p" "patientData" "patientID" "pow" "query"
      ## [46] "res" "result" "rnames" "s" "sd"
      ## [51] "sd_new" "sd_old" "sd1" "sd2" "sigma"
      ## [56] "sp" "status" "temp" "temp1" "temp2"
      ## [61] "ts" "ucscDb" "x" "X_new" "X_old"
      ## [66] "x1" "x2" "y" "z"

Page 8: R basics

Contd. . .

- help.start() - provides general help.
- help("foo") or ?foo - help on function "foo". For ex. help("mean") or ?mean.
- help.search("foo") or ??foo - search for string "foo" in help system. For ex. help.search("mean") or ??mean
- example("foo") - shows examples of function "foo".

    example("mean")

    ##
    ## mean> x <- c(0:10, 50)
    ##
    ## mean> xm <- mean(x)
    ##
    ## mean> c(xm, mean(x, trim = 0.10))
    ## [1] 8.75 5.50

- data() - lists all example datasets in currently loaded packages.
- library() - lists all installed packages.

Page 9: R basics

Contd. . .

- data(foo) - loads dataset "foo" in R. For ex. data(mtcars)
- library(foo) - loads package "foo" in R. For ex. library(plyr).
- rm(objectlist) - removes one or more objects from R workspace.
- options() - shows/sets current options for workspace.
- history(#) - lists last # commands. Default 25.
- install.packages("foo") - installs package "foo". For ex. install.packages("reshape2").
- help(package="package-name") - provides brief description of package, an index of functions and datasets in package.
- print(x) or x - prints object 'x' on terminal.
- q() - quits current R session.
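
The workspace commands above can be tried together in a short session. This is a minimal sketch; the object names demo_value and demo_label are made up for illustration.

```r
# Create two objects (hypothetical names), list the workspace, remove one.
demo_value <- 42
demo_label <- "answer"

objs_before <- ls()   # names of objects currently in the environment
rm(demo_label)        # remove a single object
objs_after <- ls()

# The only name that disappeared is the one passed to rm().
print(setdiff(objs_before, objs_after))
print(exists("demo_value"))   # demo_value is still there
```

Calling rm(list = ls()) would clear every object in the workspace, so it is best used deliberately.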

Page 10: R basics

Using R - Data types

- Five basic types in R are - character, numeric, integer, complex, logical (TRUE/FALSE).
- Common data objects are - vector, matrix, list, factor, data frame, table.
- Creating and assigning to a variable:

    x<-1

- Checking the type of variable:

    class(x)

    ## [1] "numeric"
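
As a quick check of the five basic types, the sketch below asks class() about one example value of each. The list name type_examples is only for illustration.

```r
# One example value per basic type; the list names are the expected classes.
type_examples <- list(
  character = "R",
  numeric   = 3.14,
  integer   = 2L,      # the L suffix requests an integer, plain 2 is numeric
  complex   = 2+4i,
  logical   = TRUE
)

# vapply() calls class() on each element and returns a character vector.
observed <- vapply(type_examples, class, character(1))
print(observed)
```

Note that a bare number such as 1 is numeric, which is why class(x) reports "numeric" for x<-1.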

Page 11: R basics

Contd. . .

- Printing a variable:

    x #auto-printing

    ## [1] 1

    print(x) #explicit printing

    ## [1] 1

- Creating Vector: contains objects of same class.

    x<-c(1,2,3) #using c() function
    y<-vector("logical", length=10) #using vector() function
    length(x) #length of vector x

    ## [1] 3
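
Because a vector holds a single class, mixing types in c() makes R coerce the elements to a common class. The following sketch illustrates this; the variable names are illustrative only.

```r
# Mixing numbers, strings and logicals: everything becomes character.
mixed_chr <- c(1, "a", TRUE)
print(mixed_chr)          # "1" "a" "TRUE"
print(class(mixed_chr))   # "character"

# Mixing numbers and logicals: TRUE/FALSE become 1/0.
mixed_num <- c(1.5, TRUE, FALSE)
print(mixed_num)          # 1.5 1.0 0.0

# vector() creates a vector filled with the type's default value.
empty_logical <- vector("logical", length = 3)
print(empty_logical)      # FALSE FALSE FALSE
```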

Page 12: R basics

Contd. . .

- Vector operations: Various arithmetic operations can be performed member-wise.

    y<-c(4,5,6)
    5*x #multiplication by a scalar

## [1] 5 10 15

x+y #addition of two vectors

## [1] 5 7 9

x*y #multiplication of two vectors

## [1] 4 10 18

x^y #x to the power y

## [1] 1 32 729
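
To make "member-wise" concrete, the sketch below computes the product of two vectors with an explicit loop and compares it with x*y. It is only an illustration; in practice the vectorised form is preferred.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Explicit loop: multiply the i-th elements one at a time.
prod_loop <- numeric(length(x))
for (i in seq_along(x)) {
  prod_loop[i] <- x[i] * y[i]
}

# The vectorised expression gives the same result in one step.
print(prod_loop)
print(identical(prod_loop, x * y))
```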

Page 13: R basics

Contd. . .

- Creating Matrix: Two-dimensional array having elements of same class.

    m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3) #using matrix() function.
    m

    ##      [,1] [,2] [,3]
    ## [1,]    1    3   12
    ## [2,]    2   11   13

    dim(m) #dimensions of matrix m

    ## [1] 2 3

    attributes(m) #attributes of matrix m

    ## $dim
    ## [1] 2 3

Page 14: R basics

Contd. . .

- By default, elements in a matrix are filled by column. The "byrow" argument of matrix() can be used to fill elements by row.

    m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3, byrow = TRUE)
    m

    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]   11   12   13
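
One way to see how the two fill orders relate: filling a 2 x 3 matrix by row gives the same result as filling a 3 x 2 matrix by column and transposing it. A small sketch, with illustrative variable names:

```r
v <- c(1, 2, 3, 11, 12, 13)

by_col <- matrix(v, nrow = 2, ncol = 3)                # default: column-wise
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)  # row-wise

# Column-wise fill of the transposed shape, then transpose back.
via_transpose <- t(matrix(v, nrow = 3, ncol = 2))

print(by_col[1, ])                       # 1 3 12
print(by_row[1, ])                       # 1 2 3
print(identical(by_row, via_transpose))  # TRUE
```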

Page 15: R basics

Contd. . .

- cbind-ing and rbind-ing: By using cbind() and rbind() functions

    x<-c(1,2,3)
    y<-c(11,12,13)
    cbind(x,y)

    ##      x  y
    ## [1,] 1 11
    ## [2,] 2 12
    ## [3,] 3 13

    rbind(x,y)

    ##   [,1] [,2] [,3]
    ## x    1    2    3
    ## y   11   12   13

Page 16: R basics

Contd. . .

- Matrix operations/functions:

    p<-3*m #multiplication by a scalar
    n<-matrix(c(4,5,6,14,15,16), nrow=2,ncol=3)
    q<-m+n #addition of two matrices
    o<-matrix(c(4,5,6,14,15,16), nrow=3,ncol=2)
    r<-m %*% o #matrix multiplication by using %*%
    mdash<-t(m) #transpose of matrix
    s<-matrix(c(4,5,6,14,15,16,24,25,26), nrow=3,ncol=3,
              byrow=TRUE)
    s_det<-det(s) #determinant of s
    m_row_sum<-rowSums(m)
    m_col_sum<-colSums(m)

Page 17: R basics

Contd. . .

    p

    ##      [,1] [,2] [,3]
    ## [1,]    3    6    9
    ## [2,]   33   36   39

    q

    ##      [,1] [,2] [,3]
    ## [1,]    5    8   18
    ## [2,]   16   26   29

    r

    ##      [,1] [,2]
    ## [1,]   32   92
    ## [2,]  182  542

Page 18: R basics

Contd. . .

    mdash

    ##      [,1] [,2]
    ## [1,]    1   11
    ## [2,]    2   12
    ## [3,]    3   13

s_det

## [1] 1.110223e-14

m_row_sum

## [1] 6 36

m_col_sum

## [1] 12 14 16
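
A note on the s_det value above: the rows of s are evenly spaced, so they are linearly dependent and the exact determinant is 0. The tiny non-zero number printed is floating-point round-off. The sketch below shows the dependency and a tolerance-based check; the threshold 1e-8 is an arbitrary choice for illustration.

```r
s <- matrix(c(4, 5, 6, 14, 15, 16, 24, 25, 26), nrow = 3, ncol = 3,
            byrow = TRUE)

# Consecutive row differences are identical, so row3 = 2*row2 - row1.
diff_12 <- s[2, ] - s[1, ]
diff_23 <- s[3, ] - s[2, ]
print(diff_12)
print(diff_23)

# Compare against a small tolerance rather than testing det(s) == 0.
is_singular <- abs(det(s)) < 1e-8
print(is_singular)
```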

Page 19: R basics

Contd. . .

- List: A special type of vector containing elements of different classes

    x<-list(1,"p",TRUE,2+4i) #using list() function
    x

    ## [[1]]
    ## [1] 1
    ##
    ## [[2]]
    ## [1] "p"
    ##
    ## [[3]]
    ## [1] TRUE
    ##
    ## [[4]]
    ## [1] 2+4i

Page 20: R basics

Contd. . .

- Factor: Represents categorical data. Can be ordered or unordered.

    status<-c("low","high","medium","high","low")
    x<-factor(status, ordered=TRUE,
              levels=c("low","medium","high")) #using factor() function
    x

    ## [1] low    high   medium high   low
    ## Levels: low < medium < high

- 'levels' argument is used to set the order of levels.
- First level forms the baseline level.
- Without any order, levels are called nominal. Ex. - Type1, Type2, . . .
- With order, levels are called ordinal. Ex. - low, medium, . . .
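
The practical effect of ordering can be seen by comparing factor values and looking at the integer codes behind the levels. A minimal sketch using the same status data:

```r
status <- c("low", "high", "medium", "high", "low")
x <- factor(status, ordered = TRUE, levels = c("low", "medium", "high"))

# Each value is stored as the position of its level: low=1, medium=2, high=3.
codes <- as.integer(x)
print(codes)

# Ordered factors support comparisons against a level.
at_least_medium <- x >= "medium"
print(at_least_medium)

# table() counts observations per level, in level order.
counts <- table(x)
print(counts)
```

With an unordered factor, the comparison x >= "medium" would not be meaningful and R returns NA with a warning.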

Page 21: R basics

Contd. . .

- Data frame: Used to store tabular data. Can contain different classes

    student_id<-c(1,2,3)
    student_names<-c("Ram","Shyam","Laxman")
    position<-c("First","Second","Third")
    data<-data.frame(student_id,student_names,position) #using data.frame() function
    data

    ##   student_id student_names position
    ## 1          1           Ram    First
    ## 2          2         Shyam   Second
    ## 3          3        Laxman    Third

data$student_id #accessing a particular column

## [1] 1 2 3

Page 22: R basics

Contd. . .

nrow(data) #no. of rows in data

## [1] 3

ncol(data) #no. of columns in data

## [1] 3

names(data) #column names of data

## [1] "student_id" "student_names" "position"
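
Whether character columns become factors depends on the stringsAsFactors argument of data.frame(). Its default was TRUE in the R 3.x releases these slides were written for and changed to FALSE in R 4.0.0, so setting it explicitly avoids surprises. A short sketch with illustrative names:

```r
student_names <- c("Ram", "Shyam", "Laxman")

# Keep strings as plain character vectors.
df_chr <- data.frame(student_names, stringsAsFactors = FALSE)
# Convert strings to factors (the old default behaviour).
df_fac <- data.frame(student_names, stringsAsFactors = TRUE)

print(class(df_chr$student_names))   # "character"
print(class(df_fac$student_names))   # "factor"
print(levels(df_fac$student_names))  # levels are sorted alphabetically
```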

Page 23: R basics

Using R - Control structures

- R provides the usual control structures: if-else, for, while, repeat, break, next, return.
- Mainly used within functions/scripts.

    x<-5
    if(x > 7) {   #if-else structure
      y<-TRUE
    } else {
      y<-FALSE
    }
    y

    ## [1] FALSE

    for(i in 1:10) #for loop
      print(i)

    ## [1] 1
    ## [1] 2
    ## [1] 3
    ## [1] 4
    ## [1] 5
    ## [1] 6
    ## [1] 7
    ## [1] 8
    ## [1] 9
    ## [1] 10

Page 24: R basics

Contd. . .

    count<-0
    while(count < 10) #while loop
      count<-count+1
    count

    ## [1] 10

- repeat is used to create an infinite loop. It is normally terminated through a call to break.
- next is used to skip an iteration in a loop.
- return is used to return a value from a function.
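
The three keywords described above can be seen working together in a small sketch. The names odd_values and sign_label are invented for this example.

```r
# repeat runs forever unless break is called; next skips to the next pass.
i <- 0
odd_values <- c()
repeat {
  i <- i + 1
  if (i > 10) break          # leave the loop once i passes 10
  if (i %% 2 == 0) next      # skip even numbers
  odd_values <- c(odd_values, i)
}
print(odd_values)            # 1 3 5 7 9

# return() exits a function early with the given value.
sign_label <- function(v) {
  if (v < 0) return("negative")
  "non-negative"             # last expression is returned otherwise
}
print(sign_label(-3))
print(sign_label(4))
```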

Page 25: R basics

Using R - looping functions

- These functions can be used to loop over various types of objects.
- lapply - loops over a list and evaluates a function on each element.
- sapply - same as lapply but tries to simplify the result.
- apply - applies a function over the margins of an array
- tapply - applies a function over the subsets of a vector

    x<-list(a=1:5,b=rnorm(20)) #b is random, so its sum differs on every run
    lapply(x,sum) #lapply returns a list

    ## $a
    ## [1] 15
    ##
    ## $b
    ## [1] -1.487833

Page 26: R basics

Contd. . .

    x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
    # MARGIN=1 for rows, MARGIN=2 for columns
    apply(x,MARGIN=1,FUN=sum)

    ## [1] 6 36

    y<-c(rnorm(20),runif(20),rnorm(20,1))
    f<-gl(3,20) #generate factor levels as per given pattern
    tapply(y,f,mean)

    ##          1          2          3
    ## 0.05429977 0.51238618 0.87080628
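
Since the examples above use random numbers, their output changes on every run. The sketch below repeats the same four functions on fixed data so the results can be checked by hand; all variable names are illustrative.

```r
x <- list(a = 1:5, b = c(2, 4, 6))

as_list <- lapply(x, sum)   # list with elements a and b
as_vec  <- sapply(x, sum)   # simplified to a named numeric vector
print(as_vec)               # a = 15, b = 12

m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE)
col_totals <- apply(m, MARGIN = 2, FUN = sum)   # one sum per column
print(col_totals)           # 12 14 16

scores <- c(1, 2, 3, 4)
groups <- c("a", "a", "b", "b")
group_means <- tapply(scores, groups, mean)     # mean of each group
print(group_means)          # a = 1.5, b = 3.5
```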

Page 27: R basics

Using R - Subsetting

- Subsetting means extracting a sub-segment of data from an R object.
- Important while working with large datasets.
- There are various operators:
  - [ returns an object of the same class as the original; generally used on a vector or matrix.
  - [[ extracts a single element of a list or data frame.
  - $ extracts an element of a list or data frame by name.

    x<-c(1,2,3,4)
    x[2]

## [1] 2

x[1:3]

## [1] 1 2 3
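
Besides positions and ranges, [ also accepts negative and logical indices. A brief sketch on the same vector:

```r
x <- c(1, 2, 3, 4)

without_first <- x[-1]      # negative index drops that position
print(without_first)        # 2 3 4

greater_than_2 <- x[x > 2]  # logical index keeps elements where TRUE
print(greater_than_2)       # 3 4

picked <- x[c(1, 4)]        # a vector of positions selects several elements
print(picked)               # 1 4
```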

Page 28: R basics

Contd. . .

- Subsetting a matrix:

    x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
    x[1,2]

## [1] 2

x[1,]

## [1] 1 2 3

x[,2]

## [1] 2 12

Page 29: R basics

Contd. . .

- Subsetting a list:

    x<-list(a=1,b="p",c=TRUE,d=2+4i)
    x[[1]]

## [1] 1

x$d

## [1] 2+4i

x[["c"]]

## [1] TRUE

x["b"]

## $b
## [1] "p"
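
The difference between [ and [[ is easiest to see from the class of what comes back: [ keeps the container, [[ returns the element itself. The sketch below also shows the drop argument of [ for matrices, which keeps a single row as a matrix. Variable names are illustrative.

```r
x <- list(a = 1, b = "p", c = TRUE, d = 2+4i)

sub_list <- x["b"]      # still a list, of length 1
element  <- x[["b"]]    # the character value itself
print(class(sub_list))  # "list"
print(class(element))   # "character"

m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE)
row_vec <- m[1, ]                 # dimensions dropped: plain vector
row_mat <- m[1, , drop = FALSE]   # dimensions kept: 1 x 3 matrix
print(dim(row_vec))     # NULL
print(dim(row_mat))     # 1 3
```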

Page 30: R basics

Contd. . .

- Subsetting a data frame

    data[1,]

    ##   student_id student_names position
    ## 1          1           Ram    First

    data$student_names

    ## [1] Ram    Shyam  Laxman
    ## Levels: Laxman Ram Shyam

    data[data$position=="Second",]

    ##   student_id student_names position
    ## 2          2         Shyam   Second

- Using logical ANDs and ORs

    data[data$student_id>=2 & data$position=="Third",]

    ##   student_id student_names position
    ## 3          3        Laxman    Third
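
The example above uses & (AND). A matching sketch for | (OR) follows, together with subset(), which is one alternative way to filter rows and pick columns. The data frame is rebuilt here so the block runs on its own; stringsAsFactors = FALSE is an assumption made to keep the names as plain strings.

```r
data <- data.frame(
  student_id    = c(1, 2, 3),
  student_names = c("Ram", "Shyam", "Laxman"),
  position      = c("First", "Second", "Third"),
  stringsAsFactors = FALSE
)

# OR: keep rows where either condition holds (rows 1 and 3).
either <- data[data$student_id == 1 | data$position == "Third", ]
print(either)

# subset() filters rows and selects columns in one call (rows 2 and 3).
picked <- subset(data, student_id >= 2, select = c(student_names, position))
print(picked)
```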

Page 31: R basics

Using R - Functions

- Created using the function() directive.
- Can be passed as arguments to other functions. Can be nested.
- Return value is the last expression to be evaluated inside function body.
- Have named arguments with default values.
- Some arguments can be missing during function calls.

    add<-function(a=1,b=2,c=3) {
      s <- a+b+c
      print(s)
    }
    add()

## [1] 6

add(10,11,12)

## [1] 33

add(10)

## [1] 15
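
The points about named arguments, missing arguments and passing functions around can be sketched as follows. This variant of add() returns the sum instead of printing it; apply_twice and first_only are hypothetical helpers introduced only for this example.

```r
# Last evaluated expression is the return value.
add <- function(a = 1, b = 2, c = 3) {
  a + b + c
}

only_c    <- add(c = 10)         # a and b keep defaults: 1 + 2 + 10
reordered <- add(b = 5, a = 0)   # named arguments can come in any order
print(only_c)                    # 13
print(reordered)                 # 8

# A function passed as an argument to another function.
apply_twice <- function(f, v) f(f(v))
doubled_twice <- apply_twice(function(v) v * 2, 3)
print(doubled_twice)             # 12

# b is never used, so calling without it works (arguments are lazy).
first_only <- function(a, b) a * 2
print(first_only(5))             # 10
```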

Page 32: R basics

R Source files

- Should be saved/created with .R extension.
- Can be used to store functions, commands required to be executed sequentially etc.
- source() function used to load such R scripts into R workspace.

    source("C:/RDemo/test.R")
    add()

## [1] 6

Page 33: R basics

Contd. . .

source("C:/RDemo/test1.R", echo=T)

##
## > x <- 1
##
## > y <- 2
##
## > x + y
## [1] 3

source("C:/RDemo/test1.R", print.eval=T)

## [1] 3
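
To try source() without the C:/RDemo files, a script can be written to a temporary file first. This is a sketch under that assumption; script_path and total are illustrative names.

```r
# Write a three-line script to a temporary .R file.
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- 2", "total <- x + y"), script_path)

# source() runs the file; by default its objects land in the workspace.
source(script_path)
print(total)          # 3

unlink(script_path)   # remove the temporary file
```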

Page 34: R basics

References

- http://cran.r-project.org
- http://www.r-project.org/doc/bib/R-books.html
- http://www.r-bloggers.com
- http://cookbook-r.com
- http://stats.stackexchange.com/
- http://www.statmethods.net/
- https://www.rstudio.com/
- https://github.com/DataScienceSpecialization/courses/tree/master/02_RProgramming