introduction to contributed packages in r department of statistical sciences and operations research...

Post on 17-Dec-2015

216 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Introduction to Contributed Packages in R

Department of Statistical Sciences and Operations ResearchComputation Seminar SeriesSpeaker: Edward BooneEmail: elboone@vcu.edu

What is R?

The R statistical programming language is a free open source package based on the S language developed by Bell Labs.

The language is very powerful for writing programs. Many statistical functions are already built in. Contributed packages expand the functionality to

cutting edge research. Since it is a programming language, generating

computer code to complete tasks is required.

Getting Started

Where to get R? Go to www.r-project.org Downloads: CRAN Set your Mirror: Anyone in the USA is fine. Select Windows 95 or later. Select base. Select R-2.4.1-win32.exe

The others are if you are a developer and wish to change the source code.

Getting Started

The R GUI?

Getting Started

Opening a script. This gives you a script window.

Getting Started

Submitting a program:

Use button

Right mouse click and run selection.

Submit Selection

Getting Started

Basic assignment and operations. Arithmetic Operations:

+, -, *, /, ^ are the standard arithmetic operators. Matrix Arithmetic.

* is element wise multiplication %*% is matrix multiplication

Assignment To assign a value to a variable use “<-”

Getting Started

How to use help in R? R has a very good help system built in. If you know which function you want help with

simply use ?_______ with the function in the blank.

Ex: ?hist. If you don’t know which function to use, then use

help.search(“_______”). Ex: help.search(“histogram”).

Importing Data

How do we get data into R? Remember we have no point and click… First make sure your data is in an easy to

read format such as CSV (Comma Separated Values).

Use code: D <- read.table(“path”,sep=“,”,header=TRUE)

Working with data.

Accessing columns. D has our data in it…. But you can’t see it

directly. To select a column use D$column.

Working with data.

Subsetting data. Use a logical operator to do this.

==, >, <, <=, >=, <> are all logical operators. Note that the “equals” logical operator is two = signs.

Example: D[D$Gender == “M”,] This will return the rows of D where Gender is “M”. Remember R is case sensitive! This code does nothing to the original dataset. D.M <- D[D$Gender == “M”,] gives a dataset with the

appropriate rows.

Source Files

Source files allows you to store all of your created functions in a single file and have all those functions available to you.

To load a self created library use:

source(Path)

Don’t forget that \ in the path needs to be replaced with \\

Libraries

In order to keep R’s memory footprint small, additional functionality is stored in libraries.

These libraries can be called through the GUI or scripts.

Beware that some contributed packages may conflict with some libraries.

Contributed Packages

Since R is open source and the developers are well organized, developing and finding contributed packages is easy.

Currently there are 964 contributed packages.

These range from wavelets, financial mathematics to spatial data analysis.

Contributed Packages

One popular library is lattice.

Contributed Packages

You can install contributed packages using the GUI.

Contributed Packages

You can install the package by selecting it from the list.

Note: Installing a package does not make it immediately available for use.

You still need to use the library() statement to make the functionality available to you.

library(lattice)

Help on contributed packages Once a contributed package is loaded you

can access the help for the package and a list of functions available in the package by:

library(help=“lattice”)

The CircStats Package Many times data may come in a circular format. For example the direction of migration or flight of birds from

their nest. The data is an angle not a “linear” measurement. The data can only go between 0 and 2

The CircStats Package Use the CircStats Package.

library(CircStats)

Consider the following: data <- runif(50, 0, pi)

mean.dir <- circ.mean(data)

mean.dir

[1] 1.446502

The CircStats Package Randomly generate data from a Von Mises distribution

data.vm <- rvm(100, 0, 3)

Create a plot of it using circ.plot:

circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)

The CircStats Package Regression with circular data: Create some data

data1 <- runif(50, 0, 2*pi)

data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)

Run the regression using circ.reg:

circ.lm <- circ.reg(data1, data2, order=1)circ.lm

(Intercept) -0.01365604 -0.02939188cos.alpha -0.29872673 0.41344126

sin.alpha 0.78894271 0.72908521

The CircStats Package Plot the data

plot(data1, data2)

Plot the predicted line

circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi

points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')

The norm Contributed Package While the norm package sounds as if it

would have something to do with the normal distribution it is in fact a package for dealing with missing data.

It implements the Data Augmentation and Multiple Imputation scheme of Schafer (1997).

Similar to SAS PROC MI.

The norm Contributed Package Load the library.

library(norm)

The norm Contributed Package Generate some data.

X1 <- rnorm(100,6,1)

X2 <- rnorm(100,10,3)

X3 <- rnorm(100,3,.2)

X4 <- rnorm(100,31,2)

Y <- 5 +.4*X1-.3*X2+rnorm(100,0,1)

The norm Contributed Package Generate some missing data.

X1a <- ifelse(runif(100,0,1)<.1,NA,X1)

X2a <- ifelse(runif(100,0,1)<.1,NA,X2)

Put the data together.

YX <- cbind(Y,X1a,X2a,X3,X4)

The norm Contributed Package Prep the data and parameters for multiple

imputation.#do preliminary manipulations s <- prelim.norm(YX)

#find the mle

thetahat <- em.norm(s)

#set random number generator seed

rngseed(1234567)

The norm Contributed Package Create a list to store the individual results in.

betaout <- vector("list",10)

betasterrout <- vector("list",10)

The norm Contributed Package Run a multiple imputation loop

for(i in 1:10){

ximp <- imp.norm(s,thetahat,YX)

beta1 <- lm(ximp[,1]~ximp[,2]+ximp[,3]+ximp[,4]+ximp[,5] )$coefficients

betaout[[i]] <- beta1

betasterrout[[i]] <- summary(lm(ximp[,1]~ximp[,2] + ximp[,3] + ximp[,4] + ximp[,5]))$coefficients[,2]

}

The norm Contributed Package Analyze the results

mi.inference(betaout,betasterrout,confidence=0.95)

The norm Contributed Package Look at the output(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

6.75624286 0.30502706 -0.32846960 0.05157696 -0.04154060

$std.err

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

2.70312542 0.13431178 0.04240159 0.65908509 0.05596610

$df

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

1318.8371 222.2528 13269.2373 1770.6680 27689.4900

$signif

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01

$r

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

0.09004737 0.25192843 0.02673983 0.07676697 0.01835967

The lpSolve Package

The lpSolve package allows for the solving of linear and integer programs.

library(lpSolve)

The lpSolve Package

Consider the following linear program:

1 2 3

1 2 3

1 2 3

max :

9

:

2 3 9

3 2 2 15

x x x

ST

x x x

x x x

The lpSolve Package

Set up the vectors and matrices

f.obj <- c(1, 9, 3)

f.con <- matrix (c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)

f.dir <- c("<=", "<=")

f.rhs <- c(9, 15)

The lpSolve Package

The lp() function will attempt to solve the linear program.

lp ("max", f.obj, f.con, f.dir, f.rhs)

Success: the objective function is 40.5

The lpSolve Package

To obtain the solution grab the solution from the object.

lp("max", f.obj, f.con, f.dir, f.rhs)$solution

[1] 0.0 4.5 0.0

The lpSolve Package

Sensitivity analyses can be obtained from the lp() object.

The following are objects attached to an lp() object.

[1] "direction" "x.count" "objective" "const.count" [5] "constraints""int.count" "int.vec" "objval"

[9] "solution" "presolve" "compute.sens" "sens.coef.from"

[13] "sens.coef.to" "duals" "duals.from" "duals.to"

[17] "status"

The lpSolve Package

To solve an integer program specify the vector components for which variables need to be integers

lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)

Success: the objective function is 37

The lpSolve Package

To obtain the solution to the integer program use the solution statemet as before:

lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3) $solution

[1] 1 4 0

Summary

R is programming environment with many standard programming structures already included.

A large number of contributed packages. Many packages allow for use of modern statistical

procedures with out having to code them yourself. Requires familiarity with R to actually implement the

packages. No support. Allows users to create new packages.

Summary

All of the R code and files can be found at:

www.people.vcu.edu/~elboone2/CSS.htm

top related