r workshop: day 3 yun ju sung 7/15/2014. 5. regression analysis regression analysis is fundamental...

54
R Workshop: Day 3 Yun Ju Sung 7/15/2014

Upload: elmer-vance

Post on 14-Dec-2015

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

R Workshop: Day 3

Yun Ju Sung7/15/2014

Page 2: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

5. Regression AnalysisRegression analysis is fundamental and forms a major part of statistical analysis!

Page 3: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Linear regression analysis• The regression model: y = β0 + β1x + ε

• The values of β0 and β1 are unknown and will be estimated in a reasonable manner from the data

• The estimated regression line is (using "hats" to denote the estimates)• For each data point xi we have

(called the predicted value)• The difference between the true and

predicted value is the residual

xy 10ˆˆˆ

ii xy 10ˆˆˆ

iii yye ˆ

Page 4: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Example: linear regression with R• The maximum heart rate of a person is often said to be related

to age by the equation: Max rate= 220 - Age• Suppose this is to be empirically proven and 15 people of

varying ages are tested for their maximum heart rate.

• We use lm() to fit a linear model

Page 5: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Example: linear regression with R• The result of the lm function can be stored: lm.result=lm(y~x)• We can use summary to get regression coefficients (and more):

summary(lm.result)

• The result of the lm function is of class lm and so the plot and summary commands have been adapted.

• For several generic functions (including print, plot, and summary), the result depends on the class of object that is given as argument.

Page 6: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Prediction for new values• One primary use of regression analysis is to make predictions

for the response value for new values of the predictor.• We can use a regression line to get the predicted

value for a new value of x.• It is a little chunk to use the predict function!• For example, to get a prediction for 50 and 60 year old

• We need a data frame with column names matching the predictor variable, which can be done with the data.frame function with name=values.

• Can you get the same answer with estimated coefficients?

xy 10ˆˆˆ

Page 7: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 10For the data set on housing prices, homedata, investigate the relationship between old assessed value and new.a. Use old as the predictor variable. Does the data suggest a

linear relationship? b. Are there any outliers? What may have caused these

outliers?b. What is the predicted new assessed value for a $75,000

house in 1970?

Page 8: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Interacting with a scatterplot• When looking at a scatterplot we see the trend of the data as

well as individual data points. If one of these data points stands out, how can it be identified?

• We can use identify function to identify points on a scatterplot by their corresponding index with each mouse click.

• A template for its usage is identify (x, y, labels=…,n=…) • The value n specifies the number of points to identify. • For example, identify(BUSH,BUCHANAN,n=2,labels=County)

can be used to identify two outliers in Question 11.

Page 9: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 11For the florida dataset, investigate the relationship between Bush vs. Buchanan. a. Fit a regression line using Bush as the predictor variable.

Does the data suggest a linear relationship? b. Are there any outliers? Identify two outliersc. Fit another regression line after excluding two outliers. What

is the predicted value for the number of votes Buchanan would get in Miami-Dade county based on the number of Bush votes?

Page 10: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

The assumptions of the linear model• The regression model is yi=β0+β1xi+εi

• In order to make statistical inference, we need to make assumptions about the errors εi.

• The standard assumptions are that these errors are independent, normals with mean 0 and common variance 2

• The validity of the model can be checked graphically• The plot function can do this if we give it the result of the

regression, as plot(lm.result, which=1:4)

Page 11: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Checking the validity of the linear model• Residuals vs. fitted: Look for spread around the line y = 0 and

no obvious trend.• Normal Q-Q plot: The residuals are normal if this graph falls

close to a straight line.• Scale-Location plot shows the square root of the standardized

residuals. The tallest points, are the largest residuals.• Cook's distance plot identifies points which have a lot of

influence in the regression line.

Page 12: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Inference about the slope β1

• Once you are satisfied that the model fits the data, statistical inferences can be made.

• The normalized value of estimator b1 for β1

has the t-distribution with n-2 degrees of freedom (here we use b1 instead of )

• To test H0: β1 = a (the null hypothesis) against

HA: β1 ≠ a (the alternative hypothesis)

we compute the t statistic

and finds the p-value from the t-distribution.

Page 13: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Inference about the slope β1

• Let’s do a test to see if the slope of -1 is correct for our heart-rate example. This corresponds to H0: β1 = -1 vs. HA: β1 ≠ -1

• The actual p-value is twice this as the problem is two-sided. It is unlikely that for this data the slope is -1.

• R will automatically test for the assumption β1= 0, which means no slope. See how the p-value is included in the output of the summary command in the column Pr(>|t|).

• Can you compute this p-value by yourself?

Page 14: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Inference about the intercept β0

• A statistical test for b0 can be done using that has also a t-distribution with n-2 degrees of freedom

• Let's check if the data supports the intercept of 220. Formally, we test H0: β0 = 220 vs. HA: β0 < 220.

• We need to compute the value of the test statistic and then look up the one-sided p-value.

• We would reject the value of 220 based on this p-value as well.

Page 15: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 12The cost of a home depends on the number of bedrooms in the house. Suppose the following data is recorded for homes in a given town

a. Make a scatterplot, and fit the data with a regression line. b. Test the hypothesis that an extra bedroom costs $60,000

against the alternative that it costs more.

Price (in thousands) 300 250 400 550 317 389 425 289 389 559

No. of bedrooms 3 3 4 5 4 3 6 3 4 5

Page 16: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 13It is well known that the more beer you drink, the more your blood alcohol level rises. Suppose we have the following data on student beer consumption

a. Make a scatterplot and fit the data with a regression line. b. Test the hypothesis that another beer raises your BAL by 0.02

percent against the alternative that it is less.c. Test the hypothesis that the intercept is 0 with a two-sided

alternative.

Beers 5 2 9 8 3 7 3 5 3 5

BAL 0.10 0.03 0.19 0.12 0.04 0.095 0.07 0.06 0.02 0.05

Page 17: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

6. Multivariate Data Graphics and other simple functions to explore multivariate data, data with many variables

Page 18: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Multivariate data• Multivariate data consists of multiple data vectors considered

as a whole. • There are many advantages to combining them into a single

data object. This makes it easier to save our work, is convenient for many functions, and is much more organized.

• Multivariate data can be summarized and viewed in ways that are similar to those for bivariate and univariate data.

• As there are many more possible relationships, we can look at all the variables simultaneously or hold some variables constant while we look at others.

Page 19: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Data frame• Often in statistics, data is presented in a tabular format similar

to a spreadsheet. The columns are for different variables. Each row corresponds to measurements on the same subject or experimental unit.

• R uses a data frame to store rectangular grids of data.• Each column is a data vector containing data for one of the

variables. • The columns need not all be of the same type (e.g., numeric,

character, or logical).

Page 20: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Store multivariate data in data frames• Data frames are created with the data.frame() function

• We can use the row.names() function to set the rows names:

Page 21: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

List• A list is a more general type of storage than a data frame. • Think of a list as a collection of components. Each component

can be any R object, such as a vector, a data frame, a function, or even another list.

• A list is a very flexible type of data object containing ordered collections of objects (components)!

• Many of the functions in R, such as lm(), have return values that are lists, although only selected portions may be displayed when the return value is printed.

• Data frames must have variables of the same length, whereas lists can have variables of different length.

Page 22: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Store multivariate data in list• Lists are created with list()> x=1:2 ; y=letters[1:2] ; z=1:3 > data.frame(x,y,z) # Can’t> list(x,y,z) [[1]] [1] 1 2[[2]][1] "a" "b"[[3]][1] 1 2 3

> list(x=x, y=y, z=z)$x[1] 1 2$y[1] "a" "b" $z[1] 1 2 3

Page 23: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Accessing data in data frames • To access as an array, use single brackets [row, column]

• To access as a list, use either a dollar sign ($) or double bracket

Page 24: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Sorting a data frame by one column• The sort() function can be used to arrange the values of a data

vector in increasing or decreasing order.• To sort a data frame by one of its columns, sort() will not work.• We use order() to rearrange, or permute, the row indices so

that the new data frame is sorted as desired.• Try:

> sort(mpg)> mpg[order(mpg)]> mtcars[order(mpg),] # sort by miles per gallon> mtcars[order(mpg, decreasing=TRUE), ] # best mpg first

Page 25: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 14Use the data set mtcars:a. Sort the data set by weight, heaviest first.b. Which car gets the best mileage (largest mpg)? Which gets

the worst?c. The cars in rows c(1:3, 8:14, 18:21, 26:28, 30:32) were

imported into the United States. Compare the variable mpg for imported and domestic cars using a boxplot. Is there a difference?

d. Make a scatterplot of weight, wt, versus miles per gallon, mpg. Label the points according to the number of cylinders, cyl. Describe any trends.

Page 26: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 15The data set u2 (in UsingR) contains the time in seconds for albums released by the band U2 from 1980 to 1997. The data is stored in a list.a. Make a boxplot of the song lengths by album. Which album

has the most spread? Are the means all similar?b. Use sapply() to find the mean song time for each album.

Which album has the shortest mean? Repeat with the median. Are the results similar?

c. What are the three longest songs? The unlist() function will turn a list into a vector. First unlist the song lengths, then sort.

Could you use a data frame to store this data?

Page 27: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Scatterplot matrix• R has a built-in function for creating scatterplots of all possible

pairs of variables. • This scatterplot matrix can be made with the pairs() function

Page 28: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Scatterplot with colors• We can plot multiple relationships on the same scatterplot

using different plot characters or colors to distinguish the variables.

• Example: gestation period vs. birth weight in babies data> gestation[gestation == 999]= NA # 999 for NA> plot(gestation, wt, pch=smoke) > table(smoke) # values of plot characters

> legend(locator(1), legend=c("never","yes","until pregnant","longago","unknown"), pch=c(0:3,9),)

150 200 250 300 350

60

80

10

01

20

14

01

60

18

0

gestation

wt

neveryesuntil pregnantlongagounknown

Page 29: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Example: gestation vs. weight by smoke > f0=wt[smoke==0]~gestation[smoke==0]> plot(f0,xlab="gestation",ylab="weight")> abline(lm(f0))> f1=wt[smoke==1]~gestation[smoke==1]> points(f1,col=2,pch=3)> abline(lm(f1),col=2)> legend("topleft", legend=c("never smoked","smokes now"), pch=c(1,3), lty=1, bty="n", col=1:2)

150 200 250 300 350

60

80

10

01

20

14

01

60

18

0

gestation

wt

never smokedsmokes now

Page 30: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 16The data set UScereal (in MASS) lists facts about specific cereals sold in a United States supermarket. a. Is there a relationship between manufacturer (mfr) and vitamin

type (vitamins) by shelf location (shelf)? Do you expect one? Why?

b. Look at the relationship between calories and sugars with a scatterplot. Identify the outliers. Are these also fat-laden cereals?

c. Now look at the relationship between calories and sugars with a scatterplot using different size points given by cex=2*sqrt (fat). (This is called a bubble plot. The area of each bubble is proportional to the value of fat.) Describe any additional trend that is seen by adding the bubbles.

Any other expected relationships between the variables?

Page 31: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

The lattice package• Using various colors and plotting characters, we tried to show

whether the third variable, smoke, affected the relationship between gestation time and birth weight.

• A better solution would be to create a separate scatterplot for each level of the third variable.

• These graphs can be made using the lattice graphics package.• If the package is loaded (with library (lattice)), a description of

lattice graphics is available in the help system under ?Lattice.• Let's use the data set Cars93 to illustrate.

Page 32: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Histogram and boxplot using lattice• Histograms• Boxplots

Max.Price

Pe

rce

nt o

f To

tal

0

20

40

60

80

100

20 40 60 80

3 4

20 40 60 80

5

6

20 40 60 80

8

0

20

40

60

80

100rotary

Max.Price

20 40 60 80

3 4

20 40 60 80

5

6

20 40 60 80

8 rotary

Page 33: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

The lattice package• The basic idea is that the graphic consists of a number of

panels. Each panel corresponds to some value(s) of a conditioning variable.

• The lattice graphing functions are called using the model-formula interface. The formulas have the format

response ~ predictor|condition• For univariate graphs such as histograms, the response

variable, the left side, is empty.

Page 34: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Scatterplot: MPG and fuel rank size• The following will plot the relationship between MPG and tank

size. We expect that cars with better mileage can have smaller tanks. This type of plot allows us to check if it is the same for all types of cars.

Page 35: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Scatterplot: MPG and fuel rank size

Fuel.tank.capacity

MP

G.h

igh

wa

y

20

25

30

35

40

45

50

10 15 20 25

Compact Large

10 15 20 25

Midsize

Small

10 15 20 25

Sporty

20

25

30

35

40

45

50Van

Page 36: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Back to gestation vs weight by smoke

gestation

wt

60

80

100

120

140

160

180

150 200 250 300 350

0 1

150 200 250 300 350

2

3

150 200 250 300 350

60

80

100

120

140

160

1809

Page 37: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 17The data set survey (in MASS) contains survey information on students.a. Make scatterplots using xyplot() of the writing-hand size

(Wr.Hnd) versus nonwriting-hand size (NW.Hnd), broken down by gender (Sex). Is there a trend?

b. Make boxplots using bwplot() of Pulse for the four levels of Smoke broken down by gender Sex. Are there any trends? Differences?

Do you expect any other linear relationships among the variables?

Page 38: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Summary

Page 39: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Data, univariate and bivariate data• One use of R is as a calculator, to evaluate arithmetic

expressions. Calculations can be carried out in parallel, across all elements of a vector at once.

• The data is stored in R as a vector. Most functions can be applied to vectors without loops!

• Exploratory data analysis aims to allow the data to speak for themselves. It gives multiple views of the data that may provide useful insights.

• We covered commonly-used functions for exploring data with graphs. Histograms, density plots, stem-and-leaf displays, and boxplots are useful for examining the distributions of individual variables. Scatterplots are useful for looking at relationships two at a time.

Page 40: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Bivariate and regression analysis• In exploring the relationships in bivariate data, the correlation

coefficient provides a numeric measure. • The linear regression analysis can provide the richer and more

insightful framework.• There may be many more possibilities than regression lines. It is

useful to fit a smooth curve, to see whether it suggests systematic departure from a line.

• Model assumptions, such as normality, independence, and constancy of the variance, should be checked.

• In particular, plot residuals against fitted values and draw a plot that checks homogeneity of variance.

• Use of the function plot(), with an lm model, gives both these plots.

Page 41: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Some technical notes• Help functions include help() (help page of a known function)

and apropos() (search for function names that include a specified character string).

• The function c() (concatenate) joins vector elements into vectors. It may be used for numeric, logical and character vectors.

• Use is.na() to identify elements that are NAs.• Commonly used generic functions include print(), plot(), and

summary(). For such functions, the result depends on the class of object that is given as argument.

• For simple forms of scatterplot, use plot() or the lattice function xyplot().

Page 42: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

More technical notes• Important R data structures are vectors, factors, lists, and data

frames. Vectors may be of mode numeric, or logical, or character.• Data frames use a list structure to group columns, which must all

have the same length, together into a single R object. The different columns may be any mix of logical, numeric, character, or factor.

• Contrast data frames with matrices. All matrix elements have the same mode. A matrix is stored as one long vector that is formatted to appear in a row by column layout.

• The R system has extensive abilities for inputting data from rectangular files (see help(read.table)), from spreadsheets, and from a variety of statistical package formats.

• Use attach() or with() (temporary attachment) to give access to the columns of a data frame, without the need to name the data frame whenever a column is accessed.

Page 43: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Things not covered• Random data: uniform, normal, binomial, exponential,…• Confidence interval estimation: proportion test, t test• Hypothesis testing• Two sample tests• Chi square tests: goodness of fit, independence,…• Analysis of variance

Page 44: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Additional reading• To learn more material at a level of this lecture:

• simpleR – using R for Introductory Statistics by John Verzani• Using R for Data Analysis and Graphics by John Maindonald

• For those who need more sophisticated and challenging material: • Data Analysis and Graphics Using R: An Example-Based

Approach by John Maindonald and W. Hohn Braun is my favoriate!

• The manual An Introduction to R by the R core development team is excellent for people familiar with statistics.

Page 45: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Left-over MaterialMay be included in the next year?

Page 46: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Prediction for new values• Interpretation: Changes in the predictor variable correspond to

changes in the response variable in a linear manner: a unit change in the predictor corresponds to a change in the response.

Page 47: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

7. Multiple Linear Regression

Page 48: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Assignment: Question 8The data set babies contains much information about babies and their mothers for 1236 observations.

Find the correlation coefficient (both Pearson and Spearman) between age and weight. Repeat for the relationship between height and weight. Make scatter plots of each pair and see if your answer makes sense.

Page 49: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Functions in R• Easy to create your own functions in R• As tasks become complex, it is a good idea to organize code

into functions that perform defined tasks• In R, it is good practice to give default values to function

arguments• Template for function:

Page 50: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Example for defining functions• A function to simulate the t statistic from a sample of normals with mean 10

and standard deviation 5

• To make mean and standard deviation variable:

Page 51: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Control statements in R

Similar to c, R has a rich set of program control options

• if … else …• for, repeat, while loops• next, break statements

Page 52: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

For loops• A for loop allows you to loop over values in a vector or list of

numbers. It is a powerful programming feature.• To add up the numbers in the vector x

• better done with sum

Page 53: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

Conditional expressions• To do different things based on the value of a variable, use

conditional expressions• For example, to get the absolute value function,

• To fix a bug, try x[x<0]<- -x[x<0] (or the built in function abs).

Page 54: R Workshop: Day 3 Yun Ju Sung 7/15/2014. 5. Regression Analysis Regression analysis is fundamental and forms a major part of statistical analysis!

DAGG (page 148)