Quick R Tutorial for Beginners
Version 0.3, 2015-12-3
Rintaro Saito, University of California San Diego
Haruo Suzuki, Keio University

0. Introduction

R is one of the best languages for statistical analysis: it can analyze large amounts of data through scientific calculations, find tendencies in the data, and visualize them. In this tutorial, you will learn the fundamentals of the R language.

1. Starting and ending R

How to start R depends on the system you are using. On a UNIX system with R installed, you may be able to start R by just typing R. On Windows or Macintosh, double-click the R icon to start. To end R, just type

q()

2. Simple value assignments to variables

Variables temporarily keep values. For example, if we want the variable x to keep the value 123, type

x <- 123

After that, you can just type x to see the value assigned to x.

> x <- 123
> x
[1] 123
>

Actually, R can handle not only a single value but also a vector (see footnote 1). For example, if you want to assign the vector (1, 3, 5) to variable y, you can do so by putting c before the vector:

y <- c(1, 3, 5)

You can make a vector having the numbers from 2 to 5 by y <- c(2, 3, 4, 5). Alternatively, you can simply type

y <- 2:5

as the vector is composed of consecutive numbers from 2 to 5.

Footnote 1: A single value (scalar) can be regarded as a one-dimensional vector.
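As a small sketch of the two ways of building a vector, the lines below check that c() and the colon operator give the same values. The names a, b and same_values are illustrative only and do not come from the tutorial.

```r
# Build the same vector in two ways.
a <- c(2, 3, 4, 5)   # explicit listing with c()
b <- 2:5             # consecutive integers with the colon operator

# all.equal() compares the values and tolerates the fact that
# a is stored as "numeric" while b is stored as "integer".
same_values <- isTRUE(all.equal(a, b))
print(same_values)

# length() gives the number of elements.
print(length(b))
```

The values are the same, but the colon operator produces integers, which is why a strict comparison with identical(a, b) would report a difference.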

Exercise 2-1: Assign (10, 11, 12) to variable z and display the content of variable z.

3. Simple arithmetic

R can perform various arithmetic operations, including addition, subtraction, multiplication and division. For example, if you input 1 + 2, you will get the answer 3, as follows.

> 1 + 2
[1] 3
>

In R, as in most other programming languages, +, -, * and / denote addition, subtraction, multiplication and division, respectively.

R can also manipulate vectors. For example, c(1, 2, 3) + c(4, 5, 6) will give (5, 7, 9), and c(1, 2, 3) * 2 will give (2, 4, 6). Arithmetic calculations with variables can also be performed. If (1, 2, 3) and (4, 5, 6) are assigned to x and y, respectively, x + y will give the vector (5, 7, 9).

The value of a variable can be changed based on the result of a calculation. For example, after doing

x <- c(1, 2, 3)

the line

x <- x * 2

will multiply x by 2 and overwrite x itself with the result.
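The element-wise behaviour described in this section can be sketched as follows. The names s and r are illustrative and not part of the tutorial.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

s <- x + y    # element-wise sum
x <- x * 2    # every element is doubled and x is overwritten
r <- y / x    # element-wise division, using the new value of x

print(s)
print(x)
print(r)
```

Each operator works position by position, so the result has the same length as the input vectors.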

Exercise 3-1: Assign the vector (1, 2, 3) to variable x and assign 2 to variable y. Then calculate x * y. What answer do you get?

Exercise 3-2: Assign (1, 2, 3) to the vector x and multiply it by three. Put the result into x itself.

4. Simple vector arithmetic

As already explained, a list of numbers in parentheses immediately after c represents a vector. The dimension of a vector, i.e. the number of elements in the given vector x, can be obtained by

length(x)

One can extract a desired element from the vector. For example, after inputting x <- c(2, 4, 6, 8, 10),

x[3]

will give you 6, which is the third element of the vector,

x[c(2, 4)]

will give you the second and fourth elements of the vector, and

x[2:4]

will give you the second to fourth elements of the vector.

You can also perform vector comparison. For example, x > 5 will give you a vector containing TRUE and/or FALSE (abbreviated as T and F, respectively), where each element denotes whether the corresponding number in x is greater than 5 or not.

> x <- c(2, 4, 6, 8, 10)
> x
[1]  2  4  6  8 10
> x > 5
[1] FALSE FALSE  TRUE  TRUE  TRUE
>

The R function "which" returns the positions of the T's in the given vector as index numbers. Using the vector c(F, F, T, T, T) returned by x > 5, which(c(F, F, T, T, T)) will give you the vector (3, 4, 5). More simply, "which(x > 5)" will give you exactly the same answer.

> x <- c(2, 4, 6, 8, 10)
> which(x > 5)
[1] 3 4 5
>

Using a vector composed of T and F, we can extract the elements which correspond to T.

x[c(F, F, T, T, T)]

Putting the above together, it is sufficient to just type

x[x > 5]

to extract the elements whose values are above 5.
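To make the link between comparison, which and extraction concrete, here is a minimal sketch. The names is_big, idx, by_mask, by_index and middle are illustrative assumptions, not names used in the tutorial.

```r
x <- c(2, 4, 6, 8, 10)

is_big <- x > 5          # logical vector: FALSE FALSE TRUE TRUE TRUE
idx <- which(is_big)     # positions of the TRUE values: 3 4 5

by_mask <- x[is_big]     # extraction with the logical vector
by_index <- x[idx]       # extraction with the index numbers

# Two conditions can be combined element by element with &.
middle <- x[x > 2 & x < 10]

print(by_mask)
print(by_index)
print(middle)
```

Both ways of extracting give the same elements; the logical form is usually shorter to type.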

Actually, we can also deal with a set of strings as a vector. For example, after typing

x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

x[2] will give you the second string, "Monday". Various manipulations are available for a vector containing strings.

grep("es", x)

will give you the indices of the strings in vector x that contain the substring "es". Thus, the inputs below will return the strings containing "es".

> x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
> x[grep("es", x)]
[1] "Tuesday"   "Wednesday"
>

You can assign a name to each element in a vector using the function "names". For example,

x <- c(2, 4, 6)
names(x) <- c("First", "Second", "Third")

will assign ("First", "Second", "Third") to the names attribute of variable x.

x[["Second"]]

will give you the second element.

Various statistical functions are defined for vectors containing only numerical values. For example,

sum(x)

will give you the sum of the numerical elements in vector x, and

mean(x)

will give you the average of the numerical elements in x. For a vector containing only T and/or F, the function sum counts the number of T's. Thus,

x <- c(2, 4, 6, 8, 10)
sum(x > 5)

will count the number of elements greater than 5 (i.e. 3 will be returned).
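The sketch below combines element names with sum and mean on a comparison. The vector temps and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical values, used only for illustration.
temps <- c(18, 22, 25)
names(temps) <- c("Morning", "Noon", "Evening")

noon <- temps[["Noon"]]           # extract one element by its name

n_warm <- sum(temps > 20)         # number of elements above 20
share_warm <- mean(temps > 20)    # fraction of elements above 20

print(noon)
print(n_warm)
print(share_warm)
```

Because TRUE counts as 1 and FALSE as 0, mean applied to a comparison gives the proportion of elements that satisfy the condition.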

Exercise 4-1: Create a vector containing the numerical values larger than 5 in the vector (3, 1, 4, 1, 5, 9, 2, 6, 5).

Exercise 4-2: Assign the names ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") to the respective elements of the vector (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). Then extract the third element from the vector using the string "Mar". The extracted element should be 3.

5. Simple matrix construction and arithmetic

By gathering vectors, it is possible to create a matrix using the function rbind. For example, if you want to create a matrix $x = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$, where the first and second rows are (1, 2, 3) and (4, 5, 6), respectively, type

x <- rbind(c(1, 2, 3), c(4, 5, 6))

Then just type x to display the matrix you have created.

> x <- rbind(c(1, 2, 3), c(4, 5, 6))
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
>

Alternatively,

x <- matrix(c(1, 4, 2, 5, 3, 6), nrow=2, ncol=3)

or

x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)

should return the identical matrix. The latter means that a matrix of size 2 × 3 will be created (nrow=2, ncol=3) and the numbers are filled in row by row (byrow=T).

Various functions are available for a matrix. nrow(x) and ncol(x) will return the number of rows and columns of the matrix x, respectively. x + 1 will add 1 to all the elements in x, and x * 2 will multiply all the elements in x by 2. Various kinds of arithmetic between matrices are also available. For example, after typing y <- rbind(c(2, 4, 6), c(8, 10, 12)), x + y will give you a matrix where each element corresponds to the sum of the corresponding elements of x and y. x * y will give you a matrix where each element corresponds to the product of the corresponding elements of x and y (see footnote 2).
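The difference between the element-wise product and the mathematical matrix product can be sketched as below. Multiplying x by the transpose of y is an illustrative choice made here so that the dimensions match; it is not an example from the tutorial.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))
y <- rbind(c(2, 4, 6), c(8, 10, 12))

# Element-wise product: same shape as x and y (2 rows, 3 columns).
elementwise <- x * y

# Matrix product: (2 x 3) times (3 x 2) gives a 2 x 2 matrix.
matprod <- x %*% t(y)

print(elementwise)
print(matprod)
```

The element-wise product requires both matrices to have the same shape, while %*% requires the number of columns of the first matrix to equal the number of rows of the second.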

Any specific row or column can be extracted from a given matrix. For example, if you want to extract the second row, type

> x[2, ]
[1] 4 5 6
>

If you want to extract the second column, type

> x[, 2]
[1] 2 5

t(x) will transpose the matrix x.

> t(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>

The average of each row of matrix x can be calculated as below (see footnote 3).

> apply(x, 1, mean)
[1] 2 5

The average of each column of matrix x can be calculated as below.

> apply(x, 2, mean)
[1] 2.5 3.5 4.5

Footnote 2: The product of matrices based on the canonical mathematical definition can be calculated with the operator %*%.

Footnote 3: A second parameter of 1 in the apply function denotes that we will calculate the average for each position of the first dimension (in this case, the first dimension is the row number). Similarly, a second parameter of 2 denotes that we will calculate the average for each position of the second dimension (in this case, the second dimension is the column number).

Actually, a matrix can be regarded as a set of numbers in a two-dimensional array. R can also deal with arrays having n dimensions using array(vector, numbers of elements in each dimension). For example,

x <- array(1:24, c(3, 4, 2))

will create a three-dimensional array of size 3 × 4 × 2 and fill in the numbers of the given vector from the first dimension to the third dimension.

> x <- array(1:24, c(3, 4, 2))
> x
, , 1      (the first 3 × 4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2      (the second 3 × 4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

>
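Elements of such an array can be extracted in the same way as for a matrix, with one index per dimension. This is a minimal sketch; the names d, elem and slice1 are illustrative.

```r
x <- array(1:24, c(3, 4, 2))

d <- dim(x)            # sizes of the three dimensions

# One index per dimension: row 2, column 3, second 3 x 4 array.
elem <- x[2, 3, 2]

# Leaving an index empty keeps the whole dimension,
# so this is the first 3 x 4 array as a matrix.
slice1 <- x[, , 1]

print(d)
print(elem)
print(slice1)
```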

Exercise 5-1: Calculate $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$.

Exercise 5-2: For the matrix obtained in the above calculation, calculate the row averages and the column averages.

6. Simple list creation

A list in R can gather various kinds of data into a single object to manage them.

x <- list("Ichiro", "Seattle", "Right fielder")

will create a list containing "Ichiro", "Seattle" and "Right fielder". Although a vector cannot contain another vector, a list can contain a vector, as shown in the example below.

x <- list("Ichiro", "Seattle", "Right fielder", c(184, 214, 225))

To extract the second element, you can type

x[[2]]

You can assign names to each element as below.

x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

Type x to confirm that names are given to each element.

> x
$player
[1] "Ichiro"

$team
[1] "Seattle"

$position
[1] "Right fielder"

$hits
[1] 184 214 225

>

You can use an assigned name to extract the corresponding element. For example,

x[["team"]]

or

x$team

will extract "Seattle".
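A few more ways of looking into the named list are sketched below. The variable names on the left-hand side are illustrative; the list itself is the one from the text.

```r
x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

element_names <- names(x)     # names of all elements
second_hits <- x$hits[2]      # second number inside the hits vector

as_element <- x[["team"]]     # double brackets return the element itself
as_sublist <- x["team"]       # single brackets return a list of length 1

print(element_names)
print(second_hits)
print(as_element)
print(as_sublist)
```

Double brackets (or $) are usually what you want when you need the value itself; single brackets keep the result wrapped in a list.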

Exercise 6-2: Create a list containing "San Diego", the vector (32, 117), 964.5 and "California". Give the names "city", "coordinates", "area" and "state" to the respective elements.

7. Simple data frame creation

A data frame is one of the data classes in R. It is a type of list and has a two-dimensional structure, just like a matrix. Each row can be regarded as a sample, and each column can be regarded as an attribute of the samples. Using this idea, a data frame can represent a table. The following table gives data on five retired major league baseball players.

             Team       atbats  hits  homeruns
Rose         Reds       14053   4256  160
Aaron        Brewers    12364   3771  755
Yastrzemski  Red Sox    11988   3419  452
Ripken       Orioles    11551   3184  431
Cobb         Athletics  11434   4191  117

The above table can be represented by a data frame as follows.

x <- data.frame(
  row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
  team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
  atbats = c(14053, 12364, 11988, 11551, 11434),
  hits = c(4256, 3771, 3419, 3184, 4191),
  homeruns = c(160, 755, 452, 431, 117))

The function data.frame generates a data frame and can be used in the format data.frame(row.names = vector of row labels, column name 1 = vector 1, column name 2 = vector 2, ...). You can check the content of x by typing x.

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

As with a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>

Any part of a data frame can be extracted as another data frame. For example,

x[c(1, 5), c(2, 3, 4)]

will extract rows 1 and 5 and columns 2, 3 and 4 as a new data frame. You can also use row names and column names to do the same thing.

x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names and class. Type names(x), row.names(x) and class(x), and you will obtain the column names, row names and class of x, respectively.
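Rows of a data frame can also be selected with a condition, in the same way as elements of a vector. The sketch below reuses the data frame from the text; the threshold of 400 home runs and the names sluggers and top_hitter are arbitrary choices made for illustration.

```r
x <- data.frame(
  row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
  team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
  atbats = c(14053, 12364, 11988, 11551, 11434),
  hits = c(4256, 3771, 3419, 3184, 4191),
  homeruns = c(160, 755, 452, 431, 117))

# Keep only the rows whose homeruns value is above 400
# (the empty column index keeps all columns).
sluggers <- x[x$homeruns > 400, ]

# which.max() gives the position of the largest value.
top_hitter <- row.names(x)[which.max(x$hits)]

print(dim(x))        # number of rows and columns
print(sluggers)
print(top_hitter)
```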

Exercise 7-1: The table below shows characteristic values of each planet in the solar system. Masses are given in 10^23 kg and diameters are given in km. Make a data frame representing the table (see footnote 4).

         Mass    Diameter  Satellites
Mercury  3.301   4879      0
Venus    48.690  12103     0
Earth    59.736  12756     1
Mars     6.419   6794      2
Jupiter  18986   142984    63
Saturn   5688    120536    48
Uranus   868.6   51118     27
Neptune  1024    49528     13

Footnote 4: As all the elements are numeric in this case, this data frame can be treated as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2: Using the data frame created in the previous exercise, calculate the averages of mass, diameter and number of satellites.

8. Data reading from file

So far, we have been inputting data directly. In most cases, however, numerical data are prepared in files. So we will learn how to import data into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

14
14
21
35
6

Using setwd, we change the working directory to /Users/smith/TMP.

setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

You will notice that the values are stored in x as a vector.

R has a function read.table to read a table in which the columns of each line are separated by TABs, as in the following example (the file name is "batters.txt").

           Team     At_Bat  Hits  Home_Runs
Bonds      Giants   9847    2935  762
Aaron      Braves   12364   3771  755
Ruth       Yankees  8398    2873  714
Rodriguez  Yankees  10341   3070  687
Mays       Giants   10881   3283  660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table states that the name of the file to read is "batters.txt". Next, header = T states that the table has a header. sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column contains the row names. Since x will be a data frame, we can get the numbers of hits by x$Hits.
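If you do not have a file at hand, the behaviour of read.table can be tried out with a temporary file, as in the sketch below. The use of tempfile and writeLines to create the file, and the restriction to two rows of the table, are assumptions made here for a self-contained example.

```r
# Create a small tab-delimited file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("\tTeam\tAt_Bat\tHits\tHome_Runs",
             "Bonds\tGiants\t9847\t2935\t762",
             "Aaron\tBraves\t12364\t3771\t755"), path)

# Read it back: the first line is the header, columns are separated
# by TABs, and the first column holds the row names.
x <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

print(x)
print(x$Hits)
print(x["Aaron", "Home_Runs"])

unlink(path)   # remove the temporary file
```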

Exercise 8: Save the table in exercise 7-1 as a tab-delimited file and read it as a data frame.

9. Writing data to file

There are many cases where we want to have statistical results written to a file rather than temporarily seeing them on the screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function enables us to output the numerical values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10
12
15
19
21
34

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes the matrix x. If we do not do this, the matrix written in the file will look transposed. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1  2  3
4  5  6

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
    row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
    Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
    At_Bat = c(9847, 12364, 8398, 10341, 10881),
    Hits = c(2935, 3771, 2873, 3070, 3283),
    Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
    row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names will be added, respectively (with a blank at the top-left).

Content of outfile3.txt:

""           "Hits"  "Home_Runs"
"Bonds"      2935    762
"Aaron"      3771    755
"Ruth"       2873    714
"Rodriguez"  3070    687
"Mays"       3283    660

By giving quote=F, the output will be written without double quotation marks.
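The effect of quote = FALSE can be checked by writing to a temporary file and reading the raw lines back, as sketched below. The temporary file and the reduced two-row data frame are assumptions made to keep the example short.

```r
x <- data.frame(
  row.names = c("Bonds", "Aaron"),
  Hits = c(2935, 3771),
  Home_Runs = c(762, 755))

path <- tempfile(fileext = ".txt")

# Same call as in the text, with quote = FALSE added.
write.table(x, path, sep = "\t",
            row.names = TRUE, col.names = NA, quote = FALSE)

# readLines() returns each line of the file as a string.
lines <- readLines(path)
print(lines)

unlink(path)   # remove the temporary file
```

The header line starts with a tab because the top-left cell is blank, and none of the strings are surrounded by double quotation marks.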

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at bats and write the result to "outfile4.txt".

10. Writing a program in a file

So far, we have worked interactively without saving the R code we have written. However, when we repeat the same work with R, it is laborious to interactively write the same code again and again. To solve this issue, we can write the code in a file, and R can read the file and execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
z <- x + y

Then source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
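The following sketch reproduces the same steps without requiring a file prepared by hand. Writing the script to a temporary file and running it in a separate environment (the local argument of source) are choices made here for illustration; the tutorial itself simply calls source with the file name.

```r
# Write the three lines of code to a temporary script file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3, 4, 5)",
             "y <- c(2, 4, 6, 8, 10)",
             "z <- x + y"), script)

# Run the script. With local = env, the variables it creates are
# stored in env instead of among your own variables.
env <- new.env()
source(script, local = env)

print(env$z)
print(ls(env))   # names of the variables created by the script

unlink(script)   # remove the temporary file
```

Running a script in its own environment is optional, but it can be handy when you do not want the script to overwrite variables you already have.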

Exercise 10: Write the procedure in exercise 8 to a file dframetest.R and execute the procedure using "source".

11. Defining functions

In mathematics, a function outputs a value which is determined by the input values. In programming, it often represents a defined set of procedures (see footnote 5). For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows, by introducing the keyword "function".

f <- function(x, y) {
  return((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to the parameters x and y, respectively, and 15 will be returned, as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows.

f(x = 10, y = 20)

Footnote 5: "Subroutine" may be the better terminology.

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, ...) {
  various procedures, possibly using the given parameters (see footnote 6)
  ...
  return(return_value)
}
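The sketch below shows a few ways of calling the function f from the text. The default value y = 0 is an addition made here to illustrate optional parameters; it is not part of the tutorial's definition.

```r
# Same function as in the text, except that y has a default value
# (an assumption added for illustration).
f <- function(x, y = 0) {
  return((x + y) / 2)
}

a <- f(10, 20)              # parameters given by position
b <- f(y = 20, x = 10)      # parameters given by name, in any order
c1 <- f(10)                 # y is omitted, so the default 0 is used
d <- f(c(2, 4), c(4, 8))    # vectors work too, element by element

print(a)
print(b)
print(c1)
print(d)
```

Because the arithmetic operators work on vectors, a function built from them usually accepts vectors without any extra code.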

Exercise 11: Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

Footnote 6: Optional parameters ... can be accessed with list(...) as a list.

12. Making graphs

R is equipped with functions capable of making graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7) and (9, 8).

x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 9, 7, 8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels on the x and y axes can be given with the parameters xlab and ylab, respectively.

A plot created by the "plot" function

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in the vector x. The label of each bar is given by the element names of the vector x, i.e. names(x).

x <- c(1, 2, 3, 2, 10, 1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

A bar graph created by the "barplot" function

The "hist" function generates a histogram for a set of numbers given by a vector.


x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

A histogram generated using the "hist" function

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)

boxplot(x1, x2, names=c("Data 1", "Data 2"))


Example of a box plot. The annotations in the figure mark the median, an outlier, the range containing 50% of the data, the bottom 25% excluding outliers, and the top 25% excluding outliers.

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13. Basic program structure

R is equipped with programming constructs which are important and common in other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition (see footnote 7) is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows.

if (x > 0) {
  y <- 1
} else {
  y <- 0
}

Footnote 7: When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Run this if-statement after doing x <- -5 and investigate the value of y. Then run it again after doing x <- 3 and investigate the value of y again to make sure that the value has changed.

The general form of the if-statement is as follows.

if (condition 1) {
  Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
  Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
  Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if ... {
  ...
} else {
  Set of procedures to execute when NONE of the above conditions are satisfied
}
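As a concrete instance of the general form, the sketch below chains three branches inside a function. The function name classify and the labels it returns are hypothetical and used only for illustration.

```r
# Hypothetical helper: describes the sign of a single number.
classify <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"      # reached only when x > 0 is NOT met
  } else {
    label <- "zero"          # reached when none of the above hold
  }
  return(label)
}

print(classify(3))
print(classify(-5))
print(classify(0))
```

Only the first branch whose condition is satisfied is executed; the remaining conditions are not checked.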

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows (see footnote 8).

while (condition) {
  A set of procedures to execute when the condition is satisfied
}

Footnote 8: A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to immediately stop the iteration and get out of the while-loop.

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output 1 to 3.

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you write this procedure as a program in a file, you may want to write each step on its own line, as follows, so that the program is easy to read.

x <- 1
while (x <= 3) {
  print(x)
  x <- x + 1
}

Initially, the value of x is 1 and the condition of the while-block, i.e. x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first loop of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e. displaying the value of x, which is 2, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e. displaying the value of x, which is 3, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
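The next and break statements can be tried out with a small loop such as the one below. The task (adding up numbers while skipping multiples of 3 and stopping before the total exceeds 20) is a made-up example chosen only to exercise both statements.

```r
total <- 0
i <- 0

while (TRUE) {              # loop until break is reached
  i <- i + 1
  if (i %% 3 == 0) {
    next                    # skip multiples of 3, go to the next iteration
  }
  if (total + i > 20) {
    break                   # leave the loop before the total exceeds 20
  }
  total <- total + i
}

print(total)
print(i)
```

Here 1, 2, 4, 5 and 7 are added, 3 and 6 are skipped by next, and the loop is left by break when i reaches 8.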

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9 and 11, each on its own line.

13.3 for-statement

The for-statement also does iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements) {
  procedures
}

For example, for (i in c(1, 3, 5)) { print(i) } will assign each of 1, 3 and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, the number 1 is displayed in the first iteration, 3 is displayed in the second iteration, and finally 5 is displayed in the third iteration. The following example makes a vector of the squared values of 1 to 5 (see footnote 9).

x <- NULL
for (i in 1:5) {
  x <- append(x, i^2)
}

x <- NULL assigns an empty vector to x. NULL represents an empty vector.

In the for-statement, each value of 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (represented as i^2) is appended to the vector x. The function append(x, i) appends i to the vector x (see footnote 10).

Footnote 9: Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.
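The three ways of building the vector of squares can be compared directly, as in the sketch below. The variable names and the variant that creates the result vector in advance with numeric(5) are additions made here for illustration.

```r
# 1. Growing a vector with append(), as in the text.
squares_loop <- NULL
for (i in 1:5) {
  squares_loop <- append(squares_loop, i^2)
}

# 2. Creating a vector of the final length first, then filling it.
squares_prealloc <- numeric(5)
for (i in seq_along(squares_prealloc)) {
  squares_prealloc[i] <- i^2
}

# 3. No loop at all: ^ works on the whole vector at once.
squares_vec <- (1:5)^2

print(squares_loop)
print(squares_prealloc)
print(squares_vec)
```

All three give the same result; the loop-free form is the shortest and is generally preferred in R when it is available.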

Exercise 13-3: Rewrite the procedure in exercise 13-2 using a for-statement.

14. Other useful functions

Here, some frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example whether it is a numerical variable, character, list or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

Footnote 10: The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.

14.3 Family of apply functions

The "sapply" function returns a vector containing the output values of a given function, obtained by using each value in the given vector, one by one, as a single input to that function.

func1_sub <- function(elm) {
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}

func1 <- function(x) {  # a function with a vector parameter x
  return(sapply(x, func1_sub))
}
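A short usage sketch for sapply follows. The input values, the comparison with ifelse and the small anonymous function are illustrative additions; func1_sub is the function defined in the text.

```r
func1_sub <- function(elm) {
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}

inputs <- c(-2, -1, 0.5, 3)

# sapply() calls func1_sub once for each element of inputs.
by_sapply <- sapply(inputs, func1_sub)

# The same result without sapply, using the vectorized ifelse().
by_ifelse <- ifelse(-1 <= inputs & inputs <= 1, 1, 0)

# The function can also be written directly inside the call.
squares <- sapply(1:3, function(v) v^2)

print(by_sapply)
print(by_ifelse)
print(squares)
```

sapply is most useful when the function, like func1_sub, only works on one value at a time.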

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)   # Draws a normal (Gaussian) distribution
curve(cos(x) + cos(2*x), -2*pi, 2*pi, 1000)   # 1000 is the number of points
curve(func1, -3, 3)   # The function defined in 14.3 Family of apply functions

27

2

0 Introduction

R is one of the best languages to perform statistical analyses it can analyze huge amount of

data through scientific calculations find tendencies among the data and visualize them In

this tutorial you will learn fundamentals of R language

1 Starting and ending R

How to start R depends on the system you are using In the UNIX system with R installed

you may be able to start R by just typing R In Windows or Macintosh double-click R icon to

start To end R just type

q()

2 Simple value assignments to variables

Variables temporarily keep values For example if we want to have variable x to keep the

value 123 type

x lt- 123

After that you can just type x then you will see value assigned to x

gt x lt- 123

gt x

[1] 123

gt

Actually R is capable of handling not only single value but also vector1 For example if you

want to assign vector (1 3 5) to variable y you can do so with c before vector

y lt- c(1 3 5)

You can make vectors having numbers from 2 to 5 by y lt- c(2345) Alternatively you can

simply type

1 Single value (scalar) can be deemed as one-dimensional vector

3

y lt- 25

as vectors are composed of consecutive numbers from 2 to 5

Exercise 2-1 Assign (10 11 12) to variable z and display information of variable z

3 Simple Arithmetic

R can perform various arithmetic including addition subtraction multiplication and

division For example if you input 1+2 you will get an answer of 3 as follows

gt 1 + 2

[1] 3

gt

In R like most of other programming languages + - and denotes to addition

subtraction multiplication and division respectively

R can also manipulate vectors For example c(1 2 3) + c(4 5 6) will give (5 7 9) and

c(1 2 3) 2 will give (2 4 6) Arithmetic calculations with variables can also be performed

If (1 2 3) and (4 5 6) is assigned to x and y respectively x + y will give a vector (5 7 9)

A value of variables can be changed based on result of calculation For example after doing

x lt- c(1 2 3)

x lt- x 2

will multiply x by 2 and the result will be overwritten to x itself

Exercise 3-1 Assign vector (1 2 3) to variable x and assign 2 to variable y Then calculate x

y What answer do you get

Exercise 3-2 Assign (1 2 3) to vector x and multiply it by three Put the result into x itself

4 Simple vector arithmetic

As already explained list of numbers in bracket immediately after c represents vector

Dimension of vector ie number of numbers in the given vector x can be obtained by

length(x)

4

One can extract desired number from the vector For example after inputting x lt-

c(246810)

x[3]

will give you 6 which is the third number of the vector

x[c(24)]

will give you second and fourth numbers of the vector and

x[24]

will give you numbers from second to fourth elements of the vector

You can also perform vector comparison For example x gt 5 will give you vectors

containing TRUE andor FALSE (abbreviated as T and F respectively) where each element

denotes whether corresponding number in x is greater than x or not

gt x lt- c(246810)

gt x

[1] 2 4 6 8 10

gt x gt 5

[1] FALSE FALSE TRUE TRUE TRUE

gt

R function ldquowhichrdquo will returns where the Trsquos are in the given vector as index numbers

Using the vector c(F F T T T) returned by x gt 5 which(c(F F T T T)) will give you vector

(345) More easily ldquowhich(x gt 5)rdquo will give you exactly the same answer

gt x lt- c(246810)

gt which(x gt 5)

[1] 3 4 5

gt

Using this vector which is composed of T and F we can extract elements which correspond

to T

x[ c(F F T T T)]

By putting above together it is sufficient to just type

x[ x gt 5 ]

to extract elements whose values are above 5

5

Actually we can also deal with set of strings as vector For example after typing

x lt- c(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)

x[2] will give you second string rdquoMondayrdquo Various manipulations are available for a vector

containing strings

grep(es x)

will give you indices of strings in vector x where string contain substring ldquoesrdquo Thus the

below inputs will return strings containing ldquoesrdquo

gt x lt- c(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)

gt x[ grep(es x) ]

[1] Tuesday Wednesday

gt

You can assign name for each element in a vector using function ldquonamerdquo For example

x lt- c(2 4 6)

names(x) lt- c(First Second Third)

will assign (ldquoFirstrdquo ldquoSecondrdquo ldquoThirdrdquo) to name attribute of variable x

x[[ Second ]]

will give you second element

Various statistical functions are defined for vectors containing only numerical values. For example,

sum(x)

will give you the sum of the numerical elements in vector x, and

mean(x)

will give you the average of the numerical elements in x. When vector x contains only T and/or F, the function sum will count the number of T's. Thus,

x <- c(2,4,6,8,10)
sum(x > 5)

will count the number of numerical elements greater than 5 (i.e., 3 will be returned).
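The counting idea above can be sketched as follows. Because T is treated as 1 and F as 0, sum gives the number of matching elements, and mean applied to the same logical vector gives the fraction of matching elements. The names n_above and frac_above are illustrative only.

```r
x <- c(2, 4, 6, 8, 10)

n_above    <- sum(x > 5)   # TRUE counts as 1, FALSE counts as 0
frac_above <- mean(x > 5)  # fraction of the elements that are greater than 5

print(n_above)
print(frac_above)
```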

Exercise 4-1: Create a vector containing the numerical values larger than 5 in the vector (3, 1, 4, 1, 5, 9, 2, 6, 5).

Exercise 4-2: Assign the names ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") to the respective elements of the vector (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). Then extract the third element from the vector using the string "Mar". The extracted element should be 3.

5 Simple matrix construction and arithmetic

By gathering vectors, it is possible to create a matrix using the function rbind. For example, if you want to create a matrix $x = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$, where the first and second rows are (1, 2, 3) and (4, 5, 6), respectively, type

x <- rbind(c(1,2,3), c(4,5,6))

Then just type x to display the matrix you have created.

> x <- rbind(c(1,2,3), c(4,5,6))
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
>

Alternatively,

x <- matrix(c(1, 4, 2, 5, 3, 6), nrow=2, ncol=3)

or

x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)

should return the identical matrix. The latter means that a matrix of size 2 × 3 will be created (nrow=2, ncol=3) and that the numbers are filled in row by row (byrow=T). Without byrow=T, as in the former, the numbers are filled in column by column.
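The three constructions above can be compared directly. The sketch below builds the matrix in each of the three ways and checks element by element that the results agree; the names by_col, by_row and bound are introduced here only for illustration.

```r
by_col <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2, ncol = 3)                # filled column by column
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
bound  <- rbind(c(1, 2, 3), c(4, 5, 6))                                  # rows stacked with rbind

print(by_col)
print(all(by_col == by_row))  # TRUE when every element matches
print(all(by_col == bound))
```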

Various functions are available for a matrix. nrow(x) and ncol(x) will return the number of rows and columns of the matrix x, respectively. x + 1 will add 1 to all the elements in x, and x * 2 will multiply all the elements in x by 2. Various kinds of arithmetic between matrices are also available. For example, after typing y <- rbind(c(2, 4, 6), c(8, 10, 12)), x + y will give you a matrix where each element corresponds to the sum of the corresponding elements of x and y. x * y will give you a matrix where each element corresponds to the product of the corresponding elements of x and y (see footnote 2).

Any specific row or column can be extracted from a given matrix. For example, if you want to extract the second row, type

> x[2,]
[1] 4 5 6
>

If you want to extract the second column, type

> x[,2]
[1] 2 5

t(x) will transpose the matrix x.

> t(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>

The average of each row of matrix x can be calculated as below (see footnote 3).

> apply(x, 1, mean)
[1] 2 5

The average of each column of matrix x can be calculated as below.

> apply(x, 2, mean)
[1] 2.5 3.5 4.5
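As a quick cross-check of the two apply calls, the sketch below compares them with the base R functions rowMeans and colMeans, which compute the same averages. This comparison is an addition for illustration and is not part of the original example.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))

row_avg <- apply(x, 1, mean)  # one average per row
col_avg <- apply(x, 2, mean)  # one average per column

print(row_avg)
print(col_avg)
print(all.equal(row_avg, rowMeans(x)))
print(all.equal(col_avg, colMeans(x)))
```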

Footnote 2: The product of matrices based on the canonical mathematical definition can be calculated with the operator %*%.

Footnote 3: The second parameter of 1 in the apply function denotes that we will calculate the average for each position of the first dimension (in this case, the first dimension is the row number). Similarly, a second parameter of 2 denotes that we will calculate the average for each position of the second dimension (in this case, the second dimension is the column number).

Actually, a matrix can be deemed as a set of numbers in a two-dimensional array. R can also deal with arrays having n dimensions using array(vector, numbers of elements in each dimension). For example,

x <- array(1:24, c(3,4,2))

will create a three-dimensional array of size 3×4×2 and fill it with the numbers in the given vector, from the first dimension to the third dimension.

> x <- array(1:24, c(3,4,2))
> x
, , 1        (the first 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2        (the second 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

>

Exercise 5-1: Calculate $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$.

Exercise 5-2: For the matrix obtained in the above calculation, calculate the row averages and the column averages.

6 Simple list creation

A list in R can gather various kinds of data into a single object to manage them.

x <- list("Ichiro", "Seattle", "Right fielder")

will create a list containing "Ichiro", "Seattle", and "Right fielder". Although a vector cannot contain another vector, a list can contain a vector, as shown in the example below.

x <- list("Ichiro", "Seattle", "Right fielder", c(184, 214, 225))

To extract the second element, you can type

x[[2]]

You can assign names to each element as below.

x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

Type x to confirm that names are given to each element.

> x
$player
[1] "Ichiro"

$team
[1] "Seattle"

$position
[1] "Right fielder"

$hits
[1] 184 214 225

>

You can use an assigned name to extract the corresponding element. For example,

x[["team"]]

or

x$team

will extract "Seattle".
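The sketch below contrasts double brackets, which return the element itself, with single brackets, which return a shorter list that still carries the name. The single-bracket form is not covered in the text above and is shown here only for comparison; team_value and team_sublist are illustrative names.

```r
x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

team_value   <- x[["team"]]  # the element itself: a character string
team_sublist <- x["team"]    # a list of length 1 that keeps the name "team"

print(class(team_value))
print(class(team_sublist))
print(mean(x$hits))          # elements extracted with $ can be used directly
```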

Exercise 6-2: Create a list containing "San Diego", the vector (32, 117), 964.5, and "California". Give the names "city", "coordinates", "area", and "state" to the respective elements.

7 Simple data frame creation

A data frame is one of the data classes in R. It is a type of list and has a two-dimensional structure, just like a matrix. Each row can be deemed as a sample, and each column can be deemed as an attribute of the sample. Using this idea, a data frame can represent a table. The following table gives data for five retired major league baseball players.

              Team        atbats   hits   homeruns
Rose          Reds         14053   4256        160
Aaron         Brewers      12364   3771        755
Yastrzemski   Red Sox      11988   3419        452
Ripken        Orioles      11551   3184        431
Cobb          Athletics    11434   4191        117

The above table can be represented by a data frame as follows.

x <- data.frame(
    row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
    team      = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
    atbats    = c(14053, 12364, 11988, 11551, 11434),
    hits      = c(4256, 3771, 3419, 3184, 4191),
    homeruns  = c(160, 755, 452, 431, 117))

The function data.frame generates a data frame and can be used in the format data.frame(row.names = vector of row labels, column name 1 = vector 1, column name 2 = vector 2, …). You can check the content of x by typing x.

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

Like a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>


Any part of a data frame can be extracted as another data frame. For example,

x[c(1,5), c(2,3,4)]

will extract rows 1 and 5 and columns 2, 3, and 4 as a new data frame. You can also use row names and column names to do the same thing.

x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]
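To see that the two extraction styles agree, the following sketch rebuilds the data frame from the example above and compares the result of indexing by position with the result of indexing by name. The names by_index and by_name are illustrative only.

```r
x <- data.frame(
    row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
    team      = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
    atbats    = c(14053, 12364, 11988, 11551, 11434),
    hits      = c(4256, 3771, 3419, 3184, 4191),
    homeruns  = c(160, 755, 452, 431, 117))

by_index <- x[c(1, 5), c(2, 3, 4)]                                  # by row/column numbers
by_name  <- x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]   # by row/column names

print(by_name)
print(all.equal(by_index, by_name))
```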

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names, and class. Type names(x), rownames(x), and class(x) to obtain the column names, row names, and class of x, respectively.

Exercise 7-1: The table below shows characteristic values of each planet in the solar system. Masses are given in units of 10^23 kg and diameters are given in km. Make a data frame representing the table (see footnote 4).

            Mass      Diameter   Satellites
Mercury        3.301      4879            0
Venus         48.690     12103            0
Earth         59.736     12756            1
Mars           6.419      6794            2
Jupiter    18986        142984           63
Saturn      5688        120536           48
Uranus       868.6       51118           27
Neptune     1024         49528           13

Footnote 4: As all the elements are numeric in this case, this data frame can be handled as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2: Using the data frame created in the previous exercise, calculate the averages of mass, diameter, and number of satellites.

8 Data reading from file

So far, we have been inputting data directly. However, in most cases, numerical data are prepared in files. So, we will learn how to import data into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

14

14

21

35

6

Using setwd, we change the working directory to /Users/smith/TMP.

setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

You will notice that the values are stored in x as a vector.

R has a function read.table to read a table in which the columns of each line are separated by TABs, like the following example (the file name is "batters.txt").

            Team      At_Bat   Hits   Home_Runs
Bonds       Giants      9847   2935         762
Aaron       Braves     12364   3771         755
Ruth        Yankees     8398   2873         714
Rodriguez   Yankees    10341   3070         687
Mays        Giants     10881   3283         660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table states that the name of the file to read is "batters.txt". Next, header = T states that the table has a header. sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column contains the row names. Since x will be a data frame, we can get the numbers of hits with x$Hits.
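The steps above can be tried without preparing a file by hand. The sketch below writes a shortened version of the table to a temporary file and reads it back with read.table. The temporary file and the leading tab in the header line (a blank top-left cell) are assumptions made for this illustration, not part of the original example.

```r
# Write a small tab-delimited table to a temporary file (illustrative location)
tmp <- tempfile(fileext = ".txt")
writeLines(c("\tTeam\tAt_Bat\tHits\tHome_Runs",
             "Bonds\tGiants\t9847\t2935\t762",
             "Aaron\tBraves\t12364\t3771\t755",
             "Ruth\tYankees\t8398\t2873\t714"), tmp)

# Read it back: header row present, TAB-separated, first column holds row names
x <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

print(x$Hits)
print(rownames(x))
unlink(tmp)  # remove the temporary file
```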

Exercise 8: Save the table in Exercise 7-1 as a tab-delimited file and read it as a data frame.

9 Writing data to file

There are many cases where we want to have statistical results written to a file rather than temporarily seeing the results on the screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function enables us to output the numerical values assigned to a variable to a file. For example, after assigning the vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10

12

15

19

21

34

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes matrix x. If we do not do this, the values are written in column-by-column order, so the rows in the file will not match the rows of x. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1    2    3
4    5    6

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
      row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
      Team      = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
      At_Bat    = c(9847, 12364, 8398, 10341, 10881),
      Hits      = c(2935, 3771, 2873, 3070, 3283),
      Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names will be added, respectively (with a blank at the top-left).

Content of outfile3.txt:

""            "Hits"   "Home_Runs"
"Bonds"       2935     762
"Aaron"       3771     755
"Ruth"        2873     714
"Rodriguez"   3070     687
"Mays"        3283     660

By giving quote=F, the output will be written without double quotation marks.
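The effect of quote=F can be checked with a short sketch. It writes a shortened version of the data frame to a temporary file (the temporary file and the name file_lines are assumptions made for this illustration) and reads the raw lines back to show that no double quotation marks are present.

```r
x <- data.frame(
    row.names = c("Bonds", "Aaron", "Ruth"),
    Hits      = c(2935, 3771, 2873),
    Home_Runs = c(762, 755, 714))

out <- tempfile(fileext = ".txt")  # temporary file used only for this sketch
write.table(x, out, sep = "\t", quote = FALSE,
            row.names = TRUE, col.names = NA)

file_lines <- readLines(out)       # raw text of the file, one string per line
print(file_lines)
unlink(out)                        # remove the temporary file
```

The first line starts with a tab because col.names=NA leaves the top-left cell blank.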

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at-bats and write the result to "outfile4.txt".

10 Writing a program in a file

So far, we have worked interactively without saving the R code we have written. However, when we repeat the same work with R, it is laborious to interactively write the same code again and again. To solve this issue, we can write the code in a file, and R can read the file to execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10)
z <- x + y

Then, giving source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y, and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>

Exercise 10: Write the procedure in Exercise 8 to a file dframetest.R and execute the procedure using "source".

11 Defining functions

In mathematics, a function outputs a value which is determined by the input values. In programming, it often represents a defined set of procedures (see footnote 5). For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows by introducing the keyword "function".

f <- function(x, y) {
    return((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to parameters x and y, respectively, and 15 will be returned, as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows.

f(x = 10, y = 20)

Footnote 5: "Subroutine" may be the better terminology.

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, …) {
    various procedures, possibly using the given parameters (see footnote 6)
    …
    return(return_value)
}
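The general form above can be sketched with two small functions. The first is the f defined earlier, called with named parameters in a different order. The second, average_of, is a hypothetical function introduced here to show how optional parameters (…) can be collected with list(…).

```r
f <- function(x, y) {
    return((x + y) / 2)
}

# average_of is a hypothetical name; it accepts any number of numeric inputs
average_of <- function(...) {
    values <- list(...)                  # optional parameters collected as a list
    total  <- sum(unlist(values))        # flatten the list and add everything up
    return(total / length(values))
}

print(f(y = 20, x = 10))        # named parameters may be given in any order
print(average_of(10, 20, 30))
```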

Exercise 11: Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12 Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest ones. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1,2), (3,4), (5,9), (7,7), and (9,8).

x <- c(1,3,5,7,9)
y <- c(2,4,9,7,8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels for the x and y axes can be given through the parameters xlab and ylab, respectively.

Footnote 6: Optional parameters (…) can be accessed as a list with list(…).

Figure: A plot created by the "plot" function.

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the heights of the bars as vector x. The labels of the bars are given by the element names of the vector x, i.e., names(x).

x <- c(1,2,3,2,10,1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

Figure: A bar graph created by the "barplot" function.

The "hist" function generates a histogram for a set of numbers given by a vector.


x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

Figure: A histogram generated using the "hist" function.

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11,12,11,10,11,11,12,13,15,12,11,10,12,13)

x2 <- c(20,21,27,9,12,23,23,12,11,9,21,15,7,12,12,9,23,15)

boxplot(x1, x2, names=c("Data 1", "Data 2"))


Figure: Example of a box plot for "Data 1" and "Data 2". The annotations in the figure mark the median, an outlier, the range containing 50% of the data, the bottom 25% excluding outliers, and the top 25% excluding outliers.

Exercise 12: Using the table (data frame) given in Exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with programming constructs which are important and common in other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition (see footnote 7) is satisfied. For example, if you want to assign 1 to y when x > 0, and otherwise assign 0 to y, you can write as follows.

if (x > 0) {
    y <- 1
} else {
    y <- 0
}

Footnote 7: When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Investigate the value of y after doing x <- -5 and running the code above. Then, after doing x <- 3 and running the code again, investigate the value of y once more to make sure that the value has changed.

The general form of the if-statement is as follows.

if (condition 1) {
    Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
    Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
    Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if … {
    …
} else {
    Set of procedures to execute when NONE of the above conditions are satisfied
}
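A concrete instance of this general form might look like the sketch below. The function sign_label is a hypothetical helper introduced only for illustration; it walks through the conditions in order and uses the final else for the case where none of them holds.

```r
# sign_label is a hypothetical helper used only for this illustration
sign_label <- function(x) {
    if (x > 0) {
        label <- "positive"   # condition 1 is satisfied
    } else if (x < 0) {
        label <- "negative"   # condition 2 is satisfied, condition 1 is not
    } else {
        label <- "zero"       # none of the above conditions are satisfied
    }
    return(label)
}

print(sign_label(-5))
print(sign_label(3))
print(sign_label(0))
```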

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows (see footnote 8).

while (condition) {
    A set of procedures to execute when the condition is satisfied
}

Footnote 8: A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to immediately stop the iteration and get out of the while-loop.

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you are to write this procedure as a program in a file, you may want to write each step line by line as follows, so that the program will be easy to read.

x <- 1
while (x <= 3) {
    print(x)
    x <- x + 1
}

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 will be displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first loop of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, which is 2, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, which is 3, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point, and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
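The next-statement and break-statement mentioned for the while-statement can be sketched as follows. The loop condition is always TRUE here, so the loop ends only through break; the vector shown is an illustrative helper that records which values were printed.

```r
x <- 0
shown <- NULL            # values that get printed
while (TRUE) {
    x <- x + 1
    if (x > 4) {
        break            # stop the iteration and leave the loop
    }
    if (x == 2) {
        next             # skip the rest of this iteration
    }
    print(x)
    shown <- c(shown, x)
}
```

After the loop, shown holds 1, 3 and 4, and x is 5: the value 2 was skipped by next, and the loop was left by break as soon as x exceeded 4.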

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements) {
    procedures
}

For example, for (i in c(1,3,5)) { print(i) } will assign each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second iteration, and finally 5 in the third iteration. The following example makes a vector of the squared values of 1 to 5 (see footnote 9).

x <- NULL
for (i in 1:5) {
    x <- append(x, i^2)
}

Footnote 9: Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector.

In the for-statement, each value of 1 to 5 is assigned to i, and after the assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to vector x (see footnote 10).
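The loop described above and the loop-free alternative can be placed side by side. The sketch below builds the squared values both ways and confirms that they agree; x_loop and x_vec are illustrative names.

```r
x_loop <- NULL
for (i in 1:5) {
    x_loop <- append(x_loop, i^2)  # grow the vector one element at a time
}

x_vec <- (1:5)^2                   # same values without an explicit loop

print(x_loop)
print(all(x_loop == x_vec))
```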

Exercise 13-3: Rewrite the procedure in Exercise 13-2 using a for-statement.

14 Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

14.3 Family of apply functions

The "sapply" function applies a given function to each value in a given vector, one by one, using each value as a single input, and returns a vector containing the resulting output values.

func1_sub <- function(elm) {
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

func1 <- function(x) {  # a function with a vector parameter x
    return(sapply(x, func1_sub))
}

Footnote 10: The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
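A short usage sketch for these two functions follows. The input vector is chosen here for illustration: values between -1 and 1 (inclusive) map to 1, and all other values map to 0.

```r
func1_sub <- function(elm) {
    # expects a single number
    if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

func1 <- function(x) {
    # applies func1_sub to every element of the vector x
    return(sapply(x, func1_sub))
}

inputs <- c(-2, -1, 0, 0.5, 3)   # sample inputs chosen for illustration
print(func1(inputs))
```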

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)                         # Draws the normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000)    # 1000 is the number of points
curve(func1, -3, 3)                          # The function defined in 14.3 Family of apply functions


You can also perform vector comparison For example x gt 5 will give you vectors

containing TRUE andor FALSE (abbreviated as T and F respectively) where each element

denotes whether corresponding number in x is greater than x or not

gt x lt- c(246810)

gt x

[1] 2 4 6 8 10

gt x gt 5

[1] FALSE FALSE TRUE TRUE TRUE

gt

R function ldquowhichrdquo will returns where the Trsquos are in the given vector as index numbers

Using the vector c(F F T T T) returned by x gt 5 which(c(F F T T T)) will give you vector

(345) More easily ldquowhich(x gt 5)rdquo will give you exactly the same answer

gt x lt- c(246810)

gt which(x gt 5)

[1] 3 4 5

gt

Using this vector which is composed of T and F we can extract elements which correspond

to T

x[ c(F F T T T)]

By putting above together it is sufficient to just type

x[ x gt 5 ]

to extract elements whose values are above 5

5

Actually we can also deal with set of strings as vector For example after typing

x lt- c(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)

x[2] will give you second string rdquoMondayrdquo Various manipulations are available for a vector

containing strings

grep(es x)

will give you indices of strings in vector x where string contain substring ldquoesrdquo Thus the

below inputs will return strings containing ldquoesrdquo

gt x lt- c(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)

gt x[ grep(es x) ]

[1] Tuesday Wednesday

gt

You can assign name for each element in a vector using function ldquonamerdquo For example

x lt- c(2 4 6)

names(x) lt- c(First Second Third)

will assign (ldquoFirstrdquo ldquoSecondrdquo ldquoThirdrdquo) to name attribute of variable x

x[[ Second ]]

will give you second element

Various statistical functions are defined for vectors containing only numerical values For

example

sum(x)

will give you sum of numerical elements in vector x and

mean(x)

will give you average of numerical elements in x The function sum will count number of Trsquos

in vector x containing only T andor F Thus

x lt- c(246810)

sum(x gt 5)

will count number of numerical elements greater than 5 (ie 3 will be returned)

Exercise 4-1 Create vector containing numerical values larger than 5 in vector (3 1 4 1 5

9 2 6 5)

6

Exercise 4-2 Assign names (ldquoJanrdquo ldquoFebrdquo ldquoMarrdquo ldquoAprrdquo ldquoMayrdquo ldquoJunrdquo ldquoJulrdquo ldquoAugrdquo ldquoSeprdquo

ldquoOctrdquo ldquoNovrdquo ldquoDecrdquo) for respective element of vector (123456789101112) Then

extract third element from the vector using the string rdquoMarrdquo The extracted element should

be 3

5 Simple matrix construction and arithmetic

By gathering vectors it is possible to create a matrix using the function rbind For example

if you want to create a matrix x = 1 2 3

4 5 6

aelig

egraveccedil

ouml

oslashdivide where the first and second rows are (123)

and (456) respectively type

x lt- rbind(c(123) c(456))

Then just type x to display the matrix you have created

gt x lt- rbind(c(123) c(456))

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt

Alternatively

x lt- matrix(c(1 4 2 5 3 6) nrow=2 ncol=3)

or

x lt- matrix(c(1 2 3 4 5 6) nrow=2 ncol=3 byrow=T)

should return the identical matrix The latter one means that matrix of size 2 times 3 will be

created (nrow=2 ncol=3) and numbers are filled in each row first (byrow=T)

Various functions are available for a matrix nrow(x) and ncol(x) will return number of

rows and columns of the matrix x respectively x+1 will add 1 to all the elements in x and x

2 will multiply all the elements in x by 2 Various kinds of arithmetic between matrices

are also available For example after typing y lt- rbind(c(2 4 6) c(8 10 12)) x + y will give

you matrix where each element correspond to sum of corresponding element of x and y x y

will give you matrix where each element correspond to product of corresponding element of

7

x and y2

Any specific row or column can be extracted from a given matrix For example if you

want to extract second row you type

gt x[2]

[1] 4 5 6

gt

If you want to extract second column you type

gt x[2]

[1] 2 5

t(x) will transpose the matrix x

gt t(x)

[1] [2]

[1] 1 4

[2] 2 5

[3] 3 6

gt

Average of each row of matrix x can be calculated as below3

gt apply(x 1 mean)

[1] 2 5

Average of each column of matrix x can be calculated as below

gt apply(x 2 mean)

[1] 25 35 45

Actually matrix can be deemed as set of numbers in two dimensional array R can also

deal with array having n dimensions using array(vector numbers of elements in each

2 Product of the matrices based on canonical mathematical definition can be calculated by

the operator 3 Second parameter of 1 in apply function denotes that we will calculate average for each

position of the first dimension (in this case the first dimension is row number) Similarly

second parameter of 2 denotes that we will calculate average for each position of the second

dimension (in this case the second dimension is column number)

8

dimension) For example

x lt- array(124 c(342))

will create three dimensional array with size of 3times4times2 and fills numbers in the given

vector from the first dimension to the third dimension

gt x lt- array(124 c(342))

gt x

1 The first 3times4 array of the third dimension

[1] [2] [3] [4]

[1] 1 4 7 10

[2] 2 5 8 11

[3] 3 6 9 12

2 The second 3times4 array of the third dimension

[1] [2] [3] [4]

[1] 13 16 19 22

[2] 14 17 20 23

[3] 15 18 21 24

gt

Exercise 5-1 Calculate 1 3 5

7 9 11

aelig

egraveccedil

ouml

oslashdivide+

1 2 3

2 4 6

aelig

egraveccedil

ouml

oslashdivide

Exercise 5-2 For the matrix obtained in the above calculation calculate row average and

column average

6 Simple list creation

A list in R can gather various kinds of data into a single object to manage them

x lt- list(Ichiro rdquo Seattlerdquo Right fielder)

will create a list containing Ichiro rdquo Seattlerdquo Right fielder Although a vector cannot

9

contain another vector a list can contain a vector as shown in the example below

x lt- list(Ichiro Seattle Right fielder c(184 214 225))

To extract the second element you can type

x[[2]]

You can assign names to each element as below

x lt- list(player = Ichiro team = Seattle

position = Right fielder hits = c(184 214 225))

Type x to confirm that names are given to each element

gt x

$player

[1] Ichiro

$team

[1] Seattle

$position

[1] Right fielder

$hits

[1] 184 214 225

gt

You can use assigned name to extract corresponding element For example

x[[team]]

or

x$team

will extract ldquoSeattlerdquo

Exercise 6-2 Create a list containing ldquoSan Diegordquo vector (32 117) 9645 and ldquoCaliforniardquo

Give names ldquocityrdquo ldquocoordinatesrdquo ldquoareardquo and ldquostaterdquo to respective element

10

11

7 Simple data frame creation

Data frame is one of data class in R It is one type of list and has two dimensional

structure just like a matrix Each row can be deemed as a sample and each column can be

deemed as attribute of the sample Using this idea data frame can represent a table

Following table gives data of five retired major league baseball players

Team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

The above table can be represented by data frame as follows

x lt- dataframe(

rownames = c(Rose Aaron Yastrzemski Ripken Cobb)

team = c(Reds Brewers Red Sox Orioles Athletics)

atbats = c(14053 12364 11988 11551 11434)

hits = c(4256 3771 3419 3184 4191)

homeruns = c(160 755 452 431 117))

The function dataframe can generate a data frame and can be used in the format of

dataframe(rownames = vector for column labels column name 1 = vector 1 column name

2 = vector 2 hellip) You can check the content of x by typing x

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

Like a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>

Any part of a data frame can be extracted as another data frame. For example,

x[c(1, 5), c(2, 3, 4)]

will extract rows 1 and 5 and columns 2, 3, and 4 as a new data frame. You can also use row names and column names to do the same thing.

x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]
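The two ways of extracting part of a data frame can be compared directly. The following is a minimal sketch that rebuilds the data frame x from the text; the variable names by_index, by_name, and many_hits, and the final step that selects rows with a logical condition, are illustrative assumptions and not part of the original tutorial.

```r
# Rebuild the data frame used in the text.
x <- data.frame(
    row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
    team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
    atbats = c(14053, 12364, 11988, 11551, 11434),
    hits = c(4256, 3771, 3419, 3184, 4191),
    homeruns = c(160, 755, 452, 431, 117))

# Extract the same rows and columns by position and by name.
by_index <- x[c(1, 5), c(2, 3, 4)]
by_name <- x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]
print(by_index)
print(by_name)

# Assumed extra step: a logical condition on a column also selects rows.
many_hits <- x[x$hits > 4000, ]
print(row.names(many_hits))
```

Leaving the column position empty, as in x[x$hits > 4000, ], keeps all columns of the selected rows.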

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names, and class. By typing names(x), row.names(x), and class(x), you can obtain the column names, row names, and class of x, respectively.

Exercise 7-1: The table below shows characteristic values of each planet in the solar system. Masses are given in units of 10^23 kg and diameters in km. Make a data frame representing the table. [4]

              Mass  Diameter  Satellites
Mercury      3.301      4879           0
Venus       48.690     12103           0
Earth       59.736     12756           1
Mars         6.419      6794           2
Jupiter  18986         142984          63
Saturn    5688         120536          48
Uranus     868.6        51118          27
Neptune   1024          49528          13

[4] As all the elements are numeric in this case, this data frame can also be treated as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2: Using the data frame created in the previous exercise, calculate the averages of mass, diameter, and number of satellites.

8 Data reading from file

So far we have been entering data directly. In most cases, however, numerical data are prepared in files, so we will learn how to import data from a file into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

14
14
21
35
6

Using setwd, we change the working directory to /Users/smith/TMP.

setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

Notice that the values are stored in x as a vector.

R has a function read.table to read a table in which the columns of each line are separated by tabs, as in the following example (the file name is "batters.txt").

           Team     At_Bat  Hits  Home_Runs
Bonds      Giants     9847  2935        762
Aaron      Braves    12364  3771        755
Ruth       Yankees    8398  2873        714
Rodriguez  Yankees   10341  3070        687
Mays       Giants    10881  3283        660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table gives the name of the file to read, "batters.txt". Next, header = T states that the table has a header line, and sep = "\t" indicates that the columns in each line are separated by tabs. Finally, row.names = 1 specifies that the first column holds the row names. Since x is a data frame, we can get the numbers of hits with x$Hits.
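To try read.table without preparing a file by hand, one possible approach is to write a small tab-delimited file to a temporary location first. The sketch below assumes a temporary path from tempfile() instead of "batters.txt" and uses only the first two rows of the table; both choices are illustrative.

```r
# Write a small tab-delimited file to a temporary (assumed) location.
path <- tempfile(fileext = ".txt")
writeLines(c("\tTeam\tAt_Bat\tHits\tHome_Runs",
             "Bonds\tGiants\t9847\t2935\t762",
             "Aaron\tBraves\t12364\t3771\t755"), path)

# Read it back as a data frame; the first column supplies the row names.
x <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
print(x)
print(x$Hits)
print(x["Aaron", "At_Bat"])

unlink(path)   # remove the temporary file
```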

Exercise 8: Save the table in Exercise 7-1 as a tab-delimited file and read it as a data frame.

9 Writing data to file

There are many cases where we want to have statistical results written to a file rather than only viewing them temporarily on the screen. That way, we can open the results in spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to a file. This function writes the numerical values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10
12
15
19
21
34

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes the matrix x. If we did not do this, the matrix written to the file would look transposed, because write outputs the elements column by column. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1 2 3
4 5 6

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
    row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
    Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
    At_Bat = c(9847, 12364, 8398, 10341, 10881),
    Hits = c(2935, 3771, 2873, 3070, 3283),
    Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names are written, respectively, with a blank field at the top-left.

Content of outfile3.txt:

""           "Hits"  "Home_Runs"
"Bonds"      2935    762
"Aaron"      3771    755
"Ruth"       2873    714
"Rodriguez"  3070    687
"Mays"       3283    660

By giving quote=F, the output will be written without double quotation marks.
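The effect of quote=F can be checked by writing a small data frame and reading the raw lines back. This is a minimal sketch: the temporary path from tempfile() and the two-row data frame are assumptions made for illustration, and readLines is used only to inspect the file.

```r
x <- data.frame(
    row.names = c("Bonds", "Aaron"),
    Hits = c(2935, 3771),
    Home_Runs = c(762, 755))

path <- tempfile(fileext = ".txt")   # assumed output location
write.table(x, path, sep = "\t", row.names = TRUE, col.names = NA,
            quote = FALSE)

# Inspect the file line by line; fields are separated by tabs.
out_lines <- readLines(path)
print(out_lines)

unlink(path)   # remove the temporary file
```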

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at-bats and write the result to "outfile4.txt".

10 Writing a program in a file

So far we have worked interactively without saving the R code we have written. However, when we repeat the same work in R, it is laborious to type the same code again and again. To solve this, we can write the code in a file and have R read the file and execute the code in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10)
z <- x + y

Then source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y, and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
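The same steps can also be tried entirely from within R. The sketch below is one possible way to do so: it writes the three lines to a temporary script (the path from tempfile() is an assumption, standing in for vecsumtest.R) and then runs it with source.

```r
# Create a temporary script file holding the three lines of code.
script <- tempfile(fileext = ".R")   # assumed location of the script
writeLines(c("x <- c(1,2,3,4,5)",
             "y <- c(2,4,6,8,10)",
             "z <- x + y"), script)

# Execute the code in the file; x, y, and z are now defined.
source(script)
print(z)

unlink(script)   # remove the temporary file
```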

Exercise 10: Write the procedure in Exercise 8 to a file dframetest.R and execute the procedure using "source".

11 Defining functions

In mathematics, a function outputs a value determined by its input value. In programming, it often represents a defined set of procedures. [5] For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows, using the keyword "function".

f <- function(x, y){
    return ((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to the parameters x and y, respectively, and 15 will be returned, since (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also give the parameter names x and y explicitly, as follows.

f(x = 10, y = 20)

[5] "Subroutine" may be the better term.

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, …){
    various procedures possibly using the given parameters [6]
    …
    return(return_value)
}
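As a small illustration of the general form, the following sketch defines a function that accepts any number of inputs through the optional parameters … and reads them with list(…). The function name mean_of and its behavior are hypothetical and chosen only for this example.

```r
# mean_of is a hypothetical name; it averages any number of input values.
mean_of <- function(...){
    values <- list(...)          # the optional parameters as a list
    total <- 0
    for (v in values){
        total <- total + v
    }
    return (total / length(values))
}

print(mean_of(10, 20))
print(mean_of(1, 2, 3, 4))
```

With two inputs, mean_of behaves like the function f defined above.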

Exercise 11: Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12 Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1,2), (3,4), (5,9), (7,7), and (9,8).

x <- c(1,3,5,7,9)
y <- c(2,4,9,7,8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels for the x and y axes can be given with the parameters xlab and ylab, respectively.

[6] Optional parameters … can be accessed as a list with list(…).

[Figure: A plot created by the "plot" function]

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in the vector x. The labels of the bars are given by the element names of the vector x, i.e., names(x).

x <- c(1,2,3,2,10,1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

[Figure: A bar graph created by the "barplot" function]

The "hist" function generates a histogram for a set of numbers given by a vector.

x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

[Figure: A histogram generated using the "hist" function]

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11,12,11,10,11,11,12,13,15,12,11,10,12,13)
x2 <- c(20,21,27,9,12,23,23,12,11,9,21,15,7,12,12,9,23,15)
boxplot(x1, x2, names=c("Data 1", "Data 2"))

[Figure: Example of a box plot of "Data 1" and "Data 2". Labels in the figure: "Median"; "Outlier"; "50% of the data are within this range"; "The bottom 25%, excluding outliers"; "The top 25%, excluding outliers".]

Exercise 12: Using the table (data frame) given in Exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with control structures that are important and common to other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition [7] is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows.

if (x > 0){
    y <- 1
} else {
    y <- 0
}

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Check the value of y after doing x <- -5 and executing the code above. Then, after doing x <- 3 and executing the code again, check the value of y once more to make sure that the value has changed.

The general form of the if-statement is as follows.

if (condition 1){
    Set of procedures to execute when condition 1 is satisfied
} else if (condition 2){
    Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3){
    Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if … {
    …
} else {
    Set of procedures to execute when NONE of the above conditions is satisfied
}
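The general form above can be illustrated with a chain of three branches. This is only a sketch; the function name classify and the labels it returns are hypothetical.

```r
# classify is a hypothetical helper that labels a single number.
classify <- function(x){
    if (x > 0){
        label <- "positive"
    } else if (x < 0){
        label <- "negative"
    } else {
        label <- "zero"      # reached only when neither condition holds
    }
    return (label)
}

print(classify(3))
print(classify(-5))
print(classify(0))
```

Only the first branch whose condition is satisfied is executed; the remaining conditions are not checked.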

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a given set of procedures as long as the given condition is satisfied. The general form of the while-statement is as follows. [8]

while (condition){
    A set of procedures to execute when the condition is satisfied
}

[8] A next-statement in a while-statement forces the program to start the next iteration immediately. A break-statement in a while-statement forces the program to stop the iteration immediately and exit the while-loop.

For example, executing while (x <= 3){ print(x); x <- x + 1 } after doing x <- 1 will output 1 to 3.

> x <- 1
> while (x <= 3){ print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you write this procedure as a program in a file, you may want to write each step on its own line, as follows, so that the program is easy to read.

x <- 1
while (x <= 3){
    print(x)
    x <- x + 1
}

Initially, the value of x is 1 and the condition of the while-block, i.e., x ≤ 3, is met. Thus R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first iteration of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., displaying the value of x, which is 2, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., displaying the value of x, which is 3, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus the R interpreter steps out of the while-loop. The final value of x is 4.
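The next-statement and break-statement mentioned for the while-statement can be tried in a short loop. The following is a minimal sketch; the variable name shown and the particular conditions (skipping multiples of 3 and stopping after 7) are arbitrary choices made for illustration.

```r
x <- 0
shown <- NULL                # collects the values that get displayed
while (TRUE){
    x <- x + 1
    if (x > 7){
        break                # leave the while-loop entirely
    }
    if (x %% 3 == 0){
        next                 # skip the rest of this iteration
    }
    shown <- c(shown, x)
    print(x)
}
```

Because the condition of this loop is always TRUE, the break-statement is the only way out of the loop.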

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements){
    procedures
}

For example, for (i in c(1,3,5)){ print(i) } will assign each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second, and finally 5 in the third. The following example makes a vector of the squared values of 1 to 5. [9]

x <- NULL
for (i in 1:5){
    x <- append(x, i^2)
}

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector. In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed. In the procedure in the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to the vector x. [10]
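The same pattern of starting from an empty vector and appending in each iteration can be used for other calculations. The sketch below builds a running total; the variable names values, running, and total are assumptions, and the built-in cumsum is used only to confirm the result.

```r
values <- c(2, 4, 6, 8)
running <- NULL              # empty vector to be filled in the loop
total <- 0
for (v in values){
    total <- total + v       # add the current element
    running <- append(running, total)
}
print(running)

# The built-in cumsum function gives the same running totals.
print(cumsum(values))
```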

Exercise 13-3: Rewrite the procedure in Exercise 13-2 using a for-statement.

14 Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is numeric, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

[10] The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.

14.3 Family of apply functions

The "sapply" function applies a given function to each value in a given vector, one value at a time, and returns a vector containing the resulting output values.

func1_sub <- function(elm){
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1){ return (1) } else { return (0) }
}

func1 <- function(x){    # a function with a vector parameter x
    return(sapply(x, func1_sub))
}
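To see what func1 returns, it can be called on a small vector. The sketch below repeats the two definitions so that it runs on its own; the input values and the variable names result and squares are arbitrary and chosen only for illustration.

```r
func1_sub <- function(elm){
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1){ return (1) } else { return (0) }
}

func1 <- function(x){    # a function with a vector parameter x
    return(sapply(x, func1_sub))
}

result <- func1(c(-2, -1, 0, 0.5, 3))
print(result)

# sapply also accepts a function written in place.
squares <- sapply(1:3, function(i) i^2)
print(squares)
```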

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)                         # Draws normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000)    # 1000 is number of points
curve(func1, -3, 3)                          # The function defined in 14.3 Family of apply functions

One can extract a desired number from a vector. For example, after entering x <- c(2,4,6,8,10),

x[3]

will give you 6, which is the third number of the vector,

x[c(2,4)]

will give you the second and fourth numbers of the vector, and

x[2:4]

will give you the second through fourth elements of the vector.

You can also perform vector comparison. For example, x > 5 will give you a vector containing TRUE and/or FALSE (abbreviated as T and F, respectively), where each element denotes whether the corresponding number in x is greater than 5 or not.

> x <- c(2,4,6,8,10)
> x
[1]  2  4  6  8 10
> x > 5
[1] FALSE FALSE  TRUE  TRUE  TRUE
>

The R function "which" returns the positions of the T's in the given vector as index numbers. Using the vector c(F, F, T, T, T) returned by x > 5, which(c(F, F, T, T, T)) will give you the vector (3, 4, 5). More simply, which(x > 5) will give you exactly the same answer.

> x <- c(2,4,6,8,10)
> which(x > 5)
[1] 3 4 5
>

Using a vector composed of T and F, we can extract the elements which correspond to T.

x[c(F, F, T, T, T)]

Putting the above together, it is sufficient to just type

x[x > 5]

to extract the elements whose values are above 5.
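The three ways of selecting the same elements can be placed side by side. This is a minimal sketch; the variable names mask, idx, and middle, and the combined condition in the last step, are illustrative assumptions.

```r
x <- c(2, 4, 6, 8, 10)

mask <- x > 5            # vector of TRUE and FALSE
idx <- which(mask)       # positions of the TRUE values

print(x[mask])           # extraction with a logical vector
print(x[idx])            # extraction with index numbers
print(x[x > 5])          # the short form used in the text

# Assumed extra step: two conditions combined with & (logical AND).
middle <- x[x > 3 & x < 9]
print(middle)
```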

We can also deal with a set of strings as a vector. For example, after typing

x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

x[2] will give you the second string, "Monday". Various manipulations are available for a vector containing strings.

grep("es", x)

will give you the indices of the strings in vector x that contain the substring "es". Thus the input below will return the strings containing "es".

> x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
> x[grep("es", x)]
[1] "Tuesday"   "Wednesday"
>

You can assign a name to each element in a vector using the function "names". For example,

x <- c(2, 4, 6)
names(x) <- c("First", "Second", "Third")

will assign ("First", "Second", "Third") to the names attribute of variable x.

x[["Second"]]

will give you the second element.

Various statistical functions are defined for vectors containing only numerical values. For example,

sum(x)

will give you the sum of the numerical elements in vector x, and

mean(x)

will give you the average of the numerical elements in x. When applied to a vector containing only T and/or F, the function sum counts the number of T's. Thus

x <- c(2,4,6,8,10)
sum(x > 5)

will count the number of elements greater than 5 (i.e., 3 will be returned).

Exercise 4-1: Create a vector containing the numerical values larger than 5 in the vector (3, 1, 4, 1, 5, 9, 2, 6, 5).

Exercise 4-2: Assign the names ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") to the respective elements of the vector (1,2,3,4,5,6,7,8,9,10,11,12). Then extract the third element from the vector using the string "Mar". The extracted element should be 3.

5 Simple matrix construction and arithmetic

By gathering vectors, it is possible to create a matrix using the function rbind. For example, if you want to create a matrix

\[ x = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]

where the first and second rows are (1,2,3) and (4,5,6), respectively, type

x <- rbind(c(1,2,3), c(4,5,6))

Then just type x to display the matrix you have created.

> x <- rbind(c(1,2,3), c(4,5,6))
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
>

Alternatively,

x <- matrix(c(1, 4, 2, 5, 3, 6), nrow=2, ncol=3)

or

x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)

should return the identical matrix. The latter means that a matrix of size 2 × 3 will be created (nrow=2, ncol=3) and the numbers are filled in row by row (byrow=T).

Various functions are available for a matrix. nrow(x) and ncol(x) will return the numbers of rows and columns of the matrix x, respectively. x + 1 will add 1 to all the elements in x, and x * 2 will multiply all the elements in x by 2. Various kinds of arithmetic between matrices are also available. For example, after typing y <- rbind(c(2, 4, 6), c(8, 10, 12)), x + y will give you a matrix where each element is the sum of the corresponding elements of x and y, and x * y will give you a matrix where each element is the product of the corresponding elements of x and y. [2]
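The difference between the element-by-element product and the mathematical matrix product can be seen with the two matrices above. The following is a minimal sketch; since x and y are both 2 × 3, y is transposed with t(y) before applying %*%, which is a choice made here only so that the dimensions match. The names elementwise and matprod are assumptions.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))
y <- rbind(c(2, 4, 6), c(8, 10, 12))

# Element-by-element product: the result is again a 2 x 3 matrix.
elementwise <- x * y
print(elementwise)

# Matrix product of a 2 x 3 matrix and a 3 x 2 matrix: a 2 x 2 matrix.
matprod <- x %*% t(y)
print(matprod)
```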

Any specific row or column can be extracted from a given matrix. For example, if you want to extract the second row, type

> x[2,]
[1] 4 5 6
>

If you want to extract the second column, type

> x[,2]
[1] 2 5

t(x) will transpose the matrix x.

> t(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>

The average of each row of matrix x can be calculated as below. [3]

> apply(x, 1, mean)
[1] 2 5

The average of each column of matrix x can be calculated as below.

> apply(x, 2, mean)
[1] 2.5 3.5 4.5

[2] The product of matrices based on the canonical mathematical definition can be calculated with the operator %*%.

[3] A second parameter of 1 in the apply function denotes that we calculate the average for each position of the first dimension (in this case, the first dimension is the row number). Similarly, a second parameter of 2 denotes that we calculate the average for each position of the second dimension (in this case, the second dimension is the column number).

A matrix can be regarded as a set of numbers in a two-dimensional array. R can also deal with arrays having n dimensions using array(vector, numbers of elements in each dimension). For example,

x <- array(1:24, c(3,4,2))

will create a three-dimensional array of size 3×4×2 and fill it with the numbers in the given vector, from the first dimension to the third dimension.

> x <- array(1:24, c(3,4,2))
> x
, , 1        (the first 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2        (the second 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

>
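Elements of such an array can be extracted with one index per dimension, in the same way as for a matrix. The sketch below is an illustration only; the variable names one_value, second_slice, and slice_sums, and the use of apply over the third dimension, are assumptions that extend the matrix examples to three dimensions.

```r
x <- array(1:24, c(3, 4, 2))
print(dim(x))

# Row 2, column 3 of the first 3 x 4 array.
one_value <- x[2, 3, 1]
print(one_value)

# The whole second 3 x 4 array, returned as a matrix.
second_slice <- x[, , 2]
print(second_slice)

# Sum over each position of the third dimension.
slice_sums <- apply(x, 3, sum)
print(slice_sums)
```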

Exercise 5-1: Calculate

\[ \begin{pmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix} \]

Exercise 5-2: For the matrix obtained in the above calculation, calculate the row averages and column averages.

6 Simple list creation

A list in R can gather various kinds of data into a single object to manage them.

x <- list("Ichiro", "Seattle", "Right fielder")

will create a list containing "Ichiro", "Seattle", and "Right fielder". Although a vector cannot contain another vector, a list can contain a vector, as shown in the example below.

x <- list("Ichiro", "Seattle", "Right fielder", c(184, 214, 225))

To extract the second element, you can type

x[[2]]

You can assign names to each element as below.

x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

Type x to confirm that names are given to each element.

> x
$player
[1] "Ichiro"

$team
[1] "Seattle"

$position
[1] "Right fielder"

$hits
[1] 184 214 225

>

You can use an assigned name to extract the corresponding element. For example,

x[["team"]]

or

x$team

will extract "Seattle".
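Extraction by name also works for an element that is itself a vector, and the result can then be indexed further. The following is a minimal sketch; the added element league and its value are hypothetical and serve only to show that a named element can be added after the list is created.

```r
x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

print(names(x))              # names of all elements
print(x[["hits"]][2])        # second number of the vector stored in hits
print(mean(x$hits))          # a vector element can be used in calculations

# Hypothetical new element added by name.
x$league <- "American"
print(length(x))
```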

Exercise 6-2: Create a list containing "San Diego", the vector (32, 117), 964.5, and "California". Give the names "city", "coordinates", "area", and "state" to the respective elements.

10

11

7 Simple data frame creation

Data frame is one of data class in R It is one type of list and has two dimensional

structure just like a matrix Each row can be deemed as a sample and each column can be

deemed as attribute of the sample Using this idea data frame can represent a table

Following table gives data of five retired major league baseball players

Team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

The above table can be represented by data frame as follows

x lt- dataframe(

rownames = c(Rose Aaron Yastrzemski Ripken Cobb)

team = c(Reds Brewers Red Sox Orioles Athletics)

atbats = c(14053 12364 11988 11551 11434)

hits = c(4256 3771 3419 3184 4191)

homeruns = c(160 755 452 431 117))

The function dataframe can generate a data frame and can be used in the format of

dataframe(rownames = vector for column labels column name 1 = vector 1 column name

2 = vector 2 hellip) You can check the content of x by typing x

gt x

team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

gt

Like a regular list column name can be used to extract corresponding vector

gt x$hits

[1] 4256 3771 3419 3184 4191

gt

12

Any part of data frame can be extracted as another data frame For example

x[ c(15) c(234) ]

will extract row 1 5 and column 2 3 4 as a new data frame You can also use row names

and column names to do the same thing

x[ c(Rose Cobb) c(atbats hits homeruns)]

Various functions are defined for data frame Let us get attributes associated with the data

frame x

gt attributes(x)

$names

[1] team atbats hits homeruns

$rownames

[1] Rose Aaron Yastrzemski Ripken Cobb

$class

[1] dataframe

gt

We got names rownames and class Type names(x) rownames(x) and class(x) You can

obtain row names column names and class of x respectively

Exercise 7-1 The below table shows characteristic values of each planet in solar system

Masses are represented in 1023 kg and diameters are represented in km Make a data frame

representing the table4

Mass Diameter Satellites

Mercury 3301 4879 0

Venus 12103 48690 0

Earth 59736 12756 1

4 As all the elements are numeric in this case this data frame can be dealt as a matrix

asmatrix(x) will return x as a matrix

13

Mars 6419 6794 2

Jupiter 18986 142984 63

Saturn 5688 120536 48

Uranus 8686 51118 27

Neptune 1024 49528 13

Exercise 7-2 Using the created data frame in the previous exercise calculate averages of

mass diameter and number of satellites

8 Data reading from file

So far we have been inputting data directly However in most of cases numerical data

may be prepared by files So we will learn how to import data into a variable in R First

prepare a text file with the following values Let us name the file ldquotestdatatxtrdquo We assume

that the file is placed under the directory UserssmithTMP

14

14

21

35

6

Using setwd we change the working directory to UserssmithTMP

setwd(UserssmithTMP)

Then using scan function read the data to a variable x

gt x lt- scan(testdatatxt)

Read 5 items

gt x

[1] 14 14 21 35 6

gt

You notice that the values are stored in x as a vector

R has a function readtable to read a table where each line is separated by TABs like the

following example (file name is ldquobatterstxtrdquo)

Team At_Bat Hits Home_Runs

14

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

readtable can read this as data frame

gt x lt- readtable(batterstxt header = T

sep = t rownames = 1)

gt x

Team At_Bat Hits Home_Runs

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

gt

The first parameter of the function readtable states that the name of the file to read

is rdquobatterstxtrdquo Subsequently header = T states that there is a header for the table sep =

ldquotrdquo indicates that each line is separated by TAB Finally we can specify that there is one

column for row names by rownames = 1 Since x will be a data frame we can get number of

hits by $Hits

Exercise 8 Save table in exercise 7-1 as tab-delimited file and read it as data frame

9 Writing data to file

There are many cases where we want to have statistical results written in file rather

than temporary seeing the results on screen In this way we can open the results by

spreadsheet software or process the results using other programming language afterwards

Using write function is one of the simplest ways to output the results to files This

function enables us to output numerical values assigned to variable to file For example

after assigning the vector to x as shown below

x lt- c(10 12 15 19 21 34)

15

the following line

write(x outfile1txt ncolumns = 1)

will write content of x to the file ldquooutfile1txtrdquo ncolumns = 1 states that the vector will be

written to the file by one column

Content of outfile1txt

10

12

15

19

21

34

write function also enable us to write matrix data to file as shown below

gt x lt- matrix(c(123456) nrow=2 ncol=3 byrow=T)

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt write(t(x) outfile2txt ncolumns=ncol(x) sep=t)

gt

t(x) transposes matrix x If we do not do it the matrix written in the file will look

transposed ncolumns=ncol(x) will tell R that the number of columns to output should be

identical to that of matrix x sep = ldquotrdquo indicates that the columns in the output file will be

separated by tabs

Content of outfile2txt

1 2 3

4 5 6

For writing data frame to a file writetable function is provided

gt x lt- dataframe(

rownames = c(Bonds Aaron Ruth Rodriguez Mays)

Team = c(Giants Braves Yankees Yankees Giants)

16

At_Bat = c(9847 12364 8398 10341 10881)

Hits = c(2935 3771 2873 3070 3283)

Home_Runs = c(762 755 714 687 660))

gt x[c(Hits Home_Runs)]

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

gt writetable(x[c(Hits Home_Runs)] outfile3txt sep=t

rownames=T colnames=NA)

gt

With rownames=T and colnames=NA row names and column names will be added

respectively (blank on the top-left)

Content of outfile3

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

By giving quote=F output will be without double quotations

Excercise 9 Using the data frame in the above example calculate ratio of hits and at bat

and write the result to rdquooutfile4txtrdquo

10 Writing a program in a file

So far we have done our works interactively without saving what R codes we have written

However when we repeat the same works with R it is laborious to interactively write the

same codes again and again To solve this issue we can write the set of codes in a file and R

can read the file to execute the codes written in the file

For example we can prepare a text file with the following code Let us name the

17

file vecsumtestR

x lt- c(12345)

y lt- c(246810)

z lt- x + y

Then giving source( vecsumtestR) will let R execute the set of codes in the file

vecsumtestR You can check that based on the codes in vecsumtestR vectors are assigned

to each of x y and z

gt source( vecsumtestR)

gt x

[1] 1 2 3 4 5

gt y

[1] 2 4 6 8 10

gt z

[1] 3 6 9 12 15

gt

Exercise 10 Write the procedure in exercise 8 to a file dframetestR and execute the

procedure using ldquosourcerdquo

11 Defining functions

In mathematics function outputs the value which is determined by the input value In

programming it often represents a defined set of procedures5 For example let us consider

a function f which divides the sum of two given input values In mathematics it can be

written as f(x y) = (x + y) 2 In R it is written as follows by introducing the keyword

ldquofunctionrdquo

f lt- function(x y)

return ((x + y) 2)

In this way a function with two parameters (x and y) is defined And the returned value

(output) will be (x + y) 2 After defining f

5 Probably ldquosubroutinerdquo may be the better terminology

18

f(10 20)

will assign 10 and 20 to parameters x and y respectively and 15 will be returned as (x +y) 2

= (10 + 20) 2 = 15 Here ldquoreturnrdquo statement will return the value given just after it You

can also explicitly give the parameter names x and y as follows

f(x = 10 y = 20)

The general way to define a function is

name_of_function lt- function(parameter 1 parameter 2 hellip)

various procedures possibly using the given parameters6

hellip

return(return_value)

Exercise 11 Implement a mathematical function f(x a b c) = ax2 + bx + c in R Then

calculate f(4 3 2 1)

12. Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7) and (9, 8):

    x <- c(1, 3, 5, 7, 9)
    y <- c(2, 4, 9, 7, 8)
    plot(x, y, xlab = "X Value", ylab = "Y Value")

Labels on the x and y axes can be given through the parameters xlab and ylab, respectively.

[6] Optional parameters ... can be accessed as a list with list(...).

[Figure: A plot created by the "plot" function]

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar through vector x. The label of each bar is given by the element names of the vector x, i.e. names(x).

    x <- c(1, 2, 3, 2, 10, 1)
    names(x) <- c("A", "B", "C", "D", "E", "F")
    barplot(x)

[Figure: A bar graph created by the "barplot" function]

The "hist" function generates a histogram for a set of numbers given by a vector.

    x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
    hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

[Figure: A histogram generated using the "hist" function]
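To see the numbers behind a histogram without drawing it, hist can be called with plot = FALSE. The sketch below assumes the test values are the decimals 3.2, 1.2, and so on (the decimal points are an assumption about the original data), and simply prints the bin boundaries and the count in each bin.

```r
# Test values; the decimal points are assumed.
x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(x, plot = FALSE)

print(h$breaks)   # boundaries of the bins
print(h$counts)   # number of values falling into each bin
```

Every value falls into exactly one bin, so the counts add up to length(x).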

The "boxplot" function creates a box plot for the numbers given by the vectors.

    x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)
    x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)
    boxplot(x1, x2, names = c("Data 1", "Data 2"))
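The quantities that a box plot draws can also be inspected numerically. The following sketch uses boxplot.stats from the standard grDevices package on the first data set; the individual values of x1 are an assumed reading of the original vector.

```r
# First data set of the box plot example (values assumed).
x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

s <- boxplot.stats(x1)

# Lower whisker, lower edge of the box, median, upper edge of the box, upper whisker.
print(s$stats)

# Values drawn as separate points because they lie far from the box.
print(s$out)
```

The box spans the middle 50% of the data, and any value listed in s$out is drawn as an outlier.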

[Figure: Example of a box plot for "Data 1" and "Data 2". Annotations in the figure mark the median, an outlier, the box within which 50% of the data lie, the bottom 25% excluding outliers, and the top 25% excluding outliers.]

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13. Basic program structure

R is equipped with control structures that are important and common in other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition [7] is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows:

    if (x > 0) {
        y <- 1
    } else {
        y <- 0
    }

Investigate the value of y after doing x <- -5 and executing the statement. Then, after doing x <- 3 and executing it again, investigate the value of y once more to make sure that the value has changed.

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

The general form of the if-statement is as follows:

    if (condition 1) {
        Set of procedures to execute when condition 1 is satisfied
    } else if (condition 2) {
        Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
    } else if (condition 3) {
        Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
    } else if ... {
        ...
    } else {
        Set of procedures to execute when NONE of the above conditions is satisfied
    }
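The general form can be tried out with a small three-way branch. The function name sign_label below is hypothetical and chosen only for illustration.

```r
# Hypothetical example: describe the sign of a single number.
sign_label <- function(x) {
    if (x > 0) {
        label <- "positive"      # condition 1 is satisfied
    } else if (x < 0) {
        label <- "negative"      # condition 1 failed, condition 2 is satisfied
    } else {
        label <- "zero"          # none of the conditions above is satisfied
    }
    return(label)
}

print(sign_label(-5))
print(sign_label(3))
print(sign_label(0))
```

Only the first branch whose condition holds is executed; the remaining branches are skipped.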

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows [8]:

    while (condition) {
        A set of procedures to execute while the condition is satisfied
    }

[8] A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to immediately stop the iteration and get out of the while-loop.

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3:

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you are to write this procedure as a program in a file, you may want to write each step on its own line as follows, so that the program is easy to read:

    x <- 1
    while (x <= 3) {
        print(x)
        x <- x + 1
    }

Initially, the value of x is 1 and the condition of the while-block, i.e. x ≤ 3, is met. Thus R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 is displayed. In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first pass through the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: the value of x, 2, is displayed by print(x), and x is increased by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter comes back again to the condition check (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: the value of x, 3, is displayed by print(x), and x is increased by 1. At the end of the while-block, x is 4.

The R interpreter comes back once more to the condition check (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus the R interpreter steps out of the while-loop. The final value of x is 4.
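The next-statement and break-statement available inside a while-loop can be sketched as follows. The loop limits and the variable name shown are arbitrary choices for illustration.

```r
x <- 0
shown <- c()                 # collects the values that were displayed

while (TRUE) {               # the condition itself never fails here
    x <- x + 1
    if (x > 7) {
        break                # leave the while-loop entirely
    }
    if (x %% 3 == 0) {
        next                 # skip multiples of 3 and start the next iteration
    }
    shown <- c(shown, x)
    print(x)
}
```

Here the loop ends through break rather than through the condition check, and the values 3 and 6 are skipped by next.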

Exercise 13-2: Using the while-statement, display 1, 3, 5, 7, 9 and 11, one per line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is:

    for (variable in elements) {
        procedures
    }

For example, for (i in c(1, 3, 5)) { print(i) } will assign each of 1, 3 and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second, and finally 5 in the third. The following example makes a vector of the squared values of 1 to 5 [9]:

    x <- NULL
    for (i in 1:5) {
        x <- append(x, i^2)
    }

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector here. In the for-statement, each value of 1 to 5 is assigned to i, and after the assignment the procedure in the for-block is executed. In the procedure in the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to vector x [10].
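The loop above and the two alternatives mentioned in the footnotes can be compared side by side. The variable names x_loop, x_c and x_vec are introduced here only for the comparison.

```r
# Version 1: for-statement with append(), as in the example above.
x_loop <- NULL
for (i in 1:5) {
    x_loop <- append(x_loop, i^2)
}

# Version 2: the same loop using c() to concatenate.
x_c <- NULL
for (i in 1:5) {
    x_c <- c(x_c, i^2)
}

# Version 3: no loop at all; arithmetic works on the whole vector at once.
x_vec <- (1:5)^2

print(x_loop)
print(identical(x_loop, x_c))
print(identical(x_loop, x_vec))
```

All three versions produce the same vector; the loop-free form is usually the shortest to write.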

Exercise 13-3: Rewrite the procedure in exercise 13-2 using the for-statement.

14. Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numeric variable, character, list or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.
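A short sketch of these inspection functions on two small objects follows. The objects v and m are made up for this illustration, and the exact text printed by class() for a matrix can differ between R versions.

```r
v <- c(First = 2, Second = 4, Third = 6)     # a named numeric vector
m <- rbind(c(1, 2, 3), c(4, 5, 6))           # a 2 x 3 matrix

print(class(v))
print(mode(v))
print(class(m))        # reports that m is a matrix
print(mode(m))         # the elements themselves are numeric

print(attributes(v))   # the names given to the elements
print(attributes(m))   # the dimensions of the matrix
```

class() describes what kind of object a variable is, while mode() describes how its elements are stored.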

14.3 Family of apply functions

The "sapply" function applies a given function to each value of a given vector, one by one, using each value as a single input, and returns a vector containing the resulting output values.

    func1_sub <- function(elm) {
        # a function expecting a single number as a parameter
        if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
    }

    func1 <- function(x) {  # a function with a vector parameter x
        return(sapply(x, func1_sub))
    }

[10] The same procedure can be done with x <- c(x, i^2); c can be used to concatenate given vectors.
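The two functions can be tried on a small vector as follows. The input values and the comparison with ifelse (a standard R function that is not part of the original example) are additions for illustration.

```r
func1_sub <- function(elm) {
    # expects a single number; returns 1 if it lies in [-1, 1], otherwise 0
    if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

func1 <- function(x) {
    # applies func1_sub to every element of the vector x
    return(sapply(x, func1_sub))
}

values <- c(-2, -1, 0, 0.5, 3)
print(func1(values))

# The same result without sapply, using the vectorized ifelse().
alt <- ifelse(-1 <= values & values <= 1, 1, 0)
print(identical(func1(values), alt))
```

sapply is useful when the function itself, like func1_sub, can only handle one value at a time.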

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

    curve(dnorm, -7, +7)                          # Draws normal (Gaussian) distribution
    curve(cos(x) + cos(2*x), -2*pi, 2*pi, 1000)   # 1000 is number of points
    curve(func1, -3, 3)                           # The function defined in 14.3 Family of apply functions

Actually, we can also deal with a set of strings as a vector. For example, after typing

    x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

x[2] will give you the second string, "Monday". Various manipulations are available for a vector containing strings.

    grep("es", x)

will give you the indices of the strings in vector x that contain the substring "es". Thus, the inputs below will return the strings containing "es":

> x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
> x[ grep("es", x) ]
[1] "Tuesday"   "Wednesday"
>
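The search above can be broken into its steps. The sketch below also uses the value argument of grep and the related function grepl, both standard R but not mentioned in the text.

```r
x <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

idx <- grep("es", x)                  # positions of the matching strings
print(idx)

print(x[idx])                         # the matching strings themselves
print(grep("es", x, value = TRUE))    # same result in one step

print(grepl("es", x))                 # TRUE/FALSE for every element
```

grep returns positions, while grepl returns one logical value per element, which is convenient for counting matches with sum.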

You can assign a name to each element in a vector using the function "names". For example,

    x <- c(2, 4, 6)
    names(x) <- c("First", "Second", "Third")

will assign ("First", "Second", "Third") to the names attribute of variable x.

    x[["Second"]]

will give you the second element.

Various statistical functions are defined for vectors containing only numerical values. For example,

    sum(x)

will give you the sum of the numerical elements in vector x, and

    mean(x)

will give you the average of the numerical elements in x. The function sum will count the number of T's in a vector x containing only T and/or F. Thus,

    x <- c(2, 4, 6, 8, 10)
    sum(x > 5)

will count the number of numerical elements greater than 5 (i.e. 3 will be returned).
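The counting trick works because the comparison first produces a vector of T and F values. A small sketch makes the intermediate step visible; the variable name flags is arbitrary.

```r
x <- c(2, 4, 6, 8, 10)

flags <- x > 5        # one TRUE/FALSE per element
print(flags)

print(sum(flags))     # number of TRUE values, i.e. elements greater than 5
print(mean(flags))    # fraction of elements greater than 5
```

In arithmetic, TRUE counts as 1 and FALSE as 0, so sum gives a count and mean gives a proportion.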

Exercise 4-1: Create a vector containing the numerical values larger than 5 in the vector (3, 1, 4, 1, 5, 9, 2, 6, 5).

Exercise 4-2: Assign the names ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") to the respective elements of the vector (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). Then extract the third element from the vector using the string "Mar". The extracted element should be 3.

5. Simple matrix construction and arithmetic

By gathering vectors, it is possible to create a matrix using the function rbind. For example, if you want to create a matrix $x = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$, where the first and second rows are (1, 2, 3) and (4, 5, 6), respectively, type

    x <- rbind(c(1, 2, 3), c(4, 5, 6))

Then just type x to display the matrix you have created.

> x <- rbind(c(1, 2, 3), c(4, 5, 6))
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
>

Alternatively,

    x <- matrix(c(1, 4, 2, 5, 3, 6), nrow=2, ncol=3)

or

    x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)

should return the identical matrix. The latter means that a matrix of size 2 × 3 will be created (nrow=2, ncol=3) and that the numbers are filled in row by row (byrow=T).
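That the three constructions give the same matrix can be checked directly. The variable names a, b and d are introduced here only for the comparison.

```r
a <- rbind(c(1, 2, 3), c(4, 5, 6))                                  # stack two rows
b <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2, ncol = 3)                # fill column by column
d <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row

print(dim(a))         # number of rows and columns
print(all(a == b))    # TRUE when every element agrees
print(all(a == d))
```

Without byrow, matrix() fills the first column first, which is why the numbers are listed in a different order in b.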

Various functions are available for a matrix. nrow(x) and ncol(x) will return the number of rows and columns of the matrix x, respectively. x + 1 will add 1 to all the elements in x, and x * 2 will multiply all the elements in x by 2. Various kinds of arithmetic between matrices are also available. For example, after typing y <- rbind(c(2, 4, 6), c(8, 10, 12)), x + y will give you a matrix where each element is the sum of the corresponding elements of x and y. x * y will give you a matrix where each element is the product of the corresponding elements of x and y [2].

Any specific row or column can be extracted from a given matrix. For example, if you want to extract the second row, type

> x[2, ]
[1] 4 5 6
>

If you want to extract the second column, type

> x[, 2]
[1] 2 5
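The difference between the element-wise product and the mathematical matrix product can be sketched as follows. The operator %*% is R's standard matrix product; multiplying x by the transpose of y is an arbitrary choice made so that the dimensions fit.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))
y <- rbind(c(2, 4, 6), c(8, 10, 12))

print(x * y)          # element-wise product: same shape as x and y

p <- x %*% t(y)       # matrix product of a 2 x 3 and a 3 x 2 matrix
print(p)
print(dim(p))

print(x[2, ])         # second row of x
print(x[, 2])         # second column of x
```

The element-wise product needs two matrices of the same shape, whereas the matrix product needs the number of columns of the first to equal the number of rows of the second.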

t(x) will transpose the matrix x.

> t(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>

The average of each row of matrix x can be calculated as below [3].

> apply(x, 1, mean)
[1] 2 5

The average of each column of matrix x can be calculated as below.

> apply(x, 2, mean)
[1] 2.5 3.5 4.5

Actually, a matrix can be deemed a set of numbers in a two-dimensional array. R can also deal with arrays having n dimensions using array(vector, numbers of elements in each dimension). For example,

    x <- array(1:24, c(3, 4, 2))

will create a three-dimensional array of size 3×4×2 and fill in the numbers in the given vector from the first dimension to the third dimension.

> x <- array(1:24, c(3, 4, 2))
> x
, , 1        (the first 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2        (the second 3×4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

>

[2] The product of the matrices based on the canonical mathematical definition can be calculated by the operator %*%.

[3] The second parameter, 1, in the apply function denotes that we will calculate the average for each position of the first dimension (in this case, the first dimension is the row number). Similarly, a second parameter of 2 denotes that we will calculate the average for each position of the second dimension (in this case, the second dimension is the column number).
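Elements of a three-dimensional array are addressed with three indices, one per dimension. A brief sketch using the same array:

```r
x <- array(1:24, c(3, 4, 2))

print(dim(x))         # sizes of the three dimensions

print(x[2, 3, 1])     # row 2, column 3 of the first 3 x 4 slice
print(x[2, 3, 2])     # the same position in the second slice

print(x[, , 2])       # the whole second slice, as a 3 x 4 matrix
```

Leaving an index empty, as in x[, , 2], selects everything along that dimension.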

Exercise 5-1: Calculate $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$.

Exercise 5-2: For the matrix obtained in the above calculation, calculate the row averages and the column averages.

6. Simple list creation

A list in R can gather various kinds of data into a single object to manage them.

    x <- list("Ichiro", "Seattle", "Right fielder")

will create a list containing "Ichiro", "Seattle" and "Right fielder". Although a vector cannot contain another vector, a list can contain a vector, as shown in the example below.

    x <- list("Ichiro", "Seattle", "Right fielder", c(184, 214, 225))

To extract the second element, you can type

    x[[2]]

You can assign names to each element as below.

    x <- list(player = "Ichiro", team = "Seattle",
              position = "Right fielder", hits = c(184, 214, 225))

Type x to confirm that names are given to each element.

> x
$player
[1] "Ichiro"

$team
[1] "Seattle"

$position
[1] "Right fielder"

$hits
[1] 184 214 225

>

You can use an assigned name to extract the corresponding element. For example,

    x[["team"]]

or

    x$team

will extract "Seattle".
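The different ways of reaching into a list can be compared in a short sketch. The use of single brackets, which keep the result wrapped in a list, is an addition that the text above does not cover.

```r
x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

print(x[["team"]])              # the element itself
print(x$team)                   # same thing, shorter notation
print(class(x[["team"]]))

print(class(x["team"]))         # single brackets return a smaller list

print(x[["hits"]][2])           # second value of the vector stored under "hits"
print(mean(x$hits))             # functions can be applied to an extracted element
```

Double brackets and $ return the stored element, so the result can be used directly in further calculations.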

Exercise 6-2: Create a list containing "San Diego", the vector (32, 117), 964.5 and "California". Give the names "city", "coordinates", "area" and "state" to the respective elements.

7. Simple data frame creation

A data frame is one of the data classes in R. It is a type of list and has a two-dimensional structure, just like a matrix. Each row can be deemed a sample, and each column can be deemed an attribute of the sample. Using this idea, a data frame can represent a table. The following table gives data on five retired major league baseball players.

                 Team  atbats  hits  homeruns
    Rose         Reds   14053  4256       160
    Aaron     Brewers   12364  3771       755
    Yastrzemski Red Sox 11988  3419       452
    Ripken    Orioles   11551  3184       431
    Cobb    Athletics   11434  4191       117

The above table can be represented by a data frame as follows.

    x <- data.frame(
        row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
        team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
        atbats = c(14053, 12364, 11988, 11551, 11434),
        hits = c(4256, 3771, 3419, 3184, 4191),
        homeruns = c(160, 755, 452, 431, 117))

The function data.frame generates a data frame and can be used in the format data.frame(row.names = vector of row labels, column name 1 = vector 1, column name 2 = vector 2, ...). You can check the content of x by typing x.

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

As with a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>

Any part of a data frame can be extracted as another data frame. For example,

    x[ c(1, 5), c(2, 3, 4) ]

will extract rows 1 and 5 and columns 2, 3 and 4 as a new data frame. You can also use row names and column names to do the same thing.

    x[ c("Rose", "Cobb"), c("atbats", "hits", "homeruns") ]
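The two ways of extracting part of a data frame can be checked against each other, and rows can also be chosen by a condition. The condition on home runs and the variable names below are illustrative additions, not part of the original example.

```r
x <- data.frame(
    row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
    team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
    atbats = c(14053, 12364, 11988, 11551, 11434),
    hits = c(4256, 3771, 3419, 3184, 4191),
    homeruns = c(160, 755, 452, 431, 117))

by_index <- x[c(1, 5), c(2, 3, 4)]                                  # by position
by_name  <- x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]   # by name
print(by_name)
print(isTRUE(all.equal(by_index, by_name)))

# Rows can also be selected with a logical condition on a column.
many_homeruns <- x[x$homeruns > 400, ]
print(rownames(many_homeruns))
```

Selecting by name is often safer than selecting by position, because it keeps working if columns are added or reordered.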

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names and class. Type names(x), row.names(x) and class(x), and you will obtain the column names, row names and class of x, respectively.

Exercise 7-1: The table below shows characteristic values of each planet in the solar system. Masses are given in 10^23 kg and diameters in km. Make a data frame representing the table [4].

                 Mass  Diameter  Satellites
    Mercury     3.301      4879           0
    Venus      48.690     12103           0
    Earth      59.736     12756           1
    Mars        6.419      6794           2
    Jupiter     18986    142984          63
    Saturn       5688    120536          48
    Uranus      868.6     51118          27
    Neptune      1024     49528          13

[4] As all the elements are numeric in this case, this data frame can be dealt with as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2: Using the data frame created in the previous exercise, calculate the averages of mass, diameter and number of satellites.

8. Data reading from file

So far, we have been inputting data directly. However, in most cases, numerical data are prepared in files. So we will learn how to import data into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

    14
    14
    21
    35
    6

Using setwd, we change the working directory to /Users/smith/TMP.

    setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

You will notice that the values are stored in x as a vector.
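The same steps can be tried without preparing a file by hand. The sketch below writes the values to a temporary file first; the use of tempfile and writeLines is an assumption made so that the example runs anywhere, and is not part of the tutorial's procedure.

```r
# Create a temporary file holding one value per line.
data_file <- tempfile(fileext = ".txt")
writeLines(c("14", "14", "21", "35", "6"), data_file)

# Read the numbers back into a vector.
x <- scan(data_file)

print(x)
print(mean(x))    # the vector can be used like any other numeric vector
```

scan reads plain numbers by default, so no further conversion is needed before doing arithmetic.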

R has a function read.table to read a table in which the columns of each line are separated by TABs, like the following example (the file name is "batters.txt").

              Team     At_Bat  Hits  Home_Runs
    Bonds     Giants     9847  2935        762
    Aaron     Braves    12364  3771        755
    Ruth      Yankees    8398  2873        714
    Rodriguez Yankees   10341  3070        687
    Mays      Giants    10881  3283        660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table states that the name of the file to read is "batters.txt". Subsequently, header = T states that there is a header for the table, and sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column contains the row names. Since x will be a data frame, we can get the numbers of hits by x$Hits.
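A self-contained version of this step is sketched below. It writes two rows of the table to a temporary file first; the temporary file and the leading TAB in the header line (an empty label for the row-name column) are assumptions made for this illustration.

```r
# Write a small tab-delimited table to a temporary file.
tab_file <- tempfile(fileext = ".txt")
writeLines(c("\tTeam\tAt_Bat\tHits\tHome_Runs",
             "Bonds\tGiants\t9847\t2935\t762",
             "Aaron\tBraves\t12364\t3771\t755"),
           tab_file)

# Read it back: first line is the header, first column holds the row names.
x <- read.table(tab_file, header = TRUE, sep = "\t", row.names = 1)

print(x)
print(x$Hits)
print(x["Aaron", "Home_Runs"])
```

Once the table is read, rows and columns can be addressed by name exactly as with a data frame created by hand.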

Exercise 8: Save the table in exercise 7-1 as a tab-delimited file and read it as a data frame.

9. Writing data to file

There are many cases where we want to have statistical results written to a file rather than only seeing them temporarily on the screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language. Using the write function is one of the simplest ways to output results to files. This function enables us to output the numerical values assigned to a variable to a file. For example, after assigning the vector to x as shown below,

    x <- c(10, 12, 15, 19, 21, 34)

the following line

    write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

    10
    12
    15
    19
    21
    34
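The effect of the ncolumns parameter can be seen by reading the written file back as text. The temporary file and the use of readLines are assumptions added for this sketch.

```r
x <- c(10, 12, 15, 19, 21, 34)
out_file <- tempfile(fileext = ".txt")

# One value per line.
write(x, out_file, ncolumns = 1)
print(readLines(out_file))

# Three values per line; the file is overwritten.
write(x, out_file, ncolumns = 3)
print(readLines(out_file))
```

Each call to write replaces the previous content of the file unless append = TRUE is given.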

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes matrix x. If we do not do this, the matrix written in the file will look transposed, because write outputs the elements column by column. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt (columns separated by tabs):

    1   2   3
    4   5   6

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
      row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
      Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
      At_Bat = c(9847, 12364, 8398, 10341, 10881),
      Hits = c(2935, 3771, 2873, 3070, 3283),
      Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names will be added, respectively (with a blank entry at the top-left).

Content of outfile3.txt (columns separated by tabs):

    ""           "Hits"  "Home_Runs"
    "Bonds"      2935    762
    "Aaron"      3771    755
    "Ruth"       2873    714
    "Rodriguez"  3070    687
    "Mays"       3283    660

By giving quote=F, the output will be written without double quotation marks.
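The effect of quote = FALSE, and the fact that the written file can be read back with read.table, can be sketched as follows. The reduced two-row data frame and the temporary file are assumptions made to keep the example short.

```r
x <- data.frame(
    row.names = c("Bonds", "Aaron"),
    Hits = c(2935, 3771),
    Home_Runs = c(762, 755))

out_file <- tempfile(fileext = ".txt")

# Write with row names, a blank top-left header cell, and no quotation marks.
write.table(x, out_file, sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)
print(readLines(out_file))

# The file has the layout that read.table expects in section 8.
y <- read.table(out_file, header = TRUE, sep = "\t", row.names = 1)
print(y)
```

Writing without quotation marks tends to be more convenient when the file is later opened in spreadsheet software.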

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at bats and write the result to "outfile4.txt".

10. Writing a program in a file

So far, we have done our work interactively, without saving the R code we have written. However, when we repeat the same work with R, it is laborious to type the same code interactively again and again. To solve this issue, we can write the code in a file, and R can read the file to execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 6, 8, 10)
    z <- x + y

Then, giving source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
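The whole procedure can be tried in one go with a temporary script file. In this sketch, the temporary file and the separate environment env are assumptions added so that the example does not overwrite variables in your session; source("vecsumtest.R") as used above assigns the variables directly in the workspace instead.

```r
# Write the three lines of code to a temporary script file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3, 4, 5)",
             "y <- c(2, 4, 6, 8, 10)",
             "z <- x + y"),
           script)

# Execute the script; its variables are created inside 'env'.
env <- new.env()
source(script, local = env)

print(ls(env))    # the variables defined by the script
print(env$z)
```

Keeping code in a file like this makes it easy to rerun the same analysis later.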

Exercise 10: Write the procedure in exercise 8 to a file dframetest.R and execute the procedure using "source".


6

Exercise 4-2 Assign names (ldquoJanrdquo ldquoFebrdquo ldquoMarrdquo ldquoAprrdquo ldquoMayrdquo ldquoJunrdquo ldquoJulrdquo ldquoAugrdquo ldquoSeprdquo

ldquoOctrdquo ldquoNovrdquo ldquoDecrdquo) for respective element of vector (123456789101112) Then

extract third element from the vector using the string rdquoMarrdquo The extracted element should

be 3

5 Simple matrix construction and arithmetic

By gathering vectors it is possible to create a matrix using the function rbind For example

if you want to create a matrix x = 1 2 3

4 5 6

aelig

egraveccedil

ouml

oslashdivide where the first and second rows are (123)

and (456) respectively type

x lt- rbind(c(123) c(456))

Then just type x to display the matrix you have created

gt x lt- rbind(c(123) c(456))

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt

Alternatively

x lt- matrix(c(1 4 2 5 3 6) nrow=2 ncol=3)

or

x lt- matrix(c(1 2 3 4 5 6) nrow=2 ncol=3 byrow=T)

should return the identical matrix The latter one means that matrix of size 2 times 3 will be

created (nrow=2 ncol=3) and numbers are filled in each row first (byrow=T)

Various functions are available for a matrix. nrow(x) and ncol(x) will return the number of rows and columns of the matrix x, respectively. x + 1 will add 1 to all the elements in x, and x * 2 will multiply all the elements in x by 2. Various kinds of arithmetic between matrices are also available. For example, after typing y <- rbind(c(2, 4, 6), c(8, 10, 12)), x + y will give you a matrix where each element corresponds to the sum of the corresponding elements of x and y. x * y will give you a matrix where each element corresponds to the product of the corresponding elements of x and y (footnote 2).
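The difference between element-wise arithmetic and the mathematical matrix product can be sketched as follows. The names s, p and m are illustrative assumptions; %*% is R's matrix-product operator, and t(y) is used here only so that the dimensions are compatible.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))
y <- rbind(c(2, 4, 6), c(8, 10, 12))

s <- x + y       # element-wise sum
p <- x * y       # element-wise product, NOT the matrix product
m <- x %*% t(y)  # matrix product of a 2 x 3 and a 3 x 2 matrix, giving 2 x 2

print(s)
print(p)
print(m)
```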

Any specific row or column can be extracted from a given matrix. For example, if you want to extract the second row, type

> x[2, ]
[1] 4 5 6
>

If you want to extract the second column, type

> x[, 2]
[1] 2 5

t(x) will transpose the matrix x.

> t(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>

The average of each row of matrix x can be calculated as below (footnote 3).

> apply(x, 1, mean)
[1] 2 5

The average of each column of matrix x can be calculated as below.

> apply(x, 2, mean)
[1] 2.5 3.5 4.5
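The same pattern works with any function that summarizes a vector. The sketch below assumes the 2 × 3 matrix x from above; the result names are illustrative. rowMeans and colMeans are base R shortcuts for the two averages.

```r
x <- rbind(c(1, 2, 3), c(4, 5, 6))

row_avg <- apply(x, 1, mean)  # 1 = apply the function over rows
col_avg <- apply(x, 2, mean)  # 2 = apply the function over columns
row_max <- apply(x, 1, max)   # any function of a vector can be used

print(row_avg)
print(col_avg)
print(row_max)

# Base R shortcuts that give the same averages
print(rowMeans(x))
print(colMeans(x))
```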

Footnote 2: The product of matrices based on the canonical mathematical definition can be calculated with the operator %*%.

Footnote 3: The second parameter of 1 in the apply function denotes that we will calculate the average for each position of the first dimension (in this case, the first dimension is the row number). Similarly, a second parameter of 2 denotes that we will calculate the average for each position of the second dimension (in this case, the second dimension is the column number).

Actually, a matrix can be deemed a set of numbers in a two-dimensional array. R can also deal with arrays having n dimensions using array(vector, numbers of elements in each dimension). For example,

x <- array(1:24, c(3, 4, 2))

will create a three-dimensional array of size 3 × 4 × 2 and fill in the numbers of the given vector from the first dimension to the third dimension.

> x <- array(1:24, c(3, 4, 2))
> x
, , 1     (the first 3 × 4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2     (the second 3 × 4 array of the third dimension)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

>
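Elements and slices of such an array are addressed with one index per dimension. The following is a minimal sketch using the array above; slice_sums is a name chosen here for illustration.

```r
x <- array(1:24, c(3, 4, 2))

print(dim(x))      # sizes of the three dimensions
print(x[2, 3, 1])  # row 2, column 3 of the first 3 x 4 slice
print(x[, , 2])    # the whole second slice, returned as a 3 x 4 matrix

# apply also works on arrays: sum over each position of the third dimension
slice_sums <- apply(x, 3, sum)
print(slice_sums)
```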

Exercise 5-1: Calculate

$$\begin{pmatrix} 1 & 3 & 5 \\ 7 & 9 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$$

Exercise 5-2: For the matrix obtained in the above calculation, calculate the row averages and the column averages.

6 Simple list creation

A list in R can gather various kinds of data into a single object to manage them.

x <- list("Ichiro", "Seattle", "Right fielder")

will create a list containing "Ichiro", "Seattle" and "Right fielder". Although a vector cannot contain another vector, a list can contain a vector, as shown in the example below.

x <- list("Ichiro", "Seattle", "Right fielder", c(184, 214, 225))

To extract the second element, you can type

x[[2]]

You can assign names to each element as below.

x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

Type x to confirm that names are given to each element.

> x
$player
[1] "Ichiro"

$team
[1] "Seattle"

$position
[1] "Right fielder"

$hits
[1] 184 214 225

>

You can use an assigned name to extract the corresponding element. For example,

x[["team"]]

or

x$team

will extract "Seattle".
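A few more access patterns can be sketched on the same list. The names sub and total_hits are illustrative assumptions. Note the difference between double brackets, which return the element itself, and single brackets, which return a shorter list.

```r
x <- list(player = "Ichiro", team = "Seattle",
          position = "Right fielder", hits = c(184, 214, 225))

print(names(x))         # names of all elements
print(x[["hits"]][2])   # second value of the vector stored under "hits"

sub <- x["team"]        # single brackets keep the list structure
print(is.list(sub))     # TRUE: still a list, of length 1
print(is.list(x$team))  # FALSE: the element itself is a character vector

total_hits <- sum(x$hits)  # elements can be used like ordinary variables
print(total_hits)
```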

Exercise 6-2: Create a list containing "San Diego", the vector (32, 117), 964.5 and "California". Give the names "city", "coordinates", "area" and "state" to the respective elements.

7 Simple data frame creation

A data frame is one of the data classes in R. It is a type of list and has a two-dimensional structure, just like a matrix. Each row can be deemed a sample, and each column can be deemed an attribute of the samples. Using this idea, a data frame can represent a table. The following table gives data for five retired major league baseball players.

              Team        atbats   hits   homeruns
Rose          Reds         14053   4256        160
Aaron         Brewers      12364   3771        755
Yastrzemski   Red Sox      11988   3419        452
Ripken        Orioles      11551   3184        431
Cobb          Athletics    11434   4191        117

The above table can be represented by a data frame as follows.

x <- data.frame(
  row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
  team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
  atbats = c(14053, 12364, 11988, 11551, 11434),
  hits = c(4256, 3771, 3419, 3184, 4191),
  homeruns = c(160, 755, 452, 431, 117))

The function data.frame generates a data frame and can be used in the format data.frame(row.names = vector of row labels, column name 1 = vector 1, column name 2 = vector 2, ...). You can check the content of x by typing x.

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

As with a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>

Any part of a data frame can be extracted as another data frame. For example,

x[c(1, 5), c(2, 3, 4)]

will extract rows 1 and 5 and columns 2, 3 and 4 as a new data frame. You can also use row names and column names to do the same thing.

x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]
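The two ways of subsetting can be compared as in the sketch below, which also shows that rows can be selected with a logical condition. The names sub1, sub2 and many_hr, and the threshold of 400 home runs, are illustrative assumptions.

```r
x <- data.frame(
  row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
  team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
  atbats = c(14053, 12364, 11988, 11551, 11434),
  hits = c(4256, 3771, 3419, 3184, 4191),
  homeruns = c(160, 755, 452, 431, 117))

sub1 <- x[c(1, 5), c(2, 3, 4)]                                 # by position
sub2 <- x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]  # by name
print(all(sub1 == sub2))  # both select the same cells

# Rows can also be chosen by a condition on a column
many_hr <- x[x$homeruns > 400, ]
print(rownames(many_hr))
```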

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names and class. Type names(x), rownames(x) and class(x) to obtain the column names, row names and class of x, respectively.

Exercise 7-1: The table below shows characteristic values of each planet in the solar system. Masses are given in units of 10^23 kg and diameters are given in km. Make a data frame representing the table (footnote 4).

            Mass       Diameter   Satellites
Mercury         3.301      4879            0
Venus          48.690     12103            0
Earth          59.736     12756            1
Mars            6.419      6794            2
Jupiter     18986        142984           63
Saturn       5688        120536           48
Uranus        868.6       51118           27
Neptune      1024         49528           13

Footnote 4: As all the elements are numeric in this case, this data frame can be dealt with as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2: Using the data frame created in the previous exercise, calculate the averages of mass, diameter and number of satellites.

8 Data reading from file

So far, we have been inputting data directly. However, in most cases, numerical data are prepared in files. So we will learn how to import data into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

14
14
21
35
6

Using setwd, we change the working directory to /Users/smith/TMP.

setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

You notice that the values are stored in x as a vector.
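To try scan without preparing a file by hand, the steps above can be reproduced in a temporary file. This is only a sketch: the use of tempfile and writeLines to create the file, and the variable name path, are assumptions made here so that the example is self-contained.

```r
# Create a small data file in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("14", "14", "21", "35", "6"), path)

# Read the numbers back as a numeric vector
x <- scan(path, quiet = TRUE)  # quiet = TRUE suppresses the "Read 5 items" message
print(x)
print(length(x))
print(mean(x))
```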

R has a function read.table to read a table in which the columns of each line are separated by TABs, like the following example (the file name is "batters.txt").

            Team      At_Bat   Hits   Home_Runs
Bonds       Giants      9847   2935         762
Aaron       Braves     12364   3771         755
Ruth        Yankees     8398   2873         714
Rodriguez   Yankees    10341   3070         687
Mays        Giants     10881   3283         660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table states that the name of the file to read is "batters.txt". Next, header = T states that the table has a header line, and sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column holds the row names. Since x is a data frame, we can get the number of hits with x$Hits.
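The call above can be tried on a temporary file, as in the following sketch. Writing the file with writeLines and the names path and batters are assumptions made for a self-contained example; the header line starts with a TAB so that the top-left cell is blank.

```r
# Write a tab-delimited table to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c(
  "\tTeam\tAt_Bat\tHits\tHome_Runs",
  "Bonds\tGiants\t9847\t2935\t762",
  "Aaron\tBraves\t12364\t3771\t755",
  "Ruth\tYankees\t8398\t2873\t714",
  "Rodriguez\tYankees\t10341\t3070\t687",
  "Mays\tGiants\t10881\t3283\t660"
), path)

# Read it back: header line, TAB separator, first column as row names
batters <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

print(batters$Hits)                    # one column as a vector
print(batters["Aaron", "Home_Runs"])   # one cell, addressed by row and column name
```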

Exercise 8: Save the table in exercise 7-1 as a tab-delimited file and read it as a data frame.

9 Writing data to file

There are many cases where we want to have statistical results written to a file rather than just temporarily seeing them on the screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function enables us to output the numerical values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10
12
15
19
21
34

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes matrix x. If we do not do this, the matrix written to the file will look transposed, because write outputs the elements column by column. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1	2	3
4	5	6

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
    row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
    Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
    At_Bat = c(9847, 12364, 8398, 10341, 10881),
    Hits = c(2935, 3771, 2873, 3070, 3283),
    Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names will be added, respectively (with a blank at the top-left).

Content of outfile3.txt:

""	"Hits"	"Home_Runs"
"Bonds"	2935	762
"Aaron"	3771	755
"Ruth"	2873	714
"Rodriguez"	3070	687
"Mays"	3283	660

By giving quote=F, the output will be written without double quotation marks.
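The effect of quote = FALSE, and the fact that the written file can be read back with read.table, can be checked with the sketch below. The temporary file and the names path, written and y are assumptions made for illustration.

```r
x <- data.frame(
  row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
  Hits = c(2935, 3771, 2873, 3070, 3283),
  Home_Runs = c(762, 755, 714, 687, 660))

# Write without double quotation marks
path <- tempfile(fileext = ".txt")
write.table(x, path, sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)

# Inspect the raw lines: the header starts with a TAB (blank top-left cell)
written <- readLines(path)
print(written[1])
print(written[2])

# Round trip: read the file back as a data frame
y <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
print(all(x == y))
```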

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at bats and write the result to "outfile4.txt".

10 Writing a program in a file

So far, we have done our work interactively, without saving the R code we have written. However, when we repeat the same work with R, it is laborious to type the same code again and again. To solve this issue, we can write the code in a file, and R can read the file and execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
z <- x + y

Then, giving source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
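The whole procedure can also be tried without creating the file by hand. In the sketch below, the script is written to a temporary file with writeLines; this and the name script are assumptions made only to keep the example self-contained.

```r
# Write the three lines of the script to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3, 4, 5)",
  "y <- c(2, 4, 6, 8, 10)",
  "z <- x + y"
), script)

# Execute the script; the variables it defines become available afterwards
source(script)
print(z)

# echo = TRUE additionally shows each line of the script as it is executed
source(script, echo = TRUE)
```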

Exercise 10: Write the procedure in exercise 8 to a file dframetest.R and execute the procedure using "source".

11 Defining functions

In mathematics, a function outputs a value which is determined by the input value. In programming, it often represents a defined set of procedures (footnote 5). For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows by introducing the keyword "function":

f <- function(x, y) {
  return((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to the parameters x and y, respectively, and 15 will be returned as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows:

f(x = 10, y = 20)

Footnote 5: "Subroutine" may be the better terminology.

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, ...) {
  various procedures possibly using the given parameters (footnote 6)
  ...
  return(return_value)
}
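The general form can be illustrated with the short sketch below. The function names mean_of_two and count_args and the default value y = 0 are hypothetical and not part of the tutorial; the second function shows the optional "..." parameters being collected with list(...).

```r
# A function with two parameters; y has a default value used when it is omitted
mean_of_two <- function(x, y = 0) {
  return((x + y) / 2)
}

print(mean_of_two(10, 20))          # positional arguments
print(mean_of_two(y = 20, x = 10))  # named arguments may come in any order
print(mean_of_two(10))              # y falls back to its default of 0

# Optional parameters passed through "..." can be collected as a list
count_args <- function(...) {
  args <- list(...)
  return(length(args))
}

print(count_args(1, "a", c(2, 3)))  # three arguments were passed
```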

Exercise 11: Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12 Making graphs

R is equipped with functions capable of making graphs easily. The "plot" function may be one of the simplest ones. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7) and (9, 8).

x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 9, 7, 8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels for the x and y axes can be given with the parameters xlab and ylab, respectively.

Figure: A plot created by the "plot" function.

Footnote 6: Optional parameters "..." can be accessed as a list with list(...).

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in the vector x. The label of each bar is given by the element names of the vector x, i.e., names(x).

x <- c(1, 2, 3, 2, 10, 1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

Figure: A bar graph created by the "barplot" function.

The "hist" function generates a histogram for a set of numbers given by a vector.

x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

Figure: A histogram generated using the "hist" function (title "Test Histogram", x axis "Test Value", y axis "Frequency").

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)
x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)
boxplot(x1, x2, names=c("Data 1", "Data 2"))

Figure: Example of a box plot for Data 1 and Data 2. The annotations in the figure mark the median, an outlier, the range containing 50% of the data, the bottom 25% excluding outliers, and the top 25% excluding outliers.

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with the control structures that are important and common in other programming languages. Here, some of the most important ones are described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition (footnote 7) is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows:

if (x > 0) {
  y <- 1
} else {
  y <- 0
}

Execute this after doing x <- -5 and investigate the value of y. Then, after doing x <- 3, execute it again and investigate the value of y to make sure that the value has changed.

Footnote 7: When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

The general form of the if-statement is as follows:

if (condition 1) {
  Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
  Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
  Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if ... {
  ...
} else {
  Set of procedures to execute when NONE of the above conditions are satisfied
}
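A concrete instance of this general form might look like the sketch below. The function name sign_label and the three labels are hypothetical and chosen only for illustration.

```r
# Classify a single number using if / else if / else
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"   # reached only when the first condition is NOT met
  } else {
    label <- "zero"       # reached when none of the conditions above are met
  }
  return(label)
}

print(sign_label(3))
print(sign_label(-5))
print(sign_label(0))
```

Note that in a file, "else" has to follow the closing brace on the same line, as written here; otherwise R treats the if-statement as already finished.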

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows (footnote 8):

while (condition) {
  A set of procedures to execute when the condition is satisfied
}

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

Footnote 8: A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to stop the iteration immediately and get out of the while-loop.

If you are to write this procedure as a program in a file, you may want to write each step line by line, as follows, so that the program will be easy to read.

x <- 1
while (x <= 3) {
  print(x)
  x <- x + 1
}

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 will be displayed. In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first loop of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, 2, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, which is 3, by print(x) and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point, and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
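The next- and break-statements can be sketched in a loop of the same shape. The variable name kept, the stopping value 8 and the rule of skipping multiples of 3 are illustrative assumptions.

```r
x <- 0
kept <- c()            # values collected by the loop

while (TRUE) {         # the condition is always met; break ends the loop
  x <- x + 1
  if (x > 8) break           # leave the while-loop immediately
  if (x %% 3 == 0) next      # skip the rest of this iteration for multiples of 3
  kept <- c(kept, x)
}

print(kept)   # 1 2 4 5 7 8
print(x)      # value of x at the moment break was executed
```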

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9 and 11, each on its own line.

13.3 for-statement

The for-statement also iterates, like the while-statement. It assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements) {
  procedures
}

For example, for (i in c(1, 3, 5)) { print(i) } will assign each of 1, 3 and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second iteration, and finally 5 in the third iteration. The following example makes a vector of the squared values of 1 to 5 (footnote 9).

x <- NULL
for (i in 1:5) {
  x <- append(x, i^2)
}

Footnote 9: Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector. In the for-statement, each value of 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed. In the procedure in the for-block, the squared value of i (represented as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to the vector x (footnote 10).
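The loop can be compared with the alternatives mentioned in the footnotes, as in this sketch; the names by_append, by_c and vectorized are chosen here for illustration.

```r
# 1. for-statement with append
by_append <- NULL
for (i in 1:5) {
  by_append <- append(by_append, i^2)
}

# 2. for-statement with c, which concatenates vectors
by_c <- NULL
for (i in 1:5) {
  by_c <- c(by_c, i^2)
}

# 3. No loop at all: arithmetic is applied to every element of a vector
vectorized <- (1:5)^2

print(by_append)
print(identical(by_append, by_c))
print(identical(by_append, vectorized))
```

The loop-free form is usually preferable in R when it is available, since it is shorter and avoids growing a vector one element at a time.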

Exercise 13-3: Rewrite the procedure in exercise 13-2 using a for-statement.

14 Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, character, list or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

14.3 Family of apply functions

The "sapply" function calls a given function once for each value in a given vector, using that value as the single input, and returns a vector containing the set of output values.

func1_sub <- function(elm) {
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}

func1 <- function(x) {   # a function with a vector parameter x
  return(sapply(x, func1_sub))
}

Footnote 10: The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
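The two functions of section 14.3 can be tried on a small vector, as sketched below. The input values and the comparison function func1_vec are illustrative assumptions; func1_vec gives the same result without sapply because comparison operators already work element by element.

```r
func1_sub <- function(elm) {
  # expects a single number; returns 1 inside [-1, 1] and 0 outside
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}

func1 <- function(x) {
  # applies func1_sub to each element of the vector x
  return(sapply(x, func1_sub))
}

print(func1(c(-2, -0.5, 0, 1, 3)))     # 0 1 1 1 0

# sapply also accepts a function written in place
print(sapply(1:3, function(v) v^2))    # 1 4 9

# Hypothetical loop-free equivalent of func1
func1_vec <- function(x) {
  return(as.numeric(-1 <= x & x <= 1))
}
print(func1_vec(c(-2, -0.5, 0, 1, 3)))
```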

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)   # Draws the normal (Gaussian) distribution
curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, 1000)   # 1000 is the number of points
curve(func1, -3, 3)   # The function defined in 14.3 Family of apply functions

7

x and y2

Any specific row or column can be extracted from a given matrix For example if you

want to extract second row you type

gt x[2]

[1] 4 5 6

gt

If you want to extract second column you type

gt x[2]

[1] 2 5

t(x) will transpose the matrix x

gt t(x)

[1] [2]

[1] 1 4

[2] 2 5

[3] 3 6

gt

Average of each row of matrix x can be calculated as below3

gt apply(x 1 mean)

[1] 2 5

Average of each column of matrix x can be calculated as below

gt apply(x 2 mean)

[1] 25 35 45

Actually matrix can be deemed as set of numbers in two dimensional array R can also

deal with array having n dimensions using array(vector numbers of elements in each

2 Product of the matrices based on canonical mathematical definition can be calculated by

the operator 3 Second parameter of 1 in apply function denotes that we will calculate average for each

position of the first dimension (in this case the first dimension is row number) Similarly

second parameter of 2 denotes that we will calculate average for each position of the second

dimension (in this case the second dimension is column number)

8

dimension) For example

x lt- array(124 c(342))

will create three dimensional array with size of 3times4times2 and fills numbers in the given

vector from the first dimension to the third dimension

gt x lt- array(124 c(342))

gt x

1 The first 3times4 array of the third dimension

[1] [2] [3] [4]

[1] 1 4 7 10

[2] 2 5 8 11

[3] 3 6 9 12

2 The second 3times4 array of the third dimension

[1] [2] [3] [4]

[1] 13 16 19 22

[2] 14 17 20 23

[3] 15 18 21 24

gt

Exercise 5-1 Calculate 1 3 5

7 9 11

aelig

egraveccedil

ouml

oslashdivide+

1 2 3

2 4 6

aelig

egraveccedil

ouml

oslashdivide

Exercise 5-2 For the matrix obtained in the above calculation calculate row average and

column average

6 Simple list creation

A list in R can gather various kinds of data into a single object to manage them

x lt- list(Ichiro rdquo Seattlerdquo Right fielder)

will create a list containing Ichiro rdquo Seattlerdquo Right fielder Although a vector cannot

9

contain another vector a list can contain a vector as shown in the example below

x lt- list(Ichiro Seattle Right fielder c(184 214 225))

To extract the second element you can type

x[[2]]

You can assign names to each element as below

x lt- list(player = Ichiro team = Seattle

position = Right fielder hits = c(184 214 225))

Type x to confirm that names are given to each element

gt x

$player

[1] Ichiro

$team

[1] Seattle

$position

[1] Right fielder

$hits

[1] 184 214 225

gt

You can use assigned name to extract corresponding element For example

x[[team]]

or

x$team

will extract ldquoSeattlerdquo

Exercise 6-2 Create a list containing ldquoSan Diegordquo vector (32 117) 9645 and ldquoCaliforniardquo

Give names ldquocityrdquo ldquocoordinatesrdquo ldquoareardquo and ldquostaterdquo to respective element

10

11

7 Simple data frame creation

Data frame is one of data class in R It is one type of list and has two dimensional

structure just like a matrix Each row can be deemed as a sample and each column can be

deemed as attribute of the sample Using this idea data frame can represent a table

Following table gives data of five retired major league baseball players

Team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

The above table can be represented by data frame as follows

x lt- dataframe(

rownames = c(Rose Aaron Yastrzemski Ripken Cobb)

team = c(Reds Brewers Red Sox Orioles Athletics)

atbats = c(14053 12364 11988 11551 11434)

hits = c(4256 3771 3419 3184 4191)

homeruns = c(160 755 452 431 117))

The function dataframe can generate a data frame and can be used in the format of

dataframe(rownames = vector for column labels column name 1 = vector 1 column name

2 = vector 2 hellip) You can check the content of x by typing x

gt x

team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

gt

Like a regular list column name can be used to extract corresponding vector

gt x$hits

[1] 4256 3771 3419 3184 4191

gt

12

Any part of data frame can be extracted as another data frame For example

x[ c(15) c(234) ]

will extract row 1 5 and column 2 3 4 as a new data frame You can also use row names

and column names to do the same thing

x[ c(Rose Cobb) c(atbats hits homeruns)]

Various functions are defined for data frame Let us get attributes associated with the data

frame x

gt attributes(x)

$names

[1] team atbats hits homeruns

$rownames

[1] Rose Aaron Yastrzemski Ripken Cobb

$class

[1] dataframe

gt

We got names rownames and class Type names(x) rownames(x) and class(x) You can

obtain row names column names and class of x respectively

Exercise 7-1 The below table shows characteristic values of each planet in solar system

Masses are represented in 1023 kg and diameters are represented in km Make a data frame

representing the table4

Mass Diameter Satellites

Mercury 3301 4879 0

Venus 12103 48690 0

Earth 59736 12756 1

4 As all the elements are numeric in this case this data frame can be dealt as a matrix

asmatrix(x) will return x as a matrix

13

Mars 6419 6794 2

Jupiter 18986 142984 63

Saturn 5688 120536 48

Uranus 8686 51118 27

Neptune 1024 49528 13

Exercise 7-2 Using the created data frame in the previous exercise calculate averages of

mass diameter and number of satellites

8 Data reading from file

So far we have been inputting data directly However in most of cases numerical data

may be prepared by files So we will learn how to import data into a variable in R First

prepare a text file with the following values Let us name the file ldquotestdatatxtrdquo We assume

that the file is placed under the directory UserssmithTMP

14

14

21

35

6

Using setwd we change the working directory to UserssmithTMP

setwd(UserssmithTMP)

Then using scan function read the data to a variable x

gt x lt- scan(testdatatxt)

Read 5 items

gt x

[1] 14 14 21 35 6

gt

You notice that the values are stored in x as a vector

R has a function readtable to read a table where each line is separated by TABs like the

following example (file name is ldquobatterstxtrdquo)

Team At_Bat Hits Home_Runs

14

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

readtable can read this as data frame

gt x lt- readtable(batterstxt header = T

sep = t rownames = 1)

gt x

Team At_Bat Hits Home_Runs

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

gt

The first parameter of the function readtable states that the name of the file to read

is rdquobatterstxtrdquo Subsequently header = T states that there is a header for the table sep =

ldquotrdquo indicates that each line is separated by TAB Finally we can specify that there is one

column for row names by rownames = 1 Since x will be a data frame we can get number of

hits by $Hits

Exercise 8 Save table in exercise 7-1 as tab-delimited file and read it as data frame

9 Writing data to file

There are many cases where we want to have statistical results written in file rather

than temporary seeing the results on screen In this way we can open the results by

spreadsheet software or process the results using other programming language afterwards

Using write function is one of the simplest ways to output the results to files This

function enables us to output numerical values assigned to variable to file For example

after assigning the vector to x as shown below

x lt- c(10 12 15 19 21 34)

15

the following line

write(x outfile1txt ncolumns = 1)

will write content of x to the file ldquooutfile1txtrdquo ncolumns = 1 states that the vector will be

written to the file by one column

Content of outfile1txt

10

12

15

19

21

34

write function also enable us to write matrix data to file as shown below

gt x lt- matrix(c(123456) nrow=2 ncol=3 byrow=T)

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt write(t(x) outfile2txt ncolumns=ncol(x) sep=t)

gt

t(x) transposes matrix x If we do not do it the matrix written in the file will look

transposed ncolumns=ncol(x) will tell R that the number of columns to output should be

identical to that of matrix x sep = ldquotrdquo indicates that the columns in the output file will be

separated by tabs

Content of outfile2txt

1 2 3

4 5 6

For writing data frame to a file writetable function is provided

gt x lt- dataframe(

rownames = c(Bonds Aaron Ruth Rodriguez Mays)

Team = c(Giants Braves Yankees Yankees Giants)

16

At_Bat = c(9847 12364 8398 10341 10881)

Hits = c(2935 3771 2873 3070 3283)

Home_Runs = c(762 755 714 687 660))

gt x[c(Hits Home_Runs)]

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

gt writetable(x[c(Hits Home_Runs)] outfile3txt sep=t

rownames=T colnames=NA)

gt

With rownames=T and colnames=NA row names and column names will be added

respectively (blank on the top-left)

Content of outfile3

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

By giving quote=F output will be without double quotations

Excercise 9 Using the data frame in the above example calculate ratio of hits and at bat

and write the result to rdquooutfile4txtrdquo

10 Writing a program in a file

So far we have done our works interactively without saving what R codes we have written

However when we repeat the same works with R it is laborious to interactively write the

same codes again and again To solve this issue we can write the set of codes in a file and R

can read the file to execute the codes written in the file

For example we can prepare a text file with the following code Let us name the

17

file vecsumtestR

x lt- c(12345)

y lt- c(246810)

z lt- x + y

Then giving source( vecsumtestR) will let R execute the set of codes in the file

vecsumtestR You can check that based on the codes in vecsumtestR vectors are assigned

to each of x y and z

gt source( vecsumtestR)

gt x

[1] 1 2 3 4 5

gt y

[1] 2 4 6 8 10

gt z

[1] 3 6 9 12 15

gt

Exercise 10 Write the procedure in exercise 8 to a file dframetestR and execute the

procedure using ldquosourcerdquo

11. Defining functions

In mathematics, a function outputs a value which is determined by the input value. In programming, it often represents a defined set of procedures [5]. For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows by introducing the keyword "function".

f <- function(x, y) {
    return ((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to parameters x and y, respectively, and 15 will be returned, as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows.

f(x = 10, y = 20)

[5] "Subroutine" may be the better terminology.

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, …) {
    various procedures, possibly using the given parameters [6]
    …
    return(return_value)
}
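As an illustration of the general form, the sketch below defines two small functions. The names mean_of_two and count_extras are hypothetical and chosen only for this example; the second one shows how optional arguments passed through ... can be collected with list(...), as mentioned in the footnote on optional parameters.

```r
# mean_of_two: the function f of the text under a descriptive (hypothetical) name.
mean_of_two <- function(x, y) {
  return((x + y) / 2)
}

# count_extras: a hypothetical helper that captures the optional arguments
# passed through `...` as a list and returns how many there are.
count_extras <- function(first, ...) {
  extras <- list(...)
  return(length(extras))
}

print(mean_of_two(10, 20))           # positional arguments
print(mean_of_two(y = 20, x = 10))   # named arguments, in any order
print(count_extras(1, "a", TRUE))    # two optional arguments after `first`
```

Named arguments are matched by name, so their order in the call does not matter.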

Exercise 11. Implement a mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12. Making graphs

R is equipped with functions capable of making graphs easily. The "plot" function may be one of the simplest ones. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and passing them to plot will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7) and (9, 8).

x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 9, 7, 8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels on the x and y axes can be given to the parameters xlab and ylab, respectively.

[6] Optional parameters ... can be accessed as a list with list(...).

Figure: A plot created by the "plot" function.

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar by vector x. The label of each bar is given by the element names of the vector x, i.e., names(x).

x <- c(1, 2, 3, 2, 10, 1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

Figure: A bar graph created by the "barplot" function.

The "hist" function generates a histogram for a set of numbers given by a vector.

x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

Figure: A histogram generated using the "hist" function (title "Test Histogram", x axis "Test Value", y axis "Frequency").
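To see which numbers a histogram is built from, the bin counts can be computed without drawing anything. The sketch below assumes that x holds the twelve decimal values 3.2, 1.2, ..., 1.4 and fixes the bin boundaries at 1, 2, ..., 6 (an assumption made here so that the result does not depend on the default binning rule); with plot = FALSE, hist returns the bin information instead of plotting it.

```r
x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(x, breaks = 1:6, plot = FALSE)

print(h$breaks)  # bin boundaries: 1 2 3 4 5 6
print(h$counts)  # number of values in each bin: 2 1 3 2 4
```

Each count is the height of the corresponding bar, so the counts add up to the number of values in x.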

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)
x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)
boxplot(x1, x2, names=c("Data 1", "Data 2"))
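The numbers behind a box plot can be inspected with boxplot.stats, which reports the whisker ends, the edges of the box and the median, together with any values drawn as outliers. The following is a minimal sketch, assuming x1 holds the fourteen two-digit values of the example.

```r
x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

s <- boxplot.stats(x1)

# stats: lower whisker, lower edge of the box, median,
#        upper edge of the box, upper whisker
print(s$stats)  # 10.0 11.0 11.5 12.0 13.0

# out: values lying beyond the whiskers, drawn as individual points
print(s$out)    # 15
```

Here the box spans 11 to 12, so the value 15 lies more than 1.5 box lengths above the box and is reported as an outlier.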


Figure: Example of a box plot (Data 1 and Data 2). The annotations in the figure mark the median, an outlier, the range within which 50% of the data lie, and the bottom 25% and the top 25% of the data, excluding outliers.

Exercise 12. Using the table (data frame) given in Exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13. Basic program structure

R is equipped with programming constructs which are important and common in other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition [7] is satisfied. For example, if you want to assign 1 to y when x > 0, and otherwise assign 0 to y, you can write as follows.

if (x > 0) {
    y <- 1
} else {
    y <- 0
}

Investigate the value of y after doing x <- -5 and executing the if-statement. Then, after doing x <- 3, execute the if-statement again and investigate the value of y to make sure that the value has changed.

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

The general form of the if-statement is as follows.

if (condition 1) {
    Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
    Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
    Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if … {
    …
} else {
    Set of procedures to execute when NONE of the above conditions is satisfied
}
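The general form can be sketched with a small function. The name classify_sign is hypothetical and used only for illustration; it walks through the conditions in order and executes only the first block whose condition is satisfied.

```r
# classify_sign: hypothetical helper showing an if / else if / else chain.
classify_sign <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"
  } else {
    # Reached only when neither of the conditions above is satisfied.
    label <- "zero"
  }
  return(label)
}

print(classify_sign(3))    # "positive"
print(classify_sign(-5))   # "negative"
print(classify_sign(0))    # "zero"
```

Note that each "else" is written on the same line as the closing brace of the preceding block; at the top level of an interactive session, an "else" starting a new line is not attached to the preceding "if".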

Exercise 13-1. Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows [8].

while (condition) {
    A set of procedures to execute when the condition is satisfied
}

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you are to write this procedure as a program in a file, you may want to write each step line by line as follows, so that the program will be easy to read.

x <- 1
while (x <= 3) {
    print(x)
    x <- x + 1
}

[8] A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to immediately stop the iteration and get out of the while-loop.

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 will be displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first iteration of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point, and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, which is 2, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point, and x ≤ 3 still holds, so the procedures in the while-block will be executed again, i.e., displaying the value of x, which is 3, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter will then come back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point, and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
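The next- and break-statements mentioned in the footnote to the general form can be tried in the same way. The sketch below is an illustration only (the variable kept and the limits 8 and 3 are arbitrary choices): it counts upwards, skips multiples of 3 with next, and leaves the loop with break once x exceeds 8.

```r
x <- 0
kept <- c()
while (TRUE) {
  x <- x + 1
  if (x > 8) {
    break   # stop iterating and leave the while-loop
  }
  if (x %% 3 == 0) {
    next    # skip the rest of this iteration and check the condition again
  }
  kept <- c(kept, x)
}
print(kept)  # 1 2 4 5 7 8
print(x)     # 9
```

Since the condition is always TRUE here, break is the only way out of the loop; forgetting it would make the loop run forever.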

Exercise 13-2. Using a while-statement, display 1, 3, 5, 7, 9 and 11, one per line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, starting from the first element and ending with the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements) {
    procedures
}

For example, for (i in c(1, 3, 5)) { print(i) } will assign each of 1, 3 and 5 to the variable i, and after each assignment, the procedures in the for-block are executed. Thus, in this case, the number 1 is displayed in the first iteration after the assignment, 3 is displayed in the second iteration after the assignment, and finally 5 is displayed in the third iteration after the assignment. The following example makes a vector of the squared values of 1 to 5 [9].

x <- NULL
for (i in 1:5) {
    x <- append(x, i^2)
}

x <- NULL assigns an empty vector to x; NULL represents an empty vector.

In the for-statement, each value of 1 to 5 is assigned to i, and after the assignment, the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (represented as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to vector x [10].

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.
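The three ways of building the vector of squares mentioned in this section and its footnotes can be compared side by side. This is a small sketch; the variable names squares_loop, squares_c and squares_vec are hypothetical.

```r
# Build the squares of 1 to 5 with a for-loop and append(), as in the text.
squares_loop <- NULL
for (i in 1:5) {
  squares_loop <- append(squares_loop, i^2)
}

# The same loop using c() to concatenate.
squares_c <- NULL
for (i in 1:5) {
  squares_c <- c(squares_c, i^2)
}

# No loop at all: arithmetic on a vector is applied to every element.
squares_vec <- (1:5)^2

print(squares_loop)  # 1 4 9 16 25
```

All three produce the same vector; the loop-free version is usually preferred in R because it is shorter and avoids growing a vector one element at a time.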

Exercise 13-3. Rewrite the procedure in Exercise 13-2 using a for-statement.

14. Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.
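A short sketch of these inspection functions follows. The objects scores, info and m are hypothetical examples created only to have something to inspect.

```r
# A few hypothetical objects to inspect.
scores <- c(a = 1.5, b = 2.5)
info <- list(name = "Ichiro", hits = c(184, 214, 225))
m <- matrix(1:6, nrow = 2)

print(class(scores))       # "numeric"
print(class(info))         # "list"
print(mode(m))             # "numeric": the type of the stored values
print(class(m))            # includes "matrix"
print(attributes(scores))  # the element names a and b
print(attributes(m))       # the dimensions, 2 rows by 3 columns
print(ls())                # names of the currently defined variables
```

mode describes how the values are stored, while class describes what kind of object it is; for a matrix of numbers the two therefore differ.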

14.3 Family of apply functions

The "sapply" function applies a given function to each value in a given vector, one by one, using each value as a single input, and returns a vector containing the output values.

func1_sub <- function(elm) {
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1) { return (1) } else { return (0) }
}

func1 <- function(x) { # a function with a vector parameter x
    return(sapply(x, func1_sub))
}

[10] The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
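To see what these definitions return, they can be applied to a small vector. The sketch below repeats the two functions so that it runs on its own; the input vector values is a hypothetical example.

```r
# func1_sub returns 1 if a single number lies in [-1, 1], otherwise 0.
func1_sub <- function(elm) {
  if (-1 <= elm & elm <= 1) {
    return(1)
  } else {
    return(0)
  }
}

# func1 applies func1_sub to every element of a vector.
func1 <- function(x) {
  return(sapply(x, func1_sub))
}

values <- c(-2, -1, 0, 0.5, 1, 3)
print(func1(values))                 # 0 1 1 1 1 0

# sapply also accepts a function written in place.
print(sapply(1:3, function(i) i^2))  # 1 4 9
```

The wrapper is needed because an if-statement tests a single condition, so func1_sub cannot be given a whole vector directly.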

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7) # Draws the normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000) # 1000 is the number of points
curve(func1, -3, 3) # The function defined in 14.3 Family of apply functions
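Besides drawing, curve also hands back the points it plotted, which can be useful for checking a function numerically. The following is a minimal sketch; opening a null PDF device with pdf(NULL) is an assumption made here so that the example runs without a screen and without writing a file, and it can be omitted in an interactive session.

```r
# Open a null PDF device: drawing commands are accepted but no file is written.
pdf(NULL)

# curve() invisibly returns the x and y coordinates of the plotted points.
pts <- curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, n = 1000)

print(length(pts$x))  # 1000 points, as requested by n
print(range(pts$x))   # from -2*pi to 2*pi

# Close the device again.
dev.off()
```

A larger n gives a smoother curve at the cost of evaluating the function at more points.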



9

contain another vector a list can contain a vector as shown in the example below

x lt- list(Ichiro Seattle Right fielder c(184 214 225))

To extract the second element you can type

x[[2]]

You can assign names to each element as below

x lt- list(player = Ichiro team = Seattle

position = Right fielder hits = c(184 214 225))

Type x to confirm that names are given to each element

gt x

$player

[1] Ichiro

$team

[1] Seattle

$position

[1] Right fielder

$hits

[1] 184 214 225

gt

You can use assigned name to extract corresponding element For example

x[[team]]

or

x$team

will extract ldquoSeattlerdquo

Exercise 6-2 Create a list containing ldquoSan Diegordquo vector (32 117) 9645 and ldquoCaliforniardquo

Give names ldquocityrdquo ldquocoordinatesrdquo ldquoareardquo and ldquostaterdquo to respective element

10

11

7 Simple data frame creation

Data frame is one of data class in R It is one type of list and has two dimensional

structure just like a matrix Each row can be deemed as a sample and each column can be

deemed as attribute of the sample Using this idea data frame can represent a table

Following table gives data of five retired major league baseball players

Team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

The above table can be represented by data frame as follows

x lt- dataframe(

rownames = c(Rose Aaron Yastrzemski Ripken Cobb)

team = c(Reds Brewers Red Sox Orioles Athletics)

atbats = c(14053 12364 11988 11551 11434)

hits = c(4256 3771 3419 3184 4191)

homeruns = c(160 755 452 431 117))

The function dataframe can generate a data frame and can be used in the format of

dataframe(rownames = vector for column labels column name 1 = vector 1 column name

2 = vector 2 hellip) You can check the content of x by typing x

gt x

team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

gt

Like a regular list column name can be used to extract corresponding vector

gt x$hits

[1] 4256 3771 3419 3184 4191

gt

12

Any part of data frame can be extracted as another data frame For example

x[ c(15) c(234) ]

will extract row 1 5 and column 2 3 4 as a new data frame You can also use row names

and column names to do the same thing

x[ c(Rose Cobb) c(atbats hits homeruns)]

Various functions are defined for data frame Let us get attributes associated with the data

frame x

gt attributes(x)

$names

[1] team atbats hits homeruns

$rownames

[1] Rose Aaron Yastrzemski Ripken Cobb

$class

[1] dataframe

gt

We got names rownames and class Type names(x) rownames(x) and class(x) You can

obtain row names column names and class of x respectively

Exercise 7-1 The below table shows characteristic values of each planet in solar system

Masses are represented in 1023 kg and diameters are represented in km Make a data frame

representing the table4

Mass Diameter Satellites

Mercury 3301 4879 0

Venus 12103 48690 0

Earth 59736 12756 1

4 As all the elements are numeric in this case this data frame can be dealt as a matrix

asmatrix(x) will return x as a matrix

13

Mars 6419 6794 2

Jupiter 18986 142984 63

Saturn 5688 120536 48

Uranus 8686 51118 27

Neptune 1024 49528 13

Exercise 7-2 Using the created data frame in the previous exercise calculate averages of

mass diameter and number of satellites

8 Data reading from file

So far we have been inputting data directly However in most of cases numerical data

may be prepared by files So we will learn how to import data into a variable in R First

prepare a text file with the following values Let us name the file ldquotestdatatxtrdquo We assume

that the file is placed under the directory UserssmithTMP

14

14

21

35

6

Using setwd we change the working directory to UserssmithTMP

setwd(UserssmithTMP)

Then using scan function read the data to a variable x

gt x lt- scan(testdatatxt)

Read 5 items

gt x

[1] 14 14 21 35 6

gt

You notice that the values are stored in x as a vector

R has a function read.table to read a table in which the columns on each line are separated by tabs, as in the following example (the file name is "batters.txt"):

           Team     At_Bat  Hits  Home_Runs
Bonds      Giants   9847    2935  762
Aaron      Braves   12364   3771  755
Ruth       Yankees  8398    2873  714
Rodriguez  Yankees  10341   3070  687
Mays       Giants   10881   3283  660

read.table can read this as a data frame:

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table states that the name of the file to read is "batters.txt". Next, header = T states that the table has a header line, and sep = "\t" indicates that the columns on each line are separated by tabs. Finally, row.names = 1 specifies that the first column holds the row names. Since x is a data frame, we can get the numbers of hits with x$Hits.
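The reading step can be tried without preparing a file in advance. The sketch below writes a two-row version of the table to a temporary file and reads it back; the temporary file and the variable names path and batters are assumptions made for this illustration.

```r
# Write a small tab-delimited table to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("\tTeam\tAt_Bat\tHits\tHome_Runs",
             "Bonds\tGiants\t9847\t2935\t762",
             "Aaron\tBraves\t12364\t3771\t755"), path)

# Read it back; the first column becomes the row names
batters <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

batters$Hits                   # one column as a vector
batters["Aaron", "At_Bat"]     # one cell by row name and column name
batters$Hits / batters$At_Bat  # element-wise arithmetic on two columns
```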

Exercise 8: Save the table in exercise 7-1 as a tab-delimited file and read it as a data frame.

9 Writing data to file

There are many cases where we want statistical results written to a file rather than only seeing them temporarily on screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function enables us to write the values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10
12
15
19
21
34

The write function also enables us to write matrix data to a file, as shown below:

> x <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes matrix x. If we do not do this, the matrix written in the file will look transposed, because write outputs the elements of a matrix column by column. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1	2	3
4	5	6
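To see why the transpose is needed, the following sketch writes the same matrix with and without t(x) and reads both files back as text. The temporary files and the names with_t and without_t are introduced only for this comparison.

```r
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

with_t    <- tempfile(fileext = ".txt")
without_t <- tempfile(fileext = ".txt")

# write() outputs the elements column by column, so t(x) is needed
write(t(x), with_t, ncolumns = ncol(x), sep = "\t")
write(x, without_t, ncolumns = ncol(x), sep = "\t")

readLines(with_t)     # the rows of x: "1\t2\t3" and "4\t5\t6"
readLines(without_t)  # scrambled: "1\t4\t2" and "5\t3\t6"
```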

For writing a data frame to a file, the write.table function is provided:

> x <- data.frame(
    row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
    Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
    At_Bat = c(9847, 12364, 8398, 10341, 10881),
    Hits = c(2935, 3771, 2873, 3070, 3283),
    Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, both the row names and the column names are written, with a blank entry at the top-left.

Content of outfile3.txt:

""	"Hits"	"Home_Runs"
"Bonds"	2935	762
"Aaron"	3771	755
"Ruth"	2873	714
"Rodriguez"	3070	687
"Mays"	3283	660

By giving quote=F, the output will be written without double quotation marks.
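The effect of quote=F can be checked with a short round trip. This is a minimal sketch, assuming a two-row data frame and a temporary file; the names path and y are hypothetical.

```r
x <- data.frame(
    row.names = c("Bonds", "Aaron"),
    Hits = c(2935, 3771),
    Home_Runs = c(762, 755))

path <- tempfile(fileext = ".txt")
write.table(x, path, sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)

# The header line starts with an empty field and nothing is quoted
readLines(path)

# The file can be read back with read.table as in section 8
y <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
y
```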

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at-bats and write the result to "outfile4.txt".

10 Writing a program in a file

So far, we have done our work interactively without saving the R code we have written. However, when we repeat the same work with R, it is laborious to type the same code interactively again and again. To solve this issue, we can write the code in a file, and R can read the file and execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10)
z <- x + y

Then, source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y, and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
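The same idea can be tried in one go. In the sketch below, writeLines stands in for a text editor and a temporary file stands in for a saved script; the file and the variable names script, a, and b are hypothetical.

```r
# Write a two-line script to a temporary file (stand-in for a text editor)
script <- tempfile(fileext = ".R")
writeLines(c("a <- c(1, 2, 3)",
             "b <- sum(a)"), script)

# Execute the script; the variables it defines appear in the workspace
source(script)
a
b
```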

Exercise 10: Write the procedure in exercise 8 to a file dframetest.R and execute the procedure using "source".

11 Defining functions

In mathematics, a function outputs a value which is determined by the input value. In programming, it often represents a defined set of procedures. [5] For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows by introducing the keyword "function":

f <- function(x, y) {
    return ((x + y) / 2)
}

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to parameters x and y, respectively, and 15 will be returned, as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows:

f(x = 10, y = 20)

[5] "Subroutine" may be the better terminology.
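The following sketch shows a few more ways of calling such a function. The function f is repeated so that the block runs on its own; g, with a default value for its second parameter, is a hypothetical variant added for illustration.

```r
# Same function f as in the text
f <- function(x, y) {
    return((x + y) / 2)
}

# Hypothetical variant: y has a default value of 0
g <- function(x, y = 0) {
    return((x + y) / 2)
}

f(y = 20, x = 10)   # named arguments may be given in any order
g(10)               # y falls back to its default value
f(c(1, 2, 3), 5)    # arithmetic is element-wise, so vectors work too
```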

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, ...) {
    various procedures, possibly using the given parameters [6]
    ...
    return(return_value)
}

[6] Optional parameters ... can be accessed as a list with list(...).

Exercise 11: Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12 Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and passing them to plot will create a plot with points at the locations (1,2), (3,4), (5,9), (7,7), and (9,8):

x <- c(1,3,5,7,9)
y <- c(2,4,9,7,8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels on the x and y axes can be given with the parameters xlab and ylab, respectively.

[Figure: A plot created by the "plot" function]

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in vector x. The label of each bar is given by the element names of the vector x, i.e., names(x).

x <- c(1, 2, 3, 2, 10, 1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

[Figure: A bar graph created by the "barplot" function]

The "hist" function generates a histogram for a set of numbers given by a vector.


x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to state the title of the histogram.

[Figure: A histogram generated using the "hist" function]
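The numbers behind a histogram can also be inspected without drawing anything. The sketch below is a minimal illustration; the sample values and the variable name h are assumptions made for this example.

```r
x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)

# plot = FALSE returns the histogram data without opening a graphics window
h <- hist(x, plot = FALSE)

h$breaks   # bin boundaries chosen by R
h$counts   # number of values falling into each bin
```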

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11,12,11,10,11,11,12,13,15,12,11,10,12,13)

x2 <- c(20,21,27,9,12,23,23,12,11,9,21,15,7,12,12,9,23,15)

boxplot(x1, x2, names=c("Data 1", "Data 2"))
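The values that a box plot draws can be computed directly with boxplot.stats. The following is a minimal sketch for the first data set; the variable name s is chosen for this illustration.

```r
x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

# boxplot.stats computes the numbers that boxplot draws
s <- boxplot.stats(x1)

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # values drawn as individual outlier points
```

Here the box spans 11 to 12, so the value 15 lies more than 1.5 box lengths above the box and is reported as an outlier.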


[Figure: Example of a box plot for Data 1 and Data 2. The annotations mark the median, an outlier, the box within which 50% of the data lie, and the whiskers covering the bottom 25% and the top 25% of the data excluding outliers.]

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with programming constructs that are important and common in other programming languages. Here, some of the most important ones will be described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition [7] is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows:

if (x > 0) {
    y <- 1
} else {
    y <- 0
}

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Run the if-statement after doing x <- -5 and check the value of y. Then, after doing x <- 3, run it again and check the value of y to make sure that the value has changed.

The general form of the if-statement is as follows:

if (condition 1) {
    Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
    Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
    Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if ... {
    ...
} else {
    Set of procedures to execute when NONE of the above conditions are satisfied
}
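The general form can be illustrated with a small example. The function classify below is hypothetical and only serves to show a chain of if, else if, and else.

```r
# Hypothetical function classifying a single number
classify <- function(x) {
    if (x > 0) {
        return("positive")
    } else if (x < 0) {
        return("negative")   # reached only when x > 0 is not met
    } else {
        return("zero")       # reached when none of the conditions is met
    }
}

classify(-5)
classify(3)
classify(0)
```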

Exercise 13-1: Define a function which returns 1 when the input parameter is 0 and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows: [8]

while (condition) {
    A set of procedures to execute when the condition is satisfied
}

[8] A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to immediately stop the iteration and get out of the while-loop.

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3:

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you are to write this procedure as a program in a file, you may want to write each step on its own line as follows, so that the program will be easy to read:

x <- 1
while (x <= 3) {
    print(x)
    x <- x + 1
}

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first loop of the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., displaying the value of x, which is 2, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., displaying the value of x, which is 3, by print(x), and increasing the value of x by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
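The next and break statements can be tried in the same way. The loop below is a minimal sketch; the variable shown, which collects the printed values, is added only so that the result can be inspected afterwards.

```r
x <- 0
shown <- c()               # collects the values that get printed
while (TRUE) {
    x <- x + 1
    if (x > 7) {
        break              # leave the while-loop entirely
    }
    if (x %% 3 == 0) {
        next               # skip multiples of 3 and start the next iteration
    }
    print(x)
    shown <- c(shown, x)
}
shown
```

The condition TRUE never fails, so the loop ends only through break; the multiples of 3 are skipped by next before print(x) is reached.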

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also does iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, starting from the first element and ending with the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements) {
    procedures
}

For example, for (i in c(1,3,5)) { print(i) } will assign each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, the number 1 is displayed in the first iteration, 3 is displayed in the second iteration, and finally 5 is displayed in the third iteration. The following example makes a vector of the squared values of 1 to 5: [9]

x <- NULL
for (i in 1:5) {
    x <- append(x, i^2)
}

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector. In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed. In that procedure, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to vector x. [10]
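The elements given to a for-statement do not have to be numbers. The sketch below loops over a character vector and collects the length of each name with append; the names teams and name_lengths are chosen for this illustration.

```r
teams <- c("Giants", "Braves", "Yankees")

name_lengths <- NULL                 # start from an empty vector
for (team in teams) {
    # nchar gives the number of characters in a string
    name_lengths <- append(name_lengths, nchar(team))
}
name_lengths
```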

Exercise 13-3: Rewrite the procedure in exercise 13-2 using a for-statement.

14 Other useful functions

Here, some frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

14.3 Family of apply functions

The "sapply" function uses each value in a given vector, one by one, as a single input to a given function, and returns a vector containing the output values.

func1_sub <- function(elm) {
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1) { return (1) } else { return (0) }
}

func1 <- function(x) {    # a function with a vector parameter x
    return(sapply(x, func1_sub))
}

[10] The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
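A short usage sketch for these two functions follows. The definitions are repeated so that the block runs on its own, and the input vector is made up for this illustration. sapply is needed here because the if-statement tests a single value at a time.

```r
func1_sub <- function(elm) {
    # returns 1 when elm lies in [-1, 1], otherwise 0
    if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

func1 <- function(x) {
    return(sapply(x, func1_sub))
}

# One output value per element of the input vector
func1(c(-2, -1, 0, 0.5, 3))

# A function can also be written directly inside the sapply call
sapply(1:5, function(i) i^2)
```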

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)    # Draws normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000)    # 1000 is number of points
curve(func1, -3, 3)    # The function defined in 14.3 Family of apply functions


11

7 Simple data frame creation

Data frame is one of data class in R It is one type of list and has two dimensional

structure just like a matrix Each row can be deemed as a sample and each column can be

deemed as attribute of the sample Using this idea data frame can represent a table

Following table gives data of five retired major league baseball players

Team atbats hits homeruns

Rose Reds 14053 4256 160

Aaron Brewers 12364 3771 755

Yastrzemski Red Sox 11988 3419 452

Ripken Orioles 11551 3184 431

Cobb Athletics 11434 4191 117

The above table can be represented by data frame as follows

x lt- dataframe(

rownames = c(Rose Aaron Yastrzemski Ripken Cobb)

team = c(Reds Brewers Red Sox Orioles Athletics)

atbats = c(14053 12364 11988 11551 11434)

hits = c(4256 3771 3419 3184 4191)

homeruns = c(160 755 452 431 117))

The function dataframe can generate a data frame and can be used in the format of

dataframe(rownames = vector for column labels column name 1 = vector 1 column name

2 = vector 2 hellip) You can check the content of x by typing x

> x
                 team atbats hits homeruns
Rose             Reds  14053 4256      160
Aaron         Brewers  12364 3771      755
Yastrzemski   Red Sox  11988 3419      452
Ripken        Orioles  11551 3184      431
Cobb        Athletics  11434 4191      117
>

As with a regular list, a column name can be used to extract the corresponding vector.

> x$hits
[1] 4256 3771 3419 3184 4191
>

Any part of a data frame can be extracted as another data frame. For example,

x[c(1, 5), c(2, 3, 4)]

will extract rows 1 and 5 and columns 2, 3, and 4 as a new data frame. You can also use row names and column names to do the same thing.

x[c("Rose", "Cobb"), c("atbats", "hits", "homeruns")]
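Rows can also be selected by a logical condition on a column instead of by position or name. The following is a minimal sketch that rebuilds the same data frame so that it runs on its own; the variable names batters, power_hitters, and cobb_hits are chosen here for illustration only and do not appear in the tutorial.

```r
# Rebuild the data frame from this section (names below are illustrative)
batters <- data.frame(
    row.names = c("Rose", "Aaron", "Yastrzemski", "Ripken", "Cobb"),
    team = c("Reds", "Brewers", "Red Sox", "Orioles", "Athletics"),
    atbats = c(14053, 12364, 11988, 11551, 11434),
    hits = c(4256, 3771, 3419, 3184, 4191),
    homeruns = c(160, 755, 452, 431, 117))

# Keep only the rows whose homeruns value exceeds 400.
# Leaving the column index empty keeps all columns.
power_hitters <- batters[batters$homeruns > 400, ]
print(power_hitters)

# Extract a single cell by row name and column name
cobb_hits <- batters["Cobb", "hits"]
print(cobb_hits)
```

The condition batters$homeruns > 400 yields one TRUE or FALSE per row, and only the rows marked TRUE are kept.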

Various functions are defined for data frames. Let us get the attributes associated with the data frame x.

> attributes(x)
$names
[1] "team"     "atbats"   "hits"     "homeruns"

$row.names
[1] "Rose"        "Aaron"       "Yastrzemski" "Ripken"      "Cobb"

$class
[1] "data.frame"

>

We got names, row.names, and class. Type names(x), rownames(x), and class(x). You can obtain the column names, row names, and class of x, respectively.
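As a small sketch of these accessor functions, the following builds a reduced data frame (three players and two columns, picked here only to keep the example short) and also shows dim, nrow, and ncol, which report the size of the table.

```r
# A reduced version of the data frame, used only for this illustration
x <- data.frame(
    row.names = c("Rose", "Aaron", "Cobb"),
    hits = c(4256, 3771, 4191),
    homeruns = c(160, 755, 117))

print(names(x))      # column names
print(rownames(x))   # row names
print(class(x))      # class of the object
print(dim(x))        # number of rows, then number of columns
print(nrow(x))
print(ncol(x))
```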

Exercise 7-1. The table below shows characteristic values of each planet in the solar system. Masses are given in units of 10^23 kg and diameters in km. Make a data frame representing the table[4].

          Mass      Diameter   Satellites
Mercury   3.301     4879       0
Venus     48.690    12103      0
Earth     59.736    12756      1
Mars      6.419     6794       2
Jupiter   18986     142984     63
Saturn    5688      120536     48
Uranus    868.6     51118      27
Neptune   1024      49528      13

[4] As all the elements are numeric in this case, this data frame can also be handled as a matrix. as.matrix(x) will return x as a matrix.

Exercise 7-2. Using the data frame created in the previous exercise, calculate the averages of mass, diameter, and number of satellites.

8. Data reading from file

So far, we have been entering data directly. In most cases, however, numerical data are prepared in files, so we will learn how to import data from a file into a variable in R. First, prepare a text file with the following values. Let us name the file "testdata.txt". We assume that the file is placed under the directory /Users/smith/TMP.

14
14
21
35
6

Using setwd, we change the working directory to /Users/smith/TMP.

setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x.

> x <- scan("testdata.txt")
Read 5 items
> x
[1] 14 14 21 35  6
>

Notice that the values are stored in x as a vector.
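To try scan without preparing a file by hand, the same steps can be sketched with a temporary file. This is only an illustration: tempfile and writeLines are used here to create the input, and the names data_file and values are made up for this example.

```r
# Create a temporary file holding the five values, one per line
data_file <- tempfile(fileext = ".txt")
writeLines(c("14", "14", "21", "35", "6"), data_file)

# scan() reads whitespace-separated numbers into a numeric vector
values <- scan(data_file)
print(values)

# The result is an ordinary vector, so it can be used in calculations
print(mean(values))

# Remove the temporary file
unlink(data_file)
```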

R has a function read.table to read a table in which the columns of each line are separated by TABs, as in the following example (file name is "batters.txt").

            Team      At_Bat   Hits   Home_Runs
Bonds       Giants    9847     2935   762
Aaron       Braves    12364    3771   755
Ruth        Yankees   8398     2873   714
Rodriguez   Yankees   10341    3070   687
Mays        Giants    10881    3283   660

read.table can read this as a data frame.

> x <- read.table("batters.txt", header = T,
                  sep = "\t", row.names = 1)
> x
             Team At_Bat Hits Home_Runs
Bonds      Giants   9847 2935       762
Aaron      Braves  12364 3771       755
Ruth      Yankees   8398 2873       714
Rodriguez Yankees  10341 3070       687
Mays       Giants  10881 3283       660
>

The first parameter of the function read.table gives the name of the file to read, "batters.txt". Next, header = T states that the table has a header line. sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column contains the row names. Since x is a data frame, we can get the numbers of hits with x$Hits.
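The call can be tried end to end with a small temporary file. The following is a sketch under stated assumptions: only two of the five players are written, the header line starts with an empty field for the row-name column, and the names batters_file and batters are chosen for this example. TRUE is written out in full, which is equivalent to T.

```r
# Write a small tab-delimited table to a temporary file.
# The header begins with an empty field above the row-name column.
batters_file <- tempfile(fileext = ".txt")
writeLines(c(
    "\tTeam\tAt_Bat\tHits\tHome_Runs",
    "Bonds\tGiants\t9847\t2935\t762",
    "Aaron\tBraves\t12364\t3771\t755"), batters_file)

# Read it back: first line is the header, first column holds row names
batters <- read.table(batters_file, header = TRUE, sep = "\t", row.names = 1)
print(batters)

# Columns and single cells are accessed as with any data frame
print(batters$Hits)
print(batters["Aaron", "At_Bat"])

unlink(batters_file)
```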

Exercise 8. Save the table in Exercise 7-1 as a tab-delimited file and read it as a data frame.

9. Writing data to file

There are many cases where we want to have statistical results written to a file rather than only seeing them temporarily on screen. That way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function outputs the numerical values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

x <- c(10, 12, 15, 19, 21, 34)

the following line

write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file "outfile1.txt". ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

10
12
15
19
21
34

The write function also enables us to write matrix data to a file, as shown below.

> x <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=T)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
>

t(x) transposes the matrix x. This is needed because write outputs the elements of a matrix column by column; without the transposition, the lines of the file would not match the rows of x. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of the matrix x. sep="\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

1    2    3
4    5    6
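The effect of t() can be checked directly by writing the same matrix twice and reading the lines back. This is a minimal sketch; the names m, out_file, without_t, and with_t are made up for the illustration, and a temporary file is used instead of outfile2.txt.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
out_file <- tempfile(fileext = ".txt")

# Without t(): the elements are written column by column (1, 4, 2, 5, 3, 6)
write(m, out_file, ncolumns = ncol(m), sep = "\t")
without_t <- readLines(out_file)

# With t(): each line of the file corresponds to one row of m
write(t(m), out_file, ncolumns = ncol(m), sep = "\t")
with_t <- readLines(out_file)

print(without_t)
print(with_t)

unlink(out_file)
```

Only the second file reproduces the matrix as it is displayed in R.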

For writing a data frame to a file, the write.table function is provided.

> x <- data.frame(
      row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
      Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
      At_Bat = c(9847, 12364, 8398, 10341, 10881),
      Hits = c(2935, 3771, 2873, 3070, 3283),
      Home_Runs = c(762, 755, 714, 687, 660))
> x[c("Hits", "Home_Runs")]
          Hits Home_Runs
Bonds     2935       762
Aaron     3771       755
Ruth      2873       714
Rodriguez 3070       687
Mays      3283       660
> write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
              row.names=T, col.names=NA)
>

With row.names=T and col.names=NA, row names and column names will be added, respectively (with a blank field at the top left).

Content of outfile3.txt:

""    "Hits"    "Home_Runs"
"Bonds"    2935    762
"Aaron"    3771    755
"Ruth"    2873    714
"Rodriguez"    3070    687
"Mays"    3283    660

By giving quote=F, the output will be written without double quotation marks.
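As a sketch of the quote option, the following writes a two-row data frame without quotation marks and reads it back with read.table. The reduced data and the names df, out_file, lines_written, and restored are assumptions made for this example only.

```r
# A reduced data frame (two players) used only for this illustration
df <- data.frame(
    row.names = c("Bonds", "Aaron"),
    Hits = c(2935, 3771),
    Home_Runs = c(762, 755))

out_file <- tempfile(fileext = ".txt")

# quote = FALSE suppresses the double quotation marks around strings
write.table(df, out_file, sep = "\t",
            row.names = TRUE, col.names = NA, quote = FALSE)

# Inspect the raw lines; the header starts with an empty field
lines_written <- readLines(out_file)
print(lines_written)

# The file can be read back into a data frame with the same row names
restored <- read.table(out_file, header = TRUE, sep = "\t", row.names = 1)
print(restored)

unlink(out_file)
```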

Exercise 9. Using the data frame in the above example, calculate the ratio of hits to at-bats and write the result to "outfile4.txt".

10. Writing a program in a file

So far, we have worked interactively without saving the R code we have written. However, when we repeat the same work with R, it is laborious to type the same code interactively again and again. To solve this issue, we can write the code in a file, and R can read the file and execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10)
z <- x + y

Then, giving source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y, and z.

> source("vecsumtest.R")
> x
[1] 1 2 3 4 5
> y
[1]  2  4  6  8 10
> z
[1]  3  6  9 12 15
>
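The whole cycle of saving code to a file and executing it can itself be sketched in R. Here the script is written to a temporary file with writeLines rather than with a text editor; script_file is a name made up for this example.

```r
# Write the three lines of code to a temporary script file
script_file <- tempfile(fileext = ".R")
writeLines(c(
    "x <- c(1,2,3,4,5)",
    "y <- c(2,4,6,8,10)",
    "z <- x + y"), script_file)

# Execute the code in the file; x, y, and z are now defined
source(script_file)
print(z)

unlink(script_file)
```

Calling source(script_file, echo = TRUE) instead would also display each line of the file as it is executed, which can help when checking a longer program.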

Exercise 10. Write the procedure in Exercise 8 to a file dframetest.R and execute the procedure using "source".

11. Defining functions

In mathematics, a function outputs a value that is determined by the input values. In programming, it often represents a defined set of procedures[5]. For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows, using the keyword "function":

f <- function(x, y){
    return ((x + y) / 2)
}

[5] Probably "subroutine" may be the better terminology.

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

f(10, 20)

will assign 10 and 20 to the parameters x and y, respectively, and 15 will be returned, as (x + y) / 2 = (10 + 20) / 2 = 15. Here, the "return" statement returns the value given just after it. You can also give the parameter names x and y explicitly as follows:

f(x = 10, y = 20)

The general way to define a function is

name_of_function <- function(parameter 1, parameter 2, …){
    various procedures possibly using the given parameters[6]
    …
    return(return_value)
}
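Optional parameters written as ... in the parameter list can be collected with list(...). The following sketch shows one way this might be used; the function name sum_all is hypothetical and was chosen only for this illustration.

```r
# sum_all is a hypothetical example function:
# it adds x and any number of further numeric arguments
sum_all <- function(x, ...){
    extras <- list(...)                    # optional arguments as a list
    total <- x + sum(unlist(extras))       # sum() of nothing is 0
    return(total)
}

print(sum_all(1))          # no optional arguments given
print(sum_all(1, 2, 3))    # two optional arguments given
```

When no optional arguments are given, list(...) is an empty list, so the function simply returns x.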

Exercise 11. Implement the mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

[6] Optional parameters, written as …, can be accessed as a list with list(…).

12. Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and passing them to plot will create a plot with points at the locations (1,2), (3,4), (5,9), (7,7), and (9,8).

x <- c(1,3,5,7,9)
y <- c(2,4,9,7,8)
plot(x, y, xlab="X Value", ylab="Y Value")

Labels for the x and y axes can be given with the parameters xlab and ylab, respectively.

Figure: A plot created by the "plot" function

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in the vector x. The labels of the bars are given by the element names of the vector x, i.e., names(x).

x <- c(1,2,3,2,10,1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

Figure: A bar graph created by the "barplot" function

The "hist" function generates a histogram for a set of numbers given by a vector.


x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to give the title of the histogram.

Figure: A histogram generated using the "hist" function

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11,12,11,10,11,11,12,13,15,12,11,10,12,13)
x2 <- c(20,21,27,9,12,23,23,12,11,9,21,15,7,12,12,9,23,15)
boxplot(x1, x2, names=c("Data 1", "Data 2"))


Figure: Example of a box plot. The annotations in the figure mark the median, an outlier, the range within which 50% of the data lie, the bottom 25% excluding outliers, and the top 25% excluding outliers.

Exercise 12. Using the table (data frame) given in Exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13. Basic program structure

R is equipped with control structures that are important and common to other programming languages. Here, some of the most important ones are described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition[7] is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows:

if (x > 0){
    y <- 1
} else {
    y <- 0
}

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Run this statement after doing x <- -5 and check the value of y. Then do x <- 3, run the statement again, and check y once more to make sure that the value has changed.

The general form of the if-statement is as follows:

if (condition 1){
    Set of procedures to execute when condition 1 is satisfied
} else if (condition 2){
    Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3){
    Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if …
} else {
    Set of procedures to execute when NONE of the above conditions are satisfied
}
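As a concrete instance of the general form, the following sketch classifies a number by its sign. The function name sign_label and the returned strings are made up for this illustration.

```r
# sign_label is a hypothetical example function
sign_label <- function(x){
    if (x > 0){
        label <- "positive"
    } else if (x < 0){
        label <- "negative"    # reached only when x > 0 is NOT met
    } else {
        label <- "zero"        # reached when none of the conditions hold
    }
    return(label)
}

print(sign_label(3))
print(sign_label(-5))
print(sign_label(0))
```

Note that each else is written on the same line as the closing brace before it. When typing interactively at the top level, an else that starts a new line is not connected to the preceding if and causes an error.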

Exercise 13-1. Define a function which returns 1 when the input parameter is 0 and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows[8]:

while(condition){
    A set of procedures to execute when the condition is satisfied
}

[8] A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to stop the iteration immediately and get out of the while-loop.

For example, executing while(x <= 3){ print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

> x <- 1
> while (x <= 3){ print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you write this procedure as a program in a file, you may want to write each step on its own line as follows, so that the program is easy to read.

x <- 1
while (x <= 3){
    print(x)
    x <- x + 1
}

Initially, the value of x is 1 and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first pass through the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: the value of x, 2, is displayed by print(x), and x is increased by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: the value of x, 3, is displayed by print(x), and x is increased by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter then comes back once more to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
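The next-statement and break-statement can be tried in a similar loop. The following is a minimal sketch: it collects the numbers from 1 to 8 that are not multiples of 3. The variable name shown and the choice of numbers are assumptions made only for this example.

```r
x <- 0
shown <- c()                  # numbers collected so far (illustrative name)
while (TRUE){                 # loop until break is reached
    x <- x + 1
    if (x > 8){
        break                 # leave the while-loop immediately
    }
    if (x %% 3 == 0){
        next                  # skip the rest and start the next iteration
    }
    shown <- c(shown, x)
}
print(shown)
print(x)                      # value of x when the loop was left
```

Because the condition is always TRUE, the loop ends only through break; without it, the loop would never stop.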

Exercise 13-2. Using the while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. It assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for(variable in elements){
    procedures
}

For example, for (i in c(1,3,5)){ print(i) } assigns each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second, and finally 5 in the third. The following example makes a vector of the squared values of 1 to 5[9]:

x <- NULL
for (i in 1:5){
    x <- append(x, i^2)
}

[9] Actually, using the for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector.

In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (written as i^2) is appended to the vector x. The function append(x, i) appends i to the vector x[10].
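The loop and the vectorized calculation can be compared side by side. This is a small sketch; the names squares_loop and squares_vec are chosen here for illustration, and c() is used in place of append().

```r
# Build the squares one by one in a for-loop
squares_loop <- NULL
for (i in 1:5){
    squares_loop <- c(squares_loop, i^2)
}

# Compute all squares at once; ^ is applied to every element of 1:5
squares_vec <- (1:5)^2

print(squares_loop)
print(squares_vec)
print(all(squares_loop == squares_vec))   # TRUE when both give the same values
```

The vectorized form is shorter and is usually preferred in R when each element can be processed independently.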

Exercise 13-3. Rewrite the procedure in Exercise 13-2 using the for-statement.

[10] The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.

14. Other useful functions

Here, some frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.

14.3 Family of apply functions

The "sapply" function applies a given function to each value in a given vector, one value at a time, and returns a vector containing the output values.

func1_sub <- function(elm){
    # a function expecting a single number as a parameter
    if(-1 <= elm & elm <= 1){ return (1) } else { return (0) }
}

func1 <- function(x){  # a function with a vector parameter x
    return(sapply(x, func1_sub))
}
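The reason for going through sapply is that an if-statement tests a single value, so a function built on if cannot handle a whole vector by itself. The sketch below applies such a function to a vector and, for comparison, computes the same result with the vectorized function ifelse. The names in_unit_range, inputs, flags, and flags_vec are made up for this example.

```r
# A function that expects a single number (same logic as func1_sub)
in_unit_range <- function(elm){
    if (-1 <= elm & elm <= 1){ return(1) } else { return(0) }
}

inputs <- c(-2, -1, 0, 0.5, 3)

# sapply calls in_unit_range once per element and collects the results
flags <- sapply(inputs, in_unit_range)
print(flags)

# ifelse evaluates the condition for all elements at once
flags_vec <- ifelse(-1 <= inputs & inputs <= 1, 1, 0)
print(flags_vec)
```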

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)                         # Draws the normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000)    # 1000 is the number of points
curve(func1, -3, 3)                          # The function defined in 14.3 Family of apply functions

27

12

Any part of data frame can be extracted as another data frame For example

x[ c(15) c(234) ]

will extract row 1 5 and column 2 3 4 as a new data frame You can also use row names

and column names to do the same thing

x[ c(Rose Cobb) c(atbats hits homeruns)]

Various functions are defined for data frame Let us get attributes associated with the data

frame x

gt attributes(x)

$names

[1] team atbats hits homeruns

$rownames

[1] Rose Aaron Yastrzemski Ripken Cobb

$class

[1] dataframe

gt

We got names rownames and class Type names(x) rownames(x) and class(x) You can

obtain row names column names and class of x respectively

Exercise 7-1 The below table shows characteristic values of each planet in solar system

Masses are represented in 1023 kg and diameters are represented in km Make a data frame

representing the table4

Mass Diameter Satellites

Mercury 3301 4879 0

Venus 12103 48690 0

Earth 59736 12756 1

4 As all the elements are numeric in this case this data frame can be dealt as a matrix

asmatrix(x) will return x as a matrix

13

Mars 6419 6794 2

Jupiter 18986 142984 63

Saturn 5688 120536 48

Uranus 8686 51118 27

Neptune 1024 49528 13

Exercise 7-2 Using the created data frame in the previous exercise calculate averages of

mass diameter and number of satellites

8 Data reading from file

So far we have been inputting data directly However in most of cases numerical data

may be prepared by files So we will learn how to import data into a variable in R First

prepare a text file with the following values Let us name the file ldquotestdatatxtrdquo We assume

that the file is placed under the directory UserssmithTMP

14

14

21

35

6

Using setwd we change the working directory to UserssmithTMP

setwd(UserssmithTMP)

Then using scan function read the data to a variable x

gt x lt- scan(testdatatxt)

Read 5 items

gt x

[1] 14 14 21 35 6

gt

You notice that the values are stored in x as a vector

R has a function readtable to read a table where each line is separated by TABs like the

following example (file name is ldquobatterstxtrdquo)

Team At_Bat Hits Home_Runs

14

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

readtable can read this as data frame

gt x lt- readtable(batterstxt header = T

sep = t rownames = 1)

gt x

Team At_Bat Hits Home_Runs

Bonds Giants 9847 2935 762

Aaron Braves 12364 3771 755

Ruth Yankees 8398 2873 714

Rodriguez Yankees 10341 3070 687

Mays Giants 10881 3283 660

gt

The first parameter of the function readtable states that the name of the file to read

is rdquobatterstxtrdquo Subsequently header = T states that there is a header for the table sep =

ldquotrdquo indicates that each line is separated by TAB Finally we can specify that there is one

column for row names by rownames = 1 Since x will be a data frame we can get number of

hits by $Hits

Exercise 8 Save table in exercise 7-1 as tab-delimited file and read it as data frame

9 Writing data to file

There are many cases where we want to have statistical results written in file rather

than temporary seeing the results on screen In this way we can open the results by

spreadsheet software or process the results using other programming language afterwards

Using write function is one of the simplest ways to output the results to files This

function enables us to output numerical values assigned to variable to file For example

after assigning the vector to x as shown below

x lt- c(10 12 15 19 21 34)

15

the following line

write(x outfile1txt ncolumns = 1)

will write content of x to the file ldquooutfile1txtrdquo ncolumns = 1 states that the vector will be

written to the file by one column

Content of outfile1txt

10

12

15

19

21

34

write function also enable us to write matrix data to file as shown below

gt x lt- matrix(c(123456) nrow=2 ncol=3 byrow=T)

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt write(t(x) outfile2txt ncolumns=ncol(x) sep=t)

gt

t(x) transposes matrix x If we do not do it the matrix written in the file will look

transposed ncolumns=ncol(x) will tell R that the number of columns to output should be

identical to that of matrix x sep = ldquotrdquo indicates that the columns in the output file will be

separated by tabs

Content of outfile2txt

1 2 3

4 5 6

For writing data frame to a file writetable function is provided

gt x lt- dataframe(

rownames = c(Bonds Aaron Ruth Rodriguez Mays)

Team = c(Giants Braves Yankees Yankees Giants)

16

At_Bat = c(9847 12364 8398 10341 10881)

Hits = c(2935 3771 2873 3070 3283)

Home_Runs = c(762 755 714 687 660))

gt x[c(Hits Home_Runs)]

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

gt writetable(x[c(Hits Home_Runs)] outfile3txt sep=t

rownames=T colnames=NA)

gt

With rownames=T and colnames=NA row names and column names will be added

respectively (blank on the top-left)

Content of outfile3

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

By giving quote=F output will be without double quotations

Excercise 9 Using the data frame in the above example calculate ratio of hits and at bat

and write the result to rdquooutfile4txtrdquo

10 Writing a program in a file

So far we have done our works interactively without saving what R codes we have written

However when we repeat the same works with R it is laborious to interactively write the

same codes again and again To solve this issue we can write the set of codes in a file and R

can read the file to execute the codes written in the file

For example we can prepare a text file with the following code Let us name the

17

file vecsumtestR

x lt- c(12345)

y lt- c(246810)

z lt- x + y

Then giving source( vecsumtestR) will let R execute the set of codes in the file

vecsumtestR You can check that based on the codes in vecsumtestR vectors are assigned

to each of x y and z

gt source( vecsumtestR)

gt x

[1] 1 2 3 4 5

gt y

[1] 2 4 6 8 10

gt z

[1] 3 6 9 12 15

gt

Exercise 10 Write the procedure in exercise 8 to a file dframetestR and execute the

procedure using ldquosourcerdquo

11 Defining functions

In mathematics function outputs the value which is determined by the input value In

programming it often represents a defined set of procedures5 For example let us consider

a function f which divides the sum of two given input values In mathematics it can be

written as f(x y) = (x + y) 2 In R it is written as follows by introducing the keyword

ldquofunctionrdquo

f lt- function(x y)

return ((x + y) 2)

In this way a function with two parameters (x and y) is defined And the returned value

(output) will be (x + y) 2 After defining f

5 Probably ldquosubroutinerdquo may be the better terminology

18

f(10 20)

will assign 10 and 20 to parameters x and y respectively and 15 will be returned as (x +y) 2

= (10 + 20) 2 = 15 Here ldquoreturnrdquo statement will return the value given just after it You

can also explicitly give the parameter names x and y as follows

f(x = 10 y = 20)

The general way to define a function is

name_of_function lt- function(parameter 1 parameter 2 hellip)

various procedures possibly using the given parameters6

hellip

return(return_value)

Exercise 11 Implement a mathematical function f(x a b c) = ax2 + bx + c in R Then

calculate f(4 3 2 1)

12 Making graphs

R is equipped with functions capable of making graphs easily ldquoplotrdquo function may be one

of the simplest ones It creates a plot in the two-dimensional space For example

assigning the following vectors to x and y will create a plot with points on the locations

(12) (34) (59) (77) and (98)

x lt- c(13579)

y lt- c(24978)

plot(x y xlab=X Value ylab=Y Value)

Labels on the x and y axes can be given to the parameters xlab and ylab

respectively

6 Optional parameters hellip can be access with list(hellip) as a list

19

A plot created by ldquoplotrdquo function

A bar graph can be created using ldquobarplotrdquo function In the following example a bar

graph is created by giving the heights of each bar by vector x The labels of each bar

are given by the element names of the vector x ie names(x)

x lt- c(1232101)

names(x) lt- c(A B C D E F)

barplot(x)

A bar graph created by ldquobarplotrdquo function

ldquohistrdquo function generates a histogram for a set of numbers given by a vector

2 4 6 8

23

45

67

89

X Value

YV

alu

e

A B C D E F

02

46

810

A B C D E F

02

46

810

20

x lt- c(32 12 42 23 34 59 52 53 41 52 32 14)

hist(x xlab = Test Value main = Test Histogram)

The parameter ldquomainrdquo is used to state the title of the histogram

A generated histogram using ldquohistrdquo function

ldquoboxplotrdquo function creates a boxplot for the numbers given by the vectors

x1 lt- c(1112111011111213151211101213)

x2 lt- c(20212791223231211921157121292315)

boxplot(x1 x2 names=c(Data 1 Data 2))

Test Histogram

Test Value

Fre

qu

en

cy

1 2 3 4 5 6

01

23

4

21

Example of a box plot

Exercise 12 Using the table (data frame) given in exercise 7-1 make a plot which describes

relationship between masses of the planets and number of satellites

13 Basic program structure

R is equipped with programming grammars which are important and common in other

programming languages Here some of the most important ones will be described briefly

131 if-statement

ldquoifrdquo statement gives you a way to execute a specific procedure only if a specified condition7

is satisfied For example if you want to assign 1 to y when x gt 0 and otherwise assign 0 to y

you can write as follows

if (x gt 0)

y lt- 1

7 When specifying a condition logical operators such as amp (logical AND) and | (logical OR)

can be used

Data 1 Data 2

10

15

20

25

Median

Outlier 50 of the

data are within this

range

The bottom

25 excluding outliers

The top 25

excluding outliers

22

else

y lt- 0

Investigate the value of y after doing x lt- -5 Then after doing x lt- 3 investigate the value

of y again to make sure that the value has changed

The general form of if-statement is as follows

if (condition 1)

Set of procedures to execute when the condition 1 is satisfied

else if (condition 2)

Set of procedures to execute when the condition the condition 2 is satisfied (but the

condition 1 is NOT met)

else if(condition 3)

Se of procedures to execute when condition 3 is satisfied (but condition 1 to 2 are NOT

met)

else if hellip

else

Set of procedures to execute when NONE of the above conditions are satisfied

Exercise 13-1 Define a function which returns 1 when the input parameter is 0 and

otherwise returns 0

132 while-statement

while-statement iterates a set of given procedures as long as the given condition is satisfied

The general form of while-statement is as follows8

while(condition)

8 next-statement in the while-statement forces program to start the next iteration

immediately break-statement in while-statement forces the program to immediately stop

the iteration and get out of the while-loop

23

A set of procedures to execute when the condition is satisfied

For example executing while(x lt= 3) print(x) x lt- x + 1 after doing x lt- 1 will generate

the output of 1 to 3

gt x lt- 1

gt while (x lt= 3) print(x) x lt- x + 1

[1] 1

[1] 2

[1] 3

gt

If you are to write this procedure as a program in a file you may want to write each step

line by line as follows so that the program will be easy to read

x lt- 1

while (x lt= 3)

print(x)

x lt- x + 1

Initially the value of x is 1 and the condition of the while-block ie x le 3 is met Thus R

goes into the while-block The first procedure print(x) which displays the value of x is

executed and 1 will be displayed

In the next procedure in the while-block (x lt- x + 1) the value of x is increased by 1 Thus

at the end of the first loop of while-block x is 2

R interpreter then comes back to the condition check at the beginning of the while-block

(x lt= 3) The variable x is 2 at this point and still x le 3 holds so the procedures in the

while-block will be executed again ie displaying the value of x 2 by print(x) and

increasing the value of x by 1 by x lt- x + 1 At the end of the while-block x is 3

24

R interpreter will then come back again to the condition check at the beginning of the

while-block (x lt= 3) The variable x is 3 at this point and still x le 3 holds so the

procedures in the while-block will be executed again ie displaying the value of x which

is 3 by print(x) and increasing the value of x by 1 by x lt- x + 1 At the end of the

while-block x is 4

R interpreter will then come back again to the condition check at the beginning of the

while-block (x lt= 3) The variable x is 4 at this point and x le 3 is NOT satisfied any more

Thus R interpreter steps out of the while-loop Final value of x is 4

Exercise 13-2 Using while-statement display 13579 and 11 respectively in each line

133 for-statement

for-statement also does iteration like while-statement The for-statement assigns each of

given elements to the given variable starting from the first element to the last element in

the given elements After each assignment the procedures in the for-block are executed

The general form of for-statement is

for(variables in elements)

procedures

For example for (i in c(135)) print(i) will assign each of 1 3 5 into the variable i and

after each assignment the procedures in the file-block are executed Thus in this case

number 1 is displayed in the first iteration after the assignment 3 is displayed in the

second iteration after the assignment and finally 5 is displayed in the third iteration after

the assignment The following example makes a vector of set of squared values of 1 to 59

x lt- NULL

for (i in 15)

x lt- append(x i2)

9 Actually using for-statement is unnecessary here Just do x lt- (15)2

25

x lt- NULL assigns an empty vector to x NULL represents an empty vector

In the for-statement each value of 1 to 5 is assigned to i and after the assignment the

procedure in the for-block is executed

In the procedure in the for-block the squared value of i (represented as i2) is

concatenated to the vector x The function append(x i) concatenates i to vector x10

Exercise 13-3 Rewrite procedure in 12-2 using for-statement

14 Other useful functions

Here some of the frequently used R functions are introduced briefly

141 How to use

help(function_name) will display how to use function_name

142 variables and attributes

ls() or objects() will display currently defined variables

class(variable_name) or mode(variable_name) will give you the type of the variable (object

For example whether it is numerical variable character list or matrix)

attributes(variable_name) will return attributes defined for the given variable

variabl_name

143 Family of apply functions

ldquosapplyrdquo function will return a vector containing a set of output values from a given

function after using each value in the given vector as a single input to that given function

each by each

func1_sub lt- function(elm)

a function expecting a single number as a parameter

if(-1 lt= elm amp elm lt= 1) return (1) else return (0)

10 The same procedure can be done with x lt- c(x i2) c can be used to concatenate given

vectors

26

func1 lt- function(x) a function with a vector parameter x

return(sapply(x func1_sub))

144 Making figures

rdquocurverdquo function generates a graph for a given function See help for details

Example

curve(dnorm -7 +7) Draws normal (Gaussian) distribution

curve(cos(x)+cos(2x) -2pi 2pi 1000) 1000 is number of points

curve(func1 -3 3) The function defined in 143 Family of apply functions

27

13

Mars 6419 6794 2

Jupiter 18986 142984 63

Saturn 5688 120536 48

Uranus 8686 51118 27

Neptune 1024 49528 13

Exercise 7-2 Using the created data frame in the previous exercise calculate averages of

mass diameter and number of satellites

8 Data reading from file

So far, we have been entering data directly. In most cases, however, numerical data are prepared in files, so we will now learn how to import data from a file into a variable in R. First, prepare a text file with the following values. Let us name the file “testdata.txt”. We assume that the file is placed under the directory /Users/smith/TMP.

    14
    14
    21
    35
    6

Using setwd, we change the working directory to /Users/smith/TMP:

    setwd("/Users/smith/TMP")

Then, using the scan function, read the data into a variable x:

    > x <- scan("testdata.txt")
    Read 5 items
    > x
    [1] 14 14 21 35  6
    >

Notice that the values are stored in x as a vector.
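The steps above can be tried without preparing a file by hand. The following is a minimal sketch that creates the data file itself before reading it; using tempdir() in place of /Users/smith/TMP, and building the path with file.path instead of calling setwd, are assumptions made here so that the example runs on any system.

```r
# Create a small data file in a temporary directory (an assumed stand-in
# for /Users/smith/TMP), then read it back with scan.
data_file <- file.path(tempdir(), "testdata.txt")
writeLines(c("14", "14", "21", "35", "6"), data_file)

x <- scan(data_file)   # reads whitespace-separated numbers into a vector
print(x)
print(mean(x))         # x behaves like any other numeric vector
```

Because scan returns an ordinary numeric vector, functions such as mean can be applied to the result directly.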

R has a function, read.table, to read a table in which the columns of each line are separated by TABs, as in the following example (the file name is “batters.txt”):

                 Team       At_Bat   Hits   Home_Runs
    Bonds        Giants     9847     2935   762
    Aaron        Braves     12364    3771   755
    Ruth         Yankees    8398     2873   714
    Rodriguez    Yankees    10341    3070   687
    Mays         Giants     10881    3283   660

read.table can read this as a data frame:

    > x <- read.table("batters.txt", header = T,
                      sep = "\t", row.names = 1)
    > x
                 Team At_Bat Hits Home_Runs
    Bonds      Giants   9847 2935       762
    Aaron      Braves  12364 3771       755
    Ruth      Yankees   8398 2873       714
    Rodriguez Yankees  10341 3070       687
    Mays       Giants  10881 3283       660
    >

The first parameter of the function read.table states that the name of the file to read is “batters.txt”. Next, header = T states that the table has a header line. sep = "\t" indicates that the columns in each line are separated by TABs. Finally, row.names = 1 specifies that the first column holds the row names. Since x is a data frame, we can get the numbers of hits with x$Hits.
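As a self-contained sketch of the read.table call described above, the following code first writes the batters table to a tab-delimited file and then reads it back. The temporary file location and the variable name batters are assumptions made for illustration.

```r
# Write a tab-delimited copy of the batters table to a temporary file
# (the location is an assumption), then read it as a data frame.
batters_file <- file.path(tempdir(), "batters.txt")
writeLines(c(
  "\tTeam\tAt_Bat\tHits\tHome_Runs",
  "Bonds\tGiants\t9847\t2935\t762",
  "Aaron\tBraves\t12364\t3771\t755",
  "Ruth\tYankees\t8398\t2873\t714",
  "Rodriguez\tYankees\t10341\t3070\t687",
  "Mays\tGiants\t10881\t3283\t660"
), batters_file)

batters <- read.table(batters_file, header = TRUE, sep = "\t", row.names = 1)

print(batters$Hits)                   # one column, as a vector
print(batters["Ruth", "Home_Runs"])   # one cell, by row name and column name
```

Row names make it possible to look up a single player by name, as in the last line.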

Exercise 8: Save the table in exercise 7-1 as a tab-delimited file and read it as a data frame.

9 Writing data to file

There are many cases where we want to have statistical results written to a file rather than only viewing them temporarily on screen. In this way, we can open the results with spreadsheet software or process them afterwards using another programming language.

Using the write function is one of the simplest ways to output results to files. This function lets us write the numerical values assigned to a variable to a file. For example, after assigning a vector to x as shown below,

    x <- c(10, 12, 15, 19, 21, 34)

the following line

    write(x, "outfile1.txt", ncolumns = 1)

will write the content of x to the file “outfile1.txt”. ncolumns = 1 states that the vector will be written to the file in one column.

Content of outfile1.txt:

    10
    12
    15
    19
    21
    34
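To see the effect of ncolumns, the following sketch writes the same vector twice and reads the file back as text each time. Writing to tempdir() rather than the working directory is an assumption made so that the example leaves no files behind.

```r
x <- c(10, 12, 15, 19, 21, 34)
out_file <- file.path(tempdir(), "outfile1.txt")  # assumed location

write(x, out_file, ncolumns = 1)   # one value per line
one_column <- readLines(out_file)
print(one_column)

write(x, out_file, ncolumns = 3)   # the same values, three per line
three_columns <- readLines(out_file)
print(three_columns)
```

Each call to write overwrites the file, because append is FALSE by default.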

The write function also enables us to write matrix data to a file, as shown below:

    > x <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=T)
    > x
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    > write(t(x), "outfile2.txt", ncolumns=ncol(x), sep="\t")
    >

t(x) transposes the matrix x. This is needed because write outputs the elements of a matrix column by column; without t(), the values would be written in the order 1, 4, 2, 5, 3, 6, and the rows of the file would not match the rows of x. ncolumns=ncol(x) tells R that the number of columns to output should be identical to that of the matrix x. sep = "\t" indicates that the columns in the output file will be separated by tabs.

Content of outfile2.txt:

    1    2    3
    4    5    6
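The role of t(x) can be checked directly. The sketch below writes the matrix once without and once with the transpose and compares the resulting lines; the temporary file location is an assumption.

```r
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
out_file <- file.path(tempdir(), "outfile2.txt")  # assumed location

# Without t(): elements are written column by column (1, 4, 2, 5, 3, 6).
write(x, out_file, ncolumns = ncol(x), sep = "\t")
without_t <- readLines(out_file)

# With t(): the file rows match the rows of x.
write(t(x), out_file, ncolumns = ncol(x), sep = "\t")
with_t <- readLines(out_file)

print(without_t)
print(with_t)
```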

For writing a data frame to a file, the write.table function is provided:

    > x <- data.frame(
        row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
        Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
        At_Bat = c(9847, 12364, 8398, 10341, 10881),
        Hits = c(2935, 3771, 2873, 3070, 3283),
        Home_Runs = c(762, 755, 714, 687, 660))
    > x[c("Hits", "Home_Runs")]
              Hits Home_Runs
    Bonds     2935       762
    Aaron     3771       755
    Ruth      2873       714
    Rodriguez 3070       687
    Mays      3283       660
    > write.table(x[c("Hits", "Home_Runs")], "outfile3.txt", sep="\t",
                  row.names=T, col.names=NA)
    >

With row.names=T and col.names=NA, row names and column names will be written, respectively (with a blank field at the top-left).

Content of outfile3.txt:

    ""             "Hits"    "Home_Runs"
    "Bonds"        2935      762
    "Aaron"        3771      755
    "Ruth"         2873      714
    "Rodriguez"    3070      687
    "Mays"         3283      660

By giving quote=F, the output will be written without double quotation marks.
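As a sketch of how these options fit together, the following code writes the two columns with quote = FALSE and then reads the file back with read.table. The temporary file location and the variable names file_lines and y are assumptions made for illustration.

```r
x <- data.frame(
  row.names = c("Bonds", "Aaron", "Ruth", "Rodriguez", "Mays"),
  Team = c("Giants", "Braves", "Yankees", "Yankees", "Giants"),
  At_Bat = c(9847, 12364, 8398, 10341, 10881),
  Hits = c(2935, 3771, 2873, 3070, 3283),
  Home_Runs = c(762, 755, 714, 687, 660))

out_file <- file.path(tempdir(), "outfile3.txt")  # assumed location
write.table(x[c("Hits", "Home_Runs")], out_file, sep = "\t",
            row.names = TRUE, col.names = NA, quote = FALSE)

file_lines <- readLines(out_file)
print(file_lines[1:2])   # header starts with an empty field for the row names

# The file can be read back with the same settings used for batters.txt.
y <- read.table(out_file, header = TRUE, sep = "\t", row.names = 1)
print(y)
```

The empty top-left field produced by col.names = NA is what lets read.table line up the header with the data columns when row.names = 1 is given.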

Exercise 9: Using the data frame in the above example, calculate the ratio of hits to at-bats and write the result to “outfile4.txt”.

10 Writing a program in a file

So far, we have worked interactively, without saving the R code we have written. However, when we repeat the same work in R, it is laborious to type the same code interactively again and again. To solve this issue, we can write the code in a file, and R can read the file and execute the code written in it.

For example, we can prepare a text file with the following code. Let us name the file vecsumtest.R.

    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 6, 8, 10)
    z <- x + y

Then source("vecsumtest.R") will let R execute the code in the file vecsumtest.R. You can check that, based on the code in vecsumtest.R, vectors are assigned to each of x, y, and z.

    > source("vecsumtest.R")
    > x
    [1] 1 2 3 4 5
    > y
    [1]  2  4  6  8 10
    > z
    [1]  3  6  9 12 15
    >
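The whole procedure can also be reproduced from within R. In the following sketch, the script file is created with writeLines instead of a text editor and placed in tempdir(); both choices are assumptions made so that the example is self-contained.

```r
# Create the script file (normally this is done in a text editor).
script_file <- file.path(tempdir(), "vecsumtest.R")
writeLines(c(
  "x <- c(1, 2, 3, 4, 5)",
  "y <- c(2, 4, 6, 8, 10)",
  "z <- x + y"
), script_file)

# Execute the code in the file; x, y and z are defined afterwards.
source(script_file)
print(z)
```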

Exercise 10: Write the procedure in exercise 8 to a file dframetest.R and execute the procedure using “source”.

11 Defining functions

In mathematics, a function outputs a value that is determined by the input values. In programming, it often represents a defined set of procedures [5]. For example, let us consider a function f which divides the sum of two given input values by 2. In mathematics, it can be written as f(x, y) = (x + y) / 2. In R, it is written as follows, using the keyword “function”:

    f <- function(x, y){
      return ((x + y) / 2)
    }

[5] “Subroutine” may be the better term.

In this way, a function with two parameters (x and y) is defined, and the returned value (output) will be (x + y) / 2. After defining f,

    f(10, 20)

will assign 10 and 20 to the parameters x and y, respectively, and 15 will be returned, since (x + y) / 2 = (10 + 20) / 2 = 15. Here, the “return” statement returns the value given just after it. You can also explicitly give the parameter names x and y as follows:

    f(x = 10, y = 20)
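The following short sketch collects these calls in one place. The last two calls, which pass the named parameters in a different order and pass vectors instead of single values, are additions for illustration.

```r
f <- function(x, y) {
  return((x + y) / 2)
}

print(f(10, 20))                   # positional parameters
print(f(x = 10, y = 20))           # named parameters
print(f(y = 20, x = 10))           # named parameters may be given in any order
print(f(c(1, 2, 3), c(3, 4, 5)))   # arithmetic works element by element on vectors
```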

The general way to define a function is

    name_of_function <- function(parameter 1, parameter 2, ...){
      various procedures, possibly using the given parameters [6]
      ...
      return(return_value)
    }

[6] Optional parameters “...” can be accessed as a list with list(...).

Exercise 11: Implement a mathematical function f(x, a, b, c) = ax^2 + bx + c in R. Then calculate f(4, 3, 2, 1).

12 Making graphs

R is equipped with functions that make it easy to draw graphs. The “plot” function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7), and (9, 8).

    x <- c(1, 3, 5, 7, 9)
    y <- c(2, 4, 9, 7, 8)
    plot(x, y, xlab="X Value", ylab="Y Value")

Labels for the x and y axes can be given with the parameters xlab and ylab, respectively.

[Figure: A plot created by the “plot” function]
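In an interactive session the plot appears in a graphics window. When the code is run from a script, it can be convenient to send the plot to a file instead. The following is a minimal sketch using the pdf device; the file name plot_example.pdf and the use of tempdir() are assumptions made for illustration.

```r
x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 9, 7, 8)

# Open a PDF graphics device, draw the plot into it, then close the device.
plot_file <- file.path(tempdir(), "plot_example.pdf")
pdf(plot_file)
plot(x, y, xlab = "X Value", ylab = "Y Value")
invisible(dev.off())

print(file.exists(plot_file))
```

The file is only complete after dev.off() has been called.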

A bar graph can be created using the “barplot” function. In the following example, a bar graph is created by giving the height of each bar with the vector x. The label of each bar is given by the element names of the vector x, i.e., names(x).

    x <- c(1, 2, 3, 2, 10, 1)
    names(x) <- c("A", "B", "C", "D", "E", "F")
    barplot(x)

[Figure: A bar graph created by the “barplot” function]

The “hist” function generates a histogram for a set of numbers given by a vector.

    x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
    hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter “main” is used to state the title of the histogram.

[Figure: A histogram generated using the “hist” function]
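The numbers behind a histogram can be inspected without drawing it. The following sketch assumes the values of x are 3.2, 1.2, and so on (decimals between 1 and 6) and sets the class boundaries explicitly to 1, 2, ..., 6; both are assumptions made for illustration, as is the variable name h.

```r
x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)

# plot = FALSE returns the histogram data instead of drawing it.
h <- hist(x, breaks = 1:6, plot = FALSE)

print(h$breaks)   # class boundaries
print(h$counts)   # number of values falling into each class
```

The heights of the bars in the histogram are the values in h$counts.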

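The “boxplot” function creates a box plot for the numbers given by the vectors.

    x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)
    x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)
    boxplot(x1, x2, names=c("Data 1", "Data 2"))

[Figure: Example of a box plot. The annotations in the figure mark the median, the box within which 50% of the data lie, the top 25% and the bottom 25% of the data excluding outliers, and an outlier.]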
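The quantities that a box plot draws can also be obtained as numbers. The following sketch uses the boxplot.stats function on a vector of 14 values, assumed here to be the contents of x1; the variable name s is likewise an assumption.

```r
x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

s <- boxplot.stats(x1)

# stats: lower whisker, lower edge of the box, median,
#        upper edge of the box, upper whisker
print(s$stats)

# out: values drawn as separate points (outliers)
print(s$out)
```

For these values the median is 11.5, the box spans 11 to 12, the whiskers reach 10 and 13, and the value 15 is reported as an outlier.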

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with programming constructs that are important and common in other programming languages. Here, some of the most important ones are described briefly.

13.1 if-statement

The “if” statement gives you a way to execute a specific procedure only if a specified condition [7] is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows:

    if (x > 0){
      y <- 1
    } else {
      y <- 0
    }

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Check the value of y after running this with x <- -5. Then run it again with x <- 3 and check the value of y again to make sure that the value has changed.

The general form of the if-statement is as follows:

    if (condition 1){
      Set of procedures to execute when condition 1 is satisfied
    } else if (condition 2){
      Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
    } else if (condition 3){
      Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
    } else if ... {
      ...
    } else {
      Set of procedures to execute when NONE of the above conditions is satisfied
    }
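As a small runnable instance of this general form, the following sketch defines a function with one “else if” branch. The function name sign_label and the returned strings are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: describe the sign of a single number.
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"    # reached only when x > 0 is NOT met
  } else {
    label <- "zero"        # reached when none of the conditions is met
  }
  return(label)
}

print(sign_label(3))
print(sign_label(-5))
print(sign_label(0))
```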

Exercise 13-1: Define a function which returns 1 when the input parameter is 0 and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows [8]:

    while (condition){
      A set of procedures to execute when the condition is satisfied
    }

[8] A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to stop the iteration immediately and get out of the while-loop.

For example, executing while (x <= 3){ print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

    > x <- 1
    > while (x <= 3){ print(x); x <- x + 1 }
    [1] 1
    [1] 2
    [1] 3
    >

If you are to write this procedure as a program in a file, you may want to write each step on its own line, as follows, so that the program is easy to read:

    x <- 1
    while (x <= 3){
      print(x)
      x <- x + 1
    }

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first pass through the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., the value of x, 2, is displayed by print(x), and the value of x is increased by 1 by x <- x + 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again, i.e., the value of x, which is 3, is displayed by print(x), and the value of x is increased by 1 by x <- x + 1. At the end of the while-block, x is 4.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point, and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter steps out of the while-loop. The final value of x is 4.
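The next-statement and break-statement can be sketched in the same style. In the following example, which is an illustration rather than part of the procedure above, the loop condition is always TRUE, so the loop ends only through break; the variable total and the use of the remainder operator %% are assumptions introduced here.

```r
x <- 0
total <- 0
while (TRUE) {
  x <- x + 1
  if (x %% 2 == 0) next   # x is even: skip the rest and start the next iteration
  if (x > 7) break        # x is odd and larger than 7: leave the while-loop
  total <- total + x      # reached for x = 1, 3, 5, 7
}
print(total)
print(x)
```

After the loop, total is 1 + 3 + 5 + 7 = 16, and x is 9, the value at which break was executed.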

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

    for (variable in elements){
      procedures
    }

For example, for (i in c(1, 3, 5)){ print(i) } will assign each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, the number 1 is displayed in the first iteration, 3 is displayed in the second iteration, and finally 5 is displayed in the third iteration. The following example makes a vector of the squared values of 1 to 5 [9]:

    x <- NULL
    for (i in 1:5){
      x <- append(x, i^2)
    }

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x; NULL represents an empty vector.

In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (written as i^2) is appended to the vector x. The function append(x, i) appends i to the vector x [10].
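The three ways of building this vector mentioned in the text and its footnotes (append, c, and the expression (1:5)^2 without a loop) can be compared directly. The variable names x_append, x_c and x_vec in the following sketch are assumptions made for illustration.

```r
# Using append inside a for-statement
x_append <- NULL
for (i in 1:5) {
  x_append <- append(x_append, i^2)
}

# Using c inside a for-statement
x_c <- NULL
for (i in 1:5) {
  x_c <- c(x_c, i^2)
}

# Without a for-statement
x_vec <- (1:5)^2

print(x_append)
print(identical(x_append, x_c))
print(identical(x_append, x_vec))
```

All three produce the same vector of squared values.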

Exercise 13-3: Rewrite the procedure in 13-2 using a for-statement.

14 Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object); for example, whether it is a numeric variable, character, list, or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.
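A short sketch of these functions applied to a named vector and a matrix is given below. The variable names v and m are assumptions made for illustration.

```r
v <- c(1.5, 2.5)
names(v) <- c("a", "b")
m <- matrix(1:6, nrow = 2)

print(class(v))        # type of the vector
print(mode(m))         # storage mode of the matrix elements
print(class(m))        # the result includes "matrix"
print(attributes(v))   # the element names are stored as an attribute
print(attributes(m))   # the dimensions are stored as an attribute

print(c("v", "m") %in% ls())   # both variables are currently defined
```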

14.3 Family of apply functions

The “sapply” function applies a given function to each value in a given vector, one value at a time, and returns a vector containing the resulting output values.

    func1_sub <- function(elm){
      # a function expecting a single number as a parameter
      if(-1 <= elm & elm <= 1){ return (1) } else { return (0) }
    }

    func1 <- function(x){ # a function with a vector parameter x
      return(sapply(x, func1_sub))
    }

[10] The same procedure can be done with x <- c(x, i^2); c can be used to concatenate given vectors.
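The two functions can be tried on a small vector as follows. The input values and the variable name result are assumptions chosen so that values inside and outside the range from -1 to 1 both occur.

```r
func1_sub <- function(elm) {
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

func1 <- function(x) {   # a function with a vector parameter x
  return(sapply(x, func1_sub))
}

result <- func1(c(-2, -1, 0, 0.5, 3))
print(result)
```

sapply is used here because the if-statement in func1_sub tests one value at a time; func1 extends it to whole vectors.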

14.4 Making figures

The “curve” function generates a graph for a given function. See the help for details.

Example:

    curve(dnorm, -7, +7)                        # Draws the normal (Gaussian) distribution
    curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000)   # 1000 is the number of points
    curve(func1, -3, 3)                         # The function defined in 14.3 Family of apply functions
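Besides drawing, curve returns the points it has drawn. The following sketch captures them for the second example; sending the drawing to a PDF file in tempdir() and the variable name pts are assumptions made so that the code also runs without a graphics window.

```r
# Draw into a PDF file (assumed location) so that no graphics window is needed.
pdf(file.path(tempdir(), "curve_example.pdf"))
pts <- curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, n = 1000)
invisible(dev.off())

print(length(pts$x))   # number of points that were drawn
print(range(pts$y))    # smallest and largest function value among them
```

The returned list has the components x and y, so the plotted values can be reused in later calculations.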


15

the following line

write(x outfile1txt ncolumns = 1)

will write content of x to the file ldquooutfile1txtrdquo ncolumns = 1 states that the vector will be

written to the file by one column

Content of outfile1txt

10

12

15

19

21

34

write function also enable us to write matrix data to file as shown below

gt x lt- matrix(c(123456) nrow=2 ncol=3 byrow=T)

gt x

[1] [2] [3]

[1] 1 2 3

[2] 4 5 6

gt write(t(x) outfile2txt ncolumns=ncol(x) sep=t)

gt

t(x) transposes matrix x If we do not do it the matrix written in the file will look

transposed ncolumns=ncol(x) will tell R that the number of columns to output should be

identical to that of matrix x sep = ldquotrdquo indicates that the columns in the output file will be

separated by tabs

Content of outfile2txt

1 2 3

4 5 6

For writing data frame to a file writetable function is provided

gt x lt- dataframe(

rownames = c(Bonds Aaron Ruth Rodriguez Mays)

Team = c(Giants Braves Yankees Yankees Giants)

16

At_Bat = c(9847 12364 8398 10341 10881)

Hits = c(2935 3771 2873 3070 3283)

Home_Runs = c(762 755 714 687 660))

gt x[c(Hits Home_Runs)]

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

gt writetable(x[c(Hits Home_Runs)] outfile3txt sep=t

rownames=T colnames=NA)

gt

With rownames=T and colnames=NA row names and column names will be added

respectively (blank on the top-left)

Content of outfile3

Hits Home_Runs

Bonds 2935 762

Aaron 3771 755

Ruth 2873 714

Rodriguez 3070 687

Mays 3283 660

By giving quote=F output will be without double quotations

Excercise 9 Using the data frame in the above example calculate ratio of hits and at bat

and write the result to rdquooutfile4txtrdquo

10 Writing a program in a file

So far we have done our works interactively without saving what R codes we have written

However when we repeat the same works with R it is laborious to interactively write the

same codes again and again To solve this issue we can write the set of codes in a file and R

can read the file to execute the codes written in the file

For example we can prepare a text file with the following code Let us name the

17

file vecsumtestR

x lt- c(12345)

y lt- c(246810)

z lt- x + y

Then giving source( vecsumtestR) will let R execute the set of codes in the file

vecsumtestR You can check that based on the codes in vecsumtestR vectors are assigned

to each of x y and z

gt source( vecsumtestR)

gt x

[1] 1 2 3 4 5

gt y

[1] 2 4 6 8 10

gt z

[1] 3 6 9 12 15

gt

Exercise 10 Write the procedure in exercise 8 to a file dframetestR and execute the

procedure using ldquosourcerdquo

11 Defining functions

In mathematics function outputs the value which is determined by the input value In

programming it often represents a defined set of procedures5 For example let us consider

a function f which divides the sum of two given input values In mathematics it can be

written as f(x y) = (x + y) 2 In R it is written as follows by introducing the keyword

ldquofunctionrdquo

f lt- function(x y)

return ((x + y) 2)

In this way a function with two parameters (x and y) is defined And the returned value

(output) will be (x + y) 2 After defining f

5 Probably ldquosubroutinerdquo may be the better terminology

18

f(10 20)

will assign 10 and 20 to parameters x and y respectively and 15 will be returned as (x +y) 2

= (10 + 20) 2 = 15 Here ldquoreturnrdquo statement will return the value given just after it You

can also explicitly give the parameter names x and y as follows

f(x = 10 y = 20)

The general way to define a function is

name_of_function lt- function(parameter 1 parameter 2 hellip)

various procedures possibly using the given parameters6

hellip

return(return_value)

Exercise 11 Implement a mathematical function f(x a b c) = ax2 + bx + c in R Then

calculate f(4 3 2 1)

12 Making graphs

R is equipped with functions that make graphs easily. The "plot" function may be one of the simplest. It creates a plot in two-dimensional space. For example, assigning the following vectors to x and y and calling plot will create a plot with points at the locations (1, 2), (3, 4), (5, 9), (7, 7), and (9, 8):

x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 9, 7, 8)
plot(x, y, xlab = "X Value", ylab = "Y Value")

Labels on the x and y axes can be given with the parameters xlab and ylab, respectively.

[6] Optional parameters ... can be accessed as a list with list(...).

[Figure: A plot created by the "plot" function]

A bar graph can be created using the "barplot" function. In the following example, a bar graph is created by giving the height of each bar in the vector x. The label of each bar is given by the element names of the vector x, i.e., names(x).

x <- c(1, 2, 3, 2, 10, 1)
names(x) <- c("A", "B", "C", "D", "E", "F")
barplot(x)

[Figure: A bar graph created by the "barplot" function]

The "hist" function generates a histogram for a set of numbers given by a vector.


x <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
hist(x, xlab = "Test Value", main = "Test Histogram")

The parameter "main" is used to set the title of the histogram.

[Figure: A histogram generated using the "hist" function]

The "boxplot" function creates a box plot for the numbers given by the vectors.

x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)
x2 <- c(20, 21, 27, 9, 12, 23, 23, 12, 11, 9, 21, 15, 7, 12, 12, 9, 23, 15)
boxplot(x1, x2, names = c("Data 1", "Data 2"))
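When these commands are run from a script rather than interactively, it can be convenient to send the graphs to a file. The sketch below assumes the standard pdf() device and a temporary output path, both chosen for illustration. It also keeps the objects that hist and boxplot return, which hold the bin counts and the values drawn as outliers.

```r
x  <- c(3.2, 1.2, 4.2, 2.3, 3.4, 5.9, 5.2, 5.3, 4.1, 5.2, 3.2, 1.4)
x1 <- c(11, 12, 11, 10, 11, 11, 12, 13, 15, 12, 11, 10, 12, 13)

# Open a PDF device; the path is an assumption for this example.
out_file <- file.path(tempdir(), "graphs.pdf")
pdf(out_file)

# Each call draws one page and also returns a summary object.
h <- hist(x, xlab = "Test Value", main = "Test Histogram")
b <- boxplot(x1, names = "Data 1")

# Close the device so that the file is written.
dev.off()

print(h$counts)  # number of values in each bin
print(b$out)     # values drawn as outliers
```

Inspecting h$counts and b$out is a simple way to check what a graph shows without opening the file.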


[Figure: Example of a box plot for Data 1 and Data 2. The annotations mark the median, the box within which 50% of the data fall, the bottom 25% and the top 25% of the data excluding outliers, and an outlier.]

Exercise 12: Using the table (data frame) given in exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R is equipped with programming constructs that are important and common in other programming languages. Here, some of the most important ones are described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition[7] is satisfied. For example, if you want to assign 1 to y when x > 0, and otherwise assign 0 to y, you can write as follows:

if (x > 0) {
  y <- 1
} else {
  y <- 0
}

[7] When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Check the value of y after doing x <- -5 and executing the if-statement. Then do x <- 3, execute the if-statement again, and check the value of y to make sure that it has changed.

The general form of the if-statement is as follows:

if (condition 1) {
  Set of procedures to execute when condition 1 is satisfied
} else if (condition 2) {
  Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3) {
  Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if ... {
  ...
} else {
  Set of procedures to execute when NONE of the above conditions is satisfied
}
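As a concrete instance of the general form, the sketch below chains two conditions and a final else. The function name sign_label and the labels it returns are hypothetical, introduced only for this example.

```r
# Hypothetical function: describe the sign of a single number.
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"   # reached only when x > 0 is NOT met
  } else {
    label <- "zero"       # reached when none of the conditions is met
  }
  return(label)
}

print(sign_label(-5))
print(sign_label(3))
print(sign_label(0))
```

Only the first block whose condition is satisfied is executed; the remaining conditions are not checked.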

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement iterates a set of given procedures as long as the given condition is satisfied. The general form of the while-statement is as follows:[8]

while (condition) {
  A set of procedures to execute when the condition is satisfied
}

[8] A next-statement in a while-statement forces the program to start the next iteration immediately. A break-statement in a while-statement forces the program to stop the iteration immediately and get out of the while-loop.

For example, executing while (x <= 3) { print(x); x <- x + 1 } after doing x <- 1 will generate the output 1 to 3:

> x <- 1
> while (x <= 3) { print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you write this procedure as a program in a file, you may want to write each step on its own line, as follows, so that the program is easy to read:

x <- 1
while (x <= 3) {
  print(x)
  x <- x + 1
}

Initially, the value of x is 1, and the condition of the while-block, i.e., x ≤ 3, is met. Thus R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed, and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first pass through the while-block, x is 2.

The R interpreter then comes back to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point, and x ≤ 3 still holds, so the procedures in the while-block are executed again: print(x) displays the value of x, which is 2, and x <- x + 1 increases the value of x by 1. At the end of the while-block, x is 3.

The R interpreter then comes back again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point, and x ≤ 3 still holds, so the procedures in the while-block are executed again: print(x) displays the value of x, which is 3, and x <- x + 1 increases the value of x by 1. At the end of the while-block, x is 4.

The R interpreter then comes back once more to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point, and x ≤ 3 is no longer satisfied. Thus the R interpreter steps out of the while-loop. The final value of x is 4.
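The next-statement and break-statement mentioned in the footnote to this section can be sketched as follows. The loop condition is always TRUE, so the loop ends only through break. The variable name shown and the choice of skipping multiples of 3 are assumptions made for illustration.

```r
x <- 0
shown <- NULL            # collects the values that get displayed

while (TRUE) {
  x <- x + 1
  if (x > 10) {
    break                # leave the while-loop completely
  }
  if (x %% 3 == 0) {
    next                 # skip multiples of 3 and start the next iteration
  }
  print(x)
  shown <- c(shown, x)
}
```

After the loop, shown holds the numbers from 1 to 10 that are not multiples of 3, and x is 11, the value that triggered break.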

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9, and 11, each on its own line.

13.3 for-statement

The for-statement also iterates, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is:

for (variable in elements) {
  procedures
}

For example, for (i in c(1, 3, 5)) { print(i) } will assign each of 1, 3, and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second iteration, and finally 5 in the third iteration. The following example makes a vector of the squared values of 1 to 5:[9]

x <- NULL
for (i in 1:5) {
  x <- append(x, i^2)
}

x <- NULL assigns an empty vector to x; NULL represents an empty vector here.

In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to the vector x.[10]

[9] Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.
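The loop and the vectorized form can be compared directly. The sketch below is one possible variant, not the tutorial's own code: it fills a vector whose length is set in advance instead of growing it with append, and then checks that the result matches the vectorized expression. The names n, squares_loop, and squares_vec are assumptions for this example.

```r
n <- 5

# Loop version: fill a vector whose length is set in advance.
squares_loop <- numeric(n)
for (i in 1:n) {
  squares_loop[i] <- i^2
}

# Vectorized version: the operator is applied to every element at once.
squares_vec <- (1:n)^2

print(squares_loop)
print(identical(squares_loop, squares_vec))
```

Both versions produce the same vector; the vectorized form is shorter and is usually preferred in R when it is available.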

Exercise 13-3: Rewrite the procedure in exercise 13-2 using a for-statement.

14 Other useful functions

Here, some frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example, whether it is a numerical variable, a character, a list, or a matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.
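A short sketch of these inspection functions on a named numeric vector follows. The variable name scores and its values are assumptions made for this example.

```r
# A numeric vector with element names.
scores <- c(1, 2, 3)
names(scores) <- c("A", "B", "C")

print(class(scores))        # class of the object
print(mode(scores))         # mode of the object
print(attributes(scores))   # here, the element names

# ls() returns the names of the defined variables as character strings.
print("scores" %in% ls())
```

For a plain named vector the only attribute is names; objects such as matrices and data frames carry more attributes, for example dimensions.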

14.3 Family of apply functions

The "sapply" function applies a given function to each value in a given vector, one value at a time, and returns a vector containing the output values.

func1_sub <- function(elm) {
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}

func1 <- function(x) {  # a function with a vector parameter x
  return(sapply(x, func1_sub))
}

[10] The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
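To see what sapply does with these two functions, the sketch below calls func1 on a small test vector. The vector input and the vectorized comparison at the end are assumptions added for illustration; the comparison is an alternative way to get the same values, not part of the original example.

```r
# The element-wise function and its sapply wrapper, as defined in this section.
func1_sub <- function(elm) {
  if (-1 <= elm & elm <= 1) return(1) else return(0)
}
func1 <- function(x) {
  return(sapply(x, func1_sub))
}

input <- c(-2, -1, 0, 0.5, 3)
result <- func1(input)      # func1_sub is called once per element
print(result)

# Assumed alternative: a vectorized comparison gives the same values.
result_vec <- as.numeric(-1 <= input & input <= 1)
print(identical(result, result_vec))
```

The wrapper is useful because func1_sub uses an if-statement, which expects a single value, while func1 accepts a whole vector.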

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7)  # Draws the normal (Gaussian) distribution
curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, 1000)  # 1000 is the number of points
curve(func1, -3, 3)  # The function defined in 14.3 Family of apply functions
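The points that curve draws can also be kept for later use, since the function returns them invisibly as a list with components x and y. The sketch below draws into a PDF file; the pdf() device, the temporary path, and the variable name pts are assumptions for this example.

```r
# Draw into a file so that the sketch also works in a non-interactive session.
out_file <- file.path(tempdir(), "curve.pdf")
pdf(out_file)

# Evaluate the expression at 1000 points between -2*pi and 2*pi.
pts <- curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, n = 1000)

dev.off()

print(length(pts$x))   # number of points that were drawn
print(head(pts$y))     # first few function values
```

Keeping the returned list is handy when the same values are needed for a later calculation as well as for the figure.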




18

f(10 20)

will assign 10 and 20 to parameters x and y respectively and 15 will be returned as (x +y) 2

= (10 + 20) 2 = 15 Here ldquoreturnrdquo statement will return the value given just after it You

can also explicitly give the parameter names x and y as follows

f(x = 10 y = 20)

The general way to define a function is

name_of_function lt- function(parameter 1 parameter 2 hellip)

various procedures possibly using the given parameters6

hellip

return(return_value)

Exercise 11 Implement a mathematical function f(x a b c) = ax2 + bx + c in R Then

calculate f(4 3 2 1)

12 Making graphs

R is equipped with functions capable of making graphs easily ldquoplotrdquo function may be one

of the simplest ones It creates a plot in the two-dimensional space For example

assigning the following vectors to x and y will create a plot with points on the locations

(12) (34) (59) (77) and (98)

x lt- c(13579)

y lt- c(24978)

plot(x y xlab=X Value ylab=Y Value)

Labels on the x and y axes can be given to the parameters xlab and ylab

respectively

6 Optional parameters hellip can be access with list(hellip) as a list

19

A plot created by ldquoplotrdquo function

A bar graph can be created using ldquobarplotrdquo function In the following example a bar

graph is created by giving the heights of each bar by vector x The labels of each bar

are given by the element names of the vector x ie names(x)

x lt- c(1232101)

names(x) lt- c(A B C D E F)

barplot(x)

A bar graph created by ldquobarplotrdquo function

ldquohistrdquo function generates a histogram for a set of numbers given by a vector

2 4 6 8

23

45

67

89

X Value

YV

alu

e

A B C D E F

02

46

810

A B C D E F

02

46

810

20

x lt- c(32 12 42 23 34 59 52 53 41 52 32 14)

hist(x xlab = Test Value main = Test Histogram)

The parameter ldquomainrdquo is used to state the title of the histogram

A generated histogram using ldquohistrdquo function

ldquoboxplotrdquo function creates a boxplot for the numbers given by the vectors

x1 lt- c(1112111011111213151211101213)

x2 lt- c(20212791223231211921157121292315)

boxplot(x1 x2 names=c(Data 1 Data 2))

Test Histogram

Test Value

Fre

qu

en

cy

1 2 3 4 5 6

01

23

4

21

Example of a box plot (x axis: Data 1, Data 2). Figure annotations: "Median"; "Outlier"; "50% of the data are within this range"; "The bottom 25%, excluding outliers"; "The top 25%, excluding outliers".

Exercise 12: Using the table (data frame) given in Exercise 7-1, make a plot which describes the relationship between the masses of the planets and their numbers of satellites.

13 Basic program structure

R provides control structures that are important and common in other programming languages. Here, some of the most important ones are described briefly.

13.1 if-statement

The "if" statement gives you a way to execute a specific procedure only if a specified condition (see footnote 7) is satisfied. For example, if you want to assign 1 to y when x > 0 and otherwise assign 0 to y, you can write as follows.

if (x > 0){
  y <- 1
} else {
  y <- 0
}

7 When specifying a condition, logical operators such as & (logical AND) and | (logical OR) can be used.

Set x <- -5, run the statement above, and check the value of y. Then set x <- 3, run it again, and check the value of y to make sure that it has changed.
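The check described above can be repeated conveniently by wrapping the if-statement in a function. This is only a sketch; the function name sign_flag is a hypothetical name chosen for illustration.

```r
# Hypothetical helper: wraps the if-statement so it is easy to rerun.
sign_flag <- function(x) {
  if (x > 0) {
    y <- 1
  } else {
    y <- 0
  }
  return(y)
}

print(sign_flag(-5))  # x is not greater than 0, so the else branch runs
print(sign_flag(3))   # x is greater than 0, so the first branch runs
```

Note that "else" is written on the same line as the closing brace of the first block. When an if-statement is typed at the top level, putting "else" at the start of a new line causes a syntax error, because R treats the if-statement as already complete.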

The general form of the if-statement is as follows.

if (condition 1){
  Set of procedures to execute when condition 1 is satisfied
} else if (condition 2){
  Set of procedures to execute when condition 2 is satisfied (but condition 1 is NOT met)
} else if (condition 3){
  Set of procedures to execute when condition 3 is satisfied (but conditions 1 and 2 are NOT met)
} else if (...){
  ...
} else {
  Set of procedures to execute when NONE of the above conditions are satisfied
}
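As a concrete instance of this general form, the following sketch labels a single number. The function name classify, the labels and the threshold of 10 are hypothetical choices made only for illustration.

```r
# Hypothetical example: classify a single number with an if / else if chain.
classify <- function(x) {
  if (x < 0) {
    label <- "negative"
  } else if (x == 0) {
    label <- "zero"
  } else if (x > 0 & x <= 10) {  # & is the logical AND operator
    label <- "small positive"
  } else {
    label <- "large positive"
  }
  return(label)
}

print(classify(-2))
print(classify(0))
print(classify(7))
print(classify(50))
```

The conditions are tested from top to bottom and only the first block whose condition holds is executed, so the order of the conditions matters.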

Exercise 13-1: Define a function which returns 1 when the input parameter is 0, and otherwise returns 0.

13.2 while-statement

The while-statement repeats a given set of procedures as long as the given condition is satisfied. The general form of the while-statement is as follows (see footnote 8).

while (condition){
  A set of procedures to execute when the condition is satisfied
}

8 A next-statement in the while-statement forces the program to start the next iteration immediately. A break-statement in the while-statement forces the program to stop iterating immediately and leave the while-loop.
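The effect of next and break inside a while-statement can be seen in the following sketch. The loop bounds, the rule of skipping multiples of 3 and the variable name shown are assumptions chosen for illustration.

```r
x <- 0
shown <- NULL            # collects the values that get displayed

while (TRUE) {           # the loop is ended by break below
  x <- x + 1
  if (x > 7) {
    break                # leave the while-loop entirely
  }
  if (x %% 3 == 0) {
    next                 # skip multiples of 3 and start the next iteration
  }
  print(x)
  shown <- c(shown, x)
}

print(shown)
```

The values 3 and 6 are skipped by next, and the loop ends through break once x exceeds 7.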

For example, executing while (x <= 3){ print(x); x <- x + 1 } after doing x <- 1 will generate the output of 1 to 3.

> x <- 1
> while (x <= 3){ print(x); x <- x + 1 }
[1] 1
[1] 2
[1] 3
>

If you write this procedure as a program in a file, you may want to write each step on its own line as follows, so that the program is easy to read.

x <- 1
while (x <= 3){
  print(x)
  x <- x + 1
}

Initially, the value of x is 1 and the condition of the while-block, i.e., x ≤ 3, is met. Thus, R goes into the while-block. The first procedure, print(x), which displays the value of x, is executed and 1 is displayed.

In the next procedure in the while-block (x <- x + 1), the value of x is increased by 1. Thus, at the end of the first pass through the while-block, x is 2.

The R interpreter then returns to the condition check at the beginning of the while-block (x <= 3). The variable x is 2 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: print(x) displays the value of x, which is 2, and x <- x + 1 increases the value of x by 1. At the end of the while-block, x is 3.

The R interpreter then returns again to the condition check at the beginning of the while-block (x <= 3). The variable x is 3 at this point and x ≤ 3 still holds, so the procedures in the while-block are executed again: print(x) displays the value of x, which is 3, and x <- x + 1 increases the value of x by 1. At the end of the while-block, x is 4.

The R interpreter returns once more to the condition check at the beginning of the while-block (x <= 3). The variable x is 4 at this point and x ≤ 3 is NOT satisfied any more. Thus, the R interpreter leaves the while-loop. The final value of x is 4.
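The walkthrough above can be reproduced by letting the loop report its own progress. This is a sketch; the messages and the variable name printed are illustrative additions.

```r
x <- 1
printed <- NULL          # records the values displayed inside the loop

while (x <= 3) {
  cat("condition x <= 3 holds, x =", x, "\n")
  printed <- c(printed, x)
  x <- x + 1
  cat("end of this pass, x =", x, "\n")
}

cat("loop finished, final x =", x, "\n")
```

The block runs three times, for x equal to 1, 2 and 3, and x is 4 when the condition fails.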

Exercise 13-2: Using a while-statement, display 1, 3, 5, 7, 9 and 11, one per line.

13.3 for-statement

The for-statement also performs iteration, like the while-statement. The for-statement assigns each of the given elements to the given variable, from the first element to the last. After each assignment, the procedures in the for-block are executed. The general form of the for-statement is

for (variable in elements){
  procedures
}

For example, for (i in c(1, 3, 5)){ print(i) } will assign each of 1, 3 and 5 to the variable i, and after each assignment the procedures in the for-block are executed. Thus, in this case, 1 is displayed in the first iteration, 3 in the second, and finally 5 in the third. The following example makes a vector of the squared values of 1 to 5 (see footnote 9).

x <- NULL
for (i in 1:5){
  x <- append(x, i^2)
}

9 Actually, using a for-statement is unnecessary here. Just do x <- (1:5)^2.

x <- NULL assigns an empty vector to x. NULL represents an empty vector.

In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the procedure in the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to vector x (see footnote 10).
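The three ways of building the vector of squares mentioned in this section (append in a loop, c in a loop, and the vectorized form) can be compared side by side. The variable names x_append, x_c and x_vec are hypothetical names used only for this sketch.

```r
# 1. for-statement with append()
x_append <- NULL
for (i in 1:5) {
  x_append <- append(x_append, i^2)
}

# 2. for-statement with c()
x_c <- NULL
for (i in 1:5) {
  x_c <- c(x_c, i^2)
}

# 3. Vectorized form, no loop needed
x_vec <- (1:5)^2

print(x_append)
print(x_c)
print(x_vec)
```

All three give the values 1, 4, 9, 16 and 25. The vectorized form is the shortest and is usually the one to prefer in R.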

Exercise 13-3: Rewrite the procedure in Exercise 13-2 using a for-statement.

14 Other useful functions

Here, some frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) will display how to use function_name.

14.2 Variables and attributes

ls() or objects() will display the currently defined variables.

class(variable_name) or mode(variable_name) will give you the type of the variable (object), for example whether it is a numeric variable, character, list or matrix.

attributes(variable_name) will return the attributes defined for the given variable variable_name.
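A short session with these functions might look like the following sketch. The variable names v, s and l and their values are assumptions chosen for illustration.

```r
v <- c(a = 1.5, b = 2.5)   # numeric vector with element names
s <- "hello"               # character string
l <- list(1, "two")        # list with mixed contents

print(ls())                # names of the currently defined variables

print(class(v))            # type of v
print(mode(v))
print(class(s))
print(class(l))

print(attributes(v))       # v has a "names" attribute
```

For a plain numeric vector, class and mode both report "numeric". The two functions can differ for other objects, so it is worth checking both when in doubt.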

14.3 Family of apply functions

The "sapply" function passes each value in the given vector, one by one, as a single input to the given function, and returns a vector containing the output values.

func1_sub <- function(elm){
  # a function expecting a single number as a parameter
  if (-1 <= elm & elm <= 1){ return (1) } else { return (0) }
}

func1 <- function(x){ # a function with a vector parameter x
  return(sapply(x, func1_sub))
}

10 The same procedure as append(x, i^2) can be done with x <- c(x, i^2). c can be used to concatenate given vectors.
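The two functions can be tried on a small vector as sketched below. The definitions are restated with braces so that the block runs on its own. The test vector v and the ifelse alternative are assumptions added for illustration and are not part of the tutorial.

```r
# Returns 1 when a single number lies between -1 and 1, otherwise 0.
func1_sub <- function(elm) {
  if (-1 <= elm & elm <= 1) {
    return(1)
  } else {
    return(0)
  }
}

# Applies func1_sub to every element of the vector x.
func1 <- function(x) {
  return(sapply(x, func1_sub))
}

v <- c(-2, -1, 0, 0.5, 3)
print(func1(v))

# Assumption: a vectorized alternative using ifelse() from base R.
alt <- ifelse(-1 <= v & v <= 1, 1, 0)
print(alt)
```

Both calls give 0, 1, 1, 1 and 0 for this input. sapply is needed with func1_sub because an if-statement expects a single condition value rather than a whole vector.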

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7) # Draws normal (Gaussian) distribution
curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000) # 1000 is number of points
curve(func1, -3, 3) # The function defined in 14.3 Family of apply functions
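The points that curve draws are also returned invisibly as a list with components x and y, which gives a way to check what the number of points means. The sketch below assumes a null PDF device so that it runs without opening a window. The variable name pts is hypothetical.

```r
# Assumption: a null PDF device, so nothing is shown on screen.
# Remove the pdf() and dev.off() lines to see the curve.
pdf(file = NULL)
pts <- curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, n = 1000)
invisible(dev.off())

print(length(pts$x))   # number of evaluation points
print(range(pts$x))    # runs from -2*pi to 2*pi
```

Increasing n gives a smoother curve, because the function is evaluated at more x values between the two end points.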




22

else

y lt- 0

Investigate the value of y after doing x lt- -5 Then after doing x lt- 3 investigate the value

of y again to make sure that the value has changed

The general form of if-statement is as follows

if (condition 1)

Set of procedures to execute when the condition 1 is satisfied

else if (condition 2)

Set of procedures to execute when the condition the condition 2 is satisfied (but the

condition 1 is NOT met)

else if(condition 3)

Se of procedures to execute when condition 3 is satisfied (but condition 1 to 2 are NOT

met)

else if hellip

else

Set of procedures to execute when NONE of the above conditions are satisfied

Exercise 13-1 Define a function which returns 1 when the input parameter is 0 and

otherwise returns 0

132 while-statement

while-statement iterates a set of given procedures as long as the given condition is satisfied

The general form of while-statement is as follows8

while(condition)

8 next-statement in the while-statement forces program to start the next iteration

immediately break-statement in while-statement forces the program to immediately stop

the iteration and get out of the while-loop

23

A set of procedures to execute when the condition is satisfied

For example executing while(x lt= 3) print(x) x lt- x + 1 after doing x lt- 1 will generate

the output of 1 to 3

gt x lt- 1

gt while (x lt= 3) print(x) x lt- x + 1

[1] 1

[1] 2

[1] 3

gt

If you are to write this procedure as a program in a file you may want to write each step

line by line as follows so that the program will be easy to read

x lt- 1

while (x lt= 3)

print(x)

x lt- x + 1

Initially the value of x is 1 and the condition of the while-block ie x le 3 is met Thus R

goes into the while-block The first procedure print(x) which displays the value of x is

executed and 1 will be displayed

In the next procedure in the while-block (x lt- x + 1) the value of x is increased by 1 Thus

at the end of the first loop of while-block x is 2

R interpreter then comes back to the condition check at the beginning of the while-block

(x lt= 3) The variable x is 2 at this point and still x le 3 holds so the procedures in the

while-block will be executed again ie displaying the value of x 2 by print(x) and

increasing the value of x by 1 by x lt- x + 1 At the end of the while-block x is 3

24

R interpreter will then come back again to the condition check at the beginning of the

while-block (x lt= 3) The variable x is 3 at this point and still x le 3 holds so the

procedures in the while-block will be executed again ie displaying the value of x which

is 3 by print(x) and increasing the value of x by 1 by x lt- x + 1 At the end of the

while-block x is 4

R interpreter will then come back again to the condition check at the beginning of the

while-block (x lt= 3) The variable x is 4 at this point and x le 3 is NOT satisfied any more

Thus R interpreter steps out of the while-loop Final value of x is 4

Exercise 13-2 Using while-statement display 13579 and 11 respectively in each line

133 for-statement

for-statement also does iteration like while-statement The for-statement assigns each of

given elements to the given variable starting from the first element to the last element in

the given elements After each assignment the procedures in the for-block are executed

The general form of for-statement is

for(variables in elements)

procedures

For example for (i in c(135)) print(i) will assign each of 1 3 5 into the variable i and

after each assignment the procedures in the file-block are executed Thus in this case

number 1 is displayed in the first iteration after the assignment 3 is displayed in the

second iteration after the assignment and finally 5 is displayed in the third iteration after

the assignment The following example makes a vector of set of squared values of 1 to 59

x lt- NULL

for (i in 15)

x lt- append(x i2)

9 Actually using for-statement is unnecessary here Just do x lt- (15)2

x <- NULL assigns an empty vector to x; NULL represents an empty vector.

In the for-statement, each value from 1 to 5 is assigned to i, and after each assignment the procedure in the for-block is executed.

In the for-block, the squared value of i (written as i^2) is concatenated to the vector x. The function append(x, i) concatenates i to the vector x (see footnote 10).
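As a rough check of the explanation above, the sketch below builds the vector of squares with the for-statement and compares it with the two alternatives mentioned in footnotes 9 and 10. The variable names y and z are introduced here only for illustration.

```r
# Build the squares of 1 to 5 with append(), as in the example
x <- NULL
for (i in 1:5) {
  x <- append(x, i^2)
}
print(x)  # [1]  1  4  9 16 25

# Footnote 10: c() can be used instead of append()
y <- NULL
for (i in 1:5) {
  y <- c(y, i^2)
}

# Footnote 9: no loop is needed, because ^ works on whole vectors
z <- (1:5)^2

# All three approaches give the same vector
print(identical(x, y))  # TRUE
print(identical(x, z))  # TRUE
```

The loop-free form is usually preferred in R when the operation applies to every element in the same way.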

Exercise 13-3: Rewrite the procedure in 12-2 using a for-statement.

14 Other useful functions

Here, some of the frequently used R functions are introduced briefly.

14.1 How to use

help(function_name) displays how to use function_name.
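A short sketch of looking up documentation follows. sapply is used only as a sample topic; the ? shorthand and args() are standard R features that the text above does not mention, so treat them as optional extras.

```r
# Show the help page for sapply
help(sapply)

# Shorthand for the same lookup
?sapply

# Print only the argument list of the function
args(sapply)
```

How the help page is shown (in the terminal, a separate window, or a browser) depends on how R is being run.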

14.2 Variables and attributes

ls() or objects() displays the currently defined variables.

class(variable_name) or mode(variable_name) gives the type of the variable (object), for example whether it is a numeric variable, character, list, or matrix.

attributes(variable_name) returns the attributes defined for the given variable variable_name.
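The functions above can be tried on a few small objects. In this sketch the variable names v, m, and s are made up for illustration, and the exact result of class(m) is an assumption that depends on the R version, as noted in the comment.

```r
v <- c(1.5, 2.5)             # a numeric vector
m <- matrix(1:6, nrow = 2)   # a 2 x 3 matrix
s <- "hello"                 # a character string

print(ls())           # lists the defined variables, including "m", "s", and "v"

print(class(v))       # "numeric"
print(class(s))       # "character"
print(class(m))       # includes "matrix" (R 4.0 and later also report "array")
print(mode(m))        # "numeric": the type of the values stored in the matrix

print(attributes(m))  # a matrix carries a dim attribute (2 rows, 3 columns)
print(attributes(v))  # NULL: a plain vector has no attributes
```

class() describes what kind of object a variable is, while mode() describes how its values are stored, which is why the two differ for the matrix.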

14.3 Family of apply functions

The "sapply" function returns a vector containing the output values of a given function, obtained by passing each value in the given vector, one by one, as a single input to that function.

func1_sub <- function(elm) {
    # a function expecting a single number as a parameter
    if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

10 The same procedure can be done with x <- c(x, i^2). c can be used to concatenate given vectors.

func1 <- function(x) { # a function with a vector parameter x
    return(sapply(x, func1_sub))
}
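To see what func1 returns, it can be applied to a small test vector. The sketch below repeats the two definitions so that it runs on its own; the input vector v is a made-up example.

```r
# Returns 1 if a single number lies in [-1, 1], otherwise 0
func1_sub <- function(elm) {
  if (-1 <= elm & elm <= 1) { return(1) } else { return(0) }
}

# Applies func1_sub to every element of the vector x
func1 <- function(x) {
  return(sapply(x, func1_sub))
}

v <- c(-2, -1, 0, 0.5, 1, 3)
print(func1(v))  # [1] 0 1 1 1 1 0
```

Calling func1_sub(v) directly would not work as intended, because if expects a single TRUE or FALSE value; sapply gets around this by calling func1_sub once per element.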

14.4 Making figures

The "curve" function generates a graph for a given function. See help for details.

Example:

curve(dnorm, -7, +7) # Draws normal (Gaussian) distribution

curve(cos(x)+cos(2*x), -2*pi, 2*pi, 1000) # 1000 is number of points

curve(func1, -3, 3) # The function defined in 14.3 Family of apply functions
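Besides drawing the graph, curve also returns the points it plotted, which gives a simple way to confirm what the fourth argument does. In the sketch below, pdf(NULL) is an assumption made only so that the code runs without opening a window or writing a file; in an interactive session those two device lines can be left out to see the graph. The name pts is illustrative.

```r
pdf(NULL)  # discard the drawing; remove this line to see the plot

# Same curve as in the example, with the number of points named explicitly
pts <- curve(cos(x) + cos(2 * x), -2 * pi, 2 * pi, n = 1000)

print(length(pts$x))  # 1000: one x value per evaluated point
print(range(pts$x))   # from -2*pi to 2*pi

dev.off()  # close the graphics device opened by pdf(NULL)
```

A larger n gives a smoother curve at the cost of more function evaluations.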
