R objects

Upload: loraine-stafford

Post on 18-Jan-2018


Page 1: R objects

• All R entities exist as objects
• They can all be operated on as data
• We will cover:
    Vectors
    Factors
    Lists
    Data frames
    Tables
    Indexing
    R packages and datasets

Page 2: Vectors

Think of vectors as being equivalent to a single column of numbers in a spreadsheet.

You can create a vector using the c( ) function (concatenate), listing the values between the brackets, as follows:

x <- c( )

For example:

x <- c(1,2,4,8) creates a column of the numbers 1, 2, 4, 8
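As a minimal sketch of the c( ) function in use (the name y is only an illustration, not from the slides):

```r
# Create the vector from the slide and inspect it
x <- c(1, 2, 4, 8)
print(x)        # displays the four values
length(x)       # number of entries in x

# c() also accepts existing vectors, so it can extend one
y <- c(x, 16, 32)
print(y)
```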

Page 3: Vectors

Other ways of creating columns of numbers (vectors):

• The seq function
    seq(1,10,1) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    seq(1,4,0.5) = 1, 1.5, 2, 2.5, 3, 3.5, 4

• x:y
    1:10 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    2 * 1:10 = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20

• The rep function
    rep(2,4) = 2, 2, 2, 2

For help on these functions:

?seq()

?rep()
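The same calls can be tried at the prompt. This is a small sketch; the variable names are illustrative assumptions.

```r
# seq(from, to, by): a regular sequence
steps_of_one  <- seq(1, 10, 1)     # 1, 2, ..., 10
steps_of_half <- seq(1, 4, 0.5)    # 1, 1.5, ..., 4

# The colon operator is shorthand for steps of one
one_to_ten <- 1:10

# ':' is evaluated before '*', so this doubles every entry of 1:10
evens <- 2 * 1:10

# rep(value, times): repeat a value
twos <- rep(2, 4)                  # 2, 2, 2, 2

print(steps_of_half)
print(evens)
print(twos)
```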

Page 4: Indexing

Referencing (indexing) specific 'cells' in a column:

Example: if x is the vector 1, 2, 5 then

x[1] = 1, x[2] = 2, x[3] = 5

and

x[1:2] = 1, 2    first two listed items in x
x[2:3] = 2, 5    2nd & 3rd listed items in x
x[x>2] = 5       items greater than 2 (conditions use the '>' and '<' characters)
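A short sketch of the indexing forms above. The names mask, big and small are illustrative only.

```r
x <- c(1, 2, 5)

first     <- x[1]       # 1
first_two <- x[1:2]     # 1 2
last_two  <- x[2:3]     # 2 5

# A comparison gives a TRUE/FALSE vector, one entry per element of x
mask <- x > 2           # FALSE FALSE TRUE

# Indexing with that logical vector keeps only the TRUE positions
big   <- x[x > 2]       # 5
small <- x[x < 5]       # 1 2
```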

Page 5: Performing simple operations on vectors

In R, when you carry out simple operations (+ - * /) on vectors that have the same number of entries, R just performs the normal operations on the numbers in the vectors, entry by entry.

If the vectors don't have the same number of entries, then R will cycle through the vector with the smaller number of entries, reusing its values from the start until every entry of the longer vector has been matched.
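The two cases might be sketched as follows; the vectors a, b and short are made-up values, not taken from the slides.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

# Same length: the operation is applied entry by entry
same_length <- a + b       # 11 22 33 44

# Different lengths: the shorter vector is cycled through,
# so 'short' is used as 10, 20, 10, 20
short <- c(10, 20)
recycled <- a + short      # 11 22 13 24
product  <- a * short      # 10 40 30 80
```

When the longer length is not a whole multiple of the shorter one, R still cycles but also prints a warning.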

Page 6: Performing simple operations on vectors

Example:

Page 7: Performing simple operations on vectors

Examples:

Page 8: Performing simple operations on vectors

Example:

Page 9: Performing simple operations on vectors

Vectors (columns of numbers) can be assigned by putting together other vectors, for example:
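One way this might look, as a sketch with made-up values (the names a, b and combined are illustrative, not from the slides):

```r
a <- c(1, 2, 3)
b <- c(10, 20)

# c() joins its arguments end to end into one longer vector
combined <- c(a, b)          # 1 2 3 10 20

# Single values and vectors can be mixed freely
padded <- c(0, a, b, 100)    # 0 1 2 3 10 20 100
```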

Page 10: Functions

R functions take arguments (information that you put into the function, which goes between the brackets) and can perform a range of tasks.

In the case of the 'help' function, the task is to display information from the R documentation files.

A comprehensive list of R functions can be obtained from the R reference manual under the help menu.
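A brief sketch of passing arguments, either by position or by name. It uses seq and sqrt, which appear elsewhere in these slides; the variable names are assumptions.

```r
# One argument, placed between the brackets
root <- sqrt(16)

# Several arguments given by position...
by_position <- seq(1, 4, 0.5)

# ...or by name, which gives the same result and reads more clearly
by_name <- seq(from = 1, to = 4, by = 0.5)

# The help function takes the function of interest as its argument.
# Uncomment to open the documentation page in an interactive session:
# help(seq)
```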

Page 11: Simple statistic functions

R comes with some useful functions:

sqrt( )    square root
mean( )    arithmetic mean
hist( )    calculating & plotting histograms

R also comes with pre-loaded datasets, which we'll discuss later...
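A small sketch of these three functions on a vector of values. The name values is illustrative, and plot = FALSE is used here only so that the histogram is calculated without opening a graphics window.

```r
values <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)

roots   <- sqrt(values)    # square root of every entry
average <- mean(values)    # arithmetic mean

# hist() calculates the bins and, by default, draws the plot;
# with plot = FALSE it only returns the calculated bins
h <- hist(values, plot = FALSE)
print(h$breaks)            # bin boundaries chosen by R
print(h$counts)            # number of values in each bin
```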

Page 12: Basic statistic functions on vectors

> X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)
> sum(X1)         sum = 26.9
> mean(X1)        mean = 3.842857
> median(X1)      median = 4
> var(X1)         variance = 8.762857
> sd(X1)          standard deviation = 2.960212
> summary(X1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.550   4.000   3.843   4.650   9.500
> quantile(X1)
   0%   25%   50%   75%  100%
 1.00  1.55  4.00  4.65  9.50

Page 13: Mixing vectors and scalars

R has the very convenient feature of having operators that work with vectors. It is even possible to mix vectors and scalars. For example:

> X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)

> X1 + 1

[1] 2.1 5.3 6.0 3.0 2.0 5.0 10.5

> X1 * 2

[1] 2.2 8.6 10.0 4.0 2.0 8.0 19.0

Page 14: Vectors to record data

> x = c(45,43,46,48,51,46,50,47,46,45)
> length(x)
[1] 10
> x = c(x,48,49,51,50,49) # append values to x
> length(x)
[1] 15
> x[16] = 41 # add to a specified index
> length(x)
[1] 16
> mean(x)
[1] 47.1875
> x[17:20] = c(40,38,35,40) # add to many specified indices
> length(x)
[1] 20
> mean(x)
[1] 45.4

Page 15: Factors

A factor is a vector that encodes information about the group to which a particular observation belongs.

Categorical data is often used to classify data into various levels or factors.

Making a factor is easy using the factor function.

Page 16: Factors – smoking survey example

A survey asks people if they smoke or not. The data is:

Yes, No, No, Yes, Yes

> x=c("Yes","No","No","Yes","Yes")

> x # print out values in x

[1] "Yes" "No" "No" "Yes" "Yes"

> factor(x) # print out value in factor(x)

[1] Yes No No Yes Yes

Levels: No Yes # notice levels are printed.

Notice the difference in how R treats factors with this example

Page 17: Factors – student height example

Suppose the recorded heights of South African and British students are as follows:

heights <- c(1.7,1.95,1.63,1.54,1.29)

You make a new vector, fac_heights, to record the nationality that each observation pertains to:

fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))

This is useful when testing for differences between groups.
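As a sketch of how the factor can be used to compare groups, the tapply function applies a function to the heights within each level. The choice of tapply and the name group_means are assumptions for illustration, not part of the slides.

```r
heights <- c(1.7, 1.95, 1.63, 1.54, 1.29)
fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))

# Mean height within each level of the factor
group_means <- tapply(heights, fac_heights, mean)
print(group_means)

# Number of observations in each group
group_sizes <- table(fac_heights)
print(group_sizes)
```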

Page 18: Factors – gender survey example

Consider a survey that has data on 691 females and 692 males

> gender <- c(rep("female",691), rep("male",692)) # create vector

> gender <- factor(gender) # change vector to factor

• Once stored as a factor, the space required for storage is reduced

• Values “female” and “male” are the levels of the factor

> levels(gender) # assumes gender is a factor

[1] "female" "male"

• Internally, the factor 'gender' is stored as 691 1's, followed by 692 2's. It has stored with it a table that looks like this:

    1  female
    2  male
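The internal coding can be inspected directly. This is a minimal sketch; the names codes and code_counts are illustrative.

```r
gender <- factor(c(rep("female", 691), rep("male", 692)))

# The integer codes stored inside the factor
codes <- as.integer(gender)
code_counts <- table(codes)     # how many 1's and how many 2's
print(code_counts)

# The levels act as the lookup table from code to label
print(levels(gender))
first_and_692nd <- levels(gender)[codes[c(1, 692)]]
print(first_and_692nd)
```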

Page 19: Lists

A set of objects (e.g. vectors) can be combined under a single name as a list (similar to a spreadsheet in Excel).

Example:

x <- c(1, 7, 8, 9, 10)

y <- c("red", "yellow", "blue", "green")

example_list <- list(size = x, colour = y)

Note: vectors can consist of characters (i.e. letters/words) instead of numbers, but never numbers AND characters; if you mix them, R converts the numbers to character strings.
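A short sketch of getting the pieces back out of the list, and of what happens when numbers and characters are mixed in one vector. The names sizes, colours, part_lengths and mixed are illustrative.

```r
x <- c(1, 7, 8, 9, 10)
y <- c("red", "yellow", "blue", "green")
example_list <- list(size = x, colour = y)

# Elements of a list are retrieved by name
sizes   <- example_list$size
colours <- example_list[["colour"]]

# Unlike the columns of a data frame, list elements may differ in length
part_lengths <- sapply(example_list, length)
print(part_lengths)

# Mixing numbers and characters in c() gives a character vector
mixed <- c(1, "red")
print(mixed)
```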

Page 20: Data frames

The function data.frame( ) creates a data frame:

• This is a special kind of list, in which the entries in a specific position in the elements of the list correspond to one another
• Each element of the list has the same length
• It is a rectangular table, with rows and columns

Page 21: Data frames

Example 1: Simple data frames can be created. Enter the following information at the prompt line:

h <- c(150, 170, 168, 179, 130)

w <- c(65, 70, 72, 80, 51)

patient_data <- data.frame(weight=w, height=h)

Type in patient_data to see what's just been created...

Page 22: Access of elements in data frames

Individual elements can be accessed using a pair of square brackets "[ ]" and by specifying their index, or name.

Here are some ways to access a cell, row or column:

patient_data$height        accesses a column by name
patient_data[ , i]         accesses the ith column
patient_data[i, ]          accesses the ith row
patient_data$height[i]     accesses the ith cell in the height column
patient_data[i, j]         accesses the cell in the ith row and jth column
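These access forms can be tried on the patient_data example. The sketch below rebuilds it so that it runs on its own; the names on the left of each assignment are illustrative.

```r
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight = w, height = h)

height_col <- patient_data$height      # the height column, by name
first_col  <- patient_data[, 1]        # 1st column (weight)
second_row <- patient_data[2, ]        # 2nd row, returned as a one-row data frame
fourth_h   <- patient_data$height[4]   # 4th cell of the height column
one_cell   <- patient_data[3, 2]       # row 3, column 2 (height of the 3rd patient)

print(second_row)
print(one_cell)
```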

Page 23: Data frames

More complex tables can be created.

Data within each column must have the same type (e.g., number, text), but different columns may have different types – like a spreadsheet, as in the example:
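A sketch of a data frame whose columns have different types. All names and values here (patients, name, smoker and so on) are hypothetical and are not the slide's own example.

```r
# Each column holds one type; different columns hold different types
patients <- data.frame(
  name   = c("A", "B", "C"),         # text
  height = c(150, 170, 168),         # numbers
  smoker = c(TRUE, FALSE, FALSE),    # logical values
  stringsAsFactors = FALSE           # keep text as character, not factor
)

# Report the type of each column
column_types <- sapply(patients, class)
print(column_types)
```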

Page 24: Data frames

Accessing specific cells, or data:

Note: "$" is a shortcut for selecting a column by name; a minus "-" sign in front of an index means not, i.e. everything except that row or column.
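Both notations might be used as follows. This is a sketch that reuses the patient_data values from the earlier slide; the result names are illustrative.

```r
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight = w, height = h)

# "$" selects a column by name; [[ ]] is the longer form of the same thing
heights <- patient_data$height
same    <- patient_data[["height"]]

# A minus sign drops the given row or column
without_first_row <- patient_data[-1, ]
without_weight    <- patient_data[, -1, drop = FALSE]  # drop = FALSE keeps a data frame

# Rows can also be selected with a condition on a column
tall <- patient_data[patient_data$height > 160, ]
print(tall)
```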

Page 25: Tables

We often view categorical data with tables.

The table function allows us to look at tables. Its simplest usage is table(x), where x is a categorical variable.

Page 26: Tables

Example: smoking survey

A survey asks people if they smoke or not. The data is:

Yes, No, No, Yes, Yes

> x=c("Yes","No","No","Yes","Yes")

> table(x)

x

No Yes

2 3

The table command simply adds up the frequency of each unique value of the data

Page 27: R packages and datasets

View a list of R packages: library()

Access datasets with the data function:

data( )               provides a list of all the datasets
data(Titanic)         loads the Titanic dataset
summary(Titanic)      provides summary information about the Titanic dataset
attributes(Titanic)   provides more information
Titanic               typing the dataset name will display the data

List all datasets in a package, e.g., data(package='datasets')
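A sketch of these commands applied to the Titanic dataset, which ships with R in the datasets package. The names total_people and available are illustrative.

```r
# Load the dataset and look at its structure
data(Titanic)
print(class(Titanic))        # Titanic is stored as a table, not a data frame
print(dim(Titanic))          # one dimension per classifying variable
print(dimnames(Titanic))     # the categories along each dimension

# The cells are counts, so their sum is the number of people recorded
total_people <- sum(Titanic)
print(total_people)

# Names of the datasets that a package provides
available <- data(package = "datasets")$results[, "Item"]
print(head(available))
```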

Page 28: Working through some examples

List preloaded datasets in R: data( )

Display the "women" dataset: women

Now let's access specific data...

Access data from each column:

women$height or women[ ,1]
women$weight or women[ ,2]

Access data from individual rows:

women[1, ] or women[10, ] etc.

Try it...

Page 29: Working through some examples

Now that you can access sample data, let's work with it:

Get the mean weight and height of the women in our example...

Remember the help function: help(mean)

Also, R can show an example: example(mean)
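One way to work through this exercise with the built-in women dataset is sketched below. colMeans is an additional base R function not mentioned on the slides, and the variable names are assumptions.

```r
# Mean of each column, one at a time
mean_height <- mean(women$height)
mean_weight <- mean(women$weight)
print(mean_height)
print(mean_weight)

# Or the means of all columns in a single call
both_means <- colMeans(women)
print(both_means)
```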

Page 30: Common useful functions

print() # prints a single R object

cat() # prints multiple objects, one after the other

length() # number of elements in a vector, or of a list

mean()

median()

range()

unique() # gives the vector of distinct values

sort() # sort elements into order

order() # x[order(x)] orders elements of x

rev() # reverse the order of vector elements
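A few of these functions applied to one small vector, as a sketch with made-up values:

```r
x <- c(3, 1, 2, 3)

n        <- length(x)     # 4
distinct <- unique(x)     # 3 1 2
ascend   <- sort(x)       # 1 2 3 3
position <- order(x)      # 2 3 1 4: positions of x in increasing order
reversed <- rev(x)        # 3 2 1 3
extent   <- range(x)      # 1 3: minimum and maximum

# x[order(x)] gives the same result as sort(x)
print(x[position])

# cat() prints several objects one after the other
cat("mean:", mean(x), "median:", median(x), "\n")
```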