r programming taster session

Taster/Skills Set Session R-Programming

Upload: ilmo-van-der-loewe

Post on 25-May-2015

Category:

Education


DESCRIPTION

Slides for a research methods class on using R.

TRANSCRIPT

Page 1: R Programming Taster Session

Taster/Skills Set Session

R-Programming

Page 2: R Programming Taster Session
Page 3: R Programming Taster Session
Page 4: R Programming Taster Session

R is a lot like Magic

Instead of spells, you have functions.

Page 5: R Programming Taster Session

Muggles

Incapable of magic and hardly aware of it.

Page 6: R Programming Taster Session
Page 10: R Programming Taster Session

• Limited ability to change the environment.

• Must rely on algorithms developed for them.

• Problem-solving constrained by SPSS developers.

• Must pay for using the constrained algorithms.

Page 11: R Programming Taster Session

Most people are muggles.

And that’s okay.

Page 12: R Programming Taster Session
Page 13: R Programming Taster Session

Wizards

Page 14: R Programming Taster Session
Page 18: R Programming Taster Session

• Can use functions made by top statistics researchers or create their own.

• Almost unlimited in their ability to change their environment.

• Can do things SPSS users cannot even dream of.

• Get their powers for free.

Page 19: R Programming Taster Session

Warning! Here’s the small print.

Page 25: R Programming Taster Session

Wizards also...

• Love to stretch their brains

• Have strong sitting muscles

• Put in the effort to learn

• Persist with puzzles

• Feel at home with the esoteric and obscure

Page 26: R Programming Taster Session

Do you still want to be a wizard?

Page 27: R Programming Taster Session
Page 32: R Programming Taster Session

Syllabus

History of Magic: Origins of R

Arithmancy: Learning the system

Transfiguration: Working with data

Divination: Models and predictions

Page 33: R Programming Taster Session
Page 34: R Programming Taster Session

History of Magic

Page 36: R Programming Taster Session

What is R?

R is a computer language used for data manipulation, statistics, and graphics.

Page 37: R Programming Taster Session

Learning any new language is tough.

Grammar, vocabulary, idioms, orthography, a new world view...

Page 38: R Programming Taster Session

The payoff is a whole new world of possibility.

Page 39: R Programming Taster Session

Advantages

• Open source

• State of the art

• Publication-quality graphics

• Reproducible research

• Computer intensive analyses

• Makes you think

• Easy interface with databases

Disadvantages

• Not user friendly at start

• Minimal GUI

• Easy to lose “sense” of data

Page 40: R Programming Taster Session
Page 44: R Programming Taster Session

1976 – Bell Labs develops S, a language for data analysis; released commercially as S-plus.

1990s – R written and released as open source by (R)oss Ihaka and (R)obert Gentleman.

1997 – The Comprehensive R Archive Network (CRAN) launched.

Today – 2781 user-contributed packages for R.

Page 45: R Programming Taster Session

Accio R.

To download R, go to

http://cran.r-project.org/bin/

Windows Mac Linux

Page 46: R Programming Taster Session

Software (shown as logos on the slide): Pros and Cons

• Pros: Easy(ish), common in psychology. Cons: Limited analytic capability.

• Pros: Easy, common in business. Cons: Very limited analytic capability.

• Pros: Elegant matrix support. Cons: Expensive, lacks in statistics support.

• Pros: Extensibility, visualization, programmability. Cons: Learning curve.

Page 51: R Programming Taster Session

data analysis contests

Page 52: R Programming Taster Session

Why R?

• EVERYTHING in one framework

‣ base: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering etc.

‣ packages from Medical Image Analysis to Pharmacokinetics

• CUSTOM functionality

‣ Programming ➞ Automation

Page 53: R Programming Taster Session

Practical Benefits

• Multiple datasets open at once

• Automate away “click-click-click” tasks

• Reproducibility

Page 55: R Programming Taster Session

Why not ?

deducer

Page 56: R Programming Taster Session

Taster session

Page 57: R Programming Taster Session

Learning

• Self-study

‣ Past programming experience recommended

‣ Lots of expert advice available

• Oxford

‣ e.g., Ruth Ripley, Department of Statistics

‣ We’ll scratch the surface today

Page 58: R Programming Taster Session
Page 59: R Programming Taster Session

Arithmancy

Working with R

Page 60: R Programming Taster Session

New Mindset

• R: Multi-dimensional data. SPSS: Rectangular data (“spreadsheet”).

• R: Functions can be modified. SPSS: Proprietary functions.

• R: Interactive experience. SPSS: Passive experience.

• R: Extensible. SPSS: Cross/up-selling.

• R: Open and free. SPSS: Commercial.

Page 61: R Programming Taster Session

Getting started with R

Page 62: R Programming Taster Session

(Not very consoling) R console

Page 63: R Programming Taster Session
Page 64: R Programming Taster Session

Write a script here and run it

Page 65: R Programming Taster Session

Output appears here. Did you get what you wanted?

Page 66: R Programming Taster Session

Revise the script and run it again

Page 67: R Programming Taster Session

Saved scripts can be rerun later

Page 68: R Programming Taster Session

Interactive data analysis session

write script

run script

Page 69: R Programming Taster Session

Interactive data analysis session

TextMate

Page 70: R Programming Taster Session

http://rstudio.org/

Page 71: R Programming Taster Session

Grammar of Spells

object = function(arguments)

Assignment operator

Page 72: R Programming Taster Session

Guess what this does!

z = read.table("MyFile.txt")

Page 73: R Programming Taster Session

Two ways about it

=

does the same job as

<-

when assigning at the top level of a script.
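
The spell grammar above can be tried out with a minimal sketch in base R. The object names `total_a`, `total_b` and `m` are made up for illustration and do not come from the slides.

```r
# object <- function(arguments): call a function and keep its result
total_a <- sum(1, 2, 3)   # assignment with <-
total_b = sum(1, 2, 3)    # assignment with =, same effect at the top level

identical(total_a, total_b)

# Inside a function call, = names an argument rather than creating an object
m <- mean(x = c(1, 2, 3))
m
```

Many R style guides prefer `<-` for assignment and keep `=` for naming arguments, which avoids confusing the two roles.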

Page 74: R Programming Taster Session

Data Frames

z

You can also use, e.g., the read.csv() and read.spss() functions (read.spss() comes from the foreign package).

Page 75: R Programming Taster Session
Page 76: R Programming Taster Session

Accessing data

z[1,]

Read 1st row, all columns.

Page 77: R Programming Taster Session

Accessing data

z[1,3]

Read cell at 1st row, 3rd column.

Page 78: R Programming Taster Session

Accessing data

z[,3]

Read 3rd column.

Page 79: R Programming Taster Session

Accessing data

z[,3:6]

Read columns from 3rd to 6th.

Page 80: R Programming Taster Session

Accessing data

z$avbity

Read 3rd column by name.

Page 81: R Programming Taster Session

Accessing data

z["avbity"]

Read 3rd column by name.

Page 82: R Programming Taster Session

How about?

Page 83: R Programming Taster Session

How about?

z[1:6,1:3]
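
The indexing forms above can be tried on a small stand-in data frame. This is only a sketch: the real `z` comes from a file that is not part of the slides, so every column except `avbity` is an invented placeholder.

```r
# Toy stand-in for z; all column names except "avbity" are invented
z <- data.frame(
  id     = 1:8,
  item   = letters[1:8],
  avbity = c(3, 1, 4, 1, 5, 9, 2, 6),
  cost   = c(1.5, 2.5, 0.5, 3.0, 1.0, 2.0, 4.0, 0.8),
  qty    = 8:1,
  score  = seq(10, 80, by = 10)
)

z[1, ]        # 1st row, all columns (a one-row data frame)
z[1, 3]       # single cell: row 1, column 3
z[, 3]        # 3rd column as a plain vector
z[, 3:6]      # columns 3 to 6 (a data frame)
z$avbity      # 3rd column by name, as a vector
z["avbity"]   # 3rd column by name, as a one-column data frame
z[1:6, 1:3]   # rows 1 to 6 of columns 1 to 3
```

Note the difference between the two by-name forms: `z$avbity` gives a bare vector, while `z["avbity"]` keeps the data frame wrapper.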

Page 84: R Programming Taster Session

Subsets

Task: Make a data set of items that cost less than 2.

Page 85: R Programming Taster Session

Subset function

z.cheap <- subset(z, cost < 2)

Can you make sense of this?
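
A minimal sketch of what the subset call does, assuming a toy `z` with an invented `item` column and a `cost` column like the one named in the task:

```r
# Toy data; the item names and prices are invented for illustration
z <- data.frame(
  item = c("quill", "ink", "parchment", "cauldron"),
  cost = c(1.5, 0.8, 2.0, 12.0),
  stringsAsFactors = FALSE
)

# Keep only the rows where cost is strictly below 2
z.cheap <- subset(z, cost < 2)

# The same filter written with bracket indexing
z.cheap2 <- z[z$cost < 2, ]

z.cheap
```

The item costing exactly 2 is dropped, because the condition is `<` rather than `<=`.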

Page 86: R Programming Taster Session
Page 87: R Programming Taster Session

Transfiguration

Page 88: R Programming Taster Session

Practical magic

Task: Transform a data set from individual data to pair-wise data.

(A typical tall-to-wide transformation.)

Page 89: R Programming Taster Session

Goal

Create a data set “c” in which each row has data from both the male and female in each pair from data set “p”.

Page 90: R Programming Taster Session
Page 91: R Programming Taster Session

A pair

Page 92: R Programming Taster Session

How would you do this in SPSS?

Page 93: R Programming Taster Session
Page 99: R Programming Taster Session

• Create an id variable for each pair.

• Click Data > Restructure.

‣ You want the second option, “Restructure selected cases into variables”.

• Move id variable into the “Identifier Variable/s” and click “Next.”

• Click “Yes” when asked whether you want to sort the data.

• For “Order of New Variables,” click “Group by Original Variable” and click “Next.”

Page 106: R Programming Taster Session

The Plan

• Step 1

‣ Make a variable to identify each pair

• Step 2

‣ Split the tall data into two parts: one chunk for men and one chunk for women

• Step 3

‣ Merge the two chunks side by side using the pair identifier

Page 107: R Programming Taster Session
Page 111: R Programming Taster Session

Participant ids

10/10 = 1

11/10 = 1.1

When rounded, both equal 1.

Page 112: R Programming Taster Session

Create pair ID

p$pair_id <- round(p$code/10)

Now each member of a pair has a common ID.
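
A small sketch of the pair ID trick, assuming participant codes of the form 10, 11, 20, 21 (only the 10 and 11 example appears on the slides; the other codes are invented). The integer-division variant is an alternative that is not in the original.

```r
# Hypothetical participant codes: tens digit = pair, units digit = member
p <- data.frame(code = c(10, 11, 20, 21, 30, 31))

# The approach from the slide: divide by 10 and round
p$pair_id <- round(p$code / 10)

# Alternative (an assumption, not from the slides): integer division,
# which would still work if a units digit were 5 or higher
p$pair_id_alt <- p$code %/% 10

p
```

Rounding is enough as long as the last digit stays below 5; with other coding schemes the two columns could disagree.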

Page 113: R Programming Taster Session
Page 114: R Programming Taster Session

Separate genders

men <- subset(p, gender == "Male")

Page 115: R Programming Taster Session

Separate genders

women <- subset(p, gender == "Female")

Page 116: R Programming Taster Session
Page 117: R Programming Taster Session
Page 118: R Programming Taster Session
Page 119: R Programming Taster Session
Page 120: R Programming Taster Session

Merge sets

c <- merge(men, women, by.x = "pair_id", by.y = "pair_id")

“x” = men, “y” = women

Page 121: R Programming Taster Session
Page 122: R Programming Taster Session

Ugly variable names

Page 123: R Programming Taster Session

Rename variables

names(c) <- gsub(
  "x",        # find “x”
  "m",        # replace with “m”
  names(c))

Page 124: R Programming Taster Session

Rename variables

names(c) <- gsub(
  "y",        # find “y”
  "f",        # replace with “f”
  names(c))
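
The split, merge and rename steps can be sketched end to end on toy data. This is a variation rather than the slides' exact method: it uses the `suffixes` argument of `merge()` to name the columns directly, which sidesteps `gsub()` replacing every “x” or “y” that happens to occur inside a variable name. The `rating` column and the result name `wide` are invented.

```r
# Toy tall data: one row per person; "rating" is an invented variable
p <- data.frame(
  pair_id = c(1, 1, 2, 2),
  gender  = c("Male", "Female", "Male", "Female"),
  rating  = c(7, 5, 6, 8),
  stringsAsFactors = FALSE
)

men   <- subset(p, gender == "Male")
women <- subset(p, gender == "Female")

# suffixes replaces the default ".x" / ".y" endings on clashing column names
wide <- merge(men, women, by = "pair_id", suffixes = c(".m", ".f"))

names(wide)
wide
```

Each row of `wide` now holds one pair, with the man's values in the `.m` columns and the woman's in the `.f` columns.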

Page 125: R Programming Taster Session
Page 126: R Programming Taster Session

But...

Wouldn’t it be useful to have participant age instead of their birth year?

Page 127: R Programming Taster Session

Do it all over again. Click click click click click click.

Page 128: R Programming Taster Session

Just add a line of code to the top:

p$Age = (2011 - p$BirthYear)

Page 129: R Programming Taster Session

Now re-run the script.

Page 130: R Programming Taster Session
Page 131: R Programming Taster Session
Page 132: R Programming Taster Session

Practical magic

Task: Extract participants’ written responses for statistical analysis in LIWC.

(For analysis, LIWC requires each text response in a separate file.)

Page 133: R Programming Taster Session

Extract each cell to a text file.

62 participants, 8 variables = 496 files

Page 138: R Programming Taster Session

Manual labour

• Boring

• Prone to human errors

• Risk of repetitive strain injury

• You have better things to do

Page 142: R Programming Taster Session

The R way

• Quick

• Efficient

• Repeatable

Page 149: R Programming Taster Session

The Plan

• Step 1

‣ Load SPSS data into R

• Step 2

‣ Create a function that extracts the cell contents and writes them to a file based on participant id and variable name

• Step 3

‣ Run the function on the data

Page 152: R Programming Taster Session

Load data to R

library(foreign)

d <- read.spss("RESEARCH_DATA_FILE.sav", to.data.frame = TRUE)

Page 158: R Programming Taster Session

Function ingredients

• Information to identify the right cell

‣ Participant id (the right row)

‣ Variable name (the right column)

• A unique file name

‣ We’ll just use the above information + “.txt”

Page 159: R Programming Taster Session

The Function

saveText <- function(id, variable) {
  data = subset(d, d$Ppno == id)
  value = as.character(data[variable][1,1])
  filename = paste(id, variable, ".txt", sep = "")
  writeLines(value, con = filename)
}

• saveText: The name of our function. I could have used “Waddiwasi” instead, but I didn’t.

• function: Function to make functions.

• (id, variable): The function requires two things to work: the participant id and the name of the variable to extract.

• data = ...: Create a new object “data” that contains only the rows from “d” where the Ppno is the same as the id fed into the function.

• value = ...: Create a new object “value” that contains the specified variable from the participant data in text format.

• filename = ...: Create a new object “filename” by squishing together the participant id, the variable name, and “.txt”.

• writeLines(...): Save the value to a file (name specified by filename).

Page 171: R Programming Taster Session

Ok, what’s next?

• Since the function writes out the data one cell at a time (based on two bits of information), we need two lists to automate our work:

‣ A list of participants

‣ A list of all the variables we need

Page 172: R Programming Taster Session

Get ready to run the function

participants = unique(d$Ppno)

variables = c(
  "phys_attra", "pers_attra",
  "Descr__app", "Comments",
  "Signal_conveyed", "portrayyou",
  "their_signals", "their_portrayal")

Page 173: R Programming Taster Session

Run, function, run!

“For each participant, go through the variables and save the results for each.”

• List of participants

• List of variables

• Our function

Page 174: R Programming Taster Session

Loopty loop

for (participant in participants) {      # do this once per participant (62 times total)
  for (variable in variables) {          # do this once per variable (8 times total)
    saveText(participant, variable)
  }
}
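
The function and the nested loop can be tried without the original SPSS file. The sketch below makes several assumptions: a three-row toy `d` in which only the `Ppno` column name matches the slides, files written to a temporary folder, and an underscore added between id and variable name for readability. The `data` and `dir` arguments are additions for illustration.

```r
# Toy stand-in for d; the text columns and their contents are invented
d <- data.frame(
  Ppno     = c(101, 102, 103),
  Comments = c("Nice chat", "A bit awkward", "Would meet again"),
  Signal   = c("Friendly", "Reserved", "Curious"),
  stringsAsFactors = FALSE
)

# Write into a disposable folder instead of the working directory
out_dir <- file.path(tempdir(), "liwc_files")
dir.create(out_dir, showWarnings = FALSE)

saveText <- function(id, variable, data = d, dir = out_dir) {
  rows     <- data[data$Ppno == id, ]            # rows for this participant
  value    <- as.character(rows[[variable]][1])  # the cell as text
  filename <- file.path(dir, paste(id, "_", variable, ".txt", sep = ""))
  writeLines(value, con = filename)
}

participants <- unique(d$Ppno)
variables    <- c("Comments", "Signal")

for (participant in participants) {
  for (variable in variables) {
    saveText(participant, variable)
  }
}

# One file per participant and variable: 3 x 2 here
length(list.files(out_dir))
```

With the real data the same loop would run over 62 participants and 8 variables, giving the 496 files mentioned earlier.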

Page 181: R Programming Taster Session

Result

496 FILES

Page 182: R Programming Taster Session

...In a flick of a wand!

Page 183: R Programming Taster Session
Page 184: R Programming Taster Session
Page 185: R Programming Taster Session

Divination

Page 186: R Programming Taster Session

What can R do for you?

More or less everything.

Page 194: R Programming Taster Session

Basic magic

• Out of the box, R can do

‣ Linear and nonlinear modeling

‣ Classical statistical tests

‣ Time-series analysis

‣ Classification

‣ Clustering

‣ and many other statistical techniques...
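
A few of these out-of-the-box techniques can be sketched with data sets that ship with R (`mtcars` and `iris`). The choice of variables and the number of clusters are illustrative assumptions, not something taken from the slides.

```r
# Linear model: fuel economy as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Prediction for a hypothetical car with wt = 3 (wt is in units of 1000 lbs)
predict(fit, newdata = data.frame(wt = 3))

# Classical test: do automatic and manual cars differ in mpg?
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# Clustering: k-means on the four iris measurements, 3 clusters
set.seed(1)   # fixed seed so the random starting points are repeatable
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)
```

All of these functions live in the packages that load by default, so no installation step is needed.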

Page 195: R Programming Taster Session
Page 203: R Programming Taster Session

More help

• An Introduction to R

‣ http://cran.r-project.org/doc/manuals/R-intro.html

• R Starter Kit

‣ http://www.ats.ucla.edu/stat/r/sk/

• R mailing list

• Dumbledore’s Ruth Ripley’s class, Department of Statistics, University of Oxford

‣ http://www.stats.ox.ac.uk/~ruth/

Page 204: R Programming Taster Session

Remember... Without R, it’s only esearch.

Page 205: R Programming Taster Session

Thanks for listening!