integrating r into introductory statisticsmc301/talks/integrate_r_user2012.pdf ·...

38
Integrating R into Introductory Statistics Mine C ¸etinkaya-Rundel (Duke University) Andrew Bray (UCLA) useR! 2012 - June 13, 2012

Upload: others

Post on 07-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Integrating R into Introductory Statistics

Mine Cetinkaya-Rundel (Duke University)Andrew Bray (UCLA)

useR! 2012 - June 13, 2012

Page 2: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Sta 101 at Duke University

First course in statistics for non-majors, mostly students who aresocial sciences majors

Weekly lab sessions using R

Labs are written for an interdisciplinary introductory course but can bemodified for discipline-specific coursesCan also be used in a first data-analysis course for stats majors, ideallyby reducing step-by-step instructions

We will be releasing these labs at openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1 / 15

Page 3: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Sta 101 at Duke University

First course in statistics for non-majors, mostly students who aresocial sciences majors

Weekly lab sessions using R

Labs are written for an interdisciplinary introductory course but can bemodified for discipline-specific coursesCan also be used in a first data-analysis course for stats majors, ideallyby reducing step-by-step instructions

We will be releasing these labs at openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1 / 15

Page 4: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Sta 101 at Duke University

First course in statistics for non-majors, mostly students who aresocial sciences majors

Weekly lab sessions using R

Labs are written for an interdisciplinary introductory course but can bemodified for discipline-specific coursesCan also be used in a first data-analysis course for stats majors, ideallyby reducing step-by-step instructions

We will be releasing these labs at openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1 / 15

Page 5: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Sta 101 at Duke University

First course in statistics for non-majors, mostly students who aresocial sciences majors

Weekly lab sessions using R

Labs are written for an interdisciplinary introductory course but can bemodified for discipline-specific coursesCan also be used in a first data-analysis course for stats majors, ideallyby reducing step-by-step instructions

We will be releasing these labs at openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1 / 15

Page 6: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Sta 101 at Duke University

First course in statistics for non-majors, mostly students who aresocial sciences majors

Weekly lab sessions using R

Labs are written for an interdisciplinary introductory course but can bemodified for discipline-specific coursesCan also be used in a first data-analysis course for stats majors, ideallyby reducing step-by-step instructions

We will be releasing these labs at openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1 / 15

Page 7: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why R?

Unlike most software designed specifically for courses at this level, R is

free and open-source

powerful and flexible

relevant beyond the introductory statistics classroom

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 2 / 15

Page 8: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why R?

Unlike most software designed specifically for courses at this level, R is

free and open-source

powerful and flexible

relevant beyond the introductory statistics classroom

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 2 / 15

Page 9: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why R?

Unlike most software designed specifically for courses at this level, R is

free and open-source

powerful and flexible

relevant beyond the introductory statistics classroom

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 2 / 15

Page 10: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 11: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 12: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 13: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 14: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 15: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Why not R?

perceived challenge of teaching programming in addition to teachingstatistical concepts

labs and activities that try to find the right balance of standard andcustom functionsconsistent syntax highlighting helps

working with a command line tends to be more intimidating thantraditional GUI based tools

GUI tools also have a learning curvea user-friendly IDE (like RStudio)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 3 / 15

Page 16: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

RStudio

What it helps resolve:

loading and viewing data

saving code

code history

workspace organization

plot history

What still remains a challenge:

working with a command line

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 4 / 15

Page 17: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

RStudio

What it helps resolve:

loading and viewing data

saving code

code history

workspace organization

plot history

What still remains a challenge:

working with a command line

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 4 / 15

Page 18: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Balance

Teaching coding as a way of introducing/reinforcing concepts,especially those that are otherwise difficult to convey withoutcomputation

simulationssampling distributionsconfidence levelsbootstrappingrandomization tests...

Minimize coding for repeated mechanics and for implementationissues that are outside the scope of the course

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 5 / 15

Page 19: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Balance

Teaching coding as a way of introducing/reinforcing concepts,especially those that are otherwise difficult to convey withoutcomputation

simulationssampling distributionsconfidence levelsbootstrappingrandomization tests...

Minimize coding for repeated mechanics and for implementationissues that are outside the scope of the course

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 5 / 15

Page 20: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Confidence levels

Resample from the population many times and construct manyconfidence intervals (loops)

Plot these confidence intervals and highlight those that do notcontain the true population parameter (custom function)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 6 / 15

Page 21: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Confidence levels

Resample from the population many times and construct manyconfidence intervals (loops)

Plot these confidence intervals and highlight those that do notcontain the true population parameter (custom function)

pop <- ames$Gr.Liv.Area

samp.mean <- rep(NA, 50)

samp.sd <- rep(NA, 50)

n <- 60

for(i in 1:50){samp <- sample(pop, n)

samp.mean[i] <- mean(samp)

samp.sd[i] <- sd(samp)

}

lower <- samp.mean - 1.96 * samp.sd / sqrt(n)

upper <- samp.mean + 1.96 * samp.sd / sqrt(n)

Source: De Cock, Journal of Statistics Education. 2011. 19(3).

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 6 / 15

Page 22: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Confidence levels

Resample from the population many times and construct manyconfidence intervals (loops)

Plot these confidence intervals and highlight those that do notcontain the true population parameter (custom function)

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 6 / 15

Page 23: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Confidence levels

Resample from the population many times and construct manyconfidence intervals (loops)

Plot these confidence intervals and highlight those that do notcontain the true population parameter (custom function)

plot.ci(lower, upper, mean(pop))

mu = 1499.6904

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 6 / 15

Page 24: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Statistical inference

Traditional curriculum for an introductory statistics course includesvarious statistical inference techniques

When introduced as disconnected topic these can be overwhelming tostudents

To help unify inferential concepts, use one function that does it all,but still requires students to think about the nature of the data andencourages them to conduct exploratory data analysis

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 7 / 15

Page 25: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Statistical inference

Traditional curriculum for an introductory statistics course includesvarious statistical inference techniques

When introduced as disconnected topic these can be overwhelming tostudents

To help unify inferential concepts, use one function that does it all,but still requires students to think about the nature of the data andencourages them to conduct exploratory data analysis

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 7 / 15

Page 26: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Statistical inference

Traditional curriculum for an introductory statistics course includesvarious statistical inference techniques

When introduced as disconnected topic these can be overwhelming tostudents

To help unify inferential concepts, use one function that does it all,but still requires students to think about the nature of the data andencourages them to conduct exploratory data analysis

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 7 / 15

Page 27: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Custom inference function

Standard statistical inference like Z-tests and T-tests as well as simulationbased inference.

inference <- function(data, group = NULL, est = c("mean", "median",

"proportion"), success = NULL, order = NULL, nsim = 10000, conflevel =

0.95, null = NULL, alternative = c("less","greater","twosided"), type =

c("ci","ht"), method = c("theoretical","simulation"), drawlines = "yes",

simdist = FALSE)

data: response variable, quantitative or categorical

group: explanatory variable, categorical for grouping (optional)

type: confidence interval (ci) or hypothesis test (ht)

method: theoretical or simulation

...

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 8 / 15

Page 28: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Custom inference function

Standard statistical inference like Z-tests and T-tests as well as simulationbased inference.

inference <- function(data, group = NULL, est = c("mean", "median",

"proportion"), success = NULL, order = NULL, nsim = 10000, conflevel =

0.95, null = NULL, alternative = c("less","greater","twosided"), type =

c("ci","ht"), method = c("theoretical","simulation"), drawlines = "yes",

simdist = FALSE)

data: response variable, quantitative or categorical

group: explanatory variable, categorical for grouping (optional)

type: confidence interval (ci) or hypothesis test (ht)

method: theoretical or simulation

...

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 8 / 15

Page 29: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

inference Ex. 1 - simulated data

Input:

set.seed(62012)

d <- rnorm(100)

gr <- c(rep("gr1",50), rep("gr2",50))

inference(data = d, group = gr, est = "mean", type = "ht", method =

"theoretical", null = 0, alternative = "twosided")

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 9 / 15

Page 30: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

inference Ex. 1 - simulated data

Output:One quantitative and one categorical variable

Difference between two means

n_gr1 = 50 ; n_gr2 = 50

Observed difference between means = -0.2475

H0: mu_gr1 - mu_gr2 = 0

HA: mu_gr1 - mu_gr2 != 0

Standard error = 0.21

Test statistic: Z = -1.179

p-value: 0.2382

gr1 gr2

-3-2

-10

12

3

-0.25 0 0.25

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 9 / 15

Page 31: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

inference Ex. 2 - real data from student project

Difference between the number of sexual partners of male and femaleadults. (Source: National Survey of Family Growth)

Input:

inference(data = partners, group = gender, type = "ci", est = "mean",

method = "theoretical")

Output:

One quantitative and one categorical variable

Difference between two means

n_female = 12190 ; n_male = 10397

Observed difference between means = -0.432

Standard error = 0.0361

95 % Confidence interval = ( -0.5 , -0.36 )

female male

01

23

45

67

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 10 / 15

Page 32: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Labs

openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 11 / 15

Page 33: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Labs

openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 11 / 15

Page 34: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Labs

openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 11 / 15

Page 35: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Student reactions - labs

Positive:

“I like them. I feel like in the real world we’ll be using software to do stats,so I’m glad we’re learning how to use it.”

“I LOVE the labs. They really help cement basic statistic ideas, and Iespecially love that you can finish them in class.”

“The labs are a lot of fun. It’s great being able to create our own simulationsand watch R Studio calculate everything. I also enjoy learning some code.”

Negative:

“The labs are alright. Sometimes I feel like I’m just plugging in stuff and Ifeel disconnected from what I’m really doing. It’s also frustrating when thecode doesn’t work.”

“Wish other students focused more.”

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 12 / 15

Page 36: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Student reactions - R

Positive:

“Super useful and powerful software. It’s exciting to be introduced to it.Once again, don’t always feel comfortable writing code/ understanding whatI’m doing.”

“I like it! I kind of know MATLAB, which has helped with the coding a bit,but it’s a little more intuitive/easier, and very helpful.”

“I am not a computer person at all, but I find RStudio very easy to use.”

“I like it better than STATA which we used for [another class]. The userinterface is easy and there is plenty of help for it online. Overall, it’s prettygood.”

Negative:

“I am not a fan of coding in general. I used Python before and RStudio isbetter (for me) than Python was, but I am not a fan of either.”

“Easy to use, language is not too hard to understand although errormessages could be more informative.”

“I don’t think RStudio will have any use to me outside of this class.”Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 13 / 15

Page 37: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Additional considerations

Labs should be fully integrated with the curriculum

What concepts from the textbook are covered in this lab? Whatconcepts, if any, are not covered in the textbook? Have you seen theseconcepts elsewhere, e.g. lecture, discussion section, previous labs, orhomework problems? Be specific in your answer.

Works best in a classroom environment where they can get immediatesupport

TAs should be familiar and comfortable with the material

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 14 / 15

Page 38: Integrating R into Introductory Statisticsmc301/talks/Integrate_R_useR2012.pdf · C˘etinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 1

Questions?

Contact: [email protected], www.stat.duke.edu/ mc301

Labs: openintro.org/stat/labs.php

Cetinkaya-Rundel, Bray Integrating R into Introductory Statistics useR! 2012 - June 13, 2012 15 / 15