R for Statistics

Rafał Kulik
Department of Mathematics and Statistics

University of Ottawa

Statistical Society of Ottawa
23 September 2011

Rafał Kulik

R for Statistics SSO meeting

Plan

• What is R?
• Basic Syntax: Vectors and Functions.
• Getting data into and out of R.
• Simple Data Analysis.
• Statistical tests.
• Linear Regression.
• Simulation.
• Bootstrap.
• Random Matrices.
• Kernel smoothing in time series.

What is R?

• Software for computation, simulation, and data manipulation.
• It is a free, open-source implementation of the S language (S-PLUS is the commercial implementation).
• Can be used on Windows, Macintosh, Unix, Linux.
• Started by R. Gentleman and R. Ihaka from the University of Auckland.
• R is maintained by the R development team. Its webpage is www.r-project.org
• You may want to download Tinn-R, a free and simple replacement for the code editor provided by R-Gui (graphical user interface).
• The first issue of The R Journal appeared in 2009.

Basic Syntax: Vectors and Functions

• Writing data manually as a vector:

a=c(2,4,7,6,5)
b=c(1,3,4,6,8,10)

• Elements of the vector can be printed as a[3], a[2:4].
• Basic operations on vectors.
• You may write a vector as the argument of a function, e.g. log(a).
• The operation a+b produces a warning, because a and b have different lengths (5 and 6) and R recycles the shorter vector. Extract e.g. b=b[2:6]. Now it works cleanly.
• Note that a*b, a/b are coordinatewise.
• However, max(a,b) is not coordinatewise. You have to use pmax(a,b).
• Type help(name) to learn how to use a function name.

Getting data into and out of R

• Write longer sequences using the scan() function. Type f=scan() and start typing. Hit ENTER once to start a new line, hit it twice to finish.
• You may read data from a file: yuan=scan("C:/docs/yuan.txt") or swiss=scan("C:/docs/swiss.txt",nlines=500).
• Export data: write(data,"C:/docs/data.txt"). Do not assign the result to a variable: write() returns NULL, so swiss=write(...) would overwrite swiss.
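A round trip through a file shows how write() and scan() fit together. The sketch below assumes made-up numbers and a temporary file in place of the C:/docs paths; the names x, y, z and path are hypothetical.

```r
# Hypothetical data and a temporary file instead of the C:/docs paths
x <- c(6.83, 6.82, 6.84, 6.81, 6.80, 6.79)
path <- tempfile(fileext = ".txt")

write(x, path, ncolumns = 1)               # one value per line
y <- scan(path, quiet = TRUE)              # read all values back
z <- scan(path, nlines = 3, quiet = TRUE)  # read only the first three lines

unlink(path)                               # remove the temporary file
```

With one value per line, nlines controls exactly how many observations are read, which is how the nlines=500 argument is used for the swiss data.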

Example:

Merge yuan and swiss together, write them as file currency. [1]

currency=matrix(0,length(yuan),2)
currency[,1]=yuan; currency[,2]=swiss
write(currency,"C:/currency.txt")              # NOT GOOD: written column by column, five values per line
currency=t(currency)
write(currency,"C:/currency.txt")              # NOT GOOD: still five values per line
write(currency,"C:/currency.txt",ncolumns=2)   # GOOD !!! one (yuan, swiss) pair per line

[1] See Tables for matrix operations.

Simple Data Analysis

We already have the data yuan and swiss. They describe the exchange rates of the Chinese yuan and the Swiss franc, respectively, vs. the US dollar. Before you start with statistical analysis, note that financial data are not stationary. If $S_t$, $t = 1, \ldots, n$, describes the value of a financial instrument at time $t$, the typical transformation is the log-return

$$R_t = \log\left(\frac{S_t}{S_{t-1}}\right), \qquad t = 2, \ldots, n.$$

We will write a script which computes log-returns for yuan and swiss, then performs a basic statistical analysis. This will be stored in the file Returns.R.
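As a small worked example of the log-return transformation, the sketch below uses a made-up price series S (an assumption; the talk uses yuan and swiss) and computes the returns in two equivalent ways.

```r
# Hypothetical price series; the slides use yuan and swiss instead
S <- c(100, 102, 101, 105, 104)
n <- length(S)

R_ratio <- log(S[2:n] / S[1:(n - 1)])  # R_t = log(S_t / S_{t-1}), t = 2, ..., n
R_diff <- diff(log(S))                 # same values, since log(a/b) = log(a) - log(b)
```

The diff(log(S)) form avoids hard-coding the index range, so it works for a series of any length.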

Script Returns.R

yuanreturns=yuan[2:500]/yuan[1:499]
yuanreturns=log(yuanreturns)
swissreturns=log(swiss[2:500]/swiss[1:499])
print(summary(yuanreturns))    # Print summary statistics
print(summary(swissreturns))
plot(yuanreturns)              # Point plot
plot(swissreturns)

This is not a good script, since the first plot is invisible: the second plot replaces it immediately. You may add the command par(). You may also make the pictures much nicer.

yuanreturns=yuan[2:500]/yuan[1:499]
yuanreturns=log(yuanreturns)
swissreturns=log(swiss[2:500]/swiss[1:499])
print("Summary statistics for YUAN")
print(summary(yuanreturns))
print("Summary statistics for SWISS")
print(summary(swissreturns))
par(mfrow=c(1,2))              # two plots side by side
plot(yuanreturns,xlab="Time",type="l",col="blue")
plot(swissreturns,main="SWISS vs. USD",ylab="SWISS log-returns")

The script still has many disadvantages: the data set names have to be replaced manually, and everything is printed out at the same time. We will write a function which performs a simple data analysis for a given data set.

Function DataAnalysis

DataAnalysis <- function(data,gr=TRUE) {
  print(summary(data))                      # summary statistics
  if(gr) {
    readline(prompt = "Press <Enter> to continue...")
    par(mfrow=c(1,1))
    plot(data,type="l",col="blue")          # time series plot
    abline(mean(data),0,col="red")          # horizontal line at the sample mean
    readline(prompt = "Press <Enter> to continue...")
    boxplot(data)
    readline(prompt = "Press <Enter> to continue...")
    hist(data,breaks=floor(sqrt(length(data))),prob=TRUE)
    curve(dnorm(x,mean(data),sd(data)),add=TRUE,col="red")   # fitted normal density
    readline(prompt = "Press <Enter> to continue...")
    qqnorm(data); qqline(data,col="red")    # normal QQ-plot
  }
}

Statistical tests

• There is just one command with many possible specifications: t.test(x,y).
• Variables:
    • x: data set; y: second data set, default=NULL;
    • alternative=c("two.sided","less","greater");
    • paired=FALSE;
    • conf.level=0.95.
• Usage:
    • t.test(x,alternative="less");
    • results=t.test(x,y,paired=TRUE,conf.level=0.97).
• The results of the t.test function are stored in the object results, which is an object of class htest.
• a=t.test(yuanreturns); a$statistic; a$statistic[[1]]
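The options above can be tried on simulated data. This is a sketch under stated assumptions: the samples x and y are generated here for illustration (the talk applies the test to yuanreturns), and the names one_sample and paired are hypothetical.

```r
set.seed(1)                      # reproducible hypothetical data
x <- rnorm(50, mean = 0.5)       # first sample
y <- x + rnorm(50, sd = 0.2)     # second sample, paired with x

one_sample <- t.test(x, alternative = "greater")          # H1: mean of x > 0
paired <- t.test(x, y, paired = TRUE, conf.level = 0.97)  # paired test, 97% interval

class(paired)            # class of the returned object
paired$statistic[[1]]    # value of the t statistic
paired$p.value           # p-value
paired$conf.int          # confidence interval for the mean difference
```

Calling names(paired) lists all components that can be extracted with $.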

Linear Regression

Having data $(X_t, Y_t)$, $t = 1, \ldots, n$, from the model

$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t, \qquad t = 1, \ldots, n,$$

we want to estimate the parameters $\beta_0$ and $\beta_1$ using the least squares criterion (minimizing the mean squared error).

• The basic function for the linear regression model is lm(y~x).
• Write lm(y~x-1) to fit a regression line without intercept.
• The output is a linear model object. If lmresults=lm(y~x), then:
    • lmresults$coefficients to get intercept and slope;
    • lmresults$residuals to get residuals;
    • lmresults$fitted.values to get fitted values.
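A short example on simulated data shows the two model formulas and the components of the fitted object. The data, the true coefficients (intercept 1, slope 2) and the name lm_noint are assumptions made for illustration.

```r
set.seed(2)                          # reproducible hypothetical data
x <- seq(0, 10, length.out = 100)
y <- 1 + 2 * x + rnorm(100)          # true intercept 1, true slope 2

lmresults <- lm(y ~ x)               # fit with intercept
lm_noint <- lm(y ~ x - 1)            # fit through the origin

coef(lmresults)                      # same as lmresults$coefficients
head(lmresults$residuals)
head(lmresults$fitted.values)
```

Fitted values and residuals always add up to the observed y, which is a quick sanity check on any fit.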

Function Linear Regression

LinearRegression <- function(x,y) {
  cat("Would you like to fit with intercept? (Y/N) ")
  answer <- readline()
  pr=switch(answer,y=,Y=TRUE,FALSE)      # TRUE if the answer is y or Y
  pr1=switch(answer,n=,N=TRUE,FALSE)     # TRUE if the answer is n or N
  if(pr){lmresults=lm(y~x)}
  if(pr1){lmresults=lm(y~x-1)}
  par(mfrow=c(1,2))
  plot(x,y,xlab=""); abline(lmresults)   # data with the fitted line
  plot(x,lmresults$residuals,ylab="Residuals")
  readline(prompt = "Press <Enter> to continue...")
  print(lmresults)
}

Simulation

• Simulation of basic random variables: b=rnorm(6). [2]

BrownianMotion <- function(n=1000,m=10) {
  print("Simulate one path of BM")
  par(mfrow=c(1,1))
  l=3*sqrt(n)                            # plotting range: three standard deviations at time n
  plot(cumsum(rnorm(n)),type="l",col="red",ylim=c(-l,l),xlab="")
  readline(prompt = "Press <Enter> to continue...")
  print("Simulation of"); print(m); print("paths of BM")
  for (s in 1:m) {
    points(cumsum(rnorm(n)),type="l")    # overlay another path
  }
}

[2] See Tables for other simulation functions.
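The same random-walk construction can be checked numerically without plotting. The sketch below is a non-interactive variant, not the function from the slide: it stores the paths in a matrix, and the number of paths m = 200 and the names paths and endpoints are hypothetical choices.

```r
set.seed(3)                       # reproducible simulation
n <- 1000                         # steps per path
m <- 200                          # number of paths (hypothetical choice)

# Each column is one path: cumulative sums of independent N(0,1) increments
paths <- replicate(m, cumsum(rnorm(n)))

endpoints <- paths[n, ]           # value of every path after n steps
sd(endpoints)                     # should be close to sqrt(n)
mean(abs(paths) <= 3 * sqrt(n))   # share of points inside the plotting range used on the slide
```

Since the endpoint of each path is a sum of n independent N(0,1) variables, its standard deviation is sqrt(n), which motivates the ylim of plus or minus 3*sqrt(n) in the plot.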

Bootstrap

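• Given a sample $X_1, \ldots, X_n$, let $T(X_1, \ldots, X_n) = \hat\Theta_n$ be an estimator of a quantity $\theta$.
• A confidence interval for $\theta$ typically has the following form:

$$\left[\hat\Theta_n - z_{\alpha/2}\,\mathrm{se}(\hat\Theta_n),\ \hat\Theta_n + z_{\alpha/2}\,\mathrm{se}(\hat\Theta_n)\right],$$

where $z_{\alpha/2}$ is the $1-\alpha/2$ quantile of $N(0,1)$.

• Example 1: if $X_i \sim N(\mu, \sigma^2)$ and $\hat\Theta_n = \bar X_n$, then $\mathrm{se}(\hat\Theta_n) = \sigma/\sqrt{n}$ and we can easily estimate $\sigma$ by its sample version.
• Example 2: if $X_i \sim F$ and $\hat\Theta_n = \mathrm{med}(X_1, \ldots, X_n)$, then $\mathrm{se}(\hat\Theta_n) = \,???$.
• Solution: use the bootstrap to get $\widehat{\mathrm{se}}(\hat\Theta_n)$, an estimate of $\mathrm{se}(\hat\Theta_n)$.
• A bootstrap confidence interval constructed this way works well when the distribution of $\hat\Theta_n$ is approximately normal.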

Bootstrap - ctd.

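n=1000; m=700; k=500                   # sample size, resample size, number of bootstrap replications
X=rnorm(n,1,1)
TX=median(X)                           # estimator computed from the original sample
TXboot=1:k
for (s in 1:k) {
  Xstar=sample(X,m,replace=TRUE)       # resample with replacement
  TXboot[s]=median(Xstar)              # estimator computed from the resample
}
print(sd(TXboot))                      # bootstrap estimate of the standard error
qqnorm(TXboot)                         # check approximate normality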
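The bootstrap standard error can be plugged into the normal-approximation interval to obtain a confidence interval for the median. The sketch below makes two assumptions of its own: it resamples n out of n observations (the script above draws m = 700 out of n = 1000), and it uses alpha = 0.05. The names theta_hat, boot_medians, se_boot and ci are illustrative.

```r
set.seed(4)                               # reproducible hypothetical sample
X <- rnorm(1000, mean = 1, sd = 1)
k <- 500                                  # number of bootstrap replications
alpha <- 0.05

theta_hat <- median(X)                    # estimator of the population median
boot_medians <- replicate(k, median(sample(X, replace = TRUE)))
se_boot <- sd(boot_medians)               # bootstrap estimate of se(theta_hat)

z <- qnorm(1 - alpha / 2)                 # 1 - alpha/2 quantile of N(0,1)
ci <- c(theta_hat - z * se_boot, theta_hat + z * se_boot)
ci
```

A normal QQ-plot of boot_medians, as in the script, indicates whether the normal approximation behind this interval is reasonable.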

Random Matrices

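Let $A$ be an $n \times n$ matrix with independent $N(0,1)$ entries. Then the symmetric matrix $B = A + A^T$ belongs to the Gaussian Orthogonal Ensemble (the Gaussian Unitary Ensemble is its complex Hermitian analogue). Its eigenvalues follow a particular pattern: after scaling by $\sqrt{2n}$, their histogram approaches Wigner's semicircle law on $[-2, 2]$.

n = 5000
A = array(rnorm(n*n),c(n,n))
B = (A+t(A))/sqrt(2*n)                 # symmetrize and scale
Eigens <- eigen(B, symmetric=TRUE)
Eigenvalues <- Eigens$values
hist(Eigenvalues,xlab="Eigenvalues",freq=TRUE)

Note: the RMTstat package deals with different aspects of random matrices.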
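A smaller matrix runs in a fraction of a second and already shows the shape of the eigenvalue distribution. The sketch below assumes n = 300 instead of n = 5000 and overlays the semicircle density $\sqrt{4 - x^2}/(2\pi)$ for comparison; the name ev is hypothetical.

```r
set.seed(5)                              # reproducible simulation
n <- 300                                 # much smaller than n = 5000, to run quickly
A <- matrix(rnorm(n * n), n, n)
B <- (A + t(A)) / sqrt(2 * n)            # symmetric, off-diagonal variance 1/n

ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values
range(ev)                                # expected to be close to -2 and 2

# Compare the histogram with the semicircle density sqrt(4 - x^2) / (2 * pi)
hist(ev, breaks = 30, prob = TRUE, xlab = "Eigenvalues", main = "")
curve(sqrt(pmax(4 - x^2, 0)) / (2 * pi), add = TRUE, col = "red")
```

The option only.values = TRUE skips the eigenvectors, which are not needed for the histogram.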

Kernel Smoothing in Time Series

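• Recall the swiss data $X_t$, $t = 1, \ldots, n$. We could think about a model

$$X_t = f(t/n) + \varepsilon_t, \qquad t = 1, \ldots, n.$$

• We can estimate $f$ via a kernel estimator, e.g.

$$\hat f_h(x) = \sum_{t=1}^{n} \ell_t(x) X_t,$$

where

$$\ell_t(x) = \frac{K\left(\frac{x - t/n}{h}\right)}{\sum_{s=1}^{n} K\left(\frac{x - s/n}{h}\right)}.$$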

my.data=swiss; n=length(my.data); ggrid=1:n/n
H=c(0.025,0.05,0.1)                       # bandwidths
K <- function(x) { dnorm(x) }             # Gaussian kernel
fhat <- function(x) { sum(my.data*K((x-ggrid)/h))/sum(K((x-ggrid)/h)) }   # uses the current h from the loop
par(mfrow=c(1,3))
fhatvec=1:n
for (h in H) {
  temp=1
  for (i in ggrid) {
    fhatvec[temp]=fhat(i); temp=temp+1
  }
  plot(ggrid,my.data,xlab="Time",ylab="Data",type="p")
  points(ggrid,fhatvec,type="l",col="blue")
}

Note: for other smoothing techniques, e.g. wavelets, see waved, wavelets.
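The kernel estimator can also be written with the bandwidth as an explicit argument, which avoids relying on a global h. The sketch below is one possible variant on simulated data: the regression function sin(2*pi*x), the noise level and the names grid, x_obs and fhat_vec are assumptions for illustration, since the swiss data are not available here.

```r
set.seed(6)                                       # reproducible hypothetical data
n <- 200
grid <- (1:n) / n                                 # design points t/n
x_obs <- sin(2 * pi * grid) + rnorm(n, sd = 0.3)  # f(t/n) + noise, with f chosen for illustration

# Kernel estimate at a single point x for bandwidth h (Gaussian kernel)
fhat <- function(x, h, data = x_obs, design = grid) {
  w <- dnorm((x - design) / h)                    # K((x - t/n) / h)
  sum(w * data) / sum(w)                          # weights l_t(x) sum to one
}

h <- 0.05
fhat_vec <- sapply(grid, fhat, h = h)             # evaluate on the whole grid

plot(grid, x_obs, xlab = "Time", ylab = "Data")
lines(grid, fhat_vec, col = "blue")
```

Because the weights are nonnegative and sum to one, every fitted value is a weighted average of the observations; a larger h averages over more neighbors and gives a smoother curve.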

Some commands

Table 1: Basic functions

sqrt()                            Square root.
abs()                             Absolute value.
sin(), cos(), tan()               Trigonometric functions.
exp()                             Exponential function.
log(), log2(), log10(), logb()    Logarithmic functions.

Table 2: Basic vector functions

sum(), prod()                     Sum and product of all elements.
cumsum(), cumprod()               Cumulative sums and products.
length()                          Length of a vector.
sort(), rev(sort())               Sort a vector in increasing and decreasing order.

Table 3: Sequences

seq(a,b,by=x)                     Sequence from a to b with the step x. Default x = 1.
seq(a,b,length.out=n)             Sequence from a to b with the size n.
rep(a,n)                          Repeat the value a n times.

Table 4: Matrix operations

dim()          Dimension.
%*%            Matrix multiplication.
t()            Matrix transpose.
det()          Matrix determinant.
solve()        Inverse.
eigen()        Eigenvalues.

Table 5: Graphics

curve(expr)    Plots expr, an expression in x, e.g. x^2.
abline(a,b)    Adds a line with intercept a and slope b.
points()       Overlaying plots.

Table 6: Distributions

dname()        Density (probability) function.
pname()        Distribution function P(X ≤ x).
qname()        Quantiles.
rname()        Simulation.
"name"         binom, pois, unif, exp, gamma, t, norm
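The naming convention in Table 6 combines one of the prefixes d, p, q, r with a distribution name. A brief illustration with norm and binom follows; the particular arguments are chosen here only as examples.

```r
# The prefixes d, p, q, r combined with the name norm
dnorm(0)                  # density of N(0,1) at 0, equal to 1/sqrt(2*pi)
pnorm(1.96)               # P(X <= 1.96) for X ~ N(0,1)
qnorm(0.975)              # 0.975 quantile of N(0,1)

set.seed(7)               # reproducible simulation
x <- rnorm(5, mean = 1, sd = 2)   # five draws from N(1, 4)

# The same convention with another name, here binom
dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for X ~ Binomial(10, 0.5)
```

The p and q functions are inverses of each other for continuous distributions, so pnorm(qnorm(u)) returns u.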

Table 7: Basic Statistics

mean(), median()
sd(), var()
min(), max()
summary()
quantile()
cor(), cov()
hist()         Histogram.
boxplot()      Boxplot.
stem()         Stem-and-leaf plot.
ecdf()         Empirical distribution function.
