Multivariate Analysis PCR and PLS
Prof. Dr. Anselmo E de Oliveira
anselmo.quimica.ufg.br
Multicollinearity
• Variable selection is one possibility of reducing the number of regressor variables and removing multicollinearity
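Why multicollinearity is a problem can be seen in a minimal base-R sketch. The data are synthetic and the variable names are illustrative assumptions, not part of the slides: when one regressor is almost a copy of another, the matrix XᵀX that OLS has to invert becomes badly conditioned.

```r
# Synthetic data (illustrative only): x2 is almost a copy of x1.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # nearly collinear with x1
x3 <- rnorm(n)                   # independent regressor

X_collinear   <- cbind(x1, x2)
X_independent <- cbind(x1, x3)

# Condition number of X^T X: large values mean the OLS solution is unstable.
kappa_collinear   <- kappa(crossprod(X_collinear), exact = TRUE)
kappa_independent <- kappa(crossprod(X_independent), exact = TRUE)

cat("correlation(x1, x2):", cor(x1, x2), "\n")
cat("condition number, collinear:  ", kappa_collinear, "\n")
cat("condition number, independent:", kappa_independent, "\n")
```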
The PCR Model
• PCR also solves the problem of data collinearity and reduces the number of regressor variables, but the regressor variables are no longer the original measured 𝑥-variables but linear combinations thereof
– The linear combinations are the principal component scores of the 𝑥-variables
– PCR is a combination of PCA and OLS
The PCR Model
• PCA decomposes a (centered) data matrix 𝑿 into scores 𝑻 and loadings 𝑷
• For a certain number 𝑎 of PCs, which is usually less than the rank of the data matrix, this decomposition is
$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E}$
with an error matrix 𝑬
• The score matrix 𝑻 contains the maximum amount of information of 𝑿 among all matrices that are orthogonal projections on 𝑎 linear combinations of the 𝑥-data
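The decomposition into scores, loadings and an error matrix can be sketched in base R with a singular value decomposition. This is only an illustration on synthetic data; the object names (`T_scores`, `P`, `E`) are chosen here to mirror the symbols on the slide.

```r
set.seed(42)
# Synthetic data: 30 samples, 5 correlated variables (illustrative only).
latent <- matrix(rnorm(30 * 2), 30, 2)
X <- latent %*% matrix(rnorm(2 * 5), 2, 5) + matrix(rnorm(30 * 5, sd = 0.05), 30, 5)

Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centre the columns
a  <- 2                                        # number of PCs kept

sv <- svd(Xc)
P  <- sv$v[, 1:a, drop = FALSE]                # loadings (m x a)
T_scores <- Xc %*% P                           # scores   (n x a)
E  <- Xc - T_scores %*% t(P)                   # error matrix

# Scores are mutually orthogonal, loadings are orthonormal.
print(round(crossprod(T_scores), 6))
print(round(crossprod(P), 6))
cat("share of variance left in E:", sum(E^2) / sum(Xc^2), "\n")
```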
The PCR Model
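In an MLR model
$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$
we replace the matrix 𝑿 by the score matrix 𝑻 and thus include the major information of the 𝑥-data for regression on 𝒚:
$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \quad \mathbf{X} = \mathbf{T}\mathbf{P}^{T}$
$\mathbf{y} = \mathbf{T}\mathbf{P}^{T}\mathbf{b} + \mathbf{e}_{T}$
$\mathbf{g} = \mathbf{P}^{T}\mathbf{b}$
$\mathbf{y} = \mathbf{T}\mathbf{g} + \mathbf{e}_{T}$
with the new regression coefficients $\mathbf{g}$ and the error term $\mathbf{e}_{T}$
– The information of the highly correlated 𝑥-variables is compressed into a few score vectors that are uncorrelated, which solves the problem of data collinearity
– The complexity of the model can be optimized by the number of PCs used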
The PCR Model
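• OLS regression on the scores: $\mathbf{g} = (\mathbf{T}^{T}\mathbf{T})^{-1}\mathbf{T}^{T}\mathbf{y}$
– $\mathbf{T}^{T}\mathbf{T}$ is a diagonal matrix, because the score vectors are orthogonal
• The final coefficients for the original model: $\mathbf{b}_{PCR} = \mathbf{P}\mathbf{g}$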
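The PCR steps (PCA, then OLS on the scores, then back-transformation of the coefficients) can be written out by hand in base R. This is a sketch on synthetic data, not the implementation used by any package; all object names are illustrative.

```r
set.seed(7)
n <- 40; m <- 6; a <- 2
# Synthetic, highly collinear x-data driven by two latent factors.
latent <- matrix(rnorm(n * 2), n, 2)
X <- latent %*% matrix(rnorm(2 * m), 2, m) + matrix(rnorm(n * m, sd = 0.05), n, m)
y <- drop(latent %*% c(1.5, -0.8)) + rnorm(n, sd = 0.1)

Xc <- scale(X, scale = FALSE)          # centred X
yc <- y - mean(y)                      # centred y

P  <- svd(Xc)$v[, 1:a, drop = FALSE]   # loadings of the first a PCs
T_scores <- Xc %*% P                   # scores

TtT <- crossprod(T_scores)                   # T^T T, diagonal up to rounding
g   <- solve(TtT, crossprod(T_scores, yc))   # g = (T^T T)^{-1} T^T y
b_pcr <- P %*% g                             # coefficients for the original x-variables

y_hat <- mean(y) + drop(Xc %*% b_pcr)
cat("RMSE of the fit:", sqrt(mean((y - y_hat)^2)), "\n")
```

Because the scores are orthogonal, each element of `g` could also be computed separately as a simple regression of `yc` on one score vector.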
PCA
[Figure: PCA. A sample 𝑥𝐼 with coordinates (𝑥𝐼,1, 𝑥𝐼,2) on the axes feature 1 and feature 2 (column vectors 𝒙𝟏 and 𝒙𝟐) is re-expressed by its scores (𝑧𝐼,1, 𝑧𝐼,2) on the axes PC1 and PC2. When the data lie along a single direction, PC1 is the only direction needed. Source: Lesson 2: Linear Regression]
Number of PCA Components
• The number of components has to be optimized for the best possible prediction of the 𝑦-variable
– PCA alone considers only the total variance of 𝑿, not 𝒚
• Simple strategies
– Selection of the first PCA scores which cover a certain percentage of the total variance of 𝑿 (for instance, 99%)
– Selection of the PCA scores with maximum correlation to 𝒚
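Both simple strategies can be sketched with `prcomp()` from base R. The data below are synthetic, and the 99% threshold is the example value from the slide; everything else is an illustrative assumption.

```r
set.seed(3)
n <- 60; m <- 8
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(rnorm(3 * m), 3, m) + matrix(rnorm(n * m, sd = 0.05), n, m)
y <- latent[, 3] + rnorm(n, sd = 0.1)

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Strategy 1: smallest number of PCs covering at least 99% of the total variance.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
a_var   <- which(cum_var >= 0.99)[1]

# Strategy 2: rank the PCs by the absolute correlation of their scores with y.
r_with_y <- abs(cor(pca$x, y))[, 1]
ranked   <- order(r_with_y, decreasing = TRUE)

cat("PCs needed for 99% of variance:", a_var, "\n")
cat("PCs ranked by |correlation with y|:", ranked, "\n")
```

The two strategies need not agree: a PC with a small share of the variance of 𝑿 can still be the one most correlated with 𝒚.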
PLS
• PLS stands for partial least squares and/or projection to latent structures by means of partial least squares
• PLS is the most widely used method in chemometrics for multivariate calibration
– Web of Science, 07/09/13
• PCR: 1,048
• PLS: 8,731
• Herman Wold, 1975
PLS
• Essentially, the model structures of PLS and PCR are the same
– The 𝑥-data are first transformed into a set of a few intermediate linear latent variables (components), and these new variables are used for regression (by OLS) with a dependent variable 𝑦
– PCR uses principal component scores (derived solely from 𝑿)
– PLS uses components that are related to 𝒚
• Maximum covariance between scores and 𝑦
– PLS and PCR are linear methods (although nonlinear versions exist)
PLS
• PLS is a powerful linear regression method
– Insensitive to collinear variables
– Handles a large number of variables
• The first PLS component is calculated as the latent variable which has maximum covariance between the scores and the modeled property 𝑦
• Next, the information (variance) of this component is removed from the 𝑥-data (peeling or deflation)
– It is a projection of the 𝑥-space onto a (hyper-)plane that is orthogonal to the direction of the found component
• From the residual matrix, the next PLS component is derived
• This procedure is continued until no improvement of modeling 𝑦 is achieved
• The number of PLS components defines the complexity of the model
– The optimum number of components is usually estimated by cross-validation (CV)
• PLS2: PLS with a matrix 𝒀 instead of a vector 𝒚
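The component-by-component procedure with deflation can be sketched for a single 𝒚 (PLS1) using the NIPALS scheme. This is a didactic base-R sketch on synthetic data; the function name `pls1_nipals` is hypothetical and it is not the code used by the `pls` package.

```r
# PLS1 (single y) via NIPALS with deflation; a didactic sketch only.
pls1_nipals <- function(X, y, ncomp) {
  x_means <- colMeans(X)
  Xk <- sweep(X, 2, x_means)          # centred x-data, deflated inside the loop
  yk <- y - mean(y)                   # centred y
  W  <- P <- matrix(0, ncol(X), ncomp)
  Tm <- matrix(0, nrow(X), ncomp)
  q  <- numeric(ncomp)
  for (k in seq_len(ncomp)) {
    w   <- drop(crossprod(Xk, yk))
    w   <- w / sqrt(sum(w^2))         # unit weight vector: maximises cov(Xk w, y)
    t_k <- drop(Xk %*% w)             # scores of component k
    p   <- drop(crossprod(Xk, t_k)) / sum(t_k^2)   # x-loadings
    q[k] <- sum(yk * t_k) / sum(t_k^2)             # y-loading
    Xk  <- Xk - outer(t_k, p)         # deflation: remove component k from the x-data
    yk  <- yk - q[k] * t_k
    W[, k] <- w; P[, k] <- p; Tm[, k] <- t_k
  }
  b <- drop(W %*% solve(crossprod(P, W), q))       # coefficients for the original x-variables
  list(coefficients = b, intercept = mean(y) - sum(x_means * b), scores = Tm)
}

set.seed(11)
X <- matrix(rnorm(50 * 4), 50, 4)
y <- drop(X %*% c(1, -2, 0.5, 0)) + rnorm(50, sd = 0.2)

fit2 <- pls1_nipals(X, y, ncomp = 2)
fit4 <- pls1_nipals(X, y, ncomp = 4)   # all components: should match OLS
ols  <- unname(coef(lm(y ~ X)))

print(round(crossprod(fit2$scores), 8))            # off-diagonals ~ 0: orthogonal scores
print(cbind(pls_all_comps = fit4$coefficients, ols = ols[-1]))
```

Using every available component reproduces the OLS coefficients, so the regularising effect of PLS comes entirely from stopping early, which is why the number of components is tuned by CV.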
PLS
• Partial least squares (PLS) models are based on principal components of both the independent data 𝑿 and the dependent data 𝒀. The central idea is to calculate the principal component scores of the 𝑿 and the 𝒀 data matrix and to set up a regression model between the scores (and not the original data).
• Thus the matrix 𝑿 is decomposed into a matrix 𝑻 (the score matrix) and a matrix 𝑷′ (the loadings matrix) plus an error matrix 𝑬. The matrix 𝒀 is decomposed into 𝑼 and 𝑸 and the error term 𝑭. These two equations are called outer relations. The goal of the PLS algorithm is to minimize the norm of 𝑭 while keeping the correlation between 𝑿 and 𝒀 by the inner relation 𝑼 = 𝑩𝑻.
Statistics4u.com
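Written out as equations, the outer and inner relations described above would read roughly as follows. The inner relation is copied as stated in the text; the transpose on 𝑸 is an assumption made here by analogy with 𝑷′.

```latex
\begin{aligned}
\mathbf{X} &= \mathbf{T}\mathbf{P}' + \mathbf{E} && \text{(outer relation for } \mathbf{X}\text{)}\\
\mathbf{Y} &= \mathbf{U}\mathbf{Q}' + \mathbf{F} && \text{(outer relation for } \mathbf{Y}\text{)}\\
\mathbf{U} &= \mathbf{B}\mathbf{T} && \text{(inner relation)}
\end{aligned}
```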
PLS
• References
– H. Abdi, The University of Texas at Dallas
• Partial Least Squares (PLS) Regression
– StatSoft, PLS
– Bob Collins, LPAC group meeting, PLS
– ST02: Multivariate Data Analysis and Chemometrics
PCR and PLS with “R”
R-bloggers: Posts Tagged ‘ "R" Chemometrics ’, page 6
> library(ChemometricsWithR)
> data(gasoline)
> summary(gasoline$octane)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  83.40   85.88   87.75   87.18   88.45   89.60
> sd(gasoline$octane)
[1] 1.530078
> hist(gasoline$octane)
The gasoline data set contains the NIR spectra of 60 samples, acquired by diffuse reflectance from 900 to 1700 nm.
"R": Looking at the Data (Gasoline) – 001
> library(pls)  # provides pcr(), plsr(), RMSEP(), R2() and the gasoline data
> data(gasoline, package = "pls")
> wavelengths <- seq(900, 1700, by = 2)
> matplot(wavelengths, t(gasoline$NIR), type = "l", lty = 1, xlab = "wavelength (nm)", ylab = "log(1/R)")
"R": Plotting the spectra (Gasoline) – 002
> gaspcr <- pcr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "LOO")
> summary(gaspcr)
Data:    X dimension: 60 401
         Y dimension: 60 1
Fit method: svdpc
Number of components considered: 10
VALIDATION: RMSEP
Cross-validated using 60 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
CV           1.543    1.447    1.474    1.255   0.2501   0.2503   0.2578   0.2646   0.2724   0.2474    0.2508
adjCV        1.543    1.446    1.474    1.255   0.2496   0.2500   0.2575   0.2643   0.2733   0.2471    0.2508
TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
X         72.57    83.90    90.86    95.46    96.70    97.66    98.16    98.52    98.85     99.09
octane    18.99    19.62    46.50    97.69    97.78    97.79    97.79    97.79    98.33     98.38
adjCV is the bias-corrected RMSEP; with "LOO" validation it is almost the same as the uncorrected RMSEP (CV).
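What leave-one-out validation computes can be sketched by hand in base R: each sample is left out once, the PCR model is refitted on the rest, and the left-out sample is predicted. The data are synthetic and the helper names `pcr_predict` and `loo_rmsep` are hypothetical, so the numbers are unrelated to the gasoline results.

```r
set.seed(5)
n <- 30; m <- 6
latent <- matrix(rnorm(n * 2), n, 2)
X <- latent %*% matrix(rnorm(2 * m), 2, m) + matrix(rnorm(n * m, sd = 0.05), n, m)
y <- drop(latent %*% c(1, -1)) + rnorm(n, sd = 0.1)

# Fit PCR with 'a' components on training data and predict new rows.
pcr_predict <- function(X_train, y_train, X_new, a) {
  mu <- colMeans(X_train)
  Xc <- sweep(X_train, 2, mu)
  P  <- svd(Xc)$v[, 1:a, drop = FALSE]
  T_scores <- Xc %*% P
  g  <- solve(crossprod(T_scores), crossprod(T_scores, y_train - mean(y_train)))
  mean(y_train) + drop(sweep(X_new, 2, mu) %*% P %*% g)
}

# Leave-one-out: each sample is predicted by a model that never saw it.
loo_rmsep <- function(X, y, a) {
  press <- sapply(seq_len(nrow(X)), function(i) {
    pred <- pcr_predict(X[-i, , drop = FALSE], y[-i], X[i, , drop = FALSE], a)
    (y[i] - pred)^2
  })
  sqrt(mean(press))
}

rmsep_by_a <- sapply(1:4, function(a) loo_rmsep(X, y, a))
names(rmsep_by_a) <- paste(1:4, "comps")
print(round(rmsep_by_a, 4))
```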
"R": PLS Regression (Gasoline) – 003
A better way to decide on the number of components is to plot the RMSEP values:
> plot(RMSEP(gaspcr), legendpos = "topright")
The plot suggests four components, giving an RMSEP of 0.250
Prediction plot:
> plot(gaspcr, ncomp = 4, asp = 1, line = TRUE)
> R2(gaspcr)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps      6 comps      7 comps      8 comps      9 comps     10 comps
   -0.03419      0.09043      0.05573      0.31590      0.97284      0.97279      0.97113      0.96959      0.96777      0.97341      0.97267
> plot(R2(gaspcr),legendpos = "topright")
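The cross-validated R² values appear to be linked to the cross-validated RMSEP through R² = 1 − PRESS/SST. A short check using only numbers already shown in this transcript (the rounded RMSEP values and the standard deviation of octane) is sketched below; the relation is an assumption verified against the printed output, not taken from the package documentation.

```r
n     <- 60
sd_y  <- 1.530078                                  # sd(gasoline$octane), as printed earlier
rmsep <- c(intercept = 1.543, comps4 = 0.2501)     # CV RMSEP values from summary(gaspcr)

sst <- (n - 1) * sd_y^2          # total sum of squares around the mean
r2  <- 1 - n * rmsep^2 / sst     # PRESS = n * RMSEP^2
print(round(r2, 4))
```

The results are close to the values printed by `R2(gaspcr)` for the intercept-only model and for 4 components; small differences come from the rounded inputs. The negative value for the intercept-only model arises because, under LOO, the mean is estimated without the sample being predicted.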
"R": PLS Regression (Gasoline) – 004
> explvar(gaspcr)
    Comp 1     Comp 2     Comp 3     Comp 4     Comp 5     Comp 6     Comp 7     Comp 8     Comp 9    Comp 10
72.5651378 11.3380191  6.9542569  4.5998259  1.2402978  0.9668303  0.4940023  0.3625086  0.3321849  0.2322125
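These per-component percentages accumulate to the "X" row of the "TRAINING: % variance explained" table in the model summary. A small check using only the values printed above:

```r
# Per-component explained X-variance (%), copied from the explvar() output.
explained <- c(72.5651378, 11.3380191, 6.9542569, 4.5998259, 1.2402978,
               0.9668303, 0.4940023, 0.3625086, 0.3321849, 0.2322125)
names(explained) <- paste("Comp", 1:10)

cumulative <- cumsum(explained)   # cumulative % of X-variance
print(round(cumulative, 2))
```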
> plot(gaspcr,comps=1:4,plottype = c("scores"))
"R": PLS Regression (Gasoline) – 005
> plot(gaspcr,comps=1:2,plottype = c("scores"))
We can replace "scores" with "loadings" to plot the four loadings together:
> plot(gaspcr, comps = 1:4, plottype = "loadings", legendpos = "topleft")
We can also plot the regression coefficient spectrum,
> coefplot(gaspcr, comps = 1, legendpos = "topleft")
or print the numeric values:
> coef(gaspcr, comps = 1)
, , Comp 1
octane
900 nm -0.0341446314
902 nm -0.0327240249
904 nm -0.0350492088
906 nm -0.0395840447
908 nm -0.0415126609
910 nm -0.0449274757
912 nm -0.0434251293
914 nm -0.0451249879
916 nm -0.0416846176
918 nm -0.0385643706
920 nm -0.0375470475
922 nm -0.0365454999
924 nm -0.0358456375
...
> residuals(gaspcr)[,,"4 comps"]
1 2 3 4 5
0.03704204 0.33750933 0.22115505 -0.28487712 -0.68587280
6 7 8 9 10
0.01653984 -0.03306587 -0.18136291 -0.20117289 -0.17286853
11 12 13 14 15
0.46050412 0.41050510 -0.02757601 -0.10609027 0.02278801
16 17 18 19 20
-0.08819202 0.50414416 0.20199013 -0.26221683 0.26851495
21 22 23 24 25
0.16139618 -0.34945544 0.01459367 -0.26995777 -0.13343275
26 27 28 29 30
0.02590811 0.04002573 -0.12586186 -0.38973620 -0.13456302
31 32 33 34 35
-0.25159097 -0.06826080 0.07190096 0.18064040 0.11086376
36 37 38 39 40
0.07786016 -0.03073137 0.32008155 0.05643676 0.20914842
41 42 43 44 45
-0.14730593 -0.34578297 0.04821531 0.07854058 -0.05146090
46 47 48 49 50
-0.19527580 -0.19275490 -0.05877137 0.14775051 0.12837901
51 52 53 54 55
0.06664761 0.32048790 0.08680848 0.17854041 -0.11500208
56 57 58 59 60
0.27214465 -0.33164973 -0.25354120 0.38776329 0.02360415
> residuals(gaspcr)  # shows the residuals for all 10 models (1 to 10 components)
> plot(residuals(gaspcr)[,,"4 comps"],xlab="sample",ylab="error")
> qqnorm(residuals(gaspcr)[,,"4 comps"])
> qqline(residuals(gaspcr)[,,"4 comps"])
We divide the whole set into a training set and a test set:
> gasTrain<-gasoline[1:50,]
> gasTest<-gasoline[51:60,]
Let's develop the PCR model with the training set and LOO CV:
> gaspcr1 <- pcr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
> summary(gaspcr1)
Data:    X dimension: 50 401
         Y dimension: 50 1
Fit method: svdpc
Number of components considered: 10
VALIDATION: RMSEP
Cross-validated using 50 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
CV           1.545    1.472    1.483   0.2894   0.2522   0.2622   0.2681   0.2386   0.2328   0.2416    0.2423
adjCV        1.545    1.471    1.482   0.2879   0.2518   0.2618   0.2677   0.2373   0.2323   0.2411    0.2415
TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
X         79.86    88.12    93.54    96.54    97.74    98.38    98.75    99.06    99.28     99.42
octane    16.99    21.36    97.00    97.71    97.73    97.77    98.47    98.54    98.62     98.83
For this exercise we choose 4 components.
Let's predict our test set with this 4-component model:
> predict(gaspcr1, ncomp = 4, newdata = gasTest)
, , 4 comps
octane
51 88.07381
52 87.36530
53 88.30914
54 85.00247
55 85.33157
56 84.59513
57 87.56126
58 86.90745
59 89.21833
60 87.08905
> predplot(gaspcr1,ncomp=4,newdata=gasTest,asp=1,line=TRUE)
Let's look at the RMSEP statistic. It is a very useful tool for deciding whether 4 components are fine or whether we should use more or fewer components.
> RMSEP(gaspcr1, newdata = gasTest)
(Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps
     1.5369   1.3226   1.2568   0.4634   0.2241   0.2283   0.2600   0.2795   0.2434   0.2290    0.2881
The cross-validated RMSEP (CV) for the model with 4 components was 0.252
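For reference, the RMSEP on a test set is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the helper name `rmsep` and the numbers are hypothetical and are not the gasoline results.

```r
# Root mean squared error of prediction for a set of test samples.
rmsep <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Hypothetical values, for illustration only (not the gasoline results).
observed  <- c(88.1, 87.3, 85.2, 84.6, 89.0)
predicted <- c(88.0, 87.5, 85.0, 84.9, 89.1)
cat("RMSEP:", rmsep(observed, predicted), "\n")
```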
PCR
> gaspcr <- pcr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "LOO")
PLSR
> gasplsr <- plsr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "LOO")
...
• References
– An Introduction to R
– Multivariate Statistical Analysis using the R package chemometrics
– R-bloggers: Posts Tagged ‘ "R" Chemometrics ’
• 7 pages
– R Tutorial: an R Introduction to Statistics