Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement
Presentation at the XXXIII Congreso Nacional de Estadística e Investigación Operativa (Madrid, April 2012)
Using R for Statistical Training
17/04/2012
EL Cano, JM Moguerza, A Redchuk
Statistical Training
The Problem
Approaches
The R Choice
The R framework
Sweave
Application
Six Sigma
Examples
Environments
Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement
Emilio L. Cano, Andres Redchuk and Javier M. Moguerza
Departamento de Estadística e Investigación Operativa, Universidad Rey Juan Carlos (Madrid)
XXXIII Congreso Nacional de Estadística e Investigación Operativa
SEIO 2012 1/28
Contents
1 Statistical Training: The Problem, Approaches
2 The R Choice: The R framework, Sweave
3 Application: Six Sigma, Examples, Environments
The Problem: Elements of Statistical Training
Copy-paste Approach: Approaches
Inconsistencies
Errors
Out-of-date
Non-reproducible
Painful changes
Reproducible Research Approach: Approaches
Reproducible Research: The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.
Literate Programming: Literate programming is a methodology that combines a programming language with a documentation language.
Reproducible Research: Workflow
The R System: Choosing R
What is R? R is a language and environment for statistical computing and graphics.
Open Source
Platform independent
Huge community
Extensible
3,730 available packages
http://www.r-project.org
LaTeX, Beamer, PDF: Choosing R
LaTeX: LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation.
Beamer: Beamer is a LaTeX class for creating presentations that are held using a projector, but it can also be used to create transparency slides.
LaTeX files can easily be converted to PDF.
Sweave Documents: An Efficient Framework
Sweave: A Sweave document is a plain-text file which merges LaTeX code and R code. The R function Sweave() converts the Sweave document (*.Rnw) into a LaTeX file (*.tex). The code chunks are executed and the results embedded into the LaTeX file.
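The weaving step can be tried with a minimal sketch. It assumes only base R (Sweave() ships in the utils package); the file name minimal.Rnw, the chunk label and the sample values are made up for illustration and do not come from the presentation.

```r
# Write a tiny, hypothetical Sweave document to a temporary directory.
work_dir <- tempdir()
rnw_file <- file.path(work_dir, "minimal.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The sample mean is computed in the chunk below.",
  "<<samplemean>>=",
  "x <- c(4.1, 3.9, 4.3, 4.0)  # illustrative values only",
  "mean(x)",
  "@",
  "\\end{document}"
), rnw_file)

# Sweave() writes the .tex file into the current working directory,
# so switch to the temporary directory while weaving.
old_wd <- setwd(work_dir)
utils::Sweave(rnw_file)
setwd(old_wd)

# The generated LaTeX file contains the chunk code and its printed result.
tex_lines <- readLines(file.path(work_dir, "minimal.tex"))
cat(tex_lines, sep = "\n")
```

Compiling the resulting minimal.tex (for example with tools::texi2pdf()) would then give the PDF; that last step needs a LaTeX installation.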
Methodology at a Glance: Six Sigma
The Essence: The application of the Scientific Method to process improvement, using an easy language.
DMAIC Cycle: Define, Measure, Analyze, Improve, Control
Roles: Champion, Master Black Belt, Black Belt, Green Belt
SixSigma Package: Six Sigma
[Figure: Six Sigma with R | Paper Helicopter template, with cut, fold, tape and clip marks. Dimension labels: body length and wings length (min 6.5 cm, std 8 cm, max 9.5 cm); body width (min 4 cm, max 6 cm).]
Using packages
Manuals
Data sets
Templates
Learn-by-Code
[Figure: Six Sigma Process Map, Paper Helicopter Project]
INPUTS (X): operators, tools, raw material, facilities
INSPECTION. Inputs: sheets, ... Param. (x): width (NC), operator (C), measure pattern (P), discard (P). Featur. (y): ok
ASSEMBLY. Inputs: sheets. Param. (x): operator (C), cut (P), fix (P), rotor.width (C), rotor.length (C), paperclip (C), tape (C). Featur. (y): weight
TEST. Inputs: helicopter. Param. (x): operator (C), throw (P), discard (P), environment (N). Featur. (y): time
LABELING. Inputs: helicopter. Param. (x): operator (C), label (P). Featur. (y): label
OUTPUTS (Y): helicopter
LEGEND: (C)ontrollable, (Cr)itical, (N)oise, (P)rocedure
Book: Six Sigma
Six Sigma with R: A live example: the entire book has been produced using Sweave.
The roadmap: The DMAIC Cycle
The case study: paper helicopter
SixSigma package: data sets, functions
Easy explanations, further readings
Sweave Example I: Six Sigma Application
\documentclass[a4paper]{article}
\usepackage{Sweave}
\title{Design of Experiments}
\author{EL Cano and JM Moguerza and A Redchuk}
\begin{document}
\maketitle
\section{Introduction}
Design of experiments is the most important tool in the Improve phase of the
DMAIC cycle \ldots.
<<>>=
library(SixSigma)
doe.model1 <- lm(score ~ flour + salt + bakPow +
    flour * salt + flour * bakPow +
    salt * bakPow + flour * salt * bakPow,
    data = ss.data.doe1)
summary(doe.model1)
@
This is the general model:
\begin{equation}
\label{eq:doe:model}
Sweave Example II: Six Sigma Application
y_{ijkl}=\mu+\alpha_i+\beta_j+\gamma_k+(\alpha\beta)_{ij}+
(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}+
\varepsilon_{ijkl},
\end{equation}
And here we have a plot of effects:
<<maineff, echo=FALSE, fig=TRUE>>=
plot(c(-1, 1),
    coef(doe.model1)[1] + c(-1, 1) * coef(doe.model1)[2],
    ylim = range(ss.data.doe1$score),
    type = "b", pch = 16)
abline(h = coef(doe.model1)[1])
@
%\input{section2}
\end{document}
Design of Experiments
EL Cano and JM Moguerza and A Redchuk
April 10, 2012
1 Introduction
Design of experiments is the most important tool in the Improve phase of the DMAIC cycle . . . .
> library(SixSigma)
> doe.model1 <- lm(score ~ flour + salt + bakPow +
+ flour * salt + flour * bakPow +
+ salt * bakPow + flour * salt * bakPow,
+ data = ss.data.doe1)
> summary(doe.model1)
Call:
lm(formula = score ~ flour + salt + bakPow + flour * salt + flour *
bakPow + salt * bakPow + flour * salt * bakPow, data = ss.data.doe1)
Residuals:
Min 1Q Median 3Q Max
-0.5900 -0.2888 0.0000 0.2888 0.5900
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.5150 0.3434 16.061 2.27e-07 ***
flour+ 1.8350 0.4856 3.779 0.005398 **
salt+ -0.8350 0.4856 -1.719 0.123843
bakPow+ -2.9900 0.4856 -6.157 0.000272 ***
flour+:salt+ 0.1700 0.6868 0.248 0.810725
flour+:bakPow+ 0.8000 0.6868 1.165 0.277620
salt+:bakPow+ 1.1800 0.6868 1.718 0.124081
flour+:salt+:bakPow+ 0.5350 0.9712 0.551 0.596779
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4856 on 8 degrees of freedom
Multiple R-squared: 0.9565, Adjusted R-squared: 0.9185
F-statistic: 25.15 on 7 and 8 DF, p-value: 7.666e-05
This is the general model:
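\[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}, \qquad (1) \]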
And here we have a plot of effects:
[Figure: plot of effects. x-axis: c(-1, 1), from -1.0 to 1.0; y-axis: coef(doe.model1)[1] + c(-1, 1) * coef(doe.model1)[2], with ticks from 3 to 7.]
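A side note on the model formula used in the example: in R's formula notation the `*` operator already expands to the main effects plus all their interactions, so the formula written out term by term can be shortened. The sketch below only inspects the two formulas (no data are needed) and checks that they describe the same terms; the object names are illustrative and not part of the original slides.

```r
# Formula as written on the slide, term by term.
long_formula <- score ~ flour + salt + bakPow +
  flour * salt + flour * bakPow +
  salt * bakPow + flour * salt * bakPow

# Shorthand: crossing the three factors.
short_formula <- score ~ flour * salt * bakPow

# Extract the expanded term labels of each formula.
long_terms  <- attr(terms(long_formula), "term.labels")
short_terms <- attr(terms(short_formula), "term.labels")

print(short_terms)
# Both formulas describe the same set of model terms.
setequal(long_terms, short_terms)
```

The seven terms correspond to the three main effects, the three two-way interactions and the three-way interaction of the general model.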
Project Example: Divide and Conquer!
Strategies: Partial Sweave files can be compiled to get partial LaTeX files. R scripts can Sweave .Rnw files and "source" .R files. The final document is obtained by compiling the "master" LaTeX file.
> # Load shared options, helper functions and data
> source("code/myoptions.R")
> source("code/myfunctions.R")
> source("code/mydata.R")
> # Weave each partial .Rnw file into a partial .tex file
> Sweave("rnw/theorem01.Rnw")
> Sweave("rnw/lesson01.Rnw")
> Sweave("rnw/exercises01.Rnw")
> ...
> # Compile the master LaTeX file (texi2pdf() is in the tools package)
> tools::texi2pdf("master.tex")
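The divide-and-conquer strategy can also be scripted so that new partial files are picked up automatically. The following is a minimal sketch under stated assumptions: weave_all() is a hypothetical helper (not part of any package), and the two partial documents are made-up placeholders written to a temporary directory.

```r
# Hypothetical helper: weave every .Rnw file found in rnw_dir and
# return the names of the .tex files present in out_dir afterwards.
weave_all <- function(rnw_dir, out_dir = rnw_dir) {
  rnw_files <- normalizePath(
    list.files(rnw_dir, pattern = "\\.Rnw$", full.names = TRUE)
  )
  old_wd <- setwd(out_dir)        # Sweave() writes to the working directory
  on.exit(setwd(old_wd))          # always restore the previous directory
  for (f in rnw_files) utils::Sweave(f)
  list.files(pattern = "\\.tex$")
}

# Demo with two made-up partial documents in a temporary directory.
demo_dir <- file.path(tempdir(), "rnw-demo")
dir.create(demo_dir, showWarnings = FALSE)
for (name in c("lesson01", "exercises01")) {
  writeLines(
    c(paste0("\\section{", name, "}"), "<<>>=", "1 + 1", "@"),
    file.path(demo_dir, paste0(name, ".Rnw"))
  )
}
tex_files <- weave_all(demo_dir)
print(tex_files)
```

A master LaTeX file could then \input{} these partial files before being compiled with tools::texi2pdf(), which requires a LaTeX installation.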
Some useful extensions: Packages
knitr, pgfSweave: enhanced options for Sweave
RGIFT: Automatic generation of questionnaires for Moodle
exams: Automatic generation of printable exams
odfWeave: Open Document format documents generation
More in the "Reproducible Research" Task View at CRAN: http://cran.r-project.org/web/views/ReproducibleResearch.html
R GUI: Integrated Environments
RStudio: Integrated Environments
Emacs + ESS: Integrated Environments
Eclipse + StatET: Integrated Environments
Summary
Statistical training entails some challenges regarding contents and materials.
R is the perfect partner for statistical training.
Reproducible research and literate programming enhance the quality of training materials.
The use of R and LaTeX through Sweave comprises a complete framework for generating statistical documentation.
Extensions and integrated environments make it easy to exploit R's capabilities.
Acknowledgements
R Core Team and R enthusiasts in general.
Springer
This work has been partially funded by the projects:
AGORANET project (IPT-430000-2010-32)
VRTUOSI (www.vrtuosi.org: 502869-LLP-1-2009-ES-ERASMUS-EVC)
HAUS: IPT-2011-1049-430000
EDUCALAB: IPT-2011-1071-430000
DEMOCRACY4ALL: IPT-2011-0869-430000
CORPORATE COMMUNITY: IPT-2011-0871-430000
Discussion
Thanks for your attention!