statistical programming languages -- day 2 svn-revision: 0uweziegenhagen.de/academic/r/day2.pdf ·...
TRANSCRIPT
![Page 1: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/1.jpg)
Statistical Programming Languages – Day 2SVN-revision: 0
Uwe Ziegenhagen
Institut für Statistik and ÖkonometrieHumboldt-Universität zu Berlinhttp://www.uweziegenhagen.de
![Page 2: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/2.jpg)
Day 2 1-1
Agenda for Today
� Data frames� Reading and Writing Data� Exploratory Data Analysis
Statistics
![Page 3: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/3.jpg)
Day 2 1-2
Data Frames
� matrices can only have one datatype� data frames: several type allowed� equal length of all elements required
Statistics
![Page 4: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/4.jpg)
Day 2 1-3
Data Frame Example
1 a <- c(10 ,20 ,15 ,43 ,76 ,41 ,25 ,46) # numeric2 # Factor ’sex ’3 b <- factor(c("m","f","m","f","m","f","m","f"))4 # siblings , numeric5 c <- c(2,5,8,3,6,1,5,6)6 myframe <- data.frame(a,b,c)7 myframe8 colnames(myframe) <- c("Age", "Sex", "Siblings")
Statistics
![Page 5: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/5.jpg)
Day 2 1-4
Factor variables
� Factor variables: categorical variables (numeric or string )� advantages:
I implemented correctly in statistical modelingI very useful in many different types of graphicsI correct number of degrees of freedom
Statistics
![Page 6: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/6.jpg)
Day 2 1-5
Adressing Components
1 myframe [,1]2 myframe["Age"]3 myframe$Age4 myframe [3,3]<-2 # change value5 myframe[,-2] # all vars except 2nd
Statistics
![Page 7: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/7.jpg)
Day 2 1-6
Overview of Objects II
1 # add object set to search path2 attach(name)3 # remove from search path4 detach(name)
Statistics
![Page 8: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/8.jpg)
Day 2 1-7
Subgrouping Data Frames
1 > subset(myframe ,myframe$Age>30) # 4 entries2 > mean(subset(myframe$Age ,myframe$Sex=="m"))3 [1] 31.54 > mean(subset(myframe$Age ,myframe$Sex=="f"))5 [1] 37.56 myframe [( myframe$Sex=="m") & (myframe$Age>30) ,]7 # males over 308 myframe [( myframe$Sex=="m") | (myframe$Age>30) ,]9 # male or over 30
Statistics
![Page 9: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/9.jpg)
Day 2 1-8
Data Frames - Variables
1 > myframe <- cbind (myframe , "Income (USD)"=2 c(1700 ,2100 ,2300 ,2050 ,2800 ,1450 ,3400 ,2000))3 > names(myframe)[names(myframe)=="Income (USD)"] <-
"IncomeUSD"
Task: Add variable IncomeEUR.
Statistics
![Page 10: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/10.jpg)
Day 2 1-9
Search and Replace
Use gsub to perform replacement of matches determined byregular expressions.
1 > names(myframe) <- gsub("In","Out",names(myframe))2 > myframe3 Age Sex Siblings OutcomeUSD4 1 10 m 2 17005 2 20 f 5 2100
Statistics
![Page 11: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/11.jpg)
Day 2 1-10
Deleting and Sorting
1 > myframe$Age <- NULL2 > myframe3
4 > myframe[order(myframe$Age) ,]5 Age Sex Siblings OutcomeUSD6 1 10 m 2 17007 3 15 m 2 2300
Statistics
![Page 12: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/12.jpg)
Day 2 1-11
Deleting and Sorting
1. Sortieren nach Sex2. Sortieren nach Age
1 > myframe[order(myframe$Sex ,partial=myframe$Age) ,]2 Age Sex Siblings OutcomeUSD3 2 20 f 5 21004 6 41 f 1 14505 8 46 f 6 20006 4 43 f 3 20507 1 10 m 2 17008 3 15 m 2 23009 5 76 m 6 280010 7 25 m 5 3400
Statistics
![Page 13: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/13.jpg)
Day 2 1-12
Short excursion sed & awk
sed:
� stream editor, rowwise� examples
I sed ’s/abc/def/’ input.txt >output.txtI sed ’s|/|\|g’ input.txt >output.txtI example using regular expression
� extremely useful to process large amounts of data� Tutorial: http://www.grymoire.com/Unix/Sed.html
Statistics
![Page 14: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/14.jpg)
Day 2 1-13
Short excursion sed & awk
awk:
� Aho, Weinberger, Kernighan� in general used to work on columns
I awk ’print 12’ concatenates columns 1 and 2I awk ’print 1,3’ prints columns 1 and 3I another example using a sum
� Tutorial: http://www.vectorsite.net/tsawk.html
In general: Avoid data processing inside R, try to do it outside.
Statistics
![Page 15: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/15.jpg)
Day 2 1-14
Data Management
� Sources of data:I Data in human readable format (CSV, TXT)I Data in binary format (Excel, SPSS, STATA)I Data from relational databases
� R has 100 built-in datasets: objects(package:datasets)� many packages bring their own datasets
Statistics
![Page 16: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/16.jpg)
Day 2 1-15
Loading data from library
1 library("datasets") # loads dataset library2 # (automatically loaded)3 data("pressure") # loads dataset4 data(pressure) # alternative5 pressure # output pressure data
Statistics
![Page 17: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/17.jpg)
Day 2 1-16
Data Management
1 objects(package:datasets)2 help(Titanic)3 data(Titanic)4 objects ()
Statistics
![Page 18: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/18.jpg)
Day 2 1-17
Reading & Writing Data
1 data <- read.table("filename", header=TRUE)2 # guess type of variable: int , double , text3 # header with columnnames is available4 names(data) # variable names5 str(data) # show structure of dataframe6 head(data) # show first rows
Statistics
![Page 19: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/19.jpg)
Day 2 1-18
Reading & Writing Data
1 # check if datafile has header2 # may generate string matrix only3 # if 1st row less than 2nd assume head4 data <- read.table("filename")
Statistics
![Page 20: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/20.jpg)
Day 2 1-19
Reading & Writing Data
1 # using wrong separator2 data <- read.table("file", sep="\t")3 # assumes tabulator , may read whole4 # line as one variable
Statistics
![Page 21: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/21.jpg)
Day 2 1-20
Reading & Writing Data
1 # reading NaNs2 data <- read.table("file", na.strings=".")3 # assumes NaN to be represented as ’.’
Statistics
![Page 22: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/22.jpg)
Day 2 1-21
Reading & Writing Data
1 # reading CSV2 # decimal sep ’.’, var. sep ’,’3 data <- read.csv("file") #4 # decimal sep ’,’, var. sep ’;’5 data2 <- read.csv2("file")6 # direct import from Excel7 data <- read.table(file="clipboard")
Statistics
![Page 23: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/23.jpg)
Day 2 1-22
Reading & Writing Data
1 x <- read.csv("beispiel.csv", sep=";")2 dim(x)3 names(x)4 x5 # write to file6 write.table(x,file="test.csv",sep=";",7 row.names= FALSE , quote=FALSE)
Statistics
![Page 24: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/24.jpg)
Day 2 1-23
Univariate Statistics
ddistrib density functionpdistrib distribution functionqdistrib quantile functionrdistrib random numbers
Statistics
![Page 25: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/25.jpg)
Day 2 1-24
Univariate Statistics
1 dnorm (0) # density value of N(0,1)2 pnorm (0) # cum. density up to 03 qnorm (0.5) # quantile for 0.54 rnorm (100) # vector with 100 random numbers
Statistics
![Page 26: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/26.jpg)
Day 2 1-25
Univariate Statistics
ddistrib density functionpdistrib distribution functionqdistrib quantile functionrdistrib random numbers
Statistics
![Page 27: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/27.jpg)
Day 2 1-26
Distributions in standard R
<key>binom Binomial<key>chisq Chi-Squared<key>exp Exponential
<key>f F<key>hyper Hypergeometric
<key>multinom Multinomial<key>logis Logistic<key>norm Normal<key>pois Poisson
<key>t Student t<key>unif Uniform
with <key>= d, p, q or r
Statistics
![Page 28: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/28.jpg)
Day 2 1-27
Empirical Distributions in R
1 density () # KDE using Gaussian kernel2 ecdf() # empirical cdf
Statistics
![Page 29: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/29.jpg)
Day 2 1-28
Sampling in R
1 sample(n) # sample 1:n vector2 sample(x) # shuffle the x vector3 sample(x, replace=TRUE) bootstrap x vector4 sample(x, n) # draw sample of size n from x5 sample(x, n, replace = TRUE) # bootstrap sample from
x
Seed is stored in .Random.seed, for simulations use set.seed()
Statistics
![Page 30: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/30.jpg)
Day 2 1-29
Summary statistics
1 mean(x) # mean *2 median(x) # median3 var(x) # sample variance4 sd(x) # sample std. deviation5 cov(y) # cov of matrix y6 quantile(x,p) # sample quantile *7 min(x) # minimum of x *8 max(x) # maximum of x *9 range() # range of x *10 skewness(x) # skewness11 kurtosis(x) # kurtosis
* can remove NaNs using parameter na.rm=T
Statistics
![Page 31: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/31.jpg)
Day 2 1-30
Linear Regression
linear regression model
� tries to model relation between dependent variable Y and1 . . . n indep. variables X1, . . . ,Xn
� influence of variables is linear, first regressor X1 usually set toconstant
� sample of size n is fitted to model:
yi = β1 + β2 · x2 + · · ·+ βn · xn + εi
yi = x>i β + εi
y = Xβ + ε
Statistics
![Page 32: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/32.jpg)
Day 2 1-31
Linear Regression
Goals:
� estimate unknown βs using least squares� decide if all variables are needed� check if resulting model explains data well enough� use model to forecast
β̂ = (X>X )−1X>y
Statistics
![Page 33: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/33.jpg)
Day 2 1-32
Linear Regression
1 # standard model2 lm(y~x + z)3 # no intercept4 lm(y~x - 1)5 # using data frame6 lm(amount ~ price , data = consumption)7 # using data frame and attach ()8 lm(amount ~ price)
Statistics
![Page 34: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/34.jpg)
Day 2 1-33
Exercise
Download Hubble data fromhttp://lib.stat.cmu.edu/DASL/Datafiles/Hubble.html andestimate the hubble constant H by the model
recession-velocity = H · distance
Statistics
![Page 35: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/35.jpg)
> summary(lm)
Call:lm(formula = rec.vel ~ distance - 1)
Residuals:Min 1Q Median 3Q Max
-411.544 -191.302 -7.103 127.951 496.063
Coefficients:Estimate Std. Error t value Pr(>|t|)
distance 423.94 42.15 10.06 6.87e-10 ***---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 229 on 23 degrees of freedomMultiple R-squared: 0.8147, Adjusted R-squared: 0.8067F-statistic: 101.1 on 1 and 23 DF, p-value: 6.869e-10
![Page 36: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/36.jpg)
Day 2 1-35
Residuals
5 10 15 20
−40
0−
200
020
040
0
Index
resi
dual
s(lm
)
Statistics
![Page 37: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/37.jpg)
Day 2 1-36
Residuals
−2 −1 0 1 2
−40
0−
200
020
040
0
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Statistics
![Page 38: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/38.jpg)
Multiple Linear Regression
1. Download Cereal data from DASL2. Read data as dataframe3. Run linear regression rating = sugars + fat
![Page 39: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/39.jpg)
Call:lm(formula = rating ~ sugars + fat, data = data)
Residuals:Min 1Q Median 3Q Max
-14.6640 -5.6937 0.2078 4.7660 32.6163
Coefficients:Estimate Std. Error t value Pr(>|t|)
(Intercept) 61.0886 1.9527 31.284 < 2e-16 ***sugars -2.2128 0.2347 -9.428 2.59e-14 ***fat -3.0658 1.0365 -2.958 0.00416 **---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 8.755 on 74 degrees of freedomMultiple R-squared: 0.6218, Adjusted R-squared: 0.6116F-statistic: 60.84 on 2 and 74 DF, p-value: 2.371e-16
![Page 40: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/40.jpg)
Day 2 1-39
t-Test
� checks if certain coefficient βj is different from 0.� teststatistics
t =β̂j
SD(β̂j)
� under H0 : t ∼ tn−p with p as number of independ. variables
Statistics
![Page 41: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/41.jpg)
Day 2 1-40
F -Test
� idea: check if sum of squared residuals is reduced significantlyif one regressor is added
� add one regressor ⇒ model gets better, but significantly?� Compute RSS1 for full model with k parameters, compute
RSS2 for simplified model with k − q parameters� Compute teststatistics
F =(RSS2− RSS1)/q
RSS1/(n − k)
� under H0 : F ∼ F(n−1,n−q−1)
� for q = 1 F-test equivalent to t-Test
Statistics
![Page 42: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/42.jpg)
Day 2 1-41
Residuals
0 20 40 60 80
−10
010
2030
Index
resi
dual
s(he
alth
)
Statistics
![Page 43: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/43.jpg)
Day 2 1-42
Residuals
−2 −1 0 1 2
−10
010
2030
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Statistics
![Page 44: Statistical Programming Languages -- Day 2 SVN-revision: 0uweziegenhagen.de/academic/r/day2.pdf · Day 2 1-4 Factorvariables Factorvariables: categoricalvariables(numericorstring)](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77c2d4d520541b6a5bcf0b/html5/thumbnails/44.jpg)
Day 2 1-43
Residuals
−2 −1 0 1 2
−1
01
23
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Statistics