Package ‘bstats’

February 15, 2013

Version 1.0-12-3

Date 2011-10-31

Title Basic statistical functions for R

Author Bin Wang <[email protected]>.

Maintainer Bin Wang <[email protected]>

Description This package collects commonly used procedures or algorithms for general data analysis. In addition, routines for linear regression analysis, statistical computing and graphics, and many others have been implemented in R for some courses taught at the University of South Alabama.

License Unlimited

Repository CRAN

Date/Publication 2011-12-04 09:26:34

NeedsCompilation yes

R topics documented:

ac
birth
bptest
bstats
dw.test
edf
edu75
influential.plot
ld50.logit
ld50.logitfit
lm.ci
mediation.test
model.check
model.test
oddsratio
predictor.plot
residual.plot
river
scb
supervisor
vif
white.test
wls
Index

ac Autocorrelation

Description

Removal of autocorrelation by transformation.

Usage

ac(lmobj, type='cochrane', ...)

## S3 method for class 'lm'
ac(lmobj, type='cochrane', ...)

Arguments

lmobj an object that inherits from class lm, such as an lm or glm object.

type method selection: ’iterative’, ’cochrane’.

... not used.

Details

’iterative’: simultaneously estimate the regression coefficients and rho by minimizing the sum of squared errors. A grid search is used.

’cochrane’:

1. Fit a linear regression model and compute the OLS estimates.
2. Calculate the residuals and use them to estimate rho.
3. Fit the model to the data transformed with the estimated rho to obtain new estimates of the regression coefficients.
4. Check whether autocorrelation still exists. If it does, repeat from step 1 using the coefficients estimated in step 3.
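The ’cochrane’ steps can be illustrated with a minimal base R sketch. This is not the code used by ac; the helper cochrane_step and the simulated data are hypothetical, the model is restricted to a single predictor, and only one pass (steps 1 to 3) is performed.

```r
# Hypothetical helper (not a bstats function): one Cochrane-Orcutt pass
# for a simple regression y = b0 + b1 * x + e with AR(1) errors.
cochrane_step <- function(y, x) {
  n <- length(y)
  e <- resid(lm(y ~ x))                   # step 1: OLS fit and residuals
  rho <- sum(e[-1] * e[-n]) / sum(e^2)    # step 2: estimate rho from residuals
  ys <- y[-1] - rho * y[-n]               # step 3: transform the data ...
  xs <- x[-1] - rho * x[-n]
  b <- unname(coef(lm(ys ~ xs)))          # ... and refit
  # The transformed intercept equals b0 * (1 - rho), so rescale it.
  list(rho = rho, intercept = b[1] / (1 - rho), slope = b[2])
}

# Simulated data with AR(1) errors (rho = 0.7); values are illustrative only.
set.seed(1)
n <- 200
x <- seq_len(n) / 10
err <- as.numeric(stats::filter(rnorm(n), 0.7, method = "recursive"))
y <- 1 + 2 * x + err
out <- cochrane_step(y, x)
```

Step 4 would apply a Durbin-Watson check to the refitted model and call the helper again if autocorrelation remains.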

Value

coefficients, rhohat, dwtest, re-fitted model.

Author(s)

Wang, B.

References

Cochrane and Orcutt (1949)

St 335 text

Examples

data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
ac.lm(lm0, type='iterative')
ac.lm(lm0, type='cochrane')

birth Birth data

Description

Birth data for singleton live births with gestational age at least 38 weeks.

Usage

data(birth)

Format

A data frame with 400 observations on 9 variables.

Sex           character  ’male’ or ’female’
Gestation     numeric    Gestational age (in weeks).
Weight        numeric    birth weight.
Length        numeric    height.
Head          numeric    head size.
Chest         numeric    chest size.
Mother.s.age  numeric    mother’s age.
type          factor     ’r’ = rural or ’u’ = urban.
region        factor     region of the birth.

References

Wang, CSDA and JSS papers.

bptest Breusch-Pagan Test

Description

Performs the Breusch-Pagan test against heteroskedasticity.

Usage

bptest(formula, varformula = NULL, studentize = TRUE, data = list())

Arguments

formula a symbolic description for the model to be tested (or a fitted "lm" object).

varformula a formula describing only the potential explanatory variables for the variance (no dependent variable needed). By default the same explanatory variables are taken as in the main regression model.

studentize logical. If set to TRUE Koenker’s studentized version of the test statistic will be used.

data an optional data frame containing the variables in the model. By default the variables are taken from the environment which bptest is called from.

Details

The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (by default the same explanatory variables are taken as in the main regression model) and rejects if too much of the variance is explained by the additional explanatory variables.

Under H0 the test statistic of the Breusch-Pagan test follows a chi-squared distribution with parameter (the number of regressors without the constant in the model) degrees of freedom.
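As a rough illustration of the studentized (Koenker) form of the statistic, the following base R sketch regresses the squared OLS residuals on the regressor and uses n times the R-squared of that auxiliary fit. It is a hand computation under these assumptions, not the internals of bptest; the variable names are made up for the example.

```r
# Simulated data with variance that depends on the observation index.
set.seed(42)
x <- rep(c(-1, 1), 50)
y <- 1 + x + rnorm(100, sd = rep(c(1, 2), 50))

fit <- lm(y ~ x)                        # main regression
aux <- lm(resid(fit)^2 ~ x)             # auxiliary regression on squared residuals
bp_stat <- length(y) * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1          # regressors without the constant
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
```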

Value

A list with class "htest" containing the following components:

statistic  the value of the test statistic.
p.value    the p-value of the test.
parameter  degrees of freedom.
method     a character string indicating what type of test was performed.
data.name  a character string giving the name(s) of the data.

References

T.S. Breusch & A.R. Pagan (1979), A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica 47, 1287–1294.

R. Koenker (1981), A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics 17, 107–112.

W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.

Examples

## generate a regressor
x <- rep(c(-1,1), 50)
## generate heteroskedastic and homoskedastic disturbances
err1 <- rnorm(100, sd=rep(c(1,2), 50))
err2 <- rnorm(100)
## generate a linear relationship
y1 <- 1 + x + err1
y2 <- 1 + x + err2
## perform Breusch-Pagan test
bptest(y1 ~ x)
bptest(y2 ~ x)

bstats R package: bstats

Description

This package contains R functions written for convenient use in class, especially for my courses st 315, st 210, st 335 and st 475/575.

Author(s)

B. Wang <[email protected]>

dw.test Durbin-Watson Test

Description

Performs the Durbin-Watson test for autocorrelation of disturbances.

Usage

dw.test(formula, order.by = NULL, alternative = c("greater", "two.sided", "less"),
        iterations = 15, exact = NULL, tol = 1e-10, data = list())

Arguments

formula a symbolic description for the model to be tested (or a fitted "lm" object).

order.by Either a vector z or a formula with a single explanatory variable like ~ z. The observations in the model are ordered by the size of z. If set to NULL (the default) the observations are assumed to be ordered (e.g., a time series).

alternative a character string specifying the alternative hypothesis.

iterations an integer specifying the number of iterations when calculating the p-value with the "pan" algorithm.

exact logical. If set to FALSE a normal approximation will be used to compute the p-value, if TRUE the "pan" algorithm is used. The default is to use "pan" if the sample size is < 100.

tol tolerance. Eigenvalues computed have to be greater than tol to be treated as non-zero.

data an optional data frame containing the variables in the model. By default the variables are taken from the environment which dw.test is called from.

Details

The Durbin-Watson test has the null hypothesis that the autocorrelation of the disturbances is 0. It is possible to test against the alternative that it is greater than, not equal to, or less than 0, respectively. This can be specified by the alternative argument.

Under the assumption of normally distributed disturbances, the null distribution of the Durbin-Watson statistic is the distribution of a linear combination of chi-squared variables. The p-value is computed using the Fortran version of Applied Statistics Algorithm AS 153 by Farebrother (1980, 1984). This algorithm is called "pan" or "gradsol". For large sample sizes the algorithm might fail to compute the p-value; in that case a warning is printed and an approximate p-value will be given; this p-value is computed using a normal approximation with mean and variance of the Durbin-Watson test statistic.

For an overview on R and econometrics see Racine & Hyndman (2002).
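The test statistic itself is simple to compute by hand. The sketch below is base R only and does not reproduce the "pan" p-value calculation; it forms the Durbin-Watson statistic from OLS residuals and compares it with the lag-1 residual autocorrelation. All object names are illustrative.

```r
# Simulated regression with strongly autocorrelated AR(1) errors.
set.seed(7)
x <- rep(c(-1, 1), 50)
err <- as.numeric(stats::filter(rnorm(100), 0.9, method = "recursive"))
y <- 1 + x + err

e <- resid(lm(y ~ x))
n <- length(e)
dw <- sum(diff(e)^2) / sum(e^2)            # Durbin-Watson statistic
rho1 <- sum(e[-1] * e[-n]) / sum(e^2)      # lag-1 residual autocorrelation
# dw is close to 2 * (1 - rho1); the difference is an end-point correction.
end_corr <- (e[1]^2 + e[n]^2) / sum(e^2)
```

Values of the statistic well below 2 point to positive autocorrelation, values near 2 to none.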

Value

An object of class "htest" containing:

statistic the test statistic.

p.value the corresponding p-value.

method a character string with the method used.

data.name a character string with the data name.

References

J. Durbin & G.S. Watson (1950), Testing for Serial Correlation in Least Squares Regression I. Biometrika 37, 409–428.

J. Durbin & G.S. Watson (1951), Testing for Serial Correlation in Least Squares Regression II. Biometrika 38, 159–178.

J. Durbin & G.S. Watson (1971), Testing for Serial Correlation in Least Squares Regression III. Biometrika 58, 1–19.

R.W. Farebrother (1980), Pan’s Procedure for the Tail Probabilities of the Durbin-Watson Statistic (Corr: 81V30 p189; AS R52: 84V33 p363-366; AS R53: 84V33 p366-369). Applied Statistics 29, 224–227.

R.W. Farebrother (1984), [AS R53] A Remark on Algorithms AS 106 (77V26 p92-98), AS 153 (80V29 p224-227) and AS 155: The Distribution of a Linear Combination of χ² Random Variables (80V29 p323-333). Applied Statistics 33, 366–369.

W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.

J. Racine & R. Hyndman (2002), Using R To Teach Econometrics. Journal of Applied Econometrics 17, 175–189.

See Also

lm

Examples

## generate two AR(1) error terms with parameter
## rho = 0 (white noise) and rho = 0.9 respectively
err1 <- rnorm(100)

## generate regressor and dependent variable
x <- rep(c(-1,1), 50)
y1 <- 1 + x + err1

## perform Durbin-Watson test
dw.test(y1 ~ x)

err2 <- filter(err1, 0.9, method="recursive")
y2 <- 1 + x + err2
dw.test(y2 ~ x)

edf To compute the empirical distribution function.

Description

To compute the empirical distribution function.

Usage

edf(x,y=NULL)

Arguments

x A sample. ’NA’ values will be automatically removed.

y A grid of points where the edf will be evaluated.
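For orientation, the quantity being computed can be sketched in base R as the proportion of non-missing sample values at or below each grid point. The toy vectors are made up, and the built-in ecdf is used only as a cross-check; the object returned by edf may be structured differently.

```r
x <- c(3, 1, NA, 2, 2)      # sample with one missing value
grid <- c(0, 1, 2, 3)       # points at which to evaluate the edf

x_clean <- x[!is.na(x)]     # drop NA values first
Fhat <- sapply(grid, function(t) mean(x_clean <= t))

# Cross-check against the built-in empirical cdf.
Fhat_builtin <- ecdf(x_clean)(grid)
```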

Author(s)

B. Wang <[email protected]>

See Also

scb.

Examples

x = rnorm(100)
(out = edf(x))
plot(out)
(out2 = scb(out))
lines(out2)

edu75 Education expenditure data (1975)

Description

Education expenditure data for all 50 states in the U.S.A. in 1975.

Usage

data(edu75)

Format

A data frame with 50 observations on 6 variables.

States  character  Initial of state names
Y       numeric    Educational expenditure.
X1      numeric    X1.
X2      numeric    X2.
X3      numeric    X3.
Region  character  region, 1=northwest, 2,3,4.

References

Stat 335 text

influential.plot Draw plots for the influence measures

Description

Draw plots for the influence measures.

Usage

influential.plot(lmobj, type='hadi', ID=FALSE, col=1)

Arguments

lmobj An R object by fitting an OLS model to a data set.

type Plot type. ’hadi’: Hadi’s influence measure; ’potential-residual’: potential-residual plot; ’dfits’: DFITS plot; ’hat’: leverage plot; ’cook’: Cook’s distance.

ID Whether to identify points in the plots. Default: FALSE

col Color of the plot.

Value

Output the influence measures, including leverage values (Leverage), Hadi’s measure (Hadi), the Welsch and Kuh measure (DFIT) and Cook’s distance (CookD). In addition, the standard residuals are also exported.
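Most of these measures have base R counterparts. The sketch below assembles a comparable table using hatvalues, dffits, cooks.distance and rstandard. Hadi’s measure is computed from the formula in Chatterjee and Hadi’s text as recalled here, so treat that line as an assumption. The simulated data frame and column names are illustrative, not the output format of influential.plot.

```r
set.seed(3)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + d$x1 - d$x2 + rnorm(30)
fit <- lm(y ~ x1 + x2, data = d)

h <- hatvalues(fit)                              # leverage values
p <- length(coef(fit)) - 1                       # number of predictors
dn <- resid(fit) / sqrt(sum(resid(fit)^2))       # normalized residuals
# Assumed form of Hadi's measure: potential term plus residual term.
hadi <- h / (1 - h) + (p + 1) / (1 - h) * dn^2 / (1 - dn^2)

measures <- data.frame(Leverage = h,
                       Hadi = hadi,
                       DFIT = dffits(fit),
                       CookD = cooks.distance(fit),
                       StdResid = rstandard(fit))
```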

Author(s)

B. Wang <[email protected]>

See Also

residual.plot.

Examples

data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
influential.plot(lm0)
influential.plot(lm0, type='hadi')
influential.plot(lm0, type='potential')
influential.plot(lm0, type='leve')
influential.plot(lm0, type='dfit')
influential.plot(lm0, type='cook')
influential.plot(lm0, type='potential', ID=TRUE)

ld50.logit Predict Doses for Binomial Assay model (using counts)

Description

Calibrate binomial assays, generalizing the calculation of LD50 based on a logistic regression model.

Usage

ld50.logit(ndead, ntotal, dose, cf = 1:2, p = 0.5)

Arguments

ndead A vector of number of failures.

ntotal Total number of trials.

dose A vector of dosages.

cf The terms in the coefficient vector giving the intercept and coefficient of (log-)dose

p Probabilities at which to predict the dose needed.
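The underlying calculation can be sketched with glm: fit a logistic regression of the death counts on dose, then solve logit(p) = b0 + b1 * dose for the dose. This is a base R illustration in the spirit of Venables and Ripley, not the body of ld50.logit; here cf simply picks the intercept and slope from the coefficient vector.

```r
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
n <- 20

# Logistic regression on (dead, alive) counts.
fit <- glm(cbind(numdead, n - numdead) ~ ldose, family = binomial)

cf <- 1:2                      # positions of intercept and dose coefficient
p <- 0.5                       # target probability
b <- coef(fit)[cf]
ld <- unname((qlogis(p) - b[1]) / b[2])   # dose at which P(death) = p

# The fitted probability at that dose should equal p.
p_check <- unname(predict(fit, newdata = data.frame(ldose = ld), type = "response"))
```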

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.

Examples

ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
n = 20

ld50.logit(numdead, n, ldose, p = 0.5)

ld50.logitfit Predict Doses for Binomial Assay model (using counts)

Description

Calibrate binomial assays, generalizing the calculation of LD50 based on a logistic regression model.

Usage

ld50.logitfit(rate, dose, p = 0.5)

Arguments

rate A vector of percentages of successes among all trials.

dose A vector of dosages.

p Probabilities at which to predict the dose needed.

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.

Examples

ldose <- rep(0:5, 2)
rate <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)/20

ld50.logitfit(rate, ldose, p = 0.5)

lm.ci To compute the confidence intervals of the regression parameters.

Description

To compute the confidence intervals of the regression parameters.

Usage

lm.ci(lmobj,level=0.95)

Arguments

lmobj An R object by fitting a linear regression model to a data set.

level Confidence level. Default: 0.95.
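The intervals in question are the usual t-based ones, estimate plus or minus a t quantile times the standard error. A minimal base R sketch with simulated data and made-up variable names follows; confint is used only as a cross-check, and the printed layout of lm.ci may differ.

```r
set.seed(11)
w <- rnorm(50, mean = 3.3, sd = 0.5)
head_size <- 30 + 1.2 * w + rnorm(50, sd = 0.5)
fit <- lm(head_size ~ w)

level <- 0.95
est <- coef(summary(fit))
tcrit <- qt(1 - (1 - level) / 2, df = df.residual(fit))
ci <- cbind(lower = est[, "Estimate"] - tcrit * est[, "Std. Error"],
            upper = est[, "Estimate"] + tcrit * est[, "Std. Error"])

ci_builtin <- confint(fit, level = level)   # same intervals from base R
```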

Author(s)

B. Wang <[email protected]>

See Also

model.test.

Examples

data(birth)
attach(birth)
lm0 = lm(Head~Weight)
lm.ci(lm0)
lm1 = lm(Head~Weight+Gestation)
lm.ci(lm1, level=0.99)

mediation.test The Sobel mediation test

Description

To compute statistics and p-values for the Sobel test. Results for three versions of "Sobel test" are provided: Sobel test, Aroian test and Goodman test.

Usage

mediation.test(mv,iv,dv)

Arguments

mv The mediator variable.

iv The independent variable.

dv The dependent variable.

Details

To test whether a mediator carries the influence of an IV to a DV.
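The three statistics differ only in the variance term. The base R sketch below uses the standard textbook formulas with simulated data: a is the slope of mv on iv, b is the slope of dv on mv adjusting for iv, and sa and sb are their standard errors. It is an illustration of the formulas, not the implementation in mediation.test.

```r
set.seed(5)
iv <- rnorm(200)
mv <- 0.5 * iv + rnorm(200)
dv <- 0.4 * mv + 0.2 * iv + rnorm(200)

fa <- coef(summary(lm(mv ~ iv)))          # path a: IV -> mediator
fb <- coef(summary(lm(dv ~ mv + iv)))     # path b: mediator -> DV, given IV
a <- fa["iv", "Estimate"]; sa <- fa["iv", "Std. Error"]
b <- fb["mv", "Estimate"]; sb <- fb["mv", "Std. Error"]

base_var <- b^2 * sa^2 + a^2 * sb^2
z <- c(Sobel   = a * b / sqrt(base_var),
       Aroian  = a * b / sqrt(base_var + sa^2 * sb^2),
       Goodman = a * b / sqrt(base_var - sa^2 * sb^2))
pval <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
```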

Value

Missing values are not allowed.

Author(s)

B. Wang <[email protected]>

References

MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17, 144-158.

MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures. Multivariate Behavioral Research, 30, 41-62.

Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717-731.

Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, Instruments, & Computers, 40, 879-891.

Examples

mv = rnorm(100)
iv = rnorm(100)
dv = rnorm(100)
mediation.test(mv, iv, dv)

model.check Linear Regression Model Check

Description

Performs tests to check the least squares assumptions for a linear regression model.

Usage

model.check(lmobj)

Arguments

lmobj A fitted model

Details

In this function, we check the normality, independence, and constant variance assumptions of the error terms, and the presence of multicollinearity.

Value

A list with class "htest" containing the following components:

statistic  the value of the test statistic.
p.value    the p-value of the test.
parameter  degrees of freedom.
method     a character string indicating what type of test was performed.
data.name  a character string giving the name(s) of the data.

References

To be updated.

Examples

data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
model.check(lm0)

model.test To compare two models and determine which one is adequate.

Description

To compare a full model and reduced model to test whether the reduced model is adequate or not.

Usage

model.test(fmobj,rmobj,alpha=0.05)

Arguments

fmobj An R object by fitting a full linear regression model (FM) to a data set.

rmobj An R object by fitting a reduced linear regression model (RM) to a data set.

alpha Significance level. Default: alpha=0.05.

Details

To test a null hypothesis "H0: the RM is adequate" against "H1: the FM is adequate". The values of test statistic, p-value and critical value based on an F test will be given.
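The F test compares the residual sums of squares of the two nested fits. A minimal base R sketch with simulated data and illustrative object names is given below; anova on the two fits gives the same statistic, while the exact output of model.test is not reproduced here.

```r
set.seed(9)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 0.5 * d$x3 + rnorm(n)

fm <- lm(y ~ x1 + x2 + x3, data = d)      # full model (FM)
rm_fit <- lm(y ~ x1 + x3, data = d)       # reduced model (RM)
alpha <- 0.05

sse_f <- deviance(fm)                     # residual sum of squares, FM
sse_r <- deviance(rm_fit)                 # residual sum of squares, RM
df_num <- df.residual(rm_fit) - df.residual(fm)
df_den <- df.residual(fm)
Fstat <- ((sse_r - sse_f) / df_num) / (sse_f / df_den)
pval <- pf(Fstat, df_num, df_den, lower.tail = FALSE)
crit <- qf(1 - alpha, df_num, df_den)     # reject H0 when Fstat > crit

tab <- anova(rm_fit, fm)                  # base R cross-check
```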

Value

Missing values are not allowed.

Author(s)

B. Wang <[email protected]>

See Also

lm.ci.

Examples

data(supervisor)
attach(supervisor)
lm0 = lm(Y~X1+X3)
lm1 = lm(Y~X1+X2+X3+X4+X5+X6)
model.test(lm1, lm0)

oddsratio Odds Ratio and Relative Risk

Description

To compute the odds ratio and relative risk based on a 2 X 2 table.

Usage

oddsratio(x,alpha=0.05,n,...)

Arguments

x      A vector of length 2 of the number of events from the case and control studies.
n      A vector of length 2 of the sample sizes.
alpha  The significance level. Default: 0.05.
...    Controls

Details

x can be a matrix or a data.frame: the first column showing the number of events and the second column showing the sample sizes.

Exact confidence limits for the odds ratio are computed using an algorithm based on Thomas (1971). See also Gart (1971). If the sample sizes are too large, the exact confidence interval may fail because of an overflow problem.

Asymptotic confidence limits are computed according to SAS/STAT(R) 9.2 User’s Guide, Second Edition.

Score method: code has been published for generating confidence intervals by inverting a score test. It is available from http://web.stat.ufl.edu/~aa/cda/R/two_sample/R2/

See also "riskratio" and "oddsratio" in R package epitools.
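For reference, the point estimates and the simplest asymptotic (Wald, log-scale) limits can be computed directly from the counts. This base R sketch uses the event counts and sample sizes from the twin-conviction example; it does not implement the exact or score intervals described above, and the object names are illustrative.

```r
x <- c(2, 10)       # events in group 1 and group 2
n <- c(17, 13)      # sample sizes
alpha <- 0.05

a <- x[1]; b <- n[1] - x[1]        # group 1: events, non-events
c2 <- x[2]; d <- n[2] - x[2]       # group 2: events, non-events

or <- (a * d) / (b * c2)           # odds ratio
rr <- (a / n[1]) / (c2 / n[2])     # relative risk

z <- qnorm(1 - alpha / 2)
se_log_or <- sqrt(1 / a + 1 / b + 1 / c2 + 1 / d)
se_log_rr <- sqrt(1 / a - 1 / n[1] + 1 / c2 - 1 / n[2])
or_ci <- exp(log(or) + c(-1, 1) * z * se_log_or)   # Wald limits for OR
rr_ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)   # Wald limits for RR
```

The Wald limits are undefined when any cell count is zero, which is one reason exact or score methods are used for sparse tables.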

Value

OR    an estimate of odds ratio;
RR    an estimate of relative risk;
ORCI  A table showing various (1-alpha)% confidence limits for OR;
RRCI  A table showing various (1-alpha)% confidence limits for RR;

References

Agresti, A. (1990) Categorical data analysis. New York: Wiley. Pages 59-66.

Agresti, A. (1992), A Survey of Exact Inference for Contingency Tables. Statistical Science, Vol. 7, No. 1. (Feb., 1992), pp. 131-153.

Agresti, A. (2002), Categorical Data Analysis, Second Edition, New York: John Wiley & Sons.

Fisher, R. A. (1935) The logic of inductive inference. Journal of the Royal Statistical Society Series A 98, 39-54.

Fisher, R. A. (1962) Confidence limits for a cross-product ratio. Australian Journal of Statistics 4, 41.

Fisher, R. A. (1970) Statistical Methods for Research Workers. Oliver & Boyd.

Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher’s exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software, 12, 154-161.

Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher’s Exact Test in r x c Contingency Tables. ACM Transactions on Mathematical Software, 19, 484-488.

Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91-97.

Stokes, M. E., Davis, C. S., and Koch, G. G. (2000), Categorical Data Analysis Using the SAS System, Second Edition, Cary, NC: SAS Institute Inc.

See Also

fisher.test, chisq.test

Examples

# library(bstats)
x = c(1,0)
n = c(72370,73058)
oddsratio(x, n=n)

Convictions <- matrix(c(2, 10, 15, 3),
                      nrow = 2,
                      dimnames = list(c("Dizygotic", "Monozygotic"),
                                      c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, conf.level = 0.95)$conf.int

x = matrix(c(2,10,17,13), ncol=2)
oddsratio(x)

Convictions <- matrix(c(8, 492, 0, 500), nrow = 2, byrow=TRUE)
fisher.test(Convictions, conf.level = 0.95)$conf.int

x = c(8,0)
n = c(500,500)
oddsratio(x, n=n)

predictor.plot Draw plots for predictor impacts on the dependent variable

Description

Draw an added-variable plot (av) or a residual plus component (rc) plot.

Usage

predictor.plot(lmobj, type='av', ID=FALSE, col=1)

Arguments

lmobj An R object by fitting an OLS model to a data set.

type Plot type. ’av’: added variable plot; ’rc’: residual plus component plot.

ID Whether to identify points in the plots. Default: FALSE

col Color of the plot.

Value

Missing values are not allowed.

Author(s)

B. Wang <[email protected]>

See Also

residual.plot.

Examples

data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
predictor.plot(lm0)
predictor.plot(lm0, type='rc')

residual.plot Draw residual plots for an ordinary regression model.

Description

Draw residual plots for an ordinary regression model.

Usage

residual.plot(lmobj, type='fitted', col=1)

Arguments

lmobj An R object by fitting an OLS model to a data set.

type Type of residual plot(s): ’fitted’, residuals against fitted values; ’index’, residuals against index; ’predictor’, residuals against each of the predictors in the fitted model; ’qqplot’, qq-plot of the standardized residuals to check the normality assumption.

col Color of the plot.

Value

Missing values are not allowed.

Author(s)

B. Wang <[email protected]>

See Also

influential.plot.

Examples

data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
residual.plot(lm0)
residual.plot(lm0, type='index')
residual.plot(lm0, type='predictor')

river New York river data

Description

This is a data set selected from the book "Regression analysis by example" by Samprit Chatterjee and Ali S. Hadi.

Usage

data(river)

Format

In a 1976 study exploring the relationship between water quality and land use, Haith (1976) obtained the measurements on 20 river basins in New York State. A question of interest here is how the land use around a river basin contributes to the water pollution as measured by the mean nitrogen concentration (mg/liter).

River     character  River names
Agr       numeric    percentage of land area currently in agricultural use
Forest    numeric    percentage of forest land
Rsdntial  numeric    percentage of land area in residential use
ComIndl   numeric    percentage of land area either in commercial or industrial use
Nitrogen  numeric    mean nitrogen concentration

References

"Regression analysis by example" by Samprit Chatterjee and Ali S. Hadi, Wiley. ISBN: 978-0-471-74696-6.

scb To compute the simultaneous confidence bands.

Description

To compute the simultaneous confidence bands.

Usage

scb(x,alpha=0.05)

Arguments

x      An R object. Currently, only ’edf’ objects are supported.
alpha  Significance level. Default 0.05 for a 95 percent confidence level.
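The manual does not state how the band is constructed. As one common possibility, and purely as an assumption, the sketch below builds a simultaneous band for an empirical distribution function from the Dvoretzky-Kiefer-Wolfowitz inequality in base R; scb may use a different method. The sample values are made up.

```r
x <- c(0.2, 1.5, -0.3, 0.9, 2.1, -1.2, 0.4, 1.1)
alpha <- 0.05
n <- length(x)

# DKW half-width: the band Fhat +/- eps covers the true cdf
# everywhere with probability at least 1 - alpha.
eps <- sqrt(log(2 / alpha) / (2 * n))

grid <- sort(x)
Fhat <- ecdf(x)(grid)
lower <- pmax(Fhat - eps, 0)    # clip the band to [0, 1]
upper <- pmin(Fhat + eps, 1)
```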

Author(s)

B. Wang <[email protected]>

See Also

edf.

Examples

x = rnorm(100)
(out = edf(x))
plot(out)
(out2 = scb(out))
lines(out2)

supervisor Supervisor performance data

Description

This is a data set selected from the book "Regression analysis by example" by Samprit Chatterjee and Ali S. Hadi.

Usage

data(supervisor)

Format

A data frame with 28829 observations on 8 variables.

Y       numeric  overall rating of job being done by supervisor
X1--X6  numeric  average score for six different aspects

References

"Regression analysis by example" by Samprit Chatterjee and Ali S. Hadi, Wiley. ISBN: 978-0-471-74696-6.

vif Variance Inflation Factors

Description

Calculates variance-inflation and generalized variance-inflation factors for linear and generalized linear models.

Usage

vif(object, ...)

## S3 method for class 'lm'
vif(object, ...)

Arguments

object an object that inherits from class lm, such as an lm or glm object.

... not used.

Details

If all terms in an unweighted linear model have 1 df, then the usual variance-inflation factors are calculated.

If any terms in an unweighted linear model have more than 1 df, then generalized variance-inflation factors (Fox and Monette, 1992) are calculated. These are interpretable as the inflation in size of the confidence ellipse or ellipsoid for the coefficients of the term in comparison with what would be obtained for orthogonal data.

The generalized vifs are invariant with respect to the coding of the terms in the model (as long as the subspace of the columns of the model matrix pertaining to each term is invariant). To adjust for the dimension of the confidence ellipsoid, the function also prints GVIF^(1/(2×df)) where df is the degrees of freedom associated with the term.

Through a further generalization, the implementation here is applicable as well to other sorts of models, in particular weighted linear models and generalized linear models, that inherit from class lm.
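For the 1-df case, the usual factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the others. The base R sketch below computes it this way for simulated data and checks it against the diagonal of the inverse predictor correlation matrix. Generalized factors for multi-df terms are not covered, and the names are illustrative.

```r
set.seed(21)
d <- data.frame(x1 = rnorm(80))
d$x2 <- 0.8 * d$x1 + rnorm(80, sd = 0.5)   # correlated with x1
d$x3 <- rnorm(80)

preds <- c("x1", "x2", "x3")
vifs <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  # Regress predictor v on the remaining predictors.
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})

# Equivalent route: diagonal of the inverse correlation matrix.
vifs_alt <- diag(solve(cor(d[preds])))
```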

Value

A vector of vifs, or a matrix containing one row for each term in the model, and columns for the GVIF, df, and GVIF^(1/(2×df)).

Author(s)

Henric Nilsson and John Fox <[email protected]>

References

Fox, J. and Monette, G. (1992) Generalized collinearity diagnostics. JASA, 87, 178–183.

Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.

Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage.

Examples

data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
vif(lm0)

white.test White test of constant variance

Description

Perform a test to check the common variance assumption for a linear regression model.

Usage

white.test(lmobj)

Arguments

lmobj A fitted model

Details

In this function, we check the constant variance assumption of the error terms.
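In its textbook form, the White test regresses the squared OLS residuals on the regressors, their squares and their cross products, and refers n times the R-squared of that auxiliary fit to a chi-squared distribution. The base R sketch below follows that form with simulated data. Whether white.test includes the cross-product terms is not stated here, so this is an assumption.

```r
set.seed(13)
n <- 120
x1 <- runif(n, 1, 5)
x2 <- rnorm(n)
y <- 1 + x1 + x2 + rnorm(n, sd = x1)    # error sd grows with x1

fit <- lm(y ~ x1 + x2)
e2 <- resid(fit)^2
# Auxiliary regression: levels, squares and cross product.
aux <- lm(e2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))

w_stat <- n * summary(aux)$r.squared
w_df <- length(coef(aux)) - 1
w_p <- pchisq(w_stat, df = w_df, lower.tail = FALSE)
```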

Value

A list with class "htest" containing the following components:

statistic the value of the test statistic.

p.value the p-value of the test.

parameter degrees of freedom.

method a character string indicating what type of test was performed.

data.name a character string giving the name(s) of the data.

References

White test, From Wikipedia, the free encyclopedia.

Examples

data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
white.test(lm0)

wls Weighted least squares estimate by groups

Description

Weighted least squares estimate by groups.

Usage

wls(lmobj,group)

Arguments

lmobj An R object by fitting an OLS model to a data set.

group used to cluster the data. Can be a factor or a numerical vector.

Value

The updated regression model fitted by WLS.
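One common way to obtain group-wise weights, assumed here for illustration and not confirmed as the method in wls, is to estimate the error variance within each group from the OLS residuals and refit with weights equal to the inverse of those variances. A base R sketch with simulated data:

```r
set.seed(17)
g <- factor(rep(c("a", "b", "c"), each = 20))
x <- rnorm(60)
sdg <- unname(c(a = 0.5, b = 1, c = 2)[as.character(g)])  # group-specific error sd
y <- 1 + 2 * x + rnorm(60, sd = sdg)

ols <- lm(y ~ x)                               # initial OLS fit
s2 <- tapply(resid(ols)^2, g, mean)            # residual variance per group
w <- as.numeric(1 / s2[as.character(g)])       # weight = 1 / group variance
wfit <- lm(y ~ x, weights = w)                 # WLS refit

# Rescaling all weights by a constant leaves the coefficients unchanged.
wfit_scaled <- lm(y ~ x, weights = 10 * w)
```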

Author(s)

B. Wang <[email protected]>

See Also

residual.plot.

Examples

data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
wls(lm0, group=edu75$Region)

Index

∗Topic datasets: birth, edu75, river, supervisor
∗Topic htest: bptest, dw.test, model.check, oddsratio, white.test
∗Topic models: ld50.logit, ld50.logitfit
∗Topic regression: ac, ld50.logit, ld50.logitfit, vif
∗Topic stats: bstats, edf, influential.plot, lm.ci, model.test, predictor.plot, residual.plot, scb, wls
∗Topic test: mediation.test

ac
birth
bptest
bstats
chisq.test
dw.test
edf
edu75
fisher.test
influential.plot
ld50.logit
ld50.logitfit
lines.glm.dose (ld50.logit)
lines.scb (scb)
lm
lm.ci
mediation.test
model.check
model.test
oddsratio
plot.edf (edf)
plot.glm.dose (ld50.logit)
plot.scb (scb)
predictor.plot
print.edf (edf)
print.glm.dose (ld50.logit)
print.odds (oddsratio)
print.scb (scb)
residual.plot
river
scb
supervisor
vif
white.test
wls