“book” — 2014/5/6 — 15:21 — page 397 — #421✐
Indices
Separate indices are provided for subject (concept or task), SAS command, and R command. References to the examples are denoted in italics.
Subject index
3-D
  histogram, 219
  plot, 222
95% confidence interval
  mean, 86
  proportion, 86
Abrams, Allyson, xxv
absolute value, 59
accelerated failure time model, 163
access
  elements in R, 365
  files, 81
  variables, 17
add
  lines to plot, 244
  marginal rug plot, 245
  matrices, 64
  noise, 243
  normal density, 245
  straight line, 242
  text, 246
  variables, 19
age variable, 107, 381agreement, 89AIC, 143, 170airline delays, 336Akaike information criterion (AIC), 143,
170alcohol abuse, 383alcoholic drinks
HELP dataset, 382Allaire, J.J., xxiiAloisio, Kathryn, xxiialtitude, 325analysis of variance
interaction plot, 223one-way, 117two-way, 117, 139
analytic power calculations, 95and operator, 47angular plot, 224
annotating datasets, 43ANOVA
interaction plot, 223one-way, 117tables, 169
Aotearoa (New Zealand), 357API (application programming
interface), 334Apple R FAQ, 359application programming interface
(API), 334arbitrary quantiles, 85area under the curve, 225ARIMA model, 162Arnold, Tim, 288arrays, 44, 74
extract elements, 367arrows, 247ASCII
datasets, 12encoding, 27
assertions, 76assignment operators in R, 365association plot, 224attach dataframes, 361attributable risk, 87attributes
R, 369AUC (area under the curve), 225Auckland, University of, 357automated report generation, 106, 287autoregressive model, 162available datasets in R, 377AvantGarde font, 252average
running, 318average number of drinks
HELP dataset, 382axes
labels, 254multiple, 218
omit, 256range, 253style, 253values, 254
barchart
  error bars, 216
barplot, 211Base SAS, 341baseline interview, 379basic concepts
SAS, 347batch mode, 356, 362Bates, Douglas, 357Bayesian
external software, 291inference task view, 290, 294, 373information criterion, 170logistic regression, 292
BCA intervals, 303Bergstralh, Erik, 299best linear unbiased predictors, 158beta
distribution, 86function, 60
bias corrected and accelerated, 304bias-corrected and accelerated, 303BIC, 170big data, 5, 29, 114, 336Bike ride, 325binned scatterplot, 219binomial family, 150binomial probabilities
tabulation, 317bivariate
loess, 155relationship, 101, 217–219
Bland-Altman plot, 228BMDP files, 6, 13BMP export, 258book website, xxvBoolean
operations, 25, 31, 47, 238R, 366
bootstrapping, 31, 303box around plots, 252boxplot, 214
side-by-side, 193, 215Bradley Airport, 336Breslow estimator, 163Breslow–Day test, 91
Breusch–Pagan test, 124“broken stick” models, 159bug reports, 377byte code compiler, 373
c statistic, 151calculate derivatives, 62calculus, 62Call, Gregory, xxiicalling functions from R, 369case
sensitivity, 356, 360statement, 21
case sensitivity, xxivcategorical data, 50
as predictor, 114from continuous, 20generation, 261parameterization, 115, 296plot, 224tables, 103
Cauchy
  distribution, 86
  link function, 150
causal inference, 296
censored data, 163, 227, 269
Center for Epidemiologic Studies Depression (CESD) scale, 381
centering, 85
Central Limit Theorem, 274
CESD, 43, 44
cesd variable, 43, 381
cesdtv variable, 188
chained equation models, 306
Chambers, John, 357
change working directory, 80
character translations, 27
character variable, see string variable
characteristics, test, 87
characters, plotting, 242
chemometrics task view, 373
chi-square
  distribution, 86
  statistic, 91
Cholesky decomposition, 158
choose function, 60
choropleth maps, 223, 327
circadian plot, 224
circular plot, 224
class methods, 369
class variable, 50
creating, 116ordering of levels, 116
classification, 166, 205clear graphics settings, 136clinical trial, 379
task view, 373closest values, 315closing a graphic device, 260cluster analysis
task view, 373clustering
hierarchical, 168task view, 166, 168
cocaine, 383Cochran–Mantel–Haenszel test, 91code completion, 357code examples
downloading, xxvcoefficient
of determination, 127of variation, 303regression, 124
coercing
  character variable from numeric, 24
  dataframes into matrices, 368
  date from character, 4
  date to numeric, 37
  factor variable from numeric, 21
  matrices into dataframes, 368
  numeric from character, 4
  numeric from integer, 23
  string variable from numeric, 20
collinearity, 156color
palettes, 255selection, 255
column width, 40
combine matrices, 63
Comic Sans font, 252
comma separated value (CSV) files, 4, 12
command history, 80
  R, 359
  SAS, 349
comments
  R, 367
  SAS, 343
comparison
  floating point variables, 61
  operators, 365
compiler, 373complementary log-log link function, 150
complex fixed format files, 3, 321two lines, 331
complex numbers, 61complex survey design, 168component-wise matrix multiplication,
66Comprehensive R archive network, 358computational economics task view, 373computational physics task view, 373concatenate, 286
datasets, 35matrices, 63strings, 24
conditional execution, 72conditional logistic regression model, 151conditional probability, 278conditioning plot, 221, 232confidence interval, 78
for parameter estimates, 125for predicted observations, 226for the mean, 226proportion, 86
confidence level
  default, 78
confidence limits
  for individual (new) observations, 126
  for the mean, 125
  plotting, 126
confounding, 296constrained optimization, 337contingency table, 90
plot, 224contingency tables, 103contour plots, 222contrasts, 146
Helmert, 116polynomial, 116SAS, 116treatment, 116
control flow, 71control structures, 71, 362controlling graph size, 250controlling Type-I error rate, 119convergence diagnosis for MCMC, 290,
294
convergence problems, 265converting covariance to correlation
matrix, 128converting datasets
long (tall) to wide format, 34
wide to long (tall) format, 33Cook’s Distance, 122coordinate systems (maps), 325correlated data, 190
generating, 267regression models, 157residuals, 158
correlation
  Kendall, 89
  matrix, 101, 128, 238
  Pearson, 89
  Spearman, 89
cosine function, 60count models
goodness of fit, 171negative binomial regression, 153,
180
Poisson regression, 153, 176zero-inflated negative binomial, 154zero-inflated Poisson regression, 154,
178
Courier font, 252covariance matrix, 127, 128, 190covariate imbalance, 296Cowles, Kate, 290Cox proportional hazards model, 163,
200
frailty, 163proportionality test, 164simulate data, 269time-varying covariate, 165
CPU time, 78
Cramer’s V, 91
CRAN (Comprehensive R Archive Network), 358
CRAN task views, see task views
create
  ASCII datasets, 12
  categorical variable from continuous, 20
  categorical variable using logic, 21
  CSV (comma separated value) files, 12
  datasets for other packages, 13
  date variable, 37
  factors, 116
  files for other packages, 13
  functions, 78
  lagged variable, 28
  matrix, 63
  numeric variable from string, 22
  observation number, 32
  recode categorical variable, 21
  string variable from numeric, 20
  time variables, 39
creating
  dataset from counts, 87
Cronbach’s α, 166, 201cross-classification table, 48, 90cross-validation
generalized, 155crosstabs, 90, 103CSV (comma separated value) files, 4, 12cumulative
density function, 53hazard, 164hazard plots, 228product, 319sum, 319
Curran, James, 162curve plotting, 224custom graphic layouts, 251
Dalgaard, Peter, 357dashed line, 254data
display, 18, 350entry, 10from R into SAS, 327from Stata into SAS, 327generation, 71input, 39output, 40scraping, 330
Data Expo 2009, 336data input
two lines, 331data step
range of variables, 74repeat steps for a set of variables, 74SAS, 347
data structures in R, 365data technologies, 15data viewer, 357database system, 29, 336dataframes, 365
comparison with column bind, 368comparison with matrix, 368detaching, 17R, 367remove from workspace, 368
dataset
comments, 19from counts, 87HELP study, 381other packages, 6R, 377subset, 350
datasets, xxvdate and time variables
create date, 37create time, 39extract month, 38extract quarter, 38extract weekday, 38extract year, 38reading, 3
dayslink variable, 111, 381DBF files, 6, 13debugging, 76, 345
RStudio, 76decimal representation, 61Deducer, 359default confidence level, 78defining functions, 78delete objects, 365density
estimation, 213, 220plot, 213, 220
density functions, 53generate random, 54probability, 54quantiles, 54
density plot, 101, 109overlapping, 216
depressive symptoms, 43derivatives, 62derived variable, 19, 44, 47design matrix, 127, 143, 172
specification, 115, 296design of experiments task view, 373design weights, 168detach
dataframes, 17, 139, 368packages, 17, 185, 368
determinant, 67detoxification program, 379, 381deviance, 150
tables, 169dffits, 123diagnostic agreement, 87, 225
ROC curve, 235diagnostic plots, 123, 136
diagnostics from linear regression, 135diagonal elements, 66, 67Diedrich, Holger, 213difference in log-likelihoods, 169difference in sets, 24differences between SAS and R, xxivdifferential equations
task view, 373dimension, 64diploma problem, 276directory delimiter, 2directory structure in R, 2directory structure in SAS, 2dispersion parameter, 181display data, 18, 42display format, 350display missing categories, 90displaying information about objects, 369displaying model results, 11distance metric, 25distribution
beta, 86Cauchy, 86chi-squared, 86empirical probability density plot,
214exponential, 86F, 86gamma, 86geometric, 86logistic, 86lognormal, 86negative binomial, 86normal, 54, 86parameters, 86Poisson, 86probability, 53q-q plot, 225quantile, 54quantile-quantile plot, 225stem plot, 212t, 86Weibull, 86
DocBook document type definition, 14document type definition, 14documentation
R, 362SAS, 346
dotplot, 212downloading
code examples, xxv
drinks of alcohol
  HELP dataset, 382
drinkstat variable, 47dropping variables, 30drugrisk variable, 238, 381DTD, 14duplicated values, 32dynamic graphics task view, 373dynamite plot, 216
ecological data task view, 373econometrics task view, 373edit distance, 25editing data, 10efficiency
programming, 349vector operations, 72
Efron estimator, 163eigenvalues and eigenvectors, 67elapsed time, 39else statement, 362empirical
density plot, 109estimation, 276finance task view, 373power calculations, 284probability density plot, 214variance, 161, 197
encoding
  ASCII, 27
entering data, 10environment, 369environmental task view, 373Epi Info files, 6equal variance test, 93error bars
bar chart, 216error recovery, 76etiquette
R, 377evaluate integrals, 62Evans, Michael, 271exact
confidence intervals, 86logistic regression model, 152test of proportions, 92
example code
  downloading, xxv
  R, 361
  SAS, 343
Excel
  creating, 12
  reading, 5
excess
  kurtosis, 84
  zeroes, 154
exchangeable working correlation, 161execution
conditional, 72in operating system, 79profiling, 76
expected cell counts, 105expected values, 276experimental design task view, 373exponential
distribution, 86random variables, 58, 274
exponentiation, 59export
BMP, 258datasets for other packages, 13Excel, 12graphs, 256JPEG, 258PDF, 256PNG, 259postscript, 256TIFF, 259WMF (windows metafile format),
258expressions
R, 365extensible markup language (XML), 9,
14extract characters from string, 23extract from objects, 89, 367
F distribution, 86f1 variables, 45, 46, 201, 381factor
analysis, 166, 202levels, 115, 296reordering, 116variable, 50, 114
factorial function, 60failure time data, 163Falcon, Seth, 357false positive, 225family
binomial, 150Gamma, 150Gaussian, 150
inverse Gaussian, 150Poisson, 150
FAQ
  Apple R, 359
  R, 364, 377
  Windows R, 358
female variable, 47, 382Fibonacci sequence, 320file
browsing, 81variable format, 7
finance task view, 373find
approximate string, 25closest values, 315string within a string, 25working directory, 80
finite mixture models
  task view, 313, 373
Fisher’s exact test, 92, 103fit model separately by group, 138fixed format files, 2, 3flight delays, 336floating point representation, 61follow-up interviews, 379fonts in graphics, 252footnotes, 246for statement, 362foreign format, 43formatted
data, 12, 350model results, 11output, 287variables, 28
formula object, 90, 113forward stagewise regression, 171Foundation for Statistical Computing
R, 357Fox, John, 165fraction of missing information, 311frailty model, 163frequently asked questions
see FAQ, 364Friedman’s “super smoother”, 245Friendly, Michael, 224functions, 77
plotting, 224R, 78, 369
fuzzy search, 25
G-rho family of Harrington and Fleming,
111
g1b variable, 232, 382g1btv variable, 187, 197GAM, 155Gamma
distribution, 86family, 150function, 60, 271regression, 149
Gaussian
  distribution, 53
  family, 150
Gelman, Andrew, 271, 273
gender variable, 50, 382
general linear model for correlated data, 157, 190
generalized additive model, 155, 185
generalized cross-validation, 155
generalized estimating equation, 197
  exchangeable working correlation, 161
  independence working correlation, 161
  unstructured working correlation, 161
generalized linear mixed model, 161, 199generalized linear model, 149, 172
correlated outcomes, 160generalized logit model, 152, 183generalized multinomial model, 152generate
arbitrary random variables, 58categorical data, 261correlated binary variables, 267Cox model, 269dataset from counts, 87exponential random variables, 58generalized linear model random
effects, 264grid of values, 75logistic regression, 263multinomial random variables, 56multivariate normal random
variables, 56normal random variables, 56other random variables, 58pattern of repeated values, 73predicted values, 120random variables, 53residuals, 121sequence of values, 73
truncated normal random variables, 58
uniform random variables, 55genetics task view, 373genf variable, 139Gentleman, Robert, 357geometric distribution, 86getting
data from R into SAS, 327data from Stata into SAS, 327help in R, 377started with SAS, 346
GitHub, 357goodness of fit, 171, 178
ROC curve, 235Google Maps, 325GPS coordinates, 325Gromping, Ulrike, 213graduation, 276grammar of graphics, 325graphical layouts, 251graphical models task view, 373graphical settings, 253graphical user interface
deducer, 359R, 359
graphics
  boxplot, 214
  choropleth, 327
  exporting, 256
  side-by-side boxplots, 215
  size, 250
  task view, 211, 373
greater than operator, 47grid
graphics, 373of values, 75rectangular, 248search, 337
grouping variable
  linear model, 283
  summary statistics, 281
growth curve models, 159Gruen, Bettina, 313guide to packages
R, 372guidelines
R-help postings, 377
Hadoop, 29Hakim, Tanya, xxv
hanging rootogram, 172
Harrell, Frank, 39, 128, 216, 371
Harrington and Fleming G-rho family, 111
harvesting data, 330
hat matrix, 122
hat-check problem, 276
hazard plots, 228
Health Evaluation and Linkage to Primary Care (HELP) study, 379
health survey
  SF-36, 382
Helmert contrasts, 116
HELP study
  clinic, 383
  dataset, 381
  introduction, 379
  results, 379
help system
  other resources, 377
  R, 361, 362
  R packages, 373
  SAS, 346
Helvetica font, 252heroin, 383Hesterberg, Tim, 274heteroscedasticity test, 124hierarchical clustering, 168, 208high-performance computing task view,
373histogram, 100, 213
comparing, 215history
of commands, 80, 359R, 357SAS, 341
homeless variable, 103, 172, 382homelessness, 381homogeneity of odds ratio, 91honest significant difference, 119, 144Hornik, Kurt, 357Hosmer–Lemeshow test, 171hospitalization, 381Hotelling’s t, 162HTML files, 14
harvesting data, 330reproducible output, 290SAS, 351, 355
HTTP/HTTPS, 9, 332Huber variance, 197
hypertext markup language format (HTML), 14
hypertext transport protocol (HTTP), 9
i1 variable, 47, 176, 382i2 variable, 47, 382Iacus, Stefano, 357id number, 32id variable, 382identifying points, 249identity link function, 150if statement, 30, 72, 362Ihaka, Ross, xxv, 357ill-conditioned problems, 156image plot, 222imaginary numbers, 61imaging task view, 373import data, 6imputation, 306in statement, 362income inequality, 155incomplete data, 304, 306independence working correlation, 161index
  R, xxiv
  SAS, xxiv
  subject, xxiv
indexing
  in R, 43, 323
  lists, 366
  matrix, 66
  vector, 365
indicator variable, 115, 296individual level data, 87indtot variable, 172, 231, 382InDUC (Inventory of Drug Use
Consequences), 382infinite values, 305influence, 122information matrix, 127installing
packages in R, 371R, 358RStudio, 359SAS, 341
integer
  functions, 60
  problems, 340
integration, 62interaction, 117
linear regression, 130
plot, 139, 223testing, 141two-way ANOVA, 139
intercept
  no, 116
intersection, 24interval censored data, 227introduction
R, 357, 362RStudio, 357SAS, 341
Inventory of Drug Use Consequences, see indtot variable
inverse
  Gaussian family, 150
  link function, 150
  matrix, 65
  probability integral transform, 58
iterative proportional fitting, 153
JAGS, 291jitter points, 243joining datasets, 35joins, 29Jones, Albyn, xxvJPEG export, 258
Kaplan, Danny, 224
Kaplan–Meier plot, 227, 234
Kappa, 89
keeping variables, 30
Kendall correlation, 89
kernel smoother plot, 213, 220
knapsack problem, 337
knitr, 287
Knuth, Donald, 287
Kolmogorov–Smirnov test, 94, 108
Kosanke, Jon, 299
Kruskal–Wallis test, 94
Kuhfeld, Warren, 288
kurtosis, 84
L1-constrained fitting, 170
labels for variables, 18
Laplace approximation, 265
large data, 5, 29, 114, 336, 349
large sample assumption, 274
lasso method, 170
latent class analysis, 167
LaTeX output, 287
  R, 134
SAS, 351, 355Lavine, Michael, 273learning SAS, 346least absolute shrinkage and selection
operator, 170least angle regression, 171least squares
linear, 113nonlinear, 155
legend, 69, 248Leisch, Friedrich, 287, 313, 357length
of string, 23of vector, 65
Lenth, Russell, xxv, 287less than operator, 47Levene’s test for equal variances, 93Levenshtein edit distance, 25leverage, 122library
help, 373R, 371
license, SAS, 341Ligges, Uwe, 357likelihood ratio test, 141, 169line
on plot, 244style, 254types, 254width, 255
linear combinations of parameters, 120linear discriminant analysis, 167, 206linear models, 113
by grouping variable, 283categorical predictor, 114diagnostic plots, 123diagnostics, 135generalized, 149interaction, 117, 130no intercept, 116parameterization, 115, 296R object, 113residuals, 121standardized, 121studentized, 121
standardized residuals, 121stratified analysis, 283studentized residuals, 121test for heteroscedasticity, 124
linear programming, 340link function
cauchit, 150cloglog, 150identity, 150inverse, 150log, 150logit, 150probit, 150square root, 150
linkage to primary care, 381linkstatus variable, 111, 382Linux installation
R, 358Lipsitz, Stuart, 267list files, 81lists, 366
extract elements, 89, 367literate programming, 287Little, Roderick, 306local polynomial regression, 245locating points, 249loess
bivariate, 155log
base 10, 59base 2, 59base e, 59link function, 150
log fileR, 80SAS, 349
log scale, 255log-likelihood, 170log-linear model, 153logic, 21logical expressions, 20, 21logical operator, 20, 365logistic
distribution, 86generalized, 183
logistic regression, 149, 172
  Bayesian, 292
  c statistic, 151
  generating, 263
  goodness of fit, 171
  Nagelkerke R², 151
  ROC curve, 235
logit link function, 150lognormal
distribution, 54, 86regression, 149
logrank test, 95, 111
long (tall) to wide format conversion, 34longitudinal regression, 157
reshaping datasets, 187looping, 71, 188lower to upper case conversions, 27lowess, 155, 185, 244Lucida font, 252Lumley, Thomas, 169, 357
M estimation, 156machine learning
task view, 167, 373machine precision, 61Macintosh R FAQ, 359macros, 77
SAS, 303MAD regression, 156Maechler, Martin, 357mailing list
R-help, 377make variables available, 17manipulate string variables, 23, 25, 26
remove spaces, 27split, 26
MANOVA, 162Mantel–Haenszel test, 91maps
choropleth, 223, 327coordinate systems, 325Google Maps, 325plotting, 321
margin specification, 253marginal
histograms, 232plot, 245
Markdown, 14, 287Markov Chain Monte Carlo, 271, 290,
294
matching, 296mathematical constants, 60mathematical expressions, 69, 247mathematical functions
absolute value, 59beta, 60choose, 60exponential, 59factorial, 60Fibonacci sequence, 320gamma, 60integer functions, 60log, 59
maximum value, 59mean value, 59minimum value, 59modulus, 59natural log, 59permute, 60square root, 59standard deviation, 59sum, 59trigonometric functions, 60
mathematical symbols
  adding, 247
mathematics task view, 62, 373matrix
addition, 64combine, 63component-wise multiplication, 66concatenate, 63correlation, 128covariance, 127, 128creation, 63design, 127dimension, 64extract elements, 367graphs, 123hat, 122indexing, 66, 367information, 127inversion, 65large, 63multiplication, 57, 65, 127, 366overview, 63plots, 221R, 367sparse, 63transposition, 64
maximization, 265maximum likelihood estimation, 86maximum number of drinks
HELP dataset, 382maximum value, 59McArdle, Brian, xxvMCMC, 271, 290, 294McNemar’s test, 92mcs variable, 101, 382mean, 59, 83, 86
by group, 281trimmed, 84
mean-difference plot, 228median regression, 156medical imaging task view, 373
medical problems, 381memory usage, 76merging datasets, 35meta analysis task view, 373metadata, 369methods, 369, 374metric for distance, 25Metropolis–Hastings algorithm, 271MICE (chained equations), 306Microsoft rtf format, 257Microsoft Word format, 257minimum absolute deviation regression,
156minimum value, 59Minitab files, 6missing data, 45, 304, 306
tables, 90missing information fraction, 311missing values, 96
recoding, 306mixed model, 158
generating, 264logistic, 160, 161logistic regression, 199
model
  comparisons, 143, 169
  diagnostics, 135
  selection, 143, 170
  specification, 117, 130
modeling language, 90, 113, 282modulus, 59moments, 84Mongo databases, 29month variable, 38Monty Hall problem, 278mosaic plot, 224Mosteller, Fred, 276motivational interview, 379moving average model, 162Mplus, 168multicollinearity, 156multilevel models, 160multinomial model
generalized, 152logit, 183nominal outcome, 152ordered outcome, 152
multinomial random variable, 56multiple comparisons, 119, 144multiple imputation, 306multiple plots per page, 250
multiple y axes, 218, 230multiplication
matrix, 57, 65multivariate statistics
task view, 166, 373multivariate test, 162multiway tables, 91Murdoch, Duncan, 357Murrell, Paul, xxv, 15, 211, 230, 357
Nagelkerke R² for logistic regression, 151
named arguments in R, 78, 370
named lists, 366
names and variable types, 17
native data files, 12
native files, 1
natural language processing task view, 373
negative binomial distribution, 86
negative binomial regression, 153, 180
zero-inflated, 154Nelson–Aalen estimate, 164nested models, 149nested quotes, 18New Century Schoolbook font, 252new users
R, 362New Zealand (Aotearoa), 357next statement, 362NIAAA, 379NIDA, 379NLP optimization, 62no intercept, 116noise
add to points, 243nonlinear least squares, 155nonparametric tests, 94, 108nonrandomized studies, 296normal density, 245normal distribution, 53, 54, 68, 84, 86normal random variables, 56
truncated, 58normality testing, 92normalizing, 85
constant, 271residuals from linear model, 121residuals from mixed model, 158
not operator, 305notched boxplot, 216NP completeness, 337number of digits to display, 11
numeric from string, 22numerical mathematics task view, 373
object-oriented programming, 369objects
displaying, 369R, 365remove, 365
observation number, 32observational studies, 296Octave files, 6ODBC, 29odds ratio, 87, 105
homogeneity, 91ODS, see output delivery systemofficial statistics, 168
task view, 169, 373omit axes, 256OnDemand for Academics, 341one-way analysis of variance, 117open-source, xxiiiOpenBUGS, 291operating system
change working directory, 80execute command, 79find working directory, 80list files, 81pause execution, 79
optimization, 62task view, 62, 373with constraints, 337
options
  R, 369
  SAS, 349
OR (odds ratio), 87or operator, 47, 365order statistics, 84ordered factor, 114ordered logistic model, 152, 182ordered multinomial model, 152ordering of levels, 116ordinal logit, 152, 182orientation
axis labels, 254boxplot, 214
output data from analysis, 351output delivery system, 345, 351output file formats
R, 288SAS, 351, 355
overdispersion, 149, 150
overplotting, 219
packages
  detaching, 17
  help, 373
  R, 371
  remove from workspace, 368
page, multiple plots per, 250pairs plot, 236pairwise differences, 119, 144Palatino font, 252palettes of colors, 255Parade magazine, 278parallel
boxplots, 193, 215computation, 374computing task view, 373processing, 371
parameter estimates
  confidence interval, 125
  standard errors, 125
  used as data, 124
parameter estimation
  univariate distribution, 86
parameterization of categorical variable, 115, 172, 296
  reference category, 143
partial file read, 2
pathological distribution
sampling, 271pause execution for a time interval, 79pcs variable, 101, 382pdf output
exporting, 256SAS, 351, 355
peakedness, 84
Pearson correlation, 89
Pearson’s χ² test, 91, 103, 171
percentiles
probability density function, 54Perl
interface, 29modules, 13
permutation test, 94, 108permute function, 60permuted sample, 31pharmacokinetic task view, 373phi coefficient, 91pi (π), 60Pioneer Valley, 325plot
adding arrows, 247adding footnotes, 246adding polygons, 247adding shapes, 247adding text, 246arbitrary function, 224characters, 242conditioning, 221curve, 224limits, 129maps, 321predicted lines, 226predicted values, 226regression diagnostics, 123rotating text, 246symbols, 242time series data, 333titles, 246
Plummer, Martyn, 357PNG export, 259point size specification, 252point-and-click interface, 356points, 243
locating, 249Poisson distribution, 86Poisson family, 150Poisson regression, 149, 153, 176
Bayesian, 294zero-inflated, 154, 178
polygons, 247polynomial contrasts, 116polynomial regression, 155posterior probability, 290, 294posting guide (R-help), 377postscript, 252, 256power calculations
analytic, 95empirical, 284
practical extraction and report language (Perl), 13, 29
predicted values
  generating from linear model, 120
presentations in RStudio, 290primary care
linkage, 381visits, 382
primary substance of abuse, 383printing model results, 11prior distribution, 290, 294probability density, 54, 214probability distributions, 68
parameter estimation, 86quantiles, 53random variables, 53simulation, 261, 276task view, 53, 373
probability integral transform, 58probit link function, 150probit regression, 149profiling of execution, 76programming, 71projection, 325projects, 357propensity scores, 296proportion, 86proportional hazards model, 163, 200
frailty, 163proportionality test, 164simulate data, 269time-varying covariate, 165
proportional odds model, 152, 182
proportionality test, 164
Pruim, Randall, 216, 224
pseudo R², 151
pseudo-likelihood, 265
pseudo-random number
  generation, 53
  set seed, 55
pss_fr variable, 238, 382
psychometrics, 166, 201
  task view, 166, 373
QQ plot, 136, 225quadratic growth curve models, 159quantile regression, 156, 181quantile-quantile plot, 136, 225quantiles, 85
probability density function, 54t distribution, 78
quarter variable, 38quasi-complete separation, 294quitting R, 361quotes, nested, 18
R
  available datasets, 377
  bug reports, 377
  command history, 359
  data structures, 365
  detach packages, 185
  Development Core Team, 357
  differences from SAS, xxiv
  exiting, 360
  export SAS dataset, 13
  FAQ, 364, 377
  Foundation for Statistical Computing, 357
  graphical user interface, 359
  help system, 361, 362
  history, 357
  installation, 358
  introduction, 357
  libraries, 371
  Linux installation, 358
  Markdown, 14, 290
  objects, 365
  packages, 371, 373
  Project, 377
  R Commander, 359
  R-help mailing list, 377
  reading SAS files, 6
  resources for new users, 362
  sample session, 360
  starting, 360
  support, 377
  task views, 372
  warranty, 361
  Windows installation, 358
R index, xxiv
R²
  linear regression, 127
  logistic regression, 151
R-help mailing list, 377ragged data, 321rail trails, 325random coefficient model, 158, 159random effects model, 158, 193
estimate, 158generating, 264
random intercept model, 158random number
seed, 55, 319random slopes model, 158random variables
density, 54generate, 54generation, 53probability, 54quantiles, 54
randomization group, 383randomized clinical trial, 379range
axes, 253
rank sum test, 94reading
bytes, 8comma separated value (CSV) files,
4data, 39, 350data with two lines per obs, 331dates, 3fixed format files, 2HTTP from URL, 9long lines, 3more complex fixed format files, 3native format files, 1other files, 3other packages, 6R into SAS, 5R objects, 1SAS into R, 6spreadsheets, 5variable format files, 7, 321XML files, 9
receiver operating characteristic curve, 225, 235
recoding missing values, 306recoding variables, 20, 21recover from error, 76rectangular grid, 248recursive partitioning, 166, 205reference category, 114, 115, 143, 172,
296
regression, 113categorical predictor, 114coefficients, 124diagnostic plots, 123diagnostics, 135forward stagewise, 171Gamma, 149interaction, 117, 130least angle, 171logistic, 149lognormal, 149no intercept, 116overdispersed binomial, 149overdispersed Poisson, 149parameterization, 115, 296Poisson, 149probit, 149residuals, 121standardized coefficients, 124standardized residuals, 121stratified analysis, 283
studentized residuals, 121test for heteroscedasticity, 124
regular expressions, 25, 31rejection sampling, 271relative risk, 87reliability measures, 166, 201remove
dataframe from workspace, 368objects, 365package from workspace, 368spaces from a string, 27
rename variables, 19repeat statement, 72, 362replace a string within a string, 26replicating examples from the book, 361report generation, 14, 106, 287reproducible analysis, 14, 106, 357
knitr, 287rich text format, 257Statweave, 287tangle, 287task view, 287, 288, 373weave, 287
resampling-based inference, 303reserved commands, 362reshaping datasets, 33, 187residuals, 121
analysis, 135correlated, 158plots, 136standardized, 121studentized, 121
results from HELP study, 379rich text format (rtf), 257ridge regression, 156right censored data, 227Ripley, Brian, 357Risk Assessment Battery, 381robust statistical methods
empirical variance, 161, 197regression, 156task view, 156, 373
ROC curve, 225, 235Rosenthal, Jeffrey, 271rotating
axis labels, 254text, 246
round results, 40, 60RR (relative risk), 87RSeek, 362RStudio, xxii, 357
exporting graphs, 256installation, 359presentations, 290reproducible analysis, 290
RTF (rich text format), 257, 351, 355Rubin, Donald, 306rug plot, 245running a script, 362running average, 318
Samet, Jeffrey, 379sample size calculations
analytic, 95sampling
challenging distribution, 271dataset, 31
sampling distribution, 274sandwich variance, 161, 197Sarkar, Deepayan, 211, 230, 313, 357SAS
Base, 341basic concepts, 347differences from R, xxivfiles from R, 6GUI, 341Institute, 341license file, 341macros, 303on-line help, 346OnDemand, 341
SAS index, xxivSAS/ETS, 341SAS/GRAPH, 341SAS/IML, 341SAS/STAT, 341saving
data, 42graphs, 256parameter estimates as dataset, 175printed result as SAS dataset, 132R history, 359SAS history, 345
scale
  log, 255
scaling, 85scatterplot, 102, 129, 217
binned, 219lines, 244marginal histograms, 220, 232matrix, 221multiple y values, 218
    points, 243
    separate plotting characters per group, 242
    smoother, 129, 244
Schoenfeld residuals, 164
Schoenfeld, David, xxv
Scott, Alastair, xxv
scraping data, 330
script file, 361, 362
search for approximate string, 25
seed, random number, 55, 275
sensitivity, 87, 225
sensitivity to case, xxiv
separate model fitting by group, 138
separate plotting characters per group, 242
server version, 357
set operations, 24
setinit file, 341
settings, graphical, 253
sexrisk variable, 172, 183, 383
SF-36 short form health survey, 382
shapes, 247
short form (SF) health survey, 382
shrinkage method, lasso, 170
side-by-side boxplots, 193, 215
sideways orientation
    boxplot, 214
significance stars in R, 113, 132
simulate
    categorical data, 261
    Cox model, 269
    generalized linear model random effects, 264
    logistic regression, 263
    power calculations, 284
simulation studies, 263
sine function, 60
singular value decomposition, 68
size of graph, 250
skewness, 84
slides in RStudio, 290
Smith College, xxv, 276
smoothing spline, 129, 155, 185, 213, 220, 244
social sciences
    task view, 113, 128, 149, 172, 373
social supports, 382
SOCR (Statistics Online Computational Resource), 359
solve optimization problems, 62
sorting, 35, 51
sourcing commands, 361
sparse matrices, 63
spatial statistics
    choropleth, 327
    task view, 172, 325, 373
spatio-temporal data
    task view, 373
Spearman correlation, 89
specificity, 87, 225
specifying
    box around plots, 252
    color, 255
    design matrix, 115, 296
    margin, 253
    point size, 252
    text size, 252
splines, 374
split string, 26
spreadsheet, 5, 10
SPSS files, 6, 13
SQL, 29, 336
square root, 59
    link function, 150
stagewise regression, 171
standard deviation, 59, 83
standard error, 76
standardized regression coefficients, 124
standardized residuals, 121
    mixed model, 158
Stata files, 6, 13, 327
statements, SAS, 349
statistical genetics task view, 373
statistical learning task view, 373
Statistics Online Computational Resource (SOCR), 359
StatWeave, 287
stem plot, 212
straight line
    adding, 242
stratified analysis, 138, 283
string variable
    concatenating strings, 24
    extract characters, 23
    find a string, 25
    find approximate string, 25
    from numeric variable, 20
    length, 23
    remove spaces, 27
    replace a string, 26
structural equation modeling
    latent class analysis, 167
structured query language (SQL), 29, 336
Student’s t test, 93, 274
studentized residuals, 121
styles
    axes, 253
    line, 254
sub variable, 129, 139
subject index, xxiv
submatrix, 66
submitting code, 343
subsetting, 30, 47, 51, 350
substance abuse treatment, 382
substance of abuse, 383
substance variable, 102, 383
sum, 59
summary statistics, 97
    mean, 83
    separately by group, 52, 281
sums of squares
    cross products, 127
    Type III, 130, 169
support, 377
survey methodology, 168
    task view, 169, 373
survival analysis, 163
    accelerated failure time model, 163
    Cox model, 200
    frailty, 163
    Kaplan–Meier plot, 227, 234
    logrank test, 95, 111
    proportional hazards model, 163
    simulate data, 269
    task view, 163, 228, 373
suspend execution for a time interval, 79
Sweave, 14, 287
sweep operator, 85
symbols
    mathematical, 247
    plot, 242
syntax highlighting, 357
Systat files, 6
t distribution, 68, 86
    quantile, 78
t test, 93, 107, 274
table, cross-classification, 90
tabulate binomial probabilities, 317
tangent function, 60
tangle, 287, 290
task view, 372
    analysis of spatial data, 172
    Bayesian inference, 290, 294
    clustering, 166, 168
    finite mixture models, 313
    graphics, 211
    machine learning, 167
    multivariate statistics, 166
    official statistics, 169
    optimization and mathematical programming, 62
    probability distributions, 53
    psychometrics, 166
    reproducible analysis, 287, 288
    robust statistical methods, 156
    social sciences, 113, 128, 149, 172
    spatial statistics, 325
    survival analysis, 163, 228
    time series, 162
Temple Lang, Duncan, xxv, 357
temporal data
    task view, 373
test
    characteristics, 87
    heteroscedasticity, 124
    interaction, 141
    joint null hypotheses, 118, 119
    normality, 92
text
    adding, 246
    files, 12
    rotating, 246
    size specification, 252
Tibshirani, Rob, 170
tick marks, 254
Tierney, Luke, 357
TIFF export, 259
time
    elapsed, 39
    variables, 39
time series, 162
    plotting, 333
    task view, 162, 373
time variable, 188
time-to-event analysis, 163
time-varying covariate, 165
Times font, 252
timing commands, 78
titles, 246
tolerance, 61
topic index, xxiv
tracing memory usage, 76
transformed residuals, 158
translations, character, 27
transparent plot symbols, 219
transposing
    long (tall) to wide format, 34
    matrix, 64
    wide to long (tall) format, 33
trap error, 76
treat variable, 111, 383
treatment contrasts, 116
trigonometric functions, 60
trimmed mean, 84
true positive, 225
truncated normal random variables, 58
truncation, 60
Tufte, Edward, 216, 230
Tukey, John, 230
    honest significant differences, 119, 144
    mean-difference plot, 228
    notched boxplot, 216
two line data input, 331
two sample t test, 93, 107
two-way ANOVA, 117, 139
    interaction plot, 223
two-way tables, 103
Tyler, Kristin, xxv
Type III sums of squares, 130, 169
UCLA, 359
uniform random variables, 55
union, 24
unique values, 32
univariate distribution parameter estimation, 86
univariate loess, 155
universal resource locator (URL), 9
University of Auckland, 357
    Department of Statistics, xxv
unnamed function, 284
unstructured covariance matrix, 190
unstructured working correlation, 161
upper to lower case conversions, 27
Urbanek, Simon, 357
URL, 9
    harvesting data, 330
user-defined functions, 77
using the book, xxiv
values of variables, 18
van Buuren, Stef, 306
Vanderbilt University, 216
variable display, 18
variable format files, 7, 321
variable labels, 18
variables
    add, 19
    rename, 19
variance, 83, 276
variance covariance matrix, 158
variance equality test, 93
varimax rotation, 166, 202
vectors
    efficiency, 72
    extract elements, 367
    from a matrix, 67
    indexing, 365
    recycling, 365
version number, 372
Verzani, John, 39
violin plots, 215
visualize correlation matrix, 238
Vos Savant, Marilyn, 278
warranty for R, 361
weave, 287, 290
web technologies, 9, 15
    task view, 373
website for book, xxv
weekday variable, 38
Weibull distribution, 86, 269
weighted least squares, 156
where to begin, xxiv
while statement, 72, 362
White variance, 197
Wickham, Hadley, 29, 211, 230, 281, 284, 325
wide to long (tall) format conversion, 33
width of line, 255
Wilcoxon test, 94, 108
Wild, Chris, xxv
wildcard expansion, 81
Wilkinson dot plot, 212
WinBUGS, 291
Windows
    installation of R, 358
    metafile, 258
    R FAQ, 358
Word rtf format, 257
workflow, 287
working correlation matrix, 161, 197
working directory, 80
workspace, 369
workspace browser, 357
writing
    CSV (comma separated value) files, 12
    native format files, 12
    other packages, 13
    text files, 12
X′X matrix, 127
x-y plot, see scatterplot
Xie, Yihui, 288
XML, 9, 13
    create file, 13
    DocBook DTD, 14
    read file, 6
    write files, 14
year variable, 38
Zaslavsky, Alan, xxv
zero-inflated
    negative binomial regression, 154
    Poisson regression, 154, 178