“book” — 2014/5/6 — 15:21 — page 397 — #421✐

Indices

Separate indices are provided for subject (concept or task), SAS command, and R command. References to the examples are denoted in italics.

Subject index

3-D
    histogram, 219
    plot, 222
95% confidence interval
    mean, 86
    proportion, 86

Abrams, Allyson, xxv
absolute value, 59
accelerated failure time model, 163
access
    elements in R, 365
    files, 81
    variables, 17
add
    lines to plot, 244
    marginal rug plot, 245
    matrices, 64
    noise, 243
    normal density, 245
    straight line, 242
    text, 246
    variables, 19
age variable, 107, 381
agreement, 89
AIC, 143, 170
airline delays, 336
Akaike information criterion (AIC), 143, 170
alcohol abuse, 383
alcoholic drinks
    HELP dataset, 382
Allaire, J.J., xxii
Aloisio, Kathryn, xxii
altitude, 325
analysis of variance
    interaction plot, 223
    one-way, 117
    two-way, 117, 139
analytic power calculations, 95
and operator, 47
angular plot, 224
annotating datasets, 43
ANOVA
    interaction plot, 223
    one-way, 117
    tables, 169
Aotearoa (New Zealand), 357
API (application programming interface), 334
Apple R FAQ, 359
application programming interface (API), 334
arbitrary quantiles, 85
area under the curve, 225
ARIMA model, 162
Arnold, Tim, 288
arrays, 44, 74
    extract elements, 367
arrows, 247
ASCII
    datasets, 12
    encoding, 27
assertions, 76
assignment operators in R, 365
association plot, 224
attach dataframes, 361
attributable risk, 87
attributes
    R, 369
AUC (area under the curve), 225
Auckland, University of, 357
automated report generation, 106, 287
autoregressive model, 162
available datasets in R, 377
AvantGarde font, 252
average
    running, 318
average number of drinks
    HELP dataset, 382
axes
    labels, 254
    multiple, 218

    omit, 256
    range, 253
    style, 253
    values, 254

barchart
    error bars, 216
barplot, 211
Base SAS, 341
baseline interview, 379
basic concepts
    SAS, 347
batch mode, 356, 362
Bates, Douglas, 357
Bayesian
    external software, 291
    inference task view, 290, 294, 373
    information criterion, 170
    logistic regression, 292
BCA intervals, 303
Bergstralh, Erik, 299
best linear unbiased predictors, 158
beta
    distribution, 86
    function, 60
bias corrected and accelerated, 304
bias-corrected and accelerated, 303
BIC, 170
big data, 5, 29, 114, 336
Bike ride, 325
binned scatterplot, 219
binomial family, 150
binomial probabilities
    tabulation, 317
bivariate
    loess, 155
    relationship, 101, 217–219
Bland-Altman plot, 228
BMDP files, 6, 13
BMP export, 258
book website, xxv
Boolean
    operations, 25, 31, 47, 238
    R, 366
bootstrapping, 31, 303
box around plots, 252
boxplot, 214
    side-by-side, 193, 215
Bradley Airport, 336
Breslow estimator, 163
Breslow–Day test, 91
Breusch–Pagan test, 124
“broken stick” models, 159
bug reports, 377
byte code compiler, 373

c statistic, 151
calculate derivatives, 62
calculus, 62
Call, Gregory, xxii
calling functions from R, 369
case
    sensitivity, 356, 360
    statement, 21
case sensitivity, xxiv
categorical data, 50
    as predictor, 114
    from continuous, 20
    generation, 261
    parameterization, 115, 296
    plot, 224
    tables, 103
Cauchy
    distribution, 86
    link function, 150
causal inference, 296
censored data, 163, 227, 269
Center for Epidemiologic Studies Depression (CESD) scale, 381
centering, 85
Central Limit Theorem, 274
CESD, 43, 44
cesd variable, 43, 381
cesdtv variable, 188
chained equation models, 306
Chambers, John, 357
change working directory, 80
character translations, 27
character variable, see string variable
characteristics, test, 87
characters, plotting, 242
chemometrics task view, 373
chi-square
    distribution, 86
    statistic, 91
Cholesky decomposition, 158
choose function, 60
choropleth maps, 223, 327
circadian plot, 224
circular plot, 224
class methods, 369
class variable, 50

    creating, 116
    ordering of levels, 116
classification, 166, 205
clear graphics settings, 136
clinical trial, 379
    task view, 373
closest values, 315
closing a graphic device, 260
cluster analysis
    task view, 373
clustering
    hierarchical, 168
    task view, 166, 168
cocaine, 383
Cochran–Mantel–Haenszel test, 91
code completion, 357
code examples
    downloading, xxv
coefficient
    of determination, 127
    of variation, 303
    regression, 124
coercing
    character variable from numeric, 24
    dataframes into matrices, 368
    date from character, 4
    date to numeric, 37
    factor variable from numeric, 21
    matrices into dataframes, 368
    numeric from character, 4
    numeric from integer, 23
    string variable from numeric, 20
collinearity, 156
color
    palettes, 255
    selection, 255
column width, 40
combine matrices, 63
Comic Sans font, 252
comma separated value (CSV) files, 4, 12
command history, 80
    R, 359
    SAS, 349
comments
    R, 367
    SAS, 343
comparison
    floating point variables, 61
    operators, 365
compiler, 373
complementary log-log link function, 150
complex fixed format files, 3, 321
    two lines, 331
complex numbers, 61
complex survey design, 168
component-wise matrix multiplication, 66
Comprehensive R archive network, 358
computational economics task view, 373
computational physics task view, 373
concatenate, 286
    datasets, 35
    matrices, 63
    strings, 24
conditional execution, 72
conditional logistic regression model, 151
conditional probability, 278
conditioning plot, 221, 232
confidence interval, 78
    for parameter estimates, 125
    for predicted observations, 226
    for the mean, 226
    proportion, 86
confidence level
    default, 78
confidence limits
    for individual (new) observations, 126
    for the mean, 125
    plotting, 126
confounding, 296
constrained optimization, 337
contingency table, 90
    plot, 224
contingency tables, 103
contour plots, 222
contrasts, 146
    Helmert, 116
    polynomial, 116
    SAS, 116
    treatment, 116
control flow, 71
control structures, 71, 362
controlling graph size, 250
controlling Type-I error rate, 119
convergence diagnosis for MCMC, 290,
convergence problems, 265
converting covariance to correlation matrix, 128
converting datasets
    long (tall) to wide format, 34

    wide to long (tall) format, 33
Cook’s Distance, 122
coordinate systems (maps), 325
correlated data, 190
    generating, 267
    regression models, 157
    residuals, 158
correlation
    Kendall, 89
    matrix, 101, 128, 238
    Pearson, 89
    Spearman, 89
cosine function, 60
count models
    goodness of fit, 171
    negative binomial regression, 153,
    Poisson regression, 153, 176
    zero-inflated negative binomial, 154
    zero-inflated Poisson regression, 154,
Courier font, 252
covariance matrix, 127, 128, 190
covariate imbalance, 296
Cowles, Kate, 290
Cox proportional hazards model, 163,
    frailty, 163
    proportionality test, 164
    simulate data, 269
    time-varying covariate, 165
CPU time, 78
Cramer’s V, 91
CRAN (Comprehensive R Archive Network), 358
CRAN task views, see task views
create
    ASCII datasets, 12
    categorical variable from continuous, 20
    categorical variable using logic, 21
    CSV (comma separated value) files, 12
    datasets for other packages, 13
    date variable, 37
    factors, 116
    files for other packages, 13
    functions, 78
    lagged variable, 28
    matrix, 63
    numeric variable from string, 22
    observation number, 32
    recode categorical variable, 21
    string variable from numeric, 20
    time variables, 39
creating
    dataset from counts, 87
Cronbach’s α, 166, 201
cross-classification table, 48, 90
cross-validation
    generalized, 155
crosstabs, 90, 103
CSV (comma separated value) files, 4, 12
cumulative
    density function, 53
    hazard, 164
    hazard plots, 228
    product, 319
    sum, 319
Curran, James, 162
curve plotting, 224
custom graphic layouts, 251

Dalgaard, Peter, 357
dashed line, 254
data
    display, 18, 350
    entry, 10
    from R into SAS, 327
    from Stata into SAS, 327
    generation, 71
    input, 39
    output, 40
    scraping, 330
Data Expo 2009, 336
data input
    two lines, 331
data step
    range of variables, 74
    repeat steps for a set of variables, 74
    SAS, 347
data structures in R, 365
data technologies, 15
data viewer, 357
database system, 29, 336
dataframes, 365
    comparison with column bind, 368
    comparison with matrix, 368
    detaching, 17
    R, 367
    remove from workspace, 368
dataset

    comments, 19
    from counts, 87
    HELP study, 381
    other packages, 6
    R, 377
    subset, 350
datasets, xxv
date and time variables
    create date, 37
    create time, 39
    extract month, 38
    extract quarter, 38
    extract weekday, 38
    extract year, 38
    reading, 3
dayslink variable, 111, 381
DBF files, 6, 13
debugging, 76, 345
    RStudio, 76
decimal representation, 61
Deducer, 359
default confidence level, 78
defining functions, 78
delete objects, 365
density
    estimation, 213, 220
    plot, 213, 220
density functions, 53
    generate random, 54
    probability, 54
    quantiles, 54
density plot, 101, 109
    overlapping, 216
depressive symptoms, 43
derivatives, 62
derived variable, 19, 44, 47
design matrix, 127, 143, 172
    specification, 115, 296
design of experiments task view, 373
design weights, 168
detach
    dataframes, 17, 139, 368
    packages, 17, 185, 368
determinant, 67
detoxification program, 379, 381
deviance, 150
    tables, 169
dffits, 123
diagnostic agreement, 87, 225
    ROC curve, 235
diagnostic plots, 123, 136
diagnostics from linear regression, 135
diagonal elements, 66, 67
Diedrich, Holger, 213
difference in log-likelihoods, 169
difference in sets, 24
differences between SAS and R, xxiv
differential equations
    task view, 373
dimension, 64
diploma problem, 276
directory delimiter, 2
directory structure in R, 2
directory structure in SAS, 2
dispersion parameter, 181
display data, 18, 42
display format, 350
display missing categories, 90
displaying information about objects, 369
displaying model results, 11
distance metric, 25
distribution
    beta, 86
    Cauchy, 86
    chi-squared, 86
    empirical probability density plot, 214
    exponential, 86
    F, 86
    gamma, 86
    geometric, 86
    logistic, 86
    lognormal, 86
    negative binomial, 86
    normal, 54, 86
    parameters, 86
    Poisson, 86
    probability, 53
    q-q plot, 225
    quantile, 54
    quantile-quantile plot, 225
    stem plot, 212
    t, 86
    Weibull, 86
DocBook document type definition, 14
document type definition, 14
documentation
    R, 362
    SAS, 346
dotplot, 212
downloading
    code examples, xxv

drinks of alcohol
    HELP dataset, 382
drinkstat variable, 47
dropping variables, 30
drugrisk variable, 238, 381
DTD, 14
duplicated values, 32
dynamic graphics task view, 373
dynamite plot, 216

ecological data task view, 373
econometrics task view, 373
edit distance, 25
editing data, 10
efficiency
    programming, 349
    vector operations, 72
Efron estimator, 163
eigenvalues and eigenvectors, 67
elapsed time, 39
else statement, 362
empirical
    density plot, 109
    estimation, 276
    finance task view, 373
    power calculations, 284
    probability density plot, 214
    variance, 161, 197
encoding
    ASCII, 27
entering data, 10
environment, 369
environmental task view, 373
Epi Info files, 6
equal variance test, 93
error bars
    bar chart, 216
error recovery, 76
etiquette
    R, 377
evaluate integrals, 62
Evans, Michael, 271
exact
    confidence intervals, 86
    logistic regression model, 152
    test of proportions, 92
example code
    downloading, xxv
    R, 361
    SAS, 343

Excel
    creating, 12
    reading, 5

excess
    kurtosis, 84
    zeroes, 154
exchangeable working correlation, 161
execution
    conditional, 72
    in operating system, 79
    profiling, 76
expected cell counts, 105
expected values, 276
experimental design task view, 373
exponential
    distribution, 86
    random variables, 58, 274
exponentiation, 59
export
    BMP, 258
    datasets for other packages, 13
    Excel, 12
    graphs, 256
    JPEG, 258
    PDF, 256
    PNG, 259
    postscript, 256
    TIFF, 259
    WMF (windows metafile format), 258
expressions
    R, 365
extensible markup language (XML), 9, 14
extract characters from string, 23
extract from objects, 89, 367

F distribution, 86
f1 variables, 45, 46, 201, 381
factor
    analysis, 166, 202
    levels, 115, 296
    reordering, 116
    variable, 50, 114
factorial function, 60
failure time data, 163
Falcon, Seth, 357
false positive, 225
family
    binomial, 150
    Gamma, 150
    Gaussian, 150

    inverse Gaussian, 150
    Poisson, 150
FAQ
    Apple R, 359
    R, 364, 377
    Windows R, 358
female variable, 47, 382
Fibonacci sequence, 320
file
    browsing, 81
    variable format, 7
finance task view, 373
find
    approximate string, 25
    closest values, 315
    string within a string, 25
    working directory, 80
finite mixture models
    task view, 313, 373
Fisher’s exact test, 92, 103
fit model separately by group, 138
fixed format files, 2, 3
flight delays, 336
floating point representation, 61
follow-up interviews, 379
fonts in graphics, 252
footnotes, 246
for statement, 362
foreign format, 43
formatted
    data, 12, 350
    model results, 11
    output, 287
    variables, 28
formula object, 90, 113
forward stagewise regression, 171
Foundation for Statistical Computing
    R, 357
Fox, John, 165
fraction of missing information, 311
frailty model, 163
frequently asked questions
    see FAQ, 364
Friedman’s “super smoother”, 245
Friendly, Michael, 224
functions, 77
    plotting, 224
    R, 78, 369
fuzzy search, 25

G-rho family of Harrington and Fleming,
g1b variable, 232, 382
g1btv variable, 187, 197
GAM, 155
Gamma
    distribution, 86
    family, 150
    function, 60, 271
    regression, 149
Gaussian
    distribution, 53
    family, 150
Gelman, Andrew, 271, 273
gender variable, 50, 382
general linear model for correlated data, 157, 190
generalized additive model, 155, 185
generalized cross-validation, 155
generalized estimating equation, 197
    exchangeable working correlation, 161
    independence working correlation, 161
    unstructured working correlation, 161
generalized linear mixed model, 161, 199
generalized linear model, 149, 172
    correlated outcomes, 160
generalized logit model, 152, 183
generalized multinomial model, 152
generate
    arbitrary random variables, 58
    categorical data, 261
    correlated binary variables, 267
    Cox model, 269
    dataset from counts, 87
    exponential random variables, 58
    generalized linear model random effects, 264
    grid of values, 75
    logistic regression, 263
    multinomial random variables, 56
    multivariate normal random variables, 56
    normal random variables, 56
    other random variables, 58
    pattern of repeated values, 73
    predicted values, 120
    random variables, 53
    residuals, 121
    sequence of values, 73

    truncated normal random variables, 58
    uniform random variables, 55
genetics task view, 373
genf variable, 139
Gentleman, Robert, 357
geometric distribution, 86
getting
    data from R into SAS, 327
    data from Stata into SAS, 327
    help in R, 377
    started with SAS, 346
GitHub, 357
goodness of fit, 171, 178
    ROC curve, 235
Google Maps, 325
GPS coordinates, 325
Gromping, Ulrike, 213
graduation, 276
grammar of graphics, 325
graphical layouts, 251
graphical models task view, 373
graphical settings, 253
graphical user interface
    deducer, 359
    R, 359
graphics
    boxplot, 214
    choropleth, 327
    exporting, 256
    side-by-side boxplots, 215
    size, 250
    task view, 211, 373
greater than operator, 47
grid
    graphics, 373
    of values, 75
    rectangular, 248
    search, 337
grouping variable
    linear model, 283
    summary statistics, 281
growth curve models, 159
Gruen, Bettina, 313
guide to packages
    R, 372
guidelines
    R-help postings, 377

Hadoop, 29
Hakim, Tanya, xxv
hanging rootogram, 172
Harrell, Frank, 39, 128, 216, 371
Harrington and Fleming G-rho family,
harvesting data, 330
hat matrix, 122
hat-check problem, 276
hazard plots, 228
Health Evaluation and Linkage to Primary Care (HELP) study, 379
health survey
    SF-36, 382
Helmert contrasts, 116
HELP study
    clinic, 383
    dataset, 381
    introduction, 379
    results, 379
help system
    other resources, 377
    R, 361, 362
    R packages, 373
    SAS, 346
Helvetica font, 252
heroin, 383
Hesterberg, Tim, 274
heteroscedasticity test, 124
hierarchical clustering, 168, 208
high-performance computing task view, 373
histogram, 100, 213
    comparing, 215
history
    of commands, 80, 359
    R, 357
    SAS, 341
homeless variable, 103, 172, 382
homelessness, 381
homogeneity of odds ratio, 91
honest significant difference, 119, 144
Hornik, Kurt, 357
Hosmer–Lemeshow test, 171
hospitalization, 381
Hotelling’s t, 162
HTML files, 14
    harvesting data, 330
    reproducible output, 290
    SAS, 351, 355
HTTP/HTTPS, 9, 332
Huber variance, 197

hypertext markup language format (HTML), 14
hypertext transport protocol (HTTP), 9

i1 variable, 47, 176, 382
i2 variable, 47, 382
Iacus, Stefano, 357
id number, 32
id variable, 382
identifying points, 249
identity link function, 150
if statement, 30, 72, 362
Ihaka, Ross, xxv, 357
ill-conditioned problems, 156
image plot, 222
imaginary numbers, 61
imaging task view, 373
import data, 6
imputation, 306
in statement, 362
income inequality, 155
incomplete data, 304, 306
independence working correlation, 161
index
    R, xxiv
    SAS, xxiv
    subject, xxiv
indexing
    in R, 43, 323
    lists, 366
    matrix, 66
    vector, 365
indicator variable, 115, 296
individual level data, 87
indtot variable, 172, 231, 382
InDUC (Inventory of Drug Use Consequences), 382
infinite values, 305
influence, 122
information matrix, 127
installing
    packages in R, 371
    R, 358
    RStudio, 359
    SAS, 341
integer
    functions, 60
    problems, 340
integration, 62
interaction, 117
    linear regression, 130
    plot, 139, 223
    testing, 141
    two-way ANOVA, 139
intercept
    no, 116
intersection, 24
interval censored data, 227
introduction
    R, 357, 362
    RStudio, 357
    SAS, 341
Inventory of Drug Use Consequences, see indtot variable
inverse
    Gaussian family, 150
    link function, 150
    matrix, 65
    probability integral transform, 58
iterative proportional fitting, 153

JAGS, 291
jitter points, 243
joining datasets, 35
joins, 29
Jones, Albyn, xxv
JPEG export, 258

Kaplan, Danny, 224
Kaplan–Meier plot, 227, 234
Kappa, 89
keeping variables, 30
Kendall correlation, 89
kernel smoother plot, 213, 220
knapsack problem, 337
knitr, 287
Knuth, Donald, 287
Kolmogorov–Smirnov test, 94, 108
Kosanke, Jon, 299
Kruskal–Wallis test, 94
Kuhfeld, Warren, 288
kurtosis, 84

L1-constrained fitting, 170
labels for variables, 18
Laplace approximation, 265
large data, 5, 29, 114, 336, 349
large sample assumption, 274
lasso method, 170
latent class analysis, 167
LaTeX output, 287
    R, 134

    SAS, 351, 355
Lavine, Michael, 273
learning SAS, 346
least absolute shrinkage and selection operator, 170
least angle regression, 171
least squares
    linear, 113
    nonlinear, 155
legend, 69, 248
Leisch, Friedrich, 287, 313, 357
length
    of string, 23
    of vector, 65
Lenth, Russell, xxv, 287
less than operator, 47
Levene’s test for equal variances, 93
Levenshtein edit distance, 25
leverage, 122
library
    help, 373
    R, 371
license, SAS, 341
Ligges, Uwe, 357
likelihood ratio test, 141, 169
line
    on plot, 244
    style, 254
    types, 254
    width, 255
linear combinations of parameters, 120
linear discriminant analysis, 167, 206
linear models, 113
    by grouping variable, 283
    categorical predictor, 114
    diagnostic plots, 123
    diagnostics, 135
    generalized, 149
    interaction, 117, 130
    no intercept, 116
    parameterization, 115, 296
    R object, 113
    residuals, 121
        standardized, 121
        studentized, 121
    standardized residuals, 121
    stratified analysis, 283
    studentized residuals, 121
    test for heteroscedasticity, 124
linear programming, 340
link function
    cauchit, 150
    cloglog, 150
    identity, 150
    inverse, 150
    log, 150
    logit, 150
    probit, 150
    square root, 150
linkage to primary care, 381
linkstatus variable, 111, 382
Linux installation
    R, 358
Lipsitz, Stuart, 267
list files, 81
lists, 366
    extract elements, 89, 367
literate programming, 287
Little, Roderick, 306
local polynomial regression, 245
locating points, 249
loess
    bivariate, 155
log
    base 10, 59
    base 2, 59
    base e, 59
    link function, 150
log file
    R, 80
    SAS, 349
log scale, 255
log-likelihood, 170
log-linear model, 153
logic, 21
logical expressions, 20, 21
logical operator, 20, 365
logistic
    distribution, 86
    generalized, 183
logistic regression, 149, 172
    Bayesian, 292
    c statistic, 151
    generating, 263
    goodness of fit, 171
    Nagelkerke R², 151
    ROC curve, 235
logit link function, 150
lognormal
    distribution, 54, 86
    regression, 149
logrank test, 95, 111

long (tall) to wide format conversion, 34
longitudinal regression, 157
    reshaping datasets, 187
looping, 71, 188
lower to upper case conversions, 27
lowess, 155, 185, 244
Lucida font, 252
Lumley, Thomas, 169, 357

M estimation, 156
machine learning
    task view, 167, 373
machine precision, 61
Macintosh R FAQ, 359
macros, 77
    SAS, 303
MAD regression, 156
Maechler, Martin, 357
mailing list
    R-help, 377
make variables available, 17
manipulate string variables, 23, 25, 26
    remove spaces, 27
    split, 26
MANOVA, 162
Mantel–Haenszel test, 91
maps
    choropleth, 223, 327
    coordinate systems, 325
    Google Maps, 325
    plotting, 321
margin specification, 253
marginal
    histograms, 232
    plot, 245
Markdown, 14, 287
Markov Chain Monte Carlo, 271, 290,
matching, 296
mathematical constants, 60
mathematical expressions, 69, 247
mathematical functions
    absolute value, 59
    beta, 60
    choose, 60
    exponential, 59
    factorial, 60
    Fibonacci sequence, 320
    gamma, 60
    integer functions, 60
    log, 59
    maximum value, 59
    mean value, 59
    minimum value, 59
    modulus, 59
    natural log, 59
    permute, 60
    square root, 59
    standard deviation, 59
    sum, 59
    trigonometric functions, 60
mathematical symbols
    adding, 247
mathematics task view, 62, 373
matrix
    addition, 64
    combine, 63
    component-wise multiplication, 66
    concatenate, 63
    correlation, 128
    covariance, 127, 128
    creation, 63
    design, 127
    dimension, 64
    extract elements, 367
    graphs, 123
    hat, 122
    indexing, 66, 367
    information, 127
    inversion, 65
    large, 63
    multiplication, 57, 65, 127, 366
    overview, 63
    plots, 221
    R, 367
    sparse, 63
    transposition, 64
maximization, 265
maximum likelihood estimation, 86
maximum number of drinks
    HELP dataset, 382
maximum value, 59
McArdle, Brian, xxv
MCMC, 271, 290, 294
McNemar’s test, 92
mcs variable, 101, 382
mean, 59, 83, 86
    by group, 281
    trimmed, 84
mean-difference plot, 228
median regression, 156
medical imaging task view, 373

medical problems, 381
memory usage, 76
merging datasets, 35
meta analysis task view, 373
metadata, 369
methods, 369, 374
metric for distance, 25
Metropolis–Hastings algorithm, 271
MICE (chained equations), 306
Microsoft rtf format, 257
Microsoft Word format, 257
minimum absolute deviation regression, 156
minimum value, 59
Minitab files, 6
missing data, 45, 304, 306
    tables, 90
missing information fraction, 311
missing values, 96
    recoding, 306
mixed model, 158
    generating, 264
    logistic, 160, 161
    logistic regression, 199
model
    comparisons, 143, 169
    diagnostics, 135
    selection, 143, 170
    specification, 117, 130
modeling language, 90, 113, 282
modulus, 59
moments, 84
Mongo databases, 29
month variable, 38
Monty Hall problem, 278
mosaic plot, 224
Mosteller, Fred, 276
motivational interview, 379
moving average model, 162
Mplus, 168
multicollinearity, 156
multilevel models, 160
multinomial model
    generalized, 152
    logit, 183
    nominal outcome, 152
    ordered outcome, 152
multinomial random variable, 56
multiple comparisons, 119, 144
multiple imputation, 306
multiple plots per page, 250
multiple y axes, 218, 230
multiplication
    matrix, 57, 65
multivariate statistics
    task view, 166, 373
multivariate test, 162
multiway tables, 91
Murdoch, Duncan, 357
Murrell, Paul, xxv, 15, 211, 230, 357

Nagelkerke R² for logistic regression, 151
named arguments in R, 78, 370
named lists, 366
names and variable types, 17
native data files, 12
native files, 1
natural language processing task view, 373
negative binomial distribution, 86
negative binomial regression, 153, 180
    zero-inflated, 154
Nelson–Aalen estimate, 164
nested models, 149
nested quotes, 18
New Century Schoolbook font, 252
new users
    R, 362
New Zealand (Aotearoa), 357
next statement, 362
NIAAA, 379
NIDA, 379
NLP optimization, 62
no intercept, 116
noise
    add to points, 243
nonlinear least squares, 155
nonparametric tests, 94, 108
nonrandomized studies, 296
normal density, 245
normal distribution, 53, 54, 68, 84, 86
normal random variables, 56
    truncated, 58
normality testing, 92
normalizing, 85
    constant, 271
    residuals from linear model, 121
    residuals from mixed model, 158
not operator, 305
notched boxplot, 216
NP completeness, 337
number of digits to display, 11

numeric from string, 22
numerical mathematics task view, 373

object-oriented programming, 369
objects
    displaying, 369
    R, 365
    remove, 365
observation number, 32
observational studies, 296
Octave files, 6
ODBC, 29
odds ratio, 87, 105
    homogeneity, 91
ODS, see output delivery system
official statistics, 168
    task view, 169, 373
omit axes, 256
OnDemand for Academics, 341
one-way analysis of variance, 117
open-source, xxiii
OpenBUGS, 291
operating system
    change working directory, 80
    execute command, 79
    find working directory, 80
    list files, 81
    pause execution, 79
optimization, 62
    task view, 62, 373
    with constraints, 337
options
    R, 369
    SAS, 349
OR (odds ratio), 87
or operator, 47, 365
order statistics, 84
ordered factor, 114
ordered logistic model, 152, 182
ordered multinomial model, 152
ordering of levels, 116
ordinal logit, 152, 182
orientation
    axis labels, 254
    boxplot, 214
output data from analysis, 351
output delivery system, 345, 351
output file formats
    R, 288
    SAS, 351, 355
overdispersion, 149, 150
overplotting, 219

packages
    detaching, 17
    help, 373
    R, 371
    remove from workspace, 368
page, multiple plots per, 250
pairs plot, 236
pairwise differences, 119, 144
Palatino font, 252
palettes of colors, 255
Parade magazine, 278
parallel
    boxplots, 193, 215
    computation, 374
    computing task view, 373
    processing, 371
parameter estimates
    confidence interval, 125
    standard errors, 125
    used as data, 124
parameter estimation
    univariate distribution, 86
parameterization of categorical variable, 115, 172, 296
    reference category, 143
partial file read, 2
pathological distribution
    sampling, 271
pause execution for a time interval, 79
pcs variable, 101, 382
pdf output
    exporting, 256
    SAS, 351, 355
peakedness, 84
Pearson correlation, 89
Pearson’s χ² test, 91, 103, 171
percentiles
    probability density function, 54
Perl
    interface, 29
    modules, 13
permutation test, 94, 108
permute function, 60
permuted sample, 31
pharmacokinetic task view, 373
phi coefficient, 91
pi (π), 60
Pioneer Valley, 325
plot

    adding arrows, 247
    adding footnotes, 246
    adding polygons, 247
    adding shapes, 247
    adding text, 246
    arbitrary function, 224
    characters, 242
    conditioning, 221
    curve, 224
    limits, 129
    maps, 321
    predicted lines, 226
    predicted values, 226
    regression diagnostics, 123
    rotating text, 246
    symbols, 242
    time series data, 333
    titles, 246
Plummer, Martyn, 357
PNG export, 259
point size specification, 252
point-and-click interface, 356
points, 243
    locating, 249
Poisson distribution, 86
Poisson family, 150
Poisson regression, 149, 153, 176
    Bayesian, 294
    zero-inflated, 154, 178
polygons, 247
polynomial contrasts, 116
polynomial regression, 155
posterior probability, 290, 294
posting guide (R-help), 377
postscript, 252, 256
power calculations
    analytic, 95
    empirical, 284
practical extraction and report language (Perl), 13, 29
predicted values
    generating from linear model, 120
presentations in RStudio, 290
primary care
    linkage, 381
    visits, 382
primary substance of abuse, 383
printing model results, 11
prior distribution, 290, 294
probability density, 54, 214
probability distributions, 68
    parameter estimation, 86
    quantiles, 53
    random variables, 53
    simulation, 261, 276
    task view, 53, 373
probability integral transform, 58
probit link function, 150
probit regression, 149
profiling of execution, 76
programming, 71
projection, 325
projects, 357
propensity scores, 296
proportion, 86
proportional hazards model, 163, 200
    frailty, 163
    proportionality test, 164
    simulate data, 269
    time-varying covariate, 165
proportional odds model, 152, 182
proportionality test, 164
Pruim, Randall, 216, 224
pseudo R², 151
pseudo-likelihood, 265
pseudo-random number
    generation, 53
    set seed, 55
pss_fr variable, 238, 382
psychometrics, 166, 201
    task view, 166, 373

QQ plot, 136, 225
quadratic growth curve models, 159
quantile regression, 156, 181
quantile-quantile plot, 136, 225
quantiles, 85
    probability density function, 54
    t distribution, 78
quarter variable, 38
quasi-complete separation, 294
quitting R, 361
quotes, nested, 18

R
    available datasets, 377
    bug reports, 377
    command history, 359
    data structures, 365
    detach packages, 185
    Development Core Team, 357
    differences from SAS, xxiv

    exiting, 360
    export SAS dataset, 13
    FAQ, 364, 377
    Foundation for Statistical Computing, 357
    graphical user interface, 359
    help system, 361, 362
    history, 357
    installation, 358
    introduction, 357
    libraries, 371
    Linux installation, 358
    Markdown, 14, 290
    objects, 365
    packages, 371, 373
    Project, 377
    R Commander, 359
    R-help mailing list, 377
    reading SAS files, 6
    resources for new users, 362
    sample session, 360
    starting, 360
    support, 377
    task views, 372
    warranty, 361
    Windows installation, 358
R index, xxiv
R²
    linear regression, 127
    logistic regression, 151
R-help mailing list, 377
ragged data, 321
rail trails, 325
random coefficient model, 158, 159
random effects model, 158, 193
    estimate, 158
    generating, 264
random intercept model, 158
random number
    seed, 55, 319
random slopes model, 158
random variables
    density, 54
    generate, 54
    generation, 53
    probability, 54
    quantiles, 54
randomization group, 383
randomized clinical trial, 379
range
    axes, 253
rank sum test, 94
reading
    bytes, 8
    comma separated value (CSV) files, 4
    data, 39, 350
    data with two lines per obs, 331
    dates, 3
    fixed format files, 2
    HTTP from URL, 9
    long lines, 3
    more complex fixed format files, 3
    native format files, 1
    other files, 3
    other packages, 6
    R into SAS, 5
    R objects, 1
    SAS into R, 6
    spreadsheets, 5
    variable format files, 7, 321
    XML files, 9
receiver operating characteristic curve, 225, 235
recoding missing values, 306
recoding variables, 20, 21
recover from error, 76
rectangular grid, 248
recursive partitioning, 166, 205
reference category, 114, 115, 143, 172,
regression, 113
    categorical predictor, 114
    coefficients, 124
    diagnostic plots, 123
    diagnostics, 135
    forward stagewise, 171
    Gamma, 149
    interaction, 117, 130
    least angle, 171
    logistic, 149
    lognormal, 149
    no intercept, 116
    overdispersed binomial, 149
    overdispersed Poisson, 149
    parameterization, 115, 296
    Poisson, 149
    probit, 149
    residuals, 121
    standardized coefficients, 124
    standardized residuals, 121
    stratified analysis, 283

    studentized residuals, 121
    test for heteroscedasticity, 124
regular expressions, 25, 31
rejection sampling, 271
relative risk, 87
reliability measures, 166, 201
remove
    dataframe from workspace, 368
    objects, 365
    package from workspace, 368
    spaces from a string, 27
rename variables, 19
repeat statement, 72, 362
replace a string within a string, 26
replicating examples from the book, 361
report generation, 14, 106, 287
reproducible analysis, 14, 106, 357
    knitr, 287
    rich text format, 257
    StatWeave, 287
    tangle, 287
    task view, 287, 288, 373
    weave, 287
resampling-based inference, 303
reserved commands, 362
reshaping datasets, 33, 187
residuals, 121
    analysis, 135
    correlated, 158
    plots, 136
    standardized, 121
    studentized, 121
results from HELP study, 379
rich text format (rtf), 257
ridge regression, 156
right censored data, 227
Ripley, Brian, 357
Risk Assessment Battery, 381
robust statistical methods
    empirical variance, 161, 197
    regression, 156
    task view, 156, 373
ROC curve, 225, 235
Rosenthal, Jeffrey, 271
rotating
    axis labels, 254
    text, 246
round results, 40, 60
RR (relative risk), 87
RSeek, 362
RStudio, xxii, 357
    exporting graphs, 256
    installation, 359
    presentations, 290
    reproducible analysis, 290
RTF (rich text format), 257, 351, 355
Rubin, Donald, 306
rug plot, 245
running a script, 362
running average, 318

Samet, Jeffrey, 379
sample size calculations
    analytic, 95
sampling
    challenging distribution, 271
    dataset, 31
sampling distribution, 274
sandwich variance, 161, 197
Sarkar, Deepayan, 211, 230, 313, 357
SAS
    Base, 341
    basic concepts, 347
    differences from R, xxiv
    files from R, 6
    GUI, 341
    Institute, 341
    license file, 341
    macros, 303
    on-line help, 346
    OnDemand, 341
SAS index, xxiv
SAS/ETS, 341
SAS/GRAPH, 341
SAS/IML, 341
SAS/STAT, 341
saving
    data, 42
    graphs, 256
    parameter estimates as dataset, 175
    printed result as SAS dataset, 132
    R history, 359
    SAS history, 345
scale
    log, 255
scaling, 85
scatterplot, 102, 129, 217
    binned, 219
    lines, 244
    marginal histograms, 220, 232
    matrix, 221
    multiple y values, 218

    points, 243
    separate plotting characters per group, 242
    smoother, 129, 244
Schoenfeld residuals, 164
Schoenfeld, David, xxv
Scott, Alastair, xxv
scraping data, 330
script file, 361, 362
search for approximate string, 25
seed, random number, 55, 275
sensitivity, 87, 225
sensitivity to case, xxiv
separate model fitting by group, 138
separate plotting characters per group, 242
server version, 357
set operations, 24
setinit file, 341
settings, graphical, 253
sexrisk variable, 172, 183, 383
SF-36 short form health survey, 382
shapes, 247
short form (SF) health survey, 382
shrinkage method, lasso, 170
side-by-side boxplots, 193, 215
sideways orientation
    boxplot, 214
significance stars in R, 113, 132
simulate
    categorical data, 261
    Cox model, 269
    generalized linear model random effects, 264
    logistic regression, 263
    power calculations, 284
simulation studies, 263
sine function, 60
singular value decomposition, 68
size of graph, 250
skewness, 84
slides in RStudio, 290
Smith College, xxv, 276
smoothing spline, 129, 155, 185, 213, 220, 244
social sciences
    task view, 113, 128, 149, 172, 373
social supports, 382
SOCR (Statistics Online Computational Resource), 359
solve optimization problems, 62
sorting, 35, 51
sourcing commands, 361
sparse matrices, 63
spatial statistics
    choropleth, 327
    task view, 172, 325, 373
spatio-temporal data
    task view, 373
Spearman correlation, 89
specificity, 87, 225
specifying
    box around plots, 252
    color, 255
    design matrix, 115, 296
    margin, 253
    point size, 252
    text size, 252
splines, 374
split string, 26
spreadsheet, 5, 10
SPSS files, 6, 13
SQL, 29, 336
square root, 59
    link function, 150
stagewise regression, 171
standard deviation, 59, 83
standard error, 76
standardized regression coefficients, 124
standardized residuals, 121
    mixed model, 158
Stata files, 6, 13, 327
statements, SAS, 349
statistical genetics task view, 373
statistical learning task view, 373
Statistics Online Computational Resource (SOCR), 359
StatWeave, 287
stem plot, 212
straight line
    adding, 242
stratified analysis, 138, 283
string variable
    concatenating strings, 24
    extract characters, 23
    find a string, 25
    find approximate string, 25
    from numeric variable, 20
    length, 23
    remove spaces, 27
    replace a string, 26
structural equation modeling

    latent class analysis, 167
structured query language (SQL), 29,
Student’s t test, 93, 274
studentized residuals, 121
styles
    axes, 253
    line, 254
sub variable, 129, 139
subject index, xxiv
submatrix, 66
submitting code, 343
subsetting, 30, 47, 51, 350
substance abuse treatment, 382
substance of abuse, 383
substance variable, 102, 383
sum, 59
summary statistics, 97
    mean, 83
    separately by group, 52, 281
sums of squares
    cross products, 127
    Type III, 130, 169
support, 377
survey methodology, 168
    task view, 169, 373
survival analysis, 163
    accelerated failure time model, 163
    Cox model, 200
    frailty, 163
    Kaplan–Meier plot, 227, 234
    logrank test, 95, 111
    proportional hazards model, 163
    simulate data, 269
    task view, 163, 228, 373
suspend execution for a time interval, 79
Sweave, 14, 287
sweep operator, 85
symbols
    mathematical, 247
    plot, 242
syntax highlighting, 357
Systat files, 6

t distribution, 68, 86
    quantile, 78
t test, 93, 107, 274
table, cross-classification, 90
tabulate binomial probabilities, 317
tangent function, 60
tangle, 287, 290
task view, 372
    analysis of spatial data, 172
    Bayesian inference, 290, 294
    clustering, 166, 168
    finite mixture models, 313
    graphics, 211
    machine learning, 167
    multivariate statistics, 166
    official statistics, 169
    optimization and mathematical programming, 62
    probability distributions, 53
    psychometrics, 166
    reproducible analysis, 287, 288
    robust statistical methods, 156
    social sciences, 113, 128, 149, 172
    spatial statistics, 325
    survival analysis, 163, 228
    time series, 162
Temple Lang, Duncan, xxv, 357
temporal data
    task view, 373
test
    characteristics, 87
    heteroscedasticity, 124
    interaction, 141
    joint null hypotheses, 118, 119
    normality, 92
text
    adding, 246
    files, 12
    rotating, 246
    size specification, 252
Tibshirani, Rob, 170
tick marks, 254
Tierney, Luke, 357
TIFF export, 259
time
    elapsed, 39
    variables, 39
time series, 162
    plotting, 333
    task view, 162, 373
time variable, 188
time-to-event analysis, 163
time-varying covariate, 165
Times font, 252
timing commands, 78
titles, 246
tolerance, 61
topic index, xxiv

tracing memory usage, 76
transformed residuals, 158
translations, character, 27
transparent plot symbols, 219
transposing
    long (tall) to wide format, 34
    matrix, 64
    wide to long (tall) format, 33
trap error, 76
treat variable, 111, 383
treatment contrasts, 116
trigonometric functions, 60
trimmed mean, 84
true positive, 225
truncated normal random variables, 58
truncation, 60
Tufte, Edward, 216, 230
Tukey, John, 230
    honest significant differences, 119, 144
    mean-difference plot, 228
    notched boxplot, 216
two line data input, 331
two sample t test, 93, 107
two-way ANOVA, 117, 139
    interaction plot, 223
two-way tables, 103
Tyler, Kristin, xxv
Type III sums of squares, 130, 169

UCLA, 359
uniform random variables, 55
union, 24
unique values, 32
univariate distribution parameter estimation, 86
univariate loess, 155
universal resource locator (URL), 9
University of Auckland, 357
    Department of Statistics, xxv
unnamed function, 284
unstructured covariance matrix, 190
unstructured working correlation, 161
upper to lower case conversions, 27
Urbanek, Simon, 357
URL, 9
    harvesting data, 330
user-defined functions, 77
using the book, xxiv

values of variables, 18
van Buuren, Stef, 306
Vanderbilt University, 216
variable display, 18
variable format files, 7, 321
variable labels, 18
variables
    add, 19
    rename, 19
variance, 83, 276
variance covariance matrix, 158
variance equality test, 93
varimax rotation, 166, 202
vectors
    efficiency, 72
    extract elements, 367
    from a matrix, 67
    indexing, 365
    recycling, 365
version number, 372
Verzani, John, 39
violin plots, 215
visualize correlation matrix, 238
Vos Savant, Marilyn, 278

warranty for R, 361
weave, 287, 290
web technologies, 9, 15
    task view, 373
website for book, xxv
weekday variable, 38
Weibull distribution, 86, 269
weighted least squares, 156
where to begin, xxiv
while statement, 72, 362
White variance, 197
Wickham, Hadley, 29, 211, 230, 281, 284, 325
wide to long (tall) format conversion, 33
width of line, 255
Wilcoxon test, 94, 108
Wild, Chris, xxv
wildcard expansion, 81
Wilkinson dot plot, 212
WinBUGS, 291
Windows
    installation of R, 358
    metafile, 258
    R FAQ, 358
Word rtf format, 257
workflow, 287
working correlation matrix, 161, 197

working directory, 80
workspace, 369
workspace browser, 357
writing
    CSV (comma separated value) files, 12
    native format files, 12
    other packages, 13
    text files, 12

X matrix, 127
x-y plot, see scatterplot
Xie, Yihui, 288
XML, 9, 13
    create file, 13
    DocBook DTD, 14
    read file, 6
    write files, 14

year variable, 38

Zaslavsky, Alan, xxv
zero-inflated
    negative binomial regression, 154
    Poisson regression, 154, 178
