Categorical data with R
Tabulating data with R
2012-10-22 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive statistics
Menu
- Categorical data
- How to tabulate
- Get sums and proportions
Ingredients
- Tables
- Cross tables
- Stratified tables
- data()
- table(), summary()
- prop.table()
- addmargins()
- xtabs(), ftable()
- gmodels::CrossTable()
- epiR::epi.2by2()
- Creating categorical variables
Epi/Stat Programming
Categorical data
gender
country
race
ethnicity
cancer stage
disease severity
education level
Open RStudio
Install and load: vcd, epiR
data(Arthritis)
Load the built-in dataset named “Arthritis”
We will use the “Arthritis” dataset in the vcd package
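A minimal sketch of the install-and-load step, assuming an internet connection and a configured CRAN mirror. The requireNamespace() guard is an addition for illustration and is not from the slides; it skips installation when a package is already present.

```r
# Install vcd and epiR only if they are missing (assumes CRAN access)
for (pkg in c("vcd", "epiR")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

library(vcd)      # provides the Arthritis dataset
library(epiR)     # provides epi.2by2()

data(Arthritis)   # load the dataset into the workspace
str(Arthritis)    # 84 observations of 5 variables
```

Packages need to be installed once per machine, but library() has to be called in every new R session.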
Indexing: extraction of data from a data frame
Arthritis[1:17, ]
Before the comma: 1:17 extracts the 1st to 17th rows (colon in between)
After the comma: left blank, so all columns are shown
Don’t forget the comma
Treatment vector in the Arthritis data frame
Arthritis is five vectors of the same length tied together
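The row-and-column indexing described above can be tried in a few variations. This is a sketch that assumes the vcd package is installed; the column and condition choices are illustrative and not taken from the slides.

```r
data(Arthritis, package = "vcd")  # assumes the vcd package is installed

Arthritis[1:17, ]                            # rows 1 to 17, all columns
Arthritis[1:5, c("Treatment", "Improved")]   # rows 1 to 5, two named columns
Arthritis[Arthritis$Sex == "Male", ]         # rows selected by a condition

dim(Arthritis)     # number of rows and columns
names(Arthritis)   # names of the five columns (vectors)
```

Leaving the position before the comma blank keeps all rows, just as leaving the position after it blank keeps all columns.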
summary
summary(Arthritis)
Summary of the whole dataset
Your turn
- summary(Arthritis)
adopted from Hadley Wickham
Arthritis$Treatment
Accessing a single variable in a dataset
Dataset name before the $, variable name after it
Arthritis$Treatment
factor levels (categories)
levels
levels(Arthritis$Treatment)
Check factor levels of a vector
Your turn
- Arthritis$Improved
- levels(Arthritis$Improved)
This is an ordered factor
factor
A factor is a categorical variable in R
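To make the difference between a plain factor and an ordered factor concrete, here is a small sketch. The response vector is hypothetical toy data, not taken from the Arthritis dataset.

```r
# Hypothetical toy data, not from the Arthritis dataset
response <- c("None", "Marked", "Some", "None", "Marked")

# Unordered factor: levels default to alphabetical order
f_plain <- factor(response)
levels(f_plain)        # "Marked" "None" "Some"

# Ordered factor: levels given explicitly, from lowest to highest
f_ord <- factor(response, levels = c("None", "Some", "Marked"), ordered = TRUE)
levels(f_ord)          # "None" "Some" "Marked"
is.ordered(f_ord)      # TRUE

# Comparisons are meaningful only for ordered factors
f_ord >= "Some"        # FALSE TRUE TRUE FALSE TRUE

table(f_ord)           # counts are listed in level order
```

Setting the levels explicitly also controls the order in which categories appear in tables.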
table
table(Arthritis$Improved)
Create a single variable summary
Your turn
- table(Arthritis$Improved)
prop.table
prop.table(table.object)
Convert tables to proportions
Your turn
- Improved.cat <- table(Arthritis$Improved)
- prop.table(Improved.cat)
xtabs
xtabs(formula = ~ , data = Arthritis)
Create cross tables
Your turn
- xtabs(~ Treatment + Improved, Arthritis)
- xtabs(~ Treatment + Improved + Sex, Arthritis)
1st dimension
2nd dimension
3rd dimension
addmargins
addmargins(table.object)
Add margins to tables
Your turn
- tab1 <- xtabs(~ Treatment + Improved, Arthritis)
- addmargins(tab1)
ftable
ftable(..., exclude = c(NA, NaN),
row.vars = NULL, col.vars = NULL)
Create flat tables
Good for tables with ≥ 3 dimensions
Your turn
- tab2 <- xtabs(~ Treatment + Improved + Sex, Arthritis)
- ftable(tab2)
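The row.vars and col.vars arguments in the ftable() signature control which variables go down the side and which go across the top. A brief sketch, assuming the vcd package is installed; the particular layouts are illustrative choices.

```r
data(Arthritis, package = "vcd")  # assumes the vcd package is installed

tab2 <- xtabs(~ Treatment + Improved + Sex, data = Arthritis)

ftable(tab2)                                    # default: last variable (Sex) in the columns
ftable(tab2, row.vars = c("Sex", "Treatment"))  # Improved moves to the columns
ftable(tab2, col.vars = c("Sex", "Improved"))   # Treatment alone in the rows
```

All three layouts hold the same counts; only the arrangement changes.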
prop.table
prop.table(cross.table.object)
Proportions again
Your turn
- tab3 <- xtabs(~ Treatment + Improved, Arthritis)
- prop.table(tab3) # proportion of the grand total
- prop.table(tab3, 1) # proportion of each row sum
- prop.table(tab3, 2) # proportion of each column sum
1st dimension
2nd dimension
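Proportions are often easier to read as percentages. The following sketch combines prop.table() with round() and addmargins(); it assumes the vcd package is installed, and the rounding to one decimal place is an arbitrary choice.

```r
data(Arthritis, package = "vcd")  # assumes the vcd package is installed

tab3 <- xtabs(~ Treatment + Improved, data = Arthritis)

# Row percentages: within each treatment arm, the split by improvement
row_pct <- round(100 * prop.table(tab3, margin = 1), 1)
row_pct

# Append a "Sum" column to confirm each row totals about 100 (up to rounding)
addmargins(row_pct, margin = 2)
```

With margin = 1 the 1st dimension (Treatment) is held fixed, so each row sums to 1; with margin = 2 each column does.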
chisq.test
chisq.test(cross.table.object)
Chi-squared test
fisher.test
fisher.test(cross.table.object)
Fisher’s exact test
Your turn
- tab3 <- xtabs(~ Treatment + Improved, Arthritis)
- chisq.test(tab3)
- fisher.test(tab3)
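The object returned by chisq.test() can be inspected to help choose between the two tests. This sketch assumes the vcd package is installed. The "expected count below 5" cutoff is a common rule of thumb and an assumption here, not something stated in the slides.

```r
data(Arthritis, package = "vcd")  # assumes the vcd package is installed

tab3 <- xtabs(~ Treatment + Improved, data = Arthritis)

res <- chisq.test(tab3)
res$observed   # the counts in tab3
res$expected   # counts expected if Treatment and Improved were independent
res$p.value    # p-value of the chi-squared test

# Rule of thumb (an assumption, not from the slides):
# consider Fisher's exact test when any expected count is below 5
if (any(res$expected < 5)) {
  fisher.test(tab3)$p.value
}
```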
CrossTable
CrossTable(tab.2d)
available in the gmodels package
SAS-like cross tables
Your turn
- tab3 <- xtabs(~ Treatment + Improved, Arthritis)
- CrossTable(tab3)
epi.2by2
epi.2by2(tab.2by2)
available in the epiR package
2x2 table with risk ratio (RR), risk difference (RD), and odds ratio (OR)
Your turn
- tab.2by2 <- xtabs(~ Sex + Treatment, Arthritis)
- epi.2by2(tab.2by2, units = 1)
Creating a factor
Data in Excel (integer) -> factor
dat$Stage <- factor(dat$Stage)
To convert a numeric vector to a factor vector
dat$Stage <- as.numeric(as.character(dat$Stage))
To convert the factor back to numbers
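The as.character() step in the conversion back matters, as this sketch shows. The data frame dat and its Stage values are hypothetical, chosen so that the level codes and the original values differ.

```r
# Hypothetical stage values, as they might arrive from a spreadsheet
dat <- data.frame(Stage = c(4, 2, 2, 3, 4))

dat$Stage <- factor(dat$Stage)       # numeric -> factor
levels(dat$Stage)                    # "2" "3" "4"

as.numeric(dat$Stage)                # 3 1 1 2 3  (internal level codes, not the stages)
as.numeric(as.character(dat$Stage))  # 4 2 2 3 4  (the original values)
```

Calling as.numeric() directly on a factor returns the position of each level, which matches the original numbers only when the values happen to be 1, 2, 3, and so on.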