Beautiful tables in R: the tables package

Duncan Murdoch

Department of Statistical and Actuarial Sciences
University of Western Ontario

November 29, 2013


Outline

1 About tables in general

2 The tables package


Tables aren’t easy

Gelman (2011) “Why Tables are Really Much Better Than Graphs” is a tongue-in-cheek article defending the use of graphs rather than tables. It presents poor arguments “for” tables, and refutes them in favour of graphs.

Sometimes tables are better than graphs, but it’s not easy to create good tables (or good graphs).


Some quotes from Gelman’s paper

A table is not meant to be read as a narrative, so do not obsess about clarity. It is much more important to put in the exact numbers, as these represent the most important summary of your results...


It is also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided by whatever software you have used to fit the model. Software designers have chosen their defaults for a good reason, and I would go with that.


The depressing truth

The depressing truth is that many authors follow the previous pieces of advice (and others in the paper). I would post examples, but I’d rather not embarrass those authors.


Principles of good tables

Ehrenberg (1977) is an excellent paper about producing tables. Some advice:

Round to two significant or effective digits.
Display row and column averages.
Put items to be compared in the same column, one above the other.
Order rows and columns by size.
Don’t insert too much white space: things to be compared should be close to each other, but add gaps every 5 or so rows to help the eye travel across the table.

This advice should be considered, not followed blindly: tables are meant for communication.
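As an illustration that is not part of the talk, here is a minimal base R sketch of three of these points applied to the iris measurements: ordering rows and columns by size, showing row and column averages, and rounding to two significant digits. The object names (means, tab) are made up for this example, and signif() is only a stand-in for Ehrenberg's "effective digits", which is a looser notion than significant digits.

```r
# Mean of each numeric iris measurement, by species (base R only).
means <- aggregate(. ~ Species, data = iris, FUN = mean)
tab <- as.matrix(means[, -1])
rownames(tab) <- as.character(means$Species)

# Order rows and columns by size (largest average first).
tab <- tab[order(rowMeans(tab), decreasing = TRUE),
           order(colMeans(tab), decreasing = TRUE)]

# Display row and column averages.
tab <- rbind(tab, Average = colMeans(tab))
tab <- cbind(tab, Average = rowMeans(tab))

# Round to two significant digits for display.
print(signif(tab, 2))
```

With only three rows there is no need for the extra gaps every five or so rows; in a longer table they would have to be added at the formatting stage.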


How to produce good tables?

I don’t think authors want to produce bad tables; I think they don’t know better, or don’t know how to do better. So I wrote the R package tables (Murdoch, 2013) to make it easy to produce good tables.


Background of my package

Many years ago, I loved SAS PROC TABULATE, which made it pretty easy to do the computations necessary to produce good tables.
My package tables improves on PROC TABULATE by working well with Sweave and LaTeX. R is a particularly natural choice for this, much more flexible than SAS.


What is a table?

A rectangular array of numbers or text or pictures.
Labels on the rows and columns. These may cover multiple entries, and may be nested.
A caption.

The formula interface in tables handles the body and the labels. LaTeX can handle the captions.


Fisher’s Iris Data

My examples work with Fisher’s famous iris dataset:

> head(iris,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa


Group summaries

> library(tables)  # Load the package first
> booktabs()       # Choose "booktabs" style

> latex(tabular(Species ~ (Sepal.Length
+                          + Sepal.Width)
+                         *(mean + sd),
+               data=iris))


\begin{tabular}{lcccc}
\toprule
 & \multicolumn{2}{c}{Sepal.Length} & \multicolumn{2}{c}{Sepal.Width} \\ \cmidrule(lr){2-3}\cmidrule(lr){4-5}
Species & mean & sd & mean & \multicolumn{1}{c}{sd} \\
\midrule
setosa & $5.006$ & $0.3525$ & $3.428$ & $0.3791$ \\
versicolor & $5.936$ & $0.5162$ & $2.770$ & $0.3138$ \\
virginica & $6.588$ & $0.6359$ & $2.974$ & $0.3225$ \\
\bottomrule
\end{tabular}


             Sepal.Length      Sepal.Width
Species      mean     sd       mean     sd
setosa       5.006    0.3525   3.428    0.3791
versicolor   5.936    0.5162   2.770    0.3138
virginica    6.588    0.6359   2.974    0.3225

Fewer digits!


Fewer digits

> latex(tabular(Species ~ Format(digits=2)
+                         *(Sepal.Length
+                           + Sepal.Width)
+                         *(mean + sd),
+               data=iris))


             Sepal.Length    Sepal.Width
Species      mean    sd      mean    sd
setosa       5.01    0.35    3.43    0.38
versicolor   5.94    0.52    2.77    0.31
virginica    6.59    0.64    2.97    0.32
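The means above keep three significant digits even though digits=2 was requested. One plausible reading, assuming that Format() passes its arguments on to base R's format() and that the cells are formatted together (an assumption about the package that should be checked against its vignette), is that format() gives every entry the number of decimals needed by the entries that require the most. The base R sketch below shows that behaviour on the setosa row; the variable names are invented for the example.

```r
# Values copied from the setosa row of the four-digit table.
x <- c(5.006, 0.3525, 3.428, 0.3791)

# Formatted together: every entry gets the number of decimals that the
# entries below 1 need in order to show two significant digits.
together <- format(x, digits = 2)
print(together)

# Formatted one at a time, each value is rounded on its own, so the
# number of decimals differs from entry to entry.
separately <- vapply(x, format, character(1), digits = 2)
print(separately)
```

Formatting the values together reproduces the 5.01, 0.35, 3.43, 0.38 seen in the table, which keeps the decimal points lined up within a column.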

Marginal summaries!


Marginal summaries

> latex(tabular(Species + 1 ~ Format(digits=2)
+                             *(Sepal.Length
+                               + Sepal.Width
+                               + 1)
+                             *(mean + sd),
+               data=iris))


Marginal summaries (Oops...)

             Sepal.Length    Sepal.Width     All
Species      mean    sd      mean    sd      mean   sd
setosa       5.01    0.35    3.43    0.38    NA     NA
versicolor   5.94    0.52    2.77    0.31    NA     NA
virginica    6.59    0.64    2.97    0.32    NA     NA
All          5.84    0.83    3.06    0.44    NA     NA


Marginal summaries

> latex(tabular(Species + 1 ~ Format(digits=2)
+                             *(Sepal.Length
+                               + Sepal.Width)
+                             *(mean + sd),
+               data=iris))


             Sepal.Length    Sepal.Width
Species      mean    sd      mean    sd
setosa       5.01    0.35    3.43    0.38
versicolor   5.94    0.52    2.77    0.31
virginica    6.59    0.64    2.97    0.32
All          5.84    0.83    3.06    0.44

Spacing!


Spacing

> latex(tabular(Species
+               + Hline(2:5) + 1
+               ~ Format(digits=2)
+                 *(Sepal.Length
+                   + Sepal.Width)
+                 *(mean + sd),
+               data=iris))


             Sepal.Length    Sepal.Width
Species      mean    sd      mean    sd
setosa       5.01    0.35    3.43    0.38
versicolor   5.94    0.52    2.77    0.31
virginica    6.59    0.64    2.97    0.32

All          5.84    0.83    3.06    0.44

Better labels!


Better labels

> names <- paste("\\textit{Iris",
+                levels(iris$Species), "}")
> latex(tabular(Factor(Species, levelnames=names)
+               + Hline(2:5) + 1
+               ~ Format(digits=2)
+                 *(Heading("Sepal length")*Sepal.Length
+                   + Heading("Sepal width")*Sepal.Width)
+                 *(mean + sd),
+               data=iris))


                  Sepal length    Sepal width
Species           mean    sd      mean    sd
Iris setosa       5.01    0.35    3.43    0.38
Iris versicolor   5.94    0.52    2.77    0.31
Iris virginica    6.59    0.64    2.97    0.32

All               5.84    0.83    3.06    0.44


What’s in a formula?

Terms in a formula can be:

function names: Summary statistics, e.g. mean.
factors: Categories, e.g. Species.
logical vectors: Subsets.
other vectors: Values to be summarized.
“pseudo-functions”: Things that handle formatting, e.g. Format.
formula functions: Abbreviate formulas, e.g. Hline.
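As a rough analogy only, and not a description of how tables evaluates a formula, the first four kinds of term correspond to familiar base R ingredients: a factor defines the groups, a logical vector picks a subset, a numeric vector supplies the values, and a function summarizes them. The sketch below combines all four with tapply(); the names keep and res and the cut-off of 3 are invented for the illustration.

```r
# logical vector: a subset of the rows (flowers with wide sepals)
keep <- iris$Sepal.Width > 3

# other vector:   iris$Sepal.Length, the values to be summarized
# factor:         iris$Species, the categories that define the groups
# function name:  mean, the summary statistic
res <- tapply(iris$Sepal.Length[keep], iris$Species[keep], mean)
print(res)
```

The formatting and layout roles played by pseudo-functions and formula functions have no counterpart in this sketch; they are what the tables package adds on top of the computation.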


References

A. S. C. Ehrenberg. Rudiments of numeracy. Journal of the Royal Statistical Society, Series A, 140:277–297, 1977.

Andrew Gelman. Why tables are really much better than graphs. Journal of Computational and Graphical Statistics, 20:3–7, 2011.

Duncan Murdoch. tables: Formula-driven table generation, 2013. R package version 0.7.64, on CRAN.

Read the vignette in tables for lots of details and examples.
