
DAI 523 Information Design 1: Data Visualization | Trogu | Fall 2016 Wednesday, August 31, 2016 sorted bar plot with 45 degree labels – step by step

Sorted bar plot with 45 degree labels in R

In this exercise we’ll plot a bar graph, sort it in decreasing order (big to small, left to right), and place long labels under the bars. The labels will be set at a 45-degree angle so that they fit and are still readable. Note that in Illustrator you can do this quickly with a rotated text box plus a second box that wraps the text, forcing the labels to line up with the bars at the base of the graph.

Thanks to Gabriel Bentley and Maggie Lee (TAs), who researched the code. The data file was originally used by student Michelle Boccia.

Download the data set here. It lists tuition increases at various state universities from 2010-11 to 2011-12; in this exercise, the percent increase will be plotted. The data set file (CSV) is called: stateU1011.csv. Below is how the text file looks and how it will look in R.

The data has already been cleaned: there are no spaces or special characters in the file name or header names. This matters because R renames such headers when importing. For example, if a header contains a dash, R changes the dash into a period, and if a header starts with a number, R adds an X in front of it.
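The renaming described above comes from read.csv(), which by default (check.names = TRUE) runs the headers through make.names(). A minimal sketch, using two hypothetical header names invented for illustration (the real headers of stateU1011.csv are already clean):

```r
# Hypothetical CSV content: one header contains a dash,
# the other starts with a number.
csv_text <- "percent-increase,2011tuition\n5,100"

# Default import: headers pass through make.names(), so "-" becomes "."
# and a header starting with a digit gets an "X" in front.
mangled <- read.csv(text = csv_text)
print(names(mangled))   # "percent.increase" "X2011tuition"

# check.names = FALSE keeps the headers as typed; such columns then
# need backticks in code, e.g. raw$`percent-increase`.
raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw))       # "percent-increase" "2011tuition"
```

Cleaning the headers before import, as was done for this data set, avoids both the renaming and the backticks.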

Please note that not all data sets are ideally plotted as a bar chart. I believe bar plots work best when the X axis (horizontal) is used for categories (universities, states, etc.) rather than dates. When a time series needs to be plotted (years etc. on the X axis), a line graph is sufficient. Also, plotting percentages as bars is usually great for comparing the items with one another, but the relation to the whole (100%) is usually cut off at the top of the axis, and that can skew the perception of the graph. Just be aware of it.
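If you do plot percentages as bars and want the whole in view, one option is to fix the y-axis limits with ylim. A minimal sketch, assuming the values are stored in percent units (e.g. 8 for 8%); the numbers are invented placeholders, not the tuition data:

```r
# Hypothetical percent increases, in percent units (invented values).
pct <- c(8, 15, 4, 11)

# Default scale: the y axis only reaches about the tallest bar,
# so the relation to 100% is cut off.
barplot(pct)

# Fixed 0 to 100 scale: each bar is shown against the whole.
barplot(pct, ylim = c(0, 100))
```

The trade-off is that small differences between bars become harder to see on the full scale.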

The final code can be found at the end of this document.

Import the dataset stateU1011.csv into RStudio (header: yes, comma separated: yes) and plot. As a rule, type your code in the R script window (upper left). To run code, use the Run button in the upper right of that window; if necessary, first select only the lines you want to run.

Next, in the plot matrix produced by plot(stateU1011) (the next step), identify which data columns from the data set you are going to visualize. Your choices are the labels in the boxes along the diagonal of the matrix. For each plot, look up or down to identify the X axis, and look sideways to identify the Y axis. For this step, also refer to Chapter 6 in the textbook, especially my annotated pages 188-189. In this example we will pick campus and percentIncrease.
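The same import can also be typed into the script, so the whole exercise runs from code. A sketch, roughly equivalent to the Import Dataset dialog with header: yes and comma separator; it assumes the file sits in the working directory, and, only so the sketch runs without the course file, it falls back to a tiny stand-in file with invented rows (not the real data):

```r
# Assumption: stateU1011.csv is in the working directory
# (check with getwd(), change with setwd()).
csv_path <- "stateU1011.csv"

# Fallback for running this sketch on its own: a stand-in file with
# invented rows and only two columns. NOT the real data.
if (!file.exists(csv_path)) {
  csv_path <- tempfile(fileext = ".csv")
  writeLines(c("campus,percentIncrease",
               "CampusA,12.5",
               "CampusB,4.0",
               "CampusC,8.2"), csv_path)
}

# header: yes, separator: comma.
stateU1011 <- read.csv(csv_path, header = TRUE, sep = ",")

# Check column names and types before plotting.
str(stateU1011)
```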


Plot matrix of possible bivariate combinations

plot(stateU1011)

As a quick check, plot percentIncrease against ay2011; by default, R will plot little dots.

plot(stateU1011$ay2011, stateU1011$percentIncrease)


Plot percentIncrease using the barplot command (note that only one data column is needed to plot the graph). The bars appear in the row order of the data file, which here is alphabetical by campus (the names of the universities). It looks cool, but it’s difficult to compare each university with the others. Sorting the bars will look less cool, but it will be much more informative.

barplot(stateU1011$percentIncrease)

Next, add the campus names as bar labels using names.arg. Notice that only some of the labels are displayed: there is not enough room for all of them, and R leaves out labels that would overlap their neighbors.

barplot(stateU1011$percentIncrease, names.arg=stateU1011$campus)
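If you want R itself to show more of the labels, one common workaround (not the approach this tutorial takes) is to turn them perpendicular to the axis with las = 2 and shrink them with cex.names. A sketch on an invented stand-in data frame; with long names you may also need a larger bottom margin:

```r
# Hypothetical stand-in for stateU1011 (invented names and values).
demo <- data.frame(
  campus = c("Campus Alpha", "Campus Bravo", "Campus Charlie", "Campus Delta"),
  percentIncrease = c(12.5, 4.0, 8.2, 15.1),
  stringsAsFactors = FALSE
)

# las = 2 draws the axis labels perpendicular to the axis (vertical
# here), so they no longer collide; cex.names = 0.7 shrinks the bar
# labels to 70% of their normal size.
bp <- barplot(demo$percentIncrease, names.arg = demo$campus,
              las = 2, cex.names = 0.7)
```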


Next, we’ll sort the bars. To do this, we’ll create a new object in R in which the rows are sorted by the increase amount (percentIncrease).

sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")

See the result below, and the sortedTable object in the following picture.

Click the sortedTable object in the Environment pane (upper right) to display this new data set, sorted by percentIncrease. Note that by default R sorts in increasing order (small to big).
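To see what order() does inside the square brackets, here is a sketch on a three-row stand-in data frame (invented names and values). order() returns row positions rather than values, and using those positions as row indices rearranges the whole table. The sorted table also keeps the original row numbers as row names, which is why the viewer shows them out of sequence:

```r
# Hypothetical stand-in for stateU1011 (invented names and values).
demo <- data.frame(
  campus = c("CampusA", "CampusB", "CampusC"),
  percentIncrease = c(12.5, 4.0, 8.2),
  stringsAsFactors = FALSE
)

# order() returns row positions, smallest value first:
# row 2 (4.0), then row 3 (8.2), then row 1 (12.5).
idx <- order(demo$percentIncrease)
print(idx)

# Using those positions as row indices (the empty slot after the comma
# keeps every column) gives the sorted table.
sortedTable <- demo[idx, ]
print(sortedTable$campus)     # "CampusB" "CampusC" "CampusA"
print(rownames(sortedTable))  # original row numbers: "2" "3" "1"

# decreasing = TRUE puts the largest value first: rows 1, 3, 2.
print(order(demo$percentIncrease, decreasing = TRUE))
```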


Now add the university names as labels at a 45-degree angle (run all three lines at once or one at a time). If you like, experiment with the values passed to text().

sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")

text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.5, srt=45, xpd=TRUE, pos=2)

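Here is what each argument of text() does:

x: the horizontal position of each label. midpts holds the x coordinate of the center of each bar (barplot() returns these values; you can see midpts in the Environment pane, but don’t worry about it here), and +.5 shifts each anchor half a unit to the right.

y: the vertical position of the labels, in the units of the y axis; -1 places them one unit below the zero baseline. Play around with this value. If it looks like nothing happened but you don’t get an error, the labels are probably rendering off screen, outside the window; change the value until they appear.

sortedTable$campus: the label text, i.e. the campus names in the new order, sorted by percent increase.

cex: the type size, as a multiple of the default size (0.5 is half size).

srt: the rotation angle of the labels, in this case 45 degrees.

xpd: whether R may draw outside the plot area. With xpd=TRUE the labels can extend into the margin below the bars; with FALSE they are clipped to the plot area, which is why they don’t appear.

pos: where the text sits relative to its (x, y) point: 1 below, 2 left, 3 above, 4 right. With pos=2 each label ends at its point, so the labels read as flush right. Try the other numbers for fun.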
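To see the numbers behind x, here is a sketch with an invented, already-sorted stand-in table. The second argument to barplot() (the 1) is the bar width; with the default gap of 0.2 between bars, the centers that barplot() returns fall at 0.7, 1.9, 3.1, and so on:

```r
# Hypothetical stand-in data, already sorted (invented names and values).
sortedTable <- data.frame(
  campus = c("Campus Alpha", "Campus Bravo", "Campus Charlie"),
  percentIncrease = c(15.1, 8.2, 4.0),
  stringsAsFactors = FALSE
)

# barplot() invisibly returns the x coordinate of each bar's center.
# Width 1 plus the default spacing of 0.2 gives 0.7, 1.9, 3.1.
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg = "")
print(midpts)

# Rotated labels: anchors half a unit right of each center, one y unit
# below zero, drawn to the left of the anchor (pos = 2), with
# clipping turned off (xpd = TRUE) so they can sit in the margin.
text(x = midpts + .5, y = -1, sortedTable$campus,
     cex = 0.5, srt = 45, xpd = TRUE, pos = 2)
```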

Next, we’ll reverse the sorting to the more traditional order: large to small, left to right. The code is the same as before, except for the added decreasing = TRUE and a bigger type size (cex=0.75). Run all three lines at once again.

sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")

text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.75, srt=45, xpd=TRUE, pos=2)

Note that the labels are still cut off at the bottom of the window. Don’t worry: after you export the plot to PDF and open the file in Illustrator, the labels will display correctly; just make the artboard bigger to fit them. (See pic on page 6.)
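If you would rather see the full labels in R before exporting, one alternative to resizing the artboard (not the tutorial’s method) is to enlarge the bottom margin with par(mar = ...) so the rotated text has room. A sketch with an invented stand-in table; the 8-line margin is an arbitrary starting value to adjust:

```r
# Hypothetical stand-in data, sorted large to small (invented values).
sortedTable <- data.frame(
  campus = c("Campus Alpha", "Campus Bravo", "Campus Charlie"),
  percentIncrease = c(15.1, 8.2, 4.0),
  stringsAsFactors = FALSE
)

# mar is c(bottom, left, top, right) in lines of text; the default is
# c(5.1, 4.1, 4.1, 2.1). par() returns the old values for restoring.
old_par <- par(mar = c(8, 4.1, 4.1, 2.1))

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg = "")
text(x = midpts + .5, y = -1, sortedTable$campus,
     cex = 0.75, srt = 45, xpd = TRUE, pos = 2)

par(old_par)  # put the previous margins back
```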


Export the graph to PDF (in RStudio: Export > Save as PDF in the Plots pane). Note that the slanted labels may appear truncated below the edge of the graph in the exported PDF. That’s OK, they are still there. You will see them when you open the PDF in Illustrator; enlarge the document artboard as needed.
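The export can also be scripted with R’s pdf() device, which writes everything drawn until dev.off() into a file. A sketch with an invented stand-in table; the file name and the 8 x 6 inch page size are arbitrary choices, and the file goes to a temporary folder here, so swap in your own path:

```r
# Hypothetical stand-in data, sorted large to small (invented values).
sortedTable <- data.frame(
  campus = c("Campus Alpha", "Campus Bravo", "Campus Charlie"),
  percentIncrease = c(15.1, 8.2, 4.0),
  stringsAsFactors = FALSE
)

# Open a PDF device; width and height are in inches.
pdf_file <- file.path(tempdir(), "sorted_bar.pdf")  # hypothetical name
pdf(pdf_file, width = 8, height = 6)

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg = "")
text(x = midpts + .5, y = -1, sortedTable$campus,
     cex = 0.75, srt = 45, xpd = TRUE, pos = 2)

dev.off()  # close the device, which finishes writing the file
```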

After opening the file in Illustrator, you may need to release the clipping mask: Select All > Object > Clipping Mask > Release. Also release compound paths: Object > Compound Path > Release. Remove any unwanted boxes.

When editing objects (rectangles etc.), remember that each one is split into two separate objects, one for the fill and one for the border. Normally, fill and border are separate attributes of a single object; the split is a quirk of R’s PDF export.

To change the spacing of the labels, use Align and distribute them with equal spacing. Alternatively, put all the text in one continuous text box, rotate it, place an object with text wrap on top, and use leading (line spacing) to space the labels.

For more information:

How can I sort my data in R?

http://bit.ly/dxWybg

How to display all x labels in R barplot?

http://bit.ly/1fkfVhu


Final code, also available here:

# import the data (assumes stateU1011.csv is in the working directory;
# same as Import Dataset with header: yes, comma separated: yes)
stateU1011 <- read.csv("stateU1011.csv", header = TRUE)

# plot matrix of possible bivariate combinations
plot(stateU1011)

# plot ay2011 against percentIncrease to check the graph;
# by default R will plot little dots.
plot(stateU1011$ay2011, stateU1011$percentIncrease)

# plot percentIncrease using the barplot command
# (note that only one data column is needed to plot the graph)
barplot(stateU1011$percentIncrease)

# add the campus name labels using names.arg. Notice that only a few labels are
# displayed, simply because there is not enough room for all the labels to show up.
barplot(stateU1011$percentIncrease, names.arg=stateU1011$campus)

# now sort the bars by size. To do this, we create an object in R
# where the rows are sorted by the increase amount.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")

# now labels for the university names are added at a 45-degree angle
# (run all three lines at once or individually).
# If desired, experiment with the values for the text.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.5, srt=45, xpd=TRUE, pos=2)

# reverse the sorting (decreasing = TRUE) from large to small,
# left to right. Run all at once again. Labels are bigger.
sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.75, srt=45, xpd=TRUE, pos=2)

# export graph to PDF. Note that the slanted labels may appear truncated below
# the edge of the graph in the exported PDF. That's OK, they are still there.
# You will see them when you open the PDF in Illustrator; enlarge the document
# artboard as needed.
