# Nice R Code

## Punning code better since 2013

R is capable of producing publication-quality graphics. During this session, we will develop your R skills by introducing you to the basics of graphing.

Typing plot(1,1) does a lot by default. For example, the axes are automatically set to encapsulate the data, a box is drawn around the plotting space, and some basic labels are given as well.

Similarly, typing hist(rnorm(100)) or boxplot(rnorm(100)) does a lot of the work for you.

Plotting in R is about layering data and detail onto a canvas. So, let’s start with a blank canvas:
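One way to write that call, as a minimal sketch (the 0 to 10 axis ranges are an assumption chosen for illustration):

```r
# Blank canvas: nothing is drawn, but the coordinate system is set up.
plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
```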

We are simply calling a new plotting device (plot always initiates a new device), plotting the coordinates (5,5), which we can’t see because of type="n", and we’re also telling R that we don’t want any default axes or annotations (e.g., titles or axis labels). Basically, we just want a blank slate. Finally, we set the x- and y-axis ranges, so that we have some space to work.

If you look at the helpfile for axis (?axis) you’ll see that you can control pretty much any aspect of axis creation. The only argument required for an axis is the side of the plot you want it on. Here 1 and 2 correspond with x and y (bottom and left); 3 and 4 also correspond with x and y, but on the other side of the plot (top and right, e.g., for secondary axes).

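I prefer tick labels to align horizontally where possible. Here we use the argument las (“label axis style”?). A value of 1 means “always horizontal”; a value of 2 means “always perpendicular to the axis”, which also gives horizontal labels on the y-axis.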
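A minimal sketch of adding the two axes to a blank canvas (the 0 to 10 ranges are again an assumption):

```r
plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
axis(1)            # x-axis along the bottom
axis(2, las = 1)   # y-axis on the left, tick labels always horizontal
```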

When plotting with R you are adding subsequent layers to your canvas. If something needs redoing, you need to start again from scratch. Therefore, it is very important to save a graph’s provenance using a text editor. If you change something, re-run all the lines of code to generate your graph. Saving as text is also great for when you want to re-use a particular graphic style that you’ve developed. Graphing is one part of the scientific process for which you have some creative freedom.

Next, add axis labels and a title:

mtext stands for margin text. You need an argument for the text to be printed, one for the side of the plot (as for the axis function), and one saying how far from the axis you want the label, in numbers of “lines”. Generally, the tick marks are at the zeroth line (line=0) and the tick labels in the first line. The default for plot is line=3, which I generally leave as is. You don’t generally need titles for plots in manuscripts, but they can come in handy for presentations, blogs, etc.
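As a sketch, the labels and title could be added like this. The label strings, the title's line=1 and the bold font are placeholders of my own, not values from the original:

```r
plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
axis(1)
axis(2, las = 1)
mtext("x", side = 1, line = 3)                        # x-axis label
mtext("y", side = 2, line = 3)                        # y-axis label
mtext("My first plot", side = 3, line = 1, font = 2)  # bold title on top
```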

There is also the figure box(), which I don’t tend to use, but you may want to (?):

Let’s add some data. A red point at (5,5):
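A sketch of that step (the solid symbol, pch=19, is my choice; the canvas ranges are assumed as before):

```r
plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
axis(1)
axis(2, las = 1)
points(5, 5, col = "red", pch = 19)   # a single solid red point at (5, 5)
```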

Now set your random number generator seed to 11, so that our plots all look the same. We’ll simulate some data. First, some normally distributed “independent” values. Then, some values that depend on this first set of values via a prescribed relationship and error distribution:
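Something along these lines would do it. Only the seed comes from the text; the sample size, mean, standard deviations, intercept and slope are illustrative assumptions:

```r
set.seed(11)
n <- 100
x <- rnorm(n, mean = 5, sd = 1.5)     # "independent" values
y <- 1 + 0.8 * x + rnorm(n, sd = 1)   # linear relationship plus normal error
```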

Typing ?points will give you the common options for the points function. I have used pch=21 (presumably “plotting character”?). The outline of the character (a circle) is white (col) and its fill, the character “background”, is black (bg). I like this, because it helps distinguish points when they overlap.
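A sketch of plotting such points on the blank canvas, using simulated data whose parameters are assumptions:

```r
set.seed(11)
x <- rnorm(100, mean = 5, sd = 1.5)
y <- 1 + 0.8 * x + rnorm(100, sd = 1)

plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
axis(1)
axis(2, las = 1)
# pch = 21 is a filled circle: col sets the outline, bg sets the fill
points(x, y, pch = 21, col = "white", bg = "black")
```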

Fit a linear model and add the line of best-fit.
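A minimal sketch with lm and abline, using simulated data with an assumed intercept of 1 and slope of 0.8:

```r
set.seed(11)
x <- rnorm(100, mean = 5, sd = 1.5)
y <- 1 + 0.8 * x + rnorm(100, sd = 1)

plot(x, y, pch = 21, col = "white", bg = "black", las = 1)
fit <- lm(y ~ x)           # ordinary least-squares fit
summary(fit)               # estimates, standard errors, R-squared
abline(fit, col = "red")   # abline() accepts a fitted lm object directly
```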

You see that the model estimates reflect the parameters we used to generate the data (lm appears to be working). Let’s add some elements to our graph that highlight these fitted model results. Start by generating a sequence of numbers spanning the range of x:

Using the best-fit model, we can now predict values for each of the values in the x.seq vector:
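A sketch of both steps, drawing prediction intervals as dashed lines. The data parameters are assumptions, and the name y.pred is mine:

```r
set.seed(11)
x <- rnorm(100, mean = 5, sd = 1.5)
y <- 1 + 0.8 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)

plot(x, y, pch = 21, col = "white", bg = "black", las = 1)
abline(fit, col = "red")

# 100 evenly spaced values spanning the range of x
x.seq <- seq(min(x), max(x), length.out = 100)
# newdata must use the same variable name as the model formula
y.pred <- predict(fit, newdata = data.frame(x = x.seq),
                  interval = "prediction")
lines(x.seq, y.pred[, "lwr"], lty = 2)   # lower prediction limit
lines(x.seq, y.pred[, "upr"], lty = 2)   # upper prediction limit
```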

lty=2 (presumably “line type”?) makes a dashed line. What’s the difference between confidence intervals and prediction intervals? Now, let’s calculate confidence intervals and do something a little bit more exciting: a transparent, shaded confidence band:
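One way to draw the band is with polygon() and a semi-transparent colour from rgb(). This is a sketch; the data parameters and the 20% opacity are assumptions:

```r
set.seed(11)
x <- rnorm(100, mean = 5, sd = 1.5)
y <- 1 + 0.8 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)
x.seq <- seq(min(x), max(x), length.out = 100)

plot(x, y, pch = 21, col = "white", bg = "black", las = 1)
ci <- predict(fit, newdata = data.frame(x = x.seq),
              interval = "confidence")
# Trace the lower limit left to right, then the upper limit right to left
band.col <- rgb(0, 0, 0, alpha = 0.2)   # black at 20% opacity
polygon(c(x.seq, rev(x.seq)),
        c(ci[, "lwr"], rev(ci[, "upr"])),
        col = band.col, border = NA)
abline(fit, col = "red")
```

The confidence band describes uncertainty in the fitted mean, so it is narrower than the prediction interval, which also accounts for the scatter of individual observations.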

Text can also be added easily, using the coordinates of the current plot:
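For instance (the position and the string are placeholders):

```r
plot(5, 5, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 10), ylim = c(0, 10))
axis(1)
axis(2, las = 1)
# x and y are in plot coordinates; adj = 0 left-aligns the text at x = 1
text(1, 9, "Some text", adj = 0)
```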

Legends typically take a bit of trial and error, but can do most things.
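A sketch of a legend mixing a point symbol and a line. The entries, position and data parameters are assumptions; NA switches off the symbol or line for an entry:

```r
set.seed(11)
x <- rnorm(100, mean = 5, sd = 1.5)
y <- 1 + 0.8 * x + rnorm(100, sd = 1)

plot(x, y, pch = 21, col = "white", bg = "black", las = 1)
abline(lm(y ~ x), col = "red")
leg <- legend("topleft",
              legend = c("Observations", "Best fit"),
              pch    = c(21, NA),          # symbol for first entry only
              pt.bg  = c("black", NA),     # fill of the point symbol
              col    = c("white", "red"),
              lty    = c(NA, 1),           # line for second entry only
              bty    = "n")                # no box around the legend
```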

You can save your plot by simply using the “Export” functionality in the RStudio GUI. In general, plots should be saved as vector graphics files (e.g., PDF), because these “scale” (i.e., don’t lose resolution as you zoom in). However, vector graphics files can get large and slow things down if they contain a lot of information, in which case you might want to save as a raster or image file (e.g., PNG). PNGs work better on webpages and in presentations, because such software is not good at dealing with vector graphics. Do not save as a JPG, because its lossy compression blurs lines and text.

You can also save your plot from the command line. Why would this be useful?

To do so, you need to fire up a graphics device (e.g., PDF or PNG), write the layers to the file, and then close the device off (you won’t be able to open the file if you miss this last step).
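A minimal sketch of the open, draw, close sequence for a PDF. The file name, the dimensions and the plotted values are placeholders:

```r
out_file <- file.path(tempdir(), "my_plot.pdf")   # hypothetical file name
pdf(out_file, width = 6, height = 5)              # size in inches
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared",
     main = "Saved from the command line")
dev.off()                                         # close the device to finish the file
```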

or to a png file

You’ll notice above that data, labels and titles are within the plot function, rather than layering up as demonstrated earlier.

## Exercise

Analyse and graph the relationship between height and weight of plants in the herbivore dataset. Attempt to reproduce the following graph.

## Plotting parameters

Each device has a set of graphical parameters that can be set up before you start plotting. Most of these parameters control the “look and feel” of the plotting device. Take a look at the par helpfile for the vast array of options:

For example, you can control the portion of your canvas taken up by each of the margins around your plot using mar (i.e., “lines” in the margin). The default is:
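Querying the parameter on a fresh device shows the documented default of c(5, 4, 4, 2) + 0.1, in the order bottom, left, top, right:

```r
par("mar")
# [1] 5.1 4.1 4.1 2.1
```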

Widen the right margin, for example to make room for an axis on the right.
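A sketch of that change, with a right-hand axis added to show why the extra room helps (the margin value of 5 lines and the labels are my choices):

```r
op <- par(mar = c(5, 4, 4, 5) + 0.1)   # par() returns the old settings
plot(1:10, xlab = "x", ylab = "left axis", las = 1)
axis(4, las = 1)                        # axis on the right-hand side
mtext("right axis", side = 4, line = 3)
par(op)                                 # restore the previous margins
```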

A graphics device can have many panels. mfrow allows you to add multiple frames by row, so each time you call a plot, it will be added to the next panel, filling rows first.
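A small sketch with a 2 by 2 layout, using random data purely to fill the panels:

```r
set.seed(1)
op <- par(mfrow = c(2, 2))   # 2 rows, 2 columns, filled row by row
for (i in 1:4) {
  plot(rnorm(20), main = paste("Panel", i), las = 1)
}
par(op)                      # back to a single panel
```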

Panels may have the same information and axes don’t need to be repeated.
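One way to share a y-axis between two side-by-side panels is to shrink the inner margins and put the common label in the outer margin. This is a sketch with random data, and the margin sizes are assumptions:

```r
set.seed(1)
op <- par(mfrow = c(1, 2),
          mar = c(4, 0.5, 2, 0.5),   # thin inner margins left and right
          oma = c(0, 4, 0, 1))       # outer margin holds the shared y-axis
plot(rnorm(20), rnorm(20), xlim = c(-3, 3), ylim = c(-3, 3),
     xlab = "x", ylab = "", las = 1)
plot(rnorm(20), rnorm(20), xlim = c(-3, 3), ylim = c(-3, 3),
     xlab = "x", ylab = "", yaxt = "n")   # no y-axis on the second panel
mtext("y", side = 2, line = 2.5, outer = TRUE)
par(op)
```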

## Barplots

Barplots are commonly used in biology, but are not as straightforward as you might hope in R. Make sure the herbivore data is loaded:

Using what you’ve learned so far, make a new column containing “Both”, “Root only”, “Seed only” and “None” based on the four possible combinations of root and seed herbivores in the dataset. This column needs to be a factor for the analysis we will be conducting and graphing.

Let’s run an ANOVA. First, let’s look at the linear model:

Now an analysis of variance of that linear model:

Okay, so there are significant differences among the treatments. A Tukey Honest Significant Differences test will suggest where these differences occur:
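The three steps can be sketched as follows. The herbivore data are not reproduced here, so this uses a simulated stand-in: the data frame dat, its column names and all the numbers are hypothetical.

```r
# Simulated stand-in for the herbivore data (all values are made up)
set.seed(1)
trt <- c("None", "Root only", "Seed only", "Both")
dat <- data.frame(
  Treatment = factor(rep(trt, each = 10), levels = trt),
  Height    = rnorm(40, mean = rep(c(60, 50, 45, 35), each = 10), sd = 8)
)

fit <- lm(Height ~ Treatment, data = dat)
summary(fit)   # coefficients are contrasts against the first level
anova(fit)     # overall F-test for a treatment effect

# TukeyHSD() needs an aov fit; it compares every pair of treatments
tuk <- TukeyHSD(aov(Height ~ Treatment, data = dat))
tuk
```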

Now let’s summarise the data for plotting means and standard errors.

Order the data frame by Height:

Make the barplot, while assigning the barplot object to a variable (why?).

Now, use the anchor points from bp to add standard error bars:
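A sketch of summarising, ordering, plotting and adding error bars, again on a simulated stand-in for the herbivore data (the data frame, column names and values are hypothetical):

```r
set.seed(1)
trt <- c("None", "Root only", "Seed only", "Both")
dat <- data.frame(
  Treatment = factor(rep(trt, each = 10), levels = trt),
  Height    = rnorm(40, mean = rep(c(60, 50, 45, 35), each = 10), sd = 8)
)

# Mean and standard error of Height for each treatment
means <- tapply(dat$Height, dat$Treatment, mean)
ses   <- tapply(dat$Height, dat$Treatment,
                function(v) sd(v) / sqrt(length(v)))
summ  <- data.frame(Treatment = names(means),
                    Height = as.vector(means), SE = as.vector(ses))
summ  <- summ[order(summ$Height), ]   # order by Height

# barplot() returns the x positions of the bar midpoints
bp <- barplot(summ$Height, names.arg = summ$Treatment, las = 1,
              ylab = "Height",
              ylim = c(0, max(summ$Height + summ$SE) * 1.1))
# Vertical error bars of one standard error either side of the mean
segments(bp, summ$Height - summ$SE, bp, summ$Height + summ$SE)
```

Keeping the return value of barplot is what makes the last line possible: the bars are not centred on 1, 2, 3, 4, so you need their actual midpoints.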

If you want to get fancy, you can add line segments to the error bars like this:

Or use arrows:
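Both variants can be sketched on a small set of made-up means and standard errors. The numbers below are purely illustrative, not results from the herbivore data, and you would normally use one option or the other:

```r
# Hypothetical summary values, for illustration only
height <- c(Both = 35, "Seed only" = 45, "Root only" = 50, None = 60)
se     <- c(2.5, 3, 2, 3.5)

bp <- barplot(height, ylim = c(0, 70), ylab = "Height", las = 1)

# Option 1: segments() for the bar plus short horizontal caps
cap <- 0.1   # half-width of each cap, in x-axis units
segments(bp, height - se, bp, height + se)
segments(bp - cap, height + se, bp + cap, height + se)
segments(bp - cap, height - se, bp + cap, height - se)

# Option 2: arrows() with flat heads at both ends (angle = 90, code = 3)
arrows(bp, height - se, bp, height + se,
       angle = 90, code = 3, length = 0.05)
```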

And, finally, some kind of visual reference as to where the statistical differences among treatments lie (very ad hoc):

Panel A

Panel B

## Other useful examples

Contour or image plots can be used to visualise 3D, matrix or spatial data. Use the built-in dataset “volcano”:
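For example, an image plot with contour lines laid on top (the colour palette is my choice):

```r
# volcano is a numeric matrix of elevations that ships with R
image(volcano, col = terrain.colors(50), axes = FALSE)
contour(volcano, add = TRUE)   # add = TRUE draws on the existing plot
```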

The R package “ggplot2” has become popular. However, you need to learn a syntax that is somewhat different to standard R syntax (hence the reason we did not cover it here).

The R package “lattice” is great for multivariate data. Here’s the volcano again using wireframe():
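A sketch, assuming the lattice package is installed (it ships with most R distributions); the shading and aspect settings are my choices:

```r
library(lattice)
# wireframe() accepts a matrix directly; aspect is c(y/x, z/x)
p <- wireframe(volcano, shade = TRUE, aspect = c(61 / 87, 0.4),
               xlab = "row", ylab = "column", zlab = "height")
print(p)   # lattice plots are objects and are drawn when printed
```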

If you want to get visually fancy, then take a look at the R package “rgl”. RStudio doesn’t seem to handle rgl objects very well.