Abstracting your code into many small functions is key for writing nice R code. In our experience, biologists are initially reluctant to use functions in their code. Where people do use functions, they don’t use them enough, or try to make their functions do too much at once.
R has many built-in functions, and you can access many more by installing new packages. So there’s no doubt you already use functions. This guide will show how to write your own functions, and explain why this is helpful for writing nice R code.
Many of the benefits of using functions are more obvious by demonstration than by description. The first exhibit is a script that does not use functions. We think that this is typical of the sort of scripts that ecologists end up with when analysing data.
This script can be simplified considerably by using functions, as this script shows.
Note that while we’ve called these “before” and “after”, more typically the process of moving code into functions happens incrementally while writing code.
Writing your own R functions
Below we briefly introduce function syntax, and then look at how functions help you to write nice R code. Nice coders with more experience may want to skip the first section.
Writing functions is simple. Paste the following code into your console
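A minimal definition matching the description below (two arguments, returning the sum of their squares); the argument names x and y are an assumption:

```r
sum.of.squares <- function(x, y) {
  x^2 + y^2
}
```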
You have now created a function called sum.of.squares which requires two arguments and returns the sum of the squares of these arguments. Since you ran the code through the console, the function is now available, like any of the other built-in functions within R. Running sum.of.squares(3,4) will give you the answer 25.
The procedure for writing any other functions is similar, involving three key steps:
- Define the function,
- Load the function into the R session,
- Use the function.
Defining a function
Functions are defined by code with a specific format:
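A sketch of that format, using the placeholder names function.name, arg1, arg2 and arg3 that are discussed next. The default value of 2 for arg3 and the contents of the body are illustrative assumptions:

```r
# Template: every name here is a placeholder chosen for illustration.
function.name <- function(arg1, arg2, arg3 = 2, ...) {
  new.var <- sin(arg1) + sin(arg2)  # the function body does the work
  new.var / arg3                    # the last expression is the return value
}
```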
function.name: is the function’s name. This can be any valid variable name, but you should avoid using names that are already used elsewhere in R, such as the names of built-in functions.
arg1, arg2, arg3: these are the arguments of the function, also called formals. You can write a function with any number of arguments. These can be any R object: numbers, strings, arrays, data frames, or even other functions; anything that is needed for the function.name function to run.
Some arguments have default values specified, such as arg3 in our example. Arguments without a default must have a value supplied for the function to run. You do not need to provide a value for those arguments with a default, as the function will use the default value.
The ‘…’ argument: The ..., or ellipsis, element in the function definition allows for other arguments to be passed into the function, and passed on to another function. This technique is often used in plotting, but has uses in many other places.
Function body: The code within the braces is run every time the function is called. This code might be very long or very short. Ideally functions are short and do just one thing; problems are rarely too small to benefit from some abstraction. Sometimes a large function is unavoidable, but usually these can in turn be constructed from a bunch of small functions. More on that below.
Return value: The value of the last expression evaluated in the body is the value that will be returned by the function. A function need not return anything useful: for example, a function that makes a plot might not return anything of interest, whereas a function that does a mathematical operation might return a number, or a list.
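To make the points about default values and return values concrete, here is a small sketch. The functions scaled.sum and report are hypothetical and exist only for illustration:

```r
# Hypothetical function: arg3 has a default, so callers may omit it.
scaled.sum <- function(arg1, arg2, arg3 = 2) {
  total <- arg1 + arg2
  total / arg3   # the value of the last expression is returned
}

scaled.sum(4, 6)            # uses the default arg3 = 2, giving 5
scaled.sum(4, 6, arg3 = 5)  # overrides the default, giving 2

# A function called for its side effect still returns a value;
# invisible() hands it back without printing it at the console.
report <- function(x) {
  message("mean is ", mean(x))
  invisible(NULL)
}
result <- report(c(1, 2, 3))
is.null(result)             # TRUE
```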
Load the function into the R session
For R to be able to execute your function, it first needs to be read into memory. This is just like loading a library: until you do it, the functions contained within it cannot be called.
There are two methods for loading functions into memory:
- Copy the function text and paste it into the console
- Use the source() function to load your functions from file.
Our recommendation for writing nice R code is that in most cases, you should use the second of these options. Put your functions into a file with an intuitive name, like plotting-fun.R, and save this file in the R folder in your project. You can then read the functions into memory by calling source() with the path to that file.
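A minimal sketch of this workflow. So that it runs anywhere, it first builds a throwaway project in a temporary directory; the helper axis.label and the folder name my-project are hypothetical. In a real project, run from the project root, only the source() call would be needed, typically as source("R/plotting-fun.R").

```r
# Set up a throwaway project with an R/ folder (for illustration only).
project <- file.path(tempdir(), "my-project")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)

# Save a (hypothetical) helper function to R/plotting-fun.R.
writeLines(
  c("axis.label <- function(name, unit) {",
    "  paste0(name, \" (\", unit, \")\")",
    "}"),
  file.path(project, "R", "plotting-fun.R")
)

# Load every function defined in the file into the current session.
source(file.path(project, "R", "plotting-fun.R"))

axis.label("Height", "cm")  # "Height (cm)"
```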
From the point of view of writing nice code, this approach is nice because it leaves you with an uncluttered analysis script, and a repository of useful functions that can be loaded into any analysis script in your project. It also lets you group related functions together easily.
Using your function
You can now use the function anywhere in your analysis. In thinking about how you use functions, consider the following:
- Functions in R can be treated much like any other R object.
- Functions can be passed as arguments to other functions or returned from other functions.
- You can define a function inside of another function.
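The three points above can be sketched as follows; the names square, transforms, make.power and cube are made up for this illustration:

```r
square <- function(x) x^2

# 1. A function is an object: it can be stored in a list like any other value.
transforms <- list(square = square, root = sqrt)
transforms$root(16)       # 4

# 2. A function can be passed as an argument to another function.
sapply(1:3, square)       # 1 4 9

# 3. A function can be defined inside, and returned from, another function.
make.power <- function(exponent) {
  function(x) x^exponent  # the inner function remembers `exponent`
}
cube <- make.power(3)
cube(2)                   # 8
```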
A little more on the ellipsis argument
The ellipsis argument ... is a powerful way of passing an arbitrary number of arguments to a lower level function. This is how a single function such as data.frame can return a data.frame with two columns in one call and a data.frame with three columns in another.
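Here’s a really daft example. Suppose you wanted a function that plots x and y points in red, but you want all of plot’s other tricks. You can write a function that takes x, y and ..., and calls plot with the colour fixed to red and the ... passed along. When you call it with extra arguments such as ylab, your new function will automatically pass them through to plot, even though you never told it about them.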
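A minimal sketch of such a function. The name red.plot and the axis labels are hypothetical, chosen for illustration:

```r
# Plot points in red; every other argument is handed straight to plot().
red.plot <- function(x, y, ...) {
  plot(x, y, col = "red", ...)
}

# xlab and ylab are not arguments of red.plot, but reach plot() through `...`.
red.plot(1:10, 1:10, xlab = "My x axis", ylab = "My y axis")
```

One consequence of this design is that the caller can no longer supply col, because plot() would then receive that argument twice and stop with an error.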
If you’ve written a function whose body is 2,996 lines of code, you’re doing it wrong.— M Butcher (@technosophos) April 11, 2013
Performs a single operation
The reason for writing a function is not to reuse its code, but to name the operation it performs.— Tim Ottinger (@tottinge) January 22, 2013
Uses intuitive names
“The name of a variable, function, or class, should answer all the big questions.” - Uncle Bob Martin, Clean Code— Gustavo Rod. Baldera (@gbaldera) April 24, 2013
How functions help you write Nice R Code
As a reminder, the goal of this course and blog is to help you write nice R code. By that we mean code that is easy to write, is easy to read, runs quickly, gives reliable results, is easy to reuse in new projects, and is easy to share with collaborators. Functions help achieve all of these.
Making code more readable
Consider two short pieces of code:
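For a self-contained sketch, assume a hypothetical data frame called data whose response column holds proportions strictly between 0 and 1. The two pieces might then look like this:

```r
# Hypothetical data, made up so that the example runs on its own.
data <- data.frame(response = c(0.1, 0.5, 0.9))

# First piece: the transformation written inline (1 line).
response.logit <- log(data$response / (1 - data$response))

# Second piece: the same transformation through a named function (3 lines).
logit <- function(p)
  log(p / (1 - p))
response.logit <- logit(data$response)
```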
The first is much shorter than the second (1 line vs 3) but it is much less easy to understand. In the second, you can read the application of the function as a sentence (response.logit is the logit transform of data$response) and you’re not bogged down in the detail of how a logit transformation actually happens. The second form is also more reliable than the first, as the variable data$response is used only once, so the logit calculation cannot end up pointing at two different variables.
The most important thing that writing functions helps with is letting you concentrate on writing code that describes what will happen, not how it will happen. The how becomes an implementation issue that you don’t have to worry about.
For example, we could define the logit function in a different but mathematically equivalent way, and our code would still work.
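One such alternative, given here as an assumption since any equivalent body would serve, writes the logit as a difference of logs:

```r
# Same quantity as log(p / (1 - p)), written as a difference of logs.
logit <- function(p)
  log(p) - log(1 - p)

# Code that calls logit() does not need to change.
logit(0.5)   # 0
```

Base R also provides qlogis(), which computes the same transformation, so the body could equally be a call to that function.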
Avoiding coding errors
By using functions, you limit the scope of variables. In the logit function, the p variable is only valid within the body of the logit function: it is unaffected by any other variable called p and it does not affect any other variable called p. This means that when you read code you don’t have to look elsewhere to reason about what values variables might take.
Along similar lines, as much as possible functions should be self-contained and not depend on things like global variables (these are variables you’ve defined in the main workspace that would show up in RStudio’s object list).
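A short sketch of both points. The variable baseline and the functions shift.global and shift are hypothetical names used only for this illustration:

```r
logit <- function(p) {
  log(p / (1 - p))
}

# A variable called p in the main workspace is untouched by the function.
p <- "a value in the main workspace"
logit(0.5)   # 0: inside the call, p refers to the argument 0.5
p            # still "a value in the main workspace"

# By contrast, this function depends on a global variable, so its result
# changes whenever `baseline` is changed elsewhere in the script.
baseline <- 1
shift.global <- function(x) x + baseline
shift.global(1)   # 2
baseline <- 10
shift.global(1)   # 11

# Passing the value in as an argument makes the function self-contained.
shift <- function(x, baseline) x + baseline
shift(1, baseline = 1)   # 2
```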
Becoming more productive
Functions enable easy reuse within a project, helping you not to repeat yourself. If you see blocks of similar lines of code through your project, those are usually candidates for being moved into functions.
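As a rough illustration with made-up measurements, here are two near-identical lines collapsed into one small function. The name standardise and the data values are hypothetical:

```r
# Hypothetical measurements for two traits.
height <- c(10, 12, 15)
mass   <- c(1.0, 1.5, 2.5)

# Repeated blocks: the same rescaling written out for each variable.
height.scaled <- (height - mean(height)) / sd(height)
mass.scaled   <- (mass - mean(mass)) / sd(mass)

# The repeated pattern moved into one named function.
standardise <- function(x) {
  (x - mean(x)) / sd(x)
}
height.scaled <- standardise(height)
mass.scaled   <- standardise(mass)
```

If the rescaling ever needs to change, it now changes in one place instead of in every copy.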
If your calculations are performed through a series of functions, then the project becomes more modular and easier to change. This is especially the case for functions for which a particular input always gives a particular output.
How long is a piece of string?
In our experience, people seem to think that functions are only needed when you need to use a piece of code multiple times, or when you have a really large problem. However, many functions are actually very small. This post looks at the distribution of function length among R packages, and finds that long functions are the exception, rather than the norm.
This material is written for coders with limited experience. Program design is a bigger topic than could be covered in a whole course, and we haven’t even begun to scratch the surface here. Using functions is just one tool in ensuring that your code will be easy for you to read in future, but it is an essential tool.
The more I write code, the more abstract it gets. And with more abstractions, the apps are easier to maintain. Been working for years…— Justin Kimbrell (@justin_kimbrell) April 30, 2013
If you want to read more about function syntax, check out the following:
- The official R intro material on writing your own functions
- Our intro to R guide to writing functions with information for a total beginner
- Hadley Wickham’s information on functions for intermediate and advanced users.