Within the R project and contributed packages, how long do functions tend to be? In our experience, people seem to think that functions are only needed when you need to use a piece of code multiple times, or when you have a really large problem. However, many functions are actually very small.
R allows a lot of “computation on the language”, simply meaning that we can look inside objects easily. Here is a function that returns the number of lines in a function.
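A minimal sketch of such a function is below. It assumes the count is simply the number of lines that deparse returns. The name function.length matches the text that follows. Accepting a function name as a string (via match.fun) is an added convenience and may not be part of the original.

```r
# Count the lines in the deparsed (re-formatted) text of a function.
function.length <- function(f) {
  if (is.character(f))
    f <- match.fun(f)   # assumption: also accept a name such as "median"
  length(deparse(f))    # deparse gives one character element per line
}

function.length(median)
function.length("median")  # same result, looked up by name
```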
This works because deparse converts an object back into text (text that could in turn be parsed).
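To illustrate that round trip, the sketch below deparses a small throwaway function f (a made-up example), parses the resulting text, and evaluates it to get an equivalent function back.

```r
f <- function(x, y = 2) {
  x + y
}

txt <- deparse(f)              # character vector, one element per line
print(txt)

g <- eval(parse(text = txt))   # parse the lines, then evaluate them into a function
g(1) == f(1)                   # TRUE: the rebuilt function behaves the same
```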
By this measure, the function.length function is itself 6 lines long. Note that the deparsed formatting differs from how the function was typed: indentation, brace position and spacing are changed to follow R Core's usual style.
Most packages consist mostly of functions: here is a function that extracts all functions from a package:
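A sketch of such an extractor follows. It assumes the package is attached (so that package:<name> is on the search path) and that only exported objects matter. package.functions is the name used in the text, but this body is our reconstruction, not necessarily the author's.

```r
# Return a named list of every function exported by an attached package.
package.functions <- function(package) {
  env <- as.environment(paste0("package:", package))  # errors if not attached
  objs <- mget(ls(env), envir = env)                  # all exported objects, by name
  objs[vapply(objs, is.function, logical(1))]         # keep only the functions
}

library(stats)
length(package.functions("stats"))
```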
Finally, we can get the lengths of all functions in a package:
As a first example, consider the recommended package “boot”.
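Below is a self-contained sketch of package.function.lengths, with the line counting and the function extraction inlined so the block runs on its own. It assumes boot is installed, as it is in most standard R installations, and it counts exported functions only.

```r
library(boot)  # a recommended package, usually installed alongside R

# Line counts for every exported function of an attached package.
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))
  funs <- Filter(is.function, mget(ls(env), envir = env))
  vapply(funs, function(f) length(deparse(f)), integer(1))
}

boot.lens <- package.function.lengths("boot")
summary(boot.lens)
head(sort(boot.lens, decreasing = TRUE))  # the longest functions in boot
```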
I have 138 packages installed on my computer (mostly through dependencies – small compared with the ~4000 on CRAN!). We need to load them all before we can access the functions within:
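One way to attach everything that is installed, sketched below, is to loop over installed.packages() with require. Unlike library, require returns FALSE instead of stopping when a package fails to load. Attaching hundreds of packages is slow, masks many names, and can run into R's limit on the number of loaded DLLs, so treat this as a one-off exploratory script.

```r
# Attach every installed package, recording which ones loaded successfully.
pkgs <- unique(rownames(installed.packages()))
ok <- vapply(pkgs, function(p) {
  suppressWarnings(suppressPackageStartupMessages(
    require(p, character.only = TRUE, quietly = TRUE)))
}, logical(1))

sum(ok)      # number of packages attached
pkgs[!ok]    # packages that could not be attached
```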
Then we can apply package.function.lengths to each package.
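A sketch of this step follows. To stay self-contained, it inlines the per-package line count and applies it to every package currently attached (each package:<name> entry on search()), rather than to a stored list of installed names. The result, lens, is a named list with one integer vector of function lengths per package.

```r
# Packages currently on the search path, without the "package:" prefix.
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# For each package: deparse every exported function and count its lines.
lens <- lapply(attached, function(p) {
  env <- as.environment(paste0("package:", p))
  funs <- Filter(is.function, mget(ls(env), envir = env))
  vapply(funs, function(f) length(deparse(f)), integer(1))
})
names(lens) <- attached

sum(lengths(lens))  # total number of functions examined
```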
The median function length is only 12 lines (and remember that includes things like the function arguments)!
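Both the overall median and the share of very long functions mentioned next can be computed from a list like lens, holding one vector of function lengths per package. The helper name summarise.lengths and the toy input below are made up for illustration and are not the author's data.

```r
# lens: a named list, one integer vector of function lengths per package.
summarise.lengths <- function(lens, long = 200) {
  all.lens <- unlist(lens, use.names = FALSE)
  c(median = median(all.lens),          # median over all functions
    prop.long = mean(all.lens > long))  # share longer than `long` lines
}

# Made-up toy input, just to show the shape of the data:
summarise.lengths(list(a = c(5L, 12L, 300L), b = c(3L, 20L)))
# median 12, prop.long 0.2
```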
The distribution of function lengths is strongly right-skewed, with most functions being very short. Ignoring the 1% of functions that are longer than 200 lines, the distribution of function lengths looks like this:
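A base-graphics sketch of that plot, taking lens as a named list of per-package length vectors, is below. The 200-line cutoff comes from the text; the helper name draw.length.hist and the number of bins are arbitrary choices.

```r
# Histogram of function lengths, dropping the long tail above `max.len` lines.
draw.length.hist <- function(lens, max.len = 200) {
  all.lens <- unlist(lens, use.names = FALSE)
  hist(all.lens[all.lens <= max.len], breaks = 50,
       main = "Function lengths", xlab = "Length (lines of deparsed code)")
}
```

hist returns the binned counts invisibly, so the result can also be inspected without looking at the plot.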
Next, we plot the distribution of the per-package medians (that is, for each package, compute the median function length in lines of code, then plot the distribution of those medians).
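A sketch of the per-package version follows, using the same list-of-vectors input. Skipping packages that export no functions is our assumption: median of an empty vector is NA and would otherwise enter the plot. The name draw.median.hist is hypothetical.

```r
# Median function length within each package, then the spread of those medians.
draw.median.hist <- function(lens) {
  lens <- lens[lengths(lens) > 0]   # skip packages that export no functions
  meds <- vapply(lens, function(x) as.numeric(median(x)), numeric(1))
  hist(meds, breaks = 30,
       main = "Per-package median function length", xlab = "Median length (lines)")
}
```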
The median package has a median function length of 16 lines. There are a handful of extremely long functions in most packages; over all packages, the median “longest function” is 120 lines.
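Both package-level figures can be computed from the same kind of list. In the sketch below, package.length.stats is a hypothetical helper name, and the input is made-up toy data, not the results quoted above.

```r
# Two package-level summaries: median of per-package medians, and median of
# each package's longest function.
package.length.stats <- function(lens) {
  lens <- lens[lengths(lens) > 0]   # ignore packages with no functions
  per.pkg <- function(stat) vapply(lens, function(x) as.numeric(stat(x)), numeric(1))
  c(median.of.medians = median(per.pkg(median)),
    median.longest    = median(per.pkg(max)))
}

package.length.stats(list(a = c(5L, 12L, 300L), b = c(3L, 20L), c = 10L))
# median.of.medians 11.5, median.longest 20
```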