Within the R project and contributed packages, how long do functions tend to be? In our experience, people seem to think that functions are only needed when you need to use a piece of code multiple times, or when you have a really large problem. However, many functions are actually very small.
R allows a lot of “computation on the language”, simply meaning that we can look inside objects easily. Here is a function that returns the number of lines in a function.
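Here is a minimal sketch of such a function. It assumes the function simply counts the lines of text that deparse returns. The name function.length is the one this post uses, but the body is our reconstruction.

```r
# Count the lines in a function, as measured by deparse().
# `f` may be a function or the name of one, given as a string.
function.length <- function(f) {
  if (is.character(f))
    f <- match.fun(f)
  length(deparse(f))
}

function.length(median)      # same result as function.length("median")
function.length(function.length)
```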
This works because deparse converts an object back into text (text that could, in turn, be parsed back into code).
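The sketch below shows what deparse produces. It prints the deparsed text of function.length and then parses that text back into a working function. The definition of function.length is our minimal reconstruction, repeated here so the block runs on its own.

```r
function.length <- function(f) {
  if (is.character(f))
    f <- match.fun(f)
  length(deparse(f))
}

# deparse() returns a character vector with one element per line of
# regenerated source; the layout is R's own, not the formatting as typed.
txt <- deparse(function.length)
print(txt)

# parse() turns the text back into an expression; evaluating it
# rebuilds an equivalent function.
f2 <- eval(parse(text = txt))
f2(mean)  # count the lines of mean() with the rebuilt function
```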
By this measure, the function.length function is itself 6 lines long. Note that the deparsed formatting differs a little from the code as originally typed. In particular, the indentation, brace placement and spacing all change, following something like the R-core style guide.
Most packages consist largely of functions. Here is a function that extracts all the functions from a package:
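Here is a minimal sketch. It assumes the package is already attached, so that its exported objects sit in the "package:<name>" environment on the search path. The name package.functions is hypothetical.

```r
# Return a named list of all functions exported by an attached package.
# `package.functions` is a hypothetical helper name.
package.functions <- function(package) {
  env <- as.environment(paste0("package:", package))  # errors if not attached
  objs <- mget(ls(env), envir = env)                  # all exported objects
  objs[vapply(objs, is.function, logical(1))]         # keep only functions
}

fns <- package.functions("stats")
length(fns)
head(names(fns))
```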
Finally, we can get the lengths of all functions in a package:
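Combining the two steps gives a sketch of package.function.lengths, the name used later in this post. The helpers are our minimal reconstructions, repeated so the block runs on its own, and package.functions is a hypothetical name.

```r
# Helpers: minimal reconstructions.
function.length <- function(f) length(deparse(f))

package.functions <- function(package) {
  env <- as.environment(paste0("package:", package))
  objs <- mget(ls(env), envir = env)
  objs[vapply(objs, is.function, logical(1))]
}

# Named integer vector: the deparsed length of every function in `package`.
package.function.lengths <- function(package) {
  vapply(package.functions(package), function.length, integer(1))
}

lens.stats <- package.function.lengths("stats")
summary(lens.stats)
head(sort(lens.stats, decreasing = TRUE))  # the longest functions in stats
```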
As an example, consider the recommended package “boot”.
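Here is a sketch of that computation for “boot”. It uses the same assumed helpers, written more compactly with Filter and repeated so the block runs on its own. Because boot is a recommended package it normally ships with R, but the guard simply skips the example if it is missing.

```r
# Helpers: minimal reconstructions.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))  # package must be attached
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}

boot.lengths <- integer(0)
if (requireNamespace("boot", quietly = TRUE)) {
  library(boot)  # attach so that "package:boot" is on the search path
  boot.lengths <- package.function.lengths("boot")
  print(summary(boot.lengths))
  # The five longest functions in the package
  print(head(sort(boot.lengths, decreasing = TRUE), 5))
}
```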
I have 138 packages installed on my computer (mostly pulled in as dependencies, and a small number compared with the ~4000 on CRAN!). We need to load them all before we can access the functions within:
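One way to attach every installed package is sketched below with a hypothetical helper, load.packages. It calls require() on each name and keeps the names that succeed. Loading everything can be slow and memory-hungry, so the runnable line uses a few base packages instead.

```r
# Attach each package in `pkgs` quietly and return the names of those
# that loaded. `load.packages` is a hypothetical helper name.
load.packages <- function(pkgs) {
  ok <- vapply(pkgs, function(p)
    suppressWarnings(suppressMessages(
      require(p, character.only = TRUE, quietly = TRUE))),
    logical(1))
  pkgs[ok]
}

# For every installed package (slow, and uses a lot of memory):
# loaded <- load.packages(unique(rownames(installed.packages())))
loaded <- load.packages(c("stats", "utils", "tools"))
loaded
```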
Then we can apply package.function.lengths to each package.
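Here is a sketch of that step, assuming the compact helpers below. The text applies it to every package that loaded. Here three base packages (stats, utils, tools) stand in so the sketch runs quickly.

```r
# Helpers: minimal reconstructions, repeated so this block is self-contained.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))  # package must be attached
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}

# Stand-in for "every package that loaded".
loaded <- c("stats", "utils", "tools")
for (p in loaded) library(p, character.only = TRUE)

# One named integer vector of function lengths per package.
lens <- lapply(setNames(loaded, loaded), package.function.lengths)
vapply(lens, length, integer(1))  # number of functions per package
```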
The median function length is only 12 lines (and remember that includes things like the function arguments)!
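Pooling all the per-package lengths and taking the median can be sketched as follows. The same stand-in packages and helpers are assumed, so the numbers will differ from those quoted in the text.

```r
# Helpers and stand-in data (three base packages), as assumed above.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}
loaded <- c("stats", "utils", "tools")
for (p in loaded) library(p, character.only = TRUE)
lens <- lapply(setNames(loaded, loaded), package.function.lengths)

# Pool every function from every package into a single vector.
all.lengths <- unlist(lens, use.names = FALSE)
median(all.lengths)                              # median function length
quantile(all.lengths, c(0.25, 0.5, 0.75, 0.99))  # a few more quantiles
```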
The distribution of function lengths is strongly right-skewed, with most functions being very short. Ignoring the 1% of functions that are longer than 200 lines, the distribution of function lengths looks like this:
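The sketch below computes the fraction of long functions and draws the histogram, under the same assumptions: three base packages stand in for all installed ones. The bin width of 5 lines is an arbitrary choice.

```r
# Helpers and stand-in data (three base packages), as assumed above.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}
loaded <- c("stats", "utils", "tools")
for (p in loaded) library(p, character.only = TRUE)
lens <- lapply(setNames(loaded, loaded), package.function.lengths)
all.lengths <- unlist(lens, use.names = FALSE)

mean(all.lengths > 200)  # fraction of functions longer than 200 lines

# Histogram of the remaining functions, in 5-line bins.
short <- all.lengths[all.lengths <= 200]
h <- hist(short, breaks = seq(0, 200, by = 5),
          xlab = "Function length (lines)", main = "")
```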
Next, we plot the distribution of the per-package medians (that is, for each package we compute the median function length in lines of code, then plot the distribution of those medians).
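The per-package medians can be sketched as below, using the same stand-in packages and helpers. Packages with no exported functions are dropped, because their median would be NA.

```r
# Helpers and stand-in data (three base packages), as assumed above.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}
loaded <- c("stats", "utils", "tools")
for (p in loaded) library(p, character.only = TRUE)
lens <- lapply(setNames(loaded, loaded), package.function.lengths)

# One median per package, then the distribution of those medians.
pkg.medians <- sapply(lens[lengths(lens) > 0], median)
median(pkg.medians)  # median of the per-package medians
hist(pkg.medians, xlab = "Median function length (lines)", main = "")
```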
The median package has a median function length of 16 lines. There are a handful of extremely long functions in most packages; across all packages, the median length of each package's longest function is 120 lines.
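The “longest function” statistic is simply the per-package maximum. Here is a sketch under the same assumptions, with stand-in packages and reconstructed helpers.

```r
# Helpers and stand-in data (three base packages), as assumed above.
function.length <- function(f) length(deparse(f))
package.function.lengths <- function(package) {
  env <- as.environment(paste0("package:", package))
  fns <- Filter(is.function, mget(ls(env), envir = env))
  vapply(fns, function.length, integer(1))
}
loaded <- c("stats", "utils", "tools")
for (p in loaded) library(p, character.only = TRUE)
lens <- lapply(setNames(loaded, loaded), package.function.lengths)

# Length and name of the longest function in each package.
longest <- sapply(lens[lengths(lens) > 0], max)
longest.name <- sapply(lens[lengths(lens) > 0], function(x) names(which.max(x)))
median(longest)  # the median "longest function", in lines
longest.name
```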