# Writing functions

At some point, you will want to write a function, and it will probably be sooner than you think. Functions are core to the way that R works, and the sooner that you get comfortable writing them, the sooner you’ll be able to leverage R’s power, and start having fun with it.

The first function many people seem to need to write is one that computes the standard error of the mean of some variable, because, curiously, R’s base package does not come with one. The standard error is defined as $\sqrt{\mathrm{var}(x)/n}$, that is, the square root of the variance divided by the sample size.

We can already easily compute the mean (with `mean`), the variance (with `var`) and the sample size (with `length`) of a variable, so it seems easy to compute the standard error by combining them.
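Here is a sketch that uses a small made-up data frame called `data` with a `Height` column. The values are illustrative, not the tutorial's real dataset. It shows the three building blocks and then the standard error:

```r
# Hypothetical stand-in for the tutorial's data set (made-up values)
data <- data.frame(Height = c(2, 4, 4, 4, 5, 5, 7, 9))

mean(data$Height)    # the mean: 5
var(data$Height)     # the sample variance: 32/7, about 4.571
length(data$Height)  # the sample size: 8

# Standard error: data$Height has to be typed twice
se.height <- sqrt(var(data$Height) / length(data$Height))
se.height
```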

Notice how `data$Height` has to be written twice in that calculation, which is not desirable.

Suppose we now want the standard error of the dry weight too:
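The copy-and-paste version looks like this with a small made-up data frame. Both the values and the column name `Weight` (standing in for dry weight) are assumptions:

```r
# Hypothetical data; the column name `Weight` is an assumption
data <- data.frame(Height = c(2, 4, 4, 4, 5, 5, 7, 9),
                   Weight = c(1, 2, 3, 4, 5, 6, 7, 8))

# Same expression as for height, with the column name swapped by hand
se.weight <- sqrt(var(data$Weight) / length(data$Weight))
se.weight  # sqrt(6/8), about 0.866
```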

This is essentially identical to the height case above: we’ve copied and pasted the definition and replaced the variable that we are interested in. This sort of substitution is tedious and error prone, and it is exactly the sort of thing that computers do far more reliably than humans.

It is also not clear from what is written what these lines are for. Later on, you’ll be wondering what they are doing.

Look more carefully at the two statements and see the similarity in form, and what is changing between them. This pattern is the key to writing functions.

Here is the syntax for defining a function, used to make a standard error function:
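Below is a minimal version. It uses the name `standard.error`, which matches the name in the hints at the end of this page. The whole body is a single expression, so its value is what the function returns:

```r
# Standard error of the mean: square root of (sample variance / sample size)
standard.error <- function(x) {
  sqrt(var(x) / length(x))
}
```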

The value of the last expression in the function body is “returned” from the function.

We can call it like this:
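Here is a sketch of calling the function on each column. It takes the place of the copied-and-pasted expressions. The data frame is made up (its values and the `Weight` column name are assumptions), and the definition is repeated so the snippet runs on its own:

```r
standard.error <- function(x) {
  sqrt(var(x) / length(x))
}

# Hypothetical data (made-up values; `Weight` is an assumed column name)
data <- data.frame(Height = c(2, 4, 4, 4, 5, 5, 7, 9),
                   Weight = c(1, 2, 3, 4, 5, 6, 7, 8))

standard.error(data$Height)  # about 0.756
standard.error(data$Weight)  # about 0.866
```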

Note that x has a special meaning within the curly braces. If we do this:
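As an illustration, give `x` an unrelated value in the workspace and then call the function on some made-up data. The global value makes no difference:

```r
standard.error <- function(x) {
  sqrt(var(x) / length(x))
}
heights <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values

x <- 1000                # an unrelated global variable called x
standard.error(heights)  # inside the function, x is the argument: about 0.756
x                        # the global x is unchanged: 1000
```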

We get the same answer. Because `x` appears in the “argument list”, it is treated specially inside the function. Note also that the name `x` is completely unrelated to the name of whatever is passed to the function as its value.

You can define variables within functions:
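Here is a sketch of `standard.error` rewritten with intermediate variables. The names `v` and `n` are illustrative choices:

```r
standard.error <- function(x) {
  v <- var(x)     # sample variance, local to this function
  n <- length(x)  # sample size, also local
  sqrt(v / n)     # last expression: this value is returned
}
```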

These are also treated specially: they do not affect the main workspace (the “global environment”), and they are destroyed when the function ends. Suppose you had some value `v` in the global environment. The function would ignore it as soon as the local `v` was defined and use the local definition instead, and the global `v` would be left unchanged.
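To see this, give `v` a value in the global environment before calling the function. The data here are made up:

```r
standard.error <- function(x) {
  v <- var(x)
  n <- length(x)
  sqrt(v / n)
}

v <- "a global value"                      # lives in the global environment
standard.error(c(2, 4, 4, 4, 5, 5, 7, 9))  # uses the local v, not the global one
v                                          # still "a global value"
```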

### Another example

We used R’s variance function `var` above, but let’s write our own version. The sample variance is defined as $\frac{1}{n-1}\left(\sum_{i=1}^n (x_i - \bar x)^2 \right)$, where $\bar x$ is the mean of $x$ and $n$ is the sample size.

This case is more complicated, so we’ll do it in pieces.

We’re going to use `x` as the function’s argument, so first store our input data in a variable called `x` so that we can try out the pieces:
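The sketch below uses a short made-up vector in place of `data$Height`. The values are chosen so the arithmetic comes out in round numbers:

```r
# Made-up values standing in for data$Height
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
length(x)  # 8
mean(x)    # 5
```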

The first term is easy:
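With the same made-up vector, the first term $\frac{1}{n-1}$ is:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values
n <- length(x)
1 / (n - 1)  # 1/7, about 0.1429
```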

The second term is harder. We want the difference between all the x values and the mean.
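When you subtract a single number from a vector, R subtracts it from every element. All the differences therefore come from one expression (made-up values again):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
x - mean(x)  # -3 -1 -1 -1  0  0  2  4
```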

Then we want to square those differences:
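The `^` operator also works element by element. The parentheses make sure the subtraction happens first:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values
(x - mean(x))^2  # 9 1 1 1 0 0 4 16
```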

and compute the sum:
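Wrapping the squared differences in `sum` gives the sum of squares:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values
sum((x - mean(x))^2)  # 32
```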

Watch that you don’t square the sum instead of summing the squares, which is quite different:
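The only difference is where the square goes. With a made-up vector, squaring the sum gives zero, while summing the squares gives the quantity we want:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values
sum(x - mean(x))^2    # squares the sum: 0 here
sum((x - mean(x))^2)  # sums the squares: 32
```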

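(The result is zero, up to rounding error, because the differences from the mean always add up to zero; this follows from the definition of the mean.)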

Putting both halves together, the variance is

This agrees with R’s `var` function:
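Here both terms are combined and the result is set next to R’s built-in function, using the same made-up vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)        # made-up values
n <- length(x)
(1 / (n - 1)) * sum((x - mean(x))^2)  # our calculation: 32/7, about 4.571
var(x)                                # R's built-in sample variance
```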

The `rm` function removes the temporary variables from the workspace:
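For illustration, suppose the temporary variables from the walk-through were called `x` and `n` (an assumption about the names). Cleaning up then looks like this:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

rm(x, n)                       # remove both variables from the global environment
exists("x", inherits = FALSE)  # FALSE: x is gone from the current environment
```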

We can then define our function, using the pieces that we wrote above.

And test it:
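Below is a sketch of the function built from the pieces above. It assumes the name `variance`, so it does not clash with R’s `var`. It is tested on a made-up vector and on simulated data, where the seed and sample size are arbitrary choices:

```r
# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance <- function(x) {
  n <- length(x)  # sample size
  m <- mean(x)    # sample mean
  (1 / (n - 1)) * sum((x - m)^2)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values
variance(x)                     # 32/7, about 4.571
var(x)                          # R's built-in version, for comparison

set.seed(42)
y <- rnorm(100)                 # simulated data
c(ours = variance(y), R = var(y))
```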

### An aside on floating point comparisons

Our function does not exactly agree with R’s function
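Here is a sketch of the comparison on simulated data. The exact trailing digits depend on the values, so `==` may print `TRUE` or `FALSE` for a particular sample:

```r
variance <- function(x) {
  n <- length(x)
  (1 / (n - 1)) * sum((x - mean(x))^2)
}

set.seed(1)
y <- rnorm(50)         # simulated data; the seed is arbitrary
variance(y) == var(y)  # can be FALSE: the last few digits may differ
variance(y) - var(y)   # tiny (or exactly 0), far below any meaningful precision
```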

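This is not because one is more accurate than the other. It is because “machine precision” is finite. R stores numbers as double-precision floating point values, which keep only about 15 to 16 significant decimal digits, so two mathematically equivalent calculations can round slightly differently.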

This affects all sorts of things:
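A few classic cases:

```r
sqrt(2)^2 == 2    # FALSE
sqrt(2)^2 - 2     # about 4.4e-16 rather than 0
0.1 + 0.2 == 0.3  # FALSE
```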

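So be careful with `==` for floating point comparisons. Usually you have to check that two numbers are close enough rather than exactly equal, with something like `abs(x - y) < eps` for some small value `eps`.

The `all.equal` function can be very helpful here. It compares numbers up to a small tolerance (by default about 1.5e-8), and wrapping it in `isTRUE()` gives a plain `TRUE` or `FALSE`.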
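This sketch shows both approaches. The value of `eps` is an arbitrary choice:

```r
a <- sqrt(2)^2
b <- 2
eps <- 1e-10               # an arbitrary small tolerance

a == b                     # FALSE
abs(a - b) < eps           # TRUE: equal up to the tolerance

all.equal(a, b)            # TRUE
all.equal(1, 1.1)          # a character string describing the difference, not FALSE
isTRUE(all.equal(1, 1.1))  # FALSE: safe to use inside if ()
```

Because `all.equal` returns a description of the difference rather than `FALSE`, use `isTRUE(all.equal(...))` inside `if ()`.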

### Exercise: define a function to compute skew

Skewness is a measure of asymmetry of a probability distribution.

It can be defined as
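The formula itself is missing from this copy of the page. One common definition is given below, written with the same symbols as the variance formula above. It is an assumption here; other versions use $\frac{1}{n-1}$ in place of $\frac{1}{n}$, or $\mathrm{var}(x)^{3/2}$ in the denominator. Its numerator resembles the variance calculation, which matches the hints that follow:

$$
\mathrm{skewness} = \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^3}{\left(\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2\right)^{3/2}}
$$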

Write a function that computes the skewness.

Hints:

• Don’t try to do this in one step. Use intermediate variables instead, like the second version of `standard.error` or our variance function.
• The numerator (the term on the top of the fraction) is a lot like the variance function.
• Remember that parentheses can help with order-of-operation control.
• Get the pieces of your function working before putting it all together.
