# Missing data

Missing data are a fact of life in biology. Individuals die, equipment breaks, you forget to measure something, you can’t read your writing, etc.

If you load data with blank cells, those cells will appear as `NA` values.

Some data to play with.
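The values below are made up purely for illustration; any numeric vector would do. This sketch assumes the data live in a vector called `x`.

```r
# Hypothetical example data: six made-up measurements
x <- c(1, 2, 3, 4, 5, 6)
x
```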

If the 5th element were missing, this is what it would look like:
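A minimal sketch, assuming a made-up six-element vector `x` (the values are illustrative, not real data):

```r
x <- c(1, 2, 3, 4, 5, 6)  # hypothetical data
x[5] <- NA                # mark the 5th element as missing
x
# [1]  1  2  3  4 NA  6
```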

Note that this is not a string “NA”; that is something different entirely.
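One quick way to see the difference, using only base R:

```r
is.na(NA)      # TRUE: a genuine missing value
is.na("NA")    # FALSE: just a two-character string
class(NA)      # "logical"
class("NA")    # "character"
```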

Treat a missing value as a number that could stand in for anything. So what is the result of an operation on a missing value, such as `1 + NA` or `NA * 2`?

These are all NA because if the input could be anything, the output could be anything.
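A few arithmetic and comparison expressions, chosen here purely as illustrations, show this behaviour at the console:

```r
1 + NA      # NA
NA * 2      # NA
NA > 3      # NA: even a comparison cannot be decided
sqrt(NA)    # NA
```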

What is the value of `mean(x)`?

It’s NA too, because `x[1] + x[2] + NA + ...` must be NA, and then `NA / length(x)` is also NA.
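Assuming the expression in question is the mean of a vector with one missing element, a sketch with made-up data in `x` shows both the result and the reasoning:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
mean(x)                    # NA
sum(x)                     # NA: the running total hits the missing value
sum(x) / length(x)         # NA: NA divided by 6 is still NA
```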

This is a pretty common situation for data, so the `mean` function takes an `na.rm` argument:
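A short sketch with the same made-up vector (the data are hypothetical):

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
mean(x, na.rm = TRUE)      # 3.2, the mean of the five non-missing values
```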

sum takes the same argument too:
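Again as a sketch on made-up data:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
sum(x, na.rm = TRUE)       # 16
```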

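Be careful though: `length` still counts missing values, so dividing `sum(x, na.rm = TRUE)` by `length(x)` does not give the mean of the observed values.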
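For instance, with made-up data (the vector `x` is hypothetical), a hand-rolled mean goes wrong like this:

```r
x <- c(1, 2, 3, 4, NA, 6)          # hypothetical data, 5th element missing
length(x)                          # 6: the NA is still counted
sum(x, na.rm = TRUE) / length(x)   # 2.666667, not the mean of the observed values
mean(x, na.rm = TRUE)              # 3.2
```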

The na.omit function will strip out all NA values:
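A minimal sketch on made-up data; the name `y` is just a placeholder for the stripped result:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
y <- na.omit(x)
length(y)                  # 5
as.vector(y)               # the values 1 2 3 4 6, without na.omit's extra attributes
```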

So we can do this:
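One way this might look, assuming the goal is a mean computed by hand (the name `x_obs` is hypothetical):

```r
x <- c(1, 2, 3, 4, NA, 6)      # hypothetical data, 5th element missing
x_obs <- na.omit(x)            # observed values only
sum(x_obs) / length(x_obs)     # 3.2, same as mean(x, na.rm = TRUE)
```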

You can’t test for NA-ness with ==:
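A quick sketch on made-up data shows what comes back:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
x == NA                    # every element is NA, not TRUE or FALSE
NA == NA                   # NA as well
```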

(why not?)

Use is.na instead:
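Sketched on the same made-up vector:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
is.na(x)                   # FALSE FALSE FALSE FALSE  TRUE FALSE
which(is.na(x))            # 5: position of the missing value
sum(is.na(x))              # 1: number of missing values
```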

So `na.omit(x)` is (roughly) equivalent to keeping only the elements for which `is.na` is false, that is, `x[!is.na(x)]`.
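A sketch of the logical-indexing version on made-up data, compared against `na.omit`:

```r
x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
x[!is.na(x)]               # 1 2 3 4 6

# Same values once na.omit's extra attributes are dropped
all.equal(x[!is.na(x)], as.vector(na.omit(x)))  # TRUE
```

The "roughly" matters because `na.omit` also records which positions it removed in an `na.action` attribute, whereas plain indexing returns a bare vector.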

## Exercise

Our standard error function doesn’t deal well with missing values:
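Assuming a standard error function along these lines (the name `standard_error` and its body are a hypothetical stand-in, not necessarily the version written earlier), the problem looks like this:

```r
# Hypothetical definition: square root of (variance / sample size)
standard_error <- function(x) {
  sqrt(var(x) / length(x))
}

x <- c(1, 2, 3, 4, NA, 6)  # hypothetical data, 5th element missing
standard_error(x)          # NA, because var(x) is NA
```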

Can you write one that always filters missing values?

If we get time, we’ll talk about how to write one that optionally gets rid of missing values.

## Other special values

Positive and negative infinities, `Inf` and `-Inf`.
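Division by zero is one simple way to produce them, shown here as an illustrative sketch:

```r
1 / 0              # Inf
-1 / 0             # -Inf
is.finite(1 / 0)   # FALSE
is.infinite(-1 / 0) # TRUE
is.na(Inf)         # FALSE: infinite, but not missing
```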

Not a number, `NaN` (different from NA, but usually treatable the same way).
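As an illustrative sketch, `0 / 0` is one expression that produces it, and `is.na` treats it like a missing value:

```r
0 / 0                              # NaN
is.nan(0 / 0)                      # TRUE
is.na(0 / 0)                       # TRUE: is.na also catches NaN
is.nan(NA)                         # FALSE: NA is not NaN
mean(c(1, NaN, 3), na.rm = TRUE)   # 2: na.rm drops NaN too
```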

We saw NULL before. It’s the weirdest.
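A few illustrative lines hint at why: unlike `NA`, which holds a place in a vector, `NULL` has no length at all.

```r
length(NULL)     # 0
length(NA)       # 1: NA still occupies a slot
c(1, NULL, 3)    # 1 3: NULL simply vanishes
is.null(NULL)    # TRUE
```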