Nice R Code

Punning code better since 2013

Figure functions

- - R, plotting

Transitioning from an interactive plot in R to a publication-ready plot can create a messy script file with lots of statements and use of global variables. This post outlines an approach that I have used to simplify the process and keep the code readable.
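
As a rough illustration of the idea (a minimal sketch, not necessarily the code from the full post), the plot can live in its own function, with a small wrapper that opens and closes the graphics device. The names `fig_trend` and `to_pdf`, and the made-up data, are assumptions for the sake of the example.

```r
# Hypothetical figure function: everything the plot needs arrives as an
# argument, so nothing depends on global variables.
fig_trend <- function(data) {
  plot(y ~ x, data = data, pch = 19, las = 1,
       xlab = "Predictor", ylab = "Response")
  abline(lm(y ~ x, data = data), lty = 2)
}

# Hypothetical helper: evaluate a plotting expression with a pdf device open,
# and close the device again even if the plotting code fails.
to_pdf <- function(expr, filename, ...) {
  pdf(filename, ...)
  on.exit(dev.off())
  eval.parent(substitute(expr))
}

# Made-up data, purely for illustration.
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20)

# Interactively you would just call fig_trend(d); for the paper, the same
# call is wrapped so the output goes to a file instead.
out <- file.path(tempdir(), "trend.pdf")
to_pdf(fig_trend(d), out, width = 6, height = 4)
```

The same function draws the figure on screen and in the file, so the two versions cannot drift apart.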

Modifying data with lookup tables

- - data, project

In many analyses, data is read from a file but must be modified before it can be used. For example, you may want to add a new column of data, or do a “find” and “replace” on a site, treatment or species name. There are three ways one might add such information. The first involves editing the original data frame – although you should never do this, I suspect this method is quite common. A second – and widely used – approach for adding information is to modify the values using code in your script. The third – and nicest – way of adding information is to use a lookup table.
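
A minimal sketch of the lookup-table approach, using made-up data and column names (`species`, `new_species`) that are assumptions for illustration only: the corrections sit in their own table, and a few lines of code apply them without touching the raw data.

```r
# Made-up raw data containing a misspelt species name.
raw <- data.frame(
  species = c("E. salign", "E. saligna", "A. longifolia", "E. salign"),
  height  = c(12.1, 15.3, 3.2, 11.8),
  stringsAsFactors = FALSE
)

# Lookup table of corrections. In practice this could be kept in its own
# csv file alongside the data and read in with read.csv().
lookup <- data.frame(
  species     = "E. salign",
  new_species = "E. saligna",
  stringsAsFactors = FALSE
)

# Find each raw name in the lookup table; NA means no correction is listed.
i <- match(raw$species, lookup$species)

# Work on a copy so the raw data stay exactly as they were read in.
clean <- raw
clean$species[!is.na(i)] <- lookup$new_species[i[!is.na(i)]]
```

Adding another correction then means adding a row to the table rather than another line of code to the script.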

Organizing the project directory

- - 2013-MQ, guest, module, project

This is a guest post by Marcela Diaz, a PhD student at Macquarie University.

Until recently, I hadn’t given much attention to organising files in my project. All the documents and files from my current project were spread out in two different folders, with very little subfolder division. All the files were together in the same place and I had multiple versions of the same file, with different dates. As you can see, things were getting a bit out of control.

How long is a function?

Within the R project and contributed packages, how long do functions tend to be? In our experience, people seem to think that functions are only needed when you need to use a piece of code multiple times, or when you have a really large problem. However, many functions are actually very small.
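
One rough way to look at this yourself, as a sketch under stated assumptions rather than the analysis from the full post, is to count the lines in the deparsed source of every function in the base package. Deparsed code is not laid out exactly as the authors wrote it, and primitives collapse to a single line, so treat the numbers as approximate.

```r
# Fetch every visible object in the base package and keep only the functions.
base_env <- as.environment("package:base")
objs <- mget(ls(base_env), envir = base_env)
funs <- Filter(is.function, objs)

# Approximate each function's length by the number of lines in its
# deparsed source.
n_lines <- vapply(funs, function(f) length(deparse(f)), integer(1))

summary(n_lines)
```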

Excel and line endings

- - data, git

On a Mac, Excel produces csv files with the wrong line endings, which causes problems for git (amongst other things).

This issue plagues at least Excel 2008 and 2011, and possibly other versions.

Basically, saving a file as comma separated values (csv) uses a carriage return \r rather than a line feed \n as a newline. Way back before OS X, this was actually the correct Mac line ending, but after the move to be more unix-y, the correct line ending should be \n.
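
One possible fix from within R, sketched here with a hypothetical helper `fix_line_endings` (an assumption for illustration, not necessarily the solution from the full post), is to rewrite the file at the byte level so that every line ending becomes \n.

```r
# Hypothetical helper: rewrite a file in place so that every line ending is
# a line feed. Working on raw bytes leaves all other content untouched.
fix_line_endings <- function(filename) {
  bytes <- readBin(filename, what = "raw", n = file.info(filename)$size)
  cr <- as.raw(13)  # \r
  lf <- as.raw(10)  # \n

  is_cr <- bytes == cr
  # A \r immediately followed by \n is a Windows ending: drop the \r.
  followed_by_lf <- c(bytes[-1] == lf, FALSE)
  drop <- is_cr & followed_by_lf
  # Any other \r is an old-style Mac ending: turn it into \n.
  bytes[is_cr & !drop] <- lf

  writeBin(bytes[!drop], filename)
  invisible(filename)
}

# Demo on a temporary file written with \r line endings.
f <- tempfile(fileext = ".csv")
writeBin(charToRaw("site,height\rA,1.2\rB,3.4\r"), f)
fix_line_endings(f)
readLines(f)
```

Because the helper overwrites the file, it is worth running it on a copy (or a file already tracked in git) the first time.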

git

- - module

Thanks to everyone who came along and were such good sports with learning git today. Hopefully you now have enough tools to help you use git in your own projects. The notes are available (in fairly raw form) here. Please let us know where they are unclear and we will update them.

To re-emphasise our closing message – start using it on a project, start thinking about what you want to track, and start thinking about what constitutes a logical commit. Once you get into a rhythm it will seem much easier. Bring your questions along to the class in two weeks' time.

Also, to re-emphasise: git is not a backup system. Make sure that you have your work backed up, just in case something terrible happens. I recommend CrashPlan, which you can use for free to back up onto external hard drives (a paid option is also available).

Feedback

We welcome any and all feedback on the material and how we present it. You can give anonymous feedback by emailing G2G admin (you should have the address already – I’m only not putting it up here in a vain effort to slow down spam bots). Alternatively, you are welcome to email either or both of us, or leave a comment on a relevant page.

Why I want to write nice R code

- - module

Writing code is fast becoming a key - if not the most important - skill for doing research in the 21st century. As scientists, we live in extraordinary times. The amount of data (information) available to us is increasing exponentially, allowing for rapid advances in our understanding of the world around us. The amount of information contained in a standard scientific paper also seems to be on the rise. Researchers therefore need to be able to handle ever larger amounts of data to ask novel questions and get papers published. Yet the standard tools used by many biologists - point and click programs for manipulating data, doing stats and making plots - do not allow us to scale up our analyses to match data availability, at least not without many, many more ‘clicks’.

Designing projects

The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.

Plans for ‘Nice R code module’, Macquarie University 2013

- - module

Welcome to the Nice R code module. This module is targeted at researchers who are already using R and want to write nicer code. By ‘nicer’ we mean code that is easy to write, is easy to read, runs fast, gives reliable results, is easy to reuse in new projects, and is easy to share with collaborators. When you write nice code, you do better science, are more productive, and have more fun.

We have a tentative schedule of plans for the 2013 Nice R Code module. The module consists of nine one-hour sessions and one three-hour session. The topics covered fall into two broad categories: workflow and coding. Both are essential for writing nicer code. At the beginning of the module we are going to focus on the first, workflow, as this will cross over into all computing work people do.

R in ecology and evolution

On this blog, we (Daniel Falster and I) are hoping to record bits of information about using R for ecology and evolution. Communicating tips person-to-person is too inefficient, and recently helping out at a Software Carpentry boot camp made us interested in broadcasting ideas more widely.