Day 2: 5:30pm Hacky Hour!
Hermann's Bar
Corner of Butlin Ave and City Road, Wentworth Building
Sit around with a beverage while helping and getting help with code.

Twitter: #ResBazSyd, #ResBaz, #NerdNite

Homepage:
Lesson materials:
Pre survey:

Lesson 1 - R introduction

Data types:
numeric
integer
character
logical
complex

Data structures:
list
data.frame
matrix
vector
factor
table

Order of coercion:
logical -> integer -> numeric -> complex -> character

Some other ways to create a vector:
x <- rep(1, 10)
x <- 1:10
x <- seq(1, 10, by=1)

How to avoid factors when loading data:
option 1. read.csv(..., stringsAsFactors=FALSE)
option 2. using the package `readr`, use the function `read_csv`, which includes this as default (and is about 10 times faster than `read.csv`)

Exercises:

1. Fix each of the following common data frame subsetting errors:

# Extract cases where cyl is 4
mtcars[mtcars$cyl = 4, ]

# Exclude only rows 1 through 4
mtcars[-1:4, ]

# Return only rows for cylinders less than 5
mtcars[mtcars$cyl <= 5]

# Return only rows for cylinders that are 4 or 6.
mtcars[mtcars$cyl == 4 | 6, ]

2. Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?

3. R comes with a data set called iris; how big is this dataset (number of rows and columns)? Create a new data.frame called small_diamonds that only contains rows 1 through 9 and 19 through 23. You can do this in one or two steps.

4. Given a linear model
mod <- lm(mpg ~ wt, data = mtcars)
# Extract the residual degrees of freedom.
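
The order of coercion listed in Lesson 1 can be checked directly in the console. This is a minimal sketch using only base R; the variable name `coerced` and the element labels are made up for illustration and are not part of the lesson materials.

```r
# Each c() call mixes two types; the result takes the more general of the two,
# following logical -> integer -> numeric -> complex -> character.
coerced <- c(
  logical_integer   = class(c(TRUE, 1L)),   # "integer"
  integer_numeric   = class(c(1L, 2.5)),    # "numeric"
  numeric_complex   = class(c(2.5, 1i)),    # "complex"
  complex_character = class(c(1i, "a"))     # "character"
)
print(coerced)
```

Mixing any type with a character string always ends in "character", which is why a single stray text entry turns a whole numeric column into text (or a factor) when data are read in.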

Lesson 2 - Functions

data <- read.csv("gapminder-FiveYearData.csv", stringsAsFactors = FALSE)

Create a function that takes the mean of data$pop
hint: use the functions sum and length

data.1982 <- subset(data, data$year==1982)
plot(lifeExp ~ gdpPercap, data=data.1982)

# Scale point size by the square root of population, mapped onto 0.2-10
tmp <- sqrt(data.1982$pop)
p <- (tmp - min(tmp)) / (max(tmp) - min(tmp))
cex <- 0.2 + p * (10 - 0.2)
plot(lifeExp ~ gdpPercap, data=data.1982, log="xy", cex=cex)

# The same rescaling as a reusable function
rescale <- function(x, range=c(0.2,10)) {
  x <- sqrt(x)
  p <- (x - min(x)) / (max(x) - min(x))
  range[1] + p * (range[2] - range[1])
}
plot(lifeExp ~ gdpPercap, data=data.1982, log="xy", cex=rescale(data.1982$pop, c(0.2, 5)))

# Fit a line on the log10-log10 scale and add it to the current plot
add.trend.line <- function(xvar, yvar, data, ...) {
  y <- log10(data[[yvar]])
  x <- log10(data[[xvar]])
  fit <- lm(y ~ x)
  abline(fit, ...)
}
plot(lifeExp ~ gdpPercap, data=data.1982, log="xy", cex=rescale(data.1982$pop, c(0.2, 5)))
add.trend.line("gdpPercap", "lifeExp", data.1982)
add.trend.line("gdpPercap", "lifeExp", data.1982, lwd=2, col="red")

# Trend line for a single continent
line.le.gdp <- function(continent, col, data) {
  add.trend.line("gdpPercap", "lifeExp", lwd=2, data[data$continent == continent,], col=col)
}
plot(lifeExp ~ gdpPercap, data=data.1982, log="xy", cex=rescale(data.1982$pop, c(0.2, 5)))
line.le.gdp("Africa", "red", data.1982)
line.le.gdp("Asia", "blue", data.1982)
line.le.gdp("Oceania", "green", data.1982)

# Alternative version: thinner, dashed lines
line.le.gdp <- function(cont, col, data) {
  add.trend.line("gdpPercap", "lifeExp", subset(data, data$continent == cont), lwd=1, col=col, lty="dashed")
}

Lesson 3, Projects

project
- data
- R
- manuscript
- figures
- pdfs
main.R

Links about project organisation:
1.
2.

proj/
|-- R/
|-- data/
|-- output/
|-- |-- data/
|-- |-- figures/
|-- doc/
|-- analysis.R
|-- my_project.Rproj

data <- read.csv("harddrive/users/jmadin/projects/project_name/data/corals.csv") [Mac]
data <- read.csv("C://...../users/jmadin/projects/project_name/data/corals.csv") [Windows]

setwd("harddrive/users/jmadin/projects/project_name")
data <- read.csv("data/corals.csv")

Exercise:
1. In your project lesson folder, create 6 new folders: R, data, output (data, figures), docs
2. Drag your gapminder-FiveYearData.csv file to the folder data
3. In the project folder, create a new empty file: figures.R
4. In the R folder, create two new files: functions.R, figures_functions.R
5. From your file analysis.R drag all the 'analytical functions' into R/functions.R
6. From your file analysis.R drag all the 'plotting functions' into R/figures_functions.R
7. At the very beginning of your analysis.R file, source the new files with functions that you just created in the R folder and add the necessary libraries for it to work (in this case it's only plyr)
8. Remove any other library() lines from your analysis.R file
9. Make sure you adjust the file path to your data in order to read it:
data <- read.csv("data/gapminder-FiveYearData.csv", stringsAsFactors=FALSE)
10. Drag the last part of your analysis.R file (making plot) to the figures.R file
11. In the figures.R file, make sure you source the functions files the same way you just did in your analysis.R file. Also make sure you clear the workspace on your first line using: rm(list=ls())
12. Create two new lines between the sourced files and the actual code in order to read and modify your data:
data <- read.csv("data/gapminder-FiveYearData.csv", stringsAsFactors=FALSE)
and
data.1982 <- data[data$year == 1982, ]
13. Finally, quit RStudio and reopen it from the projects.Rproj file in your project root directory
14. Open analysis.R and figures.R and run each script entirely at once

Lesson 4, Repeating stuff

library(plyr)
data <- read.csv("data/gapminder-FiveYearData.csv", stringsAsFactors=FALSE)
source("R/functions.R")

col.table <- c(Asia="tomato", Europe="chocolate4", Africa="dodgerblue2", Americas="darkgoldenrod1", Oceania="green4")

Challenge 1: write a function that counts the number of unique values in a vector

n.unique <- function(x) {
  length(unique(x))
}

The ugly way:

The nice way:
ddply(data, "continent", summarise, n=n.unique(country))

Answer:
ddply(data, "continent", summarise, n=n.unique(country), meanLE=mean(lifeExp), maxLE=max(lifeExp), meanGDP=mean(gdpPercap), varGDP=var(gdpPercap))

x <- ddply(data, "continent", mutate, n=n.unique(country), meanLE=mean(lifeExp), maxLE=max(lifeExp), meanGDP=mean(gdpPercap), varGDP=var(gdpPercap))
head(x)

Challenge 2: Write a function that takes a dataframe x and returns the unique values in the variable country

get.countries <- function(x) {
  unique(x$country)
}
countries <- dlply(data, "continent", get.countries)

Challenge 3: total population size by continent and year, using an in-place (anonymous) function

population <- ddply(data, c("continent", "year"), function(x) sum(x$pop))

ddply(data, c("continent", "year"), function(x) data.frame(Pop=sum(x$pop), MeanPop=mean(x$pop)))

fit.model <- function(x) {
  lm(lifeExp ~ log10(gdpPercap), data=x)
}

# The variable name was lost in the pad; data.asia.1982 is a placeholder
data.asia.1982 <- data[data$year==1982 & data$continent == "Asia",]
fit <- fit.model(data.asia.1982)

models <- dlply(data, c("continent", "year"), fit.model)
ldply(models, coef)

# Return a one-row data frame per group so ddply can stack the results
fit.model <- function(x) {
  fit <- lm(lifeExp ~ log10(gdpPercap), data=x)
  data.frame(n=nrow(x), a=coef(fit)[1], b=coef(fit)[2], r2=summary(fit)$r.squared)
}
models <- ddply(data, c("continent", "year"), fit.model)

Lesson 5, Shell

Windows users: Open your Terminal and type the following command:
echo "export TERM=msys" >> ~/.bashrc
Then restart your machine. (I believe?)

Solving problems with backspace for Windows users??:
Start a git bash shell
cd ~ (your home directory)
Create a new file called .inputrc and fill it with the following:

"\e[3~": delete-char
# this is actually equivalent to "\C-?": delete-char
# VT
"\e[1~": beginning-of-line
"\e[4~": end-of-line
# kvt
"\e[H": beginning-of-line
"\e[F": end-of-line
# rxvt and konsole (i.e. the KDE-app...)
"\e[7~": beginning-of-line
"\e[8~": end-of-line

Save the file and exit. You should be able to restart with your original command,
"C:\Program Files\Git\bin\rxvt.exe" -e /usr/bin/bash --login -i
and use the backspace!

List
ls -l

Try these options with the ls command:
-lt
-a
-F
-lh
-S
-t
-G (Mac)
--color (PC)

Exercise 01
Change into your home directory. Go to the shell lesson directory. Go into data. List the contents of this directory. Choose one file to examine with the command head. Then change back into your home directory again.

Exercise 02
Go to your home directory: cd ~
Do each of the following using a single ls command without navigating to a different directory:
List all of the files in the shell material folder that start with the number 4;
List all of the files in the shell material folder that contain the number 01 (together and in this order);
List all of the files in the shell material folder that end with the number 0;
BONUS: List all of the files in the shell material folder that contain the number 2 or the number 3.

Exercise 03
Go to your lessons/60-shell directory. Create the following folders: docs, output/data, output/figures and R. From within your shell material directory, move the respective file types into their matching directory type following the (project setup) lesson.

Lesson 6, Git

1) Create a .gitignore
2) Create an output/ folder
3) Add the output/ folder to the .gitignore file
4) Add the .gitignore file
5) Commit .gitignore file
6) Practice!

Setting up - Pushing changes to github
1. change file
2. git add filename
3. git commit -m "My change"
4. git push

Stephs GitHub:

Git Shortcuts will appear here:

---------------
Contents of Daniel's .gitignore_global file

```
# Sublime text projects #
###################
*.sublime*

# Specific folders #
###################
ignore/
x_archive/

# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.so

# Packages #
############
# it's better to unpack these files and commit the raw source
# git has its own built in compression methods
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
*.tar
*.zip

# Logs and databases #
######################
*.log
*.sql
*.sqlite

# LaTeX #
#########
*.aux
*.bbl
*.blg
*.dvi
*.fff
*.lof
*.lot
*.out
*.toc
*.ttt
*.mtc*

# Beamer #
#########
*.snm
*.nav
*.fdb_latexmk
*.fls
*.vrb

# OS generated files #
######################
.DS_Store*
ehthumbs.db
Icon?
Thumbs.db

# R
.Rhistory
.Rapp.history
..Rcheck

# Nicer code blog
.preview-mode
*attributes.JSON
.remake
*.mp3
*.mp4
*.mov
```

---------------
Contents of Daniel's .gitconfig file

```
[user]
    name = Daniel Falster
    email =
[core]
    excludesfile = /Users/dfalster/.gitignore_global
    editor = subl -n --wait
    autocrlf = input
[color]
    branch = auto
    diff = auto
    interactive = auto
    status = auto
    ui = auto
[alias]
    dfc = diff --color-words
    dfcc = diff --color-words --cached
    lol = log --graph --decorate --pretty=oneline --abbrev-commit
    lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
    topo = log --graph --simplify-by-decoration --pretty=format:'%d' --all
    ff = merge --ff-only
    ffom = merge --ff-only origin/master
    rom = rebase origin/master
    rh = reset HEAD
    undo-commit = reset --soft HEAD^
    root = rev-parse --show-toplevel
    sha = rev-parse --short HEAD
[log]
    date = iso
[push]
    default = simple
```

---------------
Final lesson

git clone
cd 2016-02-01-ResBaz-reproducible/1-standard
open reproducible_research.Rproj

remake
install.packages(c("R6", "yaml", "digest", "crayon", "optparse", "storr", "devtools", "downloader"))
devtools::install_github("richfitz/remake")
install.packages("memoise")

WORKED WELL
Learning about functions worked well
I liked the remake stuff (will be good to get used to it as I go)
Intermediate level was appropriate
functions for graphing
Git/github
Project organisation was useful
day 1 went really well - easy to follow and the R functions were great
Etherpad for sharing urls etc
The sticky notes and helpers
Good pace
Live illustration of push-pull with GitHub
Shell explanation very didactic and easy to follow
format was great - coloured notes and using etherpad
written online instructions are very well done!
Having info about what to install before coming to workshop helped me
I found the RStudio interface really helpful
Also browser() function
It was very handy the way you showed us multiple methods of plotting
Github and version control is very useful and easy to follow
Food was great!!!
The breaks came just in time most of the time.

DIDN'T WORK
Install issues, lost time, then github stuff was then rushed (needed more info on etherpad to help?)
remake was a bit lost on me (but might be because of the kind of analyses I run, it's not that relevant)
using lots of programs and experiencing lots of errors made it hard to follow.
remake went through really quickly - hard to integrate into what we already know as it's all very new
In the very beginning much time was lost with basic R stuff, then need to rush at the end with more interesting stuff.
Trying out code on windows pcs before would be handy (text editor)
Maybe need some info on git issues (eg people overwriting other builds - or is that a non-issue?)
Remake could be introduced before getting into git, since it's not really connected and might be confusing
Would cygwin be better for Windows, or potentially worse?
Would have been nice to know that we need RStudio a bit earlier than Sunday night...
It would have made it a bit easier to sign in before starting the course.