Doing simple stuff in R

Updated 2014 June 21st

The very first time I tried to use R was back in 2005, when I wanted to perform a Wilcoxon test. I was migrating from microbiology into bioinformatics, and I still remember trying to follow this guide: http://cran.r-project.org/doc/manuals/r-release/R-intro.html. All I managed to do was waste a lot of paper by printing out the entire manual and typing out Appendix A, a sample session, verbatim without understanding much of it. Since starting my PhD in 2010, I've had to use several Bioconductor packages for some analyses. But it wasn't until I recently enrolled in an online data analysis course that I learned a bit more about the language. This post is about doing simple stuff in R, drawing on that recent exposure.


Building a classification tree in R

In week 6 of the Data Analysis course offered freely on Coursera, there was a lecture on building classification trees (also known as decision trees) in R. I thoroughly enjoyed the lecture, and here I reiterate what was taught, both to reinforce my memory and to share it.

I will jump straight into building a classification tree in R and explain the concepts along the way. We will use the iris dataset, which gives measurements in centimeters of the variables sepal length and width, and petal length and width, respectively, for 50 flowers from each of three species of iris.

data(iris)
names(iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
table(iris$Species)
#
#     setosa versicolor  virginica
#         50         50         50

# install if necessary
install.packages("ggplot2")
library(ggplot2)
qplot(Petal.Width, Sepal.Width, data=iris, colour=Species, size=I(4))
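
With the data loaded, fitting a tree could look roughly like the sketch below. This is only an illustration under an assumption: it uses the rpart package (one of R's recommended packages), which may not be the package used in the course lecture, and it evaluates the tree on the same data it was trained on, so the resulting table is an optimistic view of accuracy.

```r
# Sketch only: rpart is an assumption here, not necessarily the package
# used in the lecture.
library(rpart)

data(iris)

# Fit a classification tree predicting Species from the four measurements
fit <- rpart(Species ~ ., data = iris, method = "class")

# Show the splits chosen by the tree in text form
print(fit)

# Predict the class of each flower in the training data and
# cross-tabulate the predictions against the true species
pred <- predict(fit, newdata = iris, type = "class")
table(predicted = pred, actual = iris$Species)
```

Holding out part of the data for testing would give a more honest estimate of how well the tree generalises.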
