May 2013 - Dave Tang's blog

R data visualisation

R | Davo | May 31, 2013 | 0 comments

Once upon a time, I made my graphs using Excel because it was the only graphing software I was aware of. Now, one can do amazing things with Excel and produce fairly good-looking graphs, but after looking at some examples of R graphs, I wanted to learn a bit more about…

Mapping repeats

bioinformatics | Davo | May 25, 2013 | 8 comments

Most eukaryotic genomes are interspersed with repetitive elements, and some of these elements are transcriptionally active, so they appear when we sequence the RNA population. From the trend of things, some of these elements seem to be important. One strategy for analysing these repeats is to map them to the genome, to see where they…

Using the Bioconductor annotation packages

R | Davo | May 23, 2013 | 3 comments

Another post related to this course I’m going through (I can’t link it enough times). I have almost finished with the first day of the course and couldn’t resist writing about this lecture on using the Bioconductor annotation packages. I had not realised that the annotation packages could be queried (pardon my ignorance) in the…

Using aggregate and apply in R

R | Davo | May 22, 2013 | 14 comments

2016 October 13th: I wrote a post on using dplyr to perform the same aggregating functions as in this post; personally I prefer dplyr. I recently came across a course on data analysis and visualisation and now I’m gradually going through each lecture. I just finished following the second lecture and the section “Working with…

Creating plots using the xkcd package in R

fun | Davo | May 21, 2013 | 1 comment

xkcd-styled graphs using the xkcd package in R. Steps were done on R version 3.0.1 (2013-05-16) on Windows, i386-w64-mingw32/i386 (32-bit), and follow the xkcd-intro.pdf file, i.e. the xkcd vignette.

install.packages("xkcd")
library(xkcd)
library(extrafont)
library(ggplot2)
# do you have xkcd fonts?
if ("xkcd" %in% fonts()) {
  p <- ggplot() + geom_point(aes(x=mpg, y=wt), data=mtcars) + theme(text…

Singular Value Decomposition using R

Statistics | Davo | May 20, 2013 | 0 comments

In linear algebra terms, a Singular Value Decomposition (SVD) is the decomposition of a matrix X into three matrices, each having special properties. If X is a matrix with each variable in a column and each observation in a row then the SVD is $$X = UDV^T$$ where the columns of U are orthogonal (left…
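As a rough illustration of the decomposition described in this excerpt, the sketch below uses base R's `svd()` on a small made-up matrix (the values are an assumption for demonstration only, not data from the post). It rebuilds X from the three factors and checks that the columns of U are orthonormal.

```r
# Small illustrative matrix: 4 observations (rows) by 3 variables (columns).
# The values are made up for this sketch.
X <- matrix(c(2, 0, 1,
              1, 3, 0,
              0, 1, 4,
              5, 2, 2), nrow = 4, byrow = TRUE)

s <- svd(X)        # list with d (singular values), u and v
U <- s$u           # 4 x 3, left singular vectors in columns
D <- diag(s$d)     # 3 x 3 diagonal matrix of singular values
V <- s$v           # 3 x 3, right singular vectors in columns

# Rebuild X from the three factors: X = U D V^T
X_rebuilt <- U %*% D %*% t(V)

# Both checks should hold up to floating point error.
reconstructs  <- isTRUE(all.equal(X, X_rebuilt))
u_orthonormal <- isTRUE(all.equal(crossprod(U), diag(3)))
```

Here `crossprod(U)` computes `t(U) %*% U`, which equals the identity matrix when the columns of U are orthonormal.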

Fitting a Michaelis-Menten curve using R

biology | Davo | May 17, 2013 | 14 comments

Updated 2017 November 22nd. Many biological phenomena follow four different types of relationships that include sigmoid, exponential, linear and Michaelis-Menten (MM) type relationships. The MM model is given by $$v = \frac{V_{max}[S]}{K_m + [S]}$$ where $v$ is the reaction rate of product to substrate $[S]$, $V_{max}$ represents the maximum rate achieved by the system, and $K_m$ is the substrate concentration at which the…
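A minimal sketch of such a fit, assuming the standard Michaelis-Menten form v = Vmax * S / (Km + S) and using base R's `nls()`. The data are synthetic, generated from assumed "true" parameters (Vmax = 10, Km = 3) purely for illustration. They are not the data used in the post, and the starting values are guesses.

```r
# Synthetic data generated from assumed parameters, for illustration only.
set.seed(1)
S <- c(0.5, 1, 2, 4, 8, 16, 32)                 # substrate concentrations
true_Vmax <- 10
true_Km   <- 3
v <- true_Vmax * S / (true_Km + S) + rnorm(length(S), sd = 0.1)
dat <- data.frame(S = S, v = v)

# Fit v = Vmax * S / (Km + S) by nonlinear least squares.
# nls() needs starting values; these are rough guesses.
fit <- nls(v ~ Vmax * S / (Km + S), data = dat,
           start = list(Vmax = 8, Km = 2))

est <- coef(fit)   # named vector with the estimated Vmax and Km
```

Calling `summary(fit)` would additionally report standard errors for the two estimates.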

Coding potential of non-coding RNA

bioinformatics | Davo | May 11, 2013 | 7 comments

The Coding Potential Assessment Tool (CPAT) was developed to assess the coding potential of RNA sequences without the need for sequence alignment. More information on the tool can be found on its website and in the corresponding publication.

Using the R SeqinR package

R | Davo | May 9, 2013 | 5 comments

Just a quick demonstration of the SeqinR package in R using sequences available from the NONCODE database version 3. First download the FASTA file, which is available at http://noncode.org/datadownload/ncrna_NONCODE[v3.0].fasta.tar.gz, then install the necessary packages in R and load the FASTA file (note that I have extracted the file and renamed it to ncrna_noncode_v3.fa).

On curve fitting using R

R | Davo | May 9, 2013 | 25 comments

For linear relationships we can perform a simple linear regression. For other relationships we can try fitting a curve. From Wikipedia: Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. I will use the dataset from this…
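To make the contrast between a simple linear regression and a fitted curve concrete, here is a small sketch using base R's `lm()`. The data are made up for illustration (they are not the dataset the post refers to), and a quadratic polynomial is just one assumed choice of curve.

```r
# Made-up curved data, for illustration only.
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x^2 + rnorm(length(x), sd = 5)

# Straight-line fit versus a quadratic curve fitted by least squares.
linear_fit <- lm(y ~ x)
quad_fit   <- lm(y ~ poly(x, 2, raw = TRUE))

# Residual sum of squares: smaller means the fitted values sit closer to the data.
rss_linear <- sum(residuals(linear_fit)^2)
rss_quad   <- sum(residuals(quad_fit)^2)
```

Because the quadratic model contains the straight line as a special case, its residual sum of squares can never be larger. Whether the improvement justifies the extra term is a separate question that this sketch does not address.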