R data visualisation

Once upon a time, I made my graphs in Excel because it was the only graphing software I was aware of. One can do amazing things with Excel and produce fairly good-looking graphs, but after looking at some examples of R graphs, I wanted to learn a bit more about…

Creating plots using the xkcd package in R

xkcd-styled graphs using the xkcd package in R. The steps were done on R version 3.0.1 (2013-05-16) on Windows, i386-w64-mingw32/i386 (32-bit), and follow xkcd-intro.pdf, i.e. the xkcd vignette.

    install.packages("xkcd")
    library(xkcd)
    library(extrafont)
    library(ggplot2)
    # do you have xkcd fonts?
    if ("xkcd" %in% fonts()) {
      p <- ggplot() + geom_point(aes(x=mpg, y=wt), data=mtcars) + theme(text…

Sequence logos with R

Updated 2014 September 4th to include a script that parses the TRANSFAC matrix.dat file. The WebLogo tool allows you to create sequence logos based on multiple sequence alignments. However, if you want to create a vector image of a sequence logo based on position frequency matrices, you need another resource. I found the Bioconductor package…
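
To make the term concrete, here is a minimal base R sketch of what a position frequency matrix is and how the per-position information content (commonly used to scale letter heights in a sequence logo) can be derived from it. The sequences are invented for illustration, no small-sample correction is applied, and this is not the code from the post.

```r
# Hypothetical aligned DNA sequences, made up for illustration
seqs <- c("ACGTAC", "ACGTTC", "ACGAAC", "TCGTAC")
bases <- c("A", "C", "G", "T")

# One row per sequence, one column per alignment position
chars <- do.call(rbind, strsplit(seqs, ""))

# Position frequency matrix: count of each base (rows) at each position (columns)
pfm <- vapply(
  seq_len(ncol(chars)),
  function(i) tabulate(match(chars[, i], bases), nbins = length(bases)),
  integer(length(bases))
)
rownames(pfm) <- bases

# Convert counts to proportions so that every column sums to 1
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# Information content per position in bits: 2 + sum(p * log2(p)),
# treating 0 * log2(0) as 0
ic <- apply(ppm, 2, function(p) 2 + sum(ifelse(p > 0, p * log2(p), 0)))

print(pfm)
print(round(ic, 3))
```

Fully conserved positions reach the maximum of 2 bits for DNA, while variable positions score lower.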

Visualising hierarchical clustering results

I’ve written about hierarchical clustering before as an attempt to understand it better. Within R, you can plot hierarchical clustering results; however, when working with a large dataset you may produce plots where all the labels overlap and none of them can be read. During my…
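
The overlapping-label problem is easy to reproduce. The sketch below uses simulated data (200 hypothetical samples) and shows two generic base R workarounds: suppressing the leaf labels, and cutting the tree into groups. The choice of average linkage and of four groups is arbitrary, and these are not necessarily the approaches the post goes on to describe.

```r
# Simulated data: 200 hypothetical samples with 5 measurements each
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200,
            dimnames = list(paste0("sample", 1:200), NULL))

# Average-linkage clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")

# Workaround 1: drop the leaf labels so the tree shape stays readable
plot(hc, labels = FALSE, hang = -1, main = "200 samples, labels suppressed")

# Workaround 2: cut the tree into k groups and inspect membership as a table
groups <- cutree(hc, k = 4)
print(table(groups))
```

`groups` is a named integer vector, so `names(groups)[groups == 1]` lists the samples in the first group without needing to read them off the plot.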

Sequence conservation in vertebrates

The UCSC Genome Browser provides multiple alignments of 46 vertebrate species and conveniently makes them available for download. The multiple alignments show regions of sequence conservation among vertebrates. For more information see http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way. The multiple alignments are stored in Multiple Alignment Format (MAF) files, and there are Perl and Python packages that parse them. The MAF format is…
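
As a rough illustration of the file layout, the following base R sketch parses the sequence ("s") lines of a tiny MAF snippet into a data frame. The snippet, including all coordinates and sequences, is invented; the column names are my own; and a real 46-way file would need a streaming parser rather than this in-memory approach.

```r
# Hypothetical MAF content; all coordinates and sequences are invented
maf_text <- c(
  "##maf version=1",
  "a score=100.0",
  "s hg19.chr1    1000 8 + 249250621 ACGT-ACGT",
  "s panTro2.chr1 2000 9 + 229974691 ACGTTACGT",
  "s mm9.chr4     3000 7 - 155630120 ACGT-AC-T",
  "",
  "a score=50.0",
  "s hg19.chr1    1008 5 + 249250621 GGCCA",
  "s panTro2.chr1 2009 5 + 229974691 GGCTA"
)

is_a <- grepl("^a(\\s|$)", maf_text)  # "a" lines start a new alignment block
is_s <- grepl("^s\\s", maf_text)      # "s" lines hold one aligned sequence each

# "s" line fields: s, src, start, size, strand, srcSize, text
fields <- strsplit(maf_text[is_s], "\\s+")
field <- function(i) vapply(fields, `[`, character(1), i)

maf <- data.frame(
  block    = cumsum(is_a)[is_s],  # running block number for each "s" line
  src      = field(2),
  start    = as.integer(field(3)),
  size     = as.integer(field(4)),
  strand   = field(5),
  src_size = as.integer(field(6)),
  text     = field(7),
  stringsAsFactors = FALSE
)
maf$species <- sub("\\..*$", "", maf$src)  # assembly name before the first dot

# Sanity check: the size field counts the non-gap characters in the text
ungapped <- nchar(gsub("-", "", maf$text, fixed = TRUE))
stopifnot(all(ungapped == maf$size))

print(maf)
```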

Installing Circos

A short post about installing Circos on Ubuntu, other Linux distributions, and Windows. Note: on Ubuntu, the env program is located at /usr/bin/env. The gddiag and circos programs use /bin/env, so running gddiag gives a bad interpreter error. Change the first line to #!/usr/bin/perl for both…

Creating a matrix of scatter plots in R

Scatter plots are two-dimensional plots that show the relationship between two variables. Here I demonstrate how to create a matrix of scatter plots in R for datasets that have more than two variables. This is particularly useful for visually inspecting whether there are associations between variables. To display correlations on…
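
A minimal sketch of a scatter plot matrix using base R's `pairs()` on four columns of the built-in `mtcars` dataset. Placing the Pearson correlation in the upper panels follows the pattern shown in the `pairs` help page; the `panel_cor` helper name and the choice of columns are my own, and the post itself may arrange things differently.

```r
# Four numeric variables from a built-in dataset
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Panel function that prints the Pearson correlation instead of plotting points
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # use unit coordinates inside this panel
  on.exit(par(old))                # restore the previous coordinates on exit
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, sprintf("%.2f", r), cex = 1.5)
}

# Lower panels: scatter plots with a smoothed trend line; upper panels: correlations
pairs(vars, lower.panel = panel.smooth, upper.panel = panel_cor)

# The same correlations as a numeric matrix, for reference
cors <- cor(vars)
print(round(cors, 2))
```

Each lower panel can then be read against the coefficient in the mirrored upper panel.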

UCSC Genome Browser custom overlap tracks

One of the features of the latest update to the UCSC Genome Browser (see http://www.ncbi.nlm.nih.gov/pubmed/20959295) is tracks that overlap or overlay each other. If you’re a regular user of the site, you will have noticed the ENCODE ChIP-Seq tracks that have several layers. After doing a bit of searching, I was able to make my…

Visualising RNA-Seq like data

So you’ve aligned your reads from an RNA-Seq or RNA-Seq-like experiment to the reference genome and have produced a BAM file. You could visualise your BAM file directly using IGV. This is fine for looking at individual libraries, but it may become an issue when looking at several large libraries. A common strategy is…
