Sequence logos with R

Updated 2014 September 4th to include a script that parses the TRANSFAC matrix.dat file. The WebLogo tool allows you to create sequence logos based on multiple sequence alignments. However, if you want to create a vector image of a sequence logo based on position frequency matrices, you need another resource. I found the Bioconductor package…
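
As a reference point for working from a position frequency matrix, letter heights in a sequence logo are conventionally derived as follows. The information content at position i is the maximum possible entropy minus the observed Shannon entropy, and each base's height is its frequency scaled by that value. This sketch assumes DNA (four letters, so at most 2 bits), uses the convention 0 log 0 = 0, and ignores the small-sample correction that tools such as WebLogo can apply. Here n_{b,i} is the count of base b at position i:

$$
p_{b,i} = \frac{n_{b,i}}{\sum_{b'} n_{b',i}}, \qquad
H_i = -\sum_{b \in \{A,C,G,T\}} p_{b,i} \log_2 p_{b,i}, \qquad
R_i = \log_2 4 - H_i = 2 - H_i, \qquad
\mathrm{height}_{b,i} = p_{b,i} \, R_i
$$

For example, a column with counts A = 8, C = 0, G = 0, T = 0 gives H_i = 0 and R_i = 2 bits, so A is drawn 2 bits tall. A column with equal counts for all four bases gives H_i = 2 and R_i = 0, so that position is empty in the logo.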

10,000 monthly visitors, apparently

I created davetang.org on the 24th of April 2009 just for the sake of buying a domain with my name in it. Realising that I was (and still am) paying for a service, I decided to actually make use of my web space. It really started to become handy when I decided to pursue a…

Using SQL on R data.frames

In R, I typically use data.frames to hold all my data. I’ve wondered when I should just use a matrix instead of a data.frame, and this question has been nicely answered. It is also quite easy to perform operations on data.frames to obtain a subset of the data. However, if I had two data frames and wanted…

Defining genomic regions

Updated 2014 June 24th to use GENCODE version 19. RNA sequencing (RNA-Seq) reads are typically mapped back to the genome (or, in some cases, the transcriptome) after sequencing. The next task is to annotate the reads, that is, to determine which regions the reads mapped to. Typically one creates an annotation file and compares the coordinates of the…
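
Comparing read coordinates with an annotation usually comes down to an interval-overlap test. This sketch assumes 1-based, fully closed coordinates, which is the convention of GTF/GFF files such as the GENCODE annotation. BED files are 0-based and half-open, so they would need converting first. Under that assumption, a read spanning [s_r, e_r] overlaps a feature spanning [s_f, e_f] on the same chromosome (and strand, if strand matters) if and only if:

$$
s_r \le e_f \quad \text{and} \quad s_f \le e_r,
\qquad
\mathrm{overlap} = \min(e_r, e_f) - \max(s_r, s_f) + 1
$$

For example, a read at [100, 150] and an exon at [150, 300] overlap because 100 ≤ 300 and 150 ≤ 150. The overlap length is 150 - 150 + 1 = 1 base.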

Using the Bioconductor GenomicRanges package

Updated 2019 April 4th. From the introductory article: "The GenomicRanges package serves as the foundation for representing genomic locations within the Bioconductor project." To begin, install the package. The introductory article starts by creating a GRanges object: "The GRanges class represents a collection of genomic features that each have a single start and end location…
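
In GRanges (via the underlying IRanges), start and end are 1-based and both inclusive. The width of each range therefore follows directly from its two coordinates:

$$
\mathrm{width} = \mathrm{end} - \mathrm{start} + 1
$$

For example, a range with start 101 and end 200 has width 100. An empty range (width 0) is represented with end = start - 1.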