Clustering mapped reads

Updated 2014 October 8th: I added an analysis using CAGE data from ENCODE and rewrote parts of the post. In this post I will write about paraclu, a method for clustering mapped reads. This is particularly useful when working with CAGE data, where transcription start sites (TSSs) are…


UCSC Genome Browser custom overlap tracks

One of the features of the latest update to the UCSC Genome Browser (see http://www.ncbi.nlm.nih.gov/pubmed/20959295) is tracks that overlap or overlay each other. If you’re a regular user of the site, you will have noticed the ENCODE ChIP-Seq tracks that have several layers. After doing a bit of searching, I was able to make my…


Visualising RNA-Seq like data

So you’ve aligned your reads from an RNA-Seq or RNA-Seq-like experiment to the reference genome and have produced a BAM file. You could visualise your BAM file directly by using IGV. This is fine for looking at individual libraries, but it may become an issue when looking at several large libraries. A common strategy is…


Transcription factor binding site analysis

Updated 2013 October 4th. Recently I’ve been looking into transcription factor binding site analyses. With my mind set on this, I thought I’d brush up this old post. MEME is a tool for discovering motifs in a group of related DNA or protein sequences. As a discovery tool, it is able to find de novo…


Motifs upstream of RefSeq gene models

Here’s a very primitive way of looking for motifs upstream of RefSeq gene models.

1) Download the upstream sequences (-50) of RefSeq gene models using the UCSC Table Browser tool as a BED file
2) Using the fastaFromBed tool from BEDTools, make FASTA files from the BED file
3) Look for motifs

Here’s the main…
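
To make step 2 concrete, here is a minimal pure-Python sketch of what extracting sequences from BED intervals involves. It is not the fastaFromBed tool and not the code from the post. The function names (`reverse_complement`, `bed_to_records`, `format_fasta`) and the toy genome are hypothetical, and it assumes a six-column BED file with the strand in column 6 and a genome that fits in memory as a dictionary of strings.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def bed_to_records(bed_lines, genome):
    """Turn BED lines into (name, sequence) pairs.

    genome is a dict mapping chromosome name to sequence (an assumption
    for this sketch). BED coordinates are 0-based and half-open, which
    matches Python slicing directly.
    """
    records = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        strand = fields[5] if len(fields) > 5 else "+"
        seq = genome[chrom][start:end]
        # Upstream regions of minus-strand genes are reported on the
        # forward strand, so flip them to read 5' to 3'.
        if strand == "-":
            seq = reverse_complement(seq)
        records.append((name, seq))
    return records


def format_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy data, invented for illustration only.
toy_genome = {"chr1": "AAACCCGGGTTT"}
toy_bed = ["chr1\t0\t3\tgeneA\t0\t+", "chr1\t3\t7\tgeneB\t0\t-"]
fasta_text = format_fasta(bed_to_records(toy_bed, toy_genome))
```

The resulting FASTA text is what a motif finder would take as input in step 3. For a real genome, an indexed FASTA reader would replace the in-memory dictionary.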


Why miRNA are 22 or 23 nucleotides long

Late last year I mapped randomly sized DNA sequences back to the genome. The purpose was simply to see how long sequenced reads needed to be before they could be uniquely mapped to the genome. I couldn’t find the statistics on this, so I just did it myself. I didn’t dwell on the results too…
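
The idea of checking how read length affects unique mappability can be sketched on a toy scale. This is only an illustration under stated assumptions, not the analysis from the post: it uses exact string matching on the forward strand instead of a read aligner, and the function names (`count_occurrences`, `unique_fraction`) are hypothetical.

```python
import random


def count_occurrences(genome, read):
    """Count exact matches of read in genome, including overlapping ones."""
    count = 0
    pos = genome.find(read)
    while pos != -1:
        count += 1
        pos = genome.find(read, pos + 1)
    return count


def unique_fraction(genome, length, n_reads, rng):
    """Sample n_reads substrings of the given length from genome and
    return the fraction that occur exactly once (forward strand only)."""
    unique = 0
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - length + 1)
        read = genome[start:start + length]
        if count_occurrences(genome, read) == 1:
            unique += 1
    return unique / n_reads


if __name__ == "__main__":
    rng = random.Random(42)
    # A random toy "genome"; a real one has repeats that this does not model.
    toy_genome = "".join(rng.choice("ACGT") for _ in range(20000))
    for length in (4, 8, 12, 16):
        print(length, unique_fraction(toy_genome, length, 200, rng))
```

On a random sequence the unique fraction rises quickly with read length. A real genome would need an aligner and both strands to give meaningful numbers.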


Making a line graph to depict timecourse data

From this helpful thread in the Bioconductor mailing list. Just to see what it is doing, I made a simpler example. Column 5 of the matrix “two” can most easily be seen as the dotted aqua line (from -2.6879801 to -0.5859938). This plot could be useful if you wanted to depict the gene expression of…
