One of my Top 10 posts is on creating a coverage plot using R. For that post I used CAGE data, a transcriptomic data set containing transcription start sites, and I used R exclusively to build a "coverage plot." The main issue with that post was that the plots were density plots rather than real coverage plots. In this post, I'll use BEDTools to calculate the per-base coverage of a defined region and produce an actual coverage plot using R.
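To make "per-base coverage" concrete, here is a minimal Python sketch of the idea: for every base in a region, count how many reads overlap it. This is only an illustration, not the BEDTools implementation and not the code used in the post; the function name `per_base_coverage` and the toy read coordinates are made up for the example, and intervals are assumed to be BED-style (0-based, half-open).

```python
def per_base_coverage(reads, region_start, region_end):
    """Count the reads overlapping each base of [region_start, region_end).

    Intervals are assumed to be BED-style: 0-based starts, exclusive ends.
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region so out-of-range bases are ignored.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


# Hypothetical reads on a single chromosome (toy data, not from the post).
reads = [(100, 105), (102, 108), (104, 106)]
coverage = per_base_coverage(reads, 100, 110)

# One line per base: position within the region (1-based) and its depth.
for offset, d in enumerate(coverage, start=1):
    print(offset, d)
```

A table of positions and depths like this is what gets plotted as a coverage plot; a density plot, by contrast, smooths over read positions and does not show the actual depth at each base.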
Updated 2015 December 16th: the files for the puzzles can be downloaded from the Amazon cloud
Roughly two weeks ago I came across this excellent BEDTools tutorial and saw some puzzles or homework questions at the end of it; naturally I tweeted about it.
I enjoy puzzles, so BEDTools puzzles are definitely my idea of fun. In this post I will go through the ten puzzles/questions, without looking at the answers initially. Let's see how I do!
Updated 2014 June 24th to use GENCODE version 19
RNA sequencing (RNA-Seq) reads are typically mapped back to the genome (or transcriptome in some cases) after sequencing. The next task is to annotate the reads, to see which regions the reads mapped to. Typically one creates an annotation file and compares the coordinates of the mapped reads to the annotation file. Creating this annotation file is quite easy using BEDTools; in this post I refer to the creation of the annotation file as defining genomic regions, since in the end I will have several files that contain coordinates of exonic, intronic, and intergenic regions. I will define these regions with respect to GENCODE annotations.
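The logic behind defining these regions can be sketched with plain interval arithmetic. The Python below is a rough illustration under simplifying assumptions, not the BEDTools workflow from the post: it handles a single chromosome, ignores strand, and uses made-up gene and exon coordinates. The helper names `merge` and `subtract` are my own and only mimic the general idea of merging and subtracting intervals.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a, b):
    """Return the parts of the intervals in a that are not covered by b."""
    b = merge(b)
    result = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue  # b interval lies entirely before the cursor
            if b_start >= end:
                break  # b interval lies entirely after this a interval
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            result.append((cursor, end))
    return result


# Hypothetical annotation on one chromosome (toy data, not GENCODE).
chrom_length = 1000
genes = [(100, 400), (600, 900)]
exons = [(100, 150), (140, 200), (300, 400), (600, 700), (850, 900)]

exonic = merge(exons)                               # union of all exons
intronic = subtract(genes, exonic)                  # genic but not exonic
intergenic = subtract([(0, chrom_length)], genes)   # outside every gene

print("exonic:", exonic)
print("intronic:", intronic)
print("intergenic:", intergenic)
```

In this sketch every base of the chromosome ends up in exactly one of the three sets, which is the property you want before comparing mapped read coordinates against the annotation.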