Disclaimer (2015 August 5th): as pointed out in this comment thread below, this post created a density plot rather than a coverage plot. I have written a new post that uses BEDTools to calculate the coverage and R to produce an actual coverage plot.
I’ve recently discovered GitHub Gist, so for this post I’m going to use that to host my code (and all subsequent posts as I see fit). The code was not displaying properly due to some CSS property of the Twenty Ten theme, so I had to update my WordPress theme to Twenty Eleven, which also led me to changing my header image. The photo I used for the header was a shot I took at the summit of Mount Fuji around 06:00 on the 24th August 2013, when the clouds finally cleared a little; I hiked all night to make it to the top to see sunrise but unfortunately the weather was terrible. The photo looks nice, so I thought I’ll use it as the header.
Anyway back to the topic; I wanted to create a coverage plot of mapped reads starting from a BAM file. So far I’ve been using IGV’s coverage track to get a visual idea of the coverage. In the past, I’ve also used bedtools genomecov to generate bedGraph files and the subsequent wig and bigWig files that I would then visualise on the UCSC Genome Browser. How about creating a coverage plot in R (so that I can export it as a postscript file)? Yeah sure, why not. Let’s download a BAM file as an example:
#the smallest CAGE BAM file from ENCODE wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRikenCage/wgEncodeRikenCageHchCellPapAlnRep1.bam
First let’s read in a BAM file using Bioconductor’s Rsamtools and convert it into a data frame (since I’m more familiar with data frames):
Now that we’ve stored our BAM file into a data.frame we can make our coverage plot.
If you want a histogram looking image instead:
plot(chr22_pos_density, ylim = range(c(chr22_neg_density$y, chr22_pos_density$y)), main = "Coverage plot of mapped CAGE reads", xlab = "Chromosome 22", col = 'blue', type='h' ) lines(chr22_neg_density, type='h', col = 'red')
If you want to focus on a specific region, follow the steps above until you create the two vectors chr22_neg and chr22_pos. Remember that these vectors store the starting position of all the reads mapped onto chromosome 22. So to focus on a specific region we just need to shrink or subset this vector:
#check our two vectors length(chr22_neg) # 24413 length(chr22_pos) # 32013 #what is actually inside these two vectors? head(chr22_pos)  16096446 16124065 16147449 16165745 16339913 16364755 #get a summary of the mapped positions summary(chr22_pos) # Min. 1st Qu. Median Mean 3rd Qu. Max. #16100000 31480000 35790000 35040000 39080000 51240000 #say I am interested in region 31480000 to 39080000 lower <- 31480000 upper <- 39080000 chr22_pos_interest <- chr22_pos[chr22_pos > lower & chr22_pos < upper] #check how many entries we have length(chr22_pos_interest) # 16220 #do the same for the negative strand chr22_neg_interest <- chr22_neg[chr22_neg > lower & chr22_neg < upper] length(chr22_neg_interest) # 7744 #now continue with the code above #but with our two new vectors of interest chr22_neg_density <- density(chr22_neg_interest) chr22_pos_density <- density(chr22_pos_interest) #display the negative strand with negative values chr22_neg_density$y <- chr22_neg_density$y * -1 plot(chr22_pos_density, ylim = range(c(chr22_neg_density$y, chr22_pos_density$y)), main = "Coverage plot of mapped CAGE reads", xlab = "Chromosome 22", col = 'blue', lwd=2.5, type='h' ) lines(chr22_neg_density, lwd=2.5, col = 'red', type='h')
So there you have it, a simple coverage plot in R. Now that the BAM file is stored as a data frame, you can perform subsets to your liking to focus on a specific region, focus on reads with a certain mapping quality, etc. and produce all kinds of density/coverage plots. You can also display several densities together by following the same steps and adding more “lines” and by using different colours.
And GitHub Gist is super awesome.
This work is licensed under a Creative Commons
Attribution 4.0 International License.