Visualising RNA-Seq like data

So you’ve aligned your reads from an RNA-Seq or RNA-Seq like experiment to the reference genome and have produced a BAM file. You could visualise your BAM file directly by using IGV. This is fine for looking at individual libraries, when looking at several large libraries, this may become an issue. A common strategy is to create coverage plots and here are a few ways.

I will refer to this hg19_chrom_info.txt file several times below. Use the fetchChromSizes executable available at http://hgdownload.cse.ucsc.edu/admin/exe/ to create the chrom.sizes file for the UCSC database you are working with (e.g. hg19).

1. Use the tool genomeCoverageBed from the BEDTools suite to create a coverage plot in BedGraph format.

genomeCoverageBed -bg -ibam file.bam -g hg19_chrom_info.txt

This creates a file where each line shows how many reads (column 4) are mapping to the defined region in columns 1, 2 and 3:

__snipped__
chr1 117499752 117499754 1
chr1 117499754 117499760 2
chr1 117499760 117499774 3
__snipped__

2. You can use IGVTools to create a coverage plot as a tiled data file (TDF) for use with IGV

igvtools count file.bam file.cov.tdf hg19.genome

3. If you use the UCSC Genome Browser, and have a extremely large dataset, you could convert your BedGraph file into a bigWig file. The bedGraphToBigWig executable is available here, which can directly convert your BedGraph file into a BigWig file.

bedGraphToBigWig file.bg hg19_chrom_info.txt file.bw

Some known issues

Recently I found that if you have two reads that map at the exact position but one on the positive strand and the other on the negative strand, the track is not displayed properly. For example, paste the following into the browser

track type=bedGraph name=”BedGraph Format” description=”BedGraph format” visibility=full color=200,100,0 altColor=0,100,200 priority=20
chr1 59302000 59302300 -1.0
chr1 59302300 59302600 -0.75
chr1 59302600 59302900 -0.50
chr1 59302900 59303200 -0.25
chr1 59302000 59302300 1.0
chr1 59302300 59302600 0.75
chr1 59302600 59302900 0.50
chr1 59302900 59303200 0.25

And the track will look as so:

The coverage plot is missing on the negative strand. An alternative is to create two separate tracks but sometimes I already have enough tracks loaded (which is another problem, how do you look at 20 tracks all at once?) and it is easier to look at one single track.

Using IGV and converting my BAM files into the TDF format created a coverage plot by summing the information on both strands. I found this thread asking for this feature to be implemented, but I couldn’t find the feature yet.

The best solution at the moment is just to split up your tracks into two i.e. positive and negative stranded tracks.

Update: The solution for showing negative and positive coverage plots on the same track is to use overlapping tracks.

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.