An example RNA-seq count table

I have been using pnas_expression.txt as a test dataset for count table analyses for many years. It was created by Davis McCarthy and was hosted on their Google Sites website. After some time, the site became unavailable and I have been hosting it on my web server since then. The RNA-seq libraries were generated using…


Finding junctions with TopHat

For setting up TopHat see my previous post. Here, I wanted to test whether TopHat can find junctions with single end 27bp reads. The reference sequence I used was the test_ref.fa provided by the TopHat authors (see my previous post for the link), where the A’s mark the intron regions:

>test_chromosome
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ACTACTATCTGACTAGACTGGAGGCGCTTGCGACTGAGCTAGGACGTGCC
ACTACGGGGATGACGACTAGGACTACGGACGGACTTAGAGCGTCAGATGC
AGCGACTGGACTATTTAGGACGATCGGACTGAGGAGGGCAGTAGGACGCT…


Annotating RNA-Seq data

After mapping your reads from an RNA-Seq experiment, the next task is usually to identify the transcripts that the reads came from (i.e. annotating RNA-Seq data), and there are many ways of doing so. Here I describe a rather crude method whereby I download sequence coordinates of hg19 RefSeqs as a BED12 file from the…


Getting started with TopHat

Updated links for the binaries on 2015 March 2nd.

TopHat is a tool that can find splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome (using Bowtie/2), TopHat identifies potential exons, since many RNA-Seq reads will contiguously align to the genome. Using this initial mapping information, TopHat builds a database…
