How mappable is a specific repeat?

If you've ever wondered how mappable a specific repeat is, here's a quick post on creating a plot showing the mappability of a repetitive element along its consensus sequence. Specifically, the consensus sequence of a repeat was taken and sub-sequences were created by a sliding window approach (i.e. moving along the sequence) at 1 bp...
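
A minimal Python sketch of the sliding-window step. Every sub-sequence of length k is taken from the consensus at a 1 bp step and written out as FASTA, ready to be aligned back to the genome. The window length k, the toy consensus and the `<name>_<start>` record naming are my own assumptions for illustration. The excerpt doesn't say which values the post used.

```python
def sliding_windows(seq: str, k: int, step: int = 1) -> list:
    """Return (start, sub-sequence) pairs for every full window of length k."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    return [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, step)]


def to_fasta(name: str, windows: list) -> str:
    """Format windows as FASTA records named <name>_<start> (naming is hypothetical)."""
    return "".join(f">{name}_{start}\n{sub}\n" for start, sub in windows)


if __name__ == "__main__":
    consensus = "ACGTACGGTTAC"  # toy stand-in for a repeat consensus sequence
    windows = sliding_windows(consensus, k=5)  # hypothetical window length
    print(to_fasta("repeat", windows), end="")
```

Each record can then be aligned back to the genome. Presumably the plot comes from checking, for each start position along the consensus, whether that window maps uniquely.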

Mapping repeats 2

Updated 10th September 2013 to include LAST. I previously looked at mapping repeats with respect to sequencing errors in high-throughput sequencing and, as one would expect, the accuracy of the mapping decreased when sequencing errors were introduced. I then looked at aligning to unique regions of the genome to get an idea of how...
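
One simple way to simulate sequencing errors is to mutate reads at a fixed per-base substitution rate before mapping them. The sketch below assumes a uniform substitution model (no indels, no quality-dependent errors) and uses a hypothetical `introduce_errors` helper. It isn't necessarily how the errors were introduced in the original post.

```python
import random

BASES = "ACGT"


def introduce_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Replace each A/C/G/T with a different base with probability error_rate.

    Uniform substitution model (an assumption): no indels and no
    position- or quality-dependent error rates. Other characters (e.g. N)
    are left unchanged.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    mutated = []
    for base in read:
        if base in BASES and rng.random() < error_rate:
            # pick one of the three other bases uniformly at random
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    read = "ACGTACGTACGTACGTAGGTACGTA"
    print(introduce_errors(read, 0.02, rng))
```

Passing in a seeded `random.Random` keeps the simulated error sets reproducible, so different aligners can be compared on exactly the same mutated reads.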

Aligning to unique regions

Post updated on 10th September 2013 after receiving input from the author of LAST. I've been interested in aligning reads to the repetitive portion of the human genome; in this post I'll look at how well different short read alignment programs perform when aligning to unique regions of the genome. First, to find unique...
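
The excerpt cuts off before describing how the unique regions were found, so this is just one possible approach, sketched in Python. It counts every k-mer, keeps positions whose k-mer occurs exactly once, and merges overlapping windows into intervals. It only looks at the forward strand and at exact matches. That is a simplification, since real reads can map to either strand and with mismatches. `unique_regions` is a hypothetical name.

```python
from collections import Counter


def unique_regions(seq: str, k: int) -> list:
    """Return half-open intervals [start, end) covered by k-mers occurring once in seq.

    Simplifications (assumptions): forward strand only, exact matches only,
    and k-mers containing N are never counted as unique.
    """
    seq = seq.upper()
    starts = range(len(seq) - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    regions = []
    for i in starts:
        kmer = seq[i:i + k]
        if "N" in kmer or counts[kmer] != 1:
            continue
        if regions and i <= regions[-1][1]:
            # overlaps or touches the previous interval, so extend it
            regions[-1] = (regions[-1][0], i + k)
        else:
            regions.append((i, i + k))
    return regions


if __name__ == "__main__":
    print(unique_regions("AAAAACGTAAAAA", 3))  # prints [(3, 10)]
```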

ENCODE mappability and repeats

The ENCODE mappability tracks can be visualised on the UCSC Genome Browser, and they give a sense of how mappable each region of the genome is for short reads (k-mers) of a given length. On a side note, it seems some people use "mapability" and some use "mappability"; I was taught the CVC rule, so I'm...
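
As I understand it, some of the mappability/alignability tracks on the UCSC Genome Browser score each k-mer as 1 divided by the number of places it matches in the genome. A unique k-mer scores 1 and one found twice scores 0.5. Below is a toy Python version of that idea. Unlike the real tracks, it uses exact, forward-strand matches only, and `kmer_mappability` is a hypothetical helper.

```python
from collections import Counter


def kmer_mappability(seq: str, k: int) -> list:
    """Score each k-mer start position as 1 / (occurrences of that k-mer in seq).

    A unique k-mer scores 1.0, a k-mer found twice scores 0.5, and so on.
    Exact, forward-strand matches only (a simplifying assumption).
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    print(kmer_mappability("ACGTACGT", 4))  # prints [0.5, 1.0, 1.0, 1.0, 0.5]
```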

Mapping repeats

Most eukaryotic genomes are interspersed with repetitive elements, and some of these elements are transcriptionally active, so they show up when we sequence the RNA population. Judging by recent trends, some of these elements seem to be important. One strategy for analysing these repeats is to map them to the genome, to see where they...

Mapping long reads with Bowtie

Just a simple test to see whether Bowtie can map long reads. Why? Well, because Bowtie is fast, so I want to see whether I can also use it as a general-purpose aligner. In a previous post I was characterising the mapability of the genome; from this I selected a portion of the genome...

Genome mapability

I know of the genome mapability and uniqueness tracks provided by the UCSC Genome Browser, but I was interested in doing this myself for the hg19 genome. As a test, I investigated chr22, where the base composition is broken down as:

Length of chr22 = 51,304,566
A: 9,094,775
C: 8,375,984
G: 8,369,235
T: 9,054,551...
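
A quick sanity check on those numbers: A + C + G + T = 9,094,775 + 8,375,984 + 8,369,235 + 9,054,551 = 34,894,545. That leaves 16,410,021 of the 51,304,566 positions unaccounted for, presumably N (gap) bases. Here's a Python sketch of how such a breakdown could be computed. The sequence is upper-cased first because UCSC FASTA files are soft-masked (repeats in lower case). The helper names are hypothetical.

```python
from collections import Counter


def read_fasta_sequence(lines) -> str:
    """Concatenate the sequence lines of a single-record FASTA, skipping the header."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def base_composition(seq: str) -> dict:
    """Count A, C, G and T (case-insensitively) plus every other character, e.g. N."""
    counts = Counter(seq.upper())  # upper-case so soft-masked bases are counted too
    comp = {base: counts.get(base, 0) for base in "ACGT"}
    comp["other"] = sum(n for b, n in counts.items() if b not in "ACGT")
    comp["length"] = len(seq)
    return comp


if __name__ == "__main__":
    toy_fasta = [">toy\n", "ACgtNN\n", "acGT\n"]  # toy stand-in for chr22.fa
    print(base_composition(read_fasta_sequence(toy_fasta)))
```

For a real chromosome, an open file handle can be passed straight in, e.g. `base_composition(read_fasta_sequence(fh))` with `fh` opened on a (hypothetical) `chr22.fa`. Reading all of chr22 into one string takes tens of megabytes of memory, which is fine for a single chromosome.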

How to deal with multi mapping reads

Eukaryotic genomes are repetitive in nature, i.e. much of the sequence content is not unique. When mapping high-throughput sequencing reads back to the genome, whether for de novo assembly or for RNA sequencing, a subset of reads will map to more than one location. Some people refer to these multi-mapping reads as multi-reads....
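
To see how many reads are multi-reads, the alignments can be grouped by read name. Here's a minimal Python sketch over SAM text. It assumes single-end reads and an aligner that was asked to report more than one alignment per read. The helper names are hypothetical, and it only uses the first four SAM columns (QNAME, FLAG, RNAME, POS) plus the unmapped (0x4) and reverse-strand (0x10) FLAG bits.

```python
from collections import defaultdict


def hits_per_read(sam_lines) -> dict:
    """Map each read name to its distinct (reference, position, strand) hits in SAM text."""
    hits = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # FLAG bit 0x4 marks an unmapped read
        strand = "-" if flag & 0x10 else "+"  # bit 0x10: read is reverse complemented
        hits[qname].add((rname, pos, strand))
    return dict(hits)


def split_unique_multi(hits: dict) -> tuple:
    """Split read names into uniquely mapped reads and multi-reads."""
    unique = {name for name, locs in hits.items() if len(locs) == 1}
    multi = {name for name, locs in hits.items() if len(locs) > 1}
    return unique, multi
```

Bowtie reports one alignment per read by default, so extra hits have to be requested (for example with its `-k` or `-a` options) before a count like this is meaningful.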

Using blat

My multipurpose sequence alignment tool of choice for many years has been blat. This is a short post on the basics of blat. To use blat, download the 64-bit Linux version of blat (or a version that matches your operating system) here. When aligning sequences to the genome, make sure you use the 64-bit...

Bowtie and multimapping reads

Updated 8th June 2014. I first tried this with BWA; now I'll try it with Bowtie. Consider this reference sequence, which is the sequence "ACGTACGTACGTACGTAGGTACGTAGGG" repeated 20 times:

>artificial
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG

and this read:

>tag
ACGTACGTACGTACGTAGGTACGTA

The...
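
Before running an aligner, it helps to know the true answer. This small Python sketch rebuilds the same artificial reference and lists every exact, forward-strand match of the tag. The tag only matches at the start of each of the 20 copies (positions 0, 28, ..., 532), so on sequence alone an aligner has no reason to prefer one location over another. `find_all` is a hypothetical helper, not part of Bowtie or BWA.

```python
def find_all(reference: str, read: str) -> list:
    """Return every 0-based start of read in reference, including overlapping hits."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


REPEAT_UNIT = "ACGTACGTACGTACGTAGGTACGTAGGG"  # 28 bp unit from the post
reference = REPEAT_UNIT * 20                   # the ">artificial" sequence (560 bp)
tag = "ACGTACGTACGTACGTAGGTACGTA"              # the ">tag" read (25 bp)
positions = find_all(reference, tag)

if __name__ == "__main__":
    print(len(positions), positions[:3])  # prints: 20 [0, 28, 56]
```

Searching from `start + 1` rather than from the end of the previous hit means overlapping matches are not skipped. That matters for short tandem repeats like this one, even though here the hits happen to be 28 bp apart.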