Mapping long reads with Bowtie

Just a simple test to see if Bowtie can map long reads. Why? Well because Bowtie is fast, so I want to see if I can also use it as a general purpose aligner. In a previous post I was characterising the mapability of the genome. From this I selected a portion of the genome that is unique, chr1:19472012-19482647 (10,635 bp). Then I made a bed file spanning different lengths of this portion:

chr1 19472013 19472043
chr1 19472013 19472113
chr1 19472013 19472213
chr1 19472013 19472513
chr1 19472013 19473013
chr1 19472013 19474013
chr1 19472013 19477013
chr1 19472013 19482013

Then using fastaFromBed:

fastaFromBed -fi /analysisdata/genomes/hg19_male.fa -bed unique.bed -fo unique.fa

So there are 8 sequences, with lengths 30, 100, 200, 500, 1,000, 2,000, 5,000 and 10,000.

First let's try to map these using blat.

blat -t=dna -q=dna -minIdentity=100 hg19.2bit unique.fa unique.psl
Loaded 3137161264 letters in 93 sequences
Searched 18830 bases in 8 sequences
cat unique.psl
psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap strand    Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
100     0       0       0       0       0       0       0     + chr1:19472013-19472113  100     0       100     chr1    249250621       19472013        19472113        1       100,    0,  19472013,
200     0       0       0       0       0       0       0     + chr1:19472013-19472213  200     0       200     chr1    249250621       19472013        19472213        1       200,    0,  19472013,
500     0       0       0       0       0       0       0     + chr1:19472013-19472513  500     0       500     chr1    249250621       19472013        19472513        1       500,    0,  19472013,
1000    0       0       0       0       0       0       0     + chr1:19472013-19473013  1000    0       1000    chr1    249250621       19472013        19473013        1       1000,   0,  19472013,
2000    0       0       0       0       0       0       0     + chr1:19472013-19474013  2000    0       2000    chr1    249250621       19472013        19474013        1       2000,   0,  19472013,
5000    0       0       0       0       0       0       0     + chr1:19472013-19477013  5000    0       5000    chr1    249250621       19472013        19477013        1       5000,   0,  19472013,
10000   0       0       0       0       0       0       0     + chr1:19472013-19482013  10000   0       10000   chr1    249250621       19472013        19482013        1       10000,  0,  19472013,

Now let's see if Bowtie can map these "long reads":

bowtie2-build hg19.fa hg19
bowtie2 -x hg19 -f -U unique.fa -S unique.sam
8 reads; of these:
  8 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    7 (87.50%) aligned exactly 1 time
    1 (12.50%) aligned >1 times
100.00% overall alignment rate
cat unique.sam | grep -v "^@" | cut -f1-4
chr1:19472013-19472043  0       chr3    54652417
chr1:19472013-19472113  0       chr1    19472014
chr1:19472013-19472213  0       chr1    19472014
chr1:19472013-19472513  0       chr1    19472014
chr1:19472013-19473013  0       chr1    19472014
chr1:19472013-19474013  0       chr1    19472014
chr1:19472013-19477013  0       chr1    19472014
chr1:19472013-19482013  0       chr1    19472014

The 30mer could not be mapped by blat but was mapped by Bowtie but to another place. It looks like it may be possible to map long reads, up to 10,000 bp in length with Bowtie, at least with these trivial examples.

Things to keep in mind when using Bowtie2

Taken from the Bowtie2 manual.

For an alignment to be considered "valid" (i.e. "good enough") by Bowtie 2, it must have an alignment score no less than the minimum score threshold. The threshold is configurable and is expressed as a function of the read length. In end-to-end alignment mode (a global alignment), the default minimum score threhsold is -0.6 + -0.6 * L, where L is the read length. For example, for a 30 bp read the minimum score threshold would be -0.6 + (-0.6 * 30) = -18.6.

A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). Thus for a 30 bp read, an alignment is still considered valid with 3 mismatches (-18) and one gap + one mismatch (-14). I'm not sure what the threshold is for a high-quality position, however when aligning without base-calling qualities scores, every mismatch receives a penalty of -6.

The alignment scores are displayed in the SAM output as the AS field.

AS:i:
Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read.

XS:i:
Alignment score for second-best alignment. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read.

Other ways to increase sensitivity is to alter the seed length of reads (-L), the interval between extracted seeds (-i), and the number of mismatches permitted per seed (-N). See also: -D, which puts an upper limit on the number of dynamic programming problems (i.e. seed extensions) that can "fail" in a row before Bowtie 2 stops searching and see also: -R, which sets the maximum number of times Bowtie 2 will "re-seed" when attempting to align a read with repetitive seeds.




Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
2 comments Add yours
  1. Hi, Thank you for the report. May I ask what were the parameters you have used for long reads in Bowtie? Thanks again.

    1. Hi Ali,

      Bowtie is quite old and its use is discouraged. Please consider a more up-to-date tool like HISAT2 or Minimap2, which work quite well in general.

      Cheers,
      Dave

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.