Just a simple test to see if Bowtie can map long reads. Why? Well because Bowtie is fast, so I want to see if I can also use it as a general purpose aligner. In a previous post I was characterising the mapability of the genome. From this I selected a portion of the genome that is unique, chr1:19472012-19482647 (10,635 bp). Then I made a bed file spanning different lengths of this portion:
chr1 19472013 19472043
chr1 19472013 19472113
chr1 19472013 19472213
chr1 19472013 19472513
chr1 19472013 19473013
chr1 19472013 19474013
chr1 19472013 19477013
chr1 19472013 19482013
Then using fastaFromBed:
fastaFromBed -fi /analysisdata/genomes/hg19_male.fa -bed unique.bed -fo unique.fa
So there are 8 sequences, with lengths 30, 100, 200, 500, 1,000, 2,000, 5,000 and 10,000.
First let's try to map these using blat.
blat -t=dna -q=dna -minIdentity=100 hg19.2bit unique.fa unique.psl Loaded 3137161264 letters in 93 sequences Searched 18830 bases in 8 sequences cat unique.psl psLayout version 3 match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts match match count bases count bases name size start end name size start end count --------------------------------------------------------------------------------------------------------------------------------------------------------------- 100 0 0 0 0 0 0 0 + chr1:19472013-19472113 100 0 100 chr1 249250621 19472013 19472113 1 100, 0, 19472013, 200 0 0 0 0 0 0 0 + chr1:19472013-19472213 200 0 200 chr1 249250621 19472013 19472213 1 200, 0, 19472013, 500 0 0 0 0 0 0 0 + chr1:19472013-19472513 500 0 500 chr1 249250621 19472013 19472513 1 500, 0, 19472013, 1000 0 0 0 0 0 0 0 + chr1:19472013-19473013 1000 0 1000 chr1 249250621 19472013 19473013 1 1000, 0, 19472013, 2000 0 0 0 0 0 0 0 + chr1:19472013-19474013 2000 0 2000 chr1 249250621 19472013 19474013 1 2000, 0, 19472013, 5000 0 0 0 0 0 0 0 + chr1:19472013-19477013 5000 0 5000 chr1 249250621 19472013 19477013 1 5000, 0, 19472013, 10000 0 0 0 0 0 0 0 + chr1:19472013-19482013 10000 0 10000 chr1 249250621 19472013 19482013 1 10000, 0, 19472013,
Now let's see if Bowtie can map these "long reads":
bowtie2-build hg19.fa hg19 bowtie2 -x hg19 -f -U unique.fa -S unique.sam 8 reads; of these: 8 (100.00%) were unpaired; of these: 0 (0.00%) aligned 0 times 7 (87.50%) aligned exactly 1 time 1 (12.50%) aligned >1 times 100.00% overall alignment rate cat unique.sam | grep -v "^@" | cut -f1-4 chr1:19472013-19472043 0 chr3 54652417 chr1:19472013-19472113 0 chr1 19472014 chr1:19472013-19472213 0 chr1 19472014 chr1:19472013-19472513 0 chr1 19472014 chr1:19472013-19473013 0 chr1 19472014 chr1:19472013-19474013 0 chr1 19472014 chr1:19472013-19477013 0 chr1 19472014 chr1:19472013-19482013 0 chr1 19472014
The 30mer could not be mapped by blat but was mapped by Bowtie but to another place. It looks like it may be possible to map long reads, up to 10,000 bp in length with Bowtie, at least with these trivial examples.
Things to keep in mind when using Bowtie2
Taken from the Bowtie2 manual.
For an alignment to be considered "valid" (i.e. "good enough") by Bowtie 2, it must have an alignment score no less than the minimum score threshold. The threshold is configurable and is expressed as a function of the read length. In end-to-end alignment mode (a global alignment), the default minimum score threhsold is -0.6 + -0.6 * L, where L is the read length. For example, for a 30 bp read the minimum score threshold would be -0.6 + (-0.6 * 30) = -18.6.
A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). Thus for a 30 bp read, an alignment is still considered valid with 3 mismatches (-18) and one gap + one mismatch (-14). I'm not sure what the threshold is for a high-quality position, however when aligning without base-calling qualities scores, every mismatch receives a penalty of -6.
The alignment scores are displayed in the SAM output as the AS field.
AS:i:
Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read.
XS:i:
Alignment score for second-best alignment. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read.
Other ways to increase sensitivity is to alter the seed length of reads (-L), the interval between extracted seeds (-i), and the number of mismatches permitted per seed (-N). See also: -D, which puts an upper limit on the number of dynamic programming problems (i.e. seed extensions) that can "fail" in a row before Bowtie 2 stops searching and see also: -R, which sets the maximum number of times Bowtie 2 will "re-seed" when attempting to align a read with repetitive seeds.

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Hi, Thank you for the report. May I ask what were the parameters you have used for long reads in Bowtie? Thanks again.
Hi Ali,
Bowtie is quite old and its use is discouraged. Please consider a more up-to-date tool like HISAT2 or Minimap2, which work quite well in general.
Cheers,
Dave