Bowtie and multimapping reads

Updated 2014 June 8th

I first tried this with BWA. Now I'll try it with Bowtie.

git clone https://github.com/BenLangmead/bowtie.git
cd bowtie
make

Consider this reference sequence (ref.fa), which is the 28 bp sequence "ACGTACGTACGTACGTAGGTACGTAGGG" repeated 20 times (560 bp in total):

>artificial
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG
ACGTACGTACGTACGTAGGTACGTAGGG

and this 25 bp read (read.fa):

>tag
ACGTACGTACGTACGTAGGTACGTA
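The files themselves are not shown being created here. One way to write them is the small Python sketch below, which puts each 28 bp unit on its own line as shown above. It then reads the reference back to check that its length is 560 bp, which is the LN:560 value Bowtie reports in the @SQ header.

```python
# Write the toy reference and read to ref.fa and read.fa (a sketch; any FASTA writer works).
UNIT = "ACGTACGTACGTACGTAGGTACGTAGGG"  # 28 bp repeat unit
READ = "ACGTACGTACGTACGTAGGTACGTA"     # 25 bp read

def write_fasta(path, name, lines):
    """Write a single-record FASTA file with one sequence line per list entry."""
    with open(path, "w") as fh:
        fh.write(f">{name}\n")
        for line in lines:
            fh.write(line + "\n")

def read_fasta_seq(path):
    """Return the concatenated sequence of a single-record FASTA file."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))

write_fasta("ref.fa", "artificial", [UNIT] * 20)
write_fasta("read.fa", "tag", [READ])

print(len(read_fasta_seq("ref.fa")))  # 560
```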

The next steps are to build an index and then align our read to the index:

#build index
bowtie-build ref.fa ref

#to scroll through the full usage message
#bowtie 2>&1 | less
#here is what the parameters mean
#-k <int>           report up to <int> good alignments per read (default: 1)
#-f                 query input files are (multi-)FASTA .fa/.mfa
#-S/--sam           write hits in SAM format
bowtie -k 40 -f -S ref read.fa
@HD	VN:1.0	SO:unsorted
@SQ	SN:artificial	LN:560
@PG	ID:Bowtie	VN:1.0.1	CL:"bowtie -k 40 -f -S ref read.fa"
tag	0	artificial	29	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	57	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	85	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	113	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	141	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	169	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	197	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	225	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	253	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	281	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	309	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	337	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	365	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	393	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	421	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	449	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	477	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	505	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	533	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	1	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	289	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	261	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	233	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	205	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	177	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	149	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	121	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	93	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	65	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	37	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	9	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	513	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	485	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	457	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	429	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	401	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	373	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	345	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
tag	0	artificial	317	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:2	MD:Z:9G9G5	NM:i:2
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 39 alignments to 1 output stream(s)
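To see why Bowtie finds these particular positions, the Python sketch below brute-forces every ungapped placement of the tag and of its reverse complement and counts mismatches. It assumes that, because the 25 bp read is shorter than Bowtie's default 28 bp seed, the default -n 2 policy amounts to "at most two mismatches anywhere in the read". It also ignores Bowtie's quality-sum limit (-e).

```python
# Brute-force Hamming-distance scan of the toy read against the toy reference.
UNIT = "ACGTACGTACGTACGTAGGTACGTAGGG"
REF = UNIT * 20
READ = "ACGTACGTACGTACGTAGGTACGTA"

def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(ref, read, max_mm=2):
    """Return (1-based position, strand, mismatches) for every ungapped hit with <= max_mm mismatches."""
    hits = []
    for strand, query in (("+", read), ("-", revcomp(read))):
        for start in range(len(ref) - len(query) + 1):
            window = ref[start:start + len(query)]
            mm = sum(a != b for a, b in zip(window, query))
            if mm <= max_mm:
                hits.append((start + 1, strand, mm))
    return hits

hits = scan(REF, READ)
exact = [pos for pos, _, mm in hits if mm == 0]
two_mm = [pos for pos, _, mm in hits if mm == 2]
print(len(hits), len(exact), len(two_mm))  # 39 20 19
print(exact[:3], two_mm[:3])               # [1, 29, 57] [9, 37, 65]
print({strand for _, strand, _ in hits})   # {'+'}
```

The scan finds no one-mismatch hits and nothing on the reverse strand, which agrees with the flag of 0 on every line above. The two-mismatch hits start 8 bp into each repeat, and the last possible one is at position 513. This is why there are 19 of them rather than 20.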

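Bowtie reported 39 alignments: the 20 perfect matches (XA:i:0) and 19 alignments with two mismatches (XA:i:2, MD:Z:9G9G5). To report all alignments but keep only the best ones, use -a together with --best --strata: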

bowtie -f -a -S --best --strata ref read.fa
@HD	VN:1.0	SO:unsorted
@SQ	SN:artificial	LN:560
@PG	ID:Bowtie	VN:1.0.1	CL:"bowtie -f -a -S --best --strata ref read.fa"
tag	0	artificial	505	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	477	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	449	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	421	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	393	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	365	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	337	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	309	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	281	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	253	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	225	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	197	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	169	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	141	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	113	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	85	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	57	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	29	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	1	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
tag	0	artificial	533	255	25M	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 20 alignments to 1 output stream(s)
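As I understand the Bowtie manual, in the default -n mode an alignment's stratum is its number of mismatches in the seed. Here the seed covers the whole 25 bp read, so the stratum is simply the number of mismatches. The sketch below is a simplified model of --best --strata, not Bowtie's code: keep only the hits with the fewest mismatches. The function name best_stratum is made up for illustration, and the hit list is taken from the -k 40 output.

```python
# Simplified model of --best --strata: keep only hits in the best (fewest-mismatch) stratum.
def best_stratum(hits):
    """hits: list of (position, mismatches); return those with the minimum mismatch count."""
    if not hits:
        return []
    best = min(mm for _, mm in hits)
    return [(pos, mm) for pos, mm in hits if mm == best]

# The 39 hits from the -k 40 run: 20 perfect matches and 19 with two mismatches.
hits = [(1 + 28 * k, 0) for k in range(20)] + [(9 + 28 * k, 2) for k in range(19)]
kept = best_stratum(hits)
print(len(kept))  # 20
```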

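To exclude tags that map to more than m places, use -m. Here m is 19, so our tag, which has 20 best alignments, is suppressed and reported as unmapped: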

bowtie -f -a -m 19 -S --best --strata ref read.fa
@HD	VN:1.0	SO:unsorted
@SQ	SN:artificial	LN:560
@PG	ID:Bowtie	VN:1.0.1	CL:"bowtie -f -a -m 19 -S --best --strata ref read.fa"
tag	4	*	0	0	*	*	0	0	ACGTACGTACGTACGTAGGTACGTA	IIIIIIIIIIIIIIIIIIIIIIIII	XM:i:19
# reads processed: 1
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
# reads with alignments suppressed due to -m: 1 (100.00%)
No alignments
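The -m filter can be added to the same simplified model: if more than m alignments survive, nothing is reported and the read is written as an unaligned record. This sketch assumes that -m counts only the alignments left after --best --strata. The function names are illustrative and are not Bowtie internals.

```python
# Simplified model of "-a --best --strata -m <m>" (illustrative only, not Bowtie's code).
# Assumption: -m counts the alignments left after --best --strata.

def best_stratum(hits):
    """hits: list of (position, mismatches); keep those with the fewest mismatches."""
    if not hits:
        return []
    best = min(mm for _, mm in hits)
    return [(pos, mm) for pos, mm in hits if mm == best]

def report(hits, m=None):
    """Return the hits that would be reported, or [] if the read is suppressed by -m."""
    kept = best_stratum(hits)
    if m is not None and len(kept) > m:
        return []  # suppressed: written as an unaligned record (flag 4)
    return kept

# The 39 hits from the -k 40 run: 20 perfect matches and 19 with two mismatches.
hits = [(1 + 28 * k, 0) for k in range(20)] + [(9 + 28 * k, 2) for k in range(19)]
print(len(report(hits)))        # 20
print(len(report(hits, m=19)))  # 0
print(len(report(hits, m=20)))  # 20
```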

I could not find an option in BWA for reporting all alignments (though one may exist); by default, BWA reports one random location for multimapping tags. The parameters -a -m 10 -S --best --strata do exactly what I want: report all alignments but keep only the best hits, and if a tag maps to more than 10 places, mark it as unmapped.
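When you run these settings on a real set of reads, a quick per-read tally of the SAM output shows which tags were kept and which were marked as unmapped. The plain-Python sketch below avoids third-party libraries such as pysam. It counts aligned records per read name, collects their NM:i values, and treats flag 4 as unmapped. Lines that are not 11-column alignment records are skipped, such as the @ headers or Bowtie's summary lines if stderr ends up in the same file. The helper name tally_sam is made up, and the sample records are copied from the outputs above.

```python
# Tally SAM records per read: number of aligned records and their NM:i edit distances.
from collections import defaultdict

def tally_sam(lines):
    counts = defaultdict(lambda: {"aligned": 0, "nm": []})
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # skip headers and anything that is not an 11+ column alignment record
        if line.startswith("@") or len(fields) < 11:
            continue
        entry = counts[fields[0]]
        if int(fields[1]) & 4:  # unmapped, including reads suppressed by -m
            continue
        entry["aligned"] += 1
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                entry["nm"].append(int(tag[5:]))
    return dict(counts)

SEQ, QUAL = "ACGTACGTACGTACGTAGGTACGTA", "I" * 25
aligned = "\t".join(["tag", "0", "artificial", "29", "255", "25M", "*", "0", "0",
                     SEQ, QUAL, "XA:i:0", "MD:Z:25", "NM:i:0"])
suppressed = "\t".join(["tag", "4", "*", "0", "0", "*", "*", "0", "0",
                        SEQ, QUAL, "XM:i:19"])

print(tally_sam(["@HD\tVN:1.0\tSO:unsorted", aligned, "# reads processed: 1"]))
# {'tag': {'aligned': 1, 'nm': [0]}}
print(tally_sam([suppressed]))
# {'tag': {'aligned': 0, 'nm': []}}
```

For a real run, redirect Bowtie's SAM output to a file and pass the open file handle to tally_sam. For example, use bowtie -f -a -m 10 -S --best --strata ref read.fa > hits.sam, where hits.sam is a made-up name.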

See the Bowtie manual for more information and usage examples.