I've tried to explain the BAM flags to my colleagues and I think each time I have left them more confused. So perhaps I can do a better job of explaining BAM flags in writing. For this post, I will use this BAM file from the 1000 Genomes Project:
NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam.
The file is about 1.6 GB and you don't need to download it. I chose it because it has a lot of different BAM flags:
```bash
# take the second column of the BAM file (the FLAG column)
# and output all the unique entries
samtools view NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam | cut -f2 | sort -u
```
The command prints one value per line. They are shown on a single line here; note that `sort -u` compares the values as text, which is why 113 is listed after 1123:
0 1024 1040 1089 1097 1105 1107 1121 1123 113 1137 1145 1153 1161 1169 117 1171 1185 1187 1201 1209 121 129 133 137 145 147 16 161 163 177 181 185 65 69 73 81 83 97 99
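To see how many reads carry each flag, rather than only which flags occur, the pipeline can end in a tally. Below is a minimal bash sketch; the function name `count_flags` is my own, and it assumes SAM text arriving on stdin (for example from `samtools view`), skipping any header lines that start with `@`.

```bash
#!/usr/bin/env bash
# count_flags (hypothetical helper): tally reads per FLAG value.
# Assumes SAM text on stdin; column 2 is the FLAG, '@' lines are headers.
count_flags() {
  awk -F'\t' '!/^@/ { n[$2]++ } END { for (f in n) print f "\t" n[f] }' |
    sort -n -k1,1   # numeric order, unlike the text order of sort -u
}

# made-up records for illustration; only the first two columns matter here
printf 'r1\t1040\nr2\t16\nr3\t1040\n' | count_flags
```

On the example file this would be run as `samtools view NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam | count_flags`.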
So what do all those numbers mean? They are your BAM flags; if we consult the manual (the SAM format specification), we find this:
FLAG: bitwise FLAG. Each bit is explained in the following table:
Bit | Description |
0x1 | template having multiple segments in sequencing |
0x2 | each segment properly aligned according to the aligner |
0x4 | segment unmapped |
0x8 | next segment in the template unmapped |
0x10 | SEQ being reverse complemented |
0x20 | SEQ of the next segment in the template being reversed |
0x40 | the first segment in the template |
0x80 | the last segment in the template |
0x100 | secondary alignment |
0x200 | not passing quality controls |
0x400 | PCR or optical duplicate |
0x800 | supplementary alignment |
Those values 0x1, 0x2, ... are hexadecimal (base 16) numbers, marked by the 0x prefix, as opposed to decimal (base 10) numbers. Here is the same table with the values in decimal:
Decimal | Description |
1 | template having multiple segments in sequencing |
2 | each segment properly aligned according to the aligner |
4 | segment unmapped |
8 | next segment in the template unmapped |
16 | SEQ being reverse complemented |
32 | SEQ of the next segment in the template being reversed |
64 | the first segment in the template |
128 | the last segment in the template |
256 | secondary alignment |
512 | not passing quality controls |
1024 | PCR or optical duplicate |
2048 | supplementary alignment |
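The two tables are related by powers of two: each row's value is 1 shifted left by one more place than the row above. A small bash sketch (the function name `flag_bit_values` is hypothetical) that prints both forms side by side:

```bash
#!/usr/bin/env bash
# flag_bit_values (hypothetical helper): bit i of the FLAG has the value 1 << i;
# print each of the 12 values in hexadecimal and in decimal.
flag_bit_values() {
  local i
  for (( i = 0; i < 12; i++ )); do
    printf '0x%x\t%d\n' $(( 1 << i )) $(( 1 << i ))
  done
}

flag_bit_values
# bash arithmetic also reads hexadecimal directly
echo $(( 0x400 ))   # 1024
```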
If we re-examine the BAM flags from our example BAM file, we can spot two of those numbers: 16 and 1024. Looking back at the table of decimal values, 16 corresponds to "SEQ being reverse complemented", so reads with a BAM flag of 16 are mapped to the reverse strand. Reads with a BAM flag of 1024 are PCR or optical duplicates.
What about all the other numbers? They are simply sums of values from the table; for example, the BAM flag 1040 is 1024 + 16, so reads with a BAM flag of 1040 are mapped to the reverse strand AND are PCR or optical duplicates. Because every value in the table is a distinct power of two (each is double the one before it), there is only one way of adding them up to reach a given flag.
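Because each table value is a single power of two, a flag can be split back into its parts by testing each bit with a bitwise AND. A bash sketch, using a hypothetical helper named `flag_parts`:

```bash
#!/usr/bin/env bash
# flag_parts (hypothetical helper): print the table values that add up to FLAG.
# Checks bits 0..11; a bit is set when FLAG AND (1 << i) is non-zero.
flag_parts() {
  local flag=$1 i
  for (( i = 0; i < 12; i++ )); do
    if (( flag & (1 << i) )); then
      echo $(( 1 << i ))
    fi
  done
}

flag_parts 1040   # prints 16, then 1024
```

Adding the printed values back together returns the original flag; for 1169 they are 1, 16, 128 and 1024.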
If we enter 1040 into the web tool for explaining BAM flags (listed under See also), it returns "read reverse strand" and "read is PCR or optical duplicate", as we deduced above.
Now, the manual refers to the FLAG as a bitwise FLAG. A bit is the basic unit of information and can take only one of two values, 0 or 1. Perhaps you've heard the joke: "There are only 10 types of people in the world: those who understand binary, and those who don't." The joke, of course, is that 10 in binary is the equivalent of 2 in decimal. To illustrate binary numbers, let me first show how we can represent 1,234 in decimal. We can write it as:
1,234 = (1 × 1000) + (2 × 100) + (3 × 10) + (4 × 1)
At each position we move up by a factor of 10 (decimal is base 10) from right to left; at the fourth position we have reached 1000.
2 in binary is 10, which we can write as:
(1 × 2) + (0 × 1) = 2
At each position we move up by a factor of 2 (binary is base 2) from right to left; at the second position we have reached 2.
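How do we represent 1040 in binary? It is:
10000010000
Reading from right to left, the 1s sit at the fifth position (16) and the eleventh position (1024), and 16 + 1024 = 1040.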
Now notice that, reading the binary digits from right to left, each position corresponds to one row of the table, read from top to bottom:
Status | Bit | Description |
0 | 0x1 | template having multiple segments in sequencing |
0 | 0x2 | each segment properly aligned according to the aligner |
0 | 0x4 | segment unmapped |
0 | 0x8 | next segment in the template unmapped |
1 | 0x10 | SEQ being reverse complemented |
0 | 0x20 | SEQ of the next segment in the template being reversed |
0 | 0x40 | the first segment in the template |
0 | 0x80 | the last segment in the template |
0 | 0x100 | secondary alignment |
0 | 0x200 | not passing quality controls |
1 | 0x400 | PCR or optical duplicate |
0 | 0x800 | supplementary alignment |
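The Status column is just the binary digits of the flag read from right to left. One way to produce it, sketched in bash with a hypothetical `flag_status` function, is to shift the flag right by i places and keep the lowest bit:

```bash
#!/usr/bin/env bash
# flag_status (hypothetical helper): print "Status | Bit" for bits 0..11,
# where Status is (FLAG >> i) AND 1, i.e. the i-th binary digit from the right.
flag_status() {
  local flag=$1 i
  for (( i = 0; i < 12; i++ )); do
    printf '%d | 0x%x\n' $(( (flag >> i) & 1 )) $(( 1 << i ))
  done
}

flag_status 1040   # only the 0x10 and 0x400 rows have Status 1
```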
So if we convert a BAM flag from decimal to binary and refer to the table, we can work out the properties of a particular read.
The command line tool bc can convert decimal numbers to binary:
```bash
# bam flag 1040
echo 'obase=2;1040' | bc
10000010000
# bam flag 1169
echo 'obase=2;1169' | bc
10010010001
```
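If bc is not installed, bash arithmetic can do the same conversion by repeated division by 2: the remainders, read in reverse order, are the binary digits. A sketch with a hypothetical `to_binary` function:

```bash
#!/usr/bin/env bash
# to_binary (hypothetical helper): convert a non-negative decimal integer to binary.
# Each remainder of n % 2 is the next digit, prepended so the result reads left to right.
to_binary() {
  local n=$1 bits=""
  if (( n == 0 )); then
    echo 0
    return
  fi
  while (( n > 0 )); do
    bits="$(( n % 2 ))$bits"
    n=$(( n / 2 ))
  done
  echo "$bits"
}

to_binary 1040   # 10000010000, same as bc
to_binary 1169   # 10010010001
```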
Therefore, for a BAM flag of 1169 (1024 + 128 + 16 + 1):
Status | Bit | Description |
1 | 0x1 | template having multiple segments in sequencing |
0 | 0x2 | each segment properly aligned according to the aligner |
0 | 0x4 | segment unmapped |
0 | 0x8 | next segment in the template unmapped |
1 | 0x10 | SEQ being reverse complemented |
0 | 0x20 | SEQ of the next segment in the template being reversed |
0 | 0x40 | the first segment in the template |
1 | 0x80 | the last segment in the template |
0 | 0x100 | secondary alignment |
0 | 0x200 | not passing quality controls |
1 | 0x400 | PCR or optical duplicate |
0 | 0x800 | supplementary alignment |
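In practice you often want to ask whether a read has certain properties rather than decode the whole flag. Two hypothetical bash helpers, `has_all` and `has_none`, sketch that test: AND the flag with a mask of the bits of interest. As far as I know, this is the same logic behind `samtools view -f` (keep reads with all of the given bits set) and `-F` (drop reads with any of them set).

```bash
#!/usr/bin/env bash
# has_all FLAG MASK (hypothetical): succeeds if every bit in MASK is also set in FLAG
has_all() { (( ($1 & $2) == $2 )); }

# has_none FLAG MASK (hypothetical): succeeds if no bit in MASK is set in FLAG
has_none() { (( ($1 & $2) == 0 )); }

# 1169 = 1024 + 128 + 16 + 1
if has_all 1169 $(( 0x400 | 0x80 )); then
  echo "1169: duplicate and last segment"
fi
if has_none 1169 0x4; then
  echo "1169: mapped"
fi
```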
As a Perl script
```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $usage = "Usage: $0 <bam_flag>\n";

# a flag of 0 is valid, so check definedness rather than truth
my $flag = shift;
die $usage unless defined $flag;
die "Please enter a numerical value\n" unless $flag =~ /^\d+$/;

# test each bit with a bitwise AND (a flag of 0 sets no bits, so nothing is printed)
if ($flag & 0x1){
  print "template having multiple segments in sequencing\n";
}
if ($flag & 0x2){
  print "each segment properly aligned according to the aligner\n";
}
if ($flag & 0x4){
  print "segment unmapped\n";
}
if ($flag & 0x8){
  print "next segment in the template unmapped\n";
}
if ($flag & 0x10){
  print "SEQ being reverse complemented\n";
}
if ($flag & 0x20){
  print "SEQ of the next segment in the template being reversed\n";
}
if ($flag & 0x40){
  print "the first segment in the template\n";
}
if ($flag & 0x80){
  print "the last segment in the template\n";
}
if ($flag & 0x100){
  print "secondary alignment\n";
}
if ($flag & 0x200){
  print "not passing quality controls\n";
}
if ($flag & 0x400){
  print "PCR or optical duplicate\n";
}
if ($flag & 0x800){
  print "supplementary alignment\n";
}
exit(0);

__END__
```
Conclusions
I hope that explains how BAM flags work. One important thing to mention is that a mapped read on the forward (positive) strand with NO other property has a BAM flag of 0. Strand is recorded only by 0x10, so any mapped read without that bit set is on the forward strand; a BAM flag of 1024 is therefore a PCR or optical duplicate that maps to the forward strand.
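The strand logic can be written down directly. A bash sketch with a hypothetical `strand_of` function; it assumes that a read with 0x4 set should be reported as unmapped regardless of its other bits, and otherwise lets 0x10 decide:

```bash
#!/usr/bin/env bash
# strand_of (hypothetical helper): classify a read from its FLAG alone.
# Assumption: 0x4 (unmapped) takes precedence over 0x10 (reverse strand).
strand_of() {
  local flag=$1
  if (( flag & 0x4 )); then
    echo unmapped
  elif (( flag & 0x10 )); then
    echo reverse
  else
    echo forward
  fi
}

strand_of 0      # forward
strand_of 1024   # forward (a duplicate on the forward strand)
strand_of 1040   # reverse
```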
See also
Web tool for explaining BAM flags

This work is licensed under a Creative Commons Attribution 4.0 International License.
Why skip 0x5, 0x6, 0x7… in the Hexadecimal column? To my knowledge, the hexadecimal numbers can be expressed as 0x1 to 0xf (http://icarus.cs.weber.edu/~dab/cs1410/textbook/2.Core/bitops.html). Thanks!
Hi! We are not “skipping” hexadecimal numbers; the hexadecimal numbers are used to represent binary numbers. Each position of a binary number holds specific information about a read, and the table in the post shows the information conferred by each single binary position. 0x5 is 0101 in binary, which sets two bits (0x4 and 0x1), so it is a combination of two rows rather than a row of its own.