Understanding the BAM flags

I've tried to explain the BAM flags to my colleagues and I think each time I have left them more confused. So perhaps I can do a better job of explaining BAM flags in writing. For this post, I will use this BAM file from the 1000 Genomes Project:

NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam.

The file is ~1.6G and you don't need to download it. I chose it because it has a lot of different BAM flags:

#take the second column of the BAM file
#and output all the unique entries
#the second column is the BAM flag column
samtools view NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam | cut -f2 | sort -u
0
1024
1040
1089
1097
1105
1107
1121
1123
113
1137
1145
1153
1161
1169
117
1171
1185
1187
1201
1209
121
129
133
137
145
147
16
161
163
177
181
185
65
69
73
81
83
97
99

So what do all those numbers mean? They are your BAM flags; if we consult the manual, we find this:

FLAG: bitwise FLAG. Each bit is explained in the following table:

Bit Description
0x1 template having multiple segments in sequencing
0x2 each segment properly aligned according to the aligner
0x4 segment unmapped
0x8 next segment in the template unmapped
0x10 SEQ being reverse complemented
0x20 SEQ of the next segment in the template being reversed
0x40 the first segment in the template
0x80 the last segment in the template
0x100 secondary alignment
0x200 not passing quality controls
0x400 PCR or optical duplicate
0x800 supplementary alignment

Those 0x1, 0x2, ... are hexadecimal numbers (base 16) as opposed to decimal numbers (base 10). We can represent the table in decimal numbers:

Decimal Description
1 template having multiple segments in sequencing
2 each segment properly aligned according to the aligner
4 segment unmapped
8 next segment in the template unmapped
16 SEQ being reverse complemented
32 SEQ of the next segment in the template being reversed
64 the first segment in the template
128 the last segment in the template
256 secondary alignment
512 not passing quality controls
1024 PCR or optical duplicate
2048 supplementary alignment
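
The conversion between the two tables can be checked in the shell. This is only a small sketch: it assumes a POSIX shell whose printf accepts C-style hexadecimal constants (bash and dash both should), and hex_to_dec is a helper name made up for this illustration.

```shell
# hex_to_dec is a hypothetical helper, not part of samtools.
# printf's %d conversion accepts C-style constants such as 0x10.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Print the decimal value of every bit in the flag table.
for hex in 0x1 0x2 0x4 0x8 0x10 0x20 0x40 0x80 0x100 0x200 0x400 0x800; do
    printf '%s\t%s\n' "$hex" "$(hex_to_dec "$hex")"
done
```

Each value is double the previous one, which is what lets the flags be combined and separated again.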

If we re-examine the BAM flags from our example BAM file, two of them appear directly in the table: 16 and 1024. Looking back at the table with the converted decimal numbers, 16 corresponds to "SEQ being reverse complemented", so reads with a BAM flag of 16 are mapped on the reverse strand. Reads with a BAM flag of 1024 are PCR or optical duplicates.

What about all the other numbers? They are simply sums of values from the table above; for example, the BAM flag 1040 is 1024 + 16. Reads with a BAM flag of 1040 are mapped on the reverse strand AND are PCR or optical duplicates. Because every value in the table is a different power of two, there is only one way of adding them up to reach a given flag.
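
To see the decomposition for any flag, here is a minimal sketch in plain POSIX shell. The function name flag_components and its variables are made up for this post; it simply tests each of the twelve table values against the flag.

```shell
# flag_components is a hypothetical helper written for illustration.
# It walks the twelve flag values (1, 2, 4, ..., 2048) and keeps
# the ones that are part of the flag passed as $1.
flag_components() {
    fc_flag=$1
    fc_bit=1
    fc_out=""
    while [ "$fc_bit" -le 2048 ]; do
        # A bitwise AND is non-zero only if this value is part of the flag.
        if [ $(( $fc_flag & $fc_bit )) -ne 0 ]; then
            fc_out="${fc_out:+$fc_out + }$fc_bit"
        fi
        fc_bit=$(( $fc_bit * 2 ))
    done
    echo "$fc_out"
}

flag_components 1040   # prints: 16 + 1024
flag_components 1169   # prints: 1 + 16 + 128 + 1024
```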

If we enter 1040 into this web tool, it returns "read reverse strand" and "read is PCR or optical duplicate" as we deduced above.

Now the manual refers to the FLAG as a bitwise FLAG. A bit is the basic unit of information and can have only one of two values, 1 or 0. Perhaps you've heard the joke: "There are only 10 types of people in the world: those who understand binary, and those who don't." The joke is of course that 10 in binary is the equivalent of 2 in decimal. To illustrate binary numbers, let me first show how the decimal number 1,234 is built up. We can write it as:

1,234 = (1 x 1000) + (2 x 100) + (3 x 10) + (4 x 1)

At each position we move up by a factor of 10 (decimal is base 10) from right to left; at the fourth position we have reached 1000.

2 in binary is:

10 = (1 x 2) + (0 x 1)

At each position we move up by a factor of 2 (binary is base 2) from right to left; at the second position we have reached 2.

How do we represent 1040 in binary? It is:

10000010000 = (1 x 1024) + (1 x 16), with a 0 at every other position

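Now notice that those positions correspond to the rows of the table, reading the binary digits from right to left (the 0x800 row has no digit of its own because leading zeros are not written):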

Status Bit Description
0 0x1 template having multiple segments in sequencing
0 0x2 each segment properly aligned according to the aligner
0 0x4 segment unmapped
0 0x8 next segment in the template unmapped
1 0x10 SEQ being reverse complemented
0 0x20 SEQ of the next segment in the template being reversed
0 0x40 the first segment in the template
0 0x80 the last segment in the template
0 0x100 secondary alignment
0 0x200 not passing quality controls
1 0x400 PCR or optical duplicate
0 0x800 supplementary alignment
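
The Status column can be reproduced one row at a time with a bitwise AND. The following is a sketch only; has_bit is a hypothetical helper name, and the masks are given in decimal (16 is 0x10, 1024 is 0x400, 1 is 0x1).

```shell
# has_bit is a hypothetical helper: it succeeds (exit status 0) when
# the single bit given as $2 is set in the flag given as $1.
has_bit() {
    [ $(( $1 & $2 )) -ne 0 ]
}

if has_bit 1040 16; then echo "1040: SEQ being reverse complemented"; fi
if has_bit 1040 1024; then echo "1040: PCR or optical duplicate"; fi
if has_bit 1040 1; then
    echo "1040: template having multiple segments"
else
    echo "1040: bit 0x1 is not set"
fi
```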

So if we convert the BAM flags, which are decimal values, back to binary and refer to the table, we can work out the properties of a particular read.

The command line tool bc can convert decimal numbers to binary:

#bam flag 1040
echo 'obase=2;1040' | bc
10000010000

#bam flag 1169
echo 'obase=2;1169' | bc
10010010001
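
bc drops leading zeros, so its output has eleven digits while the table has twelve rows. If bc is not available, or if you want one digit per table row, a rough alternative in plain POSIX shell could look like the sketch below; flag_to_bits is a name I made up for it.

```shell
# flag_to_bits is a hypothetical helper: it prints a flag as twelve
# binary digits, starting with the 0x800 bit and ending with the 0x1 bit.
flag_to_bits() {
    fb_bit=2048
    fb_out=""
    while [ "$fb_bit" -ge 1 ]; do
        if [ $(( $1 & $fb_bit )) -ne 0 ]; then
            fb_out="${fb_out}1"
        else
            fb_out="${fb_out}0"
        fi
        # Integer division; the loop ends when this reaches 0.
        fb_bit=$(( $fb_bit / 2 ))
    done
    echo "$fb_out"
}

flag_to_bits 1040   # prints: 010000010000
flag_to_bits 1169   # prints: 010010010001
```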

Therefore for a BAM flag of 1169 (1024 + 128 + 16 + 1):

Status Bit Description
1 0x1 template having multiple segments in sequencing
0 0x2 each segment properly aligned according to the aligner
0 0x4 segment unmapped
0 0x8 next segment in the template unmapped
1 0x10 SEQ being reverse complemented
0 0x20 SEQ of the next segment in the template being reversed
0 0x40 the first segment in the template
1 0x80 the last segment in the template
0 0x100 secondary alignment
0 0x200 not passing quality controls
1 0x400 PCR or optical duplicate
0 0x800 supplementary alignment
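
The table lookup can also be automated. What follows is a minimal sketch in POSIX shell, not the original script from this post: explain_flag is a hypothetical function name, and the descriptions are copied from the flag table.

```shell
# explain_flag is a hypothetical helper, not a samtools command.
# It prints the decimal value and description of every bit set in $1.
explain_flag() {
    ef_flag=$1
    while IFS='|' read -r ef_bit ef_desc; do
        if [ $(( $ef_flag & $ef_bit )) -ne 0 ]; then
            printf '%s\t%s\n' "$ef_bit" "$ef_desc"
        fi
    done <<'EOF'
1|template having multiple segments in sequencing
2|each segment properly aligned according to the aligner
4|segment unmapped
8|next segment in the template unmapped
16|SEQ being reverse complemented
32|SEQ of the next segment in the template being reversed
64|the first segment in the template
128|the last segment in the template
256|secondary alignment
512|not passing quality controls
1024|PCR or optical duplicate
2048|supplementary alignment
EOF
}

explain_flag 1169
```

For 1169 this prints the four rows with a status of 1 in the table above. Recent versions of samtools also ship a flags subcommand that does a similar conversion, which is probably the better choice when samtools is already installed.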

As a Perl script

Conclusions

I hope that explains how BAM flags work. One important thing to mention is that a BAM flag of 0 means that none of the properties in the table apply, so the read is unpaired and mapped to the forward strand. There is no separate bit for the forward strand; it is implied whenever the 16 bit is not set. So a BAM flag of 1024 is in fact a read that is a PCR or optical duplicate and maps to the forward strand.
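
The point about the forward strand can be sketched in a few lines of POSIX shell. read_strand is a made-up helper name, and it assumes the read is mapped (bit 0x4 not set), since strand is not meaningful otherwise.

```shell
# read_strand is a hypothetical helper. It only looks at the 16 (0x10)
# bit, so it should only be trusted for mapped reads.
read_strand() {
    if [ $(( $1 & 16 )) -ne 0 ]; then
        echo "reverse"
    else
        echo "forward"
    fi
}

for flag in 0 16 1024 1040; do
    printf '%s\t%s\n' "$flag" "$(read_strand "$flag")"
done
```

Flags 0 and 1024 come out as forward, while 16 and 1040 come out as reverse.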

See also

Web tool for explaining BAM flags

Learning binary and hexadecimal.

Bitwise operators in Perl


This work is licensed under a Creative Commons Attribution 4.0 International License.
