Updated 2014 June 25th
The tool intersectBed is part of the BEDTools suite of tools and performs an intersection between two BED files. For example, given two BED files, you may be interested in finding the entries that overlap. To install the latest version of BEDTools, download the source code from GitHub and compile:
git clone https://github.com/arq5x/bedtools2.git cd bedtools2 make clean; make all bin/bedtools --version #bedtools v2.20.1-4-gb877b35
BEDTools usage examples are available in the official documentation.
Getting started
When I first used BEDTools in 2010, intersectBed was a separate program that was part of the BEDTools suite. Since then, BEDTools is released as one main program called bedtools. If we look inside intersectBed, it is actually just a shell script calling bedtools:
cat intersectBed
#!/bin/sh
${0%/*}/bedtools intersect "$@"
The $@ symbol stores all the input arguments supplied, for example:
cat blah.sh #!/bin/sh echo "$@" blah.sh -a a.bed -b b.bed -a a.bed -b b.bed
The ${0%/*} refers to $0 using parameter expansion
cat blah.sh
#!/bin/sh
echo ${0} "$@"
blah.sh -a a.bed -b b.bed
./blah.sh -a a.bed -b b.bed
and the %/* removes everything before the /
cat blah.sh
#!/bin/sh
echo ${0%/*} "$@"
blah.sh -a a.bed -b b.bed
. -a a.bed -b b.bed
So in the end:
cat blah.sh
#!/bin/sh
echo ${0%/*}/bedtools intersect "$@"
blah.sh -a a.bed -b b.bed
./bedtools intersect -a a.bed -b b.bed
I'm used to calling intersectBed, it's exactly the same as calling bedtools intersect. When we call intersectBed without any parameters, we get the usage on STDERR. We can redirect STDERR to STDOUT, so that we can use a terminal pager:
intersectBed 2>&1 | less
Here is the entire usage:
Tool: bedtools intersect (aka intersectBed) Version: v2.20.1-4-gb877b35 Summary: Report overlaps between two feature files. Usage: bedtools intersect [OPTIONS] -a-b Options: -abam The A input file is in BAM format. Output will be BAM as well. -ubam Write uncompressed BAM output. Default writes compressed BAM. -bed When using BAM input (-abam), write output as BED. The default is to write output in BAM when using -abam. -wa Write the original entry in A for each overlap. -wb Write the original entry in B for each overlap. - Useful for knowing _what_ A overlaps. Restricted by -f and -r. -loj Perform a "left outer join". That is, for each feature in A report each overlap with B. If no overlaps are found, report a NULL feature for B. -wo Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlaps restricted by -f and -r. Only A features with overlap are reported. -wao Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlapping features restricted by -f and -r. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. -u Write the original A entry _once_ if _any_ overlaps found in B. - In other words, just report the fact >=1 hit was found. - Overlaps restricted by -f and -r. -c For each entry in A, report the number of overlaps with B. - Reports 0 for A entries that have no overlap with B. - Overlaps restricted by -f and -r. -v Only report those entries in A that have _no overlaps_ with B. - Similar to "grep -v" (an homage). -f Minimum overlap required as a fraction of A. - Default is 1E-9 (i.e., 1bp). - FLOAT (e.g. 0.50) -r Require that the fraction overlap be reciprocal for A and B. - In other words, if -f is 0.90 and -r is used, this requires that B overlap 90% of A and A _also_ overlaps 90% of B. -s Require same strandedness. That is, only report hits in B that overlap A on the _same_ strand. - By default, overlaps are reported without respect to strand. -S Require different strandedness. That is, only report hits in B that overlap A on the _opposite_ strand. - By default, overlaps are reported without respect to strand. -split Treat "split" BAM or BED12 entries as distinct BED intervals. -sorted Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input. -g Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option. -header Print the header from the A file prior to results. -nobuf Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time. Notes: (1) When a BAM file is used for the A file, the alignment is retained if overlaps exist, and exlcuded if an overlap cannot be found. If multiple overlaps exist, they are not reported, as we are only testing for one or more overlaps.
Something to remember is that:
1.3.6 File B is loaded into memory
Whenever a BEDTool compares two files of features, the “B” file is loaded into memory. By contrast, the “A” file is processed line by line and compared with the features from B. Therefore to minimize memory usage, one should set the smaller of the two files as the B file.
Now, let's do some sanity checks. Let's test the -f parameter:
cat a.bed #chr1 500 1000 cat b.bed #chr1 750 1000 intersectBed -a a.bed -b b.bed -f 0.50 #chr1 750 1000 #if I increase b.bed, it should still return a result cat a.bed #chr1 500 1000 cat b.bed #chr1 750 2000 intersectBed -a a.bed -b b.bed -f 0.50 #chr1 750 1000 #if I make a.bed longer, the -f 0.50 condition is not fulfilled cat a.bed #chr1 499 1000 cat b.bed #chr1 750 2000 intersectBed -a a.bed -b b.bed -f 0.50 #returns nothing
So if I want to check whether a feature of interest has at least 50% of its length overlapping another feature, the feature of interest should be the -a BED file i.e. the first BED file.
Testing reciprocity (-r):
cat a.bed #chr1 500 1000 cat b.bed #chr1 750 1250 intersectBed -a a.bed -b b.bed -f 0.50 -r #chr1 750 1000 cat a.bed #chr1 500 1000 cat b.bed #chr1 750 1251 intersectBed -a a.bed -b b.bed -f 0.50 -r #returns nothing
intersectBed with BED12
I made this BED12 file:
chr1 1 1000 test 0 + 1 1000 0 2 100,100 0,899
So there are two blocks/exons at positions 1-100 and at 899-1000.
I then made this BAM file:
read1 0 chr1 50 37 21M * 0 0 GTATGCAGTATATACATGACT gggfcfcfcff^ffdgfgg]g XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:21
read2 0 chr1 500 37 21M * 0 0 GTATGCAGTATATACATGACT gggfcfcfcff^ffdgfgg]g XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:21
Then I ran:
#Version: 2.15.0
intersectBed -abam haha.bam -b test.bed -bed -wo
and got:
chr1 49 70 read1 37 + chr1 1 1000 test 0 + 1 1000 0 2 100,100 0,899 21
chr1 499 520 read2 37 + chr1 1 1000 test 0 + 1 1000 0 2 100,100 0,899 21
So intersectBed does not take into account the different blocks. This was posted by the developer (Aaron Quinlan) on SEQanswers:
(3) Native support for "blocked" BED features (aka BED12). Note that each block is not considered separately. BEDTools merely allows one to use BED12 files and the last 6 fields are "passed through" the tools.
So in order to intersect with blocks/exons, they should be separate entities in the BED file.
Other tips
#standard in / stream cat a.bed | intersectBed -a stdin -b b.bed #this is useful when you want to perform two intersect steps intersectBed -a first.bed -b second.bed | intersectBed -a stdin -b third.bed
Lastly, below is what Sean Eddy has to say about BEDTools:
#BOSC2013 Eddy: Slightly insane level of commitment required to build a world class tool, eg now using excellent bedtools from @aaronquinlan
— Peter Cock (@pjacock) July 20, 2013

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Hello,
I need some kind help please, as i am a beginner in learning BEDTOOLS.
i have to extract SNPs from my VCF file based on GFF file, i mean there are some sequence IDs (ID=comp1207_c1_seq1.mrna1.cds5;Name=comp1207_c1_seq1;Parent=comp1207_c1_seq1.mrna1) in my GFF file which also contain the word “cds” but the same ID is also in VCF file (comp1207_c0_seq1 432 . A T 62.44 . AC=1;AF=0.50;AN=2;BaseQRankSum=0.135;DP=12;Dels=0.00;FS=2.533;HRun=4;HaplotypeScore=0.0000;MQ=40.7) BUT without this word “CDS”, so by using intersectBed i have to locate and extract all these IDs with CDS from VCF file, i dont know how to do this,i tried these commands but they did not work.
tahir@aspseq:~$ intersectBed -a 26-Gper.vcf -b codes.gff
tahir@aspseq:~$ intersectBed -a codes.gff -b 26-Gper.vcf
Hi Soban,
intersectBed works on bed files: see this link http://genome.ucsc.edu/FAQ/FAQformat.html#format1
You will need to convert your gff file and vcf file into bed files in order to use intersectBed.
Hope that helps,
Dave
Thankyou dave,
I thought about it but its written there that intersectBED also supports BAM, GFF,VCF formats also, so do you still think that have to convert my GFF and VCF files into BED?.
anyway i have converted my VCF and GFF files into BED files but still got nothing through
intersectBed -a 349.2.bed -b codes.bed> got.bed
It’s a bit hard to troubleshoot over the web like this. Could you make available the files somewhere (e.g. via Dropbox share or a personal server) and email (me @ davetang.org) the link. I’ll take a look.
Thankyou Dave, sure i will send you my files but just wait untill i finish some other urgent tasks. just hold for a while.
Thanx again.
Hej Dave
sorry for being late but i have got the solution, infact i forgot to remove
the all next columns after first three columns from BED files, whcih
intersectBEd was not able to read and intersect, anyway i have another
short problem, if you send me your personal email adress, i can send you
the Qs and file details.
kind regards.
Hi Dave,
is there a quicker way if we have single base positions e.g. in case of methylation and a set of intervals and want to calculate the average coverage in that region. I was thinking about using binary search for that but I was wondering if it can be done quickly or easily with bedtools or genomic ranges and how.
here is the input interval file e.g. :-
Gm01 12884 12933
Gm01 89985 90016
Gm01 116973 117199
Gm01 126226 126227
Gm01 131876 131993
Gm01 200358 200753
Gm01 236785 237014
Gm01 341452 341895
Gm01 347664 347676
Gm01 357195 357222
and here is the methylation file :-
Gm01 42 W82_RH25_R1 + CG 3 3
Gm01 43 W82_RH25_R1 – CG 0 1
Gm01 170 W82_RH25_R1 + CG 26 30
Gm01 171 W82_RH25_R1 – CG 13 18
Gm01 192 W82_RH25_R1 + CG 27 31
Gm01 193 W82_RH25_R1 – CG 13 15
Gm01 199 W82_RH25_R1 + CG 27 30
Gm01 200 W82_RH25_R1 – CG 10 12
Gm01 301 W82_RH25_R1 + CG 14 16
Gm01 302 W82_RH25_R1 – CG 14 15
Gm01 326 W82_RH25_R1 + CG 8 12
Gm01 327 W82_RH25_R1 – CG 11 12
Gm01 340 W82_RH25_R1 + CG 7 9
last column is coverage and I want to calculate average coverage over a region is there any easier way to do that.
regards
Saad
Hi Saad,
yeah it shouldn’t be hard. First add an extra column in your methylation file; the third column is the second column + 1. So if a.bed is your interval file and b.bed is the methylation file, do something like this (untested):
intersectBed -a a.bed -b b.bed -wo | awk '{print $1,$2,$3,$11}' | groupBy -g 1,2,3 -c 4 -o meanCheers,
Dave
Why the following lines don’t give the same result?
intersectBed -a first.bed -b second.bed | intersectBed -a stdin -b third.bed | wc -l
intersectBed -a third.bed -b second.bed | intersectBed -a stdin -b first.bed | wc -l
intersectBed -a second.bed -b third.bed | intersectBed -a stdin -b first.bed | wc -l
I would also expect them to give the same result; what are the coordinates of your three bed files?