Intergenic regions are simply loci in the genome demarked by where one gene ends and another starts.
To calculate intergenic regions:
- First create a BED file containing the coordinates of all genes
- Sort this BED file by chromosome and then by the starting position
- Merge this BED file using mergeBed
- Run the script below (works only with hg19), using as input the merged BED file
#!/usr/bin/perl use strict; use warnings; my $usage = "Usage: $0 <infile.bed>\n"; my $infile = shift or die $usage; #DATA used for calculating final intergenic region #change the DATA coordinates to your genome of interest my %chr_end = (); while(<DATA>){ chomp; my ($chr,$end) = split(/\s+/); $chr_end{$chr} = $end; } close(DATA); my $data = (); open(IN,'<',$infile) || die "Could not open $infile: $!\n"; while(<IN>){ chomp; my($chr,$start,$end) = split(/\t/); next unless exists $chr_end{$chr}; if (!exists $data->{$chr}){ $data->{$chr}->[0]->{'START'} = $start; $data->{$chr}->[0]->{'END'} = $end; } else { my $index = scalar(@{$data->{$chr}}); $data->{$chr}->[$index]->{'START'} = $start; $data->{$chr}->[$index]->{'END'} = $end; } } close(IN); foreach my $chr (keys %{$data}){ for(my $i=0; $i<scalar(@{$data->{$chr}}); ++$i){ my $start = ''; my $end = ''; my $last_start = ''; my $last_end = ''; #print "$chr\t$data->{$chr}->[$i]->{START}\t$data->{$chr}->[$i]->{END}\n"; if ($i == 0){ $start = '1'; $end = $data->{$chr}->[$i]->{'START'} - 1; } elsif ($i == scalar(@{$data->{$chr}})-1){ $last_start = $data->{$chr}->[$i]->{'END'} + 1; $last_end = $chr_end{$chr}; $start = $data->{$chr}->[$i-1]->{'END'} + 1; $end = $data->{$chr}->[$i]->{'START'} - 1; } else { $start = $data->{$chr}->[$i-1]->{'END'} + 1; $end = $data->{$chr}->[$i]->{'START'} - 1; } next if $start > $end; print join("\t",$chr,$start,$end),"\n"; if ($i == scalar(@{$data->{$chr}})-1){ print join("\t",$chr,$last_start,$last_end),"\n"; } } } exit(0); #hg19 chromosome ends used for calculating last intergenic region __DATA__ chr1 249250621 chr2 243199373 chr3 198022430 chr4 191154276 chr5 180915260 chr6 171115067 chr7 159138663 chrX 155270560 chr8 146364022 chr9 141213431 chr10 135534747 chr11 135006516 chr12 133851895 chr13 115169878 chr14 107349540 chr15 102531392 chr16 90354753 chr17 81195210 chr18 78077248 chr20 63025520 chrY 59373566 chr19 59128983 chr22 51304566 chr21 48129895
Once you've saved the output from the script, one way to check the results is by using intersectBed.
intersectBed -a merged.bed -b intergenic.bed
And nothing should be returned, which was the case when I performed this.

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Hi,
Your script gives an error:
Name “main::DATA” used only once: possible typo at ./script.pl line 12.
readline() on unopened filehandle DATA at ./script.pl line 12.