Gene deserts

Find regions of the mouse genome devoid of any annotation (ESTs, mRNA, repeats, RefSeq and UCSC genes).

Annotation tracks downloaded using the table browser feature of the UCSC Genome Browser. Chromosome sizes of mm9 downloaded from here.

Code for finding regions of 10kb devoid of any annotation.


#!/usr/bin/perl

use strict;
use warnings;

my $bin = 10_000;

my $chrom_file = '/home/davetang/davetang/chrom_size/mm9_chrom_info.txt';
my %chr = ();
open(IN,'<',$chrom_file) || die "Could not open $chrom_file: $!\n";
while(<IN>){
   chomp;
   #chr1    197195432       /gbdb/mm9/mm9.2bit
   my ($chr,$size,$junk) = split(/\s+/);
   $chr{$chr} = $size;
}
close(IN);

my %feature = ();

my @file = qw/mm9_est_gene.bed.gz
mm9_mrna_gene.bed.gz
mm9_refseq_gene.bed.gz
mm9_repeatmasker_gene.bed.gz
mm9_ucsc_gene.bed.gz/;

foreach my $infile (@file){
   open(IN,'-|',"zcat $infile") || die "Could not open $infile: $!\n";
   while(<IN>){
      chomp;
      #chr1    134212712       134221648       BB62599
      my $which_bin = '';
      my ($chr,$start,$end,@junk) = split(/\t/);
      for (my $i = $start; $i <= $end; $i+=$bin){
         $which_bin = int($i / $bin);
         $feature{$chr}{$which_bin} = '1';
      }
      $which_bin = int($end / $bin);
      $feature{$chr}{$which_bin} = '1';
   }
   close(IN);
}

foreach my $chr (keys %chr){
   my $remainder = int($chr{$chr} / $bin);
   foreach (0 .. $remainder){
      next if exists($feature{$chr}{$_});
      my $start = $_ * $bin;
      my $end = $start + 9999;
      my $id = $chr . '_' . $start . '_' . $end;
      print join("\t",$chr,$start,$end,$id),"\n";
   }
}

exit(0);

__END__

In the mm9 genome I found 9,634 10kb regions with no annotation (corresponding to 96,340,000 bp of the genome with no annotation). A few lines of the output (bed format):

chr1 87310000 87319999 chr1_87310000_87319999
chr1 87320000 87329999 chr1_87320000_87329999
chr1 87330000 87339999 chr1_87330000_87339999
chr1 184930000 184939999 chr1_184930000_184939999
chr1 184940000 184949999 chr1_184940000_184949999

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.