Calculating intergenic regions

Intergenic regions are simply the loci in the genome that lie between where one gene ends and the next one starts. To calculate intergenic regions:

1. Create a BED file containing the coordinates of all genes
2. Sort this BED file by chromosome and then by start position
3. Merge this BED file using mergeBed
4. Run the script below (works…
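The script itself is cut off in this excerpt, so what follows is only a minimal Python sketch of the last step, not the post's own script. It assumes the genes have already been sorted and merged (for example with mergeBed) and are passed in as `(chrom, start, end)` tuples in BED coordinates; the function name `intergenic_regions` and the example intervals are hypothetical.

```python
from collections import defaultdict


def intergenic_regions(merged_genes):
    """Return the gaps between consecutive merged gene intervals.

    merged_genes: iterable of (chrom, start, end) tuples in BED
    coordinates (0-based start, exclusive end). Assumption: the
    intervals are already merged, so none overlap on a chromosome.
    """
    # Group intervals per chromosome so gaps never span two chromosomes.
    by_chrom = defaultdict(list)
    for chrom, start, end in merged_genes:
        by_chrom[chrom].append((start, end))

    gaps = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        # Pair each interval with its successor; the gap runs from the
        # end of one gene block to the start of the next.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start > prev_end:
                gaps.append((chrom, prev_end, next_start))
    return gaps


# Hypothetical merged gene blocks, for illustration only.
example = [
    ("chr1", 100, 200),
    ("chr1", 500, 800),
    ("chr1", 1000, 1200),
    ("chr2", 50, 60),
]

for chrom, start, end in intergenic_regions(example):
    print(f"{chrom}\t{start}\t{end}")
```

This sketch reports only the regions between genes. The stretch before the first gene and after the last gene on each chromosome is left out, because covering it would require the chromosome lengths.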


Genome mapability

I know of the genome mapability and uniqueness tracks provided by the UCSC Genome Browser, but I was just interested in doing this myself for the hg19 genome. As a test, I investigated chr22, where the base composition is broken down as:

Length of chr22 = 51,304,566
A: 9,094,775
C: 8,375,984
G: 8,369,235
T: 9,054,551…
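A base composition tally like the one above can be produced in a few lines. The following is a minimal Python sketch, not the method used in the post: the function name `base_composition` and the toy sequence are hypothetical, and folding lower-case (soft-masked) bases into upper case is an assumption about how the input should be treated.

```python
from collections import Counter


def base_composition(sequence):
    """Count A, C, G and T in a sequence, plus everything else.

    Assumption: lower-case (soft-masked) bases count the same as
    upper-case ones, so the sequence is upper-cased first.
    """
    counts = Counter(sequence.upper())
    result = {base: counts.get(base, 0) for base in "ACGT"}
    # Anything that is not A, C, G or T (for example N) goes here.
    result["other"] = len(sequence) - sum(result.values())
    result["length"] = len(sequence)
    return result


# Toy sequence for illustration; a real run would read chr22 from FASTA.
composition = base_composition("ACGTNNacgtA")
for key, value in composition.items():
    print(f"{key}: {value:,}")
```

For a whole chromosome, the sequence lines of the FASTA file would be joined (skipping the `>` header line) before being passed in.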


GENCODE

By now you should have heard about the ENCODE project. GENCODE, summarised as the Encyclopædia of genes and gene variants, is a sub-project of ENCODE whose aim is to annotate all evidence-based gene features in the entire human genome with high accuracy. This includes protein-coding genes and their isoforms, non-coding RNAs and pseudogenes…
