Using the GenometriCorr package

I was reading through the bedtools jaccard documentation when I saw a reference to "Exploring Massive, Genome Scale Datasets with the GenometriCorr Package". First, for those wondering what the Jaccard index is, it's a simple metric defined as follows: the numerator is the number of intersections between A and B, and the denominator is...
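As a rough illustration of the metric, here is a minimal Python sketch that assumes the standard set definition of the Jaccard index (size of the intersection divided by size of the union), applied to the base positions covered by two interval sets. The function names and the toy intervals are hypothetical, and this is not how bedtools implements it.

```python
def interval_positions(intervals):
    """Expand half-open (start, end) intervals into the set of covered positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def jaccard(a, b):
    """Jaccard index of two interval lists, assuming |A & B| / |A | B| over bases."""
    pos_a = interval_positions(a)
    pos_b = interval_positions(b)
    union = pos_a | pos_b
    if not union:
        # Two empty sets: return 0.0 by convention to avoid dividing by zero.
        return 0.0
    return len(pos_a & pos_b) / len(union)


# Toy intervals on a single chromosome (hypothetical data).
a = [(0, 10), (20, 30)]
b = [(5, 15), (25, 35)]
print(jaccard(a, b))
```

Expanding every base into a set is only sensible for toy data; for real genome-scale intervals a sweep over sorted intervals would be the better choice.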


Finding sequence conservation

I have written about sequence conservation in vertebrates previously but without much elaboration, hence I'm writing another post on this topic. An assumption of sequence conservation is that conserved regions are under purifying selection (i.e. alleles that decrease the fitness of an organism are removed) and therefore probably do something important. Protein-coding regions...


Getting started with Picard

Updated hyperlinks on 2015 January 26th; please comment if you find any more dead links. Picard is a suite of Java-based command-line utilities that manipulate SAM/BAM files. Currently, I'm analysing some paired-end libraries and I wanted to calculate the average insert size based on the alignments; that's how I found Picard. While reading the...


Repetitive elements in vertebrate genomes

Updated 2015 February 8th to include some scatter plots of genome size versus repeat content. I was writing about the makeup of genomes today and was looking up statistics on repetitive elements in vertebrate genomes. While I could find individual papers with repetitive element statistics for a particular genome, I was unable to find...


Genomic Regions Enrichment of Annotations Tool

The Genomic Regions Enrichment of Annotations Tool (GREAT) is a tool that allows you to find enriched ontological terms in a set of genomic regions. This talk (running time ~1 hour) gives an overview of the tool. In brief, GREAT is an alternative to gene-centric enrichment tools such as DAVID and uses a binomial test...
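To give a feel for what a binomial enrichment test computes, here is a minimal Python sketch. It assumes a one-sided upper-tail test in which `n` is the number of input regions, `k` is the number of regions that fall in the annotated portion of the genome, and `p` is the fraction of the genome carrying the annotation. The function name, these parameter names, and the example numbers are all hypothetical; this is not GREAT's actual code.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided p-value."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probability mass from k successes up to n successes.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 8 of 20 regions hit an annotation covering 10% of the genome.
print(binomial_upper_tail(8, 20, 0.10))
```

The exact sum is fine for a few thousand regions; for much larger inputs a library routine such as a survival function from a statistics package would be more practical.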


How mappable is a specific repeat?

If you've ever wondered how mappable a specific repeat is, here's a quick post on creating a plot showing the mappability of a repetitive element along its consensus sequence. Specifically, the consensus sequence of a repeat was taken and sub-sequences were created by a sliding window approach (i.e. moving along the sequence) at 1 bp...
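The sliding window step can be sketched in a few lines of Python. This is only an illustration under assumptions: the window length, the function name, and the toy sequence are hypothetical, and only the 1 bp step comes from the description above.

```python
def sliding_windows(sequence, window, step=1):
    """Return (offset, sub-sequence) pairs for windows moved along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last window starts at len(sequence) - window, so every window is full length.
    return [
        (offset, sequence[offset:offset + window])
        for offset in range(0, len(sequence) - window + 1, step)
    ]


# Toy consensus sequence and window length (both hypothetical).
for offset, subseq in sliding_windows("ACGTACGGTA", window=4):
    print(offset, subseq)
```

Each sub-sequence keeps its offset so that a mapping result can later be plotted against its position along the consensus.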


Bioconductor annotation packages

The Bioconductor annotation packages are an extensive collection of annotations. In this post I simply illustrate the basics of probing these annotation packages. For the first example I will use the org.Hs.eg.db package, which provides genome-wide annotations for the human genome. We can query the package by using the select() function; to find out...
