Finding sequence conservation

I have written about sequence conservation in vertebrates previously, but without much elaboration, so I'm writing another post on the topic. The assumption behind sequence conservation is that conserved regions are under purifying selection, i.e. alleles that decrease the fitness of an organism are removed, and that these regions therefore probably do something important. Protein-coding regions are typically well conserved among the genomes of different species, which fits the widely accepted view that they are functionally important. To infer sequence conservation, sequences first need to be aligned and, conveniently, a multiple sequence alignment (MSA) of 46 vertebrate genomes is provided at the UCSC Genome Browser site.

Sequence conservation in vertebrates

The UCSC Genome Browser provides multiple alignments of 46 vertebrate species, conveniently available for download. The multiple alignments show regions of sequence conservation among vertebrates. For more information see http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way. The alignments are stored as Multiple Alignment Format (MAF) files, and there are Perl and Python packages that parse them. The MAF format is quite straightforward, and here I convert the MAF files into a BED file for use with intersectBed from the BEDTools suite.
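To give a rough idea of what such a conversion involves, here is a minimal sketch in Python. It is not the script used in the post. The function name maf_to_bed, the ref_prefix parameter, and the two-block example alignment are all made up for illustration. It relies only on the documented layout of MAF "s" lines (source, start, size, strand, source size, text), where start is zero-based and, for minus-strand rows, counted from the end of the sequence.

```python
import io

# Made-up two-block alignment, used only to exercise the sketch below.
EXAMPLE_MAF = """##maf version=1
a score=100.0
s hg19.chr1 100 5 + 1000 ACGTA
s mm9.chr4 200 5 + 2000 ACGTA

a score=50.0
s hg19.chr1 10 4 - 1000 AC-GT
s mm9.chr4 300 4 + 2000 ACTG-
"""


def maf_to_bed(handle, ref_prefix="hg19."):
    """Yield (chrom, start, end) for the reference row of each MAF block.

    Coordinates are zero-based, half-open, and on the forward strand,
    which is what BED expects. ref_prefix is a hypothetical parameter
    naming the assembly whose rows should be reported.
    """
    for line in handle:
        fields = line.split()
        # Only sequence ("s") rows carry coordinates; skip everything else.
        if len(fields) < 7 or fields[0] != "s":
            continue
        src = fields[1]
        if not src.startswith(ref_prefix):
            continue
        start, size = int(fields[2]), int(fields[3])
        strand, src_size = fields[4], int(fields[5])
        if strand == "-":
            # Minus-strand starts are relative to the reverse complement,
            # so flip them back onto the forward strand.
            start = src_size - start - size
        chrom = src.split(".", 1)[1]
        # size counts aligned bases without gaps, so end is start + size.
        yield chrom, start, start + size


for chrom, start, end in maf_to_bed(io.StringIO(EXAMPLE_MAF)):
    print(f"{chrom}\t{start}\t{end}")
```

The tab-separated lines printed at the end form a three-column BED file, which is enough for intersectBed to work with. For real MAF files it may be preferable to use one of the existing Perl or Python parsers mentioned above.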
