Learning about Snakemake

As promised two years ago, here's a short blog post on Snakemake. I have been using Bpipe to manage my workflows/pipelines but Snakemake has been mentioned to me on more than one occasion; in particular: @davetang31 seems useful if testing many diff types of pipelines. snakemake seems a bit more practical if analyzing many samples...

Gene to OMIM phenotype

A couple of weeks ago, I wrote a post on identifying OMIM phenotypes that are associated with a gene of interest. I thought I had solved the problem by using one of my favourite R packages (biomaRt), but alas. For example, I could not find any OMIM IDs associated with the TTN gene using biomaRt. In...

VCF to PED

One of the classic bioinformatics problems is converting files from one format into another. In this post, I go through the process of creating a PED and MAP file from a VCF file. All the files described in this post are available in this GitHub repository.
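
As a rough illustration of what such a conversion involves, here is a minimal Python sketch; it is not the code from the GitHub repository mentioned above. The function name `vcf_to_ped_map` is hypothetical, and the sketch assumes diploid, biallelic records with a `GT` field. It uses the sample ID as the family ID, sets parents and sex to unknown (`0`) and phenotype to missing (`-9`), and writes `0` for missing alleles. It does not filter out indels.

```python
def vcf_to_ped_map(vcf_lines):
    """Convert VCF text lines into (ped_lines, map_lines).

    Assumes diploid, biallelic records; multiallelic sites are skipped.
    """
    samples = []
    genotypes = {}  # sample ID -> list of "A1 A2" strings, one per variant
    map_lines = []

    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            genotypes = {sample: [] for sample in samples}
            continue

        chrom, pos, variant_id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multiallelic site: outside the scope of this sketch
        if variant_id == ".":
            variant_id = f"{chrom}:{pos}"  # made-up ID for unnamed variants

        # MAP columns: chromosome, variant ID, genetic distance, bp position
        map_lines.append(f"{chrom}\t{variant_id}\t0\t{pos}")

        alleles = [ref, alt]
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            pair = [alleles[int(a)] if a.isdigit() else "0" for a in gt]
            if len(pair) != 2:
                pair = ["0", "0"]  # treat non-diploid calls as missing
            genotypes[sample].append(" ".join(pair))

    # PED columns: family ID, individual ID, father, mother, sex, phenotype,
    # followed by two alleles per variant
    ped_lines = [
        " ".join([sample, sample, "0", "0", "0", "-9"] + genotypes[sample])
        for sample in samples
    ]
    return ped_lines, map_lines


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/1\t1|1",
        "1\t200\t.\tC\tT\t.\t.\t.\tGT:DP\t./.:0\t0/0:12",
    ]
    ped, map_ = vcf_to_ped_map(example)
    print("\n".join(ped))
    print("\n".join(map_))
```

For real data, a dedicated tool is the safer choice; the sketch is only meant to show how the columns of the two formats relate to a VCF record.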

Exploring the UK10K variants

It has been almost two months since my last post; I have been occupied with preparing a fellowship application (which has been sent off!) and now I'm occupied with preparing and writing papers. Sadly, I've pushed blogging right down the priority list, even though it's one of the things I enjoy doing the most. This...

Getting started with GEMINI

After getting started and getting acquainted with DNA sequencing data, it's finally time to explore DNA variation. A tool that makes this easy is GEMINI and this post briefly demonstrates some of its functionality. I have only used GEMINI sparingly and what I know about the tool is gathered mostly from their documentation and tutorials....

SAMtools mpileup

The SAMtools mpileup utility provides a summary of the coverage of mapped reads on a reference sequence at single base pair resolution. In addition, the output from mpileup can be piped to BCFtools to call genomic variants. I'm currently working with some Sanger sequenced PCR products, which I would like to call variants on....

VCF concordance

I want to compare the genotype concordance between two VCF files and I came across SnpSift, which seems to calculate the statistics that I want. However, the format of the results from my run differs from the format in the documentation. In this post, I will try to come up with the exact scenarios that...
