A single exome

In the age of 50,000+ and 60,000+ whole exome catalogues, it's hard to find processed data for a single exome; at least, I had trouble finding a VCF file for the exome of one individual. After searching for a while, I gave up and decided to generate one myself. This post is on how I generated a single VCF file, which I have hosted on my web server.

Basic Shiny app to fetch variant information

I created a basic Shiny app that uses the myvariant package to fetch variant information from MyVariant.info. The variants need to be represented in the format recommended by the Human Genome Variation Society. Once you have your variant of interest in the correct format, just hit "Get variant info!" and the annotations will appear on the right. You can find the app hosted at: https://davetang.shinyapps.io/get_variant_info/.
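As a rough illustration of the expected format: a single-nucleotide substitution is written as a genomic HGVS string such as chr1:g.35367G>A. The helper below is only a sketch; the function name hgvs_snv is hypothetical (it is not part of the myvariant package) and it handles simple substitutions only, not insertions or deletions.

```shell
# Hypothetical helper: build a genomic HGVS ID for a simple substitution.
# Arguments: chromosome (with or without "chr"), 1-based position, ref, alt.
hgvs_snv() {
  local chrom="${1#chr}"   # strip a leading "chr" if present
  printf 'chr%s:g.%s%s>%s\n' "$chrom" "$2" "$3" "$4"
}

hgvs_snv 1 35367 G A
```

The resulting string is what you would paste into the app's input box.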

ExAC allele frequency of pathogenic ClinVar variants

This is a continuation of the post on the genomic location of pathogenic ClinVar variants. For this post I will use vcfanno to annotate the ClinVar variants with the ExAC VCF file.

To get started, download the ExAC VCF file.

# 4.1G file
wget -c ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/ExAC.r0.3.1.sites.vep.vcf.gz
wget -c ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/ExAC.r0.3.1.sites.vep.vcf.gz.tbi
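
vcfanno is driven by a small TOML configuration that names the annotation file and the INFO fields to copy. The following is a minimal sketch of what that could look like for the ExAC file, not necessarily the configuration used in the full post; the config file name exac.toml, the output field name exac_af, and the wrapper function are all my own assumptions.

```shell
# Write a minimal vcfanno configuration (the file name exac.toml is arbitrary).
cat > exac.toml <<'EOF'
[[annotation]]
file = "ExAC.r0.3.1.sites.vep.vcf.gz"
fields = ["AF"]          # INFO field to pull from the ExAC VCF
ops = ["self"]           # copy the value as-is
names = ["exac_af"]      # name of the new INFO field (an assumed name)
EOF

# Hypothetical wrapper: annotate a query VCF and write the result to a new file.
annotate_with_exac() {
  vcfanno exac.toml "$1" > "$2"
}

# Example (needs vcfanno and both VCF files to be present):
# annotate_with_exac clinvar_20170104.vcf.gz clinvar_exac.vcf
```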

Genomic location of pathogenic ClinVar variants

How many pathogenic ClinVar variants are in intergenic regions? I'll define genomic regions as per this old post. To get started, download the latest ClinVar variants:

wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20170104.vcf.gz

# index
tabix -p vcf clinvar_20170104.vcf.gz

# how many variants?
zcat clinvar_20170104.vcf.gz | grep -v "^#" | wc -l
232624
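
To narrow the count down to pathogenic variants, one option is to inspect the CLNSIG tag in the INFO column. The sketch below assumes the numeric coding used in ClinVar VCFs of this period, where 5 means pathogenic and multiple values are separated by "|" or ","; check the header of your own file before relying on it. The function name count_pathogenic is hypothetical.

```shell
# Hypothetical helper: count VCF records (read from stdin) whose CLNSIG
# tag contains the code 5 (assumed to mean "Pathogenic").
count_pathogenic() {
  grep -v '^#' | awk -F'\t' '
    {
      n = split($8, info, ";")                  # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^CLNSIG=/) {
          sub(/^CLNSIG=/, "", info[i])
          m = split(info[i], codes, /[|,]/)     # one code per allele/submission
          for (j = 1; j <= m; j++) {
            # exact match, so that code 255 ("other") is not counted
            if (codes[j] == "5") { count++; break }
          }
        }
      }
    }
    END { print count + 0 }'
}

# Example:
# zcat clinvar_20170104.vcf.gz | count_pathogenic
```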

Exploring the UK10K variants

It has been almost two months since my last post; I have been occupied with preparing a fellowship application (which has been sent off!) and am now busy preparing and writing papers. Sadly, I've pushed blogging right down the priority list, even though it's one of the things I enjoy doing the most. This post explores the variants that were discovered as part of the UK10K project. For the uninitiated, the UK10K project was a massive undertaking that aimed to characterise human genetic variation within the UK population by using whole exome sequencing (WES) and whole genome sequencing (WGS). The WGS arm sequenced healthy individuals (n=3,781) who were part of longitudinal studies, and the WES arm sequenced individuals (n=5,294, with 5,182 passing QC) with rare diseases, severe obesity, and neurodevelopmental disorders. It's not quite 10K, but it's still an impressive number, at least for now: the 100,000 Genomes Project has already reached 7,306 genomes.

Getting started with GEMINI

After getting started and getting acquainted with DNA sequencing data, it's finally time to explore DNA variation. A tool that makes this easy is GEMINI, and this post briefly demonstrates some of its functionality. I have only used GEMINI sparingly, and what I know about the tool comes mostly from its documentation and tutorials. Be sure to check them out if you're planning on using GEMINI.
