VCF to PED

One of the classic bioinformatics problems is converting files from one format into another. In this post, I go through the process of creating a PED and MAP file from a VCF file. All the files described in this post are available in this GitHub repository.
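
As a rough illustration of what such a conversion involves, here is a minimal Python sketch; it is not the method used in the post. It assumes diploid genotypes and keeps only biallelic SNPs. The function name `vcf_to_ped_map` is hypothetical. Because a VCF file carries no pedigree information, the sketch reuses the sample name as the family ID and fills the parent, sex and phenotype columns with PLINK's "unknown" placeholders (0, 0, 0 and -9).

```python
def vcf_to_ped_map(vcf_lines):
    """Return (ped_lines, map_lines) built from an iterable of VCF lines.

    Assumptions: diploid GT fields; only biallelic SNPs are kept.
    """
    samples = []
    alleles_per_sample = []
    map_lines = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            alleles_per_sample = [[] for _ in samples]
            continue
        chrom, pos, variant_id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # keep only biallelic SNPs in this sketch
        if variant_id == ".":
            variant_id = f"{chrom}:{pos}"  # made-up ID for unnamed variants
        # MAP columns: chromosome, variant ID, genetic distance, position
        map_lines.append(f"{chrom}\t{variant_id}\t0\t{pos}")
        gt_index = fields[8].split(":").index("GT")
        lookup = {"0": ref, "1": alt}
        for i, sample_field in enumerate(fields[9:]):
            gt = sample_field.split(":")[gt_index].replace("|", "/")
            calls = gt.split("/")
            if len(calls) == 2 and all(c in lookup for c in calls):
                pair = (lookup[calls[0]], lookup[calls[1]])
            else:
                pair = ("0", "0")  # PLINK's code for a missing genotype
            alleles_per_sample[i].extend(pair)
    ped_lines = []
    for sample, alleles in zip(samples, alleles_per_sample):
        # PED columns: family, individual, father, mother, sex, phenotype
        ped_lines.append("\t".join([sample, sample, "0", "0", "0", "-9"] + alleles))
    return ped_lines, map_lines
```

Passing an open file handle, for example `vcf_to_ped_map(open("my.vcf"))` with a hypothetical file name, returns the two lists of lines, which can then be written to a `.ped` and a `.map` file.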

Exploring the UK10K variants

It has been almost two months since my last post; I have been occupied with preparing a fellowship application (which has been sent off!) and now I’m occupied with preparing and writing papers. Sadly, I’ve pushed blogging right down the priority list, even though it’s one of the things I enjoy doing the most. This…

Getting started with GEMINI

After getting started and getting acquainted with DNA sequencing data, it’s finally time to explore DNA variation. A tool that makes this easy is GEMINI and this post briefly demonstrates some of its functionality. I have only used GEMINI sparingly and what I know about the tool is gathered mostly from their documentation and tutorials….

SAMtools mpileup

The SAMtools mpileup utility summarises the coverage of mapped reads on a reference sequence at single base pair resolution. In addition, the output from mpileup can be piped to BCFtools to call genomic variants. I’m currently working with some Sanger sequenced PCR products, which I would like to call variants on….

VCF concordance

I want to compare the genotype concordance between two VCF files and I came across SnpSift, which seems to calculate the statistics that I want. However, the format of the results from my run differs from the format in the documentation. In this post, I will try to come up with the exact scenarios that…
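
To make the idea of genotype concordance concrete, here is a minimal Python sketch. It is not how SnpSift computes or reports its statistics; it only counts, for a single sample, how many sites called in both files carry the same genotype. The function names and the dictionary layout (a site key of chromosome, position, REF and ALT mapped to a GT string) are assumptions made for illustration.

```python
def normalise(gt):
    """Return an order-independent genotype, or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")  # ignore phasing
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def concordance(calls_a, calls_b):
    """Count matching genotypes at sites present and called in both inputs.

    calls_a and calls_b map (chrom, pos, ref, alt) to a GT string such as "0/1".
    Returns (matched, compared).
    """
    compared = 0
    matched = 0
    for site in calls_a.keys() & calls_b.keys():
        a = normalise(calls_a[site])
        b = normalise(calls_b[site])
        if a is None or b is None:
            continue  # skip sites with a missing call in either file
        compared += 1
        if a == b:
            matched += 1
    return matched, compared
```

Dividing `matched` by `compared` gives a simple concordance rate; sites found in only one file, or with a missing call, are left out of both counts in this sketch.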

Converting PED into VCF

Updated 2015 August 25th: as suggested by Tim, I checked out PLINK 1.9 and found it much simpler to convert PED into VCF. I updated the post with instructions for performing the conversion using PLINK 1.9. Being late to the game of analysing genomic variants, I only recently discovered that IGV is capable of visualising…

Creating a coverage plot using BEDTools and R

One of my Top 10 posts is on creating a coverage plot using R. For that post I used CAGE data, which is a transcriptomic data set containing transcription start sites, and I used R exclusively for building a “coverage plot.” The main issue with that post was that the plots were density plots rather…

Getting started with analysing DNA sequencing data

I have recently entered new territory and started working on analysing DNA sequencing data (as opposed to analysing RNA sequencing, i.e. transcriptomics). While it is still the analysis of lots of sequencing reads, one of the goals is to identify disease-causing mutations; this is in contrast to identifying differentially expressed genes or inferring gene…

Paired end alignment using Bpipe

This is a continuation of my post on getting started with Bpipe. In this post I followed the paired end alignment example from the documentation, with a few adjustments. This pipeline generates a random reference sequence, generates paired end reads from the reference, and aligns these reads back to the reference using BWA.
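
The first two steps of that pipeline can be sketched in plain Python. This is only an illustration, not the Bpipe example itself: the function names, the error-free reads and the fixed fragment length are all assumptions, and the alignment step with BWA is left out.

```python
import random

# Translation table used to complement a DNA sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_reference(length, rng):
    """Generate a random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(reference, n_pairs, read_len, fragment_len, rng):
    """Sample error-free read pairs from fixed-length fragments.

    Read 1 is the start of the fragment on the forward strand; read 2 is
    the reverse complement of the end of the fragment.
    Returns a list of (fragment_start, read1, read2) tuples.
    """
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(reference) - fragment_len + 1)
        fragment = reference[start:start + fragment_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((start, read1, read2))
    return pairs
```

Passing a seeded generator such as `random.Random(1)` as `rng` makes the simulated reference and reads reproducible between runs.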

Getting started with Bpipe

I have been using simple shell scripts for creating my bioinformatic pipelines. I define variables that can be used as parameter settings throughout the script, use some basic Unix tools for creating my output file names, and simply check the existence of files to see whether a step has been run or not. You can…
