Understanding the BAM flags

I’ve tried to explain BAM flags to my colleagues, and each time I think I have left them more confused. So perhaps I can do a better job of explaining them in writing. For this post, I will use this BAM file from the 1000 Genomes Project: NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam.
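
The FLAG field is a single integer, and each of its bits encodes a yes/no property of the read. Decoding it therefore means testing each bit in turn. The Python below is a minimal sketch, not the method from the full post. It lists the properties set in a flag value, using the bit meanings from the SAM format specification. The names `SAM_FLAGS` and `decode_flag` are illustrative and are not part of any library.

```python
from typing import List

# Bit values of the FLAG field (column 2), as defined in the SAM format specification.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag: int) -> List[str]:
    """Return a description of every bit that is set in a SAM/BAM FLAG value."""
    return [desc for bit, desc in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    # 99 = 1 + 2 + 32 + 64
    for desc in decode_flag(99):
        print(desc)
```

For example, 99 decomposes as 1 + 2 + 32 + 64. The script therefore prints “read paired”, “read mapped in proper pair”, “mate reverse strand” and “first in pair”.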

A small list of command line tips

Updated: 2014 May 14th; added even more tips. I’m in the middle of writing papers and my thesis, so I’ve been quite busy. However, I wanted to write a quick blog post as an outlet. So here’s a list of random command line tips off the top of my head (GNU bash, version 4.1.2(1)-release); I…

Sorting a huge BED file

I asked a question on Twitter about sorting a really huge file (more specifically, a huge BED file). To put “really huge” into context, the file I’m processing has 3,947,386,561 lines of genomic coordinates. I want the file sorted by chromosome (lexical order), then by start coordinate (numeric order) and…
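
The ordering described above can be written as a two-part sort key: the chromosome compared as text, then the start compared as a number. This is a minimal in-memory sketch with hypothetical helper names `bed_sort_key` and `sort_bed_lines`. It loads every line into RAM, so it is not how a file with billions of lines would be handled. For files that size, GNU `sort -k1,1 -k2,2n` performs an external merge sort using temporary files on disk. Any further sort criteria elided in the excerpt are not included here.

```python
from typing import Iterable, List, Tuple


def bed_sort_key(line: str) -> Tuple[str, int]:
    """Key for a BED line: chromosome as a string (lexical), start as an integer (numeric)."""
    fields = line.split("\t")
    return fields[0], int(fields[1])


def sort_bed_lines(lines: Iterable[str]) -> List[str]:
    """Sort BED lines in memory by chromosome, then start; ties keep their input order."""
    return sorted(lines, key=bed_sort_key)


if __name__ == "__main__":
    bed = [
        "chr2\t500\t600",
        "chr10\t100\t200",
        "chr1\t1000\t1100",
        "chr1\t200\t300",
    ]
    for line in sort_bed_lines(bed):
        print(line)
```

Here chr10 sorts before chr2 under lexical order. Start 200 sorts before 1000 because the second column is compared numerically. Python compares strings by code point, which corresponds to `LC_ALL=C` ordering in GNU sort.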

Using GNU parallel

Updated 2020 February 26th to include the section “Strip directory and extensions”. I wrote this short guide on using GNU parallel for my biologist buddies who would like to harness the power of parallelisation. There are many useful guides out there, but here I try to give simple examples. Let’s get started by…
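
“Strip directory and extensions” refers to GNU parallel’s replacement strings. The Python below is a sketch under stated assumptions. It approximates what `{}`, `{.}`, `{/}` and `{/.}` expand to for a single input path. `replacement_strings` is a hypothetical helper. It uses Python’s `posixpath`, so GNU parallel’s own rules may differ for unusual names such as hidden files.

```python
import posixpath
from typing import Dict


def replacement_strings(arg: str) -> Dict[str, str]:
    """Approximate GNU parallel's {}, {.}, {/} and {/.} for one input argument.

    Assumption: posixpath rules stand in for GNU parallel's; unusual names
    (e.g. hidden files like '.bashrc') may not match exactly.
    """
    base = posixpath.basename(arg)  # strip the directory part
    return {
        "{}": arg,                                # input unchanged
        "{.}": posixpath.splitext(arg)[0],        # drop only the last extension
        "{/}": base,                              # basename
        "{/.}": posixpath.splitext(base)[0],      # basename without its last extension
    }


if __name__ == "__main__":
    for key, value in replacement_strings("reads/sample1.fastq.gz").items():
        print(key, value)
```

For reads/sample1.fastq.gz, this gives reads/sample1.fastq for `{.}` and sample1.fastq for `{/.}`. Only the last extension is removed, so a command such as `parallel echo {/.} ::: reads/*.fastq.gz` would print names that still end in .fastq.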
