Managing papers

Update: 2015 August 19th: I discovered the "Retrieve Metadata from PDF" feature in Zotero, which retrieves the metadata for a PDF and adds all the relevant information. (This doesn't work for all PDFs, but it works for the majority. Note that if you perform too many queries, Google Scholar will restrict you from...


Calculating the h-index

Updated 2014 September 19th to include a method that does not require sorting. The h-index is calculated by finding the number of articles, say h, that each have at least h citations. If you published 10 articles, and each of them had four citations, your h-index would be four, since there are four...
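
As a rough illustration of the definition above, here is a minimal Python sketch of two ways to compute the h-index: one that sorts the citation counts and one that tallies them instead. The function names are hypothetical, and the tallying approach is only one possible way to avoid sorting; it is not necessarily the method described in the full post.

```python
def h_index_sorted(citations):
    """Largest h such that h articles have at least h citations (sorting version)."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_counting(citations):
    """Same result without sorting (assumes non-negative citation counts)."""
    n = len(citations)
    # tally[k] = number of articles with k citations; counts above n are capped at n,
    # because the h-index can never exceed the number of articles.
    tally = [0] * (n + 1)
    for count in citations:
        tally[min(count, n)] += 1
    articles = 0
    for h in range(n, -1, -1):
        articles += tally[h]  # articles with at least h citations
        if articles >= h:
            return h
    return 0


if __name__ == "__main__":
    # Ten articles with four citations each, as in the example above.
    print(h_index_sorted([4] * 10))
    print(h_index_counting([4] * 10))
```

The tallying version runs in linear time because any citation count larger than the number of articles can be treated as equal to it.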


Understanding the BAM flags

I've tried to explain the BAM flags to my colleagues and I think each time I have left them more confused. So perhaps I can do a better job of explaining BAM flags in writing. For this post, I will use this BAM file from the 1000 Genomes Project: NA18553.chrom11.ILLUMINA.bwa.CHB.low_coverage.20120522.bam.
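
The flag is a bitwise sum of properties, so one way to make it less confusing is to decode it bit by bit. The sketch below uses the bit values from the SAM format specification; the short descriptions and the `decode_flag` name are my own paraphrases and are not taken from the full post.

```python
# Bit values follow the SAM specification; descriptions are paraphrased.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the description of every bit that is set in a SAM/BAM flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # 99 = 1 + 2 + 32 + 64
    for name in decode_flag(99):
        print(name)
```

For example, a flag of 99 decodes to a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read in the pair.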


A small list of command line tips

Updated: 2014 May 14th; added even more tips. I'm in the middle of writing papers and my thesis, so I've been quite busy. However, I wanted to write a quick blog post as an outlet. So here's a list of random command line tips off the top of my head (GNU bash, version 4.1.2(1)-release); I...


Sorting a huge BED file

I asked a question on Twitter about sorting a really huge file (more specifically sorting a huge BED file). To put really huge into context, the file I'm processing has 3,947,386,561 lines of genomic coordinates. I want the file to be sorted by the chromosome (lexical order), then by the start coordinate (numeric order) and...
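
To make the intended ordering concrete, here is a toy Python sketch of a sort key that orders by chromosome lexically and then by start coordinate numerically. It only covers the two keys named above, uses made-up tab-separated coordinates, and sorts in memory, so it illustrates the ordering only; it is not a workable approach for a file with billions of lines.

```python
def bed_sort_key(line):
    """Sort key: chromosome as a string, then start coordinate as an integer."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]))


if __name__ == "__main__":
    # Made-up example records (chromosome, start, end).
    lines = [
        "chr2\t100\t200",
        "chr10\t50\t60",
        "chr1\t300\t400",
        "chr1\t20\t30",
    ]
    for line in sorted(lines, key=bed_sort_key):
        print(line)
```

Note that lexical order places chr10 before chr2, and that comparing the start coordinates as integers places 20 before 300.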


Using GNU parallel

Updated 2015 August 3rd to include a section on formatting the output, which allows you to remove two levels of file extensions. I wrote this short guide on using GNU parallel for my biologist buddies who would like to harness the power of parallelisation. There are a lot of really useful guides out there but here...
