I was catching up on some blog reading and just read Ewan's latest post on CRAM going mainline. CRAM files are alignment files, like BAM files, but they provide better compression. The toolkit for manipulating CRAM files is called CRAMTools; it is a set of Java tools and APIs for efficient compression of sequence read data. Better compression matters because sequencer throughput keeps increasing, and with it the volume of read data that has to be stored. I went through the CRAM toolkit, the format specification and the CRAM tutorial on the EBI page, and decided to test it out.


Calculating the h-index

Updated 2014 September 19th to include a method that does not require sorting.

The h-index is the largest number n such that n of your articles have at least n citations each. If you published 10 articles and each of them had four citations, your h-index would be four: all ten articles have at least four citations, but you do not have five articles with at least five citations. I'm not interested in what this number represents or measures; instead, I'm interested in how one would calculate the h-index given an array of numbers.
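Here is a minimal Python sketch of two ways to compute it. These are not necessarily the methods used in the full post. The first sorts the citation counts in descending order. The second avoids sorting by counting articles into buckets, which works because the h-index can never exceed the number of articles. The function names are hypothetical.

```python
from typing import List


def h_index_sorted(citations: List[int]) -> int:
    """Largest h such that h articles have at least h citations (O(n log n))."""
    h = 0
    # After sorting in descending order, the i-th article (1-based) qualifies
    # as long as it still has at least i citations.
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def h_index_counting(citations: List[int]) -> int:
    """Same result in O(n) time without sorting, using bucket counts."""
    n = len(citations)
    # buckets[k] counts articles with exactly k citations; anything above n
    # is capped at n, because h can never be larger than n.
    buckets = [0] * (n + 1)
    for c in citations:
        buckets[min(c, n)] += 1
    # Walk down from the largest possible h, keeping a running total of
    # articles with at least k citations; the first k it reaches is h.
    total = 0
    for k in range(n, -1, -1):
        total += buckets[k]
        if total >= k:
            return k
    return 0
```

For the example above, both `h_index_sorted([4] * 10)` and `h_index_counting([4] * 10)` return 4. The counting version trades a little extra memory (one list of length n + 1) for skipping the sort.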


A transpose tool

Updated 2014 September 19th to compare different transpose tools

I wrote a simple transpose tool in Perl that takes in tabular data and outputs a transposed version of it. My primary motivation was that when viewing files with a lot of columns on the command line, it becomes hard to match each value to its column header. Most of the time, I just want to see a couple of typical values for each column so that I can figure out how to parse the file. Here's the script I wrote:

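The Perl script itself isn't reproduced in this excerpt. As a rough illustration of the idea, here is a minimal Python sketch. It is not the original script. It assumes tab-separated input where each line is one row. Padding ragged rows with empty strings is a choice of this sketch. The names `transpose` and `transpose_stream` are hypothetical.

```python
import sys
from typing import Iterable, List, TextIO


def transpose(rows: List[List[str]]) -> List[List[str]]:
    """Turn rows into columns, padding short rows with empty strings."""
    width = max((len(r) for r in rows), default=0)
    padded = [r + [""] * (width - len(r)) for r in rows]
    # zip(*padded) yields one tuple per original column.
    return [list(col) for col in zip(*padded)]


def transpose_stream(src: Iterable[str], dst: TextIO = sys.stdout, sep: str = "\t") -> None:
    """Read sep-delimited lines from src and write the transposed table to dst."""
    rows = [line.rstrip("\n").split(sep) for line in src]
    for row in transpose(rows):
        dst.write(sep.join(row) + "\n")
```

You could call `transpose_stream(sys.stdin, sys.stdout)` from a small script and pipe something like `head -n 5 file.tsv` through it. Each column header then starts its own line, followed by a few example values. This sketch reads the whole input into memory, which is fine for a handful of rows but not for very large files.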