Calculating the h-index

Updated 2014 September 19th to include a method that does not require sorting. The h-index is the largest number $$n$$ such that $$n$$ of your articles have at least $$n$$ citations each. If you published 10 articles, and each of them had four citations, your h-index would be four, since there are…
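A minimal Python sketch of the definition, showing both a sorting approach and one that avoids sorting by bucketing citation counts. It assumes citation counts arrive as a list of non-negative integers. The function names are illustrative, and this is not the code from the post.

```python
def h_index_sorted(citations):
    """h-index via sorting: the largest h such that h papers have >= h citations."""
    h = 0
    # Walk papers from most to least cited; the i-th paper (1-based) needs >= i citations.
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def h_index_counting(citations):
    """h-index without sorting, using bucket counts (linear time)."""
    n = len(citations)
    # buckets[k] = papers with exactly k citations for k < n;
    # buckets[n] = papers with n or more citations (h can never exceed n).
    buckets = [0] * (n + 1)
    for c in citations:
        buckets[min(c, n)] += 1
    # Scan from high to low, accumulating the number of papers with >= h citations.
    at_least = 0
    for h in range(n, -1, -1):
        at_least += buckets[h]
        if at_least >= h:
            return h
    return 0


papers = [4] * 10  # ten articles, four citations each
print(h_index_sorted(papers), h_index_counting(papers))  # prints: 4 4
```

The counting version works because any paper with more citations than there are papers can only ever support an h equal to the number of papers, so those counts are capped at n.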

A transpose tool

Updated 2014 September 19th to compare different transpose tools. I wrote a simple transpose tool in Perl that takes in tabular data and outputs a transposed version of it. The primary motivation for writing this was that when viewing files with a lot of columns on the command line, it becomes hard to match the…
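The core of a transpose tool fits in a few lines. This is a minimal Python sketch, not the Perl tool from the post, assuming tab-delimited input; rows shorter than the longest row are padded with empty fields rather than silently truncated.

```python
from itertools import zip_longest


def transpose(lines, sep="\t"):
    """Return the transposed rows of delimited text as joined strings.

    Ragged rows are padded with empty fields so no column is dropped.
    """
    rows = [line.rstrip("\n").split(sep) for line in lines]
    # zip_longest(*rows) yields the first field of every row, then the second, and so on.
    return [sep.join(col) for col in zip_longest(*rows, fillvalue="")]


table = ["id\tgene\tcount\n", "1\tTP53\t12\n", "2\tBRCA1\t7\n"]
for out_line in transpose(table):
    print(out_line)
```

To use it as a command-line filter, pass `sys.stdin` as `lines` and print each returned row; note that the whole table is held in memory, which is unavoidable for a transpose.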

Getting started with C

I learned Perl as my first language because it was the language of choice in the first lab I joined. Over the years I’ve heard many criticisms, such as that Perl code looks ugly and that its motto, “There’s more than one way to do it”, allows too much flexibility. I particularly like this description of Perl:…

Saving disk space with Perl

Disk space is cheaper these days, but here’s one way of using less of it: work directly with gzipped files. Here’s a very straightforward example of Perl code that reads a gzipped file and writes a gzipped file. And here’s some other code that just counts the number of lines in a file,…
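The same idea in Python, as a minimal sketch rather than the post's Perl code: the standard `gzip` module opens compressed files as text streams, so lines can be read, written, or counted without ever writing an uncompressed copy to disk. The function names here are illustrative.

```python
import gzip


def copy_gzipped(src, dst):
    """Stream lines from one gzipped file into another without decompressing to disk."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            # Per-line processing would go here; this sketch copies lines unchanged.
            fout.write(line)


def count_lines_gzipped(path):
    """Count lines in a gzipped text file, reading it as a stream."""
    with gzip.open(path, "rt") as fin:
        return sum(1 for _ in fin)
```

Because both functions iterate line by line, memory use stays small even for very large files; the cost is the CPU time spent on decompression and recompression.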

Equivalents in R, Python and Perl

Last updated 2018 May 24th. Perl was used by many computational biologists in the early 2000s, and its popularity may have been driven by its involvement with the Human Genome Project. An article titled "How Perl Saved the Human Genome Project" explains why Perl was a good fit for computational biology projects (as well…

Passing arguments from the command line in Perl

I used to do this for specifying the usage: However, this became a problem when I needed to pass the number “0” as an argument. So I decided to improve the code using the Perl module Getopt::Std. Depending on how your script works, you can set up conditional checks (e.g. unless exists $opt{'f'}) to see…
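The pitfall with “0” is that Perl treats the string "0" as false, so a truthiness test rejects a perfectly valid argument; checking whether the option exists avoids that. Below is a rough Python analogue using `argparse` instead of Getopt::Std (so not the post's code), where the same bug appears if you test `if not args.n` on an integer option. The option letters `-f` and `-n` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse single-letter options, loosely mirroring Getopt::Std's getopts('f:n:')."""
    parser = argparse.ArgumentParser(description="Options where 0 is a valid value")
    parser.add_argument("-f", help="input file (required)")
    parser.add_argument("-n", type=int, help="a number; 0 is allowed")
    args = parser.parse_args(argv)
    # Test whether the option was supplied (is None), not whether its value is truthy;
    # "if not args.n" would wrongly treat "-n 0" as missing.
    if args.f is None:
        parser.error("-f is required")  # prints usage and exits with status 2
    return args


args = parse_args(["-f", "tags.bed", "-n", "0"])
if args.n is not None:
    print(f"n was given: {args.n}")
```

The `is None` check plays the same role as `exists $opt{'f'}` in the Perl version: it asks whether the option was given at all.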

Using bins when comparing genomic features

Comparing two files containing genomic features is a common task, e.g. finding out whether the coordinates of your tags intersect with genes. Of course, you could use intersectBed (part of the BEDTools suite) for this purpose, but here’s how to do it using Perl. NOTE: I hard-code the length of my tags…
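The point of binning is to avoid comparing every tag against every gene: genes are indexed by the fixed-size bins they overlap, and each tag is only compared with genes in the bins it touches. This is a minimal Python sketch of that idea, not the Perl code from the post. It assumes BED-style half-open `[start, end)` coordinates with `end > start`, and the 10 kb `BIN_SIZE` is a hypothetical choice.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # hypothetical bin width in bp


def bins_for(start, end):
    """Bin numbers overlapped by the half-open interval [start, end)."""
    return range(start // BIN_SIZE, (end - 1) // BIN_SIZE + 1)


def build_bin_index(genes):
    """Map (chrom, bin) -> genes overlapping that bin.

    genes: iterable of (chrom, start, end, name).
    """
    index = defaultdict(list)
    for chrom, start, end, name in genes:
        # A gene spanning several bins is stored in each of them.
        for b in bins_for(start, end):
            index[(chrom, b)].append((start, end, name))
    return index


def genes_hit(index, chrom, start, end):
    """Sorted names of genes overlapping the tag [start, end) on chrom."""
    hits = set()
    for b in bins_for(start, end):
        for g_start, g_end, name in index.get((chrom, b), []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < g_end and g_start < end:
                hits.add(name)
    return sorted(hits)
```

A set collects the hits because a long gene can appear in several of the bins a tag touches. The bin width trades memory (more bins per long gene) against the number of candidate genes checked per tag.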

Forking in Perl 2

Build up an index of files to process, e.g. SAM files. Fork 16 child processes, each time taking one file off the index and processing it. As with all my code, use at your own risk. Comments and suggestions are always welcome. For more information, see Forking in Perl.
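The same pattern can be sketched in Python with `os.fork` (Unix only); this is an illustration, not the Perl code from the post. A list of paths serves as the index, a child is forked for each file while fewer than 16 are running, and the parent waits for a child to exit before forking the next. `process_file` is a hypothetical stand-in that only counts lines.

```python
import os

MAX_CHILDREN = 16  # number of concurrent child processes


def process_file(path):
    """Hypothetical per-file work: write the file's line count to path + '.count'."""
    with open(path) as fin, open(path + ".count", "w") as fout:
        fout.write(f"{sum(1 for _ in fin)}\n")


def run_all(paths, max_children=MAX_CHILDREN):
    """Fork one child per file, keeping at most max_children alive; return failure count."""
    queue = list(paths)  # the index of files still to process
    running = 0
    failures = 0
    while queue or running:
        if queue and running < max_children:
            path = queue.pop()  # take one file off the index
            pid = os.fork()
            if pid == 0:
                # Child: do the work, then exit immediately so it never re-enters the loop.
                status = 0
                try:
                    process_file(path)
                except Exception:
                    status = 1
                os._exit(status)
            running += 1
        else:
            # Parent: block until any child exits, then free its slot.
            _, status = os.wait()
            running -= 1
            if status != 0:
                failures += 1
    return failures
```

`os._exit` is used in the child instead of `sys.exit` so the child skips cleanup handlers and buffered output inherited from the parent.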

Using Perl to log transform data

Very simple Perl code to log-transform (base 2) a list of numbers. Zero values are converted to 0.5, since the logarithm of 0 is undefined. For this example, the numbers are stored in the array @n. Using R: I know this post is about using Perl to log transform data, but I’ve been…
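The transformation itself is one line per value. Here is a minimal Python sketch of the same rule (not the post's Perl code), with the list named `n` to echo `@n`. Zeros are replaced with 0.5 before taking log2, so they map to -1; negative inputs are not handled and raise `ValueError`.

```python
import math


def log2_transform(values, zero_replacement=0.5):
    """Log2-transform values, substituting zero_replacement for exact zeros."""
    return [math.log2(v if v != 0 else zero_replacement) for v in values]


n = [0, 1, 2, 8]
print(log2_transform(n))  # prints: [-1.0, 0.0, 1.0, 3.0]
```

Replacing zeros with 0.5 keeps them below every value of 1 or more after the transform, which is why it is a common choice for count data.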
