Getting started with Picard

Updated hyperlinks on 26 January 2015; please comment if you find any more dead links.

Picard is a suite of Java-based command-line utilities for manipulating SAM/BAM files. I'm currently analysing some paired-end libraries and wanted to calculate the average insert size from the alignments; that's how I found Picard. While reading the documentation I realised that Picard has a lot more to offer, so I'm writing this post to look into the specific tools that I thought would be useful.

To get started, download Picard and an alignment (whole genome shotgun sequencing) from the 1000 Genomes Project that we can use to test Picard:

Continue reading

Getting started with paired-end reads

I've wanted to write this post for a while, but I had never had to work with paired-end libraries, so the impetus wasn't quite there. I've finally decided to take a look at some paired-end libraries at work and, as usual, I will test some simple examples before I touch the real data. For those not familiar with paired-end reads, check out this post; it has very nice and simple illustrations, along with explanations of the terminology used in paired-end sequencing. Now let's get started!

Continue reading

Quantile normalisation in R

Updated on 14 January 2015 to include a slide from Rafael.

From Wikipedia:

In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile normalize two or more distributions to each other, without a reference distribution, sort as before, then set to the average (usually, arithmetical mean) of the distributions. So the highest value in all cases becomes the mean of the highest values, the second highest value becomes the mean of the second highest values, and so on.

Here, I follow the simple example on Wikipedia using R. Firstly, let's create the test dataset:
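The dataset and code from the full post are not reproduced in this excerpt, so what follows is only a minimal sketch of the procedure described in the quote, using an illustrative matrix. The names `x`, `quantile_normalise` and `x_norm` are hypothetical, and ties are broken by order of appearance, which is one assumption among several possible choices (averaging tied ranks is another).

```r
# Illustrative matrix: rows are features, columns are samples.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("s1", "s2", "s3")))

quantile_normalise <- function(m) {
  # Rank each value within its column; ties are broken by order of appearance.
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Sort every column, then average across columns at each rank position.
  rank_means <- unname(rowMeans(apply(m, 2, sort)))
  # Replace every value with the mean that corresponds to its rank.
  normalised <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(normalised) <- dimnames(m)
  normalised
}

x_norm <- quantile_normalise(x)
print(x_norm)
```

After normalisation every column holds the same set of values, just in a different order, which is a quick way to check that the function behaves as intended.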

Continue reading