Log ratio

From Dave's wiki

From the HarvardX PH525x course Data Analysis for Genomics

In the previous module, we described how the log transformation helped stabilize the variance/mean relationship. However, in genomics there's another reason why the log is a popular transformation, and it has to do with the fact that many differences are quantified as fold changes. When we ask whether a gene is different in one sample versus another, the difference is typically quantified with a ratio: how many times bigger is the expression in this sample compared to the other sample? Because we use ratios to summarize differences, that gives us another reason to use logs.
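As a minimal sketch in Python, using made-up expression values for a single hypothetical gene, the fold change is just the ratio of the two expression levels. The sample used as the reference (the denominator) determines whether the result lands above or below 1. The helper name fold_change is illustrative only.

```python
def fold_change(expr_a, expr_b):
    """Ratio of expression in sample A to expression in sample B (B is the reference)."""
    return expr_a / expr_b


# Hypothetical expression values for one gene in two samples (made up for illustration)
sample_a = 800.0
sample_b = 200.0

fc_a_vs_b = fold_change(sample_a, sample_b)  # A relative to B: 4.0
fc_b_vs_a = fold_change(sample_b, sample_a)  # B relative to A: 0.25

print(fc_a_vs_b, fc_b_vs_a)
```

The same 4-fold difference shows up as 4.0 or 0.25 depending only on which sample is the reference.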

This is illustrated with the next two figures I'm going to show you. Here is a very simple plot of ratios. We have 1/1, 2/1, 4/1, and so on up to 32/1, and then what should be their symmetric counterparts: 1/2, 1/4, up to 1/32. One thing we see right away, and it's not surprising, is that the ratios below 1 are much closer to 1 than the ratios above 1. In particular, 32 is much farther from 1 than 1/32 is. We would like these to be symmetric, because fold changes should be symmetric: it shouldn't matter which sample you use as the reference, the size of the difference should be the same.
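A short Python sketch of the asymmetry, assuming the same powers of 2 as in the figure. For each ratio r and its reciprocal 1/r, it computes how far each lies from 1 (no change) on the raw scale. The helper distance_from_one is a hypothetical name used only for this illustration.

```python
def distance_from_one(ratio):
    """Absolute distance of a ratio from 1 (no change) on the raw scale."""
    return abs(ratio - 1)


# Powers of 2 as in the figure: 1/1, 2/1, 4/1, ..., 32/1
ratios_up = [2 ** k for k in range(6)]
# Their reciprocals: 1/1, 1/2, 1/4, ..., 1/32
ratios_down = [1 / r for r in ratios_up]

for up, down in zip(ratios_up, ratios_down):
    # Same fold change, very different distances from 1
    print(f"{up:>2}/1 -> {distance_from_one(up):6.3f}    1/{up:<2} -> {distance_from_one(down):6.3f}")
```

The 32/1 ratio sits 31 units above 1, while its counterpart 1/32 sits less than 1 unit below it, even though both describe a 32-fold change.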

Now let's look at what happens when we take the log: the symmetry is achieved. The logs of 2/1, 4/1, up to 32/1 and the logs of 1/32, 1/16, up to 1/2 now sit at the same distances from 0 (the log of 1), just on opposite sides. Why does this happen? It comes from a very simple mathematical relationship you should all know: the log of a ratio is the difference of the logs, which is the negative of the difference taken the other way around, which in turn is the negative of the log of the reciprocal ratio:

log(x/y) = log(x) - log(y) = -(log(y) - log(x)) = -log(y/x)
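A minimal Python sketch of the same comparison on the log scale. It assumes base-2 logarithms (a common choice for fold changes, although the identity holds for any base) and uses Python's math.log2. It also checks the identity above numerically for one pair of made-up values x and y.

```python
import math

ratios_up = [2 ** k for k in range(1, 6)]  # 2, 4, 8, 16, 32
ratios_down = [1 / r for r in ratios_up]   # 1/2, 1/4, ..., 1/32

for up, down in zip(ratios_up, ratios_down):
    # log2(r) and log2(1/r) are mirror images around 0
    print(f"log2({up}/1) = {math.log2(up):+.1f}   log2(1/{up}) = {math.log2(down):+.1f}")

# The identity log(x/y) = log(x) - log(y) = -log(y/x), checked numerically
x, y = 800.0, 200.0  # made-up expression values
lhs = math.log2(x / y)
diff = math.log2(x) - math.log2(y)
neg_reciprocal = -math.log2(y / x)
assert math.isclose(lhs, diff) and math.isclose(lhs, neg_reciprocal)
```

On the log2 scale, a 32-fold increase becomes +5 and a 32-fold decrease becomes -5, so swapping the reference sample only flips the sign.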