Quantile normalisation in R

Updated 2015 January 14th to include a slide from Rafael. From Wikipedia: In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile normalize two or more distributions to each other, without a reference distribution, sort as before, then set to the average (usually, arithmetical mean) of the distributions....
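
The procedure quoted above can be sketched in a few lines. This is a minimal Python illustration rather than the R code the post refers to; the function name `quantile_normalize` and the example matrix are made up for the sketch, and it assumes there are no tied values within a column.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile normalise the columns of a 2-D array (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Sort every column, then average across columns at each rank.
    rank_means = np.sort(x, axis=0).mean(axis=1)
    # Replace each value with the mean that belongs to its rank.
    return rank_means[ranks]

# Hypothetical data: three distributions (columns) with four values each.
data = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 6.0, 6.0],
                 [4.0, 2.0, 8.0]])
normalized = quantile_normalize(data)
```

After normalisation every column holds the same set of values, and each value keeps the rank it had in its original column.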

Markov chain

A Markov chain is a mathematical system that undergoes transitions from one state to another on a state space in a stochastic (random) manner. Examples of Markov chains include the board game snakes and ladders, where each state represents the position of a player on the board and a player moves between states (different positions...
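
To make the snakes and ladders example concrete, here is a small Python sketch. The ten-square board, the single ladder and the single snake are all hypothetical and chosen only to keep the transition matrix readable; the function names are likewise made up for the illustration.

```python
import numpy as np

N_SQUARES = 10            # hypothetical board: squares 0..9, square 9 is the finish
JUMPS = {2: 6, 7: 3}      # hypothetical ladder (2 -> 6) and snake (7 -> 3)

def transition_matrix():
    """Row i holds the probabilities of moving from square i to each square."""
    P = np.zeros((N_SQUARES, N_SQUARES))
    last = N_SQUARES - 1
    for square in range(last):
        for roll in (1, 2, 3, 4, 5, 6):
            target = min(square + roll, last)    # overshooting still finishes
            target = JUMPS.get(target, target)   # follow a ladder or snake
            P[square, target] += 1 / 6
    P[last, last] = 1.0   # the finish square is absorbing
    return P

def simulate(P, rng, start=0, max_steps=1000):
    """Draw one game: the next square depends only on the current square."""
    state, path = start, [start]
    while state != N_SQUARES - 1 and len(path) <= max_steps:
        state = int(rng.choice(N_SQUARES, p=P[state]))
        path.append(state)
    return path

P = transition_matrix()
path = simulate(P, np.random.default_rng(0))
```

Each row of `P` sums to one, and the simulation never looks further back than the current square, which is the Markov property.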

Probability

The fundamental idea of inferential statistics is determining the probability of obtaining the observed data when we assume the null hypothesis is true. For example, if we roll a die 10 times and get 10 sixes, what is the probability of observing this result if we assume the null hypothesis that the die is fair?...
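
The die example can be worked through directly. The sketch below is in Python and assumes that the ten rolls are independent, so the probabilities multiply; the variable names are only illustrative.

```python
from fractions import Fraction

# Under the null hypothesis the die is fair, so each roll shows a six
# with probability 1/6.
p_six = Fraction(1, 6)

# Assuming independent rolls, ten sixes in a row has probability (1/6)^10.
p_ten_sixes = p_six ** 10

print(p_ten_sixes)          # exact fraction
print(float(p_ten_sixes))   # decimal approximation
```

The result is about 1.65 in 100 million, which is why ten sixes in a row would make us doubt that the die is fair.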

Set notation

I've just started the Mathematical Biostatistics Boot Camp 1 and, to help me remember the set notation introduced in the first lecture, I'll include it here: The sample space, Ω (upper case omega), is the collection of possible outcomes of an experiment, such as a die roll: Ω = {1, 2, 3, 4, 5, 6}. An event, say E, is a subset of Ω,...
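
The same notation maps onto Python's built-in sets. In this sketch the event "the roll is even" is a hypothetical choice of E used only for illustration.

```python
# Sample space for a single die roll.
omega = {1, 2, 3, 4, 5, 6}

# Hypothetical event E: the roll is even.
E = {2, 4, 6}

is_event = E <= omega     # True when E is a subset of the sample space
complement = omega - E    # outcomes in the sample space that are not in E
```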

Predicting cancer

So far I've come across four machine learning methods: random forests, classification trees, hierarchical clustering and k-means clustering. Here I use all four of these methods (plus SVMs) to predict cancer, or more specifically malignant cancers, using the Wisconsin breast cancer dataset.

Singular Value Decomposition using R

In linear algebra terms, a Singular Value Decomposition (SVD) is the decomposition of a matrix X into three matrices, each having special properties. If X is a matrix with each variable in a column and each observation in a row then the SVD is X = U D V^T, where the columns of U are orthogonal (left singular vectors), the...
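
The decomposition can be checked numerically. This is a Python sketch using NumPy's `numpy.linalg.svd` rather than the R code the post uses, and the 6 by 3 matrix of random values is a made-up stand-in for real data.

```python
import numpy as np

# Hypothetical data: 6 observations (rows) of 3 variables (columns).
rng = np.random.default_rng(42)
X = rng.normal(size=(6, 3))

# Thin SVD: U is 6x3, d holds the 3 singular values, Vt is 3x3.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Multiplying the three pieces back together recovers X.
X_rebuilt = U @ np.diag(d) @ Vt

# The columns of U are orthonormal, so U^T U is the identity matrix.
gram_U = U.T @ U
```

NumPy returns the singular values as a vector in descending order, which is why `np.diag(d)` is needed to rebuild the middle matrix.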

On curve fitting using R

For linear relationships we can perform a simple linear regression. For other relationships we can try fitting a curve. From Wikipedia: Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. I will use the dataset from this...
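
As a rough illustration of fitting a curve when a straight line will not do, the Python sketch below fits a quadratic by least squares with `numpy.polyfit`. The data points are synthetic and generated from a known quadratic; they are not the dataset the post goes on to use.

```python
import numpy as np

# Synthetic data from a known curve, y = 2x^2 - 3x + 1 (an assumption
# made for this sketch so the fitted coefficients can be checked).
x = np.linspace(-3, 3, 13)
y = 2 * x**2 - 3 * x + 1

# Least-squares fit of a degree-2 polynomial; coefficients are returned
# from the highest power down to the constant term.
coeffs = np.polyfit(x, y, 2)

# Evaluate the fitted curve at the original x values.
fitted = np.polyval(coeffs, x)
```

With noisy data the recovered coefficients would only approximate the true ones, and the choice of degree becomes a modelling decision.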
