## Markov chain

A Markov chain is a mathematical system that undergoes transitions from one state to another on a state space in a stochastic (random) manner. Examples of Markov chains include the board game snakes and ladders, where each state represents the position of a player on the board and a player moves between states (different positions…
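As a minimal sketch of the snakes and ladders example, the snippet below simulates a made-up 10-square board. The board size, the `JUMPS` table, and the overshoot rule are assumptions chosen for illustration; the point is only that the next position depends on the current position and a die roll, not on how the player got there.

```python
import random

# Hypothetical board with squares 0..9; these jumps are invented for illustration.
JUMPS = {2: 7, 8: 3}  # a ladder from 2 up to 7, a snake from 8 down to 3
LAST = 9


def step(position, rng):
    """Take one turn: the next state depends only on the current position."""
    nxt = position + rng.randint(1, 6)
    if nxt > LAST:
        # Assumed house rule: an overshoot leaves the player where they are.
        nxt = position
    return JUMPS.get(nxt, nxt)


def play(seed=0):
    """Simulate one game and return the sequence of visited states."""
    rng = random.Random(seed)
    path = [0]
    while path[-1] != LAST:
        path.append(step(path[-1], rng))
    return path


print(play(seed=1))
```

Because `step` never looks at earlier positions, the sequence returned by `play` is a realisation of a Markov chain on the states 0 to 9.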

## Tissue specificity

Last updated: 2023/10/11

A key measure in information theory is entropy, which is: "The amount of uncertainty involved in a random process; the lower the uncertainty, the lower the entropy." For example, a fair coin flip has lower entropy than a fair die roll since there are more possible outcomes with a die…
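The coin versus die comparison can be checked numerically. This is a small sketch using the Shannon entropy in bits, $$H = -\sum_i p_i \log_2 p_i$$; the function name `shannon_entropy` is my own and not from the post.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


coin = [0.5, 0.5]   # fair coin: two equally likely outcomes
die = [1 / 6] * 6   # fair die: six equally likely outcomes

print(shannon_entropy(coin))  # 1 bit
print(shannon_entropy(die))   # log2(6), roughly 2.585 bits
```

For a uniform distribution over n outcomes the entropy is log2(n), so it grows with the number of possible outcomes.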

## Set notation

I’ve just started the Mathematical Biostatistics Boot Camp 1 and to help me remember the set notation introduced in the first lecture, I’ll include it here: The sample space, $$\Omega$$ (upper case omega), is the collection of possible outcomes of an experiment, such as a die roll: $$\Omega = \{1, 2, 3, 4, 5, 6\}$$…

## Comparing different distributions

Updated 2017 September 7th

The Kolmogorov-Smirnov test can be used to test whether two underlying one-dimensional probability distributions differ. As noted in the Wikipedia article: Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution is (e.g. whether it’s normal or…
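To make the two-sample idea concrete, here is a rough sketch of the two-sample Kolmogorov-Smirnov statistic, the largest absolute gap between the two empirical cumulative distribution functions. It computes only the statistic, not a p-value, and the sample values are invented for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect_right(sample, x) / len(sample)

    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


print(ks_statistic([1, 2, 3], [1, 2, 3]))  # identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # completely separated samples
```

A statistic near 0 means the two empirical distributions are nearly indistinguishable; a value of 1 means the samples do not overlap at all.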

## The Poisson distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment. A probability distribution assigns a probability to possible outcomes of a random experiment. A Poisson experiment has the following properties: The outcomes of the experiment can be classified as either successes or failures. The average number of successes that occurs in a…
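For reference, the probability of observing exactly $$k$$ successes when the average number of successes is $$\lambda$$ is $$P(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$$. The sketch below evaluates it directly; the function name `poisson_pmf` and the rate of 3 are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k successes given an average of lam successes."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 3.0  # assumed average number of successes per interval
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```

Summing `poisson_pmf(k, lam)` over all non-negative `k` gives 1, as expected for a probability distribution.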

## Manual linear regression analysis using R

Last updated: 2022 December 12th

On Wikipedia, linear regression is described as: In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than…

## Step by step Principal Component Analysis using R

I’ve always wondered what goes on behind the scenes of a Principal Component Analysis (PCA). I found this extremely useful tutorial that explains the key concepts of PCA and shows the step by step calculations. Here, I use R to perform each step of a PCA as per the tutorial. Our dataset visualised on the…

## Creating a correlation matrix with R

Updated 2024 April 7th

Incentive. Let $$X$$ be a matrix, where $$x_{ij}$$ are elements of $$X$$, where $$i$$ is the row and $$j$$ is the column. If the matrix $$X$$ contained transcript expression data, then $$x_{ij}$$ is the expression level of the $$i^{th}$$ transcript in the $$j^{th}$$ assay. The elements of the $$i^{th}$$ row of $$X$$ form the transcriptional response of the $$i^{th}$$ transcript. The elements…

## Using R to obtain basic statistics on your dataset

Updated: 2014 June 20th

Most of the data I work with are represented as tables, i.e. with rows and columns. R makes it easy to store such data (as data frames) and process them to produce some basic statistics. Here are a few R functions that calculate basic, but nevertheless useful, statistics. I will use…

## Pearson vs. Spearman correlation

Correlation measures are commonly used to show how correlated two datasets are. A commonly used measure is the Pearson correlation. To illustrate when not to use a Pearson correlation: If we remove the 2,000 value: Use a non-parametric correlation measure (e.g. Spearman’s rank) if your dataset has outliers. It would probably be best…
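The effect of a single outlier can be sketched with a small, made-up dataset. Only the 2,000 value echoes the text above; the other numbers and the helper names `pearson`, `ranks`, and `spearman` are assumptions for illustration. Spearman's correlation is computed here as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: a perfectly decreasing trend plus one extreme point.
x = [1, 2, 3, 4, 5, 2000]
y = [5, 4, 3, 2, 1, 2000]

print(pearson(x, y))            # dominated by the outlier, close to 1
print(spearman(x, y))           # rank-based, slightly negative
print(pearson(x[:-1], y[:-1]))  # outlier removed: close to -1
```

With the outlier included, Pearson reports an almost perfect positive correlation even though the other five points trend downward, while the rank-based measure is far less affected.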