# The Poisson distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment. A probability distribution assigns a probability to possible outcomes of a random experiment. A Poisson experiment has the following properties:

1. The outcomes of the experiment can be classified as either successes or failures.
2. The average number of successes that occurs in a specified region is known.
3. The probability that a success will occur is proportional to the size of the region.
4. The probability that a success will occur in an extremely small region is virtually zero.

A Poisson random variable is the number of successes that result from a Poisson experiment. Given the mean number of successes μ that occur in a specified region, we can compute the Poisson probability of observing exactly x successes using the following formula:

$$! P(x; \mu) = \frac{(e^{-\mu})(\mu^x)}{x!}$$

which is also written, with λ in place of μ and k in place of x, as:

$$! \Pr(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dotsc$$
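As a quick sanity check, the formula can be typed directly into R and compared with the built-in density function. This is a minimal sketch: `poisson_pmf` is a hypothetical helper name, not part of base R, and the values of `k` and `lambda` are arbitrary.

```r
# Poisson probability written directly from the formula above.
# poisson_pmf is a hypothetical helper name, not a base R function.
poisson_pmf <- function(k, lambda) {
  exp(-lambda) * lambda^k / factorial(k)
}

k <- 0:10
lambda <- 2

# compare the hand-written formula with R's built-in dpois()
manual <- poisson_pmf(k, lambda)
builtin <- dpois(k, lambda)
all.equal(manual, builtin)

# the probabilities over all k should sum to 1;
# for lambda = 2, summing k = 0..100 is more than enough
sum(poisson_pmf(0:100, lambda))
```

If the two vectors agree and the sum is 1 (up to floating-point error), the hand-written formula and `dpois()` describe the same distribution.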

### Examples

Suppose an average of 2 homes are sold per day. What is the probability that exactly 3 homes will be sold tomorrow?

$$! P(3; 2) = \frac{(e^{-2}) (2^3)}{3!}$$

Calculating this in R:

    #evaluate the formula directly
    e <- exp(1)
    ((e^-2)*(2^3))/factorial(3)
    # [1] 0.180447

    #or simply
    dpois(x = 3, lambda = 2)
    # [1] 0.180447
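The same approach extends from a single count to a range of counts. The sketch below assumes the same rate of 2 homes per day. The questions it answers (at most 3 homes, more than 3 homes) are illustrative additions rather than part of the example above.

```r
lambda <- 2

# P(X <= 3): add up the probabilities of selling 0, 1, 2 or 3 homes
p_at_most_3 <- sum(dpois(0:3, lambda))
p_at_most_3

# ppois() returns the same cumulative probability directly
ppois(3, lambda)

# P(X > 3) is the complement of P(X <= 3)
p_more_than_3 <- ppois(3, lambda, lower.tail = FALSE)
p_at_most_3 + p_more_than_3
```

Here `dpois()` gives the probability of one exact count, while `ppois()` gives the probability of that count or fewer.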


The Poisson distribution can be used to estimate the technical variance in high-throughput sequencing experiments. My basic understanding is that the variance between technical replicates can be modelled using the Poisson distribution, for which the variance equals the mean. For more information, check out this really useful discussion on Biostars.
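A defining property of the Poisson distribution is that its variance equals its mean, which is what makes it a candidate model for technical variance. The sketch below illustrates this with simulated counts. The expected count of 50, the number of replicates and the seed are all made-up values for illustration, not taken from any real sequencing data.

```r
set.seed(1)  #arbitrary seed so the sketch is reproducible

# hypothetical example: 10,000 simulated technical replicates of one feature
# whose expected read count is 50
counts <- rpois(10000, lambda = 50)

# for Poisson counts the variance equals the mean,
# so both values should be close to each other and to 50
mean(counts)
var(counts)

# ratio of variance to mean; a value near 1 is consistent with Poisson noise
var(counts) / mean(counts)
```

A variance-to-mean ratio well above 1 would suggest more variability than the Poisson model accounts for.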

### Calculating confidence intervals

The 95% confidence limits for a Poisson count can be calculated in R from chi-squared quantiles:

    #store 1,000,000 values that follow a Poisson distribution with lambda = 20
    data <- rpois(1000000, 20)

    #functions for calculating the lower and upper 95% limits for a count n
    poisson_lower_tail <- function(n) {
      qchisq(0.025, 2*n)/2
    }
    poisson_upper_tail <- function(n) {
      qchisq(0.975, 2*(n+1))/2
    }

    #lower limit for a count of 20
    poisson_lower_tail(20)
    # [1] 12.21652
    #upper limit for a count of 20
    poisson_upper_tail(20)
    # [1] 30.88838

    #how many values in data are lower than the lower limit
    table(data < poisson_lower_tail(20))
    #
    #  FALSE   TRUE
    # 960885  39115

    #how many values in data are higher than the upper limit
    table(data > poisson_upper_tail(20))
    #
    #  FALSE   TRUE
    # 986567  13433

    #what percentage of values were outside of the 95% CI
    #(the counts are from the run shown above; a new run will differ slightly)
    (13433 + 39115) * 100 / 1000000
    # [1] 5.2548

    #plot the simulated values and mark the two limits
    hist(data)
    abline(v = poisson_lower_tail(20))
    abline(v = poisson_upper_tail(20))
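The simulated percentage can be cross-checked without random sampling. The sketch below recomputes the two limits for a count of 20, compares them with the interval reported by `poisson.test()` from the stats package, and then uses `ppois()` to get the exact probability that a Poisson(20) value falls outside them. The variable names are illustrative.

```r
# limits for a count of 20, using the same chi-squared formulas as above
lower <- qchisq(0.025, 2 * 20) / 2
upper <- qchisq(0.975, 2 * (20 + 1)) / 2

# poisson.test() reports an exact interval for an observed count of 20;
# it should match the two limits computed above
poisson.test(20)$conf.int

# exact probability that a Poisson(20) value falls outside the limits.
# counts are integers, so "below lower" means <= floor(lower)
# and "above upper" means > floor(upper)
p_below <- ppois(floor(lower), 20)
p_above <- ppois(floor(upper), 20, lower.tail = FALSE)

# percentage outside the limits, comparable to the simulated value
(p_below + p_above) * 100
```

Because counts are discrete, the two tails do not each contain exactly 2.5%, which is why the total is not exactly 5%.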