A Poisson distribution is the probability distribution that results from a Poisson experiment. A probability distribution assigns a probability to each possible outcome of a random experiment. A Poisson experiment has the following properties:

- The outcomes of the experiment can be classified as either successes or failures.
- The average number of successes that occurs in a specified region (for example, an interval of time, a length, an area, or a volume) is known.
- The probability that a success will occur is proportional to the size of the region.
- The probability that a success will occur in an extremely small region is virtually zero.
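A rough way to see where these properties lead (a sketch, not part of the definition above): if the region is cut into `n` tiny subregions that each contain a success with probability `mu / n`, the number of successes is binomial, and its probabilities approach the Poisson probabilities as `n` grows. The values `mu = 2` and `x = 3` are arbitrary choices for illustration.

```r
mu <- 2   # assumed mean number of successes in the whole region
x <- 3    # number of successes of interest

# Binomial probability of x successes when the region is split into n subregions
sizes <- c(10, 100, 1000, 100000)
binom_probs <- sapply(sizes, function(n) dbinom(x, size = n, prob = mu / n))
names(binom_probs) <- sizes
binom_probs

# Poisson probability that the binomial values approach
dpois(x, lambda = mu)
```

The last binomial value should agree with `dpois()` to several decimal places.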

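A Poisson random variable is the number of successes that result from a Poisson experiment. Given the mean number of successes $\mu$ that occur in a specified region, we can compute the Poisson probability of observing exactly $x$ successes based on the following formula:

$$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$$

which is also written as:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where $\lambda$ is the mean number of successes and $k$ is the number of successes observed.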

## Examples

The average number of homes sold is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
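Worked by hand, with a mean of 2 homes per day and 3 homes sold, the Poisson probability is:

$$P(3; 2) = \frac{e^{-2} \cdot 2^{3}}{3!} = \frac{8 e^{-2}}{6} \approx \frac{8 \times 0.135335}{6} \approx 0.1804$$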

Calculating this manually in R:

```
# Poisson probability of x = 3 successes when the mean is 2
mu <- 2
x <- 3
(exp(-mu) * mu^x) / factorial(x)
```

`[1] 0.180447`

Using `dpois()`:

`dpois(x = 3, lambda = 2)`

`[1] 0.180447`
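A related question is the probability of selling at most 3 homes, rather than exactly 3. As a small sketch using the same mean of 2 homes per day, the point probabilities from `dpois()` can be summed, or `ppois()` can return the cumulative probability directly:

```r
# P(X <= 3): sum the probabilities of selling 0, 1, 2, or 3 homes
sum(dpois(x = 0:3, lambda = 2))

# The same cumulative probability using ppois()
ppois(q = 3, lambda = 2)

# P(X > 3), i.e. four or more homes sold
ppois(q = 3, lambda = 2, lower.tail = FALSE)
```

The first two lines should return the same value, and the last is its complement.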

The Poisson distribution can be used to estimate the technical variance in high-throughput sequencing experiments.

My basic understanding is that the variance between technical replicates can be modelled using the Poisson distribution, for which the variance equals the mean. See the Biostars thread "Why Does Rna-Seq Read Count Fit Poisson Distribution?" for a discussion.
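The property that makes this work is that a Poisson variable has a variance equal to its mean. A minimal simulated sketch, assuming a single hypothetical gene with an expected count of 50 reads across 100,000 technical replicates (both numbers are made up for illustration, not taken from real data):

```r
set.seed(1984)

# Simulated read counts for one hypothetical gene across technical replicates
replicate_counts <- rpois(100000, lambda = 50)

mean(replicate_counts)
var(replicate_counts)

# Ratio of variance to mean; close to 1 under a Poisson model
var(replicate_counts) / mean(replicate_counts)
```

A ratio well above 1 in real counts would suggest more variability than the Poisson model allows for.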

## Calculating confidence intervals

Calculate the confidence intervals using R. Create data with 1,000,000 values that follow a Poisson distribution with lambda = 20.

```
set.seed(1984)
n <- 1000000
data <- rpois(n, 20)
```

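Functions for calculating the lower and upper limits of the exact 95% confidence interval for a Poisson mean, given an observed count, using quantiles of the chi-squared distribution.

```
# Lower 95% limit: 2.5% quantile of a chi-squared with 2 * count degrees of freedom, halved
poisson_lower_tail <- function(count) {
  qchisq(0.025, 2 * count) / 2
}
# Upper 95% limit: 97.5% quantile of a chi-squared with 2 * (count + 1) degrees of freedom, halved
poisson_upper_tail <- function(count) {
  qchisq(0.975, 2 * (count + 1)) / 2
}
```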

Lower limit for lambda = 20.

`poisson_lower_tail(20)`

`[1] 12.21652`

Upper limit for lambda = 20.

`poisson_upper_tail(20)`

`[1] 30.88838`
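As a cross-check, base R's `poisson.test()` reports an exact confidence interval for a Poisson rate, and for an observed count of 20 it should agree with the chi-squared limits above. This sketch recomputes the limits inline so that it runs on its own:

```r
# Exact 95% limits for an observed count of 20, computed from chi-squared quantiles
manual_ci <- c(qchisq(0.025, 2 * 20) / 2, qchisq(0.975, 2 * (20 + 1)) / 2)

# Limits reported by the built-in exact Poisson test
builtin_ci <- as.numeric(poisson.test(x = 20, conf.level = 0.95)$conf.int)

manual_ci
builtin_ci

# TRUE if the two pairs of limits agree up to numerical tolerance
all.equal(manual_ci, builtin_ci)
```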

How many values in data are lower than the lower limit?

`table(data<poisson_lower_tail(20))`

```
 FALSE   TRUE
961213  38787
```

How many values in data are higher than the upper limit?

`table(data>poisson_upper_tail(20))`

```
 FALSE   TRUE
986239  13761
```

What percentage of values were outside of the 95% CI?

`(sum(data<poisson_lower_tail(20)) + sum(data>poisson_upper_tail(20))) * 100 / n`

`[1] 5.2548`
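The check above asks how often simulated values fall outside the interval computed for a count of 20. A complementary check, sketched here as an assumption about what one might also want to verify, is coverage: compute the interval from each simulated count and see how often it contains the true lambda. The variable names and the smaller sample of 100,000 are illustrative choices.

```r
set.seed(1984)
true_lambda <- 20
sim_counts <- rpois(100000, true_lambda)

# Exact 95% limits computed separately from every simulated count
lower <- qchisq(0.025, 2 * sim_counts) / 2
upper <- qchisq(0.975, 2 * (sim_counts + 1)) / 2

# Proportion of intervals that contain the true lambda
coverage <- mean(lower <= true_lambda & true_lambda <= upper)
coverage * 100
```

Because counts are discrete, the exact interval is conservative, so the coverage should come out at or above 95% rather than exactly 95%.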

Plot.

```
hist(data)
abline(v=poisson_lower_tail(20))
abline(v=poisson_upper_tail(20))
```

## Webtool

Entering lambda = 20 into the Poisson Confidence Interval Calculator returns:

- 99% confidence interval: 10.35327 - 34.66800
- 95% confidence interval: 12.21652 - 30.88838
- 90% confidence interval: 13.25465 - 29.06202

which matches our 95% CI values.
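The 99% and 90% intervals can be reproduced in R as well by making the confidence level an argument. `poisson_ci()` below is a hypothetical helper written for this illustration, using the same chi-squared approach as the functions earlier:

```r
# Hypothetical helper: exact limits for an observed count at a chosen confidence level
poisson_ci <- function(count, conf_level = 0.95) {
  alpha <- 1 - conf_level
  c(lower = qchisq(alpha / 2, 2 * count) / 2,
    upper = qchisq(1 - alpha / 2, 2 * (count + 1)) / 2)
}

poisson_ci(20, 0.99)
poisson_ci(20, 0.95)
poisson_ci(20, 0.90)
```

The three results should line up with the intervals listed by the webtool.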

This work is licensed under a Creative Commons Attribution 4.0 International License.