The Poisson distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment. A probability distribution assigns a probability to each possible outcome of a random experiment. A Poisson experiment has the following properties, where a "region" can be an interval of time, a length, an area, or a volume:

  1. The outcomes of the experiment can be classified as either successes or failures.
  2. The average number of successes that occurs in a specified region is known.
  3. The probability that a success will occur is proportional to the size of the region.
  4. The probability that a success will occur in an extremely small region is virtually zero.

A Poisson random variable is the number of successes that result from a Poisson experiment. Given the mean number of successes \mu that occur in a specified region, we can compute the Poisson probability of observing exactly x successes using the following formula:


P(x; \mu) = \frac{(e^{-\mu})(\mu^x)}{x!}

which is also written, with \lambda for the mean and k for the number of successes, as:


\Pr(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dotsc
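As a quick check of the formula, the sketch below writes the probability mass function out by hand and compares it with R's built-in dpois(). The helper name poisson_pmf is made up for this illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): the Poisson formula written out directly.
poisson_pmf <- function(k, lambda) {
  exp(-lambda) * lambda^k / factorial(k)
}

k <- 0:10
# Compare the hand-written formula with the built-in density function.
all.equal(poisson_pmf(k, lambda = 2), dpois(k, lambda = 2))

# The probabilities over all non-negative counts sum to 1;
# for lambda = 2, the counts 0 to 50 already capture essentially all of the mass.
sum(poisson_pmf(0:50, lambda = 2))
```

Computing the formula directly is fine for small counts, but factorial() overflows for large ones, so dpois() is the safer choice in practice.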

Examples

On average, 2 homes are sold per day. What is the probability that exactly 3 homes will be sold tomorrow?


P(3; 2) = \frac{(e^{-2}) (2^3)}{3!}

Calculating this manually in R:

e <- exp(1)
((e^-2)*(2^3))/factorial(3)
[1] 0.180447

Using dpois():

dpois(x = 3, lambda = 2)
[1] 0.180447
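The same answer can be approximated by simulation, which is a useful sanity check. The sketch below assumes 100,000 simulated days and an arbitrary seed; neither number comes from the example itself.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Simulate 100,000 days with an average of 2 sales per day.
sales <- rpois(100000, lambda = 2)

# Fraction of simulated days with exactly 3 sales; should be close to dpois(3, 2).
mean(sales == 3)

# Related question: probability of at most 3 sales in a day.
ppois(3, lambda = 2)
sum(dpois(0:3, lambda = 2))  # same value, summed term by term
```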

The Poisson distribution can be used to estimate the technical variance in high-throughput sequencing experiments.

My basic understanding is that the variance between technical replicates can be modelled using the Poisson distribution, for which the variance equals the mean. Check out "Why Does Rna-Seq Read Count Fit Poisson Distribution?" on Biostars.
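A defining property of the Poisson distribution is that its variance equals its mean, which is what makes it a candidate model for technical variance. The sketch below illustrates this with simulated counts; the expected count of 50 and the 10,000 replicates are made-up values, not real sequencing data.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical example: 10,000 technical replicates of one gene with an expected count of 50.
counts <- rpois(10000, lambda = 50)

mean(counts)
var(counts)

# Ratio of variance to mean; close to 1 for Poisson-distributed counts.
var(counts) / mean(counts)
```

A ratio well above 1 in real data would suggest more variance than the Poisson model accounts for.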

Calculating confidence intervals

Calculate the confidence intervals using R. Create data with 1,000,000 values that follow a Poisson distribution with lambda = 20.

set.seed(1984)
n <- 1000000
data <- rpois(n, 20)

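Functions for calculating the lower and upper limits of the exact 95% confidence interval for the Poisson mean, given an observed count k. Both use quantiles of the chi-squared distribution.

# Lower limit: half the 2.5% quantile of a chi-squared distribution with 2k degrees of freedom.
poisson_lower_tail <- function(k) {
   qchisq(0.025, 2*k)/2
}
# Upper limit: half the 97.5% quantile of a chi-squared distribution with 2(k+1) degrees of freedom.
poisson_upper_tail <- function(k) {
   qchisq(0.975, 2*(k+1))/2
}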

Lower limit for an observed count of 20.

poisson_lower_tail(20)
[1] 12.21652

Upper limit for an observed count of 20.

poisson_upper_tail(20)
[1] 30.88838
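These limits can be cross-checked against poisson.test() from the stats package, which reports an exact confidence interval for a Poisson rate. The sketch below assumes a single observed count of 20 over one unit of time.

```r
# Exact 95% confidence interval for the Poisson mean, given one observed count of 20.
ci <- poisson.test(20, conf.level = 0.95)$conf.int
ci

# The same limits from the chi-squared quantiles used above, written out inline.
c(qchisq(0.025, 2 * 20) / 2, qchisq(0.975, 2 * (20 + 1)) / 2)
```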

How many values in data are lower than the lower limit?

table(data<poisson_lower_tail(20))
 FALSE   TRUE 
961213  38787 

How many values in data are higher than the upper limit?

table(data>poisson_upper_tail(20))
 FALSE   TRUE 
986239  13761 

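What percentage of values were outside of the 95% CI? Note that the interval was built for the mean given an observed count of 20, so the share of individual draws falling outside it need not be exactly 5%.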

(sum(data<poisson_lower_tail(20)) + sum(data>poisson_upper_tail(20))) * 100 / n
[1] 5.2548
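The simulated percentage can be compared with the theoretical one by computing the two tail probabilities directly with ppois(). This is a sketch under the same assumptions as the simulation (lambda = 20 and limits derived from a count of 20); the variable names are chosen for this illustration.

```r
lambda <- 20
lower <- qchisq(0.025, 2 * 20) / 2        # about 12.2
upper <- qchisq(0.975, 2 * (20 + 1)) / 2  # about 30.9

# P(X < lower) = P(X <= 12): the largest integer strictly below the lower limit is 12.
p_below <- ppois(ceiling(lower) - 1, lambda)

# P(X > upper) = P(X >= 31) = 1 - P(X <= 30).
p_above <- ppois(floor(upper), lambda, lower.tail = FALSE)

# Theoretical percentage of draws outside the limits.
(p_below + p_above) * 100
```

Because the counts are discrete, the two tails are not each exactly 2.5%, which is why the lower and upper counts in the simulation differ.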

Plot a histogram of the data with the lower and upper limits marked as vertical lines.

hist(data)
abline(v=poisson_lower_tail(20))
abline(v=poisson_upper_tail(20))

Webtool

Entering lambda = 20 into the Poisson Confidence Interval Calculator returns:

  • 99% confidence interval: 10.35327 - 34.66800
  • 95% confidence interval: 12.21652 - 30.88838
  • 90% confidence interval: 13.25465 - 29.06202

which matches our 95% CI values.
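The other two intervals can be reproduced by making the confidence level a parameter. The function poisson_ci below is a hypothetical helper written for this illustration; it applies the same chi-squared formula with a variable alpha.

```r
# Hypothetical helper: exact Poisson CI for an observed count k at a given confidence level.
poisson_ci <- function(k, conf_level = 0.95) {
  alpha <- 1 - conf_level
  c(lower = qchisq(alpha / 2, 2 * k) / 2,
    upper = qchisq(1 - alpha / 2, 2 * (k + 1)) / 2)
}

poisson_ci(20, 0.99)
poisson_ci(20, 0.95)
poisson_ci(20, 0.90)
```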




This work is licensed under a Creative Commons Attribution 4.0 International License.
