The Normal Distribution

Updated 2014 December 19th

The normal or Gaussian distribution is a commonly occurring continuous probability distribution. The skewness, which is a measure of symmetry (or lack thereof), of a normal distribution is zero since the distribution is symmetrical, i.e. it looks the same to the left and right of the centre. Kurtosis describes the shape of a distribution relative to the normal: every normal distribution has a kurtosis of 3 (an excess kurtosis of 0); a higher kurtosis indicates a distinct peak near the mean with heavier tails, and a lower kurtosis indicates a flatter distribution with lighter tails.

We can generate a univariate data set that follows a normal distribution using the rnorm() function in R. The function takes three parameters (the number of data points, the mean, and the standard deviation):

#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)
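
As a quick sanity check (a sketch, not part of the original analysis), the sample mean and standard deviation should land close to the values passed to rnorm(), though not exactly on them because the sample is random:

```r
#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

#sample estimates; expect values near 10 and 2, not exactly equal
mean(x.norm)
sd(x.norm)

#confirm the number of data points
length(x.norm)
```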

The histogram can be used to show both the skewness and kurtosis of a data set.

hist(x.norm, main="Histogram x.norm")

The distribution is roughly symmetrical.
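
To make the comparison with the normal shape easier to judge by eye, one option is to draw the histogram on the density scale and overlay the density of the distribution the data were drawn from. This is a minimal sketch; the plot title and line width are arbitrary choices:

```r
#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

#freq=FALSE puts the y-axis on the density scale so the curve is comparable
h <- hist(x.norm, freq=FALSE, main="Histogram x.norm with normal density")

#overlay the density of the normal distribution with mean 10 and sd 2
curve(dnorm(x, mean=10, sd=2), add=TRUE, lwd=2)
```

Because the curve uses the true parameters rather than estimates, any mismatch between the bars and the line reflects sampling variation only.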

We can use the skewness() and kurtosis() functions from the e1071 package to measure the skewness and kurtosis, respectively.

#install if necessary
install.packages('e1071')
library(e1071)

#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

skewness(x.norm)
[1] 0.005622505

#the default is a measure of excess kurtosis
#see http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
kurtosis(x.norm)
[1] -0.2955075

#the excess kurtosis of a large sample from the standard normal distribution is near 0
kurtosis(rnorm(10000, 0, 1))
[1] -0.0153943
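
To show what these two statistics measure, here is a sketch that computes them directly from the central moments. The function names skew.moment() and kurt.excess() are hypothetical helpers written for this illustration, not part of e1071; the package's default estimators apply a small-sample scaling, so its values may differ slightly from these.

```r
#moment-based skewness: third central moment over the second to the power 3/2
skew.moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}

#moment-based excess kurtosis: fourth central moment over the squared second, minus 3
kurt.excess <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

skew.moment(x.norm)
kurt.excess(x.norm)
```

Subtracting 3 in kurt.excess() is what makes the normal distribution the reference point at 0.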

Checking whether a dataset is normal

I've written about testing for normality previously. Briefly, we can use the qqnorm() function to visually assess how well a normal distribution fits the data, and the Shapiro-Wilk test to formally test for normality.

set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

#The Shapiro–Wilk test checks whether a sample is normally distributed
#the null hypothesis is that the data was independently drawn from a normal distribution
#the p-value indicates that we cannot reject the null
shapiro.test(x.norm)

	Shapiro-Wilk normality test

data:  x.norm
W = 0.9956, p-value = 0.8364

#in this case, we can reject the null hypothesis
shapiro.test(rgamma(n = 200, shape = 1))

	Shapiro-Wilk normality test

data:  rgamma(n = 200, shape = 1)
W = 0.8443, p-value = 2.299e-13
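
The qqnorm() approach mentioned above can be sketched as follows. The plot title is an arbitrary choice, and qqline() is added here only as a visual reference; points that fall close to the line suggest the sample is consistent with a normal distribution.

```r
#seed for reproducibility
set.seed(31)
x.norm <- rnorm(n=200, mean=10, sd=2)

#plot sample quantiles against theoretical normal quantiles
q <- qqnorm(x.norm, main="Normal Q-Q plot x.norm")

#reference line through the first and third quartiles
qqline(x.norm)
```

Running the same two calls on rgamma(n = 200, shape = 1) should show a clear curve away from the line, in agreement with the Shapiro-Wilk result for that sample.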

This work is licensed under a Creative Commons Attribution 4.0 International License.
