Updated 2018 January 16th; rewrote entire post
It takes literally one line of code in R to conduct a Principal Component Analysis (PCA).
# PCA on the famous iris dataset
iris.prcomp <- prcomp(iris[, -5], scale. = TRUE)
Yet to this day, I am still trying to understand all the details of the method. I have links to various resources on my PCA wiki page that have helped me understand the method a bit better. Now, let's look into the results of prcomp().
class(iris.prcomp)
[1] "prcomp"
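# iris.pca is not defined above; the Call line in the output below shows it came from PCA() in the FactoMineR package
library(FactoMineR)
iris.pca <- PCA(iris[, -5], graph = FALSE)
summary(iris.pca)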
Call:
PCA(X = iris[, -5], graph = FALSE)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4
Variance 2.918 0.914 0.147 0.021
% of var. 72.962 22.851 3.669 0.518
Cumulative % of var. 72.962 95.813 99.482 100.000
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
1 | 2.319 | -2.265 1.172 0.954 | 0.480 0.168 0.043 | -0.128 0.074 0.003 |
2 | 2.202 | -2.081 0.989 0.893 | -0.674 0.331 0.094 | -0.235 0.250 0.011 |
3 | 2.389 | -2.364 1.277 0.979 | -0.342 0.085 0.020 | 0.044 0.009 0.000 |
4 | 2.378 | -2.299 1.208 0.935 | -0.597 0.260 0.063 | 0.091 0.038 0.001 |
5 | 2.476 | -2.390 1.305 0.932 | 0.647 0.305 0.068 | 0.016 0.001 0.000 |
6 | 2.555 | -2.076 0.984 0.660 | 1.489 1.617 0.340 | 0.027 0.003 0.000 |
7 | 2.468 | -2.444 1.364 0.981 | 0.048 0.002 0.000 | 0.335 0.511 0.018 |
8 | 2.246 | -2.233 1.139 0.988 | 0.223 0.036 0.010 | -0.089 0.036 0.002 |
9 | 2.592 | -2.335 1.245 0.812 | -1.115 0.907 0.185 | 0.145 0.096 0.003 |
10 | 2.249 | -2.184 1.090 0.943 | -0.469 0.160 0.043 | -0.254 0.293 0.013 |
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
Sepal.Length | 0.890 27.151 0.792 | 0.361 14.244 0.130 | -0.276 51.778 0.076 |
Sepal.Width | -0.460 7.255 0.212 | 0.883 85.247 0.779 | 0.094 5.972 0.009 |
Petal.Length | 0.992 33.688 0.983 | 0.023 0.060 0.001 | 0.054 2.020 0.003 |
Petal.Width | 0.965 31.906 0.931 | 0.064 0.448 0.004 | 0.243 40.230 0.059 |
names(iris.prcomp)
[1] "sdev" "rotation" "center" "scale" "x"
# eigenvalues = sdev^2
iris.prcomp$sdev^2
[1] 2.91849782 0.91403047 0.14675688 0.02071484
# use package factoextra to tabulate the eigenvalues (it also makes nice plots)
library(factoextra)
get_eig(iris.prcomp)
eigenvalue variance.percent cumulative.variance.percent
Dim.1 2.91849782 72.9624454 72.96245
Dim.2 0.91403047 22.8507618 95.81321
Dim.3 0.14675688 3.6689219 99.48213
Dim.4 0.02071484 0.5178709 100.00000
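To see where the numbers in that table come from, here is a minimal sketch that rebuilds them with base R only. It assumes the same call as above (`scale. = TRUE`), in which case the eigenvalues are those of the correlation matrix; the variable names `eig`, `variance.percent` and `cumulative.variance.percent` are just chosen here to mirror the column headers.

```r
# Sketch: rebuild the get_eig() table by hand, using base R only.
iris.prcomp <- prcomp(iris[, -5], scale. = TRUE)

# With scale. = TRUE the PCA is run on standardized variables, so the
# eigenvalues are those of the correlation matrix.
eig <- eigen(cor(iris[, -5]))$values

# They agree with the squared standard deviations reported by prcomp().
all.equal(eig, iris.prcomp$sdev^2)

# Percentage of variance per component and its running total.
variance.percent <- 100 * eig / sum(eig)
cumulative.variance.percent <- cumsum(variance.percent)

round(data.frame(eigenvalue = eig,
                 variance.percent,
                 cumulative.variance.percent), 4)
```

Because the four variables are standardized, the eigenvalues sum to 4 (the number of variables), which is why the first eigenvalue of about 2.918 corresponds to about 72.96% of the variance.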
PCA creates linear combinations of the original variables, called principal components, that capture as much of the variance in the original data as possible.
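The "linear combinations" part can be checked directly. The sketch below, which assumes the same scaled `prcomp()` call on iris as earlier, multiplies the standardized data by the `rotation` matrix and compares the result with the scores stored in `x`; `iris.scaled` and `scores` are names introduced here for illustration.

```r
iris.prcomp <- prcomp(iris[, -5], scale. = TRUE)

# Standardize the data with the same centers and scales prcomp() used.
iris.scaled <- scale(iris[, -5],
                     center = iris.prcomp$center,
                     scale = iris.prcomp$scale)

# Each column of `rotation` holds the weights of one linear combination.
scores <- iris.scaled %*% iris.prcomp$rotation

# The hand-computed scores match the ones stored in the prcomp object.
all.equal(scores, iris.prcomp$x, check.attributes = FALSE)

# The variance of each component equals the corresponding eigenvalue.
apply(scores, 2, var)
```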
PCA on mtcars
Run a PCA on a test dataset that is distributed with R:
# installing package for plotting purposes
install.packages("wordcloud")
# load library
library(wordcloud)
data.pca <- prcomp(mtcars)
# plot first and second PCs
textplot(data.pca$x[,1],
data.pca$x[,2],
row.names(data.pca$x),
cex = 0.7,
xlim=c(-250,300),
ylim=c(-150,75)
)
I don't know much about cars, but at least the Merc 450 models (450SE, 450SL and 450SLC) are close to each other. In addition, I guess most of the Japanese brands are on the left and the American brands are on the right?
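One thing that may help when reading this plot: `prcomp(mtcars)` was called without `scale. = TRUE`, so variables measured in large units carry far more weight than the others. The following sketch, using only base R, looks at the raw variances and at the weights of the first PC. It does not say anything about country of origin; it only shows which variables drive the horizontal axis.

```r
data.pca <- prcomp(mtcars)

# Variances of the raw variables: disp and hp are far larger than the rest
# because of the units they are measured in.
round(sort(sapply(mtcars, var), decreasing = TRUE), 2)

# Weights (loadings) of the first PC, largest absolute value first.
round(sort(abs(data.pca$rotation[, 1]), decreasing = TRUE), 3)

# Proportion of the total variance explained by the first PC alone.
summary(data.pca)$importance["Proportion of Variance", "PC1"]
```

If the aim were to weigh all variables equally, rerunning with `prcomp(mtcars, scale. = TRUE)` would be the usual alternative, though the `xlim` and `ylim` values in the plot above would then need to change.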

This work is licensed under a Creative Commons
Attribution 4.0 International License.