Last updated: 2023/03/10
I often forget the names and aliases of the confusion matrix rates (and how to calculate them) and have to look them up. Finally, I had enough and went looking for a single function that could calculate the most commonly used rates, such as sensitivity and precision, but I couldn't find one that didn't require installing an R package. Therefore, I wrote my own, called table_metrics, and will briefly talk about it in this post.
I have had this Simple guide to confusion matrix terminology bookmarked for many years and I keep referring back to it. It does a great job of explaining the list of rates that are often calculated from a confusion matrix for a binary classifier. If you need a refresher on the confusion matrix rates/metrics, check it out.
We can generate the same confusion matrix as the Simple guide with the following code.
generate_example <- function(){
  # Recreate the 165 cases from the Simple guide:
  # 60 actual "no" and 105 actual "yes"
  dat <- data.frame(
    n = 1:165,
    truth = c(rep("no", 60), rep("yes", 105)),
    pred = c(rep("no", 50), rep("yes", 10), rep("no", 5), rep("yes", 100))
  )
  # Cross-tabulate truth (rows) against predictions (columns)
  table(dat$truth, dat$pred)
}
confusion <- generate_example()
confusion
#
#        no yes
#   no   50  10
#   yes   5 100
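As a quick check (just a sketch using base R), the marginal totals match the Simple guide: 165 cases in total, 60 actual "no" and 105 actual "yes".
# Marginal totals of the confusion matrix; truth labels are on the rows
sum(confusion)      # total number of cases: 165
rowSums(confusion)  # actual counts: no 60, yes 105
colSums(confusion)  # predicted counts: no 55, yes 110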
I also wrote a function called confusion_matrix that generates a confusion matrix from the individual case counts (TP, TN, FN, and FP). The same confusion matrix as above can be generated by sourcing the function from GitHub.
source("https://raw.githubusercontent.com/davetang/learning_r/main/code/confusion_matrix.R")
eg <- confusion_matrix(TP=100, TN=50, FN=5, FP=10)
eg$cm
#
#        no yes
#   no   50  10
#   yes   5 100
To use the table_metrics function I wrote, you can also source it directly from GitHub.
source("https://raw.githubusercontent.com/davetang/learning_r/main/code/table_metrics.R")
The function has four parameters, which are described below using roxygen2 syntax (copied and pasted from the source code of the table_metrics function).
#' @param tab Confusion matrix of class table
#' @param pos Name of the positive label
#' @param neg Name of the negative label
#' @param truth Where the truth/known set is stored, `row` or `col`
To use table_metrics on the example data we generated, we have to provide arguments for all four parameters. The first argument is the confusion matrix stored as a table. The second and third arguments are the names of the positive and negative labels; the example used yes and no, so those are our input arguments. The fourth argument indicates where the truth labels are stored: ours are on the rows, so 'row' is specified. If you have generated a confusion matrix with the predictions as the rows and the truth labels as the columns, change the fourth argument to 'col' (a sketch of this case follows the output below).
table_metrics(confusion, 'yes', 'no', 'row')
# $accuracy
# [1] 0.909
#
# $misclassifcation_rate
# [1] 0.0909
#
# $error_rate
# [1] 0.0909
#
# $true_positive_rate
# [1] 0.952
#
# $sensitivity
# [1] 0.952
#
# $recall
# [1] 0.952
#
# $false_positive_rate
# [1] 0.167
#
# $true_negative_rate
# [1] 0.833
#
# $specificity
# [1] 0.833
#
# $precision
# [1] 0.909
#
# $prevalance
# [1] 0.636
#
# $f1_score
# [1] 0.9300032
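As an aside, if your confusion matrix has predictions on the rows and truth labels on the columns, the fourth argument should be 'col' instead. Transposing our example table reproduces that layout; the call below is a minimal sketch (wrapping the transpose in as.table keeps the object of class table) and should return the same list of metrics.
# t() swaps rows and columns, so the truth labels end up in the columns
flipped <- as.table(t(confusion))
table_metrics(flipped, 'yes', 'no', 'col')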
The function returns a list with the confusion matrix rates/metrics. You can save the list and subset it for the rate/metric you are interested in.
my_metrics <- table_metrics(confusion, 'yes', 'no', 'row')
my_metrics$sensitivity
# [1] 0.952
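As a sanity check, the same rates can be recomputed by hand from the table counts. The sketch below assumes, as in our example, that the truth labels are on the rows and the predictions are on the columns.
# Pull the four counts out of the table by dimension names
TP <- confusion["yes", "yes"]  # true positives: 100
TN <- confusion["no", "no"]    # true negatives: 50
FP <- confusion["no", "yes"]   # false positives: 10
FN <- confusion["yes", "no"]   # false negatives: 5
TP / (TP + FN)              # sensitivity/recall/TPR: 100/105 ~ 0.952
TN / (TN + FP)              # specificity/TNR: 50/60 ~ 0.833
TP / (TP + FP)              # precision: 100/110 ~ 0.909
(TP + TN) / sum(confusion)  # accuracy: 150/165 ~ 0.909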
Finally, if you want more significant digits (the default is set to 3), supply the number of digits as the fifth argument.
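For example, the following call (just a sketch; the number of significant digits is passed positionally as the fifth argument, as described above) would report the metrics to five significant digits.
# Report rates to 5 significant digits instead of the default 3
table_metrics(confusion, 'yes', 'no', 'row', 5)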
I have some additional notes on machine learning evaluation that may also be of interest. And that's it!

This work is licensed under a Creative Commons Attribution 4.0 International License.