R function for calculating confusion matrix rates

Last updated: 2023/03/10

I often forget the names and aliases of the confusion matrix rates (and how to calculate them) and have to look them up. I finally had enough and searched for a single function that could calculate the most commonly used rates, such as sensitivity and precision, but I couldn't find one that didn't require installing an R package. Therefore I wrote my own, called table_metrics, and I will briefly talk about it in this post.

I have had this Simple guide to confusion matrix terminology bookmarked for many years and I keep referring back to it. It does a great job of explaining the list of rates that are often calculated from a confusion matrix for a binary classifier. If you need a refresher on the confusion matrix rates/metrics, check it out.
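
As a quick refresher, these rates are simple ratios of the four cell counts. The snippet below is not part of table_metrics; it just applies the textbook formulas to the counts used later in this post.

# the four cell counts from the example confusion matrix used below
TP <- 100; TN <- 50; FP <- 10; FN <- 5

# textbook formulas for the most commonly used rates
sensitivity <- TP / (TP + FN)                  # also called recall or the true positive rate
specificity <- TN / (TN + FP)                  # also called the true negative rate
precision   <- TP / (TP + FP)                  # also called the positive predictive value
accuracy    <- (TP + TN) / (TP + TN + FP + FN)

signif(c(sensitivity, specificity, precision, accuracy), 3)
# [1] 0.952 0.833 0.909 0.909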

We can generate the same confusion matrix as the Simple guide with the following code.

generate_example <- function(){
  # recreate the 165 cases from the Simple guide: 60 are truly "no" and 105 are truly "yes"
  dat <- data.frame(
    n = 1:165,
    truth = c(rep("no", 60), rep("yes", 105)),
    pred = c(rep("no", 50), rep("yes", 10), rep("no", 5), rep("yes", 100))
  )
  # cross-tabulate truth (rows) against predictions (columns)
  table(dat$truth, dat$pred)
}

confusion <- generate_example()
confusion
#
#        no yes
#   no   50  10
#   yes   5 100
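
Note that table() returns an object of class table, which is what the first parameter of table_metrics expects (see below).

class(confusion)
# [1] "table"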

I wrote the function confusion_matrix to generate a confusion matrix from the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The same confusion matrix as above can be generated by sourcing the function from GitHub.

source("https://raw.githubusercontent.com/davetang/learning_r/main/code/confusion_matrix.R")
eg <- confusion_matrix(TP=100, TN=50, FN=5, FP=10)
eg$cm
#
#        no yes
#   no   50  10
#   yes   5 100
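
Assuming confusion_matrix labels its output with the same 'no'/'yes' ordering (as the printout above suggests), the two tables should contain the same counts.

# should be TRUE if both tables use the same ordering of labels
all(eg$cm == confusion)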

To use the table_metrics function I wrote, source it directly from GitHub as well.

source("https://raw.githubusercontent.com/davetang/learning_r/main/code/table_metrics.R")

The function has four parameters, which are described below using roxygen2 syntax (copied and pasted from the source code of the table_metrics function).

#' @param tab Confusion matrix of class table
#' @param pos Name of the positive label
#' @param neg Name of the negative label
#' @param truth Where the truth/known set is stored, `row` or `col`

To use table_metrics on the example data we generated, we have to provide arguments for the four parameters.

The first parameter is the confusion matrix stored as a table.

The second and third parameters are the names of the positive and negative labels. The example used yes and no, so those are our input arguments.

The fourth parameter indicates whether the truth/known labels are stored in the rows or the columns of the table. Our truth labels are on the rows, so 'row' is specified; if your confusion matrix has the predictions on the rows and the truth labels on the columns, use 'col' instead (see the sketch after the output below).

table_metrics(confusion, 'yes', 'no', 'row')
# $accuracy
# [1] 0.909
#
# $misclassifcation_rate
# [1] 0.0909
#
# $error_rate
# [1] 0.0909
#
# $true_positive_rate
# [1] 0.952
#
# $sensitivity
# [1] 0.952
#
# $recall
# [1] 0.952
#
# $false_positive_rate
# [1] 0.167
#
# $true_negative_rate
# [1] 0.833
#
# $specificity
# [1] 0.833
#
# $precision
# [1] 0.909
#
# $prevalance
# [1] 0.636
#
# $f1_score
# [1] 0.9300032
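
As mentioned above, if your confusion matrix has the predictions on the rows and the truth labels on the columns, pass 'col' as the fourth argument. Below is a minimal sketch that flips our example table with t() (wrapped in as.table() so the object keeps the table class); assuming the fourth parameter works as documented, it should return the same metrics as above.

# flip the table so that the truth labels end up on the columns
flipped <- as.table(t(confusion))
table_metrics(flipped, 'yes', 'no', 'col')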

The function returns a list with the confusion matrix rates/metrics. You can save the list and subset for the rate/metric you are interested in.

my_metrics <- table_metrics(confusion, 'yes', 'no', 'row')
my_metrics$sensitivity
# [1] 0.952

Finally, if you want more significant digits (default is set to 3), supply it as the fifth argument.
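
For example, a sketch asking for five significant digits (the exact output depends on how table_metrics rounds internally, so treat the printed value as illustrative):

my_metrics <- table_metrics(confusion, 'yes', 'no', 'row', 5)
my_metrics$sensitivity
# e.g. 0.95238 (100/105 reported to five significant digits)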

I have some additional notes on machine learning evaluation that may also be of interest. And that's it!
