R function for calculating confusion matrix rates

Last updated: 2023/03/10. I often forget the names and aliases of confusion matrix rates (and how to calculate them) and have to look them up. Eventually I had enough and looked for a single function that could calculate the most commonly used rates, like sensitivity or precision, but I couldn’t find one that didn’t…
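
As a rough illustration of what such a single function might look like, here is a minimal Python sketch (not the R function from the post). The name `confusion_rates`, its arguments and the example counts are all hypothetical; only the standard textbook definitions of the rates are used.

```python
def confusion_rates(tp, fp, fn, tn):
    """Return common confusion matrix rates as a dict (hypothetical helper)."""

    def ratio(numerator, denominator):
        # Avoid ZeroDivisionError when a row or column of the matrix is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        # Sensitivity, also called recall or true positive rate.
        "sensitivity": ratio(tp, tp + fn),
        # Specificity, also called true negative rate.
        "specificity": ratio(tn, tn + fp),
        # Precision, also called positive predictive value.
        "precision": ratio(tp, tp + fp),
        # Negative predictive value.
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up counts, purely for illustration.
rates = confusion_rates(tp=90, fp=10, fn=30, tn=70)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Returning a dictionary keyed by the most common name keeps the aliases in one place; the comments note the other names each rate goes by.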

Analysing miRNA expression in cancers

MiRNAs are a class of small RNAs that, when expressed, usually down-regulate the expression of their target transcripts by binding to them and either causing them to degrade or inhibiting their translation. There has been a lot of interest in studying the expression pattern of miRNAs, especially in relation to cancer, since their…

Sequence composition and random forests

Updated: 2013 November 28th. The sequence composition, or nucleotide composition, at the transcription start sites (TSSs) of mRNAs is biased, i.e. certain nucleotides are preferred. Here I examine the sequence composition at the TSSs of the NCBI Reference Sequence Database, also known as RefSeq, and use random forests to see if it’s possible to train…

Predicting cancer

So far I’ve come across four machine learning methods: random forests, classification trees, hierarchical clustering and k-means clustering. Here I use all four of these methods (plus SVMs) to predict cancer, or more specifically malignant cancers, using the Wisconsin breast cancer dataset.

Building a classification tree in R

In week 6 of the Data Analysis course offered freely on Coursera, there was a lecture on building classification trees (also known as decision trees) in R. I thoroughly enjoyed the lecture and here I reiterate what was taught, both to reinforce my memory and for sharing purposes. I will jump straight into building a…

K means clustering

Updated: 2014 March 13th. From Wikipedia: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the…
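
The "nearest mean" idea in the definition above can be sketched in a few lines of Python (this is not the R code from the post). The sketch assumes the standard Lloyd iteration, a naive initialisation that takes the first k points as starting means, and made-up 2D points; the function name `kmeans` is hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Partition points into k clusters by repeatedly assigning to the nearest mean."""
    # Assumption: naive initialisation using the first k points as the means.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments
        # Update step: each mean becomes the average of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Made-up points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 10.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

In practice the result depends on the starting means, which is why real implementations usually try several random starts.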

Random Forests in predicting wines

Updated 2014 September 17th to reflect changes in the R packages. Source: http://mkseo.pe.kr/stats/?p=220. Using random forests to predict wines derived from three different cultivars. Download the wine data set from the Machine Learning Repository.

Visualising hierarchical clustering results

I’ve written about hierarchical clustering before as an attempt to understand it better. Within R, you can plot hierarchical clustering results; however, when working with a large dataset you may produce plots where all the labels overlap and none of them can be read. During my…

Hierarchical clustering

Hierarchical clustering is a cluster analysis method that builds a hierarchy of clusters. There are two types of strategies: agglomerative and divisive. This post is about agglomerative (bottom-up) hierarchical clustering, using the USArrests data set. You can see some correlation between assault and murder. A prettier plot that also shows Pearson correlations using…
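
To make the bottom-up strategy concrete, here is a minimal Python sketch (not the R code from the post): every observation starts as its own cluster and the two closest clusters are merged until one remains. Complete linkage with Euclidean distance is an assumption, as the excerpt does not state a linkage; the function names and the four toy points are hypothetical and are not the USArrests data.

```python
from itertools import combinations
from math import dist


def complete_linkage(a, b):
    """Distance between two clusters: the largest pairwise point distance."""
    return max(dist(p, q) for p in a for q in b)


def agglomerate(points, labels):
    """Merge the two closest clusters repeatedly; return the merge history."""
    # Each cluster is (tuple of labels, list of points); start with singletons.
    clusters = [((label,), [point]) for label, point in zip(labels, points)]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        height = complete_linkage(clusters[i][1], clusters[j][1])
        merges.append((clusters[i][0], clusters[j][0], height))
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        # Replace the two merged clusters with their union.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


# Toy data, purely for illustration.
labels = ["A", "B", "C", "D"]
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = agglomerate(points, labels)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The merge heights recorded here are what a dendrogram draws on its vertical axis.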
