Gene Ontology Graph Visualisation

After scouring the web all afternoon for a way to visualise gene ontology (GO) terms that I had already found to be over-represented, I finally found a simple solution. Before this, I had tried several Cytoscape plugins (BiNGO, ClueGO, etc.), online web tools (REVIGO, GOrilla, WEGO, GOLEM, etc.) and others I can't be bothered...

K means clustering

Updated: 2014 March 13th. From Wikipedia: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the...
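The definition above maps almost directly onto Lloyd's algorithm, the usual heuristic for k-means: assign every observation to its nearest mean, recompute each mean from its members, and repeat until the assignments stop changing. Here is a minimal pure-Python sketch; the function name `kmeans`, the random initialisation and the sample points are illustrative assumptions, not code from the post.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    # Initialise the centroids as k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because the result depends on the starting centroids, it is common to run k-means from several seeds and keep the partition with the smallest within-cluster sum of squares.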

Random Forests in predicting wines

Updated: 2014 September 17th, to reflect changes in the R packages. Source: http://mkseo.pe.kr/stats/?p=220. Using Random Forests to predict wines derived from three different cultivars. Download the wine data set from the Machine Learning Repository.
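The post itself works in R. Purely as a rough analogue, the same task can be sketched in Python with scikit-learn, which ships a copy of the UCI wine data as `load_wine`. The 70/30 split, `n_estimators=500` and the `random_state` values below are arbitrary illustrative choices, not the settings used in the post.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 178 wines, 13 chemical measurements each, labelled by cultivar (0, 1, 2).
wine = load_wine()
X, y = wine.data, wine.target

# Hold out 30% of the wines, keeping the cultivar proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit a forest of 500 trees and measure accuracy on the held-out wines.
model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# Rank the measurements by the forest's impurity-based importance.
top = sorted(zip(model.feature_importances_, wine.feature_names), reverse=True)[:3]
for importance, name in top:
    print(f"{name}: {importance:.3f}")
```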

Kobe Bryant and the 2012 Lakers

Kobe Bryant and the Lakers (11-14) aren't doing as well as I had expected, given the team they acquired in the off-season. Everyone likes to point out that when he scores over a certain number of points (e.g. 30), the Lakers have lost more games than they have won. So I took his stats for this...

Explaining PCA to a schoolchild

Ed Yong asked on Twitter: “Explain principal component analysis to a schoolchild in a tweet.” Since I can’t explain PCA eloquently, I found this interesting and wanted to keep a record of the replies for future reference. Here are some of the modified replies, with my favourite first (and the rest in no particular order):...
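For something more concrete than a tweet: PCA looks for the direction along which the data vary the most. The pure-Python sketch below shows only that first step, assuming a small list of numeric tuples. It finds the first principal component by power iteration on the covariance matrix rather than a full eigendecomposition, and the function name and sample points are made up for illustration.

```python
import math
import random


def first_principal_component(data, iters=200, seed=0):
    """Return the leading principal direction and each point's score along it."""
    n, d = len(data), len(data[0])
    # 1. Centre each variable on its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # 2. Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # 3. Power iteration: repeatedly multiplying by the covariance matrix
    #    pulls a random unit vector towards the direction of greatest variance.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the data have no variance along any direction
        v = [x / norm for x in w]
    # 4. Project each centred point onto that direction (its PC1 score).
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores


# Made-up points lying roughly along the line y = 2x.
pts = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
direction, pc1 = first_principal_component(pts)
print(direction, pc1)
```

The returned direction should point roughly along (1, 2), up to sign, which is the "line of best spread" many of the replies describe in words.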