Thoughts on AI

I have been interested in machine learning since I was a PhD student back in 2010. I was always in awe of papers that applied machine learning to cluster or classify data. My original PhD topic was to discover biomarkers in blood for Parkinson's Disease (PD) patients who were early onset and had not…

TIL about Open WebUI

I just heard about Open WebUI from Harish's comment on my Ollama and DeepSeek post and decided to check it out. From their website: "Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with built-in inference engine for…"

TIL about Ollama and DeepSeek

By now you have probably heard about DeepSeek. In the extremely unlikely event that you have come across this post before reading about DeepSeek, please go look them up. When I first heard about them and their amazing LLMs, the first thing I wanted to do was to try them out!…

Gene Set Variation Analysis

Gene Set Variation Analysis (GSVA) is another popular analysis method for bulk RNA-seq data. GSVA differs from Gene Set Enrichment Analysis (GSEA) in that it estimates gene set enrichment for each sample individually. GSEA typically uses results from a differential expression analysis, which requires multiple samples, to determine whether there is an enrichment…

Using the GenomicDataCommons package

The {GenomicDataCommons} Bioconductor package provides basic infrastructure for querying, accessing, and mining genomic datasets available from the Genomic Data Commons (GDC). The "About the GDC" webpage gives a brief description of the program: "The Genomic Data Commons (GDC) is a research program of the National Cancer Institute (NCI). The mission of the GDC is to…"

Downloading molecular signatures from MSigDB in R

The Molecular Signatures Database (MSigDB) is a useful collection of gene sets designed for use with Gene Set Enrichment Analysis (GSEA) and its variants. It was developed alongside GSEA by the Broad Institute, which still maintains it; you can read more in the classic paper: Gene set enrichment analysis: A knowledge-based…

Ensembl Gene IDs to gene symbols

For converting Ensembl Gene IDs to gene symbols, {biomaRt} is often recommended, and it is indeed what I typically use. However, I recently needed to use Ensembl version 112 and could not get {biomaRt} to work with that specific version. Here's what I tried: I used listEnsemblArchives() to find the host URL for version 112,…
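
If {biomaRt} will not cooperate with a particular release, one workaround is to take the mapping straight from that release's GTF annotation, where every gene record carries a gene_id attribute and, when the gene has a symbol, a gene_name attribute. This is a minimal sketch under stated assumptions, not necessarily the route the full post takes: it assumes the standard Ensembl GTF layout (tab-separated, feature type in column 3, attributes in column 9), and both the gtf_to_symbols name and the example file name are hypothetical.

```shell
#!/usr/bin/env bash
# gtf_to_symbols (hypothetical helper): print "gene_id<TAB>gene_name" for
# every gene record in an Ensembl GTF read from the given files or stdin.
# Assumes the standard GTF layout: 9 tab-separated columns, the feature
# type in column 3 and attributes like
#   gene_id "ENSG..."; gene_version "..."; gene_name "..."; ...
# in column 9. Genes without a gene_name attribute are skipped.
gtf_to_symbols() {
  awk -F '\t' '
    $3 == "gene" {
      id = ""; name = ""
      # gene_id "X": the value starts 9 chars in; drop the closing quote
      if (match($9, /gene_id "[^"]+"/))   id   = substr($9, RSTART + 9,  RLENGTH - 10)
      # gene_name "X": the value starts 11 chars in; drop the closing quote
      if (match($9, /gene_name "[^"]+"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
      if (id != "" && name != "") print id "\t" name
    }' "$@"
}

# Example (file name is an assumption based on Ensembl's usual naming):
#   zcat Homo_sapiens.GRCh38.112.gtf.gz | gtf_to_symbols > ensembl112_symbols.tsv
```

The output is a two-column, tab-separated table that can be read into R and joined onto a results table by gene ID. Gene IDs in Ensembl GTFs carry no version suffix (the version is stored separately in gene_version), so versioned IDs with a ".N" suffix would need it stripped before matching.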

Running a fork bomb

Since it was Halloween and all, I shared with some of my colleagues an article listing scary Linux commands that one should never run. One of them was a fork bomb, which looks like this:

:(){ :|:& };:

In Bash, a function is defined like so:

function_name () {
  commands
}

So the fork bomb starts…
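
To see why those few characters are so dangerous without actually setting one off, here is a minimal sketch. The readable form of the bomb appears only in comments, and tame_bomb is a hypothetical, harmless stand-in (not part of the original command): it keeps the "each call makes two more calls" shape but drops the pipe and the &, so the calls run one after another in subshells, and it stops at a fixed depth.

```shell
#!/usr/bin/env bash
# The fork bomb :(){ :|:& };: is just a Bash function named ":".
# With a readable name it would be (shown as comments only; do NOT run):
#
#   bomb() {
#     bomb | bomb &   # call itself twice via a pipe, in the background
#   }
#   bomb              # one initial call sets it off
#
# tame_bomb (a hypothetical, harmless stand-in) keeps the "each call
# makes two more calls" shape but runs them one after another and stops
# at depth $2, so it always terminates.
tame_bomb() {
  local depth=$1 max=$2
  echo "$depth"                  # one line of output per call
  if (( depth < max )); then
    # each call runs in a subshell (a new process), like each side of ":|:"
    ( tame_bomb $((depth + 1)) "$max" )
    ( tame_bomb $((depth + 1)) "$max" )
  fi
}

# Count calls at each depth: 1 at depth 0, then 2, 4 and 8.
tame_bomb 0 3 | sort -n | uniq -c
```

With a maximum depth of 3 that is 1 + 2 + 4 + 8 = 15 calls. The real bomb has no depth limit and puts every call in the background, so the number of processes keeps doubling until the system hits its process limit.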

An example differential gene expression results table

This post walks through the analysis steps used to create a differential gene expression results table from RNA-seq counts summarised using nf-core/rnaseq. The comparison was between two conditions: normal versus (lung) cancer. We will be using {edgeR}, so install it if you haven't already:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("edgeR")

We will also…