Wrap and unwrap a line in Vim

Last updated: 2023/07/06

Vim is a text editor that is renowned for having a steep learning curve. People like to illustrate its difficulty by pointing out that Vim is so unfriendly to new users that they do not even know how to quit the program! I learned Vim (and Perl) because that was what my…

Amino acid quiz using Python

I keep forgetting amino acid abbreviations and their classes because I never bothered learning them. To see if I can teach an old dog (me) new tricks, I wrote this simple Python script that asks questions about amino acids. I’ll keep doing the quiz until I can finally remember. Below is the Python code for…

Reading a list of files into a single R data frame

I had been using map_dfr from the purrr package to load multiple files into one single data frame. But this function has been superseded with the following explanation: The functions were superseded in purrr 1.0.0 because their names suggest they work like _lgl(), _int(), etc which require length 1 outputs, but actually they return results…

Reading irregular data into R

Some bioinformatics tools output files that are visually nice and are meant for manual inspection. This practice of generating visually nice output and/or Excel files may be rooted in how bioinformaticians and biologists used to work with each other: give the bioinformaticians some data to analyse and they will generate output that will be manually…

Egoprompting

I recently read a news article about someone suing ChatGPT for defamation. Before reading it, I had never considered asking a Large Language Model (LLM) about myself. This reminded me of egosurfing, which is the act of using your own name as a keyword on one or more search engines to see…

Check where a gene is expressed from the command line

The Pachter Lab have developed some very useful bioinformatics software. In this post, I use gget to quickly query ARCHS4 on the command line to see where a gene of interest is expressed. The gget tool has other functionality too including sequence alignment, enrichment analysis, and even protein structure prediction using AlphaFold. Check it out!…

Setting up a VNC server on AWS EC2

Wikipedia provides a useful definition of Virtual Network Computing (VNC): Virtual Network Computing (VNC) is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol (RFB) to remotely control another computer. It transmits the keyboard and mouse input from one computer to another, relaying the graphical-screen updates, over a network. What this means is…

TIL that you can download SRA data from AWS

The Sequence Read Archive (SRA) is the largest publicly available repository of high-throughput sequencing data. (Fun fact: it used to be called the Short Read Archive since most of the data was from short-read sequencers.) The tool fastq-dump from the SRA Toolkit can be used to download SRA data. A while ago I…

R function for calculating confusion matrix rates

Last updated: 2023/03/10

I often forget the names and aliases of confusion matrix rates (and how to calculate them) and have to look them up. Eventually I had had enough and looked for a single function that could calculate the most commonly used rates, like sensitivity or precision, but I couldn’t find one that didn’t…

Grepping PowerPoint files

Last updated: 2023/03/07

I’m not really a fan of PowerPoint but it’s ubiquitous in research, so I have to work with PowerPoint files. Sometimes I need to find a slide amongst a pile of them and I waste a lot of time opening and closing files. I wondered whether I could grep PowerPoint files and sure…
