Check where a gene is expressed from the command line

The Pachter Lab has developed some very useful bioinformatics software. In this post, I use gget to quickly query ARCHS4 from the command line to see where a gene of interest is expressed. The gget tool has other functionality too, including sequence alignment, enrichment analysis, and even protein structure prediction using AlphaFold. Check it out!…


VCF to PED

One of the classic bioinformatics problems is converting files from one format into another. In this post, I go through the process of creating a PED and MAP file from a VCF file. All the files described in this post are available in this GitHub repository.
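As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not the code from the post or the repository. The function name `vcf_to_ped_map` is hypothetical, and the sketch assumes an uncompressed VCF with a `GT` entry in the FORMAT column. Family and individual IDs are both set to the sample name, parents and sex are set to unknown (`0`), and phenotype is set to missing (`-9`).

```python
def vcf_to_ped_map(vcf_lines):
    """Convert VCF text lines into PED rows and MAP rows (lists of strings)."""
    samples = []
    map_rows = []
    genotypes = {}  # sample name -> flat list of alleles, two per variant

    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at the 10th column
            genotypes = {s: [] for s in samples}
            continue

        chrom, pos, vid, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")
        gt_index = fields[8].split(":").index("GT")
        if vid == ".":
            vid = f"{chrom}:{pos}"  # assumption: build an ID when none is given
        # MAP columns: chromosome, variant ID, genetic distance, bp position
        map_rows.append([chrom, vid, "0", pos])

        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            # Missing alleles (".") are written as "0" in PED
            pair = [alleles[int(a)] if a.isdigit() else "0" for a in gt]
            if len(pair) == 1:
                pair = pair * 2  # assumption: write haploid calls as homozygous
            genotypes[sample].extend(pair)

    # PED columns: family, individual, father, mother, sex, phenotype, alleles
    ped_rows = [[s, s, "0", "0", "0", "-9"] + genotypes[s] for s in samples]
    return ped_rows, map_rows
```

Each returned row can be joined with spaces or tabs and written to the `.ped` and `.map` files. Phasing information is discarded, since PED has no way to represent it.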


BED to GRanges

Updated 2015 April 6th to include the intersect_bed() function in the bedr package. Last year I saw a post on Writing an R package from scratch and I had always wanted to follow the tutorial. Yesterday, while trying to make some plots using Gviz, I had some BED-like files (not supported by Gviz), which I wanted…
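The key detail in any BED to GRanges conversion is the coordinate system: BED intervals are 0-based and half-open, while GRanges uses 1-based, closed intervals, so the start shifts by one and the end stays the same. The Python sketch below only illustrates that arithmetic; it is a stand-in, not the R package from the post. The `Range` type and the `bed_to_range` function are hypothetical names.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """A GRanges-like record using 1-based, inclusive coordinates."""
    seqname: str
    start: int
    end: int
    strand: str


def bed_to_range(bed_line):
    """Convert one tab-delimited BED line (0-based, half-open) to a Range."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Strand is the optional 6th BED column; "*" marks an unknown strand
    strand = fields[5] if len(fields) >= 6 and fields[5] in ("+", "-") else "*"
    # Only the start changes: 0-based start + 1 gives the 1-based start
    return Range(chrom, start + 1, end, strand)


if __name__ == "__main__":
    print(bed_to_range("chr1\t0\t100\tfeature\t0\t-"))
```

The width of the interval is preserved: a BED interval of `end - start` bases becomes a range of `end - start + 1` bases in the new coordinates, which is the same number.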


BAM to CRAM

Last updated: 2024/12/04. TL;DR: As pointed out by Colin, converting a BAM file to CRAM is simply one command: samtools view -T genome/chrX.fa -C -o eg/ERR188273_chrX.cram eg/ERR188273_chrX.bam. Of note is that the reference file used to produce the BAM file is required and is used as an argument for the -T option. As for why…
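If the conversion needs to be repeated over many files, the same command can be assembled from a script. The Python sketch below is one way to do that, assuming samtools is installed and on the PATH; the function name `bam_to_cram` and its `dry_run` parameter are hypothetical. It only reproduces the options shown in the excerpt (`-T`, `-C`, `-o`).

```python
import subprocess


def bam_to_cram(reference_fasta, bam_path, cram_path, dry_run=True):
    """Build (and optionally run) the samtools command that writes a CRAM file."""
    cmd = [
        "samtools", "view",
        "-T", reference_fasta,  # reference used to produce the BAM file
        "-C",                   # write CRAM output
        "-o", cram_path,
        bam_path,
    ]
    if not dry_run:
        # Raises CalledProcessError if samtools exits with a non-zero status
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    cmd = bam_to_cram(
        "genome/chrX.fa",
        "eg/ERR188273_chrX.bam",
        "eg/ERR188273_chrX.cram",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.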


Getting started with Picard

Updated hyperlinks on 2015 January 26th; please comment if you find any more dead links. Picard is a suite of Java-based command-line utilities that manipulate SAM/BAM files. Currently, I’m analysing some paired-end libraries and I wanted to calculate the average insert size based on the alignments; that’s how I found Picard. While reading the…
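To make the idea of an "average insert size based on the alignments" concrete, here is a minimal Python sketch that reads SAM text records and averages the TLEN field (column 9). This is not how Picard computes its metrics; it is a simplified illustration. The function name `mean_insert_size` is hypothetical, and the filtering choices (properly paired reads only, primary alignments only, positive TLEN so that each pair is counted once) are assumptions.

```python
def mean_insert_size(sam_lines):
    """Return the mean TLEN over properly paired primary alignments, or None."""
    total = 0
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & 0x2:
            continue  # keep only reads mapped in a proper pair
        if flag & 0x900:
            continue  # drop secondary (0x100) and supplementary (0x800) records
        if tlen <= 0:
            continue  # the mate carries the negative TLEN; count each pair once
        total += tlen
        count += 1
    return total / count if count else None
```

The input could come from the output of `samtools view` on a BAM file, piped in or read line by line. For real analyses, a dedicated tool such as Picard also reports the distribution of insert sizes, which a single mean hides.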


I’ve joined Twitter

Today, while reading a paper, I found some interesting one-liner facts. They are far too short to write a post about, but I would like to keep a repository of them. What better place to store these facts than Twitter! You can follow me on Twitter for a list of facts on molecular biology and…


Setting up Windows for bioinformatics

Please refer to Setting up Windows for bioinformatics in 2019. I use Windows on all of my computers. Using just Windows for bioinformatics is not impossible but it’s really just easier to have access to a Linux operating system. In the case of my desktop PC, I have a dual boot setup (Ubuntu and Windows…


DESeq vs. edgeR vs. baySeq

6th April 2012: For a more up-to-date version of this post, please refer to this post. A very simple comparison using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are similar, so I removed the second column in the file corresponding to T1a: Hierarchical clustering…


Finding genes with co-expression patterns

Can the R Bioconductor package “WGCNA” find artefactually created modules? First, some (subpar) code to generate an artefactual list of genes with co-expression patterns (modules). Running the code: ./generate_random_module.pl 10 1000 20 > 10_sample_1000_list_20_module.tsv Patterns: 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1…
