Sequence analysis of SARS-CoV-2 part 3

This post continues a series on the sequence analysis of SARS-CoV-2; see part 1 and part 2 if you haven’t already. Since my first post, I found out that you can BLAST sequences against a Betacoronavirus database on NCBI BLAST. The database, as of 2020/03/10, has a total of 7,844…

Sequence analysis of SARS-CoV-2 part 2

I ended my previous post on the sequence analysis of SARS-CoV-2 with the amino acid alignment of the spike protein from SARS-CoV-2 (MN908947) and Bat coronavirus RaTG13 (MN996532). The spike protein is of particular interest because it is through its binding to the angiotensin-converting enzyme 2 (ACE2) receptor that it is able to…

Sequence analysis of SARS-CoV-2

The article “A new coronavirus associated with human respiratory disease in China” released the full genome sequence (29,903 nt) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In the paper, meta-transcriptomic sequencing was performed on bronchoalveolar lavage fluid (BALF) from a 41-year-old male suffering from a severe respiratory disease. Contigs were assembled using…

Reproducible Bioinformatics

I will be giving a workshop titled “Reproducible Bioinformatics” at BioC Asia tomorrow. I have been thinking a lot about this topic and my aim for the workshop is to introduce computational tools and demonstrate how they can be used to help promote reproducibility when performing bioinformatic analyses. Ensuring reproducibility shouldn’t be an extra burden…

Comparing VCF files

In this post, I will compare different tools for comparing VCF files. To create a reproducible example, I will make use of Docker and Conda. I highly recommend learning about these tools if you haven’t already; they make it easier to reproduce your work. I have written some notes on Docker and Conda that may be…

The Golden Rule of Bioinformatics

I’m a big fan of the book Bioinformatics Data Skills by Vince Buffalo and I highly recommend it to everyone who works in the bioinformatics field. The book introduces the reader to The Golden Rule of Bioinformatics, which is: Never ever trust your tools (or data). I am a strong proponent of this rule, which…

Getting started with HISAT, StringTie, and Ballgown

A popular toolset for analysing RNA-seq data is the Tuxedo suite, which consists of TopHat and Cufflinks. The suite provided a start-to-finish pipeline that allowed users to map reads, assemble transcripts, and perform differential expression analyses. A newer “Tuxedo suite” has been developed and is made up of three tools: HISAT, StringTie,…
