Getting started with HISAT, StringTie, and Ballgown

A popular toolset used for analysing RNA-seq data is the tuxedo suite, which consists of TopHat and Cufflinks. The suite provided a start-to-finish pipeline that allowed users to map reads, assemble transcripts, and perform differential expression analyses. A newer "tuxedo suite" has been developed and is made up of three tools: HISAT, StringTie,...

Read GTF file into R

The Gene Transfer Format (GTF) is a refinement of the General Feature Format (GFF). A GFF file has nine columns:
seqname: The name of the sequence; must be a chromosome or scaffold.
source: The program that generated this feature.
feature: The name of this type of feature, e.g. "CDS", "start_codon", "stop_codon", and "exon".
start: The...
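
The post itself reads the file into R; as a language-neutral illustration of the nine-column layout, here is a minimal Python sketch that splits one GTF line into named fields. The column names after start (end, score, strand, frame, attribute) follow the usual GTF convention, the names GtfRecord, parse_attributes and parse_gtf_line are hypothetical, and the attribute parsing is deliberately simplified.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str     # chromosome or scaffold name
    source: str      # program that generated the feature
    feature: str     # feature type, e.g. "exon" or "CDS"
    start: int       # 1-based start coordinate
    end: int         # 1-based, inclusive end coordinate
    score: str       # "." when absent
    strand: str      # "+", "-" or "."
    frame: str       # 0, 1 or 2 for coding features, otherwise "."
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GTF attribute column like 'gene_id "G1"; transcript_id "T1";' into a dict.

    Simplification (assumption): no ';' inside quoted values, and only the last
    value is kept when a key is repeated.
    """
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one tab-separated GTF data line into a GtfRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], parse_attributes(cols[8]))


# Hypothetical GTF line; the IDs are placeholders, not real annotation.
line = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
rec = parse_gtf_line(line)
print(rec.feature, rec.start, rec.end, rec.attributes["gene_id"])  # exon 100 200 GENE1
```

Using a NamedTuple keeps each column addressable by name while staying as lightweight as a plain tuple.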

Learning about Snakemake

As promised two years ago, here's a short blog post on Snakemake. I have been using Bpipe to manage my workflows/pipelines, but Snakemake has been mentioned to me on more than one occasion; in particular: "@davetang31 seems useful if testing many diff types of pipelines. snakemake seems a bit more practical if analyzing many samples..."

Gene to OMIM phenotype

A couple of weeks ago, I wrote a post on identifying OMIM phenotypes that are associated with a gene of interest. I thought I had solved the problem by using one of my favourite R packages (biomaRt), but alas, no. For example, I could not find any OMIM IDs associated with the TTN gene using biomaRt. In...

VCF to PED

One of the classic bioinformatics problems is converting files from one format into another. In this post, I go through the process of creating PED and MAP files from a VCF file. All the files described in this post are available in this GitHub repository.
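
To make the conversion concrete, here is a minimal Python sketch of the idea; it is not the code from the post, which lives in the linked repository. It assumes the standard PLINK text formats: MAP rows of chromosome, variant ID, genetic distance and base-pair position, and PED rows of six leading columns followed by two alleles per variant. The function names parse_vcf_line and vcf_to_ped_map, and the placeholder values for family ID, parents, sex (0) and phenotype (-9), are assumptions for illustration.

```python
from typing import Dict, List, Tuple

MISSING = ("0", "0")  # PLINK's default code for a missing genotype


def parse_vcf_line(line: str, samples: List[str]) -> Tuple[List[str], Dict[str, Tuple[str, str]]]:
    """Return one MAP row and each sample's allele pair for a single VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = cols[:5]
    alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
    gt_index = cols[8].split(":").index("GT")  # position of GT in the FORMAT column
    # MAP columns: chromosome, variant ID, genetic distance (0 = unknown), bp position
    map_row = [chrom, var_id if var_id != "." else f"{chrom}:{pos}", "0", pos]
    calls: Dict[str, Tuple[str, str]] = {}
    for sample, field in zip(samples, cols[9:]):
        gt = field.split(":")[gt_index].replace("|", "/").split("/")
        if len(gt) != 2 or "." in gt:
            calls[sample] = MISSING  # missing or non-diploid call
        else:
            calls[sample] = (alleles[int(gt[0])], alleles[int(gt[1])])
    return map_row, calls


def vcf_to_ped_map(vcf_lines: List[str]) -> Tuple[List[str], List[str]]:
    """Convert VCF text lines into PED rows (one per sample) and MAP rows (one per variant)."""
    samples: List[str] = []
    map_rows: List[str] = []
    per_sample: Dict[str, List[str]] = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            samples = line.rstrip("\n").split("\t")[9:]
            per_sample = {s: [] for s in samples}
            continue
        map_row, calls = parse_vcf_line(line, samples)
        map_rows.append(" ".join(map_row))
        for s in samples:
            per_sample[s].extend(calls[s])
    # PED columns: family ID, individual ID, father, mother, sex (0 = unknown),
    # phenotype (-9 = missing), then two alleles per variant in MAP order
    ped_rows = [" ".join([s, s, "0", "0", "0", "-9"] + per_sample[s]) for s in samples]
    return ped_rows, map_rows


# Hypothetical two-sample VCF
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1|1",
    "1\t2000\t.\tC\tT,G\t.\tPASS\t.\tGT:DP\t./.:3\t0/2:10",
]
ped, map_ = vcf_to_ped_map(vcf)
print("\n".join(ped))   # S1 S1 0 0 0 -9 A G 0 0 / S2 S2 0 0 0 -9 G G C G
print("\n".join(map_))  # 1 rs1 0 1000 / 1 1:2000 0 2000
```

PLINK 1 PED files generally expect biallelic variants, so a real conversion would usually split or drop multi-allelic sites such as the second record here; haploid calls are simply marked missing in this sketch.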

Exploring the UK10K variants

It has been almost two months since my last post; I have been busy preparing a fellowship application (which has been sent off!) and am now preparing and writing papers. Sadly, I've pushed blogging right down the priority list, even though it's one of the things I enjoy doing the most. This...

Getting started with GEMINI

Now that I've become acquainted with DNA sequencing data, it's finally time to explore DNA variation. A tool that makes this easy is GEMINI, and this post briefly demonstrates some of its functionality. I have only used GEMINI sparingly, and what I know about the tool is gathered mostly from their documentation and tutorials....