Gene Set Variation Analysis

Gene Set Variation Analysis (GSVA) is another popular analysis method for bulk RNA-seq data. GSVA differs from Gene Set Enrichment Analysis (GSEA) in that it estimates gene set enrichment for each individual sample. GSEA typically uses results from a differential expression analysis, which requires multiple samples, to determine whether there is an enrichment…
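
GSVA's own algorithm is more involved: it estimates each gene's expression distribution with a kernel and scores gene sets with a random-walk statistic. As a much simpler illustration of "one enrichment score per gene set per sample", here is a Python sketch of the combined z-score approach, which the GSVA package also implements as one of its alternative methods. It is not GSVA's default method. The function name and the dict-based input layout are hypothetical, and each gene needs at least two samples for the standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_gene_set_scores(expr, gene_sets):
    """Combined z-score per gene set and sample.

    expr: dict mapping gene -> list of (log-scale) expression values,
          one value per sample, all lists in the same sample order.
    gene_sets: dict mapping gene set name -> list of member genes.
    Returns a dict mapping gene set name -> list of per-sample scores.
    Gene sets with no genes present in expr are skipped.
    """
    if not expr:
        return {}
    # Standardise each gene across samples: z = (x - mean) / sd.
    z = {}
    for gene, values in expr.items():
        m, s = mean(values), stdev(values)
        # Constant genes carry no signal, so they get z = 0 everywhere.
        z[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]

    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]
        if not members:
            continue
        # Sum member z-scores within each sample, scaled by sqrt(set size).
        scores[name] = [
            sum(z[g][j] for g in members) / sqrt(len(members))
            for j in range(n_samples)
        ]
    return scores


expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [5, 5, 5]}
print(zscore_gene_set_scores(expr, {"S1": ["A", "B"], "S2": ["C"]}))
```

The result has the same shape as GSVA output: a gene set by sample matrix of scores that can then be compared across samples or groups.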

Using the GenomicDataCommons package

The {GenomicDataCommons} Bioconductor package provides basic infrastructure for querying, accessing, and mining genomic datasets available from the Genomic Data Commons (GDC). The About the GDC webpage provides a brief description of the program: The Genomic Data Commons (GDC) is a research program of the National Cancer Institute (NCI). The mission of the GDC is to…

Downloading molecular signatures from MSigDB in R

The Molecular Signatures Database (MSigDB) is a useful resource of gene set collections designed for use with Gene Set Enrichment Analysis (GSEA) and its variants. It was developed alongside GSEA at the Broad Institute, which still maintains it; you can read more in the classic paper: Gene set enrichment analysis: A knowledge-based…
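
MSigDB gene sets can be downloaded as GMT files, a tab-separated text format in which each line holds a gene set name, a description (often a URL), and then the member genes. The post works in R; the following is a minimal Python sketch for reading such a file, and the file name in the usage comment is hypothetical.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict: gene set name -> list of genes.

    Each non-empty line is tab-separated: set name, description, then genes.
    """
    gene_sets = {}
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_number}: expected name and description")
        name, genes = fields[0], fields[2:]  # fields[1] is the description/URL
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


# Usage with a hypothetical local file downloaded from MSigDB:
# with open("my_gene_sets.gmt") as handle:
#     gene_sets = parse_gmt(handle)
```

Because a file handle iterates over lines, the same function works on an open file or on a list of strings.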

Annotating variants with a custom file

The Variant Effect Predictor (VEP) tool can annotate variants against custom annotation sources. This is useful when gene models of interest are not represented in the Ensembl or RefSeq databases. To get started, install VEP first, since installation takes some time.

Getting started with Arabidopsis thaliana genomics

I have started working on Arabidopsis thaliana, as I mentioned in my last post. As noted in the Encyclopedia of Life: Arabidopsis thaliana is the most widely used model organism in plant biology. Its small genome size, fully sequenced in the year 2000, chromosome number, fast growth cycle (from seed germination to set in…

Incidental findings using GEMINI

The American College of Medical Genetics and Genomics (ACMG) has recommended that variants classified as pathogenic or likely pathogenic in certain genes be reported back to the patient. The latest list of genes can be found here. How do I assess whether a variant is pathogenic or likely pathogenic? Use this tool,…
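
Once each variant has a gene name and a ClinVar classification attached, the reporting step itself is a filter. The Python sketch below assumes variants as dicts with hypothetical `gene` and `clinvar_sig` keys, and uses a tiny illustrative gene subset rather than the actual ACMG list; it is not how GEMINI itself does this.

```python
import re

# Illustrative subset only; in practice load the current ACMG gene list.
ACMG_GENES_SUBSET = {"BRCA1", "BRCA2", "MLH1"}

# ClinVar classifications of interest, normalised to lower case with spaces.
REPORTABLE = {"pathogenic", "likely pathogenic"}


def reportable_variants(variants, genes):
    """Return variants in `genes` with a pathogenic or likely pathogenic
    ClinVar classification.

    variants: iterable of dicts with hypothetical keys "gene" and
    "clinvar_sig"; clinvar_sig may combine several terms separated by
    "|", "/" or "," (e.g. "Pathogenic/Likely_pathogenic").
    """
    hits = []
    for variant in variants:
        if variant.get("gene") not in genes:
            continue
        sig = (variant.get("clinvar_sig") or "").lower().replace("_", " ")
        terms = {term.strip() for term in re.split(r"[|/,]", sig)}
        if terms & REPORTABLE:
            hits.append(variant)
    return hits
```

Terms such as "Conflicting_interpretations_of_pathogenicity" are deliberately not matched; those need manual review.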

Summary plots from GEMINI

I’m a fan of GEMINI and have been using it for a year and a half on various exome projects. I have written two scripts that generate variant summaries from a GEMINI database. I prefer bar plots to the pie charts created by VEP.

[Figure: a summary pie chart created by VEP.]
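
A GEMINI database is a SQLite file, so a variant summary can start from a plain SQL query. The Python sketch below is not one of the scripts mentioned above; it assumes a `variants` table with an `impact` column, and the text-based bar chart stands in for a real plotting library.

```python
import sqlite3


def impact_counts(conn):
    """Count variants per impact class in a GEMINI-style SQLite database.

    Assumes a `variants` table with an `impact` column.
    Returns (impact, count) pairs, most frequent first.
    """
    return conn.execute(
        "SELECT impact, COUNT(*) AS n FROM variants "
        "GROUP BY impact ORDER BY n DESC, impact"
    ).fetchall()


def text_bar_chart(counts, width=40):
    """Render (label, count) pairs as a simple horizontal text bar chart."""
    if not counts:
        return ""
    top = max(n for _, n in counts)
    label_width = max(len(str(label)) for label, _ in counts)
    lines = []
    for label, n in counts:
        # Bar length is proportional to the count; non-zero counts get >= 1 '#'.
        bar = "#" * max(1, round(width * n / top)) if n else ""
        lines.append(f"{str(label):<{label_width}} {bar} {n}")
    return "\n".join(lines)


# Usage with a hypothetical database path:
# conn = sqlite3.connect("my_project.db")
# print(text_bar_chart(impact_counts(conn)))
```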

gnomAD allele frequency of pathogenic ClinVar variants

Updated June 7th, 2018. The Genome Aggregation Database (gnomAD) VCF files recently became available for download, as announced by Daniel MacArthur (@dgmacarthur) on February 27, 2017: "The long-awaited gnomAD VCF is here – sites + frequencies for 123,136 exomes and 15,496 genomes: https://t.co/8puaTvJ45w"
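
Looking up allele frequencies starts with pulling the AF value out of the INFO column of a VCF record. This is a minimal Python sketch, assuming an uncompressed, tab-separated VCF data line in which AF holds one comma-separated value per ALT allele. Real gnomAD VCFs carry many additional frequency keys, and a VCF library such as pysam or cyvcf2 is the more robust choice in practice.

```python
def info_value(vcf_line, key):
    """Return the value of `key` in the INFO column of a VCF data line.

    Returns the raw string for key=value entries, True for flags,
    and None if the key is absent.
    """
    columns = vcf_line.rstrip("\r\n").split("\t")
    for entry in columns[7].split(";"):  # INFO is the 8th column
        name, sep, value = entry.partition("=")
        if name == key:  # exact match, so "AF" does not match "AF_xyz"
            return value if sep else True
    return None


def allele_frequencies(vcf_line, key="AF"):
    """Return the AF values (one per ALT allele) as floats.

    Missing values (".") become None so positions still line up with ALT.
    """
    raw = info_value(vcf_line, key)
    if raw is None or raw is True:
        return []
    return [None if x == "." else float(x) for x in raw.split(",")]
```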