Check what genes are correlated to your gene of interest

ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides gene and transcript counts, uniformly processed with kallisto, from all human and mouse RNA-seq experiments in the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA). You can query ARCHS4 for your gene of interest with gget and its archs4 sub-tool. I've previously written about how to use gget to check where a gene is expressed from the command line. In this post, I'll show how to use gget archs4 to find genes whose expression patterns are correlated with your gene of interest, and how to generate a heatmap from that list of correlated genes (plus your gene of interest).

The scripts needed to generate the heatmap are hosted on GitHub. You can clone the repository using git; the necessary scripts are in the script directory.

git clone https://github.com/davetang/archs4_heatmap.git

You will need to install some dependencies if you plan to use the scripts in your environment. Alternatively, I have also prepared a Docker image that contains all the dependencies so you can generate the heatmap easily.
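If you go the local route, a small pre-flight check can tell you which command-line tools are missing before you run anything. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the tool names in the example call (git, gget, Rscript) are my assumption based on the steps described in this post, so check the repository for the actual dependency list.

```shell
# Print each command that cannot be found on PATH, one per line.
# Prints nothing if every command is available.
# The tool names you pass in are up to you; the list in the example
# below is an assumption, not the repository's official dependency list.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (not run here):
#   missing_tools git gget Rscript
```

Any name it prints still needs to be installed. R packages used by heatmap.R are not covered by this check.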

The plot_heatmap.sh script does all the work. The usage is listed below.

Usage: ./script/plot_heatmap.sh
   [ -p | --max-procs INT (default 8) ]
   [ -t | --tmp-dir STR (default /tmp) ]
   [ -k | --keep keep tmp files ]
   [ -s | --species STR (default human) ]
   [ -n | --num-genes INT (default 100) ]
   [ -c | --cluster-cols ]
   [ -v | --version ]
   [ -h | --help ]
   <HGNC gene symbol>

The only mandatory input is the official HGNC gene symbol. The script first runs gget to get the list of genes correlated with your gene of interest, then runs gget once per gene in that list, and finally runs heatmap.R to generate the heatmap.
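To make the first two steps concrete, here is a minimal sketch of how they could be done by hand with gget. This is not the code in plot_heatmap.sh. The function names are hypothetical, and I'm assuming that gget's -csv output has a header row with the gene symbol in the first column, and that the per-gene call uses the tissue mode of gget archs4. The heatmap.R step is left out because its arguments are not described in this post.

```shell
# Command used to call gget; override it (GGET=/path/to/gget) if gget is not on PATH.
GGET="${GGET:-gget}"

# Step 1: print the symbols of the N genes most correlated with GENE, one per line.
# Assumption: CSV output with a header row and the gene symbol in column 1.
top_correlated() {
  gene=$1; n=$2; species=${3:-human}
  "$GGET" archs4 -w correlation -gc "$n" -s "$species" -csv "$gene" |
    tail -n +2 | cut -d, -f1
}

# Step 2: fetch tissue expression for one gene and save it as DIR/GENE.csv.
fetch_expression() {
  gene=$1; dir=$2; species=${3:-human}
  "$GGET" archs4 -w tissue -s "$species" -csv "$gene" > "$dir/$gene.csv"
}

# Steps 1 and 2 together: the gene of interest plus its top-N correlated genes.
fetch_all() {
  query=$1; n=$2; dir=$3
  { printf '%s\n' "$query"; top_correlated "$query" "$n"; } |
    while read -r g; do
      fetch_expression "$g" "$dir" </dev/null
    done
}

# Example call (not run here; needs gget and network access):
#   mkdir -p raw && fetch_all TNF 50 raw
```

This sketch fetches one gene at a time. The --max-procs option of plot_heatmap.sh suggests the real script runs these requests in parallel, which matters once you ask for a few hundred genes.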

If you have Docker installed, you can simply run the following to fetch the 50 genes most correlated with TNF from ARCHS4 and plot the results as a heatmap.

docker run --rm -v $(pwd):$(pwd) -w $(pwd) davetang/archs4_heatmap:0.0.4 -p 4 -n 50 TNF

The command above will generate TNF_top50.png, the heatmap, and TNF_top50.csv, which contains the expression data used for the heatmap.
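A quick way to confirm that a run worked is to check that both files exist and are not empty. The helper below is a hypothetical sketch. The gene_topN naming is inferred from the TNF example above, not taken from the script's documentation.

```shell
# Succeed only if both expected output files exist and are non-empty.
# Assumption: outputs are named <gene>_top<n>.png and <gene>_top<n>.csv,
# as in the TNF example; this pattern is inferred, not documented.
check_outputs() {
  prefix="${1}_top${2}"
  [ -s "$prefix.png" ] && [ -s "$prefix.csv" ]
}

# Example call (not run here):
#   check_outputs TNF 50 && head -n 5 TNF_top50.csv
```

The example call prints the first few lines of the CSV only when both files are present, which is handy for a first look at the expression data.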

I'm not sure what the upper limit is on the number of genes you can request from gget, but here's the top 150.

docker run --rm -v $(pwd):$(pwd) -w $(pwd) davetang/archs4_heatmap:0.0.4 -p 4 -n 150 TNF

plot_heatmap.sh can cluster the columns, but this is turned off by default. To cluster the samples, use the -c parameter. In the command below, I'm using more processors (8) and fetching the 200 genes most correlated with TNF.

docker run --rm -v $(pwd):$(pwd) -w $(pwd) davetang/archs4_heatmap:0.0.4 -c -p 8 -n 200 TNF

If you would like to keep the raw files used for generating the plot, use the -k parameter and set --tmp-dir to the directory where they should be stored. For example, the following command will save the raw files into the current directory.

docker run --rm -v $(pwd):$(pwd) -w $(pwd) davetang/archs4_heatmap:0.0.4 -p 6 -k -t $(pwd) CCL2
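All of the Docker commands in this post share the same prefix, so it can be convenient to wrap it in a shell function. This is just a convenience sketch: archs4_heatmap is a hypothetical name, and the image tag is the one used in the commands above.

```shell
# Wrapper around the Docker invocation used in this post.
# Everything after the function name is passed straight to plot_heatmap.sh.
# Quoting "$PWD" keeps the bind mount working when the current
# directory name contains spaces.
archs4_heatmap() {
  docker run --rm -v "$PWD":"$PWD" -w "$PWD" davetang/archs4_heatmap:0.0.4 "$@"
}

# Example calls (not run here):
#   archs4_heatmap -p 4 -n 50 TNF
#   for gene in TNF CCL2; do archs4_heatmap -p 4 -n 50 "$gene"; done
```

With the function defined in your shell, each run only needs the options and the gene symbol.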

For more information, check out the GitHub repo, and submit an issue if you run into any problems!

Please cite the following if you use this for your work:

You can cite this blog post and/or the GitHub repo if you found this useful.


This post was sponsored by Logos Biosystems.


This work is licensed under a Creative Commons Attribution 4.0 International License.
