Ensembl Gene IDs to gene symbols

To convert Ensembl Gene IDs to gene symbols, {biomaRt} is often recommended, and it is what I typically use. Recently, however, I needed Ensembl version 112 and could not get {biomaRt} to work with that specific version. Here's what I tried:

library(biomaRt)

ensembl <- useMart(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl",
  host = "https://may2024.archive.ensembl.org"
)

Convert Ensembl gene IDs to HUGO Gene Nomenclature Committee (HGNC) gene symbols.

my_genes <- c("ENSG00000118473", "ENSG00000162426")

getBM(
  attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
  filters = "ensembl_gene_id",
  values = my_genes,
  mart = ensembl
)
Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery, : Query ERROR: caught BioMart::Exception::Database: Error during query execution: Table 'ensembl_mart_112.hsapiens_gene_ensembl__ox_hgnc__dm' doesn't exist

I tried the suggestion to use useEnsembl() but I got the same error.

ensembl_112 <- useEnsembl(
  biomart = "genes",
  dataset = "hsapiens_gene_ensembl",
  version = 112
)

getBM(
  attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
  filters = "ensembl_gene_id",
  values = my_genes,
  mart = ensembl_112
)
Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery, : Query ERROR: caught BioMart::Exception::Database: Error during query execution: Table 'ensembl_mart_112.hsapiens_gene_ensembl__ox_hgnc__dm' doesn't exist

I needed Ensembl 112, and {biomaRt} no longer seemed like an option, so I went to the Ensembl FTP site for version 112. After looking through all the directories, I couldn't find a simple file that I could use to create an Ensembl Gene ID to gene symbol lookup. I was about to give up when I found and read the README, which had the following (output is snipped):

|-- mysql  MySQL database per-table text files
|    |
|    |-- ensembl_mart_<release>  BioMart database for genes

I navigated to https://ftp.ensembl.org/pub/release-112/mysql/ensembl_mart_112/, which takes some time to load because the FTP site is slow and there are a lot of files.

After downloading and checking several files, I think I found the file I needed: hsapiens_gene_ensembl__gene__main.txt.gz.

wget https://ftp.ensembl.org/pub/release-112/mysql/ensembl_mart_112/hsapiens_gene_ensembl__gene__main.txt.gz

Unfortunately, this file does not have a header, so I'm not sure what all the columns contain, but I could figure out that I needed columns 7 (Ensembl Gene ID) and 8 (HGNC gene symbol).
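Since only two columns are needed, the file can be reduced to a small lookup table. The following is a minimal sketch, not a definitive recipe: the function name make_lookup is hypothetical, it assumes the file is tab-delimited with the gene ID in column 7 and the symbol in column 8 (as inferred above), and it assumes that a missing symbol may be written as \N, the usual MySQL text-dump marker for NULL, which is worth checking against the actual file.

```bash
# Hypothetical helper: reduce the mart dump to "gene ID <tab> symbol".
# Assumption: column 7 = Ensembl Gene ID, column 8 = gene symbol.
# Assumption: a missing symbol may appear as \N (MySQL NULL marker);
# it is converted to an empty string here.
make_lookup() {
  local infile="$1"
  gzip -dc "$infile" |
    awk -F'\t' -v OFS='\t' '
      $7 != "" {
        sym = ($8 == "\\N" ? "" : $8)
        print $7, sym
      }' |
    sort -u   # drop exact duplicate ID/symbol pairs
}

# Usage (writes a two-column TSV):
# make_lookup hsapiens_gene_ensembl__gene__main.txt.gz > ensembl_112_id_to_symbol.tsv
```

gzip -dc is used instead of zcat only because it behaves the same way on Linux and macOS; on Linux the two are interchangeable here.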

I have some Ensembl Gene IDs where I know the HGNC gene symbol, so I decided to look them up in this file as a confirmation.

  ensembl_gene_id hgnc_symbol
1 ENSG00000118473       SGIP1
2 ENSG00000162426     SLC45A1

zcat hsapiens_gene_ensembl__gene__main.txt.gz | cut -f7,8 | grep ENSG00000118473
ENSG00000118473 SGIP1
zcat hsapiens_gene_ensembl__gene__main.txt.gz | cut -f7,8 | grep ENSG00000162426
ENSG00000162426 SLC45A1

Looks like it's the file I need!
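Running grep once per ID works for a spot check, but it matches substrings anywhere on the line and rescans the file every time. For a longer list, a single awk pass with exact matching on column 7 may be more convenient. This is a sketch under the same column assumptions as above; lookup_symbols and the "NA" placeholder for IDs that are not found are illustrative choices, not part of any Ensembl tooling.

```bash
# Hypothetical helper: look up many IDs in one pass over the file.
# $1 = gzipped mart dump, $2 = text file with one Ensembl Gene ID per line.
# Output keeps the order of the ID file; IDs not found are printed with "NA".
lookup_symbols() {
  local table_gz="$1" ids_file="$2"
  gzip -dc "$table_gz" |
    awk -F'\t' -v OFS='\t' -v idfile="$ids_file" '
      BEGIN {
        # Read the wanted IDs first, remembering their order.
        while ((getline id < idfile) > 0)
          if (id != "") { want[id] = 1; order[++n] = id }
      }
      ($7 in want) { sym[$7] = $8 }   # exact match on column 7
      END {
        for (i = 1; i <= n; i++) {
          id = order[i]
          print id, ((id in sym) ? sym[id] : "NA")
        }
      }'
}

# Usage:
# printf 'ENSG00000118473\nENSG00000162426\n' > my_genes.txt
# lookup_symbols hsapiens_gene_ensembl__gene__main.txt.gz my_genes.txt
```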

zcat hsapiens_gene_ensembl__gene__main.txt.gz | wc -l
70611

Is the URL consistent across versions, so that I can simply change the version number and download the same file for a different version? Yes, at least for version 113:

wget https://ftp.ensembl.org/pub/release-113/mysql/ensembl_mart_113/hsapiens_gene_ensembl__gene__main.txt.gz -O hsapiens_gene_ensembl__gene__main_113.txt.gz
zcat hsapiens_gene_ensembl__gene__main_113.txt.gz | cut -f7,8 | grep ENSG00000162426
ENSG00000162426 SLC45A1
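Because the release number appears twice in the URL, it can be built from a single argument. The sketch below assumes the pattern seen for releases 112 and 113 also holds for other releases, which has not been checked here, and that columns 7 and 8 keep the same meaning; mart_url is a hypothetical helper name.

```bash
# Hypothetical helper: build the download URL for a given Ensembl release.
# Assumption: the path layout observed for releases 112 and 113 is the same
# for the requested release.
mart_url() {
  local release="$1"
  printf 'https://ftp.ensembl.org/pub/release-%s/mysql/ensembl_mart_%s/hsapiens_gene_ensembl__gene__main.txt.gz\n' \
    "$release" "$release"
}

# Usage: download one file per release, with the release in the file name.
# for r in 112 113; do
#   wget "$(mart_url "$r")" -O "hsapiens_gene_ensembl__gene__main_${r}.txt.gz"
# done
```

It is worth repeating the spot check with known gene IDs on every newly downloaded release, since nothing guarantees that the column order stays fixed.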

I sometimes get a network connection error to BioMart, which stops my workflows, so I might just download this file to have an offline way to convert Ensembl Gene IDs to gene symbols.
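One way to make a workflow independent of the network is to download the file only when a valid local copy is missing. The following is a rough sketch of that idea; ensure_local is a hypothetical function name, and the ".part" temporary file suffix is an arbitrary choice. It uses gzip -t to verify the archive so that a truncated download is not mistaken for a usable file.

```bash
# Hypothetical helper: make sure a valid local copy exists.
# $1 = URL to download from, $2 = local destination path.
# Returns 0 if a valid gzip file is in place, 1 if the download failed.
ensure_local() {
  local url="$1" dest="$2"
  # Reuse the existing file if it is non-empty and passes the gzip check.
  if [ -s "$dest" ] && gzip -t "$dest" 2>/dev/null; then
    return 0
  fi
  # Download to a temporary name so a failed transfer never
  # leaves a broken file at the final path.
  local tmp="${dest}.part"
  if wget -q "$url" -O "$tmp" && gzip -t "$tmp" 2>/dev/null; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage:
# ensure_local \
#   https://ftp.ensembl.org/pub/release-112/mysql/ensembl_mart_112/hsapiens_gene_ensembl__gene__main.txt.gz \
#   hsapiens_gene_ensembl__gene__main_112.txt.gz
```

With a valid copy already on disk, the function returns without touching the network, so later steps can read the local file directly.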




This work is licensed under a Creative Commons Attribution 4.0 International License.