Exomiser
The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data starting from a VCF file (version 4). The functional annotation code is based on Jannovar and uses UCSC KnownGene transcript definitions and hg19 genomic coordinates.
http://www.sanger.ac.uk/science/tools/exomiser
Getting started
# version 7
wget -c ftp://ftp.sanger.ac.uk/pub/resources/software/exomiser/downloads/exomiser/exomiser-cli-7.2.1.sha256
wget -c ftp://ftp.sanger.ac.uk/pub/resources/software/exomiser/downloads/exomiser/exomiser-cli-7.2.1-distribution.zip
wget -c ftp://ftp.sanger.ac.uk/pub/resources/software/exomiser/downloads/exomiser/exomiser-cli-7.2.1-data.zip

# check
sha256sum -c exomiser-cli-7.2.1.sha256
# exomiser-cli-7.2.1-distribution.zip: OK
# exomiser-cli-7.2.1-data.zip: OK

unzip exomiser-cli-7.2.1-distribution.zip
unzip exomiser-cli-7.2.1-data.zip

# test run
java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --analysis NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.yml

ls -1 results/
# NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.genes.tsv
# NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.html
# NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.variants.tsv
# NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38.vcf

# version 6
wget ftp://ftp.sanger.ac.uk/pub/resources/software/exomiser/downloads/exomiser/exomiser-cli-6.0.0-distribution.zip
wget ftp://ftp.sanger.ac.uk/pub/resources/software/exomiser/downloads/exomiser/h2_db_dumps/exomiser-6.0.1.h2.db.gz

unzip exomiser-cli-6.0.0-distribution.zip
gunzip exomiser-6.0.1.h2.db.gz
mv exomiser-6.0.1.h2.db exomiser-cli-6.0.0/data/exomiser.h2.db

cd exomiser-cli-6.0.0
java -jar exomiser-cli-6.0.0.jar --help > help.txt
Suggested workflow
Figure from http://www.ncbi.nlm.nih.gov/pubmed/26562621
Help
java -jar exomiser-cli-7.2.1.jar --help

 Welcome to:
  _____ _            _____                     _
 |_   _| |__   ___  | ____|_  _____  _ __ ___ (_)___  ___ _ __
   | | | '_ \ / _ \ |  _| \ \/ / _ \| '_ ` _ \| / __|/ _ \ '__|
   | | | | | |  __/ | |___ >  < (_) | | | | | | \__ \  __/ |
   |_| |_| |_|\___| |_____/_/\_\___/|_| |_| |_|_|___/\___|_|

 A Tool to Annotate and Prioritize Exome Variants v7.2.1

usage: java -jar exomizer-cli-7.2.1.jar [...]

--analysis <file>
    Path to analysis script file. This should be in yaml format.
--analysis-batch <file>
    Path to analysis batch file. This should be in plain text file with the path to a single analys script file in yaml format on each line.
--batch-file <file>
    Path to batch file. This should contain a list of fully qualified path names for the settings files you wish to process. There should be one file name on each line.
--candidate-gene <arg>
    Gene symbol of known or suspected gene association e.g. FGFR2
-D,--disease-id <arg>
    OMIM ID for disease being sequenced. e.g. OMIM:101600
-E,--hiphive-params <type>
    Comma separated list of optional parameters for hiphive: human, mouse, fish, ppi. e.g. --hiphive-params=human or --hiphive-params=human,mouse,ppi
-F,--max-freq <arg>
    Maximum frequency threshold for variants to be retained. e.g. 100.00 will retain all variants. Default: 100.00
-f,--out-format <type>
    Comma separated list of format options: HTML, VCF, TAB-GENE or TAB-VARIANT,. Defaults to HTML if not specified. e.g. --out-format=TAB-VARIANT or --out-format=TAB-GENE,TAB-VARIANT,HTML,VCF
--full-analysis <true/false>
    Run the analysis such that all variants are run through all filters. This will take longer, but give more complete results. Default is false
--genes-to-keep <Entrez geneId>
    Comma separated list of seed genes (Entrez gene IDs) for filtering
-H,--help
    Shows this help
-h,--help
    Shows this help
--hpo-ids <HPO ID>
    Comma separated list of HPO IDs for the sample being sequenced e.g. HP:0000407,HP:0009830,HP:0002858
-I,--inheritance-mode <arg>
    Filter variants for inheritance pattern (AR, AD, X)
--num-genes <arg>
    Number of genes to show in output
-o,--out-prefix <arg>
    Out file prefix. Will default to vcf-filename-exomiser-results
--output-pass-variants-only <true/false>
    Only write out PASS variants in TSV and VCF files.
-P,--keep-non-pathogenic <true/false>
    Keep the predicted non-pathogenic variants that are normally removed by default. These are defined as syonymous, intergenic, intronic, upstream, downstream or intronic ncRNA variants. This setting can optionally take a true/false argument. Not including the argument is equivalent to specifying 'false'.
-p,--ped <file>
    Path to pedigree (ped) file. Required if the vcf file is for a family.
--prioritiser <name>
    Name of the prioritiser used to score the genes. Can be one of:
-Q,--min-qual <arg>
    Mimimum quality threshold for variants as specifed in VCF 'QUAL' column. Default: 0
-R,--restrict-interval <arg>
    Restrict to region/interval (e.g., chr2:12345-67890)
--remove-known-variants <true/false>
    Filter out all variants with an entry in dbSNP/ESP/ExAC (regardless of frequency).
-S,--seed-genes <Entrez geneId>
    Comma separated list of seed genes (Entrez gene IDs) for random walk
--settings-file <file>
    Path to settings file. Any settings specified in the file will be overidden by parameters added on the command-line.
-T,--keep-off-target <true/false>
    Keep the off-target variants that are normally removed by default. These are defined as intergenic, intronic, upstream, downstream or intronic ncRNA variants. This setting can optionally take a true/false argument. Not including the argument is equivalent to specifying 'true'.
-v,--vcf <file>
    Path to VCF file with mutations to be analyzed. Can be either for an individual or a family.
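The help text describes the --analysis-batch input as a plain text file with one analysis script path per line. A minimal sketch of how such a file might be generated is shown here; the function name list_analyses, the directory name analyses, and the file name analysis_batch.txt are made up for illustration, and absolute paths are used only as a precaution.

```shell
# Print the absolute path of every .yml file under a directory,
# one per line and sorted. The function body runs in a subshell,
# so the cd does not affect the caller.
list_analyses() (
    cd "$1" && find "$PWD" -type f -name '*.yml' | sort
)

# Example use (commented out so the snippet runs without Exomiser):
# list_analyses analyses > analysis_batch.txt
# java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --analysis-batch analysis_batch.txt
```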
Usage
(a) Exomiser hiPHIVE algorithm - phenotype comparisons to human, mouse and fish data, covering disruption of the gene itself or of nearby genes in the interactome (found with a random walk)
# by OMIM disease ID
java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser=hiphive -I AD -F 1 -D OMIM:101600 -v data/Pfeiffer.vcf

# by HPO IDs; the list must reach --hpo-ids as a single comma-separated argument with no spaces
hpo_ids="HP:0000006,HP:0000174,HP:0000194,HP:0000218,HP:0000238,HP:0000244,HP:0000272,HP:0000303,HP:0000316"
hpo_ids="$hpo_ids,HP:0000322,HP:0000324,HP:0000327,HP:0000348,HP:0000431,HP:0000452,HP:0000453,HP:0000470,HP:0000486"
hpo_ids="$hpo_ids,HP:0000494,HP:0000508,HP:0000586,HP:0000678,HP:0001156,HP:0001249,HP:0002308,HP:0002676,HP:0002780"
hpo_ids="$hpo_ids,HP:0003041,HP:0003070,HP:0003196,HP:0003272,HP:0003307,HP:0003795,HP:0004209,HP:0004322,HP:0004440"
hpo_ids="$hpo_ids,HP:0005048,HP:0005280,HP:0005347,HP:0006101,HP:0006110,HP:0009602,HP:0009773,HP:0010055,HP:0010669,HP:0011304"
java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser=hiphive -I AD -F 1 --hpo-ids "$hpo_ids" -v data/Pfeiffer.vcf
(b) Exomiser PHIVE algorithm - phenotype comparisons to mice with disruption of the gene
java -Xmx2g -jar exomiser-cli-7.2.1.jar --prioritiser=phive -I AD -F 1 -D OMIM:101600 -v data/Pfeiffer.vcf
(c) Exomiser Phenix algorithm - phenotype comparisons to known human disease genes
# the HPO list must reach --hpo-ids as a single comma-separated argument with no spaces
hpo_ids="HP:0000006,HP:0000174,HP:0000194,HP:0000218,HP:0000238,HP:0000244,HP:0000272,HP:0000303,HP:0000316"
hpo_ids="$hpo_ids,HP:0000322,HP:0000324,HP:0000327,HP:0000348,HP:0000431,HP:0000452,HP:0000453,HP:0000470,HP:0000486"
hpo_ids="$hpo_ids,HP:0000494,HP:0000508,HP:0000586,HP:0000678,HP:0001156,HP:0001249,HP:0002308,HP:0002676,HP:0002780"
hpo_ids="$hpo_ids,HP:0003041,HP:0003070,HP:0003196,HP:0003272,HP:0003307,HP:0003795,HP:0004209,HP:0004322,HP:0004440"
hpo_ids="$hpo_ids,HP:0005048,HP:0005280,HP:0005347,HP:0006101,HP:0006110,HP:0009602,HP:0009773,HP:0010055,HP:0010669,HP:0011304"
java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser=phenix -v data/Pfeiffer.vcf -I AD -F 1 --hpo-ids "$hpo_ids"
(d) Exomiser ExomeWalker algorithm - prioritisation by proximity in interactome to the seed genes
java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser exomewalker -v data/Pfeiffer.vcf -I AD -F 1 -S 2260
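Long HPO lists like the ones passed to --hpo-ids in (a) and (c) are easy to break with a stray space. One option, sketched here, is to keep the terms in a text file with one ID per line and join them when the command is run. The function name join_hpo_ids and the file name hpo_ids.txt are made up for illustration.

```shell
# Join HPO IDs (one per line) into a single comma-separated string.
# Spaces, tabs, carriage returns and blank lines are dropped, because
# --hpo-ids expects one argument with no spaces in it.
join_hpo_ids() {
    tr -d ' \t\r' < "$1" | grep -v '^$' | paste -sd, -
}

# Example use (commented out so the snippet runs without Exomiser):
# java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser=phenix -I AD -F 1 \
#     --hpo-ids "$(join_hpo_ids hpo_ids.txt)" -v data/Pfeiffer.vcf
```

Quoting the command substitution keeps the joined list as a single argument.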
Web tool
https://www.sanger.ac.uk/resources/software/exomiser/submit
Download test file from https://www.sanger.ac.uk/resources/software/exomiser/submit/resources/Pfeiffer.vcf
Issues
- A PED file is required for VCF files with multiple samples; I have a script at https://github.com/davetang/learning_vcf_file/blob/master/script/vcf_to_ped.R that produces a PED file from a VCF file
- For the PED file processed by The Exomiser, a 0 is not allowed in the sex column (code everyone as male [1] or female [2] instead)
- For the PED file processed by The Exomiser, a -9 is not allowed in the phenotype column (use a 0 instead)
- If you run The Exomiser in a directory that doesn't contain a results folder, no results are written; create a results folder before you run your analysis
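The PED workarounds listed above can be applied with a small awk filter. This is only a sketch: it assumes the usual six whitespace-separated PED columns (family, individual, father, mother, sex, phenotype), and the function name fix_ped and the file names family.ped and family.vcf are made up for illustration. The replacement sex code defaults to 1 and can be passed as a second argument.

```shell
# Recode a PED file for The Exomiser:
#   sex 0 (unknown)         -> 1, or the code given as the second argument
#   phenotype -9 (missing)  -> 0
# Output is tab-separated; lines starting with # are passed through unchanged.
fix_ped() {
    awk -v sex="${2:-1}" 'BEGIN { OFS = "\t" }
        /^#/    { print; next }
        NF == 0 { next }
        {
            $1 = $1                      # rebuild the record with tabs
            if ($5 == "0")  $5 = sex     # unknown sex
            if ($6 == "-9") $6 = "0"     # missing phenotype
            print
        }' "$1"
}

# Example use (commented out so the snippet runs without Exomiser):
# fix_ped family.ped > family.fixed.ped
# mkdir -p results
# java -Xms2g -Xmx4g -jar exomiser-cli-7.2.1.jar --prioritiser=hiphive -I AD -F 1 \
#     -D OMIM:101600 -v family.vcf -p family.fixed.ped
```

Only columns five and six are touched, so a 0 in the father or mother column is left as-is.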
Other info
https://sangerinstitute.wordpress.com/2013/11/28/the-rare-diseases-of-mice-and-men/