Phenolyzer

From Dave's wiki

Phenolyzer is a tool that helps prioritise disease genes from any disease or phenotype terms given as input. The simplest input is a single description, such as ‘Alzheimer’; the output is a list of prioritised genes with their scores and supporting details.

See Journal club for a brief description of the paper.

The tool is located at: http://phenolyzer.usc.edu/ and the manual is located at: http://phenolyzer.usc.edu/download/Phenolyzer_manual.pdf

Installation

See https://github.com/WangGenomicsLab/phenolyzer

git clone https://github.com/WangGenomicsLab/phenolyzer
cd phenolyzer
perl bin/annotate.pl -downdb -buildver hg19 -webfrom annovar refGene lib/humandb

Getting started

perl disease_annotation.pl --help
Usage:
     disease_annotation.pl [arguments] <disease_names or disease_filename>

     Optional arguments:
            -h, --help                      print help message
            -m, --man                       print complete documentation
            -v, --verbose                   use verbose output
            -out <string>                   output file name prefix (default:out)
            -d, --directory                 compiled database directory (default is ./lib/compiled_database)
            -f, --file                      the input will be treated as file names(both diseases and genes)
            -p, --prediction                Use the Protein interaction and Biosystem database to predict unreported gene 
                                            disease relations (like HPRD human protein interaction, Biosystem database and so on)
            -ph, --phenotype                the input term is also treated as a phenotype, the HPO annotation and OMIM description would be used      
            -hi, --haploinsufficiency       use haploinsufficiency score as weight to prioritize dominant disease genes
            -it, --intolerance              use gene intolerance score as weight to prioritize severe disease genes
            --bedfile                       the bed file as a genomic region used for selection and annotation of the genes
            --buildver                      the build version (hg18 or hg19) to annotate the bedfile
            --wordcloud                     generates a wordcloud of the interpretated diseases if used (not working if you input 'all diseases')
            --logistic                      uses the weight based on the logistic modeling with four different complex diseases
            --gene                          the genes used to select the results (file name if -f command is used)    
            --exact                         choose if you want only exact match but not just a word match
            --addon                         the name of a user-defined add-on gene-disease mapping file (has to be in the ./lib/compiled_database)
            --addon_gg                      the name of user-defined add-on gene-gene mapping file (has to be in the ./lib/compiled_database)
            --addon_weight                  the weight of add-on gene-disease mapping
            --addon_gg_weight               the weight of add-on gene-gene mapping
            --hprd_weight                   the weight for genes found in HPRD
            --biosystem_weight              the weight for genes found in Ncbi Biosystem 
            --gene_family_weight            the weight for genes found in HGNC Gene Family
            --htri_weight                   the weight for genes found in HTRI Transcription Interaction Database
            --gwas_weight                   the weight for gene disease pairs in Gwas Catalog
            --gene_reviews_weight           the weight for gene disease pairs in Gene Reviews  
            --clinvar_weight                the weight for gene disease pairs in Clinvar
            --omim_weight                   the weight for gene disease pairs in OMIM
            --orphanet_weight               the weight for gene disease pairs in Orphanet

    Function: automatically expand the input disease term to a list of
    professional disease names, get a prioritized genelist based on these
    disease names or phenotypes, score the genes.

    Notice: If you input 'all diseases' for disease name, then every item in
    the gene_disease database will be used and no disease expansion will be
    conducted.
    Addon Gene Gene file should be in the format "GENE A GENE B EVIDENCE SCORE PMID"
    Addon Gene Disease file should be in the format "GENE DISEASE DISEASE_ID SCORE SOURCE"

    Example: perl disease_annotation.pl sleep -p
             perl disease_annotation.pl disease -f -p -ph

    Version: 1.0.5 $Last Changed Date: 02-21-2015 by Hui Yang

Options:
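
The add-on formats named in the help text above can be illustrated with a small file. This is only a sketch: the tab delimiter, the absence of a header row, the file name MY_ADDON_GENE_GENE and every row value are assumptions made for illustration (the help text lists the column names but not the delimiter), so compare against the manual or an existing file in ./lib/compiled_database before relying on it.

```shell
# Hypothetical add-on gene-gene file with the columns
# GENE A, GENE B, EVIDENCE, SCORE, PMID (placeholder values throughout)
addon=MY_ADDON_GENE_GENE
printf 'GENE1\tGENE2\tbinding\t0.5\t00000000\n' > "$addon"
printf 'GENE1\tGENE3\tbinding\t0.8\t00000000\n' >> "$addon"

# Check that every row has exactly five tab-separated fields (assumed delimiter)
awk -F'\t' 'NF != 5 { bad = 1 } END { exit bad + 0 }' "$addon" && echo "5 columns per row"

# The file would then be placed in ./lib/compiled_database and passed by name:
#   perl disease_annotation.pl alzheimer -p -addon_gg MY_ADDON_GENE_GENE -addon_gg_weight 0.05
```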

Running the examples

perl disease_annotation.pl sleep -p -ph -logistic -out out/sleep/out
ls -1 out/sleep/
out.final_gene_list
out.merge_gene_scores
out.predicted_gene_scores
out.seed_gene_list
out_sleep_diseases
out_sleep_gene_scores

cat out/sleep/out.final_gene_list | head -5
Rank    Gene    ID      Score   Status
1       ASCL1   429     1       SeedGene
2       PRNP    5621    0.8527  SeedGene
3       CSNK1D  1453    0.8482  SeedGene
4       BDNF    627     0.802   SeedGene
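
The final gene list is a plain table, so standard command-line tools can post-process it. A minimal sketch, assuming the five whitespace-delimited columns shown in the sample output above (Rank, Gene, ID, Score, Status); the function name top_genes and the 0.85 cut-off are illustrative and not part of Phenolyzer:

```shell
# Print gene symbol and score for genes at or above a score cut-off.
# Usage: top_genes <final_gene_list> <min_score>
top_genes() {
    # NR > 1 skips the header row; $2 is the gene symbol, $4 the score
    awk -v min="$2" 'NR > 1 && $4 + 0 >= min + 0 { print $2 "\t" $4 }' "$1"
}

# Demo on the four rows shown above, written to a temporary file
demo=$(mktemp)
printf 'Rank\tGene\tID\tScore\tStatus\n' > "$demo"
printf '1\tASCL1\t429\t1\tSeedGene\n2\tPRNP\t5621\t0.8527\tSeedGene\n' >> "$demo"
printf '3\tCSNK1D\t1453\t0.8482\tSeedGene\n4\tBDNF\t627\t0.802\tSeedGene\n' >> "$demo"
top_genes "$demo" 0.85
```

On those four rows the call prints ASCL1 and PRNP with their scores. For a real run, point the function at out/sleep/out.final_gene_list instead of the temporary file.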

perl disease_annotation.pl disease -f -p -ph -logistic -out out/disease/out
ls -1 out/disease/
out_crohn_disease_diseases
out_crohn_disease_gene_scores
out.final_gene_list
out.merge_gene_scores
out.predicted_gene_scores
out.seed_gene_list
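
With -f, the positional argument is read as a file of terms rather than as a term itself. A minimal sketch of preparing such a file, assuming one term per line; the file name my_terms.txt, the choice of terms and the output directory are hypothetical, and the Phenolyzer call only runs when the script is present in the current directory:

```shell
# One disease/phenotype term per line (assumed layout)
printf 'crohn disease\nsleep\n' > my_terms.txt

# Run Phenolyzer on the term file only when inside the Phenolyzer checkout
if [ -f disease_annotation.pl ]; then
    mkdir -p out/my_terms
    perl disease_annotation.pl my_terms.txt -f -p -ph -logistic -out out/my_terms/out
fi
```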

perl disease_annotation.pl alzheimer -bedfile cnv.bed -p -ph -logistic -out out/alzheimer/out
perl disease_annotation.pl alzheimer -p -ph -logistic -out out/alzheimer_addon/out -addon_gg DB_MENTHA_GENE_GENE_INTERACTION -addon_gg_weight 0.05
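
The -bedfile run above expects a file of genomic regions. A minimal sketch of a cnv.bed, assuming the standard three-column BED layout (chromosome, start, end, tab-separated); the coordinates are made up for illustration, and whether Phenolyzer requires the "chr" prefix is not confirmed here:

```shell
# Write a two-region BED file; coordinates are hypothetical
printf 'chr21\t27000000\t27600000\n' > cnv.bed
printf 'chr19\t45300000\t45500000\n' >> cnv.bed

# Sanity check: each line has at least three fields and start < end
awk -F'\t' 'NF < 3 || $2 + 0 >= $3 + 0 { bad = 1 } END { exit bad + 0 }' cnv.bed \
    && echo "cnv.bed looks valid"
```

The coordinates should match the build given with --buildver (hg18 or hg19, per the help text).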

Notes from the manual

The strategy for preparing a list of input terms depends on your specific need. For example, if you wish to obtain as many genes as possible, it is preferable to use shorter, more general terms; exact and full disease names will limit the list of genes returned.
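
One way to see this effect is to count the genes returned by a loose and a strict query. A sketch, assuming the final gene list has a single header row as in the sample output earlier; count_genes and the output paths are hypothetical names, and the commented commands are untested suggestions that use the --exact option from the help text:

```shell
# Count data rows (everything after the header) in a final gene list
count_genes() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Suggested comparison, run from the Phenolyzer directory:
#   perl disease_annotation.pl alzheimer -p -ph -logistic -out out/loose/out
#   perl disease_annotation.pl alzheimer --exact -p -ph -logistic -out out/strict/out
#   count_genes out/loose/out.final_gene_list
#   count_genes out/strict/out.final_gene_list
```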