Extract gene names according to GO terms

Find genes that are associated with a known gene ontology term:

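# Install org.Hs.eg.db and GO.db if you don't have them.
# biocLite() has been replaced by BiocManager in current Bioconductor releases.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("org.Hs.eg.db", "GO.db"))
library(org.Hs.eg.db)
library(GO.db)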

# Look up the GO ID whose term name is exactly "chromatin remodeling"
go_id = GOID( GOTERM[ Term(GOTERM) == "chromatin remodeling"])
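
The lookup above depends on an exact, case-sensitive match of the term name, so a small spelling difference returns nothing. A minimal sketch of a more defensive lookup follows; the variable names `query`, `all_terms` and `hits` are only illustrative, and it assumes GO.db is installed:

```r
library(GO.db)

# Term(GOTERM) gives a named character vector:
# the values are term names, the names are GO IDs
all_terms <- Term(GOTERM)

query <- "chromatin remodeling"                  # illustrative query string
go_id <- names(all_terms)[all_terms == query]    # exact match, as in the post

if (length(go_id) == 0) {
  # No exact match: show terms that contain the query as a substring
  hits <- all_terms[grepl(query, all_terms, fixed = TRUE)]
  print(head(hits))
} else {
  print(go_id)
}
```

If the exact match fails, the substring hits can be inspected by eye to pick the intended GO ID.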

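# Entrez Gene IDs annotated to this term or to any of its child terms;
# the names of the returned vector are GO evidence codes
get(go_id, org.Hs.egGO2ALLEGS)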

     IEA      IEA      IDA      IEA      IEA      IDA      TAS       IC
    "86"    "473"    "664"    "676"   "1105"   "2186"   "2648"   "3065"
      IC      ISS      IEA      ISS      TAS      IDA      IEA      TAS
  "3066"   "3070"   "4221"   "4678"   "5925"   "5928"   "6594"   "6595"
     IDA      IEA      IDA      IDA      IMP      TAS      NAS      NAS
  "6597"   "6598"   "6599"   "6601"   "6602"   "6827"   "6829"   "6830"
     IEA      IDA      TAS      IEA      TAS      IEA      NAS      IEA
  "6927"   "7141"   "7141"   "7270"   "8289"   "8467"   "8850"   "9031"
     IDA      NAS      IDA      TAS      IDA      TAS      IEA      NAS
  "9557"   "9757"   "9759"  "10014"  "10361"  "10629"  "10661"  "11176"
     NAS      IMP      IEA      IDA      IMP      NAS      IDA      NAS
 "11335"  "22893"  "23314"  "23411"  "50511"  "50943"  "51773"  "54108"
     IDA      TAS      IMP      NAS      IMP      IDA      NAS      IEA
 "54617"  "55193"  "55355"  "57492"  "57680"  "79723"  "84181" "150572"
     IMP      NAS
"201161" "373861"

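The names in the output above are GO evidence codes, and a gene appears once per evidence code, which is why 7141 is listed twice. A short sketch of how one might summarise the codes and collapse duplicates; `ids`, `unique_ids` and `non_iea_ids` are illustrative names, and dropping IEA (electronically inferred) annotations is shown only as one possible filter:

```r
library(org.Hs.eg.db)
library(GO.db)

all_terms <- Term(GOTERM)
go_id <- names(all_terms)[all_terms == "chromatin remodeling"]

# Named character vector: values are Entrez Gene IDs, names are evidence codes
ids <- get(go_id, org.Hs.egGO2ALLEGS)

# How many annotations carry each evidence code
print(table(names(ids)))

# One entry per gene, regardless of how many evidence codes it has
unique_ids <- unique(unname(ids))

# Genes with at least one annotation that is not IEA
non_iea_ids <- unique(unname(ids[names(ids) != "IEA"]))
```
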
# Store the Entrez Gene IDs, then map each one to its gene symbol
allegs = get(go_id, org.Hs.egGO2ALLEGS)

genes = unlist(mget(allegs, org.Hs.egSYMBOL))

genes
       86       473       664       676      1105      2186      2648      3065
 "ACTL6A"    "RERE"   "BNIP3"    "BRDT"    "CHD1"    "BPTF"   "KAT2A"   "HDAC1"
     3066      3070      4221      4678      5925      5928      6594      6595
  "HDAC2"   "HELLS"    "MEN1"    "NASP"     "RB1"   "RBBP4" "SMARCA1" "SMARCA2"
     6597      6598      6599      6601      6602      6827      6829      6830
"SMARCA4" "SMARCB1" "SMARCC1" "SMARCC2" "SMARCD1" "SUPT4H1"  "SUPT5H"  "SUPT6H"
     6927      7141      7141      7270      8289      8467      8850      9031
  "HNF1A"    "TNP1"    "TNP1"    "TTF1"  "ARID1A" "SMARCA5"   "KAT2B"   "BAZ1B"
     9557      9757      9759     10014     10361     10629     10661     11176
  "CHD1L"    "MLL4"   "HDAC4"   "HDAC5"    "NPM2"   "TAF6L"    "KLF1"   "BAZ2A"
    11335     22893     23314     23411     50511     50943     51773     54108
   "CBX3"   "BAHD1"   "SATB2"   "SIRT1"   "SYCP3"   "FOXP3"    "RSF1"  "CHRAC1"
    54617     55193     55355     57492     57680     79723     84181    150572
  "INO80"   "PBRM1"   "HJURP"  "ARID1B"    "CHD8" "SUV39H2"    "CHD6"   "SMYD1"
   201161    373861
  "CENPV"   "HILS1"

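The same result can probably be obtained in one step with the `select()` interface from AnnotationDbi, which also accepts a vector of several GO IDs. This is a sketch rather than the method from the mailing list thread; `annot`, `genes` and `genes_by_term` are illustrative names, and the symbols returned depend on the installed annotation release, so they may differ from the listing above:

```r
library(org.Hs.eg.db)
library(GO.db)

all_terms <- Term(GOTERM)
go_id <- names(all_terms)[all_terms == "chromatin remodeling"]

# keytype "GOALL" includes genes annotated to child terms,
# matching what org.Hs.egGO2ALLEGS returns
annot <- AnnotationDbi::select(org.Hs.eg.db,
                               keys    = go_id,
                               keytype = "GOALL",
                               columns = c("ENTREZID", "SYMBOL"))

# Unique gene symbols for the term
genes <- sort(unique(annot$SYMBOL))

# With several GO IDs in `keys`, this gives one symbol vector per term
genes_by_term <- lapply(split(annot$SYMBOL, annot$GOALL), unique)
```

Passing more than one GO ID in `keys` returns every gene for every term in a single data frame, which `split()` then groups by term.
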
See also: org.Hs.eg.db

Source: Bioconductor mailing list thread "[BioC] Query Gene Ontology"

This work is licensed under a Creative Commons Attribution 4.0 International License.
Comments
  1. Hi, thanks for giving this example of extracting gene names according to GO terms!
    Can I ask a question about how to extract GO terms? Do you know how? Waiting for your answer, thank you very much!

  2. Hi!

    Thank you very much for this example!
    It seems that it retrieves only the first gene of each GOID, or am I misunderstanding something?
    If it is the case, how can one get all the genes of each GO term?

    Thanks,

    Best,

    Pernille
