Of the RefSeqs that have CpG islands within 1,000 bp upstream, what are the GO terms?

As a follow-up to a previous post, I obtained the RefSeq gene models that have a CpG island within the 1,000 bp upstream region.
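For reference, here is a minimal Python sketch of how a 1,000 bp upstream region can be derived from a gene model before running intersectBed. It is not necessarily how the previous post did it. It assumes BED-style, 0-based, half-open coordinates and a hypothetical GeneModel record (chrom, tx_start, tx_end, name, strand). The key point is that "upstream" depends on strand: for + strand models the region ends at the transcription start (tx_start), and for - strand models it begins at tx_end.

```python
# Sketch: derive 1,000 bp upstream regions from gene models.
# Assumptions: BED-style 0-based half-open coordinates; GeneModel and its
# field names are hypothetical, e.g. parsed from a refGene-like table.
# Regions running past the chromosome end are NOT clipped (that would
# need a chromosome sizes file).
from typing import NamedTuple


class GeneModel(NamedTuple):
    chrom: str
    tx_start: int  # 0-based, inclusive
    tx_end: int    # 0-based, exclusive
    name: str
    strand: str    # "+" or "-"


def upstream_region(gene: GeneModel, length: int = 1000) -> tuple[str, int, int, str, str]:
    """Return the BED interval covering `length` bp upstream of the TSS."""
    if gene.strand == "+":
        # TSS is at tx_start; upstream lies to the left (clip at 0)
        start, end = max(0, gene.tx_start - length), gene.tx_start
    elif gene.strand == "-":
        # TSS is at tx_end; upstream lies to the right
        start, end = gene.tx_end, gene.tx_end + length
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    return (gene.chrom, start, end, gene.name, gene.strand)


def to_bed_line(region: tuple[str, int, int, str, str]) -> str:
    """Format a region as BED6 (score column set to 0)."""
    chrom, start, end, name, strand = region
    return f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}"


if __name__ == "__main__":
    # Hypothetical gene models, for illustration only
    genes = [
        GeneModel("chr1", 5000, 8000, "geneA", "+"),
        GeneModel("chr1", 10000, 12000, "geneB", "-"),
    ]
    for g in genes:
        print(to_bed_line(upstream_region(g)))
```

The resulting BED6 file could then be intersected with the CpG island track; the strand column is kept so that strand-aware options remain available.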

579 of the 1,009 RefSeq gene models had GO terms. Note that previously I identified ~780 RefSeq gene IDs; curiously, using the -wb and -f 1 options together with intersectBed gave more intersections. I will look into this later. Here is the breakdown of the GO term annotations by category:

1733 Component
1819 Function
2487 Process
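One quick way to investigate the extra intersections is to compare the number of rows in the intersectBed output with the number of unique RefSeq IDs. Without -u, intersectBed writes one line per overlapping A/B pair, so the same RefSeq ID can appear on several lines, for example when its upstream region overlaps more than one CpG island. The sketch below assumes a hypothetical layout in which A was BED3 and B was BED6 with the RefSeq ID in its name field, which puts the ID at 0-based column 6 of the -wb output; NAME_COL would need adjusting for other layouts.

```python
# Sketch: compare total rows vs unique RefSeq IDs in intersectBed -wb output.
# Assumption (hypothetical layout): A = BED3, B = BED6 with the RefSeq ID
# in B's name field, i.e. 0-based column 6 of each output line.
import sys
from collections import Counter
from typing import Iterable

NAME_COL = 6


def refseq_counts(lines: Iterable[str], name_col: int = NAME_COL) -> Counter:
    """Count how many intersect rows mention each RefSeq ID."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # skip blank lines and BED header/track lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        counts[fields[name_col]] += 1
    return counts


def summarise(counts: Counter) -> dict:
    """Total rows, unique IDs, and the IDs that occur on more than one row."""
    return {
        "rows": sum(counts.values()),
        "unique_ids": len(counts),
        "ids_on_multiple_rows": {rid: n for rid, n in counts.items() if n > 1},
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ids.py intersect_output.bed
    with open(sys.argv[1]) as fh:
        summary = summarise(refseq_counts(fh))
    print(f"rows: {summary['rows']}, unique RefSeq IDs: {summary['unique_ids']}")
    repeated = sorted(summary["ids_on_multiple_rows"].items(), key=lambda kv: -kv[1])
    for rid, n in repeated[:10]:
        print(rid, n)
```

If the number of unique IDs is close to ~780 while the row count is higher, the difference would come from repeated IDs rather than from genuinely new gene models.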

Top 10 Component

219 nucleus
176 cytoplasm
123 membrane
114 integral to membrane
93 plasma membrane
63 intracellular
45 cytosol
42 extracellular region
42 mitochondrion
35 integral to plasma membrane

Top 10 Function

241 protein binding
95 metal ion binding
71 DNA binding
69 nucleotide binding
67 zinc ion binding
54 sequence-specific DNA binding transcription factor activity
48 ATP binding
39 transferase activity
32 sequence-specific DNA binding
30 hydrolase activity

Top 10 Process

51 regulation of transcription, DNA-dependent
34 multicellular organismal development
34 regulation of transcription
30 signal transduction
27 cell cycle
27 cell differentiation
24 oxidation reduction
23 ion transport
21 protein phosphorylation
21 proteolysis
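The category totals and top 10 lists above can be produced with a few lines of Python. This sketch assumes a hypothetical tab-delimited annotation table with one row per RefSeq ID and GO term (refseq_id, go_id, term name, category, where category is Component, Function or Process); the file actually used for this post may be laid out differently.

```python
# Sketch: tally GO annotations per category and list the top terms.
# Assumed (hypothetical) input rows:
#   refseq_id <TAB> go_id <TAB> term_name <TAB> category
from collections import Counter, defaultdict
from typing import Iterable


def tally(rows: Iterable[str]) -> tuple[Counter, dict[str, Counter]]:
    """Return annotation counts per category and per term within each category."""
    per_category: Counter = Counter()
    per_term: dict[str, Counter] = defaultdict(Counter)
    seen: set[tuple[str, str]] = set()
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue
        refseq_id, go_id, term, category = row.split("\t")[:4]
        # count each gene/term pair once, even if the table repeats it
        if (refseq_id, go_id) in seen:
            continue
        seen.add((refseq_id, go_id))
        per_category[category] += 1
        per_term[category][term] += 1
    return per_category, per_term


def report(per_category: Counter, per_term: dict[str, Counter], n: int = 10) -> None:
    """Print category totals, then the n most frequent terms per category."""
    for category in sorted(per_category):
        print(per_category[category], category)
    for category in sorted(per_term):
        print(f"\nTop {n} {category}\n")
        for term, count in per_term[category].most_common(n):
            print(count, term)
```

Usage would be along the lines of `report(*tally(open(path)))`. Counting each RefSeq ID/GO term pair once is a design choice: annotation files such as NCBI's gene2go can list the same pair several times with different evidence codes, which would inflate per-line counts.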

Cell cycle and cell differentiation may be enriched among RefSeq gene models with a CpG island upstream, but raw counts alone cannot show this; the counts need to be compared against an appropriate background with a statistical test.
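One standard choice, sketched here as an option rather than the analysis done in this post, is a one-sided hypergeometric test for each term (equivalent to a one-sided Fisher's exact test). Given N background genes with GO annotations, K of which carry the term, and n genes in the CpG island set, k of which carry the term, the p-value is P(X >= k). The sketch computes this exactly with math.comb and applies Benjamini-Hochberg correction, since many terms are tested at once. The choice of background, for example all RefSeq gene models with GO annotations, is an assumption and strongly affects the result.

```python
# Sketch: one-sided hypergeometric enrichment test plus Benjamini-Hochberg.
# N = background genes with GO terms, K = background genes with the term,
# n = genes in the CpG island set with GO terms, k = those with the term.
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For cell cycle, k would be the number of genes in this set annotated with the term (27 in the list above, if each gene contributes at most one annotation per term), n the number of genes in the set with any GO term, and K and N would come from applying the same annotation to the background gene models.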

This work is licensed under a Creative Commons Attribution 4.0 International License.
