Tag Archives: bioinformatics

Twitter

Today while reading a paper, I found some interesting one-liner facts. They are way too short to create a post on but I would like to make a repository of them. What better place to store these facts than Twitter! … Continue reading

Posted in /etc

Setting up your Windows PC for performing bioinformatics analyses

I recently bought another computer (Core i7 2600, 8 GB of RAM and a GTX 460). Unlike most bioinformaticians, I use Windows. Inevitably, I have to do work at home, so I had to set up yet another PC to perform my … Continue reading

Posted in /etc, bioinformatics

DESeq vs. edgeR vs. baySeq

6th April 2012: For a more up-to-date version of this post, please see this post. A very simple comparison using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are … Continue reading

Posted in bioinformatics, R | 22 Comments

Finding genes with co-expression patterns

Can the R Bioconductor package “WGCNA” find artefactually created modules? Firstly, some (subpar) code to generate an artefactual list of genes with co-expression patterns (modules). Running the code: ./generate_random_module.pl 10 1000 20 > 10_sample_1000_list_20_module.tsv Patterns: 1 0 0 0 0 … Continue reading

Posted in bioinformatics, R

Of the RefSeq’s that have CpG islands 1,000 bp upstream, what are the GO terms – part 2?

Using GO.db and GOstats, I obtained the gene list with bona fide CpG islands upstream and conducted a GO enrichment analysis. The choice of the gene universe is again all RefSeq gene models. Enriched Biological Processes include: 1 primary metabolic … Continue reading

Posted in R

Of the RefSeq’s that have CpG islands 1,000 bp upstream, what are the GO terms?

As a follow-up to this previous post, I obtained the RefSeq gene models that have a CpG island within the 1,000 bp upstream region. There were 579 / 1009 GO terms. Note that previously I identified ~780 RefSeq gene IDs, … Continue reading

Posted in bioinformatics

How many RefSeq gene models have GO terms?

Downloaded RefSeq gene models from the UCSC Genome Browser. Total of 34,565 unique RefSeq IDs. Using this script, refseq2go.pl, join RefSeq to gene ontology (GO) terms. 17,458 / 34,565 have GO terms, about half. In total there were 169,087 GO terms … Continue reading

Posted in bioinformatics

How often are CpG islands upstream of transcriptional start sites?

Download the CpG island BED file for hg19 from the UCSC Table Browser. Distribution of CpG islands on hg19 (excluding random, chrUn and hap): 2541 chr19, 2462 chr1, 1688 chr2, 1634 chr17, 1578 chr7, 1491 chr16, 1367 chr11, 1253 chr6, 1229 chr5 … Continue reading
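
The per-chromosome tally above could be reproduced with something like the following Python sketch. This is not the script used in the post; it assumes a BED-style file with the chromosome name in the first column, and the function name and the example records are made up for illustration.

```python
from collections import Counter

def count_islands_per_chrom(bed_lines):
    """Count BED records per chromosome, skipping random, chrUn and hap contigs."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the header lines a BED download may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]  # assumption: chromosome is the first column
        if "random" in chrom or chrom.startswith("chrUn") or "hap" in chrom:
            continue
        counts[chrom] += 1
    return counts

# Made-up records, only to show the expected input shape.
example = [
    "chr1\t100\t500\tisland_a",
    "chr1\t900\t1500\tisland_b",
    "chr19\t200\t700\tisland_c",
    "chrUn_gl000211\t10\t300\tisland_d",
    "chr6_apd_hap1\t50\t400\tisland_e",
    "chr1_gl000191_random\t5\t250\tisland_f",
]

# Print counts in descending order, count first, like the listing in the post.
for chrom, n in count_islands_per_chrom(example).most_common():
    print(n, chrom)
```

With a real download, the lines would come from an open file handle instead of the example list.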

Posted in bioinformatics | 1 Comment