Calculating Pearson correlation using Perl

This is my modification of code that is originally available here; the original is probably easier to understand. I altered it to work with an anonymous 2D array and with strictures (use strict), so that I could plug it into my own code. Comments are included in the code below to assist use.
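For reference, Pearson's r is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares. Below is a minimal sketch of that calculation in Python, not the Perl code from the post. Like the modified version, it assumes the input is a 2D array whose two rows hold the paired x and y values. The function name pearson is my own.

```python
import math


def pearson(data):
    """Pearson correlation between the two rows of a 2D list: [[x1, x2, ...], [y1, y2, ...]]."""
    x, y = data  # assumes exactly two rows
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rows of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a row has zero variance, so r is undefined")
    return sxy / math.sqrt(sxx * syy)
```

For example, pearson([[1, 2, 3, 4, 5], [2, 4, 5, 4, 5]]) gives 6 / sqrt(60), about 0.7746.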

Genome scan for 6mer frequency

I split the genome into 6 bp windows and calculated the 6-mer frequencies. Scanning chr6 of hg19 gave: NNNNNN: 3719950, aaaaaa: 373380, tttttt: 372667, TTTTTT: 184768, AAAAAA: 182652, aaaaat: 143055, attttt: 142671, ATTTTT: 133646, TTTAAA: 133284, AAAAAT: 133130, TATTTT: 130672, AAAATA: 129572, TTTTAA: 129570, TTAAAA: 129177, aaaata: 123528, tatttt: 123134, atatat: 119872 …
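The excerpt does not say whether the 6 bp windows overlap, so the minimal Python sketch below makes the step size a parameter: 1 for a sliding window, 6 for non-overlapping windows. Counting is case-sensitive. That matches the separate aaaaaa and AAAAAA entries above, because the UCSC hg19 sequence marks repeats in lowercase. The function names are hypothetical, and the FASTA reader assumes a single-record file such as chr6.fa.

```python
from collections import Counter


def kmer_counts(seq, k=6, step=1):
    """Count k-mers in windows of width k moved along seq by `step` bases.

    Case-sensitive: soft-masked (lowercase) 'aaaaaa' is tallied separately
    from 'AAAAAA'. A trailing partial window is ignored.
    """
    counts = Counter()
    for start in range(0, len(seq) - k + 1, step):
        counts[seq[start:start + k]] += 1
    return counts


def read_fasta_sequence(path):
    """Concatenate the sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))
```

Something like kmer_counts(read_fasta_sequence("chr6.fa")).most_common(20) would then list the most frequent 6-mers, in the style of the output above.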

Hierarchical clustering with p-values

The code that allowed me to use Spearman's rank correlation coefficient was kindly provided by the developer of pvclust. First, download the unofficial package or simply source it from my Dropbox account. Then start up R and follow the steps in Hierarchical clustering with p-values.
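pvclust's correlation distance is 1 minus the correlation between samples (columns). With Spearman's rank correlation, the values are first converted to ranks, and 1 - rho is used as the distance. Below is a minimal Python sketch of that distance matrix only, assuming each sample is a list of values. Tied values get average ranks, and Pearson's r is then computed on the ranks. It does not reproduce pvclust's multiscale bootstrap p-values, and all names are my own.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_distance_matrix(columns):
    """1 - rho between every pair of samples (columns), for hierarchical clustering."""
    n = len(columns)
    return [[1.0 - spearman(columns[i], columns[j]) for j in range(n)] for i in range(n)]
```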

Pooling technical replicates in edgeR

This post is very old and should just be ignored. But if you came across it, here's a thread on the Bioconductor mailing list that may be relevant. The setup: 4 libraries, each with technical replicates, and 2 conditions. Technical replicates are the same samples processed identically. First, treat the technical replicates separately: Now pooling everything together, so…
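Pooling technical replicates usually means summing their read counts gene by gene, so that each library ends up as a single column. In R, edgeR's sumTechReps() does this. The minimal Python sketch below illustrates the same idea. It assumes counts are stored per sequencing run, with a lookup from each run to its library. The names and data layout are hypothetical.

```python
def sum_technical_replicates(counts, replicate_of):
    """Pool technical replicates by summing their read counts gene by gene.

    counts: dict mapping run name -> list of per-gene counts (same gene order)
    replicate_of: dict mapping run name -> library it was sequenced from
    Returns a dict mapping library name -> summed per-gene counts.
    """
    pooled = {}
    for run, values in counts.items():
        library = replicate_of[run]
        if library not in pooled:
            pooled[library] = [0] * len(values)
        elif len(pooled[library]) != len(values):
            raise ValueError(f"run {run} has a different number of genes")
        pooled[library] = [a + b for a, b in zip(pooled[library], values)]
    return pooled
```

For example, runs lib1_run1 = [10, 0, 5] and lib1_run2 = [12, 1, 4] pool to lib1 = [22, 1, 9].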

Using blat to map short RNAs

Updated on 2013 November 5th to include mapping of piRNAs. I still use blat as my multi-purpose alignment tool, despite it having been developed over 10 years ago. For those needing a simple introduction to blat, see my using blat post. I was wondering whether blat could align short RNAs; the definition of short RNAs…
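One simple way to keep only exact, full-length alignments of short RNAs from blat's PSL output is to require that every query base matched, with no mismatches and no gaps. The minimal Python sketch below assumes the standard 21-column, tab-separated PSL layout. The strict "perfect hit" criterion and the function name are my assumptions, not necessarily what the post uses.

```python
def perfect_hits(psl_lines):
    """Yield (qName, tName, strand, tStart, tEnd) for full-length, mismatch-free PSL hits."""
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21 or not fields[0].isdigit():
            continue  # skip header lines and anything that is not a PSL record
        matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
        q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
        strand, q_name, q_size = fields[8], fields[9], int(fields[10])
        t_name, t_start, t_end = fields[13], int(fields[15]), int(fields[16])
        # repMatches counts matching bases in repeat-masked target sequence
        aligned = matches + rep_matches
        if aligned == q_size and mismatches == 0 and q_num_insert == 0 and t_num_insert == 0:
            yield q_name, t_name, strand, t_start, t_end
```

Passing in an open PSL file handle, for example list(perfect_hits(open("out.psl"))), would list the exact hits.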

Gene Ontology enrichment analysis

Updated: 2019 March 24th. Gene Ontology enrichment analysis is a popular type of analysis carried out after a differential gene expression analysis. There are many tools available for performing it. Online tools include DAVID, PANTHER and GOrilla; Bioconductor packages include GOstats, topGO and goseq…
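Several of these tools (GOstats, for example) offer a hypergeometric test for over-representation. It asks how likely it is to see at least k genes annotated with a term among n differentially expressed genes. These are drawn from a background of N genes, K of which carry the term. Below is a minimal Python sketch of that one-sided p-value. It applies no multiple-testing correction and no gene-length bias correction (the problem goseq addresses), and the names are my own.

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: DE genes annotated with the GO term
    n: DE genes in the test set
    K: background genes annotated with the GO term
    N: background genes
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of observing i = k..upper annotated genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For instance, with N = 10, K = 3 and n = 3, observing all 3 annotated genes (k = 3) has p = 1/120.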

How many RefSeq gene models have GO terms?

I downloaded RefSeq gene models from the UCSC Genome Browser: a total of 34,565 unique RefSeq IDs. Using this script, refseq2go.pl, I joined the RefSeq IDs to Gene Ontology (GO) terms; 17,458 / 34,565 (about half) have GO terms. In total there were 169,087 GO terms (10,316 unique) for the 17,458 RefSeq IDs: 50,868 Component, 53,144 Function and 65,075 Process. Top 10 Components…
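Once the RefSeq-to-GO join exists, these tallies take only a few set and counter operations. This is a minimal Python sketch, not refseq2go.pl itself. It assumes a hypothetical input of (refseq_id, go_id, namespace) tuples, one per association. As a sanity check on the numbers above, 17,458 / 34,565 is about 0.505, and 50,868 + 53,144 + 65,075 = 169,087. The total therefore counts associations rather than distinct terms.

```python
from collections import Counter


def summarise_go_coverage(refseq_ids, annotations):
    """Summarise how many RefSeq IDs have GO terms.

    refseq_ids: iterable of RefSeq IDs (duplicates are collapsed)
    annotations: iterable of (refseq_id, go_id, namespace) tuples, one per
        RefSeq-to-GO association (hypothetical layout of the joined table)
    """
    ids = set(refseq_ids)
    annotated = set()
    go_terms = set()
    per_namespace = Counter()
    total_terms = 0
    for refseq_id, go_id, namespace in annotations:
        if refseq_id not in ids:
            continue  # ignore associations for IDs outside the gene-model set
        annotated.add(refseq_id)
        go_terms.add(go_id)
        per_namespace[namespace] += 1
        total_terms += 1
    return {
        "total_ids": len(ids),
        "ids_with_go": len(annotated),
        "fraction_with_go": len(annotated) / len(ids) if ids else 0.0,
        "total_go_terms": total_terms,
        "unique_go_terms": len(go_terms),
        "per_namespace": dict(per_namespace),
    }
```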