GC and AT content of 5′ UTRs, 3′ UTRs and coding exons of RefSeq gene models

First, download BED tracks of the 5′ UTRs, 3′ UTRs and coding exons from the UCSC Table Browser. The RefSeq gene models are in the table called refGene. After you’ve saved the three BED files (e.g. mm9_refgene_090212_5_utr.bed, mm9_refgene_090212_3_utr.bed and mm9_refgene_090212_coding_exon.bed), use the fastaFromBed program from the BEDTools suite to convert each BED file into a…
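
Once fastaFromBed has written a FASTA file for each track, the GC and AT content can be tallied with a short script. The Python below is a minimal sketch, not the code from the full post. It assumes one FASTA file per track and counts G/C and A/T case-insensitively, so soft-masked lowercase bases are included. It also leaves N and other ambiguous bases out of the denominator, which is an assumption about how the fractions should be defined. GC content is the same on either strand, so strand handling does not matter here.

```python
"""Sketch: pooled GC/AT content from a FASTA file, e.g. the output of
fastaFromBed run on a UTR or coding-exon BED file."""

import sys
from typing import Dict, Iterator, Tuple


def read_fasta(lines) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def base_counts(seq: str) -> Dict[str, int]:
    """Count G/C, A/T and all other symbols (e.g. N), ignoring case."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return {"gc": gc, "at": at, "other": len(s) - gc - at}


def gc_at_content(lines) -> Tuple[float, float]:
    """Pooled GC and AT fractions over all sequences.
    Assumption: ambiguous bases (N etc.) are excluded from the denominator."""
    gc = at = 0
    for _, seq in read_fasta(lines):
        counts = base_counts(seq)
        gc += counts["gc"]
        at += counts["at"]
    total = gc + at
    if total == 0:
        return 0.0, 0.0
    return gc / total, at / total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gc_content.py mm9_refgene_5_utr.fa
    with open(sys.argv[1]) as fh:
        gc_frac, at_frac = gc_at_content(fh)
    print(f"GC: {gc_frac:.4f}\tAT: {at_frac:.4f}")
```

Running it once per FASTA file (5′ UTR, 3′ UTR, coding exon) gives the three pairs of fractions to compare.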

H3K27ac

Mainly sourced from Wikipedia, but arranged according to my own train of thought. Histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. They are the chief protein components of chromatin, acting as spools around which DNA winds, and play a role in gene regulation…

Gene deserts

Find regions of the mouse genome devoid of any annotation (ESTs, mRNAs, repeats, RefSeq and UCSC genes). The annotation tracks were downloaded using the Table Browser feature of the UCSC Genome Browser, and the mm9 chromosome sizes were downloaded from here. Code for finding 10 kb regions devoid of any annotation. In the mm9 genome I found 9,634 10 kb…
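
Finding annotation-free stretches comes down to merging all annotation intervals per chromosome and reporting the gaps between them, including the gaps at each chromosome end. The following is a minimal Python sketch, not the post's own code. It assumes BED-style (0-based, half-open) intervals and a chromosome-size dictionary, such as one parsed from the mm9 sizes file. The function names are hypothetical. Reporting every gap of at least 10 kb, rather than tiling gaps into fixed 10 kb windows, is an assumption about how the regions were counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def merge_intervals(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome and merge overlapping or touching ones."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:          # overlaps or abuts the previous block
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def annotation_free_regions(chrom_sizes: Dict[str, int],
                            annotations: Iterable[Interval],
                            min_len: int = 10_000) -> List[Interval]:
    """Return gaps between merged annotations (chromosome ends included)
    that are at least min_len bp long."""
    merged = merge_intervals(annotations)
    deserts = []
    for chrom, size in chrom_sizes.items():
        pos = 0  # end of the last annotated block seen so far
        # A zero-length sentinel at the chromosome end closes the final gap.
        for start, end in merged.get(chrom, []) + [(size, size)]:
            start = min(start, size)
            if start - pos >= min_len:
                deserts.append((chrom, pos, start))
            pos = max(pos, end)
    return deserts
```

The returned tuples are in BED order, so they can be written straight back out as a BED file for counting or visualisation.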

Bidirectional genes

Download the 5′ UTRs for all RefSeq genes using the UCSC Table Browser. Separate the features according to strand. Use intersectBed to find overlapping features. I then performed a GO enrichment analysis on the unique list of bidirectional genes, using all genes as the universe list. Although this was a brief analysis, the results are somewhat similar…
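
The strand-separation and intersection steps amount to finding plus-strand 5′ UTRs that overlap minus-strand 5′ UTRs on the same chromosome. The sketch below reimplements that logic in plain Python instead of calling intersectBed. It is meant only to illustrate what the intersection computes. The Feature record and the function names are hypothetical, and coordinates are assumed to be BED-style (0-based, half-open).

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Set, Tuple


class Feature(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    strand: str


def opposite_strand_overlaps(features: Iterable[Feature]) -> List[Tuple[Feature, Feature]]:
    """Pair each + strand feature with every - strand feature it overlaps."""
    plus, minus = defaultdict(list), defaultdict(list)
    for f in features:                      # separate features by strand
        if f.strand == "+":
            plus[f.chrom].append(f)
        elif f.strand == "-":
            minus[f.chrom].append(f)
    pairs = []
    for chrom, p_feats in plus.items():
        m_feats = sorted(minus.get(chrom, []), key=lambda f: f.start)
        for p in p_feats:
            for m in m_feats:
                if m.start >= p.end:
                    break                   # sorted by start: no later m can overlap p
                if m.end > p.start:         # half-open overlap test
                    pairs.append((p, m))
    return pairs


def bidirectional_genes(features: Iterable[Feature]) -> Set[str]:
    """Unique gene names taking part in any opposite-strand 5' UTR overlap."""
    return {f.name for pair in opposite_strand_overlaps(features) for f in pair}
```

The set returned by `bidirectional_genes` corresponds to the unique gene list that feeds the GO enrichment step, with all RefSeq gene names serving as the universe.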

RefSeq promoters

Is there any nucleotide bias in the -40 region of RefSeqs? Taking all hg19 RefSeqs that mapped to assembled chromosomes (36,004) and extracting the 40 bp of sequence upstream of each RefSeq gene model, I generated a sequence logo. There was no obvious TATA box enrichment, which was expected, since only 10-20% of genes in eukaryotes have…
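
Extracting the -40 region has to respect strand. For a + strand transcript, the upstream 40 bp lie just before txStart. For a − strand transcript, they lie just after txEnd and must be reverse-complemented. The sketch below shows that coordinate logic, followed by the per-position frequency and information-content calculation that underlies a sequence logo. It assumes UCSC-style 0-based txStart/txEnd values. It omits the small-sample correction that logo tools usually apply, and it does not clip minus-strand intervals at chromosome ends. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_interval(tx_start: int, tx_end: int, strand: str, n: int = 40) -> Tuple[int, int]:
    """BED-style (0-based, half-open) interval of the n bp upstream of a transcript."""
    if strand == "+":
        return max(0, tx_start - n), tx_start
    return tx_end, tx_end + n  # caller must reverse-complement this sequence


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def frequency_matrix(seqs: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position A/C/G/T frequencies; non-ACGT symbols are skipped."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = {b: 0 for b in BASES}
        for s in seqs:
            base = s[i].upper()
            if base in counts:
                counts[base] += 1
        total = sum(counts.values()) or 1
        matrix.append({b: c / total for b, c in counts.items()})
    return matrix


def information_content(column: Dict[str, float]) -> float:
    """Bits at one position: log2(4) minus Shannon entropy (no small-sample correction)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy
```

In a logo, each position's stack height is its information content, and each letter's height within the stack is proportional to its frequency. A weak TATA signal would therefore show up as low stacks around positions -30 to -25.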