Is there any nucleotide bias in the -40 region of RefSeqs?
I took all hg19 RefSeqs that mapped to assembled chromosomes (36,004), extracted the 40 bp of nucleotide sequence immediately upstream of each RefSeq gene model, and generated a sequence logo.
There is no obvious TATA box enrichment, which was expected since only 10-20% of genes in eukaryotes have a TATA box (perhaps at -13 to -16?). Note the enrichment of cytosine at -1.
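For readers who want the numbers behind a logo like this, here is a minimal sketch of the per-position calculation. It is not the tool I used for the plot. The function names `position_frequencies` and `information_content` are made up for illustration, the input is assumed to be a list of equal-length upstream sequences, and no small-sample correction is applied.

```python
import math
from collections import Counter


def position_frequencies(seqs):
    """Return one dict of A/C/G/T frequencies per column of equal-length sequences."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        # Ignore anything that is not A, C, G or T (e.g. N).
        total = sum(counts[b] for b in "ACGT")
        freqs.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


def information_content(freq):
    """Information content in bits for one column: 2 minus the Shannon entropy."""
    if sum(freq.values()) == 0:
        return 0.0  # column had no A/C/G/T at all
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy


if __name__ == "__main__":
    # Toy input, not real RefSeq data.
    toy = ["ACGT", "ACGA", "ACCC", "AGGT"]
    for pos, f in enumerate(position_frequencies(toy), start=1):
        print(pos, f, round(information_content(f), 3))
```

A column with a single nucleotide scores 2 bits and a uniform column scores 0 bits, so an enriched base such as the cytosine at -1 shows up as a taller stack in the logo.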
Then I took the sequences spanning -20 to +20 and generated the same sequence logo plot.
Note the enrichment of purines (adenine and guanine) at the start of the 5′ UTR (position 21 in the logo).
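Both windows (the 40 bp upstream, and -20 to +20) have to be extracted strand-aware, otherwise minus-strand genes end up pointing the wrong way. The following is a sketch of that step rather than the exact code behind the plots. The function name `tss_window` is hypothetical, and it assumes the chromosome is held as a Python string and that `tss` is the 0-based index of the first transcribed base.

```python
# Translation table for complementing DNA, preserving case and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tss_window(chrom_seq, tss, strand, upstream, downstream):
    """Return `upstream` bases before the TSS plus `downstream` bases starting
    at the TSS, read 5' to 3' on the gene's own strand.

    Assumption: `tss` is the 0-based index of the first transcribed base.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the sequence")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


if __name__ == "__main__":
    toy_chrom = "AAAACGTTTT"  # toy sequence, not real genomic data
    print(tss_window(toy_chrom, 4, "+", 2, 2))
    print(tss_window(toy_chrom, 5, "-", 2, 2))
```

With `upstream=40, downstream=0` this gives the -40 to -1 window, and with `upstream=20, downstream=20` the first transcribed base lands at position 21 of the returned string, matching the position numbering used above.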
This work is licensed under a Creative Commons
Attribution 4.0 International License.