ENCODE RNA polymerase II ChIP-seq

Updated 2018 November 8th to include a section on MACS2. Chromatin immunoprecipitation sequencing (ChIP-seq) is a high-throughput method for investigating protein-DNA interactions; it aims to determine whether specific proteins interact with specific genomic loci. The workflow begins by crosslinking DNA and protein, usually with formaldehyde, which induces protein-DNA and protein-protein crosslinks…

Clustering mapped reads

Updated 2014 October 8th: added an analysis using CAGE data from ENCODE and rewrote parts of the post. In this post I will write about a read clustering method called paraclu, which groups mapped reads into clusters. This is particularly useful when working with CAGE data, where transcription start sites (TSSs) are…
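For contrast with paraclu, here is a much simpler way to group mapped CAGE 5' ends. This is not paraclu. Paraclu finds clusters across all density levels, while this sketch is plain gap-based (single-linkage) grouping. It assumes the input is a list of (position, read count) pairs from a single chromosome and strand. The function name `cluster_tss` and the `max_gap=20` threshold are illustrative choices, not values taken from the post.

```python
from typing import List, Tuple


def cluster_tss(tss_counts: List[Tuple[int, int]],
                max_gap: int = 20) -> List[Tuple[int, int, int]]:
    """Hypothetical gap-based clustering (NOT paraclu).

    tss_counts: (position, read_count) pairs from one chromosome and strand.
    Consecutive positions at most max_gap bp apart join the same cluster.
    Returns (start, end, total_reads) for each cluster.
    """
    clusters: List[Tuple[int, int, int]] = []
    for pos, count in sorted(tss_counts):
        if clusters and pos - clusters[-1][1] <= max_gap:
            # Extend the current cluster and add this position's reads.
            start, _, total = clusters[-1]
            clusters[-1] = (start, pos, total + count)
        else:
            # Gap too large: start a new cluster.
            clusters.append((pos, pos, count))
    return clusters


print(cluster_tss([(100, 5), (110, 3), (200, 1), (105, 2)]))
```

With the default threshold, the call above groups positions 100, 105 and 110 into one cluster of 10 reads and leaves 200 on its own. Because the result depends entirely on `max_gap`, there is a motivation for a parameter-free method such as paraclu.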

RefSeq promoters

Is there any nucleotide bias within the -40 region of RefSeqs? I took all hg19 RefSeqs that mapped to assembled chromosomes (36,004), extracted the 40 bp sequence upstream of each RefSeq gene model, and generated a sequence logo. There was no obvious TATA box enrichment. This was expected, since only 10-20% of genes in eukaryotes have…
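The extraction-and-logo step can be sketched as follows. The sketch assumes gene models use UCSC-style 0-based, half-open coordinates (`tx_start`, `tx_end`, `strand`). For a minus-strand gene, "upstream" lies past `tx_end` and has to be reverse-complemented. The function names are hypothetical. The per-column information content is the quantity a sequence logo plots as stack height. No small-sample correction is applied here.

```python
import math
from collections import Counter
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_sequence(chrom_seq: str, tx_start: int, tx_end: int,
                      strand: str, length: int = 40) -> str:
    """Return `length` bp immediately upstream of a gene model (hypothetical helper)."""
    if strand == "+":
        # Upstream of a plus-strand gene is to the left of tx_start.
        return chrom_seq[max(0, tx_start - length):tx_start].upper()
    # Upstream of a minus-strand gene is to the right of tx_end; reverse-complement it.
    region = chrom_seq[tx_end:tx_end + length]
    return region.translate(COMPLEMENT)[::-1].upper()


def position_frequencies(seqs: List[str], length: int = 40) -> List[Dict[str, float]]:
    """Per-position A/C/G/T frequencies; sequences of the wrong length are skipped."""
    counts = [Counter() for _ in range(length)]
    for s in seqs:
        if len(s) != length:
            continue  # e.g. truncated at a chromosome edge
        for i, base in enumerate(s):
            if base in "ACGT":
                counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


def information_content(freqs: List[Dict[str, float]]) -> List[float]:
    """Bits per column (max 2 for DNA), the stack height in a sequence logo."""
    return [2.0 + sum(p * math.log2(p) for p in col.values() if p > 0)
            for col in freqs]


seqs = ["TATA", "TATA", "TAAA"]
freqs = position_frequencies(seqs, length=4)
print(freqs[0], information_content(freqs))
```

A fully conserved column scores 2 bits. A column split evenly across all four bases scores 0. A TATA box enriched across many promoters would therefore appear as a run of tall T/A columns around -30.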