ENCODE RNA polymerase II ChIP-seq

Updated 2018 November 8th: included a section on MACS2. Chromatin immunoprecipitation sequencing (ChIP-seq) is a high-throughput method for investigating protein-DNA interactions; it aims to determine whether specific proteins interact with specific genomic loci. The workflow begins by crosslinking DNA and protein together, usually with formaldehyde, which induces protein-DNA and protein-protein crosslinks….


ENCODE mappability and repeats

The ENCODE mappability tracks can be visualised on the UCSC Genome Browser and they provide a sense of how mappable a region of the genome is in terms of short reads or k-mers. On a side note, it seems some people use “mapability” and some use “mappability”; I was taught the CVC rule, so I’m…
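To give a feel for what "mappable in terms of k-mers" means, here is a minimal sketch on a toy sequence. It is not how the ENCODE tracks were generated: the function name `kmer_uniqueness` and the 1/count scoring are assumptions made for illustration, and the sketch only considers exact matches on one strand.

```python
from collections import Counter

def kmer_uniqueness(sequence, k):
    """Score each k-mer start position as 1 / (occurrences of that k-mer).

    Hypothetical helper for illustration: exact matches, forward strand only.
    A score of 1.0 means the k-mer starting there is unique in the sequence.
    """
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1 / counts[kmer] for kmer in kmers]

# Toy sequence: the 4-mer ACGT occurs twice, every other 4-mer occurs once.
scores = kmer_uniqueness("ACGTACGTTT", 4)
print(scores)
```

Positions whose k-mer occurs more than once get a score below 1, which is the intuition behind a region being less mappable for reads of that length.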


Using the ENCODE ChIA-PET dataset

Updated: 2014 March 14th. From the Wikipedia article: “Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) is a technique that incorporates chromatin immunoprecipitation (ChIP)-based enrichment, chromatin proximity ligation, Paired-End Tags, and High-throughput sequencing to determine de novo long-range chromatin interactions genome-wide.” Let’s get started on using the ENCODE ChIA-PET dataset by downloading the BED files,…


Using ENCODE methylation data

Post updated 2014 January 2nd. DNA methylation is a biochemical process and epigenetic modification whereby a methyl group is added to a cytosine nucleotide to form 5-methylcytosine (adenine can also be methylated). DNA methylation at the 5′ position of cytosine is generally associated with reduced gene transcription and typically occurs in CpG sites, which are regions…


Genome mapability

I know of the genome mapability and uniqueness tracks provided by the UCSC Genome Browser, but I was interested in doing this myself for the hg19 genome. As a test, I investigated chr22, where the base composition breaks down as follows: length of chr22 = 51,304,566; A: 9,094,775; C: 8,375,984; G: 8,369,235; T: 9,054,551…
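A base composition tally like the one above can be sketched in a few lines of Python. This is a minimal sketch on a toy string rather than the real chr22 sequence; the function name `base_composition` is hypothetical, and treating lowercase (soft-masked) bases the same as uppercase ones is an assumption.

```python
from collections import Counter

def base_composition(sequence):
    """Count each base in a sequence, ignoring case.

    Hypothetical helper for illustration; lowercase (soft-masked) bases
    are folded into their uppercase counterparts.
    """
    return Counter(sequence.upper())

# Toy sequence standing in for a chromosome; a real run would read a FASTA file.
toy = "ACGTNNacgtAC"
counts = base_composition(toy)
for base in "ACGTN":
    print(base, counts[base])
```

For a whole chromosome, the sequence lines of the FASTA file would need to be read in and joined (skipping the header line) before counting.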


GENCODE

By now you should have heard about the ENCODE project. GENCODE, summarised as the “Encyclopædia of genes and gene variants”, is a sub-project of ENCODE that aims to annotate all evidence-based gene features in the entire human genome with high accuracy. This includes protein-coding genes and their isoforms, non-coding RNAs and pseudogenes….


Encyclopedia of DNA elements (ENCODE)

ENCODE is an abbreviation of “Encyclopedia of DNA elements”, a project that aims to discover and define the functional elements encoded in the human genome via multiple technologies. In this context, the term “functional element” denotes a discrete region of the genome that encodes a defined product (e.g., protein) or a…
