ENCODE RNA polymerase II ChIP-seq

Updated 2018 November 8th: added a section on MACS2.

Chromatin immunoprecipitation sequencing (ChIP-seq) is a high-throughput method for investigating protein-DNA interactions; it aims to determine whether specific proteins interact with specific genomic loci. The workflow consists of crosslinking DNA and protein together, usually with formaldehyde, which induces protein-DNA and protein-protein crosslinks….


Annotating RNA-Seq data

After mapping the reads from an RNA-Seq experiment, the next task is usually to identify the transcripts that the reads came from (i.e. annotating the RNA-Seq data), and there are many ways of doing so. Here I describe a rather crude method in which I download the sequence coordinates of hg19 RefSeqs as a BED12 file from the…


Motifs upstream of RefSeq gene models

Here’s a very primitive way of looking for motifs upstream of RefSeq gene models.

1) Download the upstream sequences (-50) of RefSeq gene models using the UCSC Table Browser tool as a bed file
2) Using the fastaFromBed tool from BEDTools, make fasta files from the bed file
3) Look for motifs

Here’s the main…


GC and AT content of 5′ UTRs, 3′ UTRs and coding exons of RefSeq gene models

First, download bed tracks of the 5′ UTRs, 3′ UTRs and coding exons from the UCSC Table Browser. The RefSeq gene models are in the table called refGene. After you’ve saved the three bed files (e.g. mm9_refgene_090212_5_utr.bed, mm9_refgene_090212_3_utr.bed and mm9_refgene_090212_coding_exon.bed), use the fastaFromBed program from the BEDTools suite to convert each bed file into a…
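Once each bed file has been turned into FASTA, the GC and AT fractions can be tallied with a few lines of Python. The following is only a minimal sketch, not the code from the post: the function names `read_fasta` and `gc_at_content` and the toy records are made up for illustration, and it assumes that bases other than A, C, G and T (such as N) should be left out of the totals.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_at_content(sequences):
    """Return (GC fraction, AT fraction) pooled over all sequences.

    Assumption: anything other than A, C, G or T is ignored.
    """
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()  # soft-masked (lowercase) bases still count
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0, 0.0
    return gc / total, at / total


# Toy FASTA text standing in for fastaFromBed output (hypothetical records).
toy = StringIO(">chr1:100-108\nGGCCAATT\n>chr1:200-204\nggcn\n")
gc, at = gc_at_content(seq for _, seq in read_fasta(toy))
print(f"GC={gc:.3f} AT={at:.3f}")
```

Running the same function separately on the 5′ UTR, 3′ UTR and coding exon FASTA files would give one GC/AT pair per feature class for comparison.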


Bidirectional genes

1) Download the 5′ UTRs for all RefSeq genes using the UCSC Table Browser.
2) Separate the features according to strand.
3) Use intersectBed to find overlapping features.
4) Perform a GO enrichment analysis on the unique list of bidirectional genes, using all the genes as the universe list.

Although this was a brief analysis, the results are somewhat similar…
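The strand-splitting and overlap steps can be mimicked in plain Python to make the logic explicit. This is a rough sketch rather than the intersectBed command used in the post: the function names, the tuple layout and the gene names are hypothetical, coordinates are assumed to be BED-style half-open intervals, and the all-against-all comparison is only suitable for small inputs.

```python
from collections import defaultdict


def split_by_strand(features):
    """Group (chrom, start, end, name, strand) tuples by strand, then chromosome."""
    plus, minus = defaultdict(list), defaultdict(list)
    for chrom, start, end, name, strand in features:
        if strand == "+":
            plus[chrom].append((start, end, name))
        elif strand == "-":
            minus[chrom].append((start, end, name))
        # Features without a strand are skipped.
    return plus, minus


def opposite_strand_overlaps(features):
    """Return (plus_name, minus_name) pairs whose intervals overlap.

    Assumption: intervals are half-open [start, end), as in BED files.
    """
    plus, minus = split_by_strand(features)
    pairs = []
    for chrom, plus_feats in plus.items():
        for p_start, p_end, p_name in plus_feats:
            for m_start, m_end, m_name in minus.get(chrom, []):
                if p_start < m_end and m_start < p_end:
                    pairs.append((p_name, m_name))
    return pairs


# Hypothetical 5' UTR features: (chrom, start, end, name, strand).
toy_utrs = [
    ("chr1", 100, 200, "geneA", "+"),
    ("chr1", 150, 250, "geneB", "-"),
    ("chr1", 300, 400, "geneC", "-"),
    ("chr2", 100, 200, "geneD", "+"),
]
pairs = opposite_strand_overlaps(toy_utrs)
print(pairs)
```

The unique gene names collected from such pairs would form the candidate list for the GO enrichment step.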


RefSeq promoters

Is there any nucleotide bias in the -40 region of RefSeqs? Taking all hg19 RefSeqs that mapped to assembled chromosomes (36,004) and extracting the nucleotide sequences 40 bp upstream of each RefSeq gene model, I generated a sequence logo. There was no obvious TATA box enrichment, which was expected since only 10-20% of genes in eukaryotes have…
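A sequence logo is drawn from per-position base frequencies, so a quick way to check for bias before plotting is to tabulate those frequencies directly. The sketch below is an illustration under stated assumptions, not the method used to make the logo in the post: `position_frequencies` is a hypothetical helper, the four short sequences are toy data, and all input sequences are assumed to have the same length (40 bp in the analysis above).

```python
from collections import Counter


def position_frequencies(sequences, alphabet="ACGT"):
    """Return one {base: frequency} dict per position.

    Assumptions: all sequences have equal length, and symbols outside
    the alphabet (e.g. N) are excluded from the per-position total.
    """
    sequences = list(sequences)
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in sequences)
        total = sum(counts[base] for base in alphabet)
        matrix.append(
            {base: (counts[base] / total if total else 0.0) for base in alphabet}
        )
    return matrix


# Toy upstream sequences (hypothetical, far shorter than 40 bp).
toy_upstream = ["TATA", "TAAA", "TTTA", "CATA"]
freqs = position_frequencies(toy_upstream)
for position, column in enumerate(freqs):
    print(position, column)
```

A position where one base dominates, as T does at the first position of the toy data, would show up as a tall letter in the logo; roughly flat columns correspond to no visible enrichment.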
