Making slides using R

Updated 2015 January 14th. I’ll be giving a talk on analysing CAGE data using R at an upcoming workshop held at our institute early next year. I thought it would be a good time to check out slidify, an R package that creates HTML slides from R Markdown, since I’m planning on…


CAGE analysis using the R Bioconductor package CAGEr

This post is outdated; please refer to the official documentation. Cap Analysis Gene Expression (CAGE) is a molecular technique, developed at RIKEN, which captures the transcription start sites (TSSs) of an RNA population. The C in CAGE refers to the altered nucleotide at the 5′ end of precursor messenger RNA, termed the cap, which CAGE…


Encyclopedia of DNA elements (ENCODE)

ENCODE is an abbreviation of “Encyclopedia of DNA elements”, a project that aims to discover and define the functional elements encoded in the human genome via multiple technologies. In this context, the term “functional element” denotes a discrete region of the genome that encodes a defined product (e.g., protein) or a…


Clustering mapped reads

Updated 2014 October 8th to include an analysis using CAGE data from ENCODE; parts of the post were also rewritten. In this post I will write about a read clustering method called paraclu, which allows mapped reads to be clustered together. This is particularly useful when working with CAGE data, where transcription start sites (TSSs) are…


Visualising FANTOM3 human CAGE data

First download the relevant tables from here. Download tss_library_expr_summary.tsv.bz2, tss_summary.tsv.bz2 and rna_lib_summary.tsv.bz2, then decompress them with bunzip2. According to rna_lib_summary.tsv, there are 13,897 CAGE tags from the prostate gland (rna_lib_id = HBA). Double check this. Genomic coordinates of TSSs can be obtained from tss_summary.tsv. Here’s a script that generates bedgraph files of all…
