This post is outdated; please refer to the official documentation.
Cap Analysis Gene Expression (CAGE) is a molecular technique, developed at RIKEN, that captures the transcription start sites (TSSs) of an RNA population. The C in CAGE refers to the cap, the modified nucleotide at the 5' end of precursor messenger RNA, which CAGE targets and pulls down. The vignette of the CAGEr package has a very nice introduction to CAGE. I'd just like to add that several other CAGE protocols exist, such as HeliScopeCAGE and nanoCAGE. While these protocols all capture TSSs, their biochemical steps differ; nanoCAGE in particular does not use cap trapping but template switching. If you're interested in template switching with respect to transcriptome studies, have a look at the introduction of this paper, which I wrote.
In this post I will go through the workflow of the CAGEr package. If you perform CAGE analysis, I highly recommend using this package. It provides the analysis steps that are commonly used in CAGE studies and removes the need for in-house scripts. For the first part I will use publicly available FANTOM3 and FANTOM4 data, which are packaged in Bioconductor as part of CAGEr. The second part shows an example session using ENCODE CAGE data, which are conveniently provided as BAM files.