Interactive plots in R using plotly

I found out about plotly a couple of months ago via R-bloggers.

I finally gave it a go when a friend asked me for help making a Gantt chart, and I was impressed with plotly's interactivity and ease of use. Since I use scatter plots a lot, this post is about making interactive scatter plots in R using plotly.

Animated plots using R

I learned the simple concept of animation back in school, when some of my classmates would draw stick figures on the edges of large textbooks. At first I wondered why anyone would defile a textbook in such a way, but then they flipped through the pages and brought the stick figures to life, and I was in awe. Even so, at that stage of my life textbooks were sacred to me (they were expensive and scarce), so I used large Post-it notes to doodle instead. I wasn't very good at drawing (even when it came to stick figures), so I made a few animations and that was it.

This post is on creating animated plots using R. I wrote it not because I wanted to rekindle my youthful interest in stick figure animation but because I wanted to create an animated plot for an upcoming talk. I found a short post on creating animated plots using R and followed the same idea: make multiple plots and then combine them into a GIF using ImageMagick.
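
The combining step can be sketched roughly as below. This is only a minimal illustration, assuming the frames have already been written out by R as numbered PNG files and that ImageMagick's convert command is installed; the function name, file names and delay value are made up for the example.

```shell
#!/bin/sh
# Sketch: stitch a series of PNG frames into a looping GIF.
# Assumes ImageMagick's `convert` is on the PATH (newer ImageMagick
# releases may provide the same functionality as `magick`).
# frames_to_gif, the file names and the delay are illustrative only.
frames_to_gif() {
    out="$1"    # output GIF, e.g. animation.gif
    shift       # the remaining arguments are the frame files, in order
    if ! command -v convert >/dev/null 2>&1; then
        echo "convert (ImageMagick) not found" >&2
        return 1
    fi
    # -delay is given in hundredths of a second; -loop 0 repeats forever
    convert -delay 20 -loop 0 "$@" "$out"
}

# Usage (zero-padded frame names keep the shell glob in frame order):
#   frames_to_gif animation.gif frame_*.png
```

Zero-padded frame names matter here because the shell expands the glob in lexical order, so frame_010.png would otherwise sort before frame_2.png.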

ENCODE RNA polymerase II ChIP-seq

Chromatin immunoprecipitation sequencing (ChIP-seq) is a high-throughput method for investigating protein-DNA interactions; it aims to determine whether specific proteins interact with specific genomic loci. The workflow starts by crosslinking DNA and protein together, usually with formaldehyde, which induces protein-DNA and protein-protein crosslinks. Importantly, these crosslinks are reversible by incubation at 70°C. Next, the crosslinked DNA-protein complexes are sheared into roughly 500 bp fragments, usually by sonication. At this point we have "sheared DNA" and "sheared DNA crosslinked with proteins". Now comes the immunoprecipitation step, a technique that precipitates a protein antigen out of solution using an antibody that recognises that particular antigen. Crosslinking captures many DNA-protein interactions, and we use immunoprecipitation to pull down the protein of interest together with the DNA region it was interacting with. After immunoprecipitation, the formaldehyde crosslinks are reversed by heating and the DNA is purified and sequenced. There's a nice graphic depicting this workflow in the Wikipedia article on ChIP-seq.

Using Gviz

Updated: 2013 November 15th

A while ago I asked on Twitter what tools people use to visualise hundreds of BAM files. One of the suggestions was Gviz (thanks Sebastian!), so I took a quick look at the Bioconductor package, and the plots looked really great! Here I use Gviz to plot features along a reference sequence and to visualise BAM files.

From the vignette:

The Gviz package aims to provide a structured visualisation framework for plotting any type of data along genomic coordinates. The fundamental concept behind the Gviz package is similar to the approach taken by most genome browsers, in that individual types of genomic features or data are represented by separate tracks.

Creating a coverage plot in R

Disclaimer (2015 August 5th): as pointed out in the comment thread below, this post created a density plot rather than a coverage plot. I have written a new post that uses BEDTools to calculate the coverage and R to produce an actual coverage plot.

I've recently discovered GitHub Gist, so for this post I'm going to use that to host my code (and for all subsequent posts as I see fit). The code was not displaying properly due to some CSS property of the Twenty Ten theme, so I had to update my WordPress theme to Twenty Eleven, which also led me to change my header image. The photo I used for the header is a shot I took at the summit of Mount Fuji at around 06:00 on 24th August 2013, when the clouds finally cleared a little; I hiked all night to make it to the top to see the sunrise, but unfortunately the weather was terrible. The photo looks nice, so I thought I'd use it as the header.

Anyway, back to the topic: I wanted to create a coverage plot of mapped reads starting from a BAM file. So far I've been using IGV's coverage track to get a visual idea of the coverage. In the past, I've also used bedtools genomecov to generate bedGraph files, and from those the wig and bigWig files that I would then visualise on the UCSC Genome Browser. How about creating a coverage plot in R (so that I can export it as a PostScript file)? Yeah sure, why not. Let's download a BAM file as an example:

# the smallest CAGE BAM file from ENCODE
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRikenCage/wgEncodeRikenCageHchCellPapAlnRep1.bam
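
For comparison, the bedtools genomecov route mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the exact commands used for this post: the function name and the output file name are made up, and it assumes bedtools is installed and that the BAM file is sorted by position, which genomecov expects for BAM input.

```shell
#!/bin/sh
# Sketch: write per-interval read coverage from a BAM file as bedGraph.
# Assumes bedtools is on the PATH and the BAM is position-sorted.
# bam_to_bedgraph and the file names below are illustrative only.
bam_to_bedgraph() {
    bam="$1"         # input BAM file
    bedgraph="$2"    # output bedGraph file
    if ! command -v bedtools >/dev/null 2>&1; then
        echo "bedtools not found" >&2
        return 1
    fi
    # -ibam reads BAM input; -bg reports coverage in bedGraph format
    bedtools genomecov -ibam "$bam" -bg > "$bedgraph"
}

# Usage:
#   bam_to_bedgraph wgEncodeRikenCageHchCellPapAlnRep1.bam cage.bedgraph
```

The resulting bedGraph has four columns (chromosome, start, end, coverage), which is easy to read into R as a table for plotting.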
