Updated 2015 April 6th to include the intersect_bed() function in the bedr package.
Last year I saw a post on Writing an R package from scratch and I always wanted to follow the tutorial. Yesterday while trying to make some plots using Gviz, I had some BED-like files (not supported by Gviz), which I wanted to convert into a GRanges object (supported by Gviz). I thought it would be easier if there was a function that could load BED-like files into GRanges objects. I couldn't find one so I thought I'll write a very primitive R package that contains this function. This post is on the creation of the BED to GRanges function and the R package that contains it.
Following the tutorial by Hilary, I installed and loaded the necessary packages. Then I created my package, which I named bedr:
install.packages("devtools") library("devtools") devtools::install_github("klutometis/roxygen") library(roxygen2) setwd("~") create("bedr")
When the create() function is run, it produces some files, one of which is the DESCRIPTION file. In this file you can list basic information as well as all the dependencies your package requires; just use a comma to separate entries in "Depends." Furthermore, you can add an "Imports:" line to import required packages. Below is my minimal DESCRIPTION file.
cat DESCRIPTION Package: bedr Title: BED to GRanges Version: 0.0.0.9000 Authors@R: person("Dave", "Tang", , "me@davetang.org", role = c("aut", "cre")) Description: Loads a BED-like file and stores it as a GRanges object. Depends: R (>= 3.1.2) License: Free LazyData: true
I wrote a simple R function (with some roxygen2 formatted documentation), which simply loads a BED-like file and returns it as a GRanges object:
#' BED to GRanges #' #' This function loads a BED-like file and stores it as a GRanges object. #' The tab-delimited file must be ordered as 'chr', 'start', 'end', 'id', 'score', 'strand'. #' The minimal BED file must have the 'chr', 'start', 'end' columns. #' Any columns after the strand column are ignored. #' #' @param file Location of your file #' @keywords BED GRanges #' @export #' @examples #' bed_to_granges('my_bed_file.bed') bed_to_granges <- function(file){ df <- read.table(file, header=F, stringsAsFactors=F) if(length(df) > 6){ df <- df[,-c(7:length(df))] } if(length(df)<3){ stop("File has less than 3 columns") } header <- c('chr','start','end','id','score','strand') names(df) <- header[1:length(names(df))] if('strand' %in% colnames(df)){ df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand) } library("GenomicRanges") if(length(df)==3){ gr <- with(df, GRanges(chr, IRanges(start, end))) } else if (length(df)==4){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id)) } else if (length(df)==5){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=score)) } else if (length(df)==6){ gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=score, strand=strand)) } return(gr) }
The above function is stored in a file called bed_to_granges.R, which I stored in the bedr/R directory. Next I'll create the documentation by running the document() function and I'm done!
setwd("bedr") document()
Moving the package to GitHub
I made a new repository on GitHub and pushed all the files to it:
pwd /Users/davetang/bedr git init git add -A git commit -m 'Initial commit' git remote add origin git@github.com:davetang/bedr.git git push origin master
Testing the package
The reason I wrote this function was to make it easy to load broad peak files from ENCODE. It's a nine column file, where the first six columns contain the typical information of a BED file, and the last three columns contain information regarding the peaks. Let's download it to test the bedr package:
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibTfbs/wgEncodeHaibTfbsH1hescPol2V0416102PkRep1.broadPeak.gz
Now we can install the package directly from GitHub and test it by loading the broad peak file:
library(devtools) install_github('davetang/bedr') library(bedr) my_file <- 'wgEncodeHaibTfbsH1hescPol2V0416102PkRep1.broadPeak.gz' my_granges <- bed_to_granges(my_file) head(my_granges) GRanges object with 6 ranges and 2 metadata columns: seqnames ranges strand | id score <Rle> <IRanges> <Rle> | <character> <integer> [1] chr1 [ 9932, 10575] * | peak1 251 [2] chr1 [10936, 26830] * | peak2 958 [3] chr1 [26931, 31558] * | peak3 958 [4] chr1 [31777, 33355] * | peak4 176 [5] chr1 [34204, 35247] * | peak5 95 [6] chr1 [35538, 39530] * | peak6 958 ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths
What can we do with GRanges objects?
For more information have a look at a post I wrote on the GenomicRanges package and on the Gviz package for those who are interested.
Adding more functionality to bedr
I want to add another function to the bedr package; firstly, I'll clone the repository:
git clone git@github.com:davetang/bedr.git
Now write another function:
#' Intersect BED #' #' This function performs an intersection between two GRanges objects #' and returns the results in a data.frame #' #' @param a First GRanges object #' @param b Second GRanges object #' @keywords BED GRanges #' @export #' @examples #' intersect_bed(a, b) intersect_bed <- function(a, b){ library(GenomicRanges) my_hit <- findOverlaps(a, b) my_df <- cbind(as.data.frame(a[queryHits(my_hit)]), as.data.frame(b[subjectHits(my_hit)])) }
Save the above function as a file called intersect_bed.R in the bedr/R directory. Next create the documentation by running the document() function.
library(roxygen2) library(devtools) setwd('~/github/bedr/') document()
Now push to GitHub:
cd ~/github/bedr git add -A git commit -m 'intersect_bed.R' git push origin master

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Bioconductor package ChIPseeker can read bed file into GRanges object. http://bioconductor.org/packages/devel/bioc/html/ChIPseeker.html
Thanks!
I was planning to write the same stuff long time ago….good to see it here.
I was just playing with your code and I noticed if fails if ‘strand’ is not specified in the input bed file. Easy enough to fix:
if(‘strand’ %in% colnames(df)){
df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)
}
Ah yes, I should only make the substitution when there’s strand information. I’ve updated the code, thanks!
I think you should be adding one to the start coordinate since bed is 0 based.
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2010-April/001147.html
Also rtracklayer::import.bed does this nicely.