Interactive plots in R

Interactive plots, as the name suggests, are plots that users can interact with. In my last post, I mentioned that I use the d3heatmap package for interactive heatmaps, so to get started I'll recreate the heatmap from that post, this time with d3heatmap.

# install packages if you haven't already
install.packages("d3heatmap")
install.packages("RColorBrewer")
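# biocLite() is the installer for older Bioconductor releases;
# newer releases use BiocManager::install() instead
source("https://bioconductor.org/biocLite.R")
biocLite("DESeq")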

# load libraries
library("DESeq")
library("RColorBrewer")
library("d3heatmap")

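# load the example count table that ships with DESeq
example_file <- system.file("extra/TagSeqExample.tab", package = "DESeq")
data <- read.delim(example_file, header = TRUE, row.names = "gene")
# keep only genes with more than 50,000 counts summed across all samples
data_subset <- as.matrix(data[rowSums(data) > 50000, ])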

# using the same colour scheme as pheatmap
d3heatmap(data_subset,
          colors = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100))

If you hover over the cells, you can see the corresponding gene and sample. You can also drag to select a region and zoom in on specific cells (clicking once on the zoomed-in area brings you back to the full heatmap).
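To make the two preparation steps for the heatmap a bit more concrete, here is a minimal base R sketch on a made-up count matrix. The gene names, sample names, counts, and the three hex colours are all hypothetical (the colours are hand-picked and are not the RdYlBu palette); only rowSums() and colorRampPalette() come from the code above.

```r
# hypothetical toy count matrix: 3 genes x 3 samples (values are made up)
counts <- matrix(c(10,    20,    30,
                   40000, 30000, 20000,
                   5,     0,     1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3")))

# keep genes whose total count across samples exceeds the threshold;
# drop = FALSE keeps the result a matrix even if only one gene passes
keep <- rowSums(counts) > 50000
counts_subset <- counts[keep, , drop = FALSE]
print(counts_subset)

# colorRampPalette() returns a function; calling it with 100 gives
# 100 colours interpolated between the colours supplied
pal <- colorRampPalette(c("#4575B4", "#FFFFBF", "#D73027"))(100)
print(length(pal))
```

The drop = FALSE is not in the original code; I added it here because a single surviving row would otherwise be simplified to a plain vector.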

The next example uses the plotly package to make ggplot2 plots interactive. We'll make some plots using the latest web traffic for this blog, which I have saved as a CSV file.

my_csv <- "https://davetang.org/site_stat/blog_20180517.csv"

d <- read.csv(my_csv)
d$date    <- as.Date(d$date)
d$day     <- factor(weekdays(d$date), levels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'))
d$weekend <- grepl(pattern = "^S", x = d$day)
d$month   <- factor(months(d$date), levels = month.name)
d$quarter <- factor(quarters(d$date))
d$year    <- format(d$date, "%Y")
d$cumsum  <- cumsum(d$views)

head(d)
        date views       day weekend   month quarter year cumsum
1 2013-01-22   130   Tuesday   FALSE January      Q1 2013    130
2 2013-01-23   269 Wednesday   FALSE January      Q1 2013    399
3 2013-01-24   258  Thursday   FALSE January      Q1 2013    657
4 2013-01-25   146    Friday   FALSE January      Q1 2013    803
5 2013-01-26    52  Saturday    TRUE January      Q1 2013    855
6 2013-01-27    53    Sunday    TRUE January      Q1 2013    908
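One caveat with the code above is that weekdays() returns day names in the current locale, so the grepl("^S", ...) test for weekends assumes English day names. A possible locale-independent variant, sketched here on the six rows from the head() output, uses the ISO weekday number instead; the toy data frame and the weekday_num column are names I made up for this illustration.

```r
# the first six rows from the head() output above
toy <- data.frame(date  = as.Date("2013-01-22") + 0:5,
                  views = c(130, 269, 258, 146, 52, 53))

# %u is the ISO weekday number (Monday = 1, ..., Sunday = 7),
# which does not depend on the language of the day names
toy$weekday_num <- as.integer(format(toy$date, "%u"))
toy$weekend     <- toy$weekday_num >= 6

# running total of views, as in d$cumsum
toy$cumsum <- cumsum(toy$views)
print(toy)
```

The weekend and cumsum columns should match the corresponding columns in the head() output.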

I will also make use of the ggbeeswarm, ggthemes, ggplot2, and plotly packages. The ggbeeswarm package provides beeswarm-style plots, which are a nice visualisation because they show every observation and spread the points out according to their density.

# install packages if you haven't already
install.packages("ggbeeswarm")
install.packages("ggthemes")
install.packages("ggplot2")
install.packages("plotly")

# load libraries
library("ggbeeswarm")
library("ggthemes")
library("ggplot2")
library("plotly")

p <- ggplot(d, aes(x = day, y = views, colour = day, text = date)) +
  ggbeeswarm::geom_quasirandom() +
  theme_tufte() +
  theme(legend.title = element_blank(),
        axis.title.x = element_blank(),
        panel.border = element_rect(fill = NA)) +
  ylab("Views")

ggplotly(p)

Please wait patiently while the plot loads. Once loaded, you can hover over the points to see the view count on a particular date. For example, the most traffic I have ever gotten for a single day was just two days ago. You can also click on the days in the legend to hide points for a particular day (not that useful here but useful for scatter plots with different groups).

From the plot, we can see that traffic is consistently higher on some days than on others. Since most of my blog posts are work related, I get a lot more visitors on weekdays than on weekends, and of the weekdays, I get the least traffic on Fridays.
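A quick way to back up this kind of visual impression is to summarise the views per day. The sketch below uses base R's tapply() on a small made-up data frame (the view counts are hypothetical and only three days are included); with the real data, d$views and d$day would take the place of the toy columns.

```r
# hypothetical toy data: three observations each for three days
toy <- data.frame(
  day   = factor(rep(c("Monday", "Friday", "Saturday"), each = 3),
                 levels = c("Monday", "Friday", "Saturday")),
  views = c(200, 220, 210, 150, 170, 160, 50, 60, 55)
)

# median views for each day of the week
day_median <- tapply(toy$views, toy$day, median)
print(day_median)
```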

I'll use dplyr to separate the traffic per day and conduct a pairwise Wilcoxon rank sum test between all days. The BH-adjusted p-values suggest that the traffic distributions differ between most pairs of days; the pairs that do not differ at the 0.05 level are all among Monday to Thursday.

# install packages if you haven't already
install.packages("dplyr")
library("dplyr")

# to ensure that every day of the week has the same number of observations,
# I only keep complete weeks: the first date kept is a Monday (2013-01-28)
# and the last is a Sunday (2018-05-13)
my_monday    <- d %>% filter(day == "Monday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_tuesday   <- d %>% filter(day == "Tuesday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_wednesday <- d %>% filter(day == "Wednesday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_thursday  <- d %>% filter(day == "Thursday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_friday    <- d %>% filter(day == "Friday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_saturday  <- d %>% filter(day == "Saturday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_sunday    <- d %>% filter(day == "Sunday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull

my_view <- c(my_monday, my_tuesday, my_wednesday, my_thursday, my_friday, my_saturday, my_sunday)
my_factor <- factor(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
                        c(length(my_monday), length(my_tuesday), length(my_wednesday), length(my_thursday), length(my_friday), length(my_saturday), length(my_sunday))),
                    levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

pairwise.wilcox.test(my_view, my_factor, p.adjust.method = "BH")

	Pairwise comparisons using Wilcoxon rank sum test 

data:  my_view and my_factor 

          Monday  Tuesday Wednesday Thursday Friday  Saturday
Tuesday   0.0143  -       -         -        -       -       
Wednesday 0.0617  0.5160  -         -        -       -       
Thursday  0.2887  0.1608  0.4072    -        -       -       
Friday    0.0038  7.5e-08 1.0e-06   4.2e-05  -       -       
Saturday  < 2e-16 < 2e-16 < 2e-16   < 2e-16  < 2e-16 -       
Sunday    < 2e-16 < 2e-16 < 2e-16   < 2e-16  < 2e-16 0.0155  

P value adjustment method: BH
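Reading a table of 21 comparisons by eye gets tedious, so it can help to work with the p-values directly. The object returned by pairwise.wilcox.test() stores the adjusted p-values as a lower-triangular matrix in its p.value element. Here is a minimal sketch on made-up data with three hypothetical groups, where A and B overlap and C is clearly higher.

```r
# hypothetical toy data: A and B are interleaved, C is far larger
x <- c(c(1, 4, 7, 10, 13, 16, 19, 22),   # group A
       c(2, 5, 8, 11, 14, 17, 20, 23),   # group B
       101:108)                          # group C
g <- factor(rep(c("A", "B", "C"), each = 8), levels = c("A", "B", "C"))

res <- pairwise.wilcox.test(x, g, p.adjust.method = "BH")

# adjusted p-values; rows are levels 2..n, columns are levels 1..(n-1),
# and the unused upper triangle is NA
p <- res$p.value
print(p)

# number of pairs that differ at the 0.05 level
n_sig <- sum(p < 0.05, na.rm = TRUE)
print(n_sig)
```

In this toy example only the two comparisons involving C fall below 0.05. The same two lines at the end could be applied to the result of the test on my_view and my_factor.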

Finally, I'll use dygraphs to plot the web traffic. Since my web traffic is a time-series, I'll use the zoo and xts packages to create time-series objects; xts objects are compatible with dygraphs.

# install packages if you haven't already
install.packages("dygraphs")
install.packages("zoo")
install.packages("xts")

# load libraries
library("dygraphs")
library("zoo")
library("xts")

# I only plot the weekdays here, since the weekend traffic is too different
# removing the weekends leaves irregular intervals, so I use the zoo package
# and convert the zoo object to an xts object for use with dygraphs
my_weekday_date <- d %>% filter(weekend == FALSE) %>% select(date) %>% pull
my_weekday_view <- d %>% filter(weekend == FALSE) %>% select(views) %>% pull
my_zoo_weekday <- zoo(my_weekday_view, my_weekday_date)
my_zoo_weekday_xts <- as.xts(my_zoo_weekday, order.by = my_weekday_date)

dygraph(my_zoo_weekday_xts, main = "Web traffic for https://davetang.org/muse") %>% 
  dyRangeSelector(dateWindow = c("2017-01-01", "2018-05-17"))

You can mouse over the graph to show the view counts for particular days and adjust the slider to focus the plot on specific time periods. (The increase in the traffic as of late is because I upgraded my web hosting plan, which provides more resources; I didn't realise I was hitting resource limits.)
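To illustrate what "irregular intervals" means once the weekends are removed, here is a small sketch on ten days of made-up view counts. The toy data frame and its values are hypothetical. The last step builds the xts object directly with xts::xts() rather than going through zoo, which is a different route from the one I used above, and it only runs if the xts package is installed.

```r
# hypothetical toy data: ten consecutive days starting on Monday 2018-05-07
toy <- data.frame(date  = as.Date("2018-05-07") + 0:9,
                  views = c(120, 135, 128, 140, 110, 40, 35, 125, 138, 131))

# ISO weekday numbers: 6 and 7 are Saturday and Sunday
toy$weekend <- as.integer(format(toy$date, "%u")) >= 6
weekdays_only <- toy[!toy$weekend, ]

# gaps between consecutive observations, in days;
# the jump from Friday to Monday shows up as a gap of 3
gaps <- as.integer(diff(weekdays_only$date))
print(gaps)

# build the time-series object directly (skipped if xts is not installed)
if (requireNamespace("xts", quietly = TRUE)) {
  weekday_xts <- xts::xts(weekdays_only$views, order.by = weekdays_only$date)
  print(weekday_xts)
}
```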

For my last plot, I create a separate time-series for each day of the week.

# the time-series interval is one week, so every series shares the same index
# and each series holds the count for its day within that week
# (weekly steps from d$date[7] to d$date[1938])
my_week      <- seq(d$date[7], d$date[1938], by = "week")
my_monday    <- d %>% filter(day == "Monday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_tuesday   <- d %>% filter(day == "Tuesday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_wednesday <- d %>% filter(day == "Wednesday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_thursday  <- d %>% filter(day == "Thursday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_friday    <- d %>% filter(day == "Friday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_saturday  <- d %>% filter(day == "Saturday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull
my_sunday    <- d %>% filter(day == "Sunday", date > "2013-01-27", date < "2018-05-14") %>% select(views) %>% pull

my_monday_xts    <- as.xts(zoo(my_monday, my_week), order.by = my_week)
my_tuesday_xts   <- as.xts(zoo(my_tuesday, my_week), order.by = my_week)
my_wednesday_xts <- as.xts(zoo(my_wednesday, my_week), order.by = my_week)
my_thursday_xts  <- as.xts(zoo(my_thursday, my_week), order.by = my_week)
my_friday_xts    <- as.xts(zoo(my_friday, my_week), order.by = my_week)
my_saturday_xts  <- as.xts(zoo(my_saturday, my_week), order.by = my_week)
my_sunday_xts    <- as.xts(zoo(my_sunday, my_week), order.by = my_week)

my_merged_xts <- merge.zoo(my_monday_xts,
                           my_tuesday_xts,
                           my_wednesday_xts,
                           my_thursday_xts,
                           my_friday_xts,
                           my_saturday_xts,
                           my_sunday_xts)

dygraph(my_merged_xts, main = "Web traffic for https://davetang.org/muse") %>% 
  dyRangeSelector(dateWindow = c("2017-01-01", "2018-05-17")) %>%
  dyLegend(width = 200, show = "follow")

Mousing over the plot will show the view counts for each day for a particular week.
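The idea behind the per-day series is that daily data covering complete weeks can be laid out as one row per week and one column per day. A minimal base R sketch of that layout, on two weeks of made-up counts, is shown below; the dates and values are hypothetical, and this uses a plain matrix rather than the xts objects above.

```r
# hypothetical toy data: two complete weeks starting on Monday 2018-04-30
dates <- as.Date("2018-04-30") + 0:13
views <- 1:14   # made-up view counts

# the layout only works if the data cover complete weeks
stopifnot(length(views) %% 7 == 0)

# one index value per week: the Monday that starts it
week_start <- seq(dates[1], by = "week", length.out = length(views) / 7)

# fill row by row so that each row is a week and each column is a day
by_day <- matrix(views, ncol = 7, byrow = TRUE,
                 dimnames = list(format(week_start),
                                 c("Monday", "Tuesday", "Wednesday", "Thursday",
                                   "Friday", "Saturday", "Sunday")))
print(by_day)
```

Each column of by_day corresponds to one of the per-day series, and each row name plays the role of an entry in my_week.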

Summary

Interactive plots are quite useful, especially for finding outliers. Be sure to check out plotly and htmlwidgets for even more interactive plots.




This work is licensed under a Creative Commons Attribution 4.0 International License.