Wordcloud of PubMed searches

At the start of this year I created a Twitter account that automatically tweets out papers related to transcriptomes, i.e. a Twitter literature bot. This idea isn’t new and there are over 200 Twitter literature bots. However, I wrote my Twitter bot using R (and using the RISmed package to search PubMed for papers) and it’s running on an EC2 instance, which is part of Amazon Web Services. I went with this approach simply because I wanted to try out Amazon Web Services; I will have to find another server to run my Twitter bot when my free period is over.

Each day a PubMed search is performed using just the keyword “transcriptome” and the results of the search are saved. Journal articles are indexed with various keywords, so my search will pick up articles even if “transcriptome” wasn’t in the title of the paper (see MeSH for more information). Below I create a wordcloud of PubMed searches conducted over a period of six months.
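For context, the daily search-and-save step could look roughly like the sketch below. It is only an illustration: the helper names `search_file` and `run_daily_search`, the dated file naming and the `retmax` value are assumptions, not the settings actually used by the bot. The one detail taken from the code further down is that each saved file contains an object called `summary`.

```r
library(RISmed)

# Hypothetical helper: build a dated file name such as "2015-06-01.RData"
search_file <- function(day = Sys.Date(), dir = ".") {
  file.path(dir, paste0(format(day, "%Y-%m-%d"), ".RData"))
}

# Hypothetical daily job: search PubMed and save the query summary.
# retmax = 100 is an assumed cap on the number of returned records.
run_daily_search <- function(term = "transcriptome", dir = ".") {
  summary <- EUtilsSummary(term, type = "esearch", db = "pubmed", retmax = 100)
  # save under the name "summary", which is what the loading code expects
  save(summary, file = search_file(dir = dir))
  invisible(QueryCount(summary))
}
```

Calling `run_daily_search(dir = "~/tmp/transcriptomes")` once a day (for example from cron) would then build up one file per day in the directory read by the wordcloud code.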

library(wordcloud)
library(tm)
library(RISmed)

setwd("~/tmp/transcriptomes/")
my_file <- list.files()

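# collect the article titles from every saved search
all_title <- character()

for (f in seq_along(my_file)){
  # each file holds a saved search object named "summary"
  load(my_file[f])
  x <- QueryId(summary)
  result <- EUtilsGet(summary)
  # extract the titles once per file rather than once per article
  titles <- ArticleTitle(result)
  for (i in seq_along(x)){
    my_title <- titles[i]
    if (length(my_title) > 0){
      all_title <- c(all_title, my_title)
    }
  }
  # pause between files to space out the requests to NCBI
  Sys.sleep(10)
}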

length(all_title)
[1] 4519

all_title_uniq <- unique(all_title)
length(all_title_uniq)
[1] 2388

wordcloud(unlist(all_title_uniq))

Erm, too many words.

Limit the wordcloud to 500 words with a minimum frequency of 10.

wordcloud(unlist(all_title_uniq),
          max.words=500,
          min.freq = 10)

Analysis of transcriptome expression profiling reveals stress response genes.
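To make the effect of `min.freq` and `max.words` more concrete, here is a minimal base R sketch of the same kind of filtering on a word-frequency table. The three titles are invented for illustration, and the thresholds are scaled down to suit them. This is not what `wordcloud()` does internally; as far as I know, it builds its counts via the tm package and also drops punctuation and stopwords, which this sketch does not attempt.

```r
# Toy titles, invented for illustration only
titles <- c("Transcriptome analysis of stress response",
            "Stress response genes in the rice transcriptome",
            "Transcriptome profiling reveals stress genes")

# lower-case, strip punctuation and split on whitespace
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(titles)), "\\s+"))

# count occurrences, most frequent first
freq <- sort(table(words), decreasing = TRUE)

min_freq  <- 2   # plays the role of min.freq
max_words <- 3   # plays the role of max.words

# drop rare words, then keep only the most frequent ones
kept <- head(freq[freq >= min_freq], max_words)
print(kept)
```

Words that appear only once ("rice", "profiling" and so on) are dropped by the frequency threshold, and the cap then keeps just the most common of the remaining words.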

This work is licensed under a Creative Commons Attribution 4.0 International License.