I've always wanted to create a transcriptome feed on Twitter, posting the results of daily PubMed searches. Well today I finally got around to it. Firstly, I made a new Twitter account; annoyingly all the Twitter handles I wanted were taken by inactive users. I decided to go with @transcriptomes. Next, I made a new Twitter application that's associated with my new Twitter account (I set permissions to "Read, Write and Access direct messages"), and I set it up so that I could use twitteR to communicate with this app. For this post, I'm using OS X 10.10.1 on a MacBook Air.
#install the package
install.packages("twitteR")

#load the package
library("twitteR")

#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
consumer_key <- 'secret'
consumer_secret <- 'secret'
access_token <- 'secret'
access_secret <- 'secret'

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

#send first tweet
updateStatus("It's alive!")
PubMed searches
I used the RISmed package to perform PubMed queries.
#install the package
install.packages("RISmed")

#load the package
library(RISmed)
I created a simple search that looks for articles associated with the keyword "transcriptome" that have been deposited in the repository since yesterday.
#Get summary information on the results of a query
#the reldate parameter limits results
#to articles deposited since one day ago
summary <- EUtilsSummary('transcriptome', type='esearch', db='pubmed', reldate=1)

#download results of a query
result <- EUtilsGet(summary)

#hard limit of 50
my_limit <- 50
if(QueryCount(summary) <= my_limit){
  my_limit <- QueryCount(summary)
}

#loop through the results
for (i in 1:my_limit){
  #PubMed ID
  my_id <- QueryId(summary)[i]
  #title of paper
  my_title <- ArticleTitle(result)[i]
  #tweets have a 140 char limitation
  if(nchar(my_title) > 93){
    my_title <- substr(my_title, start=0, stop=93)
    my_title <- paste(my_title, '...', sep='')
  }
  #create URL that links to the paper
  my_url <- paste('http://www.ncbi.nlm.nih.gov/pubmed/', my_id, sep='')
  #create my tweet
  my_tweet <- paste(my_title, my_url)
  #sleep
  Sys.sleep(2)
  #tweet the paper!
  updateStatus(my_tweet)
}
Setting up cron
I want to perform this search automatically each day. Below is the cron job I set up; it runs the feed.R script every hour. I set it up this way because I don't leave my laptop on all the time, but it's almost always on for over an hour at some point during the day, so the script should get run at least once.
Update: I changed the cron job to run hourly after 15:00 (GMT+9); this makes it so that the current day on the NCBI server matches my current day.
crontab -l
#minute hour dom month dow user cmd
0 15-23 * * * cd /Users/davetang/Dropbox/transcriptomes && ./feed.R &> /dev/null
I don't want to tweet results that I've already tweeted about. To prevent that, the feed.R script simply looks for a file named according to the date (YYYYMMDD); if it exists, the script quits. The contents of this file are the results of the PubMed search, which I wanted to save anyway.
cat feed.R
#!/usr/bin/env Rscript

library("twitteR")
library(RISmed)

load("twitter_authentication.Rdata")
registerTwitterOAuth(cred)

today <- Sys.Date()
today <- format(today, format="%Y%m%d")

if(file.exists(today)){
  quit()
}

summary <- EUtilsSummary('transcriptome', type='esearch', db='pubmed', reldate=1)
result <- EUtilsGet(summary)

my_limit <- 50
if(QueryCount(summary) <= my_limit){
  my_limit <- QueryCount(summary)
}

for (i in 1:my_limit){
  my_id <- QueryId(summary)[i]
  my_title <- ArticleTitle(result)[i]
  if(nchar(my_title) > 93){
    my_title <- substr(my_title, start=0, stop=93)
    my_title <- paste(my_title, '...', sep='')
  }
  my_url <- paste('http://www.ncbi.nlm.nih.gov/pubmed/', my_id, sep='')
  my_tweet <- paste(my_title, my_url)
  #delay the tweeting by 3 seconds
  Sys.sleep(3)
  updateStatus(my_tweet)
}

#save today's summary
save(summary, file = today)
quit()
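Note that feed.R assumes twitter_authentication.Rdata already exists; I haven't shown how that file was created (the code earlier in this post uses setup_twitter_oauth() instead). A rough sketch of one way to create it, assuming the older ROAuth-based handshake that registerTwitterOAuth() expects (the endpoint URLs below are my assumption):

#one possible way to create twitter_authentication.Rdata
#using the older ROAuth-based handshake
library("ROAuth")
cred <- OAuthFactory$new(consumerKey    = consumer_key,
                         consumerSecret = consumer_secret,
                         requestURL     = "https://api.twitter.com/oauth/request_token",
                         accessURL      = "https://api.twitter.com/oauth/access_token",
                         authURL        = "https://api.twitter.com/oauth/authorize")
#opens a browser/PIN step to authorise the app
cred$handshake()
#save the credential object for feed.R to load
save(cred, file = "twitter_authentication.Rdata")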
And that's it! I'll keep an eye on @transcriptomes to see if any problems come up.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Can’t believe I missed this (though probably because you didn’t tweet a link to this post!). I ended up doing a convoluted version of what you have for Google Scholar Alerts just this week. I thought I’d spotted a mistake in your calculation of link length, but I’m kinda horrified to see that dlvr.it truncates paper titles excessively in all the existing Twitter bots, making titles 20-odd characters less readable than need be :-/
I feel like link shorteners are bad practice for science online anyway, since it makes any data mining exercise that bit more difficult.
The character limit on Twitter’s links is deceptive because t.co shortening is applied: at present, https protocol links “count for” 23 characters and http links 22, so my equivalent of your

if(nchar(my_title) > 93){

is to pass the URL into a function, InCharLimit, which returns the title’s character limit:

InCharLimit <- function(tweet.url.string = '') {
  # Cautious: assume link will be longest possible (https, 23 characters)…
  url.char.count <- https.chars <- 23L
  http.chars <- 22L
  # …unless it is proven otherwise
  if (confirmed.http <- grepl('http://', tweet.url.string))
    url.char.count <- http.chars
  return(title.char.limit <- 140L - url.char.count - 1)
}
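As a quick usage example of the function above (the PubMed ID here is just a placeholder):

# an http PubMed link leaves 117 characters for the title
InCharLimit('http://www.ncbi.nlm.nih.gov/pubmed/12345678')
# [1] 117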
# when calling AbbrevTitle on the title such that the URL (hence char. lim.) is taken into account
AbbrevTitle <- function(start.str, known.url = NULL, use.abbreviations = T, max.compact = T, above.env = parent.frame()) {
  if (!is.null(known.url)) char.limit <- InCharLimit(known.url) else char.limit <- 116L
  working.title <- start.str
  if (nchar(working.title) > char.limit) {
    # abbreviation algorithm attempts to get below character limit…
  }
}
Also, I don’t think

Sys.sleep(3)

is necessary: rate limit windows are over 15-minute intervals and apply to GET, not POST [e.g. update status], requests. Updating a status isn’t API limited, just limited as a normal account would be, to 2,400 tweets per day, “broken down into semi-hourly limits” (not necessarily 50 per half hour), so sleeping for 3 seconds wouldn’t make a difference; a recursive sleep (30s, 5m, 30m, …) seems to be the company recommendation on forums.

Thanks for sharing that cron script, that’s one of my next things to organise. Feel free to check out my version on GitHub 🙂
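A minimal sketch of the escalating retry described above, assuming twitteR’s updateStatus(); the wait times and the function name are purely illustrative:

#illustrative escalating-retry wrapper around updateStatus();
#the wait times and function name are assumptions, not from the post
tweet_with_backoff <- function(text, waits = c(30, 300, 1800)) {
  for (wait in c(0, waits)) {
    Sys.sleep(wait)
    result <- tryCatch(updateStatus(text), error = function(e) NULL)
    if (!is.null(result)) return(invisible(result))
  }
  warning("giving up on tweet: ", text)
  invisible(NULL)
}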
I’ll probably switch from JSON to RData storage of ‘seen’ message IDs in my code too; this is the first I’ve seen of that format.
Hi Louis,
thanks (again) for the detailed comment! You are right about the length condition; I did realise that Twitter uses t.co shortening, so I could have allowed a longer title. But I just went with the easiest (laziest) approach.
As you mentioned in your tweet, I would also prefer a non-cron approach because there are days when I don’t have Internet connection. I’ll look into AWS Lambda.
Cheers,
Dave