Using the R twitteR package

  • Updated 2014 November 26th to reflect changes in the tm package
  • Updated 2015 February 18th to reflect changes in the twitteR package

A short post on using the R twitteR package for text mining and the R wordcloud package for visualisation. I did this on my Windows machine, where RCurl cannot verify SSL certificates by default (hence the cacert.pem download below). I've updated the code due to changes in the recent update of the twitteR package. In addition, I have included a screenshot below from my Twitter app's Keys and Access Tokens page to indicate where to get the consumer_key, consumer_secret, access_token, and access_secret values.

(image: Twitter app Keys and Access Tokens page)

Please enter your values in the code below.

#install the necessary packages
install.packages("twitteR")
install.packages("wordcloud")
install.packages("tm")

library("twitteR")
library("wordcloud")
library("tm")

#necessary file for Windows
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
consumer_key <- 'your key'
consumer_secret <- 'your secret'
access_token <- 'your access token'
access_secret <- 'your access secret'
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)

#the cainfo parameter is necessary only on Windows
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")
#should get 1500
length(r_stats)
#[1] 1500

#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())

#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#clean up
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower)) 
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()))
wordcloud(r_stats_text_corpus)
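If you want to see the counts behind the cloud, you can first build a term-document matrix from the same cleaned corpus (a sketch using the tm and wordcloud packages already loaded above; tdm and word_freq are my own object names):

```r
#build a term-document matrix from the cleaned corpus
tdm <- TermDocumentMatrix(r_stats_text_corpus)

#sum the counts of each term across all tweets and sort
word_freq <- sort(rowSums(as.matrix(tdm)), decreasing=TRUE)

#the most frequent words
head(word_freq)

#wordcloud() also accepts a vector of words and their frequencies directly
wordcloud(names(word_freq), word_freq)
```

Passing words and frequencies directly is handy when you want to filter or transform the counts before plotting.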

#alternative steps if you're running into problems 
r_stats<- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")
#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())
#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#if you get the error below:
#In mclapply(content(x), FUN, ...) :
#  all scheduled cores encountered errors in user code
#add mc.cores=1 to each function call

#run this step if you get an error like:
#invalid input '...' in 'utf8towcs'
r_stats_text_corpus <- tm_map(r_stats_text_corpus,
                              content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
                              mc.cores=1
                              )
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation, mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)
wordcloud(r_stats_text_corpus)

(image: #Rstats word cloud)

I learned who Hadley Wickham is after seeing this.

Let's try the hashtag #bioinformatics.

bioinformatics <- searchTwitter("#bioinformatics", n=1500, cainfo="cacert.pem")
bioinformatics_text <- sapply(bioinformatics, function(x) x$getText())
bioinformatics_text_corpus <- Corpus(VectorSource(bioinformatics_text))
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus,
                              content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
                              mc.cores=1
                              )
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, content_transformer(tolower), mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, removePunctuation, mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)
wordcloud(bioinformatics_text_corpus)

#if you're getting the error:
#could not be fit on page. It will not be plotted.
#try changing the scale, like
#wordcloud(bioinformatics_text_corpus, scale=c(2,0.2))

(image: #bioinformatics word cloud)

I had a small chuckle on finding the word "ultratricky" on the far left towards the top.

Wordcloud with fewer words, a minimum frequency, and in colour

library(RColorBrewer)
pal2 <- brewer.pal(8,"Dark2")
wordcloud(bioinformatics_text_corpus, min.freq=2, max.words=100, random.order=TRUE, colors=pal2)
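If you want to keep a copy of the cloud, you can write it to a file instead of the plot window by wrapping the call in a graphics device (base R; the file name is just an example):

```r
#open a PNG device, draw the word cloud, then close the device to write the file
png("bioinformatics_wordcloud.png", width=800, height=800)
wordcloud(bioinformatics_text_corpus, min.freq=2, max.words=100, random.order=TRUE, colors=pal2)
dev.off()
```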

(image: #bioinformatics word cloud in colour)

I guess bioinformatics and genomics go hand in hand. Good to see that people are tweeting on training, workshops, and resources for bioinformatics.

Conclusions

There are several posts on the web already on using the R twitteR package for text mining; I found them after I had the idea of making a word cloud from people's tweets that had a particular hashtag. There are many other uses of the R twitteR package; for example, you could run a daily cron job to check which of your followers have unfollowed you, and so on.
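The daily unfollower check could be sketched along these lines (a hypothetical approach: find_unfollowers() and the followers.rds file name are my own; each run compares the follower IDs saved by the previous run against the current ones):

```r
library("twitteR")

#IDs present in the previous snapshot but missing now have unfollowed
find_unfollowers <- function(previous_ids, current_ids){
    setdiff(previous_ids, current_ids)
}

me <- getUser("davetang31")
current <- me$getFollowerIDs()

#compare with the IDs saved by the previous run, if any
if (file.exists("followers.rds")){
    previous <- readRDS("followers.rds")
    print(find_unfollowers(previous, current))
}

#save today's snapshot for the next run
saveRDS(current, "followers.rds")
```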

me <- getUser("davetang31", cainfo="cacert.pem")
me$getId()
#[1] "555580799"
getUser(555580799, cainfo="cacert.pem")
#[1] "davetang31"
me$getFollowerIDs(cainfo="cacert.pem")
#or
me$getFollowers(cainfo="cacert.pem")
#you can also see what's trending
trend <- availableTrendLocations(cainfo="cacert.pem")
head(trend)
       name country woeid
1 Worldwide             1
2  Winnipeg  Canada  2972
3    Ottawa  Canada  3369
4    Quebec  Canada  3444
5  Montreal  Canada  3534
6   Toronto  Canada  4118
trend <- getTrends(1, cainfo="cacert.pem")
trend
                                name
1                #My15FavoritesSongs
2                 #MikaNoLegendários
3  #CitePessoasMaisGostosasDoTwitter
4           #SiMiMamáFueraPresidenta
5                     #HappySiwonDay
6                           C.J Fair
7                             Mewtwo
8                       Mitch McGary
9                            Carreño
10                         John Wall
                                                               url
1                http://twitter.com/search?q=%23My15FavoritesSongs
2            http://twitter.com/search?q=%23MikaNoLegend%C3%A1rios
3  http://twitter.com/search?q=%23CitePessoasMaisGostosasDoTwitter
4      http://twitter.com/search?q=%23SiMiMam%C3%A1FueraPresidenta
5                     http://twitter.com/search?q=%23HappySiwonDay
6                       http://twitter.com/search?q=%22C.J+Fair%22
7                               http://twitter.com/search?q=Mewtwo
8                   http://twitter.com/search?q=%22Mitch+McGary%22
9                         http://twitter.com/search?q=Carre%C3%B1o
10                     http://twitter.com/search?q=%22John+Wall%22
                                 query woeid
1                %23My15FavoritesSongs     1
2            %23MikaNoLegend%C3%A1rios     1
3  %23CitePessoasMaisGostosasDoTwitter     1
4      %23SiMiMam%C3%A1FueraPresidenta     1
5                     %23HappySiwonDay     1
6                       %22C.J+Fair%22     1
7                               Mewtwo     1
8                   %22Mitch+McGary%22     1
9                         Carre%C3%B1o     1
10                     %22John+Wall%22     1
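Since availableTrendLocations() returns a data frame with name, country, and woeid columns, you can look up a particular city and pass its WOEID to getTrends() (a sketch; the Toronto example reuses a WOEID shown in the output above):

```r
library("twitteR")

#data frame of locations that have trend data
locations <- availableTrendLocations()

#look up the WOEID for a city and fetch its trends
toronto_woeid <- subset(locations, name == "Toronto")$woeid
toronto_trends <- getTrends(toronto_woeid)
head(toronto_trends)
```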

See also

  • twitteR vignette
  • Evaluation of twitteR
  • Text data mining twitteR
  • Word cloud in R




This work is licensed under a Creative Commons Attribution 4.0 International License.
72 comments
  1. Hello Dave, this is José from Venezuela. I'm trying to replicate your code and I think I'm almost there; however, I have the following problem:
    after getting:
    > registerTwitterOAuth(cred)
    [1] TRUE
    I try to get the word cloud for the term "patria" (homeland) and I get the following error:
    Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
    Error: Authorization Required
    Any idea on how to solve it? Thanks.

      1. Hi Davo,

        Very useful intro to twitteR! Congrats. I am facing the same issue as José. I get,

        > registerTwitterOAuth(cred)
        [1] TRUE

        But then,
        > searchTwitter("#rstats", n=1500, cainfo="cacert.pem")
        [1] "Authorization Required"
        Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
        Error: Authorization Required

        I checked my app, but it seems to be correctly set up. Any tip is appreciated. Thanks!

  2. I checked the steps again, then took your advice to "set up your Twitter application properly at https://apps.twitter.com/" and was able to make it work. 🙂 Thanks for your help.
    For everyone: I used "API key" instead of "consumer key", and "API secret" instead of "consumer secret".

    1. You wrote: For everyone, I used "API key" instead of "consumer key", and "API secret" instead of "consumer secret".

      In apps.twitter.com, can you point me to where I can get API Key and Secret? Under Application Settings, I see API Key mentioned in the same field as Consumer Key i.e. they are the same.

      I am getting the same error as you did.

      Thanks!

  3. Thanks. Really helpful. Though I had to change:

    r_stats_text_corpus <- tm_map(r_stats_text_corpus, tolower)
    to
    r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))

    As I understand, searchTwitter returns random Twitter messages, though it's possible to specify a date range. Which means it is not really good for doing reliable analysis.

    Is there a way of obtaining more Twitter data — i.e. all Tweets from the past 30 days?

  4. I’m experiencing this error:
    Warning message:
    In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    1500 tweets were requested but the API can only return 15

    when trying to collect tweets.
    What’s the reason and how can it be fixed?

    Thanks

  5. Nice post. I could actually run a few wordcloud examples after reading your post. I did encounter the following issues:
    1. the URLs need to be https:// instead of http://
    2. the corpus got a document error, so I had to include one more transformation step before running the wordcloud function:
    r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)

  6. Hi,

    The post was helpful in teaching anyone who wants to learn how to use the twitteR package in R. However, I have encountered some problems while replicating your code. At the last line, wordcloud(r_stats_text_corpus), I get an error:

    Error in UseMethod("meta", x) :
    no applicable method for 'meta' applied to an object of class "try-error"
    In addition: Warning messages:
    1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
    scheduled core 1 encountered error in user code, all values of the job will be affected
    2: In mclapply(unname(content(x)), termFreq, control) :
    all scheduled cores encountered errors in user code

    Did anyone face the same problem and could provide some help?

  7. I had to add lazy=TRUE to the line:
    r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), lazy=TRUE)
    in order not to get an error message about the cores, in the following way:
    Warning messages:
    1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
    all scheduled cores encountered errors in user code

    But when everything goes without error messages, I get (using r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)):
    Warning messages:
    1: In wordcloud(bigdata_corpus) :
    usemethod("removewords", could not be fit on page. It will not be plotted.
    2: In wordcloud(bigdata_corpus) :
    'removewords' could not be fit on page. It will not be plotted.
    3: In wordcloud(bigdata_corpus) :
    method could not be fit on page. It will not be plotted.
    This produces a cloud plot of the warning words from the R console!

    When I don't use the r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument) step, I get the following after calling the wordcloud function:
    Error in UseMethod("meta", x) :
    no applicable method for 'meta' applied to an object of class "try-error"
    In addition: Warning messages:
    1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
    all scheduled cores encountered errors in user code
    2: In mclapply(unname(content(x)), termFreq, control) :
    all scheduled cores encountered errors in user code

    I feel very confused, and I am a beginner at R.

      1. Thank you for your answer; it works fine now.

        Now I get a lot of warnings when performing the wordcloud function:
        Warning messages:
        1: "mc.cores" is not a graphical parameter
        2: "mc.cores" is not a graphical parameter
        3: "mc.cores" is not a graphical parameter

        Does that mean I have something unwanted in my data?

        1. No problems. Remove the mc.cores parameter from the wordcloud() function. Then everything should work nicely.

  8. Hi Dave,
    could you please help me? After the handshake, R returns "Error: Authorization Required".
    How can I be sure about my Twitter app settings?

    thanks

  9. Hi Dave, thanks for your answer. Yes, I did.

    (…….)

    #to get your consumerKey and consumerSecret see the twitteR documentation for instructions
    cred <- OAuthFactory$new(consumerKey="MY c. KEY",
    consumerSecret="MY s. KEY",
    requestURL="https://api.twitter.com/oauth/request_token",
    accessURL="https://api.twitter.com/oauth/access_token",
    authURL="https://api.twitter.com/oauth/authorize")

    #necessary step for Windows
    cred$handshake(cainfo="cacert.pem")

    Yesterday it returned a PIN one time; after that, the authorization was denied. I created a new app, but unfortunately I always get the same error.

  10. I have two questions. First, how would the code differ for Mac users? Second, I have not been able to receive the PIN, I get the following error despite having reinstalled RCurl. Would appreciate thoughts.

    Error in function (type, msg, asError = TRUE) :
    error setting certificate verify locations:
    CAfile:
    CApath: none

    1. At least on my MacBook Air, I don't need the cacert.pem file. All the code will be the same, but leave out the cainfo="cacert.pem" part in the functions.

  11. Hi,

    I tried the same code above. I have my API keys, but I got the error shown below.

    setup_twitter_oauth(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    [1] "Using direct authentication"
    Error in check_twitter_oauth() : OAuth authentication error:
    This most likely means that you have incorrectly called setup_twitter_oauth()'

    Please help! I have been banging my head on this.

    1. Make a new folder/directory somewhere on your computer, restart R, use the setwd() function to tell R to use the new folder/directory that you made, and try the code again. Do you get the same error message?

  12. Thank you very much for the tips on how to solve the pernicious

    In mclapply(content(x), FUN, ...) :
    all scheduled cores encountered errors in user code

  13. Hi,
    I tried the same code above:

    r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

    I got an error message:
    "Error in tw_from_response(out, ...) :
    unused argument (cainfo = "cacert.pem")"

    Do you know how to solve it?

    1. What happens when you leave out the cainfo="cacert.pem" parameter in the searchTwitter() function?

  14. Hi!!

    I followed all the steps, but I get the following:

    [1] "Using direct authentication"
    Use a local file to cache OAuth access credentials between R sessions?
    1: Yes
    2: No

    What can I do to obtain a [1] TRUE in the console, which means that the handshake is complete?

    Thanks!!

    1. Hi Peter,

      for the caching, I select “2: No”, and that seems to work for me.

      Cheers,

      Dave

  15. Dear Dave,

    Thank you for this illuminating post. It really helped me a lot to understand this function.

    I still have two questions that I cannot find the answer to online and thus turn to you.

    First, I run a Windows 7 OS and use RStudio. There was no need to insert cainfo="cacert.pem" into my command. Am I missing something?

    Second, the searches do not produce Tweets that are older than a week. When I use the date setting commands, the search produces no results. I find similar observations from other users online, but no one seems to have an answer. Do you?

    Thank you very much for any hint you can provide,
    Johannes

    1. Hi Johannes,

      regarding the cainfo, you are right, it is not needed with the updated version of twitteR.

      You are right again, at least to the best of my knowledge, regarding searching older tweets; you can only search up to one week back.

      What you can do is a daily search, saving the results each time.

      Cheers,

      Dave

  16. Hi,
    I followed your directions, but in the process of authentication the error message below appeared.

    Error in registerTwitterOAuth(twitCred) :
    ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth

    How can I resolve this problem?

    1. Hi Sooy,

      are you using the setup_twitter_oauth() function as I have shown in the post?

      Cheers,

      Dave

      1. Yeah,
        I followed your guide again,
        but the error below appeared.

        Error in check_twitter_oauth() : OAuth authentication error:
        This most likely means that you have incorrectly called setup_twitter_oauth()'

  17. Hi, thanks! It was of great help!!!!!
    After hours of debugging in vain, I came across your post… and finally it was a cakewalk… 🙂

  18. Hi Davo

    Your tutorial helped me a lot.
    I want to combine multiple Twitter searches into one list. How can I do that before setting up the wordcloud?

    TweetsList_1 <- searchTwitter("serach1", n=1500)
    TweetsList_2 <- searchTwitter("serach2", n=1500)

    How can I combine TweetsList_1 and TweetsList_2 into one list so that I can do all the text processing?

  19. Thanks for this post.

    When I ran this code
    setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)

    I got the following error:

    [1] "Using direct authentication"
    Error in check_twitter_oauth() : OAuth authentication error:
    This most likely means that you have incorrectly called setup_twitter_oauth()'

    What should I do?

  20. When I am running the command
    r_stats <- searchTwitter("#Rstats", n=1500, cainfo='acert.pem')

    It shows me the following error

    Error in tw_from_response(out, ...) :
    unused argument (cainfo = "cacert.pem")

  21. Hi Dave,

    Thanks for the wonderful intro. I am facing a roadblock: I am using the twitteR package, and I want to get the Twitter handles/usernames of my followers instead of the Screen Name. For example, if a hypothetical user John Biden were one of my followers, with "John Biden" as his Screen Name but @JohnB80 as his handle, I want to extract @JohnB80. Currently I can get the Twitter ID and the Screen Name but not the handle. Please let me know your valuable suggestions.

  22. Thanks Dave for the prompt reply. I have a list of follower IDs, so I used lookupUsers(). It was giving a detailed list of information about each user, and I was unable to extract the Screen Name; then, using the str() function, I found the parameter $screenName, which did the trick. Thanks again, your help is highly appreciated.

  23. Hi. Thanks for the tutorial. I've got this error with your code, and with other pretty similar work on data mining over Twitter using R.

    these are the errors I get:

    r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
    Warning message:
    In mclapply(content(x), FUN, ...) :
    all scheduled cores encountered errors in user code

    So I added mc.cores=1 as you suggested, and this appeared:

    r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower),mc.cores=1)
    Error in UseMethod("content", x) :
    no applicable method for 'content' applied to an object of class "try-error"

    And this is the main error I get:

    wordcloud(r_stats_text_corpus)
    Error in UseMethod("meta", x) :
    no applicable method for 'meta' applied to an object of class "try-error"

    It appears to be due to the new update of the tm package, but I don't know how to solve it.

    thx, regards, Enrico

  24. Hi, thank you for the great package!

    I am having some trouble with the search. I ran the exact same code in the post but got fewer than 1500 entries:

    tw = searchTwitter("#Rstats", n=1500)

    Warning message:
    In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    1500 tweets were requested but the API can only return 234

    length(tw)
    [1] 234

    Any idea why is that?

    Thanks,
    Joao

    1. Hi Joao,

      I’m not sure. Twitter has implemented some limitations in that you can only search for tweets within a given time frame. It could be that there were only 234 tweets with the #rstats hashtag, within some time frame, when you performed the search.

      Cheers,

      Dave

      1. I think that may be responsible for some of the issues I am having as well. I will look at it again using this insight. Thanks.

  25. Thank you very much for your post, I was just desperately looking for a solution with the utf8towcs error. That’s very helpful :p

  26. Pingback: code, gather, process – joel eduardo martinez
