# Using the R twitteR package

• Updated 2014 November 26th to reflect changes in the tm package
• Updated 2015 February 18th to reflect changes in the twitteR package

A short post on using the R twitteR package for text mining and the R wordcloud package for visualisation. I did this on my Windows machine, which has an SSL certificate verification problem; hence the cacert.pem workarounds below. I've updated the code due to changes in the recent update of the twitteR package. In addition, I have included a screenshot below from my Twitter app's "Keys and Access Tokens" page to indicate where to get the consumer_key, consumer_secret, access_token, and access_secret values.

#install the necessary packages
install.packages("twitteR")
install.packages("wordcloud")
install.packages("tm")

library("twitteR")
library("wordcloud")
library("tm")

#necessary file for Windows
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

#to get your consumer_key and consumer_secret see the twitteR documentation for instructions
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)

#the cainfo parameter is necessary only on Windows
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

#should get 1500
length(r_stats)
#[1] 1500

#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())

#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#clean up
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()))
wordcloud(r_stats_text_corpus)

#alternative steps if you're running into problems
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())
#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#if you get the below error
#In mclapply(content(x), FUN, ...) :
#  all scheduled cores encountered errors in user code

#run this step if you get the error:
r_stats_text_corpus <- tm_map(r_stats_text_corpus,
content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
mc.cores=1
)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation, mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x)removeWords(x,stopwords()), mc.cores=1)
wordcloud(r_stats_text_corpus)
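Before (or instead of) plotting, it can help to sanity-check the cleaned corpus by tabulating term frequencies. This is a sketch, not from the original post, using the standard tm TermDocumentMatrix approach; the variable names follow the pipeline above:

```r
library(tm)
library(wordcloud)

#build a term-document matrix from the cleaned corpus
tdm <- TermDocumentMatrix(r_stats_text_corpus)

#convert to a matrix and sum the counts of each term across all tweets
term_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

#inspect the ten most frequent terms; these should be the
#largest words in the word cloud
head(term_freq, 10)

#wordcloud() also accepts a vector of words and frequencies directly
wordcloud(names(term_freq), term_freq, min.freq = 2)
```

Working from the frequency vector also makes it easy to filter out uninteresting terms before plotting.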


I learned who Hadley Wickham is after seeing this wordcloud.

Let's try the hash tag #bioinformatics

bioinformatics <- searchTwitter("#bioinformatics", n=1500, cainfo="cacert.pem")
bioinformatics_text <- sapply(bioinformatics, function(x) x$getText())
bioinformatics_text_corpus <- Corpus(VectorSource(bioinformatics_text))
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus,
content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
mc.cores=1
)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, content_transformer(tolower), mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, removePunctuation, mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)
wordcloud(bioinformatics_text_corpus)

#if you're getting the error:
#could not be fit on page. It will not be plotted.
#try changing the scale, like
#wordcloud(bioinformatics_text_corpus, scale=c(2,0.2))

I had a small chuckle on finding the word "ultratricky" on the far left towards the top.

Wordcloud with fewer words, a minimum frequency, and in colour:

library(RColorBrewer)
pal2 <- brewer.pal(8,"Dark2")
wordcloud(bioinformatics_text_corpus, min.freq=2, max.words=100, random.order=T, colors=pal2)

I guess bioinformatics and genomics go hand in hand. Good to see that people are tweeting on training, workshops and resources for bioinformatics.

### Conclusions

There are several posts on the web already on using the R twitteR package for text mining; I found them after I had the idea of making a word cloud using people's tweets that had a particular hash tag. There are many other uses of the R twitteR package; for example, you could run a daily cron job to check which of your followers have unfollowed you, and so on.

me <- getUser("davetang31", cainfo="cacert.pem")
me$getId()
[1] "555580799"
getUser(555580799,cainfo="cacert.pem")
[1] "davetang31"
me$getFollowerIDs(cainfo="cacert.pem") #or me$getFollowers(cainfo="cacert.pem")
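The daily cron-job idea from the conclusions could be sketched as follows. This is my own sketch, not from the original post: the followers.rds file name and the use of saveRDS()/readRDS() are arbitrary choices.

```r
#a sketch of the daily unfollower check mentioned in the conclusions;
#assumes you have already authenticated with setup_twitter_oauth()
library(twitteR)

me <- getUser("davetang31")
today <- me$getFollowerIDs()

#compare with the IDs saved on a previous run, if any
if (file.exists("followers.rds")) {
  previous <- readRDS("followers.rds")
  unfollowed <- setdiff(previous, today)
  if (length(unfollowed) > 0) {
    #look up the screen names of the users who unfollowed
    print(sapply(lookupUsers(unfollowed), function(x) x$screenName))
  }
}

#save today's IDs for the next run
saveRDS(today, "followers.rds")
```

Run daily (e.g. via cron with Rscript), this prints anyone present in yesterday's follower list but absent from today's.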
#you can also see what's trending
trend <- availableTrendLocations(cainfo="cacert.pem")
name country woeid
1 Worldwide             1
trend <- getTrends(1, cainfo="cacert.pem")
trend
name
1                #My15FavoritesSongs
2                 #MikaNoLegendários
4           #SiMiMamáFueraPresidenta
5                     #HappySiwonDay
6                           C.J Fair
7                             Mewtwo
8                       Mitch McGary
9                            Carreño
10                         John Wall
url
query woeid
1                %23My15FavoritesSongs     1
2            %23MikaNoLegend%C3%A1rios     1
4      %23SiMiMam%C3%A1FueraPresidenta     1
5                     %23HappySiwonDay     1
6                       %22C.J+Fair%22     1
7                               Mewtwo     1
8                   %22Mitch+McGary%22     1
9                         Carre%C3%B1o     1
10                     %22John+Wall%22     1
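To get trends for a specific location rather than Worldwide, you can first look up its WOEID in the availableTrendLocations() output and pass that to getTrends() instead of 1. A sketch (the city name here is just an example, and network calls assume you have already authenticated):

```r
library(twitteR)

#assumes you have already authenticated with setup_twitter_oauth()
locations <- availableTrendLocations()

#find the WOEID for a city of interest (Sydney is just an example)
sydney <- subset(locations, name == "Sydney")
sydney$woeid

#fetch the trends for that WOEID instead of 1 (Worldwide)
getTrends(sydney$woeid)
```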


1. Thanks for the intro to twitteR and wordcloud!

Be sure to use https not http. I got a “forbidden” error without the s.

1. Davo says:

Thanks for comment!

2. Jose says:

Hello Dave, this is José from Venezuela. I'm trying to replicate your code and I think I'm almost there; however, I have the following problem:
after getting the:
[1] TRUE
I try to get the word cloud for the term "patria" (homeland) and I get the following error:

Error in twInterfaceObj$doAPICall(cmd, params, "GET", …) : Error: Authorization Required

Any idea on how to solve it? Thanks.

1. Davo says:

Hi José, I'm not sure, but have you checked that you set up your Twitter application properly at https://apps.twitter.com/? Cheers, Dave

1. Hi Davo,

Very useful intro to twitteR! Congrats. I am facing the same issue as José. I get,

> registerTwitterOAuth(cred)
[1] TRUE

But then,

> searchTwitter("#rstats", n=1500, cainfo="cacert.pem")
[1] "Authorization Required"
Error in twInterfaceObj$doAPICall(cmd, params, "GET", …) :
Error: Authorization Required

I checked my app, but it seems to be correctly set up. Any tip is appreciated. Thanks!

1. Hey Davo,

Just created a new app and it worked. The first one must have been corrupted; I will find out.

Regards,
Pablo

3. Jose says:

For everyone: I used the "API key" instead of the "consumer key", and the "API secret" instead of the "consumer secret".

1. Davo says:

No problems. Glad you got it working 🙂

2. Vol says:

You wrote: For everyone, I used the "API key" instead of the "consumer key", and the "API secret" instead of the "consumer secret".

In apps.twitter.com, can you point me to where I can get API Key and Secret? Under Application Settings, I see API Key mentioned in the same field as Consumer Key i.e. they are the same.

I am getting the same error as you did.

Thanks!

4. I had to change

r_stats_text_corpus <- tm_map(r_stats_text_corpus, tolower)

to

r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))

As I understand it, searchTwitter returns random Twitter messages, though it's possible to specify a date range, which means it is not really good for doing reliable analysis.

Is there a way of obtaining more Twitter data — i.e. all Tweets from the past 30 days?
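The content_transformer() change mentioned in the comment above is needed because, as I understand it, tm 0.6 and later expect tm_map() to receive a function that returns a text document rather than a plain character vector; content_transformer() does that wrapping for you. A small offline sketch of the idea (the example strings are my own):

```r
library(tm)

corpus <- Corpus(VectorSource(c("Hello Twitter", "Text MINING in R")))

#tolower() returns a plain character vector, so passing it directly
#to tm_map() breaks under tm >= 0.6; content_transformer() wraps it
#so the document structure is preserved
corpus <- tm_map(corpus, content_transformer(tolower))

#inspect the first document, now lower-cased
as.character(corpus[[1]])

#the same wrapper works for any character-to-character function
corpus <- tm_map(corpus, content_transformer(function(x) gsub("mining", "analysis", x)))
```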

5. Roberto says:

I’m experiencing this error:
Warning message:
In doRppAPICall(“search/tweets”, n, params = params, retryOnRateLimit = retryOnRateLimit, :
1500 tweets were requested but the API can only return 15

when trying to collect tweets.
What’s the reason and how can it be fixed?

Thanks

1. Davo says:

I believe what you were searching for only had 15 matches.

6. Nice post. I was able to run a few wordcloud examples after reading your post. I did encounter the following issues:
1. the URLs need to be https:// instead of http://
2. the corpus got a document error, so I had to include one more transformation step before running the wordcloud function:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)

7. zsheng says:

Hi,

The post was helpful in teaching anyone who wants to learn how to use the twitteR package in R. However, I have encountered some problems while replicating your code. At the last line, wordcloud(r_stats_text_corpus), I get an error:

Error in UseMethod(“meta”, x) :
no applicable method for ‘meta’ applied to an object of class “try-error”
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
scheduled core 1 encountered error in user code, all values of the job will be affected
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code

Did anyone face the same problem and could provide some help?

1. Davo says:

Try adding mc.cores=1 as an extra parameter to the functions (check out the updated post).

1. Avneet says:

I am getting the same problem. 🙁

8. Jonathan says:

I have the same problem as above 🙁

9. Jonathan says:

I had to add lazy=TRUE to the line:

r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), lazy=TRUE)

in order to not get an error message about the cores in the following way:

Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code

But when everything goes without error messages, I get the following (using r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)):
Warning messages:
1: In wordcloud(bigdata_corpus) :
usemethod("removewords", could not be fit on page. It will not be plotted.
2: In wordcloud(bigdata_corpus) :
'removewords' could not be fit on page. It will not be plotted.
3: In wordcloud(bigdata_corpus) :
method could not be fit on page. It will not be plotted.
This produces a cloud plot of the warning words from the R console!

When I don't use r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument), I get the following after calling the wordcloud function:
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code

I feel very confused and I am a beginner at R.

1. Davo says:

Check the updated post.

1. Jonathan says:

Thank you for your answer, it works fine now. Now I get a lot of warnings when performing the wordcloud function:

Warning messages:
1: "mc.cores" is not a graphical parameter
2: "mc.cores" is not a graphical parameter
3: "mc.cores" is not a graphical parameter

Does that mean I have something unwanted in my data?

1. Davo says:

No problems. Remove the mc.cores parameter from the wordcloud() function. Then everything should work nicely.

10. Francesca says:

Hi Dave, could you please help me? After the handshake, R returns "Error: Authorization Required". How can I be sure about my Twitter app settings? Thanks.

1. Davo says:

Hi Francesca, did you get your consumer key and secret from https://dev.twitter.com/ and replace the word "secret" in the code above with the consumer key and secret? And did you allow read access? Cheers, Dave

11. Francesca says:

Hi Dave, thanks for your answer. Yes, I did.

(…….)
#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
cred <- OAuthFactory$new(consumerKey="MY c. KEY",
consumerSecret="MY s. KEY",

#necessary step for Windows
cred$handshake(cainfo="cacert.pem")

Yesterday it returned a PIN one time; after that, the authorization was denied. I created a new app, but unfortunately always the same error.

1. Francesca says:

Sorry for the URL; addresses are correct in my R scripts.

12. Thanks for updating your code and reflecting changes of the tm package. Great work! All my errors are resolved now.

13. Matt says:

I have two questions. First, how would the code differ for Mac users? Second, I have not been able to receive the PIN; I get the following error despite having reinstalled RCurl. Would appreciate thoughts.

Error in function (type, msg, asError = TRUE) :
error setting certificate verify locations:
CAfile: CApath: none

1. Davo says:

At least on my MacBook Air, I don't need the cacert.pem file. All the code will be the same, but leave out the cainfo="cacert.pem" part in the functions.

14. Shilpa Jain says:

Hi, I tried the same code above. I have my API keys but I got the error shown below.

setup_twitter_oauth(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
[1] "Using direct authentication"
Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()

Please help! I have been banging my head on this.

1. Davo says:

Make a new folder/directory somewhere on your computer, restart R, use the setwd() function to tell R to use the new folder/directory that you made, and try the code again. Do you get the same error message?

1. Shilpa Jain says:

Yes, I get the same error. No change.

1. Davo says:

Some other things to try are (if they apply): update R to the latest version, update all your packages, regenerate the keys, make a new Twitter app, and check out this page (and the links within): https://github.com/geoffjentry/twitteR/issues/74. Good luck.

15. Andrea says:

Thank you very much for the tips on how to solve the pernicious "In mclapply(content(x), FUN, …) : all scheduled cores encountered errors in user code".

16. david says:

Hi, I tried the same code above:

r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

I had an error message: "Error in tw_from_response(out, …) : unused argument (cainfo = "cacert.pem")". Do you know a solution?

1. Davo says:

What happens when you leave out the cainfo="cacert.pem" parameter in the searchTwitter() function?

17. Peter says:

Hi!! I followed all the steps but I obtain the following:

[1] "Using direct authentication"
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No

What can I do to obtain a [1] TRUE on the console, which means that the handshake is complete? Thanks!!

1. Davo says:

Hi Peter, for the caching, I select "2: No", and that seems to work for me. Cheers, Dave

18. Johannes Fritz says:

Dear Dave, thank you for this illuminating post. It really helped me a lot to understand this function. I still have two questions that I cannot find the answer to online and thus turn to you. First, I run a Windows 7 OS and use RStudio. There was no need to insert cainfo="cacert.pem" into my command. Am I missing something? Second, the searches do not produce tweets that are older than a week. When I use the date setting commands, the search produces no results. I find similar observations from other users online, but no one seems to have an answer. Do you? Thank you very much for any hint you can provide, Johannes

1. Davo says:

Hi Johannes, regarding the cainfo, you are right, it is not needed with the updated version of twitteR. You are right again, at least to the best of my knowledge, regarding searching older tweets; you can only search up to one week back. What you can do is a daily search and save the results. Cheers, Dave

19. Sooy says:

Hi, I followed your directions, but in the process of authentication, the error message appeared as below.

Error in registerTwitterOAuth(twitCred) :
ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth

How can I resolve this problem?

1. Davo says:

Hi Sooy, are you using the setup_twitter_oauth() function as I have shown in the post? Cheers, Dave

1. Sooy says:

Yeah, I followed your guide again, but the error appeared as below.

Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()

20. Fonso says:

Thanks, very useful post.

21. Neeti says:

Hi, thanks! It was of great help!!!!! After hours of debugging in vain, I came across your post… and finally it was a cakewalk… 🙂

22. Amarnath says:

Hi Davo, your tutorial helped me a lot. I want to combine multiple Twitter searches into one list. How can I do that before setting up the wordcloud?

TweetsList_1 <- searchTwitter("serach1", n=1500)
TweetsList_2 <- searchTwitter("serach2", n=1500)

How can I combine TweetsList_1 and TweetsList_2 into one list so that I can do all the text processing?

1. Davo says:

Hi Amarnath, you can simply do

combined <- c(TweetsList_1, TweetsList_2)

Cheers, Dave

23. Victor says:

Thanks for this post. When I ran this code

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

I got the following error:

[1] "Using direct authentication"
Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()

What should I do?

24. Travis says:

Had the same error: "This most likely means that you have incorrectly called setup_twitter_oauth()". I read this on the GitHub link that Davo recommended: https://github.com/geoffjentry/twitteR/issues/74. As was suggested by user raeed20, I ran the following:

install.packages('base64enc')

And then it was able to run.

25. rohitgopidi says:

install.packages('base64enc') does the trick.

26. ganesh khirwadkar says:

When I am running the command

r_stats <- searchTwitter("#Rstats", n=1500, cainfo='cacert.pem')

it shows me the following error:

Error in tw_from_response(out, …) : unused argument (cainfo = "cacert.pem")

1. Davo says:

Remove the cainfo="cacert.pem" parameter.

27. Anurag says:

Hi Dave, thanks for the wonderful intro. I am facing a roadblock: I am using the twitteR package and I want to get the Twitter handles/usernames of my followers instead of the screen name. For example, if a hypothetical user John Biden were one of my followers, with John Biden as his screen name but @JohnB80 as his handle, I would want to extract @JohnB80. Currently I can get the Twitter ID and the screen name but not the handle. Please let me know your valuable suggestions.

1. Davo says:

If you have the Twitter ID, you can use the getUser() function to get the handle.

28. Anurag says:

Thanks Dave for the prompt reply. I have a list of follower IDs, so I used lookupUsers(). It was giving a detailed list of information about each user, but I was unable to extract the screen name; then, using the str() function, I found the parameter $screenName, which did the trick. Thanks again, your help is highly appreciated.
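Building on the c() answer in the exchange above, combined searches can be flattened to a data frame with twListToDF() before the text-mining steps. This is my own sketch, not from the original exchange; in particular, the deduplication step is an addition:

```r
library(twitteR)
library(tm)

#assumes you have already authenticated with setup_twitter_oauth()
tweets_1 <- searchTwitter("#rstats", n = 100)
tweets_2 <- searchTwitter("#bioinformatics", n = 100)

#c() concatenates the two lists of status objects
combined <- c(tweets_1, tweets_2)

#twListToDF() flattens a list of status objects into a data frame
combined_df <- twListToDF(combined)

#drop any tweet that matched both searches
combined_df <- combined_df[!duplicated(combined_df$id), ]

#the text column can then feed the corpus-building steps above
corpus <- Corpus(VectorSource(combined_df$text))
```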

29. Enrico says:

Hi. Thanks for the tutorial. I've got this error with your code, and with other pretty similar works on data mining over Twitter using R.

these are the errors I get:

r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
Warning message:
In mclapply(content(x), FUN, …) :
all scheduled cores encountered errors in user code

So I added mc.cores = 1 as you suggested, and this appears:

r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower),mc.cores=1)
Error in UseMethod("content", x) :
no applicable method for 'content' applied to an object of class "try-error"

And this is the main error I get:

wordcloud(r_stats_text_corpus)
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"

It appears to be due to the recent update of the tm package, but I don't know how to solve it.

thx, regards, Enrico

30. Joao Vissoci says:

Hi, thank you for the great package!

I am having some trouble with the search. I ran the exact same code in the post but got fewer than 1500 entries:

Warning message:
In doRppAPICall(“search/tweets”, n, params = params, retryOnRateLimit = retryOnRateLimit, :
1500 tweets were requested but the API can only return 234

length(tw)
[1] 234

Any idea why is that?

Thanks,
Joao

1. Davo says:

Hi Joao,

I’m not sure. Twitter has implemented some limitations in that you can only search for tweets within a given time frame. It could be that there were only 234 tweets with the #rstats hashtag, within some time frame, when you performed the search.

Cheers,

Dave

1. Victor Ordu says:

I think that may be responsible for some of the issues I am having as well. I will look at it again using this insight. Thanks.
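Since the search API only reaches back about a week, the daily-search-and-save workaround Dave suggests elsewhere in this thread can be sketched as follows. The date-stamped file name and the rds format are my own conventions, not from the post:

```r
library(twitteR)

#assumes you have already authenticated with setup_twitter_oauth()
tweets <- searchTwitter("#rstats", n = 1500)

#save each day's results under a date-stamped file name,
#e.g. rstats_2015-02-18.rds
outfile <- paste0("rstats_", Sys.Date(), ".rds")
saveRDS(twListToDF(tweets), outfile)

#later, combine all the saved days into one data frame,
#dropping tweets captured on more than one day
files <- list.files(pattern = "^rstats_.*\\.rds$")
all_tweets <- do.call(rbind, lapply(files, readRDS))
all_tweets <- all_tweets[!duplicated(all_tweets$id), ]
```

Scheduled daily (e.g. via cron and Rscript), this accumulates a history longer than the one-week search window allows.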

31. Bérengère says:

Thank you very much for your post, I was just desperately looking for a solution with the utf8towcs error. That’s very helpful :p
