- Updated 2014 November 26th to reflect changes in the tm package
- Updated 2015 February 18th to reflect changes in the twitteR package
A short post on using the R twitteR package for text mining and the R wordcloud package for visualisation. I did this on my Windows machine, which has a problem with SSL certificates (hence the cacert.pem download in the code below). I've updated the code to reflect the recent update of the twitteR package. In addition, I have included a screenshot below from my Twitter Apps Keys and Access Tokens page to indicate where to get the consumer_key, consumer_secret, access_token, and access_secret values.
Please enter your values in the code below.
#install the necessary packages
install.packages("twitteR")
install.packages("wordcloud")
install.packages("tm")

library("twitteR")
library("wordcloud")
library("tm")

#necessary file for Windows
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
consumer_key <- 'your key'
consumer_secret <- 'your secret'
access_token <- 'your access token'
access_secret <- 'your access secret'

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

#the cainfo parameter is necessary only on Windows
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

#should get 1500
length(r_stats)
#[1] 1500

#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())

#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#clean up
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()))

wordcloud(r_stats_text_corpus)

#alternative steps if you're running into problems
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")

#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())

#create corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))

#if you get the below error
#In mclapply(content(x), FUN, ...) :
#  all scheduled cores encountered errors in user code
#add mc.cores=1 into each function

#run this step if you get the error:
#(please break it!)' in 'utf8towcs'
r_stats_text_corpus <- tm_map(r_stats_text_corpus,
                              content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
                              mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation, mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)

wordcloud(r_stats_text_corpus)
I learned who Hadley Wickham is after seeing this.
Let's try the hashtag #bioinformatics.
bioinformatics <- searchTwitter("#bioinformatics", n=1500, cainfo="cacert.pem")
bioinformatics_text <- sapply(bioinformatics, function(x) x$getText())
bioinformatics_text_corpus <- Corpus(VectorSource(bioinformatics_text))
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus,
                                     content_transformer(function(x) iconv(x, to='UTF-8-MAC', sub='byte')),
                                     mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, content_transformer(tolower), mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, removePunctuation, mc.cores=1)
bioinformatics_text_corpus <- tm_map(bioinformatics_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)

wordcloud(bioinformatics_text_corpus)

#if you're getting the error:
#could not be fit on page. It will not be plotted.
#try changing the scale, like
#wordcloud(bioinformatics_text_corpus, scale=c(2,0.2))
I had a small chuckle on finding the word "ultratricky" on the far left towards the top.
Wordcloud with fewer words, a minimum frequency, and in colour
library(RColorBrewer)
pal2 <- brewer.pal(8, "Dark2")
wordcloud(bioinformatics_text_corpus, min.freq=2, max.words=100, random.order=T, colors=pal2)
I guess bioinformatics and genomics go hand in hand. Good to see that people are tweeting about training, workshops and resources for bioinformatics.
Conclusions
There are several posts on the web already on using the R twitteR package for text mining; I found them after I had the idea of making a word cloud from people's tweets that use a particular hashtag. There are many other uses of the R twitteR package; for example, you could run a daily cron job to check which of your followers have unfollowed you (a sketch of this follows the code below), and so on.
me <- getUser("davetang31", cainfo="cacert.pem")

me$getId()
[1] "555580799"

getUser(555580799, cainfo="cacert.pem")
[1] "davetang31"

me$getFollowerIDs(cainfo="cacert.pem")
#or
me$getFollowers(cainfo="cacert.pem")

#you can also see what's trending
trend <- availableTrendLocations(cainfo="cacert.pem")

head(trend)
       name country woeid
1 Worldwide             1
2  Winnipeg  Canada  2972
3    Ottawa  Canada  3369
4    Quebec  Canada  3444
5  Montreal  Canada  3534
6   Toronto  Canada  4118

trend <- getTrends(1, cainfo="cacert.pem")

trend
                                name
1                #My15FavoritesSongs
2                 #MikaNoLegendários
3  #CitePessoasMaisGostosasDoTwitter
4           #SiMiMamáFueraPresidenta
5                      #HappySiwonDay
6                            C.J Fair
7                              Mewtwo
8                        Mitch McGary
9                             Carreño
10                          John Wall
                                                               url
1               http://twitter.com/search?q=%23My15FavoritesSongs
2            http://twitter.com/search?q=%23MikaNoLegend%C3%A1rios
3  http://twitter.com/search?q=%23CitePessoasMaisGostosasDoTwitter
4       http://twitter.com/search?q=%23SiMiMam%C3%A1FueraPresidenta
5                    http://twitter.com/search?q=%23HappySiwonDay
6                       http://twitter.com/search?q=%22C.J+Fair%22
7                              http://twitter.com/search?q=Mewtwo
8                   http://twitter.com/search?q=%22Mitch+McGary%22
9                        http://twitter.com/search?q=Carre%C3%B1o
10                     http://twitter.com/search?q=%22John+Wall%22
                                 query woeid
1                %23My15FavoritesSongs     1
2             %23MikaNoLegend%C3%A1rios    1
3  %23CitePessoasMaisGostosasDoTwitter     1
4       %23SiMiMam%C3%A1FueraPresidenta     1
5                      %23HappySiwonDay     1
6                       %22C.J+Fair%22     1
7                               Mewtwo     1
8                   %22Mitch+McGary%22     1
9                        Carre%C3%B1o     1
10                     %22John+Wall%22     1
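As a rough sketch of the unfollower check: this assumes you run it once a day (e.g. via cron and Rscript) and that "followers.rds" is just an example file name for the IDs saved by the previous run.

#compare today's follower IDs with the list saved by the previous run
me <- getUser("davetang31")
today <- me$getFollowerIDs()
if (file.exists("followers.rds")) {
  yesterday <- readRDS("followers.rds")
  unfollowers <- setdiff(yesterday, today)
  if (length(unfollowers) > 0) {
    #look up the screen names of the users who unfollowed
    print(sapply(lookupUsers(unfollowers), function(x) x$screenName))
  }
}
#save today's IDs for the next run
saveRDS(today, "followers.rds")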
See also
twitteR vignette
Evaluation of twitteR
Text data mining twitteR
Word cloud in R
This work is licensed under a Creative Commons Attribution 4.0 International License.
Thanks for the intro to twitteR and wordcloud!
Be sure to use https not http. I got a “forbidden” error without the s.
Thanks for the comment!
Hello Dave, this is José from Venezuela. I'm trying to replicate your code and I think I'm almost there; however, I have the following problem:
After getting:
> registerTwitterOAuth(cred)
[1] TRUE
I try to get the word cloud for the term "patria" (homeland) and I get the following error:
Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
Error: Authorization Required
Any idea on how to solve it? Thanks.
Hi José,
I’m not sure but have you checked that you set up your Twitter application properly at https://apps.twitter.com/?
Cheers,
Dave
Hi Davo,
Very useful intro to twitteR! Congrats. I am facing the same issue as José. I get:
> registerTwitterOAuth(cred)
[1] TRUE
But then,
> searchTwitter("#rstats", n=1500, cainfo="cacert.pem")
[1] "Authorization Required"
Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
Error: Authorization Required
I checked my app, but it seems to be set up correctly. Any tip is appreciated. Thanks!
Hey Davo,
Just created a new app and it worked. The first one must have been corrupted. I will find out.
Regards,
Pablo
I checked the steps again and then took your advice to "set up your Twitter application properly at https://apps.twitter.com/" and was able to make it work. 🙂 Thanks for your help.
For everyone: I used "API key" instead of "consumer key" and "API secret" instead of "consumer secret".
No problems. Glad you got it working 🙂
You wrote: "For everyone: I used 'API key' instead of 'consumer key' and 'API secret' instead of 'consumer secret'."
In apps.twitter.com, can you point me to where I can get API Key and Secret? Under Application Settings, I see API Key mentioned in the same field as Consumer Key i.e. they are the same.
I am getting the same error as you did.
Thanks!
Thanks. Really helpful. Though I had to change:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, tolower)
to
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
As I understand it, searchTwitter returns random Twitter messages, though it's possible to specify a date range, which means it is not really good for doing reliable analysis.
Is there a way of obtaining more Twitter data — i.e. all Tweets from the past 30 days?
I’m experiencing this error:
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
1500 tweets were requested but the API can only return 15
when trying to collect tweets.
What’s the reason and how can it be fixed?
Thanks
I believe what you were searching for only had 15 matches.
Nice post. I was able to run a few wordcloud examples after reading your post. I did encounter the following issues:
1. the URLs need to be https:// instead of http://
2. the corpus gave a document error, so I had to include one more transformation step before running the wordcloud function:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)
Hi,
The post was helpful in teaching anyone who wants to learn how to use the twitteR package in R. However, I have encountered some problems while replicating your code. At the last line, wordcloud(r_stats_text_corpus), I get an error:
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
scheduled core 1 encountered error in user code, all values of the job will be affected
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
Did anyone face the same problem and could provide some help?
Try adding mc.cores=1 as an extra parameter to the functions (check out the updated post).
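For example, each cleaning step from the post gets the extra parameter:

r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation, mc.cores=1)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, function(x) removeWords(x, stopwords()), mc.cores=1)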
I am getting the same problem. 🙁
I have the same problem as above 🙁
I had to add lazy=TRUE to the line:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower), lazy=TRUE)
in order not to get the following error message about the cores:
Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
But when everything goes through without error messages, I get the following (using r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument)):
Warning messages:
1: In wordcloud(bigdata_corpus) :
usemethod("removewords", could not be fit on page. It will not be plotted.
2: In wordcloud(bigdata_corpus) :
'removewords' could not be fit on page. It will not be plotted.
3: In wordcloud(bigdata_corpus) :
method could not be fit on page. It will not be plotted.
This produces a cloud plot of the warning words from the R console!
When I don't use r_stats_text_corpus <- tm_map(r_stats_text_corpus, PlainTextDocument), I get the following after calling the wordcloud function:
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
I feel very confused and I am a beginner at R.
Check the updated post.
Thank you for your answer; it works fine now.
Now I get a lot of warnings when running the wordcloud function:
Warning messages:
1: "mc.cores" is not a graphical parameter
2: "mc.cores" is not a graphical parameter
3: "mc.cores" is not a graphical parameter
Does that mean I have something unwanted in my data?
No problems. Remove the mc.cores parameter from the wordcloud() function. Then everything should work nicely.
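In other words, keep mc.cores=1 in the tm_map() calls but call wordcloud() without it:

#mc.cores is only for the tm_map() cleaning steps, not for plotting
wordcloud(r_stats_text_corpus)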
Hi Dave,
could you please help me? After the handshake, R returns "Error: Authorization Required".
How can I be sure my Twitter app settings are correct?
thanks
Hi Francesca,
did you get your consumer key and secret from https://dev.twitter.com/ and replace the word “secret” in the code above with the consumer key and secret? And did you allow read access?
Cheers,
Dave
Hi Dave, thanks for your answer. Yes, I did.
(…….)
#to get your consumerKey and consumerSecret see the twitteR documentation for instructions
cred <- OAuthFactory$new(consumerKey="MY c. KEY",
consumerSecret="MY s. KEY",
requestURL="https://api.twitter.com/oauth/request_token",
accessURL="https://api.twitter.com/oauth/access_token",
authURL="https://api.twitter.com/oauth/authorize")
#necessary step for Windows
cred$handshake(cainfo="cacert.pem")
Yesterday it returned a PIN once; after that, the authorization was denied. I created a new app, but unfortunately I always get the same error.
Sorry for the URL; the addresses are correct in my R scripts.
Perhaps you can try searching or emailing the mailing list: http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org
Thanks for updating your code and reflecting changes of the tm package. Great work!
All my errors are resolved now.
I have two questions. First, how would the code differ for Mac users? Second, I have not been able to receive the PIN; I get the following error despite having reinstalled RCurl. I would appreciate your thoughts.
Error in function (type, msg, asError = TRUE) :
error setting certificate verify locations:
CAfile:
CApath: none
At least on my MacBook Air, I don’t need the cacert.pem file. All the code will be the same but leave out the cainfo=”cacert.pem” part in the functions.
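For example, the search from the post simply becomes:

#no cacert.pem needed on a Mac
r_stats <- searchTwitter("#Rstats", n=1500)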
Hi,
I tried the same code above. I have my API keys but I got the error shown below.
setup_twitter_oauth(APP_KEY,APP_SECRET,OAUTH_TOKEN,OAUTH_TOKEN_SECRET )
[1] “Using direct authentication”
Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()’
Please help! I have been banging my head on this.
Make a new folder/directory somewhere on your computer, restart R, use the setwd() function to tell R to use the new folder/directory that you made, and try the code again. Do you get the same error message?
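Something along these lines, where the folder name is just an example:

#create and switch to a fresh working directory
dir.create("~/twitter_test")
setwd("~/twitter_test")
#then try the authentication again
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)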
Yes, I get the same error. No change.
Some other things to try are (if they apply): update R to the latest version, update all your packages, regenerate the keys, make a new Twitter app, and check out this page (and the links within) https://github.com/geoffjentry/twitteR/issues/74. Good luck.
Thank you very much for the tips on how to solve the pernicious error:
In mclapply(content(x), FUN, ...) :
all scheduled cores encountered errors in user code
Hi,
I tried the same code above:
r_stats <- searchTwitter("#Rstats", n=1500, cainfo="cacert.pem")
and I got this error message:
"Error in tw_from_response(out, ...) :
unused argument (cainfo = "cacert.pem")"
Do you know how to solve it?
What happens when you leave out the cainfo=”cacert.pem” parameter in the searchTwitter() function?
Hi!!
I followed all the steps but I get the following:
[1] “Using direct authentication”
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No
What can I do to get a [1] TRUE in the console, which means that the handshake is complete?
Thanks!!
Hi Peter,
for the caching, I select “2: No”, and that seems to work for me.
Cheers,
Dave
Dear Dave,
Thank you for this illuminating post. It really helped me a lot to understand this function.
I still have two questions that I cannot find the answer to online and thus turn to you.
First, I run a Windows 7 OS and use RStudio. There was no need to insert “cainfo=”cacert.pem” ” into my command. Am I missing something?
Second, the searches do not produce Tweets that are older than a week. When I use the date setting commands, the search produces no results. I find similar observations from other users online, but no one seems to have an answer. Do you?
Thank you very much for any hint you can provide,
Johannes
Hi Johannes,
regarding cainfo, you are right; it is not needed with the updated version of twitteR.
You are right again, at least to the best of my knowledge, regarding searching older tweets; you can only search up to one week back.
What you can do is run a daily search and save the results.
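A minimal sketch of what I mean, where the file naming is just an example:

#run this once a day (e.g. via cron) and keep the results
r_stats <- searchTwitter("#Rstats", n=1500)
saveRDS(r_stats, file=paste0("rstats_", Sys.Date(), ".rds"))

#later, combine all the saved searches into one list
all_tweets <- do.call(c, lapply(list.files(pattern="^rstats_.*\\.rds$"), readRDS))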
Cheers,
Dave
Hi,
I followed your directions, but during authentication the error message below appeared.
Error in registerTwitterOAuth(twitCred) :
ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth
How can I resolve this problem?
Hi Sooy,
are you using the setup_twitter_oauth() function as I have shown in the post?
Cheers,
Dave
Yeah,
I followed your guide again, but the error below appeared.
Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()’
Thanks, very useful post
Hi, thanks! It was of great help!!!!!
After hours of debugging in vain, I came across your post… and finally it was a cakewalk… 🙂
Hi Davo
Your tutorial helped me a lot.
I want to combine multiple Twitter searches into one list. How can I do that before setting up the wordcloud?
TweetsList_1 <- searchTwitter("search1", n=1500)
TweetsList_2 <- searchTwitter("search2", n=1500)
How can I combine TweetsList_1 and TweetsList_2 into one list so that I can do all the text processing?
Hi Amarnath,
you can simply concatenate the two lists with c(), along these lines:
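#combine the two search results into a single list (TweetsList is just an example name)
TweetsList <- c(TweetsList_1, TweetsList_2)
#the combined list can then go through the same text processing as a single search
TweetsList_text <- sapply(TweetsList, function(x) x$getText())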
Cheers,
Dave
Thanks for this post.
When I ran this code
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)
I got the following error:
[1] “Using direct authentication”
Error in check_twitter_oauth() : OAuth authentication error:
This most likely means that you have incorrectly called setup_twitter_oauth()’
What should I do?
Check this thread out https://github.com/geoffjentry/twitteR/issues/74.
Had the same error:
This most likely means that you have incorrectly called setup_twitter_oauth()'
I read this on the GitHub link that Davo recommended: https://github.com/geoffjentry/twitteR/issues/74
As was suggested by user raeed20, I ran the following:
install.packages('base64enc')
and then I was able to run the code.
install.packages('base64enc')
does the trick
When I run the command
r_stats <- searchTwitter("#Rstats", n=1500, cainfo='cacert.pem')
it shows me the following error:
Error in tw_from_response(out, ...) :
unused argument (cainfo = "cacert.pem")
Remove the cainfo="cacert.pem" parameter.
Hi Dave,
Thanks for the wonderful intro. I am facing a roadblock: I am using the twitteR package and I want to get the Twitter handles/usernames of my followers instead of just the screen name. For example, if a hypothetical user John Biden were one of my followers, with "John Biden" as his screen name but @JohnB80 as his handle, I want to extract the @JohnB80. Currently I can get the Twitter ID and the screen name but not the handle. Please let me know your valuable suggestions.
If you have the Twitter ID, you can use the getUser() function to get the handle.
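For example, using the ID from earlier in the post:

u <- getUser("555580799")
u$screenName
#[1] "davetang31"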
Thanks Dave for the prompt reply. I have a list of follower IDs, so I used lookupUsers(). It was giving a detailed list of information about each user and I was unable to extract the screen name, but then, using the str() function, I found the parameter $screenName, which did the trick. Thanks again, your help is highly appreciated.
Hi. Thanks for the tutorial. I've got this error with your code, and with other pretty similar write-ups about data mining on Twitter using R.
These are the errors I get:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
Warning message:
In mclapply(content(x), FUN, …) :
all scheduled cores encountered errors in user code
So I added mc.cores=1 as you suggested and this appears:
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower),mc.cores=1)
Error in UseMethod("content", x) :
no applicable method for 'content' applied to an object of class "try-error"
And this is the main error I get:
wordcloud(r_stats_text_corpus)
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
It appears to be caused by the recent update of the tm package, but I don't know how to solve it.
Thanks. Regards, Enrico
Hi, thank you for the great package!
I am having some trouble with the search. I ran the exact same code in the post but got fewer than 1500 entries:
tw = searchTwitter("#Rstats", n=1500)
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
1500 tweets were requested but the API can only return 234
length(tw)
[1] 234
Any idea why is that?
Thanks,
Joao
Hi Joao,
I’m not sure. Twitter has implemented some limitations in that you can only search for tweets within a given time frame. It could be that there were only 234 tweets with the #rstats hashtag, within some time frame, when you performed the search.
Cheers,
Dave
I think that may be responsible for some of the issues I am having as well. I will look at it again using this insight. Thanks.
Thank you very much for your post, I was just desperately looking for a solution with the utf8towcs error. That’s very helpful :p