This vignettes provides a very quick tour of the R package rtweet: Collecting Twitter Data
. Before getting to the tour, however, I want to explain the philosophy behind rtweet.
The approach I’ve taken to constructing rtweet
has been shaped by my desires to…
Consequently, most of my attention gets funneled toward data retrieval and data exploration functions—as opposed to post/interactive functions, e.g., post_follow_user()
. Currently, rtweet
offers users several functions for posting on behalf of their own Twitter accounts, but efforts to add documentation and support for those functions has been thin. I’m not opposed to these things, but at least for now I’m unlikely to set aside my own time to develop them.
With that in mind, I want to encourage any feedback or suggestions you have. If you use Github and you experience a clear bug, I’d encourage you to copy all relevant code along with the error message and create an new issue on Github. For other issues and discussion, please consider joining and posting to the Google group.
install.packages("rtweet")
library(rtweet)
The rtweet
package exports magrittr’s pipe %>%
. Given the recursive nature of networks and Twitter data, I find it often comes in handy for writing legible code. I also tend to use it in examples, but I understand not everyone will be familiar with it. So, for a crash course, all you really need to know is the pipe operator reorders R functions such that they occur linearly. That is, instead of wrapping one function around another function, the pipe operator goes in order from left to right—just like this sentence flows from left to right. So, a line of code in base R that looks like this
paste0("your lucky number is ", sample(1:20, 1))
Can be rewritten so that the first function you see is the first function that runs when the code is executed, e.g.,
c(1:20) %>%
sample(1) %>%
paste0("your lucky number is ", .)
The pipe (%>%
) simply passes along what’s on the left-hand side to the right-hand side of the line of code. Once it goes through the pipe, the output is assigned to the period (.
), and it is by default plugged into the first argument of the function on the right-hand side of the pipe.
## select the four element from the lhs of the pipe
c(1:10) %>%
.[[4]]
Given the default behavior, you don’t always have the rewrite the data object as it is often assumed. In the code below, for example, the mean function is by default applied to the numbers 1 through 10.
c(1:10) %>%
mean()
## search for 500 tweets using the #rstats hashtag
team_rstats <- search_tweets("#rstats", n = 500)
team_rstats
## access and preview data on the users who posted the tweets
users_data(team_rstats) %>%
head()
## return 200 tweets from @KyloR3n's timeline
kylo_is_a_mole <- get_timeline("KyloR3n", n = 2000)
head(kylo_is_a_mole)
## extract emo kylo ren's user data
users_data(kylo_is_a_mole)
## stream tweets mentioning @HillaryClinton for 2 minutes (120 sec)
imwithher <- stream_tweets("HillaryClinton", timeout = 120)
head(imwithher)
## extract data on the users who posted the tweets
head(users_data(imwithher))
## stream 3 random samples of tweets
for (in in seq_len(3)) {
stream_tweets(q = "", timeout = 60,
file_name = paste0("rtw", i), parse = FALSE)
if (i == 3) {
message("all done!")
break
} else {
# wait between 0 and 300 secs before next stream
Sys.sleep(runif(1, 0, 300))
}
}
## parse the samples
tw <- lapply(c("rtw1.json", "rtw2.json", "rtw3.json"),
parse_stream)
## collapse lists into single data frame
tw.users <- do.call("rbind", users_data(tw))
tw <- do.call("rbind", tw)
attr(tw, "users") <- tw.users
## preview data
head(tw)
users_data(tw) %>%
head()
# search for 500 users using "social science" as a keyword
harder_science <- search_users("social science", n = 500)
harder_science
# extract most recent tweets data from the social scientists
tweets_data(harder_science)
## lookup users by screen_name or user_id
users <- c("KimKardashian", "justinbieber", "taylorswift13",
"espn", "JoelEmbiid", "cstonehoops", "KUHoops",
"upshotnyt", "fivethirtyeight", "hadleywickham",
"cnn", "foxnews", "msnbc", "maddow", "seanhannity",
"potus", "epa", "hillaryclinton", "realdonaldtrump",
"natesilver538", "ezraklein", "annecoulter")
famous_tweeters <- lookup_users(users)
## preview users data
famous_tweeters
# extract most recent tweets data from the famous tweeters
tweets_data(famous_tweeters)
## or get user IDs of people following stephen colbert
colbert_nation <- get_followers("stephenathome", n = 18000)
## get even more by using the next_cursor function
page <- next_cursor(colbert_nation)
## use the page object to continue where you left off
colbert_nation_ii <- get_followers("stephenathome", n = 18000, page = page)
colbert_nation <- c(unlist(colbert_nation), unlist(colbert_nation_ii))
## and then lookup data on those users (if hit rate limit, run as two parts)
colbert_nation <- lookup_users(colbert_nation)
colbert_nation
## or get user IDs of people followed *by* President Obama
obama1 <- get_friends("BarackObama")
obama2 <- get_friends("BarackObama", page = next_cursor(obama1))
## and lookup data on Obama's friends
lookup_users(c(unlist(obama1), unlist(obama2)))
## get trending hashtags, mentions, and topics worldwide
prestige_worldwide <- get_trends()
prestige_worldwide
## or narrow down to a particular country
usa_usa_usa <- get_trends("United States")
usa_usa_usa
## or narrow down to a popular city
CHIEFS <- get_trends("Kansas City")
CHIEFS
post_tweet("my first rtweet #rstats")
## ty for the follow ;)
post_follow_user("kearneymw")