Prior to streaming, make sure to install and load rtweet. This vignette assumes users have already set up app access tokens (see the “auth” vignette, vignette("auth", package = "rtweet")).
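A minimal setup sketch (the auth_as() call is illustrative and assumes a token was previously saved as described in the auth vignette):

## Load rtweet; authentication is assumed to be configured already
library(rtweet)
## e.g. auth_as("default")  # if a token was saved under that name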
rtweet makes it possible to capture live streams of Twitter data¹.
There are two ways of having a stream:

- A stream collecting data from a set of rules, which can be collected via filtered_stream().
- A stream of 1% of the tweets published, which can be collected via sample_stream().
In either case we need to choose how long the streaming connection should stay open and which file the results should be saved to.
## Stream time in seconds so for one minute set timeout = 60
## For larger chunks of time, I recommend multiplying 60 by the number
## of desired minutes. This method scales up to hours as well
## (x * 60 = x mins, x * 60 * 60 = x hours)
## Stream for 5 seconds
streamtime <- 5
## Filename to save json data (backup)
filename <- "rstats.json"
The filtered stream collects tweets for all rules that are currently active, not just one rule or query.
Streaming rules in rtweet need a value and a tag. The value is the query to be performed, and the tag is the name used to identify tweets that match a query. You can use multiple words and hashtags as the value; please read the official documentation. Multiple rules can match a single tweet.
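For example, following the pattern in the rtweet documentation, a rule matching the #rstats hashtag could be added like this:

## Add a rule matching the #rstats hashtag, identified by the tag "rstats"
new_rule <- stream_add_rule(list(value = "#rstats", tag = "rstats"))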
To see the current rules and find out whether any rule is active, you can use stream_add_rule():
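A sketch (passing NULL queries the existing rules instead of adding a new one):

## Passing NULL returns the currently registered rules
stream_add_rule(NULL)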
With the help of rules(), the id, value and tag of each rule are provided.
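For instance, a sketch combining the two helpers:

## Extract the id, value and tag of each active rule
rules(stream_add_rule(NULL))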
To remove rules, use stream_rm_rule().
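A sketch, where the rule id shown is a placeholder for an id obtained from rules():

## Remove a rule by its id; the id here is a placeholder
stream_rm_rule("1234567890123456789")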
Note that if the rules are not used for some time, Twitter warns you that they will be removed. But given that filtered_stream() collects tweets for all active rules, it is advisable to keep the rules list short and clean.
Once these parameters are specified, initiate the stream. Note: barring any disconnection or disruption of the API, streaming will occupy your current instance of R until the specified time has elapsed. It is possible to start a new instance of R (streaming itself usually isn’t very memory intensive), but operations may drag a bit during the parsing process, which takes place immediately after streaming ends.
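A sketch using the streamtime and filename parameters defined above:

## Stream tweets matching the active rules for `streamtime` seconds
rstats <- filtered_stream(timeout = streamtime, file = filename)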
If no tweet matching the rules is detected, a warning will be issued.
Parsing larger streams can take quite a bit of time (in addition to time spent streaming) due to a somewhat time-consuming simplifying process used to convert a json file into an R object.
Don’t forget to clean the streaming rules:
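For example, a sketch that removes every active rule, assuming rules() returns the id column described above:

## Fetch the ids of the active rules and remove them all
active <- rules(stream_add_rule(NULL))
stream_rm_rule(active$id)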
The sample_stream() function doesn’t require any rules.
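A sketch of sampling for a few seconds (the file name is illustrative):

## Collect a random 1% sample of tweets for 5 seconds
sampled <- sample_stream(timeout = 5, file = "sample.json")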
Users may want to stream tweets into json files upfront and parse those files later on. To do this, simply add parse = FALSE and make sure you provide a path (file name) to a location you can find later.
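For example, a sketch reusing the parameters from above:

## Save the raw stream to disk without parsing it
filtered_stream(timeout = streamtime, file = filename, parse = FALSE)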
You can also use append = TRUE to continue recording a stream into an already existing file.
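A sketch appending a second capture to the file created above:

## Append further results to the existing json file
filtered_stream(timeout = streamtime, file = filename, parse = FALSE, append = TRUE)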
Currently, parsing the streaming data file with parse_stream() is not functional. However, you can read it back in with jsonlite::stream_in(file).
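For instance, a sketch assuming jsonlite is installed and filename is the file used above:

## Read the raw json stream back into R
tweets <- jsonlite::stream_in(file(filename))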
The parsed object should be the same whether a user parses up-front or from a json file in a later session.
Currently the returned object is a raw conversion of the feed into a nested list, depending on the fields and expansions requested.
¹ Until November 2022 streaming was possible with API v1.1; this is no longer the case, and rtweet now uses API v2.