suppressPackageStartupMessages(library(tRakt))
suppressPackageStartupMessages(library(dplyr)) # For convenience
library(ggplot2) # For plotting (duh)
library(knitr) # for knitr::kable, used to render simple tables
# If you don't have a client.id defined in a key.json, use mine
if (is.null(getOption("trakt.client.id"))) {
  get_trakt_credentials(client.id = "12fc1de7671c7f2fb4a8ac08ba7c9f45b447f4d5bad5e11e3490823d629afdf2")
}
There are two ways to search on trakt.tv. The first is via text query (e.g. Game of Thrones), the second is via ID (various types supported).
At the time of this writing (2015-02-16), the trakt.tv text search is somewhat unreliable, so searching by ID is recommended.
# Search via text query
show1 <- trakt.search("Game of Thrones")
# Search via ID (trakt id is used by default)
show2 <- trakt.search.byid(1390) # trakt id of Game of Thrones
# The returned data is identical
identical(show1, show2)
## [1] TRUE
# Search a show and receive basic info
show <- trakt.search("Breaking Bad")
# Save the slug of the show, that's needed for other functions as an ID
slug <- show$ids$slug
slug
## [1] "breaking-bad"
# Get the season & episode data
show.seasons <- trakt.seasons.summary(slug, extended = "full") # How many seasons are there?
show.episodes <- trakt.get_all_episodes(slug, show.seasons$season, extended = "full")
# Glimpse at data (only some columns each)
rownames(show.seasons) <- NULL # This shouldn't be necessary
show.seasons[c(1, 3, 4)] %>% kable
season | votes | episode_count |
---|---|---|
1 | 156 | 7 |
2 | 129 | 13 |
3 | 127 | 13 |
4 | 115 | 13 |
5 | 118 | 16 |
show.episodes[c(1:3, 6, 7, 17)] %>% head(10) %>% kable
season | episode | title | rating | votes | year |
---|---|---|---|---|---|
1 | 1 | Pilot | 8.66654 | 2585 | 2008 |
1 | 2 | Cat’s in the Bag… | 8.46810 | 1959 | 2008 |
1 | 3 | …And the Bag’s in the River | 8.35694 | 1793 | 2008 |
1 | 4 | Cancer Man | 8.33583 | 1736 | 2008 |
1 | 5 | Gray Matter | 8.27889 | 1696 | 2008 |
1 | 6 | Crazy Handful of Nothin’ | 8.90317 | 1797 | 2008 |
1 | 7 | A No-Rough-Stuff-Type Deal | 8.68226 | 1731 | 2008 |
2 | 1 | Seven Thirty-Seven | 8.48295 | 1642 | 2009 |
2 | 2 | Grilled | 8.71551 | 1631 | 2009 |
2 | 3 | Bit by a Dead Bee | 8.27410 | 1525 | 2009 |
Plotting the data is pretty straightforward since I try to return regular data.frames without unnecessary ambiguity.
show.episodes$episode_abs <- seq_len(nrow(show.episodes)) # I should probably do that for you.
show.episodes %>%
  ggplot(aes(x = episode_abs, y = rating, colour = season)) +
  geom_point(size = 3.5, colour = "black") + # black outline behind the coloured points
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Trakt.tv Ratings of Breaking Bad",
       y = "Rating", x = "Episode (absolute)", colour = "Season")
show.episodes %>%
  ggplot(aes(x = episode_abs, y = votes, colour = season)) +
  geom_point(size = 3.5, colour = "black") +
  geom_point(size = 3) +
  labs(title = "Trakt.tv User Votes of Breaking Bad Episodes",
       y = "Votes", x = "Episode (absolute)", colour = "Season")
show.episodes %>%
  ggplot(aes(x = episode_abs, y = scale(rating), fill = season)) +
  geom_bar(stat = "identity", colour = "black", position = "dodge") +
  labs(title = "Trakt.tv User Ratings of Breaking Bad Episodes\n(Scaled using mean and standard deviation)",
       y = "z-Rating", x = "Episode (absolute)", fill = "Season")
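For readers wondering what "scaled using mean and standard deviation" amounts to, here is a minimal sketch in base R. It uses the season 1 ratings from the table above, rounded to two decimals, so the numbers are only illustrative; the plot itself scales across all episodes. The variable names are mine, not part of tRakt.

```r
# Season 1 ratings, rounded from the episode table shown earlier
ratings <- c(8.67, 8.47, 8.36, 8.34, 8.28, 8.90, 8.68)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (ratings - mean(ratings)) / sd(ratings)

# scale() does the same but returns a one-column matrix with attributes,
# so strip those before comparing
z_scale <- as.numeric(scale(ratings))

all.equal(z_manual, z_scale)
## [1] TRUE
```

Positive values are episodes rated above the average, negative values below it, in units of standard deviations.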
User-specific functions (trakt.user.*) default to user = getOption("trakt.username"), which should have been set by get_trakt_credentials(), so you get your own data by default.
However, you can specify any publicly available user. Note that OAuth2 is not supported, so by “publicly available user”, I really mean only non-private users.
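The default-from-an-option pattern can be sketched in a few lines of base R. Note that describe_user() is a hypothetical helper made up for illustration; it is not a tRakt function and makes no API calls. It only shows how a default of getOption("trakt.username") behaves.

```r
# Hypothetical helper, NOT part of tRakt: illustrates the default-argument pattern
describe_user <- function(user = getOption("trakt.username")) {
  # Default arguments are evaluated lazily, so the option is read at call time
  if (is.null(user)) {
    stop("No user given and option 'trakt.username' is not set")
  }
  paste("Fetching data for user:", user)
}

options(trakt.username = "jemus42")
describe_user()            # falls back to the option
describe_user("someone")   # an explicit argument takes precedence
```

Because the option is looked up when the function is called, changing it later with options() changes the default for all subsequent calls.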
# Get a detailed list of shows/episodes I watched
myeps <- trakt.user.watched(user = "jemus42", type = "shows.extended")
# Get a feel for the data
myeps %>%
  arrange(desc(last_watched_at)) %>%
  head(5) %>%
  kable
title | season | episode | plays | last_watched_at | last_watched.year |
---|---|---|---|---|---|
The 100 | 2 | 16 | 1 | 2015-03-12 04:15:29 | 2015 |
Cougar Town | 6 | 10 | 1 | 2015-03-12 03:37:06 | 2015 |
The Colbert Report | 10 | 147 | 1 | 2015-03-12 02:41:42 | 2015 |
Last Week Tonight with John Oliver | 2 | 3 | 1 | 2015-03-12 01:36:54 | 2015 |
Last Week Tonight with John Oliver | 2 | 2 | 1 | 2015-03-12 01:07:01 | 2015 |
# …and the movies in my trakt.tv collection
mymovies <- trakt.user.collection(user = "jemus42", type = "movies")
mymovies %>%
  select(title, year, collected_at) %>%
  arrange(collected_at) %>%
  head(5) %>%
  kable
title | year | collected_at |
---|---|---|
Howl’s Moving Castle | 2004 | 2013-09-24 00:11:02 |
Stargate: Continuum | 2008 | 2013-09-29 07:23:57 |
Stargate: The Ark of Truth | 2008 | 2013-09-29 07:24:01 |
Stargate | 1994 | 2013-09-29 07:24:03 |
Fight Club | 1999 | 2013-09-29 07:38:59 |
I tried my best to make the returned data as flat and usable as possible.
I tried.
So, well, let’s see: take the watched shows, diff the earliest and latest last_watched_at values per show to get something like a “watch duration”, and aggregate by it:
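myeps %>%
  group_by(title) %>%
  # Pin the units to days; plain subtraction picks units automatically per group
  summarize(days = round(as.numeric(difftime(max(last_watched_at),
                                             min(last_watched_at),
                                             units = "days")))) %>%
  arrange(desc(days)) %>%
  head(10) %>%
  kable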
title | days |
---|---|
Cougar Town | 735 |
American Dad! | 733 |
Bob’s Burgers | 732 |
Family Guy | 732 |
The Simpsons | 732 |
The Walking Dead | 732 |
Suits | 728 |
MythBusters | 712 |
Arrow | 705 |
House of Cards | 697 |
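One thing to keep in mind with this kind of aggregation: subtracting two POSIXct values returns a difftime whose units R picks automatically (seconds, minutes, hours, or days), so a show watched within a single day would not be measured in days. A minimal sketch on made-up data, using base R only; the titles, timestamps, and the watch_span_days() helper are all invented for illustration:

```r
# Made-up watch history; titles and timestamps are purely illustrative
toy <- data.frame(
  title = c("Show A", "Show A", "Show B", "Show B"),
  last_watched_at = as.POSIXct(
    c("2015-01-01 12:00:00", "2015-01-11 12:00:00",
      "2015-03-01 10:00:00", "2015-03-01 15:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Plain subtraction: the two "Show B" timestamps are 5 hours apart,
# so the result comes back in hours, not days
show_b <- toy$last_watched_at[toy$title == "Show B"]
naive <- max(show_b) - min(show_b)
units(naive)
## [1] "hours"

# Hypothetical helper: span between first and last watch, always in days
watch_span_days <- function(x) {
  as.numeric(difftime(max(x), min(x), units = "days"))
}

# One span per title
spans <- sapply(split(toy$last_watched_at, toy$title), watch_span_days)
spans
```

Here "Show A" spans exactly 10 days, while "Show B" spans a fraction of a day instead of being reported as 5.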
It’s data like this that makes me wish I had been using trakt.tv forever. The potential for interesting data is great, but the limit is, as usual, the source of the data.