Source: Wikimedia, user -stk.
In addition to the tables described above, trread attempts to calculate the following tables when one uses read_gtfs():
trread prints a message regarding these tables on reading any GTFS file.
# Load trread and dplyr (dplyr supplies %>%, inner_join(), select(), filter(), pull() and sample_n())
library(trread)
library(dplyr)
# Read in a GTFS feed.
# Here we use a feed bundled with the package, but you can also read the feed directly
# from the New York City Metropolitan Transportation Authority (MTA) at this URL:
# nyc <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
local_gtfs_path <- system.file("extdata",
                               "google_transit_nyc_subway.zip",
                               package = "trread")
nyc <- read_gtfs(local_gtfs_path,
                 local = TRUE,
                 frequency = TRUE)
#> Calculating route and stop headways.
For example, we can join the standard routes table, routes_df, to routes_frequency_df by ‘route_id’, which puts each route’s ‘route_long_name’ next to its headway statistics.
routes_df_frequencies <- nyc$routes_df %>%
inner_join(nyc$routes_frequency_df, by = "route_id") %>%
select(route_long_name,
median_headways,
mean_headways,
st_dev_headways,
stop_count)
head(routes_df_frequencies)
#> # A tibble: 6 x 5
#>   route_long_name  median_headways mean_headways st_dev_headways stop_count
#>   <chr>                      <int>         <int>           <dbl>      <int>
#> 1 Broadway - 7 Av…               5             5            0.15         76
#> 2 7 Avenue Express               7            51          135.          120
#> 3 7 Avenue Express               8             8            0.08         68
#> 4 Lexington Avenu…               6           115          205.           77
#> 5 Lexington Avenu…               9           110          271.          102
#> 6 Lexington Avenu…              48            48            0            29
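The headway columns summarize the gaps between consecutive departures. As a rough illustration only, and not trread’s implementation, the base-R sketch below computes the same three statistics for a single stop from a hypothetical vector of departure times in seconds after midnight. Reporting them in minutes is this sketch’s own choice, and the toy_ names are invented for the example.

```r
# Hypothetical departure times at one stop, in seconds after midnight
# (08:00, 08:05, 08:10 and 08:20)
toy_departures <- c(28800, 29100, 29400, 30000)

# A headway is the gap between consecutive departures; convert seconds to minutes
toy_headways <- diff(sort(toy_departures)) / 60

# Summarize the headways with the same statistics as the table above
toy_summary <- data.frame(
  median_headways = median(toy_headways),
  mean_headways   = mean(toy_headways),
  st_dev_headways = sd(toy_headways)
)
print(toy_summary)
```

Because the mean and standard deviation react strongly to a few long gaps while the median does not, a small median next to a much larger mean, as in several rows above, is worth checking before relying on the mean.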
A more complex example of cross-table joins is pulling the stops, and their headways, for a given route.
Answering this seemingly simple question touches much of the GTFS data model, which makes it a good way to start learning it.
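First, we need a ‘service_id’. Every trip in GTFS belongs to a service_id, and the calendar table records on which days of the week, and within which range of dates, each service_id runs; together they determine which trips, and therefore which stops, a route serves on a given day.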
When calculating frequencies, trread tries to guess which service_id is representative of a standard weekday by walking through a set of steps. Below we’ll just do some of this manually.
Let’s start by looking at a few randomly sampled rows of calendar_df.
head(sample_n(nyc$calendar_df,10))
#> # A tibble: 6 x 10
#>   service_id monday tuesday wednesday thursday friday saturday sunday
#>   <chr>       <int>   <int>     <int>    <int>  <int>    <int>  <int>
#> 1 BSP18GEN-…      1       1         1        1      1        0      0
#> 2 BSP18GEN-…      0       0         0        0      0        1      0
#> 3 BSP18GEN-…      1       1         1        1      1        0      0
#> 4 BSP18GEN-…      1       1         1        1      1        0      0
#> 5 SIR-SP201…      0       0         0        0      0        0      1
#> 6 BSP18GEN-…      0       0         0        0      0        1      0
#> # … with 2 more variables: start_date <date>, end_date <date>
Next, we’ll collect the service_ids that run on Mondays and pick a random route_id.
select_service_id <- filter(nyc$calendar_df,monday==1) %>% pull(service_id)
select_route_id <- sample_n(nyc$routes_df,1) %>% pull(route_id)
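The monday == 1 filter is only a coarse stand-in for trread’s weekday guess, whose exact steps are not described here. A stricter rule, assumed purely for illustration, keeps only services that run on all five weekdays and on neither weekend day. The base-R sketch below applies that assumed rule to a hypothetical toy calendar; the service_ids and the toy_calendar, weekday_cols and weekend_cols names are invented.

```r
# Hypothetical toy calendar with the GTFS weekday columns (1 = service runs that day)
toy_calendar <- data.frame(
  service_id = c("WKD", "SAT", "SUN", "MON_ONLY"),
  monday     = c(1, 0, 0, 1),
  tuesday    = c(1, 0, 0, 0),
  wednesday  = c(1, 0, 0, 0),
  thursday   = c(1, 0, 0, 0),
  friday     = c(1, 0, 0, 0),
  saturday   = c(0, 1, 0, 0),
  sunday     = c(0, 0, 1, 0),
  stringsAsFactors = FALSE
)

weekday_cols <- c("monday", "tuesday", "wednesday", "thursday", "friday")
weekend_cols <- c("saturday", "sunday")

# Assumed rule: the service runs on every weekday and on no weekend day
is_weekday_service <- rowSums(toy_calendar[weekday_cols]) == 5 &
  rowSums(toy_calendar[weekend_cols]) == 0

weekday_service_ids <- toy_calendar$service_id[is_weekday_service]
print(weekday_service_ids)
```

On this toy table the monday == 1 filter would keep both WKD and MON_ONLY, while the stricter rule keeps only WKD.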
Now we’ll filter down through the data model to just the stops served by that route under those service_ids.
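In GTFS, that path runs from trips (which carry both route_id and service_id) to stop_times (via trip_id) and then to stops (via stop_id); the routes table itself is not needed for the filter because trips already carry route_id. The base-R sketch below walks that chain on hypothetical miniature tables. The table contents and the toy_ names are invented for illustration (chosen so they do not overwrite the variables above), and the column names follow the GTFS specification rather than trread’s data frames.

```r
# Hypothetical miniature GTFS tables; column names follow the GTFS specification
toy_trips <- data.frame(
  trip_id    = c("t1", "t2", "t3"),
  route_id   = c("A", "A", "B"),
  service_id = c("WKD", "SAT", "WKD"),
  stringsAsFactors = FALSE
)
toy_stop_times <- data.frame(
  trip_id = c("t1", "t1", "t2", "t3"),
  stop_id = c("s1", "s2", "s3", "s1"),
  stringsAsFactors = FALSE
)
toy_stops <- data.frame(
  stop_id   = c("s1", "s2", "s3"),
  stop_name = c("First St", "Second St", "Third St"),
  stringsAsFactors = FALSE
)

toy_route_id    <- "A"
toy_service_ids <- c("WKD")

# Step 1, trips: keep trips on the chosen route that run under the chosen service_ids
toy_route_trips <- toy_trips[toy_trips$route_id == toy_route_id &
                               toy_trips$service_id %in% toy_service_ids, ]

# Step 2, stop_times: keep every stop visit made by those trips (matched on trip_id)
toy_route_stop_times <- toy_stop_times[toy_stop_times$trip_id %in% toy_route_trips$trip_id, ]

# Step 3, stops: keep the distinct stops those visits reference (matched on stop_id)
toy_route_stops <- toy_stops[toy_stops$stop_id %in% toy_route_stop_times$stop_id, ]
print(toy_route_stops)
```

Trip t2 also belongs to route A, but it runs under SAT, so its stop s3 is excluded.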