Source: Wikimedia, user -stk.
In addition to the tables described above, tidytransit attempts to calculate further tables when one uses read_gtfs(). The examples below use four of them: routes_frequency_df, stops_frequency_df, routes_sf, and stops_sf.
tidytransit prints a message regarding these tables on reading any GTFS file.
# Load tidytransit, plus dplyr for %>%, the joins, filter(), and select()
library(tidytransit)
library(dplyr)
# Read in GTFS feed
# Here we use a feed included in the package, but note that you can also read directly
# from the New York City Metropolitan Transportation Authority using the following URL:
# nyc <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
local_gtfs_path <- system.file("extdata",
                               "google_transit_nyc_subway.zip",
                               package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path,
                 local = TRUE,
                 geometry = TRUE,
                 frequency = TRUE)
#> Calculating route and stop headways using defaults (6 am to 10 pm for weekday service).
For example, you can join the standard routes table, which holds the ‘route_long_name’ variable, to routes_frequency_df by ‘route_id’.
routes_df_frequencies <- nyc$routes_df %>%
  inner_join(nyc$routes_frequency_df, by = "route_id") %>%
  select(route_long_name,
         median_headways,
         mean_headways,
         st_dev_headways,
         stop_count)
head(routes_df_frequencies)
#> # A tibble: 6 x 5
#>   route_long_name median_headways mean_headways st_dev_headways stop_count
#>   <chr>                     <int>         <int>           <dbl>      <int>
#> 1 Broadway - 7 A…               5             5            0.14         76
#> 2 7 Avenue Expre…               8            36            63.7        118
#> 3 7 Avenue Expre…               8             8            0.06         68
#> 4 Lexington Aven…               7           197            350.         75
#> 5 Lexington Aven…              10            97            245.        100
#> 6 Lexington Aven…              48            48               0         29
You can do the same with ‘simple features tables’.
For example, under the hood, plot(gtfs_obj) is doing this:
routes_sf_frequencies <- nyc$routes_sf %>%
  inner_join(nyc$routes_frequency_df, by = "route_id") %>%
  select(median_headways,
         mean_headways,
         st_dev_headways,
         stop_count)
plot(routes_sf_frequencies)
A more complex example of cross-table joins is to pull the stops and their headways for a given route.
This simple question is a great way to begin to understand a lot about the GTFS data model.
First, we’ll need to find a ‘service_id’, which identifies the set of days on which a group of trips runs and so tells us which stops a route serves on a given day of the week.
When calculating frequencies, tidytransit tries to guess which service_id is representative of a standard weekday by walking through a set of steps. Below we’ll just do some of this manually.
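As a rough illustration of what such a guess could look like, the sketch below picks the weekday service_id with the most trips. This is an assumed heuristic for illustration only, not a description of the steps tidytransit actually follows, and the `calendar` and `trips` tables are made-up toy data rather than the NYC feed.

```r
library(dplyr)

# Toy calendar and trips tables (made-up values, not from the NYC feed)
calendar <- tibble(
  service_id = c("WKD", "SAT", "SUN", "WKD_ALT"),
  monday     = c(1L, 0L, 0L, 1L),
  tuesday    = c(1L, 0L, 0L, 1L),
  wednesday  = c(1L, 0L, 0L, 1L),
  thursday   = c(1L, 0L, 0L, 1L),
  friday     = c(1L, 0L, 0L, 1L),
  saturday   = c(0L, 1L, 0L, 0L),
  sunday     = c(0L, 0L, 1L, 0L)
)
trips <- tibble(
  trip_id    = paste0("t", 1:6),
  service_id = c("WKD", "WKD", "WKD", "SAT", "SUN", "WKD_ALT")
)

# Keep only service_ids that run on every day from Monday to Friday
weekday_services <- calendar %>%
  filter(monday == 1, tuesday == 1, wednesday == 1,
         thursday == 1, friday == 1) %>%
  pull(service_id)

# Assumed heuristic: the weekday service_id with the most trips is "representative"
representative_service <- trips %>%
  filter(service_id %in% weekday_services) %>%
  count(service_id, sort = TRUE) %>%
  slice(1) %>%
  pull(service_id)

representative_service
```

On a real feed, several service_ids can cover the same weekdays over different date ranges, so a heuristic like this would likely also need to consider start_date and end_date.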
To start, let’s look at calendar_df.
head(sample_n(nyc$calendar_df, 10))
#> # A tibble: 6 x 10
#>   service_id      monday tuesday wednesday thursday friday saturday sunday
#>   <chr>            <int>   <int>     <int>    <int>  <int>    <int>  <int>
#> 1 ASP18GEN-5043-…      0       0         0        0      0        1      0
#> 2 BSP18GEN-M023-…      0       0         0        0      0        0      1
#> 3 BSP18GEN-L022-…      0       0         0        0      0        0      1
#> 4 BSP18GEN-G033-…      0       0         0        0      0        1      0
#> 5 ASP18GEN-6085-…      1       1         1        1      1        0      0
#> 6 BSP18GEN-FS011…      1       1         1        1      1        0      0
#> # ... with 2 more variables: start_date <date>, end_date <date>
Then we’ll pull a random route_id and set of service_ids that run on Mondays.
select_service_id <- filter(nyc$calendar_df, monday == 1) %>% pull(service_id)
select_route_id <- sample_n(nyc$routes_df, 1) %>% pull(route_id)
Now we’ll filter down through the data model to just stops for that route and service_ids.
some_trips <- nyc$trips_df %>%
  filter(route_id %in% select_route_id & service_id %in% select_service_id)
some_stop_times <- nyc$stop_times_df %>%
  filter(trip_id %in% some_trips$trip_id)
some_stops <- nyc$stops_sf %>%
  filter(stop_id %in% some_stop_times$stop_id)
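To make the key relationships in this chain explicit, here is a minimal sketch of the same trips, stop_times, stops filtering on made-up toy tables. It uses dplyr's semi_join(), which keeps the rows of one table that have a match in another, as an alternative to the `%in%` filters. All `toy_` names and values are hypothetical.

```r
library(dplyr)

# Toy GTFS-like tables (made-up values)
toy_trips <- tibble(
  trip_id    = c("t1", "t2", "t3"),
  route_id   = c("A", "A", "B"),
  service_id = c("WKD", "SAT", "WKD")
)
toy_stop_times <- tibble(
  trip_id = c("t1", "t1", "t2", "t3", "t3"),
  stop_id = c("s1", "s2", "s3", "s2", "s4")
)
toy_stops <- tibble(
  stop_id   = c("s1", "s2", "s3", "s4"),
  stop_name = c("First St", "Second St", "Third St", "Fourth St")
)

# Trips are keyed by route_id and service_id
toy_some_trips <- toy_trips %>%
  filter(route_id == "A", service_id == "WKD")

# stop_times link trips to stops through trip_id
toy_some_stop_times <- toy_stop_times %>%
  semi_join(toy_some_trips, by = "trip_id")

# stops are reached through stop_id
toy_some_stops <- toy_stops %>%
  semi_join(toy_some_stop_times, by = "stop_id")

toy_some_stops
```

Only the stops visited by route "A" on the "WKD" service remain; stop "s3" is dropped because route "A" reaches it only on the "SAT" service.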
Before we plot them, let’s join the frequency calculations from the calculated table onto their geometries.
some_stops_freq_sf <- some_stops %>%
  left_join(nyc$stops_frequency_df, by = "stop_id") %>%
  select(headway)
plot(some_stops_freq_sf)
We will probably see some surprising outliers for headway calculations in this plot.
Calculating headways at stops is tricky for a number of reasons. One reason is that the result depends on which schedule of service (service_id) and which time window the calculation uses.
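A small worked example shows how sensitive a headway summary can be to the schedule. The sketch below uses made-up departure times at a single stop and base R only. It is not how tidytransit computes headways; it only illustrates why one long gap in service can pull the mean far away from the median, a pattern like the one in the routes table earlier, where some routes have a median headway under 10 and a mean near 200.

```r
# Made-up departure times at one stop, in minutes after midnight:
# frequent morning service, a long midday gap, then service resumes
departures <- c(360, 368, 376, 384, 392, 900, 908)

# Headway = time between consecutive departures
headways <- diff(sort(departures))

median_headway <- median(headways)  # robust to the single long gap
mean_headway   <- mean(headways)    # pulled upward by the long gap

headways
median_headway
mean_headway
```

Here five of the six headways are 8 minutes and one is 508 minutes, so the median stays at 8 while the mean rises above 90. This suggests checking the median alongside the mean before reading too much into an outlier on the map.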
But hopefully you now have a better understanding of how you can use the GTFS data model to explore and communicate about these questions.