tidytransit makes it easy to work with transit data by simplifying General Transit Feed Specification data (the standard format for storing transit data) into tidyverse and sf-friendly dataframes. Use it to map existing stops and routes, calculate transit frequencies, and validate transit feeds.
tidytransit is a fork of gtfsr that is published to CRAN. It adds frequency calculation functions and omits the GTFS-specific interactive cartography features.
This package requires a working installation of sf.
# Once sf is installed, you can install from CRAN with:
install.packages('tidytransit')
# For the development version from GitHub:
# install.packages("devtools")
devtools::install_github("r-transit/tidytransit")
For some users, sf is impractical to install due to system-level dependencies. For these users, trread may work better. It is tidytransit without the geospatial (GDAL) tools.
library(tidytransit)

# Read in a GTFS feed.
# Here we use a feed included in the package, but you can also read directly
# from the New York City Metropolitan Transportation Authority with this URL:
# nyc <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
local_gtfs_path <- system.file("extdata",
                               "google_transit_nyc_subway.zip",
                               package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path,
                 local = TRUE,
                 geometry = TRUE,
                 frequency = TRUE)
## Calculating route and stop headways.
When you set geometry=TRUE and frequency=TRUE, tidytransit attempts to convert the GTFS feed into simple features dataframes and frequency/headway dataframes when the data is imported.
Below we discuss methods and data available for the gtfs object as read by tidytransit.
Perhaps you want to map subway routes and color-code each route by how often trains come.
## Calculating headways and spatial features. This may take a while
## Calculating route and stop headways.
View the headways along routes as a dataframe. routes_frequency is added to the list of gtfs dataframes read in by read_gtfs when frequency=TRUE. By default, frequency is calculated for service that happens every weekday from 6 am to 10 pm. See the reference for the get_route_frequency function for other options (e.g. weekends, other times of day).
## # A tibble: 6 x 5
## route_id median_headways mean_headways st_dev_headways stop_count
## <chr> <int> <int> <dbl> <int>
## 1 1 5 5 0.15 76
## 2 2 7 51 135. 120
## 3 3 8 8 0.08 68
## 4 4 6 115 205. 77
## 5 5 9 110 271. 102
## 6 5X 48 48 0 29
route_id | median_headways | mean_headways | st_dev_headways | stop_count |
---|---|---|---|---|
GS | 4 | 4 | 0.01 | 4 |
L | 4 | 4 | 0.13 | 48 |
1 | 5 | 5 | 0.14 | 76 |
7 | 5 | 5 | 0.29 | 44 |
6 | 6 | 7 | 2.84 | 76 |
E | 6 | 23 | 53.01 | 48 |
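
The route-level columns above look like per-route summaries of stop headways. The following is a minimal base R sketch of that idea, not tidytransit's own implementation: the input data frame, its values, and the helper name summarise_route_headways are all hypothetical, and it assumes each route's figures are the median, mean, and standard deviation of its stops' headways.

```r
# Hypothetical per-stop headways (minutes); values are made up for illustration.
stop_headways <- data.frame(
  route_id = c("A", "A", "A", "B", "B"),
  stop_id  = c("a1", "a2", "a3", "b1", "b2"),
  headway  = c(4, 6, 8, 10, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise stop headways per route (an assumption about
# how route-level figures could be derived, not the package's own code).
summarise_route_headways <- function(df) {
  per_route <- lapply(split(df, df$route_id), function(d) {
    data.frame(
      route_id         = d$route_id[1],
      median_headways  = median(d$headway),
      mean_headways    = mean(d$headway),
      st_dev_headways  = sd(d$headway),
      stop_count       = nrow(d),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, per_route)
  rownames(out) <- NULL
  # Shortest median headway first, so the most frequent routes come first.
  out[order(out$median_headways), ]
}

route_summary <- summarise_route_headways(stop_headways)
print(route_summary)
```

Sorting by median rather than mean headway keeps a few very long gaps from pushing an otherwise frequent route down the list, which is the pattern visible for route E in the table above.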
View the headways at stops. stops_frequency is added to the list of gtfs dataframes read in by read_gtfs. Again, by default, frequency is calculated for service that happens every weekday from 6 am to 10 pm. See the reference for the get_stop_frequency function for other options (e.g. weekends, other times of day).
## # A tibble: 6 x 6
## route_id direction_id stop_id service_id departures headway
## <chr> <int> <chr> <chr> <int> <dbl>
## 1 1 0 101N ASP18GEN-1087-Weekday-00 177 5.42
## 2 1 0 103N ASP18GEN-1087-Weekday-00 177 5.42
## 3 1 0 104N ASP18GEN-1087-Weekday-00 177 5.42
## 4 1 0 106N ASP18GEN-1087-Weekday-00 178 5.39
## 5 1 0 107N ASP18GEN-1087-Weekday-00 183 5.25
## 6 1 0 108N ASP18GEN-1087-Weekday-00 183 5.25
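
The headway values above are consistent with dividing the length of the default service window (6 am to 10 pm, which is 16 hours or 960 minutes) by the number of departures. Below is a small base R check of that arithmetic. It is offered as an assumption about how the column could be derived, not as the package's documented formula.

```r
# Default window from the text: weekdays, 6 am to 10 pm.
window_minutes <- (22 - 6) * 60  # 960 minutes

# Departure counts copied from the table above.
departures <- c(177, 178, 183)

# Assumed relationship: average minutes between departures in the window.
headway <- round(window_minutes / departures, 2)
print(headway)
```

The rounded results match the headway column for the same departure counts in the printed rows.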
## # A tibble: 6 x 4
## # Groups: direction_id, stop_id [6]
## direction_id stop_id stop_name headway
## <int> <chr> <chr> <dbl>
## 1 0 902N Times Sq - 42 St 3.60
## 2 1 901S Grand Central - 42 St 3.60
## 3 1 902S Times Sq - 42 St 3.60
## 4 0 901N Grand Central - 42 St 3.61
## 5 0 702N Mets - Willets Point 3.72
## 6 0 707N Junction Blvd 3.72
The tidytransit package includes a dataframe listing feed URLs, city names, and locations.
You can browse it as a data frame:
## id t loc_id loc_pid
## 1 sanford-trolley/949 Sanford Trolley Vehicle Positions 649 74
## 2 sanford-trolley/948 Sanford Trolley Trip Updates 649 74
## 3 sanford-trolley/947 Sanford Trolley GTFS 649 74
## 4 nyc-ferry/946 NYC Ferry Trip Updates 91 84
## 5 nyc-ferry/945 NYC Ferry Service Alerts 91 84
## 6 nyc-ferry/944 NYC Ferry GTFS 91 84
## loc_t loc_n loc_lat loc_lng
## 1 Sanford, FL, USA Sanford 28.80286 -81.26945
## 2 Sanford, FL, USA Sanford 28.80286 -81.26945
## 3 Sanford, FL, USA Sanford 28.80286 -81.26945
## 4 New York, NY, USA New York 40.71435 -74.00597
## 5 New York, NY, USA New York 40.71435 -74.00597
## 6 New York, NY, USA New York 40.71435 -74.00597
## url_d
## 1 http://sanfordcra.omnimodal.io:8080/api/v1/key/e8d3d923/agency/1/command/gtfs-rt/vehiclePositions
## 2 http://sanfordcra.omnimodal.io:8080/api/v1/key/e8d3d923/agency/1/command/gtfs-rt/tripUpdates
## 3 https://data.omnimodal.io/gtfs/sanfordcommunity-fl-us/sanfordcommunity-fl-us.zip
## 4 http://cnx.ferry.nyc/rtt/public/utility/gtfsrealtime.aspx/tripupdate
## 5 http://cnx.ferry.nyc/rtt/public/utility/gtfsrealtime.aspx/alert
## 6 http://cnx.ferry.nyc/rtt/public/utility/gtfs.aspx
## url_i
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 https://www.ferry.nyc/developer-tools/
## 5 https://www.ferry.nyc/developer-tools/
## 6 https://www.ferry.nyc/developer-tools/
or as a map:
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
feedlist_sf <- st_as_sf(feedlist_df,
                        coords = c("loc_lng", "loc_lat"),
                        crs = 4326)
plot(feedlist_sf, max.plot = 1)
Note that there is a URL (url_d) for each feed, which can be used to read the feed for a given city into R.
For example:
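
A sketch of selecting a feed URL, assuming the column names shown in the printout above (t for the feed title, url_d for the download URL). The stand-in data frame copies three rows from that printout so the snippet runs without the package. The final read_gtfs call is left commented out because it needs network access, and there is no guarantee that a given URL still serves a valid feed.

```r
# Stand-in for the packaged feed list, using rows copied from the printout above.
feeds <- data.frame(
  t = c("NYC Ferry Trip Updates", "NYC Ferry Service Alerts", "NYC Ferry GTFS"),
  url_d = c(
    "http://cnx.ferry.nyc/rtt/public/utility/gtfsrealtime.aspx/tripupdate",
    "http://cnx.ferry.nyc/rtt/public/utility/gtfsrealtime.aspx/alert",
    "http://cnx.ferry.nyc/rtt/public/utility/gtfs.aspx"
  ),
  stringsAsFactors = FALSE
)

# Keep the static GTFS feed; the other two rows are realtime endpoints.
ferry_url <- feeds$url_d[feeds$t == "NYC Ferry GTFS"]
print(ferry_url)

# With tidytransit loaded and a network connection, the feed could then be read:
# nyc_ferry <- read_gtfs(ferry_url)
```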
See the reference for more on metadata in the transitfeeds_df
When a feed is read, it is checked against the GTFS specification, and an attribute called validation_result is added to the resulting object. This attribute is a tibble describing the files and fields in the GTFS feed and how they compare to the specification.
You can get this tibble from the metadata about the feed.
## # A tibble: 6 x 8
## file file_spec file_provided_s… field field_spec field_provided_…
## <chr> <chr> <lgl> <chr> <chr> <lgl>
## 1 trips req TRUE rout… req TRUE
## 2 trips req TRUE serv… req TRUE
## 3 trips req TRUE trip… req TRUE
## 4 trips req TRUE trip… opt TRUE
## 5 trips req TRUE trip… opt FALSE
## 6 trips req TRUE dire… opt TRUE
## # … with 2 more variables: validation_status <chr>,
## # validation_details <chr>
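
Because validation_result is described as an attribute of the feed object, base R's attr() is one way to retrieve it. The sketch below uses a stand-in list rather than a real feed, and its two-column data frame is a simplified, hypothetical version of the output above. With a real feed the call would presumably be attr(nyc, "validation_result").

```r
# Stand-in feed object: a plain list with an attached attribute.
fake_feed <- list(trips = data.frame(trip_id = "t1", stringsAsFactors = FALSE))

# Simplified, hypothetical validation table (the real output has more columns).
attr(fake_feed, "validation_result") <- data.frame(
  file      = c("trips", "trips"),
  file_spec = c("req", "req"),
  stringsAsFactors = FALSE
)

# Retrieve the attribute by name.
validation <- attr(fake_feed, "validation_result")
print(validation)
```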