gtfs2gps: Converting GTFS data to GPS-like format

Rafael H. M. Pereira, Pedro R. Andrade, Joao Bazzo

18 August 2020

Introduction

Package gtfs2gps allows users to convert public transport GTFS data into a single data.table format with GPS-like records, which can then be used in various applications such as running transport simulations or scenario analyses. Before using the package, install it from CRAN.

install.packages("gtfs2gps")

Loading data

After loading the package, GTFS data can be read into R with read_gtfs(). This function takes a zipped GTFS file and returns a list of data.table objects. The returned list contains the data of each GTFS file, indexed by file name without the extension.

library("gtfs2gps")
sao <- read_gtfs(system.file("extdata/saopaulo.zip", package ="gtfs2gps"))
## Reading 'agency.txt'

## Reading 'routes.txt'

## Reading 'stops.txt'

## Reading 'stop_times.txt'

## Reading 'shapes.txt'

## Reading 'trips.txt'

## Reading 'calendar.txt'

## Reading 'frequencies.txt'
names(sao)
## [1] "agency"      "routes"      "stops"       "stop_times"  "shapes"     
## [6] "trips"       "calendar"    "frequencies"
sao$trips
##     route_id service_id   trip_id               trip_headsign direction_id
##  1:  121G-10        USD 121G-10-0              Metrô Tucuruvi            0
##  2:  148L-10        USD 148L-10-0                        Lapa            0
##  3:  148L-10        USD 148L-10-1             Cohab Antártica            1
##  4:  1720-21        USD 1720-21-0              Metrô Tucuruvi            0
##  5:  1721-10        USD 1721-10-0             Metrô Carandiru            0
##  6:  1726-10        USD 1726-10-0               Metrô Santana            0
##  7:  1745-10        USD 1745-10-0          Shop. Center Norte            0
##  8:  1745-10        USD 1745-10-1        Vl.nova Cachoeirinha            1
##  9:  2004-10        USD 2004-10-0             Cptm Guaianazes            0
## 10:  2008-10        USD 2008-10-0         Cptm Itaim Paulista            0
## 11:  2059-10        USD 2059-10-0             Cptm Guaianazes            0
## 12:  2201-10        USD 2201-10-0             Cptm Guaianazes            0
## 13:  2201-10        USD 2201-10-1              Div. De Ferraz            1
## 14:  2463-10        US_ 2463-10-1              Burgo Paulista            1
## 15:  2702-21        U__ 2702-21-0           Metrô Artur Alvim            0
## 16:  2711-10        USD 2711-10-0             Metrô Patriarca            0
## 17:  2711-10        USD 2711-10-1                  Ponte Rasa            1
## 18:  2712-10        USD 2712-10-0        Shop. Metrô Itaquera            0
## 19:  2722-10        USD 2722-10-0 Metrô Guilhermina/esperança            0
## 20:  2722-10        USD 2722-10-1                 Jd. Veronia            1
## 21:  273X-10        USD 273X-10-0           Metrô Artur Alvim            0
## 22:  2765-10        USD 2765-10-0               Metrô Tatuapé            0
## 23:  2765-10        USD 2765-10-1                  Vl. Cisper            1
## 24:  3064-41        U__ 3064-41-0         Sta. Etelvina Ii B6            0
## 25:  3768-10        USD 3768-10-0         Cptm José Bonifácio            0
## 26:  407E-10        USD 407E-10-0                Metrô Carrão            0
## 27:  407E-10        USD 407E-10-1              Jd. Sto. André            1
## 28:  4727-10        USD 4727-10-0        Metrô Pça. Da áRvore            0
## 29:  4727-10        USD 4727-10-1                  Jd. Clímax            1
## 30:  5010-10        USD 5010-10-0                  Sto. Amaro            0
## 31:  5018-31        USD 5018-31-1            Shop. Interlagos            1
## 32:  5024-10        USD 5024-10-0                  Sto. Amaro            0
## 33:  5024-31        USD 5024-31-0                  Sto. Amaro            0
## 34:  502J-22        U__ 502J-22-0                 Vl. Joaniza            0
## 35:  5106-31        US_ 5106-31-1                   Jd. Selma            1
## 36:  574W-10        USD 574W-10-0                 Metrô Belém            0
## 37:  6039-10        USD 6039-10-1                  Valo Velho            1
## 38:  6041-10        USD 6041-10-0                  Sto. Amaro            0
## 39:  6042-10        US_ 6042-10-0                  Sto. Amaro            0
## 40:  6048-10        USD 6048-10-0                  Sto. Amaro            0
## 41:  6250-10        U__ 6250-10-0              Term. Bandeira            0
## 42:  8001-10        USD 8001-10-0                  Term. Lapa            0
## 43:  8001-10        USD 8001-10-1                   Vl. Piauí            1
## 44:  8007-10        USD 8007-10-0              Term. Pirituba            0
## 45:  8007-10        USD 8007-10-1              Hab. Turística            1
## 46:  8015-21        U__ 8015-21-0                  Cptm Perus            0
## 47:  8015-21        U__ 8015-21-1               Cem. De Perus            1
## 48:  8021-10        U__ 8021-10-0                     Butantã            0
## 49:  8700-21        U__ 8700-21-0       Pça. Ramos De Azevedo            0
## 50:  8700-23        U__ 8700-23-0                     Butantã            0
## 51:  8700-23        U__ 8700-23-1       Pça. Ramos De Azevedo            1
## 52:  8707-10        US_ 8707-10-1                 Rio Pequeno            1
## 53:  9050-10        USD 9050-10-0                  Itaim Bibi            0
## 54:  917H-10        USD 917H-10-0           Metrô Vl. Mariana            0
## 55:  971R-21        U__ 971R-21-0               Metrô Santana            0
## 56:  971R-51        U__ 971R-51-0               Metrô Santana            0
## 57:  N102-11        USD N102-11-1                  Term. Lapa            1
## 58:  N103-11        USD N103-11-0                  Term. Lapa            0
## 59:  N103-11        USD N103-11-1              Term. Pirituba            1
## 60:  N104-11        USD N104-11-0                  Term. Lapa            0
## 61:  N104-11        USD N104-11-1              Term. Pirituba            1
## 62:  N105-11        USD N105-11-0                  Term. Lapa            0
## 63:  N105-11        USD N105-11-1          Term. Cachoeirinha            1
## 64:  N131-11        USD N131-11-0                   Vl. Piauí            0
## 65:  N132-11        USD N132-11-0            Pq. São Domingos            0
## 66:  N237-11        USD N237-11-0              Pq. Edu Chaves            0
## 67:  N241-11        USD N241-11-0               Vl. Albertina            0
## 68:  N243-11        USD N243-11-0                  Jd. Brasil            0
## 69:  N305-11        USD N305-11-0             Cptm Guaianazes            0
## 70:  N305-11        USD N305-11-1            Term. São Miguel            1
## 71:  N341-11        USD N341-11-0       Vl. Cisper (cptm Usp)            0
## 72:  N404-11        USD N404-11-0                 Term. Penha            0
## 73:  N404-11        USD N404-11-1            Term. São Mateus            1
## 74:  N406-11        USD N406-11-0            Term. São Mateus            0
## 75:  N406-11        USD N406-11-1       Term. Cid. Tiradentes            1
## 76:  N407-11        USD N407-11-1            Term. Vl. Carrão            1
## 77:  N434-11        USD N434-11-0           Jd. Iv Centenário            0
## 78:  N435-11        USD N435-11-0                Metalúrgicos            0
## 79:  N436-11        USD N436-11-0                Barro Branco            0
## 80:  N437-11        USD N437-11-0       Term. Cid. Tiradentes            0
## 81:  N440-11        USD N440-11-0                Savoy/dalila            0
## 82:  N506-11        USD N506-11-0          Metrô Vl. Madalena            0
## 83:  N506-11        USD N506-11-1                Term. Sacomã            1
## 84:  N508-11        USD N508-11-0       Term. Pq. D. Pedro Ii            0
## 85:  N508-11        USD N508-11-1                Term. Sacomã            1
## 86:  N535-11        USD N535-11-0                 Jd. Celeste            0
## 87:  N601-11        USD N601-11-0       Term. Pq. D. Pedro Ii            0
## 88:  N732-11        USD N732-11-0            Term. Jd. Jacira            0
## 89:  N739-11        USD N739-11-0               Jd. Universal            0
## 90:  N740-11        USD N740-11-0                 Jd. Riviera            0
## 91:  N838-11        USD N838-11-0             Cptm Leopoldina            0
## 92:  N840-11        USD N840-11-0                Sta. Cecília            0
##     route_id service_id   trip_id               trip_headsign direction_id
##     shape_id
##  1:    52421
##  2:    52857
##  3:    52858
##  4:    52936
##  5:    52941
##  6:    52429
##  7:    52653
##  8:    52654
##  9:    52380
## 10:    52774
## 11:    52554
## 12:    52666
## 13:    52667
## 14:    52826
## 15:    52716
## 16:    52734
## 17:    52735
## 18:    52736
## 19:    52934
## 20:    52935
## 21:    52642
## 22:    52617
## 23:    52618
## 24:    52683
## 25:    51194
## 26:    50784
## 27:    50785
## 28:    51459
## 29:    51460
## 30:    51338
## 31:    51956
## 32:    51542
## 33:    51657
## 34:    52085
## 35:    52809
## 36:    52466
## 37:    52691
## 38:    52712
## 39:    51461
## 40:    52743
## 41:    52980
## 42:    52842
## 43:    52843
## 44:    52863
## 45:    52864
## 46:    52664
## 47:    52665
## 48:    52913
## 49:    52896
## 50:    52901
## 51:    52902
## 52:    52882
## 53:    52977
## 54:    51007
## 55:    52698
## 56:    52693
## 57:    52095
## 58:    52103
## 59:    52104
## 60:    52112
## 61:    52113
## 62:    52123
## 63:    52124
## 64:    52139
## 65:    52145
## 66:    52321
## 67:    52341
## 68:    52352
## 69:    52230
## 70:    52231
## 71:    52349
## 72:    52199
## 73:    52200
## 74:    52234
## 75:    52235
## 76:    52239
## 77:    52251
## 78:    52253
## 79:    52254
## 80:    52256
## 81:    52259
## 82:    52110
## 83:    52111
## 84:    52165
## 85:    52166
## 86:    52149
## 87:    51982
## 88:    51990
## 89:    51954
## 90:    51939
## 91:    52072
## 92:    52135
##     shape_id

Note that not all GTFS files are loaded into R. This function only loads the data needed to handle trips and stops spatially and temporally: “shapes.txt”, “stop_times.txt”, “stops.txt”, “trips.txt”, “agency.txt”, “calendar.txt”, “routes.txt”, and “frequencies.txt”, with the last four being optional. If a zipped GTFS file does not contain all of the required files, read_gtfs() will stop with an error.
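As a minimal sketch of the rule above, the snippet below checks a set of file names against the four files described as required. The helper missing_gtfs_files() and the path "my_gtfs.zip" are hypothetical and not part of gtfs2gps; read_gtfs() performs its own check internally, and this only illustrates the idea.

```r
# Files described above as required (the other four are optional).
required_files <- c("shapes.txt", "stop_times.txt", "stops.txt", "trips.txt")

# Hypothetical helper, not part of gtfs2gps: returns the required files
# that are absent from a vector of file names.
missing_gtfs_files <- function(file_names) {
  setdiff(required_files, basename(file_names))
}

# The names would normally come from the zip listing, for example
#   file_names <- utils::unzip("my_gtfs.zip", list = TRUE)$Name
# Here a literal vector stands in for an incomplete feed.
file_names <- c("agency.txt", "routes.txt", "stops.txt", "trips.txt")
missing_gtfs_files(file_names)
```

An empty result means that all required files are present.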

Subsetting GTFS Data

GTFS data sets can be fairly large for complex public transport networks and, in some cases, users might want to focus on transport services on weekdays or weekends, or on specific trips or routes. The package provides some functions to filter GTFS data and speed up data processing.

These functions subset all the relevant GTFS files in order to remove unnecessary rows while keeping the data consistent. Each of them returns a list of data.table objects, in the same format as the input data. For example, in the code below we keep only the shape ids 51338, 51956 and 51657.

library(magrittr)
object.size(sao) %>% format(units = "Kb")
## [1] "2448.6 Kb"
sao_small <- gtfs2gps::filter_by_shape_id(sao, c(51338, 51956, 51657))
object.size(sao_small) %>% format(units = "Kb")
## [1] "110.7 Kb"

We can then easily convert the data to simple feature format and plot them.

sao_small_shapes_sf <- gtfs2gps::gtfs_shapes_as_sf(sao_small)
sao_small_stops_sf <- gtfs2gps::gtfs_stops_as_sf(sao_small)
plot(sf::st_geometry(sao_small_shapes_sf))
plot(sf::st_geometry(sao_small_stops_sf), pch = 20, col = "red", add = TRUE)
box()

After subsetting the data, it is also possible to save it as a new GTFS file using write_gtfs(), as shown below.

write_gtfs(sao_small, "sao_small.zip")

Converting to GPS-like format

To convert GTFS to GPS-like format, use gtfs2gps(). This is the core function of the package. It takes a zipped GTFS file as input and returns a data.table in which each row represents a ‘GPS-like’ data point of a trip in the GTFS file. In summary, this function interpolates the space-time position of each vehicle in each trip, considering the network distance and average speed between stops. By default, the function samples the timestamp of each vehicle every 15 m, but the user can set a different value with the spatial_resolution argument. See the example below.

  sao_gps <- gtfs2gps("sao_small.zip", spatial_resolution = 50)
  head(sao_gps)
##    id shape_id   trip_id trip_number route_type shape_pt_lon shape_pt_lat
## 1:  1    51338 5010-10-0           1          3    -46.63120    -23.66268
## 2:  2    51338 5010-10-0           1          3    -46.63117    -23.66273
## 3:  3    51338 5010-10-0           1          3    -46.63108    -23.66288
## 4:  4    51338 5010-10-0           1          3    -46.63095    -23.66316
## 5:  5    51338 5010-10-0           1          3    -46.63082    -23.66345
## 6:  6    51338 5010-10-0           1          3    -46.63111    -23.66364
##    departure_time stop_id stop_sequence          dist        cumdist
## 1:       04:00:00 3703053             1  0.000000 [m]   0.000000 [m]
## 2:       04:00:01    <NA>            NA  7.230445 [m]   7.230445 [m]
## 3:       04:00:04    <NA>            NA 18.369274 [m]  25.599720 [m]
## 4:       04:00:09    <NA>            NA 34.505965 [m]  60.105685 [m]
## 5:       04:00:13    <NA>            NA 34.505965 [m]  94.611650 [m]
## 6:       04:00:19    <NA>            NA 36.478776 [m] 131.090426 [m]
##          cumtime           speed
## 1:  0.000000 [s] 25.44852 [km/h]
## 2:  1.022834 [s] 25.44852 [km/h]
## 3:  3.621389 [s] 25.44852 [km/h]
## 4:  8.502673 [s] 25.44852 [km/h]
## 5: 13.383957 [s] 25.44852 [km/h]
## 6: 18.544319 [s] 25.44852 [km/h]
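The relation between the dist, speed and cumtime columns can be checked by hand. The sketch below uses base R only and copies the six dist values and the speed printed above. It assumes that cumtime is the running sum of dist divided by speed, which is an interpretation of the printed output and not a statement about the package internals.

```r
# Values copied from the first six rows of the output above.
dist_m    <- c(0, 7.230445, 18.369274, 34.505965, 34.505965, 36.478776)
speed_kmh <- 25.44852

# Convert km/h to m/s, then accumulate the travel time of each step.
speed_ms <- speed_kmh / 3.6
cumtime  <- cumsum(dist_m / speed_ms)

# Rounding to whole seconds gives the offsets of departure_time from 04:00:00.
round(cumtime)
```

Under this assumption, the computed values agree with the cumtime column up to rounding of the printed digits.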

The following figure maps the first 100 data points of the sample data we processed. They can be converted to simple feature points or to a linestring.

  sao_gps60 <- sao_gps[1:100, ]
  
  # points
  sao_gps60_sfpoints <- gps_as_sfpoints(sao_gps60)
  
  # linestring
  sao_gps60_sflinestring <- gps_as_sflinestring(sao_gps60)

  # plot
  plot(sf::st_geometry(sao_gps60_sfpoints), pch = 20)
  plot(sf::st_geometry(sao_gps60_sflinestring), col = "blue", add = TRUE)
  box()

The function gtfs2gps() automatically recognizes whether the GTFS data brings detailed stop_times.txt information or whether it is a frequencies.txt GTFS file. An example using a GTFS feed with detailed stop_times.txt can be found below:

poa <- system.file("extdata/poa.zip", package ="gtfs2gps")

poa_gps <- gtfs2gps(poa, spatial_resolution = 50)

poa_gps_sfpoints <- gps_as_sfpoints(poa_gps)

plot(sf::st_geometry(poa_gps_sfpoints[1:200,]))

box()

Methodological note

For a given trip, the function gtfs2gps() calculates the average speed between each pair of consecutive stops, given by the ratio between the change in cumulative network distance S and the change in departure time t for a consecutive pair of valid stop_ids (i and i + 1),

$$\text{Speed}_i = \frac{S_{i+1}-S_i}{t_{i+1}-t_i}$$

Since each trip usually starts before the first stop_id, the mean speed cannot be calculated with the previous equation for the points before that stop, because there is no information on period i. In this case, the function uses the mean speed of the whole trip. The same happens after the last valid stop_id (N) of the trip, where information on i + 1 does not exist.
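The rule described in this note can be sketched in base R on a toy trip. All numbers are made up for illustration. The definition of the whole-trip mean speed used here (distance between the first and last stops divided by the elapsed time) is an assumption, and the code is not the package's implementation.

```r
# Toy trip: cumulative network distance (m) of each point along the shape
# and departure time (s) where the point is a stop (NA otherwise).
cumdist   <- c(0, 100, 300, 600, 900, 1200, 1300)
departure <- c(NA,  0,  NA,  50,  NA,  150,   NA)

stops <- which(!is.na(departure))

# Speed between consecutive stops: (S[i+1] - S[i]) / (t[i+1] - t[i]), in m/s.
seg_speed <- diff(cumdist[stops]) / diff(departure[stops])

# Assumed fallback: mean speed between the first and the last stop.
trip_speed <- diff(range(cumdist[stops])) / diff(range(departure[stops]))

# Segment index of each point: 0 before the first stop, and
# length(stops) from the last stop onwards.
seg <- findInterval(seq_along(cumdist), stops)

# Points outside a stop-to-stop segment keep the whole-trip mean speed.
speed  <- rep(trip_speed, length(cumdist))
inside <- seg >= 1 & seg <= length(seg_speed)
speed[inside] <- seg_speed[seg[inside]]

data.frame(cumdist, departure, speed)
```

In this toy trip the two stop-to-stop segments get 10 m/s and 6 m/s, while the points before the first stop and from the last stop onwards get the fallback of 1100 / 150 m/s.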

Final remarks

If you have any suggestions or want to report an error, please visit the GitHub page of the package.