Stop identification with ST-DBSCAN

library(stdbscan)
library(readr)
library(ggplot2)
library(plotly)

Presentation

This vignette briefly demonstrates how to perform stop identification in a GPS track using ST-DBSCAN, which is a classic application of this algorithm.

Dataset

This demonstration uses the GeoLife GPS Trajectories dataset, whose trajectories are located in Beijing. The pings were converted beforehand to a metric coordinate reference system (EPSG:4586), and only the relevant variables were kept.

head(geolife_traj)
#>         date     time        x       y
#> 1 2008-10-23 02:53:04 441782.8 4428131
#> 2 2008-10-23 02:53:10 441785.6 4428129
#> 3 2008-10-23 02:53:15 441782.8 4428129
#> 4 2008-10-23 02:53:20 441780.1 4428130
#> 5 2008-10-23 02:53:25 441769.6 4428126
#> 6 2008-10-23 02:53:30 441749.3 4428121

ggplot() +
  geom_path(data = geolife_traj, aes(x, y)) +
  labs(x = "", y = "",
    title = "GPS track analyzed in this vignette",
    caption = "Data: GeoLife GPS Trajectories (Microsoft, 2012). Author: Antoine Le Doeuff, 2026",
  ) +
  coord_equal() +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold"))

Preprocessing

For st_dbscan() to work, the time variable must be numeric. We therefore convert it to seconds since the beginning of the track. Note that the data must be sorted by time.

geolife_traj$date_time <- as.POSIXct(
  paste(geolife_traj$date, geolife_traj$time),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "GMT"
)

# Sort data by time if needed
geolife_traj <- geolife_traj[order(geolife_traj$date_time), ]

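# Convert to seconds elapsed since the first ping
# (units are set explicitly so the result never silently falls back to
# minutes or hours)
geolife_traj$t <- as.numeric(
  difftime(geolife_traj$date_time, min(geolife_traj$date_time), units = "secs")
)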

# Convert to matrix
data <- cbind(geolife_traj$x, geolife_traj$y, geolife_traj$t)
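Before clustering, it can be worth checking that the matrix meets the two requirements stated above (numeric time, sorted by time). The following is a minimal sketch on a toy track; the `toy` data frame and the `check_track()` helper are hypothetical and are not part of stdbscan.

```r
# Toy track: three pings, deliberately out of order (hypothetical data)
toy <- data.frame(
  x = c(10, 0, 5),
  y = c(10, 0, 5),
  date_time = as.POSIXct(
    c("2008-10-23 02:53:15", "2008-10-23 02:53:04", "2008-10-23 02:53:10"),
    format = "%Y-%m-%d %H:%M:%S", tz = "GMT"
  )
)

# Sort by time, then express time as seconds since the first ping
toy <- toy[order(toy$date_time), ]
toy$t <- as.numeric(difftime(toy$date_time, min(toy$date_time), units = "secs"))

# Hypothetical helper: TRUE if the matrix is numeric, has no missing
# values, and its third column (time) is in non-decreasing order
check_track <- function(m) {
  is.numeric(m) && !anyNA(m) && !is.unsorted(m[, 3])
}

toy_mat <- cbind(toy$x, toy$y, toy$t)
check_track(toy_mat)
```

The same helper could be applied to `data` before it is passed to st_dbscan().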

Run ST-DBSCAN

We can then run ST-DBSCAN using st_dbscan(). We set a spatial neighborhood of 3 meters, a temporal neighborhood of 30 seconds, and require a minimum of 3 pings to form a cluster. Note that these parameters are used only for demonstration purposes; in practice, a grid search (or similar tuning strategy) should be used to determine optimal values. You can also pass extra arguments that you would use with dbscan::dbscan() and dbscan::frNN().

(res <- st_dbscan(
  data = data,
  eps_spatial = 3, # meters
  eps_temporal = 30, # seconds
  min_pts = 3,
  # extra arguments
  splitRule = "STD",
  search = "kdtree",
  approx = 1
))
#> ST-DBSCAN clustering for 468 objects.
#> Parameters: eps = 3, eps_temporal = 30, minPts = 3
#> Using euclidean distances and borderpoints = TRUE
#> The clustering contains 5 cluster(s) and 420 noise points.
#> 
#>   0   1   2   3   4   5 
#> 420   4   5  12  12  15 
#> 
#> Available fields: cluster, eps, minPts, metric, borderPoints,
#>                   eps_temporal

As with dbscan::dbscan(), the number of points in each cluster is displayed when the result is printed.
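To build intuition for the two thresholds, the sketch below illustrates the usual ST-DBSCAN neighborhood rule in base R: a ping counts as a neighbor only if it is within eps_spatial in space and within eps_temporal in time. This is an illustration only, not the package's implementation; the `st_neighbors()` helper and the toy matrix are hypothetical, and counting the point itself among its neighbors is an assumption.

```r
# Hypothetical helper: indices of the spatio-temporal neighbors of point i.
# m is a numeric matrix with columns x, y, t.
st_neighbors <- function(m, i, eps_spatial, eps_temporal) {
  d_space <- sqrt((m[, 1] - m[i, 1])^2 + (m[, 2] - m[i, 2])^2)
  d_time <- abs(m[, 3] - m[i, 3])
  which(d_space <= eps_spatial & d_time <= eps_temporal)
}

# Toy track: four pings standing still, one ping far away,
# then a much later return to the first location
m <- cbind(
  x = c(0, 1, 1, 0, 50, 0),
  y = c(0, 0, 1, 1, 50, 0),
  t = c(0, 5, 10, 15, 20, 600)
)

nb <- st_neighbors(m, i = 1, eps_spatial = 3, eps_temporal = 30)
nb

# With min_pts = 3 (point itself included, by assumption),
# point 1 has enough neighbors to seed a cluster
length(nb) >= 3
```

The last ping sits at the same location as the first but 600 seconds later, so it is not a neighbor. This is what lets the temporal threshold separate two visits to the same place into distinct stops.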

Check result

Clusters can be plotted directly using ggplot2:

# Put the cluster in the input data
geolife_traj$clust <- as.factor(res$cluster)

# Extract stops and movements
geolife_traj_mvt <- geolife_traj[geolife_traj$clust == "0", ]
geolife_traj_stop <- geolife_traj[geolife_traj$clust != "0", ]

# Plot
ggplot() +
  geom_path(data = geolife_traj_mvt, aes(x, y)) +
  geom_point(data = geolife_traj_stop, aes(x, y, color = clust), size = 4) +
  labs(x = "", y = "", color = "stop ID",
    title = "ST-DBSCAN stop identification",
    subtitle = "eps_spatial = 3 m, eps_temporal = 30 s and min_pts = 3",
    caption = "Data: GeoLife GPS Trajectories (Microsoft, 2012). Author: Antoine Le Doeuff, 2026",
  ) +
  scale_color_manual(values = MetBrewer::met.brewer("Isfahan2", 5)) +
  coord_equal() +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(size = 16, face = "bold"),
  )
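Once pings are labelled, each stop can be summarized by its start, end, duration, and centroid. The following is a minimal base R sketch on a toy labelled track; the `toy` data frame and the column names of `stop_summary` are hypothetical, and cluster "0" is treated as movement, as in the code above. Durations are in seconds because `t` is expressed in seconds.

```r
# Toy labelled track (hypothetical data): cluster "0" is noise/movement
toy <- data.frame(
  x = c(0, 1, 1, 20, 40, 41, 40),
  y = c(0, 0, 1, 20, 40, 40, 41),
  t = c(0, 5, 10, 60, 120, 125, 140),
  clust = factor(c(1, 1, 1, 0, 2, 2, 2))
)

# Keep stop pings only and drop the unused "0" level
stops <- toy[toy$clust != "0", ]
stops$clust <- droplevels(stops$clust)

# One row per stop: number of pings, start, end, duration and centroid
stop_summary <- do.call(rbind, lapply(split(stops, stops$clust), function(s) {
  data.frame(
    stop_id = as.character(s$clust[1]),
    n_pings = nrow(s),
    t_start = min(s$t),
    t_end = max(s$t),
    duration = max(s$t) - min(s$t),
    x_mean = mean(s$x),
    y_mean = mean(s$y)
  )
}))
stop_summary
```

The same steps could be applied to `geolife_traj_stop`, which already holds the `x`, `y`, `t`, and `clust` columns.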

Clusters can also be visualized in 3D using plotly, with time on the vertical axis:

# Zoom on stop 4
geolife_traj_f <- geolife_traj[
  geolife_traj$x > 441060 & geolife_traj$x < 441100,
]
geolife_traj_f <- geolife_traj_f[
  geolife_traj_f$y > 4428780 & geolife_traj_f$y < 4428820,
]

# Extract stop
geolife_traj_f_stop <- geolife_traj_f[geolife_traj_f$clust != "0", ]

# Plotly figure
fig <- plot_ly(
  data = geolife_traj_f,
  x = ~x,
  y = ~y,
  z = ~t,
  type = "scatter3d", mode = "lines+markers",
  line = list(width = 4, color = "grey"),
  marker = list(size = 3, color = "grey")
)
fig |>
  add_markers(
    x = ~geolife_traj_f_stop$x,
    y = ~geolife_traj_f_stop$y,
    z = ~geolife_traj_f_stop$t,
    marker = list(size = 4, color = 'red'),
    name = 'Stop'
  ) |>
  layout(
    scene = list(
      xaxis = list(title = "x"),
      yaxis = list(title = "y"),
      zaxis = list(title = "t")
    )
  )
