
Spatial joins

2026-01-10

This vignette shows how to use ddbs_join() to perform fast spatial join operations on large data sets with three different approaches:

1. In-memory: pass sf objects and get an sf result (DuckDB runs under the hood, no persistent database).
2. Connected: pass table names stored in an existing DuckDB connection and get an sf result.
3. Write-to-DB: same as (2), but write the result to a new DuckDB table.

Let’s see a few examples. First, let’s load a few libraries and our sample data:

library(duckspatial)
# library(mapview)
library(sf)

# polygons
countries_sf  <- sf::st_read(
    system.file("spatial/countries.geojson",  package = "duckspatial"),
    quiet = TRUE
    )

# random points
set.seed(42)
n <- 10000
points_sf <- data.frame(
  id = 1:n,
  x  = runif(n, min = -180, max = 180),
  y  = runif(n, min =  -90, max =  90)
) |>
  sf::st_as_sf(coords = c("x","y"), crs = 4326)

1) In-memory: pass sf, return sf

This is the simplest way to perform a fast spatial join. You pass two sf objects, and ddbs_join() spins up a temporary DuckDB database, runs the join, and returns an sf object.

out_sf1 <- ddbs_join(
  x    = points_sf,
  y    = countries_sf,
  join = "within"
)

# quick peek
# mapview(out_sf1, zcol="NAME_ENGL")
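
One way to sanity-check a join like this is to run the same predicate with plain sf on a small sample and compare the results. The sketch below uses toy data (the objects `pts`, `polys`, and `ref` are hypothetical, not part of the vignette) and sf::st_join(), which performs a left join by default. Whether ddbs_join() keeps unmatched rows in the same way is an assumption to verify against its documentation.

```r
library(sf)

# Toy data (hypothetical): one square polygon and three points,
# two of which fall inside the square
square <- sf::st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))))
polys  <- sf::st_sf(name = "square", geometry = sf::st_sfc(square, crs = 4326))

pts <- data.frame(id = 1:3, x = c(1, 5, 20), y = c(1, 5, 20)) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 4326)

# Left join by default: points outside every polygon are kept,
# with NA in the polygon attribute columns
ref <- sf::st_join(pts, polys, join = sf::st_within)
```

Here `ref` has one row per point; the third point lies outside the square, so its `name` is NA.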

2) Connected: pass table names in DuckDB, return sf

The second and third approaches use a connection to an existing DuckDB database. So let’s create a fresh DuckDB connection with the ddbs_create_conn() function, which automatically installs and loads the DuckDB spatial extension on the connection.

# create a fresh DuckDB connection
conn <- duckspatial::ddbs_create_conn()

In this second approach, you first write your layers to DuckDB and then perform the spatial join by referencing their table names, like this:

# write data to DuckDB
ddbs_write_vector(conn, points_sf,   "points",    overwrite = TRUE)
ddbs_write_vector(conn, countries_sf, "countries", overwrite = TRUE)

# spatial join inside DuckDB; result returned as sf
out_sf2 <- ddbs_join(
  conn,
  x    = "points",
  y    = "countries",
  join = "within"
)

3) Write-to-DB: create a new table with the join result

The output of approaches 1 and 2 is an sf object loaded into memory. In this third approach, ddbs_join() writes the result to a new table in the DuckDB database. You simply need to provide the name of the new table.

ddbs_join(
    conn = conn,
    x = "points",
    y = "countries",
    join = "within",
    name = "points_in_countries",
    overwrite = TRUE
)

# use the result in SQL (or read back as sf later)
# DBI::dbReadTable(conn, "points_in_countries") |>
#     sf::st_as_sf(wkt = 'geometry') |> 
#     head()
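
Once the result lives in a table, it can be summarised with SQL without pulling every row into R. The sketch below is self-contained and uses stand-ins: `conn_demo` is a plain in-memory DuckDB connection rather than one created by ddbs_create_conn(), and the table is a hypothetical attribute-only data frame with made-up values. It also assumes the joined table carries the NAME_ENGL column from the countries layer.

```r
library(DBI)

# Stand-in connection; the vignette uses duckspatial::ddbs_create_conn()
conn_demo <- DBI::dbConnect(duckdb::duckdb())

# Hypothetical stand-in for the joined table (attributes only, no geometry)
DBI::dbWriteTable(
  conn_demo, "points_in_countries",
  data.frame(id = 1:4, NAME_ENGL = c("Brazil", "Brazil", "Spain", "Chile"))
)

# Count matched points per country, largest first
counts <- DBI::dbGetQuery(conn_demo, "
  SELECT NAME_ENGL, COUNT(*) AS n_points
  FROM points_in_countries
  GROUP BY NAME_ENGL
  ORDER BY n_points DESC, NAME_ENGL
")

DBI::dbDisconnect(conn_demo)
```

The same query should work against the vignette's own `conn` if the table and column names match.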

Spatial join predicates

A spatial predicate is a function that evaluates a spatial relation between two geometries and returns true or false, e.g., “does a contain b” or “is a within distance x of b”. The join argument takes the name of the predicate to apply; the examples in this vignette use "within".
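
To make the true-or-false nature of predicates concrete, the sketch below evaluates a few of them with sf on planar toy geometries. The objects are hypothetical, and these are sf functions used for illustration only; they are not a list of the values that the join argument of ddbs_join() accepts.

```r
library(sf)

# Planar toy geometries (hypothetical): a 4 x 4 square and two points
a     <- sf::st_sfc(sf::st_polygon(list(rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4), c(0, 0)))))
b_in  <- sf::st_sfc(sf::st_point(c(2, 2)))
b_out <- sf::st_sfc(sf::st_point(c(6, 2)))

# Each predicate answers a yes/no question about one pair of geometries
contains_in    <- sf::st_contains(a, b_in, sparse = FALSE)[1, 1]    # does a contain b_in?
within_in      <- sf::st_within(b_in, a, sparse = FALSE)[1, 1]      # is b_in within a?
intersects_out <- sf::st_intersects(a, b_out, sparse = FALSE)[1, 1] # does a touch or overlap b_out?
near_out       <- sf::st_is_within_distance(a, b_out, dist = 3, sparse = FALSE)[1, 1]
```

The first two results are TRUE because the point (2, 2) lies inside the square. The third is FALSE because (6, 2) lies outside it, and the fourth is TRUE because that point is 2 units from the square's edge, which is less than the threshold of 3.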

Clean up

Don’t forget to disconnect from the database.

duckdb::dbDisconnect(conn)
