A5 cell IDs are 64-bit unsigned integers. R has no native
uint64 type, and its double can only represent
integers exactly up to 2^53. Nearly half of all A5 cell IDs exceed this
threshold, so converting them to double silently corrupts
the data.
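The 2^53 limit itself can be checked in base R, without a5R or arrow. The following is a minimal sketch; the variable names `limit` and `big` are chosen here for illustration only:

```r
# Base R only: doubles carry a 53-bit significand, so above 2^53
# adjacent representable values are more than 1 apart.
limit <- 2^53

(limit - 1) == limit   # below the limit, neighbouring integers are distinct
#> [1] FALSE
limit == limit + 1     # 2^53 + 1 is not representable and rounds back to 2^53
#> [1] TRUE

# Between 2^62 and 2^63 the gap between adjacent doubles is 2^10 = 1024,
# so small offsets vanish entirely.
big <- 2^62
big == big + 100       # 100 is less than half a step, so it is rounded away
#> [1] TRUE
big == big + 1024      # 1024 is exactly one step at this magnitude
#> [1] FALSE
```

Any 64-bit ID whose low bits do not happen to align with these steps is rounded to a neighbouring value when stored as a double.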
This is a problem when reading Parquet files that store A5 cell IDs
as uint64 columns — the standard format used by DuckDB,
Python, and geoparquet.io. By
default, arrow::read_parquet() converts uint64
to R’s double, losing precision:
library(arrow)
library(tibble)
library(a5R)
# A real A5 cell — Edinburgh at resolution 20
cell <- a5_lonlat_to_cell(-3.19, 55.95, resolution = 20)
a5_u64_to_hex(cell)
#> [1] "6344bba17af80000"
# Write to Parquet as uint64 (the standard interchange format)
tf <- tempfile(fileext = ".parquet")
arrow::write_parquet(
arrow::arrow_table(cell_id = a5_cell_to_arrow(cell)),
tf
)
# Read it back naively — arrow silently converts uint64 to double
(naive <- tibble(arrow::read_parquet(tf)))
#> # A tibble: 1 × 1
#> cell_id
#> <dbl>
#> 1 7.15e18
cell_as_dbl <- naive$cell_id
# The double can't distinguish this cell from nearby IDs
cell_as_dbl == cell_as_dbl + 1 # TRUE — silent corruption
#> [1] TRUE
cell_as_dbl == cell_as_dbl + 100 # still TRUE
#> [1] TRUE
a5_cell_from_arrow() and a5_cell_to_arrow()
a5R provides two functions that bypass the lossy double
conversion entirely, using Arrow’s zero-copy View() to
reinterpret the raw bytes:
library(a5R)
library(tibble)
# Six cities across the globe — some will have bit 63 set (origin >= 6)
cities <- tibble(
name = c("Edinburgh", "Tokyo", "São Paulo", "Nairobi", "Anchorage", "Sydney"),
lon = c( -3.19, 139.69, -46.63, 36.82, -149.90, 151.21),
lat = c( 55.95, 35.69, -23.55, -1.29, 61.22, -33.87)
)
cities$cell <- a5_lonlat_to_cell(cities$lon, cities$lat, resolution = 10)
cities
#> # A tibble: 6 × 4
#> name lon lat cell
#> <chr> <dbl> <dbl> <a5_cell>
#> 1 Edinburgh -3.19 56.0 6344be8000000000
#> 2 Tokyo 140. 35.7 872f8a8000000000
#> 3 São Paulo -46.6 -23.6 377f908000000000
#> 4 Nairobi 36.8 -1.29 6fad538000000000
#> 5 Anchorage -150. 61.2 00d1c38000000000
#> 6 Sydney 151. -33.9 8f7ec58000000000
These cells work seamlessly in tibbles. Now let’s enrich the data with two A5 operations, cell resolution and distance from Edinburgh:
edinburgh <- cities$cell[1]
cities$resolution <- a5_get_resolution(cities$cell)
cities$dist_from_edinburgh_km <- as.numeric(
a5_cell_distance(cities$cell, rep(edinburgh, nrow(cities)), units = "km")
)
cities
#> # A tibble: 6 × 6
#> name lon lat cell resolution dist_from_edinburgh_km
#> <chr> <dbl> <dbl> <a5_cell> <int> <dbl>
#> 1 Edinburgh -3.19 56.0 6344be8000000000 10 0
#> 2 Tokyo 140. 35.7 872f8a8000000000 10 9233.
#> 3 São Paulo -46.6 -23.6 377f908000000000 10 9743.
#> 4 Nairobi 36.8 -1.29 6fad538000000000 10 7317.
#> 5 Anchorage -150. 61.2 00d1c38000000000 10 6662.
#> 6 Sydney 151. -33.9 8f7ec58000000000 10 16872.
Convert to an Arrow table and write to Parquet. The cell column is
stored as native uint64, the same binary format used by
DuckDB, Python, and geoparquet.io:
tf <- tempfile(fileext = ".parquet")
arrow_tbl <- arrow::arrow_table(
name = cities$name,
cell_id = a5_cell_to_arrow(cities$cell),
cell_res = cities$resolution,
dist_from_edinburgh_km = cities$dist_from_edinburgh_km
)
arrow_tbl$schema
#> Schema
#> name: string
#> cell_id: uint64
#> cell_res: int32
#> dist_from_edinburgh_km: double
arrow::write_parquet(arrow_tbl, tf)
Read it back. a5_cell_from_arrow() recovers the exact
cell IDs without any precision loss:
pq <- arrow::read_parquet(tf, as_data_frame = FALSE)
# Recover cells from the uint64 column, bind with the rest of the data.
# pq$column() takes a 0-based index, so 1 selects cell_id (the second column).
recovered_cells <- a5_cell_from_arrow(pq$column(1))
result <- as.data.frame(pq)
result$cell <- recovered_cells
result <- tibble::as_tibble(result[c("name", "cell", "cell_res", "dist_from_edinburgh_km")])
result
#> # A tibble: 6 × 4
#> name cell cell_res dist_from_edinburgh_km
#> <chr> <a5_cell> <int> <dbl>
#> 1 Edinburgh 6344be8000000000 10 0
#> 2 Tokyo 872f8a8000000000 10 9233.
#> 3 São Paulo 377f908000000000 10 9743.
#> 4 Nairobi 6fad538000000000 10 7317.
#> 5 Anchorage 00d1c38000000000 10 6662.
#> 6 Sydney 8f7ec58000000000 10 16872.
Verify the round-trip is lossless:
a5_cell_to_arrow(): packs the eight
raw-byte fields into 8-byte little-endian blobs (one per cell), creates
an Arrow fixed_size_binary(8) array, then uses
View(uint64) to reinterpret the bytes as unsigned 64-bit
integers — zero-copy.
a5_cell_from_arrow(): does the
reverse — View(fixed_size_binary(8)) on the
uint64 array to get the raw bytes, then unpacks each 8-byte
blob into the eight raw-byte fields used by
a5_cell.
The raw bytes never pass through double, so there is no
precision loss at any step. See
vignette("internal-cell-representation") for details on the
raw-byte representation.
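The byte-level idea can be sketched in base R. The helpers `hex_to_le_bytes()` and `le_bytes_to_hex()` below are hypothetical names introduced for illustration; they are not part of a5R or arrow and do not reproduce a5R's internals. The sketch only shows how a 64-bit ID maps to eight little-endian bytes and back without ever becoming a double:

```r
# Hypothetical helpers for illustration; not part of a5R or arrow.

# Split a 16-digit hex ID into 8 raw bytes, least significant byte first.
hex_to_le_bytes <- function(hex) {
  stopifnot(is.character(hex), length(hex) == 1L, nchar(hex) == 16L)
  starts <- seq(1L, 15L, by = 2L)
  big_endian <- as.raw(strtoi(substring(hex, starts, starts + 1L), base = 16L))
  rev(big_endian)
}

# Reassemble the hex ID from 8 little-endian raw bytes.
le_bytes_to_hex <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 8L)
  paste(as.character(rev(bytes)), collapse = "")
}

id <- "6344be8000000000"  # Edinburgh at resolution 10, as printed earlier
bytes <- hex_to_le_bytes(id)
bytes
#> [1] 00 00 00 00 80 be 44 63
identical(le_bytes_to_hex(bytes), id)
#> [1] TRUE
```

Because every step works on raw bytes or character strings, no value is ever coerced to numeric, which is the property that keeps the round-trip exact.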