hydrocan normalises data from multiple Canadian hydrometric networks into one consistent output schema. The mechanism that makes this possible is the adapter: a small object that binds a data source name to a description and a set of fetch functions.
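This vignette explains the adapter contract, the output schemas, how the router dispatches requests, the built-in hydroquebec adapter, and how to write, register, and test a new adapter.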
An adapter is created with new_hydrocan_adapter():
    new_hydrocan_adapter(
      name,
      description,
      list_stations_fn,
      fetch_flows_fn = NULL,
      fetch_daily_flows_fn = NULL,
      fetch_levels_fn = NULL,
      fetch_daily_levels_fn = NULL,
      list_stations_meta_fn = NULL,
      license = NULL,
      license_url = NULL,
      terms_url = NULL
    )

| Argument | Type | Contract |
|---|---|---|
| name | single character | Unique identifier; becomes the provider_name column in all output and the registry key |
| description | single character | Human-readable description of the source and its limitations; shown by hc_list_sources() |
| list_stations_fn | function() | No arguments; returns a character vector of station IDs this adapter can serve |
| fetch_flows_fn | function(station_id, start_date, end_date) or NULL | Returns a tibble matching the realtime schema; NULL if sub-daily flow data is not available |
| fetch_daily_flows_fn | function(station_id, start_date, end_date) or NULL | Returns a tibble matching the daily schema; NULL if daily flow data is not available |
| fetch_levels_fn | function(station_id, start_date, end_date) or NULL | Returns a tibble matching the realtime schema with parameter = "water_level"; NULL if sub-daily level data is not available |
| fetch_daily_levels_fn | function(station_id, start_date, end_date) or NULL | Returns a tibble matching the daily schema with parameter = "water_level"; NULL if daily level data is not available |
| list_stations_meta_fn | function() or NULL | No arguments; returns a tibble matching the stations schema; NULL if station metadata is not available |
| license | single character or NULL | Optional license name (e.g. "CC-BY 4.0"); exposed by hc_list_sources() |
| license_url | single character or NULL | Optional URL to the license text |
| terms_url | single character or NULL | Optional URL to the data provider’s terms of use |

At least one fetch function must be non-NULL.
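The contract can be illustrated with a minimal base-R sketch. This is not the package's implementation of new_hydrocan_adapter(): sketch_adapter(), its error messages, and the shape of the returned object are hypothetical, and only a subset of the arguments is shown.

```r
# Hypothetical stand-in for new_hydrocan_adapter(); illustrates the contract only.
sketch_adapter <- function(name, description, list_stations_fn,
                           fetch_flows_fn = NULL, fetch_daily_flows_fn = NULL,
                           fetch_levels_fn = NULL, fetch_daily_levels_fn = NULL) {
  is_string <- function(x) is.character(x) && length(x) == 1L && !is.na(x)
  if (!is_string(name)) stop("`name` must be a single character string.")
  if (!is_string(description)) stop("`description` must be a single character string.")
  if (!is.function(list_stations_fn)) stop("`list_stations_fn` must be a function.")

  fetchers <- list(
    flows = fetch_flows_fn, daily_flows = fetch_daily_flows_fn,
    levels = fetch_levels_fn, daily_levels = fetch_daily_levels_fn
  )
  supplied <- !vapply(fetchers, is.null, logical(1))
  if (!any(supplied)) stop("At least one fetch function must be non-NULL.")
  bad <- !vapply(fetchers[supplied], is.function, logical(1))
  if (any(bad)) stop("Fetch arguments must be functions or NULL.")

  # Keep only the fetch functions that were actually supplied.
  structure(
    c(list(name = name, description = description,
           list_stations_fn = list_stations_fn), fetchers[supplied]),
    class = "sketch_adapter"
  )
}

ok <- sketch_adapter(
  "demo", "Demo source.",
  list_stations_fn = function() c("A1", "A2"),
  fetch_flows_fn = function(station_id, start_date, end_date) NULL
)

# An adapter with no fetch functions is rejected; capture the error message.
no_fetch <- tryCatch(
  sketch_adapter("demo", "Demo source.", function() character()),
  error = function(e) conditionMessage(e)
)
```

Failing at construction time, rather than at the first fetch, means a misconfigured adapter is caught before it enters the registry.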
Realtime schema, returned by fetch_flows_fn / fetch_levels_fn:

| Column | Type | Notes |
|---|---|---|
| station_id | chr | As provided by the caller |
| timestamp | POSIXct UTC | Sub-daily observations |
| value | dbl | |
| parameter | chr | "water_discharge" or "water_level" |
| unit | chr | Canonical form after normalization (e.g. "m3/s", "m") |
| provider_name | chr | Must equal the adapter name |
| quality_code | chr | Raw provider quality code; NA if unavailable |
| qf_desc | chr | Provider description of the quality code; NA if unavailable |

Daily schema, returned by fetch_daily_flows_fn / fetch_daily_levels_fn:

Same as the realtime schema above, but with date (Date) in place of timestamp (POSIXct).

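To make the two time-series schemas concrete, here is a small base-R sketch of a checker. check_schema() is a hypothetical helper, not part of hydrocan; it uses data.frame rather than tibble to stay dependency-free, and the column names and types are taken from the tables in this vignette.

```r
# Hypothetical checker; returns a character vector of problems (empty if none).
check_schema <- function(df, kind = c("realtime", "daily"), adapter_name) {
  kind <- match.arg(kind)
  time_col <- if (kind == "realtime") "timestamp" else "date"
  time_ok <- if (kind == "realtime") {
    function(x) inherits(x, "POSIXct")
  } else {
    function(x) inherits(x, "Date")
  }
  checks <- list(
    station_id = is.character, value = is.double, parameter = is.character,
    unit = is.character, provider_name = is.character,
    quality_code = is.character, qf_desc = is.character
  )
  checks[[time_col]] <- time_ok

  problems <- character()
  missing_cols <- setdiff(names(checks), names(df))
  if (length(missing_cols)) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  for (col in intersect(names(checks), names(df))) {
    if (!checks[[col]](df[[col]])) problems <- c(problems, paste("wrong type:", col))
  }
  if ("provider_name" %in% names(df) && !all(df$provider_name %in% adapter_name)) {
    problems <- c(problems, "provider_name does not equal the adapter name")
  }
  problems
}

realtime <- data.frame(
  station_id = "MP001",
  timestamp = as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  value = 12.5, parameter = "water_discharge", unit = "m3/s",
  provider_name = "myprov", quality_code = NA_character_, qf_desc = NA_character_,
  stringsAsFactors = FALSE
)

# The daily variant swaps timestamp (POSIXct) for date (Date).
daily <- realtime
daily$date <- as.Date(daily$timestamp)
daily$timestamp <- NULL

realtime_problems <- check_schema(realtime, "realtime", "myprov")
mismatch_problems <- check_schema(realtime, "daily", "myprov")
```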
Stations schema, returned by list_stations_meta_fn:

| Column | Type | Notes |
|---|---|---|
| station_id | chr | |
| station_name | chr | |
| provider_name | chr | Must equal the adapter name |
| longitude | dbl | |
| latitude | dbl | |
| elevation_m | dbl | NA if unavailable |
| period_start | Date | NA if unavailable |
| period_end | Date | NA if station is still active |
| notes | list | Adapter-specific metadata; NULL per row if unused |

When you call hc_read_flows(), the router:

1. Calls list_stations_fn() on every registered adapter.
2. Matches each requested station ID to the adapter that lists it. If the match is ambiguous, pass source = explicitly. Station IDs must be unambiguous across the registry.
3. Wraps each fetch in tryCatch so a failure for one station does not abort the whole request.
4. Combines the per-station results with dplyr::bind_rows().

Passing source = "adaptername" restricts the router to that adapter, but it still calls list_stations_fn() for that adapter and checks that the requested station is present before fetching data.
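The routing steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's router: route_flows(), the list-based registry, the warning for unserved stations, and the error for ambiguous IDs are all hypothetical, and rbind() stands in for dplyr::bind_rows() so the example has no dependencies.

```r
# Hypothetical registry: a named list of adapters, each a plain list.
registry <- list(
  alpha = list(
    list_stations_fn = function() c("A1", "A2"),
    fetch_flows_fn = function(station_id, start_date, end_date) {
      if (station_id == "A2") stop("upstream timeout")  # simulated failure
      data.frame(station_id = station_id, value = 1.5,
                 provider_name = "alpha", stringsAsFactors = FALSE)
    }
  ),
  beta = list(
    list_stations_fn = function() "B1",
    fetch_flows_fn = function(station_id, start_date, end_date) {
      data.frame(station_id = station_id, value = 2.5,
                 provider_name = "beta", stringsAsFactors = FALSE)
    }
  )
)

route_flows <- function(station_ids, start_date, end_date, source = NULL) {
  if (!is.null(source) && !source %in% names(registry)) {
    stop("Unknown source: ", source, call. = FALSE)
  }
  adapters <- if (is.null(source)) registry else registry[source]
  # Step 1: ask each candidate adapter which stations it serves.
  served <- lapply(adapters, function(a) a$list_stations_fn())
  results <- lapply(station_ids, function(id) {
    # Step 2: resolve the station to exactly one adapter.
    owners <- names(served)[vapply(served, function(s) id %in% s, logical(1))]
    if (length(owners) == 0L) {
      warning("No adapter serves station ", id, call. = FALSE)
      return(NULL)
    }
    if (length(owners) > 1L) {
      stop("Station ", id, " is ambiguous; pass `source =` explicitly.", call. = FALSE)
    }
    # Step 3: isolate failures so one station cannot abort the request.
    tryCatch(
      adapters[[owners]]$fetch_flows_fn(id, start_date, end_date),
      error = function(e) {
        warning("Fetch failed for ", id, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  # Step 4: drop failed stations and combine the rest.
  do.call(rbind, Filter(Negate(is.null), results))
}

flows <- suppressWarnings(
  route_flows(c("A1", "A2", "B1"), as.Date("2024-01-01"), as.Date("2024-01-02"))
)
```

In this sketch the simulated failure for "A2" becomes a warning, and the rows for "A1" and "B1" are still returned.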
hc_list_sources() returns a tibble of all registered
adapters with their descriptions and a logical column per data type
indicating what each adapter supports. hc_read_stations()
queries all adapters for station metadata, skipping those that do not
implement list_stations_meta_fn.
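One way to derive such a capability table is to test which fetch functions each adapter supplies. The sketch below assumes adapters are plain lists and invents the logical column names (flows, daily_flows, levels, daily_levels); the actual columns returned by hc_list_sources() may differ, and the descriptions are abbreviated.

```r
# Hypothetical adapters as plain lists; NULL marks an unsupported data type.
adapters <- list(
  list(name = "hydroquebec", description = "Flow data only.",
       fetch_flows_fn = function(...) NULL, fetch_daily_flows_fn = function(...) NULL,
       fetch_levels_fn = NULL, fetch_daily_levels_fn = NULL),
  list(name = "myprov", description = "Sub-daily flows only.",
       fetch_flows_fn = function(...) NULL, fetch_daily_flows_fn = NULL,
       fetch_levels_fn = NULL, fetch_daily_levels_fn = NULL)
)

# TRUE where the adapter supplies a function for the given field.
supports <- function(field) {
  vapply(adapters, function(a) is.function(a[[field]]), logical(1))
}

sources <- data.frame(
  name = vapply(adapters, function(a) a$name, character(1)),
  description = vapply(adapters, function(a) a$description, character(1)),
  flows = supports("fetch_flows_fn"),
  daily_flows = supports("fetch_daily_flows_fn"),
  levels = supports("fetch_levels_fn"),
  daily_levels = supports("fetch_daily_levels_fn"),
  stringsAsFactors = FALSE
)
```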
The hydroquebec adapter wraps the Hydro-Quebec
open data portal, which provides flow measurements at Hydro-Quebec
reservoir facilities via an Opendatasoft REST API. No authentication is
required.
Key characteristics:

- Station IDs are strings such as "3-230".
- Flow data only (parameter = "water_discharge"); no water level.
- The approval column is NA for all records (the source does not publish approval status); quality_flag carries the source’s point type field.

Station listing and data access:

    library(hydrocan)

    # Sub-daily (hourly) flows
    flows <- hc_read_flows(
      station_id = "3-230",
      start_date = Sys.Date() - 5,
      end_date = Sys.Date(),
      source = "hydroquebec"
    )

    # Source-native daily flows
    daily <- hc_read_daily_flows(
      station_id = "3-230",
      start_date = Sys.Date() - 5,
      end_date = Sys.Date(),
      source = "hydroquebec"
    )

The adapter pages through the API (100 records per request) and filters the returned records to the requested date range in R, because the API stores split_date as a text field rather than a datetime field.
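The page-then-filter pattern can be sketched without any network access. Everything here is an assumption made for illustration: fetch_page() simulates one request with an offset/limit interface, the split_date text format is invented, and the code is not taken from R/hydroquebec.R.

```r
# Simulated remote dataset: 250 hourly records whose date field is text.
all_records <- data.frame(
  split_date = format(
    seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 250),
    "%Y-%m-%d %H:%M:%S"
  ),
  value = seq_len(250) * 1.0,
  stringsAsFactors = FALSE
)

# Hypothetical page fetcher standing in for one HTTP request.
fetch_page <- function(offset, limit = 100L) {
  idx <- seq_len(nrow(all_records))
  all_records[idx > offset & idx <= offset + limit, , drop = FALSE]
}

# Request pages until an empty or short page signals the end.
fetch_all <- function(limit = 100L) {
  pages <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(offset, limit)
    if (nrow(page) == 0L) break
    pages[[length(pages) + 1L]] <- page
    if (nrow(page) < limit) break
    offset <- offset + limit
  }
  do.call(rbind, pages)
}

records <- fetch_all()

# Parse the text field locally, then filter to the requested date range.
records$timestamp <- as.POSIXct(records$split_date,
                                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
start_date <- as.Date("2024-01-03")
end_date <- as.Date("2024-01-04")
obs_date <- as.Date(records$timestamp, tz = "UTC")
flows <- records[obs_date >= start_date & obs_date <= end_date, c("timestamp", "value")]
```

Filtering locally costs extra requests, but it avoids relying on server-side comparisons against a field that is stored as text.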
Source code: R/hydroquebec.R.
Registered via:
    hydrocan_adapter_hydroquebec <- function() {
      new_hydrocan_adapter(
        "hydroquebec",
        paste(
          "Hydro-Quebec open data (Opendatasoft platform).",
          "Flow data only; no water level.",
          "Rolling window of approximately 10 days - historical data is not available."
        ),
        .hq_list_stations,
        fetch_flows_fn = .hq_fetch_flows,
        fetch_daily_flows_fn = .hq_fetch_daily_flows,
        list_stations_meta_fn = .hq_list_stations_meta
      )
    }

Adapters are registered at load time in R/hydrocan-package.R. Use hc_list_sources() to see all currently registered sources and which data types each supports.
Suppose you want to add a hypothetical provincial network called “MyProv” that exposes a JSON API. The steps are:
Create R/myprov.R:
    .MYPROV_URL <- "https://data.myprov.ca/api/hydro"

    .myprov_list_stations <- function() {
      resp <- httr2::request(.MYPROV_URL) |>
        httr2::req_url_query(endpoint = "stations", format = "json") |>
        httr2::req_perform() |>
        httr2::resp_body_json(simplifyVector = TRUE)
      resp$station_id # character vector
    }

    .myprov_fetch_flows <- function(station_id, start_date, end_date) {
      resp <- httr2::request(.MYPROV_URL) |>
        httr2::req_url_query(
          endpoint = "timeseries",
          station = station_id,
          from = format(start_date),
          to = format(end_date),
          format = "json"
        ) |>
        httr2::req_perform() |>
        httr2::resp_body_json(simplifyVector = TRUE)
      tibble::tibble(
        station_id = station_id,
        timestamp = as.POSIXct(resp$timestamp, tz = "UTC"),
        value = as.numeric(resp$discharge_cms),
        parameter = "water_discharge",
        unit = "m3/s",
        provider_name = "myprov",
        quality_code = resp$quality_code,
        qf_desc = NA_character_
      )
    }

    hydrocan_adapter_myprov <- function() {
      new_hydrocan_adapter(
        "myprov",
        "MyProv provincial hydrometric network. Sub-daily flows only.",
        .myprov_list_stations,
        fetch_flows_fn = .myprov_fetch_flows
      )
    }

If your source also provides daily data, levels, or station metadata, supply the corresponding optional function arguments. Only the capabilities you implement will be advertised by hc_list_sources().
Some sources do not expose a station-listing endpoint. In those
cases, bundle a character vector of known station IDs directly in the
package and return it from list_stations_fn:
    .MYPROV_STATIONS <- c("MP001", "MP002", "MP003")
    .myprov_list_stations <- function() .MYPROV_STATIONS

The tradeoff is that the list must be maintained manually as the network changes. The router only requires that list_stations_fn() return a character vector; how that vector is produced is left entirely to the adapter.
Add one line to the .onLoad block in
R/hydrocan-package.R:
Tests for adapters are written against a mock adapter rather than
hitting the live network. This keeps the test suite fast and fully
offline. The pattern, established in
tests/testthat/helper-mocks.R, is:

1. Define a list_stations_fn that returns a hardcoded character vector.
2. Define mock fetch functions and build the adapter with new_hydrocan_adapter().
3. Register it with local_register_adapter(), which restores the prior registry state on exit.

For example:

    .myprov_stations <- c("MP001", "MP002")

    .myprov_mock_fetch_flows <- function(station_id, start_date, end_date) {
      dates <- seq(as.Date(start_date), as.Date(end_date), by = "day")
      tibble::tibble(
        station_id = station_id,
        timestamp = as.POSIXct(dates, tz = "UTC"),
        value = seq_along(dates) * 1.0,
        parameter = "water_discharge",
        unit = "m3/s",
        provider_name = "myprov",
        quality_code = NA_character_,
        qf_desc = NA_character_
      )
    }

    mock_myprov_adapter <- new_hydrocan_adapter(
      "myprov",
      "Mock MyProv adapter for offline testing.",
      function() .myprov_stations,
      fetch_flows_fn = .myprov_mock_fetch_flows
    )

    test_that("myprov adapter returns correct schema", {
      local_register_adapter(mock_myprov_adapter)
      result <- hc_read_flows(
        station_id = "MP001",
        start_date = "2024-01-01",
        end_date = "2024-01-03",
        source = "myprov"
      )
      expect_s3_class(result, "hydrocan_realtime")
      expect_equal(nrow(result), 3L)
    })

local_register_adapter() and local_clear_registry() are defined in tests/testthat/helper-mocks.R and are available to all test files automatically.
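The "restore the prior registry state on exit" behaviour can be sketched with base R's on.exit(). The helper below is hypothetical and deliberately has a different shape from local_register_adapter(): it wraps the code to run instead of attaching cleanup to the calling test, and it assumes the registry is an environment keyed by adapter name.

```r
# Hypothetical registry kept in an environment, keyed by adapter name.
.registry <- new.env(parent = emptyenv())

with_registered_adapter <- function(adapter, code) {
  # Snapshot the registry so it can be restored afterwards.
  old <- mget(ls(.registry), envir = .registry)
  on.exit({
    rm(list = ls(.registry), envir = .registry)
    for (nm in names(old)) assign(nm, old[[nm]], envir = .registry)
  }, add = TRUE)
  assign(adapter$name, adapter, envir = .registry)
  force(code)  # evaluate the caller's code while the adapter is registered
}

# A pre-existing adapter survives; the mock is visible only inside the call.
assign("existing", list(name = "existing"), envir = .registry)
mock <- list(name = "myprov")
during <- with_registered_adapter(mock, sort(ls(.registry)))
after <- ls(.registry)
```

Because the cleanup runs from on.exit(), the registry is restored even when the wrapped code throws an error, which keeps one failing test from leaking state into the next.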
validate_hydrocan_schema() is called automatically after
every data-fetching API call (hc_read_flows(),
hc_read_daily_flows(), hc_read_levels(),
hc_read_daily_levels()). It will stop with a clear message
if:
It also normalises the unit column: common variants such as "m³/s", "cms", or "m^3/s" are all mapped to the canonical "m3/s". Unrecognised unit strings pass through unchanged, with a warning that identifies the raw string so it can be added to the mapping table in R/schema.R.
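A lookup-table approach to this kind of unit normalisation might look like the sketch below. The table contents, the function name normalise_units(), and the warning text are assumptions; the actual mapping lives in R/schema.R and may differ.

```r
# Hypothetical mapping table: names are raw variants, values are canonical units.
unit_map <- c("m3/s", "m3/s", "m3/s", "m3/s", "m")
names(unit_map) <- c("m3/s", "m\u00b3/s", "cms", "m^3/s", "m")

normalise_units <- function(unit) {
  known <- unit %in% names(unit_map)
  unknown <- unique(unit[!known & !is.na(unit)])
  if (length(unknown)) {
    # Name the raw strings so they can be added to the table later.
    warning("Unrecognised unit(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  out <- unit
  out[known] <- unname(unit_map[unit[known]])
  out  # unknown strings and NA pass through unchanged
}

raw <- c("cms", "m^3/s", "m\u00b3/s", "m", "ft3/s")
clean <- suppressWarnings(normalise_units(raw))
```

Here "ft3/s" is not in the table, so it is returned as-is and reported in the warning, while the three discharge variants collapse to "m3/s".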