---
title: "Combining data from multiple sources"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Combining data from multiple sources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
```

```{r}
library(hydrocan)
library(dplyr)
```

## What sources are available?

```{r}
hc_list_sources()
```

## Station metadata across all sources

```{r}
stations <- hc_read_stations()
stations

# Filter to a region
stations |>
  filter(latitude > 48, latitude < 50, longitude > -74, longitude < -72)
```

## Fetch data - router auto-detects the source

Pass station IDs from different providers in one call. The router matches
each ID to its adapter automatically.

```{r}
# CEHQ station (natural river gauge) + Hydro-Quebec station (reservoir)
daily <- hc_read_daily_flows(
  station_id = c("023301", "3-230"),
  start_date = Sys.Date() - 7,
  end_date = Sys.Date()
)

daily
```
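
To make the routing idea concrete, here is a minimal sketch of how an ID could be matched to a source by its format. This is not the package's implementation: `id_patterns` and `guess_source()` are hypothetical names, and the two regular expressions are assumptions inferred only from the example IDs used in this vignette.

```{r}
# Hypothetical ID patterns; the real hydrocan router may use different rules.
id_patterns <- c(
  cehq = "^[0-9]{6}$",               # six digits, e.g. "023301"
  hydroquebec = "^[0-9]+-[0-9]+$"    # digits, hyphen, digits, e.g. "3-230"
)

# Return the name of the first pattern each ID matches, or NA if none match.
guess_source <- function(station_id) {
  vapply(station_id, function(id) {
    hit <- names(id_patterns)[vapply(id_patterns, grepl, logical(1), x = id)]
    if (length(hit) == 0) NA_character_ else hit[[1]]
  }, character(1), USE.NAMES = FALSE)
}

ids <- c("023301", "3-230", "XYZ")
guessed <- guess_source(ids)
guessed

# Group the IDs by guessed source; unmatched IDs (NA) are dropped by split().
split(ids, guessed)
```

An ID that matches no pattern comes back as `NA`, which is the kind of case where passing `source =` explicitly is the safer choice.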

## Combine with `bind_rows()` (same schema, all sources)

Because every adapter returns the same column set, data can be stacked
directly and analysed together.

```{r}
cehq_data <- hc_read_daily_flows(
  station_id = c("023301", "030101"),
  start_date = "2015-01-01",
  end_date = "2020-12-31",
  source = "cehq"
)

hq_data <- hc_read_daily_flows(
  station_id = "3-230",
  start_date = Sys.Date() - 7,
  end_date = Sys.Date(),
  source = "hydroquebec"
)

# Stack: works because the schema is identical
all_flows <- bind_rows(cehq_data, hq_data)

all_flows |>
  count(provider_name)
```
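
If you want to confirm the shared-schema assumption before stacking, a small check on column names and classes is enough. The sketch below uses toy tibbles rather than real adapter output: `same_schema()` is a hypothetical helper, the values are made up, and the four columns are only those referenced elsewhere in this vignette (real results may carry more).

```{r}
library(dplyr)

# Toy stand-ins for two adapter results; values are made up for illustration.
a <- tibble(
  station_id = "023301",
  provider_name = "cehq",
  date = as.Date("2020-01-01") + 0:1,
  value = c(12.5, 13.1)
)
b <- tibble(
  station_id = "3-230",
  provider_name = "hydroquebec",
  date = as.Date("2020-01-01"),
  value = 250
)

# Compare column names (including order) and column classes.
same_schema <- function(x, y) {
  identical(names(x), names(y)) &&
    identical(lapply(x, class), lapply(y, class))
}

stopifnot(same_schema(a, b))
stacked <- bind_rows(a, b)
nrow(stacked)
```

`bind_rows()` would also fill a column missing from one input with `NA`, so an explicit check like this catches schema drift that would otherwise pass silently.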

## Annual summary across providers

```{r}
# Use the stacked multi-year table; `daily` only covers the last 7 days
all_flows |>
  mutate(year = as.integer(format(date, "%Y"))) |>
  group_by(station_id, provider_name, year) |>
  summarise(
    mean_flow = mean(value, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(year, provider_name)
```
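
When providers cover different periods, an annual mean can rest on very few observations. One way to make that visible is to report the number of non-missing values next to the mean. The sketch below runs on toy data with made-up values; `n_obs` is an illustrative column name, not part of the package schema.

```{r}
library(dplyr)

# Toy data: station "A" has a week of values (one missing), "B" has one day.
toy <- tibble(
  station_id = c(rep("A", 7), "B"),
  date = c(as.Date("2020-03-01") + 0:6, as.Date("2020-03-01")),
  value = c(10, 11, NA, 12, 13, 14, 15, 200)
)

coverage <- toy |>
  mutate(year = as.integer(format(date, "%Y"))) |>
  group_by(station_id, year) |>
  summarise(
    n_obs = sum(!is.na(value)),            # days that actually have a value
    mean_flow = mean(value, na.rm = TRUE),
    .groups = "drop"
  )

coverage
```

Filtering on `n_obs` before comparing stations keeps a one-week record from being read as a full-year average.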

## Explicit source bypasses the router

Use `source =` to skip station detection entirely. This is useful when you
already know the provider or are working with large station lists.

```{r}
hc_read_daily_flows(
  station_id = c("023301", "030101", "040110"),
  start_date = "2010-01-01",
  end_date = "2023-12-31",
  source = "cehq"
)
```
