---
title: "CUSTOS API – Federal government costs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CUSTOS API – Federal government costs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
```

## About

The CUSTOS API (`https://apidatalake.tesouro.gov.br/docs/custos/`) provides
cost data from the Federal Government Cost Portal (Portal de Custos do Governo
Federal). It breaks down costs into six categories: active staff, retired staff,
pensioners, depreciation, transfers, and other costs.

All parameters in this API are **optional**: you can call any function without
arguments to request the full dataset, although unfiltered calls are slow and
routinely time out (see the performance warning below).

## Performance warning

> **The CUSTOS API is slow.** Its server default is only 250 rows per page;
> the package raises this to 500 by default (lowered from 1000 in 0.2.1
> after the upstream load balancer started cutting broader queries).
> Even with 500-row pages, unfiltered queries routinely hit HTTP 504
> timeouts. **Always filter your queries** by at least:
>
> - `year` **and** `month`: year-only queries are the single most common
>   cause of 504s, so always pin a single month for production work
> - `org_level1` + `org_level2` (reduce to a specific organization)
> - `legal_nature` (reduce to a legal nature category)
> - `max_rows` (set a hard cap for testing)
>
> The package retries automatically on 504s (up to 5 attempts with
> progressive backoff). When pagination fails after the first page, the
> rows already fetched are **not discarded**: you receive a partial
> tibble with `attr(result, "partial")` set to `TRUE` and
> `attr(result, "last_page_error")` describing the failure. Always check
> these attributes when working with broad queries.

## Available functions

| Portuguese | English | Description |
|:---|:---|:---|
| `get_custos_pessoal_ativo()` | `get_costs_active_staff()` | Active staff costs |
| `get_custos_pessoal_inativo()` | `get_costs_retired_staff()` | Retired staff costs |
| `get_custos_pensionistas()` | `get_costs_pensioners()` | Pensioner costs |
| `get_custos_demais()` | `get_costs_other()` | Other costs |
| `get_custos_depreciacao()` | `get_costs_depreciation()` | Depreciation costs |
| `get_custos_transferencias()` | `get_costs_transfers()` | Transfer costs |

## Parameter mapping

All six functions share the same optional filters:

| Portuguese (API) | English | Description |
|:---|:---|:---|
| `ano` | `year` | Year of the record |
| `mes` | `month` | Month (1-12) |
| `natureza_juridica` | `legal_nature` | Legal nature: 1=Public Company, 2=Foundation, 3=Direct Admin, 4=Autarchy, 6=Mixed Economy |
| `organizacao_n1` | `org_level1` | Top-level SIORG code (Ministry). See `get_siorg_orgaos()`. Auto-padded. |
| `organizacao_n2` | `org_level2` | Second-level SIORG code. See `get_siorg_estrutura()`. Auto-padded. |
| `organizacao_n3` | `org_level3` | Third-level SIORG code. See `get_siorg_estrutura()`. Auto-padded. |

SIORG codes are automatically zero-padded (to six digits, as in `"000244"`):
you can pass `244`, `"244"`, or `"000244"`, and all three produce the same
query.
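
The padding rule can be pictured with a one-line helper. This is only a sketch of the documented behaviour: `pad_siorg()` is a hypothetical name, not a function exported by the package, and the package's internal implementation may differ.

```{r}
# Illustrative only: mimics the documented zero-padding to six digits.
# `pad_siorg()` is a hypothetical helper, not part of tesouror.
pad_siorg <- function(code) {
  # as.integer() accepts 244, "244" and "000244" alike
  sprintf("%06d", as.integer(code))
}

# All three input forms yield the same padded string
pad_siorg(244)
pad_siorg("244")
pad_siorg("000244")
```

You never need to call anything like this yourself; it only shows why the three input forms are interchangeable.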

## Examples

```{r}
library(tesouror)
library(dplyr)

# Step 1: Look up SIORG codes for the organization you want
orgaos <- get_siorg_organizations(power_code = 1, sphere_code = 1)
mec <- orgaos |> filter(sigla == "MEC")          # code 244
inep <- orgaos |> filter(sigla == "INEP")         # code 249

# Step 2: Query CUSTOS with org AND month filters (year-only is unsafe!)
# Active staff costs for INEP, June 2023
ativos_inep <- get_costs_active_staff(
  year = 2023, month = 6,
  org_level1 = 244,     # MEC — auto-padded to "000244"
  org_level2 = 249      # INEP — auto-padded to "000249"
)

# Always check whether pagination completed; on 504 mid-stream the
# package returns a partial tibble rather than dropping the data.
if (isTRUE(attr(ativos_inep, "partial"))) {
  message("Partial result — last page failed: ",
          attr(ativos_inep, "last_page_error"))
}

# Pensioner costs for INEP, June 2023 only
pensionistas_inep <- get_costs_pensioners(
  year = 2023, month = 6,
  org_level1 = 244,
  org_level2 = 249
)

# Quick test: just grab the first 100 rows
amostra_ativos <- get_costs_active_staff(
  year = 2023, month = 6,
  legal_nature = 3,
  max_rows = 100
)
```
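
Because year-only queries are the most common cause of 504s, one way to cover a full year is to request one month at a time and record which months came back partial. The helper below is a minimal sketch under that assumption: `fetch_year_by_month()` and the `"partial_months"` attribute are hypothetical names introduced here, not part of the package. It relies only on the `year` and `month` arguments and the `"partial"` attribute described above.

```{r}
# Hypothetical helper (not part of tesouror): fetch one year of CUSTOS data
# month by month and record which months were returned incomplete.
fetch_year_by_month <- function(fetch_fun, year, months = 1:12, ...) {
  # One request per month; extra filters (org_level1, ...) pass through `...`
  pieces <- lapply(months, function(m) {
    fetch_fun(year = year, month = m, ...)
  })

  # A month is incomplete when its result carries attr(, "partial") = TRUE
  is_partial <- vapply(
    pieces,
    function(piece) isTRUE(attr(piece, "partial")),
    logical(1)
  )

  # Stack the monthly results and summarise completeness on the combined table
  combined <- do.call(rbind, pieces)
  attr(combined, "partial") <- any(is_partial)
  attr(combined, "partial_months") <- months[is_partial]
  combined
}
```

A call might look like `fetch_year_by_month(get_costs_active_staff, 2023, org_level1 = 244, org_level2 = 249)`. Afterwards, `attr(result, "partial_months")` lists the months worth re-running. Any error raised by the fetch function itself simply propagates; the sketch does not add retries of its own.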

### Response columns

The CUSTOS API returns the organization hierarchy from level 0 down to level 6,
alongside the accounting period and cost attributes:

| Column | Description |
|:---|:---|
| `co_organizacao_n0` / `ds_organizacao_n0` | Top authority (e.g., Presidência) |
| `co_organizacao_n1` / `ds_organizacao_n1` | Ministry level |
| `co_organizacao_n2` / `ds_organizacao_n2` | Entity/secretariat |
| `co_organizacao_n3` / `ds_organizacao_n3` | Department |
| `co_organizacao_n4` to `n6` | Deeper sub-units (`"-9"` = not applicable) |
| `an_lanc` / `me_lanc` | Year and month of the accounting entry |
| `ds_area_atuacao` | Area: `"FINALISTICA"` or `"SUPORTE"` |
| `ds_escolaridade` | Education level of the staff member |
| `ds_faixa_etaria` | Age range |
| `in_sexo` | Sex: `"F"` or `"M"` |
| `in_forca_trabalho` | Workforce count |
| `va_custo_de_pessoal` | Cost value (R$) |

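Once a result is in hand, these columns can be aggregated like any other data frame. The sketch below totals cost and workforce by area of activity. `summarise_costs_by_area()` is a hypothetical helper, the toy values are placeholders rather than real CUSTOS data, and the column names are taken from the table above, so they may not all be present for every endpoint. Base R's `aggregate()` is used here to keep the example dependency-free; a `dplyr` pipeline would work equally well.

```{r}
# Hypothetical helper: total cost and workforce per area of activity.
summarise_costs_by_area <- function(costs) {
  # Coerce defensively in case the values arrive as character
  costs$va_custo_de_pessoal <- as.numeric(costs$va_custo_de_pessoal)
  costs$in_forca_trabalho <- as.numeric(costs$in_forca_trabalho)

  # Note: the formula interface of aggregate() drops rows containing NA
  aggregate(
    cbind(va_custo_de_pessoal, in_forca_trabalho) ~ ds_area_atuacao,
    data = costs,
    FUN = sum
  )
}

# Placeholder rows for illustration only (not real CUSTOS figures)
toy <- data.frame(
  ds_area_atuacao = c("FINALISTICA", "SUPORTE", "FINALISTICA"),
  in_forca_trabalho = c(2, 1, 3),
  va_custo_de_pessoal = c(1000, 400, 1500)
)

by_area <- summarise_costs_by_area(toy)
by_area
```

With real data, the same call would be `summarise_costs_by_area(ativos_inep)`. If the result was flagged as partial, the totals are lower bounds only.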