RREO longitudinal — handling layout drift across years

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

The problem: RREO labels drift across years

The SICONFI RREO is a long table where each row is identified by an appendix (no_anexo), an account (conta) and a column (coluna). All three drift across fiscal years even when the underlying concept stays the same:

Appendix renaming. Federal social-security data on the General Regime (RGPS) lived in RREO-Anexo 04.3 - RGPS from 2015 to 2022 and moved to RREO-Anexo 04.4 - RGPS from 2023 onwards. RPPS Civis and FCDF moved from Anexo 04.1 to 04.2 in the same reform.
Account-code shuffling. Account labels carry trailing Roman numerals tied to the appendix layout, so the same indicator appears as "Resultado Previdenciário RGPS (VII) = (III - VI)" in one year and as "Resultado Previdenciário RGPS (V) = (III - IV)" in another.
Column suffix instability. A given column shows up as "DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2019" in 2019, then as "DESPESAS LIQUIDADAS ATÉ O BIMESTRE" (no year) in 2021–2022, and as "DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023" in 2023+, with stray whitespace variants such as "... BIMESTRE/ 2023" (no space) inside the same year.

Together, these three sources of drift mean that any code that filters SICONFI by literal label values ends up with paste0() ladders full of ifelse() to special-case each year. This vignette shows how the helpers in tesouror turn this into a single function call.

The bundled layout reference

rreo_layout() returns the table in inst/extdata/rreo_layout.csv that maps (topic, regime, year_range) to the correct appendix and to a year-stable matching key for the account label.

library(tesouror)

rreo_layout()

#> # A tibble: 7 × 8
#>   topic        regime     first_year last_year co_esfera no_anexo                conta_match                          indicador
#>   <chr>        <chr>           <int>     <int> <chr>     <chr>                   <chr>                                <chr>
#> 1 previdencia  civil_rpps       2015      2022 U         RREO-Anexo 04.1         resultado previdenciario rpps civis  resultado_previdenciario_civil
#> 2 previdencia  civil_rpps       2023      2099 U         RREO-Anexo 04.2         resultado previdenciario rpps civis  resultado_previdenciario_civil
#> 3 previdencia  fcdf             2015      2022 U         RREO-Anexo 04.1         resultado previdenciario - fcdf      resultado_previdenciario_fcdf
#> 4 previdencia  fcdf             2023      2099 U         RREO-Anexo 04.2         resultado previdenciario - fcdf      resultado_previdenciario_fcdf
#> 5 previdencia  militar          2015      2099 U         RREO-Anexo 04.2         resultado total - militares inativos resultado_militares_inativos
#> 6 previdencia  rgps             2015      2022 U         RREO-Anexo 04.3 - RGPS  resultado previdenciario rgps        resultado_previdenciario_rgps
#> 7 previdencia  rgps             2023      2099 U         RREO-Anexo 04.4 - RGPS  resultado previdenciario rgps        resultado_previdenciario_rgps

Use it directly to look up the right appendix when fetching data with get_rreo():

layout <- rreo_layout()
fetch_year <- function(year) {
  rule <- layout[layout$topic == "previdencia" &
                 layout$regime == "rgps" &
                 year >= layout$first_year &
                 year <= layout$last_year, ]
  get_rreo(
    an_exercicio = year, nr_periodo = 6,
    co_tipo_demonstrativo = "RREO", no_anexo = rule$no_anexo[1],
    co_esfera = "U", id_ente = 1
  )
}

Step 1 — Normalize column labels

rreo_normalize_columns() strips trailing year suffixes ("/ 2019", "/2023", " /2023", "EM 2023") and squishes whitespace, exposing the year separately so you can distinguish a current-year column from a previous-year comparator.

demo <- tibble::tibble(
  coluna = c(
    "DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023",
    "DESPESAS LIQUIDADAS ATÉ O BIMESTRE",
    "DESPESAS LIQUIDADAS ATÉ O BIMESTRE/ 2018",
    "INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS EM 2023"
  )
)
rreo_normalize_columns(demo)

#> # A tibble: 4 × 3
#>   coluna                                              coluna_padrao                              coluna_ano
#>   <chr>                                               <chr>                                           <int>
#> 1 DESPESAS LIQUIDADAS ATÉ O BIMESTRE / 2023           DESPESAS LIQUIDADAS ATÉ O BIMESTRE               2023
#> 2 DESPESAS LIQUIDADAS ATÉ O BIMESTRE                  DESPESAS LIQUIDADAS ATÉ O BIMESTRE                 NA
#> 3 DESPESAS LIQUIDADAS ATÉ O BIMESTRE/ 2018            DESPESAS LIQUIDADAS ATÉ O BIMESTRE               2018
#> 4 INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS EM 2023 INSCRITAS EM RESTOS A PAGAR NÃO PROCESSADOS      2023

Step 2 — Tidy by topic

tidy_rreo() runs rreo_normalize_columns() for you and then matches each row’s conta against the year-appropriate rule in rreo_layout(). Account labels are matched on a stable stem (everything before the first (, accent-stripped and lowercased), so the shifting Roman numerals do not break the join.

library(dplyr)

# Pull the federal RGPS series for five years using the layout
rgps_raw <- purrr::map_dfr(2019:2023, fetch_year)

rgps_tidy <- rgps_raw |>
  tidy_rreo(topic = "previdencia", regime = "rgps")

panel <- rgps_tidy |>
  filter(coluna_padrao == "DESPESAS LIQUIDADAS ATÉ O BIMESTRE",
         is.na(coluna_ano) | coluna_ano == exercicio) |>
  select(exercicio, indicador, regime, valor)

panel

#> # A tibble: 5 × 4
#>   exercicio indicador                     regime    valor
#>       <int> <chr>                         <chr>     <dbl>
#> 1      2019 resultado_previdenciario_rgps rgps   -2.13e11
#> 2      2020 resultado_previdenciario_rgps rgps   -2.61e11
#> 3      2021 resultado_previdenciario_rgps rgps   -2.48e11
#> 4      2022 resultado_previdenciario_rgps rgps   -2.67e11
#> 5      2023 resultado_previdenciario_rgps rgps   -3.13e11

The five-year deficit series of the RGPS, with no manual reconciliation of appendix names, account codes, or column suffixes.

Multi-regime in one call

Drop regime to keep every regime registered for the topic at once:

all_topics <- rreo_layout()
fetch_topic <- function(year, regime) {
  rules <- all_topics[all_topics$topic == "previdencia" &
                      all_topics$regime == regime &
                      year >= all_topics$first_year &
                      year <= all_topics$last_year, ]
  if (nrow(rules) == 0L) return(NULL)
  purrr::map_dfr(unique(rules$no_anexo), \(an) {
    get_rreo(
      an_exercicio = year, nr_periodo = 6,
      co_tipo_demonstrativo = "RREO", no_anexo = an,
      co_esfera = "U", id_ente = 1
    )
  })
}

regimes <- unique(all_topics$regime[all_topics$topic == "previdencia"])
raw_22_23 <- purrr::map_dfr(2022:2023, \(yr) {
  purrr::map_dfr(regimes, \(rg) fetch_topic(yr, rg))
})

raw_22_23 |>
  tidy_rreo(topic = "previdencia") |>
  filter(coluna_padrao == "DESPESAS LIQUIDADAS ATÉ O BIMESTRE",
         is.na(coluna_ano) | coluna_ano == exercicio) |>
  select(exercicio, indicador, regime, valor) |>
  distinct()

#> # A tibble: 7 × 4
#>   exercicio indicador                      regime        valor
#>       <int> <chr>                          <chr>         <dbl>
#> 1      2022 resultado_previdenciario_civil civil_rpps -5.04e10
#> 2      2023 resultado_previdenciario_civil civil_rpps -5.47e10
#> 3      2022 resultado_previdenciario_fcdf  fcdf       -6.56e 9
#> 4      2023 resultado_previdenciario_fcdf  fcdf       -8.03e 9
#> 5      2022 resultado_militares_inativos   militar    -3.11e10
#> 6      2022 resultado_previdenciario_rgps  rgps       -2.67e11
#> 7      2023 resultado_previdenciario_rgps  rgps       -3.13e11

Adding new topics

inst/extdata/rreo_layout.csv is the single source of truth. To extend coverage to a new topic — for example, RCL (Receita Corrente Líquida) or DPF (Dívida Pública Fundada) — append rows describing each year-range plus the year-stable account stem:

topic,regime,first_year,last_year,co_esfera,no_anexo,conta_match,indicador
rcl,total,2015,2099,M,RREO-Anexo 03,receita corrente liquida (iv) = (i - ii - iii),rcl_total

The matching key (conta_match) is built by lowercasing the account label, stripping Latin diacritics, and dropping everything from the first ( onwards. The helper tesouror:::.clean_conta() shows the exact transformation; running it on a sample row from get_rreo() gives you the value to put into the CSV.

RREO longitudinal — handling layout drift across years

The problem: RREO labels drift across years

The bundled layout reference

Step 1 — Normalize column labels

Step 2 — Tidy by topic

Multi-regime in one call

Adding new topics

See also