The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

cnefetools: Tools for working with Brazilian CNEFE address data cnefetools website

CRAN status CRAN downloads R-CMD-check License: MIT Lifecycle: experimental

{cnefetools} provides helper functions to efficiently work with the Brazilian National Address File for Statistical Purposes (Cadastro Nacional de Endereços para Fins Estatísticos, CNEFE), an address-level dataset released by the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística, IBGE).

Installation

Install the stable version from CRAN:

install.packages("cnefetools")

To install the development version from GitHub:

# install.packages("pak")
pak::pak("pedreirajr/cnefetools")

# or
# install.packages("remotes")
remotes::install_github("pedreirajr/cnefetools")

Overview

Function Description
read_cnefe() Downloads and reads CNEFE data for a municipality; returns an Arrow table or sf object
cnefe_counts() Aggregates address counts to H3 hexagons or user-provided polygons
compute_lumi() Computes land-use mix indices on H3 hexagons or user-provided polygons
tracts_to_h3() Dasymetric interpolation of census tract variables to an H3 grid via CNEFE dwelling points
tracts_to_polygon() Dasymetric interpolation of census tract variables to user-provided polygons via CNEFE dwelling points
cnefe_doc() Opens the official CNEFE methodological note (PDF)
cnefe_dictionary() Opens the official CNEFE variable dictionary (Excel)

Reading CNEFE data

read_cnefe() downloads and reads the CNEFE CSV for a municipality, returning an Arrow table by default:

library(cnefetools)
library(dplyr)

# Read CNEFE data for Salvador as an Arrow table
tab_ssa <- read_cnefe(2927408, cache = TRUE)

tab_ssa |>
  collect() |> # materialize the arrow table in R
  tibble() |>
  head()
#> # A tibble: 6 × 34
#>   COD_UNICO_ENDERECO COD_UF COD_MUNICIPIO COD_DISTRITO COD_SUBDISTRITO COD_SETOR
#>                <int>  <int>         <int>        <int>         <int64> <chr>    
#> 1          222386741     29       2927408    292740805     29274080518 29274080…
#> 2           27995350     29       2927408    292740805     29274080522 29274080…
#> 3           28034841     29       2927408    292740805     29274080522 29274080…
#> 4          217544957     29       2927408    292740805     29274080518 29274080…
#> 5          217639781     29       2927408    292740805     29274080526 29274080…
#> 6          217639701     29       2927408    292740805     29274080526 29274080…
#> # ℹ 28 more variables: NUM_QUADRA <int>, NUM_FACE <int>, CEP <int>,
#> #   DSC_LOCALIDADE <chr>, NOM_TIPO_SEGLOGR <chr>, NOM_TITULO_SEGLOGR <chr>,
#> #   NOM_SEGLOGR <chr>, NUM_ENDERECO <int>, DSC_MODIFICADOR <chr>,
#> #   NOM_COMP_ELEM1 <chr>, VAL_COMP_ELEM1 <chr>, NOM_COMP_ELEM2 <chr>,
#> #   VAL_COMP_ELEM2 <chr>, NOM_COMP_ELEM3 <chr>, VAL_COMP_ELEM3 <chr>,
#> #   NOM_COMP_ELEM4 <chr>, VAL_COMP_ELEM4 <chr>, NOM_COMP_ELEM5 <chr>,
#> #   VAL_COMP_ELEM5 <chr>, LATITUDE <dbl>, LONGITUDE <dbl>, …

Setting output = "sf" returns an sf object instead. The example below reads data for Salvador, filters religious facilities (COD_ESPECIE == 8), and plots them:

library(sf)
library(ggplot2)

# Reading CNEFE data
tab_ssa_sf <- read_cnefe(
  code_muni = 2927408, 
  output = "sf", 
  cache = TRUE
  )

# Filtering religious establishments
temples_ssa <- tab_ssa_sf |>
  filter(COD_ESPECIE == 8)

# Ploting religious establishments points for Salvador
ggplot() +
  geom_sf(data = temples_ssa, size = 0.3, alpha = 0.6) +
  coord_sf() +
  theme_minimal()

Warning: For large municipalities, CNEFE may contain more than 1 million address points. Plotting all coordinates at once can be slow and memory-intensive, so consider filtering or sampling before creating maps.

Caching behavior

By default, cache = TRUE stores the downloaded ZIP file in a user-level cache directory specific to this package. If you prefer to avoid persistent caching, set:

tab_ssa <- read_cnefe(code_muni = 2927408, cache = FALSE)

In this case, the ZIP file is stored in a temporary location and removed after reading.

Accessing official CNEFE documentation

{cnefetools} includes local copies of the official methodological note and the variable dictionary for the 2022 CNEFE released by IBGE.

# Open the official methodological note (PDF)
cnefe_doc(year = 2022)

# Open the official variable dictionary (.xls spreadsheet)
cnefe_dictionary(year = 2022)

Counting addresses with cnefe_counts()

cnefe_counts() aggregates CNEFE address points into spatial units and returns an sf object with counts by address category (addr_type1 to addr_type8). Below is an example using H3 hexagons for São Paulo at resolution 9:

library(cnefetools)
library(sf)
library(ggplot2)

# Producing CNEFE counts
hex_sp <- cnefe_counts(
  code_muni = 3550308,
  h3_resolution = 9,
  verbose = TRUE
  )

Below we plot the count of private households (addr_type1) per hexagon:

# Plotting private households (addr_type1) for São Paulo
ggplot(hex_sp) +
  geom_sf(aes(fill = addr_type1), color = NA) +
  scale_fill_viridis_c(option = "magma") +
  coord_sf() +
  labs(
    fill = "Count",
    title = "Private households (addr_type1)",
    subtitle = "São Paulo (IBGE 3550308), H3 resolution 9"
  ) +
  theme_minimal()+
  theme(
    plot.title.position = "plot",
    axis.text.x = element_text(size = 5),
    axis.text.y = element_text(size = 5)
  )

cnefe_counts() also supports polygon_type = "user" to aggregate counts to custom polygons instead of H3 hexagons. See the cnefe_counts article for details.

Land-use mix indices with compute_lumi()

compute_lumi() computes land-use mix indicators on spatial units for any municipality covered by the 2022 CNEFE dataset (Pedreira Junior et al., 2025). Available indicators include the Entropy Index (ei), the Herfindahl-Hirschman Index (hhi), the Balance Index (bal), the Index of Concentration at Extremes (ice), an adapted HHI (hhi_adp), and the Bidirectional Global-centered Balance Index (bgbi).

Below is an example for Fortaleza at H3 resolution 8:

library(cnefetools)
library(sf)
library(ggplot2)

# Computing land use mix indices
lumi_ftl <- compute_lumi(
  code_muni = 2304400,
  h3_resolution = 8,
  verbose = TRUE
  )

Below we plot the Bidirectional Global-centered Balance Index (BGBI), where positive values indicate residential dominance and negative values indicate non-residential dominance:

# Plotting the BGBI index
ggplot(lumi_ftl) +
  geom_sf(aes(fill = bgbi), color = NA) +
  scale_fill_distiller(
    type = "div",
    palette = "RdBu",
    direction = 1
  ) +
  coord_sf() +
  labs(
    fill = "BGBI",
    title = "Bidirectional Global-centered Balance Index (BGBI)",
    subtitle = "Fortaleza (IBGE 2304400), H3 resolution 8"
  ) +
  theme_minimal()+
  theme(
    plot.title.position = "plot",
    axis.text.x = element_text(size = 5),
    axis.text.y = element_text(size = 5)
  )

compute_lumi() also supports polygon_type = "user" to compute indices on custom polygons. See the compute_lumi article for details.

Dasymetric interpolation with tracts_to_h3()

tracts_to_h3() performs a dasymetric interpolation, considering two stages: first, census tract totals are allocated to individual CNEFE dwelling points inside each tract; then, the allocated values are aggregated to an H3 grid at the chosen resolution. This leverages the fine-grained spatial distribution of addresses in CNEFE to produce more realistic sub-tract estimates than simple areal weighting.

library(cnefetools)
library(ggplot2)

# Performing dasymetric interpolation
rec_hex <- tracts_to_h3(
  code_muni = 2611606,
  h3_resolution = 9,
  vars = c("pop_ph", "avg_inc_resp"),
  cache = TRUE,
  verbose = TRUE
)

The resulting H3 grid can be mapped to visualize the spatial distribution of each variable. Below we plot the private-household population (pop_ph):

ggplot(rec_hex) +
  geom_sf(aes(fill = pop_ph), color = NA) +
  scale_fill_viridis_c() +
  coord_sf() +
  labs(
    title = "Dasymetric interpolation to H3 using CNEFE (Recife)",
    subtitle = "Private-household population (pop_ph), H3 resolution 9",
    fill = "Population"
  ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    axis.text.x = element_text(size = 5),
    axis.text.y = element_text(size = 5)
  )

And the average income of the household head (avg_inc_resp):

ggplot(rec_hex) +
  geom_sf(aes(fill = avg_inc_resp), color = NA) +
  scale_fill_viridis_c() +
  coord_sf() +
  labs(
    title = "Dasymetric interpolation to H3 using CNEFE (Recife)",
    subtitle = "Average income of the responsible person (avg_inc_resp), H3 resolution 9",
    fill = "Income"
  ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    axis.text.x = element_text(size = 5),
    axis.text.y = element_text(size = 5)
  )

The full list of available variables is documented in the tracts_variables_ref dataset (see ?tracts_variables_ref). For allocation rules and diagnostic details, see the tracts_to article.

Dasymetric interpolation with tracts_to_polygon()

tracts_to_polygon() follows the same two-stage workflow as tracts_to_h3(), but aggregates the allocated values to user-provided polygons (e.g. neighborhoods, administrative divisions, or custom areas) instead of an H3 grid. Let’s generate the neighborhoods of Recife with the read_neighborhood() function from the geobr package and interpolate the average income of household heads per neighborhood:

library(geobr)

# Reading neighborhoods from geobr package
rec_nei <- read_neighborhood(year = 2022, simplified = F, showProgress = F) |>
  filter(name_muni == 'Recife')

# Dasymetric interpolation to neighborhoods
rec_poly <- tracts_to_polygon(
  code_muni = 2611606,
  polygon = rec_nei,
  vars = c("pop_ph", "avg_inc_resp"),
  verbose = F
)

Below we plot the interpolated average income of the household head (avg_inc_resp) at the neighborhood level:

# Plotting variables at the neighborhood level
ggplot(rec_poly) +
  geom_sf(aes(fill = avg_inc_resp), color = NA) +
  scale_fill_viridis_c() +
  coord_sf() +
  labs(
    title = "Dasymetric interpolation to neighborhoods using CNEFE (Recife)",
    subtitle = "Average income of the responsible person (avg_inc_resp)",
    fill = "Income"
  ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    axis.text.x = element_text(size = 5),
    axis.text.y = element_text(size = 5)
  )

See the tracts_to article for details.

DuckDB-powered spatial operations

Under the hood, {cnefetools} uses DuckDB as its default backend to perform spatial operations efficiently, with speedups of up to 20x over pure-R code depending on the number of address points and the size of the spatial units. This is made possible by three DuckDB extensions:

The R package duckspatial also bridges sf objects and DuckDB’s spatial extension, enabling seamless transfers between R and DuckDB.

All extensions are installed and loaded automatically on first use. A pure-R fallback (backend = "r") is also available, using h3jsr and sf for the same operations on cnefe_counts() and compute_lumi() functions (slower, but without the DuckDB dependency).

Citation

If you use {cnefetools} in your work, please cite the associated preprint:

Pedreira Jr., J. U.; Louro, T. V.; Assis, L. B. M.; Brito, P. L. Measuring land use mix with address-level census data (2025). engrXiv. https://engrxiv.org/preprint/view/5975

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.