Getting Started with realestatebr

Introduction

This vignette provides a minimal introduction to the realestatebr package and its core functions. Since realestatebr returns tibbles by default, we recommend using it together with the dplyr package, though converting the results to data.table is trivial.

library(realestatebr)
library(dplyr)

The code below defines a common theme for all plots in this vignette. It is only needed to reproduce the figures exactly as shown and can otherwise be omitted.

library(ggplot2)

color_palette <- c(
  "#1E3A5F",
  "#DD6B20",
  "#2C7A7B",
  "#D69E2E",
  "#805AD5",
  "#C53030"
)

theme_series <- function() {
  theme_minimal(
    # swap for other font if needed
    base_family = "Avenir",
    base_size = 10
  ) +
    theme(
      plot.title = element_text(size = 16),
      panel.grid.minor = element_blank(),
      panel.grid.major.x = element_blank(),
      axis.line.x = element_line(color = "gray10", linewidth = 0.5),
      axis.ticks.x = element_line(color = "gray10", linewidth = 0.5),
      axis.title.x = element_blank(),
      legend.position = "bottom",
      # palette.* theme settings require ggplot2 4.0.0 or later
      palette.color.discrete = color_palette
    )
}


Core Interface

The goal of realestatebr is to provide a unified interface to Brazilian real estate data from multiple public sources, with every dataset returned as a tidy tibble. The package is centered on a single function, get_dataset(name, table), which retrieves any dataset by name. Without a table argument it returns the dataset's default table; use table to select a specific sub-table.

# Default table
abecip <- get_dataset("abecip")

# Specific table (named after the table it holds)
units <- get_dataset("abecip", table = "units")

To see which datasets are available, use list_datasets(); get_dataset_info() returns the metadata for a single dataset, including the names of its tables.

ds <- list_datasets()
info <- get_dataset_info("abecip")
names(info$categories)
#> [1] "sbpe"  "units"  "cgi"

The source Argument

The source argument of get_dataset() controls where the data comes from. The default ("auto") reads the in-session memo if present, falls back to the package’s GitHub release, and finally falls back to a fresh download from the original source. The default is usually fine. Use "github" to force the pre-processed asset, or "fresh" to always pull from the original source, which is slower but reflects the latest data the source has published.

get_dataset("abecip", source = "github") # pre-processed asset from GitHub release
get_dataset("abecip", source = "fresh") # direct from the original source

Repeated calls within one R session are served from an in-memory memo, so fetching the same dataset twice does not re-download. Use clear_session_cache() to drop the memo without restarting R.
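To make the lookup order concrete, here is a minimal base R sketch of an "auto"-style chain backed by an in-session memo. It is not the realestatebr implementation: memo_env, fetch_auto and the two fetcher functions are hypothetical stand-ins for the GitHub release and the original source, used only to illustrate the order memo, then release, then fresh download.

```r
# Hypothetical sketch, NOT realestatebr internals: an environment acts as
# the in-session memo, and fetchers are tried in order until one succeeds.
memo_env <- new.env(parent = emptyenv())

fetch_auto <- function(name, fetchers) {
  # 1. Serve from the memo if this dataset was already fetched
  if (exists(name, envir = memo_env, inherits = FALSE)) {
    return(get(name, envir = memo_env, inherits = FALSE))
  }
  # 2. Try each source in order; an error falls through to the next one
  for (fetch in fetchers) {
    result <- tryCatch(fetch(name), error = function(e) NULL)
    if (!is.null(result)) {
      assign(name, result, envir = memo_env) # memoise for later calls
      return(result)
    }
  }
  stop("all sources failed for dataset: ", name)
}

# Hypothetical fetchers: the release asset is missing, the original
# source works and counts how many times it is called
calls <- 0
from_release <- function(name) stop("release asset not found")
from_source <- function(name) {
  calls <<- calls + 1
  data.frame(date = as.Date("2024-01-01"), value = 1)
}

first <- fetch_auto("abecip", list(from_release, from_source))
second <- fetch_auto("abecip", list(from_release, from_source)) # memo hit

# Emptying the memo, similar in spirit to clear_session_cache()
rm(list = ls(memo_env), envir = memo_env)
```

The second call never reaches from_source, which is why calls stays at 1.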

Example: Housing Credit Cycle

SBPE (Sistema Brasileiro de Poupança e Empréstimo) is the primary funding mechanism for residential mortgages in Brazil. The sbpe table from abecip tracks deposits into and withdrawals from savings accounts, whose balances help finance real estate construction and acquisition.

sbpe <- get_dataset("abecip", table = "sbpe")

glimpse(sbpe)

The plot below shows the annual net savings flow since 2019.

# Annual net SBPE savings flow since 2019 (divided by 1e3 to plot in R$ billions)
sbpe_annual <- sbpe |>
  filter(date >= as.Date("2019-01-01")) |>
  mutate(year = lubridate::year(date)) |>
  summarise(net_flow = sum(sbpe_netflow, na.rm = TRUE) / 1e3, .by = year) |>
  mutate(
    label_num = format(round(net_flow, 1)),
    ypos = if_else(net_flow > 0, net_flow + 10, net_flow - 10)
  )

ggplot(sbpe_annual, aes(year, net_flow)) +
  geom_col(fill = color_palette[1], alpha = 0.9, width = 0.8) +
  geom_text(aes(y = ypos, label = label_num), size = 3) +
  geom_hline(yintercept = 0) +
  scale_x_continuous(breaks = 2019:2026) +
  labs(
    title = "Annual Net Savings Flow (SBPE)",
    x = NULL,
    y = "R$ billions"
  ) +
  theme_series()
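For readers less familiar with summarise(.by = ), the aggregation above can be sketched in base R on toy data. The values are made up, and the assumption that sbpe_netflow is stored in R$ millions is inferred only from the division by 1e3 and the "R$ billions" axis label.

```r
# Toy monthly data (hypothetical values, not real SBPE figures);
# sbpe_netflow is assumed to be in R$ millions
toy <- data.frame(
  date = seq(as.Date("2023-01-01"), by = "month", length.out = 24),
  sbpe_netflow = c(rep(-1500, 12), rep(500, 12))
)
toy$year <- as.integer(format(toy$date, "%Y"))

# Base R equivalent of summarise(sum(sbpe_netflow) / 1e3, .by = year)
annual <- aggregate(sbpe_netflow ~ year, data = toy, FUN = sum)
annual$net_flow <- annual$sbpe_netflow / 1e3

# Place labels 10 units above positive bars and 10 below negative ones
annual$ypos <- ifelse(annual$net_flow > 0, annual$net_flow + 10, annual$net_flow - 10)
annual
```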

The companion table "units" contains monthly counts of financed units.

units <- get_dataset("abecip", table = "units")

glimpse(units)

The plot below shows the number of units financed each month, together with a LOESS trend line.

# Monthly SBPE financed units since 2019
units_recent <- units |>
  filter(date >= as.Date("2019-01-01"))

ggplot(units_recent, aes(date, units_total)) +
  geom_point(alpha = 0.5, size = 0.8, color = color_palette[1]) +
  geom_smooth(
    color = color_palette[1],
    linewidth = 0.7,
    se = FALSE,
    method = stats::loess,
    method.args = list(span = 0.4)
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(
    title = "Monthly Financed Units",
    y = "Units"
  ) +
  theme_series()
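The span argument controls how local the LOESS fit is: each fitted value comes from a weighted local regression on roughly that fraction of the data. The sketch below uses a simulated series (not ABECIP data) and fits approximately the model geom_smooth() builds here, loess(y ~ x) with dates converted to numbers, comparing span = 0.4 with a wider span.

```r
# Simulated monthly series: linear trend + slow cycle + noise (hypothetical)
set.seed(1)
dates <- seq(as.Date("2019-01-01"), by = "month", length.out = 60)
step <- seq_along(dates)
units_total <- 30000 + 200 * step + 3000 * sin(step / 6) + rnorm(60, sd = 1000)

# Dates become numbers, as they do on a ggplot2 date scale
x <- as.numeric(dates)
fit_narrow <- loess(units_total ~ x, span = 0.4) # about 40% of points per local fit
fit_wide <- loess(units_total ~ x, span = 0.9)

trend <- predict(fit_narrow)

# In-sample error: the narrower span tracks the cycle more closely
rmse <- c(
  narrow = sqrt(mean(residuals(fit_narrow)^2)),
  wide = sqrt(mean(residuals(fit_wide)^2))
)
rmse
```

A smaller span follows short-term swings more closely at the cost of a noisier line; 0.4 is the compromise chosen in the plot above.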

Example: Real Estate Credit Portfolio

The bcb_realestate dataset imports all real estate statistics from the Brazilian Central Bank. It is a relatively large dataset, so exploring it can be cumbersome. Each row is uniquely identified by date and series_info. The helper columns v1 to v5, abbrev_state, category, and type break series_info into its components to make the dataset easier to filter.

The code below shows how to access a specific series and also how to fetch a group of related series.

bcb <- get_dataset("bcb_realestate")

# Get a specific series
sfh_pf <- bcb |>
  filter(series_info == "credito_estoque_carteira_credito_pf_sfh_br")

# Get all the related series for 'estoque_carteira_credito_pf'
credit_stock <- bcb |>
  filter(
    category == "credito",
    type == "estoque",
    v1 == "carteira",
    v2 == "credito",
    v3 == "pf",
    # v4 (the credit line) is not filtered, so all credit lines are kept
    v5 == "br"
  )

# The helper columns split the 'series_info' column into its parts, allowing
# for easier filtering. This is equivalent to filtering with a regex:
credit_stock <- bcb |>
  filter(grepl(
    "(?<=credito_estoque_carteira_credito_pf_).+_br$",
    series_info,
    perl = TRUE
  ))
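The equivalence between the two filters can be checked on a few hypothetical series codes. The sketch assumes series_info splits on underscores into category, type and v1 to v5, which matches the example code but may not be the package's exact rule (it ignores abbrev_state, for instance).

```r
# Hypothetical series codes following the pattern category_type_v1_..._v5
series_info <- c(
  "credito_estoque_carteira_credito_pf_sfh_br",
  "credito_estoque_carteira_credito_pf_fgts_br",
  "credito_estoque_carteira_credito_pf_home-equity_br",
  "credito_estoque_carteira_credito_pf_sfh_sp", # state-level, not "br"
  "credito_estoque_carteira_credito_pj_sfh_br" # firms, not households
)

# Split on "_" (hyphens such as "home-equity" stay intact)
parts <- do.call(rbind, strsplit(series_info, "_", fixed = TRUE))
colnames(parts) <- c("category", "type", "v1", "v2", "v3", "v4", "v5")
helpers <- data.frame(series_info, parts, stringsAsFactors = FALSE)

# Column-based filter with v4 left unconstrained
by_columns <- with(helpers, series_info[
  category == "credito" & type == "estoque" & v1 == "carteira" &
    v2 == "credito" & v3 == "pf" & v5 == "br"
])

# Regex-based filter from the code above
by_regex <- series_info[grepl(
  "(?<=credito_estoque_carteira_credito_pf_).+_br$",
  series_info,
  perl = TRUE
)]
```

Both approaches keep the three national household series and drop the state-level and firm series.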

The single series covers only the SFH credit line (Sistema Financeiro da Habitação).

ggplot(sfh_pf, aes(date, value / 1e9)) +
  geom_line(linewidth = 0.7, color = color_palette[1]) +
  labs(title = "SFH", y = "R$ (billions)") +
  theme_series()

The grouped series break the household real estate credit stock down by credit line.

credit_labels <- c(
  "Home Equity" = "home-equity",
  "Comercial" = "comercial",
  "Livre" = "livre",
  "FGTS" = "fgts",
  "SFH" = "sfh"
)

credit_stock <- credit_stock |>
  mutate(
    credit_line_label = factor(
      v4,
      levels = credit_labels,
      labels = names(credit_labels)
    )
  )

ggplot(credit_stock, aes(date, value / 1e9)) +
  geom_area(aes(fill = credit_line_label), alpha = 0.9) +
  scale_fill_manual(values = rev(color_palette[1:5])) +
  scale_x_date(expand = expansion(mult = c(0.01))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Real Estate Credit Stock",
    subtitle = "Household real estate credit stock (total debt) by credit line",
    y = "R$ (billions)",
    fill = NULL
  ) +
  theme_series()
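The factor() call in the code above does two jobs: it maps the raw v4 codes to readable labels and fixes their order, which ggplot2 uses for the legend and the stacking of the areas. Any code missing from credit_labels becomes NA, which makes unexpected credit lines easy to spot. A small check with hypothetical codes ("other" is made up):

```r
credit_labels <- c(
  "Home Equity" = "home-equity",
  "Comercial" = "comercial",
  "Livre" = "livre",
  "FGTS" = "fgts",
  "SFH" = "sfh"
)

# Hypothetical raw codes, including one not present in credit_labels
v4 <- c("sfh", "fgts", "home-equity", "other")

# levels = raw codes to match, labels = display names in the same order
credit_line_label <- factor(v4, levels = credit_labels, labels = names(credit_labels))
credit_line_label
```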

As a final warning, note that dates in the bcb_realestate dataset use the YYYY-MM-DD format with the last day of the month as the default day (e.g. 2023-01-31). This can cause issues when merging with other datasets, since the first day of the month is the more common convention (e.g. 2023-01-01).

To avoid this, convert the dates with lubridate::floor_date(date, 'month'). Future versions of realestatebr might provide this as the default behavior.
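A small base R sketch of the problem and the fix, on made-up data: rewriting each date as its year and month with day 01 gives the same result as lubridate::floor_date(date, "month") for Date columns.

```r
# Hypothetical end-of-month series (bcb_realestate style) and a
# start-of-month series from another source
bcb_like <- data.frame(
  date = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  credit = c(100, 110, 120)
)
other <- data.frame(
  date = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  index = c(1.00, 1.01, 1.02)
)

# A naive merge on date finds no matching rows
naive <- merge(bcb_like, other, by = "date")
nrow(naive) # 0

# Base R equivalent of lubridate::floor_date(date, "month")
bcb_like$date <- as.Date(format(bcb_like$date, "%Y-%m-01"))
aligned <- merge(bcb_like, other, by = "date")
aligned
```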

Reference (all datasets)

The available datasets are listed below.

| Dataset | Source | Tables | Status |
|---|---|---|---|
| abecip | ABECIP | sbpe, units, cgi | Active |
| abrainc | ABRAINC / FIPE | indicator, radar, leading | Active |
| bcb_realestate | Banco Central do Brasil | accounting, application, indices, sources, units | Active |
| bcb_series | Banco Central do Brasil | core, primary, secondary, tertiary, full | Active |
| fgv_ibre | FGV IBRE | | Active |
| rppi | FIPE/ZAP, IVGR, IGMI, IQA, IVAR, SECOVI-SP | sale, rent, fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp | Active |
| rppi_bis | Bank for International Settlements | selected, detailed_monthly, detailed_quarterly, detailed_annual, detailed_halfyearly | Active |
| secovi | SECOVI-SP | condo, rent, launch, sale | Active |