The healthbR package provides easy access to Brazilian public health survey data directly from R. It downloads, caches, and processes data from official Ministry of Health sources, returning clean, analysis-ready tibbles that follow tidyverse conventions.
Currently, healthbR supports VIGITEL (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico), a telephone-based survey that monitors risk and protective factors for chronic diseases in Brazilian state capitals.
Before downloading data, you can check which years are available:
VIGITEL uses coded variable names (q6, q8, etc.). Use the dictionary to understand what each variable represents:
You can search for specific variables:
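A search of this kind can be thought of as case-insensitive pattern matching over variable names and descriptions. The base-R sketch below illustrates the idea on a small hypothetical data frame. Its columns variable and description are assumptions, not necessarily what vigitel_dictionary() returns, and search_dict() is a hypothetical helper rather than a healthbR function.

```r
# Hypothetical stand-in for a dictionary table; the column names
# `variable` and `description` are assumptions, not healthbR's schema
dict <- data.frame(
  variable = c("q6", "q8_anos", "pesorake", "diab", "hart", "imc", "obesid"),
  description = c("Sex", "Age in years", "Post-stratification weight",
                  "Diabetes diagnosis", "Hypertension diagnosis",
                  "Body Mass Index", "Obesity indicator"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose name OR description matches the
# pattern, ignoring case
search_dict <- function(dict, pattern) {
  hit <- grepl(pattern, dict$variable, ignore.case = TRUE) |
    grepl(pattern, dict$description, ignore.case = TRUE)
  dict[hit, , drop = FALSE]
}

search_dict(dict, "diagnosis")  # rows for diab and hart
```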
VIGITEL uses complex survey sampling with post-stratification weights. For proper statistical inference, always use the pesorake weight variable.
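To see why the weights matter, the sketch below compares an unweighted and a weighted diabetes share for six hypothetical respondents. The data, and the use of 0 for "no", are invented for illustration. Respondents with larger pesorake values represent more people in the population, so they pull the weighted estimate toward their answers. This yields point estimates only; standard errors and confidence intervals need a design-based tool such as the srvyr package used in the complete workflow.

```r
# Hypothetical toy data: 6 respondents; diab == 1 marks a diabetes
# diagnosis (0 for "no" is an assumption made for this example)
toy <- data.frame(
  diab     = c(1, 0, 0, 1, 0, 0),
  pesorake = c(0.5, 2, 2, 0.5, 1, 1)
)

# Unweighted share: every respondent counts equally
unweighted <- mean(toy$diab == 1)

# Weighted share: each respondent counts in proportion to pesorake
weighted <- weighted.mean(toy$diab == 1, w = toy$pesorake)

c(unweighted = unweighted, weighted = weighted)
# unweighted = 1/3 (0.333), weighted = 1/7 (0.143)
```

Here both respondents with diabetes carry small weights, so the weighted prevalence falls well below the unweighted share.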
Some commonly used variables in VIGITEL:
| Variable | Description |
|---|---|
| cidade | City code (1-27 for state capitals) |
| q6 | Sex |
| q8_anos | Age in years |
| pesorake | Post-stratification weight |
| diab | Diabetes diagnosis |
| hart | Hypertension diagnosis |
| fumante | Current smoker |
| imc | Body Mass Index |
| obesid | Obesity indicator |
Consult vigitel_dictionary() for the complete list.
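The imc and obesid variables are linked through the standard Body Mass Index formula: weight in kilograms divided by the square of height in meters. The sketch below assumes obesid follows the WHO adult cutoff of BMI ≥ 30; that threshold is an assumption here, so check vigitel_dictionary() for the exact definition. bmi() and is_obese() are hypothetical helpers, not healthbR functions.

```r
# Body Mass Index: weight (kg) divided by height (m) squared
bmi <- function(weight_kg, height_m) weight_kg / height_m^2

# Hypothetical obesity flag using the WHO adult cutoff (BMI >= 30);
# returns 1 for obese, 0 otherwise (NA stays NA)
is_obese <- function(imc) as.integer(imc >= 30)

imc <- bmi(weight_kg = c(70, 95, 60), height_m = c(1.75, 1.70, 1.60))
round(imc, 1)  # 22.9 32.9 23.4
is_obese(imc)  # 0 1 0
```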
healthbR offers three strategies for working with large datasets efficiently.
1. Parquet conversion
Convert Excel files to Parquet format for dramatically faster loading (10-20x improvement):
2. Parallel downloads
When downloading multiple years, healthbR automatically uses parallel processing if the furrr package is available:
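The per-year fan-out can be sketched with base R's parallel package. This is a swapped-in technique: healthbR relies on furrr, and the sketch does not reproduce its internals. process_year() is a hypothetical stand-in for downloading and parsing one year, so the example runs offline. mclapply() forks worker processes, which Windows does not support, hence the single-core fallback there.

```r
library(parallel)

# Hypothetical stand-in for per-year work (download + parse);
# it just builds a tiny data frame so the sketch runs offline
process_year <- function(year) {
  data.frame(year = year, n_rows = 100L)
}

years <- 2021:2023

# Fork one task per year (forking is unavailable on Windows, so use 1 core
# there), then stack the per-year results in their original order
cores <- if (.Platform$OS.type == "windows") 1L else 2L
pieces <- mclapply(years, process_year, mc.cores = cores)
combined <- do.call(rbind, pieces)
combined
```

With furrr, the same pattern is expressed by choosing a backend with future::plan() and mapping over the years with furrr::future_map().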
3. Lazy evaluation
For very large datasets, use lazy evaluation to filter and select data before loading it into memory:
```r
library(healthbR)
library(dplyr)

# returns an Arrow Dataset (not loaded into RAM)
df_lazy <- vigitel_data(2015:2023, lazy = TRUE)

# filter() and select() are only recorded here; collect() runs the
# query, so only now is the (smaller) result loaded into memory
result <- df_lazy |>
  filter(cidade == 1, q8_anos >= 18) |>
  select(q6, q8_anos, pesorake, diab, hart, imc) |>
  collect()
```

This approach is especially useful when you only need a subset of the data.
Here’s a complete workflow for analyzing diabetes prevalence:
```r
library(healthbR)
library(dplyr)
library(srvyr)

# 1. load data
df <- vigitel_data(2023)

# 2. create survey design
svy <- df |>
  as_survey_design(weights = pesorake)

# 3. calculate prevalence by sex
diabetes_by_sex <- svy |>
  group_by(q6) |>
  summarize(
    prevalence = survey_mean(diab == 1, na.rm = TRUE, vartype = "ci"),
    n = unweighted(n())
  )

diabetes_by_sex
```