
Chronic Disease Risk Factors from VIGITEL with healthbR

Overview

VIGITEL (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico; in English, Surveillance of Risk and Protective Factors for Chronic Diseases by Telephone Survey) is an annual telephone survey conducted by the Brazilian Ministry of Health since 2006. It monitors risk and protective factors for chronic non-communicable diseases among adults (18+) in all 26 state capitals and the Federal District.

Topic             | Examples
Tobacco           | Smoking prevalence, cessation
Alcohol           | Consumption patterns, binge drinking
Diet              | Fruit/vegetable intake, ultra-processed foods
Physical activity | Leisure, commuting, sedentary behavior
Chronic diseases  | Diabetes, hypertension, obesity self-report
Preventive exams  | Mammography, Pap smear, colonoscopy

Each annual edition interviews approximately 54,000 adults via landline telephone, with post-stratification weighting (pesorake) to match the adult population of each city.

Getting started

library(healthbR)
library(dplyr)

Check available years

vigitel_years()
#> [1] 2006 2007 2008 ... 2023 2024

Survey information

vigitel_info()

Downloading data

All years at once

VIGITEL is distributed as a single consolidated file covering 2006–2024. By default, all years are downloaded:

df <- vigitel_data()

Specific years

df <- vigitel_data(year = 2020:2024)

Select variables

df <- vigitel_data(year = 2024, vars = c("cidade", "sexo", "idade", "pesorake",
                                          "q6", "q7", "q9"))

Data format

Two formats are available: Stata (.dta, default) and CSV. The Stata format preserves variable labels:

df_dta <- vigitel_data(format = "dta")  # default, with labels
df_csv <- vigitel_data(format = "csv")  # alternative

Exploring variables

Data dictionary

vigitel_dictionary()

Search variables

vigitel_variables()

Example: Smoking prevalence over time

# Download smoking-related variables
df <- vigitel_data(
  year = 2006:2024,
  vars = c("ano", "cidade", "sexo", "pesorake", "q6")
)

# q6: "Atualmente, o(a) sr(a) fuma?" ("Do you currently smoke?"; 1 = yes, 2 = no)
smoking <- df |>
  filter(q6 %in% c("1", "2")) |>
  # Make sure the weights are numeric before summing them
  mutate(pesorake = as.numeric(pesorake)) |>
  group_by(ano) |>
  summarise(
    smokers = sum(pesorake[q6 == "1"], na.rm = TRUE),
    total = sum(pesorake, na.rm = TRUE),
    prevalence = smokers / total * 100
  )
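
The summarise() step above computes prevalence as a ratio of weight sums, which is the same quantity as a weighted mean of a 0/1 smoking indicator. A minimal sketch on made-up data (the toy data frame and its weights are illustrative, not VIGITEL results) shows how this differs from the unweighted proportion:

```r
# Toy data: four respondents; weights are invented for illustration only
toy <- data.frame(
  q6       = c("1", "2", "2", "2"),
  pesorake = c(3, 1, 1, 1)
)

is_smoker <- toy$q6 == "1"

# Unweighted: share of respondents who smoke
unweighted <- mean(is_smoker) * 100

# Weighted: sum of smokers' weights over the sum of all weights
weighted <- sum(toy$pesorake[is_smoker]) / sum(toy$pesorake) * 100

# The same quantity via weighted.mean()
weighted_alt <- weighted.mean(is_smoker, toy$pesorake) * 100
```

Here the single smoker carries half of the total weight, so the weighted estimate is twice the unweighted one.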

Example: Obesity by capital city

df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "pesorake", "q8", "q9")
)

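# q8 = weight (kg), q9 = height (cm)
# BMI >= 30 = obesity
obesity <- df |>
  # Convert to numeric first so the filters below compare numbers, not strings
  mutate(
    q8 = as.numeric(q8),
    q9 = as.numeric(q9),
    pesorake = as.numeric(pesorake)
  ) |>
  # weighted.mean() returns NA if any weight is missing, so drop those rows too
  filter(!is.na(q8), !is.na(q9), q9 > 0, !is.na(pesorake)) |>
  mutate(
    bmi = q8 / (q9 / 100)^2,
    obese = bmi >= 30
  ) |>
  group_by(cidade) |>
  summarise(
    prevalence = weighted.mean(obese, pesorake, na.rm = TRUE) * 100
  ) |>
  arrange(desc(prevalence))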
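
Obesity is one of several BMI categories. As a sketch, the standard WHO adult cut-offs (18.5, 25 and 30) can be applied with cut(). The weights and heights below are made up, and bmi_class is a name introduced only for this example:

```r
# Made-up weights (kg) and heights (cm), for illustration only
weight_kg <- c(50, 70, 85, 100)
height_cm <- c(170, 175, 175, 170)

bmi <- weight_kg / (height_cm / 100)^2

# right = FALSE makes intervals closed on the left, so a BMI of exactly 30
# falls in the "obese" class, matching the bmi >= 30 rule used above
bmi_class <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right = FALSE
)
```

A weighted share for each class could then be computed per city in the same way as the obesity prevalence.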

Cache and performance

Data is automatically cached in partitioned Parquet format (when the arrow package is installed). Subsequent calls load from the cache instead of downloading again:

# First call downloads (~30 seconds)
df <- vigitel_data(year = 2024)

# Second call loads from cache (instant)
df <- vigitel_data(year = 2024)

# Check cache status
vigitel_cache_status()

# Clear cache if needed
vigitel_clear_cache()

Lazy evaluation

For large analyses, use lazy evaluation to query without loading all data into memory:

lazy_df <- vigitel_data(lazy = TRUE, backend = "arrow")
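
What a lazy query looks like depends on the object that vigitel_data(lazy = TRUE) returns, which is not spelled out here. As a sketch, assuming it behaves like an Arrow table or dataset that accepts dplyr verbs, the usual pattern is to filter and aggregate first and call collect() only at the end. The example below builds a small in-memory Arrow table with made-up values instead of calling vigitel_data(), so it runs without a download:

```r
library(dplyr)
library(arrow)

# Stand-in for the lazy object; values are invented for illustration
toy <- arrow_table(data.frame(
  ano      = c(2023, 2023, 2024, 2024),
  q6       = c("1", "2", "1", "2"),
  pesorake = c(1, 3, 2, 2)
))

# Nothing is computed here: the dplyr verbs only build a query
query <- toy |>
  filter(q6 %in% c("1", "2")) |>
  mutate(w_smoker = ifelse(q6 == "1", pesorake, 0)) |>
  group_by(ano) |>
  summarise(smokers = sum(w_smoker), total = sum(pesorake))

# collect() runs the query and returns a regular tibble
result <- query |>
  collect() |>
  mutate(prevalence = smokers / total * 100) |>
  arrange(ano)
```

Only the small aggregated result is brought into memory, which is the point of working lazily on a large file.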

Further reading
