Chronic Disease Risk Factors from VIGITEL with healthbR

Overview

VIGITEL (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico, "Surveillance of Risk and Protective Factors for Chronic Diseases by Telephone Survey") is an annual telephone survey conducted by the Brazilian Ministry of Health since 2006. It monitors risk and protective factors for chronic non-communicable diseases among adults (18+) in all 26 state capitals and the Federal District.

Topic	Examples
Tobacco	Smoking prevalence, cessation
Alcohol	Consumption patterns, binge drinking
Diet	Fruit/vegetable intake, ultra-processed foods
Physical activity	Leisure, commuting, sedentary behavior
Chronic diseases	Diabetes, hypertension, obesity self-report
Preventive exams	Mammography, Pap smear, colonoscopy

Each annual edition interviews approximately 54,000 adults via landline telephone. Post-stratification weights, stored in the variable pesorake, adjust the sample to match the adult population of each city.
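
To see what the weights do in practice, the following minimal sketch compares an unweighted and a weighted prevalence on a made-up five-row data frame. Only the column name pesorake comes from VIGITEL; the smoker column and every value are invented for illustration, so the numbers say nothing about the real survey.

```r
# Toy data: NOT real VIGITEL records. Only the name `pesorake` is taken
# from the survey; `smoker` and all values are hypothetical.
toy <- data.frame(
  smoker   = c(1, 0, 0, 1, 0),           # 1 = smoker, 0 = non-smoker
  pesorake = c(0.5, 2.0, 1.5, 0.5, 0.5)  # invented post-stratification weights
)

# Unweighted prevalence: every respondent counts equally
unweighted <- mean(toy$smoker) * 100

# Weighted prevalence: each respondent counts in proportion to `pesorake`
weighted <- weighted.mean(toy$smoker, toy$pesorake) * 100

unweighted
weighted
```

In this toy sample the smokers carry small weights, so the weighted prevalence (20%) is lower than the unweighted one (40%). This is why the examples in this document always use pesorake rather than raw counts.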

Getting started

library(healthbR)
library(dplyr)

Check available years

vigitel_years()
#> [1] 2006 2007 2008 ... 2023 2024

Survey information

vigitel_info()

Downloading data

All years at once

VIGITEL is distributed as a single consolidated file covering 2006–2024. By default, all years are downloaded:

df <- vigitel_data()

Specific years

df <- vigitel_data(year = 2020:2024)

Select variables

df <- vigitel_data(year = 2024, vars = c("cidade", "sexo", "idade", "pesorake",
                                          "q6", "q7", "q9"))

Data format

Two formats are available: Stata (.dta, default) and CSV. The Stata format preserves variable labels:

df_dta <- vigitel_data(format = "dta")  # default, with labels
df_csv <- vigitel_data(format = "csv")  # alternative

Exploring variables

Data dictionary

vigitel_dictionary()

Search variables

vigitel_variables()

Example: Smoking prevalence over time

# Download smoking-related variables
df <- vigitel_data(
  year = 2006:2024,
  vars = c("ano", "cidade", "sexo", "pesorake", "q6")
)

# q6: "Atualmente, o(a) sr(a) fuma?" ("Do you currently smoke?"; 1 = yes, 2 = no)
smoking <- df |>
  filter(q6 %in% c("1", "2")) |>
  group_by(ano) |>
  summarise(
    # as.numeric() guards against weights being read as character
    smokers = sum(as.numeric(pesorake)[q6 == "1"], na.rm = TRUE),
    total = sum(as.numeric(pesorake), na.rm = TRUE),
    prevalence = smokers / total * 100
  )

Example: Obesity by capital city

df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "pesorake", "q8", "q9")
)

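# q8 = weight (kg), q9 = height (cm)
# Obesity is defined as BMI >= 30 kg/m^2
obesity <- df |>
  # Convert first so the filters below compare numbers, not strings
  mutate(
    weight_kg = as.numeric(q8),
    height_cm = as.numeric(q9)
  ) |>
  filter(!is.na(weight_kg), !is.na(height_cm), height_cm > 0) |>
  mutate(
    bmi = weight_kg / (height_cm / 100)^2,
    obese = bmi >= 30
  ) |>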
  group_by(cidade) |>
  summarise(
    prevalence = weighted.mean(obese, as.numeric(pesorake), na.rm = TRUE) * 100
  ) |>
  arrange(desc(prevalence))
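
The BMI arithmetic in this example can be checked by hand on a few values. The sketch below assumes, as the example does, that weight is recorded in kilograms and height in centimetres. The helper name bmi_from_cm is hypothetical (it is not part of healthbR) and the input values are invented.

```r
# Hypothetical helper (not part of healthbR):
# BMI from weight in kilograms and height in centimetres
bmi_from_cm <- function(weight_kg, height_cm) {
  weight_kg / (height_cm / 100)^2
}

# Invented values, stored as character to mimic columns that need as.numeric()
weight <- c("70", "95", "60")
height <- c("175", "170", "160")

bmi   <- bmi_from_cm(as.numeric(weight), as.numeric(height))
obese <- bmi >= 30  # same threshold as in the example above

round(bmi, 1)
obese
```

Only the second respondent (95 kg at 170 cm, a BMI of about 32.9) crosses the threshold. Forgetting the division by 100 would give BMI values thousands of times too small, so no one would be classified as obese.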

Cache and performance

Data are automatically cached in partitioned Parquet format (when the arrow package is installed). Subsequent calls read from the cache instead of downloading again:

# First call downloads (~30 seconds)
df <- vigitel_data(year = 2024)

# Second call loads from cache (instant)
df <- vigitel_data(year = 2024)

# Check cache status
vigitel_cache_status()

# Clear cache if needed
vigitel_clear_cache()

Lazy evaluation

For large analyses, use lazy evaluation to query without loading all data into memory:

lazy_df <- vigitel_data(lazy = TRUE, backend = "arrow")
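
A lazy object is typically queried with dplyr verbs and only brought into memory at the end with collect(). The sketch below shows that pattern on a small in-memory Arrow table rather than on VIGITEL itself, so it runs offline. That the object returned by vigitel_data(lazy = TRUE, backend = "arrow") accepts the same verbs is an assumption, and the toy values are invented.

```r
library(dplyr)
library(arrow)

# Toy stand-in for the lazy object: invented values, VIGITEL-style column names
toy <- arrow_table(
  ano      = c(2023L, 2024L, 2024L, 2024L),
  cidade   = c("A", "A", "B", "B"),
  pesorake = c(1.0, 2.0, 0.5, 1.5)
)

# The verbs before collect() are recorded but not executed;
# collect() runs the query and returns an ordinary data frame.
result <- toy |>
  filter(ano == 2024L) |>
  group_by(cidade) |>
  summarise(weight_sum = sum(pesorake)) |>
  collect() |>
  arrange(cidade)

result
```

Filtering and aggregating before collect() means only the small summarised result is loaded into memory, which is the point of working lazily.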
