panelbuild provides tools for auditing, validating, and
preparing panel datasets before statistical analysis.
You can install the development version of panelbuild
from GitHub:
# install.packages("remotes")
remotes::install_github("desirajulavanya/panelbuild")
library(panelbuild)
audit <- audit_panel(example_panel, id = id, time = year)
audit
## Panel audit
##
## Data: example_panel
## Unit variable: id
## Time variable: year
##
## Units: 3
## Time periods: 4
## Observed rows: 9
## Observed id-time cells: 8
## Expected id-time cells: 12
## Missing id-time cells: 4
## Duplicate id-time cells: 1
## Balanced panel: No
audit_report(audit)
## panelbuild Panel Audit Report
## ==========================
##
## Dataset
## -------
## Data: example_panel
## Unit variable: id
## Time variable: year
##
## Panel structure
## ---------------
## Units: 3
## Time periods: 4
## Observed rows: 9
## Observed unit-time cells: 8
## Expected unit-time cells: 12
## Missing unit-time cells: 4
## Duplicate unit-time cells: 1
## Balanced panel: No
##
## Recommended next steps
## ----------------------
## * Resolve duplicate unit-time observations before completing the panel.
## * Use `duplicate_cells(audit)` or `duplicate_summary()` to inspect duplicates.
## * Inspect missing unit-time cells before estimation.
## * Use `missing_cells(audit)` or `gap_summary()` to review panel gaps.
## * Use `complete_panel()` only after duplicate unit-time cells are resolved.
Panel datasets often contain missing unit-time cells, duplicate observations, irregular time gaps, and imbalance. These issues can affect fixed effects models, difference-in-differences designs, event studies, and other panel-data methods.
panelbuild helps researchers identify and document these
problems before estimation.
library(panelbuild)
data(example_panel)
example_panel
##   id year outcome treatment
## 1  1 2020      10         0
## 2  1 2021      12         1
## 3  1 2021      13         1
## 4  2 2020      20         0
## 5  2 2022      25         1
## 6  3 2020      30         0
## 7  3 2021      31         0
## 8  3 2022      32         1
## 9  3 2023      33         1
audit_panel(example_panel, id = id, time = year)
## Panel audit
##
## Data: example_panel
## Unit variable: id
## Time variable: year
##
## Units: 3
## Time periods: 4
## Observed rows: 9
## Observed id-time cells: 8
## Expected id-time cells: 12
## Missing id-time cells: 4
## Duplicate id-time cells: 1
## Balanced panel: No
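The counts in this audit can be checked by hand. The base R sketch below is not panelbuild's implementation; it recreates the example data from the printout above and assumes that the expected grid is every unit crossed with every year observed anywhere in the data, and that "balanced" means no missing cells. The object names are illustrative only.

```r
# Example data, copied from the example_panel printout above.
panel <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 13, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 1)
)

n_units   <- length(unique(panel$id))
n_periods <- length(unique(panel$year))

# Distinct id-year combinations that actually appear in the data.
observed_cells <- unique(panel[, c("id", "year")])
n_observed     <- nrow(observed_cells)

# Assumption: the expected grid is units x periods.
n_expected <- n_units * n_periods
n_missing  <- n_expected - n_observed

# Cells that appear on more than one row.
cell_counts       <- table(paste(panel$id, panel$year, sep = "_"))
n_duplicate_cells <- sum(cell_counts > 1)

# Assumption: a panel is balanced when no id-year cell is missing.
balanced <- n_missing == 0

c(units = n_units, periods = n_periods, rows = nrow(panel),
  observed = n_observed, expected = n_expected,
  missing = n_missing, duplicates = n_duplicate_cells)
```

Under these assumptions the figures agree with the audit output: 3 units and 4 periods give 12 expected cells, of which 8 are observed and 4 are missing.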
duplicate_summary(example_panel, id = id, time = year)
## # A tibble: 1 × 3
##      id panelbuild_duplicate_cells panelbuild_duplicate_extra_rows
##   <dbl>                      <int>                           <int>
## 1     1                          1                               1
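Before dropping duplicates it can help to look at the rows themselves. The following base R sketch (an illustration, not part of panelbuild) pulls out every row that shares its id-year cell with another row; the data frame is recreated from the example_panel printout and the helper names are hypothetical.

```r
# Example data, copied from the example_panel printout.
panel <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 13, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 1)
)

# One key per id-year cell.
key <- paste(panel$id, panel$year, sep = "_")

# Flag all copies of a repeated key, not only the second and later ones.
is_dup   <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- panel[is_dup, ]
dup_rows
```

Here the two rows for unit 1 in 2021 carry different outcomes (12 and 13), so keeping only the first row is a substantive choice and worth documenting.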
gap_summary(example_panel, id = id, time = year)
## # A tibble: 2 × 2
##      id panelbuild_missing_periods
##   <dbl>                      <int>
## 1     1                          2
## 2     2                          2
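The per-unit gap counts can be reproduced with a short base R sketch. It is an illustration only, and it assumes that the set of expected periods is every year observed for at least one unit; panelbuild may define the expected periods differently.

```r
# Example data, copied from the example_panel printout.
panel <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 13, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 1)
)

# Assumption: expected periods are all years seen anywhere in the data.
all_years <- sort(unique(panel$year))

# For each unit, count the expected years that never appear for that unit.
missing_by_unit <- sapply(
  split(panel$year, panel$id),
  function(years) length(setdiff(all_years, years))
)

# Keep only units that have at least one gap.
missing_by_unit[missing_by_unit > 0]
```

Unit 1 lacks 2022 and 2023, unit 2 lacks 2021 and 2023, and unit 3 is complete, which matches the two rows reported above.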
flag_panel_issues(example_panel, id = id, time = year)
## # A tibble: 9 × 7
##      id  year outcome treatment panelbuild_row_id panelbuild_id_time_n
##   <dbl> <dbl>   <dbl>     <dbl>             <int>                <int>
## 1     1  2020      10         0                 1                    1
## 2     1  2021      12         1                 2                    2
## 3     1  2021      13         1                 3                    2
## 4     2  2020      20         0                 4                    1
## 5     2  2022      25         1                 5                    1
## 6     3  2020      30         0                 6                    1
## 7     3  2021      31         0                 7                    1
## 8     3  2022      32         1                 8                    1
## 9     3  2023      33         1                 9                    1
## # ℹ 1 more variable: panelbuild_duplicate_cell <lgl>
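Row-level flags of this kind can be approximated in base R with `ave()`. The sketch below is not the package's code; the column names `cell_n` and `is_duplicate` are made up for illustration and only mirror the idea behind the row-count and duplicate-flag columns shown above.

```r
# Example data, copied from the example_panel printout.
panel <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 13, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 1)
)

# One key per id-year cell.
key <- paste(panel$id, panel$year, sep = "_")

# For every row, the number of rows that share its id-year cell.
panel$cell_n <- ave(seq_along(key), key, FUN = length)

# A row is flagged when its cell holds more than one row.
panel$is_duplicate <- panel$cell_n > 1

panel
```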
complete_panel() creates a full unit-time grid while
preserving observed values. It does not impute missing outcomes.
Because complete_panel() requires unique id-time cells,
we first create a version of the example data without duplicates.
example_panel_unique <- example_panel |>
dplyr::distinct(id, year, .keep_all = TRUE)
complete_panel(example_panel_unique, id = id, time = year)
## # A tibble: 12 × 7
##       id  year outcome treatment panelbuild_original_row panelbuild_completed_…¹
##    <dbl> <dbl>   <dbl>     <dbl> <lgl>                   <lgl>
##  1     1  2020      10         0 TRUE                    FALSE
##  2     1  2021      12         1 TRUE                    FALSE
##  3     1  2022      NA        NA FALSE                   TRUE
##  4     1  2023      NA        NA FALSE                   TRUE
##  5     2  2020      20         0 TRUE                    FALSE
##  6     2  2021      NA        NA FALSE                   TRUE
##  7     2  2022      25         1 TRUE                    FALSE
##  8     2  2023      NA        NA FALSE                   TRUE
##  9     3  2020      30         0 TRUE                    FALSE
## 10     3  2021      31         0 TRUE                    FALSE
## 11     3  2022      32         1 TRUE                    FALSE
## 12     3  2023      33         1 TRUE                    FALSE
## # ℹ abbreviated name: ¹panelbuild_completed_cell
## # ℹ 1 more variable: panelbuild_audit_action <chr>
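To make the idea of completing a grid without imputing concrete, here is a rough base R equivalent built from `expand.grid()` and `merge()`. It is a sketch under stated assumptions rather than what complete_panel() does internally: the grid is every unit crossed with every observed year, duplicates are resolved by keeping the first row, and the `observed` marker column is a hypothetical stand-in for the package's bookkeeping columns.

```r
# Example data, copied from the example_panel printout.
panel <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 13, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 1)
)

# Keep the first row of each id-year cell so that cells are unique.
panel_unique <- panel[!duplicated(panel[, c("id", "year")]), ]

# Mark rows that come from the original data.
panel_unique$observed <- TRUE

# Full grid: every unit crossed with every observed year.
grid <- expand.grid(
  id   = sort(unique(panel_unique$id)),
  year = sort(unique(panel_unique$year))
)

# Left join: grid cells without data get NA in every data column.
completed <- merge(grid, panel_unique, by = c("id", "year"), all.x = TRUE)
completed <- completed[order(completed$id, completed$year), ]

# Rows added by the grid have no marker yet; set it to FALSE.
completed$observed[is.na(completed$observed)] <- FALSE

completed
```

The added rows keep `NA` in `outcome` and `treatment`, so nothing is imputed, and the marker column separates cells created by the grid from cells that were observed.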
* audit_panel() gives a full panel diagnostic summary.
* duplicate_summary() finds duplicate unit-time observations.
* gap_summary() summarizes missing time periods by unit.
* flag_panel_issues() flags row-level panel problems.
* complete_panel() creates a complete panel grid without imputing missing values.

The goal of panelbuild is to provide a transparent and
reproducible workflow for panel-data quality assurance before
statistical modeling.