panelbuild provides tools for auditing, validating, and
preparing panel datasets before statistical analysis.
Panel datasets often contain duplicate unit-time observations, missing time periods, irregular gaps, and imbalance. These issues can affect fixed-effects models, difference-in-differences designs, event studies, and other panel-data methods.
The goal of panelbuild is to help users identify these
issues before estimation.
panelbuild includes a small example dataset called
example_panel.
data(example_panel)
example_panel
#> id year outcome treatment
#> 1 1 2020 10 0
#> 2 1 2021 12 1
#> 3 1 2021 13 1
#> 4 2 2020 20 0
#> 5 2 2022 25 1
#> 6 3 2020 30 0
#> 7 3 2021 31 0
#> 8 3 2022 32 1
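#> 9 3 2023 33 1

The dataset intentionally includes a duplicate id-year observation (unit 1 appears twice in 2021), missing time periods (unit 1 has no rows for 2022 and 2023, and unit 2 has none for 2021 and 2023), and, as a result, an unbalanced structure.
This makes it useful for demonstrating panel-data diagnostics.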
The main function is audit_panel().
audit_panel(example_panel, id = id, time = year)
#> Panel audit
#>
#> Data: example_panel
#> Unit variable: id
#> Time variable: year
#>
#> Units: 3
#> Time periods: 4
#> Observed rows: 9
#> Observed id-time cells: 8
#> Expected id-time cells: 12
#> Missing id-time cells: 4
#> Duplicate id-time cells: 1
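#> Balanced panel: No

This gives a quick overview of the panel structure, including whether the panel is balanced and whether there are missing or duplicate unit-time cells. Here, 3 units observed over 4 time periods give 3 × 4 = 12 expected id-time cells. Only 8 distinct cells are observed, so 4 are missing, and the 9 observed rows exceed the 8 distinct cells because one cell (unit 1 in 2021) appears twice.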
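The counts in this audit can be cross-checked with a few lines of base R. The sketch below is not panelbuild's implementation. It recreates only the id and year columns of example_panel from the printed data, and it assumes that the expected time periods are the distinct years observed anywhere in the panel and that a balanced panel has no missing and no duplicated cells.

```r
# id and year columns of example_panel, recreated from the printed data
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023)
)
keys <- panel[, c("id", "year")]

n_units   <- length(unique(panel$id))    # number of units
n_periods <- length(unique(panel$year))  # assumed: distinct observed years
n_rows    <- nrow(panel)                 # observed rows
n_cells   <- nrow(unique(keys))          # distinct id-time cells

expected <- n_units * n_periods          # cells in a fully balanced panel
missing  <- expected - n_cells           # cells with no row at all

# Cells that appear more than once, each such cell counted once
dup_cells <- nrow(unique(keys[duplicated(keys), ]))

# Assumed definition: balanced means no missing and no duplicated cells
balanced <- missing == 0 && dup_cells == 0

c(units = n_units, periods = n_periods, rows = n_rows, cells = n_cells,
  expected = expected, missing = missing, duplicate_cells = dup_cells)
```

Because the expected periods are taken from the years actually observed, a year in which no unit appears at all would not be counted; panelbuild may define the time range differently.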
Duplicate unit-time observations are a common problem in panel datasets.
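As a minimal base-R sketch (an illustration, not how panelbuild finds duplicates), counting the rows in each id-time cell flags every row involved in a duplicate, rather than only the later occurrences that duplicated() would mark. The column name id_time_n is hypothetical; it loosely mirrors the panelbuild_id_time_n column that flag_panel_issues() adds.

```r
# id, year, and outcome columns of example_panel, recreated from the printed data
panel <- data.frame(
  id      = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year    = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome = c(10, 12, 13, 20, 25, 30, 31, 32, 33)
)

# Number of rows sharing each row's id-time cell (1 means the cell is unique)
panel$id_time_n <- ave(rep(1L, nrow(panel)), panel$id, panel$year, FUN = length)

# Every row that belongs to a duplicated cell, including the first occurrence
dups <- panel[panel$id_time_n > 1, ]
dups
```

Keeping both rows of the duplicated cell makes it easier to compare them (here, outcomes 12 and 13) before deciding which one to keep.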
gap_summary() identifies missing time periods by panel
unit.
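The idea behind a per-unit gap summary can be sketched in base R. This is not gap_summary()'s actual code or output format; it assumes the expected periods are the distinct years observed anywhere in the panel, and the columns n_missing and missing are hypothetical.

```r
# id and year columns of example_panel, recreated from the printed data
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023)
)

# Assumption: the expected periods are all years observed anywhere in the panel
all_years <- sort(unique(panel$year))

# For each unit, the expected years in which it never appears
missing_by_unit <- lapply(split(panel$year, panel$id),
                          function(y) setdiff(all_years, y))

gap_table <- data.frame(
  id        = as.numeric(names(missing_by_unit)),
  n_missing = lengths(missing_by_unit),
  missing   = vapply(missing_by_unit, paste, character(1), collapse = ", ")
)
gap_table
```

In this data, unit 2's missing 2021 is an interior gap between observed years, while the missing 2023 cells for units 1 and 2 fall after their last observation.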
flag_panel_issues() adds diagnostic flags to the
data.
flag_panel_issues(example_panel, id = id, time = year)
#> # A tibble: 9 × 7
#> id year outcome treatment panelbuild_row_id panelbuild_id_time_n
#> <dbl> <dbl> <dbl> <dbl> <int> <int>
#> 1 1 2020 10 0 1 1
#> 2 1 2021 12 1 2 2
#> 3 1 2021 13 1 3 2
#> 4 2 2020 20 0 4 1
#> 5 2 2022 25 1 5 1
#> 6 3 2020 30 0 6 1
#> 7 3 2021 31 0 7 1
#> 8 3 2022 32 1 8 1
#> 9 3 2023 33 1 9 1
#> # ℹ 1 more variable: panelbuild_duplicate_cell <lgl>

Rows 2 and 3 share the same id-time cell (unit 1 in 2021), so panelbuild_id_time_n is 2 for both.

complete_panel() creates a complete unit-time grid. It
does not impute missing outcome values.
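A rough base-R equivalent of completing the grid, assuming the id-time cells are already unique: expand every unit across every observed period, then left-join the data so that unobserved cells receive NA rather than imputed values. The flag columns original_row and completed_cell are hypothetical stand-ins for panelbuild_original_row and panelbuild_completed_cell; this is not complete_panel()'s implementation.

```r
# Deduplicated id, year, and outcome columns of example_panel (from the printed data)
panel <- data.frame(
  id      = c(1, 1, 2, 2, 3, 3, 3, 3),
  year    = c(2020, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome = c(10, 12, 20, 25, 30, 31, 32, 33)
)

# A complete grid assumes each id-time cell appears at most once
stopifnot(anyDuplicated(panel[, c("id", "year")]) == 0)

# Mark the rows that exist before completion
panel$original_row <- TRUE

# Every combination of unit and observed time period (3 x 4 = 12 cells)
grid <- expand.grid(id = sort(unique(panel$id)), year = sort(unique(panel$year)))

# Left join: grid cells without data get NA in every data column (no imputation)
completed <- merge(grid, panel, by = c("id", "year"), all.x = TRUE)
completed$original_row   <- !is.na(completed$original_row)
completed$completed_cell <- !completed$original_row

# Sort by unit, then time
completed <- completed[order(completed$id, completed$year), ]
rownames(completed) <- NULL
completed
```

Using merge() with all.x = TRUE keeps every grid cell, and the explicit order() call makes the final row order independent of how merge() sorts its result.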
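Because complete_panel() requires unique unit-time
cells, we first remove duplicate id-time observations from the example
dataset. With .keep_all = TRUE, dplyr::distinct() keeps the first row for
each id-year pair, so unit 1's 2021 observation with outcome 13 is dropped.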
example_panel_unique <- example_panel |>
dplyr::distinct(id, year, .keep_all = TRUE)
complete_panel(example_panel_unique, id = id, time = year)
#> # A tibble: 12 × 7
#> id year outcome treatment panelbuild_original_row panelbuild_completed_…¹
#> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
#> 1 1 2020 10 0 TRUE FALSE
#> 2 1 2021 12 1 TRUE FALSE
#> 3 1 2022 NA NA FALSE TRUE
#> 4 1 2023 NA NA FALSE TRUE
#> 5 2 2020 20 0 TRUE FALSE
#> 6 2 2021 NA NA FALSE TRUE
#> 7 2 2022 25 1 TRUE FALSE
#> 8 2 2023 NA NA FALSE TRUE
#> 9 3 2020 30 0 TRUE FALSE
#> 10 3 2021 31 0 TRUE FALSE
#> 11 3 2022 32 1 TRUE FALSE
#> 12 3 2023 33 1 TRUE FALSE
#> # ℹ abbreviated name: ¹panelbuild_completed_cell
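#> # ℹ 1 more variable: panelbuild_audit_action <chr>

A typical panelbuild workflow is: audit the panel with audit_panel(), inspect gaps with gap_summary() and duplicates with flag_panel_issues(), resolve duplicate id-time cells, and then build a complete grid with complete_panel().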
panelbuild is designed to provide a transparent and
reproducible workflow for panel-data quality assurance.
Use it before fitting panel models, difference-in-differences designs, event studies, or other longitudinal-data analyses.