The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting Started with panelbuild

Introduction

panelbuild provides tools for auditing, validating, and preparing panel datasets before statistical analysis.

Panel datasets often contain duplicate unit-time observations, missing time periods, irregular gaps, and imbalance. These issues can affect fixed effects models, difference-in-differences designs, event studies, and other panel-data methods.

The goal of panelbuild is to help users identify these issues before estimation.

Load the package

library(panelbuild)

Example panel dataset

panelbuild includes a small example dataset called example_panel.

data(example_panel)

example_panel
#>   id year outcome treatment
#> 1  1 2020      10         0
#> 2  1 2021      12         1
#> 3  1 2021      13         1
#> 4  2 2020      20         0
#> 5  2 2022      25         1
#> 6  3 2020      30         0
#> 7  3 2021      31         0
#> 8  3 2022      32         1
#> 9  3 2023      33         1

The dataset intentionally includes:

This makes it useful for demonstrating panel-data diagnostics.

Audit the panel

The main function is audit_panel().

audit_panel(example_panel, id = id, time = year)
#> Panel audit
#> 
#> Data: example_panel
#> Unit variable: id
#> Time variable: year
#> 
#> Units: 3
#> Time periods: 4
#> Observed rows: 9
#> Observed id-time cells: 8
#> Expected id-time cells: 12
#> Missing id-time cells: 4
#> Duplicate id-time cells: 1
#> Balanced panel: No

This gives a quick overview of the panel structure, including whether the panel is balanced and whether there are missing or duplicate unit-time cells.

Find duplicate observations

Duplicate unit-time observations are a common problem in panel datasets.

duplicate_summary(example_panel, id = id, time = year)
#> # A tibble: 1 × 3
#>      id panelbuild_duplicate_cells panelbuild_duplicate_extra_rows
#>   <dbl>                      <int>                           <int>
#> 1     1                          1                               1

Summarize gaps

gap_summary() identifies missing time periods by panel unit.

gap_summary(example_panel, id = id, time = year)
#> # A tibble: 2 × 2
#>      id panelbuild_missing_periods
#>   <dbl>                      <int>
#> 1     1                          2
#> 2     2                          2

Flag row-level issues

flag_panel_issues() adds diagnostic flags to the data.

flag_panel_issues(example_panel, id = id, time = year)
#> # A tibble: 9 × 7
#>      id  year outcome treatment panelbuild_row_id panelbuild_id_time_n
#>   <dbl> <dbl>   <dbl>     <dbl>             <int>                <int>
#> 1     1  2020      10         0                 1                    1
#> 2     1  2021      12         1                 2                    2
#> 3     1  2021      13         1                 3                    2
#> 4     2  2020      20         0                 4                    1
#> 5     2  2022      25         1                 5                    1
#> 6     3  2020      30         0                 6                    1
#> 7     3  2021      31         0                 7                    1
#> 8     3  2022      32         1                 8                    1
#> 9     3  2023      33         1                 9                    1
#> # ℹ 1 more variable: panelbuild_duplicate_cell <lgl>

Complete a panel grid

complete_panel() creates a complete unit-time grid. It does not impute missing outcome values.

Because complete_panel() requires unique unit-time cells, we first remove duplicate id-time observations from the example dataset.

example_panel_unique <- example_panel |>
  dplyr::distinct(id, year, .keep_all = TRUE)

complete_panel(example_panel_unique, id = id, time = year)
#> # A tibble: 12 × 7
#>       id  year outcome treatment panelbuild_original_row panelbuild_completed_…¹
#>    <dbl> <dbl>   <dbl>     <dbl> <lgl>                   <lgl>                  
#>  1     1  2020      10         0 TRUE                    FALSE                  
#>  2     1  2021      12         1 TRUE                    FALSE                  
#>  3     1  2022      NA        NA FALSE                   TRUE                   
#>  4     1  2023      NA        NA FALSE                   TRUE                   
#>  5     2  2020      20         0 TRUE                    FALSE                  
#>  6     2  2021      NA        NA FALSE                   TRUE                   
#>  7     2  2022      25         1 TRUE                    FALSE                  
#>  8     2  2023      NA        NA FALSE                   TRUE                   
#>  9     3  2020      30         0 TRUE                    FALSE                  
#> 10     3  2021      31         0 TRUE                    FALSE                  
#> 11     3  2022      32         1 TRUE                    FALSE                  
#> 12     3  2023      33         1 TRUE                    FALSE                  
#> # ℹ abbreviated name: ¹​panelbuild_completed_cell
#> # ℹ 1 more variable: panelbuild_audit_action <chr>

Typical workflow

A typical panelbuild workflow is:

library(panelbuild)

audit_panel(my_data, id = unit_id, time = year)

duplicate_summary(my_data, id = unit_id, time = year)

gap_summary(my_data, id = unit_id, time = year)

clean_data <- my_data |>
  dplyr::distinct(unit_id, year, .keep_all = TRUE)

complete_panel(clean_data, id = unit_id, time = year)

Summary

panelbuild is designed to provide a transparent and reproducible workflow for panel-data quality assurance.

Use it before fitting panel models, difference-in-differences designs, event studies, or other longitudinal-data analyses.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.