| Title: | Regression Models and Utilities for Repeated Measures and Panel Data |
| Version: | 1.0.0 |
| Author: | Jacob A. Long |
| Maintainer: | Jacob A. Long <jacob.long@sc.edu> |
| Description: | Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the "within-between" (also known as "between-within" and "hybrid") panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via 'Stan'. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE. |
| URL: | https://panelr.jacob-long.com |
| BugReports: | https://github.com/jacob-long/panelr/issues |
| Depends: | R (≥ 3.4.0), lme4 |
| Imports: | crayon, dplyr, Formula, ggplot2, jtools (≥ 2.3.1), lmerTest, magrittr, methods, purrr, rlang (≥ 0.3.0), stringr, tibble (≥ 2.0.0), tidyr (≥ 0.8.3), reformulas (≥ 0.4.2), vctrs (≥ 0.4.0) |
| Suggests: | AER, brms, broom.mixed, car, clubSandwich, geepack, generics, nlme, plm, sandwich, skimr, splines, testthat, covr, knitr, rmarkdown |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-01-20 18:52:37 UTC; jacoblong |
| Repository: | CRAN |
| Date/Publication: | 2026-01-21 08:00:06 UTC |
Interaction configuration
Description
S3 class to encapsulate interaction processing settings. Replaces the scattered boolean flags (demean.ints, old.ints, detrend).
Usage
InteractionConfig(
style = c("double-demean", "demean", "raw"),
model_type = "w-b",
detrend = FALSE
)
Arguments
style |
Character: "double-demean", "demean", or "raw" |
model_type |
Character: model type (e.g., "w-b", "within", "between") |
detrend |
Logical: whether detrending is being used |
Value
An InteractionConfig S3 object
WBFormula class for within-between model formula representation
Description
S3 class that represents a parsed within-between formula. This provides a structured intermediate representation between the user-specified formula and the final lme4 formula.
Constructor for the WBFormula S3 class.
Usage
WBFormula(
raw_formula,
dv,
varying = character(0),
constants = character(0),
v_info = NULL,
wint_labs = NULL,
cint_labs = NULL,
bint_labs = NULL,
ranefs = NULL,
data = NULL,
allvars = NULL,
conds = NULL,
matrix_terms = NULL
)
Arguments
raw_formula |
The original Formula object |
dv |
The dependent variable name |
varying |
Character vector of time-varying predictor terms |
constants |
Character vector of time-invariant predictor terms |
v_info |
Tibble with columns: term, root, lag, meanvar |
wint_labs |
Character vector of within x within interaction labels |
cint_labs |
Character vector of cross-level interaction labels |
bint_labs |
Character vector of between x between interaction labels |
ranefs |
Character vector of random effects specifications |
data |
The data frame (with any expanded factors) |
allvars |
Character vector of all variables needed (passed from parser) |
conds |
Integer number of formula conditions/parts |
matrix_terms |
Optional list of metadata for matrix-returning terms
detected in the varying part of the formula (e.g., |
Value
A WBFormula S3 object
Create WBFormula from parser output (for migration)
Description
Create WBFormula from parser output (for migration)
Usage
WBFormula_from_parser(pf, formula, dv)
Arguments
pf |
List output from wb_formula_parser() |
formula |
The original Formula object |
dv |
The dependent variable name |
Value
A WBFormula object
Earnings data from the Panel Study of Income Dynamics
Description
These data come from the years 1976-1982 in the Panel Study of Income Dynamics (PSID), with information about the demographics and earnings of 595 individuals.
Usage
WageData
Format
A data frame with 4165 rows and 14 variables:
- id
Unique identifier for each survey respondent
- t
A number corresponding to each wave of the survey, 1 through 7
- wks
Weeks worked in the past year
- lwage
Natural logarithm of earnings in the past year
- union
Binary indicator for whether respondent is a union member (= 1)
- ms
Binary indicator for whether respondent is married (1 = married)
- occ
Binary indicator for whether respondent is a blue collar (= 0) or white collar (= 1) worker.
- ind
Binary indicator for whether respondent works in manufacturing (= 1)
- south
Binary indicator for whether respondent lives in the South (= 1)
- smsa
Binary indicator for whether respondent lives in a standard metropolitan area (SMSA; = 1)
- fem
Binary indicator for whether respondent is female (= 1)
- blk
Binary indicator for whether respondent is African-American (= 1)
- ed
Years of education
- exp
Years in the workforce.
Source
These data are all over the place. This particular file was downloaded from Richard Williams at https://www3.nd.edu/~rwilliam/statafiles/wages.dta, though he doesn't claim ownership of these data.
The data were shared as a supplement to Baltagi (2005) at https://www.wiley.com/legacy/wileychi/baltagi3e/data_sets.html.
They were also shared as a supplement to Greene (2008) at https://pages.stern.nyu.edu/~wgreene/Text/Edition6/tablelist6.htm.
The data are also available in numerous other locations, including in
slightly different formats as Wages in the plm
package and PSID7682 in the AER package.
Check if variables are constant or variable over time.
Description
This function is designed for use with panel_data() objects.
Usage
are_varying(data, ..., type = "time")
Arguments
data |
A data frame, typically of |
... |
Variable names. If none are given, all variables are checked. |
type |
Check for variance over time or across individuals? Default
is |
Value
A named logical vector. If TRUE, the variable is varying.
Examples
wages <- panel_data(WageData, id = id, wave = t)
wages %>% are_varying(occ, ind, fem, blk)
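The idea behind the time-varying check can be sketched in base R. This is only an illustration on made-up data, not the package's implementation; the helper name `varies_over_time` and the `toy` data frame are hypothetical.

```r
# Toy long-format panel: 2 respondents, 3 waves each (hypothetical data)
toy <- data.frame(
  id   = rep(1:2, each = 3),
  wave = rep(1:3, times = 2),
  fem  = c(0, 0, 0, 1, 1, 1),      # constant within each id
  wks  = c(40, 42, 40, 30, 30, 35) # changes within each id
)

# A variable varies over time if at least one id has more than one
# distinct value of it
varies_over_time <- function(data, id, vars) {
  vapply(vars, function(v) {
    any(tapply(data[[v]], data[[id]], function(x) length(unique(x)) > 1))
  }, logical(1))
}

res <- varies_over_time(toy, "id", c("fem", "wks"))
res
```

The result is a named logical vector, with TRUE marking variables that change within at least one entity.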
Convert WBFormula to list (for backward compatibility)
Description
Convert WBFormula to list (for backward compatibility)
Usage
as_parser_list(x)
Arguments
x |
A WBFormula object |
Value
A list with the same structure as wb_formula_parser() output
Estimate asymmetric effects models using first differences
Description
The function fits the asymmetric effects first difference model described in Allison (2019) using GLS estimation.
Usage
asym(
formula,
data,
id = NULL,
wave = NULL,
use.wave = FALSE,
min.waves = 1,
variance = c("toeplitz-1", "constrained", "unconstrained"),
error.type = c("CR2", "CR1S"),
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
variance |
One of |
error.type |
Either "CR2" or "CR1S". See the |
... |
Ignored. |
References
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
Examples
## Not run:
data("teen_poverty")
# Convert to long format
teen <- long_panel(teen_poverty, begin = 1, end = 5)
model <- asym(hours ~ lag(pov) + spouse, data = teen)
summary(model)
## End(Not run)
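The core data transformation behind an asymmetric effects model is to split each first difference into a part that captures increases and a part that captures decreases. The base R sketch below shows that split for one entity's made-up series; the sign convention for the negative part (stored as a positive magnitude) is an assumption, and the GLS estimation step is not covered.

```r
# One entity's predictor across 5 waves (hypothetical values)
x <- c(1, 3, 2, 2, 5)

dx  <- diff(x)        # first differences between adjacent waves
pos <- pmax(dx, 0)    # keeps increases, zero otherwise
neg <- pmax(-dx, 0)   # keeps decreases as positive magnitudes, zero otherwise

# The two parts recombine into the original difference
all(pos - neg == dx)
```

Entering `pos` and `neg` as separate predictors is what lets increases and decreases have different coefficients.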
Asymmetric effects models fit with GEE
Description
Fit asymmetric effects models for panel data via generalized estimating equations.
Usage
asym_gee(
formula,
data,
id = NULL,
wave = NULL,
cor.str = c("ar1", "exchangeable", "unstructured"),
use.wave = FALSE,
wave.factor = FALSE,
min.waves = 1,
family = gaussian,
weights = NULL,
offset = NULL,
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
cor.str |
Any correlation structure accepted by |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
family |
Use this to specify GLM link families. Default is |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
... |
Additional arguments provided to |
Details
See the documentation for wbm() for many details on formula syntax and
other arguments.
Value
An asym_gee object, which inherits from wbgee and geeglm.
Author(s)
Jacob A. Long
References
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data without multilevel models. Multivariate Behavioral Research, Advance online publication. https://doi.org/10.1080/00273171.2019.1602504
McNeish, D., Stapleton, L. M., & Silverman, R. D. (2016). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22, 114-140. https://doi.org/10.1037/met0000078
Examples
if (requireNamespace("geepack")) {
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- asym_gee(lwage ~ lag(union) + wks, data = wages)
summary(model)
}
Balance panel data by filling gaps
Description
This function makes implicit missing values explicit by adding rows with NA values for entity-wave combinations that are not present in the data.
Usage
balance_panel(data, ...)
Arguments
data |
A |
... |
Optional fill values specified as |
Details
Panel data often has implicit gaps where certain entities are not observed in certain waves. This function makes these gaps explicit by adding rows filled with NA values (or custom values if specified).
This is the inverse operation of removing incomplete cases. It can be useful for:
- Visualizing the pattern of missing data
- Using functions that require complete (balanced) panels
- Explicit handling of missing waves in models
Value
A panel_data frame with all entity-wave combinations present.
See Also
has_gaps(), scan_gaps(), complete_data()
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
# Create data with gaps
wages_gaps <- wages[!(wages$t == 3 & wages$id == wages$id[1]), ]
nrow(wages_gaps) # Missing one row
# Balance the panel (add NA row)
wages_balanced <- balance_panel(wages_gaps)
nrow(wages_balanced) # Back to full size
# Balance with custom fill values
wages_balanced <- balance_panel(wages_gaps, wks = 0, union = 0)
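To show what "making implicit gaps explicit" means, here is a base R sketch on made-up data. It is not how balance_panel() is implemented; it only reproduces the idea with `expand.grid()` and `merge()`, and the `toy` data frame is hypothetical.

```r
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  wave = c(1, 2, 3, 1, 3),   # id 2 has no row for wave 2
  wks  = c(40, 42, 40, 30, 35)
)

# Every id-wave combination a balanced panel would contain
grid <- expand.grid(id = unique(toy$id), wave = unique(toy$wave))

# Left join onto the grid: unobserved combinations get NA
balanced <- merge(grid, toy, by = c("id", "wave"), all.x = TRUE)
balanced <- balanced[order(balanced$id, balanced$wave), ]
balanced
```

The row for id 2 at wave 2 now exists, with `wks` set to NA.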
Registry of known basis functions and their reproducible attributes
Description
Registry of known basis functions and their reproducible attributes
Usage
basis_function_registry
Format
An object of class list of length 3.
Utilities for handling basis expansion functions in formulas
Description
Helper functions for detecting and processing matrix-returning transformations like ns(), bs(), and poly() in within-between formulas.
Add backticks to names
Description
Add backticks to variable names for use in formulas or expressions. Handles NULL input and avoids double-backticking.
Usage
bt(x)
Arguments
x |
A character vector of variable names (or NULL) |
Value
A character vector with backticks added, or NULL if input was NULL
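A helper with the behavior described here could look roughly like the following. The function name `add_backticks` is hypothetical and the code is a sketch of the described behavior, not the package's source.

```r
add_backticks <- function(x) {
  if (is.null(x)) return(NULL)
  # Names already wrapped in backticks are left alone
  already <- grepl("^`.*`$", x)
  ifelse(already, x, paste0("`", x, "`"))
}

out <- add_backticks(c("lag(union)", "`wks`"))
out
```

Checking for an existing pair of backticks first is what prevents double-backticking when the helper is applied more than once.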
Conditionally add backticks based on syntax validity
Description
Add backticks only if the name is not a valid R syntactic name.
Usage
bt_if_needed(x, data = NULL)
Arguments
x |
A character string |
data |
Optional data frame to check if x exists as a column name |
Value
The name, potentially backticked
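One way to test for syntactic validity in base R is to compare a name with the output of `make.names()`, which rewrites anything that is not a valid name. The sketch below uses that test; the function name is hypothetical, and the optional `data` argument from the usage above is left out.

```r
backtick_if_needed <- function(x) {
  # make.names() returns its input unchanged only for valid, non-reserved names
  if (identical(make.names(x), x)) x else paste0("`", x, "`")
}

a <- backtick_if_needed("lwage")         # valid name, returned as is
b <- backtick_if_needed("hours worked")  # contains a space, gets backticks
c(a, b)
```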
Lightweight panel_data constructor
Description
Internal helper function for fast reconstruction of panel_data objects.
Unlike panel_data(), this does NOT validate, sort, or set up grouping
by default. It simply attaches attributes and class. Use for fast
reconstruction after operations that preserve the panel structure.
Usage
build_panel_data(
x,
id,
wave,
periods = NULL,
reshaped = NULL,
varying = NULL,
constants = NULL,
validate_order = FALSE
)
Arguments
x |
A data frame to convert |
id |
Name of the id column (string) |
wave |
Name of the wave column (string) |
periods |
Vector of time periods (optional) |
reshaped |
Logical indicating if data was reshaped (optional) |
varying |
Character vector of varying variable names (optional) |
constants |
Character vector of constant variable names (optional) |
validate_order |
If TRUE, check if data is sorted and re-sort if not. Default FALSE for speed. Set TRUE when row order might have changed. |
Details
Set validate_order = TRUE to check if data is sorted and fix if needed.
The check is O(n); sorting only happens if data is actually unsorted.
Value
A panel_data object
Filter out entities with too few observations
Description
This function allows you to define a minimum number of waves/periods and exclude all individuals with fewer observations than that.
Usage
complete_data(data, ..., formula = NULL, vars = NULL, min.waves = "all")
Arguments
data |
A |
... |
Optionally, unquoted variable names/expressions separated by
commas to be passed to |
formula |
A formula, like the one you'll be using to specify your model. |
vars |
As an alternative to formula, a vector of variable names. |
min.waves |
What is the minimum number of observations to be kept?
Default is |
Details
If ... (that is, unquoted variable name(s)) are included, then formula
and vars are ignored. Likewise, formula takes precedence over vars.
These are just different methods for selecting variables and you can choose
whichever you prefer/are comfortable with. ... corresponds with the
"tidyverse" way, formula is useful for programming or working with
model formulas, and vars is a "standard" evaluation method for when you
are working with strings.
Value
A panel_data frame.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
complete_data(wages, wks, lwage, min.waves = 3)
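The filtering rule can be illustrated in base R: count each entity's non-missing observations on the variables of interest and keep entities that meet the minimum. This is a sketch on made-up data, not the package's implementation; it only drops entities and leaves the remaining rows untouched.

```r
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  wave = rep(1:3, times = 3),
  wks  = c(40, 42, 40, 30, NA, NA, 20, 25, NA)
)

min_waves <- 2

# Number of non-missing observations per id
n_obs <- tapply(!is.na(toy$wks), toy$id, sum)

# Keep only ids with at least min_waves observations
keep_ids <- as.numeric(names(n_obs)[n_obs >= min_waves])
kept <- toy[toy$id %in% keep_ids, ]
unique(kept$id)
```

Here id 2 is dropped because it has only one non-missing value of `wks`.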
Check if any terms in a formula are matrix-returning
Description
Check if any terms in a formula are matrix-returning
Usage
detect_matrix_terms(terms, data)
Arguments
terms |
Character vector of formula terms |
data |
Data frame to evaluate against |
Value
Logical vector
Evaluate a basis function on pooled data and extract attributes
Description
Evaluate a basis function on pooled data and extract attributes
Usage
evaluate_basis_term(term, data)
Arguments
term |
Character string like "ns(age, df=3)" |
data |
Ungrouped data frame |
Value
List with components:
- result: The evaluated matrix
- attrs: Named list of reproducible attributes
- ncol: Number of columns
- fn_name: The function name
- var_name: The primary variable name
Expand a basis matrix into individual columns in a data frame
Description
Expand a basis matrix into individual columns in a data frame
Usage
expand_basis_columns(data, mat, colnames)
Arguments
data |
Data frame to add columns to |
mat |
Matrix to expand |
colnames |
Character vector of column names |
Value
Data frame with added columns
Expand matrix terms into data columns
Description
This function takes the processed matrix_terms from wb_formula_parser and creates the actual columns in the data frame for within and between components.
Usage
expand_matrix_terms_in_data(matrix_terms, data)
Arguments
matrix_terms |
List of processed matrix term info |
data |
panel_data frame |
Value
List with:
- data: Updated data frame with expanded columns
- within_cols: All within column names
- between_cols: All between column names
Extract the primary variable from a basis function call
Description
Extract the primary variable from a basis function call
Usage
extract_basis_variable(term)
Arguments
term |
Character string like "ns(age, df=3)" or "poly(x, degree=2)" |
Value
Character string of the variable name (e.g., "age", "x")
Extract the function name from a formula term
Description
Extract the function name from a formula term
Usage
extract_fn_name(term)
Arguments
term |
Character string representing a formula term like "ns(age, df=3)" |
Value
The function name (e.g., "ns") or NULL if not a function call
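Both the function name and the primary variable of a term can be recovered by parsing the term string into an R call, without evaluating it. The sketch below uses base R's `parse()` and `all.vars()`; it is an illustration of the idea rather than the package's code, and it assumes the function is not namespace-qualified (for example, not `splines::ns`).

```r
term <- "ns(age, df=3)"

# Turn the string into an unevaluated call; ns() itself is never run
expr <- parse(text = term)[[1]]

# The first element of a call is the function being called
fn_name <- if (is.call(expr)) as.character(expr[[1]]) else NULL

# all.vars() skips function names and argument names such as `df`
var_name <- all.vars(expr)[1]

c(fn_name, var_name)
```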
Extract variables from random effects terms
Description
Helper to extract variable names from random effects specifications
Usage
extract_ranef_vars(ranefs, data)
Arguments
ranefs |
Character vector of random effects terms |
data |
Data frame to check for variable existence |
Value
Character vector of variable names
Estimate first differences models using GLS
Description
The function fits first difference models using GLS estimation.
Usage
fdm(
formula,
data,
id = NULL,
wave = NULL,
use.wave = FALSE,
min.waves = 1,
variance = c("toeplitz-1", "constrained", "unconstrained"),
error.type = c("CR2", "CR1S"),
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
variance |
One of |
error.type |
Either "CR2" or "CR1S". See the |
... |
Ignored. |
References
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
Examples
if (requireNamespace("clubSandwich")) {
data("teen_poverty")
# Convert to long format
teen <- long_panel(teen_poverty, begin = 1, end = 5)
model <- fdm(hours ~ lag(pov) + spouse, data = teen)
summary(model)
}
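The first-difference transformation itself is simple to show in base R. The sketch below differences a variable within each entity on made-up data; it covers only the differencing step, not the GLS estimation or the variance structures, and the column name `d_hours` is hypothetical.

```r
toy <- data.frame(
  id    = rep(1:2, each = 3),
  wave  = rep(1:3, times = 2),
  hours = c(10, 12, 15, 20, 18, 18)
)

# Difference within each id; the first wave of every id has no
# predecessor, so its difference is NA
toy$d_hours <- ave(toy$hours, toy$id, FUN = function(x) c(NA, diff(x)))
toy
```

Differencing removes anything that is constant within an entity, which is why time-invariant predictors drop out of such models.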
Retrieve model formulas from wbm objects
Description
This S3 method allows you to retrieve the formula used to
fit wbm objects.
Usage
## S3 method for class 'wbm'
formula(x, raw = FALSE, ...)
Arguments
x |
A |
raw |
Return the formula used in the call to |
... |
further arguments passed to or from other methods. |
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
# Returns the original model formula rather than the one sent to lme4
formula(model)
Generate column names for expanded basis matrix
Description
Generate column names for expanded basis matrix
Usage
generate_basis_colnames(fn_name, var_name, ncol, suffix = NULL)
Arguments
fn_name |
Function name (e.g., "ns") |
var_name |
Variable name (e.g., "age") |
ncol |
Number of columns |
suffix |
Suffix for within ("w") or between ("b") |
Value
Character vector of column names
Get all interaction labels from WBFormula
Description
Get all interaction labels from WBFormula
Usage
get_interactions.WBFormula(x, type = c("all", "within", "cross", "between"))
Arguments
x |
A WBFormula object |
type |
One of "all", "within", "cross", or "between" |
Value
Character vector of interaction labels
Get mean variable name for a term
Description
Get mean variable name for a term
Usage
get_meanvar(x, term)
Arguments
x |
A WBFormula object |
term |
The term to look up |
Value
The mean variable name, or NULL if not found
Retrieve panel_data metadata
Description
get_id(), get_wave(), and get_periods() are extractor
functions that can be used to retrieve the names of the id and wave
variables or time periods of a panel_data frame.
Usage
get_wave(data)
get_id(data)
get_periods(data)
Arguments
data |
A |
Value
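The name of the id variable (get_id()), the name of the wave variable (get_wave()), or the time periods (get_periods()) of the panel_data frame.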
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
get_wave(wages)
get_id(wages)
get_periods(wages)
Check if panel data has gaps
Description
This function checks whether a panel_data() object has implicit gaps
(missing rows for some entity-wave combinations).
Usage
has_gaps(data)
Arguments
data |
A |
Value
A logical value. TRUE if there are gaps, FALSE otherwise.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
has_gaps(wages) # FALSE (complete data)
# Create data with gaps
wages_gaps <- wages[wages$t != 3 | wages$id != wages$id[1], ]
has_gaps(wages_gaps) # TRUE
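A gap check of this kind can be sketched in base R by comparing the number of observed id-wave pairs with the number a balanced panel would have. The helper name `panel_has_gaps` is hypothetical, and the sketch assumes every wave is observed for at least one entity.

```r
panel_has_gaps <- function(data, id, wave) {
  # Rows a fully balanced panel would contain
  n_expected <- length(unique(data[[id]])) * length(unique(data[[wave]]))
  # Distinct id-wave pairs actually present
  n_present <- nrow(unique(data[c(id, wave)]))
  n_present < n_expected
}

complete <- data.frame(id = rep(1:2, each = 3), wave = rep(1:3, times = 2))
gappy    <- complete[-5, ]   # drop id 2, wave 2

panel_has_gaps(complete, "id", "wave")
panel_has_gaps(gappy, "id", "wave")
```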
Check if WBFormula has interactions
Description
Check if WBFormula has interactions
Usage
has_interactions(x)
Arguments
x |
A WBFormula object |
Value
Logical indicating if any interactions are present
Estimate Heise stability and reliability coefficients
Description
This function uses three waves of data to estimate stability and reliability coefficients as described in Heise (1969).
Usage
heise(data, ..., waves = NULL)
Arguments
data |
A |
... |
unquoted variable names that are passed to |
waves |
Which 3 waves should be used? If NULL (the default), the first, middle, and last waves are used. |
Value
A tibble with reliability (rel), waves 1-3 stability (stab13),
waves 1-2 stability (stab12), and waves 2-3 stability (stab23) and
the variable these values refer to (var).
References
Heise, D. R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34, 93–101. https://doi.org/10.2307/2092790
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
heise(wages, wks, lwage) # will use waves 1, 4, and 7 by default
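The coefficients can be computed by hand from the three pairwise test-retest correlations. The sketch below applies the standard Heise (1969) formulas to made-up correlations; it is meant to show the arithmetic, not to reproduce the function's internals.

```r
# Hypothetical correlations of one variable across three waves
r12 <- 0.60
r23 <- 0.65
r13 <- 0.50

rel    <- r12 * r23 / r13       # reliability
stab12 <- r13 / r23             # stability, wave 1 to wave 2
stab23 <- r13 / r12             # stability, wave 2 to wave 3
stab13 <- r13^2 / (r12 * r23)   # stability, wave 1 to wave 3

c(rel = rel, stab12 = stab12, stab23 = stab23, stab13 = stab13)
```

Two identities follow from the model: each observed correlation equals reliability times the matching stability (for example, `rel * stab12` gives back `r12`), and `stab13` is the product of `stab12` and `stab23`.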
Check if a function is a known basis function
Description
Check if a function is a known basis function
Usage
is_known_basis_fn(fn_name)
Arguments
fn_name |
Character string of function name |
Value
Logical
Check if a term returns a matrix when evaluated
Description
Check if a term returns a matrix when evaluated
Usage
is_matrix_term(term, data)
Arguments
term |
Character string representing a formula term |
data |
Data frame to evaluate against (should be ungrouped) |
Value
Logical indicating if the term returns a matrix
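A check like this can be approximated in base R by evaluating the term against the data and testing the result with `is.matrix()`. The helper name `returns_matrix` is hypothetical, and treating evaluation errors as "not a matrix" is an assumption of this sketch.

```r
returns_matrix <- function(term, data) {
  value <- tryCatch(
    # Evaluate the parsed term with the data frame's columns in scope
    eval(parse(text = term)[[1]], envir = data, enclos = globalenv()),
    error = function(e) NULL
  )
  is.matrix(value)
}

toy <- data.frame(age = c(20, 25, 30, 35, 40))

returns_matrix("poly(age, 2)", toy)  # TRUE: poly() returns a 2-column matrix
returns_matrix("log(age)", toy)      # FALSE: log() returns a plain vector
```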
Check if object is panel_data
Description
This is a convenience function that checks whether an object
is a panel_data object.
Usage
is_panel(x)
Arguments
x |
Any object. |
Examples
data("WageData")
is_panel(WageData) # FALSE
wages <- panel_data(WageData, id = id, wave = t)
is_panel(wages) # TRUE
Check if panel data is properly sorted
Description
Internal function that checks if a data frame is sorted by id (grouped), then by wave within each id. This is O(n) - just one pass through the data, much cheaper than O(n log n) sorting.
Usage
is_panel_sorted(x, id, wave)
Arguments
x |
A data frame |
id |
Name of the id column (string) |
wave |
Name of the wave column (string) |
Value
TRUE if properly sorted, FALSE otherwise
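A single-pass check of this kind can be sketched with vectorized comparisons of adjacent rows. This is an illustration under stated assumptions, not the package's code: the helper name is hypothetical, ids and waves are assumed to contain no missing values, and waves are assumed to be strictly increasing within an id.

```r
panel_is_sorted <- function(x, id, wave) {
  ids   <- x[[id]]
  waves <- x[[wave]]
  n <- length(ids)
  if (n < 2) return(TRUE)

  # Does each row belong to the same id as the row before it?
  same_id <- ids[-1] == ids[-n]

  # Within an id, each wave must be later than the previous one
  waves_ok <- all(waves[-1][same_id] > waves[-n][same_id])

  # Each id must form one contiguous block of rows
  blocks_ok <- (sum(!same_id) + 1) == length(unique(ids))

  waves_ok && blocks_ok
}

sorted      <- data.frame(id = c(1, 1, 2, 2), wave = c(1, 2, 1, 2))
interleaved <- data.frame(id = c(1, 2, 1, 2), wave = c(1, 1, 2, 2))

panel_is_sorted(sorted, "id", "wave")
panel_is_sorted(interleaved, "id", "wave")
```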
Check if a variable is time-varying in WBFormula
Description
Check if a variable is time-varying in WBFormula
Usage
is_varying_term(x, var)
Arguments
x |
A WBFormula object |
var |
Variable name to check |
Value
Logical
Check if model uses within-transformation
Description
Determine if the model type requires de-meaning of variables
Usage
is_within_model(config)
Arguments
config |
An InteractionConfig object (or model_type string for backward compatibility) |
Value
Logical indicating whether this is a within-type model
Plot trends in longitudinal variables
Description
line_plot allows for flexible visualization of repeated
measures variables from panel_data frames.
Usage
line_plot(
data,
var,
id = NULL,
wave = NULL,
overlay = TRUE,
show.points = TRUE,
subset.ids = FALSE,
n.random.subset = 9,
add.mean = FALSE,
mean.function = "lm",
line.size = 1,
alpha = if (overlay) 0.5 else 1
)
Arguments
data |
Either a |
var |
The unquoted name of the variable of interest. |
id |
If |
wave |
If |
overlay |
Should the lines be plotted in the same panel or each in their own facet/panel? Default is TRUE, meaning they are plotted in the same panel. |
show.points |
Plot a point at each wave? Default is TRUE. |
subset.ids |
Plot only a subset of the entities' lines? Default is FALSE,
meaning plot all ids. If TRUE, a random subset (the number defined by
|
n.random.subset |
How many entities to randomly sample when |
add.mean |
Add a line representing the mean trend? Default is FALSE.
Cannot be combined with |
mean.function |
The mean function to supply to |
line.size |
The thickness of the plotted lines. Default: 1 |
alpha |
The transparency for the lines and points. When
|
Value
The ggplot object.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
line_plot(wages, lwage, add.mean = TRUE, subset.ids = TRUE, overlay = FALSE)
Convert wide panels to long format
Description
This function takes wide format panels as input and converts them to long format.
Usage
long_panel(
data,
prefix = NULL,
suffix = NULL,
begin = NULL,
end = NULL,
id = "id",
wave = "wave",
periods = NULL,
label_location = c("end", "beginning"),
as_panel_data = TRUE,
match = ".*",
use.regex = FALSE,
check.varying = TRUE
)
Arguments
data |
The wide data frame. |
prefix |
What character(s) go before the period indicator? If none, set this argument to NULL. |
suffix |
What character(s) go after the period indicator? If none, set this argument to NULL. |
begin |
What is the label for the first period? Could be |
end |
What is the label for the final period? Could be |
id |
The name of the ID variable as a string. If there is no ID variable, then this will be the name of the newly-created ID variable. |
wave |
This will be the name of the newly-created wave variable. |
periods |
If your period indicator does not lie in a sequence or is
not understood by the function, then you can supply them as a vector
instead. For instance, you could give |
label_location |
Where does the period label go on the variable?
If the variables are labeled like |
as_panel_data |
Should the return object be a |
match |
The regex that will match the part of the variable names other
than the wave indicator. By default it will match any character any
amount of times. Sometimes you might know that the variable names should
start with a digit, for instance, and you might use |
use.regex |
Should the |
check.varying |
Should the function check to make sure that every variable in the wide data with a wave indicator is actually time-varying? Default is TRUE, meaning that a constant like "race_W1" only measured in wave 1 will be defined in each wave in the long data. With very large datasets, however, sometimes setting this to FALSE can save memory. |
Details
There is no easy way to convert panel data from wide to long format because both formats are basically non-standard for other applications. This function can handle the common case in which the wide data frame has a regular labeling system for each period. The key thing is providing enough information for the function to understand the pattern.
In the end, this function calls stats::reshape() but should be easier
to use and able to handle more situations, such as when the label occurs
at the beginning of the variable name. Also, just as important, this
function has built-in utilities to handle unbalanced data, that is, when
variables occur more than once but not in every single period, which breaks
stats::reshape().
Value
Either a data.frame or panel_data frame.
Examples
## We need a wide data frame, so we will make one from the long-format
## data included in the package.
# Convert WageData to panel_data object
wages <- panel_data(WageData, id = id, wave = t)
# Convert wages to wide format
wide_wages <- widen_panel(wages)
# Note: wide_wages has variables in the following format:
# var1_1, var1_2, var1_3, var2_1, var2_2, var2_3, etc.
## Not run:
long_wages <- long_panel(wide_wages, prefix = "_", begin = 1, end = 7,
id = "id", label_location = "end")
## End(Not run)
# Note that in this case, the id and label_location arguments are
# the defaults but are included just for clarity.
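For comparison, this is roughly what the same kind of conversion looks like when calling `stats::reshape()` directly on a small, balanced wide data frame. The `wide` data are made up, and the sketch covers only the simple case where every time-varying variable is present in every period.

```r
wide <- data.frame(
  id    = 1:2,
  fem   = c(0, 1),
  wks_1 = c(40, 30),
  wks_2 = c(42, 30),
  wks_3 = c(40, 35)
)

long <- reshape(
  wide,
  direction = "long",
  varying   = c("wks_1", "wks_2", "wks_3"),  # columns that hold repeated measures
  v.names   = "wks",                         # name of the stacked variable
  timevar   = "wave",                        # name of the new period column
  times     = 1:3,
  idvar     = "id"
)
long <- long[order(long$id, long$wave), ]
long
```

Each time-varying variable has to be spelled out by hand here, and that is the bookkeeping the prefix, suffix, begin, and end arguments are meant to replace.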
Generate differenced and asymmetric effects data
Description
This is an interface to the internal functions that process data for
fdm(), asym(), and asym_gee().
Usage
make_diff_data(
formula,
data,
id = NULL,
wave = NULL,
use.wave = FALSE,
min.waves = 1,
weights = NULL,
offset = NULL,
asym = FALSE,
cumulative = FALSE,
escape.names = FALSE,
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
asym |
Return asymmetric effects transformed data? Default is FALSE. |
cumulative |
Return cumulative positive/negative differences, most useful for fixed effects estimation and/or generalized linear models? Default is FALSE. |
escape.names |
Return only syntactically valid variable names? Default is FALSE. |
... |
Ignored. |
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
make_diff_data(wks ~ lwage + union, data = wages)
Create InteractionConfig from wbm() arguments
Description
Factory function to create InteractionConfig from the arguments passed to wbm() or wbgee().
Usage
make_interaction_config(interaction.style, model, detrend)
Arguments
interaction.style |
The interaction.style argument |
model |
The model argument |
detrend |
The detrend argument |
Value
An InteractionConfig object
Prepare data for within-between modeling
Description
This function allows users to make the changes to their data
that occur in wbm() without having to fit the model.
Usage
make_wb_data(
formula,
data,
id = NULL,
wave = NULL,
model = "w-b",
detrend = FALSE,
use.wave = FALSE,
wave.factor = FALSE,
min.waves = 2,
balance.correction = FALSE,
dt.random = TRUE,
dt.order = 1,
weights = NULL,
offset = NULL,
interaction.style = c("double-demean", "demean", "raw"),
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
... |
Additional arguments provided to |
Value
A panel_data object with the requested specification.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
make_wb_data(lwage ~ wks + union | fem, data = wages)
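The basic within-between decomposition of a time-varying predictor can be shown in base R: the between component is each entity's mean over time, and the within component is the deviation from that mean. The data and the column names `wks_between` and `wks_within` are hypothetical, and the sketch ignores detrending, balance correction, and interactions.

```r
toy <- data.frame(
  id   = rep(1:2, each = 3),
  wave = rep(1:3, times = 2),
  wks  = c(40, 42, 44, 30, 30, 36)
)

# Between component: each id's mean over its waves
toy$wks_between <- ave(toy$wks, toy$id)

# Within component: deviation of each observation from its id's mean
toy$wks_within <- toy$wks - toy$wks_between
toy
```

By construction the within component sums to zero inside every entity, so it carries no between-entity variation.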
Make model frames for panel_data objects
Description
This is similar to model.frame, but is designed specifically
for panel_data() data frames. It's a workhorse in wbm()
but may be useful in scripting use as well.
Usage
model_frame(formula, data)
Arguments
formula |
A formula. Note that to get an individual-level mean with
incomplete data (e.g., panel attrition), you should use |
data |
A |
Value
A panel_data() frame with only the columns needed to fit
a model as described by the formula.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model_frame(lwage ~ wks + exp, data = wages)
National Longitudinal Survey of Youth data
Description
These data come from the years 1990-1994 in the National Longitudinal Survey of Youth, with information about 581 individuals. These data are in the "wide" format for demonstration purposes.
Usage
nlsy
Format
A data frame with 581 rows and 16 variables:
- momage
Mother's age at birth
- gender
0 if boy, 1 if girl
- momwork
1 if mother works, 0 if not
- married
1 if parents are married, 0 if not
- hispanic
1 if child is Hispanic, 0 if not
- black
1 if child is black, 0 if not
- childage
Child's age at first interview
- anti90
A measure of antisocial behavior on a scale from 0 to 6, taken in 1990
- anti92
A measure of antisocial behavior on a scale from 0 to 6, taken in 1992
- anti94
A measure of antisocial behavior on a scale from 0 to 6, taken in 1994
- self90
A measure of self-esteem on a scale from 6 to 24, taken in 1990
- self92
A measure of self-esteem on a scale from 6 to 24, taken in 1992
- self94
A measure of self-esteem on a scale from 6 to 24, taken in 1994
- pov90
1 if family is in poverty, 0 if not, in 1990
- pov92
1 if family is in poverty, 0 if not, in 1992
- pov94
1 if family is in poverty, 0 if not, in 1994
Source
These data originate with the U.S. Department of Labor. The particular subset used here comes from Paul Allison via Statistical Horizons: https://statisticalhorizons.com/wp-content/uploads/nlsy.dta
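Because these data are stored in the "wide" format, they need to be reshaped to one row per child per wave before they can serve as panel data. The following is a minimal sketch of that idea using base R's reshape() on a small invented data frame; the column names only mimic nlsy, the values are made up, and this is not necessarily how panelr itself handles the conversion.

```r
# Toy wide-format data: one row per child, one anti* column per wave.
# Values are invented for illustration only.
wide <- data.frame(
  id     = 1:3,
  gender = c(0, 1, 1),
  anti90 = c(1, 0, 2),
  anti92 = c(1, 1, 3),
  anti94 = c(0, 1, 3)
)

# Stack the three anti* columns into a single "anti" column,
# recording the survey year in a new "year" column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("anti90", "anti92", "anti94"),
  v.names   = "anti",
  timevar   = "year",
  times     = c(90, 92, 94),
  idvar     = "id"
)

# Sort so each child's waves are adjacent.
long <- long[order(long$id, long$year), ]
```

The resulting long data frame has one row per id-year combination, which is the layout panel_data() expects through its id and wave arguments.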
Number of observations used in wbm models
Description
This S3 method allows you to retrieve either the number of
observations or number of entities in the data used to fit wbm objects.
Usage
## S3 method for class 'wbm'
nobs(object, entities = TRUE, ...)
Arguments
object |
a fitted model object. |
entities |
Should |
... |
further arguments to be passed to methods. |
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
nobs(model)
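The distinction between the two counts can be illustrated without fitting a model. This is a base R sketch on an invented long-format data frame, not the code the nobs method runs.

```r
# Toy long-format panel: entity 2 is observed in only two waves.
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  t  = c(1, 2, 3, 1, 2, 1, 2, 3)
)

n_rows     <- nrow(toy)               # number of observations (rows)
n_entities <- length(unique(toy$id))  # number of distinct entities
```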
Create panel data frames
Description
Format your data for use with panelr.
Usage
panel_data(data, id = id, wave = wave, ...)
as_pdata.frame(data)
as_panel_data(data, ...)
## Default S3 method:
as_panel_data(data, id = id, wave = wave, ...)
## S3 method for class 'pdata.frame'
as_panel_data(data, ...)
as_panel(data, ...)
Arguments
data |
A data frame. |
id |
The name of the column (unquoted) that identifies
participants/entities. A new column will be created called |
wave |
The name of the column (unquoted) that identifies
waves or periods. A new column will be created called |
... |
Attributes for adding onto this method. See
|
Value
A panel_data object.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
Internal vctrs methods
Description
These methods are the extensions that allow panel_data objects to work with vctrs and modern dplyr/tidyr operations.
Usage
## S3 method for class 'panel_data'
vec_restore(x, to, ...)
## S3 method for class 'panel_data'
vec_proxy(x, ...)
## S3 method for class 'panel_data'
vec_ptype2(x, y, ...)
## S3 method for class 'panel_data.panel_data'
vec_ptype2(x, y, ...)
## S3 method for class 'panel_data.data.frame'
vec_ptype2(x, y, ...)
## S3 method for class 'data.frame.panel_data'
vec_ptype2(x, y, ...)
## S3 method for class 'panel_data.tbl_df'
vec_ptype2(x, y, ...)
## S3 method for class 'tbl_df.panel_data'
vec_ptype2(x, y, ...)
## S3 method for class 'panel_data'
vec_cast(x, to, ...)
## S3 method for class 'panel_data.panel_data'
vec_cast(x, to, ...)
## S3 method for class 'panel_data.data.frame'
vec_cast(x, to, ...)
## S3 method for class 'panel_data.tbl_df'
vec_cast(x, to, ...)
## S3 method for class 'data.frame.panel_data'
vec_cast(x, to, ...)
## S3 method for class 'tbl_df.panel_data'
vec_cast(x, to, ...)
Predictions and simulations from within-between GEE models
Description
These methods facilitate fairly straightforward predictions
from wbgee models.
Usage
## S3 method for class 'wbgee'
predict(
object,
newdata = NULL,
type = c("link", "response"),
se.fit = FALSE,
raw = FALSE,
...
)
Arguments
object |
Object of class inheriting from |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
type |
Type of prediction (response or model term). Can be abbreviated. |
se.fit |
A switch indicating if standard errors are required. |
raw |
Is |
... |
further arguments passed to or from other methods. |
Examples
if (requireNamespace("geepack")) {
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbgee(lwage ~ lag(union) + wks, data = wages)
# By default, assumes you're using the processed data for newdata
predict(model)
}
Predictions and simulations from within-between models
Description
These methods facilitate fairly straightforward predictions
and simulations from wbm models.
Usage
## S3 method for class 'wbm'
predict(
object,
newdata = NULL,
se.fit = FALSE,
raw = FALSE,
use.re.var = FALSE,
re.form = NULL,
type = c("link", "response"),
allow.new.levels = TRUE,
na.action = na.pass,
...
)
## S3 method for class 'wbm'
simulate(
object,
nsim = 1,
seed = NULL,
use.u = FALSE,
newdata = NULL,
raw = FALSE,
newparams = NULL,
re.form = NA,
type = c("link", "response"),
allow.new.levels = FALSE,
na.action = na.pass,
...
)
Arguments
object |
a fitted model object |
newdata |
data frame for which to evaluate predictions. |
se.fit |
Include standard errors with the predictions? Note that these standard errors by default include only fixed effects variance. See details for more info. Default is FALSE. |
raw |
Is |
use.re.var |
If |
re.form |
(formula, |
type |
character string - either |
allow.new.levels |
logical if new levels (or NA values) in
|
na.action |
|
... |
When |
nsim |
positive integer scalar - the number of responses to simulate. |
seed |
an optional seed to be used in |
use.u |
(logical) if |
newparams |
new parameters to use in evaluating predictions,
specified as in the |
Details
For wbm models, predict() operates in two main modes:
- raw = FALSE (the default): newdata is treated as panel-style data. If it is not already a panel_data() object, it is converted using the id and wave variables from the original model. The within / between decomposition and any detrending are recomputed for newdata before passing the resulting design matrix to lme4 via jtools::predict_merMod() on the underlying merMod object.
- raw = TRUE: newdata is expected to already be on the "model matrix" scale used by the fitted wbm object, including internal columns such as imean(...) and any processed interaction terms. In this case, panelr does not recompute within / between pieces and simply forwards newdata to jtools::predict_merMod().
When newdata is not panel_data and raw = FALSE, predict.wbm() will
synthesize missing id or wave columns when possible in order to build a
valid panel structure (for example, when re.form = ~0). Informational
messages are emitted in these cases. For most within-between use cases it is
safer and more transparent to explicitly create a panel_data object with
the desired id and wave variables before calling predict().
For models fit with model = "within", predictions from predict.wbm()
reflect the within specification, which is parameterized using centered
within-unit effects and any specified between components. As a consequence,
predict(wbm_obj) for a within model is not in general identical to
predict(to_merMod(wbm_obj)) on the internal lmerMod / glmerMod object,
even when using the same re.form argument, because the fixed-effect
structure differs. This is by design: predict.wbm() always works on the
within-between representation defined by the original wbm() call, while
to_merMod() exposes the underlying mixed model fit directly.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
# By default, assumes you're using the processed data for newdata
predict(model)
Print method for WBFormula
Description
Print method for WBFormula
Usage
## S3 method for class 'WBFormula'
print(x, ...)
Arguments
x |
A WBFormula object |
... |
Additional arguments (ignored) |
Process a matrix term for within-between decomposition
Description
This is the main entry point for handling basis functions in formulas. It evaluates the term on pooled data, extracts attributes, and creates both within and between versions of the term.
Usage
process_matrix_term(term, data)
Arguments
term |
Character string like "ns(age, df=3)" |
data |
panel_data frame |
Value
List with:
data: Updated data frame with expanded columns
within_cols: Character vector of within column names
between_cols: Character vector of between column names
var_name: Original variable name
fn_name: Function name
Reconstruct a basis function call with a modified variable
Description
Reconstruct a basis function call with a modified variable
Usage
reconstruct_basis_call(term, new_var, attrs = NULL)
Arguments
term |
Original term like "ns(age, df=3)" |
new_var |
New variable expression like "age - imean(age)" |
attrs |
Optional list of attributes to add as arguments |
Value
Character string of the new call
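The idea of swapping the variable inside a basis call can be sketched in base R by parsing the term and substituting one symbol. The helper swap_var() below is hypothetical: its name and its old_var argument are assumptions made for illustration, it does not handle the attrs argument, and it is not panelr's implementation.

```r
# Hypothetical helper: replace one variable inside a call given as text.
swap_var <- function(term, old_var, new_var) {
  term_call   <- parse(text = term)[[1]]     # e.g. ns(age, df = 3)
  replacement <- parse(text = new_var)[[1]]  # e.g. age - imean(age)
  env <- stats::setNames(list(replacement), old_var)
  # substitute() swaps every occurrence of old_var for the replacement
  deparse(do.call(substitute, list(term_call, env)))
}

result <- swap_var("ns(age, df=3)", "age", "age - imean(age)")
```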
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Scan for gaps in panel data
Description
This function identifies which entity-wave combinations are missing
in a panel_data() object.
Usage
scan_gaps(data)
Arguments
data |
A |
Value
A tibble with columns for the id variable and wave variable, showing which combinations are missing. If there are no gaps, returns a tibble with zero rows.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
# Create data with gaps
wages_gaps <- wages[!(wages$t == 3 & wages$id == wages$id[1]), ]
scan_gaps(wages_gaps)
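Conceptually, finding gaps means comparing the observed entity-wave pairs against the full grid of possible pairs. Here is a base R sketch of that comparison on invented data; it illustrates the idea only and is not the code scan_gaps() runs.

```r
# Toy panel in which entity 2 is missing wave 2.
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  t  = c(1, 2, 3, 1, 3, 1, 2, 3)
)

# Every id-wave combination that could exist.
full_grid <- expand.grid(id = unique(toy$id), t = unique(toy$t))

# Keep the combinations that never appear in the data.
observed <- paste(toy$id, toy$t)
gaps <- full_grid[!(paste(full_grid$id, full_grid$t) %in% observed), ]
```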
Determine if interactions should be de-meaned
Description
Based on the interaction configuration, determine whether interaction terms should have their means subtracted.
Usage
should_demean_ints(config)
Arguments
config |
An InteractionConfig object |
Value
Logical indicating whether to demean interactions
Summarize panel data frames
Description
summary method for panel_data objects.
Usage
## S3 method for class 'panel_data'
summary(object, ..., by.wave = TRUE, by.id = FALSE, skim_with = NULL)
Arguments
object |
A |
... |
Optionally, unquoted variable names/expressions separated by
commas to be passed to |
by.wave |
(if |
by.id |
(if |
skim_with |
A closure from |
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
summary(wages, lwage, exp, wks)
National Longitudinal Survey of Youth teenage women poverty data
Description
These data come from the years 1979-1983 in the National Longitudinal Survey of Youth, with information about 1141 teenage women. These data are in the "wide" format for demonstration purposes.
Usage
teen_poverty
Format
A data frame with 1141 rows and 28 variables:
- id
Unique identifier for the respondent
- age
Age at first interview
- black
1 if subject is black, 0 if not
- pov1
1 if subject is in poverty, 0 if not, at time 1
- pov2
1 if subject is in poverty, 0 if not, at time 2
- pov3
1 if subject is in poverty, 0 if not, at time 3
- pov4
1 if subject is in poverty, 0 if not, at time 4
- pov5
1 if subject is in poverty, 0 if not, at time 5
- mother1
1 if subject has had a child, 0 if not, at time 1
- mother2
1 if subject has had a child, 0 if not, at time 2
- mother3
1 if subject has had a child, 0 if not, at time 3
- mother4
1 if subject has had a child, 0 if not, at time 4
- mother5
1 if subject has had a child, 0 if not, at time 5
- spouse1
1 if subject lives with a spouse, 0 if not, at time 1
- spouse2
1 if subject lives with a spouse, 0 if not, at time 2
- spouse3
1 if subject lives with a spouse, 0 if not, at time 3
- spouse4
1 if subject lives with a spouse, 0 if not, at time 4
- spouse5
1 if subject lives with a spouse, 0 if not, at time 5
- inschool1
1 if subject is in school, 0 if not, at time 1
- inschool2
1 if subject is in school, 0 if not, at time 2
- inschool3
1 if subject is in school, 0 if not, at time 3
- inschool4
1 if subject is in school, 0 if not, at time 4
- inschool5
1 if subject is in school, 0 if not, at time 5
- hours1
Hours worked during the week of the survey, at time 1
- hours2
Hours worked during the week of the survey, at time 2
- hours3
Hours worked during the week of the survey, at time 3
- hours4
Hours worked during the week of the survey, at time 4
- hours5
Hours worked during the week of the survey, at time 5
Source
These data originate with the U.S. Department of Labor. The particular subset used here comes from Paul Allison via Statistical Horizons: https://statisticalhorizons.com/wp-content/uploads/teenpov.dta
Tidy methods for fdm and asym models
Description
panelr provides methods to access fdm and asym data in a
tidy format.
Usage
## S3 method for class 'asym'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'fdm'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'fdm'
glance(x, ...)
Arguments
x |
An |
conf.int |
Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to |
conf.level |
The confidence level to use for the confidence interval if
|
... |
Ignored |
Examples
if (requireNamespace("clubSandwich")) {
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- fdm(lwage ~ wks + union, data = wages)
if (requireNamespace("generics")) {
generics::tidy(model)
}
}
Tidy methods for wbgee models
Description
panelr provides methods to access wbgee data in a tidy format.
Usage
## S3 method for class 'asym_gee'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'wbgee'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'wbgee'
glance(x, ...)
Arguments
x |
A |
conf.int |
Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to |
conf.level |
The confidence level to use for the confidence interval if
|
... |
Ignored |
Examples
if (requireNamespace("geepack")) {
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbgee(lwage ~ lag(union) + wks, data = wages)
if (requireNamespace("generics")) {
generics::tidy(model)
}
}
Tidy methods for wbm models
Description
panelr provides methods to access wbm data in a tidy format.
Usage
## S3 method for class 'wbm'
tidy(
x,
conf.int = FALSE,
conf.level = 0.95,
effects = c("fixed", "ran_pars"),
conf.method = "Wald",
ran_prefix = NULL,
...
)
## S3 method for class 'wbm'
glance(x, ...)
## S3 method for class 'summ.wbm'
glance(x, ...)
## S3 method for class 'summ.wbm'
tidy(x, ...)
Arguments
x |
An object of class |
conf.int |
whether to include a confidence interval |
conf.level |
confidence level for CI |
effects |
A character vector including one or more of "fixed"
(fixed-effect parameters); "ran_pars" (variances and covariances or
standard deviations and correlations of random effect terms);
"ran_vals" (conditional modes/BLUPs/latent variable estimates); or
"ran_coefs" (predicted parameter values for each group, as returned by
|
conf.method |
method for computing confidence intervals (see |
ran_prefix |
a length-2 character vector specifying the strings to use as prefixes for self- (variance/standard deviation) and cross- (covariance/correlation) random effects terms |
... |
Additional arguments (passed to |
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
if (requireNamespace("broom.mixed")) {
broom.mixed::tidy(model)
}
Remove backticks from names
Description
Remove all backticks from variable names. Useful for cleaning names after formula parsing.
Usage
un_bt(x)
Arguments
x |
A character vector potentially containing backticks |
Value
A character vector with backticks removed
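A one-line base R equivalent shows the behavior described. The name strip_backticks() is a hypothetical stand-in used for illustration and may differ from how un_bt() is actually written.

```r
# Hypothetical stand-in: drop every backtick from each element.
strip_backticks <- function(x) gsub("`", "", x, fixed = TRUE)

cleaned <- strip_backticks(c("`lag(union)`", "wks", "`my var`"))
```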
Convert panel_data to regular data frame
Description
This convenience function removes the special features of
panel_data.
Usage
unpanel(panel)
Arguments
panel |
A |
Value
An ungrouped tibble.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages_non_panel <- unpanel(wages)
Update parsed formula object for matrix terms
Description
Updates the v_info and other fields in pf to reflect the expanded matrix term columns.
Usage
update_pf_for_matrix_terms(pf, expanded)
Arguments
pf |
WBFormula object |
expanded |
Result from expand_matrix_terms_in_data |
Value
Updated WBFormula object
Determine if "old-style" interaction processing is needed
Description
Old-style processing creates interaction terms BEFORE demeaning the constituent variables.
Usage
use_old_style_ints(config)
Arguments
config |
An InteractionConfig object |
Value
Logical indicating whether to use old-style processing
Panel regression models fit with GEE
Description
Fit "within-between" and several other regression variants for panel data via generalized estimating equations.
Usage
wbgee(
formula,
data,
id = NULL,
wave = NULL,
model = "w-b",
cor.str = c("ar1", "exchangeable", "unstructured"),
detrend = FALSE,
use.wave = FALSE,
wave.factor = FALSE,
min.waves = 2,
family = gaussian,
balance.correction = FALSE,
dt.random = TRUE,
dt.order = 1,
weights = NULL,
offset = NULL,
interaction.style = c("double-demean", "demean", "raw"),
scale = FALSE,
scale.response = FALSE,
n.sd = 1,
calc.fit.stats = TRUE,
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
cor.str |
Any correlation structure accepted by |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
family |
Use this to specify GLM link families. Default is |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
scale |
If |
scale.response |
Should the response variable also be rescaled? Default
is |
n.sd |
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2. |
calc.fit.stats |
Calculate fit statistics? Default is TRUE, but occasionally poor-fitting models might trip up here. |
... |
Additional arguments provided to |
Details
See the documentation for wbm() for many details on formula syntax and
other arguments.
Value
A wbgee object, which inherits from geeglm.
Author(s)
Jacob A. Long
References
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
Giesselmann, M., & Schmidt-Catran, A. W. (2020). Interactions in fixed effects regression models. Sociological Methods & Research, 1–28. https://doi.org/10.1177/0049124120914934
McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data without multilevel models. Multivariate Behavioral Research, Advance online publication. https://doi.org/10.1080/00273171.2019.1602504
McNeish, D., Stapleton, L. M., & Silverman, R. D. (2016). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22, 114–140. https://doi.org/10.1037/met0000078
Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in
generalized linear mixed models: A discussion of approaches and the
xthybrid command. The Stata Journal, 17, 89–115.
https://doi.org/10.1177/1536867X1701700106
Examples
if (requireNamespace("geepack")) {
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbgee(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
data = wages)
summary(model)
}
Panel regression models fit via multilevel modeling
Description
Fit "within-between" and several other regression variants for panel data in a multilevel modeling framework.
Usage
wbm(
formula,
data,
id = NULL,
wave = NULL,
model = "w-b",
detrend = FALSE,
use.wave = FALSE,
wave.factor = FALSE,
min.waves = 2,
family = gaussian,
balance.correction = FALSE,
dt.random = TRUE,
dt.order = 1,
pR2 = TRUE,
pvals = TRUE,
t.df = "Satterthwaite",
weights = NULL,
offset = NULL,
interaction.style = c("double-demean", "demean", "raw"),
scale = FALSE,
scale.response = FALSE,
n.sd = 1,
dt_random = dt.random,
dt_order = dt.order,
balance_correction = balance.correction,
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
family |
Use this to specify GLM link families. Default is |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
pR2 |
Calculate a pseudo R-squared? Default is TRUE, but in some cases may cause errors or add computation time. |
pvals |
Calculate p values? Default is TRUE but for some complex
linear models, this may take a long time to compute using the |
t.df |
For linear models only. User may choose the method for
calculating the degrees of freedom in t-tests. Default is
|
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
scale |
If |
scale.response |
Should the response variable also be rescaled? Default
is |
n.sd |
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2. |
dt_random |
Deprecated. Equivalent to |
dt_order |
Deprecated. Equivalent to |
balance_correction |
Deprecated. Equivalent to |
... |
Additional arguments provided to |
Details
Formula syntax
The within-between models, and multilevel panel models more generally,
distinguish between time-varying and time-invariant predictors. These are,
as they sound, variables that are either measured repeatedly (in every wave)
in the case of time-varying predictors or only once in the case of
time-invariant predictors. You need to specify these separately in the
formula to tell the model which variables you expect to change over time and
which will not. The primary way of doing so is via the | operator.
As an example, we can look at the WageData included in this
package. We will create a model that predicts the logarithm of the
individual's wages (lwage) with their union status (union), which can
change over time, and their race (blk; dichotomized as black or
non-black),
which does not change throughout the period of study. Our formula will look
like this:
lwage ~ union | blk
Put time-varying variables before the first | and time-invariant
variables afterwards. You can specify lags like lag(union) for time-varying
variables; for more than 1 lag, include the number: lag(union, 2).
After the first | go the time-invariant variables. Note that if you put a
time-varying variable here, what you get is the observed value rather than
one adjusted to isolate within-entity effects. You may also take a
time-varying variable — let's say weeks worked (wks) — and use
imean(wks) to include the individual's mean across all waves as a
predictor while omitting the per-wave measures.
There is also a place for a second |. Here you can specify cross-level
interactions (within-level interactions can be specified here as well).
If I wanted the interaction term for union and blk — to see whether
the effect of union status depended on one's race — I would specify the
formula this way:
lwage ~ union | blk | union * blk
Another use for the post-second | section of the formula is for changing
the random effects specification. By default, only a random intercept is
specified in the call to lme4::lmer()/lme4::glmer(). If you would like
to specify other random slopes, include them here using the typical lme4
syntax:
lwage ~ union | blk | (union | id)
You can also include the wave variable in a random effects term to specify a latent growth curve model:
lwage ~ union | blk + t | (t | id)
One last thing to know: If you want to use the second | but not the first,
put a 1 or 0 after the first, like this:
lwage ~ union | 1 | (union | id)
Of course, with no time-invariant variables, you need no | operators at
all.
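To see how a formula with | separators breaks into its parts, the following base R sketch walks the right-hand side of a formula and splits it at each top-level |. The function split_parts() is hypothetical and only illustrates the three-part layout; it is not how wbm() parses formulas internally.

```r
# Hypothetical helper: split an expression at each top-level `|`.
# A parenthesized term such as (union | id) is left intact because
# its outermost call is `(`, not `|`.
split_parts <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_parts(expr[[2]]), split_parts(expr[[3]]))
  } else {
    list(expr)
  }
}

f <- lwage ~ lag(union) + wks | blk + fem | blk * lag(union)
parts <- vapply(split_parts(f[[3]]), deparse, character(1))
# parts[1]: time-varying terms
# parts[2]: time-invariant terms
# parts[3]: interactions and/or random effects terms

g <- lwage ~ union | 1 | (union | id)
parts_g <- vapply(split_parts(g[[3]]), deparse, character(1))
```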
Models
As a convenience, wbm does the heavy lifting for specifying the
within-between model correctly. As a side effect, it takes only
a few easy tweaks to fit closely related specifications instead. You
can choose among them with the model argument.
By default, the argument is "w-b" (equivalently, "within-between").
This means, for each time-varying predictor, you have two types of
variables in the model. The "between" effect is represented by the
individual-level mean for each entity (e.g., each respondent to a panel
survey). The "within" effect is represented by each wave's measure with
the individual-level mean subtracted. Some refer to this as "de-meaning."
Thinking in a Hausman test framework — with the within-between model as
described here — you should expect the within and between
coefficients to be the same if a random effects model were appropriate.
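The de-meaning step described above can be sketched in base R. The data and the column names wks_between and wks_within are invented for illustration; wbm() builds its own internal columns, so treat this only as a picture of the arithmetic.

```r
# Toy panel: three entities observed over three waves.
toy <- data.frame(
  id  = rep(1:3, each = 3),
  t   = rep(1:3, times = 3),
  wks = c(40, 42, 44, 30, 30, 36, 50, 48, 52)
)

# "Between" component: each entity's mean across its waves.
toy$wks_between <- ave(toy$wks, toy$id)

# "Within" component: each wave's value minus the entity mean.
toy$wks_within <- toy$wks - toy$wks_between
```

By construction the within component sums to zero inside every entity, so it carries no between-entity information.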
The contextual model is very similar (use argument "contextual"). In
some situations, this will be more intuitive to interpret. Empirically,
the only difference compared to the within-between specification is that
the contextual model does not subtract the individual-level means from the
wave-level measures. This also changes the interpretation of the
between-subject coefficients: In the contextual model, they are the
difference between the within and between effects. If there's no
difference between within and between effects, then, the coefficients will
be 0.
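The relationship between the two parameterizations can be checked numerically. The sketch below uses pooled OLS via lm() on simulated data rather than the multilevel model that wbm() fits, so it demonstrates only the algebraic link between the coefficients; all variable names are invented.

```r
set.seed(1)
id <- rep(1:20, each = 4)                        # 20 entities, 4 waves each
x  <- rnorm(80) + rep(rnorm(20), each = 4)       # time-varying predictor
y  <- 0.5 * x + rep(rnorm(20), each = 4) + rnorm(80)

xbar <- ave(x, id)   # entity means (between component)
xw   <- x - xbar     # de-meaned values (within component)

wb  <- coef(lm(y ~ xw + xbar))  # within-between parameterization
ctx <- coef(lm(y ~ x + xbar))   # contextual parameterization

# The raw-x coefficient matches the within effect ...
same_within <- all.equal(unname(ctx["x"]), unname(wb["xw"]))
# ... and the contextual mean coefficient is between minus within.
ctx_is_diff <- all.equal(unname(ctx["xbar"]),
                         unname(wb["xbar"] - wb["xw"]))
```

Both comparisons hold because the two sets of predictors span the same column space, so one fit is a reparameterization of the other.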
To fit a random effects model, use either "between" or "random". This
involves no de-meaning and no individual-level means whatsoever.
To fit a fixed effects model, use either "within" or "fixed". Any
between-subjects terms in the formula will be ignored. The time-varying
variables will be de-meaned, but the individual-level mean is not included
in the model.
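As a rough check on why de-meaning yields fixed effects estimates, the sketch below compares a regression on the de-meaned predictor with a regression that includes a dummy variable for every entity. It uses pooled OLS via lm() on simulated data with a single predictor, not the model wbm() fits, and the variable names are invented.

```r
set.seed(1)
id <- rep(1:20, each = 4)                        # 20 entities, 4 waves each
x  <- rnorm(80) + rep(rnorm(20), each = 4)
y  <- 0.5 * x + rep(rnorm(20), each = 4) + rnorm(80)

xw <- x - ave(x, id)   # de-meaned predictor

# Slope from the de-meaned predictor alone.
fe_demeaned <- unname(coef(lm(y ~ xw))["xw"])

# Slope from the entity-dummy (LSDV) regression.
fe_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

slopes_match <- all.equal(fe_demeaned, fe_lsdv)
```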
Matrix-returning transformations in the time-varying part of the formula
are supported for common basis expansion functions such as
splines::ns(), splines::bs(), and stats::poly().
For a term like ns(x, df = 3) in the varying part, wbm() expands it into
multiple columns representing:
- a within-person component: spline bases are computed on deviations x_it - xbar_i, and then each resulting basis column is de-meaned within person (double-demean for nonlinear terms)
- a between-person component: spline bases are computed on the person means xbar_i
This avoids the per-group knot selection that would otherwise occur when
splines are evaluated inside grouped mutate().
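The two components listed above can be sketched with splines::ns() evaluated on pooled data. This is an illustration on simulated data with invented object names, assuming the steps as described in the text; the columns wbm() actually creates may be named and computed differently.

```r
library(splines)

set.seed(2)
id <- rep(1:30, each = 5)                        # 30 persons, 5 waves each
x  <- rnorm(150) + rep(rnorm(30), each = 5)

xbar <- ave(x, id)   # person means
dev  <- x - xbar     # deviations from person means

# Within component: basis on the pooled deviations, so knots are
# chosen once for the whole sample, then each column is de-meaned
# within person.
W <- ns(dev, df = 3)
W_dm <- sapply(seq_len(ncol(W)), function(j) W[, j] - ave(W[, j], id))

# Between component: basis on the person means.
B <- ns(xbar, df = 3)
```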
Value
A wbm object, which inherits from merMod.
Author(s)
Jacob A. Long
References
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
Giesselmann, M., & Schmidt-Catran, A. (2018). Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in
generalized linear mixed models: A discussion of approaches and the
xthybrid command. The Stata Journal, 17, 89–115.
https://doi.org/10.1177/1536867X1701700106
See Also
wbm_stan() for a Bayesian estimation option.
Examples
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
data = wages)
summary(model)
Within-Between Model (wbm) class
Description
Models fit using wbm() return values of this class, which
inherits from merMod-class.
Slots
- call_info
A list of metadata about the arguments used.
- call
The actual function call.
- summ
The jtools::summ() object returned from calling it on the merMod object.
- summ_atts
The attributes of the summ object.
- orig_data
The data provided to the data argument in the function call.
Bayesian estimation of within-between models
Description
A near-equivalent of wbm() that instead uses Stan,
via rstan and brms.
Usage
wbm_stan(
formula,
data,
id = NULL,
wave = NULL,
model = "w-b",
detrend = FALSE,
use.wave = FALSE,
wave.factor = FALSE,
min.waves = 2,
model.cor = FALSE,
family = gaussian,
fit_model = TRUE,
balance.correction = FALSE,
dt.random = TRUE,
dt.order = 1,
chains = 3,
iter = 2000,
scale = FALSE,
save_ranef = FALSE,
interaction.style = c("double-demean", "demean", "raw"),
weights = NULL,
offset = NULL,
...
)
Arguments
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
model.cor |
Do you want to model residual autocorrelation?
This is often appropriate for linear models ( |
family |
Use this to specify GLM link families. Default is |
fit_model |
Fit the model? Default is TRUE. If FALSE, the model is not fit and the Stan code and data are returned instead (see Value). |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
chains |
How many Markov chains should be used? Default is 3, to leave you with one unused thread if you're on a typical dual-core machine. |
iter |
How many iterations, including warmup? Default is 2000, leaving 1000 per chain after warmup. For some models and data, you may need quite a few more. |
scale |
Standardize predictors? This can speed up model fit. Default is FALSE. |
save_ranef |
Save random effect estimates? This can be crucial for predicting from the model and for certain post-estimation procedures. On the other hand, it drastically increases the size of the resulting model. Default is FALSE. |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
This can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
... |
Additional arguments passed on to |
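As an illustration of the interaction.style options, the following base-R sketch contrasts two ways of building a within-entity interaction term. It assumes that "demean" means forming the raw product and then subtracting entity means, and that "double-demean" means demeaning each variable first, multiplying, and then demeaning the product again (the approach discussed by Giesselmann and Schmidt-Catran, 2018). The toy data and the helper demean() are hypothetical, and this is not panelr's internal code.

```r
# Hypothetical toy panel: 3 entities observed 4 times each
d <- data.frame(
  id = rep(1:3, each = 4),
  x  = c(1, 2, 3, 4, 2, 4, 6, 8, 5, 5, 6, 8),
  z  = c(0, 1, 0, 1, 1, 1, 0, 0, 2, 3, 4, 5)
)

# Hypothetical helper: subtract each entity's own mean from its values
demean <- function(v, g) v - ave(v, g)

# Assumed meaning of "demean": multiply the raw variables,
# then remove entity means from the product
d$xz_demean <- demean(d$x * d$z, d$id)

# Assumed meaning of "double-demean": demean each variable,
# multiply, then demean the product once more
d$xz_dd <- demean(demean(d$x, d$id) * demean(d$z, d$id), d$id)

# Both versions have zero mean within each entity, but they are not equal
tapply(d$xz_demean, d$id, mean)
tapply(d$xz_dd, d$id, mean)
head(d)
```

Comparing the two columns on your own data is a quick way to see how much the choice matters before fitting a model.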
Details
See wbm() for details on the formula syntax, model types,
and the other arguments shared with that function.
Value
A wbm_stan object, which is a list containing a model object
with the brm model and a stan_code object with the model code.
If fit_model = FALSE, instead a list is returned containing a stan_code
object and a stan_data object, leaving you with the tools you need to
run the model yourself using rstan.
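A minimal sketch of that workflow follows. It assumes the returned list exposes its elements under the names stan_code and stan_data (so that `$` access works), that rstan is installed, and that the sampler settings shown are adequate. None of this is confirmed here, so treat it as a starting point rather than the package's prescribed procedure.

```r
## Not run:
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# Build the Stan code and data without fitting the model
prep <- wbm_stan(lwage ~ lag(union) + wks | blk + fem,
                 data = wages, fit_model = FALSE)

# Assumption: the list elements are named stan_code and stan_data
fit <- rstan::stan(
  model_code = prep$stan_code,  # Stan program as text
  data       = prep$stan_data,  # data list for the program
  chains     = 1,
  iter       = 2000
)
print(fit)
## End(Not run)
```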
Author(s)
Jacob A. Long
Examples
## Not run:
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm_stan(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
data = wages, chains = 1, iter = 2000)
summary(model)
## End(Not run)
Convert long panel data to wide format
Description
This function takes panel_data() objects as input and converts
them to wide format for use in SEM and other situations when such a format
is needed.
Usage
widen_panel(data, separator = "_", ignore.attributes = FALSE, varying = NULL)
Arguments
data |
The |
separator |
When the variables are labeled with the wave number,
what should separate the variable name and wave number? By default,
it is "_". In other words, a variable named |
ignore.attributes |
If the |
varying |
If you want to skip the checks for whether variables are
varying and specify yourself, as is done with |
Details
This is a wrapper for stats::reshape(), which is notoriously
confusing to use. This function automatically detects which variables
vary over time and which do not, and appends wave information only to
the time-varying ones.
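To show the kind of call this function saves you from writing, here is a sketch of the equivalent stats::reshape() usage on a small hypothetical data frame (not WageData). The column names and values are made up for illustration, and the sketch does not reproduce widen_panel()'s automatic detection of time-varying variables: the varying column is named by hand in v.names.

```r
# Hypothetical long-format panel: 2 respondents, 3 waves each
long <- data.frame(
  id   = rep(1:2, each = 3),
  wave = rep(1:3, times = 2),
  wage = c(10, 11, 12, 20, 21, 22),  # varies over time
  fem  = rep(c(0, 1), each = 3)      # constant within respondent
)

wide <- reshape(
  long,
  direction = "wide",
  idvar     = "id",     # one output row per id
  timevar   = "wave",   # values of wave become column suffixes
  v.names   = "wage",   # only this variable is spread across waves
  sep       = "_"       # separator between variable name and wave
)

names(wide)  # wage gets one column per wave; fem stays a single column
nrow(wide)   # one row per respondent
```

With many variables, listing v.names by hand is where most mistakes happen, which is the step the automatic detection is meant to remove.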
Value
A data.frame with 1 row per respondent.
Examples
wages <- panel_data(WageData, id = id, wave = t)
wide_wages <- widen_panel(wages)